
Test Features

I. Direct versus indirect testing

Testing is said to be direct when it requires the candidate to perform precisely the skill that we wish to measure. If
we want to know how well candidates can write compositions, we get them to write compositions. If we want to
know how well they pronounce a language, we get them to speak. The tasks, and the texts that are used in direct
testing, should be as authentic as possible. The fact that candidates are aware that they are in a test situation means
that the tasks can never be fully authentic. Nevertheless, every effort is made to make them as realistic as possible.

Direct testing is easier to carry out when it is intended to measure the productive skills of speaking and writing. The
very acts of speaking and writing provide us with information about the candidate’s ability. With listening and
reading, however, it is necessary to get candidates not only to listen or read but also to demonstrate that they have
done this successfully. Testers have to devise methods of eliciting such evidence accurately and without the method
interfering with the performance of the skills in which they are interested. Appropriate methods for achieving this
are discussed in Chapters 11 and 12. Interestingly enough, in many texts on language testing it is the testing of
productive skills that is presented as being most problematic, for reasons usually connected with reliability. In fact
these reliability problems are by no means insurmountable, as we shall see in Chapters 9 and 10.

Direct testing has a number of attractions. First, provided that we are clear about just what abilities we want to
assess, it is relatively straightforward to create the conditions which will elicit the behaviour on which to base our
judgements. Secondly, at least in the case of the productive skills, the assessment and interpretation of students’
performance is also quite straightforward. Thirdly, since practice for the test involves practice of the skills that we
wish to foster, there is likely to be a helpful backwash effect.

Indirect testing attempts to measure the abilities that underlie the skills in which we are interested. There was a time
when some professional testers would use the multiple choice technique to measure writing ability. Their items
were of the following kind where the candidate had to identify which of the underlined elements is erroneous or
inappropriate in formal standard English:

At the outset the judge seemed unwilling to believe anything that was said to her by my wife and I.

(The erroneous element here is ‘my wife and I’, which in formal standard English would be ‘my wife and me’.)

While the ability to respond to such items has been shown to be related statistically to the ability to write
compositions (although the strength of the relationship was not particularly great), the two abilities are far from
being identical. Another example of indirect testing is Lado’s (1961) proposed method of testing pronunciation
ability by a paper-and-pencil test in which the candidate has to identify pairs of words which rhyme with each other.

Perhaps the main appeal of indirect testing is that it seems to offer the possibility of testing a representative sample
of a finite number of abilities which underlie a potentially indefinitely large number of manifestations of them. If, for
example, we take a representative sample of grammatical structures, then, it may be argued, we have taken a
sample which is relevant for all the situations in which control of grammar is necessary. By contrast, direct testing is
inevitably limited to a rather small sample of tasks, which may call on a restricted and possibly unrepresentative
range of grammatical structures. On this argument, indirect testing is superior to direct testing in that its results are
more generalisable.

The main problem with indirect tests is that the relationship between performance on them and performance of the
skills in which we are usually more interested tends to be rather weak in strength and uncertain in nature. We do not
yet know enough about the component parts of, say, composition writing to predict accurately composition writing
ability from scores on tests that measure the abilities that we believe underlie it. We may construct tests of
grammar, vocabulary, discourse markers, handwriting, punctuation, or of any other linguistic element. But we will
still not be able to predict accurately scores on compositions (even if we make sure of the validity of the composition
scores by having people write many compositions and by scoring these in a valid and highly reliable way).

It seems to us that in our present state of knowledge, at least as far as proficiency and final achievement tests are
concerned, it is preferable to rely principally on direct testing. Provided that we sample reasonably widely (for
example require at least two compositions, each calling for a different kind of writing and on a different topic), we
can expect more accurate estimates of the abilities that really concern us than would be obtained through indirect
testing. The fact that direct tests are generally easier to construct simply reinforces this view with respect to
institutional tests, as does their greater potential for positive backwash. It is only fair to say, however, that many
testers are reluctant to commit themselves entirely to direct testing and will always include an indirect element in
their tests. Of course, to obtain diagnostic information on underlying abilities, such as control of particular
grammatical structures, indirect testing may be perfectly appropriate.

In summary, we might say that both direct and indirect testing rely on obtaining samples of behaviour and drawing
inferences from them. While sampling may be easier in indirect testing, making meaningful inferences is likely to be
more difficult. Accurate inferences may be more readily made in direct testing, though it may be more difficult to
obtain samples that are truly representative. One can expect the backwash effect of direct testing to be the more
positive.

Before ending this section, it should be mentioned that some tests are referred to as semi-direct. The most obvious
examples of these are speaking tests where candidates respond to recorded stimuli, with their own responses being
recorded and later scored. These tests are semi-direct in the sense that, although not direct, they simulate direct
testing.

II. Discrete point versus integrative testing

Discrete point testing refers to the testing of one element at a time, item by item. This might, for example, take the
form of a series of items, each testing a particular grammatical structure. Integrative testing, by contrast, requires
the candidate to combine many language elements in the completion of a task. This might involve writing a
composition, making notes while listening to a lecture, taking a dictation, or completing a cloze passage. Clearly this
distinction is not unrelated to that between indirect and direct testing. Discrete point tests will almost always be
indirect, while integrative tests will tend to be direct. However, some integrative testing methods, such as the cloze
procedure, are indirect. Diagnostic tests of grammar of the kind referred to in an earlier section of this chapter will
tend to be discrete point.

III. Norm-referenced versus criterion-referenced testing

Imagine that a reading test is administered to an individual student. When we ask how the student performed on the
test, we may be given two kinds of answer. An answer of the first kind would be that the student obtained a score
that placed her or him in the top 10 percent of candidates who have taken that test, or in the bottom five percent; or
that she or he did better than 60 percent of those who took it. A test which is designed to give this kind of
information is said to be norm-referenced. It relates one candidate’s performance to that of other candidates. We
are not told directly what the student is capable of doing in the language.

An answer of the second kind would tell us what the student is able to do in the language, for example by assigning him or her to one of a series of defined reading levels. Testing for assignment to levels is intended to be carried out in a face-to-face situation, with questions being asked
orally. The tester gives the candidate reading matter of different kinds and at different levels of difficulty, until a
conclusion can be made as to the candidate’s ability. This can only be done, of course, with relatively small numbers
of candidates.

In this case we learn nothing about how the individual’s performance compares with that of other candidates.
Rather we learn something about what he or she can actually do in the language. Tests that are designed to provide
this kind of information directly are said to be criterion-referenced.

When the previous edition of this book was published, it was not difficult to point to major language tests which
were norm-referenced. The scores which were reported did not indicate what a candidate could or could not do.
Rather a numerical score was provided, which candidates, teachers and institutions had to interpret on the basis of
experience. Only over time did it become possible to relate a person’s score to their likely success in coping in
particular second or foreign language situations.

Pure criterion-referenced tests classify people according to whether or not they are able to perform some task or set
of tasks satisfactorily. The tasks are set, and the performances are evaluated. It does not matter in principle whether
all the candidates are successful, or none of the candidates is successful. In broad terms, tasks are set, and those
who perform them satisfactorily ‘pass’; those who don’t, ‘fail’. This means that students are encouraged to measure
their progress in relation to meaningful criteria, without feeling that, because they are less able than most of their
fellows, they are destined to fail. Criterion-referenced tests therefore have two positive virtues: they set meaningful
standards in terms of what people can do, which do not change with different groups of candidates, and they
motivate students to attain those standards. We welcome the trend to make major tests more criterion-referenced.

Books on language testing have tended to give advice which is more appropriate to norm-referenced testing than to
criterion-referenced testing. One reason for this may be that procedures for use with norm-referenced tests
(particularly with respect to such matters as the analysis of items and the estimation of reliability) are well
established, while those for criterion-referenced tests are not. The view taken in this book, and argued for in Chapter
6, is that criterion-referenced tests are often to be preferred, not least for the positive backwash effect they are
likely to have. The lack of agreed procedures for such tests is not sufficient reason for them to be excluded from
consideration.

IV. Objective testing versus subjective testing

The distinction here is between methods of scoring, and nothing else. If no judgement is required on the part of the
scorer, then the scoring is objective. A multiple choice test, with the correct responses unambiguously identified,
would be a case in point. If judgement is called for, the scoring is said to be subjective. There are different degrees of
subjectivity in testing. The impressionistic scoring of a composition may be considered more subjective than the
scoring of short answers in response to questions on a reading passage.

Objectivity in scoring is sought after by many testers, not for itself, but for the greater reliability it brings. In general,
the less subjective the scoring, the greater agreement there will be between two different scorers (and between the
scores of one person scoring the same test paper on different occasions). However, there are ways of obtaining
reliable subjective scoring, even of compositions.
