CHAPTER 1
TESTING, ASSESSING, AND TEACHING
If you hear the word test in any classroom setting, your thoughts are not likely to be positive, pleasant, or affirming. The anticipation of a test is almost always accompanied by feelings of anxiety and self-doubt, along with a fervent hope that you will come out of it alive. Tests seem as unavoidable as tomorrow's sunrise in virtually every kind of educational setting. Courses of study in every discipline are marked by periodic tests, milestones of progress (or inadequacy), and you intensely wish for a miraculous exemption from these ordeals. We live by tests and sometimes (metaphorically) die by them.
For a quick revisiting of how tests affect many learners, take the following vocabulary quiz. All the words are found in standard English dictionaries, so you should be able to answer all six items correctly, right? Okay, take the quiz and circle the correct definition for each word.
Circle the correct answer. You have 3 minutes to complete this examination!
1. polygene a. the first stratum of lower-order protozoa containing multiple genes
b. a combination of two or more plastics to produce a highly durable
material
c. one of a set of cooperating genes, each producing a small
quantitative effect
d. any of a number of multicellular chromosomes
3. gudgeon a. a jail for commoners during the Middle Ages, located in the villages
of Germany and France
b. a strip of metal used to reinforce beams and girders in building
construction
c. a tool used by Alaskan Indians to carve totem poles
d. a small Eurasian freshwater fish
4. hippogriff a. a term used in children's literature to denote colorful and descriptive
phraseology
b. a mythological monster having the wings, claws, and head of a
griffin and the body of a horse
c. ancient Egyptian cuneiform writing commonly found on the walls of
tombs
d. a skin transplant from the leg or foot to the hip
Now, how did that make you feel? Probably just the same as many learners feel when they take many multiple-choice (or shall we say multiple-guess?), timed, "tricky" tests. To add to the torment, if this were a commercially administered standardized test, you might have to wait weeks before learning your results. You can check your answers on this quiz now by turning to page 16. If you correctly identified three or more items, congratulations! You just exceeded the average.
Of course, this little pop quiz on obscure vocabulary is not an appropriate example of classroom-based achievement testing, nor is it intended to be. It's simply an illustration of how tests make us feel much of the time. Can tests be positive experiences? Can they build a person's confidence and become learning experiences? Can they bring out the best in students? The answer is a resounding yes! Tests need not be degrading, artificial, anxiety-provoking experiences. And that's partly what this book is all about: helping you to create more authentic, intrinsically motivating assessment procedures that are appropriate for their context and designed to offer constructive feedback to your students.
Before we look at tests and test design in second language education, we need
to understand three basic interrelated concepts: testing, assessment, and teaching.
Notice that the title of this book is Language Assessment, not Language Testing.
There are important differences between these two constructs, and an even more
important relationship among testing, assessing, and teaching.
WHAT IS A TEST?
A test, in simple terms, is a method of measuring a person's ability, knowledge, or performance in a given domain. Let's look at the components of this definition. A test is first a method. It is an instrument, a set of techniques, procedures, or items, that requires performance on the part of the test-taker. To qualify as a test, the method must be explicit and structured: multiple-choice questions with prescribed correct answers; a writing prompt with a scoring rubric; an oral interview based on a question script and a checklist of expected responses to be filled in by the administrator.
Second, a test must measure. Some tests measure general ability, while others focus on very specific competencies or objectives. A multi-skill proficiency test determines a general ability level; a quiz on recognizing correct use of definite articles measures specific knowledge. The way the results or measurements are communicated may vary. Some tests, such as a classroom-based short-answer essay test, may earn the test-taker a letter grade accompanied by the instructor's marginal comments. Others, particularly large-scale standardized tests, provide a total numerical score, a percentile rank, and perhaps some subscores. If an instrument does not specify a form of reporting measurement, a means for offering the test-taker some kind of result, then that technique cannot appropriately be defined as a test.
Next, a test measures an individual's ability, knowledge, or performance. Testers need to understand who the test-takers are. What is their previous experience and background? Is the test appropriately matched to their abilities? How should test-takers interpret their scores?
A test measures performance, but the results imply the test-taker's ability, or, to use a concept common in the field of linguistics, competence. Most language tests measure one's ability to perform language, that is, to speak, write, read, or listen to a subset of language. On the other hand, it is not uncommon to find tests designed to tap into a test-taker's knowledge about language: defining a vocabulary item, reciting a grammatical rule, or identifying a rhetorical feature in written discourse. Performance-based tests sample the test-taker's actual use of language, but from those samples the test administrator infers general competence. A test of reading comprehension, for example, may consist of several short reading passages each followed by a limited number of comprehension questions, a small sample of a second language learner's total reading behavior. But from the results of that test, the examiner may infer a certain level of general reading ability.
Finally, a test measures a given domain. In the case of a proficiency test, even
though the actual performance on the test involves only a sampling of skills, that
domain is overall proficiency in a language-general competence in all skills of a
language. Other tests may have more specific criteria. A test of pronunciation might
well be a test of only a limited set of phonemic minimal pairs. A vocabulary test may
focus on only the set of words covered in a particular lesson or unit. One of the
biggest obstacles to overcome in constructing adequate tests is to measure the
desired criterion and not include other factors inadvertently, an issue that is
addressed in Chapters 2 and 3.
A well-constructed test is an instrument that provides an accurate measure of
the test-taker's ability within a particular domain. The definition sounds fairly simple,
but in fact, constructing a good test is a complex task involving both science and art.
graded. Teaching sets up the practice games of language learning: the opportunities for learners to listen, think, take risks, set goals, and process feedback from the "coach" and then recycle through the skills that they are trying to master. (A diagram of the relationship among testing, teaching, and assessment is found in Figure 1.1.)
ASSESSMENT
At the same time, during these practice activities, teachers (and tennis coaches) are indeed observing students' performance and making various evaluations of each learner: How did the performance compare to previous performance? Which aspects of the performance were better than others? Is the learner performing up to an expected potential? How does the performance compare to that of others in the same learning community? In the ideal classroom, all these observations feed into the way the teacher provides instruction to each student.
suggestion for a strategy for compensating for a reading difficulty, and showing how
to modify a student's note-taking to better remember the content of a lecture.
On the other hand, formal assessments are exercises or procedures specifically designed to tap into a storehouse of skills and knowledge. They are systematic, planned sampling techniques constructed to give teacher and student an appraisal of student achievement. To extend the tennis analogy, formal assessments are the
tournament games that occur periodically in the course of a regimen of practice.
Is formal assessment the same as a test? We can say that all tests are formal
assessments, but not all formal assessment is testing. For example, you might use a
student's journal or portfolio of materials as a formal assessment of the attainment
of certain course objectives, but it is problematic to call those two procedures
"tests." A systematic set of observations of a student's frequency of oral participation
in class is certainly a formal assessment, but it too is hardly what anyone would call
a test. Tests are usually relatively time-constrained (usually spanning a class period
or at most several hours) and draw on a limited sample of behavior.
your students might otherwise view as a summative test? Can you offer your students an opportunity to convert tests into "learning experiences"? We will take up that challenge in subsequent chapters in this book.
Teaching by Principles [hereinafter TBP], Chapter 2).¹ For example, in the 1950s, an era of behaviorism and special attention to contrastive analysis, testing focused on specific language elements such as the phonological, grammatical, and lexical contrasts between two languages. In the 1970s and 1980s, communicative theories of language brought with them a more integrative view of testing in which specialists claimed that "the whole of the communicative event was considerably greater than the sum of its linguistic elements" (Clark, 1983, p. 432). Today, test designers are still challenged in their quest for more authentic, valid instruments that simulate real-world interaction.
1 Frequent references are made in this book to companion volumes by the author.
Principles of Language Learning and Teaching (PLLT) (Fourth Edition, 2000) is a
basic teacher reference book on essential foundations of second language acquisition
on which pedagogical practices are based. Teaching by Principles (TBP) (Second
Edition, 2001) spells out that pedagogy in practical terms for the language teacher.
claimed that cloze test results are good measures of overall proficiency. According
to theoretical constructs underlying this claim, the ability to supply appropriate
words in blanks requires a number of abilities that lie at the heart of competence in
a language: knowledge of vocabulary, grammatical structure, discourse structure,
reading skills and strategies, and an internalized "expectancy" grammar (enabling
one to predict an item that will come next in a sequence). It was argued that successful completion of cloze items taps into all of those abilities, which were said to
be the essence of global language proficiency.
Dictation is a familiar language-teaching technique that evolved into a testing
technique. Essentially, learners listen to a passage of 100 to 150 words read aloud by
an administrator (or audiotape) and write what they hear, using correct spelling. The
listening portion usually has three stages: an oral reading without pauses; an oral
reading with long pauses between every phrase (to give the learner time to write
down what is heard); and a third reading at normal speed to give test-takers a chance
to check what they wrote. (See Chapter 6 for more discussion of dictation as an
assessment device.)
Supporters argue that dictation is an integrative test because it taps into grammatical and discourse competencies required for other modes of performance in a language. Success on a dictation requires careful listening, reproduction in writing of what is heard, efficient short-term memory, and, to an extent, some expectancy rules to aid the short-term memory. Further, dictation test results tend to correlate strongly with other tests of proficiency. Dictation testing is usually classroom-centered since large-scale administration of dictations is quite impractical from a scoring standpoint. Reliability of scoring criteria for dictation tests can be improved by designing multiple-choice or exact-word cloze test scoring.
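The exact-word scoring idea mentioned above can be sketched in a few lines of code: a response counts only if it matches the word deleted from the original passage, which removes rater judgment from the scoring step. This is only an illustrative sketch; the function name, the normalization choices, and the sample words are assumptions, not part of any actual test instrument.

```python
# Illustrative sketch of exact-word scoring for a cloze (or cloze-style
# dictation) test: a blank is credited only when the response matches the
# deleted word exactly, after normalizing case and surrounding whitespace.
# Function name and normalization rules are invented for this example.

def score_exact_word_cloze(deleted_words, responses):
    """Return the number of blanks filled with the exact deleted word."""
    score = 0
    for expected, given in zip(deleted_words, responses):
        if given.strip().lower() == expected.strip().lower():
            score += 1
    return score
```

Because the criterion is mechanical, two scorers applying it to the same paper must arrive at the same total, which is precisely the reliability gain the passage describes.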
Proponents of integrative test methods soon centered their arguments on what became known as the unitary trait hypothesis, which suggested an "indivisible" view of language proficiency: that vocabulary, grammar, phonology, the "four skills," and other discrete points of language could not be disentangled from each other in language performance. The unitary trait hypothesis contended that there is a general factor of language proficiency such that all the discrete points do not add up to that whole.
Others argued strongly against the unitary trait position. In a study of students in Brazil and the Philippines, Farhady (1982) found significant and widely varying differences in performance on an ESL proficiency test, depending on subjects' native country, major field of study, and graduate versus undergraduate status. For example, Brazilians scored very low in listening comprehension and relatively high in reading comprehension. Filipinos, whose scores on five of the six components of the test were considerably higher than Brazilians' scores, were actually lower than Brazilians in reading comprehension scores. Farhady's contentions were supported in other research that seriously questioned the unitary trait hypothesis. Finally, in the face of the evidence, Oller retreated from his earlier stand and admitted that "the unitary trait hypothesis was wrong" (1983, p. 352).
Performance-Based Assessment
In language courses and programs around the world, test designers are now tackling this new and more student-centered agenda (Alderson, 2001, 2002). Instead of just offering paper-and-pencil selective response tests of a plethora of separate items, performance-based assessment of language typically involves oral production,
Robert Sternberg (1988, 1997) also charted new territory in intelligence research in recognizing creative thinking and manipulative strategies as part of intelligence. All "smart" people aren't necessarily adept at fast, reactive thinking. They may be very innovative in being able to think beyond the normal limits imposed by existing tests, but they may need a good deal of processing time to enact this creativity. Other forms of smartness are found in those who know how to manipulate their environment, namely, other people. Debaters, politicians, successful salespersons, smooth talkers, and con artists are all smart in their manipulative ability to persuade others to think their way, vote for them, make a purchase, or do something they might not otherwise do.
More recently, Daniel Goleman's (1995) concept of "EQ" (emotional quotient) has spurred us to underscore the importance of the emotions in our cognitive processing. Those who manage their emotions, especially emotions that can be detrimental, tend to be more capable of fully intelligent processing. Anger, grief, resentment, self-doubt, and other feelings can easily impair peak performance in everyday tasks as well as higher-order problem solving.
These new conceptualizations of intelligence have not been universally accepted by the academic community (see White, 1998, for example). Nevertheless, their intuitive appeal infused the decade of the 1990s with a sense of both freedom and responsibility in our testing agenda. Coupled with parallel educational reforms at the time (Armstrong, 1994), they helped to free us from relying exclusively on
2 For a summary of Gardner's theory of intelligence, see Brown (2000, pp. 100-102).
difficult, in fact, to draw a clear line of distinction between what Armstrong (1994) and Bailey (1998) have called traditional and alternative assessment. Many forms of assessment fall in between the two, and some combine the best of both.
Second, it is obvious that the table shows a bias toward alternative assessment, and one should not be misled into thinking that everything on the left-hand side is tainted while the list on the right-hand side offers salvation to the field of language assessment! As Brown and Hudson (1998) aptly pointed out, the assessment traditions available to us should be valued and utilized for the functions that they provide. At the same time, we might all be stimulated to look at the right-hand list and ask ourselves if, among those concepts, there are alternatives to assessment that we
Computer-Based Testing
Recent years have seen a burgeoning of assessment in which the test-taker performs responses on a computer. Some computer-based tests (also known as "computer-assisted" or "web-based" tests) are small-scale "home-grown" tests available on websites. Others are standardized, large-scale tests in which thousands or even tens of thousands of test-takers are involved. Students receive prompts (or probes, as they are sometimes referred to) in the form of spoken or written stimuli from the computerized test and are required to type (or in some cases, speak) their responses. Almost all computer-based test items have fixed, closed-ended responses; however, tests like the Test of English as a Foreign Language (TOEFL) offer a written essay section that must be scored by humans (as opposed to automatic, electronic, or machine scoring). As this book goes to press, the designers of the TOEFL are on the verge of offering a spoken English section.
A specific type of computer-based test, a computer-adaptive test, has been available for many years but has recently gained momentum. In a computer-adaptive test (CAT), each test-taker receives a set of questions that meet the test specifications and that are generally appropriate for his or her performance level. The CAT starts with questions of moderate difficulty. As test-takers answer each question, the computer scores the question and uses that information, as well as the responses to previous questions, to determine which question will be presented next. As long as examinees respond correctly, the computer typically selects questions of greater or equal difficulty. Incorrect answers, however, typically bring questions of lesser or equal difficulty. The computer is programmed to fulfill the test design as it continuously adjusts to find questions of appropriate difficulty for test-takers at all performance levels. In CATs, the test-taker sees only one question at a time, and the computer scores each question before selecting the next one. As a result, test-takers cannot skip questions, and once they have entered and confirmed their answers, they cannot return to questions or to any earlier part of the test.
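The adaptive loop just described can be sketched as a simplified illustration. Real CATs select items with statistical models (item response theory) rather than a fixed step up or down; the function name, the 0-10 difficulty scale, and the one-level adjustment rule below are assumptions made only to make the up-after-correct, down-after-incorrect logic concrete.

```python
# Simplified sketch of a computer-adaptive test (CAT) selection loop:
# start at moderate difficulty, move up one level after a correct answer,
# down one level after an incorrect one. The 0-10 difficulty scale and
# one-question-at-a-time flow are illustrative, not from any real system.

def run_cat(items_by_difficulty, answer_fn, num_questions=5, start=5):
    """items_by_difficulty: dict mapping difficulty level (0-10) to an item.
    answer_fn(item) returns True if the test-taker answers correctly."""
    level = start
    results = []
    for _ in range(num_questions):
        item = items_by_difficulty[level]      # only one item is shown at a time
        correct = answer_fn(item)              # scored before the next is chosen
        results.append((item, level, correct))
        if correct:
            level = min(level + 1, 10)         # correct: raise difficulty
        else:
            level = max(level - 1, 0)          # incorrect: lower difficulty
    return results
```

Because each item is scored before the next is selected, the loop also shows why a CAT cannot let test-takers skip ahead or revisit earlier questions.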
Computer-based testing, with or without CAT technology, offers these advantages:
• classroom-based testing
• self-directed testing on various aspects of a language (vocabulary, grammar, discourse, one or all of the four skills, etc.)
Of course, some disadvantages are present in our current predilection for computerizing testing. Among them:
Some argue that computer-based testing, pushed to its ultimate level, might militate against recent efforts to return testing to its artful form of being tailored by teachers for their classrooms, of being designed to be performance-based, and of allowing a teacher-student dialogue to form the basis of assessment. This need not be the case. Computer technology can be a boon to communicative language testing. Teachers and test-makers of the future will have access to an ever-increasing range of tools to safeguard against impersonal, stamped-out formulas for assessment. By using technological innovations creatively, testers will be able to enhance authenticity, to increase interactive exchange, and to promote autonomy.
§ §
As you read this book, I hope you will do so with an appreciation for the place of testing in assessment, and with a sense of the interconnection of assessment and teaching.
Answers to the vocabulary quiz on pages 1 and 2: 1c, 2a, 3d, 4b, 5a, 6c.
EXERCISES
[Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.]
1. (G) In a small group, look at Figure 1.1 on page 5 that shows tests as a subset
of assessment and the latter as a subset of teaching. Do you agree with this
diagrammatic depiction of the three terms? Consider the following classroom
teaching techniques: choral drill, pair pronunciation practice, reading aloud,
information gap task, singing songs in English, writing a description of the weekend's activities. What proportion of each has an assessment facet to it?
Share your conclusions with the rest of the class.
2. (G) The chart below shows a hypothetical line of distinction between formative and summative assessment, and between informal and formal assessment. As a group, place the following techniques/procedures into one of the four cells and justify your decision. Share your results with other groups and discuss any differences of opinion.
Placement tests
Diagnostic tests
Periodic achievement tests
Short pop quizzes
Formative Summative
Informal
Formal
that may presuppose the same intelligence in order to perform well. Share
your results with other groups.
7. (C) As a whole-class discussion, brainstorm a variety of test tasks that class
members have experienced in learning a foreign language. Then decide
which of those tasks are performance-based, which are not, and which ones
fall in between.
8. (G) Table 1.1 lists traditional and alternative assessment tasks and characteristics. In pairs, quickly review the advantages and disadvantages of each, on both sides of the chart. Share your conclusions with the rest of the class.
9. (C) Ask class members to share any experiences with computer-based testing
and evaluate the advantages and disadvantages of those experiences.