
CHAPTER 1

TESTING, ASSESSING, AND TEACHING

If you hear the word test in any classroom setting, your thoughts are not likely to be positive, pleasant, or affirming. The anticipation of a test is almost always accompanied by feelings of anxiety and self-doubt, along with a fervent hope that you will come out of it alive. Tests seem as unavoidable as tomorrow's sunrise in virtually every kind of educational setting. Courses of study in every discipline are marked by periodic tests, milestones of progress (or inadequacy), and you intensely wish for a miraculous exemption from these ordeals. We live by tests and sometimes (metaphorically) die by them.

For a quick revisiting of how tests affect many learners, take the following vocabulary quiz. All the words are found in standard English dictionaries, so you should be able to answer all six items correctly, right? Okay, take the quiz and circle the correct definition for each word.

Circle the correct answer. You have 3 minutes to complete this examination!

1. polygene
   a. the first stratum of lower-order protozoa containing multiple genes
   b. a combination of two or more plastics to produce a highly durable material
   c. one of a set of cooperating genes, each producing a small quantitative effect
   d. any of a number of multicellular chromosomes

2. cynosure
   a. an object that serves as a focal point of attention and admiration; a center of interest or attention
   b. a narrow opening caused by a break or fault in limestone caves
   c. the cleavage in rock caused by glacial activity
   d. one of a group of electrical impulses capable of passing through metals


3. gudgeon
   a. a jail for commoners during the Middle Ages, located in the villages of Germany and France
   b. a strip of metal used to reinforce beams and girders in building construction
   c. a tool used by Alaskan Indians to carve totem poles
   d. a small Eurasian freshwater fish

4. hippogriff
   a. a term used in children's literature to denote colorful and descriptive phraseology
   b. a mythological monster having the wings, claws, and head of a griffin and the body of a horse
   c. ancient Egyptian cuneiform writing commonly found on the walls of tombs
   d. a skin transplant from the leg or foot to the hip

5. reglet
   a. a narrow, flat molding
   b. a musical composition of regular beat and harmonic intonation
   c. an Australian bird of the eagle family
   d. a short sleeve found on women's dresses in Victorian England

6. fictile
   a. a short, oblong-shaped projectile used in early eighteenth-century cannons
   b. an Old English word for the leading character of a fictional novel
   c. moldable plastic; formed of a moldable substance such as clay or earth
   d. pertaining to the tendency of certain lower mammals to lose visual depth perception with increasing age

Now, how did that make you feel? Probably just the same as many learners feel when they take many multiple-choice (or shall we say multiple-guess?), timed, "tricky" tests. To add to the torment, if this were a commercially administered standardized test, you might have to wait weeks before learning your results. You can check your answers on this quiz now by turning to the end of this chapter. If you correctly identified three or more items, congratulations! You just exceeded the average.

Of course, this little pop quiz on obscure vocabulary is not an appropriate example of classroom-based achievement testing, nor is it intended to be. It's simply an illustration of how tests make us feel much of the time. Can tests be positive experiences? Can they build a person's confidence and become learning experiences? Can they bring out the best in students? The answer is a resounding yes! Tests need not be degrading, artificial, anxiety-provoking experiences. And that's partly what this book is all about: helping you to create more authentic, intrinsically
motivating assessment procedures that are appropriate for their context and designed to offer constructive feedback to your students.

Before we look at tests and test design in second language education, we need to understand three basic interrelated concepts: testing, assessment, and teaching. Notice that the title of this book is Language Assessment, not Language Testing. There are important differences between these two constructs, and an even more important relationship among testing, assessing, and teaching.

WHAT IS A TEST?

A test, in simple terms, is a method of measuring a person's ability, knowledge, or performance in a given domain. Let's look at the components of this definition. A test is first a method. It is an instrument (a set of techniques, procedures, or items) that requires performance on the part of the test-taker. To qualify as a test, the method must be explicit and structured: multiple-choice questions with prescribed correct answers; a writing prompt with a scoring rubric; an oral interview based on a question script and a checklist of expected responses to be filled in by the administrator.

Second, a test must measure. Some tests measure general ability, while others focus on very specific competencies or objectives. A multi-skill proficiency test determines a general ability level; a quiz on recognizing correct use of definite articles measures specific knowledge. The way the results or measurements are communicated may vary. Some tests, such as a classroom-based short-answer essay test, may earn the test-taker a letter grade accompanied by the instructor's marginal comments. Others, particularly large-scale standardized tests, provide a total numerical score, a percentile rank, and perhaps some subscores. If an instrument does not specify a form of reporting measurement (a means for offering the test-taker some kind of result), then that technique cannot appropriately be defined as a test.

Next, a test measures an individual's ability, knowledge, or performance. Testers need to understand who the test-takers are. What is their previous experience and background? Is the test appropriately matched to their abilities? How should test-takers interpret their scores?

A test measures performance, but the results imply the test-taker's ability, or, to use a concept common in the field of linguistics, competence. Most language tests measure one's ability to perform language, that is, to speak, write, read, or listen to a subset of language. On the other hand, it is not uncommon to find tests designed to tap into a test-taker's knowledge about language: defining a vocabulary item, reciting a grammatical rule, or identifying a rhetorical feature in written discourse. Performance-based tests sample the test-taker's actual use of language, but from those samples the test administrator infers general competence. A test of reading comprehension, for example, may consist of several short reading passages each followed by a limited number of comprehension questions (a small sample of a second language learner's total reading behavior). But from the results of that test, the examiner may infer a certain level of general reading ability.

Finally, a test measures a given domain. In the case of a proficiency test, even though the actual performance on the test involves only a sampling of skills, that domain is overall proficiency in a language: general competence in all skills of a language. Other tests may have more specific criteria. A test of pronunciation might well be a test of only a limited set of phonemic minimal pairs. A vocabulary test may focus on only the set of words covered in a particular lesson or unit. One of the biggest obstacles to overcome in constructing adequate tests is to measure the desired criterion and not include other factors inadvertently, an issue that is addressed in Chapters 2 and 3.

A well-constructed test is an instrument that provides an accurate measure of the test-taker's ability within a particular domain. The definition sounds fairly simple, but in fact, constructing a good test is a complex task involving both science and art.

ASSESSMENT AND TEACHING

Assessment is a popular and sometimes misunderstood term in current educational practice. You might be tempted to think of testing and assessing as synonymous terms, but they are not. Tests are prepared administrative procedures that occur at identifiable times in a curriculum when learners muster all their faculties to offer peak performance, knowing that their responses are being measured and evaluated.

Assessment, on the other hand, is an ongoing process that encompasses a much wider domain. Whenever a student responds to a question, offers a comment, or tries out a new word or structure, the teacher subconsciously makes an assessment of the student's performance. Written work, from a jotted-down phrase to a formal essay, is performance that ultimately is assessed by self, teacher, and possibly other students. Reading and listening activities usually require some sort of productive performance that the teacher implicitly judges, however peripheral that judgment may be. A good teacher never ceases to assess students, whether those assessments are incidental or intended.

Tests, then, are a subset of assessment; they are certainly not the only form of assessment that a teacher can make. Tests can be useful devices, but they are only one among many procedures and tasks that teachers can ultimately use to assess students.

But now, you might be thinking, if you make assessments every time you teach something in the classroom, does all teaching involve assessment? Are teachers constantly assessing students with no interaction that is assessment-free?

The answer depends on your perspective. For optimal learning to take place, students in the classroom must have the freedom to experiment, to try out their own hypotheses about language without feeling that their overall competence is being judged in terms of those trials and errors. In the same way that tournament tennis players must, before a tournament, have the freedom to practice their skills with no implications for their final placement on that day of days, so also must learners have ample opportunities to "play" with language in a classroom without being formally
graded. Teaching sets up the practice games of language learning: the opportunities for learners to listen, think, take risks, set goals, and process feedback from the "coach" and then recycle through the skills that they are trying to master. (A diagram of the relationship among testing, teaching, and assessment is found in Figure 1.1.)

[Figure 1.1. Tests, assessment, and teaching: a diagram showing tests as a subset of assessment, and assessment as a subset of teaching.]

At the same time, during these practice activities, teachers (and tennis coaches) are indeed observing students' performance and making various evaluations of each learner: How did the performance compare to previous performance? Which aspects of the performance were better than others? Is the learner performing up to an expected potential? How does the performance compare to that of others in the same learning community? In the ideal classroom, all these observations feed into the way the teacher provides instruction to each student.

Informal and Formal Assessment

One way to begin untangling the lexical conundrum created by distinguishing among tests, assessment, and teaching is to distinguish between informal and formal assessment. Informal assessment can take a number of forms, starting with incidental, unplanned comments and responses, along with coaching and other impromptu feedback to the student. Examples include saying "Nice job!" "Good work!" "Did you say can or can't?" "I think you meant to say you broke the glass, not you break the glass," or putting a ☺ on some homework.

Informal assessment does not stop there. A good deal of a teacher's informal assessment is embedded in classroom tasks designed to elicit performance without recording results and making fixed judgments about a student's competence. Examples at this end of the continuum are marginal comments on papers, responding to a draft of an essay, advice about how to better pronounce a word, a
suggestion for a strategy for compensating for a reading difficulty, and showing how to modify a student's note-taking to better remember the content of a lecture.

On the other hand, formal assessments are exercises or procedures specifically designed to tap into a storehouse of skills and knowledge. They are systematic, planned sampling techniques constructed to give teacher and student an appraisal of student achievement. To extend the tennis analogy, formal assessments are the tournament games that occur periodically in the course of a regimen of practice.

Is formal assessment the same as a test? We can say that all tests are formal assessments, but not all formal assessment is testing. For example, you might use a student's journal or portfolio of materials as a formal assessment of the attainment of certain course objectives, but it is problematic to call those two procedures "tests." A systematic set of observations of a student's frequency of oral participation in class is certainly a formal assessment, but it too is hardly what anyone would call a test. Tests are usually relatively time-constrained (usually spanning a class period or at most several hours) and draw on a limited sample of behavior.

Formative and Summative Assessment

Another useful distinction to bear in mind is the function of an assessment: How is the procedure to be used? Two functions are commonly identified in the literature: formative and summative assessment. Most of our classroom assessment is formative assessment: evaluating students in the process of "forming" their competencies and skills with the goal of helping them to continue that growth process. The key to such formation is the delivery (by the teacher) and internalization (by the student) of appropriate feedback on performance, with an eye toward the future continuation (or formation) of learning.

For all practical purposes, virtually all kinds of informal assessment are (or should be) formative. They have as their primary focus the ongoing development of the learner's language. So when you give a student a comment or a suggestion, or call attention to an error, that feedback is offered in order to improve the learner's language ability.

Summative assessment aims to measure, or summarize, what a student has grasped, and typically occurs at the end of a course or unit of instruction. A summation of what a student has learned implies looking back and taking stock of how well that student has accomplished objectives, but does not necessarily point the way to future progress. Final exams in a course and general proficiency exams are examples of summative assessment.

One of the problems with prevailing attitudes toward testing is the view that all tests (quizzes, periodic review tests, midterm exams, etc.) are summative. At various points in your past educational experiences, no doubt you've considered such tests as summative. You may have thought, "Whew! I'm glad that's over. Now I don't have to remember that stuff anymore!" A challenge to you as a teacher is to change that attitude among your students: Can you instill a more formative quality to what
your students might otherwise view as a summative test? Can you offer your students an opportunity to convert tests into "learning experiences"? We will take up that challenge in subsequent chapters in this book.

Norm-Referenced and Criterion-Referenced Tests

Another dichotomy that is important to clarify here and that aids in sorting out common terminology in assessment is the distinction between norm-referenced and criterion-referenced testing. In norm-referenced tests, each test-taker's score is interpreted in relation to a mean (average score), median (middle score), standard deviation (extent of variance in scores), and/or percentile rank. The purpose in such tests is to place test-takers along a mathematical continuum in rank order. Scores are usually reported back to the test-taker in the form of a numerical score (for example, 230 out of 300) and a percentile rank (such as 84 percent, which means that the test-taker's score was higher than 84 percent of the total number of test-takers, but lower than 16 percent in that administration). Typical of norm-referenced tests are standardized tests like the Scholastic Aptitude Test (SAT®) or the Test of English as a Foreign Language (TOEFL®), intended to be administered to large audiences, with results efficiently disseminated to test-takers. Such tests must have fixed, predetermined responses in a format that can be scored quickly at minimum expense. Money and efficiency are primary concerns in these tests.
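
To make these reporting statistics concrete, here is a minimal sketch in Python of how the figures named above (mean, median, standard deviation, percentile rank) might be computed for one administration. The score list is hypothetical, and the percentile-rank function is deliberately simplified; operational testing programs derive percentile ranks from large norming samples rather than from a single small administration.

    import statistics

    def percentile_rank(score, all_scores):
        # Percent of scores in this administration that fall below the
        # given score (a simplified definition, for illustration only).
        below = sum(1 for s in all_scores if s < score)
        return round(100 * below / len(all_scores))

    # Hypothetical scores (out of 300) from one small administration.
    scores = [187, 198, 212, 221, 230, 230, 240, 245, 251, 263]

    print("mean:", statistics.mean(scores))                # average score
    print("median:", statistics.median(scores))            # middle score
    print("std dev:", round(statistics.stdev(scores), 1))  # extent of variance
    print("rank of 230:", percentile_rank(230, scores))    # percentile rank
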
Criterion-referenced tests, on the other hand, are designed to give test-takers feedback, usually in the form of grades, on specific course or lesson objectives. Classroom tests involving the students in only one class, and connected to a curriculum, are typical of criterion-referenced testing. Here, much time and effort on the part of the teacher (test administrator) are sometimes required in order to deliver useful, appropriate feedback to students, or what Oller (1979, p. 52) called "instructional value." In a criterion-referenced test, the distribution of students' scores across a continuum may be of little concern as long as the instrument assesses appropriate objectives. In Language Assessment, with an audience of classroom language teachers and teachers in training, and with its emphasis on classroom-based assessment (as opposed to standardized, large-scale testing), criterion-referenced testing is of more prominent interest than norm-referenced testing.

APPROACHES TO LANGUAGE TESTING: A BRIEF HISTORY

Now that you have a reasonably clear grasp of some common assessment terms, we now turn to one of the primary concerns of this book: the creation and use of tests, particularly classroom tests. A brief history of language testing over the past half-century will serve as a backdrop to an understanding of classroom-based testing.

Historically, language-testing trends and practices have followed the shifting sands of teaching methodology (for a description of these trends, see Brown, Teaching by Principles [hereinafter TBP], Chapter 2).¹ For example, in the 1950s, an era of behaviorism and special attention to contrastive analysis, testing focused on specific language elements such as the phonological, grammatical, and lexical contrasts between two languages. In the 1970s and 1980s, communicative theories of language brought with them a more integrative view of testing in which specialists claimed that "the whole of the communicative event was considerably greater than the sum of its linguistic elements" (Clark, 1983, p. 432). Today, test designers are still challenged in their quest for more authentic, valid instruments that simulate real-world interaction.

Discrete-Point and Integrative Testing

This historical perspective underscores two major approaches to language testing that were debated in the 1970s and early 1980s. These approaches still prevail today, even if in mutated form: the choice between discrete-point and integrative testing methods (Oller, 1979). Discrete-point tests are constructed on the assumption that language can be broken down into its component parts and that those parts can be tested successfully. These components are the skills of listening, speaking, reading, and writing, and various units of language (discrete points) of phonology/graphology, morphology, lexicon, syntax, and discourse. It was claimed that an overall language proficiency test, then, should sample all four skills and as many linguistic discrete points as possible.

Such an approach demanded a decontextualization that often confused the test-taker. So, as the profession emerged into an era of emphasizing communication, authenticity, and context, new approaches were sought. Oller (1979) argued that language competence is a unified set of interacting abilities that cannot be tested separately. His claim was that communicative competence is so global and requires such integration (hence the term "integrative" testing) that it cannot be captured in additive tests of grammar, reading, vocabulary, and other discrete points of language. Others (among them Cziko, 1982, and Savignon, 1982) soon followed in their support for integrative testing.

What does an integrative test look like? Two types of tests have historically been claimed to be examples of integrative tests: cloze tests and dictations. A cloze test is a reading passage (perhaps 150 to 300 words) in which roughly every sixth or seventh word has been deleted; the test-taker is required to supply words that fit into those blanks. (See Chapter 8 for a full discussion of cloze testing.)

¹ Frequent references are made in this book to companion volumes by the author. Principles of Language Learning and Teaching (PLLT) (Fourth Edition, 2000) is a basic teacher reference book on essential foundations of second language acquisition on which pedagogical practices are based. Teaching by Principles (TBP) (Second Edition, 2001) spells out that pedagogy in practical terms for the language teacher.

Oller (1979) claimed that cloze test results are good measures of overall proficiency. According to theoretical constructs underlying this claim, the ability to supply appropriate words in blanks requires a number of abilities that lie at the heart of competence in a language: knowledge of vocabulary, grammatical structure, discourse structure, reading skills and strategies, and an internalized "expectancy" grammar (enabling one to predict an item that will come next in a sequence). It was argued that successful completion of cloze items taps into all of those abilities, which were said to be the essence of global language proficiency.
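
As an illustration of the construction procedure just described, the following minimal Python sketch produces a fixed-ratio cloze passage and its answer key. The deletion ratio and the intact lead-in are parameters chosen for this example; real cloze construction often leaves the first sentence or two untouched and may use rational (targeted) deletion instead of a strict every-nth-word rule.

    def make_cloze(passage, nth=7, lead_in=10):
        # Delete every nth word after an intact lead-in, returning the
        # gapped passage and an answer key of the deleted words.
        words = passage.split()
        answers = []
        for i in range(lead_in, len(words), nth):
            answers.append(words[i])
            words[i] = "______"
        return " ".join(words), answers

    passage = ("The anticipation of a test is almost always accompanied by "
               "feelings of anxiety and self-doubt, along with a fervent hope "
               "that you will come out of it alive.")
    gapped, key = make_cloze(passage, nth=6, lead_in=8)
    print(gapped)
    print("Answer key:", key)
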
Dictation is a familiar language-teaching technique that evolved into a testing technique. Essentially, learners listen to a passage of 100 to 150 words read aloud by an administrator (or audiotape) and write what they hear, using correct spelling. The listening portion usually has three stages: an oral reading without pauses; an oral reading with long pauses between every phrase (to give the learner time to write down what is heard); and a third reading at normal speed to give test-takers a chance to check what they wrote. (See Chapter 6 for more discussion of dictation as an assessment device.)

Supporters argue that dictation is an integrative test because it taps into grammatical and discourse competencies required for other modes of performance in a language. Success on a dictation requires careful listening, reproduction in writing of what is heard, efficient short-term memory, and, to an extent, some expectancy rules to aid the short-term memory. Further, dictation test results tend to correlate strongly with other tests of proficiency. Dictation testing is usually classroom-centered since large-scale administration of dictations is quite impractical from a scoring standpoint. Reliability of scoring criteria for dictation tests can be improved by designing multiple-choice or exact-word cloze test scoring.

Proponents of integrative test methods soon centered their arguments on what became known as the unitary trait hypothesis, which suggested an "indivisible" view of language proficiency: that vocabulary, grammar, phonology, the "four skills," and other discrete points of language could not be disentangled from each other in language performance. The unitary trait hypothesis contended that there is a general factor of language proficiency such that all the discrete points do not add up to that whole.

Others argued strongly against the unitary trait position. In a study of students in Brazil and the Philippines, Farhady (1982) found significant and widely varying differences in performance on an ESL proficiency test, depending on subjects' native country, major field of study, and graduate versus undergraduate status. For example, Brazilians scored very low in listening comprehension and relatively high in reading comprehension. Filipinos, whose scores on five of the six components of the test were considerably higher than Brazilians' scores, were actually lower than Brazilians in reading comprehension scores. Farhady's contentions were supported in other research that seriously questioned the unitary trait hypothesis. Finally, in the face of the evidence, Oller retreated from his earlier stand and admitted that "the unitary trait hypothesis was wrong" (1983, p. 352).

Communicative Language Testing

By the mid-1980s, the language-testing field had abandoned arguments about the unitary trait hypothesis and had begun to focus on designing communicative language-testing tasks. Bachman and Palmer (1996, p. 9) include among "fundamental" principles of language testing the need for a correspondence between language test performance and language use: "in order for a particular language test to be useful for its intended purposes, test performance must correspond in demonstrable ways to language use in non-test situations." The problem that language assessment experts faced was that tasks tended to be artificial, contrived, and unlikely to mirror language use in real life. As Weir (1990, p. 6) noted, "Integrative tests such as cloze only tell us about a candidate's linguistic competence. They do not tell us anything directly about a student's performance ability."

And so a quest for authenticity was launched, as test designers centered on communicative performance. Following Canale and Swain's (1980) model of communicative competence, Bachman (1990) proposed a model of language competence consisting of organizational and pragmatic competence, respectively subdivided into grammatical and textual components, and into illocutionary and sociolinguistic components. (Further discussion of both Canale and Swain's and Bachman's models can be found in PLLT, Chapter 9.) Bachman and Palmer (1996, pp. 70f) also emphasized the importance of strategic competence (the ability to employ communicative strategies to compensate for breakdowns as well as to enhance the rhetorical effect of utterances) in the process of communication. All elements of the model, especially pragmatic and strategic abilities, needed to be included in the constructs of language testing and in the actual performance required of test-takers.

Communicative testing presented challenges to test designers, as we will see in subsequent chapters of this book. Test constructors began to identify the kinds of real-world tasks that language learners were called upon to perform. It was clear that the contexts for those tasks were extraordinarily widely varied and that the sampling of tasks for any one assessment procedure needed to be validated by what language users actually do with language. Weir (1990, p. 11) reminded his readers that "to measure language proficiency ... account must now be taken of: where, when, how, with whom, and why language is to be used, and on what topics, and with what effect." And the assessment field became more and more concerned with the authenticity of tasks and the genuineness of texts. (See Skehan, 1988, 1989, for a survey of communicative testing research.)

Performance-Based Assessment

In language courses and programs around the world, test designers are now tackling this new and more student-centered agenda (Alderson, 2001, 2002). Instead of just offering paper-and-pencil selective response tests of a plethora of separate items, performance-based assessment of language typically involves oral production,
written production, open-ended responses, integrated performance (across skill areas), group performance, and other interactive tasks. To be sure, such assessment is time-consuming and therefore expensive, but those extra efforts are paying off in the form of more direct testing because students are assessed as they perform actual or simulated real-world tasks. In technical terms, higher content validity (see Chapter 2 for an explanation) is achieved because learners are measured in the process of performing the targeted linguistic acts.

In an English language-teaching context, performance-based assessment means that you may have a difficult time distinguishing between formal and informal assessment. If you rely a little less on formally structured tests and a little more on evaluation while students are performing various tasks, you will be taking some steps toward meeting the goals of performance-based testing. (See Chapter 10 for a further discussion of performance-based assessment.)

A characteristic of many (but not all) performance-based language assessments is the presence of interactive tasks. In such cases, the assessments involve learners in actually performing the behavior that we want to measure. In interactive tasks, test-takers are measured in the act of speaking, requesting, responding, or in combining listening and speaking, and in integrating reading and writing. Paper-and-pencil tests certainly do not elicit such communicative performance.

A prime example of an interactive language assessment procedure is an oral interview. The test-taker is required to listen accurately to someone else and to respond appropriately. If care is taken in the test design process, language elicited and volunteered by the student can be personalized and meaningful, and tasks can approach the authenticity of real-life language use (see Chapter 7).

CURRENT ISSUES IN CLASSROOM TESTING

The design of communicative, performance-based assessment rubrics continues to challenge both assessment experts and classroom teachers. Such efforts to improve various facets of classroom testing are accompanied by some stimulating issues, all of which are helping to shape our current understanding of effective assessment. Let's look at three such issues: the effect of new theories of intelligence on the testing industry; the advent of what has come to be called "alternative" assessment; and the increasing popularity of computer-based testing.

New Views on Intelligence

Intelligence was once viewed strictly as the ability to perform (a) linguistic and (b) logical-mathematical problem solving. This "IQ" (intelligence quotient) concept of intelligence has permeated the Western world and its way of testing for almost a century. Since "smartness" in general is measured by timed, discrete-point tests consisting of a hierarchy of separate items, why shouldn't every field of study be so measured? For many years, we have lived in a world of standardized, norm-referenced tests that are timed in a multiple-choice format consisting of a multiplicity of logic-constrained items, many of which are inauthentic.

However, research on intelligence by psychologists like Howard Gardner, Robert Sternberg, and Daniel Goleman has begun to turn the psychometric world upside down. Gardner (1983, 1999), for example, extended the traditional view of intelligence to seven different components.² He accepted the traditional conceptualizations of linguistic intelligence and logical-mathematical intelligence on which standardized IQ tests are based, but he included five other "frames of mind" in his theory of multiple intelligences:

• spatial intelligence (the ability to find your way around an environment, to form mental images of reality)
• musical intelligence (the ability to perceive and create pitch and rhythmic patterns)
• bodily-kinesthetic intelligence (fine motor movement, athletic prowess)
• interpersonal intelligence (the ability to understand others and how they feel, and to interact effectively with them)
• intrapersonal intelligence (the ability to understand oneself and to develop a sense of self-identity)

Robert Sternberg (1988, 1997) also charted new territory in intelligence research in recognizing creative thinking and manipulative strategies as part of intelligence. All "smart" people aren't necessarily adept at fast, reactive thinking. They may be very innovative in being able to think beyond the normal limits imposed by existing tests, but they may need a good deal of processing time to enact this creativity. Other forms of smartness are found in those who know how to manipulate their environment, namely, other people. Debaters, politicians, successful salespersons, smooth talkers, and con artists are all smart in their manipulative ability to persuade others to think their way, vote for them, make a purchase, or do something they might not otherwise do.

More recently, Daniel Goleman's (1995) concept of "EQ" (emotional quotient) has spurred us to underscore the importance of the emotions in our cognitive processing. Those who manage their emotions, especially emotions that can be detrimental, tend to be more capable of fully intelligent processing. Anger, grief, resentment, self-doubt, and other feelings can easily impair peak performance in everyday tasks as well as higher-order problem solving.

These new conceptualizations of intelligence have not been universally accepted by the academic community (see White, 1998, for example). Nevertheless, their intuitive appeal infused the decade of the 1990s with a sense of both freedom and responsibility in our testing agenda. Coupled with parallel educational reforms at the time (Armstrong, 1994), they helped to free us from relying exclusively on timed, discrete-point, analytical tests in measuring language. We were prodded to cautiously combat the potential tyranny of "objectivity" and its accompanying impersonal approach. But we also assumed the responsibility for tapping into whole language skills, learning processes, and the ability to negotiate meaning. Our challenge was to test interpersonal, creative, communicative, interactive skills, and in doing so to place some trust in our subjectivity and intuition.

² For a summary of Gardner's theory of intelligence, see Brown (2000, pp. 100-102).

Traditional and "Alternative" Assessment

Implied in some of the earlier description of performance-based classroom assessment is a trend to supplement traditional test designs with alternatives that are more authentic in their elicitation of meaningful communication. Table 1.1 highlights differences between the two approaches (adapted from Armstrong, 1994, and Bailey, 1998, p. 207).

Two caveats need to be stated here. First, the concepts in Table 1.1 represent some overgeneralizations and should therefore be considered with caution. It is difficult, in fact, to draw a clear line of distinction between what Armstrong (1994) and Bailey (1998) have called traditional and alternative assessment. Many forms of assessment fall in between the two, and some combine the best of both.

Second, it is obvious that the table shows a bias toward alternative assessment, and one should not be misled into thinking that everything on the left-hand side is tainted while the list on the right-hand side offers salvation to the field of language assessment! As Brown and Hudson (1998) aptly pointed out, the assessment traditions available to us should be valued and utilized for the functions that they provide. At the same time, we might all be stimulated to look at the right-hand list and ask ourselves if, among those concepts, there are alternatives to assessment that we can constructively use in our classrooms.

Table 1.1. Traditional and alternative assessment

Traditional Assessment                 Alternative Assessment
One-shot, standardized exams           Continuous long-term assessment
Timed, multiple-choice format          Untimed, free-response format
Decontextualized test items            Contextualized communicative tasks
Scores suffice for feedback            Individualized feedback and washback
Norm-referenced scores                 Criterion-referenced scores
Focus on the "right" answer            Open-ended, creative answers
Summative                              Formative
Oriented to product                    Oriented to process
Non-interactive performance            Interactive performance
Fosters extrinsic motivation           Fosters intrinsic motivation

It should be noted here that considerably more time and higher institutional budgets are required to administer and score assessments that presuppose more subjective evaluation, more individualization, and more interaction in the process of offering feedback. The payoff for the latter, however, comes with more useful feedback to students, the potential for intrinsic motivation, and ultimately a more complete description of a student's ability. (See Chapter 10 for a complete treatment of alternatives in assessment.) More and more educators and advocates for educational reform are arguing for a de-emphasis on large-scale standardized tests in favor of building budgets that will offer the kind of contextualized, communicative performance-based assessment that will better facilitate learning in our schools. (In Chapter 4, issues surrounding standardized testing are addressed at length.)

Computer-Based Testing

Recent years have seen a burgeoning of assessment in which the test-taker performs responses on a computer. Some computer-based tests (also known as "computer-assisted" or "web-based" tests) are small-scale "home-grown" tests available on websites. Others are standardized, large-scale tests in which thousands or even tens of thousands of test-takers are involved. Students receive prompts (or probes, as they are sometimes referred to) in the form of spoken or written stimuli from the computerized test and are required to type (or in some cases, speak) their responses. Almost all computer-based test items have fixed, closed-ended responses; however, tests like the Test of English as a Foreign Language (TOEFL®) offer a written essay section that must be scored by humans (as opposed to automatic, electronic, or machine scoring). As this book goes to press, the designers of the TOEFL are on the verge of offering a spoken English section.

A specific type of computer-based test, a computer-adaptive test, has been available for many years but has recently gained momentum. In a computer-adaptive test (CAT), each test-taker receives a set of questions that meet the test specifications and that are generally appropriate for his or her performance level. The CAT starts with questions of moderate difficulty. As test-takers answer each question, the computer scores the question and uses that information, as well as the responses to previous questions, to determine which question will be presented next. As long as examinees respond correctly, the computer typically selects questions of greater or equal difficulty. Incorrect answers, however, typically bring questions of lesser or equal difficulty. The computer is programmed to fulfill the test design as it continuously adjusts to find questions of appropriate difficulty for test-takers at all performance levels. In CATs, the test-taker sees only one question at a time, and the computer scores each question before selecting the next one. As a result, test-takers cannot skip questions, and once they have entered and confirmed their answers, they cannot return to questions or to any earlier part of the test.
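
The adaptive logic just described can be sketched in a few lines of Python. Everything here is an illustrative assumption: the five-level item bank, the simulated test-taker, and the one-level-up, one-level-down rule are stand-ins for operational CATs, which typically estimate ability continuously using item response theory rather than discrete difficulty levels.

    import random

    def simulate_response(difficulty, ability=3):
        # Stand-in for a real test-taker: the farther item difficulty
        # exceeds ability, the lower the chance of a correct answer.
        return random.random() < 1 / (1 + 2 ** (difficulty - ability))

    def run_cat(item_bank, n_items=10, start_level=3):
        # Minimal adaptive loop: each item is scored as soon as it is
        # answered; a correct answer moves the next selection one level
        # harder, an incorrect one moves it one level easier. The
        # test-taker sees one item at a time and cannot go back.
        level, record = start_level, []
        for _ in range(n_items):
            item = item_bank[level].pop()   # next unused item at this level
            correct = simulate_response(level)
            record.append((item, level, correct))
            level = min(level + 1, 5) if correct else max(level - 1, 1)
        return record

    # Hypothetical bank: five difficulty levels (1 = easiest), ten items each.
    bank = {lvl: [f"item-{lvl}-{n}" for n in range(10)] for lvl in range(1, 6)}
    for item, lvl, ok in run_cat(bank):
        print(item, "level", lvl, "correct" if ok else "incorrect")
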
Computer-based testing, with or without CAT technology, offers these advantages:

• classroom-based testing
• self-directed testing on various aspects of a language (vocabulary, grammar, discourse, one or all of the four skills, etc.)
• practice for upcoming high-stakes standardized tests
• some individualization, in the case of CATs
• large-scale standardized tests that can be administered easily to thousands of test-takers at many different stations, then scored electronically for rapid reporting of results

Of course, some disadvantages are present in our current predilection for computerizing testing. Among them:

• Lack of security and the possibility of cheating are inherent in classroom-based, unsupervised computerized tests.
• Occasional "home-grown" quizzes that appear on unofficial websites may be mistaken for validated assessments.
• The multiple-choice format preferred for most computer-based tests contains the usual potential for flawed item design (see Chapter 3).
• Open-ended responses are less likely to appear because of the need for human scorers, with all the attendant issues of cost, reliability, and turnaround time.
• The human interactive element (especially in oral production) is absent.

More is said about computer-based testing in subsequent chapters, especially Chapter 4, in a discussion of large-scale standardized testing. In addition, the following websites provide further information and examples of computer-based tests:

Educational Testing Service                          www.ets.org
Test of English as a Foreign Language                www.toefl.org
Test of English for International Communication      www.toeic.com
International English Language Testing System        www.ielts.org
Dave's ESL Cafe (computerized quizzes)               www.eslcafe.com

Some argue that computer-based testing, pushed to its ultimate level, might militate against recent efforts to return testing to its artful form of being tailored by teachers for their classrooms, of being designed to be performance-based, and of allowing a teacher-student dialogue to form the basis of assessment. This need not be the case. Computer technology can be a boon to communicative language testing. Teachers and test-makers of the future will have access to an ever-increasing range of tools to safeguard against impersonal, stamped-out formulas for assessment. By using technological innovations creatively, testers will be able to enhance authenticity, to increase interactive exchange, and to promote autonomy.

§ §

As you read this book, I hope you will do so with an appreciation for the place of testing in assessment, and with a sense of the interconnection of assessment and teaching. Assessment is an integral part of the teaching-learning cycle. In an interactive, communicative curriculum, assessment is almost constant. Tests, which are a subset of assessment, can provide authenticity, motivation, and feedback to the learner. Tests are essential components of a successful curriculum and one of several partners in the learning process. Keep in mind these basic principles:

1. Periodic assessments, both formal and informal, can increase motivation by serving as milestones of student progress.
2. Appropriate assessments aid in the reinforcement and retention of information.
3. Assessments can confirm areas of strength and pinpoint areas needing further work.
4. Assessments can provide a sense of periodic closure to modules within a curriculum.
5. Assessments can promote student autonomy by encouraging students' self-evaluation of their progress.
6. Assessments can spur learners to set goals for themselves.
7. Assessments can aid in evaluating teaching effectiveness.

Answers to the vocabulary quiz: 1c, 2a, 3d, 4b, 5a, 6c.

EXERCISES

[Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.]

1. (G) In a small group, look at Figure 1.1, which shows tests as a subset of assessment and the latter as a subset of teaching. Do you agree with this diagrammatic depiction of the three terms? Consider the following classroom teaching techniques: choral drill, pair pronunciation practice, reading aloud, information gap task, singing songs in English, writing a description of the weekend's activities. What proportion of each has an assessment facet to it? Share your conclusions with the rest of the class.

2. (G) The chart below shows a hypothetical line of distinction between formative and summative assessment, and between informal and formal assessment. As a group, place the following techniques/procedures into one of the four cells and justify your decision. Share your results with other groups and discuss any differences of opinion.

Placement tests
Diagnostic tests
Periodic achievement tests
Short pop quizzes
Standardized proficiency tests
Final exams
Portfolios
Journals
Speeches (prepared and rehearsed)
Oral presentations (prepared, but not rehearsed)
Impromptu student responses to teacher's questions
Student-written response (one paragraph) to a reading assignment
Drafting and revising writing
Final essays (after several drafts)
Student oral responses to teacher questions after a videotaped lecture
Whole-class open-ended discussion of a topic

             Formative        Summative

Informal

Formal

3. (I/C) Review the distinction between norm-referenced and criterion-referenced testing. If norm-referenced tests typically yield a distribution of scores that resemble a bell-shaped curve, what kinds of distributions are typical of classroom achievement tests in your experience?

4. (I/C) Restate in your own words the argument between unitary trait proponents and discrete-point testing advocates. Why did Oller back down from the unitary trait hypothesis?

5. (I/C) Why are cloze and dictation considered to be integrative tests?

6. (G) Look at the list of Gardner's seven intelligences. Take one or two intelligences, as assigned to your group, and brainstorm some teaching activities that foster that type of intelligence. Then, brainstorm some assessment tasks that may presuppose the same intelligence in order to perform well. Share your results with other groups.

7. (C) As a whole-class discussion, brainstorm a variety of test tasks that class members have experienced in learning a foreign language. Then decide which of those tasks are performance-based, which are not, and which ones fall in between.

8. (G) Table 1.1 lists traditional and alternative assessment tasks and characteristics. In pairs, quickly review the advantages and disadvantages of each, on both sides of the chart. Share your conclusions with the rest of the class.

9. (C) Ask class members to share any experiences with computer-based testing and evaluate the advantages and disadvantages of those experiences.

FOR YOUR FURTHER READING

McNamara, Tim. (2000). Language testing. Oxford: Oxford University Press.
One of a number of Oxford University Press's brief introductions to various areas of language study, this 140-page primer on testing offers definitions of basic terms in language testing with brief explanations of fundamental concepts. It is a useful little reference book to check your understanding of testing jargon and issues in the field.

Mousavi, Seyyed Abbas. (2002). An encyclopedic dictionary of language testing. Third Edition. Taipei: Tung Hua Book Company.
This publication may be difficult to find in local bookstores, but it is a highly useful compilation of virtually every term in the field of language testing, with definitions, background history, and research references. It provides comprehensive explanations of theories, principles, issues, tools, and tasks. Its exhaustive 88-page bibliography is also downloadable at http://www.abbas-mousavi.com. A shorter version of this 942-page tome may be found in the previous version, Mousavi's (1999) Dictionary of language testing (Tehran: Rahnama Publications).
