LANGUAGE TESTING

Suswati Hendriani
Nina Suzanne, M. Pd
TABLE OF CONTENTS
Preface
Table of Contents
D. Oral Test
Unit 5. Testing Language Skills and Components
A. Testing Grammar
B. Testing Vocabulary
C. Testing Listening
D. Testing Speaking
E. Testing Reading
F. Testing Writing
Unit 6. Assigning Grades and Course Marks
A. Scoring Tests
B. Assigning Grades
C. Conventional Methods of Assigning Course Marks
D. New Methods of Grading
References
UNIT I
TEACHING, ASSESSMENT, AND TESTING
3. Kinds of Assessment
Assessment can be divided into:
a. Informal and Formal Assessment
Informal assessment can take a number of forms,
starting with incidental, unplanned comments and
responses, along with coaching and other impromptu
feedback to the student. Examples: “Nice job!”, “Good
work!”, “Did you say can or can’t?”, “I think you meant
you broke the glass, not you break the glass”, or putting a
smiley face on some homework or assignment.
UNIT II
CHARACTERISTICS OF A GOOD TEST
A. Validity
Of the three qualities mentioned above, validity is the
most important. The term validity refers to an instrument’s
truthfulness, appropriateness, or accuracy. A valid instrument
is truthful because it measures what the person using the
instrument wishes or attempts to measure.
In the Standards for Educational and Psychological
Testing, cited in Athanasou and Lamprianou (2002: 167),
validity is defined as “the degree to which accumulated
evidence and theory support specific interpretations of test
scores entailed by proposed uses of a test.” The degree of
validity is an inference that requires several lines of evidence.
Furthermore, validity is specific, since an instrument
has validity only for the purpose for which it was intended.
1. Types of Validity
There are several types of validity, but three of them
are commonly described: content, construct, and
criterion validity. Each type of validity seeks to answer a
separate question:
Content validity: does the assessment match the content
and learning outcomes of the subject?
Construct validity: does the assessment really involve
the particular behaviors, thought processes or talents
that are said to be assessed?
Criterion validity: does the assessment provide
evidence of future achievement or performance in other
related subjects?
Here are the explanations for each type of
validity:
a. Content Validity
A teacher’s primary concern for a classroom test
is with content validity. Content validity refers to the
degree to which an instrument samples the subject matter
in the area to be measured or the degree to which it
coincides with the instructional objectives which are to
be measured in a given field. In this sense, test-takers are
required to perform the behavior that is being measured.
For example, content validity in assessing a person’s
ability to speak a foreign language in a conversational
setting can be achieved by asking him or her to speak
within some sort of authentic context. Asking the learner
to answer paper-and-pencil multiple-choice questions
requiring grammatical judgments, however, does not
achieve content validity.
Content validity is sometimes confused with what
is called “face” validity. Face validity is the appearance
of the test and its relevance to the person taking the test.
For instance, test-takers are perplexed when the content
does not appear to match the subject area. Face validity
is not the same as content validity, but the two are related.
There are two essential aspects that testers need to
consider in relation to content validity: first, testers need
to ensure that all questions are asked in a way that is
familiar to the learner and consistent with the subject;
second, testers need to judge whether the assessment
adequately samples the topics and learning outcomes of
the subject.
b. Criterion-related validity
Criterion-related validity is the extent to which the
“criterion” of the test has actually been reached. In the
case of teacher-made classroom assessments, criterion-
related evidence is best demonstrated through a
comparison of results of an assessment with results of
some other measure of the same criterion. Criterion
validity is commonly indicated by correlation
coefficients.
There are two categories of criterion-related
evidence:
1). Predictive Validity
Predictive validity is present in an evaluative
instrument or technique if relative success of the
student can be predicted accurately from the score or
rating obtained. It is important in the case of
placement tests, language aptitude tests, and the like.
It is used to assess and predict a test-taker’s likelihood
of future success.
c. Construct validity
Construct validity indicates the qualities a test
measures. Constructs that may adversely influence the
performance of certain pupils include desire to achieve,
response set, reading ability, competitiveness, and poor
test psychology. The pupil who has little desire to
achieve, or who has a poor test psychology, or who is
uncompetitive may perform much below his optimum
level. Furthermore, a pupil who knows the answers to
some questions but misunderstands them, possibly
because of his culture bias or his poor reading ability,
will probably answer incorrectly.
Construct-related validity refers to the theoretical
evidence for what we are measuring. How can you be
sure that a final exam in marketing is really assessing
“marketing” and not scholastic aptitude or examination
ability?
Construct validity studies can involve:
- Internal consistency analysis of the questions in a test to see whether a single aspect is being assessed.
- Analysis of results over time to trace changes in student development of knowledge, skills or attitudes.
- Checks to see whether graduates or workers in an occupation perform better than novices or students.
- Factor analysis of the items.
- The correlation between assessment results and other related assessments of knowledge, skills or attitudes.
B. Reliability
The second important quality of a measuring instrument
is its reliability, which is the next most important characteristic
of assessment results after validity. Testers should make certain
that an assessment not only has a high degree of validity but is
also reliable.
Reliability refers to an instrument’s consistency. A reliable
instrument is one which is consistent enough that subsequent
measurements give approximately the same numerical status
to the thing or person being measured. If a reliable test is
given two or three times to the same group, each person in
the group should get approximately the same score on all
tests.
The Standards for Educational and Psychological
Testing, cited in Athanasou and Lamprianou (2002: 175),
defines reliability as “the degree to which test scores for a
group of test takers are consistent over repeated applications
of a measurement procedure and hence are inferred to be
dependable, and repeatable for an individual test taker.”
Reliability relates to questions about the stability and
consistency of results for a group. The questions that need to
be answered are:
- Would the ranking of students’ results on an assessment be similar when it is repeated?
- Do two versions of an assessment produce the same results?
- Can I increase the reliability of results by lengthening a test?
- Are all the responses to the items or tasks homogeneous and consistent with each other?
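The question about lengthening a test is classically answered with the Spearman-Brown prophecy formula. The unit does not name the formula, so treat this as a supplementary sketch:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict the reliability of a test whose length is multiplied by
    length_factor with comparable items (Spearman-Brown prophecy)."""
    n, r = length_factor, reliability
    return n * r / (1 + (n - 1) * r)

# A test with reliability 0.60, doubled in length with comparable items:
print(round(spearman_brown(0.60, 2.0), 2))  # 0.75
```

So lengthening a test with comparable items does raise the predicted reliability, with diminishing returns as the length factor grows.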
In a sense reliability is a part of validity, for the test
with high validity should measure that quality with
C. Practicality
Practicality refers to a test’s usability; it is the third
desirable quality of tests. A test should be applicable to our
particular situation. For a test to have a high degree of
usability, it should:
- be easy to administer
- be easy to score
- be economical to use, both in terms of teacher time and of materials required
- have a good format
- have meaningful norms.
The usability of a teacher-made test can be ensured by
observing the following rules:
1. Have the test typed and duplicated so that each pupil
will have a copy.
2. Directions to the pupil should accompany each part of
the test.
3. The test should be designed to fit the time limits of the
class period.
4. The test should be set up so that it can be readily
scored.
5. Care should be exercised in planning the test to make it
economical in terms of time required for test
construction, duplication, and scoring.
6. Norms of pupil performance should be established from
test results.
UNIT III
CONSTRUCTING AND ADMINISTERING A
LANGUAGE TEST AND TABLE OF
SPECIFICATION
e. Word study
2. Laboratory practice, including drill on dialogue and
pronunciation points keyed to the textbook.
3. Weekly compositions based on topics related to the
textbook reading.
From the course coverage it is clear that the general
objectives of the course are:
1. To increase skill in listening comprehension
2. To increase skill in oral production
3. To develop skill in reading simple descriptive and
expository prose.
4. To develop skill in writing simple description and
exposition
Our basic objectives, then, are to measure the extent to
which students have acquired or improved their control of
these skills.
Step 2. Dividing the General Course Objectives into Their
Components
The objectives in step 1 were extremely broad. As our
next step, then, we need to break them down into their
specific components, after which we may determine which of
these components we shall measure in our final examination.
Step 3. Establishing the General Design of the Test
At this point, two extremely important factors must be
considered: the time to be provided for testing, and the degree
of speediness we wish to build into our test.
Let us assume that a maximum of two hours has been
scheduled for the final examination. Of the total 120 minutes,
we should reserve at least 10 minutes for administrative
procedures: seating the students and handing out the
materials, giving general directions, collecting the materials
at the end of the testing period, handling unanticipated
problems, etc. We are thus left with 110 minutes for actual
testing.
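The remaining 110 minutes must then be divided among the sections of the test. One simple way to sketch this allocation, using hypothetical weights for the four general course objectives (the weights are assumptions, not taken from the text):

```python
TOTAL_MINUTES = 120
ADMIN_MINUTES = 10
testing_minutes = TOTAL_MINUTES - ADMIN_MINUTES  # 110 minutes, as above

# Hypothetical relative weights for the four general course objectives.
weights = {"listening": 3, "speaking": 3, "reading": 3, "writing": 2}

total_weight = sum(weights.values())
allocation = {skill: round(testing_minutes * w / total_weight)
              for skill, w in weights.items()}
print(allocation)  # {'listening': 30, 'speaking': 30, 'reading': 30, 'writing': 20}
```

In practice the weights should come from Step 2, that is, from the relative importance of the course components being measured.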
C. Table of Specification
According to Fulcher and Davidson (2007:52), test
specifications (specs) are generative explanatory documents
for the creation of test tasks. Specs tell the test makers how
to phrase the test items, how to structure the test layout, how
to locate the passages, and how to make a host of difficult
choices as we prepare test materials.
Key:
No : number
SK : competency standard (Standar Kompetensi)
KD : basic competency (Kompetensi Dasar)
Mat : material (Materi)
Indikator Penc. : achievement indicator (Indikator Pencapaian)
Indikator Soal : item indicator
Bentuk soal : item format
Level of think : level of thinking
No. Soal : item number
Wkt : time allocation (Waktu)
COMPETENCY STANDARD
5. To understand the meaning of very simple short functional written texts
related to the immediate environment.
BASIC COMPETENCY
5.2 To respond to the meaning of simple short functional written texts
accurately, fluently, and acceptably in relation to the surrounding
environment.
MATERIAL
Functional texts:
- Invitation card
- Short messages
- Announcement
- Greeting card
ACHIEVEMENT INDICATORS
- Identifying the meaning of a functional written text in the form of an invitation card.
- Identifying the meaning of a functional written text in the form of a short message.
- Identifying the meaning of a functional written text in the form of an announcement.
- Identifying the meaning of a functional written text in the form of a greeting card.
ITEM INDICATORS
1. Invitation Card
- Determining the general picture of the text
- Determining the topic of the text
- Determining explicit detailed information in the text
- Determining implicit detailed information in the text
- Determining the reference of a word in the text
- Determining the meaning of a word from its context
2. Short Messages
- Determining the general picture of the text
- Determining the topic of the text
- Determining explicit detailed information in the text
- Determining implicit detailed information in the text
- Determining the reference of a word in the text
- Determining the meaning of a word from its context
3. Announcement
- Determining the general picture of the text
- Determining the topic of the text
- Determining explicit detailed information in the text
- Determining implicit detailed information in the text
- Determining the reference of a word in the text
- Determining the meaning of a word from its context
4. Greeting Cards
- Determining the general picture of the text
- Determining the topic of the text
- Determining explicit detailed information in the text
- Determining implicit detailed information in the text
- Determining the reference of a word in the text
- Determining the meaning of a word from its context
ITEM FORMAT
- Multiple choice
- Essay
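A table of specification like the one above can also be kept as structured data, which makes it easy to check item coverage before writing the test. A minimal sketch (the field names are my own, not a standard format):

```python
# One row of the table of specification above, kept as a plain dict.
spec = {
    "competency_standard": "5",
    "basic_competency": "5.2",
    "materials": ["invitation card", "short message",
                  "announcement", "greeting card"],
    "item_indicators": ["general picture", "topic", "explicit detail",
                        "implicit detail", "word reference",
                        "word meaning in context"],
    "item_formats": ["multiple choice", "essay"],
}

# Sanity check: one planned item slot per material/indicator pair.
planned_items = [(m, i) for m in spec["materials"]
                 for i in spec["item_indicators"]]
print(len(planned_items))  # 4 materials x 6 indicators = 24 item slots
```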
UNIT IV
TYPES OF TESTS
Tests can be divided into several categories according
to the kinds of information being sought (Hughes: 9):
1. Proficiency Tests
This kind of test is designed to measure people’s
ability in a language regardless of any training they may
have had in that language. The content of this test is based
on a specification of what candidates have to be able to do
in the language in order to be considered proficient. So, it
is not based on the content or objectives of language
courses which people taking the test may have followed.
In the case of some proficiency tests, “proficient”
means having sufficient command of the language for a
particular purpose. Example: a test designed to discover
whether someone can function successfully as a United
Nations translator, a test used to determine whether a
student’s English is good enough to follow a course of
2. Achievement Tests
Most teachers are unlikely to be responsible for
proficiency tests. It is much more probable that they will
be involved in the preparation and use of achievement
tests. In contrast to proficiency tests, achievement tests are
directly related to language courses, their purpose being to
establish how successful individual students, groups of
students, or the courses themselves have been in achieving
objectives.
There are two kinds of achievement tests: final
achievement tests and progress achievement tests. Final
achievement tests are those administered at the end of a
course of study. They may be written and administered by
ministries of education, official examining boards, or by
members of teaching institutions. Clearly the content of
these tests must be related to the courses with which they
are concerned, but the nature of this relationship is a
matter of disagreement among language testers.
In the view of some testers, the content of a final
achievement test should be based directly on a detailed
course syllabus or on the books and other material used
(syllabus-content approach).
The disadvantage: if the syllabus is badly designed, or the
books and other materials are badly chosen, then the results
of the test can be very misleading.
A. Objective Test
Objective tests can be generally classified as formal or
informal tests. The formal tests are the published,
standardized tests that have been highly refined (1) through
the process of careful analysis of objectives, of course
On the other hand, this test form also has some weaknesses:
1) It is difficult to construct items that call for only one
correct answer.
2) Completion-type items stress rote recall and
encourage pupils to spend their time memorizing
trivial details rather than seeking important
understandings.
2. Alternate-Response Form
The alternate-response or true-false form is probably
used more frequently by teachers than any other test form.
The alternate-response form can be constructed to measure
fairly complex understanding. It consists of a statement to be
judged true or false/correct or incorrect.
The average test difficulty is normally about the 75-
percent level, and a pupil has a 50-50 chance of guessing the
correct response. This means that, with a well-constructed
but difficult true-false test, the average pupil will answer
approximately 75 percent of the items correctly.
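The 75-percent figure follows from the 50-50 guessing chance: a pupil who actually knows half of the items and guesses blindly on the rest is expected to answer about three quarters correctly. A quick check:

```python
def expected_score(known_fraction: float, guess_chance: float = 0.5) -> float:
    """Expected proportion correct on a true-false test when a pupil knows
    known_fraction of the items and guesses blindly on the rest."""
    return known_fraction + (1 - known_fraction) * guess_chance

print(expected_score(0.5))  # 0.75: knowing half the items yields about 75%
```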
True-false questions are used widely in education and
training because these questions can be directed to the
essential structure of a subject’s knowledge. They are helpful
when there is a need to assess knowledge of the basic facts or
ideas in a subject area.
Example:
Most Americans stop working at age 65 or 70 and
retire.
T F
3. Multiple-Choice Form
The multiple-choice test is considered by most test
experts to be the best type of objective test for measuring a
variety of educational objectives. The test is versatile, and it
b. deliveries
c. sales
d. demonstrations
4. Matching Form
The matching examination is most useful for measuring
recognition and recall. It can be constructed in such a manner
that the pupil has virtually no chance of guessing the correct
responses.
Example:
All the names in the premise column (on the left) are
capital cities of the countries on the right.
1. Kuala Lumpur a. Indonesia
2. Singapore b. Malaysia
3. Jakarta c. Myanmar
4. Bangkok d. Philippines
5. Manila e. Singapore
f. Thailand
Strengths:
1) It is easy to construct.
2) It takes very little time to administer and to correct.
3) It measures the pupil’s ability to recognize or recall facts.
Weaknesses:
1) Its inability to measure the higher levels of learning.
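The claim that a well-built matching item leaves virtually no chance of guessing can be checked by counting arrangements. In the example above there are five premises and six options; a pupil guessing blindly, using each option at most once, has one chance in 6 x 5 x 4 x 3 x 2 = 720 of getting everything right:

```python
from math import perm

premises, options = 5, 6
ways = perm(options, premises)  # ordered ways to assign 6 options to 5 premises
print(f"chance of guessing all {premises} correctly: 1/{ways}")  # 1/720
```

Providing more options than premises, as in the example, is what drives this number up.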
B. Performance Test
Performance of any task is a complex of many factors,
but it includes two measurable aspects: (1) the procedure,
skill, or technique, and (2) the product or result. When the
procedure is assessed, the examiner is attempting to
determine how skillfully the subjects perform the desired
procedure, while the assessment of the product stresses the
end result through an examination of the quality of the
product.
A teacher must select a measurement approach
adaptable to his purpose. The three general approaches are:
1. Object or identification tests
It is a test that emphasizes knowledge of the product,
wherein the pupil is asked to identify the nature and
function of various components of the product.
b. Consensus ratings
It is a group-rating method in which a number
of people rate the product or the procedure. For
example, scores from a teacher and peers. The
average of these ratings is the score that is assigned.
c. Rank orderings
It is another means of comparatively
evaluating products. The products are arranged in
order from the best to the poorest.
d. Paired comparisons
In this method each product is compared
against others in all possible pairings to determine
which of each pair is better.
C. Essay Test
Essay tests should be used to measure such objectives
as understandings, attitudes, interests, creativity, and verbal
expression. With reference to the taxonomy, this test form is
useful to evaluate learning in the upper levels of the cognitive
domain, notably: application, analysis, synthesis, and
evaluation.
The essay examination fits well with the teaching style
that encourages divergent thinking and creativity, that
stresses the acquisition and application of large concepts, and
that deals with controversies and problem solving.
2. Extended-response items
The extended-response question is one that is
relatively unstructured, permitting the pupil freedom in
organizing and expressing the answer in a manner that
displays his personal insights and the breadth and scope
of his knowledge.
Example:
a. Discuss the tax structure of the local, state, and
federal branches of government in the United States.
b. Explain the purpose of the author in writing the
article.
Methods of Grading
There are two acceptable methods for grading essay
tests:
1. The point-score method
a. Construct a grading key which includes the
major aspects that the pupil should include in his
response to each question.
b. Read a single question through all the papers.
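The point-score method can be sketched as a simple checklist scorer. The grading key and the essay answer below are invented for illustration; real essay grading rests on human judgment, and keyword matching is only a crude stand-in for it:

```python
# Hypothetical grading key: major aspects the pupil should include,
# each worth one point when all of its keywords appear in the answer.
grading_key = {
    "local taxes": ["local", "property tax"],
    "state taxes": ["state", "sales tax"],
    "federal taxes": ["federal", "income tax"],
}

def point_score(answer: str, key: dict) -> int:
    """Award one point per aspect of the grading key found in the answer."""
    answer = answer.lower()
    return sum(all(kw in answer for kw in kws) for kws in key.values())

essay = ("Local governments rely on the property tax, while state "
         "governments collect a sales tax and the federal government "
         "collects an income tax.")
print(point_score(essay, grading_key))  # 3 of 3 key aspects found
```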
Problems of Grading
There are several special problems in grading essay
examinations, regardless of the method used:
1. The halo effect
The halo effect is apparent when the score a grader
gives a paper is influenced by a well-organized and
well-answered first question. Even though the
following answers are poorer, the paper is still
assigned a high score.
In another case, a good pupil who did well on the
first test but did poorly on the last test is still assigned
a good score because the scorer is influenced by his
previous performance. On the other hand, a poor pupil
is still given a low score even though he did better on
the last test than on the previous one.
2. High grading or generosity error
It is related to halo effect. The grader who is
affected by generosity error consistently assigns grades
that are too high for all papers.
3. Low grading or penalty error
It is as unfair as the generosity error. The grader is too
strict: he consistently assigns grades that are low or too
low, holding too high a standard for all of his classes.
4. The influence of extraneous factors.
D. Oral Test
The oral test is the oldest form of examination used
by teachers, but nowadays it is rarely used to examine
students’ ability. When properly
constructed and used, it can be both a good instructional
technique and a valuable, informal means of appraising
pupil progress.
UNIT V
TESTING LANGUAGE
SKILLS AND COMPONENTS
A. Testing Grammar
There is an essential difference between the traditional
“grammar” test for the native speaker of English and the kind
of structure test appropriate for the foreign learner. A structure
test for the native speaker deals with formal written English.
Structure tests for foreign students, on the other hand, have as
their purpose the testing of control of the basic grammatical
patterns of the spoken language.
The preparation of a structure test should always begin
with setting up an outline that includes the full range of
structures taught in the course, and each structural type should
receive about the same emphasis in the test that it received in
the classroom.
The following are the item types of grammar tests:
1. Completion (multiple-choice)
Example:
a. Sinta ______ in Padang since 2010.
A. lives   B. is living   C. has lived
B. Testing Vocabulary
The selection of vocabulary test words is relatively easy
in achievement tests. The first decision that must be made is
whether to test the students’ active or passive vocabulary,
that is the words they should be using in their speech and
writing or those they will need merely to comprehend,
especially in their reading.
Generally speaking, vocabulary tests on an
intermediate level will concentrate on the words needed in
speaking or in comprehending the oral language, while tests
on an advanced level will deal mostly with the lexicon of
written English: the words needed by students if they are to
understand newspapers, periodicals, literature, and textbooks.
In selecting the test words, a dictionary may be used, but
it is more convenient to use word lists based on frequency
counts of lexical items occurring in actual samples of the
language.
Useful as these and similar word counts are, the test
maker must be alert to their several shortcomings:
1. Word counts are usually based on the written language
only; therefore, many words that are extremely
common in the oral language will receive low
frequency ratings in the word lists.
2. The word lists classify words according to relative
frequency rather than absolute difficulty, and the two
are by no means always equivalent.
B. Sincere
C. Deaf
D. Harsh
3. Paraphrase
A third method of testing vocabulary, combining
elements of two of the previously discussed devices, is
to underline a word in context and provide several
possible meanings.
Example:
George was astounded to hear her answer.
A. Greatly amused
B. Greatly relieved
C. Greatly surprised
D. Greatly angered
b. Supply type
1. Paraphrase
This is a variation of the paraphrase (multiple-choice)
type above and is highly useful in informal classroom
testing. It requires a structured short answer supplied
by the examinee.
Example:
George was astounded to hear her answer.
Direction: rewrite the sentence by substituting other
words for the underlined portion.
The possible answers include:
George was greatly surprised to hear
her answer.
George was amazed to hear her
answer.
George was astonished to hear her
answer.
2. Pictures (objective)
In the testing of children who have not yet reached the
reading stage, vocabulary may be measured with
pictures. Two types of picture items have frequently
been used:
a. The examiner pronounces the name of an object
and asks the child to indicate, by pointing or
making a pencil mark, which one of a set of
pictures shows the object named.
Example:
The test booklet might contain four pictures (of
a book, a bird, a boat, and a box) and the
examiner might ask, “Draw a circle around the
boat.”
b. In the second type, the child is shown a picture of
an object and is asked to name it.
C. Testing Listening
Teachers need to pay close attention to listening as a
mode of performance for assessment in the classroom.
1. Intensive Listening
Intensive listening means listening for perception
of the components (phonemes, words, intonation,
discourse markers, etc.) of a larger stretch of language.
The focus in this section is on the microskills of
intensive listening.
Several kinds of tasks are used to assess intensive
listening:
a. Recognizing Phonological and Morphological
Elements
Dialogue paraphrase
2. Responsive Listening
Responsive listening focuses on the assessment
of a relatively short stretch of language (a greeting,
question, command, comprehension check, etc.) in order
to elicit an equally short response. This typical
listening task is in a question-and-answer format, which
provides interaction. The following is an example:
Appropriate response to a question
3. Selective Listening
In this kind of assessment task, the test-takers
listen to a limited quantity of aural input and must
discern within it some specific information.
Selective listening involves processing stretches of
discourse, such as short monologues lasting several
minutes, in order to “scan” for certain information.
Assessment tasks in selective listening could ask
students, for example, to listen for names, numbers, a
grammatical category, directions (in a map exercise),
or certain facts and events.
The techniques used in selective listening are:
a. Listening Cloze
Listening cloze tasks (also called cloze dictations or
partial dictations) require the test-taker to listen to a story,
b. Information Transfer
In information transfer technique, the
information must be transferred to a visual
representation, such as labeling a diagram,
identifying an element in a picture, completing a
form, or showing routes on a map.
The techniques used for this information
transfer task are:
Test-takers hear:
Choose the correct picture.
Test-takers see:
a. A picture of a table with one thick book and one
thin book on it.
b. A picture of a table with one thick book on it.
c. A picture of a table with one thick book and one
pencil on it.
d. A picture of a table with two thick books and a
pencil on the book on the right.
c. Dictation
Dictation is a widely researched genre of
assessing listening comprehension. In a dictation,
test-takers hear a passage, typically of 50 to 100
words, recited three times: first, at normal speed;
then, with long pauses between phrases or natural
word group, during which time test-takers write
down what they have just heard; and finally, at
normal speed once more so they can check their
work and proofread.
There are some difficulties in dictation. First,
the difficulty of a dictation task can be easily
manipulated by the length of the word groups, the
length of the pauses, the speed at which the text is
read, and the complexity of the discourse, grammar,
and vocabulary used in the passage. Second is the
difficulty in scoring. Scoring of the dictation should
depend on the context and purpose, with scoring criteria
decided for several possible kinds of errors:
- Spelling error only, but the word appears to
have been heard correctly.
- Spelling and/or obvious misrepresentation of a
word, illegible word.
- Grammatical error.
- Skipped words or phrases.
- Permutation of words.
- Additional words not in the original.
- Replacement of a word with an appropriate
synonym
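One way to operationalize these criteria is a table of penalty weights. The weights below are illustrative assumptions only, since the text leaves these decisions to the scorer's context and purpose:

```python
# Illustrative penalty weights per error category; the text leaves the
# actual values to the scorer's context and purpose.
PENALTIES = {
    "spelling_only": 0.0,      # word clearly heard correctly
    "misrepresentation": 1.0,  # wrong or illegible word
    "grammatical": 0.5,
    "skipped": 1.0,
    "permutation": 0.5,
    "addition": 0.5,
    "synonym": 0.25,
}

def dictation_score(total_words: int, errors: dict) -> float:
    """Score a dictation out of total_words, deducting weighted penalties."""
    deduction = sum(PENALTIES[cat] * n for cat, n in errors.items())
    return max(0.0, total_words - deduction)

# A 60-word passage with two spelling slips, one skipped word,
# and two grammatical errors:
print(dictation_score(60, {"spelling_only": 2, "skipped": 1,
                           "grammatical": 2}))  # 58.0
```

Making the weight table explicit is what keeps the scoring consistent from one grader, and one administration, to the next.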
Besides the difficulties above, dictation also
has benefits. First, it is practical to administer.
Second, there is a moderate degree of reliability in a
well-established scoring system. Third, there is a
Test-takers read:
1. What is Lynn’s problem?
a. She feels horrible
b. She ran too fast at the lake
c. She’s been drinking too many hot beverages
4. Extensive Listening
Extensive listening means listening to develop a
top-down, global understanding of spoken language.
Listening for the gist, for the main idea, and making
inferences are all part of extensive listening.
a. Note-taking
Note-taking is an authentic listening task that is
very appropriate for testing students’ listening
ability. The students are asked to listen to a
lecture and take notes on the important points in
the lecturer’s explanation.
b. Editing
Editing is another authentic task. It provides both
a written and a spoken stimulus, and requires the
test-takers to listen for discrepancies.
The following is the way the task proceeds:
Test-takers read:
The written stimulus material such as a news
report, an email from a friend, notes from a
lecture, or an editorial in a newspaper.
Test-takers hear:
A spoken version of the stimulus that deviates,
in a finite number of facts or opinions, from the
original written form.
Test-takers mark:
The written stimulus by circling any words,
phrases, facts, or opinions that show a
discrepancy between the two versions.
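Scoring such an editing task amounts to comparing the two versions word by word. A minimal sketch using Python's difflib (the stimulus texts are invented):

```python
import difflib

written = "The city council approved the new budget on Monday".split()
spoken = "The city council rejected the new budget on Friday".split()

# Find the spans where the spoken version deviates from the written one.
matcher = difflib.SequenceMatcher(a=written, b=spoken)
discrepancies = [(written[i1:i2], spoken[j1:j2])
                 for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                 if tag != "equal"]
print(discrepancies)  # the two replaced words: approved/rejected, Monday/Friday
```

A test-taker's circled words can then be checked against the discrepancies list to produce a score.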
c. Interpretive tasks
An interpretive task extends the stimulus
material to a longer stretch of discourse and
forces the test-taker to infer a response. Potential
stimuli include song lyrics, recited poetry,
radio/television news report, and an oral account
of an experience.
Test-takers are then directed to interpret the
stimulus by answering a few questions (in open-
ended form), such as:
Why was the singer feeling sad?
What events might have led up to the
reciting of this poem?
What do you think the political activists
might do next, and why?
D. Testing Speaking
Speaking is a productive skill that can be directly and
empirically observed. Those observations are invariably
colored by the accuracy and effectiveness of a test-taker’s
Part B:
Test-takers repeat sentences dictated over the phone.
Examples: “Leave town on the next train.”
Part C:
Test-takers answer questions with a single word or a
short phrase of two or three words. Example:
"Would you get water from a bottle or a
newspaper?"
Part D:
Test-takers hear three word groups in random order
and must link them in a correctly ordered sentence.
Example: was reading/my mother/a magazine.
Part E:
Test-takers have 30 seconds to give their opinion
on a topic that is dictated over the phone.
Topics center on family, preferences, and
choices.
Customer: ______________________
Salesperson: It’s on sale today for $39.95
Customer: ______________________
Salesperson: Sure. We take Visa,
MasterCard, and American
Express.
Customer: ______________________
Test-takers see:
(source: www.odopod.com)
Test-takers see:
Test-takers hear:
1. [point to the table] What’s this?
2. [point to the blackboard] What is this?
3. [point to the teacher] Is he a teacher?
Test-takers hear:
1. [point to the painting on the right] When
was this one painted?
2. [point to both] Which painting is older?
3. Which painting would you buy? Why?
- Map-cued elicitation of giving directions
Test-takers see:
Test-takers hear:
You are at Main Street [point to the spot]. People
ask you for directions to get to five different
places. Listen to their questions, then give
directions.
1. Please give me directions to the City Park.
2. Please give me directions to the Community Center.
3. Please tell me how to get to the Bus Station.
Scoring responses on picture-cued intensive
speaking tasks varies, depending on the expected
performance criteria. Tasks that ask for just
one-word or simple-sentence responses can be
evaluated simply as correct or incorrect.
For example:
- Paraphrasing a story
Test-takers hear: Paraphrase the following little
story in your own words.
4. Interactive Speaking
Interactive tasks are what some would describe
as interpersonal speech events.
The tasks are:
a. Interview
A test administrator and a test-taker sit down
in a direct face-to-face exchange and proceed
through a protocol of questions and directives.
The interview is scored on one or more
parameters such as: accuracy in pronunciation
and/or grammar, vocabulary usage, fluency,
sociolinguistic/pragmatic appropriateness, task
accomplishment, and comprehension.
Michael Canale in Brown (2004) suggests four
stages of an interview, as follows:
1). Warm-up
Preliminary small talk for about one or
two minutes. Samples of questions are:
- How are you?
- What’s your name?
- What country are you from?
2). Level check
This stage checks the test-taker’s readiness
to speak, confidence, etc. The questions are:
- Tell me about your family
- What is your academic major?
- What are your hobbies?
- What will you be doing ten years from
now?
3). Probe
Probe questions challenge test-takers to go
to the heights of their ability. The questions
are:
- What is your opinion about that issue?
- If you were president of your country,
what would you like to change about your
country?
4). Wind-down
This final phase is a short period of easy
questions that sets the test-taker’s mind at
ease, for example with information about
when and where the results will be available.
d. Games
There are a variety of games that directly
involve language production.
Example:
Assessment games
1. "Tinkertoy" game: A Tinkertoy (or Lego
block) structure is built behind a screen. One or
two learners are allowed to view the structure.
In successive stages of construction, the
learners tell "runners" (who can’t observe the
structure) how to re-create the structure. The
runners then tell "builders" behind another
screen how to build the structure. The builders
may question or confirm as they proceed, but
only through the two degrees of separation.
Object: re-create the structure as accurately as
possible.
2. Crossword puzzles are created in which the
names of all members of a class are clued by
obscure information about them. Each class
member must ask questions of others to
determine who matches the clues in the puzzle.
3. Information gap grids are created such that
class members must conduct mini-interviews
of other classmates to fill in boxes, e.g., "born
in July," "plays the guitar," "has a two-year-old
brother," etc.
4. City maps are distributed to class members.
Predetermined map directions are given to one
student who, with a city map in front of him or
her, describes the route to a partner, who must
then trace the route and get to the correct final
destination.
5. Extensive Speaking
Extensive speaking tasks involve complex,
relatively lengthy stretches of discourse. They are
frequently variations on monologues, usually with
minimal verbal interaction. This kind of assessment
includes more transactional speech events.
The tasks are:
a. Oral Presentation
In academic and professional arenas, the oral
presentation can be the presentation of a report, a
paper, a marketing plan, a sales idea, a design of a
new product, or a method. The following is an
example of a checklist for a prepared oral
presentation at the intermediate or advanced level
of English.
Content:
The purpose or objective of the presentation
was accomplished.
The introduction was lively and got my
attention.
The main idea or point was clearly stated.
Delivery:
The speaker used gestures and body language
well.
The speaker maintained eye contact with the
audience.
The speaker’s language was natural and
fluent.
The speaker’s volume of speech was
appropriate.
The speaker’s rate of speech was appropriate.
The speaker’s pronunciation was clear and
comprehensible.
The speaker’s grammar was correct and
didn’t prevent understanding.
The speaker used visual aids, handouts, etc.,
effectively.
The speaker showed enthusiasm and interest.
[if appropriate] The speaker responded to
audience questions well.
b. Picture-Cued Story-Telling
At this level, we consider a picture or a series
of pictures as a stimulus for a longer story or
description.
Example:
E. Testing Reading
In foreign language learning, reading is likewise a
skill that teachers simply expect learners to acquire. Two
primary requirements must be met for learners to become
efficient readers: 1. They need to master fundamental
bottom-up strategies for processing separate letters,
words, and phrases, as well as top-down, conceptually
driven strategies for comprehension. 2. As part of the
top-down approach, second language readers must develop
appropriate content and formal schemata (background
information and cultural experience) to carry out those
interpretations effectively.
Brown (2004:186) divides the types (genres) of
reading as follows:
1. Academic reading
This includes:
General interest articles (in magazines,
newspapers, etc.).
Technical reports, professional journal articles
Reference material
Textbooks, theses
Essays, papers
Test directions
Editorials and opinion writing
2. Job-related reading
This includes:
Messages
Letters/emails
Memos
Reports
Schedules, labels, signs, announcements
Forms, applications, questionnaires
Financial documents
Directories
Manuals, directions
3. Personal reading
This includes:
Newspapers and magazines
Letters, emails, greeting cards, invitations
Messages, notes, lists
Schedules
Recipes, menus, maps, calendars
Advertisements
Novels, short stories, jokes, drama, poetry
Financial documents
Forms, questionnaires, medical reports,
immigration documents
Comic strips, cartoons
Like listening and speaking, reading also has micro-
and macroskills, which represent the spectrum of
possibilities for objectives in the assessment of reading
comprehension.
Microskills
Macroskills
1. Perceptive reading
a. Reading aloud
The test-takers see separate letters, words,
and/or short sentences and read them aloud, one
by one, in the presence of an administrator.
b. Written response
In this case, the test-takers are given a task to
reproduce the probe in writing. If an error occurs,
the test-makers should determine its source. For
example, what might be assumed to be a writing
error may actually be a reading error, or vice
versa.
c. Multiple-choice
Here are some possibilities:
Minimal pair distinction
Test-takers read: Circle "S" for same or "D"
for different.
1. led let S D
2. bit bit S D
3. seat sit S D
4. too to S D
5. meet meat S D
Grapheme recognition task
d. Picture-cued items
Test-takers are shown a picture along with a
written text and are given one of a number of
possible tasks to perform.
The tasks are as follows:
Picture-cued word identification
1. Clock
2. Glass
3. Spoon
4. Dog
5. Chair
2. Selective Reading
A combination of bottom-up and top-down
processing may be used.
a. Multiple-choice (for form-focused criteria)
The most straightforward multiple-
choice items may have little context, but might
serve as a vocabulary or grammar check.
Example:
1). The cat is ____________ the room.
a. under
b. between
c. around
d. in
(Two further items of the same format appear here,
with only their answer options preserved.)
b. Matching tasks
The most frequently appearing criterion in
matching procedures is vocabulary. The
format is as follows:
Direction: Write in the letter of the definition
on the right that matches the word on the left.
d. Picture-cued tasks
The methods that are commonly used are:
1). Test-takers read a sentence or passage and
choose one of four pictures that is being
described.
e. Gap-filling tasks
An extension of simple gap-filling tasks is
to create sentence completion items where
test-takers read part of a sentence and then
complete it by writing a phrase.
For example:
3. Interactive reading
Top-down processing is typical of such tasks,
although some instances of bottom-up performance may
be necessary.
a. Cloze tasks
Cloze procedure is very popular for this task. It can
be divided into: cloze procedure with fixed-ratio
deletion (every nth word); cloze procedure with
rational deletion; the C-test procedure; and the
cloze-elide procedure.
1). Cloze procedure, fixed-ratio deletion (every fifth
word).
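Fixed-ratio deletion is mechanical, so a draft cloze passage can be generated automatically. A minimal sketch (the passage is invented, and real cloze tests usually leave the opening sentence intact):

```python
def make_cloze(text, n=5):
    """Blank out every nth word, returning the passage and an answer key."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])     # keep the deleted word for scoring
        words[i] = "____"
    return " ".join(words), answers

passage = ("The sun was shining brightly when the children walked to "
           "school and they sang happy songs along the quiet road")
cloze, key = make_cloze(passage)
print(cloze)
print(key)
```

Rational deletion, by contrast, requires a human to choose which words to remove, so only the blanking step, not the selection, can be automated this way.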
c. Question-Answer Tasks
This is an alternative task that a teacher can give
to his students. For example:
e. Scanning
Assessment of scanning is carried out by
presenting test-takers with a text and requiring rapid
identification of relevant bits of information.
f. Ordering tasks
Sentence-ordering task
F. Testing Writing
There are three genres of writing:
1. Academic writing
o Papers and general subject reports
o Essays, compositions
o Academically focused journals
o Short-answer test responses
o Technical reports
o Theses, dissertations
2. Job-related writing
o Messages
o Letters/emails
o Memos
o Reports
o Schedules, labels, signs
o Advertisements, announcements, manuals
3. Personal writing
o Letters, emails, greeting cards, invitations
o Messages, notes
o Calendar entries, shopping lists, reminders
o Financial documents
o Forms, questionnaires, medical reports,
immigration documents
o Diaries, personal journals
o Fiction
Microskills
1. Produce graphemes and orthographic patterns of
English.
2. Produce writing at an efficient rate of speed to suit
the purpose.
3. Produce an acceptable core of words and use
appropriate word order patterns.
4. Use acceptable grammatical systems (e.g., tense,
agreement, pluralization), patterns, and rules.
5. Express a particular meaning in different
grammatical forms.
6. Use cohesive devices in written discourse.
Macroskills
7. Use the rhetorical forms and conventions of
written discourse.
8. Appropriately accomplish the communicative
functions of written texts according to form and
purpose.
9. Convey links and connections between events,
and communicate such relations as main idea,
supporting ideas, new information, given
information, generalization, and exemplification.
10. Distinguish between literal and implied meanings
when writing.
11. Correctly convey culturally specific references in
the contexts of the written text.
12. Develop and use a battery of writing strategies,
such as accurately assessing the audience’s
interpretation, using prewriting devices, writing
with fluency in the first drafts, using paraphrases
and synonyms, soliciting peer and instructor
feedback, and using feedback for revising and
editing.
1). Copying
The test-takers read: Copy the following
words in the spaces provided.
c. Picture-cued tasks
It can be divided into:
1). Short sentences
2). Picture description
3). Picture sequence description
e. Ordering tasks
Reordering words in a sentence
Test-takers read:
Put the words below into the correct order to
make a sentence
1). Cold/winter/is/weather/the/in/the
2). Doing/what/they/are
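An ordering item like those above can be checked automatically: the pupil’s sentence must use exactly the given words and match an accepted ordering. A sketch using the first item (the list of accepted answers is an illustrative assumption, since such items can have more than one correct order):

```python
def check_ordering(given_words, answer, accepted):
    """Correct iff the answer uses exactly the given words in an accepted order."""
    same_words = sorted(w.lower() for w in answer.split()) == sorted(given_words)
    return same_words and answer.lower() in {a.lower() for a in accepted}

item = ["cold", "winter", "is", "weather", "the", "in", "the"]
accepted = ["The weather is cold in the winter",
            "In the winter the weather is cold"]
print(check_ordering(item, "The weather is cold in the winter", accepted))
print(check_ordering(item, "The winter is cold in the weather", accepted))
```

Checking the word multiset first catches answers that drop or invent words, before the stricter comparison against the accepted orderings.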
UNIT VI
ASSIGNING GRADES AND COURSE MARKS
A. Scoring tests
According to Green (1975: 159), scoring refers to the
process of checking the tests to determine the number of
correct and incorrect responses and assigning numerical
scores. These scores are called raw scores, and they indicate
the number of items that pupils have answered correctly. In
addition, there are also derived scores, such as percentiles
and standard scores, which are statistically calculated from
the raw scores.
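The derived scores mentioned here can be computed directly from a class’s raw scores. A sketch with invented scores (percentile rank is taken here as the percentage of pupils scoring below a given score; exact definitions vary):

```python
import statistics

raw = [12, 15, 15, 18, 20, 22, 25, 25, 28, 30]   # invented raw scores

def z_score(score, scores):
    """Standard score: distance from the class mean in standard-deviation units."""
    return (score - statistics.mean(scores)) / statistics.pstdev(scores)

def percentile_rank(score, scores):
    """Percentage of pupils scoring below the given raw score."""
    return 100 * sum(1 for s in scores if s < score) / len(scores)

print(round(z_score(25, raw), 2), percentile_rank(25, raw))
```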
Different types of tests are scored in different ways, as
explained below:
1. Objective Tests
Of all the test types, objective tests can be scored
most quickly and accurately. If an answer sheet is
used for the test, a scoring key can be made by
punching out the correct responses on a cardboard
sheet, which fits over the answer sheets in such a way
that the pupils’ errors can be marked through the holes
in the cardboard.
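The punched cardboard key has a straightforward software analogue: compare each pupil’s response with the key and count the matches. A sketch with a hypothetical ten-item key:

```python
def score_answer_sheet(answer_key, responses):
    """Raw score: the number of responses that match the scoring key."""
    return sum(1 for k, r in zip(answer_key, responses) if k == r)

answer_key = ["a", "c", "b", "d", "a", "b", "c", "d", "a", "b"]
responses  = ["a", "c", "d", "d", "a", "b", "c", "a", "a", "b"]
print(score_answer_sheet(answer_key, responses))  # 8 of 10 items correct
```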
2. Essay Tests
Essay tests can be scored by using two kinds of
scoring methods: point-score method and sorting
method. Scorers should carefully prepare the keys in
order to improve the reliability. There is no doubt that
careless, cursory reading by the evaluator and lack of
clearly defined grading criteria have been major
factors contributing to the unreliability of the essay
test.
3. Performance tests
Since there are several types of performance
tests, the methods of scoring these tests vary.
Objective performance tests, such as the identification
test, can be scored in the same manner as the
conventional objective test. Usually the work-sample
test can also be set up so that it can be scored
numerically; when a specified number of points is
allotted to each work-sample station, pupils’ scores
can be compiled and handled in the conventional
manner.
When pupil products are evaluated, however,
the scoring procedure is different in that checklists
and rating scales are used to increase the reliability of
the evaluation.
B. Assigning Grades
This section is focused on principles and methods of
grading.
Problems of Grading
There are several problems in grading that need to be
solved:
1. Course marks frequently do not reflect the actual
course achievement of individual pupils.
2. Teachers lack objective, clearly defined criteria for
assigning grades.
3. The halo effect frequently influences teachers to
grade those whom they like higher than their
achievement warrants.
4. Occasionally personality conflicts between a pupil
and his teacher cause the pupil to be unfairly
penalized when he is assigned a grade.
5. There is a tendency for male teachers to assign
higher grades to female pupils than to male pupils
for comparable achievement.
6. Course marks are often given on the basis of
insufficient data concerning the achievement of
pupils.
7. The grade may reflect a cultural bias detrimental to
minority group pupils.
8. The basis for assigning grades may not be clear to
pupils and parents.
C. Conventional Methods of Assigning Course Marks
Point-Score Methods
The point-score technique of assigning course marks is
useful in classes from the upper elementary grades through
the university. There are alternative methods for calculating
marks when using the point-score technique:
1. Cumulative-point score
It may be used with a curve. This method is
useful when some of the factors to be included in a
final mark are not readily assigned either a letter or
a numerical grade.
2. Grade-point average
It is easier to use than the cumulative-point
score method and is a better system to follow in
most courses. With this system pupils are given
letter grades on all their assignments and
examinations, and the letter grades are translated
into the following point scale before being
averaged: A=4 points, B=3 points, C=2 points, D=1
point, and F=0 points.
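The translate-then-average step can be sketched as follows (the pupil’s grades are invented):

```python
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def grade_point_average(letter_grades):
    """Translate letter grades to points and average them."""
    return sum(POINTS[g] for g in letter_grades) / len(letter_grades)

grades = ["A", "B", "B", "C", "A"]       # a pupil's grades over a course
print(grade_point_average(grades))       # (4+3+3+2+4)/5 = 3.2
```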
D. New Methods of Grading
1. Criterion-Referenced Grading
The criterion or criteria of success are carefully
spelled out in terms of specified minimum levels of
acceptable competence, levels that can generally be
demonstrated in specific behavior or performance. The
emphasis is on successful performance, not failure, and
diagnostic measurement is used by the teacher at the
beginning to analyze pupil status and establish a base line
of competence.
During the instruction, formative measurement helps
in pacing the individual pupil’s progress, in determining
mastery of specified competencies or concepts, and in
recycling pupils through identified problem areas. At the
end of the instructional period, summative evaluation is
used to demonstrate the exit-level competence of each
pupil. At this point there are several options open to the
teacher (Green, 1975):
a. He may certify on the pupil’s record only the
specific competencies that have been mastered,
assigning no grade for unmastered competencies
requiring future instruction.
b. He may use a pass-fail grade, passing all who
demonstrate the minimum acceptable level of
competence.
c. He may give letter grades, assigning A grades to
all who achieve the specified minimum
competence.
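Options b and c can be expressed as small grading functions over a pupil’s record of mastered competencies. The competency names and records below are illustrative assumptions, not from Green:

```python
def pass_fail(mastered, required):
    """Option b: pass only pupils who demonstrate every required competency."""
    return "pass" if required <= mastered else "fail"

def letter_grade(mastered, required):
    """Option c: an A for full mastery; no grade yet for unmastered work."""
    return "A" if required <= mastered else "incomplete"

required = {"present simple", "question forms", "core vocabulary"}
pupil = {"present simple", "question forms", "core vocabulary", "past simple"}
print(pass_fail(pupil, required), letter_grade(pupil, required))
```

Option a, certifying only the mastered competencies, would simply report the intersection of the pupil’s record with the required set.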
2. Continuous-Assessment Methods
Particularly at the lower elementary level where basic
skills are stressed, continuous monitoring or assessment
of achievement is carried out. In this instance the record
of each subskill, for example, word attack skills in
phonics, is evaluated with frequent (daily) tests, and the
record is kept graphically with a continuous charting of
errors and successes.
This procedure provides both teachers and students
with constant feedback concerning progress, a type of
feedback which may be more meaningful than a letter
grade.
REFERENCES