
Testing and Assessment

Uploaded by

Nguyen Que

SESSIONS 17+18:

DESIGNING READING TESTS


Objectives:
By the end, participants will be able to:
- list some issues that a test designer should consider before composing a test
- describe the procedure for designing a reading test
- design a reading test for upper secondary students based on its specifications
---------------------------------------------------------
I. DESIGNING READING TESTS – IMPORTANT CONSIDERATIONS
Activity 1: Read the text and answer the following questions:
a. Fill in the blank with one word in the reading text.
1. Matching the item format to the purpose of the assessment is essential, as __ items are not
well-suited for testing productive skills like writing or speaking.
2. Teachers must strive to write items with only __ correct answer to avoid ambiguity
and differing perspectives.
3. Items should be written at the __ level of proficiency for the students taking the
assessment.
4. Ambiguous terms, __, and double negatives should be avoided unless the intent is to
test those concepts.
5. Providing __ information on items that is irrelevant to the skill being tested can
confuse students and offer unintended clues.
b. Choose the correct answer from the choices for each question
1. Which of the following is the most important consideration when designing
assessment items?
A. Matching the item format to the purpose of the assessment
B. Ensuring only one correct answer
C. Writing items at the appropriate level of difficulty
D. All of the above
2. What is the primary reason for avoiding ambiguous language in assessment items?
A. It may cause students to answer incorrectly even if they know the correct answer.
B. It makes the items more challenging for students.
C. It can provide unintended clues to other items.
D. All of the above
3. Which of these strategies can help teachers avoid bias in their assessment items?
A. Having colleagues from diverse backgrounds review the items
B. Using statistical techniques to analyze item bias
C. Avoiding references to race, gender, religion, or nationality
D. All of the above
4. What is the primary purpose of having native speakers of the language take the
assessment before administration?
A. To ensure the items are appropriately challenging
B. To identify any factors other than language that have been introduced
C. To provide feedback on the clarity and wording of the items
D. All of the above
5. Which of the following is the most important reason for avoiding providing extra
information in assessment items?
A. It takes extra time for students to read.
B. It can inadvertently provide clues to other items.
C. It makes the items more difficult to understand.
D. All of the above
1. Teachers will of course want their item formats to match the purpose and content of
the item. In part, this means matching the right type of item to what is being tested in
terms of channels and modes. For instance, teachers may want to avoid using a
multiple-choice format, which is basically receptive (students read and select, but they
produce nothing), for testing productive skills like writing and speaking. Similarly, it
would make little sense to require the students to read aloud (productive) the letters of
the words in a book in order to test the receptive skill of reading comprehension. Such a
task would be senseless, in part because the students would be using both receptive
and productive modes mixed with both oral and written channels when the purpose of
the test, reading comprehension, is essentially receptive mode and written channel. A
second problem would arise because the students would be too narrowly focused in
terms of content on reading the letters of the words. To avoid mixing modes and
channels and to focus the content at the comprehension level of the reading, teachers
might more profitably have the students read a written passage and use receptive-
response items in the form of multiple-choice comprehension questions. In short,
teachers must think about what they are trying to test in terms of all the dimensions
discussed in the previous chapter and try to match their purpose with the item format
that most closely resembles it.
2. The issue of making sure that each question has only one correct answer is not as
obvious as it might at first seem. Correctness is often a matter of degree rather than an
absolute. An option that is correct to one person may be less so to another, and an
option that seems incorrect to the teacher may appear to be correct to some of the
students. Such differences may occur due to differing points of view or to the differing
contexts that people can mentally supply in answering a given question. Every teacher
has probably disagreed with the “correct” answer on some test that he or she has taken
or given. Such problems arose because the item writer was unable to take into account
every possible point of view. One way that test writers attempt to circumvent this
problem is by having the examinees select the best answer. Such wording does
ultimately leave the judgment as to which is the “best” answer in the hands of the test
writer, but how ethical is such a stance? I feel that the best course of action is to try to
write items for which there is clearly only one correct answer. The statistics discussed
in distractor efficiency analysis help teachers to spot cases where the results indicate
that two answers are possible, or that a second answer is very close to correct.
3. Each item should be written at approximately the level of proficiency of the students
who will take the test. Since a given language program may include students with a
wide range of abilities, teachers should think in terms of using items that are at about
the average ability level for the group. To begin with, teachers may have to gauge this
average level by intuition, but later, using the item statistics provided in this chapter, they will
be able to identify more rationally those items that are at the appropriate average level
for their students.
4. Ambiguous terms and tricky language should be avoided unless the purpose of the
item is to test ambiguity. The problem is that ambiguous language may cause students
to answer incorrectly even though they know the correct answer. Such an outcome is
always undesirable.
5. Likewise, the use of negatives and double negatives may be needlessly confusing
and should be avoided unless the purpose of the item is to test negatives. If negatives
must be tested, wise test writers emphasize the negative elements (by underlining them,
typing them in CAPITAL letters, or putting them in boldface type) so the students are
sure to notice what is being tested. Students should not miss an item because they did
not notice a negative marker, if indeed they know the answer.
6. The teacher should also avoid giving clues in one item that will help students to
answer another item. For instance, a clear example of a grammatical structure may
appear in one item that will help some students to answer a question about that
structure later in the test. Students should answer the latter question correctly only if
they know the concept or skill involved, not because they were clever enough to
remember and look back to an example or model of it in a previous item.
7. All the parts of each item should be on one page. Students who know the concept or
skill being tested should not respond incorrectly simply because they did not realize
that the correct answer was on the next page. This issue is easily checked but
sometimes forgotten.
8. Teachers should also avoid including extra information that is irrelevant to the
concept or skill being tested. Since most teachers will probably want their tests to be
relatively efficient, any extra information not related to the material being tested
should be avoided because it will just take extra time for the students to read and will
add nothing to the test. Such extra information may also inadvertently provide the
students with clues that they can use in answering other items.
9. All teachers should also be on the alert for bias that may have crept into their test
items. Race, gender, religion, nationality, and other biases must be avoided at all costs,
not only because they are morally wrong and illegal in many countries but also
because they affect the fairness and objectivity of the test. The problem is that a biased
item is testing something in addition to what it was originally designed to test. Hence,
such an item cannot provide clear and easily interpretable information. The only
practical way to avoid bias in most situations is to examine the items carefully and have
colleagues examine them as well. Preferably, these
colleagues will be both male and female and will be drawn from different racial,
religious, national, and ethnic groupings. Since the potential for bias differs from
situation to situation, individual teachers will have to determine what is appropriate for
avoiding bias in the items administered to their particular populations of students.
Statistical techniques can also help teachers spot and avoid bias in items; however,
these statistics are still controversial and well beyond the scope of this book.
10. Regardless of any problems that teachers may find and correct in their items, they
should always have at least one or more colleagues (who are native speakers of the
language being tested) look over and perhaps take the test so that any additional
problems may be spotted before the test is actually used to make decisions about
students’ lives. As Lado (1961, p.323) put it, “if the test is administered to native
speakers of the language, they should make very high marks on it or we will suspect
that factors other than the basic ones of language have been introduced into the items.”
II. DESIGNING RECEPTIVE ITEMS IN A READING TEST
Activity 2. Read the text below and decide whether the following statements are
true or false.
1. True-false items should be designed to be tricky so that they challenge
intermediate or advanced language students.
2. True-false items that include absoluteness clues are difficult to answer
correctly.
3. Multiple-choice items should contain unintentional clues to help students
answer them.
4. All distractors in a multiple-choice item should be plausible.
5. Teachers should include needless redundancy in multiple-choice items to make
them more effective.
6. Patterns should be intentionally introduced in tests to help students guess the
correct answers.
7. Teachers should avoid using options like "all of the above" and "none of the
above" in multiple-choice items.
8. Matching items should have an equal number of options and premises.
9. The options in matching items should be shorter than the premises.
10. Mixing different themes in one set of matching items is a good strategy for
testing students effectively.
True-false items are typically written as statements, and students must decide whether
the statements are true or false. There are two potential problems shown in Table 3.2
that teachers should consider in developing items in this format.
1. The statement should be carefully worded to avoid any ambiguities that might cause
the students to miss it for the wrong reasons. The wording of true-false items is
particularly difficult and important. Teachers are often tempted to make such items
“tricky” so that the items will be difficult enough for intermediate or advanced
language students. Such trickiness should be avoided: Students should miss an item
because they do not know the concept or have the skill being tested rather than
because the item is tricky.
2. Teachers should also avoid absoluteness clues. Absoluteness clues allow students to
answer correctly without knowing the correct response. Absoluteness clues include
terms like all, always, absolutely, never, rarely, most often, and so forth. True-false
items that include such terms are very easy to answer regardless of the concept or skill
being tested because the answer is inevitably false.
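The check for absoluteness clues described above is mechanical enough to sketch in code. The following is only an illustrative sketch: the term list is the one given in the text, while the function name and the sample statements are invented for the example.

```python
import re

# Terms the text lists as absoluteness clues; extend the list as needed.
ABSOLUTENESS_TERMS = ["all", "always", "absolutely", "never", "rarely", "most often"]


def find_absoluteness_clues(statement):
    """Return the absoluteness terms that occur as whole words in a statement."""
    lowered = statement.lower()
    found = []
    for term in ABSOLUTENESS_TERMS:
        # \b stops "all" from matching inside words such as "small".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append(term)
    return found


# Hypothetical true-false statements written for a reading passage.
statements = [
    "The father always sent his sons to the tree in winter.",
    "The second son saw the tree in spring.",
]
for statement in statements:
    print(find_absoluteness_clues(statement), "-", statement)
```

A flagged statement is not automatically a bad item; the list only points the writer to wording that deserves a second look.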
Multiple-choice items are made up of an item stem, or the main part of the item at the
top, a correct answer, which is obviously the choice (usually a., b., c., or d.) that will be
counted correct, and the distractors, which are those choices that will be counted as incorrect.
These incorrect choices are called distractors because they should distract, or divert,
the attention away from the correct answer if the students really do not know which is
correct. The term options refers collectively to all the alternative choices presented to the
students and includes the correct answer and the distractors. These terms are necessary
for understanding how multiple-choice items function.
1. Teachers should avoid unintentional clues (grammatical, phonological,
morphological, and so forth) that help students to answer an item without having the
knowledge or skill being tested. To avoid such clues, teachers should write multiple-
choice items so that they clearly test only one concept or skill at a time. Consider the
following item:
The fruit that Adam ate in the Bible was an ___
A. pear
B. banana
C. apple
D. papaya
The purpose of this item is neither clear nor straightforward. If the purpose of the item
is to test cultural or biblical knowledge, an unintentional grammatical clue (in that the
article “an” must be followed by a word that begins with a vowel) is interfering with
that purpose. Hence, a student who knows the article system in English can answer the
item correctly without ever having heard of Adam. If, on the other hand, the purpose
of the item is to test knowledge of this grammatical point, why confuse the issue with
the biblical reference? In short, teachers should avoid items that are not
straightforward and clear in intent. Otherwise, unintentional clues may creep into their
items.
2. Teachers should also make sure that all the distractors are plausible. If one distractor
is ridiculous, that distractor is not helping to test the students. Instead, those students
who are guessing will be able to dismiss that distractor and improve their chances of
answering the item correctly without really knowing the correct answer. Why would
any teacher write an item that has ridiculous distractors? Brown’s law may help to
explain this phenomenon: When writing four-option, multiple-choice items, the stem
and correct option are easy to write, and the next two distractors are relatively easy to
make up as well, but the last distractor is absolutely impossible. The only way to
understand Brown’s law is to try writing a few four-option, multiple-choice items. The
point is that teachers are often tempted to put something ridiculous for that last
distractor because they are having trouble thinking of an effective distractor. So
always check to see that all the distractors in a multiple-choice item are truly
distracting.
3. In order to make a test reasonably efficient, teachers should double-check that items
contain no needless redundancy. For example, consider the following item designed
to test the past tense of the verb to fall:
The boy was on his way to the store, walking down the street, when he stepped on a
piece of cold wet ice and ___.
A. fell flat on his face.
B. fall flat on his face.
C. felled flat on his face.
D. falled flat on his face.
In addition to the problem of providing needless words and phrases throughout the
stem, the phrase “flat on his face” is repeated four times in the options; it could just as
easily have been written once in the stem. Thus, the item could have been far shorter to read
and less redundant, yet equally effective, if it had been written as follows:
The boy stepped on a piece of ice and ____ flat on his face.
A. fell
B. fall
C. felled
D. falled
4. Any test writer may unconsciously introduce a pattern into the test that will help the
students who are guessing to increase the probability of answering an item correctly. A
teacher might decide that the correct answer for the first item should be “c.” For the
second item, that teacher might decide on “d” and for the third item “a.” Having
already picked “c,” “d,” and “a” to be correct answers in the first three items, the
teacher will very likely pick “b” as the correct answer in the next item. Human beings
seem to have a need to balance things out like this, and such patterns can be used by
clever test takers to help them guess at better than chance levels without actually
knowing the answers. Since testers want to maximize the likelihood that students
answer items correctly because they know the concepts being tested, they generally
avoid patterns that can help students to guess.
A number of strategies can be used to avoid creating patterns. If the options are always
ordered from the shortest to longest or alphabetically, the choice of which option is
correct is out of the test writer’s hands, so the human tendency to create patterns will
be avoided. Another strategy that can be used is to select randomly which option will
be correct. Selection can be done with a table of random numbers or with the aces,
twos, threes, and fours taken from a deck of cards. In all cases, the purpose is to
eliminate patterns that may help students to guess the correct answers if they do not
know the answers.
5. Teachers can also be tempted (often due to Brown’s law, mentioned above) to use
options like “all of the above,” “none of the above,” and “a. and b. only.” I normally
advise against this type of option unless the specific purpose of the item is to test two
things at a time and the students’ ability to interpret such combinations. For the reasons discussed
in numbers 1 and 2 above, such items are usually inadvisable.
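The strategies for avoiding answer patterns in guideline 4 (ordering the options by a fixed rule, or choosing the position of the correct answer at random) can be sketched as follows. This is a minimal sketch rather than a prescribed procedure: the function names are invented for the example, the options come from the "fell" item above, and Python's random module stands in for the table of random numbers or deck of cards mentioned in the text.

```python
import random


def order_options(options, by="alphabet"):
    """Order options by a fixed rule so the writer does not choose the key position."""
    if by == "length":
        # Shortest to longest; ties are broken alphabetically.
        return sorted(options, key=lambda o: (len(o), o.lower()))
    return sorted(options, key=str.lower)


def shuffle_options(options, rng):
    """Return a copy of the options in random order, using the supplied generator."""
    shuffled = list(options)
    rng.shuffle(shuffled)
    return shuffled


def key_letter(ordered, correct):
    """Return the letter (A, B, C, ...) at which the correct answer ends up."""
    return "ABCDEFGH"[ordered.index(correct)]


options = ["fell", "fall", "felled", "falled"]

alphabetical = order_options(options)
print(alphabetical, key_letter(alphabetical, "fell"))

by_length = order_options(options, by="length")
print(by_length, key_letter(by_length, "fell"))

rng = random.Random(2024)  # A fixed seed only makes the example repeatable.
randomized = shuffle_options(options, rng)
print(randomized, key_letter(randomized, "fell"))
```

With either approach the position of the key is decided by the rule or by chance, not by the writer's urge to balance the letters.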
Matching items present the students with two columns of information; the students
must then find and identify matches between the two sets of information. For the sake
of discussion, the information given in the left-hand column will be called the premises
and that shown in the right-hand column will be labeled options. Thus, in a matching
test, students must match the correct option to each premise. There are three guidelines
that teachers should apply to matching items.
1. More options should be supplied than premises so that students cannot narrow down
the choices as they go along by simply keeping track of the options that they have
already used. For example, in matching ten definitions (premises) to a list of ten
vocabulary words (options), a student who knows nine will be assured of getting the
tenth one correct by the process of elimination. If, on the other hand, there are ten
premises and fifteen options, this problem is minimized.
2. The options should usually be shorter than the premises because most students will
read a premise and then search through the options for the correct match. By
controlling the length of the options, the amount of reading will be minimized.
Teachers often do exactly the opposite in creating vocabulary matching items by using
the vocabulary words as the premises and using the definitions (which are much
longer) as the options.
3. The premises and options should be logically related to one central theme that is
obvious to the students. Mixing different themes in one set of matching items is not a
good idea because it may confuse the students and cause them to miss items that they
would otherwise answer correctly. For example, lining up definitions and the related
vocabulary items is a good idea, but also mixing in matches between graphemic and
phonemic representations of words would only cause confusion. The two different
themes could be much more clearly and effectively tested as separate sets of matching
items.
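The first two guidelines for matching items can be checked mechanically before a set is used. The sketch below is only one possible way to do so: the function name, the warning texts, and the sample vocabulary set are invented for the example, and comparing average length in words is an assumption about how "shorter" might be measured. The third guideline (one central theme) still needs human judgment.

```python
def check_matching_set(premises, options):
    """Return warnings for a matching set, based on guidelines 1 and 2 above."""
    if not premises or not options:
        raise ValueError("Both premises and options must be non-empty.")
    warnings = []
    # Guideline 1: more options than premises limits answering by elimination.
    if len(options) <= len(premises):
        warnings.append("Supply more options than premises.")
    # Guideline 2: options should be shorter than premises (compared here in words).
    avg_premise = sum(len(p.split()) for p in premises) / len(premises)
    avg_option = sum(len(o.split()) for o in options) / len(options)
    if avg_option >= avg_premise:
        warnings.append("Options are not shorter than premises; consider swapping the columns.")
    return warnings


# Hypothetical vocabulary set: definitions as premises, words as options.
definitions = [
    "heavily loaded with something",
    "the most basic quality of a thing",
    "a long search for something",
]
words = ["laden", "essence", "quest", "season", "blossom"]

print(check_matching_set(definitions, words))      # definitions as premises
print(check_matching_set(words[:3], definitions))  # the reverse arrangement
```

The second call reproduces the arrangement the text warns against, with the vocabulary words as premises and the longer definitions as options.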
III. SPECIFICATIONS FOR READING TEST
Activity 3. Based on the specifications for a mid-term reading section test for
students in grade 11, identify the strengths and weaknesses of the following
reading section test.
1. Test matrix
2. Test specification table

Abbreviations: NB = recognition (Nhận biết); TH = comprehension (Thông hiểu); VD = application (Vận dụng); TN = multiple-choice questions (trắc nghiệm); TL = constructed-response questions (tự luận).

No. | Skill | Knowledge/skill unit | Level and content to be tested | Questions (TN) | Total questions
1 | Reading | 1. Cloze text: read a passage of about 100 words and choose one of 4 options to fill each blank | Recognition (NB): word forms; verbs; demonstratives; articles; prepositions | 3 | 3
 | | | Comprehension (TH): meaning of a word in context | 1 | 1
 | | | Application (VD): analyze the links between sentences in the text to choose a suitable conjunction | 1 | 1
 | | 2. Reading comprehension: read a text of about 240 words and answer by choosing the correct answer from 4 options | Recognition (NB): synonyms; one detail in the text | 1; 2 | 3
 | | | Comprehension (TH): pronoun reference; eliminating the correct details to choose the one incorrect detail in the text | 1 | 1
 | | | Application (VD): understanding the author's purpose; choosing the title or main idea of the text | 1 | 1
 | | | Higher-order application (Vận dụng cao): expressing a personal opinion about the content of the reading text (integrated into the writing skills test) | |
Reading section test

PART B – READING
Reading 1
Read the following passage and mark the letter A, B, C, or D on your answer
sheet to indicate the correct word or phrase that best fits each of the numbered
blanks from 1 to 5
Imagine you are buying an apple in a supermarket. Which do you choose, one
with a small brown mark or one without? Be honest - you'd go for the apple (1)
______ looks perfect. Supermarkets do this too, but on a much larger scale when
buying fruit and vegetables from farmers. And what (2) ______ to the red with
marks on them? They are thrown away. So are the ones that are a funny shape or
size.
(3) ______ reason for waste is that people buy more food than they can eat and
supermarkets do everything they can to encourage this, for example with offers
like "Buy one, get one free" . Developed countries waste about 650 million tons
of food each year and so do developing countries. (4) ______ the waste happens
for very different reasons. As the world's population grows, this problem will
only (5) ______ so we need to take action urgently.
(NB) Question 1. A. when B. who C. which D. whom
(NB) Question 2. A. happen B. was happened C. happened D. happens
(TH) Question 3. A. Another B. Few C. Many D. Other
(VD) Question 4. A. Although B. However C. As a result D. For example
(NB) Question 5. A. worsen B. worsening C. worsens D. to worsen
Key: 1 – C, 2 – D, 3 – A, 4 – B, 5 – A
Reading 2
Read the following passage and mark the letter A, B, C, or D on your answer
sheet to indicate the correct answer to each of the questions from 1 to 5
There was a man who had four sons. He wanted his sons to learn not to judge
things too quickly. So he sent them each on a quest, in turn, to go and look at a
pear tree that was a great distance away. The first son went in the winter, the
second in the spring, the third in summer, and the youngest son in the
fall. When they had all gone and come back, he called them together to
describe what they had seen.
The first son said that the tree was ugly, bent, and twisted. The second son said
no – it was covered with green buds and full of promise. The third son
disagreed, he said it was laden with blossoms that smelled so sweet and looked
so beautiful, it was the most graceful thing he had ever seen. The last son
disagreed with all of them; he said it was ripe and drooping with fruit, full of life
and fulfilment.
The man then explained to his sons that they were all right, because they had
each seen but one season in the tree’s life. He told them that you cannot judge a
tree, or a person, by only one season, and that the essence of who they are – and
the pleasure, joy, and love that come from that life – can only be measured at the
end, when all the seasons are up. If you give up when it’s winter, you will miss
the promise of your spring, the beauty of your summer, the fulfillment of your
fall.
Don’t judge a life by one difficult season. Don’t let the pain of one season
destroy the joy of all the rest.
(VD) Question 1. Which best serves as the title for the passage?
A. The Seasons of Life B. The Observation of a Tree
C. Father and Four Sons D. Love all the Seasons in a Year
(NB) Question 2. According to paragraph 2, what did the second son see in his
turn?
A. The tree was gloomy, withered and crooked.
B. The tree was in buds and teeming with vigor.
C. The tree was blossoming and gave off a sweet scent.
D. The tree was bountifully fruitful, brimming with life force.
(NB) Question 3. The word “laden” in paragraph 2 is closest in meaning to
__________.
A. loaded B. decorated C. enhanced D. given
(TH) Question 4. The word “they” in paragraph 3 refers to __________.
A. the four sons B. green buds
C. trees, people D. the pleasure, joy, and love
(TH) Question 5. According to paragraph 4, what is the lesson the father
wanted to impart to his children?
A. Moral lessons can come from the most unexpected and ordinary things.
B. No matter what season it is outside, you always have to cherish it.
C. The old age of humans is similar to the winter of nature.
D. Persevere through the difficulties and better times are sure to come
sometime sooner or later.
Key: 1 – A, 2 – B, 3 – A, 4 – C, 5 – D
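One mechanical part of comparing a test with its specifications is counting the level tags printed before each question (NB for recognition, TH for comprehension, VD for application) and setting the counts against the specified numbers. The sketch below does this for Reading 1, using the cloze text figures from the specification (3 recognition, 1 comprehension, 1 application). The function and variable names are invented for the example.

```python
from collections import Counter

# Question counts specified for the cloze text (Reading 1).
CLOZE_SPEC = {"NB": 3, "TH": 1, "VD": 1}

# Level tags printed before Questions 1-5 of Reading 1.
READING_1_TAGS = ["NB", "NB", "TH", "VD", "NB"]


def compare_with_spec(tags, spec):
    """Return {level: (specified, actual)} for every level whose counts differ."""
    actual = Counter(tags)
    levels = set(spec) | set(actual)
    return {
        level: (spec.get(level, 0), actual.get(level, 0))
        for level in sorted(levels)
        if spec.get(level, 0) != actual.get(level, 0)
    }


# An empty result means the tag counts agree with the specification.
print(compare_with_spec(READING_1_TAGS, CLOZE_SPEC))
```

Matching counts show only that the labels agree with the specification. Whether each question really tests the content and level it is tagged with still has to be judged item by item, and the same count can be run on Reading 2 as part of the activity.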
Activity 4. Design a reading section test based on the specifications provided in
Activity 3.
SESSIONS 19+20:
DESIGNING LISTENING TESTS
Objectives:
By the end, participants will be able to:
- list some issues that a test designer should consider before composing a listening test
- design a listening test for upper secondary students based on its specifications
---------------------------------------------------------
I. USING RECORDINGS IN A LISTENING TEST
Activity 1. Decide whether you agree or disagree with the following statements about
recordings used in assessing listening skills. For each statement, choose one
response: I agree / I don’t have an opinion / I disagree.
1. The recording must include key vocabulary and grammar points I have taught.
2. Only native speakers of English should be on the recording.
3. The people should speak slowly and clearly so that students can understand.
4. The language used has to be completely accurate, with no grammar mistakes.
5. The recording must be authentic: if it is a news broadcast, it ought to come from the
BBC or CNN, not from an EFL textbook.
6. There shouldn’t be any words or expressions that the students are not expected to know.
7. The topic should be something that the students have studied.
8. The recording should be a kind of listening text that the learners have experienced in
class before.
9. It’s a good idea to use written stories from newspapers or magazines and read them
aloud.
II. DESIGNING LISTENING TESTS – IMPORTANT CONSIDERATIONS
Activity 2. Read the text below and answer the questions below.
a. Fill in the blank with the correct words.
1. In testing extended listening, it is essential to keep items __ apart in the passage.
2. Candidates should be warned by __ that appear both in the item and in the passage.
3. Candidates should be given __ time at the outset to familiarize themselves with the
items.
4. Multiple choice can work well for testing __ skills, such as phoneme discrimination.
5. Partial dictation can be used diagnostically to test students' ability to cope with
particular __.
b. Choose the correct answer from the choices for each question.
1. Which of the following is not a recommended technique for testing extended
listening?
A. Multiple choice
B. Short answer
C. Partial dictation
D. Dictation
2. Which of the following is a reason why multiple-choice items may be problematic
for testing extended listening?
A. Candidates have to hold in their heads four or more alternatives while listening.
B. Alternatives must be kept short and simple.
C. Multiple choice can only test lower-level skills.
D. All of the above.
3. What is the primary purpose of the note-taking procedure mentioned in the passage?
A. To identify the key information that candidates should be able to get from the
passage.
B. To write items that check whether candidates have understood the main points.
C. Both A and B.
D. None of the above.
4. According to the passage, what is the potential issue with having two items close to
each other in an extended listening test?
A. Candidates may miss the second item due to cognitive demands.
B. Candidates may listen for 'answers' that have already passed.
C. Both A and B.
D. None of the above.
5. What is the best way to provide candidates with the items before the listening
passage?
A. Give candidates enough time to review the items.
B. Do not give candidates time to review the items.
C. Only give candidates time to review the items in special cases.
D. Give candidates time to review the items, but only in their native language.
Writing items
For extended listening, such as a lecture, a useful first step is to listen to the passage
and note down what it is that candidates should be able to get from the passage. We
can then attempt to write items that check whether or not they have what they should
be able to get. This note-making procedure will not normally be necessary for shorter
passages, which will have been chosen (or constructed) to test particular abilities.
In testing extended listening, it is essential to keep items sufficiently far apart in the
passage. If two items are close to each other, candidates may miss the second of them
through no fault of their own, and the effect of this on subsequent items can be
disastrous, with candidates listening for ‘answers’ that have already passed. Since a
single faulty item can have such an effect, it is particularly important to trial extended
listening tests, even if only on colleagues aware of the potential problems.
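Before trialing, a rough check on item spacing can be made from the points in the recording at which each answer is heard. The sketch below is an illustration only: the text gives no figure for how far apart items should be, so the 15-second minimum gap, the function name, and the sample timings are all assumptions made for the example.

```python
def items_too_close(answer_times, min_gap_seconds=15.0):
    """Return pairs of neighbouring items whose answers are heard too close together.

    answer_times maps an item number to the second in the recording at which
    its answer is heard. min_gap_seconds is an assumed threshold, not a figure
    taken from the text.
    """
    ordered = sorted(answer_times.items(), key=lambda pair: pair[1])
    close = []
    for (item_a, time_a), (item_b, time_b) in zip(ordered, ordered[1:]):
        if time_b - time_a < min_gap_seconds:
            close.append((item_a, item_b))
    return close


# Hypothetical timings for a four-item lecture task.
times = {1: 20.0, 2: 26.5, 3: 70.0, 4: 110.0}
print(items_too_close(times))
```

A check like this does not replace trialing; it only points to the items that colleagues should watch most closely when they take the test.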
Candidates should be warned by keywords that appear both in the item and in the
passage that the information called for is about to be heard. For example, an item may
ask about ‘the second point that the speaker makes’ and candidates will hear ‘My
second point is . . .’. The wording does not have to be identical, but candidates should
be given fair warning in the passage. It would be wrong, for instance, to ask about
‘what the speaker regards as her most important point’ when the speaker makes the
point and only afterward refers to it as the most important. Less obvious examples
should be revealed through trialing.
Other than in exceptional circumstances (such as when the candidates are required to
take notes on a lecture without knowing what the items will be, see below), candidates
should be given sufficient time at the outset to familiarize themselves with the items.
As was suggested for reading in the previous chapter, there seems no sound reason not
to write items and accept responses in the native language of the candidates. This will
in fact often be what would happen in the real world, when a fellow native speaker
asks for information that we have to listen for in the foreign language.
Possible techniques
Multiple choice
There is the problem of the candidates having to hold in their heads four or more
alternatives while listening to the passage and, after responding to one item, taking in
and retaining the alternatives for the next item. If multiple choice is to be used, then
the alternatives must be kept short and simple. The alternatives in the following, which
appeared in a sample listening test of a well-known examination, are probably too
complex.
When stopped by the police, how is the motorist advised to behave?
a. He should say nothing until he has seen his lawyer.
b. He should give only what additional information the law requires.
c. He should say only what the law requires.
d. He should in no circumstances say anything.
Better examples would be:
(Understanding request for help)
I don’t suppose you could show me where this goes, could you?
Response:
No, I don’t suppose so.
a. Of course, I can.
b. I suppose it won’t go.
c. Not at all.

Multiple choice can work well for testing lower-level skills, such as phoneme
discrimination.
The candidate hears ‘bat’ and chooses between ‘pat’, ‘mat’, ‘fat’, and ‘bat’.
Short answer
This technique can work well, provided that the question is short and straightforward,
and the correct, preferably unique, response is obvious.
Gap filling
This technique can work well where a short answer question with a unique answer is
not possible.
Woman: Do you think you can give me a hand with this?
Man: I’d love to help but I’ve got to go around to my mother’s house in a
minute.
The woman asks the man if he can ………... her but he has to visit his ……….
Partial dictation
While dictation may not be a particularly authentic listening activity (although in
lectures at university, for instance, there is often a certain amount of dictation), it can
be useful as a testing technique. As well as providing a ‘rough and ready’ measure of
listening ability, it can also be used diagnostically to test students’ ability to cope with
particular difficulties (such as weak forms in English).
Because traditional dictation is so difficult to score reliably, it is recommended that
partial dictation is used, where part of what the candidates hear is already written
down for them. It takes the following form:
The candidate sees:
It was a perfect day. The sun …………… in a clear blue sky and Diana felt that
all was…………… with the world. It wasn’t just the weather that made her feel
this way. It was also the fact that her husband had …………… agreed to a
divorce. More than that, he had agreed to let her keep the house and to pay her a
small fortune every month. Life …………… be better.
The tester reads:
It was a perfect day. The sun shone in a clear blue sky and Diana felt that all
was right with the world. It wasn’t just the weather that made her feel this way.
It was also the fact that her husband had finally agreed to a divorce. More than
that, he had agreed to let her keep the house and to pay her a small fortune
every month. Life couldn’t be better.
Since it is listening that is meant to be tested, correct spelling should probably not be
required for a response to be scored as correct. However, it is not enough for
candidates simply to attempt a representation of the sounds that they hear, without
making sense of those sounds. To be scored as correct, a response has to provide
strong evidence of the candidate’s having heard and recognized the missing word,
even if they cannot spell it. It has to be admitted that this can cause scoring problems.
The gaps may be longer than one word:
It was a perfect day. The sun shone …………… and Diana felt that all was right
with the world.
While this has the advantage of requiring the candidate to do more than listen for a
single word, it does make the scoring (even) less straightforward.
III. SPECIFICATIONS FOR A LISTENING TEST
Activity 3. Based on the specifications1 for a mid-term listening section test for
students in grade 11, identify the strengths and weaknesses of the following
listening section test.
1. Test matrix (Ma trận đề kiểm tra)

2. Test specification table (Bản đặc tả kỹ thuật ra đề kiểm tra)

No. | Skill | Knowledge/skill unit | Cognitive level | Knowledge and skills to be assessed | Number of questions | Total questions
1 | Listening | Part 1. Listen to a dialogue or monologue of about 1-2 minutes on related topics and answer True/False questions. | Recognition (Nhận biết, NB) | Identify one detail in the recording, such as a number, a time, or an address. | 2 | 2
  |  |  | Comprehension (Thông hiểu, TH) | Understand 2-3 correct details; understand the speaker's main ideas. | 2 | 2
  |  |  | Application (Vận dụng, VD) | Understand several details and eliminate the incorrect ones in order to choose the correct answer. | 1 | 1
  |  | Part 2. Listen to a dialogue or monologue of about 1.5-2 minutes and answer four-option multiple-choice questions. | Recognition (Nhận biết, NB) | Identify one detail in the recording, such as a place, a time, or a direction. | 2 | 2
  |  |  | Comprehension (Thông hiểu, TH) | Understand 2 or 3 details in the recording; understand the speaker's main ideas. | 2 | 2
  |  |  | Application (Vận dụng, VD) | Understand the author's purpose; choose the title or main idea of the text. | 1 | 1

1 Bộ Giáo dục và Đào tạo. (2020). Tài liệu tập huấn giáo viên THPT – Xây dựng ma trận, đặc tả đề kiểm tra định kỳ.

Listening section test

Listening 1
Exercise 1: Listen to the conversation and decide which statements are
True (T) or False (F)
Statements True False
(TH)1. Paul is younger than David
(NB) 2. Bill is nineteen years old.
(NB) 3. David is a teacher
(TH) 4. David's father is a teacher, too
(VD) 5. David's mother doesn't want to go and live in the
countryside

Listening 2
Exercise 2: Choose the best answer A, B or C about the interview.
1. What does Carlos hate?
A. shopping
B. museums
C. football
2. Where are they going to eat on Saturday evening?
A. at home
B. in an Italian restaurant
C. in a Chinese restaurant
3. What are they going to do on Sunday morning?
A. go for a drive
B. get up late
C. go to the cinema
4. Where are they going to have lunch on Sunday?
A. in a cafe
B. in a pub
C. at home
5. They can't go to the cinema on Sunday afternoon because
A. Carlos doesn't like films.
B. Eric doesn't like films
C. They don't have time
Answers:
Listening 1:
Keys: 1 – F, 2 – T, 3 – T, 4 – F, 5 – F
Tapescript:
Listen to Paul talking to a friend about his family
What does each person do?
You will hear the conversation twice.
Female: Tell me about your family, Paul
Paul: Well, you know Sally, my sister – the writer – don’t you?
Female: Yes. Is she your only sister?
Paul: He’s nineteen. He’s studying French in Paris at the moment.
Female: That sounds interesting… and David? What does he do?
Paul: Oh, he’s a teacher, the same as my mother was. But she finished working
last year.
Female: And what about your father?
Paul: Oh, he’s a doctor at the local hospital
Female: Of course. I’ve seen him there.
Paul: My mother says he works too hard. She wants him to stop. She wants to go
and live on a farm in the country, near David.
Female: Um… Are you a doctor too, Paul?
Paul: I’m not clever enough! I work in a bank – the one in the High Street, next
to the bookshop.
Listening 2:
Key: 1. B, 2. C, 3. A, 4. B, 5. C
Tapescript:
Listen to Eric talking to Mary about the weekend.
Their friend, Carlos, is coming to visit them.
Now listen to the conversation.
Eric: Mary… what do you want to do at the weekend, when Carlos comes?
Mary: Well, Eric, I must go shopping on Saturday morning.
Eric: He hates shopping. But we could go to the museum and then meet you for
lunch.
Mary: Fine. What shall we do in the afternoon?
Eric: There’s a good football match on – Carlos’ll like that.
Mary: OK. Do you want to eat at home in the evening?
Eric: OK. Let’s do that. Now, what about Sunday?
Mary: If we get up early on Sunday, we could go for a drive in the countryside.
Eric: Yes, and we could have lunch in a pub somewhere.
Mary: Yes, the one near the river’s nice. Shall we go to the cinema after lunch?
Eric: We can’t. His train’s at four o’clock and I’ll have to take him back to the
station.
Activity 4. Design a listening section test based on the specifications you have
created.
SESSIONS 21+22:
DESIGNING WRITING TESTS
Objectives:
By the end, participants will be able to:
- list some issues that a test designer should consider before composing a writing test
- design a writing test for upper secondary students based on its specifications
- score writing performances reliably based on the given rating scales
---------------------------------------------------------
I. DESIGNING WRITING TESTS – IMPORTANT CONSIDERATIONS
Activity 1. Read the text below and answer the following questions.
a. Fill in the blank with the correct words.
1. When writing assessment items, teachers should __ the possibility of students
providing alternative correct answers.
2. Providing __ context helps ensure the purpose of the item is clear to students.
3. Blanks of __ length avoid giving unintended clues about the answers.
4. Placing the blank __ the main body of the item provides students with the necessary
information to respond.
5. Supplying a list of possible answers can make fill-in items __ for students.
b. Choose the correct answer from the choices for each question.
1. Which of the following is NOT a key consideration when using fill-in items?
A. Ensuring one clear correct answer
B. Providing a glossary of acceptable answers
C. Using blanks of varying length
D. Positioning the blank after the main body
2. Why is it important to avoid providing too much extra context in fill-in items?
A. It burdens students with extraneous information.
B. It makes the items too easy.
C. It disrupts the flow of the assessment.
D. Both A and C
3. What is the primary benefit of supplying a list of possible answers for fill-in items?
A. It makes answering the items easier for students.
B. It makes scoring the items easier for the teacher.
C. It reduces the possibility of alternative correct answers.
D. Both A and B
4. Which of the following is a key guideline for short-response items?
A. The items should be formatted to elicit one concise answer.
B. Partial credit should never be awarded.
C. The items should be as wordy as possible.
D. Both A and B
5. When scoring task items, teachers must decide whether to use an analytic or holistic
approach. What is the key difference between these approaches?
A. Analytic scoring looks at various aspects separately, while holistic uses a single
rating scale.
B. Analytic scoring is more objective, while holistic is more subjective.
C. Analytic scoring is better for assessing language production, while holistic is better
for problem-solving.
D. Analytic scoring requires more time, while holistic is quicker.
Fill-in items are those where a word or a phrase is replaced by a blank in a sentence or
longer text, and the student’s job is to fill in that missing word or phrase. There are
five sets of issues that teachers should consider when using fill-in items.
1. In answering fill-in items, students will often write alternative correct answers that
the teacher did not anticipate when the items were written. To guard against this
possibility, teachers should check to make sure that each item has one very concise
correct answer. Alternatively, the teacher can develop a glossary of acceptable answers
for each blank. Obviously, as the number of alternative possibilities rises for each
item, the scoring becomes longer and more difficult.
2. In deciding how much context to provide for each blank, teachers should make sure
that enough context has been provided that the purpose, or intent, of the item is clear to
those students who know the answer. At the same time, avoid giving too much extra
context. The extra context will burden students with extraneous material to read and
may inadvertently provide students with extraneous clues.
3. Generally speaking, all the blanks in a fill-in test should be the same length – that is,
if the first blank is twelve spaces long, then all the items should have blanks with
twelve spaces. Blanks of uniform length do not provide extraneous clues about the
relative length of the answers. Obviously, this stricture would not apply if a teacher
purposely wants to indicate the length of each word or the number of words in each
blank.
4. Teachers should also consider putting the main body of the item before the blank in
most of the items so that the students have the information necessary to answer the
item once they encounter the blank.
5. In situations where the blanks may be very difficult and frustrating for the students,
teachers may consider supplying a list of responses from which the students can
choose in filling in the blanks. This list will not only make answering the items easier
for the students but will also make the correction of the items easier for the teacher
because the students will have a limited set of possible answers from which to draw.
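The idea in point 1 of keeping a glossary of acceptable answers for each blank can be sketched in code. This is only a minimal illustration: the answer key, the blank numbers, and the function names are hypothetical, and ignoring capitalization and extra spaces is an assumption about what a teacher might tolerate.

```python
# Hypothetical glossary: each blank number maps to its set of acceptable answers.
ANSWER_KEY = {
    1: {"help", "assist"},
    2: {"mother", "mum", "mom"},
}


def normalise(response):
    """Lower-case a response and collapse surrounding or repeated spaces."""
    return " ".join(response.strip().lower().split())


def score_fill_in(responses, key):
    """Count the blanks whose response appears in the glossary for that blank."""
    return sum(
        1
        for blank, accepted in key.items()
        if normalise(responses.get(blank, "")) in accepted
    )


# One student's responses: blank 1 is acceptable, blank 2 is not.
print(score_fill_in({1: " Help ", 2: "father"}, ANSWER_KEY))
```

The longer each set of acceptable answers grows, the more decisions the scorer has to make, which is the scoring cost the text warns about.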
Short-response items are usually questions that the students can answer in a few
phrases or sentences. This type of question should conform to at least the following
two guidelines:
1. The teachers should make sure that the items are formatted so that there is one, and
only one, concise answer or set of answers that they are looking for in the responses to
each item. The parameters for what will be considered an acceptable answer must be
thought through carefully and clearly delineated before correcting such questions. As
in number 1 above for fill-in items, the goal in short-response items is to ensure that
the answer key will help the teacher to make clear-cut decisions as to whether each item
is correct, without making modifications as the scoring progresses. Thus, the teacher’s
expectations should be thought out in advance, recognizing that subjectivity may
become a problem because he or she will necessarily be making judgments about the
relative quality of the students’ answers. This raises the question of partial credit, which
entails giving some credit for answers that are not 100% correct. For instance, on one
short-response item, a student might get two points for an answer with correct spelling
and correct grammar, but only one point if either grammar or spelling were wrong, and
no points if both grammar and spelling were wrong. As with all the other aspects of
scoring short-response items, any partial credit scheme must be clearly thought out and
delineated before scoring starts so that backtracking and rescoring will not be necessary.
2. Short-response items should generally be phrased as clear and direct questions.
Unnecessary wordiness should particularly be avoided with this type of item so that
the range of expected answers will stay narrow enough to be scored with relative ease
and objectivity.
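The partial credit example in guideline 1 (two points, one point, or no points depending on spelling and grammar) can be written down as an explicit rule before scoring starts. The sketch below is one possible way of fixing such a scheme in advance; the function name is hypothetical, and the judgment of whether spelling or grammar is correct still has to be made by the teacher.

```python
def partial_credit(spelling_correct, grammar_correct):
    """Return 2 if both are correct, 1 if only one is, and 0 if neither is."""
    return int(spelling_correct) + int(grammar_correct)


# The three cases described in the text.
print(partial_credit(True, True))    # both correct
print(partial_credit(True, False))   # either grammar or spelling wrong
print(partial_credit(False, False))  # both wrong
```

Writing the rule out in this way makes it harder to drift from the scheme while marking, so backtracking and rescoring should be less likely.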
Task items are defined here as any of a group of fairly open-ended item types that
require students to perform a task in the language that is being tested. A task test might
include a series of communicative tasks, a set of problem-solving tasks, and a writing
task.
While task items are appealing to many language teachers, a number of complications
may arise in trying to use them. To avoid such difficulties, consider at least the
following points.
1. The directions for the task should be so clear that both the tester and the student
know exactly what the student must do.
2. The task should be sufficiently narrow in scope so that it fits logistically into the
time allotted for its performance and yet broad enough so that an adequate sample of
the student’s language use is obtained for scoring the item properly.
3. The teacher must carefully work out the scoring procedure for task items for the
same reasons listed in discussing the other types of productive response items. Two
entirely different approaches are possible in scoring tasks. A task can be scored using
an analytic approach, in which the teachers rate various aspects of each student's
language production separately, or a task can be scored using a holistic approach, in
which the teachers use a single general scale to give a single global rating for each
student’s language production. The very nature of the items will depend on how the
teachers choose to score the task. If teachers choose to use an analytic approach, the
task may have three, four, five, or even six individual bits of information, each of
which must be treated as a separate item. A decision for a holistic approach will
produce results that must be treated differently – that is, more like a single item. Thus,
teachers must decide early on whether they will score the task items using an analytic
approach or a holistic approach.
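The practical consequence of the choice in point 3 is the shape of the result that each approach produces. The following sketch assumes four hypothetical aspects and an arbitrary numeric scale; neither comes from the text, and a real test would take both from its own specifications.

```python
# Hypothetical aspects for an analytic scale; a real scale defines its own.
ASPECTS = ("content", "organization", "vocabulary", "grammar")


def analytic_result(ratings):
    """Keep one rating per aspect, so each aspect can be treated as a separate item."""
    missing = [aspect for aspect in ASPECTS if aspect not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return {aspect: ratings[aspect] for aspect in ASPECTS}


def holistic_result(rating):
    """Keep a single global rating, which behaves like one item."""
    return {"overall": rating}


analytic = analytic_result(
    {"content": 4, "organization": 3, "vocabulary": 3, "grammar": 2}
)
holistic = holistic_result(3)
print(len(analytic), "separate scores versus", len(holistic), "global score")
```

Because the analytic result carries several scores per student and the holistic result only one, the decision affects how the results are recorded and analysed, which is why it has to be made early.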
II. WRITING PROMPTS FOR A WRITING TEST TASK

Activity 2. Read the passage and discuss important considerations when
designing writing prompts

1. Types of Prompts

Writing prompts are stimuli provided to students to initiate the writing process. These
prompts serve as a starting point, offering a topic or a scenario to guide students'
writing efforts. Writing prompts are widely used in educational settings to assess
writing skills, stimulate creativity, and provide practice in various writing styles.

Bare Prompts (Open Structure): These prompts are straightforward and direct,
presenting the writing task in simple terms without additional context or framing. The
candidate is expected to respond directly to the task.

Examples:

"Capital punishment. Discuss."

"Do you favor or oppose the goals of the women’s liberation movement in the United
States? Why?"

Framed Prompts: These prompts provide a situational context or a set of
circumstances and then ask the candidate to perform a specific task based on the
interpretation of that context. This type of prompt often requires the candidate to take a
stance and support it with evidence.

Example:

"Some people feel that using animals for food is cruel and unnecessary, while others
feel that it is necessary for people to eat meat, and that the production of animals for
food can be done without cruelty. What is your position on the issue of whether people
should use animals for food? Discuss the strengths and weaknesses of both positions
and use concrete examples when you explain and defend your point of view."

Text-Based Prompts

Text-based prompts, also referred to as reading-based or response structure prompts,
involve presenting students with a passage of authentic or adapted reading material.
Students are then asked to write an essay that demonstrates their ability to interpret the
content of the reading or to use ideas from the reading in ways directed by the prompt.
These types of prompts are particularly valuable in assessing students' reading
comprehension and their ability to integrate and synthesize information from a text
into their writing.

Example:

You have seen the following job advertisement in your local youth center.
ACTIVITY COORDINATOR WANTED!

(Weekends only)
• Would you like to work with young children aged 5-11?
• Do you have lots of energy?
• Are you an excellent swimmer?

If the answer to these questions is 'yes' then we want to hear from you. Our Children's Club is
looking for someone to be in charge of a group of 10 children to teach them swimming and to
do a range of other activities.
Please send a letter to Mrs. Sykes saying why you are suitable for the job.

2. Variables of writing prompts

Writing prompts serve as the stimuli for student writing, particularly in testing
situations. They must be carefully constructed to allow students to demonstrate their
true writing abilities. There are six key categories that test developers must consider
when creating writing prompts: contextual variables, content variables, linguistic
variables, task variables, rhetorical variables, and evaluation variables.

Contextual Variables

Contextual variables refer to the setting in which the writing will occur and the
purpose of the test. It is crucial to clarify these contexts to the students. For example,
prompts designed for placement tests must differentiate between various proficiency
levels, while those for graduation requirements must be challenging enough to assess
mastery of the subject. The design of prompts should align with the specific objectives
of the course or program.

Content Variables

Content variables deal with the subject matter of the prompt. Prompts should be based
on topics within the experience and knowledge base of the students. Research
indicates that students perform better when the writing task taps into their background
knowledge. Thus, prompt designers should select content accessible to all students to
ensure fair assessment.

Linguistic Variables

Linguistic variables encompass the language used in the prompt, which should be
clear, unambiguous, and culturally accessible. Instructions should be precise, avoiding
any potential for misinterpretation. For instance, ambiguous terms or culturally
specific references that may confuse non-native speakers should be avoided. A well-
worded prompt minimizes unexpected interpretations and ensures that all students
understand the task.
Task Variables

Task variables pertain to the specific tasks students are asked to perform in their
writing. Prompts should strike a balance between being sufficiently challenging and
achievable within the given time constraints. The number of tasks should be
manageable, allowing students to focus on developing their arguments effectively.
Overly complex prompts can overwhelm students and result in incomplete or
unfocused essays.

Rhetorical Variables

Rhetorical variables involve the approach students are instructed to take regarding the
content. Prompts should provide clear directions on the rhetorical style expected, such
as compare and contrast, argue, or describe. The rhetorical instructions should be
specific enough that responses do not vary too widely to be scored consistently, but not
so specific that all responses sound the same.

Evaluation Variables

Evaluation variables concern how the responses to the prompts will be assessed. It is
vital that the criteria for evaluation are clearly defined and shared with the students
beforehand. Different scoring guidelines can privilege various aspects of writing, such
as linguistic accuracy or complexity of ideas. Scoring rubrics should therefore reflect
the values and priorities of the writing program.

III. SPECIFICATIONS OF A WRITING TEST


Activity 3. Designing writing tasks
When we plan to create a writing test, or to choose a writing test that is suitable for our
own students, we need to keep in mind these four key components: the task, the
writer, the scoring procedure, and the reader(s).
Below you can see two writing tasks, from different tests and at different levels. Read
these tasks and think about:
- the writing situation
- the topic
- the task(s)
- the wording of the rubric
- the scoring criteria
Based on these five points, would the following writing tasks be suitable for your
students? Why or why not?
Writing Task 1
Write a short text about your holidays. Where did you spend your holidays? Where did
you stay? What did you do? Who went with you? What didn’t you like?
Writing Task 2
Every country in the world has problems with pollution and damage to the
environment. Do you think these problems can be solved?
Activity 4. Create a writing section test based on the specifications for a mid-term
writing section test for 11th-grade students below.
1. Test matrix (Ma trận đề kiểm tra)

2. Test specification table (Bản đặc tả kỹ thuật ra đề kiểm tra)

No. | Skill | Knowledge/skill unit | Cognitive level | Knowledge and skills to be assessed | Number of questions | Total questions
1 | Writing | 1. Error identification | Recognition (Nhận biết) | Past tenses; verb forms following other verbs; subject-verb agreement; word order | 2 | 2
  |  | 2. Sentence transformation | Comprehension (Thông hiểu) | Rewrite sentences so that the meaning is unchanged, using: verb forms following other verbs; past tenses of verbs | 2 | 2
  |  |  | Application (Vận dụng) | Rewrite sentences so that the meaning is unchanged, using: passive infinitives; passive gerunds; perfect gerunds; perfect infinitives | 2 | 2
  |  | 3. Theme writing | High application (Vận dụng cao) | Write a text of about 130 words, using the given words or ideas, on one of the following topics: a letter inviting a friend to a party; an informal letter describing a personal experience; a paragraph describing a friend | 1 | 1

IV. ASSESSING WRITING PERFORMANCES


Activity 5. Read the following candidates’ writing performances and practice
scoring them based on the rating scales provided.
Writing task:
PART C – WRITING
You should spend about 15 minutes on this task
Last week, you had a holiday with your family in Dalat. Write a letter to tell your
friend about your holiday. In your letter you should mention:
- Where did you spend your holidays?
- Where did you stay?
- What did you do?
- Who went with you?
- What did you like and dislike?
You should write at least 130 words.
Writing performances
Writing 1
Hi Linda,
How have you been these days? I’ve just return from an interesting trip to Da Lat with
my parents, and I’m eager to tell you about it.
We stayed at a homestay in the city center with a lovely garden and a small swimming
pool.
During this trip, we visited the famous Dalat Flower Park. we take a boat ride on
Xuan Huong Lake and enjoyed delicious foods at the night market.
What I loved most about Dalat was the natural scenery, I found the traffic in the city
busy all the time.
That’s all about my trip. Let’s tell me about your holiday. I look forward to hearing
from you soon.
Best,
Duong
Writing 2
Hi Marry,
Last week, I go Da Lat with my family. I stay in a hotel. I eat a lot of seafood and
sunbathe. I miss you a lot. I like the weather but don’t like the life in Da Lat. It is too
quiet for me.
Love
Mai
Rating scales
Total: 1 point. Each criterion is rated at one of four levels: Recognition (Nhận biết), Comprehension (Thông hiểu), Application (Vận dụng), or High application (Vận dụng cao).

Content (Nội dung), maximum 0.4 points

Recognition (0.1 points)
- Meets the task requirements at a minimal level. The text has a topic sentence and the basic components.
- Presents some main ideas, but some are expressed unclearly, repeated, or irrelevant, which makes the text hard for the reader to follow.

Comprehension (0.2 points)
- Basically meets the task requirements. The text has a topic sentence. The basic components are all developed, though the development is sometimes uneven.
- Covers most of the main ideas. A few ideas are irrelevant and make the text hard for the reader to follow.
- Some of the supporting evidence is not yet appropriate.

Application (0.3 points)
- Meets the task requirements fairly fully. The text has a topic sentence. The basic components are developed fairly thoroughly, logically, and in balance.
- Covers all the main ideas, and the ideas are relevant to the topic.
- The supporting evidence is appropriate and fairly convincing.

High application (0.4 points)
- Fully meets the task requirements. The text has a topic sentence. The basic components are developed thoroughly, logically, and in balance.
- All ideas are present, unified, and closely related.
- The supporting evidence is well chosen and convincing.

Organization of information and cohesion (Tổ chức thông tin và tính liên kết), maximum 0.2 points

Recognition (0.05 points)
- Information is repeated, and this affects expression.
- Ideas are arranged illogically and lack cohesion.
- Connectives are rarely used, used repetitively, or used inaccurately.

Comprehension (0.1 points)
- Some information is still repeated.
- Ideas are arranged logically and are linked, although coherence between sentences is still low or mechanical.
- Connectives are simple but used with the correct meaning.

Application (0.15 points)
- Information is rarely repeated.
- Ideas are arranged logically and coherently, and the text is easy to read and understand.
- Connectives are fairly varied and accurate.

High application (0.2 points)
- Information is not repeated.
- Ideas are arranged logically and with a high degree of coherence. The text as a whole is harmonious, fluent, flexible, and natural.
- Connectives are varied and accurate.

Vocabulary (Từ vựng), maximum 0.2 points

Recognition (0.05 points)
- Topic-related vocabulary is limited, repetitive, or inappropriate.
- Errors in word form and spelling cause misunderstanding or discomfort for the reader.

Comprehension (0.1 points)
- Vocabulary is just sufficient to express topic-related information. Some word choices are inappropriate or use the wrong word form.
- Some errors in word form and spelling cause difficulty for the reader.

Application (0.15 points)
- Vocabulary is varied and related to the topic. There are occasional inappropriate word choices or wrong word forms.
- Uses synonyms, antonyms, collocations, idioms, etc.
- Few spelling errors.

High application (0.2 points)
- Vocabulary is rich and related to the topic, and expression is natural. Word choice is appropriate and accurate.
- Uses a wide variety of synonyms, antonyms, collocations, idioms, etc.
- Very few errors in word form and spelling.

Grammar (Ngữ pháp), maximum 0.2 points

Recognition (0.05 points)
- Uses simple sentence structures.
- Makes many errors in grammar and punctuation, which make the text hard for the reader to understand.

Comprehension (0.1 points)
- Uses different sentence structures.
- Makes quite a few errors in grammar and punctuation, which occasionally make the text hard for the reader to understand.

Application (0.15 points)
- Uses a variety of sentence structures.
- Occasionally makes errors in grammar and punctuation, but the reader can still understand the text.

High application (0.2 points)
- Uses varied and flexible sentence structures.
- Makes a few minor errors in grammar and punctuation, but the reader still finds the text easy to understand.
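The arithmetic behind the rating scales above can be sketched as follows. The point values are taken from the scales: Content (Nội dung) rises by 0.1 per level, and the other three criteria rise by 0.05 per level, giving a maximum of 1 point. The criterion keys, the level numbers 1 to 4, the function name, and the sample ratings are hypothetical and are not judgments on Writing 1 or Writing 2.

```python
# Points per level in hundredths of a point, to avoid rounding surprises.
# Level 1 = Recognition, 2 = Comprehension, 3 = Application, 4 = High application.
STEP = {"content": 10, "organization": 5, "vocabulary": 5, "grammar": 5}


def writing_score(levels):
    """Convert one level (1-4) per criterion into a total out of 1 point."""
    if set(levels) != set(STEP):
        raise ValueError("exactly one level is needed for each criterion")
    for criterion, level in levels.items():
        if level not in (1, 2, 3, 4):
            raise ValueError(f"level for {criterion} must be 1, 2, 3, or 4")
    return sum(STEP[criterion] * level for criterion, level in levels.items()) / 100


# Hypothetical ratings for an imaginary script.
sample = {"content": 3, "organization": 2, "vocabulary": 2, "grammar": 1}
print(writing_score(sample))
```

Keeping the four levels separate until the final step also makes it easy for two raters to compare their judgments criterion by criterion before totals are calculated.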
SESSIONS 23+24:
DESIGNING SPEAKING TESTS
Objectives:
By the end, participants will be able to:
- list some issues that a test designer should consider before composing a speaking test
- design a speaking test for upper secondary students based on its specifications
- score speaking performances for upper secondary students reliably based on the
given rating scales
---------------------------------------------------------
I. ASSESSING SPEAKING ABILITIES
Activity 1.
When we want to assess L2 learners’ speaking ability, we need to consider a number
of issues. Read the statements below and decide whether they are 'true' or 'false'.
A third option ('I can't decide') is also included for you to choose when you think
that whether a statement is true or false depends on the context.
Question 1
When designing speaking assessment tasks, we need to consider the learner’s needs
and characteristics.
a. True
b. False
c. I can't decide
Question 2
When assessing L2 learners’ speaking ability, we do not need to worry about the length
of their performance.
a. True
b. False
c. I can't decide
Question 3
When scoring learners’ performance, we need to consider what they say and how they
say it.
a. True
b. False
c. I can't decide
Question 4
If a learner does not respond to a task well, it is because the task is bad.
a. True
b. False
c. I can't decide
Question 5
The learner can get more support from the examiner/teacher in the one-on-one,
individual mode than in the paired mode.
a. True
b. False
c. I can't decide
Question 6
The examiner/teacher can correct the learner’s mistakes/errors while they are in
interaction.
a. True
b. False
c. I can't decide
Question 7
One type of elicitation technique/task is enough to use in order to sample learners’
proficiency.
a. True
b. False
c. I can't decide
Question 8
The scoring of spoken performance is more reliable when an analytic rating scale is
used.
a. True
b. False
c. I can't decide
II. DESIGNING SPEAKING TESTS – IMPORTANT CONSIDERATIONS
Activity 2. Read the text below and answer the following questions.
a. Fill in the blank with the correct words.
1. The __ is often considered to be a prototypical exam format.
2. The flow of information in an interview is __.
3. Language users often __ and conclude topics.
4. In an interview, the learner is often asked about a number of different __ that s/he
can relate to.
5. The interview format is suitable for eliciting __ information about the learner.
b. Choose the correct answer from the choices for each question.
1. Which of the following does not reflect the way we use language in everyday
communication?
A. Initiating topics
B. Redirecting topics
C. Concluding topics
D. The examiner controlling the conversation
2. What is the purpose of using display questions in an interview?
A. To generate extended responses
B. To elicit personal information
C. To reflect on everyday communication
D. To make the interview more guided
3. What is the main difference between a discussion and a role-play activity?
A. The interlocutor's role
B. The flow of information
C. The task goal
D. The learner's language output
4. Why are paired or group oral tasks a good choice for classroom assessments?
A. They allow the teacher to monitor all students
B. They are more reflective of everyday communication
C. They are easier to manage with a group of learners
D. All of the above
5. What is the key consideration in designing tasks for paired or group oral
assessments?
A. Balancing the potential contributions
B. Providing guided but not fully controlled tasks
C. Allowing learners to voice their own opinions
D. All of the above
The interview is often considered to be a prototypical exam format, although its scope
is limited: the examiner is in control of the conversation, initiating and concluding
topics, so the flow of information is one-way. Such an imbalance in
conversational rights and duties does not reflect the way we use language in everyday
communication since language users often initiate, redirect, and conclude topics, and
they often want to get information, not simply give it. In an interview, the learner is
often asked about a number of different topics that s/he can relate to. This test format
is probably suitable for eliciting personal information about the learner and getting
him/her to express opinions on certain issues.
For reasons of test validity and fairness, the interlocutor’s contributions should be as
guided as possible. This means that the questions should preferably be scripted (pre-
written), as it may make a big difference to a learner’s performance if the interlocutor
paraphrases the questions inappropriately. For example, display questions (to which
the answer is known in advance) are not likely to generate extended and meaningful
responses.
The following recommendations are made for interlocutors when asking questions to
a learner (Csépes & Egyud, 2004, p. 40):
• Use global questions for elicitation.
• Use wh-questions instead of yes/no questions whenever possible.
• Never ask more than one question at a time.
• Do not talk more than necessary: refrain from making unnecessary comments.
• Do not interrupt or finish what the learner wants to say.
• Do not ask questions that require special background knowledge.
• Avoid ambiguous and embarrassing questions.
• Use genuine questions and avoid display questions.
• Maintain eye contact with the learner when talking to him/her.
In contrast to interviews, in discussions or role-play activities, the interlocutor acts as
the learner’s partner with whom they have to reach a specific goal based on some kind
of opinion or information gap between the two. In such tasks, there is a good
opportunity for the learner to display his/her oral interactional skills since the tasks
allow for a two-way information flow. In collaborative tasks, learners are typically
encouraged to initiate, negotiate, and argue for and/or against specific ideas,
suggestions, or propositions. In discussion activities, learners usually express their
own opinions, which distinguishes this technique from a role-play activity, where they
often take somebody else’s role (e.g. a holiday-maker) to reach a particular
communicative goal (e.g. to discuss where to go and what to do on a trip). The roles
featuring in test tasks usually simulate the ones we take in our everyday lives.
However, the examiner’s superiority in terms of age, language proficiency, and
authority (due to his/her role, which is difficult to ignore even in an assessment
situation) will still prevail in the majority of cases, and potentially limit the learner’s
language output for psychological reasons.
As a classroom teacher, in most cases, you have to schedule the assessment of
speaking as part of your daily work. This means that you need to engage all the
learners simultaneously while you may select some students whose performance you
wish to assess based on some criteria. The classroom context thus calls for task
formats that can be managed with a group of learners. The paired or the group mode
seems to be a good choice as the teacher can quietly monitor the students while they
are doing the task. However, participants in either format must be presented with
instructions and prompts that are capable of eliciting language performance from all.
For task design, therefore, we need to consider the following:
• potential contributions to the interaction should be balanced, which could be
achieved by giving an equal number of visual or word prompts for both/all
participants;
• participants should have comparable tasks, i.e. they should be required to do the
same thing in order to facilitate a balanced, realistic, and smoothly-running
exchange between them;
• the interaction should be task-based because it seems to give learners a
meaningful purpose to engage in a conversation (e.g. listing, comparing,
contrasting, selecting, justifying, modifying, etc.);
• the tasks have to be guided but not fully controlled, i.e. learners should have a
chance to add something of their own to the exchange;
• learners should be given a chance to voice their own opinion rather than argue
for an assigned position that they cannot identify with.
III. SPECIFICATIONS FOR A SPEAKING TEST
Activity 3. Read the specifications for a speaking test and highlight the main
points. Then share with the whole class.
Test specifications for the Speaking skill
1. General information
Time: 7-10 minutes per candidate.
General description of the parts: The test consists of 2 parts.
Part 1: Each candidate answers 5 questions from the examiner about personal
information and various topics.
Part 2: Answering questions based on a situation.
General description of content: Candidates talk about topics related to themselves and
daily life.
Number of questions:
Part 1: 5 questions per candidate.
Part 2: 6 questions per candidate.
Total score: 25 points.

2. Detailed information
Part 1: Interview
Source material: Each candidate answers 2 questions from the examiner using the
everyday language of first meetings, giving personal information such as name, place
of birth, and family. Each candidate then answers 3 further questions from the
examiner about daily life, hobbies, likes, dislikes, past experiences, and future plans.
Sub-skills assessed: Responding to an interview; social communication; explaining and
describing a specific matter.
Number of questions/tasks: 5 questions per candidate.
Part 2: Answering questions based on a situation
Source material: The candidate is given a prompt card with information about a
situation and uses it to answer 6 questions.
Sub-skills assessed: Answering questions using cue information about a specific
situation; using language functions.
Number of questions/tasks: 6 questions.
3. Task format / Test questions
Task focus: Interview
Question type: Questions are designed as wh-questions. When a yes/no question is
used, a follow-up wh-question is added to prompt the candidate to answer in more
detail.
Difficulty: Introducing one's name, age, and place of residence, spelling proper names …
Describing family, school and class, school subjects...
Response format: Candidates answer the examiner directly. They do not know the
questions in advance and have no preparation time for their answers.
Task focus: Answering questions based on a situation
Question type: This part takes the form of a prompt card about a given situation. Each
prompt card carries information about the situation, including detailed cues that the
candidate uses to answer.
Difficulty: Answering about time and place. Describing content related to things and
events (What). Describing content related to the characteristics of things and events
(How).
Response format: Each candidate is required to answer at least 6 questions based on
the given situation.
4. Scoring scale and assessment method
Scoring scale
Part 1: 10 points (2 points per question)
Part 2: 15 points
Total score: 25 points
- Assessment criteria: A candidate who answers the 5 questions fluently, with correct
grammar, accurate vocabulary, and relevant content receives full marks (5 points). Each
answer is scored according to the following marking scheme:

Points: Description of assessment criteria
1.0: Answers (fairly) fluently and to the point, with almost fully accurate grammar,
vocabulary, and pronunciation.
0.75: Answers with some hesitation but to the point, with fairly accurate grammar,
vocabulary, and pronunciation.
0.5: Hesitates for a long time to find ideas and words; makes minor errors in grammar,
vocabulary, and pronunciation that do not affect the content of the answer.
0.25: Pauses for a very long time before finding an idea; can only answer in
disconnected words; makes serious errors in grammar, vocabulary, and pronunciation,
so that the listener has to make an effort to understand the answer.
0: The candidate is completely unable to answer the question.

Part 2
Scoring scale and assessment criteria
Points: 10
Vocabulary: Uses a wide range of vocabulary; can express ideas clearly and is not
limited by vocabulary in saying what he/she wants to say.
Grammar: Good use of grammar. Maintains accurate use of grammar. Grammatical
errors rarely occur.
Pronunciation and fluency: Very fluent. Can express himself/herself smoothly and
spontaneously. Pronunciation and intonation are appropriate; pauses only as in natural
speech.
Task completion: Completes the task effectively. Fulfils all requirements. Expresses
ideas and opinions accurately.
Points: 9
Between the 10-point and 8-point levels.
Points: 8
Vocabulary: Good use of vocabulary. Has enough vocabulary to discuss everyday and
abstract topics. Paraphrases when necessary. Occasionally uses a wrong word.
Grammar: Good use of grammar. Makes no grammatical errors that cause
misunderstanding. Simple sentences are free of grammatical errors. Complex sentences
are used but frequently contain grammatical errors.
Pronunciation and fluency: Fluent. Can produce phrases at a moderate pace, although
the candidate has to pause when searching for structures and phrases. There are a few
noticeably long pauses. Pronunciation and intonation are mostly accurate.
Task completion: Completes the task well. Fulfils all requirements accurately and
effectively but without elaboration.
Points: 7
Between the 8-point and 6-point levels.
Points: 6
Vocabulary: Basic vocabulary. Uses vocabulary for everyday topics well. Is able to
paraphrase. Attempts to use complex words and phrases but uses them incorrectly.
Has some problems with vocabulary range.
Grammar: Generally uses grammar correctly. Communicates fairly accurately in
familiar contexts. Attempts complex structures but often makes grammatical errors.
Pronunciation and fluency: Mostly fluent. Can communicate confidently in familiar and
unfamiliar situations. Can express opinions but does not sustain the conversation at an
appropriate pace.
Task completion: Completes the tasks unevenly. Completes some tasks well but has
difficulty completing others. Communication is occasionally illogical.
Points: 5
Between the 6-point and 4-point levels.
Points: 4
Vocabulary: Limited use of vocabulary. Can use vocabulary to discuss everyday
matters. Words are often used incorrectly. Frequently makes vocabulary errors.
Grammar: Limited use of grammar. Simple sentence structures are mostly used
accurately. Does not use complex structures.
Pronunciation and fluency: Hesitant. Communication is intelligible, but hesitation to
repair grammar and vocabulary errors is evident, especially in free speech with longer
phrases.
Task completion: Very limited task completion. Frequently lacks logic.
Points: 3
Between the 4-point and 2-point levels.
Points: 2
Vocabulary: Very limited vocabulary. Uses very simple memorised phrases. Resorts to
the mother tongue when lacking the words to express something. Communication
occasionally breaks down for lack of vocabulary.
Grammar: Very limited control of grammar. Uses only simple structures correctly.
Uses formulaic phrases.
Pronunciation and fluency: Brief. Can make himself/herself understood in very short
utterances; hesitation and repetition are frequent and noticeable. Frequent
self-correction, hesitation, and pronunciation errors may cause misunderstanding.
Task completion: Attempts to complete the tasks but without logic or organisation.
Touches on parts of the task but does not develop them, or omits them.
Points: 1
Between the 2-point and 0-point levels.
Points: 0
Vocabulary: Uses inappropriate vocabulary throughout the test. Answers are too short
to be assessed.
Grammar: Unable to use grammar. Answers are too short to be assessed.
Pronunciation and fluency: Unable to speak. Unintelligible. Answers are too short to be
assessed.
Task completion: Does not carry out the tasks. Completely misunderstands the tasks.
Answers are too short to be assessed.

Total Speaking score = Part 1 score + Part 2 score.
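The total can be checked with a short calculation. The Python sketch below is illustrative only and rests on two assumptions that the specifications do not state: each Part 1 answer band (0 to 1.0) is doubled to match the stated 2 points per question, and the four Part 2 criteria (each rated 0 to 10) are averaged and rescaled to the stated 15 points. All function and variable names are hypothetical.

```python
from statistics import mean

# Bands taken from the Part 1 marking scheme (one band per answer).
PART1_BANDS = (0.0, 0.25, 0.5, 0.75, 1.0)
# The four criteria of the Part 2 rating scale.
PART2_CRITERIA = ("vocabulary", "grammar", "pronunciation_fluency", "task_completion")


def part1_score(bands, points_per_question=2.0):
    """Sum five answer bands, each scaled to 2 points per question (assumption)."""
    if len(bands) != 5:
        raise ValueError("Part 1 expects exactly 5 answers")
    for band in bands:
        if band not in PART1_BANDS:
            raise ValueError(f"unknown band: {band}")
    return sum(band * points_per_question for band in bands)


def part2_score(ratings, max_points=15.0):
    """Average four 0-10 criterion ratings and rescale to 15 points (assumption)."""
    if set(ratings) != set(PART2_CRITERIA):
        raise ValueError("Part 2 needs exactly one rating per criterion")
    for value in ratings.values():
        if not 0 <= value <= 10:
            raise ValueError(f"rating out of range: {value}")
    return mean(ratings.values()) * max_points / 10


def total_score(bands, ratings):
    """Total Speaking score = Part 1 score + Part 2 score."""
    return part1_score(bands) + part2_score(ratings)


if __name__ == "__main__":
    bands = [1.0, 0.75, 0.5, 1.0, 0.25]
    ratings = {
        "vocabulary": 8,
        "grammar": 6,
        "pronunciation_fluency": 7,
        "task_completion": 7,
    }
    print(total_score(bands, ratings))
```

Under these assumptions a candidate with the top band on every answer and 10 on every criterion reaches the stated maximum of 25 points. A rater who weights the criteria differently would only need to change `part2_score`.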
5. Sample test
SPEAKING TEST
PART 1
Interlocutor:
Hello, please sit down. I’m………. .
In the first part I will ask you some questions.
What’s your name?
How are you today?
What’s your favourite food?
Who cooks it for you?
How often do you eat it?
PART 2
Interlocutor:
Here is some information about a school picnic. Please use the information on the
card to answer the questions. Use the information and cue words to help you.
Do you understand what you need to do?
Now, you have one minute to read your card and then start answering the questions.
Prompt card
That’s the end of the speaking test. Thank you!


Activity 4. Design a complete speaking test based on the specifications discussed
in activity 3.
IV. ASSESSING SPEAKING PERFORMANCES
Activity 5. Listen to two candidates’ speaking performances and practice scoring
them based on the rating scales provided in activity 3.
