What and How To Test
PAPER
By:
2020
A. Purposes for language learning and language testing
Tests can have a washback effect: instructional programs or teaching practices
may change to reflect the test contents, because language teachers want their
students to do well on high-stakes tests for many different reasons. In some
respects, standardized tests can be expected to have an indirect effect on what
language teachers teach and sometimes even on how they teach the foreign language.
As an experienced language educator, the author of this essay accepts the
inevitability of the washback effect of major tests, even those given at the end of
the term by the instructor, because we are sometimes obligated to “teach to the
test.” However, once this inevitability is accepted, foreign language teachers often
advocate for an even more important outcome than passing the test. They often teach
their students to become autonomous language learners, that is, independent
learners who continue to learn the language long after they have completed formal
language study.
B. Testing versus Assessment
2. Communities
3. Cultures
Culture is a very complex concept with many different meanings, but in the
context of the Five Cs, it refers mainly to the lifestyles, mores, beliefs, and habits
of people who share not only a language (e.g., Spanish) but also deeper values.
Spanish is spoken in more than twenty countries, each with many different cultures,
and English has many variations in countries as different as Australia, England,
New Zealand, Nigeria, and the U.S.A., so it is clear why this is a complex concept.
And yet, language instructors worldwide share the view that culture is intricately
related to language and therefore cannot be avoided when a foreign language is
taught.
4. Comparisons
Related to culture, this concept means that language learners, almost without
exception, tend to make comparisons between their L1 and their L2 during the
language program and even after completing language study. It is therefore useful
for language instructors to help students make appropriate comparisons and avoid
unhelpful ones in their language study. For example, many novice language learners
make comments such as “the way native speakers pronounce certain sounds is
strange” or “the way their grammar works is weird.” Most language instructors try
to convince their students that making judgmental comparisons is not helpful when
trying to develop proficiency in a language.
5. Connections
student to make connections between two aspects of the foreign language (e.g., the
complexities of spelling in English with such words as through, though, and thought).
D. Constructing Tests
1. Test Items
A test item is a specific task test takers are asked to perform. Test items can
assess one or more points or objectives, and the actual item itself may take on a
different constellation depending on the context. For example, an item may test one
point (understanding of a given vocabulary word) or several points (the ability to
obtain facts from a passage and then make inferences based on the facts). Likewise, a
given objective may be tested by a series of items. For example, there could be five
items all testing one grammatical point (e.g., tag questions). Items of a similar kind
may also be grouped together to form subtests within a given test.
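The relationship between items, objectives, and subtests described above can be
pictured as a small data model. The following Python sketch is only illustrative:
the class names TestItem and Subtest, the method items_for, and the sample
objectives are assumptions introduced here, not terms from any testing framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TestItem:
    """One task the test taker performs (hypothetical structure)."""
    prompt: str
    objectives: List[str]  # one entry = a single point; several = multiple points


@dataclass
class Subtest:
    """A group of similar items within a larger test (hypothetical structure)."""
    name: str
    items: List[TestItem] = field(default_factory=list)

    def items_for(self, objective: str) -> List[TestItem]:
        # A given objective may be tested by a series of items.
        return [item for item in self.items if objective in item.objectives]


# Five items all testing one grammatical point (tag questions).
grammar = Subtest(
    name="Grammar",
    items=[TestItem(f"Tag question item {n}", ["tag questions"]) for n in range(1, 6)],
)

# One item testing several points at once.
reading_item = TestItem(
    "Read the passage and answer the question.",
    ["obtaining facts", "making inferences"],
)

print(len(grammar.items_for("tag questions")))
print(len(reading_item.objectives))
```

Keeping the objectives as a list on each item makes it easy to check how many
items cover a given objective before the test is administered.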
2. Classifying Items
Integrative – An integrative item tests more than one point or objective at a
time (e.g., comprehension of words and the ability to use them correctly in
context). For example: Demonstrate your comprehension of the following words by
using them together in a written paragraph: “paralysis,” “accident,” and “skiing.”
Sometimes an integrative item is really more a procedure than an item, as in the case
of a free composition, which could test a number of objectives, such as use of
appropriate vocabulary, use of sentence-level discourse, organization, and statement
of thesis and supporting evidence. For example: Write a one-page essay describing
three sports and the relative likelihood of being injured while playing them
competitively.
Objective – A multiple-choice item, for example, is objective in that there is
only one right answer.
Subjective – A free composition may be more subjective in nature if the scorer
is not looking for any one right answer, but rather for a series of factors
(creativity, style, cohesion and coherence, grammar, and mechanics).
The language skills that we test lie on a continuum from the more receptive
skills (listening and reading) to the more productive skills (speaking and
writing). There are, of course, other language skills that cross-cut these four skills,
such as vocabulary. Assessment of vocabulary will most likely vary to a certain
extent across the four skills, with assessment of vocabulary in listening and reading
perhaps covering a broader range than assessment of vocabulary in speaking and
writing. We can also assess nonverbal skills, such as gesturing, and this can be both
receptive (interpreting someone else’s gestures) and productive (making one’s own
gestures).
and evaluation (making quantitative and qualitative judgments about material). It has
been popularly held that these levels demand increasingly greater cognitive control
as one moves from knowledge to evaluation – that, for example, effective operation
at more advanced levels, such as synthesis and evaluation, would call for more
advanced control of the second language. Yet this has not necessarily been borne out
by research (see Alderson & Lukmani, 1989). The truth is that what makes items
difficult sometimes defies the intuitions of the test constructors.
5. Grammatical competence
Major grammatical errors might be considered those that either interfere with
intelligibility or stigmatize the speaker. Minor errors would be those that neither
get in the way of the listener's comprehension nor annoy the listener to any
extent. Thus, getting the tense wrong in the above example, "We have had a great
time at your house last night," could be viewed as a minor error, whereas in another
case, producing "I don't have what to say" (a direct translation of the appropriate
Hebrew expression, meaning "I really have no excuse") could be considered a major
error, since it is not only ungrammatical but could also stigmatize the speaker as
rude and unconcerned rather than apologetic.
Student Placement,
Diagnosis of Difficulties,
Checking Student Progress,
Reports to Student and Superiors,
Evaluation of Instruction.
information to adversely influence the student's feeling of self-worth. It is even more
unfortunate that the perception matches reality in the majority of testing situations.
Consequently, tests are highly stressful, anxiety-producing events for most people.
Multiple choice questions should be written without ambiguity. That is, the
statement of the question stem should be clear and should leave no doubt about how
to select choices. Additionally, each choice should be written without ambiguity
and should contain all the information required to decide whether or not to choose
it. The decision whether to select a choice should not depend on an obscure
interpretation of either the stem or the choice. A multiple choice question
may easily be used to determine if the student recalls facts. However, a multiple
choice question may also be used to determine if the student has mastered the
learning objective well enough to correctly analyse a statement.
8. Fill-in-the-Blank Questions
E. Test Construction
1. Closed-Answer or “Objective” Tests
choice, matching, fill-in, true/false, or fill-in-the-blank items as objective tests.
Objective tests have the advantages of allowing an instructor to assess a large and
potentially representative sample of course material and allowing for reliable and
efficient scoring. The disadvantages of objective tests include a tendency to
emphasize only “recognition” skills, the ease with which correct answers can be
guessed on many item types, and the inability to measure students’ organization and
synthesis of material (Adapted with permission from Yonge, 1977).
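The guessing problem mentioned above can be made concrete with simple
arithmetic: a test taker who guesses blindly on an item with k equally likely
options has a 1/k chance of answering correctly. The sketch below assumes blind,
independent guessing and exactly one correct option per item; the function name
expected_chance_score and the 40-item test length are introduced here for
illustration only.

```python
def expected_chance_score(num_items: int, options_per_item: int) -> float:
    """Expected number of correct answers from blind guessing alone.

    Assumes each item has exactly one correct option and that every
    option is equally likely to be picked.
    """
    if num_items < 0 or options_per_item < 1:
        raise ValueError("num_items must be >= 0 and options_per_item >= 1")
    return num_items / options_per_item


# A hypothetical 40-item test in two formats.
true_false = expected_chance_score(40, 2)    # two options per item
four_option = expected_chance_score(40, 4)   # four options per item

print(true_false)   # 20.0
print(four_option)  # 10.0
```

Under these assumptions, blind guessing yields an expected 20 correct answers on
the true/false version but only 10 on the four-option multiple-choice version,
which is why item types with fewer options are more vulnerable to guessing.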
2. Essay Tests
1. Have in mind the processes that you want measured (e.g., analysis,
synthesis).
2. Start questions with words such as “compare,” “contrast,” “explain why.”
Don’t use “what,” “when,” or “list.” (These latter types are better
measured with objective-type items.)
3. Write items that define the parameters of expected answers as clearly as
possible.
4. Make sure that the essay question is specific enough to invite the level of
detail you expect in the answer. A question such as “Discuss the causes of
the American Civil War” might get a wide range of answers and
therefore be impossible to grade reliably. A more controlled question
would be, “Explain how the differing economic systems of the North and
South contributed to the conflicts that led to the Civil War.”
5. Don’t have too many questions for the time available.