Writing and Evaluating Test Items
ITEM FORMATS
The Dichotomous Format - The dichotomous format offers two alternatives for each item.
Usually a point is given for the selection of one of the alternatives. The most common example
of this format is the true-false examination (yes or no).
The Polytomous Format - The polytomous format (sometimes called polychotomous)
resembles the dichotomous format except that each item has more than two alternatives.
Typically, a point is given for the selection of one of the alternatives, and no point is given for
selecting any other choice.
Incorrect choices are called distractors. In item analysis, the choice of distractors is
critically important.
The Likert Format - The technique is called the Likert format because it was used as part of
Likert's (1932) method of attitude scale construction. Five alternatives are offered: strongly
disagree, disagree, neutral, agree, and strongly agree. Scoring requires that any negatively
worded items be reverse-scored; the responses are then summed.
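As a concrete illustration, here is a minimal sketch of Likert scoring; the 1-5 coding (1 = strongly disagree through 5 = strongly agree) and the list marking which items are negatively worded are hypothetical:

```python
# Likert scoring: reverse-score negatively worded items, then sum.
# The responses and item keying below are hypothetical.

responses = [5, 2, 4, 1, 3]                           # one respondent, five items
negatively_worded = [False, True, False, True, False]

def likert_score(responses, negatively_worded, scale_max=5):
    total = 0
    for value, reverse in zip(responses, negatively_worded):
        # Reverse scoring maps 1 -> 5, 2 -> 4, ..., 5 -> 1.
        total += (scale_max + 1 - value) if reverse else value
    return total

print(likert_score(responses, negatively_worded))  # 5 + 4 + 4 + 5 + 3 = 21
```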
The Category Format - A technique that is similar to the Likert format but that uses an even
greater number of choices is the category format.
An approach related to category scales is the visual analogue scale. Using this
method, the respondent is given a 100-millimeter line and asked to place a mark
between two well-defined endpoints. The scales are scored according to the
measured distance from the first endpoint to the mark.
Checklists and Q-Sorts - One format common in personality measurement is the adjective
checklist. With this method, a subject receives a long list of adjectives and indicates whether
each one is characteristic of himself or herself. Adjective checklists can be used for describing
either oneself or someone else.
ITEM ANALYSIS
- Item analysis, a general term for a set of methods used to evaluate test items, is one of the most
important aspects of test construction. The basic methods involve assessment of item difficulty and item discriminability.
Item Difficulty - For a test that measures achievement or ability, item difficulty is defined by the
number of people who get a particular item correct, usually expressed as a proportion. Item
difficulty is only one way to evaluate test items.
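As a concrete illustration, here is a minimal sketch that computes a difficulty index from hypothetical 0/1 item responses:

```python
# Item difficulty as the proportion of test takers who answer the item
# correctly. The 0/1 responses below are hypothetical.

item_responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 1 = correct, 0 = incorrect

difficulty = sum(item_responses) / len(item_responses)
print(difficulty)  # 0.7, i.e., 70% of test takers answered correctly
```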
Discriminability - Assessment of item discriminability determines whether the people who
have done well on particular items have also done well on the whole test.
The Extreme Group Method- This method compares people who have done well with
those who have done poorly on a test. For example, you might find the students with
test scores in the top third and those in the bottom third of the class. Then you
would find the proportions of people in each group who got each item correct. The
difference between these proportions is called the discrimination index.
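A minimal sketch of the extreme group method, using hypothetical total scores and item responses and the top-third/bottom-third split described above:

```python
# Extreme group method: compare the proportion passing an item in the
# top third of scorers with the proportion in the bottom third.
# The scores and item responses are hypothetical.

totals = [95, 88, 40, 72, 35, 90, 55, 30, 85, 60]  # total test scores
item   = [1,  1,  0,  1,  0,  1,  1,  0,  1,  0]   # 1 = item correct

ranked = sorted(zip(totals, item), reverse=True)   # best scorers first
n = len(ranked) // 3
top, bottom = ranked[:n], ranked[-n:]

p_top = sum(correct for _, correct in top) / n
p_bottom = sum(correct for _, correct in bottom) / n

# The difference between the proportions is the discrimination index.
print(p_top - p_bottom)
```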
The Point Biserial Method - Another way to examine the discriminability of items is to
find the correlation between performance on the item and performance on the total
test. The correlation between a dichotomous variable and a continuous variable is called a
point biserial correlation.
The total test score is used as an estimate of the amount of a “trait” possessed by individuals.
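Because the point biserial correlation is simply the Pearson correlation between a dichotomous (0/1) variable and a continuous one, it can be computed directly from item and total scores. A minimal sketch with hypothetical data; statistics.correlation requires Python 3.10 or later.

```python
# Point biserial correlation between a 0/1 item score and the total test
# score, computed as an ordinary Pearson correlation. Data are hypothetical.
from statistics import correlation  # Python 3.10+

item   = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]   # 1 = item correct, 0 = incorrect
totals = [95, 88, 40, 72, 35, 90, 55, 30, 85, 60]

# A positive value means high scorers tend to get the item right,
# so the item discriminates in the intended direction.
print(round(correlation(item, totals), 3))
```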
Drawing the Item Characteristic Curve - To draw the item characteristic curve, we need to
define discrete categories of test performance.
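One way to make this concrete: sort test takers into discrete total-score categories and, within each category, compute the proportion who passed the item; plotting these proportions against the categories traces the item characteristic curve. A minimal sketch with hypothetical data and category boundaries:

```python
# Building an item characteristic curve: group test takers by total-score
# category and compute the proportion passing the item in each group.
# The data and category boundaries are hypothetical.

totals = [95, 88, 40, 72, 35, 90, 55, 30, 85, 60]
item   = [1,  1,  0,  1,  0,  1,  1,  0,  1,  0]

bins = [(0, 49), (50, 69), (70, 89), (90, 100)]  # performance categories

for low, high in bins:
    group = [i for t, i in zip(totals, item) if low <= t <= high]
    if group:
        print(f"{low}-{high}: {sum(group) / len(group):.2f}")  # proportion passing
```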
Item Response Theory - According to these approaches, each item on a test has its own item
characteristic curve that describes the probability of getting each particular item right or wrong
given the ability level of each test taker. With the computer, items can be sampled, and the
specific range of items where the test taker begins to have difficulty can be identified.
This theory has many technical advantages. It builds on traditional models of
item analysis and can provide information on item functioning, the value of specific
items, and the reliability of a scale.
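The notes above do not name a particular IRT model; a common choice for the item characteristic curve is the two-parameter logistic (2PL) model, sketched below with hypothetical discrimination (a) and difficulty (b) values.

```python
# A minimal sketch of an IRT item characteristic curve using the common
# two-parameter logistic (2PL) model. The parameter values are hypothetical.
import math

def p_correct(theta, a, b):
    """Probability that a test taker of ability theta answers the item correctly."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# The probability of success rises with ability; a larger b makes the item
# harder, and a larger a makes the curve steeper (more discriminating).
for theta in [-2, -1, 0, 1, 2]:
    print(theta, round(p_correct(theta, a=1.5, b=0.5), 3))
```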
According to classical test theory, a score is derived from the sum of an individual’s
responses to various items, which are sampled from a larger domain that represents
a specific trait or ability.
External Criteria - An external measure, rather than the total test score, can also serve as the criterion for evaluating items.
LIMITATIONS OF ITEM ANALYSIS - The growing interest in criterion-referenced tests has posed new
questions about the adequacy of item-analysis procedures. The main problem is this: Though statistical
methods for item analysis tell the test constructor which items do a good job of separating students,
they do not help the students learn. Young children do not care as much about how many items they
missed as they do about what they are doing wrong. Many times children make specific errors and will
continue to make them until they discover why they are making them.