not be accurate. Even though we typically refer to an item’s p value as its dif-
ficulty index, the actual difficulty of an item is tied to the instructional program
surrounding it. If students are especially well taught, they may perform excellently
on a complex item that, by anyone’s estimate, is a tough one. Does the resulting
p value of .95 indicate the item is easy? No. The item’s complicated content may
have simply been taught effectively. For example, almost all students in a pre-med
course for prospective physicians might correctly answer a technical item about
the central nervous system that almost all “people off the street” would answer
incorrectly. A p value of .96 based on the pre-med students’ performances would
not render the item intrinsically easy.
Item-Discrimination Indices
For tests designed to yield norm-referenced inferences, one of the most power-
ful indicators of an item’s quality is the item-discrimination index. In brief, an
item-discrimination index typically tells us how frequently an item is answered
correctly by those who perform well on the total test, as contrasted with those who
perform poorly. Fundamentally, an item-discrimination index reflects the relationship
between students’ performance on the total test and their responses to a particular
test item. One approach to computing
an item-discrimination statistic is to calculate a correlation coefficient between
students’ total test scores and their performance on a particular item.
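To make the correlation approach concrete, here is a minimal Python sketch; the data and the function name are illustrative assumptions, not drawn from the text. Because the item is scored right/wrong, the Pearson correlation between item scores and total test scores is the point-biserial coefficient often reported as an item-discrimination index.

# A minimal sketch (illustrative data) of the correlation approach to
# item discrimination: correlate each student's 0/1 score on one item
# with that student's total test score.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical class data: 1 = answered the item correctly, 0 = missed it,
# paired with each student's total test score.
item_responses = [1, 1, 0, 1, 0, 0, 1, 0]
total_scores   = [47, 44, 31, 40, 28, 25, 42, 30]

# With a dichotomous (0/1) item, this Pearson r is the point-biserial
# correlation between item performance and total test performance.
print(round(pearson_r(item_responses, total_scores), 2))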
A positively discriminating item is one that is answered correctly more often by
those who score well on the total test than by those who score poorly on it. A
negatively discriminating item is answered correctly more often by those who score
poorly on the total test than by those who score well. A nondiscriminating item is
one for which there is no appreciable difference between the proportions of high
and low scorers on the total test who answer it correctly. This set of relationships
is summarized in the following chart. (Remember that < and > signify less than and
more than, respectively.)
Type of Item                Proportion of Correct Responses on Total Test
Positive Discriminator      High Scorers > Low Scorers
Negative Discriminator      High Scorers < Low Scorers
Nondiscriminator            High Scorers = Low Scorers
In general, teachers would like to discover that their items are positive dis-
criminators because a positively discriminating item tends to be answered correctly
by the most knowledgeable students (those who scored high on the total test) and
incorrectly by the least knowledgeable students (those who scored low on the total
test). Negatively discriminating items indicate something is awry, because the item
tends to be missed more often by the most knowledgeable students and answered
correctly more frequently by the least knowledgeable students. To compute a
discrimination index for each item, follow these steps:
1. Order the test papers from high to low by total score. Place the paper having
the highest total score on top, and continue with the next highest total score
sequentially until the paper with the lowest score is placed on the bottom.
2. Divide the papers into a high group and a low group with an equal number
of papers in each; that is, split the stack into upper and lower halves. If
there is an odd number of papers, simply set aside one of the middle papers
so the number of papers in the high and low groups will be the same. If there
are several papers with identical scores at the middle of the distribution, then
randomly assign them to the high or low distributions so the number of pa-
pers in the two groups is identical. The use of 50-percent groups has the ad-
vantage of providing enough papers to permit reliable estimates of upper and
lower group performances.
3. Calculate a p value for each of the high and low groups. Determine the
number of students in the high group who answered the item correctly and
then divide this number by the number of students in the high group. This
provides you with p_h. Repeat the process for the low group to obtain p_l.
4. Subtract p_l from p_h to obtain each item’s discrimination index (D). In essence,
then, D = p_h - p_l.
Suppose you are in the midst of conducting an item analysis of your mid-
term examination items. Let’s say you split your class of 30 youngsters’ papers
into equal upper-half and lower-half groups of 15 papers each. All 15 students in the high
group answered item 42 correctly, but only 5 of the 15 students in the low group
answered it correctly. The item discrimination index for item 42, therefore, would
be 1.00 - .33 = .67.
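The four steps, and the item 42 example, can be mirrored in a short computation. The sketch below is illustrative only; the function name, the data structure, and the made-up score values are assumptions rather than part of the text, but running it on the item 42 figures reproduces the .67 index.

# A rough sketch of the four-step procedure for one item.

def discrimination_index(papers, item):
    """papers: list of (total_score, answers) pairs, where answers[item] is
    1 if the student answered that item correctly, else 0. Returns D = p_h - p_l."""
    # Step 1: order papers from high to low by total score.
    ordered = sorted(papers, key=lambda paper: paper[0], reverse=True)
    # Step 2: split into equal high and low groups (a middle paper is set
    # aside automatically if the class size is odd).
    half = len(ordered) // 2
    high, low = ordered[:half], ordered[-half:]
    # Step 3: compute a p value for each group.
    p_h = sum(answers[item] for _, answers in high) / len(high)
    p_l = sum(answers[item] for _, answers in low) / len(low)
    # Step 4: subtract.
    return p_h - p_l

# Hypothetical class of 30 papers: all 15 high-group students and only
# 5 of the 15 low-group students answer item 42 correctly.
high_half = [(80, {42: 1}) for _ in range(15)]
low_half  = [(50, {42: 1}) for _ in range(5)] + [(45, {42: 0}) for _ in range(10)]
print(round(discrimination_index(high_half + low_half, 42), 2))  # 0.67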
Now, how large should an item’s discrimination index be in order for you to
consider the item acceptable? Ebel and Frisbie (1991) offer the experience-based
guidelines in Table 11.1 for indicating the quality of items to be used for making
norm-referenced interpretations. If you consider their guidelines as approxima-
tions, not absolute standards, they’ll usually help you decide whether your items
are discriminating satisfactorily.
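Once a D value is in hand for every item, items can be screened against whatever guideline values a teacher adopts. The helper below is purely illustrative; the .30 cutoff is a placeholder assumption, not one of Ebel and Frisbie’s Table 11.1 boundaries, so substitute the guideline values you actually use.

# Illustrative helper for screening items by their discrimination indices.
# The default cutoff is an assumed placeholder, not a value from Table 11.1.

def flag_weak_items(d_values, minimum_d=0.30):
    """Return the item numbers whose discrimination index falls below minimum_d."""
    return [item for item, d in d_values.items() if d < minimum_d]

d_values = {42: 0.67, 43: 0.15, 44: -0.10}   # hypothetical indices
print(flag_weak_items(d_values))             # [43, 44]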
An item’s ability to discriminate is highly related to its overall difficulty
index. For example, an item answered correctly by all students has a total p value
of 1.00. For that item, p_h and p_l are also 1.00. Thus, the item’s discrimination
index is zero (1.00 - 1.00 = 0). A similar result would ensue for items in which
the overall p value was zero—that is, items no student had answered correctly.
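A quick arithmetic check, using made-up p values, shows why items at either difficulty extreme cannot discriminate.

# Items everyone answers correctly (or no one does) yield D = 0;
# only items of intermediate difficulty can discriminate.

def discrimination(p_h, p_l):
    return p_h - p_l

print(discrimination(1.00, 1.00))  # item all students answer correctly -> 0.0
print(discrimination(0.00, 0.00))  # item no student answers correctly  -> 0.0
print(discrimination(0.70, 0.30))  # mid-difficulty item can discriminate -> 0.4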