Item Writing
Objectives
Introduction
Item Writing
The item developer should take care in selecting and developing items. They
should avoid redundant items.
In order to get the required number of items, one may need to draft 3-4
items for each item that will appear in the final test. For example, if you
wish to have 20 items on your test, you may generate a pool of 60-80 items.
Exceptionally long items tend to be misleading or confusing, so they should
be avoided.
4. Keep the level of reading difficulty appropriate for those who will
complete the scale.
It is important to be mindful of the reading level of the targeted test
takers. If, for example, the item developer is writing for nursery school
children, the items should be within the reading ability of those children.
Otherwise they will not understand the items and are therefore likely to
fail the test.
At times, the test takers may develop the ‘acquiescence response set’ where
they tend to respond positively to all items. To avoid this bias, you may
include items that are worded in the opposite direction.
For example:
“I feel tired”.
“I feel energised”.
For example, if you are writing items for a religious population, it may not be
appropriate to write items that refer to behaviour they may find offensive,
such as drinking alcohol or eating foods that are taboo to them.
When items are used over a long period, they tend to lose reliability. Their
reliability should therefore be checked again whenever they are to be used.
Item Formats
Different item formats are used for different purposes. The format used for
evaluating attitudes may not be the same as the one used for assessing
personality. Each format is chosen after weighing its advantages and
disadvantages.
a. Dichotomous Format
This format offers two alternatives for each item. A point is awarded when
the test taker selects the keyed alternative of the two presented.
A common dichotomous test is the True-False examination. The test taker’s
task is to choose either what is true or what is false, but not both for a
single item.
Disadvantages
This format resembles the dichotomous format, except that each item has more
than two alternatives.
A point is given for selecting one of the alternatives but not for selecting any
other choice.
Some test takers can get items correct simply by guessing, even if they
have not studied the subject matter. For a test with three alternatives per
item, the chance of guessing the correct choice is about 33%; with four
alternatives it is 25%.
\[ \text{Corrected score} = R - \frac{W}{n - 1} \]
Take an example of a test with 100 items of 4 choices each, on which the
test taker guesses all through the exercise. The expected score from
guessing alone is a quarter of the 100 items, so R is expected to be 25.
The number of wrong responses is then W = 100 - 25 = 75, and n = 4.
\[ \text{Corrected score} = 25 - \frac{75}{4 - 1} = 25 - \frac{75}{3} = 25 - 25 = 0 \]
So when correction for guessing is applied, the corrected score is
actually 0.
An example:
Mukiibi was subjected to a psychological test with 100 items, each item
having four answer choices to choose from. He scored 88 correct answers
and was pronounced to have passed the test. What is Mukiibi’s score
after correction for guessing?
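Assuming he answered every item, the number of wrong responses is
W = 100 - 88 = 12, with R = 88 and n = 4. From the formula,
\[ \text{Corrected score} = R - \frac{W}{n - 1} = 88 - \frac{12}{4 - 1} = 88 - \frac{12}{3} = 88 - 4 = 84 \]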
Omitted items are not included in the calculation. They earn neither credit
nor penalty.
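The correction for guessing above can be sketched in a few lines of Python.
This is a minimal illustration, not a standard library routine: the function
name corrected_score and its parameter names are made up for this example,
and omitted items are simply left out of both counts, as described above.

```python
def corrected_score(right: int, wrong: int, n_choices: int) -> float:
    """Correction for guessing: R - W / (n - 1).

    Omitted items are not passed in; they earn neither credit nor penalty.
    """
    if n_choices < 2:
        raise ValueError("each item needs at least two choices")
    return right - wrong / (n_choices - 1)


# Pure guessing on 100 four-choice items: 25 right and 75 wrong expected.
guessing = corrected_score(right=25, wrong=75, n_choices=4)

# Mukiibi: 88 right and, assuming no omitted items, 12 wrong.
mukiibi = corrected_score(right=88, wrong=12, n_choices=4)

print(guessing, mukiibi)  # 0.0 84.0
```

If a test taker had left some items blank, only the attempted items would be
split between right and wrong before calling the function.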
It takes little time for the test takers to respond since they do not write
the answers. Hence one can respond to a large number of items in a
short time.
The tests are easy to score. The tester only counts the correct items to
get the score.
Disadvantages
This scale is non-comparative and measures only a single trait. The
respondent is asked to indicate their level of agreement with a given
statement by way of an ordinal scale.
1 I am afraid of caterpillars
2 I love snakes
3 I fear cats
4 I do not like centipedes
Likert scales are sometimes referred to as summative scales because the
response to each item can be summed with the responses to other related
items to create a score for a group of statements.
Scoring requires that each negatively worded item be reverse scored and the
responses are then summed up.
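The scoring rule can be sketched as follows. The sketch rests on assumptions
that are not in the notes: a 5-point response scale (1 = strongly disagree,
5 = strongly agree), hypothetical responses, and the choice to treat
"I love snakes" as the reverse-keyed item on a fear scale. The function name
score_likert is likewise made up for illustration.

```python
def score_likert(responses, reverse_keyed, scale_min=1, scale_max=5):
    """Sum Likert responses after reverse scoring the reverse-keyed items."""
    total = 0
    for item, value in responses.items():
        if not scale_min <= value <= scale_max:
            raise ValueError(f"response for {item!r} is outside the scale")
        if item in reverse_keyed:
            # On a 1-5 scale this maps 1->5, 2->4, 3->3, 4->2, 5->1.
            value = scale_min + scale_max - value
        total += value
    return total


# Hypothetical responses from one respondent (assumed 5-point scale).
responses = {
    "I am afraid of caterpillars": 4,
    "I love snakes": 2,
    "I fear cats": 5,
    "I do not like centipedes": 3,
}

total = score_likert(responses, reverse_keyed={"I love snakes"})
print(total)  # 16
```

Reverse scoring turns the 2 on "I love snakes" into a 4, so that a high total
consistently indicates more fear across all four items.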
Advantages
It is easy to construct
Central tendency bias; participants may avoid extreme response
categories
Although it may seem similar to the Likert format, the category scale
uses a defined point rating system.
The numbers that are assigned when using the rating scale are
sometimes influenced by the context in which the items are rated.
An example:
Advantages
Disadvantages
It does not take into consideration the context in which the test
subject is being rated. For example, in a class of averagely performing
students, a student may be rated 9, which represents a very good
performer. Yet if the same student is placed in another class of only
highly performing students, the same student may be rated 3, which
represents a relatively poor performance.
In order to overcome the problems above, the end points of this scale
have to be clearly defined, by outlining the expected characteristics of
each point (Kaplan & Ernest, 1983).
e. Checklists
The test taker is given a list of adjectives and asked to indicate whether
each is characteristic of him/herself or someone else.
Here, a rating of 9 means that the statement on the card is the best
description of the characteristics of the person being studied, while 1
means it is the least accurate description of that person's characteristics.
For example:
Castro…
Is a dependable person.
Is a talkative individual.
Tends to be self-defensive.
Is thin-skinned; sensitive to anything that can be construed as
criticism.
Q-Lists
A test taker is given a list of statements about proposed personal
characteristics and asked to sort them into a given number of piles,
e.g. 5 or 9 piles.
These statements are sorted into piles that indicate the degree to which
they appear to describe a given person accurately.
Piles numbered 1 to 9 are provided to the test taker, who places each
statement card on the pile number that best reflects how well the statement
describes the characteristics of the person being studied.
The frequency of cards placed on the different piles is noted and the best
characteristic description of the person under study is noted.
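The tallying step can be sketched as follows. The pile placements are
hypothetical values invented for illustration, the statements are the ones
used in the Castro example, and piles are assumed to run from 1 (least
descriptive) to 9 (most descriptive).

```python
from collections import Counter

# Hypothetical placements: statement -> pile number (1 to 9).
placements = {
    "Is a dependable person": 8,
    "Is a talkative individual": 5,
    "Tends to be self-defensive": 2,
    "Is thin-skinned": 2,
}

# Frequency of cards on each pile, including piles left empty.
pile_counts = Counter(placements.values())
frequencies = {pile: pile_counts.get(pile, 0) for pile in range(1, 10)}

# The statement on the highest pile is the best description of the person.
best_description = max(placements, key=placements.get)

print(frequencies)
print(best_description)  # Is a dependable person
```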
Conclusion
If the items are written carefully, they will help in assessing the test
taker and give the tester accurate results.
References