Item Writing

The document outlines the steps in developing test items and discusses different item formats including dichotomous, polychotomous, Likert, checklists and Q-sorts. It details the advantages and disadvantages of each format and provides guidance on writing clear, unbiased items at an appropriate reading level for test takers.

Uploaded by

azailelleon

ITEM WRITING AND ITEM FORMATS

Objectives

1. To outline the steps taken in developing test items.


2. To discuss the different item formats, i.e. dichotomous,
polychotomous, Likert, Checklists, Q-sorts and the category scale;
their advantages and disadvantages.

Introduction

Items are the specific questions or problems that make up a test (Kaplan &
Saccuzzo, 2009).

An item is a specific stimulus to which a person responds overtly, i.e. with
a response that can be observed and scored. The response can be evaluated on
a scale or as a grade; for example, a score of 75% on a 100-item test means
that the individual answered 75 items correctly.

A test is a measurement device or technique used to quantify behaviour or to
help in understanding and predicting behaviour. It can also be described as
a collection of items.

Item Writing

Item writing involves a number of steps:

1. Define clearly what you want to measure.

Most often, it will be in one of these areas:

- A type of cognitive achievement. This can be either knowledge or a skill.
An example of knowledge is ‘knowledge of Ugandan history’; an example of a
skill is ‘demonstration of an ability to multiply decimals’.
- A type of affective trait, for example interest in psychology.

The items should be made as specific as possible.

2. Generate an item pool

The item developer should take care in selecting and developing items and
should avoid redundant items.
To end up with the required number of items, one may need to write 3-4 items
for each item that will appear in the final test. For example, if you want a
test of 20 items, you may generate a pool of 60-80 items.

3. Avoid exceptionally long items.

Exceptionally long items tend to be misleading or confusing, so they should
be avoided.

4. Keep the level of reading difficulty appropriate for those who will
complete the scale.

It is important to be mindful of the reading level of the targeted test
takers. If, for example, the item developer is writing for nursery school
children, the items should be in line with what those children can read. If
this is not done, the test takers will not understand the items and may fail
the test for that reason alone.

5. Avoid double-barrelled items that convey two or more ideas at the same
time.

Double-barrelled items may confuse test takers, who may be unable to decide
whether to agree or disagree with the statement. This will eventually affect
the results of the test. An example is:
Indicate whether you agree or disagree with the statement.
“I vote NRM because I support Universal Secondary Education”.
This contains two different statements: “I vote NRM” and “I support Universal
Secondary Education”. Someone can agree with one but not the other, or vice
versa.

6. Consider mixing positively and negatively worded items.

At times, test takers develop an ‘acquiescence response set’, in which they
tend to respond positively to all items. To reduce this bias, you may include
items that are worded in the opposite direction.

For example:

“I feel tired”.

“I feel energised”.

7. When writing test items, be sensitive to cultural and ethnic differences.

For example, if you are writing items for a religious population, it may not
be appropriate to write items that reflect practices that may be offensive to
them, such as drinking alcohol or eating foods that are taboo to them.

8. Recognise that items become obsolete. When they become obsolete, they
lose reliability.

Items that are used over a long period tend to lose reliability. Hence the
need to confirm that they are still reliable at the time they are to be used.

Other general guidelines for item writing:

- “All of the Above” should not be an answer option
- “None of the Above” should not be an answer option
- All answer options should be credible
- Order of answer options should be logical or vary
- Items should cover important concepts and objectives
- Negative wording should not be used
- Answer options should include only one correct answer
- Specific determiners (e.g. always, never) should not be used
- Answer options should be homogeneous
- The correct answer option should not be the longest answer option
- Items should be independent of each other
- Test copies should be clear, readable and not hand-written

Item Formats

Different item formats are used for different purposes. The format used for
evaluating attitudes may not be the one used for assessing personality. Each
format is chosen by weighing its pros and cons.

a. Dichotomous Format

This format offers two alternatives for each item. A point is awarded when
the test taker selects the keyed alternative.

A common dichotomous test is the True-False examination. For each item, the
test taker’s task is to choose either true or false, but not both.

Other response pairs used with this format include “Yes” or “No”.

An example of dichotomous items:

No.  Item                                             True   False
1    Tough managers produce best performing teams
2    Teamwork encourages social loafing
3    Introverts perform tasks better as individuals
4    I often worry about my reading ability

Advantages of the Dichotomous Format

1. It is easy to construct and administer.

2. It is easily scored. The tester only needs to count the number of correct
items to get the score.

3. True-false items require absolute judgement. The test taker cannot choose
anything in between.
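
The counting rule in point 2 can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name `score_dichotomous`, the answer key and the responses are all hypothetical, and the sketch assumes knowledge items that have a keyed (correct) answer.

```python
def score_dichotomous(responses, key):
    """Award one point for every response that matches the keyed answer."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


# Hypothetical answer key and responses for a four-item True-False test.
key = [True, False, True, False]
responses = [True, True, True, False]

print(score_dichotomous(responses, key))  # 3
```

The same counting rule applies to “Yes”/“No” items; only the values stored in the key change.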

Disadvantages

1. They encourage students to memorise material, so students may pass the
test even when they have not really understood the concepts.

2. Dichotomous items tend to be less reliable than other item formats. This
is because a test taker has a 50% chance of getting any item right by
guessing alone, without understanding the content of the item.

b. The Polychotomous Format (Polytomous)

This resembles the dichotomous format, except that each item has more than
two alternatives.

A point is given for selecting the correct alternative but not for selecting
any other choice.

In a polychotomous examination, the test taker has to determine which
alternative is correct. Incorrect alternatives are called distractors.

According to psychometric theory, adding more distractors increases the
reliability of the item. It is usually best to have 3-4 distractors for this
purpose. However, poorly written distractors may lower the quality of the
test.

In the dichotomous format, the chance of guessing an item correctly is 50%.
In the polychotomous format, it depends on the number of choices available
per item: if there are four choices, the chance of a correct guess is one out
of four, which is equivalent to 25%; if there are three choices, it is one
out of three, which is equivalent to 33.3%.

Some test takers can therefore get items correct simply by guessing, even if
they have not studied the subject matter.
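
The relationship between the number of choices and the chance of a lucky guess can be tabulated with a short Python sketch. The function names are hypothetical, and the sketch assumes a pure blind guess in which every choice is equally likely to be picked.

```python
def chance_of_correct_guess(n_choices):
    """Probability of guessing one item correctly when all choices are equally likely."""
    if n_choices < 2:
        raise ValueError("an item needs at least two choices")
    return 1 / n_choices


def expected_chance_score(n_items, n_choices):
    """Expected number of items answered correctly by blind guessing on every item."""
    return n_items * chance_of_correct_guess(n_choices)


for n in (2, 3, 4, 5):
    print(f"{n} choices: {chance_of_correct_guess(n):.1%} per item")

print(expected_chance_score(100, 4))  # 25.0
```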

Because of guessing, a correction for guessing is applied. The formula to
correct for guessing on a test is:

\[ \text{Corrected score} = R - \frac{W}{n-1} \]

Where R = the number of right responses

W = the number of wrong responses

n = the number of choices for each item

Take an example of a test of 100 items with 4 choices each, where the test
taker guesses on every item. The expected score from guessing is a quarter
of the 100 items, so R is expected to be 25, the number of wrong responses is
W = (100 - 25) = 75, and n = 4.

Using the formula above:

\[ \text{Corrected score} = 25 - \frac{75}{4-1} = 25 - \frac{75}{3} = 25 - 25 = 0 \]

So when the correction for guessing is applied, the corrected score is
actually 0.
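
The correction formula is simple enough to express as a small Python function. This is a minimal sketch: the name `corrected_score` is hypothetical, and the second call uses made-up numbers for a True-False test purely to show the effect of n = 2.

```python
def corrected_score(right, wrong, n_choices):
    """Corrected score = R - W / (n - 1).

    Omitted items are simply left out of both `right` and `wrong`.
    """
    if n_choices < 2:
        raise ValueError("an item needs at least two choices")
    return right - wrong / (n_choices - 1)


# Blind guessing on 100 four-choice items: 25 right and 75 wrong are expected.
print(corrected_score(25, 75, 4))  # 0.0

# Hypothetical True-False test (n = 2): every wrong answer cancels one right answer.
print(corrected_score(60, 40, 2))  # 20.0
```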

An example:

Mukiibi took a psychological test with 100 items, each item having four
answer choices. He answered 88 items correctly and was judged to have passed
the test. What is Mukiibi’s score after correction for guessing?

From the formula,

\[ \text{Corrected score} = R - \frac{W}{n-1} \]

R is observed to be 88 of the 100 items, W = 12 and n = 4.

\[ \text{Corrected score} = 88 - \frac{12}{4-1} = 88 - \frac{12}{3} = 88 - 4 = 84 \]

So Mukiibi’s corrected score is 84.

Omitted items are not included in the calculation. They provide neither
credit nor penalty.

The expression W/(n-1) is an estimate of the number of responses the test
taker is expected to get right by chance.
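
Why W/(n-1) estimates the lucky guesses can be shown with a short derivation. It rests on two assumptions that the notes do not state explicitly: every wrong answer comes from a blind guess, and each of the n choices is equally likely to be picked. The symbol G (the number of items answered by guessing) is introduced here only for illustration.

```latex
\begin{align*}
\text{Expected right answers among the guesses} &= \frac{G}{n} \\
\text{Expected wrong answers among the guesses} &= W = G \cdot \frac{n-1}{n}
  \quad\Longrightarrow\quad G = \frac{nW}{n-1} \\
\text{Expected right answers by chance} &= \frac{G}{n} = \frac{W}{n-1}
\end{align*}
```

With W = 75 and n = 4 this gives G = 100 guessed items and 75/3 = 25 answers right by chance, which agrees with the pure-guessing example.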

Advantages of the Polychotomous Format

- It takes little time for the test takers to respond since they do not
write out the answers. Hence one can respond to a large number of items in a
short time.

- The tests are easy to score. The tester only counts the correct items to
get the score.

Disadvantages

A correct answer may be selected by guessing, that is, by chance rather than
through knowledge.

c. The Likert Format

This format requires the respondent to indicate the degree of agreement with
a particular attitudinal statement.

It is very popular in personality and attitude scales.

This scale is non-comparative and measures only a single trait. The
respondent is asked to indicate their level of agreement with a given
statement by way of an ordinal scale.

It is usually expressed as a four-, five- or six-point scale. A five-point
scale, for example, ranges over Strongly Agree, Agree, Neutral, Disagree and
Strongly Disagree. A scale with an even number of points, such as six, has
no neutral midpoint, so the respondent cannot be neutral.

An example of a six-point scale:

No.  Item                          Strongly  Moderately  Mildly    Mildly  Moderately  Strongly
                                   Disagree  Disagree    Disagree  Agree   Agree       Agree
1    I am afraid of caterpillars
2    I love snakes
3    I fear cats
4    I do not like centipedes

Likert scales are sometimes referred to as summative scales because the
responses to related items can be summed to create a score for a group of
statements.

Scoring requires that each negatively worded item be reverse scored; the
responses are then summed.
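
Reverse scoring and summing can be sketched as follows. Several things here are assumptions made for illustration and are not stated in the notes: responses are coded 1 (Strongly Disagree) to 6 (Strongly Agree), the four example items are treated as one scale measuring fear of small animals, and “I love snakes” is therefore taken as the item worded in the opposite direction. The function names, item labels and responses are hypothetical.

```python
def reverse_score(response, n_points):
    """Flip a response so that 1 becomes n_points, 2 becomes n_points - 1, and so on."""
    return n_points + 1 - response


def likert_total(responses, reversed_items, n_points=6):
    """Sum the responses after reverse scoring the items listed in `reversed_items`."""
    total = 0
    for item, response in responses.items():
        if not 1 <= response <= n_points:
            raise ValueError(f"response to {item!r} is outside 1..{n_points}")
        if item in reversed_items:
            response = reverse_score(response, n_points)
        total += response
    return total


# Hypothetical responses from one test taker (1 = Strongly Disagree, 6 = Strongly Agree).
responses = {
    "afraid_of_caterpillars": 5,
    "love_snakes": 2,
    "fear_cats": 4,
    "do_not_like_centipedes": 6,
}
reversed_items = {"love_snakes"}

print(likert_total(responses, reversed_items))  # 5 + (7 - 2) + 4 + 6 = 20
```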

Advantages

- It is easy to construct.

- It produces a highly reliable scale.

- It is easy for test takers to read and complete.

Weaknesses of this scale include:

- Central tendency bias: participants may avoid extreme response categories.

- Acquiescence bias: participants may agree with statements as presented in
order to please the tester.

- Social desirability bias: respondents may wish to portray themselves in a
more favourable light rather than being honest.

- Validity may be difficult to demonstrate: the scale may not measure what
the tester intended to measure.

d. The Category Format

It is similar to the Likert format but uses an even greater number of
choices. Unlike the Likert format, the category scale uses a defined point
rating system.

Test takers are required to rate a given item or scenario on a scale within a
category range. For example, one may use a scale of 1 to 5 or 1 to 10, where
1 is the lowest score and 5 or 10 is the highest score respectively.

The numbers assigned on a rating scale are sometimes influenced by the
context in which the items are rated.

The number of categories used depends on the fineness of the discrimination
that the test takers are willing to make. If they are willing to make fine
discriminations, more categories can be used.

An example:

1. On a scale of 1 to 5, rate Bazalaki’s attitude towards class assignments
(where 1 is very negative and 5 is very positive).

2. On a scale of 1 to 10, rate the level of academic excellence of Makerere
University (where 1 is very ordinary and 10 is very competitive).

Advantages

- It is very easy to administer.

Disadvantages

- It does not take into consideration the context in which the subject is
being rated. For example, in a class of averagely performing students, a
student may be rated 9, which represents a very good performer. Yet if the
same student is placed in a class of only highly performing students, the
same student may be rated 3, which represents a relatively poor performance.

- Test takers also have a tendency to spread their responses evenly across
the entire scale of 1 to 10, which may not fairly represent the actual score.

In order to overcome the problems above, the end points of the scale have to
be clearly defined by outlining the expected characteristics of each point
(Kaplan & Ernest, 1983).

For example, if one is rating the performance of students in a given class,
a student who scores 10 must:

- attend all classes
- contribute to every question asked in class
- solve problems fast
- assist others to complete their class work
- regularly pass class tests with over 80%.

A student who scores 1, on the other hand, shows the opposite
characteristics.

e. Checklists

These are used in personality measurement.

The test taker is given a list of adjectives and asked to indicate whether
each is characteristic of him/herself or of someone else.

For example:

Castro…

- is a dependable person.
- is a talkative individual.
- behaves in a sympathetic or considerate manner.
- appears to have a high degree of intellectual capacity.
- is protective of those close to him.
- tends to be self-defensive.
- is thin-skinned; sensitive to anything that can be construed as criticism.

f. Q-sorts

A test taker is given a list of statements about a person’s characteristics
and asked to sort them into a given number of piles, e.g. 5 or 9 piles.

These statements are sorted into piles that indicate the degree to which
they appear to describe a given person accurately.

Piles numbered 1 to 9 are provided to the test taker, who places each
statement card onto the pile number that appropriately describes the
characteristics of the person being studied. Here, a rating of 9 means that
the statement on the card is the best description of the person being
studied, while 1 is the least fitting description of that person.

For example, 100 statements about a person’s characteristics are listed on
cards, with each card having one statement, making 100 cards.

The cards are distributed across the 9 piles according to how well the test
taker judges each statement to represent the subject being studied.

The frequency of cards placed on the different piles is noted, and the best
characteristic description of the person under study is noted.

The observed results tend to follow a normal distribution. However, items
that lie at the extreme ends of the continuum speak volumes about the true
personal characteristics of the subject.
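
Tallying the piles can be sketched in Python. The statement labels and pile placements below are made up for illustration, the function names are hypothetical, and the sketch assumes nine piles in which pile 9 holds the best descriptions and pile 1 the least fitting ones.

```python
from collections import Counter


def pile_frequencies(sort, n_piles=9):
    """Return how many cards were placed on each pile, from 1 to n_piles."""
    counts = Counter(sort.values())
    if any(not 1 <= pile <= n_piles for pile in counts):
        raise ValueError(f"pile numbers must lie between 1 and {n_piles}")
    return {pile: counts.get(pile, 0) for pile in range(1, n_piles + 1)}


def extreme_statements(sort, n_piles=9):
    """Return the statements on the lowest pile and on the highest pile."""
    least = sorted(s for s, pile in sort.items() if pile == 1)
    most = sorted(s for s, pile in sort.items() if pile == n_piles)
    return least, most


# Hypothetical sort: statement label -> pile number chosen by the test taker.
sort = {
    "dependable": 9,
    "talkative": 5,
    "sympathetic": 6,
    "self-defensive": 1,
    "thin-skinned": 4,
    "protective": 5,
}

print(pile_frequencies(sort))
print(extreme_statements(sort))  # (['self-defensive'], ['dependable'])
```

With a full deck of 100 cards, the same tally shows whether the piles roughly follow the bell shape described above.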

Conclusion

Items that are written carefully help in the assessment of the subject and
give the tester accurate results.

References

- Kaplan, R. M. & Saccuzzo, D. P. (2009). Psychological Testing: Principles,
Applications and Issues.

- Crocker, L. & Algina, J. (2008). Introduction to Classical and Modern Test
Theory.

- Suen, H. K. & McClellan, S. (2003). Test item construction techniques and
principles.
