
KING’S COLLEGE OF MARBEL, INC.

ASSESSMENT OF LEARNING 1
BRGY. MORALES, CITY OF KORONADAL EDUC 313
COLLEGE DEPARTMENT

MODULE 6: ITEM ANALYSIS AND VALIDATION

Time Frame: 6 Hours


Learning Outcomes: At the end of this lesson, the students are expected to:
1. explain the meaning of item analysis, item validity, reliability, item difficulty,
discrimination index;
2. determine the validity and reliability of given test items; and
3. determine the quality of a test item by its difficulty index, discrimination index and
plausibility of options (for selected-response items).

ENGAGE

Welcome, students! In the previous module, you were able to construct your TOS and
test items by following the set guidelines. I hope that you can develop improved versions of
those items as you go further in this course. Remember that you will be in the classroom
someday and will be assessing your students both traditionally and authentically, which is why
you need to keep practicing how to construct test items that are aligned with the set standards,
competencies and objectives.
Test items, especially those in a multiple-choice test, should be tried out and validated
before they are kept in a test bank. A test bank is a repository where good, well-developed
questions are stored for use in future examinations. The Professional Regulation Commission
(PRC), for example, keeps good test items to be used in succeeding board examinations. As a
teacher, what should you do with your drafted test items so that they can be used again in
future examinations?
This module will walk you through the steps for examining the quality of the test items
you have prepared, so that you can decide whether each item should be accepted, rejected or
further developed.

EXPLORE

ACTIVITY 1: How Do I See Things?


Directions: Provide a short definition of the terms given below before reading the content of this
module. (Just give it a try)
1. Discrimination Index
___________________________________________________________________________
___________________________________________________________________________

2. Difficulty Index _____________________________________________________________


___________________________________________________________________________

3. Validity ___________________________________________________________________
___________________________________________________________________________

4. Reliability _________________________________________________________________
___________________________________________________________________________

5. Plausibility ________________________________________________________________


___________________________________________________________________________

EXPLAIN

INTRODUCTION
The teacher normally prepares a draft of the test. Such a draft is subjected to item
analysis and validation in order to ensure that the final version of the test will be useful and
functional. First, the teacher tries out the draft test on a group of students with characteristics
similar to the intended test takers (try-out phase). From the try-out group, each item is analyzed
in terms of its ability to discriminate between those who know and those who do not know, and
also in terms of its level of difficulty (item analysis phase). The item analysis provides
information that allows the teacher to decide whether to revise or replace an item (item revision
phase). Finally, the final draft of the test is subjected to validation if the intent is to use the test
as a standard test for the particular unit or grading period.

6.1. Item Analysis: Difficulty Index and Discrimination Index


There are two important characteristics of an item that will be of interest to the teacher.
These are: (a) item difficulty and (b) discrimination index. We shall learn how to measure these
characteristics and apply our knowledge in making a decision about the item in question.
The difficulty of an item, or item difficulty, is defined as the number of students who are
able to answer the item correctly divided by the total number of students. Thus:

    Item difficulty = (number of students with the correct answer) / (total number of students)

The item difficulty is usually expressed as a percentage.

Example: What is the item difficulty index of an item if 25 students are unable to answer
it correctly while 75 answered it correctly?

Here, the total number of students is 100, hence the item difficulty index is 75/100 or
75%.

Another example: 25 students answered the item correctly while 75 students did not.
The total number of students is 100, so the difficulty index is 25/100 or 0.25, which is 25%. This
is a more difficult item than the one with a difficulty index of 75%.

A high percentage indicates an easy item/question while a low percentage indicates a
difficult item.
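
To make the computation concrete, here is a minimal Python sketch (not part of the original
module; the function name and data are illustrative only) that computes the difficulty index from
a list of scored responses:

    def difficulty_index(item_scores):
        """Percentage of students who answered the item correctly.

        item_scores: list with 1 for a correct answer and 0 for a wrong answer,
        one entry per student.
        """
        return 100.0 * sum(item_scores) / len(item_scores)

    # Example from the text: 75 of 100 students answered correctly.
    scores = [1] * 75 + [0] * 25
    print(difficulty_index(scores))  # 75.0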

One problem with this type of difficulty index is that it may not actually indicate that the
item is difficult (or easy). A student who does not know the subject matter will naturally be
unable to answer the item correctly even if the question is easy. How do we decide, on the
basis of this index, whether the item is too difficult or too easy?


The following arbitrary rule is often used in the literature:

Range of Difficulty Index      Interpretation        Action
0 - 0.25                       Difficult             Revise or discard
0.26 - 0.75                    Right difficulty      Retain
0.76 and above                 Easy                  Revise or discard

Difficult items tend to discriminate between those who know and those who do not know
the answer. Conversely, easy items cannot discriminate between these two groups of students.
We are therefore interested in deriving a measure that will tell us whether an item can
discriminate between these two groups of students. Such a measure is called an index of
discrimination.

An easy way to derive such a measure is to measure how difficult an item is with respect
to those in the upper 25% of the class and how difficult it is with respect to those in the lower
25% of the class. If the lower 25% found the item difficult while the upper 25% did not, then the
item can discriminate properly between these two groups. Thus:

    Index of discrimination = DU - DL   (U = upper group; L = lower group)


Example: Obtain the index of discrimination of an item if the upper 25% of the class had
a difficulty index of 0.60 (i.e., 60% of the upper 25% got the correct answer) while the lower
25% of the class had a difficulty index of 0.20.
Here, DU = 0.60 while DL = 0.20, thus the index of discrimination = 0.60 - 0.20 = 0.40.
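
The same subtraction can be written as a short Python helper (a sketch, not from the module;
the raw counts 12/20 and 4/20 below are hypothetical numbers chosen only to reproduce
DU = 0.60 and DL = 0.20):

    def discrimination_index(upper_correct, upper_total, lower_correct, lower_total):
        """DU - DL: difference between the difficulty indices of the two groups."""
        du = upper_correct / upper_total   # proportion correct in the upper 25%
        dl = lower_correct / lower_total   # proportion correct in the lower 25%
        return du - dl

    print(round(discrimination_index(12, 20, 4, 20), 2))  # 0.6 - 0.2 = 0.4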

The discrimination index is the difference between the proportion of the top scorers who
got an item correct and the proportion of the bottom scorers who got the item correct. The
discrimination index ranges between -1 and +1. The closer the discrimination index is to +1, the
more effectively the item can discriminate or distinguish between the two groups of students. A
negative discrimination index means that more students from the lower group got the item
correct; such an item is not good and must be discarded.

Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1)
to 1.0 (when DU = 1 and DL = 0). When the index of discrimination is equal to -1, this means
that all of the lower 25% of the students got the correct answer while all of the upper 25% got
the wrong answer. In a sense, such an index discriminates correctly between the two groups,
but the item itself is highly questionable. Why should the bright ones get the wrong answer and
the poor ones get the right answer? On the other hand, if the index of discrimination is 1.0, then
all of the lower 25% failed to get the correct answer while all of the upper 25% got the correct
answer. This is a perfectly discriminating item and is the ideal item that should be included in
the test.


From these discussions, let us agree to discard or revise all items that have a negative
discrimination index, for although they discriminate between the upper and lower 25% of the
class, the content of the item itself may be highly dubious or doubtful.

As in the case of the index of difficulty, we have the following rule of thumb:
Index Range        Interpretation                                Action
-1.0 to -0.50      Can discriminate, but item is questionable    Discard
-0.49 to 0.45      Non-discriminating                            Revise
0.46 to 1.0        Discriminating item                           Include

Example: Consider an item in a multiple-choice test taken by 80 students, for which the
following data were obtained (B, marked with *, is the keyed correct option):

Item 1          Option A    Option B*   Option C    Option D
Total           0           40          20          20
Upper 25%       0           15          5           0
Lower 25%       0           5           10          5

The correct response is B. Let us compute the difficulty index and the index of
discrimination:

    Difficulty Index = no. of students getting the correct response / total no. of students
                     = 40/80 = 0.50 or 50%, within the range of a good item (right difficulty)

The discrimination index can similarly be computed:

    DU = no. of students in the upper 25% with the correct response / no. of students in the upper 25%
       = 15/20 = 0.75 or 75%
    DL = no. of students in the lower 25% with the correct response / no. of students in the lower 25%
       = 5/20 = 0.25 or 25%
    Discrimination Index = DU - DL = 0.75 - 0.25 = 0.50 or 50% (discriminating item)

Thus, the item also has a "good discriminating power.”


It is also instructive to note that the distracter A is not an effective distracter since this
was never selected by the students. It is an implausible distracter. Distracters C and D appear
to have good appeal as distracters. They are plausible distracters.
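
A short Python sketch (an assumed illustration, not part of the module) can reproduce the
difficulty index, the discrimination index and the distracter check in one pass, using the
response counts tabulated above:

    # Response counts for Item 1 (B is the keyed answer).
    total = {"A": 0, "B": 40, "C": 20, "D": 20}    # all 80 students
    upper = {"A": 0, "B": 15, "C": 5,  "D": 0}     # upper 25% (20 students)
    lower = {"A": 0, "B": 5,  "C": 10, "D": 5}     # lower 25% (20 students)
    key = "B"

    difficulty = total[key] / sum(total.values())   # 40/80 = 0.50 -> right difficulty
    du = upper[key] / sum(upper.values())           # 15/20 = 0.75
    dl = lower[key] / sum(lower.values())           # 5/20  = 0.25
    discrimination = du - dl                        # 0.50 -> discriminating item

    # A distracter that no one selects is implausible and should be revised.
    implausible = [opt for opt, n in total.items() if opt != key and n == 0]

    print(difficulty, discrimination, implausible)  # 0.5 0.5 ['A']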

Index of Difficulty

    P = (Ru + RL) / T x 100

where:
    Ru --- the number in the upper group who answered the item correctly
    RL --- the number in the lower group who answered the item correctly
    T  --- the total number of students who tried the item

P is the percentage who answered the item correctly (the index of difficulty). For example, if
Ru = 6, RL = 2 and T = 20, then

    P = 8/20 x 100 = 40%

The smaller the percentage figure, the more difficult the item.

Index of Item Discriminating Power

    D = (Ru - RL) / (T/2)

Using the same data, the item discriminating power is estimated as

    D = (6 - 2) / 10 = 0.40

The discriminating power of an item is reported as a decimal fraction; maximum
discriminating power is indicated by an index of 1.00.
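
The same upper-/lower-group formulas can be expressed in a few lines of Python (a sketch,
not from the module), using the numbers of the worked computation above (Ru = 6, RL = 2,
T = 20):

    def index_of_difficulty(ru, rl, t):
        """P = (Ru + RL) / T x 100: percentage who answered the item correctly."""
        return 100.0 * (ru + rl) / t

    def discriminating_power(ru, rl, t):
        """D = (Ru - RL) / (T/2), where T/2 is the number of students per group."""
        return (ru - rl) / (t / 2)

    print(index_of_difficulty(6, 2, 20))   # 40.0 (%)
    print(discriminating_power(6, 2, 20))  # 0.4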


Maximum discrimination is usually found at the 50 percent level of difficulty

0.00 - 0.20 = Very difficult
0.21 - 0.80 = Moderately difficult
0.81 - 1.00 = Very easy

For classroom achievement tests, most test constructors desire items with indices of
difficulty no lower than 20 nor higher than 80, with an average index of difficulty from 30 or 40 to
a maximum of 60.
The INDEX OF DISCRIMINATION is the difference between the proportion of the upper
group who got an item right and the proportion of the lower group who got the item right. This
index is dependent upon the difficulty of an item. It may reach a maximum value of 100 for an
item with an index of difficulty of 50, that is, when 100% of the upper group and none of the
lower group answer the item correctly. For items of less than or greater than 50 difficulty, the
index of discrimination has a maximum value of less than 100.

More Sophisticated Discrimination Index


Item discrimination refers to the ability of an item to differentiate among students on the
basis of how well they know the material being tested. Various hand calculation procedures
have traditionally been used to compare item responses to total test scores using high and low
scoring groups of students. Computerized analyses provide more accurate assessment of the
discrimination power of items because they take into account responses of all students rather
than just high and low scoring groups.
The item discrimination index provided by ScorePak® is a Pearson product-moment
correlation between student responses to a particular item and total scores on all other items on
the test. This index is the equivalent of a point-biserial coefficient in this application. It provides
an estimate of the degree to which an individual item is measuring the same thing as the rest of
the items.
Because the discrimination index reflects the degree to which an item and the test as a
whole are measuring a unitary ability or attribute, values of the coefficient will tend to be lower
for tests measuring a wide range of content areas than for more homogeneous tests. Item
discrimination indices must always be interpreted in the context of the type of test which is being
analyzed. Items with low discrimination indices are often ambiguously worded and should be
examined. Items with negative indices should be examined to determine why a negative value
was obtained. For example, a negative value may indicate that the item was mis-keyed, so that
students who knew the material tended to choose an unkeyed, but correct, response option.

Tests with high internal consistency consist of items with mostly positive relationships
with the total test score. In practice, values of the discrimination index will seldom exceed .50
because of the differing shapes of item and total score distributions. ScorePak® classifies item
discrimination as "good" if the index is above .30; "fair" if it is between .10 and .30; and "poor"
if it is below .10.
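
The corrected item-total correlation described above can be approximated with plain Python.
This is only a sketch under stated assumptions: the 0/1 score matrix is invented, each item is
correlated with the total of the remaining items, and ScorePak®'s exact computation may differ
in detail.

    from math import sqrt

    def pearson(x, y):
        """Pearson product-moment correlation between two equal-length lists."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sqrt(sum((a - mx) ** 2 for a in x))
        sy = sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    def item_discrimination(scores, item):
        """Correlate one item (0/1) with the total score on all OTHER items."""
        item_scores = [row[item] for row in scores]
        rest_totals = [sum(row) - row[item] for row in scores]
        return pearson(item_scores, rest_totals)

    # scores[s][i] = 1 if student s answered item i correctly (invented data).
    scores = [
        [1, 1, 1, 0],
        [1, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
    ]
    print(round(item_discrimination(scores, 0), 2))  # around 0.67 for this toy data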

A good item is one that has good discriminating ability and a sufficient level of difficulty
(neither too difficult nor too easy).
At the end of the Item Analysis report, test items are listed according to their degrees of
difficulty (easy, medium, and hard) and discrimination (good, fair, poor). These distributions
provide a quick overview of the test, and can be used to identify items which are not performing
well and which can perhaps be improved or discarded.

The item-analysis procedure for norm-referenced tests provides the following information:

1. The difficulty of the item;
2. The discriminating power of the item; and
3. The effectiveness of each alternative.
Some benefits derived from Item Analysis are:
1. It provides useful information for class discussion of the test.
2. It provides data which help students improve their learning.
3. It provides insights and skills that lead to the preparation of better tests in the
future.
6.2. Validation and Validity
After performing the item analysis and revising the items which need revision, the next
step is to validate the instrument. The purpose of validation is to determine the characteristics
of the whole test itself, namely, the validity and reliability of the test. Validation is the process of
collecting and analyzing evidence to support the meaningfulness and usefulness of the test.

Validity. Validity is the extent to which a test measures what it purports to measure; it
also refers to the appropriateness, correctness, meaningfulness and usefulness of the specific
decisions a teacher makes based on the test results. These two definitions of validity differ in
the sense that the first refers to the test itself while the second refers to the decisions made by
the teacher based on the test. A test is valid when it is aligned with the learning outcome.
A teacher who conducts test validation might want to gather different kinds of evidence.
There are essentially three main types of evidence that may be collected: content-related
evidence of validity, criterion-related evidence of validity and construct-related evidence
of validity. Content-related evidence of validity refers to the content and format of the
instrument. How appropriate is the content? How comprehensive? Does it logically get at the
intended variable? How adequately does the sample of items or questions represent the content
to be assessed?
Criterion-related evidence of validity refers to the relationship between scores
obtained using the instrument and scores obtained using one or more other tests (often called
the criterion). How strong is this relationship? How well do such scores estimate present
performance or predict future performance of a certain type?
Construct-related evidence of validity refers to the nature of the psychological
construct or characteristic being measured by the test. How well does a measure of the


construct explain differences in the behavior of the individuals or their performance on a certain
task?
The usual procedure for determining content validity may be described as follows: The
teacher writes out the objectives of the test based on the Table of Specifications and then gives
these, together with the test, to at least two (2) experts, along with a description of the intended
test takers. The experts look at the objectives, read over the items in the test and place a check
mark in front of each question or item that they feel does not measure one or more objectives.
They also place a check mark in front of each objective not assessed by any item in the test.
The teacher then rewrites any item checked and resubmits to the experts, and/or writes new
items to cover those objectives not covered by the existing test. This continues until the experts
approve of all items and also agree that all of the objectives are sufficiently covered by the test.
In order to obtain evidence of criterion-related validity, the teacher usually compares
scores on the test in question with scores on some other independent criterion test which
presumably already has high validity. For example, if a test is designed to measure the
mathematics ability of students and it correlates highly with a standardized mathematics
achievement test (external criterion), then we say we have high criterion-related evidence of
validity. In particular, this type of criterion-related validity is called concurrent validity.
Another type of criterion-related validity is called predictive validity, wherein the test scores
in the instrument are correlated with scores on a later performance (criterion measure) of the
students. For example, the mathematics ability test constructed by the teacher may be
correlated with the students' later performance in a Division-wide mathematics achievement test.
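
To illustrate how such criterion-related evidence is gathered numerically, here is a minimal
Python sketch (all scores are hypothetical, invented for illustration) that correlates a
teacher-made test with an external criterion test:

    # Hypothetical scores: teacher-made math test vs. a standardized math test
    # (the external criterion). A strong correlation is evidence of concurrent validity.
    teacher_test = [35, 42, 28, 50, 39, 45, 31, 48]
    criterion    = [70, 82, 60, 95, 75, 88, 65, 90]

    n = len(teacher_test)
    mx = sum(teacher_test) / n
    my = sum(criterion) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(teacher_test, criterion))
    sx = sum((a - mx) ** 2 for a in teacher_test) ** 0.5
    sy = sum((b - my) ** 2 for b in criterion) ** 0.5
    print(round(cov / (sx * sy), 2))  # close to 1.0 for this invented data
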
In summary, content validity refers to how well the test items reflect the knowledge
actually required for a given topic area (e.g., math). It requires the use of recognized subject
matter experts to evaluate whether test items assess defined outcomes. Does a pre-
employment test measure effectively and comprehensively the abilities required to perform the
job? Does an English grammar test measure effectively the ability to write good English?

Criterion-related validity is also known as concrete validity because criterion validity
refers to a test's correlation with a concrete outcome.
In the case of pre-employment test, the two variables that are compared are test scores
and employee performance.
There are two main types of criterion validity: concurrent validity and predictive validity.
Concurrent validity refers to a comparison between the measure in question and an outcome
assessed at the same time.
An example of concurrent validity is a comparison of scores on the NAT Math exam with
course grades in Grade 12 Math. In predictive validity, we ask this question: Do the scores in
the NAT Math exam predict the Math grade in Grade 12?

6.3. Reliability
Reliability refers to the consistency of the scores obtained: how consistent they are for
each individual from one administration of an instrument to another and from one set of items to
another. We already gave the formulas for computing the reliability of a test; for internal
consistency, for instance, we could use the split-half method or the Kuder-Richardson formulae
(KR-20 or KR-21).
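
As an illustration of the Kuder-Richardson approach mentioned above, the following is a
minimal KR-20 sketch in Python (an assumed example, not from the module; the score matrix
is invented and population variance is used):

    def kr20(scores):
        """KR-20 internal-consistency reliability for dichotomously scored items.

        scores[s][i] = 1 if student s answered item i correctly, else 0.
        """
        n_students = len(scores)
        k = len(scores[0])                      # number of items
        totals = [sum(row) for row in scores]   # each student's total score
        mean_total = sum(totals) / n_students
        var_total = sum((t - mean_total) ** 2 for t in totals) / n_students

        sum_pq = 0.0
        for i in range(k):
            p = sum(row[i] for row in scores) / n_students  # item difficulty
            sum_pq += p * (1 - p)

        return (k / (k - 1)) * (1 - sum_pq / var_total)

    # Invented 6-student, 4-item score matrix, for illustration only.
    scores = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    print(round(kr20(scores), 2))  # about 0.67 for this toy matrix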

Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield
valid outcomes. As reliability improves, validity may improve (or it may not). However, if an
instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.

Predictive validity compares the measure in question with an outcome assessed at a
later time. An example of predictive validity is a comparison of scores in the National
Achievement Test (NAT) with first semester grade point average (GPA) in college. Do NAT
scores predict college performance? Construct validity refers to the ability of a test to measure
what it is supposed to measure. As a researcher, if you intend to measure depression but
actually measure anxiety, your research is compromised.

The following table is a standard followed almost universally in educational testing and
measurement:

Reliability       Interpretation
.90 and above     Excellent reliability; at the level of the best standardized tests
.80 - .90         Very good for a classroom test
.70 - .80         Good for a classroom test; in the range of most. There are probably a few
                  items which could be improved.
.60 - .70         Somewhat low. This test needs to be supplemented by other measures
                  (e.g., more tests) to determine grades. There are probably some items
                  which could be improved.
.50 - .60         Suggests need for revision of the test, unless it is quite short (ten or fewer
                  items). The test definitely needs to be supplemented by other measures
                  (e.g., more tests) for grading.
.50 or below      Questionable reliability. This test should not contribute heavily to the
                  course grade, and it needs revision.

ELABORATE

ACTIVITY 2: Who am I?
Directions: Identify the term being described by statements that follow. Write your answer on
the space provided before the number.
____________1. Refers to a statistical technique that helps instructors identify the effectiveness
of their test items.
____________2. Refers to the proportion of students who got the test item correctly.
____________3. The difference between the proportion of the top scorers who got an item
correct and the proportion of the bottom scorers who got the item right.
____________4. It is concerned with how easy or difficult a test item is.
____________5. An adjective that describes an effective distracter.

ACTIVITY 3: TRUE or False!


Directions: Write TRUE if the statement is correct and FALSE if it is wrong on the space
provided before the number.
________1. Difficulty index indicates the proportion of students who got the item right.
________2. Difficulty index indicates the proportion of students who got the item wrong.


________3. A high percentage indicates an easy item/question and a low percentage indicates
a difficult item.
________4. Authors agree, in general, that items should have values of difficulty no less than
20% correct and no greater than 80%.
________5. Very difficult or very easy items contribute greatly to the discriminating power of a
test.
________6. The discrimination index range is between -1 and +2.
________7. The farther the index is to +1, the more effectively the item distinguishes between
the two groups of students.
________8. When an item discriminates negatively, such item should be revised and eliminated
from scoring.
________9. A positive discrimination index indicates that the lower performing students actually
selected the key or correct response more frequently than the top performers.
________10. If no one selects a distracter it is important to revise the option and attempt to
make the distracter a more plausible choice.

EVALUATE

ACTIVITY 4: Problem Solving


A. Directions: Solve for the difficulty index of each test item.
Item No.   No. of Correct Responses   No. of Students   Difficulty Index / Interpretation   Action / Decision
1          2                          50
2          10                         30
3          20                         30
4          30                         30

B. Directions: Solve for the discrimination index of the following test items.
Item Number   Group   No. of Correct Responses   No. of Students   Discrimination Index / Interpretation   Action / Decision
1             UG      12                         25
              LG      20                         25
2             UG      10                         25
              LG      20                         25
3             UG      20                         25
              LG      10                         25
4             UG      10                         25
              LG      24                         25

ACTIVITY 5: Question and Answer (HOTS)


Directions: Answer the given statements or questions that follow. Write your answer on the
space provided.
1. Enumerate the three types of validity evidence. Which of these types of validity is the most
difficult to measure? Why?

__________________________________________________________________________
__________________________________________________________________________

__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________

2. What is the relationship between validity and reliability? Can a test be reliable and yet not
valid? Illustrate.

__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________

3. Why is item analysis important in assessment?

__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________

Closure:

In this module, you were able to learn different terms related to item analysis, such as
difficulty index, discrimination index, plausibility, validity and reliability. These terms cover the
things to be done in order to decide whether a certain test item can be kept in a test bank. To
do so, the test items must go through a try-out and then be analyzed. This particular aspect of
assessment is sometimes neglected by teachers due to time constraints and the number of
students and classes they are handling. Item analysis needs ample time to be done, but it
results in more reliable and valid test items to be used in the future. Be patient in analyzing
test items.
