Module 6
ASSESSMENT OF LEARNING 1 (EDUC 313)
ENGAGE
Welcome, students! In the previous module, you constructed your TOS and test items by following the set guidelines. I hope that you can develop improved versions of those items as you go further in this course. Remember that you will be in the classroom someday, assessing your students both traditionally and authentically; that is why you really need more practice in writing test items that are aligned with the set standards, competencies, and objectives.
Test items, especially those of the multiple-choice type, should be tried out and validated before they can be kept in a test bank. A test bank is a repository where good, well-developed questions are kept so they can be used in future examinations. This is what the Professional Regulation Commission (PRC) does: it keeps good test items for use in succeeding board exams. As a teacher, what should you do with your drafted test items so that they can be used again in future examinations?
This module will guide you through the steps needed to look more deeply into the quality of the test items you have prepared, so that you can decide whether each item should be accepted, rejected, or improved.
EXPLORE
3. Validity ___________________________________________________________________
___________________________________________________________________________
4. Reliability _________________________________________________________________
___________________________________________________________________________
5. Plausibility ________________________________________________________________
___________________________________________________________________________
EXPLAIN
INTRODUCTION
The teacher normally prepares a draft of the test. This draft is subjected to item analysis and validation in order to ensure that the final version of the test will be useful and functional. First, the teacher tries out the draft on a group of students with characteristics similar to those of the intended test takers (the try-out phase). From the try-out group, each item is analyzed in terms of its ability to discriminate between those who know and those who do not know, and in terms of its level of difficulty (the item analysis phase). The item analysis provides information that allows the teacher to decide whether to revise or replace an item (the item revision phase). Finally, the final draft of the test is subjected to validation if the intent is to use the test as a standard test for the particular unit or grading period.
The difficulty index of an item is the proportion of the students who answered the item correctly.
Example: What is the item difficulty index of an item if 25 students are unable to answer it correctly while 75 answered it correctly? Here, the total number of students is 100; hence, the item difficulty index is 75/100 or 75%.
Another example: 25 students answered the item correctly while 75 students did not. The total number of students is 100, so the difficulty index is 25/100 or 0.25, which is 25%. This item is more difficult than the one with a difficulty index of 75%.
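To make the computation concrete, here is a minimal Python sketch of the difficulty index as just defined; the function name is illustrative, not part of the module.

```python
def difficulty_index(num_correct, num_students):
    """Proportion of students who answered the item correctly, in percent."""
    return num_correct / num_students * 100

# The two worked examples above:
print(difficulty_index(75, 100))  # 75.0 -> the easier item
print(difficulty_index(25, 100))  # 25.0 -> the more difficult item
```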
One problem with this type of difficulty index is that it may not actually indicate whether the item is difficult (or easy). A student who does not know the subject matter will naturally be unable to answer the item correctly even if the question is easy. How do we decide, on the basis of this index, whether the item is too difficult or too easy?
Difficult items tend to discriminate between those who know and those who do not know the answer. Conversely, easy items cannot discriminate between these two groups of students. We are therefore interested in deriving a measure that will tell us whether an item can discriminate between these two groups of students. Such a measure is called an index of discrimination.
An easy way to derive such a measure is to determine how difficult an item is with respect to those in the upper 25% of the class and how difficult it is with respect to those in the lower 25% of the class. If the lower 25% found the item difficult while the upper 25% did not, then the item discriminates properly between these two groups. Thus:
The discrimination index is the difference between the proportion of the top scorers who got an item correct and the proportion of the bottom scorers who got the item correct. The discrimination index lies between -1 and +1. The closer the discrimination index is to +1, the more effectively the item can discriminate or distinguish between the two groups of students. A negative discrimination index means that more students from the lower group got the item correct; such an item is not good and so must be discarded.
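A matching sketch for the discrimination index described above, assuming we already have the proportion of each group answering correctly; the example values are invented for illustration.

```python
def discrimination_index(p_upper, p_lower):
    """Difference between the proportion of top scorers and the proportion
    of bottom scorers who answered the item correctly. The result lies
    between -1 and +1; values near +1 discriminate well, while negative
    values mean the item should be discarded."""
    return p_upper - p_lower

# Hypothetical items: proportion of each group answering correctly
print(discrimination_index(0.75, 0.25))  # 0.5 -> discriminates well
print(discrimination_index(0.25, 0.75))  # -0.5 -> negative: discard
```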
From these discussions, let us agree to discard or revise all items that have a negative discrimination index, for although such items discriminate between the upper and lower 25% of the class, the content of the item itself may be highly dubious or doubtful.
As in the case of the index of difficulty, we have the following rule of thumb:
Index Range        Interpretation                                    Action
-1.0 to -0.50      Can discriminate, but the item is questionable    Discard
Example: Consider an item in a multiple-choice test taken by 80 students, for which the following data were obtained:

Item 1         Options
               A      B*     C      D
Total          0      40     20     20
Upper 25%      0      15     5      0
Lower 25%      0      5      10     5
The correct response is B. Let us compute the difficulty index and the index of discrimination:

Difficulty Index = no. of students getting the correct response / total
= 40/80 = 50%, which is within the range of a good item (right difficulty)

Discrimination Index = proportion of the upper group answering correctly - proportion of the lower group answering correctly
= 15/20 - 5/20 = 0.75 - 0.25 = 0.50, which indicates good discrimination
It is also instructive to note that distracter A is not an effective distracter, since it was never selected by the students; it is an implausible distracter. Distracters C and D appear to have good appeal as distracters; they are plausible distracters.
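The analysis of this table can be carried out in code. The following sketch, with an illustrative function and the data of the example above, reports the difficulty index, the discrimination index, and any implausible distracters (options nobody selected).

```python
def analyze_item(totals, upper, lower, key):
    """totals, upper, lower: dicts mapping each option letter to the number
    of students choosing it (all examinees, upper 25%, lower 25%).
    key: the correct option."""
    n_total = sum(totals.values())            # all examinees
    n_group = sum(upper.values())             # size of the upper (= lower) group
    difficulty = totals[key] / n_total * 100  # percent answering correctly
    discrimination = (upper[key] - lower[key]) / n_group
    # Distracters that nobody picked are implausible and need revision
    implausible = [opt for opt, n in totals.items() if opt != key and n == 0]
    return difficulty, discrimination, implausible

# The example item above (correct answer B, 80 examinees):
totals = {"A": 0, "B": 40, "C": 20, "D": 20}
upper  = {"A": 0, "B": 15, "C": 5,  "D": 0}
lower  = {"A": 0, "B": 5,  "C": 10, "D": 5}
print(analyze_item(totals, upper, lower, "B"))
# (50.0, 0.5, ['A']): right difficulty, good discrimination,
# but distracter A is implausible
```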
Index of Difficulty

$$P = \frac{R_U + R_L}{T} \times 100$$

Where:
R_U --- the number in the upper group who answered the item correctly
R_L --- the number in the lower group who answered the item correctly
T --- the total number who tried the item

For example, if 6 students in the upper group and 2 students in the lower group answered an item correctly out of 20 students who tried it, then:

$$P = \frac{6 + 2}{20} \times 100 = 40\%$$

The smaller the percentage figure, the more difficult the item.

Index of Discrimination

$$D = \frac{R_U - R_L}{\frac{1}{2}T}$$

For the same item:

$$D = \frac{6 - 2}{10} = 0.40$$
For classroom achievement tests, most test constructors desire items with indices of difficulty no lower than 20 nor higher than 80, with an average index of difficulty from 30 or 40 to a maximum of 60.
The INDEX OF DISCRIMINATION is the difference between the proportion of the upper group who got an item right and the proportion of the lower group who got the item right. This index is dependent upon the difficulty of an item. It may reach a maximum value of 100 for an item with an index of difficulty of 50, that is, when 100% of the upper group and none of the lower group answer the item correctly. For items with a difficulty of less than or greater than 50, the index of discrimination has a maximum value of less than 100.
Tests with high internal consistency consist of items with mostly positive relationships with the total test score. In practice, values of the discrimination index seldom exceed .50 because of the differing shapes of item and total-score distributions. ScorePak® classifies item discrimination as "good" if the index is above .30, "fair" if it is between .10 and .30, and "poor" if it is below .10.
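These rules of thumb translate directly into code. Here is a minimal sketch using the thresholds quoted above; how the exact boundary values (.10 and .30) are assigned is our own choice, since the text does not specify.

```python
def classify_discrimination(d):
    """ScorePak-style labels from the thresholds quoted above."""
    if d > 0.30:
        return "good"
    if d >= 0.10:
        return "fair"
    return "poor"

def acceptable_difficulty(p):
    """True when the difficulty index (in percent) falls in the
    20-80 band that most test constructors desire."""
    return 20 <= p <= 80

print(classify_discrimination(0.50))  # good
print(classify_discrimination(0.05))  # poor
print(acceptable_difficulty(50))      # True
print(acceptable_difficulty(90))      # False (too easy)
```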
A good item is one that has good discriminating ability and a sufficient level of difficulty (not too difficult nor too easy).
At the end of the item analysis report, test items are listed according to their degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor). These distributions provide a quick overview of the test and can be used to identify items which are not performing well and which can perhaps be improved or discarded.
Validity. Validity is the extent to which a test measures what it purports to measure; it also refers to the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results. These two definitions of validity differ in the sense that the first refers to the test itself while the second refers to the decisions the teacher makes based on the test. A test is valid when it is aligned with the learning outcome.
A teacher who conducts test validation might want to gather different kinds of evidence.
There are essentially three main types of evidence that may be collected: content-related
evidence of validity, criterion-related evidence of validity and construct-related evidence
of validity. Content-related evidence of validity refers to the content and format of the
instrument. How appropriate is the content? How comprehensive? Does it logically get at the
intended variable? How adequately does the sample of items or questions represent the content
to be assessed?
Criterion-related evidence of validity refers to the relationship between scores obtained using the instrument and scores obtained using one or more other tests (often called the criterion). How strong is this relationship? How well do such scores estimate present performance or predict future performance of a certain type?
Construct-related evidence of validity refers to the nature of the psychological
construct or characteristic being measured by the test. How well does a measure of the
construct explain differences in the behavior of the individuals or their performance on a certain
task?
The usual procedure for determining content validity may be described as follows. The teacher writes out the objectives of the test based on the Table of Specifications and then gives these, together with the test, to at least two (2) experts, along with a description of the intended test takers. The experts look at the objectives, read over the items in the test, and place a check mark in front of each question or item that they feel does not measure one or more of the objectives. They also place a check mark in front of each objective not assessed by any item in the test. The teacher then rewrites any checked item and resubmits it to the experts, and/or writes new items to cover those objectives not covered by the existing test. This continues until the experts approve all the items and agree that all of the objectives are sufficiently covered by the test.
In order to obtain evidence of criterion-related validity, the teacher usually compares scores on the test in question with scores on some other independent criterion test which presumably already has high validity. For example, if a test is designed to measure the mathematics ability of students and it correlates highly with a standardized mathematics achievement test (the external criterion), then we say we have high criterion-related evidence of validity. In particular, this type of criterion-related validity is called concurrent validity. Another type of criterion-related validity is called predictive validity, wherein the test scores in the instrument are correlated with scores on a later performance (the criterion measure) of the students. For example, scores on the mathematics ability test constructed by the teacher may be correlated with the students' later performance in a Division-wide mathematics achievement test.
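Criterion-related evidence is usually summarized as a correlation coefficient. Here is a minimal sketch using Python's standard library, with invented scores for illustration; a high Pearson r between the teacher-made test and the criterion test would count as concurrent-validity evidence.

```python
from statistics import correlation  # available in Python 3.10+

# Invented scores for eight students on the teacher-made test and on a
# standardized criterion test
teacher_test   = [12, 18, 25, 30, 34, 40, 44, 48]
criterion_test = [15, 20, 22, 31, 35, 38, 47, 50]

# A high Pearson r between the two sets of scores is taken as
# criterion-related (here, concurrent) evidence of validity
r = correlation(teacher_test, criterion_test)
print(round(r, 2))  # close to 1, since the two score lists rise together
```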
In summary, content validity refers to how well the test items reflect the knowledge actually required for a given topic area (e.g., math). It requires the use of recognized subject matter experts to evaluate whether test items assess defined outcomes. Does a pre-employment test measure effectively and comprehensively the abilities required to perform the job? Does an English grammar test measure effectively the ability to write good English?
Reliability
Reliability refers to the consistency of the scores obtained: how consistent they are for each individual from one administration of an instrument to another, and from one set of items to another. We have already given formulas for computing the reliability of a test; for internal consistency, for instance, we could use the split-half method or the Kuder-Richardson formulae (KR-20 or KR-21).
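The module does not restate the KR-20 formula, so for reference: KR-20 = [k/(k-1)] x (1 - sum(pq)/s^2), where k is the number of items, p is the proportion of students answering an item correctly, q = 1 - p, and s^2 is the variance of the total scores. Below is a minimal Python transcription, using the population variance (a common convention) and invented data.

```python
from statistics import pvariance

def kr20(item_scores):
    """Kuder-Richardson formula 20 for dichotomously scored (0/1) items.
    item_scores: one list of item scores per student.
    KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores)."""
    k = len(item_scores[0])                        # number of items
    n = len(item_scores)                           # number of students
    totals = [sum(student) for student in item_scores]
    var_total = pvariance(totals)                  # variance of total scores
    sum_pq = 0.0
    for i in range(k):
        p = sum(student[i] for student in item_scores) / n
        sum_pq += p * (1 - p)                      # q = 1 - p
    return k / (k - 1) * (1 - sum_pq / var_total)

# Invented data: four students, five 0/1-scored items
scores = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(round(kr20(scores), 2))  # 0.82 -> very good for a classroom test
```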
Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid results. As reliability improves, validity may improve (or it may not). However, if an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.
Predictive validity compares the question with an outcome assessed at a later time. An example of predictive validity is a comparison of scores in the National Achievement Test (NAT) with first-semester grade point average (GPA) in college: do NAT scores predict college performance? Construct validity refers to the ability of a test to measure what it is supposed to measure. If, as a researcher, you intend to measure depression but actually measure anxiety, your research is compromised.
The following table shows a standard followed almost universally in educational testing and measurement:
Reliability        Interpretation
.90 and above      Excellent reliability; at the level of the best standardized tests.
.80 - .90          Very good for a classroom test.
.70 - .80          Good for a classroom test; in the range of most. There are probably a few items which could be improved.
.60 - .70          Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.50 - .60          Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
.50 or below       Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
ELABORATE
ACTIVITY 2: Who am I?
Directions: Identify the term being described by each of the statements that follow. Write your answer on the space provided before the number.
____________1. Refers to a statistical technique that helps instructors identify the effectiveness of their test items.
____________2. Refers to the proportion of students who got the test item correct.
____________3. The difference between the proportion of the top scorers who got an item correct and the proportion of the bottom scorers who got the item right.
____________4. Is concerned with how easy or difficult a test item is.
____________5. An adjective that describes an effective distracter.
________3. A high percentage indicates an easy item/question and a low percentage indicates
a difficult item.
________4. Authors agree, in general, that items should have values of difficulty no less than 20% correct and no greater than 80%.
________5. Very difficult or very easy items contribute greatly to the discriminating power of a
test.
________6. The discrimination index range is between -1 and +2.
________7. The farther the index is to +1, the more effectively the item distinguishes between
the two groups of students.
________8. When an item discriminates negatively, such item should be revised and eliminated
from scoring.
________9. A positive discrimination index indicates that the lower performing students actually
selected the key or correct response more frequently than the top performers.
________10. If no one selects a distracter it is important to revise the option and attempt to
make the distracter a more plausible choice.
EVALUATE
B. Directions: Solve for the discrimination index of the following test items.
Item Number   Group   No. of Correct Responses   No. of Students   Discrimination Index / Interpretation   Action / Decision
1             UG      12                         25
              LG      20                         25
2             UG      10                         25
              LG      20                         25
3             UG      20                         25
              LG      10                         25
4             UG      10                         25
              LG      24                         25
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
2. What is the relationship between validity and reliability? Can a test be reliable and yet not
valid? Illustrate.
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
Closure:
In this module, you learned different terms related to item analysis, such as difficulty index, discrimination index, plausibility, validity, and reliability. These constitute the things to be done in order to conclude that a certain test item deserves to be kept in a test bank. To get there, the test items must go through a try-out and then be analyzed. This particular aspect of assessment is sometimes neglected by teachers due to time constraints and the number of students and classes they handle. Item analysis needs ample time, but it results in more reliable and valid test items for future use. Be patient in analyzing test items.