Chapter 3. Development of Tools for Classroom-Based Assessment
Learning Outcomes:
At the end of the chapter, the students are able to:
1. identify the different types of tests;
2. implement the essential steps in the test development process, which include:
a. re-examination of the target outcomes;
b. determining the desired competencies to be measured;
c. preparing a Table of Specification (TOS); and
d. construction of valid and appropriate classroom assessment tests for
measuring learning outcomes;
3. calculate the validity and reliability of a prepared test;
4. identify blunders in a constructed test; and
5. illustrate the test development process.
Format 1:
Content                              Number of Items
1. Importance of Research                   6
2. Types of Research                       12
3. Qualities of a Good Researcher           8
4. The Research Process                    14
Total                                      40
Format 2:
Topics                            Cognitive Level   Type of Test           Item Number   Total Points
1. Importance of Research         Remembering       Enumeration            9-14           6
2. Types of Research              Evaluating        Constructed response   15-26         12
3. Qualities of a Good            Understanding     True or False          1-8            8
   Researcher
4. The Research Process           Creating          Creating a diagram     27-40         14
Total                                                                                    40
Format 3:
Specific Objectives                      No. of Class   No. of   K-C   A   HOTS   Item
                                         Sessions       Items                     Distribution
1. List the importance of research       1½              6        /               9-14
2. Identify and justify the type of
   research that will best address a
   given research question               3              12                  /     15-26
3. Distinguish a statement that
   describes a good researcher           2               8        /               1-8
Format 4:
Content                   Class       Rem   Und   App   An   Eval   Crea   Total   Item
                          Sessions                                         Items   Dist
                          (by hour)
1. Importance of          1½           /                                     6     9-14
   Research
2. Types of Research      3                                    /            12     15-26
3. Qualities of a Good    2                  /                               8     1-8
   Researcher
4. The Research Process   3½                                          /     14     27-40
Total                     10                                                40

Rem – Remembering; Und – Understanding; App – Applying; An – Analyzing;
Eval – Evaluating; Crea – Creating
In deciding on the number of items per subtopic, the formula below is observed:

Number of items for a topic = (No. of class sessions for the topic ÷ Total number of class sessions) × Desired total number of items

Ex. For the topic on the importance of research, the following are the given:
Number of class sessions – 1½
Desired total number of items – 40
Total number of class sessions – 10

Number of items = (1.5 ÷ 10) × 40 = 6
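The allocation can be double-checked with a few lines of code. This is a minimal sketch; the function name allocate_items is illustrative, not from the source.

```python
# Items per topic = (class sessions for the topic / total class sessions) x total items.
# The function name `allocate_items` is illustrative, not from the source.

def allocate_items(topic_sessions, total_sessions, total_items):
    return topic_sessions / total_sessions * total_items

# Values from the TOS above: 10 class sessions in all, 40 items desired
for topic, sessions in [("Importance of Research", 1.5),
                        ("Types of Research", 3),
                        ("Qualities of a Good Researcher", 2),
                        ("The Research Process", 3.5)]:
    print(topic, allocate_items(sessions, 10, 40))
# Prints 6.0, 12.0, 8.0, and 14.0 items, matching the table above
```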
The length of time, the type of test, and the type of item used are also factors
considered in determining the number of items to be constructed in a test. Gabuyo (2012)
presents an estimated average time to answer a specific type of test (see Table 4).
Test Design and Construction Phase. This is where test items are designed and
constructed following the appropriate item format for the specified learning outcomes of
instruction. The test items are dependent upon the educational outcomes and
materials/topics to be tested. This phase includes the following steps:
1. Item Construction. According to Osterlind (1989), the perils of writing test items
without adequate forethought are great. Decisions about persons, programs,
projects, and materials are often made on the basis of test scores. If a test is made
up of items haphazardly written by untutored persons, the resulting decisions could
be erroneous. Such errors can sometimes have serious consequences for learners,
teachers, and the institution as a whole. Performances, programs, projects, and
materials could be misjudged. Obviously, such a disservice to examinees as well as to
the evaluation process should be avoided if at all possible.
To help classroom teachers improve the quality of test construction, Kubiszyn
and Borich (2007) suggested the following general guidelines for writing test
items:
Begin writing items far enough in advance so that you will have time to
revise them.
Match items to intended outcomes of an appropriate level of difficulty to
provide a valid measure of instructional objectives. Limit the question to
the skill being assessed.
Be sure each item deals with an important aspect of the content area and
not with trivia.
Be sure the problem posed is unambiguous.
Be sure that the item is independent of all other items. The answer to
one item should not be required as a condition in answering the next
item. A hint to one answer should not be embedded in another item.
Be sure the item has one correct or clearly best answer on which experts
would agree.
Prevent unintended clues to an answer in the statement or question.
Grammatical inconsistencies such as a or an give clues to the correct
answer to those students who are not well prepared for the test.
Avoid replication of the textbook in writing test items. Do not quote
directly from the textual materials.
Avoid tricky questions in an achievement test. Do not waste time testing
how well the students can interpret your intentions.
Try to write items that require higher-order thinking skills.
Types of Test
To create effective tests, the teacher needs to familiarize himself/herself with
the different types of tests and avoid any pitfalls in test construction through helpful
and definitive guidelines.
Objective Test. This test consists of questions or items that require factual
answers. This test can be quickly and unambiguously scored by the teacher or
anyone who has the answer key. The response options are structured so that
they can easily be marked as correct or incorrect, thus minimizing subjectivity or
bias on the part of the scorer.
a. Selection Test. In this test type, the students select the best possible
answer/s from the choices that are already given and do not need to
recall facts or information from their memory.
a.1 True-false test contains items with only two fixed choices (binary
options). The students simply recognize a correct or an incorrect
statement that can stand for their knowledge and understanding of
facts or information shared.
TRUE-FALSE TEST
Advantages:
o True-false items are easy to formulate.
o The score is more objective than the essay test.
o It covers a large range of content in a short span of time.
o The test is easy to formulate and quick to check and score.
o It is easier to prepare compared to multiple-choice and matching-type tests.
Disadvantages:
o There is a high probability of guessing.
o It is not well-suited for measuring complex mental processes.
o It often measures low-level thinking skills that are limited to the ability to
  recognize, recall, and understand information.
Guidelines
Keep the statement direct, brief, and concise but complete.
Each statement should only focus on a single idea unless it has the
intention to show a cause-and-effect relationship.
Use approximately the same number of true and false statements.
Don’t copy statements directly taken from the textbook.
Specify clearly in the directions where and how the students should
mark their answers.
Arrange the true and false items in random order to minimize the
chance for students to detect a pattern of responses.
BEWARE of using
o trivial and tricky questions;
o opinion-based statements, unless such a statement is attributed
  to an author, expert, or a proponent;
o superlatives such as best, worst, largest, etc.;
o negatives or double negatives. If these cannot be avoided, bold
  or underline the negative words to call the attention of the
  examinees; and
o clues to the correct choice through specific determiners such as
  some, sometimes, and many that tend to appear in true
  statements; and never, always, all, none that tend to appear in
  statements that are false.
a.2 Matching type test provides two columns for learners to connect or match words,
phrases, or sentences. Column A on the left side contains the descriptions, called
premises, and Column B on the right side contains the options for answers, called
responses. The items in Column A are numbered while the items in Column B are
labeled with capital letters. The convention is for learners to match the given
response on the right with the premise on the left.
a.3 Multiple-choice test requires test-takers to choose the correct answer from the list
of options given. It includes three parts: the stem, the key option, and the
incorrect options or alternatives. The stem presents the problem or question,
usually expressed in completion form or question form. The key option is the
correct answer. The incorrect options or alternatives are also called distractors or
foils.
MULTIPLE-CHOICE TEST
Advantages:
o It can be scored and analyzed efficiently, quickly, and reliably.
o It measures learning outcomes at various levels, from knowledge to evaluation.
o It measures almost any educational objective.
o It measures broad samples of content within a short span of time.
o Its questions/items can further be analyzed in terms of validity and reliability.
o If an item analysis is applied, it can reveal the difficulty of an item and its
  ability to discriminate between good- and poor-performing students.
Disadvantages:
o The development of good items in a test is time-consuming.
o Plausible distractors are hard to formulate.
o Test scores can be influenced by other factors such as the test-wiseness or
  reading ability of the examinees.
o It is not effective in assessing the problem-solving skills of the students.
o It is not applicable when measuring the ability to organize and express ideas.
Guidelines
Phrase each question concisely.
Use simple, precise, and unambiguous wording.
Avoid the use of trivial and tricky questions.
Use three to five options to challenge critical thinking and discourage
guessing.
Present a diagram, drawing, or illustration when students are asked to
apply, analyze, or evaluate ideas.
Use tables, figures, or charts when students are required to interpret
ideas.
Use pictures, if possible, when students are required to apply concepts
and principles.
Guidelines
The stem should:
o be written in question or completion form. If blanks are
provided in completion form, they are placed at the end and
NOT at the beginning or in the middle of the
sentence/statement;
o be clear and concise (does not use excessive/irrelevant
words);
o avoid using negative words such as not or except. If this
cannot be avoided, they are written in bold or capital letters
to call the attention of the examinee; and
o be free from grammatical clues and errors.
Options:
o are arranged in a logical order;
o are marked with capital letters;
o are listed vertically beneath the stem;
o provide only one correct or clearly best answer in each item;
o are kept independent and do not overlap with options in
other items;
o are homogeneous in content to raise the difficulty level of an
item;
o are of uniform or equal length as much as possible; and
o avoid or sparingly use the phrases “all of the above” and
“none of the above.”
Distractors:
o should be plausible and effective but not too attractive to be
mistaken by most students as the correct answer. Each
distractor should be chosen by at least 5% of the examinees
but not more than the key answer;
o should be equally familiar to the examinees; and
o should not be constructed for the purpose of tricking the
examinees.
b. Supply Test. The supply test, otherwise known as the constructed-response test,
requires students to create and supply their own answers or perform a certain task
to show mastery of knowledge or skills rather than choosing an answer to a
question. It includes short-answer, completion-type, and essay-type items. These tests
can be categorized as either objective or subjective. They are in objective form when
definite answers are required from the examinees and the scoring is stable, that is,
not influenced by the judgment of the scorers. On the other hand, they are in
subjective form when students are allowed to answer items in the test in their own
words or using their original ideas.
b.1 Short-answer test contains items that ask students to provide exact answers.
Rather than simply choosing from several options provided, the examinees
either provide clearly defined answers or compose short phrases for answers.
Guidelines
Clearly specify in the test directions how the questions should be
answered.
Frame questions/items using words that are easily understood by the
examinees.
Restate ideas in your own words; do not copy exact wordings from the text.
Make sure that each item calls for a factually correct answer.
COMPLETION TEST
Advantages:
o It is easy to construct.
o It minimizes guessing.
o It has wider coverage in terms of content.
Disadvantages:
o It is more difficult or tedious to score than other objective types of tests.
o It is typically not suitable for measuring complex learning outcomes.
Guidelines
Only the keywords to be supplied by the examinees should be omitted.
The item should require a single-word answer or brief answers.
Use only one blank per item. Preferably, place it at the end of the
statement.
Blanks provided should be equal in length. Their length should provide
sufficient space for the answer.
Do not use indefinite statements that allow varying answers.
Indicate the units (e.g., cm, ft, in) when items require numerical
answers.
Avoid grammatical clues such as a or an.
Do not copy exact sentences from textbooks.
Subjective Test. This test allows students to organize and present answers in their
own words or using their original ideas. This test can be influenced by the judgment
or opinion of the examinees and the scorers; nevertheless, it allows the assessment
of aspects in students’ performance that are complex and qualitative. Questions
raised may elicit varied answers that can be expressed in several ways.
b.3 Essay test is a subjective type of test that requires examinees to structure a long
written response to answer a question. This test measures complex cognitive
skills or processes and is usually scored on an opinion basis. It may require the
examinees to give definitions, provide interpretations, make evaluations or
comparisons, contrast concepts, and demonstrate knowledge of the
relationships (Morrow, et al., 2016).
b.3.1 Restricted response essays set limits on the content and response
given by the students. Limitations in the form of the response are well-
specified in the given essay question/item.
Scoring Rubric:
Points   Descriptor
4        The essay demonstrates complete knowledge and understanding of the
         topic. It uses clear and precise language.
3        The essay demonstrates very good knowledge and understanding of the
         topic. It uses clear language with occasional lapses.
2        The essay demonstrates good knowledge and understanding of the
         topic. It uses clear and precise language for the most part.
1        The essay demonstrates little knowledge and understanding of the
         topic. The language is only partly clear and accurate.
0        The essay demonstrates no real knowledge and understanding of the
         topic. The language is unclear and inaccurate.
ESSAY TEST
Advantages:
o It is most useful in assessing higher-order thinking skills.
o It is best for developing logical thinking and critical reasoning.
o It takes less time and is easy to construct.
o It largely eliminates guessing.
o It can effectively reveal personality and measure opinions and attitudes.
o It gives examinees freedom to plan their answers and respond within broad limits.
Disadvantages:
o It is difficult to check and score.
o It observes inconsistent and unreliable procedures for scoring.
o Test effectiveness is difficult to analyze and establish.
o Its reliability is often low because of the subjective scoring of the answers.
o It does not allow a larger sampling of content.
o It encourages bluffing. Scoring may be affected by good handwriting,
  neatness, grammar, etc.
o It entails excessive use of time for answering.
o Scores may be affected by personal biases or previous impressions.
Guidelines
Use rubrics for scoring an essay answer.
Do not begin with who or what in writing your essay question.
Use unambiguous wording of the essay questions/items.
Indicate the values or assigned points for every essay question/item.
All examinees should be required to answer the same essay
questions for valid and objective scoring.
Keep the students anonymous while checking the essay answers.
Evaluate all answers to one question before going on to the next.
Make sure that students have ample time to answer the essay test.
c.2 Product-based assessment focuses on the final assessable output and not on the
actual performance of making the product. Examples are portfolios,
multimedia presentations, posters, ads, and bulletin boards.
PERFORMANCE TEST
Advantages:
o They can measure complex learning outcomes in a natural setting.
o The students can apply the knowledge, skills, and values learned.
o They promote more active student engagement in an activity.
o They can help identify the students' strengths and weaknesses.
o They provide a more realistic way of assessing performance.
Disadvantages:
o Scoring procedures are generally subjective and unreliable.
o They demand a great amount of time for preparation, administration, and scoring.
o They can possibly be costly.
o They rely heavily on students' creativity and drive.
Guidelines
Focus on the skill or product to be tested. It should relate to the pre-
determined learning outcomes.
Provide clear directions on the task or product required. Clearly
communicate expectations.
Minimize dependence on skills that are not relevant to the intended
purpose.
Use rubrics to rate performance or product.
2. Test Assembling. After constructing the test items, arrange the test items. There are two
steps in assembling the test: (1) packaging the test; and (2) reproducing the test. Gabuyo
(2012) sets the following guidelines for assembling the test.
Group all test items with similar format.
Arrange test items from easy to difficult.
Space the test items for easy reading.
Keep items and options on the same page.
Place the illustrations near the description.
Check the answer key.
Decide where to record the answer.
3. Writing directions. All test directions must be complete, explicit, and simply worded. The
type of answer to be elicited from the learners must be clearly specified. The number of
items to which the directions apply, how to record the answers, the basis for selecting
answers, the criteria or system for scoring, and the time allotted for each type of test (if
so required) should also be indicated in the test instructions.
Example:
WEAK
Direction: Choose the best answer for each given question.

BETTER
Direction: Study the rubric below and identify the correct answer for the
questions that follow (marked as items 6, 7, 8, 9, and 10). Write the CAPITAL
LETTER corresponding to the correct answer on the space provided before
each number. (1 point each)
4. Checking on the assembled test items. Before reproducing the test, it is very important
to first proofread the test items for typographical and grammatical errors and make the
necessary corrections, if any. If possible, let others examine the test. This can save
time during the examination as it will no longer cause any distraction to the
students.
Table 5. Checklist for Checking the Test Items
Checklist Yes No
1. Are the test items appropriate to measure the set
learning outcome?
Reviewing Phase. In this phase, items in the test are examined in terms of their
validity and usability. The test will then be administered to a sample group for reliability
testing. The initial results will be subjected to an item analysis to determine the
discrimination and difficulty indices of the test before it is rolled out to the actual
participants.
1. Validating the test. This is done by checking on the relevance, appropriateness,
and meaningfulness of the test to the purpose it claims to serve. This is an
imperative requirement for the accurate application and interpretation of the
test results.
This validation can further be classified into two – concurrent and predictive
validity.
c. Construct Validity. This type of validity determines how well a test measures
what it is designed to measure. It is the ability of an assessment tool to
measure a theoretical or unobservable quality that it claims to test.
Is the test constructed in a way that it successfully tests what it claims to
test? Does the test measure the concept or construct that it is intended to
measure?
c.1 Convergent validity establishes that a test has a high level of correlation
with another test that measures the same construct.
The following are the factors that can lower the validity of a test:
o Ambiguity
o Unclear directions
o Errors in test scoring
o Inappropriate length of the test
o Flaws in test administration
o Poor construction of test items
o Identifiable clues or patterns for answers
o Inappropriate level of difficulty of test items
o Incorrect arrangement of test types and test items
o Insufficient time to answer the entire test
o Difficult to understand or incomprehensible wordings
Validity Coefficient
r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}

Where:
r_{xy} = correlation coefficient
n = number of cases
\sum x = summation of x scores
\sum y = summation of y scores
\sum xy = summation of the products of x and y scores
\sum x^2 = summation of squared x scores
\sum y^2 = summation of squared y scores
The validity coefficient is a number between 0 and 1.00 that indicates the
magnitude of the relationship. As a general rule, the higher the validity
coefficient, the more beneficial it is to use the test. Validity coefficients can be
interpreted as follows:
Example:

Students   Scores in Math   Scores in Criterion   xy     x²     y²
           Test (x)         Test (y)
1          14               12                    168    196    144
2          20               27                    540    400    729
3          25               25                    625    625    625
4          16               17                    272    256    289
5          30               30                    900    900    900
6          23               25                    575    529    625
7          10               16                    160    100    256
8          28               29                    812    784    841
9          20               19                    380    400    361
10         18               23                    414    324    529
11         9                7                     63     81     49
12         27               29                    783    729    841
13         29               26                    754    841    676
14         24               25                    600    576    625
15         5                15                    75     25     225
           ∑x=298           ∑y=325                ∑xy=7121  ∑x²=6766  ∑y²=7715
r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}

r_{xy} = \frac{15(7121) - (298)(325)}{\sqrt{\left[15(6766) - (298)^2\right]\left[15(7715) - (325)^2\right]}}

r_{xy} = \frac{106815 - 96850}{\sqrt{(101490 - 88804)(115725 - 105625)}}

r_{xy} = \frac{9965}{\sqrt{(12686)(10100)}}

r_{xy} = \frac{9965}{\sqrt{128128600}}

r_{xy} = \frac{9965}{11319.39}

r_{xy} = 0.88
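The same computation can be verified in a few lines of code. This is a minimal sketch (the function name pearson_r is an assumption, not from the source); it implements the raw-score formula above and prints 0.88 for the table's data. The same function can be reused for the test-retest reliability example later in this chapter.

```python
import math

def pearson_r(x, y):
    # Raw-score (computational) formula for the Pearson correlation coefficient
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))

# Math test (x) and criterion test (y) scores from the table above
x = [14, 20, 25, 16, 30, 23, 10, 28, 20, 18, 9, 27, 29, 24, 5]
y = [12, 27, 25, 17, 30, 25, 16, 29, 19, 23, 7, 29, 26, 25, 15]
print(round(pearson_r(x, y), 2))  # 0.88
```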
Coefficient of Determination. The coefficient of determination (r²) gives the
proportion of variance in the criterion that is accounted for by the test. For the
example above, r² = (0.88)² ≈ 0.77, so about 77% of the variance is shared.
2. Pilot testing. This step in the process means conducting a "test rehearsal" with a
try-out or sample group to test the validity of the test. A selected group of
learners tries answering the test, and their obtained scores in the test provide
useful feedback prior to the deployment of the test to the target group of
examinees. The data gathered from this experimental procedure helps the
teacher to perform a preliminary analysis of the feasibility, practicality, and
usability of the techniques, methods, and tools used in a test. Early detection of
probable problems and difficulties can take place before the actual test
administration. This will eventually reveal aspects of the test and its conduct
that need to be refined or improved.
The following are the important considerations in the pilot testing of a test:
o Create a plan. Identify the smaller sample to be tested. Determine the
time duration, cost, correspondences, and the persons to collaborate
with in the conduct of a test.
o Prepare for the pilot test. Set the time and venue for the conduct of the pilot
test. Check on the condition of the test-takers. Try to eliminate aspects
of the environment that will threaten the validity of the test, such as the
lighting, ventilation, and orderliness of the test/examination room.
o Deploy the test. Regulate the test-takers and the condition of the venue.
Make sure that the test can obtain truthful and objective information.
Keep an eye on any form of distraction or interruption during the conduct
of the test. Address the specific needs of the test-takers. Provide clear
and complete answers in case they ask questions or clarifications about
the test.
o Assess and evaluate the pilot test. Reflect on the pilot-testing activity
that took place. Identify flaws in the process and devise a plan so they
can be avoided during the actual test. Organize and collate scores for
further analysis.
3. Item Analysis.
After the conduct of pilot-testing, the teachers score and look into the quality
of each item in the test. This procedure helps the teachers identify good items
by doing an item analysis. Item analysis is a process that examines student
responses to individual test items (questions) in order to assess the quality of
those items and the test as a whole (University of Washington Office of
Educational Assessment, 2020). This is certainly vital in improving items in later
tests. Also, the teachers can clearly spot and eliminate ambiguous or misleading
items, design appropriate remediation, and construct a test bank for future use.
The most common method employed for item analysis is the Upper-Lower
(U-L) index method (Stocklein cited in Gutierrez, 2020). This analysis provides
teachers with three types of information which include (a) difficulty index, (b)
discrimination index, and (c) distractor or option-response analysis.
Formula:

\text{Difficulty Index} = \frac{\%\text{ of the upper group who got the item right} + \%\text{ of the lower group who got the item right}}{2}
The discrimination index is the power of the test item to discriminate between those
who scored high (in the upper 27%) and those who scored low (in the lower 27%) on the
total test.
Formula:
Discrimination Index = % U – % L
If an item obtains an acceptable level of difficulty (index ranges from 0.41 to 0.60)
and discrimination (index is 0.40 and above), it is considered a good item and must be
retained. If an item is unacceptable in either the difficulty or the discrimination index, it is
considered fair and must be revised. Finally, if an item is unacceptable in both
indices, it is considered a poor item and, therefore, must be rejected.
Table 9. Guide to Making Decisions for the Retention, Revision, and Rejection of Test Items
(14 students in the upper 27% group and 14 in the lower 27% group)

Item     Upper 27%   %      Lower 27%   %      Diff    Remarks                Discr   Remarks     Decision
Number                                         Index                          Index
1        12          0.86   3           0.21   0.54    moderately difficult    0.65   very good   retain
2        14          1.00   7           0.50   0.75    easy                    0.50   very good   revise
3        7           0.50   10          0.71   0.61    easy                   -0.21   poor        reject
4        12          0.86   6           0.43   0.65    easy                    0.43   very good   revise
5        10          0.71   4           0.29   0.50    moderately difficult    0.42   very good   retain
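The retention/revision/rejection rules above are mechanical enough to script. Below is a minimal sketch (the function name analyze_item is illustrative, not from the source); it reproduces the Table 9 decisions, though the last decimal of an index may differ slightly because the table rounds the group percentages before combining them.

```python
def analyze_item(upper_correct, lower_correct, group_size):
    pu = upper_correct / group_size      # proportion of the upper 27% answering correctly
    pl = lower_correct / group_size      # proportion of the lower 27% answering correctly
    difficulty = (pu + pl) / 2
    discrimination = pu - pl
    diff_ok = 0.41 <= difficulty <= 0.60   # acceptable difficulty (per the text)
    discr_ok = discrimination >= 0.40      # acceptable discrimination (per the text)
    if diff_ok and discr_ok:
        decision = "retain"   # good item
    elif diff_ok or discr_ok:
        decision = "revise"   # fair item
    else:
        decision = "reject"   # poor item
    return round(difficulty, 2), round(discrimination, 2), decision

# Table 9 data: (upper 27% correct, lower 27% correct), 14 students per group
for item, (u, l) in enumerate([(12, 3), (14, 7), (7, 10), (12, 6), (10, 4)], start=1):
    print(item, analyze_item(u, l, 14))
# Decisions: retain, revise, reject, revise, retain
```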
Example: Let us assume that 100 students took the test. If A is the key (right
answer) and the item difficulty is 0.60, then 60 students answered correctly.
What about the remaining 40 students and the effectiveness of the three
distractors?
Difficulty   A       B             C             D             Remarks
Index        (Key)   (Distractor)  (Distractor)  (Distractor)
4. Reliability Testing
Effective assessments are dependable and consistent, yielding reliable evidence
(Suskie, 2018). Reliability refers to the consistency of a measure, which can be
affected by the clarity of the assessment tool and the capability of those who use
it. Reliable assessment tools generate repeatable and consistent results over
time (test-retest), across items (internal consistency), and across different raters
or evaluators (inter-rater reliability). If the test tool or instrument is unreliable, it
cannot produce a valid outcome.
a. Test-retest reliability is estimated by administering the same test to the same
group on two occasions and correlating the two sets of scores:

r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
Students   Test Scores,    Re-test Scores,   xy      x²      y²
           1st Run (x)     2nd Run (y)
1          17              22                374     289     484
2          49              28                1372    2401    784
3          26              12                312     676     144
4          32              30                960     1024    900
5          12              40                480     144     1600
6          27              31                837     729     961
7          18              17                306     324     289
8          38              27                1026    1444    729
9          33              40                1320    1089    1600
10         24              30                720     576     900
11         9               18                162     81      324
12         34              35                1190    1156    1225
13         33              41                1353    1089    1681
14         46              38                1748    2116    1444
15         29              39                1131    841     1521
           ∑x=427          ∑y=448            ∑xy=13291  ∑x²=13979  ∑y²=14586
r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}

r_{xy} = \frac{199365 - 191296}{\sqrt{(209685 - 182329)(218790 - 200704)}}

r_{xy} = \frac{8069}{\sqrt{(27356)(18086)}}

r_{xy} = \frac{8069}{22243.22}

r_{xy} = 0.36
The obtained r_{xy} = 0.36, which is lower than 0.50, indicates that the test has a
questionable level of reliability. It will not contribute to successfully meeting
the desired course outcomes and therefore needs to be thoroughly reviewed
and revised.
b. Internal consistency gauges how well the items in an instrument can produce
consistent or similar results on multiple items measuring the same construct.
If people’s responses to the different items are not correlated with each
other, then it would no longer make sense to claim that test items are all
measuring the same underlying construct.
Formula:
\rho_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum p_i q_i}{\sigma^2}\right)

Where:
k = number of items in the test
p_i = proportion of examinees passing item i
q_i = proportion of examinees failing item i
\sigma^2 = variance of the total scores of all the people taking the test
Item:   1       2       3       4       5       6       7       8       9       10
p       0.588   0.765   0.765   0.647   0.765   0.706   0.824   0.824   0.706   0.412
q       0.412   0.235   0.235   0.353   0.235   0.294   0.176   0.176   0.294   0.588
pq      0.242   0.18    0.18    0.228   0.18    0.208   0.145   0.145   0.208   0.242   ∑pq = 1.958
k = 17
σ² (variance) = 5.294
ρKR20 = 0.69
The yielded ρKR20 value of 0.69 means that the reliability of the instrument is
somewhat low but already within the acceptable range (0.60 or higher). The test,
however, needs to be supplemented by other measures to serve as the basis of grades,
and some items probably need to be improved.
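For dichotomously scored (right/wrong) items, the KR-20 computation can be sketched as follows. The score matrix is hypothetical and the function name kr20 is illustrative; note that k here is the number of items, following the standard formula.

```python
def kr20(scores):
    # scores: one list of 0/1 item scores per examinee (hypothetical data below)
    n = len(scores)
    k = len(scores[0])                    # k = number of items in the test
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in scores) / n   # proportion passing item i
        pq_sum += p * (1 - p)                   # p_i * q_i
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n  # population variance of totals
    return (k / (k - 1)) * (1 - pq_sum / variance)

# Hypothetical responses of 6 examinees to a 4-item quiz (1 = correct)
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(kr20(data), 2))  # 0.83 for this matrix
```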
Cronbach’s alpha is the most common measure of internal consistency when there
are multiple options given to answer an item in the test. An instrument that uses the Likert
Scale to do assessments in the affective domain can apply this formula to ensure its ability
to draw out reliable answers from the respondents. The formula for Cronbach’s coefficient
alpha is:
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_X^2}\right)
Where:
k = the number of items in a measure
\sigma_i^2 = the variance of each individual item
\sigma_X^2 = the variance of the total scores on the measure
This computation is more easily and efficiently done using Microsoft Excel.
Example: The researcher used the Likert scale for the respondents to rate their
attitude toward Mathematics as a subject. The respondents showed their
agreement or disagreement on the different items raised in the questionnaire. The
responses of the participants were coded from the lowest (strongly disagree) with
an assigned score of 1 to the highest (strongly agree) with a designated score of 5.
The data matrix below reflects the answers of the 19 respondents on the 5-item
test.
Step 1. On the top menu, click data analysis > two-way ANOVA without
replication > OK.
Step 2. The ANOVA without replication dialogue box will appear. Click the
input range > new worksheet ply > OK.
\alpha = 1 - \frac{MS_{Error}}{MS_{Rows}}

\alpha = 1 - \frac{0.388889}{3.251462}

\alpha = 1 - 0.119606

\alpha = 0.880394 \approx 0.88
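The same alpha can also be computed without Excel, directly from the item-variance formula given earlier; the result is algebraically equivalent to the 1 − MS_Error/MS_Rows route shown above. This is a minimal sketch with hypothetical Likert data; the function names are illustrative, not from the source.

```python
def variance(values):
    # Sample variance (n - 1 denominator); alpha is unchanged as long as
    # the same denominator is used for items and totals
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def cronbach_alpha(scores):
    # scores: one list of item ratings per respondent (e.g., 1-5 Likert)
    k = len(scores[0])                                        # number of items
    item_vars = sum(variance([row[i] for row in scores]) for i in range(k))
    total_var = variance([sum(row) for row in scores])        # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical ratings of 6 respondents on a 5-item attitude scale
ratings = [
    [5, 4, 5, 4, 5],
    [4, 4, 4, 5, 4],
    [3, 3, 2, 3, 3],
    [2, 3, 2, 2, 3],
    [4, 5, 4, 4, 4],
    [1, 2, 1, 2, 1],
]
print(round(cronbach_alpha(ratings), 2))  # about 0.97 for this matrix
```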
c. Inter-rater reliability. For example, two raters observed five learners who
executed the basic steps in a folk dance. The table shows how each
demonstration of skill was rated.
Learner   Rater 1   Rater 2
2         4         4
3         1         4
4         5         5
5         3         3
If there are multiple raters, add columns for the pairing of results and to
indicate the agreement.
Example:
The obtained result (39.8%, or 0.398) is below 0.50. It suggests that the test has a
questionable level of reliability. Revision of the test and the adoption of additional
measures to provide a valid assessment of students' performance are called for.
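A simple way to quantify agreement between two raters is the percentage of identical ratings. The sketch below is illustrative only; it assumes exact-match agreement and uses the folk-dance ratings above, so it does not reproduce the 39.8% figure, whose underlying data are not shown here.

```python
def percent_agreement(rater1, rater2):
    # Proportion of cases where the two raters gave identical ratings
    matches = sum(1 for a, b in zip(rater1, rater2) if a == b)
    return matches / len(rater1) * 100

# Ratings of learners 2-5 from the folk-dance example above
rater1 = [4, 1, 5, 3]
rater2 = [4, 4, 5, 3]
print(percent_agreement(rater1, rater2))  # 75.0 (3 of 4 ratings match)
```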
Activity B. Test Design and Construction. Based on the chosen topic and
the specified learning outcomes, construct a test (unit or periodic test). Use
a separate sheet with the layout indicated in Activity A. (maximum of 60 points)
Exercises.
II. Short Answer. Study the data in the table below and answer the questions
that follow. Write your answer in the space provided. (1 point each)
Using the results of a 5-item test given to 60 students, Teacher Trixie
wanted to determine which items need to be retained, revised, and rejected. She
encoded the data and determined the number of students who got each of the 5
items correct in both the upper and lower 27% groups.
___________________________
SENTENCE COMPLETION
Fill in the blanks with the correct word to complete the
statement.
1. A test is a ___________ used to establish the
quality, __________, or reliability of ________,
TRUE OR FALSE
Tell whether the statement is true or false.
2. Scoring an essay test is always difficult.
MULTIPLE CHOICE
Choose the best answer.
3. Which is the best way of dealing with discipline
problem in the classroom?
A. Always give test
B. Impose punishment
C. Talking with the child in private
D. Must involve the parents
ESSAY
Construct an essay to answer the question.
4. List the 7-step path to making “ethical decisions.”
List them in their correct progressive order.
MATCHING TYPE
Match column A with B. Letter only.
A B
___1. Multiple choice A. Most difficult to score
___2. True-False B. Students can simply make guesses
___3. Short Answer C. measures greatest variety of
learning outcomes
D. Least useful for educational
diagnosis
Scoring Rubric

Level               Score      Description
Exemplary           5 points   The comment/remark is accurate with all main points
                               included. There is a clear and logical presentation of
                               ideas. Sentences are well-structured and free from
                               grammatical and/or syntactic errors.
Very Good           4 points   The comment/remark is accurate but there are minor
                               problems in logic and construction. Few
                               grammatical/syntactic errors are found.
Good                3 points   One or two major points are missing in the
                               comment/remark. Few grammatical/syntactic errors are
                               found.
Needs Improvement   2 points   The answer does not convey a full understanding of the
                               lesson. The quality of writing is inferior.
Unsatisfactory      1 point    The answer is inaccurate or deviates from what is asked.
                               Sentences are disorganized and contain major
                               grammatical/syntactic errors.
Scenario 1. Teacher Luna constructed a 50-item summative test in Filipino for grade
six pupils. She prepared a TOS according to the pre-determined topics outlined in
the course program. Having been given a special assignment, she missed delivering
almost half of the topics that she was supposed to cover. To make up for her
absences, she distributed hand-outs or copies so students could proceed despite her
failure to hold regular classes. Will the test provide a valid result?
Scenario 2. Teacher May set this learning outcome for her students to develop at
the end of the lesson: the students are able to demonstrate proficiency in writing
and verbal skills. Considering the time and effort she would exert in checking the
papers, she finally opted to give a short-answer test where students will still be
required to construct short sentences. Does her assessment method match her
target outcome? Will she be able to measure what she's supposed to measure?
Scenario 3. Sir Ben gave a 120-item multiple-choice test in Math for his college students
to answer in one hour. Due to lack of time, more than 50% of the students were not
able to finish. The students appealed that the remaining unanswered items not be
counted and that students' scores be based only on the number of items they
were able to finish. Sir Ben finally conceded to the students' request. Is his decision
proper? Will it not invalidate the test?
Students   1st Run   2nd Run
1          22        21
2          13        19
3          24        24
4          25        19
5          16        18
6          3         12
7          23        26
8          21        25
9          22        25
10         15        20
11         16        19
12         19        18
13         21        22
14         3         14
15         4         9
16         2         12
17         16        23
18         8         13
19         26        24
20         24        30
21         30        30
22         16        14

Please show your process here.
Answer: _________________
Interpretation: __________________________________________________
VI. Illustrating the Concept. Present a creative and inventive illustration that
depicts the test development process. Label every phase and the
corresponding activities done by the teacher/assessor for a clearer
representation of the process. (10 pts.)
References:
Kubiszyn, T., & Borich, G. (2007). Educational Testing and Measurement: Classroom Application
and Practice (8th ed.). Wiley/Jossey-Bass.
Navarro, R., & Santos, R. (2012). Assessment of Learning Outcomes (2nd ed.). Manila,
Philippines: Lorimar Publishing, Inc.
Osterlind, S. J. (1989). What is constructing test items? In Constructing Test Items
(Evaluation in Education and Human Services, Vol. 25). Dordrecht: Springer. Retrieved
August 21, 2020, from https://link.springer.com/chapter/10.1007%2F978-94-009-1071-3_1
Suskie, L. (2018). Assessing Student Learning: A Common Sense Guide. Jossey-Bass.