Module Assessment 3 - 020535
Designing and Developing Assessments
Overview
Learning Outcomes
Assessment tools are techniques used to measure a student’s academic abilities, skills,
and/or fluency in a given subject or to measure one’s progress toward academic
proficiency in a specific subject area.
It is the instrument (form, test, rubric, etc.) used to collect data for each outcome: the actual
product handed out to students for the purpose of assessing whether they have achieved a
particular learning outcome or outcomes.
Assessments can be either formal or informal. Informal assessments are often inferences
an educator draws as a function of unsystematic observations of a student’s performance
in the subject matter under consideration. Formal assessments are objective
measurements of a student’s abilities, skills, and fluency using screening, progress
monitoring, diagnosis, or evaluation. Both types of assessments are important; however,
only formal assessments are research- or evidence-based.
Educators use assessment tools to make informed decisions regarding strategies to
enhance student learning.
1. Measure all instructional objectives. When a teacher constructs test items to measure the
learning progress of the students, the items should match all the learning objectives posed during
instruction. That is why the first step in constructing a test is for the teacher to go back to the
instructional objectives.
2. Cover all the learning tasks. The teacher should construct a test that contains a wide
sampling of items. In this case, the teacher can determine the educational outcomes or
abilities so that the resulting scores are representative of the total performance in the areas
measured.
3. Use appropriate test items. The test items constructed must be appropriate to measure
learning outcomes.
4. Make the test valid and reliable. The teacher must construct a test that is valid so that it can
measure what it is supposed to measure. The test is reliable when the scores of the students
remain the same or consistent when the teacher gives the same test a second time.
5. Use tests to improve learning. The test scores should be utilized properly by the teacher to
improve learning by discussing the skills or competencies in the items that have not been
learned or mastered by the learners.
The type of test used should match the instructional objective or learning outcomes of the
subject matter posed during the delivery of the instruction. The following are the types of
assessment tools:
1. Objective Test. It requires students to select the correct response or to supply a word or short
phrase to answer a question or complete a statement. It includes true-false, matching-type, and
multiple-choice questions. The word objective refers to the scoring: it indicates that there is
only one correct answer.
2. Subjective Test. It permits the student to organize and present an original answer. It includes
either short answer questions or long general questions. This type of test has no specific
answer. Hence, it is usually scored on an opinion basis, although there will be certain facts
and understanding expected in the answer.
3. Performance Assessment. It is an assessment in which students are asked to perform real-
world tasks that demonstrate meaningful application of essential knowledge and skills. It can
appropriately measure learning objectives which focus on the ability of the students to
demonstrate skills or knowledge in real-life situations.
4. Portfolio Assessment. It is an assessment based on the systematic, longitudinal
collection of student work created in response to specific, known instructional objectives and
evaluated in relation to the same criteria. A portfolio is a purposeful collection of students' work
that exhibits the students' efforts, progress, and achievements in one or more areas over a
period of time. It measures the growth and development of students.
5. Oral Questioning. This method is used to collect assessment data by asking oral questions.
It is the most commonly used of all forms of assessment in class, assuming that the learner hears
and shares the use of a common language with the teacher during instruction. The ability of the
students to communicate orally is very relevant to this type of assessment. This is also a form
of formative assessment.
6. Observation Technique. This is a method of collecting assessment data in which the teacher
observes how students carry out certain activities, observing either the process or the product.
There are two types of observation techniques: formal and informal observations. Formal
observations are planned in advance, as when the teacher assesses an oral report or presentation
in class, while informal observation is done spontaneously during instruction, such as observing
the working behavior of students while performing a laboratory experiment.
7. Self-report. The responses of the students may be used to evaluate both performance and
attitude. Assessment tools could include sentence completion, Likert scales, checklists, or
holistic scales.
5. Scorability means that the test should be easy to score; directions for scoring should be clearly
stated in the instructions. Provide the students with an answer sheet, and provide the answer key
for the one who will check the test.
6. Adequacy means that the test should contain a wide sampling of items to determine the
educational outcomes or abilities, so that the resulting scores are representative of the total
performance in the areas measured.
7. Administrability means that the test should be administered uniformly to all students so that the
scores obtained will not vary due to factors other than differences in the students' knowledge
and skills. There should be clear instructions for the students, the proctors, and even
the one who will check the test.
8. Practicality and Efficiency refers to the teacher’s familiarity with the methods used, time
required for the assessment, complexity of the administration, ease of scoring, ease of
interpretation of the test results and the materials used must be at the lowest cost.
2. What are the advantages and disadvantages of a subjective test over objective type
of test?
There are different ways of assessing the performance of the students, such as objective tests,
subjective tests, performance-based assessment, oral questioning, portfolio assessment, self-
assessment, and checklists. Each of these has its own function and use. The type of assessment
tool should always be appropriate to the objectives of the lesson. There are two general types
of test items to use in a paper-and-pencil achievement test: selection-type items and
supply-type items.
A. Multiple-choice Test
Knowledge Level
The number of chromosomes in a cell produced by meiosis is ______
A. half as many as the original cell.
B. twice as many as the original cell.
C. the same number as the original cell.
D. not predictable.
Comprehension Level
Why did John B. Watson reject the structuralist study of mental events?
A. He believed that structuralism relied too heavily on scientific methods.
B. He rejected the concept that psychologists should study observable behavior.
C. He believed that scientists should focus on what is objectively observable.
D. He actually embraced both structuralism and functionalism.
Application Level
An envelope contains 140 bills consisting of P1,000 and P500 peso bills. If the number
of P500 bills is 20 more than the number of P1,000 bills, how many P500 bills are
there?
A. 40
B. 60
C. 70
D. 80
Analysis Level
What is the statistical test used when you test the mean difference between the pre-
test and post-test?
A. Analysis of variance
B. t-test
C. Correlation
D. Regression analysis
A matching-type item consists of two columns. Column A contains the descriptions and is
placed on the left side, while Column B contains the options and is placed on the right side.
The examinees are asked to match the options with the descriptions they are associated with.
Direction: Match each trigonometric term in Column A with the corresponding expression
in Column B. Write only the letter of your choice on the space provided for each item.

COLUMN A               COLUMN B
_____ 1. 1             A. sin θ / cos θ
_____ 2. csc θ         B. sin θ csc θ
_____ 3. cos θ         C. 1 – cos²θ
_____ 4. tan θ         D. sec θ
In this type of test, the examinees determine whether the statement presented is true or
false. The true-false test item is an example of a "forced-choice test" because there are only two
possible choices in this type of test. The students are required to choose true or false
in recognition of a correct or incorrect statement. This type of test is appropriate for
assessing behavioural objectives such as "identify," "select," or "recognize." It is also suited to
assessing the knowledge and comprehension levels of the cognitive domain. It is appropriate when
there are only two plausible alternatives or distracters.
1. Avoid writing very long statements. Eliminate unnecessary words in the statement.
2. Avoid trivial questions.
3. Each item should contain only one idea, except for statements showing the relationship
between cause and effect.
4. It can be used for establishing cause-and-effect relationships.
5. Avoid using negatives or double negatives. Construct the statement positively. If this cannot be
avoided, bold or underline the negative word to call the attention of the examinees.
6. Avoid opinion-based statements; if this cannot be avoided, the statement should be
attributed to somebody.
7. Avoid specific determiners such as "never," "always," "all," and "none," for they tend to appear in
statements that are false.
8. Avoid specific determiners such as "some," "sometimes," and "may," for they tend to appear in
statements that are true.
9. The number of true items must be the same as the number of false items.
10. Avoid grammatical clues that lead to the correct answer, such as the articles (a, an, the).
11. Avoid statements directly taken from the textbook.
12. Avoid arranging the statements in a logical order (such as TTTTTFFFFF, TFTFTFTFTF, etc.).
13. Directions should indicate where or how the students should mark their answer.
Direction: Determine whether the statement is true or false. Write true if the statement is true;
otherwise, write false.
1. It is limited to low-level thinking skills such as knowledge and comprehension, or the
recognition or recall of information.
2. There is a high probability of guessing the correct answer compared to multiple choice, which
consists of more than two choices.
1. The item should require a single-word answer or a brief and definite statement. Do not use
an indefinite statement that allows several answers.
2. Be sure that the language used in the statement is precise and accurate in relation to the
subject matter being tested.
3. Be sure to omit only key words; do not eliminate so many words that the meaning of the
item statement changes.
4. Do not leave the blank at the beginning or within the statement. It should be at the end of the
statement.
5. Use direct question rather than incomplete statement. The statement should pose the problem
to the examinee.
6. Be sure to indicate the units in which the answer is to be expressed when the statement
requires a numerical answer.
7. Be sure that the answer the student is required to produce is factually correct.
8. Avoid grammatical clues.
9. Do not select textbook sentences.
Question Form
Direction: Write your answer on the space provided before each item.
Completion Form
1. It is only appropriate for questions that can be answered with short responses.
2. There is difficulty in scoring when the questions are not prepared properly and clearly. The
question should be clearly stated so that the answer expected of the student is clear.
3. It can assess only knowledge, comprehension and application levels in Bloom’s taxonomy of
cognitive domain.
4. It is not adaptable in measuring complex learning outcomes.
5. Scoring is tedious and time consuming.
B. Essay Items
It is appropriate when assessing students' ability to organize and present original ideas. It
consists of a small number of questions wherein the examinee is expected to demonstrate the
ability to recall factual knowledge, organize this knowledge, and present it in a logical
and integrated answer. Extended response essays and restricted response essays are the two
types of essay test items. An Extended Response Essay allows the students to determine the
length and complexity of the response. It is very useful in assessing the synthesis and evaluation
skills of the students. When the objective is to determine whether the students can organize,
integrate, and express ideas and evaluate information, it is best to use an extended
response essay test. A Restricted Response Essay is an essay item that places strict limits on
both the content and the response given by the students. In this type of essay, the content is usually
restricted by the scope of the topic to be discussed, and the limitations on the form of the
response are indicated in the question.
1. Present and describe the modern theory of evolution and discuss how it is
supported by evidence from the areas of (a) comparative anatomy and (b) population genetics.
2. From the statement, "Mathematics may be defined as the subject in which we
never know what we are talking about, nor whether what we are saying is true," what do
you think is the reasoning behind the statement? Explain your answer.
Restricted Response Essay
1. Point out the advantages and disadvantages of an essay type of test. Limit your
1. It is easier to prepare and less time consuming compared to other paper and pencil tests.
2. It measures higher-order thinking skills (analysis, synthesis and evaluation).
3. It allows students’ freedom to express individuality in answering the given question.
4. The students have a chance to express their own ideas in order to plan their own answer.
5. It reduces guessing compared to any objective type of test.
6. It presents a more realistic task to the students.
7. It emphasizes the integration and application of ideas.
C. Problem-Solving Test
Direction: Analyze and solve each problem. Show your solution neatly and clearly by
applying the strategy indicated in each item. Each item is worth 10 points.
1. Debbie begins a physical fitness program. Debbie's goal is to do 100 sit-ups. On the first
day of the program, she does 20 sit-ups. Every 5th day of the program, she increases the
number of sit-ups by 10. After how many days will she reach her goal? (Make a list or table)
2. In three more years, Miguel's grandfather will be six times as old as Miguel was last year.
When Miguel's present age is added to his grandfather's present age, the total is 68. How old
is each one now? (Use an equation)
1. It minimizes guessing by requiring the students to provide an original response rather than to
select from several alternatives.
2. It is easier to construct.
3. It can most appropriately measure learning objectives which focus on the ability to apply skills
and knowledge in the solution of problems.
4. It can measure an extensive amount of content objectives.
2. Formulate at least two examples of the different types of objective and subjective test
in your area of specialization.
Table of Specification
Table of specification (TOS) is a chart or table that details the content and level of
cognitive domain assessed on a test as well as the types and emphases of test items
(Gareis and Grant, 2008).
The TOS is very important in addressing the validity and reliability of the test items. Validity
of the test means that the assessment can be used to draw appropriate conclusions
because the assessment guards against systematic error.
The TOS provides the test constructor a way to ensure that the assessment is based on the
intended learning outcomes.
It is also a way of ensuring that the number of questions on the test is adequate to ensure
dependable results that are not likely caused by chance.
It is also a useful guide in constructing a test and in determining the type of test items that
you need to construct.
A. Format 1 of a Table of Specification. This format is composed of the specific objectives, the
cognitive level, type of test used, the item number, and the total points needed in each item.
Specific Objectives refer to the intended learning outcomes stated as specific instructional
objective covering a particular test topic.
Cognitive Level pertains to the intellectual skill or ability to correctly answer a test item using
Bloom’s taxonomy of educational objectives. We sometimes refer to this as the cognitive
domain of a test item. Thus, entries in this column could be knowledge, comprehension,
application, analysis, synthesis, and evaluation.
Type of Test Item identifies the type or kind of test a test item belongs to. Examples of entries in
this column could be multiple-choice, true or false, or even essay.
Item Number simply identifies the question number as it appears in the test.
Total Points summarizes the score given to a particular test item.
1. Determine the learning outcomes to be assessed. These will include the learning outcomes
in the areas of knowledge, intellectual skills or abilities, general skills, attitudes, interest,
and appreciation. Use Bloom's taxonomy or Krathwohl's 2001 revised taxonomy of the
cognitive domain as a guide.
2. Make an outline of the subject matter to be covered in the test. The length of the test will
depend on the areas covered in its content and the time needed to answer.
3. Decide on the number of items per subtopic. Use the formula shown after this list to determine
the number of items to be constructed for each subtopic covered in the test, so that the number
of items in each topic is proportional to the number of class sessions spent on it.
4. Make the two-way chart as shown in the format 2 and format 3 of a Table of
Specification.
5. Construct the test items. A classroom teacher should always follow the general principle of
constructing test items. The test item should always correspond with the learning outcome so
that it serves whatever purpose it may have.
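The module cites a formula for step 3 without reproducing it. A standard allocation rule consistent with the description above (proportional to instructional time; the source's exact formula may differ) is:

Number of items for a topic = (class sessions spent on the topic ÷ total class sessions) × total number of test items

For example, a topic taught in 4 of 20 class sessions on a 50-item test would receive (4 ÷ 20) × 50 = 10 items.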
1. Planning Stage
Determine who will use the assessment results and how they will use them.
Identify the learning targets to be assessed.
Select the appropriate assessment method or methods.
Determine the sample size.
2. Development Stage
Develop or select items, exercises, tasks, and scoring procedures.
Review and critique the overall assessment for quality before use.
3. Use Stage
Conduct and score the assessment.
Revise as needed for future use.
1. Examine the instructional objectives of the topics previously discussed. The first step in
developing a test is to go back to the instructional objectives so that you can
match them with the test items to be constructed.
2. Make a table of specification (TOS). The TOS ensures that the assessment is based on the
intended learning outcomes.
3. Construct the test items. In constructing test items, it is necessary to follow the general
guidelines for constructing test items. Kubiszyn and Borich (2007) suggested some guidelines
to help classroom teachers improve the quality of the test items they write.
Begin writing items far enough in advance that you will have time to revise them.
Match items to the intended outcomes at the appropriate level of difficulty to provide a valid
measure of the instructional objectives. Limit each question to the skill being assessed.
Be sure each item deals with an important aspect of the content area and not with trivia.
Be sure the problem posed is clear and unambiguous.
Be sure each item is independent of all other items. The answer to one item should
not be required as a condition for answering the next item. A hint to one answer should
not be embedded in another item.
Be sure the item has one correct or best answer on which experts would agree.
Prevent unintended clues to an answer in the statement or question. Grammatical
inconsistencies such as “a” or “an” give clues to the correct answer to those students
who are not well prepared for the test.
Avoid replication of the textbook in writing test items; do not quote directly from the
textual materials. You are usually not interested in how well students memorize the text.
Besides, taken out of context, direct quotes from the text are often ambiguous.
Avoid trick or catch questions in an achievement test. Do not waste time testing how well
the students can interpret your intentions.
Try to write items that require higher-order thinking skills.
4. Assemble the test items. After constructing the test items following the different principles of
test construction, the next step is to assemble the test. There are two
steps in assembling the test: (1) packaging the test; and (2) reproducing the test. In
assembling the test, consider the following guidelines:
Group all test items with similar format.
Arrange test items from easy to difficult.
Space the test items for easy reading.
Keep each item and its options on the same page.
Place the illustrations near the description.
Check the answer key.
Decide where to record the answer.
5. Check the assembled test items. Before reproducing the test, it is very important to
proofread the test items for typographical and grammatical errors and make the necessary
corrections, if any. If possible, let others examine the test to validate its content. This can save
time during the examination and avoid disrupting the students' concentration.
6. Write directions. Check the test directions for each item format to be sure they are clear for
the students to understand. The test directions should contain the numbers of the items to which
they apply, how to record answers, the basis on which to select answers, and the criteria
for scoring or the scoring system.
7. Make the answer key. Be sure to check your answer key so that the correct answers follow a
fairly random sequence.
8. Analyze and improve the test items. Analyzing and improving the test items should be done
after checking, scoring and recording the test.
Item Analysis
Item analysis is a process of examining the students' responses to individual items in the
test. It consists of different procedures for assessing the quality of the test items given to the
students. Through the use of item analysis, we can identify which of the given test items are
good and which are defective. Good items are to be retained, and defective items are to be
improved, revised, or rejected.
1. Item analysis data provide a basis for efficient class discussion of the test results.
2. Item analysis data provide a basis for remedial work.
3. Item analysis data provide a basis for general improvement of classroom instruction.
4. Item analysis data provide a basis for increased skills in test construction.
5. Item analysis procedures provide a basis for constructing a test bank.
1. Difficulty Index
It refers to the proportion of the number of students in the upper and lower groups who
answered an item correctly. The larger the proportion, the more students have learned the
subject matter measured by the item. To compute the difficulty index of an item, use the formula:

DF = n / N

where: DF = difficulty index; n = number of students selecting the correct answer in the
upper group and in the lower group; and N = total number of students who answered the
test.
Level of Difficulty
To determine the level of difficulty of an item, find first the difficulty index using the
formula and identify the level of difficulty using the range given below. The higher the value of the
index of difficulty, the easier the item is. Hence, more students got the correct answer and more
students mastered the content measured by that item.
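The interpretation table for the difficulty index is not reproduced in this copy of the module. The short Python sketch below (an addition, not from the module) computes DF and applies the commonly published bands, which agree with the worked examples that follow (0.18 is read as "very difficult" and 0.36 as "difficult"); the exact cut-offs in the source may differ.

def difficulty_index(correct_upper, correct_lower, total_examinees):
    # DF = n / N, where n pools the correct answers of the upper and
    # lower groups and N is the total number of students who took the test.
    return (correct_upper + correct_lower) / total_examinees

def difficulty_level(df):
    # Assumed interpretation bands; verify against the module's own table.
    if df <= 0.20:
        return "very difficult"
    if df <= 0.40:
        return "difficult"
    if df <= 0.60:
        return "moderately difficult"
    if df <= 0.80:
        return "easy"
    return "very easy"

df = difficulty_index(4, 3, 39)            # data from Example 2 below
print(round(df, 2), difficulty_level(df))  # 0.18 very difficult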
2. Discrimination Index
It is the power of the item to discriminate between the students who know the lesson and those
who do not. It is computed as the number of students in the upper group who got an item
correct minus the number of students in the lower group who got the item correct, divided
by the number of students in the upper group or the lower group (use the higher number
if they are not equal). The discrimination index is
one basis for measuring the validity of an item. This index can be interpreted as an indication of the
extent to which overall knowledge of the content area or mastery of the skills is related to the
response on an item. The formula used to compute for the discrimination index is:
DI = (CUG − CLG) / D

where: DI = discrimination index value; CUG = number of students selecting the correct
answer in the upper group; CLG = number of students selecting the correct answer in the lower
group; and D = the number of students in either the lower group or the upper group.
1. Positive discrimination happens when more students in the upper group got the item
correctly than those students in the lower group.
2. Negative discrimination occurs when more students in the lower group got the item correctly
than the students in the upper group.
3. Zero discrimination happens when the numbers of students in the upper group and the lower group
who answer the item correctly are equal; hence, the test item cannot distinguish between the students
who performed well on the overall test and the students whose performance was very poor.
Level of Discrimination
Ebel and Frisbie (1986) as cited by Hetzel (1997) recommended the use of the Level of
Discrimination of an Item for easier interpretation.
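The Ebel and Frisbie level table is also missing from this copy of the module. A sketch using their commonly cited thresholds (an assumption, to be checked against the original table):

def discrimination_index(correct_upper, correct_lower, group_size):
    # DI = (CUG - CLG) / D, where D is the size of one group
    # (use the larger group if the two groups are unequal).
    return (correct_upper - correct_lower) / group_size

def discrimination_level(di):
    # Assumed Ebel and Frisbie (1986) interpretation bands.
    if di >= 0.40:
        return "very good item"
    if di >= 0.30:
        return "reasonably good item"
    if di >= 0.20:
        return "marginal item, usually needing improvement"
    return "poor item, to be rejected or revised"

di = discrimination_index(4, 3, 20)            # data from Example 2 below
print(round(di, 2), discrimination_level(di))  # 0.05 poor item, ...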
Distracter Analysis
1. Distracter. This is the term used for the incorrect options in a multiple-choice test, while
the correct answer is called the key.
2. Miskeyed item. The test item is a potential miskey if there are more students from the upper
group who choose the incorrect options than the key.
3. Guessing item. Students from the upper group have equal spread of choices among the given
alternatives.
4. Ambiguous item. This happens when more students from the upper group choose an
incorrect option and the keyed answer equally.
Example 1. A class is composed of 40 students. Divide the group into two. Option B is the correct
answer. Based on the given data in the table, as a teacher, what would you do
with the test item?
Options        A    B*   C    D    E
Upper Group    3    10   4    0    3
Lower Group    4     4   8    0    4
c. Retain options A, C, and E because most of the students who did not perform well in
the overall examination selected them; those options attract most students from the lower
group.
4. Conclusion: Retain the test item but change option D, making it more realistic so that it is
effective for the upper and lower groups: an incorrect option should be chosen by at least 5%
of the examinees, and no one chose option D.
Example 2. Below is the result of an item analysis for a test item in Mathematics. Are you going
to reject, revise or retain the test item?
Options        A    B    C*   D    E
Upper Group    4    3    4    3    6
Lower Group    3    4    3    4    5
1. Compute the difficulty index.
n = 4 + 3 = 7
N = 39
DF = n / N = 7 / 39
DF = 0.18 or 18%
2. Compute the discrimination index.
CUG = 4
CLG = 3
D = 20
DI = (CUG − CLG) / D = (4 − 3) / 20 = 1 / 20
DI = 0.05 or 5%
3. Make an analysis about the level of difficulty, discrimination and distracters.
a. Only 18% of the examinees got the answer correct; hence, the item is very difficult.
b. More students from the upper group got the answer correct; hence, it has a positive
discrimination index of 5%.
c. Students respond about equally to all alternatives, an indication that they are guessing.
d. If the test item is well-written but too difficult, reteach the material to the class.
4. Conclusion: Reject the item because it is very difficult, the discrimination index is very poor,
and options A and B are not effective distracters.
Example 3. A class is composed of 50 students. Use 27% to get the upper and the lower groups.
Analyze the item given the following results. Option D is the correct answer. What will
you do with the test item?
Options        A    B    C    D*   E
Upper Group    3    1    2    6    2
Lower Group    5    0    4    4    1
1. Compute the difficulty index.
n = 6 + 4 = 10
N = 28
DF = n / N = 10 / 28
DF = 0.36 or 36%
2. Compute the discrimination index.
CUG = 6
CLG = 4
D = 14
DI = (CUG − CLG) / D = (6 − 4) / 14
DI = 0.14 or 14%
3. Make an analysis about the level of difficulty, discrimination and distracters.
a. Only 36% of the examinees got the answer correct; hence, the item is difficult.
b. More students from the upper group got the answer correct; hence, it has a positive
discrimination index of 14%.
c. Modify options B and E because more students from the upper group chose them
compared with the lower group; hence, they are not effective distracters, because most
of the students who performed well in the overall examination selected them as their
answers.
d. Retain options A and C because most of the students who did not perform well in the
overall examination selected them as the correct answers. Hence, options A and C
are effective distracters.
4. Conclusion: Revise the item by modifying options B and E.
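The distracter judgments in Examples 1 to 3 follow mechanical rules, so they can be automated. A minimal sketch, assuming the at-least-5% rule of thumb from Example 1 and the upper-versus-lower comparison used above:

def analyze_distracters(upper, lower, key):
    # upper and lower map each option to the number of students in that
    # group who chose it; key is the correct option.
    total = sum(upper.values()) + sum(lower.values())
    report = {}
    for option in upper:
        if option == key:
            continue
        chosen = upper[option] + lower[option]
        if chosen < 0.05 * total:
            report[option] = "modify: chosen by fewer than 5% of examinees"
        elif upper[option] > lower[option]:
            report[option] = "modify: attracts the upper group"
        else:
            report[option] = "retain: effective distracter"
    return report

# Example 3 data (option D is the key):
upper = {"A": 3, "B": 1, "C": 2, "D": 6, "E": 2}
lower = {"A": 5, "B": 0, "C": 4, "D": 4, "E": 1}
print(analyze_distracters(upper, lower, "D"))
# B and E are flagged for modification; A and C are retained, as concluded above.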
Test Reliability
Reliability refers to the consistency with which a test yields the same rank for individuals who
take the test more than once (Kubiszyn and Borich, 2007), that is, how consistent test results or
other assessment results are from one measurement to another. A test is reliable when it
yields practically the same scores when administered twice to the same group of
students, with a reliability index of 0.60 or above. The reliability of a test can be determined by
means of the Pearson Product-Moment Correlation, the Spearman-Brown formula, the Kuder-
Richardson formulas, Cronbach's alpha, etc.
1. Test-retest Method. A type of reliability determined by administering the same test twice to the
same group of students with a time interval between the tests. The test scores
are correlated using the Pearson Product-Moment Correlation Coefficient (r), and this correlation
coefficient provides a measure of stability. It indicates how stable the test results are over a
period of time.
2. Equivalent/Parallel/Alternate Forms. A type of reliability determined by administering two
different but equivalent forms of the test to the same group of students in close succession.
The equivalent forms are constructed from the same set of specifications, so they are similar in
content, type of items, and difficulty. The results of the test scores are correlated using the Pearson
Product-Moment Correlation Coefficient, and this correlation coefficient provides a measure of the
degree to which generalization about the performance of students from one assessment to
another assessment is justified. It measures the equivalence of the tests.
3. Split-half Method. Administer the test once and score two equivalent halves of the test. To split
the test into halves that are equivalent, the usual procedure is to score the even-numbered
and the odd-numbered test items separately. This provides two scores for each student. The
results of the test scores are correlated using the Spearman-Brown formula and this
correlation coefficient provides a measure of internal consistency. It indicates the degree to
which consistent results are obtained from two halves of the test.
4. Kuder-Richardson Formulas. Administer the test once, score the total test, and apply the
Kuder-Richardson (KR) formula. The KR-20 formula is applicable only in situations where
students' responses are scored dichotomously and is therefore most useful with traditional
test items that are scored as right or wrong, true or false, or yes or no. KR-20 reliability
estimates indicate the degree to which the items in the test measure the same
characteristic. Another formula for estimating the internal consistency of a test is the KR-21
formula, which is simpler to compute but assumes that all items are of equal difficulty.
Reliability Coefficient
Reliability coefficient is a measure of the amount of error associated with the test
scores. Reliability Coefficient has the following description:
(a) The range of the reliability coefficient is from 0 to 1.0;
(b) The acceptable range value is 0.60 or higher;
(c) The higher the value of the reliability coefficient, the more reliable the overall test
scores;
(d) Higher reliability indicates that the test items measure the same thing.
1. Group variability affects the size of the reliability coefficient. Higher coefficients result
from heterogeneous groups than from homogeneous groups. As group variability
increases, reliability goes up.
2. Scoring reliability limits test score reliability. If tests are scored unreliably, error is introduced.
This limits the reliability of the test scores.
3. Test length affects test score reliability. As the length increases, the reliability tends to go up.
4. Item difficulty affects test score reliability. As test items become very easy or very hard, the
test’s reliability goes down.
Example 1. Prof. Joel administered a test to his 10 students in an Elementary Statistics class twice,
with a one-day interval. The test given after one day was exactly the same test given the
first time. The scores below were gathered in the first test (FT) and second test (ST).
Using the test-retest method, is the test reliable? Show the complete solution.
Student   FT   ST
1         36   38
2         26   34
3         38   38
4         15   27
5         17   25
6         28   26
7         32   35
8         35   36
9         12   19
10        35   38
Using the Pearson r formula, find Σx, Σy, Σxy, Σx², and Σy².
Student   FT (x)   ST (y)   xy     x²     y²
1         36       38       1368   1296   1444
2         26       34       884    676    1156
3         38       38       1444   1444   1444
4         15       27       405    225    729
5         17       25       425    289    625
6         28       26       728    784    676
7         32       35       1120   1024   1225
8         35       36       1260   1225   1296
9         12       19       228    144    361
10        35       38       1330   1225   1444
n = 10    Σx = 274   Σy = 316   Σxy = 9192   Σx² = 8332   Σy² = 10400
r = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}

r = 0.91
Analysis: The reliability coefficient using the Pearson r is 0.91, which means that the test has very
high reliability. The scores of the 10 students tested twice with a one-day interval are consistent.
Hence, the test has very high reliability.
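A minimal Python sketch (an addition, not part of the module) that reproduces Example 1's test-retest computation from the raw scores:

from math import sqrt

def pearson_r(x, y):
    # r = [n(Σxy) − (Σx)(Σy)] / sqrt{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

first_test  = [36, 26, 38, 15, 17, 28, 32, 35, 12, 35]
second_test = [38, 34, 38, 27, 25, 26, 35, 36, 19, 38]
print(round(pearson_r(first_test, second_test), 2))   # 0.91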
Example 2. Prof. Glenn administered a test to his 10 students in his Chemistry class. The test was
given only once. The students' scores on the odd items (O) and even items (E) were
gathered. Using the split-half method, is the test reliable? Show the complete solution.
Use the Spearman-Brown formula r_ot = 2r_oe / (1 + r_oe) to find the reliability of the whole test,
and find Σx, Σy, Σxy, Σx², and Σy² to solve for the correlation between the odd and even test items.
Steps:
1. Use the Pearson Product-Moment Correlation Coefficient formula to solve for r.

r = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}

r = 0.33

2. Find the reliability of the whole test using the formula:

r_ot = 2r_oe / (1 + r_oe)
r_ot = 2(0.33) / (1 + 0.33)
r_ot = 0.66 / 1.33
r_ot = 0.50
3. Analysis: The reliability coefficient using the Spearman-Brown formula is 0.50, which indicates
questionable reliability. Hence, the test items should be revised.
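A sketch of the step-up computation. The odd/even score table for Example 2 is not reproduced in this copy of the module, so the half-test correlation r_oe = 0.33 is taken directly from the worked result:

def spearman_brown(r_half):
    # Steps up the correlation between two half-tests to an estimate of
    # the reliability of the whole test: r_ot = 2 * r_oe / (1 + r_oe).
    return 2 * r_half / (1 + r_half)

print(round(spearman_brown(0.33), 2))   # 0.50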
Example 3. Ms. Tan administered a 40-item test in English to her Grade VI pupils in UEPLES.
Below are the scores of 15 pupils. Find the reliability using the Kuder-Richardson
(KR-21) formula.
Student   Score (x)
1         16
2         25
3         35
4         39
5         25
6         18
7         19
8         22
9         33
10        36
11        20
12        17
13        26
14        35
15        39
Steps:
1. Solve for the mean and the standard deviation of the scores. From the table, the mean is
x̄ = 27 and the variance is s² = 70.14.
2. Apply the KR-21 formula:

KR21 = [k / (k − 1)] [1 − x̄(k − x̄) / (k s²)]

KR21 = [40 / 39] [1 − 27(40 − 27) / (40 × 70.14)]
KR21 = 1.03 [1 − 351 / 2805.60]
KR21 = 1.03 [1 − 0.1251]
KR21 = 1.03 [0.8749]
KR21 = 0.90
3. Analysis: The reliability coefficient using the KR-21 formula is 0.90, which means that the test
has very good reliability. That is, the test is very good for a classroom test.
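A Python sketch (an addition, not from the module) that reproduces Example 3; using the sample variance (n − 1 divisor) matches the module's s² = 70.14:

def kr21(k, scores):
    # KR21 = [k/(k-1)] * [1 - mean*(k - mean) / (k * variance)]
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))

scores = [16, 25, 35, 39, 25, 18, 19, 22, 33, 36, 20, 17, 26, 35, 39]
print(round(kr21(40, scores), 2))   # prints 0.9, i.e., KR21 ≈ 0.90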
Test Validity
Validity is concerned with whether the information obtained from an assessment permits the
teacher to make a correct decision about a student's learning; it concerns the appropriateness
of the score-based inferences or decisions made from the students' test results. Validity is the
extent to which a test measures what it is supposed to measure.
Types of Validity
1. Face Validity. It is the extent to which a measurement method appears “on its face” to
measure the construct of interest. Face validity is at best a very weak kind of evidence that a
measurement method is measuring what it is supposed to. One reason is that it is based on
people’s intuitions about human behaviour, which are frequently wrong. It is also the case that
many established measures in psychology work quite well despite lacking face validity.
2. Content Validity. A type of validation that refers to the relationship between the test and the
instructional objectives; it establishes that the test content measures what it is supposed to
measure. Things to remember about content validity:
a. The evidence of the content validity of a test is found in the Table of Specification.
b. This is the most important type of validity for a classroom teacher.
c. There is no coefficient for content validity. It is determined by experts judgmentally, not
empirically.
3. Criterion-related Validity. A type of validation that refers to the extent to which scores from a
test relate to theoretically similar measures. It is a measure of how accurately a student’s
current test score can be used to estimate a score on a criterion measure, like performance in
courses, classes or another measurement instrument. For example, the classroom reading
grades should indicate similar levels of performance as Standardized Reading test scores.
a. Concurrent validity. The criterion and the predictor data are collected at the same time.
This type of validity is appropriate for tests designed to assess a student's current criterion
status, or when you want to diagnose a student's status; it is a good diagnostic screening
test. It is established by correlating the criterion and the predictor using the Pearson
Product-Moment Correlation Coefficient or other statistical tools.
b. Predictive validity. A type of validation that refers to a measure of the extent to which a
student's current test result can be used to estimate accurately the outcome of the
student's performance at a later time. It is appropriate for tests designed to assess a
student's future status on a criterion. Regression analysis can be used to predict the
criterion from a single predictor or multiple predictors.
4. Construct Validity. A type of validation that refers to the measure of the extent to which a test
measures a theoretical, unobservable variable or quality, such as intelligence, math
achievement, or performance anxiety, over a period of time, on the basis of gathered evidence.
It is established through intensive study of the test or measurement instrument using
convergent/divergent validation and factor analysis. There are other ways of assessing
construct validity, such as the test's internal consistency, developmental change, and
experimental intervention.
a. Convergent validity is a type of construct validation wherein a test has a high
correlation with another test that measures the same construct.
b. Divergent validity is a type of construct validation wherein a test has a low correlation with
a test that measures a different construct. In this case, high validity occurs only
when there is a low correlation coefficient between the tests that measure different
traits.
c. Factor analysis assesses the construct validity of a test using complex statistical
procedures.
1. Validity refers to the decisions we make, and not to the test itself or to the measurement.
2. Like reliability, validity is not an all-or-nothing concept; it is never totally absent or absolutely
perfect.
3. A validity estimate, called a validity coefficient, refers to a specific type of validity. It ranges
between 0 and 1.
4. Validity can never be finally determined; it is specific to each administration of the test.
Validity Coefficient
The validity coefficient is the computed value of r_xy. In theory, the validity coefficient,
like the correlation coefficient, ranges from 0 to 1. In practice, most validity coefficients are
small: they usually range from 0.3 to 0.5, and few exceed 0.6 to 0.7. Hence, there is a lot of room
for improvement in most of our psychological measurements. Another way of interpreting the
findings is the squared correlation coefficient (r_xy)², called the coefficient of determination. The
coefficient of determination indicates how much variation in the criterion can be accounted for by
the predictor.
Example: Teacher James develops a 45-item test, and he wants to determine whether his test is
valid. He takes another test that is already acknowledged for its validity and uses it as the
criterion. He administered the two tests to his 15 students. The following table
shows the results of the two tests. Is the test valid? Find the validity coefficient using
the Pearson r and the coefficient of determination.
r = (250530 − 236220) / √[(232305 − 216225)(272430 − 258064)]
r = 14310 / √[(16080)(14366)]
r = 14310 / √231005280
r = 14310 / 15198.85785
r = 0.94

Coefficient of determination = r² = (0.94)² = 0.8836 or 88.36%
Interpretation: The correlation coefficient is 0.94, which means that the validity of the test is high,
or 88.36% of the variance in the students’ performance can be attributed to the test.
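The raw scores for this example are not reproduced in this copy of the module, so the sketch below back-calculates the summary sums from the worked solution (n = 15, Σx = 465, Σy = 508, Σxy = 16702, Σx² = 15487, Σy² = 18162 are derived values, an assumption):

from math import sqrt

# Summary statistics recovered from the worked solution above.
n = 15
sx, sy = 465, 508            # sums of the two sets of scores
sxy = 16702                  # sum of the cross-products
sx2, sy2 = 15487, 18162      # sums of squares

r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(round(r, 2))                   # 0.94 -> validity coefficient
print(round(round(r, 2) ** 2, 4))    # 0.8836 -> coefficient of determination (88.36%)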
1. A 25-item multiple-choice test in Physical Education with four options was recorded
below for item number 10. Listed are the numbers of students in the lower and upper
groups who answered A, B, C, and D.
Item 10             A    B    C    D*
Upper Group (27%)   4    5    2    9
Lower Group (27%)   6    4    5    5
2. Teacher Luis conducted a test with his 15 students in Science class twice, with a one-day
interval. The test given the second time is exactly the same test given the first time
it was conducted. The scores below were gathered in the first test (FT) and second test
(ST).
a. Using the test-retest method, is the test reliable? Show the complete solution
using the Pearson r and Spearman rho formulas.
Feedback
How was it working with this module? Were you exhausted by seeing so many terms, numbers,
and computations used in designing and developing assessments? I hope you were able to follow
the discussion in this module. Remember that in assessment, numbers and computations are
always involved. The results of the different formulas used for item analysis, reliability, and
validity testing are used for the interpretation of the test items, so it is necessary that you
know these computation processes. If you are having a hard time with some lessons, you can
always go back to the different topics and examples.
Summary
To aid you in reviewing the concepts in this module, here are the highlights:
Table of specification (TOS) is a chart or table that details the content and level of
cognitive domain assessed on a test as well as the types and emphases of test
items (Gareis and Grant, 2008).
Item analysis is a process of examining the students' responses to individual items in
the test. It consists of different procedures for assessing the quality of the test
items given to the students. Through the use of item analysis, we can identify which
of the given test items are good and which are defective.
Difficulty Index refers to the proportion of the number of students in the upper and
lower groups who answered an item correctly.
Discrimination Index is the power of the item to discriminate the students who know
the lesson and those who do not know the lesson.
Reliability refers to the consistency with which a test yields the same rank for
individuals who take the test more than once, that is, how consistent test results or
other assessment results are from one measurement to another.
Validity is the extent to which a test measures what it is supposed to measure.
Suggested Readings
If you want to learn more about the topics in this module, you may log on to the following
links:
https://content.schoolinsites.com/api/documents/a4734c1ff0b948828e25b66791054c3b.pdf
https://www.slideshare.net/RonaldQuileste/constructing-test-questions-and-the-table-of-specifications-tos
https://www.yourarticlelibrary.com/statistics-2/teacher-made-test-meaning-features-and-uses-statistics/92607
https://www.slideshare.net/tamlinares/sound20-design2028ch204-729
https://opentextbc.ca/researchmethods/chapter/reliability-and-validity-of-measurement/
References
Burton, S. J., Sudweeks, R. E., Merrill, P. F., & Wood, B. (1991). How to prepare better multiple-choice test items: Guidelines for university faculty. Retrieved from http://testing.byu.edu/info/handbooks/betteritems.pdf
Gabuyo, Y. A. (2012). Assessment of learning I. Rex Book Store, Inc., Manila, Philippines.