Module Midterm 2020-2021 2
WHAT IS A TEST?
I. Standardized Tests - tests that have been carefully constructed by experts in the
light of accepted objectives.
a.1 Multiple Choice Test - consists of items, each made up of a stem that presents three to five
alternatives or options, of which only one is correct or definitely better than the others.
The correct option, choice, or alternative in each item is called the answer, and the
rest of the alternatives are called distractors, decoys, or foils.
a.2 True-False or Alternative Response - consists of declarative statements that one
has to respond to or mark true or false; right or wrong; correct or incorrect; yes or no; fact
or opinion; agree or disagree; and the like. It is a test made up of items which allow
dichotomous responses.
a.3 Matching Type - consists of two parallel columns, with each word, number, or
symbol in one column being matched to a word, sentence, or phrase in the other
column. The items in Column I or A for which a match is sought are called premises,
and the items in Column II or B from which the selection is made are called responses.
b.2 Essay Type - Essay questions provide the freedom of response that is needed to
adequately assess students' ability to formulate, organize, integrate, and evaluate ideas
and information, or to apply knowledge and skills.
Dr. Eric A. Matriano
Page |2
PCK 2 – Assessment of Learning 1
b.2.1 Restricted Essay-limits both the content and the response. Content is
usually restricted by the scope of the topic to be discussed.
b.2.2 Extended Essay - allows the students to select any factual information that
they think is pertinent to organize their answers in accordance with their best judgment
and to integrate and evaluate ideas which they think appropriate.
Other Classification of Tests
Verbal Tests - one in which words are essential and the examinee must be
equipped with the vocabulary to attach meaning to, and respond to, test items.
Non-Verbal Tests - one in which words are less important; the student responds
to test items in the form of drawings, pictures, or designs.
Objective Tests - one in which equally competent examinees will get the same
scores, e.g., a multiple-choice test
Subjective Tests - one in which the scores can be influenced by the
opinion/judgment of the rater, e.g., an essay test
Types:
a. Rating Scale - measures attitudes toward others or asks an individual to rate
another individual on a number of behavioral dimensions on a continuum from good to
bad or excellent to poor; or on a number of items by selecting the most appropriate
response category along a 3- or 5-point scale (e.g., 5-excellent, 4-above average, 3-
average, 2-below average, 1-poor)
d. Checklist - an assessment instrument that calls for a simple yes-no judgment.
It is basically a method of recording whether a characteristic is present or absent, or
whether an action was or was not taken, e.g., a checklist of a student's daily activities.
PERSONALITY ASSESSMENTS
Values concern preferences for "life goals" and "ways of life," in contrast to
interests, which concern preferences for particular activities.
a. Nonprojective Tests
1. Personality Inventories
It may be specific and measure only one trait, such as introversion or extroversion,
or may be general and measure a number of traits.
2. Creativity Tests
Tests of creativity are really tests designed to measure those personality
characteristics that are related to creative behavior. One such trait is referred to as
divergent thinking. Unlike convergent thinkers, who tend to look for the right answer,
divergent thinkers tend to seek alternatives.
3. Interest Inventories
An interest inventory asks an individual to indicate personal likes, such as the kinds of
activities he or she likes to engage in.
b. Projective Tests
The most commonly used projective technique is the method of association. This
technique asks the respondent to react to a stimulus such as a picture, inkblot, or word.
SUMMATIVE ASSESSMENT
Two teachers of the same grade level have set the following objectives for the day's
lesson: At the end of the period, the students should be able to: A. construct bar graphs;
and B. interpret bar graphs. To assess the attainment of the objectives, Teacher A
required the students to construct a bar graph for a given set of data and then asked
them to interpret it using a set of questions as a guide. Teacher B presented a bar
graph and then asked them to interpret it, also using a set of guide questions.
3. Mr. Fepe Smith is doing a performance-based assessment for the day’s lesson.
Which of the following will most likely happen?
a. Students are evaluated in one sitting.
b. Students do an actual demonstration of their skill.
c. Students are evaluated in the most objective manner.
d. Students are evaluated based on varied evidences of learning.
4. Ms. Jocephine Babaluga rated her students in terms of appropriate and effective use
of some laboratory equipment and measurement tools and the students' ability to follow
the specified procedures. What mode of assessment did Miss Babaluga use?
a. Portfolio Assessment
b. Traditional Assessment
c. Journal Assessment
d. Performance Based Assessment
5. Mrs. Hilario presented the lesson on baking through a group activity so that the
students will not just learn how to bake but also develop their interpersonal skills.
How should this lesson be assessed?
I. She should give the students an essay test explaining how they baked the
cake.
II. The students should be graded on the quality of their baked cake using a
rubric.
III. The students in a group should rate the members based on their ability to
cooperate in their group activity.
IV. She should observe how the pupils perform their tasks.
a. I, II, and III only c. II, III and IV only
b. I, II, and IV only d. I, II, III, and IV
6. If a teacher has set objectives in all domains of learning targets that could be
assessed using a single performance task, what criterion in selecting a task
should she consider?
a. Generalizability c. Multiple Foci
b. Fairness d. Teachability
9. Ms. Violeta Kabayo aims to measure a product of learning. Which of these objectives
will she most likely set for her instruction?
a. Show positive attitude towards learning common nouns
b. Identify common nouns in a reading selection
c. Construct paragraph using common nouns
d. Use a common noun in a sentence
11. Prof. Gilingue would like to find out how well her students know each other. What
assessment instrument would best suit her objective?
a. Self-report Instrument c. Guess-who technique
b. Sociometric technique d. All of the above
12. Mr. Lampazok asked his pupils to indicate on a piece of paper the names of their
classmates whom they would like to be with for some group activity. What
assessment technique did Mr. Lampazok use?
a. Self-report technique c. Sociometric technique
b. Guess-who technique d. Anecdotal technique
23. The National Secondary Achievement Test (NSAT) and the National Elementary
Achievement Test (NEAT) results are interpreted against a set mastery level. They fall
under __
a. Intelligence Test c. Aptitude Test
b. Criterion-Referenced Test d. Norm-Referenced Test
25. With synthesizing skills in mind, which has the highest diagnostic value?
a. Performance Tests c. Essay Tests
b. Personality Tests d. Completion Tests
2. Why are affective and non-cognitive learning outcomes difficult to measure and
evaluate as compared to cognitive learning outcomes?
4. Could anybody (even those who have knowledge of human behavior and
psychology) develop a personality test questionnaire? Are personality tests really
"accurate" in describing one's personality? Support your answer.
MODULE 6
In carrying out scientific research, the type of hypothesis which indicates the direction in
which the experimenter expects the results to occur once the data has been analyzed is
known as a(n) ...
could be written
A hypothesis which indicates the expected result of a study is called a(n) ...
might be rewritten
According to Ebel, the most versatile type of objective item for measuring a variety of
educational outcomes is the ...
The second version specifies whose opinion is to be used, narrows the task to
consideration of objective items, and focuses on one item characteristic. The first
version poses an almost impossible task.
1. John Gosset
2. Sir Ronald Fisher
3. Karl Pearson
1. consistent
2. objective
3. stable
4. standardized
5. valid
contains a grammatical inconsistency (an objective) which gives away the answer.
1. consistent.
2. objective.
3. stable.
4. standardized.
5. valid.
Do:
1. Determine the purpose of the assessment and the utilization of the outcome scores.
2. Utilize the table of specifications to guide the type, number, and distribution of questions.
3. Match the requirements of the test items to the designated learning objectives.
4. Prepare more test items than you need so that you can review and delete ineffective items prior to the test.
5. Write test items well in advance of the test date, then wait several days to review the items. This fresh perspective may help you to identify potential problems or areas of confusion.
6. Review all test items once they are compiled for the test to ensure that the wording of one item does not give away the answer to another item.
7. Within each group of test items, order questions from least to most difficult.
8. Have a naive reader review test items to identify points of confusion or grammatical errors.

Do Not:
1. Ask questions that do not assess one of your learning objectives.
2. Focus on trivial issues that promote the shallow memorization of facts or details.
3. Intentionally target test questions toward a specific sub-set of students.
The number of items you include in a given assessment depends upon the length of the
class period and the type of items utilized. The following guidelines will assist you in
determining an assessment appropriate for college-level students.
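One way to rough out test length is to budget time per item. The sketch below does this in Python; the per-item times are illustrative assumptions, not fixed rules, and should be adjusted to your own students and content.

```python
# Rough planner for how many items fit in a class period.
# The per-item times below are illustrative assumptions, not fixed rules.
SECONDS_PER_ITEM = {
    "true_false": 20,       # students often answer 3-4 T/F items per minute
    "multiple_choice": 60,
    "short_answer": 90,
    "essay": 900,           # one extended essay, roughly 15 minutes
}

def max_items(period_minutes, item_type, buffer_minutes=5):
    """Estimate how many items of one type fit in a period,
    reserving a buffer for directions and paper collection."""
    usable_seconds = (period_minutes - buffer_minutes) * 60
    return usable_seconds // SECONDS_PER_ITEM[item_type]

print(max_items(50, "multiple_choice"))  # 45 items in a 50-minute period
```

A mixed-format test can be budgeted the same way by summing the time for each section before fixing the item counts.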
Item Arrangement
If you are using different response formats, group together items with the same format.
Then try to group items together by content, and within each content area arrange the
items from easy to hard. Having different response formats mixed together means that
the student must adjust to differences in answering, which may take their attention away
from the content of the test itself.
Generally you want the response method to easily fit the test question. For example,
some software is available that enables you to print your questions directly on a
machine-scorable sheet. This saves time for the student and minimizes errors in
transfer of answers.
Test Directions
Time available: In order to pace themselves, students should know how much
time they have for the test. It doesn't hurt to gently remind them about the time
remaining throughout the test.
Recording of answers: Have the students be consistent in how they mark the test
papers or answer sheets. If you don’t have a separate answer sheet, make sure
they all circle the correct answer. Putting an X over the correct answer is less
deliberate and can be confused with rejecting an option.
Points for items: It is usually best practice to award one point for each
multiple-choice question. Weighting some questions more than others can be very
judgmental on your part and gives unnecessary strategic information to the examinee.
Penalty for guessing: Don’t penalize guessing.
Showing work on problems: You should decide ahead of time and alert the
examinees about whether their notes, computations, or other intermediate steps
toward the answer will be credited. The hardest part is deciding how to award
this in an objective manner and in a way you can communicate your strategy
clearly to the examinees.
Use of outside materials: Allowing the text, class handouts, or other supplemental
materials or aids (calculators, formulas, tables, etc.) on a test changes the
character of the test and can change the students' strategies for studying and
taking it. You do not want them to count on the text for answers, so plan ahead
so that the material made available has to be used on the test
and does not become a security blanket for the anxious student.
The type of item you select to measure an objective should be the most direct way to
measure the outcome. For example, to determine if a microscope is being focused
correctly, you could devise a short answer test or directly observe the students. In this
instance, assuming a reasonable class size, observation seems the most direct method
of assessment. However, if there is a set of basic facts you wish a student to know,
trying to devise an essay topic where all the facts will be mentioned might prove to be
impossible. A set of objective test items is more apt to address the objective.
In an objective test of all multiple choice items, it is best to give equal weight to each
item in scoring. If you believe one learning objective should be given twice the weight as
another, this should be reflected in the number of questions you assign to the objective
in your Table of Specifications.
If your test is divided so that some items will take more time to answer than other items
(for example, if you have a set of short-answer supply items that will require more work
or time than other test items), then the items in that entire set can get more weight than
another set of items. These weights should be made clear to the students. Students
should always know point values of questions so that they can apportion their testing
time.
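As a minimal sketch of such weighting, the example below gives every multiple-choice item one point while a short-answer section carries its own announced per-item weights. All point values here are hypothetical examples.

```python
# Scoring sketch: each multiple-choice item earns one point, while a
# short-answer section carries announced per-item weights.
# All point values below are hypothetical examples.
mc_answers = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]   # 1 = correct (1 pt each)
sa_weights = [5, 5, 10]                       # announced to students in advance
sa_earned  = [4, 5, 7]                        # rater-assigned points per item

total    = sum(mc_answers) + sum(sa_earned)
possible = len(mc_answers) + sum(sa_weights)
print(f"{total}/{possible}")   # 24/30
```

Because the weights are fixed before the test and shown to students, they can apportion their time accordingly, as the paragraph above recommends.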
Scoring criteria for assessments should be done in advance. Developing the criteria can
lead to a change in the question so that expectations are clearer.
A holistic scoring rubric generates one score. A user of this method must develop a
description of the meaning of each of the scaled scores. An outline of the rubric you are
going to use should be handed to the students. This will enable them to prepare for the
exam.
Performance assessments require a student to demonstrate a skill by actually
performing it (e.g., writing a computer program to calculate a mean; carrying out an
experiment). Generally these methods of assessment appear authentic. They, too,
require a scoring rubric.
Besides scorer unreliability, the principal problem with performance assessments is the
amount of time required for the task. Because each task is so time-consuming, the
number of tasks given is very few. If you have the students perform only a few tasks, it
is questionable whether you can generalize to other, similar tasks.
Good for:
Advantages:
Easy to construct
Good for "who," "what," "where," and "when" content
Minimizes guessing
Encourages more intensive study: the student must know the answer rather than merely
recognize it.
Disadvantages:
May overemphasize memorization of facts
Take care - questions may have more than one correct answer
Scoring is laborious
4. There should preferably be only one blank per test item, but if more than one, they should be
for a related series. Do not use so many blanks that the question is a puzzle.
Good: Who was given credit for the early development of the theory of evolution?
___________________
5. Each blank in all items should be the same length. This avoids the possibility of the blank
itself serving as a clue.
6. When writing the item, be sure to write complete sentences. Do not include any specific
determiners (clues) such as "a" or "an," or a word that implies singularity.
Poor: When did Marie Curie receive the Nobel Prize? ________________ ("A long time
ago" is a correct answer!)
Better: In what year did Marie Curie receive the Nobel Prize? ______________
8. When writing an item, do not take a statement directly from a textbook or from the teacher's
lecture, but write the item so as to test understanding rather than rote memory.
Advantages:
1. A lot of vocabulary can be assessed in a minimal time.
2. Construction is relatively easy.

Disadvantages:
1. The understanding assessed is likely to be trivial (recall/knowledge level).
2. It is difficult to avoid ambiguity in constructing questions.
Do's:
1. Leave only important terms blank.
2. Keep items brief.
3. Limit the number of blanks per statement to one, at the most two for older students.
4. Limit the response called for to single words or very brief phrases.
5. Try to put the blanks near the end of the statement (or better yet, see 9 and 4 below).
6. Try to ensure that only one term fits each blank.

Don'ts:
1. Lift statements directly from the book.
2. Use "a" or "an" before a blank; make it "a/an" if needed to make it grammatical.
3. Count a misspelled or non-grammatical answer entirely wrong. Let students know in advance that spelling and grammar count.
4. Provide lines in the statement, e.g., "water and alcohols are ____________ molecules." Instead, use numbered blanks that are all of the same length.
Recommendations:
Good for:
Types:
Question/Right answer
Incomplete statement
Best answer
Advantages:
Very effective
Versatile at all levels
Disadvantages:
Difficult to construct good test items.
Difficult to come up with plausible distractors/alternative responses.
Avoid "all of the above"--can answer based on partial knowledge (if one is incorrect or
two are correct, but unsure of the third...).
Avoid "none of the above."
Make all distractors plausible/homogeneous.
Don't overlap response alternatives (decreases discrimination between students who
know the material and those who don't).
Don't use double negatives.
Present alternatives in logical or numerical order.
Place the correct answer at random (otherwise the "A" position tends to be used most often).
Make each item independent of others on test.
A way to judge a good stem: students who know the content should be able to answer
before reading the alternatives.
List alternatives on separate lines, indented and separated by a blank line; use letters
rather than numbers for alternative answers.
Use more than 3 alternatives; 4 is best.
General Rules
1. The stem should pose a clear question or problem and should contain as much of the item as
possible. It should be written as a question.
Poor: Evolution:
a. Albert Einstein.
b. Charles Darwin.
c. Marie Curie.
2. The stem should be stated simply, using correct English.
a. 1831
b. The calendar was inaccurate then; the year was actually 1832.
c. Darwin was not the only one to visit Galapagos.
d. Darwin visited Galapagos sometime between 1831 and 1836.
5. Avoid the use of negatives such as none or not. If they must be used, underline or capitalize
them.
Better: Which of the following is NOT a mammal? OR All of the following are
mammals EXCEPT:
7. All alternatives should be grammatically related to the stem and listed in some logical
numerical or systematic form. This is less confusing to the students and decreases the probability
that they will make careless mechanical errors.
How many legs do spiders have?
8. The length of the alternatives should be consistent, not vary with being correct or incorrect.
Test-wise students know that the correct answer is often the longest one with the most qualifiers.
a. botanist
b. geologist
c. astronomer
d. physicist
11. Avoid use of response alternative such as "none of the above," "none of these," "both (a) and
(c) above," or "all of the above." "None of the above" may measure only the student's ability to
recognize incorrectness. "All of the above" may be a giveaway if a student recognizes more than
one correct alternative. Also a student may recognize a correct response and mark it without
reading down to the "all of the above" alternative.
a. A physicist
b. A mathematician
c. A Nobel Prize winner
d. Both B and C above
e. All of the above
12. Each alternative should be independent so as not to give clues to answer another alternative.
13. When testing for knowledge of a term, it is preferable to put the word in the stem, and
alternative definitions in the response alternatives.
a. Proton
b. Neutriolos
c. Electron
d. Neutron
14. All alternatives should be written so that they are all plausible to the less informed student.
Advantages:
1. A large number of ideas can be addressed in a short period of response time.
2. These questions are easily and quickly scored.
3. Questions can elicit responses from all cognitive levels, from knowledge to evaluation.

Disadvantages:
1. It is time-consuming to write good items, especially those at higher cognitive levels.
2. Test-wise and English-fluent students tend to be favored.
Do's:
1. Use the same number of distractors (wrong answers) for every question.
2. Use plausible distractors that are related to the stem and are similar in character; tricky, cute, and 'throw-away' ones are anathema.
3. Have all distractors (and the correct answer) about the same length.
4. Use correct grammar; if the stem is an incomplete sentence, each distractor should be grammatically consistent with it and complete the sentence.

Don'ts:
1. Use specific determiners in distractors such as all, none, only, and alone because they usually indicate an incorrect answer. Likewise, avoid generally, often, usually, most, and may because they often indicate the correct answer.
2. Use negatives, including less obvious ones, such as without, because they can confuse or be missed by students; highlight the negative word if you find you must use one.
3. Provide clues in the stem, such as "a" or "an" at the end; put these articles with the alternatives instead.
Possible Distractors
3. Combine conclusions and explanations so that the alternatives cover the cases: both are right; the former is
wrong; the latter is wrong; both are wrong.
Recommendations:
1. Using a graphic may make it easier to write more challenging questions, e.g., graph interpretation.
2. Analyses are easily done on difficulty level and whether distractors are working; some machine
scoring programs can compute these.
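Even without a machine-scoring program, a basic item analysis is easy to sketch by hand. In this hypothetical example, difficulty is the proportion of students answering the item correctly, and a simple count per option shows whether each distractor is drawing any responses.

```python
from collections import Counter

# Item-analysis sketch for one multiple-choice item, assuming you have
# recorded each student's chosen option. Data are made up for illustration.
responses = ["b", "b", "c", "b", "a", "b", "d", "b", "c", "b"]
answer_key = "b"

counts = Counter(responses)
difficulty = counts[answer_key] / len(responses)  # proportion correct (p)
print(f"difficulty p = {difficulty:.2f}")         # p = 0.60

# A distractor that nobody (or nearly everybody) picks is not working.
for option in "abcd":
    print(option, counts.get(option, 0))
```

Here each wrong option attracts at least one response, so all the distractors are doing some work; an option with a count of zero would be a candidate for revision.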
Good for:
Advantages:
Can test large amounts of content
Students can answer 3-4 questions per minute
Disadvantages:
They are easy
It is difficult to discriminate between students who know the material and students who
don't
Students have a 50-50 chance of getting the right answer by guessing
Need a large number of items for high reliability
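The 50-50 guessing chance can be quantified with the binomial formula, which gives the probability that a pure guesser reaches a given score and shows why a true/false test needs many items for reliable results. A sketch:

```python
from math import comb

# Probability that a pure guesser gets at least k of n true/false items
# right, under the binomial distribution with p = 0.5 per item.
def p_at_least(n, k, p=0.5):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of reaching a 70% score purely by guessing:
print(round(p_at_least(10, 7), 3))   # 0.172 on a 10-item test
print(round(p_at_least(50, 35), 3))  # 0.003 on a 50-item test
```

Lengthening the test from 10 to 50 items shrinks the guesser's chance of a 70% score from roughly one in six to about three in a thousand, which is the reliability argument behind the bullet above.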
Use specific determiners with caution: never, only, all, none, always, could, might, can,
may, sometimes, generally, some, few.
Use only one central idea in each item.
Don't emphasize the trivial.
Use exact quantitative language
Don't lift items straight from the book.
Make more false than true (60/40). (Students are more likely to answer true.)
General Rules
1. Avoid using specific determiners such as always, never, might, may, only, etc. There are
usually exceptions to these strong terms.
Poor: T F All of the mountains in the Rocky Mountains were formed by volcanic action.
Good: T F The mountains in the Rocky Mountains were formed by volcanic action.
2. When writing a true/false item, be sure to base it on a statement that is absolutely true or false.
3. Eliminate double negatives and if possible avoid negatives. If a negative such as not or none
must be used, be sure to underline or capitalize it.
4. Do not take statements out of textbooks, but write the statement in your own words.
5. Do not make the true statements consistently longer than the false statements and vice versa.
Good: T F John Glenn is best known for his first orbital flight around the earth.
7. Avoid the use of more than one idea in an item unless it is a cause/effect item. If it is a
cause/effect item, it should be stated so that students will react to the effect and not the cause.
Poor: T F Slavery was a major cause of the Civil War whereas the economic situation of
the southern states was not.
Good: T F Slavery was one of the major causes of the Civil War
Poor: T F Gerald Ford did not win the 1976 presidential election because of the good
weather in the east which brought many people to the polls on election day.
Good: T F One of the reasons Gerald Ford did not win the 1976 presidential election was
the good weather in the east which brought many people to the polls on
election day.
8. A false statement should be written so that it is plausible to someone who has not studied the
area being tested.
9. The crucial part of a true/false item should be placed at the end of the item.
Poor: T F The economic situation of the southern states was a major cause of the Civil
War.
Good: T F A major cause of the Civil War was the economic situation of the southern
states.
10. When using opinion, the source should be identified unless the ability to identify the source
is what is being measured.
11. It is preferable that students not be asked to write "T" or "F" in a blank. Many students
have developed a talent for making T's and F's look very similar. It is best to provide
alternative responses and have the students circle or underline the one which is correct.
Do's:
1. Use a single point that determines the truth of the statement. An example violation: The cm is larger than the mm and the mm is larger than the dm.
2. Take care with grammar and spelling.
3. Use a single clause, simply and directly stated; if two clauses are used, the main clause should be true and the subordinate clause true or false. An example violation: Lilies are considered annuals because their bulbs live from year to year.

Don'ts:
1. Use tricky questions.
2. Use unnecessary words and complicated content.
3. Use statements directly from the text. Rephrase them so students must at least have comprehended the material as opposed to recognizing it.
4. Use negatives; this means not just words like not or none, but negative prefixes and suffixes as well. Double negatives are never grammatical.
Recommendations:
These questions are okay for quick checks of vocabulary and concepts.
The point value assigned should be minimal.
4. Matching Type
Good for:
Knowledge level
Some comprehension level, if appropriately constructed
Types:
Terms with definitions
Phrases with other phrases
Causes with effects
Parts with larger units
Problems with solutions
Advantages:
Maximum coverage at knowledge level in a minimum amount of space/prep time
Valuable in content areas that have a lot of facts
Disadvantages:
Time consuming for students
Not good for higher levels of learning
Use items in the response column more than once (reduces the effects of guessing).
Use homogeneous material in each exercise.
Make all responses plausible.
Put all items on a single page.
Put response in some logical order (chronological, alphabetical, etc.).
Responses should be short.
General Rules
1. In the directions, the basis for matching should be stated, and also whether the various
responses can be used once or more than once.
3. The statements in the response column (right) should be kept short and listed in some logical
order--alphabetically or chronologically. This helps students quickly locate responses.
4. The number of items in the left column should be five or less, for Junior High students. Tests
for high school students may contain up to 10 items.
5. The number of responses should exceed the number of items in order to avoid answering by
the process of elimination.
Poor:
Better:
8. Longer words and phrases should be listed in the left column and shorter words and phrases
in the right column. Placing long phrases in the right column requires students to
keep much more information in their heads as they consider all the alternatives.
Advantages:
1. A large number of related ideas can be addressed in a short period of time.
2. Answers are easily and quickly scored.

Disadvantages:
1. Such questions are restricted to recognition of simple understandings.
2. Clues are difficult to avoid.
Do's:
1. Make certain that the relationship between the stems and the responses is the same throughout the question. For example, all of the items might be things OR events, but a combination of things and events is inappropriate. (Warning: this is more difficult than it appears!)
2. State the specific relationship between the stems and responses in the directions to the question. Check that it fits each stem and its response. If it doesn't, rework the question.
3. Put the stems ("question") column on the left and number them.

Don'ts:
1. Split a matching question between pages.
2. Fail to check that directions state a relationship and that it is correct across the entire question.
3. Provide more than one correct response for a single stem, unless you've been very clear with the directions and have taught students to do this kind of question in advance.
4. Change the grammar across stems and responses, e.g., between plural and singular.
Recommendations:
3. Having just a few responses when some can be used twice (or more) is okay.
5. Rating Scales
1. There should be a separate scale for each characteristic, attitude, or behavior being measured.
3. The characteristics and the points on the scale should be clearly defined.
Example: a graphic scale running from 1 to 5, with labeled points such as:
1 - unimportant
2 - of little importance
3 - important
4 - very important
5. Raters should be instructed to mark only at the designated rating positions, not in-between.
6. Raters should be instructed to omit ratings where they feel unqualified to judge.
7. Do not define the end categories so extremely that no one will use them.
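A sketch of how such ratings might be summarized, following rule 6 above: a rater who feels unqualified omits the rating, and omitted ratings are excluded from the average rather than counted as zero. The traits and scores below are hypothetical examples.

```python
# Summarizing 5-point scale ratings from several raters.
# None marks an omitted rating (rater felt unqualified to judge);
# omitted ratings are excluded, never counted as zero.
ratings = {
    "cooperation": [5, 4, None, 4],   # four raters, one abstained
    "punctuality": [3, 3, 4, 3],
}

for trait, scores in ratings.items():
    given = [s for s in scores if s is not None]
    average = sum(given) / len(given)
    print(trait, round(average, 2))
```

Treating an omission as a zero would silently drag the average down, which is why the abstention must be filtered out before dividing.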
6. Checklists
1. Identify and describe clearly each of the specific actions desired in the performance.
2. Add to the list those actions which represent common errors, if they are limited in number and
can be clearly identified.
3. Arrange the desired actions and likely errors in the approximate order in which they are
expected to occur.
4. Provide a simple procedure for numbering or checking the actions in the sequence in which
they occur.
5. If the list is lengthy, group the behavior to be checked under separate sub-headings.
Example Checklist
Directions: Check the space to the left of each behavior as that behavior is observed.
____ Has proper materials (2 slices of bread, peanut butter, knife) ready.
____ Lays bread on flat surface.
____ Opens peanut butter jar.
____ Grasps knife firmly by the handle.
____ Inserts blade of knife into peanut butter and removes the desired amount.
____ Has difficulty manipulating the knife.
____ Spreads peanut butter on one side of the first slice of bread with a side-to-side motion so
that peanut butter is evenly distributed.
____ Places second slice of bread directly on top of peanut butter.
____ Closes peanut butter jar.
7. Essay
Good for:
Advantages:
Students less likely to guess
Easy to construct
Disadvantages:
Can limit amount of material tested, therefore has decreased validity.
Subjective, potentially unreliable scoring.
This final list may serve you well as you score student responses.
1. Decide whether to use the analytic or holistic approach. The tentative scoring key or
model answer should reflect the specific approach chosen.
2. Check the scoring key or model answer against a sample of actual responses without
assigning any grades. Revise the scoring key or model answer if necessary.
3. Decide in advance how to handle irrelevant and inaccurate information, bluffing, and
technical problems (e.g., spelling, penmanship, grammar, punctuation, organization, etc.).
4. Evaluate the responses without identifying the student.
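The analytic-versus-holistic choice in step 1 can be illustrated with a small sketch. The criteria, weights, and level descriptors below are hypothetical examples, not a prescribed rubric.

```python
# Sketch contrasting analytic and holistic scoring of one essay.
# Criteria, weights, and descriptors are hypothetical examples.

# Analytic: a separate score per criterion, summed into a total.
analytic = {"content": (8, 10), "organization": (4, 5), "mechanics": (4, 5)}
earned = sum(score for score, _ in analytic.values())
possible = sum(out_of for _, out_of in analytic.values())
print("analytic:", earned, "/", possible)   # 16 / 20

# Holistic: one overall score, with a written descriptor for each level.
holistic_levels = {
    4: "thorough, well organized, few errors",
    3: "adequate, generally organized",
    2: "partial, weakly organized",
    1: "minimal or off-topic",
}
overall = 3
print("holistic:", overall, "-", holistic_levels[overall])
```

Note that the holistic approach yields one score, so each scale point needs its own description, exactly as required earlier for holistic rubrics.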
Essay Questions
Advantages:
1. All cognitive levels can be addressed with essay questions.
2. It takes less time to write an essay test (only because there are fewer questions to write).
3. Students' organizational skills are also measured.

Disadvantages:
1. Essay questions are time-consuming to answer, and answers take more time to score.
2. Less content can be sampled.
3. Reliability of both response and score is less (although validity may be better).
Do's:
1. Teach students how to respond to the types of essay questions to be asked, e.g., how to construct an argument citing evidence, logic and the null hypothesis.
2. Give adequate directions as to the content of the desired response, i.e., don't just say "discuss," say "discuss in terms of x, y, and z."
3. Provide the structure of response wanted in the directions, e.g., compare and contrast, analyze, synthesize, evaluate from (what/whose?) perspectives, develop in the manner of (what/who).
4. Indicate the length of the response desired.
5. Write longer rather than shorter questions. Use novel questions when feasible.
6. Ask for thinking beyond knowing or comprehending (don't waste anyone's time!).
7. Have students respond on their own paper or provide plenty of space.
8. Provide for a range of acceptable answers such that all students will be able to respond to some extent. Encourage students to try.
Don'ts:
1. Don't provide optional questions, i.e., answer two of the following four questions. This results in different tests (if you do offer choices, make sure you are satisfied with different tests).
2. Don't start essay questions with words such as name, list, who, what, where, etc. These do not indicate that thinking beyond recall is required; use a different kind of question instead.
3. Don't wait until the last minute to write. Allow time to review and rewrite.
Scoring
1. Decide whether an analytic or holistic rubric is more appropriate for the question.
2. Plan the scoring guide/rubric as you write the questions (usually improves the question).
3. Read a sample of responses before you begin to score; make adjustments if needed.
4. Vary the sequence of papers as you score across questions. No paper should always be
first.
5. Do read all responses to a single question together (faster and helps you maintain focus).
6. Hide students’ names.
7. Provide constructive comments and do so legibly.
Recommendations:
Having students construct a table or do a labeled diagram may sometimes be more appropriate than writing an essay.
If problems / calculations are used, consider having students explain what the answer means.
You may want to consider grading writing skills separately.
Supplementing Essays
There are creative ways to supplement essay tests that can be just as effective as the essays themselves. For example, rather than asking students to write an essay describing the process of mitosis, you might ask them to draw a diagram and label it with short descriptions. Just make sure that you have covered a similar diagram in class.
8. Oral Exams
Good for:
Advantages:
Useful as an instructional tool-allows students to learn at the same time as testing.
Allows teacher to give clues to facilitate learning.
Disadvantages:
Time consuming to give and take.
Could have poor student performance because they haven't had much practice with it.
9. Student Portfolios
Good for:
Advantages:
Can assess compatible skills: writing, documentation, critical thinking, problem solving
Can allow student to present totality of learning.
Disadvantages:
Can be difficult and time consuming to grade.
Advantages:
Measures some skills and abilities not possible to measure in other ways
Disadvantages:
Cannot be used in some fields of study
Difficult to construct
Difficult to grade
Time-consuming to give and take
• Generalizability - the likelihood that the students' performance on the task will generalize to comparable tasks.
• Authenticity - the task is similar to what the students might encounter in the real world, as opposed to encountering it only in school.
• Teachability - the task allows one to master the skill that one should be proficient in.
• Feasibility - the task is realistically implementable in relation to its cost, space, time, and equipment requirements.
• Fairness - the task is fair to all students regardless of their social status or gender.
More likely than not, you will be responsible for the typing, proofreading, and collation of
the test. Spell check is a great addition to software programs, but this will not detect all
errors. If you hand out a test with numerous mistakes, the message you are sending to
the students is “this test and your evaluation are not worth a lot of effort.”
It’s helpful to consecutively number each answer sheet and accompanying test booklet.
When the students hand in the answer sheet and test booklet, you can quickly
determine if a test booklet is missing and whose test booklet it is. After you have gone
over a test with the students, it’s beneficial for you to collect the booklets since you
might want to use many of the items again.
Make sure each student gets one, and only one, booklet and answer sheet.
Try to spread students out when they are taking the test.
The best anti-cheating devices are your ears, eyes, and feet. Walk up and down
aisles; it’s good exercise.
Station yourself at the exit door with a roster in hand. When tests are handed to
you, check off names. That way, if you don’t have a student’s exam, you know
that you lost it and the student should be retested.
Bring pencils with you and tell students that if they need another, they should raise their hand rather than go to another student. Have the students return the borrowed pencils!
SUMMATIVE ASSESSMENT
I. TRUE OR FALSE. Write TRUE if the statement is correct; if it is incorrect, change the
underlined word/s to make it correct. (15 pts)
Multiple Choice. Write the letter of the best/correct answer. (25 points)
1. Which of the following does not belong to the group?
a. Matching c. Multiple choice
b. True-false d. Completion type
2. A type of test where three or more plausible alternatives are provided in each item.
a. Essay test c. Multiple choice test
b. Matching type d. Problem-solving test
5. A test that consists of a series of items, each of which admits only one correct
answer out of two alternatives.
a. Problem-solving c. True-false
b. Multiple-choice d. Short-answer
6. A matching item test includes ten events to be matched with twelve responses
containing dates, cities, and provinces. The error of this test construction is:
a. too many responses c. Homogeneous response
b. unbalanced matching d. Heterogeneous response
10. From a measurement standpoint, using a test consisting entirely of essay items is
undesirable because
a. Content sampling tends to be limited
b. Scoring requires too much time.
c. Difficult to construct the item
d. Inefficient measurement of cognitive ability.
12. The “halo effect” in scoring items is a tendency to score more highly those
responses:
a. That are technically well written
b. Read earlier in the scoring process
c. Of students known to be good students
d. Read later in the scoring process
13. In an association form, short-answer item, the spaces for the answers should:
a. All have uniform size
b. Vary according to length of correct answer
c. Vary in size, but not according to any order
16. Which test is subjective and less reliable in scoring and grading?
a. Completion c. True or false
b. Essay d. Matching
17. Teacher Kiko Martin wants to test students' acquisition of declarative knowledge.
Which test is appropriate?
a. Performance test c. Short answer test
b. Submission of a report d. Essay
18. Teacher Vilma Aunor would like to cover a wide variety of objectives in the quarterly
examinations in her English class lesson on subject-verb agreement. Which of the
following types of test is the most appropriate?
a. True or False c. Multiple choice
b. Essay d. Matching type
19. Among the types of assessment below, which does not belong to the concept-
group?
a. Multiple choice c. True or false
b. Matching type d. Completion test
20. Which type of test measures student’s thinking, organizing and written
communication skills?
a. Completion c. True or false
b. Essay d. Matching
22. Mr. Piolo Cruz made an essay test for the objective "Identify the planets in the
solar system". Was the assessment method used the most appropriate for the given
objective? Why?
a. Yes, because essay test is easier to construct than objective test.
b. Yes, because essay test can measure any type of objective.
c. No. He should have conducted oral questioning.
d. No. He should have prepared an objective test.
23. Mr. Janno Alcasid wants to test students’ knowledge of the different places in the
Philippines, their capital and their products and so he gave his students an essay
test. If
you were the teacher will you do the same?
a. No, the giving of an objective test is more appropriate than the use of an essay.
b. No, such method of assessment is inappropriate because essay is difficult.
c. Yes, essay test could measure more than what other tests could measure.
d. Yes, essay test is the best in measuring any type of knowledge.
24. What principle of test construction is violated when one places very difficult items
at the beginning, thus creating frustration among students, particularly those of
average ability and below average?
a. All the items of a particular type should be placed together in the test
b. The items should be phrased so that the content rather than the form of the
statements will determine the answer
c. All items should be of approximately 50 percent difficulty
d. The items of any particular type should be arranged in an ascending order of
difficulty
25. With specific details in mind, which one has a stronger diagnostic value?
a. Multiple Choice test c. Restricted Essay type
b. True or False test d. Non-restricted/Extended Essay type
Identification. Evaluate the following items. Indicate whether each item has
Choose the letter only of the correct answer from the above options.(10 pts)
___ 4. A reading teacher wants to find out if the pupils are ready to move on to the next
lesson.
What kind of test should she give?
A. Diagnostic C. Placement
B. Formative D. Medical
___ 5. A gumamela is an
A. Incomplete flower C. Pistillate flower
B. Complete flower D. Staminate flower
Matching Type. Match Column B (response ) with Column A (premises ) and write the letter
of your answer on the blank before each number in column A. (10 pts)
Column A Column B
Module 7
ITEM ANALYSIS
1. Determine the difficulty index and discrimination index of items in a given test.
2. Improve items in the test using the item analysis.
3. Identify items which are ambiguous, miskeyed, guessed and with poor
distracters.
ITEM ANALYSIS: the process of examining students' responses to each test item
to judge the quality of the item. Specifically, one looks for the difficulty and
discriminating ability of the item, as well as the effectiveness of each alternative.
CRITERION GROUP:
We need two criterion groups for the item analysis. The smaller the
percentage used for the upper and lower groups, the greater the differentiation will be.
However, the smaller the extreme groups, the less reliable the resulting values would
be. Kelley (1939) claimed that the optimum at which these two conditions balance is
reached when the upper and the lower 27% values are used.
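The 27% rule above can be sketched in a few lines of Python. The score list and class size here are illustrative, not taken from the module:

```python
# Split a class's test scores into upper and lower criterion groups
# using Kelley's 27% rule (illustrative scores, not from the module).

def criterion_groups(scores, fraction=0.27):
    """Return (upper, lower): the top and bottom `fraction` of the
    class, after ranking scores from highest to lowest."""
    n = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, reverse=True)
    return ranked[:n], ranked[-n:]

scores = [48, 45, 44, 41, 40, 38, 37, 35, 33, 30, 28, 25, 22, 20, 15]
upper, lower = criterion_groups(scores)
print(upper)  # top 27% of 15 pupils (4 pupils)
print(lower)  # bottom 27%
```

The middle group (the remaining 46% or so) is set aside for the discrimination index but still counts toward the facility index.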
(The teacher has to check the options again. Most probably option d is the
correct answer, and the item has been miskeyed.)
Upper Group    4    3    3    2    3

Options        a   (b)    c    d    e
Upper Group    0   12     0    0    3
Lower Group    0    8     6    0    7
The facility index ranges from 0 to 1, from very hard to very easy.
The discrimination index can take values from -1.00 to +1.00. The higher
the discrimination index, the better the item discriminates.
The following are rules of thumb for interpreting item discrimination index values
for classroom tests.
6. Calculate the Facility index as follows: Sum up all the number of pupils in
the three groups who responded correctly and divide it by the total number of
pupils taking the test, N.
This gives the Facility Index (FI)
FI = ( U + M + L ) / N
Where:
U - Number of pupils in the upper group who answered
correctly.
M - Number of pupils in the Middle group who answered
correctly.
L - Number of pupils in the Lower group who answered
correctly.
N - Total Number of Pupils taking the test.
DI = ( U – L ) / NUL
By examining the responses of the pupils in the upper and lower groups
on each option, together with the DI and the FI of each item, the teacher can revise
the item to improve it. The following guidelines may be used in selecting and improving the item.
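The FI and DI formulas above can be combined in a short Python sketch. The counts below are hypothetical, and NUL (which the module does not define) is assumed here to be the number of pupils in the upper (equivalently, the lower) criterion group, its usual meaning:

```python
# Facility Index and Discrimination Index for one item.
# Hypothetical counts; NUL is assumed to be the size of the
# upper (equivalently the lower) criterion group.

def facility_index(u, m, l, n):
    """FI = (U + M + L) / N: the proportion answering correctly."""
    return (u + m + l) / n

def discrimination_index(u, l, nul):
    """DI = (U - L) / NUL, ranging from -1.00 to +1.00."""
    return (u - l) / nul

# Example: 40 pupils, upper and lower groups of 11 each (27% of 40)
U, M, L = 9, 12, 3          # correct answers per criterion group
FI = facility_index(U, M, L, 40)
DI = discrimination_index(U, L, 11)
print(round(FI, 2))  # 0.6
print(round(DI, 2))  # 0.55
```

Here the item is of moderate difficulty (FI = 0.6) and discriminates well (DI = 0.55), so it could be retained.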
Table 2
2. Any option that is not selected by the pupils in either the upper or lower
group violates the principle that each alternative in a multiple choice- item
should be PLAUSIBLE. The alternative should be revised before it is used
again.
3. Distractors should, if possible, discriminate negatively. This means that
incorrect options should be selected less frequently by the upper group than
by the lower group. A distractor is functioning well if it attracts most or a large
proportion of the pupils from the lower group.
Note: Easy items will prove to have low discrimination power. Why? Let’s take an
example at the extreme. If an item is answered correctly by all students, as many
good as poor students will have answered it correctly. Keeping the item, then, should
be more a function of whether it addresses an important objective. However, very
difficult items are also apt to produce poor discrimination indices. These items also
should be examined carefully.
SUMMATIVE ASSESSMENT
TRUE OR FALSE. Write TRUE if the statement is correct and write FALSE if it is
incorrect.
Given the data for a 10-item multiple choice test, do an item analysis. Decide whether
each item is to be accepted or rejected.
1. Why is item analysis needed after checking the test? What is the importance of
item analysis in improving the quality of test items?
Module 8
STANDARDS OF QUALITY ASSESSMENT:
CHARACTERISTICS OF A GOOD TEST
Objectives:
If a test looks like it measures what it claims to, the test is said to have
face validity. A test with face validity may or may not actually produce data
which correspond to the learning objectives. For example, an objective
may call for a student to classify four types of leaves. At first, an item may
seem valid; it has high face validity. Closer attention will show that the
objective requires higher levels of thinking to classify and describe each leaf.
The test then does not in fact measure what it purports to, and can be said to
have a low degree of content validity. No chemistry teacher would think of
measuring knowledge of analytical chemistry with a test on acid rain. Nor
would a biology teacher think seriously about measuring microscope skills
with an essay test.
Ex2. To correlate the results of college entrance test and the grade
weighted average (GWA) of students at some future time is
establishing predictive validity. The college entrance result is the
predictor and the GWA is the criterion.
NOTE: The main difference is the time at which the criterion becomes
available. In predictive validity, the criterion is not yet available at the time the
test is administered. In concurrent validity, the criterion is already
available at the time the test is conducted.
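Establishing predictive validity amounts to correlating the predictor with the criterion. A minimal sketch, with hypothetical entrance-test scores and GWAs (in the Philippine scale, where 1.0 is the highest grade, so a good predictor yields a strongly negative r):

```python
# Pearson correlation between an entrance test (predictor) and
# later GWA (criterion). The data are hypothetical, for illustration.
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

entrance = [85, 78, 92, 70, 88, 75, 95, 80]           # predictor
gwa = [1.8, 2.3, 1.5, 2.7, 1.7, 2.5, 1.4, 2.1]        # criterion (1.0 = highest)

r = pearson_r(entrance, gwa)
print(round(r, 2))
```

The magnitude of r (not its sign, which here just reflects the grading scale) indicates how well the entrance test predicts later achievement.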
Teachers may be able to improve classroom tests' validity by using the list
below as a guide to test construction or as a checklist to review a test or a testing
situation.
B. Test items
Material tested
Construction
Setting
B. RELIABILITY
Reliability refers to the consistency of test results. If a test gives the same
results when measuring an individual or group on two different occasions then the
scores are reliable. If different teachers rate the same essay, for example, on the
same criteria and obtain the same score then we say the scores are reliable
from one rater to another. In both cases we are interested in consistency or
trustworthiness. In a simple example you might consider the task of measuring the
length of a room.
There are several instruments you could use. You could step off the length on
two different occasions or two people could step it off. Another way would be to use
a large rubber band with marks on it at one foot intervals. Again you could measure
several times or several people could use the rubber band. A third choice might be to
use a steel measuring tape. The tape will obviously give more consistent results; the
more trustworthy results--from time to time and from measurer to measurer. Unless
the measurement can be shown to be reasonably consistent over different occasions
or over different samples of the same behavior little confidence can be placed in the
results. The results are not reliable.
There are factors which influence the reliability of a test. They are:
Objectivity:
Scores for objective test items are less subject to the opinions or values of the
scorers and are thereby more reliable. In essay testing, when relying on
observations of students' performance, or when rating the products of their work,
scores tend to be unreliable. Later you will learn of some ways to increase reliability
in such situations.
The test itself may contain poorly constructed items, items that give clues,
ambiguous items, items that are very easy or very difficult, and items with a very
high vocabulary or reading level. With norm-referenced tests there must be a range
of difficulty. Tests which are too easy or too difficult tend to be less reliable.
Test-Retest Method
It determines how consistent scores are over a given period of time. The
same test is administered twice to the same group with a time interval in
between. The time interval may vary according to the purpose of the test
administrator. Some authors suggest that a period of two weeks or 15 days is
sufficient. If the time interval is short, students may recall their
answers from the first administration. If the time interval is too long, it tends
to lower the test reliability.
NOTE: It must be remembered that a test score is not a true score of the test
taker. An obtained score is a combination of his/her true score plus error of
measurement.
Parallel-Forms (Equivalent-Forms) Method
It uses two different versions of the same test administered to the same group
close together in time (same day or next day). Form A has the same number of
items, the same item format, and almost the same moderately difficult items as Form
B.
For example, the skill to be tested in both forms is to multiply two-digit number
with another two digit number.
FORM A : 35 x 75
FORM B : 25 x 82
We can say that Form A and Form B measure the same skill and the items
are parallel.
4. Internal-Consistency Method
It employs only one test administration of the test given to the group.
0.90 and above   Excellent reliability; at the level of the best standardized tests.
0.70 – 0.79      Good for a classroom test. There are probably a few items
                 which could be improved.
0.50 – 0.59      Suggests need for revision of the test, unless it is quite short
                 (ten or fewer items). The test definitely needs to be
                 supplemented by other measures.
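The coefficient interpreted in the table above is usually Cronbach's alpha (equivalent to KR-20 for right/wrong items), computed from a single administration. A sketch with hypothetical 0/1 item scores:

```python
# Cronbach's alpha from a single test administration -- the usual
# internal-consistency coefficient. The 0/1 data are hypothetical.

def cronbach_alpha(items):
    """`items` is a list of per-item score lists (one list per item,
    one entry per pupil). alpha = k/(k-1) * (1 - sum of item
    variances / variance of pupils' total scores)."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[p] for item in items) for p in range(n)]
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))

# 4 items x 6 pupils, scored right (1) / wrong (0)
items = [
    [1, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
]
print(round(cronbach_alpha(items), 2))  # 0.81
```

An alpha of about 0.81 would fall just above the "good for a classroom test" band in the table.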
C. FAIRNESS
An assessment procedure needs to be fair. This means many things. First,
students need to know exactly what the learning targets are and what they are
supposed to be achieving; otherwise, they could get lost in the maze of concepts
being discussed in class. Likewise, students have to be informed how their progress
will be assessed in order to allow them to strategize and optimize their performance.
Second, assessment has to be viewed as an opportunity to learn rather than
an opportunity to weed out poor and slow learners. The goal should be that of
diagnosing the learning process rather than the learning object.
Third, fairness also implies freedom from teacher stereotyping. Some
examples of stereotyping include: boys are better than girls in mathematics, or girls
are better than boys in language. Such stereotyped images and thinking could lead to
unnecessary and unwanted biases in the way that teachers assess their students.
E. ECONOMY
The test should not entail too much cost in terms of production and
administration. A paper-and-pencil test is more economical compared to many
performance tests. The test should also save the time and effort spent in its
administration, and answer sheets must be provided so it can be given from time
to time.
F. SCORABILITY
The test should be easy to score such that directions for scoring are clear,
point/s for each correct answer(s) is/are specified.
G. INTERPRETABILITY
Test scores can easily be interpreted and described in terms of the specific
tasks that a student can perform or his/her relative position in a clearly defined
group.
SUMMATIVE ASSESSMENT
MULTIPLE CHOICE. Write the letter of the correct/best answer for each item. (20
points)
3. If a teacher-made test overemphasizes facts and underemphasizes the other
objectives of the course for which it is designed, what can be said about the
test?
a. It lacks content validity
b. It lacks construct validity
c. It lacks criterion-related validity
d. All of the above
4. Should a teacher make deductions for misspelled words in an examination in
a content course?
a. No; such deductions will invalidate the test
b. Yes; all errors should result in deductions
c. Yes; good spelling is a goal for which all teachers are responsible
d. Yes; but only to the extent that spelling is a basic objective of the
course
5. What happens when deductions due to poor penmanship are made in a pupil's
test in science?
a. It lowers the validity of the test
b. It lowers the reliability of the test
c. It lowers both the validity and reliability of the test
d. It does not affect the validity and reliability of the test
6. Which best describes validity?
a. Consistency in the test results
b. Adequacy of standardization
c. Homogeneity in the content of the test
d. Objectivity in the administration and scoring
7. Which is directly affected by the objectivity in scoring a test?
a. The validity of the test
b. The difficulty of the items
c. The interpretation of the test
d. The reliability of the test
8. Which describes an objective test?
a. It has definite norms which serve as a basis for evaluating students'
performance
b. It is a test in which teacher judgment in the construction of the test
is eliminated
c. It is a test where adequate answers get the same rating
d. Its items reflect directly upon the objectives
9. Why is it said that standardized tests can never completely replace informal
teacher-made test?
a. They lack validity for classroom purposes
b. They lack reliability for classroom purposes
c. They provide inadequate basis for interpretation
d. They call for special training in administration and scoring
10. Why do experts recommend a variety of instruments and techniques of
evaluation?
a. They allow the teacher to use a variety of teaching procedures.
b. They make for greater objectivity in scoring
c. They yield a wider range of scores and permit better grading
d. They allow different objectives to be evaluated more adequately
17. Ms. Carla Phokwang developed an Achievement Test in Math for her grade three
pupils. Before she finalized the test, she examined carefully if the test items were
constructed based on the competencies that have to be tested. What test validity is
she trying to establish?
a. content validity c. predictive validity
b. concurrent validity d. construct validity
18. Mrs. Rosa Sampaguita wants to establish the reliability of her achievement test
in English. Which of the following activities will help achieve her purpose?
a. Administer two parallel tests to two different groups of students.
b. Administer two equivalent tests to the same group of students.
c. Administer single test but to two different groups of students.
d. Administer two different tests but to the same group of students
19. Mr. Gringo Bato tried to correlate the scores of his pupils in the Social Studies
test with their grades in the same subject last 3rd Quarter. What test validity is he
trying to establish?
a. Content validity c. Concurrent validity
b. Construct validity d. Criterion-related validity
20. The more objective a test is, the more it is
a. Reliable c. Usable
b. Valid d. Practical