
MODULE 5

TEST AND TESTING

WHAT IS A TEST?

 It is an instrument or systematic procedure which typically consists of a set of questions for measuring a sample of behavior.
 It is a special form of assessment made under contrived circumstances especially so that it may be administered.
 It is a systematic form of assessment that answers the question, "How well does the individual perform, either in comparison with others or in comparison with a domain of performance tasks?"
 It is an instrument designed to measure any quality, ability, skill or knowledge.

PURPOSES / USES OF TESTS

 Instructional Uses of Tests


 grouping learners for instruction within a class
 identifying learners who need corrective and enrichment experiences
 measuring class progress for any given period
 assigning grades/marks
 guiding activities for specific learners (the slow, average, fast)

 Guidance Uses of Tests


 assisting learners to set educational and vocational goals
 improving teachers', counselors' and parents' understanding of children with problems
 preparing information/data to guide conferences with parents about their
children
 determining interests in types of occupations not previously considered or known by the students
 predicting success in future educational or vocational endeavor

 Administrative Uses of Tests


 determining emphasis to be given to the different learning areas in the
curriculum
 measuring the school progress from year to year
 determining how well students are attaining worthwhile educational goals
 determining appropriateness of the school curriculum for students of
different levels of ability
 developing adequate basis for pupil promotion or retention

Classification of Tests According to Format

I. Standardized Tests - tests that have been carefully constructed by experts in the
light of accepted objectives.

1. Ability Tests - combine verbal and numerical ability, reasoning and computations.
Ex.: OLSAT - Otis-Lennon School Ability Test

2. Aptitude Tests - measure potential in a specific field or area; predict the degree to which an individual will succeed in any given area such as art, music, mechanical tasks or academic studies.
Ex.: DAT - Differential Aptitude Test

II. Teacher-Made Tests - constructed by the classroom teacher; measure and appraise student progress in terms of specific classroom/instructional objectives.

1. Objective Type - answers are in the form of a single word, phrase or symbol.

a. Limited Response or Selection Type - requires the student to select the answer from a given number of alternatives or choices.

a.1 Multiple Choice Test - consists of a stem and three to five alternatives or options, of which only one is correct or definitely better than the others. The correct option, choice or alternative in each item is called the answer, and the rest of the alternatives are called distractors, decoys or foils.

a.2 True-False or Alternative Response - consists of declarative statements that one has to respond to or mark true or false; right or wrong; correct or incorrect; yes or no; fact or opinion; agree or disagree; and the like. It is a test made up of items which allow dichotomous responses.

a.3 Matching Type - consists of two parallel columns, with each word, number, or symbol in one column being matched to a word, sentence, or phrase in the other column. The items in Column I or A for which a match is sought are called premises, and the items in Column II or B from which the selection is made are called responses.

b. Free Response or Constructed Response Type or Supply Test - requires the student to supply or give the correct answer.

b.1 Short Answer - uses a direct question that can be answered by a word, phrase, number, or symbol.

b.2 Completion Test - consists of an incomplete statement that can also be answered by a word, phrase, number, or symbol.

b.3 Essay Type - essay questions provide the freedom of response that is needed to adequately assess students' ability to formulate, organize, integrate and evaluate ideas and information or apply knowledge and skills.
b.3.1 Restricted Essay - limits both the content and the response. Content is usually restricted by the scope of the topic to be discussed.

b.3.2 Extended Essay - allows the students to select any factual information that they think is pertinent, to organize their answers in accordance with their best judgment, and to integrate and evaluate ideas as they think appropriate.
Other Classifications of Tests

 Psychological Tests - aim to measure students' intangible aspects of behavior, i.e., intelligence, attitudes, interests and aptitude.

 Educational Tests - aim to measure the results/effects of instruction.

 Survey Tests - measure the general level of students' achievement over a broad range of learning outcomes and tend to emphasize norm-referenced interpretation.

 Mastery Tests - measure the degree of mastery of a limited set of specific learning outcomes and typically use criterion-referenced interpretations.

 Verbal Tests -one in which words are very necessary and the examinee should
be equipped with vocabulary in attaching meaning to or responding to test items.

 Non-Verbal Tests - one in which words are not that important; the student responds to test items in the form of drawings, pictures or designs.

 Standardized Tests - constructed by a professional item writer; cover a large domain of learning tasks with just a few items measuring each specific task. Typically, items are of average difficulty, omitting very easy and very difficult items, and emphasize discrimination among individuals in terms of relative level of learning.

 Teacher-Made Tests - constructed by a classroom teacher; focus on a limited domain of learning tasks with a relatively large number of items measuring each specific task. They match item difficulty to the learning tasks, without altering item difficulty or omitting easy or difficult items, and emphasize description of what learning tasks students can and cannot do/perform.

 Individual Tests - administered on a one - to - one basis using careful oral


questioning.

 Group Tests - administered to a group of individuals; questions are typically answered using paper-and-pencil techniques.

 Objective Tests - one in which equally competent examinees will get the same scores, e.g. multiple-choice test
 Subjective Tests - one in which the scores can be influenced by the opinion/judgment of the rater, e.g. essay test

 Power Tests - designed to measure level of performance under sufficient time


conditions, consist of items arranged in order of increasing difficulty.

 Speed Tests - designed to measure the number of items an individual can complete in a given time; consist of items of approximately the same level of difficulty.

ASSESSMENT OF AFFECTIVE AND OTHER NON-COGNITIVE LEARNING OUTCOMES

Affective and other non-cognitive learning outcomes require assessment procedures beyond the paper-and-pencil test.

Affective Assessment Procedures/Tools

Observational Techniques - used in assessing affective and other non-cognitive learning outcomes and aspects of development of students.

1. Anecdotal Records - a method of recording factual descriptions of students' behavior.

Effective use of Anecdotal Records


1. Determine in advance what to observe but be alert for unusual behavior.
2. Analyze observational records for possible sources of bias.
3. Observe and record enough of the situation to make the behavior
meaningful.
4. Make a record of the incident right after observation, as much as possible.
5. Limit each anecdote to a brief description of a single incident.
6. Keep the factual description of the incident and your interpretation of it separate.
7. Record both positive and negative behavioral incidents.
8. Collect a number of anecdotes on a student before drawing inferences
concerning typical behavior.
9. Obtain practice in writing anecdotal records.

2. Peer Appraisal - is especially useful in assessing personality characteristics, social relations skills, and other forms of typical behavior. Peer-appraisal methods include the guess-who technique and the sociometric technique.

a. Guess-Who Technique - a method used to obtain peer judgments or peer ratings by requiring students to name their classmates who best fit each of a series of behavior descriptions; the number of nominations students receive on each characteristic indicates their reputation in the peer group.

b. Sociometric Technique - also calls for nominations, but students indicate their choice of companions for some group situation or activity; the number of choices students receive serves as an indication of their total social acceptance, and can be tallied as in the sketch below.
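As a rough illustration, the tally is simple to automate. A minimal Python sketch, in which the students and their choices are entirely hypothetical:

    from collections import Counter

    # Each student names the classmates they would choose as companions
    # for a group activity (hypothetical data).
    choices = {
        "Ana":   ["Ben", "Cara"],
        "Ben":   ["Ana", "Cara"],
        "Cara":  ["Ana"],
        "Dario": ["Cara"],
    }

    # The number of times a student is chosen indicates social acceptance.
    acceptance = Counter(name for picks in choices.values() for name in picks)
    for student, count in acceptance.most_common():
        print(f"{student}: chosen {count} time(s)")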

3. Self-Report Techniques - used to obtain information that is inaccessible by other means, including reports on the students' attitudes, interests, and personal feelings.

4. Attitude Scales - used to determine what a student believes, perceives or feels. Attitudes can be measured toward self, others, and a variety of other activities, institutions, or situations.

Types:

a. Rating Scale - measures attitudes toward others or asks an individual to rate another individual on a number of behavioral dimensions on a continuum from good to bad or excellent to poor, or on a number of items by selecting the most appropriate response category along a 3- or 5-point scale (e.g., 5-excellent, 4-above average, 3-average, 2-below average, 1-poor)

b. Semantic Differential Scale - asks an individual to give a quantitative rating to the subject of the attitude scale on a number of bipolar adjectives such as good-bad, friendly-unfriendly, etc.

c. Likert Scale - an assessment instrument which asks an individual to respond to a series of statements by indicating whether she/he strongly agrees (SA), agrees (A), is undecided (U), disagrees (D), or strongly disagrees (SD) with each statement. Each response is associated with a point value, and an individual's score is determined by summing up the point values for each statement. For positive statements: SA - 5, A - 4, U - 3, D - 2, SD - 1. For negative statements, the point values are reversed, that is, SA - 1, A - 2, and so on. (A scoring sketch appears after this list.)

d. Checklist - an assessment instrument that calls for a simple yes-no judgment. It is basically a method of recording whether a characteristic is present or absent or whether an action was or was not taken, e.g., a checklist of a student's daily activities.
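As promised above, here is a minimal Python sketch of Likert scoring; the statements and responses are hypothetical, and negative statements are reverse-scored as described:

    points = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}

    # (statement, is_positive, response) -- hypothetical data
    responses = [
        ("I enjoy solving math problems.", True,  "A"),   # positive: A -> 4
        ("Math makes me anxious.",         False, "D"),   # negative: D -> 4 after reversal
    ]

    total = 0
    for _, is_positive, answer in responses:
        value = points[answer]
        if not is_positive:       # reverse the scale for negative statements
            value = 6 - value     # SA(5)->1, A(4)->2, U(3)->3, D(2)->4, SD(1)->5
        total += value

    print("Attitude score:", total)   # higher = more favorable attitude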
PERSONALITY ASSESSMENTS

It refers to procedures for assessing emotional adjustment, interpersonal relations, motivation, interests, feelings and attitudes toward self, others, and a variety of other activities, institutions, and situations.

Interests are preferences for particular activities.

Example of a statement on a questionnaire: I would rather cook than write a letter.

Values concern preferences for "life goals" and "ways of life", in contrast to interests, which concern preferences for particular activities.

Example: I consider it more important to have people respect me than to admire me.

Attitude concerns feelings about particular social objects - physical objects, types of people, particular persons, social institutions, government policies and others.

Example: I enjoy solving math problems.

a. Nonprojective Tests

1. Personality Inventories

Personality Inventories present lists of questions or statements describing behaviors characteristic of certain personality traits, and the individual is asked to indicate (yes, no, undecided) whether the statement describes her or him.

An inventory may be specific and measure only one trait, such as introversion or extroversion, or may be general and measure a number of traits.

2. Creativity Tests
Tests of creativity are really tests designed to measure those personality characteristics that are related to creative behavior. One such trait is referred to as divergent thinking. Unlike convergent thinkers, who tend to look for the right answer, divergent thinkers tend to seek alternatives.

3. Interest Inventories
An interest inventory asks an individual to indicate personal likes, such as the kinds of activities he or she likes to engage in.

b. Projective Tests

Projective tests were developed in an attempt to eliminate some of the major problems inherent in the use of self-report measures, such as the tendency of some respondents to give "socially acceptable" responses.
The purposes of such tests are usually not obvious to respondents; the individual
is typically asked to respond to ambiguous items.

The most commonly used projective technique is the method of association. This
technique asks the respondent to react to a stimulus such as a picture, inkblot, or word.

SUMMATIVE ASSESSMENT

MULTIPLE CHOICE. Choose the letter of the correct/best answer. (25 points)
Refer to this case in answering items 1 - 2

Two teachers of the same grade level have set the following objectives for the day's lesson: At the end of the period, the students should be able to: A. construct bar graphs; and B. interpret bar graphs. To assess the attainment of the objectives, Teacher A required the students to construct a bar graph for a given set of data and then asked them to interpret this using a set of questions as a guide. Teacher B presented a bar graph and then asked the students to interpret this, also using a set of guide questions.

1. Whose practice is acceptable based on the principles of assessment?


a. Teacher A c. both Teacher A and Teacher B
b. Teacher B d. Neither Teacher A nor Teacher B

2. Which is true about the given case?


a. Objective A matched with performance-based assessment while B can be
assessed using the traditional pen-and-paper objective test.
b. Objective A matched with traditional assessment while B can be assessed
using a performance-based method.
c. Both objective A and B matched with performance-based assessment.
d. Both objective A and B matched with traditional assessment

3. Mr. Fepe Smith is doing a performance-based assessment for the day’s lesson.
Which of the following will most likely happen?
a. Students are evaluated in one sitting.
b. Students do an actual demonstration of their skill.
c. Students are evaluated in the most objective manner.
d. Students are evaluated based on varied evidences of learning.

4. Ms. Jocephine Babaluga rated her students in terms of appropriate and effective use of some laboratory equipment and measurement tools and the students' ability to follow the specified procedures. What mode of assessment did Miss Babaluga use?
a. Portfolio Assessment
b. Traditional Assessment
c. Journal Assessment
d. Performance Based Assessment

5. Mrs. Hilario presented the lesson on baking through a group activity so that the
students will not just learn how to bake but also develop their interpersonal skills.
How should this lesson be assessed?
I. She should give the students an essay test explaining how they baked the
cake.
II. The students should be graded on the quality of their baked cake using a rubric.
III. The students in a group should rate the members based on their ability to
cooperate in their group activity.
IV. She should observe how the pupils perform their tasks.
a. I, II, and III only c. II, III and IV only
b. I, II, IV only d. I, II, III, and IV

6. If a teacher has set objectives in all domains or learning targets which could be assessed using a single performance task, what criterion in selecting a task should she consider?
a. Generalizability c. Multiple Foci
b. Fairness d. Teachability

7. Which term refers to the collection of students' products and accomplishments in a given period for evaluation purposes?
a. Diary c. Anecdotal record
b. Portfolio d. Observation Report
8. Ms. Chikadez Tomador uses alternative methods of assessment. Which of the
following will she NOT likely use?
a. Multiple Choice c. Oral Presentation
b. Journal Writing d. Developing Portfolios

9. Ms. Violeta Kabayo aims to measure a product of learning. Which of these objectives
will she most likely set for her instruction?
a. Show positive attitude towards learning common nouns
b. Identify common nouns in a reading selection
c. Construct paragraph using common nouns
d. Use a common noun in a sentence

10. Which is a guidance function of a test?


a. Identifying pupils who need corrective teaching
b. Predicting success in future academic and vocational education
c. Assigning marks for courses taken
d. Grouping pupils for instruction within a class

11. Prof. Gilingue would like to find out how well her students know each other. What assessment instrument would best suit her objective?
a. Self-report instrument c. Guess-who technique
b. Sociometric technique d. All of the above

12. Mr. Lampazok asked his pupils to indicate on a piece of paper the names of their classmates whom they would like to be with for some group activity. What assessment technique did Mr. Lampazok use?
a. Self-report technique c. Sociometric technique
b. Guess-who technique d. Anecdotal technique

13. Which of the following assessment procedures/tools is useful in assessing social


relation skills?
a. Anecdotal record c. Peer appraisal
b. Attitude scale d. Any of the above

14. Which is an example of affective learning outcome?


a. Interpret stimuli from various modalities to provide data needed in noting
adjustments to the environment
b. Judge problem and issues in terms of situations involved than in terms of
fixed dogmatic thinking.
c. Appreciate the quality and worth of the story read
d. Create a computer program to detect the location of COVID-19 positive cases.

15. How can skills be assessed appropriately?


a. Through self-report
b. Through direct observation
c. Through open-ended interview
d. Through paper-and-pencil test

16. Which is most useful in estimating student’s success in future studies?


a. Aptitude test
b. Achievement test
c. Interest inventories
d. Portfolio techniques

17. What do diagnostic tests identify?


a. The specific nature of the remedial program needed
b. The general areas of weakness in class performance
c. The causes underlying academic difficulties
d. The specific nature of difficulties

18. How are readiness tests classified?


a. Aptitude test
b. Achievement test
c. Interest inventory
d. Personality profile
19. Which describes a speed test?
a. It consists of various types of objective items
b. It consists of items that are scaled in order of difficulty
c. It consists of sub sections with corresponding time limits
d. It consists of items of approximately equal difficulties that are administered
with less than adequate time limits

20. What is the primary use of academic achievement tests?

a. They measure personality traits that make for effective use of one's ability, like motivation
b. They identify the type of activities the individual would tend to select
c. They estimate students' capacity to profit from academic instruction
d. They appraise the present academic ability of the student

21. Why do experts recommend a variety of instruments and techniques of


evaluation?
a. They allow the teacher to use a variety of teaching procedures.
b. They make for greater objectivity in scoring
c. They yield a wider range of scores and permit better grading
d. They allow different objectives to be evaluated more adequately

22. What is one disadvantage of a paper-and-pencil test?

a. It can measure a limited set of objectives.
b. It can identify collaborative skills.
c. It can assess student's learning experiences over time.
d. It can determine creative skills of the students.

23. The National Secondary Achievement Test (NSAT) and the National Elementary Achievement Test (NEAT) results are interpreted against a set mastery level. They fall under __
a. Intelligence Test c. Aptitude Test
b. Criterion-Referenced Test d. Norm-Referenced Test

24. Which are direct measures of competence?


a. Personality Tests c. Performance Tests
b. Paper-and-Pencil Tests d. Standardized Tests

25. With synthesizing skills in mind, which has the highest diagnostic value?
a. Performance Tests c. Essay Tests
b. Personality Tests d. Completion Tests

Essay. Answer the following briefly. (5 points each)

1. Discuss the uses of tests for guidance purposes.

2. Why are affective and non-cognitive learning outcomes difficult to measure and
evaluate as compared to cognitive learning outcomes?

3. You noticed that a particular student in your classroom is shy, hesitant to participate in class and group activities, and has no friends around him. What will you do to better "understand" this learner, and what intervention would you propose to help him?

4. Could anybody (even those who have knowledge of human behavior and psychology) develop a personality test questionnaire? Are personality tests really "accurate" in describing one's personality? Support your answer.

MODULE 6

WRITING EFFECTIVE TEST ITEMS


Objectives:

After taking this module, you should be able to --


1. Identify violations of item writing rules when given sample objectively scored test
items.
2. Write objectively scored items which comply with the given rules.
3. Evaluate different test items based on suggested and standard rules.

Points to Remember before Starting this Module:


 Test items are used to ensure learning has taken place. The writing
of test questions should be done before the instructional material is
created, and matched precisely to the previously stated objectives.

 There are two broad categories of tests: norm-referenced tests compare learners to each other, and criterion-referenced tests match each learner to pre-specified criteria. Many assessment situations combine both types of questions.

 Different types of knowledge (facts, concepts, procedures, and
principles), and different types of performances (remember and
apply), require different types of test questions.

 Although scoring is often done by selecting a single correct answer,


more intricate tests use checklists or rubrics to assess
comprehension.

General Tips and Rules for Effective Test Item Writing

1. Express Items as Precisely, Clearly and Simply as Possible


Unnecessary material reduces the effectiveness of an item by forcing examinees
to respond to the irrelevant material and perhaps be distracted by it. For example, the
following item:

In carrying out scientific research, the type of hypothesis which indicates the direction in
which the experimenter expects the results to occur once the data has been analyzed is
known as a(n) ...

could be written

A hypothesis which indicates the expected result of a study is called a(n) ...

2. Include all Qualifications Necessary to Provide a Reasonable Basis


for Responding
The item

What is the most effective type of test item?

might be rewritten

According to Ebel, the most versatile type of objective item for measuring a variety of
educational outcomes is the ...

The second version specifies whose opinion is to be used, narrows the task to
consideration of objective items, and focuses on one item characteristic. The first
version poses an almost impossible task.

3. Emphasize General Tasks Rather than Small Details

The item

The product-moment coefficient of correlation was developed by

1. John Gosset
2. Sir Ronald Fisher
3. Karl Pearson

might be replaced by the item

The product-moment coefficient of correlation is used to determine the degree of relationship between

1. two dichotomous variables.
2. a dichotomous variable and a continuous variable.
3. two continuous variables.

If an item on the product-moment coefficient of correlation is to be included in a test, it should concern some basic understanding or skill useful in determining when and how to apply the technique.

4. Avoid Jargon and Textbook Language

It is essential to use technical terms in any area of study. Sometimes, however, jargon and textbook phrases provide irrelevant clues to the answer, as in the following item.

A test is valid when it

1. produces consistent scores over time.
2. correlates well with a parallel form.
3. measures what it purports to measure.
4. can be objectively scored.
5. has representative norms.

The phrase "measures what it purports to measure" is considered to be a measurement cliché which would be quickly recognized by students in the area. The item might be rewritten:

The validity of a test may be determined by

1. measuring the consistency of its scores.
2. comparing its scores with those of a parallel form.
3. correlating its scores with a criterion measure.

4. inspecting the system of scoring.

5. evaluating the usefulness of its norms.

5. Locate and Delete Irrelevant Clues


Occasionally, verbal associations and grammatical clues render an item ineffective. For
example, the item

A test which may be scored merely by counting the correct responses is an _______________ test.

1. consistent
2. objective
3. stable
4. standardized
5. valid

contains a grammatical inconsistency (the article "an" points to "objective"), which gives away the answer.

The item could be rewritten

A test which may be scored by counting the correct responses is said to be

1. consistent.
2. objective.
3. stable.
4. standardized.
5. valid.

6. Eliminate Irrelevant Sources of Difficulty


Other extraneous sources of difficulty may plague examinees in addition to the
item faults mentioned above. Students may misunderstand the test directions if the test
format is complex and/or the students are not familiar with it. When response keys are
common to two or more items, care must be taken that students are made aware of the
situation. If a set of items using a common key extends to a second page, the key
should be repeated on the second page. Then students will not forget the key or have to
turn back to an earlier page to consult the key.

Whenever complex or unfamiliar test formats are used, examinees should have
an opportunity to practice responding to items prior to the actual test whose results are
used for grading. Such a practice administration will also give the item writer an
indication of difficulties students may be having with directions or with the test format.

7. Place all Items of a Given Type Together in the Test


Grouping like test items allows examinees to respond to all items requiring a
common mind-set at one time. They don't have to continually shift back and forth from
one type of task to another. Further, when items are grouped by type, each item is
contiguous to its appropriate set of directions.

8. Prepare Keys or Model Answers in Advance of Test Administration

Preparing a key for objective-type items or a model answer to essay or short answer items is an excellent way to check the quality of the items. If there are major flaws in items, they are likely to be discovered in the keying process. Preparing a model answer prior to administering the test is especially important for essay or other open-ended items because it allows the examiner to develop a frame of reference prior to grading the first examination.
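A prepared key also makes scoring mechanical, which is the point of objective items. A minimal Python sketch, with a hypothetical key and one examinee's responses:

    # Key prepared before the test is administered (hypothetical).
    key     = ["B", "C", "A", "D", "B"]
    student = ["B", "C", "B", "D", "B"]   # one examinee's responses

    # Objective scoring: anyone applying the key obtains the same score.
    score = sum(1 for given, correct in zip(student, key) if given == correct)
    print(f"Score: {score}/{len(key)}")   # 4/5 here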

9. Arrange for Competent Review of the Items


Anyone who has attempted to proof his or her own copy knows that it is much
better to have the material proofed by another person. The same principle applies to
proofing test items. However, it is important that the outside reviewer be competent in
the subject matter area. Unfortunately, critical review of test items is a demanding and
time-consuming task. Item writers may make reciprocal agreements with colleagues or
may find advanced students to critique their items. Test construction specialists may
provide helpful comments with respect to general item characteristics.

Do:
 Determine the purpose of the assessment and the utilization of the outcome scores.
 Utilize the table of specifications to guide the type, number, and distribution of questions.
 Match the requirements of the test items to the designated learning objectives.
 Write using simple, complete grammar and wording.
 Create items that are worded at the average reading level of the target student population.
 Ensure that each test item has one undisputedly correct answer.
 Write test items at a level of difficulty that matches the learning objective and student population.
 Include a variety of test item formats.

Do Not:
 Ask questions that do not assess one of your learning objectives.
 Focus on trivial issues that promote the shallow memorization of facts or details.
 Intentionally target test questions toward a specific sub-set of students.
 Make test items intentionally difficult or tricky.
 Include more test items than can be answered by the average student in the designated amount of time.
 Utilize items provided by a publisher's testbank without reviewing each item for its relevance to course-specific learning goals.

Tips to improve the overall quality of test items and assessments:

 Prepare more test items than you need so that you can review and delete ineffective items prior to the test.
 Write test items well in advance of the test date, then wait several days to review
the items. This type of fresh perspective may help you to identify potential
problems or areas of confusion.
 Review all test items once they are compiled for the test to ensure that the
wording of one item does not give away the answers to another item.
 Within each group of test items, order questions from the least to most difficult.
 Have a naive reader review test items to identify points of confusion or
grammatical errors.

Determining the Number of Assessment Items:

The number of items you include in a given assessment depends upon the length of the
class period and the type of items utilized. The following guidelines will assist you in
determining an assessment appropriate for college-level students.

Item Type                                            Average Time
True-false                                           30 seconds
Multiple-choice                                      1 minute
Multiple-choice of higher level learning objectives  1.5 minutes
Short Answer                                         2 minutes
Completion                                           1 minute
Matching                                             30 seconds per response
Short Essay                                          10-15 minutes
Extended Essay                                       30 minutes
Visual Image                                         30 seconds
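Using the averages above, the time a draft test will demand can be estimated before the item counts are final. A minimal Python sketch; the draft composition is hypothetical:

    # Average response time per item, in minutes (from the table above).
    minutes_per_item = {
        "true_false":      0.5,
        "multiple_choice": 1.0,
        "short_answer":    2.0,
        "short_essay":     12.5,   # midpoint of the 10-15 minute range
    }

    # A hypothetical draft test.
    draft = {"multiple_choice": 25, "short_answer": 5, "short_essay": 1}

    total = sum(minutes_per_item[kind] * n for kind, n in draft.items())
    print(f"Estimated testing time: {total:g} minutes")   # 47.5 minutes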

Item Arrangement

If you are using different response formats, group together items with the same format.
Then try to group items together by content and within that content area arrange the
items from easy to hard. Having different response formats mixed together means that
the student must adjust to differences in answering. This may take his mind away from
the content of the test itself.

Generally you want the response method to easily fit the test question. For example,
some software is available that enables you to print your questions directly on a
machine-scorable sheet. This saves time for the student and minimizes errors in
transfer of answers.

Test Directions

Directions should include statements about

 Time available: In order to pace themselves, students should know how much time they have for the test. It doesn't hurt to remind them, gently, about the time remaining throughout the test.
 Recording of answers: Have the students be consistent in how they mark the test
papers or answer sheets. If you don’t have a separate answer sheet, make sure
they all circle the correct answer. Putting an X over the correct answer is less
deliberate and can be confused with rejecting an option.
 Points for items: It is usually the best practice to award one point for each multiple choice question. Weighting some questions more than others can be very judgmental on your part and gives unnecessary strategic information to the examinee.
 Penalty for guessing: Don’t penalize guessing.
 Showing work on problems: You should decide ahead of time and alert the
examinees about whether their notes, computations, or other intermediate steps
toward the answer will be credited. The hardest part is deciding how to award
this in an objective manner and in a way you can communicate your strategy
clearly to the examinees.

 Use of outside materials: Having the text, class handouts, or other supplemental
materials or aids (calculators, formulas, tables, etc.) used on a test changes the
character of a test and can change the students’ strategies for studying and
taking the test. You do not want them to count on the text for answers because:

1. they won't study, and more importantly,
2. you should not be testing for use of an index or use of reference materials.

Just plan ahead so that the material made available has to be used on the test
and does not become a security blanket for the anxious student.

Writing Specific Types of Items


There is an almost infinite variety to the forms test items may take. Test items are often grouped
into two main categories: objective items and constructed-response items. Objective items are
those in which the examinee recognizes a best answer from options presented in the item.
Objective items include multiple-choice items, alternative-response items and matching items.
Constructed-response items include restricted-response items, short-answer items, completion items and essay items.

Objective and Essay Test Items

A common distinction is made between an objective item (e.g., true-false, multiple choice, etc.) and an essay item. An objective item is called objective because its scoring system is such that almost anyone, when given the key, will get the same score for an item as anyone else. Essay items, however, require expert judges to score them, and even those judges may disagree on the score. Rubrics are schemes or criteria that direct the evaluation of student written production and should be developed along with the course objectives. Students are aware of the subjective nature of scoring essays, and sometimes it is feasible to share these rubrics with them so they can direct their studying. In any type of evaluation, it's important that the score not be seen as biased or capricious.

The type of item you select to measure an objective should be the most direct way to
measure the outcome. For example, to determine if a microscope is being focused
correctly, you could devise a short answer test or directly observe the students. In this
instance, assuming a reasonable class size, observation seems the most direct method
of assessment. However, if there is a set of basic facts you wish a student to know,
trying to devise an essay topic where all the facts will be mentioned might prove to be
impossible. A set of objective test items is more apt to address the objective.
In an objective test of all multiple choice items, it is best to give equal weight to each
item in scoring. If you believe one learning objective should be given twice the weight as
another, this should be reflected in the number of questions you assign to the objective
in your Table of Specifications.
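To make that weighting concrete, here is a minimal Python sketch of turning objective weights from a Table of Specifications into item counts; the objectives and weights are hypothetical:

    # Relative weight of each learning objective (hypothetical).
    weights = {"define terms": 1, "interpret graphs": 2, "apply formulas": 2}

    total_items  = 25
    total_weight = sum(weights.values())

    # An objective weighted twice as heavily gets twice as many questions.
    allocation = {obj: round(total_items * w / total_weight)
                  for obj, w in weights.items()}
    print(allocation)   # {'define terms': 5, 'interpret graphs': 10, 'apply formulas': 10}

    # Rounding can make the counts drift from total_items; adjust by hand if so.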

If your test is divided so that some items will take more time to answer than other items
(for example, if you have a set of short-answer supply items that will require more work
or time than other test items), then the items in that entire set can get more weight than
another set of items. These weights should be made clear to the students. Students
should always know point values of questions so that they can apportion their testing
time.

If you are going to have an essay as part of your assessment, it is strongly recommended that you establish your scoring system before you give your test. An essay test is appropriate when you want to assess the ability of students to organize and apply information. In some content areas you might want to assess writing skills. The degree of freedom given in order to answer a question might be limited, or the student may be given a great deal of freedom.

An example of a limited response might be an answer to a question like:

"Briefly summarize the experiment done to determine ... "

By comparison, the following question would elicit a more extended response:

"Compare the two major theories of alcoholism paying special attention to the philosophical background of the theories, methods of treatment, and evidence supporting each. Which theory do you believe is best supported by the evidence? What aspects of the research brought you to this conclusion?"

Scoring criteria for assessments should be prepared in advance. Developing the criteria can lead to a change in the question so that expectations are clearer.

For limited/restricted response essays, the instructor should write an example of a response, giving points for the components regarded as important in the answer. For extended response essays, the instructor can use either an analytic scoring rubric or a holistic rubric. An analytic scoring method delineates desired characteristics, each characteristic is then rated on a scale (for example, 1-5), and descriptions of the scale points within each characteristic are given.

A holistic scoring rubric generates one score. A user of this method must develop a
description of the meaning of each of the scaled scores. An outline of the rubric you are
going to use should be handed to the students. This will enable them to prepare for the
exam.
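A minimal Python sketch of analytic scoring; the characteristics and ratings are hypothetical:

    # Analytic rubric: each desired characteristic is rated on a 1-5 scale.
    characteristics = ["organization", "use of evidence", "mechanics"]

    # One student's essay, as rated by the instructor (hypothetical ratings).
    ratings = {"organization": 4, "use of evidence": 3, "mechanics": 5}

    total   = sum(ratings[c] for c in characteristics)
    maximum = 5 * len(characteristics)
    print(f"Analytic score: {total}/{maximum}")   # 12/15 here

    # A holistic rubric would instead assign a single overall score, with a
    # written description of what each scale point means.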
Performance assessments require a student to demonstrate a skill by actually
performing it (e.g., writing a computer program to calculate a mean; carrying out an
experiment). Generally these methods of assessment appear authentic. They, too,
require a scoring rubric.

Besides scorer unreliability, the principal problem with performance assessments is the amount of time required for each task. Because each task is so time consuming, the number of tasks given is very few. If you have the students perform only a few tasks, it is questionable whether you can generalize to other similar tasks.

RULES IN TEST ITEM CONSTRUCTION

1. Completion Items / Short Answer / Fill-in-the-Blank

Good for:

 Application, synthesis, analysis, and evaluation levels

Advantages:
 Easy to construct
 Good for "who," what," where," "when" content

 Minimizes guessing
 Encourages more intensive study; the student must know the answer rather than merely recognize it.

Disadvantages:
 May overemphasize memorization of facts
 Take care - questions may have more than one correct answer

 Scoring is laborious

Tips for Writing Good Short Answer Items:


 When using definitions: supply the term, not the definition, for a better judge of student knowledge.
 For numbers, indicate the degree of precision/units expected.

 Use direct questions, not an incomplete statement.


 If you do use incomplete statements, don't use more than 2 blanks within an item.
 Arrange blanks to make scoring easy.
 Try to phrase question so there is only one answer possible.

General Rules
1. Require short answers where only one response is correct.

Poor: ___________________discovered radium.

Poor: Radium was discovered by ___________________. (Poor because many answers may fit, e.g., "a Frenchman," "Curie," "Experimenting with pitchblende.")

Better: Who discovered radium? ___________________

2. Specify in advance if spelling is to be graded.


3. The blanks should be at or near the end of the statement, so that the response logically
follows the stimulus.

Poor: ___________________discovered radium.

Better: Who discovered radium? ___________________

4. There should preferably be only one blank per test item, but if more than one, they should be
for a related series. Do not use so many blanks that the question is a puzzle.

Poor: ___________________ observed great diversity in __________________ in the ___________________. (Huh?)

Good: Who was given credit for the early development of the theory of evolution?
___________________

5. Each blank in all items should be the same length. This avoids the possibility of the blank
itself serving as a clue.
6. When writing the item, be sure to write complete sentences. Do not include any specific determiners (clues) such as "a" or "an", or a word that implies singularity.

Poor: When an animal eats plants, it is said to be an ________________.

Better: When an animal eats plants, it is said to be a/an ________________.

7. If an answer is to be in specific units, be sure to indicate the units wanted.

Poor: When did Marie Curie receive the Nobel Prize? ________________. ("A long time
ago.." is a correct answer!)

Better: In what year did Marie Curie receive the Nobel Prize? ______________ .

8. When writing an item, do not take a statement directly from a textbook or from the teacher's
lecture, but write the item so as to test understanding rather than rote memory.

9. Write an item that cannot be completed by general knowledge but requires some
knowledge of the subject matter.
10. Words or phrases omitted should not be trivial words or phrases.

Poor: Darwin ___________________ finches in the Galapagos _________________.

Poor: Darwin studied ___________________ in the ___________________ islands.

11. When using completion items, write them as questions, if possible.

Advantages:
 A lot of vocabulary can be assessed in a minimal time
 Construction is relatively easy

Disadvantages:
 The understanding assessed is likely to be trivial (recall/knowledge level)
 Difficult to avoid ambiguity in constructing questions
 Scoring requires careful reading for unanticipated but correct answers

Do's:
1. Leave only important terms blank.
2. Keep items brief.
3. Limit the number of blanks per statement to one, at the most two for older students.
4. Limit the response called for to single words or very brief phrases.
5. Try to put the blanks near the end of the statement (or better yet, see 9 and 4 below).
6. Try to ensure that only one term fits each blank.
7. Indicate the units if the answer called for involves a numerical measure.
8. Give students credit for unanticipated yet correct responses.
9. Number the blanks and provide lines down the right-hand side of the page, all of the same length, for students to write their answers. (If you're left-handed, put them on the left.)

Don'ts:
1. Lift statements directly from the book.
2. Use "a" or "an" before a blank; make it "a/an" if needed to make it grammatical.
3. Count a misspelled or non-grammatical answer entirely wrong. Let students know in advance that spelling and grammar count.
4. Provide lines in the statement, e.g., "water and alcohols are ____________ molecules." Instead, use numbered blanks that are all of the same length, e.g., "water and alcohols are examples of –1– molecules" and "water in its gaseous state is called –2–."

Recommendations:

 These questions would be appropriate for quick checks of essential vocabulary.


 The point value assigned should be minimal.

 Consider reworking such questions into short answer format.

2. Multiple Choice Test Items

Good for:

 Application, synthesis, analysis, and evaluation levels

Types:
 Question/Right answer
 Incomplete statement
 Best answer

Advantages:
 Very effective
 Versatile at all levels

 Minimum of writing for student


 Guessing reduced
 Can cover broad range of content

Disadvantages:
 Difficult to construct good test items.
 Difficult to come up with plausible distractors/alternative responses.

Tips for Writing Good Multiple Choice items:


 Stem should present single, clearly formulated problem.
 Stem should be in simple, understood language; delete extraneous words.

 Avoid "all of the above"--can answer based on partial knowledge (if one is incorrect or
two are correct, but unsure of the third...).
 Avoid "none of the above."
 Make all distractors plausible/homogeneous.
 Don't overlap response alternatives (decreases discrimination between students who
know the material and those who don't).
 Don't use double negatives.
 Present alternatives in logical or numerical order.
 Place the correct answer at random (answer A is used most often).
 Make each item independent of others on test.
 Way to judge a good stem: students who know the content should be able to answer before reading the alternatives.
 List alternatives on separate lines, indent, separate by blank line, use letters vs. numbers
for alternative answers.
 Need more than 3 alternatives, 4 is best.

General Rules

1. The stem should pose a clear question or problem and should contain as much of the item as
possible. It should be written as a question.

Poor: Evolution:

a. was observed by Albert Einstein.
b. was discovered by Charles Darwin.
c. is a theory not accepted by everyone.

Better: Who developed the theory of evolution?

a. Albert Einstein.
b. Charles Darwin.
c. Marie Curie.
2. The stem should be stated simply, using correct English.

3. Avoid use of direct statements from textbooks.

4. Avoid use of trick and ambiguous questions.

Poor: In what year did Darwin visit the Galapagos?

a. 1831
b. The calendar was inaccurate then; the year was actually 1832.
c. Darwin was not the only one to visit Galapagos.
d. Darwin visited Galapagos sometime between 1831 and 1836.

5. Avoid the use of negatives such as none or not. If they must be used, underline or capitalize
them.

Poor: Which of the following is not a mammal?

Better: Which of the following is NOT a mammal? OR All of the following are
mammals EXCEPT:

6. Vary the position of the correct alternative.

7. All alternatives should be grammatically related to the stem and listed in some logical numerical or systematic form. This is less confusing to the students and decreases the probability that they will make careless mechanical errors. (Example: How many legs do spiders have?)

8. The length of the alternatives should be consistent, not varying with whether they are correct or incorrect. Test-wise students know that the correct answer is often the longest one with the most qualifiers.

Poor: The function of the villus is to:


A. Absorb digested food.
B. Transport non-lipid soluble materials across a concentration gradient of the
plasma membrane by active transport.
C. Remove foreign particles or antigens from the gastrointestinal tract by
phagocytosis and subsequent degradation.

D. Aid cells in synthesis and packaging of certain digestive enzymes to be secreted into the duodenum for the hydrolysis of carbohydrates such as starch.

9. Avoid use of wordy stems.

10. Avoid use of verbal clues such as a, an.

Poor: A scientist who studies stars and galaxies is called an:


a. Botanist
b. Geologist
c. Astronomer
d. physicist

Better: A scientist who studies stars and galaxies is called a/an:

a. botanist
b. geologist
c. astronomer
d. physicist

11. Avoid use of response alternative such as "none of the above," "none of these," "both (a) and
(c) above," or "all of the above." "None of the above" may measure only the student's ability to
recognize incorrectness. "All of the above" may be a giveaway if a student recognizes more than
one correct alternative. Also a student may recognize a correct response and mark it without
reading down to the "all of the above" alternative.

Poor: Albert Einstein was:

a. A physicist
b. A mathematician
c. A Nobel Prize winner
d. Both B and C above
e. All of the above

(Which of these alternatives is a wrong answer?)

12. Each alternative should be independent so as not to give clues to answer another alternative.

13. When testing for knowledge of a term, it is preferable to put the word in the stem, and
alternative definitions in the response alternatives.

Poor: A particle found in most atomic nuclei is referred to as a/an

a. Proton
b. Neutriolos
c. Electron
d. Neutron

Better: A neutron can best be defined as:

a. Any number equal to zero
b. A negative particle found in atomic nuclei
c. A neutral particle found in atomic nuclei
d. A part of most atoms we know about

14. All alternatives should be written so that they are all plausible to the less informed student.

Advantages:
 A large number of ideas can be addressed in a short period of response time.
 These questions are easily and quickly scored.
 Questions can elicit responses from all cognitive levels, from knowledge to evaluation.
 Questions can be improved over time by analyzing them in light of student performance.

Disadvantages:
 It is time-consuming to write good items, especially those at higher cognitive levels.
 Test-wise and English-fluent students tend to be favored.

Do's:
1. Use the same number of distractors (wrong answers) for every question.
2. Use plausible distractors that are related to the stem and are similar in character; tricky, cute, and 'throw-away' ones are anathema.
3. Have all distractors (and the correct answer) about the same length.
4. Use correct grammar; if the stem is an incomplete sentence, each distractor should be grammatically consistent with it and complete the sentence.
5. Put all of the distractors in a single column, not side by side or in two columns.
6. Use reasonable vocabulary and avoid wordiness and ambiguity.
7. State the authority to be used in items calling for judgment.
8. Vary the position of the correct answer (the tendency is to make it B or C).
9. Examine questions carefully for subtle clues in word choice or phrasing.

Don'ts:
1. Use specific determiners in distractors such as all, none, only, and alone because they usually indicate an incorrect answer. Likewise, avoid generally, often, usually, most, and may because they often indicate the correct answer.
2. Avoid negatives, including less obvious ones, such as without, because they can confuse or be missed by students; highlight the negative word if you find you must use one.
3. Provide clues in the stem, such as "a" or "an" at the end; put these articles with the distractors.
4. Avoid using "all of the above" and "none of the above." If you do use them, make them as frequently the incorrect answers as they are the correct answers.

Possible Distractors

1. Typical errors or misconceptions (keep track of these as you teach).


2. Misstated relationships, where the correct terms are connected with a wrong relationship.

3. Combine conclusions and explanations such that both are right, the former is wrong, the latter is wrong, or both are wrong.

Recommendations:

1. Using a graphic may make writing more challenging questions easier, e.g., graph interpretation.
2. Analyses are easily done on difficulty level and on whether distractors are working; some machine scoring programs can compute these. (A sketch of these analyses follows.)
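A minimal Python sketch of the analyses in recommendation 2, using hypothetical class responses: the difficulty index is the proportion of examinees answering correctly, and a distractor is "working" only if some examinees actually choose it.

    from collections import Counter

    # A class's responses to one multiple-choice item (hypothetical data).
    responses = ["C", "C", "A", "C", "B", "C", "C", "A", "C", "A"]
    correct = "C"

    counts = Counter(responses)
    difficulty = counts[correct] / len(responses)    # proportion correct
    print(f"Difficulty index p = {difficulty:.2f}")  # 0.60 here

    # Distractors no one chooses are not functioning and should be revised.
    for option in "ABCD":
        if option != correct and counts[option] == 0:
            print(f"Distractor {option} attracted no examinees")  # prints D here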

3. True/False Alternative Response Items

Good for:

 Knowledge level content


 Evaluating student understanding of popular misconceptions
 Concepts with two logical responses

Advantages:
 Can test large amounts of content
 Students can answer 3-4 questions per minute

Disadvantages:
 They are easy
 It is difficult to discriminate between students who know the material and students who don't
 Students have a 50-50 chance of getting the right answer by guessing (see the sketch after this list)
 Need a large number of items for high reliability
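As noted above, a guesser has even odds on each item, but lengthening the test shrinks the chance of passing by luck. A minimal Python sketch of the binomial arithmetic; the item counts and passing score are hypothetical:

    from math import ceil, comb

    def p_pass_by_guessing(n_items, passing_fraction=0.75, p=0.5):
        # Probability of reaching the passing score by blind guessing:
        # the sum of binomial probabilities of k or more correct answers.
        need = ceil(n_items * passing_fraction)
        return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
                   for k in range(need, n_items + 1))

    for n in (10, 25, 50):
        print(f"{n} items: P(pass by guessing) = {p_pass_by_guessing(n):.4f}")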

Tips for Writing Good True/False items:


 Avoid double negatives.
 Avoid long/complex sentences.

 Use specific determinants with caution: never, only, all, none, always, could, might, can,
may, sometimes, generally, some, few.
 Use only one central idea in each item.
 Don't emphasize the trivial.
 Use exact quantitative language
 Don't lift items straight from the book.
 Make more false than true (60/40). (Students are more likely to answer true.)

General Rules

1. Avoid using specific determiners such as always, never, might, may only, etc. There are
usually exceptions to these strong terms.

Poor: T F All of the mountains in the Rocky Mountains were formed by volcanic action.

Good: T F The mountains in the Rocky Mountains were formed by volcanic action.

2. When writing a true/false item, be sure to base it on a statement that is absolutely true or false.

3. Eliminate double negatives and if possible avoid negatives. If a negative such as not or none
must be used, be sure to underline or capitalize it.

Poor: T F Gregor Mendel was not a great astronomer.

Good: T F Gregor Mendel was NOT a great astronomer.

4. Do not take statements out of textbooks, but write the statement in your own words.

5. Do not make the true statements consistently longer than the false statements and vice versa.

6. Do not use complex sentences with many dependent clauses.

Poor: T F John Glenn, an American astronaut, was a skillful pilot and is best known for
his first orbital flight around the earth.

Good: T F John Glenn is best known for his first orbital flight around the earth.

7. Avoid the use of more than one idea in an item unless it is a cause/effect item. If it is a
cause/effect item, it should be stated so that students will react to the effect and not the cause.

Poor: T F Slavery was a major cause of the Civil War whereas the economic situation of
the southern states was not.

Good: T F Slavery was one of the major causes of the Civil War

Poor: T F Gerald Ford did not win the 1976 presidential election because of the good weather in the east which brought many people to the polls on election day.

Good: T F One of the reasons Gerald Ford did not win the 1976 presidential election was
the good weather in the east which brought many people to the polls on
election day.

8. A false statement should be written so that it is plausible to someone who has not studied the
area being tested.

9. The crucial part of a true/false item should be placed at the end of the item.

Poor: T F The economic situation of the southern states was a major cause of the Civil
War.

Good: T F A major cause of the Civil War was the economic situation of the southern
states.

10. When using opinion, the source should be identified unless the ability to identify the source
is what is being measured.

Poor: T F According to most botanists, Mendel is considered to be the greatest botanist.

Good: T F According to Brett Moulding, Mendel is considered to be the greatest botanist.

11. It is preferable that students not be asked to write "T" or "F" in a blank. Many students have developed a talent for making T's and F's look very similar. It is best to provide alternative responses and have the students circle or underline the one which is correct.

Advantages:
 Many topics can be covered in the time available for students to respond.
 These questions are quickly and easily scored.

Disadvantages:
 The understanding assessed is likely to be trivial (recall/knowledge level).
 It is difficult to avoid ambiguity in constructing these questions.
 The odds of guessing a correct answer are 50:50.
 Better students tend to read too much into the questions.

Do's:

1. Use a single point that determines the truth of the statement. An example violation: The cm is larger than the mm and the mm is larger than the dm.
2. Take care with grammar and spelling.
3. Use a single clause, simply and directly stated; if two clauses are used, the main clause should be true and the subordinate clause true or false. An example violation: Lilies are considered annuals because their bulbs live from year to year.
4. Have approximately half of the statements true and half false. It is easier to start with all true statements, then go back and change some to false statements.
5. Use a random pattern in the sequence of answers, e.g., ttfft is okay, tftft is not.

Don'ts:

1. Use tricky questions.
2. Use unnecessary words and complicated content.
3. Use statements directly from the text. Rephrase them so students must at least have comprehended the material as opposed to recognizing it.
4. Avoid negatives; this means not just words like not or none, but negative prefixes and suffixes as well. Double negatives are never grammatical.
5. Avoid specific determiners. Statements with words such as generally, may, most, often, should, and usually are generally true. Statements with words such as all, alone, only, no, none, never, and always are generally false.

Recommendations:

 These questions are okay for quick checks of vocabulary and concepts.
 The point value assigned should be minimal.
4. Matching Type

Good for:

 Knowledge level
 Some comprehension level, if appropriately constructed

Types:
 Terms with definitions
 Phrases with other phrases
 Causes with effects
 Parts with larger units
 Problems with solutions

Advantages:
 Maximum coverage at knowledge level in a minimum amount of space/prep time
 Valuable in content areas that have a lot of facts

Disadvantages:
 Time consuming for students
 Not good for higher levels of learning

Tips for Writing Good Matching items:


 Use 15 items or fewer.
 Give good directions on basis for matching.

 Use items in response column more than once (reduces the effects of guessing).
 Use homogeneous material in each exercise.
 Make all responses plausible.
 Put all items on a single page.
 Put response in some logical order (chronological, alphabetical, etc.).
 Responses should be short.

General Rules

1. In the directions, the basis for matching should be stated, and also whether the various
responses can be used once or more than once.

2. The right column items should be identified with letters, and students should match items by
writing the letters in a space to the left of the test item number.

3. The statements in the response column (right) should be kept short and listed in some logical
order--alphabetically or chronologically. This helps students quickly locate responses.

4. The number of items in the left column should be five or fewer for junior high students. Tests
for high school students may contain up to 10 items.

5. The number of responses should exceed the number of items in order to avoid answering by
the process of elimination.

6. Items in each column should belong to the same general class.

Poor:

____ 1. Mendel           A. Lived in Germany
____ 2. Einstein         B. Cell theory
____ 3. Schleiden        C. Chemist
____ 4. Leuwenhoek       D. Lived in Holland
____ 5. Bohr             E. Nobel Prize

Better:

____ 1. Bohr             A. Astronomer
____ 2. Einstein         B. Botanist
____ 3. Leuwenhoek       C. Chemist
____ 4. Mendel           D. Geologist
____ 5. Schleiden        E. Inventor
                         F. Physicist

7. Each matching item should be contained on a single page.

8. Longer words and phrases should be listed in the left column and shorter words and phrases should be listed in the right column. Placing long phrases in the right column requires students to keep much more information in their heads as they consider all the alternatives.

Advantages:

 A large number of related ideas can be addressed in a short period of time.
 Answers are easily and quickly scored.

Disadvantages:

 Such questions are restricted to recognition of simple understandings.
 Clues are difficult to avoid.
 A common error is lack of consistency of relationship throughout the question.

Do's:

1. Make certain that the relationship between the stems and the responses is the same throughout the question. For example, all of the items might be things OR events, but a combination of things and events is inappropriate. (Warning: this is more difficult than it appears!)
2. State the specific relationship between the stems and responses in the directions to the question. Check that it fits each stem and its response. If it doesn't, rework the question.
3. Put the stems ("question") column on the left and number them.
4. Put the blanks for students to record their answers next to the stems (or on an answer sheet).
5. Put the responses (answers) column on the right and letter them (capital letters).
6. Order the responses in some logical fashion, e.g., alphabetically, sequentially.
7. Make the stems longer than the responses.
8. Use between five and ten stems.
9. Provide more responses than needed (about 40-50% more than stimuli).

Don'ts:

1. Split a matching question between pages.
2. Fail to check that directions state a relationship and that it is correct across the entire question.
3. Provide more than one correct response for a single stem, unless you've been very clear with the directions and have taught students to do this kind of question in advance.
4. Change the grammar across stems and responses, e.g., between plural and singular.

Formatting matching tests

(This is not a question, just an illustration of formatting.)

___ 1. This is the longer stem to the left.     A. Lettered response
___ 2. The stems have the blanks.               B. Logical order
                                                C. More responses than needed
                                                D. Shorter responses

Recommendations:

1. Consider using photographs, diagrams, graphs that illustrate structures or events.


2. Sometimes one graphic can be used twice, e.g., names and functions of cell parts.

3. Having just a few responses when some can be used twice (or more) is okay.

5. Rating Scales

1. There should be a separate scale for each characteristic, attitude, or behavior being measured.

2. Each characteristic should be equally significant.

3. The characteristics and the points on the scale should be clearly defined.

Poor: How important is money to a happy marriage?

1 ___ ___ ___ 5

Better: How important is money to a happy marriage?

1 unimportant

2 of little importance

3 important

4 very important

5 the most important


4. Between 3 and 7 rating positions should be provided on the scale.

5. Raters should be instructed to mark only at the designated rating positions, not in between.

6. Raters should be instructed to omit ratings where they feel unqualified to judge.

7. Do not define the end categories so extremely that no one will use them.

6. Checklists

1. Identify and describe clearly each of the specific actions desired in the performance.

2. Add to the list those actions which represent common errors, if they are limited in number and
can be clearly identified.

3. Arrange the desired actions and likely errors in the approximate order in which they are
expected to occur.

4. Provide a simple procedure for numbering or checking the actions in the sequence in which
they occur.

5. If the list is lengthy, group the behavior to be checked under separate sub-headings.

Example Checklist

Making a Peanut Butter Sandwich:

Directions: Check the space to the left of each behavior as that behavior is observed.

____ Has proper materials (2 slices of bread, peanut butter, knife) ready.
____ Lays bread on flat surface.
____ Opens peanut butter jar.
____ Grasps knife firmly by the handle.
____ Inserts blade of knife into peanut butter and removes the desired amount.
____ Has difficulty manipulating the knife.
____ Spreads peanut butter on one side of the first slice of bread with a side-to-side motion so
that peanut butter is evenly distributed.
____ Places second slice of bread directly on top of peanut butter.
____ Closes peanut butter jar.

7. Essay

Good for:

 Application, synthesis and evaluation levels

Types:

 Extended response: synthesis and evaluation levels; a lot of freedom in answers


 Restricted response: more consistent scoring, outlines parameters of responses

Advantages:
 Students less likely to guess
 Easy to construct

 Stimulates more study


 Allows students to demonstrate ability to organize knowledge, express opinions, show
originality.

Disadvantages:
 Can limit amount of material tested, therefore has decreased validity.
 Subjective, potentially unreliable scoring.

 Time consuming to score.

Tips for Writing Good Essay Items:


 Provide reasonable time limits for thinking and writing.
 Avoid giving students a choice among questions. (You won't get a good picture of the breadth of student achievement when each student answers only a subset of the questions.)
 Give a definite task to the student: compare, analyze, evaluate, etc.
 Use a checklist point system to score against a model answer: write an outline and determine how many points to assign to each part.
 Score one question at a time, across all papers.

Guidelines for grading essay questions

This final list may serve you well as you score student responses.

1. Decide whether to use the analytic or holistic approach. The tentative scoring key or
model answer should reflect the specific approach chosen.
2. Check the scoring key or model answer against a sample of actual responses without
assigning any grades. Revise the scoring key or model answer if necessary.
3. Decide in advance how to handle irrelevant and inaccurate information, bluffing, and
technical problems (e.g., spelling, penmanship, grammar, punctuation, organization, etc.).
4. Evaluate the responses without identifying the student.

5. Apply the rating criteria as consistently as possible. Once the grading has begun, the
criteria should not be changed, nor should they vary from examinee to examinee or from
rater to rater.
6. Try to grade all the responses to a particular question without interruption.
7. Grade only one question at a time; that is, rate all the answers to one question before
starting to rate the next question.
8. Randomly shuffle the order of the examination papers each time you begin grading a new
question.
9. If important decisions are to be based on the results, have each essay graded
independently by at least two different raters and average the results.

Essay Questions

Advantages:

 All cognitive levels can be addressed with essay questions.
 It takes less time to write an essay test (only because there are fewer questions to write).
 Students' organizational skills are also measured.

Disadvantages:

 Essay questions are time-consuming to answer, and answers take more time to score.
 Less content can be sampled.
 Reliability of both response and score is less (although validity may be better).

Do's:

1. Teach students how to respond to the types of essay questions to be asked, e.g., how to construct an argument citing evidence, logic and the null hypothesis.
2. Give adequate directions as to the content of the desired response, i.e., don't just say "discuss," say "discuss in terms of x, y, and z."
3. Provide the structure of response wanted in the directions, e.g., compare and contrast, analyze, synthesize, evaluate from (what/whose?) perspectives, develop in the manner of (what/who).
4. Indicate the length of the response desired.
5. Write longer rather than shorter questions. Use novel questions when feasible.
6. Ask for thinking beyond knowing or comprehending (don't waste anyone's time!).
7. Have students respond on their own paper or provide plenty of space.
8. Provide for a range of acceptable answers such that all students will be able to respond to some extent. Encourage students to try.
9. Consider asking more questions requiring shorter responses rather than fewer questions requiring lengthy responses, especially for less mature students.

Don'ts:

1. Don't provide optional questions, i.e., answer two of the following four questions. This results in different tests (if you do offer choices, make sure you are satisfied with different tests).
2. Don't start essay questions with words such as name, list, who, what, where, etc. These do not indicate that thinking beyond recall is required; use a different kind of question instead.
3. Don't wait until the last minute to write. Allow time to review and rewrite.

Scoring

1. Decide whether an analytic or holistic rubric is more appropriate for the question.
2. Plan the scoring guide/rubric as you write the questions (usually improves the question).
3. Read a sample of responses before you begin to score; make adjustments if needed.
4. Vary the sequence of papers as you score across questions. No paper should always be
first.
5. Do read all responses to a single question together (faster and helps you maintain focus).
6. Hide students’ names.
7. Provide constructive comments and do so legibly.

Recommendations:
 Having students construct a table or do a labeled diagram may sometimes be more appropriate than writing an essay.
 If problems/calculations are used, consider having students explain what the answer means.
 You may want to consider grading writing skills separately.

Supplementing Essays

There are creative ways to supplement essay tests that can be just as effective as the essays themselves. For example, rather than asking students to write an essay describing the process of mitosis, you might ask them to draw a diagram and label it with short descriptions. Just make sure that you have covered a similar diagram in class.

8. Oral Exams
Good for:

 Knowledge, synthesis, evaluation levels

Advantages:
 Useful as an instructional tool; it allows students to learn at the same time as they are tested.
 Allows teacher to give clues to facilitate learning.

 Useful to test speech and foreign language competencies.

Disadvantages:
 Time consuming to give and take.
 Student performance may be poor because students haven't had much practice with this format.

 Provides no written record without checklists.

9. Student Portfolios

Good for:

 Knowledge, application, synthesis, evaluation levels

Advantages:
 Can assess compatible skills: writing, documentation, critical thinking, problem solving
 Can allow student to present totality of learning.

 Students become active participants in the evaluation process.

Disadvantages:
 Can be difficult and time consuming to grade.

10. Performance Task


Good for:

 Application of knowledge, skills, abilities

Advantages:
 Measures some skills and abilities not possible to measure in other ways

Disadvantages:
 Cannot be used in some fields of study
 Difficult to construct

 Difficult to grade
 Time-consuming to give and take

7 Criteria in Selecting a Good Performance Assessment Task (Burke, 1999)

 Generalizability - the likelihood that the students' performance on the task will generalize to comparable tasks.

 Authenticity - the task is similar to what the students might encounter in the real world, as opposed to encountering it only in school.

 Multiple Foci - the task measures multiple instructional outcomes.

 Teachability - the task allows one to master the skill that one should be proficient in.

 Feasibility - the task is realistically implementable in relation to its cost, space, time, and equipment requirements.

 Scorability - the task can be reliably and accurately evaluated.

 Fairness - the task is fair to all the students regardless of their social status or gender.

Compilation of the Test

More likely than not, you will be responsible for the typing, proofreading, and collation of
the test. Spell check is a great addition to software programs, but this will not detect all
errors. If you hand out a test with numerous mistakes, the message you are sending to
the students is “this test and your evaluation are not worth a lot of effort.”

It’s helpful to consecutively number each answer sheet and accompanying test booklet.
When the students hand in the answer sheet and test booklet, you can quickly
determine if a test booklet is missing and whose test booklet it is. After you have gone
over a test with the students, it’s beneficial for you to collect the booklets since you
might want to use many of the items again.

Preventing Cheating on a Test


You will find that the procedure for handling cases of cheating is in our student
handbook. Generally it’s recommended that you structure the situation so that cheating
is avoided.
 Shred “bad” and old copies of exams. This writer has seen wastebaskets in
hallways outside department offices filled with exam questions in addition to
confidential student information.
 Secure the exams and any sources (e.g., manuals). Some office help in
department offices may be undergraduate students. Even though that student
may not be in your class, he might know someone who is.

 Make sure each student gets one, and only one, booklet and answer sheet.

 Try to spread students out when they are taking the test.

 The best anti-cheating devices are your ears, eyes, and feet. Walk up and down
aisles; it’s good exercise.

 Station yourself at the exit door with a roster in hand. When tests are handed to
you, check off names. That way, if you don’t have a student’s exam, you know
that you lost it and the student should be retested.

 Bring pencils with you and tell students that if they need another, they should raise their hand rather than go to another student. Have the students return the borrowed pencils!

SUMMATIVE ASSESSMENT

I. TRUE OR FALSE. Write TRUE if the statement is correct and if it is incorrect, change the
underlined word/s to make it correct. (15 pts)

_____1. The maximum number of items in matching test is 30.


_____2. Homogeneity of the alternatives in multiple-choice test should be increased in
order to choose the best alternative by means of elimination.
_____3. Double negative statements must be provided in constructing a true-false test.
_____4. Alternatives (options and distracters) must be plausible with each other in
constructing multiple-choice test.
_____5. To minimize guessing in constructing multiple-choice test, more than five
alternatives must be provided.
_____6. Completion items cannot efficiently measure lower levels of cognitive ability.
_____7. True-false items efficiently measure higher levels of cognitive ability.
_____8. An imbalanced matching type (that is, the number of responses exceeds the
number of premises) is preferable.
_____9. Essay items help the students to think critically and logically.
_____10. Any test format cannot measure all instructional objectives.
_____11. Tricky questions are not necessary to test pupils' understanding.
_____12. Short-answer items are easier to construct than matching items.
_____13. The use of essay item is an effective means of measuring higher level
cognitive outcomes.
_____14. Multiple-choice items cannot measure higher level of cognitive skills.
_____15. Essay items cannot measure writing and self-expression skills.

Multiple Choice. Write the letter of the best/correct answer. (25 points)
1. Which of the following does not belong to the group?
a. Matching c. Multiple choice
b. True-false d. Completion type

2. A type of test where three or more plausible alternatives are provided in each item.
a. Essay test c. Multiple choice test
b. Matching type d. Problem-solving test

3. The other term for completion test is:


a. True-false test c. Multiple choice test
b. Fill-in the blank test d. Short-answer test

4. Which of the following does not belong to the group?


a. Multiple choice c. Matching type
b. Completion type d. Analogy type

5. A test that consists of a series of items in which it admits only one correct answer
for each item out of two alternatives.
a. Problem-solving c. True-false
b. Multiple-choice d. Short-answer

6. A matching item test includes ten events to be matched with twelve responses
containing dates, cities, and provinces. The error of this test construction is:
a. too many responses c. Homogeneous response
b. unbalanced matching d. Heterogeneous response

7. Which of the following is not a strength of multiple-choice tests?


a. efficiently measure c. adequate content sampling
b. provides diagnostic information d. Allows educated guess

8. A table of specifications is used for


a. Content of reading level. c. content and cognitive process.
b. Planning test d. listing instructional objectives.
9. In completion test, the blank(s) should be placed
a. At or near the beginning of the item
b. Between the beginning and the middle of the item.
c. At or near the end of the item.
d. As close to the middle of the item.

10. From a measurement standpoint, using test consisting entirely of essay items is
undesirable because
a. Content sampling tends to be limited
b. Scoring requires too much time.
c. Difficult to construct the item
d. Inefficient measurement of cognitive ability.

11. When constructing multiple-choice items, it is best to:


a. Make all alternatives the same length
b. Put the main idea of the main item in the alternatives
c. Use options such as a and b, but not c.
d. Repeat key words in the stem and the options

12. The “halo effect” in scoring items is a tendency to score more highly those
responses:
a. That are technically well written
b. Read earlier in the scoring process
c. Of students known to be good students
d. Read later in the scoring process

13. In an association form, short-answer item, the spaces for the answers should:
a. All have uniform size
b. Vary according to length of correct answer
c. Vary in size, but not according to any order

d. Vary in size according to some order

14. Essay items are popular in teacher-constructed tests because they:


a. Are subjective in scoring
b. Are perceived to be more effective in measuring higher-level cognitive
outcomes.
c. Tend to have greater content sampling
d. Tend to have greater reliability than objective items

15. Anonymous scoring of essay item responses tends to reduce:


a. The halo effect c. reader agreement
b. Order effects d. effects of penmanship

16. Which test is subjective and less reliable in scoring and grading?
a. Completion c. True or false
b. Essay d. Matching

17. Teacher Kiko Martin wants to test student’s acquisition of declarative knowledge.
Which test is appropriate?
a. Performance test c. Short answer test
b. Submission of a report d. Essay

18. Teacher Vilma Aunor would like to cover a wide variety of objectives in the quarterly
examinations in her English class lesson on subject-verb agreement. Which of the
following types of test is the most appropriate?
a. True or False c. Multiple choice
b. Essay d. Matching type

19. Among the types of assessment below, which does not belong to the concept-
group?
a. Multiple choice c. True or false
b. Matching type d. Completion test
20. Which type of test measures student’s thinking, organizing and written
communication skills?
a. Completion c. True or false
b. Essay d. Matching

21. Ms. Regine Misalucha tasked her students to show how to play basketball. What
learning target is she assessing?
a. Knowledge c. Skills
b. Reasoning d. Product

22. Mr. Piolo Cruz made an essay test for the objective "Identify the planets in the
solar system". Was the assessment method used the most appropriate for the given
objective? Why?
a. Yes, because essay test is easier to construct than objective test.
b. Yes, because essay test can measure any type of objective.
c. No. He should have conducted oral questioning.
d. No. He should have prepared an objective tests.

23. Mr. Janno Alcasid wants to test students' knowledge of the different places in the
Philippines, their capitals and their products, and so he gave his students an essay
test. If you were the teacher, would you do the same?
a. No, the giving of an objective test is more appropriate than the use of essay.
b. No, such method of assessment is inappropriate because essay is difficult.
c. Yes, essay test could measure more than what other tests could measure.
d. Yes, essay test is the best in measuring any type of knowledge.

24. What principle of test construction is violated when one places very difficult items
at the beginning, thus creating frustration among students, particularly those of
average ability and below average?
a. All the items of a particular type should be placed together in the test.
b. The items should be phrased so that the content rather than the form of the statements will determine the answer.
c. All items should be of approximately 50 percent difficulty.
d. The items of any particular type should be arranged in an ascending order of difficulty.

25. With specific details in mind, which one has a stronger diagnostic value?
a. Multiple Choice test
b. True or False test
c. Restricted Essay type
d. Non-restricted/extended Essay type

Identification. Evaluate the following items. Indicate whether each item has

A. Acceptable stem and acceptable options


B. Acceptable stem but unacceptable options
C. Unacceptable stem but acceptable options
D. Unacceptable stem and unacceptable options

Choose the letter only of the correct answer from the above options.(10 pts)

___1. Luis is twelve years old. How many orbits around the sun has the earth made since he
was born?
A. 12 B. 52 C. 30 D. 365

___ 2. A standardized test


A. Has norms C. Has scorability
B. Has objectivity D. Has reliability

___ 3. Generally, the longer the test, the higher is its


A. Validity C. Interpretability
B. Reliability D. Usability

___ 4. A reading teacher wants to find out if the pupils are ready to move on to the next
lesson.
What kind of test should she give?
A. Diagnostic C. Placement
B. Formative D. Medical

___ 5. A gumamela is an
A. Incomplete flower C. Pistillate flower
B. Complete flower D. Staminate flower

___ 6. The following methods are used in hydroponics except


A. Slop-method
B. Water-culture method
C. Sub-irrigation method
D. All of the above

___ 7. When expressed as a percentage, ¾ equals


A. A number less than 80%
B. Exactly 75%
C. A number greater than 70%
D. 80%

___ 8. Which of the following games is interesting?


A. Basketball C. Football
B. Baseball D. Badminton

___ 9. The topsoil contains a dark sticky substance called


A. Bedrock C. Humus
B. Compost D. Subsoil

___ 10. The heart is protected from injury by the


A. Skull C. Rib cage
B. Pelvis D. Spinal column

Matching Type. Match Column B (responses) with Column A (premises) and write the letter
of your answer on the blank before each number in Column A. (10 pts)

Column A

_____1. Multiple-choice item
_____2. Completion item
_____3. An incorrect alternative response
_____4. A selected-response item that measures a higher cognitive ability level
_____5. Words that denote whether a statement is true or false
_____6. A testing done to monitor student's progress over a period of time
_____7. Poor test score reliability due to subjectivity
_____8. A disadvantage of essay
_____9. A binary test
_____10. Test to monitor learning progress during instruction

Column B

A. constructed-response item
B. specific determiners
C. multiple-choice item
D. selected-response item
E. distractor
F. formative testing
G. essay test items
H. summative testing
I. standardized test
J. true-false test item
K. achievement test
L. scoring difficulty

Module 7

ITEM ANALYSIS

“The quality of the whole depends on the quality of its parts.”


Objectives:

After taking this module, the participants are expected to:

1. Determine the difficulty index and discrimination index of items in a given test.
2. Improve items in the test using the item analysis.
3. Identify items which are ambiguous, miskeyed, or guessed on, and which have poor
distractors.

ITEM ANALYSIS is the process of examining students' responses to each test item to judge the quality of the item. Specifically, one looks for the difficulty and discriminating ability of the item, as well as the effectiveness of each alternative.

This is done by studying the learners' responses to each item. It gives information concerning each of the following points:
1. The difficulty of the item
2. The discriminating power of the item
3. The effectiveness of each alternative

USES OF ITEM ANALYSIS: Item analysis data have several uses:

1. They facilitate classroom instruction, especially diagnostic testing.
2. They can be of aid in subsequent test revision.
3. They help increase the test-construction skill of the teacher.
4. They can be used to build a file of quality items for future tests.
5. They provide a basis for discussing test results.
6. They help one judge the worth or quality of a test, and help the teacher discover items that are:
   a. ambiguous
   b. miskeyed
   c. too easy or too difficult
7. They help a teacher discover the discriminating property of each item.

CRITERION GROUP:

We need two criterion groups for the item analysis. The smaller the percentage used for the upper and lower groups, the greater the differentiation will be. However, the smaller the extreme groups, the less reliable the resulting values will be. Kelley (1939) claimed that the optimum point at which these two conditions balance is reached when the upper and lower 27% are used.
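As a quick illustration of the 27% rule, the criterion groups can be formed from a ranked list of total scores. The following is a minimal Python sketch; the variable names and sample scores are hypothetical, not from the module.

    # Forming the upper and lower 27% criterion groups from total scores.
    scores = [45, 42, 40, 38, 37, 35, 33, 30, 28, 25, 22, 20, 18, 15, 10]

    n = len(scores)            # N: number of pupils who took the test
    nul = round(0.27 * n)      # NUL: size of each criterion group (27% of N)

    ranked = sorted(scores, reverse=True)
    upper_group = ranked[:nul]        # highest 27% of the papers
    lower_group = ranked[-nul:]       # lowest 27% of the papers
    middle_group = ranked[nul:-nul]   # the middle 46% stays intact

    print(n, nul)  # 15 4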

INTERPRETING ITEM ANALYSIS DATA

1. AMBIGUITY: An item may be ambiguous when the upper group cannot discriminate between the correct option and a particular wrong option, that is, when pupils in the upper group select an incorrect option about as often as the keyed answer.
EXAMPLE:

Options        (a)  b  c  d  e
Upper Group     6   2  6  1  0

(The upper group splits evenly between the keyed option a and distractor c.)

2. MISKEYED: An incorrect option selected by a large number of pupils in the upper group suggests a keying error.
EXAMPLE:

Options        (a)  b  c   d  e
Upper Group     1   2  0  11  1

(The teacher has to check the options again. Most probably option d is the correct answer and the item has been miskeyed.)

3. GUESSING: Sometimes items inadvertently contain information which pupils have not yet learned. Others may be so difficult that the pupils have no idea how to respond. Such items invite the pupils to guess the correct response. When they do so, the choices will be roughly equally distributed among the options, as shown in the example:

Options        a  b  (c)  d  e
Upper Group    4  3   3   2  3

(The responses are almost equally distributed across the options.)

4. POOR DISTRACTORS: Some distractors are not chosen at all by pupils. Distractors a and d in the example below illustrate such a case. These distractors may be so obviously wrong that even a low-achieving pupil knows they are incorrect.
Another type of poor distractor attracts more upper-group pupils than lower-group pupils; this may be due to a misconception. (A small programmatic check for these two symptoms is sketched after this section's rules of thumb.)

Options        a  (b)  c  d  e
Upper Group    0  12   0  0  3
Lower Group    0   8   6  0  7

5. DIFFICULTY LEVEL: The item difficulty, or difficulty level, of an item is usually expressed by its facility index, FI. If the facility index is high, the item is easy; if the facility index is low, the item is difficult. This value is sometimes referred to as the difficulty index.

The facility index ranges from 0 (very hard) to 1 (very easy).

Frederic Lord estimated the optimum difficulty level for different types of test items:

No. of Options      Optimum Difficulty Level
1 (essay/supply)    0.50
2 (T or F)          0.85
3                   0.77
4                   0.74
5                   0.69

6. DISCRIMINATION INDEX (DI): the extent to which an item is capable of measuring individual differences. If the high and low achievement groups perform equally well on an item, it has a DI of zero and is useless for measuring individual differences.

On the other hand, if most of the high achievement group respond correctly to an item and most of the low achievement group miss it, the item discriminates properly (positive discrimination). This means that the item is good at distinguishing successful from unsuccessful pupils. The opposite may happen when the lower group does better than the upper group; the DI is then negative, and such an item should be revised.

The Discrimination Index can take values from -1.00 to +1.00. The higher the discrimination index, the better the item discriminates.

The following rules of thumb may be used for interpreting item discrimination index values for classroom tests:

D.I.              Item Evaluation
0.40 and above    Very good item
0.30 – 0.39       Good item, but possibly subject to improvement
0.20 – 0.29       Marginal item, usually subject to improvement
0.19 and below    Poor item, to be rejected or kept only for a mastery test
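Patterns like the poor distractors above can be screened mechanically once the option counts per group are tabulated. The minimal Python sketch below checks only the two symptoms described under pattern 4 (an option chosen by no one, and an option that attracts more of the upper group than the lower group); the function name and the dictionary layout are illustrative assumptions, not part of the module.

    def flag_poor_distractors(key, upper, lower):
        # upper/lower map each option letter to the number of pupils
        # choosing it in the upper and lower criterion groups.
        flags = {}
        for opt in upper:
            if opt == key:
                continue  # skip the keyed (correct) answer
            if upper[opt] + lower[opt] == 0:
                flags[opt] = "implausible (never chosen)"
            elif upper[opt] > lower[opt]:
                flags[opt] = "attracts more upper than lower group"
        return flags

    # The poor-distractor example above (key = b):
    upper = {"a": 0, "b": 12, "c": 0, "d": 0, "e": 3}
    lower = {"a": 0, "b": 8, "c": 6, "d": 0, "e": 7}
    print(flag_poor_distractors("b", upper, lower))
    # {'a': 'implausible (never chosen)', 'd': 'implausible (never chosen)'}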

STEPS OF AN ITEM ANALYSIS:

1. After the test has been given, score the answer sheets and write the score on the top of each answer sheet.
2. Arrange the test papers by score, placing the one with the highest score on top and continuing sequentially until the one with the lowest score is at the bottom.
3. Count the total number of answer sheets (the number of pupils who sat for the test), N. Calculate 27% of N, round it to the nearest whole number, and call this NUL.
4. Set aside the highest 27% (the upper group) and the lowest 27% (the lower group), leaving intact the middle 46% (the middle group).
5. For each item, count the number of pupils in the upper group who responded to each option. Do the same for the lower group. Then count the number of pupils in the middle group who chose the correct option. Record them in a table as shown below.

Item No. Option


Facility Discrimination
and the Remarks
Group Index Index
correct a b c d (Refer to Table 2)
(FI) (DI)
Response
The item is
11+14+5 11 – 5
Upper 1 11 2 1 acceptable.
55 15
1 (b) Middle 5 14 3 3
Lower 3 5 4 3
FI = 0.55 DI = 0.40
12+19+13 The item is easy and
12 – 13 with negative
Upper 12 0 2 1 55
15 discrimination. Option
2 (a) Middle 19 1 3 2 B is not a plausible
Lower 13 0 2 0 FI = 0.80 option. Reject the
DI = - 0.067
item.
The item is difficult
4+7+4 4–4 and non-
Upper 4 4 3 4
55 15 discriminating. The
3 (b) Middle 6 7 6 6 students have
Lower 3 4 4 4 guessed the item. The
FI = 0.27 DI = 0
item is rejected.
Upper 1 1 1 12
4 (d) Middle 2 3 3 17
Lower 3 4 6 2
Upper 2 0 10 3
5 (c) Middle 3 0 17 5
Lower 3 0 8 4
Upper 7 5 2 1
6(a) Middle 12 10 2 1
Lower 6 3 2 4
Upper 1 12 0 2
7(b) Middle 2 20 0 3
Lower 2 10 0 3
Upper 3 5 2 5
8(d) Middle 4 8 4 9
Lower 4 3 2 6
9(a) Upper 1 0 1 13
Middle 2 3 1 19
Dr. Eric A. Matriano
P a g e | 53
PCK 2 – Assessment of Learning 1
Lower 1 2 2 10
Upper 4 4 3 4
10(c) Middle 4 8 4 9
Lower 4 3 2 6

6. Calculate the Facility Index as follows: sum the number of pupils in the three groups who responded correctly and divide it by the total number of pupils taking the test, N. This gives the Facility Index (FI):

FI = (U + M + L) / N

Where:
U - number of pupils in the upper group who answered correctly
M - number of pupils in the middle group who answered correctly
L - number of pupils in the lower group who answered correctly
N - total number of pupils taking the test

7. Calculate the Discrimination Index:

DI = (U - L) / NUL

Subtract the number of pupils who responded correctly in the lower group (L) from the number of pupils who responded correctly in the upper group (U). Divide the difference by the number of pupils in the upper or lower group, NUL.

8. Enter the values of FI and DI in the table.
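Steps 6 and 7 translate directly into code. Below is a minimal Python sketch using the module's notation (U, M, L, N, NUL); the function names are illustrative only.

    def facility_index(u, m, l, n):
        # FI = (U + M + L) / N: u, m, l are the numbers of pupils in the
        # upper, middle, and lower groups who answered correctly;
        # n is the total number of pupils who took the test.
        return (u + m + l) / n

    def discrimination_index(u, l, nul):
        # DI = (U - L) / NUL: nul is the number of pupils in the upper
        # (or lower) criterion group.
        return (u - l) / nul

    # Check against item 1 of the sample table (N = 55, NUL = 15):
    print(round(facility_index(11, 14, 5, 55), 2))    # 0.55
    print(round(discrimination_index(11, 5, 15), 2))  # 0.4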

ITEM IMPROVEMENT AND SELECTION:

By examining the responses of the pupils in the upper and lower groups to each option, together with the DI and FI of each item, the teacher can revise an item to improve it. The following guidelines may be used in selecting and improving items.

1. Considerations based on the Facility Index and Discrimination Index (Table 2).

Table 2

                               FACILITY INDEX (FI)
Discrimination Index (DI)    Below 0.40    0.40 – 0.80    0.80 & above

0.40 & above                 Difficult     Acceptable     Easy
0.30 – 0.39                  Difficult     Improvable     Easy
0.20 – 0.29                  Difficult     Marginal       Easy
0.19 & below                 Rejected      Rejected       Rejected

2. Any option that is not selected by the pupils in either the upper or lower group violates the principle that each alternative in a multiple-choice item should be PLAUSIBLE. The alternative should be revised before it is used again.
3. Distractors should, if possible, discriminate negatively. This means that incorrect options should be selected less frequently by the upper group than by the lower group. A distractor is functioning well if it attracts most, or a large proportion, of the pupils from the lower group.

Note: Easy items will prove to have low discrimination power. Why? Let’s take an
example at the extreme. If an item is answered correctly by all students, as many
good as poor students will have answered it correctly. Keeping the item, then, should
be more a function of whether it addresses an important objective. However, very
difficult items are also apt to produce poor discrimination indices. These items also
should be examined carefully.
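The decision rules of Table 2 can also be written as a small lookup. The sketch below is one possible Python rendering; how scores falling exactly on the boundaries 0.40 and 0.80 are treated is an assumption, since the table does not specify it.

    def evaluate_item(fi, di):
        # Classify an item using Table 2 (DI on the rows, FI on the columns).
        if di <= 0.19:
            return "Rejected"       # bottom row, regardless of FI
        if fi < 0.40:
            return "Difficult"
        if fi > 0.80:
            return "Easy"
        if di >= 0.40:
            return "Acceptable"
        if di >= 0.30:
            return "Improvable"
        return "Marginal"           # 0.20 <= DI <= 0.29

    print(evaluate_item(0.55, 0.40))    # item 1 of the sample table: Acceptable
    print(evaluate_item(0.80, -0.067))  # item 2: Rejected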

SUMMATIVE ASSESSMENT

TRUE OR FALSE. Write TRUE if the statement is correct and write FALSE if it is
incorrect.

_____1. A very easy item should be discarded.


_____2. If an item has DI = 0, then the item is very discriminating.
_____3. If DI = 1 for a test item, then the item is either miskeyed or very
indiscriminate.
_____4. If the FI of a test item is 0, it is very difficult.
_____ 5. If FI of a test item is 1 it must be discarded.
_____ 6. If FI of a test item is 0.85, it is high therefore easy.
_____ 7. If FI of a test item is 0.11, it is low therefore discriminating.
_____ 8. If the DI of a test item is -0.84, it is high negative therefore not
discriminating.
_____ 9. If the DI of a test item is 0.05, it is low positive, therefore not discriminating.
_____ 10. If none of the examinees chose a particular option in a multiple choice
item, that option must be changed or modified.

Given the data below for a 10-item multiple-choice test, do an item analysis. Decide whether each item is to be accepted or rejected. (Option counts are listed in the order a, b, c, d for the upper, middle, and lower groups; compute the FI and DI for each item and give a remark, referring to Table 2.)

Item 1 (key: d)     Upper: 1, 1, 1, 12     Middle: 2, 3, 3, 17     Lower: 3, 4, 6, 2
Item 2 (key: c)     Upper: 2, 0, 10, 3     Middle: 3, 0, 17, 5     Lower: 3, 0, 8, 4
Item 3 (key: a)     Upper: 7, 5, 2, 1      Middle: 12, 10, 2, 1    Lower: 6, 3, 2, 4
Item 4 (key: b)     Upper: 1, 12, 0, 2     Middle: 2, 20, 0, 3     Lower: 2, 10, 0, 3
Item 5 (key: d)     Upper: 3, 5, 2, 5      Middle: 4, 8, 4, 9      Lower: 4, 3, 2, 6
Item 6 (key: a)     Upper: 1, 0, 1, 13     Middle: 2, 3, 1, 19     Lower: 1, 2, 2, 10
Item 7 (key: c)     Upper: 4, 4, 3, 4      Middle: 4, 8, 4, 9      Lower: 4, 3, 2, 6
Item 8 (key: a)     Upper: 11, 1, 2, 1     Middle: 15, 2, 5, 3     Lower: 3, 4, 2, 6
Item 9 (key: b)     Upper: 1, 11, 1, 2     Middle: 1, 22, 1, 1     Lower: 3, 5, 2, 5
Item 10 (key: d)    Upper: 1, 2, 2, 10     Middle: 2, 2, 3, 18     Lower: 1, 1, 0, 13

Essay. Answer the following questions.

1. Why is item analysis needed after checking the test? What is the importance of
item analysis in improving the quality of test items?

2. How will you identify items which are ambiguous, miskeyed, or guessed on, and which have poor distractors?

Module 8
STANDARDS OF QUALITY ASSESSMENT:
CHARACTERISTICS OF A GOOD TEST

Objectives:

At the end of this module, you are expected to:

1. Explain validity and reliability as standards of quality assessment.


2. Identify the different factors affecting validity and reliability of test.
3. Evaluate the reliability of a given test.
4. Apply the formula for reliability index.
5. Discuss the other characteristics of a good test.

In general, high-quality assessments are considered those with a high level of


reliability and validity. Approaches to reliability and validity vary, however. These are
the characteristics of a good test or assessment.

A. VALIDITY

The Validity of a test may be defined as the degree to which a test


measures what it is supposed to measure. This is related to the purpose of the
test. Since validity is a matter of degree, it is incorrect to say that a test is either valid
or invalid. All tests have some degree of validity for any purpose for which they are
used; however, some are much more valid than others. If the objective matches the
test items prepared, the test is said to be valid.

Validity also refers to the usefulness of the instrument for a given purpose. It is the most important criterion of a good assessment instrument.

A valid assessment is one which measures what it is intended to measure.


For example, it would not be valid to assess driving skills through a written test
alone. A more valid way of assessing driving skills would be through a combination
of tests that help determine what a driver knows, such as through a written test of
driving knowledge, and what a driver is able to do, such as through a performance
assessment of actual driving. Teachers frequently complain that some examinations
do not properly assess the syllabus upon which the examination is based; they are,
effectively, questioning the validity of the exam.

Validity of an assessment is generally gauged through examination of


evidence in the following categories:

1. Face Validity - established by examining the physical appearance of the instrument to make sure it is readable and understandable.

2. Content Validity - Does the content of the test measure the stated objectives? It is established through a careful and critical examination of the objectives of assessment so that the test reflects the curricular objectives.

Although there are several types of validity, classroom teachers are most concerned with the type known as content validity. Content validity is the extent to which the test or test items are an accurate sample of the total subject-matter content. Content validity relies heavily on the preparation of good instructional objectives to define the subject matter to be learned. It refers to the adequacy and representativeness of the learning outcomes to be measured. Properly written objectives can serve as a guide to the construction of valid test items. Content validity is assured with the use of a Table of Specifications.

If a test looks like it measures what it claims to, the test is said to have face validity. A test with face validity may or may not actually produce data which correspond to the learning objectives. For example, an objective may call for a student to classify four types of leaves. At first the item may seem valid; it has high face validity. Closer attention will show that the objective requires higher levels of thinking to classify and describe each leaf. The test then does not in fact measure what it purports to, and can be said to have a low degree of content validity. No chemistry teacher would think of measuring knowledge of analytical chemistry with a test on acid rain. Nor would a biology teacher think seriously about measuring microscope skills with an essay test.

3. Criterion-Related Validity - Do scores correlate to an outside reference? It is established statistically such that the set of scores revealed by the measuring instrument is correlated with the scores obtained from another external predictor or measure. It has two types: concurrent and predictive.

a. Predictive Validity - involves the use of a criterion and a predictor. It describes the future performance of an individual by correlating sets of scores obtained from two measures given at a longer time interval.

(Ex. 1: Do high scores on a 4th grade reading test accurately predict reading skill in future grades?)

(Ex. 2: Correlating the results of a college entrance test with the grade weighted average (GWA) of students at some future time establishes predictive validity. The college entrance test result is the predictor and the GWA is the criterion.)

b. Concurrent Validity. It describes the present status of the individual


by correlating the sets of scores obtained from two measures given at
a close interval.

The college entrance exam is compared or correlated with some


criterion available at the time of the test such as the general average in
the fourth year high school.

NOTE: The main difference is the time at which the criterion becomes available. In predictive validity, the criterion is not yet available at the time the test is administered. In concurrent validity, the criterion is already available at the time the test is conducted.

4. Construct--Does the assessment correspond to other significant variables?


(ex: Do ESL students consistently perform differently on a writing exam than
native English speakers?)

Construct validity refers to how well a performance on a particular set of tasks


or components can be explained in terms of some psychological construct or traits
(such as intelligence, personality, aptitude in math, critical thinking and creative
thinking).

a. Convergent Validity - established if the instrument correlates with another measure of a similar trait. E.g., a Critical Thinking Test may be correlated with a Creative Thinking Test.

b. Divergent Validity - established if the instrument describes only the intended trait and does not correlate with measures of other traits. E.g., a Critical Thinking Test should not correlate highly with a Reading Comprehension Test.

Factors Influencing Validity

Teachers may be able to improve classroom tests' validity by using the list
below as a guide to test construction or as a checklist to review a test or a testing
situation.

A. Directions and test format


1. Are directions clear and complete?
2. Are students told how to record answers?
3. Can students record answers without making errors?
4. Are directions for specific items located near the items?
5. Are students told how items will be scored? How many points? Is it
permissible to guess?
6. Has the student been given an example of how to respond?
7. Is test printing large and bold enough to be easily read?
8. Is there adequate space for students to write answers?

B. Test items

Material tested

a. Do test items measure the instructional objectives and ideas to be learned


in the unit?
b. Do test items and instructional objectives correspond to what was taught in
the unit?
c. Is there an adequate number of items to reduce the effect of guessing?
d. Are items inappropriately difficult or easy?

Construction

a. Is the vocabulary and sentence structure within the ability of students?


b. Do test items violate any of the important rules for construction of items?

c. Are there identifiable cues within the test? For example, is there a pattern of
correct answers (T T T T F F F F), or is there heavy emphasis on A & B in multiple
choice, or one item containing the answer to another item?

Setting

a. Are there distractions such as noise, movement, student activity,


interesting material on chalkboards or elsewhere?
b. Are students giving or receiving answers from each other inappropriately?
c. Is there adequate time allowed?

B. RELIABILITY

Reliability refers to the consistency of test results. If a test gives the same
results when measuring an individual or group on two different occasions then the
scores are reliable. If different teachers rate the same essay, for example, on the
same criteria and obtain the same score then we say the scores are reliable
from one rater to another. In both cases we are interested in consistency or
trustworthiness. In a simple example you might consider the task of measuring the
length of a room.

A reliable assessment is one which consistently achieves the same results


with the same (or similar) cohort of students. Various factors affect reliability –
including ambiguous questions, too many options within a question paper, vague
marking instructions and poorly trained markers. Traditionally, the reliability of an
assessment is based on the following:

1. Temporal stability: Performance on a test is comparable on two or more


separate occasions.
2. Form equivalence: Performance among examinees is equivalent on different
forms of a test based on the same content.
3. Internal consistency: Responses on a test are consistent across questions.
For example: In a survey that asks respondents to rate attitudes toward
technology, consistency would be expected in responses to the following
questions:

 "I feel very negative about computers in general."


 "I enjoy using computers."

There are several instruments you could use. You could step off the length on
two different occasions or two people could step it off. Another way would be to use
a large rubber band with marks on it at one foot intervals. Again you could measure
several times or several people could use the rubber band. A third choice might be to
use a steel measuring tape. The tape will obviously give more consistent, more trustworthy results from time to time and from measurer to measurer. Unless
the measurement can be shown to be reasonably consistent over different occasions
or over different samples of the same behavior, little confidence can be placed in the
results. The results are not reliable.

Reliability is an important consideration. It would be helpful if several


teachers, each reading a book report, would give it the same score; otherwise, how
can the score or the feedback notes to the students be trusted? It would be desirable
if we could be sure that a test provided reliable scores of several samples of the
same behavior or of a class' behavior over a given period of time.

Although reliability is a desired quality, it provides no assurance that evaluation will give the desired results. Little is gained if measures, or tests, consistently give the wrong information. Refer to our example with the steel tape measure. As reliable as it is, it is not a valid measure of room temperature. Of the two qualities, reliability and validity, validity is the more important.

There are factors which influence the reliability of a test. They are:

Objectivity:

Scores for objective test items are less subject to the opinions or values of the
scorers and are thereby more reliable. In essay testing, when relying on
observations of students' performance, or when rating the products of their work,
scores tend to be unreliable. Later you will learn of some ways to increase reliability
in such situations.

Errors in the Test Itself (Intratest Error):

The test itself may contain poorly constructed items: items that give clues, ambiguous items, items that are very easy or very difficult, or items with too high a vocabulary/reading level.

With norm-referenced tests there must be a range of difficulty. Tests which are too easy or too difficult tend to be less reliable.

Length of test

Length is another factor affecting reliability. Generally speaking, the longer the test, the more reliable it is. More specifically, the more items testing an idea or a skill, the more reliable the test score will be. This is because more items reduce the chance that guessing will affect the score.
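The effect of length is commonly quantified with the Spearman-Brown prophecy formula (which the split-half method mentioned below also uses): if a test with reliability r is lengthened by a factor k with comparable items, the predicted reliability is kr / (1 + (k - 1)r). A quick check in Python:

    def spearman_brown(r, k):
        # Predicted reliability when a test of reliability r is made k times longer.
        return (k * r) / (1 + (k - 1) * r)

    # Doubling a test whose reliability is 0.60:
    print(round(spearman_brown(0.60, 2), 2))  # 0.75: the longer test is more reliable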

Errors within Test Takers

Fatigue, hunger, headache, emotional upset, anxiety, growth and new


learning acquired before the test are some conditions that can affect the test scores.

Errors in Test Administration

Possible sources of error include the lighting of the classroom, ventilation, noise, seating arrangement, instructions, time allotment, the attitude of the test examiner, and the like.

Errors in Test Scoring

This includes errors in scoring, such as a miskey, a mistake in marking a wrong answer as correct, a mistake in the use of the required pencil, and subjective scoring.

METHODS OF ESTIMATING TEST RELIABILITY

1. Test-Retest Method

It determines how consistent scores are over a given period of time. The same test is administered twice to the same group with a time interval in between. The time interval may vary according to the purpose of the test administrator. Some authors suggest that a period of two weeks or 15 days is sufficient. If the time interval is too short, students may recall their answers from the first administration; if the time interval is too long, it tends to lower the estimated reliability.

NOTE: It must be remembered that a test score is not the true score of the test taker. An obtained score is a combination of his/her true score plus error of measurement.

The challenge for us teachers: in order to approximate the true score, errors of measurement have to be reduced.
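In practice, the test-retest reliability coefficient is usually estimated as the Pearson correlation between the two sets of scores. A minimal Python sketch follows; the paired score lists are hypothetical.

    from statistics import mean, stdev

    def pearson_r(x, y):
        # Pearson correlation between paired score lists x and y.
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
        return cov / (stdev(x) * stdev(y))

    # Hypothetical scores of the same five pupils on two administrations:
    first = [30, 25, 40, 35, 20]
    second = [32, 24, 41, 33, 22]
    print(round(pearson_r(first, second), 2))  # about 0.97: consistent scores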

2. Alternate Forms or Parallel Forms Method

It uses two different versions of the same test administered to the same group close together in time (the same day or the next day). Form A has the same number of items, the same item format, and almost the same moderately difficult items as Form B.

For example, the skill to be tested in both forms is to multiply a two-digit number by another two-digit number:

FORM A: 35 x 75

FORM B: 25 x 82

We can say that Form A and Form B measure the same skill and the items are parallel.

3. Test-Retest with Alternate Forms Method


This method consists of administering the two versions (Alternate Forms) of
the same test on two separate occasions. The time interval may be short (at
least two weeks) or long (six months).

4. Internal-Consistency Method

It requires only one administration of the test to the group.

It uses Split-Half Method (Spearman-Brown Prophecy), Kuder-Richardson


Formula 20 and 21.
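As a concrete example of an internal-consistency estimate, Kuder-Richardson Formula 20 for items scored 1/0 is KR-20 = (k / (k - 1)) * (1 - sum(p*q) / var), where k is the number of items, p is the proportion answering an item correctly, q = 1 - p, and var is the variance of the total scores. A minimal Python sketch follows; the response matrix is hypothetical, and population variance is used here.

    from statistics import pvariance

    def kr20(responses):
        # responses: rows = students, columns = items scored 1 (right) / 0 (wrong).
        k = len(responses[0])                  # number of items
        n = len(responses)                     # number of students
        totals = [sum(row) for row in responses]
        var_total = pvariance(totals)          # variance of total scores
        sum_pq = 0.0
        for i in range(k):
            p = sum(row[i] for row in responses) / n  # proportion correct on item i
            sum_pq += p * (1 - p)
        return (k / (k - 1)) * (1 - sum_pq / var_total)

    # Hypothetical responses of five students to four items:
    data = [[1, 1, 1, 0],
            [1, 1, 0, 0],
            [1, 0, 1, 1],
            [0, 0, 0, 0],
            [1, 1, 1, 1]]
    print(round(kr20(data), 2))  # about 0.70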

Reliability Coefficient    Interpretation

0.90 and above    Excellent reliability; at the level of the best standardized tests
0.80 – 0.89       Very good for a classroom test
0.70 – 0.79       Good for a classroom test; there are probably a few items which could be improved
0.60 – 0.69       Somewhat low; this test needs to be supplemented by other measures (more tests) to determine the grade; there are probably some items which could be improved
0.50 – 0.59       Suggests need for revision of the test, unless it is quite short (ten or fewer items); the test definitely needs to be supplemented by other measures
0.49 and below    Questionable reliability; this test should not contribute heavily to the course grade, and it needs revision

C. FAIRNESS

An assessment procedure needs to be fair. This means many things. First, students need to know exactly what the learning targets are and what they are supposed to be achieving; otherwise, they could get lost in the maze of concepts being discussed in class. Likewise, students have to be informed how their progress will be assessed in order to allow them to strategize and optimize their performance.

Second, assessment has to be viewed as an opportunity to learn rather than an opportunity to weed out poor and slow learners. The goal should be that of diagnosing the learning process rather than the learning object.

Third, fairness also implies freedom from teacher stereotyping. Some examples of stereotyping include: boys are better than girls in mathematics, or girls are better than boys in language. Such stereotyped images and thinking could lead to unnecessary and unwanted biases in the way teachers assess their students.

D. PRACTICALITY, ADMINISTRABILITY AND EFFICIENCY

Another quality of a good assessment procedure is practicality and efficiency. An assessment procedure should be practical in the sense that the teacher is familiar with it, that it does not require too much time, and that it is, in fact, implementable. A complex assessment procedure tends to be difficult to score and interpret, resulting in misdiagnoses or too long a feedback period, which may render the test inefficient. The test should be easy to administer, such that the directions clearly indicate how a student should respond to the test/task items and how much time should be spent on each test item or on the whole test.

E. ECONOMY

The test should not entail too much cost in terms of production and
administration. A paper-and-pencil test is more economical than many performance
tests. The test should also save the time and effort spent on its
administration; separate answer sheets should be provided so that the test can
be given from time to time.

F. SCORABILITY

The test should be easy to score: the directions for scoring are clear, and the
point value for each correct answer is specified.

G. INTERPRETABILITY

Test scores can easily be interpreted and described in terms of the specific
tasks that a student can perform or his/her relative position in a clearly defined
group.

SUMMATIVE ASSESSMENT

MULTIPLE CHOICE. Write the letter of the correct/best answer for each item. (20
points)

1. In evaluating a test, which should be given first consideration?


a. Validity
b. Usability
c. Reliability
d. Administrability
2. Which statement concerning test validity and reliability is most accurate?
a. A test cannot be reliable unless it is valid
b. A test cannot be valid unless it is reliable
c. A test cannot be valid and reliable unless it is objective
d. A test cannot be valid and reliable unless it is standardized

3. If a teacher-made test overemphasizes facts and underemphasizes the other
objectives of the course for which it is designed, what can be said about the
test?
a. It lacks content validity
b. It lacks construct validity
c. It lacks criterion-related validity
d. All of the above
4. Should a teacher make deductions for misspelled words in an examination in
a content course?
a. No; such deductions will invalidate the test
b. Yes; all errors should result in deductions
c. Yes; good spelling is goal for which all teachers are responsible
d. Yes; but only to the extent that spelling is a basic objective of the
course
5. What happens when deductions due to poor penmanship are made in a pupil's
test in science?
a. It lowers the validity of the test
b. It lowers the reliability of the test
c. It lowers both the validity and reliability of the test
d. It does not affect the validity and reliability of the test
6. Which best describes validity?
a. Consistency in the test results
b. Adequacy of standardization
c. Homogeneity in the content of the test
d. Objectivity in the administration and scoring
7. Which is directly affected by the objectivity in scoring a test?
a. The validity of the test
b. The difficulty of the items
c. The interpretation of the test
d. The reliability of the test
8. Which describes an objective test?
a. It has definite norms which serve as the basis for evaluating students'
performance
b. It is a test in which teacher judgment in the construction of the test
is eliminated
c. It is a test where adequate answers get the same rating
d. Its items reflect directly upon the objectives
9. Why is it said that standardized tests can never completely replace informal
teacher-made tests?
a. They lack validity for classroom purposes
b. They lack reliability for classroom purposes
c. They provide inadequate basis for interpretation
d. They call for special training in administration scoring
10. Why do experts recommend a variety of instruments and techniques of
evaluation?
a. They allow the teacher to use a variety of teaching procedures.
b. They make for greater objectivity in scoring
c. They yield a wider range of scores and permit better grading
d. They allow different objectives to be evaluated more adequately

11. The more objective a test is, the more it is


a. Reliable
b. Valid
c. Usable
d. Discriminating
12. A teacher, desirous of checking the reliability of her 30-item science test,
analyzed the scores she obtained when she administered the test to her 50
students. She computed the reliability coefficient between the even- and odd-
numbered items. Which of these methods did she use?
a. Parallel forms method
b. Test-retest method
c. Split-half method
d. Test-retest with alternate forms method
13. If the teacher above obtained a reliability coefficient of 0.80, she can say that
her test was
a. highly reliable
b. fairly reliable
c. moderately reliable
d. not reliable
14. The teacher gave items in the test which were neither taught nor assigned as
reading material. The test lacks:
a. Content Validity c. Predictive Validity
b. Construct Validity d. Concurrent Validity
15. A teacher uses two different versions of the same test administered to the
same group close together in time (same day or the next day) to estimate
reliability. Which method is she using?
a. Test-Retest Method
b. Parallel or Alternate Forms Method
c. Test-Retest with Alternate Forms Method
d. Internal Consistency Method or Split-Half Method
16. What type of validity does the Pre-Board Examination possess if its results
can explain how the students will likely perform in their licensure examination?
a. content c. construct
b. predictive d. concurrent

17. Ms. Carla Phokwang developed an Achievement Test in Math for her grade three
pupils. Before she finalized the test, she examined carefully whether the test
items were constructed based on the competencies to be tested. What test
validity is she trying to establish?
a. content validity c. concurrent validity
b. predictive validity d. construct validity
18. Mrs. Rosa Sampaguita wants to establish the reliability of her achievement test
in English. Which of the following activities will help achieve her purpose?
a. Administer two parallel tests to two different groups of students.
b. Administer two equivalent tests to the same group of students.
c. Administer single test but to two different groups of students.
d. Administer two different tests but to the same group of students
19. Mr. Gringo Bato tried to correlate the scores of his pupils in the Social Studies
test with their grades in the same subject last 3rd Quarter. What test validity is he
trying to establish?
a. Content validity c. Concurrent validity
b. Construct validity d. Criterion-related validity
20. The more objective a test is, the more it is
a. Reliable c. Usable
b. Valid d. Practical

ESSAY. Answer the following briefly. (5 points each)


1. If a test is valid, is it reliable? Are all reliable tests valid?
2. Why is it important for a teacher to consider the validity and reliability of the
test he/she has to prepare? Explain your answer.
3. Discuss how a Table of Specifications helps in increasing the validity of a test.
4. Explain at least three (3) factors affecting the validity of a test.
5. Explain at least three (3) factors affecting the reliability of a test.

