Assessment of Learning Hand Outs PDF
Basic Concepts
• Test - an instrument designed to measure any characteristic, quality, ability, skill or knowledge
• Measurement - a process of quantifying the degree to which someone or something possesses a given trait (i.e. quality, characteristics,
feature)
• Assessment - a process of gathering and organizing quantitative or qualitative data into an interpretable form to have a basis for judgment
or decision-making
• Evaluation - a process of systematic collection and analysis of both qualitative and quantitative data in order to make some judgment or
decision; involves judgment about the desirability of changes in students
Assessment
• Traditional Assessment – refers to pen and paper mode of assessing any quality, ability, skill or knowledge (Ex. standardized and teacher-
made tests)
• Alternative Assessment
◦ Performance-based Assessment - a mode of assessment that requires the students to perform a significant task that is relevant to a
task outside the school (Ex. practical test, oral and aural tests, projects)
◦ Portfolio Assessment - a process of gathering multiple indicators of student progress to support course goals in dynamic, ongoing
and collaborative process
• Authentic Assessment - refers to the use of assessment methods that simulate true-to-life situations
Purposes of Assessment
Performance-based Assessment
• A process of gathering information about student’s learning through actual demonstration of essential and observable skills and creation of
products that are grounded in real world contexts and constraints
A. Generalizability
B. Authenticity
C. Multiple foci
D. Teachability
E. Feasibility
F. Scorability
G. Fairness
How?
• Identify the competency that has to be demonstrated by the students with or without a product.
• Describe the task to be performed by the students either individually or as a group, the resources needed, time allotment and other
requirements to be able to assess the focused competency.
• Develop a scoring rubric reflecting the criteria, levels of performance and the scores.
Portfolio Assessment
• A purposeful, ongoing, dynamic, and collaborative process of gathering multiple indicators of the learner’s growth and development
• Also performance-based but more authentic than any other performance-based task
Principles of Portfolio Assessment
Types of Portfolios
Stages of the portfolio assessment process (cycle diagram): 2. Collect evidences → 3. Select → 4. Organize → 5. Reflect → 6. Evaluate → 7. Exhibit
Rubrics
• A measuring instrument used in rating performance-based tasks
• Offers a set of guidelines or descriptions in scoring different levels of performance or qualities of products of learning
Types of Rubrics
Holistic Rubric – Describes the overall quality of a performance or product; there is only one rating given to the entire work or performance
Analytic Rubric – Describes the quality of a performance or product in terms of the identified dimensions and/or criteria, which are rated independently to give a better picture of the quality of the work or performance
Whether holistic or analytic, the rubric should have the following information
• Competency to be tested – this should be a behavior that requires either a demonstration or creation of products of learning
• Performance task – the task should be authentic, feasible, and has multiple foci
• Evaluative criteria and their indicators – these should be made clear using observable traits
• Performance levels – these levels could vary in number from 3 or more
• Qualitative and quantitative descriptions of each performance level – these descriptions should be observable to be measurable
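As a sketch, the rubric information listed above can be captured in a small data structure. The competency, criteria, level descriptions, and scores below are invented for illustration only:

```python
# Analytic rubric sketch: each criterion is rated independently, then summed.
# Competency, criteria, and level descriptions are hypothetical examples.
rubric = {
    "competency": "Deliver a persuasive speech",   # demonstration or product of learning
    "performance_levels": [1, 2, 3, 4],            # 3 or more levels
    "criteria": {
        "content":  {1: "off topic", 2: "partly relevant", 3: "relevant", 4: "compelling"},
        "delivery": {1: "inaudible", 2: "hesitant", 3: "clear", 4: "engaging"},
    },
}

def analytic_score(ratings):
    """Analytic rubric: one independent rating per criterion, summed."""
    return sum(ratings.values())

def holistic_score(rating):
    """Holistic rubric: a single rating for the entire work or performance."""
    return rating

print(analytic_score({"content": 3, "delivery": 4}))  # 7
```

The contrast in the two functions mirrors the table above: the analytic version keeps per-criterion ratings, the holistic version reduces the judgment to one number.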
Tests
Purposes/Uses of Tests
• Instructional (Ex. grouping learners for instruction within a class, identifying learners who need corrective and enrichment experiences,
assigning grades)
• Guidance (Ex. preparing information/data to guide conferences with parents about their children, determining interests in types of
occupations not previously considered or known by the students)
• Administrative (Ex. determining emphasis to be given to the different learning areas in the curriculum, determining appropriateness of the
school curriculum for students of different levels of ability)
Types of Tests
According to what it measures (Purpose)

Educational Test
• Aims to measure the results of instruction
• Administered after the instructional process
• Example: Achievement Test

Psychological Test
• Aims to measure students' intelligence or mental ability in a large degree without reference to what the student has learned
• Measures intangible aspects of an individual
• Administered before the instructional process
• Examples: Aptitude Test, Personality Test, Intelligence Test
According to how it is interpreted (Interpretation)

Norm-Referenced Test
• Result is interpreted by comparing one student with other students
• Some will really pass
• There is competition for a limited percentage of high scores
• Describes a student's performance compared to others

Criterion-Referenced Test
• Result is interpreted by comparing a student against a set of criteria
• All or none may pass
• There is NO competition for a limited percentage of high scores
• Describes a student's mastery of the course objectives
According to the scope of the test (Scope and Content)

Survey Test
• Covers a broad range of objectives
• Measures general achievement in certain subjects
• Is constructed by trained professionals

Mastery Test
• Covers a specific learning objective
• Measures fundamental skills and abilities
• Is typically constructed by the teacher
According to the level of difficulty and time allotment (Time Limit and Level of Difficulty)

Power Test
• Consists of items of increasing level of difficulty but taken with ample time
• Measures a student's ability to answer more and more difficult items

Speed Test
• Consists of items with the same level of difficulty but taken with a time limit
• Measures a student's speed and accuracy in responding
According to the manner of administration

Individual Test
• Given to one student at a time
• Mostly given orally or requires actual demonstration of skill
• Many opportunities for clinical observation
• Chance to follow up the examinee's response in order to clarify

Group Test
• Given to many individuals at the same time
• Usually a pencil-and-paper test
• Lacks insights about the examinee
• Same amount of time needed to gather information from each student (i.e., efficient)
According to language mode

Verbal Test
• Words are used by students in attaching meaning to or responding to test items

Non-Verbal Test
• Pictures or symbols are used by students in attaching meaning to or responding to test items
According to who constructed the test and who can take it (Construction)

Standardized Test
• Made by an expert; tried out, so it can be used with a wider group
• Covers a broad range of content covered in a subject area
• Uses mainly multiple choice
• Items written are screened, and the best items are chosen for the final instrument
• Can be scored by a machine
• Interpretation of results is usually norm-referenced

Informal Test
• Made by the classroom teacher; not tried out
• Covers a narrow range of content
• Various types of items are used
• Teacher picks or writes items as needed for the test
• Scored by the teacher
• Interpretation of results is usually criterion-referenced
According to the degree of influence of the rater on the outcome (Effect of Biases)

Objective Test
• Scorer's personal biases do not affect scoring
• Worded so that only one answer satisfies the requirement of the statement
• Little or no disagreement on what is the correct answer

Subjective Test
• Affected by scorer's personal bias, opinion, or judgment
• Several answers are possible
• Possible disagreement on what is the correct answer
Examples:
• Short Answer
• Completion Test
• Observational Techniques
◦ Anecdotal records
◦ Peer appraisal
▪ Guess-Who technique
▪ Sociometric technique
◦ Self-report technique
◦ Attitude scales
• Personality Assessments
◦ Personality inventories
◦ Creativity tests
◦ Interest inventories
Interpreting the Difficulty and Discrimination Indices
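These two indices are conventionally computed from the upper and lower scoring groups of the class. The formulas and the rough interpretation cut-offs below follow standard item-analysis practice; they are not spelled out in this handout:

```python
def difficulty_index(upper_correct, lower_correct, group_size):
    """Proportion of the combined upper and lower groups answering correctly:
    near 1.0 the item is very easy, near 0.0 it is very difficult."""
    return (upper_correct + lower_correct) / (2 * group_size)

def discrimination_index(upper_correct, lower_correct, group_size):
    """Upper-group minus lower-group proportion correct: positive when more of
    the upper group got the item right, negative when the reverse is true,
    zero when the item does not discriminate at all."""
    return (upper_correct - lower_correct) / group_size

# 10 examinees per group; 9 of the upper and 7 of the lower answered correctly
print(difficulty_index(9, 7, 10))      # 0.8
print(discrimination_index(9, 7, 10))  # 0.2
```

An item that is answered correctly more often by the lower group yields a negative discrimination index, which is the situation described in Exercises 2 and 30 below.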
Suggestions for Writing Test Items
1. Multiple Choice
a) The stem of the item should be meaningful by itself and should present a definite problem.
b) The stem should include as much of the item as possible and should be free of irrelevant material.
c) Use a negatively stated item stem only when significant learning outcomes require it.
d) Highlight negative words in the stem for emphasis.
e) All the alternatives should be grammatically consistent with the stem of the item.
f) An item should only have one correct or clearly best answer.
g) Items used to measure understanding should contain novelty, but beware of too much.
h) All distracters should be plausible.
i) Verbal associations between the stem and the correct answer should be avoided.
j) The relative length of the alternatives should not provide a clue to the answer.
k) The alternatives should be arranged logically.
l) The correct answer should appear in each of the alternative positions an approximately equal number of times, but in random order.
m) Use of special alternatives such as “none of the above” or “all of the above” should be done sparingly.
n) Do not use multiple-choice items when other types are more appropriate.
o) Always have the stem and alternatives on the same page.
p) Break any of these rules when you have a good reason for doing so.
2. Alternative Response
a) Avoid broad statements.
b) Avoid trivial statements.
c) Avoid the use of negative statements, especially double negatives.
d) Avoid long and complex sentences.
e) Avoid including two ideas in one statement unless cause-effect relationships are being measured.
f) If opinion is used, attribute it to some source unless the ability to identify opinion is being specifically measured.
g) True statements and false statements should be approximately equal in length.
h) The number of true statements and false statements should be approximately equal.
i) Consider starting with a false statement, since it is a common observation that the first statement in this type of test is always true.
3. Matching Type
a) Use only homogeneous material in a single matching exercise.
b) Include an unequal number of responses and premises, and instruct the students that responses may be used once, more than once,
or not at all.
c) Keep the list of items to be matched brief, and place the shorter responses at the right.
d) Arrange the list of responses in logical order.
e) Indicate in the directions the basis for matching the responses and premises.
f) Place all the items for one matching exercise on the same page.
Suggestions for Writing Short Answer and Completion Tests
1. Word the item(s) so that the required answer is both brief and specific.
2. Do not take statements directly from textbooks as a basis for short answer items.
3. A direct question is generally more desirable than an incomplete statement.
4. If the item is to be expressed in numerical units, indicate the type of answer wanted.
5. Blanks for answers should be equal in length.
6. Answers should be written before the item number for easy checking.
7. When completion items are to be used, do not have too many blanks. Blanks should be within or at the end of the sentence and not at the
beginning.
Suggestions for Writing Essay Type Tests
1. Restrict the use of essay questions to those learning outcomes that cannot be satisfactorily measured by objective items.
2. Avoid the use of optional questions.
3. Indicate the approximate time limit or the number of points for each question.
4. Prepare the scoring guide (rubric) for the essay questions.
Validity is the degree to which a test measures what it intends to measure. It is the usefulness of the test for a given purpose.
Types of Validity
Reliability
Reliability refers to the consistency of scores obtained by the same person when retested using the same instrument or one that is parallel to it.
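Test–retest reliability is typically estimated by correlating the two sets of scores. The sketch below uses the Pearson correlation coefficient (the standard choice, though the handout does not name it); the scores are made up:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two score lists from the same examinees:
    covariance of the pairs divided by the product of the standard deviations."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

first  = [10, 12, 15, 18, 20]   # first administration (made-up scores)
retest = [11, 12, 14, 19, 21]   # same examinees, second administration
print(round(pearson_r(first, retest), 2))  # 0.98 -- highly consistent scores
```

A coefficient near 1.0 indicates that examinees keep roughly the same rank order across administrations, which is what "consistency of scores" means here.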
Any measure indicating the center of a set of data, arranged in an increasing or decreasing order of magnitude, is called a measure of central
location or a measure of central tendency.
• The arithmetic mean is the sum of the data values divided by the total number of values.
• The median of a set of numbers arranged in order of magnitude is the middle value or the arithmetic mean of the two middle values.
• The mode is defined to be the value that occurs most often in a data set. The mode may not exist, and even if it does exist, it may not be
unique.
• The mean (or median or mode) is the point on the scale around which scores tend to group
• It is the average or typical score which represents a given group of subjects
• Given two or more values of central tendency, one can determine who performed poorly, well, better, or best
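The three measures defined above can be computed directly with the Python standard library; the scores below are made up:

```python
import statistics

scores = [10, 12, 12, 16, 18, 20, 20, 20, 25]   # made-up class scores

print(statistics.mean(scores))    # 17 -- sum of 153 divided by 9 values
print(statistics.median(scores))  # 18 -- middle value of the ordered scores
print(statistics.mode(scores))    # 20 -- the value that occurs most often
```

Note that the three measures need not coincide: here the mean, median, and mode give three different pictures of the "typical" score.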
Measures of Variability
A measure of variation or dispersion describes how large the differences between the individual scores are.
• The larger the measure of variability, the more spread the scores, and the group is said to be heterogeneous.
• The smaller the measure of variability, the less spread the scores, and the group is said to be homogenous.
• The range of a set of data is the difference between the largest and smallest number in the set.
• Given the finite population x1, x2, ..., xN, the population standard deviation is σ = √( Σ(xᵢ − μ)² / N ), where the sum runs over i = 1, ..., N and μ is the population mean.
• Quartile Deviation: QD = (Q3 − Q1) / 2
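The range and the population standard deviation can be sketched as follows (made-up scores):

```python
import math

scores = [10, 12, 16, 18, 24]            # made-up scores

rng = max(scores) - min(scores)          # range: largest minus smallest = 14
mu = sum(scores) / len(scores)           # population mean = 16.0
# Population standard deviation: square root of the mean squared deviation
sd = math.sqrt(sum((x - mu) ** 2 for x in scores) / len(scores))

print(rng, mu, round(sd, 2))  # 14 16.0 4.9
```

The larger these values, the more heterogeneous the group; the smaller, the more homogeneous, as the bullets above state.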
Percentiles
• Percentiles divide the distribution into 100 groups.
• Deciles divide the data set into 10 groups. Deciles are denoted by D1, D2, ..., D9, with the corresponding percentiles being P10, P20, ..., P90.
• Quartiles divide the data set into 4 groups. Quartiles are denoted by Q1, Q2, and Q3, with the corresponding percentiles being P25, P50, and P75.
• The interquartile range, IQR = Q3 - Q1 .
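Quartiles, the interquartile range, and the quartile deviation can be sketched as below. Several quartile conventions exist; this sketch uses the median-of-halves method, which is only one accepted choice:

```python
def quartiles(data):
    """Q1, Q2, Q3 by the median-of-halves convention: Q2 is the median of all
    scores; Q1 and Q3 are the medians of the lower and upper halves."""
    s = sorted(data)

    def median(xs):
        m = len(xs) // 2
        return xs[m] if len(xs) % 2 else (xs[m - 1] + xs[m]) / 2

    half = len(s) // 2
    return median(s[:half]), median(s), median(s[half + len(s) % 2:])

q1, q2, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16])
iqr = q3 - q1        # interquartile range = Q3 - Q1
qd = (q3 - q1) / 2   # quartile deviation = (Q3 - Q1) / 2
print(q1, q2, q3, iqr, qd)  # 5.0 9.0 13.0 8.0 4.0
```

Other conventions (e.g. interpolation-based percentile formulas) give slightly different quartile values on small data sets.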
Standard Scores
• The standard score or z-score for a value is obtained by subtracting the mean from the value and dividing the result by the standard
deviation. It represents the number of standard deviations a data value falls above or below the mean.
Stanines
• Standard scores that tell the location of a raw score in a specific segment in a normal distribution which is divided into 9 segments,
numbered from a low of 1 through a high of 9
• Scores falling within the boundaries of these segments are assigned one of these 9 numbers (standard nine)
t-Score
• Tells the location of a score in a normal distribution having a mean of 50 and a standard deviation of 10
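The three standard-score scales above can be sketched together. The z- and t-score formulas follow directly from the definitions; the stanine formula (2z + 5, rounded and clamped) is a common approximation not stated in the handout:

```python
def z_score(x, mean, sd):
    """Number of standard deviations the raw score x falls above (+) or below (-) the mean."""
    return (x - mean) / sd

def t_score(x, mean, sd):
    """The same location re-expressed on a scale with mean 50 and standard deviation 10."""
    return 50 + 10 * z_score(x, mean, sd)

def stanine(x, mean, sd):
    """Common approximation: 2z + 5, rounded, clamped to the 1-9 range."""
    return min(9, max(1, round(2 * z_score(x, mean, sd) + 5)))

# Raw score of 16 on a test with mean 20 and standard deviation 8
print(z_score(16, 20, 8))   # -0.5 (half an SD below the mean)
print(t_score(16, 20, 8))   # 45.0
print(stanine(16, 20, 8))   # 4
```

These are the same numbers involved in Exercise 24 below: a score of 16 with mean 20 and SD 8 lies within one SD of the mean.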
Measures of Shape
Kurtosis – the peakedness or flatness of the distribution
• Mesokurtic – moderate peakedness
• Leptokurtic – more peaked or steeper than a normal distribution
• Platykurtic – flatter than a normal distribution
Other Shapes
• Bimodal – curve with two peaks or modes
• Polymodal – curve with three or more modes
• Rectangular – there is no mode
Assigning Grades/Marks/Ratings
Marking/Grading is the process of assigning value to a performance.
Grade – Condition
F – Not coming to class regularly or not turning in the required work
D – Coming to class regularly and turning in the required work on time
C – Coming to class regularly, turning in the required work on time, and receiving a check mark on all assignments to indicate they are satisfactory
B – Coming to class regularly, turning in the required work on time, and receiving a check mark on all assignments except at least three that achieve a check-plus, indicating superior achievement
A – As above, plus a written report on one of the books listed for supplementary reading
1. Explain your grading system to the students early in the course and remind them of the grading policies regularly.
2. Base grades on a predetermined and reasonable set of standards.
3. Base grades on as much objective evidence as possible.
4. Base grades on the student’s relative standing compared to classmates.
5. Base grades on a variety of sources.
6. As a rule, do not change grades.
7. Become familiar with the grading policy of your school and with your colleagues' standards.
8. When failing a student, closely follow school procedures.
9. Record grades on report cards and cumulative records.
10. Guard against bias in grading.
11. Keep students informed of their standing in the class.
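Guideline 2 above (a predetermined and reasonable set of standards) can be sketched as a simple lookup. The percentage cut-offs are hypothetical; actual standards vary by school policy:

```python
# Hypothetical predetermined cut-offs; actual standards vary by school policy.
GRADE_STANDARDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

def assign_grade(percent):
    """Criterion-referenced marking: the score is compared against fixed
    standards, not against classmates' scores."""
    for cutoff, grade in GRADE_STANDARDS:
        if percent >= cutoff:
            return grade

print(assign_grade(85))  # B
print(assign_grade(59))  # F
```

Because the cut-offs are fixed in advance, every student who meets a standard earns that grade, regardless of how classmates perform.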
Exercises
1. A class is composed of academically poor students. The distribution would be most likely to be ____________.
A. skewed to the right C. a bell curve
B. leptokurtic D. skewed to the left
2. A negative discrimination index means that ___________.
A. the test item has low reliability
B. the test item could not discriminate between the lower and upper groups
C. more from the lower group answered the test item correctly
D. more from the upper group got the item correctly
3. A number of test items are said to be non-discriminating. What conclusion/s can be drawn?
I. Teaching or learning was very good.
II. The item is so easy that anyone could get it right.
III. The item is so difficult that nobody could get it.
A. II only C. III only
B. I and II D. II and III
4. A positive discrimination index means that
A. the test item has low reliability.
B. the test item could not discriminate between the lower and upper groups.
C. more from the upper group got the item correctly.
D. more from the lower group got the item correctly.
5. A quiz is classified as a
A. diagnostic test. C. summative test.
B. formative test. D. placement test.
6. A teacher would use a standardized test ___________.
A. to serve as a unit test C. to compare her students to national norms
B. to engage in easy scoring D. to serve as a final examination
7. A test item has a difficulty index of 0.81 and a discrimination index of 0.13. What should the test constructor do?
A. Make it a bonus item. C. Reject the item.
B. Retain the item. D. Revise the item.
8. An examinee whose score is within ±1 SD of the mean belongs to which of the following groups?
A. Above average C. Needs improvement
B. Below average D. Average
9. Are percentile ranks the same as percentage correct?
A. It cannot be determined unless scores are given.
B. It cannot be determined unless the number of examinees is given.
C. No
D. Yes
10. Assessment is said to be authentic when the teacher ___________.
A. considers students’ suggestions in testing
B. gives valid and reliable paper-pencil test
C. includes parents in the determination of assessment procedures
D. gives students real-life tasks to accomplish
11. Below is a list of methods used to establish the reliability of an instrument. Which method is questioned for spuriously high reliability due
to practice and familiarity?
A. Split half
B. Equivalent forms
C. Test retest
D. Kuder Richardson
12. Beth is one-half standard deviation above the mean of her group in arithmetic and one standard deviation above the mean in
spelling. What does this imply?
A. She is better in arithmetic than in spelling when compared to the group.
B. She excels both in spelling and in arithmetic.
C. In comparison to the group, she is better in spelling than in arithmetic.
D. She does not excel in spelling nor in arithmetic.
13. Concurrent validity requires
A. correlation study. C. item difficulty.
B. item analysis. D. peer consultation.
14. For mastery learning, which type of testing will be most fit?
A. Norm-referenced testing
B. Criterion-referenced testing
C. Formative testing
D. Aptitude testing
15. For maximum interaction, which type of questions must a teacher avoid?
A. Rhetorical C. Leading
B. Informational D. Divergent
16. “Group the following items according to phylum” is a thought test item on ________________.
A. inferring
B. classifying
C. generalizing
D. comparing
17. HERE IS A COMPLETION TEST ITEM: THE __________ IS OBTAINED BY DIVIDING THE __________ BY THE
__________. The rule in completion test item construction violated is
A. avoid over mutilated statements
B. avoid grammatical clues to the answer
C. avoid infinite statements
D. the required response should be a single word or a brief phrase
18. If all the passers of the 2006 Licensure Examination for Teachers turn out to be the most effective in the Philippine school
system, it can be said that this LET possesses ______________ validity.
A. construct C. predictive
B. content D. concurrent
19. If all your students in your class passed the pretest, what should you do?
A. Administer the posttest.
B. Go through the lesson quickly in order not to skip any.
C. Go on to the next unit.
D. Go through the unit as usual because it is part of the syllabus.
20. If I favor “assessment for learning”, which will I do most likely?
A. Conduct a pretest, formative and summative test.
B. Teach based on pretest results.
C. Give specific feedback to students.
D. Conduct peer tutoring for students in need of help.
21. If a teacher wants to test students' ability to organize ideas, which type of test should she formulate?
A. Technical problem type
B. Short answer
C. Multiple-Choice type
D. Essay
22. If the computed range is low, this means that ____________.
A. The students performed very well in the test.
B. The difference between the highest and the lowest score is high.
C. The students performed very poorly in the test.
D. The difference between the highest and the lowest score is low.
23. If your Licensure Examination Test (LET) items sample adequately the competencies listed in the syllabi, it can be said that
the LET possesses __________ validity.
A. concurrent C. content
B. construct D. predictive
24. In a 50-item test where the mean is 20 and the standard deviation is 8, Soc obtained a score of 16. What descriptive rating
should his teacher give him?
A. Average C. Poor
B. Below average D. Above average
25. In a grade distribution, what does the normal curve mean?
A. All students have average grades.
B. A large number of students have high grades and very few with low grades.
C. A large number of more or less average students and very few students receive low and high grades.
D. A large number of students receive low grades and very few students get high grades.
26. In a Science class test, one group had a range within the top quarter of 15 points and another group on the same
measurement had a range of 30 points. Which statement applies?
A. The first group is more varied than the second group.
B. The first group has a variability twice as great as the second group within the top quarter.
C. The second group has a variability twice as great as the first group within the top quarter.
D. The second group does not differ from the first group in variability.
27. In an entrance examination, Student A's percentile rank is 25 (P25). Based on this percentile rank, which is likely to happen?
A. Student A will be admitted.
B. Student A has 50-50 percent chance to be admitted.
C. Student A will not be admitted.
D. Student A has 75 percent chances to be admitted.
28. In group norming, the percentile rank of the examinee is
A. dependent on his batch of examinees.
B. independent of his batch of examinees.
C. unaffected by skewed distribution.
D. affected by skewed distribution.
29. In her item analysis, Teacher G found out that more from the upper group got test item no. 6 correctly. What conclusion can be
drawn? The test item has a ________.
A. high difficulty index C. positive discrimination index
B. high facility index D. negative discrimination index
30. In his second item analysis, Teacher H found out that more from the lower group got the test item no. 6 correctly. This means
that the test item __________.
A. has a negative discriminating power C. has a positive discriminating power
B. has a lower validity D. has a high reliability
31. In test construction, what does TOS mean?
A. Table of Specifications C. Table of Specific Test Items
B. Table of Specifics D. Terms of Specification
32. In the context of the theory of multiple intelligences, what is one weakness of the paper-pencil test?
A. It is not easy to administer.
B. It puts the non-linguistically intelligent at a disadvantage
C. It utilizes so much time.
D. It lacks reliability.
33. In the parlance of test construction, what does TOS mean?
A. Team of Specifications C. Table of Specifications
B. Table of Specifics D. Terms of Specifications
34. In which competency did my students find the greatest difficulty? In the item with a difficulty index of ____________.
A. 0.1 C. 0.9
B. 1.0 D. 0.5
35. In which type of grading do teachers evaluate students’ learning not in terms of grade but by evaluating the students in terms
of expected and mastery skills?
A. Point grading system C. Mastery grading
B. Relative grading D. Grade contracting
36. Is it wise practice to orient our students and parents on our grading system?
A. No, this will court a lot of complaints later.
B. Yes, but orientation must be only for our immediate customers, the students.
C. Yes, so that from the very start student and their parents know how grades are derived.
D. No, grades and how they are derived are highly confidential.
37. It is good to give students challenging and creative learning tasks because
A. development is aided by stimulation. C. development is affected by cultural changes.
B. the development of individuals is unique. D. development is the individual’s choice.
38. Marking on a normative basis means that ___________.
A. the normal curve of distribution should be followed
B. the symbols used in grading indicate how a student achieved relative to other students
C. some get high marks
D. some are expected to fail
39. Median is to point as standard deviation is to __________.
A. area C. distance
B. volume D. square
40. Ms. Celine gives a quiz to her class after teaching a lesson. What does she give?
A. Diagnostic test C. Performance test
B. Summative test D. Formative test
41. NSAT and NEAT results are interpreted against a set mastery level. This means that NSAT and NEAT fall under __________.
A. intelligence test C. criterion-referenced test
B. aptitude test D. norm-referenced test
42. On the first day of class after initial introductions, the teacher administered a Misconception/Preconception Check. She
explained that she wanted to know what the class as a whole already knew about the Philippines before the Spaniards came. On
what assumption is this practice based?
A. Teachers teach a number of erroneous information in history.
B. A Misconception/Preconception check determines students’ readiness for instruction.
C. The greatest obstacle to new learning often is not the students’ lack of prior knowledge but, rather, the existence of
prior knowledge.
D. History books are replete with factual errors.
43. Other than finding out how well the course competencies were met, Teacher K also wants to know his students’ performance
when compared with other students in the country. What is Teacher K interested to do?
A. Authentic evaluation C. Formative evaluation
B. Norm-referenced evaluation D. Criterion-referenced evaluation
44. Other than the numerical grades found in students’ report cards, teachers are asked to give remarks. On which belief is this
practice based?
A. Numerical grades have no meaning.
B. Giving remarks about each child is part of the assessment task of every teacher.
C. Remarks, whether positive or negative, motivate both parents and learner.
D. Grades do not reflect all developments that take place in every learner.
45. Out of 3 distracters in a multiple choice test item, namely B, C, and D, no pupil chose D as an answer. This implies that D is
____________.
A. an ineffective distracter C. a plausible distracter
B. a vague distracter D. an effective distracter
46. Q1 is to the 25th percentile as the median is to ____________.
A. 40th percentile C. 50th percentile
B. 60th percentile D. 75th percentile
47. Quiz is to formative test as periodic test is to __________.
A. criterion-reference test C. norm-reference test
B. summative test D. diagnostic test
48. Range is to variability as mean is to _____________.
A. level of facility C. correlation
B. level of difficulty D. central tendency
49. Referring to assessment of learning, which statement on the normal curve is FALSE?
A. The normal curve may not necessarily apply to homogenous class.
B. When all pupils achieve as expected their learning curve may deviate from the normal curve.
C. The normal curve is sacred. Teachers must adhere to it no matter what.
D. The normal curve may not be achieved when every pupil acquires the targeted competencies.
50. Ruben scored 60 on a percentile-ranked test. This means that __________.
A. Ruben got 60% of the question wrong.
B. 60% of the students who took the test scored higher than Ruben.
C. 60% of the students who took the test scored lower than Ruben.
D. Ruben got 60% of the questions right.
51. Standard deviation is to variability as mode is to ___________________.
A. correlation C. discrimination
B. level of difficulty D. central tendency
52. Standard deviation is to variability as mean is to __________.
A. coefficient of correlation C. discrimination index
B. central tendency D. level of difficulty
53. Study this group of tests which was administered to a class to which Peter belongs, then answer the question:
In which subject(s) did Peter perform most poorly in relation to the group’s mean performances?
A. English C. English and Physics
B. Physics D. Math
54. Study this group of tests which was administered with the following results, then answer the question: