
Principles of High-Quality Assessment

I. Introduction
Formulating instructional objectives or learning targets is the first step in both teaching and
evaluation. Once you have determined your objectives or learning targets, that is, answered the
question "what to assess?", you will probably be concerned with answering the question "how to
assess?" At this point, it is important to keep in mind several criteria that determine the quality
and credibility of the assessment methods you choose. This lesson will focus on these principles
or criteria and will provide suggestions for practical steps you can take to keep the quality of your
assessment high. At the end of your reading, you should be able to attain the following objectives:

 To demonstrate understanding of the different principles of high-quality assessment;
 To state the properties of assessment tools; and
 To recognize the importance of the principles of high-quality assessment in the
construction of teacher-made tests.
II. Learning Content
Before moving on to the different criteria, let us first answer the question, "What is high-quality
assessment?"
Until recently, test validity, reliability, and efficiency described the quality of classroom
assessment, and this placed emphasis on highly technical, statistically sophisticated standards. In
most classrooms, however, such technical qualities have little relevance because the purpose of
assessment is different. This does not mean discounting the importance of the validity and
reliability of assessment methods; rather, high-quality assessment adds other criteria as well.
High-quality assessment is not only concerned with a detailed inspection of the test itself;
it also focuses on the use and consequences of the results and on what assessments get students to do.
The criteria of high-quality assessment, which will be discussed in this lesson in detail, are
presented in a concept map in Figure 1.
Figure 1. Criteria for ensuring high-quality classroom assessment. The concept map links
high-quality assessment to seven criteria: clear and appropriate learning targets, selection of
appropriate methods, validity, reliability, fairness, positive consequences, and practicality and
efficiency.
Other principles of high-quality assessment include balance, clear communication, continuity,
authenticity, and ethics.
CLEAR AND APPROPRIATE LEARNING TARGETS


Sound assessment begins with clear and appropriate learning targets. A learning target is
defined as a statement of student performance that includes both a description of what students
should know, understand, and be able to do at the end of the unit of instruction and, as much as
possible, the criteria for judging the level of performance.
TYPES AND SOURCES OF LEARNING TARGETS
According to Stiggins and Conklin (1992), there are five types of learning targets. As
summarized in Table 1, these targets are not presented as a hierarchy or order. None is more
important than any other; each simply represents a type of target that can be identified and used
for assessment.
Table 1. Types of Learning Targets

Knowledge and simple understanding: student mastery of substantive subject matter and procedures
Deep understanding and reasoning: student ability to reason and solve problems
Skills: student ability to demonstrate achievement-related skills and perform psychomotor behaviors
Products: student ability to create achievement-related products such as written reports, oral
presentations, and art products
Affects: student attainment of affective states such as attitudes, values, interests, and self-efficacy

The types of learning targets presented provide a start to identifying the focus of
instruction and assessment, but you will find other sources that are more specific about learning
targets, such as Bloom's Taxonomy of Objectives.
Cognitive Domain

Bloom's Taxonomy | Revised Bloom's Taxonomy | Illustrative Verbs
Knowledge | Remember | names, lists, recalls, defines, describes
Comprehension | Understand | explains, rephrases, summarizes, converts, interprets
Application | Apply | demonstrates, modifies, produces, solves, applies
Analysis | Analyze | distinguishes, compares, differentiates, classifies
Synthesis | Create | generates, combines, constructs, formulates, proposes
Evaluation | Evaluate | justifies, criticizes, concludes, supports, defends, confirms

Each level of the taxonomy represents an increasingly complex type of cognition, with the
knowledge level considered the lowest; the remaining five levels are referred to as "intellectual
abilities and skills." Though this categorization of cognitive tasks was created more than 50 years
ago, and more contemporary frameworks have since been offered, the taxonomy is still valuable
in providing a comprehensive list of possible learning objectives with clear action verbs that
operationalize the learning targets.
APPROPRIATENESS OF ASSESSMENT METHODS
Many different approaches or methods are used to assess students, but your choice will
greatly depend on the match between the learning target and the method. The different methods of
assessment are categorized according to the nature and characteristics of each method. There are
four major categories: selected response, constructed response, teacher observation, and self-report.
I. Selected Response
a) Multiple Choice
b) Binary Choice (e.g., true/false)
c) Matching
II. Constructed Response
a) Brief constructed response
1. Short answer
2. Completion
3. Label a diagram
b) Performance-based tasks
1. Products
 Paper
 Project
 Poem
 Portfolio
 Reflection
 Journal
 Graph/Table
2. Skills
 Speech
 Demonstration
 Debate
 Recital
c) Essay Items
1) Restricted-response
2) Extended-response

d) Oral Questioning
1) Informal questioning
2) Examinations
3) Interviews

III. Teacher Observation


a. Informal
b. Formal
IV. Self-Report
a. Attitude survey
b. Questionnaires
c. Inventories
VALIDITY
Validity is a familiar concept that is at the heart of any type of high-quality assessment. It
refers to the appropriateness of the inferences, uses, and consequences that result from the
assessment. The more popular definition of this concept states that "it is the extent to which a test
measures what it is supposed to measure." Although this notion is important, validity is more than
that. Validity is concerned with the soundness, trustworthiness, or legitimacy of the inferences
made on the basis of the obtained scores. In other words, is the interpretation made from the test
result reasonable? Is the information I have gathered the right kind of evidence for the decision I
need to make or the intended use? How sound is the interpretation of the information?
How do we determine the validity of the assessment method or the test that we use?
Validity is always determined by professional judgement, made by the user of the
information (i.e., the teacher, for classroom assessment). Traditionally, validity comes from three
types of evidence: content-related, criterion-related, and construct-related. How can teachers use
these types of evidence, as well as consequences and uses, to make an overall judgement about
the degree of validity of the assessment? The contemporary idea of validity is unitary, with the
view that there are different types of evidence to use in determining validity, rather than the
traditional view that there are different types of validity.
Content-related evidence. Suppose you wanted to test for everything sixth-grade students
learn in a four-week unit on insects. Can you imagine how long the test would be and how much
time the students would take to complete it? What you do instead is select a sample of what has
been taught, and use student achievement on that sample as the basis for judging whether the
students demonstrate knowledge of the unit. Adequate sampling, of course, is determined by your
professional judgement. This can be done by reviewing the match between the intended inferences
and what is on the test. The process begins with clear learning targets and the preparation of a
table of specifications for these targets. The table of specifications, or test blueprint, is a two-way
grid that shows the content and types of learning targets. A sample table of specifications is
presented in Figure 2.
Figure 2. A sample table of specifications (TOS) for an achievement test in Science

Major Content Areas | Knowledge/simple understanding | Deep understanding and reasoning | Skills | Products | Affects | Totals
1. (Topic) | No./% | No./% | No./% | No./% | No./% | No./%
2. (Topic) | No./% | No./% | No./% | No./% | No./% | No./%
3. (Topic) | No./% | No./% | No./% | No./% | No./% | No./%
4. Mammals | 4/8% | No./% | No./% | No./% | No./% | No./%
... | ... | ... | ... | ... | ... | ...
N. (Topic) | No./% | No./% | No./% | No./% | No./% | No./%
Total no. of items/% of the test | No./% | No./% | No./% | No./% | No./% | 50/100%

The table is completed by simply indicating the number of items (No.) and the percentage
of items for each type of learning target. For example, if the topic were vertebrates, you might
have mammals as one content area. If there were four knowledge items on mammals, and this
was 8 percent of the test (N = 50), then 4/8% would be entered in the table under knowledge. The
rest of the table is completed by your judgement as to which learning targets will be assessed,
what areas of the content will be sampled, and how much of the assessment measures each target.
In this process, evidence of content-related validity is established.
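The bookkeeping behind a TOS is simple enough to script. The following is a minimal sketch,
not part of the module, of how the item counts and percentages in a table of specifications could
be tallied; the topics and counts other than the mammals example above are hypothetical:

    # Tally a table of specifications: items per (content area, learning target)
    # cell, expressed as a count and a percentage of the whole test.
    TOTAL_ITEMS = 50  # N = 50, as in the mammals example above

    tos = {
        ("Mammals", "Knowledge/simple understanding"): 4,    # 4/8% from the text
        ("Mammals", "Deep understanding and reasoning"): 2,  # hypothetical
        ("Birds", "Knowledge/simple understanding"): 6,      # hypothetical
        ("Birds", "Skills"): 3,                              # hypothetical
    }

    for (topic, target), count in tos.items():
        print(f"{topic} | {target} | {count}/{100 * count / TOTAL_ITEMS:.0f}%")

    # sanity check: the cell counts for the full grid should sum to TOTAL_ITEMS
    planned = sum(tos.values())
    print(f"Planned so far: {planned} of {TOTAL_ITEMS} items "
          f"({100 * planned / TOTAL_ITEMS:.0f}% of the test)")

Run on the mammals row above, the script prints "4/8%", the same entry shown in Figure 2.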
Another consideration related to this type of evidence is the extent to which an assessment
can be said to have instructional validity, which is concerned with the match between what is
taught and what is assessed. One way to check this is to examine the table of specifications after
teaching a unit to determine if the emphasis in different areas is consistent with what was
emphasized in class. For example, if you emphasized knowledge in teaching a unit (e.g., facts,
definitions of terms, places, dates, and names), it would not be logical to test for reasoning and
then make inferences about the knowledge students learned in the class.

Criterion-related evidence. This is established by relating an assessment to some other
valued measure (criterion) that either provides an estimate of current performance (concurrent
criterion-related evidence) or predicts future performance (predictive criterion-related evidence).
Classroom teachers do not conduct formal studies to obtain correlation coefficients that will
provide evidence of validity, but the principle is very important for teachers to employ. The
principle is that when you have two or more measures of the same thing, and these measures
provide similar results, you have established criterion-related evidence. For example, if your
observation of a student's skill in using a microscope is consistent with that student's score on a
quiz that tests the steps in using a microscope, then you have criterion-related evidence that your
inference about the skill of this student is valid.
Similarly, if you are interested in the extent to which preparation by your students, as
indicated by scores on a final exam in mathematics, predicts how well they will do next year, you
can examine the grades of previous students and determine informally whether students who
scored high on your final exam are getting high grades and students who scored low on your final
exam are getting low grades. If a correlation is found, then an inference about predicting how your
students will perform, based on the final exam, is valid; specifically, this is predictive criterion-
related evidence.
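This informal check can be made concrete with a few lines of code. Below is a minimal sketch,
with hypothetical scores, of the correlation a teacher might compute between final-exam scores
and the same students' grades the following year (the standard-library function used requires
Python 3.10 or later):

    # Pearson correlation between this year's final exam and next year's grades.
    from statistics import correlation  # available in Python 3.10+

    final_exam = [95, 88, 76, 70, 62, 55]   # hypothetical final-exam scores
    next_grades = [92, 85, 80, 68, 65, 58]  # the same students' grades next year

    r = correlation(final_exam, next_grades)
    print(f"Pearson r = {r:.2f}")  # a high positive r supports predictive
                                   # criterion-related evidence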
Construct-related evidence. A construct refers to an unobservable trait or characteristic
that a person possesses, such as intelligence, reading comprehension, honesty, self-concept,
attitude, reasoning, learning style, or anxiety. These are not measured directly; rather, the
characteristic is constructed to account for behavior that can be observed. Three types of
construct-related evidence are theoretical, logical, and statistical. A theoretical explanation defines
the characteristic in such a way that its meaning is clear and not confused with any other construct
(e.g., what is "attitude," or "how much do students enjoy reading"?). Logical analyses, on the
other hand, can be done by asking students to comment on what they were thinking when they
answered the questions, or by comparing the scores of groups who, as determined by other
criteria, should respond differently. Finally, statistical procedures can be used to correlate scores
from measures of the construct. For example, self-concept of academic ability scores from one
survey should be related to another measure of the same thing (convergent construct-related
evidence) but less related to measures of self-concept of physical ability (divergent construct-
related evidence).
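As a small illustration of the statistical approach, the sketch below correlates hypothetical scores
from two academic self-concept surveys (convergent evidence) and from a physical self-concept
survey (divergent evidence); the pattern to look for is a high first correlation and a much weaker
second one:

    # Convergent vs. divergent construct-related evidence (hypothetical data).
    from statistics import correlation  # Python 3.10+

    academic_a = [30, 28, 25, 22, 20, 18, 15, 14, 12, 10]  # academic self-concept, survey A
    academic_b = [29, 27, 26, 21, 19, 17, 16, 13, 12, 11]  # academic self-concept, survey B
    physical = [14, 22, 11, 25, 18, 30, 12, 20, 27, 16]    # physical self-concept

    print(f"convergent r = {correlation(academic_a, academic_b):.2f}")  # expected: high
    print(f"divergent r = {correlation(academic_a, physical):.2f}")     # expected: much weaker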
RELIABILITY
Like validity, the term reliability has been used for many years to describe an essential
characteristic of sound assessment. Reliability is concerned with the consistency, stability, and
dependability of results. In other words, a reliable result is one that shows similar performance at
different times or under different conditions.
Suppose Mrs. Reyes is assessing her students' addition and subtraction skills. She decides
to give the students a twenty-point quiz to determine their skill levels. She examines the results
but wants to be sure about the level of performance before designing appropriate instruction, so
she gives another quiz two days later on the same addition and subtraction skills. The results are
as follows:

Student | Addition Quiz 1 | Addition Quiz 2 | Subtraction Quiz 1 | Subtraction Quiz 2
Carlo | 18 | 16 | 13 | 20
Kate | 10 | 12 | 18 | 10
Jane | 9 | 8 | 8 | 14
Fely | 16 | 15 | 17 | 12
The scores for addition are fairly consistent. All four students scored within one or two points on
the two quizzes; students who scored high on the first quiz also scored high on the second, and
students who scored low did so on both quizzes. Consequently, the results for addition are reliable.
For subtraction, on the other hand, there is considerable change in performance from the first quiz
to the second. Students scoring low on the first quiz score high on the second. For subtraction,
then, the results are unreliable because they are not consistent; the scores contradict one another.
The teacher's goal is to use the quiz to accurately determine the targeted skill. In the case
of addition, she can get a fairly accurate picture with an assessment that is reliable. For
subtraction, on the other hand, she cannot use this result alone to estimate the students' real or
actual skill. More assessments are needed before she can be confident that the scores are reliable
and thus provide a dependable result.
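What Mrs. Reyes judged by eye can also be quantified. The following minimal sketch computes
the test-retest correlation for the quiz scores in the table above (using the standard library's
correlation function, Python 3.10+):

    # Test-retest consistency of Mrs. Reyes's two quizzes.
    from statistics import correlation  # Python 3.10+

    # scores in order: Carlo, Kate, Jane, Fely (from the table above)
    add_q1, add_q2 = [18, 10, 9, 16], [16, 12, 8, 15]
    sub_q1, sub_q2 = [13, 18, 8, 17], [20, 10, 14, 12]

    print(f"addition r = {correlation(add_q1, add_q2):.2f}")     # high: consistent
    print(f"subtraction r = {correlation(sub_q1, sub_q2):.2f}")  # negative: inconsistent

For these data the addition quizzes correlate at roughly .9, while the subtraction quizzes correlate
negatively, matching the verbal conclusion above.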
But even though the scores in addition are reliable, they are not without some degree of
error. In fact, all assessments have error; they are never perfect measures of the trait or skill. The
concept of error in assessment is critical to understanding reliability. Conceptually, whenever we
assess something, we get an observed score or result. This observed score is a product of what the
true or real ability or skill is, plus some degree of error:
Observed score = True score + Error
Reliability is directly related to error. It is not a matter of all or none, as if some results
were reliable and others unreliable. Rather, for each assessment there is some degree of error.
Thus, we think in terms of low, moderate, or high reliability. It is important to remember that error
can be positive or negative. That is, the observed score can be higher or lower than the true score,
depending on the nature of the error. For example, if the student is sick, tired, in a bad mood, or
distracted, the score may have negative error and underestimate the true score.
So what are the sources of error in assessment that may affect test reliability? Figure 3
summarizes the different sources of assessment error.

Figure 3. Possible sources of assessment error. An assessment of actual or true knowledge,
understanding, reasoning, skills, products, or affects yields an observed score that is influenced
by both internal and external error.
Internal sources of error: health, mood, motivation, test-taking skills, anxiety, fatigue, and
general ability.
External sources of error: directions, luck, item ambiguity, heat and lighting in the room,
sampling of items, observer differences, test interruptions, scoring, and observer bias.

Methods of Establishing Reliability Evidence

In the previous example, what Mrs. Reyes did is called the test-retest method of
establishing reliability: giving the same test to the same students at two different points in time.
Other methods include the parallel-forms method and alternative-forms reliability estimates.
Parallel forms of a test exist when, for each form of the test, the means and the variances of
observed test scores are equal. Alternative forms are simply different versions of a test that have
been constructed so as to be parallel, in which the two forms are typically designed to be
equivalent with respect to variables such as content and level of difficulty. Other methods that
require statistical procedures are split-half reliability estimates, the Spearman-Brown formula, the
Kuder-Richardson formulas, and coefficient alpha.
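To give a sense of how such estimates work, here is a minimal sketch, with hypothetical item
scores, of a split-half reliability estimate corrected to full test length with the Spearman-Brown
formula:

    # Split-half reliability with the Spearman-Brown correction.
    from statistics import correlation  # Python 3.10+

    # hypothetical data: rows = students, columns = ten items (1 = correct, 0 = wrong)
    items = [
        [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
        [1, 0, 1, 1, 0, 1, 1, 0, 1, 0],
        [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
        [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    ]

    # total each student's odd-numbered and even-numbered items separately
    odd = [sum(row[0::2]) for row in items]
    even = [sum(row[1::2]) for row in items]

    r_half = correlation(odd, even)     # reliability of the half-length test
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown full-length estimate
    print(f"split-half r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")

Splitting by odd and even items (rather than first half versus second half) helps keep the two
halves comparable when items get harder toward the end of a test.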
To enhance reliability, the following suggestions are to be considered:

 Use a sufficient number of items or tasks. (Other things being equal, longer tests are
more reliable.)
 Use independent raters or observers who provide similar scores on the same
performances.
 Construct items and tasks that clearly differentiate students on what is being
assessed.
 Make sure the assessment procedures and scoring are as objective as possible.
 Continue assessment until results are consistent.
 Eliminate or reduce the influence of extraneous events or factors.
 Use shorter assessments more frequently rather than fewer but longer assessments.

FAIRNESS
A fair assessment is one that provides all students an equal opportunity to demonstrate
achievement and yields scores that are comparably valid from one person or group to another. If
some students have an advantage over others because of factors unrelated to what is being taught,
then the assessment is not fair. Thus, neither the assessment task nor the scoring should be
differentially affected by race, gender, ethnic background, or anything else unrelated to what is
being assessed. The following criteria represent potential influences that determine whether or not
an assessment is fair.
1. Student knowledge of learning targets and assessment. A fair assessment is one in
which it is clear what will and will not be tested; your objective is not to fool or trick students or
to have them outguess you on the assessment. Rather, you need to be very clear and specific about
the learning target: what is to be assessed and how it will be scored.
2. Opportunity to learn. This means that students know what to learn and then are provided
ample time and appropriate instruction. It is usually not sufficient to simply tell students what will
be assessed and then test them. You must plan instruction that focuses specifically on helping
students understand, providing students with feedback on their progress, and giving students the
time they need to learn.
3. Prerequisite knowledge and skills. It is unfair to assess students on things that require
prerequisite knowledge or skills that they do not possess. For example, suppose you want to test
math reasoning skills, and your questions are based on short paragraphs that provide the needed
information. In this situation, math reasoning skills can be demonstrated only if students can read
and understand the paragraphs. Thus, reading skills are a prerequisite. If students do poorly on the
test, their performance may have more to do with a lack of reading skills than with math reasoning.
4. Avoiding stereotypes. Stereotypes are judgements about how a group of people will
behave based on characteristics such as gender, race, socioeconomic status, and physical
appearance. Though it is impossible to avoid stereotypes completely because of our values,
beliefs, and preferences, we can control the influence of these prejudices.
5. Avoiding bias in assessment task and procedures. Bias is present if the assessment
distorts performance because of the student’s ethnicity, gender, race, religious background and so
on. Bias appears in two forms: offensiveness and unfair penalization.
POSITIVE CONSEQUENCES
Ask yourself these questions: How will the assessment affect student motivation? Will
students be more or less likely to be meaningfully involved? Will their motivation be intrinsic or
extrinsic? How will the assessment affect my teaching? What will parents think about my
assessment? It is important to remember that the nature of classroom assessment has important
consequences for teaching and learning.
Positive consequences on students. The most direct consequence of assessment is that
students learn and study in a way consistent with your assessment tasks. If your assessment is a
multiple-choice test of students' knowledge of specific facts, students will tend to memorize
information. Assessment also has clear consequences for students' motivation. If students know
what will be assessed and how it will be scored, and if they believe that the assessment will be
fair, they are likely to be motivated to learn. Finally, the student-teacher relationship is influenced
by the nature of assessment: when teachers construct assessments carefully and provide feedback
to students, the relationship is strengthened.
Positive consequences on teachers. Just as students learn depending on the assessment,
teachers tend to teach to the test. Thus, if the assessment calls for memorization of facts, the
teacher tends to teach lots of facts; if the assessment requires reasoning, then the teacher structures
exercises and experiences that get students to think. Assessment may also influence how you are
perceived by others. Are you comfortable with school administrators and parents reviewing and
critiquing your assessments? What about the views of other teachers? How do your assessments
fit with what you want to be as a professional? Thus, like students, teachers are affected by the
nature of the assessments they give their students.
PRACTICALITY AND EFFICIENCY
High-quality assessments are practical and efficient. Because time is a limited commodity
for teachers, factors like familiarity with the method, time required, complexity of administration,
ease of scoring, ease of interpretation, and cost should be considered.
1. Familiarity with the method. This includes knowing the strengths and limitations of the
method, how to administer it, and how to score and interpret responses. Otherwise, teachers risk
wasting time and resources on questionable results.
2. Time required. Gather only as much information as you need for the decision. The time
required should include how long it takes to construct the assessment and how long it takes to
score the results. Thus, if you plan to use a test format (like multiple choice) over and over for
different groups of students, it is efficient to put considerable time into preparing the assessment,
as long as you can reuse many of the same test items each year or semester.
3. Complexity of administration. The directions and procedures for administration should
be clear, requiring little time and effort. Assessments that require long and complicated
instructions are less efficient, and because students will probably misunderstand them, reliability
and validity are affected.
4. Ease of scoring. It is obvious that objective tests are easier to score than other methods.
In general, use the easiest method of scoring appropriate to the method and purpose of the
assessment. Performance-based assessments, essays, and papers are more difficult to score, so it
is more practical to use rating scales and checklists rather than writing extended individualized
evaluations.
5. Ease of interpretation. Objective tests that report a single score are easier to interpret
than other methods; individualized written comments are more difficult to interpret. In general,
use the easiest method to interpret, and share with students the key and other materials that give
meaning to the different scores or grades.
6. Cost. Like other practical aspects, it is best to use the most economical assessment.
However, it would certainly be unwise to use a less reliable or less valid instrument just because
it costs less.
BALANCE
1. Assessment methods should be able to assess all domains of learning (cognitive,
psychomotor, and affective).
2. Assessment methods should be able to assess all levels of the hierarchy of objectives.
AUTHENTICITY
1. Assessment should reflect real-life situations.
2. Assessment should emphasize practicability.
CONTINUITY
1. Since assessment is an integral part of the teaching-learning process, it should be continuous.
CLEAR COMMUNICATION
1. Assessment results should be communicated to all the people involved.
2. Assessment results can be communicated through pre-test and post-test reviews.
ETHICS IN ASSESSMENT
As an educator who uses assessments, you are expected to uphold principles of
professional conduct such as:
1) Protecting the safety, health, and welfare of all examinees;
2) Knowing about and behaving in compliance with laws relevant to assessment
activities;
3) Maintaining and improving your competence in assessment;
4) Providing assessment services only in your areas of expertise;
5) Adhering to, and promoting, high standards of professional conduct within and
between educational institutions;
6) Promoting the understanding of sound assessment practices; and
7) Performing your professional responsibilities with honesty, integrity, due care, and
fairness.
III. Learning Task (you will be notified through Google Classroom of the date of
submission)
Activity 1 (on learning targets and methods of assessment)
For each of the following situations or questions, indicate which assessment method
provides the best match. Then provide a brief explanation of why you chose that method.
Choices are selected response, essay, performance-based, oral questioning, observation, and
self-report.
1. Mrs. Abad needs to check students to see if they are able to draw graphs correctly
like the examples just demonstrated in class.

Method: _______________________
Why?
______________________________________________________________________________
______________________________________________________________________________
_________________________________________________________________.
2. Mr. Garcia wants to see if his students are comprehending the story before moving
to the next set of instructional activities.

Method: ________________________
Why?
______________________________________________________________________________
______________________________________________________________________________
_________________________________________________________________.
3. Ms. Santos wants to find out how many spelling words her students know.
Method: _________________________
Why?
______________________________________________________________________________
______________________________________________________________________________
________________________________________________________________.
4. Ms. Cruz wants to see how well her students can compare and contrast the EDSA 1
and EDSA 2 People Power Revolutions.

Method: __________________________
Why?
______________________________________________________________________________
______________________________________________________________________________
_________________________________________________________________.
5. Mr. Mango's objective is to enhance his students' self-efficacy and attitudes toward
school.

Method: __________________________
Why?
______________________________________________________________________________
______________________________________________________________________________
_________________________________________________________________.
6. Mr. Fuentes wants to know if his class can identify the different parts of a
microscope.
Method: __________________________
Why?
______________________________________________________________________________

______________________________________________________________________________
____________________________________________________________________.
Activity 2. (On validity and reliability)
A. Answer the following questions briefly.
1. Should teachers be concerned about relatively technical features of assessment such as validity
and reliability? Why or why not?

2. Which of the following statements is correct, and why?

a. Validity is impossible without strong reliability.
b. A test can be reliable but not valid.
c. A valid test is reliable.
3. Mr. Carlos asks the other math teachers in his high school to review his midterm to see if the
test items represent his learning targets. Which type of evidence of validity is being used, and
why?

4. The students in the following lists are rank-ordered based on their performance on two tests
of the same content (highest score at the top). Do the results suggest a reliable assessment? Why
or why not?

Test A | Test B
George | Ann
Tess | Robert
Ann | Carlo
Carlo | George
Robert | Tess
5. Reading activity. When do we use these methods of establishing reliability evidence?
a. Split-half reliability estimates
b. Spearman-Brown formula
c. Kuder-Richardson (KR-20 & KR-21) formulas
d. Coefficient alpha
Activity 3. (on fairness, practicality and positive consequences)
1. Which aspect of fairness is illustrated in each of the following assessment situations?
a. Students complained because they were not told what to study for the test.
b. Students studied the wrong way for the test (e.g., they memorized the content).
c. The teacher was unable to cover the last unit that was on the test.
d. The test was about a story about life in Baguio City, and students who had been
to Baguio showed better comprehension scores than students who had not been there.
2. Is the following test item biased? Why or why not?
Carlo has decided to develop a family budget. He has P2,000 to work with and decides
to put P1,000 into house rental, P300 into food, P200 into transportation, P300 into
entertainment, P150 into utilities, and P50 into savings. What percent of Carlo's budget
is being spent on each of the categories?

3. Why is it important for teachers to consider practicality and efficiency in selecting their
assessments?

4. Based on your experience or observed practices, suggest at least two ways to enhance
the practicality and efficiency of assessment in terms of:
a. Cost
b. Ease of scoring
c. Complexity of administration
5. On-site activity. Ask a group of high school or elementary students, depending on your
interest, what they see as fair assessment. Also ask them how different kinds of assessment affect
them; for example, do they study differently for essay and multiple-choice tests?

Activity 4. Share the insights you gained from the lesson. A paragraph or two on each
principle/criterion of high-quality assessment is encouraged.
IV. References
Buendicho, Flordeliza C. Assessment of Student Learning 1. Manila: Rex Book Store, 2010.
Garcia, Carlito D. Measuring and Evaluating Learning Outcomes: A Textbook in Assessment of
Learning 1 & 2. Mandaluyong City: Books Atbp. Publishing Corp., 20
