PART 2
Assessment Methods
CHAPTER 4
Assess How? Designing Assessments to Do What You Want
So far, we have examined two keys to assessment quality. The first key is to know at the outset how we intend to use assessment results. Sometimes we use them to promote learning (assessment for learning), and other times we use them to check whether learning has occurred, that is, for purposes of accountability (assessment of learning). As the second key to quality, we have established that assessments must be designed to reflect the variety of achievement targets that underpin standards: mastery of content knowledge, the ability to use knowledge to reason, demonstration of performance skills, and product development capabilities. Now we consider the third key to classroom assessment quality: how to design assessments that cover our targets and serve our purposes (the shaded portion of Figure 4.1). In this chapter we describe four assessment methods representing the range of assessment options, explain how to choose which method to use for any given learning target, and outline the steps in assessment planning and development. We treat each of the four assessment methods in depth in Chapters 5 through 8; here we offer an overview with an emphasis on selecting the proper method and on thoughtful assessment planning.
Figure 4.1
Accurate Assessment:
WHY ASSESS? What's the purpose? Who will use the results?
ASSESS WHAT? What are the learning targets? Are they clear? Are they good?
COMMUNICATE HOW? How manage information? How report?
Effectively Used
The four assessment methods are these:
1. Selected response
2. Extended written response
3. Performance assessment
4. Personal communication
All four methods are legitimate options when their use is matched well with the learning target and the intended use of the information. (Portions of the following discussion are adapted from Stiggins, 2005.)
Selected Response
Selected response and short answer methods are those in which students select the correct or best response from a list provided. Formats include multiple choice, true/false, matching, short answer, and fill-in questions. (Although short answer and fill-in-the-blank questions do require students to generate an answer, they call for a very brief answer that is counted right or wrong, so we include these options in the selected response category.) For all selected response assessments, students' scores are figured as the number or proportion of questions answered correctly.
Extended Written Response
We judge the correctness of extended written responses by applying one of two types of predetermined scoring criteria. One type awards points for specific pieces of information that are present. For example, when students in a biology class are asked to describe the Krebs cycle, points might be awarded for noting that the cycle describes the sequence of reactions by which cells generate energy, takes place in the mitochondria, consumes oxygen, produces carbon dioxide and water as waste products, and converts ADP to energy-rich ATP. The second type of criteria takes the form of a rubric, such as a general rubric for making comparisons, which can be applied to any exercise calling for comparison. Scores therefore also take one of two forms: number or percentage of points attained, or rubric scores.
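To make the two score forms concrete (the numbers here are hypothetical, not drawn from the Krebs cycle example above): a response awarded 4 of 5 available points would be recorded as 4/5, or 80 percent, whereas a response evaluated with a five-level comparison rubric would simply be recorded as its rubric level, such as a 3 on the 5-point scale.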
Performance Assessment
Performance assessment is assessment based on observation and judgment; we look at a performance or product and make a judgment as to its quality. Examples include the following:
• Complex performances, such as playing a musical instrument, carrying out the steps in a scientific experiment, speaking a foreign language, reading aloud with fluency, repairing an engine, or working productively in a group. In these cases it is the doing, the process, that is important.
• Creating complex products, such as a term paper, a lab report, or a work of art. In these cases what counts is not so much the process of creation (although that may be evaluated, too) as the level of quality of the product itself.
As with extended written response assessments, performance assessments have two parts: a performance task or exercise and a scoring guide. Again, the scoring guide can award points for specific features of a performance or product that are present, or it can take the form of a rubric, in which levels of quality are described. For example, to assess the ability to do a simple process, such as threading a sewing machine, doing long division, or safely operating a band saw, points might be awarded for each step done in the correct order. Or, for more complex processes or products, you might have a rubric for judging quality that has several dimensions, such as ideas, organization, voice, word choice, sentence fluency, and conventions in writing, or content, organization, presentation, and use of language in an oral presentation. Again, scores could be reported as the number or percent of points earned, or in terms of a rubric score.
Personal Communication
Gathering information about students through personal communication is just what it sounds like: we find out what students have learned through interacting with them. Examples include the following:
• Looking at and responding to students' comments in journals and logs
• Asking questions during instruction
• Interviewing students in conferences
• Listening to students as they participate in class
• Giving examinations orally
We usually think of this as informal assessment, rather than formal assessment (in which results are recorded for later use). Often it is. However, as long as the learning target and the criteria for judging response quality are clear, information gathered via personal communication can be used to provide descriptive feedback to students, for instructional planning, and for student self-reflection and goal setting. If planned well and recorded systematically, information from personal communication can also serve as the basis for assessments of learning. Student responses are evaluated in one of two ways. Sometimes the questions we ask require students to provide a simple, short answer, and all we're looking for is whether the answer is correct or incorrect. This is parallel to scoring for written selected response questions. Questions during instruction usually call for these short answer oral responses. Other times, student oral responses are longer and more complex, parallel to extended written response questions. Just as with extended written response, we evaluate the quality of oral responses using a rubric or scoring guide. Longer, more complicated responses occur, for example, during oral examinations or oral presentations.
Target-Method Match
One of the values of classifying assessments according to method is that doing so helps us think clearly about how to assess what we are teaching. The heart of accuracy in classroom assessment is matching different kinds of achievement targets, with all the forms and nuances of each, to the appropriate assessment method. This is easily done, and it can save time in the long run. To begin thinking about the match between kind of learning target and assessment method, please complete the following two activities. You may want to discuss possible answers with colleagues.
DEEPEN UNDERSTANDING
DEEPEN UNDERSTANDING
Figure 4.2
Target to Be Assessed: Knowledge Mastery, Reasoning Proficiency, Performance Skills
Table 4.1 Target-Method Match (partial)
Targets to be assessed: Knowledge Mastery, Reasoning Proficiency, Performance Skills, Products.
Performance assessment:
• Reasoning proficiency: Can watch students solve some problems and infer reasoning proficiency.
• Performance skills: Good match. Can observe and evaluate skills as they are being performed.
• Products: Good match. Can assess the attributes of the product itself.
Selected response:
• Performance skills: Not a good match. Can assess mastery of the knowledge prerequisites to skillful performance, but cannot rely on these to tap the skill itself.
• Products: Not a good match. Can assess mastery of knowledge prerequisite to the ability to create quality products, but cannot use to assess the quality of products themselves.
Extended written response:
• Products: Strong match when the product is written. Not a good match when the product is not written.
Source: Adapted from Student-Involved Assessment for Learning, 4th ed. (p. 69), by R. J. Stiggins, 2005, Upper Saddle River, NJ: Merrill/Prentice Hall. Copyright 2005 by Pearson Education, Inc. Adapted by permission of Pearson Education, Inc.
Performance Assessment
Performance assessment is usually not a good choice for assessing knowledge targets, for three reasons. We'll illustrate the first reason with a brief example. Let's say we ask a student to complete a rather complex performance, such as writing and executing a computer program, for the purpose of determining if she has the prerequisite knowledge. If the student successfully executes the program, then we know that she possesses the prerequisite knowledge. The problem comes when the program does not run successfully. Was the failure due to lack of knowledge of the programming language, to the inability to use knowledge to create a program that does what it is intended to do, or merely to the inability to manipulate the keyboard or to proofread? We can't know the reason for failure unless we follow up the performance assessment with one of the other assessment methods. We must ask some short answer or extended response questions to find out if the prerequisite knowledge was there to start with. But if our initial objective was to assess mastery of specific knowledge, why go through the extra work? To save time and increase accuracy, we recommend using selected response, short answer, and extended written response assessments to evaluate knowledge targets. The second reason is that it is inefficient to assess all content knowledge with a performance assessment. A single performance task does require some subset of knowledge, and you can assess its presence with a particular performance task, but how many performance tasks would you have to create to cover all the knowledge you want students to acquire? For example, how many performance assessments would it take to determine whether students can spell all the words you want them to spell? Or how many would it take to determine whether students can perform all the mathematical operations they have been taught in a semester? Again, we recommend assessing knowledge with a simpler method and reserving performance assessment for those learning targets that really require it.
The third reason that performance assessments are usually not a good match for knowledge learning targets again has to do with practicality. It just isn't practical (or safe) to conduct some performance assessments. For example, let's say that you want to know if students can read schedules, such as bus schedules. It would be most authentic to ask students to get around town on the bus, but it would be highly inefficient and perhaps dangerous. Asking students to answer multiple-choice or short answer questions requiring understanding of a bus schedule would be a good compromise for getting the information needed.
Personal Communication
This is a good match with knowledge targets for most students at all grade levels, but it tends to be inefficient when a large amount of knowledge must be assessed for many students. Personal communication works best for real-time sampling of student understanding during instruction. Also, for some students, such as those with special needs, English language learners, or younger students, it is the best way to gather accurate information.
For example, we might ask students to solve the following problem in mathematics: "Estimate the number of hours of TV advertising the typical U.S. fifth grader watches in a year. Describe your procedure for determining your answer." This is an extended response question. If the learning target you want to assess is student reasoning, a single number as the right answer is not the focus of the assessment; the process itself is.
Performance Assessment
This is a partial match for assessing reasoning. For example, we can observe students carrying out science laboratory procedures and draw conclusions about their reasoning based on our observations. But there's a hitch that keeps performance assessment from being a great match with reasoning targets: we need to make an inference from what we observe. If students do well on a performance task requiring specific patterns of reasoning, we can assume that their reasoning is sound. However, if they don't do well, it could be due to lack of prerequisite knowledge, lack of motivation, or imprecise reasoning. Without engaging in additional time-consuming assessment, we may not be able to judge the level of achievement on reasoning targets.
Personal Communication
For gathering accurate information, personal communication is a strong match to reasoning targets. Teachers can ask students questions to probe more deeply into a response. Or, students can demonstrate their solution to a problem, explaining their reasoning out loud as they go. The drawbacks with using personal communication to assess reasoning proficiency are, as always, the amount of time it takes and the record-keeping challenge it poses.
4. Proficiency using specified mathematical procedures. Using mathematical procedures might imply knowledge (the ability to carry out the steps in a procedure), or it might imply reasoning (understanding when to use a mathematical procedure). You could use selected response, extended written response, or personal communication to assess either a knowledge or a reasoning interpretation.
5. Proficiency conducting labs in science. Proficiency conducting labs in science is a performance skill (skillfully using equipment); therefore it requires a performance assessment (watching students use the equipment).
TRY THIS
Figure 4.3
1. Plan: Assess why? Assess what? Assess how? How important?
2. Develop: Determine the sample. Select, create, or modify test items or tasks and scoring mechanisms.
3. Critique: Evaluate for quality.
4. Administer: Administer the test or assessment.
5. Revise: Evaluate test quality based on results and revise as needed.
In the first stage we identify the purpose, specify the learning targets, and select the proper assessment method. The fourth step, which we address later in this section, is to determine the relative importance of each learning target so that we sample each adequately. In the second stage we select or create test items or tasks and scoring mechanisms, adhering to the guidelines offered for each method in Chapters 5 through 8. During the third stage, we check to make sure we have avoided anything that might inadvertently cause results to misrepresent student learning, again using information provided for each method in Chapters 5 through 8. In the fourth stage, we simply administer the assessment to students. In the fifth and last stage, we note any problems with the questions, tasks, or scoring mechanisms on the assessment and rework them as needed. The five stages of development we describe here are presented in the context of a teacher-developed assessment for classroom use. However, they also apply to any other type of assessment developed by grade-level teams, content area departments, or district subject-area teams for purposes other than individual classroom use. Short-cycle, common, or interim assessments also need to adhere to standards of quality, and the five stages of development should frame that assessment development process as well. In Chapters 5 through 8 we describe any variations on the theme applicable to particular assessment methods.
TRY THIS
Table 4.2
Assess Why?
As we saw in Chapter 2, assessment results can be used for many purposes. In each of our two examples, the teachers' primary purposes are twofold: to help students understand how much they have learned, and to add information to the gradebook in preparation for calculating a course grade. Because assessment design is influenced by how we intend to use the results and by who else will use them, we answer the question "Assess why?" first of all.
Assess What?
Sound assessments arise from clear, specific, and appropriate achievement targets. Beginning with clear targets is important because different targets require different assessment methods and also because the breadth and depth of a learning target will affect how much coverage it will need on the assessment and in instruction. So at this juncture, you will do the following:
1. List the major learning targets you will be teaching.
2. Identify the prerequisite subtargets by unpacking or clarifying the learning targets, as needed.
3. Classify the targets, subtopics, and/or unpacked learning targets into knowledge, reasoning, performance skills, products, and/or dispositions.
4. Write the unpacked and/or clarified learning targets into the appropriate spaces in the test plan format you select. (Blank forms are on the CD in the file "Test Planning Forms.")
Table 4.4
Learning targets (type of target; assessment method; percent importance):
1. Acquire vocabulary associated with the physics of sound (knowledge; selected response; 25%).
2. Learn that sound originates from a source that is vibrating and is detected at a receiver such as the human ear (knowledge; 5%).
3. Use knowledge of the physics of sound to solve simple sound challenges: (1) how sound travels through solids, liquids, and air; (2) methods to amplify sound at the source and at the receiver (reasoning).
4. Understand the relationship between the pitch of a sound and the physical properties of the sound source (i.e., length of vibrating object, frequency of vibrations, and tension of vibrating string) (reasoning).
5. Use scientific thinking processes to conduct investigations and build explanations: observing, comparing, and organizing (reasoning/skill; performance assessment: design an experiment for a given hypothesis; given data, student organizes; set up stations, students conduct an experiment, all novel; 40%).
Source: From the FOSS® Physics of Sound Teacher Guide, The Regents of the University of California, 2005; developed by Lawrence Hall of Science and published by Delta Education, LLC. Reprinted by permission.
The secondary school music teacher whose test plan is represented in Table 4.3 has planned a 3-week unit of instruction on bluegrass music. He has chosen bluegrass music as the context for the following music standards:
• Classifies selected exemplary works from various historical periods by genre, style, and composer.
• Explains how use of specific musical elements (for example, rhythm, melody, timbre, expressive devices) is characteristic of music from various world cultures.
• Identifies music that represents the history and diverse cultures of our state.
• Identifies important composers and performers who influenced various genres of American music.
Students will need to acquire some knowledge about bluegrass music in three categories: works (famous pieces of music), musical elements (used to give the music the bluegrass feel), and composers/performers. In addition, the teacher will teach students to use the content knowledge in each of these three areas to reason analytically and comparatively. As indicated in the test plan, any single test question either will test knowledge or will be a combination of knowledge and the reasoning that is to be performed using that knowledge. In the plan for the fourth-grade unit on the physics of sound, the teacher has written selected learning targets down the left-hand column of Table 4.4. The type of learning target is noted in the next column. These teachers chose content categories based on their state content standards, local curriculum guides, and natural subdivisions of content. They chose reasoning patterns from content standards, local curriculum, and priorities in their teaching.
Assess How?
This is fairly straightforward. Once you have classified learning targets by type, it is easy to decide which assessment method to select by referring to the matching guidelines in Table 4.1. The fourth-grade teacher is emphasizing science process skills as well as knowledge and reasoning, so she will be using more than one assessment method. She has chosen the planning format shown in Table 4.4, which allows her to specify how each learning target will be assessed. The music teacher has only knowledge and reasoning learning targets. He has decided that the combination of knowledge and reasoning can be assessed well with a selected response test. Since he has no need for a test plan that shows different assessment methods, he has chosen a test plan format that emphasizes how content knowledge crosses with level of thinking.
How Important?
When we define the relative importance of each of the learning targets listed, we are mapping out how we will sample student learning. What will be most important on this assessment? How many points will each item be worth? For the most part, this is the call of the individual teacher, taking into account the following:
• The breadth and depth of the learning target. For example, in Table 4.4, the learning target "Learn that sound originates from a source that is vibrating and is detected at a receiver such as the human ear" doesn't cover as much territory as "Acquire vocabulary associated with the physics of sound" or "Use scientific thinking processes to conduct investigations and build explanations: observing, comparing, and organizing." Therefore, the target about where sound originates will carry less weight on the assessment, as reflected by its percentage of total points, and other targets will carry more weight. In all cases, the assessment must include enough questions or tasks to provide evidence leading us to a confident conclusion about student achievement, without wasting time gathering too much evidence. The critical question is, "How much evidence is enough?" How many multiple choice test items, essay exercises, or performance tasks do we need? (Each assessment method brings with it a set of rules of evidence for determining how big a sample of student achievement we need. We explain those guidelines in Chapters 5 through 8.)
• The importance of each learning target. For example, in Table 4.4, the teacher has determined that the most important learning target focuses on science processes and skills. Scientific information is important, and there is an expectation that students will learn some content information from this unit of study, but process skills matter more in this case. Therefore, science process targets alone will comprise 40 percent of the assessment points, and the other four targets combined will total 60 percent.
• State standards and local curriculum. For example, the music teacher is guided by the state standards in his emphasis of knowledge and reasoning targets in the unit. Because the state standards emphasize using information to analyze and classify, the teacher has also emphasized these on his test: two-thirds of the points on the test reflect students' ability to apply knowledge in novel ways.
Although not a hard and fast rule, a good guideline for making decisions regarding percentage of importance for each learning target is that percentage of instructional time and percentage of assessment time should be roughly equal. So, if science processes and skills represent 40 percent of importance, roughly 40 percent of instructional time will be used to teach science processes and skills.
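To illustrate the arithmetic with hypothetical numbers (the 100-point total and 20-day unit length are assumed for this example; only the 40 percent figure comes from the test plan): if the physics of sound unit assessment totals 100 points, the science process target at 40 percent importance would account for 40 of those points, leaving 60 points for the other four targets combined. Correspondingly, in a 20-day unit, roughly 8 days (40 percent of instructional time) would be devoted to teaching and practicing science process skills.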
Figure 4.4
B. Barriers that can occur within the assessment context
• Noise distractions
• Poor lighting
• Discomfort
• Lack of rapport with assessor
• Cultural insensitivity in assessor or assessment
• Lack of proper equipment
C. Barriers that arise from the assessment itself (regardless of method)
• Directions lacking or vague
• Poorly worded questions
• Poor reproduction of test questions
• Missing information
Figure 4.4 (Continued)
B. Barriers with extended written response assessments
• Lack of reading or writing skills
• No scoring criteria
• Inappropriate scoring criteria
• Evaluator untrained in applying scoring criteria
• Biased scoring due to stereotyping of respondent
• Insufficient time or patience to read and score carefully
• Students don't know the criteria by which they'll be judged
C. Barriers with performance assessment
• Lack of reading skills
• Inappropriate or nonexistent scoring criteria
• Evaluator untrained in applying scoring criteria
• Bias due to stereotypic thinking
• Insufficient time or patience to observe and score carefully
• Student doesn't feel safe
• Unfocused or unclear tasks
• Tasks that don't elicit the correct performance
• Biased tasks
• Students don't know the criteria by which they'll be judged
• Insufficient sampling
D. Barriers when using personal communication
• Sampling enough performance
• Problems with accurate record keeping
Source: Adapted from Practice with Student-Involved Classroom Assessment (pp. 194-195), by J. A. Arter & K. U. Busick, 2001, Portland, OR: Assessment Training Institute. Adapted by permission.
Summary
No single assessment method is superior to any other. Selected response, extended written response, performance assessment, and personal communication are all viable options, depending on the learning targets to be assessed, the purpose of the assessment, and special student characteristics such as age, English proficiency, or specific learning disabilities. All assessment development proceeds through the same five stages: (1) identify the purpose, specify the targets, select appropriate methods, decide on the relative importance of the targets, and sample well; (2) write the questions using guidelines for quality; (3) eliminate as many potential sources of bias and distortion as possible; (4) administer the assessment; and (5) examine the results for areas needing fine-tuning. By doing the work at each stage, we can have confidence that our assessments are yielding accurate results.
DEEPEN UNDERSTANDING
TRY THIS
TRY THIS
Arter, J., J. Chappuis, S. Chappuis, and R. Stiggins. 2004. "Assess How? Designing Assessments to Do What You Want." In Classroom Assessment for Student Learning: Doing It Right - Using It Well, 89-121. Portland, Oregon: Assessment Training Institute. This reading is provided by the Assessment Training Institute, which provides comprehensive professional development, training materials, and resources for teachers and school leaders so they can better manage the challenges of classroom assessment. This PBS TeacherLine course on assessment and evaluation is only a beginning exploration of classroom assessment; therefore, TeacherLine encourages students to further develop their knowledge and skills in these areas by visiting the Assessment Training Institute's Web site at www.assessmentinst.com.