Supporting Documentation for IAAL Reflections

The Relation of AIMSweb®, Curriculum-Based Measurement, and the Common Core Standards: All Parts of Meaningful School Improvement

White Paper

BIG IDEAS:
1. The Common Core State Standards (CCSS) provide sets of end-of-high-school outcomes and end-of-year annual benchmarks to guide what
students should learn.
2. The assessment implications of CCSS are clearly related to summative
evaluation and accountability.
3. No single test is sufficient for all the data-based decisions that schools make: screening, intervention planning/diagnosis, progress monitoring, and accountability/program evaluation.
4. Assessment of CCSS need not be separate items or tests for each standard,
but may include “rich tasks” that address a number of separate standards.
5. AIMSweb’s Curriculum-Based Measurement (CBM) tests typically are based
on these rich tasks that are validated as “vital signs” or “indicators” of
general basic skill outcomes like general reading ability or writing ability.
6. AIMSweb’s CBM tests are consistent with the CCSS, especially with the
K–5 Reading and Writing Standards. They are content valid.
7. AIMSweb’s CBM tests are complementary to the assessment requirements for attaining the CCSS. The tests have consequential validity for making screening decisions to facilitate early intervention and, critically,
for frequent progress monitoring, one of the most powerful tools to
increase achievement.
Mark R. Shinn, PhD
Professor of School Psychology,
National Louis University
AIMSweb Consultant
Introduction

For more than 30 years, our nation’s schools’ use of Curriculum-Based Measurement
(CBM), a set of simple, time efficient, and scientifically sound assessment tools, has
increased rapidly for frequent basic skills progress monitoring and screening students for
risk. CBM is the primary set of testing tools used by AIMSweb in a General Outcome
Measurement (GOM) approach to data-based decision making. Most often, AIMSweb is
used in the context of delivery of Multi-Tier System of Supports (MTSS), also known as
Response to Intervention (RtI).
The past two years have seen some confusion about the role of CBM in contemporary
assessment practice, largely due to the 2010 publication of Common Core State Standards
(CCSS) for English Language Arts and Literacy in History/Social Studies and Science and the
Common Core State Standards for Mathematics (K–12) by the Council of Chief State School
Officers (CCSSO) and the National Governors Association (NGA). In many schools,
little has changed with respect to assessment practices. CBM remains a cornerstone of
data-based decision making for frequent progress monitoring and screening in MTSS/
RtI. However, in other school districts, CBM’s use, like other assessment tools currently
in use, has been questioned because of concerns about their relation to the CCSS. Given
the intense pressure to adopt and use the CCSS, the questioning of what is appropriate
assessment is legitimate. This white paper is intended to contribute to understanding the
assessment implications of the CCSS and the use of CBM. By understanding what the CCSS and CBM are and are not, the paper contends that the use of CBM for formative, frequent progress monitoring, one of education’s most powerful tools to increase achievement (Hattie, 2009; Yeh, 2007), is a critical component of achieving the CCSS. Frequent progress
monitoring is especially important for students who are at risk and CBM use in proactive
universal screening enables schools to intervene as early as Kindergarten entry to provide
appropriately intensive intervention. Thus, I will argue that CBM is consistent with, and
complementary to, the CCSS.
By consistent, I mean that there is a clear relation between what is assessed when schools
use CBM and what academic skills are deemed important to gauge in the CCSS. This can
be judged largely by an evaluation of content validity. By complementary, I mean that the use
of CBM supports decisions that are related to essential judgments regarding attainment of the
CCSS, but with testing tools and practices that answer different questions than one would expect of CCSS assessment, which emphasizes summative evaluation and
accountability. No single test can be valid for all decision-making purposes (i.e., screening,
instructional planning/diagnosis, frequent formative progress monitoring, summative
progress monitoring, accountability/program evaluation) unless testing time and resources
are unlimited. This lack of a “Swiss Army knife” assessment instrument is compounded
from a practical perspective by the current lack of a national test of the CCSS. Evaluating
AIMSweb’s ability to complement proposed CCSS assessment is a construct validity and
consequential validity question (Barton, 1999; Messick, 1986).
I will present a brief background of CBM test development and use and its relation to
the academic standards movement in general. I also will present a brief review of what
the CCSS are and are not, and conclude with how CBM provides the kind of “single rich task” (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2012, p. 5) that is consistent with the CCSS assessment process. In this paper, I will
examine consistency with, and complementarity to, the Common Core State Standards for
English Language Arts and Literacy in History/Social Studies and Science, but the concepts apply
as well to the CCSS for Mathematics.

A Brief History of CBM Use and Interfaces with Assessment of State Academic Standards

The most commonly used CBM test is Reading-CBM (R-CBM). Students read graded passages of controlled difficulty aloud for a brief (i.e., 1 minute) period of time, and the number of words read correctly (WRC) is counted. However, there are also CBM tests of Mathematics Computation (M-COMP), Mathematics Concepts and Applications (M-CAP), spelling (S-CBM), written expression (WE-CBM), and early literacy and numeracy. CBM
provides a set of standard tools that are used in General Outcome Measurement (GOM).
Instead of testing students on a variety of ever-changing, different tests as in Mastery
Monitoring (MM), GOM is intended to provide a consistent scale for decision making
within and across years, working like other disciplines’ general outcome measures (e.g.,
thermometers for medicine, Dow Jones Industrial Index for the economy). For more detail
on GOM and MM, see Fuchs and Deno (1991) and Shinn (2012).
All CBM tests were created empirically, with careful attention to construct validity and with the intent of identifying simple “indicators” or “vital signs” of broader academic domains such as general reading achievement, mathematics achievement, and so on. The goal of CBM test construction was to find a single measure in each basic skills domain (e.g., reading, mathematics computation, written language) that was rich in information, correlated with other accepted measures of the same construct (i.e., criterion-related validity), and allowed valid decisions about overall student progress and relative standing (i.e., construct validity, consequential validity). For examples of how these measures were developed and
validated, see Deno (1992) or Fuchs, Fuchs, and Maxwell (1988). As a result of research
programs, we have learned that when students read aloud for 1 minute and WRC is
counted, what is assessed is much more than behaviors like oral reading fluency or even
oral reading skills. What is assessed is general reading achievement, incorporating a
variety of skills. For example, students with rich vocabulary read more words correctly
in a fixed period of time than students who do not have a rich vocabulary. Students
who comprehend what they read, read more words correctly in a fixed period of time
than students who do not comprehend what they read. Students who can decode
unfamiliar words read more words correctly in a fixed period of time than students who
cannot decode unfamiliar words. AIMSweb provides these field-tested, validated, and
independently reviewed CBM test materials in the basic skills areas and organizes and
reports the data for educators and parents.
Emerging out of the special education research community in the late 1970s, where CBM
was used for writing IEP goals and supporting frequent progress monitoring toward
those goals, CBM use expanded in the early 1980s as it became recognized that these
were efficient and effective tools for all students when making decisions about basic
skills (Deno, Marston, Shinn, & Tindal, 1983; Deno, Mirkin, & Wesson, 1984). Schools saw
the importance of not only monitoring special education students’ IEP progress, but the
progress of all students. Schools also began to use CBM progress monitoring tools for
universal screening to support early intervention, in part, to prevent the need for special
education (Deno, 1986). Use of CBM continued to grow, but expanded exponentially
nationwide at the end of the 20th century with accumulated scientific knowledge and
examples of successful school practices that dovetailed as critical components in the
National Reading Panel Report (2000), No Child Left Behind (NCLB), and Reading First. Use of
CBM for progress monitoring and screening became even more prevalent with passage of
the Individuals with Disabilities Education Act (IDEA) of 2004 that reinforced NCLB efforts to
support early identification of at risk students through screening and regular reporting of
standardized measures of academic progress to parents for all students and as integral to
evaluating response to intervention (RtI) (Shinn, 2002, 2008). Further coalescence occurred
as RtI expanded into a more comprehensive service delivery system, Multi-tier System of
Supports (MTSS). Foundational to RtI and MTSS is a seamless data system where simple
time and cost efficient screening can lead directly to simple and cost efficient progress
monitoring for all students that leads to even more frequent progress monitoring for
students at risk (Shinn, 2010).

Coinciding with these school improvement efforts, for more than 20 years, states
have been actively engaged in identifying and assessing their own state standards. With
passage of NCLB, the role of state standards reached its zenith. School district and
school accountability and consequences was mandated to be tied to performance
on state standards tests (SSTs) that were required to begin at Grade 3 and, with few
exceptions, were completed at the end of the academic year.
The myriad national school reform efforts and CBM and state standards assessment
strategies typically were not in conflict, but were consistent and complementary. SSTs were
seen as valid for purposes of summative progress monitoring for individual students to
determine what students had learned and for school and school district accountability.
In contrast, CBM was seen as valid for purposes of frequent formative evaluation to
judge progress and facilitate any necessary modifications of intervention programs, and
to enable very frequent (e.g., weekly) formative evaluation for at risk students, with
the added capacity for beginning of the year universal screening. Importantly, CBM
allowed for early identification through universal screening as early as the beginning of
kindergarten, avoiding a “wait-to-fail” approach that would result if the first point of
decision making was the end of Grade 3. In fact, it was possible to use CBM to predict
long-term performance of individual students on SSTs (Silberglitt & Hintze, 2005; Stage
& Jacobsen, 2001). For example, a student who earned an R-CBM WRC score of 60 at
the end of Grade 1 would be predicted to be highly likely to pass the end-of-Grade 3
Illinois Standards Achievement Test (ISAT). In contrast, a student who earned an R-CBM
score of less than 40 WRC at the end of Grade 1 would be predicted to be highly
unlikely to pass the end-of-Grade 3 ISAT.
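To make this kind of cut-score prediction concrete, the sketch below shows the logic in Python. The 60 and 40 WRC thresholds are the illustrative values from the example above, not official AIMSweb cut scores, and the function name and category labels are hypothetical.

```python
# Minimal sketch of cut-score screening with end-of-Grade-1 R-CBM scores.
# The thresholds (60 and 40 WRC) are the illustrative values from the text;
# the function name and labels are hypothetical, not AIMSweb terminology.

def screen_by_cut_score(wrc, likely_cut=60, unlikely_cut=40):
    """Classify a WRC score relative to illustrative predictive cut scores."""
    if wrc >= likely_cut:
        return "highly likely to pass the Grade 3 state test"
    if wrc < unlikely_cut:
        return "highly unlikely to pass without intervention"
    return "uncertain; consider additional assessment or closer monitoring"

# Example: three students screened at the end of Grade 1
for student, wrc in [("Student A", 72), ("Student B", 55), ("Student C", 31)]:
    print(student, wrc, screen_by_cut_score(wrc))
```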
Despite more than two decades of implementation effort, state standards proved to
be unsatisfactory. Each state was permitted to write its own standards for learning
outcomes in language arts, including reading, and mathematics, and some states added
specific content area (e.g., science) standards. No uniform process was used to create
these standards, and each state separately contracted for its own assessments and
criterion scores for judging success.
Although the need to identify expected learning outcomes was well accepted, the
operationalization of the state standards was subjected to almost universal criticism
for the variability in rigor among states. In 2006, Finn, Julian, and Petrilli reviewed their
ratings of state standards since 2000 and changes through 2006. They gave the collective
state standards a rating of C-minus in 2000 and concluded in 2006 that “two-thirds of
the nation’s K–12 students attend schools in states with C-, D-, or F-rated standards”
(p. 9). Few meaningful changes in state standards occurred in the intervening period. A comprehensive review of state standards by Carmichael, Martino, et al.
(2010) concluded that “the vast majority of states have failed even to adopt rigorous
standards” (p.1).
The identified problem was not just the standards, but also the basis of judging their
attainment. Because each state created its own SST, content and criteria for success varied, resulting
in such state-to-state differences that “in some states, students could score below the
10th percentile nationally and still be considered proficient. In other states… they had
to reach the 77th percentile to wear the same label” (Finn et al., 2006, p. 2). In my own
state of Illinois, students who read at or around the 30th percentile nationally would
be judged as proficient on the ISAT. Thirty miles north of my sons’ schools, students
in the state of Wisconsin can read as poorly as the 14th percentile nationally (Grades
2 and 8) and be judged as proficient by their SST, the Wisconsin Knowledge and Concepts
Examination-Criterion Referenced Test (WKCE-CRT).

A Brief Overview of the Common Core State Standards (CCSS) and Assessment Implications

Because of these concerns about state-to-state differences in standards rigor and in state standards tests’ criteria and outcomes, an effort to develop national standards began almost a decade ago. This effort by the Council of Chief State School Officers (CCSSO) and the National Governors Association (NGA) was an extension of their previous work to develop College and Career Readiness (CCR) reading, writing, speaking, listening, language, and mathematics
standards. The Common Core State Standards (CCSS) were released for feedback in 2009 and
published in 2010. As of August 2012, 45 states and the District of Columbia have adopted the
English Language Arts and Mathematics CCSS.
The CCSS represent the “what” in terms of students’ learning. According to the authors:
…standards are the foundation upon which almost everything else rests—or should rest.
They should guide state assessments and accountability systems; inform teacher
preparation, licensure, and professional development; and give shape to curricula, textbooks,
software programs, and more. Choose your metaphor: Standards are targets, or blueprints,
or roadmaps. They set the destination: what we want our students to know and be able to
do by the end of their K–12 experience, and the benchmarks they should reach
along the way (National Governors Association Center for Best Practices &
Council of Chief State School Officers, 2012; p. 1) (emphasis added)
Within this brief introductory paragraph are three implications for assessment. First, the
standards “should guide state assessments and accountability systems.” Consistent with previous
state standards’ efforts, this statement narrows the scope of CCSS assessment decisions from
“every decision” (e.g., formative assessment, summative assessment, accountability/program
evaluation, instructional planning, screening) and “everyone’s assessments” (e.g., states, school
districts, schools, classrooms) to two major decisions, (1) summative assessment and (2) accountability/program evaluation, and to one assessment system, the state’s, with its capacity to make these two decisions. Second, the statement expresses a clear intent to focus assessment on long-term outcomes at the end of K–12. Third, the paragraph communicates the need to include
other outcomes along the way through the establishment of “benchmarks” toward these
long-term outcomes, implicitly by summative assessment at the end of each grade.
These end-of-year summative benchmarks are elaborated on later in the CCSS document and identified explicitly in clarifying paragraphs in the section on Key Design Considerations:
The K–12 grade-specific standards define end-of-year expectations and a cumulative
progression designed to enable students to meet college and career readiness expectations
no later than the end of high school (National Governors Association Center for Best
Practices & Council of Chief State School Officers, 2012 p. 4) (emphasis added).
In summary, the assessment implications of the CCSS reflect the authors’ judgments about two important decisions,
summative evaluation and accountability/program evaluation. Therefore, schools will continue
to need assessment instruments and practices for two equally important decisions to support
achieving the CCSS, identifying at risk students and conducting formative evaluation, especially
frequent formative evaluation.
The CCSS are also explicit in identifying what they are not. It is clear that the authors did
not intend that the CCSS determine the “how” of instruction and assessment, nor were they intended to be limiting. In other words, they are the ends, not the means to achieve them.
Importantly, the CCSS authors express awareness of the interrelatedness of the standards and
the corresponding implications for assessment.
…each standard need not be a separate focus for instruction and assessment. Often, several
standards can be addressed by a single rich task (emphasis added).
This last passage is critical to understanding how AIMSweb’s CBM tests are consistent
with and complement the CCSS. As noted earlier (see page 2), the specific CBM measures
were designed exactly in line with the CCSS concept of “rich tasks.” They allow for making
statements about several standards.

Organization of the CCSS and How CBM is Consistent with, and Complements, the Standards

The CCSS are divided into two documents, the Common Core State Standards for English Language Arts and Literacy in History/Social Studies and Science, which I will abbreviate as CCSS-ELA, and the Common Core State Standards for Mathematics (K–12), which I will abbreviate as CCSS-M. Both sets of standards are regarded as a step forward in terms of logical coherence, developmental progression across grades, and specificity (Carmichael, Wilson, et al., 2010). As I stated earlier, I will focus on the CCSS-ELA in this white paper.
The CCSS-ELA is divided into three sections, (1) Standards for English Language Arts & Literacy
in History/Social Studies, Science, and Technical Subjects K–5; (2) Standards for English Language
Arts 6–12; and (3) Literacy in History/Social Studies, Science, and Technical Subjects 6–12. Within
the K–5 and 6–12 English Language Arts sections are strands:
(1) Reading: Text Complexity and the Growth of Comprehension
(2) Writing: Text Types, Responding to Reading, and Research
(3) Speaking and Listening: Flexible Communication and Collaboration
(4) Language: Conventions, Effective Use, and Vocabulary
The Literacy in History/Social Studies, Science, and Technical Subjects 6–12 section includes only
the Reading and Writing strands.

Common Core State Standards K–5 Reading

The K–5 Reading Standards are designed to ensure that all students get off to a healthy academic start and become competent readers, essential for understanding and using narrative and informational text. Basic skills are necessary, albeit insufficient, to attain the CCSS. Given AIMSweb CBM’s focus on basic skills assessment, and reading in particular, it is not surprising that R-CBM is highly (and most) consistent with the CCSS K–5 Reading Standards.
R-CBM is a “rich task” where students read aloud for 1 minute, serving as a holistic test
that can contribute to understanding student performance relative to a number of reading
standards. For older students (e.g., Grade 5 and above), AIMSweb Maze, a silent 3-minute
reading test, also is consistent with the CCSS Reading Standards. AIMSweb early literacy
measures, Letter Naming, Phonemic Segmentation, Letter Sounds, and Nonsense Words are
also consistent (i.e., content valid) with a number of CCSS Reading Standards.
The K–5 Reading Standards are divided into two main sections, Anchor Standards that are
consistent across grades but operationalized developmentally, and Foundational Standards
that are critical components of general reading skill.

K–5 Anchor Standards


The K–5 Reading Standards include 10 identical Anchor Standards across grades, divided into
four areas:
(1) Key Ideas and Details
(2) Craft and Structure
(3) Integration of Knowledge and Ideas
(4) Range of Reading and Level of Text Complexity
These areas and the Anchor Standards are detailed separately by types of text: (1) Reading
Standards for Literature K–5, and (2) Reading Standards for Informational Text K–5. Each area
contains specific standards that are operationalized with different features and content
across the grades and the types of text.
Reading Standards for Literature K–5. AIMSweb’s R-CBM is highly consistent with the Range
of Reading and Level of Text Complexity Anchor, which requires students to:
10. Read and comprehend complex literary and informational texts independently
and proficiently.

Developmental differences are noted. For example, the summative expected outcome for
Grade 1 Literature is:
10. With prompting and support, read prose and poetry of appropriate complexity for
grade 1.
In contrast, the expected summative outcome for Grade 5 Literature is:
10. By the end of the year, read and comprehend literature, including stories, dramas, and
poetry at the high end of grades 4–5 text complexity band independently and proficiently.
AIMSweb’s R-CBM is consistent (i.e., content valid) with this Literature Anchor Standard.
Students are tested by having them read grade-level passages of suitable difficulty (e.g.,
Grade 5 passages for Grade 5 standards). The passages are not representative of all text
types (e.g., poetry), but form the basis for judging students’ skill in general reading in terms
of independence and proficiency consistent with this standard.
Most importantly, AIMSweb’s R-CBM is complementary to CCSS assessment strategies. It
has demonstrated consequential validity as a general reading test to identify students at risk
(Shinn, 1989; 2007). The test can be used for universal screening early in an academic year
to identify students at risk for failing to attain the CCSS grade-level, end-of-year standards.
R-CBM is time and cost efficient, ensuring that sizable amounts of school resources
are not diverted away from instruction. And because it has been validated as a frequent,
formative assessment instrument (Fuchs & Fuchs, 1999; 2008), R-CBM can be used to
monitor progress regularly to ensure students are acquiring the skills necessary to meet
CCSS standards. In short, early screening and formative, frequent progress monitoring
complements the CCSS testing strategies that are summative and emphasize accountability
and program evaluation.
Reading Standards for Informational Text K–5. It should be noted that AIMSweb’s R-CBM is less
consistent (i.e., content valid) with the Reading Standards for Informational Text K–5. Across
grades, students are expected to:
10. Read and comprehend complex literary and informational texts independently and
proficiently (p. 5) (emphasis added)
Less consistent is not the same as inconsistent. This judgment is based on the type of text material students would be expected to read in order to be judged on the informational text standards. This requirement is clear when examining these Anchor Standards.
For example, the Grade 2 Anchor Standard end-of-the-year outcome is:
10. By the end of year, read and comprehend informational texts, including history/social
studies, science, and technical texts, in the grades 2–3 text complexity band
proficiently, with scaffolding as needed at the high end of the range. (emphasis added)
The Grade 5 Anchor Standard end-of-the-year outcome is similar, but requires successful
navigation of Grade 4–5 material:
10. By the end of year, read and comprehend informational texts, including history/social
studies, science, and technical texts, in the grades 4–5 text complexity band proficiently,
with scaffolding as needed at the high end of the range. (emphasis added)
As noted earlier, AIMSweb reading tests are based on passages that are narrative or
literature text largely due to their intended purpose, to serve as vital signs or indicators
of general reading ability. As the CCSS themselves imply, reading literature is different from
reading informational text. Skill in reading informational text relies much more on specific
content knowledge, vocabulary, and interest than does reading narrative or literature
text. Of course, general reading ability is directly correlated (i.e., construct-related validity)
to being able to read and comprehend informational texts, but in terms of content validity,
AIMSweb’s reading tests would be less valid.

However, as I have tried to emphasize throughout the white paper, the primary value of
AIMSweb is not to serve as content valid measures of the CCSS. Consistency is important,
but the primary usefulness of AIMSweb is to complement attainment of the CCSS by
facilitating early intervention for those students at risk by time and cost efficient universal
screening and frequent progress monitoring.

K–5 Foundational Skills Standards


The K–5 Reading Standards also include four Foundational Skills that span literature and informational reading and that are “necessary and important components of an effective comprehensive reading program designed to develop proficient readers” (CCSS, p. 15):
(1) Print Concepts
(2) Phonological Awareness
(3) Phonics and Word Recognition
(4) Fluency
AIMSweb’s R-CBM test is most obviously highly consistent with the CCSS Foundational Skills
of Fluency. With the exception of an end-of-Kindergarten standard that students will “read
emergent reader texts with purpose and understanding,” the Fluency standard is the same at
each grade:
4. Read with sufficient accuracy and fluency to support comprehension.
a. Read on-level text with purpose and understanding.
b. Read on-level prose and poetry orally with accuracy, appropriate rate, and
expression on successive readings.
c. Use context to confirm or self-correct word recognition and
understanding, rereading as necessary.
Assessing general reading skill, including fluency and accuracy, with AIMSweb’s R-CBM is
clearly consistent with the CCSS Foundational Skills of Fluency. With respect to content validity,
students read CCSS recommended “on-level text” using passages that have been field tested
for equivalent difficulty and subjected to readability evaluations, including use of Lexile ratings
(Howe & Shinn, 2002). Results are scored quantitatively in terms of WRC and accuracy,
the number of words read correctly divided by the total number of words read. Qualitative
ratings of “appropriate rate and expression on successive readings” as well as “self-correct word recognition and understanding, rereading as necessary” are accomplished quickly and
efficiently through the AIMSweb Qualitative Features Checklist (QFC) as part of Benchmark
Assessment.
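As a worked illustration of the quantitative scoring described above, the sketch below computes WRC and accuracy from a 1-minute reading record. The input format and function name are assumptions made for this example, not AIMSweb specifications.

```python
# Sketch of quantitative R-CBM scoring: words read correctly (WRC) and accuracy
# (words read correctly divided by total words read). The input format is an
# assumption: one True/False mark per word attempted during the 1-minute reading.

def score_r_cbm(word_marks):
    """Return (WRC, accuracy) from a list of per-word correct/incorrect marks."""
    total_read = len(word_marks)
    wrc = sum(word_marks)                     # words read correctly
    accuracy = wrc / total_read if total_read else 0.0
    return wrc, accuracy

# Example: a student reads 58 words in 1 minute and misreads 3 of them.
marks = [True] * 55 + [False] * 3
wrc, accuracy = score_r_cbm(marks)
print(f"WRC = {wrc}, accuracy = {accuracy:.0%}")   # WRC = 55, accuracy = 95%
```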
Although AIMSweb’s CBM test content is consistent (i.e., content validity) with respect to
the CCSS, the primary contribution of AIMSweb is its consequential validity; it complements
the CCSS summative and accountability assessment focus. It enables schools to engage in
early screening and intervention practices and frequent and formative evaluation to ensure
students are benefiting so they may attain the CCSS.
Unlike the Foundational Skills of Fluency, which emphasize broad outcomes that can be
assessed readily for screening and progress monitoring using AIMSweb’s R-CBM, the
Print Concepts and Phonological Awareness Foundational Skills include more narrow specific
outcomes and discrete skills. Not unexpectedly given these foundational skills and their
relation to overall reading success, the greatest consistency is at Kindergarten and Grade 1.
Kindergarten Print Concepts Standards are:
1. Demonstrate understanding of the organization and basic features of print.
a. Follow words from left to right, top to bottom, and page by page.
b. Recognize that spoken words are represented in written language by specific
sequences of letters.
c. Understand that words are separated by spaces in print.
d. Recognize and name all upper- and lowercase letters of the alphabet.
AIMSweb’s Letter Naming (LN) test requires students to name randomly ordered upper and
lower case letters. The number of correct letters named in 1 minute is the score of interest
and is most obviously consistent with the discrete skills of (d). It has high content validity.
The primary advantage of AIMSweb LN is that it is complementary and is especially useful
as an extremely time and cost efficient Kindergarten entry screener. AIMSweb uses LN not just
as a content valid measure of the discrete skill of naming letters, but as a “vital sign” or
“indicator” of the Print Concepts construct. That is, entry Kindergarten students who do
poorly on LN typically have little “print awareness.” Thus, they are likely to perform poorly
on all the Print Concepts Standards 1a through 1d. Although summative assessment and
accountability may require end-of-the-year testing on all four K standards, screening for risk
in attaining these standards can be accomplished economically by testing students for 1
minute on AIMSweb LN. Because each of these four standards represents a very discrete
and short-term instructional skill focus (e.g., 1–4 weeks), frequent progress monitoring may
be conducted best within the curriculum used to teach students these skills.
Kindergarten Phonological Awareness Standards are:
2. Demonstrate understanding of spoken words, syllables, and sounds (phonemes).
a. Recognize and produce rhyming words.
b. Count, pronounce, blend, and segment syllables in spoken words.
c. Blend and segment onsets and rimes of single-syllable spoken words.
d. Isolate and pronounce the initial, medial vowel, and final sounds
(phonemes) in three-phoneme (consonant-vowel-consonant, or CVC) words.*
(This does not include CVCs ending with /l/, /r/, or /x/.)
e. Add or substitute individual sounds (phonemes) in simple, one-syllable words
to make new words.
Grade 1 Phonological Awareness Standards are:
2. Demonstrate understanding of spoken words, syllables, and sounds (phonemes).
a. Distinguish long from short vowel sounds in spoken single-syllable words.
b. Orally produce single-syllable words by blending sounds (phonemes), including
consonant blends.
c. Isolate and pronounce initial, medial vowel, and final sounds
(phonemes) in spoken single-syllable words.
d. Segment spoken single-syllable words into their complete sequence
of individual sounds (phonemes).
AIMSweb’s Phonemic Segmentation Fluency test (PSF) requires students to parse orally
presented single and multi-syllable words into phonemes. The number of correct phonemes
segmented in 1 minute is the score of interest and is most obviously consistent (i.e., content
valid) with the discrete skills in Kindergarten standards 2b, 2c, and 2d and Grade 1 standards 2c and 2d.
AIMSweb PSF is complementary when the test is used not as a measure of these discrete
skills, but as a “vital sign” or correlate of the phonological awareness construct. Kindergarten
and Grade 1 students who do poorly on PSF typically have a variety of phonological
awareness deficits; if they are to attain the end-of-year standards, early screening allows
early identification and early intervention. Although summative assessment and accountability
may require end-of-the-year testing on all five K standards and all four Grade 1 standards,
screening for risk in attaining these standards can be accomplished economically by testing
students for 1 minute on AIMSweb PSF. This test is especially important for screening when
students are failing to acquire kindergarten reading skills.

Phonics and Word Recognition Standard


This CCSS Foundational Skill is unique in that it requires both reading and spelling
assessment. A single foundational skill standard is specified across Grades K–5 that is
operationalized developmentally.
3. Know and apply grade-level phonics and word analysis skills in decoding words.
Grades 4 and 5 operationalize this Foundational Skill the same way.
a. Use combined knowledge of all letter-sound correspondences, syllabication patterns, and
morphology (e.g., roots and affixes) to read accurately unfamiliar multisyllabic words in
context and out of context.
The other grades require different and more developmentally relevant skills. For example, at
Kindergarten, the Foundational Skills for Phonics and Word Recognition include:
a. Demonstrate basic knowledge of one-to-one letter-sound correspondences by producing the primary sound or many of the most frequent sounds for each consonant.
b. Associate the long and short sounds with common spellings (graphemes) for the five major vowels.
These spelling related skills can be assessed by using AIMSweb Letter Sounds (LS), a content
valid test where students are required to produce as many common letter sounds as
they can in 1 minute, given a series of upper and lower case letters. AIMSweb LS also has
consequential validity as a Kindergarten and early Grade 1 screener and progress monitoring
tool for students who are at risk for or receiving intervention for Phonics concerns (Hintze
& Silberglitt, 2005; Silberglitt, 2007).
Spelling Curriculum-Based Measurement (S-CBM) also has content validity for drawing
conclusions about some of the Phonics and Word Recognition Foundational Standards. The
S-CBM test requires students to write orally dictated grade-level phonetically regular
and irregular words for 2 minutes. Results are scored by the number of correct letter
sequences (CLS) and words spelled correctly. CLS scoring allows for identifying the correct
and incorrect phonics spelling patterns. The strong relation between early reading and
spelling skills has been long noted (Adams, 1990) and this relation is identified in a number
of standards.
For example, at Grade 1, students are expected to:
a. Know the spelling-sound correspondences for common consonant digraphs.
b. Decode regularly spelled one-syllable words.
c. Know final -e and common vowel team conventions for representing long vowel sounds.
d. Use knowledge that every syllable must have a vowel sound to determine the number
of syllables in a printed word.
e. Decode two-syllable words following basic patterns by breaking the words into syllables.
f. Read words with inflectional endings.
g. Recognize and read grade-appropriate irregularly spelled words.

Grade 1 standards a, c, d, and e are clearly reflected in AIMSweb’s S-CBM scoring. Similar
examples of content validity can be seen in other grades. At Grade 3, the Foundational
Standards also include the following end-of-year skills that may best be assessed through a
spelling test rather than a reading test alone.
a. Distinguish long and short vowels when reading regularly spelled one-syllable words.
b. Know spelling-sound correspondences for additional common vowel teams.
c. Decode regularly spelled two-syllable words with long vowels.
d. Decode words with common prefixes and suffixes.
e. Identify words with inconsistent but common spelling-sound correspondences.
Like the other AIMSweb measures, S-CBM complements CCSS decision making; it has
evidence of consequential validity for screening and progress monitoring decisions (Fuchs,
Allinder, Hamlett, & Fuchs, 1990; Fuchs, Fuchs, Hamlett, & Allinder, 1991).
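The correct letter sequence scoring described above can be illustrated with a simplified sketch. A common convention is that a CLS is a pair of adjacent letters (counting the word boundaries) that appears in the correct order, so a correctly spelled n-letter word earns n + 1 CLS; the position-based comparison below is a simplification and does not handle insertions or omissions the way a trained scorer would.

```python
# Simplified sketch of correct letter sequence (CLS) scoring for S-CBM.
# Assumption: a CLS is an adjacent pair of letters (including word boundaries)
# in the correct order, so a correctly spelled n-letter word earns n + 1 CLS.
# This position-by-position comparison ignores insertions and omissions, which
# operational scoring handles differently.

def correct_letter_sequences(target, attempt):
    """Count correct letter sequences in a student's spelling of `target`."""
    t = f"^{target.lower()}$"      # ^ and $ mark the word boundaries
    a = f"^{attempt.lower()}$"
    return sum(
        1
        for i in range(len(t) - 1)
        if i + 1 < len(a) and t[i] == a[i] and t[i + 1] == a[i + 1]
    )

print(correct_letter_sequences("train", "train"))  # 6 (5 letters + 1)
print(correct_letter_sequences("train", "trane"))  # 3 (^t, tr, ra)
```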
Finally, it should be noted that AIMSweb R-CBM has content validity for many of the Phonics
and Word Recognition Standards. However, this consistency would be addressed through a
qualitative analysis of the specific words read aloud correctly and incorrectly.

Common Core State Standards K–5 Writing

The K–5 Writing Standards have the following goal:
To build a foundation for college and career readiness, students need to learn to use writing as a way of offering and supporting opinions, demonstrating understanding of the subjects they are studying, and conveying real and imagined experiences and events (p. 18).
Like the K–5 Reading Standards, the Writing Standards include 10 identical Anchor Standards
across grades, divided into four areas:
(1) Text Types and Purposes
(2) Production and Distribution of Writing
(3) Research to Build and Present Knowledge
(4) Range of Writing
Of these four areas, AIMSweb CBM Written Expression (WE-CBM) is most consistent with
one of the three Anchor Standards for Text Types and Purposes at Grades 1–5.
For example, at Grade 2, students are expected to:
3. Write narratives in which they recount two or more appropriately sequenced
events, include some details regarding what happened, use temporal words to
signal event order, and provide some sense of closure. (emphasis added)
At Grade 4, students are expected to:
3. Write narratives to develop real or imagined experiences or events using
effective technique, descriptive details, and clear event sequences. (emphasis added)
a. Orient the reader by establishing a situation and introducing a narrator and/or
characters; organize an event sequence that unfolds naturally.
b. Use dialogue and description to develop experiences and events or show the
responses of characters to situations.
c. Use a variety of transitional words and phrases to manage the sequence of events.
d. Use concrete words and phrases and sensory details to convey experiences
and events precisely.
e. Provide a conclusion that follows from the narrated experiences or events.

WE-CBM requires students to write short narratives for 3 minutes, given a topic story
starter. Therefore, WE-CBM is consistent with portions of the CCSS K–5 Writing Standards.
As with the Reading Standards, however, AIMSweb’s WE-CBM task is more valuable with respect to its complementary contribution to assessment and decision-making practices.
The test has been validated as a measure of general beginning writing skills (McMaster & Espin,
2007). Student writing narratives are scored by production (counting the total number of
words written, TWW) and by correct sequences of writing judged by mechanics, syntax,
and semantics (i.e., correct writing sequences, CWS). As a vital sign or indicator of general
written expression skills, it can be used as a time and cost efficient screener to enable early
intervention and as a frequent progress monitoring instrument as long as students show
writing deficits.
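To illustrate the production side of WE-CBM scoring, here is a minimal sketch. The sample narrative and numbers are hypothetical, and CWS is treated as a scorer-entered count because judging mechanics, syntax, and semantics is not reducible to a simple formula.

```python
# Sketch of WE-CBM production scoring: total words written (TWW) in a 3-minute
# narrative. Correct writing sequences (CWS) depend on a scorer's judgment of
# mechanics, syntax, and semantics, so CWS is accepted here as a count recorded
# by the scorer rather than computed automatically.

def total_words_written(narrative: str) -> int:
    """Count every word written, regardless of spelling or usage."""
    return len(narrative.split())

sample = "One day my dog ran away and I chased him down the street"
tww = total_words_written(sample)                 # 13
cws_from_scorer = 13                              # hypothetical scorer-entered value
print(f"TWW = {tww}, CWS = {cws_from_scorer}")
```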

Other K–5 Common Core State Standards

AIMSweb’s CBM tests are not as directly related to the other two K–5 CCSS sections, Speaking and Listening: Flexible Communication and Collaboration, and Language: Conventions, Effective Use, and Vocabulary. Content validity may be considered to be lower. Most of the
Speaking and Listening as well as Language standards are very specific and discrete skills that
are short-term instructional outcomes and reflect Mastery Monitoring (MM) more than the
general outcomes assessed by CBM. For example, in the area of Language Standards K–5,
Conventions of Standard English, at the end of Grade 1, students are expected to:
2. Demonstrate command of the conventions of standard English capitalization,
punctuation, and spelling when writing.
a. Capitalize dates and names of people.
b. Use end punctuation for sentences.
c. Use commas in dates and to separate single words in a series.
d. Use conventional spelling for words with common spelling patterns and for
frequently occurring irregular words.
e. Spell untaught words phonetically, drawing on phonemic awareness and
spelling conventions.
These specific outcomes are components of strong general outcomes, but they are not
CBM’s primary assessment focus. Again, it is not that AIMSweb CBM is inconsistent with
or irrelevant to these standards. These skills can be assessed qualitatively in AIMSweb’s WE-CBM narrative writing test and counted quantitatively as part of the Correct Writing Sequence (CWS) scoring system.

Common Core State Standards Grades 6–12

The Standards for English Language Arts 6–12 and Literacy in History/Social Studies, Science, and Technical Subjects 6–12 clearly represent students’ use of reading and language arts skills to navigate, understand, and use complex text. Some of the Standards are
related to AIMSweb CBM tests. But this relation is one of consequential validity (e.g., how
AIMSweb complements CCSS) rather than content validity (i.e., how consistent AIMSweb
is with CCSS).
The Reading Standards 6–12 are constructed similarly to the K–5 Reading Standards.
There are Anchor Standards and a strong emphasis on success with increasing levels of text complexity.
Additionally, the Reading Standards distinguish between reading and understanding literature
and informational text.
Similar to many of the other standards in the document, AIMSweb’s CBM tests are less
consistent with the 6–12 standards. CBM’s emphasis on assessing general basic skill
outcomes through rich tasks that are valid vital signs or indicators of more complex
constructs, however, has a critical role for complementing the CCSS summative and
accountability emphasis. That is, AIMSweb can be used to screen for those students whose
lack of basic reading skills is contributing to a failure to attain the 6–12 Reading Standards.
A different level of intensive intervention would be required for students with these severe
basic skill deficits than for students who have basic reading skills, but who are failing to
acquire specific CCSS 6–12 Standards. But equally, if not more importantly, AIMSweb’s CBM
tests serve as the best available technology to monitor progress frequently for formative
evaluation.

Summary

It is important that instructional and assessment practices align with the CCSS. For
assessment, it is important that testing practices are consistent (i.e., content valid) with
the CCSS. AIMSweb’s CBM tests are consistent, especially for their intended audience,
typically developing students acquiring basic skills. AIMSweb’s tests are especially consistent
with the K–5 Reading Standards, including but not limited to the Anchor Standards and reading
literature, Foundational Skills, especially Fluency, and K–5 Writing Anchor Standards.
But content validity, assessing the elements of specific achievement, is neither the strength nor the
primary purpose of AIMSweb. Its strengths are how it complements summative evaluation
and accountability. Good assessment supports important decision making, and the most common decisions in America’s schools are (a) Screening, the process of identifying students
at risk so that intervention can be provided as early as possible; (b) Diagnosis/Instructional
Planning, where the instructional content that needs to be taught is identified; (c) Progress
Monitoring, judging whether students are benefiting from instruction; and (d) Accountability
and Program Evaluation, a summative decision where critical decisions are made about the
effectiveness of schools, of teaching and teachers, and of the instructional programs delivered
to groups of students.
One of this white paper’s Big Ideas was that, for a variety of technical reasons, no single
test can contribute to all these decisions equally well. A summative test used to assess
attainment of the CCSS for accountability purposes is unlikely to make a good screener.
It makes little sense to screen students at the beginning of the year on outcomes they are
intended to achieve at the end of the year. One would and should expect many beginning-
of-the-school-year students to not have yet reached those grade-level standards. Screening
needs to be accurate in differentiating individual students who need additional assessment or
intervention from those students who do not. It needs to be proven to be technically sound.
Screening also needs to be time and cost efficient. Spending lots of time testing all students
with a corresponding loss of instructional time can make screening very expensive, and thus,
impractical. AIMSweb’s CBM tests can accurately and efficiently find the students who need
more intensive intervention to attain the CCSS.
Another Big Idea of this paper is that frequent progress monitoring for formative assessment
is one of the most powerful tools to support student learning and attain the CCSS.
Frequent progress monitoring for formative assessment requires that a different set of test
characteristics be considered. Like screening, progress monitoring also needs to be time and
cost efficient. In contrast to screening, where all students are screened at a single point in
time, progress may also be monitored for all students at regular, infrequent intervals
(e.g., 3–4 times per year), or for some students with more severe needs more frequently,
up to 1–2 times per week. Spending lots of time completing progress monitoring testing
with sizable numbers of students with a corresponding loss of instructional time can make
it very expensive, and thus, impractical. The bottom line for progress monitoring is finding
out which students are not benefiting from intervention and need changes in instruction.
AIMSweb is uniquely suited for use in basic skills progress monitoring, especially frequent
progress monitoring with students at risk.

References

Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.
Barton, P. E. (1999). Too much testing of the wrong kind; too little of the right kind in K–12 education (pp. 1–35). Princeton, NJ:
Educational Testing Service, Research Division.
Carmichael, S. B., Martino, G., Porter-Magee, K., Wilson, W. S., Fairchild, D., Haydel, E., . . . Winkler, A. M. (2010). The state of
State Standards–and the Common Core–in 2010. Washington, DC: Thomas B. Fordham Institute.
Carmichael, S. B., Wilson, W. S., Martino, G., Finn, C. E., Porter-Magee, K., & Winkler, A. M. (2010). Review of the Draft K–12
Common Core Standards. Washington, DC: Thomas B. Fordham Institute.
Deno, S. (1992). The nature and development of curriculum-based measurement. Preventing School Failure, 36(2), 5–10.
Deno, S. L. (1986). Formative evaluation of individual student programs: A new role for school psychologists. School
Psychology Review, 15, 358–374.
Deno, S. L., Marston, D., Shinn, M. R., & Tindal, G. (1983). Oral reading fluency: A simple datum for scaling reading disability.
Topics in Learning and Learning Disability, 2, 53–59.
Deno, S. L., Mirkin, P., & Wesson, C. (1984). How to write effective data-based IEPs. Teaching Exceptional Children, 16, 99–104.
Finn, C. E., Julian, L., & Petrilli, M. J. (2006). The State of State Standards 2006. Washington, DC: Thomas B. Fordham Institute.
Fuchs, L. S., Allinder, R., Hamlett, C. L., & Fuchs, D. (1990). Analysis of spelling curriculum and teachers’ skills in identifying
phonetic error types. Remedial and Special Education, 11, 42–53.
Fuchs, L. S., & Deno, S. L. (1991). Paradigmatic distinctions between instructionally relevant measurement models. Exceptional
Children, 57, 488–500.
Fuchs, L. S., & Fuchs, D. (1999). Monitoring student progress toward the development of reading competence: A review of
three forms of classroom-based assessment. School Psychology Review, 28, 659–671.
Fuchs, L. S., & Fuchs, D. (2008). Best practices in progress monitoring reading and mathematics at the elementary level. In
A. Thomas & J. Grimes (Eds.), Best practices in school psychology V (pp. 2,147–2,164). Bethesda, MD: National Association of
School Psychologists.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Allinder, R. M. (1991). The contribution of skills analysis to Curriculum-Based
Measurement in spelling. Exceptional Children, 57(5), 443–452.
Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading comprehension measures. Remedial and Special
Education, 9, 20–28.
Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. New York, NY: Routledge.
Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM
and high stakes testing. School Psychology Review, 34, 372–386.
Howe, K. B., & Shinn, M. M. (2002). Standard Reading Assessment Passages (RAPs) for use in General Outcome Measurement:
A manual describing development and technical features. Eden Prairie, MN: Edformation, Inc.
McMaster, K., & Espin, C. (2007). Technical features of curriculum-based measurement in writing. The Journal of Special
Education, 41, 68–84.
Messick, S. (1986). The once and future issues of validity: Assessing the meaning and consequences of measurement. Princeton, NJ: Educational Testing Service.
National Governors Association Center for Best Practices, & Council of Chief State School Officers. (2012). Common Core
State Standards Initiative, from http://www.corestandards.org
National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on
reading and its implications for reading instruction. Washington, DC: National Institute of Child Health and Human Development,
National Institute for Literacy, U. S. Department of Education.
Shinn, M. R. (1989). Identifying and defining academic problems: CBM screening and eligibility procedures. In M. R. Shinn (Ed.),
Curriculum-based measurement: Assessing special children (pp. 90–129). New York: Guilford.
Shinn, M. R. (2002). Best practices in Curriculum-Based Measurement and its use in a Problem-Solving model. In A. Thomas
& J. Grimes (Eds.), Best practices in school psychology IV (pp. 671–698). Bethesda, MD: National Association of School
Psychologists.
Shinn, M. R. (2007). Identifying students at risk, monitoring performance, and determining eligibility within RTI: Research on
educational need and benefit from academic intervention. School Psychology Review, 36, 601–617.
Shinn, M. R. (2008). Best practices in Curriculum-Based Measurement and its use in a Problem-Solving model. In A.
Thomas & J. Grimes (Eds.), Best practices in school psychology V (pp. 243–262). Bethesda, MD: National Association of School
Psychologists.
Shinn, M. R. (2010). Building a scientifically based data system for progress monitoring and universal screening across three
tiers including RTI using Curriculum-Based Measurement. In M. R. Shinn & H. M. Walker (Eds.), Interventions for achievement
and behavior problems in a three-tier model, including RTI (pp. 259–293). Bethesda, MD: National Association of School
Psychologists.
Shinn, M. R. (2012). Measuring general outcomes: A critical component in scientific and practical progress monitoring practices.
Minneapolis, MN: Pearson Assessment.
Silberglitt, B. (2007). Using AIMSweb Early Literacy Measures to predict successful first-grade readers. Eden Prairie, MN:
Edformation, Inc.
Silberglitt, B., & Hintze, J. M. (2005). Formative assessment using CBM-R cut score to track progress toward success on state-
mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304–325.
Stage, S. A., & Jacobsen, M. D. (2001). Predicting student success on a state-mandated performance-based assessment using
oral reading fluency. School Psychology Review, 30, 407–419.
Yeh, S. S. (2007). The cost effectiveness of five policies for improving achievement. American Journal of Evaluation, 28, 416–436.

Mathematics Computation Administration and Technical Manual

www.aimsweb.com • 888-944-1882
Pearson Corporate Executive Office 5601 Green Valley Drive Bloomington, MN 55427
800.627.7271 www.PsychCorp.com

Copyright © 2010 NCS Pearson, Inc. All rights reserved.

Warning: No part of this publication may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopy, recording, or any information storage and retrieval system, without
permission in writing from the copyright owner.

Pearson, the PSI logo, PsychCorp, and AIMSweb are trademarks in the U.S. and/or other countries of
Pearson Education, Inc., or its affiliate(s).

Portions of this work were previously published.

Printed in the United States of America.


Table of Contents

Section 1
Introduction .......... 1
Welcome to AIMSweb® .......... 1
Mathematics Computation (M–COMP) .......... 1
Changes From the Previous Edition .......... 2
Test Security and Copyright Restrictions .......... 2
User Qualifications .......... 3
Contact Information .......... 3
Accessing More AIMSweb Material .......... 3

Section 2
Using AIMSweb® Mathematics Computation (M–COMP)�������������������������������������������������������������� 5
Benchmarking and Screening ������������������������������������������������������������������������������������������������ 6
Progress Monitoring����������������������������������������������������������������������������������������������������������������� 8
Goal Setting ����������������������������������������������������������������������������������������������������������������������������� 9

Section 3
Guidelines for Administration, Scoring, and Reporting���������������������������������������������������������������� 11
General Testing Considerations���������������������������������������������������������������������������������������������� 11
Testing Students With Special Accommodations����������������������������������������������������������������� 11
Conducting a Survey-Level Assessment or Off-Level Testing ����������������������������������������������� 12
Administering the Probes������������������������������������������������������������������������������������������������������� 13
Administration Directions �������������������������������������������������������������������������������������������������� 14
Scoring Guidelines ����������������������������������������������������������������������������������������������������������������� 15
How to Score the M–COMP������������������������������������������������������������������������������������������������� 16
M–COMP Scoring Examples ����������������������������������������������������������������������������������������������� 17
Reporting �������������������������������������������������������������������������������������������������������������������������������� 26
Generating Student Reports������������������������������������������������������������������������������������������������� 26

Copyright © 2010 NCS Pearson, Inc. All rights reserved. iii


AIMSWeb® Administration and Technical Manual

Section 4
Developing and Standardizing the Test ������������������������������������������������������������������������������������� 29
Creating the New Blueprint���������������������������������������������������������������������������������������������������� 29
Item Development ������������������������������������������������������������������������������������������������������������� 29
Item Review������������������������������������������������������������������������������������������������������������������������ 30
Test Construction���������������������������������������������������������������������������������������������������������������� 30
Major Research Stages������������������������������������������������������������������������������������������������������������� 31
Pilot Studies ����������������������������������������������������������������������������������������������������������������������� 31
National Field Test �������������������������������������������������������������������������������������������������������������� 31
Finalizing and Selecting Probes������������������������������������������������������������������������������������������� 32

Appendix A Technical Adequacy and Data Tables ����������������������������������������������������������������������� 33


A.1 Demographic Characteristics of the Sample by Grade and Geographic Region������������������� 34
A.2 Demographic Characteristics of the Sample by Grade and Community Type �������������������� 34
A.3 Demographic Characteristics of the Sample by Grade and Sex������������������������������������������� 35
A.4 Demographic Characteristics of the Sample by Grade and Race/Ethnicity������������������������� 35
A.5 Demographic Characteristics of the Sample by Grade and Median Family Income ����������� 36
A.6 Descriptive and Reliability Statistics by Grade ������������������������������������������������������������������� 36
A.7 C
 orrelations of AIMSweb M–COMP Scores With Group Mathematics
Assessment and Diagnostic Evaluation (G∙MADE) Scores by Grade ����������������������������������� 37

Glossary����������������������������������������������������������������������������������������������������������������������������������� 39

References ������������������������������������������������������������������������������������������������������������������������������� 41

Figures
2.1 Benchmark/Progress Monitor Flow Chart ��������������������������������������������������������������������������� 5
2.2 M–COMP Individual Student Report����������������������������������������������������������������������������������� 7
2.3 M–COMP Progress Monitoring Improvement Report ��������������������������������������������������������� 8
3.1 Scored Answer Key������������������������������������������������������������������������������������������������������������� 16
3.2 Correct Answer Not on the Answer Key ���������������������������������������������������������������������������� 17
3.3 Range of Acceptable Answers �������������������������������������������������������������������������������������������� 18
3.4 Overcorrection Into Error �������������������������������������������������������������������������������������������������� 19
3.5 Targeted Answer in Directions ������������������������������������������������������������������������������������������� 20
3.6 Mixed Number for Grade 8, Item 22���������������������������������������������������������������������������������� 20
3.7 Answers to Decimal Problems ������������������������������������������������������������������������������������������� 21
3.8 Written Answers ���������������������������������������������������������������������������������������������������������������� 22
3.9 Crossed-Out Answer ���������������������������������������������������������������������������������������������������������� 23

iv Copyright © 2010 NCS Pearson, Inc. All rights reserved.


Table of Contents

3.10 Difficult-to-Read Response����������������������������������������������������������������������������������������������� 24


3.11 Reversed Numbers With Intended Number Obvious ������������������������������������������������������� 25
3.12 Rotated Numbers With Intended Number Indeterminable ���������������������������������������������� 25

Tables
3.1 Total Point Value by Grade������������������������������������������������������������������������������������������������� 15
4.1 Domains Evaluated With M–COMP by Grade ������������������������������������������������������������������� 29
4.2 National Field Testing Item and Probe Count by Grade ����������������������������������������������������� 32

Copyright © 2010 NCS Pearson, Inc. All rights reserved. v


Section 1: Introduction

Welcome to AIMSweb®

AIMSweb® is an assessment, data organization, and reporting system that provides the framework
and data necessary for response to intervention (RTI) and multitiered instruction. Designed
specifically to benchmark and monitor progress, AIMSweb uses Curriculum-Based Measurement
(CBM) practices: brief, reliable, and valid measures of basic reading skills, language arts, and
mathematics. These standardized tests are based on general outcome measurement principles
so they can be efficiently and accurately used to evaluate student progress relative to a year-end
target, regardless of curriculum or intervention.

Schools use AIMSweb benchmark assessments to universally screen all students three times a
year in order to identify at-risk students early. This enables educators to match students to
multitiered interventions according to each student's instructional need—before he or she has a
chance to fail. For students receiving intervention, schools use AIMSweb to monitor progress
frequently, to ensure those students are improving at the rate necessary to meet their goals.

AIMSweb is designed to generalize across curricula and is used to assess skills taught across
a range of subject matter. As general outcome measures, AIMSweb tests serve as indicators of
performance (e.g., general mathematics skill), as opposed to mastery tests that report performance
level on specific skills or concepts (e.g., division of decimals) or diagnostic assessments that
analyze performance in depth.

AIMSweb assessment information can be collected through web-based data capture (for some
measures) or by individually entering a student’s scores (for all measures). Reports tied to robust
graphs and tables are generated instantly, can be interpreted by norms-based or standards-based
comparisons, and can be viewed anytime the user is connected to the Internet. Individual student
data and reports are transportable across time, across schools and communities, and across the
kinds of decisions educators typically make about student performance.

Mathematics Computation (M–COMP)


AIMSweb Mathematics Computation (M–COMP) is a series of assessments that yield general math
computation performance and rate of progress information. M–COMP includes three probes for
benchmarking and 30 probes for progress monitoring for each grade, 1 through 8. M–COMP is
a timed, 8-minute, open-ended, paper-based test that can be group administered or individually
administered.


M–COMP is a revision of the AIMSweb Mathematics–CBM (M–CBM) and Mathematics–CBM2
(M–CBM2), which will be retired in fall of 2011. Features of M–COMP include:

• Enhanced content to increase depth of information and to align more closely with National
Council of Teachers of Mathematics (NCTM) standards,

• Updated and streamlined scoring rules,

• Clearer, more student-friendly format, and

• Scores equated to M–CBM and M–CBM2 to support longitudinal data.

Changes From the Previous Edition


Feedback from M–CBM and M–CBM2 users was considered when designing this revision.
Users noted concerns about (1) the length of time and difficulty of scoring, (2) students skipping
around to easier items and skewing performance data, and (3) a test format that impeded student
performance and the scoring process. These three major concerns were taken into account and
addressed during the conceptual development of M–COMP.

The concern regarding the time-intensive and potentially subjective nature of counting every
numeral in Correct Digit scoring and Critical Process scoring had already been considered when
the team developed AIMSweb Mathematics Concepts and Applications (M–CAP, 2009), and a
more streamlined scoring system was developed for that test. Because user reaction to the M–CAP
scoring has been so favorable, the AIMSweb content team evaluated its efficacy for use in the
development of M–COMP.

Based on user feedback for M–CBM and M–CBM2 and the positive user reaction to M–CAP,
the AIMSweb content team re-evaluated the existing M–CBM and M–CBM2 scoring process
against the weighted-scoring system used in M–CAP. The data indicated that, when applied to
M–COMP, the weighted system minimized scoring time, maximized sensitivity to growth, controlled
for students who skip to the easiest items, and ensured the psychometric soundness of the process.
Applying this system therefore addressed the first two concerns noted by M–CBM and
M–CBM2 users.

The third concern noted was that the M–CBM and M–CBM2 test format impeded the students’
ability to take the test easily and the teachers’ ability to score the test quickly. To address these
concerns, the format was revised by placing boxes around each item, numbering the items, and
increasing the space between items.

Test Security and Copyright Restrictions


Educators who use the M–COMP are responsible for ensuring that the test materials, including the
probes, remain secure. Individual or collective test items may not be copied or disclosed to anyone
but the appropriate school personnel; to do so would compromise the validity of the M–COMP
measure and reduce its value as a measurement tool. Under no circumstances may test materials
be resold or displayed in locations where unqualified individuals can purchase or view partial or
complete portions of the M–COMP. This includes personal websites and Internet auction sites.


All test items, data, and materials are copyrighted. With the exception of usage as described in this
Manual, the Pearson Education, Inc., Rights and Permissions Department must approve in writing
(hard copy or electronic communication) the copying or reproduction of any test materials. The
only other exception to this requirement is the copying of a completed M–COMP probe(s) to
convey a student’s records to a qualified educator.

User Qualifications
The Mathematics Computation (M–COMP) is designed to be administered by general and special
education teachers, but may be used by school psychologists and other education specialists as
well. Users must adhere to the standardized administration and scoring rules to obtain accurate
results and properly interpret those results. Any deviation from the standardized administration
and scoring procedures may invalidate results and the resulting scores cannot be reported or
compared to the normative data. Before administering an M–COMP probe, please read Section 3
of this manual in its entirety. Section 3 also provides further guidance in using M–COMP with
populations that require special accommodations.

Contact Information
If you have any questions, comments, or suggestions about the AIMSweb M–COMP or concerns
about the policies and procedures presented in this Manual, please contact:

AIMSweb Customer Relations


Pearson
19500 Bulverde Road
San Antonio, TX 78259
Phone: 866-313-6194
Fax: 866-313-6197
Email: [email protected]

Accessing More AIMSweb Material


The AIMSweb documents and manuals referenced within this document can be found by logging
in to your AIMSweb account, selecting the yellow Downloads tab at the top of the page, and
scrolling down until you find the document you are looking for. Clicking on the document title will
bring up a printable Portable Document Format (PDF) file of that document.



Section 2: Using AIMSweb® Mathematics Computation (M–COMP)

Benchmarking students' performance three times a year yields distinct data points, enabling you
to determine whether students are on track, are struggling and may benefit from intervention, or
are outperforming their peers. Progress monitoring for students identified during benchmarking as
likely to benefit from some level of intervention enables you to determine whether or not those
students are benefiting, by measuring their rate of improvement.

Using the AIMSweb reporting system, a desirable rate of improvement (ROI) can be determined
and progress monitoring decisions made based on achieving that goal. The ROI is represented by
a trend line, or slope, which indicates the average weekly improvement. If the trend line (i.e., the
actual ROI) meets or exceeds the aim line (i.e., the expected ROI), you can then be confident that
the student is benefitting from the intervention, and that you should continue with the program.
If the trend line is not meeting or exceeding the aim line, then you should consider changing the
intervention approach (e.g., same program at higher intensity or a different program). See Figure 2.1.

[Figure 2.1: Flow chart tracing the decision path from universal benchmark/screening in the fall.
Students with good/acceptable performance are simply screened again in winter and spring; students
with poor/borderline performance receive an intervention plan and a progress monitoring schedule.
If the ROI indicates the student is on track to meet end-of-year goals, the intervention continues;
if the ROI indicates a lack of satisfactory progress, the intervention level or program is adjusted
and progress monitoring is repeated until the student is on track and the end-of-year goal is met.]

Figure 2.1 Benchmark/Progress Monitor Flow Chart


You or another education specialist may determine the frequency of progress monitoring based
on the specific needs of a particular child. Progress monitoring can be as frequent as once a week,
once a month, or several times between Benchmark periods. For students who are receiving Tier
2 and Tier 3 interventions, the goal material typically is their grade-level material; that is, grade
4 students would be progress monitored using grade 4 M–COMP probes. However, for students
who have IEPs with significant mathematics computation discrepancies, goal material may be
lower than their current grade placement. In these instances, you should conduct a Survey Level
Assessment (SLA) to determine a student’s present level of performance. Teams then write an
individualized goal to represent the grade-level proficiency to be achieved by the IEP annual
review date, which may include progress monitoring with off-grade level probes. See the sections
“Benchmarking and Screening” and “Progress Monitoring” that appear later in this section.

Benchmarking and Screening

AIMSweb M–COMP has three Benchmark probes per grade to be administered to all students
during the standard school year:

• Fall (September 1–October 15), which may also be used for purposes of screening

• Winter (January 1–January 31), for progress monitoring and program evaluation

• Spring (May 1–May 31), for progress monitoring and program evaluation

The purpose of benchmarking is to ensure that all students are assessed after a similar exposure
to the school curriculum. Although benchmarking periods range from 4 to 6 weeks, the
process should be completed within 2 weeks after a school begins the benchmarking process.
(Most schools complete benchmark testing within 2 days.) This limits the effects of additional
instruction for some students. If your school district’s schedule differs from the standard schedule,
adjust your benchmarking periods to reflect the level of instruction consistent with that of the
standard school year and the suggested Benchmark testing dates.

You can use the initial M–COMP Benchmark probe as a screening tool to make RTI decisions, and
then compare the results to normative- or standards-based data. Using the normative-based data,
an individual student report that presents the range of average M–COMP student performance
(i.e., scores between the 25th and 74th percentiles) can be generated that shows a student’s
current M–COMP performance (the number of points earned on a particular probe). The student’s
performance can be judged using percentile ranks and the raw scores relative to the normative
group, and can be used to make screening decisions.

Figure 2.2 presents a sample student's M–COMP performance relative to a school norm, shown
on the standard AIMSweb "box and whisker plot" (note that any individual student's ROI can also
be compared to a normative group).


[Figure 2.2: M–COMP Individual Student Report (sample). A box-and-whisker chart plots Student
One's Grade 3 M–COMP points at the fall, winter, and spring benchmarks against the Elementary
School norms (above average, average, and below average bands, plus the target). The benchmark
comparison table reports scores of 20.0 (fall), 29.0 (winter), and 45.0 (spring), a rate of
improvement of 0.7 points per week from the fall benchmark, a spring score at the 64th percentile
(Average) compared to Elementary School spring percentiles, and the instructional recommendation
"Continue Current Program."]

Figure 2.2 M–COMP Individual Student Report

This report reflects the student's M–COMP performance level over time: the number of points
earned at each benchmark, and the change from fall to winter (or from fall to winter to spring),
which represents this student's ROI. The line extending from the top of the box represents the
range of above average M–COMP student performance (i.e., scores at the 75th percentile and above).
Scores above this top line represent scores in the upper 10% of students in the comparison group.
The line extending from the bottom of the box represents the range of below average M–COMP
student performance (i.e., scores between the 10th and 24th percentiles). Often, scores at this
level are used to identify potential candidates for tiered interventions (e.g., Tier 2). Scores
below this bottom line represent scores in the lower 10% of students in the comparison group.
Typically, scores at this level are used to identify potential candidates for the most intensive
of tiered interventions (e.g., Tier 3).
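As a rough illustration of how these percentile bands can drive screening decisions, the short
sketch below maps a percentile rank to a candidate level of support. It is a hypothetical example
only: the cut points follow the bands described above (below the 10th percentile, 10th–24th,
25th–74th, and 75th or above), and the function is not part of the AIMSweb system.

    # Hypothetical screening sketch (not an AIMSweb function): map a student's
    # percentile rank on a benchmark probe to a candidate level of support,
    # using the percentile bands described in the text above.

    def candidate_support_level(percentile_rank: float) -> str:
        if percentile_rank < 10:
            return "candidate for the most intensive tiered intervention (e.g., Tier 3)"
        if percentile_rank < 25:
            return "candidate for tiered intervention (e.g., Tier 2)"
        if percentile_rank < 75:
            return "average range"
        return "above average"

    for rank in (4, 18, 55, 92):  # example percentile ranks
        print(rank, "->", candidate_support_level(rank))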

With a standards-based approach to interpretation, educators can use a student's M–COMP
Benchmark scores to broadly predict performance on a high-stakes test (e.g., a state-required
achievement test) by identifying those students who are most likely to pass and those who are
most likely not to pass your state test. More importantly, however, M–COMP benchmarking better
enables you to identify students who fall between the extremes of the performance range, so that
students who might otherwise slip through the cracks can get the intervention they need to better
equip them for success in class, as well as on state tests.


For more detail on how to use M–COMP in a benchmark assessment approach for screening
all students, see the AIMSweb Training Workbook Organizing and Implementing a Benchmark
Assessment Program (Shinn, 2002a). This and other manuals can be found on the Downloads tab
within your AIMSweb account.

Progress Monitoring

M–COMP provides 30 Progress Monitoring probes for each grade (Grades 1–8). Like the three
Benchmark probes, the Progress Monitoring probes have been standardized to be equivalent in
difficulty. When benchmarking and progress monitoring are used together, you can be confident
that improvement or lack of improvement in a student’s performance has been accurately tracked.

When a student is identified as potentially at risk and requiring mathematics intervention, you
and other qualified education professionals may meet to discuss appropriate end-of-year goals
for that student, and determine the required ROI to take the student from his or her current
performance level to the desired end-of-year performance level. Figure 2.3 presents a frequent
progress monitoring graph that shows an example of how a student’s rate of improvement has
been tracked with progress monitoring.

[Figure 2.3: M–COMP Progress Monitoring Improvement Report (sample). The graph plots Student
One's weekly Grade 3 M–COMP scores from 9/15/2009 to 12/15/2009 against the aim line and the
resulting trend line.]

Goal Statement: In 13.0 weeks, Student One will achieve 45 points from Grade 3 Mathematics
Computation probes. The rate of improvement should be 1.92 points per week. The current average
rate of improvement is 2.14 points per week.

Date:   09/15 09/22 09/29 10/06 10/13 10/20 10/27 11/03 11/10 11/17 11/24 12/01 12/08 12/15
Points: 20    22    18    25    30    28    33    35    33    38    42    40    45    46
Goal/Trend ROI: 1.92/2.14

Figure 2.3 M–COMP Progress Monitoring Improvement Report
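To show how the goal (aim-line) ROI and the trend ROI in a report like this relate to the underlying
scores, here is a minimal sketch using the data listed above. The ordinary least-squares trend is an
assumed method used only for illustration; AIMSweb computes the trend line for you, and its exact
calculation may differ slightly from this sketch.

    # Illustrative sketch only: compute an expected ROI (aim-line slope) and an
    # actual ROI (least-squares trend slope) from weekly progress monitoring scores.
    # The scores mirror the Figure 2.3 example; the trend method is an assumption.

    def least_squares_slope(xs, ys):
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        den = sum((x - mean_x) ** 2 for x in xs)
        return num / den

    weeks = list(range(14))  # one probe per week for 14 weeks
    points = [20, 22, 18, 25, 30, 28, 33, 35, 33, 38, 42, 40, 45, 46]

    baseline, goal, goal_weeks = points[0], 45, 13.0
    aim_roi = (goal - baseline) / goal_weeks        # expected ROI: (45 - 20) / 13 = 1.92
    trend_roi = least_squares_slope(weeks, points)  # actual ROI (about 2.1 points per week here)

    if trend_roi >= aim_roi:
        print(f"Trend ROI {trend_roi:.2f} meets or exceeds aim ROI {aim_roi:.2f}: continue the program.")
    else:
        print(f"Trend ROI {trend_roi:.2f} is below aim ROI {aim_roi:.2f}: consider adjusting the intervention.")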


In progress monitoring, you use a single, different M–COMP probe each time. Progress monitoring
frequency should be consistent with the intensity level of intervention for that student. You may
want to monitor the progress of a student who is in the Tier 2 intervention range once or twice a
month. For a student in the more intensive Tier 3 range, you may want to monitor progress once
or twice a week.

Goal Setting

To get the most value from progress monitoring, it is important to set meaningful goals. The
components of the goal are (1) an established time frame, (2) the level of performance expected,
and (3) the criterion for success. Typical time frames include the duration of the intervention or
the end of the school year. An annual time frame is typically used when IEP goals are written for
students who are receiving special education.

M–COMP goals are written in an individualized yet standard format such as the example below:

In 34 weeks (1 academic year), Betsy will write correct answers to computation problems,
earning 40 points on grade 5 M–COMP probes.

You can establish the criterion for success according to standards, local norms, national norms,
or a normative rate of improvement. For example, the team may want to compare a student's
performance to district/local norms; that is, to compare the student's scores to those of his or her
peers in the context of daily learning. The last type of criterion is a normative rate of improvement
(ROI): an average rate of weekly improvement obtained from a normative database is multiplied by
the time frame to determine the criterion for success.
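A minimal sketch of the normative-ROI arithmetic described above, together with the reverse
calculation of the ROI a student needs to reach a chosen criterion. The baseline score and the
normative weekly ROI below are hypothetical values chosen only to show the computation; the
34-week time frame and the 40-point criterion come from the Betsy example above.

    # Hypothetical goal-setting sketch.
    # Normative-ROI criterion: baseline + (normative weekly ROI x weeks in the time frame).
    baseline_points = 15          # hypothetical current score on grade 5 M-COMP probes
    normative_weekly_roi = 0.7    # hypothetical average weekly gain from a normative database
    weeks = 34                    # time frame (1 academic year), as in the Betsy example

    criterion = baseline_points + normative_weekly_roi * weeks
    print(f"Criterion for success: about {criterion:.0f} points after {weeks} weeks")

    # The reverse calculation: the ROI needed to reach a chosen criterion.
    goal_points = 40              # criterion from the Betsy example
    needed_roi = (goal_points - baseline_points) / weeks
    print(f"Required ROI: {needed_roi:.2f} points per week")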

National norms for the M–COMP will be available in the fall of 2011. For detailed information
and direction for setting goals, see Progress Monitoring Strategies for Writing Individual Goals in
General Curriculum and More Frequent Formative Evaluation (Shinn, 2002b).



Section 3: Guidelines for Administration, Scoring, and Reporting

General Testing Considerations

M–COMP is a standardized test; that is, it is an assessment with directions, time limits, materials,
and scoring procedures designed to remain consistent each time the test is given, to ensure
comparability of scores. To make valid normative decisions (national, state, district, or school),
you must follow all directions carefully. Changes in the presentation of the probes, alterations of
the instructions given to students, or the inappropriate use of probes as teaching instruments may
invalidate any decisions or conclusions about student performance. M–COMP can be administered
to whole classes, small groups (3–4 students), or individually. Regardless of the setting, always
carefully monitor student participation and effort.

Before giving M–COMP, familiarize yourself with the procedures for administering, timing, and
scoring. This will enable you to pay maximum attention to the students during testing. Briefly
introduce the tasks in grade-appropriate language. Tell the students that some of the tasks may
be easy, while others may be more difficult, and that they are not expected to answer all the
questions correctly. Finally, let the students know that they may not be able to complete the probe
in the time allowed.

Students may use scratch paper, but the use of calculators, slide rules, or any other assistive
devices (except where dictated by a student's IEP) is prohibited. If there is material in the
classroom that may help the students (such as posters of mathematical functions), please remove
the materials from the students' view. Make sure all cell phones, beepers, watch alarms, and other
unused electronics are turned off.

Testing Students With Special Accommodations


Some examples of special accommodations include (a) increasing the amount of test-taking
time for a particular student, (b) having a student practice the test beforehand, or (c) providing
feedback during the testing process to a student about whether an answer is correct or incorrect.
These accommodations are changes in the way the test was standardized, and should not be allowed. Like
all standardized tests, using M–COMP probes with some students may be inappropriate because
the demands of the test do not match the capabilities of a specific student. For example, because
M–COMP requires pencil-paper test taking skills, students with severe motor problems may not
be appropriate candidates for M–COMP use. Although the stratified sample includes students
with disabilities, those students were administered the test in the standardized manner, with no
special accommodations. For students with mild visual impairments, text enlargement may be an
appropriate accommodation as it does not invalidate the standardized procedures.


Due to the standardized nature of M–COMP and the reporting requirements of benchmarking,
special accommodations cannot be made during the three Benchmark periods (Fall, Winter,
Spring); however, when you are monitoring progress frequently you have more latitude for
allowing specialized accommodations, as long as you are comparing the student’s scores to his
or her own scores only (i.e., individually referenced instead of norm-referenced) over a period
of time. For example, during progress monitoring, it may be appropriate to increase test-time
for a student with some motor impairment (e.g., from 8 to 12 minutes), providing this increase
is kept standard throughout the progress monitoring process. Please note, any comparison of
those progress monitor scores to normative scores is inappropriate and you cannot base your
interpretation on such a comparison.

Conducting a Survey-Level Assessment or Off-Level Testing


Benchmark probes must always be administered at grade level; however, in some circumstances in
progress monitoring, it may be appropriate to administer the M–COMP probes from a grade other
than the student’s actual grade level in addition to the student’s grade-level probes. This process
is called survey-level assessment (SLA) or off-level testing. Briefly, an individual student is tested
on M–COMP probes from his or her grade-level and then at consecutively lower-grade M–COMP
probes until a normative score is obtained that reflects the student’s current level of performance.
The normative score is that score where the tested student performs on the probe as well as a
typical student at that lower grade. For example, a Grade 5 student who performs well below
average (e.g., < 10th percentile) on a Grade 5 M–COMP Benchmark probe would be tested using
M–COMP Benchmark probes from successively lower grade levels (e.g., Grade 4, Grade 3) until the
student’s normative score is in the average range for that lower grade level (i.e., between the 25th
and 74th percentile).
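The survey-level assessment loop just described can be summarized in a short sketch. The
administer_probe and percentile_rank functions are placeholders standing in for giving an off-level
M–COMP probe and looking the score up in that grade's norms; they are hypothetical, not AIMSweb
functions. The stopping rule is the one stated above: drop one grade at a time until the student's
score falls within the 25th to 74th percentile range for that lower grade.

    # Hypothetical survey-level assessment (SLA) sketch. administer_probe(grade) and
    # percentile_rank(score, grade) are placeholders, not AIMSweb functions.

    def survey_level_assessment(student_grade, administer_probe, percentile_rank):
        """Test downward one grade at a time until the score falls in the average
        range (25th-74th percentile) for that grade; return (grade, score)."""
        score = None
        for grade in range(student_grade, 0, -1):
            score = administer_probe(grade)
            if 25 <= percentile_rank(score, grade) < 75:
                return grade, score
        return 1, score  # no grade produced an average-range score; stop at grade 1

In practice, the grade and score returned by this kind of survey are used to write the individualized
goal for frequent progress monitoring, as described next.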

This is done first and foremost as the process for writing individualized goals for frequent progress
monitoring (Shinn, 2002b). Off-level testing can provide supplemental information about the
severity of an achievement-performance discrepancy. A Grade 5 student whose score is within
the average range of Grade 2 students on a Grade 2 M–COMP Benchmark probe has a larger
performance discrepancy and a more severe mathematics achievement problem than a Grade 5
student whose performance is average when compared to an average range of Grade 4 students on
a Grade 4 M–COMP Benchmark.

For more information on conducting off-level testing, see the AIMSweb manual Progress Monitoring
Strategies for Writing Individualized Goals in General Curriculum and More Frequent Formative
Evaluation (Shinn, 2002b).


Administering the Probes

Setting the students at ease before testing begins is essential. Remember the age of the students
you are testing when you present the instructions and respond to questions. The instructions are
carefully worded with simple language.

When using M–COMP, keep the following in mind:

• M–COMP is not a teaching tool. You cannot give M–COMP probes as “practice” tests or
worksheets in order to teach the material or prepare students for testing.

• M–COMP is a timed (8 minute for all grades), standardized test: Changes in the presentation of
probes, use of probes (e.g., as a teaching aid), or administration time (e.g., from 8 to 10 minutes
or 8 to 5 minutes) of probes violate the standardized procedure and invalidate the students’
scores on that probe.

• Encourage students to attempt each item before moving on to the next item and discourage
skipping items. If you see that a student is skipping ahead without attempting each item,
redirect the student to try each problem before moving to the next item.

• If a student asks a question or requests clarification, redirect him or her to any direction
provided on the probe and encourage the student to do his or her best.

To administer the probes, you will need:

• The standardized administration directions found in this manual.

• A copy of the M–COMP probe.

Note. When you print a probe, the Answer Key is included. Remove the Answer Key before
replication and retain for your use in scoring.

• A stopwatch or other accurate timer to monitor administration time.

• Sufficient sharpened pencils.


Administration Directions
Read the instructions to the students verbatim. Instructions to you are in regular font. Do not read
them to the students. The instructions you read aloud to the students are in bold print.

Say to the students:

We are going to take an 8-minute math test.

Read the problems carefully and work each problem in the order presented, starting at the first
problem on the page and working across the page from left to right. Do not skip around.

If you do not understand how to do a problem, mark it with an X and move on. Once you have tried
all of the problems in order, you may go back to the beginning of the worksheet and try to complete
the problems you marked.

Although you may show your work and use scratch paper if that is helpful for you in working the
problems, you may not use calculators or any other aids.

Keep working until you have completed all of the problems or I tell you to stop.

Do you have any questions?

Answer any questions the students may have, then hand the students their probes, and say:

Here are your tests.

Put your name, your teacher’s name, and the date on each page in the space provided, then turn
over the test.

Do not turn the test back over or start working until I tell you to begin.

Allow students time to write their information on the probe, then say:

Begin.

If a student asks a question or requests clarification, redirect him or her to the probe and say:

Read the directions again, and work the problem as best you can.

If you still do not understand the problem or are unable to work it, you may move on to the next
question.

If you see that a student is skipping ahead without attempting each item, provide the following
direction:

Try to work each problem. Do not skip around.

When the 8 minutes have elapsed, say:

Stop and put down your pencil.

If a student(s) continues to work, restate:

Stop working now and put down your pencil.

At this time, collect the probe(s) and proceed to scoring.


Scoring Guidelines

M–COMP uses the same streamlined scoring system used with the AIMSweb Mathematics
Concepts and Applications (M–CAP), released in fall 2009. Rather than scoring based on correct
digits and partial credit, as in the M–CBM and M–CBM2, M–COMP scoring assigns a point value
based on difficulty of 1, 2, or 3 to each item. Within each grade, the point value for a given item
remains the same (i.e., if the first item is valued at 1 point on the Fall benchmark, it is valued at
1 point for every other benchmark and progress monitoring probe for that grade). This method
minimizes scoring time, maximizes sensitivity to growth, controls for students who skip to the
“easiest” items, and ensures the psychometric soundness of the process. Although the total points
available vary modestly across grades, within a grade each probe has the exact same total point
value. Table 3.1 presents the total point value per probe, per grade.

Table 3.1 Total Point Value by Grade


Grade Maximum Points
1 48
2 50
3 68
4 73
5 76
6 74
7 70
8 80


How to Score the M–COMP


Each probe file includes an Answer Key. The answers provided on the Keys are the target answers
for each item on the probe, along with the point value of that answer. Scoring is a straightforward
process: Circle the point value if the student’s answer is correct, or circle zero if the answer is
incorrect. You then simply add up the value of the correct answers to obtain the total score for
the probe. Figure 3.1 presents an example of a scored Answer Key for Grade 4.

[Figure 3.1: Scored Answer Key for Grade 4, Probe 1. The key lists the target answer and point
value (1, 2, or 3) for each of the 38 items; the scorer circles the point value for each correct
answer and zero for each incorrect answer. In this example the two subtotals (20 and 13) sum to a
total score of 33.]

Figure 3.1 Scored Answer Key
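As a sketch of the arithmetic behind a scored key like the one in Figure 3.1 (the printed Answer Key
and the AIMSweb reports handle this for you), the total is simply the sum of the point values of the
items answered correctly. The item values and responses below are hypothetical.

    # Hypothetical scoring sketch: each item has a fixed point value (1, 2, or 3);
    # the probe score is the sum of the values of the items answered correctly.

    item_values = {1: 1, 2: 2, 3: 1, 4: 2, 5: 3}                    # hypothetical point values
    item_correct = {1: True, 2: True, 3: False, 4: True, 5: False}  # hypothetical scoring

    total = sum(item_values[i] for i, ok in item_correct.items() if ok)
    print("Probe score:", total)  # 1 + 2 + 2 = 5 points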


M–COMP Scoring Examples


The biggest challenge in scoring is determining what to do if an answer deviates from the one
provided on the Answer Key, but may still be correct. The criteria used to decide when alternate
answers are or are not acceptable are based on best practices and professional judgment. The
primary goal is to determine if the answer reflects an understanding of the task presented. Although
the provided Answer Keys present some alternate acceptable answers, the keys are not exhaustive.
If a student's answer is correct, then score it as correct, regardless of whether or not the answer
is in the key. Figure 3.2 shows an example of a student who has presented an answer as a decimal:
on a "write the fraction in lowest terms" item (5/10), the student wrote .50 rather than the keyed
fraction 1/2.

[Figure 3.2: A portion of the Grade 5, Probe 3 Answer Key, against which the student's decimal
response to the fraction item is scored.]

Figure 3.2 Correct Answer Not on the Answer Key




A number of items on the M–COMP probes (Grades 4–8) result in responses that can be reduced
to a simpler form. If the instructions do not specifically require that the student write the answer
in the lowest terms, you may receive different correct answers. For these items, there may be
a range of acceptable responses provided on the Answer Key. You will also find items, such as
division items, where the correct answer can be presented with a remainder. Depending on your
curriculum, it may be appropriate for the student to present this remainder as a decimal, fraction,
or with an "r" followed by the remainder. Figure 3.3 shows an example of each of these item
types, found in Grade 7.

Note. “Lowest terms” can be used interchangeably with “reduce,” “simplify,” and similar terms,
depending on which term is preferred in your school’s curriculum. If your school uses the term
“reduce,” tell students that when they see the instruction “lowest terms,” it means to reduce.

[Figure 3.3: Two Grade 7 items illustrating a range of acceptable answers. For the division item
136 ÷ 20 (Item 39, 3 points), the key accepts 6 r16, 6.8, 6 16/20, 6 8/10, or 6 4/5; for the
fraction item 8/9 ÷ 6/7 (Item 40, 3 points), the key accepts 56/54, 1 2/54, or 1 1/27.]

Figure 3.3 Range of Acceptable Answers

Credit may be given for a clearly correct response conveyed in a manner other than the one
indicated; this is where you must rely on best practices and professional judgment.

The rest of this section presents examples of the most common variations of correct and incorrect
answers seen in the national field-test sample, as well as examples of answers that require
judgment in evaluating correctness. Also included are examples of the types of issues that impact
scoring decisions, including but not limited to, problems with legibility, reversed numerals,
crossed-out responses, and overcorrection.


The scoring for Grades 1–3 is straightforward: the problems are basic computation and number-
sense questions, and there is not much variability between what is correct and incorrect. At
Grade 4, computation with fractions and decimals is introduced, and it is here that some ambiguity
begins to present itself.

The examples in this section are not exhaustive, but representational of student responses in the
national field-testing sample. Use them to guide your professional judgment when determining
the correctness of answers that deviate from the correct responses identified on the Answer Key.

Checklist for Determining Credit

• Does the student's response match the answer (or alternate answers) provided on the Answer Key?

• Does the student's answer represent an alternate correct answer that is not provided on the
Answer Key?

• Does the answer reflect an understanding of the task type?

These are important questions because they reflect the basic purpose of benchmarking and progress
monitoring—to determine if students are acquiring the skills required to complete the basic
computational tests presented on the probes. If you encounter a scoring dilemma that is not covered
in these pages, use your professional judgment in deciding the score. There is no partial scoring,
so it is important to make consistent decisions when scoring potentially ambiguous answers.

For the majority of problem types, there is little deviation in acceptable answers, and where there
is, it will be in the method of presentation (e.g., 0.50 versus .5). In problems where the target
answer is a fraction, some students may choose to reduce the answer even when no instruction to
do so has been given. Generally, as long as the reduction is performed properly and the result is
correct, give the student credit for the answer. This becomes tricky when a student initially
provides the correct response and then makes an error in reducing. The final answer presented is
what you score, so it is possible for a student to "overcorrect" into error (see Figure 3.4).


[Figure 3.4: The student correctly computed 15/25 − 12/25 = 3/25 but then reduced the answer
incorrectly to 1/7. Because the final answer presented is what is scored, Item 15 (keyed answer
3/25, 1 point) receives no credit.]

Figure 3.4 Overcorrection Into Error


Generally, when a specific type of target answer is required, such as an improper fraction or a
mixed number, that target is requested in the directions, such as in Figure 3.5, wherein a mixed
number is specifically requested as the answer.

[Figure 3.5: Three items (38–40) whose directions specify the form of the answer; the keyed
answers are the mixed numbers 1 1/2 (3 points), 5 3/20 (3 points), and 6 1/8 (2 points).]

Figure 3.5 Targeted Answer in Directions

Sometimes, however, the directions may be open to interpretation, such as in items where the
student is instructed "Write the answer in the lowest terms." On certain items, such as Grade 8,
Item 22, the target answer is a mixed number, but the student provided an improper fraction
(see Figure 3.6).


[Figure 3.6: Grade 8, Item 22 ("Write the answer in lowest terms"): 3/2 • 8/11 = 24/22. The keyed
answer is the mixed number 1 1/11 (2 points); the student wrote the reduced improper fraction 12/11.]

Figure 3.6 Mixed Number for Grade 8, Item 22


Although the target answer is a mixed number, and that is what is presented on the Answer Key, a
small but significant number of students in our national field testing provided a reduced improper
fraction as an answer, as did the student in this example. After discussion with our experts, it was
agreed that the nonspecific "lowest terms" could be understood by some students to mean the reduced
improper fraction rather than the reduced mixed number. For that reason, you may score a correct
and properly reduced improper fraction as correct. Because the target is that the student know how
to both work the problem and reduce to a mixed number, we recommend that you provide that feedback
to any student who provides an improper fraction, particularly if that student has also shown
difficulty with items specifically requesting a mixed number as a response.

Another practice noted in the national field-testing sample was that, at certain grades, students
added the $ symbol to items that require addition or subtraction of decimals. Because of its
prevalence, this issue was also discussed with math experts, and the decision was made that if the
numerals and the decimal placement were correct, credit would be given. This issue is a bit more
complicated, however, when students write out the answer to the question without the decimal,
relying solely on the symbols to denote the difference between the numerals preceding and
following the missing decimal. Figure 3.7 presents examples of correct and incorrect answers in
this area.

[Figure 3.7: Student responses to decimal items such as 7.25 + 2.68 (keyed 9.93, answered "$9.93"),
0.84 − 0.3 (keyed 0.54, answered "0.54" and "$0 and 54¢"), 0.39 + 0.5 (keyed 0.89, answered
".89 cents"), and 3.72 + 1.7 (keyed 5.42, answered "5.44"), illustrating how money symbols and
missing decimals affect scoring.]

Figure 3.7 Answers to Decimal Problems

In all grades we found examples of students occasionally spelling out the answers to the problems.
If the answer to a problem is the number 3 and a student writes in three, you may give credit. If
the problem is one working with decimals and the answer is 2.5 and the student writes two point
five, you may give credit. If, however, the student responds with two dollars and fifty cents, the
answer is incorrect because it skirts the issue of decimal placement. Figure 3.8 presents an
example of correct and incorrect answers from Grade 6.

[Figure 3.8: Grade 6 examples of written-out answers. For 5.84 + 3.07 (Item 1, keyed 8.91, 1 point)
the student wrote "$8.91"; for 5.46 − 2.19 (Item 7, keyed 3.27, 1 point) the student wrote "three
dollars and twenty-seven cents"; Item 8 (16 × 5, keyed 80, 1 point) is also shown.]

Figure 3.8 Written Answers


Other examples of scoring issues are crossed-out answers, illegible answers, reversed numbers, or
rotated numbers in answers.
• Crossed-out answers: If a student shows his or her work, but then crossed or X-ed out the
problem without placing the answer in the blank, the item is incorrect and receives no credit. If
the student has crossed out the problem, but then returned to the item and placed an answer
in the blank, score the item based on whether or not the answer placed in the blank is correct.
See Figure 3.9.

[Figure 3.9: Three items showing crossed-out work: Item 10 ("Write the fraction in lowest terms,"
3/21, keyed 1/7, 2 points), Item 11 ("Evaluate the expression when x is equal to 1," 6 + x, keyed 7,
1 point), and Item 12 ("Write the fraction in lowest terms," 10/26, keyed 5/13, 1 point). The
examples illustrate scoring when a problem is crossed out with or without a final answer placed in
the blank.]

Figure 3.9 Crossed-Out Answers



• Illegible, reversed, or rotated numbers: When students write answers that have illegible,
reversed, or rotated numbers, it is important to keep in mind the intent of using M−COMP
probes—to determine a student’s understanding of the task and progress throughout the
school year. Problems with legibility are common, particularly with the younger grades, and
students identified as having specific learning challenges may have issues with reversing
numbers and letters. Figures 3.10 through 3.12 provide examples of such responses.

• If the response is hard to read, but can be determined, score the answer as correct.


[Figure 3.10: Four items with hard-to-read but determinable responses: 6 × 7 (Item 21, keyed 42,
1 point), 16 ÷ 8 (Item 22, keyed 2, 3 points), 4 × 4 (Item 23, keyed 16, 2 points), and
309 + 224 + 410 (Item 24, keyed 943, 2 points). Each response is scored correct because the
intended answer can be determined.]

Figure 3.10 Difficult-to-Read Response

• If the response is too illegible to determine with confidence, score it as incorrect. If the
response is reversed, but the digit the student intended is obvious, score it as correct.

[Figure 3.11: Three items answered with reversed digits whose intended numbers are obvious,
including a "Circle the number that is the greatest" item comparing 548 and 537. The keyed answers
are 5, 7, and 548 (1 point each), and each response is scored correct.]

Figure 3.11 Reversed Numbers With Intended Number Obvious

• If the response is rotated and you cannot easily determine what digit the student intended,
score it as incorrect.


[Figure 3.12: Three items (15 + 13, 7 − 6, and 16 − 10; Items 25–27) answered with rotated digits
that cannot be confidently determined. The keyed answers are 28 (3 points), 1 (2 points), and 6
(3 points), and each response is scored incorrect.]

Figure 3.12 Rotated Numbers With Intended Number Indeterminable

A final note on scoring: Use your professional judgment in determining whether or not to give
a student credit for an answer that deviates from the answer provided on the Answer Key. If the
answer is mathematically correct, shows an understanding of the operation being assessed, and
is consistent with the manner in which your curriculum treats that operation, then the student
should get credit for the answer. When students present non-target responses, such as adding
money symbols, writing out the answer as words not numbers, or providing a reduced improper
fraction where a mixed number is the target, after you have scored the answer as correct, discuss
the item with the student so he or she understands what is expected in the future, as continuing
with certain nonstandard styles could inadvertently lead to errors on other probes.

Reporting

The next step in the process is reporting your data in the AIMSweb reporting system. First, log
in to your school's AIMSweb account. On the opening screen there are tabs along the top and down
the left side of the page. Click Report in the row of tabs along the top. On the Report page, the
row of tabs across the top represents the type of information you can report; in this case, click
Mathematics. After you choose Mathematics, choose the level of the information you want to report
from the tabs down the left side: Customer, District, School, Grade, or AIMSweb.

Generating Student Reports


The most common types of reports used are the Individual Student Report, Pathway Report, and
Email Report.

Creating an Individual Student Report


If you have entered student scores, they are listed under the column headings for each General
Outcome Measure.

Select Mathematics from the gray tabs.

Select M–COMP from the corresponding radio buttons.

Click on a student’s score to view the student’s Individual Report.

Note: If you click on a column heading (e.g., RBP), an Individual Student Report is
generated for all of the students in the classroom.


Pathway Report
The Pathway Report for every student in the classroom is located under the Pathway column
heading. The Pathway Report displays student scores for every Benchmark period and every
General Outcome Measure. Click the Pathway column heading to generate the report for all the
students in the classroom.

Note: You will need Adobe Acrobat Reader to view and print the reports. If you do not have Adobe
Acrobat Reader installed on your computer, download and install the latest version free of charge
from http://get.adobe.com/reader/.

Emailing a Report
All reports listed in this section can be generated as PDFs and emailed from the AIMSweb system.

You may email the currently selected report by clicking the EMail Report button.

The Email Report screen is displayed. Enter:

Your email address (pre-filled by default with the email address that was originally entered
with your user information)

Recipient email address

Subject (displays a default subject)

Enter your message in the Short Message window.

Click Send to send the email, or click Cancel to return to the currently selected chart.

Click the Back button to return to the reports page.



Section 4
Developing and Standardizing the Test

Creating the New Blueprint

The first major component and task of the M–COMP revision was determining the blueprint
for each grade. The AIMSweb content team engaged internal mathematics expertise as well as
multiple nationally recognized RTI and mathematics experts in the development of the new
blueprints for each grade (1–8). Upon finalization of blueprints, anchor probes were developed for
each of the grades. Once the anchor probes were finalized, each probe was sent to the RTI and
mathematics experts, along with a team of professional educators for an additional round of input
and analysis. Once all of the data were aggregated, the AIMSweb content team used the collective
analyses and made final adjustments to the probes.

Table 4.1 Domains Evaluated With M–COMP by Grade


[Checkmarks by grade (1–8) indicate where each domain is assessed: Quantity Discrimination, Column Addition, Basic Facts, Complex Computation, Fractions, Decimals, Reducing, Percentages, Conversions, Expressions, Integers, Exponents, and Equations.]

Item Development
Each item was developed individually by professional test developers and mathematics experts,
based on the grade-level blueprint and domain-specific criteria. Each item was field tested prior
to the final selection of probes, and only items whose field-test data met specific acceptance
criteria were considered for publication (see the National Field Test section for details).


Item Review
Following extensive internal review, revision, and pilot testing, a team of mathematics experts
reviewed each item and individual probe for each of the grade levels (1–8) and provided detailed
reports on their reviews. The experts were asked to evaluate myriad aspects of each item and probe
to ensure that the highest quality standards were met. The following is a list of some of the
specific tasks performed and components reviewed by the experts.

• Detailed item reviews were performed to ensure no errors existed in the overall structure of the
items (stem, format, etc.).

• Each item was re-worked to ensure accuracy of item function, and each item's answer was checked
against the associated answer key.

• Item types were evaluated for equivalency and were placed consistently across probes.

• Item sequencing was evaluated to ensure clueing and other anomalies did not occur.

• Upon completion of their reviews, the mathematics experts provided detailed reports to the
AIMSweb content team, which then made the appropriate adjustments.

Test Construction
An anchor probe was developed for each grade level (1–8). Each anchor probe was constructed by
selecting individual items based on multiple criteria. All items were field-tested and then evaluated
based on point-biserial correlations and item difficulty. Items that did not meet psychometric
criteria were removed from the item pool. Item placement on each anchor probe was based on
increasing item difficulty within a domain. To ensure the sensitivity of the instrument and to
maximize the amount of data collected from at-risk learners, easier items were generally placed at
the beginning of each probe and more difficult items followed.

To maintain proper randomization and representation of item type by domain, items were not
placed in exact order of difficulty within each probe. Multiple item types that measure the same
domain in groups of three or more also were not placed in exact order of difficulty. These anchor
probes then served as the templates for developing equivalent probes at each grade.

Each equivalent probe was built to replicate the item type proportions, difficulty, and item
placement on the anchor probe. For example, Item 1 of the grade 1 anchor probe is a basic
addition question; therefore Item 1 for each subsequent and equivalent grade 1 probe is the same
problem type (i.e., basic addition) with a different item of similar difficulty and construction.
Although most items are unique (with a small percentage of repetition within a grade), the intent
is to assess the same learning domain with a similar question at the same numbered position on
each equivalent probe.
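The template idea described above can be pictured with a small sketch. The blueprint positions, domains, difficulty bands, and item pool below are invented for illustration only; they do not reproduce the actual M–COMP blueprint or item bank.

    import random

    # Anchor template: (position, domain, target difficulty band) -- hypothetical
    anchor_template = [
        (1, "basic addition", "easy"),
        (2, "basic subtraction", "easy"),
        (3, "column addition", "medium"),
    ]

    # Hypothetical field-tested item pool keyed by (domain, difficulty band)
    item_pool = {
        ("basic addition", "easy"): ["3 + 4", "2 + 5", "6 + 1"],
        ("basic subtraction", "easy"): ["7 - 2", "9 - 3", "8 - 5"],
        ("column addition", "medium"): ["26 + 37", "48 + 15", "54 + 29"],
    }

    def build_equivalent_probe(template, pool, seed):
        """Place an item of the same type and difficulty at each anchor position."""
        rng = random.Random(seed)
        return [(pos, rng.choice(pool[(domain, band)])) for pos, domain, band in template]

    for n in range(1, 3):
        print(f"Probe {n}:", build_equivalent_probe(anchor_template, item_pool, seed=n))

In this sketch every generated probe keeps the anchor's position-by-domain layout while swapping in different items of the same type, which is the property that makes the probes interchangeable for progress monitoring.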


Major Research Stages

There were three major research stages in creating the M–COMP benchmark and progress
monitoring probes: pilot studies, national field testing, and final selection for publication.

Pilot Studies
The primary goal of the pilot stage was to produce anchor probes with the desired content
coverage and psychometric properties, from which equivalent probes could be generated for national
field testing. This stage of development focused on issues such as item content and relevance,
adequacy of scale floors and ceilings, appropriate time limits for administration, scoring criteria,
and other relevant psychometric properties. There were three pilot studies.

Pilot 1: In the initial pilot, 16 students were tested to ensure that all directions for administration
and individual item directions were clear and grade-level appropriate, that no issues existed with
item progression, and that all items were functioning properly. Qualitative and quantitative
feedback were used to make minor adjustments to prepare for Pilot 2.

Pilot 2: To determine alternate-form reliability and the time limits for administration at each
grade level, one anchor probe and two clone probes were administered to a group of 337 students.
The progress students made toward completing the probes was marked for each of the three
probes at each grade at varying time points. A correlation analysis was conducted at each grade
level to determine the most appropriate amount of time necessary for the test to maintain reliable
discriminability. Again, qualitative and quantitative feedback was used to make adjustments to
prepare for Pilot 3.

Pilot 3: The anchor probes were group administered, untimed, to 444 students. The intent of this
study was to extend the collection of item-specific and probe-level data from Pilots 1 and 2 and to
further evaluate the performance of all items. Multiple criteria, including point-biserial correlation
coefficients and p values, were used to evaluate the items. Split-half correlation and Cronbach's
alpha were used to evaluate the reliability of the anchor probe.
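To make these item statistics concrete, the following is a minimal Python sketch (using NumPy) of how p values, point-biserial correlations, split-half reliability, and Cronbach's alpha can be computed from a student-by-item score matrix. The data are randomly generated and purely illustrative; this is not the analysis code used for M–COMP.

    import numpy as np

    rng = np.random.default_rng(0)
    ability = rng.normal(0.0, 1.0, size=(444, 1))        # hypothetical student abilities
    difficulty = np.linspace(-2.0, 2.0, 30)               # hypothetical item difficulties
    p_correct = 1.0 / (1.0 + np.exp(-(ability - difficulty)))
    scores = (rng.random((444, 30)) < p_correct).astype(float)  # 1 = correct, 0 = incorrect

    def p_values(scores):
        """Item difficulty: proportion of students answering each item correctly."""
        return scores.mean(axis=0)

    def point_biserial(scores):
        """Correlation of each (0/1) item with the total of the remaining items."""
        total = scores.sum(axis=1)
        r = []
        for j in range(scores.shape[1]):
            rest = total - scores[:, j]          # exclude the item itself
            r.append(np.corrcoef(scores[:, j], rest)[0, 1])
        return np.array(r)

    def cronbach_alpha(scores):
        """Internal consistency across all items."""
        k = scores.shape[1]
        item_var = scores.var(axis=0, ddof=1).sum()
        total_var = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_var / total_var)

    def split_half(scores):
        """Odd-even split-half correlation with Spearman-Brown correction."""
        odd = scores[:, 0::2].sum(axis=1)
        even = scores[:, 1::2].sum(axis=1)
        r = np.corrcoef(odd, even)[0, 1]
        return 2 * r / (1 + r)

    print("p values:", np.round(p_values(scores)[:5], 2))
    print("point-biserial:", np.round(point_biserial(scores)[:5], 2))
    print("alpha =", round(cronbach_alpha(scores), 2), "split-half =", round(split_half(scores), 2))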

National Field Test


A national field-test edition of the M–COMP assessment was developed. Data were obtained from
a sample of 7,703 students, with representation across key demographic variables such as grade,
sex, race/ethnicity, socioeconomic status, and geographic region. See Appendix A for the
demographic information for the national sample.

For national field testing, 45 probes (including the anchor probe for each grade) were
administered. Given the administration time limit (8 minutes), and to avoid an accelerated
practice effect from repeatedly answering questions on the same domain, only a single set
of six probes (including five M–COMP probes and either one off-level M–COMP probe or an
M–CBM/M–CBM2 probe) was administered to each student. Twenty-two sets of probes were
assembled for each grade. In each set, the anchor probe was always administered first. The
remaining four M–COMP probes were administered in counterbalanced order. Half of the sample
received an M–CBM or M–CBM2 probe, which was placed either after the first (anchor) probe or
after the third M–COMP probe.

Because norms will not be available for the M–COMP until after the 2010–2011 school year, a
conversion table is being developed for separate release to enable educators to convert the 2010
Spring M–CBM and M–CBM2 scores to the equivalent 2010 M–COMP Spring scores for the
purpose of end-of-year goal setting.

Table 4.2 National Field Testing Item and Probe Count by Grade

Grade      Item Count by Probe    Probe Count by Grade
Grade 1    28                     45
Grade 2    28                     45
Grade 3    37                     45
Grade 4    38                     45
Grade 5    39                     45
Grade 6    40                     45
Grade 7    40                     45
Grade 8    40                     45

Finalizing and Selecting Probes


Multiple criteria were used to select the most psychometrically sound equivalent probes. Pearson's
product-moment correlation coefficient was examined to assess the consistency of probes within
each grade. The average correlation coefficients for each probe with the other probes in the set are
reported in Table A.6. To evaluate the internal consistency of the probes, Cronbach's alpha and
split-half reliability were examined (see Table A.6). Probe selection was based on the evaluation of
these statistical properties and the comparison of each probe's mean score to the aggregated mean
for the grade. Analysis of confidence intervals at the 99% level, using the standard error of
measurement (SEM), showed no statistically significant differences among the final selected probes;
the selected probes within each grade were statistically equivalent to one another. The aggregated
means and SEMs are also reported in Table A.6.
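As an illustration of this kind of check, the sketch below builds a 99% confidence interval around each probe mean using the SEM and flags any probe whose interval does not contain the aggregated grade mean. The grade mean and SEM are the grade 3 values from Table A.6; the individual probe means are invented, and the actual selection procedure may have differed in detail.

    Z_99 = 2.576  # two-tailed critical z value for a 99% confidence level

    def probe_is_equivalent(probe_mean, grade_mean, sem, z=Z_99):
        """Treat a probe as equivalent if the grade mean falls inside the probe's 99% CI."""
        lower, upper = probe_mean - z * sem, probe_mean + z * sem
        return lower <= grade_mean <= upper

    # Aggregated grade 3 mean (51.2) and SEM (4.67) from Table A.6; probe means are invented.
    grade_mean, sem = 51.2, 4.67
    for name, mean in [("Probe 3.1", 50.4), ("Probe 3.2", 52.8), ("Probe 3.3", 49.1)]:
        status = "equivalent" if probe_is_equivalent(mean, grade_mean, sem) else "flagged"
        print(name, status)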



Appendix A

Technical Adequacy and Data Tables




A.1 Demographic Characteristics of the Sample by Grade and Geographic Region

Geographic Region
Northeast Midwest South West Total
Grade N % N % N % N % N
1 202 22.0 229 24.9 423 46.0 65 7.1 919
2 165 16.9 272 27.9 478 49.0 61 6.3 976
3 195 20.1 172 17.7 530 54.6 74 7.6 971
4 192 21.0 97 10.6 568 62.0 59 6.4 916
5 178 17.0 151 14.4 655 62.5 64 6.1 1,048
6 96 9.8 372 37.9 440 44.9 73 7.4 981
7 87 9.2 277 29.3 436 46.2 144 15.3 944
8 92 9.7 356 37.6 350 36.9 150 15.8 948

Total 1,207 15.7 1,926 25.0 3,880 50.4 690 9.0 7,703
Note. Row percentages may not sum to 100 due to rounding.

A.2 Demographic Characteristics of the Sample by Grade and Community Type

Community Type
Urban Suburban Rural Total
Grade N % N % N % N
1 248 27.0 485 52.8 186 20.2 919
2 203 20.8 542 55.5 231 23.7 976
3 252 26.0 502 51.7 217 22.3 971
4 235 25.7 490 53.5 191 20.9 916
5 240 22.9 644 61.5 164 15.6 1,048
6 151 15.4 724 73.8 106 10.8 981
7 63 6.7 707 74.9 174 18.4 944
8 67 7.1 802 84.6 79 8.3 948

Total 1,459 18.9 4,896 63.6 1,348 17.5 7,703


Note. Row percentages may not sum to 100 due to rounding.


A.3 Demographic Characteristics of the Sample by Grade and Sex

Sex
Female Male Total
Grade N % N % N
1 489 53.2 430 46.8 919
2 545 55.8 431 44.2 976
3 544 56.0 427 44.0 971
4 513 56.0 403 44.0 916
5 562 53.6 486 46.4 1,048
6 504 51.4 477 48.6 981
7 504 53.4 440 46.6 944
8 537 56.6 411 43.4 948

Total 4,198 54.5 3,505 45.5 7,703


Note. Row percentages may not sum to 100 due to rounding.

A.4 Demographic Characteristics of the Sample by Grade and Race/Ethnicity

Race/Ethnicity
          African American    American Indian    Asian    Hispanic    White    Other(a)    Total
Grade     N     %     N     %     N     %     N     %     N     %     N     %     N
1 96 10.4 19 2.1 19 2.1 237 25.8 536 58.3 12 1.3 919
2 86 8.8 18 1.8 15 1.5 247 25.3 595 61.0 15 1.5 976
3 87 9.0 12 1.2 15 1.5 269 27.7 579 59.6 9 0.9 971
4 88 9.6 13 1.4 7 0.8 260 28.4 544 59.4 4 0.4 916
5 99 9.4 35 3.3 5 0.5 225 21.5 679 64.8 5 0.5 1,048
6 89 9.1 25 2.5 11 1.1 152 15.5 693 70.6 11 1.1 981
7 37 3.9 28 3.0 10 1.1 234 24.8 626 66.3 9 1.0 944
8 58 6.1 28 3.0 23 2.4 187 19.7 646 68.1 6 0.6 948

Total 640 8.3 178 2.3 105 1.4 1,811 23.5 4,898 63.6 71 0.9 7,703
Note. Row percentages may not sum to 100 due to rounding.
a. Includes Alaska Natives, Pacific Islanders, Native Hawaiians, and all other groups not classified as African American, American Indian, Asian, Hispanic, or White.


A.5 Demographic Characteristics of the Sample by Grade and Median Family Income

Median Family Income Level


Low Middle High Total
Grade N % N % N % N
1 462 50.3 150 16.3 307 33.4 919
2 517 53.0 115 11.8 344 35.2 976
3 381 39.2 267 27.5 323 33.3 971
4 327 35.7 364 39.7 225 24.6 916
5 366 34.9 433 41.3 249 23.8 1,048
6 234 23.9 328 33.4 419 42.7 981
7 293 31.0 371 39.3 280 29.7 944
8 218 23.0 356 37.6 374 39.5 948

Total 2,798 36.3 2,384 30.9 2,521 32.7 7,703


Note. Row percentages may not sum to 100 due to rounding.

A.6 Descriptive and Reliability Statistics by Grade

Grade    Mean(a)    SD(b)    SEM(c)    r(d)    Split–Half(d)    Alpha(d)


1 36.0 12.8 4.02 .86 .89 .87
2 37.9 11.4 4.04 .82 .85 .82
3 51.2 17.6 4.67 .89 .90 .89
4 51.3 17.1 5.56 .85 .91 .87
5 33.7 20.3 5.67 .89 .93 .91
6 32.5 17.9 4.79 .89 .89 .89
7 34.8 19.4 5.06 .90 .92 .91
8 30.2 18.9 5.59 .88 .92 .90
a. Weighted average.
b. Pooled standard deviation.
c. The SEM for each probe was calculated based on the average correlation coefficient and the actual standard deviation of the raw scores for the probe. The average SEM for the grade was calculated by averaging the squared SEMs for each probe and taking the square root of the result.
d. The average reliability coefficients were calculated using Fisher's z transformation.
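Notes c and d above describe two small computations. The sketch below illustrates them under the conventional assumption that a probe's SEM equals its standard deviation times the square root of one minus the reliability coefficient; the per-probe SDs and correlations used here are invented, not the actual M–COMP values.

    import math

    def probe_sem(sd, r):
        """SEM for one probe: SD times the square root of (1 - reliability)."""
        return sd * math.sqrt(1 - r)

    def grade_sem(sems):
        """Grade-level SEM (note c): square root of the mean of the squared probe SEMs."""
        return math.sqrt(sum(s * s for s in sems) / len(sems))

    def average_r(rs):
        """Average correlation (note d): average in Fisher's z space, then back-transform."""
        zs = [math.atanh(r) for r in rs]
        return math.tanh(sum(zs) / len(zs))

    sds = [17.2, 18.1, 17.5]   # hypothetical per-probe standard deviations
    rs = [0.88, 0.90, 0.89]    # hypothetical per-probe correlation coefficients
    r_bar = average_r(rs)
    sems = [probe_sem(sd, r_bar) for sd in sds]
    print("average r =", round(r_bar, 3), "grade SEM =", round(grade_sem(sems), 2))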


A.7 Correlations of AIMSweb M–COMP Scores With Group Mathematics Assessment and Diagnostic Evaluation (G∙MADE) Scores by Grade

         M–COMP              G∙MADE Total
Grade    Mean    SD          Mean    SD          r(a)    N
1 36.0 12.2 17.9 5.6 .84 98
3 45.5 14.2 15.9 5.7 .73 98
8 28.4 13.6 13.4 3.8 .76 54
a. The average correlation coefficients across both administration orders were calculated using Fisher's z transformation.



Glossary

Aim Line—A graphical representation of a student's expected rate of progress.

Benchmark Probe—Administered to all students: in the Fall for screening, and in the Winter and Spring for
progress monitoring and program evaluation.
Box and Whisker Plot—A graphical representation of a set of scores divided into four parts using
quartiles and median.
Content Validity—A type of validity that relates to how adequately the content of a test represents a
specified body of knowledge, and to how adequately subjects’ responses represent knowledge of
the content.
Correlation—A measure of the strength and direction of the relationship between two sets of
variables. (See Correlation Coefficient).
Correlation Coefficient (r)—A statistic ranging between −1 and +1 that indicates the degree and
direction of the relationship between two variables. The strength of the relationship is indicated by
the absolute value of the coefficient (with larger absolute values indicating stronger relationships).
The direction of the relationship is indicated by either a positive sign (+), representing a positive
relationship in which the variables tend to increase or decrease together, or a negative sign (–),
representing an inverse relationship between the variables.
Individually Referenced—Comparing a student's scores to his or her own earlier scores over time,
rather than to a norm group (norm-referenced comparison).
Internal Consistency—A type of test score reliability indicating the degree of correlation among
item responses within each separate part of a test.
Internal Structure Validity—A type of validity involving the degree to which relationships among
test items and test components conform to what the test is intended to measure.
Line of Best Fit—A straight line drawn through a set of data points that best represents their overall trend.
Longitudinal Tracking—The tracking of particular data (e.g., mean entering scores) over a long
period of time to establish trends.
Mean (M)—The average of a set of scores computed by adding all of the scores together and then
dividing by the total number of scores.
Median—The middle value in a distribution of scores, with 50% of the scores lying below it (i.e., the
50th percentile).
Meta-Analysis—A method of research that analyzes the results of several independent studies by
combining them to determine an overall effect or the degree of relationship between variables.
N count (N)—The total number of individuals who make up a sample (e.g., the number of students
who took a probe).


Normative Score—A score indicating that the tested student performed as well as the typical student
at a given grade.
Normative Sample/Norm Group—The group of individuals (sample) earning scores on a test whose
score data are used to determine scaled scores and/or percentile ranks.
Off-Level Testing (Survey-Level Assessment)—In some circumstances, administering probes from a
grade other than the student's actual grade level, in addition to the student's grade-level probes,
until a normative score is obtained that reflects the student's current level of performance.
Percentile Rank (PR)—A whole number between 1 and 99 that represents the percentage of
individuals in the normative sample who earned a score lower than a given score on the test.
Predictive Validity—A type of validity based on how accurately test data (e.g., admission test scores)
are able to predict criterion measures obtained at some later time (e.g., a grade point average
earned after admission).
Predictor Variable—A variable that occurs prior to any intervention (e.g., scores on an admission
test) that is used to predict some subsequent outcome (e.g., a grade point average earned after
admission). (See Variable.)
Progress Monitoring Probe—Administered to students identified as at risk in order to monitor
student progress and evaluate the intervention program. Frequency is determined by the teacher
or by the IEP, if applicable.
Rate of Improvement (ROI)—The average amount of progress a student makes per week of
intervention, based on the progress monitoring data collected since the start of the intervention
program. To determine the ROI, divide the total amount of progress (the change in score since the
intervention began) by the number of weeks of intervention. (A worked example follows this glossary.)
Raw Score (RS)—The number of items answered correctly by a student on a test.
Reliability—An estimate of the dependability of test scores in terms of the degree of consistency
between measures of the test (e.g., comparisons of administrations of a test over time, or
comparisons of items within a test).
Reliability Coefficient—A correlation statistic (usually ranging from 0 to 1) that measures the degree
to which test scores are free of measurement error. (See Standard Error of Measurement.)
Standard Deviation (SD)—A measure of the variability of test scores in terms of how spread out
scores are from one another in a normative sample distribution.
Standard Error of Measurement (SEM)—An estimate (based on group data) of the variation in scores
earned on repeated administrations of the same test by the same individual.
Trend Line—A graphical representation of an individual student’s rate of improvement (ROI).
Validity—The extent to which a test measures what it is intended to measure. Validity refers to the
extent to which test scores (or other measures) support appropriate interpretations and inferences
regarding characteristics of a person measured (e.g., knowledge or ability) or performances other
than those measured (e.g., subsequent performance or achievement).
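Worked example for Rate of Improvement (ROI). The sketch below is a minimal, hypothetical illustration of the arithmetic described in the ROI glossary entry: total progress divided by the number of weeks of intervention. The function name, scores, and dates are invented for illustration and are not part of the AIMSweb system.

    from datetime import date

    def rate_of_improvement(baseline_score, latest_score, start, latest):
        """Average gain per week between the baseline probe and the latest probe."""
        weeks = (latest - start).days / 7
        return (latest_score - baseline_score) / weeks

    # Example: a student moved from 18 to 33 points over 10 weeks of intervention
    roi = rate_of_improvement(18, 33, date(2010, 9, 13), date(2010, 11, 22))
    print(round(roi, 2), "points per week")  # 1.5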



