Flanagan, D. & Caltabiano, B. (2004) - Test Scores - A Guide To Understanding and Using Test Results.
Dawn P. Flanagan
St. John's University
All content following this page was uploaded by Dawn P. Flanagan on 16 May 2014.
When a student takes either an individually administered or a group-administered standardized test at school, the
results are made available to both parents and teachers. It is important that parents and teachers
understand the meaning of scores that come from standardized tests. This handout provides a
description of common terms used to describe test performance. You are also encouraged to refer to
handouts on Psychological Reports (Flanagan & Caltabiano) and Intellectual Assessment (Ortiz & Lella)
to gain a better understanding of the evaluation process (see “Resources”).
Standard Score
Most educational and psychological tests provide standard scores that are based on a scale that has
a statistical mean (or average score) of 100. If a student earns a standard score that is less than 100,
then that student is said to have performed below the mean, and if a student earns a standard score that
is greater than 100, then that student is said to have performed above the mean. However, there is a
wide range of average scores, from low average to high average, with most students earning standard
scores on educational and psychological tests that fall in the range of 85–115. This is the range in which
68% of the general population performs and, therefore, is considered the normal limits of functioning.
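The 68% figure can be checked with a short calculation. This is a minimal sketch, assuming that standard scores are normally distributed with a standard deviation of 15; the handout does not state the standard deviation, so that value is inferred from the 85–115 range, and the function name is illustrative only.

```python
from statistics import NormalDist

# Assumption: standard scores follow a normal distribution with mean 100 and
# standard deviation 15. The SD is inferred from the 85-115 range in the text.
STANDARD_SCORES = NormalDist(mu=100, sigma=15)


def share_between(low: float, high: float) -> float:
    """Return the expected fraction of the population scoring between low and high."""
    return STANDARD_SCORES.cdf(high) - STANDARD_SCORES.cdf(low)


if __name__ == "__main__":
    # Share of the general population expected to score from 85 to 115.
    print(f"{share_between(85, 115):.0%}")
```

Under these assumptions the result rounds to 68%, which matches the range described above.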
Classifying standard scores. The normal limits of functioning encompass three
classification categories: low average (standard scores of 80–89), average (standard scores of 90–109),
and high average (standard scores of 110–119). These classifications are typically used by school psychologists and other
assessment specialists to describe a student’s ability compared to same-age peers from the general
population.
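The three categories above amount to a simple lookup by score range. The sketch below is illustrative only: the function name is hypothetical, and scores outside 80–119 are deliberately left unlabeled because this passage does not name those categories.

```python
def classify_standard_score(score: int) -> str:
    """Map a standard score to one of the three categories named in the text."""
    if 80 <= score <= 89:
        return "low average"
    if 90 <= score <= 109:
        return "average"
    if 110 <= score <= 119:
        return "high average"
    # The passage lists only three categories, so nothing else is guessed here.
    return "outside the three categories listed here"


if __name__ == "__main__":
    for example_score in (85, 93, 115):
        print(example_score, classify_standard_score(example_score))
```

A standard score of 93, for instance, falls in the average category.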
Subtest scores. Many psychological tests are composed of multiple subtests that have a mean of 10,
50, or 100. Subtests are relatively short tests that measure specific abilities, such as vocabulary, general
knowledge, or short-term auditory memory. Two or more subtest scores that reflect different aspects of
the same broad ability (such as broad Verbal Ability) are usually combined into a composite or index
score that has a mean of 100. For example, a Vocabulary subtest score, a Comprehension subtest score,
and a General Information subtest score (the three subtest scores that reflect different aspects of Verbal
Ability) may be combined to form a broad Verbal Comprehension Index score. Composite scores, such as
IQ scores, Index scores, and Cluster scores, are more reliable and valid than individual subtest scores.
Therefore, when a student’s performance demonstrates relatively uniform ability across subtests that
measure different aspects of the same broad ability (the Vocabulary, Comprehension, and General
Information subtest scores are all average), then the most reliable and valid score is the composite
score (Verbal Comprehension Index in this example). However, when a student’s performance
demonstrates uneven ability across subtests that measure different aspects of the same broad ability
(the Vocabulary score is below average, the Comprehension score is below average, and the General
Information score is high average), then the Verbal Comprehension Index may not provide an accurate
estimate of verbal ability. In this situation, the student’s verbal ability may be best understood by
looking at what each subtest measures. In sum, it is important to remember that unless performance is
relatively uniform on the subtests that make up a particular broad ability domain (such as Verbal Ability),
then the overall score (in this case the Verbal Comprehension Index) may be a misleading estimate.
Helping Children at Home and School II: Handouts for Families and Educators S2–81
Percentile
Standard scores may also be reported with a percentile to aid in understanding performance. A
percentile indicates the percentage of individuals in the norm group that scored at or below a particular
score. For example, a student who earned a standard score of 100 performed at the 50th percentile. This
means that the student performed as well as or better than 50% of same-age peers from the general
population. A standard score of 90 has a percentile rank of 25. A student who is reported to be at the
25th percentile performed as well as or better than 25% of same-age peers, just as a student who is
reported to be at the 75th percentile performed as well as or better than 75% of students of the same age.
While the standard score of 90 is below the statistical mean of 100 and is at the 25th percentile, this
performance is still within the average range and generally does not indicate any need for concern.
Confidence Interval
Psychological tests do not measure ability perfectly. No matter how carefully a test is developed, it
will always contain some form of error or unreliability. This error may exist for various reasons that are
not always readily identifiable. In order to account for this error, standard scores are often reported with
confidence intervals. Confidence intervals represent a range of standard scores in which the student’s
true score is likely to fall a certain percentage of the time. Most confidence intervals are set at 95%,
meaning that a student’s true score is likely to fall between the upper and lower limits of the confidence
interval 95 out of 100 times (or 95% of the time). For example, if a student earned a standard score of 90
with a confidence interval of ±5, this means that the lower limit of the confidence interval is 85 (that
is, 90 – 5 = 85) and the upper limit of the confidence interval is 95 (90 + 5 = 95). The standard score of
90 may be reported in a psychological report as 90 ± 5 or 90 (85–95). Although the student’s score on the
day of the evaluation was 90 in this example, the true score may be lower or higher than 90 owing to
error associated with the method in which the ability was measured. Therefore, it is more accurate to say
that there is a 95% chance that the student’s true performance on this test falls somewhere between 85
and 95. Tests that are highly reliable have relatively small confidence bands associated with their scores,
indicating that these tests provide the most consistent scores across time.
Example: Reporting Scores
The following statement is one that can be commonly found in a psychological report and can be used
to illustrate these definitions: “Jacob obtained a standard score of 93 ± 7 on a test of reading
comprehension, which is ranked at the 33rd percentile and is classified as average.” This is what that
statement means: First, Jacob’s observed score fell below the mean of 100. Second, Jacob did as well as or
better than 33% of students his age from the general population. Third, there is a 95% chance that Jacob’s
true score falls somewhere between 86 and 100. Fourth, Jacob’s performance is considered average
relative to same-age peers from the general population. The table at the end of this handout provides
commonly used performance classifications for standard scores and percentiles.
Understanding the Assessment Report
Type of norms used. It is important to take note of the types of norms used when reading test results
in a psychological or school assessment report. A student’s performance on a standardized test can be
compared to other students of the same age (age norms) or of the same grade (grade norms). Age norms
are always used for tests of intellectual ability so that comparisons can be made to same-age peers. The
use of grade norms is related to the type of test being utilized or may be dictated by certain situations.
For example, grade norms may be most appropriate for achievement tests when a student has repeated a
grade, to see how the student’s performance compares to grade-level peers.
Use of age or grade equivalents. Age and grade equivalents are different from age and grade norms.
Essentially, the age and grade equivalents are scores that indicate the typical age or grade level of
students who obtain a given score. For example, if Jacob’s performance on the test of reading
comprehension is equal to an age equivalent of 8.7 years and a grade equivalent of 2.6, this means that
his obtained raw score is equivalent to the same number of items correct that is average for all 8-year,
7-month-old children included in the norm group on that particular reading comprehension test.
Additionally, Jacob’s score is equivalent to the average reading comprehension performance of all
children included in the normative sample who were in the sixth month of second grade.
The age or grade equivalents do not mean that Jacob is
Resources
Flanagan, D., & Caltabiano, L. (2004). Psychological
reports: A guide for parents and teachers. In A.
Canter, L. Paige, M. Roth, I. Romero, & S. Carroll
(Eds.), Helping children at home and school II:
Handouts for families and educators. Bethesda, MD:
National Association of School Psychologists.
Harcourt Assessment (n.d.). Some things parents should
know about testing. Available:
Classifying Test Scores
Note. Classifications are based on those described in Flanagan and Ortiz (2001) and Flanagan, Ortiz, Alfonso, and Mascolo (2002)
and were adapted from Woodcock and Mather (1989).
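As a closing illustration, the percentile and confidence-interval arithmetic described in this handout can be sketched in a few lines. This is a hedged sketch rather than the method used by any particular test publisher: it assumes normally distributed standard scores with a mean of 100 and a standard deviation of 15 (the standard deviation is an assumption, not a value stated in the handout), and the function names are hypothetical. Published norm tables can differ by a point or so from these computed values.

```python
from statistics import NormalDist


def confidence_interval(score: int, margin: int) -> tuple[int, int]:
    """Return the (lower, upper) limits for a score reported with a +/- margin."""
    return score - margin, score + margin


def percentile_rank(score: float, mean: float = 100.0, sd: float = 15.0) -> int:
    """Approximate percentile rank under an assumed normal distribution."""
    return round(NormalDist(mean, sd).cdf(score) * 100)


if __name__ == "__main__":
    print(confidence_interval(90, 5))  # limits for a score of 90 with a margin of 5
    print(confidence_interval(93, 7))  # limits for a score of 93 with a margin of 7
    print(percentile_rank(100))        # percentile rank of the mean
    print(percentile_rank(90))         # percentile rank of a score of 90
```

With these assumptions, a score of 90 with a margin of 5 gives limits of 85 and 95, a score of 93 with a margin of 7 gives limits of 86 and 100, and scores of 100 and 90 fall at the 50th and 25th percentiles, in line with the examples in the text.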