Brookhart Et Al 2016 RER 100 Year Grades Review
Note: This document is a pre-print of this manuscript, published in the journal Review of Educational Research. Citation: Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A Century of Grading Research: Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86(4), 803-848. doi: 10.3102/0034654316672069, http://doi.org/10.3102/0034654316672069

ABSTRACT:
Grading refers to the symbols assigned to individual pieces of student work or to composite measures of student performance on report cards. This review of over 100 years of research on grading considers five types of studies: (a) early studies of the reliability of grades, (b) quantitative studies of the composition of K-12 report card grades, (c) survey and interview studies of teachers' perceptions of grades, (d) studies of standards-based grading, and (e) grading in higher education. Early 20th century studies generally condemned teachers' grades as unreliable. More recent studies of the relationships of grades to tested achievement and survey studies of teachers' grading practices and beliefs suggest that grades assess a multidimensional construct containing both cognitive and non-cognitive factors reflecting what teachers value in student work. Implications for future research and for grading practices are discussed.

Keywords: grading, classroom assessment, educational measurement

Grading refers to the symbols assigned to individual pieces of student work or to composite measures of student performance on student report cards. Grades or marks, as they were referred to in the first half of the 20th century, were the focus of some of the earliest educational research. Grading research history parallels the history of educational research more generally, with studies becoming both more rigorous and sophisticated over time.

Grading is important to study because of the centrality of grades in the educational experience of all students. Grades are widely perceived to be what students "earn" for their achievement (Brookhart, 1993, p. 139), and have pervasive influence on students and schooling (Pattison, Grodsky, & Muller, 2013). Furthermore, grades predict important future educational consequences, such as dropping out of school (Bowers, 2010a; Bowers & Sprott, 2012; Bowers, Sprott, & Taff, 2013), applying and being admitted to college, and college success (Atkinson & Geiser, 2009; Bowers, 2010a; Thorsen & Cliffordson, 2012). Grades are especially predictive of academic success in more open admissions higher education institutions (Sawyer, 2013).

Purpose of This Review and Research Question
This review synthesizes findings from five types of grading studies: (a) early studies of the reliability of grades on student work, (b) quantitative studies of the composition of K-12 report card grades and related educational outcomes, (c) survey and interview studies of teachers' perceptions of grades and grading practices, (d) studies of standards-based grading (SBG) and the relationship between students' report card grades and large-scale accountability assessments, and (e) grading in higher education. The central question underlying all of these studies is "What do grades mean?" In essence, this is a validity question (Kane, 2006; Messick, 1989). It concerns whether evidence supports the intended meaning and use of grades as an educational measure. To date, several reviews have given partial answers to that question, but none of these reviews synthesize 100 years of research from five types of studies. The purpose of this review is to provide a more comprehensive and complete answer to the research question "What do grades mean?"
BACKGROUND:
The earliest research on grading concerned mostly the reliability of grades teachers assigned to students' work. The earliest investigation of which the authors are aware was published in the Journal of the Royal Statistical Society. Edgeworth (1888) applied the "Theory of Errors" (p. 600) based on normal curve theory to the case of grading examinations. He described three different sources of error: (a) chance; (b) personal differences among graders regarding the whole exam (severity or leniency and speed) and individual items on the exam, now referred to as task variation; and (c) "taking his [the examinee's] answers as representative of his proficiency" (p. 614), now referred to as generalizing to the domain. In parsing these sources of error, Edgeworth went beyond simple chance variation in grades to treat grades as subject to multiple sources of variation or error. This nuanced view, which was quite advanced for its time, remains useful today. Edgeworth pointed out the educational consequences of unreliability in grading, especially in awarding diplomas, honors, and other qualifications to students. He used this point to build an argument for improving reliability. Today, the existence of unintended adverse consequences is also an argument for improving validity (Messick, 1989).

During the 19th century, student progress reports were presented to parents orally by the teacher during a visit to a student's home, with little standardization of content. Oral reports were eventually abandoned in favor of written narrative descriptions of how students were performing in certain skills like penmanship, reading, or arithmetic (Guskey & Bailey, 2001). In the 20th century, high school student populations became so diverse and subject area instruction so specific that high schools sought a way to manage the increasing demands and complexity of evaluating student progress (Guskey & Bailey, 2001). Although elementary schools maintained narrative descriptions, high schools increasingly favored percentage grades because the completion of narrative descriptions was viewed as time-consuming and lacking cost-effectiveness (Farr, 2000). One could argue that this move to percentage grades eliminated the specific communication of what students knew and could do.

Reviews by Crooks (1933), Smith and Dobbin (1960), and Kirschenbaum, Napier, and Simon (1971) debated whether grading should be norm- or criterion-referenced, based on clearly defined standards for student learning. Although high schools tended to stay with norm-referenced grades to accommodate the need for ranking students for college admissions, some elementary school educators transitioned to what was eventually called mastery learning and then standards-based education. Based on studies of grading reliability (Kelly, 1914; Rugg, 1918), in the 1920s teachers began to adopt grading systems with fewer and broader categories (e.g., the A–F scale). Still, variation in grading practices persisted. Hill (1935) found variability in the frequency of grade reports, ranging from 2–12 times per year, and a wide array of grade reporting practices. Of 443 schools studied, 8 percent employed descriptive grading, 9 percent percentage grading, 31 percent percentage-equivalent categorical grading, 54 percent categorical grading that was not percentage-equivalent, and 2 percent "gave a general rating on some basis such as 'degree to which the pupil is working to capacity'" (Hill, 1935, p. 119). By the 1940s, more than 80 percent of U.S. schools had adopted the A–F grading scale. A–F remained the most commonly used scale until the present day. Current grading reforms move in the direction of SBG, a relatively new and increasingly common practice (Grindberg, 2014) in which grades are based on standards for achievement. In SBG, work habits and other non-achievement factors are reported separately from achievement (Guskey & Bailey, 2010).

METHOD:
Literature searches for each of the five types of studies were conducted by different groups of co-authors, using the same general strategy: (a) a keyword search of electronic databases, (b) review of abstracts against criteria for the type of study, (c) a full read of studies that met criteria, and (d) a snowball search using the references from qualified studies. All searches were limited to articles published in English.

To identify studies of grading reliability, electronic searches using the terms "teachers' marks (or marking)" and "teachers' grades (or grading)" were conducted in the following databases: ERIC, the Journal of Educational Measurement (JEM), Educational Measurement: Issues and Practice (EMIP), ProQuest's Periodicals Index Online, and the Journal of Educational Research (JER). The criterion for inclusion was that the research addressed individual pieces of student work (usually examinations), not composite report card grades. Sixteen empirical studies were found (Table 1).

To identify studies of grades and related educational outcomes, search terms included "(grades OR marks) AND (model* OR relationship OR correlation OR association OR factor)." Databases searched included JSTOR, ERIC, and Educational Full Text Wilson Web. Criteria for inclusion were that the study (a) examined the relationship of K-12 grades to schooling outcomes, (b) used quantitative methods, and (c) examined data from actual student assessments rather than teacher perspectives on grading. Forty-one empirical studies were identified (Tables 2, 3, and 4).

For studies of K-12 teachers' perspectives about grading and grading practices, the search terms used were "grade(s)," "grading," and "marking" with "teacher perceptions," "teacher practices," and "teacher attitudes." Databases searched included ERIC, Education Research Complete, Dissertation Abstracts, and Google Scholar. Criteria for inclusion were that the study topic was K-12 teachers' perceptions of grading and grading practices and that the study was published since 1994 (the date of Brookhart's previous review). Thirty-five empirical studies were found (31 are presented in Table 5, and four that investigated SBG are in Table 6).
Table 1
Study | Method | Sample | Findings

Starch (1913) | Descriptive statistics | 10 instructors grading 10 freshman English exams | Teacher variability was large, and largest for the two poorest papers. Isolated four sources of variation and reported probable error (p. 632; total probable error = 5.4 out of 100): (1) differences among the standards of different schools (probable error almost 0), (2) differences among the standards of different teachers (pe = 1.0), (3) differences in the relative values placed by different teachers upon various elements in a paper, including content and form (pe = 2.1), and (4) differences due to the pure inability to distinguish between closely allied degrees of merit (pe = 2.2).

Starch (1915) | Descriptive statistics | 12 teachers grading 24 6th and 7th grade compositions | Average teacher variability of 4.2 (out of 100) was reduced to 2.8 by forcing a normal distribution using a 5-category scale (Poor, Inferior, Medium, Superior, and Excellent).

Starch and Elliott (1912) | Descriptive statistics | 142 high school English teachers grading 2 exams | Teacher variability in assigning grades was large (a range of 30-40 out of 100 points, probable errors of 4.0 and 4.8, respectively); teacher variability in the relative sense as well.

Starch and Elliott (1913a) | Descriptive statistics | 138 high school mathematics teachers grading 1 geometry exam | Teacher variability was larger than for the English papers in Starch and Elliott (1912): probable error of 7.5. The grade for 1 answer varies about as widely as the composite grade for the whole exam.

Starch and Elliott (1913b) | Descriptive statistics | 122 high school history teachers grading 1 exam | Teacher variability was larger than for the English or math exams (Starch & Elliott, 1912, 1913a): probable error of 7.7. Concluded that variability isn't due to subject, but "the examiner and method of examination" (p. 680).
The search for studies of standards-based grading used the search terms "standards" and ("grades" or "reports") and "education." Databases searched included PsycINFO, PsycARTICLES, ERIC, and Education Source. The criterion for inclusion was that articles needed to address SBG. Eight empirical studies were identified (Table 6).

For studies of grading in higher education, search terms included "grades" or "grading," combined with "university," "college," and "higher education" in the title. Databases searched included EBSCO Education Research Complete, ERIC, and ProQuest (Education Journals). The inclusion criterion was that the study investigated grading practices in higher education. University websites in 12 different countries were also consulted to allow for international comparisons. Fourteen empirical studies were found (Table 7).

RESULTS:
Summaries of results from each of the five types of studies, along with tables listing those results, are presented in this section. The Discussion section that follows synthesizes the findings and examines the meaning of grades based on that synthesis.

Grading Reliability
Table 1 displays the results of studies on the reliability of teachers' grades. The main finding was that great variation exists in the grades teachers assign to students' work (Ashbaugh, 1924; Brimi, 2011; Eells, 1930; Healy, 1935; Hulten, 1925; Kelly, 1914; Lauterbach, 1928; Rugg, 1918; Silberstein, 1922; Sims, 1933; Starch, 1913, 1915; Starch & Elliott, 1912, 1913a, b). Three studies (Bolton, 1927; Jacoby, 1910; Shriner, 1930) argued against this conclusion, however, contending that teacher variability in grading was not as great as commonly suggested.

As the work of Edgeworth (1888) previewed, these studies identified several sources of the variability in grading. Starch (1913), for example, determined that three major factors produced an average probable error of 5.4 on a 100-point scale across instructors and schools. Specifically, "Differences due to the pure inability to distinguish between closely allied degrees of merit" (p. 630) contributed 2.2 points, "Differences in the relative values placed by different teachers upon various elements in a paper, including content and form" (p. 630) contributed 2.1 points, and "Differences among the standards of different teachers" (p. 630) contributed 1.0 point. Although investigated, "Differences among the standards of different schools" (p. 630) contributed practically nothing toward the total (p. 632).

Other studies listed in Table 1 identify these and other sources of grading variability. Differences in grading criteria, or lack of criteria, were found to be a prominent source of variability in grades (Ashbaugh, 1924; Brimi, 2011; Eells, 1930; Healy, 1935; Silberstein, 1922), akin to Starch's (1913) difference in the relative values teachers place on various elements in a paper. Teacher severity or leniency was found to be another source of variability in grades (Shriner, 1930; Silberstein, 1922; Sims, 1933), similar to Starch's differences in teachers' standards. Differences in student work quality were associated with variability in grades, but the findings were inconsistent. Bolton (1927), for example, found greater grading variability for poorer papers. Similarly, Jacoby (1910) interpreted his high agreement as a result of the high quality of the papers in his sample. Eells (1930), however, found greater grading consistency in the poorer papers. Lauterbach (1928) found more grading variability for typewritten compositions than for handwritten versions of the same work. Finally, between-teacher error was a central factor in all of the studies in Table 1. Studies by Hulten (1925) and Eells (1930) demonstrated within-teacher error, as well.

Given a probable error of around 5 on a 100-point scale, Starch (1913) recommended the use of a 9-point scale (i.e., A+, A-, B+, B-, C+, C-, D+, D-, and F) and later tested the improvement in reliability gained by moving to a 5-point scale based on the normal distribution (Starch, 1915). His and other studies contributed to the movement in the early 20th century away from a 100-point scale. The ABCDF letter grade scale became more common and remains the most prevalent grading scale in schools in the U.S. today.
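One rough way to see why a probable error of about 5 points argues for a much coarser scale (a back-of-the-envelope reading offered here, not a calculation reported in these studies): if two marks that differ by less than roughly twice the probable error (PE) cannot be reliably distinguished, then a 100-point scale supports only about

\[
\frac{100}{2 \times \mathrm{PE}} \approx \frac{100}{2 \times 5} = 10
\]

meaningfully distinct grade categories, which is on the order of the 9-point (A+ through F) scale Starch proposed.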
Grades and Related Educational Outcomes
Quantitative studies of grades and related educational outcomes moved the focus of research on grades from questions of reliability to questions of validity. Three types of studies investigated the meaning of grades in this way. The oldest line of research (Table 2) looked at the relationship between grades and scores on standardized tests of intelligence or achievement. Today, those studies would be seen as seeking concurrent evidence for validity under the assumption that graded achievement should be the same as tested achievement (Brookhart, 2015). As the 20th century progressed, researchers added non-cognitive variables to these studies, describing grades as multidimensional measures of academic knowledge, engagement, and persistence (Table 3). A third group of more recent studies looked at the relationship between grades and other educational outcomes, for example dropping out of school or future success in school (Table 4).
Table 2
Studies of the Relation of K-12 Report Card Grades and Tested Achievement
Table 3
Studies of K-12 Report Card Grades as Multidimensional Measures of Academic Knowledge, Engagement, and Persistence
Study | Method | Sample | Findings

Miner (1967) | Factor analysis | 671 high school students | Examined academic grades in first, third, sixth, ninth, and twelfth grade; achievement tests in fifth, sixth, and ninth grades; and citizenship grades in first, third, and sixth grades. A three-factor solution was identified: (a) objective achievement, (b) a behavior factor, and (c) high school achievement as measured through grades.

Sobel (1936) | Descriptive | Not reported | Students categorized into three groups based on comparing grades and achievement test levels: grade-superior, middle-group, and mark-superior.

Thorsen and Cliffordson (2012) | Structural equation modeling | All grade 9 students in Sweden: 99,085 (2003), 105,697 (2004), 108,753 (2005) | Generally replicated Klapp Lekholm and Cliffordson (2009).

Thorsen (2014) | Structural equation modeling | 3,855 students in Sweden | Generally replicated Klapp Lekholm and Cliffordson (2009) in examining norm-referenced grades.

Willingham, Pollack, and Lewis (2002) | Regression | 8,454 students from 581 schools | A moderate relationship between grades and tests was identified, as well as strong positive relationships between grades and student motivation, engagement, completion of work assigned, and persistence.
Table 4
Studies of Grades as Predictors of Educational Outcomes
Study | Method | Sample | Findings

Cairns, Cairns, and Neckerman (1989) | Cluster analysis; regression | 475 grade 7 students | Beyond student demographics, student aggressiveness and low levels of academic performance were associated with dropping out.

Cliffordson (2008) | Two-level modeling | 164,106 Swedish students | Grades predict achievement in higher education more strongly than the SweSAT (Swedish Scholastic Aptitude Test), and criterion-referenced grades predict slightly better than norm-referenced grades.

Ekstrom, Goertz, Pollack, and Rock (1986) | Regression | High School and Beyond survey, 30,000 high school sophomores | Grades and problem behavior were identified as the most important variables for identifying dropping out, higher than test scores.

Ensminger and Slusarcick (1992) | Regression | 1,242 first graders from a historically disadvantaged community | Low grades and aggressive behavior were related to eventually dropping out, with low SES negatively moderating the relationships.

Fitzsimmons, Cheever, Leonard, and Macunovich (1969) | Correlation | 270 high school students | Students receiving low grades (D or F) in elementary or middle school were at much higher risk of dropping out.

Jimerson, Egeland, Sroufe, and Carlson (2000) | Regression | 177 children tracked from birth through age 19 | Home environment, quality of parent caregiving, academic achievement, student problem behaviors, peer competence, and intelligence test scores were significantly related to dropping out.

Lloyd (1978) | Regression | 1,532 third grade students | Dropping out was significantly predicted by grades and marks.

Morris, Ehren, and Lenz (1991) | Correlation; chi-square | 785 students in grades 7 through 12 | Dropping out was predicted by absences, low grades (D or F), and mobility.

Roderick and Camburn (1999) | Regression | 27,612 Chicago ninth graders | Examined significant predictors of course failure, including low attendance, and found failure rates varied significantly at the school level.

Troob (1985) | Descriptive | 21,000 New York City high school students | Low grades and high absences corresponded to higher levels of dropping out.
These studies offer predictive evidence for validity under the assumption that grades measure school success.

Correlation of grades and other assessments. Table 2 describes studies that investigated the relationship between grades (usually grade-point average, GPA) and standardized test scores in an effort to understand the composition of the grades and marks that teachers assign to K-12 students. Despite the enduring perception that the correlation between grades and standardized test scores is strong (Allen, 2005; Duckworth, Quinn, & Tsukayama, 2012; Stanley & Baines, 2004), this correlation is and always has been relatively modest, in the .5 range. As Willingham, Pollack, and Lewis (2002) noted:

Understanding these characteristics of grades is important for the valid use of test scores as well as grade averages because, in practice, the two measures are often intimately connected… [there is a] tendency to assume that a grade average and a test score are, in some sense, mutual surrogates; that is, measuring much the same thing, even in the face of obvious differences (p. 2).

Research on the relationship between grades and standardized assessment results is marked by two major eras: early 20th century studies and late 20th into 21st century studies. Unzicker (1925) found that average grades across subjects correlated .47 with intelligence test scores. Ross and Hooks (1930) reviewed 20 studies conducted from 1920 through 1929 on report card grades and intelligence test scores in elementary school as predictors of junior high and high school grades. Results showed that the correlations between grades in seventh grade and intelligence test scores ranged from .38 to .44. Ross and Hooks concluded:

Data from this and other studies indicate that the grade school record affords a more reliable or consistent basis of prediction than any other available, the correlations in three widely-scattered school systems showing remarkable stability; and that without question the grade school record of the pupil is the most usable or practical of all bases for prediction, being available wherever cumulative records are kept, without cost and with a minimum expenditure of time and effort (p. 195).

Subsequent studies moved from correlating grades and intelligence test scores to correlating grades with standardized achievement results (Carter, 1952, r = .52; Moore, 1939, r = .61). McCandless, Roberts, and Starnes (1972) found a smaller correlation (r = .31) after accounting for socio-economic status, ethnicity, and gender. Although the sample selection procedures and methods used in these early investigations are problematic by current standards, they represent a clear desire on the part of researchers to understand what teacher-assigned grades represent in comparison to other known standardized assessments. In other words, their focus was criterion validity (Ross & Hooks, 1930).

Investigations from the late 20th century and into the 21st century replicated earlier studies but included larger, more representative samples and used more current standardized tests and methods (Brennan, Kim, Wenz-Gross, & Siperstein, 2001; Woodruff & Ziomek, 2004). Brennan and colleagues (2001), for example, compared reading scores from the Massachusetts MCAS state test to grades in mathematics, English, and science and found correlations ranging from .54 to .59. Similarly, using GPA and 2003 TerraNova Second Edition/California Achievement Tests, Duckworth and Seligman (2006) found a correlation of .66. Subsequently, Duckworth et al. (2012) compared standardized reading and mathematics test scores to GPA and found correlations between .62 and .66.

Woodruff and Ziomek (2004) compared GPA and ACT composite scores for all high school students who took the ACT college entrance exam between 1991 and 2003. They found moderate but consistent correlations ranging from .56 to .58 over the years for average GPA and composite ACT scores, from .54 to .57 for mathematics grades and ACT scores, and from .45 to .50 in English. Student GPAs were self-reported, however. Pattison and colleagues (2013) examined four decades of achievement data on tens of thousands of students using national databases to compare high school GPA to reading and mathematics standardized tests. The authors found GPA correlations consistent with past research, ranging from .52 to .64 in mathematics and from .46 to .54 in reading comprehension.

Although some variability exists across years and subjects, correlations have remained moderate but remarkably consistent in studies based on large, nationally representative datasets. Across 100 years of research, teacher-assigned grades typically correlate about .5 with standardized measures of achievement. In other words, 25 percent of the variation in grades teachers assign is attributable to a trait comparable to the trait measured by standardized tests (Bowers, 2011). The remaining 75 percent is attributable to something else. As Swineford (1947) noted in a study on grading in middle and high school, "the data [in the study] clearly show that marks assigned by teachers in this school are reliable measures of something but there is apparently a lack of agreement on just what that something should be" (p. 47) [author's emphasis]. A correlation of .5 is neither very weak—countering arguments that grades are completely subjective measures of academic knowledge; nor is it very strong—refuting arguments that grades are a strong measure of fundamental academic knowledge, and it remains consistent despite large shifts in the educational system, especially in relation to accountability and standardized testing (Bowers, 2011; Linn, 1982).
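The 25/75 split above is simply the variance-explained reading of a correlation coefficient (a worked step added here for clarity, not an additional result from the cited studies):

\[
r \approx .50 \quad\Rightarrow\quad r^{2} = (.50)^{2} = .25, \qquad 1 - r^{2} = .75,
\]

so about 25 percent of the variance in grades is shared with tested achievement and the remaining 75 percent reflects something else (Bowers, 2011).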
Grades as multidimensional measures of academic knowledge, engagement, and persistence. Investigations of the composition of K-12 report card grades consistently find them to be multidimensional, comprising minimally academic knowledge, substantive engagement, and persistence. Table 3 presents studies of grades and other measures, including many non-cognitive variables. In the earliest study of this type, Sobel (1936) found that students with high grades and low test scores had outstanding penmanship, attendance, punctuality, and effort marks, and their teachers rated them high in industry, perseverance, dependability, co-operation, and ambition. Similarly, Miner (1967) factor analyzed longitudinal data for a sample of students, including their grades in first, third, sixth, ninth, and twelfth grade; achievement tests in fifth, sixth, and ninth grades; and citizenship grades in first, third, and sixth grades. She identified a three-factor solution: (a) objective achievement as measured through standardized assessments, (b) early classroom citizenship (a behavior factor), and (c) high school achievement as measured through grades, demonstrating that behavior and two types of achievement could be identified as separate factors.

Farkas, Grobe, Sheehan, and Shaun (1990) showed that student work habits were the strongest non-cognitive predictors of grades. They noted: "Most striking is the powerful effect of student work habits upon course grades… teacher judgments of student non-cognitive characteristics are powerful determinants of course grades, even when student cognitive performance is controlled" (p. 140). Likewise, Willingham et al. (2002), using large national databases, found a moderate relationship between grades and tests as well as strong positive relationships between grades and student motivation, engagement, completion of work assigned, and persistence. Relying on a theory of a conative factor of schooling—focusing on student interest, volition, and self-regulation (Snow, 1989)—the authors suggested that grades provide a useful assessment of both conative and cognitive student factors (Willingham et al., 2002).

Kelly (2008) countered a criticism of the conative factor theory of grades, namely that teachers may award grades based on students appearing engaged and going through the motions (i.e., a procedural form of engagement) as opposed to more substantive engagement involving legitimate effort and participation that leads to increased learning. He found positive and significant effects of students' substantive engagement on subsequent grades but no relationship with procedural engagement, noting "This finding suggests that most teachers successfully use grades to reward achievement-oriented behavior and promote a widespread growth in achievement" (Kelly, 2008, p. 45). Kelly also argued that misperceptions that teachers do not distinguish between apparent and substantive engagement lend mistaken support to the use of high-stakes tests as inherently more "objective" (p. 46) than teacher assessments.

Recent studies have expanded on this work, applying sophisticated methodologies. Bowers (2009, 2011) used multidimensional scaling to examine the relationship between grades and standardized test scores in each semester in high school, in both core subjects (mathematics, English, science, and social studies) and non-core subjects (foreign/non-English languages, art, and physical education). Bowers (2011) found evidence for a three-factor structure: (a) a cognitive factor that describes the relationship between tests and core subject grades, (b) a conative and engagement factor between core subject grades and non-core subject grades (termed a "Success at School Factor, SSF," p. 154), and (c) a factor that described the difference between grades in art and physical education. He also showed that teachers' assessment of students' ability to negotiate the social processes of schooling represents much of the variance in grades that is unrelated to test scores. This points to the importance of substantive engagement and persistence (Kelly, 2008; Willingham et al., 2002) as factors that help students in both core and non-core subjects. Subsequently, Duckworth et al. (2012) used structural equation modeling (SEM) with 510 New York City fifth through eighth graders to show that engagement and persistence are mediated through teacher evaluations of student conduct and homework completion.

Casillas and colleagues (2012) examined the interrelationship among grades, standardized assessment scores, and a range of psychosocial characteristics and behavior. Twenty-five percent of the explained variance in GPAs was attributable to the standardized assessments; the rest was predicted by a combination of prior grades (30%), psychosocial factors (23%), behavioral indicators (10%), demographics (9%), and school factors (3%). Academic discipline and commitment to school (i.e., the degree to which the student is hard working, conscientious, and effortful) had the strongest relationship to GPA.

A set of recent studies focused on the Swedish national context (Cliffordson, 2008; Klapp Lekholm, 2011; Klapp Lekholm & Cliffordson, 2008, 2009; Thorsen, 2014; Thorsen & Cliffordson, 2012), which is interesting because report cards are uniform throughout the country and require teachers to grade students using the same performance level scoring system used by the national exam. Klapp Lekholm and Cliffordson (2008) showed that grades consisted of two major factors: a cognitive achievement factor and a non-cognitive "common grade dimension" (p. 188).
In a follow-up study, Klapp Lekholm and Cliffordson (2009) reanalyzed the same data, examining the relationships between multiple student and school characteristics and both the cognitive and non-cognitive achievement factors. For the cognitive achievement factor of grades, student self-perception of competence, self-efficacy, coping strategies, and subject-specific interest were most important. In contrast, the most important student variables for the non-cognitive factor were motivation and a general interest in school. These SEM results were replicated across three full population-level cohorts in Sweden representing all 99,085 9th grade students in 2003, 105,697 students in 2004, and 108,753 in 2005 (Thorsen & Cliffordson, 2012), as well as in comparison to both norm-referenced and criterion-referenced grading systems, examining 3,855 students in Sweden (Thorsen, 2014). Klapp Lekholm and Cliffordson (2009) wrote:

The relation between general interest or motivation and the common grade dimension seems to recognize that students who are motivated often possess both specific and general goals and approach new phenomena with the goal of understanding them, which is a student characteristic awarded in grades (p. 19).

These findings, similar to those of Kelly (2008), Bowers (2009, 2011), and Casillas et al. (2012), support the idea that substantive engagement is an important component of grades that is distinct from the skills measured by standardized tests. A validity argument that expects grades and standardized tests to correlate highly therefore may not be sound, because the construct of school achievement is not fully defined by standardized test scores. Tested achievement represents one dimension of the results of schooling, privileging "individual cognition, pure mentation, symbol manipulation, and generalized learning" (Resnick, 1987, pp. 13-15).

Grades as predictors of educational outcomes. Table 4 presents studies of grades as predictors of educational outcomes. Teacher-assigned grades are well-known to predict graduation from high school (Bowers, 2014), as well as transition from high school to college (Atkinson & Geiser, 2009; Cliffordson, 2008). Satisfactory grades historically have been used as one of the means to grant students a high school diploma (Rumberger, 2011). Studies from the second half of the 20th century and into the 21st century, however, have focused on using grades from early grade levels to predict student graduation rate or risk of dropping out of school (Gleason & Dynarski, 2002; Pallas, 1989).

Early studies in this domain (Fitzsimmons, Cheever, Leonard, & Macunovich, 1969; Lloyd, 1974, 1978; Voss, Wendling, & Elliott, 1966) identified teacher-assigned grades as one of the strongest predictors of student risk for failing to graduate from high school. Subsequent studies included other variables such as absence and misbehavior and found that grades remained a strong predictor (Barrington & Hendricks, 1989; Cairns, Cairns, & Neckerman, 1989; Ekstrom, Goertz, Pollack, & Rock, 1986; Ensminger & Slusarcick, 1992; Finn, 1989; Hargis, 1990; Morris, Ehren, & Lenz, 1991; Rumberger, 1987; Troob, 1985). More recent research using a life course perspective showed that low or failing grades have a cumulative effect over a student's time in school and contribute to the eventual decision to leave (Alexander, Entwisle, & Kabbani, 2001; Jimerson, Egeland, Sroufe, & Carlson, 2000; Pallas, 2003; Roderick & Camburn, 1999).

Other research in this area considered grades in two ways: the influence of low grades (Ds and Fs) on dropping out, and the relationship of a continuous scale of grades (such as GPA) to at-risk status and eventual graduation or dropping out. Three examples are particularly notable. Allensworth and colleagues have shown that failing a core subject in ninth grade is highly correlated with dropping out of school, and thus places a student off track for graduation (Allensworth, 2013; Allensworth & Easton, 2005, 2007). Such failure also compromises the transition from middle school to high school (Allensworth, Gwynne, Moore, & de la Torre, 2014). Balfanz, Herzog, and MacIver (2007) showed a strong relationship between failing core courses in sixth grade and dropping out. Focusing on modeling conditional risk, Bowers (2010b) found the strongest predictor of dropping out after grade retention was having D and F grades.

Few studies, however, have focused on grades as the sole predictor of graduation or dropping out. Most studies instead examine longitudinal grade patterns, using either data mining techniques such as cluster analysis of all course grades K-12 (Bowers, 2010a) or mixture modeling techniques to identify growth patterns or decline in GPA in early high school (Bowers & Sprott, 2012). A recent review of the studies on the accuracy of dropout predictors showed that along with the Allensworth Chicago on-track indicator (Allensworth & Easton, 2007), longitudinal GPA trajectories were among the most accurate predictors identified (Bowers et al., 2013).

Teachers' Perceptions of Grading and Grading Practices
Systematic investigations of teachers' grading practices and perceptions about grading began to be published in the 1980s and were summarized in Brookhart's (1994) review of 19 empirical studies of teachers' grading practices, opinions, and beliefs. Five themes were supported. First, teachers use measures of achievement, primarily tests, as major determinants of grades.
Second, teachers believe it is important to grade fairly. Views of fairness included using multiple sources of information, incorporating effort, and making it clear to students what is assessed and how they will be graded. This suggests teachers consider school achievement to include the work students do in school, not just the final outcome. Third, in 12 of the studies teachers included non-cognitive factors in grades, including ability, effort, improvement, completion of work, and, to a small extent, other student behaviors. Fourth, grading practices are not consistent across teachers, either with respect to purpose or the extent to which non-cognitive factors are considered, reflecting differences in teachers' beliefs and values. Finally, grading practices vary by grade level. Secondary teachers emphasize achievement products, such as tests, whereas elementary teachers use informal evidence of learning along with achievement and performance assessments. Brookhart's (1994) review demonstrated an upswing in interest in investigating grading practices during this period, in which performance-based and portfolio classroom assessment was emphasized and reports of the unreliability of teachers' subjective judgments about student work also increased. The findings were in accord with policy-makers' increasing distrust of teachers' judgments about student achievement.

Teachers' reported grading practices. Empirical studies of teachers' grading practices over the past twenty years have mainly used surveys to document how teachers use both cognitive and non-cognitive evidence, primarily effort, and their own professional judgment in determining grades. Table 5 shows that most studies published since Brookhart's 1994 review document that teachers in different subjects and grade levels use "hodgepodge" grading (Brookhart, 1991, p. 36), combining achievement, effort, behavior, improvement, and attitudes (Adrian, 2012; Bailey, 2012; Cizek, Fitzgerald, & Rachor, 1995; Cross & Frary, 1999; Duncan & Noonan, 2007; Frary, Cross, & Weber, 1993; Grimes, 2010; Guskey, 2002, 2009b; Imperial, 2011; Liu, 2008a; Llosa, 2008; McMillan, 2001; McMillan & Lawson, 2001; McMillan, Myran, & Workman, 2002; McMillan & Nash, 2000; Randall & Engelhard, 2009, 2010; Russell & Austin, 2010; Sun & Cheng, 2013; Svennberg, Meckbach, & Redelius, 2014; Troug & Friedman, 1996; Yesbeck, 2011). Teachers often make grading decisions with little school or district guidance.

Teachers distinguish among non-achievement factors in grading. They view "academic enablers" (McMillan, 2001, p. 25), including effort, ability, work habits, attention, and participation, differently from other non-achievement factors, such as student personality and behavior. McMillan, consistent with earlier research, found that academic performance and academic enablers were by far the most important in determining grades. These findings have been replicated (Duncan & Noonan, 2007; McMillan et al., 2002). In a qualitative study, McMillan and Nash (2000) found that teaching philosophy and judgments about what is best for students' motivation and learning contribute to variability in grading practices, suggesting that an emphasis on effort, in particular, influences these outcomes. Randall and Engelhard (2010) found that teacher beliefs about what best supports students are important factors in grading, especially using non-cognitive factors for borderline grades, as Sun and Cheng (2013) also found with a sample of Chinese secondary teachers. These studies suggest that part of the reason for the multidimensional nature of grading reported in the previous section is that teachers' conceptions of "academic achievement" include behavior that supports and promotes academic achievement, and that teachers evaluate these behaviors as well as academic content in determining grades. These studies also showed significant variation among teachers within the same school. That is, the weight that different teachers give to separate factors can vary a great deal within a single elementary or secondary school (Cizek et al., 1995; Cross & Frary, 1999; Duncan & Noonan, 2007; Guskey, 2009b; Troug & Friedman, 1996; U.S. Department of Education, 1999; Webster, 2011).

Teacher perceptions about grading. Compared to the number of studies about teachers' grading practices, relatively few studies focus directly on perceptual constructs such as importance, meaning, value, attitudes, and beliefs. Several studies used Brookhart's (1994) suggestion that Messick's (1989) construct validity framework is a reasonable approach for investigating perceptions. This focuses on both the interpretation of the construct (what grading means) and the implications and consequences of grading (the effect it has on students). Sun and Cheng (2013) used this conceptual framework to analyze teachers' comments about their grading and the extent to which values and consequences were considered. The results showed that teachers interpreted good grades as a reward for accomplished work, based on both effort and quality, student attitude toward achievement as reflected by homework completion, and progress in learning. Teachers indicated the need for fairness and accuracy, not just accomplishment, saying that grades are fairer if they are lowered for lack of effort or participation, and that grading needs to be strict for high achievers. Teachers also considered consequences of grading decisions for students' future success and feelings of competence.

Fairness in an individual sense is a theme in several studies of teacher perceptions of grades (Bonner & Chen, 2009; Grimes, 2010; Hay & MacDonald, 2008; Kunnath, 2016; Sun & Cheng, 2013; Svennberg et al., 2014; Tierney, Simon, & Charland, 2011). Teachers perceive grades to have value according to what they can do for individual students.
Table 5
Study | Method | Sample | Findings

Frary, Cross, and Weber (1993) | Survey; descriptive | 536 secondary teachers | Up to 70% of teachers agreed that ability, effort, and improvement should be used for grading.

Grimes (2010) | Survey; descriptive | 199 middle school teachers | Grades should be based on both achievement and non-achievement factors, including improvement, mastery, and effort.

Guskey (2002) | Survey; descriptive | 94 elementary and 112 secondary teachers | 70% of teachers reported an ideal grade distribution of 41% As, 29% Bs, and 19% Cs, but with significant variation. Teachers wanted students to obtain the highest grade possible. The highest ranked purpose was to communicate to parents, then to use as feedback to students. Multiple factors were used to determine grades, including homework, effort, and progress.

Guskey (2009b) | Survey; descriptive | 513 elementary and secondary teachers | Significant variation in grading practices and issues were reported. Most agreed learning occurs without grading. 50% averaged multiple scores to determine grades. 73% based grades on criteria, not norms. Grades were used for communication with students and parents.

Hay and MacDonald (2008) | Interviews and observations | Two high school teachers | Teachers' values and experience influenced internalization of criteria important for grading, resulting in varied practices.

Imperial (2011) | Survey; descriptive | 411 high school teachers | Teachers reported a wide variety of grading practices; whereas the primary purpose was to indicate achievement, about half used non-cognitive factors. Grading was unrelated to training received in recommended grading practices.

Kunnath (2016) | Mixed methods | 251 high school teachers | Teachers used both objective achievement results and subjective factors in grading. Teachers incorporated individual circumstances to promote the highest grades possible. Grading was based on teachers' philosophy of teaching.
Liu (2008a) | Survey; multivariate analyses | 52 middle and 55 high school teachers | Most teachers used effort, ability, and attendance/participation in grading, with few differences between grade levels: 40% used classroom behavior, 90% used effort, 65% used ability, and 75% used attendance/participation.

Liu (2008b) | Survey; factor analysis | 300 middle and high school teachers | Six components in grading were confirmed: importance/value; feedback for motivation, instruction, and improvement; effort/participation; ability and problem solving; comparisons/extra credit; and grading self-efficacy/ease/confidence/accuracy.

Llosa (2008) | Survey; factor analysis; verbal protocol analysis | 1,224 elementary teachers | While showing variations in interpreting English proficiency standards, teachers' grading supported valid summative judgments though weak formative use for improving instruction. Teachers incorporated student personality and behavior in grading.

McMillan (2001) | Survey; descriptive; factor analysis | 1,483 middle and high school teachers | Significant variation in weight given to different factors, with a high percentage of teachers using non-cognitive factors. Four components of grading were identified: academic enabling non-cognitive factors, achievement, external comparisons, and use of extra credit, with significant variation among teachers.

McMillan and Lawson (2001) | Survey; descriptive | 213 secondary science teachers | Teachers reported use of both cognitive and non-cognitive factors in grading, especially effort.

McMillan, Myran, and Workman (2002) | Survey; factor analysis | 901 elementary school teachers | Five components were confirmed, including academic enablers such as improvement and effort, extra credit, achievement, homework, and external comparisons. 70% indicated use of effort, improvement, and ability. No differences between math and language arts teachers. High variability in how much different factors are weighted.

McMillan and Nash (2000) | Interviews | 24 elementary and secondary math and English teachers | Found that teaching philosophy and student effort that improves motivation and learning were very important considerations for grading.

Randall and Engelhard (2009) | Survey; scenarios; descriptive; Rasch modeling | 800 elementary, 800 middle, and 800 high school teachers | Achievement was the most important factor; effort and behavior provided as feedback; little emphasis on ability.

Randall and Engelhard (2010) | Survey; scenarios; descriptive | 79 elementary, 155 middle, and 108 high school teachers | Achievement was the most important factor; use of effort and classroom behavior for borderline cases.

Russell and Austin (2010) | Survey; descriptive | 352 secondary music teachers | Non-cognitive factors, such as performance/skill, attendance/participation, attitude, and practice/effort, weighted as much as or more than achievement. In high school there was a greater emphasis on attendance; in middle school, more on practice.

Simon, Tierney, Forgette-Giroux, Charland, Noonan, and Duncan (2010) | Case study | One high school math teacher | Found standardized grading policies conflicted with professional judgments.

Sun and Cheng (2013) | Survey; scenarios; descriptive | 350 English language secondary teachers | Found emphasis on individualized use of grades for motivation and extensive use of non-cognitive factors and fairness, especially for borderline grades and for encouragement and effort attributions to benefit students. Teachers placed more emphasis on non-achievement factors, such as effort, homework, and study habits, than on achievement.

Svennberg, Meckbach, and Redelius (2014) | Interviews | Four physical education teachers | Identified knowledge/skills, motivation, confidence, and interaction with others as important factors.

Tierney, Simon, and Charland (2011) | Mixed methods | 77 high school math teachers | Most teachers believed in fair grading practices that stressed improvement, with little emphasis on attitude, motivation, or participation, with differences individualized to students. Effort was considered for borderline grades.

Troug and Friedman (1996) | Mixed methods | 53 high school teachers | Found significant variability in grading practices and use of both achievement and non-achievement factors.

Webster (2011) | Mixed methods | 42 high school teachers | Teachers reported multiple purposes and inconsistent practices while showing a clear desire to focus most on achievement consistent with standards.

Wiley (2011) | Survey; scenarios; descriptive | 15 high school teachers | Teachers varied in how much non-achievement factors were used for grading. Found greater emphasis on non-achievement factors, especially effort, for low ability or low achieving students.

Yesbeck (2011) | Interviews | 10 middle school language arts teachers | Found that a multitude of both achievement and non-achievement factors were included in grading.
Many teachers use their understanding of individual student circumstances, their instructional experience, and perceptions of equity, consistency, accuracy, and fairness to make professional judgments, instead of solely relying on a grading algorithm. This suggests that grading practices may vary within a single classroom, just as they do between teachers, and that this is valued, at least by some teachers, as a needed element of accurate, fair grading, not a problem. In contrast, Simon et al. (2010) reported in a case study of one high school mathematics teacher in Canada that standardized grading policy often conflicted with professional judgment and had a significant impact on determining students' final grades. This reflects the impact of policy in that country, an important contextual influence.

Some researchers (Liu, 2008b; Liu, O'Connell, & McCoach, 2006; Wiley, 2011) have developed scales to assess teachers' beliefs and attitudes about grading, including items that load on importance, usefulness, effort, ability, grading habits, and perceived self-efficacy of the grading process. These studies have corroborated the survey and interview findings about teachers' beliefs in using both cognitive and non-cognitive factors in grading.

Guskey (2009b) found differences between elementary and secondary teachers in their perspectives about the purposes of grading. Elementary teachers were more likely to view grading as a process of communication with students and parents and to differentiate grades for individual students. Secondary teachers believed that grading served a classroom control and management function, emphasizing student behavior and completion of work.

In short, findings from the limited number of studies on teacher perceptions of grading are largely consistent with findings from grading practice surveys. Some studies have successfully explored the basis for practices and show that teachers view grading as a means to have fair, individualized, positive impacts on students' learning and motivation, and to a lesser extent, classroom control. Together, the research on grading practices and perceptions suggests the following four clear and enduring findings. First, teachers idiosyncratically use a multitude of achievement and non-achievement factors in their grading practices to improve learning and motivation as well as to document academic performance. Second, student effort is a key element in grading. Third, teachers advocate for students by helping them achieve high grades. Finally, teacher judgment is an essential part of fair and accurate grading.

Standards-Based Grading
SBG recommendations emphasize communicating student progress in relation to grade-level standards (e.g., adding fractions, computing area) that describe performance using ordered categories (e.g., below basic, basic, proficient, advanced), and involve separate reporting of work habits and behavior (Brookhart, 2011; Guskey, 2009a; Guskey & Bailey, 2001, 2010; Marzano & Heflebower, 2011; McMillan, 2009; Melograno, 2007; Mohnsen, 2013; O'Connor, 2009; Scriffiny, 2008; Shippy, Washer, & Perrin, 2013; Wiggins, 1994). SBG is differentiated from standardized grading, which provides teachers with uniform grading procedures in an attempt to improve consistency in grading methods, and from mastery grading, which expresses student performance on a variety of skills using a binary mastered/not mastered scale (Guskey & Bailey, 2001). Some also assert that SBG can provide exceptionally high-quality information to parents, teachers, and students, and that SBG therefore has the potential to bring about instructional improvements and larger educational reforms. Some urge caution, however. Cizek (2000), for example, warned that SBG may be no better than other reporting formats and subject to the same misinterpretations as other grading scales.

Literature on SBG implementation recommendations is extensive, but empirical studies are few. Studies of SBG to date have focused mostly on the implementation of SBG reforms and the relationship of standards-based grades to state achievement tests designed to measure the same or similar standards. One study investigated student, teacher, and parent perceptions of SBG. Table 6 presents these studies.

Implementation of SBG. Schools, districts, and teachers have experienced difficulties in implementing SBG (Clarridge & Whitaker, 1994; Cox, 2011; Hay & MacDonald, 2008; McMunn, Schenck, & McColskey, 2003; Simon et al., 2010; Tierney et al., 2011). The understanding and support of teachers, parents, and students is key to successful implementation of SBG practices, especially grading on standards and separating achievement grades from learning skills (academic enablers). Although many teachers report that they support such grading reforms, they also report using practices that mix effort, improvement, or motivation with academic achievement (Cox, 2011; Hay & MacDonald, 2008; McMunn et al., 2003). Teachers also vary in implementing SBG practices (Cox, 2011), especially in the use of common assessments, minimum grading policies, accepting work late with no penalty, and allowing students to retest and replace poor scores with retest scores.

The previous section summarized two studies of grading practices in Ontario, Canada, which adopted SBG province-wide and required teachers to grade students on specific topics within each content area using percentage grades. Simon et al. (2010) identified tensions between provincial grading policies and one teacher's practice. Tierney and colleagues (2011) found that few teachers were aware of and applying provincial SBG policies.
Table 6
were aware of and applying provincial SBG policies. This found stronger SBG-test correlations in mathematics than
is consistent with McMunn and colleagues’ (2003) in reading or writing, and grades tended to be higher than
findings, which showed that changes in grading practice do test scores, with the exception of writing scores at some
not necessarily follow after changes in grading policy. grade levels.
SBG as a communication tool. Swan, Guskey, and Jung Grading in Higher Education
(2010, 2014) found that parents, teachers, and students Grades in higher education differ markedly among
preferred SBG over traditional report cards, with teachers countries. As a case in point, four dramatic differences
considering adopting SBG having the most favorable exist between the U.S. and New Zealand. First, grading
attitudes. Teachers implementing SBG reported that it practices are much more centralized in New Zealand where
took longer to record the detailed information included in grading is fairly consistent across universities and highly
the SBG report cards but felt the additional time was consistent within universities. Second, the grading scale
worthwhile because SBGs yielded higher-quality starts with a passing score of 50 percent, and 80 percent
information. An earlier informal report by Guskey (2004) and above score an A. Third, essay testing is more
found, however, that many parents attempted to interpret prevalent in New Zealand than multiple choice testing.
nearly all labels (e.g., below basic, basic, proficient, Fourth, grade distributions are reviewed and grades of
advanced) in terms of letter grades. It may be that a individual instructors are considered each semester at
decade of increasing familiarity with SBG has changed departmental-level meetings. These are at best rarities in
perceptions of the meaning and usefulness of SBG. higher education in the U.S.
Relationship of SBGs to high-stakes test scores. One An examination of 35 country and university websites
might expect consistency between SBGs and standards- paints a broad picture of the diversity in grading practices.
based assessment scores because they purport to measure Many countries use a system like that in New Zealand, in
the same standards. Eight papers examined this which 50 or 51 is the minimal passing score, and 80 and
consistency (Howley, Kusimo, & Parrott, 1999; Klapp above (sometimes 90 and above) is considered A level
Lekholm, 2011; Klapp Lekholm & Cliffordson, 2008, performance. Many countries also offer an E grade, which
2009; Ross & Kostuch, 2011; Thorsen & Cliffordson, is sometimes a passing score and other times indicates a
2012; Welsh & D’Agostino, 2009; Welsh, D’Agostino, & failure less egregious than an F. If 50 percent is considered
Kaniskan, 2013). All yielded essentially the same results: passing, then skepticism toward multiple choice testing
SBGs and high-stakes, standards-based assessment scores (where there is often a 1 in 4 chance of a correct guess)
were only moderately related. Howley et al. (1999) found becomes understandable. In the Netherlands, a 1 (lowest)
that 50 percent of the variance in GPA could be explained to 10 (highest) system is used, with grades 1–3 and 9–10
by standards-based assessment scores, and the magnitude rarely awarded, leaving a five-point grading system for
of the relationship varied by school. Interview data most students (Nuffic, 2013). In the European Union,
revealed that even in SBG settings, some teachers still differences between countries are so substantial that the
included non-cognitive factors (e.g., attendance and European Credit Transfer and Accumulation System was
participation) in grades. This may explain the modest created (European Commission, 2009).
relationship, at least in part.
Grading in higher education varies within countries, as
Welsh and D’Agostino (2009) and Welsh et al. (2013) well. In the U.S., it is typically seen as a matter of
developed an Appraisal Scale that gauged teachers’ efforts academic freedom and not a fit subject for external
to assess and grade students on standards attainment. This intervention. Indeed, in an analysis of the American
10-item measure focused on the alignment of assessments Association of Collegiate Registrars and Admissions
with standards and on the use of a clear, standards- Officers (AACRAO) survey of grading in higher education
attainment focused grading method. They found small to in the U.S., Collins and Nickel (1974) reported “…there
moderate correlations between this measure and grade-test are as many different types of grading systems as there are
score convergence. That is, the standards-based grades of institutions” (p. 3). The 2004 version of the same survey
teachers who utilized criterion-referenced achievement suggested, however, a somewhat more settled situation in
information were more related to standards-based recent years (Brumfield, 2005). Grading in higher
assessments than were the grades of teachers who do not education shares many issues of grade meaning with the K-
follow this practice. Welsh and D’Agostino (2009) and 12 context, which have been addressed above. Two unique
Welsh et al. (2013) found that SBG-test score relationships issues for grade meaning remain: grading and student
were larger in writing and mathematics than in reading. In course evaluations, and historical changes in expected
addition, although teachers assigned lower grades than test grade distributions. Table 7 presents studies in these areas.
scores in mathematics, grades were higher than test scores
in reading and writing. Ross and Kostuch (2011) also
Table 7
Kasten and Young (1983). Design: Experimental. Sample: 77 graduate students in 5 educational administration classes. Findings: Random assignment to 3 purposes for the course evaluation (personal decision, instructor's use, or no purpose stated) yielded no significant differences in ratings.
Kulick and Wright (2008). Design: Monte Carlo simulation. Sample: Series of simulations based on 400 students. Findings: Normal distributions of test scores do not necessarily provide evidence of the efficacy of the evaluation or of the quality of the test.
Maurer (2006). Design: Experimental. Sample: 642 students in 17 (unspecified) classes taught by the same instructor. Findings: Students were randomly assigned to 3 conditions (personnel decision, course improvement, or control group) and asked for expected grades; expected grade was related to course evaluations, but the stated purpose of the evaluation was not.
Mayo (1970). Design: Survey. Sample: 3 instructors of an undergraduate introductory measurement course. Findings: In a mastery learning context, active participation with course material appeared to be superior to only doing the reading and receiving lectures.
Nicolson (1917). Design: Survey. Sample: 64 colleges approved by the Carnegie Foundation. Findings: 36 of the colleges used a 5-division marking scale for grading purposes.
Salmons (1993). Design: Non-experimental. Sample: 444 introductory psychology students from Radford University. Findings: Students were given a course evaluation prior to the first exam and again after receiving their final grades. From pre to post, students anticipating a low grade lowered their evaluation of the course and students anticipating a high grade raised their evaluation of the course.
Smith and Smith (2009). Design: Experimental. Sample: 240 introductory psychology students. Findings: Students were randomly assigned to 1 of 3 approaches to university grading: a 100-point system, a percentage system, and an open point system. Significant differences were found for motivation, confidence, and effort, but not for perceptions of achievement or accuracy.
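The Monte Carlo entry in Table 7 may be easier to picture with a small illustration. The sketch below is not Kulick and Wright's (2008) code or design; it is a minimal example under assumed parameters (400 simulated students, fixed grade quotas of 3/22/50/22/3 percent, and two hypothetical tests that differ only in measurement error), intended only to show why forcing grades onto a curve produces a tidy grade distribution regardless of how well the underlying scores measure ability.

```python
# Minimal, illustrative Monte Carlo sketch of "grading on the curve."
# Assumptions (not Kulick & Wright's, 2008, actual design): 400 students,
# fixed grade quotas of 3/22/50/22/3 percent, and two hypothetical tests
# that differ only in how much measurement error they add to true ability.
import numpy as np

rng = np.random.default_rng(1)
n_students = 400
quotas = [0.03, 0.22, 0.50, 0.22, 0.03]   # F, D, C, B, A shares of the class
cut_points = np.cumsum(quotas)[:-1]       # cumulative boundaries: .03, .25, .75, .97

def curved_grades(scores):
    """Assign 0 (F) through 4 (A) by rank so the class matches the fixed quotas."""
    percentile = scores.argsort().argsort() / (len(scores) - 1)
    return np.digitize(percentile, cut_points)

true_ability = rng.normal(70, 10, n_students)
reliable_test = true_ability + rng.normal(0, 2, n_students)     # little error
unreliable_test = true_ability + rng.normal(0, 15, n_students)  # much error

for label, scores in [("reliable", reliable_test), ("unreliable", unreliable_test)]:
    grades = curved_grades(scores)
    shares = np.bincount(grades, minlength=5) / n_students
    r = np.corrcoef(scores, true_ability)[0, 1]
    print(f"{label:>10} test: grade shares {np.round(shares, 2)}, "
          f"score-ability correlation r = {r:.2f}")
# Both tests yield roughly normal scores and identical curved grade
# distributions, even though one measures ability far less accurately.
```

Under these assumptions, both simulated tests produce normal-looking scores and the same curved grade distribution, even though one correlates with ability at roughly .98 and the other at roughly .55; this is the sense in which a normal distribution of scores is not, by itself, evidence of the quality of the evaluation.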
Grades and student course evaluations. Students in higher education routinely evaluate the quality of their course experiences and their instructors' teaching. The relationship between course grades and course evaluations has been of interest for at least 40 years (Abrami, Dickens, Perry, & Leventhal, 1980; Holmes, 1972) and is a sub-question in the general research about student evaluations of courses (e.g., Centra, 1993; Marsh, 1984, 1987; McKeachie, 1979; Spooren, Brockx, & Mortelmans, 2013). The hypothesis is straightforward: students will give higher course evaluations to faculty who are lenient graders. This grade-leniency theory (Love & Kotchen, 2010; McKenzie, 1975) has long been lamented, particularly by faculty who perceive themselves as rigorous graders and do not enjoy favorable student evaluations. This assumption is so prevalent that it is close to accepted as settled science (Ginexi, 2003; Marsh, 1987; Salmons, 1993). Ginexi posited that the relationship between anticipated grades and course evaluation ratings could be a function of cognitive dissonance (between the student's self-image and an anticipated low grade) or of revenge theory (retribution for an anticipated low grade). Although Maurer (2006) argued that revenge theory is popular among faculty receiving low course evaluations, neither his study nor an earlier study by Kasten and Young (1983) found this to be the case. These authors therefore argued for the cognitive dissonance model, in which attributing poor teaching to the perceived lack of student success is an intrapersonal face-saving device.

A critical look at the literature presents an alternative argument. First, the relationship between anticipated grades and course evaluation ratings is moderate at best. Meta-analytic work (Centra & Creech, 1976; Feldman, 1997) suggests correlations between .10 and .30, or that anticipated grades account for less than 10 percent of the variance in course evaluations. It therefore appears that anticipated grades have little influence on student evaluations. Second, the relationship between anticipated grades and course evaluations could simply reflect an honest assessment of students' opinions of instruction, which varies according to the students' experiences of the course (Smith & Smith, 2009). Students who like the instructional approach may be expected to do better than students who do not. Students exposed to exceptionally good teaching might be expected to do well in the course and to rate the instruction highly (and vice versa for poor instruction). Although face-saving or revenge might occur, a fair amount of honest and accurate appraisal of the quality of teaching might be reflected in the observed correlations.

Historical changes in expectations for grade distributions. The roots of grading in higher education can be traced back hundreds of years. In the 16th century, Cambridge University developed a three-tier grading system with 25 percent of the grades at the top, 50 percent in the middle, and 25 percent at the bottom (Winter, 1993). Working from European models, American universities invented systems for ranking and categorizing students based both on academic performance and on progress, conduct, attentiveness, interest, effort, and regular attendance at class and chapel (Cureton, 1971; Rugg, 1918; Schneider & Hutt, 2014). Grades were ubiquitous at all levels of education at the turn of the 20th century, but were idiosyncratically determined (Schneider & Hutt, 2014), as described earlier.

To resolve inconsistencies, educators turned to the new science of statistics and a concomitant passion for measuring and ranking human characteristics (Pearson, 1930). Inspired by the work of his cousin, Charles Darwin, Francis Galton pioneered the field of psychometrics, extending his efforts to rank one's fitness to produce high-quality offspring on an A to D scale (Galton & Galton, 1998). Educators began to debate how normal curve theory and other scientific advances should be applied to grading. As with K–12 education, the consensus was that the 0–100 marking system led to an unjustified implication of precision, and that the normal curve would allow for transformation of student ranks into A-F or other categories (Rugg, 1918).

Meyer (1908) argued for grade categories as follows: excellent (3 percent of students), superior (22 percent), medium (50 percent), inferior (22 percent), and failure (3 percent). He argued that a student picked at random is as likely to be of medium ability as not. Interestingly, Meyer's terms for the middle three grades (superior, medium, and inferior) are norm-referenced, whereas the two extreme grades (excellent and failure) are criterion-referenced. Roughly a decade later, Nicolson (1917) found that 36 out of 64 colleges were using a 5-point scale for grading, typically A–F. The questions debated at the time were more over the details of such systems than over the overall approach. As Rugg (1918) stated:

Now the term inherited capacity practically defines itself. By it we mean the "start in life;" the sum total of nervous possibilities which the infant has at birth and to which, therefore, nothing that the individual himself can do will contribute in any way whatsoever. (p. 706)

Rugg went on to say that educational conditions interact with inherited capacity, resulting in what he called "ability-to-do" (p. 706). He recommended basing teachers' marks on observations of students' performance that reflect those abilities and proposed that grades should form a normal distribution. That is, the normal distribution should form a basis for checking the quality of the grades that teachers assign. This approach reduces grading to determining the number of grading divisions and the number of students who should fall into each category. Thus, there is a shift from a decentralized and fundamentally haphazard approach to assigning grades to one that is based on "scientific" (p. 701) principle. Furthermore, Rugg argued that letter grades were preferable to percentage grades as they more accurately represented the level of precision that was possible.

Another interesting aspect of Rugg's (1918) and Meyer's (1908) work is the notion that grades should simply be a method of ranking students, and not necessarily used for making decisions about achievement. Although Meyer argued that three percent should fail a typical course (and he feared that people would see this as too lenient), he was less certain about what to do with the "inferior" group, stating that grades should solely represent a student's rank in the class. In hindsight, these approaches seem reductionist at best. Although the notion of grading "on the curve" remained popular through at least the early 1960s, a categorical (A-F) approach to assigning grades was implemented. This system tended to mask the curve-based thinking behind it: graders kept a close eye on the notion that not too many As nor too many Fs were handed out (Guskey, 2000; Kulick & Wright, 2008). The normal curve was the "silent partner" of the grading system.

In the U.S. in the 1960s, a confluence of technical and societal events led to dramatic changes in perspectives about grading. These were criterion-referenced testing (Glaser, 1963), mastery learning and mastery testing (Bloom, 1971; Mayo, 1970), the Civil Rights movement, and the war in Vietnam. Glaser brought forth the innovative idea that sense should be made out of test performance by "referencing" performance not to a norming group, but rather to the domain whence the test came; students' performance should not be based on the performance of their peers. The proper referent, according to Glaser, was the level of mastery of the subject matter being assessed. Working from Carroll's model of school learning (Carroll, 1963), Bloom developed the underlying argument for mastery learning theory: that achievement in any course (and by extension, the grade received) should be a function of the quality of teaching, the perseverance of the student, and the time allowed for the student to master the material (Bloom, 1971; Guskey, 1985).

It was not the case that the work of Bloom (1971) and Glaser (1963) single-handedly changed how grading took place in higher education, but ideas about teaching and learning partially inspired by this work led to a substantial rethinking of the proper aims of education. Add to this mix a national reexamination of status and equity, and the time was ripe for a humanistic and social reassessment of grading and learning in general. The final ingredient in the mix was the war in Vietnam. The U.S. had its first conscription since World War II, and as the war grew increasingly unpopular, so did the pressure on professors not to fail students and make them subject to the draft. The effect of the draft on grading practices in higher education is unmistakable (Rojstaczer & Healy, 2012). The proportion of A and B grades rose dramatically during the years of the draft; the proportion of D and F grades fell concomitantly.

Grades have risen again dramatically in the past 25 years. Rojstaczer and Healy (2012) argued that this resulted from new views of students as consumers, or even customers, and away from viewing students as needing discipline. Others have contended that faculty inflate grades to vie for good course ratings (the grade-leniency theory; Love & Kotchen, 2010). Or, perhaps students are higher-achieving than they were and deserve better grades.

Discussion: What Do Grades Mean?
This review shows that over the past 100 years teacher-assigned grades have been maligned by researchers and psychometricians alike as subjective and unreliable measures of student academic achievement (Allen, 2005; Banker, 1927; Carter, 1952; Evans, 1976; Hargis, 1990; Kirschenbaum et al., 1971; Quann, 1983; Simon & Bellanca, 1976). However, others have noted that grades are a useful indicator of numerous factors that matter to students, teachers, parents, schools, and communities (Bisesi, Farr, Greene, & Haydel, 2000; Folzer-Napier, 1976; Linn, 1982). Over the past 100 years, research has attempted to identify the different components of grades in order to inform educational decision making (Bowers, 2009; Parsons, 1959). Interestingly, although standardized assessment scores have been shown to have low criterion validity for overall schooling outcomes (e.g., high school graduation and admission to post-secondary institutions), grades consistently predict K-12 educational persistence, completion, and transition from high school to college (Atkinson & Geiser, 2009; Bowers et al., 2013).

One hundred years of quantitative studies of the composition of K-12 report card grades demonstrate that teacher-assigned grades represent both the cognitive knowledge measured in standardized assessment scores and, to a smaller extent, non-cognitive factors such as substantive engagement, persistence, and positive school behaviors (e.g., Bowers, 2009, 2011; Farkas et al., 1990; Klapp Lekholm & Cliffordson, 2008, 2009; Miner, 1967; Willingham et al., 2002). Grades are useful in predicting and identifying students who may face challenges in either the academic component of schooling or in the socio-behavioral domain (e.g., Allensworth, 2013; Allensworth & Easton, 2007; Allensworth et al., 2014; Atkinson & Geiser, 2009; Bowers, 2014).

The conclusion is that grades typically represent a mixture of multiple factors that teachers value. Teachers recognize the important role of effort in achievement and motivation (Aronson, 2008; Cizek et al., 1995; Cross & Frary, 1999; Duncan & Noonan, 2007; Guskey, 2002, 2009b; Imperial, 2011; Kelly, 2008; Liu, 2008a; McMillan, 2001; McMillan & Lawson, 2001; McMillan et al., 2002; McMillan & Nash, 2000; Randall & Engelhard, 2009, 2010; Russell & Austin, 2010; Sun & Cheng, 2013; Svennberg et al., 2014; Troug & Friedman, 1996; Yesbeck, 2011). They differentiate academic enablers (McMillan, 2001, p. 25) like effort, ability, improvement, work habits, attention, and participation, which they endorse as relevant to grading, from other student characteristics like gender, socioeconomic status, or personality, which they do not endorse as relevant to grading.

This quality of graded achievement as a multidimensional measure of success in school may be what makes grades better predictors of future success in school than tested achievement (Atkinson & Geiser, 2009; Barrington & Hendricks, 1989; Bowers, 2014; Cairns et al., 1989; Cliffordson, 2008; Ekstrom et al., 1986; Ensminger & Slusarcick, 1992; Finn, 1989; Fitzsimmons et al., 1969; Hargis, 1990; Lloyd, 1974, 1978; Morris et al., 1991; Rumberger, 1987; Troob, 1985; Voss et al., 1966), especially given known limitations of achievement testing (Nichols & Berliner, 2007; Polikoff, Porter, & Smithson, 2011). In the search for assessments of non-cognitive factors that predict educational outcomes (Heckman & Rubinstein, 2001; Levin, 2013), grades appear to be useful. Current theories postulate that both cognitive and non-cognitive skills are important to acquire and build over the course of life. Although non-cognitive skills may help students to develop cognitive skills, the reverse is not true (Cunha & Heckman, 2008).

Teachers' values are a major component in this multidimensional measure. Besides academic enablers, two other important teacher values work to make graded achievement different from tested achievement. One is the value that teachers place on being fair to students (Bonner, 2016; Bonner & Chen, 2009; Brookhart, 1994; Grimes, 2010; Hay & MacDonald, 2008; Sun & Cheng, 2013; Svennberg et al., 2014; Tierney et al., 2011). In their concept of fairness, most teachers believe that students who try should not fail, whether or not they learn. Related to this concept is teachers' wish to help all or most students be successful (Bonner, 2016; Brookhart, 1994).

Grades, therefore, must be considered multidimensional measures that reflect mostly achievement of classroom learning intentions and also, to a lesser degree, students' efforts at getting there. Grades are not unidimensional measures of pure achievement, as has been assumed in the past (e.g., Carter, 1952; McCandless et al., 1972; Moore, 1939; Ross & Hooks, 1930) or recommended in the present (e.g., Brookhart, 2009, 2011; Guskey, 2000; Guskey & Bailey, 2010; Marzano & Hefflebower, 2011; O'Connor, 2009; Scriffiny, 2008). Although measurement experts and professional developers may wish grades were unadulterated measures of what students have learned and are able to do, strong evidence indicates that they are not.

For those who wish grades could be a more focused measure of achievement of intended instructional outcomes, future research needs to cast a broader net. The value teachers attach to effort and other academic enablers in grades and their insistence that grades should be fair point to instructional and societal issues that are well beyond the scope of grading. Why, for example, do some students who sincerely try to learn what they are taught not achieve the intended learning outcomes? Two important possibilities include intended learning outcomes that are developmentally inappropriate for these students (e.g., these students lack readiness or prior instruction in the domain), and poorly designed lessons that do not make clear what students are expected to learn, do not instruct students in appropriate ways, and do not arrange learning activities and formative assessments in ways that help students learn well. Research focusing solely on grades typically misses antecedent causes. Future research should make these connections. For example, does more of the variance in grades reflect achievement in classes where lessons are high-quality and appropriate for students? Is a negatively skewed grade distribution, where most students achieve and very few fail, effective for the purposes of certifying achievement, communicating with students and parents, passing students to the next grade, or predicting future educational success? Do changes in instructional design lead to changes in grading practices, in grade distributions, and in the usefulness of grades as predictors of future educational success?

This review suggests that most teachers' grades do not yield a pure achievement measure, but rather a multidimensional measure dependent on both what the students learn and how they behave in the classroom. This conclusion, however, does not excuse low-quality grading practices or suggest there is no room for improvement. One hundred years of grading research have generally confirmed large variation among teachers in the validity and reliability of grades, both in the meaning of grades and the accuracy of reporting.

Early research found great variation among teachers when asked to grade the same examination or paper. Many of these early studies communicated a "what's wrong with teachers" undertone that today would likely be seen as researcher bias. Early researchers attributed sources of variation in teachers' grades to one or more of the following sources: criteria (Ashbaugh, 1924; Brimi, 2011; Healy, 1935; Silberstein, 1922; Sims, 1933; Starch, 1915; Starch & Elliott, 1913a,b), students' work quality (Bolton, 1927; Healy, 1935; Jacoby, 1910; Lauterbach, 1928; Shriner, 1930; Sims, 1933), teacher severity/leniency (Shriner, 1930; Silberstein, 1922; Sims, 1933; Starch, 1915; Starch & Elliott, 1913b), task (Silberstein, 1922; Starch & Elliott, 1913a), scale (Ashbaugh, 1924; Sims, 1933; Starch, 1913, 1915), and teacher error (Brimi, 2011; Eells, 1930; Hulten, 1925; Lauterbach, 1928; Silberstein, 1922; Starch & Elliott, 1912, 1913a,b). Starch (1913; Starch & Elliott, 1913b) found that teacher error and emphasizing different criteria were the two largest sources of variation.

Regarding sources of error, Smith (2003) suggested reconceptualizing reliability for grades as a matter of sufficiency of information for making the grade assignment. This recommendation is consistent with the fact that as grades are aggregated from individual pieces of work to report card or course grades and grade-point averages, reliability increases. The reliability of overall college grade-point average is estimated at .93 (Beatty, Walmsley, Sackett, Kuncel, & Koch, 2015).

In most studies investigating teachers' grading reliability, teachers were sent examination papers without specific grading criteria and simply asked to assign grades. Today, this lack of clear grading criteria would be seen as a shortcoming in the assessment process. Most of these studies thus confounded teachers' inability to judge student work consistently with random error, considering both to be teacher error. Rater training offers a modern solution to this situation. Research has shown that with training on established criteria, individuals can judge examinees' work more accurately and reliably (Myford, 2012). Unfortunately, most teachers and professors today are not well trained, typically grade alone, and rarely seek help from colleagues to check the reliability of their grading. Thus, working toward clearer criteria, collaborating among teachers, and involving students in the development of grading criteria appear to be promising approaches to enhancing grading reliability.

Considering criteria as a source of variation in teachers' grading has implications for grade meaning and validity. The attributes upon which grading decisions are based function as the constructs the grades are intended to measure. To the extent teachers include factors that do not indicate achievement in the domain they intend to measure (e.g., when grades include consideration of format and surface-level features of an assignment), grades do not give students, parents, or other educators accurate information about learning. Furthermore, to the extent teachers do not appropriately interpret student work as evidence of learning, the intended meaning of the grade is also compromised. There is evidence that even teachers who explicitly decide to grade solely on achievement of learning standards sometimes mix effort, improvement, and other academic enablers when determining grades (Cox, 2011; Hay & McDonald, 2008; McMunn et al., 2003).

Future research in this area should seek ways to help teachers improve the criteria they use to grade, their skill at identifying levels of quality on the criteria, and their ability to effectively merge these assessment skills and instructional skills. When students are taught the criteria by which to judge high-quality work and are assessed by those same criteria, grade meaning is enhanced. Even if grades remain multidimensional measures of success in school, the dimensions on which grades are based should be defensible goals of schooling and should match students' opportunities to learn.

No research agenda will ever entirely eliminate teacher variation in grading. Nevertheless, the authors of this review have suggested several ways forward. Investigating grading in the larger context of instruction and assessment will help focus research on important sources and causes of invalid or unreliable grading decisions. Investigating ways to differentiate instruction more effectively, routinely, and easily will reduce teachers' feelings of pressure to pass students who may try but do not reach an expected level of achievement. Investigating the multidimensional construct of "success in school" will acknowledge that grades measure something significant that is not measured by achievement tests. Investigating ways to help teachers develop skills in writing or selecting and then communicating criteria, and recognizing these criteria in students' work, will improve the quality of grading. All of these seem reachable goals to achieve before the next century of grading research. All will assuredly contribute to enhancing the validity, reliability, and fairness of grading.

Suggested Citation Format:
Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A Century of Grading Research: Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86(4), 803–848.
doi: 10.3102/0034654316672069
http://doi.org/10.3102/0034654316672069

REFERENCES:
Abrami, P. C., Dickens, W. J., Perry, R. P., & Leventhal, L. (1980). Do teacher standards for assigning grades affect student evaluations of instruction? Journal of Educational Psychology, 72, 107–118. doi:10.1037/0022-0663.72.1.107
Adrian, C. A. (2012). Implementing standards-based grading: Elementary teachers' beliefs, practices and concerns (Doctoral dissertation). Retrieved from
Journal of the Royal Statistical Society, 51, 599–635.
Eells, W. C. (1930). Reliability of repeated grading of essay type examinations. Journal of Educational Psychology, 21, 48–52.
Ekstrom, R. B., Goertz, M. E., Pollack, J. M., & Rock, D. A. (1986). Who drops out of high school and why? Findings from a national study. Teachers College Record, 87, 356–373.
Ensminger, M. E., & Slusarcick, A. L. (1992). Paths to high school graduation or dropout: A longitudinal study of a first-grade cohort. Sociology of Education, 65, 91–113. doi:10.2307/2112677
European Commission. (2009). ECTS user's guide. Luxembourg, Belgium: Office for Official Publications of the European Communities. doi:10.2766/88064
Evans, F. B. (1976). What research says about grading. In S. B. Simon & J. A. Bellanca (Eds.), Degrading the grading myths: A primer of alternatives to grades and marks (pp. 30–50). Washington, DC: Association for Supervision and Curriculum Development.
Farkas, G., Grobe, R. P., Sheehan, D., & Shuan, Y. (1990). Cultural resources and school success: Gender, ethnicity, and poverty groups within an urban school district. American Sociological Review, 55, 127–142. doi:10.2307/2095708
Farr, B. P. (2000). Grading practices: An overview of the issues. In E. Trumbull & B. Farr (Eds.), Grading and reporting student progress in an age of standards (pp. 1–22). Norwood, MA: Christopher-Gordon.
Feldman, K. A. (1997). Identifying exemplary teachers and teaching: Evidence from student ratings. In R. P. Perry & J. C. Smart (Eds.), Effective teaching in higher education: Research and practice (pp. 93–143). New York, NY: Agathon Press.
Finn, J. D. (1989). Withdrawing from school. Review of Educational Research, 59, 117–142. doi:10.3102/00346543059002117
Fitzsimmons, S. J., Cheever, J., Leonard, E., & Macunovich, D. (1969). School failures: Now and tomorrow. Developmental Psychology, 1, 134–146. doi:10.1037/h0027088
Folzer-Napier, S. (1976). Grading and young children. In S. B. Simon & J. A. Bellanca (Eds.), Degrading the grading myths: A primer of alternatives to grades and marks (pp. 23–27). Washington, DC: Association for Supervision and Curriculum Development.
Frary, R. B., Cross, L. H., & Weber, L. J. (1993). Testing and grading practices and opinions of secondary teachers of academic subjects: Implications for instruction in measurement. Educational Measurement: Issues & Practice, 12(3), 23–30. doi:10.1111/j.1745-3992.1993.tb00539.x
Galton, D. J., & Galton, C. J. (1998). Francis Galton: and eugenics today. Journal of Medical Ethics, 24, 99–105.
Ginexi, E. M. (2003). General psychology course evaluations: Differential survey response by expected grade. Teaching of Psychology, 30, 248–251.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes: Some questions. American Psychologist, 18, 519. doi:10.1111/j.1745-3992.1994.tb00561.x
Gleason, P., & Dynarski, M. (2002). Do we know whom to serve? Issues in using risk factors to identify dropouts. Journal of Education for Students Placed at Risk, 7, 25–41. doi:10.1207/S15327671ESPR0701_3
Grimes, T. V. (2010). Interpreting the meaning of grades: A descriptive analysis of middle school teachers' assessment and grading practices (Doctoral dissertation). Retrieved from ProQuest. (305268025)
Grindberg, E. (2014, April 7). Ditching letter grades for a 'window' into the classroom. Cable News Network. Retrieved from http://www.cnn.com/2014/04/07/living/report-card-changes-standards-based-grading-schools/
Guskey, T. R. (1985). Implementing mastery learning. Belmont, CA: Wadsworth.
Guskey, T. R. (2000). Grading policies that work against standards…and how to fix them. NASSP Bulletin, 84(620), 20–29. doi:10.1177/019263650008462003
Guskey, T. R. (2002, April). Perspectives on grading and reporting: Differences among teachers, students, and parents. Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA.
Guskey, T. R. (2004). The communication challenge of standards-based reporting. Phi Delta Kappan, 86, 326–329. doi:10.1177/003172170408600419
Guskey, T. R. (2009a). Grading policies that work against standards… And how to fix them. In T. R. Guskey (Ed.), Practical solutions for serious problems in standards-based grading (pp. 9–26). Thousand Oaks, CA: Corwin.
Guskey, T. R. (2009b, April). Bound by tradition: Teachers' views of crucial grading and reporting issues. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.
Guskey, T. R., & Bailey, J. M. (2001). Developing grading and reporting systems for student learning. Thousand Oaks, CA: Corwin.
Guskey, T. R., & Bailey, J. M. (2010). Developing standards-based report cards. Thousand Oaks, CA: Corwin.
Guskey, T. R., Swan, G. M., & Jung, L. A. (2010, April). Developing a statewide, standards-based student report card: A review of the Kentucky initiative. Paper presented at the Annual Meeting of the American Educational Research Association, Denver, CO.
Hargis, C. H. (1990). Grades and grading practices: Obstacles to improving education and helping at-risk students. Springfield, MA: Charles C. Thomas.
Hay, P. J., & Macdonald, D. (2008). (Mis)appropriations of criteria and standards-referenced assessment in a performance-based subject. Assessment in Education: Principles, Policy & Practice, 15, 153–168. doi:10.1080/09695940802164184
Healy, K. L. (1935). A study of the factors involved in the rating of pupils' compositions. Journal of Experimental Education, 4, 50–53. doi:10.1080/00220973.1935.11009995
Heckman, J. J., & Rubinstein, Y. (2001). The importance of noncognitive skills: Lessons from the GED testing program. The American Economic Review, 91, 145–149. doi:10.2307/2677749
Hill, G. (1935). The report card in present practice. Educational Method, 15, 115–131.
Holmes, D. S. (1972). Effects of grades and disconfirmed grade expectancies on students' evaluations of their instructor. Journal of Educational Psychology, 63, 130–133.
Howley, A., Kusimo, P. S., & Parrott, L. (1999). Grading and the ethos of effort. Learning Environments Research, 3, 229–246. doi:10.1023/A:1011469327430
Hulten, C. E. (1925). The personal element in teachers' marks. Journal of Educational Research, 12, 49–55. doi:10.1080/00220671.1925.10879575
Imperial, P. (2011). Grading and reporting purposes and practices in Catholic secondary schools and grades' efficacy in accurately communicating student learning (Doctoral dissertation). Retrieved from ProQuest. (896956719)
Jacoby, H. (1910). Note on the marking system in the astronomical course at Columbia College, 1909–1910. Science, 31, 819–820. doi:10.1126/science.31.804.819
Jimerson, S. R., Egeland, B., Sroufe, L. A., & Carlson, B. (2000). A prospective longitudinal study of high school dropouts examining multiple predictors across development. Journal of School Psychology, 38, 525–549. doi:10.1016/S0022-4405(00)00051-0
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education/Praeger.
Kasten, K. L., & Young, I. P. (1983). Bias and the intended use of student evaluations of university faculty. Instructional Science, 12, 161–169. doi:10.1007/BF00122455
Kelly, F. J. (1914). Teachers' marks: Their variability and standardization (Contributions to Education No. 66). New York, NY: Teachers College, Columbia University.
Kelly, S. (2008). What types of students' effort are rewarded with high marks? Sociology of Education, 81, 32–52. doi:10.1177/003804070808100102
Kirschenbaum, H., Napier, R., & Simon, S. B. (1971). Wad-ja-get? The grading game in American education. New York, NY: Hart.
Klapp Lekholm, A. (2011). Effects of school characteristics on grades in compulsory school. Scandinavian Journal of Educational Research, 55, 587–608. doi:10.1080/00313831.2011.555923
Klapp Lekholm, A., & Cliffordson, C. (2008). Discrepancies between school grades and test scores at individual and school level: Effects of gender and family background. Educational Research and Evaluation, 14, 181–199. doi:10.1080/13803610801956663
Klapp Lekholm, A., & Cliffordson, C. (2009). Effects of student characteristics on grades in compulsory school. Educational Research and Evaluation, 15, 1–23. doi:10.1080/13803610802470425
Kulick, G., & Wright, R. (2008). The impact of grading on the curve: A simulation analysis. International Journal for the Scholarship of Teaching and Learning, 2(2), 5.
Kunnath, J. P. (2016). A critical pedagogy perspective of the impact of school poverty level on the teacher grading decision-making process (Doctoral dissertation). Retrieved from ProQuest. (10007423)
Lauterbach, C. E. (1928). Some factors affecting teachers' marks. Journal of Educational Psychology, 19, 266–271.
Levin, H. M. (2013). The utility and need for incorporating noncognitive skills into large-scale educational assessments. In M. von Davier, E. Gonzalez, I. Kirsch, & K. Yamamoto (Eds.), The role of international large-scale assessments: Perspectives from technology, economy, and educational research (pp. 67–86). Dordrecht, Netherlands: Springer.
Linn, R. L. (1982). Ability testing: Individual differences, prediction, and differential prediction. In A. K. Wigdor & W. R. Garner (Eds.), Ability testing: Uses, consequences, and controversies (pp. 335–388). Washington, DC: National Academy Press.
Liu, X. (2008a, October). Measuring teachers' perceptions of grading practices: Does school level make a difference? Paper presented at the Annual Meeting of the Northeastern Educational Research Association, Rocky Hill, CT.
Liu, X. (2008b, October). Assessing measurement invariance of the teachers' perceptions of grading practices scale across cultures. Paper presented at the Annual Meeting of the Northeastern Educational Research Association, Rocky Hill, CT.
Liu, X., O'Connell, A. A., & McCoach, D. B. (2006, April). The initial validation of teachers' perceptions of grading practices. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.
Llosa, L. (2008). Building and supporting a validity argument for a standards-based classroom assessment of English proficiency based on teacher judgments. Educational Measurement: Issues and Practice, 27(3),
educational research (3rd ed.) (pp. 783–791). New York, NY: Macmillan.
Smith, J. K. (2003). Reconsidering reliability in classroom assessment and grading. Educational Measurement: Issues and Practice, 22(4), 26–33. doi:10.1111/j.1745-3992.2003.tb00141.x
Smith, J. K., & Smith, L. F. (2009). The impact of framing effect on student preferences for university grading systems. Studies in Educational Evaluation, 35, 160–167. doi:10.1016/j.stueduc.2009.11.001
Snow, R. E. (1989). Toward assessment of cognitive and conative structures in learning. Educational Researcher, 18(9), 8–14. doi:10.3102/0013189x018009008
Sobel, F. S. (1936). Teachers' marks and objective tests as indices of adjustment. Teachers College Record, 38, 239–240.
Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83, 598–642. doi:10.3102/0034654313496870
Stanley, G., & Baines, L. (2004). No more shopping for grades at B-Mart: Re-establishing grades as indicators of academic performance. The Clearing House: A Journal of Educational Strategies, Issues and Ideas, 77, 101–104. doi:10.1080/00098650409601237
Starch, D. (1913). Reliability and distribution of grades. Science, 38, 630–636. doi:10.1126/science.38.983.630
Starch, D. (1915). Can the variability of marks be reduced? School and Society, 2, 242–243.
Starch, D., & Elliott, E. C. (1912). Reliability of the grading of high-school work in English. School Review, 20, 442–457.
Starch, D., & Elliott, E. C. (1913a). Reliability of grading work in mathematics. School Review, 21, 254–259.
Starch, D., & Elliott, E. C. (1913b). Reliability of grading work in history. School Review, 21, 676–681.
Sun, Y., & Cheng, L. (2013). Teachers' grading practices: Meaning and values assigned. Assessment in Education: Principles, Policy & Practice, 21, 326–343. doi:10.1080/0969594.2013.768207
Svennberg, L., Meckbach, J., & Redelius, K. (2014). Exploring PE teachers' 'gut feelings': An attempt to verbalise and discuss teachers' internalised grading criteria. European Physical Education Review, 20, 199–214. doi:10.1177/1356336X13517437
Swan, G. M., Guskey, T. R., & Jung, L. A. (2014). Parents' and teachers' perceptions of standards-based and traditional report cards. Educational Assessment, Evaluation and Accountability, 26, 289–299. doi:10.1007/s11092-014-9191-4
Swineford, F. (1947). Examination of the purported unreliability of teachers' marks. The Elementary School Journal, 47, 516–521. doi:10.2307/3203007
Thorsen, C. (2014). Dimensions of norm-referenced compulsory school grades and their relative importance for the prediction of upper secondary school grades. Scandinavian Journal of Educational Research, 58, 127–146. doi:10.1080/00313831.2012.705322
Thorsen, C., & Cliffordson, C. (2012). Teachers' grade assignment and the predictive validity of criterion-referenced grades. Educational Research and Evaluation, 18, 153–172. doi:10.1080/13803611.2012.659929
Tierney, R. D., Simon, M., & Charland, J. (2011). Being fair: Teachers' interpretations of principles for standards-based grading. The Educational Forum, 75, 210–227. doi:10.1080/00131725.2011.577669
Troob, C. (1985). Longitudinal study of students entering high school in 1979: The relationship between first term performance and school completion. New York, NY: New York City Board of Education.
Troug, A. J., & Friedman, S. J. (1996). Evaluating high school teachers' written grading policies from a measurement perspective. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, NY.
Unzicker, S. P. (1925). Teachers' marks and intelligence. The Journal of Educational Research, 11, 123–131. doi:10.1080/00220671.1925.10879537
U.S. Department of Education. (1999). What happens in classrooms? Instructional practices in elementary and secondary schools, 1994–95 (NCES 1999–348, by R. R. Henke, X. Chen, G. Goldman, M. Rollefson, & K. Gruber). Washington, DC: Author. Retrieved from http://nces.ed.gov/pubs99/1999348.pdf
Voss, H. L., Wendling, A., & Elliott, D. S. (1966). Some types of high school dropouts. The Journal of Educational Research, 59, 363–368.
Webster, K. L. (2011). High school grading practices: Teacher leaders' reflections, insights, and recommendations (Doctoral dissertation). Retrieved from ProQuest. (3498925)
Welsh, M. E., & D'Agostino, J. (2009). Fostering consistency between standards-based grades and large-scale assessment results. In T. R. Guskey (Ed.), Practical solutions for serious problems in standards-based grading (pp. 75–104). Thousand Oaks, CA: Corwin.
Welsh, M. E., D'Agostino, J. V., & Kaniskan, R. (2013). Grading as a reform effort: Do standards-based grades converge with test scores? Educational Measurement: Issues and Practice, 32(2), 26–36. doi:10.1111/emip.12009
Wiggins, G. (1994). Toward better report cards. Educational Leadership, 52(2), 28–37. Retrieved from http://www.ascd.org/publications/educational-leadership/oct94/vol52/num02/Toward-Better-Report-Cards.aspx
Wiley, C. R. (2011). Profiles of teacher grading practices: Integrating teacher beliefs, course criteria, and