Standards-Based Reform
HOOVER EDUCATION
Executive Summary
“Standards-based reform” in the heyday of the education reform movement was the strategy
of setting statewide standards, measuring student performance against those standards,
and then holding schools accountable for the results. It was at the heart of the No Child Left
Behind Act (NCLB) and dominated education policy from the 1990s into the 2010s.
Did it succeed as an overall strategy? Were there individual components that were particularly
effective? We know that student achievement improved markedly in the late 1990s and early
2000s—the very time that states were starting to put standards, tests, and “consequential
accountability” into place. Some of the gains can be directly attributed to those policies, but
the improvement was likely driven by other factors too.
On the flip side, when student achievement plateaued and even started to decline in the 2010s,
it’s plausible that the tapering-off was related to the softening of school-level accountability.
But hard evidence is scant, and it’s difficult to know for sure. We unfortunately have limited
information about what exactly schools did to get those better results in the earlier era.
For policymakers, though, there are some clear lessons. Capacity at the school and district
levels is critical. These reforms were driven by the assumption that schools knew what to do
to get better and were not doing it. We learned they needed significantly more support to
improve. We also need broader measures of success. NCLB set the same outcome goals for
all students, but multiple achievement measures can better meet the needs of students and
families. Reforms that change the day-to-day work of schools take a long time to implement,
so policymakers need to be patient and stay the course. They also need to invest in research.
There is more we can learn about what works.
• The No Child Left Behind era showed some successes, but how and why they occurred
is unclear.
• • •
“Standards-based reform” in the heyday of the education reform movement was a bit like the
title of a recent film: Everything Everywhere All at Once. The strategy of setting statewide stan-
dards, measuring student performance against those standards, and then holding schools
accountable for the results was at the heart of the No Child Left Behind Act (NCLB) and dom-
inated education policy for most of the “long NCLB period” from the 1990s into the 2010s. To
many observers, standards-based reform was education reform, and so the question about
whether standards-based reform worked is equivalent to asking whether education reform
worked.
Answering that question is only possible if we define what’s in and what’s out: What counts
under the umbrella of standards-based reform? Did it succeed as an overall strategy? Were
there individual components that were particularly effective?
In this chapter, we will work our way through these and related questions, but readers should
beware that the results will not be entirely satisfying. Get ready for a lot of shrugging.
We know, for example, that student achievement improved markedly in the late 1990s and
early 2000s—the very time that states were starting to put standards, tests, and “consequen-
tial accountability” into place. Some of the gains can be directly attributed to those policies.
But the improvement was likely driven by other factors, too, some of which had very little
to do with education policy or even schools, such as the plummeting child poverty rate at
the time.
On the flip side, when student achievement plateaued and even started to decline in the
2010s, it’s plausible that the tapering off was related to the softening of school-level account-
ability, as NCLB lost steam and eventually gave way to the Every Student Succeeds Act and
the Common Core State Standards. But hard evidence is scant, and it’s difficult to know
for sure, especially because—again—so much else was going on at the same time. That
included the aftermath of the Great Recession (and its budget cuts) as well as the advent of
smartphones and social media, which may have depressed student achievement just as they
boosted teenage anxiety and depression.
And while we know that standards, testing, and especially accountability drove some of the
improvements in student outcomes in the 1990s and 2000s, especially in math, we unfortu-
nately have limited information about exactly what schools did to get those better results. For
the most part, the “black box” that is the typical K–12 classroom stayed shut.
The NCLB Act locked into place a specific version of standards-based reform, one that incor-
porated a mishmash of ideas that had been floating around since the 1980s and arguably
since the 1960s. Think of it like a dish at a fusion restaurant, reflecting a novel combination of
flavors and culinary lineages—not always with a satisfying outcome.
One might even say that this version of standards-based reform was incoherent—which is
ironic, given that coherence was arguably the number-one goal of the original progenitors
of the idea. In a series of articles and books in the late 1980s, scholars Jennifer O’Day and
Marshall Smith argued for what they called “systemic reform.” Their key insight was that the
multiple layers of governance baked into the US education system as well as myriad conflict-
ing policies emanating from the many cooks in the K–12 kitchen were pulling educators in too
many directions. What we needed was to fix the system as a whole, to think comprehensively
and coherently and thereby get everyone rowing in the same direction in pursuit of stronger
and more equitable student outcomes.1
To do so, we needed to get serious about “alignment.” We should start with a clear set of
desired outcomes, also known as standards, delineating what we expect students to know
and be able to do—at the end of high school but also at key milestones along the way. Those
curricular standards would set forth both the content of what kids needed to learn and the
level at which they needed to learn it. Regular assessments would help practitioners and
policymakers understand whether kids were on track to meet expectations and ready to
progress to the next grade level and, ultimately, high school graduation. This approach would
allow for the assessment of student performance against common expectations and criteria
rather than measuring students against one another (norm-referenced evaluation and rank-
ings) to determine academic achievement. But perhaps most importantly, all the other key
pieces of the education apparatus needed to be aligned to the standards as well—especially
teacher preparation, professional development, instructional materials, and funding systems.
O’Day and Smith didn’t say much about “accountability” as we would later come to talk about
it—consequences that would accrue to educators, especially for poor student performance.
Instead, their focus was primarily on coherence, alignment, and building “capacity” in the
system to improve teaching and learning.
Systemic reform was popular with traditional education groups.2 It spoke to the frustration of
classroom teachers as well as principals and superintendents, without directly threatening the
But this approach was hardly the only school improvement game in town. Other ideas were
gaining prominence in the late 1980s and early 1990s, too, ideas promulgated by governors,
economists, political scientists, and business leaders. To oversimplify a bit, they coalesced
around the “reinventing government” frame3 —namely that to reform a broken system like K–12
education, leaders needed to embrace a “tight-loose” strategy: tight about the results to be
accomplished and loose about how people closer to the problem might get there. This was
how business titans of the time steered their organizations, especially as the economy was
shifting to knowledge work. To get the best results, people on the front lines had to have the
autonomy to make decisions and solve problems themselves in real time rather than take
orders from the top. They should be rewarded when they improved productivity accordingly.
But if they failed to generate the desired results, unpleasantness might be expected to follow.
They might even lose their jobs.
This struck a chord among some education scholars as well. As far back as 1966’s Coleman
Report, we knew about the disconnect between education inputs and outcomes. If we
wanted better results, it made sense to focus on the latter.4 Furthermore, many of the reforms
embraced in the wake of 1983’s A Nation at Risk report tried to tweak inputs such as teacher
salaries, course requirements, and days in the school year. In an era of stagnant achieve-
ment and widening achievement gaps, none of that seemed to be working. It was time, many
thought, for something else.
By the early 1990s, the tight-loose frame was a big driver behind the charter schools move-
ment and the notion of “accountability for results” for public schools writ large. Lamar
Alexander, who was governor of Tennessee before becoming US secretary of education
under George H. W. Bush, was apt to talk about “an old-fashioned horse trade”: greater
autonomy for schools and educators in return for greater accountability for improved student
outcomes.5 And it wasn’t just Republican governors who embraced this model; several
Democratic ones did, too, especially southern governors such as Jim Hunt (North Carolina),
Richard Riley (South Carolina), and Bill Clinton (Arkansas). It helped that the Progressive
Policy Institute—a think tank for the New Dems—supported this approach enthusiastically.
This version of standards-based reform had some overlap with O’Day and Smith’s systemic
reform, especially when it came to the centrality of academic standards. But it put greater
emphasis on the measurement of achievement against those standards—in other words,
high-stakes testing—and especially on accountability measures connected to results. This
reflected the thinking of both economists and political scientists, who thought that the right
incentives might allow local schools and school systems to break through the political barri-
ers to change. With enough pressure from on high, schools might finally put the needs of kids
first rather than follow the lead of adult interest groups, especially unions. They would remove
ineffective teachers from the classroom, for example, ditch misguided curricula, and untie
the hands of principals. The assumption was that the major barrier to improvement was not
This made sense to some key actors on the political left as well, especially the Education
Trust and other civil rights organizations. They bought into this version of standards-based
reform but with an important twist: doing right by kids would be defined primarily as doing
right by kids who had been mistreated by the education system. That meant Black, Hispanic,
and low-income students especially. These reformers wanted to counterbalance the polit-
ical power of the unions but also that of affluent parents and other actors who tended to
steer resources to the children and families who needed them the least. They wanted to use
top-down accountability to redirect money, qualified teachers, and attention to the highest-
poverty schools and the most disadvantaged kids.7
These various flavors of standards-based reform were all in the mix in the 1990s, with many
public discussions in particular about the wisdom of a strategy focused on “capacity build-
ing” versus one that stressed “accountability for results.”8 The enactment of NCLB settled
the debate; the accountability hawks won. Capacity building would mostly be put on the shelf
in favor of a muscular, federally driven effort to hold schools accountable, especially for the
achievement of the groups that most concerned civil rights leaders.
The No Child Left Behind Act of 2001, the Bush-era reauthorization of the Elementary and
Secondary Education Act, was the law of the land for an entire generation of students. The
kids who entered kindergarten in the fall of 2002, nine months after then president George W.
Bush put his signature on NCLB, were seniors in high school in December 2015 when then
president Barack Obama signed into law the Every Student Succeeds Act (its reauthorized
successor).
That’s not to say that the same policy was set in stone for those thirteen years. For the first
half of its life, federal officials implemented it rather faithfully, but the second half came with
major policy shifts driven by regulatory actions and what might be termed “strategic
nonenforcement.” Let’s take a brief trip down memory lane.
“NCLB-classic”—which was the 2001 reauthorization of the 1965 Elementary and Secondary
Education Act—centered on the three-legged stool of standards, tests, and accountability.
But those three elements were not treated with the same level of prescription. States had
complete control over their standards—both in terms of the content to be included and in
terms of the level of performance that would be considered good enough. Not so when it
came to the tests—those had to be given annually to students in grades three through eight
in reading and in math, plus once in high school, plus three times in science. And the assess-
ments had to meet a variety of technical requirements.
NCLB had a plethora of other provisions, from mandating that schools hire only “highly quali-
fied teachers” to bringing “scientifically based reading instruction” (now called the science of
reading) to the nation’s schools. Some of these other pieces could be considered capacity-
building efforts. But overwhelmingly, NCLB was about accountability for results. It assumed
that with enough pressure, schools and districts would cut through the Gordian knot that was
holding them back in order to raise the achievement of students, especially those from mar-
ginalized groups. That was the theory. And as we’ll get to in a moment, it partly worked.
But it also soon became clear that many schools and systems didn’t know what to do in
response to the accountability pressure—or couldn’t steel themselves to make the requisite
changes in long-established practices and structures. Some educators narrowed the cur-
riculum, significantly expanding the time spent on math and reading at the expense of other
subjects. Stories filled the nation’s newspapers about schools teaching to the test, canceling
recess, even ignoring lice outbreaks, all because of the accountability pressures of NCLB. In
perhaps the most notable education scandal, teachers and principals in the Atlanta Public
Schools district were found to have cheated on state-administered tests by providing stu-
dents with the correct answers to questions and even changing students’ answers and modi-
fying test sheets to ensure higher scores.
NCLB EVOLVES
As with most federal statutes, Congress was supposed to update NCLB after a few years.
A reauthorization push in 2007 came close to doing so and would have made the law even
tougher, but it fell apart under fierce opposition from teachers’ unions and other education
advocacy groups. So the law lumbered on even as it became clearer to its strongest support-
ers, including then education secretary Margaret Spellings, that parts of it were becoming
unworkable.
One of the major issues was that an increasing number of schools were failing to meet NCLB’s
adequate yearly progress provisions. If tens of thousands of schools were deemed subpar,
then the sting and stigma were lost, as was much of the motivation to do something to fix it. In
Through a series of regulatory actions, Spellings (under George W. Bush) and Arne Duncan
(under Obama) allowed states to make critical changes to their implementation of NCLB to
address these concerns. They allowed growth models provided the models still expected
students to hit “proficiency” within a few years. They loosened rules around supplemental
services so that school districts could provide tutoring themselves rather than outsource it to
private providers. The cascade of sanctions was replaced with a menu of intervention options
and funded generously through the School Improvement Grants program—all meant to
encourage “school turnarounds.” An Obama-era waiver program allowed states even greater
flexibility to tinker with their accountability targets in return for commitments to embrace
other reforms the administration supported.
Meanwhile, states were working to address another key issue with NCLB: its encouragement
of low-level academic standards and much-too-easy-to-pass tests. Because the law required
states to set targets that would result in virtually all students reaching the “proficient” level by
2014, it incentivized states to set the proficiency bar very low. This, in turn, may have encour-
aged educators to engage in low-level instruction, with teaching to the test and “drill and kill”
methods. It also provided parents with misleading information, as states told most parents
that their children were “proficient” in reading and math, even if they were actually several
years below grade level and nowhere near on track for college or a decent-paying career. In
Tennessee, for example, the state reported that 90 percent of students were “proficient” in
fourth-grade reading in 2009 while the National Assessment of Educational Progress (NAEP)
had the number at 28 percent.9 Advocates came to call this the “honesty gap.”
Under the leadership of the National Governors Association and the Council of Chief State
School Officers, states started collaborating on a set of common standards for English lan-
guage arts and math—what would eventually become the Common Core State Standards.
The hope was that, by working together and providing political cover to one another, the
states would finally set the bar suitably high—at a level that indicated that high school grad-
uates were truly ready for college or career and that would encourage teachers to aim for
higher-level teaching. It would certainly be hard for the effort to result in worse standards
than what most states had in place. Multiple reviews of state standards over the years
from the American Federation of Teachers, Achieve, and the Thomas B. Fordham Institute
found that they were generally vague, poorly written, and lacking in the type of curricular
content that “systemic reformers” had envisioned so many years before.10 It wasn’t surpris-
ing, then, that so many educators reported teaching to the test. The tests became the true
standards, and they were perceived to be of low quality too.
As mentioned before, judging the success or failure of such a sprawling reform effort is hard
to do. Thankfully, scholars Dan Goldhaber and Michael DeArmond of the CALDER Center
at the American Institutes for Research offered a wonderful overview of the research liter-
ature in a recent report for the US Chamber of Commerce, Looking Back to Look Forward:
Quantitative and Qualitative Reviews of the Past 20 Years of K–12 Education Assessment and
Accountability Policy.11 I strongly encourage readers to review their findings; allow me to sum-
marize them here.
First, it’s clear that student achievement in the United States improved dramatically from the
mid to late 1990s until the early 2010s—especially in math, especially at the elementary and
middle school levels, and especially for the most marginalized student groups. Pointing to
studies by M. Danish Shakeel, Paul Peterson, Eric Hanushek, Ayesha Hashim, Sean Reardon,
and others, Goldhaber and DeArmond conclude that “the long-term gains on the NAEP reveal
a decades-long narrowing of test score achievement gaps between underserved groups (e.g.,
students of color, lower achieving students) and more advantaged groups (e.g., White stu-
dents, higher achieving students).”12
My own analysis of NAEP trends from that time period focused on the impressive gains made
by the nation’s low-income, Black, and Hispanic students, especially at the lower levels of
achievement.13 The proportion of Black fourth-graders scoring at the “below basic” level on
the NAEP reading exam, for example, dropped from more than two-thirds in 1992 to less than
half in 2015. Likewise, the percentage of Hispanic eighth-graders scoring “below basic” in
math dropped from two-thirds in 1990 to 40 percent in 2015. Those numbers were still much
too high, but the improvement over time was breathtaking.
Nor was it just student achievement. High school graduation rates shot up as well, climbing
fifteen points on average from the mid-1990s until today. We saw major improvements in col-
lege completion, too, with the percentage of Black and Hispanic young adults with four-year
Alas, the progress in test scores stalled in the early to mid-2010s, and achievement even
declined in some subjects and grade levels in the late 2010s, before the pandemic wiped
out decades of gains. As Goldhaber and DeArmond explain, this has led some analysts to
argue that the rise and fall of test-based accountability can explain the rise and fall of student
achievement.
That’s possible, but NAEP’s design makes it hard to know for sure. What scholars can do is
compare states with various policies (and policy implementation timelines) to try to link the
adoption of standards-based reform to changes in student achievement. That’s exactly what a
series of studies did in the 2000s, including ones by Martin Carnoy and Susanna Loeb, another
by Eric Hanushek and Margaret Raymond, and a seminal paper by Tom Dee and Brian Jacob.15
The latter compared states that adopted “consequential accountability” in the late 1990s to
those that adopted it in the early 2000s, once NCLB mandated them to do so. Dee and Jacob
found large impacts of those policies on math achievement (an effect size in the neighborhood
of half a year of learning), with even greater effects for the lowest-achieving students as well as
Black, Hispanic, and low-income kids. The impacts on reading and science were null.
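The logic of that kind of comparison can be sketched with a simple difference-in-differences calculation. This is only an illustration of the general idea, not the model Dee and Jacob actually estimated; the function name and every score below are made-up assumptions, not NAEP results.

```python
def diff_in_diff(newly_pre, newly_post, already_pre, already_post):
    """Change in the newly covered group minus change in the comparison group."""
    newly_change = newly_post - newly_pre        # states first held accountable under NCLB
    already_change = already_post - already_pre  # states that already had accountability
    return newly_change - already_change


# Hypothetical average math scores before and after NCLB (made-up numbers).
newly_covered = {"pre": 224, "post": 235}
already_covered = {"pre": 226, "post": 231}

estimate = diff_in_diff(
    newly_covered["pre"], newly_covered["post"],
    already_covered["pre"], already_covered["post"],
)
print(f"estimated effect of new accountability: {estimate} points")
```

The point of subtracting the comparison group's change is to net out whatever else was lifting scores everywhere at the time, such as falling child poverty, so that only the extra gain in newly covered states is credited to the policy.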
Another study, by Manyee Wong, Thomas D. Cook, and Peter M. Steiner, used Catholic
schools as a control group and found more evidence that accountability policies raised
achievement in math in the public schools.16 Other research, also reviewed by Goldhaber and
DeArmond, looked at the impact of NCLB on the so-called bubble kids—the students who
were closest to the proficiency line or the schools most at risk of sanctions. Most studies
found the largest gains for such students and schools, for better or worse.17
A brand-new study, by Ozkan Eren, David N. Figlio, Naci H. Mocan, and Orgul Ozturk, found
that accountability policies had an impact on more than just test scores. “Our findings indi-
cate that a school’s receipt of a lower accountability rating, at the bottom end of the ratings
distribution, decreases adult criminal involvement. Accountability pressures also reduce the
propensity of students’ reliance on social welfare programs in adulthood and these effects
persist at least until when individuals reach their early 30s.”18
Circumstantial evidence from individual states also points to a big impact from consequential
accountability. Massachusetts, which combined standards-based reform with an enormous
increase in spending in its 1993 Education Reform Act, saw student achievement skyrocket in
the late 1990s and early 2000s—the much-remarked “Massachusetts miracle.” Fourth-grade
reading scores increased by nineteen points from 1998 through 2007—the equivalent of about
two grade levels. Eighth-grade math scores jumped thirty-one points from 2000 to 2009. With
its high-quality academic standards, intensive supports for teachers, lavish funding, and new
high school graduation exam for students, the Bay State showed what was possible.
What we can say, then, is that NCLB-style accountability worked, at least for a while and at
least in math. Nationally, it didn’t make an impact in reading, even though reading achieve-
ment was improving during the NCLB era (including in states like Massachusetts and
Mississippi). We also aren’t sure if achievement plateaued in the 2010s because accountability
stopped working or because accountability itself stopped.
It doesn’t help that we don’t have much evidence about the mechanisms that might have
driven the gains Dee and Jacob (and others) found. Did schools improve their approach to
teaching mathematics? Did they make more time for intensive interventions such as tutoring,
especially for their lowest-performing kids? Did they work harder or smarter to support teach-
ers and get their best folks where they were needed most? Why did accountability lead to
gains in math but not in reading?
We only have a few studies on how these policies might have changed classroom practice.
As mentioned above, it was widely perceived that schools—especially elementary schools,
where the schedule is more flexible—narrowed the curriculum and spent more time on math
and reading and less time on social studies and science. Several teacher surveys showed
this to be the case.19 (Perhaps that’s one reason standards-based reform failed to move the
needle on reading achievement, given the growing evidence linking content knowledge in
subjects like social studies to improvements in reading comprehension.20) The improvement
of scores for bubble kids indicates that schools and teachers may have shifted their attention
to kids near the proficiency line. And teaching to the test was also thought to be pervasive;
some teacher surveys, for example, found that instruction became more teacher centered
and focused on basic skills.21
Alas, studying policy implementation all the way into the classroom is difficult and expensive.
So aside from surveying teachers about their practice (which is better than nothing but
not terribly reliable), not much else was done.22,23 As a result, when it comes to changes that
standards-based reform might have brought to the classroom, we have more questions than
answers.
In 2009, the Obama administration successfully lobbied Congress to allocate $3.5 billion
(eventually growing to $7 billion) to the Title I School Improvement Grants (SIG) program. This
sum was directed primarily to the 5 percent of schools in each state with the lowest academic
achievement. The federal government instructed districts to select from four intervention
However, as Goldhaber and DeArmond explain, some local and state studies did find posi-
tive impacts arising from the SIG initiative. California’s implementation was particularly well
studied by scholars including Thomas Dee, Susanna Loeb, Min Sun, Emily K. Penner, and
Katharine O. Strunk.25 Both statewide and in particular cities, the results were generally
positive, with improvements in both reading and math. This may be because California required
its lowest-performing schools to implement more intensive interventions. It also focused a
great deal of money—up to $1.5 million—on each school and gave the school lots of help in
spending it well.
Though not addressed by Goldhaber and DeArmond, another place to look for lessons on
accountability is the school choice movement. In particular, we can compare the relative
success of charter schools with private school choice, given that the former operates under
a strict accountability regime while the latter, in most states, does not. A growing body of
research, including a new study from CREDO at Stanford University, shows charter school
students outpacing their traditional public school peers both on test scores and on long-term
outcomes such as college completion. That is especially the case for urban charter schools
and for Black and Hispanic students.26
Private school choice programs, on the other hand, have been markedly less effective in
boosting student outcomes, at least as judged by test scores. Recent studies of large-scale
voucher programs in Ohio, Indiana, and Louisiana all show voucher recipients trailing their
public school peers on test score growth, sometimes quite significantly.27 To be sure, another
set of voucher studies finds positive long-term impacts on measures such as high school
graduation and college enrollment.28 But the negative findings on achievement are still
worrying and might reflect the lack of consequential accountability baked into these programs.
In the charter schools sector, authorizers are empowered to close low-performing or finan-
cially unsustainable schools, and they do so with regularity. This is real accountability, and
the threat of closure very likely contributes to—perhaps even causes much of—the charter
achievement advantage.
What’s less clear, once again, are the exact mechanisms. Does the threat of school closure
encourage charter schools to improve? Perhaps—and a series of studies from the Fordham
Institute and others have found that charter schools tend to embrace a variety of practices
associated with improved achievement, from higher teacher expectations to greater teacher
diversity to firmer policies around student discipline.29 On the other hand, it’s surely the
case that school closures themselves automatically improve the performance of the charter
sector, as the worst schools disappear, shifting the bell curve of achievement to the right.
Whatever the reason, it’s clear that accountability plays a key role in the relative success of
charter schools.
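The second mechanism, closures raising the sector average without any single school improving, is a pure composition effect, and a small calculation makes it concrete. The sketch below uses ten hypothetical charter schools with made-up scores; the helper name `close_lowest` and the 20 percent closure share are assumptions chosen only for illustration.

```python
from statistics import mean


def close_lowest(scores, share):
    """Return the scores that remain after closing the lowest-scoring `share` of schools."""
    ranked = sorted(scores)
    n_closed = int(len(ranked) * share)
    return ranked[n_closed:]


# Hypothetical average scores for ten charter schools (made-up numbers).
sector = [38, 42, 45, 50, 52, 55, 58, 61, 66, 73]

before = mean(sector)
survivors = close_lowest(sector, share=0.2)  # close the bottom two schools
after = mean(survivors)

print(f"sector mean before closures: {before:.1f}")
print(f"sector mean after closures:  {after:.1f}")
```

In this toy example the sector average rises even though every surviving school scores exactly what it did before, which is why a higher sector average alone cannot tell us whether the threat of closure changed what schools do.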
Yet only in recent years have reformers embraced curriculum as a key lever for school
improvement, with foundations and even states investing in building high-quality instructional
materials and organizations such as EdReports judging them for alignment with rigorous stan-
dards. Imagine how much more progress we might have made had we embarked on these
efforts twenty years earlier!
Yet that would have been hard to do, since back then states were just developing their stan-
dards, and they differed dramatically from one another even as most were of low quality. Only
with the creation of the Common Core State Standards was there an opportunity to build a
truly national marketplace for curricular materials, which is exactly what has happened in
recent years. As high-quality products like Core Knowledge Language Arts and Eureka Math
gain market share, we might be returning to the capacity-building effort we ditched so many
decades ago. Perhaps fixing teacher preparation and professional development can come next.
It’s become clear that states need to show leadership around curriculum and instruction
rather than sit back and hope districts make the right decisions on their own. States that
have done so over the past twenty-five years—including, at various times, Massachusetts,
Tennessee, and Mississippi—have seen improvements in achievement (though, of course,
correlation does not equal causation).
Nor can we make strong claims about the standards and assessments that are at the
heart of standards-based reform. Scholars have failed to detect any difference in achieve-
ment in states that had low standards versus high ones or weak tests versus strong ones.
As they say, absence of evidence is not evidence of absence. It’s hard to believe
that the quality of standards and assessments does not matter; rather, it’s more likely
that to drive positive change, demanding expectations and tests must be connected to
sophisticated school rating systems; meaningful accountability for results; and capacity-
building efforts, like the introduction of high-quality curricular materials, to help students
succeed.
The lesson for standards-based reform (and many other reforms as well) is that policymakers
can’t view components as items on an à la carte menu. In order to drive improvements,
it’s all or nothing. Especially in the push for “systemic,” coherent reform, the effort
is only as strong as its weakest link. If the question is which is most important (standards,
assessments, school ratings, consequences, turnaround efforts, or capacity building, espe-
cially around curriculum), the correct answer is “all of the above.”
The standards-based reform movement succeeded in promoting the idea that “all students
can learn” and that we must reject the “soft bigotry of low expectations.” These are powerful
and necessary maxims. But they rub up against the lived experience of educators, who must
cope with the reality of classrooms of students who can be as many as seven grade levels
apart on the first day of school. 30
Slogans about “holding schools accountable for results” elide critical questions over the
details. Results for which students? All of them? Including the ones who start the school
year way above or way below grade level? The embrace of “growth models” in the late NCLB
period and under ESSA helped to square this circle. By focusing on progress from one school
NCLB had an answer to this question, implicit though it may have been: the sharp focus
of NCLB was on helping the lowest-achieving students—who tended to be Black, Hispanic,
or low-income, or students with disabilities, or those still learning English—reach basic
standards. And as discussed earlier, this focus worked for a time (again mostly in math) as those
were the precise groups whose achievement rose the most during the 1990s and 2000s and
who were much more likely to graduate from high school in the 2010s. But did this
hyperfocus unintentionally incentivize the success and growth of some students over others?
And was getting these students to a baseline level of proficiency setting them up for
postsecondary success?
The pushback to testing has been significant. Some of that stemmed from how schools
responded to the tests—as discussed earlier, by “teaching to the test” or narrowing the
curriculum. Some of it related to the Obama-era push to tie teacher evaluations to test scores.
Some of it focused on the tests themselves. Making kids sit for annual assessments from
grades three through eight ate up precious instructional time. But since the results didn’t
come back until months later—even until the next school year—they weren’t of much help
to educators. They weren’t “instructionally useful.” Thus, most school districts opted to give
students additional standardized tests, such as NWEA’s Measures of Academic Progress, in
order to receive real-time information about how students were doing. One study found
students spending as many as twenty-five hours a year sitting for tests. 32
In recent years, some advocates and assessment providers have called for testing
systems that can produce both accountability data and instructionally useful information
for educators. That’s an understandable impulse, but trade-offs are unavoidable. Some
approaches would assess students three times a year, for example—so-called through-year
assessments—which might increase the testing load and encourage schools to adopt a
curriculum closely aligned with the scope and sequence of the tests, for better or worse.
Assessments that return results immediately, meanwhile, are by definition not graded by
humans, and (so far at least) they can’t test the same higher-order skills that the better
state assessments today can. This might encourage a return to low-level teaching of the
skill-and-drill variety.
A key issue going forward is whether states will pursue these more instructionally useful
assessment systems or simply acknowledge that we need a variety of tests, some to guide
instruction and others to generate accountability data, as unpopular as the latter may be.
What can tomorrow’s policymakers learn from our experience with standards, assessments,
and accountability?
• Be clear-eyed about capacity in the system. Some of us wrongly assumed that incentives
were the only big problem—that once we put pressure on schools to improve, they
would figure out how to help their students meet standards. What standards-based
reform revealed, however, was how little capacity existed in many schools. Educators
didn’t know how to boost achievement, or they only knew how to do this for some kids
in their schools. They didn’t know what curricula to use. And accountability wasn’t
generally strong enough to overcome the political incentives operating in the system,
especially union politics. Reformers can’t wish realities like these away. Fixing perverse
incentives is necessary but not sufficient; capacity building is needed too. And that
means states need to take a more muscular role around issues like curriculum and
teacher preparation than some of us once imagined.
• Be wary of any reform that is about “all” students (or all schools). Yes, all kids need to
learn to read, write, and do math, and virtually all students can reach basic standards.
But not all kids need to be (or can be) college ready. Reforms that don’t come to terms with
the huge variability in kids’ readiness levels, cognitive abilities, and prior achievements
will lose popular support and will flounder.
• Don’t take success for granted! Especially in the wake of the awful COVID-19 pandemic
and its disastrous impact on our schools, it’s hard not to romanticize the period in the
late 1990s and early 2000s when achievement was skyrocketing. What we wouldn’t give
to have those test score gains back! Yet the education debate at the time wasn’t full of
celebration and confidence, but angst about things not moving quickly enough. What
we need to remember is that education happens slowly, year by year, and we need to
make sure that policy leaders stay on course over a long period of time. We should fight
the urge to look for the “next big thing.” At the current moment, for example, there’s
much enthusiasm about universal education savings accounts as new and exciting, in
contrast to charter schools, which feel old and dated to some. Yet based on their strong
track record, slowly but surely continuing to expand high-quality charter schools may
be the best approach to improving student outcomes and expanding parental options.
Policymakers, advocates, and philanthropists need to get better at finishing what we
started.
• Scholars need new ways to study policy change all the way to the classroom. Thanks in
part to the data produced by standards-based reforms, the field of education research
has improved markedly in recent decades. Experimental and quasi-experimental
designs are much more common, and every day brings important new findings about
interventions and their impact on student outcomes. Yet as this chapter demonstrates,
we still struggle to follow policy changes all the way down to the classroom. But
The conventional wisdom in some quarters is that standards-based reform in general, and
NCLB in particular, didn’t work. That conventional wisdom is incorrect. These policies
deserve some of the credit for the historically large achievement gains of the 1990s and
2000s and the equally impressive improvements in the high school graduation and college
completion rates of more recent years.
But this approach to reform will work much better if it is combined with efforts to boost
the knowledge, skills, and confidence of educators on the front lines. Providing high-quality
instructional materials is arguably the best way to do that, and it’s an effort that states
have finally embarked upon. This is still no panacea; the Gordian knot hasn’t been sliced
through, nor have teachers’ unions disappeared, nor have we solved the riddle of how to
get fourteen thousand school districts to embrace smart policies and practices. Systemic
dysfunction remains. But a recommitment to accountability for results, along with a focus on
making classroom instruction more coherent, effective, and equitable, could yield stronger
results in the years ahead.
Essays in this series were reviewed by members of the Hoover Education Success Initiative
(HESI) Practitioner Council. For more information about the Practitioner Council and HESI, visit
us online at hoover.org/hesi.
Mike Petrilli is partly right about the lessons from the past decades of standards-based reform.
More research is certainly needed, but we must be smart about how we evaluate the impact of
these reforms. In the multilayered, locally controlled US system of K–12 education, trends on
state NAEP scores will never yield clear answers, because the action is at the local level.
A recent report from the Council of the Great City Schools (CGCS) points to a very
productive way forward. It asks the right research question: what accounts for effective districtwide
instructional improvement initiatives, and what role, if any, did state standards play? To
answer that question, the CGCS examined the instructional improvement strategies in large
districts and compared those that produced increased achievement with those that did not.
This work focused on school districts—where key implementation actions occur—and looked
That approach does not yield a simple succeed/fail conclusion on any one reform initiative.
But it does tell us where, how, and why standards are making a difference. And it underscores
the need to systematically build the capacity of local schools and districts to dramatically
improve the coherence of curriculum and instructional materials; professional learning; and
assessments, accountability, and continuous improvement efforts.
—Mike Cohen, former president of Achieve
Petrilli does an excellent job of summarizing the education reform efforts from the late 1980s
to the present. There are several points he made that resonate with me and with the
experiences I have had leading reform at the district and state levels.
First, the “tight-loose” strategy was doomed to failure. It assumed people would know what
to do, what strategies to employ, and how to improve student achievement. In my experience,
there simply are not enough leaders who have been trained to do this work in order to effect
change for a diverse population of students. Petrilli’s statement, “States need to show
leadership around curriculum and instruction rather than sit back and hope districts make the right
decisions on their own,” is correct. If people knew what to do, they would be doing it.
Second, there is a need for “coherence, alignment, and building ‘capacity’ in the system to
improve teaching and learning.” That needs to be coupled with accountability with a
capital A. In my opinion, accountability drives the behaviors you want to see from educators.
The old adage “What gets measured gets done” is true. I wholeheartedly agree with Petrilli
as he summarizes his article, stating, “A recommitment to accountability for results, along
with a focus on making classroom instruction more coherent, effective, and equitable, could
yield stronger results in the years ahead.” Students’ futures are at stake, and we owe them
nothing less.
—Dr. Carey M. Wright, former state superintendent of education for Mississippi
NOTES
1. See especially Marshall S. Smith and Jennifer O’Day, “Quality and Equality in American
Education: Systemic Problems, Systemic Solutions,” in The Dynamics of Opportunity in America:
Evidence and Perspectives, ed. Irwin Kirsch and Henry Braun (Cham, Switzerland: Springer
International Publishing, 2016), 297–358,
https://link.springer.com/chapter/10.1007/978-3-319-25991-8_9; Consortium for Policy Research
in Education, “Putting the Pieces Together: Systemic School Reform,” CPRE Policy Briefs (1991): 1–10,
https://www.cpre.org/sites/default/files/policybrief/847_rb06.pdf.
2. See, for example, American Federation of Teachers, “Achieving the Goals of
Standards-Based Reform,” last modified 2002, accessed June 26, 2023,
https://www.aft.org/resolution/achieving-goals-standards-based-reform.
Copyright © 2023 by the Board of Trustees of the Leland Stanford Junior University
The views expressed in this essay are entirely those of the author and do not necessarily reflect the
views of the staff, officers, or Board of Overseers of the Hoover Institution.
MICHAEL J. PETRILLI
Michael J. Petrilli is president of the Thomas B. Fordham Institute, research
fellow at the Hoover Institution, executive editor of Education Next, and host of
the Education Gadfly Show podcast. He is the author of The Diverse Schools
Dilemma, editor of Education for Upward Mobility, and coeditor of How to
Educate an American and Follow the Science to School.
A Nation at Risk + 40
The modern school-reform movement in the United States was set in motion by the release of the report
A Nation at Risk in 1983. Countless education policy changes at the local, state, and national levels came
as a result. A Nation at Risk + 40 is a research initiative designed to better understand the impact of these
efforts. Each author in this series has gone deep in a key area of school reform, exploring the following
questions: What kinds of reforms have been attempted and why? What is the evidence of their impact?
What are the lessons for today’s education policymakers? As the nation’s schools work to recover from the
effects of the COVID-19 pandemic, this series not only describes the education-reform journey of the past
forty years, it also provides timely and research-driven guidance for the future.
The Hoover Institution gratefully acknowledges Allan B. and Kathy Hubbard, the Daniels Fund, the William
and Flora Hewlett Foundation, and the Koret Foundation for their generous support of the Hoover Education
Success Initiative and this publication.