Revisiting The Impact of NCLB High-Stakes School Accountability

Revisiting the Impact of NCLB High-Stakes School Accountability, Capacity, and Resources:
State NAEP 1990-2009 Reading and Math Achievement Gaps and Trends
Author(s): Jaekyung Lee and Todd Reeves
Source: Educational Evaluation and Policy Analysis, Vol. 34, No. 2 (June 2012), pp. 209-231
Published by: American Educational Research Association
Stable URL: http://www.jstor.org/stable/23254111
Educational Evaluation and Policy Analysis
June 2012, Vol. 34, No. 2, pp. 209-231
DOI: 10.3102/0162373711431604
© 2012 AERA. http://eepa.aera.net
Jaekyung Lee
State University of New York at Buffalo
Todd Reeves
Boston College
This study examines the impact of high-stakes school accountability, capacity, and resources under NCLB on reading and math achievement outcomes through comparative interrupted time-series analyses of 1990-2009 NAEP state assessment data. Through hierarchical linear modeling latent variable regression with inverse probability of treatment weighting, the study addresses pre-NCLB differences in state characteristics and trends to account for variations in post-NCLB gains. While the states' progress was uneven among different grades, subjects, and subgroups, NCLB had not yet shown sustainable and generalizable high-stakes accountability policy effects. Improving average achievement as well as narrowing achievement gaps was associated with long-term statewide instructional capacity and teacher resources rather than short-term NCLB implementation fidelity, rigor of standards, and state agency capacity for data tracking and intervention.
Keywords: NCLB, accountability, capacity, NAEP, achievement gap

The No Child Left Behind Act of 2001 (NCLB; Pub. L. 107-110, enacted January 8, 2002) is broadly aimed at achieving student proficiency in reading and mathematics across all states and closing extant academic achievement gaps between identifiable subgroups of U.S. students. Crucially, the policy is grounded in the theory that establishing measurable student performance standards with consequences for schools will motivate the improvement of student achievement outcomes. NCLB relies on high-stakes testing of students to ensure that schools make adequate yearly progress (AYP) toward the goal of 100% student proficiency in these subjects by 2014. Test-driven external accountability policy under NCLB builds on the alleged success of first-generation accountability states such as Texas and North Carolina. Evidence for the effects of such pre-NCLB test-driven accountability policy on student achievement, however, was mixed and often contradictory (see Carnoy & Loeb, 2002; Grissmer & Flanagan, 1998; Haney, 2000; Ladd, 1999; Lee, 2008). Furthermore, recent studies also yield mixed findings on post-NCLB academic progress, generating controversy over the policy's efficacy (see Center on Education Policy [CEP], 2007c; Dee & Jacob, 2009; Duffett, Farkas, & Loveless, 2008; Education Trust, 2006; Fuller, Gesicki, Kang, & Wright, 2006; Lee, 2006; Wong, Cook, & Steiner, 2009).

Mixed evidence for the effects of NCLB on student achievement may be well understood from research design and methodological perspectives. First, previous studies often confounded policy
effects by relying on data from states' own assessments as a tool of both NCLB intervention and evaluation at the same time. States tended to show more post-NCLB progress on their own high-stakes tests, although such progress did not always transfer to independent low-stakes tests such as the National Assessment of Educational Progress (NAEP; Lee, 2010). In addition, even when the NAEP instead of state assessments is used for policy evaluation, post-NCLB change may reflect a continuing trend that began before the policy was implemented. It remains to be rigorously examined whether and, if so, how and to what extent NAEP reading and mathematics average achievement and achievement gap trends are systematically related to state implementation of test-driven accountability policies before and after NCLB.

Second, research efforts to evaluate NCLB were thwarted by the complexity and variability of policy design and implementation in different states. Under NCLB, the existence of dual accountability systems and interactions between federal and state policies also complicates the analysis of post-NCLB achievement data. The mandatory nationwide implementation of NCLB essentially precludes analysis of further impacts of overall accountability systems by eliminating a comparison group of states without such policies (Hanushek & Raymond, 2004). Nevertheless, some previous studies attempted to capitalize on interstate pre-NCLB accountability policy variations (Dee & Jacob, 2009; Lee, 2006). This research design treats first-generation accountability states as a comparison group and second-generation accountability states as a treatment group under NCLB. One major problem with this simple design, however, is the assumption that all states are subject to the same dosage of accountability policy treatment under NCLB. The earlier studies also had limitations in that it could take several years for a new federal policy to produce an effect and also that the effect, if any, could be uneven between states, subjects, and subgroups within states as a function of their preexisting differences as well as policy treatment.

In light of these concerns, this study employs a new approach to the evaluation of NCLB with regard to its impact vis-a-vis excellence and equity policy goals. Our approach involves a comparative interrupted time-series design, with an enriched multilevel analysis of both intrastate and interstate variations in NAEP achievement outcomes before as well as after NCLB. The model incorporates into achievement trends state capacity and policy implementation factors beyond the adoption of high-stakes testing, as well as time-varying school resource effects. In addition, this study addresses potential threats to the internal validity of quasi-experimental research on the impact of NCLB by using enhanced statistical control for selection biases and regression to the mean through inverse probability of treatment weighting and latent variable regression techniques. Our study also extends prior work by examining states' progress toward narrowing academic achievement gaps (i.e., those between students in the 10th/25th and 90th/75th percentiles) as well as racial/ethnic and socioeconomic achievement gaps and by explaining the gap trends in the broader, longer term context of state capacity and endowment irrespective of federal policy.

Given current policy goals and intervention targets, this study tests the hypothesis that NCLB promotes academic excellence and equity in reading and math across all states by both improving the average achievement of all students and narrowing the gap between disadvantaged, low-achieving and minority students and their counterparts. It also tests the hypotheses that states with stronger educational capacity in place to produce desired student outcomes and with more timely, intensive, and rigorous implementation of accountability policy under NCLB would experience a more positive impact.

Analytical Framework

The theory of action behind test-driven external accountability policy is deemed fatally simple (see Adams & Kirst, 1999; Benveniste, 1985; Elmore, 2002; Fuhrman, 1999; Newmann, King, & Rigdon, 1997; O'Day, 2002; Wise, 1979). The logic of performance-driven accountability policy draws on rationalistic and behavioristic views of human behavior by positing that holding schools, teachers, and students accountable for academic performance, with incentives provided (i.e., rewards and sanctions), will inform, motivate, and reorient the behavior of schooling agents toward the goal. Over the years, states' policy
approaches to accountability have switched from a primary emphasis on input guarantees to performance guarantees (Elmore & Fuhrman, 1995). The advocates of input guarantees argue that every student must have equal access to high-quality learning by specifying key inputs (e.g., per-pupil spending, class size, and teacher training) in the form of binding opportunity-to-learn (OTL) standards (O'Day & Smith, 1993). In contrast, the critics of OTL standards argue that holding schools and students accountable for performance creates incentives for schools to find out which practices work most effectively (Hanushek, 1997). Although neither an input nor an output focus automatically leads to improvement in the distribution of student learning, the two approaches are not mutually exclusive, and combining them can be a more successful path (Bartman, 2002). For example, lower class sizes and increased, more equitable funding in Texas have created a context in which the accountability system could increase academic excellence and equity (Grissmer, Flanagan, Kawata, & Williamson, 2000; Skrla, Scheurich, Johnson, & Koschoreck, 2004).

If an analysis were to find a distinctive effect of NCLB, it could not, of course, be attributed to just one part of NCLB such as high-stakes testing or school accountability. Other policy initiatives under NCLB (e.g., teacher quality) and an initial influx of new federal funds may have influenced the trends as well. Moreover, some states that had high-stakes test-driven accountability prior to NCLB continued their own policies along with NCLB, thus creating a dual accountability system. By the same token, the pre-NCLB period is not free of similar types of interventions (e.g., restructuring). Consequently, both pre-NCLB and post-NCLB policy factors should be considered.

Lawmakers did not intend that NCLB would supplant a state's preexisting accountability policy but rather that it would function as an add-on to enhance or augment state policy. States with strong accountability systems may be better positioned to embrace and implement NCLB reform policy since implementation theory predicts stronger implementation fidelity among agents/players who are accustomed to the intervention. No matter what real impact NCLB may have had on first-generation states, the primary target of NCLB may very well have been second-generation states: those states where test-driven external accountability was new. By this logic, states with no exposure to high-stakes testing prior to NCLB would be more likely to experience the effect of this new intervention by improving on their pre-NCLB achievement.

Previous studies on the impact of NCLB, including Lee (2006) and Dee and Jacob (2009), attempted to address variation among states in their accountability policy history prior to NCLB. Specifically, states that did not have high-stakes accountability policies before NCLB and were exposed only to the influences of external accountability under NCLB are compared with states that were active in test-driven accountability policy prior to NCLB. Their analyses compare differences in both pre- and post-NCLB growth rates between these two groups of states to draw causal inferences about the impact of NCLB. Lee (2006) found that NCLB did not make a significant difference in improving reading and mathematics achievement or achievement gaps across the states.1 In contrast, Dee and Jacob (2009) reported significantly positive effects in Grade 4 math.2

Both Lee (2006) and Dee and Jacob (2009) employed a comparative interrupted time-series research design for their analyses of NCLB accountability policy effects (see Design A in Figure 1). This design is based on the assumption that first-generation accountability states maintain the same accountability policy as before NCLB whereas second-generation accountability states adopted this same accountability policy only after NCLB. However, this design ignores the possibility that federal NCLB accountability policy has added its own features to what first-generation states had and also that second-generation states did not implement that policy as faithfully and expeditiously as did their first-generation counterparts. To address this implementation dimension and federal-state policy interactions, this study adds post-NCLB policy variables to the model in addition to just a pre-NCLB state policy variable (see Design B in Figure 1).

[Figure 1 diagram. Rows: first-generation accountability states (e.g., TX, NC, CA) and second-generation accountability states (e.g., IA, ME, WY); columns: pre-NCLB period (1990-2002) and post-NCLB period (2003-2009). Design A: pre-NCLB state accountability policy (X) as a single treatment variable (first-generation states: X before and after NCLB; second-generation states: X after NCLB only). Design B: pre-NCLB state accountability policy (X1) plus federal NCLB accountability policy (X2) as dual treatment variables (first-generation states: X1 before NCLB, X1 + X2 after; second-generation states: X1' + X2' after NCLB only).]
FIGURE 1. Comparative interrupted time-series research designs for analysis of NCLB policy impact.
Note. X1 = pre-NCLB state accountability policy; X2 = NCLB federal accountability policy; X1 + X2 = mix of federal and state accountability policies under NCLB; X1', X2' = delayed or watered-down version of X1 + X2; O = student reading and math achievement average and gap measures.

Regardless of whether or not students in high-accountability states averaged significantly greater post-NCLB gains on the NAEP than students in states with little or no pre-NCLB accountability measures in place, the reasons for the presence or absence of such an effect remain unclear. Previous studies found that states' policy
approaches to school accountability vary in terms of their mix of pressure and support (Elmore & Fuhrman, 1995; Lee & Wong, 2004). Similar patterns of interstate variations in accountability and capacity-building efforts are likely to produce different results after NCLB. However, previous studies did not examine the mechanism through which high-stakes accountability policy might have affected student outcomes. This kind of black-box approach ignored variables that account for the relationship between state policy and student outcome variables. This study elaborates on Design B by examining critical factors that might influence the success of policy implementation, that is, fidelity, rigor, and capacity.

First, this study constructs a state fidelity of implementation variable as a predictor of policy outcomes. Not all states fully implemented NCLB, at least in the first several years after the law passed. One possible reason for interstate variation in post-NCLB progress is states' different implementation strategies (Lee, 2010). Given the tradition that much of educational policymaking has been historically left to the states, individual states were able to negotiate an implementation plan with the federal government and to take advantage of built-in flexibilities around standard-setting under NCLB (Marion et al., 2002; Mills, 2008).3

This study also constructs a state rigor of performance standards measure as another key dimension of NCLB accountability. The percentage of schools that did not meet performance targets and thus received interventions depended not only on how well the schools perform but also on how rigorous the performance standards are (Lee, 2010). Recent studies of NCLB policy interventions for schools identified as "needing improvement" (a euphemism for failure) reveal problems. For example, a RAND/AIR study (Zimmer, Gill, Razquin, Booker, & Lockwood, 2007) shows very limited participation in supplemental education services (24%-28% in elementary and <5% in
high school) and school choice (<1%) and very small and no effects of supplementary education services and school choice, respectively, on student achievement. As last resorts, corrective action and restructuring also appear to have been either underused or ineffective (CEP, 2008).4 However, few Title I schools undergoing restructuring even experienced any of the listed interventions (Mathis, 2009).

Last, the study adds measures of states' capacity for district/school support. Previous studies showed that test-driven accountability policies imposed changes in schools with little to no support over the long haul and that this unfunded mandate has shortchanged schools (Kim & Sunderman, 2004; Linn, 2003; Porter & Chester, 2002). Although NCLB provides a federal mandate for states to develop statewide systems of support intended to build the capacity of underperforming districts and schools, this new expectation for an enhanced role of state education agencies in school improvement has faced serious challenges due to their own fiscal, administrative, and technical capacity limitations (CEP, 2007b; McClure, 2005; Rhim, Hassel, & Redding, 2008). Therefore, we consider it critical to measure statewide endowment or capacity from a multilevel education system perspective, incorporating not only state agency-level capacity measures such as funding and data system building for district/school support but also school/classroom-level educational endowment measures such as teacher quality and class size. The effects of these key school resources on academic achievement have been well demonstrated (see Ferguson, 1991; Finn & Achilles, 1990; Greenwald, Hedges, & Laine, 1996; Grissmer et al., 2000; Lee & Barro, 1998). However, a previous study also found that state activism in high-stakes school accountability policy did not usually lead to systemic changes in the distributions of key school resources (Lee & Wong, 2004). The same problem could recur when states implement high-stakes accountability policy under NCLB in lieu of or by means of capacity building and investment.

Method

This study employs a comparative interrupted time-series design to explore the impact of the NCLB policy on student achievement outcomes and achievement gaps in reading and mathematics. The aggregate national trends may obscure variations among states in terms of both the nature and extent of NCLB policy impact. The study uses NAEP state-level aggregate fourth- and eighth-grade public school students' achievement results in these subjects during the 1990-2009 period. The "interrupted" time-series aspect of this study's design involves division of the NAEP trend period into pre-NCLB (1990-2002) and post-NCLB (2003-2009) periods. It estimates post-NCLB changes in average achievement relative to pre-NCLB trends among fourth and eighth graders. The "comparative" aspect of the study design involves comparison of states varying in terms of their educational capacity and high-stakes accountability policy history prior to NCLB as well as the fidelity and rigor of their NCLB policy implementation.

Several different post-NCLB trajectories of state average achievement are possible. For each of the three potential post-NCLB growth patterns, changes may be observed in terms of growth rate (slope) and/or achievement status (level). When NCLB has a significant positive effect, the performance trajectory will shift upward with a marked increase in achievement level and/or growth rate. When NCLB has a significant negative effect, the performance trajectory will shift downward with a marked decrease in the growth rate and/or level. When NCLB has no effect at all, we expect no change in the level and slope such that pre- and post-NCLB growth patterns remain the same.

Hierarchical linear modeling (HLM) was used to track individual states' patterns of academic growth and to examine interstate variations (Raudenbush & Bryk, 2002). Two-level HLM analyses were conducted separately for four outcome variables (see Appendix A). At Level 1 (time level), three temporal predictors were used to keep track of each state i's outcome variable Y at year t; initial status (π0i), pre-NCLB growth rate (π1i), post-NCLB change in the growth rate (π2i), and post-NCLB change in the level of achievement (π3i) were assumed to vary randomly among states. The model postulates discontinuities in both slope and level; it hypothesizes that both the growth rate and level change after NCLB. When both level and slope increase together, we expect sustained positive gain after NCLB such that the post-NCLB
growth rate is significantly greater than the pre-NCLB growth rate and the gains are not temporary.

The Level 1 model also includes time-varying covariates. Recognizing that demographic changes between cohort groups may influence student achievement trends, we account for the percentages of minority (i.e., Black and Hispanic) and poor (i.e., eligible for free or reduced-price lunch) students. Furthermore, the Level 1 model includes teacher salary (as proxy for teacher experience and quality) and pupil-teacher ratio (as proxy for class size) to examine the effects of key school resources on achievement outcomes over time among sequential cohorts. Per-pupil expenditures, global measures of school resources, are highly correlated with both the teacher salary and pupil-teacher ratio variables. Moreover, since teacher salary and pupil-teacher ratio serve as key determinants of instructional spending per pupil, we used those two specific measures of school resources for elementary and secondary education.5

Our supplementary trend analyses of these time-varying covariates as dependent variables showed significant growth in percentage minority/poverty students both before and after NCLB. Pupil-teacher ratio and teacher salary showed different trends: Pupil-teacher ratio decreased incrementally throughout the 1990-2009 period, whereas real teacher salary remained largely unchanged. There were substantial variations among states in the trend of increases in school resources (associated with increases in achievement) as well as increases in the poor, minority population (with decreases in achievement) throughout the entire period. These forces may have worked together to influence state achievement trends, independent of NCLB.

At Level 2 (state level), not only pre-NCLB state accountability policy but also variables that tap into post-NCLB state policy activities are used to explain interstate variations in post-NCLB changes to state achievement trends; they include fidelity of NCLB implementation, rigor of standards, data tracking capacity, and funding capacity for schools in need of improvement (SINI; see Appendices B and C). The fidelity variable measures how faithfully and quickly states complied with key NCLB federal requirements in place, whereas the rigor variable captures the level of states' own performance standard as measured by the discrepancy between NAEP and state assessment results, which captures the scale of intervention. State education agency capacity variables include measures of building a longitudinal student achievement data tracking system and funding for SINI schools. The effect sizes are reported based on student-level standard deviations (σ) of achievement in 2003.

Correlations among the Level 2 state variables show a weak to moderate degree of interrelationships but no indication of multicollinearity problems. The fidelity of states' NCLB policy implementation was positively associated with the intensity of their pre-NCLB high-stakes accountability policy (r = .38) and slightly negatively associated with the rigor of performance standards (r = -.20 for reading; r = -.25 for math). This suggests that the first-generation accountability states were more likely to comply with NCLB mandates but at the same time adopted relatively lower performance standard levels. On the other hand, the study found a nonsignificant correlation between the state agency capacity and fidelity factors (r = .13 between data tracking capacity and implementation fidelity; r = -.12 between school improvement grants for SINI and implementation fidelity), implying that high-stakes accountability policy was not systematically accompanied by capacity-building efforts at the state level. However, the states' activism in building a data tracking system was positively associated with the state agencies' capacity for school support as measured by school improvement grants for SINI (r = .45).

The Level 2 model involves comparing the first- and second-generation accountability states. To address potential selection bias in drawing causal inferences about the impact of NCLB based on this comparison, this study applies inverse probability of treatment weighting (IPTW) and latent variable regression methods. IPTW builds on propensity score matching, which employs a predicted probability of group membership (treatment versus control group) based on observed predictors; this probability may be used for matching or as a covariate in quasi-experimental research (Rosenbaum & Rubin, 1983). IPTW realizes this matching by assigning differential weights to subjects based on the inverse probability of receiving a treatment at a given time conditional on prior outcome history and other covariates
(Hirano & Imbens, 2002). Based on prior research (Carnoy & Loeb, 2002; Lee, 1997), we identified six covariates (X1 = % high school graduates, X2 = % population below poverty, X3 = % White population, X4 = % state share of education revenues, X5 = traditionalistic political culture, X6 = average SAT scores corrected for % test takers) that were likely to be associated with both policy adoption and achievement outcomes; they were all measured around 1990, that is, prior to the states' pre-NCLB accountability policy implementation period (1992-2001). Then, the treatment variable (z), a dummy variable for the existence of pre-NCLB accountability policy based on the Dee and Jacob (2009) study, was modeled as a function of the covariates (x) with linear and quadratic terms using binomial logistic regression to generate propensity scores for treatment group assignment [χ²(11) = 24.04, p < .01 for the omnibus test of model coefficients; Nagelkerke R² = .52; classification accuracy rate = 82%]. The results showed that the first-generation accountability states were more likely to have lower high school graduation rates, lower SAT scores, and traditional political culture (e.g., centralized state control of education); despite the statistical insignificance of the percentage poverty and percentage White variables, they were retained in the model based on prior research. With the estimated propensity score (i.e., predicted probability that each state adopts high-stakes accountability before NCLB, conditional on all of the covariates), states were assigned weights to adjust for selection bias.6 Finally, we fit the HLM model with IPTW weights at the state level to estimate NCLB accountability policy effects.

Preliminary analyses showed that both the capacity and accountability factors are associated with state initial statuses and pre-NCLB growth rates; high-capacity states tend to have a relatively higher initial status of student achievement (in both subjects and grades), whereas high-accountability (particularly high-rigor) states tend to have a relatively lower initial status (in both subjects and grades) or greater pre-NCLB growth (in Grade 4 reading and Grade 8 math). Thus, a latent variable regression method was used to address the effects of initial status and pre-NCLB growth rate on post-NCLB changes (see Raudenbush & Bryk, 2002).

The study has several limitations that warrant caveats and further investigations. First, the state-level aggregate policy measures employed are only proxies for observable state policy activities and responses that may or may not coincide with local policy implementation such as changes in instructional time allocation and classroom practices.7 Second, it remains controversial whether using the NAEP as opposed to states' own assessment results is a more valid way to measure policy impact on student achievement outcomes under NCLB. We chose not to use state tests as an outcome measure in NCLB evaluations because they are part of the NCLB treatment/intervention. Third, our model tests for changes in both status and growth: Effect on the former suggests immediate temporary gain/loss in achievement whereas effect on the latter suggests subsequent incremental gain/loss. However, it may not accurately capture delayed policy effects. Fourth, one potential factor that may confound the results of the average achievement and gap trend analysis is change in the identification and exclusion of certain groups of students for NAEP testing, particularly students with learning disabilities (SWD) and English language learners (ELL). Since the exclusion rate of SWD and/or ELL students varied from state to state, future research needs to consider this interstate variation for potentially more valid comparison of the state achievement trends.8 Finally, because of the shift from voluntary state participation in the NAEP to mandatory participation with the NCLB legislation, analyses that rely on different sets of states can yield different estimates of the pre-NCLB and post-NCLB trends. A sensitivity analysis that uses only states with enough pre-NCLB data to estimate a trend was conducted to see how much the differential state participation affects results. Specifically, we selected states that participated at least three times before NCLB.9 We identified 40 states for Grade 4 reading and Grade 8 math, 33 states for Grade 8 reading, and 31 states for Grade 4 math that met this criterion and then conducted separate analyses with those selected states only. The sensitivity analysis checked robustness by comparing these limited-state sample results with the original 50-state sample results. We found that the magnitudes of coefficients are similar and their significance patterns are almost the same such that the substantive findings and conclusions do not change.10
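
The participation rule behind this sensitivity analysis can be expressed as a simple filter over state-by-series participation records. The Python sketch below is illustrative only: the (state, series, year) record format, the function name `select_states`, and the sample data are hypothetical, and counting NAEP administrations dated 2002 or earlier as pre-NCLB is an assumption based on the study's 1990-2002 pre-NCLB period.

```python
from collections import defaultdict

# Assumption: the study's pre-NCLB period runs 1990-2002, so any NAEP state
# administration dated 2002 or earlier counts as "before NCLB".
PRE_NCLB_LAST_YEAR = 2002
MIN_PRE_NCLB_WAVES = 3  # "participated at least three times before NCLB"


def select_states(participation, min_waves=MIN_PRE_NCLB_WAVES,
                  last_pre_year=PRE_NCLB_LAST_YEAR):
    """Return {series: sorted list of states} meeting the pre-NCLB participation rule.

    `participation` is an iterable of (state, series, year) tuples, one per NAEP
    state administration a state took part in (a hypothetical record format).
    Each grade-subject series is screened separately.
    """
    pre_years = defaultdict(set)  # (series, state) -> distinct pre-NCLB years
    for state, series, year in participation:
        if year <= last_pre_year:
            pre_years[(series, state)].add(year)

    selected = defaultdict(list)
    for (series, state), years in pre_years.items():
        if len(years) >= min_waves:
            selected[series].append(state)
    return {series: sorted(states) for series, states in selected.items()}


if __name__ == "__main__":
    # Hypothetical participation records for two states in one series.
    records = [
        ("State A", "Grade 4 math", 1992),
        ("State A", "Grade 4 math", 1996),
        ("State A", "Grade 4 math", 2000),
        ("State A", "Grade 4 math", 2003),
        ("State B", "Grade 4 math", 1996),
        ("State B", "Grade 4 math", 2003),
        ("State B", "Grade 4 math", 2005),
    ]
    print(select_states(records))  # {'Grade 4 math': ['State A']}
```

Screening each grade-subject series separately is what allows the qualifying state count to differ by series, as it does in the study (40, 33, and 31 states).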
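
The IPTW adjustment described in the Method section (propensity scores from a logistic regression of pre-NCLB accountability status on 1990 state covariates, then inverse probability weights applied in the state-level HLM) can be sketched as follows. This Python example assumes the propensity scores have already been estimated and uses the simple point-treatment form of IPTW (1/p for treated states, 1/(1 - p) for comparison states) rather than the time-varying form cited from Hirano and Imbens (2002); the `State` record, the `iptw_weights` helper, the optional stabilization, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class State:
    name: str          # hypothetical state label
    treated: bool      # True = first-generation (pre-NCLB high-stakes accountability)
    propensity: float  # estimated P(treated | 1990 covariates), e.g., from a logistic model


def iptw_weights(states, stabilized=False):
    """Return {state name: IPTW weight}.

    Treated states get 1 / p and comparison states get 1 / (1 - p), where p is
    the estimated propensity score. If stabilized is True, each weight is also
    multiplied by the sample share of the group the state belongs to, a common
    variant that keeps weights closer to 1 (not necessarily the study's choice).
    """
    if not states:
        raise ValueError("need at least one state")
    for s in states:
        if not 0.0 < s.propensity < 1.0:
            raise ValueError(f"propensity for {s.name} must lie strictly between 0 and 1")

    share_treated = sum(s.treated for s in states) / len(states)
    weights = {}
    for s in states:
        if s.treated:
            w = 1.0 / s.propensity
            if stabilized:
                w *= share_treated
        else:
            w = 1.0 / (1.0 - s.propensity)
            if stabilized:
                w *= 1.0 - share_treated
        weights[s.name] = w
    return weights


if __name__ == "__main__":
    sample = [
        State("State A", True, 0.80),   # very likely to adopt, and did: smaller weight
        State("State B", True, 0.40),   # adopted despite low odds: larger weight
        State("State C", False, 0.25),
        State("State D", False, 0.70),  # likely to adopt, but did not: larger weight
    ]
    for name, w in iptw_weights(sample).items():
        print(f"{name}: {w:.2f}")
```

States whose actual accountability history is least predictable from their 1990 covariates receive the largest weights, which is how the reweighted comparison moves toward balance on those observed characteristics.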
Results associated with decreases in states' average

achievement across subjects and grades. The
Post-NCLB State Average Achievement estimated intrastate regression coefficient for the
Trends
effect of percentage poor students ranges from
18 to -.24. For example, a 10 percentage point
Descriptive analyses of pooled data, state aver
increase of poor students (about one standard
age achievement scale scores from all 50 states
deviation of state average poverty rate) is
over the 1990-2009 period, show substantial
associated with a 2.4 point (.07 o) loss in Grade
intrastate and interstate variations (N= 362, M=
4 state average reading scale scores (10 * b40 =
217.58, SD = 7.31 for Grade 4 reading; N= 277,
-2.4). Consistent negative relationships of a
M= 262.61, SD = 6.04 for Grade 8 reading; N
similar magnitude are found with percentage
324, M= 231.57, SD =10.41 for Grade 4 math;
Black (b50 ranging from -.14 to -.25) and
N = 357, M= 274.39, SD = 10.70 for Grade 8
percentage Hispanic (bm ranging from -.09 to
math). The random coefficient growth model
-.16) across grades and subjects.
HLM analysis finds that there were mixed patterns
On the other hand, there were consistently
of changes in average achievement for different
positive effects of increases in teacher resources on
subjects and grades after NCLB across the states:
student achievement.11 The coefficient for a teacher
The overall directions of these changes were posi
salary effect (one unit = $1,000) ranges from .07 to
tive for math but negative for reading (see fixed
.20. For example, a $7,000 increase in teacher salary
effects in Table 1).
(about one standard deviation of state average
For Grade 4 reading, the status of average
teacher salary) was associated with a 1,4-point gain
achievement did not change (b}0 = -0.70), but the
(.04 a) in Grade 4 state average reading scale scores
rate of growth increased marginally (b20 = 0.26)
(7 * b10 = 1.40). At the same time, the coefficient
after NCLB. For Grade 8 reading, the status of
for the pupil-teacher ratio effect ranges from -.13
average achievement dropped after NCLB (b30 =
to -.58, with the exception of a marginally positive
-0.93) and the rate of growth also slowed down
effect in eighth-grade math. For example, a three
over the post-NCLB period (b20 = -0.36).
unit decrease in pupil-teacher ratio (about one
Consequently, the average total amounts of post
standard deviation of state average pupil-teacher
NCLB reading score gains, as observed during
ratio) was associated with a 1.74-point gain (.05 a)
the 2003-2009 period across all states (i.e., b30
of Grade 4 state average reading scale scores (-3 *
for post-NCLB change to status + 6 years x b20
fo80 = 1.74).
for post-NCLB change to growth), were 0.86 for
Grade 4 and -3.09 for Grade 8. In math, as Post-NCLB State Achievement Gap
with reading, more positive changes occurred to Trends
Grade 4 than Grade 8 but there were no negative
changes. For Grade 4 math, the status of average The results of random coefficient model
analyses for achievement gaps are summarized
achievement rose (b30 = 7.61) but the rate of
growth remained the same (b20 = 0.02) afterin Table 2. In reading, there were mixed, although
NCLB. For Grade 8 math, there were no significantgenerally negative, patterns in post-NCLB prog
changes in either status or growth. The averageress toward closing the achievement gaps between
total amounts of post-NCLB math score gains racial, socioeconomic, and academic subgroups.
For the achievement gaps between racial and
during 2003-2009 across all states were 7.73 for
Grade 4 and 0.97 for Grade 8. socioeconomic subgroups in particular, pre
It needs to be noted that these estimates of NCLB progress has either remained the same or
post-NCLB changes are above and beyond what slowed down after NCLB. There were no post
one would expect to happen based on a NCLB changes seen in either the status or growth
continuation of pre-NCLB trends. Furthermore, rate of the Black-White reading achievement gap
both pre- and post-NCLB trends would capture in Grades 4 and 8. The same can be said of the
residualized achievement gains or losses unique Hispanic-White reading gap.
from changes in student demographics and school A reversal of earlier progress since 1990 is most
resources. Increases in the proportion of minority evident for the socioeconomic reading achievement
and poor student populations were consistently gap. Post-NCLB increases in both the status (b}0
TABLE 1
Summary Results of Hierarchical Linear Modeling (HLM) Base Model (Random Coefficients Model) for the Trends of State National Assessment of Educational Progress (NAEP) Reading and Math Average Achievement

| | Reading, Grade 4 | Reading, Grade 8 | Math, Grade 4 | Math, Grade 8 |
|---|---|---|---|---|
| Fixed effects | | | | |
| Initial status (β00) | 211.74*** (0.97) | 256.83*** (1.57) | 214.21*** (0.95) | 260.08*** (1.05) |
| Pre-NCLB growth (β10) | 0.49*** (0.08) | 0.51** (0.15) | 1.04*** (0.09) | 1.28*** (0.07) |
| Post-NCLB change to growth (β20) | 0.26* (0.13) | -0.36* (0.17) | 0.02 (0.11) | 0.02 (0.11) |
| Post-NCLB change to status (β30) | -0.70 (0.50) | -0.93** (0.31) | 7.61*** (0.48) | 0.86 (0.53) |
| % poor effect (β40) | -0.24*** (0.04) | -0.21*** (0.04) | -0.18*** (0.04) | -0.20*** (0.04) |
| % Black effect (β50) | -0.14** (0.04) | -0.15*** (0.04) | -0.17*** (0.04) | -0.25*** (0.05) |
| % Hispanic effect (β60) | -0.09* (0.04) | -0.12** (0.04) | -0.12** (0.04) | -0.16** (0.05) |
| Teacher salary effect (β70) | 0.20** (0.06) | 0.07 (0.06) | 0.15** (0.05) | 0.10† (0.06) |
| Pupil-teacher ratio effect (β80) | -0.58** (0.16) | -0.55*** (0.13) | -0.13 (0.14) | 0.27* (0.15) |
| Random effects | | | | |
| Initial status (τ00) | 28.82*** | 64.42*** | 27.56*** | 40.89*** |
| Pre-NCLB growth (τ10) | 0.11*** | 0.61*** | 0.18*** | 0.16*** |
| Post-NCLB change to growth (τ20) | 0.15 | 0.95*** | 0.34*** | 0.35*** |
| Post-NCLB change to status (τ30) | 1.02 | 1.35* | 3.29** | 6.96*** |
| Level 1 variance (σ²) | 4.80 | 1.18 | 2.62 | 2.08 |
| Deviance statistic (-2 log likelihood) | 1582.43 | 995.20 | 1521.80 | 1706.55 |

Note. Unstandardized regression coefficients with (standard errors) are reported for fixed effects. NCLB = No Child Left Behind Act of 2001.
†p < .10. *p < .05. **p < .01. ***p < .001.
= 1.87) and growth rate (b20 = 0.89) of the nonpoor-poor gap in fourth grade were observed. The total post-NCLB increase by 2009 in the nonpoor-poor gap for fourth-grade reading was 7.21. Although somewhat smaller in magnitude, increases in both status (b30 = 1.73) and growth rate (b20 = .47) were also seen for eighth grade. For fourth grade, the increase in gap level was attributable to differential post-NCLB drops for both subgroups, wherein poor students dropped more than their nonpoor counterparts. The post-NCLB increase in the growth rate of the socioeconomic reading gap in fourth grade was attributable to a setback experienced by poor students only. For eighth grade, the post-NCLB increase in the level of this socioeconomic reading achievement gap occurred because poor students dropped more than nonpoor students.

Post-NCLB changes to the reading achievement gaps between academic subgroups were mixed; the changes in status indicate setbacks, whereas changes in growth rates suggest progress. We found an increase in the status of the fourth-grade reading achievement gap between the 90th and 10th percentiles (b30 = 2.20). However, we observed a reduction in the growth rate (b20 = -1.11) for this gap. In eighth grade, we found an increase in the status (b30 = 5.05) of the reading achievement gap between the 90th and 10th percentiles. Similar post-NCLB increases in the status of both the fourth-grade (b30 = 1.19) and eighth-grade (b30 = 2.79) reading achievement gaps between the 75th and 25th percentiles were observed. In contrast, we again found reductions in the growth rate after NCLB in both fourth grade (b20 = -0.70) and eighth grade (b20 = -0.33). The
TABLE 2
Summary Results of Hierarchical Linear Modeling (HLM) Base Model (Random Coefficients Model) for the Trends of State National Assessment of Educational Progress (NAEP) Reading and Math Achievement Gap Trends

| | Reading, Grade 4 | Reading, Grade 8 | Math, Grade 4 | Math, Grade 8 |
|---|---|---|---|---|
| White-Black | | | | |
| Pre-NCLB growth (β10) | -0.23* (0.11) | 0.09 (0.24) | -0.35** (0.12) | 0.23* (0.12) |
| Post-NCLB change to growth (β20) | -0.08 (0.25) | -0.29 (0.29) | 0.27* (0.14) | -0.66* (0.18) |
| Post-NCLB change to status (β30) | 0.14 (0.93) | -0.03 (0.88) | -1.26 (0.75) | -1.60 (1.10) |
| Teacher salary (β70) | 0.18† (0.09) | 0.14 (0.09) | 0.14* (0.07) | 0.24** (0.09) |
| Pupil-teacher ratio (β80) | -0.04 (0.27) | -0.23 (0.24) | 0.12 (0.18) | 0.03 (0.25) |
| White-Hispanic | | | | |
| Pre-NCLB growth (β10) | -0.37* (0.15) | 0.08 (0.33) | 0.06 (0.13) | 0.14 (0.22) |
| Post-NCLB change to growth (β20) | 0.50* (0.27) | -0.32 (0.45) | -0.21 (0.15) | -0.62* (0.25) |
| Post-NCLB change to status (β30) | -1.28 (1.10) | 0.30 (1.08) | -3.42** (0.96) | -0.37 (1.31) |
| Teacher salary (β70) | 0.56*** (0.11) | 0.31** (0.10) | 0.31*** (0.07) | 0.26** (0.09) |
| Pupil-teacher ratio (β80) | 0.23 (0.27) | -0.26 (0.25) | 0.38* (0.17) | -0.15 (0.23) |
| Nonpoor-Poor | | | | |
| Pre-NCLB growth (β10) | -1.10*** (0.17) | -0.66** (0.19) | -0.22 (0.15) | 0.37* (0.20) |
| Post-NCLB change to growth (β20) | 0.89*** (0.21) | 0.47* (0.22) | 0.15 (0.15) | -0.49* (0.20) |
| Post-NCLB change to status (β30) | 1.87** (0.59) | 1.73** (0.57) | -2.21** (0.75) | -2.18* (0.97) |
| Teacher salary (β70) | 0.34*** (0.06) | 0.30*** (0.05) | 0.27*** (0.04) | 0.30*** (0.05) |
| Pupil-teacher ratio (β80) | -0.03 (0.15) | -0.09 (0.13) | -0.13 (0.11) | -0.11 (0.14) |
| 90th-10th Percentile | | | | |
| Pre-NCLB growth (β10) | 0.00 (0.13) | -0.30 (0.22) | -0.28* (0.13) | 0.31** (0.09) |
| Post-NCLB change to growth (β20) | -1.11*** (0.28) | -0.42 (0.28) | 0.39* (0.15) | -0.31* (0.12) |
| Post-NCLB change to status (β30) | 2.20* (0.15) | 5.05*** (0.56) | -4.29*** (1.06) | -1.46* (0.63) |
| Teacher salary (β70) | 0.16* (0.08) | 0.27*** (0.07) | 0.19*** (0.05) | 0.37*** (0.06) |
| Pupil-teacher ratio (β80) | 0.90*** (0.21) | 0.56** (0.17) | 0.57*** (0.14) | 0.34* (0.16) |
| 75th-25th Percentile | | | | |
| Pre-NCLB growth (β10) | 0.00 (0.07) | -0.15 (0.12) | -0.17* (0.08) | 0.11* (0.05) |
| Post-NCLB change to growth (β20) | -0.70*** (0.14) | -0.33* (0.16) | 0.20* (0.09) | 0.00 (0.06) |
| Post-NCLB change to status (β30) | 1.19* (0.57) | 2.79*** (0.31) | -2.36** (0.62) | -0.62† (0.32) |
| Teacher salary (β70) | 0.06 (0.04) | 0.14*** (0.04) | 0.11*** (0.03) | 0.21*** (0.03) |
| Pupil-teacher ratio (β80) | 0.46*** (0.12) | 0.24** (0.09) | 0.35*** (0.08) | 0.28** (0.08) |

Note. Only selected fixed effect portions of the results are shown. Unstandardized coefficients with (standard errors) are reported. Full model results for achievement gaps are available from the authors upon request. NCLB = No Child Left Behind Act of 2001.
†p < .10. *p < .05. **p < .01. ***p < .001.
total post-NCLB increases by 2009 in the 75th-25th percentile gap for fourth- and eighth-grade reading were -3.01 and 0.81, respectively.

On the other hand, post-NCLB changes to achievement gaps in math suggest some progress. For the racial/ethnic achievement gaps, there were indications of favorable changes in both the Black-White and Hispanic-White gaps. For example, we found a post-NCLB reduction in the growth rate of the Black-White gap for Grade 8 (b20 = -0.66). This gap reduction occurred because of an increase in the post-NCLB growth rate for Black students. The total post-NCLB reduction of the White-Black gap for eighth-grade math by 2009 was 5.56. Next, the growth rate for the Hispanic-White achievement gap in eighth grade decelerated after NCLB (b20 = -0.62), and the status of this gap in fourth grade decreased after NCLB (b30 = -3.42). The reduction in the gap growth rate in eighth grade occurred because of an increase in the growth rate for Hispanic students. The reduction in the status of the gap for fourth grade occurred because of larger increases in status for Hispanic than for White students. The total post-NCLB reduction in the White-Hispanic gap for eighth-grade math by 2009 was 4.09.

There was also some indication of progress in narrowing socioeconomic gaps in math after NCLB. For post-NCLB changes in the nonpoor-poor gap in math achievement, we found a reduction in the level of the gap in fourth grade (b30 = -2.21) and a reduction in both the growth rate (b20 = -0.49) and status (b30 = -2.18) in eighth grade. The post-NCLB reduction in the fourth-grade gap level was fueled by positive changes for both groups, with poor students gaining more. The post-NCLB reduction in the gap growth rate in eighth grade was due to positive post-NCLB growth rate changes for both groups, with relatively faster growth for poor students. The reduction in the status of the poverty gap in eighth-grade math was similarly explained by increases in status for both groups, which favored poor over nonpoor students. The total reduction in the nonpoor-poor gap for eighth-grade math by 2009 was 5.12.

Our analysis of the academic gaps in math also largely suggests favorable changes after NCLB. For progress in closing the gap in math achievement between the 90th and 10th percentiles, our analysis reveals post-NCLB reductions in status in fourth grade (b30 = -4.29) and in both the status (b30 = -1.46) and growth rate (b20 = -.31) in eighth grade. The post-NCLB reduction in the status of the gap in fourth grade was caused by positive changes for both groups, with the 10th percentile gaining more. For eighth grade, the post-NCLB reduction in the academic gap level was fueled by an increase in level for the 10th percentile. The decrease in the growth rate in eighth grade was fueled by positive changes for both groups, with the 10th percentile growing relatively faster. The total post-NCLB reduction in the 90th-10th percentile gap for eighth-grade math by 2009 was 3.32.

As with overall academic achievement, we also examined the effects of two school resource variables (i.e., teacher salary and pupil-teacher
ratio) on racial/ethnic, socioeconomic, and academic achievement gaps. Interestingly, increases in teacher salary were associated with increases in the magnitude of 18 of the 20 gaps examined. For example, a $1,000 increase in salary was associated with a .24-point increase in the gap between Black and White students in eighth-grade mathematics. The magnitudes of the other significant teacher salary coefficients (b70s) ranged from .11 to .56. Subgroup analyses showed that 12 of the 18 teacher salary effects on gaps were driven by a positive relationship between teacher salary and the achievement of only the more advantaged group. The other six were fueled by a positive relationship between teacher salary and the achievement of the advantaged group, on the one hand, and a negative relationship between teacher salary and the achievement of the disadvantaged group, on the other. These results suggest that an increase in teacher salary (a proxy for teacher experience and quality) might have stronger effects for more advantaged, high-achieving students.

In contrast, increases in pupil-teacher ratio were associated with increases in academic achievement gaps. For example, a one-unit increase in pupil-teacher ratio was associated with a .56-point increase in the achievement gap between the 90th and 10th percentiles in Grade 8 reading. The magnitudes of the other pupil-teacher ratio coefficients (b80s) for the academic gaps ranged from .24 to .90. Increased pupil-teacher ratios were usually associated with lower achievement for the more disadvantaged group only. Pupil-teacher ratio also had a similar, positive relationship with the mathematics achievement gap between Hispanic and White students in Grade 4. This effect arose because a higher pupil-teacher ratio was associated with lower achievement for Hispanic students only. These results suggest that pupil-teacher ratio (a proxy for class size) has stronger effects for more disadvantaged, low-achieving students (see Finn & Achilles, 1990).

The negative effect of teacher salary increases and the positive effect of pupil-teacher ratio reductions on achievement gaps might be related to the different allocation patterns of these resources between advantaged high-achieving students and disadvantaged low-achieving students. Pupil-teacher ratio tends to be lower for low-achieving students, since class size reduction has been mostly targeted to those in compensatory education or special education (Rothstein, 1995). In contrast, the effects of teacher salary increases, if any, are more likely to accrue to high-achieving students, since the custom of backloaded teacher salary raises gives larger benefits to veteran teachers (Lankford & Wyckoff, 1997; Monk & Jacobson, 1985), whereas higher paid teachers with greater experience often transfer to more advantaged and high-achieving schools (Roza, 2010). Uneven effects on the achievement gaps may follow from such unequal distributions of teacher compensation and class size.

NCLB and State Policy Effects on Post-NCLB Achievement Trends

Notwithstanding the aggregated fixed-effects patterns of achievement trends summarized in the previous sections, the random coefficient model analysis finds significant variations among the states in the nature and extent of post-NCLB changes, even after adjusting for the effects of time-varying covariates. Random effects were significant for both the level and growth rate of state average achievement before and after NCLB across subjects and grades; the one exception was Grade 4 reading (see random effects in Table 1).

To account for those interstate variations in post-NCLB trends, a sequence of three state-level models was applied to each of four achievement outcomes as dependent variables (see Table 3). Model 1 does not include any adjustment for covariates at either level and includes pre-NCLB accountability as the sole predictor at Level 2. Model 2 adds all other Level 2 predictors with IPTW weights but does not include the teacher salary and pupil-teacher ratio variables at Level 1. Model 3 includes both resource variables at Level 1 to account for changes in test scores attributable to resource changes, so that the trends are left to pick up the effects of accountability. Comparisons of results with and without resource variables provide a range of results under different assumptions about the relationship between resources and accountability.

The Model 1 results show partially significant effects of NCLB. Here, the observed negative effects of pre-NCLB accountability for Grade 4 reading (b31 = -.78) and math (b31 = -.78) as well as Grade 8 reading (b21 = -.15) mean that the first-generation accountability states improved less than the
TABLE 3
Summary Results of Hierarchical Linear Modeling (HLM) State-Level Models of No Child Left Behind Act of 2001 (NCLB) Policy Effects on National Assessment of Educational Progress (NAEP) Reading and Math Average Achievement
second-generation states since NCLB. Flipping this interpretation, the "negative" effect of having "pre-NCLB" accountability policy on "post-NCLB" status or growth would signify a "positive" effect of adopting NCLB accountability policy. However, after controlling for pre-NCLB state characteristics, trends, and other covariates including demographic changes, the Model 2 results render the effects of pre-NCLB accountability insignificant, except for a consistently significant effect on post-NCLB change to growth in Grade 8 reading (b21 = -.22). A two standard deviation decrease in pre-NCLB high-stakes accountability (equivalent to a move from first-generation to second-generation status) was associated with a very small gain by 2009 in Grade 8 reading: -2 × (b31 + 6 × b21) = 2.36 (.07 σ). Even so, the validity of this interpretation depends on its implicit assumption that all states, including both first- and second-generation accountability states, implemented NCLB accountability policy equally well.

The Model 2 results for most state-level predictors are highly similar to the corresponding Model 3 results in all subjects and grades. This suggests that accountability policy did not influence overall school resources (i.e., teacher salary and pupil-teacher ratio), and thus policy effects, if any, are not mediated by those resources, which have independent effects on achievement outcomes. As shown in the likelihood ratio test results, Model 2 explains a significant share of the interstate variations in post-NCLB trends, whereas Model 3, with the same set of state policy variables plus school resource variables, explains even more of the variations than Model 2.

The Model 3 results show that states' fidelity and rigor in NCLB implementation have a few positive effects regardless of pre-NCLB accountability status. After controlling for other possible confounders, positive gains were associated with NCLB implementation fidelity (b12 = .14 for Grade 8 math) as well as the rigor of standards (b2i = .69 for Grade 8 reading and b23 = .12 for Grade 4 math). Even these weak positive effects of the fidelity and rigor variables are hard to generalize, since they are not observed consistently across grades and subjects. By and large, the state-level regression results reveal only limited potential effects of high-stakes school accountability policy: effects that failed to make either statistically or practically significant differences in post-NCLB trends.

Furthermore, the Model 3 results give only partial support for the hypothesis that states' administrative/financial capacity for district/school support under NCLB would bring academic improvement (see "data tracking capacity" and "funding capacity for SINI" in Table 3). An increase of two standard deviations in the data tracking capacity variable is associated with a 2-point gain in the status of Grade 8 math achievement (2 × b34 = 1.98). Significant effects on status but not growth indicate that the effects were immediate but not sustainable. State funding capacity for SINI was not systematically associated with post-NCLB changes in state average achievement. However, additional subgroup gap analysis showed that funding capacity for SINI was associated with improved post-NCLB gains for only low-achieving students, and thus with smaller academic gaps between the 75th and 25th percentiles and between the 90th and 10th percentiles in math.

Last, the results of HLM latent variable regression in both Models 2 and 3 show a tendency for states that had relatively lower initial status and/or relatively smaller gains prior to NCLB to experience more immediate gains and/or faster growth after NCLB (see coefficients for pre-NCLB level and pre-NCLB growth in Table 3). Because the first-generation accountability states gained faster than their second-generation counterparts during the pre-NCLB period, a reversed pace of growth is likely to occur in those two groups of states by chance, regardless of NCLB policy impact. The first-generation states are unlikely to sustain the same rate of growth under NCLB, perhaps as a result of diminishing returns to high-stakes accountability policy and thus fading policy impact, whereas the second-generation states are likely to make faster gains than before in spite of their less faithful implementation of NCLB. Without considering this pattern, which possibly arose from diminishing returns to accountability policy, we might overestimate the impact of NCLB by observing greater gains among the second-generation states than the first-generation states.

Summary and Conclusion

This study updates and revisits earlier evaluations of the NCLB policy's impact with regard to progress toward the goal of improving proficiency for all students and narrowing student
achievement gaps with NAEP data collected by 2009 in reading and math. It offers new insights for the evaluation of NCLB by differentiating high-stakes school accountability and capacity-building policy initiatives among the first- and second-generation accountability states over an extended pre-NCLB/post-NCLB time period. The study is also more methodologically judicious than previous studies in refining the comparative interrupted time-series design through statistical modeling that addresses more potential threats to causal inferences about the policy impact.

By and large, there were highly mixed patterns of post-NCLB changes in reading versus math achievement across the nation. The comparison of pre- and post-NCLB reading outcome trends showed that the level of state average achievement as well as the pace of achievement gains either remained the same or declined after NCLB. In contrast, the earlier progress in math continued or accelerated, with more gains after NCLB than before. However, the magnitude of these cumulative achievement gains or losses in the post-NCLB period (2003-2009) was relatively small. Similarly, there were mixed patterns of progress in narrowing achievement gaps in reading versus math. In reading, the states did not narrow achievement gaps after NCLB but instead experienced a setback to earlier progress. In math, there was some significant progress in narrowing the achievement gaps. Regardless of the subpopulations involved and the grade/subject, however, it is noteworthy that the magnitude of post-NCLB changes in the gaps not only falls short of meeting NCLB achievement targets for all students but is also particularly insufficient to redress setbacks to the earlier national progress in narrowing racial achievement gaps (see Lee, 2002; Peterson, 2006).

Were the different patterns of post-NCLB academic improvement in reading and math related to NCLB? The high-stakes external accountability policy under NCLB appears to have placed equal or similar priorities on improving both reading and math achievement. However, there were actually more favorable changes in instructional conditions for reading than for math (e.g., considerable investment in an early reading program). A study of changes in instructional time allocation since NCLB also shows that schools allocated relatively more time to reading than to math (CEP, 2007a). In addition, there was a greater shortage of qualified teachers in math than in reading, particularly within high-minority and high-poverty schools (National Center for Education Statistics [NCES], 2002). Given these conditions, it is hard to understand why the test results improved in math but not in reading during the NCLB period (2003-2009). This inconsistency raises the question of whether these changes are necessarily attributable to the impact of NCLB.

The findings of our study challenge those of some previous studies. For example, an earlier study by Dee and Jacob (2009) reported significant positive effects of NCLB in math but not in reading based on NAEP data through 2007. In contrast, our study found a significant positive effect only in Grade 8 reading based on NAEP data through 2009. Two possible reasons for the divergence of these findings from those of Dee and Jacob (2009) are our longer NAEP time frame and our additional efforts to address internal validity threats. It appears that, without adequate statistical control for pre-NCLB state characteristics and achievement trends, one observes more tentative gains among the second-generation accountability states (in comparison with their first-generation counterparts) in Grade 4 math and reading during the period of 2003-2009. However, those modest post-NCLB gains became even smaller and insignificant once we took into account differences in the earlier results and diminishing returns to accountability policy. Furthermore, cross-state variations in the fidelity of NCLB policy implementation and the rigor of performance standards were not systematically associated with post-NCLB academic improvement and achievement gap patterns. If the NCLB policy were to be ascribed as a cause of any observed post-NCLB change in achievement gaps, one might expect these predictors tied to the policy itself to be consistently related to the outcomes. Although there was limited evidence for positive effects of high-stakes accountability, this was often restricted to immediate gains right after NCLB, without sustainable effects on further growth. The study also found limited evidence for positive effects of state agency capacity for district/school support, particularly building achievement data tracking systems and funding schools in need of improvement.

On the other hand, our study found more consistent positive effects of statewide educational resources, independent of accountability policy,
224
on achievement gains. The trends of states' Appendix A

educational resources as measured by pupil
HLM Final Models
teacher ratio and teacher salary were not related
to state activism in high-stakes accountability
Level 1 Model:
policy before or after NCLB. Although states'
targeted financial and technical support for
Yfi = *0i + wj,-(Pre-NCLB Time)?J- + Jt2,(Post
schools identified as needing improvement under
NCLB Time);,- + ^/(NCLB)# + 7C4,(%Poor)^- +
NCLB might be potentially cost-effective, this
7i5/(%Black)?j + 7t6/(%Hispanic)ft- + ^{Teacher Sal
approach shortchanges the long-term need for
ary) n + 7tg,-(Pupil-Teacher Ratio);; + ef;
statewide educational investment in school/ (Pre-NCLB Time)? is the number of years elapsed
classroom level infrastructures such as more since the baseline until year ? (0 for 1990,1 for 1991,...
qualified teachers and smaller classes. The study19 for 2009);
also demonstrates that the statewide improvement (Post-NCLB Time)? is the number of years
of educational resources was associated with elapsed since the enactment of NCLB until year ? (0
for 1990 through 2002, 1 for 2003, 2 for 2004,. . .
either the widening or narrowing of achievement
7 for 2009);
gaps. Uneven effects of teacher salary and pupil
(NCLB)f is a dummy variable for the presence of
teacher ratio on the achievement of racial,
NCLB at year t (0 for 1990 through 2002, 1 for 2003
socioeconomic, and academic subgroups suggest through 2009);
that teacher resources may work to either increase (%Black)?i, (%Hispanic)?i, and (%Poor)?i are the
or decrease achievement gaps. percentages of Black, Hispanic, and poor students,
Given mixed evidence on the mid-course respectively, in the state i from which achievement
efficacy of NCLB, this study has implications for subsequent research conducted to track future policy changes and outcomes across different groups of states (i.e., the first- and second-generation accountability states) and to explain complex patterns of post-NCLB academic progress in different subjects and varied results for different types of achievement gaps. Although the ultimate impact of NCLB in its current form remains uncertain and this study cannot pinpoint specific directions of policy changes in anticipation of NCLB reauthorization, it is important to address the limits of external test-driven accountability policy in terms of building long-term instructional capacity and producing sustainable academic gains across the system. While the study does not find a tradeoff between the goals of improving average achievement and narrowing achievement gaps, it is a tall order for a federal educational policy to promote both academic excellence and equity.¹²

Authors' Note
An earlier version of this article was presented at the 2010 annual meeting of the American Educational Research Association.

outcome variables were drawn at year $t$;
$(\text{Teacher Salary})_{ti}$ and $(\text{Pupil-Teacher Ratio})_{ti}$ are average teacher salary and pupil-teacher ratio, respectively, in state $i$ at year $t$;
$\pi_{0i}$ is the initial status of achievement at baseline year ($t = 0$);
$\pi_{1i}$ is the pre-NCLB annual growth rate during the pre-NCLB period (i.e., achievement gain per year during 1996-2002);
$\pi_{2i}$ is the post-NCLB increment to the pre-NCLB achievement growth (i.e., change in $\pi_{1i}$ during 2003-2009);
$\pi_{3i}$ is the post-NCLB elevation to the pre-NCLB achievement status (i.e., change in $\pi_{0i}$ during 2003-2009).

Level 2 Model:

$$\pi_{0i} = \beta_{00} + r_{0i}$$
$$\pi_{1i} = \beta_{10} + r_{1i}$$
$$\pi_{2i} = \beta_{20} + \beta_{21}(\text{Pre-NCLB Accountability})_i + \beta_{22}(\text{Fidelity})_i + \beta_{23}(\text{Rigor})_i + \beta_{24}(\text{Data Tracking})_i + \beta_{25}(\text{Funding for SINI})_i + \beta_{26}\pi_{0i} + \beta_{27}\pi_{1i} + r_{2i}$$
$$\pi_{3i} = \beta_{30} + \beta_{31}(\text{Pre-NCLB Accountability})_i + \beta_{32}(\text{Fidelity})_i + \beta_{33}(\text{Rigor})_i + \beta_{34}(\text{Data Tracking})_i + \beta_{35}(\text{Funding for SINI})_i + \beta_{36}\pi_{0i} + \beta_{37}\pi_{1i} + r_{3i}$$
$$\pi_{pi} = \beta_{p0} \qquad (p = 4, 5, 6, 7, 8)$$
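The Level 1 growth parameters $\pi_{0i}$ through $\pi_{3i}$ combine into a piecewise, interrupted time-series trajectory for each state. The sketch below is illustrative only and is not the authors' estimation code. It assumes time is coded in years from a baseline year (1996 by default, the start of the pre-NCLB growth window; the study's own Level 1 coding may differ), that both the post-NCLB indicator and the post-NCLB slope clock start in 2003, and it omits the remaining Level 1 terms ($\pi_{4i}$ through $\pi_{8i}$). The class name, function name, and parameter values are hypothetical.

```python
from dataclasses import dataclass

NCLB_START = 2003  # first year of NCLB implementation


@dataclass
class GrowthParams:
    """Level 1 coefficients for one state (values used below are hypothetical)."""
    pi0: float  # initial status at the baseline year (t = 0)
    pi1: float  # pre-NCLB annual growth rate
    pi2: float  # post-NCLB increment to the annual growth rate
    pi3: float  # post-NCLB elevation (one-time shift) in achievement status


def predicted_score(p: GrowthParams, year: int, baseline_year: int = 1996) -> float:
    """Piecewise interrupted time-series prediction without the other Level 1 terms.

    Assumed coding: t = year - baseline_year; the post-NCLB indicator switches on
    in NCLB_START, and the post-NCLB slope clock counts years since NCLB_START.
    """
    t = year - baseline_year
    post = 1 if year >= NCLB_START else 0
    post_t = max(0, year - NCLB_START)
    return p.pi0 + p.pi1 * t + p.pi2 * post_t + p.pi3 * post


if __name__ == "__main__":
    state = GrowthParams(pi0=220.0, pi1=1.0, pi2=0.5, pi3=2.0)  # hypothetical values
    for yr in (1996, 2002, 2003, 2009):
        print(yr, predicted_score(state, yr))
```

With these hypothetical values, the predicted score rises 1 point per year through 2002, shifts up by $\pi_{3i} = 2$ in 2003, and then grows 1.5 points per year ($\pi_{1i} + \pi_{2i}$).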
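The analyses combine HLM with inverse probability of treatment weighting (IPTW), and Note 6 gives the stabilized weight for each state. A minimal NumPy sketch of that formula follows. It assumes the propensity scores p(T = 1 | X) have already been estimated (the propensity model is not specified in this section), and it uses a simple weighted mean difference as a stand-in for the weighted HLM regression. Both function names are hypothetical.

```python
import numpy as np


def stabilized_iptw_weights(treated, propensity):
    """Stabilized inverse probability of treatment weights.

    treated    : array of 0/1, 1 = state with pre-NCLB high-stakes accountability (T = 1)
    propensity : estimated p(T = 1 | X) for each state, strictly between 0 and 1
    """
    t = np.asarray(treated, dtype=float)
    e = np.asarray(propensity, dtype=float)
    if np.any((e <= 0.0) | (e >= 1.0)):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    p_treated = t.mean()  # marginal P(T = 1), e.g., 30/50 in the study sample
    # Treated units get P(T=1)/p(T=1|X); comparison units get P(T=0)/p(T=0|X).
    return p_treated * t / e + (1.0 - p_treated) * (1.0 - t) / (1.0 - e)


def weighted_mean_difference(outcome, treated, weights):
    """Weighted treated-minus-comparison difference in mean outcomes."""
    y = np.asarray(outcome, dtype=float)
    t = np.asarray(treated) == 1
    w = np.asarray(weights, dtype=float)
    return np.average(y[t], weights=w[t]) - np.average(y[~t], weights=w[~t])
```

A treated state with a high propensity score (0.8) receives a smaller weight than one with a middling score (0.5), which matches the description in Note 6: the more a state's covariates predict its actual group, the less weight it gets.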
Appendix B
Description of State-Level Independent Variables

1. Pre-NCLB State Accountability: A dummy variable for the adoption of pre-NCLB state accountability policy (coded 1 for the first-generation accountability states and 0 for the second-generation accountability states). This variable was obtained from Dee and Jacob (2009), and it represents whether or not each state had a high-stakes consequential accountability system in place prior to NCLB. From the same data source, a continuous version of the variable was also created to capture variation among the first-generation states in the duration of pre-NCLB accountability policy, by subtracting the implementation year (ranging from 1992 in Illinois to 2001 in Alaska) from 2003 (the first year of NCLB implementation). This variable is highly correlated (r = .63) with a composite measure of state activism in high-stakes test accountability policy from Lee and Wong (2004).

2. Fidelity of NCLB Implementation: This variable is a composite index of NCLB state implementation fidelity. It is derived from the Education Commission of the States (ECS) database, which tracks state laws, departmental regulations, board rules, and directives and practices related to requirements across seven major sections of the NCLB legislation: (a) NCLB Standards and Assessments, (b) NCLB Accountability (AYP), (c) NCLB School Improvement, (d) NCLB Safe Schools, (e) NCLB Supplemental Services, (f) NCLB Report Card, and (g) NCLB Teacher Quality. The 3-point scale ratings of state policy implementation status (3 = on target, 2 = partially on target, 1 = not on target) are combined across 38 items in the seven areas as updated in September 2007 (ECS, 2007). The alpha reliability coefficient for the index is .81. The ECS used analytical scoring rubrics, so-called "decision points," to determine whether the 50 states were on track to meet NCLB requirements (see ECS, 2005, for rubrics for all 38 items). For example, the rubrics for reading standards, one of the 38 items, are as follows:

Yes (On Target) = State has academic content standards in reading/language arts in Grades 3-8 and high school as required under the 1994 Elementary and Secondary Education Act (ESEA) or requires districts to have standards. If state has standards for grade bands rather than individual grade levels, the state also must have "suggested grade-level expectations" (or the equivalent) that are tied explicitly to standards for Grades 3-8 and high school.
Partial = State has reading/language arts standards in some, but not all, of required grades, or State requires districts to have standards, but not in all required grades, or the Policy for required standards is in final adoption/enactment stage.
No = Reading/language arts standards are not evident, or No suggested grade-level expectations evident, or Standards might be encouraged but not required.

3. Rigor of Standards: This variable measures the rigor of states' student performance standards for Grades 4 and 8 reading and math based on the average discrepancy between state assessment-based and NAEP-based proficiency rates in logits during 2003, 2005, and 2007. The higher the student proficiency rates based on states' own assessments relative to NAEP, the lower the rigor of state proficiency standards. Criterion-related validity evidence for this study's measure of the rigor of standards is given by its high correlations with other similar measures: r = .97 for Grade 4 reading, r = .96 for Grade 8 reading, r = .81 for Grade 4 math, and r = .85 for Grade 8 math for its relationships with NCES (2007) estimates of the stringency of 2005 state proficiency standards, and r = .91 for Grade 4 reading, r = .78 for Grade 8 reading, r = .85 for Grade 4 math, and r = .76 for Grade 8 math for its relationships with Fordham Institute (Cronin, Dahlin, Adkins, & Kingsbury, 2007) estimates of the rigor of 2005 state proficiency standards. The correlations with the measures from previous studies are imperfect because this study combined the rigor measures across 5 years (2003-2007) to capture stable and consistent effects of the post-NCLB standards.

4. Data Tracking Capacity: This variable is a composite index of longitudinal student achievement data tracking capacity. It is derived from the Data Quality Campaign (DQC) 2009-2010 state profile database (see http://www.dataqualitycampaign.org), which tracks state activities related to 10 essential elements (i.e., core components of a robust longitudinal data system) and 10 action requirements (i.e., support system for using data to make informed decisions and improve student performance). The 10 core elements include (a) statewide student identifier, (b) student-level enrollment data, (c) student-level test data, (d) information on untested students, (e) statewide teacher identifier with a teacher-student match, (f) student-level course completion data, (g) student-level SAT, ACT, and AP exam data, (h) student-level graduation and dropout data, (i) ability to match student-level P-12 and higher education data, and (j) state data audit system. The 10 action requirements include (a) linking data systems, (b) creating stable sustainable support, (c) developing governance structures, (d) building state data repositories, (e) implementing systems for timely access, (f) creating progress reports using student data, (g) creating reports to guide systemwide improvement, (h) developing a P-20/workforce research agenda, (i) promoting educator professional development/credentialing, and (j) promoting strategies to raise awareness of available data. The number of elements and actions met in each state (1 = yes, 0 = no) is summed across 20 items. The alpha reliability coefficient for the index is .64.

5. Funding Capacity for SINI: This variable measures the amount of funding available for schools in need of improvement (SINI) by dividing total school improvement funds ($) by the total number of SINI across the years 2004, 2005, and 2006. The number of SINI is obtained from Education Week reports ("AYP Status," 2004; "Preliminary NCLB Results," 2006). School improvement funds (SIF) were estimated by taking 4% of each state's Title I Part A funds (i.e., the 4% set aside for the SIF out of ESEA Title I Grants to Local Educational Agencies) (Source: U.S. Department of Education website, http://www2.ed.gov/about/overview/budget/history/index.html). States are supposed to reserve 5% of the SIF to support the operation of their systems of school support and allocate the remaining 95% for district-level and school-level grants (McClure, 2005). As a proxy indicator of funding capacity for SINI, the variable is based on per-school funding for schools that were identified in need of improvement as a result of failing to meet the AYP target for at least 2 consecutive years and were subjected to different stages of NCLB interventions: (a) school transfer, (b) supplementary education service, (c) corrective action, (d) restructuring (1st year), and (e) restructuring (2nd year).
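The raw index constructions described in Appendix B can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes second-generation states receive a pre-NCLB duration of 0, that the 38 ECS ratings are averaged (the text says only that they are "combined"), that rigor is the mean NAEP-minus-state gap in logit proficiency rates (so larger values mean more rigorous standards), and that Title I Part A amounts and SINI counts are pooled over 2004-2006 before dividing. All function names are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import Optional, Sequence

NCLB_START = 2003  # first year of NCLB implementation (item 1)


def pre_nclb_duration(implementation_year: Optional[int]) -> int:
    """Item 1 (continuous version): years of pre-NCLB accountability.

    Assumption: second-generation states are passed as None and coded 0.
    """
    if implementation_year is None:
        return 0
    return NCLB_START - implementation_year


def fidelity_index(ratings: Sequence[int]) -> float:
    """Item 2: mean of the 3-point ECS ratings (assumed aggregation rule)."""
    if any(r not in (1, 2, 3) for r in ratings):
        raise ValueError("ECS ratings must be 1, 2, or 3")
    return mean(ratings)


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def rigor_of_standards(state_rates: Sequence[float], naep_rates: Sequence[float]) -> float:
    """Item 3: average NAEP-minus-state gap in logit proficiency rates.

    A state reporting much higher proficiency than NAEP gets a more negative
    (less rigorous) score under this assumed sign convention.
    """
    return mean(logit(naep) - logit(state) for state, naep in zip(state_rates, naep_rates))


def data_tracking_index(items_met: Sequence[int]) -> int:
    """Item 4: number of the 20 DQC elements and actions met (1 = yes, 0 = no)."""
    return sum(items_met)


def funding_per_sini(title_i_part_a: Sequence[float], sini_counts: Sequence[int],
                     set_aside: float = 0.04) -> float:
    """Item 5: estimated school improvement funds per SINI, pooled over years."""
    return set_aside * sum(title_i_part_a) / sum(sini_counts)


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for an index; rows are states, columns are items."""
    k = len(item_scores[0])
    item_variances = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)
```

For example, the duration rule gives 11 years for Illinois (1992) and 2 years for Alaska (2001), the two endpoints reported in item 1. The `cronbach_alpha` helper is one standard way to obtain reliability coefficients like the .81 and .64 reported for the fidelity and data tracking indexes.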
Appendix C. Measures of State-Level HLM Model Predictors

                Pre-NCLB        Fidelity of NCLB  Rigor of Standards  Rigor of Standards  Data Tracking  Funding Capacity
State           Accountability  Implementation    (Reading)           (Math)              Capacity       for SINI
Alabama         .81             .61               -.64                -1.03               -.16           -.15
Alaska          -.61            .90               -.89                -.35                -.55           -.56
Arizona         -.97            -.13              -.01                .19                 -.95           -.28
Arkansas        .10             .75               .68                 .51                 2.62           -.51
California      .10             .90               1.17                .92                 -.95           -.33
Colorado        -.97            .61               -1.19               -1.18               -.55           -.34
Connecticut     .10             .75               .40                 -.66                -.55           -.07
Delaware        .46             .61               -.65                -.02                1.43           -.32
Florida         .10             .46               .81                 .10                 1.82           .10
Georgia         -.26            .75               -1.59               -1.06               1.03           -.37
Hawaii          -.97            -1.01             .67                 1.76                -.16           -.53
Idaho           -.97            .17               -.45                -.33                -2.93          -.46
Illinois        2.59            .90               .05                 -.63                -.55           -.41
Indiana         1.52            .02               .04                 -.31                .24            -.13
Iowa            -.97            .31               -.03                -.55                -.55           -.15
Kansas          1.52            -.42              -.20                -.14                .63            .48
Kentucky        1.52            1.19              .46                 .74                 1.43           .00
Louisiana       .10             .31               -.05                -.41                .24            .13
Maine           -.97            -2.62             1.51                1.79                .24            -.18
Maryland        .10             1.05              .07                 .16                 -.95           -.40
Massachusetts   .46             .46               1.48                2.23                .24            -.43
Michigan        .46             .17               -.26                -.24                -.55           -.32
Minnesota       -.97            -1.15             .35                 .43                 .24            -.09
Mississippi     -.97            .17               -1.68               -1.41               .24            .98
Missouri        -.97            .46               2.31                1.90                .24            .02
Montana         -.97            -2.77             .19                 .27                 -1.74          -.39
Nebraska        -.97            -1.89             -1.03               -1.67               -.95           .09
Nevada          1.17            .61               .58                 .11                 -.55           -.42
New Hampshire   -.97            -.57              1.49                1.18                -.55           -.22
New Jersey      -.97            -.27              -.09                .00                 -.55           -.46
New Mexico      .46             .17               .35                 .37                 .24            -.43
New York        .46             .90               1.01                -.21                -.55           -.14
North Carolina  1.17            .75               -1.46               -1.13               .63            .08
North Dakota    -.97            -1.74             -.03                .27                 -.55           -.22
Ohio            -.97            .90               -.22                .13                 .63            -.34
Oklahoma        1.17            .90               -1.41               -1.61               -.55           -.27
Oregon          -.26            -1.01             -.24                -.37                -.16           .31
Pennsylvania    -.97            .46               .66                 .30                 -.55           -.24
Rhode Island    .81             .02               .91                 .93                 -.95           .09
South Carolina  .10             .02               1.54                1.60                -.55           -.34
South Dakota    -.97            .90               -.61                -.10                -.95           -.50
Tennessee       -.26            .75               -2.37               -2.53               .24            -.21
Texas           1.88            .02               -1.57               -.86                2.62           6.45
Utah            -.97            -1.74             -.36                -.35                .63            .57
Vermont         .10             -1.45             1.02                1.11                -.16           -.27
Virginia        .46             .75               -.51                -.68                1.03           -.16
Washington      -.97            .46               .44                 1.10                1.03           -.25
West Virginia   .81             -.42              -1.34               -1.20               -.16           .09
Wisconsin       2.24            .02               -.74                -.17                .24            .19
Wyoming         -.97            -2.03             1.41                1.11                1.03           1.31
Note. All of the above variables are standardized to have a mean of zero and a standard deviation of one. Therefore, each state's
z-score indicates its relative standing among all 50 states.
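The standardization described in the note can be illustrated with a short sketch. It assumes the population standard deviation (dividing by N = 50) was used, which the note does not state; the function name and example values are hypothetical and are not taken from the table.

```python
from statistics import mean, pstdev
from typing import Dict


def standardize(values: Dict[str, float]) -> Dict[str, float]:
    """Convert raw state values to z-scores (mean 0, SD 1 across states).

    Assumption: population SD (divide by N); a sample SD would rescale all
    z-scores by the same constant without changing states' rank order.
    """
    raw = list(values.values())
    mu = mean(raw)
    sigma = pstdev(raw)
    return {state: (v - mu) / sigma for state, v in values.items()}
```

Because every state's raw value is shifted and scaled by the same two constants, a z-score conveys only relative standing among the states, as the note says.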
Notes
1. An exception was a temporary gain in fourth-grade mathematics achievement immediately after NCLB, although the post-NCLB growth rate quickly reverted to its initial pre-NCLB level. The racial and socioeconomic achievement gaps in NAEP reading and mathematics achievement persisted after NCLB.
2. Dee and Jacob (2009) pointed out that the earlier study by Lee (2006) may not have captured the effect of accountability because (a) the study was underpowered due to a restricted time period containing data only through 2005 and (b) it used an accountability policy measure (Lee & Wong, 2004) that includes non-NCLB policy requirements such as student accountability.
3. This also led to wide variation in the level of student proficiency and/or school AYP standards. Although approximately 13% of schools were identified for improvement across the nation in 2005 (18% for Title I schools), the identification rate varied from 2% in Iowa to 68% in Florida. Although high-poverty, high-minority, large urban schools that receive more Title I funding were more likely to have been identified for improvement under NCLB (Kim & Sunderman, 2004; LeFloch et al., 2007), their chance of exposure to policy interventions varies among states.
4. Common interventions for corrective action involved changes in curriculum or the appointment of outside advisors. Reopening the school as a charter school, replacing all or most of the school staff, or turning over school operations to either the state or a private company with a demonstrated record of effectiveness were rarely used options (U.S. Department of Education, 2003).
5. The data source for both variables is the Digest of Education Statistics for the years 1989-1990 through 2008-2009, which provides (a) an estimated average annual salary of teachers in public elementary and secondary schools by state and (b) pupil-teacher ratios in public elementary and secondary schools by state.
6. The following formula was used for computing the stabilized weight (w) for each state i:
$$w_i = \frac{P(T = 1)\,T_i}{p(T = 1 \mid X_i)} + \frac{P(T = 0)\,(1 - T_i)}{p(T = 0 \mid X_i)}$$
For states (N = 30) with a pre-NCLB high-stakes school accountability policy (T = 1), the greater the chance of pre-NCLB treatment group assignment (or, presumably, post-NCLB control group assignment) conditional on the covariates [p(T = 1 | X)], the smaller the weight the state receives in the regression analysis. For states (N = 20) without a pre-NCLB high-stakes school accountability policy (T = 0), the greater the chance of pre-NCLB control group assignment (or, presumably, post-NCLB treatment group assignment) given the covariates [p(T = 0 | X)], the smaller the weight. Regression analysis based on this weighted sample is expected to produce unbiased estimates of the treatment effect, independent of observed differences in the initial treatment and control groups (i.e., first- and second-generation accountability states). Before applying the IPTW method for the estimation of NCLB policy effects, we used a propensity score stratification method to ensure that treatment and control group members are balanced on their propensity score and key covariates of selection into the treatment.
7. The current NAEP collects very limited types and amounts of information on classroom practices, and the available teacher survey variables are hardly consistent between different rounds, before and after NCLB. Our use of states as the unit of analysis also raises concerns about aggregation bias at the state level as well as concerns about a limited sample size and statistical power.
8. On one hand, as a result of demographic changes, the national average identification rate of SWD and/or ELL students in NAEP has increased over the past 15 years and thus tends to be higher for the post-NCLB period than for the pre-NCLB period. On the other hand, as a result of accommodations permitted since 1996 in math and since 1998 in reading, the national average exclusion rate of SWD and/or ELL students in NAEP has decreased over time and thus tends to be somewhat lower for the post-NCLB period than for the pre-NCLB period.
9. This means that states should have either no missing or only one missing NAEP data point during the pre-NCLB period (i.e., minimum 3 out of 4 times for both Grade 4 reading and Grade 8 math, 3 out of 3 times for Grade 4 math). The maximum number of times is only 2 for Grade 8 reading, so only states with a minimum of 2 times are included.
10. We also checked the correlation between pre-NCLB accountability and the number of times that states participated in the NAEP during the 1990-2009 period for each subject and grade. The results showed no meaningful correlations (r = -.04 for Grade 4 reading, r = .18 for Grade 8 reading, r = .02 for Grade 4 math, r = .11 for Grade 8 math). The same patterns were found for correlations with the NCLB implementation fidelity variable.
11. The significance and strength of these intrastate relationships between time-varying measures of school resources and student achievement in this study converge with prior meta-analysis research. Greenwald, Hedges, and Laine (1996) reported an average effect size of approximately .21 SD achievement gain per $7,000 increase in teacher salary and a .03 SD achievement gain per 3-student decrease in pupil-teacher ratio.
12. Socioeconomic changes and other state policies such as welfare might also contribute to student achievement gap trends (see Lee, 2002; Miller & Zhang, 2009). This calls for more comprehensive policy strategies to address factors beyond schools.

References
Adams, J. E., Jr., & Kirst, M. W. (1999). New demands and concepts for educational accountability: Striving for results in an era of excellence. In J. Murphy & K. S. Louis (Eds.), The handbook of research on educational administration (2nd ed., pp. 463-489). San Francisco: Jossey-Bass.
AYP status. (2004, December 8). Education Week. Retrieved September 1, 2007, from www.edweek.org
Bartman, K. D. (2002). Public education in the 21st century: How do we ensure that no child is left behind? Temple Political & Civil Rights Law Review, 12(1), 95-119.
Benveniste, G. (1985). The design of school accountability systems. Educational Evaluation and Policy Analysis, 7, 261-279.
Carnoy, M., & Loeb, S. (2002). Does external accountability affect student outcomes? Educational Evaluation and Policy Analysis, 24(4), 305-331.
Center on Education Policy. (2007a). Choices, changes, and challenges: Curriculum and instruction in the NCLB era. Washington, DC: Center on Education Policy.
Center on Education Policy. (2007b). Educational architects: Do state education agencies have the tools necessary to implement NCLB? Washington, DC: Center on Education Policy.
Center on Education Policy. (2007c). Has student achievement increased since 2002? State test score trends through 2006-07. Retrieved December 12, 2008, from http://www.cep-dc.org
Center on Education Policy. (2008). A call to restructure restructuring: Lessons from the No Child Left Behind Act in five states. Washington, DC: Center on Education Policy.
Cronin, J., Dahlin, M., Adkins, D., & Kingsbury, G. G. (2007). The proficiency illusion (Thomas B. Fordham Institute report). Retrieved December 1, 2008, from http://www.edexcellence.net/
Dee, T., & Jacob, B. (2009). The impact of No Child Left Behind on student achievement (National Bureau of Economic Research Working Paper No. 15531). Retrieved March 1, 2010, from http://www.nber.org
Duffett, A., Farkas, S., & Loveless, T. (2008). High-achieving students in the era of No Child Left Behind (Thomas B. Fordham Institute report). Retrieved December 1, 2008, from http://www.edexcellence.net/
Education Commission of the States. (2005, June). Decision points for NCLB requirements. Retrieved September 5, 2007, from http://nclb2.ecs.org
Education Commission of the States. (2007). NCLB database. Retrieved September 1, 2007, from http://nclb2.ecs.org/NCLBSURVEY/nclb.aspx?Target=AD
Education Trust. (2006). Primary progress, secondary challenge: A state-by-state look at student achievement patterns. Washington, DC: Education Trust.
Elmore, R. F. (2002). Testing trap. Harvard Magazine, 105(1), 35.
Elmore, R. F., & Fuhrman, S. H. (1995). Opportunity-to-learn standards and the state role in education. Teachers College Record, 96, 432-457.
Ferguson, R. F. (1991, Summer). Paying for public education: New evidence on how and why money matters. Harvard Journal on Legislation, 28(2), 465-498.
Finn, J. D., & Achilles, C. M. (1990). Answers and questions about class size: A statewide experiment. American Educational Research Journal, 27(3), 557-577.
Fuhrman, S. H. (1999). The new accountability (CPRE Policy Brief). Philadelphia: Consortium for Policy Research in Education, Graduate School of Education, University of Pennsylvania.
Fuller, B., Gesicki, K., Kang, E., & Wright, J. (2006). Is the No Child Left Behind Act working? The reliability of how states track achievement (PACE Working Paper No. 06-1). Berkeley: University of California.
Greenwald, R., Hedges, L. V., & Laine, R. D. (1996). The effect of school resources on student achievement. Review of Educational Research, 66(3), 361-396.
Grissmer, D., & Flanagan, A. (1998). Exploring rapid achievement gains in North Carolina and Texas. Washington, DC: National Education Goals Panel.
Grissmer, D., Flanagan, A., Kawata, J., & Williamson, S. (2000). Improving student achievement: What state NAEP test scores tell us. Santa Monica, CA: Rand.
Haney, W. (2000). The myth of the Texas miracle in education. Education Policy Analysis Archives. Retrieved March 3, 2001, from http://epaa.asu.edu/epaa/v8n41
Hanushek, E. A. (1997). Assessing the effects of school resources on student performance: An update. Educational Evaluation and Policy Analysis, 19, 141-164.
Hanushek, E. A., & Raymond, M. E. (2004). Does school accountability lead to improved performance? Journal of Policy Analysis and Management, 24(2), 297-327.
Hirano, K., & Imbens, G. W. (2002). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Services and Outcomes Research Methodology, 2, 259-278.
Kim, J., & Sunderman, G. L. (2004). Large mandates and limited resources: State response to the No Child Left Behind Act and implications for accountability. Cambridge, MA: The Civil Rights Project at Harvard University.
Ladd, H. F. (1999). The Dallas school accountability and incentive program: An evaluation of its impact on student outcomes. Economics of Education Review, 18, 1-16.
Lankford, H., & Wyckoff, J. (1997). The changing structure of teacher compensation, 1970-1994. Economics of Education Review, 16(4), 371-384.
Lee, J. (1997). State activism in education reform: Applying the Rasch model to measure trends and examine policy coherence. Educational Evaluation and Policy Analysis, 19(1), 29-43.
Lee, J. (2002). Racial and ethnic achievement gap trends: Reversing the progress toward equity? Educational Researcher, 31, 3-12.
Lee, J. (2006). Tracking achievement gaps and assessing the impact of NCLB on the gaps: An in-depth look into national and state reading and math outcome trends. Cambridge, MA: The Civil Rights Project at Harvard University.
Lee, J. (2008). Is test-driven external accountability effective? Synthesizing the evidence from cross-state causal-comparative and correlational studies. Review of Educational Research, 78(3), 608-644.
Lee, J. (2010). Trick or treat: New ecology of education accountability system in the USA. Journal of Education Policy, 25(1), 73-93.
Lee, J., & Wong, K. K. (2004). The impact of accountability on racial and socioeconomic equity: Considering both school resources and achievement outcomes. American Educational Research Journal, 41(4), 797-832.
Lee, J.-W., & Barro, R. J. (1998). School quality in a cross section of countries. Cambridge, MA: National Bureau of Economic Research.
LeFloch, K. C., Martinez, F., O'Day, J., Stecher, B., Taylor, J., & Cook, A. (2007). State and local implementation of the No Child Left Behind Act: Volume III. Accountability under NCLB: Interim report. Washington, DC: U.S. Department of Education.
Linn, R. L. (2003). Accountability: Responsibility and reasonable expectations. Educational Researcher, 32(1), 3-13.
Marion, S. F., White, C., Carlson, D., Erpenbach, W. J., Rabinowitz, S., & Sheinker, J. (2002). Making valid and reliable decisions in the determination of adequate yearly progress (A Paper in the Series: Implementing the State Accountability System Requirements Under the No Child Left Behind Act of 2001). Washington, DC: Council of Chief State School Officers.
Mathis, W. J. (2009). NCLB's ultimate restructuring alternatives: Do they improve the quality of education? (EPIC/EPRU Policy Brief). Boulder, CO: National Education Policy Center.
McClure, P. (2005). School improvement under No Child Left Behind. Retrieved from http://www.americanprogress.org/kf/mcclure3-03-2005.pdf
Miller, A. R., & Zhang, L. (2009). The effects of welfare reform on the academic performance of children in low-income households. Journal of Policy Analysis and Management, 28(4), 577-599.
Mills, J. I. (2008). A legislative overview of No Child Left Behind. Consequences of No Child Left Behind for Educational Evaluation: New Directions for Evaluation, 117, 9-20.
Monk, D., & Jacobson, S. (1985). The distribution of salary increments between veteran and novice teachers: Evidence from New York State. Journal of Education Finance, 11, 157-175.
National Center for Education Statistics. (2002). Qualifications of the public school teacher workforce: Prevalence of out-of-field teaching, 1987-88 to 1999-2000 (NCES Statistical Analysis Report 2002-603). Washington, DC: NCES.
National Center for Education Statistics. (2007). Mapping 2005 state proficiency standards onto the NAEP scales (NCES 2007-482). Washington, DC: Government Printing Office.
Newmann, F. M., King, M. B., & Rigdon, M. (1997). Accountability and school performance: Implications from restructuring schools. Harvard Educational Review, 67(1), 41-75.
No Child Left Behind Act of 2001, Pub. L. No. 107-110.
O'Day, J. A. (2002). Complexity, accountability, and school improvement. Harvard Educational Review, 72(3), 293-329.
O'Day, J., & Smith, M. (1993). Systemic reform and educational opportunity. In S. Fuhrman (Ed.), Designing coherent educational policy (pp. 250-312). San Francisco: Jossey-Bass.
Peterson, P. E. (Ed.). (2006). Generational change: Closing the test score gap. Lanham, MD: Rowman & Littlefield.
Porter, A., & Chester, M. (2002). Building a high-quality assessment and accountability program: The Philadelphia example. In D. Ravitch (Ed.), Brookings papers on education policy (pp. 285-315). Washington, DC: Brookings Institution.
Preliminary NCLB results show slippage in 2006. (2006, September 20). Education Week. Retrieved September 1, 2007, from www.edweek.org
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models. Thousand Oaks, CA: SAGE.
Rhim, L. M., Hassel, B., & Redding, S. (2008). State role in supporting school improvement. In S. Redding & H. Walberg (Eds.), Handbook on statewide systems of support (pp. 21-56). Charlotte, NC: Academic Development Institute and Information Age Publishing.
Rothstein, R. (1995). Where's the money gone? Changes in the level and composition of education spending. Washington, DC: Economic Policy Institute.
Roza, M. (2010). Educational economics: Where do school funds go? Washington, DC: Urban Institute Press.
Skrla, L., Scheurich, J. J., Johnson, J. F., & Koschoreck, J. W. (2004). Accountability for equity: Can state policy leverage social justice? In L. Skrla & J. J. Scheurich (Eds.), Educational equity and accountability: Paradigms, policies, and politics (pp. 51-78). New York, NY: RoutledgeFalmer.
U.S. Department of Education. (2003). No child left behind: A parent's guide. Washington, DC: U.S. Department of Education.
Wise, A. E. (1979). Legislated learning: The bureaucratization of the American classroom. Berkeley: University of California Press.
Wong, M., Cook, T. D., & Steiner, P. (2009). No Child Left Behind: An interim evaluation of its effects on learning using two interrupted time series each with its own non-equivalent comparison series (Northwestern University Institute for Policy Research Working Paper). Retrieved April 5, 2010, from http://www.northwestern.edu
Zimmer, R., Gill, B., Razquin, P., Booker, K., & Lockwood, J. R., III. (2007). State and local implementation of the No Child Left Behind Act: Volume I. Title I school choice, supplemental educational services, and student achievement. Washington, DC: U.S. Department of Education.

Authors
JAEKYUNG LEE is Professor and Associate Dean for Academic Affairs at the State University of New York at Buffalo, Graduate School of Education, 409 Baldy Hall, Buffalo, NY 14260-1000; JL224@buffalo.edu. His research focuses on educational accountability and equity issues.
TODD REEVES is a doctoral student at Boston College, Lynch School of Education, 336 Campion Hall, Chestnut Hill, MA 02467; reevest@bc.edu. He is interested in the utilization of psychological research in education and teacher education.

Submitted June 27, 2010
Revision received November 3, 2011
Accepted November 9, 2011
