State NAEP 1990-2009 Reading and Math Achievement Gaps and Trends
Author(s): Jaekyung Lee and Todd Reeves
Source: Educational Evaluation and Policy Analysis, Vol. 34, No. 2 (June 2012), pp. 209-231
Published by: American Educational Research Association
Stable URL: http://www.jstor.org/stable/23254111
Accessed: 16-04-2017 23:30 UTC
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted
digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about
JSTOR, please contact support@jstor.org.
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
http://about.jstor.org/terms
American Educational Research Association is collaborating with JSTOR to digitize, preserve and
extend access to Educational Evaluation and Policy Analysis
This content downloaded from 128.32.10.164 on Sun, 16 Apr 2017 23:30:47 UTC
All use subject to http://about.jstor.org/terms
Educational Evaluation and Policy Analysis
June 2012, Vol. 34, No. 2, pp. 209-231
DOI: 10.3102/0162373711431604
Jaekyung Lee
State University of New York at Buffalo
Todd Reeves
Boston College
This study examines the impact of high-stakes school accountability, capacity, and resources under
NCLB on reading and math achievement outcomes through comparative interrupted time-series
analyses of 1990-2009 NAEP state assessment data. Through hierarchical linear modeling latent
variable regression with inverse probability of treatment weighting, the study addresses pre-NCLB
differences in state characteristics and trends to account for variations in post-NCLB gains. While
the states' progress was uneven among different grades, subjects, and subgroups, NCLB did not yet
evidence sustainable and generalizable high-stakes accountability policy effects. Improving average
achievement as well as narrowing achievement gaps was associated with long-term statewide instructional
capacity and teacher resources rather than short-term NCLB implementation fidelity, rigor of
standards, and state agency's capacity for data tracking and intervention.
effects by relying on data from states' own assessments as a tool of both NCLB intervention and evaluation at the same time. States tended to show more post-NCLB progress on their own high-stakes tests, although such progress did not always transfer to independent low-stakes tests such as the National Assessment of Educational Progress (NAEP; Lee, 2010). In addition, even when the NAEP instead of state assessments is used for policy evaluation, post-NCLB change may reflect a continuing trend that began before the policy was implemented. It remains to be rigorously examined whether and, if so, how and to what extent NAEP reading and mathematics average achievement and achievement gap trends are systematically related to state implementation of test-driven accountability policies before and after NCLB.

Second, research efforts to evaluate NCLB were thwarted by the complexity and variability of policy design and implementation in different states. Under NCLB, the existence of dual accountability systems and interactions between federal and state policies also complicates the analysis of post-NCLB achievement data. The mandatory nationwide implementation of NCLB essentially precludes analysis of further impacts of overall accountability systems by eliminating a comparison group of states without such policies (Hanushek & Raymond, 2004). Nevertheless, some previous studies attempted to capitalize on interstate pre-NCLB accountability policy variations (Dee & Jacob, 2009; Lee, 2006). This research design treats first-generation accountability states as a comparison group and second-generation accountability states as a treatment group under NCLB. One major problem with this simple design, however, is the assumption that all states are subject to the same dosage of accountability policy treatment under NCLB. The earlier studies also had limitations in that it could take several years for a new federal policy to produce an effect and also that the effect, if any, could be uneven between states, subjects, and subgroups within states as a function of their preexisting differences as well as policy treatment.

In light of these concerns, this study employs a new approach to the evaluation of NCLB with regard to its impact vis-a-vis excellence and equity policy goals. Our approach involves a comparative interrupted time-series design, with an enriched multilevel analysis of both intrastate and interstate variations in NAEP achievement outcomes before as well as after NCLB. The model incorporates state capacity and policy implementation factors beyond the adoption of high-stakes testing, and time-varying school resource effects into achievement trends. In addition, this study addresses potential threats to the internal validity of quasi-experimental research on the impact of NCLB by using enhanced statistical control for selection biases and regression to the mean through inverse probability treatment weighting and latent variable regression techniques. Our study also extends prior work by examining states' progress toward narrowing academic achievement gaps (i.e., those between students in the 10th/25th and 90th/75th percentiles) as well as racial/ethnic and socioeconomic achievement gaps and by explaining the gap trends in the broader, longer-term context of state capacity and endowment irrespective of federal policy.

Given current policy goals and intervention targets, this study tests the hypothesis that NCLB promotes academic excellence and equity in reading and math across all states by both improving the average achievement of all students and narrowing the gap between disadvantaged, low-achieving and minority students and their counterparts. It also tests the hypotheses that states with stronger educational capacity in place to produce desired student outcomes and with more timely, intensive, and rigorous implementation of accountability policy under NCLB would experience a more positive impact.

Analytical Framework

The theory of action behind test-driven external accountability policy is deemed fatally simple (see Adams & Kirst, 1999; Benveniste, 1985; Elmore, 2002; Fuhrman, 1999; Newmann, King, & Rigdon, 1997; O'Day, 2002; Wise, 1979). The logic of performance-driven accountability policy draws on rationalistic and behavioristic views of human behavior by positing that holding schools, teachers, and students accountable for academic performance, with incentives provided (i.e., rewards and sanctions), will inform, motivate, and reorient the behavior of schooling agents toward the goal. Over the years, states' policy
[Figure 1 diagram: sequences of pre- and post-NCLB observations (O) and policy treatments (X) for first-generation accountability states (e.g., TX, NC, CA) and second-generation accountability states (e.g., IA, ME, WY).]
Design B: pre-NCLB state accountability policy (X1) plus federal NCLB accountability policy (X2) as dual treatment variables
FIGURE 1. Comparative interrupted time-series research designs for analysis of NCLB policy im...
Note. X1 = pre-NCLB state accountability policy; X2 = NCLB federal accountability policy; X1 + X2 = mix of fed... accountability policies under NCLB; X1' + X2' = delayed or watered-down version of X1 + X2; O = student read... achievement average and gap measures.
growth rate is significantly greater than the pre-NCLB growth rate and the gains are not temporary.

The Level 1 model also includes time-varying covariates. Recognizing that demographic changes between cohort groups may influence student achievement trends, we account for the percentages of minority (i.e., Black and Hispanic) and poor (i.e., eligible for free or reduced-price lunch) students. Furthermore, the Level 1 model includes teacher salary (as proxy for teacher experience and quality) and pupil-teacher ratio (as proxy for class size) to examine the effects of key school resources on achievement outcomes over time among sequential cohorts. Per pupil expenditures, global measures of school resources, are highly correlated with both the teacher salary and pupil-teacher ratio variables. Moreover, since teacher salary and pupil-teacher ratio serve as key determinants of instructional spending per pupil, we used those two specific measures of school resources for elementary and secondary education.5

Our supplementary trend analyses of these time-varying covariates as dependent variables showed significant growth in percentage minority/poverty students both before and after NCLB. Pupil-teacher ratio and teacher salary showed different trends: Pupil-teacher ratio decreased incrementally throughout the 1990-2009 period, whereas real teacher salary remained largely unchanged. There were substantial variations among states in the trend of increases in school resources (associated with increases in achievement) as well as increases in the poor, minority population (with decreases in achievement) throughout the entire period. These forces may have worked together to influence state achievement trends, independent of NCLB.

At Level 2 (state level), not only pre-NCLB state accountability policy but also variables that tap into post-NCLB state policy activities are used to explain interstate variations in post-NCLB changes to state achievement trends; they include fidelity of NCLB implementation, rigor of standards, data tracking capacity, and funding capacity for schools in need of improvement (SINI; see Appendices B and C). The fidelity variable measures how faithfully and quickly states complied with key NCLB federal requirements in place, whereas the rigor variable captures the level of states' own performance standard as measured by the discrepancy between NAEP and state assessment results, which captures the scale of intervention. State education agency capacity variables include measures of building a longitudinal student achievement data tracking system and funding for SINI schools. The effect sizes are reported based on student-level standard deviations (σ) of achievement in 2003.

Correlations among the Level 2 state variables show a weak to moderate degree of interrelationships but no indication of multicollinearity problems. The fidelity of states' NCLB policy implementation was positively associated with the intensity of their pre-NCLB high-stakes accountability policy (r = .38) and slightly negatively associated with the rigor of performance standards (r = -.20 for reading; r = -.25 for math). This suggests that the first-generation accountability states were more likely to comply with NCLB mandates but at the same time adopted relatively lower performance standard levels. On the other hand, the study found a nonsignificant correlation between the state agency capacity and fidelity factors (r = .13 between data tracking capacity and implementation fidelity; r = -.12 between school improvement grants for SINI and implementation fidelity), implying that high-stakes accountability policy was not systematically accompanied by capacity-building efforts at the state level. However, the states' activism in building a data tracking system was positively associated with the state agencies' capacity for school support as measured by school improvement grants for SINI (r = .45).

The Level 2 model involves comparing the first- and second-generation accountability states. To address potential selection bias in drawing causal inferences about the impact of NCLB based on this comparison, this study applies inverse probability of treatment weighting (IPTW) and latent variable regression methods. IPTW builds on propensity score matching that employs a predicted probability of group membership (treatment versus control group) based on observed predictors, which may be used for matching or as covariates for quasi-experimental research (Rosenbaum & Rubin, 1983). IPTW realizes this matching by assigning differential weights to subjects based on the inverse probability of receiving a treatment at a given time conditional on prior outcome history and other covariates
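The weighting step described above can be sketched as follows. This is a minimal illustration, assuming the propensity scores have already been estimated (for example, by a logistic regression on pre-NCLB state characteristics, which is not shown here). The function names and the toy numbers are hypothetical and are not drawn from the study's data, and the sketch omits the conditioning on prior outcome history over time that the text describes.

```python
from typing import List, Sequence


def iptw_weights(treated: Sequence[int], propensity: Sequence[float]) -> List[float]:
    """Inverse probability of treatment weights.

    treated[i] is 1 for a treatment-group unit and 0 for a comparison unit;
    propensity[i] is that unit's estimated probability of being treated.
    """
    weights = []
    for z, p in zip(treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Weight each unit by 1 / P(the group it actually belongs to).
        weights.append(1.0 / p if z == 1 else 1.0 / (1.0 - p))
    return weights


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def iptw_difference(outcome: Sequence[float], treated: Sequence[int],
                    propensity: Sequence[float]) -> float:
    """Weighted treatment-minus-comparison difference in mean outcomes."""
    w = iptw_weights(treated, propensity)
    t = [(y, wi) for y, z, wi in zip(outcome, treated, w) if z == 1]
    c = [(y, wi) for y, z, wi in zip(outcome, treated, w) if z == 0]
    mean_t = weighted_mean([y for y, _ in t], [wi for _, wi in t])
    mean_c = weighted_mean([y for y, _ in c], [wi for _, wi in c])
    return mean_t - mean_c


# Hypothetical toy data: 1 = second-generation (treatment) state,
# 0 = first-generation (comparison) state.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
gains = [3.0, 1.0, 2.0, 0.0]
print(iptw_weights(treated, propensity))
print(iptw_difference(gains, treated, propensity))
```

Units that ended up in a group despite a low estimated probability of belonging to it receive larger weights, which is how the weighted comparison approximates a matched one.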
TABLE 1
Summary Results of Hierarchical Linear Modeling (HLM) Base Model (Random Coefficients Model) for the Trends of State National Assessment of Educational Progress (NAEP) Reading and Math Average Achievement

                                      Reading                                Math
                                      Grade 4            Grade 8             Grade 4            Grade 8
Fixed effects
  Initial status (β00)                211.74*** (0.97)   256.83*** (1.57)    214.21*** (0.95)   260.08*** (1.05)
  Pre-NCLB growth (β10)               0.49*** (0.08)     0.51** (0.15)       1.04*** (0.09)     1.28*** (0.07)
  Post-NCLB change to growth (β20)    0.26* (0.13)       -0.36* (0.17)       0.02 (0.11)        0.02 (0.11)
  Post-NCLB change to status (β30)    -0.70 (0.50)       -0.93** (0.31)      7.61*** (0.48)     0.86 (0.53)
  % poor effect (β40)                 -0.24*** (0.04)    -0.21*** (0.04)     -0.18*** (0.04)    -0.20*** (0.04)
  % Black effect (β50)                -0.14** (0.04)     -0.15*** (0.04)     -0.17*** (0.04)    -0.25*** (0.05)
  % Hispanic effect (β60)             -0.09* (0.04)      -0.12** (0.04)      -0.12** (0.04)     -0.16** (0.05)
  Teacher salary effect (β70)         0.20** (0.06)      0.07 (0.06)         0.15** (0.05)      0.10† (0.06)
  Pupil-teacher ratio effect (β80)    -0.58** (0.16)     -0.55*** (0.13)     -0.13 (0.14)       0.27* (0.15)
Random effects
  ... growth (τ20)
  Post-NCLB change to status (τ30)    1.02               1.35*               3.29**             6.96***
  Level 1 variance (σ²)               4.80               1.18                2.62               2.08
Deviance statistics (-2 log likelihood)  1582.43         995.20              1521.80            1706.55

Note. Unstandardized regression coefficients with (standard errors) are reported for fixed effects. NCLB = No Child Left Behind Act of 2001.
†p < .10. *p < .05. **p < .01. ***p < .001.
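The fixed effects listed in Table 1 are consistent with a piecewise (interrupted) growth model. A minimal sketch of such a model is given below; the time coding and the symbols T, P, D, and X are illustrative assumptions, not the published specification.

```latex
\begin{aligned}
Y_{ti} &= \pi_{0i} + \pi_{1i}\,T_{ti} + \pi_{2i}\,P_{ti} + \pi_{3i}\,D_{ti}
          + \sum_{k=4}^{8} \pi_{ki}\,X_{k,ti} + e_{ti}, \\
\pi_{ki} &= \beta_{k0} + r_{ki}, \qquad k = 0, \dots, 8 .
\end{aligned}
```

Here Y is the NAEP outcome for state i at assessment t, T is years since the first assessment, D equals 1 for assessments from 2003 onward and 0 otherwise, P = D × (year − 2003) counts post-NCLB years, and X4 through X8 stand for percentage poor, percentage Black, percentage Hispanic, teacher salary, and pupil-teacher ratio. The term r is a state-level random effect (zero for any coefficient treated as fixed) and e is the Level 1 residual. Under this assumed coding, the cumulative post-NCLB change by 2009 is β30 + 6 × β20.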
= 1.87) and growth rate (b20 = 0.89) of the nonpoor-poor gap in fourth grade were observed. The total amount of post-NCLB increases by 2009 in the nonpoor-poor gap for fourth-grade reading was 7.21. Although somewhat smaller in magnitude, increases in both status (b30 = 1.73) and growth rate (b20 = .47) were also seen for eighth grade. For fourth grade, the increase in gap level was attributable to differential post-NCLB drops for both subgroups wherein poor students dropped more than their nonpoor counterparts. The post-NCLB increase in the growth rate of the socioeconomic reading gap in fourth grade was attributable to a setback experienced by poor students only. For eighth grade, the post-NCLB increase in the level of this socioeconomic reading achievement gap occurred because poor students dropped more than nonpoor students.

Post-NCLB changes to the reading achievement gaps between academic subgroups were mixed; the changes in status evidence setbacks, whereas changes in growth rates suggest progress. We found an increase in the status of the fourth-grade reading achievement gap between the 90th and 10th percentiles (b30 = 2.20). However, we observed a reduction in the growth rate (b20 = -1.11) for this gap. In eighth grade, we found an increase in the status (b30 = 5.05) of the reading achievement gap between the 90th and 10th percentiles. Similar post-NCLB increases to the status of both the fourth-grade (b30 = 1.19) and eighth-grade (b30 = 2.79) reading achievement gap between the 75th and 25th percentiles were observed. In contrast, we again found reductions in the growth rate after NCLB in both fourth grade (b20 = -0.70) and eighth grade (b20 = -0.33). The
TABLE 2
Summary Results of Hierarchical Linear Modeling (HLM) Base Model (Random Coefficients Model) for the Trends of State National Assessment of Educational Progress (NAEP) Reading and Math Achievement Gap Trends

                                      Reading                                Math
                                      Grade 4            Grade 8             Grade 4            Grade 8
White-Black
  Pre-NCLB growth (β10)               -0.23* (0.11)      0.09 (0.24)         -0.35** (0.12)     0.23* (0.12)
  Post-NCLB change to growth (β20)    -0.08 (0.25)       -0.29 (0.29)        0.27* (0.14)       -0.66* (0.18)
  Post-NCLB change to status (β30)    0.14 (0.93)        -0.03 (0.88)        -1.26 (0.75)       -1.60 (1.10)
  Teacher salary (β70)                0.18† (0.09)       0.14 (0.09)         0.14* (0.07)       0.24** (0.09)
  Pupil-teacher ratio (β80)           -0.04 (0.27)       -0.23 (0.24)        0.12 (0.18)        0.03 (0.25)
White-Hispanic
  Pre-NCLB growth (β10)               -1.10*** (0.17)    -0.66** (0.19)      -0.22 (0.15)       0.37* (0.20)
  Post-NCLB change to growth (β20)    0.89*** (0.21)     0.47* (0.22)        0.15 (0.15)        -0.49* (0.20)
  Post-NCLB change to status (β30)    1.87** (0.59)      1.73** (0.57)       -2.21** (0.75)     -2.18* (0.97)
  Teacher salary (β70)                0.34*** (0.06)     0.30*** (0.05)      0.27*** (0.04)     0.30*** (0.05)
  Pupil-teacher ratio (β80)           -0.03 (0.15)       -0.09 (0.13)        -0.13 (0.11)       -0.11 (0.14)
90th-10th Percentile
  Pre-NCLB growth (β10)               0.00 (0.13)        -0.30 (0.22)        -0.28* (0.13)      0.31** (0.09)
  Post-NCLB change to growth (β20)    -1.11*** (0.28)    -0.42 (0.28)        0.39* (0.15)       -0.31* (0.12)
  Post-NCLB change to status (β30)    2.20* (0.15)       5.05*** (0.56)      -4.29*** (1.06)    -1.46* (0.63)
  Teacher salary (β70)                0.16* (0.08)       0.27*** (0.07)      0.19*** (0.05)     0.37*** (0.06)
  Pupil-teacher ratio (β80)           0.90*** (0.21)     0.56** (0.17)       0.57*** (0.14)     0.34* (0.16)
75th-25th Percentile
  Pre-NCLB growth (β10)               0.00 (0.07)        -0.15 (0.12)        -0.17* (0.08)      0.11* (0.05)
  Post-NCLB change to growth (β20)    -0.70*** (0.14)    -0.33* (0.16)       0.20* (0.09)       0.00 (0.06)
  Post-NCLB change to status (β30)    1.19* (0.57)       2.79*** (0.31)      -2.36** (0.62)     -0.62† (0.32)
  Teacher salary (β70)                0.06 (0.04)        0.14*** (0.04)      0.11*** (0.03)     0.21*** (0.03)
  Pupil-teacher ratio (β80)           0.46*** (0.12)     0.24** (0.09)       0.35*** (0.08)     0.28** (0.08)

Note. Only selected fixed effect portions of the results are shown. Unstandardized coefficients with (standard errors) are reported. Full model results for achievement gaps are available from the authors upon request. NCLB = No Child Left Behind Act of 2001.
†p < .10. *p < .05. **p < .01. ***p < .001.
total post-NCLB changes by 2009 in the 75th-25th percentile gap for fourth- and eighth-grade reading were -3.01 and 0.81, respectively.

On the other hand, post-NCLB changes to achievement gaps in math suggest some progress. For the racial/ethnic achievement gaps, there were indications of favorable changes in both the Black-White and Hispanic-White gaps. For example, we found a post-NCLB reduction in the growth rate of the Black-White gap for Grade 8 (b20 = -0.66). This gap reduction occurred because of an increase in the post-NCLB growth rate for Black students. The total post-NCLB reduction of the White-Black gap for eighth-grade math by 2009 was 5.56. Next, the growth rate for the Hispanic-White achievement gap in eighth grade decelerated after NCLB (b20 = -0.62), and the status of this gap in fourth grade decreased after NCLB (b30 = -3.42). The reduction in the gap growth rate in eighth grade occurred because of an increase in the growth rate for Hispanic students. The reduction in the status of the gap for fourth grade occurred because of larger increases in the status for Hispanic than White students. The total post-NCLB reduction in the White-Hispanic gap for eighth-grade math by 2009 was 4.09.

There was also some indication of progress in narrowing socioeconomic gaps in math after NCLB. For post-NCLB changes in the nonpoor-poor gap in math achievement, we found a reduction in the level of the gap in fourth grade (b30 = -2.21) and a reduction in both the growth rate (b20 = -0.49) and status (b30 = -2.18) in eighth grade. The post-NCLB reduction in the fourth-grade gap level was fueled by positive changes for both groups, with poor students gaining more. The post-NCLB reduction in the gap growth rate in eighth grade was owed to positive post-NCLB growth rate changes for both groups, with relatively faster growth for poor students. The reduction in the status of the poverty gap in eighth-grade math was similarly explained by increases in status for both groups, which favored poor over nonpoor students. The total reduction in the nonpoor-poor gap for eighth-grade math by 2009 was 5.12.

Our analysis of the academic gaps in math largely also suggests favorable changes after NCLB. For progress in closing the gap in math achievement between the 90th and 10th percentiles, our analysis reveals post-NCLB reductions in status in fourth grade (b30 = -4.29) and in both the status (b30 = -1.46) and growth rate (b20 = -.31) in eighth grade. The post-NCLB reduction in the status of the gap in fourth grade was caused by positive changes for both groups, with the 10th percentile gaining more. For eighth grade, the post-NCLB reduction in the academic gap level was fueled by an increase in level for the 10th percentile. The decrease in the growth rate in eighth grade was fueled by positive changes for both groups, with the 10th percentile's relatively faster growth. Total post-NCLB reduction in the 90th-10th percentile gap for eighth-grade math by 2009 was 3.32.

As with overall academic achievement, we also examined the effects of two school resource variables (i.e., teacher salary and pupil-teacher
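The cumulative totals quoted in this section can be reproduced from the reported coefficients. The sketch below assumes that the total change by 2009 equals the change in status plus six years (2003 to 2009) of the change in growth rate, an inference from the reported numbers rather than a formula stated in this section. The function name is hypothetical.

```python
def cumulative_change(b30: float, b20: float, years: int = 6) -> float:
    """Post-NCLB status shift plus `years` of changed growth, to 2 decimals."""
    return round(b30 + years * b20, 2)


# (b30, b20) pairs as reported in the text and Table 2.
examples = {
    "nonpoor-poor gap, Grade 4 reading": (1.87, 0.89),
    "75th-25th gap, Grade 4 reading": (1.19, -0.70),
    "75th-25th gap, Grade 8 reading": (2.79, -0.33),
    "White-Black gap, Grade 8 math": (-1.60, -0.66),
    "nonpoor-poor gap, Grade 8 math": (-2.18, -0.49),
    "90th-10th gap, Grade 8 math": (-1.46, -0.31),
}

for label, (b30, b20) in examples.items():
    print(f"{label}: {cumulative_change(b30, b20):+.2f}")
```

Positive values are increases in the gap and negative values are reductions, so the results correspond to the reported totals of 7.21, -3.01, 0.81, 5.56, 5.12, and 3.32.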
[TABLE 3]
second-generation states since NCLB. Flipping this interpretation, the "negative" effect of having "pre-NCLB" accountability policy on "post-NCLB" status or growth would signify a "positive" effect of adopting NCLB accountability policy. However, after controlling for pre-NCLB state characteristics, trends, and other covariates including demographic changes, Model 2 results make the effects of pre-NCLB accountability insignificant, except for a consistently significant effect on post-NCLB change to growth in Grade 8 reading (b21 = -.22). A two standard deviation decrease in the pre-NCLB high-stakes accountability (equivalent to a move from first-generation to second-generation status) was associated with a very small gain by 2009 in Grade 8 reading: -2 × (b31 + 6 b21) = 2.36 (.07 σ). Even so, the validity of this interpretation depends on its implicit assumption that all states, including both first- and second-generation accountability states, implemented NCLB accountability policy equally well.

The Model 2 results for most state-level predictors are highly similar to corresponding Model 3 results in all subjects and grades. This suggests that accountability policy did not influence overall school resources (i.e., teacher salary and pupil-teacher ratio) and thus policy effects, if any, are not mediated by those resources that have independent effects on achievement outcomes. As shown in the likelihood ratio test results, Model 2 explains a significant share of the interstate variations in post-NCLB trends, whereas Model 3, with the same set of state policy variables plus school resource variables, explains even more of the variations than Model 2.

The Model 3 results show that states' fidelity and rigor in NCLB implementation have a few positive effects regardless of pre-NCLB accountability status. After controlling for other possible confounders, positive gains were associated with NCLB implementation fidelity (b12 = .14 for Grade 8 math) as well as the rigor of standards (b23 = .69 for Grade 8 reading and b23 = .12 for Grade 4 math). Even those weak positive effects of fidelity and rigor variables are hard to generalize, since they are not observed consistently across grades and subjects. By and large, the state-level regression results reveal only limited potential effects of high-stakes school accountability policy, effects that failed to make either statistically or practically significant differences in post-NCLB trends.

Furthermore, the Model 3 results give only partial support for the hypothesis that states' administrative/financial capacity for district/school support under NCLB would bring academic improvement (see "data tracking capacity" and "funding capacity for SINI" in Table 3). Two standard deviations of an increase in the data tracking capacity variable is associated with a 2-point gain in the status of Grade 8 math achievement (2 × b34 = 1.98). Significant effects on status but not growth indicate that the effects were immediate but not sustainable. State funding capacity for SINI was not systematically associated with post-NCLB changes in state average achievement. However, additional subgroup and gap analysis showed that funding capacity for SINI was associated with improved post-NCLB gains for only low-achieving students and thus smaller academic gaps between the 75th and 25th percentiles and the 90th and 10th percentiles in math.

Last, the results of HLM latent variable regression in both Models 2 and 3 show the tendency for states that had a relatively lower initial status and/or relatively smaller gains prior to NCLB to experience more immediate gains and/or faster growth after NCLB (see coefficients for pre-NCLB level and pre-NCLB growth in Table 3). Because the first-generation accountability states gained faster than their second-generation counterparts during the pre-NCLB period, a reversed pace of growth is likely to occur to those two groups of states by chance regardless of NCLB policy impact. The first-generation states are unlikely to sustain the same rate of growth under NCLB perhaps as a result of diminishing returns to high-stakes accountability policy and thus fading policy impact, whereas the second-generation states are likely to make faster gains than before in spite of their less faithful implementation of NCLB. Without considering this pattern that arose, possibly due to diminishing returns to accountability policy, we might overestimate the impact of NCLB by observing greater gains among the second-generation states than the first-generation states.

Summary and Conclusion

This study updates and revisits earlier evaluations of the NCLB policy's impact with regard to progress toward the goal of improving proficiency for all students and narrowing student
achievement gaps with NAEP data collected by 2009 in reading and math. It offers new insights for the evaluation of NCLB by differentiating high-stakes school accountability and capacity-building policy initiatives among the first- and second-generation accountability states over an extended pre-NCLB/post-NCLB time period. The study is also more methodologically judicious than previous studies by refining the comparative interrupted time-series design through statistical modeling that addresses more potential threats to causal inferences about the policy impact.

By and large, there were highly mixed patterns of post-NCLB changes in terms of improving reading versus math achievement across the nation. The comparison of pre- and post-NCLB reading outcome trends showed that the level of state average achievement as well as the pace of achievement gains have either remained the same or declined after NCLB. In contrast, the earlier progress in math has continued or accelerated with more gains after NCLB than before. However, the magnitude of these cumulative achievement gains or losses in the post-NCLB period (2003-2009) was relatively small. Similarly, there were mixed patterns of progress in narrowing achievement gaps in reading versus math. In reading, the states did not narrow achievement gaps since NCLB but instead experienced a setback to the earlier progress. In math, there was some significant progress for narrowing the achievement gaps. Regardless of the subpopulations involved and the grade/subject, however, it is noteworthy that the magnitude of post-NCLB changes in the gaps is not only short of meeting NCLB achievement targets for all students but also particularly insufficient to redress setbacks to the earlier national progress in narrowing racial achievement gaps (see Lee, 2002; Peterson, 2006).

Were different patterns of post-NCLB academic improvement in reading and math related to NCLB? The high-stakes external accountability policy under NCLB appears to have placed equal or similar priorities on improving both reading and math achievement. However, there were actually more favorable changes in instructional conditions for reading than for math (e.g., considerable investment in an early reading program). A study of changes in instructional time allocation since NCLB also shows that schools allocated relatively more time to reading than to math (CEP, 2007a). In addition, there was a greater shortage of qualified teachers in math than in reading, particularly within high-minority and high-poverty schools (National Center for Education Statistics [NCES], 2002). Given these conditions, it is hard to understand why the test results improved in math but not in reading during the NCLB period (2003-2009). This inconsistency begs the question of whether these changes are necessarily attributable to the impact of NCLB.

The findings of our study challenge those of some previous studies. For example, an earlier study by Dee and Jacob (2009) reported significant positive effects of NCLB in math but not in reading based on NAEP data through 2007. In contrast, our study found a significant positive effect only in Grade 8 reading based on NAEP data through 2009. Two possible reasons for the divergence of these findings from those of Dee and Jacob (2009) include our longer NAEP time frame and our additional efforts to consider internal validity threats. It appears that, without adequate statistical control for pre-NCLB state characteristics and achievement trends, one observes more tentative gains among the second-generation accountability states (in comparison with the first-generation counterparts) in Grade 4 math and reading during the period of 2003-2009. However, those modest post-NCLB gains became even smaller and insignificant once we took into account differences in the earlier results and diminishing returns to accountability policy.

Furthermore, cross-state variations in the fidelity of NCLB policy implementation and the rigor of performance standards were not systematically associated with post-NCLB academic improvement and achievement gap patterns. If the NCLB policy were to be ascribed as a cause for any observed post-NCLB change in achievement gaps, one might expect that these predictors tied to the policy itself would be consistently related to the outcomes. Although there was limited evidence for positive effects of high-stakes accountability, this was often restricted to immediate gains right after NCLB without sustainable effects on further growth. The study also found limited evidence for positive effects of state agency capacity for district/school support, particularly building achievement data tracking systems and funding schools in need of improvement.

On the other hand, our study found more consistent positive effects of statewide educational resources, independent of accountability policy,
fidelity. It is derived from the Education Commission of the States (ECS) database, which tracks state laws, departmental regulations, board rules, and directives and practices related to requirements across seven major sections of the NCLB legislation: (a) NCLB Standards and Assessments, (b) NCLB Accountability (AYP), (c) NCLB School Improvement, (d) NCLB Safe Schools, (e) NCLB Supplemental Services, (f) NCLB Report Card, and (g) NCLB Teacher Quality. The 3-point scale ratings of state policy implementation status (3 = on target, 2 = partially on target, 1 = not on target) are combined across 38 items in the seven areas as updated in September 2007 (ECS, 2007). The alpha reliability coefficient for the index is .81. The ECS used analytical scoring rubrics, so-called "decision points," to determine whether the 50 states were on track to meet NCLB requirements (see ECS, 2005, for rubrics for all 38 items). For example, the rubrics for reading standards, one of the 38 items, are as follows:

Yes (On Target) = State has academic content standards in reading/language arts in Grades 3-8 and high school as required under the 1994 Elementary and Secondary Education Act (ESEA) or requires districts to have standards. If state has standards for grade bands rather than individual grade levels, the state also must have "suggested grade-level expectations" (or the equivalent) that are tied

Grade 4 reading, r = .96 for Grade 8 reading, r = .81 for Grade 4 math, and r = .85 for Grade 8 math for its relationships with NCES (2007) estimates of the stringency of 2005 state proficiency standards, and r = .91 for Grade 4 reading, r = .78 for Grade 8 reading, r = .85 for Grade 4 math, and r = .76 for Grade 8 math for its relationships with Fordham Institute (Cronin, Dahlin, Adkins, & Kingsbury, 2007) estimates of the rigor of 2005 state proficiency standards. Imperfect correlations with those of previous studies are due to the fact that this study combined the rigor measures across 5 years (2003-2007) to capture stable and consistent effects of the post-NCLB standards.

4. Data Tracking Capacity: This variable is a composite index of longitudinal student achievement data tracking capacity. It is derived from the Data Quality Campaign (DQC) 2009-2010 state profile database (see http://www.dataqualitycampaign.org), which tracks state activities related to 10 essential elements (i.e., core components of a robust longitudinal data system) and 10 action requirements (i.e., support system for using data to make informed decisions and improve student performance). The 10 core elements include (a) statewide student identifier, (b) student-level enrollment data, (c) student-level test data, (d) information on untested students, (e) statewide teacher identifier with a teacher-student match, (f) student-level course completion data, (g) student-level SAT, ACT, and AP exam data, (h)
Appendix C (continued)
Note. All of the above variables are standardized to have a mean of zero and a standard deviation of one. Therefore, each state's
z-score indicates its relative standing among all 50 states.
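
As an illustration of how a composite index of this kind can be built, checked for internal consistency, and standardized, the following is a minimal sketch rather than the authors' procedure. The ratings are randomly generated placeholders, not ECS data. Combining items by simple summation, using the population standard deviation for the z-scores, and the function names `cronbach_alpha` and `zscore` are all assumptions made for the example.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (units x items) matrix of ratings."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)


def zscore(values: np.ndarray) -> np.ndarray:
    """Rescale to mean 0 and SD 1 (population SD; an assumption here)."""
    return (values - values.mean()) / values.std(ddof=0)


# Hypothetical ratings: 50 states x 38 items on a 3-point scale
# (3 = on target, 2 = partially on target, 1 = not on target).
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))   # each state's underlying implementation level
noise = rng.normal(size=(50, 38))   # item-specific noise
ratings = np.clip(np.rint(2 + 0.6 * latent + 0.6 * noise), 1, 3)

composite = ratings.sum(axis=1)     # assumed combination rule: simple sum
alpha = cronbach_alpha(ratings)     # internal consistency of the 38 items
z = zscore(composite)               # relative standing among the 50 states

print(f"alpha = {alpha:.2f}")
print("first five z-scores:", np.round(z[:5], 2))
```

A state with a positive z-score sits above the 50-state mean on the composite, and a state with a negative z-score sits below it, which is the sense in which the note describes relative standing.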
15531). Retrieved March 1, 2010, from http://www.nber.org
Duffett, A., Farkas, S., & Loveless, T. (2008). High-achieving students in the era of No Child Left Behind (Thomas B. Fordham Institute report). Retrieved December 1, 2008, from http://www.edexcellence.net/
Education Commission of the States. (2005, June). Decision points for NCLB requirements. Retrieved September 5, 2007, from http://nclb2.ecs.org
Education Commission of the States. (2007). NCLB database. Retrieved September 1, 2007, from http://nclb2.ecs.org/NCLBSURVEY/nclb.aspx?Target=AD
Education Trust. (2006). Primary progress, secondary challenge: A state-by-state look at student achievement patterns. Washington, DC: Education Trust.
Elmore, R. F. (2002). Testing trap. Harvard Magazine, 105(1), 35.
Elmore, R. F., & Fuhrman, S. H. (1995). Opportunity-to-learn standards and the state role in education. Teachers College Record, 96, 432-457.
Ferguson, R. F. (1991, summer). Paying for public education: New evidence on how and why money matters. Harvard Journal on Legislation, 28(2), 465-498.
Finn, J. D., & Achilles, C. M. (1990). Answers and questions about class size: A statewide experiment. American Educational Research Journal, 27(3), 557-577.
Fuhrman, S. H. (1999). The new accountability (CPRE Policy Brief). Philadelphia: Consortium for Policy Research in Education, Graduate School of Education, University of Pennsylvania.
Fuller, B., Gesicki, K., Kang, E., & Wright, J. (2006). Is the No Child Left Behind Act working? The reliability of how states track achievement (PACE Working Paper No. 06-1). Berkeley: University of California.
Greenwald, R., Hedges, L. V., & Laine, R. D. (1996). The effect of school resources on student achievement. Review of Educational Research, 66(3), 361-396.
Grissmer, D., & Flanagan, A. (1998). Exploring rapid achievement gains in North Carolina and Texas. Washington, DC: National Education Goals Panel.
Grissmer, D., Flanagan, A., Kawata, J., & Williamson, S. (2000). Improving student achievement: What state NAEP test scores tell us. Santa Monica, CA: Rand.
Haney, W. (2000). The myth of the Texas miracle in education. Educational Policy Analysis Archives. Retrieved March 3, 2001, from http://epaa.asu.edu/epaa/v8n41
Hanushek, E. A. (1997). Assessing the effects of school resources on student performance: An update. Educational Evaluation and Policy Analysis, 19, 141-164.
Hanushek, E. A., & Raymond, M. E. (2004). Does school accountability lead to improved performance? Journal of Policy Analysis and Management, 24(2), 297-327.
Hirano, K., & Imbens, G. W. (2002). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Services and Outcomes Research Methodology, 2, 259-278.
Kim, J., & Sunderman, G. L. (2004). Large mandates and limited resources: State response to the No Child Left Behind Act and implications for accountability. Cambridge, MA: The Civil Rights Project at Harvard University.
Ladd, H. F. (1999). The Dallas school accountability and incentive program: An evaluation of its impact on student outcomes. Economics of Education Review, 18, 1-16.
Lankford, H., & Wyckoff, J. (1997). The changing structure of teacher compensation, 1970-1994. Economics of Education Review, 16(4), 371-384.
Lee, J. (1997). State activism in education reform: Applying the Rasch model to measure trends and examine policy coherence. Educational Evaluation and Policy Analysis, 19(1), 29-43.
Lee, J. (2002). Racial and ethnic achievement gap trends: Reversing the progress toward equity? Educational Researcher, 31, 3-12.
Lee, J. (2006). Tracking achievement gaps and assessing the impact of NCLB on the gaps: An in-depth look into national and state reading and math outcome trends. Cambridge, MA: The Civil Rights Project at Harvard University.
Lee, J. (2008). Is test-driven external accountability effective? Synthesizing the evidence from cross-state causal-comparative and correlational studies. Review of Educational Research, 78(3), 608-644.
Lee, J. (2010). Trick or treat: New ecology of education accountability system in the USA. Journal of Education Policy, 25(1), 73-93.
Lee, J., & Wong, K. K. (2004). The impact of accountability on racial and socioeconomic equity: Considering both school resources and achievement outcomes. American Educational Research Journal, 41(4), 797-832.
Lee, J.-W., & Barro, R. J. (1998). School quality in a cross section of countries. Cambridge, MA: National Bureau of Economic Research.
LeFloch, K. C., Martinez, F., O'Day, J., Stecher, B., Taylor, J., & Cook, A. (2007). State and local implementation of the No Child Left Behind Act: Volume III, Accountability under NCLB: Interim report. Washington, DC: U.S. Department of Education.
Linn, R. L. (2003). Accountability: Responsibility and reasonable expectations. Educational Researcher, 32(1), 3-13.
Marion, S. F., White, C., Carlson, D., Erpenbach, W. J., Rabinowitz, S., & Sheinker, J. (2002). Making valid and reliable decisions in the determination of
Porter, A., & Chester, M. (2002). Building a high-quality assessment and accountability program: The Philadelphia example. In D. Ravitch (Ed.), Brookings papers on education policy (pp. 285-315). Washington, DC: Brookings Institution.
Preliminary NCLB results show slippage in 2006. (2006, September 20). Education Week. Retrieved September 1, 2007, from www.edweek.org
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models. Thousand Oaks, CA: SAGE.

lege, Lynch School of Education, 336 Campion Hall, Chestnut Hill, MA 02467; reevest@bc.edu. He is interested in the utilization of psychological research in education and teacher education.

Submitted June 27, 2010
Revision received November 3, 2011
Accepted November 9, 2011