Background/Context: Many studies have concluded that educational accountability policies increase data use, but we know little about how to design accountability systems to
encourage productive versus distortive uses of test score data.
Purpose: I propose that five features of accountability systems affect how test score data are
used and examine how individual and organizational characteristics interact with system
features to influence teachers' data use. First, systems apply varying amounts of pressure.
Second, the locus of pressure varies across systems. Third, systems diverge in the distributional goals they set for student performance. Fourth, the characteristics of assessments vary
across systems. Finally, systems differ in scope, that is, whether they incorporate multiple
measures or are process- or outcome-oriented.
Research Design: I review the literature on the effects of accountability systems on teachers'
data use and propose a research agenda to further our understanding of this area.
Conclusions/Recommendations: Researchers have spent much more time analyzing test
score data than investigating how teachers use data in their work. Evolving accountability
systems provide new opportunities for scholars to study how the interactions between
accountability features, individual characteristics, and organizational contexts affect teachers' test score data use.
The focus on data, I would say, is the driving force [behind education] reform. No longer can we guess. We need to challenge ourselves every day to see what the data mean.
Secretary of Education Arne Duncan, 2010 (quoted in Prabhu, 2010)
require exit exams for students to graduate (Urbina, 2010). Merit pay
proposals in districts such as Houston, Charlotte-Mecklenburg, and
Minneapolis tie some component of a teacher's compensation to his or
her students' performance on standardized tests (Papay & Moore
Johnson, 2009).
Johnson, 2009). More recently, many federal Race to the Top winners
committed to linking up to 50% of teacher evaluations to student test
scores (New Teacher Project, 2010), and in September 2010, the Los
Angeles Times published individual value-added data for elementary teachers (Buddin, 2010). Principals in many cities can earn substantial
bonuses or lose their jobs based on test scores, and NCLB holds both
schools and districts accountable for improving test scores.
Does the locus of pressure affect how much, and how, data are used?
We need to understand whether within-school variation in data use
increases or decreases when accountability moves to the level of the
teacher. An intriguing finding in the current literature is that most of the
variation in data use currently exists within rather than between schools
(Marsh et al., 2006). This means that individual user characteristics need
to be a focus of study along with organizational characteristics. Because
current studies focus on outcomes rather than process, we can only use
existing results to generate hypotheses about between-teacher variation
in data use.
Recent studies by Papay (2010) and Corcoran, Jennings, and Beveridge
(2010) compared teacher effects on high- and low-stakes tests under the
assumption that the stakes attached to tests matter for teacher responses.
These studies suggest that teachers who appear effective on high-stakes
tests are not necessarily effective on low-stakes tests. Furthermore,
teacher effects on high-stakes tests decay more quickly than do teacher
effects on low-stakes tests, perhaps because teachers face different pressures to increase scores on high-stakes tests that lead them to use test data
in different ways. Corcoran et al. (2010) found that there are particularly
large gaps in measured teacher effectiveness on the two tests for inexperienced teachers. These teachers may be using, or experiencing the stimulus effects of, high-stakes test data differently than their more
experienced peers. This may be because these teachers face pretenure
pressure to increase test scores; the Reback et al. (2010) study found that
in schools facing accountability pressure, untenured teachers work substantially more hours per week.
These findings lead to two specific lines of inquiry. The first task is to
explain whether data use plays a role in making teachers effective on a
high-stakes test but not a low-stakes test. For example, do some teachers
use test score data from high-stakes tests as the dominant lens in making
sense of school performance, and thus focus their time and attention
studies suggest that high-powered individual incentives focused on a limited set of easily measurable goals are likely to distort behavior and lead
to undesirable uses of data (Baker, Gibbons, & Murphy, 1994; Campbell,
1979; Holmstrom & Milgrom, 1991). If this precept applies to schools, we
can predict that individual accountability focused largely on test scores
will encourage distortive uses of data. But we may also expect that these
responses will be mediated by the organizational contexts in which teachers work (Coburn, 2001). In some places, this pressure may be experienced acutely, whereas in others, principals and colleagues may act as a
buffer. As I will discuss later, subjective performance measures have been
proposed as a way to offset these potentially distortive uses of data.
DISTRIBUTIONAL GOALS OF THE ACCOUNTABILITY SYSTEM:
PROFICIENCY, GROWTH, AND EQUITY
The goals of an accountability system affect how student performance is
measured, which in turn may affect how data are used. The three major
models currently in use are a status (i.e., proficiency) model, a growth
model, and some combination of the two. These models create different
incentives for using data as both diagnosis and compass to target
resources to students. In the case of status models, by which I mean
models that focus on students' proficiency, teachers have incentives to
move as many students over the cut score as possible but need not attend
to the average growth in their class. In a growth model, teachers have
incentives to focus on those students who they believe have the greatest
propensity to exhibit growth. Because state tests generally have strong
ceiling effects that limit the measurable growth of high-performing students (Koedel & Betts, 2009), teachers may focus on lower performing
students in a growth system.
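To make these competing incentives concrete, the following sketch (my own illustration; the cut score, ceiling, student scores, and priority functions are hypothetical simplifications, not claims about any cited model) ranks students by how much attention each system rewards giving them:

```python
# Illustrative only: which students a teacher has the strongest incentive
# to target under a status model versus a growth model. All values are
# hypothetical assumptions, not parameters from any cited study.

CUT_SCORE = 300   # hypothetical proficiency cut
CEILING = 400     # hypothetical test ceiling that limits measurable growth

students = {"A": 220, "B": 290, "C": 298, "D": 310, "E": 370}  # prior scores

def status_priority(score: int) -> float:
    """Status incentive: only students below the cut can add to the
    proficiency rate, and those just below it ("bubble" students) are the
    cheapest to move over."""
    return 1.0 / (CUT_SCORE - score) if score < CUT_SCORE else 0.0

def growth_priority(score: int) -> int:
    """Growth incentive: remaining headroom before the ceiling; high
    performers have little measurable growth left (Koedel & Betts, 2009)."""
    return CEILING - score

for name, score in sorted(students.items()):
    print(f"{name}: status={status_priority(score):.3f} "
          f"growth={growth_priority(score)}")
# Student C (298) ranks first under the status model; student A (220)
# ranks first under the growth model.
```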
No studies to date have investigated whether status models have different effects on data use than growth models. However, we can generate
hypotheses about these effects by considering a growing body of literature that has assessed how accountability systems affect student achievement across the test score distribution. A prime suspect in producing
uneven distributional effects is the reliance of current accountability systems
on proficiency rates, a threshold measure of achievement. Measuring
achievement this way can lead teachers to manipulate systems of measurement to create the appearance of improvement. For example, teachers can focus on "bubble" students, those close to the proficiency cut
score (Booher-Jennings, 2005; Hamilton et al., 2007; Neal &
Schanzenbach, 2007; Reback, 2008). Test score data appear to play a
central role in making these targeting choices and may also be used to
legitimize these choices as "data-driven decision making" (Booher-Jennings, 2005). Because sanctions are a function of passing rates, slightly
increasing the scores of a small number of students can positively impact
the school's accountability rating.
A large body of evidence addresses the issue of distributional effects
and provides insight into the extent to which teachers are using data to
target resources to students. The literature is decidedly mixed. One study
in Chicago found negative effects of accountability pressure on the lowest performing students (Neal & Schanzenbach, 2007), whereas another
in Texas found larger gains for marginal students and positive effects for
low-performing students as well (Reback, 2008). Four studies identified
positive effects on low-performing students (Dee & Jacob, 2009; Jacob,
2005; Ladd & Lauen, 2010; Springer, 2007), whereas four found negative
effects on high-performing students (Dee & Jacob, 2009; Krieg, 2008;
Ladd & Lauen, 2010; Reback, 2008). Because these studies intended to
establish effects at the level of the population, they did not directly attend
to how teachers varied in their use of data to target students and how
organizational context may have mediated these responses. I return to
these issues in my proposed research agenda.
Only one study to date has compared the effects of status and growth
models on achievement. Analyzing data from North Carolina, which has
a low proficiency bar, Ladd and Lauen (2010) found that low-achieving
students made more progress under a status-based accountability system.
In contrast, higher achieving students made more progress under a
growth-based system. This suggests that teachers' allocation of resources
is responsive to the goals of the measurement system; teachers targeted
students below the proficiency bar under a status system, and those
expected to make larger gains (higher performing students) under a
growth system. As more states implement growth models, researchers will
have additional opportunities to address this question and determine
what role the difficulty of the proficiency cut plays in affecting how teachers allocate their attention.
A second feature that may be important for how data are used to target
students is whether the system requires subgroup accountability, and
what cutoffs are established for separately counting a subgroup. States
vary widely in how they set their subgroup cutoffs. In Georgia and
Pennsylvania, 40 students count as a subgroup, whereas in California,
schools must enroll 100 students or 50 students if that constitutes 15% of
school enrollment (Hamilton et al., 2007). Only one study, by Lauen and
Gaddis (2010), has addressed the impact of NCLB's subgroup requirements. Though they found weak and inconsistent effects of subgroup
accountability overall, Lauen and Gaddis found large effects of subgroup
accountability on low-achieving students' test scores in reported subgroups; these effects were largest for Hispanic students. These findings suggest
that we need to know more about how data are used for targeting in
schools that are separately accountable for subgroups compared with
similar schools that are not.
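As a small illustration of how mechanical these thresholds are, the cutoff rules just described can be written as two short predicates (the function names, and my reading of California's 15% condition, are assumptions rather than statutory language):

```python
# A sketch of the subgroup-counting rules reported in Hamilton et al.
# (2007); the reading of the California condition is an assumption.

def counts_in_georgia_or_pennsylvania(subgroup_n: int) -> bool:
    """GA and PA: a subgroup is separately accountable at 40 students."""
    return subgroup_n >= 40

def counts_in_california(subgroup_n: int, enrollment: int) -> bool:
    """CA: 100 students, or 50 students if they make up at least 15% of
    school enrollment."""
    return subgroup_n >= 100 or (subgroup_n >= 50
                                 and subgroup_n >= 0.15 * enrollment)

# A 60-student subgroup always counts in GA/PA, but in CA only when the
# school enrolls no more than 400 students (60 >= 0.15 * 400).
print(counts_in_georgia_or_pennsylvania(60))  # True
print(counts_in_california(60, 500))          # False
print(counts_in_california(60, 400))          # True
```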
To summarize, most of our knowledge about the effects of distributional goals of accountability systems comes from studies that examine
test scores, rather than data use, as the focus of study. These studies raise
many questions about the role data use played in producing these outcomes. First, data use has made targeting more sophisticated, real-time,
and accurate, but we know little about how targeting varies across teachers and schools. Second, we need to know whether teachers, administrators, or both are driving targeting behavior. For example, whereas
77%–90% of elementary school principals reported encouraging teachers to focus their efforts on students close to meeting the standards,
only 29%–37% of teachers reported doing so (Hamilton et al., 2007).
Third, we need to know more about the uses of summative data for monitoring the effectiveness of targeting processes. How do teachers interpret students' increases in proficiency when they are applying targeting
practices? Depending on the inference teachers want to make, targeting
can be perceived as a productive or distortive use of data. Targeting students below passing creates the illusion of substantial progress on proficiency, making it distortive if the object of interest is change in student
learning. On the other hand, the inferences made based on test scores
would not be as distortive if teachers examined average student scale
scores. At present, we do not know to what extent teachers draw these distinctions in schools. A final area of interest, which will be discussed in
more detail in the following section, is the extent to which targeting
increases students skills generally or is tailored to predictable test items
that will push students over the cut score.
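The distinction between proficiency rates and average scale scores drawn above can be illustrated with a simple calculation (all numbers invented): nudging a few bubble students over the cut moves the proficiency rate sharply while leaving the class mean nearly flat.

```python
# Invented numbers illustrating the inference problem described above:
# the same targeting looks like large progress on proficiency and like
# almost no progress on the average scale score.

CUT = 300

before = [220, 250, 296, 298, 299, 310, 330, 360]
after  = [220, 250, 301, 303, 304, 310, 330, 360]  # three bubble students nudged over

def proficiency_rate(scores):
    return sum(s >= CUT for s in scores) / len(scores)

def mean_score(scores):
    return sum(scores) / len(scores)

print(f"proficiency: {proficiency_rate(before):.0%} -> {proficiency_rate(after):.0%}")
print(f"mean score:  {mean_score(before):.1f} -> {mean_score(after):.1f}")
# proficiency: 38% -> 75%; mean score: 295.4 -> 297.2
```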
FEATURES OF ASSESSMENTS
Features of assessments may affect whether teachers use data in productive or distortive ways. Here, I focus on three attributes of assessments (the framing of standards, the sampling of standards, and the cognitive demand of the skills represented on state tests) because they are most
relevant to the potentially distortive uses of data. Many more features of
assessments, such as the extent to which items are "coachable," should
also be explored. The specificity of standards and their content varies
widely across states (Finn, Petrilli, & Julian, 2006). Framing standards too
broadly leads teachers to use test data to focus on tests rather than on the
(2003), and Wolf and McIver (1999) all demonstrate how teachers focus
their instruction not only on the content of the test but also on its format,
presenting material in the formats in which it will appear on the test and
designing tasks to mirror the content of the tests. To the extent that students learn how to correctly answer questions when they are presented in
a specific format but struggle with the same skills when they are presented in a different format, this use of test data to inform instruction is
distortive because it inflates scores.
Taken together, these studies suggest that different assessments create
incentives for teachers to use data in different ways. They also suggest
that teachers are using relatively detailed data about student performance on state exams in their teaching but provide little insight into the
types of teachers or schools where these practices are most likely to be
prevalent. School organizational characteristics, such as the availability of
data systems, instructional support staff, and professional development
opportunities, may affect how features of assessments are distilled for
teachers. Teachers' own beliefs about whether these uses constitute
good teaching may also matter. Some teachers view focusing on frequently tested standards as best practice, whereas others see it as
teaching to the test. Another important area for inquiry is whether this
type of data use is arising from the bottom up or the top down. In many
cases, teachers are not making these decisions alone. Rather, school and
district leaders may mandate uses of data and changes in practice that
will increase test scores, and teachers may unevenly respond to these
demands (Bulkley, Fairman, Martinez, & Hicks, 2004; Hannaway, 2007;
Koretz, Barron, Mitchell, & Stecher, 1996; Koretz, Mitchell, Barron, &
Keith, 1996; Ladd & Zelli, 2002).
SCOPE
Many have hypothesized that accountability systems based on multiple
process and outcome measures may encourage more productive uses of
data than those based only on test scores. In a policy address, Ladd
(2007) proposed the use of teams of inspectors that would produce qualitative evaluations of school quality so that accountability systems would
more proactively promote good practice. Hamilton et al. (2007) suggested creating a broader set of indicators to provide more complete
information to the public about how schools are performing and to
lessen the unwanted consequences of test-based accountability. The hope
is that by measuring multiple outcomes and taking account of the
processes through which outcomes are produced, educators will have
weaker incentives to use distortive means.
A large body of literature in economics and management has considered how multiple measures may influence distortive responses to incentives, particularly when firms have multiple goals. In their seminal article,
Holmstrom and Milgrom (1991) outlined two central problems in this
area: Organizations have multiple goals that are not equally easy to measure, and success in one goal area may not lead to improved performance
in other goal areas. They showed that when organizational effectiveness
in achieving goals is weakly correlated across goal domains and information across these domains is asymmetric, workers will focus their attention on easily measured goals to the exclusion of others. They
recommended minimizing strong objective performance incentives in
these cases.
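A stylized two-task version of their argument conveys the intuition (the notation below is mine, a simplification of the original model rather than a restatement of it):

```latex
% A stylized sketch of the Holmstrom-Milgrom (1991) multitask intuition;
% notation and simplifications are mine.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Suppose a teacher divides effort between a measured task $e_1$ (raising
test scores, observed as $x = e_1 + \varepsilon$) and an unmeasured task
$e_2$ (broader learning), with convex cost $C(e_1, e_2)$. Under a linear
incentive $w = \alpha x$ and a private return $\beta$ to the unmeasured
task, the teacher solves
\[
  \max_{e_1,\, e_2} \; \alpha e_1 + \beta e_2 - C(e_1, e_2).
\]
When the two efforts are substitutes in cost
($\partial^2 C / \partial e_1 \, \partial e_2 > 0$), raising $\alpha$
increases $e_1$ and crowds out $e_2$: stronger incentives on the easily
measured goal pull effort away from the hard-to-measure one, which is
why muting objective performance incentives can be optimal.
\end{document}
```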
Because comprehensive multiple measures systems have not been
implemented in the United States, it is currently difficult to study their
effects on data use. Existing studies have examined how the use of multiple measures affects the validity of inferences we can make about school
quality (Chester, 2005) or have described the features of these systems
(Brown, Wohlstetter, & Liu, 2008), but none has evaluated their effects
on data use. The European and U.K. experience with inspectorates provides little clear guidance on this issue; scholars continue to debate
whether these systems have improved performance or simply led to gaming of inspections (Ehren & Visscher, 2006).
A PROPOSED RESEARCH AGENDA
To improve accountability system design and advance our scholarly
knowledge of data use, researchers should study both productive and distortive uses of data. Next, I propose a series of studies that would help
build our understanding of how accountability system features affect
teachers' data use and what factors produce variability in teachers'
responses.
AMOUNT OF PRESSURE
There are two specific features of regulatory accountability systems that
should be explored to understand their impacts on data use: where the
level of expected proficiency or gain is set, and how quickly schools are
expected to improve. There is wide variation in the cut scores for proficiency that states set under NCLB as well as substantial variation in the
required pace of improvement under NCLB. As Koretz and Hamilton
(2006) have written, establishing the right amount of required improvement and its effects has been a recurrent problem in performance
data, scholars should study whether and how teachers' data use changes
in response to measures of their value-added. For example, we need to
understand whether teachers pursue productive or distortive forms of
data use as they try to improve their value-added scores and how these
reactions are mediated by individual and organizational characteristics.
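For studies of this kind, even a rudimentary value-added measure makes clear what teachers are being asked to respond to. The sketch below (invented data; a deliberately simplified model, not the one used in the studies cited here) computes each teacher's effect as the mean residual from a regression of current on prior scores:

```python
# Toy value-added calculation on invented data: a teacher's "effect" is
# the average residual of her students' current scores after adjusting
# for prior scores. Real value-added models are far more elaborate.
import numpy as np

prior   = np.array([250.0, 260, 270, 280, 290, 300, 310, 320])
current = np.array([262.0, 268, 285, 288, 305, 303, 330, 326])
teacher = np.array(["T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2"])

slope, intercept = np.polyfit(prior, current, 1)   # current = a + b * prior
residuals = current - (intercept + slope * prior)

for t in ("T1", "T2"):
    print(t, round(float(residuals[teacher == t].mean()), 2))
# A positive mean residual means the teacher's students scored above
# expectation given their prior achievement.
```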
GOALS OF THE ACCOUNTABILITY SYSTEM: STATUS, GROWTH, AND
EQUITY
Although scholars have proposed a variety of theories to predict how
teachers will respond to status versus growth-oriented systems, few of
these theories have been tested empirically. What seems clear is that status- and growth-based systems create different incentives for using data to
inform instruction. Though there is no extant research in this area, one
might hypothesize that in a status-based system, a teacher might use formative assessment data to reteach the skill deficits of students below the
threshold. Under a growth system, the same teacher might use these data
to maximize classroom growth, which could occur by focusing on the
skills that the majority of the class missed. Alternatively, a more sophisticated approach could be used whereby teachers determine which students have the greatest propensity to exhibit growth and focus on the
skill deficits of these students. As accountability systems evolve,
researchers should study how the incentives implied by status and growth
systems affect data use practices.
All the ideas posed in the preceding paragraphs assume that teachers
have a clear understanding of how status versus growth systems work.
Because growth systems are based on sophisticated statistical models,
there are good reasons to suspect that this is untrue. Researchers should also
study how teachers vary in their understanding of these systems, how
these understandings vary between and within organizations, and how
they shape teachers' use of data.
FEATURES OF ASSESSMENTS
As described in the literature review, assessments offer different opportunities to use data to improve test scores. This area will become particularly relevant with the implementation of the Common Core standards
and assessments, which focus on fewer concepts but promote deeper
understanding of them. The switch to a new set of standards will provide
a unique opportunity for researchers to observe the transition in data use
that occurs when the features of standards and assessments change.
We also need to know how teachers vary in the extent to which they use
distortive approaches to increase scores, which can be enabled by assessments that build in predictable features. Existing research suggests that
there is substantial variation across states in such responses, which suggests that features of assessments may matter. More investigation at the
organizational level is needed. For example, there is substantial variation
in the fraction of teachers reporting that they emphasize certain assessment styles and formats of problems in their classrooms. One hypothesis
is that features of assessments make these behaviors a higher return strategy in some places than others. Another hypothesis is that variation in
score reporting produces variation in teachers' data use. Researchers
could contrast teachers' responses in states and districts that disaggregate
subscores in great detail, relative to those that provide only overall
scores.3 Once we better understand the features of assessments and
assessment reporting that contribute to these differences, researchers
can design assessments that promote desired uses of data and minimize
undesired uses. Using existing student-by-item-level administrative data,
it is now possible to model teachers' responses to these incentives, but
what is missing from such studies is an understanding of the data-related
behaviors that produced them. Future studies using both survey and
qualitative approaches to study data use can help to unpack these
findings.
SCOPE
Many have hypothesized that accountability systems based on multiple
measures, and in particular those that are both process- and outcome-oriented, may produce more productive uses of data. Future studies
should establish how teachers interpret multiple measures systems. These
systems will put different weights on different types of measures, requiring teachers to decide how to allocate their time among them.
For example, in New York City, 85% of schools' letter grades are based on
student test scores, whereas 15% are based on student, teacher, and parent surveys and attendance records. Likewise, new systems of teacher
evaluation incorporate both test scores (in some cases, up to 51% of the
evaluation) and other evaluations. We need to know how teachers understand these systems in practice and how the weights put on different types
of measures influence their understanding.
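A composite of this kind is easy to state formally, and doing so shows why the weights matter for teachers' attention. The sketch below uses the New York City weights cited above; the component scores and the function name are hypothetical:

```python
# Weighted composite with the NYC weights cited above (85% test scores,
# 15% surveys and attendance). Component values are hypothetical 0-100
# scores; the function name is an assumption, not an official formula.

def school_grade(test_component: float, survey_attendance_component: float,
                 w_test: float = 0.85, w_other: float = 0.15) -> float:
    """The heavier a measure's weight, the more a unit of improvement on
    it moves the overall grade, and the more attention it commands."""
    return w_test * test_component + w_other * survey_attendance_component

# Strong surveys cannot offset weak test scores under these weights:
print(school_grade(60, 95))  # 65.25
print(school_grade(95, 60))  # 89.75
```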
CONCLUSION
The rise of the educational accountability movement has created a flurry
of enthusiasm for the use of data to transform practice and generated
reams of test score data that teachers now work with every day.
Researchers have spent much more time analyzing these test score data
themselves than trying to understand how teachers use data in their
work. What this literature review makes clear is just how scant our knowledge is about what teachers are doing with these data on a day-to-day
basis. Given the widespread policy interest in redesigning accountability
systems to minimize the undesired consequences of these policies, understanding how accountability features influence teachers' data use is an
important first step in that enterprise.
Notes
1. Federal, state, and district policy makers, of course, formally use data to measure
how schools are doing and to apply rewards or sanctions, but I focus here on use of data by
teachers.
2. Recent regulations now make it possible for states to request waivers from this
requirement.
3. I thank a reviewer for this point.
References
Baker, G., Gibbons, R., & Murphy, K. J. (1994). Subjective performance measures in optimal incentive contracts. Quarterly Journal of Economics, 109, 1125–1156.
Booher-Jennings, J. (2005). Below the bubble: Educational triage and the Texas accountability system. American Educational Research Journal, 42, 231–268.
Borko, H., & Elliott, R. (1999). Hands-on pedagogy versus hands-off accountability:
Tensions between competing commitments for exemplary math teachers in Kentucky.
Phi Delta Kappan, 80, 394–400.
Boudett, K. P., City, E. A., & Murnane, R. J. (Eds.). (2005). Data wise: A step-by-step guide to using assessment results to improve teaching and learning. Cambridge, MA: Harvard Education Press.
Brown, R. S., Wohlstetter, P., & Liu, S. (2008). Developing an indicator system for schools of choice: A balanced scorecard approach. Journal of School Choice, 2, 392–414.
Buddin, R. (2010). Los Angeles teacher ratings: FAQ and about. Los Angeles Times. Retrieved
from http://projects.latimes.com/value-added/faq/
Bulkley, K., Fairman, J., Martinez, M. C., & Hicks, J. E. (2004). The district and test preparation. In W. A. Firestone & R. Y. Schorr (Eds.), The ambiguity of test preparation (pp. 113–142). Mahwah, NJ: Erlbaum.
Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and
Program Planning, 2, 67–90.
Center on Education Policy. (2007). Choices, changes, and challenges: Curriculum and instruction in the NCLB era. Retrieved from http://www.cep-dc.org
Chester, M. D. (2005). Making valid and consistent inferences about school effectiveness
from multiple measures. Educational Measurement: Issues and Practice, 24, 40–52.
Coburn, C. E. (2001). Collective sensemaking about reading: How teachers mediate reading policy in their professional communities. Educational Evaluation and Policy Analysis,
23, 145–170.
Coburn, C. E. (2005). Shaping teacher sensemaking: School leaders and the enactment of
reading policy. Educational Policy, 19, 476–509.
Coburn, C. E. (2006). Framing the problem of reading instruction: Using frame analysis to
uncover the microprocesses of policy implementation. American Educational Research
Journal, 43, 343–379.
Corcoran, S. P. (2010). Can teachers be evaluated by their students' test scores? Should they be?
Providence, RI: Annenberg Institute, Brown University.
Corcoran, S. P., Jennings, J. L., & Beveridge, A. A. (2010). Teacher effectiveness on high- and low-stakes tests (Working paper). New York University.
Darling-Hammond, L., & Wise, A. E. (1985). Beyond standardization: State standards and school improvement. Elementary School Journal, 85, 315–336.
Dee, T. S., & Jacob, B. (2009). The impact of No Child Left Behind on student achievement (NBER
working paper). Cambridge, MA: National Bureau of Economic Research.
Diamond, J. B. (2007). Where rubber meets the road: Rethinking the connection between
high-stakes testing policy and classroom instruction. Sociology of Education, 80, 285–313.
Ehren, M., & Visscher, A. J. (2006). Towards a theory on the impact of school inspections. British Journal of Educational Studies, 54, 51–72.
Finn, C. E., Petrilli, M. J., & Julian, L. (2006). The state of state standards. Washington, DC:
Thomas B. Fordham Foundation.
Hamilton, L. S., & Stecher, B. M. (2006). Measuring instructional responses to standards-based
accountability. Santa Monica, CA: RAND.
Hamilton, L. S., Stecher, B. M., Marsh, J. A., McCombs, J. S., Robyn, A., Russell, J. L., . . .
Barney, H. (2007). Implementing standards-based accountability under No Child Left Behind:
Responses of superintendents, principals, and teachers in three states. Santa Monica, CA: RAND.
Hannaway, J. (2007, November). Unbounding rationality: Politics and policy in a data rich system.
Mistisfer lecture, University Council of Education Administration, Alexandria, VA.
Holmstrom, B., & Milgrom, P. (1991). Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. Journal of Law, Economics, and Organization, 7,
24–52.
Jacob, B. A. (2005). Accountability, incentives, and behavior: Evidence from school reform
in Chicago. Journal of Public Economics, 89, 761–796.
Jennings, J. L., & Bearak, J. (2010, August). State test predictability and teaching to the test:
Evidence from three states. Paper presented at the annual meeting of the American
Sociological Association, Atlanta, GA.
Jennings, J. L., & Crosta, P. (2010, November). The unaccountables. Paper presented at the
annual meeting of APPAM, Boston, MA.
Kerr, K. A., Marsh, J. A., Ikemoto, G. S., & Barney, H. (2006). Strategies to promote data use
for instructional improvement: Actions, outcomes, and lessons from three urban districts. American Journal of Education, 112, 496–520.
Koedel, C., & Betts, J. (2009). Value-added to what? How a ceiling in the testing instrument influences value-added estimation (Working paper). University of Missouri.
Koretz, D. (2008). Measuring up: What standardized testing really tells us. Cambridge, MA:
Harvard University Press.
Koretz, D., Barron, S., Mitchell, K., & Stecher, B. (1996a). The perceived effects of the Kentucky
Instructional Results Information System. Santa Monica, CA: RAND.
Koretz, D., & Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan
(Ed.), Educational measurement (4th ed., pp. 531–578). Westport, CT: American Council
on Education/Praeger.
Koretz, D., Mitchell, K., Barron, S., & Keith, S. (1996b). The perceived effects of the Maryland
School Performance Assessment Program (CSE Tech. Rep. No. 409). Los Angeles: University of California, Los Angeles.