
Educational Evaluation and Policy Analysis

Winter 1988, Vol. 10, No. 4, pp. 301-313

Symbolic Validation: The Case of State-Mandated, High-Stakes Testing

Peter W. Airasian
Boston College

Because the wisdom and likely effects of educational innovations rarely are established before
their adoption, the innovations must seek their legitimacy in nonempirical sources. This
paper uses the example of the rapid rise of state-mandated, high-stakes testing programs to
suggest that proposed innovations find their way into practice by virtue of their potency as
symbols of prevalent value orientations in the wider culture. Thus, innovations undergo a
type of consensual, symbolic validation that determines their acceptability. State-mandated,
high-stakes testing programs have gained widespread popular support in the quest for
heightened educational standards because they symbolize order and control, desired
educational outcomes, and traditional moral values.

Most educational innovations are adopted with high levels of technical uncertainty; the wisdom of their adoption and the range of their likely effects rarely are known in advance of their implementation (Rowan, 1982; Wise, 1979). Although many thoughtful individuals have reminded us of this fact and urged a more systematic, evidence-based approach to innovation (Campbell, 1969; Fullan, 1982; Suchman, 1967), their advice has gone largely unheeded.

The past 25 years have seen scant empirical evidence to justify the implementation of a wide range of educational programs including, but by no means limited to, Title (Chapter) I, Public Law 94-142, and state-mandated high school graduation tests. There is little disagreement that reform was needed in each of these areas, and empirical evidence was brought forth to document the severity of the problems in each area. Data showing test score differences between different racial and ethnic groups were a key justification for the need for compensatory education programs of the 1960s. Evidence of the failure to provide services to special needs pupils and the inadequacies of many services that were provided argued for reform in special education in the 1970s. Various forms of test score data showing declines in pupil performance were offered in the early 1980s to support the contention that our nation was at risk and in need of higher educational standards.

Each of these types of evidence provided legitimacy to the importance of a particular problem area and to the need for reform in that area. The evidence did not, however, provide legitimacy to the Title I programs, Public Law 94-142 statutes, and competency testing programs that were ultimately selected to remedy the problems. There is an important difference between evidence that documents the need for intervention and evidence that documents the efficacy of a particular strategy of intervention. Typically, once a need for intervention is documented empirically, the specific policy, program, or approach advanced to meet that need must seek its own sources of legitimization and validation. These sources rarely include trial testing and collection of empirical evidence.

Several reasons explain why empirical evidence does not and cannot play an important role in the legitimization of innovations. In the first place, innovation almost always is conceived of in response to some social problem or crisis that demands urgent solution. Whether the problem is disparities in school performance between racial/ethnic groups, the large number of latchkey children, or declining school standards, the urgency to alleviate the problem (to "do something now" lest the problem be exacerbated) generally limits both the opportunity and the will to carry out empirical preadoption trials of proposed solutions (Airasian, 1983).

Second, the political benefit that accrues to those who promulgate, mandate, and enact reform programs is itself often a prime and sufficient motivation for unexamined implementations (Moynihan, 1970). The possibility that pilot studies will identify deficiencies in an innovation or delay implementation by provoking debate over study findings can eliminate the political will to carry out pretrial studies before widespread adoption (Timpane, 1976).

Finally, a general lack of technical knowledge about how to remedy identified educational problems (Airasian, 1987b; Wise, 1979) forces policymakers to frame innovations in such general terms that pretrial studies are difficult to plan and carry out. As Elmore and McLaughlin (1982) point out, some policies are explicit and tell us all we need to know to do what is correct (e.g., "Don't exceed 55 miles per hour"), whereas others do not contain sufficient information or the knowledge necessary to carry them out (e.g., "Improve the educational achievement of disadvantaged children"). The stated intent of most educational innovations fits into this latter group. For these three reasons, there are pressures to introduce unproven remedies once a problem or need is documented.

Nevertheless, when pressed to "do something now," policymakers typically have a choice of many actions that can be taken; there are many ways to attack identified educational or social problems. A decline in levels of academic performance among school children, for example, may be attacked by extending the length of the school day, adding required courses to the school curriculum, mandating exit tests to certify pupils' competence, instituting remedial education programs, raising grade to grade promotion requirements, developing incentives for poor learners to drop out of school, or some combination of these strategies. If empirical evidence regarding the efficacy of any or all of these approaches is lacking, as it is typically for the reasons just stated, then the legitimacy of the chosen reform strategy must be found elsewhere.

Most often, that legitimacy is found in what may be termed social validation, in the mesh between prevalent social norms and values and the norms and values perceived to be reflected in the reform strategy (Parsons, 1956; Rowan, 1982). If an innovation is perceived to be capable of moving educational or social goals in a direction consistent with the goals, norms, or values of influential social groups, the innovation will be well received. So, the likelihood of adopting an innovation is determined largely by how well that innovation is perceived to incorporate the goals, norms, or values of important social constituencies (Cremin, 1961). The legitimacy of most innovations then, derives not from empirical evidence of their likely effectiveness, but rather from the perceptions they evoke and the extent to which those perceptions mesh with prevalent social norms and values. In essence, innovations are weighed in terms of whether they are symbols of broader social values, and it is this symbolic weighing of innovations that determines their legitimacy in the public's eyes. A brief examination of social symbols and symbolism helps clarify this relationship.

Symbols and Symbolism

A symbol is a concrete indicator of a more general, abstract idea (Firth, 1973; Whitehead, 1927). Symbols tend to be evocative rather than analytical, emotional rather than intellectual. That is to say, symbols evoke not only concrete images, but also feelings, values, emotions, and sentiments.

In addition to being representational and infused with an affective component, symbols flow from culture and depend on convention, habit, or agreement within the culture for their meanings and importance (Firth, 1973; Peirce, 1940; Whitehead, 1927). Symbolism derives from the values and ideals of the extant culture, not from the concrete indicator itself. For example, the cultural dependence of language, the most ubiquitous of symbolic forms, can be seen in many examples of a given word conveying different images in two cultures (e.g., boot in the United States evokes an image of footwear but in England boot evokes an image of the empty space in the back of a car). Because cultures change over time, the meanings of symbols can change and evolve. For example, less than 50 years ago, test score disparities between black students and white students were taken to symbolize the "natural order of things" between racial groups, a reaffirmation of the predominant hereditarian outlook (Cronbach, 1975). In more recent times, given changes in cultural mores and expectations, differential school test performance among racial or ethnic groups is taken to symbolize unequal educational opportunity, discrimination, and lack of progress in an important area of social reform.

Central to the study of symbols is the question of the relationship between a symbol and the things or emotions it represents. Why do some objects become powerful social symbols while others do not? The answer to this question must reflect the fact that a symbol derives from some commonality of experience or perception among those who accept and use it. In this context, symbols can be understood as shorthand, mnemonic devices to remember and communicate experiences and emotions (Loemker, 1962).

Symbols themselves can be categorized into two domains: public and private (Firth, 1973). Both types are true symbols; they are representational, often emotion laden, and dependent on cultural experiences. Private symbols reflect an individual's particular idiosyncrasies, background, and experiences and so have meaning only for that individual. For example, although each of us dreams, the specific content and context of our dreams vary from one person to another. At the individual level, each person's dream is a private symbol reflecting his or her idiosyncrasies. Public symbols, on the other hand, reflect common threads underlying private symbols; they emerge from the regularities and commonalities that are contained in and can be discerned from private symbols. Thus, Freud's approach to the analysis of dreams became viable when he showed that the private symbols of one individual's dreams had underlying commonalities with the private symbols of other individuals' dreams (Firth, 1973). The commonality sought in public symbols rests largely on the cultural experiences and shared understandings of the social group being described (Lindblom & Cohen, 1979). The reaction to the symbol is dependent on this commonality, because the symbol abstracts only the important aspects of the referent, leaving the individual's memory to fill in the details.

The common experiences or values underlying public symbols shed light on the issue of the political or social validity of symbols. When an innovation contains elements that symbolize experiences or values generally believed in the extant cultural ethos to be necessary for solving a given social or educational problem, the innovation will achieve social legitimization and acceptance. The degree to which the innovation symbolizes something common or readily understandable in an audience and the degree to which that commonality is believed to be necessary for resolving an identified problem, dictate the extent to which the innovation will be perceived to be valid. Cremin's (1961) discussion of the rise of the vocational education movement in the 1880s and 1890s and Callahan's (1962) description of the advent of the school efficiency movement in the period 1910 to 1930 exemplify situations in which there was a clear meshing of cultural values and educational responses. Symbolic validity then, is based on existential or consensual validation, on public symbols and their value, and emotional referents. This paradigm can be useful in explaining the adoption or lack of adoption of proposed, but untested, innovations.

This paper explores the symbolism possessed by a new and rapidly spreading educational innovation, namely state-mandated, high-stakes testing programs (Airasian, 1987c; Madaus, 1985; Popham, 1987). The results of such testing programs are being used as the primary criteria in making important educational decisions, such as whether or not a pupil will receive a high school diploma or be promoted to the next grade, whether a teacher will be certified or recertified to teach in a given state, and whether remedial funds or performance bonuses will be awarded to schools and teachers. The tests appeared virtually out of nowhere about 10 years ago and have enjoyed an acceptance and status that not only is rarely seen in an educational innovation, but also far outstrips available evidence of their effectiveness. The following sections seek to explain the unusual popularity and breadth of implementation of such testing programs.

This paper does not deal with the technical, psychometric properties of state-mandated, high-stakes tests, but rather with public perceptions of these tests. It does not ask whether the tests are good and dependable in a technical sense, but asks why people have been so quick to adopt them in the name of educational reform. In this regard, it is necessary to point out that the perceptual issues addressed in this discussion have created a too-willing audience for state-mandated, high-stakes testing programs, an audience that needs to have its expectations tempered by a more critical review and examination of the technical properties of such tests.

The Case of State-Mandated, High-Stakes Testing Programs

Documenting the Problem

In the late 1970s, after two decades of focus on policies designed to enhance educational equity, concern began to be expressed about the quality of American schools and pupils. There was a perception that standards in schools had declined and needed to be strengthened. The National Commission on Educational Excellence (1983) summarized the evidence advanced to document this perception. Data showed that SAT scores had declined annually between 1963 and 1980. The National Assessment of Educational Progress results showed performance declines over time in most school subjects at most grade levels. On 19 different international achievement tests American pupils never ranked first or second, but ranked last seven times. Average achievement of high school pupils on standardized tests was lower in the early 1980s than it was 26 years ago when Sputnik was launched. In 1980, one quarter of all the math courses taught in public 4-year colleges were remedial courses. The Department of the Navy indicated that one quarter of its recruits could not read at the ninth-grade level. These and similar statistics conveyed a picture of an educational system in which standards and academic performance had fallen to unacceptable levels. There was a perceived need to focus on educational excellence and improve school standards and attainments.

The American public clearly endorsed this view. In a 1983 Gallup Poll on education (Gallup, 1983), 74% of the American adults polled agreed that the quality of education in American public schools was only fair and not improving. In 1984 (Gallup, 1984), 60% of adults polled believed that elementary school pupils were not made to work hard enough in school. Sixty-seven percent felt that high school pupils were not required to work hard enough in school. In 1986 (Gallup, 1986), 70% of American adults surveyed said that there should be stricter requirements for both grade to grade promotion and high school graduation. In the same poll, 68% of respondents favored stricter high school graduation requirements even if it meant that fewer students would graduate.

Responding to the Perceived Need

Once the need for reform was documented by evidence such as this, many policy responses were advanced, and most of these were tried in one form or another (Doyle & Hartle, 1985). Among the wide array of reform strategies advanced to improve school quality, however, the predominant strategy, in terms of the breadth of its application and its public acceptance, has been new forms of standardized testing and accountability programs. Within the past 10 years all state legislatures have been confronted by, and most have endorsed, new state-mandated testing programs designed to make students, teachers, and schools accountable for their academic performance.

The new state-mandated tests are different from more traditional standardized educational tests in three key ways (Airasian, 1987a). First, the new state tests are mandated for all schools and virtually all pupils within a state. Second, the mandate eliminates most of the local district discretion in test selection, administration, content coverage, scoring, and interpretation. A single, state-approved test that is administered, scored, and interpreted according to state guidelines is used across school districts. Third, the tests have built-in sanctions or a so-called high stake associated with specific levels of performance. Any individual's test score can be compared to a statewide standard of satisfactory performance and a pass/fail decision made about whether the individual can graduate from high school, be promoted to the next grade, receive a performance bonus, and so on. Once the test is scored, the school administrator has no discretion in whether to dole out the educational reward or sanction.

The logic behind these testing programs is that when an important consequence, that is, a high stake, such as receiving a high school diploma or a teaching certificate is tied to test performance, the content reflected in the test will become incorporated into instruction. The high stakes associated with test performance will force an instructional response to the test so that the test content will, in current parlance, "drive" instruction. This approach is practiced in most countries of the world and exemplified by the British O and A Level external examination system (Madaus & Airasian, 1977). Popham (1987) states:

    Measurement-driven instruction occurs when a high stakes test of educational achievement, because of the important contingencies associated with the students' performance, influences the instructional program that prepares students for the test . . . Teachers tend to focus a significant portion of their instructional activities on the knowledge and skills assessed by such tests. A high-stakes test of educational achievement, then, serves as a powerful curricular magnet. (p. 680)

High-stakes standardized tests have become cornerstones in state government efforts to improve educational standards and to gain control over education in local school districts (Airasian, 1987a). Such tests have assumed new and important gatekeeping roles in the educational process. A survey in Education Week ("Excellence," 1985) indicated that 29 states require pupils to take competency or proficiency tests at selected points in the educational ladder. Over 20 states have passed legislation that requires high school students to show mastery on a state-mandated exit test as a prerequisite to receipt of a regular high school diploma. Eight states and several large cities tie grade to grade promotion to pupils' performance on standardized tests. As of April 1987, 44 states had implemented or decided to implement a testing program to be used to award prospective teachers state certification. Twenty-seven states test or plan to test applicants to teacher training programs in state colleges and to determine admittance on the basis of these test scores (Lehmann & Phillips, 1987; Rudner, 1987). Other states use pupil performance on state-mandated tests to allocate remedial funding to local school districts and to award monetary bonuses to schools and teachers based on year-to-year pupil test score improvement. California and Pennsylvania are working on implementing "honors" tests that will be used to certify advanced achievement in particular high school subjects (Carlson, 1987; Hertzog & Lehr, 1987).

Tests have moved from recorders of the

effects of educational reform to initiators of reform, from passive instruments to intrusive devices that threaten test takers and seek to dictate the ends of the instructional system. Tests have moved from descriptive indices to certification devices. State-mandated, high-stakes tests used to determine high school graduation, grade to grade promotion, teacher certification, resource allocation, and monetary bonuses abound in American education. All this has occurred in barely a 10-year period.

Clearly, the public supports the new forms of educational testing, even though the American educational system has had little prior experience with them (Airasian, 1987c; Madaus & Airasian, 1977). Gallup Poll results from 1983 (Gallup, 1983) indicated that 75% of American adults believed that children should be promoted from grade to grade only if they can pass examinations. In the same poll, three quarters of the respondents endorsed testing pupils in local schools with national tests so that educational achievement could be compared across communities. In 1984 (Gallup, 1984), about two thirds of those polled agreed that all high school students in the United States should be required to pass a standard nationwide examination to get a high school diploma. The 1986 poll (Gallup, 1986) revealed that 77% of respondents would like to see national tests given in their local schools so that achievement can be compared across communities. In the 1987 Gallup Poll of Education (Gallup & Clark, 1987), 74% of the public at large endorsed a national testing program for public school students, although only 31% of professional educators endorsed such a testing program (Elam, 1987). These data reflect substantial and ongoing public support for testing formats and uses that were virtually unknown and untried in the United States until around 1980. Certainly little clear empirical evidence of their effectiveness in increasing standards and achievement in schools has been produced, and their benefits are debated (Airasian, 1987b; Madaus, 1988; McDill, Natriello, & Pallas, 1985). Yet they are endorsed heartily by the public at large. The question is, Why?

Tests as Symbols

One answer to this question is that state-mandated, high-stakes achievement tests possess potent symbolic appeal. Certainly the concept of a test is meaningful to most Americans; it is in the realm of their experience and provides a contextual reference capable of conjuring up many and rich mental images. Although each individual may attach idiosyncratic meaning and images to the concept of a test, there are also more general, public meanings and images that reflect commonalities across individuals. Rafferty (1985) examined the significance of tests and examinations in society by surveying the manner in which they are treated in nontechnical writings such as biographies, fiction, poetry, and drama. Rafferty has shown both the permeation of the concept of tests and examinations in biography and fiction and the commonalities tests and examinations are perceived to symbolize (e.g., fear of failure, path to social mobility, maintenance of standards). Moreover, the word test is not affectively neutral and most often evokes images of an experience that was unpleasant at best and anxiety-provoking and obstructive at worst.

State-mandated, high-stakes tests possess three generally distinct, though not mutually exclusive, types of symbolic appeal. First, they symbolize order and control in a system where autonomous local control is perceived to be weak. Second, they symbolize, in the constructs they are perceived to measure, important educational outcomes. Third, they symbolize a distinct value or moral outlook that the public wishes to see reflected in its schools.

Tests as symbols of order and control. In most communities and in the nation at large, support for schools is conditioned by the answer to the question: How are our schools doing? The answer sought most often is rooted in standardized test results because, wisely or not, the public is no longer willing to accept testimonials about the status of education from teachers and school administrators. The credibility of school-based assessment has diminished in the face of pessimistic news about educational standards
and pupil achievements. In addition, there is a sense that we have lost control of education in our local schools and that diversity, competing interests, and politicization are the order of the day in the public schools (Airasian, 1987b; Kirst, 1984). Greene (1987) has summed up this sense and its resulting desire very well:

    Even as we become increasingly aware of the great diversity of nations and cultures represented in our schools, of the unpredictable and the uncontrollable factors to be taken into account, we are afflicted more and more by arguments against relativism and multiplicity. At once, we are bombarded with demands for a "common core" at least of information, for categorical agreements that will overcome what strikes people as cognitive as well as moral chaos. Feeling uncertainties and fragmentations all around, they ask implicitly for effective systematizations, for what is "logically basic." They want promises and prescriptions for the children; many want single-track highways with one-way signs leading unswervingly to some kind of success. (pp. 6-7)

Coupled with increased social acceptance of testing to insure industrial product safety and standards, these perceptions provide powerful motives for state-mandated, high-stakes testing programs.

Most Americans trust and desire external tests. The latest Gallup Poll of Education (Gallup & Clark, 1987) indicates that 70% of the adults surveyed support reporting achievement test results publicly on a state by state and community by community basis so that comparisons can be made among schools. Only 14% of those polled opposed public reporting for comparison purposes. About 70% of the people polled by Gallup also thought that the local schools should not be given sole responsibility for policing their compliance with state educational standards. The public is seeking some external assurance of the quality of the teaching and learning process, and it turns to standardized tests as the one index it believes can provide the evidence it seeks. State-mandated, high-stakes standardized tests are being used to restore public trust in the educational credentialing system because, at least for the present, they have more legitimizing power than locally derived tests and testimonials.

In this regard, tests are perceived as symbols of order and control in the midst of diversity, chaos, competition for dwindling school resources, and the need for external verification of educational standards. State-mandated, high-stakes tests are perceived to be scientific because they produce a numerical score; fair because they are standardized across all pupils and school districts; and objective because preestablished cut scores and decision rules remove local teachers', parents' or principals' influence in educational decisionmaking. There is a sense that human judgment and biases are being removed from the assessment process by externally mandated high-stakes tests, with a resulting increase in control, fairness, and objectivity. External tests reinforce a sense of the commonness of the common schools and the perception that the distribution of educational resources and credentials and the social benefits that are believed to flow from them (i.e., higher salaries, greater prestige, and better jobs) is based on merit and performance, not whim and political clout.

To provide a perspective on this symbolism, let us recognize two important characteristics of state-mandated, high-stakes tests. First, as tests, they are fairly trivial instruments in and of themselves. This is not to say they are trivial either in their perceived import or actual effect, but it is to point out that most tests measure only a small portion of student behavior vis-à-vis some content domain. Important decisions often are based on a one-shot, 30- or 45-item multiple choice test. Certainly there are better and often more valid ways to measure behavior and to make important decisions about pupils.

Second, the results of state-mandated, high-stakes tests rarely tell us things we don't already know about pupils, especially since the decisions we require from such tests pertain to minimum competence. Although correlations between teacher judgments of pupils' achievement and pupils' measured achievement are typically in the .6 to .7
range (Brophy & Good, 1974), such studies are based on asking teachers to discriminate among pupils on a five- or six-point rating scale. High-stakes testing programs do not require such fine distinctions among pupils, but instead seek to differentiate the most extreme 10 to 15% of poor learners from their peers. Teachers can make such differentiations; they know which pupils are academically unprepared to be promoted, graduated, certified, and so on. Although tests do not, of course, produce information that is perfectly isomorphic with teacher perception, there is substantial agreement between the two sources of information (Kellaghan, Madaus, & Airasian, 1982), especially for the type of differentiation sought from state-mandated, high-stakes tests. The tests supply largely redundant information, but in a manner that is perceived to be more objective and fair than school-based judgments.

If state-mandated, high-stakes tests are one-shot, small sample measures of educational attainments and if they provide little new information about test takers, something apart from the tests themselves and the information they provide makes them desired by the public. That something is what the state-mandated, high-stakes tests are perceived to represent: external control, order, common standards, unbiased judgments, and sanctions for poor performance. The tests are symbols of these desired characteristics.

Tests as symbols of important educational constructs. The language associated with an innovation is important in helping it achieve social consensus and acceptance. Language sets the perceptions of and expectations for the innovation. State-mandated, high-stakes testing programs are no exception, and typically are encased in language that is intended to strike a responsive chord with the public. The rhetoric of such testing programs heavily relies on constructs such as competence, survival skills, functional literacy, excellence, academic standards, basic skills, preparation for life, mastery, and more recently, cultural literacy. Words like these command attention and support. The rhetoric makes it important to insure that pupils have attained these desired outcomes of schooling.

It is hard to be against competence, standards, or functional literacy, and few politicians or school board members would be likely to state in public that they are not in favor of such constructs. What, after all, is someone who is against competence for: incompetence? The language of state-mandated, high-stakes testing programs helps crystalize public acceptance because words like competence and standards conjure up symbolically potent images.

Too often, however, the symbolic richness of the language used to characterize an innovation confers a status on the innovation's particular methods and procedures that goes far beyond its inherent worth or meaning. In state-mandated, high-stakes testing programs, there is often a confusion between what the high-stakes tests are actually measuring and what they are assumed to be measuring by people who are reacting to the symbolic richness of the language used to characterize the testing program. For most people a so-called "Functional Literacy Test" or "Survival Skills Test" symbolizes an array of competencies and proficiencies that goes far beyond the 25 vocabulary and 25 basic math items that may actually compose the test. When we name or describe the essence of an innovation with constructs that are socially desirable and complex, as it is often politically beneficial to do, the terminology we use makes the innovation itself a symbol of the constructs.

Consider the language used by the California State Legislature in 1980, when it passed State Bill 3408 requiring junior and senior high schools to adopt standards of proficiency in basic skills, and tied high school graduation to these standards: "Pupils attending the public schools in California [should] acquire the knowledge, skills, and confidence required to function effectively in contemporary society." Such language is not unique to California; similar language can be found in bills of all states. At first blush it is difficult to find fault with such an intent. It holds out the promise of knowledgeable, confident pupils successfully

308
State-Mandated Testing
carrying out their chosen social roles. Envision the characteristics of a pupil who fulfills this expectation.

Consider more fully the language of the law. What, for example, is the distinction between knowledge and skills? Is knowledge a familiarity with basic facts, such as the multiplication tables or rules of punctuation, while skills are applications of these facts to real-world situations like filling out a tax form or writing a coherent business letter? And if this distinction is the intended one, are we confident that schools can or should teach such applications to all pupils? More to the point, however, is the language used in the latter portion of the law. Can schools teach confidence? What exactly is confidence? How much is too much and how much is too little? What of the pupil who demonstrates cognitive proficiency but lacks "confidence"?

Finally, what skills and knowledge are required to function effectively in contemporary society? Is there a universal norm of effective functioning that can be applied to all people in all contexts? Moreover, in a society that is changing constantly, is it sufficient and desirable to set one's goals on functioning effectively in contemporary society? Contemporary society will not be contemporary in 10 years, and the skills, knowledge, and confidence required to function today may be quite different from what is required in 10 years.

In lieu of such an analysis, the terminology used to describe this early version of a state-mandated, high-stakes testing program is seductive: "knowledge, skills, and confidence to function effectively in contemporary society." Regardless of the particular domains that finally are measured by the test used to determine whether or not a student will be granted a diploma in California, performance on that test will symbolize for most Californians the degree to which a pupil possesses the "knowledge, skills, and confidence to function effectively in contemporary society." It goes without saying that the test will not assess such functioning at a level that captures the richness and complexity most people will read into the results of the test. What actually is measured is confused with what the language of the testing program suggests is being measured. Often the public doesn't even know the nature of the content being tested. It knows only that a functional literacy test or a minimum competency test will be administered, and this seems to be enough to convince it that the test is desirable.

In 1966, Coleman et al. published Equality of Educational Opportunity, in which they concluded that "school brings little influence to bear on a child's achievement that is independent of his background and general social context" (p. 325). The common interpretation of this finding was that schools don't make a difference, and an enormous amount of debate over the ineffectiveness of schools ensued from this interpretation (Madaus, Airasian, & Kellaghan, 1980). Although Coleman et al. used a variety of outcome measures to assess pupil performance, the measure used in the main analyses was a single measure of general verbal ability (Brimer et al., 1978; Jencks, 1972). School outcomes were assessed by a general verbal ability test, which came to symbolize for the American public the broad domain of learnings it associated with school achievement. Little wonder that the conclusion that schools don't make a difference in pupils' achievement created debate and soul searching. In interpreting the results of the Coleman study, the construct verbal ability, which described what actually was measured, became confused with the construct school achievement, and a status and interpretation that was not warranted was conferred on the verbal ability test.

Tests, and more particularly, the scores pupils get on tests, make concrete abstract terms like competence, functional literacy, excellence, and school achievement. The test score becomes a symbol of the broader construct it was designed to measure. So, the names we give tests and the constructs that they are intended to assess serve as powerful and persuasive social symbols for state-mandated, high-stakes testing programs.

Tests as moral symbols. In addition to symbolizing order, control, and desired
school achievements, state-mandated, high-stakes tests also symbolize a set of more basic social values. Rafferty (1985) has shown that in nontechnical writings there is a commonality of reaction to tests and examinations. This reaction centers on testing as a symbol of traditional social and educational values such as hard work, reward for effort, and punishment for the lazy (Airasian, 1987c; Eraut, 1981). Most of the public prizes these traditional values and would like to see more of them reflected in its schools. The more these values are reflected in school practices and programs, the more the rewards of education (e.g., diplomas, honors, etc.) are thought to be worth. Tests, particularly state-mandated, high-stakes tests with their emphasis on common standards and sanctions for poor performance, reinforce the impression that the emphasis in the schools is on earning credentials by demonstrating good performance. In a sense, the tests revalue a debased educational currency, the high school diploma, teaching certificate, etc. The use of high-stakes tests to allocate educational benefits and credentials harkens back to an earlier time (real or imagined) in which people attained high school diplomas, grade to grade promotions, and teacher certifications in the old-fashioned way: They earned them (Armbruster, 1977).

But state-mandated, high-stakes tests are perceived not only as a measure of hard work and diligence in learning, they also are perceived to be motivators of hard work. The results of the most recent Gallup Poll of Education (Gallup & Clark, 1987) shed light on this characteristic of tests. A national sample of adults was asked, "It is now being proposed that educational achievement test results be reported on a state-by-state basis, so that comparisons can be made between schools of similar size and racial and economic make-up. Do you favor or oppose this proposal?" Given the preceding discussion, it should not be surprising that 70% of the sample favored the proposal, and only 14% opposed it.

Adults were then asked a second question: "Let's assume that the students in the local public schools received higher test results than students in comparable schools elsewhere. Do you think that this would serve as an incentive for the local schools to do an even better job or not?" Seventy-two percent said yes, higher test performance than comparable schools would be an incentive to do an even better job. Only 12% said no, higher performance would not be an incentive to do even better.

The final question posed to adults was the reverse situation to the second question: "Let's assume that the students in the public schools in this community received lower test scores than students in comparable schools elsewhere. Do you think this would encourage the local schools to try to do a better job or not?" Again, 72% said yes, lower test performance than comparable schools would be an incentive to try to do a better job. As with question 2, 12% said no, lower test scores would not be an incentive to try to do better.

Regardless of whether the results of a testing program showed local school performance to be higher or lower than comparable neighboring schools, over 70% of the adults polled said test results would be an incentive for their local school to improve. People appear to perceive some value inherent in the test or testing process that is independent of the results attained on the test. Tests are valued for what they stand for, not for themselves. Tests motivate people to strive to do better, regardless of their current level of performance, and this is desired. The mere presence of tests in schools provides psychic satisfaction to the patrons of high standards, reward for hard work, and punishment for laziness. Black (1982) echoes a common refrain that "if the student must face an accounting (in the form of a test) of how well he or she has mastered the subject, learning will take place. Whether as a competitive exercise or in isolation, standardized testing offers the carrot (the satisfaction of doing well) and the stick (the disappointment of doing badly) both at once" (p. 245).

How much stronger is the belief in this accounting when the standardized test is a
state-mandated, high-stakes test with real consequences for test takers?

If this line of reasoning is correct, it is likely that the particular content domain assessed by the tests may be less important than the fact that a test of some kind is being administered and rewards or punishments are doled out on the basis of performance. The test becomes a surrogate or symbol for the values people perceive in the act of testing: hard work, nose to the grindstone, trial by fire, and so on. The situation is somewhat analogous to the issue of school prayer. Few would argue strongly that daily recitation of the Twenty-third Psalm or the observance of a moment of silence at the start of the school day, in and of itself, will have a major impact on pupils' moral and ethical development or demeanor. That is not the point. For its advocates, the recitation of the Twenty-third Psalm or a moment of silence symbolizes a host of religious and moral values that it is reassuring to believe are present in the school environment to which the young are sent to learn and be nurtured. In the same manner, tests represent broad social and educational values that the public wants in the school environment.

Conclusion

As educational reform mechanisms, state-mandated, high-stakes testing programs may not be as powerful as many expect. The jury is still out on the effects of such programs. There is little question, however, that the testing programs are powerful symbolically; they strike a responsive chord in the public at large and this response helps explain the widespread and speedy adoption of an innovation that had virtually no track record in American education before about 1979. In state-mandated, high-stakes testing programs, tests symbolize order and control, a focus on important educational outcomes, and basic, traditional moral values. These values are endorsed by the extant culture and this endorsement provided the legitimization necessary to justify their widespread adoption in the name of educational reform. Given a different extant culture, state-mandated, high-stakes testing programs might have been considerably less warmly endorsed.

Because of the public's current faith in tests and testing, it would not be surprising if the symbolic dimensions of tests produced some effects of their own, quite apart from the educational effects produced by the testing programs. Given their potent symbolism, an inherent danger in state-mandated, high-stakes tests is that their mere existence will convince the public that much is being done to remedy recognized educational problems, thereby directing attention away from other, needed remedies. The symbolic richness of high-stakes tests creates the danger that such tests can become a convenient, though poor, substitute for other necessary and meaningful actions to improve schools and learning. Also, as noted earlier, focus on the symbolism of the new testing programs directs attention away from needed consideration of the technical quality of the tests. The symbolic role of testing often makes it difficult to separate discussion of educational policy and standards from public relations.

A second consequence of the symbolic dimension of tests relates to the way that the public defines and measures educational excellence. Because so much is read into test scores by a public that confuses the construct label of the test with the more narrow set of items it contains, test scores and passing rates on high-stakes tests are becoming the educational equivalent to the Dow Jones stock average. People follow both the Dow Jones and test scores avidly in order to gauge the financial and educational progress of society.

Tests are socially valid, highly respected symbols of a broad range of administrative, academic, and moral virtues in our society. When tests have the added features of central control and sanctions for poor performance, we can expect that their symbolic importance will be heightened. Thus, regardless of the actual impact that state-mandated, high-stakes tests have on pupils, teachers, and the curriculum, they are likely to have an important perceptual impact on the public at large. This perceptual impact underlies
the social consensus or social validation that provides the legitimization of testing as a workable and desired reform strategy.

References

Airasian, P. W. (1983). Societal experimentation. In G. Madaus, M. Scriven, & D. Stufflebeam (Eds.), Evaluation models (pp. 163-176). Boston: Kluwer-Nijhoff.
Airasian, P. W. (1987a). State mandated testing and educational reform: Context and consequences. American Journal of Education, 95(3), 393-412.
Airasian, P. W. (1987b, November). State-mandated, high-stakes testing programs: Challenges to educators. Paper presented at the Association for Moral Education annual conference, Harvard University, Cambridge, MA.
Airasian, P. W. (1987c). The consequences of high school graduation testing programs. NASSP Bulletin, 71(496), 54-67.
Armbruster, F. F. (1977). Our children's crippled future. New York: Quadrangle Books.
Black, T. M. (1982). Straight talk about American education. New York: Harcourt, Brace, Jovanovich.
Brimer, A., Madaus, G. F., Chapman, B., Kellaghan, T., & Wood, R. (1978). Sources of difference in school achievement. Windsor, England: NFER.
Brophy, J. E., & Good, T. L. (1974). Teacher-student relationships. New York: Holt, Rinehart and Winston.
Callahan, R. E. (1962). Education and the cult of efficiency. Chicago: University of Chicago Press.
Campbell, D. T. (1969). Reforms as experiments. American Psychologist, 24, 409-429.
Carlson, D. C. (1987, April). Technical problems associated with high-aspiration examinations. Paper presented at the annual meeting of the National Council on Measurement in Education, Washington, DC.
Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: Office of Education, Department of Health, Education, and Welfare.
Cremin, L. A. (1961). The transformation of the school. New York: Alfred A. Knopf.
Cronbach, L. J. (1975). Five decades of public controversy over mental testing. American Psychologist, 30, 1-14.
Doyle, D. P., & Hartle, T. W. (1985). Excellence in education: The states take charge. Washington, DC: American Enterprise Institute.
Excellence: A 50-State Survey. (1985, February 6). Education Week, 11-30.
Elam, S. (1987). Differences between educators and the public on questions of education policy. Phi Delta Kappan, 69(4), 294-296.
Elmore, R. F., & McLaughlin, M. W. (1982). Strategic choice in federal education policy: The compliance-assistance trade off. In A. Lieberman & M. McLaughlin (Eds.), Policy making in education. Eighty-first yearbook of the National Society for the Study of Education, Part I (pp. 159-194). Chicago: University of Chicago Press.
Eraut, M. (1981). Accountability and evaluation. In B. Simon & W. Taylor (Eds.), Education in the eighties (pp. 145-162). London: Batsford Academic and Educational Limited.
Firth, R. (1973). Symbols public and private. Ithaca, NY: Cornell University Press.
Fullan, M. (1982). The meaning of educational change. New York: Teachers College Press.
Gallup, G. H. (1983). The 15th annual Gallup poll of the public's attitudes toward the public schools. Phi Delta Kappan, 65(1), 33-47.
Gallup, G. H. (1984). The 16th annual Gallup poll of the public's attitudes toward the public schools. Phi Delta Kappan, 66(1), 23-38.
Gallup, G. H. (1986). The 18th annual Gallup poll of the public's attitudes toward the public schools. Phi Delta Kappan, 68(1), 43-60.
Gallup, A. M., & Clark, D. L. (1987). The 19th annual Gallup poll of the public's attitudes toward the public schools. Phi Delta Kappan, 69(1), 17-30.
Greene, M. (1987, October). Quality in teacher education. Paper delivered at the Northeast Holmes Group Fall, 1987 Conference, Boston, MA.
Hertzog, J. T., & Lehr, S. T. (1987, April). Content determination for honors tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Washington, DC.
Jencks, C. S. (1972). The quality of the data collected by The equality of educational opportunity survey. In F. Mosteller & D. P. Moynihan (Eds.), On equality of educational opportunity (pp. 437-512). New York: Vintage Books.
Kellaghan, T., Madaus, G. F., & Airasian, P. W. (1982). The effects of standardized testing. Boston: Kluwer-Nijhoff.
Kirst, M. W. (1984). The changing balance of state and local power to control education. Phi Delta Kappan, 66(3), 189-191.
Lehmann, I. J., & Phillips, S. S. (1987). A survey of state teacher-competency examination programs. Educational Measurement: Issues and Practice, 6(1), 14-18.
Lindblom, C. E., & Cohen, D. K. (1979). Usable knowledge. New Haven: Yale University Press.
Loemker, L. E. (1962). Symbol and myth in philosophy. In T. Altizer, W. A. Beardslee, & J. H. Young (Eds.), Truth, myth, and symbol (pp. 109-127). Englewood Cliffs, NJ: Prentice-Hall.
Madaus, G. F. (1985). Testing as an administrative mechanism in educational policy: What does the future hold? Educational Horizons, 63(5), 34-39.
Madaus, G. F. (1988). The influence of testing on the curriculum. In L. Tanner (Ed.), Critical issues in curriculum. The 87th yearbook of the National Society for the Study of Education, part I (pp. 83-121). Chicago: University of Chicago Press.
Madaus, G. F., & Airasian, P. W. (1977). Issues in evaluating student outcomes in competency-based graduation programs. Journal of Research and Development in Education, 10(3), 79-91.
Madaus, G. F., Airasian, P. W., & Kellaghan, T. (1980). School effectiveness: A reassessment of the evidence. New York: McGraw-Hill.
McDill, E. L., Natriello, G., & Pallas, A. M. (1985). Raising standards and retaining students: The impact of the reform recommendations on potential dropouts. Review of Educational Research, 55, 415-433.
Moynihan, D. P. (1970). Maximum feasible misunderstanding. New York: Free Press.
National Commission on Excellence in Education. (1983). A nation at risk. Washington, DC: U.S. Government Printing Office.
Parsons, T. (1956). Suggestions for a sociological approach to the theory of organization-1. Administrative Science Quarterly, 1, 63-85.
Peirce, C. S. (1940). The philosophy of Peirce; selected writings. New York: Harcourt Brace.
Popham, W. J. (1987). The merits of measurement-driven instruction. Phi Delta Kappan, 68(9), 679-682.
Rafferty, M. (1985, April). Examinations in literature: Perceptions from non-technical writers of England and Ireland from 1850 to 1984. Unpublished doctoral dissertation, Boston College.
Rowan, B. (1982). Organizational structure and the institutional environment: The case of the public schools. Administrative Science Quarterly, 27, 259-279.
Rudner, L. M. (Ed.). (1987). What's happening in teacher testing. Washington, DC: Office of Educational Research and Improvement, U.S. Department of Education.
Suchman, E. A. (1967). Evaluative research. New York: Russell Sage Foundation.
Timpane, M. (1976). Into the maw: The uses of policy in Washington. Phi Delta Kappan, 56(2), 177-178.
Whitehead, A. N. (1927). Symbolism, its meaning and effect. New York: Macmillan.
Wise, A. E. (1979). Legislated learning. Berkeley and Los Angeles: University of California Press.

Author

PETER W. AIRASIAN, Professor and Chair of Educational Foundations Division, School of Education, McGuinn 600, Boston College, Chestnut Hill, MA 02167. Specialization: testing and public policy.
