
Washback or backwash

Washback or backwash, a term now commonly used in applied linguistics, refers to the influence of testing on teaching and learning (Alderson & Wall, 1993), and has become an increasingly prevalent and prominent phenomenon in education: "what is assessed becomes what is valued, which becomes what is taught" (McEwen, 1995a, p. 42). There seem to be at least two major types or areas of washback or backwash studies: those relating to traditional, multiple-choice, large-scale tests, which are perceived to have had mainly negative influences on the quality of teaching and learning (Madaus & Kellaghan, 1992; Nolan, Haladyna, & Haas, 1992; Shepard, 1990), and those where a specific test or examination has been modified and improved upon (e.g., performance-based assessment) in order to exert a positive influence on teaching and learning (Linn & Herman, 1997; Sanders & Horn, 1995). Studies of the second type have shown, however, positive, negative, or no influence on teaching and learning. Furthermore, many of those studies have turned to focus on understanding the mechanism of how washback or backwash is used to change teaching and learning.

Washback (Alderson & Wall, 1993) or backwash (Biggs, 1995, 1996) here refers to the influence of testing on teaching and learning. The concept is rooted in the notion that tests or examinations can and should drive teaching, and hence learning, and is also referred to as measurement-driven instruction (Popham, 1987). Wall (1997) distinguished between test impact and test washback in terms of the scope of the effects. According to Wall, impact refers to ". . . any of the effects that a test may have on individuals, policies or practices, within the classroom, the school, the educational system or society as a whole" (see Stecher, Chun, & Barron, chap. 4, this volume), whereas washback (or backwash) is defined as "the effects of tests on teaching and learning" (Wall, 1997, p. 291).
Messick (1996) defined washback as the extent to which a test influences language teachers and learners to do things they would not necessarily otherwise do that promote or inhibit [emphasis added] language learning (p. 241, as cited in Alderson & Wall, 1993, p. 117). Wall and Alderson also noted that tests can be powerful determiners, both positively and negatively, [...]

According to Messick (1996), for optimal positive washback there should be little, if any, difference between activities involved in learning the language and activities involved in preparing for the test (pp. 241-242). However, the lack of simple, one-to-one relationships in such complex systems was highlighted by Messick (1996): "A poor test may be associated with positive effects and a good test with negative effects because of other things that are done or not done in the education system" (p. 242). In terms of complexity and validity, Alderson and Wall (1993) argued that "Washback is likely to be a complex phenomenon which cannot be related directly to a test's validity" (p. 116). The washback effect should, therefore, refer to the effects of the test itself on aspects of teaching and learning.

Negative Washback

Tests in general, and perhaps language tests in particular, are often criticized for their negative influence on teaching, so-called negative washback, which has long been identified as a potential problem.

Positive Washback

As in most areas of language testing, for each argument in favor of or opposed to a particular position, there is a counterargument. There are, then, researchers who strongly believe that it is feasible and desirable to bring about beneficial changes in teaching by changing examinations. This is the positive washback scenario, which is closely related to measurement-driven instruction in general education. In this case, teachers and learners have a positive attitude toward the examination or test, and work willingly and collaboratively toward its objectives.

WASHBACK: FUNCTIONS AND MECHANISMS

Traditionally, tests have come at the end of the teaching and learning process for evaluative purposes. However, with the widespread expansion and proliferation of high-stakes public examination systems, the direction seems to have been largely reversed: testing can come first in the teaching and learning process. Particularly when tests are used as levers for change, new materials need to be designed to match the purposes of a new test, and school administrative and management staff, teachers, and students are generally required to learn to work in alternative ways, and often to work harder, to achieve high scores on the test.
In addition to these changes, many more changes in the teaching and learning context can occur as the result of a new test, although the consequences and effects may be independent of the original intentions of the test designers, due to the complex interplay of forces and factors both within and beyond the school. Such influences were linked to test validity by Shohamy (1993a), who pointed out that the need to include aspects of test use in construct validation originates in the fact that testing is not an isolated event; rather, it is connected to a whole set of variables that interact in the educational process (p. 2). Similarly, Linn (1992) encouraged the measurement research community to make the case that the introduction of any new high-stakes examination system should pay greater attention to investigations of both the intended and unintended consequences of the system than was typical of previous test-based reform efforts (p. 29).

As a result of this complexity, Messick (1989) recommended a unified validity concept, which requires that when an assessment model is designed to make inferences about a certain construct, the inferences drawn from that model should derive not only from test score interpretation, but also from other variables operating within the social context (Bracey, 1989; Cooley, 1991; Cronbach, 1988; Gardner, 1992; Gifford & O'Connor, 1992; Linn, Baker, & Dunbar, 1991; Messick, 1992). The importance of collaboration was also highlighted by Messick (1975): "Researchers, other educators, and policy makers must work together to develop means of evaluating educational effectiveness that accurately represent a school or district's progress toward a broad range of important educational goals" (p. 956).

The Trichotomy Backwash Model

(a) Participants: students, classroom teachers, administrators, materials developers and publishers, whose perceptions and attitudes toward their work may be affected by a test
(b) Processes: any actions taken by the participants which may contribute to the process of learning
(c) Products: what is learned (facts, skills, etc.) and the quality of the learning

Note. Adapted from Hughes, 1993, p. 2. Cited in Bailey (1996).

WASHBACK: THE CURRENT TRENDS IN ASSESSMENT

One of the main functions of assessment is generally believed to be to serve as a form of leverage for educational change, which has often led to top-down educational reform strategies that employ better kinds of assessment practices (James, 2000; Linn, 2000; Noble & Smith, 1994a). Assessment practices are currently undergoing a major paradigm shift in many parts of the world, which can be described as a reaction to the perceived shortcomings of the prevailing paradigm, with its emphasis on standardized testing (Biggs, 1992, 1996; Genesee, 1994). Alternative or authentic assessment methods have thus emerged as systematic attempts to measure learners' abilities to use previously acquired knowledge in solving novel problems or completing specific tasks, as part of this use of assessment to reform curriculum and improve instruction at the school and classroom level (Linn, 1983, 1992; Lock, 2001; Noble & Smith, 1994a, 1994b; Popham, 1983).

Therefore, I begin with an outline of the complexity of the phenomenon called washback.

(a) Dimensions

Watanabe (1997b) conceptualized washback on the following dimensions, each of which represents one of the various aspects of its nature.

Specificity. Washback may be general or specific. General washback means a type of effect that may be produced by any test. For example, if there is a hypothesis that a test motivates students to study harder than they would otherwise, washback here relates to any type of exam and is hence general washback. Specific washback, on the other hand, refers to a type of washback that relates to only one specific aspect of a test or one specific test type. An example is the belief that if a listening component is included in the test, students and teachers will emphasize this aspect in their learning or teaching.

Intensity. Washback may be strong or weak. If the test has a strong effect, then it will determine everything that happens in the classroom, and lead all teachers to teach in the same way toward the exams. On the other hand, if a test has a weak effect, then it will affect only a part of the classroom events, or only some teachers and students, but not others. If the examination produces an effect only on some teachers, it is likely that the effect is mediated by certain teacher factors. The research to date indicates the presence of washback toward the weak end of the continuum. It has also been suggested that the intensity of washback may be a function of how high or low the stakes are (Cheng, 1998a).

Length. The influence of exams, if it is found to exist, may last for a short period of time or for a long time. For instance, if the influence of an entrance examination is present only while the test takers are preparing for the test, and the influence disappears after they enter the institution, this is short-term washback.
However, if the influence of entrance exams on students continues after they enter the institution, this is long-term washback.

Intentionality. Messick (1989) implied that there is unintended as well as intended washback when he wrote, "Judging validity in terms of whether a test does the job it is employed to do . . . requires evaluation of the intended or unintended social consequences of test interpretation and use. The appropriateness of the intended testing purpose and the possible occurrence of unintended outcomes and side effects are the major issues" (p. 84). McNamara (1996) holds a similar view, stating that "High priority needs to be given to the collection of evidence about the intended and unintended effects of assessments on the ways teachers and students spend their time and think about the goals of education" (p. 22). The researcher has to investigate not only intended washback but also unintended washback.

Value. Examination washback may be positive or negative. Because it is not conceivable that test writers intend to cause negative washback, intended washback may normally be associated with positive washback, while unintended washback may be either negative or positive. When it comes to the issue of value judgment, washback research may be regarded as a part of evaluation studies. The distinction between positive and negative can usefully be made only by referring to the audience. In other words, researchers need to be ready to answer the question of who the evaluation is for (Alderson, 1992). For example, one type of outcome may be evaluated as positive by teachers, whereas the same outcome may be judged negative by school principals. Thus, it is important to identify the evaluator when it comes to passing value judgment (see also chap. 1, this volume).