
The Journal of Special Education
Volume 41, Number 4, Winter 2008, 254-270
© 2008 Hammill Institute on Disabilities
DOI: 10.1177/0022466907310366
http://journalofspecialeducation.sagepub.com hosted at http://online.sagepub.com

The Predictive Validity of Dynamic Assessment: A Review

Erin Caffrey, Douglas Fuchs, Lynn S. Fuchs
Peabody College of Vanderbilt University

Authors' Note: Address correspondence to Douglas Fuchs, Peabody #328, 230 Appleton Place, Nashville, TN 37203; e-mail: doug.fuchs@vanderbilt.edu.

The authors report on a mixed-methods review of 24 studies that explores the predictive validity of dynamic assessment (DA). For 15 of the studies, they conducted quantitative analyses using Pearson's correlation coefficients. They descriptively examined the remaining studies to determine if their results were consistent with findings from the group of 15. The authors implemented analyses in five phases: They compared the predictive validity of traditional tests and DA, compared two forms of DA, examined the predictive validity of DA by student population, investigated various outcome measures to determine whether they mediate DA's predictive validity, and assessed the value added of DA over traditional testing. Results indicated superior predictive validity for DA when feedback is not contingent on student response, when applied to students with disabilities rather than at-risk or typically achieving students, and when independent DA and criterion-referenced tests were used as outcomes instead of norm-referenced tests and teacher judgment.

Keywords: dynamic assessment; interactive assessment; predictive validity; disabilities

The purposes of educational assessment are to evaluate current school achievement, predict future achievement, and prescribe educational treatments. Traditional one-point-in-time assessments or pretest–posttest assessments have been used to accomplish these aims because the testing is standardized, easily administered, and norm referenced. Traditional tests tend to produce clear-cut results that are used to evaluate, identify, and classify children. Nevertheless, these tests have been criticized for underestimating general ability (Swanson & Lussier, 2001) and lacking sensitivity toward both so-called disadvantaged students (e.g., Peña, Quinn, & Iglesias, 1992; Utley, Haywood, & Masters, 1992) and students with disabilities (Lidz, 1987). The scores of low-achieving students on traditional tests are often difficult to interpret because of floor effects. That is, many unskilled kindergartners and first graders, when given a traditional reading test such as the Word Identification and Word Attack subtests of the Woodcock Reading Mastery Tests–Revised, obtain a score of zero. Is such a score indicative of an unskilled reader not yet ready to acquire beginning reading skills, or does it signal a currently unskilled reader ready to learn after pertinent instruction? Dynamic assessment (DA), an alternative to traditional testing, may be capable of distinguishing between these two types of nonreaders.

An Alternative to Traditional Testing

DA has been variously described as learning potential assessment (e.g., Budoff, Gimon, & Corman, 1976; Budoff, Meskin, & Harrison, 1971), mediated learning (e.g., Feuerstein, Rand, & Hoffman, 1979), testing the limits (Carlson & Wiedl, 1978, 1979), mediated assessment (e.g., Bransford, Delclos, Vye, Burns, & Hasselbring, 1987), and assisted learning and transfer by graduated prompts (e.g., Campione, Brown, Ferrara, Jones, & Steinberg, 1985). Across its variants, DA differs from traditional testing in terms of the nature of the examiner–student relationship, the content of feedback, and the emphasis on process rather than on product (Grigorenko & Sternberg, 1998). In traditional testing, the examiner is a neutral or "objective" participant who provides standardized directions and does not typically provide performance-contingent feedback. The DA examiner, by contrast, not only gives performance-contingent feedback but offers instruction in response to student failure to alter or enhance the student's achievement. Put differently, traditional testing is oriented toward the product of student learning (i.e., level of performance), whereas the DA examiner's interest is both in the product and in the process (i.e., rate of growth) of student learning.

Some researchers claim that DA's twin focus on the level and rate of learning makes it a better predictor of future learning. Consider the child who enters kindergarten with little background knowledge. She scores poorly on traditional tests, but during DA, she demonstrates intelligence, maturity, attention, and motivation, and she learns a task—or a series of tasks—with relatively little guidance from the examiner. Because of this, and in spite of her performance on traditional tests, she is seen as in less danger of school failure than her classmates who score poorly on both traditional tests and DA.

So, DA may correctly identify children who seem at risk for school failure but who, with timely instruction, may respond relatively quickly and perform within acceptable limits. Data from DA may also identify the type and intensity of intervention necessary for academic success. DA incorporates a test-teach-test format, conceptually similar to responsiveness-to-intervention techniques. However, as we will discuss later, DA can potentially measure one's responsiveness within a much shorter time frame.

Clinically Oriented DA Versus Research-Oriented DA

Over time, DA has evolved into two branches of study: clinically oriented DA and research-oriented DA. Clinically oriented DA began as an educational treatment to remediate cognitive deficiencies pre- sumed to cause learning problems. Its most well- known operationalization is Feuerstein’s Learning Potential Assessment Device (LPAD). The LPAD is a nonstandardized method of assessing and treating the cognitive deficiencies of children with learning prob- lems. Treatment duration can last many years (Rand, Tannenbaum, & Feuerstein, 1979). Research-oriented

DA, by contrast, originated as an assessment tool. It typically involves a standardized assessment during which the examiner guides a student’s learning in a single session. The time required for the student to reach mastery, or the necessary level of instructional explicitness to advance the student, serves as an index of the student’s learning potential. Researchers and practitioners have used this form of DA to identify students who may require more intensive intervention and to place them in settings where such interven- tions can be implemented. Three concerns are typically expressed about DA:

Namely, its construct is fuzzy, its technical character- istics are largely unknown, and its administration and scoring are labor intensive. First, construct fuzziness (Jitendra & Kame’enui, 1993) refers to when DA’s theory, purpose, procedures, and uses are unclear. Fuzziness occurs, for example, when, at a most general level, researchers fail to distinguish for their audience between clinically oriented or research-oriented DA. As a second example, a major purpose of clinically oriented DA, as just indicated, is to remediate defi- cient cognitive processes that appear to contribute to learning problems. However, the procedures of clini- cally oriented DA are generally nonstandardized and require the examiner’s insight and expertise to assess learning problems and adapt intervention. Second, the extant literature does not typically report the reliability and validity of DA measures. This stems partly from a deliberate rejection of standardized pro- cedures by some researchers. Many advocates of clini- cally oriented DA believe standardization contradicts its spirit and theoretical orientation (e.g., Feuerstein, 1979). A standardized approach, they say, would fail to provide truly individualized intervention in response to student failure. Proponents of research-oriented DA, by contrast, believe standardization and technical adequacy are necessary to make it a worthwhile tool for research and practice (e.g., Bryant, Brown, & Campione, 1983; Ferrara, 1987; Swanson, 1994). These two views of standardization and DA are reflected in the nature of feedback offered during clinically oriented and research-oriented DA. In clin- ically oriented DA, examiners tend to frequently change how they teach to determine the type of inter- vention with which the student is most successful. In research-oriented DA, examiners typically change how much they teach and the level of explicitness of their teaching rather than the intervention. So, in essence, practitioners of clinically oriented DA use an ever- changing process to maximize student achievement,


whereas those using research-oriented DA attempt to assess student achievement in response to a more standardized intervention. Third, critics have suggested that the time required to develop protocols and train examiners may not be worth the information DA provides. Traditional tests already exist, and preparing examiners to use them is relatively straightforward. DA protocols have been around for decades, too, but because of inadequate information about their psychometric properties, more time may be needed to establish their validity and utility. Again, this criticism may be better understood by contrasting the two types of DA. Clinically oriented DA involves relatively little time to develop because scripted protocols are rarely developed. Insight and expertise are essential, and student responsiveness to instruction is relatively dependent on the specific educator providing the help. Conversely, research- oriented DA requires a laborious process of protocol development because the protocols must be standard- ized (and possibly norm based) on the target popula- tion. At the same time, the demand for practitioner insight and expertise is less. Because procedures are standardized, practitioners can be trained in about the time it takes to train examiners in traditional testing.

Is There a Need for DA?

Currently, few proponents of DA—including clini- cians and researchers—believe it is a viable alterna- tive to traditional testing. Rather, many would say DA should not replace traditional testing but should be used in conjunction with it (e.g., Lidz, 1987). The question then becomes, “What unique information can DA provide?” First, DA may represent a less-biased measure of school achievement for certain student groups because it is less dependent on mainstream lan- guage skills and background experience (e.g., Peña et al., 1992; Sewell, 1979; Sewell & Severson, 1974). As we suggested earlier, it may be especially useful in recognizing readiness to learn among low-achieving students because, unlike many traditional tests, it does not suffer from floor effects. Also, items on most tradi- tional tests are scored right or wrong, reflecting an all- or-nothing perspective. DA, by contrast, gives multiple opportunities for success, and low-achieving students’ performances can be measured along a continuum of how easily they learn. Second, clinically oriented DA may inform instruc- tion so that educational interventions can be more readily designed (e.g., Feuerstein, 1979; Haywood, 1992). Third, research-oriented DA has the potential

to predict future student achievement because it attem- pts to measure both level of performance and rate of growth. Presumably, those who learn with greater ease during DA will benefit more from classroom instruc- tion and achieve at a higher level.

Purpose of Review

Several extensive reviews of DA are available in the literature. Grigorenko and Sternberg (1998) offer a comprehensive descriptive review of types of DA based on their comparative informativeness, power of prediction, degree of efficiency, and robustness of results. Yet, no quantitative syntheses were con- ducted, and DA’s predictive validity was not system- atically analyzed by type of feedback, population, or achievement criterion. Swanson and Lussier (2001) conducted a selective quantitative synthesis of DA. They used effect sizes and mixed regression analyses to model responsiveness to DA and found that the magnitude of effects was best predicted by type of DA and domain. Whereas they focused on differences between ability groups and effectiveness of various types of DA, they did not pursue issues of validity. This review focuses on the predictive validity of DA. Prediction of future achievement is important because it can identify students at risk for school failure and in need of more intensive intervention. Students enter school with different cognitive strengths and weaknesses, different home and community experi- ences and expectations, and different levels of prior education. These capacities, experiences, and expecta- tions result in various levels of academic competence and readiness. Traditional testing has been criticized for its limited ability to estimate accurately a student’s potential for change. It is possible that DA, in con- junction with traditional testing, may provide a more accurate estimate of a student’s potential for change and likelihood of school success and inform planning for appropriate instruction. As we write, there is strong interest in responsive- ness to intervention as a substitute for, or an adjunct to, traditional testing to identify at-risk and special- needs students. Most current thinking about and formal operationalizations of responsiveness to intervention require a 10-week to 30-week instructional period. DA may be viewed as a response-to-intervention (RTI) process, too. Its possible advantage is that instruc- tional responsiveness may be determined in a single assessment session. However, much still needs to be


understood in this regard about DA’s psychometric properties—and about the operationalizations of responsiveness to intervention (e.g., Fuchs & Fuchs, 2006; Fuchs, Fuchs, & Compton, 2004).

Method

Definitions

As we indicated earlier, no single definition of DA exists. In this review, DA refers to any procedure that examines the effects of deliberate, short-term, inter- vention-induced changes on student achievement, with the intention of measuring both the level and rate of learning. In addition, for purposes of our review, DA must provide corrective feedback and interven- tion in response to student failure. Whereas, as dis- cussed, DA is used for many purposes (e.g., to measure current achievement, to predict future achievement, to inform intervention), this review is concerned primarily with its predictive validity, that is, determining how well DA predicts future student achievement.
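To make the test-teach-test idea concrete, the following sketch shows one way a research-oriented DA session with standardized, graduated prompts might be scored. It is a minimal, hypothetical illustration: the item structure, the prompt sequence, and the prompt-count scoring rule are simplifications and are not drawn from any protocol reviewed here.

```python
# Hypothetical sketch of a research-oriented DA session scored by prompt count.
# Not a reviewed protocol; item content and scoring rule are illustrative only.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Item:
    question: str
    answer: str
    prompts: List[str]  # scripted hints, ordered from least to most explicit

def administer_item(item: Item, respond: Callable[[str, List[str]], str]) -> int:
    """Return how many prompts were needed before a correct response (0 = solved unaided)."""
    given: List[str] = []
    for prompts_used in range(len(item.prompts) + 1):
        if respond(item.question, given) == item.answer:
            return prompts_used
        if prompts_used < len(item.prompts):
            # Noncontingent feedback: the same scripted prompt sequence is
            # delivered regardless of which error the student made.
            given.append(item.prompts[prompts_used])
    return len(item.prompts) + 1  # not solved even with the most explicit help

def learning_potential_score(items: List[Item],
                             respond: Callable[[str, List[str]], str]) -> float:
    """Mean prompt count across items; lower means faster learning during the teach phase."""
    counts = [administer_item(item, respond) for item in items]
    return sum(counts) / len(counts)
```

The score captures the rate-of-learning idea described above: two students with identical unaided (pretest) performance can differ sharply in how much scripted help they need to reach mastery.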

Inclusion Criteria

Four criteria were used to select articles for this review. First, selected articles were published in English. Several relevant programs of DA research have been published in Russian (e.g., Ginzburg, 1981; Goncharova, 1990), German (e.g., Carlson & Wiedl, 1980; Guthke, 1977; Wiedl & Herrig, 1978), and Dutch (e.g., Hamers, Hessels, & Van Luit, 1991; Hamers & Ruijssenaars, 1984). A subset of these authors published their original studies in English (Hessels & Hamers, 1993; Meijer, 1993; Resing, 1993; Tissink, Hamers, & Van Luit, 1993), which are included in this review. If only secondary reports were available in English (e.g., Flammer & Schmid, 1982; Hamers & Ruijssenaars, 1984), the studies were excluded. Second, the articles selected included partici- pants enrolled in preschool through high school. A study by Shochet (1992), for example, was excluded for using South African college students. Third, arti- cles in the review included (a) students with high- incidence disabilities, (b) students at risk for school failure due to cultural or economic disadvantage, (c) second-language learners, or (d) normally achiev- ing students. Studies involving students with low- incidence disabilities, such as sensory impairments,

were not included (e.g., Dillon, 1979; Tellegen & Laros, 1993). Fourth, included articles described studies in which reported data could be used to examine DA’s predictive validity. Studies of only concurrent valid- ity or only construct validity were excluded (e.g., Campione et al., 1985). To examine predictive valid- ity, included studies compared students’ levels of per- formance on a DA measure to their levels of performance on an academic achievement measure at some point in the future or to their future educational identification or classification. Studies that opera- tionalized DA as an educational treatment were excluded (e.g., Feuerstein et al., 1981; Feuerstein, Rand, Hoffman, Hoffman, & Miller, 1979; Muttart, 1984; Rand et al., 1979; Savell, Twohig, & Rachford, 1986). In these studies, there were no data of a pre- dictive nature. Finally, those studies that operational- ized DA as different conditions of behavioral reinforcement (i.e., praise, candy, reproof) were excluded (e.g., Kratochwill & Severson, 1977).

Search Procedure

ERIC, PsycInfo, and ECER were searched for the terms dynamic assessment or interactive assessment or learning potential or mediated assessment. From this search, the first author identified the major contributors to the study of DA (e.g., Feuerstein & Budoff) and discovered a special issue of The Journal of Special Education devoted to the topic. In his introduction to this special issue, Haywood (1992) identified the indi- viduals who engaged in the groundbreaking research in the DA field: Feuerstein, Rand, and Hoffman (1979); Feuerstein, Haywood, Rand, Hoffman, and Jensen (1986); Haywood and Tzuriel (1992), and Lidz (1987, 1991). Articles by these authors were searched for potential studies of predictive validity. In addition, two comprehensive reviews by Grigorenko and Sternberg (1998) and Swanson and Lussier (2001) were con- sulted. From these resources, articles were collected that were described as addressing the validity of DA or that had titles indicating that the validity of DA was addressed. Finally, a second search was conducted of ERIC, PsycInfo, and ECER with the terms dynamic assessment or interactive assessment or learning potential or mediated learning and predictive validity to ensure that the collected studies represented most of what was available. A total of 24 studies were identi- fied during the search. (In the References section, these studies are indicated by an asterisk.)


Analysis

The data in the 24 studies were analyzed with regard to four dimensions. First, we compared traditional testing and DA with respect to the magnitude of their respective correlation coefficients with an achievement criterion. Second, two forms of DA (with contingent feedback and with noncontingent feedback) were compared. Contingent feedback refers to a process by which an examiner responds to student failure with highly individualized, nonstandardized intervention. Noncontingent feedback requires an examiner to respond to student failure with standardized interventions, regardless of the error, or errors, committed. Type of feedback was analyzed because, arguably, it speaks to the nature of classroom instruction. In classrooms with a standard nondifferentiated instructional approach, students would most likely receive noncontingent feedback, whereas in a classroom with more of an individualized approach, students would likely receive more contingent feedback.

Third, the predictive validity of DA was analyzed across four populations: mixed-ability groups, typically achieving students, students at risk or disadvantaged but not disabled, and students with disabilities. Second-language learners were classified as at risk or disadvantaged. To use DA as a tool for identification, it is especially important that the predictive validity be strong for at-risk students and students with disabilities because these students are particularly susceptible to the floor effects of traditional tests, discussed earlier.

Fourth, the nature of the achievement criterion was analyzed to determine whether DA could best predict (a) independent performance on the DA posttest (referred to as posttest DA), (b) norm-referenced achievement tests, (c) criterion-referenced achievement tests, or (d) teacher judgment. Posttest DA is the score on the DA measure given at the end of the study. It is the same measure given at the beginning of the study, but the administration is different. For posttest DA, the examiner does not offer corrective feedback. The posttest DA measure represents independent student performance on identical content measured by the pretest DA. Norm-referenced achievement tests are any commercially available assessments of achievement. Criterion-referenced achievement tests are researcher-designed assessments to measure the same construct as explored by the DA administered in the study. Teacher judgment is a rating of the student's achievement in the classroom. Table 1 provides demographic information on study participants in each of the 24 studies (i.e., number of participants and age of participants). In addition, Table 1 classifies the studies into categories for analysis (i.e., contingent vs. noncontingent type of feedback, population, and achievement criterion).

After analyzing the studies along these four dimensions, we also determined the value added of DA, over and above traditional testing. This was accomplished by finding studies in which researchers used forced-entry multiple regression to investigate how much variance DA could explain after the variance due to traditional testing was explained. If DA explains significant added variance, it may be worth the time and effort to develop new protocols and use them for identification and placement.

Mixed methods were used to explore the data. Pearson's correlation coefficients were used as an indicator of prediction strength, and the coefficients served as a common metric across 15 of the 24 studies. If multiple correlations were reported, the appropriate correlations were averaged to provide only one correlation coefficient per analysis category per study. For example, if DA with contingent feedback was used to predict both math and reading performance, the two correlations were averaged to calculate one correlation for the contingent versus noncontingent analysis category. Authors of the 9 studies not reporting Pearson's correlation coefficients used various group designs and single-subject designs that produced data not directly comparable to Pearson's correlation coefficient. Nevertheless, the information in these 9 studies was considered valuable because of the few investigations exploring DA's predictive validity. Hence, we provide descriptions of their methods and outcomes in the narrative. Because studies reporting Pearson's correlation coefficients were included in the aggregated data (see Table 2), we do not describe them in the narrative.

Significance testing between average correlation coefficients was not possible due to small samples and correspondingly low statistical power. Instead, we discuss trends in the magnitude and direction of the coefficients. Table 2 presents the 15 relevant studies and associated correlation coefficients along the four dimensions: DA versus traditional testing, contingent feedback versus noncontingent feedback, population (mixed-ability groups vs. typically achieving students vs. students who are at risk or disadvantaged but not disabled vs. students with disabilities), and achievement criterion (posttest DA vs. norm-referenced tests vs. criterion-referenced tests vs. teacher judgment).
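As a rough sketch of the aggregation rule just described, the code below averages multiple reported coefficients into a single value per analysis category per study and then averages across studies within a category. The study labels and correlation values are placeholders, not figures from Table 2.

```python
# Sketch of the aggregation described in the Analysis section: average multiple
# reported Pearson r values into one coefficient per category per study, then
# average across studies within each category. All values are placeholders.

from statistics import mean

# {study: {analysis category: [reported Pearson r values]}}
reported = {
    "Study A": {"contingent": [0.35, 0.43]},        # e.g., separate math and reading r's
    "Study B": {"noncontingent": [0.61]},
    "Study C": {"noncontingent": [0.52, 0.58, 0.49]},
}

# Step 1: one coefficient per analysis category per study.
per_study = {study: {cat: mean(rs) for cat, rs in cats.items()}
             for study, cats in reported.items()}

# Step 2: average across studies within each category.
by_category = {}
for cats in per_study.values():
    for cat, r in cats.items():
        by_category.setdefault(cat, []).append(r)

category_averages = {cat: round(mean(rs), 2) for cat, rs in by_category.items()}
print(category_averages)  # one average correlation per analysis category
```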


Table 1
Demographic Characteristics of Participants and Study Characteristics Used in Analysis

Study | N | Chronological Age or Grade | Feedback | Population | Achievement Criterion
Babad and Budoff (1974) | 207 | 3rd grade | Noncontingent | Mixed-ability group; normally achieving; at-risk/disadvantaged; students with disabilities | Teacher judgment
Bain and Olswang (1995) | 15 | 2 years 6 months to 3 years | Noncontingent | Students with disabilities | Criterion referenced
Bryant (1982) | — | Preschool | Noncontingent | Mixed-ability group | Posttest DA score
Bryant, Brown, and Campione (1983) | — | Preschool | Noncontingent | Mixed-ability group | Posttest DA score
Budoff, Gimon, and Corman (1976) | 188 | 6 years 2 months to 14 years 10 months | Noncontingent | At-risk/disadvantaged | Norm referenced
Budoff, Meskin, and Harrison (1971) | 96 | 7th–9th grade | Noncontingent | Students with disabilities | Criterion referenced
Byrne, Fielding-Barnsley, and Ashley (2000) | 103 | 11 years | Contingent | Mixed-ability group | Norm referenced
Day, Engelhardt, Maxwell, and Bolig (1997) | 84 | 4 years to 5 years 6 months | Contingent | Normally achieving | Posttest DA score
Ferrara (1987) | 30 | 5 years 2 months to 6 years 2 months | Noncontingent | Normally achieving | Posttest DA score
Hessels and Hamers (1993) | 500 | 5 years 4 months to 7 years 9 months | Noncontingent | At-risk/disadvantaged | Criterion referenced
Lidz, Jepsen, and Miller (1997) | 66 | 11 years to 21 years | Contingent | Students with disabilities | Criterion referenced
Meijer (1993) | 224 | 16 years to 17 years 11 months | Contingent | Mixed-ability group | Criterion referenced
Olswang and Bain (1996) | 21 | 2 years 7 months to 3 years | Noncontingent | Students with disabilities | Criterion referenced
Peña, Quinn, and Iglesias (1992) | 50 | 3 years 7 months to 4 years 9 months | Contingent | At-risk/disadvantaged | Norm referenced
Resing (1993) | 234 | 7 years 1 month to 8 years 4 months | Noncontingent | Students with disabilities | Teacher judgment
Rutland and Campbell (1995) | 26 | 11 years to 14 years 5 months | Noncontingent | Students with disabilities | Posttest DA score
Samuels, Killip, MacKenzie, and Fagan (1992) | 20 | 4 years to 5 years 6 months | Contingent | Mixed-ability group | Norm referenced
Sewell (1979) | 91 | 1st grade | Contingent | Normally achieving; at-risk/disadvantaged | Norm referenced
Sewell and Severson (1974) | 62 | 5 years 10 months to 7 years 5 months | Contingent | At-risk/disadvantaged | Norm referenced
Spector (1992) | 38 | 5 years 11 months | Noncontingent | Normally achieving | Criterion referenced
Speece, Cooper, and Kibler (1990) | 193 | 1st grade | Noncontingent | At-risk/disadvantaged | Norm referenced
Swanson (1994) | 143 | 10 years 9 months | Contingent | Mixed-ability group | Norm referenced
Swanson (1995) | 61 | 10 years 7 months | Contingent | Mixed-ability group | Norm referenced
Tissink, Hamers, and Van Luit (1993) | 115 | 5 years 5 months to 6 years 8 months | Noncontingent | Mixed-ability group | Criterion referenced

Note: A dash indicates information not reported. DA = dynamic assessment.


Table 2
Average Correlation per Study Within Analysis Categories Between Dynamic Assessment (DA) and Student Achievement

[Per-study correlation coefficients by analysis category appear in the published table; the individual cell values are not reproduced here.]

Note: A dash indicates information not reported. Studies for which no correlations appear did not report correlations or provide data with which we could calculate them. Findings from these studies are reported in the narrative of our review.



Findings

DA Versus Traditional Testing

As indicated, correlations between DA measures and achievement measures were reported in 15 of the 24 studies, and correlations between traditional testing and achievement measures were also reported in the same 15 studies (Babad & Budoff, 1974; Bryant, 1982; Bryant et al., 1983; Day, Engelhardt, Maxwell, & Bolig, 1997; Ferrara, 1987; Hessels & Hamers, 1993; Lidz, Jepsen, & Miller, 1997; Olswang & Bain, 1996; Rutland & Campbell, 1995; Sewell, 1979; Sewell & Severson, 1974; Spector, 1992; Speece, Cooper, & Kibler, 1990; Swanson, 1995; Tissink et al., 1993). The average correlation between DA and achievement measures was .49. The average cor- relation between traditional testing and achievement measures was .41. Correlations equal to or greater than .40 are considered by some scholars to be large (Cohen, 1977, 1988; Lipsey & Wilson, 2000). In the prediction of academic achievement, however, these correlations seem modest. Pearson’s correlation coef- ficients do not consider the shared variance between traditional and dynamic measures, and it is impossi- ble to determine the unique predictive ability of tra- ditional or dynamic measures by these correlations (Lipsey & Wilson, 2000). As indicated, 9 additional studies investigated the predictive validity of DA without reporting Pearson’s correlation coefficients (Bain & Olswang, 1995; Budoff et al., 1976; Budoff et al., 1971; Byrne, Fielding-Barnsley, & Ashley, 2000; Meijer, 1993; Peña et al., 1992; Resing, 1993; Samuels, Killip, MacKenzie, & Fagan, 1992; Swanson, 1994). These studies were grouped into three categories according to their design and analysis: single-subject design with visual analysis (Bain & Olswang, 1995), quasi- experimental design with multiple regression analysis (Budoff et al., 1976; Byrne et al., 2000; Meijer, 1993; Resing, 1993; Swanson, 1994), and experimental design with between-group comparisons (Budoff et al., 1971; Peña et al., 1992; Samuels et al., 1992).

Single-subject design with visual analysis. Bain and Olswang (1995) studied the validity of DA to predict future speech growth in a sample of 15 preschoolers with specific language impairment.

Data were displayed on two scatterplots. The first scatterplot displayed participants based on their weighted DA scores for both semantic and functional relations against their changes in mean length of utterance during the 9-week study. Results indicated that the weighted DA score accurately predicted change in rate of learning for 12 of the 15 participants. The second graph plotted participants' weighted DA scores only for semantic relations against their changes in mean length of utterance. Results indicated that the weighted DA scores accurately predicted the changes in rate of learning for all 15 participants. That is, those with the highest weighted DA scores showed the greatest gains in speech.

Quasi-experimental design with multiple regression analysis. Budoff et al. (1976), Byrne et al. (2000), Meijer (1993), Resing (1993), and Swanson (1994) used multiple regression analyses to study the unique predictive ability of DA over and above traditional assessment. All of these studies used some form of verbal and quantitative achievement as criteria to determine predictive validity. Budoff et al. found mixed results with a population of disadvantaged students: DA was significantly better than traditional testing in the prediction of nonverbal/quantitative achievement; however, patterns of prediction for verbal measures were inconsistent. Although DA scores were a statistically significant predictor of one of the four verbal measures, traditional measures (e.g., IQ) and demographic information (e.g., age) were generally more consistent predictors.

By contrast, Byrne et al. (2000), Meijer (1993), and Resing (1993) showed that DA made a significant and consistent contribution to the prediction of achievement. Byrne et al. used a DA procedure called session of last error to predict future phonemic awareness and reading achievement. Session of last error is a measure of the rate of reading progress throughout the study. It is closer to the current operationalization of RTI than it is to DA because it tracks student achievement for several weeks. The faster students reached mastery, the earlier their session of last error.

Byrne et al. (2000) studied the reading achievement of a cohort of children in kindergarten and conducted follow-up tests in second and fifth grade. Byrne and his colleagues performed a series of multiple regression analyses on achievement in kindergarten, second grade, and fifth grade. In each


of the analyses, the posttest traditional score was entered first into the equation. Session of last error was entered as the second predictive variable. In all cases, the session of last error was a significant pre- dictor of achievement above and beyond the tradi- tional posttest score. It explained from 9% to 21% of the total variance. Meijer (1993) performed a similar analysis on the math achievement of a mixed-ability group of sec- ondary students. First, a traditional measure of initial math achievement was entered into the multiple regression, which accounted for 11% of the variance in achievement. Second, a DA measure was added as a predictor, and it accounted for an additional 13% of the variance. Similarly, Resing (1993) found that, after controlling for verbal IQ, the combination of two dynamic measures (number of hints required to solve a problem and number of items requiring help) predicted an additional 13% of the variance in verbal achieve- ment, 18% of the variance in math achievement, and 14% of the variance in teacher ratings of school per- formance for primary students with disabilities. Swanson (1994) conducted two separate multiple regression analyses on a mixed-ability group of pri- mary students. In the first analysis, the initial tradi- tional score was entered before dynamic variables. For reading achievement, the initial traditional score explained 11% of the total variance, and a combina- tion of dynamic scores explained an additional 19%. For math achievement, the initial traditional score explained 20% of the total variance, and a processing stability score (initial score minus maintenance score) explained an additional 12%. DA did not explain unique variance in math achievement. In the second regression analysis, all variables were allowed to com- pete against each other. For reading achievement, three DA measures (gain score, probe score, and maintenance score) were found to be the best predic- tors of achievement, explaining a total of 34% of the variance. For math achievement, only one DA mea- sure (gain score) was a significant predictor of achievement, explaining 32% of the variance. The ability of DA to predict future achievement, therefore, may depend on what domain of achievement is being predicted and whether initial traditional scores are entered as the first variable in a multiple regression.

Experimental design with between-group comparisons. Three studies investigated the predictive validity of DA with experimental methods (Budoff et al., 1971; Peña et al., 1992; Samuels et al., 1992). Budoff et al.

studied DA’s utility in predicting the response to a classroom science curriculum for low-achieving students in Grades 7 through 9. Even after IQ was fac- tored out, performance on DA predicted which students would respond positively to the science cur- riculum. That is, students who initially scored higher on DA or students who improved throughout the administration of DA tended to learn more than students who scored lower on DA and showed no improvement during its administration. Peña et al. (1992) used DA to differentiate Spanish-speaking preschool students with language disorders from nondisabled Spanish-speaking students who had poor English skills. Peña and her colleagues developed a measure of learning potential called the modifiability index. Results indicated that students with a language disorder had a significantly lower modifiability index than did nondisabled students. In addition, students with a higher modifia- bility index demonstrated more gain in single-word vocabulary over the course of the school year. Peña et al. concluded that static measures alone would overidentify Spanish-speaking students for special education placement, but DA distinguished students with language disorders from nondisabled students. Another potential use of DA is informing educa- tional placement. Samuels et al. (1992) studied DA with regard to its prediction of regular versus special education placement of students after preschool. DA significantly predicted educational placement (general vs. special). Results also indicated that placement could not be predicted on the basis of a traditional receptive vocabulary measure (Peabody Picture Vocabulary Test–Revised). Samuels et al. concluded that traditional assessment alone could not fully capture the potential of a student and that DA may be an important tool for placement and programming decisions.

Summary. DA and traditional tests correlate simi- larly with future achievement measures. However, researchers have demonstrated that DA can identify students who will respond to instruction (Bain & Olswang, 1995; Budoff et al., 1971), distinguish between minority students with and without language disorders (Peña et al., 1992), and predict future educa- tional placement (Samuels et al., 1992). Researchers in several studies have reported that DA can contribute to the prediction of achievement beyond traditional tests (Byrne et al., 2000; Meijer, 1993; Resing, 1993). However, this seems to depend on the analysis tech- niques and domains of study (Swanson, 1994).


Does Type of Feedback in DA Influence Predictive Validity?

Of the 15 DA studies reporting Pearson’s correlation coefficients, 6 provided contingent feedback (individ- ualized instruction in response to student failure) and 9 provided noncontingent feedback (standardized instruction in response to student failure). Studies with contingent feedback correlated .39 with achieve- ment, whereas studies with noncontingent feedback correlated .56 with achievement. Nine studies did not report Pearson’s correlation coefficients: 6 studies with contingent feedback (Budoff et al., 1976; Byrne et al., 2000; Meijer, 1993; Peña et al., 1992; Samuels et al., 1992; Swanson, 1994) and 3 studies with non- contingent feedback (Bain & Olswang, 1995; Budoff et al., 1971; Resing, 1993).

Contingent feedback. It was difficult to investigate contingent feedback studies as a group because the authors of these studies operationalized achievement variables in different ways (continuous or dichoto- mous), which changed the meaning of significant results. When achievement was operationalized as a continuous variable (i.e., an achievement test), 2 stud- ies reported positive support for the predictive valid- ity of DA (Budoff et al., 1976; Byrne et al., 2000), and 2 additional studies reported mixed findings (Meijer, 1993; Swanson, 1994) such that results depended on the analysis technique and achievement domain. Two other studies operationalized achieve- ment as a dichotomous variable and found that DA predicted identification or educational placement (Peña et al., 1992; Samuels et al., 1992). When an inherently continuous variable (i.e., achievement) is transformed into an artificial dichotomy (i.e., educa- tional placement using an achievement cutoff point), these variables become less reliable and result in a loss of statistical power. Although the 2 studies using dichotomized variables demonstrate positive results for DA, they should be interpreted with caution.
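The cost of dichotomizing can be illustrated with a small, purely hypothetical simulation: the same predictor correlates noticeably less with an artificially dichotomized criterion (e.g., placement based on a cutoff) than with the underlying continuous criterion. The effect size and cutoff below are arbitrary and are not estimates from any reviewed study.

```python
# Illustrative simulation of how dichotomizing a continuous criterion
# attenuates an observed correlation. No data from the reviewed studies are
# used; the effect size and the 80th-percentile cutoff are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
n = 5000
predictor = rng.normal(size=n)                        # e.g., a DA score
criterion = 0.5 * predictor + rng.normal(size=n)      # continuous achievement
# Artificial dichotomy: "placement" determined by an achievement cutoff.
placement = (criterion > np.quantile(criterion, 0.80)).astype(float)

r_continuous = np.corrcoef(predictor, criterion)[0, 1]
r_dichotomized = np.corrcoef(predictor, placement)[0, 1]  # point-biserial r

print(f"r with continuous criterion:   {r_continuous:.2f}")
print(f"r with dichotomized criterion: {r_dichotomized:.2f}")  # noticeably smaller
```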

Noncontingent feedback. Results of the studies using noncontingent feedback were somewhat more straightforward. Using visual analysis, Bain and Olswang (1995) found that their noncontingent DA measure predicted immediate growth in speech with consistency. In addition, Budoff et al. (1971) and Resing (1993) found that DA predicted unique variance above and beyond that which was pre- dicted by IQ.

Summary. Trends in Pearson’s correlation coeffi- cients show that DA with noncontingent feedback is more strongly associated with future achievement than is DA with contingent feedback. Studies in which researchers have used contingent feedback and do not report correlation coefficients are difficult to synthesize across participants and studies because of their highly individualized nature. Studies in which noncontingent feedback was used, and in which cor- relation coefficients are not reported, are somewhat easier to synthesize. Generally, they provide evidence that DA is useful in the prediction of future achieve- ment, even when used in conjunction with traditional testing.

For Whom Does DA Have Predictive Validity?

Study participants were separated into four cate- gories: mixed-ability groups, normally achieving students, at-risk or disadvantaged students, and students with disabilities. Two studies (Babad & Budoff, 1974; Sewell, 1979) reported data sepa- rately for more than one participant group and therefore provided Pearson’s correlation coeffi- cients in more than one category. Correlations were provided in 5 studies with mixed-ability groups (r = .46), 5 studies with normally achieving students (r = .42), 5 studies with at-risk or disadvantaged students (r = .37), and 4 studies with students with disabili- ties (r = .59). All studies with typically achieving students pro- vided Pearson’s correlation coefficients. DA corre- lated .42 with outcome measures. Four studies with mixed-ability groups did not provide Pearson’s corre- lation coefficients. These results will not be discussed because they do not differentiate typically achieving students from at-risk students from students with dis- abilities. The data in mixed-ability group studies were not disaggregated by population. With no details on the mixed-ability group, it is impossible to tell what type of student (i.e., normally achieving, at risk, or disabled) contributed most significantly to the results. Achievement of at-risk or disadvantaged students, for whom DA measures are often designed, is pre- dicted with slightly less accuracy than for mixed- ability groups and typically achieving students. Two studies with at-risk or disadvantaged students did not report Pearson’s correlation coefficients (Budoff et al., 1976; Peña et al., 1992). As discussed, Budoff et al. found that DA scores were significant, yet


inconsistent, predictors of achievement. The results of Peña et al. indicated that DA can differentiate dis- abled from nondisabled Spanish-speaking children and predict English-language growth. DA predicted the academic achievement of students with disabilities with considerably more accuracy than it did the other three groups. Two DA studies (Bain & Olswang, 1995; Budoff et al., 1971) predicting the achievement of students with disabili- ties did not provide Pearson’s correlation coefficients. The results of these two studies, as discussed, support the quantitative trend of correlation coefficients, indi- cating that DA may be a better predictor of achieve- ment than traditional testing for students with disabilities.

Summary. Trends in correlation coefficients show that DA was most strongly correlated with the achievement of students with disabilities. The correlation between DA and achievement was weakest for at-risk or disadvantaged students. Ironically, DA is often designed with the intent of creating a less biased measure of achievement for at-risk students. These results indicate that DA may not be less biased than traditional testing for this population.

What Achievement Criteria Affect DA’s Predictive Validity?

There were four types of achievement criteria:

independent performance on the posttest of the DA measure (posttest DA), norm-referenced tests, criterion-referenced tests, and teacher judgment. Posttest DA is the achievement measure that is most similar to the DA measure itself. In most cases, the posttest DA is simply an alternate form of the pretest and training phases of DA. Criterion-referenced achievement tests are the next most similar to the DA measure. These criterion-referenced tests are designed by the researcher to measure the same con- struct being taught during DA. Norm-referenced achievement tests, by contrast, may or may not be similar to the DA measure. Fifteen studies provided Pearson’s correlation coef- ficients: 5 predicted posttest DA, 4 predicted norm- referenced tests, 5 predicted criterion-referenced tests, and 1 predicted teacher judgment. DA measures corre- lated .53 with posttest DA, .38 with norm-referenced tests, .63 with criterion-referenced tests, and .39 with teacher judgment. The trend of the correlations is inter- esting with respect to the similarity of the DA measure

to the achievement measure. Measures more similar to DA, such as posttest DA and criterion-referenced tests, are predicted with greater accuracy (.53 and .63, respectively) than those measures that are less similar, such as norm-referenced tests and teacher judgment (.38 and .39, respectively).

Posttest DA. All studies that predicted posttest DA provided Pearson’s correlation coefficients. DA mea- sures correlated .53 with independent posttest DA performance.

Norm-referenced achievement tests. Five studies that predicted norm-referenced tests did not provide correlation coefficients (Budoff et al., 1976; Byrne et al., 2000; Peña et al., 1992; Samuels et al., 1992; Swanson, 1994). Mixed support was found for DA’s ability to predict achievement as measured by norm- referenced tests. As discussed, Peña et al. and Samuels et al. found positive support for the use of DA as a tool for identification and placement, respec- tively, and Byrne et al. determined that DA explained unique variance in achievement. Budoff et al. and Swanson found mixed results. Demographic factors and traditional testing were more consistent predic- tors than was DA in Budoff et al.’s study; Swanson found that the significance of the results depended on analysis techniques and the academic domain in question.

Criterion-referenced achievement tests. Four stud- ies that predicted criterion reference did not provide correlation coefficients (Bain & Olswang, 1995; Budoff et al., 1971; Meijer, 1993; Resing, 1993). As discussed, Bain and Olswang as well as Budoff et al. found positive support for the ability of DA to predict growth in achievement. Meijer and Resing both con- cluded that DA explained unique variance in the pre- diction of achievement, even after intelligence had been factored out. DA was a consistently significant predictor of achievement as measured by criterion- referenced tests.

Teacher judgment. One study that predicted teacher judgment (Resing, 1993) did not report Pearson’s correlation coefficients. Although DA did not predict teacher judgment as well as posttest DA or criterion-referenced achievement tests did, one study (Resing, 1993) found that DA accounted for 14% of the variance in teacher judgment of achievement, even after IQ had been factored out.


Summary. Again, studies in which researchers did not report Pearson’s correlation coefficients followed the general trend of the quantitative analysis. Posttest DA and criterion-referenced tests were predicted more consistently than were norm-referenced tests and teacher judgment.

Discussion

The purpose of this review was to synthesize evi- dence on the predictive validity of DA. Pearson corre- lation coefficients indicate that traditional testing and DA predict future achievement with similar accuracy. Trends among the correlation coefficients indicated that DA predicted achievement more accurately (a) when the feedback of the assessment was noncontingent on student response, (b) with respect to the achievement of students with disabilities rather than students at risk or typically achieving students, and (c) when involving independent DA posttests and criterion-referenced tests instead of norm-referenced tests and teacher judgment of student achievement. If traditional testing and DA do equally well in predicting achievement, why should we consider using DA? If DA is laborious to develop and validate, why exert the extra effort to develop new tests when valid traditional tests are available? To address this question, we must consider another question: Do traditional testing and DA measure the same constructs that predict achievement? Past reviews have not focused on whether DA explains unique variance in student achievement. To examine this, we must look at the added value of DA. This is possible in analyses in which researchers used forced-entry multiple regression. In such an analysis, if traditional variables are entered first, it is possible to examine DA’s unique contribution to the variance in student performance.
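A minimal sketch of that forced-entry logic, assuming a data set with hypothetical column names (trad_score, da_score, achievement): the traditional score is entered first, the DA score second, and the increment in R-squared is read as DA's value added.

```python
# Sketch of forced-entry (hierarchical) multiple regression: enter the
# traditional score first, then ask how much additional variance the DA score
# explains. The DataFrame and its column names are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf

def value_added_of_da(df: pd.DataFrame) -> float:
    """Increment in R-squared when the DA score is added after the traditional score."""
    step1 = smf.ols("achievement ~ trad_score", data=df).fit()
    step2 = smf.ols("achievement ~ trad_score + da_score", data=df).fit()
    return step2.rsquared - step1.rsquared

# Example usage with a made-up data set:
# df = pd.DataFrame({"trad_score": [...], "da_score": [...], "achievement": [...]})
# print(f"DA explains an additional {value_added_of_da(df):.0%} of the variance")
```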

Does DA Provide Added Value?

Ten studies conducted forced-entry multiple regres- sion analysis to explore DA’s unique ability to predict achievement over and above traditional testing (Bryant, 1982; Bryant et al., 1983; Byrne et al., 2000; Ferrara, 1987; Meijer, 1993; Resing, 1993; Rutland & Campbell, 1995; Spector, 1992; Speece et al., 1990; Tissink et al., 1993). Two studies (Byrne et al., 2000; Meijer, 1993) investigated the unique contribution of DA after traditional achievement tests had been entered in the multiple regression, and 8 studies

investigated the unique contribution of DA after tra- ditional cognitive tests (i.e., IQ tests) had been entered in the multiple regression.

Value added to traditional achievement tests. DA contributed significant unique variance to the predic- tion of future achievement beyond traditional achievement tests. Byrne et al. (2000) found that DA accounted for an additional 9% to 21% of the vari- ance in phonemic awareness and reading achieve- ment for students in kindergarten, Grade 2, and Grade 5. Likewise, Meijer (1993) found that DA accounted for an additional 13% of the variance in math achievement for secondary students.

Value added to traditional cognitive tests. DA also consistently contributed significant unique variance to the prediction of future achievement beyond tradi- tional cognitive tests. The 8 studies in which researchers conducted these analyses predicted three domains: general reasoning, verbal achievement, and math achievement. Regarding general reasoning, researchers investigated student performance on mea- sures such as mazes, matrices, and series completion. Bryant (1982) found that two DA measures predicted significant variance in achievement: training score (22%) and transfer score (17%). Bryant et al. (1983) found that transfer score explained 22% of the vari- ance in achievement above and beyond IQ (although the training score was found to be nonsignificant). Rutland and Campbell (1995) found that dynamic training, maintenance, and transfer all made signifi- cant contributions to the variance in achievement (11%, 11%, and 9%, respectively). In the verbal domain, DA also consistently con- tributed to the prediction of achievement. Spector (1992) found that DA contributed between 12% and 14% on phonological awareness measures and 21% on a word-reading measure. Indeed, DA was the only significant predictor of word reading. In Resing’s (1993) study, DA contributed an additional 13% in higher level verbal measures, such as reading sen- tences and writing. Speece et al. (1990), however, reported that DA was not a significant predictor of verbal achievement. The only significant predictors of verbal achievement in this study were verbal IQ and traditional pretest (25% combined). Results concerning the added value of DA in the prediction of math achievement were consistent but varied in magnitude. Ferrara (1987) noted that two dynamic measures explained a statistically significant


portion of the variance in math growth: training score (17%) and maintenance and transfer score (32%). Resing (1993) and Tissink et al. (1993) reported that DA contributed significant variance to math achieve- ment although it contributed less so than Ferrara’s study (18% and 7%, respectively). Speece et al. (1990) indicated that DA training contributed signifi- cant variance to math achievement; however, it explained only 2% of the overall variance. In general, there is evidence that DA can predict unique achievement not tapped by traditional achievement or traditional cognitive testing. When DA scores followed the entry of traditional scores in forced-entry multiple regressions, they explained sig- nificant variance in the prediction of general reason- ing, verbal achievement, reading achievement, and math achievement. Only in Speece et al. (1990) were results inconsistent with these findings. Future research, therefore, should acknowledge that DA may not be synonymous with, or substitute for, traditional tests. Rather, it may provide valuable information beyond that which traditional tests provide. The prac- tical significance of this additional information, how- ever, is not yet understood.

Implications for Research

Internal validity and the selection of outcome measures. One issue in the selection of measures concerns the relationship of the predictor variables to the outcome variables. If Predictor A measures the same skill as the outcome measure and Predictor B does not, it would be reasonable to expect that Predictor A would be the stronger of the two. In this review, posttest DA performance and criterion-referenced achievement tests were more highly correlated with DA than were norm-referenced tests and teacher judgment. This finding makes sense given that posttest DA and criterion-referenced achievement tests are more similar to the pretest DA measure. Indeed, posttest DA and criterion-referenced tests are often designed by the researcher to be more similar to pretest DA and particularly sensitive to measuring change. With respect to norm-referenced tests, one would hope researchers would choose tests that are similar to pretest DA. However, it is unlikely that a commercially produced standardized test could be sensitive to change within a short time frame. Likewise, teacher judgment can be similar or not similar to the DA measure, depending on the correlation between a teacher's perception of student achievement and the actual achievement.

All this is to say that the strength of the relationship between pretest DA measures and the various achievement measures is not uniform, which complicates comparisons among the different achievement measures. Nevertheless, selecting varied outcome measures may be important for keeping the magnitude of the results in perspective.
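As a concrete illustration of why predictor-outcome similarity matters, the following Python sketch uses simulated scores and hypothetical measure names (not data from the review) to show how an outcome constructed to resemble the pretest DA measure will, by design, correlate more strongly with it than a broader, noisier outcome does.

```python
# Simulated illustration of predictor-outcome similarity: an outcome built to
# resemble the pretest DA score correlates with it more strongly than a broader,
# noisier outcome. All names and numbers are hypothetical.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n = 60
pretest_da = rng.normal(0, 1, n)

# Outcome deliberately similar to the predictor (e.g., a posttest DA task).
posttest_da = 0.8 * pretest_da + rng.normal(0, 0.5, n)
# Outcome sharing less content and method variance (e.g., a norm-referenced test).
norm_referenced = 0.3 * pretest_da + rng.normal(0, 1.0, n)

for label, outcome in [("posttest DA", posttest_da), ("norm-referenced test", norm_referenced)]:
    r, p = pearsonr(pretest_da, outcome)
    print(f"r(pretest DA, {label}) = {r:.2f}, p = {p:.3f}")
```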

External validity and the selection of outcome measures. The selection of outcome measures is equally important for purposes of external validity. The big question is, What outcome are we trying to predict? A related question is, What are the skills most representative of that outcome? If researchers use criterion-referenced tests to measure outcome, would performance on these tests generalize to success or failure in the classroom? Perhaps curriculum-based measures or teacher judgment of classroom performance would be a more sensitive index of classroom success.

The second question above is concerned with the skills being assessed, independent of the type of outcome measure selected. In a study measuring reading achievement, researchers may use both real-word reading and nonword reading. Some may suggest, however, that predicting nonword reading is less important than predicting real-word reading. Selecting multiple measures using real words may be more appropriate in that case.

Implications for Practice

Our review suggests that DA may add unique variance to the prediction of future academic achievement. And because DA is a test-teach-test process, we believe it is appropriate to ask whether it may be regarded, and further explored, as an alternative approach to RTI. If DA, alone or in conjunction with other measures, is administered early in the school year and predicts academic performance at a later point, practitioners may choose to use it as a means of identifying students in need of more intensive instruction. Moreover, it may be a quicker, more efficient method than conventional RTI approaches for selecting an appropriate tier, or level, of instructional intensity. More conventional RTI methods require anywhere from 8 to 30 weeks or more to determine instructional responsiveness and the appropriateness of a given instructional level.


DA may be more practical in another way. Standard written protocols guide its use (e.g., Fuchs et al., 2007), whereas the implementation of instruction associated with more conventional RTI methods is less straightforward and more difficult to achieve with fidelity.
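To illustrate the kind of decision rule a practitioner might layer on top of an early-year DA score, the sketch below is a hypothetical example; the student names, scores, and cutoff are invented and would need to be set locally and validated, since the review itself prescribes no cutoff.

```python
# Hypothetical screening rule: flag students whose early-fall DA score falls
# below a locally set cutoff for more intensive (Tier 2) instruction.
# Names, scores, and the cutoff are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Student:
    name: str
    da_score: float  # score from an early-year dynamic assessment

def assign_tier(student: Student, cutoff: float = 25.0) -> str:
    """Return a tentative instructional tier based on the DA screening score."""
    if student.da_score < cutoff:
        return "Tier 2: more intensive instruction"
    return "Tier 1: core classroom instruction"

roster = [Student("Student A", 18.0), Student("Student B", 31.5), Student("Student C", 24.0)]
for s in roster:
    print(s.name, "->", assign_tier(s))
```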

Limitations of Review

There are very few quantitative syntheses of DA research (e.g., Swanson & Lussier, 2001) and none concerned primarily with predictive validity. It is difficult to synthesize research on such a broad and sometimes poorly defined topic, and these results must be interpreted in light of the paucity of available studies.

Study design. Several studies in this review were not primarily concerned with measuring the predictive validity of DA, and the DA measures may not have been designed with this purpose in mind. Similarly, the achievement measures may not have been chosen specifically to measure change across time. In addition, both the DA measures and criterion-referenced measures had unreported psychometric properties. We cannot be sure that the constructs indexed were valid, that the measures were reliable, or that they were implemented with fidelity.

Study rigor. One final note concerns the relationship between DA feedback and study rigor. In well-controlled research, the researcher strives to minimize variables that will confound results. It is easier to conduct rigorous research in DA using standardized, noncontingent feedback; individualized, contingent feedback is more difficult to control. Researchers who use noncontingent feedback may be exploring performance using methods that are easier to measure, quantify, and analyze. In such studies, standardized procedures are used in all cases of student failure; therefore, the independent variable is clear and unchanging. Researchers who use contingent feedback, by contrast, introduce an if–then process into intervention. For example, if the students fail because they did not understand the directions, then the examiner may need to repeat or clarify the directions. If the students fail because they lack the underlying skills necessary for success, then the examiner may need to concentrate on teaching lower level skills. How can we compare the results of DA across students who require individualized intervention? If the instructional elements are not the same, how can we determine that the predictive ability is due to the nature of the DA and not to the teacher, teaching method, or some other unmeasured variable? It may be that noncontingent and contingent feedback cannot be judged by the same standards of rigor.

Consequently, it may not be appropriate to compare noncontingent and contingent feedback using current research methods, because noncontingent feedback fits more easily into the framework of rigorous, empirical research and therefore is more likely to produce consistent results. Clinically oriented DA that uses contingent feedback may need to develop new and different standards of rigor.

DA is a broad area of research that is difficult to navigate. As we conduct more rigorous research with an articulated purpose, we can begin to determine accurately whether research-oriented DA is useful in the identification of children at risk for school failure. Furthermore, we can investigate whether clinically oriented DA is useful in the treatment of these children.
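The contrast between the two feedback formats can be summarized schematically. The Python sketch below is purely illustrative; the prompts and failure "reasons" are hypothetical labels rather than a published protocol, but it shows why noncontingent feedback yields a single, unchanging independent variable while contingent feedback branches on examiner judgment.

```python
# Schematic contrast between the two feedback formats discussed above.
# Prompts and failure "reasons" are hypothetical labels, not a published protocol.

def noncontingent_feedback(failed: bool) -> list[str]:
    # One standardized prompt sequence, applied identically to every failure.
    if not failed:
        return []
    return ["restate the item", "model the solution", "readminister the item"]

def contingent_feedback(failed: bool, reason: str) -> list[str]:
    # Branches on the examiner's judgment about why the student failed,
    # so the intervention can differ from student to student.
    if not failed:
        return []
    if reason == "misunderstood directions":
        return ["repeat and clarify the directions", "readminister the item"]
    if reason == "missing prerequisite skill":
        return ["teach the lower level skill", "return to the item later"]
    return ["probe further to locate the breakdown"]

print(noncontingent_feedback(True))
print(contingent_feedback(True, "missing prerequisite skill"))
```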

References

References marked with an asterisk indicate studies included in the synthesis.
*Babad, E. Y., & Budoff, M. (1974). Sensitivity and validity of learning-potential measurement in three levels of ability. Journal of Educational Psychology, 66, 439–447.
*Bain, B. A., & Olswang, L. B. (1996). Examining readiness for learning two-word utterances by children with specific expressive language impairment: Dynamic assessment validation. American Journal of Speech Language Pathology, 4, 81–91.
Bransford, J. C., Delclos, J. R., Vye, N. J., Burns, M., & Hasselbring, T. S. (1987). State of the art and future directions. In C. S. Lidz (Ed.), Dynamic assessment: An interactional approach to evaluating learning potential (pp. 479–496). New York: Guilford.
*Bryant, N. R. (1982). Preschool children's learning and transfer of matrices problems: A study of proximal development. Unpublished master's thesis, University of Illinois.
*Bryant, N. R., Brown, A. L., & Campione, J. C. (1983, April). Preschool children's learning and transfer of matrices problems: Potential for improvement. Paper presented at the meeting of the Society for Research in Child Development, Detroit.
*Budoff, M., Gimon, A., & Corman, L. (1976). Learning potential measurement with Spanish-speaking youth as an alternative to IQ tests: A first report. Interamerican Journal of Psychology, 8, 233–246.
*Budoff, M., Meskin, J., & Harrison, R. H. (1971). Educational test of the learning-potential hypothesis. American Journal of Mental Deficiency, 76, 159–169.
*Byrne, B., Fielding-Barnsley, R., & Ashley, L. (2000). Effects of preschool phoneme identity training after six years: Outcome level distinguished from rate of response. Journal of Educational Psychology, 92, 659–667.
Campione, J. C., Brown, A. L., Ferrara, R. A., Jones, R. S., & Steinberg, E. (1985). Breakdowns in flexible use of information: Intelligence-related differences in transfer following equivalent learning performance. Intelligence, 9, 297–315.
Carlson, J. S., & Wiedl, K. H. (1978). Use of testing-the-limits procedures in the testing of intellectual capabilities in children with learning difficulties. American Journal of Mental Deficiency, 11, 559–564.
Carlson, J. S., & Wiedl, K. H. (1979). Toward a differential testing approach: Testing-the-limits employing the Raven matrices. Intelligence, 3, 323–344.
Carlson, J. S., & Wiedl, K. H. (1980). Applications of a dynamic testing approach in intelligence assessment: Empirical results and theoretical formulations. Zeitschrift für Differentielle und Diagnostische Psychologie, 1, 303–318.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
*Day, J. D., Engelhardt, J. L., Maxwell, S. E., & Bolig, E. E. (1997). Comparison of static and dynamic assessment procedures and their relation to independent performance. Journal of Educational Psychology, 89, 358–368.
Dillon, R. (1979). Improving validity by testing for competence: Refinement of a paradigm and its application to the hearing impaired. Educational and Psychological Measurement, 39, 363–371.
*Ferrara, R. A. (1987). Learning mathematics in the zone of proximal development: The importance of flexible use of knowledge. Unpublished doctoral dissertation, University of Illinois.
Feuerstein, R. (1979). The dynamic assessment of retarded performers. Baltimore: University Park Press.
Feuerstein, R., Haywood, H. C., Rand, Y., Hoffman, M. B., & Jensen, M. (1986). Examiner manual for the Learning Potential Assessment Device. Jerusalem: Hadassah-WIZCO-Canada Research Institute.
Feuerstein, R., Miller, R., Hoffman, M. B., Rand, Y., Mintzker, Y., & Jensen, M. R. (1981). Cognitive modifiability in adolescence: Cognitive structure and the effects of intervention. The Journal of Special Education, 15, 269–287.
Feuerstein, R., Rand, Y., & Hoffman, M. B. (1979). The dynamic assessment of retarded performers: The Learning Potential Assessment Device. Baltimore: University Park Press.
Feuerstein, R., Rand, Y., Hoffman, M., Hoffman, M., & Miller, R. (1979). Cognitive modifiability in retarded adolescents: Effects of instrumental enrichment. American Journal of Mental Deficiency, 83, 539–550.
Flammer, A., & Schmid, H. (1982). Lerntests: Konzept, Realisierungen, Bewährung. Eine Übersicht. Schweizerische Zeitschrift für Psychologie und ihre Anwendungen, 33, 14–32.
Fuchs, D., & Fuchs, L. S. (2006). Introduction to responsiveness-to-intervention: What, why, and how valid is it? Reading Research Quarterly, 41(1), 93–99.
Fuchs, D., Fuchs, L. S., & Compton, D. L. (2004). Identifying reading disability by responsiveness-to-instruction: Specifying measures and criteria. Learning Disability Quarterly, 27(4), 216–227.
Fuchs, D., Fuchs, L. S., Compton, D. L., Bouton, B., Caffrey, E., & Hill, L. (2007). Dynamic assessment as responsiveness-to-intervention: A scripted protocol to identify young at-risk readers. Teaching Exceptional Children, 39(5), 58–63.
Ginzburg, M. P. (1981). O vozmozhnoi interpretatsii poniatia zony blitzhaishego razvitia [On a possible interpretation of the concept of the zone of proximal development]. In D. B. El'konin & A. L. Venger (Eds.), Diagnostika uchebnoi diatel'nosti I intellectual'nogo razvitia detei (pp. 145–155). Moscow: Academia Pedagogicheskikh Nauk SSSR.
Goncharova, E. L. (1990). Nekotorye voprosy vyshego obrazovania vzrnslykh slepoglukhikh [On higher education for the deaf-blind]. In V. N. Chulkov, V. I. Lubovsky, & E. N. Martsinovskaia (Eds.), Differentsirovannyi podkhod pri obuchenii I vospitanii slepoglukhikh detei (pp. 56–70). Moscow: Academia Pedagogicheskikh Nauk SSSR.
Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.
Guthke, J. (1977). Zur Diagnostik der intellektuellen Lernfähigkeit [Assessment of intellectual learning potential]. Berlin: VEB Deutscher Verlag der Wissenschaften.
Hamers, J. H. M., Hessels, M. G. P., & Van Luit, J. E. H. (1991). Leertest voor etnische minderheden: Test en handleiding. Lisse: Swets & Zeitlinger.
Hamers, J. H. M., & Ruijssenaars, A. J. J. M. (1984). Leergeschiktheid en leertests. Lisse: Swets & Zeitlinger.
Haywood, H. C. (1992). Introduction to special issue. Journal of Special Education, 26, 233–234.
Haywood, H. C., & Tzuriel, D. (Eds.). (1992). Interactive assessment. New York: Springer-Verlag.
*Hessels, M. G. P., & Hamers, J. H. M. (1993). The learning potential test for ethnic minorities. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological, and practical issues (pp. 285–311). Lisse, Netherlands: Swets & Zeitlinger B.V.
Jitendra, A. K., & Kame'enui, E. J. (1993). Dynamic assessment as a compensatory assessment procedure: A description and analysis. Remedial and Special Education, 14(5), 6–18.
Kratochwill, T. R., & Severson, R. A. (1977). Process assessment: An examination of reinforcer effectiveness and predictive validity. Journal of School Psychology, 5, 293–300.
Lidz, C. S. (Ed.). (1987). Dynamic assessment: An interactional approach to evaluating learning potential. New York: Guilford.
*Lidz, C. S., Jepsen, R. H., & Miller, M. B. (1997). Relationships between cognitive process and academic achievement: Application of a group dynamic assessment procedure with multiply handicapped adolescents. Educational and Child Psychology, 14, 56–67.
Lipsey, M. W., & Wilson, D. B. (2000). Practical meta-analysis. Thousand Oaks, CA: Sage.
*Meijer, J. (1993). Learning potential, personality characteristics, and test performance. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological, and practical issues (pp. 341–362). Lisse, Netherlands: Swets & Zeitlinger B.V.
Muttart, K. (1984). Assessment of effects of instrumental enrichment cognitive training. Special Education in Canada, 58, 106–108.
*Olswang, L. B., & Bain, B. A. (1996). Assessment information for predicting upcoming change in language production. Journal of Speech and Hearing Research, 39, 414–423.
*Peña, E., Quinn, R., & Iglesias, A. (1992). The application of dynamic methods to language assessment: A nonbiased procedure. The Journal of Special Education, 26, 269–280.
Rand, Y., Tannenbaum, A. J., & Feuerstein, R. (1979). Effects of instrumental enrichment on the psychoeducational development of low-functioning adolescents. Journal of Educational Psychology, 71, 751–763.
*Resing, W. C. M. (1993). Measuring inductive reasoning skills: The construction of a learning potential test. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological, and practical issues (pp. 219–242). Lisse, Netherlands: Swets & Zeitlinger B.V.
*Rutland, A., & Campbell, R. (1995). The validity of dynamic assessment methods for children with learning difficulties and nondisabled children. Journal of Cognitive Education, 5, 81–94.
*Samuels, M. T., Killip, S. M., MacKenzie, H., & Fagan, J. (1992). Evaluating preschool programs: The role of dynamic assessment. In H. C. Haywood & D. Tzuriel (Eds.), Interactive assessment (pp. 251–271). New York: Springer-Verlag.
Savell, J. M., Twohig, P. T., & Rachford, D. L. (1986). Empirical status of Feuerstein's 'Instrumental Enrichment' (FIE) technique as a method of teaching thinking skills. Review of Educational Research, 56, 381–409.
*Sewell, T. E. (1979). Intelligence and learning tasks as predictors of scholastic achievement in Black and White first-grade children. Journal of School Psychology, 17, 325–332.
*Sewell, T. E., & Severson, R. A. (1974). Learning ability and intelligence as cognitive predictors of achievement in first-grade black children. Journal of Educational Psychology, 66, 948–955.
Shochet, I. M. (1992). A dynamic assessment for undergraduate admission: The inverse relationship between modifiability and predictability. In H. C. Haywood & D. Tzuriel (Eds.), Interactive assessment (pp. 332–355). New York: Springer-Verlag.
*Spector, J. E. (1992). Predicting progress in beginning reading: Dynamic assessment of phonemic awareness. Journal of Educational Psychology, 84, 353–363.
*Speece, D. L., Cooper, D. H., & Kibler, J. M. (1990). Dynamic assessment, individual differences, and academic achievement. Learning and Individual Differences, 2, 113–127.
*Swanson, H. L. (1994). The role of working memory and dynamic assessment in the classification of children with learning disabilities. Learning Disabilities Research and Practice, 9, 190–202.
*Swanson, H. L. (1995). Effects of dynamic testing on the classification of learning disabilities: The predictive and discriminant validity of the Swanson-Cognitive Processing Test (S-CPT). Journal of Psychoeducational Assessment, 13, 204–229.
Swanson, H. L., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.
Tellegen, P. J., & Laros, J. A. (1993). The Snijders-Oomen nonverbal intelligence tests: General intelligence tests or tests for learning potential? In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological, and practical issues (pp. 267–283). Lisse, Netherlands: Swets & Zeitlinger B.V.
*Tissink, J., Hamers, J. H. M., & Van Luit, J. E. H. (1993). Learning potential tests with domain-general and domain-specific tasks. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological, and practical issues (pp. 243–266). Lisse, Netherlands: Swets & Zeitlinger B.V.
Utley, C. A., Haywood, H. C., & Masters, J. C. (1992). Policy implications of psychological assessment of minority children. In H. C. Haywood & D. Tzuriel (Eds.), Interactive assessment (pp. 445–469). New York: Springer-Verlag.
Wiedl, K. H., & Herrig, D. (1978). Ökologische Validität und Schulerfolgsprognose im Lern- und Intelligenztest: Eine exemplarische Studie. Diagnostica, 24, 175–186.

Erin Caffrey is a graduate student at Vanderbilt University. Her primary interests include students with learning disabilities, beginning reading, and assessment. Her current research exam- ines the predictive validity of dynamic assessment measures of early reading skills.

Douglas Fuchs is a professor of special education at Peabody College of Vanderbilt University, where he also codirects the Vanderbilt–Kennedy Center Reading Clinic. With Lynn S. Fuchs, Don Compton, and others, he is currently exploring the importance of dynamic assessments in reading and math for matching at-risk students to appropriate levels of instruction in a response-to-intervention framework.

Lynn S. Fuchs is a professor of special education at Peabody College of Vanderbilt University where she conducts research funded by the National Institute of Child Health and Human Development and other agencies on the importance of individual differences to math and reading performance and how curriculum-based measurement may be used to develop effective instruction for difficult-to-teach children.
