
Evidence-Based Communication Assessment and Intervention
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tebc20

A review of meta-analyses of single-subject experimental designs: Methodological issues and practice

S. Natasha Beretvas & Hyewon Chung
Department of Educational Psychology, University of Texas at Austin, USA
Published online: 15 Nov 2008.

To cite this article: S. Natasha Beretvas & Hyewon Chung (2008) A review of meta-analyses of single-subject experimental designs: Methodological issues and practice, Evidence-Based Communication Assessment and Intervention, 2:3, 129–141, DOI: 10.1080/17489530802446302

To link to this article: http://dx.doi.org/10.1080/17489530802446302

A review of meta-analyses of single-subject experimental designs: Methodological issues and practice

S. Natasha Beretvas & Hyewon Chung, Department of Educational Psychology, University of Texas at Austin, USA

Abstract
Several metrics have been suggested for summarizing results from single-subject experimental designs. This study briefly reviews the most commonly used metrics, noting their methodological limitations. It also includes a synthesis of recent meta-analyses, describing which metrics were used and how meta-analysts handled dependence in the form of multiple treatments, outcomes, and participants per study. Guidelines for future methodological research and for single-subject experimental design meta-analysts are provided.

Keywords: meta-analysis, effect sizes, single-subject experimental designs, methodology


INTRODUCTION

Meta-analytic procedures afford researchers a method to quantitatively synthesize past research results, thereby providing evidence to support best practice (Glass, 1976; Hedges & Olkin, 1985; Hunter & Schmidt, 1990). While these meta-analytic methods work well for synthesizing the results of studies with large sample sizes (large-n), there is still no consensus concerning how best to summarize results from single-subject experimental design (SSED) studies. This is a problem because a considerable amount of educational and psychological research has made use of SSEDs (Galassi & Gersh, 1993). Indeed, SSEDs are frequently employed in educational research designed to assess a treatment's effect on special populations, such as individuals with autism and related developmental disabilities. Typically in pre-post large-n intervention studies, the focus is on change in outcome level between pre-test and post-test. While this is part of the interest with SSED studies, trends within and across baseline and treatment phases are simultaneously considered and are perhaps the most important aspect of the data to consider when evaluating the results from such studies. Numerical descriptors of these trends are difficult to estimate when the data describe a treatment's effect on an individual (Crosbie, 1993) and when the number of repeated measures is as small as is commonly found in educational SSED research (Busk & Marascuilo, 1988; Huitema, 1985). For this reason, visual analysis of graphed data is typically employed in SSED studies to assess a treatment's effect.

A perusal of a plot of results should clearly identify whether a treatment effect can be discerned from SSED results. And for the results of a visual analysis of data to support a treatment's effect, usually the magnitude of the effect must be sufficiently large that associated practical and clinical significance can be assumed. However, visual inspection of results involves the consideration of several dimensions along which data might vary, including mean shift (change in average behavior in baseline versus treatment), slope change (change in trend between baseline and treatment), and variability (of data points around the general trend). Parsonson and Baer (1992) provide a detailed summary of research conducted to investigate the relationship between characteristics of graphs and the inferences resulting from visual interpretations of the results. Unfortunately, some of this research has indicated that inferences based on visual analysis are not very reliable. Statistical summaries of results provide an alternative to visual inspection.

There are several justifications for using statistical methods to summarize results from SSEDs. Beyond the parsimony associated with use of quantitative meta-analysis, the current climate of evidence-based practice also heralds a renewed focus on methods used to meta-analyze SSED results. A statistical summary of results also allows a potentially more objective summary of studies' results through the use of meta-analytic procedures. Use of meta-analysis encourages generalization of findings across individuals. It also permits exploration of any differences identified in study results. In addition, statistical description of study results allows for the identification of a treatment effect, despite potentially unstable baseline data, and of small treatment effects that might not be visible graphically (Nourbakhsh & Ottenbacher, 1994). Lastly, the potential lack of reliability noted for interpretation of data using visual analysis could be reduced with the use of statistical summaries.

For correspondence: S. Natasha Beretvas. E-mail: tasha.beretvas@utexas.edu
Source of funding: Preparation of this article was supported by a grant from the Institute of Education Sciences, U.S. Department of Education. However, the opinions expressed do not represent the opinions of this agency.
The interpretation of results associated with an appropriately conducted quantitative descriptor will be more consistent.

Several quantitative descriptors have been derived as options to describe a treatment's effect in SSEDs. These descriptors include single-indicator summaries such as various versions of a standardized difference in phase (treatment and baseline) means (Busk & Serlin, 1992) and the percentage of non-overlapping data (PND; Scruggs, Mastropieri, & Casto, 1987). Others have suggested using single $R^2$-change indicators that simultaneously describe change in the outcome's level and slope as a result of treatment (Center, Skiba, & Casey, 1985–1986; Faith, Allison, & Gorman, 1996). Other researchers have suggested a pair of $R^2$-change indicators to describe a treatment effect, with one indicator describing change in the level of an outcome between baseline and treatment (e.g. Crosbie, 1995; Tryon, 1982) and the other indicator describing change in trend as a result of the treatment being introduced (e.g. Beretvas & Chung, 2007; Crosbie, 1995). Finally, some researchers have proposed a multi-level approach to the meta-analysis of SSEDs (van den Noortgate & Onghena, in press). Despite the need to quantitatively synthesize results from SSEDs, there is little consensus in the field about what qualifies as an appropriate descriptor. Methodological critiques of each of the descriptors have been performed, and no single descriptor has yet been established as best. In spite of these criticisms, researchers continue to use several of these descriptors. The current paper provides a brief description of some of the more commonly used effect-size metrics, emphasizing some of the problems associated with each metric. In addition, a summary of recent applied meta-analyses of single-subject designs will be provided, along with some directions for future research in this area.

SINGLE INDICATOR EFFECT-SIZE METRICS

Standardized mean difference

The standardized mean difference used in SSEDs is calculated on the basis of the effect size of the same name used for large-n studies that compares the difference in means of two independent groups at a single time point. The basic formula for the large-n standardized mean difference is estimated using

$$\hat{\delta} = \frac{\bar{Y}_T - \bar{Y}_C}{s} \qquad (1)$$

where $\hat{\delta}$ is an estimate of the difference between the mean outcome score of the treatment group, $\bar{Y}_T$, and the mean of the control group, $\bar{Y}_C$. This difference is standardized by dividing by an estimate, $s$, of the population standard deviation of outcome scores within the populations. The corresponding standardized mean difference that is frequently used as a metric for SSEDs is

$$\widehat{SMD} = \frac{\bar{Y}_B - \bar{Y}_A}{S} \qquad (2)$$

where the sample mean, $\bar{Y}_i$, is calculated on the basis of the mean of the outcome values in phase $i$ (A or B) for a single individual, whereas the sample mean in Equation 1 is calculated on the basis of the outcome scores for a sample of individuals. What further distinguishes $\hat{\delta}$ from $\widehat{SMD}$ is the calculation of the standard deviation. In Equation 2, $S$ is the sample standard deviation for the individual's outcome scores (calculated using data just from the baseline phase or pooled across the A and B phases).
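To make the computation in Equation 2 concrete, here is a minimal sketch (ours, not part of the original article; the toy phase data and the function name `smd` are invented for illustration) showing both denominator choices just described:

```python
import numpy as np

# Hypothetical AB-phase scores for one individual (invented values).
baseline = np.array([2.0, 3.0, 2.0, 4.0, 3.0])    # phase A
treatment = np.array([5.0, 6.0, 7.0, 6.0, 8.0])   # phase B

def smd(a, b, pool_sd=False):
    """Single-subject standardized mean difference (Equation 2).

    Divides the phase-B minus phase-A mean difference by the
    baseline-phase SD or, when pool_sd=True, by the SD pooled
    across the A and B phases.
    """
    if pool_sd:
        # Pool the two phase variances, weighting by degrees of freedom.
        ss = (len(a) - 1) * np.var(a, ddof=1) + (len(b) - 1) * np.var(b, ddof=1)
        s = np.sqrt(ss / (len(a) + len(b) - 2))
    else:
        s = np.std(a, ddof=1)
    return (b.mean() - a.mean()) / s

print(smd(baseline, treatment))                # baseline SD in the denominator
print(smd(baseline, treatment, pool_sd=True))  # pooled SD in the denominator
```

Equation 1 has the same form but uses group means over many individuals and a cross-sectional standard deviation, which is one reason, as discussed next, that the two estimates are not on the same metric.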
There will be less variability in measures on a single individual over time, $S$, than in measures on multiple individuals at a single time point, $s$. The reason for this is that an individual's scores will probably be similar to his or her score at an earlier point in time, whereas if two independent individuals had been randomly selected, then there should be no relationship between their scores. The autocorrelation introduced in SSED time-series data results in a violation of the assumption of independence that is made with most large-n group-comparison designs. Given the fundamental differences between $s$ and $S$, the estimates in Equations 1 and 2 should not be considered as measured on the same metric. The reduced variability in repeated measures on an individual ($S$) will inflate the scale of the resulting $\widehat{SMD}$ as compared with that of $\hat{\delta}$. Thus, if the effect-size estimate in Equation 2 is used to describe results from SSEDs, the estimates should not be combined with effect-size estimates from large-n studies. In addition, it should not be assumed that the variance typically associated with $\hat{\delta}$ (see, for example, Cooper & Hedges, 1994; Hedges & Olkin, 1985) is associated with $\widehat{SMD}$. Mostly as a result of the potential autocorrelation resulting from repeated measures on an individual, the sampling distribution of $\widehat{SMD}$ is unclear and, thus, the relevant variance to associate with the metric's index is also unclear.

The standardized mean difference effect size, $\widehat{SMD}$, appears to be quite commonly used in meta-analyses of SSED research (see, for example, Busk & Serlin, 1992; Faith et al., 1996). Other researchers (Hershberger, Wallace, Green, & Marquis, 1999) have encouraged calculating an effect-size estimate similar to that in Equation 2, except that the means and standard deviations would be based on only the last three time points in each phase. Use of only the last three time points per phase might be preferred as providing a more valid measure of baseline data once the pattern in the outcome measure has stabilized; however, all of the criticisms mentioned above with regard to $\widehat{SMD}$ also apply to this effect size.

Apart from the caveat about the different metrics of $\widehat{SMD}$ versus $\hat{\delta}$, it should be emphasized that use of this metric descriptor would only make sense when no trend was evident in the baseline or treatment phases (see Figure 1 for an example). However, there might be a trend in the pattern of behavior either within one or within both phases and, thus, the mean value will not be very representative of a treatment's effect. This metric will only represent the difference in the average outcome levels in each phase. No trend might be evident in baseline but the treatment might change the outcome level and introduce a trend (for example, see Figure 2). Alternatively, there might already be a slight trend in baseline (reflecting a natural development in the outcome over time). The intervention might raise the level and convert the trend to a steeper slope supporting more rapid growth in the outcome (see Figure 3). In any of the scenarios in which there is a trend¹, a change in the average level in the outcome of each phase will not capture the treatment's impact on the trend. It is also possible that there might be a trend in baseline and that the treatment has absolutely no effect on either trend or level (for example, see Figure 4). Summarizing the results of a study for which the pattern depicted in Figure 4 applies using $\widehat{SMD}$ will lead to (false) detection of a treatment's effect even though the change in level solely resulted from natural development.

¹ All trends have been depicted, here, as positive. It is of course feasible that trends could be negative (reflecting a reduction in the behavioral outcome). The same conclusions apply for such scenarios.

[Figures 1–4 about here; each plots Outcome against Time for an AB design.]
Figure 1. AB design, outcome level shift (increase) with no trend within baseline and treatment phases.
Figure 2. AB design, outcome level shift (increase) with no trend in baseline and trend (increasing behavior) introduced in treatment phase.
Figure 3. AB design, outcome level shift (increase) with slight, positive trend in baseline and change (increase) in trend introduced in treatment phase.
Figure 4. AB design, positive trend in baseline and treatment phases and no treatment effect (on level or trend).

Percentage of non-overlapping data

Scruggs et al. (1987) introduced the PND as an index of a treatment's effect. The PND provides a non-parametric descriptor of the overlap between the data in the treatment versus the baseline phases. To describe results in an AB design for a treatment designed to increase a behavior, the PND is calculated by tallying the percentage of data points in the treatment phase that exceed the highest point reached in the baseline phase. (If a treatment is designed to decrease a behavior, then the percentage of points in the treatment phase that are lower than the lowest data point in the baseline phase is tallied.) The PND does not require the assumptions of normality, linearity, and homogeneity associated with parametric descriptors of effect size. And while exact values for data points are needed for other metrics, this is not the case for PND calculation. Extensive description of the problems and benefits associated with use of the PND is provided elsewhere (see, for example, the entire second issue of Remedial and Special Education, 1987, Volume 8).
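As a concrete illustration of the tallying rule just described, the following sketch (ours; the data values are invented) computes the PND for an AB design:

```python
import numpy as np

def pnd(baseline, treatment, increase=True):
    """Percentage of non-overlapping data (Scruggs et al., 1987).

    For a behavior-increasing treatment, counts the percentage of
    treatment-phase points exceeding the highest baseline point; for
    a behavior-decreasing treatment, the percentage falling below
    the lowest baseline point.
    """
    baseline = np.asarray(baseline)
    treatment = np.asarray(treatment)
    if increase:
        non_overlap = treatment > baseline.max()
    else:
        non_overlap = treatment < baseline.min()
    return 100.0 * non_overlap.mean()

print(pnd([2, 3, 2, 4, 3], [5, 6, 4, 6, 8]))  # 80.0: one point (4) overlaps
```

Note that a single extreme baseline point caps the achievable PND, which is one facet of the floor/ceiling criticisms noted below.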
Despite the caveats associated with its use, the PND is one of the most commonly used descriptors of treatment effectiveness in the research syntheses of SSEDs in education and special education (see Schlosser, Lee, & Wendt, in this issue).

Several other non-overlap metrics have been developed since the introduction of the PND, including the percentage of all non-overlapping data (Parker, Hagan-Burke, & Vannest, 2007), the percentage of data points exceeding the median (Ma, 2006), the improvement rate difference (Parker & Hagan-Burke, 2007), and the mean baseline reduction (Lundervold & Bourland, 1988), among others. The benefit of these metrics is that they are simple to use and unaffected by potential autocorrelation. Unfortunately, however, several associated problems have been noted. One of the problems with these non-parametric indices has to do with the unknown sampling distributions associated with each of them. This seriously compromises the validity of statistical tests conducted using these indices. Additional criticisms, including the potential impact of floor or ceiling effects and of orthogonal slope changes, have also been noted. Another common criticism of these non-parametric descriptors is that their values can be confounded in the presence of a trend in the data.

Additional procedures have been recommended to handle assessment of a treatment's effect using regression-based procedures that model the possible changes in level and trend of the outcome. Before describing the different procedures, one of the fundamental assumptions of multiple regression will be reviewed. In addition, its possible violation in SSEDs will be discussed along with the associated effect on statistical results.

With ordinary least squares (OLS) regression analyses for large-n studies, the outcome for individual $i$, $Y_i$, is modeled as a function of the relevant predictors. For example, if two predictors, $X_1$ and $X_2$, are hypothesized to predict $Y$, then the following regression model could be tested:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i \qquad (3)$$

Estimation of the relevant regression coefficients (the $\beta$s) in Equation 3 is unbiased and efficient when the relevant model assumptions are met. One of the primary assumptions (beyond homoscedasticity and normality) is the independence of the residual terms, the $e_i$s. This assumption is approximately met in large-n studies and, thus, the standard errors estimated using OLS can be assumed to be accurate. However, when this assumption is violated, estimation can no longer be considered efficient and the standard error estimates cannot be assumed accurate.

If residuals are (what is termed) positively autocorrelated², as might be possible with time-series data (namely, data measured over time for an individual), then the standard error estimates will be biased underestimates (see, for example, Pankratz, 1983; Crosbie, 1993; Busk & Marascuilo, 1992; Gorman & Allison, 1996). Various researchers (Jones, Weinrott, & Vaught, 1978; Huitema, 1985; Busk & Marascuilo, 1988, and others) have debated whether there is autocorrelation in the errors of SSED data. Regardless, as recommended by Busk and Marascuilo (1992), "single-case researchers should analyze their data not assuming independence of observations" (p. 165).

² If residuals are negatively autocorrelated, then standard error estimates will be inflated, with overly conservative Type I error rates.

Huitema and McKean (1998) demonstrated that appropriate modeling of the trends in SSED datasets can greatly reduce the autocorrelation in residuals. This should be a goal for researchers interested in describing trends in SSED studies because, if autocorrelation is not sufficiently well explained away with predictors, the autoregressive models that are needed to appropriately model the patterns are particularly complex. The use of autoregressive integrated moving average (ARIMA) time-series models, as popularized by Box and Jenkins (1976), has been advocated for the more formally termed 'interrupted time series' (ITS) data that are commonly encountered in SSEDs (Glass, Willson, & Gottman, 1975). However, ARIMA models are complicated and function well only with a number of data points per phase that is much higher (i.e. 50 or more; Box & Jenkins, 1976) than is typically encountered in educational SSED research (Hartmann et al., 1980).

Huitema (1985) examined 881 experiments published in the Journal of Applied Behavior Analysis between 1968 and 1977 and found the modal number of data points per phase to be 3–4, with a median of 5. Busk and Marascuilo (1988) summarized this information for articles published in the same journal from 1975 to 1985 and found that the vast majority of SSED analyses involved fewer than 30 data points for the baseline and intervention phases (85% and 73%, respectively). Not only are model parameters poorly estimated in the presence of autocorrelation, but the autocorrelation (in the residuals) is also poorly estimated with small data sets.
Given the evident problems with estimating autocorrelation, Huitema and McKean's (1998) suggestion to reduce autocorrelation with appropriate model specification holds great merit. Several regression models have been suggested to explain patterns exhibited in two-phase data to explore the existence of potential treatment effects. It should be remembered that one of the strongest criticisms applied to the non-regression effect-size estimates (such as the PND and $\widehat{SMD}$) is that they do not reflect the possible impact of a linear trend. In the presence of even the simplest kind of trend, a treatment can affect both level and trend and, thus, the relevant model needs to take this into consideration.

SINGLE METRICS BASED ON CHANGE IN R²

As a first reaction to this same concern, Gorsuch (1983) had suggested calculating an effect-size estimate that was a function of the change in $R^2$ introduced by adding time as a predictor of the outcome, $Y_t$. A similar procedure was suggested by White, Rusch, Kazdin and Hartmann (1989). White et al.'s suggestion was to calculate an effect-size estimate using the following formula:

$$ES_{Y'} = \frac{Y'_B - Y'_A}{S_{Y'}} \qquad (4)$$

where $Y'_i$ is the outcome score on the last day of phase $i$ predicted using the linear regression of $Y$ on Time using data from phase $i$, and $S_{Y'}$ is the pooled within-phase standard deviation estimate. The standard deviation is calculated for each phase $i$ using

$$S_{Y'_i} = S_i \sqrt{1 - r_{Yt}^2} \qquad (5)$$

where $S_i$ is the conventional standard-deviation estimate of scores calculated within phase $i$ (for each of phases A and B), and $r_{Yt}$ represents the correlation between $Y$ and time. The resulting two values for $S_{Y'_i}$ are pooled together to obtain $S_{Y'}$ (in Equation 4). Given the non-independence of the values on $Y$, it seems necessary to correct the standard deviation describing the variability of the scores in each phase. A form of the correction used by White et al. (see Equation 5), although using $r$ not $r^2$, is applied to standard deviations in related-samples test statistics in large-n meta-analyses (see Morris & DeShon, 2002, for example) to correct for correlated data. Yet, given that the relationship between time and the outcome has already been modeled in each phase with the regression of $Y$ on $t$, use of this correction does not seem appropriate.

Since these ideas were introduced, several authors have suggested use of a piecewise regression model to describe ITS data (such as is found in AB designs and their extensions). The parameterizations of the piecewise regression model were designed to provide parameters describing potential changes in level and slope upon introduction of the treatment. Center et al. (1985–1986) suggested use of a metric based on a change in $R^2$ ($\Delta R^2$) for two piecewise regression models. The full piecewise regression model's parameterization,

$$Y_t = \beta_0 + \beta_1 T_t + \beta_2 D_t + \beta_3 (T_t - n_1) D_t + e_t \qquad (6)$$

was designed to provide two regression coefficients ($\beta_2$ and $\beta_3$) that described the change in the level of, and the trend in, the outcome from baseline to treatment. (Note that the variable $T_t$ is used to identify the time point of the outcome, $Y_t$, at time $t$, and $D_t$ is a dummy-coded variable used to identify whether the outcome was measured in the baseline ($D = 0$) or the treatment ($D = 1$) phase.) A restricted piecewise regression model, which included neither the interaction (change-in-slope) predictor nor the phase ($D$, change-in-level) predictor,

$$Y_t = \beta_0 + \beta_1 T_t + e_t \qquad (7)$$

would also be estimated. The change in $R^2$ for the two models (in Equations 6 and 7) is calculated and converted into an effect size that describes the effect of an intervention on both the slope and the intercept. Specifically, the effect size, $F(2, df)$, represents a measure of the change in the proportion of variance in the outcome explained (i.e. the $\Delta R^2$) with the simultaneous addition of the change-in-intercept and change-in-trend parameters (from Equation 7 to Equation 6). The effect size is calculated based on the F-ratio statistic testing the $\Delta R^2$ of the full and restricted models:

$$F = \frac{(R^2_{full} - R^2_{rest})/(df_{rest} - df_{full})}{(1 - R^2_{full})/df_{full}} \qquad (8)$$

where $df_{full}$ and $df_{rest}$ represent the error degrees of freedom of the two models (so that $df_{rest} - df_{full}$ equals the number of added parameters). In Equation 8, the 'rest' subscript refers to the 'restricted' regression model in Equation 7. The 'full' subscript refers to the 'full' regression model in Equation 6 that includes the change-in-level and linear-growth coefficients. This means that the numerator of Equation 8 provides the amount of variability explained by the addition of the change-in-level and change-in-trend coefficients. The associated effect size, $\hat{F}(2, df)$, is a function of this F-ratio and is designed to correct for the F-ratio being based on two df (Faith et al., 1996):

$$\hat{F}(2, df) = 2\sqrt{\frac{2F}{df_{Error}}} \qquad (9)$$

This resulting $\hat{F}(2, df)$ essentially describes the magnitude of the change in level and trend introduced by the treatment.
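A minimal sketch of the $\Delta R^2$-to-effect-size computation just described (our code, not Center et al.'s; the series `y` and `n1` are invented, and the degrees of freedom follow the reading of Equation 8 given above):

```python
import numpy as np

def r_squared(y, X):
    """R-squared from an OLS fit of y on X (X already contains the intercept column)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    return 1.0 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

# Toy AB series: n1 = 5 baseline points followed by 5 treatment points.
y = np.array([2.0, 3.0, 2.0, 4.0, 3.0, 5.0, 6.0, 7.0, 6.0, 8.0])
n1, n = 5, len(y)
T = np.arange(1, n + 1, dtype=float)       # time points 1..n
D = (T > n1).astype(float)                 # 0 in baseline, 1 in treatment

X_full = np.column_stack([np.ones(n), T, D, (T - n1) * D])  # Equation 6
X_rest = np.column_stack([np.ones(n), T])                   # Equation 7
R2_full, R2_rest = r_squared(y, X_full), r_squared(y, X_rest)

df_full, df_rest = n - 4, n - 2            # error df of each model
F = ((R2_full - R2_rest) / (df_rest - df_full)) / ((1 - R2_full) / df_full)  # Eq. 8
es = 2.0 * np.sqrt(2.0 * F / df_full)      # Equation 9, with df_Error = df_full
print(round(R2_full - R2_rest, 3), round(F, 2), round(es, 2))
```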
Faith et al. (1996) suggested a slight modification to Center et al.'s (1985–1986) suggested metric; however, there are two problems with what was suggested. A fundamental problem is that the piecewise regression used by both Center et al. (1985–1986) and Faith et al. (1996) is better parameterized using Huitema and McKean's (2000) model. In this model, $[T_t - (n_1 + 1)]D_t$ is used instead of $(T_t - n_1)D_t$ (in Equation 6):

$$Y_t = \beta_0 + \beta_1 T_t + \beta_2 D_t + \beta_3 [T_t - (n_1 + 1)] D_t + e_t \qquad (10)$$

In this model, the coefficient $\beta_0$ provides an estimate of the baseline phase's intercept (the value of the outcome when $T = 0$ and $D = 0$). The coefficient $\beta_1$ provides an estimate of the linear trend in the outcome during the baseline phase (i.e. the change in $Y$ for a change of 1 in $T$). $\beta_2$ provides an estimate of the difference between the intercept predicted based on the linear trend in the intervention data and the intercept predicted for the first intervention time point (when $T = n_1 + 1$) based on the baseline data. The $\beta_3$ coefficient (of the interaction term) provides an estimate of the change in the slope for the treatment data versus the baseline data. The $\beta_2$ and $\beta_3$ coefficients thus provide valuable information that can be used to describe a treatment's effect on the level and slope, respectively.
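To make these coefficient interpretations concrete, this sketch (ours; the toy series is constructed so the answers are exact) builds the Equation 10 design matrix and recovers the four coefficients by OLS:

```python
import numpy as np

# Toy AB series: baseline follows 0.5 + 0.5*T; treatment jumps and steepens.
y = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 6.0, 7.0, 8.0, 9.0, 10.0])
n1, n = 5, len(y)
T = np.arange(1, n + 1, dtype=float)
D = (T > n1).astype(float)

# Equation 10: Y_t = b0 + b1*T_t + b2*D_t + b3*[T_t - (n1 + 1)]*D_t + e_t
X = np.column_stack([np.ones(n), T, D, (T - (n1 + 1)) * D])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

print(b0, b1)  # baseline intercept and trend: 0.5 and 0.5 here
print(b2)      # level shift at the first treatment point (T = n1 + 1): 2.5 here
print(b3)      # change in slope, treatment versus baseline: 0.5 here
```

With Center et al.'s $(T_t - n_1)D_t$ coding, the same data would yield a different $\beta_2$, because the level shift would then be evaluated at $T = n_1$ rather than at the first treatment observation.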

Another fundamental problem with Center et al.'s (1985–1986) and Faith et al.'s (1996) metric is that the use of a single effect-size estimate to describe a treatment's effect on both level and slope does not seem optimal. It would seem more appropriate to calculate one metric for an intervention's effect on the slope and another metric to describe the effect on the level of the outcome. Lastly, if any of the models used do not sufficiently explain autocorrelation in residuals, then this could negatively impact the resulting metrics as accurate estimates of effect size.

Still other metrics have been suggested, although problems with these have also been identified (for example, by Blumberg, 1984; Crosbie, 1995; and Gorman & Allison, 1996, all of whom have pointed out serious problems with the C statistic of Tryon, 1982). Crosbie (1993, 1995) developed a procedure designed to analyze ITS data such as is encountered in SSEDs. Crosbie wrote a program that uses an alternative estimate of the autocorrelation designed to correct for its bias with small sample sizes. The program, ITSACORR, provides two indices. It includes tests of the change in intercept between baseline and treatment phases and of the change in slope between the two phases while modeling potential autocorrelation. On the basis of the results of a simulation study, Crosbie (1993) found evidence supporting the usefulness of ITSACORR for estimating treatment effects for two-phase, short ITS data sets. More recently, however, Huitema (2004) described some potential problems with the setting of parameters for the design matrix assumed in ITSACORR. The problems arise whenever there might be a trend during the baseline phase. The comparison of intercepts under ITSACORR involves a comparison of the predicted measure at the first baseline time point with the predicted value for the first measure in the treatment phase. When there is a trend during baseline, these intercepts would be expected to differ regardless of whether there is a treatment effect or not. Another problem is that the value of the intercept predicted for the treatment phase will be a function of the autocorrelation. When no trends exist in the baseline data, ITSACORR has been found to function well.

Several other researchers have also explored the use of a pair of indices to describe a treatment's effect on slope and on level (Beretvas & Chung, 2007; van den Noortgate & Onghena, 2003). Beretvas and Chung used Center et al.'s (1985–1986) $\Delta R^2$ metric idea paired with Huitema and McKean's (2000) piecewise regression equation (see Equation 10) to derive their pair of effect sizes. Specifically, for their effect size describing the effect of an intervention on the level of an outcome, the authors calculated the $\Delta R^2$ for the full piecewise regression equation (Equation 10) versus the following restricted regression equation:

$$Y_t = \beta_0 + \beta_1 T_t + \beta_2 [T_t - (n_1 + 1)] D_t + e_t \qquad (11)$$

which included no change-in-level coefficient. The associated F-ratio statistic testing the significance of this $\Delta R^2$ was then calculated using Equation 8. The effect size describing the treatment's effect on the level was calculated by substituting the resulting F-ratio into the following equation:

$$\hat{F}_{\Delta,Level} = 2\sqrt{\frac{F}{df_{Error}}} \qquad (12)$$

Note that, because there is only one parameter added to the full model (Equation 10) over the restricted model (Equation 11), the df associated with this F-ratio is only 1 (see Equation 12 versus Equation 9).

The metric describing the effect of the intervention on the slope is calculated in a similar way. The $\Delta R^2$ is calculated for the full piecewise regression model in Equation 10 versus the following restricted piecewise regression model:

$$Y_t = \beta_0 + \beta_1 T_t + \beta_2 D_t + e_t \qquad (13)$$

which assumes no change in slope as a result of the intervention. The F-ratio testing the change in $R^2$ is calculated using Equation 8, and the metric describing the effect of the intervention on the slope is calculated in the same way as in Equation 12:

$$\hat{F}_{\Delta,Slope} = 2\sqrt{\frac{F}{df_{Error}}} \qquad (14)$$
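Assembling Equations 8 and 10–14 into one routine, here is a minimal sketch (ours; the function name and toy series are invented, estimation is plain OLS, ignoring the autocorrelation adjustment the authors address next) of the pair of effect sizes:

```python
import numpy as np

def r2(y, X):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    return 1.0 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

def level_and_slope_effect_sizes(y, n1):
    """Pair of DeltaR2-based effect sizes in the style of Equations 10-14.

    The full piecewise model (Eq. 10) is compared with a no-change-in-level
    model (Eq. 11) and a no-change-in-slope model (Eq. 13); each DeltaR2 is
    turned into an F-ratio (Eq. 8, one added parameter) and then into
    2*sqrt(F/df_error) (Eqs. 12 and 14).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    T = np.arange(1, n + 1, dtype=float)
    D = (T > n1).astype(float)
    level, slope = D, (T - (n1 + 1)) * D
    X_full = np.column_stack([np.ones(n), T, level, slope])   # Equation 10
    df_error = n - 4
    R2_full = r2(y, X_full)
    out = {}
    for name, X_rest in [("level", np.column_stack([np.ones(n), T, slope])),   # Eq. 11
                         ("slope", np.column_stack([np.ones(n), T, level]))]:  # Eq. 13
        F = (R2_full - r2(y, X_rest)) / ((1.0 - R2_full) / df_error)
        out[name] = 2.0 * np.sqrt(F / df_error)
    return out

y = [2.0, 3.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0, 9.0]
print(level_and_slope_effect_sizes(y, n1=5))
```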
Beretvas and Chung (2007) recommend testing for autocorrelation in the residuals (remaining once the full model has been estimated). If significant autocorrelation is found, the authors recommend using auto-regression to estimate the relevant regression equations (Equations 10, 11, and 13). Otherwise, OLS estimation could be used. The authors conducted a simulation study to assess the functioning of their metric and found that it worked well for scenarios in which the model fully explained the autocorrelation. The metric worked less well in scenarios possibly typical of some SSEDs with small data sets, in which there was residual autocorrelation even when it was modeled using auto-regression.

Van den Noortgate and Onghena (2003) suggested calculating two metrics to describe an intervention's effect on the level and linear growth in an outcome. They encouraged estimating the full regression model in Equation 10 using OLS and suggested standardizing the change-in-level and change-in-slope coefficients (i.e. $\beta_2$ and $\beta_3$) by dividing each by the square root of the mean squared error. The authors suggest synthesizing the resulting standardized effect-size estimates using multivariate multilevel modeling (while correcting the covariance matrix for the two effect sizes by dividing its elements by the mean squared error). This procedure addresses several of the concerns associated with some of the other metrics. However, the procedure was applied only to a real data set and was not empirically evaluated. In addition, as with Beretvas and Chung's (2007) metrics, potential autocorrelation in residuals could affect the precision and accuracy of these metrics when used with small data sets. Thus, when calculating these metrics, the full piecewise regression model (in Equation 10) could be estimated by modeling the potential autocorrelation in residuals. It is anticipated, however, that with the small data sets typically encountered in SSED studies, estimation will still be problematic.

There is clearly a lack of consensus about the best metric to use to summarize SSED results. And, as is evident from this review of methodological studies' discussions of these metrics, there are problems associated with each of them. However, applied meta-analyses are still being conducted on SSED results. It is important to see which metrics are most commonly used by practitioners despite their associated methodological caveats. This could help inform future research into refinements for the relevant metrics. Given the plethora of metrics that have been suggested for use in describing SSED results, a survey of relatively recent SSED meta-analyses was conducted. A description of the survey and its results follows.

SSED-STUDY META-ANALYSES NARRATIVE REVIEW RESULTS

The PsycINFO, MEDLINE and ERIC databases were each searched using the keywords 'meta-analysis', 'review', or 'synthesis' paired with 'single-case', 'single-subject', or 'PND' for the years 1985 to 2005. Applied meta-analyses were included that incorporated computation of some form of quantitative effect size for single-n studies. In addition to the studies identified in the search, relevant applied meta-analyses known to the authors were also surveyed. Only those studies that clearly defined the type of metric calculated were included in the review. If an applied meta-analysis was unclear about any of the other meta-analytic steps, it was still included.

Metrics used

The database searches led to the identification of 279 studies in PsycINFO, 65 in ERIC, and 405 in MEDLINE. Redundancy across databases was removed and the remaining subset of studies was assessed to identify which of them involved applied meta-analyses that synthesized results from SSEDs using clearly described quantitative metrics.

The resulting 21 studies were supplemented with another 4 applied single-n meta-analyses that met the criteria for inclusion. Thus, a total of 25 meta-analyses were summarized.

The most popular metric used was the PND index, which was commonly used along with the percentage of zero data points (PZD). The PND and the PZD were used in 12 of the 25 meta-analyses. The next most popular metric, used in 7 of the studies, was the $\widehat{SMD}$, in various versions. This metric was typically obtained by dividing the difference in outcome means of the selected AB phases by the standard deviation pooled across the data points in the baseline phase. Three studies used a version of the mean baseline reduction procedure, and three other studies used results from analyses using Crosbie's ITSACORR software. Two studies used metrics based on an incorrectly specified piecewise regression model.

Of the 25 meta-analyses, 7 used multiple indices to describe their results. Of these 7, however, 4 used the PZD with the PND, which tend to provide very similar summaries. Campbell (2003) and Bonner and Barnett (2004) used a version of the standardized mean difference and the PND to assess treatments' effectiveness. Maughan, Christiansen and Jenson (2005) standardized the results from ITSACORR and also used the $\widehat{SMD}$. Marquis et al. (2000) used three effect sizes, including one very like the mean baseline reduction, the $\widehat{SMD}$, and a regression-based effect size.

Results of this survey indicated that most SSED meta-analysts are using the simplest indicators to synthesize results (i.e. the PND and the standardized mean difference). Given the typically small data sets involved in these meta-analyses, only simple metrics should really be used. However, use of these particular indices has been criticized, as noted above.

In addition to investigating the types of metrics that were used to describe studies' results, other steps in the meta-analytic process were also briefly summarized. The foundation of most SSEDs entails a comparison between results in a baseline phase and results in an intervention phase. Nevertheless, the simple AB design in and of itself is not used, due to the resulting validity threats (see, for example, Kazdin, 1982). Instead, more complex designs, such as multiple-baseline, reversal, and alternating-treatment designs, and the like, are used. Interpretation of the results from these more complex designs, however, still focuses on the pattern in an intervention phase compared with the pattern in the relevant baseline phase. The problem is, of course, that there are frequently multiple treatment phases and even multiple baseline phases. In an ABAB design, for example, a researcher could calculate two indices to describe the treatment's effect on a single subject. (One index would describe the effect from the first baseline to the first treatment phase and the other would describe the effect for the second AB phase.) How are these two indices then used in the meta-analysis? Are they treated as independent data points? They should not be treated as independent, because they represent the treatment's effect on the same person. In a design including the sequence of phases ABC, are two effect sizes calculated, one describing results in phase B versus the baseline phase (A) and another describing the pattern of outcome scores in phase C versus the same baseline phase? How do meta-analysts handle these two dependent metrics?

Typically, a primary study does not just investigate the effect of an intervention on a single subject. In a multiple-baseline study, for example, the researcher might assess a treatment's effectiveness for three participants who are part of the same multiple-baseline study. This would lead to the calculation of three outcomes. How are these three metrics used in an ensuing meta-analysis? Are they treated as independent? It seems likely that there is some degree of dependence among these three metrics, given the commonality in the participants' experiences: their being assessed by the same researcher, possibly being treated by the same caregiver, and their assessment taking place in the same setting.

It is also possible that the intervention's effect is not evaluated for just a single outcome. Quite frequently, multiple measures are used to assess the effect of an intervention. If, for example, two kinds of social behaviors are tallied for a single participant, then two outcomes could be calculated for the participant. These metrics cannot be assumed to be independent, because they describe the same participant and the measures being assessed are probably correlated.

Little methodological research addresses how SSED meta-analysts should handle multiple treatments, participants, or outcomes. In other words, it is unclear how the dependence of outcomes within each primary study should be handled. It seems important to consider how to handle these issues, and a preliminary step involves assessing how they are currently being handled.
Techniques used to deal with the dependence of outcomes yielded by the same metric

Given the lack of methodological focus on the issue of dependence of multiple outcomes yielded by the same metric within studies, the majority of meta-analyses were not very clear about how this issue was dealt with. Of the 25 meta-analyses that were reviewed, 10 did not make explicit how the inevitable multiple-treatment dependence was handled. Some seemed to have treated the multiple outcomes per primary study as independent, but it was not completely clear. The most commonly described method (in 7 meta-analyses) for handling multiple-treatment dependence was to average the multiple indices (one per treatment phase versus preceding baseline phase) together. This meant taking a simple mean of $\widehat{SMD}$s (e.g. Swanson & Sachse-Lee, 2000) or using the mean or median of the PNDs (e.g. Algozzine, Browder, & Karvonen, 2001; Schlosser & Lee, 2000). In six of the meta-analyses, the multiple-treatment outcomes per study were treated as independent, thereby ignoring the possible dependence. In one study (Skiba, Casey, & Center, 1985–1986), only the outcome for the first treatment phase was used, to counter possible carry-over effects. And one study used the outcome associated with the largest-dosage treatment (Allison, Faith, & Franklin, 1995).

The techniques used to handle multiple measures per study were also reviewed. The majority of the studies (n = 13) summarized results separately for each of the related outcomes. For example, Skiba et al. (1985–1986) presented metrics for each of the following different, but related, types of behavior: withdrawn, noncompliant, management-problem, off-task, appropriate, and social-interaction behaviors. In eight of the meta-analyses it was very unclear how the dependence among multiple related outcomes was handled. Four studies calculated simple averages across the multiple measures per participant.

When multiple participants per study were encountered, a similar set of techniques was used to handle the resulting within-study dependencies. Almost half of the studies (n = 12) ignored the dependence and treated each participant's outcome as independent. Four of the studies calculated a single outcome for each study by aggregating the participants' outcomes. Three of the studies (Carey & Matyas, 2005; Scholz & Ott, 2000; Wurthmann, Klieser, & Lehmann, 1996) used meta-analytic techniques to summarize the results gathered in their study and treated each participant's results as independent. For example, Scholz and Ott (2000) synthesized the 21 p values from ITSACORR analyses of 21 participants' data. In six studies, how the dependence resulting from multiple participants per study was handled was not clearly described.

In terms of the analyses conducted using the resulting outcomes, the majority of meta-analyses reported average (mean or median) metric values across the set of metrics (either per study or per participant, as noted above). Sample or study characteristics were explored as moderators of outcomes in several studies (n = 6), using either descriptive or inferential statistics. All studies included a table listing the outcomes by study (and participant, treatment, and outcome, as mentioned above).
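For concreteness, the most commonly reported strategy amounts to something like the following sketch (ours; the study labels and PND values are invented):

```python
from statistics import mean, median

# Hypothetical PND values: several dependent indices per primary study
# (e.g. one per treatment phase versus its preceding baseline phase).
pnd_by_study = {
    "Study A": [80.0, 100.0, 60.0],
    "Study B": [90.0, 70.0],
}

# Collapse to one value per study before any cross-study summary.
per_study_mean = {s: mean(v) for s, v in pnd_by_study.items()}
per_study_median = {s: median(v) for s, v in pnd_by_study.items()}
print(per_study_mean)    # {'Study A': 80.0, 'Study B': 80.0}
print(per_study_median)  # {'Study A': 80.0, 'Study B': 80.0}
```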
DISCUSSION

As outlined in the introduction, there are multiple metrics that have been introduced and suggested for use with SSEDs. Unfortunately, there are methodological issues associated with the majority of these metrics. These issues are founded in the complex time-series nature of SSEDs and in the inevitable developmental trajectories of outcomes measured over time. A critical weakness of many of the metrics currently used in applied meta-analyses is that only a single metric, rather than multiple metrics, is used to describe a treatment's effect. Development leads to possible trends over time. If there is a trend in an outcome even without intervention, then the level of the outcome will inevitably change at a later (e.g. intervention-phase) time point. For example, a treatment might only be considered clinically significant if it produces immediate change in the dependent variable. For such studies, a summary of differences in means between baseline and treatment could be appropriate. More commonly, it is recognized that the effect of a treatment might entail more gradual improvement. For example, in developing communication behaviors in children with autism, treatment is expected to be associated with a gradual, positively accelerating trend. In this scenario, not only is the level of the outcome expected to change with the introduction of the treatment, so is the growth, or slope, of the outcome. Thus, a descriptor other than a single mean-shift metric would be needed to describe the treatment's effect. A single number could not effectively convey a change in both level and slope.

Other treatments might be designed to change outcomes that already naturally increase (or decrease) over time during baseline. Such treatments might be designed to accelerate (or decelerate) the trend in the outcome that already exists without intervention. To identify the effectiveness of these kinds of treatments, again, a single metric cannot convey the expected change in level and trend. The current paper has focused solely on the potential for linear trends. There is, of course, a possibility of curvilinear trends for certain outcomes and especially of asymptotic trends (Shadish & Rindskopf, 2007). This adds further complexity to the models needed to describe an intervention's effect, and is an important area of continued research. It seems that more specific guidelines are needed outlining which metric to use for which kind of treatment-effect investigation. In addition, while there are a host of possible alternatives, the emphasis needs to be on matching the method with the anticipated trends in the data.

The other challenge to the identification of an appropriate metric for summarizing SSED results is autocorrelation. Autocorrelation changes the variability of the standardized metrics associated with SSEDs over what would be expected with large-n designs. Traditional tests of autocorrelation have been found to be biased when used with data sets as small as those typically encountered in the SSED literature. More recent research has identified some modified test statistics that perform well in identifying lag-one autocorrelation for small data sets (Huitema & McKean, 2000; Riviello & Beretvas, 2008). It is hoped that future research will evolve from better identification of, to potential correction for, autocorrelation. This could then lead to metric formulations that control for the autocorrelation resulting from repeated measures of individuals.

Additional challenges to the meta-analysis of results from SSEDs have also been noted and surveyed in the current study. A primary challenge involves the inevitable dependence of metrics within the studies being synthesized. As noted above, this dependence can result for individual participants' data in the following ways: (a) from calculating multiple outcomes with repeated use of single-baseline-phase data (e.g. for a design incorporating a pattern such as ABC); (b) from multiple treatments per study (e.g. a design including the ABAC pattern); or (c) from multiple outcome measures per participant. Multiple dependent outcomes could also be associated with a primary study's results because the study involves multiple participants (e.g. multiple-baseline designs). From the results of the narrative review, it seems clear that a number of techniques are being used to handle this dependence. These techniques match the various techniques used to handle multiple dependent outcomes per study in large-n meta-analyses. While several ad hoc techniques are used with large-n meta-analyses (including averaging together each study's effect sizes, selecting a 'best' single effect size for each study, etc.), multivariate pooling through generalized least squares estimation (Becker, 1992) is also frequently used. Unfortunately, generalized least squares cannot yet be used with SSED results, because of the unknown sampling distributions of most of the SSED metrics. This again underscores the importance of identifying SSED metrics and their associated sampling distributions. This could then lead to better-founded techniques for handling within-study dependence. Identification of optimal metrics and their sampling distributions will also provide accurate variance estimates that could then be used in the weighting of SSED metrics for calculating pooled estimates.
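For reference, the weighting alluded to here is the standard inverse-variance scheme used with large-n effect sizes (e.g., Hedges & Olkin, 1985); writing it out makes clear why it presupposes a known sampling variance for each metric:

$$\overline{ES} = \frac{\sum_{j=1}^{k} w_j\, ES_j}{\sum_{j=1}^{k} w_j}, \qquad w_j = \frac{1}{\widehat{\mathrm{Var}}(ES_j)}, \qquad \mathrm{SE}\big(\overline{ES}\big) = \sqrt{\frac{1}{\sum_{j=1}^{k} w_j}}$$

Until the sampling distributions of SSED metrics are derived, the $w_j$ in this formula are unavailable, which is precisely the obstacle noted above.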
CAVEATS AND RECOMMENDATIONS FOR SSED META-ANALYSTS

In the culture of evidence-based practice and accountability, many researchers and practitioners are turning to meta-analysis to provide the relevant evidence for best practice. The meta-analysis tradition was founded originally to summarize treatment effects from large-n studies, most typically involving a comparison of groups at a single time point. As the use of SSED studies increased, meta-analytic researchers tried to impose the effect sizes used with large-n studies on results from single-n studies. Unfortunately, while SSED studies typically involve a comparison of outcome scores for an individual under treatment with his or her scores during a baseline phase, the pattern of these scores is assessed over time for an individual. An effect size needed to describe changes in an individual over time will not have the same metric as an effect size comparing groups of individuals at a single time point. In fact, only for the simplest pattern of change anticipated with a treatment (in which there is no trend during baseline and treatment and the treatment is designed only to change the outcome's level) could there be some correspondence between large-n and SSED summaries. Both designs could then be used to detect a change in outcome level. However, even for this simplest scenario, the variability underlying the estimates, and thus the resulting metrics of the associated effect sizes, will differ as a result of the studies' designs. This subtle difference between large-n and SSED studies lies at the root of what complicates the formulation of a useful effect size for use with single-n data. And, more importantly, researchers should not quantitatively synthesize results from large-n and single-n studies (Kavale, Mathur, Forness, Quinn, & Rutherford, 2000). Separate large-n and SSED syntheses should be conducted.

Using data from an applied meta-analysis of school-based interventions' effect on communication skills, Wendt (2008) and Beretvas, Chung, Machalicek and Riviello (2008) compared various non-parametric and parametric effect-size estimates, respectively. Different inferences about the intervention's effectiveness were made on the basis of the metric that was used to describe the studies' results. This does not seem surprising, given the differing sources of the criticisms of SSED metrics (e.g. autocorrelation, trend in baseline, need for multiple metrics). The authors of both papers found that, when the effect-size-based inferences did not converge, it was possible to identify the source of the divergence from the primary study's data or design. This strongly supports triangulation through meta-analysts' use of multiple SSED metrics until some optimal metric has been derived. When divergent inferences result, the meta-analyst is strongly encouraged to identify the source of the differences by close examination of each primary study.

Finally, no mention has been made of criteria that could be used to categorize metric values, mimicking Cohen's (1977) 'small', 'moderate' and 'large' cut-offs. Until a metric (and sampling distribution) is derived that better matches the pattern of the SSED data being synthesized, numerical cut-offs are perhaps less important than the consideration of clinical significance (Shogren, Faggella-Luby, Bae, & Wehmeyer, 2004).

It is hoped that methodological research focused on synthesizing results from SSEDs will continue to evolve, and that some of these challenges will eventually be overcome. In the meantime, careful descriptive summaries of studies' results can and should be conducted.
REFERENCES

References used in the narrative literature review are marked with an asterisk.

*Algozzine, B., Browder, D., & Karvonen, M. (2001). Effects of interventions to promote self-determination for individuals with disabilities. Review of Educational Research, 71, 219–277.
*Allison, D. B., Faith, M. S., & Franklin, R. D. (1995). Antecedent exercise in the treatment of disruptive behavior: A meta-analytic review. Clinical Psychology: Science and Practice, 2, 279–304.
Becker, B. J. (1992). Using results from replicated studies to estimate linear models. Journal of Educational Statistics, 17, 341–362.
Beretvas, S. N., & Chung, H. (2007, May). R-squared change effect size estimates for single-subject meta-analyses. Paper presented at the 7th Annual International Campbell Collaboration Colloquium, London, England.
Beretvas, S. N., Chung, H., Machalicek, W. A., & Riviello, C. (2008, May). Computation of regression-based effect size measures. Paper presented at the 8th Annual International Campbell Collaboration Colloquium, Vancouver, Canada.
Blumberg, C. J. (1984). Comments on "A simplified time-series analysis for evaluating treatment interventions". Journal of Applied Behavior Analysis, 17, 539–542.
*Bonner, M., & Barnett, D. W. (2004). Intervention-based school psychology services: Training for child-level accountability; preparing for program-level accountability. Journal of School Psychology, 42, 23–43.
Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: Forecasting and control (2nd ed.). San Francisco, CA: Holden-Day.
Busk, P. L., & Marascuilo, L. A. (1988). Autocorrelation in single-subject research: A counterargument to the myth of no autocorrelation. Behavioral Assessment, 10, 229–242.
Busk, P. L., & Marascuilo, L. A. (1992). Statistical analysis in single-case research: Issues, procedures, and recommendations, with applications to multiple behaviors. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and education (pp. 159–185). Hillsdale, NJ: Erlbaum.
*Campbell, J. M. (2003). Efficacy of behavioral interventions for reducing problem behavior in persons with autism: A quantitative synthesis of single-subject research. Research in Developmental Disabilities, 24, 120–138.
*Carey, L. M., & Matyas, T. A. (2005). Training of somatosensory discrimination after stroke: Facilitation of stimulus generalization. American Journal of Physical Medicine & Rehabilitation, 84, 428–442.
Center, B. A., Skiba, R. J., & Casey, A. (1985–1986). A methodology for the quantitative synthesis of intra-subject design research. Journal of Special Education, 19, 387–400.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (rev. ed.). Hillsdale, NJ: Erlbaum.
Cooper, H., & Hedges, L. V. (Eds.). (1994). The handbook of research synthesis. New York, NY: Russell Sage Foundation.
Crosbie, J. (1993). Interrupted time-series analysis with single-subject data. Journal of Consulting and Clinical Psychology, 61, 966–974.
Crosbie, J. (1995). Interrupted time-series analysis with short series: Why it is problematic; how it can be improved. In J. M. Gottman (Ed.), The analysis of change (pp. 361–395). Mahwah, NJ: Erlbaum.
*Didden, R., Duker, P. C., & Korzilius, H. (1997). Meta-analytic study on treatment effectiveness for problem behaviors with individuals who have mental retardation. American Journal on Mental Retardation, 101, 387–399.
*DuPaul, G. J., & Eckert, T. L. (1997). The effects of school-based interventions for attention deficit hyperactivity disorder: A meta-analysis. School Psychology Review, 26, 5–27.
Faith, M. S., Allison, D. B., & Gorman, B. S. (1996). Meta-analysis of single-case research. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 245–277). Hillsdale, NJ: Erlbaum.
Galassi, J. P., & Gersh, T. L. (1993). Myths, misconceptions and missed opportunity: Single-case designs and counseling psychology. Journal of Counseling Psychology, 40, 525–531.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3–8.
Glass, G. V., Willson, V. W., & Gottman, J. M. (1975). Design and analysis of time-series experiments. Boulder, CO: Colorado Associated University Press.
Gorman, B. S., & Allison, D. B. (1996). Statistical alternatives for single-case designs. In R. D. Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 159–214). Hillsdale, NJ: Erlbaum.
Gorsuch, R. L. (1983). Three methods for analyzing limited time-series (N of 1) data. Behavioral Assessment, 5, 141–154.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
Hershberger, S. L., Wallace, D. D., Green, S. G., & Marquis, J. G. (1999). Meta-analysis of single-case designs. In R. H. Hoyle (Ed.), Statistical strategies for small sample research (pp. 107–132). Thousand Oaks, CA: Sage.
Huitema, B. E. (1985). Autocorrelation in applied behavior analysis: A myth. Behavioral Assessment, 7, 107–110.
Huitema, B. E. (2004). Analysis of interrupted time-series experiments using ITSE: A critique. Understanding Statistics, 3, 27–46.
Huitema, B. E., & McKean, J. W. (1998). Irrelevant autocorrelation in least-squares intervention models. Psychological Methods, 3, 104–116.
Huitema, B. E., & McKean, J. W. (2000). Design specification issues in time-series intervention models. Educational and Psychological Measurement, 60, 38–58.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage Publications.
Jones, R. R., Weinrott, M., & Vaught, R. S. (1978). Effects of serial dependency on the agreement between visual and statistical inference. Journal of Applied Behavior Analysis, 11, 277–283.
Kahng, S., Iwata, B. A., & Lewin, A. B. (2002). Behavioral treatment of self-injury, 1964 to 2000. American Journal on Mental Retardation, 107, 212–221.
Kavale, K. A., Mathur, S. R., Forness, S. R., Quinn, M. M., & Rutherford, R. B. (2000). Right reason in the integration of group and single-subject research in behavioral disorders. Behavioral Disorders, 25, 142–157.
Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York, NY: Oxford University Press.
Lundervold, D., & Bourland, G. (1988). Quantitative analysis of treatment of aggression, self-injury, and property destruction. Behavior Modification, 12, 590–617.
Ma, H. (2006). An alternative method for quantifying synthesis of single-subject research: Percent of data points exceeding the median. Behavior Modification, 30, 598–617.
Marquis, J. G., Horner, R. H., Carr, E. G., Turnbull, A. P., Thompson, M., Behrens, G. A., et al. (2000). A meta-analysis of positive behavior support. In R. M. Gersten, E. P. Schiller, & S. Vaughn (Eds.), Contemporary special education research: Syntheses of the knowledge base on critical instructional issues (pp. 137–178). Mahwah, NJ: Erlbaum.
Mastropieri, M. A., & Scruggs, T. E. (1985–1986). Early intervention for socially withdrawn children. Journal of Special Education, 19, 429–441.
Mathur, S. R., Kavale, K. A., Quinn, M. M., Forness, S. R., & Rutherford, R. B. (1998). Social skills interventions with students with emotional and behavioral problems: A quantitative synthesis of single-subject research. Behavioral Disorders, 23, 193–201.
Maughan, D. R., Christiansen, E., & Jenson, W. R. (2005). Behavioral parent training as a treatment for externalizing behaviors and disruptive behavior disorders: A meta-analysis. School Psychology Review, 34, 267–286.
Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis with repeated measures and independent-groups designs. Psychological Methods, 7, 105–125.
Nourbakhsh, M. R., & Ottenbacher, K. J. (1994). The statistical analysis of single-subject data: A comparative examination. Physical Therapy, 74, 768–776.
Pankratz, A. (1983). Forecasting with univariate Box–Jenkins models: Concepts and cases. New York, NY: Wiley.
Parker, R. I., & Hagan-Burke, S. (2007). Single-case research results as clinical outcomes. Journal of School Psychology, 45, 637–653.
Parker, R. I., Hagan-Burke, S., & Vannest, K. (2007). Percentage of all non-overlapping data (PAND): An alternative to PND. Journal of Special Education, 40, 194–204.
Parsonson, B. S., & Baer, D. M. (1992). The visual analysis of data, and current research into the stimuli controlling it. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and education (pp. 15–40). Hillsdale, NJ: Erlbaum.
Riviello, C., & Beretvas, S. N. (2008). Detecting lag-one autocorrelation in interrupted time series designs with small sample sizes. Manuscript submitted for publication.
Schlosser, R. W., & Lee, D. L. (2000). Promoting generalization and maintenance in augmentative and alternative communication: A meta-analysis of 20 years of effectiveness research. Augmentative and Alternative Communication, 16, 208–226.
Schlosser, R. W., Lee, D., & Wendt, O. (in press). The percentage of non-overlapping data (PND): A systematic review of reporting characteristics in systematic reviews and meta-analyses. Evidence-Based Communication Assessment and Intervention.
Scholz, O. B., & Ott, R. (2000). Effect and course of tape-based hypnotherapy in subjects suffering from insomnia. Australian Journal of Clinical Hypnotherapy and Hypnosis, 21, 96–114.
Scotti, J. R., Evans, I. M., Meyer, L. H., & Walker, P. (1991). A meta-analysis of intervention research with problem behavior: Treatment validity and standards of practice. American Journal on Mental Retardation, 96, 233–256.
Scruggs, T. E., Mastropieri, M. A., & Casto, G. (1987). The quantitative synthesis of single-subject research: Methodology and validation. Remedial and Special Education, 8, 24–33.
Scruggs, T. E., Mastropieri, M. A., Cook, S. B., & Escobar, C. (1986). Early intervention for children with conduct disorders: A quantitative synthesis of single-subject research. Behavioral Disorders, 11, 260–271.
Scruggs, T. E., Mastropieri, M. A., Forness, S. R., & Kavale, K. A. (1988). Early language intervention: A quantitative synthesis of single-subject research. Journal of Special Education, 22, 259–283.
Scruggs, T. E., Mastropieri, M. A., & McEwen, I. (1988). Early intervention for developmental functioning: A quantitative synthesis of single-subject research. Journal of the Division for Early Childhood, 12, 359–367.
Shadish, W. R., & Rindskopf, D. M. (2007). Methods for evidence-based practice: Quantitative synthesis of single-subject designs. In G. Julnes & D. J. Rog (Eds.), Informing federal policies on evaluation method: Building the evidence base for method choice in government sponsored evaluation (pp. 95–109). San Francisco, CA: Jossey-Bass.
Shogren, K. A., Faggella-Luby, M. N., Bae, S. J., & Wehmeyer, M. L. (2004). The effect of choice-making as an intervention for problem behavior: A meta-analysis. Journal of Positive Behavior Interventions, 6, 228–237.
Skiba, R. J., Casey, A., & Center, B. A. (1985–1986). Nonaversive procedures in the treatment of classroom behavior problems. Journal of Special Education, 19, 459–481.
Swanson, H. L., & Sachse-Lee, C. (2000). A meta-analysis of single-subject-design intervention research for students with LD. Journal of Learning Disabilities, 33, 114–136.
Swanson, H. L., O'Shaughnessy, T. E., & McMahon, C. M. (1998). A selective synthesis of single subject design intervention research on students with learning disabilities. Advances in Learning and Behavioral Disabilities, 12, 79–126.
Tryon, W. W. (1982). A simplified time-series analysis for evaluating treatment interventions. Journal of Applied Behavior Analysis, 15, 423–429.
van den Noortgate, W., & Onghena, P. (2003). Hierarchical linear models for the quantitative integration of effect sizes in single-case research. Behavior Research Methods, Instruments, & Computers, 35, 1–10.
van den Noortgate, W., & Onghena, P. (in press). A multi-level analysis of single-subject studies. Evidence-Based Communication Assessment and Intervention.
Wendt, O. (2008, May). Computation of non-regression-based effect size metrics. Paper presented at the 8th Annual International Campbell Collaboration Colloquium, Vancouver, Canada.
White, O. R., Rusch, F. R., Kazdin, A. E., & Hartmann, D. P. (1989). Applications of meta-analysis in individual subject research. Behavioral Assessment, 11, 281–296.
Wurthmann, C., Klieser, E., Lehmann, E., & Krauth, J. (1996). Single-subject experiments to determine individually differential effects of anxiolytics in generalized anxiety disorder. Neuropsychobiology, 33, 196–201.
Xin, Y. P., Grasso, E., Dipipi-Hoy, C. M., & Jitendra, A. (2005). The effects of purchasing skill instruction for individuals with developmental disabilities: A meta-analysis. Exceptional Children, 71, 379–400.