
Research in Nursing & Health, 1998, 21, 557-562

Focus on Quantitative Methods

Uses and Abuses of the Analysis of Covariance


Steven V. Owen,1* Robin D. Froman2

1 Bureau of Educational Research, Box U-4, University of Connecticut, Storrs, CT 06269
2 Center for Nursing Research, University of Connecticut, Storrs, CT

Received 30 March 1998; accepted 11 August 1998

Abstract: The analysis of covariance (ANCOVA) is a powerful analytic tool, but there continue to be abuses of the method. We review assumptions and illustrate legitimate uses of ANCOVA, and summarize statistical packages' approaches to the method. Finally, we consider how ANCOVA is used in contemporary nursing research. © 1998 John Wiley & Sons, Inc. Res Nurs Health 21:557-562, 1998

Keywords: analysis of covariance, ANCOVA

Correspondence to Steven V. Owen. *Associate Director. Director.
© 1998 John Wiley & Sons, Inc. CCC 0160-6891/98/060557-06

As many statistics books point out, the analysis of covariance (ANCOVA) has two primary purposes: (a) to improve the power of a statistical analysis by reducing error variance, and (b) to statistically equate comparison groups. The first purpose operates well when participants are randomly assigned to their groups. But using ANCOVA with intact or pre-existing groups can have the opposite effect, a reduction in statistical power. The second purpose usually accompanies nonrandom group comparisons, and analysts apply ANCOVA to make the group comparisons more fair. In this article, we review the merits and demerits of these claims for ANCOVA. More specifically, we explore various ANCOVA pitfalls that can deliver misleading results for the unwary analyst, and review appropriate uses of ANCOVA. We also show how statistical packages (BMDP, SPSS, SAS, and SYSTAT) differ in their approach to ANCOVA.

Though our focus is on the conventional ANOVA formulation, for researchers who subscribe to Cohen's (1968) idea that regression analysis can do (just about) anything, our remarks apply to regression models as well. In fact, regression models may be more vulnerable to ANCOVA problems because independent variables often serve as covariates whether or not the researcher intended them to take that role.

When Sir Ronald Fisher invented the ANCOVA model in the 1930s, he took random assignment and experimental control for granted. Fisher had been studying agricultural methods, and random assignment was easy to arrange. The point of his invention was to enhance the precision of the statistical analysis. Today, ANCOVA is used routinely with quasi-experimental data where treatments cannot (because of expense, ethical concerns, or general disruptiveness) be randomly assigned to participants. The inability to assign participants to treatments is particularly evident in health care research. For example, in comparing lung vital capacity in smokers and nonsmokers, participants self-select into the two comparison groups. If the researcher thinks that age might be a confounding variable, age might be assigned to a covariate role. Whether that decision is a good or bad one depends largely on two ANCOVA assumptions.

The first statistical assumption is that the covariate(s) is (are) uncorrelated with other independent variables. In the smoking example, is age correlated with the independent variable, groups? If the correlation is nonzero, then removing the variance associated with age will also remove some of


the variance associated with the grouping variable. This in effect leaves less of the dependent variable's (lung vital capacity) variance to be accounted for by the independent variable (smoking). Figure 1 illustrates the situation. Notice that the covariate, age, overlaps with smoking status (arrowed portion), absorbing some of smoking's relationship with lung vital capacity.

FIGURE 1. A covariate erodes the effect of another independent variable.

In the frequent case where ANCOVA is arranged specifically to equate groups that differ on some pretest measure, the analyst has automatically violated the assumption. Does that make any difference? Wu and Slakter (1989), discussing ANCOVA in nursing research, showed no hesitation in recommending the technique to adjust for pre-existing group differences, whereas Pedhazur and Schmelkin (1991, p. 283) remarked that the approach is "fraught with serious biases and threats to validity." Our position is more aligned with Pedhazur and Schmelkin's.

The second statistical assumption for ANCOVA is that the covariate(s) is (are) correlated with the dependent variable. When a covariate is a pretest and the dependent variable is the posttest, there should be a substantial correlation between the two (but see Pedhazur and Schmelkin [1991, pp. 283-284] for other potential problems with such a design). In the smoking example, age of course is not a pretest; rather, it is considered a proxy variable, that is, a convenient substitute for other underlying constructs. In a proxy role, age might represent longer opportunity for smoking and increased vulnerability to an earlier cultural acceptance of smoking. What is the correlation of age with the dependent variable, lung vital capacity? If the correlation is small, then the covariate adjustment is similarly small, and there will not be a noticeable improvement in power. There might even be a reduction in power, because the covariate uses a degree of freedom that had been assigned to the error term. That results in an increase in the mean square error (reducing power). When sample size is small to begin with, the covariate must be strong enough to compensate for the loss of the error degree of freedom. But even if the correlation between the covariate and the dependent variable is large, there may or may not be an improvement in power; it depends on meeting the first assumption.

EXAMPLE 1: REDUCING STATISTICAL POWER

The data here are from an example in the BMDP Manual, Volume 1 (Dixon, 1992), in which 40 participants run a mile, and then their pulse rates are measured. A 2 (Sex) by 2 (Smoking) ANOVA is arranged. Both main effects are significant: Sex F(1,36) = 37.33, p < .001, effect size¹ (partial η²) = .51; Smoking F(1,36) = 6.56, p = .01, effect size = .15. The interaction term is nonsignificant.

Things change when the data are rerun, using baseline pulse as a covariate. Once again, the main effects are significant: Sex F(1,35) = 21.81, p < .001, effect size (partial η²) = .38; Smoking F(1,35) = 5.71, p < .05, effect size = .14. Notice that the effect size for Sex has dropped substantially. That is because baseline pulse was correlated with Sex, violating assumption 1.

What about the substantive problem of interpretation? This involves the well-known problem of variance partitioning. Darlington (1968) and other early analysts described the problem clearly, and admitted no answers. Even today, Pedhazur (1997) has devoted an entire chapter to the issue, because variance partitioning is "widely used, mostly abused, in the social sciences for determining the relative importance of independent variables. . . . [In the last 15 years,] abuses of variance partitioning have not abated but rather increased" (p. 243). When variables covary, there is no satisfactory way to assign unique explanatory power to them individually. One can make odd-sounding statements that reveal how confusing the situation is. For example, "Adjusting for initial pulse rate, sex is associated with post-exercise pulse rate." What does it mean to hold pulse rate constant, as though everyone had the same initial pulse rate? What value is the hypothetical constant pulse rate? Could one choose a different pulse rate to hold constant? Back to Pedhazur: "Unfortunately, applications of ANCOVA in quasi-experimental and nonexperimental research are by and large not valid" (1997, p. 654).

¹ In ANCOVA, partial η² is defined as
\[ \text{partial } \eta^2 = \frac{\text{adjusted } SS_{\text{effect}}}{\text{adjusted } SS_{\text{effect}} + \text{adjusted } SS_{\text{error}}} \]
(Tabachnick & Fidell, 1996, p. 349).
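The effect sizes reported in Example 1 can be cross-checked from the F-ratios alone. Because F = (SS for effect / df for effect) / (SS for error / df for error), the ratio SS effect / (SS effect + SS error) can be rewritten as F × df effect / (F × df effect + df error). The sketch below applies that identity to the four F-ratios quoted above; the function name `partial_eta_squared` is ours and is not part of any statistical package.

```python
def partial_eta_squared(f_ratio: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F-ratio and its degrees of freedom."""
    # SS_effect / (SS_effect + SS_error), rewritten using
    # F = (SS_effect / df_effect) / (SS_error / df_error).
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


# F-ratios quoted in Example 1: (label, F, df_effect, df_error)
reported = [
    ("Sex, ANOVA", 37.33, 1, 36),
    ("Smoking, ANOVA", 6.56, 1, 36),
    ("Sex, ANCOVA", 21.81, 1, 35),
    ("Smoking, ANCOVA", 5.71, 1, 35),
]

for label, f_ratio, df1, df2 in reported:
    eta2 = partial_eta_squared(f_ratio, df1, df2)
    print(f"{label}: partial eta squared = {eta2:.2f}")
```

The recovered values agree with the effect sizes given in the text, including the drop for Sex once baseline pulse enters the model.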

EXAMPLE 2: IMPROVING STATISTICAL POWER

These data are from an intervention study of 32 veterans diagnosed with post-traumatic stress disorder. Two treatments are randomly assigned: a 1-week Outward Bound experience, or regular counseling sessions at a Veterans Affairs hospital. Pre-intervention and 1-week follow-up measures are taken with the Beck Hopelessness Scale (Beck & Steer, 1988). A one-way ANOVA is run, and the main effect for treatment is significant: F(1,30) = 10.67, p < .01, effect size (partial η²) = .26.

As in the first example, the data are rerun, this time using baseline Hopelessness scores as a covariate. Once again, the treatment effect is significant: F(1,29) = 17.24, p < .001, effect size = .37. In this case, all the signs of increased statistical power are present: larger F-ratio, smaller p value, and larger effect size. ANCOVA did its job because both assumptions were in place: Random assignment of treatments to participants creates an expected correlation of zero between the pretest and the grouping variable; and pretest scores are theoretically and statistically related to the outcome measure.

Figure 2 depicts this case. Because of random assignment of treatment groups, the grouping variable is not related to the covariate, Hopelessness pretest scores. But the covariate is related to the dependent variable, and boosts the independent variable's power by removing some of what otherwise would be error variance. Once the dependent variable's variance associated with the covariate is removed, the portion of the remaining variance in the dependent variable shared with the independent variable (treatment) becomes larger.
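The mechanism described here can be illustrated with simulated data. The sketch below is not the veterans study: the group sizes, effect size, pretest-posttest slope, and random seed are all assumptions chosen for illustration, and the function names are ours. It computes the one-way ANOVA F-ratio and the textbook single-covariate ANCOVA F-ratio (error and total sums of squares each reduced by the part predictable from the covariate) for two randomly assigned groups.

```python
import random


def sums_about_means(x, y):
    """Return (SS_x, SS_y, SP_xy) computed about the means of x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return ss_x, ss_y, sp_xy


def anova_and_ancova_f(groups):
    """groups: list of (covariate_values, outcome_values), one pair per group.

    Returns (one-way ANOVA F, one-way ANCOVA F with a single covariate).
    """
    k = len(groups)
    all_x = [v for x, _ in groups for v in x]
    all_y = [v for _, y in groups for v in y]
    n_total = len(all_y)

    # Total and pooled within-group sums of squares and cross-products.
    sst_x, sst_y, spt = sums_about_means(all_x, all_y)
    within = [sums_about_means(x, y) for x, y in groups]
    ssw_x = sum(w[0] for w in within)
    ssw_y = sum(w[1] for w in within)
    spw = sum(w[2] for w in within)

    # ANOVA: between-groups SS over within-groups SS.
    f_anova = ((sst_y - ssw_y) / (k - 1)) / (ssw_y / (n_total - k))

    # ANCOVA: remove the covariate's share from error and total SS;
    # the covariate costs one error degree of freedom.
    adj_error = ssw_y - spw ** 2 / ssw_x
    adj_total = sst_y - spt ** 2 / sst_x
    adj_effect = adj_total - adj_error
    f_ancova = (adj_effect / (k - 1)) / (adj_error / (n_total - k - 1))
    return f_anova, f_ancova


rng = random.Random(1998)  # fixed seed so the illustration is repeatable
N_PER_GROUP = 200          # assumed group size (illustrative only)


def simulate_group(treatment_effect):
    """Pretest is independent of group, mimicking random assignment."""
    pre = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
    post = [0.8 * p + treatment_effect + rng.gauss(0, 0.6) for p in pre]
    return pre, post


f_anova, f_ancova = anova_and_ancova_f([simulate_group(0.0), simulate_group(0.5)])
print(f"ANOVA F = {f_anova:.2f}, ANCOVA F = {f_ancova:.2f}")
```

With a pretest that is strongly related to the outcome and unrelated to group, the ANCOVA F-ratio should come out larger than the ANOVA F-ratio, which is the pattern Example 2 reports.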

OTHER ANCOVA ASSUMPTIONS

The usual ANOVA assumptions (homogeneity of variance, normality, and independence of scores) hold for ANCOVA as well. And, as usual, the F-ratio can withstand some disruption in homogeneity of variance and normality (especially with equal cell sizes), but it is highly vulnerable to correlated scores, which create Type I errors.

There is another ANOVA/ANCOVA assumption often unmentioned in statistics books: No measurement error in the covariate(s). In the case of ANCOVA with random assignment, covariate measurement error does not bias the adjusted means, but it does produce less statistical power, which in turn increases the probability of a Type II error. With a quasi-experimental design lacking random assignment, covariate measurement error creates bias in adjusted means. The bias is usually negative (underadjustment), but under some conditions can be positive (Bryk & Weisberg, 1977).

Although measurement error in the dependent variable is not an ANOVA/ANCOVA assumption, it can disrupt statistical power. With ANOVA, measurement error in the dependent variable reduces statistical power, but with ANCOVA, the outcome is less predictable: Even Type I errors may result if the covariate is correlated with other independent variables, because measurement error in the covariate may now ripple through the entire model by way of its correlations with other variables.

Homogeneity of regression slopes is an additional assumption for the ANCOVA model. This means that each comparison group should show a similar regression slope when the dependent variable is regressed on the covariate(s). The reason for the assumption is that all groups' dependent variable scores are adjusted based on a pooled regression slope; if the groups' individual slopes differ sharply, then the pooling becomes a muddy average.
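The homogeneity of slopes idea can be sketched for a one-way design with a single covariate. The code below computes each group's own slope, the pooled within-group slope, and an F-ratio comparing the residual sum of squares under a common slope with the residual sum of squares when each group keeps its own slope. The data are invented for illustration and the function names are ours.

```python
def sums_about_means(x, y):
    """Return (SS_x, SS_y, SP_xy) computed about the means of x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return ss_x, ss_y, sp_xy


def homogeneity_of_slopes(groups):
    """groups: list of (covariate_values, outcome_values), one pair per group.

    Returns (per-group slopes, pooled within-group slope, F for slope equality).
    """
    k = len(groups)
    n_total = sum(len(y) for _, y in groups)
    within = [sums_about_means(x, y) for x, y in groups]

    slopes = [sp / ss_x for ss_x, _, sp in within]
    ssw_x = sum(w[0] for w in within)
    ssw_y = sum(w[1] for w in within)
    spw = sum(w[2] for w in within)
    pooled_slope = spw / ssw_x

    # Residual SS when every group is given its own slope ...
    ss_separate = sum(ss_y - sp ** 2 / ss_x for ss_x, ss_y, sp in within)
    # ... and when all groups share the pooled slope (the ANCOVA error term).
    ss_pooled = ssw_y - spw ** 2 / ssw_x

    # Numerator df = k - 1 extra slopes; denominator df = N - 2k.
    f_ratio = ((ss_pooled - ss_separate) / (k - 1)) / (ss_separate / (n_total - 2 * k))
    return slopes, pooled_slope, f_ratio


# Invented data: group A rises with the covariate, group B is nearly flat.
group_a = ([1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.0])
group_b = ([1, 2, 3, 4, 5], [5.2, 4.9, 5.1, 5.0, 4.8])

slopes, pooled_slope, f_ratio = homogeneity_of_slopes([group_a, group_b])
print("group slopes:", [round(s, 2) for s in slopes])
print("pooled slope:", round(pooled_slope, 2))
print("F for equal slopes:", round(f_ratio, 1))
```

In this invented case the pooled slope sits between two very different group slopes and describes neither group well, which is the "muddy average" problem.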

FIGURE 2. A covariate improves the effect of another independent variable.


Interestingly, when cell sizes are equal, the ANCOVA F-ratios are generally robust except for the most gross violations of homogeneity of regression (Hamilton, 1977; Wu, 1984). That does not mean that equal cell sizes allow the analyst to ignore the homogeneity of slope assumption. A robust F-ratio is a statistical summary that delivers no particular insight about how groups are different. It can be far more informative, following a violation of homogeneous slopes, to calculate Johnson-Neyman regions of significance. This technique helps to map out where groups do and do not differ along various values of the covariate. Dorsey and Soeken (1996) produced an introduction to the Johnson-Neyman method applied to nursing research.

The final ANCOVA assumption is that the relationship between the covariate(s) and the dependent variable is linear. Because the regression is based only on the linear portion, any systematic but nonlinear relationship will cause a reduction in statistical power. The simplest solution for nonlinearity is to apply a power transformation (e.g., quadratic, cubic) to the covariate before the ANCOVA analysis.

WHAT DO NURSE RESEARCHERS DO WITH ANCOVA?

We searched four important nursing journals over a 5-year period (1993-1997; reference list available from first author) for examples of ANCOVA. Image had none, in keeping with its recent drift toward qualitative research (Henry, 1998). Western Journal of Nursing Research had only 2, Research in Nursing & Health published 5, and Nursing Research showed 9, for a total of 16. Of those articles, in 9 (56%) the investigator used random assignment of participants to treatments, so the covariate(s) were expected to be uncorrelated with the independent variables (assumption 1). Only 1 of the 16 articles contained a thorough assessment of ANCOVA assumptions.

In fairness, though, in the 9 using random assignment the investigator should not have needed to check the correlation between the covariate(s) and independent variable(s). Also, because random assignment can produce (approximately) equal cell sizes, the analysis is inoculated against violations of all assumptions except independence of scores. It is surprising that so few of the articles contained information about assumption 2, the relationship of the covariate and the dependent variable. Only 1 indicated F-ratios for covariates, and in 1 other study the investigator gave the simple correlations between covariates and dependent variables.

STATISTICAL PACKAGES AND ANCOVA

In 1982, Searle and Hudson compared ANCOVA procedures from 10 computer programs, and discovered different output among all 10. Although contemporary statistical programs are easier to use, output and labeling have not improved much since then.

Three of the four packages we reviewed are owned by SPSS (BMDP version 7, SPSS version 8.0, and SYSTAT version 7.01). Interestingly, the flagship program, SPSS, differs from the other two in its approach to ANCOVA. Its default setting is what SPSS terms the "experimental approach," in which main effects and interactions are adjusted for the covariate. The default for BMDP and SYSTAT, in SPSS language, is called the "regression approach," in which each term (even the covariate) is adjusted for each other term. When the covariate is uncorrelated with other independent variables, then both approaches give the same result (that is, there is nothing to adjust in the covariate). But in the nonexperimental situation, where the covariate may be related to a grouping variable, the two approaches can deliver markedly different results. In this case, the regression approach gives more conservative results, with less statistical power. With each package, the thoughtful analyst can easily override the defaults to produce the alternate approach.

SYSTAT does not label the covariate(s) as such on the printout. BMDP's programs 1V and 4V label the covariate(s) clearly, but 2V does not. SAS (version 6.12) does not treat the approaches as alternate. It delivers both the regression and experimental results in a single table, so the user can decide which to use (or not decide, and report both). SAS does not identify the covariate(s) on the printout.

SPSS's ANCOVA is the most unconventional of the four packages. The only way to assign covariate status to a variable is through SPSS's General Linear Model procedure. The resulting printout does not distinguish covariates from other independent variables.
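The difference between the two approaches can be sketched for the simplest case, a one-way design with one covariate. This is our reading of the description above, not the packages' actual algorithms: the covariate's sum of squares is computed once without adjusting for group (as when the covariate is not adjusted for other terms) and once adjusted for group (as when every term is adjusted for every other term). The data and function names are invented for illustration.

```python
def sums_about_means(x, y):
    """Return (SS_x, SS_y, SP_xy) computed about the means of x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return ss_x, ss_y, sp_xy


def covariate_sums_of_squares(groups):
    """groups: list of (covariate_values, outcome_values), one pair per group.

    Returns (covariate SS ignoring group, covariate SS adjusted for group).
    """
    all_x = [v for x, _ in groups for v in x]
    all_y = [v for _, y in groups for v in y]
    sst_x, _, spt = sums_about_means(all_x, all_y)

    within = [sums_about_means(x, y) for x, y in groups]
    ssw_x = sum(w[0] for w in within)
    spw = sum(w[2] for w in within)

    unadjusted = spt ** 2 / sst_x          # covariate not adjusted for group
    adjusted_for_group = spw ** 2 / ssw_x  # covariate adjusted for group
    return unadjusted, adjusted_for_group


# Covariate distributed identically in both groups (uncorrelated with group).
balanced = [([1, 2, 3, 4], [2, 3, 5, 6]), ([1, 2, 3, 4], [4, 5, 7, 8])]
# Same outcomes, but the second group sits higher on the covariate.
shifted = [([1, 2, 3, 4], [2, 3, 5, 6]), ([3, 4, 5, 6], [4, 5, 7, 8])]

for label, data in (("balanced", balanced), ("shifted", shifted)):
    unadjusted, adjusted = covariate_sums_of_squares(data)
    print(f"{label}: unadjusted SS = {unadjusted:.2f}, adjusted SS = {adjusted:.2f}")
```

When the covariate is unrelated to group, the two sums of squares coincide; once the groups differ on the covariate, they part ways, which is why the choice of approach matters mainly in nonexperimental data.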

STATISTICAL PACKAGES' TREATMENT OF ANCOVA ASSUMPTIONS

As a rule, statistical packages encourage users to ignore assumptions and leap right to the main analysis. Inside ANOVA programs, packages offer the Levene test for homogeneity of variance, but any other tests of assumptions must be arranged by the user. For ANCOVA, the situation is no better. In BMDP, only one of its three ANCOVA programs (1V) automatically delivers a homogeneity of regression test. Unfortunately, this program handles only a one-way model, so if the analyst has a factorial model, she must convert the cell structure to a one-way model just to get the assumption tested. SYSTAT and SAS offer no homogeneity test. In SPSS's GLM procedure, one must construct an interaction term representing the assumption test. Without a clear guide (SPSS, 1997, pp. 118-119), this would be hard to discover.

Any analyst facile with regression analysis could readily test for slope homogeneity inside a regression model. Caution should be used, though, in arranging the model. With a hierarchical analysis (the preferred approach), the homogeneity term (interaction between the covariate and the independent variable) is entered last, and the test is a version of SPSS's experimental approach, where each successive term is adjusted for previous terms. If a direct or simultaneous regression is used, the homogeneity term is tested with SPSS's regression approach.

CONCLUSIONS

In 1969, Janet Elashoff called the analysis of covariance (ANCOVA) a "delicate instrument." It still is. Carefully handled, though, it is an excellent device for the analyst's toolkit. To improve the quality of future ANCOVA studies, we recommend that the method be limited primarily to randomized designs. When the analyst wants to use ANCOVA with an intact group or other nonrandom assignment, the correlation between the covariate(s) and the independent variable(s) should be reported. As the correlations are increasingly nonzero, conclusions drawn about the independent variables are increasingly suspect.
We also recommend that researchers report tests of ANCOVA assumptions. That statistical packages make assumption tests challenging is not a good reason to avoid them entirely. And it is easy, not challenging, to report the simple correlations between covariates and dependent variables. Where the correlations are tiny, there is no gain whatsoever from using ANCOVA.

ANCOVA is an interesting and useful tool, but it is not a fix-all to be applied indiscriminately to equate groups. As mentioned above, the Johnson-Neyman method can be used as an option (or as a complement) to ANCOVA. Myers and Well (1995) offer a brief comparison of ANCOVA with other approaches (blocking, analysis of gain scores) to improving statistical power in nonrandom groups. Kirk (1995, Chapter 15) gives a short but excellent review of ANCOVA applications, and Huitema's (1980) text remains the definitive work on ANCOVA.
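The two correlations recommended for reporting (covariate with grouping variable, covariate with dependent variable) take only a few lines to compute. The sketch below uses a hand-written Pearson correlation and a small invented data set loosely patterned on the smoking example; the numbers are hypothetical and are not from any study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 group code this is the point-biserial r."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / sqrt(ss_x * ss_y)


# Hypothetical data: 0 = nonsmoker, 1 = smoker.
smoker = [0, 0, 0, 0, 1, 1, 1, 1]
age = [25, 30, 35, 40, 35, 40, 45, 50]                 # covariate
capacity = [5.0, 4.8, 4.6, 4.4, 4.3, 4.1, 3.9, 3.7]   # dependent variable

r_age_group = pearson_r(age, smoker)       # bears on assumption 1: should be near zero
r_age_capacity = pearson_r(age, capacity)  # bears on assumption 2: should be substantial

print(f"covariate with group:              r = {r_age_group:.2f}")
print(f"covariate with dependent variable: r = {r_age_capacity:.2f}")
```

In these invented data the covariate is clearly related to group membership, so conclusions about the smoking effect after adjusting for age would be suspect in the sense described above.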

REFERENCES
Beck, A.T., & Steer, R.A. (1988). Beck Hopelessness Scale manual. San Antonio: Psychological Corporation.
Bryk, A.S., & Weisberg, H.I. (1977). Use of the nonequivalent control group design when subjects are growing. Psychological Bulletin, 84, 950-962.
Cohen, J. (1968). Multiple regression as a general data analytic system. Psychological Bulletin, 70, 426-443.
Darlington, R.B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69, 161-182.
Dixon, W.J. (1992). BMDP statistical software manual, Vol. 1. Berkeley, CA: University of California Press.
Dorsey, S.G., & Soeken, K.L. (1996). Use of the Johnson-Neyman technique as an alternative to analysis of covariance. Nursing Research, 45, 363-366.
Elashoff, J.D. (1969). Analysis of covariance: A delicate instrument. American Educational Research Journal, 6, 383-401.
Hamilton, B.L. (1977). An empirical investigation of the effects of heterogeneous regression slopes in analysis of covariance. Educational and Psychological Measurement, 37, 701-702.
Henry, B. (1998). To Journal readers, report and requests, 1998. Image: Journal of Nursing Scholarship, 30, 2.
Huitema, B. (1980). The analysis of covariance and alternatives. New York: Wiley.
Kirk, R.E. (1995). Experimental design: Procedures for the behavioral sciences (3rd Ed.). Pacific Grove, CA: Brooks/Cole.
Myers, J.L., & Well, A.D. (1995). Research design & statistical analysis. Hillsdale, NJ: Lawrence Erlbaum.
Pedhazur, E.J. (1997). Multiple regression in behavioral research (3rd Ed.). New York: Harcourt Brace.
Pedhazur, E.J., & Schmelkin, L.P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum.
Searle, S.R., & Hudson, G.F.S. (1982). Some distinctive features of output from statistical computing packages for analysis of covariance. Biometrics, 38, 737-745.
SPSS. (1997). SPSS advanced statistics 7.5. Chicago: Author.
Tabachnick, B.G., & Fidell, L.S. (1996). Using multivariate statistics (3rd Ed.). New York: Harper Collins.
Wu, Y-W.B. (1984). The effects of heterogeneous regression slopes on the robustness of two test statistics in the analysis of covariance. Educational and Psychological Measurement, 44, 647-663.
Wu, Y-W.B., & Slakter, M.J. (1989). Analysis of covariance in nursing research. Nursing Research, 38, 306-308.
