
QMIN

ANOVA and ANCOVA - 1.1

1 ANOVA and ANCOVA
For statistical packages that have a general procedure for fitting general linear models (GLM), the distinction between regression, the analysis of variance (ANOVA), and the analysis of covariance (ANCOVA) is immaterial. Usually, the general procedure allows one to specify which of the independent variables are truly categorical and which are metric. After a user makes these specifications, the model can be fitted to the data and the results assessed. In other circumstances, the software may have separate routines for the analysis of variance and the analysis of covariance. If all of the independent variables are truly categorical, then ANOVA is the appropriate procedure. If at least one of the independent variables is truly categorical but at least one other independent variable is continuous, then ANCOVA is appropriate. Before applying an ANOVA or ANCOVA procedure to data, the first and foremost question that a user must ask is whether the independent variable(s) are truly categorical or have at least an ordinal metric associated with them. As we have continuously emphasized, the acid test for a truly categorical variable is a very common-sense criterion: can one randomly order the groups in a bar chart without sacrificing information? If the answer to this question is "yes," then the independent variable is truly categorical and an ANOVA or ANCOVA is suitable. If the answer is "no," then one should code the data according to the principles outlined in Section X.X and use the procedures in the previous chapter to fit the GLM to the data.

1.1 The Logic and Terminology of ANOVA


Let us explore the logic behind ANOVA with an example. Suppose that a neuroscience lab is interested in the relative efficacies of four different selective serotonin reuptake inhibitors (SSRIs). The lab administers each of these SSRIs to separate groups of rats for three weeks, sacrifices the animals, and then assesses the concentration of a certain serotonin receptor in a brain region. Figure 1.1 depicts the mean number of bound receptors (per volume) in the assay and Table 1.1 presents the descriptive statistics. ANOVA is an appropriate procedure for the analysis of these data because there is no underlying quantitative difference among the four SSRIs. One could rearrange the order of the bars in Figure 1.1 without any loss of information. The ANOVA procedure begins with the null hypothesis that assumes the rats in all four groups are sampled from a single population with a mean of μ and a variance of σ². Under the assumption that receptor number is normally distributed, the observed mean for each group should be sampled from a hat of means with an overall mean of μ and a variance of σ²/N, where N is the sample size of the group. (Note that under the central limit theorem, the distribution of receptor number need not be normal if the sample size per group were very large. With an N of 13 or 15, however, the central limit theorem will apply only if the variable is close to normal.) Under the alternative hypothesis, at least one group of rats is sampled from a different hat with a mean that we designate as μ_A but the same variance, σ², as the first hat. Note that the alternative hypothesis includes the possibilities that two, three, or all four groups are sampled from different hats. Under this hypothesis, the means will also be sampled from different hats with different overall means. The variance of the means in each hat, however, will remain σ²/N.


Figure 1.1 Mean (+/- 1 standard error) receptor concentrations in a brain area after administration of a selective serotonin reuptake inhibitor.

[Figure 1.1: bar chart of mean bound receptor concentration (y-axis up to 400) for SSRI1 through SSRI4.]
Table 1.1 Sample size, means and standard deviations for the amount of bound receptor after administration of different selective serotonin reuptake inhibitors (SSRIs).

SSRI:      SSRI1    SSRI2    SSRI3    SSRI4
N:         15       15       15       13
Mean:      264.28   239.68   265.58   293.28
St. Dev.:  41.48    62.81    54.50    49.50

The ANOVA procedure obtains two different estimates of the population variance, σ², using the mathematics of the null hypothesis. The first of these is called the within-group variance, which is more generally termed the error variance. In this example, the within-group or


error variance is simply the average of the four variances for each group, the average being weighted by sample size. The second estimate of σ² is based on the group means. Because the variance of the mean receptor concentration under the null hypothesis is σ²/N, we can obtain an estimate of the population variance by first calculating the variance of the observed means (again weighted by sample size) and then multiplying this variance by N (or, in the case of different Ns, a weighted average of the sample size for each group). This estimate of σ² is called the between-group or among-groups variance; in more general GLM programs, it will be called the model variance. Under the null hypothesis, both estimates of σ² should be equal, so the ratio of the between-group estimate divided by the within-group estimate should be distributed as an F distribution (see Section X.X on the F distribution). Clearly, we expect the value of F under the null hypothesis to be around 1.0.¹ Under the alternative hypothesis, however, the variance of the observed means will be larger than σ²/N because it will also contain the variance due to the true differences in the means. Hence, the alternative hypothesis predicts that the between-group variance will be larger than the within-group variance. The expected F under this hypothesis should be much larger than 1.0. Consequently, the F statistic serves as a test for the equality of means. If the probability of the observed F statistic (including any values more extreme than the observed F) is less than the alpha level, then we reject the null hypothesis. (terminology to be completed)
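The two variance estimates can be computed directly from the summary statistics in Table 1.1. The following Python sketch is an illustration, not part of the original text; small discrepancies with the SAS output come from the rounded means and standard deviations in the table.

```python
# Group summaries from Table 1.1 (SSRI1..SSRI4)
ns    = [15, 15, 15, 13]
means = [264.28, 239.68, 265.58, 293.28]
sds   = [41.48, 62.81, 54.50, 49.50]

N, k = sum(ns), len(ns)

# Within-group (error) estimate of sigma^2: pooled variance, weighted by df
ss_within = sum((n - 1) * s**2 for n, s in zip(ns, sds))
ms_within = ss_within / (N - k)

# Between-group (model) estimate of sigma^2, built from the group means
grand = sum(n * m for n, m in zip(ns, means)) / N
ss_between = sum(n * (m - grand)**2 for n, m in zip(ns, means))
ms_between = ss_between / (k - 1)

F = ms_between / ms_within
print(round(F, 2))  # close to the F of 2.40 reported later for these data
```

Referring the resulting F to an F distribution with k - 1 = 3 and N - k = 54 degrees of freedom gives the p value.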

1.2 Oneway ANOVA: How to do them.


A oneway ANOVA consists of a single ANOVA factor with at least two levels. If there are two levels, the oneway ANOVA is identical to the t test for independent samples. Otherwise, the ANOVA provides a single overall test for the equality of the group means. Here, we provide the steps for fitting a oneway ANOVA model to data using the SSRI example given above.
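The equivalence between a two-level oneway ANOVA and the independent-samples t test can be checked numerically: the ANOVA F equals t² exactly. A minimal pure-Python sketch with simulated (hypothetical) data:

```python
import random
import statistics

random.seed(1)
a = [random.gauss(260, 50) for _ in range(15)]  # simulated group 1
b = [random.gauss(290, 50) for _ in range(15)]  # simulated group 2

# Independent-samples t statistic with a pooled variance
na, nb = len(a), len(b)
sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
t = (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Oneway ANOVA on the same two groups
grand = statistics.mean(a + b)
ss_between = na * (statistics.mean(a) - grand) ** 2 + nb * (statistics.mean(b) - grand) ** 2
F = (ss_between / 1) / sp2  # MS between (df = 1) over MS within

print(abs(F - t ** 2))  # essentially zero: F equals t squared
```

This identity holds for any two-group data set, which is why the two procedures always agree.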

1.2.1 Step 1: Examine the raw data.

As in all statistical analyses, the very first step is to examine the distribution of the dependent variable within each of the groups with an eye toward detecting outliers (see Section X.X). Because group sample size in experimental neuroscience tends to be small to moderate, one should also assess normality of the distribution within each group (not for the dependent variable ignoring group differences). Figure 1.2 presents box plots for the four SSRI groups. There is a potential outlier in the lower part of the distribution of SSRI1. Here, it would be wise to return to the original autoradiographs (or whatever other technique was used to assess receptor density) to check the data value and make certain that it was correctly entered into the database. In the present case, the data value is real.² The appearance of it as a potential outlier can be traced to the fact that there are only 15 observations in this group, so the quantile statistics used to construct the box plots are estimated with considerable error. The assessment of normality within each group is also complicated by the small number of subjects. Recall from Section X.X that statistical tests for normality are best left to relatively large samples. In the present case, the best solution is to examine visually the stem-and-leaf plots for each group. An alternative approach is to create a new variable that consists of the receptor concentration for an observation less the group mean. The distribution of the new variable can then be assessed across all observations. This, however, can be done after fitting the ANOVA model to the data, so we shall deal with this later.

¹ Technically, the expected value of F under the null hypothesis is a complicated function of the degrees of freedom for the between-group and the within-group variances and is not exactly 1.0. The expected F, however, should be close to 1.0.
² The example is hypothetical, but the actual numbers were simulated assuming a normal distribution within each group.

Figure 1.2 Box plots for receptor concentrations for each selective serotonin reuptake inhibitor (SSRI).

[Figure 1.2: box plots of receptor concentration (y-axis roughly 150 to 400) for SSRI1 through SSRI4.]
1.2.2 Step 2: Test for homogeneity of variance.

In Section X.X, we noted that a key assumption of the GLM is that the residuals have the same variance for each set of values of the independent variable(s). In the discussion of regression (Section X.X), this assumption is referred to as the homoscedasticity of the residuals. Because the independent variable in an ANOVA is the ANOVA factor and its "values" are the levels of that factor, this assumption implies that the variances within each group are equal; in ANOVA terms, the assumption is most often called homogeneity of variance. Concretely, this assumption implies that the variance within each of the four SSRI groups is sampled from a hat of variances with the same overall population variance, σ². Most statistical packages provide several tests for homogeneity of variance. The poorest of these tests computes an F ratio for each possible pair of groups. In the present example, there are four groups, so there would be six different F ratios. The problem here is obvious: the risk of a false positive result from performing so many tests. Of the remaining tests, the two most popular are Bartlett's test (Bartlett, 19XX) and Levene's test (Levene, 19XX). Bartlett's test is sensitive to departures from normality, so most statisticians currently favor Levene's test. Figure 1.3 presents the results from these two tests on the SSRI data.

Figure 1.3 Two tests for the homogeneity of variance.


Levene's Test for Homogeneity of Bound_Receptor Variance
ANOVA of Squared Deviations from Group Means

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
SSRI      3         34094362      11364787      1.04   0.3806
Error    54         5.8785E8      10886185

Bartlett's Test for Homogeneity of Bound_Receptor Variance

Source   DF   Chi-Square   Pr > ChiSq
SSRI      3       2.3729       0.4987
Levene's test creates an F test statistic while Bartlett's test is based on a χ² statistic. Both tests are based on the null hypothesis that the group variances are within sampling error of one another. Hence, if the p value for the tests is less than a prespecified alpha level, then the hypothesis of homogeneity of variance is rejected. For the SSRI data, both p values are well above the conventional cutoff of .05, so we conclude that the variances are homogeneous. One should not be obsessional about homogeneity of variance. When the sample sizes are roughly equal, as they are in the current example, ANOVA is reasonably robust against violation of this assumption. In experimental neuroscience, there should be deliberate planning to keep sample sizes equal. The major situation that can cause difficulty here occurs with lognormal distributions, a topic discussed later in Section X.X. In other areas of neuroscience, group sample sizes can be very unequal, e.g., when examining sex differences in a study of restrictive eating disorders. Here, when the assumption is strongly violated, then either the dependent variable should be transformed or a nonparametric technique employed.
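The Levene statistic reported in Figure 1.3 is simply a oneway ANOVA performed on the squared deviations from each group's mean. A small pure-Python sketch of that idea, using simulated (hypothetical) data rather than the actual SSRI measurements:

```python
import random
import statistics

def oneway_F(groups):
    """Oneway ANOVA F statistic for a list of groups (lists of values)."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = statistics.mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

def levene_F(groups):
    """Levene-type statistic: oneway ANOVA of squared deviations from group means."""
    devs = [[(x - statistics.mean(g)) ** 2 for x in g] for g in groups]
    return oneway_F(devs)

random.seed(7)
groups = [[random.gauss(265, 50) for _ in range(15)] for _ in range(4)]
F_lev = levene_F(groups)
print(F_lev)  # near 1 when the group variances are homogeneous
```

In practice one would use the packaged routine (e.g., SAS HOVTEST or an equivalent in other software) rather than hand-rolling the test; the sketch only shows where the numbers come from.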

1.2.3 Step 3: Fit the model.

Some statistical packages have separate routines for fitting a oneway ANOVA. Others fit a oneway ANOVA within the context of a routine that can fit any number of ANOVA factors. We recommend, however, that you use a GLM package if your software has such a routine. Although it may take more time to master, you will only have to learn one software routine for all of your regression, ANOVA, and ANCOVA analyses.

1.2.4 Step 4: Interpret the results.

Figure 1.4 presents the results from fitting the oneway ANOVA model to the SSRI data using PROC GLM in SAS. (The actual output contains other information that is redundant with that given in Figure 1.4 and is omitted from the figure.)

Figure 1.4 Results from a oneway ANOVA on the SSRI data.
Dependent Variable: Bound_Receptor

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3       20020.4292     6673.4764      2.40   0.0783
Error             54      150426.6751     2785.6792
Corrected Total   57      170447.1043

R-Square   Coeff Var   Root MSE   Bound_Receptor Mean
0.117458   19.93535    52.77953   264.7534

The ANOVA table in Figure 1.4 provides the two estimates of σ² under the null hypothesis, but in ANOVA idiom, they are mean squares rather than variance estimates. Hence, the mean square for Error in this table is the estimate of σ² calculated from the within-group variances, and the mean square for Model is the estimate of σ² based on the group means. Recall the definition of a variance given earlier in Section X.X: a variance equals the sum of squared deviations of observed values from predicted values divided by its degrees of freedom. The ANOVA table also gives the sums of squares and the degrees of freedom used to calculate the two estimates of σ². You should verify that the mean square for Model equals the sum of squares for Model divided by its degrees of freedom and that the mean square for Error equals its sum of squares divided by its df. The F statistic equals the mean square for Model divided by the mean square for Error: 6673.4764 / 2785.6792 = 2.40. The p value for the F is the area under the F distribution with 3 and 54 df from 2.40 to positive infinity. Hence, it is the probability of observing an F of 2.40 or greater. The value here (.0783) is greater than .05, so we will not reject the null hypothesis.
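The verification suggested above amounts to three lines of arithmetic; a quick Python check against the values in Figure 1.4:

```python
# Sums of squares and degrees of freedom from the oneway ANOVA table (Figure 1.4)
ss_model, df_model = 20020.4292, 3
ss_error, df_error = 150426.6751, 54

ms_model = ss_model / df_model   # mean square = sum of squares / df
ms_error = ss_error / df_error
F = ms_model / ms_error          # F = MS(Model) / MS(Error)
print(ms_model, ms_error, F)     # matches 6673.4764, 2785.6792, and 2.40
```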


The p level, however, is close enough to .05 that we might want to note a trend toward significance. The R² statistic is a measure of overall effect size. It equals the proportion of variance in receptor concentration (or, in general, the dependent variable) explained by mean differences among the four SSRI groups (or, in general, the ANOVA factor). Here, about 12% of the variance in receptor concentration is explained by the differences among the four SSRIs. The "Root MSE" stands for the square root of the mean square for Error. Because the mean square for Error is the error variance (which is the pooled within-group variance), Root MSE is the standard deviation of the error (or the pooled standard deviation). This statistic should be used to compute standardized effect sizes, part of the next topic.

Multiple Comparison Procedures

Although R² is a measure of effect size in a oneway ANOVA, it suffers from one major limitation: it does not indicate which groups may be responsible for a significant effect. All a significant R² and the F statistic say is that the means for the groups are unlikely to have been sampled from a single hat of means. Unfortunately, there is no simple, unequivocal statistical solution to the problem of comparing the means for the different levels of an ANOVA factor. The fundamental difficulty can be illustrated by imagining a study on personality in which the ANOVA factor was astrological birth sign. There are 12 birth signs, so a comparison between all possible pairs of birth signs would result in 66 different statistical tests. One might be tempted to say that 5% of these tests might be significant by chance, but that statement is misleading because the tests are not statistically independent. The Virgo mean that gets compared to the Leo mean is based on the same subjects as, say, the Virgo and Pisces comparison.

Because of these problems, a number of statistical methods have been developed to test for the difference in means among the levels of an ANOVA factor. Collectively, these are known as multiple comparison procedures or, sometimes, as post hoc (i.e., "after the fact") tests,³ and this very name should caution the user that these tests should be regarded more as an afterthought than a rigorous examination of pre-specified hypotheses. Our first advice on post hoc tests is to avoid them whenever possible. It is not that they are wrong or erroneous. Instead, they are poor substitutes for formulating a clear hypothesis about the group means, coding the levels of the ANOVA factor to embody that hypothesis, and then directly testing the hypothesis. The proper role for post hoc tests is in completely atheoretical situations where the investigator lacks any clear direction about which means might differ from which others. (to be completed)
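One simple (and deliberately conservative) multiple-comparison scheme not named in the text is the Bonferroni correction, which divides the alpha level by the number of pairwise tests. The sketch below is illustrative only, with simulated data standing in for the four SSRI groups:

```python
import itertools
import random
import statistics

# Simulated (hypothetical) data for four groups of 15 rats each
random.seed(3)
groups = {f"SSRI{i + 1}": [random.gauss(m, 50) for _ in range(15)]
          for i, m in enumerate([264, 240, 266, 293])}

pairs = list(itertools.combinations(groups, 2))  # 6 pairwise comparisons
alpha = 0.05 / len(pairs)                        # Bonferroni-adjusted alpha level

for g1, g2 in pairs:
    a, b = groups[g1], groups[g2]
    # Pooled-variance t statistic for this pair
    sp2 = ((len(a) - 1) * statistics.variance(a) +
           (len(b) - 1) * statistics.variance(b)) / (len(a) + len(b) - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1 / len(a) + 1 / len(b))) ** 0.5
    print(g1, "vs", g2, "t =", round(t, 2), "(judge against alpha =", round(alpha, 4), ")")
```

Bonferroni ignores the dependence among the comparisons described above, which is exactly why it overcorrects; procedures such as Tukey's HSD account for that dependence and are what most packages actually report.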

1.2.5 Step 5: Communicating the Results (not yet completed).

³ Multiple comparison procedures are also called means separation tests.


1.3 Twoway ANOVA


The results of the SSRI study above might be described as equivocal. There was no strong evidence to reject the null hypothesis, but at the same time, there was an intriguing trend toward significance. Perhaps in planning the study, the investigators underestimated the effect size and hence did not allocate a sufficient number of rats per group to allow the design to detect real differences among the four SSRIs. Suppose that the lab decides to repeat the study, but with a twist. SSRIs are not prescribed to random humans but are selectively used to treat anxiety and depression. Perhaps the failure to detect differences in rat brain came about because the animals did not have the equivalent of human dysphoric mood. Hence, the lab decides to add an additional condition in the replication. Rats will be randomly assigned to either a control condition or to a condition of chronic stress exposure before being administered an SSRI. Now we have two independent variables, SSRI and stress, so the analysis becomes a twoway (or two-factor) ANOVA. The logic of such an ANOVA is identical to that of a oneway ANOVA: obtain different estimates of the population variance, σ², under the null hypothesis and test whether the estimates are similar. The only difference in the twoway ANOVA is that there will be more estimates of σ². As before, we will have an estimate based on the within-group variances that will be the error variance for the model. We will also have an estimate of σ² based on the four means of the SSRI groups (collapsed over the two stress levels). But in addition, we will also have an estimate of σ² based on the means of the control and the stressed groups (collapsed over the SSRIs) as well as an estimate based on the interaction between SSRI and stress. The steps involved in fitting twoway (or, in general, multiway or factorial) ANOVAs are identical to those in fitting a oneway ANOVA with one exception: the test for homogeneity of variance.
Methods for testing homogeneity of variance in factorial ANOVA are a current research effort among statisticians, and no firm consensus has developed. (X.X ? add here what NOT to do and an ad hoc remedy: recode to a oneway ANOVA and do the test there.)

1.3.1 Interpreting Results in a Twoway ANOVA: Main Effects

Figure 1.5 presents the mean receptor concentration of the eight groups as a function of the stress condition and the type of SSRI. There appears to be a difference in receptor levels as a function of stress for SSRI1 and for SSRI3. Might SSRIs 2 and 4, on the other hand, counter the stress effect by normalizing the number of receptors? Only a rigorous statistical test can answer this question.


Figure 1.5 Mean (+/- 1 standard error) receptor concentrations for four different types of selective serotonin reuptake inhibitors (SSRI) as a function of stress.
[Figure 1.5: line plot of mean receptor concentration (y-axis 200 to 350) against SSRI number (1 to 4), with separate lines for the None and Chronic stress conditions.]
Figure 1.6 presents the results of the two-factor ANOVA. The ANOVA table at the top of the output assesses the overall fit of the model. The mean square for Error in this table is the error variance, computed as the weighted average of the variances within the eight cells of the model. The mean square for Model may be thought of as the estimate of σ² derived from the variance of the eight cell means and the sample size. As in the oneway ANOVA, the F statistic equals the mean square for Model divided by the mean square for Error. Its value is 2.25, and the p value is less than .05. Hence, we reject the null hypothesis that the eight means are equal. The F statistic for the overall model may be called an omnibus F. It informs us that the overall model predicts receptor concentration better than chance, but it does not indicate where those differences lie. At least one of the eight means must be different from at least one of the other eight means, but this statement includes a large number of possibilities. It is necessary to examine the results for the individual ANOVA terms in order to fathom the source(s) of the significance of the overall model.


Figure 1.6 Results from a two-factor ANOVA (Stress and SSRI) on receptor concentration.
Dependent Variable: Bound_Receptors

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              7       42484.7688     6069.2527      2.25   0.0354
Error            108      291126.0556     2695.6116
Corrected Total  115      333610.8244

R-Square   Coeff Var   Root MSE   Bound_Receptors Mean
0.127348   19.10765    51.91928   271.7198

Source        DF   Type I SS     Mean Square   F Value   Pr > F
Stress         1   20030.09043   20030.09043      7.43   0.0075
SSRI           3   12822.03684    4274.01228      1.59   0.1971
Stress*SSRI    3    9632.64154    3210.88051      1.19   0.3167

Source        DF   Type III SS   Mean Square   F Value   Pr > F
Stress         1   19612.75652   19612.75652      7.28   0.0081
SSRI           3   12769.87469    4256.62490      1.58   0.1987
Stress*SSRI    3    9632.64154    3210.88051      1.19   0.3167

Examine the last table in Figure 1.6, the one labeling the sums of squares as "Type III SS." (We discuss the middle table, i.e., the "Type I SS," later in Section X.X.) This portion of the ANOVA output provides statistical tests for the ANOVA factors in the model as well as their interaction. To understand this table, it is first necessary to comprehend a marginal mean. The general definition of a marginal mean is the mean for one level of an ANOVA factor, ignoring all other ANOVA factors. Consider the cell means for the current data set presented in Table 1.2. The marginal mean for SSRI1 (258.0) is simply the mean receptor concentration for all rats that were administered this SSRI, ignoring the fact that some of them had been subjected to chronic stress while others had not. Similarly, the marginal mean for rats in the Chronic stress condition (258.6) is the mean for all rats in that condition, ignoring the fact that they received four different SSRIs.


Table 1.2 Mean receptor concentrations as a function of stress and type of selective serotonin reuptake inhibitor (SSRI).

Stress:    SSRI1   SSRI2   SSRI3   SSRI4   Margin
None       279.6   271.6   292.3   296.9   284.9
Chronic    236.5   272.8   249.5   277.4   258.6
Margin     258.0   272.2   269.4   287.2   271.7

Returning to the last table in Figure 1.6, the row for Stress assesses whether the two marginal means for Stress (284.9 and 258.6) can be regarded as being pulled from the same hat of means or, in more accurate statistical jargon, are within sampling error of each other. Under the null hypothesis, the estimate of σ² derived from these two marginal means is 19,612.76, given in Figure 1.6 as the mean square for Stress. This estimate of σ² is compared to the error variance. Hence, the F statistic for Stress equals the mean square for Stress divided by the mean square for Error in the overall ANOVA table. You should verify that 7.28 is the result (within rounding error) of dividing 19,612.76 by 2,695.61. The p value for this F statistic is less than .05, so we conclude that the two means are not equal. Substantively, chronic stress results in a reduction in the number of receptors. The next row in the last table of Figure 1.6 tests whether the four marginal means for the SSRIs are within sampling error of one another. Once again, these four means (along with the sample sizes on which they are calculated) are used to derive an estimate of σ² based on the mathematics of the null hypothesis. This estimate of σ² is the mean square for SSRI in this row (4,256.62). The F statistic for SSRI divides this mean square by the error variance: 4,256.62 / 2,695.61 = 1.58. The p value for this F (.20) is greater than .05, so there is no compelling evidence to reject the null hypothesis. Substantively, these data suggest that, as in the first study, there are no mean SSRI differences in receptor concentrations.
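A short Python check ties these numbers together. The marginal means are computed here as unweighted averages of the Table 1.2 cell means, so they differ slightly from the printed margins, which weight by the (unequal) cell sizes; the F for Stress is recovered from the mean squares in Figure 1.6:

```python
# Cell means from Table 1.2 (columns: SSRI1..SSRI4)
cells = {"None":    [279.6, 271.6, 292.3, 296.9],
         "Chronic": [236.5, 272.8, 249.5, 277.4]}

# Marginal mean for each stress level: average over SSRIs (equal weights assumed)
marg_none = sum(cells["None"]) / 4
marg_chronic = sum(cells["Chronic"]) / 4
print(marg_none, marg_chronic)  # near the 284.9 and 258.6 printed in Table 1.2

# F for Stress: Type III mean square over the error mean square (Figure 1.6)
F_stress = 19612.75652 / 2695.6116
print(round(F_stress, 2))
```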

1.3.2 Interpreting Results in a Twoway ANOVA: Interaction

A model that tests for the marginal means (which is equivalent to tests of the ANOVA factors per se) is called a main effects model. When the model has more than one ANOVA factor, then the model may also contain an interaction between ANOVA factors. In regression, an interaction between two variables tests for equal slopes. The same can be said of interactions in ANOVA, but the equality is for profiles of means and not necessarily slopes. Because the stress factor in this example has only two levels, we can begin the discussion of interaction by constructing a plot of the two means for each of the four SSRIs (see Figure 1.7). Here, there are four straight lines, so the ANOVA interaction tests whether the slopes of these four lines are the same. In the earlier Figure 1.5, the means were expressed by two line graphs, one for the stressed and the other for the nonstressed rats. A line like this is called a profile and consists of a series of valleys and peaks. The interaction term in ANOVA tests whether the two profiles have a similar shape, although they may differ in elevation.


Figure 1.7 Mean (+/- 1 standard error) receptor concentrations of stressed and nonstressed rats as a function of the type of serotonin reuptake inhibitor (SSRI).

[Figure 1.7: line plot of mean receptor concentration (y-axis roughly 200 to 275) against stress condition (None, Chronic), with one dashed line per SSRI.]
Algebraically, the interaction model tests the difference between the observed cell means and the cell means predicted from the marginal effects (i.e., the additive effects of the levels of the main ANOVA factors). The additive effect for a level of an ANOVA factor is simply the difference between the marginal mean for that ANOVA factor and the overall mean. For example, the marginal mean for stressed rats is 258.6 and the overall mean is 271.7 (see Table 1.2). Hence the marginal or additive effect of the stressed condition is 258.6 - 271.7 = -13.1. Substantively, this means that pretreating the rats with chronic stress decreases receptor concentration by 13.1 units. Similarly, the marginal effect of the first SSRI is 258.0 - 271.7 = -13.7 units. If the effects of stress and SSRI1 are additive, then the predicted mean for the stressed/SSRI1 cell is the overall mean plus the marginal effect of stress plus the marginal effect of SSRI1, or 271.7 - 13.1 - 13.7 = 244.9. The difference between the observed cell mean (236.5) and the predicted cell mean is the interaction effect: 236.5 - 244.9 = -8.4.
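The additive decomposition (cell effect = observed cell mean minus grand mean minus row effect minus column effect) is quick to verify in Python from the Table 1.2 margins:

```python
# Marginal and grand means taken from Table 1.2
grand = 271.7
marg_chronic = 258.6   # chronic-stress marginal mean
marg_ssri1 = 258.0     # SSRI1 marginal mean
cell_obs = 236.5       # observed chronic-stress/SSRI1 cell mean

# Additive (main) effects: marginal mean minus grand mean
eff_stress = marg_chronic - grand   # about -13.1
eff_ssri1 = marg_ssri1 - grand      # about -13.7

# Predicted cell mean under a purely additive (no-interaction) model
predicted = grand + eff_stress + eff_ssri1
interaction = cell_obs - predicted
print(round(predicted, 1), round(interaction, 1))
```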


Under the null hypothesis, the variance of the interaction effects should equal the population variance, σ². Under the alternative hypothesis, the variance of the interaction effects should be greater than σ². The variance of the interactions equals the mean square for the Stress*SSRI effect in Figure 1.6. The F test for the interaction consists of this mean square divided by the error variance. For the SSRI data, the F equals 1.19 and its p value is .32. Hence, there is no evidence for an interaction. The overall ANOVA results show a significant effect for stress, but no effect for either the main effect of SSRI or the interaction between stress and SSRI. In terms of the plot of means in Figure 1.5, this implies that the line for the stressed animals is significantly lower than that for the nonstressed rats. The lack of effect for SSRI and for the interaction implies that the two lines for stressed and unstressed rats should really be straight, horizontal lines. The fact that they do not take on this appearance may be attributed to sampling error.

1.3.3 Deleting Interaction Terms

In the discussion of interactions in regression (see Section X.X), we noted that when an interaction is not significant, the model should be fitted again without the interaction term. The same precept applies to ANOVA. In doing this, however, one must be mindful of the reasons for dropping nonsignificant interactions. Interactions may, but do not have to, induce multicollinearity among the independent variables in the model, particularly among those variables involved in the interaction. In ANOVA, multicollinearity is synonymous with "lack of orthogonality" (see Section X.X). Hence, when the number of observations varies markedly among the cells, the ANOVA becomes more "multicollinear." In such cases, it is recommended that nonsignificant interactions be dropped and the ANOVA rerun. When there are equal numbers of observations per cell, or when this situation is closely approximated, the results for the main effects will not usually change appreciably in models with and without the interaction.


1.4 Multiway ANOVA


Multiway ANOVAs have more than two ANOVA factors. They operate just the same as twoway ANOVAs; they simply have more main effects and more potential interaction terms. The logic of the interaction terms is identical to that outlined in Section X.X for regression. We illustrate with an example. Vasopressin is an octapeptide hormone that has been associated with sociality in some rodent species (Tom Insel refs). A lab that is exploring the development of a genetic mouse model of social behavior decides to screen a series of inbred mice for sensitivity to the social effects of vasopressin. Six inbred strains of mice are selected, and mice within each strain are randomly assigned to receive an implant that releases a certain amount of vasopressin or a sham implant. The researchers also intend to analyze sex differences by studying both male and female mice. The dependent variable is a composite score of sociality. The design for this study is a six (Strain) by two (Implant type) by two (Sex) ANOVA. With three ANOVA factors, there are three possible two-way interactions: Strain*Implant, Strain*Sex, and Implant*Sex. In addition, there will be one three-way interaction: Strain*Implant*Sex. Before beginning the analysis, it is important to consider the critical terms in


the GLM in light of the investigator's purpose. This lab wants to identify strains that are sensitive to the effects of extra vasopressin released from the implant and distinguish these from strains that lack sensitivity to the hormone. Hence, they hypothesize that the profile of strain means for those mice with the active implant should differ from the profile for mice without the implant. This is operationalized in the GLM by the interaction between Strain and Implant. We begin data analysis by fitting a full model to the data. This model predicted significantly better than chance: R² = .25, df = (23, 376), p < .0001. The results for the individual terms are given in Figure 1.8.
Figure 1.8 Results of an ANOVA for mouse sociality data.
Source                DF   Type III SS   Mean Square   F Value   Pr > F
Strain                 5   348.6991144    69.7398229      4.21   0.0010
Implant                1   572.4634659   572.4634659     34.56   <.0001
Sex                    1   498.2671829   498.2671829     30.08   <.0001
Strain*Implant         5   396.8582913    79.3716583      4.79   0.0003
Strain*Sex             5   160.3450572    32.0690114      1.94   0.0875
Implant*Sex            1     6.2285028     6.2285028      0.38   0.5401
Strain*Implant*Sex     5    76.2976395    15.2595279      0.92   0.4670

All main effects are significant. The means of the six strains, collapsed over Implant and Sex, are not a random sample of means (F(5, 376) = 4.21, p = .001), although this fact alone does not inform us of which strains differ from which others. The Implant had the effect of increasing sociality by 2.3 raw units (standardized effect size = .57), and females scored on average 2.1 units higher than males (standardized effect size = .52). Of particular importance is the Strain*Implant interaction. According to the investigators' original aims, this is the critical term in the GLM, and it is significant (F(5, 376) = 4.79, p = .0003). None of the other interactions reaches significance, although there is a noteworthy trend for the Strain*Sex interaction (F(5, 376) = 1.94, p = .09).

Before examining the Strain*Implant interaction, we might consider rerunning the model several times, each time dropping an interaction term. In fact, this should have little effect. Although the ANOVA is not completely orthogonal, it is not far from it (the smallest N per cell was 14 and the largest was 18), so the results from the full model should not change much when interaction terms are dropped. To illustrate, Table 1.3 presents the F statistics and their p values for the full model, for a model dropping the three-way interaction (Model 1 in Table 1.3), for one dropping both the three-way interaction and the Implant*Sex interaction (Model 2), and for one that drops all of the nonsignificant interactions (Model 3).

The form of the Strain by Implant interaction is depicted in Figure 1.9. The sensitivity of a strain to the extra vasopressin released by the implant can be judged by the difference in strain means between the active and the sham implant. Strains 2 and 3 are the most sensitive, while strains 1 and 4 are the least sensitive. The lab might wish to follow these findings up by, for example, using these strains for selective breeding or for cross-breeding to examine potential quantitative trait loci that might contribute to murine sociality.


Table 1.3 Effect of dropping interaction terms from the GLM on mouse sociality.
                       Full model      Model 1        Model 2        Model 3
Effect                  F      p       F      p       F      p       F      p
Strain                 4.21   .001    4.32  <.001    4.30  <.001    4.41  <.001
Implant               34.56  <.001   35.15  <.001   35.01  <.001   36.20  <.001
Sex                   30.08  <.001   29.20  <.001   29.32  <.001   29.67  <.001
Strain*Implant         4.79  <.001    4.82  <.001    4.84  <.001    5.18  <.001
Strain*Sex             1.94   .088    1.94   .088    1.93   .087      -      -
Implant*Sex            0.38   .540    0.37   .544      -      -       -      -
Strain*Implant*Sex     0.92   .467      -      -       -      -       -      -
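The refitting exercise summarized in Table 1.3 is a series of nested-model comparisons: drop a term, refit, and test whether the lost fit is significant. A minimal numpy sketch of one such comparison, using simulated data for a hypothetical two-factor design (not the mouse data):

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares from an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(2)
n = 120
a = rng.integers(0, 2, n).astype(float)      # factor A, dummy coded (hypothetical)
b = rng.integers(0, 2, n).astype(float)      # factor B, dummy coded
y = 1.0 * a + 0.5 * b + rng.normal(size=n)   # simulate: no true A*B interaction

ones = np.ones(n)
X_full = np.column_stack([ones, a, b, a * b])  # model including the interaction
X_red = np.column_stack([ones, a, b])          # interaction term dropped

rss_full = ols_rss(X_full, y)
rss_red = ols_rss(X_red, y)
df_num = X_full.shape[1] - X_red.shape[1]      # parameters dropped: 1
df_den = n - X_full.shape[1]
F = ((rss_red - rss_full) / df_num) / (rss_full / df_den)
print(F)  # F statistic for the dropped interaction term
```

When the dropped term explains little variance, the residual sums of squares of the two models are close and F is small, which is why the entries in Table 1.3 barely move from model to model.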

Figure 1.9 Means (+/- 1 SE) for mouse sociality scores in six inbred strains of mice as a function of an implant releasing vasopressin.
[Line plot: x-axis Strain Number (1-6); y-axis mean sociality; separate lines for Implant: No vs. Yes]


1.5 ANCOVA: The Analysis of Covariance


The analysis of covariance (ANCOVA) is something of a misnomer because the name implies that the object of analysis is the covariance statistic. In fact, ANCOVA operates in the same fashion as ANOVA, deriving different estimates of the population variance, σ2, under the null hypothesis and testing their equality using F ratios. A more informed view is to regard ANCOVA as a type of GLM in which some of the independent variables are categorical while others are quantitative and continuous. The continuous variables are termed covariates and, in terms of the algebra of the model, are entered as if they were any other independent variable. Certain software packages, however, may have special routines devoted to ANCOVA, separate from their regression and ANOVA routines.

In a few older statistical texts, a "covariate" was defined as a variable that was required (or desired) for statistical control. The impression was sometimes left on the reader that "covariates" were not very interesting in themselves but were useful in increasing the power of the statistical analysis. Although this view may apply to some cases, it is far from being universally true. Sex, for example, might be dummy coded and used as a quantitative variable in a GLM, and the investigator may be very interested in sex differences in response. The two purposes for using a covariate are the same as those for entering any independent variable into a GLM: understanding and/or statistical control. That said, it should be obvious that everything said about independent variables in regression and ANOVA applies to covariates. One can examine polynomials of covariates as well as interactions between covariates and other independent variables. In short, it is best to think of a covariate as just another independent variable, even though some software packages require covariates to be treated as if they had a special status all to themselves. We illustrate the use of covariates with a clinical trial example.
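To make the "just another independent variable" point concrete, here is a minimal sketch (with made-up data and effect sizes) in which a dummy-coded group factor and a continuous covariate sit side by side as columns of one design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n_per = 50
group = np.repeat([0.0, 1.0], n_per)       # dummy-coded treatment (hypothetical)
covariate = rng.normal(size=2 * n_per)     # continuous covariate, e.g. a baseline
y = 2.0 * group + 1.5 * covariate + rng.normal(size=2 * n_per)

# The covariate enters the design matrix exactly like any other predictor
X = np.column_stack([np.ones(2 * n_per), group, covariate])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # roughly [0, 2.0, 1.5], up to sampling noise
```

The algebra makes no distinction between the two predictors; only the interpretation (group difference versus slope on the covariate) differs.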
The typical design of a clinical trial is to measure subjects, randomly assign them to treatment conditions, and then measure the subjects again during the treatment phase. The initial, pre-assignment measure is usually referred to as the baseline and the measures taken during treatment as the follow-ups.4 The example here involves the relative efficacy of two treatments for major affective disorder (depression): the tricyclic antidepressant imipramine and cognitive-behavioral therapy (CBT). Typical of such studies, patients with clinical diagnoses of depression are screened using a structured psychiatric interview, often supplemented by one or more depression rating scales, to assess whether they meet pre-established eligibility criteria for the study. A patient meeting inclusion criteria is then randomly assigned to one of four groups: (1) double placebo (i.e., placebo drug and a placebo psychotherapy); (2) imipramine and placebo psychotherapy; (3) placebo drug and CBT; and (4) imipramine and CBT. Thus far the design is a two by two ANOVA. The first ANOVA factor is Drug with the two levels of "present" or "absent," and the second ANOVA factor is CBT with the same two levels. It is illustrative to examine the analysis of a six-week follow-up score using this ANOVA model. Figure 1.10 presents the means for the different groups on depression-rating scores at the six-week follow-up, and Figure 1.11 presents the results of a two by two ANOVA performed on these data.

4. Again, these terms, while common, are not universal. There are also many different variants of the basic clinical trial outlined in the text.


It is apparent from both figures that imipramine and CBT each result in lower ratings of depression than the double-control group. The plot of the means in Figure 1.10 is highly suggestive of an interaction. The ANOVA results, however, do not find a significant interaction, so at this time one must conclude that the two lines in Figure 1.10 are within sampling error of being parallel.

Figure 1.10 Mean (+/- 1 SEM) depression scores at six-week follow-up as a function of the type of drug and psychotherapy.
[Line plot: x-axis Imipramine (No vs. Yes); y-axis mean depression score; separate lines for CBT: No vs. Yes]

Figure 1.11 Results of an ANOVA on depression-rating scores six weeks after treatment.

Dependent Variable: Post_6_Weeks

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3       232.100380      77.366793       3.73    0.0136
Error             106      2200.817802      20.762432
Corrected Total   109      2432.918182

R-Square    Coeff Var    Root MSE    Post_6_Weeks Mean
0.095400     20.31714    4.556581             22.42727

Source      DF    Type III SS    Mean Square    F Value    Pr > F
Drug         1    159.2500000    159.2500000       7.67    0.0066
CBT          1    116.2823201    116.2823201       5.60    0.0198
Drug*CBT     1     55.5528771     55.5528771       2.68    0.1049

Figure 1.12 presents the results from adding baseline depression-rating scores as a covariate. Note that the interaction term is now significant. Why is this?

Figure 1.12 Results of an ANCOVA on depression-rating scores six weeks after treatment, controlling for baseline depression scores.

Dependent Variable: Post_6_Weeks

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4       869.554233     217.388558      14.60    <.0001
Error             105      1563.363949      14.889180
Corrected Total   109      2432.918182

R-Square    Coeff Var    Root MSE    Post_6_Weeks Mean
0.357412     17.20517    3.858650             22.42727

Source       DF    Type III SS    Mean Square    F Value    Pr > F
Baseline      1    637.4538524    637.4538524      42.81    <.0001
Drug          1    126.0443563    126.0443563       8.47    0.0044
CBT           1    128.0112038    128.0112038       8.60    0.0041
Drug*CBT      1     61.0286180     61.0286180       4.10    0.0455


The reason is that the baseline depression scores are correlated with depression scores at six weeks after treatment and hence reduce the error variance which, as we should recall, is the denominator of the F statistic. Let us take a few moments to reflect on how this is done. Recall that under the null hypothesis, an F statistic is the ratio of two estimates of the population variance, σ2. When baseline is not in the GLM, the sampling model is akin to drawing raw six-week scores out of a hat, computing means, and throwing the resulting statistics into a hat of means. When baseline is included in the GLM, however, the sampling model is similar to drawing raw six-week scores from a hat, computing a regression of those six-week scores on baseline, and then calculating the residuals from that regression. The means that go into the hat of means are calculated on the residuals from this regression, not on the raw scores. The result is to reduce the variance of the sampling distribution of the means (see Section X.X). If the raw six-week scores had a variance of 1.0, then the variance in the hat of means would equal 1/N. The residuals from the regression, on the other hand, would have a variance of 1 - r2, where r is the correlation between baseline and six-week scores. Hence, the variance in the hat of means derived from the residuals equals (1 - r2)/N.

A comparison of the statistics in Figure 1.11 with those in Figure 1.12 bears this out. Without baseline in the model, the estimate of σ2 used in the denominator of the F statistic is 20.8 (the Mean Square for Error in Figure 1.11). When baseline is included, that estimate is 14.9, a reduction of 28%. Similarly, the R2 for the model without baseline is .10, while the R2 for the model with baseline is .36. Because the only difference between these two models is the addition of baseline as an independent variable, the increase in R2 is attributable solely to the effects of baseline.
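The (1 - r2) reduction can be checked numerically. The following sketch uses simulated (not the trial's) data: baseline and six-week scores are generated with correlation r = .6 and unit variance, and the raw variance is compared with the variance of the residuals after regressing on baseline:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
r = 0.6  # assumed baseline / six-week correlation for the simulation
baseline = rng.normal(size=n)
followup = r * baseline + np.sqrt(1 - r**2) * rng.normal(size=n)

# Regress follow-up on baseline and take the residuals
slope = np.cov(baseline, followup)[0, 1] / np.var(baseline)
resid = followup - slope * baseline

print(np.var(followup))  # close to 1.0
print(np.var(resid))     # close to 1 - r**2 = 0.64
```

The residual variance lands near 1 - r2 = .64, which is exactly the mechanism by which adding baseline shrank the Mean Square for Error from 20.8 to 14.9 in the trial data.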
