PLS205 Lab 3 January 23, 2014

Laboratory Topics 4 & 5

Orthogonal contrasts
Class comparisons in SAS
Trend analysis in SAS
Multiple mean comparisons

Orthogonal contrasts

Planned, single degree-of-freedom orthogonal contrasts are a powerful means of perfectly partitioning the
ANOVA model sum of squares to gain greater insight into your data. This method of analysis is
available in SAS via Proc GLM.

Whenever you program contrasts, be sure to use the "Order = Data" option in Proc GLM so that the
coefficients featured in the subsequent Contrast statements will correspond accurately to the levels of the
indicated classification variable. For example:

Proc GLM Order = Data;

Contrast statements can appear anywhere after the Model statement in Proc GLM. Each statement
specifies one independent F test to be conducted. The syntax is:

Contrast 'ID' ClassVariable Coefficients;

where ID, enclosed in single quotes, is the label you assign to the contrast (just a title, it can be anything);
ClassVariable is the classification variable whose means are being compared; and Coefficients is the set
of orthogonal coefficient values, separated by spaces or tabs. Note that in a nested design, it is imperative
that the coefficients be followed by a declaration of the appropriate error term:

Contrast 'A vs. B' Trtmt 1 1 1 -1 -1 -1 / e = Pot(Trtmt);
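
To see where such a statement sits in a complete analysis, here is a minimal sketch for a nested design. The data set name PotData, the response Y, and the layout (six treatments, several pots per treatment, several plants per pot) are assumptions made for illustration; only the Contrast line comes from the text above.

```sas
* Hypothetical nested design with plants within pots within treatments;
Proc GLM Data = PotData Order = Data;
  Class Trtmt Pot;
  Model Y = Trtmt Pot(Trtmt);
  * Test the overall treatment effect against the pot-to-pot variation;
  Test H = Trtmt E = Pot(Trtmt);
  * Test the contrast against the same error term;
  Contrast 'A vs. B' Trtmt 1 1 1 -1 -1 -1 / E = Pot(Trtmt);
Run;
Quit;
```

Without the E = option, the contrast would be tested against the residual (plant-to-plant) mean square, which is generally too small an error term for comparing treatments applied to whole pots.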

Before looking at our first example, remember that Proc GLM uses several different methods for
determining SS, the details of which will be covered later in the course [For more details, refer to Topic
11.4 in your class notes]. For now, let's reiterate the following rule of thumb:

Use the Type I SS (sum of squares) for regressions

The Type I SS measures incremental sums of squares for the model as each variable is added.

Use the Type III SS for F-tests

Type III is the sum of squares for each effect adjusted for every other effect and is used for
both balanced and unbalanced designs.
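
Proc GLM prints both the Type I and the Type III tables by default. If you want only one of them, the SS1 and SS3 options of the Model statement select it. The sketch below uses hypothetical names (MyData, Y, X, Trt) purely for illustration.

```sas
* Regression, so request only the sequential (Type I) SS;
Proc GLM Data = MyData;
  Model Y = X X*X / SS1;
Run;
Quit;

* ANOVA F tests, so request only the partial (Type III) SS;
Proc GLM Data = MyData;
  Class Trt;
  Model Y = Trt / SS3;
Run;
Quit;
```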

CLASS COMPARISONS USING CONTRASTS

PLS205 2014 3.1 Lab 3 (Topics 4-5)


Example 4.1 ST&D pg. 159 [Lab3ex1.sas]

This is a CRD in which 18 mint plants were randomly assigned to 6 different treatments (i.e. all
combinations of two temperature [High and Low] and three light [8, 12, and 16 hour days] conditions)
and their growth measured.

Data MintMean;
Input Trtmt $ Growth @@;
Cards;
L08 15.0 L12 18.0 L16 19.0 H08 32.0 H12 22.0 H16 33.0
L08 17.5 L12 14.0 L16 21.5 H08 28.0 H12 26.5 H16 27.0
L08 11.5 L12 17.5 L16 22.0 H08 28.0 H12 29.0 H16 35.0
;

* L08 means Low Temp and 8 hours of light, H12 means High Temp and 12 hours
of light, etc.;

Proc GLM Order = Data; * To maintain the order in which we entered data;
Class Trtmt;
Model Growth = Trtmt; * L08 L12 L16 H08 H12 H16;
Contrast 'Temp' Trtmt 1 1 1 -1 -1 -1;
Contrast 'Light linear' Trtmt 1 0 -1 1 0 -1;
Contrast 'Light quadratic' Trtmt 1 -2 1 1 -2 1;
Contrast 'Temp * Light linear' Trtmt 1 0 -1 -1 0 1;
Contrast 'Temp * Light quadratic' Trtmt 1 -2 1 -1 2 -1;
Run;
Quit;

What questions are we asking here, exactly? To answer this, it is helpful to articulate the null hypothesis
for each contrast:
Contrast Temp H0: Mean plant growth under low temperature conditions is the same as under high
temperature conditions.
Contrast Light Linear H0: Mean plant growth under 8 hour days is the same as under 16 hour days
(OR: The response of growth to light has no linear component).
Contrast Light Quadratic H0: Mean plant growth under 12 hour days is the same as the average mean
growth under 8 and 16 hour days combined (OR: The growth response to light is perfectly linear; OR:
The response of growth to light has no quadratic component).
Contrast Temp * Light Linear H0: The linear component of the response of growth to light is the same
at both temperatures.
Contrast Temp * Light Quadratic H0: The quadratic component of the response of growth to light is
the same at both temperatures.

So what would it mean to find significant results and to reject each of these null hypotheses?
Reject contrast Temp H0 = There is a significant response of growth to temperature.
Reject contrast Light linear H0 = The response of growth to light has a significant linear component.
Reject contrast Light quadratic H0 = The response of growth to light has a significant quadratic
component.
Reject contrast Temp * Light Linear H0 = The linear component of the response of growth to light
depends on temperature.
Reject contrast Temp * Light Quadratic H0 = The quadratic component of the response of growth to
light depends on temperature.
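
A quick way to convince yourself that this set of contrasts is orthogonal is to check the coefficients directly. The symbols c_i and d_i below are notation introduced here for the coefficients of any two contrasts, with the treatments in the order L08, L12, L16, H08, H12, H16 and equal replication assumed.

```latex
\begin{align*}
\text{Each contrast sums to zero, e.g. Light quadratic:}\quad
  & 1 - 2 + 1 + 1 - 2 + 1 = 0 \\
\text{Each pair has a zero cross-product, e.g. Temp and Light linear:}\quad
  & \textstyle\sum_i c_i d_i = (1)(1) + (1)(0) + (1)(-1) + (-1)(1) + (-1)(0) + (-1)(-1) = 0 \\
\text{Interaction coefficients are element-wise products, e.g. Temp * Light linear:}\quad
  & (1,1,1,-1,-1,-1) \odot (1,0,-1,1,0,-1) = (1,0,-1,-1,0,1)
\end{align*}
```

The last line shows where the coefficients of the 'Temp * Light linear' contrast come from: multiply the Temp coefficients by the Light linear coefficients, treatment by treatment.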

Results of the GLM procedure
Dependent Variable: Growth

Sum of
Source DF Squares Mean Square F Value Pr > F

Model 5 718.5694444 143.7138889 16.69 <.0001
Error 12 103.3333333 8.6111111
Corrected Total 17 821.9027778

R-Square Coeff Var Root MSE Growth Mean

0.874275 12.68198 2.934469 23.13889

Source DF Type III SS Mean Square F Value Pr > F

Trtmt 5 718.5694444 143.7138889 16.69 <.0001

Contrast DF Contrast SS Mean Square F Value Pr > F

Temp 1 606.6805556 606.6805556 70.45 <.0001 ***
Light linear 1 54.1875000 54.1875000 6.29 0.0275 *
Light quadratic 1 35.0069444 35.0069444 4.07 0.0667
Temp * Light linear 1 11.0208333 11.0208333 1.28 0.2800
Temp * Light quadratic 1 11.6736111 11.6736111 1.36 0.2669

Things to notice

Notice the sum of the contrast degrees of freedom. What does it equal? Why?
Notice the sum of the contrast SS. What does it equal? Why?
What insight does this analysis give you into your experiment?
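
If you want to check your answers by hand, each contrast SS can be reproduced from the treatment totals of the data above. In the notation introduced here, Y_i. is the total of treatment i, c_i is its contrast coefficient, and r = 3 is the number of replications per treatment.

```latex
\begin{align*}
SS(Q) &= \frac{\left(\sum_i c_i Y_{i\cdot}\right)^2}{r \sum_i c_i^2} \\[4pt]
\text{Treatment totals (L08, L12, L16, H08, H12, H16):}\quad
  & 44.0,\ 49.5,\ 62.5,\ 88.0,\ 77.5,\ 95.0 \\[4pt]
SS(\text{Temp}) &= \frac{(44.0 + 49.5 + 62.5 - 88.0 - 77.5 - 95.0)^2}{3 \times 6}
  = \frac{(-104.5)^2}{18} \approx 606.68 \\[4pt]
\sum \text{Contrast SS} &= 606.68 + 54.19 + 35.01 + 11.02 + 11.67 = 718.57 = SS(\text{Trtmt})
\end{align*}
```

The five contrasts use up all 5 treatment degrees of freedom, and because they are mutually orthogonal their sums of squares add up exactly to the treatment SS.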

TREND ANALYSIS USING CONTRASTS

Example 4.2 ST&D pg. 387 [Lab3ex2.sas]

This experiment was conducted to investigate the relationship between plant spacing and yield in
soybeans. The researcher randomly assigned five different plant spacings to 30 field plots, planted the
soybeans accordingly, and measured the yield of each plot at the end of the season. Since we are
interested in the overall relationship between plant spacing and yield (i.e. characterizing the response of
yield to plant spacing), it is appropriate to perform a trend analysis.

Title 'Equally spaced treatments in a CRD';


Data SoyRows;
Input Sp Yield @@; * @@ is needed because each data line holds five observations;
Cards;
18 33.6 24 31.1 30 33.0 36 28.4 42 31.4
18 37.1 24 34.5 30 29.5 36 29.9 42 28.3
18 34.1 24 30.5 30 29.2 36 31.6 42 28.9
18 34.6 24 32.7 30 30.7 36 32.3 42 28.6
18 35.4 24 30.7 30 30.7 36 28.1 42 29.6
18 36.1 24 30.3 30 27.9 36 26.9 42 33.4
;
Proc GLM Order = Data;
Class Sp;
Model Yield = Sp;
Means Sp; * 18 24 30 36 42;
Contrast 'Linear' Sp -2 -1 0 1 2;
Contrast 'Quadratic' Sp 2 -1 -2 -1 2;
Contrast 'Cubic' Sp -1 2 0 -2 1;
Contrast 'Quartic' Sp 1 -4 6 -4 1;
Run;
Quit;

What questions are we asking here, exactly? As before, it is helpful to articulate the null hypothesis for
each contrast:
Contrast Linear H0: The response of yield to spacing has no linear component.
Contrast Quadratic H0: The response of yield to spacing has no quadratic component.
Contrast Cubic H0: The response of yield to spacing has no cubic component.
Contrast Quartic H0: The response of yield to spacing has no quartic component.

Can you see, based on the contrast coefficients, why these are the null hypotheses?
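
One way to see it: code the spacings as u = (Sp - 30)/6, so that the five levels become -2, -1, 0, 1, 2. Each set of contrast coefficients is then a polynomial of the stated degree evaluated at those five values. The symbols u and P_k are notation introduced here, and each polynomial is scaled so that the coefficients come out as integers.

```latex
\begin{align*}
u &= (Sp - 30)/6 \in \{-2, -1, 0, 1, 2\} \\
P_1(u) &= u                                   &&\Rightarrow\ (-2, -1, 0, 1, 2) \\
P_2(u) &= u^2 - 2                             &&\Rightarrow\ (2, -1, -2, -1, 2) \\
P_3(u) &= \tfrac{1}{6}\,(5u^3 - 17u)          &&\Rightarrow\ (-1, 2, 0, -2, 1) \\
P_4(u) &= \tfrac{1}{12}\,(35u^4 - 155u^2 + 72) &&\Rightarrow\ (1, -4, 6, -4, 1)
\end{align*}
```

Because these polynomials are mutually orthogonal over the five equally spaced points, each contrast isolates the part of the response attributable to that degree alone.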

Results of the GLM procedure
Dependent Variable: Yield

Sum of
Source DF Squares Mean Square F Value Pr > F

Model 4 125.6613333 31.4153333 9.90 <.0001
Error 25 79.3283333 3.1731333
Corrected Total 29 204.9896667

R-Square Coeff Var Root MSE Yield Mean

0.613013 5.690541 1.781329 31.30333

Source DF Type III SS Mean Square F Value Pr > F

Sp 4 125.6613333 31.4153333 9.90 <.0001

Contrast DF Contrast SS Mean Square F Value Pr > F

Linear 1 91.26666667 91.26666667 28.76 <.0001 ***
Quadratic 1 33.69333333 33.69333333 10.62 0.0032 **
Cubic 1 0.50416667 0.50416667 0.16 0.6936
Quartic 1 0.19716667 0.19716667 0.06 0.8052

Interpretation

There is a quadratic relationship between row spacing and yield. Why? Because there is a significant
quadratic component to the response but no significant cubic or quartic components. Please note that we
are only able to carry out trend comparisons in this way because the treatments are equally spaced. Now,
exactly the same result can be obtained through a regression approach, as shown in the next example.
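
When the treatment levels are not equally spaced, the tabulated integer coefficients no longer apply, but suitable coefficients can still be generated. A minimal sketch, assuming SAS/IML is licensed at your site, uses the ORPOL function. It is shown here with the equally spaced soybean levels so that the result can be compared with the coefficients used above.

```sas
Proc IML;
  x = {18, 24, 30, 36, 42};  /* treatment levels (replace with your own spacings) */
  P = orpol(x, 4);           /* columns hold polynomials of degree 0 through 4 */
  Coef = t(P[, 2:5]);        /* drop the constant column, giving one row per contrast */
  Print Coef;
Quit;
```

ORPOL returns orthonormal columns, so each row of Coef is proportional to, rather than equal to, the integer coefficients in Example 4.2, and the signs may differ. Proportional coefficients give the same contrast SS and F test.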

Example 4.3 [Lab3ex3.sas]

Title 'Equally spaced treatments in a CRD';


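Data SoyRows;
Input Sp Yield @@;
Cards;
18 33.6 24 31.1 30 33.0 36 28.4 42 31.4
18 37.1 24 34.5 30 29.5 36 29.9 42 28.3
18 34.1 24 30.5 30 29.2 36 31.6 42 28.9
18 34.6 24 32.7 30 30.7 36 32.3 42 28.6
18 35.4 24 30.7 30 30.7 36 28.1 42 29.6
18 36.1 24 30.3 30 27.9 36 26.9 42 33.4
;
Proc GLM; * No Class statement, so Sp is treated as a continuous regressor;
Model Yield = Sp /* 'Linear' H0: There is no linear component */
Sp*Sp /* 'Quadratic' H0: There is no quadratic component */
Sp*Sp*Sp /* 'Cubic' H0: There is no cubic component */
Sp*Sp*Sp*Sp; /* 'Quartic' H0: There is no quartic component */
Run;
Quit;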

Results of the GLM procedure
Dependent Variable: Yield

Sum of
Source DF Squares Mean Square F Value Pr > F

Model 4 125.6613333 31.4153333 9.90 <.0001
Error 25 79.3283333 3.1731333
Corrected Total 29 204.9896667

R-Square Coeff Var Root MSE Yield Mean

0.613013 5.690541 1.781329 31.30333

Source DF Type I SS Mean Square F Value Pr > F

Sp 1 91.26666667 91.26666667 28.76 <.0001 ***
Sp*Sp 1 33.69333333 33.69333333 10.62 0.0032 **
Sp*Sp*Sp 1 0.50416667 0.50416667 0.16 0.6936
Sp*Sp*Sp*Sp 1 0.19716667 0.19716667 0.06 0.8052

Source DF Type III SS Mean Square F Value Pr > F

Sp 1 0.41016441 0.41016441 0.13 0.7222
Sp*Sp 1 0.27910540 0.27910540 0.09 0.7692
Sp*Sp*Sp 1 0.22140395 0.22140395 0.07 0.7938
Sp*Sp*Sp*Sp 1 0.19716667 0.19716667 0.06 0.8052

Again, since this is a regression analysis, use the Type I SS, not the Type III SS. Notice in this case that
the Type I SS results match perfectly those from our earlier analysis by contrasts.

For the interested:

When you carry out a trend analysis using a regression approach, SAS also provides estimates of the
parameters for your model:

Standard
Parameter Estimate Error t Value Pr > |t|

Intercept 92.91666667 132.6083560 0.70 0.4900
Sp -6.97245370 19.3932598 -0.36 0.7222
Sp*Sp 0.30495756 1.0282517 0.30 0.7692
Sp*Sp*Sp -0.00620499 0.0234905 -0.26 0.7938
Sp*Sp*Sp*Sp 0.00004876 0.0001956 0.25 0.8052

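Be careful when turning these estimates into a trend line. Only the linear and quadratic components are
significant, so the trend line that best fits our data is a quadratic. The estimates above, however, belong
to the full quartic model and cannot simply be truncated: 0.30 * Sp^2 - 6.97 * Sp + 92.92 evaluated at
Sp = 30 gives a yield above 150, whereas the observed yields are all near 30. Refitting the model with
only Sp and Sp*Sp gives, for these data, approximately:

Yield = 0.0176 * Sp^2 - 1.261 * Sp + 52.04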
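
Since the cubic and quartic terms are not significant, a usable trend equation comes from refitting the model with only the linear and quadratic terms. A minimal sketch, assuming the SoyRows data set from this example is still available in your SAS session:

```sas
* Reduced model with linear and quadratic terms only;
Proc GLM Data = SoyRows;
  Model Yield = Sp Sp*Sp / SS1;
Run;
Quit;
```

The parameter estimates printed for this reduced model are the coefficients of the quadratic trend line. They differ from the Sp and Sp*Sp estimates of the quartic fit because the powers of Sp are highly correlated with one another. The Type I SS for Sp and Sp*Sp are unchanged, but the F values shift slightly because the cubic and quartic SS are now pooled into the error.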

Multiple Mean Comparisons

Orthogonal contrasts are planned, a priori tests that partition the experimental variance cleanly. They are
a powerful tool for analyzing data, but they are not appropriate for all experiments. Less restrictive
comparisons among treatment means can be performed using Proc GLM by way of the Means statement.
Any number of Means statements may be used within a given Proc GLM, provided they appear after the
Model statement. The syntax:

Means Class-Variables / Options;

This statement tells SAS to:

1. Compute the means of the response variable for each level of the specified classification
variable(s), all of which were featured in the original Model statement; then
2. Perform multiple comparisons among these means using the stated Options.

Some of the available Options are listed below:

Fixed Range Tests

DUNNETT ('control') Dunnett's test [NOTE: If no control is specified, the first treatment is used.]
T or LSD Fisher's least significant difference test
TUKEY Tukey's studentized range test (HSD: Honestly significant difference)
SCHEFFE Scheffe's test

Multiple Range Tests

DUNCAN Duncan's test
SNK Student-Newman-Keuls test
REGWQ Ryan-Einot-Gabriel-Welsch test

The default significance level for comparisons among means is α = 0.05, but this can be changed easily
using the option Alpha = α, where α is the desired significance level. The important thing to keep in mind
is the EER (experimentwise error rate): we want to keep it controlled while keeping the test as sensitive as
possible, so our choice of test should reflect that.
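
For instance, here is a sketch of how the significance level might be changed for one of the tests. It assumes the clover data set of Example 4.4 (data set Clover, class variable Culture, response Nlevel) has already been created.

```sas
Proc GLM Data = Clover;
  Class Culture;
  Model Nlevel = Culture;
  * Tukey comparisons at the 0.01 level instead of the default 0.05;
  Means Culture / Tukey Alpha = 0.01;
  * Same test at the default level, reported as confidence limits for each pairwise difference;
  Means Culture / Tukey CLDiff;
Run;
Quit;
```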

Example 4.4 (One-Way Multiple Comparison) [Lab3ex4.sas]

Here's the clover experiment again, a CRD in which 30 different clover plants were randomly inoculated
with six different strains of rhizobium and the resulting level of nitrogen fixation measured.

Data Clover;
Input Culture $ Nlevel @@; * @@ is needed because each data line holds three observations;
Cards;
3DOk1 24.1 3DOk4 17.9 3DOk13 14.3
3DOk1 32.6 3DOk4 16.5 3DOk13 14.4
3DOk1 27 3DOk4 10.9 3DOk13 11.8
3DOk1 28.9 3DOk4 11.9 3DOk13 11.6
3DOk1 31.4 3DOk4 15.8 3DOk13 14.2
3DOk5 19.1 3DOk7 20.7 Comp 17.3
3DOk5 24.8 3DOk7 23.4 Comp 19.4
3DOk5 26.3 3DOk7 20.5 Comp 19.1
3DOk5 25.2 3DOk7 18.1 Comp 16.9
3DOk5 24.3 3DOk7 16.7 Comp 20.8
;
Proc GLM;
Class Culture;
Model Nlevel = Culture;
Means Culture / LSD;
Means Culture / Dunnett ('Comp'); * The control treatment is 'Comp';
Means Culture / Tukey;
Means Culture / Scheffe;
Means Culture / Duncan;
Means Culture / SNK;
Means Culture / REGWQ;
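Proc Sort Data = Clover; * Sorting keeps the observations of each culture together, as Proc Boxplot expects;
By Culture;
Proc Boxplot Data = Clover;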
Title 'Boxplot Comparing Treatment Means';
Plot NLevel*Culture / cboxes = black;
Run;
Quit;

In this experiment, there is no obvious structure to the treatment levels and therefore no way to anticipate
the relevant questions to ask. We want to know how the different rhizobial strains performed; and to do
this, we must systematically make all pair-wise comparisons among them.
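
To get a feel for why the choice of test matters here, count the comparisons. The bound below treats the comparisons as independent, which pairwise comparisons among the same six means are not, so read it as a rough guide rather than an exact error rate.

```latex
\begin{align*}
\text{Number of pairwise comparisons among } t = 6 \text{ treatments:}\quad
  & \binom{6}{2} = \frac{6 \times 5}{2} = 15 \\
\text{Approximate EER if each comparison is tested at } \alpha = 0.05:\quad
  & 1 - (1 - 0.05)^{15} \approx 0.54
\end{align*}
```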

In the output on the following pages, keep an eye on the least (or minimum) significant difference(s)
used for each test.

What is indicated by changes in these values from test to test?

Also notice how the comparisons change significance with the different tests.

t Tests (LSD) for Nlevel

This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha 0.05
Error Degrees of Freedom 24
Error Mean Square 6.668833
Critical Value of t 2.06390
Least Significant Difference 3.3709

Means with the same letter are not significantly different.

t Grouping Mean N Culture

A 28.800 5 3DOk1
B 23.940 5 3DOk5
C 19.880 5 3DOk7
C 18.700 5 Comp
D 14.600 5 3DOk4
D 13.260 5 3DOk13

Dunnett's t Tests for Nlevel

This test controls the Type I experimentwise error for comparisons of all treatments against a control.

Alpha 0.05
Error Degrees of Freedom 24
Error Mean Square 6.668833
Critical Value of Dunnett's t 2.69540
Minimum Significant Difference 4.4023

Comparisons significant at the 0.05 level are indicated by ***.

Difference
Culture Between Simultaneous 95%
Comparison Means Confidence Limits

3DOk1 - Comp 10.100 5.698 14.502 ***
3DOk5 - Comp 5.240 0.838 9.642 ***
3DOk7 - Comp 1.180 -3.222 5.582
3DOk4 - Comp -4.100 -8.502 0.302
3DOk13 - Comp -5.440 -9.842 -1.038 ***

Tukey's Studentized Range (HSD) Test for Nlevel

This test controls the Type I experimentwise error rate (MEER), but it generally has a higher Type II
error rate than REGWQ.

Alpha 0.05
Error Degrees of Freedom 24
Error Mean Square 6.668833
Critical Value of Studentized Range 4.37265
Minimum Significant Difference 5.0499

Means with the same letter are not significantly different.

Tukey Grouping Mean N Culture

A 28.800 5 3DOk1
B A 23.940 5 3DOk5
B C 19.880 5 3DOk7
D C 18.700 5 Comp
D E 14.600 5 3DOk4
E 13.260 5 3DOk13

Scheffe's Test for Nlevel [For group comparisons with Scheffe, see Section 5.3.1.4]
This test controls the Type I MEER.

Alpha 0.05
Error Degrees of Freedom 24
Error Mean Square 6.668833
Critical Value of F 2.62065
Minimum Significant Difference 5.9121

Means with the same letter are not significantly different.

Scheffe Grouping Mean N Culture

A 28.800 5 3DOk1
B A 23.940 5 3DOk5
B C 19.880 5 3DOk7
B C D 18.700 5 Comp
C D 14.600 5 3DOk4
D 13.260 5 3DOk13
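
The minimum significant differences reported so far can all be reproduced from the error mean square (MSE = 6.668833), the number of replications (r = 5), the number of treatments (t = 6), and the critical value printed with each test. The formulas are the standard ones for equal replication; the arithmetic is shown here as a check, with d and q used as shorthand for the Dunnett and studentized range critical values.

```latex
\begin{align*}
\sqrt{2\,MSE/r} &= \sqrt{2(6.668833)/5} \approx 1.6333, \qquad
\sqrt{MSE/r} = \sqrt{6.668833/5} \approx 1.1549 \\[4pt]
\text{LSD}     &= t_{0.025,\,24}\,\sqrt{2\,MSE/r} = 2.06390 \times 1.6333 \approx 3.371 \\
\text{Dunnett} &= d_{0.05}\,\sqrt{2\,MSE/r} = 2.69540 \times 1.6333 \approx 4.402 \\
\text{Tukey}   &= q_{0.05}(6,\,24)\,\sqrt{MSE/r} = 4.37265 \times 1.1549 \approx 5.050 \\
\text{Scheffe} &= \sqrt{(t-1)\,F_{0.05;\,5,\,24}}\,\sqrt{2\,MSE/r} = \sqrt{5 \times 2.62065} \times 1.6333 \approx 5.912
\end{align*}
```

All four tests share the same standard error. Only the critical value changes, and it grows as the test protects against a wider family of comparisons.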

Duncan's Multiple Range Test for Nlevel

This test controls the Type I comparisonwise error rate, not the MEER.

Alpha 0.05
Error Degrees of Freedom 24
Error Mean Square 6.668833

Number of Means 2 3 4 5 6
Critical Range 3.371 3.540 3.649 3.726 3.784

Means with the same letter are not significantly different.

Duncan Grouping Mean N Culture

A 28.800 5 3DOk1
B 23.940 5 3DOk5
C 19.880 5 3DOk7
C 18.700 5 Comp
D 14.600 5 3DOk4
D 13.260 5 3DOk13

Student-Newman-Keuls (SNK) Test for Nlevel

This test controls the Type I experimentwise error rate under the complete null hypothesis (EERC) but not
under partial null hypotheses (EERP).

Alpha 0.05
Error Degrees of Freedom 24
Error Mean Square 6.668833

Number of Means 2 3 4 5 6
Critical Range 3.3708858 4.0787156 4.5055234 4.8116298 5.0499266

Means with the same letter are not significantly different.

SNK Grouping Mean N Culture

A 28.800 5 3DOk1
B 23.940 5 3DOk5
C 19.880 5 3DOk7
C 18.700 5 Comp
D 14.600 5 3DOk4
D 13.260 5 3DOk13

Ryan-Einot-Gabriel-Welsch (REGWQ) Multiple Range Test for Nlevel

This test controls the Type I MEER.

Alpha 0.05
Error Degrees of Freedom 24
Error Mean Square 6.668833

Number of Means 2 3 4 5 6
Critical Range 4.1910831 4.5900067 4.8041049 4.8116298 5.0499266

Means with the same letter are not significantly different.

REGWQ Grouping Mean N Culture

A 28.800 5 3DOk1
B 23.940 5 3DOk5
C B 19.880 5 3DOk7
C D 18.700 5 Comp
E D 14.600 5 3DOk4
E 13.260 5 3DOk13
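
The difference between SNK and REGWQ lies in the significance level used for each subset size. Both compute the critical range for a subset of p means as a studentized range quantile times sqrt(MSE/r). SNK uses 0.05 for every p, whereas REGWQ, as it is usually defined, tightens the level for the smaller subsets. The symbol alpha_p is notation introduced here, with t = 6 treatments.

```latex
\begin{align*}
\alpha_p &=
\begin{cases}
1 - (1 - \alpha)^{p/t}, & p < t - 1 \\
\alpha, & p \ge t - 1
\end{cases} \\[4pt]
\alpha_2 &= 1 - 0.95^{2/6} \approx 0.0170, \quad
\alpha_3 = 1 - 0.95^{3/6} \approx 0.0253, \quad
\alpha_4 = 1 - 0.95^{4/6} \approx 0.0336, \quad
\alpha_5 = \alpha_6 = 0.05
\end{align*}
```

This is consistent with the two tables above: the REGWQ critical ranges for 2, 3 and 4 means are larger than the SNK ranges, while those for 5 and 6 means are identical.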

And to make the relationships among the tests easier to see (i.e. to make sure the dead horse is thoroughly
beaten), here is a nice little summary table of all the above results:

Significance Groupings

Culture      LSD    Dunnett        Tukey  Scheffe  Duncan  SNK    REGWQ
3DOk1        A      ***            A      A        A       A      A
3DOk5        B      ***            AB     AB       B       B      B
3DOk7        C                     BC     BC       C       C      BC
Comp         C                     CD     BCD      C       C      CD
3DOk4        D                     DE     CD       D       D      DE
3DOk13       D      ***            E      D        D       D      E

Least Sig't  3.371  4.402          5.05   5.912    3.371   3.371  4.191
Difference   fixed  fixed          fixed  fixed    3.784   5.05   5.05
EER          no     yes            yes    yes      no      EERC   yes
                    Control only

Notice where the non-EER-controlling tests get you into potential Type I trouble, namely by their
readiness to declare significant differences between 3DOk5 and 3DOk7 and between Comp and 3DOk4.

On the other hand, regarding potential Type II trouble, notice where the relatively insensitive Scheffe's
test (insensitive due to its ability to make unlimited pair-wise and group comparisons) failed to pick up a
difference detected by other EER-controlling tests (e.g. between 3DOk7 and 3DOk4). Notice, too, how
the multiple-range REGWQ was able to detect the difference between 3DOk1 and 3DOk5 when the
fixed-range Tukey test was not (both control for EER).

Remember, while you should steer clear of tests that do not control for EER, there's no
"right" test or "wrong" test. There's only knowing the characteristics of each and
choosing the most appropriate one for your experiment (and the culture of your
discipline).

It is instructive to consider the above table of comparisons with the boxplot below in hand:

[Figure: boxplot of Nlevel (vertical axis, 10 to 35) for each Culture (horizontal axis: 3DOk1, 3DOk5, 3DOk4, 3DOk7, 3DOk13, Comp)]

Something to think about:

Does the boxplot above raise any red flags for you about your data?
How would you go about investigating such concerns?
