
Chap. 10, page 1
Math 445 Chapter 10 Inferences About Regression Coefficients

Chapter 10 concerns statistical inferences about individual regression coefficients, about linear
combinations of coefficients, and about sets of coefficients. All these inferences, which are based on
either the t or F distribution, are dependent on the assumptions of normality of the residuals, constant
variance, and independence. Assessment of these assumptions is covered in Chapter 11.

Example 1: Rainfall data


Consider the additive model (a model without interactions is called “additive” because the effects of
the variables are additive and don’t depend on the levels of the other variables):

$$\mu(\text{Precip} \mid \text{Latitude}, \text{Altitude}, \text{Rainshadow}) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2\,\text{Altitude} + \beta_3\,\text{Rainshadow}$$

Assume that the linear regression model assumptions are satisfied; that is, that the model fits, that the
residuals are normal with constant variance and that the observations are independent. The formal
inferences we make below are valid only if these assumptions are satisfied.
Coefficients(a)
                        Unstandardized Coefficients                95% Confidence Interval for B
Model                   B           Std. Error    t         Sig.      Lower Bound    Upper Bound
1  (Constant)           -97.557     24.554        -3.973    .0005     -148.028       -47.085
   Latitude (degrees)   3.428       .667          5.139     .0000     2.057          4.800
   Altitude (ft)        .00115      .00085        1.352     .1880     -.00060        .00290
   Rainshadow           -19.688     3.439         -5.725    .0000     -26.758        -12.619
a. Dependent Variable: Precipitation (in)

• The t statistic and P-value for each coefficient are for a two-sided test of the hypothesis that the
true coefficient is 0.

• Is there evidence of an effect of latitude on precipitation? This is addressed by a two-sided test
of the hypothesis $H_0: \beta_1 = 0$. There is convincing evidence (P < .00005) that $\beta_1$ is greater than
0. In addition, we estimate that mean precipitation rises about 3.43 inches for every one-degree
increase in latitude (95% confidence interval: 2.06 to 4.80 inches), given that altitude and rain
shadow remain the same.

• The test of $H_0: \beta_1 = 0$ (and the confidence interval) is for the model that also contains Altitude
and Rainshadow. Thus, it is a test of the effect of Latitude after the linear effects of
Altitude and Rainshadow have been adjusted for. This is different from a test of $H_0: \beta_1 = 0$
in a model without Altitude and Rainshadow.

• We do not have convincing evidence (P = .188) that mean precipitation changes with altitude,
given that latitude and rain shadow remain fixed. We estimate that mean precipitation
increases by 1.15 inches for every 1000-foot increase in altitude (95% confidence interval: 0.60-inch
decrease to 2.90-inch increase).

• Do locations in the rain shadow differ from those not in the rain shadow, after adjusting for the
effects of latitude and altitude? (In other words, is there evidence that $\beta_3 \neq 0$?) There is
completely convincing evidence (P<.00005) that locations in the rain shadow receive less
precipitation on average than locations of the same latitude and altitude not in the rain shadow.
What is more interesting is that locations in the rain shadow are estimated to have mean
precipitation 19.7 inches less (95% confidence interval: 26.8 inches to 12.6 inches less) than
equivalent locations (on altitude and latitude) not in the rain shadow.
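The t statistics and confidence intervals in the table follow from t = B / SE and B ± t* × SE. A minimal Python sketch that reproduces them, assuming 26 residual degrees of freedom (30 locations minus 4 estimated coefficients); with that value the results agree with the SPSS output to rounding. The constant `T_CRIT` is a hard-coded t-table value, not something printed in the output.

```python
# Coefficient estimates (B) and standard errors from the additive rainfall model
coefficients = {
    "Latitude (degrees)": (3.428, 0.667),
    "Altitude (ft)": (0.00115, 0.00085),
    "Rainshadow": (-19.688, 3.439),
}

# 97.5th percentile of the t distribution with 26 df
# (assumption: n = 30 locations and 4 coefficients, so df = 30 - 4 = 26)
T_CRIT = 2.055529


def t_and_ci(b, se, t_crit=T_CRIT):
    """t statistic for H0: beta = 0 and a 95% confidence interval for beta."""
    return b / se, (b - t_crit * se, b + t_crit * se)


results = {name: t_and_ci(b, se) for name, (b, se) in coefficients.items()}
for name, (t, (lower, upper)) in results.items():
    print(f"{name:20s} t = {t:7.3f}   95% CI: ({lower:.5g}, {upper:.5g})")

# Rescale the altitude effect to "per 1000 ft" by multiplying estimate and limits by 1000
_, (alt_lower, alt_upper) = results["Altitude (ft)"]
print(f"Altitude per 1000 ft: {1000 * 0.00115:.2f} "
      f"({1000 * alt_lower:.2f}, {1000 * alt_upper:.2f})")
```

Rescaling the altitude interval by 1000 gives the "0.60-inch decrease to 2.90-inch increase" interpretation used above.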

Inferences and interpretation when there are interactions in the model


When interactions are present in a model, the test of significance for the coefficient on a term that is
involved in a higher-order interaction is not useful, because we must always include this term in the
model anyway. In addition, the coefficient on this term does not have a meaningful interpretation on its own.

Example: In the Chapter 9 notes, we fit the following model to the rainfall data:

$$\mu(\text{Precip} \mid \text{Latitude}, \text{Rainshadow}) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2\,\text{Rainshadow} + \beta_3\,\text{Latitude} \times \text{Rainshadow}$$


Coefficients(a)
                        Unstandardized Coefficients    Standardized Coefficients
Model                   B           Std. Error         Beta         t         Sig.
1  (Constant)           -175.457    26.177                          -6.703    .000
   Latitude (degrees)   5.581       .705               .895         7.912     .000
   Rainshadow           139.839     39.019             4.240        3.584     .001
   Latitude*Rainshadow  -4.315      1.051              -4.871       -4.105    .000
a. Dependent Variable: Precipitation (in)

The coefficient on rainshadow is large and positive – but it does not mean that locations in the
rainshadow are estimated to have mean precipitation 139.8 inches greater than locations of the same
latitude not in the rainshadow! Why not?
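One way to see the answer: under this model the estimated rain-shadow difference at a given latitude is $\hat\beta_2 + \hat\beta_3\,\text{Latitude}$, so 139.8 is its value at Latitude = 0 (the equator), far from the latitudes in these data. A short Python sketch using the estimates from the table above; the latitudes 35 and 40 are illustrative choices.

```python
# Estimates from the Latitude*Rainshadow model (SPSS output above)
b_shadow, b_inter = 139.839, -4.315


def shadow_difference(latitude):
    """Estimated mean Precip (in) for rain-shadow locations minus
    non-shadow locations at the same latitude: beta2 + beta3 * Latitude."""
    return b_shadow + b_inter * latitude


for lat in (0.0, 35.0, 40.0):
    print(f"Latitude {lat:4.1f}: {shadow_difference(lat):8.3f} inches")
```

At realistic latitudes the estimated difference is negative: rain-shadow locations are estimated to be drier, not wetter.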

The statistical significance of the coefficients on the first-order terms (Latitude and Rainshadow) is
also irrelevant since they are both involved in the second-order term. In particular, if either coefficient
were not statistically significantly different from 0 (large P-value), that would not mean that we had no
evidence of an effect of that variable. For example, if the coefficient for Latitude in the above model
had had a statistically nonsignificant coefficient, that would not mean that we had no evidence of an
effect of latitude, because the effect of latitude also comes through the Latitude*Rainshadow
interaction, which is statistically significant.
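The effect of latitude is carried by both $\beta_1$ and $\beta_3$: the estimated slope is $\hat\beta_1$ for locations not in the rain shadow and $\hat\beta_1 + \hat\beta_3$ for locations in it. A minimal Python sketch of that arithmetic, using the estimates from the table above.

```python
# Estimates from the Latitude*Rainshadow model (SPSS output above)
b_lat, b_inter = 5.581, -4.315

# Estimated change in mean Precip (in) per one-degree increase in latitude
slope_not_shadow = b_lat          # Rainshadow = 0: beta1
slope_shadow = b_lat + b_inter    # Rainshadow = 1: beta1 + beta3

print(f"not in rain shadow: {slope_not_shadow:.3f} in/degree")
print(f"in rain shadow:     {slope_shadow:.3f} in/degree")
```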

Suppose we fit the following model with a 3-way interaction:

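$$\begin{aligned}
\mu(\text{Precip} \mid \text{Altitude}, \text{Latitude}, \text{Rainshadow}) = {} & \beta_0 + \beta_1\,\text{Altitude} + \beta_2\,\text{Latitude} + \beta_3\,\text{Rainshadow} + \beta_4\,\text{Altitude} \times \text{Latitude} \\
& + \beta_5\,\text{Altitude} \times \text{Rainshadow} + \beta_6\,\text{Latitude} \times \text{Rainshadow} + \beta_7\,\text{Altitude} \times \text{Latitude} \times \text{Rainshadow}
\end{aligned}$$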

• We must include all lower-order terms (main effects and two-way interactions) that are part of the 3-way interaction.
• The coefficient on the 3-way interaction is the amount by which the two-way interaction between
any pair of the variables changes per one-unit increase in the third variable. For example, because
Rainshadow is an indicator, $\beta_7$ is the difference between the Altitude by Latitude interaction
coefficient for locations in the rain shadow ($\beta_4 + \beta_7$) and for locations not in it ($\beta_4$).

• The coefficients on all the lower-order terms have no useful interpretation on their own as long
as the 3-way interaction is in the model, and the tests of significance of these terms are not
meaningful.

• The test of significance on the coefficient on the 3-way interaction is meaningful: we have no
evidence (P = .711) that there is a 3-way interaction among these variables in their association with
precipitation. That's good: we generally don't want to include a 3-way interaction unless we
have strong evidence that it is needed.

• Interactions will be addressed further in the model-building chapter.


Coefficients(a)
                         Unstandardized Coefficients    Standardized Coefficients
Model                    B           Std. Error         Beta         t         Sig.
1  (Constant)            -178.154    26.390                          -6.751    .000
   Altitude (ft)         .0248       .0172              3.129        1.444     .163
   Latitude (degrees)    5.5929      .7191              .897         7.778     .000
   Rainshadow            72.7033     50.9637            2.205        1.427     .168
   Altitude*Latitude     -.0006      .0004              -2.953       -1.358    .188
   Altitude*Rainshadow   .0067       .0233              .572         .289      .776
   Latitude*Rainshadow   -2.4465     1.3797             -2.761       -1.773    .090
   Alt*Lat*Rainshadow    -.0002      .0006              -.746        -.376     .711
a. Dependent Variable: Precipitation (in)
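Including every lower-order term is mechanical once the list of variables is fixed. A minimal Python sketch (the helper names `hierarchical_terms` and `design_row` are hypothetical, not SPSS features) that lists the terms of a full hierarchical model in the same order as $\beta_1$ through $\beta_7$ above and builds one design-matrix row, where each interaction column is the product of its component variables.

```python
from itertools import combinations


def hierarchical_terms(variables):
    """Every main effect and interaction up to the full-order interaction.

    For three variables: 3 main effects, 3 two-way interactions and the
    single 3-way interaction, i.e. the 7 non-intercept terms.
    """
    terms = []
    for order in range(1, len(variables) + 1):
        for combo in combinations(variables, order):
            terms.append("*".join(combo))
    return terms


def design_row(values, terms):
    """One design-matrix row (intercept first) for a single case.

    `values` maps each variable name to its value for this case.
    """
    row = [1.0]  # column for beta_0
    for term in terms:
        product = 1.0
        for name in term.split("*"):
            product *= values[name]
        row.append(product)
    return row


terms = hierarchical_terms(["Altitude", "Latitude", "Rainshadow"])
# Illustrative location: 3000 ft, 40 degrees latitude, in the rain shadow
row = design_row({"Altitude": 3000, "Latitude": 40.0, "Rainshadow": 1}, terms)
print(terms)
print(row)
```

Generating the terms this way guarantees that no two-way interaction or main effect contained in the 3-way interaction is accidentally left out.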

Inferences for linear combinations of parameters


Sometimes, the effect of interest is a linear combination of parameters.

Example 2: Exercise 9.18, p. 263, Speed of Evolution.


There are two binary variables: Sex and Continent. Suppose they are coded as indicator variables as
follows:

Sex: 0 = Female, 1 = Male


Continent: 0 = NA, 1 = EU

Consider the model

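$$\mu(\text{Wing} \mid \text{Latitude}, \text{Sex}, \text{Continent}) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2\,\text{Sex} + \beta_3\,\text{Continent} + \beta_4\,\text{Sex} \times \text{Continent}$$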

This model implies the following relationships between Wing size and Latitude:
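Female, NA: $\mu(\text{Wing} \mid \text{Latitude}, \text{Sex} = 0, \text{Continent} = 0) = \beta_0 + \beta_1\,\text{Latitude}$
Female, EU: $\mu(\text{Wing} \mid \text{Latitude}, \text{Sex} = 0, \text{Continent} = 1) = \beta_0 + \beta_1\,\text{Latitude} + \beta_3$
Male, NA: $\mu(\text{Wing} \mid \text{Latitude}, \text{Sex} = 1, \text{Continent} = 0) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2$
Male, EU: $\mu(\text{Wing} \mid \text{Latitude}, \text{Sex} = 1, \text{Continent} = 1) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2 + \beta_3 + \beta_4$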

• The slope coefficients are identical for all four groups since there are no interactions with
Latitude.

• The intercepts are different and the differences represent the vertical distances between the
parallel lines relating Wing size to Latitude.

• $\beta_3$ represents the difference in mean Wing size between EU and NA females at the same latitude;
a test of $H_0: \beta_3 = 0$ and a confidence interval for $\beta_3$ can be obtained directly from the
regression output.

• The difference between mean Wing size for males in NA and EU is $\beta_3 + \beta_4$. An estimate of
this difference is $\hat\beta_3 + \hat\beta_4$; however, its SE and a confidence interval cannot be easily
obtained from the regression output. $SE(\hat\beta_3 + \hat\beta_4)$ depends on the SEs of $\hat\beta_3$ and
$\hat\beta_4$ individually, but also on the covariance of $\hat\beta_3$ and $\hat\beta_4$:
$$SE(\hat\beta_3 + \hat\beta_4) = \sqrt{SE(\hat\beta_3)^2 + SE(\hat\beta_4)^2 + 2\,\mathrm{Cov}(\hat\beta_3, \hat\beta_4)}$$
Although you can obtain the needed covariance from the SPSS regression output to calculate
$SE(\hat\beta_3 + \hat\beta_4)$, it is easier to simply reparameterize the model to obtain this directly
from the regression output.

• Reparameterization: reverse the coding on Sex: let 0 be male and 1 be female. The "Male" and
"Female" labels are then switched in the above set of equations, and $\beta_3$ in this new model
represents the difference in mean wing size for males in NA and EU; i.e., it is the same as
$\beta_3 + \beta_4$ in the old model. The SE of the estimated difference can then be obtained directly
from the regression output.

• Reparameterizing changes the interpretation of individual parameters but it doesn’t change the
model.
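The two routes to $SE(\hat\beta_3 + \hat\beta_4)$ give the same answer. A Python/NumPy sketch on made-up data (not the Speed of Evolution data; the sample size, coefficients and noise level are arbitrary assumptions) fits the model under both codings of Sex. Under the original coding it computes the SE from the estimated covariance matrix $\hat\sigma^2 (X^\top X)^{-1}$ as $\sqrt{c^\top V c}$, with $c$ selecting $\beta_3$ and $\beta_4$; under the reversed coding it reads the SE of $\hat\beta_3$ directly.

```python
import numpy as np

# Made-up data for illustration only
rng = np.random.default_rng(0)
n = 40
latitude = rng.uniform(35, 55, n)
male = rng.integers(0, 2, n)   # Sex: 0 = female, 1 = male
eu = rng.integers(0, 2, n)     # Continent: 0 = NA, 1 = EU
wing = (800 + 3 * latitude - 80 * male + 20 * eu + 10 * male * eu
        + rng.normal(0, 15, n))


def fit(sex):
    """OLS fit of Wing on Latitude, Sex, Continent and Sex*Continent.

    Returns the coefficient estimates, their estimated covariance
    matrix sigma^2 (X'X)^-1, and the fitted values.
    """
    X = np.column_stack([np.ones(n), latitude, sex, eu, sex * eu])
    beta, *_ = np.linalg.lstsq(X, wing, rcond=None)
    resid = wing - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, cov, X @ beta


# Original coding (Sex = 1 for male): SE of b3 + b4 via the covariance matrix
b_old, cov_old, fit_old = fit(male)
c = np.array([0, 0, 0, 1, 1])
est_old = c @ b_old
se_old = np.sqrt(c @ cov_old @ c)

# Reversed coding (Sex = 1 for female): b3 is now the NA/EU difference for males
b_new, cov_new, fit_new = fit(1 - male)
est_new = b_new[3]
se_new = np.sqrt(cov_new[3, 3])

print(f"old coding: b3 + b4 = {est_old:.3f}, SE = {se_old:.3f}")
print(f"new coding: b3      = {est_new:.3f}, SE = {se_new:.3f}")
print("same fitted values:", np.allclose(fit_old, fit_new))
```

The identical fitted values illustrate that reparameterizing changes only the meaning of the coefficients, not the model.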

Inferences about the mean response at some combination of X's


The estimated mean of Y at any combination of X's is obtained by plugging these values into the
estimated regression equation. The standard error of the estimated mean response can be obtained in SPSS by
including an extra case in the data file that has the desired X's but a missing value for Y. Then, as
with simple linear regression, in the regression dialog box choose Save…SE of mean predictions for
the SE of the mean, Prediction Intervals…Mean for confidence intervals for the mean
response, and Prediction Intervals…Individual for prediction intervals for an individual response.
These are individual (not simultaneous) confidence intervals and prediction intervals.

Example 1: Rainfall data. Here are some results when the additive model was fit.
$$\mu(\text{Precip} \mid \text{Latitude}, \text{Altitude}, \text{Rainshadow}) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2\,\text{Altitude} + \beta_3\,\text{Rainshadow}$$

The fitted model is


$$\hat\mu(\text{Precip} \mid \text{Latitude}, \text{Altitude}, \text{Rainshadow}) = -97.557 + 3.428\,\text{Latitude} + 0.00115\,\text{Altitude} - 19.688\,\text{Rainshadow}$$

The predicted values, standard error of the mean (SEP), 95% confidence interval for the mean (LMCI,
UMCI) and 95% prediction interval (LICI, UICI) are shown for cases 26-30 plus two new sets of X
values. These confidence intervals are valid only if the assumptions of the regression model are
satisfied; we have not checked these assumptions yet.

Case   Precip   Altitude   Latitude   Shadow   Pred     SEP     LMCI     UMCI     LICI      UICI
26     9.94     19         32.7       0        14.574   3.846   6.669    22.479   -6.151    35.299
27     4.25     2105       34.1       1        2.047    3.184   -4.499   8.593    -18.198   22.292
28     1.66     -178       36.5       1        7.687    2.565   2.415    12.959   -12.183   27.557
29     74.87    35         41.7       0        45.448   4.460   36.281   54.615   24.210    66.686
30     15.95    60         39.2       1        17.217   2.989   11.072   23.362   -2.902    37.336
       .        1000       35.0       0        23.586   2.892   17.640   29.531   3.527     43.645
       .        3000       40.0       1        23.337   3.126   16.911   29.763   3.130     43.544

According to this model, the estimated mean annual precipitation for locations at 3000 feet and 40
degrees latitude which are in the rain shadow is 23.34 inches (95% confidence interval 16.9 to 29.8
inches). A 95% prediction interval for the annual precipitation at an individual location like this is
3.13 to 43.5 inches.
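These numbers can be reproduced by hand: the mean CI is Pred ± t* × SEP, and the prediction interval uses $SE_{pred} = \sqrt{\hat\sigma^2 + SEP^2}$. A Python sketch, assuming 26 residual degrees of freedom (30 locations minus 4 coefficients), which reproduces the SPSS intervals to rounding. The residual standard deviation $\hat\sigma$ is not shown in this output, so the sketch backs it out from the other new case's prediction interval; with the full regression output you would take it from the residual mean square instead.

```python
import math

# 97.5th percentile of t with 26 df (assumption: n = 30 locations, 4 coefficients)
T_CRIT = 2.055529

# Rounded coefficients from the fitted additive model
B0, B_LAT, B_ALT, B_SHADOW = -97.557, 3.428, 0.00115, -19.688


def mean_estimate(latitude, altitude, shadow):
    """Plug-in estimate of mean precipitation (in) at the given X's."""
    return B0 + B_LAT * latitude + B_ALT * altitude + B_SHADOW * shadow


def mean_ci(pred, sep, t_crit=T_CRIT):
    """95% confidence interval for the mean response: pred +/- t * SEP."""
    return pred - t_crit * sep, pred + t_crit * sep


def prediction_interval(pred, sep, sigma_hat, t_crit=T_CRIT):
    """95% prediction interval for one new response."""
    se_pred = math.sqrt(sigma_hat ** 2 + sep ** 2)
    return pred - t_crit * se_pred, pred + t_crit * se_pred


# 3000 ft, 40 degrees, in the rain shadow. This differs slightly from SPSS's
# Pred (23.337) because the coefficients above are rounded.
plug_in = mean_estimate(40.0, 3000, 1)

pred, sep = 23.337, 3.126  # Pred and SEP from the SPSS listing
ci = mean_ci(pred, sep)

# Recover sigma-hat from the other new case (Pred 23.586, SEP 2.892, UICI 43.645)
sigma_hat = math.sqrt(((43.645 - 23.586) / T_CRIT) ** 2 - 2.892 ** 2)
pred_int = prediction_interval(pred, sep, sigma_hat)

print(f"plug-in mean: {plug_in:.3f}")
print(f"95% CI for mean: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"sigma-hat: {sigma_hat:.3f}; 95% PI: ({pred_int[0]:.3f}, {pred_int[1]:.3f})")
```

The prediction interval is much wider than the confidence interval because $\hat\sigma$ dominates $SE_{pred}$.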

Extra-Sum-of-Squares Tests

We sometimes want to test a hypothesis about a set of parameters in a regression model. Recall that
we did this in an ANOVA model, where the overall F-test tested $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$ and where an
extra-sum-of-squares F-test was used to compare two models. This test is valid only if the assumptions
of the regression model (normality, constant variance, independence) are satisfied.

Example 3: Meadowfoam study, Case Study 9.1

Suppose we regress the number of Flowers on Timing (a binary variable: early or late) and
Light Intensity, treating Light Intensity as a factor with 6 levels. Thus there is an indicator
variable for Timing called early (1 for early, 0 for late) and 5 indicator variables for Intensity, called
L300, L450, L600, L750, and L900, with intensity 150 as the reference level. There are no interactions, so the
model is:

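$$\mu(\text{Flowers} \mid \text{early}, \text{LIGHT}) = \beta_0 + \beta_1\,\text{early} + \beta_2\,\text{L300} + \beta_3\,\text{L450} + \beta_4\,\text{L600} + \beta_5\,\text{L750} + \beta_6\,\text{L900}$$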

A shorthand way of describing the model (see Section 9.3.5, p. 249) is:

$$\mu(\text{Flowers} \mid \text{early}, \text{LIGHT}) = \text{early} + \text{LIGHT}$$

Suppose we want to test the hypothesis that there is no effect of light intensity given that the Timing
variable is in the model. What hypothesis about the regression parameters do we want to test?

To test this hypothesis, we fit a full model with early and all the indicator variables for LIGHT in the
model. Then we fit a reduced model with just early in the model and carry out an extra sum-of-squares
F-test just as we did in Chapter 5.

Full model results:


ANOVA(b)
Model            Sum of Squares   df    Mean Square    F         Sig.
1  Regression    3570.464         6     595.077        13.181    .000(a)
   Residual      767.472          17    45.145
   Total         4337.936         23
a. Predictors: (Constant), Early, L900, L750, L600, L450, L300
b. Dependent Variable: Flowers

Coefficients(a)
                 Unstandardized Coefficients    Standardized Coefficients
Model            B          Std. Error          Beta         t         Sig.
1  (Constant)    67.196     3.629                            18.518    .000
   L300          -9.125     4.751               -.253        -1.921    .072
   L450          -13.375    4.751               -.371        -2.815    .012
   L600          -23.225    4.751               -.644        -4.888    .000
   L750          -27.750    4.751               -.769        -5.841    .000
   L900          -29.350    4.751               -.814        -6.178    .000
   Early         12.158     2.743               .452         4.432     .000
a. Dependent Variable: Flowers

Reduced model results:


ANOVA(b)
Model            Sum of Squares   df    Mean Square    F         Sig.
1  Regression    886.950          1     886.950        5.654     .027(a)
   Residual      3450.986         22    156.863
   Total         4337.936         23
a. Predictors: (Constant), Early
b. Dependent Variable: Flowers

Coefficients(a)
                 Unstandardized Coefficients    Standardized Coefficients
Model            B          Std. Error          Beta         t         Sig.
1  (Constant)    50.058     3.616                            13.845    .000
   Early         12.158     5.113               .452         2.378     .027
a. Dependent Variable: Flowers

Carry out the F-test (only the ANOVA tables are needed for this test, not the coefficients).
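The hypothesis of no light-intensity effect after accounting for timing is $H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$, and the test statistic is $F = \dfrac{(RSS_{reduced} - RSS_{full})/(df_{reduced} - df_{full})}{RSS_{full}/df_{full}}$. A minimal Python sketch of the arithmetic using the two ANOVA tables above; the P-value step uses SciPy's `scipy.stats.f.sf` only if SciPy happens to be installed.

```python
# Residual sums of squares and df from the two ANOVA tables above
rss_full, df_full = 767.472, 17          # Early + L300, ..., L900
rss_reduced, df_reduced = 3450.986, 22   # Early only

extra_ss = rss_reduced - rss_full        # extra sum of squares due to LIGHT
extra_df = df_reduced - df_full          # number of coefficients set to 0 under H0
sigma2_full = rss_full / df_full         # residual mean square of the full model
f_stat = (extra_ss / extra_df) / sigma2_full

print(f"extra SS = {extra_ss:.3f} on {extra_df} df; "
      f"F = {f_stat:.2f} on ({extra_df}, {df_full}) df")

try:
    from scipy import stats  # optional: only needed for the P-value
    print(f"P-value = {stats.f.sf(f_stat, extra_df, df_full):.2g}")
except ImportError:
    print(f"SciPy not installed; compare F with an F({extra_df}, {df_full}) table")
```

The denominator is the full model's residual mean square (45.145), which is why the reduced model's mean square is not used.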