Академический Документы
Профессиональный Документы
Культура Документы
SLIDES . BY
.
. John Loucks
St. Edwards
. University
.
.
.
.
.
.
.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 1
or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 16
Regression Analysis: Model Building
General Linear Model
Determining When to Add or Delete
Variables
Variable Selection
Procedures
Multiple Regression Approach
to
Experimental Design
Autocorrelation and the Durbin-Watson
Test
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 2
or duplicated, or posted to a publicly accessible website, in whole or in part.
General Linear Model
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 3
or duplicated, or posted to a publicly accessible website, in whole or in part.
General Linear Model
y 0 1x1
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 4
or duplicated, or posted to a publicly accessible website, in whole or in part.
Modeling Curvilinear Relationships
y 0 1x1 2x12
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 5
or duplicated, or posted to a publicly accessible website, in whole or in part.
Interaction
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 6
or duplicated, or posted to a publicly accessible website, in whole or in part.
Transformations Involving the Dependent
Variable
Often the problem of nonconstant variance can
be
corrected by transforming the dependent
variable to a
Most statistical packages provide the ability to
different scale.
apply
logarithmic transformations using either the
base-10
Another approach,
(common called
log) or the baseaereciprocal
= 2.71828...
transformation,
(natural log). is to use 1/y as the dependent
variable instead of y.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 7
or duplicated, or posted to a publicly accessible website, in whole or in part.
Nonlinear Models That Are Intrinsically
Linear
Models in which the parameters (0, 1, . . . , p )
have exponents other than one are called
nonlinear models.
In some cases we can perform a transformation
of variables that will enable us to use
regression analysis with the general linear
model.
The exponential model involves the regression
equation:
E(y) 0 x1
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 9
or duplicated, or posted to a publicly accessible website, in whole or in part.
Determining When to Add or Delete
Variables
The pvalue criterion can also be used to
determine whether it is advantageous to add
one or more dependent variables to a multiple
regression model.
The pvalue associated with the computed F
statistic can be compared to the level of
significance .
It is difficult to determine the pvalue directly
from the tables of the F distribution, but
computer software packages, such as Minitab
or Excel, provide the pvalue.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 10
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection Procedures
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 12
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Stepwise Regression
Compute F stat. and Any
p-value for each indep. p-value < alpha
variable not in model to enter
? No
No
Indep. variable Yes
Any with largest
p-value > alpha Yes p-value is Stop
to remove removed
? from model
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 14
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Forward Selection
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 16
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Backward Elimination
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 18
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Backward Elimination
Partial Data
Selling House Number Number Garage
Segment Price Size of of Size
of City ($000) (00 sq. ft.) Bedrms. Bathrms. (cars)
Northwest 290 21 4 2 2
South 95 11 2 1 0
Northeast 170 19 3 2 2
Northwest 375 38 5 4 3
West 350 24 4 3 2
South 125 10 2 2 0
West 310 31 4 4 2
West 275 25 3 2 2
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 19
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Backward Elimination
Regression Output
Greates
Variable t
to be p-value
removed > .05
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 20
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Backward Elimination
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 21
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Backward Elimination
Regression Output
Greates
Variable t
to be p-value
removed > .05
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 22
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Backward Elimination
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 23
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Backward Elimination
Regression Output
Greates
Variable t
to be p-value
removed > .05
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 24
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Backward Elimination
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 25
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Backward Elimination
Regression Output
Greates
t
p-value
is < .05
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 26
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Backward Elimination
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 27
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Best-Subsets
Regression
The three preceding procedures are one-
variable-at-a-time methods offering no
guarantee that the best model for a given
number of variables
Some software will be
packages found.best-subsets
include
regression that enables the user to find, given
a specified number of independent variables,
the best regression model.
Minitab output identifies the two best one-
variable estimated regression equations, the
two best two-variable equation, and so on.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 28
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable Selection: Best-Subsets
Regression
Example: PGA Tour Data
The Professional Golfers Association keeps a
variety of statistics regarding performance
measures.
Data include the average driving distance,
percentage
of drives that land in the fairway, percentage of
greens hit in regulation, average number of
putts,
percentage of sand saves, and average score.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 29
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable-Selection Procedures
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 30
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable-Selection Procedures
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 31
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable-Selection Procedures
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 32
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable-Selection Procedures
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 33
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable-Selection Procedures
Sample Correlation
Coefficients
Score Drive Fair Green Putt
Drive -.154
Fair -.427 -.679
Green-.556 -.045 .421
Putt .258 -.139 .101 .354
Sand-.278 -.024 .265 .083 -.296
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 34
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable-Selection Procedures
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 35
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable-Selection Procedures
Minitab Output
The regression equation
Score = 74.678 - .0398(Drive) - 6.686(Fair)
- 10.342(Green) + 9.858(Putt)
Predictor Coef Stdev t-ratio
p
Constant74.678 6.952 10.74 .000
Drive -.0398 .01235 -3.22 .004
Fair -6.686 1.939 -3.45 .003
Green -10.342 3.561 -2.90 .009
Putt 9.858 3.180 3.10 .006
s = .2691 R-sq = 72.4% R-sq(adj) = 66.8%
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 36
or duplicated, or posted to a publicly accessible website, in whole or in part.
Variable-Selection Procedures
Minitab Output
Analysis of Variance
SOURCE DF SS MS
F P
Regression 4 3.79469 .94867
13.10
Error .000 20 1.44865
.07243
Total 24 5.24334
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 37
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
The use of dummy variables in a multiple
regression equation can provide another
approach to solving analysis of variance and
experimental design problems.
We will use the results of multiple regression to
perform the ANOVA test on the difference in
the means of three populations.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 38
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
Example: Reed Manufacturing
Janet Reed would like to know if there is
any
significant difference in the mean number of
hours
worked per week for the department
managers at
herAthree
simple random sample
manufacturing of five
plants managers
(in Buffalo,
from
Pittsburgh, and Detroit).
each of the three plants was taken and the
number
of hours worked by each manager for the
previous
2014
week is shown on the next slide.
Cengage Learning. All Rights Reserved. May not be scanned, copied Slide
39
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 40
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
We begin by defining two dummy variables, A
and
B, that will indicate the plant from which each
sample
Inobservation
general, if there are k populations, we need
was selected.
to
define k 1 dummy variables.
A = 0, B = 0 if observation is from Buffalo
A= plant
1, B = 0 if observation is from
A= Pittsburgh
0, B = 1 plant
if observation is from
Detroit plant
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 41
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
Input Data
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 42
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 43
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 44
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
Next, we observe that if there is no difference
in the means:
E(y) for the Pittsburgh plant E(y) for the Buffalo plant = 0
E(y) for the Detroit plant E(y) for the Buffalo plant = 0
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 45
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
Because 0 equals E(y) for the Buffalo plant and
0 + 1 equals E(y) for the Pittsburgh
plant, the first difference is equal to (0 + 1) - 0
= 1 .
Because 0 + 2 equals E(y) for the Detroit
plant, the second difference is equal to (0 +
We 0 = 2conclude
2) -would . that there is no difference
in the three means if 1 = 0 and 2 = 0.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 46
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
The null hypothesis for a test of the difference
of means is
H0: 1 = 2 =
0
To test this null hypothesis, we must compare
the value of MSR/MSE to the critical value from
an F distribution with the appropriate
numerator and denominator degrees of
freedom.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 47
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
ANOVA Table Produced by Excel
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 48
or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Approach to
Experimental Design
At a .05 level of significance, the critical value
of F with k 1 = 3 1 = 2 numerator d.f.
and nT k = 15 3 = 12 denominator d.f. is
3.89.
Because the observed value of F (9.55) is
greater than the critical value of 3.89, we reject
the null hypothesis.
Alternatively, we reject the null hypothesis
because the p-value of .003 < = .05.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 49
or duplicated, or posted to a publicly accessible website, in whole or in part.
Autocorrelation and the Durbin-Watson
Test
Often, the data used for regression studies in
business and economics are collected over
time.
It is not uncommon for the value of y at one
time period to be related to the value of y at
previous time periods.
In this case, we say autocorrelation (or serial
correlation) is present in the data.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 50
or duplicated, or posted to a publicly accessible website, in whole or in part.
Autocorrelation and the Durbin-Watson
Test
With positive autocorrelation, we expect a
positive residual in one period to be followed
by a positive residual in the next period.
With positive autocorrelation, we expect a
negative residual in one period to be followed
by a negative residual in the next period.
With negative autocorrelation, we expect a
positive residual in one period to be followed
by a negative residual in the next period, then
a positive residual, and so on.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 51
or duplicated, or posted to a publicly accessible website, in whole or in part.
Autocorrelation and the Durbin-Watson
Test
When autocorrelation is present, one of the
regression assumptions is violated: the error
terms are not independent.
When autocorrelation is present, serious errors
can be made in performing tests of
significance based upon the assumed
regression model.
The Durbin-Watson statistic can be used to
detect first-order autocorrelation.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 52
or duplicated, or posted to a publicly accessible website, in whole or in part.
Autocorrelation and the Durbin-Watson
Test
Durbin-Watson Test Statistic
n
2
(et et 1 )2
d t 2 n
2
et2
t 1
i
ei yi y
The ith residual is denoted
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 53
or duplicated, or posted to a publicly accessible website, in whole or in part.
Autocorrelation and the Durbin-Watson
Test
Durbin-Watson Test Statistic
The statistic ranges in value from zero to
four.
If successive values of the residuals are
close
together (positive autocorrelation is
present),
If successive values are far apart
the statistic will be small.
(negative
autocorrelation is present), the
statistic will
Abevalue of two indicates no
large.
autocorrelation.
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 54
or duplicated, or posted to a publicly accessible website, in whole or in part.
Autocorrelation and the Durbin-Watson
Test
Suppose the values of (residuals) are not
independent but are related in the following
manner: = + zt t-1 t
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 55
or duplicated, or posted to a publicly accessible website, in whole or in part.
Autocorrelation and the Durbin-Watson
Test
The null hypothesis always is:
H 0 : 0 there is no autocorrelation
The alternative hypothesis is:
H a : 0 to test for positive autocorrelation
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 56
or duplicated, or posted to a publicly accessible website, in whole or in part.
Autocorrelation and the Durbin-Watson
Test
A Sample Of Critical Values For The
Durbin-Watson Test For Autocorrelation
Significance Points of dL and dU: =
.05
Number of Independent Variables
1 2 3 4 5
n dL dU dL dU dL dU dU dU dU dU
1.0 1.3 0.9 1.5 0.8 1.7 0.6 1.9 0.5 2.2
15
8 6 5 4 2 5 9 7 6 1
1.1 1.3 0.9 1.5 0.8 1.7 0.7 1.9 0.6 2.1
16
0 7 8 4 6 3 4 3 2 5
1.1 1.3 1.0 1.5 0.9 1.7 0.7 1.9 0.6 2.1
17
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide
3 8 2 4 0 1 8 0 7
or duplicated, or posted to a publicly accessible website, in whole or in part.
0
57
Autocorrelation and the Durbin-Watson
Test
Positive Incon- No evidence of
autocor- clusive positive autocorrelation
relation
0 dL dU 2 4-dU 4-dL 4
2014 Cengage Learning. All Rights Reserved. May not be scanned, copied Slide 59
or duplicated, or posted to a publicly accessible website, in whole or in part.