
Multiple Regression

Involves the use of more than one independent variable.
Multivariate analysis involves more than one
dependent variable - OMS 633
Adding more variables will help us to explain
more variance - the trick becomes: are the
additional variables significant and do they
improve the overall model? Additionally, the
added independent variables should not be
too highly related with each other!
Multiple Regression
A sample data set:




Sales = hundreds of gallons
Price = price per gallon (dollars)
Advertising = hundreds of dollars

Week  Sales  Price  Advertising
   1     10    1.3            9
   2      6    2.0            7
   3      5    1.7            5
   4     12    1.5           14
   5     10    1.6           15
   6     15    1.2           12
   7      5    1.6            6
   8     12    1.4           10
   9     17    1.0           15
  10     20    1.1           21
Analyzing the output
Evaluate for multicollinearity
State and interpret the equation
Interpret Adjusted R^2
Interpret S_yx

Are the independent variables significant?
Is the model significant
Forecast and develop prediction interval
Examine the error terms
Calculate MAD, MSE, MAPE, MPE
Correlation Matrix
Simple correlation for each combination
of variables (independents vs.
independents; independents vs.
dependent)
             Sales      Price      Advertising
Sales        1
Price        -0.86349   1
Advertising  0.891497   -0.65449   1
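The correlation matrix above can be reproduced by hand. The following is a minimal sketch in plain Python, not the spreadsheet routine that produced the slide output. The helper name `pearson_r` is an illustrative choice. Note that the week 7 price is entered as 1.6, the value that reproduces the correlations and coefficients reported in the output.

```python
from math import sqrt

# Sample data: sales (hundreds of gallons), price per gallon,
# advertising (hundreds of dollars). Week 7 price is entered as 1.6,
# the value that reproduces the reported output (assumption).
sales = [10, 6, 5, 12, 10, 15, 5, 12, 17, 20]
price = [1.3, 2.0, 1.7, 1.5, 1.6, 1.2, 1.6, 1.4, 1.0, 1.1]
advertising = [9, 7, 5, 14, 15, 12, 6, 10, 15, 21]


def pearson_r(x, y):
    """Simple (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_sales_price = pearson_r(sales, price)        # independent vs. dependent
r_sales_adv = pearson_r(sales, advertising)    # independent vs. dependent
r_price_adv = pearson_r(price, advertising)    # independent vs. independent

print(f"Sales-Price:       {r_sales_price:.5f}")
print(f"Sales-Advertising: {r_sales_adv:.5f}")
print(f"Price-Advertising: {r_price_adv:.5f}")
```

The Price-Advertising value is the one to inspect for multicollinearity, since it relates the two independent variables to each other.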
Multicollinearity
It's possible that the independent variables are related to
one another. If they are highly related, this condition is
called multicollinearity. Problems:
A regression coefficient that is positive in sign in a two-variable
model may change to a negative sign.
Estimates of the regression coefficient change greatly
from sample to sample because the standard error of
the regression coefficient is large.
Highly interrelated independent variables can explain
some of the same variance in the dependent variable -
so there is no added benefit, even though the R-square
has increased.
When two independent variables are highly correlated (about .7 or
more in absolute value), we would throw one of them out.
Multiple Regression Equation





Gallon Sales = 16.4 - 8.2476 (Price) + .59 (Adv)
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_i X_i$$

             Coefficients  Standard Error  t Stat  P-value
Intercept    16.41         4.34            3.78    0.01
Price        -8.25         2.20            -3.76   0.01
Advertising  0.59          0.13            4.38    0.00
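As a check on the coefficient table, the least-squares coefficients can be computed directly from the sample data. This is a sketch that uses the closed-form solution for exactly two independent variables. It is not a general routine, and the function name `fit_two_predictors` is hypothetical. The week 7 price is entered as 1.6, the value that reproduces the reported output.

```python
# Sample data (week 7 price entered as 1.6; see note above).
sales = [10, 6, 5, 12, 10, 15, 5, 12, 17, 20]
price = [1.3, 2.0, 1.7, 1.5, 1.6, 1.2, 1.6, 1.4, 1.0, 1.1]
advertising = [9, 7, 5, 14, 15, 12, 6, 10, 15, 21]


def fit_two_predictors(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (two predictors only)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Sums of squares and cross-products of deviations from the means.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve the two normal equations for the slopes.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(sales, price, advertising)
print(f"Gallon Sales = {b0:.2f} + ({b1:.2f})(Price) + {b2:.2f}(Adv)")
```

If the two independent variables were perfectly correlated, `det` would be zero and the slopes could not be separated, which is the extreme case of multicollinearity.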
Regression Coefficients
b_0 is the Y-intercept - the value of sales when X_1 and X_2 are 0.
b_1 and b_2 are net regression coefficients: the
change in Y per unit change in the relevant
independent variable, holding the other
independent variables constant.

Regression Coefficients
For each unit increase ($1.00) in price, sales will
decrease 8.25 hundred gallons, holding advertising
constant.
For each unit increase ($100, represented as 1) in
Advertising, sales will increase .59 hundred gallons,
holding price constant.
Be very careful about the units! An advertising value of 10
means $1,000, because advertising is in hundreds of dollars.
Gallons = 16.41 - 8.25 (1.00) + .59 (10)
= 14.06 hundred gallons, or 1,406 gallons

Regression Coefficients
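How does a one cent increase in price
affect sales (holding advertising at
$1,000)?
16.41-8.25(1.01)+.59(10) = 13.9775, a drop of .0825 hundred gallons (8.25 gallons) from 14.06

If price stays at $1.00 and advertising
increases $100, from $1,000 to $1,100:
16.41-8.25(1.00)+.59(11) = 14.65, a gain of .59 hundred gallons (59 gallons) from 14.06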
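The what-if calculations on this slide can be wrapped in a small function. This sketch assumes the coefficients rounded as in the coefficient table (16.41, -8.25, .59). The function name `predict_sales` is illustrative only.

```python
def predict_sales(price, advertising):
    """Predicted sales in hundreds of gallons.

    price is in dollars per gallon; advertising is in hundreds of dollars.
    Coefficients are the rounded values from the coefficient table.
    """
    return 16.41 - 8.25 * price + 0.59 * advertising


base = predict_sales(1.00, 10)        # $1.00 per gallon, $1,000 advertising
price_up = predict_sales(1.01, 10)    # one cent price increase
adv_up = predict_sales(1.00, 11)      # advertising raised to $1,100

print(f"Base forecast:       {base:.4f} hundred gallons")
print(f"Price +1 cent:       {price_up - base:+.4f} hundred gallons")
print(f"Advertising +$100:   {adv_up - base:+.4f} hundred gallons")
```

Each difference equals the relevant coefficient times the change in that variable, which is what "holding the other variable constant" means in practice.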

Regression Statistics
Standard error of the estimate
R^2 and Adjusted R^2

Regression Statistics
Multiple R 0.965364
R Square 0.931929
Adjusted R Square 0.91248
Standard Error 1.507196
Observations 10
R^2 and Adjusted R^2
Same formulas as Simple Regression
SSR/SST (this is an UNADJUSTED R^2)
Adjusted R^2 from ANOVA = 1 - MSE/(SST/(n-1))

91% of the variance in gallons sold is
explained by price per gallon and
advertising.
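Both statistics can be checked against the sums of squares in the ANOVA output. The sketch below assumes k = 3 estimated coefficients (the intercept plus two slopes). The variable names are illustrative.

```python
# Sums of squares from the ANOVA output for the gallon-sales model.
ssr = 217.6985   # regression sum of squares
sse = 15.90149   # residual (error) sum of squares
sst = 233.6      # total sum of squares
n = 10           # observations
k = 3            # estimated coefficients, including the intercept (assumption)

r_square = ssr / sst
mse = sse / (n - k)
adjusted_r_square = 1 - mse / (sst / (n - 1))

print(f"R Square:          {r_square:.6f}")
print(f"Adjusted R Square: {adjusted_r_square:.5f}")
```

Adjusted R^2 is lower than R^2 because it divides each sum of squares by its degrees of freedom, which penalizes adding independent variables.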
Standard Error of the Estimate
Measures the standard amount that the
actual values (Y) differ from the
estimated values (Ŷ).
No change in formula, except that in this
example k = 3 (the number of estimated
coefficients, including the intercept).
Can still use the square root of MSE.
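As a worked check, the standard error of the estimate can be recovered from the residual sum of squares in the ANOVA output. The expression below assumes the usual form of the formula with n - k degrees of freedom, where n = 10 and k = 3.

```latex
S_{yx} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - k}}
       = \sqrt{\frac{15.90149}{10 - 3}}
       = \sqrt{2.271641}
       \approx 1.507
```

This agrees with the Standard Error of 1.507196 in the Regression Statistics output.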

Evaluate the Independent Variables


H_0: The regression coefficient is not significantly
different from zero
H_A: The regression coefficient is significantly different
from zero
Use the t-stat and the p-value to evaluate EACH
independent variable. If an independent variable is
NOT significant, we remove it from the model and re-run!
             Coefficients  Standard Error  t Stat    P-value
Intercept    16.40637      4.342519        3.778075  0.00691
Price        -8.24758      2.196057        -3.75563  0.007115
Advertising  0.585101      0.133672        4.377145  0.003246
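Each t Stat in the table is simply the coefficient divided by its standard error. The sketch below recomputes them and compares each with a critical value. The value 2.365 is the two-tailed 5% critical t for 7 degrees of freedom read from a t-table; the 5% level itself is an assumption, since the slides do not state one.

```python
# (coefficient, standard error) pairs from the regression output.
coefficients = {
    "Intercept": (16.40637, 4.342519),
    "Price": (-8.24758, 2.196057),
    "Advertising": (0.585101, 0.133672),
}

# Two-tailed critical t at the 5% level with n - k = 10 - 3 = 7 df
# (table value; the 5% level is an assumption).
T_CRITICAL = 2.365

t_stats = {name: b / se for name, (b, se) in coefficients.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "reject H_0" if significant[name] else "fail to reject H_0"
    print(f"{name:12s} t = {t:8.4f}  ->  {verdict}")
```

Here both independent variables exceed the critical value in absolute terms, so neither would be removed from the model.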
Evaluate the Model
H_0: The model is NOT valid and there is NOT a
statistical relationship between the dependent and
independent variables.
H_A: The model is valid. There is a statistical relationship
between the dependent and independent variables.
If F from the ANOVA is greater than the F from the F-table,
reject H_0: the model is valid. We can also look at the
p-value (Significance F). If the p-value is less than our
set α level, we can REJECT H_0.


ANOVA
            df  SS        MS        F         Significance F
Regression   2  217.6985  108.8493  47.91657  8.23E-05
Residual     7  15.90149  2.271641
Total        9  233.6
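The F statistic in the ANOVA table is the ratio of the two mean squares. The sketch below rebuilds it from the sums of squares and degrees of freedom. The critical value 4.74 is the 5% F-table entry for 2 and 7 degrees of freedom; using the 5% level is an assumption.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 217.6985, 2
ss_residual, df_residual = 15.90149, 7

msr = ss_regression / df_regression   # mean square regression
mse = ss_residual / df_residual       # mean square error
f_stat = msr / mse

# 5% critical F for (2, 7) degrees of freedom, from an F-table (assumption).
F_CRITICAL = 4.74
reject_h0 = f_stat > F_CRITICAL

print(f"MSR = {msr:.4f}, MSE = {mse:.6f}, F = {f_stat:.4f}")
print("Reject H_0: the model is valid" if reject_h0 else "Fail to reject H_0")
```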
Forecast and Prediction Interval
Same as simple regression - however,
many times we will not have the
correction factor (formula under the
square root). It is acceptable to use the
Standard error of the estimate provided
in the computer output.


$$\hat{Y} \pm Z_{\alpha/2}\, S_{yx} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$$
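When the correction factor is not available, the interval reduces to the forecast plus or minus Z times the standard error of the estimate. The sketch below applies that shortcut to a forecast at a price of $1.00 and advertising of $1,000. The 95% level (Z = 1.96) is an assumption, and the coefficients are the rounded values from the coefficient table.

```python
# Point forecast at price = $1.00, advertising = 10 (hundreds of dollars),
# using the rounded coefficients from the coefficient table.
y_hat = 16.41 - 8.25 * 1.00 + 0.59 * 10

s_yx = 1.507196   # standard error of the estimate from the output
z = 1.96          # Z for a 95% interval (assumed confidence level)

# Approximate interval: no correction factor under the square root.
lower = y_hat - z * s_yx
upper = y_hat + z * s_yx

print(f"Forecast: {y_hat:.2f} hundred gallons")
print(f"Approximate 95% prediction interval: {lower:.2f} to {upper:.2f}")
```

Because the correction factor is always at least 1, this shortcut gives an interval that is somewhat narrower than the full formula would.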
Examining the Errors
Heteroscedasticity exists when the residuals do not
have a constant variance across an entire range of
values.
Run an autocorrelation on the error terms to determine
if the errors are random. If the errors are not random,
the model needs to be re-evaluated. More on this in
Chapter 9.
Evaluate with MAD, MAPE, MPE, MSE
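The four error measures can be computed from the actual and fitted values. The sketch below assumes the common definitions: error = actual minus forecast, with MAPE and MPE expressed as proportions of the actual value. The function name and the two-point data set are hypothetical and serve only to show the arithmetic.

```python
def error_measures(actual, forecast):
    """Return MAD, MSE, MAPE and MPE for paired actual/forecast values.

    Error is defined as actual minus forecast (assumed convention).
    MAPE and MPE are returned as proportions, not percentages.
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "MAD": sum(abs(e) for e in errors) / n,
        "MSE": sum(e ** 2 for e in errors) / n,
        "MAPE": sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
        "MPE": sum(e / a for e, a in zip(errors, actual)) / n,
    }


# Hypothetical values, chosen only to make the arithmetic easy to follow.
measures = error_measures(actual=[10, 20], forecast=[8, 22])
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

An MPE close to zero suggests the forecasts are not consistently too high or too low, while MAD, MSE and MAPE measure the size of the errors regardless of sign.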
Dummy Variables
Used to determine the relationship
between qualitative independent
variables and a dependent variable.
Differences based on gender
Effect of training/no-training on performance
Seasonal data- quarters
We use 0 and 1 to indicate off or on.
For example, code males as 1 and
females as 0.
Dummy Variables
The data show job
performance ratings along with
achievement test scores
and gender, coded
female (0) and
male (1).
How do males and
females differ in their
job performance?
Rating  Test Score  Gender
     5          60       0
     4          55       0
     3          35       0
    10          96       0
     2          35       0
     7          81       0
     6          65       0
     9          85       0
     9          99       1
     2          43       1
     8          98       1
     6          91       1
     7          95       1
     3          70       1
     6          85       1
Dummy Variables
The regression equation:
Job performance = -1.96 + .12 (test score) - 2.18 (gender)
Holding gender constant, a one unit increase in test score
increases job performance rating by .12 points.
Holding test score constant, males have a 2.18 point lower
performance rating than females. Or stated differently, females
have a 2.18 point higher job performance rating than males,
holding test scores constant.
            Coefficients  Standard Error  t Stat  P-value
Intercept   -1.96         0.71            -2.77   0.02
Test Score  0.12          0.01            11.86   0.00
Gender      -2.18         0.45            -4.84   0.00
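The effect of the dummy variable is easiest to see by forecasting for a female and a male with the same test score. In this sketch the test score of 80 is a hypothetical value, and the function name `predict_rating` is illustrative.

```python
def predict_rating(test_score, gender):
    """Predicted job performance rating; gender is 0 = female, 1 = male."""
    return -1.96 + 0.12 * test_score - 2.18 * gender


score = 80  # hypothetical test score, used for both forecasts
female = predict_rating(score, gender=0)
male = predict_rating(score, gender=1)
one_more_point = predict_rating(score + 1, gender=0) - female

print(f"Female, score {score}: {female:.2f}")
print(f"Male, score {score}:   {male:.2f}")
print(f"Male minus female:    {male - female:+.2f}")
print(f"Effect of one extra test point: {one_more_point:+.2f}")
```

The dummy variable shifts the intercept only: the male and female lines are parallel, 2.18 rating points apart at every test score.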
Dummy Variable Analysis
Evaluate for multicollinearity
State and interpret the equation
Interpret Adjusted R^2
Interpret S_yx

Are the independent variables significant?
Is the model significant
Forecast and develop prediction interval
Examine the error terms
Calculate MAD, MSE, MAPE, MPE
Model Evaluation
If the variables indicate multicollinearity, run the
model, interpret, but then re-run the best model
(i.e., throw out one of the highly correlated
variables).
If one of the independent variables is NOT
significant (whether a dummy variable or other),
throw it out and re-run the model.
If the overall model is not significant - back to the
drawing board - need to gather better predictor
variables. Maybe an elective course!
Stepwise Regression
Sometimes, we will have a great number
of variables - running a correlation matrix
will help determine if any variables
should NOT be in the model (low
correlation with the dependent variable).
Can also run different types of
regression, such as stepwise regression
Stepwise regression
Adds one variable at a time - one step at
a time. Based on explained variance
(and highest correlation with the
dependent variable). The independent
variable that explains the most variance
in the dependent variable is entered into
the model first.
A partial F-test is used to decide whether a
new variable stays or is eliminated.
Start with the correlation Matrix
                    Unit Sales  Test Score  Age (years)  Anxiety   Experience (Years)  High School GPA
Unit Sales          1
Test Score          0.67612     1
Age (years)         0.798141    0.227706    1
Anxiety             -0.29586    -0.22199    -0.28679     1
Experience (Years)  0.549834    0.349639    0.539568     -0.27869  1
High School GPA     0.621784    0.317772    0.694569     -0.24438  0.3121288           1
Stepwise Regression
F-to-Enter: 4.00 F-to-Remove: 4.00

Response is Unit Sales on 5 predictors, with N = 30

Step 1 2
Constant -100.85 -86.79

Age (yea 6.97 5.93
T-Value 7.01 10.60

Test Sco 0.200
T-Value 8.13

S 6.85 3.75
R-Sq 63.70 89.48
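The first step of the output can be traced back to the correlation matrix. The sketch below picks the predictor with the highest absolute correlation with Unit Sales and squares that correlation to get the Step 1 R-Sq. It also uses the fact that, for a single added variable, the partial F equals the square of its T-Value. This illustrates the entry rule only; it is not the statistical package's full stepwise procedure.

```python
# Correlations of each candidate predictor with Unit Sales,
# taken from the first column of the correlation matrix.
correlations_with_sales = {
    "Test Score": 0.67612,
    "Age (years)": 0.798141,
    "Anxiety": -0.29586,
    "Experience (Years)": 0.549834,
    "High School GPA": 0.621784,
}

# Step 1: enter the variable with the highest absolute correlation.
first_variable = max(
    correlations_with_sales,
    key=lambda name: abs(correlations_with_sales[name]),
)
r_sq_step1 = 100 * correlations_with_sales[first_variable] ** 2

# Step 2: for one variable at a time, partial F = t squared.
F_TO_ENTER = 4.00
t_values_step2 = {"Age (years)": 10.60, "Test Score": 8.13}
partial_f = {name: t ** 2 for name, t in t_values_step2.items()}
all_stay = all(f > F_TO_ENTER for f in partial_f.values())

print(f"First variable entered: {first_variable} (R-Sq = {r_sq_step1:.2f})")
for name, f in partial_f.items():
    print(f"{name}: partial F = {f:.2f}")
print("Both variables stay in the model" if all_stay else "A variable is removed")
```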

Stepwise Regression
The equation at Step 1:
Sales = -100.85 + 6.97 (age)
The equation at Step 2:
Sales = -86.79 + 5.93 (age) + .200 (test score)

No other variables are significant; the
model stops.
