We wish to build a model that fits the data better than the
simple linear regression model.
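\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i \]
(p is the number of predictors; the formulas that follow write k for the same count.)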
Model and Required Conditions
Standard error of estimate: \( s_\varepsilon = \sqrt{\dfrac{SSE}{n-k-1}} \)
Coefficient of Determination
• The definition is: \( R^2 = 1 - \dfrac{SSE}{\sum_i (Y_i - \bar{Y})^2} \)
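The standard error of estimate and the coefficient of determination can both be computed from the observed and fitted values. The sketch below is illustrative only: the function name `fit_summary` and the data are hypothetical (the fitted values are made up, not an actual least-squares fit), and k is the number of predictors.

```python
import math

def fit_summary(y, y_hat, k):
    """Return SSE, SST, R^2 and the standard error of estimate s.

    y      observed responses
    y_hat  fitted values from a regression with k predictors
    """
    n = len(y)
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    y_bar = sum(y) / n
    # SSE: squared distances between observed and fitted values
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # SST: squared distances between observed values and their mean
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    s = math.sqrt(sse / (n - k - 1))
    return sse, sst, r2, s

# Hypothetical observed and fitted values, for illustration only.
y = [3.0, 5.0, 7.0, 9.0, 11.0]
y_hat = [3.5, 4.5, 7.0, 9.5, 10.5]
sse, sst, r2, s = fit_summary(y, y_hat, k=1)
print(f"SSE={sse:.2f}  SST={sst:.2f}  R^2={r2:.3f}  s={s:.3f}")
```

A small s and an R^2 close to 1 both indicate that the fitted values track the observations closely.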
Testing the Validity of the Model
\( H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0 \)
\( H_1 \): At least one \( \beta_i \) is not equal to zero.
Test statistic: F = MSR/MSE
ANOVA
             df            SS       MS      F       Significance F
Regression   k = 6         3123.8   520.6   17.14   0.0000
Residual     n-k-1 = 93    2825.6   30.4
Total        n-1 = 99      5949.5
Regression:  SS = SSR,  MS = MSR = SSR/k
Residual:    SS = SSE,  MS = MSE = SSE/(n-k-1)
Total:       SS = SST
SSR: Sum of Squares for Regression
SSE: Sum of Squares for Error
SST: Sum of Squares Total
• As in analysis of variance, we have:
\( F_{\alpha,k,n-k-1} = F_{0.05,\,6,\,100-6-1} = 2.17 \)
F = 17.14 > 2.17
Also, the p-value (Significance F) = 0.0000
Reject the null hypothesis.
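The arithmetic behind this test can be reproduced from the sums of squares in the ANOVA table. The sketch below is a minimal illustration: the function name `anova_f` is hypothetical, and the critical value 2.17 is taken from the text rather than computed.

```python
def anova_f(ssr, sse, n, k):
    """Return MSR, MSE and the F statistic for the overall test."""
    msr = ssr / k                # mean square for regression
    mse = sse / (n - k - 1)      # mean square for error
    return msr, mse, msr / mse

# Sums of squares, n and k as reported in the ANOVA table.
msr, mse, f_stat = anova_f(ssr=3123.8, sse=2825.6, n=100, k=6)

f_crit = 2.17                    # critical value quoted in the text
reject_h0 = f_stat > f_crit

print(f"MSR={msr:.1f}  MSE={mse:.1f}  F={f_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```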
Insufficient Evidence
Example: Sex discrimination in wages
Do female employees tend to receive lower starting salaries
than similarly qualified and experienced male employees?
Variables collected
93 employees on data file (61 female, 32 male).
other characteristics.
[Figure: bsal by fsex (Female, Male); bsal axis from 4000 to 6000]
Relationships of bsal with other variables
[Figure: four scatterplots of bsal (axis from 4000 to 8000) against other variables, each with a linear fit]
Multiple regression model
For employees who started at the same time, had the same
education and experience, and were the same age, women earned
$767 less on average than men.
Which variable is the strongest predictor of the outcome?
The reported t-stats (coef. / SE) and p-values are used to test whether a
particular coefficient equals 0, given that all other coefficients are in the
model.
Examples:
We need to transform variables.
[Figure: sal77 Residual (axis from -3000 to 3000) vs. sal77 Predicted (axis from 7000 to 17000)]
Plots of residuals vs. predictors
[Figures: Residual bsal 2 plotted against senior, age, educ and exper (bivariate fits) and against fsex (fit Y by X group); residual axis from -1000 to 1500]
Collinearity
When predictors are highly correlated, standard errors are
inflated
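One common way to quantify this inflation is the variance inflation factor (VIF), a measure the slides do not name; it is introduced here only as an illustration. With exactly two predictors, VIF = 1 / (1 - r^2), where r is the correlation between them, and each coefficient's standard error grows by roughly sqrt(VIF) relative to the uncorrelated case. The sketch below assumes that two-predictor setting; the function names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors: x2 is nearly a copy of x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.2]

r = pearson_r(x1, x2)
vif = vif_two_predictors(x1, x2)
print(f"r={r:.4f}  VIF={vif:.1f}  SE inflation ~ {math.sqrt(vif):.1f}x")
```

When the two predictors are uncorrelated, r = 0 and the VIF equals 1, so there is no inflation.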