24 November 2010
Correlations
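Pearson's r revisited
$$r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \sum_i (Y_i - \bar{Y})^2}} = \frac{\text{Sum of Products}}{\sqrt{\text{Sum of Squares}_x \cdot \text{Sum of Squares}_y}} = \frac{SP}{\sqrt{SS_x \, SS_y}}$$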
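A minimal R sketch of the formula: it computes SP, SS_x and SS_y by hand and checks r against the built-in `cor()`. The data vectors are borrowed from the sentence-length example in the regression section; the formula itself does not depend on them.

```r
# Data from the regression example (prior convictions X, sentence length Y)
x <- c(0, 3, 1, 0, 6, 5, 3, 4, 10, 8)
y <- c(12, 13, 15, 19, 26, 27, 29, 31, 40, 48)

# Sum of products and sums of squares, all as deviations from the means
SP  <- sum((x - mean(x)) * (y - mean(y)))
SSx <- sum((x - mean(x))^2)
SSy <- sum((y - mean(y))^2)

r <- SP / sqrt(SSx * SSy)
c(SP = SP, SSx = SSx, SSy = SSy, r = r)

# Should agree with R's built-in Pearson correlation
all.equal(r, cor(x, y))
```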
To test $H_0: \rho_{x,y} = 0$ against $H_A: \rho_{x,y} \neq 0$, compute
$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$$
So, with $r = .24$ and $N = 8$, the computation of t is:
$$t = \frac{.24\sqrt{8-2}}{\sqrt{1-(.24)^2}} = \frac{.24(2.45)}{\sqrt{1-.0576}} = \frac{.59}{\sqrt{.9424}} = \frac{.59}{.97} = .61$$
From R (using qt()), the two-tailed critical value for t with df = 6 and α = .05 is 2.447. Since .61 < 2.447, we do not reject H0.
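The same test as a short R sketch. The original mentions only `qt()`; the two-tailed p-value via `pt()` is an addition for illustration.

```r
# Values from the worked example: r = .24 with N = 8 observations
r   <- 0.24
N   <- 8
dof <- N - 2

# t statistic for H0: rho = 0
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)

# Two-tailed critical value at alpha = .05, and the two-tailed p-value
t_crit  <- qt(0.975, dof)
p_value <- 2 * pt(-abs(t_stat), dof)

round(c(t = t_stat, critical = t_crit, p = p_value), 3)
abs(t_stat) > t_crit   # FALSE, so H0 is not rejected
```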
Regression analysis
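Example
[Figure: scatterplot of sentence length (Y) against number of prior convictions (X), produced by the R code below]
# Tighten the plot margins
par(mar=c(4,4,1,1))
# Number of prior convictions (X) and sentence length (Y) for ten cases
x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
plot(x, y, xlab="Number of prior convictions (X)",
     ylab="Sentence length (Y)", pch=19)
# Light grey horizontal reference lines
abline(h=c(10,20,30,40), col="grey70")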
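The slope:
$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$
the intercept:
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
the residual variance:
$$\hat{\sigma}^2 = \frac{\sum_i [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2}{n-2}$$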
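Applied to the sentence-length data, the three formulas can be sketched directly in R and cross-checked against `lm()`; `summary(fit)$sigma` is R's residual standard error, whose square is $\hat{\sigma}^2$.

```r
x <- c(0, 3, 1, 0, 6, 5, 3, 4, 10, 8)
y <- c(12, 13, 15, 19, 26, 27, 29, 31, 40, 48)
n <- length(x)

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                       # intercept
sigma2 <- sum((y - (b0 + b1 * x))^2) / (n - 2)                     # residual variance

c(b0 = b0, b1 = b1, sigma2 = sigma2)   # 14, 3 and 43.75

# Cross-check with lm(): same coefficients, sigma^2 = (residual standard error)^2
fit <- lm(y ~ x)
coef(fit)
summary(fit)$sigma^2
```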
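Things to note:
- the residual is $e_i = y_i - \hat{y}_i$
- the estimated error variance is $\hat{\sigma}^2 = \sum_i e_i^2/(n-2) = RSS/(n-2)$
- the regression line passes through the point $(x = 0, y = \hat{\beta}_0)$
- it also passes through the point $(\bar{x}, \bar{y})$ (the average X predicts the average Y)
- least squares chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize $\sum_i e_i^2$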
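These properties can be checked numerically in R. When the model has an intercept, the line passing through $(\bar{x}, \bar{y})$ is equivalent to the residuals summing to zero, since $\sum_i e_i = n(\bar{y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{x})$.

```r
x <- c(0, 3, 1, 0, 6, 5, 3, 4, 10, 8)
y <- c(12, 13, 15, 19, 26, 27, 29, 31, 40, 48)
fit <- lm(y ~ x)
e <- residuals(fit)                 # e_i = y_i - yhat_i
b <- coef(fit)

sum(e)                              # numerically zero
sum(e^2) / (length(y) - 2)          # sigma^2 = RSS / (n - 2)
b[1] + b[2] * mean(x)               # prediction at mean(x) equals mean(y) = 26
```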
Regression line
[Figure: the fitted regression line Yhat = 14 + 3X drawn through the example data, with the Y intercept marked]
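Regression terminology
- Y is regressed on X
Notation
- Independent variables: X
- Dependent variable: Y
Prediction for X = 13, using the fitted line $\hat{Y} = 14 + 3X$:
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1(13) = 14 + 3(13) = 14 + 39 = 53$$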
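In R the same prediction comes from `predict()` on the fitted model (a sketch using the example data):

```r
x <- c(0, 3, 1, 0, 6, 5, 3, 4, 10, 8)
y <- c(12, 13, 15, 19, 26, 27, 29, 31, 40, 48)
fit <- lm(y ~ x)

# Predicted sentence length for a case with 13 prior convictions
yhat13 <- predict(fit, newdata = data.frame(x = 13))
yhat13   # 53
```

Note that X = 13 lies outside the observed range of X (0 to 10), so this prediction is an extrapolation.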
TSS  Total sum of squares: $\sum_i (y_i - \bar{y})^2$
ESS  Estimation or Regression sum of squares: $\sum_i (\hat{y}_i - \bar{y})^2$
RSS  Residual sum of squares: $\sum_i e_i^2 = \sum_i (y_i - \hat{y}_i)^2$
The key to remember is that TSS = ESS + RSS.
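A sketch of the three sums of squares for the example data; `anova()` reports ESS (row `x`) and RSS (row `Residuals`) in its `Sum Sq` column.

```r
x <- c(0, 3, 1, 0, 6, 5, 3, 4, 10, 8)
y <- c(12, 13, 15, 19, 26, 27, 29, 31, 40, 48)
fit  <- lm(y ~ x)
yhat <- fitted(fit)

TSS <- sum((y - mean(y))^2)      # total
ESS <- sum((yhat - mean(y))^2)   # explained by the regression
RSS <- sum((y - yhat)^2)         # residual
c(TSS = TSS, ESS = ESS, RSS = RSS)   # 1250 = 900 + 350

anova(fit)[["Sum Sq"]]           # ESS and RSS
```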
R² and Pearson's r
- Note that computing the regression line uses the same sums of squares (SP, $SS_x$, $SS_y$) used in computing the correlation r
- $0 \le R^2 \le 1.0$
- in simple regression with one predictor, $R^2 = r^2$
Computation:
$$r = \frac{SP}{\sqrt{SS_x \, SS_y}}$$
R² computation example
$$r = \frac{SP}{\sqrt{SS_x \, SS_y}} = \frac{300}{\sqrt{(100)(1250)}} = \frac{300}{\sqrt{125000}} = \frac{300}{353.55} = .85$$
$$R^2 = r^2 = \frac{300^2}{(100)(1250)} = \frac{90000}{125000} = .72$$
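A quick R check that the squared correlation and the $R^2$ reported by `summary.lm` coincide for this one-predictor model:

```r
x <- c(0, 3, 1, 0, 6, 5, 3, 4, 10, 8)
y <- c(12, 13, 15, 19, 26, 27, 29, 31, 40, 48)
fit <- lm(y ~ x)

r  <- cor(x, y)
R2 <- summary(fit)$r.squared
c(r = r, r_squared = r^2, R2 = R2)   # both R^2 values equal .72
```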
R² continued
Solid arrow: variation in y when X is unknown (TSS, the Total Sum of Squares, $\sum_i (y_i - \bar{y})^2$)
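R² decomposed
$$y = \hat{y} + e$$
$$\mathrm{Var}(y) = \mathrm{Var}(\hat{y}) + \mathrm{Var}(e) + 2\,\mathrm{Cov}(\hat{y}, e)$$
The covariance term vanishes because OLS residuals are uncorrelated with the fitted values:
$$\mathrm{Var}(y) = \mathrm{Var}(\hat{y}) + \mathrm{Var}(e) + 0$$
$$\sum_i (y_i - \bar{y})^2/N = \sum_i (\hat{y}_i - \bar{y})^2/N + \sum_i (e_i - \bar{e})^2/N$$
$$\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (e_i - \bar{e})^2$$
Since $\bar{e} = 0$ when the model includes an intercept:
$$\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i e_i^2$$
$$TSS = ESS + RSS$$
$$TSS/TSS = ESS/TSS + RSS/TSS$$
$$1 = R^2 + \text{unexplained variance}$$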
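The decomposition can be verified numerically in R. Note that `var()` divides by N - 1 rather than N; the identity holds with either divisor.

```r
x <- c(0, 3, 1, 0, 6, 5, 3, 4, 10, 8)
y <- c(12, 13, 15, 19, 26, 27, 29, 31, 40, 48)
fit  <- lm(y ~ x)
yhat <- fitted(fit)
e    <- residuals(fit)

cov(yhat, e)                        # numerically zero
var(y) - (var(yhat) + var(e))       # numerically zero

# 1 = R^2 + unexplained share (RSS/TSS)
summary(fit)$r.squared + sum(e^2) / sum((y - mean(y))^2)
```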
     3Q     Max
 1.5303  8.2424

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  49.0909    21.6202   2.271   0.0636 .
x             0.7576     0.3992   1.898   0.1065
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.587 on 6 degrees of freedom
Multiple R-squared:  0.375,  Adjusted R-squared:  0.2709
F-statistic: 3.601 on 1 and 6 DF,  p-value: 0.1065
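The printed fields are linked. A small R check using only the numbers shown above (6 residual degrees of freedom with one predictor implies n = 8): with a single predictor F = t², the F and t p-values coincide, and adjusted R² follows from R², n and k.

```r
# Numbers copied from the summary() output above
b   <- 0.7576    # slope estimate for x
se  <- 0.3992    # its standard error
dof <- 6         # residual degrees of freedom
R2  <- 0.375     # multiple R-squared
k   <- 1         # number of predictors

t_stat <- b / se                                   # about 1.898
F_stat <- t_stat^2                                 # one predictor: F = t^2
p_t <- 2 * pt(-abs(t_stat), dof)                   # two-tailed p from t
p_F <- pf(F_stat, k, dof, lower.tail = FALSE)      # same p from F

n <- dof + k + 1                                   # 8 observations
adj_R2 <- 1 - (1 - R2) * (n - 1) / (n - k - 1)
round(c(t = t_stat, F = F_stat, p_t = p_t, p_F = p_F, adj_R2 = adj_R2), 4)
```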
Anscombe plots
[Figure: Anscombe plots, four scatterplots of y1 against x1, y2 against x2, y3 against x3 and y4 against x4]
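Anscombe's quartet ships with base R as the `anscombe` data set. Fitting each pair separately, as sketched below, gives almost the same intercept, slope and R² for all four, even though the plots look completely different, which is why the data should be plotted before trusting a regression summary.

```r
# Anscombe's quartet is included with base R (package 'datasets')
data(anscombe)

# Fit y_i ~ x_i separately for each of the four pairs
fits <- lapply(1:4, function(i) {
  lm(as.formula(paste0("y", i, " ~ x", i)), data = anscombe)
})

coefs <- t(sapply(fits, coef))        # one row per data set
colnames(coefs) <- c("intercept", "slope")
r2 <- sapply(fits, function(m) summary(m)$r.squared)

# All four rows are approximately intercept 3.00, slope 0.50, R^2 0.67
round(cbind(coefs, R2 = r2), 2)
```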
Linear model
Generalised version:
- Stochastic component: $Y_i \sim f(\theta_i, \alpha)$
- Systematic component: $\theta_i = g(X_i, \beta)$
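The familiar normal linear model is the special case in which the stochastic component is a normal distribution with mean μ_i and variance σ², and the systematic component is μ_i = β0 + β1 X_i. A simulation sketch in R; the parameter values below are hypothetical, chosen only to illustrate that `lm()` recovers the systematic part.

```r
set.seed(1)                      # reproducible draws
n     <- 10000
beta0 <- 14                      # hypothetical true intercept
beta1 <- 3                       # hypothetical true slope
sigma <- 6                       # hypothetical error standard deviation

X  <- runif(n, 0, 10)
mu <- beta0 + beta1 * X                  # systematic component
Y  <- rnorm(n, mean = mu, sd = sigma)    # stochastic component: Y_i ~ N(mu_i, sigma^2)

est <- coef(lm(Y ~ X))
est                              # close to 14 and 3
```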
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
F-test:
$$F = \frac{ESS/1}{RSS/(n-2)} = \frac{ESS}{\hat{\sigma}^2}$$
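For the sentence-length example, ESS = 900 and RSS/(n - 2) = 350/8 = 43.75, so F is about 20.57 on 1 and 8 degrees of freedom. A sketch of the computation, checked against `anova()`:

```r
x <- c(0, 3, 1, 0, 6, 5, 3, 4, 10, 8)
y <- c(12, 13, 15, 19, 26, 27, 29, 31, 40, 48)
fit <- lm(y ~ x)
n   <- length(y)

ESS <- sum((fitted(fit) - mean(y))^2)
RSS <- sum(residuals(fit)^2)

F_stat <- (ESS / 1) / (RSS / (n - 2))               # 900 / 43.75
p_val  <- pf(F_stat, 1, n - 2, lower.tail = FALSE)
c(F = F_stat, p = p_val)

anova(fit)                                           # same F in the "x" row
```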
Example
> require(foreign)
> dail <- read.dta("dailcorrected.dta")
> summary(lm(votes1st ~ spend_total + incumb + electorate + minister, data=dail))

Call:
lm(formula = votes1st ~ spend_total + incumb + electorate + minister,
    data = dail)

Residuals:
    Min      1Q  Median      3Q     Max
-4934.1 -1038.8  -347.6  1054.0  6900.3

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      7.966e+02  4.172e+02   1.909   0.0569 .
spend_total      1.737e-01  1.095e-02  15.862   <2e-16 ***
incumbIncumbent  2.522e+03  2.207e+02  11.424   <2e-16 ***
electorate      -4.827e-04  5.404e-03  -0.089   0.9289
minister        -1.303e+02  3.965e+02  -0.329   0.7425
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1847 on 458 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.6478,  Adjusted R-squared: 0.6447
F-statistic: 210.6 on 4 and 458 DF,  p-value: < 2.2e-16
1. Specification:
2. $E(\varepsilon) = 0$
3. Error terms:
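Objective: minimize $\sum_i (Y_i - \hat{Y}_i)^2 = \sum_i e_i^2$, where
- $\hat{Y}_i = b_0 + b_1 X_i$
- the error is $e_i = (Y_i - \hat{Y}_i)$
$$b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$$
where lowercase $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ denote deviations from the means.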
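To make the "minimize the sum of squared errors" objective concrete, the R sketch below minimizes it numerically with `optim()` (BFGS with an analytic gradient and a tightened tolerance) and compares the result with the closed-form deviation formula. Using a general-purpose optimiser here is purely illustrative; the closed form is what OLS actually uses.

```r
x <- c(0, 3, 1, 0, 6, 5, 3, 4, 10, 8)
y <- c(12, 13, 15, 19, 26, 27, 29, 31, 40, 48)

# Sum of squared errors as a function of b = (b0, b1)
sse <- function(b) sum((y - (b[1] + b[2] * x))^2)
# Its gradient, which helps the optimiser converge tightly
sse_grad <- function(b) {
  e <- y - (b[1] + b[2] * x)
  c(-2 * sum(e), -2 * sum(e * x))
}
opt <- optim(c(0, 0), sse, sse_grad, method = "BFGS",
             control = list(reltol = 1e-12))
opt$par                     # close to c(14, 3)
opt$value                   # minimized SSE, about 350

# Closed form using deviations from the means
xd <- x - mean(x)
yd <- y - mean(y)
b1 <- sum(xd * yd) / sum(xd^2)
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)
```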
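OLS rationale
$$y = X\beta + \varepsilon$$
$$X'y = X'X\beta + X'\varepsilon$$
$$X'y = X'X\beta + 0$$
$$(X'X)^{-1}X'y = \beta + 0$$
$$\hat{\beta} = (X'X)^{-1}X'y$$
- The variance of $\hat{\beta}$ is $\sigma^2 (X'X)^{-1}$. In the simple case where $y = \beta_0 + \beta_1 x$, this gives $\sigma^2 / \sum_i (x_i - \bar{x})^2$ for the variance of $\hat{\beta}_1$.
- Note how increasing the variation in X reduces the variance of $\hat{\beta}_1$.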
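The matrix formulas translate almost line for line into R. This sketch builds the design matrix for the sentence-length example and solves the normal equations with `solve(A, b)` rather than forming $(X'X)^{-1}$ explicitly, which is the numerically preferable choice; the inverse is formed only for the variance matrix.

```r
x <- c(0, 3, 1, 0, 6, 5, 3, 4, 10, 8)
y <- c(12, 13, 15, 19, 26, 27, 29, 31, 40, 48)

X <- cbind(1, x)                                   # column of 1s for the intercept
beta_hat <- solve(crossprod(X), crossprod(X, y))   # solves (X'X) b = X'y
e <- y - X %*% beta_hat
sigma2_hat <- sum(e^2) / (nrow(X) - ncol(X))       # RSS / (n - 2)
V <- sigma2_hat * solve(crossprod(X))              # estimated Var(beta_hat)

beta_hat                                           # 14 and 3
V[2, 2]                                            # variance of the slope, 0.4375
sigma2_hat / sum((x - mean(x))^2)                  # same: sigma^2 / SS_x
vcov(lm(y ~ x))                                    # lm() agrees
```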
OLS in R
> dail <- read.dta("dail2002.dta")
> mdl <- lm(votes1st ~ spend_total*incumb + minister, data=dail)
> summary(mdl)

Call:
lm(formula = votes1st ~ spend_total * incumb + minister, data = dail)

Residuals:
    Min      1Q  Median      3Q     Max
-5555.8  -979.2  -262.4   877.2  6816.5

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         469.37438  161.54635   2.906  0.00384 **
spend_total           0.20336    0.01148  17.713  < 2e-16 ***
incumb             5150.75818  536.36856   9.603  < 2e-16 ***
minister           1260.00137  474.96610   2.653  0.00826 **
spend_total:incumb   -0.14904    0.02746  -5.428 9.28e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1796 on 457 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.6672,  Adjusted R-squared: 0.6643
F-statistic:   229 on 4 and 457 DF,  p-value: < 2.2e-16
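With the interaction term, the effect of spending on first-preference votes differs by incumbency status. A small R sketch using the printed coefficients, assuming (as the Stata version's `spend_total * incumb` suggests) that `incumb` is coded 0 for non-incumbents and 1 for incumbents:

```r
# Coefficients copied from the summary(mdl) output above
b_spend    <- 0.20336    # spend_total
b_interact <- -0.14904   # spend_total:incumb

# Assumption: incumb is coded 0 (non-incumbent) / 1 (incumbent)
slope_nonincumbent <- b_spend                 # incumb = 0
slope_incumbent    <- b_spend + b_interact    # incumb = 1
c(non_incumbent = slope_nonincumbent, incumbent = slope_incumbent)
```

Under this model, each additional unit of spending is associated with about 0.203 extra votes for non-incumbents but only about 0.054 for incumbents.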
OLS in Stata
. use dail2002
(Ireland 2002 Dail Election - Candidate Spending Data)

. gen spendXinc = spend_total * incumb
(2 missing values generated)

. reg votes1st spend_total incumb minister spendXinc

      Source |       SS       df       MS              Number of obs =     462
-------------+------------------------------           F(  4,   457) =  229.05
       Model |  2.9549e+09     4   738728297           Prob > F      =  0.0000
    Residual |  1.4739e+09   457  3225201.58           R-squared     =  0.6672
-------------+------------------------------           Adj R-squared =  0.6643
       Total |  4.4288e+09   461  9607007.17           Root MSE      =  1795.9

------------------------------------------------------------------------------
    votes1st |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 spend_total |   .2033637   .0114807    17.71   0.000     .1808021    .2259252
      incumb |   5150.758   536.3686     9.60   0.000     4096.704    6204.813
    minister |   1260.001   474.9661     2.65   0.008      326.613     2193.39
   spendXinc |  -.1490399   .0274584    -5.43   0.000    -.2030003   -.0950794
       _cons |   469.3744   161.5464     2.91   0.004     151.9086    786.8402
------------------------------------------------------------------------------
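Several entries in the Stata output are derived from others. A sketch in R, using only the printed numbers, reproduces the 95% confidence interval for `spend_total` (coefficient plus or minus the t critical value on 457 df times the standard error) and the Root MSE and R-squared from the ANOVA table:

```r
# Numbers copied from the Stata output above
b    <- 0.2033637    # spend_total coefficient
se   <- 0.0114807    # its standard error
df_r <- 457          # residual degrees of freedom

# 95% confidence interval: b +/- t(.975, df) * se
ci <- b + c(-1, 1) * qt(0.975, df_r) * se
ci                                   # about .1808021 and .2259252

# Root MSE and R-squared from the ANOVA table
ss_model <- 2.9549e9
ss_total <- 4.4288e9
ms_resid <- 3225201.58
c(root_mse = sqrt(ms_resid), r2 = ss_model / ss_total)   # about 1795.9 and 0.6672
```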