
Linear Statistical Analysis I

Fall 2013


Outline

Ordinary Least Squares Estimation, estimation of σ²
Hypothesis testing
ANOVA (analysis of variance)


Example: (Heights data) There are 1375 mother-daughter pairs.
The heights of the mother and the daughter in each pair were measured.
> heights=read.table("heights.txt",header=T)
> fit=lm(Dheight~Mheight, data=heights)
> X=heights$Mheight
> Y=heights$Dheight
> plot(X,Y)
> lines(X,fit$fitted.values,col="red",lwd=3)
> sum(fit$residuals^2)
[1] 7051.957
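
As a minimal sketch of what `lm` computes here: in simple linear regression the least-squares estimates have the closed form b1 = Sxy/Sxx and b0 = mean(Y) - b1*mean(X). Because heights.txt is not reproduced in these slides, the example uses simulated data; the seed, sample size, and coefficients are assumptions chosen only for illustration.

```r
# Simulated stand-in for the heights data (all values are illustrative assumptions)
set.seed(1)
n <- 200
x <- rnorm(n, mean = 62, sd = 2.5)
y <- 30 + 0.54 * x + rnorm(n, mean = 0, sd = 2.3)

# Closed-form least-squares estimates
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
b1 <- Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)

# Residual sum of squares from the hand-computed fit
SSE <- sum((y - (b0 + b1 * x))^2)

# Compare with lm()
fit <- lm(y ~ x)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
print(c(SSE_manual = SSE, SSE_lm = sum(residuals(fit)^2)))
```

The hand-computed estimates and residual sum of squares should agree with `lm` up to floating-point error.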


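[Figure: scatterplot of Dheight (Y) against Mheight (X), both axes running from about 55 to 70, with the fitted least-squares line in red]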


> heights=read.table("heights.txt",header=T)
> fit=lm(Dheight~Mheight, data=heights)
> X=heights$Mheight
> Y=heights$Dheight
> SSE=sum((Y-fit$fitted.values)^2)
> MSE=SSE/(1375-2)
> MSE
[1] 5.136167
> sqrt(MSE)
[1] 2.266311
> se_0=sqrt(MSE*(1/1375+mean(X)^2/(1375-1)/var(X)))
> se_0
[1] 1.622469
> se_1=sqrt(MSE*(1/(1375-1)/var(X)))
> se_1
[1] 0.02596069
> summary(fit)

Call:
lm(formula = Dheight ~ Mheight, data = heights)

Residuals:
   Min     1Q Median     3Q    Max
-7.397 -1.529  0.036  1.492  9.053

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.91744    1.62247   18.44   <2e-16 ***
Mheight      0.54175    0.02596   20.87   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.266 on 1373 degrees of freedom
Multiple R-squared:  0.2408,	Adjusted R-squared:  0.2402
F-statistic: 435.5 on 1 and 1373 DF,  p-value: < 2.2e-16
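
The hand computation above uses se(b0) = sqrt(MSE*(1/n + mean(X)^2/Sxx)) and se(b1) = sqrt(MSE/Sxx), with Sxx = (n-1)*var(X) and MSE = SSE/(n-2). The sketch below checks these formulas against the `Std. Error` column of `summary()` on simulated data; the data-generating values are assumptions, not the heights data.

```r
# Simulated data (illustrative assumptions, not the heights data)
set.seed(2)
n <- 150
x <- rnorm(n, 62, 2.5)
y <- 30 + 0.54 * x + rnorm(n, 0, 2.3)
fit <- lm(y ~ x)

# Mean squared error with n - 2 residual degrees of freedom
SSE <- sum(residuals(fit)^2)
MSE <- SSE / (n - 2)
Sxx <- (n - 1) * var(x)

# Standard errors of the intercept and the slope
se_0 <- sqrt(MSE * (1 / n + mean(x)^2 / Sxx))
se_1 <- sqrt(MSE / Sxx)

# Standard errors reported by summary()
se_lm <- summary(fit)$coefficients[, "Std. Error"]
print(rbind(manual = c(se_0, se_1), lm = se_lm))
```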

Test H0: β0 = 20
> heights=read.table("heights.txt",header=T)
> fit=lm(Dheight~Mheight, data=heights)
> X=heights$Mheight
> Y=heights$Dheight
> SSE=sum((Y-fit$fitted.values)^2)
> MSE=SSE/(1375-2)
> MSE
[1] 5.136167
> sqrt(MSE)
[1] 2.266311
> se_0=sqrt(MSE*(1/1375+mean(X)^2/(1375-1)/var(X)))
> se_0
[1] 1.622469
> b0=29.917
> t=(b0-20)/se_0
> t
[1] 6.112288
> 2*(1-pt(t,1375-2))
[1] 1.277454e-09
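
The same test can be wrapped in a small function for any coefficient and any hypothesized value. This is only a sketch: `coef_test` is a hypothetical helper, not part of base R, and the data are simulated. It uses `abs(t)` with the upper tail of the t distribution, so it also handles negative t statistics, which the expression `2*(1-pt(t,df))` does not.

```r
# Two-sided t-test of H0: beta_j = value for one coefficient of an lm fit.
# coef_test is a hypothetical helper written for this illustration.
coef_test <- function(fit, name, value = 0) {
  tab <- summary(fit)$coefficients
  est <- tab[name, "Estimate"]
  se <- tab[name, "Std. Error"]
  df <- df.residual(fit)
  t_stat <- (est - value) / se
  p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
  c(t = t_stat, df = df, p = p_value)
}

# Simulated data (illustrative assumptions)
set.seed(3)
x <- rnorm(100, 62, 2.5)
y <- 30 + 0.54 * x + rnorm(100, 0, 2.3)
fit <- lm(y ~ x)

res0 <- coef_test(fit, "(Intercept)", value = 20)  # H0: beta0 = 20
res_default <- coef_test(fit, "x")                 # H0: beta1 = 0, as in summary()
print(res0)
print(res_default)
```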


Test H0: β0 = 0 and H0: β1 = 0.
> t=(b0)/se_0
> 2*(1-pt(t,1375-2))
[1] 0
> b1=0.54175
> t=(b1)/se_1
> 2*(1-pt(t,1375-2))
[1] 0
> summary(fit)

Call:
lm(formula = Dheight ~ Mheight, data = heights)

Residuals:
   Min     1Q Median     3Q    Max
-7.397 -1.529  0.036  1.492  9.053

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.91744    1.62247   18.44   <2e-16 ***
Mheight      0.54175    0.02596   20.87   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.266 on 1373 degrees of freedom
Multiple R-squared:  0.2408,	Adjusted R-squared:  0.2402
F-statistic: 435.5 on 1 and 1373 DF,  p-value: < 2.2e-16
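
The hand-computed p-values print as exactly 0 because `pt(t, df)` is indistinguishable from 1 in double precision for t statistics this large, so `1 - pt(t, df)` loses all precision. A possible alternative, sketched below with the rounded t values from the coefficient table, is to ask `pt` for the upper tail directly.

```r
# Rounded t statistics and residual degrees of freedom from the summary output
df <- 1373
t_values <- c(intercept = 18.44, slope = 20.87)

# Subtracting from 1 loses precision for extreme t values
p_naive <- 2 * (1 - pt(t_values, df))

# Computing the upper tail directly keeps the tiny tail probability
p_tail <- 2 * pt(abs(t_values), df, lower.tail = FALSE)

print(p_naive)
print(p_tail)
```

The second version returns very small positive numbers, consistent with the `<2e-16` shown by `summary()`.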


> X=rnorm(1000,5,5)
> e=rnorm(1000,0,5)
> Y=2+e
> plot(X,Y)
> fit=lm(Y~X)
> summary(fit)

Call:
lm(formula = Y ~ X)

Residuals:
     Min       1Q   Median       3Q      Max
-15.3695  -3.5737  -0.0721   3.5952  15.7919

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.08247    0.22201    9.38   <2e-16 ***
X           -0.01866    0.03165   -0.59    0.556
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.046 on 998 degrees of freedom
Multiple R-squared:  0.0003482,	Adjusted R-squared:  -0.0006535
F-statistic: 0.3476 on 1 and 998 DF,  p-value: 0.5556
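
In this simulation Y does not depend on X, so H0: β1 = 0 is true and the large p-value is what we would expect. A rough way to see the behaviour of the test is to repeat the experiment many times: at the 5% level the test should reject in roughly 5% of the replications. The sketch below uses 100 observations per replication and 2000 replications, both assumptions chosen to keep the run time short.

```r
# Repeat the null simulation and record the slope p-value each time
set.seed(4)
n_rep <- 2000
p_values <- replicate(n_rep, {
  X <- rnorm(100, 5, 5)
  Y <- 2 + rnorm(100, 0, 5)   # Y is generated without using X
  summary(lm(Y ~ X))$coefficients["X", "Pr(>|t|)"]
})

# Proportion of replications in which H0: beta1 = 0 is rejected at the 5% level
rejection_rate <- mean(p_values < 0.05)
print(rejection_rate)
```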


[Figure: scatterplot of Y against X for the simulated data Y = 2 + e]


> X=rnorm(1000,0,5)
> e=rnorm(1000,0,5)
> Y=X^2+e
> plot(X,Y)
> fit=lm(Y~X)
> summary(fit)

Call:
lm(formula = Y ~ X)

Residuals:
    Min      1Q  Median      3Q     Max
-36.603 -22.505 -12.763   9.932 209.211

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  25.9851     1.1156  23.293   <2e-16 ***
X            -0.3669     0.2190  -1.675   0.0942 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 35.27 on 998 degrees of freedom
Multiple R-squared:  0.002804,	Adjusted R-squared:  0.001804
F-statistic: 2.806 on 1 and 998 DF,  p-value: 0.09424
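
Here Y depends strongly on X, yet the slope test is not significant at the 5% level: the dependence is quadratic and X is centred at 0, so a straight line in X explains almost nothing. One way to illustrate this, sketched below under the same simulation settings (the seed is an assumption), is to compare the linear fit with a fit on X^2.

```r
# Same data-generating process as on the slide
set.seed(5)
X <- rnorm(1000, 0, 5)
e <- rnorm(1000, 0, 5)
Y <- X^2 + e

# Straight-line fit versus a fit on the squared predictor
fit_lin <- lm(Y ~ X)
fit_quad <- lm(Y ~ I(X^2))

r2_lin <- summary(fit_lin)$r.squared
r2_quad <- summary(fit_quad)$r.squared
print(c(linear = r2_lin, quadratic = r2_quad))
```

A non-significant slope therefore says only that there is no evidence of a linear relationship, not that Y and X are unrelated.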


[Figure: scatterplot of Y against X for the simulated data Y = X^2 + e]



> X=rnorm(1000,5,5)
> Y=1+2*X
> plot(X,Y)
> SS1=var(Y)*(1000-1)
> abline(h=mean(Y))
> SS1
[1] 98388.65

[Figure: scatterplot of Y = 1 + 2X against X, with a horizontal line at mean(Y)]


> e=rnorm(1000,0,10)
> Y=e
> plot(X,Y)
> SS2=var(Y)*(1000-1)
> abline(h=mean(Y))
> SS2
[1] 98864.55

[Figure: scatterplot of Y = e against X, with a horizontal line at mean(Y)]

> Y=1+2*X
> fit1=lm(Y~X)
> anova(fit1)
Analysis of Variance Table

Response: Y
           Df Sum Sq Mean Sq    F value    Pr(>F)
X           1  98389   98389 3.4831e+34 < 2.2e-16 ***
Residuals 998      0       0
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Warning message:
In anova.lm(fit1) :
  ANOVA F-tests on an essentially perfect fit are unreliable
###########################################################
> Y=e
> fit1=lm(Y~X)
> fit2=lm(Y~X)
> anova(fit1)
Analysis of Variance Table

Response: Y
           Df Sum Sq Mean Sq F value Pr(>F)
X           1    162 161.614  1.6341 0.2014
Residuals 998  98703  98.901
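
The two data sets above have almost the same total sum of squares (98388.65 and 98864.55), but ANOVA splits it very differently: nearly all of it goes to X when Y = 1 + 2X, and nearly all of it to the residuals when Y = e. The sketch below checks the decomposition SST = SSR + SSE and the F statistic by hand on an intermediate case with both signal and noise; that mixed model and the seed are assumptions added for illustration.

```r
# Signal plus noise (an illustrative assumption between the two slide cases)
set.seed(6)
X <- rnorm(1000, 5, 5)
Y <- 1 + 2 * X + rnorm(1000, 0, 10)
fit <- lm(Y ~ X)

SST <- sum((Y - mean(Y))^2)             # total, equal to var(Y) * (n - 1)
SSR <- sum((fitted(fit) - mean(Y))^2)   # explained by the regression
SSE <- sum(residuals(fit)^2)            # residual

# F statistic: regression mean square over residual mean square
F_stat <- (SSR / 1) / (SSE / df.residual(fit))

tab <- anova(fit)
print(c(SST = SST, SSR_plus_SSE = SSR + SSE))
print(c(F_manual = F_stat, F_anova = tab[["F value"]][1]))
```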


> heights=read.table("heights.txt",header=T)
> fit=lm(Dheight~Mheight, data=heights)
> X=heights$Mheight
> Y=heights$Dheight
> anova(fit)
Analysis of Variance Table

Response: Dheight
            Df Sum Sq Mean Sq F value    Pr(>F)
Mheight      1 2236.7 2236.66  435.47 < 2.2e-16 ***
Residuals 1373 7052.0    5.14
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
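
The entries of this table are tied to the regression summary by simple arithmetic: MSE = SSE/df, F = MSR/MSE, and R-squared = SSR/(SSR + SSE). The sketch below redoes that arithmetic from the rounded sums of squares printed in the table, so the results match the reported values only up to rounding.

```r
# Rounded values copied from the ANOVA table for the heights data
SSR <- 2236.7
SSE <- 7052.0
df_res <- 1373

MSE <- SSE / df_res           # residual mean square
F_stat <- (SSR / 1) / MSE     # F statistic on 1 and 1373 degrees of freedom
R2 <- SSR / (SSR + SSE)       # multiple R-squared

print(round(c(MSE = MSE, F = F_stat, R2 = R2), 4))
```

In simple linear regression the F statistic is also the square of the slope t statistic; 20.87^2 is about 435.6, which agrees with the table up to rounding of the t value.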


> summary(fit)

Call:
lm(formula = Dheight ~ Mheight, data = heights)

Residuals:
   Min     1Q Median     3Q    Max
-7.397 -1.529  0.036  1.492  9.053

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.91744    1.62247   18.44   <2e-16 ***
Mheight      0.54175    0.02596   20.87   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.266 on 1373 degrees of freedom
Multiple R-squared:  0.2408,	Adjusted R-squared:  0.2402
F-statistic: 435.5 on 1 and 1373 DF,  p-value: < 2.2e-16
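> # Note: the residual degrees of freedom are n-2 = 1373; the calls below use 1372,
> # which changes the results only in the later decimal places.
> qt(0.975,1372)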
[1] 1.961695
> 29.91744-qt(0.975,1372)*1.62247
[1] 26.73465
> 29.91744+qt(0.975,1372)*1.62247
[1] 33.10023
> 0.54175-qt(0.975,1372)*0.02596
[1] 0.4908244
> 0.54175+qt(0.975,1372)*0.02596
[1] 0.5926756
> confint(fit,level=0.95)
                 2.5 %     97.5 %
(Intercept) 26.7346495 33.1002241
Mheight      0.4908201  0.5926739
> 7052.0/qchisq(0.975,1372)
[1] 4.775998
> 7052.0/qchisq(0.025,1372)
[1] 5.547348
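
The intervals above can also be computed without typing the degrees of freedom by hand. The sketch below takes them from `df.residual(fit)` (n - 2), builds the t intervals estimate ± t(0.975, df) * se, and the chi-square interval SSE/qchisq(0.975, df) to SSE/qchisq(0.025, df) for σ². It runs on simulated data; the seed, sample size, and coefficients are assumptions.

```r
# Simulated data (illustrative assumptions, not the heights data)
set.seed(7)
n <- 120
x <- rnorm(n, 62, 2.5)
y <- 30 + 0.54 * x + rnorm(n, 0, 2.3)
fit <- lm(y ~ x)

df <- df.residual(fit)              # residual degrees of freedom, n - 2
tab <- summary(fit)$coefficients
tq <- qt(0.975, df)

# 95% t intervals for the intercept and the slope
ci_manual <- cbind(tab[, "Estimate"] - tq * tab[, "Std. Error"],
                   tab[, "Estimate"] + tq * tab[, "Std. Error"])
ci_lm <- confint(fit, level = 0.95)

# 95% chi-square interval for the error variance
SSE <- sum(residuals(fit)^2)
ci_sigma2 <- c(lower = SSE / qchisq(0.975, df),
               upper = SSE / qchisq(0.025, df))

print(ci_manual)
print(ci_lm)
print(ci_sigma2)
```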

> anova(fit)
Analysis of Variance Table

Response: Dheight
            Df Sum Sq Mean Sq F value    Pr(>F)
Mheight      1 2236.7 2236.66  435.47 < 2.2e-16 ***
Residuals 1373 7052.0    5.14
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 7052.0/qchisq(0.975,1372)
[1] 4.775998
> 7052.0/qchisq(0.025,1372)
[1] 5.547348
