
DEPAUL UNIVERSITY

HW 2
MAT 512 Li
Peter Drogos
2/12/2014

Drogos 2
NOTE: R code is bolded in small font
1.
l. R verifies this result (see the highlighted output below).

Call:
lm(formula = Minutes ~ Copiers, data = copier.data)

Residuals:
    Min      1Q  Median      3Q     Max
-6.8729 -2.9696 -0.4751  2.8260  7.3315

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  11.4641     3.4390   3.334  0.00875 **
Copiers      24.6022     0.8045  30.580 2.09e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.615 on 9 degrees of freedom
Multiple R-squared:  0.9905,  Adjusted R-squared:  0.9894
F-statistic: 935.1 on 1 and 9 DF,  p-value: 2.094e-10

m. The simple correlation coefficient is r = 0.995. This means that there is a strong, positive linear relationship between Minutes and Copiers.
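As a quick check on part m: in simple linear regression, r equals the square root of R^2 carrying the sign of the slope b1. The sketch below uses only two numbers copied from the summary(copier.model) output (R-squared 0.9905 and slope 24.6022). Because R-squared is printed to four digits, this reproduces r only to about three decimals.

```r
# Values copied from the summary(copier.model) output
r.squared <- 0.9905
b1 <- 24.6022

# In simple linear regression, r = sign(b1) * sqrt(R^2)
r <- sign(b1) * sqrt(r.squared)
round(r, 3)   # prints [1] 0.995
```

With the raw data loaded as in the R code section (x and y taken from copier.data), cor(x, y) would compute r directly.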
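n. The F statistic for the model is $F = 935.1$ on 1 and 9 degrees of freedom, with p-value $= 2.094 \times 10^{-10}$. This means that we reject $H_0: \beta_1 = 0$ in favor of $H_a: \beta_1 \neq 0$, and we say that the simple regression model is significant at the $\alpha = 0.05$ level of significance. This matches the results from R, where the p-value is far smaller than the level of significance, and it confirms the strong relationship between Minutes and Copiers.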
F-statistic: 935.1 on 1 and 9 DF, p-value: 2.094e-10
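The F-test in part n can also be checked against a critical value. The sketch below assumes $\alpha = 0.05$ and uses only the F statistic, degrees of freedom, and slope t value printed in the summary output. It also illustrates that, with a single predictor, F equals the square of the slope's t statistic.

```r
# F statistic and degrees of freedom from the summary(copier.model) output
F.stat <- 935.1
df1 <- 1   # numerator df: number of predictors
df2 <- 9   # denominator df: n - 2 for simple regression

# Critical value of the F(1, 9) distribution at alpha = 0.05 (assumed level)
F.crit <- qf(0.95, df1, df2)

# Upper-tail p-value of the observed F statistic
p.value <- pf(F.stat, df1, df2, lower.tail = FALSE)

# With one predictor, F equals the square of the slope's t value (30.580)
t.squared <- 30.580^2

c(F.crit = F.crit, p.value = p.value, t.squared = t.squared)
F.stat > F.crit   # TRUE, so H0: beta1 = 0 is rejected
```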

2.
a.

The first data plot indicates that as the number of monthly X-ray exposures increases, the number of monthly labor hours required also increases. The same can be said of monthly occupied bed days and monthly labor hours. Both plots suggest a linear relationship. The plot of the average length of patients' stay against monthly labor hours also suggests a linear relationship, but it is less well defined. Since each predictor appears roughly linearly related to labor hours, the given model, which predicts the monthly labor hours required from the monthly X-ray exposures, the monthly occupied bed days, and the average length of patients' stay, appears reasonable.
b. (see attached written sheet)
c. Without using R's built-in model-fitting functions, we find that the ordinary least squares estimate is $b = (X^{T}X)^{-1}X^{T}y$ (see attached Excel work and R code below).
> b<-solve(t(X)%*%X)%*%t(X)%*%y
> b
                     [,1]
(Intercept) 1946.80203866
x1             0.03857709
x2             1.03939197
x3          -413.75779647

So the point estimates of $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ are 1946.802, 0.0386, 1.03939, and -413.7578, respectively.
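For comparison, here is a self-contained sketch of the part c computation. It re-types the 16 observations from the attached Excel work (assumed to match the file 't4-11 hospital.txt') and solves the normal equations $X^{T}X\,b = X^{T}y$ with solve(crossprod(X), crossprod(X, y)), which avoids forming the inverse explicitly. It then compares the result with coef(lm()).

```r
# Hospital data re-typed from the attached Excel work (16 hospitals):
# x1 = monthly X-ray exposures, x2 = monthly occupied bed days,
# x3 = average length of stay, y = monthly labor hours
x1 <- c(2463, 2048, 3940, 6505, 5723, 11520, 5779, 5969, 8461, 20106,
        13313, 10771, 15543, 34703, 39204, 86533)
x2 <- c(472.92, 1339.75, 620.25, 568.33, 1497.6, 1365.83, 1687, 1639.92,
        2872.33, 3655.08, 2912, 3921, 3865.67, 12446.33, 14098.4, 15524)
x3 <- c(4.45, 6.92, 4.28, 3.9, 5.5, 4.6, 5.62, 5.15, 6.18, 6.15, 5.88,
        4.88, 5.5, 10.78, 7.05, 6.35)
y  <- c(566.52, 696.82, 1033.15, 1603.62, 1611.37, 1613.27, 1854.17,
        2160.55, 2305.58, 3503.93, 3571.89, 3741.4, 4026.52, 11732.17,
        15414.94, 18854.45)

# Design matrix: a column of ones for the intercept, then the predictors
X <- cbind("(Intercept)" = 1, x1, x2, x3)

# Solve the normal equations (X'X) b = X'y without forming (X'X)^{-1}
b <- solve(crossprod(X), crossprod(X, y))

# Side-by-side comparison with lm(), which uses a QR decomposition
fit <- lm(y ~ x1 + x2 + x3)
cbind(normal.eq = drop(b), lm = coef(fit))
```

Solving the linear system is usually the numerically safer choice than computing solve(t(X)%*%X) first. Because lm() uses a different factorization, the two columns should agree to many decimal places rather than bit for bit.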

d. The point estimates from part c match those obtained with R's lm() function (see below).
Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max
-677.23 -270.19   60.93  228.32  517.70

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 1946.80204  504.18193   3.861  0.00226 **
x1             0.03858    0.01304   2.958  0.01197 *
x2             1.03939    0.06756  15.386 2.91e-09 ***
x3          -413.75780   98.59828  -4.196  0.00124 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the R output, the point estimates of $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ are 1946.802, 0.0386, 1.03939, and -413.7578, respectively.

e. When x1=56194, x2=14077.88, and x3=6.89, we calculate a point prediction of 15896.25 (see attached sheet).
f. Using R, we obtain the same result as the written solution.
> predict(hosp.model,data.frame(x1=56194,x2=14077.88,x3=6.89))
       1
15896.25
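The point prediction in parts e and f is just the inner product $x_0^{T}b$. A minimal sketch using the coefficient values listed in the Excel work:

```r
# Point estimates b0, b1, b2, b3 (as listed in the Excel work)
b <- c(1946.802039, 0.03857709, 1.03939197, -413.7577965)

# New observation, with a leading 1 for the intercept
x0 <- c(1, 56194, 14077.88, 6.89)

# Point prediction y0.hat = x0' b
y0.hat <- sum(x0 * b)
round(y0.hat, 2)   # prints [1] 15896.25
```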

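g. The 95% prediction interval (PI) for the labor hours corresponding to $x_0^{T} = (1, 56194, 14077.88, 6.89)$ is $\hat{y}_0 \pm t_{0.025}^{(n-4)}\, s\sqrt{1 + x_0^{T}(X^{T}X)^{-1}x_0}$, computed by hand in R as follows:
> X<-model.matrix(~x1+x2+x3,data=hosp.data)            # design matrix with intercept column
> b<-solve(t(X)%*%X)%*%t(X)%*%y                        # OLS estimates
> x0<-c(1,56194,14077.88,6.89)                         # new point, leading 1 for the intercept
> y.pred<-t(b)%*%x0                                    # point prediction x0'b
> t.crit<-qt(0.975,length(y)-length(b))                # t critical value with n-4 = 12 df
> DV<-t(x0)%*%solve(t(X)%*%X)%*%x0                     # distance value x0'(X'X)^-1 x0
> sum(resid(hosp.model)^2)                             # SSE
> s.square<-sum(resid(hosp.model)^2)/(length(y)-4)     # s^2 = SSE/(n-4)
> s<-sqrt(s.square)
> low.PI<-y.pred-(t.crit*s*sqrt(1+DV))
> upp.PI<-y.pred+(t.crit*s*sqrt(1+DV))
The resulting 95% PI is [14906.24, 16886.26].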
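A self-contained sketch of the same prediction interval, checked against predict(). It assumes the 95% level and re-types the 16 observations from the attached Excel work (assumed to match the data file). Here t.crit is used instead of t so that the base function t() is not masked.

```r
# Hospital data re-typed from the attached Excel work (16 hospitals)
x1 <- c(2463, 2048, 3940, 6505, 5723, 11520, 5779, 5969, 8461, 20106,
        13313, 10771, 15543, 34703, 39204, 86533)
x2 <- c(472.92, 1339.75, 620.25, 568.33, 1497.6, 1365.83, 1687, 1639.92,
        2872.33, 3655.08, 2912, 3921, 3865.67, 12446.33, 14098.4, 15524)
x3 <- c(4.45, 6.92, 4.28, 3.9, 5.5, 4.6, 5.62, 5.15, 6.18, 6.15, 5.88,
        4.88, 5.5, 10.78, 7.05, 6.35)
y  <- c(566.52, 696.82, 1033.15, 1603.62, 1611.37, 1613.27, 1854.17,
        2160.55, 2305.58, 3503.93, 3571.89, 3741.4, 4026.52, 11732.17,
        15414.94, 18854.45)

fit <- lm(y ~ x1 + x2 + x3)
X   <- model.matrix(fit)                  # 16 x 4 design matrix, first column all ones
n   <- nrow(X)
p   <- ncol(X)                            # number of coefficients, k + 1 = 4

x0     <- c(1, 56194, 14077.88, 6.89)     # new point, leading 1 for the intercept
y0.hat <- sum(x0 * coef(fit))             # point prediction x0'b
s      <- sqrt(sum(resid(fit)^2) / (n - p))     # residual standard error
dv     <- sum(x0 * solve(crossprod(X), x0))     # distance value x0'(X'X)^{-1} x0
t.crit <- qt(0.975, df = n - p)                 # two-sided 95% critical value
half   <- t.crit * s * sqrt(1 + dv)             # half-width of the PI

manual <- c(fit = y0.hat, lwr = y0.hat - half, upr = y0.hat + half)

# The same interval from predict(), for comparison
builtin <- predict(fit, newdata = data.frame(x1 = 56194, x2 = 14077.88, x3 = 6.89),
                   interval = "prediction", level = 0.95)
rbind(manual = manual, predict = builtin[1, ])
```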

h. The 95% PI for the labor hours corresponding to $x_0^{T} = (1, 56194, 14077.88, 6.89)$ obtained with R's predict() function is [14906.24, 16886.26], the same interval as in part g.

> predict(hosp.model,data.frame(x1=56194,x2=14077.88,x3=6.89),interval="p",level=0.95)
       fit      lwr      upr
1 15896.25 14906.24 16886.26

i. The point estimate of the standard deviation $\sigma$ of the error term is $s = \sqrt{SSE/(n-4)} = 387.1598$; its square $s^2 = SSE/(n-4)$ is an unbiased estimator of $\sigma^2$.
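For reference, the arithmetic behind s uses the unexplained variation SSE = 1798712.99 from the attached Excel work, with n = 16 observations and k = 3 predictors, so n - (k + 1) = 12 degrees of freedom:

```latex
s^2 = \frac{SSE}{n-(k+1)} = \frac{1798712.99}{16-4} \approx 149892.75,
\qquad
s = \sqrt{149892.75} \approx 387.1598
```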

j. The adjusted R^2 is 0.995155651 (see attached Excel work).
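The Excel figure can be reproduced from the variation sums in the attached Excel work: $R^2 = 1 - SSE/SST$ and adjusted $R^2 = 1 - (1 - R^2)(n-1)/(n-(k+1))$. A minimal sketch, assuming n = 16 and k = 3 as in the model above:

```r
# Sums from the attached Excel work
SSE <- 1798712.99     # unexplained variation
SSR <- 462327888.6    # explained variation, sum of (y.hat - y.bar)^2
SST <- SSR + SSE      # total variation
n <- 16; k <- 3

r.squared     <- 1 - SSE / SST   # equivalently SSR / SST
adj.r.squared <- 1 - (1 - r.squared) * (n - 1) / (n - (k + 1))
c(r.squared = r.squared, adj.r.squared = adj.r.squared)
```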


k. R confirms this result:
> summary(hosp.model)

Residual standard error: 387.2 on 12 degrees of freedom
Multiple R-squared:  0.9961,  Adjusted R-squared:  0.9952
F-statistic:  1028 on 3 and 12 DF,  p-value: 9.919e-15

l. The F-test shows that the model is significant: $F = 1028$ on 3 and 12 degrees of freedom, with p-value $= 9.919 \times 10^{-15} < 0.05$, so we reject $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ at the $\alpha = 0.05$ level of significance.
F-statistic:  1028 on 3 and 12 DF,  p-value: 9.919e-15
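The overall F statistic can be recovered from $R^2$ as $F = \frac{R^2/k}{(1-R^2)/(n-(k+1))}$. A minimal sketch, assuming $\alpha = 0.05$ and the unrounded R^2 from the Excel work:

```r
r.squared <- 0.996124521   # R^2 from the attached Excel work
n <- 16; k <- 3
df.resid <- n - (k + 1)    # 12 residual degrees of freedom

F.stat  <- (r.squared / k) / ((1 - r.squared) / df.resid)
F.crit  <- qf(0.95, k, df.resid)                     # alpha = 0.05 (assumed)
p.value <- pf(F.stat, k, df.resid, lower.tail = FALSE)
c(F.stat = F.stat, F.crit = F.crit, p.value = p.value)
```

F.stat comes out at about 1028, matching the summary output.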

m. Based on the t-test results, there is no term we should drop from the model: each term has $p < 0.05$, so each term is significant at the $\alpha = 0.05$ level of significance (see the highlighted portion below).
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 1946.80204  504.18193   3.861  0.00226 **
x1             0.03858    0.01304   2.958  0.01197 *
x2             1.03939    0.06756  15.386 2.91e-09 ***
x3          -413.75780   98.59828  -4.196  0.00124 **
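The t-test decisions in part m can be checked from the printed t values alone. This is a sketch assuming a two-sided test at $\alpha = 0.05$ with n - (k + 1) = 12 residual degrees of freedom. Because the t values are rounded, the p-values agree with the table only approximately.

```r
# t values copied from the coefficient table above
t.vals <- c("(Intercept)" = 3.861, x1 = 2.958, x2 = 15.386, x3 = -4.196)
df.resid <- 12   # n - (k + 1) = 16 - 4

# Two-sided p-values and the alpha = 0.05 critical value
p.vals <- 2 * pt(-abs(t.vals), df.resid)
t.crit <- qt(0.975, df.resid)
data.frame(t = t.vals, p = signif(p.vals, 3), reject.H0 = abs(t.vals) > t.crit)
```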

3. (see attached written sheet)


-------------------------------------------------------------R code (no output)
1.
> copier.data<-read.table('t3-7 service time.txt', header=TRUE, sep=",")
> x<-copier.data[,1]                                   # first column of the data file
> y<-copier.data[,2]                                   # second column of the data file
> copier.model<-lm(Minutes~Copiers,data=copier.data)   # simple linear regression
> summary(copier.model)

2.
> hosp.data<-read.table('t4-11 hospital.txt', header=TRUE, sep=",")
> hosp.data
> x1<-hosp.data[,1]        # monthly X-ray exposures
> x2<-hosp.data[,2]        # monthly occupied bed days
> x3<-hosp.data[,3]        # average length of stay
> y<-hosp.data[,4]         # monthly labor hours
> hosp.model<-lm(y~x1+x2+x3)
> predict(hosp.model,data.frame(x1=56194,x2=14077.88,x3=6.89),interval="p",level=0.95)
> X<-model.matrix(~x1+x2+x3,data=hosp.data)
> X
> b<-solve(t(X)%*%X)%*%t(X)%*%y                        # OLS estimates
> x0<-c(1,56194,14077.88,6.89)                         # new point, leading 1 for the intercept
> y.pred<-t(b)%*%x0                                    # point prediction x0'b
> t.crit<-qt(0.975,length(y)-length(b))                # t critical value with n-4 df
> DV<-t(x0)%*%solve(t(X)%*%X)%*%x0                     # distance value
> summary(hosp.model)
> sum(resid(hosp.model)^2)                             # SSE
> s.square<-sum(resid(hosp.model)^2)/(length(y)-4)
> s<-sqrt(s.square)
> low.PI<-y.pred-(t.crit*s*sqrt(1+DV))
> upp.PI<-y.pred+(t.crit*s*sqrt(1+DV))

-------------------------------------------------------Excel written work

For each hospital, total variation = (y - mean of y)^2 and explained variation = (y^ - mean of y)^2.

x1      x2         x3      y          total variation   y^           explained variation
2463    472.92     4.45    566.52     16618887          692.14447    15610420.02
2048    1339.75    6.92    696.82     15573496          555.12936    16711887.21
3940    620.25     4.28    1033.15    13032077          972.59527    13472949.06
6505    568.33     3.9     1603.62    9238724           1174.8082    12029372.88
5723    1497.6     5.5     1611.37    9191671           1448.5043    10205741.45
11520   1365.83    4.6     1613.27    9180154           1907.557     7483452.04
5779    1687       5.62    1854.17    7778392           1597.8745    9273683.966
5969    1639.92    5.15    2160.55    6163287           1750.7357    8366042.31
8461    2872.33    6.18    2305.58    5464219           2701.6564    3769385.45
20106   3655.08    6.15    3503.93    1297815           3976.8834    443907.0666
13313   2912       5.88    3571.89    1147591           3054.1924    2524776.288
10771   3921       4.88    3741.4     813147.4          4418.6337    50406.1466
15543   3865.67    5.5     4026.52    380228.7          4288.6842    125643.7705
34703   12446.33   10.78   11732.17   50254249          11761.849    50675922.86
39204   14098.4    7.05    15414.94   1.16E+08          15195.95     111361644.5
86533   15524      6.35    18854.45   2.02E+08          18793.152    200222653.6
Sum                                   4.64E+08                       462327888.6

mean of y                      4643.147
unexplained variation (SSE)    1798712.99
R^2                            0.996124521
adj. R^2                       0.995155651

Prediction at x0:
             x0          pt. estimates (from R)
intercept    1           1946.802039
x1           56194       0.03857709
x2           14077.88    1.03939197
x3           6.89        -413.7577965
y^ = 15896.24724
