
Regression Inferences in R

Assume you have the data set previously named Data from Problem 1.19, with explanatory variable named ACT and response variable named GPA. Assume further that you have fit a linear model to the data, and that the model is named College. Recall that the summary of this linear model fit looks like:

A (1 − α)100% confidence interval for the intercept β0 (section 2.2) is the estimate (in this case 2.11405) plus or minus the product of the standard error (in this case 0.32089) and the critical value under the t distribution at n − 2 degrees of freedom with right-tail probability α/2. In this case, n = 120, so n − 2 = 118 (as given under "Residual standard error" in the summary). Similarly, a (1 − α)100% confidence interval for the slope β1 (section 2.1) is the estimate (in this case 0.03883) plus or minus the product of the standard error (in this case 0.01277) and the same critical value.

To find the critical value under the t distribution at n − 2 degrees of freedom with left-tail probability 1 − α/2 without having to use the tables in the text, use the qt() command in R. If α = 0.05, then 1 − α/2 = 0.975. The critical value under the t distribution at 118 degrees of freedom with left-tail probability 0.975 can then be found by giving the command:
> qt(0.975, 118)

We then have all the quantities needed to calculate the confidence intervals for the slope and intercept of the true regression line. However, we can actually get those confidence intervals directly. To obtain 95% confidence intervals for the regression parameters in the above linear model College, the R command would be:
> confint(College)

Then R will display the lower bound and the upper bound of the confidence interval for each parameter:
                 2.5 %     97.5 %
(Intercept) 1.47859015 2.74950842
ACT         0.01353307 0.06412118

To obtain a different confidence interval, say, a 99% confidence interval, you would type:
> confint(College, level=0.99)

If you want simultaneous confidence intervals for both the intercept and slope, using the Bonferroni method with joint confidence level 1 − α, set the level equal to 1 − α/2.

If you want to conduct the hypothesis test H0: β1 = 0 (section 2.1), the test statistic t* is already calculated for you in the model summary (see the top of the first page). Notice that for the slope (after the word "ACT") under t value we have a t-value of 3.040. Now you can look up the critical value for the t distribution as described above for any given significance level. However, the two-sided P-value is already given to you under Pr(>|t|), which in this case is given to be "0.00292," a small value. (Note: the summary will not display a P-value smaller than 2e-16, but will simply display <2e-16, which is essentially zero.) The two stars in the last column of the summary tell you that the given P-value is between 0.001 and 0.01 (in the significance codes line). So you would reject H0 at the usual significance levels (such as 0.05 or 0.01), and conclude that there is a linear association between the two variables.

You would test H0: β0 = 0 in the same way. Note that in this example, the P-value for the intercept is 1.30e-09, which means 1.30 x 10^-9, so you would strongly reject H0 at any reasonable significance level.

However, if you want to test H0: β1 = k or H0: β0 = k, where k is some non-zero value, you have to do a little more work. You have to calculate the test statistic t* yourself, using
t* = (Estimate − k) / Std. Error
where the values for Estimate and Std. Error come from the linear model summary. You would then have to calculate the critical value for the t distribution at the desired level, as described above, and make the comparison appropriate for your decision rule. You can also calculate the two-sided P-value using the absolute value of the result you got for t*. The R command to get the two-sided P-value is:
> 2 * pt(|t*|, n-2, lower.tail = FALSE)
Of course, in place of |t*| put the absolute value of t*, and in place of n-2 put the value of n − 2.

If you are conducting a one-sided test, with alternative β1 > k or β1 < k, you would obtain t* the same way, but when you find the critical value for comparison with t*, do not divide the value of α in half. Also, if you are calculating the P-value, do not multiply by 2, do not use the absolute value of t*, and if the alternative is β1 < k you should omit the setting lower.tail = FALSE.
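The by-hand calculations described above can be sketched using only the slope estimate, standard error, and degrees of freedom quoted from the summary. The value k = 0.01 is a hypothetical choice made purely for illustration, and the variable names are not part of the original handout.

```r
# Values quoted from the model summary in the text
b1    <- 0.03883   # slope estimate
se_b1 <- 0.01277   # standard error of the slope
df    <- 118       # n - 2

# 95% confidence interval for the slope, computed by hand
alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df)
ci_b1  <- c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1)

# Test of H0: beta1 = k for a non-zero k (k = 0.01 is hypothetical)
k      <- 0.01
t_star <- (b1 - k) / se_b1

p_two   <- 2 * pt(abs(t_star), df, lower.tail = FALSE)  # two-sided
p_upper <- pt(t_star, df, lower.tail = FALSE)           # alternative beta1 > k
p_lower <- pt(t_star, df)                               # alternative beta1 < k

print(ci_b1)
print(c(t_star = t_star, p_two = p_two, p_upper = p_upper, p_lower = p_lower))
```

Because the summary values are rounded, the interval agrees with the confint() output only to about four decimal places.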

Next, suppose you want confidence intervals for the mean response E{Yh} at different specified levels of your explanatory variable, whether or not those levels occur in the original data (section 2.4). The easiest method is to first define a data frame, let's call it new, that consists of the values at which you want confidence intervals. But in order for this to work correctly, you have to use the name of the explanatory variable for your data. Recall that in our example, that variable is named ACT. Suppose you want a confidence interval for the mean response when Xh = 20. Then first give the command
> new <- data.frame(ACT=20)
Or, if you want CIs at Xh = 20, Xh = 25, and Xh = 30, you can do three at once this way:
> new <- data.frame(ACT=c(20,25,30))
This creates a data frame object in R called new. Remember, instead of ACT, use the name of the explanatory variable in your data.

Now, if you want 90% confidence intervals at these three values, give the command:
> predict(College, new, se.fit = TRUE, interval = "confidence", level = 0.90)
Make sure you use the name of your linear model in place of College. Then R will return, for each level of Xh that you specified, the fitted value (Ŷh) followed by the lower and upper bounds of the confidence interval for mean response. This will also give you the standard errors for the fitted values. If you don't need them, you can change the value of se.fit to FALSE or just delete that argument.

However, if you need to predict the mean of m new observations at the given values of Xh, or you need to obtain a confidence band for the regression line at these values, you will need these standard errors for equations (2.39), (2.39a) and (2.40) in the text. In that case, you may want to save the output from the predict() function. Just choose a name for the output, say CI, and type CI <- in front of the predict() function. Then if you need, for example, the second fitted value (that is, Ŷh when Xh = 25) among the three, it is stored as CI$fit[2,1], and its corresponding standard error is stored as CI$se.fit[2]. If your data frame new only involved one value for Xh, then the corresponding value for Ŷh is stored as CI$fit[1] and its standard error is stored as CI$se.fit.

If you need a 95% or a 99% confidence interval, change the value of level to 0.95 or 0.99, respectively. The default level is 0.95, so if you need a 95% confidence interval you can also just leave the level argument out. If you want simultaneous confidence intervals for the mean response at g different levels of the predictor, using the Bonferroni method with joint confidence level 1 − α, set the level equal to 1 − α/g.

If you need prediction intervals instead (section 2.5), just change "confidence" to "prediction" in the command after interval=. Everything else is the same.

If you want 95% confidence intervals for all the fitted values corresponding to the original data, type:
> predict(College, se.fit = TRUE, interval = "confidence", level = 0.95)
This will also give you the standard errors for the fitted values. Again, you can change the level if you need a 90% or a 99% confidence interval, and if you need prediction intervals instead, change "confidence" to "prediction" in the command after interval=. In this example, you would receive 120 confidence intervals or prediction intervals, most of them repeated multiple times since there are many students in the sample with the same ACT score. If you were only interested in intervals at specific levels of the explanatory variable (which occur in the original data), you would have to match up the returned intervals with the original data. This would be cumbersome, so the previous method is preferred.

To obtain the Working-Hotelling 1 − α confidence band for the regression line at Xh, you first need the multiplier W, using the F distribution. If 1 − α = 0.90 and n − 2 = 118, for example, then the multiplier is found by typing in R:
> W <- sqrt( 2 * qf(0.90, 2, 118) )
Then the confidence band at Xh can be found using equation (2.40) and the stored values of Ŷh and its standard error. For the previous example involving three levels of Xh, the 90% confidence band when Xh = 25 could be found by typing:
> c( CI$fit[2,1] - W * CI$se.fit[2], CI$fit[2,1] + W * CI$se.fit[2] )

But if your data frame new consisted only of Xh = 20 (rather than the three values of 20, 25 and 30), you would instead just type:
> c( CI$fit[1] - W * CI$se.fit, CI$fit[1] + W * CI$se.fit )

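The mean-response steps above can be tried end to end with the following minimal sketch. It assumes a simulated stand-in for the course data, since the real Problem 1.19 data set is not reproduced here. The simulation settings, the object names band and bonf, and the use of df.residual() to obtain n − 2 are illustrative choices, not part of the original handout.

```r
# Simulated stand-in for the course data (NOT the real Problem 1.19 data)
set.seed(1)
n   <- 120
ACT <- sample(14:35, n, replace = TRUE)
GPA <- 2.1 + 0.04 * ACT + rnorm(n, sd = 0.6)
Data    <- data.frame(ACT, GPA)
College <- lm(GPA ~ ACT, data = Data)

# Levels of the explanatory variable at which intervals are wanted
new <- data.frame(ACT = c(20, 25, 30))

# 90% confidence intervals for the mean response, keeping the standard errors
CI <- predict(College, new, se.fit = TRUE,
              interval = "confidence", level = 0.90)

# Working-Hotelling multiplier for a 90% band; n - 2 is read from the fit
W <- sqrt(2 * qf(0.90, 2, df.residual(College)))

# Band at all three levels at once (column "fit" holds the fitted values)
band <- cbind(lower = CI$fit[, "fit"] - W * CI$se.fit,
              upper = CI$fit[, "fit"] + W * CI$se.fit)
print(band)

# Bonferroni alternative: g simultaneous intervals with joint level 0.90
g    <- nrow(new)
bonf <- predict(College, new, interval = "confidence", level = 1 - 0.10 / g)
print(bonf)
```

Computing the band for all rows at once avoids repeating the c(...) command for each level of Xh. Indexing a single row, as in CI$fit[2,1] and CI$se.fit[2], gives the same numbers as row 2 of band.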