Key question: How accurate are these estimates? Statistical procedures allow us to formally address this question.
Consider fitting a line through the XY-plots in Figures 5.1-5.4. You would be most confident in the line you fit in Figure 5.3.
Larger number of data points + less scattering (i.e. less variability in errors) + more variability in X = more accurate estimates.
Note: Figures 5.1, 5.2, 5.3 and 5.4 all contain artificially generated data with $\alpha = 0$, $\beta = 1$.
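The three factors can be illustrated with a small simulation. The sketch below is not the data behind the figures: it generates its own artificial data sets with $\alpha = 0$, $\beta = 1$ and measures how much the OLS slope estimate varies from one data set to the next. The sample sizes, error standard deviations, X ranges and function names are all assumptions chosen for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope estimate in the simple regression Y = alpha + beta*X + e."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_spread(n, error_sd, x_max, reps=1000, seed=0):
    """Standard deviation of the slope estimate over many artificial data sets."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        # X is drawn uniformly on [0, x_max] (an illustrative assumption).
        x = [rng.uniform(0.0, x_max) for _ in range(n)]
        # True values: alpha = 0, beta = 1, with normally distributed errors.
        y = [0.0 + 1.0 * xi + rng.gauss(0.0, error_sd) for xi in x]
        slopes.append(ols_slope(x, y))
    return statistics.stdev(slopes)


baseline = slope_spread(n=10, error_sd=1.0, x_max=5.0)
more_data = slope_spread(n=100, error_sd=1.0, x_max=5.0)   # more data points
less_noise = slope_spread(n=10, error_sd=0.1, x_max=5.0)   # less scattering
narrow_x = slope_spread(n=10, error_sd=1.0, x_max=1.0)     # less variability in X

print("baseline spread:      ", baseline)
print("more data points:     ", more_data)
print("less error variability:", less_noise)
print("less variability in X: ", narrow_x)
```

A smaller spread means a more accurate estimate, so the first two changes should shrink the number and the last should inflate it.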
[Figures: XY-plots of Y (vertical axis) against X (horizontal axis).]
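The confidence interval for $\beta$ is $[\hat{\beta} - t_b s_b,\ \hat{\beta} + t_b s_b]$.
$t_b$ is a critical value from the Student t-distribution, calculated automatically in computer packages.
$s_b$ = standard error of $\hat{\beta}$; it is a measure of the accuracy of $\hat{\beta}$:

$$s_b = \sqrt{\frac{SSR}{(N-2)\sum (X_i - \bar{X})^2}}$$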
$t_b$ controls the confidence level (e.g. $t_b$ is bigger for 95% confidence than 90%).
$s_b$ varies directly with SSR (i.e. how variable the residuals are).
$s_b$ varies inversely with N, the number of data points.
$s_b$ varies inversely with $\sum (X_i - \bar{X})^2$, which is related to the variance/variability of X.
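A minimal sketch of how $s_b$ and the confidence interval could be computed by hand, assuming the standard interval $\hat{\beta} \pm t_b s_b$ and the standard error $s_b = \sqrt{SSR / ((N-2)\sum (X_i - \bar{X})^2)}$. The data set is made up for illustration, and the critical values are the usual tabulated ones for N - 2 = 8 degrees of freedom; a computer package would supply them automatically.

```python
import math


def slope_interval(x, y, t_b):
    """Return (beta_hat, s_b, (lower, upper)) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # sum of (X_i - X_bar)^2
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # SSR = sum of squared residuals
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    return beta_hat, s_b, (beta_hat - t_b * s_b, beta_hat + t_b * s_b)


# Hypothetical data, N = 10, so the t-distribution has 8 degrees of freedom.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9, 7.2, 7.8, 9.1, 10.3]

T_90 = 1.860  # tabulated critical value, 90% confidence, 8 d.f.
T_95 = 2.306  # tabulated critical value, 95% confidence, 8 d.f.

beta_hat, s_b, ci_90 = slope_interval(x, y, T_90)
_, _, ci_95 = slope_interval(x, y, T_95)

print("beta_hat =", beta_hat, " s_b =", s_b)
print("90% interval:", ci_90)
print("95% interval:", ci_95)
```

The 95% interval is wider than the 90% interval only because $t_b$ is bigger; $\hat{\beta}$ and $s_b$ are the same in both.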
Note: Different computer packages label confidence intervals in different ways. For example, Excel labels the bounds of the confidence interval as Lower 95% and Upper 95%.
Useful (but formally incorrect) intuition: There is a 95% probability that the true value of $\beta$ lies in the confidence interval.
Correct intuition: If you repeatedly use the above formula for calculating a confidence interval, 95% of the intervals you construct will contain the true value for $\beta$.
Can choose any level of confidence you want (e.g. 90%, 99%).
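The "correct intuition" can be checked by simulation. The sketch below generates many artificial data sets with a known true $\beta$, builds a 95% interval from each using the assumed standard formula $\hat{\beta} \pm t_b s_b$, and counts how often the interval contains the true value. The sample size, error distribution and X range are illustrative assumptions, and 2.306 is the tabulated 95% critical value for 8 degrees of freedom.

```python
import math
import random

T_95_DF8 = 2.306  # tabulated 95% critical value for N - 2 = 8 degrees of freedom


def slope_interval(x, y, t_b):
    """Confidence interval (lower, upper) for the slope in a simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    return beta_hat - t_b * s_b, beta_hat + t_b * s_b


def coverage(reps=5000, n=10, true_beta=1.0, seed=1):
    """Share of simulated intervals that contain the true slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.uniform(0.0, 5.0) for _ in range(n)]
        y = [true_beta * xi + rng.gauss(0.0, 1.0) for xi in x]  # alpha = 0
        lower, upper = slope_interval(x, y, T_95_DF8)
        if lower <= true_beta <= upper:
            hits += 1
    return hits / reps


share = coverage()
print("share of intervals containing the true beta:", share)
```

The printed share should be close to 0.95. Each individual interval either contains the true $\beta$ or it does not; the 95% describes the procedure, not any one interval.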
Data Set      90% Confid. Interval    95% Confid. Interval
Figure 5.1    [-.92, 2.75]            [-1.57, 3.39]
Figure 5.2    [.75, 1.32]             [.70, 1.38]
Figure 5.3    [.99, 1.01]             [.99, 1.02]
Figure 5.4    [-1.33, 4.36]           [-1.88, 4.91]
.000842
$$\hat{Y} = 34{,}136 + 6.59X$$

The OLS estimate of the marginal effect of X on Y is 6.59. Increasing lot size by an extra square foot is associated with a $6.59 increase in house price.
The 95% confidence interval for $\beta$ is [5.72, 7.47].
We are 95% confident that the effect of an extra square foot of lot size on house price is at least $5.72 and at most $7.47.
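As a worked piece of arithmetic, the fitted line and the interval can be applied to a change in lot size. The sketch below uses only the numbers quoted above; the 1,000 extra square feet and the 5,000 square foot lot are hypothetical inputs chosen for illustration.

```python
INTERCEPT = 34136.0          # estimated intercept from the fitted line
SLOPE = 6.59                 # OLS estimate of the marginal effect
SLOPE_CI_95 = (5.72, 7.47)   # 95% confidence interval for the slope


def fitted_price(lot_size_sqft):
    """Fitted house price for a given lot size."""
    return INTERCEPT + SLOPE * lot_size_sqft


def price_change(extra_sqft, slope=SLOPE):
    """Change in fitted price associated with extra square feet of lot size."""
    return slope * extra_sqft


extra = 1000  # hypothetical increase in lot size
point = price_change(extra)
low, high = (price_change(extra, b) for b in SLOPE_CI_95)

print("fitted price for a 5,000 sq ft lot:", round(fitted_price(5000), 2))
print("point estimate of the change:", round(point, 2))
print("95% interval for the change:", round(low, 2), "to", round(high, 2))
```

Scaling both ends of the interval by the same number of square feet gives the range of price changes consistent with the 95% interval for the slope.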
Hypothesis Testing
Test whether $\beta = 0$ (i.e. whether X has any explanatory power).
One way of doing it: look at the confidence interval and check whether it contains zero. If not, then you are confident $\beta \neq 0$.
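The confidence interval check is a one-line comparison. The sketch below applies it to the 95% intervals reported in this chapter for the four artificial data sets and for the house price regression; the function name is a label chosen here for illustration.

```python
def excludes_zero(interval):
    """True if zero lies outside the interval, i.e. we are confident beta != 0."""
    lower, upper = interval
    return lower > 0 or upper < 0


# 95% confidence intervals for beta quoted in this chapter.
intervals_95 = {
    "Figure 5.1": (-1.57, 3.39),
    "Figure 5.2": (0.70, 1.38),
    "Figure 5.3": (0.99, 1.02),
    "Figure 5.4": (-1.88, 4.91),
    "House price on lot size": (5.72, 7.47),
}

results = {name: excludes_zero(ci) for name, ci in intervals_95.items()}

for name, confident in results.items():
    verdict = "confident beta != 0" if confident else "cannot rule out beta = 0"
    print(f"{name}: {verdict}")
```

For Figures 5.1 and 5.4 the interval contains zero even though the data were generated with $\beta = 1$, which reflects how inaccurate the estimates are for those data sets.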
Useful (but formally incorrect) intuition: P-value measures the probability that $\beta = 0$.
.05 = 5% = level of significance.
Other levels of significance (e.g. 1% or 10%) are occasionally used.
Jargon
Test whether $R^2 = 0$ (i.e. whether X has any explanatory power).
Note: In simple regression testing $R^2 = 0$ and $\beta = 0$ are the same, but in multiple regression they will be different.
F-statistic is a test statistic analogous to the t-statistic (e.g. small values of it indicate $R^2 = 0$).

$$F = \frac{(N-2)R^2}{1 - R^2}$$
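The equivalence of the two tests in simple regression can be seen numerically: the F-statistic equals the square of the t-statistic for $\beta = 0$. The sketch below assumes the standard definitions $t = \hat{\beta} / s_b$ and $R^2 = 1 - SSR/TSS$, which are not restated in this section, and uses a made-up data set.

```python
import math


def regression_stats(x, y):
    """Return (r_squared, t_stat, f_stat) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    r_squared = 1.0 - ssr / tss                # assumed standard definition
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    t_stat = beta_hat / s_b                    # assumed standard t-statistic for beta = 0
    f_stat = (n - 2) * r_squared / (1.0 - r_squared)
    return r_squared, t_stat, f_stat


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 1.4, 4.0, 3.1, 6.2, 4.9, 6.5, 9.0, 7.7, 10.6]

r_squared, t_stat, f_stat = regression_stats(x, y)
print("R^2 =", r_squared)
print("t^2 =", t_stat ** 2)
print("F   =", f_stat)
```

The last two printed values agree up to rounding error, which is why the two tests give the same answer when there is only one explanatory variable.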
P-value = Significance F = $5.5 \times 10^{-10}$.
Since P-value < .05, conclude $R^2 \neq 0$. Profits do have explanatory power for Y.
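The decision rule is a comparison of the P-value with the chosen level of significance. A minimal sketch, using the P-value quoted above and the 10%, 5% and 1% levels mentioned earlier; the function name is hypothetical.

```python
def reject_hypothesis(p_value, level=0.05):
    """True if the hypothesis (e.g. R^2 = 0 or beta = 0) is rejected at this level."""
    return p_value < level


p_value = 5.5e-10  # Significance F reported above

decisions = {level: reject_hypothesis(p_value, level) for level in (0.10, 0.05, 0.01)}

for level, rejected in decisions.items():
    outcome = "reject" if rejected else "cannot reject"
    print(f"{level:.0%} level of significance: {outcome}")
```

With a P-value this small, the conclusion is the same at every commonly used level of significance.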
Chapter Summary
1. Accuracy of OLS estimates depends on number of data points, variability of the explanatory variable and variability of the errors.
2. The confidence interval provides an interval in which you can be confident $\beta$ lies.
3. The width of the confidence interval depends on the same factors as affect the accuracy of OLS estimates. In addition, the width of the confidence interval depends on the confidence level.
4. A hypothesis test of whether $\beta = 0$ can be used to find out whether the explanatory variable belongs in the regression.
5. The P-value is a measure of how plausible the hypothesis is.
6. If the P-value for the hypothesis test of whether $\beta = 0$ is less than .05 then you can reject the hypothesis at the 5% level of significance. If the P-value for the hypothesis test of whether $\beta = 0$ is greater than .05 then you cannot reject the hypothesis at the 5% level of significance.
7. A hypothesis test of whether $R^2 = 0$ can be used to investigate whether the regression helps explain the dependent variable.