According to the Stata regression output, the linear regression model is:

$E(Y) = B_0 + B_1 x_1$

Where:
Y is the dependent (response) variable, i.e. life expectancy
$B_0$ is the constant term, i.e. the expected value of Y when the independent variable is 0
$B_1$ is the coefficient of the independent variable, i.e. rural population
$x_1$ is the value of the independent variable, i.e. rural population

The final linear regression model is:

$E(\text{Life expectancy}) = 82.78 - 0.3456 \, (\text{Rural population})$

According to this model, if the rural population is 0, meaning that the entire population lives in urban regions, the expected life expectancy is 82.78 years. For every one-percentage-point increase in rural population, average life expectancy falls by 0.3456 years, which is approximately 4 months. Figure 1 shows a graphical representation of the regression model. The slope of the line of best fit is the coefficient of the independent variable ($B_1$). The y-intercept is the constant term (82.78).
. regress Lifeexpectancy Ruralpopulation

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  1,    28) =   45.19
       Model |  2596.88872     1  2596.88872           Prob > F      =  0.0000
    Residual |  1609.13251    28  57.4690182           R-squared     =  0.6174
-------------+------------------------------           Adj R-squared =  0.6038
       Total |  4206.02123    29  145.035215           Root MSE      =  7.5808

Lifeexpectancy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

Figure 1: Life expectancy (y-axis, 40 to 80) and fitted values plotted against rural population (x-axis, 0 to 80); the fitted line has a y-intercept of 82.78.
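As a check on the arithmetic, the summary statistics on the right of the output can be re-derived from the sums of squares and degrees of freedom on the left. The following is a minimal sketch in Python rather than Stata; the variable names are illustrative only, and the input numbers are copied from the output.

```python
import math

# Values copied from the Source / SS / df block of the regression output.
ss_model, df_model = 2596.88872, 1
ss_resid, df_resid = 1609.13251, 28
ss_total, df_total = 4206.02123, 29

# Mean squares are sums of squares divided by their degrees of freedom.
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
ms_total = ss_total / df_total

# R-squared: share of total variation explained by the model.
r_squared = ss_model / ss_total

# Adjusted R-squared: the same idea, but using mean squares so that
# the number of observations and regressors is taken into account.
adj_r_squared = 1 - ms_resid / ms_total

# F statistic for the model as a whole, and the root mean squared error.
f_stat = ms_model / ms_resid
root_mse = math.sqrt(ms_resid)

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F(1, 28)      = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.4f}")
```

If the table has been transcribed correctly, the four printed values should agree with the R-squared, Adj R-squared, F and Root MSE entries reported by Stata.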
Prob > F is the p-value of the model as a whole. It tests whether $R^2$ is different from 0. Usually we need a p-value lower than 0.05 to show a statistically significant relationship between X and Y. In this case the value (0.0000) is clearly less than 0.05, which shows that the rural population variable is statistically significant in explaining life expectancy. Moreover, R-squared shows the proportion of the variance of Y explained by X. In this case rural population explains 61.74% of the variance in life expectancy. Adjusted $R^2$ shows the same as $R^2$ but adjusted for the number of cases and the number of variables; here it is 0.6038. Hence, it provides a more honest measure of the association between rural population and life expectancy.

Although it is clear from the value of $R^2$ that there is a relationship between rural population and life expectancy, we can still test a hypothesis about the coefficient of rural population:

$H_0: B_1 = 0$
$H_a: B_1 \neq 0$

The null hypothesis states that the coefficient ($B_1$) of the independent variable, rural population, is zero. This would mean that there is no relationship between the two variables and that rural population does not affect life expectancy. The alternate hypothesis, on the other hand, states that the coefficient is non-zero, which would mean that there is a relationship between the two variables and that rural population affects life expectancy either negatively or positively.

To test this hypothesis, we can use the t-test, the p-value test and the confidence interval test. We assume a significance level of 5%.

Firstly, we will use the t-test. The t-value tests the hypothesis that the coefficient is different from 0, which is our alternate hypothesis. We reject the null hypothesis if |T| > t, where t is the critical value. In this case, we need a t-value greater than 1.96 in absolute value (for 95% confidence; 1.96 is the large-sample value, and with 28 degrees of freedom the critical value is about 2.048). According to the Stata output, the t-value is -6.72, whose absolute value exceeds either critical value. This means that we reject the null hypothesis that the coefficient is 0.
Therefore, the coefficient is non-zero and the independent variable affects the response variable.

The second test is the p-value test. The two-tailed p-value tests the hypothesis that each coefficient is different from 0. In this case there is only one coefficient, that of rural population. As its p-value is less than 0.05, the null hypothesis is rejected at the 5% significance level. Therefore, rural population is statistically significant in explaining life expectancy.

The last test of the hypothesis is the confidence interval test. As the value 0 is not included in the 95% confidence interval (-0.4509181, -0.2402904), the null hypothesis has to be rejected. Hence, rural population is statistically significant in explaining the variance in life expectancy.
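The three tests are tied together by the same numbers, which can be illustrated with a short Python sketch. It recovers the slope and its standard error from the reported confidence interval and then applies the t-test and confidence interval decision rules. The critical value 2.048 (two-tailed, 5%, 28 degrees of freedom) is taken from a standard t-table and is an assumption of this sketch, as are all variable names.

```python
# Reported values from the regression output.
coef_ci = (-0.4509181, -0.2402904)  # 95% confidence interval for the slope
t_reported = -6.72                  # t-value for the slope
f_stat = 45.19                      # F(1, 28) for the whole model

# Assumption: two-tailed 5% critical value of t with 28 degrees of
# freedom, read from a t-table (not computed here).
T_CRIT_28 = 2.048

# The interval is symmetric around the estimate, so the midpoint is the
# coefficient and the half-width equals T_CRIT_28 * standard error.
coef = (coef_ci[0] + coef_ci[1]) / 2
half_width = (coef_ci[1] - coef_ci[0]) / 2
std_err = half_width / T_CRIT_28
t_stat = coef / std_err

# Decision rules at the 5% significance level.
reject_by_t = abs(t_stat) > T_CRIT_28
reject_by_ci = not (coef_ci[0] <= 0 <= coef_ci[1])

print(f"coefficient = {coef:.4f}, std. err. = {std_err:.4f}, t = {t_stat:.2f}")
print(f"t squared = {t_stat ** 2:.2f} (compare with F = {f_stat})")
print(f"reject H0 by t-test: {reject_by_t}, by confidence interval: {reject_by_ci}")
```

With a single regressor, the squared t-value of the slope equals the model F statistic up to rounding, which is why the overall F test and the t-test on the coefficient lead to the same conclusion here.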
The importance of looking at this relationship; limitations of using a simple linear regression (like missing some important independent variables, especially the ones that may be related to your independent variable of interest); conclusion: what can you conclude or not conclude from your work.

Assumptions of the error term

Prediction and estimation
1) We can check the usefulness of the hypothesized model, i.e. whether x really contributes information for the prediction of y using the straight-line model.
2) Predicting y for values of x that are outside the range of the x values contained in the sample data may lead to errors of estimation and prediction.
3) The model could give a poor representation of the true model for values of x outside this region.
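The point about prediction outside the sampled range can be made concrete with a small Python sketch that uses the fitted model to predict life expectancy and flags extrapolation. The function name and the default range of 0 to 80 are hypothetical: the range is read off the x-axis of Figure 1, not taken from the data, and should be replaced with the actual minimum and maximum of rural population in the sample.

```python
INTERCEPT = 82.78  # constant term of the fitted model
SLOPE = -0.3456    # coefficient on rural population (percent of total)


def predict_life_expectancy(rural_pct, sample_range=(0.0, 80.0)):
    """Predict life expectancy and report whether x is outside the sample.

    sample_range is an assumed (min, max) of rural population in the
    data; the default is a guess based on the axis of Figure 1.
    Returns a (prediction, is_extrapolation) pair.
    """
    low, high = sample_range
    prediction = INTERCEPT + SLOPE * rural_pct
    is_extrapolation = not (low <= rural_pct <= high)
    return prediction, is_extrapolation


for pct in (0, 50, 95):
    years, extrapolated = predict_life_expectancy(pct)
    note = " (outside sampled range, treat with caution)" if extrapolated else ""
    print(f"rural population {pct}% -> {years:.2f} years{note}")
```

A prediction flagged as extrapolation is still computed, because the straight line can always be evaluated; the flag only signals that the sample gives no evidence the linear relationship holds at that value of x.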