
Intro: refer to the outline

Normality tests + explanation





According to the Stata regression output, the linear regression model is:
$E(Y) = B_0 + B_1 x_1$
Where:
$Y$ is the dependent variable or the response variable, i.e. life expectancy
$B_0$ is the constant term, i.e. the expected value of the outcome variable when the independent variable is zero
$B_1$ is the coefficient of the independent variable, i.e. rural population
$x_1$ is the value of the independent variable, i.e. rural population
The final linear regression model is:
$E(\text{Life expectancy}) = 82.78 - 0.3456\,(\text{Rural population})$
According to this model, if the rural population is 0, which means that the entire population lives in urban regions, then the expected life expectancy is 82.78 years. For each increase of one percentage point in rural population, the average life expectancy goes down by 0.3456 years, which is approximately 4 months. Figure 1 shows a graphical representation of the regression model. The slope of the line of best fit is the coefficient of the independent variable ($B_1$). The y-intercept is the constant term ($B_0$).
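The interpretation of the intercept and slope can be checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the two coefficients from the Stata output, and the function names `predict_life_expectancy` and `slope_in_months` are made up for this example and are not part of the original analysis.

```python
# Coefficients copied from the Stata output (_cons and Ruralpopulation rows).
INTERCEPT = 82.77647   # B_0: expected life expectancy when rural population is 0%
SLOPE = -0.3456043     # B_1: change in life expectancy per percentage point


def predict_life_expectancy(rural_pct: float) -> float:
    """Expected life expectancy (years) for a rural population share in percent."""
    return INTERCEPT + SLOPE * rural_pct


def slope_in_months() -> float:
    """The slope expressed in months per percentage point instead of years."""
    return SLOPE * 12


if __name__ == "__main__":
    for pct in (0, 1, 50):
        print(f"rural population {pct:>2}% -> {predict_life_expectancy(pct):.2f} years")
    print(f"change per percentage point: {slope_in_months():.2f} months")
```

The last line converts the slope to months, which is where the "approximately 4 months" figure in the text comes from.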

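. regress Lifeexpectancy Ruralpopulation

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  1,    28) =   45.19
       Model |  2596.88872     1  2596.88872           Prob > F      =  0.0000
    Residual |  1609.13251    28  57.4690182           R-squared     =  0.6174
-------------+------------------------------           Adj R-squared =  0.6038
       Total |  4206.02123    29  145.035215           Root MSE      =  7.5808

---------------------------------------------------------------------------------
 Lifeexpectancy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
Ruralpopulation |  -.3456043   .0514126    -6.72   0.000    -.4509181   -.2402904
          _cons |   82.77647   2.634268    31.42   0.000     77.38041    88.17252
---------------------------------------------------------------------------------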
[Figure 1: Life expectancy and fitted values (vertical axis, 40 to 80) plotted against rural population (horizontal axis, 0 to 80); the fitted line has y-intercept 82.78.]

Prob > F is the p-value of the model as a whole. It tests whether $R^2$ is different from 0. Usually we need a p-value lower than 0.05 to show a statistically significant relationship between X and Y. In this case the reported value (0.0000) is clearly less than 0.05, which shows that the rural population variable is statistically significant in explaining life expectancy.
Moreover, R-squared shows the proportion of the variance of Y explained by X. In this case rural population explains 61.74% of the variance in life expectancy. Adjusted $R^2$ (0.6038) shows the same as $R^2$ but adjusted for the number of cases and the number of variables. Hence, it provides a more honest measure of the association between rural population and life expectancy.
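The fit statistics discussed here can be recomputed from the sums of squares in the Stata ANOVA table. The following is a minimal sketch using the standard textbook formulas; the function name `fit_statistics` and the dictionary keys are chosen for illustration only.

```python
import math

# Values copied from the Stata ANOVA table.
SS_MODEL = 2596.88872
SS_RESIDUAL = 1609.13251
N_OBS = 30
N_PREDICTORS = 1


def fit_statistics(ss_model, ss_residual, n_obs, n_predictors):
    """Recompute R-squared, adjusted R-squared, F and Root MSE from sums of squares."""
    ss_total = ss_model + ss_residual
    df_residual = n_obs - n_predictors - 1
    ms_model = ss_model / n_predictors
    ms_residual = ss_residual / df_residual
    r_squared = ss_model / ss_total
    # Adjusted R-squared penalises for the number of predictors.
    adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
    return {
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "f_statistic": ms_model / ms_residual,
        "root_mse": math.sqrt(ms_residual),
    }


if __name__ == "__main__":
    for name, value in fit_statistics(SS_MODEL, SS_RESIDUAL, N_OBS, N_PREDICTORS).items():
        print(f"{name}: {value:.4f}")
```

Rounded to the precision Stata prints, these reproduce R-squared = 0.6174, Adj R-squared = 0.6038, F(1, 28) = 45.19 and Root MSE = 7.5808.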
Although it is very clear from the value of $R^2$ that there is a relationship between rural population and life expectancy, we can still test a hypothesis regarding the coefficient of rural population:
$H_0: B_1 = 0$
$H_a: B_1 \neq 0$
The null hypothesis states here that the coefficient ($B_1$) of the independent variable, rural population,
is zero. This would mean that there is no relationship between the two variables and that rural
population does not affect life expectancy. The alternate hypothesis on the other hand, states that the
coefficient is non-zero which would mean there is a relationship between the two variables and that
rural population affects life expectancy in either a negative way or positive way.
To test this hypothesis, we can use the T-value test, the P-value test and the Confidence Interval test.
An assumption we are making here is that the significance level is 5%.
Firstly, we will use the t-test. The t-value tests the hypothesis that the coefficient is different from 0, which is our alternate hypothesis. We will reject the null hypothesis if the absolute value of the t-value exceeds the critical value. In this case, we need a t-value greater in absolute value than 1.96 (the large-sample critical value for 95% confidence; with 28 degrees of freedom the exact critical value is about 2.05). According to the Stata output, the t-value is -6.72, whose absolute value is well above either threshold. This means that we reject the null hypothesis that the coefficient is 0. Therefore, the coefficient is non-zero and the independent variable is affecting the response variable.
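The t-value in the output is the coefficient divided by its standard error, and the decision rule compares its absolute value with a critical value. The sketch below illustrates this with the numbers from the Stata output. The critical value 2.048 is taken from a standard t table (two-tailed, 5%, 28 degrees of freedom) and is hard-coded as an assumption; the function names are illustrative.

```python
# Values copied from the Ruralpopulation row of the Stata output.
COEF = -0.3456043
STD_ERR = 0.0514126

# Assumption: two-tailed 5% critical value of t with 28 degrees of freedom,
# read from a standard t table (the large-sample approximation is 1.96).
T_CRITICAL = 2.048


def t_statistic(coef: float, std_err: float, hypothesized: float = 0.0) -> float:
    """t-value for H0: coefficient == hypothesized."""
    return (coef - hypothesized) / std_err


def reject_null(t_value: float, t_critical: float) -> bool:
    """Two-tailed rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_value) > t_critical


if __name__ == "__main__":
    t_value = t_statistic(COEF, STD_ERR)
    print(f"t = {t_value:.2f}, reject H0: {reject_null(t_value, T_CRITICAL)}")
```

Because the rule uses the absolute value, a strongly negative t-value such as this one leads to rejection just as a strongly positive one would.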
The second test is the p-value test. The two-tailed p-values test the hypothesis that each coefficient is different from 0. In this case, there is only one coefficient, that of rural population. As its p-value (reported as 0.000) is less than 0.05, it can be safely concluded that at the 5% significance level the null hypothesis is rejected. Therefore, rural population is statistically significant in explaining life expectancy.
The last test of hypothesis is the confidence interval test. As the value of 0 is not included in the
confidence interval (-0.4509181, -0.2402904), it can be deduced that the null hypothesis has to be
rejected. Hence, rural population is statistically significant in explaining the variance in life expectancy.
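The confidence interval itself can be rebuilt as the coefficient plus or minus the critical t-value times the standard error. This is a rough sketch under the assumption that the critical value is 2.0484 (two-tailed, 5%, 28 degrees of freedom, from a t table); the helper names are illustrative.

```python
# Values copied from the Ruralpopulation row of the Stata output.
COEF = -0.3456043
STD_ERR = 0.0514126

# Assumption: two-tailed 5% critical value of t with 28 degrees of freedom.
T_CRITICAL = 2.0484


def confidence_interval(coef: float, std_err: float, t_critical: float):
    """Return (lower, upper) bounds of the interval coef +/- t_critical * std_err."""
    margin = t_critical * std_err
    return coef - margin, coef + margin


def excludes(interval, value: float = 0.0) -> bool:
    """True when `value` lies outside the interval, i.e. H0: coef == value is rejected."""
    lower, upper = interval
    return not (lower <= value <= upper)


if __name__ == "__main__":
    interval = confidence_interval(COEF, STD_ERR, T_CRITICAL)
    print(f"95% CI: ({interval[0]:.4f}, {interval[1]:.4f}), excludes 0: {excludes(interval)}")
```

Up to rounding of the critical value, the bounds match the interval Stata reports, and both lie below zero.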


the importance of looking at this relationship,
limitations of using a simple linear regression (like missing some important independent variables,
especially the ones that may be related to your independent variable of interest)
conclusion-what can you conclude or not conclude from your work.
Assumptions of the error term
Prediction and estimation
1) We can check the usefulness of the hypothesized model, i.e. whether x really contributes information for the prediction of y using the straight-line model.
2) Using values of x that are outside the range of the values of x contained in the sample data may lead to errors of estimation and prediction.
3) The fitted line could give a poor representation of the true model for values of x outside this region.
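The caution about values of x outside the sample range can be turned into a simple guard before making a prediction. The sketch below is only an illustration: the report does not list the raw data, so the sample values are hypothetical placeholders, and `is_extrapolation` is a made-up helper name.

```python
def is_extrapolation(x_new: float, x_sample) -> bool:
    """True when x_new lies outside the range of the observed x values."""
    return x_new < min(x_sample) or x_new > max(x_sample)


# Hypothetical rural population percentages, used only to demonstrate the check.
example_sample = [12.0, 35.5, 48.0, 61.2, 79.9]

if __name__ == "__main__":
    for x_new in (40.0, 95.0):
        status = "extrapolation" if is_extrapolation(x_new, example_sample) else "within sample range"
        print(f"x = {x_new}: {status}")
```

A prediction flagged as an extrapolation is not necessarily wrong, but the fitted line has no data to support it in that region.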
