
OLS Assumptions:

1. Independence: Observations are independent of each other.
2. Normality: Residuals (errors) follow a normal distribution.
3. Homoskedasticity: Residuals have constant variance.
4. Linearity: The relationship between the response and the predictor variables is linear.
5. Little or no multicollinearity: Predictors should not be highly correlated with each other.
6. No autocorrelation: Residuals are not correlated with their own past values (relevant in panel or time series data).

OLS assumes that the residual ε is a random variable that is independently and identically distributed
(i.i.d.) with ε ~ N(0, σ²).
Normality Tests:
Shapiro-Wilk Test (fewer than 2000 observations):
shapiro.test(m2$res)
Kolmogorov-Smirnov Test (more than 2000 observations):
ks.test(m2$res, "pnorm", mean(m2$res), sd(m2$res))
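
As a rough illustration of the two normality tests, the sketch below fits a small regression on simulated data and tests its residuals. The data, the seed, and the variable names x and y are assumptions made for the example; only the model name m2 follows the notes.

```r
# Simulated data (hypothetical); in the notes the fitted model is called m2.
set.seed(1)
x <- rnorm(500)
y <- 1 + 2 * x + rnorm(500)
m2 <- lm(y ~ x)
res <- residuals(m2)

# Shapiro-Wilk: R's shapiro.test accepts between 3 and 5000 observations.
sw <- shapiro.test(res)

# Kolmogorov-Smirnov against a normal with the residuals' own mean and sd.
# Because mean and sd are estimated from the same data, the p-value is only approximate.
ks <- ks.test(res, "pnorm", mean = mean(res), sd = sd(res))

c(shapiro_p = sw$p.value, ks_p = ks$p.value)
```

In both tests a p-value below 0.05 is usually read as evidence against normal residuals.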

Homoskedasticity:
The residuals have equal (constant) variance.
Heteroskedasticity: the residuals do not have equal variance, which violates the OLS assumption.

If the residual plot shows a non-linear pattern, the relationship may be non-linear or a variable may be
missing from the model.
Tests for homoskedasticity:
Bartlett's test for equal variances: bartlett.test(list(m2$res, m2$fit))

Levene's test: used when the data are not normal.
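
One possible way to apply Bartlett's test to regression residuals is to compare the residual variance between groups of observations. The sketch below is not the call from the notes: it assumes simulated data whose error spread grows with x, and it assumes a split of the residuals into low and high fitted values at the median.

```r
# Simulated heteroskedastic data (hypothetical): the error sd grows with x.
set.seed(123)
n <- 1000
x <- runif(n, min = 1, max = 10)
y <- 3 + 2 * x + rnorm(n, sd = x)
m2 <- lm(y ~ x)

res <- residuals(m2)
# Assumed grouping: low vs. high fitted values, split at the median.
grp <- factor(fitted(m2) > median(fitted(m2)), labels = c("low", "high"))

# Bartlett's test of equal residual variance across the two groups.
bt <- bartlett.test(res, grp)
bt$p.value  # a small p-value points to unequal residual variance
```

Bartlett's test is sensitive to non-normal data, which is why the notes list Levene's test as the alternative in that case.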

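Endogeneity:
Two-stage least squares (2SLS)

# Stage 1: regress the endogenous variable on the instrument and the exogenous regressors
stage1 <- lm(Vehicles ~ Popn + Mktg + Reps, data=d)
VehiclesHat <- fitted(stage1)
# Stage 2: replace Vehicles with its fitted values (the coefficients are 2SLS, these standard errors are not)
stage2 <- lm(Sales ~ Mktg + Reps + VehiclesHat, data=d)
summary(stage2)

# The same model with sem::tsls; the instrument formula lists Popn plus the exogenous regressors
install.packages("sem")
library(sem)
m2sls <- tsls(Sales ~ Vehicles + Mktg + Reps, ~ Popn + Mktg + Reps, data=d)
summary(m2sls)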
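
To see why the two stages matter, here is a minimal base-R sketch on simulated data in which the true effect of Vehicles on Sales is 1.5. The variable names follow the notes. The data-generating process, the coefficients, and the unobserved factor u are assumptions invented for the illustration.

```r
set.seed(42)
n <- 5000
Popn <- rnorm(n)   # instrument
Mktg <- rnorm(n)   # exogenous regressor
Reps <- rnorm(n)   # exogenous regressor
u <- rnorm(n)      # unobserved factor that makes Vehicles endogenous

Vehicles <- 1 + 0.8 * Popn + 0.3 * Mktg + 0.2 * Reps + u + rnorm(n)
Sales <- 2 + 1.5 * Vehicles + 0.5 * Mktg + 0.7 * Reps - 2 * u + rnorm(n)
d <- data.frame(Sales, Vehicles, Mktg, Reps, Popn)

# Plain OLS: biased, because Vehicles is correlated with the error through u.
ols <- lm(Sales ~ Vehicles + Mktg + Reps, data = d)

# Stage 1: project Vehicles on the instrument and the exogenous regressors.
stage1 <- lm(Vehicles ~ Popn + Mktg + Reps, data = d)
d$VehiclesHat <- fitted(stage1)

# Stage 2: use the fitted values in place of Vehicles.
stage2 <- lm(Sales ~ VehiclesHat + Mktg + Reps, data = d)

c(ols = unname(coef(ols)["Vehicles"]),
  tsls = unname(coef(stage2)["VehiclesHat"]))
```

In this setup the OLS estimate should land well below 1.5 while the stage-2 estimate should be close to it. The standard errors printed by summary(stage2) are not valid 2SLS standard errors, which is one reason to use a dedicated routine.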

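Hausman Test for Endogeneity

Chi-square test on the difference between the OLS and 2SLS coefficients.
Basic idea: if the coefficients are close enough, there is no need to use 2SLS.
H0: OLS = 2SLS.
Reject H0 if p < 0.05
# ols: the plain OLS fit; m2sls: the tsls() fit (an R name cannot start with a digit, so not 2sls)
ols <- lm(Sales ~ Vehicles + Mktg + Reps, data=d)
cf_diff <- coef(m2sls) - coef(ols)
vc_diff <- vcov(m2sls) - vcov(ols)
x2_diff <- as.vector(t(cf_diff) %*% solve(vc_diff) %*% cf_diff)
pchisq(x2_diff, df = 2, lower.tail = FALSE)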
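
The sketch below works through a simplified version of this test in base R. It is a variant, not the calculation from the notes: it compares only the coefficient on the endogenous regressor (1 degree of freedom), it computes the 2SLS variance by hand, and it uses simulated data with an invented data-generating process.

```r
set.seed(2024)
n <- 5000
Popn <- rnorm(n); Mktg <- rnorm(n); Reps <- rnorm(n)
u <- rnorm(n)  # unobserved factor that makes Vehicles endogenous
Vehicles <- 1 + 0.8 * Popn + 0.3 * Mktg + 0.2 * Reps + u + rnorm(n)
Sales <- 2 + 1.5 * Vehicles + 0.5 * Mktg + 0.7 * Reps - 2 * u + rnorm(n)
d <- data.frame(Sales, Vehicles, Mktg, Reps, Popn)

ols <- lm(Sales ~ Vehicles + Mktg + Reps, data = d)

# 2SLS by hand; the error variance uses residuals from the actual regressors.
d$VehiclesHat <- fitted(lm(Vehicles ~ Popn + Mktg + Reps, data = d))
stage2 <- lm(Sales ~ VehiclesHat + Mktg + Reps, data = d)
X <- model.matrix(~ Vehicles + Mktg + Reps, data = d)
Xhat <- model.matrix(stage2)
b_iv <- coef(stage2)
s2 <- sum((d$Sales - X %*% b_iv)^2) / (n - ncol(X))
V_iv <- s2 * solve(crossprod(Xhat))

# Hausman-type statistic for the Vehicles coefficient only (1 degree of freedom).
diff_b <- unname(b_iv["VehiclesHat"] - coef(ols)["Vehicles"])
diff_v <- V_iv["VehiclesHat", "VehiclesHat"] - vcov(ols)["Vehicles", "Vehicles"]
x2 <- diff_b^2 / diff_v
p_value <- pchisq(x2, df = 1, lower.tail = FALSE)
p_value
```

Because Vehicles is endogenous by construction here, the p-value should fall far below 0.05, so H0 is rejected and 2SLS is preferred.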

Detecting autocorrelation:
Durbin-Watson test

ACF and PACF plots:

Residuals should be randomly distributed around the mean.
Patterns (exponential decay, positive-negative swings) are bad.
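
As an illustration of both checks, the sketch below computes the Durbin-Watson statistic by hand and the residual autocorrelations with acf() and pacf(). The data are simulated with AR(1) errors (coefficient 0.7), which is an assumption made for the example.

```r
set.seed(7)
n <- 300
e <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # autocorrelated errors
x <- rnorm(n)
y <- 1 + 2 * x + e
m <- lm(y ~ x)
res <- residuals(m)

# Durbin-Watson statistic: about 2 with no autocorrelation,
# well below 2 with positive autocorrelation.
dw <- sum(diff(res)^2) / sum(res^2)

# Autocorrelations without plotting; element 1 of $acf is lag 0, element 2 is lag 1.
a <- acf(res, plot = FALSE)
p <- pacf(res, plot = FALSE)

c(dw = dw, acf_lag1 = a$acf[2], pacf_lag1 = p$acf[1])
```

Calling acf(res) and pacf(res) with the default plot = TRUE draws the plots the notes refer to.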

Drunk driving:

H1a: β_drinkage < 0. Young people are more prone to drinking and driving, so raising the legal drinking
age will reduce fatalities.
H2a: β_beertax < 0. If the beer tax increases, fewer people will buy drinks, eventually leading to fewer
fatalities.
H3a: β_jail < 0. If jail sentences for drunk driving increase, people will drive more carefully and
fatalities will fall.
H4a: β_comm < 0. If community service for drunk driving increases, people will drive more carefully.

Model 1 :
Call:
plm(formula = fatalityrate ~ beertax + as.factor(legdrinkage) +
as.factor(jail) + +as.factor(commserv) + unemprate, data = drunkdrive,
model = "random", index = c("state", "year"))

Balanced Panel: n = 48, T = 7, N = 336

Effects:
                  var std.dev share
idiosyncratic 0.03377 0.18376 0.129
individual    0.22798 0.47748 0.871
theta: 0.8561

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max.
-0.472985 -0.114130 -0.024289  0.086629  0.910719

Coefficients:
                           Estimate Std. Error t-value  Pr(>|t|)
(Intercept)               2.3412319  0.1368313 17.1104 < 2.2e-16 ***
beertax                   0.0607571  0.1214867  0.5001    0.6173
as.factor(legdrinkage)19 -0.0654510  0.0817920 -0.8002    0.4242
as.factor(legdrinkage)20 -0.0813748  0.0829191 -0.9814    0.3271
as.factor(legdrinkage)21 -0.1005301  0.0791180 -1.2706    0.2048
as.factor(jail)yes        0.0897133  0.1135126  0.7903    0.4299
as.factor(commserv)yes   -0.0765384  0.1318356 -0.5806    0.5619
unemprate                -0.0345450  0.0073345 -4.7100 3.664e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares: 12.82
Residual Sum of Squares: 11.88
R-Squared: 0.073264
Adj. R-Squared: 0.053487
F-statistic: 3.70436 on 7 and 328 DF, p-value: 0.00071513

Fatality rate = 2.34 + 0.06*beertax - 0.065*age19 - 0.08*age20 - 0.10*age21 + 0.09*jail - 0.08*commserv - 0.035*unemp
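
To show how the fitted equation is read, the sketch below evaluates it for one state-year. The coefficients are copied from the Model 1 summary above; the input values (beer tax of 0.5, legal drinking age 21, no mandatory jail, community service in place, unemployment rate of 7) are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the Model 1 summary above.
b <- c(intercept = 2.3412319, beertax = 0.0607571, age19 = -0.0654510,
       age20 = -0.0813748, age21 = -0.1005301, jail = 0.0897133,
       commserv = -0.0765384, unemprate = -0.0345450)

# Hypothetical state-year; dummy variables are 1 when the category applies.
x <- c(intercept = 1, beertax = 0.5, age19 = 0, age20 = 0, age21 = 1,
       jail = 0, commserv = 1, unemprate = 7)

# Predicted fatality rate: sum of coefficient times value, matched by name.
pred <- sum(b * x[names(b)])
pred
```

This uses only the fixed part of the random-effects model, so it ignores any state-specific effect.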

R-squared for the above model is very low (0.073). The signs of the coefficients suggest raising the
legal drinking age to 21 and expanding community service for drunk-driving cases, although neither
coefficient is statistically significant at the 5% level.

Model with year :

> summary(m3)
Oneway (individual) effect Random Effect Model
(Swamy-Arora's transformation)

Call:
plm(formula = fatalityrate ~ beertax + as.factor(legdrinkage) +
as.factor(jail) + as.factor(year) + as.factor(commserv) +
unemprate, data = drunkdrive, model = "random", index = c("state",
"year"))

Balanced Panel: n = 48, T = 7, N = 336

Effects:
                  var std.dev share
idiosyncratic 0.02628 0.16213 0.103
individual    0.22905 0.47859 0.897
theta: 0.873

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max.
-0.466201 -0.105241 -0.018079  0.094308  0.649069

Coefficients:
                           Estimate Std. Error t-value  Pr(>|t|)
(Intercept)               2.8739226  0.1474620 19.4892 < 2.2e-16 ***
beertax                  -0.0216301  0.1195870 -0.1809   0.85658
as.factor(legdrinkage)19 -0.0140196  0.0746655 -0.1878   0.85118
as.factor(legdrinkage)20 -0.0093797  0.0763558 -0.1228   0.90231
as.factor(legdrinkage)21 -0.0095802  0.0743783 -0.1288   0.89759
as.factor(jail)yes        0.1449190  0.1083812  1.3371   0.18213
as.factor(year)1983      -0.0879033  0.0356283 -2.4672   0.01414 *
as.factor(year)1984      -0.2522745  0.0411269 -6.1341 2.510e-09 ***
as.factor(year)1985      -0.3138957  0.0422482 -7.4298 9.860e-13 ***
as.factor(year)1986      -0.2349473  0.0441156 -5.3257 1.891e-07 ***
as.factor(year)1987      -0.2958331  0.0490291 -6.0338 4.398e-09 ***
as.factor(year)1988      -0.3512523  0.0540569 -6.4978 3.096e-10 ***
as.factor(commserv)yes   -0.0626118  0.1249300 -0.5012   0.61659
unemprate                -0.0846675  0.0098518 -8.5941 3.693e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares: 12.369
Residual Sum of Squares: 9.544
R-Squared: 0.2284
Adj. R-Squared: 0.19724
F-statistic: 7.33176 on 13 and 322 DF, p-value: 1.3207e-12

Looking at the model, we can see no consistent pattern of continuous increase or decrease in the
fatality rate across years.

The model above gives an R-squared of about 23%, which is not very strong even though we have used plm models.
We could improve the model by using more advanced plm and time series seasonality techniques.

Log:

Y = a + b*log(x)
When x increases by 1%, y changes by about b/100 units.

Log(y) = a + b*x
When x increases by 1 unit, y changes by about b*100 %.

Log(y) = a + b*log(x)
When x increases by 1%, y changes by about b % (b is the elasticity of y with respect to x).
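
The usual readings of the three log specifications in this section can be checked numerically. In the sketch below the intercept a, the slopes b1, b2, b3, and the starting value x = 10 are arbitrary values assumed for illustration.

```r
a <- 1

# Level-log: y = a + b*log(x). A 1% rise in x changes y by about b/100 units.
b1 <- 50
level_log_change <- (a + b1 * log(1.01 * 10)) - (a + b1 * log(10))

# Log-level: log(y) = a + b*x. A 1-unit rise in x changes y by about 100*b percent.
b2 <- 0.03
log_level_pct <- 100 * (exp(a + b2 * 11) / exp(a + b2 * 10) - 1)

# Log-log: log(y) = a + b*log(x). A 1% rise in x changes y by about b percent.
b3 <- 0.5
log_log_pct <- 100 * (exp(a + b3 * log(1.01 * 10)) / exp(a + b3 * log(10)) - 1)

round(c(level_log_change, log_level_pct, log_log_pct), 3)
```

The three results should come out close to b1/100 = 0.5 units, 100*b2 = 3 percent, and b3 = 0.5 percent. The approximations are good for small changes and small slopes and drift as either grows.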
