SDM Consolidated

OLS Assumptions:
1. Independence: Observations are independent of each other.
2. Multivariate normality: Residuals (errors) follow a normal distribution.
3. Homoskedasticity: Constant variance of residuals.
4. Linearity: Linear relationship between response and predictor variables.
5. Little or no multicollinearity: Predictors should not be highly correlated.
6. No autocorrelation: Residuals are not correlated with their own past values (relevant in panel or time series data).
OLS assumes that the residual 𝜀 is a random variable that is independently and identically distributed (i.i.d.) with the properties 𝜀 ~ N(0, 𝜎²).
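Normality Tests:
Shapiro-Wilk Test:
shapiro.test(m2$res) for fewer than 2000 observations
Kolmogorov-Smirnov Test:
ks.test(norm, m2$res) for larger samples (more than 2000 observations); norm here is presumably a reference sample drawn from a normal distribution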
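The two normality tests can be sketched on simulated data as follows. This is a minimal illustration, not the notes' own model: the model name m and the simulated x and y are assumptions, and the Kolmogorov-Smirnov call compares the residuals against a normal distribution with the residuals' own mean and standard deviation (which makes its p-value conservative).

```r
# Simulated data (hypothetical; stands in for the model m2 in the notes)
set.seed(42)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)

m   <- lm(y ~ x)
res <- residuals(m)

# Shapiro-Wilk: H0 = residuals are normally distributed
sw <- shapiro.test(res)

# Kolmogorov-Smirnov against a normal CDF with matching mean and sd
ks <- ks.test(res, "pnorm", mean = mean(res), sd = sd(res))

print(sw$p.value)
print(ks$p.value)
```

In both tests a small p-value (for example below 0.05) is evidence against normality of the residuals.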
Homoskedasticity:
Homoskedasticity: the residuals have equal (constant) variance.
Heteroskedasticity: the residuals do not have equal variance, which violates the OLS assumption.
If the residual plot shows a non-linear pattern, a variable may be missing from the model, and the pattern suggests non-linearity.
Tests for homoskedasticity:
Bartlett's test for equal variances: bartlett.test(list(m2$res, m2$fit))
Levene's test for non-normal data =
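A minimal sketch of both tests on simulated data, assuming a different setup from the one-line call in the notes: here the residuals are split into three groups by fitted value and the group variances are compared. The Levene-type test is written by hand as a one-way ANOVA on absolute deviations from the group medians (the Brown-Forsythe variant), so no extra package is needed. All variable names are hypothetical.

```r
# Simulated data with error sd growing in x (heteroskedastic by construction)
set.seed(1)
x <- runif(300, 1, 10)
y <- 2 + 3 * x + rnorm(300, sd = x)

m   <- lm(y ~ x)
res <- residuals(m)

# Split observations into low / mid / high fitted-value groups
grp <- cut(fitted(m),
           breaks = quantile(fitted(m), c(0, 1/3, 2/3, 1)),
           include.lowest = TRUE,
           labels = c("low", "mid", "high"))

# Bartlett's test: H0 = equal residual variance across groups
bt <- bartlett.test(res ~ grp)

# Levene-type test (Brown-Forsythe): ANOVA on |residual - group median|
abs_dev <- abs(res - ave(res, grp, FUN = median))
lev     <- anova(lm(abs_dev ~ grp))
lev_p   <- lev[["Pr(>F)"]][1]

print(bt$p.value)
print(lev_p)
```

Bartlett's test is sensitive to non-normality, which is why a Levene-type test is usually preferred when the residuals are not normal.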
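Endogeneity:
Two-stage least squares (2SLS), done by hand:
# Stage 1: regress the endogenous predictor on the instrument and the exogenous predictors
stage1 <- lm(Vehicles ~ Popn + Mktg + Reps, data=d)
VehiclesHat <- fitted(stage1)
# Stage 2: replace Vehicles with its fitted values
# (the coefficients are 2SLS estimates; the standard errors from this lm are not corrected)
stage2 <- lm(Sales ~ Mktg + Reps + VehiclesHat, data=d)
summary(stage2)
The same model with the sem package:
install.packages("sem")
library(sem)
# The instrument formula lists the instrument plus the exogenous predictors
tsls_fit <- tsls(Sales ~ Vehicles + Mktg + Reps, ~ Popn + Mktg + Reps, data=d)
summary(tsls_fit)
Hausman Test for Endogeneity
Chi-sq test between coefficients from OLS and 2SLS.
Basic idea: If the OLS and 2SLS coefficients are close to each other, there is no need to use 2SLS.
H0: OLS = 2SLS.
Reject H0 if p<0.05 (evidence of endogeneity, so prefer 2SLS).
# ols is assumed to be the plain OLS fit of the same model
ols <- lm(Sales ~ Vehicles + Mktg + Reps, data=d)
cf_diff <- coef(tsls_fit) - coef(ols)
vc_diff <- vcov(tsls_fit) - vcov(ols)
x2_diff <- as.vector(t(cf_diff) %*% solve(vc_diff) %*% cf_diff)
pchisq(x2_diff, df = 2, lower.tail = FALSE)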
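The 2SLS and Hausman steps above can be sketched end to end on simulated data using base R only. Everything here is an assumption made for illustration: z plays the role of the instrument, w an exogenous predictor, u an unobserved confounder, and the true coefficient on x is set to 2. The Hausman statistic is computed for the single endogenous coefficient with df = 1, which is a common simplified variant rather than the full-vector form in the notes.

```r
set.seed(123)
n <- 2000
z <- rnorm(n)                               # instrument
w <- rnorm(n)                               # exogenous predictor
u <- rnorm(n)                               # unobserved confounder
x <- 0.8 * z + 0.5 * w + u + rnorm(n)       # endogenous predictor
y <- 1 + 2 * x + 1 * w + 2 * u + rnorm(n)   # true coefficient on x is 2

# OLS is biased because x is correlated with the error (through u)
ols <- lm(y ~ x + w)

# Stage 1 and stage 2 by hand
stage1 <- lm(x ~ z + w)
x_hat  <- fitted(stage1)
stage2 <- lm(y ~ x_hat + w)

b_ols  <- unname(coef(ols)["x"])
b_2sls <- unname(coef(stage2)["x_hat"])

# Correct 2SLS covariance: residuals use the actual x, not x_hat
X      <- cbind(1, x, w)
Xhat   <- cbind(1, x_hat, w)
e      <- as.vector(y - X %*% coef(stage2))
sigma2 <- sum(e^2) / (n - ncol(X))
V_2sls <- sigma2 * solve(crossprod(Xhat))
V_ols  <- vcov(ols)

# Hausman statistic for the coefficient on x only
h <- (b_2sls - b_ols)^2 / (V_2sls[2, 2] - V_ols[2, 2])
p <- pchisq(h, df = 1, lower.tail = FALSE)

print(c(ols = b_ols, tsls = b_2sls, hausman_p = p))
```

With this setup the OLS estimate is pulled away from 2 while the 2SLS estimate stays close to it, and the small p-value rejects H0: OLS = 2SLS.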
Detecting autocorrelation:
Durbin-Watson test
ACF and PACF plots:
Residuals should be randomly distributed around the mean.
Patterns in the residuals (exponential decay, positive-negative swings) are bad: they indicate autocorrelation.
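As a rough sketch of these checks, the Durbin-Watson statistic can be computed by hand from the residuals, and acf() and pacf() from base R give the correlogram values. The data are simulated with AR(1) errors (coefficient 0.7), which is an assumption made for illustration. A statistic near 2 suggests no first-order autocorrelation, while values well below 2 suggest positive autocorrelation.

```r
set.seed(7)
n <- 200
e <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # AR(1) errors
x <- rnorm(n)
y <- 1 + 0.5 * x + e

m   <- lm(y ~ x)
res <- residuals(m)

# Durbin-Watson statistic: sum of squared successive differences / sum of squares
dw <- sum(diff(res)^2) / sum(res^2)

# Lag-1 autocorrelation of the residuals (element 1 of $acf is lag 0)
r1 <- acf(res, plot = FALSE)$acf[2]

# Partial autocorrelations; use plot = TRUE to draw the PACF plot
pc <- pacf(res, plot = FALSE)

print(c(dw = dw, lag1 = r1))
```

The statistic is approximately 2 * (1 - r1), so strong positive lag-1 correlation pushes it toward 0.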
Drunk driving:
H1a: B_drinkage < 0. Young people are more prone to drinking and driving, so raising the legal drinking age will reduce fatalities.
H2a: B_beertax < 0. If the beer tax increases, fewer people will buy drinks, and eventually there will be fewer fatalities.
H3a: B_jail < 0. If jail sentences for drunk driving increase, people will drive more carefully and fatalities will fall.
H4a: B_comm < 0. If community service for drunk driving increases, people will drive more carefully.
Model 1 :
Call:
plm(formula = fatalityrate ~ beertax + as.factor(legdrinkage) +
as.factor(jail) + +as.factor(commserv) + unemprate, data = drunkdrive,
model = "random", index = c("state", "year"))
Balanced Panel: n = 48, T = 7, N = 336
Effects:
                  var std.dev share
idiosyncratic 0.03377 0.18376 0.129
individual    0.22798 0.47748 0.871
theta: 0.8561
Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max.
-0.472985 -0.114130 -0.024289  0.086629  0.910719
Coefficients:
                           Estimate Std. Error t-value  Pr(>|t|)
(Intercept)               2.3412319  0.1368313 17.1104 < 2.2e-16 ***
beertax                   0.0607571  0.1214867  0.5001    0.6173
as.factor(legdrinkage)19 -0.0654510  0.0817920 -0.8002    0.4242
as.factor(jail)yes        0.0897133  0.1135126  0.7903    0.4299
as.factor(commserv)yes   -0.0765384  0.1318356 -0.5806    0.5619
unemprate                -0.0345450  0.0073345 -4.7100 3.664e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 12.82
Residual Sum of Squares: 11.88
R-Squared: 0.073264
Adj. R-Squared: 0.053487
F-statistic: 3.70436 on 7 and 328 DF, p-value: 0.00071513
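Several summary numbers in the Model 1 output can be reproduced by hand from the reported variance components, which helps in reading the table. The sketch below uses the standard balanced-panel formula for the random-effects weight theta and the usual adjusted R-squared formula. Treating these as the formulas behind the printed values is an assumption, although they do reproduce the reported figures to rounding.

```r
# Values copied from the Model 1 output
sigma2_idios <- 0.03377   # idiosyncratic variance
sigma2_indiv <- 0.22798   # individual (state) variance
T_periods    <- 7         # years per state
N            <- 336       # total observations
k            <- 7         # slope coefficients (from the F-statistic df)
r2           <- 0.073264

# Share of total variance due to the individual effect
share_indiv <- sigma2_indiv / (sigma2_indiv + sigma2_idios)

# Random-effects quasi-demeaning weight for a balanced panel
theta <- 1 - sqrt(sigma2_idios / (sigma2_idios + T_periods * sigma2_indiv))

# Adjusted R-squared
adj_r2 <- 1 - (1 - r2) * (N - 1) / (N - k - 1)

print(round(c(share = share_indiv, theta = theta, adj_r2 = adj_r2), 4))
```

A theta close to 1 means the random-effects estimator removes most of each state's mean, so it behaves much like a fixed-effects (within) estimator.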
Fatality rate = 2.34 + 0.06*beertax - 0.065*age19 - 0.08*age20 - 0.1*age21 + 0.09*jail - 0.08*commserv - 0.035*unemp
R-squared for the above model is very low (about 0.07). Going by the signs of the coefficients, the model suggests raising the legal drinking age to 21 and increasing community service for drunk-driving cases, though the age-19 and community-service coefficients shown in the output are not statistically significant (p > 0.4).
Model with year :
> summary(m3)
Oneway (individual) effect Random Effect Model
(Swamy-Arora's transformation)
Call:
plm(formula = fatalityrate ~ beertax + as.factor(legdrinkage) +
as.factor(jail) + as.factor(year) + as.factor(commserv) +
unemprate, data = drunkdrive, model = "random", index = c("state",
"year"))
Balanced Panel: n = 48, T = 7, N = 336
Effects:
                  var std.dev share
idiosyncratic 0.02628 0.16213 0.103
individual    0.22905 0.47859 0.897
theta: 0.873
Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max.
-0.466201 -0.105241 -0.018079  0.094308  0.649069
Coefficients:
                         Estimate Std. Error t-value  Pr(>|t|)
(Intercept)             2.8739226  0.1474620 19.4892 < 2.2e-16 ***
beertax                -0.0216301  0.1195870 -0.1809   0.85658
as.factor(jail)yes      0.1449190  0.1083812  1.3371   0.18213
as.factor(year)1983    -0.0879033  0.0356283 -2.4672   0.01414 *
as.factor(year)1984    -0.2522745  0.0411269 -6.1341 2.510e-09 ***
as.factor(year)1985    -0.3138957  0.0422482 -7.4298 9.860e-13 ***
as.factor(year)1986    -0.2349473  0.0441156 -5.3257 1.891e-07 ***
as.factor(year)1987    -0.2958331  0.0490291 -6.0338 4.398e-09 ***
as.factor(year)1988    -0.3512523  0.0540569 -6.4978 3.096e-10 ***
as.factor(commserv)yes -0.0626118  0.1249300 -0.5012   0.61659
unemprate              -0.0846675  0.0098518 -8.5941 3.693e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 12.369
Residual Sum of Squares: 9.544
R-Squared: 0.2284
Adj. R-Squared: 0.19724
F-statistic: 7.33176 on 13 and 322 DF, p-value: 1.3207e-12
Looking at the year coefficients, there is no steady year-over-year increase or decrease in the fatality rate: all year effects are negative relative to the base year, but 1986 reverses the decline.
The model above gives an R-squared of about 22%, which is not very strong even though we have used plm models.
We could improve the model by using more advanced plm and time-series seasonality techniques.
Log transformations:
Y = a + b*log(x)
A 1% increase in x changes Y by about b/100 units.
Log(y) = a + b*x
When x increases by 1 unit, y changes by about b*100 %.
Log(y) = a + b*log(x)
When x increases by 1%, y changes by about b% (b is the elasticity of y with respect to x).
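The rule-of-thumb readings of log models can be checked numerically. In the sketch below the coefficient values b = 0.5 and b2 = 0.05 are made up for illustration. Each exact effect is compared with its approximation, which holds well for small changes and small coefficients.

```r
b  <- 0.5    # hypothetical coefficient for the level-log and log-log models
b2 <- 0.05   # hypothetical coefficient for the log-level model

# Level-log, y = a + b*log(x): change in y for a 1% increase in x
level_log_exact  <- b * log(1.01)
level_log_approx <- b / 100

# Log-level, log(y) = a + b2*x: percent change in y for a 1-unit increase in x
log_level_exact  <- (exp(b2) - 1) * 100
log_level_approx <- b2 * 100

# Log-log, log(y) = a + b*log(x): percent change in y for a 1% increase in x
log_log_exact  <- (1.01^b - 1) * 100
log_log_approx <- b

print(rbind(
  level_log = c(exact = level_log_exact, approx = level_log_approx),
  log_level = c(exact = log_level_exact, approx = log_level_approx),
  log_log   = c(exact = log_log_exact,   approx = log_log_approx)
))
```

For larger coefficients in the log-level model, the exact form (exp(b) - 1) * 100 should be used, since the b*100 shortcut drifts as b grows.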
