
# - Linear Probability Model (LPM)

A multiple linear regression model with a binary dependent variable; it is
called the linear probability model because the response probability is linear
in the parameters.
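In practice, an LPM is just OLS run on a 0/1 outcome. A minimal numpy sketch with simulated data (the data-generating process and all coefficient values are assumptions for illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
# Assumed true model: P(y=1|x) = 0.5 + 0.2*x, clipped into [0, 1]
p = np.clip(0.5 + 0.2 * x, 0, 1)
y = (rng.uniform(size=n) < p).astype(float)

# OLS on the binary outcome = linear probability model
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # roughly [0.5, 0.2]
```

The fitted values estimate P(y = 1 | x); a known drawback of the LPM is that fitted probabilities can fall outside [0, 1].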

- There are at least two potential problems with using PEAs (partial effects
at the average) to summarize the partial effects of the explanatory variables:
o If some of the explanatory variables are discrete, their averages
represent no one in the sample.
o If a continuous explanatory variable appears inside a non-linear function,
should we average the variable or the function? (Most packages
default to the latter.)
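The second point is concrete arithmetic: for a non-linear term such as x², averaging the variable and then applying the function gives a different answer than averaging the function. A small numpy sketch (the chosen distribution is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)

# "Average the variable, then square" vs "square, then average"
mean_then_square = x.mean() ** 2    # ≈ 4
square_then_mean = (x ** 2).mean()  # ≈ 5, larger by Var(x) = 1
print(mean_then_square, square_then_mean)
```

The gap is exactly Var(x), since E[x²] = (E[x])² + Var(x), so the two conventions can give noticeably different "partial effects at the average".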

- In probability theory, the central limit theorem (CLT) establishes that, in some
situations, when independent random variables are added, their properly
normalized sum tends toward a normal distribution (informally a "bell curve") even
if the original variables themselves are not normally distributed.
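A quick simulation makes the CLT visible: properly standardized sums of strongly skewed (exponential) draws behave like a standard normal. A numpy sketch (the distribution and sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
# Exponential draws are strongly right-skewed, far from normal
draws = rng.exponential(scale=1.0, size=(100_000, 50))

# Properly normalized sum: (sum - n*mu) / (sigma * sqrt(n))
n, mu, sigma = 50, 1.0, 1.0
z = (draws.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

print(z.mean(), z.std())          # close to 0 and 1
print((np.abs(z) < 1.96).mean())  # close to 0.95, as for a standard normal
```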

## - OLS, MLR, and Gauss-Markov assumptions:

- Selection bias violates MLR.4 – Zero conditional mean

- Outliers usually violate the first assumption (linearity in the parameters),
because a single linear model cannot describe both the bulk of the data and
the outlying observations.

## - Violation of assumptions – Solutions:

o Linearity in the parameters
▪ Consider non-linear transformations of the independent and
dependent variables.
▪ Consider interaction terms and additional regression controls.
▪ Consider non-linear regression models.
o Random sample
▪ If the sample is not random, it is very likely that the results cannot
be generalized to the whole population. Therefore, it is up to the
researcher to recognize the limitation of the analysis, and not
make strong claims about how the results apply to the population.
Alternatively, the researcher might try to acquire a new sample —
this could take a lot of time and/or money, or it might not even be
possible in some cases.
o Multicollinearity
▪ As we saw earlier in the course, a common source of perfect
collinearity is if we try to include all levels of a categorical variable
in a model with a constant. Fortunately, R checks for perfect
collinearity for us.
▪ Multicollinearity affects your standard errors and thus the
significance of your results. Coefficients may also change
erratically in response to small changes in the model or the data,
because the estimate is less precise.
▪ If we have strongly collinear variables and we are concerned
about getting precise estimates for their coefficients, it might be
best to include only one of the variables in the model. Strong
collinearity might be a less concerning issue if the collinear
variables are there just as controls, depending on the degree of
collinearity.
o Zero conditional mean
▪ Bias formula: the bias in the estimated slope equals the
coefficient on the omitted variable times the slope from a
regression of the omitted variable on the included one.
▪ The obvious solution to this issue would be to include the
omitted variables. Many times, however, this is not possible
because we do not have data on the omitted variable — most
likely this is why it is omitted.

o Homoscedasticity
▪ Heteroskedasticity does not affect the unbiasedness of the
coefficients but it affects the standard errors.
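The bias formula mentioned under zero conditional mean can be verified by simulation: when an omitted variable x2 is correlated with the included x1, the short-regression slope converges to β1 + β2·δ1, where δ1 is the slope from regressing x2 on x1. A numpy sketch (all coefficient values are made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)  # omitted variable, correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Short regression that omits x2
X = np.column_stack([np.ones(n), x1])
beta_short, *_ = np.linalg.lstsq(X, y, rcond=None)

# Bias formula: plim(beta1_tilde) = beta1 + beta2 * delta1,
# where delta1 = 0.8 is the slope from regressing x2 on x1
print(beta_short[1])  # ≈ 2.0 + 3.0 * 0.8 = 4.4
```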
- Missing data:
o When data is missing at random, the estimated slope does not change
considerably. However, when it is missing not at random, we have a case
like that of a non-random sample.
- Outliers:
o The bigger the sample the less the results are affected by the outlier.
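Both of the last two points can be checked with a short simulation: dropping observations based on the outcome itself (missing not at random) biases the slope, while a single outlier distorts a small sample far more than a large one. A numpy sketch (the data-generating process is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

def ols_slope(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Missingness: at random vs based on the value of y itself
n = 20_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
keep = rng.uniform(size=n) > 0.5           # missing (completely) at random
slope_mcar = ols_slope(x[keep], y[keep])
keep = y < np.median(y)                    # missing not at random
slope_mnar = ols_slope(x[keep], y[keep])
print(slope_mcar, slope_mnar)  # random: near 2; not at random: attenuated

# One extreme outlier added to a small vs a large sample
def slope_with_outlier(n):
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)
    x, y = np.append(x, 10.0), np.append(y, -100.0)
    return ols_slope(x, y)

s_small, s_large = slope_with_outlier(50), slope_with_outlier(50_000)
print(s_small, s_large)  # small sample badly distorted; large sample near 2
```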

## - Key assumption of the first-differences approach?

The change in the unobserved error (Δuit = uit − ui,t−1) must be uncorrelated
with the changes in the independent variables (Δxit = xit − xi,t−1). There
must also be variation in the changes.
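The idea can be simulated: when an unobserved individual effect a_i is correlated with x, pooled OLS is biased, but differencing removes a_i, so the first-differences estimator recovers the true slope as long as Δu is uncorrelated with Δx. A numpy sketch (all parameter values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n, T = 1000, 2
a = 3.0 * rng.normal(size=n)               # unobserved individual effect
x = rng.normal(size=(n, T)) + a[:, None]   # x correlated with a
y = 1.0 + 2.0 * x + a[:, None] + rng.normal(size=(n, T))

# Pooled OLS: a_i sits in the error and is correlated with x -> biased
X = np.column_stack([np.ones(n * T), x.ravel()])
b_pooled = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]

# First differences: a_i drops out of Δy = β·Δx + Δu
dx = np.diff(x, axis=1).ravel()
dy = np.diff(y, axis=1).ravel()
X = np.column_stack([np.ones(n), dx])
b_fd = np.linalg.lstsq(X, dy, rcond=None)[0][1]

print(b_pooled, b_fd)  # pooled is biased upward; FD is close to 2
```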