dependent variable and is called the linear probability model because the
response probability is linear in the parameters.
- There are at least two potential problems with using PEAs (partial effects at
the average) to summarize the partial effects of the explanatory variables:
o If some of the explanatory variables are discrete, their averages represent
no one in the sample (e.g. the average of a binary variable is a fraction).
o If a continuous explanatory variable appears as a nonlinear function, it is
unclear whether to average the variable and then apply the function, or to
average the function itself (most packages default to the latter).
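The two choices in the last point give different numbers. A minimal sketch, assuming a log transformation and made-up sales figures (neither is from the notes):

```python
import numpy as np

# Hypothetical sales figures, chosen only for illustration.
sales = np.array([10.0, 100.0, 1000.0])

# Option 1: average the function, i.e. mean(log(sales)).
mean_of_log = np.log(sales).mean()

# Option 2: average the variable first, then apply the function, i.e. log(mean(sales)).
log_of_mean = np.log(sales.mean())

print(mean_of_log, log_of_mean)
```

Because the log is concave, the first value is smaller than the second, so the PEA depends on which convention the software follows.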
- In probability theory, the central limit theorem (CLT) establishes that, in some
situations, when independent random variables are added, their properly
normalized sum tends toward a normal distribution (informally a "bell curve") even
if the original variables themselves are not normally distributed.
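The CLT statement can be checked by simulation. This is a rough sketch, assuming exponential draws (strongly skewed, mean 1, variance 1); the sample size, number of replications, and the `skewness` helper are illustrative choices, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

n, reps = 200, 20_000  # terms per sum, number of simulated sums

# Exponential(1) draws: mean 1, variance 1, clearly not normal.
draws = rng.exponential(scale=1.0, size=(reps, n))

# Properly normalized sum: (sum - n*mu) / (sigma * sqrt(n)), with mu = sigma = 1.
z = (draws.sum(axis=1) - n * 1.0) / np.sqrt(n)

def skewness(x):
    """Sample skewness: third central moment over the cubed standard deviation."""
    d = x - x.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

skew_raw = skewness(draws.ravel())    # skewness of the original variable
skew_sum = skewness(z)                # skewness of the normalized sums
share_within = np.mean(np.abs(z) < 1.96)  # a standard normal gives about 0.95

print(skew_raw, skew_sum, share_within)
```

The raw draws are heavily skewed, while the normalized sums are nearly symmetric and fall inside plus or minus 1.96 roughly 95% of the time.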
- Outliers usually violate the first assumption (linearity), because a single
linear relationship does not describe them well.
o Homoskedasticity
▪ Heteroskedasticity does not bias the coefficient estimates, but it
makes the usual standard errors (and the tests based on them) invalid.
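A small Monte Carlo sketch of this point, assuming an error whose standard deviation grows with x squared (the data-generating process and all parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 2000
beta0, beta1 = 1.0, 2.0  # hypothetical true intercept and slope

x = rng.uniform(0.0, 4.0, size=n)     # regressor held fixed across replications
X = np.column_stack([np.ones(n), x])  # design matrix with a constant
XtX_inv = np.linalg.inv(X.T @ X)

slopes = np.empty(reps)
usual_se = np.empty(reps)
for r in range(reps):
    u = rng.normal(size=n) * x**2     # heteroskedastic error: sd depends on x
    y = beta0 + beta1 * x + u
    b = XtX_inv @ X.T @ y             # OLS estimates
    resid = y - X @ b
    sigma2_hat = resid @ resid / (n - 2)
    slopes[r] = b[1]
    # Conventional OLS standard error, valid only under homoskedasticity.
    usual_se[r] = np.sqrt(sigma2_hat * XtX_inv[1, 1])

mean_slope = slopes.mean()        # should sit close to beta1 (unbiasedness)
true_sd = slopes.std(ddof=1)      # actual sampling spread of the slope
avg_usual_se = usual_se.mean()    # what the conventional formula reports

print(mean_slope, true_sd, avg_usual_se)
```

In this setup the average slope stays near 2, while the conventional standard error understates the actual sampling spread. The direction of the error depends on how the variance relates to x, so this is one case rather than a general rule.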
- Missing data:
o When data are missing at random, the estimated slope does not change
considerably; the main cost is a smaller sample. However, when data are
missing not at random, the situation is like that of a non-random sample
and the estimates can be biased.
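The contrast can be sketched with simulated data. The example assumes a true slope of 2, random deletion of about 40% of observations, and, for the non-random case, deletion of every observation with y above 1; all of these are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)  # hypothetical true slope is 2

def ols_slope(x, y):
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    return np.polyfit(x, y, 1)[0]

slope_full = ols_slope(x, y)

# Missing at random: each observation is kept with probability 0.6,
# independently of x and y.
keep_random = rng.random(n) < 0.6
slope_random = ols_slope(x[keep_random], y[keep_random])

# Missing not at random: observations are dropped based on the outcome itself.
keep_selected = y < 1.0
slope_selected = ols_slope(x[keep_selected], y[keep_selected])

print(slope_full, slope_random, slope_selected)
```

The randomly thinned sample gives a slope close to the full-sample value, while selecting on y pulls the slope well below 2.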
- Outliers:
o The bigger the sample, the less the results are affected by a single outlier.
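A quick illustration of the sample-size effect, assuming one outlier placed 50 units above a made-up regression line (the `slope_shift` helper and all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

def slope_shift(n):
    """Absolute change in the OLS slope after adding one large outlier."""
    x = rng.uniform(0.0, 10.0, size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    clean = np.polyfit(x, y, 1)[0]
    # Append a single point at x = 10 lying 50 units above the true line.
    x_out = np.append(x, 10.0)
    y_out = np.append(y, 1.0 + 2.0 * 10.0 + 50.0)
    contaminated = np.polyfit(x_out, y_out, 1)[0]
    return abs(contaminated - clean)

shift_small = slope_shift(20)    # small sample
shift_large = slope_shift(2000)  # large sample

print(shift_small, shift_large)
```

The same outlier moves the slope substantially with 20 observations and only marginally with 2000.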