Multicollinearity
Nature
Causes, Consequences, and Types
Detection
Correction
Multicollinearity: Nature
Multicollinearity is a sample problem. This is because the independent variables in a regression equation are assumed to be nonstochastic, so the population covariance between them is zero by definition. But in any particular sample the observed values of the independent variables may well be correlated.
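To see the point numerically, here is a minimal sketch (Python with numpy assumed; the sample size and seed are purely illustrative). Two variables drawn independently have zero population covariance, yet their sample correlation is nonzero:

import numpy as np

rng = np.random.default_rng(0)
X2 = rng.normal(size=30)
X3 = rng.normal(size=30)           # independent of X2 by construction
r23 = np.corrcoef(X2, X3)[0, 1]
print(f"sample r23 = {r23:.3f}")   # nonzero, despite zero population covariance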
Multicollinearity: Types
There are two types of multicollinearity: perfect and imperfect. Perfect multicollinearity means the existence of an exact linear relationship between two or more independent variables.
Multicollinearity: Types
For example, the model
Y = β1 + β2X2 + β3X3 + u
would suffer from perfect multicollinearity if
X3 = α1 + α2X2
Note that there is no random error term in the second equation, which means |r23| = 1.
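A hedged numerical illustration (numpy assumed; α1 = 1 and α2 = 2 are chosen only for concreteness): with an exact linear relation between X2 and X3, the design matrix loses a rank, so the OLS estimator (X'X)^(-1)X'y cannot be computed.

import numpy as np

rng = np.random.default_rng(1)
n = 50
X2 = rng.normal(size=n)
X3 = 1 + 2 * X2                       # exact linear relation, no error term
X = np.column_stack([np.ones(n), X2, X3])

print(np.linalg.matrix_rank(X))       # 2, not 3: the columns are dependent
print(np.corrcoef(X2, X3)[0, 1])      # |r23| = 1 (up to rounding)
# Inverting X.T @ X fails here: (X'X)^(-1) X'y is not defined
# under perfect multicollinearity.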
Multicollinearity: Types
In the case of imperfect multicollinearity, r23 is large but not equal to one in absolute value. This would be the case if, in the model
Y = β1 + β2X2 + β3X3 + u
we had
X3 = α1 + α2X2 + v
where v is a random error term.
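The same sketch with a small error term v added (the error scale is again illustrative) shows r23 high but strictly below one:

import numpy as np

rng = np.random.default_rng(2)
n = 50
X2 = rng.normal(size=n)
v = rng.normal(scale=0.1, size=n)     # small random error term
X3 = 1 + 2 * X2 + v                   # near, but not exact, linear relation
print(np.corrcoef(X2, X3)[0, 1])      # large (about 0.999) but less than 1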
A special kind of perfect multicollinearity is caused by a dominant independent variable, where one or more independent variables are perfectly collinear with the dependent variable. A dominant variable is so strongly correlated with the dependent variable that it dominates (overwhelms) all the other variables.
Multicollinearity: Consequences
The estimated coefficients may have unexpected or "wrong" signs, and the estimates and their standard errors will be sensitive to changes in the sample or in the model specification.
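A small simulation of this sensitivity (same illustrative setup as above; the true coefficients 2 and 3 are assumptions for the demo): with X2 and X3 near-collinear, the separate estimates swing wildly, and in sign, from sample to sample.

import numpy as np

rng = np.random.default_rng(3)
n = 30
for trial in range(5):
    X2 = rng.normal(size=n)
    X3 = 1 + 2 * X2 + rng.normal(scale=0.05, size=n)   # near-collinear pair
    y = 1 + 2 * X2 + 3 * X3 + rng.normal(size=n)       # true b2 = 2, b3 = 3
    X = np.column_stack([np.ones(n), X2, X3])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    print(f"trial {trial}: b2 = {b[1]:7.2f}, b3 = {b[2]:7.2f}")
# The estimates scatter far around the true values, often with wrong signs.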
Multicollinearity: Detection
A standard diagnostic is the variance inflation factor, VIF = 1/(1 - R²k), where R²k is the R² from the auxiliary regression of Xk on the remaining independent variables. When there is no multicollinearity, R²k is zero and VIF = 1/(1 - 0) = 1. When there is perfect multicollinearity, R²k equals 1 and VIF = 1/(1 - 1) = 1/0, which is undefined (infinitely large).
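A minimal numpy sketch of this computation (the helper name vif and all data values are illustrative, not from the original):

import numpy as np

def vif(X):
    # VIF for each column of X via auxiliary regressions that include
    # a constant; X holds the regressors only (no intercept column).
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        yhat = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        r2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(4)
X2 = rng.normal(size=100)
X3 = 1 + 2 * X2 + rng.normal(scale=0.2, size=100)
print(vif(np.column_stack([X2, X3])))   # both far above 1 (around 100 here)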
Multicollinearity: Correction
The following options are available for alleviating, handling, or coping with multicollinearity:
1. Do Nothing
As mentioned above, even if every measure of intercorrelation among the independent variables points to the existence of strong multicollinearity, one need do nothing if the estimated t ratios are significant at reasonable levels and the estimated coefficients have the expected signs and reasonable magnitudes.
Multicollinearity: Correction
2. Drop One of the Collinear Variables
Some suggest dropping one of the collinear variables from the model. While this is sometimes wise, at other times dropping a variable can cause omitted-variable bias. Of course, in some cases this bias may be more than offset by the gain in efficiency; in that case the mean-squared error (MSE) of the estimated coefficient on the included variable declines, indicating an improvement.
Multicollinearity: Correction
But how can we tell whether MSE increases or decreases when we omit a variable from the model? If the t ratio of the variable that is a candidate for dropping is less than one in absolute value, then dropping that variable reduces the MSE of the estimated parameter on the included variable. A corollary of this rule: never drop an independent variable whose estimated coefficient has a t ratio greater than one in absolute value, even if it has an unexpected (wrong) sign.
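A hypothetical Monte Carlo check of this rule (every parameter value here is an assumption for the demo): X3's true coefficient is small relative to its standard error, so its |t| is usually below one, and omitting it trades a little bias for a large drop in variance.

import numpy as np

rng = np.random.default_rng(5)
n, reps = 30, 2000
b2_full, b2_drop = [], []
for _ in range(reps):
    X2 = rng.normal(size=n)
    X3 = 1 + 2 * X2 + rng.normal(scale=0.1, size=n)   # collinear pair
    y = 1 + 2 * X2 + 0.1 * X3 + rng.normal(size=n)    # true b3 is tiny
    Xf = np.column_stack([np.ones(n), X2, X3])        # full model
    Xr = np.column_stack([np.ones(n), X2])            # X3 dropped
    b2_full.append(np.linalg.lstsq(Xf, y, rcond=None)[0][1])
    b2_drop.append(np.linalg.lstsq(Xr, y, rcond=None)[0][1])
mse_full = np.mean((np.array(b2_full) - 2) ** 2)      # unbiased, huge variance
mse_drop = np.mean((np.array(b2_drop) - 2) ** 2)      # small bias, tiny variance
print(mse_full, mse_drop)                             # dropping X3 wins on MSE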
Multicollinearity: Correction
3. Transform the Data
First-Difference the Variables. Differencing reduces the spurious correlation that normally arises in time-series data in level form.
Express the Variables as Ratios. Sometimes one can greatly reduce, or even eliminate, multicollinearity by combining two collinear variables into a single ratio.
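A quick sketch of the first-differencing case (the trend slopes are invented for illustration): two series sharing a time trend are almost perfectly correlated in levels but nearly uncorrelated after differencing.

import numpy as np

rng = np.random.default_rng(6)
t = np.arange(100)
X2 = 0.5 * t + rng.normal(size=100)                   # trending series
X3 = 0.3 * t + rng.normal(size=100)                   # same trend, unrelated noise
print(np.corrcoef(X2, X3)[0, 1])                      # near 1 in levels
print(np.corrcoef(np.diff(X2), np.diff(X3))[0, 1])    # near 0 in differences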
Multicollinearity: Correction
4. Obtain Additional Data
Because multicollinearity is a sample problem, it is possible that in another sample it is not as severe as in the first.
Furthermore, because the standard errors of the estimates are inversely related to the sample size, we can alleviate the major consequence of multicollinearity, inflated standard errors, by increasing the sample size.
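A short sketch of that inverse relationship (same illustrative collinear setup as before): holding the degree of collinearity fixed, the estimated standard error of b2 falls as the sample grows.

import numpy as np

rng = np.random.default_rng(7)
for n in (30, 300, 3000):
    X2 = rng.normal(size=n)
    X3 = 1 + 2 * X2 + rng.normal(scale=0.1, size=n)   # collinear regressors
    y = 1 + 2 * X2 + 3 * X3 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), X2, X3])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)                      # error variance estimate
    se_b2 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    print(n, round(se_b2, 3))                         # SE shrinks roughly as 1/sqrt(n)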