
Lecture 9

Violation of Assumptions of the CLR Model:
Multicollinearity

Nature and Types
Causes
Consequences
Detection
Correction

Multicollinearity:
Nature

Multicollinearity is a sample problem.

This is because the independent variables in a regression equation are assumed to be nonstochastic, so that the population covariance between them is zero by definition.

But in a sample from a given population, the independent variables may be correlated.

Multicollinearity:
Types

There are two types of multicollinearity:

- perfect multicollinearity
- imperfect multicollinearity

Perfect multicollinearity means the existence of an exact linear relationship between two or more independent variables.

Multicollinearity:
Types

For example, the regression

Y = β1 + β2 X2 + β3 X3 + u

would suffer from perfect multicollinearity if

X3 = λ1 + λ2 X2.

Note that there is no random error term in the second equation, which means |r23| = 1.
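
A minimal numerical sketch of this case (assuming Python with NumPy; the names X2 and X3 mirror the slide): an exact linear relation forces the sample correlation to one and makes the design matrix rank-deficient, so OLS cannot be computed.

```python
import numpy as np

rng = np.random.default_rng(0)
X2 = rng.normal(size=50)
X3 = 1.0 + 2.0 * X2                        # exact linear relation, no error term

print(np.corrcoef(X2, X3)[0, 1])           # 1.0 (up to rounding): |r23| = 1

X = np.column_stack([np.ones(50), X2, X3])
print(np.linalg.matrix_rank(X))            # 2 instead of 3: X'X is singular
```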

Multicollinearity:
Types

In the case of imperfect multicollinearity, r23 is large but not equal to one in absolute value.

This would be the case if, in the regression equation

Y = β1 + β2 X2 + β3 X3 + u,

we had

X3 = λ1 + λ2 X2 + v,

where v is a random error term.
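
A companion sketch for the imperfect case (same hypothetical setup, again assuming NumPy): the error term v keeps the correlation high but strictly below one, and the design matrix retains full rank.

```python
import numpy as np

rng = np.random.default_rng(1)
X2 = rng.normal(size=50)
v = rng.normal(scale=0.3, size=50)         # random error term
X3 = 1.0 + 2.0 * X2 + v                    # high but imperfect collinearity

print(np.corrcoef(X2, X3)[0, 1])           # large, but |r23| < 1

X = np.column_stack([np.ones(50), X2, X3])
print(np.linalg.matrix_rank(X))            # 3: OLS is still computable
```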

Causes of Perfect Multicollinearity


Perfect multicollinearity is typically caused by carelessness on the part of the researcher.

A special kind of perfect multicollinearity is caused by dominant independent variable(s), where one or more independent variables are perfectly collinear with the dependent variable.

Causes of Perfect Multicollinearity


This would arise if an independent variable is definitionally related to the dependent variable.

A dominant variable is so strongly correlated with the dependent variable that it dominates (overwhelms) all other variables.

Causes of Imperfect but High Multicollinearity


Since multicollinearity is a sample problem, high imperfect multicollinearity has to be caused by problems arising from the sample. These include:

- poor data
- manipulated data
- small sample size
- time-series data, especially macro data

Consequences of Perfect Multicollinearity


When there is perfect multicollinearity, the regression coefficients cannot be estimated, as they take the indeterminate form 0/0. The standard errors of the estimated coefficients tend to infinity, as they take the form σ²/0, implying a complete lack of precision.

Consequences of Imperfect but High Multicollinearity


- Regression coefficients are estimable.
- OLS estimates are BLUE.
- Standard errors of the estimates are too large.
- This makes the t ratios too small, and this increases the probability of a Type II error.

Consequences of Imperfect but High Multicollinearity


While R2 and adjusted R2 are not affected, we may encounter a situation where these are high and significant but none, or only a few, of the estimated regression coefficients are individually significant.

The estimated coefficients may have unexpected or "wrong" signs. The estimates and their standard errors will be sensitive to changes in the sample or in the model specification.
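
A small simulation that reproduces these symptoms (hypothetical data; assumes the statsmodels package): the overall fit is strong, yet the individual standard errors are inflated and the t ratios are small.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 40
X2 = rng.normal(size=n)
X3 = 1.0 + 2.0 * X2 + rng.normal(scale=0.05, size=n)   # nearly collinear with X2
Y = 1.0 + 2.0 * X2 + 3.0 * X3 + rng.normal(scale=2.0, size=n)

res = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()
print(res.rsquared, res.f_pvalue)   # high R2, highly significant F statistic
print(res.bse)                      # inflated standard errors on X2 and X3
print(res.tvalues)                  # individually small t ratios
```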

Detection of Perfect Multicollinearity


This is quite simple: the model cannot be estimated. For example, in EViews you would get the error message "near singular matrix", indicating that estimation cannot proceed.

Detection of Imperfect but High Multicollinearity


Seldom, if ever, would we have zero multicollinearity in a regression model. That is, we never have a regression equation in which all regressors are orthogonal. Thus the issue is not one of detection but one of the degree of intercorrelation. Moreover, because multicollinearity is a sample problem rather than a population problem, there is no formal test for its presence.

Detection of Imperfect but High Multicollinearity


All we have are some general guidelines (rules of thumb), mostly based on the symptoms of high but imperfect multicollinearity. At best, these guidelines may lead us to suspect high multicollinearity; they do not tell us much about the severity of its consequences. We discuss these guidelines below.

Some Guidelines for Assessing Imperfect Multicollinearity


Suspect a high degree of multicollinearity if...

1. The simple correlation coefficient between two independent variables is high and statistically significant. But there are two problems with this:
First, this is only a sufficient condition for high multicollinearity.
Second, it is not clear how high the correlation coefficients must be for there to be severe multicollinearity.

Some Guidelines for Assessing Imperfect Multicollinearity


A possible answer to the second problem is to compare the square of the simple correlation coefficient between two independent variables with the unadjusted R2 from the model.
If the squared simple correlation coefficient is greater than or equal to the unadjusted R2, we may conclude that the two explanatory variables are highly correlated with one another. Unfortunately, this rule itself suffers from a number of defects that render it unsatisfactory.
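
A sketch of this comparison on simulated data (assumes statsmodels; the cut-off is the rule of thumb from the slide, not a formal test).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X2 = rng.normal(size=40)
X3 = 1.0 + 2.0 * X2 + rng.normal(scale=0.05, size=40)
Y = 1.0 + 2.0 * X2 + 3.0 * X3 + rng.normal(scale=2.0, size=40)

res = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()
r23_sq = np.corrcoef(X2, X3)[0, 1] ** 2   # squared simple correlation of X2, X3

print(r23_sq, res.rsquared)
if r23_sq >= res.rsquared:
    print("Rule of thumb: X2 and X3 appear highly collinear")
```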

Some Guidelines for Assessing Imperfect Multicollinearity


2. R2, adjusted R2, the F statistic, and the simple correlation coefficients between the dependent variable and each individual independent variable are high, but none, or only a few, of the estimated coefficients are individually significant.
Note, however, that this is only a sufficient condition for high multicollinearity.

Some Guidelines for Assessing Imperfect Multicollinearity


3. R2, adjusted R2, and the F statistic are high, but the partial correlation coefficients between the dependent variable and the independent variables are low.
Once again, this is only a sufficient condition for high multicollinearity.

Some Guidelines for Assessing Imperfect Multicollinearity


4. In a regression of the kth independent variable, Xk, on the remaining independent variables, the resulting R2 (known as R2-delete and denoted R2k) is high and significant based on an F test.

5. The variance-inflation factor (VIF), defined as

VIF(k) = 1/(1 - R2k),

is much larger than one, where R2k is as defined in (4) above.

Some Guidelines for Assessing Imperfect Multicollinearity


VIF may be viewed as the ratio of the variance of the estimated coefficient β̂k in the presence of multicollinearity to its variance in the absence of multicollinearity.

When there is no multicollinearity, R2k is zero and VIF = 1/(1 - 0) = 1. When there is perfect multicollinearity, R2k equals one and VIF = 1/(1 - 1) = 1/0; as R2k approaches one, VIF grows without bound.
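
A sketch of guidelines 4 and 5 on hypothetical data (assumes statsmodels, whose variance_inflation_factor helper regresses one column of the design matrix on the others).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
X2 = rng.normal(size=40)
X3 = 1.0 + 2.0 * X2 + rng.normal(scale=0.1, size=40)
exog = sm.add_constant(np.column_stack([X2, X3]))   # columns: const, X2, X3

# Guideline 4: auxiliary regression of X3 on the remaining regressors
aux = sm.OLS(exog[:, 2], exog[:, :2]).fit()
r2_k = aux.rsquared
print(r2_k, 1.0 / (1.0 - r2_k))                     # R2k and VIF computed by hand

# Guideline 5: the same VIF via the statsmodels helper (column 2 = X3)
print(variance_inflation_factor(exog, 2))
```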

Some Guidelines for Assessing Imperfect Multicollinearity


6. Adding or dropping a few observations to or from the sample, or adding or dropping an independent variable, results in significant changes in the estimated values, their signs, and their statistical significance.

Some Remarks Concerning Detection of Imperfect Multicollinearity


Except for the VIF, the above rules only tell us how strongly the independent variables are correlated. They do not tell us how serious the consequences of multicollinearity are, say in terms of significantly lowering the t scores. Lawrence Klein suggests that multicollinearity significantly lowers the t scores if R2 < R2k.
The problem is that the t ratios may be quite high even though R2 < R2k.
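
A sketch of Klein's rule of thumb under the same hypothetical setup (assumes statsmodels): compare the auxiliary R2k with the R2 of the main regression.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X2 = rng.normal(size=40)
X3 = 1.0 + 2.0 * X2 + rng.normal(scale=0.1, size=40)
Y = 1.0 + 2.0 * X2 + 3.0 * X3 + rng.normal(scale=2.0, size=40)

exog = sm.add_constant(np.column_stack([X2, X3]))
r2_model = sm.OLS(Y, exog).fit().rsquared               # R2 of the main regression
r2_k = sm.OLS(exog[:, 2], exog[:, :2]).fit().rsquared   # auxiliary R2k for X3

if r2_model < r2_k:
    print("Klein's rule: multicollinearity is likely to be a problem")
```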

Some Remarks Concerning Detection of Imperfect Multicollinearity


Even if every measure of intercorrelation among the independent variables points to the existence of a high degree of multicollinearity, there is no problem if the estimated t ratios are significant and all estimated coefficients have expected signs and reasonable magnitudes.

Multicollinearity:
Correction
The following options are available for alleviating, handling, or coping with multicollinearity:

1. Do Nothing
As mentioned above, even if every measure of intercorrelation among the independent variables points to the existence of strong multicollinearity, one need do nothing if the estimated t ratios are significant at reasonable levels and the estimated coefficients have the expected signs and reasonable magnitudes.

Multicollinearity:
Correction
2. Drop One of the Collinear Variables
Some suggest dropping one of the collinear variables from the model. While this is sometimes wise, at other times dropping a variable can cause omitted-variable bias.
Of course, in some cases this bias may be more than offset by the gain in efficiency. In that case the mean-squared error (MSE) of the estimated coefficient on the included variable declines, indicating an improvement.

Multicollinearity:
Correction
But how can we tell whether MSE increases or decreases when we omit a variable from the model? If the t ratio for the variable that is a candidate for dropping is less than one in absolute value, then dropping that variable would reduce the MSE of the estimated parameter on the included variable. A corollary to this rule is never to drop an independent variable whose estimated coefficient has a t ratio greater than one in absolute value, even if it has an unexpected (wrong) sign.
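
A sketch of this |t| < 1 screen on simulated data (assumes statsmodels; it is only the rule of thumb above, not a formal procedure).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
X2 = rng.normal(size=40)
X3 = 1.0 + 2.0 * X2 + rng.normal(scale=0.05, size=40)
Y = 1.0 + 2.0 * X2 + 3.0 * X3 + rng.normal(scale=2.0, size=40)

res = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()
t_x3 = res.tvalues[2]            # t ratio on the candidate variable X3

if abs(t_x3) < 1.0:
    print("Dropping X3 would reduce the MSE of the coefficient on X2")
else:
    print("Keep X3: |t| >= 1, even if its sign looks wrong")
```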

Multicollinearity:
Correction
3. Transform the Data
First-difference the variables: differencing reduces the spurious correlation that normally arises in time-series data in level form.
Express the variables as ratios: sometimes one can greatly reduce, or even eliminate, multicollinearity by combining two multicollinear variables into a ratio.
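
A sketch of both transformations with pandas (hypothetical annual series in levels; the column names consumption, income, and population are illustrative).

```python
import pandas as pd

df = pd.DataFrame({
    "consumption": [100.0, 108.0, 118.0, 131.0, 142.0],
    "income":      [120.0, 130.0, 143.0, 160.0, 172.0],
    "population":  [50.0, 51.0, 52.1, 53.2, 54.0],
})

# First differences: remove the common trend that inflates correlation in levels
d_income = df["income"].diff()
d_consumption = df["consumption"].diff()

# Ratios: combine collinear level variables into per-capita series
income_pc = df["income"] / df["population"]
consumption_pc = df["consumption"] / df["population"]

print(pd.DataFrame({"d_income": d_income, "income_pc": income_pc}).dropna())
```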

Multicollinearity:
Correction
4. Obtain Additional Data
Because multicollinearity is a sample problem, it is possible that in another sample it will not be as severe as in the first.
Furthermore, because the standard errors of the estimates are inversely related to the sample size, we can alleviate the major consequence of multicollinearity, inflated standard errors, by increasing the sample size.
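
A quick illustration of this last point (hypothetical simulation; assumes statsmodels): holding the degree of collinearity fixed, the standard errors shrink as the sample size grows.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
for n in (30, 300, 3000):
    X2 = rng.normal(size=n)
    X3 = 1.0 + 2.0 * X2 + rng.normal(scale=0.1, size=n)
    Y = 1.0 + 2.0 * X2 + 3.0 * X3 + rng.normal(scale=2.0, size=n)
    res = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()
    print(n, res.bse[1:])        # standard errors on X2 and X3 fall as n grows
```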
