Azra Mia, Sifiso Vumase, Sindiswa Majija,



The classical linear regression model assumes that the variance of the error term,
$u_i$, is constant; this is called homoscedasticity. Under homoscedasticity the
variation of the errors neither decreases nor increases but is the same across all
values of the independent variable. Heteroscedasticity is when the variance of the
error, $\sigma_i^2$, is unequal across the range of values of the independent variable.
If the spread of the dependent variable widens or narrows as the value of the
independent variable increases, a scatterplot of these two variables will show a
cone-like shape. Using an example with savings as the dependent variable and income as
the independent variable, the difference between heteroscedasticity and
homoscedasticity will be clear.

Consider the model

$Y_i = \beta_1 + \beta_2 X_i + u_i$

where $Y_i$ is savings, $X_i$ is income and $u_i$ is the error term.

When income increases, savings on average also increase. Under homoscedasticity the
variance of savings stays the same at all levels of income, whereas under
heteroscedasticity it increases with income. Heteroscedasticity is more commonly
encountered in cross-sectional data than in time series data. In cross-sectional data,
members of a population are observed at a given point in time, whereas in time series
data one collects data of the same kind over a period of years. Examples of time series
are Gross National Product, consumption expenditure, savings, or employment in South
Africa over a certain period of years. Ordinary least squares (OLS) is a method of
estimating a model and is used in regression analysis. When the assumption of
homoscedasticity holds, the OLS estimator $\hat{\beta}_2$ is the best linear unbiased
estimator (BLUE). When we replace the assumption of homoscedasticity with
heteroscedasticity, $\hat{\beta}_2$ remains unbiased: unbiasedness does not require the
error to be homoscedastic.

The variances of the errors play no part in establishing the unbiasedness property.
$\hat{\beta}_2$ is also a consistent estimator despite heteroscedasticity; it is not
affected by unequal variances. That is, as the sample size increases indefinitely, the
estimated $\hat{\beta}_2$ converges to its true value $\beta_2$. $\hat{\beta}_2$
therefore has the property of being consistent and unbiased, but it is not efficient or
best: its variance is not the minimum in the class of linear unbiased estimators.
Heteroscedasticity concerns the error term and not the two variables in the context of
regression.
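
The unbiasedness claim can be checked with a small Monte Carlo experiment. The sketch
below is only an illustration: the parameter values (intercept 1.0, slope 0.1), the
income grid and the rule that the error standard deviation equals 5% of income are all
assumptions made for the example, not values from the text.

```python
import numpy as np

def ols_slope(x, y):
    """OLS slope in a regression of y on a constant and x."""
    xd = x - x.mean()
    return float(xd @ (y - y.mean()) / (xd @ xd))

def monte_carlo_slopes(n=200, reps=2000, beta1=1.0, beta2=0.1, seed=1):
    """Re-estimate the slope on many samples with heteroscedastic errors."""
    rng = np.random.default_rng(seed)
    income = np.linspace(10.0, 100.0, n)   # hypothetical fixed incomes
    error_sd = 0.05 * income               # assumed: error spread grows with income
    slopes = np.empty(reps)
    for r in range(reps):
        savings = beta1 + beta2 * income + rng.normal(0.0, error_sd)
        slopes[r] = ols_slope(income, savings)
    return slopes

slopes = monte_carlo_slopes()
print("mean of estimated slopes:", slopes.mean())
print("std of estimated slopes: ", slopes.std())
```

If the estimator is unbiased, the average of the estimated slopes should sit close to
the assumed true slope of 0.1 even though the error variance is far from constant.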


Now that the meaning of heteroscedasticity has been explored, it is important to note
its causes. There are several, and they are listed below:

- Heteroscedasticity arises when measurement errors occur; this could happen by
  means of inaccurate or highly detailed responses from a survey or sample data
  set, or even an error when entering the data.
- Non-constant variances of the disturbances (error terms) may be caused by the
  presence of outliers (extreme values of the independent variables) and by
  skewness in the distribution of one or more of the regressors, i.e. values of the
  independent variable that differ greatly from the other values within the model or
  sample. For instance, the presence of an extremely wealthy individual in a sample
  of middle-income earners will distort the dispersion.
- The change in variance could arise when the sample spans many subpopulation
  differences. Heteroscedasticity is therefore highly likely in cross-sectional
  data; however, it is not impossible for it to be present in time series data.
- When analyzing the data, there could be misspecifications, such as running the
  regression with Y instead of the log of Y; this would lead to a change in variance
  within the model.
- The omission of important variables also leads to heteroscedasticity; if the
  relevant omitted variables were added to the model, the variance of the error
  term could become constant.

The equation for a homoscedastic variance:

$\mathrm{var}(u_i) = \sigma^2$

And the equation for a heteroscedastic variance:

$\mathrm{var}(u_i) = \sigma_i^2$

The subscript $i$ on $\sigma_i^2$ indicates that the variance changes from one
observation to the next, which is what defines heteroscedasticity.
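
The two variance assumptions can be illustrated by drawing errors both ways and
comparing their spread at low and high values of the independent variable. This is a
rough sketch: the income range, the constant standard deviation of 2.5 and the rule
"standard deviation = 5% of income" are illustrative assumptions only.

```python
import numpy as np

def simulate_errors(x, hetero, rng, scale=0.05, const_sd=2.5):
    """Draw one error per observation.

    hetero=False: var(u_i) = const_sd**2 for every i.
    hetero=True:  var(u_i) = (scale * x_i)**2, so the spread grows with x_i.
    Both variance patterns are assumptions chosen for illustration.
    """
    sd = scale * x if hetero else np.full_like(x, const_sd)
    return rng.normal(0.0, sd)

def spread_by_half(x, u):
    """Standard deviation of the errors for the lower and upper half of x."""
    low = x <= np.median(x)
    return float(u[low].std()), float(u[~low].std())

rng = np.random.default_rng(0)
income = rng.uniform(10.0, 100.0, size=2000)   # hypothetical incomes

homo = spread_by_half(income, simulate_errors(income, False, rng))
hetero = spread_by_half(income, simulate_errors(income, True, rng))
print("homoscedastic   (low, high):", homo)
print("heteroscedastic (low, high):", hetero)
```

Under the homoscedastic draw the two halves should show a similar spread, while under
the heteroscedastic draw the upper half should be visibly more dispersed, which is the
cone shape seen in a scatterplot.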

An example to illustrate heteroscedasticity is a comparison of high-income earners with
low-income earners. High-income earners have more disposable income, so they are more
flexible in their consumption, which creates more dispersion. A low-income earner,
however, has less disposable income and has little or no option but to spend it on
basic necessities, so there is less dispersion.

Now that the causes of heteroscedasticity have been mentioned, it is important to take
into consideration the consequences.
Once heteroscedasticity is detected in a regression, the assumption that the error
variance is constant (homoscedastic) is violated. Heteroscedasticity leads to the
regression estimators being no longer efficient or best, as they no longer have
minimum variance; therefore they can no longer be termed BLUE. However,
heteroscedasticity does not cause bias: the sampling distribution of the estimator
remains centred on the same mean as under homoscedasticity, only with a different
spread.

As can be seen from the graph, the two normal distribution curves share a mean of 4;
however, the blue curve has a larger variance (standard deviation 2) compared to the
red curve.
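Variance for a homoscedastic regression:

$\mathrm{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_i^2}$

Variance for a heteroscedastic regression:

$\mathrm{var}(\hat{\beta}_2) = \dfrac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$

where $x_i = X_i - \bar{X}$.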


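If the usual homoscedastic formula is applied when heteroscedasticity is present, it no
longer gives the true variance and will often underestimate both the variance and the
standard error of $\hat{\beta}_2$.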
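
The gap between the two variance expressions can be computed directly. The sketch
below evaluates the true variance of the OLS slope, $\sum x_i^2 \sigma_i^2 /
(\sum x_i^2)^2$, and the value that the usual formula $s^2 / \sum x_i^2$ delivers on
average, using the standard result $E[\sum e_i^2] = \sum (1 - h_{ii})\sigma_i^2$ for
OLS residuals. The income grid and the rule $\sigma_i = 0.05 X_i$ are assumptions made
for the example.

```python
import numpy as np

def slope_variances(x, sigma):
    """Return (true variance of the OLS slope, expected value of the usual formula).

    x     : regressor values X_i
    sigma : error standard deviation sigma_i of each observation
    """
    n = x.size
    xd = x - x.mean()                        # deviations x_i = X_i - mean(X)
    sxx = xd @ xd
    true_var = (xd**2 @ sigma**2) / sxx**2   # sum(x_i^2 sigma_i^2) / (sum x_i^2)^2
    leverage = 1.0 / n + xd**2 / sxx         # h_ii for a constant plus one regressor
    expected_s2 = (sigma**2 @ (1.0 - leverage)) / (n - 2)
    usual_var = expected_s2 / sxx            # what s^2 / sum(x_i^2) gives on average
    return float(true_var), float(usual_var)

income = np.linspace(10.0, 100.0, 200)       # hypothetical incomes
true_var, usual_var = slope_variances(income, 0.05 * income)
print("true variance of the slope:   ", true_var)
print("usual formula gives on average:", usual_var)
```

With these assumed numbers the usual formula comes out below the true variance, so
standard errors built from it are too small. When `sigma` is constant the two values
coincide.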

In conclusion, when the usual testing methods, such as the t-test and F-test, are
applied with heteroscedasticity present, the conclusions we reach and the statements we
make can be very misleading. This could lead us to conclude that the explanatory
variable X has a statistically significant effect on Y when it may actually have no
impact at all. The confidence intervals for $\beta_2$ will also be narrower than they
should be, which would lead to false conclusions.


Heteroscedasticity does not destroy the unbiasedness and consistency properties of the
ordinary least squares estimators, but they are no longer efficient. Knowing these
consequences, it is necessary to seek remedial measures. The lack of efficiency makes
the usual hypothesis-testing procedure invalid.
Sometimes heteroscedasticity results from improper model specification: there may be
subgroup differences, the effects of variables may not be linear, or some important
variables may have been left out of the model. These problems have to be dealt with
first. In such cases, do not rely on techniques such as WLS, because they do not get to
the heart of the problem.
If $\sigma_t^2$ is known, the most straightforward method of correcting
heteroscedasticity is weighted least squares (WLS), which yields BLUE estimators. The
WLS estimator is the OLS estimator applied to a transformed model that is obtained by
multiplying each term on both sides of the regression equation by a weight, denoted
$w_t$. For a model with two regressors,
$Y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + u_t$, the transformed model is

$w_t Y_t = \beta_1 w_t + \beta_2 (w_t X_{t2}) + \beta_3 (w_t X_{t3}) + w_t u_t$

For the GLS estimator, $w_t = 1/\sigma_t$. Thus, the GLS estimator is a particular kind
of WLS estimator: each observation on each variable is given a weight $w_t$ that is
inversely proportional to the standard deviation of the error for that observation.
This means that observations with a large error variance are given less weight, and
observations with a smaller error variance are given more weight in the GLS regression.
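
The weighting described above can be sketched in a few lines of NumPy. This is a
minimal illustration that assumes the error standard deviations are known exactly; the
simulated data, the true coefficients (1, 2, -3) and the rule "standard deviation =
0.5 times the first regressor" are hypothetical choices for the example.

```python
import numpy as np

def wls_known_sigma(y, X, sigma):
    """GLS/WLS with known error standard deviations.

    Each row of y and X (including the constant column) is multiplied by
    w_t = 1 / sigma_t, and OLS is then run on the transformed data.
    """
    w = 1.0 / sigma
    coef, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return coef

# Hypothetical data with two regressors and heteroscedastic errors.
rng = np.random.default_rng(2)
n = 5000
x2 = rng.uniform(1.0, 10.0, n)
x3 = rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2, x3])   # constant, X_t2, X_t3
beta = np.array([1.0, 2.0, -3.0])           # assumed true parameters
sigma = 0.5 * x2                            # assumed known error std. dev.
y = X @ beta + rng.normal(0.0, sigma)

coef = wls_known_sigma(y, X, sigma)
print("WLS estimates:", coef)
```

Because the constant column is also multiplied by the weight, the transformed
regression has no ordinary intercept; the coefficient on the weighted constant is the
estimate of the original intercept.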

Downsides to GLS/WLS

The major problem with the GLS estimator is that to use it you must know the true error
variance, and hence the standard deviation of the error, for each observation in the
sample. However, the true error variance is almost always unknown and unobservable.
Thus, the GLS estimator is usually not a feasible estimator.
When $\sigma_t^2$ is not known, another method should be used: Feasible Generalized
Least Squares (FGLS). To make the GLS estimator feasible, we can use the sample data to
obtain an estimate of $\sigma_t$ for each observation in the sample. We can then apply
the GLS estimator using the estimates of $\sigma_t$. When we do this, we have a
different estimator, called the Feasible Generalized Least Squares (FGLS) estimator.
Consider the model
$Y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + u_t$

for $t = 1, 2, \ldots, n$

$\mathrm{Var}(u_t) = \sigma_t^2 = \text{some function}$

for $t = 1, 2, \ldots, n$

The rest of the assumptions are the same as in the classical linear regression model.
Suppose that we assume that the error variance is a linear function of $X_{t2}$ and
$X_{t3}$. Thus, we are assuming that the heteroscedasticity has the following structure.

$\mathrm{Var}(u_t) = \sigma_t^2 = \alpha_1 + \alpha_2 X_{t2} + \alpha_3 X_{t3}$

for $t = 1, 2, \ldots, n$

To apply FGLS, first regress $Y_t$ on a constant, $X_{t2}$, and $X_{t3}$ using OLS, then
calculate the residuals and square them. Next, regress the squared residuals on a
constant, $X_{t2}$, and $X_{t3}$ to obtain estimates of $\alpha_1$, $\alpha_2$, and
$\alpha_3$. Use these estimates to calculate the predicted values $\hat{\sigma}_t^2$;
this is an estimate of the error variance for each observation. Check the predicted
values: for any predicted value that is non-positive, replace it with the squared
residual for that observation. This ensures that the estimate of the variance is a
positive number (you cannot have a negative variance). Take the square root of each
variance estimate to obtain $\hat{\sigma}_t$ and form the weight
$w_t = 1/\hat{\sigma}_t$. Lastly, multiply $Y_t$, $X_{t2}$, and $X_{t3}$ for each
observation by its weight and regress $w_t Y_t$ on $w_t$, $w_t X_{t2}$, and
$w_t X_{t3}$ using OLS to obtain the FGLS estimates of the parameters $\beta_1$,
$\beta_2$, and $\beta_3$.
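
The FGLS procedure can be sketched as follows. This is an illustrative implementation
under the linear-variance assumption stated above, not a definitive one; the function
names, the simulated data and the parameter values used to generate it are
hypothetical.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients for a regression of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fgls_linear_variance(y, X):
    """FGLS assuming Var(u_t) is linear in the columns of X (constant included)."""
    resid_sq = (y - X @ ols(y, X)) ** 2        # squared OLS residuals
    alpha = ols(resid_sq, X)                   # auxiliary regression for the variance
    var_hat = X @ alpha                        # predicted error variances
    # Replace non-positive predictions with the squared residual, as described above.
    # (A residual of exactly zero would still break the weight; ignored in this sketch.)
    var_hat = np.where(var_hat > 0, var_hat, resid_sq)
    w = 1.0 / np.sqrt(var_hat)                 # weight = 1 / estimated std. dev.
    return ols(y * w, X * w[:, None]), alpha

# Hypothetical data whose error variance is linear in the regressors.
rng = np.random.default_rng(3)
n = 5000
x2 = rng.uniform(1.0, 10.0, n)
x3 = rng.uniform(0.0, 5.0, n)
X = np.column_stack([np.ones(n), x2, x3])
beta = np.array([1.0, 2.0, -3.0])              # assumed true parameters
true_var = 0.5 + 1.0 * x2 + 0.5 * x3           # assumed variance function
y = X @ beta + rng.normal(0.0, np.sqrt(true_var))

b_fgls, alpha_hat = fgls_linear_variance(y, X)
print("FGLS estimates:    ", b_fgls)
print("variance equation: ", alpha_hat)
```

The estimates depend on how well the assumed variance function matches the data; if the
linear form is a poor description of the heteroscedasticity, the weights will be poor
as well.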

The question as to which remedial measure will work depends fully on the problem and
the severity of the heteroscedasticity.