Note that in applying the least squares principle we are in
fact estimating the average of Y. We postulate that Y itself
is decomposable into 'a', bX and 'u'. The so-called
disturbance term 'u' is nothing but the random component of
Y in the population we are theorizing about. The estimators
of the slope and intercept inherit their properties from Y,
and Y inherits its properties from 'u'. What we assume about
'u' therefore determines the sampling distributions of the
estimators. Why should what we assume about the population
determine the properties of estimators based on observed
samples of Y? The answer is that we take the sample to be
representative of the population and so appeal to the
principle of analogy. To make this point clearer, we first
write down the population regression function (PRF), which
is never directly observable, and the sample regression
function (SRF), which is based on the observed Y and X.
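PRF: $Y_i = \beta_0 + \beta_1 X_i + u_i$    (5)
SRF: $Y_i = \beta_0^* + \beta_1^* X_i + e_i$    (6)
Now, the OLS formula for $\beta_1^*$ is
$\beta_1^* = \frac{\sum x_i Y_i}{\sum x_i^2} = \frac{\sum x_i (\beta_0 + \beta_1 X_i + u_i)}{\sum x_i^2}$
where $x_i = X_i - \bar{X}$ is the deviation of X from its sample
mean. The first expression is the OLS formula; the second
follows by analogy, with Y taken from the PRF.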
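The slope formula can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the lecture itself: the population values beta0 = 2.0 and beta1 = 0.5, the sample size and the helper name ols_simple are all assumptions chosen purely for illustration. Because the data are simulated, u is known here, which is never the case in practice.

```python
import numpy as np

def ols_simple(X, Y):
    """Return (intercept, slope) using the deviation-from-mean formula."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x = X - X.mean()                        # deviations of X from its mean
    slope = np.sum(x * Y) / np.sum(x ** 2)  # beta1* = sum(x*Y) / sum(x^2)
    intercept = Y.mean() - slope * X.mean()
    return intercept, slope

# Hypothetical population values, assumed only for this illustration.
rng = np.random.default_rng(0)
beta0, beta1 = 2.0, 0.5
X = rng.uniform(0, 10, size=50)
u = rng.normal(0, 1, size=50)
Y = beta0 + beta1 * X + u                   # Y generated by the PRF

b0, b1 = ols_simple(X, Y)

# Substituting the PRF for Y: the slope is beta1 plus a term in u.
x = X - X.mean()
b1_from_prf = beta1 + np.sum(x * u) / np.sum(x ** 2)

print(b1, b1_from_prf)                      # the two values coincide
```

The two printed numbers agree because the beta0 term drops out (the deviations x sum to zero) and the sum of x times X equals the sum of squared x.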

The important point is that in deriving the properties of
the estimators we treat the sample Y as if it were generated
by the population relationship, which we never observe. The
sampling distribution properties of the estimators therefore
depend on the distributional properties of the disturbance
term u.
As a result, the assumptions of the CLRM about u become
crucial in determining the sampling distributions of the
estimators. Moreover, the stronger the assumptions, the
more attractive the properties of the estimators turn out to
be. Unfortunately, the stronger the assumptions, the less
realistic they may be. The researcher is thus caught on the
horns of a dilemma. To make the estimators attractive in
terms of unbiasedness, low variance, linearity in Y,
testability through confidence intervals and so on, the
assumptions about 'u' must be strict. But strict assumptions
may take her away from the reality of the available data
set, so she prefers to make the assumptions as weak as
possible. This preference is in the spirit of 'Occam's
Razor'.
In the context of EM, the regression model fulfills the
following three conditions:
(1) The expected value of ui given Xi is zero. Symbolically,
E(ui | Xi) = 0
Why does this condition hold in the case of randomized
experiments? The answer is not far to seek. X denotes the
treatment level, and each subject is assigned to a treatment
level at random. For a given level of Xi, positive and
negative values of u are therefore equally likely and cancel
out on average. Consequently, the probability-weighted sum
of the possible values of u is zero. In short,
E(ui | Xi) = 0.
(2) The covariance of Xi and ui is zero. Symbolically,
Cov(ui, Xi) = 0
This is a direct consequence of randomization. Because the
subjects are assigned to the different groups at random, the
treatment level is uncorrelated with u by construction.
There is a possible source of confusion here. The least
squares principle leads to two normal equations. The first
is Σei = 0 and the second is ΣXiei = 0. These are the
mathematical first-order conditions for minimizing the sum
of squared residuals. They are restrictions imposed on the
sample data by the least squares principle and hold in every
sample. The two assumptions mentioned above, on the other
hand, are assumptions about the unobservable population.
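The distinction can be made concrete with a small simulation. This is a sketch under assumed, purely illustrative numbers: the disturbance u is deliberately constructed to be correlated with X, so population assumption (2) is violated, and yet the sample residuals e still satisfy both normal equations.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = rng.uniform(0, 10, size=n)

# Illustrative disturbance that is correlated with X on purpose,
# so Cov(u, X) = 0 does NOT hold in this artificial population.
u = 0.8 * (X - 5.0) + rng.normal(0, 1, size=n)
Y = 1.0 + 2.0 * X + u

# OLS fit via the deviation-from-mean formula.
x = X - X.mean()
b1 = np.sum(x * Y) / np.sum(x ** 2)
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)                 # sample residuals

sum_e = np.sum(e)                     # first normal equation
sum_Xe = np.sum(X * e)                # second normal equation
corr_Xu = np.corrcoef(X, u)[0, 1]     # population-side correlation

print(sum_e, sum_Xe)                  # both zero up to rounding error
print(corr_Xu)                        # clearly non-zero
```

The residual conditions hold by construction of the fit, whatever the data look like, so they cannot be used to check the assumptions about u.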
Why, then, do we expect sample estimators such as OLS
(ordinary least squares) to show desirable properties like
unbiasedness when the disturbance term of the PRF fulfills
conditions like the two mentioned above? It is because of
the basic assumption that the sample is representative of
the population. Otherwise, the least squares principle
cannot help at all. Imagine a hunter with a gun that shoots
straight and hits the target most of the time, and whose
stray bullets do not stray very far. He is asked to kill a
man-eating tiger, but he goes to a jungle full of deadly
serpents and no tigers. Will his gun help? Similarly, OLS
cannot help if the sample comes from the 'wrong' population.
Here the researcher must be cautious indeed. What will
help the researcher in selecting a representative sample?
First, a clear definition of the population is needed. Let
us say that the population consists of married females in
the age group 18 to 35 years with at least one living child.
It will be less difficult to collect a sample in this case
than if we simply say that all females define our
population. Of course, much depends on the objectives of the
study.
Second, the method of data collection should not introduce
errors. For example, females may understate their age. An
appropriate data collection strategy has to be adopted, such
as consulting a school leaving certificate or a municipality
certificate. Sometimes participatory learning methods such
as focus groups may help.
Third, the sample size should be sufficiently large, say,
more than 30.
The issues of random sampling can be addressed only after
the problems mentioned above have been dealt with.
Let us assume that both requirements mentioned above,
namely representativeness and randomization, are met. What
will the consequences be for the OLS estimators if the two
assumptions of the CLRM are also met?
We generally use the concept of the 'bias' of a sample-based
estimator to analyze the consequences. If β1* is the sample
estimator and β1 its population counterpart, then the bias
is defined as follows:
Bias = E(β1*) - β1
Here E denotes the expected value. If the bias is zero, then
the regression model gives a proper estimate of the effect
of X on Y, as EM does.
Will the OLS estimators be unbiased if the two assumptions
mentioned above are met? The answer is no, not by
themselves. Unbiasedness requires that the average of the
sample estimator over repeated samples equal the population
parameter. Now we note that
$\beta_{1,OLS}^* = \beta_1 + \frac{\sum (X_i - \bar{X}) u_i}{\sum (X_i - \bar{X})^2}$    (7)
This expression is easily derived by writing the regression
equation in deviation-from-the-mean form and using the
result that the deviations of X from its mean sum to zero.
If we take expectations of both sides conditional on
X1, X2, …, XN, the numerator of the second term on the
right-hand side contains E(ui | X1, X2, …, XN). From the Law
of Iterated Expectations we have the result
E(ui) = E{E(ui | X1, X2, …, XN)}.
For this conditional expectation to be zero, which makes the
estimator unbiased, we require that (a) E(ui | Xi) does not
depend on the other Xs, that is, on the X values of the
other observations, and (b) E(ui | Xi) = 0 at each level of
X. This leads us to the third important assumption:
(3) The disturbance terms are independently and identically
distributed (i.i.d.). Conditions (a) and (b) follow from
this assumption combined with condition (1). Randomization
ensures that this condition is fulfilled, since subjects are
assigned to the experimental groups at random.
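The role of these conditions can be illustrated with a small Monte Carlo experiment. This is only a sketch under assumed values (true slope 2.0, samples of 50, 5000 replications, normal draws); the function names are hypothetical. In the "randomized" design X is drawn independently of u, while in the "confounded" design the assignment of X depends on u, so that E(ui | Xi) is no longer zero.

```python
import numpy as np

BETA0, BETA1 = 1.0, 2.0   # assumed population values, illustration only

def ols_slope(X, Y):
    """OLS slope from the deviation-from-mean formula."""
    x = X - X.mean()
    return np.sum(x * Y) / np.sum(x ** 2)

def mean_slope(confounded, reps=5000, n=50, seed=0):
    """Average OLS slope over many independent samples."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    for r in range(reps):
        u = rng.normal(0, 1, size=n)
        X = rng.normal(0, 1, size=n)   # treatment assigned at random
        if confounded:
            X = X + u                  # assignment now depends on u
        Y = BETA0 + BETA1 * X + u
        slopes[r] = ols_slope(X, Y)
    return slopes.mean()

bias_randomized = mean_slope(confounded=False) - BETA1
bias_confounded = mean_slope(confounded=True) - BETA1

print(bias_randomized)   # close to zero
print(bias_confounded)   # clearly positive
```

Under random assignment the estimated bias is negligible. When assignment depends on u, the average slope settles well above the true value however many samples are drawn.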
A few concluding remarks before we wind up this lecture:
(1) So far we have assumed a single regressor, but we can
have more than one explanatory variable. We may then ask:
what is the effect of X1 on Y, holding the other variables
constant? Multiple regression is designed to answer
precisely such questions; the coefficients of the Xs measure
these partial effects. Do we need an additional assumption
now? Yes, one. It is that no explanatory variable can be
derived mathematically from the others as an exact linear
function. Technically, this is called the absence of perfect
or deterministic multicollinearity. When perfect
multicollinearity is present, the partial effects are not
defined at all. Nonlinear relationships among the Xs pose no
problem, however. Nor does imperfect or stochastic
multicollinearity.
(2) What about the variance of the OLS estimators? If the
disturbance terms have the same (homo) variance
(scedasticity) at every level of treatment, then OLS uses
all the relevant information and has the minimum variance
among all linear unbiased estimators. There are two reasons
for this. First, the OLS weights ki = xi/Σxi² are
proportional to the deviations xi of the explanatory
variable from its mean, so that Σkixi = 1. Secondly, the
estimators are based on the principle of minimizing the sum
of squared residuals. This is the homoscedasticity
assumption, which is required to establish the Gauss-Markov
Theorem.
(3) Time series econometrics requires further extensions.
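Remark (1) on perfect multicollinearity can be illustrated with a short check. This is a sketch with made-up data, not part of the lecture: a second regressor built as an exact linear function of X1 makes the regressor matrix lose a column of rank, whereas a nonlinear function of X1 (here its square) does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X1 = rng.normal(size=n)

X2_linear = 3.0 + 2.0 * X1    # exact linear function of X1
X2_nonlinear = X1 ** 2        # nonlinear function of X1

const = np.ones(n)            # column for the intercept
D_linear = np.column_stack([const, X1, X2_linear])
D_nonlinear = np.column_stack([const, X1, X2_nonlinear])

# With three columns, full column rank is 3.
rank_linear = np.linalg.matrix_rank(D_linear)        # 2: one column is redundant
rank_nonlinear = np.linalg.matrix_rank(D_nonlinear)  # 3: full column rank

print(rank_linear, rank_nonlinear)
```

When the rank falls short of the number of columns, the normal equations have no unique solution, which is the numerical counterpart of the partial effects not being defined.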