AMITABHA SINHA, ASSOCIATE PROFESSOR, DEPARTMENT OF A & A ECONOMICS, TRIPURA UNIVERSITY
I. UNDERSTANDING THE ASSUMPTIONS OF CLRM
We note that in applying the least squares principle, we are in fact estimating the average of Y. This average Y, we postulate, is decomposable into 'a', bX and 'u'. The so-called disturbance term 'u' is nothing but the random component of Y in the context of the population we are theorizing about. The estimators of slope and intercept follow the properties of Y, and Y follows the properties of 'u'. What we assume about 'u' therefore determines the sampling distribution properties of the estimators. Why should assumptions about the population determine the properties of estimators based on observed samples of Y? The obvious answer is: we think the sample is a representative sample. As a result, we appeal to the principle of analogy. To make this point clearer, we can write down the population regression function (never directly observable) and the sample regression function based on observed Y and X:

PRF: Yi = β0 + β1 Xi + ui .................... (5)
SRF: Yi = β0* + β1* Xi + ei .................... (6)

Now, the OLS formula for β1* is

β1* = Σ x Y / Σ x² = Σ x (β0 + β1 Xi + ui) / Σ x²,   where x = Xi − X̄.

By analogy, the Y appearing in this formula is taken from the PRF.
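To make the analogy concrete, here is a minimal simulation sketch in Python (the parameter values β0 = 2.0, β1 = 0.5 and the sample size are illustrative assumptions, not taken from the lecture). It generates Y from a known PRF and recovers the slope with the deviation-form OLS formula above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population parameters (assumed, not from the lecture)
beta0, beta1, n = 2.0, 0.5, 200

X = rng.uniform(0.0, 10.0, size=n)   # observed regressor
u = rng.normal(0.0, 1.0, size=n)     # unobservable disturbance of the PRF
Y = beta0 + beta1 * X + u            # PRF: Yi = beta0 + beta1*Xi + ui ... (5)

# OLS slope in deviation form: beta1* = sum(x*Y) / sum(x^2), x = Xi - mean(X)
x = X - X.mean()
b1_star = np.sum(x * Y) / np.sum(x ** 2)
b0_star = Y.mean() - b1_star * X.mean()

print(b1_star, b0_star)  # close to 0.5 and 2.0 for a representative sample
```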
The important point is that in deriving the properties of the estimators, we are treating the sample Y as if it were the population Y, which we never observe. The sampling distribution properties of the estimators depend on the distribution properties of the disturbance term u. As a result, the assumptions of the CLRM about u become crucial in determining the sampling distributions of the estimators. Moreover, the stronger the assumptions, the more attractive the properties of the estimators turn out to be. Unfortunately, to the utter dismay of the researcher, the stronger the assumptions, the less realistic they may turn out to be. Hence, the researcher is caught on the horns of a dilemma. To make the estimators attractive in terms of unbiasedness, low variance, linearity in Y, testability via confidence intervals and so on, the assumptions about 'u' must be strict. But strict assumptions may take her away from the reality of the available data set, so she would like to make the assumptions as weak as possible. 'Occam's Razor' summarizes this dilemma. In the context of EM, the regression model fulfils the following three conditions:

(1) The expected value of ui given Xi is zero. Symbolically, E(ui | Xi) = 0.
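A minimal numerical sketch of condition (1), with treatment levels and disturbances assumed purely for illustration: when subjects are assigned to treatment levels at random, the sample average of u at each level is close to zero, the sample analogue of E(ui | Xi) = 0.

```python
import numpy as np

rng = np.random.default_rng(1)

levels = np.array([0.0, 1.0, 2.0])      # hypothetical treatment levels
X = rng.choice(levels, size=10_000)     # random assignment of subjects
u = rng.normal(0.0, 1.0, size=10_000)   # disturbances drawn independently of X

# Sample analogue of E(u | X = level): close to zero at every treatment level
for level in levels:
    print(level, u[X == level].mean())
```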
Why does this condition hold in the case of randomized experiments? The answer is not far to seek. X denotes the treatment level, and each subject is assigned to a treatment level at random. For a given level of Xi, positive and negative values of u are equally probable. Consequently, the probability-weighted sum Σ pi·ui is zero, where pi is the probability of ui. In short, E(ui | Xi) = 0.

(2) The covariance of Xi and ui is zero. Symbolically, Cov(ui, Xi) = 0.

This is a direct consequence of randomization. The subjects are assigned to the different groups in a random manner, so the correlation is by construction zero.

There are certain sources of confusion here. The least squares principle leads to two normal conditions/equations. One is that Σei = 0; the other is that ΣXiei = 0. These are the mathematical first-order conditions of minimizing the sum of squared errors, and they are restrictions imposed on the sample data by the least squares principle. The two assumptions mentioned above, on the other hand, are assumptions about a potentially unobservable population. Why is it, then, that we expect sample estimators like OLS (ordinary least squares) to show desirable properties like unbiasedness when the disturbance term of the PRF fulfils conditions like the two mentioned above? It is due to the basic assumption that the sample is representative of the population. Otherwise, the least squares principle cannot help at all. Let us imagine a hunter with a gun that shoots straight, hits the target most of the time, and whose stray bullets do not stray very far. He is requested to kill a man-eating tiger, but he goes to a jungle of deadly serpents with no tiger. Will his gun help? Similarly, OLS cannot help if the sample comes from a 'wrong' population. Here the researcher must be cautious indeed.

What will help the researcher in selecting a representative sample? First, a clear definition of the population is needed. Say the population consists of married females in the age group 18 to 35 years with at least one living child. It will be less difficult to collect a sample in this case than if we simply say that all females define our population. Of course, much depends on the objectives of the study. Second, the method of data collection should not create any errors. For example, females may understate their age, so an appropriate data collection strategy has to be adopted, such as checking school-leaving or municipality certificates. Sometimes participatory learning methods, like focus group discussions, may help. Third, the sample size should be sufficiently large, say more than 30. The issues of random sampling can be addressed after facing the problems mentioned above.

Let us assume that both requirements mentioned above, namely representativeness and randomization, are met. What are the consequences for the OLS estimators if the two assumptions of the CLRM hold? We introduce the concept of 'bias' of the sample-based estimator to analyze the consequence. If we have the sample estimator β1* and its population counterpart β1, then bias is defined as follows:

Bias = E(β1*) − β1,

where E represents, of course, the expected value. If the bias is zero, the regression model gives a proper estimate of the effect of X on Y, just like EM. Will the OLS estimators become unbiased if the two assumptions mentioned above are met? The answer is no. Unbiasedness requires that the expected value of the sample estimator equal the population parameter, and, as the derivation below shows, a further assumption is needed for this.
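Before moving on, note that the two normal equations, being first-order conditions, hold mechanically for the residuals of any OLS fit, whatever the population looks like; the population assumptions about u are a separate matter. A minimal sketch (with an assumed data-generating process) verifies this:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = rng.uniform(0.0, 5.0, size=n)
Y = 1.0 + 0.8 * X + rng.normal(0.0, 1.0, size=n)  # assumed PRF for illustration

# Fit by OLS in deviation form
x = X - X.mean()
b1 = np.sum(x * Y) / np.sum(x ** 2)
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)  # SRF residuals, the e of equation (6)

# First-order conditions of least squares hold by construction:
print(np.sum(e))      # ~0 up to floating-point error
print(np.sum(X * e))  # ~0 up to floating-point error
```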
Now we note that

β1,OLS* = β1 + Σ(Xi − X̄) ui / Σ(Xi − X̄)² .................... (7)

This expression is easily derived by taking the deviation-from-the-mean form of the regression equation and using the result that the sum of deviations of X from its mean is zero. If we take expectations of both sides, the numerator of the second term on the right-hand side involves E(ui). From the Law of Iterated Expectations, E(ui) = E{E(ui | Xi, X1, X2, ..., XN)}. For E(ui) to be zero, which makes the estimator unbiased, we require that (a) E(ui | Xi) is independent of the other Xs like X1, X2, X3, etc., and (b) E(ui | Xi) = 0 for each level of X. This leads us to the third important assumption of the CLRM:

(3) The disturbance terms are identically and independently distributed (i.i.d.). Conditions (a) and (b) imply this. Randomization of course ensures the fulfilment of this condition through random assignment to the experimental groups. (A Monte Carlo sketch of this unbiasedness result appears at the end of the lecture.)

II. EXTENSIONS

A few concluding remarks before we wind up this lecture.

(1) So far we have assumed a single regressor, but we can have more than one explanatory variable. We may then have to ask: what is the effect of X1 on Y, holding the other variables constant? Multiple regression is perfectly capable of answering such questions; in fact, it is designed to do precisely such work. The coefficients of the Xs measure these partial impacts. Do we need an additional assumption now? Yes, one: the different explanatory variables must not really be the same variable, in the sense that one can be derived from another mathematically as a linear transformation. Technically, this is called the absence of perfect or deterministic multicollinearity. Under perfect multicollinearity, partial impacts are not defined at all. Non-linear relationships pose no problem, however, and neither does imperfect or stochastic multicollinearity.

(2) What about the variance of the OLS estimators? If there is the same (homo) variation (scedasticity) of the disturbance terms at each level of treatment, then OLS contains all the relevant information: it has the minimum variance among all linear unbiased estimators. This is for two reasons. First, the OLS weights ki are perfectly correlated with the explanatory variables, i.e., the Xs (Σkixi = 1). Second, the estimators are based on the principle of minimizing the sum of squared residuals. This homoscedasticity assumption is required to establish the Gauss-Markov Theorem.

(3) Time series econometrics requires further extensions.
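As promised above, here is a Monte Carlo sketch of the unbiasedness result in (7), with parameter values and the number of replications assumed purely for illustration: when the disturbances are i.i.d. and independent of X, the average of β1* over repeated samples settles at β1, so the estimated bias is approximately zero.

```python
import numpy as np

rng = np.random.default_rng(3)
beta1, n, reps = 0.5, 50, 5_000       # assumed values for the sketch

estimates = np.empty(reps)
for r in range(reps):
    X = rng.uniform(0.0, 10.0, size=n)
    u = rng.normal(0.0, 1.0, size=n)  # i.i.d. disturbances, independent of X
    Y = 2.0 + beta1 * X + u
    x = X - X.mean()
    # Decomposition (7): beta1* = beta1 + sum(x*u)/sum(x^2)
    estimates[r] = np.sum(x * Y) / np.sum(x ** 2)

print(estimates.mean() - beta1)       # Monte Carlo estimate of Bias: ~0
```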