# Problem Set 5

## ECN 140 Econometrics

Professor Oscar Jorda

## DUE: June 6, 2006

1) Earnings functions, whereby the log of earnings is regressed on years of education, years of on-the-job
training, and individual characteristics, have been studied for a variety of reasons. Some studies have
focused on the returns to education, others on discrimination, union and non-union differentials, etc. For
all these studies, a major concern has been the fact that ability should enter as a determinant of earnings,
but that it is close to impossible to measure and therefore represents an omitted variable.

Assume that the coefficient on years of education is the parameter of interest. Given that education is
positively correlated with ability, since, for example, more able students attract scholarships and hence
receive more years of education, the OLS estimator for the returns to education could be upward-biased.
To overcome this problem, various authors have used instrumental variables estimation techniques. For
each of the potential instruments listed below, briefly discuss instrument validity.

(d) Number of siblings the individual has.

2) The figure shows a plot and a fitted linear regression line of the age-earnings profile of 1,744 individuals,
taken from the Current Population Survey.

(a) Describe the problems in predicting earnings using the fitted line. What would the pattern of the
residuals look like for the age category under 40?

(b) What alternative functional form might fit the data better?

(c) What other variables might you want to consider in specifying the determinants of earnings?

3) Suggest a transformation in the variables that will linearize the deterministic part of the population
regression functions below. Write the resulting regression function in a form that can be estimated by
using OLS.

(a)

(b)

(c)

(d)

4) Your textbook gives the following example of simultaneous causality bias of a two equation system:

In microeconomics, you studied the demand and supply of goods in a single market. Let the demand
( ) and supply ( ) for the i-th good be determined as follows,

where P is the price of the good. In addition, you typically assume that the market clears.

Explain how the simultaneous causality bias applies in this situation. The textbook explained a positive
correlation between and for > 0 through an argument that started from "imagine that is
negative." Repeat this exercise here.
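
As a rough numerical illustration of the simultaneity at issue, the sketch below simulates a hypothetical linear market and regresses quantity on price. The functional forms, the parameter values, and the names `DEMAND_SLOPE`, `SUPPLY_SLOPE`, `u`, and `v` are assumptions made for this illustration; they are not the equations of the problem or of the textbook.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

# Hypothetical linear market (all numbers are assumptions for illustration):
#   demand: Q = 10 + DEMAND_SLOPE * P + u
#   supply: Q =  2 + SUPPLY_SLOPE * P + v
DEMAND_SLOPE, SUPPLY_SLOPE = -1.0, 1.0

rng = random.Random(0)
prices, quantities, demand_shocks = [], [], []
for _ in range(20000):
    u = rng.gauss(0, 1)  # demand shock
    v = rng.gauss(0, 1)  # supply shock
    # Market clearing: set demand equal to supply and solve for the price.
    p = (10 - 2 + u - v) / (SUPPLY_SLOPE - DEMAND_SLOPE)
    q = 2 + SUPPLY_SLOPE * p + v
    prices.append(p)
    quantities.append(q)
    demand_shocks.append(u)

# Price responds to the demand shock, so the regressor is correlated with the error.
cov_price_shock = cov(prices, demand_shocks)
# OLS slope from regressing quantity on price.
ols_slope = cov(prices, quantities) / cov(prices, prices)

print(f"cov(P, u) = {cov_price_shock:.3f}")
print(f"OLS slope = {ols_slope:.3f} (demand {DEMAND_SLOPE}, supply {SUPPLY_SLOPE})")
```

In this assumed setup the equilibrium price moves with the demand shock, so the OLS slope of quantity on price is a mixture of the two curves and recovers neither the demand slope nor the supply slope.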

1) (a) Instrument validity has two components: instrument relevance ($\operatorname{corr}(Z_i, X_i) \neq 0$) and instrument
exogeneity ($\operatorname{corr}(Z_i, u_i) = 0$). The individual's postal zip code will most likely be uncorrelated with the omitted
variable, ability, even though some zip codes may attract more able individuals. However, this is an example of a
weak instrument, since it is also essentially uncorrelated with years of education.

(b) There is instrument relevance in this case, since, on average, individuals who do well in intelligence scores or
other work-related test scores will have more years of education. Unfortunately there is bound to be a high
correlation with the omitted variable ability, since this is what these tests are supposed to measure.

(c) A non-zero correlation between the mother's or father's years of education and the individual's years of
education can be expected. Hence this is a relevant instrument. However, it is not clear that the parents' years of
education are uncorrelated with the parents' ability, which in turn can be a major determinant of the individual's
ability. If this is the case, then years of education of the mother or father is not a valid instrument.

(d) There is some evidence that the larger the number of siblings an individual has, the fewer years of
education the individual receives. Hence number of siblings is a relevant instrument. It has been argued that
number of siblings is uncorrelated with an individual's ability. In that case it is also an exogenous
instrument. However, there is the possibility that ability depends on the attention an individual receives from
parents, and this attention is shared with the other siblings, in which case exogeneity fails.
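
The logic behind these answers can be checked with a small simulation. The sketch below is a minimal illustration under assumed values: a true return to education of 0.08 (`TRUE_RETURN`), an unobserved `ability` term that raises both education and log earnings, and a hypothetical instrument `z` that is relevant and exogenous by construction. None of these numbers come from the problem set.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

TRUE_RETURN = 0.08  # assumed true return to one more year of education

rng = random.Random(0)
instrument, educ, log_earn = [], [], []
for _ in range(20000):
    ability = rng.gauss(0, 1)  # unobserved, so it ends up in the error term
    z = rng.gauss(0, 1)        # hypothetical instrument, drawn independently of ability
    e = 12 + z + ability + rng.gauss(0, 1)  # education depends on z and on ability
    y = 1 + TRUE_RETURN * e + 0.10 * ability + rng.gauss(0, 0.3)
    instrument.append(z)
    educ.append(e)
    log_earn.append(y)

# OLS slope of log earnings on education, with ability omitted.
beta_ols = cov(educ, log_earn) / cov(educ, educ)
# IV estimator with one instrument and one regressor: cov(Z, Y) / cov(Z, X).
beta_iv = cov(instrument, log_earn) / cov(instrument, educ)

print(f"true return = {TRUE_RETURN}")
print(f"OLS estimate = {beta_ols:.4f}")
print(f"IV estimate  = {beta_iv:.4f}")
```

Under these assumptions the OLS estimate lies above the true return, while the IV estimate is close to it. If `z` were instead drawn so that it is correlated with `ability` (as in cases (b) and (c)) or unrelated to education (as in case (a)), the IV estimate would no longer be reliable.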

2) (a) There would be many overpredictions for the under-40 age category, and hence more negative residuals there.

(b) It would be better to fit a quadratic here, i.e., a polynomial regression model, which would produce an
inverted U-shape.
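
A minimal sketch of the comparison follows, using simulated data rather than the Current Population Survey sample. The data-generating process (earnings peaking at age 45, in arbitrary units) and the helper names `solve`, `ols`, and `ssr` are assumptions for illustration. Age is centered at its sample mean to keep the normal equations well conditioned.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def ssr(rows, y, beta):
    """Sum of squared residuals for the fitted coefficients."""
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

rng = random.Random(1)
ages = [rng.uniform(20, 65) for _ in range(1744)]
# Assumed inverted-U profile: earnings peak at age 45 (arbitrary units).
earnings = [30 - 0.03 * (a - 45) ** 2 + rng.gauss(0, 4) for a in ages]

mean_age = sum(ages) / len(ages)
centered = [a - mean_age for a in ages]
x_lin = [[1.0, c] for c in centered]
x_quad = [[1.0, c, c * c] for c in centered]

beta_lin = ols(x_lin, earnings)
beta_quad = ols(x_quad, earnings)
ssr_lin = ssr(x_lin, earnings, beta_lin)
ssr_quad = ssr(x_quad, earnings, beta_quad)
# Age at which the fitted quadratic peaks, converted back from centered age.
peak_age = mean_age - beta_quad[1] / (2 * beta_quad[2])

print(f"coefficient on age^2 = {beta_quad[2]:.4f}")
print(f"SSR linear = {ssr_lin:.0f}, SSR quadratic = {ssr_quad:.0f}")
print(f"fitted peak age = {peak_age:.1f}")
```

A negative coefficient on the squared term is what produces the inverted U-shape, and the drop in the sum of squared residuals relative to the linear fit shows how much of the curvature the straight line misses in this simulated sample.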

(c) Answers will vary by student, but education, gender, race, tenure with an employer, occupation, and
ability are typically present in answers.

3) (a)

(b)

(c)

(d)
