
# Limited Dependent Variable Models

## Limited Dependent Variables

A limited dependent variable is a dependent variable whose range is restricted. For example:

- Any indicator variable, such as whether or not a household is poor (i.e., 0 or 1)
- Test scores (generally bounded by 0 and 100)
- The number of children born to a woman (a nonnegative integer)

## Outline

- Logit and probit models for binary dependent variables
- Tobit model for corner solutions

## Why Do We Care?

Let's start with a review of the linear probability model to examine some of its shortcomings. The model is given by:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u$$

where

$$P(y = 1 \mid x) = E(y \mid x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

## Linear Probability Model

There are three undesirable features of this model:

1. The error term is not homoskedastic. This violates assumption LMR.4. Our OLS estimates are still unbiased, but the standard errors are incorrect. Nonetheless, it is easy to adjust for heteroskedasticity of unknown form.
2. We can get predictions that are either greater than 1 or less than 0!
3. The independent variables cannot be linearly related to the dependent variable for all possible values.
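To illustrate point 2 concretely, here is a minimal sketch (synthetic data and hypothetical numbers, closed-form one-regressor OLS) showing fitted LPM probabilities escaping [0, 1]:

```python
# Linear probability model on synthetic data (illustrative sketch only).
# With a binary y and a linear fit, predictions at extreme x leave [0, 1].

def ols_slope_intercept(x, y):
    """Closed-form OLS for a single regressor: y = b0 + b1*x + u."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical data: y = 1 ("participates") becomes more likely as x grows.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = ols_slope_intercept(x, y)
fitted = [b0 + b1 * xi for xi in x]
print(min(fitted), max(fitted))  # the extremes fall outside [0, 1]
```

Here the fitted value at x = 0 is negative and the fitted value at x = 9 exceeds 1, even though both are supposed to be probabilities.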

## Linear Probability Model Example

Let's look at how being in the labour force is influenced by various determinants:

- Husband's earnings
- Years of education
- Previous labour market experience
- Age
- Number of children less than 6 years old
- Number of children between 6 and 18 years of age

## Linear Probability Model Example

| Variable | Coefficient estimate | Usual standard errors | Robust standard errors |
|---|---|---|---|
| Husband's income | -0.0034 | 0.0014 | 0.0015 |
| Years of education | 0.038 | 0.007 | 0.007 |
| Experience | 0.039 | 0.006 | 0.006 |
| Experience² | -0.00060 | 0.00018 | 0.00019 |
| Age | -0.016 | 0.002 | 0.002 |
| # kids ≤ 6 years old | -0.262 | 0.034 | 0.032 |
| # kids > 6 years old | 0.013 | 0.013 | 0.014 |

## Linear Probability Model Example

Using standard errors that are robust to heteroskedasticity of unknown form is simple and does not substantially change the reported standard errors. Interpreting the coefficients:

- All else equal, an extra year of education increases the probability of participating in the labour force by 0.038 (3.8 percentage points)
- All else equal, an additional child 6 years of age or less decreases the probability of working by 0.262

## Linear Probability Model Example

Predicted probabilities: sometimes we obtain predicted probabilities that are outside of the range [0,1]. In this sample, 33 of the 753 observations produce predicted probabilities outside of [0,1]. For example, consider the following observation:

- Husband's earnings = 17.8
- Years of education = 17
- Previous labour market experience = 15
- Age = 32
- Number of children less than 6 years old = 0
- Number of children between 6 and 18 years of age = 1

The predicted probability is 1.13!

## Linear Probability Model Example

An additional problem is that probabilities cannot be linearly related to the independent variables for all possible values. For example, consider the estimated marginal effect of increasing the number of children 6 years of age or younger, which is -0.262. This means that if this variable increased from 0 to 4, the probability of being in the labour market would fall by 4 × 0.262 = 1.048, which is impossible!

## Linear Probability Model

It is still a useful model to estimate, especially since the estimated coefficients are much easier to interpret than those of the nonlinear models that we are going to introduce shortly. Plus, it usually works well for values of the independent variables that are close to their respective means (i.e., outlying values of x cause problems).

## Limited Dependent Variables Models

In this lecture we're going to cover estimation techniques that better address the nature of the dependent variable:

- Logit and probit
- Tobit

## Logit and Probit Models for Binary Response

We're going to prevent predicted values from ever falling outside the range [0,1] by estimating a nonlinear regression:

$$P(y = 1 \mid x) = G(\beta_0 + x\beta)$$

where $0 < G(z) < 1$ for all real numbers $z$. The two most commonly used functions for $G(\cdot)$ give the logit and probit models:

$$G(z) = \frac{\exp(z)}{1 + \exp(z)} = \Lambda(z) \quad \text{(logit)}, \qquad G(z) = \Phi(z) \quad \text{(probit)}$$

where $\Phi(\cdot)$ is the standard normal cdf.
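The two link functions can be written directly; a sketch using only the standard library (the normal cdf via the error function):

```python
import math

# The two standard choices of G(z) for binary response models:
# logit uses the logistic cdf, probit the standard normal cdf.

def logit_cdf(z):
    """Lambda(z) = exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

def probit_cdf(z):
    """Phi(z), the standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Both map any real z into (0, 1), so fitted probabilities stay in range.
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    lz, pz = logit_cdf(z), probit_cdf(z)
    assert 0.0 < lz < 1.0 and 0.0 < pz < 1.0
    # symmetry about zero: 1 - G(-z) = G(z)
    assert abs((1.0 - logit_cdf(-z)) - lz) < 1e-12

print(round(logit_cdf(0.0), 3), round(probit_cdf(0.0), 3))  # 0.5 0.5
```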

## Logit and Probit Models for Binary Response

Logit and probit models can be derived from an underlying latent variable model, where the latent variable $y^*$ is unobserved:

$$y^* = \beta_0 + x\beta + e, \qquad y = 1[y^* > 0]$$

We assume that $e$ is independent of $x$ and that $e$ has either the standard logistic distribution or the standard normal distribution. Under either assumption $e$ is symmetrically distributed about 0, which implies that $1 - G(-z) = G(z)$ for all real numbers $z$.

## Logit and Probit Models for Binary Response

We can now derive the response probability for $y$:

$$\begin{aligned}
P(y = 1 \mid x) &= P(y^* > 0 \mid x) = P(\beta_0 + x\beta + e > 0 \mid x) \\
&= P\big(e > -(\beta_0 + x\beta) \mid x\big) = 1 - G\big({-(\beta_0 + x\beta)}\big) = G(\beta_0 + x\beta)
\end{aligned}$$

## Logit and Probit Models for Binary Response

In most applications of binary response models our main interest is in explaining the effects of the x's on the response probability P(y=1|x). The latent variable interpretation tends to give the impression that we are interested in the effects of the x's on y*. For probit and logit models, the direction of the effect of the x's on E(y*|x) and on E(y|x) = P(y=1|x) is the same. In most applications, however, the latent variable does not have a well-defined unit of measurement, which limits its interpretation. Nonetheless, in some examples it is a very useful tool for thinking about the problem.

## Logit and Probit Models for Binary Response

The sign of a coefficient tells us the direction of the partial effect of $x_j$ on $P(y=1\mid x)$. However, unlike in the linear probability model, the magnitudes of the coefficients are not especially useful. If $x_j$ is a roughly continuous variable, its partial effect is given by:

$$\frac{\partial p(x)}{\partial x_j} = g(\beta_0 + x\beta)\,\beta_j, \qquad \text{where } g(z) = \frac{dG(z)}{dz}$$

## Logit and Probit Models for Binary Response

In the linear probability model the derivative of G was simply 1, since G(z) = z; in other words, we can move from this nonlinear function back to the linear model by assuming G(z) = z. For both the logit and the probit models, g(z) = dG(z)/dz is always positive (since G is a cumulative distribution function, g is the corresponding probability density function). Thus, the sign of β_j is the same as the sign of the partial effect. The magnitude of the partial effect, however, is influenced by the entire vector of x's.

## Logit and Probit Models for Binary Response

Nonetheless, the relative effect of any two continuous explanatory variables does not depend on x: the ratio of the partial effects for x_j and x_h is β_j/β_h.

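A sketch of both facts for a logit model with hypothetical coefficients: the scale factor g depends on where x is evaluated, but the ratio of two partial effects is β1/β2 everywhere:

```python
import math

# Partial effects in a logit model: dP/dx_j = g(xb) * b_j, where g is the
# logistic density g(z) = G(z) * (1 - G(z)). All numbers are hypothetical.

def logistic_pdf(z):
    G = math.exp(z) / (1.0 + math.exp(z))
    return G * (1.0 - G)

b0, b1, b2 = -1.0, 0.8, -0.5   # hypothetical intercept and two slopes
x1, x2 = 2.0, 1.0              # a particular point x at which to evaluate

z = b0 + b1 * x1 + b2 * x2
pe1 = logistic_pdf(z) * b1     # partial effect of x1 at this x
pe2 = logistic_pdf(z) * b2     # partial effect of x2 at this x

# The scale factor g(z) changes with x, but the ratio of partial effects
# for two continuous regressors is b1/b2 at every x.
assert abs(pe1 / pe2 - b1 / b2) < 1e-12
print(round(pe1, 4), round(pe2, 4))
```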

## Logit and Probit Models for Binary Response

Suppose $x_1$ is a discrete variable; the partial effect of its going from $c$ to $c+1$ is given by:

$$G\big(\beta_0 + \beta_1(c+1) + \beta_2 x_2 + \cdots + \beta_k x_k\big) - G\big(\beta_0 + \beta_1 c + \beta_2 x_2 + \cdots + \beta_k x_k\big)$$

Again, this effect depends on x. Note, however, that the sign of β₁ is enough to know whether the discrete variable has a positive or negative effect, because G(·) is strictly increasing.

## Logit and Probit Models for Binary Response

We use maximum likelihood estimation, which already takes into account the heteroskedasticity inherent in the model. Assume that we have a random sample of size $n$. To obtain the maximum likelihood estimator, conditional on the explanatory variables, we need the density of $y_i$ given $x_i$:

$$f(y \mid x_i; \beta) = [G(x_i\beta)]^{y}\,[1 - G(x_i\beta)]^{1-y}, \qquad y = 0, 1$$

## Logit and Probit Models for Binary Response

When $y=1$: $f(y \mid x_i; \beta) = G(x_i\beta)$. When $y=0$: $f(y \mid x_i; \beta) = 1 - G(x_i\beta)$. The log-likelihood function for observation $i$ is given by:

$$\ell_i(\beta) = y_i \log G(x_i\beta) + (1 - y_i)\log\big[1 - G(x_i\beta)\big]$$

The log-likelihood for a sample of size $n$ is obtained by summing this expression over all observations:

$$\mathcal{L}(\beta) = \sum_{i=1}^{n} \ell_i(\beta)$$
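As a sketch of how this maximization works in practice, the logit log-likelihood can be coded and maximized directly with Newton's method; the data and model (one regressor plus intercept) are purely illustrative:

```python
import math

# Maximizing the logit log-likelihood l(b) = sum y*log G + (1-y)*log(1-G)
# by Newton's method, for one regressor plus an intercept (a sketch on
# synthetic data; no claim about any particular dataset).

def G(z):
    return 1.0 / (1.0 + math.exp(-z))

def loglik(b0, b1, xs, ys):
    return sum(y * math.log(G(b0 + b1 * x)) +
               (1 - y) * math.log(1.0 - G(b0 + b1 * x))
               for x, y in zip(xs, ys))

def newton_logit(xs, ys, steps=25):
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # negative Hessian
        for x, y in zip(xs, ys):
            p = G(b0 + b1 * x)
            r = y - p
            w = p * (1.0 - p)
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: H^{-1} * grad
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
b0, b1 = newton_logit(xs, ys)
# the MLE beats nearby parameter values in log-likelihood
assert loglik(b0, b1, xs, ys) >= loglik(b0 + 0.1, b1, xs, ys)
assert loglik(b0, b1, xs, ys) >= loglik(b0, b1 - 0.1, xs, ys)
```

In practice, of course, the maximization is done by the software (Stata's `logit`/`probit`); the point of the sketch is that the objective being maximized is exactly the sum above.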

## Logit and Probit Models for Binary Response

The MLE of β maximizes this log-likelihood. If G is the standard logistic cdf, we get the logit estimator; if G is the standard normal cdf, we get the probit estimator. Under general conditions, the MLE is:

- Consistent
- Asymptotically normal
- Asymptotically efficient

## Inference in Probit and Logit Models

Standard regression software, such as Stata, will automatically report asymptotic standard errors for the coefficients. This means we can construct (asymptotic) t-tests for statistical significance in the usual way:

$$t_{\hat\beta_j} = \frac{\hat\beta_j}{se(\hat\beta_j)}$$

## Logit and Probit Models for Binary Response: Testing Multiple Hypotheses

We can also test multiple exclusion restrictions (i.e., that two or more regression parameters are equal to 0). There are two commonly used options:

- A Wald test
- A likelihood ratio test

## Logit and Probit Models for Binary Response: Testing Multiple Hypotheses

Wald test:

- In the linear model, the Wald statistic can be transformed to be essentially the same as the F statistic
- The formula can be found in Wooldridge (2002, Chapter 15)
- It has an asymptotic chi-squared distribution, with degrees of freedom equal to the number of restrictions being tested
- In Stata we can use the test command following probit or logit estimation

## Logit and Probit Models for Binary Response: Testing Multiple Hypotheses

Likelihood ratio (LR) test:

- If both the restricted and unrestricted models are easy to estimate (as is the case when testing exclusion restrictions), the LR test is very attractive
- It is based on the difference in the log-likelihood functions for the restricted and unrestricted models
- Because the MLE maximizes the log-likelihood function, dropping variables generally leads to a smaller log-likelihood (much in the same way that dropping variables in a linear model leads to a smaller R²)

The likelihood ratio statistic is given by:

$$LR = 2\,(\mathcal{L}_{ur} - \mathcal{L}_{r})$$

It is asymptotically chi-squared with degrees of freedom equal to the number of restrictions. We can use the lrtest command in Stata.
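A minimal numeric sketch with hypothetical log-likelihood values; for q = 2 restrictions the chi-squared tail probability happens to have a simple closed form:

```python
import math

# Likelihood ratio test sketch: LR = 2 * (L_ur - L_r), compared against a
# chi-squared critical value. The log-likelihood values are hypothetical.

L_ur = -401.77   # unrestricted model (hypothetical)
L_r  = -405.02   # restricted model, 2 variables dropped (hypothetical)

LR = 2.0 * (L_ur - L_r)

# For q = 2 restrictions the chi-squared survival function has the
# closed form P(X > c) = exp(-c / 2); general q needs a chi2 routine.
p_value = math.exp(-LR / 2.0)

print(round(LR, 2), round(p_value, 4))  # → 6.5 0.0388
```

Since 6.5 exceeds the 5% critical value of 5.99 for 2 degrees of freedom, the two restrictions would be rejected at the 5% level in this hypothetical example.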

## Logit and Probit Models for Binary Response: Interpreting Probit and Logit Estimates

Recall that, unlike in the linear probability model, the estimated coefficients from probit or logit estimation do not tell us the magnitude of the partial effect of a change in an independent variable on the predicted probability. The partial effect depends not just on the coefficient estimate for that variable, but also on the values of all the independent variables and the other coefficients.

## Logit and Probit Models for Binary Response: Interpreting Probit and Logit Estimates

For roughly continuous variables the marginal effect is approximately:

$$\frac{\partial P(y = 1 \mid x)}{\partial x_j} \approx g\big(\hat\beta_0 + x\hat\beta\big)\,\hat\beta_j$$

For discrete variables the estimated change in the predicted probability is given by:

$$G\big(\hat\beta_0 + \hat\beta_1(c+1) + \hat\beta_2 x_2 + \cdots + \hat\beta_k x_k\big) - G\big(\hat\beta_0 + \hat\beta_1 c + \hat\beta_2 x_2 + \cdots + \hat\beta_k x_k\big)$$

## Logit and Probit Models for Binary Response: Interpreting Probit and Logit Estimates

Thus, we need to pick interesting values of x at which to evaluate the partial effects:

- Often the sample averages are used; this gives the partial effect at the average (PEA)
- We could also use lower or upper quartiles, for example, to see how the partial effects change as some elements of x get large or small
- If x_k is a binary variable, it often makes sense to use a value of 0 or 1 in the partial effect equation, rather than the average value of x_k

## Logit and Probit Models for Binary Response: Interpreting Probit and Logit Estimates

An alternative approach is to calculate the average partial effect (APE). For a continuous explanatory variable $x_j$, the APE is:

$$n^{-1}\sum_{i=1}^{n} g\big(\hat\beta_0 + x_i\hat\beta\big)\,\hat\beta_j = \left[\,n^{-1}\sum_{i=1}^{n} g\big(\hat\beta_0 + x_i\hat\beta\big)\right]\hat\beta_j$$

The two scale factors (at the mean for the PEA, averaged over the sample for the APE) differ because the first uses a nonlinear function of the average and the second uses the average of a nonlinear function.
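The PEA/APE distinction can be sketched for a one-regressor logit with hypothetical coefficients; because g is nonlinear, g evaluated at the mean of x differs from the average of g over the sample:

```python
import math

# PEA vs APE for a logit model with one regressor (hypothetical numbers).
# PEA: scale factor g evaluated at the sample mean of x.
# APE: scale factor g averaged over the sample.

def g(z):
    """Logistic density g(z) = G(z) * (1 - G(z))."""
    G = 1.0 / (1.0 + math.exp(-z))
    return G * (1.0 - G)

b0, b1 = -2.0, 0.5                     # hypothetical estimates
xs = [0.0, 1.0, 2.0, 6.0, 10.0, 14.0]  # skewed sample of x values

xbar = sum(xs) / len(xs)
pea = g(b0 + b1 * xbar) * b1
ape = sum(g(b0 + b1 * x) for x in xs) / len(xs) * b1

# g is nonlinear, so g(mean of x) != mean of g(x): the measures differ.
print(round(pea, 4), round(ape, 4))  # → 0.1089 0.0583
assert pea != ape
```

With this skewed sample the PEA is nearly double the APE, which is exactly the kind of gap the slide warns about.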

## Example 17.1: Married Women's Labour Force Participation

We are going to use the data in MROZ.RAW to estimate a labour force participation model for women using logit and probit estimation. The explanatory variables are nwifeinc, educ, exper, expersq, age, kidslt6, and kidsge6:

probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

## Example 17.1

| Independent variable | OLS (robust s.e.) | Probit | Logit |
|---|---|---|---|
| Husband's income | -0.0034 (0.0015) | -0.012 (0.005) | -0.021 (0.008) |
| Years of education | 0.038 (0.007) | 0.131 (0.025) | 0.221 (0.043) |
| Age | -0.016 (0.002) | -0.053 (0.008) | -0.088 (0.014) |
| # kids ≤ 6 years old | -0.262 (0.032) | -0.868 (0.119) | -1.44 (0.20) |
| # kids > 6 years old | 0.013 (0.014) | 0.036 (0.043) | 0.060 (0.075) |

## Example 17.1

True or false: the probit and logit model estimates suggest that the linear probability model was underestimating the negative impact of having young children on the probability of women participating in the labour force.

## Example 17.1

How does the predicted probability change as the number of young children increases from 0 to 1? What about from 1 to 2? We'll evaluate the effects at:

- Husband's income = 20.13
- Education = 12.3
- Experience = 10.6
- Age = 42.5
- # older children = 1

These are all close to the sample averages.

## Example 17.1

From the probit estimates:

- Going from 0 to 1 small children decreases the probability of labour force participation by 0.334
- Going from 1 to 2 small children decreases the probability of labour force participation by 0.256

Notice that the impact of one extra child is now nonlinear (there is a diminishing impact). This differs from the linear probability model, which says that any increase of one young child has the same impact.

## Logit and Probit Models for Binary Response

Similar to linear models, we have to be concerned with endogenous explanatory variables. We don't have time to cover this, so see Wooldridge (2002, Chapter 15) for a discussion. We also need to be concerned with heteroskedasticity in probit and logit models: if Var(e|x) depends on x, the response probability no longer has the form G(β₀ + xβ), implying that more general estimation techniques are required. The linear probability model can be applied to panel data, typically estimated using fixed effects; logit and probit models with unobserved effects are difficult to estimate and interpret (see Wooldridge (2002, Chapter 15)).

## The Tobit Model for Corner Solution Responses

Often in economics we observe variables for which 0 (or some other fixed number) is the optimal outcome for some units of observation, while a range of positive outcomes prevails for other observations. For example:

- Number of hours worked annually
- Trade flows
- Hours spent on the internet
- Grade on a test (may be grouped at both 0 and 100)

## The Tobit Model for Corner Solution Responses

Let y be a variable that is roughly continuous over strictly positive values but takes on zero with positive probability. As in the binary dependent variable context, we could use a linear model, and this might not be so bad for observations close to the mean, but we may obtain negative fitted values and therefore negative predictions for y.

## The Tobit Model for Corner Solution Responses

We often express the observed outcome, y, in terms of an unobserved latent variable, say y*:

$$y^* = x\beta + u, \qquad u \mid x \sim N(0, \sigma^2)$$

$$y = \max(0, y^*)$$

We now need to think about how to estimate this model. There are two cases to consider:

- When y = 0
- When y > 0

## The Tobit Model for Corner Solution Responses

Let's start with how we'd incorporate y = 0. What is the probability that y = 0, conditional on the explanatory variables?

$$P(y = 0 \mid x) = P(y^* < 0 \mid x) = P(x\beta + u < 0 \mid x) = P(u < -x\beta \mid x) = P\!\left(\frac{u}{\sigma} < -\frac{x\beta}{\sigma} \,\Big|\, x\right) = \Phi\!\left(-\frac{x\beta}{\sigma}\right) = 1 - \Phi\!\left(\frac{x\beta}{\sigma}\right)$$

The first equality uses the definition of y; the second uses the definition of y*.

## The Tobit Model for Corner Solution Responses

What is the probability that y > 0, conditional on the explanatory variables? Since y is continuous for values greater than 0, this part of the likelihood is simply the (appropriately scaled) density of the normal variable u. We can now put together these two pieces to form the log-likelihood function for the Tobit model (see equation 17.22 in Wooldridge).
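Putting the two pieces together, a sketch of the Tobit log-likelihood in the form of Wooldridge's equation 17.22, on toy data with hypothetical parameter values:

```python
import math

# Tobit log-likelihood sketch: for y = 0 the contribution is
# log[1 - Phi(x*b / sigma)]; for y > 0 it is the log of the normal
# density of y around x*b. All numbers here are hypothetical.

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(b, sigma, xs, ys):
    ll = 0.0
    for x, y in zip(xs, ys):
        xb = b[0] + b[1] * x
        if y == 0:
            # censored observation: probability mass at zero
            ll += math.log(1.0 - Phi(xb / sigma))
        else:
            # uncensored observation: normal density of y given x
            z = (y - xb) / sigma
            ll += -math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z
    return ll

# toy corner-solution data: several observations piled up at zero
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.0, 0.5, 1.8, 2.9]
print(round(tobit_loglik((-1.0, 1.0), 1.0, xs, ys), 3))  # → -3.773
```

The MLE for the Tobit model maximizes this sum over (β, σ); standard packages do the maximization for us.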

## Interpreting Tobit estimates

Given standard regression packages, it is straightforward to estimate a Tobit model by maximum likelihood (the details of the formulation are available in Wooldridge (2002, Chapter 16)). The underlying model tells us that β_j measures the partial effect of x_j on y*, the latent variable. However, we're usually interested in the observed outcome y, not y*. In the Tobit model two conditional expectations are generally of interest:

- E(y | y > 0, x)
- E(y | x)

## Interpreting Tobit estimates

$$E(y \mid y > 0, x) = x\beta + \sigma\,\lambda(x\beta/\sigma)$$

$$E(y \mid x) = \Phi(x\beta/\sigma)\,x\beta + \sigma\,\phi(x\beta/\sigma)$$

where λ(z) = φ(z)/Φ(z) is the inverse Mills ratio. Take-home message: conditional expectations in the Tobit model are much more complicated than in the linear model. E(y|x) is a nonlinear function of both x and β. Moreover, this conditional expectation can be shown to be positive for any values of x and β.
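These two expectations can be coded directly; the index and sigma values below are hypothetical:

```python
import math

# The two Tobit conditional expectations, using the inverse Mills ratio
# lambda(z) = phi(z) / Phi(z). The parameter values below are hypothetical.

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def e_y_given_pos(xb, sigma):
    """E(y | y > 0, x) = xb + sigma * lambda(xb / sigma)."""
    z = xb / sigma
    return xb + sigma * phi(z) / Phi(z)

def e_y(xb, sigma):
    """E(y | x) = Phi(xb / sigma) * xb + sigma * phi(xb / sigma)."""
    z = xb / sigma
    return Phi(z) * xb + sigma * phi(z)

xb, sigma = 1.0, 2.0  # hypothetical index value x*b and sigma
# Both expectations are positive, and E(y|x) = Phi(xb/sigma) * E(y|y>0,x).
assert e_y(xb, sigma) > 0 and e_y_given_pos(xb, sigma) > 0
assert abs(e_y(xb, sigma) - Phi(xb / sigma) * e_y_given_pos(xb, sigma)) < 1e-12
```

The final assertion checks the identity E(y|x) = P(y>0|x) · E(y|y>0,x), which ties the two expressions together.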

## Interpreting Tobit estimates

To examine partial effects, we should consider two cases: when x_j is continuous and when x_j is discrete. When x_j is continuous we can use calculus to derive the partial effects:

$$\frac{\partial E(y \mid y > 0, x)}{\partial x_j} = \beta_j\left\{1 - \lambda(x\beta/\sigma)\left[\frac{x\beta}{\sigma} + \lambda(x\beta/\sigma)\right]\right\}$$

$$\frac{\partial E(y \mid x)}{\partial x_j} = \beta_j\,\Phi(x\beta/\sigma)$$

As in probit and logit models, the partial effect depends on all explanatory variables and parameters.

## Interpreting Tobit estimates

When x_j is discrete we estimate the partial effect as the difference:

$$E(y \mid y > 0, x_{-j}, x_j = c + 1) - E(y \mid y > 0, x_{-j}, x_j = c)$$

$$E(y \mid x_{-j}, x_j = c + 1) - E(y \mid x_{-j}, x_j = c)$$

where $x_{-j}$ denotes the other explanatory variables.

## Interpreting Tobit estimates

Just like the probit and logit models, there are two common approaches for evaluating the partial effects:

- Partial effect at the average (PEA): evaluate the expressions at the sample averages of the explanatory variables
- Average partial effect (APE): calculate the mean of the partial effects over the entire sample

## Example 17.2: Womens annual labour supply

We can use the same dataset, MROZ.RAW, that we used to estimate the probability of women participating in the labour force to estimate the impact of various explanatory variables on the total number of hours worked. Of the 753 women in the sample:

- 428 worked for a wage during the year
- 325 worked zero hours in the labour market

## Tobit example: Womens annual labour supply

reg hours nwifeinc educ exper expersq age kidslt6 kidsge6 tobit hours nwifeinc educ exper expersq age kidslt6 kidsge6, ll(0)


## Tobit example: Womens annual labour supply

| Variable | OLS | Tobit |
|---|---|---|
| Husband's income | -3.45 (2.54) | -8.81 (4.46) |
| Years of education | 28.76 (12.95) | 80.65 (21.58) |
| Age | -30.51 (4.36) | -54.41 (7.42) |
| # kids ≤ 6 years old | -442.09 (58.85) | -894.02 (111.88) |
| # kids > 6 years old | -32.78 (23.18) | -16.22 (38.64) |
| Sigma | | 1122.02 (41.58) |

## Tobit example: Womens annual labour supply

The Tobit coefficient estimates all have the same sign as the OLS coefficients, and the pattern of statistical significance is also very similar. Remember, though, that we cannot directly compare the OLS and Tobit coefficients in terms of their effect on hours worked.

## Tobit example: Womens annual labour supply

Let's construct some marginal effects for some of the discrete variables. First, the means of the explanatory variables:

- Husband's income: 20.12896
- Education: 12.28685
- Experience: 10.63081
- Age: 42.53785
- # young children: 0.2377158
- # older children: 1.353254

## Tobit example: Womens annual labour supply

Recall the formula:

$$E(y \mid x) = \Phi(x\beta/\sigma)\,x\beta + \sigma\,\phi(x\beta/\sigma)$$

We can use this to answer the following question: what is the impact of moving from 0 to 1 young children on the total number of hours worked? We'll evaluate it for a hypothetical person close to the mean values:

- Husband's income: 20.12896
- Education: 12
- Experience: 11
- Age: 43
- # older children: 1

## Tobit example: Womens annual labour supply

- xβ(#young=0, means) = 624.64; xβ(#young=1, means) = -269.38
- xβ/σ: 0.5567 for #young=0; -0.2401 for #young=1
- φ(xβ/σ): 0.3417 for #young=0; 0.3876 for #young=1
- Φ(xβ/σ): 0.7111 for #young=0; 0.4051 for #young=1

## Tobit example: Womens annual labour supply

- E(y | #young=0, means) = 827.6
- E(y | #young=1, means) = 325.8
- E(y | #young=0, means) - E(y | #young=1, means) = 502

Thus, for a hypothetical average woman, going from 0 young children to 1 young child would decrease hours worked by 502 hours. This is larger than the OLS estimate of a 442-hour decrease. We could do the same thing to look at the impact of adding a second young child.
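These numbers can be reproduced from the E(y|x) formula and the reported Tobit estimates:

```python
import math

# Reproducing the worked example with the reported Tobit estimates:
# x*b at 0 young children = 624.64, at 1 young child = -269.38,
# and sigma = 1122.02.

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def e_hours(xb, sigma):
    """E(y | x) = Phi(xb/sigma) * xb + sigma * phi(xb/sigma)."""
    z = xb / sigma
    return Phi(z) * xb + sigma * phi(z)

sigma = 1122.02
e0 = e_hours(624.64, sigma)    # expected hours with 0 young children
e1 = e_hours(-269.38, sigma)   # expected hours with 1 young child
print(round(e0, 1), round(e1, 1), round(e0 - e1))  # → 827.6 325.8 502
```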

## Specification Issues

The Tobit model relies on the assumptions of normality and homoskedasticity in the latent variable model. Recall that with OLS we did not need to assume a distributional form for the error term in order to obtain unbiased (or consistent) estimates of the parameters. Thus, although the Tobit model may provide a more realistic description of the data (for example, no negative predicted values), we have to make stronger assumptions than with OLS. In a Tobit model, if any of these assumptions fail, it is hard to know what the estimated coefficients mean.

## Specification Issues

One important limitation of Tobit models is that the expectation of y, conditional on a positive value, is closely linked to the probability that y > 0. The effect of x_j on P(y>0|x) is proportional to β_j, as is the effect on E(y|y>0,x); moreover, in both expressions the factor multiplying β_j is positive. Thus, if you want a model where an explanatory variable has opposite effects on P(y>0|x) and E(y|y>0,x), the Tobit model is inappropriate. One way to informally evaluate a Tobit model is to estimate a probit model where:

- w = 1 if y > 0
- w = 0 if y = 0

## Specification Issues

The coefficient on x_j in this probit model, say γ_j, is directly related to the coefficient on x_j in the Tobit model, β_j:

$$\gamma_j = \beta_j / \sigma$$

Thus, we can check whether the estimated values differ. For example, if the estimates differ in sign, this may suggest that the Tobit model is inappropriate.

## Specification Issues: Annual hours worked example

From our previous examples, we estimated the probit coefficient on the number of young children to be -0.868. In the Tobit model, we estimated β_j/σ = -0.797 for the same variable. This is not a very large difference, but it suggests that having a young child affects the initial labour force participation decision more than it affects how many hours a woman works once she is in the labour force. The Tobit model effectively averages these two effects:

- The impact on the probability of working
- The impact on the number of hours worked, conditional on working
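The ratio itself is a one-line check using the estimates reported in the Tobit table:

```python
# Informal specification check from the text: the Tobit coefficient on the
# number of young children, divided by sigma, should be comparable to the
# corresponding probit coefficient.

beta_tobit = -894.02    # Tobit coefficient on # kids <= 6
sigma = 1122.02         # Tobit sigma estimate
gamma_probit = -0.868   # probit coefficient from Example 17.1

ratio = beta_tobit / sigma
print(round(ratio, 3))  # → -0.797, reasonably close to -0.868
```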

## Specification Issues

If we find evidence that the Tobit model is inappropriate, we can use hurdle or two-part models. These models have the feature that P(y>0|x) and E(y|y>0,x) depend on different parameters, so x_j can have dissimilar effects on the two functions (see Wooldridge (2002, Chapter 16)).

## Practice questions

17.2, 17.3, C17.1, C17.2, C17.3

## Computer Exercise C17.2

Use the data in LOANAPP.RAW for this exercise. Estimate a probit model of approve on white. Find the estimated probability of loan approval for both whites and nonwhites. How do these compare to the linear probability model estimates?

probit approve white
regress approve white

## Computer Exercise C17.2

| | Probit | LPM |
|---|---|---|
| white | 0.784 (0.087) | 0.201 (0.020) |
| constant | 0.547 (0.075) | 0.708 (0.018) |

As there is only one explanatory variable, and it takes only two values, there are only two different predicted probabilities: the estimated loan approval probabilities for white and for nonwhite applicants. Hence the predicted probabilities, whether we use probit, logit, or the LPM, are simply the cell frequencies:

- 0.708 for nonwhite applicants
- 0.908 for white applicants
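We can verify that the probit fitted probabilities reproduce these cell frequencies using the estimates in the table:

```python
import math

# Checking that the probit fitted probabilities equal the cell frequencies:
# Phi(constant) for nonwhites, Phi(constant + coefficient) for whites.

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

const, b_white = 0.547, 0.784   # probit estimates from the table above

p_nonwhite = Phi(const)
p_white = Phi(const + b_white)
print(round(p_nonwhite, 3), round(p_white, 3))  # → 0.708 0.908
```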

## Computer Exercise C17.2

We can do this in Stata using the following commands after the probit estimation:

predict phat
summarize phat if white==1
summarize phat if white==0

## Computer Exercise C17.2

Now add the variables hrat, obrat, loanprc, unem, male, married, dep, sch, cosign, chist, pubrec, mortlat1, mortlat2, and vr to the probit model. Is there statistically significant evidence of discrimination against nonwhites?


## Computer Exercise C17.2

Probit estimates (dependent variable: approve):

| Variable | Coef. | Std. Err. | z | P>\|z\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| white | .5202525 | .0969588 | 5.37 | 0.000 | .3302168, .7102883 |
| hrat | .0078763 | .0069616 | 1.13 | 0.258 | -.0057682, .0215209 |
| obrat | -.0276924 | .0060493 | -4.58 | 0.000 | -.0395488, -.015836 |
| loanprc | -1.011969 | .2372396 | -4.27 | 0.000 | -1.47695, -.5469881 |
| unem | -.0366849 | .0174807 | -2.10 | 0.036 | -.0709464, -.0024234 |
| male | -.0370014 | .1099273 | -0.34 | 0.736 | -.2524549, .1784521 |
| married | .2657469 | .0942523 | 2.82 | 0.005 | .0810159, .4504779 |
| dep | -.0495756 | .0390573 | -1.27 | 0.204 | -.1261266, .0269753 |
| sch | .0146496 | .0958421 | 0.15 | 0.879 | -.1731974, .2024967 |
| cosign | .0860713 | .2457509 | 0.35 | 0.726 | -.3955917, .5677343 |
| chist | .5852812 | .0959715 | 6.10 | 0.000 | .3971805, .7733818 |
| pubrec | -.7787405 | .12632 | -6.16 | 0.000 | -1.026323, -.5311578 |
| mortlat1 | -.1876237 | .2531127 | -0.74 | 0.459 | -.6837153, .308468 |
| mortlat2 | -.4943562 | .3265563 | -1.51 | 0.130 | -1.134395, .1456823 |
| vr | -.2010621 | .0814934 | -2.47 | 0.014 | -.3607862, -.041338 |
| _cons | 2.062327 | .3131763 | 6.59 | 0.000 | 1.448512, 2.676141 |

## Computer Exercise C17.2

Estimate the previous model by logit. Compare the coefficient on white to the probit estimate.


## Computer Exercise C17.2

Logit estimates (dependent variable: approve):

| Variable | Coef. | Std. Err. | z | P>\|z\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| white | .9377643 | .1729041 | 5.42 | 0.000 | .5988784, 1.27665 |
| hrat | .0132631 | .0128802 | 1.03 | 0.303 | -.0119816, .0385078 |
| obrat | -.0530338 | .0112803 | -4.70 | 0.000 | -.0751427, -.0309249 |
| loanprc | -1.904951 | .4604412 | -4.14 | 0.000 | -2.807399, -1.002503 |
| unem | -.0665789 | .0328086 | -2.03 | 0.042 | -.1308825, -.0022753 |
| male | -.0663852 | .2064288 | -0.32 | 0.748 | -.4709781, .3382078 |
| married | .5032817 | .177998 | 2.83 | 0.005 | .1544121, .8521513 |
| dep | -.0907336 | .0733341 | -1.24 | 0.216 | -.2344657, .0529986 |
| sch | .0412287 | .1784035 | 0.23 | 0.817 | -.3084356, .3908931 |
| cosign | .132059 | .4460933 | 0.30 | 0.767 | -.7422677, 1.006386 |
| chist | 1.066577 | .1712117 | 6.23 | 0.000 | .731008, 1.402146 |
| pubrec | -1.340665 | .2173657 | -6.17 | 0.000 | -1.766694, -.9146363 |
| mortlat1 | -.3098821 | .4635193 | -0.67 | 0.504 | -1.218363, .598599 |
| mortlat2 | -.8946755 | .5685807 | -1.57 | 0.116 | -2.009073, .2197222 |
| vr | -.3498279 | .1537248 | -2.28 | 0.023 | -.6511231, -.0485328 |
| _cons | 3.80171 | .5947054 | 6.39 | 0.000 | 2.636109, 4.967311 |

## Computer Exercise C17.2

Use the average partial effect (APE) to calculate the size of discrimination for the probit and logit estimates.


## Computer Exercise C17.2

This can be done in Stata using the user-written command margeff. For dummy variables the APE is calculated as the discrete change in the predicted probability as the dummy variable changes from 0 to 1 (see Cameron and Trivedi, 2009, Chapter 14).

## Computer Exercise C17.2

Average partial effect of being white on loan approval:

| | Probit | Logit | OLS |
|---|---|---|---|
| White | 0.104 (0.023) | 0.101 (0.022) | 0.129 (0.020) |

Partial effect at the average:

| | Probit | Logit | OLS |
|---|---|---|---|
| White | 0.106 (0.024) | 0.097 (0.022) | 0.129 (0.020) |