BOOK PREFACE
This book provides a detailed treatment of microeconometric analysis, the analysis of individual-level data on the economic behavior of individuals or firms. This usually entails regression methods
applied to cross-section and panel data.
The book aims to provide the practitioner with a comprehensive coverage of statistical methods
and their application in modern applied microeconometrics research. These methods include
nonlinear modelling, inference under minimal distributional assumptions, identifying and
measuring causation rather than mere association, and correcting for departures from simple
random sampling. Many of these features are of relevance to individual-level data analysis
throughout the social sciences.
This ambitious agenda has determined the characteristics of the book. First, although oriented to
the practitioner, the book is relatively advanced in places. A cookbook approach is inadequate because,
when two or more complications occur simultaneously, a common situation, the practitioner must
know enough to be able to adapt available methods. Second, the book provides considerable
coverage of practical data problems (see especially the last three chapters). Third, the book includes
substantial empirical examples in many chapters to illustrate some of the methods covered. Finally,
the book is unusually long. Despite this length we have been space-constrained. We had intended to
include even more empirical examples, and abbreviated presentations will at times fail to recognize
the accomplishments of researchers who have made substantive contributions.
The book assumes a basic understanding of the linear regression model with matrix algebra. It is
written at the mathematical level of the first-year economics Ph.D. sequence, comparable to Greene
(2003). We have two types of readers in mind. First, the book can be used as a course text for a
microeconometrics course, typically taught in the second year of the Ph.D., or for data-oriented
microeconomics field courses such as labor economics, public economics and industrial
organization. Second, the book can be used as a reference work for graduate students and applied
researchers who despite training in microeconometrics will inevitably have gaps that they wish to
fill.
For instructors using this book as an econometrics course text it is best to introduce the basic
nonlinear cross-section and linear panel data models as early as possible, initially skipping many of
the methods chapters. The key methods chapter (chapter 5) covers maximum likelihood and
nonlinear least squares estimation. ML and NLS provide adequate background for the most
commonly-used nonlinear cross-section models (chapters 14-17, 20), basic linear panel data models
(chapter 21) and treatment evaluation methods (chapter 25). Generalized method of moments
estimation (chapter 6) is needed especially for advanced linear panel data methods (chapter 22).
For readers using this book as a reference work, the chapters have been written to be as self-contained as possible. The notable exception is that some command of general estimation results in
chapter 5, and occasionally chapter 6, will be necessary. Most models chapters are structured to
begin with a discussion and example that is accessible to a wide audience.
The web-site www.econ.ucdavis.edu/faculty/cameron/mmabook provides all the data and
computer programs used in this book, and related materials useful for instructional purposes.
This project has been long and arduous, and at times seemingly without an end. Its completion
has been greatly aided by our colleagues, friends, and graduate students. We would like to thank
especially the following for reading and commenting on specific chapters: Bijan Borah, Kurt
Brännäs, Pian Chen, Tim Cogley, Partha Deb, David Drukker, Massimiliano De Santis, Jeff Gill,
Tue Gorgens, Shiferaw Gurmu, Lu Ji, Oscar Jorda, Roger Koenker, Chenghui Li, Tong Li, Doug
Miller, Murat Munkin, Jim Prieger, Ahmed Rahmen, Sunil Sapra, Haruki Seitani, Yacheng Sun,
Xiaoyong Zheng, and David Zimmer. We thank Rajeev Dehejia, Bronwyn Hall, Cathy Kling,
Jeffrey Kling, Will Manning, Brian McCall and Jim Ziliak for making their data available for
empirical illustrations. We thank our respective departments for facilitating our collaboration, and
for the production and distribution of the draft manuscript at various stages. We benefitted from the
comments of two anonymous reviewers. Guidance, advice and encouragement from our CUP
editor, Scott Parris, have been invaluable.
Our interest in econometrics owes much to the training and environments we encountered as
students and in the initial stages of our academic careers. The first author thanks The Australian
National University, Stanford University, especially Takeshi Amemiya and Tom MaCurdy, and The
Ohio State University. The second author thanks the London School of Economics and The
Australian National University.
Our interest in writing a book oriented to the practitioner owes much to our exposure to the
research of graduate students and colleagues at our respective institutions, UC-Davis and IU-Bloomington.
Finally, we would like to thank our families for their patience and understanding without which
completion of this project would not have been possible.
A. Colin Cameron
Davis, California
Pravin K. Trivedi
Bloomington, Indiana


TABLE OF CONTENTS

I: PRELIMINARIES
1. Overview
2. Causal and Noncausal Models
3. Microeconomic Data Structures

II: CORE METHODS
4. Linear Models
5. ML and NLS Estimation
6. GMM and Systems Estimation
7. Hypothesis Tests
8. Specification Tests and Model Selection
9. Semiparametric Methods
10. Numerical Optimization

III: SIMULATION-BASED METHODS
11. Bootstrap Methods
12. Simulation-based Methods
13. Bayesian Methods

IV: CROSS-SECTION DATA MODELS
14. Binary Outcome Models
15. Multinomial Models
16. Tobit and Selection Models
17. Transition Data: Survival Analysis
18. Mixture Models and Unobserved Heterogeneity
19. Models of Multiple Hazards
20. Count Data Models

V: PANEL DATA MODELS
21. Linear Panel Models: Basics
22. Linear Panel Models: Extensions
23. Nonlinear Panel Models

VI: FURTHER TOPICS
24. Stratified and Clustered Samples
25. Treatment Evaluation
26. Measurement Error Models
27. Missing Data and Imputation

APPENDICES
A. Asymptotic Theory
B. Making Pseudo-Random Draws

PART 1 (chapters 1-3)

Part 1 covers the essential components of microeconometric analysis: an economic specification, a
statistical model and a data set.
Chapter 1 discusses the distinctive aspects of microeconometrics, and provides an outline of the
book. It emphasizes that discreteness of data, and nonlinearity and heterogeneity of behavioral
relationships are key aspects of disaggregated microeconometric models. It concludes by presenting
the notation and conventions used throughout the book.
Chapters 2 and 3 set the scene for the remainder of the book by introducing the reader to key model
and data concepts that shape the analyses of later chapters.
A key distinction in econometrics is between essentially descriptive models and data summaries at
various levels of statistical sophistication and models that go beyond associations and attempt to
estimate causal parameters. The classic definitions of causality in econometrics derive from the
Cowles Commission simultaneous equations models that draw sharp distinctions between
exogenous and endogenous variables, and between structure and reduced form parameters.
Although reduced form models are very useful for prediction, knowledge of structural or causal
parameters is essential for policy analyses. Identification of structural parameters within the
simultaneous equations framework poses numerous conceptual and practical difficulties. An
alternative approach, based on the potential outcome model, also attempts to identify causal
parameters, but it does so by posing limited questions within a more manageable framework.
Chapter 2 attempts to provide an overview of the fundamental issues that arise in these alternative
frameworks. Readers who initially find this material challenging should return to this chapter later
after gaining greater familiarity with specific models covered later in the book.
The empirical researcher's ability to identify causal parameters depends not only on the statistical
tools and models but also on the type of data available. An experimental framework provides a
standard for establishing causal connections. However, observational, not experimental, data form
the basis of much of econometric inference. Chapter 3 surveys the pros and cons of three main types
of data available: observational data, data from social experiments, and those from natural
experiments. The potential as well as the difficulties of conducting causal inference based on each
type of data are reviewed.

PART 2 (chapters 4-10)

Part 2 presents the core methods (least squares, method of moments, and maximum likelihood) of estimation and inference in nonlinear regression models that are central in microeconometrics.
Both traditional topics and more modern topics such as quantile regression, sequential
estimation, empirical likelihood, bootstrap, and semi- and nonparametric regression are covered. In
general the discussion is at a level intended to provide enough background and detail to enable the
practitioner to read and comprehend articles in the leading econometrics journals. We presume
prior familiarity with linear regression analysis.
Chapter 4 begins with the linear regression model. It then covers at an introductory level quantile
regression, which models distributional features other than the conditional mean. It provides a
lengthy expository treatment of instrumental variables estimation, a major semiparametric method
of causal inference. Chapter 5 presents the most commonly-used estimation methods for nonlinear
models, beginning with the quite general topic of m-estimation, before specialization to maximum
likelihood and nonlinear least squares regression. Chapter 6 provides a comprehensive treatment of
generalized method of moments, which is a quite general estimation framework, applicable both in
linear and nonlinear, and single- and multi-equation settings. The chapter emphasizes the special
case of instrumental variables estimation.
Chapter 7 covers both the classical and bootstrap approaches to hypothesis testing, while Chapter 8
presents relatively more modern methods of model selection and specification analysis. Because of
their importance the bootstrap methods also get a more detailed stand-alone treatment in Chapter
11. As much as possible testing methods are presented in a unified manner in these chapters, but
specific applications occur throughout the book.
Chapter 9 is a stand-alone chapter that presents nonparametric and semiparametric estimation
methods that place a flexible structure on the econometric model. Chapter 10 presents the
computational methods used to compute the nonlinear estimators presented in chapters 5 and 6.
This material becomes especially relevant to the practitioner if an estimator is not automatically
computed by an econometrics package.
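To give a flavor of the iterative computations involved, the following is a minimal Python sketch of a Newton-Raphson iteration for the maximum likelihood estimate in a one-parameter Poisson regression with conditional mean exp(b*x). The toy data, the function names and the choice of Python are assumptions made purely for illustration; they are not taken from the book or its programs.

```python
import math

# Hypothetical toy data, invented for this sketch (not from the book).
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1, 1, 3, 4, 8]


def score(b):
    """First derivative of the Poisson log-likelihood with mean exp(b * x)."""
    return sum(xi * (yi - math.exp(b * xi)) for xi, yi in zip(x, y))


def hessian(b):
    """Second derivative of the same log-likelihood (negative for these data)."""
    return -sum(xi * xi * math.exp(b * xi) for xi in x)


def newton_raphson(b0=0.0, tol=1e-10, max_iter=100):
    """Iterate b <- b - score(b) / hessian(b) until the step is below tol."""
    b = b0
    for _ in range(max_iter):
        step = score(b) / hessian(b)
        b -= step
        if abs(step) < tol:
            return b
    raise RuntimeError("Newton-Raphson did not converge")


b_hat = newton_raphson()
print(f"estimate: {b_hat:.6f}, score at estimate: {score(b_hat):.2e}")
```

At convergence the score is numerically zero, which is the first-order condition that defines the estimator. With several parameters the same update uses the gradient vector and the Hessian matrix in place of the two scalars.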

PART 3 (chapters 11-13)

Part 1 emphasized that: (1) Microeconometric models are often nonlinear; (2) they are frequently
estimated using large and heterogeneous data sets; and (3) the data often come from surveys that
are complex and subject to a variety of sampling biases. A realistic depiction of the economic
phenomena in such settings often requires the use of models that are difficult to estimate and
analyze. Advances in computing hardware and software now make it feasible to tackle such tasks.
Part 3 presents modern, computer-intensive, simulation-based methods of inference that mitigate
some of these difficulties. The background required to cover this material varies somewhat with the
chapter but the essential base is least squares and maximum likelihood estimation.
Chapter 11 presents bootstrap methods for statistical inference. These methods have the attraction
of providing a simple way to obtain standard errors when the formulae from asymptotic theory are
complex, as is the case for some two-step estimators. Furthermore, if implemented appropriately, a
bootstrap can lead to a more refined asymptotic theory that may then lead to better statistical
inference in small samples.
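The standard-error use of the bootstrap can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the data are simulated, the helper name bootstrap_se is hypothetical, and the simple nonparametric bootstrap shown here does not deliver the asymptotic refinements mentioned above. The sample mean is included because its standard error has a simple formula to compare against; the median stands in for an estimator whose standard error is less convenient to derive.

```python
import math
import random
import statistics


def bootstrap_se(data, statistic, n_boot, rng):
    """Standard deviation of `statistic` across resamples drawn with replacement."""
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


rng = random.Random(12345)  # fixed seed so the sketch is reproducible
sample = [rng.expovariate(1.0) for _ in range(200)]  # simulated data (assumption)

se_mean_boot = bootstrap_se(sample, statistics.fmean, 1000, rng)
se_mean_formula = statistics.stdev(sample) / math.sqrt(len(sample))
se_median_boot = bootstrap_se(sample, statistics.median, 1000, rng)

print(f"mean:   bootstrap SE {se_mean_boot:.4f}, formula SE {se_mean_formula:.4f}")
print(f"median: bootstrap SE {se_median_boot:.4f}")
```

For the mean, the bootstrap and formula-based standard errors should be close. For the median, the bootstrap supplies a standard error without an explicit density estimate.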
Chapter 12 presents simulation-based estimation methods. These methods permit estimation in
situations where standard computational methods may not permit calculation of an estimator,
because of the presence of an integral over a probability distribution for which there is no closed-form
solution.
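As a rough illustration of the underlying idea, the Python sketch below replaces an integral over a probability distribution by an average over random draws. The example, a logit probability averaged over a normally distributed unobserved effect, is an assumption chosen for illustration, as are the function names; it is not one of the book's applications.

```python
import math
import random


def logistic(z):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-z))


def simulated_probability(index, sigma, n_draws, rng):
    """Approximate E[logistic(index + sigma * u)], u ~ N(0, 1), by a sample average."""
    total = 0.0
    for _ in range(n_draws):
        u = rng.gauss(0.0, 1.0)
        total += logistic(index + sigma * u)
    return total / n_draws


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
# With index = 0 the integrand is symmetric about 0.5, so the exact value is 0.5.
p_hat = simulated_probability(index=0.0, sigma=1.5, n_draws=20000, rng=rng)
p_hat_shifted = simulated_probability(index=1.0, sigma=1.5, n_draws=20000, rng=rng)
print(f"simulated probability at index 0: {p_hat:.4f}")
print(f"simulated probability at index 1: {p_hat_shifted:.4f}")
```

In a simulation-based estimator, an average of this kind would stand in for the intractable integral inside the objective function, with the number of draws governing the simulation error.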
Chapter 13 surveys Bayesian methods that provide an approach to estimation and inference that is
quite different from the classical approach used in other chapters of this book. Despite this different
approach, the Bayesian toolkit can also be adapted to permit classical estimation and inference for
problems that are otherwise intractable.


PART 4 (chapters 14-20)

Part 4, consisting of chapters 14 to 20, covers the core nonlinear limited dependent variable models
for cross-section data, defined by the range of values taken by the dependent variable. Topics
covered include models for binary and multinomial data, duration data and count data. The
complications of censoring, truncation and sample selection are also studied.
Chapters 14-15 cover models for binary and multinomial data that are standard in the analysis of
discrete choice and outcomes. Maximum likelihood methods are dominant. Different
parameterizations for the conditional probabilities in these models lead to different models, notably
logit and probit models, which are well-established. Recent literature has focused on less restrictive
modeling with more flexible functional forms for conditional probabilities and on accommodating
individual unobserved heterogeneity. These objectives motivate the use of semiparametric methods
and simulation-based estimation methods.
Censoring, truncation or sample selection generate several empirically important classes of models
that are analyzed in Chapter 16. The long-established Tobit model is central to this literature, but
consistent estimation and valid inference for it rely on strong distributional assumptions.
We also examine the newer semiparametric methods that require weaker assumptions.
Chapters 17-19 consider duration models in which the focus is on either the determinants of spell
lengths, such as length of an unemployment spell, or on modeling the hazard rate of transitions from
one initial state to another. The relative importance of state dependence and unobserved
heterogeneity as determinants of the average length of spell is a central issue, whose resolution
raises fundamental questions about alternative modeling approaches. The analysis covers both
discrete and continuous time models, and both parametric and semiparametric formulations,
including the standard models like the exponential, the Weibull, and the proportional hazards
model. Chapter 18 covers formulation and interpretation of richer models that incorporate
unobserved heterogeneity. Chapter 19 deals with models with several types of events using the
competing risks formulation and models of multiple spells.
Chapter 20 covers the analysis of event counts of the kind very common in health economics. There
are many strong connections and parallels between count data models and duration models because
of their common foundation in stochastic processes. We analyze the widely-used Poisson and
negative binomial regression models, together with important variants such as the two-part or
hurdle model, zero-inflated models, latent class models, and endogenous regressor models, all of
which accommodate different facets of the event processes.

PART 5 (chapters 21-23)

Cross section models have certain inherent limitations. They are predominantly equilibrium models
that generally do not shed light on intertemporal dependence of events. They also cannot
satisfactorily resolve fundamental issues about the sources of persistence in behavior. Such
persistence may be behavioral, i.e. arising from true state dependence, or it may be spurious, being
an artifact of the inability to control for heterogeneous behavior in the population. Because panel
data, also called longitudinal data, contain periodically repeated observations of the same subjects,
they have a large potential for resolving issues that cross section models cannot satisfactorily
handle. Chapters 21 through 23 present methods for panel data. We progress systematically from
linear models for continuous data in Chapter 21 to nonlinear panel data models for limited
dependent variables in Chapter 23. Both fixed effects and random effects models are considered. A
persistent theme through these three chapters is the importance of using robust methods of
inference.
Chapter 21, which reviews the key general results for linear panel data regression models, can be
read easily by those with a good grasp of linear regression; it does not require the material covered
in Parts 2 to 4. We recommend that even those who are interested in more advanced material should
first skim the contents of this chapter to gain familiarity with key concepts and
definitions.
Chapter 22 covers important extensions of Chapter 21, especially to dynamic panels which allow
for Markovian dependence structure of current variables. The analysis is in the GMM framework
that is currently favored by many practitioners in this area. The analysis here is at times intricate,
involving many issues of detail. A strong grasp of GMM will be helpful in absorbing the main
results of this chapter.
The results of Chapters 21 and 22 do not extend to nonlinear panel models of Chapter 23 in a
general and unified fashion. There are relatively fewer general results for limited dependent variable
panel models. Despite this, in Chapter 23 we begin by presenting an analysis of some general issues
and approaches. Later sections can be treated as panel data extensions of the counterpart cross
section models in Part 4. These sections analyze four categories of models, for binary, count, censored, and
duration data. They should be accessible to a suitably prepared reader familiar with
the parallel cross section models.

PART 6 (chapters 24-27)

Frequently in empirical work data present not one but multiple complications that the analysis must
simultaneously deal with. Examples of such complications include departures from simple random
sampling, clustering of observations, measurement errors, and missing data. When they occur,
individually or jointly, and in the context of any of the models developed in Parts 4 and 5,
identification of parameters of interest will be compromised. Three chapters in Part 6 (Chapters
24, 26, and 27) analyze the consequences of such complications and then present methods that
attempt to overcome the consequences. The methods are illustrated using examples taken from the
earlier parts of the book. This feature provides points of connection between Part 6 and the rest of the
book.
Chapter 24, which deals with features of data from complex surveys, complements various topics
covered in Chapters 3, 5, and 16. Chapter 26, which deals with measurement errors, complements
topics in Chapters 4, 14, and 20. Chapter 27 is a stand-alone chapter on missing data and multiple
imputation, but its use of the EM algorithm and Gibbs sampler also gives it points of contact with
Chapters 10 and 13, respectively.
Chapter 25 deals with the important topic of treatment evaluation. Treatment is a broad term that
refers to the impact of one variable, e.g. schooling, on some outcome variable, e.g. income.
Treatment variables may be exogenously assigned, or may be endogenously chosen. The topic of
treatment evaluation concerns the identifiability of the impact of treatment on outcome, as measured
by either the marginal effects or certain functions of marginal effects. A variety of methods are used,
including instrumental variables regression and propensity score matching. The problem of
treatment evaluation can arise in the context of any model considered in Parts 4 and 5. This chapter
may also be read on its own, but it does presume familiarity with many other topics covered in the
book, including instrumental variables and selection models, which is why it is placed in the last
part.


GUIDE FOR INSTRUCTORS AND OTHER READERS

The book assumes a basic understanding of the linear regression model with matrix algebra. It is
written at the mathematical level of the first-year economics Ph.D. sequence, comparable to Greene
(2000).
While some of the material in this book is covered in a first-year sequence, most of the material in
this book appears in second-year econometrics Ph.D. courses or in data-oriented microeconomics
field courses such as labor economics, public economics or industrial organization. This book is
intended to be used as both an econometrics text and as an adjunct for such field courses. More
generally, the book is intended to be useful as a reference work for applied researchers in
economics, in related social sciences such as sociology and political science, and in epidemiology.
The models chapters have been written to be as self-contained as possible, to minimize the amount
of background material in the methods chapters that needs to be read. For the specific models
presented in parts four and five (chapters 14-23) it will generally be sufficient to read the relevant
chapter in isolation, except that some command of the general estimation results in chapter 5 and in
some cases chapter 6 will be necessary. Most chapters are structured to begin with a discussion and
example that is accessible to a wide audience.
For instructors using this book as a course text it is best to introduce the basic nonlinear cross-section and linear panel data models as early as possible, skipping many of the methods chapters.
The most commonly-used nonlinear cross-section models are presented in chapters 14-16, and
require knowledge of maximum likelihood and least squares estimation, presented in chapter five.
Chapter twenty-one on linear panel data models requires even less preparation, essentially just
chapter four.
Table 1.2 provides an outline for a one-quarter second-year graduate course taught at the University
of California - Davis, immediately following the required first-year statistics and econometrics
sequence. A quarter provides sufficient time to cover the basic results given in the first half of the
chapters in this outline. With additional time one can go into further detail or cover a subset of
chapters eleven to thirteen on computationally-intensive estimation methods (simulation-based
estimation, the bootstrap, which is also briefly presented in chapter seven, and Bayesian methods);
additional cross-section models (durations and counts) presented in chapters seventeen to twenty;
and additional panel data models (linear model extensions and nonlinear models) given in chapters
twenty-two and twenty-three.
Outline of a twenty-lecture ten-week course:

Lectures | Chapter | Topic
1-3 | 4 | Review of linear models and asymptotic theory
4-7 | 5 | Estimation: M-estimation, ML and NLS
8 | 10 | Estimation: Numerical Optimization
9-11 | 14, 15 | Models: Binary and multinomial
12-14 | 16 | Models: Censored and Truncated
15 | 6 | Estimation: GMM
16 | 7 | Testing: Hypothesis Tests
17-19 | 21 | Models: Basic Linear Panel
20 | 9 | Estimation: Semiparametric

At Indiana University - Bloomington, a fifteen-week semester long field course in
microeconometrics is based on material in most of Parts 4 and 5 (chapters 14-23). The prerequisite
courses for this course cover material similar to the material in Part 2 (chapters 4-10).

Some exercises are provided at the end of each chapter after the first three introductory chapters.
These exercises are usually learning-by-doing exercises; some are purely methodological while
others entail analysis of generated or actual data. The level of difficulty of the questions is mostly
related to the level of difficulty of the topic.
Detailed programs and data for all the data applications (using either actual data or generated data)
will be made available at the book website.


ADVANCE REVIEWS
"This book presents an elegant and accessible treatment of the broad range of rapidly expanding
topics currently being studied by microeconometricians. Thoughtful, intuitive, and careful in laying
out central concepts of sophisticated econometric methodologies, it is not only an excellent
textbook for students, but also an invaluable reference text for practitioners and researchers."
- Cheng Hsiao, University of Southern California
"I wish "Microeconometrics" was available when I was a student! Here, in one place -- and in clear
and readable prose -- you can find all of the tools that are necessary to do cutting-edge applied
economic analysis, and with many helpful examples."
- Alan Krueger, Princeton University
"Cameron and Trivedi have written a remarkably thorough and up-to-date treatment of
microeconometric methods. This is not a superficial cookbook; the early chapters carefully lay the
theoretical foundations on which the authors build their discussion of methods for discrete and
limited dependent variables and for analysis of longitudinal data. A distinctive feature of the book
is its attention to cutting-edge topics like semiparametric regression, bootstrap methods, simulation-based estimation, and empirical likelihood estimation. A highly valuable book."
- Gary Solon, University of Michigan
"The empirical analysis of micro data is more widespread than ever before. The book by Cameron
and Trivedi contains a superb treatment of all the methods that economists like to apply to such
data. What is more, it fully integrates a number of exciting new methods that have become
applicable due to recent advances in computer technology. The text is in perfect balance between
econometric theory and empirical intuition, and it contains many insightful examples."
- Gerard J. van den Berg, Free University, Amsterdam, The Netherlands


PROGRAMS: I. INTRODUCTION (chapters 1-3)


No programs.

PROGRAMS: II. CORE METHODS (chapters 4-10)

Section | Pages | Example | Program and Output | Data [* means generated]
4.5.3 | 84-5 | Robust Standard Errors for OLS, WLS and GLS | mma04p1wls.do, mma04p1wls.txt | * mma04p1wls.asc
4.6.4 | 88-90 | Quantile and Median Regression | mma04p2qreg.do, mma04p2qreg.txt | qreg0902.dta or qreg0902.asc
4.8.8 | 102-3 | Instrumental Variables Regression | mma04p3iv.do, mma04p3iv.txt | * mma04p3iv.asc
4.9.6 | 110-2 | IV Application with Weak Instruments | mma04p4ivweak.do, mma04p4ivweak.txt | DATA66.dat and DATA66.dct
5.9.2-3 | 159-63 | Exponential: MLE using ml command | mma05p1mle.do, mma05p1mle.txt | * mma05data.asc
5.9.2-3 | 159-63 | Exponential: NLS using nl command | mma05p2nls.do, mma05p2nls.txt | * mma05data.asc
5.9.2-3 | 159-63 | Exponential: NLS using ml command | mma05p3nlsbyml.do, mma05p3nlsbyml.txt | * mma05data.asc
5.9.4 | 159-63 | Exponential: Computation of marginal effects | mma05p4margeffects.do, mma05p4margeffects.txt | * mma05data.asc
6.5.4 | 198-9 | Nonlinear 2SLS: Using Limdep | mma06p1nl2sls.lim, mma06p1nl2sls.out | * mma06p1nl2sls.asc
6.5.4 | 198-9 | Part of preceding using Stata | mma06p2twostage.do, mma06p2twostage.txt | * mma06p1nl2sls.asc
7.4 | 241-3 | Likelihood-based Hypothesis Tests | mma07p1mltests.do, mma07p1mltests.txt | * mma07p1mltests.asc
7.6.3 | 248-9 | Asymptotic Power of Wald Test | mma07p2power.do, mma07p2power.txt | No data
7.7.1-5 | 250-4 | Monte Carlo Simulation of Wald Test | mma07p3montecarlo.do, mma07p3montecarlo.txt | Data for many simulations not saved
7.8 | 254-6 | Bootstrap example | mma07p4boot.do, mma07p4boot.txt | * mma07p4boot.asc
8.2.9 | 269-71 | Conditional moment tests example | mma08p1cmtests.do, mma08p1cmtests.txt | * mma08p1cmtests.asc
8.5.5 | 283-4 | Nonnested models test example | mma08p2nonnested.do, mma08p2nonnested.txt | mma08p2nonnested.asc
8.7.3 | 290-1 | Model diagnostics example | mma08p3diagnostics.do, mma08p3diagnostics.txt | * mma08p3diagnostics.asc
9.2 | 295-7 | Nonparametric density estimation and regression: application | mma09p1np.do, mma09p1np.txt |
9.4-9.5 | 307-19 | Nonparametric regression: more | mma09p2npmore.do, mma09p2npmore.txt | * mma09p2npmore.asc
9.3.3 | 299-300 | Kernel functions plotted | mma09p3kernels.do, mma09p3kernels.txt | * mma09p3kernels.asc
10.2.5 | 338-9 | Gradient method example (Newton-Raphson) | mma10p1gradient.do, mma10p1gradient.txt | No data

PROGRAMS: III. Computationally-Intensive Methods (chapters 11-13)

Section | Pages | Example | Program and Output | Data
11.3 | 366-8 | Bootstrap example | mma11p1boot.do, mma11p1boot.txt | * mma11p1boot.asc
12.3.3 | 391-2 | Integral Computation Example | mma12p1integration.do, mma12p1integration.txt | No data
12.4.5, 12.5.6 | 397-7, 403-4 | Maximum Simulated Likelihood and Maximum Simulated Score Example | mma12p2mslmsm.do, mma12p2mslmsm.txt | * mma12p2mslmsm.asc
12.8.2 | 412-3 | Illustration of Methods to Draw Random Variates | mma12p3draws.do, mma12p3draws.txt | No data
13.2.2 | 424 | Bayes Theorem Illustration for Normal Distribution and Prior | mma13p1bayesthm.do, mma13p1bayesthm.txt | No data
13.6 | 452-4 | MCMC Example: Gibbs Sampler for SUR | mma13p2bayesgibbs.sas, mma13p2bayesgibbs.lst, mma13p2bayesgibbs.log | Program generated

PROGRAMS: IV. Models for Cross-Section Data (chapters 14-20)

Section | Pages | Example | Program and Output | Data
14.2 | 464-5 | Logit and Probit Application (fishing mode) | mma14p1binary.do, mma14p1binary.txt | Nldata.asc
14.7.5 | 486 | Maximum score estimator for binary outcome | mma14p2maxscore.lim, mma14p2maxscore.out | mma14p1binary.asc
15.2.1-3 | 491-5 | Multinomial Logit and Conditional Logit Application (fishing mode) | mma15p1mnl.do, mma15p1mnl.txt | Nldata.asc
15.6.3 | 511 | Nested Logit (or GEV) estimation | mma15p2gev.do, mma15p2gev.txt | Nldata.asc
15.2.2 | 493-4 | Limdep multinomial logit | mma15p3mnl.lim, mma15p3mnl.out | Nldata.asc
15.2.1-3 | 491-5 | Limdep and addon Nlogit for conditional and nested logit | mma15p4gev.lim, mma15p4gev.out | mma15p4gev.asc
16.2.1 | 530-1, 565 | Classic Tobit MLE and CLAD | mma16p1tobit.do, mma16p1tobit.txt | mma16p1tobit.asc
16.3.4 | 540 | Inverse Mills ratio plotted | mma16p2mills.do, mma16p2mills.txt | No data
16.6 | 553-5 | Selection Model Application (medical expenditures) | mma16p3selection.do, mma16p3selection.txt | randdata.dta or mma16p3selection.asc
17.2, 17.5.1 | 574-5, 581-3 | Nonparametric estimation (KM and NA) for survival data (strike duration) | mma17p1km.do, mma17p1km.txt | strkdur.dta or strkdur.asc
17.5.1 | 581-2 | Nonparametric estimation (KM and NA) for survival data (artificial) | mma17p2kmextra.do, mma17p2kmextra.txt | Data in program
17.6.1 | 584-6 | Weibull distribution functions plotted | mma17p3weib.do, mma17p3weib.txt | No data
17.11 | 603-8 | Duration regression models (unemployment duration) | mma17p4duration.do, mma17p4duration.txt | ema1996.dta or ema1996.asc
18.8 | 632-6 | Duration regression with unobserved heterogeneity (unemployment duration) | mma18p1heterogeneity.do, mma18p1heterogeneity.txt | ema1996.dta or ema1996.asc
19.5 | 658-3 | Competing risks model (unemployment duration) | mma19p1comprisks.do, mma19p1comprisks.txt | ema1996.dta or ema1996.asc
20.2, 20.7 | 671-4, 690 | Count regression (doctor contacts) | mma20p1count.do, mma20p1count.txt | randdata.dta or mma20p1count.asc

PROGRAMS: V. Models for Panel Data (chapters 21-23)

Section | Pages | Example | Program and Output | Data
21.3.1-3 | 708-13 | Linear Panel Fixed and Random Effects Application (hours and wages) | mma21p1panfeandre.do, mma21p1panfeandre.txt | MOM.dat
21.3.2, 21.3.4 | 710, 719 | Linear Panel Estimators manually obtained by OLS on transformed equation (hours and wages) | mma21p2panmanual.do, mma21p2panmanual.txt | MOM.dat
21.3.4 | 713-5 | Linear Panel Residual Analysis (hours and wages) | mma21p3panresiduals.do, mma21p3panresiduals.txt | MOM.dat
21.5.5 | 725 | Linear Panel pooled OLS and GLS estimation (hours and wages) | mma21p4pangls.do, mma21p4pangls.txt | MOM.dat
22.3 | 754-6 | Linear Panel GMM Application (hours and wages) | mma22p1gmmpanel.do, mma22p1gmmpanel.txt | MOMprecise.dat
23.3 | 792-5 | Nonlinear Panel Application (patents and R&D) | mma23p1pannonlin.do, mma23p1pannonlin.txt | patr7079.asc

PROGRAMS: VI. Further Methods (chapters 24-27)

Section | Pages | Example | Program and Output | Data
24.7 | 848-53 | Clustered Linear Regression (household medical expenditure clustered on commune) | mma24p1olscluster.do, mma24p1olscluster.txt | vietnam_ex1.dta or vietnam_ex1.asc
 | | Clustered Poisson Regression (individual pharmacy visits clustered on commune) | mma24p2poiscluster.do, mma24p2poiscluster.txt | vietnam_ex2.dta or vietnam_ex2.asc
25.8.1-4 | 889-93 | Treatment Evaluation: Simple calculations (training on earnings) | mma25p1treatment.do, mma25p1treatment.txt | nswpsid.da1 or nswpsid.dta
25.8.5 | 893-6 | Treatment Evaluation: Propensity score matching (training on earnings) | mma25p2matching.do, mma25p2matching.txt | nswpsid.da1 or nswpsid.dta
25.8 | 889-96 | Treatment Evaluation: Additional analysis not in book using additional data sets (NSW experimental controls and CPS controls) | mma25p3extra.do, mma25p3extra.txt | nswre74_treated.dta and nswre74_control.dta or nswre74_all.asc; propensity_cps.dta or propensity_cps.asc
26.5 | 919-20 | Measurement Error Bias Example | To come | Generated data
27.8 | 935-9 | Missing Data and MCMC Imputation Example | To come | Generated data


DATA SETS

Data in fixed-format text files have extension .asc or .dat [and, if a Stata dictionary is used, the dictionary has extension .dct].
Stata data files have extension .dta.
We thank Rajeev Dehejia, Bronwyn Hall, Cathy Kling, Jeffrey Kling, Will Manning, Brian McCall
and Jim Ziliak for making their data available for empirical illustrations. The relevant citations are
given below. For "Authors' extract" the citation is A. C. Cameron and P. K. Trivedi (2005),
"Microeconometrics: Methods and Applications," Cambridge University Press, New York.
Many more examples use generated data - see programs.
Pages | Topic | Data Source | Data
--------------------------------------------------------------------------------
88-90 | Median and quantile regression | Vietnam World Bank Living Standards Survey. Authors' extract | qreg0902.dta or qreg0902.asc
110-2 | Instrumental variables with weak instruments | National Longitudinal Survey. J. R. Kling (2001) "Interpreting Instrumental Variables Estimates of the Return to Schooling," Journal of Business and Economic Statistics, 19, 358-364. | DATA66.dat and DATA66.dct
295-7, 300 | Nonparametric density estimation and regression | Panel Survey of Income Dynamics. Authors' extract | psidf3050.dat
463-6, 486, 491-5 | Binary and multinomial outcomes | Fishing-mode choice data. J. A. Herriges and C. L. Kling (1999), "Nonlinear Income Effects in Random Utility Models," Review of Economics and Statistics, 81, 62-72. | Nldata.asc or mma15p4gev.asc
553-6, 565 | Selection models | Rand Health Insurance Experiment. Authors' extract | randdata.dta or mma16p3selection.asc
574-5, 582 | Duration models | Strike duration data. J. Kennan (1985), "The Duration of Contract strikes in U.S. Manufacturing," Journal of Econometrics, 28, 5-28. | strkdur.dta or strkdur.asc
603-8, 632-6, 658-62 | Duration models | Current Population Survey Displaced Workers Supplement. B. P. McCall (1996), "Unemployment Insurance Rules, Joblessness, and Part-time Work," Econometrica, 64, 647-682. | ema1996.dta or ema1996.asc
671-4, 692 | Count data models | Rand Health Insurance Experiment. P. Deb and P.K. Trivedi (2002), "The Structure of Demand for Medical Care: Latent Class versus Two-Part Models," Journal of Health Economics, 21, 601-625. | randdata.dta or mma20p1count.asc
708-15 | Linear panel models: basics | Panel Survey of Income Dynamics. J. Ziliak (1997), "Efficient Estimation With Panel Data when Instruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators," Journal of Business and Economic Statistics, 15, 419-431. | MOM.dat
754-6 | Linear panel models: GMM | Panel Survey of Income Dynamics. J. Ziliak (1997) - see previous cite. | MOMprecise.dat
792-5 | Nonlinear panel models | Patents-R&D data. B. H. Hall, Z. Griliches and J. A. Hausman (1986), "Patents and R&D: Is There a Lag?", International Economic Review, 27, 265-283. | patr7079.asc
848-53 | Clustered data | Vietnam World Bank Living Standards Survey. Authors' extract: (1) Household data (2) Individual data | vietnam_ex1.dta or vietnam_ex1.asc; vietnam_ex2.dta or vietnam_ex2.asc
889-95 | Treatment evaluation [nswpsid: NSW treated vs PSID control used in text. The other data sets not used in text but used in mma25p3extra.do] | National Supported Work demonstration project and controls. R.H. Dehejia and S. Wahba (1999), "Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs," JASA, 1053-1062. and / or R.H. Dehejia and S. Wahba (2002), "Propensity-score Matching Methods for Nonexperimental Causal Studies," ReStat, 151-161. | nswpsid.da1 or nswpsid.dta; nswre74_treated.dta and nswre74_control.dta or nswre74_all.asc; propensity_cps.dta or propensity_cps.asc

EXPLANATION OF BOOK PROGRAMS


PROGRAMS USED:

Most programs are in Stata version 8.0, executed on an MS Windows PC with Stata 8.2.
Stata 7 will usually be okay. Exceptions where Stata 8 is needed include:
(1) The estimates command (for tabulating regression results) is not available in version 7.
Comment out occurrences of "estimates store ..." and "estimates table ...".
(2) Graphics commands (used to obtain the figures in the book) changed substantially from version 7 to 8.
This affects only the generation of figures. If graphs are important, it is best to upgrade to Stata 8,
as its graphics are much better.
(3) In some places free Stata add-ons have been included. These are noted in the programs.
To download an add-on, e.g. knnreg, give the command "search knnreg" in Stata and follow the
directions.
The Stata programs vary from very problem-specific code to code that potentially can be adapted to
one's own needs.
Some programs use Limdep version 7.0 and Nlogit 2.0, executed on an MS Windows PC.
Some programs use SAS / IML. SAS version 8.0 was used on a Unix machine.
FILE NAMING CONVENTIONS:
For Stata: as an example for chapter 4.5.3 we provide:
- mma04p1wls.do Stata program
- mma04p1wls.txt Output from this program
- mma04p1wls.asc The generated data as fixed width ascii data set
[permits analysis with programs other than Stata]
For Limdep: as an example for chapter 15.2.2 we provide:
- mma15p3mnl.lim Limdep program
- mma15p3mnl.out Output from this program
For SAS: as an example for chapter 13.6 we provide:
- mma13p2bayesgibbs.sas SAS program
- mma13p2bayesgibbs.lst SAS output
- mma13p2bayesgibbs.log SAS logfile
For data sets the extensions are:
- .dta for Stata data set
- .asc for ascii (text) data set that is usually both space delimited and fixed width
For descriptions of the data sets see the relevant program that uses the data set, and the associated
output.
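The naming convention above can be checked mechanically. The following is a minimal Python sketch, not part of the book's programs: the regular expression and the function name parse_program_name are assumptions inferred from the example names (mma04p1wls.do and so on), and the role labels simply restate the extensions listed above.

```python
import re

# Assumed pattern, inferred from names such as mma04p1wls.do:
# "mma", two-digit chapter, "p", program number, a short topic tag, extension.
NAME_PATTERN = re.compile(r"^mma(\d{2})p(\d+)([a-z0-9]+)\.([a-z0-9]+)$")

# Extension meanings as listed in the text.
EXTENSION_ROLE = {
    "do": "Stata program",
    "txt": "Stata output",
    "lim": "Limdep program",
    "out": "Limdep output",
    "sas": "SAS program",
    "lst": "SAS output",
    "log": "SAS logfile",
    "dta": "Stata data set",
    "asc": "ASCII (text) data set",
}


def parse_program_name(filename):
    """Split a file name into chapter, program number, topic tag and role."""
    match = NAME_PATTERN.match(filename.lower())
    if match is None:
        return None  # e.g. MOM.dat or strkdur.dta do not follow the pattern
    chapter, number, topic, ext = match.groups()
    return {
        "chapter": int(chapter),
        "program": int(number),
        "topic": topic,
        "role": EXTENSION_ROLE.get(ext, "unknown"),
    }


if __name__ == "__main__":
    for name in ["mma04p1wls.do", "mma15p3mnl.lim", "MOM.dat"]:
        print(name, parse_program_name(name))
```

Data files such as MOM.dat do not follow the program naming pattern, so the sketch returns None for them rather than guessing.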
PROGRAM CPU TIME
Programs generally take little time to run.
The exception is programs that entail simulation, including bootstrapping.
Programs can be sped up by reducing the number of simulations / replications, though any final
analysis should use many simulations / replications.
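To illustrate the trade-off between run time and the number of replications, here is a small Python sketch (not from the book) that computes a bootstrap standard error of a sample mean for several replication counts. The artificial data, the seeds and the function names are assumptions chosen only for illustration.

```python
import random
import statistics


def make_sample(n=200, seed=10101):
    """Artificial data for illustration: n draws from N(0, 2^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 2.0) for _ in range(n)]


def bootstrap_se_of_mean(data, reps, seed):
    """Bootstrap standard error of the sample mean using `reps` resamples."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(reps):
        # Resample n observations with replacement and record the mean.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return statistics.stdev(means)


if __name__ == "__main__":
    sample = make_sample()
    analytic = statistics.stdev(sample) / len(sample) ** 0.5
    print("analytic s/sqrt(n):", round(analytic, 4))
    for reps in (25, 100, 400):
        se = bootstrap_se_of_mean(sample, reps, seed=1)
        print("reps =", reps, "bootstrap se =", round(se, 4))
```

Run time grows roughly in proportion to the number of replications, while the bootstrap estimate itself becomes less noisy; this is why a small number is convenient for testing and a large number is preferable for final results.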


Chapter 4. Linear models

------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma04p1wls.txt
  log type:  text
 opened on:  17 May 2005, 13:41:48
.
. ********** OVERVIEW OF MMA04P1WLS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 4.5.3 pages 84-5
. * Robust Standard Errors for OLS, WLS and GLS
. * (1) Robust and nonrobust standard errors for OLS, WLS and GLS.
. * (2) Table 4.3
. * using generated data (see below)
.
. ********** SETUP **********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA and SUMMARIZE **********
.
. * Model is y = 1 + 1*x + u
. * where u = abs(x)*e
. *       x ~ N(0, 5^2)
. *       e ~ N(0, 2^2)
.
. * Errors are conditionally heteroskedastic with V[u|x]=4*x^2
. * OLS, WLS and GLS are consistent
. * but need to use robust standard errors for OLS and WLS.
.
. set seed 10105
. set obs 100
obs was 0, now 100
. gen x = 5*invnorm(uniform())
. gen e = 2*invnorm(uniform())
. gen u = abs(x)*e
. gen y = 1 + 1*x + u
.
. * Descriptive Statistics
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           x |       100   -.1322828     4.64293  -11.05289   10.63336
           e |       100     .350339    2.033639  -3.776468   5.150759
           u |       100    1.215709    8.187081  -19.58098    32.6086
           y |       100    2.083426    9.364465  -27.63657   39.93944
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x e u using mma04p1wls.asc, replace
.
. ********** ESTIMATE THE MODELS **********
.
. ** (1) OLS - first column of Table 4.3
.
. * (1A) OLS with wrong standard errors
. regress y x
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   30.23
       Model |  2046.73901     1  2046.73901           Prob > F      =  0.0000
    Residual |  6634.88855    98  67.7029444           R-squared     =  0.2358
-------------+------------------------------           Adj R-squared =  0.2280
       Total |  8681.62755    99  87.6932076           Root MSE      =  8.2282

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .979313   .1781124     5.50   0.000     .6258548    1.332771
       _cons |   2.212973   .8231553     2.69   0.008     .5794478    3.846497
------------------------------------------------------------------------------
. estimates store olsusual
.
. * (1B) OLS with correct standard errors (robust sandwich)
. regress y x, robust

Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =   12.68
                                                       Prob > F      =  0.0006
                                                       R-squared     =  0.2358
                                                       Root MSE      =  8.2282

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .979313   .2750617     3.56   0.001     .4334621    1.525164
       _cons |   2.212973   .8198253     2.70   0.008      .586056    3.839889
------------------------------------------------------------------------------
. estimates store olsrobust
.
. ** (2) WLS - second column of Table 4.3
.
. * (2A) WLS with wrong standard errors
. * Use the aweight option (not clearly explained in Stata manual).
. * The aweight option MULTIPLIES y and x by sqrt(aweight).
. * Here we suppose V[u]=constant*|x|
. * So want to divide by sqrt(|x|), so let aweight=1/|x|
. gen absx = abs(x)
. regress y x [aweight=1/absx]
(sum of wgt is 5.7885e+02)
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   25.29
       Model |   56.759883     1   56.759883           Prob > F      =  0.0000
    Residual |  219.985987    98  2.24475497           R-squared     =  0.2051
-------------+------------------------------           Adj R-squared =  0.1970
       Total |   276.74587    99  2.79541283           Root MSE      =  1.4983

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9569768   .1903115     5.03   0.000     .5793097    1.334644
       _cons |   1.060374   .1498265     7.08   0.000     .7630484      1.3577
------------------------------------------------------------------------------
. estimates store wlsusual
.
. * (2B) WLS with correct standard errors (robust sandwich)
. regress y x [aweight=1/absx], robust
(sum of wgt is 5.7885e+02)
Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =   17.07
                                                       Prob > F      =  0.0001
                                                       R-squared     =  0.2051
                                                       Root MSE      =  1.4983

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9569768    .231612     4.13   0.000     .4973503    1.416603
       _cons |   1.060374    .050533    20.98   0.000     .9600931    1.160655
------------------------------------------------------------------------------
. estimates store wlsrobust
.
. ** (3) GLS - last column of Table 4.3
.
. * (3A) GLS with usual standard errors (correct)
. * Here we know V[u]=constant*x^2
. * So want to divide by x, so let aweight=1/(x^2)
. gen xsq = x*x
. regress y x [aweight=1/xsq]
(sum of wgt is 1.0314e+05)
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   20.70
       Model |  .086075004     1  .086075004           Prob > F      =  0.0000
    Residual |  .407542418    98  .004158596           R-squared     =  0.1744
-------------+------------------------------           Adj R-squared =  0.1660
       Total |  .493617422    99  .004986035           Root MSE      =  .06449

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9516457   .2091752     4.55   0.000     .5365444    1.366747
       _cons |   .9964956   .0065131   153.00   0.000     .9835706    1.009421
------------------------------------------------------------------------------
. estimates store glsusual
.
. * (3B) GLS with standard errors (robust sandwich - unnecessary here)
. regress y x [aweight=1/xsq], robust
(sum of wgt is 1.0314e+05)
Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =   20.89
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1744
                                                       Root MSE      =  .06449

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9516457   .2082145     4.57   0.000     .5384508    1.364841
       _cons |   .9964956   .0078922   126.26   0.000     .9808337    1.012157
------------------------------------------------------------------------------
. estimates store glsrobust
.
. * (3C) Check that aweight works as expected.
. * Do GLS by OLS on data transformed by dividing by x.
. gen try = y/x
. gen trint = 1/x
. gen trx = x/x
. regress try trx trint, noconstant
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    98) =11850.15
       Model |  101659.545     2  50829.7726           Prob > F      =  0.0000
    Residual |  420.359033    98  4.28937789           R-squared     =  0.9959
-------------+------------------------------           Adj R-squared =  0.9958
       Total |  102079.904   100  1020.79904           Root MSE      =  2.0711

------------------------------------------------------------------------------
         try |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         trx |   .9516457   .2091752     4.55   0.000     .5365444    1.366747
       trint |   .9964956   .0065131   153.00   0.000     .9835706    1.009421
------------------------------------------------------------------------------
.
. ********** DISPLAY KEY RESULTS **********
.
. * Table 4.3
. estimates table olsusual olsrobust wlsusual wlsrobust glsusual glsrobust, /*
>    */ se stats(N r2) b(%7.3f) keep(_cons x)
--------------------------------------------------------------------------
    Variable | olsus~l   olsro~t   wlsus~l   wlsro~t   glsus~l   glsro~t
-------------+------------------------------------------------------------
       _cons |   2.213     2.213     1.060     1.060     0.996     0.996
             |   0.823     0.820     0.150     0.051     0.007     0.008
           x |   0.979     0.979     0.957     0.957     0.952     0.952
             |   0.178     0.275     0.190     0.232     0.209     0.208
-------------+------------------------------------------------------------
           N | 100.000   100.000   100.000   100.000   100.000   100.000
          r2 |   0.236     0.236     0.205     0.205     0.174     0.174
--------------------------------------------------------------------------
                                                              legend: b/se
.
. * Minor typo in Table 4.3:
. * for GLS Constant has robust s.e. of [0.008] not [0.006]
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma04p1wls.txt
log type: text
closed on: 17 May 2005, 13:41:48
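For readers working outside Stata, the logic of steps (1A), (1B), (3A) and (3C) in the log above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not a translation of the authors' program: the function names are hypothetical, Python's random number generator differs from Stata's so the numbers will not reproduce Table 4.3, and the robust variance uses the n/(n-2) scaling as an assumed small-sample adjustment.

```python
import math
import random


def simulate(n, seed):
    """Draw (x, y) from y = 1 + x + |x|*e with x ~ N(0, 5^2), e ~ N(0, 2^2)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 5.0) for _ in range(n)]
    ys = [1.0 + x + abs(x) * rng.gauss(0.0, 2.0) for x in xs]
    return xs, ys


def weighted_fit(xs, ys, ws):
    """Minimise sum w*(y - a - b*x)^2 and return (a, b); unit weights give OLS."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def ols_slope_standard_errors(xs, ys):
    """Return (slope, conventional SE, heteroskedasticity-robust SE)."""
    n = len(xs)
    a, b = weighted_fit(xs, ys, [1.0] * n)
    xbar = sum(xs) / n
    resid = [y - a - b * x for x, y in zip(xs, ys)]
    sxx = sum((x - xbar) ** 2 for x in xs)
    s2 = sum(r * r for r in resid) / (n - 2)
    usual = math.sqrt(s2 / sxx)
    # Sandwich form for the slope: sum of (x - xbar)^2 * residual^2 in the middle.
    meat = sum(((x - xbar) * r) ** 2 for x, r in zip(xs, resid))
    robust = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return b, usual, robust


def gls_by_transformation(xs, ys):
    """OLS on data divided by x: y/x = a*(1/x) + b*1 + error, as in step (3C)."""
    t_y = [y / x for x, y in zip(xs, ys)]
    t_int = [1.0 / x for x in xs]
    # The transformed regressor x/x is a constant, so this is a simple
    # regression of y/x on 1/x: its intercept is b and its slope is a.
    b, a = weighted_fit(t_int, t_y, [1.0] * len(xs))
    return a, b


if __name__ == "__main__":
    xs, ys = simulate(2000, 10105)
    print("OLS slope, usual SE, robust SE:", ols_slope_standard_errors(xs, ys))
    print("GLS via weights 1/x^2:  ", weighted_fit(xs, ys, [1.0 / (x * x) for x in xs]))
    print("GLS via transformation: ", gls_by_transformation(xs, ys))
```

With errors whose variance is proportional to x^2, the robust standard error of the OLS slope should exceed the conventional one, which is the pattern in the first column of Table 4.3; the weighted and transformed GLS fits should agree up to rounding, which is what step (3C) verifies.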
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma04p2qreg.txt
  log type:  text
 opened on:  17 May 2005, 13:43:21
.
. ********** OVERVIEW OF MMA04P2QREG.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 4.6.4 pages 88-90
. * Quantile Regression analysis.
. * (1) Quantile regression estimates for different quantiles
. * (2) Figure 4.1: Quantile Slope Coefficient Estimates as Quantile Varies
. * (3) Figure 4.2: Quantile Regression Lines as Quantile Varies
.
. * To run this program you need data file
. * qreg0902.dta
. * or for programs other than Stata use qreg0902.asc
.
. * Step (3) takes a long time due to bootstrap to get standard errors.
. * To speed up the program reduce the number of repetitions in sqreg
. * But any final results should use a large number of bootstraps
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */

.
. ********** DATA DESCRIPTION **********
.
. * The data from World Bank 1997 Vietnam Living Standards Survey
. * are described in chapter 4.6.4.
. * A larger sample from this survey is studied in Chapter 24.7
.
. ********** READ DATA, TRANSFORM and SAMPLE SELECTION **********
.
. use qreg0902
. describe
Contains data from qreg0902.dta
  obs:         5,999
 vars:             9                          19 Sep 2002 21:45
 size:       191,968 (98.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sex             byte   %8.0g                  Gender of HH.head (1:M;2:F)
age             int    %8.0g                  Age of household head
educyr98        float  %9.0g                  schooling year of HH.head
farm            float  %9.0g       loaiho     Type of HH (1:farm; 0:nonfarm)
urban98         byte   %8.0g       urban      1:urban 98; 0:rural 98
hhsize          long   %12.0g                 Household size
lhhexp1         float  %9.0g
lhhex12m        float  %9.0g
lnrlfood        float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         sex |      5999    1.270712    .4443645          1          2
         age |      5999    48.01284     13.7702         16         95
    educyr98 |      5999    7.094419    4.416092          0         22
        farm |      5999    .5730955    .4946694          0          1
     urban98 |      5999    .2883814    .4530472          0          1
-------------+--------------------------------------------------------
      hhsize |      5999    4.752292    1.954292          1         19
     lhhexp1 |      5999    9.341561    .6877458   6.543108   12.20242
    lhhex12m |      5006    6.310585    1.593083          0   12.36325
    lnrlfood |      5999    8.679536    .5368118   6.356364   11.38385
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile sex age educyr98 farm urban98 hhsize lhhexp1 lhhex12m lnrlfood /*
>    */ using qreg0902.asc, replace
.
. * drop zero observations for medical expenditures
. drop if lhhex12m == .
(993 observations deleted)
.
. * lhhexp1 is natural logarithm of household total expenditure
. * lhhex12m is natural logarithm of household medical expenditure
. gen lntotal = lhhexp1
. gen lnmed = lhhex12m
. label variable lntotal "Log household total expenditure"
. label variable lnmed "Log household medical expenditure"
. describe
Contains data from qreg0902.dta
  obs:         5,006
 vars:            11                          19 Sep 2002 21:45
 size:       200,240 (98.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sex             byte   %8.0g                  Gender of HH.head (1:M;2:F)
age             int    %8.0g                  Age of household head
educyr98        float  %9.0g                  schooling year of HH.head
farm            float  %9.0g       loaiho     Type of HH (1:farm; 0:nonfarm)
urban98         byte   %8.0g       urban      1:urban 98; 0:rural 98
hhsize          long   %12.0g                 Household size
lhhexp1         float  %9.0g
lhhex12m        float  %9.0g
lnrlfood        float  %9.0g
lntotal         float  %9.0g                  Log household total expenditure
lnmed           float  %9.0g                  Log household medical expenditure
-------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         sex |      5006    1.269676     .443836          1          2
         age |      5006    48.06133    13.79974         18         95
    educyr98 |      5006    7.147956    4.333304          0         21
        farm |      5006    .5679185    .4954151          0          1
     urban98 |      5006    .2920495    .4547504          0          1
-------------+--------------------------------------------------------
      hhsize |      5006    4.832601     1.95257          1         19
     lhhexp1 |      5006    9.370402    .6726841   6.543108   12.20242
    lhhex12m |      5006    6.310585    1.593083          0   12.36325
    lnrlfood |      5006    8.697963    .5309517   6.356364   11.38385
     lntotal |      5006    9.370402    .6726841   6.543108   12.20242
-------------+--------------------------------------------------------
       lnmed |      5006    6.310585    1.593083          0   12.36325
.
. ********* ANALYSIS: QUANTILE REGRESSION **********
.
. * (0) OLS
. reg lnmed lntotal
      Source |       SS       df       MS              Number of obs =    5006
-------------+------------------------------           F(  1,  5004) =  311.91
       Model |  745.293239     1  745.293239           Prob > F      =  0.0000
    Residual |  11956.9671  5004  2.38948183           R-squared     =  0.0587
-------------+------------------------------           Adj R-squared =  0.0585
       Total |  12702.2603  5005  2.53791415           Root MSE      =  1.5458

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .5736545   .0324817    17.66   0.000     .5099761    .6373328
       _cons |   .9352117   .3051496     3.06   0.002     .3369847    1.533439
------------------------------------------------------------------------------
. predict pols
(option xb assumed; fitted values)
. reg lnmed lntotal, robust
Regression with robust standard errors                 Number of obs =    5006
                                                       F(  1,  5004) =  318.05
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0587
                                                       Root MSE      =  1.5458

------------------------------------------------------------------------------
             |               Robust
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .5736545   .0321665    17.83   0.000      .510594     .636715
       _cons |   .9352117    .298119     3.14   0.002     .3507677    1.519656
------------------------------------------------------------------------------
. * Bootstrap standard errors for OLS
. set seed 10101
. * bs "reg lnmed lntotal" "_b[lntotal]", reps(100)
.
. * (1) Quantile and median regression for quantiles 0.1, 0.5 and 0.9
. * Save prediction to construct Figure 4.2.
. qreg lnmed lntotal, quant(.10)
Iteration 1: WLS sum of weighted deviations = 3554.0793
Iteration 1: sum of abs. weighted deviations = 3555.3279
Iteration 2: sum of abs. weighted deviations = 3344.1924
Iteration 3: sum of abs. weighted deviations = 3051.7353
Iteration 4: sum of abs. weighted deviations = 2942.1274
Iteration 5: sum of abs. weighted deviations = 2939.3979
Iteration 6: sum of abs. weighted deviations = 2935.9969
Iteration 7: sum of abs. weighted deviations = 2933.0493
Iteration 8: sum of abs. weighted deviations = 2932.7763
Iteration 9: sum of abs. weighted deviations = 2932.4432
Iteration 10: sum of abs. weighted deviations = 2932.4429
.1 Quantile regression                               Number of obs =      5006
  Raw sum of deviations 2936.097 (about 4.1743875)
  Min sum of deviations 2932.443                     Pseudo R2     =    0.0012

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .1512009   .0552584     2.74   0.006     .0428702    .2595317
       _cons |   2.825072   .5194064     5.44   0.000     1.806808    3.843336
------------------------------------------------------------------------------
. predict pqreg10
(option xb assumed; fitted values)
. qreg lnmed lntotal, quant(.5)
Iteration 1: WLS sum of weighted deviations = 6112.8801
Iteration 1: sum of abs. weighted deviations = 6112.4546
Iteration 2: sum of abs. weighted deviations = 6098.5295
Iteration 3: sum of abs. weighted deviations = 6097.2178
Iteration 4: sum of abs. weighted deviations = 6097.1564
Median regression                                    Number of obs =      5006
  Raw sum of deviations 6324.265 (about 6.3716121)
  Min sum of deviations 6097.156                     Pseudo R2     =    0.0359

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .6210917   .0388194    16.00   0.000     .5449886    .6971948
       _cons |   .5921626   .3646869     1.62   0.104    -.1227836    1.307109
------------------------------------------------------------------------------
. predict pqreg50
(option xb assumed; fitted values)
. qreg lnmed lntotal, quant(.90)
Iteration 1: WLS sum of weighted deviations = 3275.6073
Iteration 1: sum of abs. weighted deviations = 3279.5575
Iteration 2: sum of abs. weighted deviations = 2691.3839
Iteration 3: sum of abs. weighted deviations = 2521.5214
Iteration 4: sum of abs. weighted deviations = 2506.303
Iteration 5: sum of abs. weighted deviations = 2505.1952
Iteration 6: sum of abs. weighted deviations = 2505.1334
Iteration 7: sum of abs. weighted deviations = 2505.1314
Iteration 8: sum of abs. weighted deviations = 2505.1313
.9 Quantile regression                               Number of obs =      5006
  Raw sum of deviations 2687.692 (about 8.2789364)
  Min sum of deviations 2505.131                     Pseudo R2     =    0.0679

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .8003569   .0517225    15.47   0.000     .6989581    .9017558
       _cons |   .6750967   .4857563     1.39   0.165    -.2771985    1.627392
------------------------------------------------------------------------------
. predict pqreg90
(option xb assumed; fitted values)
.
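The "sum of deviations" lines in the qreg output above refer to an asymmetric absolute loss. As a hedged illustration (a sketch, not the algorithm Stata uses), the Python code below shows that minimising this loss over a constant recovers a sample quantile, and that the reported Pseudo R2 values are consistent with 1 minus the ratio of the minimised sum to the raw sum. The function names are hypothetical and the toy data are artificial.

```python
def check_loss(residual, q):
    """Asymmetric absolute loss: q*u for u >= 0 and (q - 1)*u for u < 0."""
    return residual * (q - (1.0 if residual < 0 else 0.0))


def sample_quantile_by_minimisation(values, q):
    """Return the data point c that minimises the sum of check_loss(y - c, q)."""
    return min(values, key=lambda c: sum(check_loss(y - c, q) for y in values))


def pseudo_r2(min_sum, raw_sum):
    """One minus the ratio of the minimised to the raw sum of deviations."""
    return 1.0 - min_sum / raw_sum


if __name__ == "__main__":
    toy = list(range(1, 12))  # artificial data: 1, 2, ..., 11
    for q in (0.1, 0.5, 0.9):
        print("q =", q, "minimiser =", sample_quantile_by_minimisation(toy, q))
    # Sums of deviations copied from the three qreg outputs in the log.
    for label, min_sum, raw_sum in [(".10", 2932.443, 2936.097),
                                    (".50", 6097.156, 6324.265),
                                    (".90", 2505.131, 2687.692)]:
        print(label, "pseudo R2 =", round(pseudo_r2(min_sum, raw_sum), 4))
```

Quantile regression replaces the constant c with a linear function of the regressors and minimises the same loss, which is why the fitted 10th, 50th and 90th percentile lines can have different slopes.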
. * (2) Create Figure 4.2 on page 90 first as this is easy
. graph twoway (scatter lnmed lntotal, msize(vsmall)) (lfit pqreg90 lntotal, clstyle(p2)) /*
> */ (lfit pqreg50 lntotal, clstyle(p1)) (lfit pqreg10 lntotal, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Regression Lines as Quantile Varies") /*
> */ xtitle("Log Household Total Expenditure", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log Household Medical Expenditure", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data") label(2 "90th percentile") /*
> */ label(3 "Median") label(4 "10th percentile"))
. graph export ch4fig2QR.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch4fig2QR.wmf written in Windows Metafile format)
.
. * (3) Create Figure 4.1 second as this is more difficult
. * Simultaneous quantile regression for quantiles 0.05, 0.10, ..., 0.90, 0.95
. * with standard errors by bootstrap - here 200 replications
. set seed 10101
. sqreg lnmed lntotal, quant(.05,.1,.15,.2,.25,.3,.35,.4,.45,.5,.55,.6,.65,.7,.75,.8,.85,.9,.95) reps(200)
(fitting base model)
(bootstrapping .....................................................................................
> ..................................................................................................
> .................)
Simultaneous quantile regression                     Number of obs =      5006
  bootstrap(200) SEs                                 .05 Pseudo R2 =    0.0015
                                                     .10 Pseudo R2 =    0.0012
                                                     .15 Pseudo R2 =    0.0058
                                                     .20 Pseudo R2 =    0.0106
                                                     .25 Pseudo R2 =    0.0149
                                                     .30 Pseudo R2 =    0.0183
                                                     .35 Pseudo R2 =    0.0242
                                                     .40 Pseudo R2 =    0.0274
                                                     .45 Pseudo R2 =    0.0326
                                                     .50 Pseudo R2 =    0.0359
                                                     .55 Pseudo R2 =    0.0408
                                                     .60 Pseudo R2 =    0.0464
                                                     .65 Pseudo R2 =    0.0500
                                                     .70 Pseudo R2 =    0.0520
                                                     .75 Pseudo R2 =    0.0563
                                                     .80 Pseudo R2 =    0.0603
                                                     .85 Pseudo R2 =    0.0630
                                                     .90 Pseudo R2 =    0.0679
                                                     .95 Pseudo R2 =    0.0795

------------------------------------------------------------------------------
             |              Bootstrap
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
q5           |
     lntotal |   .1536332   .0791236     1.94   0.052    -.0014838    .3087501
       _cons |   2.095395   .7559016     2.77   0.006     .6134964    3.577293
-------------+----------------------------------------------------------------
q10          |
     lntotal |   .1512009    .085018     1.78   0.075    -.0154716    .3178734
       _cons |   2.825072   .7697613     3.67   0.000     1.316002    4.334141
-------------+----------------------------------------------------------------
q15          |
     lntotal |   .2695707   .0580757     4.64   0.000     .1557168    .3834245
       _cons |   2.231293   .5429047     4.11   0.000     1.166962    3.295624
-------------+----------------------------------------------------------------
q20          |
     lntotal |   .3552251   .0504688     7.04   0.000     .2562841    .4541662
       _cons |   1.740233   .4649551     3.74   0.000     .8287172    2.651749
-------------+----------------------------------------------------------------
q25          |
     lntotal |   .4034632   .0421514     9.57   0.000     .3208279    .4860984
       _cons |   1.567055   .3844967     4.08   0.000     .8132731    2.320837
-------------+----------------------------------------------------------------
q30          |
     lntotal |   .4797723   .0478081    10.04   0.000     .3860474    .5734972
       _cons |   1.097107   .4299363     2.55   0.011     .2542435     1.93997
-------------+----------------------------------------------------------------
q35          |
     lntotal |     .52179   .0440082    11.86   0.000     .4355147    .6080652
       _cons |   .9213684   .4064355     2.27   0.023     .1245768     1.71816
-------------+----------------------------------------------------------------
q40          |
     lntotal |   .5691746   .0412824    13.79   0.000     .4882429    .6501062
       _cons |   .6808693   .3754568     1.81   0.070    -.0551906    1.416929
-------------+----------------------------------------------------------------
q45          |
     lntotal |   .6123663   .0402805    15.20   0.000     .5333989    .6913337
       _cons |   .4890392    .373467     1.31   0.190    -.2431197    1.221198
-------------+----------------------------------------------------------------
q50          |
     lntotal |   .6210917   .0414602    14.98   0.000     .5398117    .7023718
       _cons |   .5921626   .3866997     1.53   0.126    -.1659383    1.350263
-------------+----------------------------------------------------------------
q55          |
     lntotal |   .6523013     .02904    22.46   0.000     .5953701    .7092324
       _cons |   .4913988    .264271     1.86   0.063    -.0266881    1.009486
-------------+----------------------------------------------------------------
q60          |
     lntotal |   .6531127   .0321585    20.31   0.000     .5900679    .7161575
       _cons |   .6631971   .2981433     2.22   0.026     .0787056    1.247689
-------------+----------------------------------------------------------------
q65          |
     lntotal |   .6843844     .03378    20.26   0.000     .6181608    .7506079
       _cons |   .5550968   .3162769     1.76   0.079    -.0649445    1.175138
-------------+----------------------------------------------------------------
q70          |
     lntotal |    .714783   .0330755    21.61   0.000     .6499406    .7796255
       _cons |   .4732288   .3028818     1.56   0.118    -.1205524     1.06701
-------------+----------------------------------------------------------------
q75          |
     lntotal |   .7416898   .0369607    20.07   0.000     .6692306     .814149
       _cons |   .4298887   .3416755     1.26   0.208     -.239945    1.099722
-------------+----------------------------------------------------------------
q80          |
     lntotal |   .7675658   .0443925    17.29   0.000      .680537    .8545946
       _cons |   .3966887   .4132223     0.96   0.337    -.4134081    1.206785
-------------+----------------------------------------------------------------
q85          |
     lntotal |   .8009016    .056703    14.12   0.000     .6897389    .9120642
       _cons |   .3649957   .5369325     0.68   0.497    -.6876273    1.417619
-------------+----------------------------------------------------------------
q90          |
     lntotal |   .8003569   .0473557    16.90   0.000     .7075189    .8931949
       _cons |   .6750967   .4450068     1.52   0.129    -.1973116    1.547505
-------------+----------------------------------------------------------------
q95          |
     lntotal |    .767308   .0507532    15.12   0.000     .6678094    .8668066
       _cons |   1.487137   .4739756     3.14   0.002     .5579371    2.416337
------------------------------------------------------------------------------
. * Test equality of slope coefficients for 25th and 75th quantiles
. test [q25]lntotal = [q75]lntotal
( 1) [q25]lntotal - [q75]lntotal = 0
F( 1, 5004) = 55.14
Prob > F = 0.0000
. * Create vectors of slope coefficients and estimated variances
. * Code here is specific to this problem:
. * with a single regressor, the slope coefficient is the 1st, 3rd, 5th, ... entry
. matrix b = e(b)
. matrix bslopevector = b[1,1]\b[1,3]\b[1,5]\b[1,7]\b[1,9]\b[1,11]\b[1,13] /*
> */ \b[1,15]\b[1,17]\b[1,19]\b[1,21]\b[1,23]\b[1,25] /*
> */ \b[1,27]\b[1,29]\b[1,31]\b[1,33]\b[1,35]\b[1,37]
. matrix V = e(V)
. matrix Vslopevector = V[1,1]\V[3,3]\V[5,5]\V[7,7]\V[9,9]\V[11,11]\V[13,13] /*
> */ \V[15,15]\V[17,17]\V[19,19]\V[21,21]\V[23,23]\V[25,25] /*
> */ \V[27,27]\V[29,29]\V[31,31]\V[33,33]\V[35,35]\V[37,37]
. matrix q = e(q1)\e(q2)\e(q3)\e(q4)\e(q5)\e(q6)\e(q7)\e(q8)\e(q9)\e(q10) /*
> */ \e(q11)\e(q12)\e(q13)\e(q14)\e(q15)\e(q16)\e(q17)\e(q18)\e(q19)
. * Convert column vectors to variables as graph handles variables
. svmat bslopevector, name(bslope)
. svmat Vslopevector, name(Vslope)
. svmat q, name(quantiles)
. gen upper = bslope1 + 1.96*sqrt(Vslope1)
(4987 missing values generated)
. gen lower = bslope1 - 1.96*sqrt(Vslope1)
(4987 missing values generated)
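The hard-coded index lists used above to pull out the slopes can also be built with a loop. The following is a minimal sketch, not part of the original program. It assumes the results in e(b) and e(V) still come from the 19-quantile simultaneous quantile regression with one regressor plus a constant, so that slopes sit in the odd-numbered positions. The names nq, bslope_alt and Vslope_alt are illustrative.

```stata
* Sketch only: assumes e(b) and e(V) come from the 19-quantile regression above
local nq = 19
matrix b = e(b)
matrix V = e(V)
matrix bslope_alt = J(`nq', 1, .)
matrix Vslope_alt = J(`nq', 1, .)
forvalues j = 1/`nq' {
    * Position of the j-th slope coefficient in e(b)
    local k = 2*`j' - 1
    matrix bslope_alt[`j', 1] = b[1, `k']
    matrix Vslope_alt[`j', 1] = V[`k', `k']
}
matrix list bslope_alt
```

With more than one regressor the stride 2 would have to change, which is why the original comments flag the indexing as specific to this problem.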
. * Also include OLS slope coefficient
. quietly reg lnmed lntotal
. gen bols=_b[lntotal]
. sum upper bslope1 lower bols


    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       upper |        19    .6564067    .1904354   .3087155   .9120393
     bslope1 |        19    .5641943     .209318   .1512009   .8009015
       lower |        19    .4719818    .2302585  -.0154343   .7075397
        bols |      5006    .5736545           0   .5736545   .5736545
.
. * Following produces Figure 4.1 on page 89
. graph twoway (line upper quantiles1, msize(vtiny) mstyle(p2) clstyle(p1) clcolor(gs12)) /*
> */ (line bslope1 quantiles1, msize(vtiny) mstyle(p1) clstyle(p1)) /*
> */ (line lower quantiles1, msize(vtiny) mstyle(p2) clstyle(p1) clcolor(gs12)) /*
> */ (line bols quantiles1, msize(vtiny) mstyle(p3) clstyle(p2)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Slope Estimates as Quantile Varies") /*
> */ xtitle("Quantile", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Slope and confidence bands", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Quantile slope coefficient") /*
> */ label(3 "Lower 95% confidence band") label(4 "OLS slope coefficient") )
. graph export ch4fig1QR.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch4fig1QR.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma04p2qreg.txt
log type: text
closed on: 17 May 2005, 13:51:21
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma04p3iv.txt
log type: text
opened on: 17 May 2005, 13:44:29
.
. ********** OVERVIEW OF MMA04P3IV.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 4.8.8 pages 102-3
. * Instrumental variables analysis.
. * (1) IV Regression (with robust s.e.'s though not needed here for iid error).
. * (2) Table 4.4
. * using generated data (see below)
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA and SUMMARIZE **********
.
. * Model is
. * y = b1 + b2*x + u
. * x = c1 + c2*z + v
. * z ~ N[2,1]
. * where b1=0, b2=0.5, c1=0 and c2=1
. * and u and v are joint normal (0,0,1,1,0.8)
.
. * OLS of y on x is inconsistent as x is correlated with u
. * Instead need to do IV with instrument z for x
. * Also try using z-cubed as an alternative instrument
.
. set seed 10001
. set obs 10000
obs was 0, now 10000
. scalar b1 = 0
. scalar b2 = 0.5
. scalar c1 = 0
. scalar c2 = 1
.
. * Generate errors u and v
. * Use fact that u is N(0,1)
. * and v | u is N(0 + (.8/1)(u - 0), 1 - .8x.8/1 = 0.36)
. gen u = 1*invnorm(uniform())
. gen muvgivnu = 0.8*u
. gen v = 1*(muvgivnu+sqrt(0.36)*invnorm(uniform()))
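The two-step construction above draws u first and then v from its conditional distribution given u. As a hedged alternative, the same joint distribution can be drawn in one step with drawnorm. This is a sketch only: the names C, u_alt and v_alt are illustrative, and the draws will not reproduce the random numbers in this log.

```stata
* Sketch only: (u_alt, v_alt) jointly normal, zero means, unit variances, correlation 0.8
matrix C = (1, 0.8 \ 0.8, 1)
drawnorm u_alt v_alt, corr(C)
* Sample correlation should be close to 0.8
correlate u_alt v_alt
```

The conditional approach in the log makes the dependence of v on u explicit, which is useful when explaining why x is endogenous.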
.
. * Generate instrument z (which is purely random)
. gen z = 2 + 1*invnorm(uniform())

.
. * Generate regressor x which is correlated with z, and with u via v
. gen x = c1 + c2*z + v
.
. * Generate dependent variable y
. gen y = b1 + b2*x + u
.
. * Generate z-cubed. Used as an alternative instrument
. gen zcube = z*z*z
.
. * Descriptive Statistics
. describe
Contains data
  obs:        10,000
 vars:             7
 size:       320,000 (96.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
u               float  %9.0g
muvgivnu        float  %9.0g
v               float  %9.0g
z               float  %9.0g
x               float  %9.0g
y               float  %9.0g
zcube           float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |     10000     .003772    1.010726  -4.010302   4.267661
    muvgivnu |     10000    .0030176    .8085809  -3.208241   3.414129
           v |     10000    .0097031    1.005874  -3.992237    3.79261
           z |     10000    1.997786    1.013118  -1.895752    5.81496
           x |     10000    2.007489    1.436511  -3.139744   7.366555
-------------+--------------------------------------------------------
           y |     10000    1.007516    1.538611  -5.309155   7.794924
       zcube |     10000    14.14145    17.88016  -6.813095   196.6257
. correlate y x z u v
(obs=10000)

             |        y        x        z        u        v
-------------+---------------------------------------------
           y |   1.0000
           x |   0.8423   1.0000
           z |   0.3403   0.7140   1.0000
           u |   0.9237   0.5716   0.0107   1.0000
           v |   0.8601   0.7090   0.0124   0.8055   1.0000

. correlate y x z u v, cov
(obs=10000)
             |        y        x        z        u        v
-------------+---------------------------------------------
           y |  2.36732
           x |  1.86165  2.06356
           z |  .530456   1.0391  1.02641
           u |   1.4365  .829866  .010909  1.02157
           v |  1.33119  1.02447  .012687  .818958  1.01178

. graph matrix y x z u v
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x z u v using mma04p3iv.asc, replace
.
. ********** DO THE ANALYSIS: ESTIMATE MODELS **********
.
. * (1) OLS is inconsistent (first column of Table 4.4)
. regress y x
      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) =24412.17
       Model |  16793.2198     1  16793.2198           Prob > F      =  0.0000
    Residual |  6877.65935  9998  .687903516           R-squared     =  0.7094
-------------+------------------------------           Adj R-squared =  0.7094
       Total |  23670.8791  9999  2.36732464           Root MSE      =   .8294

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9021522    .005774   156.24   0.000      .890834    .9134704
       _cons |  -.8035441    .014253   -56.38   0.000    -.8314827   -.7756054
------------------------------------------------------------------------------
. regress y x, robust
Regression with robust standard errors                 Number of obs =   10000
                                                       F(  1,  9998) =24780.49
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7094
                                                       Root MSE      =   .8294

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9021522   .0057309   157.42   0.000     .8909184    .9133859
       _cons |  -.8035441   .0141056   -56.97   0.000    -.8311939   -.7758942
------------------------------------------------------------------------------
. estimates store olswrong
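The OLS slope of about 0.90 is close to what the data generating process implies. The following short check uses only the values stated in the setup: b2 = 0.5, c2 = 1, Var(z) = 1, Var(v) = 1 and Cov(u,v) = 0.8, with z drawn independently of u and v.

```latex
\operatorname{plim}\,\hat{b}_{2}^{OLS}
  = b_2 + \frac{\operatorname{Cov}(x,u)}{\operatorname{Var}(x)}
  = b_2 + \frac{\operatorname{Cov}(v,u)}{c_2^{2}\operatorname{Var}(z) + \operatorname{Var}(v)}
  = 0.5 + \frac{0.8}{1 + 1}
  = 0.9
```

The sample analogue from the covariance matrix reported earlier, 0.5 + .829866/2.06356, is roughly 0.902, in line with the estimated coefficient.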
.
. * (2) IV with instrument z is consistent and efficient (second column of Table 4.4)
. ivreg y (x = z)
Instrumental variables (2SLS) regression
      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 2728.97
       Model |  13628.1781     1  13628.1781           Prob > F      =  0.0000
    Residual |   10042.701  9998    1.004471           R-squared     =  0.5757
-------------+------------------------------           Adj R-squared =  0.5757
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.0022

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5104982   .0097723    52.24   0.000     .4913426    .5296538
       _cons |   -.017303   .0220296    -0.79   0.432    -.0604854    .0258793
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   z
------------------------------------------------------------------------------
. ivreg y (x = z), robust
IV (2SLS) regression with robust standard errors       Number of obs =   10000
                                                       F(  1,  9998) = 2670.19
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5757
                                                       Root MSE      =  1.0022

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5104982   .0098792    51.67   0.000     .4911329    .5298635
       _cons |   -.017303   .0220785    -0.78   0.433    -.0605813    .0259752
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   z
------------------------------------------------------------------------------
. estimates store iv
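The ivreg command used in this log (run under version 8) was superseded by ivregress in later Stata releases. The following is a hedged sketch of roughly equivalent calls, assuming the variables y, x and z from this log are in memory and a Stata release that provides ivregress (10 or later). It is not part of the original program.

```stata
* Sketch only: requires a Stata release that provides ivregress
* The small option requests small-sample (t and F) statistics, as ivreg reports
ivregress 2sls y (x = z), small
ivregress 2sls y (x = z), small vce(robust)
* First-stage diagnostics for the instrument z
estat firststage
```

Without the small option, ivregress reports large-sample statistics, so standard errors can differ slightly from those shown above.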
.
. * (3) IV estimator in (2) can be computed by
. *     regress y on z gives dy/dz
. *     regress x on z gives dx/dz
. *     and divide the two
. regress y z
      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 1309.44
       Model |  2741.16635     1  2741.16635           Prob > F      =  0.0000
    Residual |  20929.7128  9998  2.09338995           R-squared     =  0.1158
-------------+------------------------------           Adj R-squared =  0.1157
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.4469

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    .516808   .0142819    36.19   0.000     .4888126    .5448035
       _cons |  -.0249553    .031991    -0.78   0.435    -.0876642    .0377535
------------------------------------------------------------------------------
. matrix byonz = e(b)
. regress x z
      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) =10396.43
       Model |  10518.3341     1  10518.3341           Prob > F      =  0.0000
    Residual |  10115.2362  9998  1.01172597           R-squared     =  0.5098
-------------+------------------------------           Adj R-squared =  0.5097
       Total |  20633.5703  9999  2.06356339           Root MSE      =  1.0058

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    1.01236   .0099287   101.96   0.000     .9928979    1.031822
       _cons |  -.0149899     .02224    -0.67   0.500    -.0585847     .028605
------------------------------------------------------------------------------
. matrix bxonz = e(b)
. matrix ivfirstprinciples = byonz[1,1]/bxonz[1,1]
. matrix list byonz
byonz[1,2]
             z       _cons
y1   .51680804  -.02495533
. matrix list bxonz
bxonz[1,2]
             z       _cons
y1   1.0123602  -.01498985
. matrix list ivfirstprinciples
symmetric ivfirstprinciples[1,1]
           c1
r1   .5104982
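The ratio above, .51680804/1.0123602 = .5104982, matches the ivreg slope estimate. As a hedged alternative to the 1x1 matrices, the same ratio can be computed with scalars. This is a sketch only, assuming y, x and z are in memory; the scalar names b_yz and b_xz are illustrative.

```stata
* Sketch only: IV slope as the ratio of two reduced-form slopes
quietly regress y z
scalar b_yz = _b[z]
quietly regress x z
scalar b_xz = _b[z]
display "IV estimate as ratio of slopes = " b_yz/b_xz
```

The ratio form only applies in the just-identified case with one endogenous regressor and one instrument.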
.
. * (4) IV can be computed as 2SLS, but wrong standard errors
. * (third column of Table 4.4)
. * (4A) OLS of x on z gives xhat
. regress x z
      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) =10396.43
       Model |  10518.3341     1  10518.3341           Prob > F      =  0.0000
    Residual |  10115.2362  9998  1.01172597           R-squared     =  0.5098
-------------+------------------------------           Adj R-squared =  0.5097
       Total |  20633.5703  9999  2.06356339           Root MSE      =  1.0058

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    1.01236   .0099287   101.96   0.000     .9928979    1.031822
       _cons |  -.0149899     .02224    -0.67   0.500    -.0585847     .028605
------------------------------------------------------------------------------
. predict xhat, xb
. * (4B) OLS of y on xhat gives IV but wrong standard errors
. regress y xhat
      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 1309.44
       Model |  2741.16636     1  2741.16636           Prob > F      =  0.0000
    Residual |  20929.7127  9998  2.09338995           R-squared     =  0.1158
-------------+------------------------------           Adj R-squared =  0.1157
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.4469

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xhat |   .5104982   .0141075    36.19   0.000     .4828446    .5381518
       _cons |   -.017303   .0318026    -0.54   0.586    -.0796425    .0450364
------------------------------------------------------------------------------
. regress y xhat, robust
Regression with robust standard errors                 Number of obs =   10000
                                                       F(  1,  9998) = 1271.86
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1158
                                                       Root MSE      =  1.4469

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xhat |   .5104982   .0143144    35.66   0.000      .482439    .5385574
       _cons |   -.017303   .0319207    -0.54   0.588    -.0798741     .045268
------------------------------------------------------------------------------
. estimates store twosls
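The second-stage standard errors are wrong because the residuals are computed with xhat rather than x. The following is a hedged sketch of how the nonrobust standard error could be corrected by hand: rescale it by the ratio of the root MSE based on x to the root MSE based on the fitted values. The names xhat_alt, uhat_alt, uhat_alt2, se_wrong, s_wrong and s_right are illustrative and not part of the original program.

```stata
* Sketch only: manual 2SLS with a corrected (nonrobust) standard error
quietly regress x z
predict double xhat_alt, xb
quietly regress y xhat_alt
scalar se_wrong = _se[xhat_alt]
scalar s_wrong  = e(rmse)
* Correct 2SLS residuals use the actual regressor x, not the fitted values
generate double uhat_alt  = y - _b[_cons] - _b[xhat_alt]*x
generate double uhat_alt2 = uhat_alt^2
quietly summarize uhat_alt2
* Degrees of freedom: N minus two estimated coefficients
scalar s_right = sqrt(r(sum)/(e(N) - 2))
display "corrected standard error = " se_wrong*s_right/s_wrong
```

Using the values in this log, .0141075 x 1.0022/1.4469 is about .0098, close to the ivreg standard error of .0097723.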
.
. * (5) IV with instrument zcube is consistent but inefficient
. * (last column of Table 4.4)
. ivreg y (x = zcube)
Instrumental variables (2SLS) regression
      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 2001.31
       Model |  13598.1181     1  13598.1181           Prob > F      =  0.0000
    Residual |   10072.761  9998   1.0074776           R-squared     =  0.5745
-------------+------------------------------           Adj R-squared =  0.5744
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.0037

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5086427   .0113699    44.74   0.000     .4863555    .5309299
       _cons |  -.0135782   .0249344    -0.54   0.586    -.0624546    .0352982
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   zcube
------------------------------------------------------------------------------
. ivreg y (x = zcube), robust
IV (2SLS) regression with robust standard errors       Number of obs =   10000
                                                       F(  1,  9998) = 1894.15
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5745
                                                       Root MSE      =  1.0037

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5086427   .0116871    43.52   0.000     .4857337    .5315517
       _cons |  -.0135782   .0253208    -0.54   0.592     -.063212    .0360556
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   zcube
------------------------------------------------------------------------------
. estimates store ivineff
.
. ********** DISPLAY KEY RESULTS in Table 4.4 p.103 **********
.
. * Table 4.4 page 103
. estimates table olswrong iv twosls ivineff, se stats(N r2) b(%8.3f) keep(_cons x xhat)
----------------------------------------------------------
    Variable | olswrong      iv        twosls    ivineff
-------------+--------------------------------------------
       _cons |   -0.804     -0.017     -0.017     -0.014
             |    0.014      0.022      0.032      0.025
           x |    0.902      0.510                 0.509
             |    0.006      0.010                 0.012
        xhat |                          0.510
             |                          0.014
-------------+--------------------------------------------
           N |  1.0e+04    1.0e+04    1.0e+04    1.0e+04
          r2 |    0.709      0.576      0.116      0.574
----------------------------------------------------------
                                              legend: b/se
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma04p3iv.txt
log type: text
closed on: 17 May 2005, 13:44:41
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma04p4ivweak.txt
log type: text
opened on: 17 May 2005, 13:45:59

.
. ********** OVERVIEW OF MMA04P4IVWEAK.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 4.9.5 pages 110-2
. * IV regression with potentially weak instruments
. * (1) Compares OLS and IV estimation of log-wages on schooling regression
. * where schooling, experience and experience-squared are endogenous
. * and proximity to 4-year college, age and age-squared are instruments
. * so model is just-identified.
. * (2) Verifies that here can treat errors as homoskedastic
. * (3) Looks at weak instruments
. * (A) instrument relevance: Whether Shea's partial R-squared is low
. * (B) finite sample bias: whether first-stage partial F is low
. * (4) Provides Table 4.5
. * (5) Does more analysis than reported in the book
.
. * To run this program you need data and dictionary files
. * DATA66.dat ASCII data set
. * DATA66.dct Stata dictionary that labels variables
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set memory 20m
(20480k)
. set linesize 150 /* Permits long inputline commands with delimit */
.
. ********** ORIGINAL DATA SOURCE **********
.
. * Program mma04p4ivweak.do based on Kling Analys66.do September 2003
. * written for Jeffrey R. Kling (2001) "Interpreting Instrumental Variables Estimates
. * of the Return to Schooling", Journal of Business and Economic Statistics,
. * July 2001, 19 (3), pp.358-364.
. * This program focuses on Columns (1) and (2) of Kling's Table 1 on p.359
. * in turn based on
. * David Card (1995), "Using Geographic Variation in College Proximity to
. * Estimate the Returns to Schooling", in
. * Aspects of Labor Market Behavior: Essays in Honor of John Vanderkamp,
. * eds. L.N. Christofides et al., Toronto: University of Toronto Press, pp.201-221.
.
. ********** READ IN DATA and SUMMARIZE **********


.
. infile using DATA66.dct, using(DATA66.dat)
dictionary using DATA66.dat {
_column(1)   id       %8f  "ID CODE (r0000100) n= 5225 mean= 2613.000 min= 1 max= 5225 "
_column(9)   black    %3f  "Race (r0002300) n= 5225 mean= 1.296 min= 1 max=3 "
_column(13)  imigrnt  %3f  "Was r's brthpl in the US? (r0038000) n=4965 mean=0.98 mn=0 mx=1 "
_column(17)  hhead    %8f  "Person R lived w/ @ age 14 (r0039700) n= 5213 mean=1.92 mn=1 mx=9"
_column(28)  mag_14   %10f "Were magznes avail at age 14 (r0039900) n=5167 mean=0.69 mn=0 mx=1 "
_column(40)  news_14  %10f "Were nwspaprs avail at age 14 (r0040000) n=5195 mean=0.85 mn=0 mx=1"
_column(52)  lib_14   %10f "Were lib-card avail at age14 (r0040100) n=5204 mean=0.66 mn=0 mx=1 "
_column(63)  num_sib  %8f  "Tot # sibs r 66 (r0056900) n=5168 mean=3.408 min=0 max=18"
_column(72)  fgrade   %8f  "Hgc by father, 66 (r0063100) n=3930 mean=9.937 min=0 max=18"
_column(81)  mgrade   %8f  "Hgc by mother, 66 (r0063300) n=4573 mean=10.25 min=0 max=18"
_column(90)  iq       %8f  "Iq_score (r0171100) n= 3369 mean=101.582 min=50 max=158 "
_column(99)  bdate    %8f  "Birthdate - STATA formatted "
_column(108) gfill76  %8f  "'76 Grade level, some values filled from prevs reports"
_column(117) wt76     %8f  "'76 Weight "
_column(126) grade76  %8f  "'76 Grade level"
_column(135) grade66  %8f  "'66 Grade level"
_column(144) age66    %8f  "Age reported by screener (r0002200) "
_column(153) smsa66   %8f  "If lived in SMSA in 1966 (r0002455=1,2)"
_column(162) region   %8f  "Census Region in 1966 (r0002900) "
_column(171) smsa76   %8f  "If lived in SMSA in 1976 (r0437515=1,2)"
_column(180) col4     %8f  "If any 4-year college nearby (r0004000!=4) "
_column(189) mcol4    %8f  "If male 4-year college nearby (r0004100=1,2) "
_column(198) col4pub  %8f  "If public 4-year college nearby (r0004000=2,3)"
_column(207) south76  %1f  "If lived in South in 1976 (r0437511=1) "
_column(209) wage76   %10f "'76 Wage"
_column(219) exp76    %8f  "'76 experience, (10 + age66) - grade76 - 6)"
_column(230) expsq76  %10f "'76 experience, exp76 ^2/100 "
_column(243) age76    %8f  "'76 age (age66 +10) "
_column(252) agesq76  %8f  "'76 age squared (age76^2) "
_column(261) reg1     %8f  "region==NE"
_column(270) reg2     %8f  "If lived in Region 2 (region= MidAtl)"
_column(279) reg3     %8f  "If lived in Region 3 (region= ENC) "
_column(288) reg4     %8f  "If lived in Region 4 (region= WNC) "
_column(297) reg5     %8f  "If lived in Region 5 (region= SA ) "
_column(306) reg6     %8f  "If lived in Region 6 (region= ESC) "
_column(315) reg7     %8f  "If lived in Region 7 (region= WSC) "
_column(324) reg8     %8f  "If lived in Region 8 (region= M ) "
_column(333) reg9     %8f  "If lived in Region 9 (region= P ) "
_column(342) momdad14 %8f  "If lived with both parents at age 14 "
_column(351) sinmom14 %8f  "If lived with mother only at age 14 "
_column(360) nodaded  %1f  "If father has no formal education "
_column(362) nomomed  %1f  "If mother has no formal education "
_column(365) daded    %10f "Mean grade level of father "
_column(377) momed    %10f "Mean grade level of mother "
_column(396) famed    %8f  "Father's and mother's education "
_column(405) famed1   %8f  "If mgrade> 12 & fgrade> 12 (famed=1) "
_column(414) famed2   %8f  "If mgrade>=12 & fgrade>=12 (famed=2) "
_column(423) famed3   %8f  "If mgrade==12 & fgrade==12 (famed=3) "
_column(432) famed4   %8f  "If mgrade>=12 & fgrade==-1 (famed=4) "
_column(441) famed5   %8f  "If fgrade>=12 (famed=5) "
_column(450) famed6   %8f  "If mgrade>=12 & fgrade> -1 (famed=6) "
_column(459) famed7   %8f  "If mgrade>=9 & fgrade>=9 (famed=7) "
_column(468) famed8   %8f  "If mgrade> -1 & fgrade> -1 (famed=8) "
_column(477) famed9   %8f  "If famed not in range (1-8)"
_column(486) int76    %8f  "If wt76 not missing "
_column(495) age1415  %8f  "If in age group =14-15"
_column(504) age1617  %8f  "If in age group =16-17"
_column(513) age1819  %8f  "If in age group =18-19"
_column(522) age2021  %8f  "If in age group =20-21"
_column(531) age2224  %8f  "If in age group =20-24"
_column(540) cage1415 %8f  "If in age group =14,15 and lived near college"
_column(549) cage1617 %8f  "If in age group =16,17 and lived near college"
_column(558) cage1819 %8f  "If in age group =18,19 and lived near college"
_column(567) cage2021 %8f  "If in age group =20,21 and lived near college"
_column(576) cage2224 %8f  "If in age group =20-24 and lived near college"
_column(585) cage66   %8f  "Age in 66 and whether lived near college "
_column(594) a1       %8f  "If age in 66 = 14 (age66= 14)"
_column(603) a2       %8f  "If age in 66 = 15 (age66= 15)"
_column(612) a3       %8f  "If age in 66 = 16 (age66= 16)"
_column(621) a4       %8f  "If age in 66 = 17 (age66= 17)"
_column(630) a5       %8f  "If age in 66 = 18 (age66= 18)"
_column(639) a6       %8f  "If age in 66 = 19 (age66= 19)"
_column(648) a7       %8f  "If age in 66 = 20 (age66= 20)"
_column(657) a8       %8f  "If age in 66 = 21 (age66= 21)"
_column(666) a9       %8f  "If age in 66 = 22 (age66= 22)"
_column(675) a10      %8f  "If age in 66 = 23 (age66= 23)"
_column(684) a11      %8f  "If age in 66 = 24 (age66= 24)"
_column(693) ca1      %8f  "Not lived near college in 66"
_column(702) ca2      %8f  "If age in 66 = 14 and lived near college"
_column(711) ca3      %8f  "If age in 66 = 15 and lived near college"
_column(720) ca4      %8f  "If age in 66 = 16 and lived near college"
_column(729) ca5      %8f  "If age in 66 = 17 and lived near college"
_column(738) ca6      %8f  "If age in 66 = 18 and lived near college"
_column(747) ca7      %8f  "If age in 66 = 19 and lived near college"
_column(756) ca8      %8f  "If age in 66 = 20 and lived near college"
_column(765) ca9      %8f  "If age in 66 = 21 and lived near college"
_column(774) ca10     %2f  "If age in 66 = 22 and lived near college"
_column(777) ca11     %2f  "If age in 66 = 23 and lived near college"
_column(780) ca12     %8f  "If age in 66 = 24 and lived near college"
_column(782) g25      %12f "Grade level when 25 years old "
_column(795) g25i     %12f "If =g25 and intrvwed in year used for determining g25 "
_column(819) intmo66  %8f  "Intvw month in 1966, used to identify cases incl by CARD"
_column(828) nlsflt   %8f  "Flag to identify if the case was used by CARD"
_column(837) nsib     %8f  "Number of siblings "
_column(846) ns1      %8f  "If number of siblings = 0 (nsib= 0)"
_column(855) ns2      %8f  "If number of siblings = 2 (nsib= 2)"
_column(864) ns3      %8f  "If number of siblings = 3 (nsib= 3)"
_column(873) ns4      %8f  "If number of siblings = 4 (nsib= 4)"
_column(882) ns5      %8f  "If number of siblings = 6 (nsib= 6)"
_column(891) ns6      %8f  "If number of siblings = 9 (nsib= 9)"
_column(900) ns7      %8f  "If number of siblings =18 (nsib=18)"
}
(5226 observations read)
. * save DATA66, replace
. desc
Contains data
  obs:         5,226
 vars:           101
 size:     2,132,208 (89.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g                  ID CODE (r0000100) n= 5225 mean= 2613.000 min= 1 max= 5225
black           float  %9.0g                  Race (r0002300) n= 5225 mean= 1.296 min= 1 max=3
imigrnt         float  %9.0g                  Was r's brthpl in the US? (r0038000) n=4965 mean=0.98 mn=0 mx=1
hhead           float  %9.0g                  Person R lived w/ @ age 14 (r0039700) n= 5213 mean=1.92 mn=1 mx=9
mag_14          float  %9.0g                  Were magznes avail at age 14 (r0039900) n=5167 mean=0.69 mn=0 mx=1
news_14         float  %9.0g                  Were nwspaprs avail at age 14 (r0040000) n=5195 mean=0.85 mn=0 mx=1
lib_14          float  %9.0g                  Were lib-card avail at age14 (r0040100) n=5204 mean=0.66 mn=0 mx=1
num_sib         float  %9.0g                  Tot # sibs r 66 (r0056900) n=5168 mean=3.408 min=0 max=18
fgrade          float  %9.0g                  Hgc by father, 66 (r0063100) n=3930 mean=9.937 min=0 max=18
mgrade          float  %9.0g                  Hgc by mother, 66 (r0063300) n=4573 mean=10.25 min=0 max=18
iq              float  %9.0g                  Iq_score (r0171100) n= 3369 mean=101.582 min=50 max=158
bdate           float  %9.0g                  Birthdate - STATA formatted
gfill76         float  %9.0g                  '76 Grade level, some values filled from prevs reports
wt76            float  %9.0g                  '76 Weight
grade76         float  %9.0g                  '76 Grade level
grade66         float  %9.0g                  '66 Grade level
age66           float  %9.0g                  Age reported by screener (r0002200)
smsa66          float  %9.0g                  If lived in SMSA in 1966 (r0002455=1,2)
region          float  %9.0g                  Census Region in 1966 (r0002900)
smsa76          float  %9.0g                  If lived in SMSA in 1976 (r0437515=1,2)
col4            float  %9.0g                  If any 4-year college nearby (r0004000!=4)
mcol4           float  %9.0g                  If male 4-year college nearby (r0004100=1,2)
col4pub         float  %9.0g                  If public 4-year college nearby (r0004000=2,3)
south76         float  %9.0g                  If lived in South in 1976 (r0437511=1)
wage76          float  %9.0g                  '76 Wage
exp76           float  %9.0g                  '76 experience, (10 + age66) - grade76 - 6)
expsq76         float  %9.0g                  '76 experience, exp76 ^2/100
age76           float  %9.0g                  '76 age (age66 +10)
agesq76         float  %9.0g                  '76 age squared (age76^2)
reg1            float  %9.0g                  region==NE
reg2            float  %9.0g                  If lived in Region 2 (region= MidAtl)
reg3            float  %9.0g                  If lived in Region 3 (region= ENC)
reg4            float  %9.0g                  If lived in Region 4 (region= WNC)
reg5            float  %9.0g                  If lived in Region 5 (region= SA )
reg6            float  %9.0g                  If lived in Region 6 (region= ESC)
reg7            float  %9.0g                  If lived in Region 7 (region= WSC)
reg8            float  %9.0g                  If lived in Region 8 (region= M )
reg9            float  %9.0g                  If lived in Region 9 (region= P )
momdad14        float  %9.0g                  If lived with both parents at age 14
sinmom14        float  %9.0g                  If lived with mother only at age 14
nodaded         float  %9.0g                  If father has no formal education
nomomed         float  %9.0g                  If mother has no formal education
daded           float  %9.0g                  Mean grade level of father
momed           float  %9.0g                  Mean grade level of mother
famed           float  %9.0g                  Father's and mother's education
famed1          float  %9.0g                  If mgrade> 12 & fgrade> 12 (famed=1)
famed2          float  %9.0g                  If mgrade>=12 & fgrade>=12 (famed=2)
famed3          float  %9.0g                  If mgrade==12 & fgrade==12 (famed=3)
famed4          float  %9.0g                  If mgrade>=12 & fgrade==-1 (famed=4)
famed5          float  %9.0g                  If fgrade>=12 (famed=5)
famed6          float  %9.0g                  If mgrade>=12 & fgrade> -1 (famed=6)
famed7          float  %9.0g                  If mgrade>=9 & fgrade>=9 (famed=7)
famed8          float  %9.0g                  If mgrade> -1 & fgrade> -1 (famed=8)
famed9          float  %9.0g                  If famed not in range (1-8)
int76           float  %9.0g                  If wt76 not missing
age1415         float  %9.0g                  If in age group =14-15
age1617         float  %9.0g                  If in age group =16-17
age1819         float  %9.0g                  If in age group =18-19
age2021         float  %9.0g                  If in age group =20-21
age2224         float  %9.0g                  If in age group =20-24
cage1415        float  %9.0g                  If in age group =14,15 and lived near college
cage1617        float  %9.0g                  If in age group =16,17 and lived near college
cage1819        float  %9.0g                  If in age group =18,19 and lived near college
cage2021        float  %9.0g                  If in age group =20,21 and lived near college
cage2224        float  %9.0g                  If in age group =20-24 and lived near college
cage66          float  %9.0g                  Age in 66 and whether lived near college
a1              float  %9.0g                  If age in 66 = 14 (age66= 14)
a2              float  %9.0g                  If age in 66 = 15 (age66= 15)
a3              float  %9.0g                  If age in 66 = 16 (age66= 16)
a4              float  %9.0g                  If age in 66 = 17 (age66= 17)
a5              float  %9.0g                  If age in 66 = 18 (age66= 18)
a6              float  %9.0g                  If age in 66 = 19 (age66= 19)
a7              float  %9.0g                  If age in 66 = 20 (age66= 20)
a8              float  %9.0g                  If age in 66 = 21 (age66= 21)
a9              float  %9.0g                  If age in 66 = 22 (age66= 22)
a10             float  %9.0g                  If age in 66 = 23 (age66= 23)
a11             float  %9.0g                  If age in 66 = 24 (age66= 24)
ca1             float  %9.0g                  Not lived near college in 66
ca2             float  %9.0g                  If age in 66 = 14 and lived near college
ca3             float  %9.0g                  If age in 66 = 15 and lived near college
ca4             float  %9.0g                  If age in 66 = 16 and lived near college
ca5             float  %9.0g                  If age in 66 = 17 and lived near college
ca6             float  %9.0g                  If age in 66 = 18 and lived near college
ca7             float  %9.0g                  If age in 66 = 19 and lived near college
ca8             float  %9.0g                  If age in 66 = 20 and lived near college
ca9             float  %9.0g                  If age in 66 = 21 and lived near college
ca10            float  %9.0g                  If age in 66 = 22 and lived near college
ca11            float  %9.0g                  If age in 66 = 23 and lived near college
ca12            float  %9.0g                  If age in 66 = 24 and lived near college
g25             float  %9.0g                  Grade level when 25 years old
g25i            float  %9.0g                  If =g25 and intrvwed in year used for determining g25
intmo66         float  %9.0g                  Intvw month in 1966, used to identify cases incl by CARD
nlsflt          float  %9.0g                  Flag to identify if the case was used by CARD
nsib            float  %9.0g                  Number of siblings
ns1             float  %9.0g                  If number of siblings = 0 (nsib= 0)
ns2             float  %9.0g                  If number of siblings = 2 (nsib= 2)
ns3             float  %9.0g                  If number of siblings = 3 (nsib= 3)
ns4             float  %9.0g                  If number of siblings = 4 (nsib= 4)
ns5             float  %9.0g                  If number of siblings = 6 (nsib= 6)
ns6             float  %9.0g                  If number of siblings = 9 (nsib= 9)
ns7             float  %9.0g                  If number of siblings =18 (nsib=18)
-------------------------------------------------------------------------------

Sorted by:
Note: dataset has changed since last saved
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      5225        2613    1508.472          1       5225
       black |      5225    .2752153    .4466655          0          1
     imigrnt |      5225    .0237321    .1522277          0          1
       hhead |      5225   -.3783732    47.95128       -999          9
      mag_14 |      5225    .6861566    .4616275          0          1
-------------+--------------------------------------------------------
     news_14 |      5225    .8483024    .3577176          0          1
      lib_14 |      5225     .658469    .4733619          0          1
     num_sib |      5168    3.407701    2.586307          0         18
      fgrade |      3930     9.93715    3.777654          0         18
      mgrade |      4573    10.25104     3.17986          0         18
-------------+--------------------------------------------------------
          iq |      3369    101.5818    15.93225         50        158
       bdate |      5204    472926.6    31765.04     360823     521224
     gfill76 |      5225    12.78718    2.802705          0         18
        wt76 |      3695    475512.5    265188.5      98617    2582192
     grade76 |      3671    13.23018    2.747627          0         18
-------------+--------------------------------------------------------
     grade66 |      5225    10.58431    2.433696          0         18
       age66 |      5225    18.09129    3.157657         14         24
      smsa66 |      5225    .6599043    .4737864          0          1
      region |      5225    4.721722    2.300767          1          9
      smsa76 |      5225     .491866    .4999817          0          1
-------------+--------------------------------------------------------
        col4 |      5225     .691866    .4617664          0          1
       mcol4 |      5225    .6874641    .4635713          0          1
     col4pub |      5225    .5129187    .4998809          0          1
     south76 |      3695    .3964817    .4892328          0          1
      wage76 |      3078    1.658013    .4430234          0     3.1797
-------------+--------------------------------------------------------
       exp76 |      3671    8.933533    4.212664          0         25
     expsq76 |      3671    .9754971    .8778352          0       6.25
       age76 |      5225    28.09129    3.157657         24         34
     agesq76 |      5225    799.0896    182.0539        576       1156
        reg1 |      5225         .04    .1959779          0          1
-------------+--------------------------------------------------------
        reg2 |      5225    .1617225    .3682313          0          1
        reg3 |      5225    .1900478    .3923763          0          1
        reg4 |      5225    .0639234    .2446399          0          1
        reg5 |      5225    .2126316    .4092083          0          1
        reg6 |      5225    .0895694    .2855912          0          1
-------------+--------------------------------------------------------
        reg7 |      5225    .1083254    .3108206          0          1
        reg8 |      5225    .0304306    .1717855          0          1
69

reg9 |
5225 .1033493 .3044437
0
1
momdad14 |
5225 .7680383 .4221251
0
1
sinmom14 |
5225 .1182775 .3229673
0
1
-------------+-------------------------------------------------------nodaded |
5225 .2478469 .4318038
0
1
nomomed |
5225 .1247847 .3305062
0
1
daded |
5225 9.937162 3.276134
0
18
momed |
5225 10.25103 2.974812
0
18
famed |
5225 6.05933 2.643855
1
9
-------------+-------------------------------------------------------famed1 |
5225 .0610526 .2394497
0
1
famed2 |
5225 .0742584 .262216
0
1
famed3 |
5225 .1144498 .3183872
0
1
famed4 |
5225 .0474641 .2126498
0
1
famed5 |
5225 .077512 .2674276
0
1
-------------+-------------------------------------------------------famed6 |
5225 .1245933 .3302888
0
1
famed7 |
5225 .0486124 .215077
0
1
famed8 |
5225 .2273684 .4191726
0
1
famed9 |
5225 .224689 .4174173
0
1
int76 |
5225 .707177 .4551014
0
1
-------------+-------------------------------------------------------age1415 |
5225 .2595215 .4384141
0
1
age1617 |
5225 .2482297 .4320271
0
1
age1819 |
5225 .1751196 .3801058
0
1
age2021 |
5225
.11311 .3167576
0
1
age2224 |
5225 .2040191 .4030216
0
1
-------------+-------------------------------------------------------cage1415 |
5225 .1755024 .3804327
0
1
cage1617 |
5225 .1680383 .3739361
0
1
cage1819 |
5225 .1245933 .3302888
0
1
cage2021 |
5225 .0796172 .2707256
0
1
cage2224 |
5225 .1441148 .3512397
0
1
-------------+-------------------------------------------------------cage66 |
5225 12.56115 8.785895
0
24
a1 |
5225 .1314833 .3379605
0
1
a2 |
5225 .1280383 .3341644
0
1
a3 |
5225 .1326316 .3392086
0
1
a4 |
5225 .1155981 .3197729
0
1
-------------+-------------------------------------------------------a5 |
5225 .098756 .2983627
0
1
a6 |
5225 .0763636 .2656045
0
1
a7 |
5225 .0560766 .2300915
0
1
a8 |
5225 .0570335 .2319288
0
1
a9 |
5225 .0666029 .2493568
0
1
-------------+-------------------------------------------------------a10 |
5225 .0683254 .2523275
0
1
a11 |
5225 .0690909 .2536329
0
1
ca1 |
5225 .308134 .4617664
0
1
ca2 |
5225 .0876555 .2828203
0
1
ca3 |
5225 .0878469 .2830992
0
1
70

-------------+-------------------------------------------------------ca4 |
5225 .0870813 .2819812
0
1
ca5 |
5225 .0809569 .2727951
0
1
ca6 |
5225 .0708134 .2565374
0
1
ca7 |
5225 .0537799 .2256044
0
1
ca8 |
5225 .0390431 .193716
0
1
-------------+-------------------------------------------------------ca9 |
5225 .0405742 .1973204
0
1
ca10 |
5225 .0465072 .2106009
0
1
ca11 |
5225 .0484211 .2146748
0
1
ca12 |
5225 12.52593 2.740455
0
18
g25 |
5225 12.53923 2.749407
0
18
-------------+-------------------------------------------------------g25i |
4148 12.77929 2.740756
0
18
intmo66 |
5225 -5.790239 128.4984
-999
12
nlsflt |
5225 .9835407 .1272459
0
1
nsib |
5225 2.818565 2.473752
0
18
ns1 |
5225 .2547368 .4357549
0
1
-------------+-------------------------------------------------------ns2 |
5225 .3534928 .4780998
0
1
ns3 |
5225 .0109091 .1038853
0
1
ns4 |
5225 .1892823 .3917702
0
1
ns5 |
5225 .135311 .3420882
0
1
ns6 |
5225 .0558852 .2297218
0
1
-------------+-------------------------------------------------------ns7 |
5225 .0003828 .0195628
0
1
.
. * Define the exogenous regressors using the global macro exogregressors
. global exogregressors black south76 smsa76 reg2-reg9 /*
> */ smsa66 momdad14 sinmom14 nodaded nomomed daded momed famed1-famed8
.
. * Write data to a text (ascii) file so can use with programs other than stata
. outfile wage76 grade76 exp76 expsq76 col4 age76 agesq76 black south76 smsa76 reg2-reg9 /*
> */ smsa66 momdad14 sinmom14 nodaded nomomed daded momed famed1-famed8 /*
> */ using mma04p4ivweak.asc, replace
.
.
. ********** (1) OLS AND IV ESTIMATES: COLUMNS 1 AND 2 OF KLING TABLE 1
.
. * RETAIN cases for the analysis
. * Here drop if missing wages or missing schooling or not at first interview
. keep if wage76!=. & grade76!=. & nlsflt==1
(2216 observations deleted)
.
. * DESCRIBE dependent variable, regressors and instruments
. desc wage76 grade76 exp76 expsq76 col4 age76 agesq76 $exogregressors

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
wage76          float  %9.0g                  '76 Wage
grade76         float  %9.0g                  '76 Grade level
exp76           float  %9.0g                  '76 experience, (10 + age66) grade76 - 6)
expsq76         float  %9.0g                  '76 experience, exp76 ^2/100
col4            float  %9.0g                  If any 4-year college nearby
                                                (r0004000!=4)
age76           float  %9.0g                  '76 age (age66 +10)
agesq76         float  %9.0g                  '76 age squared (age76^2)
black           float  %9.0g                  Race (r0002300) n= 5225 mean=
                                                1.296 min= 1 max=3
south76         float  %9.0g                  If lived in South in 1976
                                                (r0437511=1)
smsa76          float  %9.0g                  If lived in SMSA in 1976
                                                (r0437515=1,2)
reg2            float  %9.0g                  If lived in Region 2 (region=
                                                MidAtl)
reg3            float  %9.0g                  If lived in Region 3 (region=
                                                ENC)
reg4            float  %9.0g                  If lived in Region 4 (region=
                                                WNC)
reg5            float  %9.0g                  If lived in Region 5 (region=
                                                SA )
reg6            float  %9.0g                  If lived in Region 6 (region=
                                                ESC)
reg7            float  %9.0g                  If lived in Region 7 (region=
                                                WSC)
reg8            float  %9.0g                  If lived in Region 8 (region= M
                                                )
reg9            float  %9.0g                  If lived in Region 9 (region= P
                                                )
smsa66          float  %9.0g                  If lived in SMSA in 1966
                                                (r0002455=1,2)
momdad14        float  %9.0g                  If lived with both parents at
                                                age 14
sinmom14        float  %9.0g                  If lived with mother only at
                                                age 14
nodaded         float  %9.0g                  If father has no formal
                                                education
nomomed         float  %9.0g                  If mother has no formal
                                                education
daded           float  %9.0g                  Mean grade level of father
momed           float  %9.0g                  Mean grade level of mother
famed1          float  %9.0g                  If mgrade> 12 & fgrade> 12
                                                (famed=1)
famed2          float  %9.0g                  If mgrade>=12 & fgrade>=12
                                                (famed=2)
famed3          float  %9.0g                  If mgrade==12 & fgrade==12
                                                (famed=3)
famed4          float  %9.0g                  If mgrade>=12 & fgrade==-1
                                                (famed=4)
famed5          float  %9.0g                  If fgrade>=12 (famed=5)
famed6          float  %9.0g                  If mgrade>=12 & fgrade> -1
                                                (famed=6)
famed7          float  %9.0g                  If mgrade>=9 & fgrade>=9
                                                (famed=7)
famed8          float  %9.0g                  If mgrade> -1 & fgrade> -1
                                                (famed=8)

.
. * SUMMARIZE dependent variable, regressors and instruments
. sum wage76 grade76 exp76 expsq76 col4 age76 agesq76 $exogregressors
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      wage76 |      3010    1.656664     .443798          0     3.1797
     grade76 |      3010    13.26346    2.676913          1         18
       exp76 |      3010    8.856146    4.141672          0         23
     expsq76 |      3010    .9557907    .8461831          0       5.29
        col4 |      3010    .6820598    .4657535          0          1
-------------+--------------------------------------------------------
       age76 |      3010     28.1196    3.137004         24         34
     agesq76 |      3010    800.5495    180.7484        576       1156
       black |      3010    .2335548    .4231624          0          1
     south76 |      3010    .4036545    .4907113          0          1
      smsa76 |      3010    .7129568    .4524571          0          1
-------------+--------------------------------------------------------
        reg2 |      3010    .1607973     .367405          0          1
        reg3 |      3010    .1956811      .39679          0          1
        reg4 |      3010    .0641196    .2450066          0          1
        reg5 |      3010    .2083056     .406164          0          1
        reg6 |      3010    .0960133    .2946584          0          1
-------------+--------------------------------------------------------
        reg7 |      3010    .1099668    .3129003          0          1
        reg8 |      3010    .0282392     .165683          0          1
        reg9 |      3010    .0903654    .2867522          0          1
      smsa66 |      3010    .6495017    .4772053          0          1
    momdad14 |      3010    .7893688    .4078247          0          1
-------------+--------------------------------------------------------
    sinmom14 |      3010    .1006645    .3009339          0          1
     nodaded |      3010    .2292359    .4204111          0          1
     nomomed |      3010    .1172757     .321802          0          1
       daded |      3010    9.988262    3.266511          0         18
       momed |      3010    10.33675    2.987507          0         18
-------------+--------------------------------------------------------
      famed1 |      3010    .0614618    .2402153          0          1
      famed2 |      3010    .0787375    .2693734          0          1
      famed3 |      3010    .1249169    .3306796          0          1
      famed4 |      3010    .0475083    .2127588          0          1
      famed5 |      3010    .0790698    .2698925          0          1
-------------+--------------------------------------------------------
      famed6 |      3010    .1328904    .3395126          0          1
      famed7 |      3010    .0504983    .2190073          0          1
      famed8 |      3010    .2202658    .4144947          0          1
.
. * OLS estimates of return to schooling.
. * This regression computes schooling coeff, se for Table1 col 1 p.359
. * based on all cases (age grp 14-24) reported highest grd cmpl 76
.
. reg wage76 grade76 exp76 expsq76 $exogregressors
      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 29,  2980) =   44.94
       Model |  180.320527    29  6.21794919           Prob > F      =  0.0000
    Residual |   412.32209  2980  .138363117           R-squared     =  0.3043
-------------+------------------------------           Adj R-squared =  0.2975
       Total |  592.642616  3009  .196956669           Root MSE      =  .37197

------------------------------------------------------------------------------
      wage76 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     grade76 |    .072635   .0036984    19.64   0.000     .0653833    .0798868
       exp76 |   .0845293   .0066819    12.65   0.000     .0714277    .0976308
     expsq76 |  -.2289581   .0319499    -7.17   0.000    -.2916041   -.1663121
       black |  -.1894065   .0194462    -9.74   0.000    -.2275358   -.1512773
     south76 |  -.1464841   .0260345    -5.63   0.000    -.1975314   -.0954368
      smsa76 |   .1377121   .0201334     6.84   0.000     .0982353    .1771889
        reg2 |   .1023805   .0360137     2.84   0.005     .0317662    .1729947
        reg3 |   .1488958   .0352521     4.22   0.000     .0797748    .2180168
        reg4 |   .0601267   .0417556     1.44   0.150     -.021746    .1419994
        reg5 |   .1348504   .0419098     3.22   0.001     .0526752    .2170255
        reg6 |   .1452831   .0453155     3.21   0.001     .0564302    .2341359
        reg7 |   .1301968    .044965     2.90   0.004     .0420312    .2183624
        reg8 |  -.0444289   .0513937    -0.86   0.387    -.1451997    .0563419
        reg9 |   .1285658   .0389959     3.30   0.001     .0521042    .2050274
      smsa66 |   .0233775    .019544     1.20   0.232    -.0149436    .0616987
    momdad14 |   .0693317   .0263402     2.63   0.009      .017685    .1209785
    sinmom14 |   .0335387   .0354168     0.95   0.344    -.0359052    .1029825
     nodaded |  -.0390477   .0531089    -0.74   0.462    -.1431815    .0650862
     nomomed |   .0168143   .0348295     0.48   0.629     -.051478    .0851066
       daded |  -.0017839   .0043977    -0.41   0.685    -.0104068    .0068389
       momed |   .0081443   .0041513     1.96   0.050     4.64e-06    .0162839
      famed1 |  -.1166029   .0788125    -1.48   0.139    -.2711354    .0379296
      famed2 |   -.052544   .0712753    -0.74   0.461    -.1922977    .0872097
      famed3 |  -.0719675   .0654608    -1.10   0.272    -.2003205    .0563856
      famed4 |  -.0197095   .0437058    -0.45   0.652    -.1054062    .0659872
      famed5 |  -.0252185   .0643526    -0.39   0.695    -.1513985    .1009615
      famed6 |  -.0733887   .0621076    -1.18   0.237    -.1951667    .0483894
      famed7 |   -.059927   .0656929    -0.91   0.362     -.188735     .068881
      famed8 |  -.0738951   .0572428    -1.29   0.197    -.1861345    .0383444
       _cons |  -.0278815   .1005974    -0.28   0.782    -.2251288    .1693659
------------------------------------------------------------------------------
. estimates store ols
.
. * IV Instrumental variables estimates of return to schooling.
. * This regression computes schooling coeff and se for Table 1. col 2 p.359
. * Endogenous variables: schooling, experience, experience squared
. * Excl instruments: college in cnty, age, age^2
. * based on all cases (age grp 14-24) reported highest grd cmpl 76
.
. ivreg wage76 $exogregressors /*
> */ (grade76 exp76 expsq76 = col4 age76 agesq76 $exogregressors)
Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 29,  2980) =   34.56
       Model |  122.395448    29  4.22053269           Prob > F      =  0.0000
    Residual |  470.247169  2980  .157801063           R-squared     =  0.2065
-------------+------------------------------           Adj R-squared =  0.1988
       Total |  592.642616  3009  .196956669           Root MSE      =  .39724

------------------------------------------------------------------------------
      wage76 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     grade76 |   .1324485   .0493419     2.68   0.007     .0357009    .2291961
       exp76 |   .0632411   .0241061     2.62   0.009     .0159748    .1105074
     expsq76 |  -.1266694   .1184765    -1.07   0.285    -.3589735    .1056347
       black |  -.1643766   .0292248    -5.62   0.000    -.2216795   -.1070737
     south76 |  -.1400178   .0283887    -4.93   0.000    -.1956812   -.0843545
      smsa76 |   .0909867   .0441338     2.06   0.039     .0044509    .1775224
        reg2 |   .0753178   .0444167     1.70   0.090    -.0117726    .1624083
        reg3 |   .1231473   .0431763     2.85   0.004      .038489    .2078057
        reg4 |   .0241968   .0534911     0.45   0.651    -.0806865    .1290801
        reg5 |   .1247819   .0455148     2.74   0.006     .0355383    .2140255
        reg6 |    .135761   .0490304     2.77   0.006      .039624    .2318979
        reg7 |   .1063645   .0519274     2.05   0.041     .0045472    .2081817
        reg8 |  -.0850609    .064327    -1.32   0.186    -.2111907    .0410688
        reg9 |   .0916464   .0515551     1.78   0.076    -.0094409    .1927337
      smsa66 |   .0379821   .0241116     1.58   0.115    -.0092951    .0852592
    momdad14 |    .043168   .0354056     1.22   0.223    -.0262539      .11259
    sinmom14 |    .025849   .0383465     0.67   0.500    -.0493392    .1010373
     nodaded |  -.0462392   .0570684    -0.81   0.418    -.1581366    .0656583
     nomomed |   .0266252   .0383434     0.69   0.487     -.048557    .1018074
       daded |  -.0110565   .0089768    -1.23   0.218    -.0286579    .0065449
       momed |  -.0017539   .0093223    -0.19   0.851    -.0200326    .0165249
      famed1 |   -.213271   .1160049    -1.84   0.066    -.4407287    .0141867
      famed2 |  -.1567074   .1145696    -1.37   0.171    -.3813508    .0679361
      famed3 |  -.1354685   .0872725    -1.55   0.121    -.3065889     .035652
      famed4 |  -.0707323   .0627189    -1.13   0.260     -.193709    .0522444
      famed5 |  -.0699675    .077928    -0.90   0.369    -.2227656    .0828306
      famed6 |  -.1171712   .0754408    -1.55   0.120    -.2650926    .0307502
      famed7 |  -.0921498   .0749801    -1.23   0.219    -.2391679    .0548683
      famed8 |  -.1184618   .0713021    -1.66   0.097    -.2582681    .0213445
       _cons |  -.4311125   .3567904    -1.21   0.227    -1.130693    .2684678
------------------------------------------------------------------------------
Instrumented:  grade76 exp76 expsq76
Instruments:   black south76 smsa76 reg2 reg3 reg4 reg5 reg6 reg7 reg8 reg9
               smsa66 momdad14 sinmom14 nodaded nomomed daded momed famed1
               famed2 famed3 famed4 famed5 famed6 famed7 famed8 col4 age76
               agesq76
------------------------------------------------------------------------------
. estimates store iv
.
. ********** (2) NEW ANALYSIS: HETEROSKEDASTIC ROBUST STANDARD ERRORS **********
.
. * Heteroskedasticity-robust standard errors make little difference here.
.
. quietly reg wage76 grade76 exp76 expsq76 $exogregressors
. hettest /* Does not reject homoskedasticity for OLS here */
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of wage76

         chi2(1)      =     0.42
         Prob > chi2  =   0.5191
. quietly reg wage76 grade76 exp76 expsq76 $exogregressors, robust
. estimates store olshet
.
. quietly ivreg wage76 $exogregressors /*
> */ (grade76 exp76 expsq76 = col4 age76 agesq76 $exogregressors), robust
. estimates store ivhet
.
. **** DISPLAY RESULTS IN TABLE 4.5 p.111
.
. * Table 4.5 p.111: OLS and IV estimates, standard errors and R^2
.
. * Table 4.5 in the book reports only the coefficient and standard error for grade76
. estimates table ols olshet iv ivhet, /*
> */ se stats(N ll r2 rss mss rmse df_r) b(%10.4f)

------------------------------------------------------------------
    Variable |     ols         olshet          iv         ivhet
-------------+----------------------------------------------------
     grade76 |     0.0726       0.0726       0.1324       0.1324
             |     0.0037       0.0039       0.0493       0.0488
       exp76 |     0.0845       0.0845       0.0632       0.0632
             |     0.0067       0.0068       0.0241       0.0241
     expsq76 |    -0.2290      -0.2290      -0.1267      -0.1267
             |     0.0319       0.0322       0.1185       0.1182
       black |    -0.1894      -0.1894      -0.1644      -0.1644
             |     0.0194       0.0198       0.0292       0.0285
     south76 |    -0.1465      -0.1465      -0.1400      -0.1400
             |     0.0260       0.0280       0.0284       0.0292
      smsa76 |     0.1377       0.1377       0.0910       0.0910
             |     0.0201       0.0193       0.0441       0.0440
        reg2 |     0.1024       0.1024       0.0753       0.0753
             |     0.0360       0.0350       0.0444       0.0432
        reg3 |     0.1489       0.1489       0.1231       0.1231
             |     0.0353       0.0338       0.0432       0.0418
        reg4 |     0.0601       0.0601       0.0242       0.0242
             |     0.0418       0.0412       0.0535       0.0531
        reg5 |     0.1349       0.1349       0.1248       0.1248
             |     0.0419       0.0428       0.0455       0.0459
        reg6 |     0.1453       0.1453       0.1358       0.1358
             |     0.0453       0.0452       0.0490       0.0483
        reg7 |     0.1302       0.1302       0.1064       0.1064
             |     0.0450       0.0457       0.0519       0.0516
        reg8 |    -0.0444      -0.0444      -0.0851      -0.0851
             |     0.0514       0.0509       0.0643       0.0619
        reg9 |     0.1286       0.1286       0.0916       0.0916
             |     0.0390       0.0388       0.0516       0.0504
      smsa66 |     0.0234       0.0234       0.0380       0.0380
             |     0.0195       0.0187       0.0241       0.0231
    momdad14 |     0.0693       0.0693       0.0432       0.0432
             |     0.0263       0.0257       0.0354       0.0352
    sinmom14 |     0.0335       0.0335       0.0258       0.0258
             |     0.0354       0.0359       0.0383       0.0384
     nodaded |    -0.0390      -0.0390      -0.0462      -0.0462
             |     0.0531       0.0511       0.0571       0.0550
     nomomed |     0.0168       0.0168       0.0266       0.0266
             |     0.0348       0.0344       0.0383       0.0375
       daded |    -0.0018      -0.0018      -0.0111      -0.0111
             |     0.0044       0.0044       0.0090       0.0089
       momed |     0.0081       0.0081      -0.0018      -0.0018
             |     0.0042       0.0042       0.0093       0.0093
      famed1 |    -0.1166      -0.1166      -0.2133      -0.2133
             |     0.0788       0.0792       0.1160       0.1160
      famed2 |    -0.0525      -0.0525      -0.1567      -0.1567
             |     0.0713       0.0698       0.1146       0.1132
      famed3 |    -0.0720      -0.0720      -0.1355      -0.1355
             |     0.0655       0.0644       0.0873       0.0865
      famed4 |    -0.0197      -0.0197      -0.0707      -0.0707
             |     0.0437       0.0416       0.0627       0.0601
      famed5 |    -0.0252      -0.0252      -0.0700      -0.0700
             |     0.0644       0.0625       0.0779       0.0763
      famed6 |    -0.0734      -0.0734      -0.1172      -0.1172
             |     0.0621       0.0601       0.0754       0.0735
      famed7 |    -0.0599      -0.0599      -0.0921      -0.0921
             |     0.0657       0.0640       0.0750       0.0730
      famed8 |    -0.0739      -0.0739      -0.1185      -0.1185
             |     0.0572       0.0545       0.0713       0.0682
       _cons |    -0.0279      -0.0279      -0.4311      -0.4311
             |     0.1006       0.0997       0.3568       0.3528
-------------+----------------------------------------------------
           N |  3010.0000    3010.0000    3010.0000    3010.0000
          ll | -1279.2297   -1279.2297
          r2 |     0.3043       0.3043       0.2065       0.2065
         rss |   412.3221     412.3221     470.2472     470.2472
         mss |   180.3205     180.3205     122.3954     122.3954
        rmse |     0.3720       0.3720       0.3972       0.3972
        df_r |  2980.0000    2980.0000    2980.0000    2980.0000
------------------------------------------------------------------
                                                      legend: b/se
.
. ********** (3) NEW ANALYSIS: CHECK FOR WEAK INSTRUMENTS **********
.
. * Model is y = b1*x1 + x2'b2 + u
. * where x1 is scalar endogenous (grade76)
. * where x2 is vector of regressors that includes
. *    exp76 and expsq76 which are also endogenous
. *    and $exogregressors which are exogenous
. * and the instruments Z are col4 age76 agesq76 $exogregressors
.
. * Check for weak instruments
. * Focus on grade76 but can also do this for the other two endogenous regressors.
. * In this example no problems for the other two:
. * as age and age-squared are good instruments for exp and exp-squared.
.
. **** (A) Simple analysis R-squared and F-test [Given in Table 4.5]
.
. * R2 from regress endogenous regressor on instruments
. * This is the same as the squared correlation between x1 and the projection of x1 on Z
. quietly reg grade76 col4 age76 agesq76 $exogregressors
. di e(r2) " r2 of x1 on Z"
.29677588 r2 of x1 on Z
.
. * Do the partial F-test on the three instruments
. * This is the standard first-stage regression F-test


.
. **** DISPLAY RESULT IN TABLE 4.5 page 111
.
. * First-stage F statistic given in Table 4.5
. test col4 age76 agesq76
( 1) col4 = 0
( 2) age76 = 0
( 3) agesq76 = 0
F( 3, 2980) = 8.07
Prob > F = 0.0000
.
. * Compare this to R-squared when regress only on the exogenous regressors (three additional instruments dropped)
. quietly reg grade76 $exogregressors
. di e(r2) " r2 of x1 on Z with the three additional instruments dropped"
.29106483 r2 of x1 on Z with the three additional instruments dropped
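
The partial F statistic from the test command above can also be recovered from the two first-stage R-squared values printed in this log. The following is a minimal sketch, assuming the usual restricted-versus-unrestricted F formula with q = 3 excluded instruments and 2980 residual degrees of freedom (both read off the test output above). The scalar names are hypothetical and are not part of the original program.

```stata
* Sketch: partial F from unrestricted and restricted first-stage R-squared
* (scalar names are illustrative assumptions, not from mma04p4ivweak.do)
* r2_u: R2 with col4 age76 agesq76 included; r2_r: R2 with them dropped
scalar r2_u = .29677588
scalar r2_r = .29106483
* q: number of instruments tested; df_r: residual d.o.f. of unrestricted model
scalar q = 3
scalar df_r = 2980
scalar Fpartial = ((r2_u - r2_r)/q) / ((1 - r2_u)/df_r)
display %6.2f Fpartial " partial F for col4 age76 agesq76"
```

Up to rounding this agrees with the F(3, 2980) = 8.07 reported by the test command.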
.
. * Obtain first-stage F for the other two endogenous regressors
. quietly reg exp76 col4 age76 agesq76 $exogregressors
. test col4 age76 agesq76
( 1) col4 = 0
( 2) age76 = 0
( 3) agesq76 = 0
F( 3, 2980) = 1772.03
Prob > F = 0.0000
. quietly reg expsq76 col4 age76 agesq76 $exogregressors
. test col4 age76 agesq76
( 1) col4 = 0
( 2) age76 = 0
( 3) agesq76 = 0
F( 3, 2980) = 1542.36
Prob > F = 0.0000
.
. **** (B) Minimum eigenvalue of matrix analog of the first-stage F statistic
. *    proposed by Stock et al (2002) and tables in Stock and Yogo (2003)
. * This test is not done here.
.
. **** (C) Bound et al (1995) partial R-squared
.
. * Not relevant here as more than one endogenous regressor
. * If only one endogenous regressor x1 Bound et al purge the effect of x2
. * by (1) get residual from regress x1 on x2
. * (2) get the residuals from regress z on x2
. * and then get the R-squared from regress (1) on (2).
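
The three steps listed for the Bound et al. measure can be sketched as follows for the single-endogenous-regressor case. This is only an illustration under stated assumptions: it uses Stata's shipped auto data as a stand-in, with mpg playing the role of x1, foreign the role of x2, and weight and length the role of the excluded instruments z. None of these variables or names come from the original program.

```stata
* Sketch of the Bound et al. partial R-squared with ONE endogenous regressor
* Assumed stand-in roles (illustration only):
*   x1 = mpg, x2 = foreign, z = weight length
sysuse auto, clear
* (1) residual from regress x1 on x2
quietly regress mpg foreign
predict double x1res, resid
* (2) residuals from regress each z on x2
quietly regress weight foreign
predict double z1res, resid
quietly regress length foreign
predict double z2res, resid
* R-squared from regress (1) on (2)
quietly regress x1res z1res z2res
display e(r2) " Bound et al. partial R-squared (illustrative data)"
```

In the application in this log the measure is not computed because there are three endogenous regressors; Shea's measure in (D) is used instead.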
.
. **** (D) Shea (1997) partial R-squared [Given in Table 4.5]
.
. * Here we have three endogenous regressors.
. * Focus on the endogenous schooling regressor.
. * For the other two just need to replace the first line of (1)
. * e.g. quietly reg exp76 grade76 expsq76 $exogregressors
. * and replace the first line of (2B)
. * e.g. quietly reg exp76hat grade76hat expsq76hat $exogregressors
.
. * (1) Form x1 - x1tilda: residual from regress x1 on other regressors
. quietly reg grade76 exp76 expsq76 $exogregressors
. predict x1minusx1tilda, resid
.
. * (2) Form x1hat - x1hattilda: residual from regress x1hat on fitted values of other regressors
. * (2A) First get the fitted values from regress endogenous on instruments
. quietly reg grade76 col4 age76 agesq76 $exogregressors
. predict grade76hat, xb
. di e(r2) " r2 from regress x1 on Z"
.29677588 r2 from regress x1 on Z
. quietly reg exp76 col4 age76 agesq76 $exogregressors
. predict exp76hat, xb
. di e(r2) " r2 from regress second endog regressor on Z"
.70622765 r2 from regress second endog regressor on Z
. quietly reg expsq76 col4 age76 agesq76 $exogregressors
. predict expsq76hat, xb
. di e(r2) " r2 from regress third endog regressor on Z"
.67573235 r2 from regress third endog regressor on Z
. * Fitted values for the exogenous from regress exogenous on instruments are the exogenous
. * (2B) Run the regression of x1hat on fitted values of other regressors
. quietly reg grade76hat exp76hat expsq76hat $exogregressors
. di e(r2) " r2 from regress prediction of x1 on predictions of x2"
.98987117 r2 from regress prediction of x1 on predictions of x2
. predict x1hatminusx1hattilda, resid
.
. * (3) Form the correlation between (1) and (2)
. corr x1minusx1tilda x1hatminusx1hattilda
(obs=3010)

             | x1minu~a x1hatm~a
-------------+------------------
x1minusx1t~a |   1.0000
x1hatminus~a |   0.0800   1.0000

.
. **** DISPLAY RESULT IN TABLE 4.5 page 111
.
. * Shea's Partial R^2 in Table 4.5
. di r(rho)^2 " Shea's partial R-squared measure"
.00640757 Shea's partial R-squared measure
.
. sum grade76 grade76hat exp76 exp76hat expsq76 expsq76hat grade76 x1minusx1tilda x1hatminusx1hattilda grade76hat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     grade76 |      3010    13.26346    2.676913          1         18
  grade76hat |      3010    13.26346    1.458306   8.919074   17.42063
       exp76 |      3010    8.856146    4.141672          0         23
    exp76hat |      3010    8.856146    3.480551   1.329216   17.68953
     expsq76 |      3010    .9557907    .8461831          0       5.29
-------------+--------------------------------------------------------
  expsq76hat |      3010    .9557907    .6955874  -.3913698   2.917523
     grade76 |      3010    13.26346    2.676913          1         18
x1minusx1t~a |      3010   -8.71e-10    1.833502  -6.948598   5.661138
x1hatminus~a |      3010   -6.86e-11    .1467669  -.3732457   .3033035
  grade76hat |      3010    13.26346    1.458306   8.919074   17.42063
.
. **** (E) Poskitt-Skeels (2002) partial R-squared
. * Not done here
.
. **** (F) If model was over-identified then do test of over-identifying restrictions
. * Not done here as model is just-identified
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma04p4ivweak.txt
log type: text
closed on: 17 May 2005, 13:46:03
-------------------------------------------------------------------------------

Chapter 5.9 pp.159-63

-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma05p1mle.txt
log type: text
opened on: 17 May 2005, 13:48:11
.
. ********** OVERVIEW OF MMA05P1MLE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9 pp.159-63
. * Maximum likelihood analysis.
.
. * Provides first two columns of Table 5.7
. * (1) OLS using Stata command regress
. * (2) MLE using Stata command streg, dist(exp) for exponential MLE
. * (3) MLE using Stata command ml for user-provided log-likelihood
. * using generated data (see below)
.
. * Related programs:
. * mma05p2nls.do          NLS, WNLS, FGNLS for same data using nl command
. * mma05p3nlsbyml.do      NLS, WNLS, FGNLS for same data using ml command
. * mma05p4margeffects.do  Calculates marginal effects
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA and SUMMARIZE **********
.
. * Model is y ~ exponential(exp(a + bx))
. *          x ~ N[mux, sigx^2]
. *       f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *     lnf(y) = (a + bx) - y*exp(a + bx)
. *       E[y] = exp(-(a + bx))     note sign reversal for the mean
. *       V[y] = exp(-2*(a + bx)) = E[y]^2
.
. * The dgp sets particular values of a, b, mux and sigx
. * Here a = 2, b = -1 and x ~ N[1, 1]
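
The moments used in this data generating process follow from the exponential density with rate exp(a + bx). As a short derivation, where the symbol \lambda is shorthand introduced here for the rate:

```latex
\begin{aligned}
f(y \mid x) &= \lambda e^{-\lambda y}, \qquad \lambda = \exp(a + bx), \quad y > 0, \\
\ln f(y \mid x) &= (a + bx) - y \exp(a + bx), \\
\mathrm{E}[y \mid x] &= 1/\lambda = \exp(-(a + bx)), \\
\mathrm{V}[y \mid x] &= 1/\lambda^{2} = \exp(-2(a + bx)) = \left(\mathrm{E}[y \mid x]\right)^{2}.
\end{aligned}
```

With a = 2 and b = -1 the conditional mean is exp(x - 2), which is nonlinear in x. Since x ~ N[1, 1], the lognormal mean formula gives an unconditional mean of exp(-1 + 1/2) = exp(-0.5), roughly 0.607.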
. scalar a = 2

. scalar b = -1
. scalar mux = 1
. scalar sigx = 1
.
. * Set the sample size. Table 5.7 uses N=10,000
. set obs 10000
obs was 0, now 10000
.
. * Generate x and y
. set seed 2003
. gen x = mux + sigx*invnorm(uniform())
. gen lamda = exp(a + b*x)
. gen Ey = 1/lamda
. * To generate exponential with mean mu=Ey use
. * Integral 0 to a of (1/mu)exp(-x/mu) dx by change of variables
. * = Integral 0 to a/mu of exp(-t)dt
. * = incomplete gamma function P(1,a/mu) in the terminology of Stata
. gen y = Ey*invgammap(1,uniform())
. gen lny = ln(y)
. gen lnfy = ln(lamda) - y*lamda
. * twoway scatter Ey x
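
Because the incomplete gamma function with shape 1 is P(1,t) = 1 - exp(-t), the draw invgammap(1,u) is the inverse-c.d.f. draw of a unit exponential. A minimal sketch comparing it with the closed-form inverse -ln(1 - u) follows. It is meant to be run on its own, since it clears the data in memory, and the variable names u, t_inv and t_gam are hypothetical.

```stata
* Sketch: unit exponential draws by inverting F(t) = 1 - exp(-t)
* (run separately from the main program: clear wipes the data in memory)
clear
set obs 10000
set seed 2003
gen double u = uniform()
* closed-form inverse c.d.f.
gen double t_inv = -ln(1 - u)
* same draw via the inverse incomplete gamma function with shape 1
gen double t_gam = invgammap(1, u)
* both columns should have mean and standard deviation near 1
summarize t_inv t_gam
```

Multiplying either draw by Ey rescales it to an exponential with mean Ey, which is what the gen y line in the program does.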
.
. * Descriptive Statistics
. describe
Contains data
  obs:        10,000
 vars:             6
 size:       280,000 (97.3% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
x               float  %9.0g
lamda           float  %9.0g
Ey              float  %9.0g
y               float  %9.0g
lny             float  %9.0g
lnfy            float  %9.0g
-------------------------------------------------------------------------------

Sorted by:
Note: dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |     10000    1.014313    1.004905  -2.895741   4.994059
       lamda |     10000    4.457478    5.939084   .0500838   133.7191
          Ey |     10000    .6185677    .8294007   .0074784   19.96655
           y |     10000    .6194352    1.291416   .0000445   30.60636
         lny |     10000   -1.554348     1.62358  -10.02114   3.421208
-------------+--------------------------------------------------------
        lnfy |     10000   -.0209485    1.419595   -7.52596   4.402257
.
. ********** WRITE DATA TO A TEXT FILE **********
.
. * Write data to a text (ascii) file
. * used for programs mma05p2nlsbyml.do, mma05p3nlsbynl.do
. * and mma05p4margeffects.do
. * and can also use with programs other than Stata
. outfile y x using mma05data.asc, replace
.
. ********** DO THE ANALYSIS: OLS and MLE **********
.
. ** (1) OLS ESTIMATION
.
. * OLS is inconsistent in this example
. regress y x
      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 3030.74
       Model |  3879.13606     1  3879.13606           Prob > F      =  0.0000
    Residual |  12796.7438  9998  1.27993037           R-squared     =  0.2326
-------------+------------------------------           Adj R-squared =  0.2325
       Total |  16675.8799  9999  1.66775476           Root MSE      =  1.1313

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .6198182   .0112587    55.05   0.000     .5977488    .6418876
       _cons |  -.0092545    .016075    -0.58   0.565    -.0407648    .0222558
------------------------------------------------------------------------------
. estimates store rols
. regress y x, robust
Regression with robust standard errors                 Number of obs =   10000
                                                       F(  1,  9998) =  596.30
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2326
                                                       Root MSE      =  1.1313

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .6198182   .0253823    24.42   0.000     .5700638    .6695725
       _cons |  -.0092545   .0171978    -0.54   0.591    -.0429655    .0244566
------------------------------------------------------------------------------
. estimates store rolsrobust
.
. ** (2) ML ESTIMATION USING STATA COMMAND FOR EXPONENTIAL MLE
.
. * The following uses Stata duration model commands.
. * First need to define the duration variable (here y)
. stset y
     failure event:  (assumed to fail at time=y)
obs. time interval:  (0, y]
 exit on or before:  failure

------------------------------------------------------------------------------
    10000  total obs.
        0  exclusions
------------------------------------------------------------------------------
    10000  obs. remaining, representing
    10000  failures in single record/single failure data
 6194.352  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  30.60636
. streg x, dist(exp) nohr
         failure _d:  1 (meaning all fail)
   analysis time _t:  y

Iteration 0:   log likelihood = -20754.005
Iteration 1:   log likelihood = -17232.884
Iteration 2:   log likelihood = -15760.556
Iteration 3:   log likelihood = -15752.193
Iteration 4:   log likelihood =  -15752.19
Iteration 5:   log likelihood =  -15752.19

Exponential regression -- log relative-hazard form

No. of subjects =        10000                     Number of obs   =     10000
No. of failures =        10000
Time at risk    =  6194.352495
                                                   LR chi2(1)      =  10003.63
Log likelihood  =    -15752.19                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0098692  -100.27   0.000    -1.008971   -.9702842
       _cons |   1.982921   .0141496   140.14   0.000     1.955188    2.010654
------------------------------------------------------------------------------
. estimates store rexp
. streg x, dist(exp) nohr robust
         failure _d:  1 (meaning all fail)
   analysis time _t:  y

Iteration 0:   log pseudo-likelihood = -20754.005
Iteration 1:   log pseudo-likelihood = -17232.884
Iteration 2:   log pseudo-likelihood = -15760.556
Iteration 3:   log pseudo-likelihood = -15752.193
Iteration 4:   log pseudo-likelihood =  -15752.19
Iteration 5:   log pseudo-likelihood =  -15752.19

Exponential regression -- log relative-hazard form

No. of subjects       =        10000               Number of obs   =     10000
No. of failures       =        10000
Time at risk          =  6194.352495
                                                   Wald chi2(1)    =   9914.62
Log pseudo-likelihood =    -15752.19               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0099388   -99.57   0.000    -1.009107   -.9701479
       _cons |   1.982921   .0144307   137.41   0.000     1.954637    2.011205
------------------------------------------------------------------------------
. estimates store rexprobust
.
. ** (3) ML ESTIMATION USING STATA ML COMMAND
.
. * For MLE computation can use the following Stata commands
. *   ml model lf   provide the log-density
. *   ml model D0   provide the log-likelihood
. *   ml model D1   provide the log-likelihood and gradient
. *   ml model D2   provide the log-likelihood, gradient and hessian
.
. * At a minimum need to provide
. * (A) program define fcn where fcn is the function name
. *     defines the log-density (independent observations assumed)
. * (B) ml model lf fcn + some extras
. *     the extras give the dependent variable and regressors
. * (C) ml maximize
. *     obtains the mle
. * (D) ml model lf fcn + some extras, robust
. *     provides robust sandwich standard errors
.
. * Here we provide the log-density (ml model lf) as this is simplest,
. * and the Stata manual says that numerically only D2 is better.
.
. * (A) Define the log-density
.*
lnf(y) = (a+bx) - y*exp(a+bx) = theta - y*exp(theta) where theta = x'b
. program define mleexp0
1. version 8.0
2. args lnf theta
/* Must use lnf while could use name other than theta */
3. quietly replace `lnf' = `theta' - $ML_y1*exp(`theta')
4. end
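The log-density coded in mleexp0 follows from the exponential density with rate lambda = exp(theta). A short derivation, using only the model stated in the comments above (theta = a + bx), is sketched here. The score in the last line is not needed by method lf, but it may help when checking the code by hand.

```latex
\begin{aligned}
f(y \mid x) &= \lambda e^{-\lambda y}, \qquad \lambda = \exp(\theta), \quad \theta = a + bx, \\
\ln f(y \mid x) &= \theta - y \exp(\theta), \\
\mathrm{E}[y \mid x] &= 1/\lambda = \exp(-\theta), \qquad
\mathrm{V}[y \mid x] = 1/\lambda^{2} = \exp(-2\theta), \\
\frac{\partial \ln f}{\partial \theta} &= 1 - y \exp(\theta).
\end{aligned}
```

The second line is the expression assigned to `lnf' in the program, and the third line is the source of the sign reversal for the mean noted elsewhere in these logs.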
.
. * (B) Say that dependent variable is y and regressors are x plus a constant
. ml model lf mleexp0 (y = x)
.
. * (C) Obtain the MLE
. ml search             /* Optional - can provide better starting values */
initial:       log likelihood = -6194.3525
improve:       log likelihood = -6194.3525
alternative:   log likelihood = -5212.7607
rescale:       log likelihood = -5212.7607
. ml maximize
initial:       log likelihood = -5212.7607
rescale:       log likelihood = -5212.7607
Iteration 0:   log likelihood = -5212.7607
Iteration 1:   log likelihood = -1563.9176
Iteration 2:   log likelihood =  -217.6055
Iteration 3:   log likelihood = -208.73633
Iteration 4:   log likelihood = -208.71383
Iteration 5:   log likelihood = -208.71383

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =   10054.85
Log likelihood = -208.71383                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0098692  -100.27   0.000    -1.008971   -.9702842
       _cons |   1.982921   .0141496   140.14   0.000     1.955188    2.010654
------------------------------------------------------------------------------
. estimates store rmle
.
. * (D) Obtain robust standard errors
. ml model lf mleexp0 (y = x), robust
. ml search
initial:       log pseudo-likelihood = -6194.3525
improve:       log pseudo-likelihood = -6194.3525
alternative:   log pseudo-likelihood = -5212.7607
rescale:       log pseudo-likelihood = -5212.7607
. ml maximize
initial:       log pseudo-likelihood = -5212.7607
rescale:       log pseudo-likelihood = -5212.7607
Iteration 0:   log pseudo-likelihood = -5212.7607
Iteration 1:   log pseudo-likelihood = -1563.9176
Iteration 2:   log pseudo-likelihood =  -217.6055
Iteration 3:   log pseudo-likelihood = -208.73633
Iteration 4:   log pseudo-likelihood = -208.71383
Iteration 5:   log pseudo-likelihood = -208.71383

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =    9914.62
Log pseudo-likelihood = -208.71383                Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0099388   -99.57   0.000    -1.009107   -.9701479
       _cons |   1.982921   .0144307   137.41   0.000     1.954637    2.011205
------------------------------------------------------------------------------
. estimates store rmlerobust
.
. * (E) Calculate R-squared and log-likelihood at the ML estimates
. * lnL sums lnf(y) = ln(lamda) - y*lamda
. gen lamdaml = exp(_b[_cons] + _b[x]*x)
. gen lnfml = ln(lamdaml) - y*lamdaml
. quietly means lnfml
. scalar LLml = r(mean)*r(N)


. * R-squared = 1 - Sum_i(y_i - yhat_i)^2 / Sum_i(y_i - ybar)^2
. gen yhatml = 1/lamdaml
. egen ybar = mean(y)
. * quietly means y
. * scalar ybar = r(mean)
. gen y_yhatsqml = (y - yhatml)^2
. gen y_ybarsq = (y - ybar)^2
. quietly means y_yhatsqml
. scalar SSresidml = r(mean)
. quietly means y_ybarsq
. scalar SStotal = r(mean)
. scalar Rsqml = 1 - SSresidml/SStotal
. di LLml " " Rsqml
-208.71383 .39062307
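The same two quantities can be obtained a little more directly with summarize, which stores the sum in r(sum). The lines below are a sketch, not part of the original run; the scalar names ending in 2 are hypothetical and chosen only to avoid clashing with the scalars defined above.

```stata
* Sketch only: alternative to step (E) using summarize instead of means.
* Assumes lnfml, y_yhatsqml and y_ybarsq were generated as in the log above.
quietly summarize lnfml
scalar LLml2 = r(sum)                  // log-likelihood = sum of the log-densities
quietly summarize y_yhatsqml
scalar SSresid2 = r(sum)               // residual sum of squares
quietly summarize y_ybarsq
scalar SStot2 = r(sum)                 // total sum of squares about the mean
scalar Rsqml2 = 1 - SSresid2/SStot2    // same ratio as SSresidml/SStotal
display LLml2 " " Rsqml2
```

Because the original code divides both sums by the same N, the ratio and hence the R-squared are unchanged; the displayed values should agree with those above up to rounding.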
.
. ********** DISPLAY RESULTS: First two columns of Table 5.7 p.161
.
. * (1) OLS - nonrobust and robust standard errors
. * Here OLS is inconsistent.
. * And expect sign reversal for slope as in true model mean E[y] = exp(-x'b)
. estimates table rols rolsrobust, b(%10.4f) se(%10.4f) t stats(N ll r2) keep(_cons x)

----------------------------------------
    Variable |    rols      rolsrobust
-------------+--------------------------
       _cons |    -0.0093      -0.0093
             |     0.0161       0.0172
             |      -0.58        -0.54
           x |     0.6198       0.6198
             |     0.0113       0.0254
             |      55.05        24.42
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -1.542e+04   -1.542e+04
          r2 |     0.2326       0.2326
----------------------------------------
                          legend: b/se/t
.
. * (2) MLE by command ereg - nonrobust and robust standard errors
. estimates table rexp rexprobust, b(%10.4f) se(%10.4f) t stats(N ll) keep(_cons x)

----------------------------------------
    Variable |    rexp      rexprobust
-------------+--------------------------
       _cons |     1.9829       1.9829
             |     0.0141       0.0144
             |     140.14       137.41
           x |    -0.9896      -0.9896
             |     0.0099       0.0099
             |    -100.27       -99.57
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -1.575e+04   -1.575e+04
----------------------------------------
                          legend: b/se/t
.
. * (3) MLE by command ml - nonrobust and robust standard errors
. estimates table rmle rmlerobust, b(%10.4f) se(%10.4f) t stats(N ll) keep(_cons x)

----------------------------------------
    Variable |    rmle      rmlerobust
-------------+--------------------------
       _cons |     1.9829       1.9829
             |     0.0141       0.0144
             |     140.14       137.41
           x |    -0.9896      -0.9896
             |     0.0099       0.0099
             |    -100.27       -99.57
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll |  -208.7138    -208.7138
----------------------------------------
                          legend: b/se/t
. * And ML log-likelihood (check) and R-squared (needed to be computed)
. di "Log likelihood for ML: " LLml
Log likelihood for ML: -208.71383
. di "R-squared for MLE: " Rsqml
R-squared for MLE: .39062307
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma05p1mle.txt
  log type:  text
 closed on:  17 May 2005, 13:48:18
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma05p2nls.txt
  log type:  text
 opened on:  17 May 2005, 13:53:31
.
. ********** OVERVIEW OF MMA05P2NLS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9 pp.159-63
. * Nonlinear least squares
.
. * Provides last three columns of Table 5.7 results for
. * (1) NLS using Stata command nl (hard to get robust s.e.'s)
. * (2) FGNLS using Stata command nl (hard to get robust s.e.'s)
. * (3) WNLS using Stata command nl (hard to get robust s.e.'s)
. * using generated data set mma05data.asc
.
. * Note: Stata 8 does not give robust se's for nl
. *       But ml does - see program mma05p3nlsbyml.do
. *       New Stata 9 does have a robust se option (unlike Stata 8)
.
. * Related programs:
. *   mma05p1mle.do          OLS and MLE for the same data
. *   mma05p3nlsbyml.do      NLS using ml rather than nl
. *   mma05p4margeffects.do  Calculates marginal effects
.
. * To run this program you need data and dictionary files
. *   mma05data.asc    ASCII data set generated by mma05p1mle.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** READ IN DATA and SUMMARIZE **********
.
. * Model is  y ~ exponential(exp(a + bx))
. *           x ~ N[mux, sigx^2]
. *        f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *      lnf(y) = (a + bx) - y*exp(a + bx)
. *        E[y] = exp(-(a + bx))    note sign reversal for the mean
. *        V[y] = exp(-2(a + bx)) = E[y]^2
. * Here a = 2, b = -1 and x ~ N[mux=1, sigx^2=1]
. * and Table 5.7 uses N=10,000
.
. * Data was generated by program mma05p1mle.do
. infile y x using mma05data.asc
(10000 observations read)
.
. * Descriptive Statistics
. describe

Contains data
  obs:        10,000
 vars:             2
 size:       120,000 (98.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |     10000    .6194352    1.291416   .0000445   30.60636
           x |     10000    1.014313    1.004905  -2.895741   4.994059
.
. ********** DO THE ANALYSIS: NLS, WNLS and FGNLS **********
.
. *** (1) NLS ESTIMATION USING STATA NL COMMAND (Nonlinear LS)
.
. * To do this in Stata
. * (A) program define nlfcn where fcn is the function name
. *     defines g(x_i'b) and says what the regressors x are
. * (B) nl fcn y where fcn is the function name in (A)
. *     and y is the dependent variable
. *     does NLS of y on fcn defined in (A)
. * (C) Heteroskedastic-consistent standard errors require extra coding
.
. * (1A) Define g(x'b)
. *      Note: Since E[y] = exp(-(a + bx)) there is sign reversal for the mean
. program define nlexpnls
  1. version 7.0
  2. if "`1'" == "?" {                  /* if query call ... */
  3.     global S_1 "b1int b2x"        /* declare parameters */
  4.     global b1int=1                /* initial values */
  5.     global b2x=0
  6.     exit}
  7. replace `1'=exp(-$b1int-$b2x*x)   /* calculate function */
  8. end
.
. * (1B) Do NLS of y on the function expnls defined in (1A)
. nl expnls y
(obs = 10000)

Iteration 0:  residual SS =  17308.68
Iteration 1:  residual SS =  10333.37
Iteration 2:  residual SS =  10150.66
Iteration 3:  residual SS =  10149.86
Iteration 4:  residual SS =  10149.86
Iteration 5:  residual SS =  10149.86

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =   5103.98
       Model |  10363.0157     2  5181.50784         Prob > F      =    0.0000
    Residual |  10149.8633  9998  1.01518937         R-squared     =    0.5052
-------------+------------------------------         Adj R-squared =    0.5051
       Total |   20512.879 10000   2.0512879         Root MSE      =  1.007566
                                                     Res. dev.     =  28527.52
(expnls)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       b1int |   1.887563   .0306819    61.52   0.000      1.82742    1.947705
         b2x |  -.9574684   .0097419   -98.28   0.000    -.9765645   -.9383724
------------------------------------------------------------------------------
 (SEs, P values, CIs, and correlations are asymptotic approximations)
. estimates store bnls
.
. * Complications now begin: getting standard errors. Easier to use (1) !!
.
. * (1C) Get sandwich heteroskedastic-robust standard errors for NLS
.
. * Note that robust option does not work for nl
. * So wrong standard errors are given for this problem as errors are heteroskedastic
.
. * To get robust standard errors is not straightforward
.
. * Obtain them by OLS regress y - g(x,b) on dg/db with robust option.
. * Explanation: OLS regress y - g(x,b) = (dg/db)'a + v
. * This is NR algorithm for update of b
. * But a = 0 since iterations have converged, so v = y - g(x,b)
. * So nonrobust standard errors from this OLS regression yield
. * V[a] = s^2 (Sum_i (dg_i/db)(dg_i/db)')^-1
. * where s^2 = (1/(N-k)) Sum_i (y_i - g(x_i,b))^2
. * These are the nonrobust standard errors for NLS
. * And robust option gives robust standard errors from this OLS regression.
.
. * Obtain the derivatives dg/db
. * Here g = exp(-x'b) so dg/db = -exp(-x'b)*x = -yhat*x
. * The sign does not affect the standard errors, so yhat and x*yhat are used
. quietly nl expnls y
. predict residnls, residuals
. predict yhatnls, yhat
. scalar snls = e(rmse)     /* Use in earlier code */
. gen d1 = yhatnls
. gen d2 = x*yhatnls
. * This OLS regression gives robust standard errors
. regress residnls d1 d2, noconstant robust

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  2,  9998) =    0.00
                                                       Prob > F      =  1.0000
                                                       R-squared     =  0.0000
                                                       Root MSE      =  1.0076

------------------------------------------------------------------------------
             |               Robust
    residnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          d1 |   4.46e-07   .1420794     0.00   1.000    -.2785037    .2785046
          d2 |  -1.49e-07   .0611969    -0.00   1.000    -.1199583     .119958
------------------------------------------------------------------------------
. estimates store bnlsrobust
.
. * Check: Do OLS regression that gives nonrobust standard errors
. *        and verify that same results as in (1B)
. regress residnls d1 d2, noconstant

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  2,  9998) =    0.00
       Model |  2.6739e-10     2  1.3370e-10           Prob > F      =  1.0000
    Residual |  10149.8633  9998  1.01518937           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0002
       Total |  10149.8633 10000  1.01498633           Root MSE      =  1.0076

------------------------------------------------------------------------------
    residnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          d1 |   4.46e-07   .0306819     0.00   1.000    -.0601423    .0601432
          d2 |  -1.49e-07   .0097419    -0.00   1.000    -.0190961    .0190958
------------------------------------------------------------------------------
. estimates store bnlscheck
.
. * (1D) Alternative to (1C) robust NLS standard errors that are better.
. * These are sandwich form but use knowledge that V[u] = exp(-x'b)^2 = E[y]^2
. * which can be estimated by Vhat[u] = yhat^2
. * Now use this knowledge here in computing S in DSD.
. * Form DSDknown = D'SD with S = Diag(yhat^2)
. gen ds1known = yhatnls*yhatnls
. gen ds2known = x*yhatnls*yhatnls
. matrix accum DSDknown = ds1known ds2known, noconstant
(obs=10000)
. matrix accum DD2 = d1 d2, noconstant     /* DD commented above */
(obs=10000)
. * Form the robust variance matrix estimate
. matrix vnlsknown = syminv(DD2)*DSDknown*syminv(DD2)
. * Calculate the robust standard errors
. scalar seb1intnlsknown = sqrt(vnlsknown[1,1])
. scalar seb2xnlsknown = sqrt(vnlsknown[2,2])
. di "Robust standard errors of NLS estimates of b1int and b2x: "
Robust standard errors of NLS estimates of b1int and b2x:
. di "Using knowledge that Var[u] = exp(-x'b)^2 estimated by yhat^2"
Using knowledge that Var[u] = exp(-x'b)^2 estimated by yhat^2
. di seb1intnlsknown " " seb2xnlsknown
.21097066 .08798113
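The three sets of NLS standard errors computed in (1B) to (1D) can be written compactly. This is a sketch in the notation of the comments, with d_i = dg_i/db evaluated at the NLS estimate, uhat_i = y_i - g_i, D the matrix with rows d_i', and k = 2 parameters. The degrees-of-freedom factor that regress applies with the robust option is omitted here.

```latex
\begin{aligned}
\widehat{V}_{\text{nonrobust}} &= s^{2}\,(D'D)^{-1}, \qquad s^{2} = \frac{1}{N-k}\sum_{i}\hat{u}_{i}^{2}, \\
\widehat{V}_{\text{robust}} &= (D'D)^{-1}\Big(\sum_{i}\hat{u}_{i}^{2}\, d_{i}d_{i}'\Big)(D'D)^{-1}, \\
\widehat{V}_{\text{known}} &= (D'D)^{-1}\Big(\sum_{i}\hat{y}_{i}^{2}\, d_{i}d_{i}'\Big)(D'D)^{-1}.
\end{aligned}
```

The last line corresponds to syminv(DD2)*DSDknown*syminv(DD2): the squared residual in the middle matrix is replaced by the model-implied variance yhat^2.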
.
. * (1E) Calculate R-squared and log-likelihood at the NLS estimates
. * Note that Stata version 8 reports the wrong R-squared
. * as it uses TSS = Sum_i y_i^2 and not Sum_i(y_i - ybar)^2
. * lnL sums lnf(y) = ln(lamda) - y*lamda
. gen lamdanls = 1 / yhatnls     /* yhatnls saved earlier */
. gen lnfnls = ln(lamdanls) - y*lamdanls
. quietly means lnfnls
. scalar LLnls = r(mean)*r(N)
. * R-squared = 1 - Sum_i(y_i - yhat_i)^2 / Sum_i(y_i - ybar)^2
. egen ybar = mean(y)
. * quietly means y
. * scalar ybar = r(mean)
. gen y_ybarsq = (y - ybar)^2
. quietly means y_ybarsq
. scalar SStotal = r(mean)
. gen y_yhatsqnls = (y - yhatnls)^2
. quietly means y_yhatsqnls
. scalar SSresidnls = r(mean)
. scalar Rsqnls = 1 - SSresidnls/SStotal     /* SStotal found earlier */
. di LLnls " " Rsqnls
-232.97524 .39134462
.
. ** (2) FGNLS ESTIMATION USING STATA NL COMMAND
.
. * The following gives FGNLS in Table 5.7
. * To instead get the WNLS estimates in Table 5.7
. * replace gen wfgnls = (1/yhatnls)^2 below by gen wfgnls = 1/yhatnls
.
. * The Feasible generalized NLS estimator minimizes
. * SUM_i (y_i - g(x_i'b))^2 / s_i^2 where s_i^2 = estimate of sigma_i^2
. * This is y_i = g(x_i'b) + u_i where u_i ~ (0,s_i^2)
. * Can do NLS with weighting option [aweight = 1/(s_i^2)]
. * Here s_i^2 = [exp(x_i'b)]^2 = yhatnls^2
.
. * The simplest way to proceed is to use the aweights option.
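Written out, the weighted objective described in the comments above is as follows. This is a sketch in the comments' notation: w_i denotes the aweight and yhat_i is the first-step NLS prediction yhatnls.

```latex
\begin{aligned}
Q_{N}(b) &= \sum_{i} w_{i}\,\big(y_{i} - g(x_{i}'b)\big)^{2}, \\
\text{FGNLS:}\quad w_{i} &= 1/\hat{y}_{i}^{2}, \qquad
\text{WNLS:}\quad w_{i} = 1/\hat{y}_{i}, \\
\frac{\partial Q_{N}}{\partial b} &= -2\sum_{i} w_{i}\,\frac{\partial g_{i}}{\partial b}\,\big(y_{i} - g_{i}\big) = 0 .
\end{aligned}
```

Only the weight differs between the two estimators, which is why the WNLS code is the FGNLS code with a single line changed.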
.
. * (2A) nls program expnls already defined in (1A)
.
. * (2B) For FGNLS do this nls but now with weights
. gen wfgnls = (1/yhatnls)^2
. * gen wfgnls = 1/yhatnls
. nl expnls y [aweight=wfgnls]
(sum of wgt is 405584.32)

Iteration 0:  residual SS =  1127.256
Iteration 1:  residual SS =  363.8331
Iteration 2:  residual SS =  239.3399
Iteration 3:  residual SS =  220.6796
Iteration 4:  residual SS =  220.2856
Iteration 5:  residual SS =  220.2851
Iteration 6:  residual SS =  220.2851

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =   4946.06
       Model |   217.95244     2   108.97622         Prob > F      =    0.0000
    Residual |  220.285065  9998  .022032913         R-squared     =    0.4973
-------------+------------------------------         Adj R-squared =    0.4972
       Total |  438.237505 10000  .043823751         Root MSE      =  .1484349
                                                     Res. dev.     =  8924.231
(expnls)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       b1int |   1.984035   .0147737   134.30   0.000     1.955075    2.012994
         b2x |   -.990691     .01001   -98.97   0.000    -1.010313   -.9710694
------------------------------------------------------------------------------
 (SEs, P values, CIs, and correlations are asymptotic approximations)
. estimates store bfgnls
.
. * (2C) Robust standard errors
. * The standard errors given above are consistent
. * assuming correct model for heteroskedasticity.
. * To guard against misspecification use similar approach to nls case
. * Obtain the derivatives dg/db
. * Here g = exp(-x'b) so dg/db = -exp(-x'b)*x = -yhat*x (sign does not affect the s.e.'s)
. predict residoptnls, residuals
. predict yhatoptnls, yhat
. gen d1opt = yhatoptnls
. gen d2opt = x*yhatoptnls
. * This OLS regression gives robust standard errors
. regress residoptnls d1opt d2opt [aweight=wfgnls], noconstant robust
(sum of wgt is 4.0558e+05)

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  2,  9998) =    0.00
                                                       Prob > F      =  1.0000
                                                       R-squared     =  0.0000
                                                       Root MSE      =  .14843

------------------------------------------------------------------------------
             |               Robust
 residoptnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       d1opt |  -9.85e-09   .0145803    -0.00   1.000    -.0285803    .0285802
       d2opt |   8.81e-09   .0101319     0.00   1.000    -.0198606    .0198606
------------------------------------------------------------------------------
. estimates store bfgnlsrobust
. * This OLS regression gives nonrobust standard errors
. * It is a check and should equal (2B)
. regress residoptnls d1opt d2opt [aweight=wfgnls], noconstant
(sum of wgt is 4.0558e+05)

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  2,  9998) =    0.00
       Model |  2.2737e-13     2  1.1369e-13           Prob > F      =  1.0000
    Residual |  220.285065  9998  .022032913           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0002
       Total |  220.285065 10000  .022028506           Root MSE      =  .14843

------------------------------------------------------------------------------
 residoptnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       d1opt |  -9.85e-09   .0147737    -0.00   1.000    -.0289594    .0289594
       d2opt |   8.81e-09     .01001     0.00   1.000    -.0196216    .0196216
------------------------------------------------------------------------------
. estimates store bfgnlscheck
.
. * (2D) Calculate R-squared and log-likelihood at the FGNLS estimates
. * Note that Stata version 8 reports the wrong R-squared
. * as it uses TSS = Sum_i y_i^2 and not Sum_i(y_i - ybar)^2
. * lnL sums lnf(y) = ln(lamda) - y*lamda
. gen lamdafgnls = 1 / yhatoptnls     /* yhatoptnls saved earlier */
. gen lnffgnls = ln(lamdafgnls) - y*lamdafgnls
. quietly means lnffgnls
. scalar LLfgnls = r(mean)*r(N)
. * R-squared = 1 - Sum_i(y_i - yhat_i)^2 / Sum_i(y_i - ybar)^2
. gen y_yhatsqfgnls = (y - yhatoptnls)^2
. quietly means y_yhatsqfgnls
. scalar SSresidfgnls = r(mean)
. scalar Rsqfgnls = 1 - SSresidfgnls/SStotal     /* SStotal found earlier */
. di LLfgnls " " Rsqfgnls
-208.71965 .39056605

.
. ** (3) WNLS ESTIMATION USING STATA NL COMMAND
.
. * To get WNLS estimates in Table 5.7
. * replace gen wfgnls = (1/yhatnls)^2 in (2) FGNLS by gen wfgnls = 1/yhatnls
. * Code is shorter as all comments are dropped
.
. gen wwnls = 1/yhatnls
. nl expnls y [aweight=wwnls]
(sum of wgt is 39858.614)

Iteration 0:  residual SS =  2630.417
Iteration 1:  residual SS =  1694.802
Iteration 2:  residual SS =  1500.277
Iteration 3:  residual SS =  1494.658
Iteration 4:  residual SS =  1494.653

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =   5073.75
       Model |  1517.00087     2  758.500436         Prob > F      =    0.0000
    Residual |   1494.6525  9998  .149495149         R-squared     =    0.5037
-------------+------------------------------         Adj R-squared =    0.5036
       Total |  3011.65337 10000  .301165337         Root MSE      =   .386646
                                                     Res. dev.     =  14035.49
(expnls)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       b1int |   1.990623   .0224903    88.51   0.000     1.946537    2.034708
         b2x |  -.9960671    .009777  -101.88   0.000    -1.015232   -.9769022
------------------------------------------------------------------------------
 (SEs, P values, CIs, and correlations are asymptotic approximations)
. estimates store bwnls
. predict residwnls, residuals
. predict yhatwnls, yhat
. gen d1w = yhatwnls
. gen d2w = x*yhatwnls
. regress residwnls d1w d2w [aweight=wwnls], noconstant robust
(sum of wgt is 3.9859e+04)

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  2,  9998) =    0.00
                                                       Prob > F      =  1.0000
                                                       R-squared     =  0.0000
                                                       Root MSE      =  .38665

------------------------------------------------------------------------------
             |               Robust
   residwnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d1w |  -1.11e-07   .0358551    -0.00   1.000    -.0702833    .0702831
         d2w |   5.35e-08   .0224175     0.00   1.000    -.0439428     .043943
------------------------------------------------------------------------------
. estimates store bwnlsrobust
. regress residwnls d1w d2w [aweight=wwnls], noconstant
(sum of wgt is 3.9859e+04)

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  2,  9998) =    0.00
       Model |  1.8190e-12     2  9.0949e-13           Prob > F      =  1.0000
    Residual |   1494.6525  9998  .149495149           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0002
       Total |   1494.6525 10000   .14946525           Root MSE      =  .38665

------------------------------------------------------------------------------
   residwnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d1w |  -1.11e-07   .0224903    -0.00   1.000    -.0440856    .0440853
         d2w |   5.35e-08    .009777     0.00   1.000    -.0191649     .019165
------------------------------------------------------------------------------
. estimates store bwnlscheck
. gen lamdawnls = 1 / yhatwnls     /* yhatwnls saved earlier */
. gen lnfwnls = ln(lamdawnls) - y*lamdawnls
. quietly means lnfwnls
. scalar LLwnls = r(mean)*r(N)
. gen y_yhatsqwnls = (y - yhatwnls)^2
. quietly means y_yhatsqwnls
. scalar SSresidwnls = r(mean)
. scalar Rsqwnls = 1 - SSresidwnls/SStotal     /* SStotal found earlier */
. di LLwnls " " Rsqwnls
-208.93381 .39017996
.
. ***** PRINT RESULTS: Last three columns of Table 5.7 page 161
.
. * (1) NLS using NL - nonrobust and robust standard errors
. * Here nonrobust differs from robust asymptotically
.
. * Table 5.7 NLS nonrobust standard errors
. estimates table bnls, b(%10.4f) se(%10.4f) t stats(N ll)

---------------------------
    Variable |    bnls
-------------+-------------
       b1int |     1.8876
             |     0.0307
             |      61.52
         b2x |    -0.9575
             |     0.0097
             |     -98.28
-------------+-------------
           N | 10000.0000
          ll |
---------------------------
             legend: b/se/t
. * Table 5.7 NLS robust standard errors
. estimates table bnlscheck bnlsrobust, b(%10.4f) se(%10.4f) t stats(N ll)

----------------------------------------
    Variable | bnlscheck    bnlsrobust
-------------+--------------------------
          d1 |     0.0000       0.0000
             |     0.0307       0.1421
             |       0.00         0.00
          d2 |    -0.0000      -0.0000
             |     0.0097       0.0612
             |      -0.00        -0.00
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -1.426e+04   -1.426e+04
----------------------------------------
                          legend: b/se/t
.
. /*
> * Check: Nonrobust standard errors of NLS b1int and b2x:
> di seb1intnlsnr " " seb2xnlsnr
> * Robust standard errors of NLS estimates of b1int and b2x:
> di seb1intnls " " seb2xnls
> */
. * Alternative Robust standard errors of NLS estimates of b1int and b2x:
. * These use knowledge that Var[u] = exp(-x'b)^2 = E[y]^2
. di seb1intnlsknown " " seb2xnlsknown
.21097066 .08798113
.
. * (3) WNLS - nonrobust and robust standard errors
. * Here nonrobust = robust asymptotically as WNLS in LEF
. * Also should be same as MLE asymptotically
. * Table 5.7 WNLS nonrobust standard errors
. estimates table bwnls, b(%10.4f) se(%10.4f) t stats(N ll)

---------------------------
    Variable |   bwnls
-------------+-------------
       b1int |     1.9906
             |     0.0225
             |      88.51
         b2x |    -0.9961
             |     0.0098
             |    -101.88
-------------+-------------
           N | 10000.0000
          ll |
---------------------------
             legend: b/se/t
. * Table 5.7 WNLS robust standard errors
. estimates table bwnlscheck bwnlsrobust, b(%10.4f) se(%10.4f) t stats(N ll)

----------------------------------------
    Variable | bwnlscheck   bwnlsrob~t
-------------+--------------------------
         d1w |    -0.0000      -0.0000
             |     0.0225       0.0359
             |      -0.00        -0.00
         d2w |     0.0000       0.0000
             |     0.0098       0.0224
             |       0.00         0.00
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -4685.9286   -4685.9286
----------------------------------------
                          legend: b/se/t
.
. * (2) FGNLS - nonrobust and robust standard errors
. * Here nonrobust = robust asymptotically as FGNLS in LEF
. * Also should be same as MLE asymptotically
. * Table 5.7 FGNLS nonrobust standard errors
. estimates table bfgnls, b(%10.4f) se(%10.4f) t stats(N ll)

---------------------------
    Variable |   bfgnls
-------------+-------------
       b1int |     1.9840
             |     0.0148
             |     134.30
         b2x |    -0.9907
             |     0.0100
             |     -98.97
-------------+-------------
           N | 10000.0000
          ll |
---------------------------
             legend: b/se/t
. * Table 5.7 FGNLS robust standard errors
. estimates table bfgnlscheck bfgnlsrobust, b(%10.4f) se(%10.4f) t stats(N ll)

----------------------------------------
    Variable | bfgnlsch~k   bfgnlsro~t
-------------+--------------------------
       d1opt |    -0.0000      -0.0000
             |     0.0148       0.0146
             |      -0.00        -0.00
       d2opt |     0.0000       0.0000
             |     0.0100       0.0101
             |       0.00         0.00
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll |  4887.7042    4887.7042
----------------------------------------
                          legend: b/se/t
.
. * (4) Print the various log-likelihoods and R-squared
. * Log-likelihood for NLS, FGNLS and WNLS
. di "LLnls: " LLnls " LLfgnls: " LLfgnls " LLwnls: " LLwnls
LLnls: -232.97524 LLfgnls: -208.71965 LLwnls: -208.93381
. * R-squared for NLS, FGNLS and WNLS
. di "Rsqnls: " Rsqnls " Rsqfgnls: " Rsqfgnls " Rsqwnls: " Rsqwnls
Rsqnls: .39134462 Rsqfgnls: .39056605 Rsqwnls: .39017996
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma05p2nls.txt
  log type:  text
 closed on:  17 May 2005, 13:53:34
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma05p3nlsbyml.txt
  log type:  text
 opened on:  17 May 2005, 13:54:20
.
. ********** OVERVIEW OF MMA05P3NLSBYML.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9 pp.159-63
. * Nonlinear Least Squares using Stata command ml
.
. * Provides third column of Table 5.7 for
. * (1) NLS using Stata ml command (easy to get robust s.e.'s)
. * using generated data set mma05data.asc
.
. * Note: Use ml rather than nl as then much easier to get robust s.e.'s
. *       Can instead use Stata command nl - see program mma05p2nls.do
.
. * Related programs:
. *   mma05p1mle.do          OLS and MLE for the same data
. *   mma05p2nls.do          NLS (and WNLS and FGNLS) using Stata command nl
. *   mma05p4margeffects.do  Calculates marginal effects
.
. * To run this program you need data and dictionary files
. *   mma05data.asc    ASCII data set generated by mma05p1mle.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** READ IN DATA and SUMMARIZE **********
.
. * Model is  y ~ exponential(exp(a + bx))
. *           x ~ N[mux, sigx^2]
. *        f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *      lnf(y) = (a + bx) - y*exp(a + bx)
. *        E[y] = exp(-(a + bx))    note sign reversal for the mean
. *        V[y] = exp(-2(a + bx)) = E[y]^2
. * Here a = 2, b = -1 and x ~ N[mux=1, sigx^2=1]
. * and Table 5.7 uses N=10,000
.
. * Data was generated by program mma05p1mle.do
. infile y x using mma05data.asc
(10000 observations read)
.
. * Descriptive Statistics
. describe

Contains data
  obs:        10,000
 vars:             2
 size:       120,000 (98.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |     10000    .6194352    1.291416   .0000445   30.60636
           x |     10000    1.014313    1.004905  -2.895741   4.994059
.
. ********** DO THE ANALYSIS: NLS using STATA COMMAND ML **********
.
. * (1) NLS ESTIMATION USING STATA ML COMMAND (maximum likelihood)
.
. * Advantage: ml command has robust standard errors as an option
.
. * The NLS estimator minimizes SUM_i (y_i - g(x_i'b))^2.
. * The program below uses g(x'b) = exp(-(b1int + b2x*x)),
. * since for this dgp E[y] = exp(-(a + bx)) (sign reversal for the mean).
.
. * To adjust this code to other NLS problems
. * (a) If more regressors, say x1 x2 and x3, replace ml model line with
. *     ml model lf mlexp (y = x1 x2 x3) / sigma
. * (b) If different functional form for mean, say g(x'b), redefine `res' as
. *     `res' = $ML_y1 - g(`theta')
. * (c) If functional form for mean is not single-index then the program
. *     will become considerably more complicated with more args.
.
. * (1A) The program "mlexp" defines the objective function
. program define mlexp
  1. version 8.0
  2. args lnf theta sigma    /* theta contains b1int and b2x; sigma is st.dev. of error */
  3. tempvar res             /* create to shorten expression for lnf */
  4. quietly gen double `res' = $ML_y1 - exp(-`theta')
  5. quietly replace `lnf' = -0.5*ln(2*_pi) - ln(`sigma') - 0.5*`res'^2/`sigma'^2
  6. end
.
. * (1B) The following command gives the dep variable (y) and regressors (x + intercept)
. ml model lf mlexp (y = x) / sigma
. ml search
initial:       log likelihood =     -<inf>  (could not be evaluated)
feasible:      log likelihood = -35613.002
improve:       log likelihood = -19164.648
rescale:       log likelihood = -16938.923
rescale eq:    log likelihood = -16938.923
. ml maximize
initial:       log likelihood = -16938.923
rescale:       log likelihood = -16938.923
rescale eq:    log likelihood = -16938.923
Iteration 0:   log likelihood = -16938.923  (not concave)
Iteration 1:   log likelihood = -15504.033
Iteration 2:   log likelihood = -14673.535
Iteration 3:   log likelihood = -14272.637
Iteration 4:   log likelihood = -14263.775
Iteration 5:   log likelihood = -14263.761
Iteration 6:   log likelihood = -14263.761

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =   10492.88
Log likelihood = -14263.761                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
           x |  -.9574683   .0093471  -102.43   0.000    -.9757883   -.9391483
       _cons |   1.887562   .0295701    63.83   0.000     1.829606    1.945519
-------------+----------------------------------------------------------------
sigma        |
       _cons |   1.007465   .0071239   141.42   0.000     .9935028    1.021428
------------------------------------------------------------------------------
. estimates store bnlsbymle
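Why maximizing this normal log-likelihood reproduces the NLS coefficients can be seen by writing the likelihood out. The sketch below uses g_i = exp(-theta_i) as in program mlexp. The numerical check takes the residual SS of 10149.8633 reported by nl in the log of mma05p2nls.do, which reads the same data set mma05data.asc.

```latex
\begin{aligned}
\ln L(b,\sigma) &= -\frac{N}{2}\ln(2\pi) - N\ln\sigma - \frac{1}{2\sigma^{2}}\sum_{i}\big(y_{i}-g_{i}(b)\big)^{2}, \\
\hat{\sigma}^{2} &= \frac{1}{N}\sum_{i}\big(y_{i}-g_{i}(\hat{b})\big)^{2}
  = \frac{10149.8633}{10000} \;\Rightarrow\; \hat{\sigma} \approx 1.007465, \\
\ln L(\hat{b},\hat{\sigma}) &= -\frac{N}{2}\big(\ln(2\pi) + 1\big) - N\ln\hat{\sigma} \approx -14263.76 .
\end{aligned}
```

For any fixed sigma the first line is maximized over b by minimizing the sum of squared residuals, so the coefficients agree with nl. The estimate of sigma divides by N rather than N - k, which is why it is slightly below the Root MSE of 1.007566 reported by nl.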
.
. * (1C) Adding ,robust gives Heteroskedastic robust standard errors
. ml model lf mlexp (y = x) / sigma, robust
. ml search
initial:
log pseudo-likelihood = -<inf> (could not be evaluated)
feasible:
log pseudo-likelihood = -35613.002
107

improve:
log pseudo-likelihood = -17310.807
rescale:
log pseudo-likelihood = -17310.807
rescale eq: log pseudo-likelihood = -16777.282
. ml maximize
initial:
log pseudo-likelihood = -16777.282
rescale:
log pseudo-likelihood = -16777.282
rescale eq: log pseudo-likelihood = -16777.282
Iteration 0: log pseudo-likelihood = -16777.282 (not concave)
Iteration 1: log pseudo-likelihood = -16097.359
Iteration 2: log pseudo-likelihood = -16013.711
Iteration 3: log pseudo-likelihood = -14412.885
Iteration 4: log pseudo-likelihood = -14264.159
Iteration 5: log pseudo-likelihood = -14263.761
Iteration 6: log pseudo-likelihood = -14263.761
                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =     288.75
Log pseudo-likelihood = -14263.761                Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
           x |  -.9574683   .0563463   -16.99   0.000    -1.067905   -.8470317
       _cons |   1.887562    .127832    14.77   0.000     1.637016    2.138108
-------------+----------------------------------------------------------------
sigma        |
       _cons |   1.007465   .0561714    17.94   0.000     .8973713    1.117559
------------------------------------------------------------------------------
. estimates store bnlsbymlerobust
.
. ***** PRINT RESULTS: Third column of Table 5.7 p.111 **********
.
. * (1) NLS by ML - nonrobust and robust standard errors
. * The coefficient estimates are exactly the same as those using the nl command
. * The estimated standard errors are close - within 10% of those using the nl command
. * Table 5.7 reports the standard errors using the nl command
. estimates table bnlsbymle bnlsbymlerobust, b(%10.4f) se(%10.4f) t stats(N ll)
----------------------------------------
    Variable | bnlsbymle    bnlsbyml~t
-------------+--------------------------
eq1          |
           x |    -0.9575      -0.9575
             |     0.0093       0.0563
             |    -102.43       -16.99
       _cons |     1.8876       1.8876
             |     0.0296       0.1278
             |      63.83        14.77
-------------+--------------------------
sigma        |
       _cons |     1.0075       1.0075
             |     0.0071       0.0562
             |     141.42        17.94
-------------+--------------------------
Statistics   |
           N | 10000.0000   10000.0000
          ll | -1.426e+04   -1.426e+04
----------------------------------------
                          legend: b/se/t
.
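With a single slope restriction, the Wald chi2(1) reported in each ml header should equal the squared z-ratio for x, which is why it falls so sharply once robust standard errors are used. The following is a minimal cross-check in Python (outside Stata), using the rounded coefficient and standard errors printed in the log; `wald_single` is a hypothetical helper name, not a Stata function.

```python
# Cross-check of the Wald chi2(1) statistics using rounded values copied from the log.
# wald_single is an illustrative helper, not part of Stata.

def wald_single(coef, se):
    """Wald chi-squared statistic for H0: coefficient = 0 (one restriction)."""
    z = coef / se
    return z * z

b_x = -0.9574683          # slope estimate, identical in both runs
se_default = 0.0093471    # default ML standard error
se_robust = 0.0563463     # robust standard error

wald_default = wald_single(b_x, se_default)
wald_robust = wald_single(b_x, se_robust)
print(round(wald_default, 1), round(wald_robust, 1))
```

Small discrepancies from the printed statistics are expected because the inputs are rounded to seven digits.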
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma05p3nlsbyml.txt
log type: text
closed on: 17 May 2005, 13:54:27
-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma05p4margeffects.txt
log type: text
opened on: 17 May 2005, 13:57:02
.
. ********** OVERVIEW OF MMA05P4MARGINALEFFECTS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9.4 pp.162-3
. * Marginal effects analysis for a nonlinear model (here exponential regression).
.
. * Provides
. * (1) Sample average marginal effect using derivative
. * (2) Sample average marginal effect using first difference
. * (3) Marginal effect evaluated at the sample mean
. * (4) Marginal effects (1)-(3) when model estimated by Stata ml command
. * using generated data (see below)
.
. * Related programs:
. *   mma05p1mle.do      OLS and MLE for the same data
. *   mma05p2nls.do      NLS, WNLS, FGNLS for same data using nl command
. *   mma05p3nlsbyml.do  NLS for same data using ml command
.
. * To run this program you need data and dictionary files


. * mma05data.asc ASCII data set generated by mma05p1mle.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** READ IN DATA and SUMMARIZE **********
.
. * Model is  y ~ exponential(exp(a + bx))
. *           x ~ N[mux, sigx^2]
. *        f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *      lnf(y) = (a + bx) - y*exp(a + bx)
. *        E[y] = exp(-(a + bx))              note sign reversal for the mean
. *        V[y] = exp(-2*(a + bx)) = E[y]^2
. * Here a = 2, b = -1 and x ~ N[mux=1, sigx^2=1]
. * and Table 5.7 uses N=10,000
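The sign reversal noted above follows from the parameterization: the density has rate exp(a + bx), and an exponential density with rate r has mean 1/r and variance 1/r^2. As a hedged numerical check in Python (outside Stata), the sketch below integrates the density with the trapezoid rule at x = 1; the helper name `exp_moments` and the grid settings are illustrative assumptions.

```python
import math

def exp_moments(rate, n_steps=200_000, upper_mult=50.0):
    """Trapezoid-rule mean and variance of the density f(y) = rate*exp(-rate*y)."""
    upper = upper_mult / rate          # truncate where the tail mass is negligible
    h = upper / n_steps
    m1 = 0.0                           # accumulates the integral of y*f(y)
    m2 = 0.0                           # accumulates the integral of y^2*f(y)
    for i in range(n_steps + 1):
        y = i * h
        w = 0.5 if i in (0, n_steps) else 1.0
        f = rate * math.exp(-rate * y)
        m1 += w * y * f * h
        m2 += w * y * y * f * h
    return m1, m2 - m1 * m1

a, b, x = 2.0, -1.0, 1.0               # DGP values from the log, x at its population mean
rate = math.exp(a + b * x)
mean, var = exp_moments(rate)
print(mean, math.exp(-(a + b * x)))     # numerical mean vs exp(-(a + bx))
print(var, math.exp(-2 * (a + b * x)))  # numerical variance vs exp(-2(a + bx))
```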
.
. * Data was generated by program mma05p1mle.do
. infile y x using mma05data.asc
(10000 observations read)
.
. * Descriptive Statistics
. describe
Contains data
  obs:        10,000
 vars:             2
 size:       120,000 (98.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |     10000    .6194352    1.291416   .0000445   30.60636
           x |     10000    1.014313    1.004905  -2.895741   4.994059
.
. ********** MARGINAL EFFECTS for CHAPTER 5.9.4 **********


.
. ** (1) DERIVATIVE METHOD FOR SAMPLE AVERAGE MARGINAL EFFECT
.
. * (1A) METHOD A: Use analytical results
. * Since E[y] = exp(-(a + bx))   (note the sign reversal for the mean)
. *   dE[y]/dx = -b*exp(-(a + bx)) = -b*E[y]
.
. * Estimate the model
. * The Stata code for exponential regression is unusual as it uses an st (survival-time) command
. * Need to declare data to be st data with dependent variable y
. stset y
     failure event:  (assumed to fail at time=y)
obs. time interval:  (0, y]
 exit on or before:  failure
------------------------------------------------------------------------------
    10000  total obs.
        0  exclusions
------------------------------------------------------------------------------
    10000  obs. remaining, representing
    10000  failures in single record/single failure data
 6194.352  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  30.60636
. quietly streg x, distribution(exponential) nohr
. gen dEydxanalyticalderivative = -_b[x]*exp(-_b[_cons] - _b[x]*x)
. * Alternative is to (1) predict the mean and (2) multiply by -_b[x]
. quietly sum dEydxanalyticalderivative
. scalar mesaad = r(mean)
. di "Sample average marginal effect by analytical derivative = " mesaad
Sample average marginal effect by analytical derivative = .60976598
.
. * (1B) METHOD B: Use numerical derivative (here one-sided)
. * This is same as first difference code, except have small change in x
. * Note: precision problems can arise with small changes in x
. * The following code tries to minimize such problems
. * Change in x will be 0.0001 times the standard deviation of x
. egen sdx = sd(x)
. quietly streg x, distribution(exponential) nohr
. * Need to tell streg to predict the mean as this is not the default.
. predict y0, mean time
. gen xoriginal = x
. replace x = x+0.0001*sdx
(10000 real changes made)
. predict y1, mean time
. gen dEydxnumericalderivative = (y1 - y0)/(0.0001*sdx)
. quietly sum dEydxnumericalderivative
. scalar mesand = r(mean)
. di "Sample average marginal effect by numerical derivative = " mesand
Sample average marginal effect by numerical derivative = .60949044
. replace x = xoriginal
(10000 real changes made)
. drop xoriginal sdx y0 y1
.
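Methods (1A) and (1B) above differ only in how the derivative of exp(-a - b*x) is obtained. A minimal Python sketch (outside Stata) compares the analytical derivative with a one-sided numerical derivative at a single point, using the same step of 0.0001 times the standard deviation of x. The coefficients are the rounded streg estimates reported in this log, and the function names are illustrative assumptions.

```python
import math

a_hat, b_hat = 1.982921, -0.9896276   # rounded streg estimates from this log

def mean_y(x):
    """Fitted mean exp(-a - b*x); note the sign reversal."""
    return math.exp(-a_hat - b_hat * x)

def d_analytic(x):
    """Analytical derivative dE[y]/dx = -b * E[y]."""
    return -b_hat * mean_y(x)

def d_forward(x, h):
    """One-sided (forward) numerical derivative with step h."""
    return (mean_y(x + h) - mean_y(x)) / h

sd_x = 1.004905                # sample standard deviation of x from summarize
h = 0.0001 * sd_x              # same relative step as in the log
x0 = 1.014313                  # sample mean of x

exact = d_analytic(x0)
approx = d_forward(x0, h)
print(exact, approx, approx - exact)
```

In double precision the forward difference exceeds the derivative slightly, by about h/2 times the second derivative, because the fitted mean is convex in x. The log's generated variables are stored by `gen`, so part of the gap between the two sample averages there may reflect storage precision rather than the step size; that reading is an assumption, not something the log states.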
. ** (2) FINITE DIFFERENCE METHOD FOR SAMPLE AVERAGE MARGINAL EFFECT
.
. streg x, distribution(exponential) nohr /* y is dependent variable */
         failure _d:  1 (meaning all fail)
   analysis time _t:  y
Iteration 0:   log likelihood = -20754.005
Iteration 1:   log likelihood = -17232.884
Iteration 2:   log likelihood = -15760.556
Iteration 3:   log likelihood = -15752.193
Iteration 4:   log likelihood =  -15752.19
Iteration 5:   log likelihood =  -15752.19

Exponential regression -- log relative-hazard form

No. of subjects =        10000                     Number of obs   =     10000
No. of failures =        10000
Time at risk    =  6194.352464
                                                   LR chi2(1)      =  10003.63
Log likelihood  =    -15752.19                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0098692  -100.27   0.000    -1.008971   -.9702842
       _cons |   1.982921   .0141496   140.14   0.000     1.955188    2.010654
------------------------------------------------------------------------------

.
. * The following method can be used following many Stata estimation commands
. * 1. Predict y using sample data.
. * Need to say predict the mean as this is not the streg default.
. predict y0, mean time
. * 2. Predict y with regressor of x increased by one
. gen xoriginal = x
. replace x = x+1
(10000 real changes made)
. predict y1, mean time
. replace x = xoriginal /* Put x back to initial value for later analysis */
(10000 real changes made)
. * 3. Calculate difference
. gen dEydxfinitedifference = y1 - y0
. quietly sum dEydxfinitedifference
. scalar mesafd = r(mean)
. di "Sample average marginal effect by first differences = " mesafd
Sample average marginal effect by first differences = 1.0414485
. drop xoriginal y0 y1
.
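The unit first difference (1.041) is much larger than the derivative-based effect (0.610) because a one-unit change in x is not small for this model. Both quantities are proportional to the fitted mean: the first difference is E[y]*(exp(-b) - 1) and the derivative is -b*E[y], so the ratio of their sample averages should be (exp(-b) - 1)/(-b). A short Python check (outside Stata) with the rounded values from the log:

```python
import math

b_hat = -0.9896276            # rounded slope from the streg output
me_derivative = 0.60976598    # sample average marginal effect, derivative method (from the log)
me_first_diff = 1.0414485     # sample average marginal effect, unit first difference (from the log)

# E[y|x+1] - E[y|x] = E[y|x] * (exp(-b) - 1), while dE[y|x]/dx = -b * E[y|x].
# Both are proportional to E[y|x], so the ratio does not depend on x.
ratio_theory = (math.exp(-b_hat) - 1.0) / (-b_hat)
ratio_log = me_first_diff / me_derivative
print(ratio_theory, ratio_log)
```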
. ** (3) DERIVATIVE METHOD FOR MARGINAL EFFECT AT SAMPLE MEAN
.
. * (3A) Use Stata command mfx
. quietly streg x, distribution(exponential) nohr
. * Need to tell mfx to predict the mean as this is not the streg default.
. mfx compute, dydx predict(mean time)
Marginal effects after ereg
      y  = predicted mean _t (predict, mean time)
         =  .37563828
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
       x |    .371742      .00525   70.81   0.000   .361452  .382032   1.01431
------------------------------------------------------------------------------
. di "Marginal effect by analytical derivative at mean of x using mfx: "
Marginal effect by analytical derivative at mean of x using mfx:
. matrix list e(Xmfx_dydx)

symmetric e(Xmfx_dydx)[1,1]
           x
r1   .371742
.
. * (3B) Write one's own code
. quietly streg x, distribution(exponential) nohr
. quietly sum x
. scalar meanx = r(mean)
. scalar dEydxatmeanx = -_b[x]*exp(-_b[_cons] - _b[x]*meanx)
. di "Marginal effect by analytical derivative at mean of x done manually: "
Marginal effect by analytical derivative at mean of x done manually:
. di dEydxatmeanx
.371742
.
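The marginal effect at the sample mean (0.372) is well below the sample average marginal effect (0.610) because the fitted mean is convex in x. The Python sketch below (outside Stata) reproduces the effect at the mean from the rounded estimates and, under the added assumption that x is exactly normal with the sample mean and standard deviation, approximates the average effect with the lognormal mean formula. The normal approximation is an illustration only and is not part of the original program.

```python
import math

a_hat, b_hat = 1.982921, -0.9896276   # rounded streg estimates from this log
mean_x, sd_x = 1.014313, 1.004905     # sample moments of x from summarize

# Marginal effect evaluated at the sample mean of x: -b * exp(-a - b*xbar).
me_at_mean = -b_hat * math.exp(-a_hat - b_hat * mean_x)

# Average marginal effect if x ~ N(mean_x, sd_x^2) exactly (assumption):
# E[exp(-a - b*x)] = exp(-a - b*mu + b^2 * sigma^2 / 2).
ame_normal = -b_hat * math.exp(-a_hat - b_hat * mean_x + 0.5 * (b_hat * sd_x) ** 2)

print(round(me_at_mean, 6), round(ame_normal, 4))
```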
. ** (4) MARGINAL EFFECTS AFTER ML COMMAND
.
. * Preceding (1) - (3) presume there is a built-in command to get MLE.
. * Now consider ML estimation using Stata's ml command.
. * After ml command cannot use predict or mfx.
. * Need to be more manual, as follows.
.
. * Estimate model by ml: for details see mma05p1mle.do
. program define mleexp0
  1.   version 8.0
  2.   args lnf theta    /* Must use lnf while could use name other than theta */
  3.   quietly replace `lnf' = `theta' - $ML_y1*exp(`theta')
  4. end
. quietly ml model lf mleexp0 (y = x)
. quietly ml search
. quietly ml maximize
.
. * Note that here the mean is in fact exp(-a-b*x)
.
. * (1A) Sample average marginal effect by calculus methods
. gen mldEydxanalyticalderivative = -_b[x]*exp(-_b[_cons] - _b[x]*x)
. quietly sum mldEydxanalyticalderivative

. scalar mlmesaad = r(mean)


. di "Sample average marginal effect by analytical derivative = " mlmesaad
Sample average marginal effect by analytical derivative = .60976598
.
. * (1B) Sample average marginal effect by numerical derivative
. egen sdx = sd(x)
. gen y0 = exp(-_b[_cons] - _b[x]*x)
. gen xoriginal = x
. replace x = x+0.0001*sdx
(10000 real changes made)
. gen y1 = exp(-_b[_cons] - _b[x]*x)
. gen mldEydxnumericalderivative = (y1 - y0)/(0.0001*sdx)
. quietly sum mldEydxnumericalderivative
. scalar mlmesand = r(mean)
. di "ML sample average marginal effect by numerical derivative = " mlmesand
ML sample average marginal effect by numerical derivative = .60949063
. replace x = xoriginal
(10000 real changes made)
. drop xoriginal sdx y0 y1
.
. * (2) Sample average marginal effect by increasing x by one unit (finite difference)
. gen mldEydxfinitedifference = exp(-_b[_cons]-_b[x]*(x+1)) - exp(-_b[_cons]-_b[x]*x)
. quietly sum mldEydxfinitedifference
. scalar mlmesafd = r(mean)
. di "Sample average marginal effect by first difference = " mlmesafd
Sample average marginal effect by first difference = 1.0414485
.
. * (3) Marginal effect estimated at the sample mean of x
. quietly sum x
. scalar meanx = r(mean)
. scalar mldEydxatmeanx = -_b[x]*exp(-_b[_cons] - _b[x]*meanx)

. di "ML marginal effect at mean of x by analytical derivative: "


ML marginal effect at mean of x by analytical derivative:
. di mldEydxatmeanx
.371742
.
. ********** DISPLAY RESULTS on p.162-3 **********
.
. di "Marginal Effects: (1A) Analytical deriv (1B) Numerical Deriv (2) First diff"
Marginal Effects: (1A) Analytical deriv (1B) Numerical Deriv (2) First diff
. sum dEydxfinitedifference dEydxanalyticalderivative dEydxnumericalderivative
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
dEydxfinit~e |     10000    1.041449    1.373144     .01325   32.59646
dEydxanaly~e |     10000     .609766    .8039727   .0077578   19.08516
dEydxnumer~e |     10000    .6094904    .8035654   .0077479   19.11325
.
. di "KEY RESULTS FOR CHAPTER 5.9.4 pp.162-3 FOLLOW"
KEY RESULTS FOR CHAPTER 5.9.4 pp.162-3 FOLLOW
. di "(1A) Sample average marginal effect by analytical derivative = " mesaad
(1A) Sample average marginal effect by analytical derivative = .60976598
. di "(1B) Sample average marginal effect by numerical derivative = " mesand
(1B) Sample average marginal effect by numerical derivative = .60949044
. di "(2) Sample average marginal effect by first differences = " mesafd
(2) Sample average marginal effect by first differences = 1.0414485
. di "(3) Marginal effect at mean of x by analytical derivative = " dEydxatmeanx
(3) Marginal effect at mean of x by analytical derivative = .371742
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma05p4margeffects.txt
log type: text
closed on: 17 May 2005, 13:57:06

-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma06p2Theil.txt
log type: text
opened on: 18 May 2005, 17:45:50
.
. ********** OVERVIEW OF MMA06P2THEIL.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * NOTE: Stata does not have a NL2SLS command
.
. * Chapter 6.5.4 nonlinear 2SLS example.
. * Table 6.4 partial only
. *   (1) OLS         inconsistent
. *   (2) NL2SLS      consistent    NOT INCLUDED AS STATA DOES NOT DO
. *   (3) Wrong 2SLS  inconsistent
.
. * To run this program you need data set
. *   mma06p1nl2sls.asc
. * generated by Limdep program MMA06P1NL2SLS.LIM
.
. * Some of the analysis is done in Limdep which (unlike Stata) has
. * an NL2SLS command
.
. ********** SETUP **********
.
. set more off
. version 8.0
.
. ********** READ DATA and SUMMARIZE **********
.
. * Model is  y = 1*x^2 + u
. *           x = 1*z + v
. * where u and v are joint normal (0,0,1,1,0.8)
.
. infile y x xsq z zsq u v using mma06p1nl2sls.asc
(200 observations read)
.
. * Descriptive Statistics
. describe
Contains data
  obs:           200
 vars:             7
 size:         6,400 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
xsq             float  %9.0g
z               float  %9.0g
zsq             float  %9.0g
u               float  %9.0g
v               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       200    1.632794    2.418096  -2.332656   9.354863
           x |       200    .9970513    .8330302  -1.908285   2.696363
         xsq |       200    1.684581    1.638509   .0000948   7.270374
           z |       200           1           0          1          1
         zsq |       200           1           0          1          1
-------------+--------------------------------------------------------
           u |       200   -.0517871    .9427286  -2.816687   2.202356
           v |       200   -.0029487    .8330302  -2.908285   1.696363
.
. ********** DO THE ANALYSIS: ESTIMATE MODELS **********
.
. * (1) OLS is inconsistent (first column of Table 6.4)
. regress y xsq, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) = 2250.83
       Model |  1558.96322     1  1558.96322           Prob > F      =  0.0000
    Residual |   137.83055   199  .692615831           R-squared     =  0.9188
-------------+------------------------------           Adj R-squared =  0.9184
       Total |  1696.79377   200  8.48396883           Root MSE      =  .83224

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0250721    47.44   0.000     1.140054    1.238936
------------------------------------------------------------------------------
. estimates store olswrong

. regress y xsq, noconstant robust


Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   199) = 3850.71
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9188
                                                       Root MSE      =  .83224

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0191687    62.05   0.000     1.151695    1.227295
------------------------------------------------------------------------------
. estimates store olswrongrob
.
. * (2) Stata does not have an NL2SLS command
. * See LIMDEP program MMA06P1NL2SLS.LIM
.
. * (3A) Theil's 2SLS, which first regresses x on z, is inconsistent
. regress x z, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  286.51
       Model |  198.822258     1  198.822258           Prob > F      =  0.0000
    Residual |  138.093918   199  .693939288           R-squared     =  0.5901
-------------+------------------------------           Adj R-squared =  0.5881
       Total |  336.916176   200  1.68458088           Root MSE      =  .83303

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   .9970513   .0589041    16.93   0.000     .8808949    1.113208
------------------------------------------------------------------------------
. predict xhat
(option xb assumed; fitted values)
. gen xhatsq = xhat*xhat
. regress y xhatsq, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xhatsq |   1.642466   .1719981     9.55   0.000     1.303293    1.981638
------------------------------------------------------------------------------
. estimates store ivwrong
.
. ********** DISPLAY KEY RESULTS Table 6.4 p.199 **********
.
. * Table 6.4 p.199
. estimates table olswrong olswrongrob ivwrong, b(%8.3f) se stats(N r2) keep(xsq xhatsq)
-------------------------------------------------
    Variable | olswrong    olswro~b    ivwrong
-------------+-----------------------------------
         xsq |    1.189       1.189
             |    0.025       0.019
      xhatsq |                            1.642
             |                            0.172
-------------+-----------------------------------
           N |  200.000     200.000     200.000
          r2 |    0.919       0.919       0.314
-------------------------------------------------
                                   legend: b/se
.
. * (3B) IV with instrument xsq for zsq should work but Stata cannot do
. ivreg y (xsq = xsq), noconstant
Instrumental variables (2SLS) regression
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =       .
       Model |  1558.96322     1  1558.96322           Prob > F      =       .
    Residual |   137.83055   199  .692615831           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  1696.79377   200  8.48396883           Root MSE      =  .83224

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0250721    47.44   0.000     1.140054    1.238936
------------------------------------------------------------------------------
Instrumented:  xsq
Instruments:   xsq
------------------------------------------------------------------------------
. corr xsq xsq
(obs=200)

             |      xsq      xsq
-------------+------------------
         xsq |   1.0000
         xsq |   1.0000   1.0000

. corr xsq z
(obs=200)

             |      xsq        z
-------------+------------------
         xsq |   1.0000
           z |        .        .

. regress xsq z, noconstant


      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  211.41
       Model |  567.562553     1  567.562553           Prob > F      =  0.0000
    Residual |  534.257348   199  2.68471029           R-squared     =  0.5151
-------------+------------------------------           Adj R-squared =  0.5127
       Total |   1101.8199   200  5.50909951           Root MSE      =  1.6385

------------------------------------------------------------------------------
         xsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   1.684581   .1158601    14.54   0.000      1.45611    1.913052
------------------------------------------------------------------------------
. predict xsqhat
(option xb assumed; fitted values)
. regress y xsqhat, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xsqhat |   .9692582   .1015002     9.55   0.000     .7691043    1.169412
------------------------------------------------------------------------------
. * ivreg y (xsq = z), noconstant
.
. gen one = 1
. regress y one, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         one |   1.632794   .1709852     9.55   0.000     1.295618    1.969969
------------------------------------------------------------------------------
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma06p2Theil.txt
log type: text
closed on: 18 May 2005, 17:45:50
-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma06p2twostage.txt
log type: text
opened on: 18 May 2005, 17:59:06
.
. ********** OVERVIEW OF MMA06P2TWOSTAGE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * NOTE: Stata does not have a NL2SLS command
.
. * Chapter 6.5.4 nonlinear 2SLS example on pages 198-9.
.
. * Table 6.4 partial only
. *   (1) OLS       inconsistent
. *   (2) NL2SLS    consistent    NOT INCLUDED AS STATA DOES NOT DO
. *   (3) Twostage  Here 2SLS using Theil's interpretation of 2SLS is inconsistent
.
. * To run this program you need data set
. *   mma06p1nl2sls.asc
. * generated by Limdep program MMA06P1NL2SLS.LIM
.
. * Some of the analysis is done in Limdep which (unlike Stata) has
. * an NL2SLS command
.
. ********** SETUP **********
.
. set more off
. version 8.0
.
. ********** READ DATA and SUMMARIZE **********
.
. * Model is  y = 1*x^2 + u
. *           x = 1*z + v
. * where u and v are joint normal (0,0,1,1,0.8)
.
. infile y x xsq z zsq u v using mma06p1nl2sls.asc
(200 observations read)
.
. * Descriptive Statistics
. describe
Contains data
  obs:           200
 vars:             7
 size:         6,400 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
xsq             float  %9.0g
z               float  %9.0g
zsq             float  %9.0g
u               float  %9.0g
v               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       200    1.632794    2.418096  -2.332656   9.354863
           x |       200    .9970513    .8330302  -1.908285   2.696363
         xsq |       200    1.684581    1.638509   .0000948   7.270374
           z |       200           1           0          1          1
         zsq |       200           1           0          1          1
-------------+--------------------------------------------------------
           u |       200   -.0517871    .9427286  -2.816687   2.202356
           v |       200   -.0029487    .8330302  -2.908285   1.696363

.
. ********** DO THE ANALYSIS: ESTIMATE MODELS **********
.
. * (1) OLS is inconsistent (first column of Table 6.4)
. regress y xsq, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) = 2250.83
       Model |  1558.96322     1  1558.96322           Prob > F      =  0.0000
    Residual |   137.83055   199  .692615831           R-squared     =  0.9188
-------------+------------------------------           Adj R-squared =  0.9184
       Total |  1696.79377   200  8.48396883           Root MSE      =  .83224

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0250721    47.44   0.000     1.140054    1.238936
------------------------------------------------------------------------------
. estimates store olswrong
. regress y xsq, noconstant robust
Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   199) = 3850.71
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9188
                                                       Root MSE      =  .83224

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0191687    62.05   0.000     1.151695    1.227295
------------------------------------------------------------------------------
. estimates store olswrongrob
.
. * (2) Stata does not have an NL2SLS command
. * See LIMDEP program MMA06P1NL2SLS.LIM
. * See also code further down
.
. * (3A) Theil's 2SLS, which first regresses x on z
. *      and then uses xhat^2 as instrument for x^2, is inconsistent
.
. regress x z, noconstant

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  286.51
       Model |  198.822258     1  198.822258           Prob > F      =  0.0000
    Residual |  138.093918   199  .693939288           R-squared     =  0.5901
-------------+------------------------------           Adj R-squared =  0.5881
       Total |  336.916176   200  1.68458088           Root MSE      =  .83303

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   .9970513   .0589041    16.93   0.000     .8808949    1.113208
------------------------------------------------------------------------------
. predict xhat
(option xb assumed; fitted values)
. gen xhatsq = xhat*xhat
. regress y xhatsq, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xhatsq |   1.642466   .1719981     9.55   0.000     1.303293    1.981638
------------------------------------------------------------------------------
. estimates store twostage
.
. ********** DISPLAY KEY RESULTS Table 6.4 p.199 **********
.
. * Table 6.4 p.199 first and third columns
. estimates table olswrong twostage, b(%8.3f) se stats(N r2) keep(xsq xhatsq)
------------------------------------
    Variable | olswrong    twostage
-------------+----------------------
         xsq |    1.189
             |    0.025
      xhatsq |                1.642
             |                0.172
-------------+----------------------
           N |  200.000     200.000
          r2 |    0.919       0.314
------------------------------------
                      legend: b/se
.
. ********** FURTHER ANALYSIS **********
.
. * For this particular example there are ways to get linear IV to work
. * as the problem is not very nonlinear
.
. * (2A) regress xsq on z giving xsqhat and then regress y on xsqhat
. *      Gives nl2sls estimator though not correct standard errors
.
. * Note we get estimator 0.969 which is correct - Table 6.4 had typo
. regress xsq z, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  211.41
       Model |  567.562553     1  567.562553           Prob > F      =  0.0000
    Residual |  534.257348   199  2.68471029           R-squared     =  0.5151
-------------+------------------------------           Adj R-squared =  0.5127
       Total |   1101.8199   200  5.50909951           Root MSE      =  1.6385

------------------------------------------------------------------------------
         xsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   1.684581   .1158601    14.54   0.000      1.45611    1.913052
------------------------------------------------------------------------------
. predict xsqhat
(option xb assumed; fitted values)
. regress y xsqhat, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xsqhat |   .9692582   .1015002     9.55   0.000     .7691043    1.169412
------------------------------------------------------------------------------
.
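Because z equals 1 for every observation and no constant is included, each first-stage fitted value is simply a sample mean, and each second-stage coefficient is a ratio of means. The Python sketch below (outside Stata) reproduces both coefficients from the summarize output alone; the variable names are illustrative, and the inputs are the rounded means printed in this log.

```python
mean_y = 1.632794        # sample mean of y (from summarize)
mean_x = 0.9970513       # sample mean of x
mean_xsq = 1.684581      # sample mean of x^2

# Regressing any variable on z = 1 without a constant returns its sample mean as
# the fitted value. Regressing y on a constant c without an intercept gives
# sum(c*y)/sum(c^2) = mean(y)/c.
xhat_sq = mean_x ** 2                # Theil-style: square the fitted x
theil_coef = mean_y / xhat_sq        # y on xhat^2: the inconsistent estimator
iv_coef = mean_y / mean_xsq          # y on fitted x^2: the estimator in (2A)

print(round(theil_coef, 4), round(iv_coef, 4))
```

The gap between the two comes entirely from squaring the mean of x rather than averaging x squared, so it equals the ratio of mean(x^2) to mean(x)^2.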
. * (2B) IV with instrument z for xsq should work, but Stata cannot do it
. *      because here z = 1, which has no variation
. ivreg y (xsq = z), noconstant
note: z dropped due to collinearity
equation not identified; must have at least as many instruments not in
the regression as there are instrumented variables
r(481);
end of do-file
r(481);
. exit, clear

-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma07p1mltests.txt
log type: text
opened on: 17 May 2005, 13:59:20
.
. ********** OVERVIEW OF MMA07P1MLTESTS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 7.4 pp.241-3
. * Likelihood-based hypothesis tests
.
. * Implements the three likelihood-based tests presented in Table 7.1:
. * Wald test
. * LR test
. * LM test direct
. * LM test via auxiliary regression
. * for a Poisson model with simulated data (see below).
.
. * NOTE: To implement this program requires:
. *       the free Stata add-on rndpoix
. * To obtain this, in Stata give command: search rndpoix
. * If you don't want to do this, instead use the data set
.
. ********** SETUP ***********
.
. version 8
. set more off
.
. ********** GENERATE DATA ***********
.
. * Model is
. * y ~ Poisson[exp(b1 + b2*x2 + b3*x3 + b4*x4)]
. * where
. * x2, x3 and x4 are iid ~ N[0,1]
. * and b1=0, b2=0.1, b3=0.1 and b4=0.1
.
. set seed 10001
. set obs 200
obs was 0, now 200
. scalar b1 = 0

. scalar b2 = 0.1
. scalar b3 = 0.1
. scalar b4 = 0.1
.
. * Generate regressors
. gen x2 = invnorm(uniform())
. gen x3 = invnorm(uniform())
. gen x4 = invnorm(uniform())
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3+b4*x4)
. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ....... )
Variable xp created.
. gen y = xp
.
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       200   -.0091098    1.010072  -2.857666   2.149822
          x3 |       200   -.1459839    1.109521  -3.086754   3.111421
          x4 |       200   -.0325314    .9674748  -2.852186   2.379461
     mupoiss |       200    1.000447    .1993649   .6191922   1.903112
          xp |       200        .845     .951579          0          6
-------------+--------------------------------------------------------
           y |       200        .845     .951579          0          6
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 x4 using mma07p1mltests.asc, replace
.
. ********** ANALYSIS: LIKELIHOOD-BASED HYPOTHESIS TESTS ***********
.
. * Hypotheses to test are
. * (A) Single exclusion: b3 = 0
. * (B) Multiple exclusion: b3 = 0, b4 = 0
. * (C) Linear: b3 = b4
. * (D) Nonlinear: b3/b4 = 1
.
. * Tests are Wald, LR, LM and LM (auxiliary)


.
. ****** (A) TEST H0: b3 = 0
.
. * First skip to (B) where many comments given.
.
. ****** (B) TEST H0: b3 = 0, b4 = 0.
.
. * (1) Wald test requires estimation of unrestricted model only
. poisson y x2 x3 x4
Iteration 0:   log likelihood = -238.77153
Iteration 1:   log likelihood = -238.77153

Poisson regression                                Number of obs   =        200
                                                  LR chi2(3)      =       8.30
                                                  Prob > chi2     =     0.0401
Log likelihood = -238.77153                       Pseudo R2       =     0.0171

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0275702   .0767909    -0.36   0.720    -.1780775    .1229371
          x3 |   .1630037   .0670848     2.43   0.015     .0315199    .2944874
          x4 |   .1026568   .0802139     1.28   0.201    -.0545595    .2598732
       _cons |  -.1653238   .0773479    -2.14   0.033     -.316923   -.0137246
------------------------------------------------------------------------------
.
. * (1A) Stata Wald test command
. test (x3=0) (x4=0)
( 1) [y]x3 = 0
( 2) [y]x4 = 0
chi2( 2) = 8.57
Prob > chi2 = 0.0138
.
. * (1B) Wald test done manually
. * Use h'*[R*V*R']-inv*h.
. * Details below will change for each example.
. * In particular, for nonlinear restrictions more work in forming R
. * Note that Stata puts the intercept last, not first.
. * So here the second and third elements of b are set to zero.
. matrix bfull = e(b)                      /* 1xq row vector */
. matrix vfull = e(V)                      /* qxq matrix */
. matrix h = (bfull[1,2]\bfull[1,3])       /* hx1 vector */
. matrix R = (0,1,0,0\0,0,1,0)             /* h x q matrix */
. matrix Wald = h'*syminv(R*vfull*R')*h    /* scalar */
. matrix list h
h[2,1]
c1
r1 .16300365
r2 .10265681
. matrix list R
R[2,4]
c1 c2 c3 c4
r1 0 1 0 0
r2 0 0 1 0
. matrix list Wald
symmetric Wald[1,1]
c1
c1 8.5701855
. scalar WaldB = Wald[1,1]
.
. * (2) Likelihood ratio test requires estimating both models
.
. poisson y x2 x3 x4
Iteration 0:   log likelihood = -238.77153
Iteration 1:   log likelihood = -238.77153

Poisson regression                                Number of obs   =        200
                                                  LR chi2(3)      =       8.30
                                                  Prob > chi2     =     0.0401
Log likelihood = -238.77153                       Pseudo R2       =     0.0171

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0275702   .0767909    -0.36   0.720    -.1780775    .1229371
          x3 |   .1630037   .0670848     2.43   0.015     .0315199    .2944874
          x4 |   .1026568   .0802139     1.28   0.201    -.0545595    .2598732
       _cons |  -.1653238   .0773479    -2.14   0.033     -.316923   -.0137246
------------------------------------------------------------------------------
. estimates store unrestricted             /* Used for Stata lrtest */
. scalar llunrest = e(ll)                  /* Used for manual lrtest */

. poisson y x2
Iteration 0:   log likelihood = -242.92271
Iteration 1:   log likelihood = -242.92271  (backed up)

Poisson regression                                Number of obs   =        200
                                                  LR chi2(1)      =       0.00
                                                  Prob > chi2     =     0.9608
Log likelihood = -242.92271                       Pseudo R2       =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0037493   .0763386    -0.05   0.961    -.1533701    .1458716
       _cons |  -.1684599   .0769294    -2.19   0.029    -.3192388   -.0176811
------------------------------------------------------------------------------
. estimates store restrictedB              /* Used for Stata lrtest */
. scalar llrestB = e(ll)                   /* Used for manual lrtest */

.
. * (2A) Stata likelihood ratio test
. lrtest unrestricted restrictedB
likelihood-ratio test                                  LR chi2(2)  =      8.30
(Assumption: restrictedB nested in unrestricted)       Prob > chi2 =    0.0157

.
. * (2B) Likelihood test done manually
. scalar LRB = -2*(llrestB-llunrest)
. di "LR " LRB
LR 8.3023503
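
The manual LR calculation above can be cross-checked outside Stata. The following is a minimal Python sketch, not part of the original program; the function name `lr_test_df2` is hypothetical, and the two log-likelihood values are copied from the Poisson output in this log. It relies on the fact that the chi-square upper tail with 2 degrees of freedom is exp(-x/2), so no statistics library is needed.

```python
import math

def lr_test_df2(ll_restricted, ll_unrestricted):
    """Return (LR statistic, p-value) for a test of 2 restrictions."""
    lr = -2.0 * (ll_restricted - ll_unrestricted)
    # With 2 degrees of freedom the chi-square upper tail is exp(-x/2).
    p_value = math.exp(-lr / 2.0)
    return lr, p_value

# Log-likelihoods reported in the log for "poisson y x2" and "poisson y x2 x3 x4".
lr_b, p_b = lr_test_df2(-242.92271, -238.77153)
print(round(lr_b, 4), round(p_b, 4))
```

The result should agree with LRB and its p-value as displayed by Stata, up to the rounding of the printed log-likelihoods.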
.
. * (3) LM test via direct computation requires estimating only the restricted model.
.
. * For exclusion restrictions in the Poisson, from 7.6.2
. * LM = dlnL/db * V[b]-inv * dlnL/db where b evaluated at restricted
. * = [Sum_i u_i*x_i]'[Sum_i exp(x_i'b)*x_i*x_i']-inv[Sum_i u_i*x_i]
. * First calculate Sum_i u_i*x_i' : a 1x4 row vector
.
. quietly poisson y x2
. predict yhatrest
(option n assumed; predicted number of events)
. gen u = y - yhatrest                     /* yhatrest = exp(x_brest) calculated earlier */
. gen one = 1
. matrix vecaccum dlnL_db = u one x2 x3 x4, noconstant
. * Then calculate Sum_i exp(x_i'b)*x_i*x_i'
. gen trx1 = sqrt(yhatrest)
. gen trx2 = sqrt(yhatrest)*x2
. gen trx3 = sqrt(yhatrest)*x3
. gen trx4 = sqrt(yhatrest)*x4
. matrix accum Vb = trx1 trx2 trx3 trx4, noconstant
(obs=200)
. matrix LMdirect = dlnL_db*syminv(Vb)*dlnL_db'
. matrix list dlnL_db
dlnL_db[1,4]
          one          x2          x3          x4
u   1.192e-07  -4.632e-08   37.578639   19.933299
. matrix list Vb
symmetric Vb[4,4]
            trx1        trx2        trx3        trx4
trx1         169
trx2  -2.1828434   171.62608
trx3  -24.733563   16.929495   210.68156
trx4   -5.561359     17.0457   23.027167   157.58531
. matrix list LMdirect
symmetric LMdirect[1,1]
           u
u  8.5750886
. scalar LMdirectB = LMdirect[1,1]
.
. * (4) LM test via auxiliary regression
.
. * LM = N times the uncentered Rsq from regress (noconstant) of 1 on the scores
. * Begin by computing the unrestricted scores at the restricted estimates.
. * This varies from problem to problem.
. * In general could compute lnf(y) at current parameters
. * and then get numerical derivative when perturb beta a little.
. * Here use analytical derivative.
. * s_j = dlnf(y)/db_j = (y-exp(x'b))*x_j for the Poisson
.
. drop yhatrest
. quietly poisson y x2
. predict yhatrest
(option n assumed; predicted number of events)
. gen s1 = (y-yhatrest)*1
. gen s2 = (y-yhatrest)*x2
. gen s3 = (y-yhatrest)*x3
. gen s4 = (y-yhatrest)*x4
. regress one s1 s2 s3 s4, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    2.36
       Model |  9.18577727     4  2.29644432           Prob > F      =  0.0549
    Residual |  190.814223   196  .973541953           R-squared     =  0.0459
-------------+------------------------------           Adj R-squared =  0.0265
       Total |         200   200           1           Root MSE      =  .98668

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |  -.0265153   .0748092    -0.35   0.723    -.1740497     .121019
          s2 |  -.0102806   .0809418    -0.13   0.899    -.1699093    .1493481
          s3 |   .1794153   .0697359     2.57   0.011     .0418862    .3169444
          s4 |   .1225885   .0821671     1.49   0.137    -.0394566    .2846336
------------------------------------------------------------------------------
. * LM equals N times uncentered Rsq
. scalar LMauxB = e(N)*e(r2)
. * Check: LM equals explained sum of squares
. scalar LMauxB2 = e(mss)
. di "LMauxB " LMauxB " LMauxB2 " LMauxB2
LMauxB 9.1857773 LMauxB2 9.1857773
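
Why LMauxB and LMauxB2 coincide: the dependent variable is a column of ones, so the uncentered total sum of squares equals N, and N times the uncentered R-squared reduces to the model sum of squares. A small Python sketch of this arithmetic follows; it is an illustration only, the variable names are hypothetical, and the numbers are taken from the regression output in this log.

```python
import math

n_obs = 200              # observations in the auxiliary regression
model_ss = 9.18577727    # model (explained) sum of squares from the log
total_ss = float(n_obs)  # uncentered total SS of a column of ones equals N

r2_uncentered = model_ss / total_ss
lm_aux = n_obs * r2_uncentered     # N * R^2, which collapses to model_ss

# Two restrictions (b3 = 0, b4 = 0): chi-square(2) upper tail is exp(-x/2).
p_value = math.exp(-lm_aux / 2.0)
print(round(lm_aux, 4), round(p_value, 4))
```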
.
. * (5) DISPLAY RESULTS
.
. estimates table unrestricted restrictedB, se stats(N ll r2) b(%8.3f)
--------------------------------------
    Variable | unrest~d     restri~B
-------------+------------------------
          x2 |   -0.028       -0.004
             |    0.077        0.076
          x3 |    0.163
             |    0.067
          x4 |    0.103
             |    0.080
       _cons |   -0.165       -0.168
             |    0.077        0.077
-------------+------------------------
           N |  200.000      200.000
          ll | -238.772     -242.923
          r2 |
--------------------------------------
                          legend: b/se
. * Wald test using stata default Poisson variance matrix
. di "WaldB " WaldB " p-value " chi2tail(2,WaldB)
WaldB 8.5701855 p-value .01377234
. * LR test using Poisson log-likelihoods
. di " LRB " LRB " p-value " chi2tail(2,LRB)
LRB 8.3023503 p-value .0157459
. * LM test direct
. di " LMdirectB " LMdirectB " p-value " chi2tail(2,LMdirectB)
LMdirectB 8.5750886 p-value .01373862
. * LM test by auxiliary regression
. di " LMauxB " LMauxB " p-value " chi2tail(2,LMauxB)
LMauxB 9.1857773 p-value .01012357
.
. ****** (A) TEST H0: b3 = 0
.
. * (1) Wald test
. quietly poisson y x2 x3 x4
. test (x3=0)
( 1) [y]x3 = 0
chi2( 1) = 5.90
Prob > chi2 = 0.0151
. scalar WaldA = r(chi2)
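
For a single exclusion restriction the Wald statistic is the squared z-ratio of the coefficient. The Python sketch below is an illustrative cross-check under that assumption; `wald_single` is a hypothetical helper, and the coefficient and standard error for x3 are copied from the unrestricted Poisson output in this log.

```python
import math

def wald_single(coef, se):
    """Wald statistic and p-value for H0: coef = 0 (one restriction)."""
    z = coef / se
    wald = z * z
    # Chi-square(1) upper tail: Pr[Z^2 > w] = erfc(sqrt(w / 2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return wald, p_value

# x3 coefficient and standard error from "poisson y x2 x3 x4".
wald_a, p_a = wald_single(0.1630037, 0.0670848)
print(round(wald_a, 3), round(p_a, 4))
```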
.
. * (2) LR test
. poisson y x2 x4
Iteration 0:   log likelihood = -241.64842
Iteration 1:   log likelihood = -241.64842

Poisson regression                                Number of obs   =        200
                                                  LR chi2(2)      =       2.55
                                                  Prob > chi2     =     0.2793
Log likelihood = -241.64842                       Pseudo R2       =     0.0053

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0163179   .0770381    -0.21   0.832    -.1673098     .134674
          x4 |   .1278017   .0800348     1.60   0.110    -.0290637     .284667
       _cons |  -.1719505   .0772389    -2.23   0.026    -.3233359   -.0205651
------------------------------------------------------------------------------
. estimates store restrictedA
. lrtest unrestricted                      /* Uses estimates store unrestricted from earlier */
likelihood-ratio test                                  LR chi2(1)  =      5.75
(Assumption: restrictedA nested in unrestricted)       Prob > chi2 =    0.0165
. scalar LRA = r(chi2)


.
. * (3) LM test via direct computation requires estimating only the restricted model.
. * See (B) for more explanation
. drop one yhatrest u trx1 trx2 trx3 trx4
. matrix drop dlnL_db Vb LMdirect
. quietly poisson y x2 x4
. predict yhatrest
(option n assumed; predicted number of events)
. gen u = y - yhatrest                     /* yhatrest = exp(x_brest) calculated earlier */

. gen one = 1
. matrix vecaccum dlnL_db = u one x2 x3 x4, noconstant
. gen trx1 = sqrt(yhatrest)
. gen trx2 = sqrt(yhatrest)*x2
. gen trx3 = sqrt(yhatrest)*x3
. gen trx4 = sqrt(yhatrest)*x4
. matrix accum Vb = trx1 trx2 trx3 trx4, noconstant
(obs=200)
. matrix LMdirect = dlnL_db*syminv(Vb)*dlnL_db'
. matrix list dlnL_db
dlnL_db[1,4]
          one          x2          x3          x4
u  -1.788e-07  -1.717e-07   34.832631  -3.179e-07
. matrix list Vb
symmetric Vb[4,4]
            trx1        trx2        trx3        trx4
trx1         169
trx2  -2.1828435   170.25918
trx3  -21.987555   15.647287    212.5673
trx4   14.371941    16.35821   22.067372   158.94405
. matrix list LMdirect
symmetric LMdirect[1,1]
           u
u  5.9159017
. scalar LMdirectA = LMdirect[1,1]
.
. * (4) LM test via auxiliary regression
. * See (B) for more explanation
. drop yhatrest s1 s2 s3 s4 one
. quietly poisson y x2 x4
. predict yhatrest
(option n assumed; predicted number of events)
. gen s1 = (y-yhatrest)*1
. gen s2 = (y-yhatrest)*x2
. gen s3 = (y-yhatrest)*x3
. gen s4 = (y-yhatrest)*x4
. gen one = 1
. regress one s1 s2 s3 s4, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    1.57
       Model |  6.21794802     4    1.554487           Prob > F      =  0.1832
    Residual |  193.782052   196  .988683939           R-squared     =  0.0311
-------------+------------------------------           Adj R-squared =  0.0113
       Total |         200   200           1           Root MSE      =  .99433

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |   -.021781   .0760166    -0.29   0.775    -.1716964    .1281344
          s2 |   .0237921    .082791     0.29   0.774    -.1394834    .1870675
          s3 |   .1785093   .0711813     2.51   0.013     .0381297    .3188889
          s4 |  -.0065009    .084884    -0.08   0.939    -.1739042    .1609024
------------------------------------------------------------------------------
. * LM equals N times uncentered Rsq
. scalar LMauxA = e(N)*e(r2)
. di "LMauxA " LMauxA
LMauxA 6.217948
.
. * (5) DISPLAY RESULTS in Table 7.1 page 242
.
. estimates table unrestricted restrictedA, se stats(N ll r2) b(%8.3f)
--------------------------------------
    Variable | unrest~d     restri~A
-------------+------------------------
          x2 |   -0.028       -0.016
             |    0.077        0.077
          x3 |    0.163
             |    0.067
          x4 |    0.103        0.128
             |    0.080        0.080
       _cons |   -0.165       -0.172
             |    0.077        0.077
-------------+------------------------
           N |  200.000      200.000
          ll | -238.772     -241.648
          r2 |
--------------------------------------
                          legend: b/se
. di "WaldA " WaldA " p-value " chi2tail(1,WaldA)
WaldA 5.9040087 p-value .01510647
. di " LRA " LRA " p-value " chi2tail(1,LRA)
LRA 5.7537678 p-value .01645333
. di " LMdirectA " LMdirectA " p-value " chi2tail(1,LMdirectA)
LMdirectA 5.9159017 p-value .01500482
. di " LMauxA " LMauxA " p-value " chi2tail(1,LMauxA)
LMauxA 6.217948 p-value .01264616
.
. ****** (C) TEST H0: b3 = b4
.
. * (1A) Wald test
. poisson y x2 x3 x4
Iteration 0:   log likelihood = -238.77153
Iteration 1:   log likelihood = -238.77153

Poisson regression                                Number of obs   =        200
                                                  LR chi2(3)      =       8.30
                                                  Prob > chi2     =     0.0401
Log likelihood = -238.77153                       Pseudo R2       =     0.0171

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0275702   .0767909    -0.36   0.720    -.1780775    .1229371
          x3 |   .1630037   .0670848     2.43   0.015     .0315199    .2944874
          x4 |   .1026568   .0802139     1.28   0.201    -.0545595    .2598732
       _cons |  -.1653238   .0773479    -2.14   0.033     -.316923   -.0137246
------------------------------------------------------------------------------
. test (x3=x4)
( 1) [y]x3 - [y]x4 = 0
chi2( 1) = 0.29
Prob > chi2 = 0.5883
.
. * (1B) Wald test done manually
. * Note that Stata puts the intercept last, not first.
. * So here the second and third elements of b are tested as equal.
. matrix drop h R Wald
. matrix bfull = e(b)                      /* 1xq row vector */
. matrix vfull = e(V)                      /* qxq matrix */
. matrix h = (bfull[1,2]-bfull[1,3])       /* hx1 vector */
. matrix R = (0,1,-1,0)                    /* h x q matrix */
. matrix Wald = h'*syminv(R*vfull*R')*h    /* scalar */
. matrix list h
symmetric h[1,1]
c1
r1 .06034684
. matrix list R
R[1,4]
c1 c2 c3 c4
r1 0 1 -1 0
. matrix list Wald
symmetric Wald[1,1]
c1
c1 .29301766
. scalar WaldC = Wald[1,1]
. di " WaldC " WaldC " p-value " chi2tail(1,WaldC)
WaldC .29301766 p-value .5882932
.
. * (2) LR Test
. * In general getting the restricted MLE requires constrained ML
. * Here it is simple: if b3=b4 then the mean is exp(b1+b2*x2+b3*(x3+x4))
. gen x3plusx4 = x3+x4
. poisson y x2 x3plusx4
Iteration 0:   log likelihood = -238.91785
Iteration 1:   log likelihood = -238.91785

Poisson regression                                Number of obs   =        200
                                                  LR chi2(2)      =       8.01
                                                  Prob > chi2     =     0.0182
Log likelihood = -238.91785                       Pseudo R2       =     0.0165

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0287235   .0768651    -0.37   0.709    -.1793763    .1219293
    x3plusx4 |   .1374814   .0479519     2.87   0.004     .0434974    .2314653
       _cons |  -.1672262   .0773265    -2.16   0.031    -.3187832   -.0156691
------------------------------------------------------------------------------
. estimates store restrictedC
. lrtest unrestricted                      /* Uses estimates store unrestricted from earlier */
likelihood-ratio test                                  LR chi2(1)  =      0.29
(Assumption: restrictedC nested in unrestricted)       Prob > chi2 =    0.5885
. scalar LRC = r(chi2)


.
. * (3) LM test direct
. * Can use same code as earlier. Just different restricted estimates.
. * Now from poisson y x2 x3plusx4
. drop one yhatrest u trx1 trx2 trx3 trx4
. matrix drop dlnL_db Vb
. quietly poisson y x2 x3plusx4
. predict yhatrest
(option n assumed; predicted number of events)
. gen u = y - yhatrest                     /* yhatrest = exp(x_brest) calculated earlier */

. gen one = 1
. matrix vecaccum dlnL_db = u one x2 x3 x4, noconstant
. gen trx1 = sqrt(yhatrest)
. gen trx2 = sqrt(yhatrest)*x2
. gen trx3 = sqrt(yhatrest)*x3
. gen trx4 = sqrt(yhatrest)*x4
. matrix accum Vb = trx1 trx2 trx3 trx4, noconstant
(obs=200)
. matrix LMdirect = dlnL_db*syminv(Vb)*dlnL_db'
. matrix list dlnL_db
dlnL_db[1,4]
          one          x2          x3          x4
u   8.345e-07  -3.601e-07   4.8459933  -4.8459932
. matrix list Vb
symmetric Vb[4,4]
            trx1        trx2        trx3        trx4
trx1         169
trx2  -2.1828442   171.13986
trx3   7.9990827   13.105974   225.99023
trx4   19.217934    15.11254   28.153892   161.75506
. matrix list LMdirect
symmetric LMdirect[1,1]
           u
u  .29306257
. scalar LMdirectC = LMdirect[1,1]
.
. * (4) LM test via auxiliary regression
. drop yhatrest s1 s2 s3 s4 one
. quietly poisson y x2 x3plusx4
. predict yhatrest
(option n assumed; predicted number of events)
. gen s1 = (y-yhatrest)*1
. gen s2 = (y-yhatrest)*x2
. gen s3 = (y-yhatrest)*x3
. gen s4 = (y-yhatrest)*x4
. gen one = 1
. regress one s1 s2 s3 s4, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    0.08
       Model |   .31510777     4  .078776943           Prob > F      =  0.9891
    Residual |  199.684892   196  1.01880047           R-squared     =  0.0016
-------------+------------------------------           Adj R-squared = -0.0188
       Total |         200   200           1           Root MSE      =  1.0094

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |   -.000531    .077731    -0.01   0.995    -.1538275    .1527654
          s2 |    .012802   .0857027     0.15   0.881    -.1562159    .1818199
          s3 |   .0283145   .0761713     0.37   0.711     -.121906    .1785351
          s4 |  -.0367099   .0869889    -0.42   0.673    -.2082642    .1348445
------------------------------------------------------------------------------
. * LM equals N times uncentered Rsq
. scalar LMauxC = e(N)*e(r2)
. di "LMauxC " LMauxC
LMauxC .31510777

.
. * (5) DISPLAY RESULTS in Table 7.1 page 242
.
. estimates table unrestricted restrictedC, se stats(N ll r2) b(%8.3f)
--------------------------------------
    Variable | unrest~d     restri~C
-------------+------------------------
          x2 |   -0.028       -0.029
             |    0.077        0.077
          x3 |    0.163
             |    0.067
          x4 |    0.103
             |    0.080
    x3plusx4 |                 0.137
             |                 0.048
       _cons |   -0.165       -0.167
             |    0.077        0.077
-------------+------------------------
           N |  200.000      200.000
          ll | -238.772     -238.918
          r2 |
--------------------------------------
                          legend: b/se
. di "WaldC " WaldC " p-value " chi2tail(1,WaldC)
WaldC .29301766 p-value .5882932
. di " LRC " LRC " p-value " chi2tail(1,LRC)
LRC .29264001 p-value .5885337
. di " LMdirectC " LMdirectC " p-value " chi2tail(1,LMdirectC)
LMdirectC .29306257 p-value .58826462
. di " LMauxC " LMauxC " p-value " chi2tail(1,LMauxC)
LMauxC .31510777 p-value .57456264
.
. ****** (D) TEST H0: b3/b4 - 1 = 0
.
. * (1) Wald test of b3 /b4 - 1 = 0
. * Stata does not do nonlinear hypotheses.
. * Instead do 7.2.5 algebra.
. matrix drop h R Wald
. matrix h = (bfull[1,2]/bfull[1,3] - 1)
. matrix R = (0, 1/bfull[1,3], -bfull[1,2]/(bfull[1,3]^2), 0)
. matrix Wald = h'*syminv(R*vfull*R')*h

. matrix list h
symmetric h[1,1]
           c1
r1  .58785028
. matrix list R
R[1,4]
            c1          c2          c3          c4
r1           0   9.7411946  -15.467559           0
. matrix list Wald
symmetric Wald[1,1]
           c1
c1  .15768686
. scalar WaldD = Wald[1,1]
. di " WaldD " WaldD " p-value " chi2tail(1,WaldD)
WaldD .15768686 p-value .69129516
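
The delta-method ingredients used in (D) can be reproduced by hand. The Python sketch below is an illustration, not the authors' code: `ratio_restriction` and `wald_from_gradient` are hypothetical helper names, and b3 and b4 are the x3 and x4 coefficients printed in the unrestricted Poisson output. The covariance between the two estimates is not printed in this log, so the full Wald statistic is not recomputed here; the second function only shows the formula h^2 / (R V R') that would be applied once the 2x2 block of e(V) is supplied.

```python
def ratio_restriction(b3, b4):
    """Return h = b3/b4 - 1 and its gradient with respect to (b3, b4)."""
    h = b3 / b4 - 1.0
    grad_b3 = 1.0 / b4
    grad_b4 = -b3 / (b4 ** 2)
    return h, grad_b3, grad_b4

def wald_from_gradient(h, grad, cov):
    """Wald statistic for one restriction: h^2 / (grad' * cov * grad).

    grad is a length-2 sequence and cov a 2x2 covariance matrix of (b3, b4).
    """
    var_h = sum(grad[i] * cov[i][j] * grad[j]
                for i in range(2) for j in range(2))
    return h * h / var_h

# Coefficients on x3 and x4 from "poisson y x2 x3 x4" in this log.
h, g3, g4 = ratio_restriction(0.1630037, 0.1026568)
print(round(h, 5), round(g3, 4), round(g4, 3))
```

The three printed values should match the entries of h and the nonzero columns of R listed above, up to rounding of the displayed coefficients.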
.
. * (2) LR Test
. * This requires MLE subject to nonlinear constraints.
. * This is difficult so not done here.
. * But note that here will get same result as if
. * get MLE subject to b3 = b4 which was done in (C).
.
. * (3) LM test direct
. * Like (2) requires restricted MLE.
. * This is difficult so not done here.
. * But note that here will get same result as if
. * get MLE subject to b3 = b4 which was done in (C).
.
. * (4) LM test via auxiliary regression
. * Same as for (3)
.
. * (5) DISPLAY RESULTS
. di "WaldD " WaldD " p-value " chi2tail(1,WaldD)
WaldD .15768686 p-value .69129516
.
.
. *********** DISPLAY RESULTS GIVEN IN TABLE 7.1 on page 242 ***********
.
. estimates table unrestricted restrictedA restrictedB restrictedC, se stats(N ll r2) b(%8.3f)
------------------------------------------------------------
    Variable | unrest~d    restri~A    restri~B    restri~C
-------------+----------------------------------------------
          x2 |   -0.028      -0.016      -0.004      -0.029
             |    0.077       0.077       0.076       0.077
          x3 |    0.163
             |    0.067
          x4 |    0.103       0.128
             |    0.080       0.080
    x3plusx4 |                                        0.137
             |                                        0.048
       _cons |   -0.165      -0.172      -0.168      -0.167
             |    0.077       0.077       0.077       0.077
-------------+----------------------------------------------
           N |  200.000     200.000     200.000     200.000
          ll | -238.772    -241.648    -242.923    -238.918
          r2 |
------------------------------------------------------------
                                                legend: b/se
. di "WaldA " WaldA " p-value " chi2tail(1,WaldA)
WaldA 5.9040087 p-value .01510647
.
. * Wald test statistics
. di "Wald A to D: (A) " %8.3f WaldA " (B) " %8.3f WaldB " (C) " %8.3f WaldC " (D) " %8.3f WaldD
Wald A to D: (A) 5.904 (B) 8.570 (C) 0.293 (D) 0.158
. di " p-values : (A) " %8.3f chi2tail(1,WaldA) " (B) " %8.3f chi2tail(2,WaldB) " (C) " %8.3f chi2t
> ail(1,WaldC) " (D) " %8.3f chi2tail(1,WaldD)
p-values : (A) 0.015 (B) 0.014 (C) 0.588 (D) 0.691
.
. * LR test statistics
. di "LR A to D: (A) " %8.3f LRA " (B) " %8.3f LRB " (C) " %8.3f LRC " (D) " %8.3f LRC
LR A to D: (A) 5.754 (B) 8.302 (C) 0.293 (D) 0.293
. di " p-values : (A) " %8.3f chi2tail(1,LRA) " (B) " %8.3f chi2tail(2,LRB) " (C) " %8.3f chi2tail(
> 1,LRC) " (D) " %8.3f chi2tail(1,LRC)
p-values : (A) 0.016 (B) 0.016 (C) 0.589 (D) 0.589
.
. * Direct LM test statistics
. di "LM A to D: (A) " %8.3f LMdirectA " (B) " %8.3f LMdirectB " (C) " %8.3f LMdirectC " (D) " %8.
> 3f LMdirectC
LM A to D: (A) 5.916 (B) 8.575 (C) 0.293 (D) 0.293
. di " p-values: (A) " %8.3f chi2tail(1,LMdirectA) " (B) " %8.3f chi2tail(2,LMdirectB) " (C) " %8.
> 3f chi2tail(1,LMdirectC) " (D) " %8.3f chi2tail(1,LMdirectC)
p-values: (A) 0.015 (B) 0.014 (C) 0.588 (D) 0.588

.
. * Auxiliary Regression LM test statistics
. di "LM* A to D: (A) " %8.3f LMauxA " (B) " %8.3f LMauxB " (C) " %8.3f LMauxC " (D) " %8.3f LMauxC
LM* A to D: (A) 6.218 (B) 9.186 (C) 0.315 (D) 0.315
. di " p-values : (A) " %8.3f chi2tail(1,LMauxA) " (B) " %8.3f chi2tail(2,LMauxB) " (C) " %8.3f chi
> 2tail(1,LMauxC) " (D) " %8.3f chi2tail(1,LMauxC)
p-values : (A) 0.013 (B) 0.010 (C) 0.575 (D) 0.575
.
. ********** CLOSE OUTPUT ***********
. log close
log: c:\Imbook\bwebpage\Section2\mma07p1mltests.txt
log type: text
closed on: 17 May 2005, 13:59:21
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma07p2power.txt
log type: text
opened on: 17 May 2005, 14:00:49
.
. ********** OVERVIEW OF MMA07P2POWER.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 7.6.3 pages 248-9
. * Asymptotic Power of Wald test
.
. * (1) Chapter 7.6.3 obtains power for noncentral chisquare
. * (2) Figure 7.1 (ch7power.wmf) plots power against the noncentrality parameter lamda
. * No data needed
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** ANALYSIS **********
.
. * Obtain power of chi-square tests
. * with df degrees of freedom
. * and noncentrality parameter (ncp) lamda from 0 to 20
. * for size alpha = 0.01, 0.05 and 0.10
.
. set obs 201
obs was 0, now 201
. scalar df = 1                            /* Degrees of freedom */

. gen lamda = 0.1*(_n-1) /* Lamda = 0, 0.1, 0.2, ..., 19.9, 20.0 */


.
. * Obtain power
. *   = Pr[W > chi-square critical value at size alpha | W ~ noncentral chi-square(df; lamda)]
. * for alpha = 0.01, 0.05 and 0.10
.
. * Critical value at size alpha uses central chisquare
. * invchi2tail gives cv such that Pr(Chi2 > cv) = alpha
. * Power is 1 minus cdf of noncentral chisquare
. * nchi2 gives the cdf of noncentral chisquare
.
. scalar alpha = 0.01
. scalar criticalvalue = invchi2tail(df,alpha)
. gen power01 = 1-nchi2(df,lamda,criticalvalue)
.
. scalar alpha = 0.05
. scalar criticalvalue = invchi2tail(df,alpha)
. gen power05 = 1-nchi2(df,lamda,criticalvalue)
.
. scalar alpha = 0.10
. scalar criticalvalue = invchi2tail(df,alpha)
. gen power10 = 1-nchi2(df,lamda,criticalvalue)
.
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       lamda |       201          10    5.816786          0         20
     power01 |       201    .6230651    .3095508        .01   .9710402
     power05 |       201    .7583101    .2717153        .05   .9940005
     power10 |       201    .8152767    .2396043         .1   .9976528

. * For lamda = 0 have size = power, here 0.01, 0.05 and 0.10
. list if lamda==0 | lamda==5 | lamda==10 | lamda==20
      +----------------------------------------+
      | lamda    power01    power05    power10 |
      |----------------------------------------|
   1. |     0        .01        .05         .1 |
  51. |     5   .3670189   .6087795   .7228636 |
 101. |    10   .7212129   .8853791   .9354209 |
 201. |    20   .9710402   .9940005   .9976528 |
      +----------------------------------------+
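
The power values listed above can be checked without a noncentral chi-square routine when df = 1. In that case W = (Z + sqrt(lamda))^2 with Z ~ N[0,1], so power is a sum of two normal tail probabilities. The Python sketch below is an illustration under that df = 1 assumption; `wald_power_df1` is a hypothetical name, and it uses only the standard library (`statistics.NormalDist`, Python 3.8 or later).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def wald_power_df1(ncp, alpha):
    """Asymptotic power of a chi-square(1) test with noncentrality ncp."""
    # Square root of the central chi-square(1) critical value at size alpha.
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    # Reject when |Z + sqrt(ncp)| > z_crit, where Z ~ N[0,1].
    upper = 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    lower = _STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower

for ncp in (0, 5, 10, 20):
    print(ncp, [round(wald_power_df1(ncp, a), 4) for a in (0.01, 0.05, 0.10)])
```

At ncp = 0 the function returns the test size, which mirrors the comment in the log that size equals power when lamda = 0.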
.
. ********** FIGURE 7.1 (p.249): PLOT THE POWER FUNCTION **********
.
. graph twoway (line power10 lamda, clstyle(p1)) /*
> */ (line power05 lamda, clstyle(p2)) /*
> */ (line power01 lamda, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Test Power as a function of the ncp") /*
> */ xtitle("Noncentrality parameter lamda", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Test Power", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Test size = 0.10") label(2 "Test size = 0.05") /*
> */ label(3 "Test size = 0.01"))
. graph export ch7power.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch7power.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma07p2power.txt
log type: text
closed on: 17 May 2005, 14:00:52
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma07p3montecarlo.txt
log type: text
opened on: 18 May 2005, 11:28:58
.
. ********** OVERVIEW OF MMA07P3MONTECARLO.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 7.7.1-7.7.5 pp. 250-4
. * Size and power of the Wald test
.
. * (1) Figure 7.2 Density of Wald test statistic
. * (2) Table 7.2 Actual size of Wald test at various nominal sizes
. * (3) Table 7.2 Actual power of Wald test at various nominal sizes
. * (4) Table 7.2 Nominal power of Wald test at various nominal sizes
. * (5) Alternative way to simulate using postfile rather than simulate
.
. * on the slope coefficient for a Probit model with simulated data (see below).
.
. * NOTE: Because this is a simulation using many samples (here 10,000)
. * the generated data are not saved in a text file.
.
. * Problem can arise if in one of the simulations all of sample is y=0 or y=1
. * Then the probit model is not estimable.
. * Then need increase sample size, change dgp or reduce number of simulations.
. * Here used N=40 with S=10000 for size and for power
. * Another possible change is to have same regressors x across simulations
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** MONTE CARLO OVERVIEW **********
.
. * The data generating process is
. * - Probit with Pr[y=1] = Phi(b1 + b2*x2)
. * - where b1 = 0 and b2 = 1
. * - and regressor x ~ N[0,1] is redrawn in each simulation
.
. * The sample size N set below in the global numobs
. * The number of simulations S is set below in the global numsims
. * A third option is to switch to same x in each sample. This needs to be done manually.
.
. * The simulation is done using stata command simulate
. * At the end of the program, an alternative using postfile is given
.
. * The program investigates both size and power
. * of the Wald test that b2 = 1.
. * For power the dgp instead uses b2 = 2.
.
. ********** INITIAL SIMULATION SET UP **********
.
. set seed 10101
. * Change the following for different sample size N
. global numobs "40"


. * Change the following for different number of simulations S
. global numsims "10000"
.
. ****** ANALYSIS: SIMULATION OF PROBIT MODEL SLOPE ESTIMATES AND WALD TEST
.
. * The program is rclass.
. * This means the results returned by the program are put into r( )
. * Here we return meany, vary, betahat, sebetahat, ztestforbetaeq1
.
. * The probit model is Pr[y=1] = Phi(b1 + b2*x2) where b1=0 and b2=1
. * For size calculations: b2 = 1
. * For power calculations: b2 = 1.5 (as an example)
. * So pass the argument trueb2 as an argument.
.
. * The following three lines are only needed
. * if the regressors are constant across simulations,
. * as then need to generate once and put in a data file to be reused.
. * They are commented out here as here (x,y) both resampled.
. * Also simprobit and simprobit2 need one line changed if x is fixed.
. /*
> set obs numobs
> gen x = invnorm(uniform())
> save xforsim, replace
> */
. * This version of the program instead redraws both x and y in each simulation
.
. * The program has one argument
. * - trueb2 = value of b2 in the dgp
.
. program simprobit, rclass
1. version 8.0
2. /* define arguments. Here trueb2 = b2 in Phi(b1 + b2*x2) */
. args trueb2
3. /* Generate the data: here x and y */
. drop _all
4. set obs $numobs
5. gen x = invnorm(uniform())
6. /* If instead want same x in each simulation,
> replace above line with: use xforsim */
. gen y = 0
7. replace y = 1 if 0 + `trueb2'*x + invnorm(uniform()) > 0
8. /* Summarize the generated data as a check */
. summarize y
9. return scalar ymean=r(mean)
10. return scalar yvar=r(Var)
11. /* Do probit and store key results */
. probit y x
12. return scalar b2hat=_b[x]
13. return scalar seb2hat = _se[x]
14. return scalar ztestforb2eq1 = (_b[x]-1)/_se[x]
15. end
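
The data-generating step inside simprobit uses the latent-variable form of the probit: y = 1 when 0 + trueb2*x + e > 0 with e ~ N[0,1]. The Python sketch below mirrors only that step, as an illustration; `draw_probit_sample` is a hypothetical name, the probit estimation itself is not reproduced, and Python's random number stream differs from Stata's, so the draws will not match the log even with the same seed value.

```python
import random

def draw_probit_sample(n_obs, true_b2, rng):
    """Draw (x, y) with y = 1 if 0 + true_b2*x + e > 0, x and e ~ N[0,1]."""
    xs, ys = [], []
    for _ in range(n_obs):
        x = rng.gauss(0.0, 1.0)   # regressor, redrawn for every sample
        e = rng.gauss(0.0, 1.0)   # latent standard normal error
        xs.append(x)
        ys.append(1 if true_b2 * x + e > 0 else 0)
    return xs, ys

rng = random.Random(10101)        # seed value borrowed from the log
xs, ys = draw_probit_sample(40, 1.0, rng)
y_mean = sum(ys) / len(ys)
print(len(xs), y_mean)
```

With intercept 0 the index is symmetric around zero, so the share of ones should be close to 0.5 in large samples, which is consistent with the mean of ymean reported by summarize in this log.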
.
. ****** (1) DISTRIBUTION OF WALD TEST STATISTIC (Figure 7.2 p.253)
.
. * Now call the program simprobit where
. * - include values for each argument within the quotes " "
. * (here the argument is trueb2 and is set to 1 for size and 1.5 for power)
. * - make sure that ask for each of the returned results
.
. * For size calculations set trueb2 = 1
. simulate "simprobit 1" ymean=r(ymean) yvar=r(yvar) b2hat=r(b2hat) /*
> */ seb2hat=r(seb2hat) ztestforb2eq1=r(ztestforb2eq1), reps($numsims)
command:      simprobit 1
statistics:   ymean      = r(ymean)
              yvar       = r(yvar)
              b2hat      = r(b2hat)
              seb2hat    = r(seb2hat)
              ztestfor~1 = r(ztestforb2eq1)
.
. * Summary of the results returned by simulate
. * For Wald test key output is ztestforb2eq1
. describe
Contains data
  obs:        10,000                          simulate: simprobit 1
 vars:             5                          18 May 2005 11:29
 size:       240,000 (97.7% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ymean           float  %9.0g                  r(ymean)
yvar            float  %9.0g                  r(yvar)
b2hat           float  %9.0g                  r(b2hat)
seb2hat         float  %9.0g                  r(seb2hat)
ztestforb2eq1   float  %9.0g                  r(ztestforb2eq1)
-------------------------------------------------------------------------------
Sorted by:
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ymean |     10000      .49946    .0794447       .225       .775
        yvar |     10000    .2499373    .0089917   .1788462   .2564103
       b2hat |     10000    1.133952    .4516738  -.0306482   9.389184
     seb2hat |     10000    .3589645    .1561059   .1902922   4.583915
ztestforb2~1 |     10000    .1141294    .9558451  -4.087344   2.278257
.
. * For b2hat there are two ways to estimate the standard deviation.
. * One is the average of seb2hat, the standard error of b2hat
. * The other is the standard deviation of b2hat.
. * These are equal asymptotically, but perhaps not in small samples due to bias.
. * Also aveseb2hat is used later in calculating asymptotic power.
. sum seb2hat
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     seb2hat |     10000    .3589645    .1561059   .1902922   4.583915
. scalar aveseb2hat = r(mean)
. sum b2hat
    Variable |     Obs        Mean  Std. Dev.        Min        Max
-------------+-----------------------------------------------------
       b2hat |   10000    1.133952   .4516738  -.0306482   9.389184
. scalar stdevb2hat = r(sd)
. di "Average standard error of b2hat: " aveseb2hat
Average standard error of b2hat: .3589645
. di "Standard deviation of b2hat:     " stdevb2hat
Standard deviation of b2hat:     .45167383
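The two dispersion measures just displayed can be compared directly. The following is a minimal sketch, not part of the original program, that re-enters the values printed in this log by hand. The scalar names `avese`, `sdb2`, `meanb2`, `nsims`, `seratio`, `biasb2` and `mcse` are illustrative assumptions.

```stata
* Sketch only: values copied by hand from the summarize output above
scalar nsims  = 10000          /* number of simulation samples */
scalar meanb2 = 1.133952       /* mean of b2hat across simulations */
scalar avese  = .3589645       /* average of seb2hat */
scalar sdb2   = .45167383      /* standard deviation of b2hat */

* A ratio below 1 means the reported se understates the simulated variability
scalar seratio = avese/sdb2
display "Average se / st.dev. of b2hat:       " seratio

* Estimated bias of b2hat (true b2 = 1) and its simulation standard error
scalar biasb2 = meanb2 - 1
scalar mcse   = sdb2/sqrt(nsims)
display "Estimated bias of b2hat:             " biasb2
display "Simulation se of the mean of b2hat:  " mcse
```

Comparing the estimated bias with its simulation standard error indicates whether the bias could be simulation noise.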
.
. * The Wald test statistic will be called Wald
. gen Wald = ztestforb2eq1
. label var Wald "Wald test statistic"
.
. * The mean and st.dev. should be 0 and 1 if Wald ~ N[0,1]
. sum Wald
    Variable |     Obs        Mean  Std. Dev.        Min        Max
-------------+-----------------------------------------------------
        Wald |   10000    .1141294   .9558451  -4.087344   2.278257
.
. * The 2.5 and 97.5 percentiles should be -1.96 and 1.96 if Wald ~ N[0,1]
. * They can be used to get size-adjusted Wald test at 5 percent.
. _pctile Wald, p(2.5,99.5)

. display "Wald: Lower 2.5 percentile = " r(r1) " Upper 2.5 percentile = " r(r2)
Wald: Lower 2.5 percentile = -1.904708 Upper 2.5 percentile = 2.0034728
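A size-adjusted test replaces the N[0,1] critical values with percentiles of the simulated statistic. The following is a minimal sketch under the assumption that the simulated statistics are in memory in a variable named Wald. The names `lowcv`, `highcv` and `rejectadj` are illustrative and do not appear in the original program.

```stata
* Sketch only: assumes the simulated statistics are in memory as variable Wald
_pctile Wald, p(2.5,97.5)
scalar lowcv  = r(r1)          /* lower 2.5 percentile */
scalar highcv = r(r2)          /* upper 2.5 percentile */

* Size-adjusted rejection indicator; missing values of Wald stay missing
gen byte rejectadj = (Wald < lowcv | Wald > highcv) if Wald < .
summarize rejectadj
display "Rejection rate using simulated critical values: " r(mean)
```

Applied to the sample that generated the percentiles, the rejection rate is close to 0.05 by construction. The critical values become informative when carried over to statistics simulated under an alternative.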
.
. * The density of the simulated values of the Wald test should be
. * a standard normal density if Wald ~ N[0,1]
. * The following plots kernel estimate of density of Wald and a N[0,1] density
. * Could also do Student[N-k] but this looks same as N[0,1] if N>=30.
. gen N01density = normden(Wald)
. sum Wald
    Variable |     Obs        Mean  Std. Dev.        Min        Max
-------------+-----------------------------------------------------
        Wald |   10000    .1141294   .9558451  -4.087344   2.278257
.
. graph twoway (kdensity Wald, range(-3 3) clstyle(p1)) /*
> */ (connect N01density Wald if Wald>-3 & Wald<3, clstyle(p2) sort(Wald) s(i)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Monte Carlo Simulations of Wald Test") /*
> */ xtitle("Wald Test Statistic", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Monte Carlo") label(2 "Standard Normal") /*
> */ label(3 "Test size = 0.01"))
. graph export ch7montecarlo.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch7montecarlo.wmf written in Windows Metafile format)
.
. ****** (2) ACTUAL SIZE OF THE WALD TEST STATISTIC (Table 7.2, p.253)
.
. * Obtain the size properties of a two-sided Wald test
. * That rejects if |Wald| > z_alpha/2 where alpha = .01, .05, .1, .2
.
. * Convert to two-sided test by taking absolute value
. gen absWald = abs(Wald)
.
. * Give key percentiles of |Wald|
. * Percentiles must be in ascending order for Stata
. _pctile absWald, p(0.80,0.90,0.95,0.99)
. display "I[Upper percentiles of |Wald|: " " 1 " r(r4) " 5 " r(r3) " 10 " r(r2) " 20 " r(r1)
I[Upper percentiles of |Wald|: 1 .0115847 5 .01074749 10 .00998338 20 .00923005
.
. * Program to calculate actual size given nominal size
. * Temporary variables and scalars are in quotes ` '
. program size, rclass
  1.   version 8.0
  2.   args nominalsize
  3.   tempvar reject
  4.   tempname normalcriticalvalue
  5.   quietly {
  6.     scalar `normalcriticalvalue' = invnorm(1-(`nominalsize'/2))
  7.     gen `reject' = 0
  8.     replace `reject' = 1 if absWald > `normalcriticalvalue'
  9.     summarize `reject'
 10.     return scalar actualsize = r(mean)
 11.   }
 12. end
.
. * Calculate actual size for nominal sizes 0.01, 0.05, 0.10 and 0.20
. size 0.01
. scalar actualsize01 = r(actualsize)
. size 0.05
. scalar actualsize05 = r(actualsize)
. size 0.10
. scalar actualsize10 = r(actualsize)
. size 0.20
. scalar actualsize20 = r(actualsize)
.
. * Following gives Actual Size column of Table 7.2 (p.253)
. * Nominal Sizes and Actual Sizes of Two-sided Wald Test
. di "0.01: " actualsize01 _new "0.05: " actualsize05 _new /*
> */ "0.10: " actualsize10 _new "0.20: " actualsize20
0.01: .0053
0.05: .0294
0.10: .0805
0.20: .1922
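Each actual size above is a simulated proportion, so it carries simulation error of its own. The following is a minimal sketch that uses the binomial standard error of a proportion, with the 0.05 entry copied by hand from the log. The scalar names `nsims`, `nominal`, `actual`, `simse` and `zdiff` are illustrative assumptions.

```stata
* Sketch only: actual size copied from the log; scalar names are illustrative
scalar nsims   = 10000
scalar nominal = 0.05
scalar actual  = .0294

* Simulation standard error of the estimated rejection rate
scalar simse = sqrt(actual*(1-actual)/nsims)
display "Simulation se of actual size:      " simse

* z statistic for H0: true rejection rate equals the nominal size
scalar zdiff = (actual-nominal)/sqrt(nominal*(1-nominal)/nsims)
display "z for actual size = nominal size:  " zdiff
```

A large absolute value of `zdiff` suggests the gap between actual and nominal size is not due to the finite number of simulations.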
.
. ****** (3) ACTUAL POWER OF THE WALD TEST STATISTIC (Table 7.2, p.253)
.
. * Consider power when b2 = 2 rather than 1
.
. * Obtain the actual power by simulation
. * Use the same program simprobit as for size,
. * except the argument b2true is 2.0 rather than 1.0
.
. drop _all
.
. * For power calculations set trueb2 = 2
. simulate "simprobit 2" ymean=r(ymean) yvar=r(yvar) b2hat=r(b2hat) /*
> */ seb2hat=r(seb2hat) ztestforb2eq1=r(ztestforb2eq1), reps(10000)
command:      simprobit 2
statistics:   ymean      = r(ymean)
              yvar       = r(yvar)
              b2hat      = r(b2hat)
              seb2hat    = r(seb2hat)
              ztestfor~1 = r(ztestforb2eq1)
.
. * Calculate |Wald|
. gen Wald = ztestforb2eq1
(71 missing values generated)
. gen absWald = abs(Wald)
(71 missing values generated)
.
. summarize
    Variable |     Obs        Mean  Std. Dev.        Min        Max
-------------+-----------------------------------------------------
       ymean |    9929    .4998389   .0791531       .225       .825
        yvar |    9929     .249985   .0090933   .1480769   .2564103
       b2hat |    9929    2.581075    2.73046   .8547966   209.9805
     seb2hat |    9929    1.002628   5.799384   .2816004   540.1536
ztestforb2~1 |    9929    1.667773   .3853416  -.4042006    2.59991
-------------+-----------------------------------------------------
        Wald |    9929    1.667773   .3853416  -.4042006    2.59991
     absWald |    9929    1.668285    .383118   .0033462    2.59991
.
. * Calculate actual power for nominal sizes 0.01, 0.05, 0.10 and 0.20
. * This can use the earlier program size
. size 0.01
. scalar actualpower01 = r(actualsize)
. size 0.05
. scalar actualpower05 = r(actualsize)
. size 0.10
. scalar actualpower10 = r(actualsize)
. size 0.20
. scalar actualpower20 = r(actualsize)
.
. * Following gives Actual Power column of Table 7.2 (p.253)
. * Nominal Sizes and Actual Power of Two-sided Wald Test
. di "0.01: " actualpower01 _new "0.05: " actualpower05 _new /*
> */ "0.10: " actualpower10 _new "0.20: " actualpower20
0.01: .0073
0.05: .2257
0.10: .6077
0.20: .8583
.
. ****** (4) ASYMPTOTIC POWER OF THE WALD TEST STATISTIC (Table 7.2, p.253)
.
. * Consider power when b2 = 2 rather than 1
.
. * Calculate asymptotic theoretical power using noncentral chisquare
. * Asymptotic power = Pr[W > chi-square(alpha) | W ~ noncentral chi-square(df,ncp)]
. * The noncentrality parameter is 0.5*(delta^2)/(se[b2]^2)
. * Here size has b2 = 1 and power has b2 = 1+delta
. * So delta = b2true - 1.
. * Need to find the standard error of b2.
. * Use the average from earlier simulations.
.
. * Program to calculate asymptotic power given nominal size
. * Temporary variables and scalars and arguments are in quotes ` '
. * invchi2tail gives cv such that Pr(Chi2 > cv) = nominalsize
. * Power is 1 minus cdf of noncentral chisquare
. * nchi2 gives the cdf of noncentral chisquare
.
. drop _all
.
. * Arguments are alpha (size), lamda and df (degrees of freedom)
. program power, rclass
  1.   version 8.0
  2.   args alpha lamda df
  3.   tempname criticalvalue powervianoncentralchi
  4.   quietly {
  5.     scalar `criticalvalue' = invchi2tail(`df',`alpha')
  6.     scalar `powervianoncentralchi' = 1-nchi2(`df',`lamda',`criticalvalue')
  7.     return scalar asymppower = `powervianoncentralchi'
  8.   }
  9. end
.
. * scalar criticalvalue = invchi2tail(df,alpha)
. * replace power = 1-nchi2(df,lamda,criticalvalue)
.
. * Calculate df and lamda.
. * This uses an estimate of se[beta] obtained earlier
. scalar delta = 1 /* Here 2 - 1. Changes for different alternatives */
. scalar lamda = 0.5*(delta*delta)/(aveseb2hat*aveseb2hat)
. scalar df = 1
. di "delta: " delta " aveseb2hat: " aveseb2hat " lamda: " lamda " df: " df
delta: 1 aveseb2hat: .3589645 lamda: 3.8803151 df: 1
.
. * Calculate asymptotic power for nominal sizes 0.01, 0.05, 0.10 and 0.20
. power 0.01 lamda df
. scalar asymppower01 = r(asymppower)
. power 0.05 lamda df
. scalar asymppower05 = r(asymppower)
. power 0.10 lamda df
. scalar asymppower10 = r(asymppower)
. power 0.20 lamda df
. scalar asymppower20 = r(asymppower)
.
. * Following gives Asymptotic Power column of Table 7.2 (p.253)
. * Nominal Sizes and Asymptotic Power of Two-sided Wald Test
. di "0.01: " asymppower01 _new "0.05: " asymppower05 _new /*
> */ "0.10: " asymppower10 _new "0.20: " asymppower20
0.01: .2722675
0.05: .50398701
0.10: .62755902
0.20: .75494224
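With df = 1 the asymptotic power can be cross-checked without the noncentral chi-square cdf. In the parameterization used by nchi2, a noncentral chi-square(1, lamda) variate can be written as (Z + sqrt(lamda))^2 with Z ~ N[0,1]. The following is a minimal sketch under that assumption, with lamda copied by hand from the log. The scalar names `lam`, `alpha`, `zcv`, `pownormal` and `pownchi2` are illustrative. norm() and invnorm() are the version 8 function names, and later Stata releases call them normal() and invnormal().

```stata
* Sketch only: lamda copied from the log; scalar names are illustrative
scalar lam   = 3.8803151
scalar alpha = 0.05
scalar zcv   = invnorm(1-alpha/2)

* Power from the normal representation: Pr[(Z + sqrt(lam))^2 > zcv^2]
scalar pownormal = 1 - norm(zcv-sqrt(lam)) + norm(-zcv-sqrt(lam))

* Power from the noncentral chi-square cdf, as in program power
scalar pownchi2 = 1 - nchi2(1,lam,invchi2tail(1,alpha))

display "Power via normal representation: " pownormal
display "Power via noncentral chi-square: " pownchi2
```

The two displayed values should agree with each other and with the 0.05 entry in the log above.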
.
. ****** (5) ALTERNATIVE ANALYSIS: SIMULATION METHOD USING POSTFILE
.
. * This is an alternative, given for completeness.
. * This fails if the model is not estimable in any of the simulation samples.
. * By contrast, simulate just drops that simulation sample and continues simulating.
.
. * For each round of the simulation, the values posted to `sim' are sent
. * as a new line to the Stata data set probitsimresults.
. * The names of these variables are listed after `sim' in the postfile command
. * Need as many names at postfile as expressions at post
. * Then can analyze these using summarize etcetera
.
. * This program has two arguments
. * - numsims = desired number of simulations
. * - trueb2 = slope coefficient used to generate the data
.
. drop _all
.
. program simprobit2
  1.   version 8.0
  2.   args numsims trueb2
  3.   tempname sim
  4.   postfile `sim' meany vary beta sterror ztestforbeta using probitsimresults, replace
  5.   quietly {
  6.     forvalues i = 1/`numsims' {
  7.       drop _all
  8.       set obs $numobs      /* may need to change */
  9.       gen x = invnorm(uniform())
 10.       /* If instead want same x in each simulation
>          replace above line with: use xforsim */
.       gen y = 0
 11.       /* Use b2 = 1.0 for size and 2.0 for power */
.       replace y = 1 if 0+`trueb2'*x+invnorm(uniform()) > 0
 12.       summarize y
 13.       scalar meany=r(mean)
 14.       scalar vary=r(Var)
 15.       probit y x
 16.       scalar beta=_b[x]
 17.       scalar sterror = _se[x]
 18.       scalar ztestforbeta = (beta-1)/sterror
 19.       post `sim' (meany) (vary) (beta) (sterror) (ztestforbeta)
 20.     }
 21.   }
 22.   postclose `sim'
 23. end
.
. simprobit2 $numsims 1
. use probitsimresults, clear
.
. * Here we just summarize results for comparison with earlier
. * But could do the further analysis as above
. sum
    Variable |     Obs        Mean  Std. Dev.        Min        Max
-------------+-----------------------------------------------------
       meany |   10000    .4989575   .0791248       .225       .775
        vary |   10000    .2499885   .0090127   .1788462   .2564103
        beta |   10000    1.135003   .4315248   .0901358   7.205799
     sterror |   10000    .3583266    .133302   .1863547   3.360862
ztestforbeta |   10000    .1218973    .954814  -3.401833   2.299991
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma07p3montecarlo.txt
  log type:  text
 closed on:  18 May 2005, 11:29:29
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma07p4boot.txt
  log type:  text
 opened on:  18 May 2005, 21:36:29
.
. ********** OVERVIEW OF MMA07P4BOOT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 7.8 pages 254-256
. * Bootstrap applied to probit model
. * Provides
. * (1) Bootstrap confidence intervals
. * (2) Bootstrap hypothesis test without refinement
. * (3) Bootstrap hypothesis test with refinement: percentile-t method
.
. * Note corrections to book
. * - sample size is N=40 not N=30
. * - use 999 bootstrap replications not 1000
. * - for asymptotic refinement p.256 the critical region
. *   is (-1.89, 1.80) not (-2.62, 1.83)
.
. * For more detail on bootstrap see
. * Chapter 11: Bootstrap Methods pages 355-383
. * and program mma11p1boot.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA **********
.
. * DGP is Probit: Pr[y=1] = PHI(a + bx)
. * where x is N[0,1]
. * and a = 0 and b = 1
.
. * Change the following for different sample size N
. global numobs "40"
.
. * Probit example with slope coefficient equal to 1
. set seed 10105
. set obs $numobs
obs was 0, now 40
. gen x = invnorm(uniform())
. gen y = 0
. replace y = 1 if 0+1.0*x+invnorm(uniform()) > 0
(19 real changes made)
. save xyforsim, replace
file xyforsim.dta saved
. summarize
    Variable |     Obs        Mean  Std. Dev.        Min        Max
-------------+-----------------------------------------------------
           x |      40   -.0359197   .9203391  -2.210579    1.45199
           y |      40        .475   .5057363          0          1
. probit y x
Iteration 0:   log likelihood = -27.675866
Iteration 1:   log likelihood = -22.927488
Iteration 2:   log likelihood = -22.735204
Iteration 3:   log likelihood = -22.733966
Iteration 4:   log likelihood = -22.733966

Probit estimates                                  Number of obs   =         40
                                                  LR chi2(1)      =       9.88
                                                  Prob > chi2     =     0.0017
Log likelihood = -22.733966                       Pseudo R2       =     0.1786

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .8168831   .2942893     2.78   0.006     .2400867    1.393679
       _cons |  -.0725436   .2162576    -0.34   0.737    -.4964006    .3513135
------------------------------------------------------------------------------
. save mma07p4boot, replace
file mma07p4boot.dta saved
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x using mma07p4boot.asc, replace
.
. ********** (1) BOOTSTRAP CONFIDENCE INTERVALS **********
.
. * Stata produces four bootstrap 100*(1-alpha) confidence intervals
. * (1)-(2) have no asymptotic refinement
. * (3)-(4) have asymptotic refinement
.
. * (1) Regular asymptotic normal: bhat +/- t(S-1)_alpha/2*se(bhat)
. * except instead of using the initial se(bhat)
. * we use the standard deviation of bhat from the bootstrap reps
. * and use t(S-1) rather than z for critical value
. * where S = number of bootstrap reps
.
. * (2) Percentile method: which orders the bhat(s) from simulations and
. * goes from alpha/2 lowest bhat(s) to the alpha/2 highest bhat(s)
. * where (s) denotes the s-th bootstrap sample
.
. * (3) Bias-corrected. Same as (4) with a=0
.
. * (4) Bias-corrected and accelerated.
. * This works with the pivotal Wald statistic.
. * See the manual [R]bootstrap or a textbook.
. * e.g. Efron and Tibshirani (1993, pp.184-188) with a=0
. * This orders the bhat(s) from simulations and
. * goes from the p1 quantile to the p2 quantile of the ordered bhat(s)
. * where p1 and p2 are bias-correction adjustments to alpha/2 and 1-alpha/2
. * Let p1 = Phi(2z0 - z_alpha/2)
. *     p2 = Phi(2z0 + z_alpha/2)
. *     z0 measures the median bias in bhat with
. *     z0 = Phi-inv(fraction of the bhat(s) < bhat)
. * And if z0=0 then p1 = alpha/2 and no correction
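The bias-corrected percentile levels described in the comments above can be written compactly. The following is a notational sketch only. The symbols are assumptions matching the comments: S is the number of bootstrap replications, \hat{b}^{(s)} is the estimate in the s-th bootstrap sample, \hat{b} is the original estimate, and z_{\alpha/2} is the upper \alpha/2 standard normal quantile.

```latex
\[
p_1 = \Phi\!\left(2 z_0 - z_{\alpha/2}\right), \qquad
p_2 = \Phi\!\left(2 z_0 + z_{\alpha/2}\right), \qquad
z_0 = \Phi^{-1}\!\left( \frac{1}{S} \sum_{s=1}^{S}
      \mathbf{1}\!\left[ \hat{b}^{(s)} < \hat{b} \right] \right)
\]
```

When exactly half of the bootstrap estimates lie below \hat{b}, z_0 = 0, so p_1 = \alpha/2 and p_2 = 1 - \alpha/2, and the interval reduces to the percentile interval.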
.
. * Change the following for different number of simulations S
. * From page 399, for testing better to use 999 than 1000
. global breps "999" /* The number of bootstrap reps used below */
.
. * (1A) Simplest bootstrap is of all the estimated coefficients
. set seed 10105
. bootstrap "probit y x" _b, reps($breps) bca
command:      probit y x
statistics:   b_x    = _b[x]
              b_cons = _b[_cons]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         b_x |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
             |                                       .1552112   1.480223 (BCa)
      b_cons |   999 -.0725436 -.0176301  .2448404  -.5530047   .4079175   (N)
             |                                       -.596443   .4247662   (P)
             |                                      -.5528302   .4381396  (BC)
             |                                      -.5205303   .4445401 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated
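The normal-based (N) interval for b_x can be reproduced by hand from the description in (1): the estimate plus or minus a t(S-1) critical value times the bootstrap standard error. The following is a minimal sketch with the values copied from the output above. The scalar names `bhat`, `seboot`, `nreps`, `tcv`, `nlow` and `nhigh` are illustrative assumptions.

```stata
* Sketch only: estimate and bootstrap se copied from the output above
scalar bhat   = .8168831
scalar seboot = .3763803
scalar nreps  = 999                       /* bootstrap replications S */
scalar tcv    = invttail(nreps-1,0.025)   /* t(S-1) critical value */

scalar nlow  = bhat - tcv*seboot
scalar nhigh = bhat + tcv*seboot
display "Normal-based 95% interval: (" nlow ", " nhigh ")"
```

The result should be close to the (N) row for b_x, up to rounding of the copied inputs.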
.
. * (1B) This bootstrap is of MLE of b2 and the associated standard error
. * and additionally gives the bias-corrected and accelerated (BCa) method of Efron
. set seed 10105
. bootstrap "probit y x" _b[x] _se[x], reps($breps) bca
command:      probit y x
statistics:   _bs_1 = _b[x]
              _bs_2 = _se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
             |                                       .1552112   1.480223 (BCa)
       _bs_2 |   999  .2942893  .0422005  .0932673   .1112667   .4773118   (N)
             |                                       .2323841   .5831083   (P)
             |                                       .2214397   .4475662  (BC)
             |                                       .2162534   .4143377 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated
.
. * (1C) This bootstrap repeats (1B)
. * but will permit bootstrapping if Stata commands are more than one line
. use mma07p4boot, clear
. program define commandtobootstrap, rclass
1. version 8.0
2. quietly probit y x
3. return scalar b2hat=_b[x]
4. return scalar seb2hat=_se[x]
5. end
. set seed 10105
. bootstrap "commandtobootstrap" r(b2hat) r(seb2hat), reps($breps)
command:      commandtobootstrap
statistics:   _bs_1 = r(b2hat)
              _bs_2 = r(seb2hat)

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
       _bs_2 |   999  .2942893  .0422005  .0932673   .1112667   .4773118   (N)
             |                                       .2323841   .5831083   (P)
             |                                       .2214397   .4475662  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
.
. ********** (2) BOOTSTRAP HYPOTHESIS TESTS - NO REFINEMENT p.255 **********
.
. * We want to test H0: b2 = 1 against Ha: b2 not equal 1
.
. * For a simple test such as this we can just use
. * the bootstrap confidence intervals from (1)
. * and reject if bhat2 is not in the confidence interval
.
. * Here we instead present a common method without refinement
. * essentially (1) above, performing the usual Wald test,
. * except the standard error is estimated by bootstrap.
. * This is useful when hard to obtain standard error by other means.
. * Here W = (b2hat - b2_0) / seb2hat_boot where b2_0 = 1
. * and reject at level .05 if |W| > z_.025 = 1.96
.
. use mma07p4boot, clear
. * Save the estimate
. quietly probit y x
. scalar b2est = _b[x]
. * Obtain the bootstrap standard error
. set seed 10105
. bootstrap "probit y x" _b, reps($breps) bca
command:      probit y x
statistics:   b_x    = _b[x]
              b_cons = _b[_cons]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         b_x |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
             |                                       .1552112   1.480223 (BCa)
      b_cons |   999 -.0725436 -.0176301  .2448404  -.5530047   .4079175   (N)
             |                                       -.596443   .4247662   (P)
             |                                      -.5528302   .4381396  (BC)
             |                                      -.5205303   .4445401 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated
. matrix sebboot = e(se)
. scalar seb2boot = sebboot[1,1] /* x is first then constant */
. * Calculate the test statistic
. scalar Wald = (b2est - 1)/seb2boot
.
. * DISPLAY RESULTS at bottom p.255
. * Note: Text had typo:
. * (1-0.817)/0.376 = -0.487 should be (0.817-1)/0.376 = -0.487
.
. di "Probit slope estimate is:          " b2est
Probit slope estimate is:          .8168831
. di "Bootstrap standard estimate is: " seb2boot
Bootstrap standard estimate is: .37638029
. di "Wald statistic (no refinement) is: " Wald
Wald statistic (no refinement) is: -.48652096
. di "Reject at level .05 if |Wald| > 1.96"
Reject at level .05 if |Wald| > 1.96
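The decision rule above can also be stated as a p-value. The following is a minimal sketch that applies the standard normal cdf to the Wald statistic copied from the log. The scalar names `Waldboot` and `pvalue` are illustrative assumptions, and norm() is the version 8 name of the standard normal cdf (normal() in later releases).

```stata
* Sketch only: Wald statistic copied from the log; scalar names are illustrative
scalar Waldboot = -.48652096
scalar pvalue   = 2*(1-norm(abs(Waldboot)))
display "Two-sided p-value (no refinement): " pvalue
```

A p-value above 0.05 corresponds to |Wald| below 1.96, so H0: b2 = 1 is not rejected here.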
.
. ********** (3) BOOTSTRAP HYPOTHESIS TESTS - PERCENTILE-T p.256 **********
.
. * Stata does not give this. For methods see
. * e.g. Efron and Tibshirani (1993, pp.160-162)
. * e.g. Cameron and Trivedi (2005) Chapter 11.2.6-11.2.7
. * For sample s compute t-test(s) = (bhat(s)-bhat) / se(s)
. * where bhat is initial estimate
. * and bhat(s) and se(s) are for sth round.
. * Order the t-test(s) statistics and choose the alpha/2 percentiles
. * which give the critical values for the t-test
.
. * Implementation requires saving the results from each bootstrap replication
. * in order to obtain critical values from percentiles of bootstrap distribution
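The percentile-t procedure described in the comments above can be summarized in symbols. The following is a notational sketch only. The symbols are assumptions matching the comments: \hat{b} and se(\hat{b}) come from the original sample, \hat{b}^{(s)} and se^{(s)} from the s-th bootstrap sample, b_0 is the hypothesized value, and t_{[q]} is the q-th quantile of the ordered t^{(s)}.

```latex
\[
t^{(s)} = \frac{\hat{b}^{(s)} - \hat{b}}{se^{(s)}}, \quad s = 1,\dots,S,
\qquad
W = \frac{\hat{b} - b_0}{se(\hat{b})},
\qquad
\text{reject } H_0 \text{ if } W < t_{[\alpha/2]} \text{ or } W > t_{[1-\alpha/2]}
\]
```

The bootstrap statistics are centered at \hat{b} rather than b_0 because \hat{b} plays the role of the true value in the bootstrap world.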
.
. * (3A) Here bootstrap computes (b(s) - bhat) / se(s) s = 1,...,S
.
. use mma07p4boot, clear
. * Save the estimate and the Wald test statistic
. quietly probit y x
. scalar b2est = _b[x]
. scalar Wald = (_b[x] - 1)/_se[x]
. * Then bootstrap calculates (b(s) - bhat) / se(s)
. set seed 10105
. bootstrap "probit y x" ((_b[x]-b2est)/_se[x]), reps($breps) /*
> */ level(95) saving(mma07p4bootreps) replace
command:      probit y x
statistic:    _bs_1 = (_b[x]-b2est)/_se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999         0  .1003619  .9350234  -1.834837   1.834837   (N)
             |                                      -1.890602   1.801358   (P)
             |                                      -2.101316   1.565618  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. * Then get data sets with result from each bootstrap
. use mma07p4bootreps, clear
(bootstrap: probit y x)
. sum      /* Here just _bs_1 */

    Variable |     Obs        Mean  Std. Dev.        Min        Max
-------------+-----------------------------------------------------
       _bs_1 |     999    .1003619   .9350234  -3.032139   2.572848
. gen b2test = _bs_1 /* _bs_1 is the bootstrap result of interest */
. sum b2test, detail /* Gives percentiles but not 2.5% and 97.5% */
                           b2test
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.188575      -3.032139
 5%    -1.540843      -2.605178
10%    -1.137846      -2.599248       Obs                 999
25%    -.4995352      -2.566578       Sum of Wgt.         999

50%     .1238111                      Mean           .1003619
                        Largest       Std. Dev.      .9350234
75%     .7789762        2.22565
90%     1.338348       2.359132       Variance       .8742688
95%     1.560646       2.377491       Skewness      -.2505319
99%     2.014282       2.572848       Kurtosis       2.853737

. _pctile b2test, p(2.5,97.5)


.
. * DISPLAY RESULTS on p.256
.
. * Note: Error on p.256 Here get (-1.89, 1.80) not (-2.62, 1.83)
. di "Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of coeff b for z: -1.8906019 and 1.8013585
. di "Reject H0 if Wald = " Wald " lies outside " r(r1) " ," r(r2) ")"
Reject H0 if Wald = -.62223436 lies outside -1.8906019 ,1.8013585)
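The same bootstrap critical values can be inverted into a percentile-t confidence interval for b2. This interval is not computed in the original program. The following is a minimal sketch of the standard construction, with values copied by hand from the log and illustrative scalar names (`bhat`, `sehat`, `tlow`, `thigh`, `ptlow`, `pthigh`).

```stata
* Sketch only: values copied from the log; scalar names are illustrative
scalar bhat  = .8168831        /* probit slope estimate */
scalar sehat = .2942893        /* its asymptotic standard error */
scalar tlow  = -1.8906019      /* lower 2.5 percentile of bootstrap t */
scalar thigh =  1.8013585      /* upper 2.5 percentile of bootstrap t */

* The upper critical value sets the lower limit and vice versa
scalar ptlow  = bhat - thigh*sehat
scalar pthigh = bhat - tlow*sehat
display "Percentile-t 95% interval: (" ptlow ", " pthigh ")"
```

H0: b2 = 1 is rejected by the percentile-t test exactly when 1 lies outside this interval.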
.
. * (3B) Equivalently bootstrap calculates b(s) and se(s) s = 1,...,S
. *      and then later calculate (b(s) - bhat) / se(s)
.
. use mma07p4boot, clear
. * Save the estimate and the Wald test statistic
. quietly probit y x
. scalar b2est = _b[x]
. scalar Wald = (_b[x] - 1)/_se[x]
. * Then bootstrap calculates b(s) and se(s)
. set seed 10105
. bootstrap "probit y x" _b[x] _se[x], reps($breps) /*
> */ level(95) saving(mma07p4bootreps) replace
command:      probit y x
statistics:   _bs_1 = _b[x]
              _bs_2 = _se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
       _bs_2 |   999  .2942893  .0422005  .0932673   .1112667   .4773118   (N)
             |                                       .2323841   .5831083   (P)
             |                                       .2214397   .4475662  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. * Then get data sets with result from each bootstrap
. use mma07p4bootreps, clear
(bootstrap: probit y x)
. sum      /* Here _bs_1 and _bs_2 */

    Variable |     Obs        Mean  Std. Dev.        Min        Max
-------------+-----------------------------------------------------
       _bs_1 |     999     .918616   .3763803   .0030288   3.806198
       _bs_2 |     999    .3364898   .0932673   .2162534    1.34312
. gen b2test = (_bs_1 - b2est)/_bs_2
. _pctile b2test, p(2.5,97.5)
.
. * DISPLAY RESULTS on p.256
. * Note: Error on p.256 Here get (-1.89, 1.80) not (-2.62, 1.83)
. di "Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of coeff b for z: -1.8906019 and 1.8013583
. di "Reject H0 if Wald = " Wald " lies outside " r(r1) " ," r(r2) ")"
Reject H0 if Wald = -.62223436 lies outside -1.8906019 ,1.8013583)
.
. ********** CLOSE OUTPUT
. log close
       log:  c:\Imbook\bwebpage\Section2\mma07p4boot.txt
  log type:  text
 closed on:  18 May 2005, 21:36:36
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma08p1cmtests.txt
  log type:  text
 opened on:  17 May 2005, 14:04:20
.
. ********** OVERVIEW OF MMA08P1CMTESTS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 8.2.6 pages 269-71
. * Conditional moment tests example producing Table 8.1
.
. * (A) TEST OF THE CONDITIONAL MEAN
. * (B) TEST THAT CONDITIONAL VARIANCE = MEAN
. * (C) ALTERNATIVE TEST THAT CONDITIONAL VARIANCE = MEAN
. * (D) INFORMATION MATRIX TEST
. * (E) CHI-SQUARE GOODNESS OF FIT TEST
. * for a Poisson model with generated data (see below).
.
. * The data generation requires free Stata add-on command rndpoix
. * In Stata: search rndpoix
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Model is
. * y ~ Poisson[exp(b1 + b2*x2)]
. * where
. * x2 is iid ~ N[0,1]
. * and b1=0 and b2=1.
.
. set seed 10001
. set obs 200
obs was 0, now 200
. scalar b1 = 0

. scalar b2 = 1
.
. * Generate regressors
. gen x2 = invnorm(uniform())
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2)
. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ................ )
Variable xp created.
. gen y = xp
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 using mma08p1cmtests.asc, replace
.
. ********* POISSON REGRESSION **********
.
. poisson y x2
Iteration 0:   log likelihood = -263.53818
Iteration 1:   log likelihood =  -263.5288
Iteration 2:   log likelihood =  -263.5288

Poisson regression                                Number of obs   =        200
                                                  LR chi2(1)      =     321.75
                                                  Prob > chi2     =     0.0000
Log likelihood = -263.5288                        Pseudo R2       =     0.3791

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |    1.12402   .0687868    16.34   0.000     .9892006     1.25884
       _cons |  -.1652935    .089065    -1.86   0.063    -.3398578    .0092707
------------------------------------------------------------------------------
. * Obtain exp(x'b)
.
. * Obtain the scores to be used later
. predict yhat
(option n assumed; predicted number of events)
. * For the Poisson s = dlnf(y)/db = (y - exp(x'b))*x
. gen s1 = (y - yhat)

. gen s2 = (y - yhat)*x2
.
. * Summarize data
. * Should get s1 and s2 summing to zero
. sum
    Variable |     Obs        Mean  Std. Dev.        Min        Max
-------------+-----------------------------------------------------
          x2 |     200   -.0091098   1.010072  -2.857666   2.149822
     mupoiss |     200    1.599601   1.674071   .0574026    8.58333
          xp |     200       1.525   2.363749          0         15
           y |     200       1.525   2.363749          0         15
        yhat |     200       1.525   1.803242   .0341372   9.498652
-------------+-----------------------------------------------------
          s1 |     200    1.36e-09    1.36719  -3.148933   6.245292
          s2 |     200    6.69e-09   1.889198  -6.420406   12.97311
.
. ********** ANALYSIS: CONDITIONAL MOMENTS TESTS **********
.
. * The program is appropriate for MLE with density assumed to be correctly specified.
. * Let H0: E[m(y,x,theta)] = 0
. * Then CM = explained sum of squares or N times uncentered Rsq from
. * auxiliary regression of 1 on m and the components of s = dlnf(y)/dtheta
. * The test is chi-squared with dim(m) degrees of freedom.
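The auxiliary regression described in the comments above can be written out as follows. This is a notational sketch only. The symbols are assumptions matching the comments: \hat{m}_i and \hat{s}_i are the moment and score vectors for observation i evaluated at the MLE, R_u^2 is the uncentered R-squared, and RSS is the residual sum of squares of the auxiliary regression.

```latex
\[
1 = \hat{m}_i'\delta + \hat{s}_i'\gamma + u_i, \qquad i = 1,\dots,N,
\qquad
CM = N R_u^2 = N - RSS \;\overset{a}{\sim}\; \chi^2\!\left(\dim(m)\right)
\ \text{under } H_0
\]
```

Because the dependent variable is identically one, the uncentered total sum of squares equals N, which is why N R_u^2, the explained sum of squares, and N minus the residual sum of squares coincide.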
.
. * Define the dependent variable one for the auxiliary regressions
. gen one = 1
.
. *** (A) TEST OF THE CONDITIONAL MEAN (Table 8.1 p.270 row 1)
.
. * Test H0: E[(y - exp(x'b))*z] = 0 where z = x2sq
.
. * A similar test is relevant for many nonlinear models
. * Just change the expression for the conditional mean.
. * Here we used E[y|x] = exp(x'b) for the Poisson
. * Also for the Poisson z cannot be x as this sums to zero by Poisson foc
. * For some other models (basically non-LEF models) z can be x
.
. gen z = x2*x2
. gen mA = (y - yhat)*z
. regress one mA s1 s2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   197) =    1.09
       Model |  3.27177115     3  1.09059038           Prob > F      =  0.3536
    Residual |  196.728229   197  .998620451           R-squared     =  0.0164
-------------+------------------------------           Adj R-squared =  0.0014
       Total |         200   200           1           Root MSE      =  .99931

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          mA |   .1046155   .0577969     1.81   0.072    -.0093646    .2185956
          s1 |  -.0377486   .0822939    -0.46   0.647    -.2000387    .1245415
          s2 |  -.1544278   .1029465    -1.50   0.135    -.3574463    .0485908
------------------------------------------------------------------------------
. scalar CMA = e(N)*e(r2)
. di "CMA: " CMA " p-value: " chi2tail(1,CMA)
CMA: 3.2717711 p-value: .07048149
.
. * Check that three different ways give same answer.
. di "N times Uncentered R-squared: " e(N)*e(r2)
N times Uncentered R-squared: 3.2717711
. di "Explained Sum of Squares:         " e(mss)
Explained Sum of Squares:         3.2717711
. di "N minus Residual Sum of Squares: " e(N) - e(rss)
N minus Residual Sum of Squares: 3.2717711
.
. *** (B) TEST THAT CONDITIONAL VARIANCE = MEAN (Table 8.1 p.270 row 2)
.
. * Test H0: E[{(y - exp(x'b))^2 - exp(x'b)}*x] = 0
.
. * This test is peculiar to Poisson which restricts mean = variance
.
. * Here m has 2 terms
. gen mB1 = ((y - yhat)^2 - yhat)
. gen mB2 = ((y - yhat)^2 - yhat)*x2
. regress one mB1 mB2 s1 s2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    0.60
       Model |  2.43400011     4  .608500026           Prob > F      =  0.6604
    Residual |     197.566   196   1.0079898           R-squared     =  0.0122
-------------+------------------------------           Adj R-squared = -0.0080
       Total |         200   200           1           Root MSE      =   1.004

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mB1 |   .0432045   .0542516     0.80   0.427    -.0637873    .1501963
         mB2 |  -.0052374   .0357193    -0.15   0.884    -.0756808     .065206
          s1 |  -.0399879   .1073712    -0.37   0.710     -.251739    .1717633
          s2 |   -.003196   .0852726    -0.04   0.970    -.1713655    .1649735
------------------------------------------------------------------------------
. scalar CMB = e(N)*e(r2)
. di "CMB: " CMB " p-value: " chi2tail(2,CMB)
CMB: 2.4340001 p-value: .29611717
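For a test with 2 degrees of freedom the chi-square upper tail has the closed form exp(-x/2), which gives a quick check on the p-value above. The following is a minimal sketch with the statistic copied by hand from the log. The scalar names `CMBcheck`, `pchi2` and `pexp` are illustrative assumptions.

```stata
* Sketch only: statistic copied from the log; scalar names are illustrative
scalar CMBcheck = 2.4340001
scalar pchi2    = chi2tail(2,CMBcheck)
scalar pexp     = exp(-CMBcheck/2)
display "p-value via chi2tail(2,x): " pchi2
display "p-value via exp(-x/2):     " pexp
```

The two values should agree with each other and with the p-value reported for CMB.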
.
. *** (C) ALTERNATIVE TEST THAT CONDITIONAL VARIANCE = MEAN (Table 8.1 p.270 row 3)
.
. * Test H0: E[{(y - exp(x'b))^2 - y}*x] = 0
.
. * This test is peculiar to Poisson which restricts mean = variance
. * This test is also peculiar as here dm/db = 0
.
. * Here m has 2 terms
. gen mC1 = ((y - yhat)^2 - y)
. gen mC2 = ((y - yhat)^2 - y)*x2
.
. * To be consistent with other tests include s1 and s2.
. regress one mC1 mC2 s1 s2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    0.60
       Model |  2.43400011     4  .608500027           Prob > F      =  0.6604
    Residual |     197.566   196   1.0079898           R-squared     =  0.0122
-------------+------------------------------           Adj R-squared = -0.0080
       Total |         200   200           1           Root MSE      =   1.004

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mC1 |   .0432045   .0542516     0.80   0.427    -.0637873    .1501963
         mC2 |  -.0052374   .0357192    -0.15   0.884    -.0756808     .065206
          s1 |   .0032166   .0825345     0.04   0.969    -.1595531    .1659863
          s2 |  -.0084334   .0641096    -0.13   0.895    -.1348665    .1179997
------------------------------------------------------------------------------
. scalar CMC = e(N)*e(r2)
. di "CMC: " CMC " p-value: " chi2tail(2,CMC)
CMC: 2.4340001 p-value: .29611717
.
. * Since dm/db = 0 could just do the regression without the scores
. regress one mC1 mC2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   198) =    1.21
       Model |  2.40695177     2  1.20347588           Prob > F      =  0.3016
    Residual |  197.593048   198  .997944688           R-squared     =  0.0120
-------------+------------------------------           Adj R-squared =  0.0021
       Total |         200   200           1           Root MSE      =  .99897

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mC1 |   .0458705   .0510111     0.90   0.370    -.0547243    .1464652
         mC2 |  -.0075807     .03212    -0.24   0.814    -.0709218    .0557605
------------------------------------------------------------------------------
. scalar CMCnoscores = e(N)*e(r2)
. di "CMCnoscores: " CMCnoscores " p-value: " chi2tail(2,CMCnoscores)
CMCnoscores: 2.4069518 p-value: .30014911
.
. *** (D) INFORMATION MATRIX TEST (Table 8.1 p.270 row 4)
.
. * Test H0: E[{(y - exp(x'b))^2 - y}*vech(xx')] = 0
.
. * A similar test is relevant for other parametric models
. * In general m = vech(d2lnf(y)/dbdb')
. * and for Poisson this yields above
.
. * Here m is a 3x1 vector
. gen mD1 = ((y - yhat)^2 - y)
. gen mD2 = ((y - yhat)^2 - y)*x2
. gen mD3 = ((y - yhat)^2 - y)*x2*x2
.
. * To be consistent with other tests include s1 and s2.
. regress one mD1 mD2 mD3 s1 s2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  5,   195) =    0.58
       Model |   2.9463051     5   .58926102           Prob > F      =  0.7129
    Residual |  197.053695   195  1.01053177           R-squared     =  0.0147
-------------+------------------------------           Adj R-squared = -0.0105
       Total |         200   200           1           Root MSE      =  1.0053

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mD1 |   .0546342   .0566422     0.96   0.336    -.0570759    .1663442
         mD2 |  -.0712751   .0994042    -0.72   0.474    -.2673205    .1247703
         mD3 |   .0330527   .0464213     0.71   0.477    -.0584996     .124605
          s1 |  -.0098554   .0846533    -0.12   0.907     -.176809    .1570982
          s2 |  -.0146441   .0647803    -0.23   0.821    -.1424041    .1131158
------------------------------------------------------------------------------
. scalar CMD = e(N)*e(r2)
. di "CMD: " CMD " p-value: " chi2tail(3,CMD)
CMD: 2.9463051 p-value: .39997818
.
. * Since dm/db = 0 could just do the regression without the scores
. regress one mD1 mD2 mD3, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   197) =    0.91
       Model |  2.73445751     3  .911485837           Prob > F      =  0.4370
    Residual |  197.265542   197  1.00134793           R-squared     =  0.0137
-------------+------------------------------           Adj R-squared = -0.0013
       Total |         200   200           1           Root MSE      =  1.0007

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mD1 |    .056165    .054176     1.04   0.301    -.0506743    .1630043
         mD2 |   -.056325   .0911035    -0.62   0.537    -.2359884    .1233384
         mD3 |   .0233527   .0408339     0.57   0.568     -.057175    .1038805
------------------------------------------------------------------------------
. scalar CMDnoscores = e(N)*e(r2)
. di "CMDnoscores: " CMDnoscores " p-value: " chi2tail(3,CMDnoscores)
CMDnoscores: 2.7344575 p-value: .43440333
.
. *** (E) CHI-SQUARE GOODNESS OF FIT TEST (Table 8.1 p.270 row 5)
.
. * Test H0: E[d_j - Pr[y = j]] = 0
. * where d_j = 1 if y = j for j = 0, 1, 2, and 3 or more
. * and Pr[y = j] = exp(-lamda)*lamda^j/j! for lamda = exp(x'b)
. * Cells get too small if use more cells than 0, 1, 2, and 3 or more.
.
. * A similar test is relevant for other parametric models,
. * though a natural partitioning for y may be less obvious.
.
. * Here m has 4 terms
. gen d0 = 0

. replace d0 = 1 if y==0
(87 real changes made)
. gen d1 = 0
. replace d1 = 1 if y==1
(51 real changes made)
. gen d2 = 0
. replace d2 = 1 if y==2
(22 real changes made)
. gen p0 = exp(-yhat)
. gen p1 = exp(-yhat)*yhat
. gen p2 = exp(-yhat)*(yhat^2)/2
. gen mE1 = d0 - p0
. gen mE2 = d1 - p1
. gen mE3 = d2 - p2
. regress one mE1 mE2 mE3 s1 s2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  5,   195) =    0.49
       Model |  2.50056717     5  .500113433           Prob > F      =  0.7807
    Residual |  197.499433   195   1.0128176           R-squared     =  0.0125
-------------+------------------------------           Adj R-squared = -0.0128
       Total |         200   200           1           Root MSE      =  1.0064

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mE1 |   1.020078   .7290569     1.40   0.163    -.4177712    2.457927
         mE2 |   .7149016   .5053259     1.41   0.159    -.2817042    1.711507
         mE3 |   .2705081    .383646     0.71   0.482    -.4861201    1.027136
          s1 |   .2916116   .2217763     1.31   0.190    -.1457765    .7289997
          s2 |  -.1341565   .1125046    -1.19   0.235    -.3560384    .0877255
------------------------------------------------------------------------------
. scalar CME = e(N)*e(r2)
. di "CME: " CME " p-value: " chi2tail(3,CME)
CME: 2.5005672 p-value: .47518859
.
. * Wrong alternative is basic chisquare
. quietly sum d0
. scalar sumd0 = r(sum)
. quietly sum d1
. scalar sumd1 = r(sum)
. quietly sum d2
. scalar sumd2 = r(sum)
. scalar sumd3 = 1 - sumd0 - sumd1 - sumd2
. quietly sum p0
. scalar sump0 = r(sum)
. quietly sum p1
. scalar sump1 = r(sum)
. quietly sum p2
. scalar sump2 = r(sum)
. scalar sump3 = 1 - sump0 - sump1 - sump2
. scalar chisq = (sumd0-sump0)^2/sump0 + (sumd1-sump1)^2/sump1 /*
>             */ + (sumd2-sump2)^2/sump2 + (sumd3-sump3)^2/sump3
. di "Wrong Traditional chi-square: " chisq " p = " chi2tail(3,chisq)
Wrong Traditional chi-square: .47431003 p = .92449803
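
Separately from its neglect of estimation error in b, the traditional statistic above appears to mix scales: r(sum) returns cell counts and summed probabilities, yet the fourth cell is formed as 1 minus the other three rather than N minus the other three. The sketch below is an illustration only, assuming the counts reported by the replace commands (87, 51, 22) and the mean fitted probabilities printed in the summary table of this log (.429237, .2406035, .1235594). All scalar names are hypothetical.

```stata
scalar Nobs = 200
* Observed counts for y = 0, 1, 2, and 3 or more
scalar o0 = 87
scalar o1 = 51
scalar o2 = 22
scalar o3 = Nobs - o0 - o1 - o2
* Expected counts = N * mean fitted probability
scalar e0 = Nobs*.429237
scalar e1 = Nobs*.2406035
scalar e2 = Nobs*.1235594
scalar e3 = Nobs - e0 - e1 - e2
* Count-based traditional chi-square
scalar chisq_cnt = (o0-e0)^2/e0 + (o1-e1)^2/e1 + (o2-e2)^2/e2 + (o3-e3)^2/e3
* Same inputs with the last cell formed as 1 minus the others, as in the log
scalar o3log = 1 - o0 - o1 - o2
scalar e3log = 1 - e0 - e1 - e2
scalar chisq_log = (o0-e0)^2/e0 + (o1-e1)^2/e1 + (o2-e2)^2/e2 + (o3log-e3log)^2/e3log
display "count-based: " chisq_cnt "   as logged: " chisq_log
```

With these rounded inputs the second scalar comes out close to the logged value, which suggests the fourth-cell term is where the two versions differ. Either way this remains the traditional statistic that the program labels as wrong; the CM version in (E) is the one reported in Table 8.1.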
.
.
. ********** DISPLAY RESULTS (Table 8.1 p.270) **********
.
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       200   -.0091098    1.010072  -2.857666   2.149822
     mupoiss |       200    1.599601    1.674071   .0574026    8.58333
          xp |       200       1.525    2.363749          0         15
           y |       200       1.525    2.363749          0         15
        yhat |       200       1.525    1.803242   .0341372   9.498652
-------------+--------------------------------------------------------
          s1 |       200    1.36e-09     1.36719  -3.148933   6.245292
          s2 |       200    6.69e-09    1.889198  -6.420406   12.97311
         one |       200           1           0          1          1
           z |       200    1.015227    1.286795   .0000877   8.166255
          mA |       200    .1563713    3.403966  -13.52498   26.94856
-------------+--------------------------------------------------------
         mB1 |       200     .334863    3.470417  -6.436038   30.24896
         mB2 |       200      .43869    5.749749  -11.74974   62.83503
         mC1 |       200     .334863    3.077815  -6.838236   24.00367
         mC2 |       200      .43869    4.897291    -12.484   49.86192
         mD1 |       200     .334863    3.077815  -6.838236   24.00367
-------------+--------------------------------------------------------
         mD2 |       200      .43869    4.897291    -12.484   49.86192
         mD3 |       200    .8381842    9.190652    -22.791   103.5763
          d0 |       200        .435    .4970011          0          1
          d1 |       200        .255     .436955          0          1
          d2 |       200         .11    .3136749          0          1
-------------+--------------------------------------------------------
          p0 |       200     .429237    .2918348    .000075   .9664389
          p1 |       200    .2406035    .1137756    .000712    .367864
          p2 |       200    .1235594    .0894167   .0005631   .2706694
         mE1 |       200     .005763    .4287003  -.9289918   .9571021
         mE2 |       200    .0143965    .4210301   -.367864   .9315748
-------------+--------------------------------------------------------
         mE3 |       200   -.0135594    .3065698  -.2706694   .9688674
.
. * Gives Rows 1-5 of Table 8.1 (The CMxnoscores are not reported)
. di "CMA: " CMA " p-value: " chi2tail(1,CMA)
CMA: 3.2717711 p-value: .07048149
. di "CMB: " CMB " p-value: " chi2tail(2,CMB)
CMB: 2.4340001 p-value: .29611717
. di "CMC: " CMC " p-value: " chi2tail(2,CMC)
CMC: 2.4340001 p-value: .29611717
. di "CMD: " CMD " p-value: " chi2tail(3,CMD)
CMD: 2.9463051 p-value: .39997818
. di "CME: " CME " p-value: " chi2tail(3,CME)
CME: 2.5005672 p-value: .47518859
. di "CMCnoscores: " CMCnoscores " p-value: " chi2tail(2,CMCnoscores)
CMCnoscores: 2.4069518 p-value: .30014911
. di "CMDnoscores: " CMDnoscores " p-value: " chi2tail(3,CMDnoscores)
CMDnoscores: 2.7344575 p-value: .43440333
.
. ********** FURTHER ANALYSIS gives M** column in Table 8.1 **********
.
. * The following drops the scores from the regression. Provides lower bound.
. * Results are reported in last column in Table 8.1
. quietly regress one mA, noconstant
. di "CMA without scores:" e(N)*e(r2) " with p = " chi2tail(1,e(N)*e(r2))
CMA without scores:.42328231 with p = .51530376
. quietly regress one mB1 mB2, noconstant
. di "CMB without scores:" e(N)*e(r2) " with p = " chi2tail(2,e(N)*e(r2))
CMB without scores:1.8897296 with p = .38873213
. quietly regress one mC1 mC2, noconstant
. di "CMC without scores:" e(N)*e(r2) " with p = " chi2tail(2,e(N)*e(r2))
CMC without scores:2.4069518 with p = .30014911
. quietly regress one mD1 mD2 mD3, noconstant
. di "CMD without scores:" e(N)*e(r2) " with p = " chi2tail(3,e(N)*e(r2))
CMD without scores:2.7344575 with p = .43440333
. quietly regress one mE1 mE2 mE3, noconstant
. di "CME without scores:" e(N)*e(r2) " with p = " chi2tail(3,e(N)*e(r2))
CME without scores:.73842732 with p = .86413036
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma08p1cmtests.txt
log type: text
closed on: 17 May 2005, 14:04:20
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma08p2nonnested.txt
log type: text
opened on: 18 May 2005, 21:27:00
.
. ********** OVERVIEW OF MMA08P2NONNESTED.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 8.5.3 pages 283-4
. * Nonnested model comparison given in Table 8.2:
.
. * (A) AIC AND VARIATIONS
. * (B) VUONG TEST for Overlapping Models
. * for a Poisson model with simulated data (see below).
.
. * This example requires the free Stata add-on command rndpoix.
. * In Stata: search rndpoix
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Dgp is
. * y ~ Poisson[exp(b1 + b2*x2 + b3*x3)]
. * where
. * x2 and x3 are iid ~ N[0,1]
. * and b1=0.5 and b2=0.5 and b3=0.5.
.
. * The Models compared are
. * Poisson of y on x2
. * Poisson of y on x3 and x3^2
.
. set seed 10001
. set obs 100
obs was 0, now 100
. scalar b1 = 0.5
. scalar b2 = 0.5
. scalar b3 = 0.5
.
. * Generate regressors
. gen x2 = invnorm(uniform())
. gen x3 = invnorm(uniform())
. gen x2sq = x2*x2
. gen x3sq = x3*x3
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3)

. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ......... )
Variable xp created.
. gen y = xp
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 x2sq x3sq using mma08p2nonnested.asc, replace
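
The data generation above depends on the add-on command rndpoix and on the old invnorm(uniform()) idiom. For readers without the add-on, the following is a minimal sketch of the same dgp, assuming a more recent Stata release in which the built-in functions rnormal() and rpoisson() are available. This is an assumption about the reader's installation, not something used by the original program, and the random draws will not reproduce the numbers in this log.

```stata
clear
set seed 10001
set obs 100
* Regressors are iid standard normal
gen double x2 = rnormal()
gen double x3 = rnormal()
* Conditional mean with b1 = b2 = b3 = 0.5, as in the program above
gen double mupoiss = exp(0.5 + 0.5*x2 + 0.5*x3)
* Poisson draw with observation-specific mean
gen y = rpoisson(mupoiss)
gen double x2sq = x2*x2
gen double x3sq = x3*x3
summarize y mupoiss
```

Because the random-number functions differ, estimates from data generated this way should be comparable to, but not identical with, those reported in Table 8.2.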
.
. ********* SETUP FOR THIS PROGRAM *********
.
. * Change this if want different regressors
. * Here both models differ from the dgp
. * The Vuong test below assumes that the two models are OVERLAPPING
. global XLISTMODEL1 x2
. global XLISTMODEL2 x3 x3sq
.
. ********* (A) AIC AND VARIATIONS *********
.
. * Stata output from Poisson saves much of this.
. * Also calculate manually.
.
. * The following code can be changed to different models than poisson
. * provided
. * ereturn list yields N = e(N); q = e(k); and LnL = e(ll)
. * We use AIC = -2lnL+2q; BIC = -2lnL+lnN*q; CAIC = -2lnL+(1+lnN)*q
.
. poisson y $XLISTMODEL1
Iteration 0:   log likelihood = -183.43146
Iteration 1:   log likelihood = -183.43146

Poisson regression                                Number of obs   =        100
                                                  LR chi2(1)      =      16.28
                                                  Prob > chi2     =     0.0001
Log likelihood = -183.43146                       Pseudo R2       =     0.0425

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |    .291164    .072311     4.03   0.000     .1494371    .4328909
       _cons |   .6084331   .0752833     8.08   0.000     .4608806    .7559857
------------------------------------------------------------------------------
. estimates store model1
. scalar ll1 = e(ll)
. scalar q1 = e(k)
. scalar N1 = e(N)
. scalar aic1 = -2*ll1 + 2*q1
. scalar bic1 = -2*ll1 + ln(N1)*q1
. scalar caic1 = -2*ll1 + (1 + ln(N1))*q1
.
. poisson y $XLISTMODEL2
Iteration 0:   log likelihood = -176.09611
Iteration 1:   log likelihood = -176.09119
Iteration 2:   log likelihood = -176.09119

Poisson regression                                Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------
. estimates store model2
. scalar ll2 = e(ll)
. scalar q2 = e(k)
. scalar N2 = e(N)
. scalar aic2 = -2*ll2 + 2*q2
. scalar bic2 = -2*ll2 + ln(N2)*q2
. scalar caic2 = -2*ll2 + (1 + ln(N2))*q2
.
. * Display results given in first three rows of Table 8.2 page 284
.
. estimates table model1 model2, stats(N k ll aic bic)

----------------------------------------
    Variable |   model1       model2
-------------+--------------------------
          x2 |  .29116396
          x3 |               .35884118
        x3sq |               .09129986
       _cons |  .60843314    .49265596
-------------+--------------------------
           N |        100          100
           k |          2            3
          ll | -183.43146   -176.09119
         aic |  370.86292    358.18238
         bic |  376.07326    365.99789
----------------------------------------
.
. di "Model 1: " _n "lnL: " ll1 " q: " q1 _n " N: " N1
Model 1:
lnL: -183.43146 q: 2
N: 100
. di "-2lnL: " -2*ll1 _n "AIC: " aic1 _n " BIC: " bic1 _n "caic: " caic1
-2lnL: 366.86292
AIC: 370.86292
BIC: 376.07326
caic: 378.07326
.
. di "Model 2: " _n "lnL: " ll2 " q: " q2 _n " N: " N2
Model 2:
lnL: -176.09119 q: 3
N: 100
. di "-2lnL: " -2*ll2 _n "AIC: " aic2 _n " BIC: " bic2 _n "caic: " caic2
-2lnL: 352.18238
AIC: 358.18238
BIC: 365.99789
caic: 368.99789
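
Since smaller values of each criterion are preferred, the comparison reduces to the sign of the difference between the two models. The sketch below recomputes those differences from the log-likelihoods, parameter counts and sample size displayed above; the scalar names dAIC, dBIC and dCAIC are hypothetical.

```stata
* Log-likelihoods, parameter counts and N copied from the output above
scalar ll1  = -183.43146
scalar ll2  = -176.09119
scalar q1   = 2
scalar q2   = 3
scalar Nobs = 100
* Model 2 minus model 1; a negative value favours model 2
scalar dAIC  = (-2*ll2 + 2*q2)              - (-2*ll1 + 2*q1)
scalar dBIC  = (-2*ll2 + ln(Nobs)*q2)       - (-2*ll1 + ln(Nobs)*q1)
scalar dCAIC = (-2*ll2 + (1+ln(Nobs))*q2)   - (-2*ll1 + (1+ln(Nobs))*q1)
display "dAIC: " dAIC "  dBIC: " dBIC "  dCAIC: " dCAIC
```

All three differences are negative here, so each criterion favours model 2, with the penalty for the extra parameter growing from AIC to BIC to CAIC.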
.
. ********* (B) VUONG TEST FOR OVERLAPPING MODELS *********
.
. * The test has three variants
. * (1) Nested models: G is contained in F
. * (2) Strictly non-nested models: F intersection G equals null set
. * (3) Overlapping models: F intersection G does not equal null set
.
. * Need to compute lnf(y) for models 1 and 2,
. * where density f is model 1 and density g is model 2
.
. * The procedures will vary with model. Here use Poisson.
.
. * (0) COMPUTE THE LR TEST STATISTIC
.
. * This is LR = Sum_i [ ln (fy1_i / gy2_i) ]
. *            = Sum_i lnfy1_i - Sum_i lngy2_i
. *            = difference in log-likelihood for the two models
.
. * Easiest if program output gives logL
. * Otherwise need to generate manually
.
. quietly poisson y $XLISTMODEL1
. scalar llf = e(ll)
. quietly poisson y $XLISTMODEL2
. scalar llg = e(ll)
. scalar LR = llf - llg
. di "LR = " LR " and llf = " llf " llg = " llg
LR = -7.3402698 and llf = -183.43146 llg = -176.09119
.
. * (1) NESTED MODELS
.
. * Not done here as not relevant for the example of this application.
.
. * (1A) Usual LR test if assume densities correctly specified.
.
. * (1B) If instead want robustified version then need to compute W
. * and use the weighted chi-square test.
. * This is not the appropriate test here,
. * but in 3(A) below W is computed and a weighted chi-square test used.
. * This code could be easily adapted to here.
.
. * (2) STRICTLY NON-NESTED MODELS
.
. * Not done here as not relevant for the example of this application.
. * Test uses LR/what ~ normal where what is computed in 3(B) below.
.
. * (3) OVERLAPPING MODELS
.
. * This is the relevant test here
. * First test whether overlapping (even though here know that they are)
. * Then do the test
.
. * (3A-1) Compute what^2
.
. * Calculate what^2
. * = (1/N)*Sum_i {[ln(fy1_i/gy2_i)]^2} - [(1/N)*Sum_i ln(fy1_i/gy2_i)]^2
. * = (1/N) * Sum_i [(ln(fy1_i) - ln(gy2_i))^2] - (LR/N)^2
.
. * For the Poisson
. *    f(y) = exp(-mu)*mu^y/y!
. * so lnf(y) = -mu + y*ln(mu) - ln(y!)
. quietly poisson y $XLISTMODEL1
. predict yhatf
(option n assumed; predicted number of events)
. * Poisson default predict gives yhat = exp(x'b)
. gen lnf = -yhatf + y*ln(yhatf) - lnfact(y)
. quietly poisson y $XLISTMODEL2
. predict yhatg
(option n assumed; predicted number of events)
. gen lng = -yhatg + y*ln(yhatg) - lnfact(y)
. gen lnratiosq = (lnf-lng)^2
. sum lnratiosq
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   lnratiosq |       100    .6967792    1.816804   .0000331   13.85592
. scalar whatsq = r(sum)/_N - (LR/_N)^2
. scalar Nwhatsq = _N*whatsq
. di "First-stage test statistic whatsq - still need to find critical value"
First-stage test statistic whatsq - still need to find critical value
. di "N*omegahatsq = " Nwhatsq
N*omegahatsq = 69.139128
.
. * Aside: Check by recomputing LR this long way
. gen lnratio = (lnf-lng)
. sum lnratio
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     lnratio |       100   -.0734027    .8356883  -3.722355   2.571382
. scalar LRcheck = r(sum)
.
. *** Display results given in second last row of Table 8.2 page 284
.
. di "LR = " LR " and LRcheck = " LRcheck
LR = -7.3402698 and LRcheck = -7.3402702
.
. * (3A-2) Find the critical value by first finding W, then its eigenvalues lamda, then simulating
.
. * Calculate estimate of the W matrix on page ?? of Vuong.
. * (a) Can estimate Af = E[d2lnf(y)/dbdb'] as inverse of usual ML variance matrix
. * (b) Since the robust ML variance matrix is V = Ainv*B*Ainv
. * can estimate Bf = -E[dlnf(y)/db x dlnf(y)/db'] by A*V*A where A is in (a)
. * (c) For Ag same as in part (a) except for model g
. * (d) For Bg same as in part (b) except for model g
. * (e) The only tricky bit is computation of Bfg
.
. gen one = 1
. * (a) Af
. quietly poisson y one $XLISTMODEL1, noconstant
. matrix Af = syminv(e(V))
. * (b) Bf
. quietly poisson y one $XLISTMODEL1, noconstant robust
. * robust gives Ainv*B*Ainv so pre and post multiply by A gives B
. * Also make an adjustment as Stata divides by (_N-1). Here use _N.
. matrix Bf = Af*e(V)*Af*(_N-1)/_N
. * (c) Ag
. quietly poisson y one $XLISTMODEL2, noconstant
. matrix Ag = syminv(e(V))
. * (d) Bg
. quietly poisson y one $XLISTMODEL2, noconstant robust
. matrix Bg = Ag*e(V)*Ag*(_N-1)/_N
.
. * (e) Bfg requires more specialized code peculiar to this example
. * For Poisson dlnf(y)/db = Sum_i (y_i - mu_i)*x_i
. * so Bfg = (1/N)*Sum_i [(y_i - muf_i)*xf_i]*[(y_i - mug_i)*xg_i]'
. * For model 1 x is intercept and x2 (global XLISTMODEL1 x2)
. gen bf1 = (y - yhatf)      /* yhatf saved earlier; bf1 = y - muf */
. gen bf2 = (y - yhatf)*x2
. * For model 2 x is intercept, x3 and x3sq (global XLISTMODEL2 x3 x3sq)
. gen bg1 = (y - yhatg)      /* yhatg saved earlier; bg1 = y - mug */
. gen bg2 = (y - yhatg)*x3
. gen bg3 = (y - yhatg)*x3sq
. * Create Bfg
. matrix accum BfBg = bf1 bf2 bg1 bg2 bg3, noconstant
(obs=100)
. * and Bfg is the (1,2) submatrix: rows 1 to 2 and columns 3 to 5
. matrix Bfg = BfBg[1..2,3..5]
.
. * Form the matrix W
. * Note there is no need for minus sign as A has been defined as -A
. matrix W11 = Bf*syminv(Af)
. matrix W12 = Bfg*syminv(Ag)
. matrix W21 = Bfg'*syminv(Af)
. matrix W22 = Bg*syminv(Ag)
. matrix W = W11,W12\W21,W22
. matrix list W
W[5,5]
                 y:          y:          y:          y:          y:
               one          x2         one          x3        x3sq
 y:one   1.5571072   .01745302   1.3738479   .03868485   -.1702893
  y:x2   .05110494   1.4484966   .61074273   .07847014  -.15039712
   bg1   1.1488275    .1064062   1.6030095    .0647251  -.18944561
   bg2   .39558125   .08428705   .20709641   1.0650899  -.05677421
   bg3   1.1180355   -.0564763   .19914593   .07617139   .90718177

.
. * Calculate the eigenvalues of W
. matrix eigenvalues reigvalW ceigvalW = W
. * Real eigenvalues
. matrix list reigvalW
reigvalW[1,5]
              y:         y:         y:         y:         y:
            one         x2        one         x3       x3sq
real  2.7511946  .29082285  1.4750881  1.0021719  1.0616075
. * Complex eigenvalues - hopefully none
. matrix list ceigvalW

ceigvalW[1,5]
            y:    y:    y:    y:    y:
           one    x2   one    x3  x3sq
complex      0     0     0     0     0
.
. * This gives the vector lamda of eigenvalues of W
. matrix lamda = reigvalW
. scalar l1 = lamda[1,1]
. scalar l2 = lamda[1,2]
. scalar l3 = lamda[1,3]
. scalar l4 = lamda[1,4]
. scalar l5 = lamda[1,5]
.
. * Now obtain the p-value and critical value at level 0.05
. preserve
. * Obtain the 5 percent critical value by simulating 10000 draws from
. * M_p+q(lamda) = Sum_j lamda_j*z_j^2 where z_j are N[0,1] so z_j^2 are chi-square(1)
. set seed 10101
. set obs 10000
obs was 100, now 10000
. gen randomdraw = l1*invnorm(uniform())^2 + l2*invnorm(uniform())^2 + /*
> */ l3*invnorm(uniform())^2 + l4*invnorm(uniform())^2 + l5*invnorm(uniform())^2
. gen indicator = Nwhatsq >= randomdraw
. quietly sum indicator
. di "p-value for the Omegahatsq test = " 1-r(mean)
p-value for the Omegahatsq test = 0
. sum randomdraw, detail
                         randomdraw
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .6438425       .0756691
 5%     1.286375       .1250253
10%     1.850972       .1326376       Obs               10000
25%     3.137835       .1402145       Sum of Wgt.       10000

50%     5.359223                      Mean           6.614841
                        Largest       Std. Dev.       4.90562
75%     8.751276       38.32291
90%      12.8871       38.75208       Variance       24.06511
95%     16.10237       40.94431       Skewness       1.733549
99%     23.85304       44.08449       Kurtosis       7.514808

. di "Reject overlapping at level .05 if N*omegahatsq exceeds " r(p95)
Reject overlapping at level .05 if N*omegahatsq exceeds 16.102374
. restore
. di "where N*omegahatequals " Nwhatsq
where N*omegahatequals 69.139128
. di "If reject then continue to second step."
If reject then continue to second step.
. di "Otherwise stop as cannot determine whether models are overlapping."
Otherwise stop as cannot determine whether models are overlapping.
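
The first-stage critical value comes from simulating a weighted sum of independent chi-square(1) variables, with the eigenvalues of W as weights. The sketch below repeats that simulation in a self-contained form, assuming the five eigenvalues listed in reigvalW above and a Stata release in which rnormal() is built in (an assumption; the original uses invnorm(uniform())). The variable and scalar names are hypothetical, and the simulated quantiles will differ slightly from the logged ones because the random draws differ.

```stata
clear
set seed 10101
set obs 10000
* Weighted sum of five independent chi-square(1) draws,
* weights are the eigenvalues of W copied from the log
gen double draw = 2.7511946*rnormal()^2 + .29082285*rnormal()^2 ///
                + 1.4750881*rnormal()^2 + 1.0021719*rnormal()^2 ///
                + 1.0616075*rnormal()^2
quietly summarize draw, detail
scalar simmean = r(mean)
scalar crit05  = r(p95)
* p-value = share of simulated draws at least as large as N*omegahatsq
quietly count if draw >= 69.139128
scalar pval = r(N)/_N
display "5% critical value: " crit05 "   p-value: " pval
```

A quick sanity check on any such simulation is that the mean of the draws should be close to the sum of the eigenvalues, since each squared standard normal has mean one.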
.
. * (3B) Do the second stage test if reject at (3A)
. gen TLR = (LR/sqrt(whatsq))/sqrt(_N)
.
. *** Display results given in second last row of Table 8.2 page 284
.
. di "TLR is N[0,1]. Here TLR = " TLR
TLR is N[0,1]. Here TLR = -.88277513
. di "Two-tailed test p-value: " chi2tail(1,TLR^2)
Two-tailed test p-value: .37735778
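
The second-stage statistic can be checked by hand, because (LR/sqrt(whatsq))/sqrt(N) is the same as LR divided by the square root of N*whatsq. The sketch below assumes the values of LR and N*omegahatsq displayed earlier in this log and uses the function normal(), which is an assumption about the Stata release (older releases named it norm()). The scalar names are hypothetical.

```stata
* Values copied from the log
scalar LRstat  = -7.3402698
scalar Nwhatsq = 69.139128
* TLR = LR / sqrt(N * omegahat^2)
scalar TLRchk  = LRstat/sqrt(Nwhatsq)
* Two-sided p-value from the standard normal;
* equivalent to chi2tail(1, TLR^2) as used above
scalar pTLR    = 2*normal(-abs(TLRchk))
display "TLR = " TLRchk "   two-sided p-value = " pTLR
```

The sign of TLR indicates the direction of the comparison: a negative value points toward model 2 (density g), although here the statistic is well inside conventional critical values.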
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma08p2nonnested.txt
log type: text
closed on: 18 May 2005, 21:27:00
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma08p3diagnostics.txt
log type: text
opened on: 17 May 2005, 14:10:13
.
. ********** OVERVIEW OF MMA08P3DIAGNOSTICS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 8.7.3 pages 290-1
. * Model diagnostics example (Table 8.3)
.
. * (A) DIFFERENT R-SQUAREDS
. * (B) CALCULATION OF RESIDUALS
. * for a Poisson model with simulated data (see below).
.
. * The data generation requires free Stata add-on command rndpoix
. * In Stata: search rndpoix
.
. * This program gives results for model 2
. * For model 1 need to rerun with only x3 as regressor
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Model is
. * y ~ Poisson[exp(b1 + b2*x2 + b3*x3)]
. * where
. * x2 and x3 are iid ~ N[0,1]
. * and b1=0.5 and b2=0.5 and b3=0.5.
.
. * The Diagnostics below are from Poisson regression of y on x3 alone
. * or from Poisson regression of y on x3 and x3sq. [Note: x2 is omitted]
.
. set seed 10001
. set obs 100
obs was 0, now 100
. scalar b1 = 0.5
. scalar b2 = 0.5
. scalar b3 = 0.5
.
. * Generate regressors
. gen x2 = invnorm(uniform())

. gen x3 = invnorm(uniform())
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3)
. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ......... )
Variable xp created.
. gen y = xp
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       100    .0053689    1.000686  -2.173506   2.106561
          x3 |       100   -.0235884    1.024207  -2.857666   2.149822
     mupoiss |       100    2.020511    1.400564   .3380426   7.029678
          xp |       100        1.92    1.835013          0          8
           y |       100        1.92    1.835013          0          8
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 using mma08p3diagnostics.asc, replace
.
. ********* SETUP FOR THIS PROGRAM **********
.
. * Change this if want different regressors
. gen x3sq = x3*x3
. * global XLIST x3          /* Model 1 */
. global XLIST x3 x3sq /* Model 2 */
.
. ********* R-SQUARED (reported in Table 8.3 p.291) **********
.
. * The following code can be changed to different models than poisson
. * For RsqRES, RsqEXP and RsqCOR need
. *   y        dependent variable
. *   yhat     predicted value of dependent variable
. * For RsqWRSS additionally need
. *   sigmasq  predicted variance of dependent variable
. * For RsqRG need log density evaluated at values given below
.
. * Obtain exp(x'b) Will vary with the model
. poisson y $XLIST
Iteration 0:   log likelihood = -176.09611
Iteration 1:   log likelihood = -176.09119
Iteration 2:   log likelihood = -176.09119

Poisson regression                                Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------
. predict yhat
(option n assumed; predicted number of events)
. scalar dof = e(N)-e(k)
.
. * RsqRES and RsqEXP are R-squared from sums of squares
. * First get TSS, ESS and RSS
. egen ybar = mean(y)
. gen ylessybarsq = (y - ybar)^2
. quietly sum ylessybarsq
. scalar totalss = r(mean)
. gen yhatlessybarsq = (yhat - ybar)^2
. quietly sum yhatlessybarsq
. scalar explainedss = r(mean)
. gen residualsq = (y - yhat)^2
. quietly sum residualsq
. scalar residualss = r(mean)
. * Second compute the R-squareds
. scalar sereg = sqrt(residualss/dof)
. scalar RsqRES = 1 - residualss/totalss
. scalar RsqEXP = explainedss/totalss

.
. * RsqCOR uses sample correlation
. quietly correlate y yhat
. scalar RsqCOR = r(rho)^2
.
. di "standard error of regression: " sereg
standard error of regression: .16620308
. di "totalss: " totalss _n "explainedss: " explainedss _n "residualss: " residualss
totalss: 3.3336
explainedss: .69556676
residualss: 2.6794761
. di "RsqRES: " RsqRES _n "RsqEXP: " RsqEXP _n "RsqCOR: " RsqCOR
RsqRES: .19622149
RsqEXP: .20865333
RsqCOR: .19640666
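
Note that totalss, explainedss and residualss above are stored from r(mean), so they are mean squares rather than sums of squares. The R-squared measures are unaffected, since they are ratios, but the displayed standard error of regression divides a mean by the degrees of freedom. If the usual definition sqrt(RSS/(N-k)) is intended, which is an assumption about the authors' intent, the following sketch shows the corresponding value using the figures printed above (N = 100, k = 3); the scalar names are hypothetical.

```stata
scalar Nobs     = 100
scalar dofchk   = 97               /* N - k with k = 3 for model 2 */
scalar meanres2 = 2.6794761        /* residualss in the log: mean squared residual */
* As displayed in the log: sqrt(mean squared residual / dof)
scalar sereg_log = sqrt(meanres2/dofchk)
* Based on the residual sum of squares instead: sqrt(N * mean / dof)
scalar sereg_sum = sqrt(Nobs*meanres2/dofchk)
display "as logged: " sereg_log "   sum-based: " sereg_sum
```

The two differ by a factor of sqrt(N), which is 10 in this sample.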
.
. * RsqWRSS uses weighted sums of squares
. * First generate estimated variance of y
. * Here for Poisson use fact that variance = mean
. gen sigmasq = yhat
. gen weightedylessybarsq = ((y - ybar)^2) / sigmasq
. quietly sum weightedylessybarsq
. scalar weightedtotalss = r(mean)
. gen weightedresidualsq = ((y - yhat)^2) / sigmasq
. quietly sum weightedresidualsq
. scalar weightedresidualss = r(mean)
. scalar RsqWRSS = 1 - weightedresidualss/weightedtotalss
. di "RsqWRSS: " RsqWRSS
RsqWRSS: .16945018
.
. * RsqRG is from ML. Difficult to generalize beyond LEF models.
. * Need
. * lnL_fit log-likelihood at fitted values (the usual)
. * lnL_0 log-likelihood at intercept only
. * lnL_max log-likelihood at best fit
. quietly poisson y $XLIST

. scalar lnL_fit = e(ll)
. scalar lnL_0 = e(ll_0)
. * The following applies only for Poisson. Differs for other models.
. * lnf(y) = -mu + y*ln(mu) - ln(y!)
. * is maximized at mu = y
. * so compute lnL_max = sum of [-y + y*ln(y) - lny!]
. * Following sets 0*ln0 = 0
. gen ylny = 0
. replace ylny = y*ln(y) if y > 0
(51 real changes made)
. gen lnfyatmax = -y + ylny - lnfact(y)
. quietly sum lnfyatmax
. scalar lnL_max = r(sum)
. scalar RsqRG = (lnL_fit - lnL_0) / (lnL_max - lnL_0)
.
. * RsqQ should only be used for binary and other discrete choice models
. * And definitely use only if lnL_fit < 0
. scalar RsqQ = 1 - lnL_fit/lnL_0
.
. di "lnL_0: " lnL_0 _n "lnL_fit: " lnL_fit _n "lnL_max: " lnL_max
lnL_0: -191.57162
lnL_fit: -176.09119
lnL_max: -101.12402
. di "RsqRG: " RsqRG _n "RsqQ: " RsqQ
RsqRG: .17115358
RsqQ: .08080754
.
. * Check
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       100    .0053689    1.000686  -2.173506   2.106561
          x3 |       100   -.0235884    1.024207  -2.857666   2.149822
     mupoiss |       100    2.020511    1.400564   .3380426   7.029678
          xp |       100        1.92    1.835013          0          8
           y |       100        1.92    1.835013          0          8
-------------+--------------------------------------------------------
        x3sq |       100    1.039067    1.446146   .0000877   8.166255
        yhat |       100        1.92     .838208   1.150405   5.398193
        ybar |       100        1.92           0       1.92       1.92
 ylessybarsq |       100      3.3336    5.966374      .0064    36.9664
yhatlessyb~q |       100    .6955668    1.572256   4.82e-06   12.09783
-------------+--------------------------------------------------------
  residualsq |       100    2.679476    4.830379   .0000825   36.93972
     sigmasq |       100        1.92     .838208   1.150405   5.398193
weightedyl~q |       100    1.681324    2.560112   .0018502   19.23135
weightedre~q |       100    1.396423    2.424518   .0000276   19.21747
        ylny |       100     2.15694     3.48234          0   16.63553
-------------+--------------------------------------------------------
   lnfyatmax |       100    -1.01124    .6233793  -1.969071          0
. poisson y $XLIST /* Stata Rsq = RsqQ */
Iteration 0:   log likelihood = -176.09611
Iteration 1:   log likelihood = -176.09119
Iteration 2:   log likelihood = -176.09119

Poisson regression                                Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------
.
. *** The following results are for Model 2 in Table 8.3 p.291
. *** For model 1 R-squareds need to rerun with only x3 as regressor
. di "standard error of regression: " sereg
standard error of regression: .16620308
. di "RsqRES: " RsqRES _n "RsqEXP: " RsqEXP _n "RsqCOR: " RsqCOR
RsqRES: .19622149
RsqEXP: .20865333
RsqCOR: .19640666
. di "RsqWRSS: " RsqWRSS _n "RsqRG: " RsqRG _n "RsqQ: " RsqQ
RsqWRSS: .16945018
RsqRG: .17115358
RsqQ: .08080754
.
. ********* RESIDUAL ANALYSIS (text bottom p.290 to top p.291) **********
.
. * Assume that from earlier have yhat
.
. * raw residual
. gen raw = y - yhat
. gen sigma = sqrt(yhat)
. gen Pearson = (y - yhat)/sigma
. * Note that earlier defined ylny = 0 if y=0 and = yln(y) otherwise
. gen deviance = sign(y-yhat)*sqrt(2*(-y+ylny)-2*(-yhat+y*ln(yhat)))
.
. *** The following are results reported in text bottom p.290 to top p.291
. sum raw Pearson deviance
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         raw |       100   -2.38e-09    1.645157  -2.993904   6.077806
     Pearson |       100   -.0014455    1.187656  -1.498094   4.383774
    deviance |       100   -.2103819    1.212345  -2.016939   3.264961
. corr raw Pearson deviance
(obs=100)
             |      raw  Pearson deviance
-------------+---------------------------
         raw |   1.0000
     Pearson |   0.9852   1.0000
    deviance |   0.9625   0.9818   1.0000

. * Example of use to find whether x3 belongs in the model


. * graph twoway scatter Pearson x3
.
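The three residuals generated above (raw, Pearson and deviance, for a Poisson fit with variance equal to the fitted mean) can also be sketched outside Stata. The following is a minimal Python illustration and not part of the original program; the function name poisson_residuals and the example values are assumptions introduced here.

```python
import math


def poisson_residuals(y, yhat):
    """Return (raw, Pearson, deviance) residuals for one observation.

    Mirrors the formulas in the log: sigma = sqrt(yhat) and
    deviance = sign(y - yhat) * sqrt(2*(ylny - y) - 2*(y*ln(yhat) - yhat)),
    with the convention ylny = 0 when y = 0.
    """
    raw = y - yhat
    pearson = raw / math.sqrt(yhat)
    ylny = y * math.log(y) if y > 0 else 0.0
    dev_sq = 2.0 * ((ylny - y) - (y * math.log(yhat) - yhat))
    # Guard against tiny negative values caused by rounding.
    deviance = math.copysign(math.sqrt(max(dev_sq, 0.0)), raw)
    return raw, pearson, deviance


# Hypothetical values, for illustration only (not taken from the data set).
for y, yhat in [(0, 1.0), (3, 2.0), (4, 4.0)]:
    print(y, yhat, poisson_residuals(y, yhat))
```

The deviance residual is smaller in magnitude than the Pearson residual for the positive example here, in line with the smaller maximum reported for deviance in the summary above.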
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma08p3diagnostics.txt
log type: text
closed on: 17 May 2005, 14:10:13

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma09p1np.txt
log type: text
opened on: 17 May 2005, 14:16:51
.
. ********** OVERVIEW OF MMA09P1NP.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 9.2 p.295-297
. * Nonparametric density estimation and nonparametric regression using actual data.
.
. * (1) Histogram: Figure 9.1 in chapter 9.2.1 (ch9hist)
. * (2) Kernel density estimate as bandwidth varies: Figure 9.2 in chapter 9.2.1 (ch9kd1)
. * (3) Kernel density estimate as kernel varies: Figure 9.4 in chapter 9.3.4 (ch9kdensu1)
. * (4) Lowess regression: Figure 9.3 in chapter 9.4.3 (ch9ksm1)
. * (5) Extra: Nearest neighbours regression: using Lowess and using add-on knnreg
. * (6) Extra: Kernel regression: using add-on kernreg
.
. * using data on earnings and education (see below)
.
. * NOTE: This particular program uses version 8.2 rather than 8.0
. *   For kernel density Stata uses an alternative formulation of Epanechnikov
. *   To follow book and e.g. Hardle (1990) use epan2 rather than epan
. *   epan = epan2 if epan bandwidth is epan2 bandwidth divided by sqrt(5)
. *   where kernel epan2 is an update to Stata version 8.2
.
. * To run this program you need file
. * psidf3050.dat
. * in your directory
.
. * To do (5) and (6) you need Stata add-ons knnreg and kernreg
. * In Stata give command search knnreg and search kernreg
.
. * See also mma09p2npmore.do for more on nonparametric regression (Figures 9.5-9.7)
.
. ********** SETUP
.
. di "mma09p1np.do Cameron and Trivedi: Stata nonparametrics with wages and education"
mma09p1np.do Cameron and Trivedi: Stata nonparametrics with wages and education
. set more off
. version 8
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION
.*
. * The original data are from the PSID Individual Level Final Release 1993 data
. * From www.isr.umich.edu/src/psid then choose Data Center
. * 4856 observations on 9 variables for Females 30 to 50 years
.
. * Fixed width data
. * intnum 1-4 V30001="1968 INTERVIEW NUMBER"
. * persnum 5-7 V30002="PERSON NUMBER"
. * age      8-9   V30809="AGE OF INDIVIDUAL 93"
. * educatn  10-11 V30820="G90 HIGHEST GRADE COMPLETED 93"
. * earnings 12-17 V30821="TOTAL LABOR INCOME 93"
. * hours    18-21 V30823="1992 ANNUAL WORK HOURS 93"
. * sex      22    V32000="SEX OF INDIVIDUAL"
. * kids     23-24 V32022="# LIVE BIRTHS TO THIS INDIVIDUAL"
. * [NOTE: DO NOT USE THE kids VARIABLE AS IT IS NUMBER OF BIRTHS
. *   NOT NUMBER OF KIDS CURRENTLY IN HOUSEHOLD]
. * married 25 V32049="LAST KNOWN MARITAL STATUS"
.
. ********** READ DATA **********
.
. * Data are fixed format so use infix
. infix intnum 1-4 persnum 5-7 age 8-9 educatn 10-11 earnings 12-17 /*
> */ hours 18-21 sex 22 kids 23-24 married 25 using psidf3050.dat
(4856 observations read)
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      intnum |      4856    4598.101    2761.971          4       9306
     persnum |      4856    59.21355    79.74856          1        205
         age |      4856    38.46293    5.595116         30         50
     educatn |      4855    16.37714     18.4495          0         99
    earnings |      4856    14244.51    15985.45          0     240000
-------------+--------------------------------------------------------
       hours |      4856    1235.335    947.1758          0       5160
         sex |      4856           2           0          2          2
        kids |      4856     4.48126    14.88786          0         99
     married |      4856    1.920717    1.504848          1          9
.
. ********** MISSING VALUES, DATA TRANSFORMATIONS and SAMPLE SELECTION
.
. * For Highest grade codes the missing codes are 98 DK and 99 NA and 0 inappropriate
. * Here treat these as missing
. replace educatn = . if (educatn==0 | educatn==98 | educatn==99)
(290 real changes made, 290 to missing)

.
. * For marital status the codes are
. * 1 married; 2 Never married; 3 Widowed; 4 Divorced, annulment;
. * 5 Separated; 8 NA / DK; 9 No histories 85-93
. * Recode 2-5 as not married and treat 8 and 9 as missing
. replace married = . if (married==8 | married==9)
(52 real changes made, 52 to missing)
. replace married = 0 if married > 1
(1785 real changes made)
.
. * For kids the missing codes are 98 DK/NA and 99 no birth history
. replace kids = . if (kids==98 | kids==99)
(118 real changes made, 118 to missing)
. * But do not use these data as it is number of births
. * not number of kids currently in household
. * So I drop kids
. drop kids
.
. * Work with positive earnings only
. drop if earnings==0
(1204 observations deleted)
. * Topcode women with very high earnings
. replace earnings=100000 if earnings>100000
(11 real changes made)
. * Create log hourly wage
. gen hwage = earnings/hours
. gen lnhwage = ln(hwage)
.
. * Work with age 36 and nonmissing education data
. keep if age == 36
(3468 observations deleted)
. drop if educatn == .
(7 observations deleted)
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      intnum |       177    4699.853    2765.081         14       9240
     persnum |       177    59.53672    79.73001          1        188
         age |       177          36           0         36         36
     educatn |       177    12.58757    2.841347          3         17
    earnings |       177    17470.55    13513.56         87      70000
-------------+--------------------------------------------------------
       hours |       177    1506.401    698.4145          8       3160
         sex |       177           2           0          2          2
     married |       177    .7457627    .4366669          0          1
       hwage |       177    12.71631    16.58889   .6837607        175
     lnhwage |       177    2.198163    .8281614  -.3801473   5.164786
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile intnum persnum age educatn earnings hours sex married hwage /*
> */ lnhwage using mma09p1np.asc, replace
.
. ********* ANALYSIS: (1)-(3) NONPARAMETRIC DENSITY ESTIMATES
.
. set scheme s1mono
.
. * Here give bin width for histogram and kdensity
.
. * Calculate Silverman's plugin estimate of optimal bandwidth in (9.13)
. * with delta given in Table 9.1 for Epanechnikov kernel
. quietly sum lnhwage, detail
. global sadj = min(r(sd),(r(p75)-r(p25))/1.349)
. di "sadj: " $sadj " iqr/1.349: " (r(p75)-r(p25))/1.349 " stdev: " r(sd)
sadj: .65488184 iqr/1.349: .65488184 stdev: .82816143
. global bwepan2 = 1.3643*1.7188*$sadj/(r(N)^0.2)
. di "Bandwidth: " $bwepan2
Bandwidth: .54538542
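The plug-in bandwidth computed above is h = 1.3643 * delta * N^(-0.2) * min(s, iqr/1.349), with delta depending on the kernel. As a rough cross-check outside Stata, the arithmetic can be sketched in Python. The function name silverman_bandwidth and the dictionary DELTA are names introduced here for illustration; the delta constants are the ones used in this program (attributed there to Table 9.1), and the inputs are the values printed in this log.

```python
# Kernel-specific constants delta, as used in this program.
DELTA = {
    "epanechnikov": 1.7188,
    "gaussian": 0.7764,
    "quartic": 2.0362,
}


def silverman_bandwidth(sd, iqr, n, delta):
    """Plug-in bandwidth h = 1.3643 * delta * n^(-0.2) * min(sd, iqr/1.349)."""
    sadj = min(sd, iqr / 1.349)
    return 1.3643 * delta * sadj * n ** (-0.2)


# Values printed in the log: stdev = .82816143, iqr/1.349 = .65488184, N = 177.
sd = 0.82816143
iqr = 0.65488184 * 1.349
for name, delta in DELTA.items():
    print(name, round(silverman_bandwidth(sd, iqr, 177, delta), 6))
```

For the Epanechnikov constant this gives approximately 0.5454, matching the bandwidth displayed above. The uniform kernel is left out because the program applies an extra factor of 0.5 to that bandwidth, which this sketch does not reproduce.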
.
. * HISTOGRAM ONLY - Figure 9.1
. graph twoway (histogram lnhwage, bin(20) bcolor(*.2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Histogram for Log Wage") /*
> */ xtitle("Log Hourly Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(10) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Histogram") label(2 "Kernel"))
. graph save ch9hist, replace
(file ch9hist.gph saved)
. graph export ch9hist.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9hist.wmf written in Windows Metafile format)

.
. * COMBINED HISTOGRAM AND KERNEL DENSITY ESTIMATE
. graph twoway (histogram lnhwage, bin(20) bcolor(*.2)) /*
> */ (kdensity lnhwage, width($bwepan2) epan2 clstyle(p1)), /*
> */ title("Histogram and Kernel Density for Log Wage") /*
> */ caption("Note: Kernel is Epanechnikov with bandwidth 0.55")
.
. * KERNEL DENSITY ESTIMATE FOR 3 BANDWIDTHS - Figure 9.2
. global bwonehalf = 0.5*$bwepan2
. global btwotimes = 2*$bwepan2
. graph twoway (kdensity lnhwage, width($bwonehalf) epan2 clstyle(p2)) /*
> */ (kdensity lnhwage, width($bwepan2) epan2 clstyle(p1)) /*
> */ (kdensity lnhwage, width($btwotimes) epan2 clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Density Estimates as Bandwidth Varies") /*
> */ xtitle("Log Hourly Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Kernel density estimates", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "One-half plug-in") label(2 "Plug-in") /*
> */ label(3 "Two times plug-in"))
. graph save ch9kd1, replace
(file ch9kd1.gph saved)
. graph export ch9kd1.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9kd1.wmf written in Windows Metafile format)
.
. * KERNEL DENSITY ESTIMATE FOR 4 DIFFERENT KERNELS - Figure 9.4
. * Calculate Silverman's plugin optimal bandwidths using (9.13)
. * with delta given in Table 9.1 for the different kernels
.
. * Use sadj calculated earlier for Epanechnikov
. global bwgauss = 1.3643*0.7764*$sadj/(_N^0.2)
. global bwbiweight = 1.3643*2.0362*$sadj/(_N^0.2)
. global bwrectang = 0.5*1.3643*1.3510*$sadj/(_N^0.2)
. di "Usual Epanechnikov (epan2): " $bwepan2
Usual Epanechnikov (epan2): .54538542
. di "Gaussian: " $bwgauss
Gaussian: .24635632
. di "Quartic or biweight: " $bwbiweight
Quartic or biweight: .64609832
. di "Uniform or rectangular: " $bwrectang
Uniform or rectangular: .21434015
. graph twoway (kdensity lnhwage, width($bwepan2) epan2) /*
> */ (kdensity lnhwage, width($bwgauss) gauss) /*
> */ (kdensity lnhwage, width($bwbiweight) biweight) /*
> */ (kdensity lnhwage, width($bwrectang) rectangle), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Density Estimates as Kernel Varies") /*
> */ xtitle("Log Hourly Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Kernel density estimates", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Epanechnikov (h=0.545)") label(2 "Gaussian (h=0.246)") /*
> */ label(3 "Quartic (h=0.646)") label(4 "Uniform (h=0.214)"))
. graph save ch9kdensu1, replace
(file ch9kdensu1.gph saved)
. graph export ch9kdensu1.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9kdensu1.wmf written in Windows Metafile format)
.
. * SHOW THAT STATA EPANECHNIKOV = USUAL EPANECHNIKOV
. * Once divide usual Epanechnikov bandwidth by sqrt(5).
. * (Pagan and Ullah (1999, p.28) have formulae.)
. global bwepan = $bwepan2/sqrt(5)
. graph twoway (kdensity lnhwage, width($bwepan2) epan2) /*
> */ (kdensity lnhwage, width($bwepan) epan), /*
> */ title("Epan = Epan2 if bandwidth adjusted") /*
> */ legend( label(1 "Usual Epanechnikov") label(2 "Stata Epanechnikov"))
.
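The claim that Stata's epan kernel with bandwidth h/sqrt(5) reproduces the usual Epanechnikov kernel (epan2) with bandwidth h can be checked numerically. The following Python sketch is an illustration only. The kernel formulas are the ones coded in mma09p3kernels.do; the function names, the small data list and the evaluation grid are made up here.

```python
import math


def epan2(z):
    """Usual Epanechnikov kernel: (3/4)(1 - z^2) on |z| < 1."""
    return 0.75 * (1.0 - z * z) if abs(z) < 1.0 else 0.0


def epan_stata(z):
    """Alternative formulation: (3/4)(1 - z^2/5)/sqrt(5) on |z| < sqrt(5)."""
    r5 = math.sqrt(5.0)
    return 0.75 * (1.0 - z * z / 5.0) / r5 if abs(z) < r5 else 0.0


def kdensity(x0, data, h, kernel):
    """Kernel density estimate at x0: (1/(N*h)) * sum_i K((x0 - x_i)/h)."""
    return sum(kernel((x0 - xi) / h) for xi in data) / (len(data) * h)


# Hypothetical data and bandwidth, chosen only to exercise the formulas.
data = [1.2, 1.9, 2.0, 2.4, 3.1]
h2 = 0.55
grid = [1.0 + 0.1 * j for j in range(25)]

max_gap = max(
    abs(kdensity(x, data, h2, epan2)
        - kdensity(x, data, h2 / math.sqrt(5.0), epan_stata))
    for x in grid
)
print("largest difference between the two estimates:", max_gap)
```

The two estimates agree up to rounding error, because substituting h/sqrt(5) into the alternative kernel gives back (3/4)(1 - u^2)/h with u = (x0 - x_i)/h on |u| < 1.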
.
. ********* ANALYSIS: (4) LOWESS NONPARAMETRIC REGRESSION ESTIMATES
.
. * LOWESS WITH DEFAULT BANDWIDTH of 0.8
. lowess lnhwage educatn
.
. * LOWESS REGRESSION WITH BANDWIDTHS of 0.1, 0.4 and 0.8 - Figure 9.3
. graph twoway (scatter lnhwage educatn, msize(medsmall) msymbol(o)) /*
> */ (lowess lnhwage educatn, bwidth(0.8) clstyle(p2)) /*
> */ (lowess lnhwage educatn, bwidth(0.4) clstyle(p1)) /*
> */ (lowess lnhwage educatn, bwidth(0.1) clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Nonparametric Regression as Bandwidth Varies") /*
> */ xtitle("Years of Schooling", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log Hourly Wage", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(2)) legend(size(small)) /*
> */ legend( label(1 "Actual data") label(2 "Bandwidth h=0.8") /*
> */ label(3 "Bandwidth h=0.4") label(4 "Bandwidth h=0.1"))
. graph save ch9ksm1, replace
(file ch9ksm1.gph saved)
. graph export ch9ksm1.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9ksm1.wmf written in Windows Metafile format)
.
. ********* ANALYSIS: (5) EXTRA: K-NEAREST NEIGHBORS NONPARAMETRIC REGRESSION
.
. * NEAREST NEIGHBOURS REGRESSION USING LOWESS
. * Use lowess with mean and noweight options to give running means = centered kNN
. global knnbwidth = 0.3
. di "knn via Lowess uses following % of sample: " $knnbwidth
knn via Lowess uses following % of sample: .3
. lowess lnhwage educatn, bwidth($knnbwidth) mean noweight
.
. * LOWESS COMPARED TO NEAREST NEIGHBOURS
. graph twoway (lowess lnhwage educatn, bwidth(0.3) mean noweight) /*
> */ (lowess lnhwage educatn, bwidth(0.3)), /*
> */ title("Centered kNN versus Lowess") /*
> */ legend( label(1 "Centered kNN") label(2 "Lowess 0.3"))
.
. * NEAREST NEIGHBOURS REGRESSION USING KNNREG COMPARED TO USING LOWESS
. * knnreg is a Stata add-on (in Stata search knnreg to find and download)
. * Here we verify that same as lowess knn except knnreg drops endpoints
. global k = round($knnbwidth*_N)
. di "knnreg uses following number of neighbours: " $k
knnreg uses following number of neighbours: 53
. knnreg lnhwage educatn, k($k) gen(knnregpred) ylabel nograph
. lowess lnhwage educatn, bwidth($knnbwidth) gen(knnlowesspred) mean noweight nograph
. * Following shows that the same except knnreg drops endpoints and lowess does not
. sum knnlowesspred knnregpred
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
knnlowessp~d |       177    2.180308    .4522163   1.475512   2.954416
  knnregpred |       125    2.184309    .3412013   1.529874   2.802865
. corr knnlowesspred knnregpred
(obs=125)
             | knnlow~d knnreg~d
-------------+------------------
knnlowessp~d |   1.0000
  knnregpred |   1.0000   1.0000

.
. ********* ANALYSIS: (6) EXTRA: KERNEL NONPARAMETRIC REGRESSION
.
. * KERNEL REGRESSION
. * Kercode 1 = Uniform; 2 = Triangle; 3 = Epanechnikov; 4 = Quartic (Biweight);
. *         5 = Triweight; 6 = Gaussian; 7 = Cosinus
. * bwidth(#) defines width of the weight function window around each grid point.
. * npoint(#) specifies the number of equally spaced grid points over range of x.
. * Here bwidth(3) gives e.g. positive weight from x=4 to x=10 if current x0=7
. kernreg lnhwage educatn, bwidth(3) kercode(3) npoint(100) ylabel gen(kernregpred1 xkernreg)
. graph twoway (lowess lnhwage educatn, bwidth(0.5) clstyle(p2)) /*
> */ (line kernregpred xkernreg, clstyle(p1)), /*
> */ title("Lowess versus kernel regression") /*
> */ legend( label(1 "Lowess") label(2 "Kernreg"))
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma09p1np.txt
log type: text
closed on: 17 May 2005, 14:17:05
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma09p2npmore.txt
log type: text
opened on: 17 May 2005, 14:17:35
.
. ********** OVERVIEW OF MMA09P2NPMORE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 9.4-9.5 (pages 307-19)
. * More on nonparametric regression, including Figures 9.5 - 9.7
.
. * It provides
. * (1) Nonparametric regression
. *     k-nearest neighbors regression: Figure 9.5 in chapter 9.4.2 (ch9ksmma)
. *     Lowess regression: Figure 9.6 in chapter 9.4.3 (ch9ksmlowess)
. *     Kernel regression (using Stata add-on kernreg)
. * (2) Nonparametric derivative estimation
. *     Figure 9.7 in chapter 9.5.5 (ch9kderiv)
. * (3) Cross-validation - still incomplete
. * using generated data (see below)
.
. * See also mma09p1np.do for nonparametric density estimation and regression
.
. * This program uses free Stata add-on command kernreg
. * To obtain in Stata give command search kernreg
.
. ********** SETUP **********
.
. di "mma09p2npmore.do Cameron and Trivedi: Stata nonparametrics with generated data"
mma09p2npmore.do Cameron and Trivedi: Stata nonparametrics with generated data
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** GENERATE DATA **********
.
. * Model is y = 150 + 6.5*x - 0.15*x^2 + 0.001*x^3 + u
. * where u ~ N[0, 25^2]
. *       x = 1, 2, 3, ... , 100
. *       e ~ N[0, 2^2]
.
. set seed 10101
. set obs 100
obs was 0, now 100
. gen u = 25*invnorm(uniform())
. gen x = _n
. gen y = 150 + 6.5*x - 0.15*x^2 + 0.001*x^3 + u
. sum
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           u |       100    2.809606    25.26291  -71.97334   73.59318
           x |       100        50.5    29.01149          1        100
           y |       100    228.5596    35.25377   132.2952   345.5873
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x using mma09p2npmore.asc, replace
.
. ******** PARAMETRIC REGRESSION **********
.
. * OLS regression on cubic polynomial
. gen xsquared = x^2
. gen xcubed = x^3
. reg y x xsquared xcubed
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  3,    96) =   31.15
       Model |  60691.6801     3    20230.56           Prob > F      =  0.0000
    Residual |  62348.2994    96  649.461452           R-squared     =  0.4933
-------------+------------------------------           Adj R-squared =  0.4774
       Total |   123039.98    99  1242.82808           Root MSE      =  25.485

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   6.055295   .9033915     6.70   0.000     4.262077    7.848513
    xsquared |  -.1402283   .0207284    -6.77   0.000    -.1813738   -.0990828
      xcubed |   .0009492   .0001349     7.03   0.000     .0006814    .0012171
       _cons |   155.1521   10.58835    14.65   0.000     134.1344    176.1698
------------------------------------------------------------------------------
. predict ycubic
(option xb assumed; fitted values)
. summarize y ycubic
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           y |       100    228.5596    35.25377   132.2952   345.5873
      ycubic |       100    228.5596    24.75979   161.0681   307.6293
.
. ******** (1) NONPARAMETRIC REGRESSION **********
.
. * K-NEAREST NEIGHBORS REGRESSION - FIGURE 9.5
. * lowess with the mean and noweight options gives running mean = moving average = centered kNN
. * Here _N = 100 so bwidth = 0.05 gives 100*0.05 = 5 nearest neighbours
. graph twoway (scatter y x, msize(medsmall) msymbol(o)) /*
> */ (lowess y x, mean noweight bwidth(0.05) clstyle(p1)) /*
> */ (lfit y x, clstyle(p3)) /*
> */ (lowess y x, mean noweight bwidth(0.25) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("k-Nearest Neighbours Regression as k Varies") /*
> */ xtitle("Regressor x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Dependent variable y", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data") label(2 "kNN (k=5)") /*
> */ label(3 "Linear OLS") label(4 "kNN (k=25)"))
. graph save ch9ksmma, replace
(file ch9ksmma.gph saved)
. graph export ch9ksmma.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9ksmma.wmf written in Windows Metafile format)
.
. * VERIFY THAT kNN SAME AS MOVING AVERAGE
. * Do moving average by hand for k = 5
. gen yma5 = (y[_n-2] + y[_n-1] + y + y[_n+1] + y[_n+2])/5
(4 missing values generated)
. replace yma5 = (y[_n]+y[_n+1]+y[_n+2])/3 if _n==1
(1 real change made)
. replace yma5 = (y[_n-1]+y[_n]+y[_n+1]+y[_n+2])/4 if _n==2
(1 real change made)
. replace yma5 = (y[_n+1]+y[_n]+y[_n-1]+y[_n-2])/4 if _n==99
(1 real change made)
. replace yma5 = (y[_n]+y[_n-1]+y[_n-2])/3 if _n==100
(1 real change made)
. lowess y x, mean noweight bwidth(0.05) nogr gen(yknn5)
. sum yma5 yknn5
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        yma5 |       100    228.6037    26.63323   157.1434   297.4832
       yknn5 |       100    228.6037    26.63323   157.1434   297.4832
.
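The hand-computed moving average above (five points in the interior, four and three points at the two ends) is a centered k-nearest-neighbours estimate with a window that is truncated at the boundaries. A minimal Python sketch of the same rule follows. It assumes the regressor is sorted and equally spaced, as in the generated data; the function name centered_knn and the toy series are introduced here for illustration.

```python
def centered_knn(y, k):
    """Centered running mean with k (odd) neighbours, truncated at the ends.

    For k = 5 this reproduces the hand calculation in the log: the first
    observation averages 3 points, the second averages 4, interior points 5.
    """
    half = (k - 1) // 2
    n = len(y)
    fitted = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        fitted.append(sum(y[lo:hi]) / (hi - lo))
    return fitted


# Toy series (hypothetical), ordered by the regressor.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(centered_knn(y, 5))
```

With unequally spaced or unsorted regressor values the k nearest neighbours are no longer simply the adjacent observations, so this shortcut would not apply.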
. * LOWESS REGRESSION - FIGURE 9.6
. graph twoway (scatter y x, msize(medsmall) msymbol(o)) /*
> */ (lowess y x, bwidth(0.25) clstyle(p1)) /*
> */ (line ycubic x, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Lowess Nonparametric Regression") /*
> */ xtitle("Regressor x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Dependent variable y", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data") label(2 "Lowess (k=25)") /*
> */ label(3 "OLS Cubic Regression") )
. graph save ch9ksmlowess, replace
(file ch9ksmlowess.gph saved)
. graph export ch9ksmlowess.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9ksmlowess.wmf written in Windows Metafile format)
.
. * KERNEL REGRESSION COMPARED TO k NEAREST NEIGHBORS REGRESSION
. * For this artificial example (with equally spaced x)
. * knn = kernel regression using uniform kernel
. * Kercode 1 = Uniform; 2 = Triangle; 3 = Epanechnikov; 4 = Quartic (Biweight);
. *         5 = Triweight; 6 = Gaussian; 7 = Cosinus
. * bwidth(#) defines width of the weight function window around each grid point.
. * npoint(#) specifies the number of equally spaced grid points over range of x.
. * Here bwidth(12) gives e.g. positive weight from x=15 to x=39 if current x=27
. kernreg y x, bwidth(12) kercode(1) npoint(100) ylabel gen(pykernreg xkernreg)
. lowess y x, mean noweight bwidth(0.25) gen(yknn25)
. sum pykernreg yknn25
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
   pykernreg |       100    228.6856    18.75275   181.1579   272.5488
      yknn25 |       100    228.6856    18.75275   181.1578   272.5488
.
. ******** (2) DERIVATIVE ESTIMATION **********
.
. * DERIVATIVE ESTIMATION - FIGURE 9.7
. * Here use Lowess regression
. lowess y x, xlab ylab bwidth(0.25) lowess nogr gen(yplowess)
. * Need to first sort data on regressor if data on regressor are not ordered
. sort x
. gen dydxlowess = (yplowess - yplowess[_n-1])/(x - x[_n-1])
(1 missing value generated)
. * And do the same for the earlier fitted cubic
. gen dydxcubic = (ycubic - ycubic[_n-1])/(x - x[_n-1])
(1 missing value generated)
. graph twoway (line dydxlowess x, clstyle(p1)) /*
> */ (line dydxcubic x, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Nonparametric Derivative Estimation") /*
> */ xtitle("Regressor x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Dependent variable y", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "From Lowess (k=25)") /*
> */ label(2 "From OLS Cubic Regression") )
. graph save ch9kderiv, replace
(file ch9kderiv.gph saved)
. graph export ch9kderiv.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9kderiv.wmf written in Windows Metafile format)
.
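The derivative estimates above are first differences of fitted values divided by first differences of the sorted regressor. A minimal Python sketch of that finite-difference step follows, applied to the noise-free cubic from the data-generating model rather than to the Lowess fit. The function name backward_differences is an assumption introduced here.

```python
def backward_differences(x, f):
    """(f[i] - f[i-1]) / (x[i] - x[i-1]) for sorted x; first entry is missing."""
    slopes = [None]
    for i in range(1, len(x)):
        slopes.append((f[i] - f[i - 1]) / (x[i] - x[i - 1]))
    return slopes


def cubic(x):
    # Conditional mean from the generated-data model in this program.
    return 150 + 6.5 * x - 0.15 * x ** 2 + 0.001 * x ** 3


def cubic_derivative(x):
    return 6.5 - 0.3 * x + 0.003 * x ** 2


x = [float(i) for i in range(1, 101)]
f = [cubic(xi) for xi in x]
slopes = backward_differences(x, f)

# Each difference is close to the analytic derivative at the interval midpoint.
worst = max(abs(slopes[i] - cubic_derivative(x[i] - 0.5)) for i in range(1, 100))
print("largest gap to the midpoint derivative:", worst)
```

As in the log, the first observation has no predecessor and therefore no derivative estimate, which is why 99 rather than 100 values are generated.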
. ******** (3) CROSS-VALIDATION [PRELIMINARY] **********
.
. /* The following does not work.
> I need to figure out use of macros */
.
. forvalues i = 5/25 {
2. scalar bd`i' = 0.01*`i'
3. global bw`i' = bd`i'
4. lowess y x, mean noweight bwidth($bw`i') gen(py`i') nogr
5. gen cv`i' = sum(3/2*(y-py`i')^2)
6. }
. sum
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           u |       100    2.809606    25.26291  -71.97334   73.59318
           x |       100        50.5    29.01149          1        100
           y |       100    228.5596    35.25377   132.2952   345.5873
    xsquared |       100      3383.5    3024.356          1      10000
      xcubed |       100      255025    289320.7          1    1000000
-------------+--------------------------------------------------------
      ycubic |       100    228.5596    24.75979   161.0681   307.6293
        yma5 |       100    228.6037    26.63323   157.1434   297.4832
       yknn5 |       100    228.6037    26.63323   157.1434   297.4832
   pykernreg |       100    228.6856    18.75275   181.1579   272.5488
    xkernreg |       100        50.5    29.01149          1        100
-------------+--------------------------------------------------------
      yknn25 |       100    228.6856    18.75275   181.1578   272.5488
    yplowess |       100    228.6494    25.46305   156.8217   302.5474
  dydxlowess |        99    1.471977     2.20262  -1.953159   6.964434
   dydxcubic |        99    1.480416    2.100452  -.8495026   6.342957
         py5 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
         cv5 |       100    84655.13     34359.8   10940.13   162417.9
         py6 |       100    228.0408    8.046055   217.6967   243.0812
         cv6 |       100    84655.13     34359.8   10940.13   162417.9
         py7 |       100    228.0408    8.046055   217.6967   243.0812
         cv7 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
         py8 |       100    228.0408    8.046055   217.6967   243.0812
         cv8 |       100    84655.13     34359.8   10940.13   162417.9
         py9 |       100    228.0408    8.046055   217.6967   243.0812
         cv9 |       100    84655.13     34359.8   10940.13   162417.9
        py10 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv10 |       100    84655.13     34359.8   10940.13   162417.9
        py11 |       100    228.0408    8.046055   217.6967   243.0812
        cv11 |       100    84655.13     34359.8   10940.13   162417.9
        py12 |       100    228.0408    8.046055   217.6967   243.0812
        cv12 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
        py13 |       100    228.0408    8.046055   217.6967   243.0812
        cv13 |       100    84655.13     34359.8   10940.13   162417.9
        py14 |       100    228.0408    8.046055   217.6967   243.0812
        cv14 |       100    84655.13     34359.8   10940.13   162417.9
        py15 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv15 |       100    84655.13     34359.8   10940.13   162417.9
        py16 |       100    228.0408    8.046055   217.6967   243.0812
        cv16 |       100    84655.13     34359.8   10940.13   162417.9
        py17 |       100    228.0408    8.046055   217.6967   243.0812
        cv17 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
        py18 |       100    228.0408    8.046055   217.6967   243.0812
        cv18 |       100    84655.13     34359.8   10940.13   162417.9
        py19 |       100    228.0408    8.046055   217.6967   243.0812
        cv19 |       100    84655.13     34359.8   10940.13   162417.9
        py20 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv20 |       100    84655.13     34359.8   10940.13   162417.9
        py21 |       100    228.0408    8.046055   217.6967   243.0812
        cv21 |       100    84655.13     34359.8   10940.13   162417.9
        py22 |       100    228.0408    8.046055   217.6967   243.0812
        cv22 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
        py23 |       100    228.0408    8.046055   217.6967   243.0812
        cv23 |       100    84655.13     34359.8   10940.13   162417.9
        py24 |       100    228.0408    8.046055   217.6967   243.0812
        cv24 |       100    84655.13     34359.8   10940.13   162417.9
        py25 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv25 |       100    84655.13     34359.8   10940.13   162417.9
. * Then need to choose the `i' with minimum cv`i'
. * Problem here is that this gives e.g. $bw5 = 5 not 0.05
.
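The loop above is flagged by its authors as not working, and every bandwidth produces identical py and cv columns. What follows is not a repair of that Stata code. It is a separate, minimal Python sketch of what leave-one-out cross-validation for the centered kNN estimator could look like, under the assumption of sorted, equally spaced x. The function names, the unscaled squared-error criterion (the log's 3/2 factor is not reproduced) and the simulated draws are all choices made here; Python's random generator will not reproduce Stata's draws, even with the same seed value.

```python
import random


def loo_knn_prediction(y, i, k):
    """Average of the neighbours of observation i in a centered window of
    width k (odd, k >= 3), leaving out observation i itself."""
    half = (k - 1) // 2
    lo = max(0, i - half)
    hi = min(len(y), i + half + 1)
    neighbours = [y[j] for j in range(lo, hi) if j != i]
    return sum(neighbours) / len(neighbours)


def cv_score(y, k):
    """Sum of squared leave-one-out prediction errors for window width k."""
    return sum((y[i] - loo_knn_prediction(y, i, k)) ** 2 for i in range(len(y)))


# Simulated data from the same model as this program (different random draws).
random.seed(10101)
x = list(range(1, 101))
y = [150 + 6.5 * xi - 0.15 * xi ** 2 + 0.001 * xi ** 3 + random.gauss(0, 25)
     for xi in x]

scores = {k: cv_score(y, k) for k in range(5, 26, 2)}
best_k = min(scores, key=scores.get)
print("k with the smallest cross-validation score:", best_k)
```

Each candidate k gets a single scalar score, and the chosen k is the one with the minimum score, which is the selection step the closing comments in the log describe.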
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma09p2npmore.txt
log type: text
closed on: 17 May 2005, 14:17:43
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma09p3kernels.txt
log type: text
opened on: 18 May 2005, 21:31:55
.
. ********** OVERVIEW OF MMA09P3KERNELS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * This program plots different kernel regression functions
. * This is not included in the book
. * There is no data
.
. * Results:
. * Epanstata is similar to Gaussian kernel. Less peaked than Epanechnikov.
. * Triangular, Quartic, Triweight and Tricubic are similar,
. * and are more peaked than Epanechnikov
. * The fourth order kernels can take negative values.
.
. * NOTE: For kernel density Stata uses an alternative formulation of Epanechnikov
. *       To follow book and e.g. Hardle (1990) use epan2
. *       (available in Stata version 8.2) rather than epan
.
. ********** SETUP **********
.
. di "mma09p3kernels.do Cameron and Trivedi: Stata Kernel Functions"
mma09p3kernels.do Cameron and Trivedi: Stata Kernel Functions
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** GENERATE DATA **********
.
. * Graphs will be for z = -2.5 to 2.5 in increments of 0.02
. set obs 251
obs was 0, now 251
. gen z = -2.52 + 0.02*_n
.
. ********** CALCULATE THE KERNELS **********
.
. * Indicator for |z| < 1
. gen abszltone = 1
. replace abszltone = 0 if abs(z)>=1
(152 real changes made)
.
. gen kuniform = 0.5*abszltone
.
. gen ktriangular = (1 - abs(z))*abszltone
.
. * Stata calls the usual Epanechnikov kernel epan2
. gen kepanechnikov = (3/4)*(1 - z^2)*abszltone
.
. * Stata uses alternative epanechnikov
. gen abszltsqrtfive = 1
. replace abszltsqrtfive = 0 if abs(z)>=sqrt(5)
(28 real changes made)
. gen kepanstata = (3/4)*(1 - (z^2)/5)/sqrt(5)*abszltsqrtfive
.
. gen kquartic = (15/16)*((1 - z^2)^2)*abszltone
.
. gen ktriweight = (35/32)*((1 - z^2)^3)*abszltone
.
. gen ktricubic = (70/81)*((1 - (abs(z))^3)^3)*abszltone
.
. gen kgaussian = normden(z)
.
. gen k4thordergauss = (1/2)*(3-(z^2))*normden(z)
.
. * This is the optimal 4th order - Pagan and Ullah p.57
. gen k4thorderquartic = (15/32)*(3 - 10*z^2 + 7*z^4)*abszltone
.
. sum
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           z |       251           0    1.452033       -2.5        2.5
   abszltone |       251    .3944223    .4897027          0          1
    kuniform |       251    .1972112    .2448514          0         .5
 ktriangular |       251    .1992032    .3058094          0          1
kepanechni~v |       251    .1991833    .2831384          0        .75
-------------+--------------------------------------------------------
abszltsqrt~e |       251    .8884462    .3154457          0          1
  kepanstata |       251     .199203    .1175801          0   .3354102
    kquartic |       251    .1992032    .3209618          0      .9375
  ktriweight |       251    .1992032     .351183          0    1.09375
   ktricubic |       251    .1992032    .3191548          0   .8641976
-------------+--------------------------------------------------------
   kgaussian |       251    .1967985    .1323354   .0175283   .3989423
k4thorderg~s |       251    .2053453    .2297148  -.0327459   .5984134
k4thorderq~c |       251     .199253    .4584096  -.2676096    1.40625
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile z abszltone kuniform ktriangular kepanechnikov abszltsqrtfive /*
> */ kepanstata kquartic ktriweight ktricubic kgaussian /*
> */ k4thordergauss k4thorderquartic using mma09p3kernels.asc, replace
.
. ********** PLOT THE KERNEL FUNCTIONS **********
.
. * Epanstata is similar to Gaussian kernel. Less peaked than Epanechnikov
. graph twoway (line kuniform z) (line kepanechnikov z) (line kepanstata z) /*
> */ (line kgaussian z), title("Four standard kernel functions")
.
. * Triangular, Quartic, Triweight and Tricubic are similar
. * and are more peaked than Epanechnikov
. graph twoway (line ktriangular z) (line kquartic z) (line ktriweight z) /*
> */ (line ktricubic z), title("Four similar kernel functions")
.
. graph twoway (line k4thordergauss z) (line k4thorderquartic z), /*
> */ title("Two fourth order kernel functions")
.
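A quick sanity check on the kernel formulas generated in this program is that each one integrates to one, and that the two fourth-order kernels dip below zero. The Python sketch below copies the formulas from the gen statements above and integrates them with a simple midpoint rule. The dictionary name KERNELS, the helper names and the integration range are choices made here for illustration.

```python
import math


def inside(z, c=1.0):
    """Indicator for |z| < c."""
    return 1.0 if abs(z) < c else 0.0


def normden(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


R5 = math.sqrt(5.0)

KERNELS = {
    "uniform": lambda z: 0.5 * inside(z),
    "triangular": lambda z: (1 - abs(z)) * inside(z),
    "epanechnikov": lambda z: 0.75 * (1 - z ** 2) * inside(z),
    "epanstata": lambda z: 0.75 * (1 - z ** 2 / 5) / R5 * inside(z, R5),
    "quartic": lambda z: (15 / 16) * (1 - z ** 2) ** 2 * inside(z),
    "triweight": lambda z: (35 / 32) * (1 - z ** 2) ** 3 * inside(z),
    "tricubic": lambda z: (70 / 81) * (1 - abs(z) ** 3) ** 3 * inside(z),
    "gaussian": normden,
    "gauss4": lambda z: 0.5 * (3 - z ** 2) * normden(z),
    "quartic4": lambda z: (15 / 32) * (3 - 10 * z ** 2 + 7 * z ** 4) * inside(z),
}


def integrate(kernel, lo=-8.0, hi=8.0, steps=16000):
    """Midpoint-rule approximation to the integral of kernel over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(kernel(lo + (j + 0.5) * width) for j in range(steps))


areas = {name: integrate(k) for name, k in KERNELS.items()}
for name, area in areas.items():
    print(f"{name:13s} integrates to about {area:.4f}")
```

The near-identical means of roughly 0.199 in the summary table above reflect the same property: on a grid of width about 5, a function that integrates to one averages about 0.2.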
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma09p3kernels.txt
log type: text
closed on: 18 May 2005, 21:32:00

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma10p1gradient.txt
log type: text
opened on: 17 May 2005, 14:21:11
.
. ********** OVERVIEW OF MMA10P1GRADIENT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 10.2.4 page 338-9
. * Gradient Method Example (Newton-Raphson)
. * using artificial data
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** ANALYSIS: FIRST SIX ROUNDS OF NR **********
.
. * General Algorithm is
. * b_s+1 = b_s + A_s*g_s
.
. * For the example in section 10.2.4
. * Q(b) = -(1/2N) * Sum_i {(y_i-exp(b))^2}
. *      = -(1/2N) * Sum_i {(y_i)^2 - 2*y_i*exp(b) + exp(b)^2}
. *      = ymean*exp(b) - 0.5*(exp(b))^2 - (1/2N) * Sum_i {(y_i)^2}
.
. * so the gradient vector (here a scalar)
. *   g = dQ_s / db
. *     = (ymean - exp(b))*exp(b)
.
. * and using the Method of scoring variation of Newton-Raphson
. * the weighting matrix (here a scalar)
. *   A_s = Inv [ - E[d^2 Q_s / db^2 ] ]
. *       = Inv [ - E[(ymean - exp(b))*exp(b) - exp(b)*exp(b)] ]
. *       = Inv [ exp(2b) ] since E[ymean - exp(b)] = 0
. *       = exp(-2b)
.
. * Data
. scalar ymean = 2.0

.
. * Starting value
. scalar b_1 = 0.0
.
. * First round
. scalar g_1 = (ymean - exp(b_1))*exp(b_1)
. scalar A_1 = exp(-2*b_1)
. scalar b_2 = b_1 + A_1*g_1
.
. * Second round
. scalar g_2 = (ymean - exp(b_2))*exp(b_2)
. scalar A_2 = exp(-2*b_2)
. scalar b_3 = b_2 + A_2*g_2
.
. * Third round
. scalar g_3 = (ymean - exp(b_3))*exp(b_3)
. scalar A_3 = exp(-2*b_3)
. scalar b_4 = b_3 + A_3*g_3
.
. * Fourth round
. scalar g_4 = (ymean - exp(b_4))*exp(b_4)
. scalar A_4 = exp(-2*b_4)
. scalar b_5 = b_4 + A_4*g_4
.
. * Fifth round
. scalar g_5 = (ymean - exp(b_5))*exp(b_5)
. scalar A_5 = exp(-2*b_5)
. scalar b_6 = b_5 + A_5*g_5
.
. * Sixth round
. scalar g_6 = (ymean - exp(b_6))*exp(b_6)
. scalar A_6 = exp(-2*b_6)
.
. * We also calculate the objective function at each round
. * (ignoring the term - (1/2N) * Sum_i {(y_i)^2} which does not depend on b)
. scalar Q_1 = ymean*exp(b_1) - 0.5*(exp(b_1))^2
. scalar Q_2 = ymean*exp(b_2) - 0.5*(exp(b_2))^2
. scalar Q_3 = ymean*exp(b_3) - 0.5*(exp(b_3))^2
. scalar Q_4 = ymean*exp(b_4) - 0.5*(exp(b_4))^2
. scalar Q_5 = ymean*exp(b_5) - 0.5*(exp(b_5))^2
. scalar Q_6 = ymean*exp(b_6) - 0.5*(exp(b_6))^2
.
. * DISPLAY THE RESULTS GIVEN IN TABLE 10.1 page 339
. di "Round Estimate Gradient Weight Function"
Round Estimate Gradient Weight Function
. di " 1: " b_1 %8.6f " " g_1 %8.6f " " A_1 %8.6f " " Q_1 %8.6f
1: 0 1 1 1.5
. di " 2: " b_2 %8.6f " " g_2 %8.6f " " A_2 %8.6f " " Q_2 %8.6f
2: 1 -1.9524924 .13533528 1.7420356
. di " 3: " b_3 %8.6f " " g_3 %8.6f " " A_3 %8.6f " " Q_3 %8.6f
3: .73575888 -.18171081 .22957678 1.9962098
. di " 4: " b_4 %8.6f " " g_4 %8.6f " " A_4 %8.6f " " Q_4 %8.6f
4: .6940423 -.00358529 .24955284 1.9999984
. di " 5: " b_5 %8.6f " " g_5 %8.6f " " A_5 %8.6f " " Q_5 %8.6f
5: .69314758 -1.602e-06 .2499998 2
. di " 6: " b_6 %8.6f " " g_6 %8.6f " " A_6 %8.6f " " Q_6 %-8.6f
6: .69314718 -3.206e-13 .25 2
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma10p1gradient.txt
log type: text
closed on: 17 May 2005, 14:21:11
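
The six hand-coded rounds in the log above all apply the same recursion b_(s+1) = b_s + A_s*g_s, so they can be written as a loop. The following is a minimal Python sketch, not part of the original Stata program; the function name `scoring_rounds` and its arguments are assumptions made for illustration. It uses the same gradient, weight and objective expressions as the log, with ymean = 2 and starting value 0.

```python
import math

def scoring_rounds(ymean=2.0, b_start=0.0, rounds=6):
    """Return a list of (b, g, A, Q) tuples, one per round.

    g = (ymean - exp(b)) * exp(b)     gradient of Q(b)
    A = exp(-2b)                      method-of-scoring weight
    Q = ymean*exp(b) - 0.5*exp(b)^2   objective, dropping the term free of b
    """
    results = []
    b = b_start
    for _ in range(rounds):
        g = (ymean - math.exp(b)) * math.exp(b)
        A = math.exp(-2.0 * b)
        Q = ymean * math.exp(b) - 0.5 * math.exp(b) ** 2
        results.append((b, g, A, Q))
        b = b + A * g  # update used as the estimate in the next round
    return results

if __name__ == "__main__":
    for s, (b, g, A, Q) in enumerate(scoring_rounds(), start=1):
        print(f"{s}: {b:.8f} {g:.8f} {A:.8f} {Q:.8f}")
```

The estimate settles at ln(2), where exp(b) equals ymean and the gradient is zero.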
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section3\mma11p1boot.txt
log type: text
opened on: 18 May 2005, 15:52:55
.
. ********** OVERVIEW OF MMA11P1BOOT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 11.3 pages 366-368
. * Bootstrap applied to exponential regression model
. * Provides
. * (1) Bootstrap distribution of beta and t-statistic (Table 11.1)
. * (2) Various statistics from bootstrap (pages 366-8)
. * (3) Bootstrap density of the t-statistic (Figure 11.1)
. * using generated data (see below)
.
. * Note: To speed up the program reduce breps - the number of bootstrap replications
. *       But the final program should use many replications
.
. * Note: This program uses ereg which is an old Stata command
. *       superseded by streg, dist(exp)
.
. * Note: For bootstrap see also mma07p4boot.do
. *       which has additional commands / ways to bootstrap
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA **********
.
. * Model is y ~ exponential(exp(a + bx + cz))
. * where x and z are joint normal (0.1,0.1,0.1,0.1,0.5)
. * i.e. means 0.1 and 0.1
. *      sd's 0.1 and 0.1 and correln 0.5 (so correln^2 = .25)
. *      variances 0.01 and 0.01 and covariance 0.005
.
. * Generate data from joint normal
. * Use fact that x is N(0.1, 0.1^2)
. *   and z | x is N(0.1 + .005/.01*(x - .1), .01*.75 = .0075)
. *   so that st dev = sqrt(0.0075) = 0.0866025
.
. set obs 50
obs was 0, now 50
. set seed 10001
. * Generate x and z bivariate normal
. scalar mu1=0.1
. scalar mu2=0.1
. scalar sig1=0.1
. scalar sig2=0.1
. scalar rho=0.5
. scalar sig12=rho*sig1*sig2
. gen x = mu1 + sig1*invnorm(uniform())
. gen muzgivx = mu2+(sig12/(sig1*sig1))*(x-mu1)
. gen sigzgivx = sqrt(sig2*sig2*(1-rho*rho))
. gen z = muzgivx + sigzgivx*invnorm(uniform())
. * To generate y exponential with mean mu=Ey use
. * Integral 0 to a of (1/mu)exp(-x/mu) dx by change of variables
. * = Integral 0 to a/mu of exp(-t)dt
. * = incomplete gamma function P(1,a/mu) in the terminology of Stata
. gen Ey = exp(-2.0+2*x+2*z)
. gen y = Ey*invgammap(1,uniform())
. gen logy = log(y)
.
. * Descriptive Statistics
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |        50    .0935209    .1031485  -.1173506   .2778609
     muzgivx |        50    .0967604    .0515742  -.0086753   .1889304
    sigzgivx |        50    .0866025           0   .0866025   .0866025
           z |        50    .1033014    .0909297  -.0885447   .3137469
          Ey |        50    .2114837     .071719   .0945722   .4314067
-------------+--------------------------------------------------------
           y |        50    .2024206    .2237202   .0005293   .9601147
        logy |        50   -2.282336     1.45494  -7.543878  -.0407026
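
The data-generation steps above (draw x, draw z from its conditional normal given x, then draw y by inverting the exponential cdf) can be sketched in Python as follows. This is an illustration only, not part of the original Stata program: the function name `generate_data` is an assumption, and Python's random number generator will not reproduce the Stata draws even with the same seed value.

```python
import math
import random

def generate_data(n=50, seed=10001, mu1=0.1, mu2=0.1,
                  sig1=0.1, sig2=0.1, rho=0.5):
    """Return n tuples (y, x, z).

    x, z are bivariate normal; y is exponential with mean exp(-2 + 2x + 2z).
    """
    rng = random.Random(seed)
    sig12 = rho * sig1 * sig2                      # covariance of x and z
    sd_z_given_x = math.sqrt(sig2 ** 2 * (1.0 - rho ** 2))
    rows = []
    for _ in range(n):
        x = rng.gauss(mu1, sig1)
        # Conditional mean of z given x for a bivariate normal
        mu_z_given_x = mu2 + (sig12 / sig1 ** 2) * (x - mu1)
        z = rng.gauss(mu_z_given_x, sd_z_given_x)
        mean_y = math.exp(-2.0 + 2.0 * x + 2.0 * z)
        # Inverse-cdf draw: F(y) = 1 - exp(-y/mean), so y = -mean*ln(1 - u)
        u = rng.random()                           # u lies in [0, 1)
        y = -mean_y * math.log(1.0 - u)
        rows.append((y, x, z))
    return rows
```

With a large n the sample correlation of x and z should be close to 0.5, which is a quick check that the conditional-normal construction is coded correctly.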
. ereg y x z
Iteration 0:   log likelihood = -84.246434
Iteration 1:   log likelihood = -80.068104
Iteration 2:   log likelihood = -79.871694
Iteration 3:   log likelihood = -79.871338
Iteration 4:   log likelihood = -79.871338

Exponential regression -- entry time 0
                       log expected-time form      Number of obs   =         50
                                                   LR chi2(2)      =       8.75
Log likelihood = -79.871338                        Prob > chi2     =     0.0126

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .2670543   1.417339     0.19   0.851    -2.510879    3.044988
           z |   4.663384   1.740712     2.68   0.007     1.251652    8.075117
       _cons |  -2.191619   .2328589    -9.41   0.000    -2.648014   -1.735224
------------------------------------------------------------------------------
.
. save mma11p1boot, replace
file mma11p1boot.dta saved
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x z using mma11p1boot.asc, replace
.
. ********** SIMPLE BOOTSTRAP **********
.
. * Stata produces four bootstrap 100*(1-alpha) confidence intervals
. * (N) and (P) have no asymptotic refinement
. * (BC)-(BCA) have asymptotic refinement
. * For details see program mma07p4boot.do
.
. * Change the following for different number of simulations S
. * From page 399, for testing better to use 999 than 1000
. global breps = 999 /* The number of bootstrap reps used below */
.
. set seed 20001
.
. * A simple and adequate bootstrap command for the slope coefficients is
. bs "ereg y x z" "_b[x] _b[z]", reps($breps) level(95)
command:      ereg y x z
statistics:   _bs_1      = _b[x]
              _bs_2      = _b[z]

Bootstrap statistics                              Number of obs    =        50
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .2670543 -.1885509  1.420956   -2.52135   3.055458   (N)
             |                                        -2.9054   2.696445   (P)
             |                                      -2.590993   2.864327  (BC)
       _bs_2 |   999  4.663384  .0524786  1.939086   .8582302   8.468539   (N)
             |                                       .5006047   8.483892   (P)
             |                                        .231034   8.174835  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
.
. ********** MORE DETAILED BOOTSTRAP **********
.
. * The following bootstrap also gives standard error at each replication
. * and saves data from replications for further analysis
.
. * In particular, want to use the percentile-t method,
. * which provides asymptotic refinement
.
. * Stata does not give this. For methods see
. * e.g. Efron and Tibshirani (1993, pp.160-162)
. * e.g. Cameron and Trivedi (2005) Chapter 11.2.6-11.2.7
. * For sample s compute t-test(s) = (bhat(s)-bhat) / se(s)
. * where bhat is initial estimate
. * and bhat(s) and se(s) are for sth round.
. * Order the t-test(s) statistics and choose the alpha/2 percentiles
. * which give the critical values for the t-test
.
. * Implementation requires saving the results from each bootstrap replication
. * in order to obtain critical values from percentiles of bootstrap distribution
.
. use mma11p1boot.dta, clear
.
. * Get and store coefficients (b)
. * for regressors in the original model and data before bootstrap
. quietly ereg y x z
. global bx=_b[x]
. global sex=_se[x]
. global bz=_b[z]
. global sez=_se[z]
. di " Coefficients bx: " $bx " and bz: " $bz
Coefficients bx: .26705432 and bz: 4.6633845
. di " Standard error sex: " $sex " and sez: " $sez
Standard error sex: 1.4173391 and sez: 1.7407119
.
. * Bootstrap and save coeff estimates and se's from each replication
. set seed 20001
. bs "ereg y x z" "_b[x] _b[z] _se[x] _se[z]", reps($breps) level(95) saving(mma11p1bootreps) replace
command:      ereg y x z
statistics:   _bs_1      = _b[x]
              _bs_2      = _b[z]
              _bs_3      = _se[x]
              _bs_4      = _se[z]

Bootstrap statistics                              Number of obs    =        50
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .2670543 -.1885509  1.420956   -2.52135   3.055458   (N)
             |                                        -2.9054   2.696445   (P)
             |                                      -2.590993   2.864327  (BC)
       _bs_2 |   999  4.663384  .0524786  1.939086   .8582302   8.468539   (N)
             |                                       .5006047   8.483892   (P)
             |                                        .231034   8.174835  (BC)
       _bs_3 |   999  1.417339  .0644196  .1718393   1.080131   1.754547   (N)
             |                                       1.234399   1.902349   (P)
             |                                       1.196068   1.742845  (BC)
       _bs_4 |   999  1.740712  .0910103   .186631   1.374478   2.106946   (N)
             |                                       1.542322   2.257937   (P)
             |                                       1.453673   2.058318  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

.
. * Now use the bootstrap estimates
. use mma11p1bootreps, clear
(bootstrap: ereg y x z)
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       _bs_1 |       999    .0785034    1.420956  -9.431229   4.278278
       _bs_2 |       999    4.715863    1.939086  -1.747643   12.09208
       _bs_3 |       999    1.481759    .1718393   1.145421   2.761842
       _bs_4 |       999    1.831722     .186631   1.387625   2.910449
. * Order comes from "_b[x] _b[z] _se[x] _se[z]" in earlier bs
. gen bxs = _bs_1
. gen bzs = _bs_2
. gen sexs = _bs_3
. gen sezs = _bs_4
. gen ttestxs = (bxs - $bx)/sexs
. gen ttestzs = (bzs - $bz)/sezs
.
. ********** (1) TABLE 11.1 (page 367)
.
. summarize bzs ttestzs, d
                             bzs
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.3361366      -1.747643
 5%     1.544816      -1.716207
10%     2.270323      -1.366866       Obs                 999
25%     3.570291      -1.205571       Sum of Wgt.         999

50%      4.77197                      Mean           4.715863
                        Largest       Std. Dev.      1.939086
75%     5.970802       10.10243
90%     7.100958       10.42623       Variance       3.760056
95%     7.810663       10.76733       Skewness      -.1344324
99%     9.426978       12.09208       Kurtosis       3.545415

                           ttestzs
-------------------------------------------------------------
      Percentiles      Smallest
 1%     -2.66391      -3.921595
 5%    -1.727528      -3.483456
10%     -1.32364      -3.201425       Obs                 999
25%    -.6209012      -2.975815       Sum of Wgt.         999

50%     .0618649                      Mean           .0261125
                        Largest       Std. Dev.      1.046855
75%     .7034938       2.693856
90%     1.323415       3.087892       Variance       1.095904
95%      1.70558        3.11692       Skewness      -.1596043
99%     2.529097       3.738328       Kurtosis       3.337749

.
. * Additionally need the 2.5 and 97.5 percentiles not given in summarize, d
.
. * Coefficient of z
. _pctile bzs, p(2.5,97.5)
. di " Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of coeff b for z: .50060469 and 8.4838924
.
. * t-statistic for z
. _pctile ttestzs, p(2.5,97.5)
. di " Lower 2.5 and upper 2.5 percentile of ttest on z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of ttest on z: -2.1827998 and 2.0659592
.
. ********** (2) RESULTS IN TEXT PAGES 366-7 **********
.
. * (2A) Bootstrap standard error estimate (no refinement)
. * These are given earlier in bootstrap table output
. * Equivalently get the standard deviation of bzs
.
. quietly sum bzs
. scalar bzbootse = r(sd)
. di "Bootstrap estimate of standard error: " bzbootse
Bootstrap estimate of standard error: 1.9390864
.
. * (2B) Test b3 = 0 using percentile-t method (asymptotic refinement)
. * Use the 2.5% and 97.5% bootstrap critical values for t-statistic for z
.
. _pctile ttestzs, p(2.5,97.5)
. di " Lower 2.5 and upper 2.5 percentile of ttest on z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of ttest on z: -2.1827998 and 2.0659592
.
. * (2D) 95% confidence interval with asymptotic refinement
. * Use the preceding critical values
.
. scalar lbz = $bz + r(r1)*$sez /* Note the plus sign here */
. scalar ubz = $bz + r(r2)*$sez
. di " Percentile-t interval lower and upper bounds: (" lbz "," ubz ")"
Percentile-t interval lower and upper bounds: (.86375888,8.2596243)
.
. * (2B-Var) Variation for symmetric two-sided test on z
.
. gen absttestzs = abs(ttestzs)
. _pctile absttestzs, p(95)
. di " Upper 5 percentile of symmetric two-sided test on z: " r(r1)
Upper 5 percentile of symmetric two-sided test on z: 2.0775187
.
. * (2C) Test b3 = 0 without asymptotic refinement
. * Usual Wald test except use bootstrap estimate of standard error
.
. scalar Wald = ($bz - 0) / bzbootse
. di "Wald statistic using bootstrap standard error: " Wald
Wald statistic using bootstrap standard error: 2.404939
.
. * (2E) Bootstrap estimate of bias
. * This is given in the earlier bootstrap results table
. * and is explained in the text
.
. ********** (3) FIGURE 11.1 (p.368) PLOTS ESTIMATED DENSITY OF T-STATISTIC FOR Z
.
. set scheme s1mono
. label var ttestzs "Bootstrap t-statistic"
. kdensity ttestzs, normal /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Bootstrap Density of 't-Statistic'") /*
> */ xtitle("t-statistic from each bootstrap replication", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Bootstrap Estimate") label(2 "Standard Normal"))
. graph save ch11boot, replace
(file ch11boot.gph saved)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section3\mma11p1boot.txt
log type: text
closed on: 18 May 2005, 15:53:47
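
The percentile-t steps in the log above (save the estimate and standard error from each replication, form t(s) = (bhat(s) - bhat) / se(s), take the 2.5 and 97.5 percentiles, then build the interval with a plus sign on both ends) can be sketched in Python. This is a hedged illustration, not the authors' code: to stay self-contained it bootstraps a sample mean rather than the ereg coefficient, the function names are assumptions, and the simple interpolated percentile used here need not match Stata's _pctile exactly.

```python
import random
import statistics

def percentile(values, p):
    """Empirical percentile (0 < p < 100) by linear interpolation.
    Assumption: not necessarily the same definition as Stata's _pctile."""
    v = sorted(values)
    pos = (len(v) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (pos - lo) * (v[hi] - v[lo])

def percentile_t_interval(bhat, se, boot_b, boot_se, alpha=0.05):
    """Interval (bhat + t_lo*se, bhat + t_hi*se), the form used in the log,
    where t_lo and t_hi are the alpha/2 and 1 - alpha/2 percentiles of
    t(s) = (bhat(s) - bhat) / se(s) over the bootstrap replications."""
    tstats = [(b - bhat) / s for b, s in zip(boot_b, boot_se)]
    t_lo = percentile(tstats, 100.0 * alpha / 2.0)
    t_hi = percentile(tstats, 100.0 * (1.0 - alpha / 2.0))
    return bhat + t_lo * se, bhat + t_hi * se

def bootstrap_mean(sample, reps=999, seed=20001):
    """Nonparametric bootstrap of the sample mean and its standard error.
    Returns two lists: replicate estimates and replicate standard errors."""
    rng = random.Random(seed)
    n = len(sample)
    boot_b, boot_se = [], []
    for _ in range(reps):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        boot_b.append(statistics.fmean(resample))
        boot_se.append(statistics.stdev(resample) / n ** 0.5)
    return boot_b, boot_se
```

Applied to the values reported in the log (bz = 4.6633845, sez = 1.7407119, critical values -2.1827998 and 2.0659592), the same formula gives the interval shown there.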

-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section3\mma12p1integration.txt
log type: text
opened on: 18 May 2005, 21:17:14
.
. ********** OVERVIEW OF MMA12P1INTEGRATION.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 12.3.3 pages 391-2
. * Computes integral numerically and by simulation
. * (1) Illustrate Midpoint Rule (page 392)
. * (2) Illustrate Monte Carlo integral (Table 12.1 page 392)
.*
. * for computing E[x] and E[exp(-exp(x))] for x ~ N[0,1]
.
. * No data need be read in.
.
. ********** SETUP **********
.
. set more off
. version 8.0
.
. ********** (1) NUMERICAL INTEGRATION USING MIDPOINT RULE **********
.
. * Midpoint rule for n evaluation points between a and b is
. * Integral = Sum (j=1 to n) [(b-a)/n]*f(xbar_j)
. * where xbar_j is midpoint between x_j-1 and x_j
.
. program midpointrule, rclass
1. version 8
2. /* define arguments: neval = number of evaluation points, a and b = limits */
. args neval a b
3. drop _all
4. scalar increment = (`b'-`a') / `neval'
5. set obs `neval'
6. /* Compute the function of interest */
. gen xbar = `a' - 0.5*increment + increment*_n
7. gen density = exp(-xbar*xbar/2)/sqrt(2*_pi)
8. * Following is contribution to E[x] when x ~ N[0,1]
. gen f1xbar = xbar*density
9. * Following is contribution to E[exp(-exp(x))] when x ~ N[0,1]
. gen f2xbar = exp(-exp(xbar))*density
10. /* Compute the averages */
. quietly sum f1xbar
11. scalar Ex = r(sum)*increment
12. quietly sum f2xbar
13. scalar Eexpminexpx = r(sum)*increment
14. /* Print results */
. di "Evaluation points: " `neval' " over range: (" `a' "," `b' ")"
15. di "Midpoint rule estimate of E[x] is: " Ex
16. di "Midpoint rule estimate of E[exp(-exp(x))] is: " Eexpminexpx
17. end
.
. midpointrule 20 -5 5
obs was 0, now 20
Evaluation points: 20 over range: (-5,5)
Midpoint rule estimate of E[x] is: 0
Midpoint rule estimate of E[exp(-exp(x))] is: .38175625
. midpointrule 200 -5 5
obs was 0, now 200
Evaluation points: 200 over range: (-5,5)
Midpoint rule estimate of E[x] is: 0
Midpoint rule estimate of E[exp(-exp(x))] is: .38175618
. midpointrule 2000 -5 5
obs was 0, now 2000
Evaluation points: 2000 over range: (-5,5)
Midpoint rule estimate of E[x] is: 0
Midpoint rule estimate of E[exp(-exp(x))] is: .38175618
.
. ********** (2) MONTE CARLO INTEGRATION USING DRAWS FROM DENSITY OF X **********
.
. * To get E[g(x)]
. * make draws from N[0,1], compute g(x), and average over draws
.
. program simintegration, rclass
1. version 8
2. /* define arguments: nsims = number of simulation draws */
. args nsims
3. /* Generate the data: here x */
. drop _all
4. set obs `nsims'
5. set seed 10101
6. gen x = invnorm(uniform())
7. /* Compute the function of interest */
. gen f1x = x /* For E[x] just need x */
8. gen f2x = exp(-exp(x)) /* For E[exp(-exp(x))] */
9. /* Compute the averages */
. quietly sum f1x
10. scalar Ex = r(mean)
11. quietly sum f2x
12. scalar Eexpminexpx = r(mean)
13. di "Number of simulations: " `nsims'
14. di "Monte Carlo estimate of E[x] is: " Ex
15. di "Monte Carlo estimate of E[exp(-exp(x))] is: " Eexpminexpx
16. end
.
. * Note a different program was used to obtain Table 12.1 on page 392
. * So results will differ somewhat from text, except for very high number of simulations
.
. simintegration 10
obs was 0, now 10
Number of simulations: 10
Monte Carlo estimate of E[x] is: -.10143571
Monte Carlo estimate of E[exp(-exp(x))] is: .42635197
. simintegration 25
obs was 0, now 25
Number of simulations: 25
Monte Carlo estimate of E[x] is: .17496346
Monte Carlo estimate of E[exp(-exp(x))] is: .35703296
. simintegration 50
obs was 0, now 50
Number of simulations: 50
Monte Carlo estimate of E[x] is: .0079132
Monte Carlo estimate of E[exp(-exp(x))] is: .37966293
. simintegration 100
obs was 0, now 100
Number of simulations: 100
Monte Carlo estimate of E[x] is: .11238423
Monte Carlo estimate of E[exp(-exp(x))] is: .3524417
. simintegration 500
obs was 0, now 500
Number of simulations: 500
Monte Carlo estimate of E[x] is: .06990338
Monte Carlo estimate of E[exp(-exp(x))] is: .36137551
. simintegration 1000
obs was 0, now 1000
Number of simulations: 1000
Monte Carlo estimate of E[x] is: .04309113
Monte Carlo estimate of E[exp(-exp(x))] is: .36945581
. simintegration 1000
obs was 0, now 1000
Number of simulations: 1000
Monte Carlo estimate of E[x] is: .04309113
Monte Carlo estimate of E[exp(-exp(x))] is: .36945581
. simintegration 100000
obs was 0, now 100000
Number of simulations: 100000
Monte Carlo estimate of E[x] is: -.00405425
Monte Carlo estimate of E[exp(-exp(x))] is: .38284684
. clear
. set mem 20m
(20480k)
. simintegration 1000000
obs was 0, now 1000000
Number of simulations: 1000000
Monte Carlo estimate of E[x] is: -.00085186
Monte Carlo estimate of E[exp(-exp(x))] is: .38192861
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section3\mma12p1integration.txt
log type: text
closed on: 18 May 2005, 21:17:16
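
The two approaches in the log above, the midpoint rule and Monte Carlo averaging over draws from N[0,1], can be compared in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' program: the function names are hypothetical, and Python's generator will not reproduce the Stata draws.

```python
import math
import random

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def midpoint_rule(g, n_eval, a, b):
    """Approximate the integral of g(x)*phi(x) over (a, b) using n_eval
    midpoints, i.e. Sum_j [(b-a)/n] * f(xbar_j)."""
    h = (b - a) / n_eval
    total = 0.0
    for j in range(1, n_eval + 1):
        xbar = a - 0.5 * h + h * j     # midpoint of the j-th subinterval
        total += g(xbar) * normal_pdf(xbar)
    return total * h

def monte_carlo(g, n_sims, seed=10101):
    """Estimate E[g(x)], x ~ N[0,1], by averaging g over n_sims draws."""
    rng = random.Random(seed)
    return sum(g(rng.gauss(0.0, 1.0)) for _ in range(n_sims)) / n_sims

def g2(x):
    """Integrand of interest: exp(-exp(x))."""
    return math.exp(-math.exp(x))

if __name__ == "__main__":
    print("Midpoint rule, 200 points:", midpoint_rule(g2, 200, -5.0, 5.0))
    print("Monte Carlo, 50000 draws: ", monte_carlo(g2, 50000))
```

As in the log, the midpoint rule is already stable with a small number of evaluation points for this smooth one-dimensional integrand, while the Monte Carlo estimate needs many draws to reach the same accuracy.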
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section3\mma12p2mslmsm.txt
log type: text
opened on: 18 May 2005, 21:46:27
.
. ********** OVERVIEW OF MMA12P2MSLMSM.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 12.4.5 pages 397-8 and 12.5.5 pages 402-4
. * Computes integral numerically and by simulation
. * (1) Maximum Simulated likelihood Table 12.2
. * (2) Method of Simulated Moments Table 12.3
. * with application to generated data
.
. * The application is only illustrative.
. * This is not a template program for MSL or MSM.
.
. * Different number of simulations S lead to different estimators.
. * This program gives entries in Tables 12.2 and 12.3 for S = 100
. * For other values of S change the value of simreps
. * from the current global simreps 100
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** DATA DESCRIPTION **********
.
. * Model is y = theta + u + e
. * where theta is a scalar parameter equal to 1
. *       u is extreme value type 1
. *       e is N(0,1)
. * n is set in global numobs
.
. ********** DEFINE GLOBALS **********
.
. global simreps 100 /* change this to change the number of simulations */
. global numobs 100 /* change this to change the number of observations */
.
.
. ********** (1) MAXIMUM SIMULATED LIKELIHOOD (Table 12.2 p.398) **********
.
. * This MSL program is inefficiently written computer code
. * as it requires drawing the same random variates at each iteration
.
. * Generate data
. clear
. set obs $numobs
obs was 0, now 100
. set seed 10101
. gen u = -log(-log(uniform()))
. gen e = invnorm(uniform())
. gen y = 1 + u + e
. summarize u e y
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |       100    .7236045    1.372637  -1.827296   6.423636
           e |       100    .0415449    .9472174  -2.906972   2.302204
           y |       100    1.765149    1.684177  -2.227185   8.143228
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile u e y using mma12p2mslmsm.asc, replace
.
. * Use the variant ml d0 as this gives the entire likelihood, not just one observation.
. * I want this so that seed is only reset for the entire data.
. * My program is inefficient as variates need to be redrawn at each iteration
. program define msl
1. version 6.0
2. args todo b lnf /* Need to use the names todo b and lnf
> todo always contains 1 and may be ignored
> b is parameters and lnf is log-density */
3. tempvar theta1 /* create as needed to calculate lf, g, ... */
4. mleval `theta1' = `b', eq(1) /* theta1 is theta1_i = x_i'b */
5. local y "$ML_y1" /* create to make program more readable */
6. set seed 10101
7. tempvar denssim
8. global isim=1
9. quietly gen `denssim' = exp(-0.5*(`y'-`theta1'+log(-log(uniform())))^2)/sqrt(2*_pi)
10. while $isim < $simreps {
11. quietly replace `denssim' = `denssim' + exp(-0.5*(`y'-`theta1'+log(-log(uniform())))^2)/sqrt(2*_pi)
12. global isim=$isim+1
13. }
14. mlsum `lnf' = ln(`denssim'/$isim)
15. end
.
. gen one = 1
. ml model d0 msl (y = one, nocons )
. ml maximize
initial:       log likelihood = -216.68168
alternative:   log likelihood = -199.54479
rescale:       log likelihood = -191.09715
Iteration 0:   log likelihood = -191.09715
Iteration 1:   log likelihood =  -190.4391  (not concave)
Iteration 2:   log likelihood = -190.43885
Iteration 3:   log likelihood =  -190.4385
Iteration 4:   log likelihood =  -190.4385

                                                  Number of obs   =        100
                                                  Wald chi2(1)    =      65.72
Log likelihood =  -190.4385                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         one |   1.177456   .1452451     8.11   0.000     .8927806    1.462131
------------------------------------------------------------------------------
.
. *** Display MSL results in one column of Table 12.2 p.398
.
. di "For number of simulations S = " $simreps
For number of simulations S = 100
. di "MSL estimator: " _b[one]
MSL estimator: 1.1774557
. di "Standard error: " _se[one]
Standard error: .14524511
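
The simulated likelihood maximized above replaces the density of y, which integrates over u, by an average of normal densities phi(y_i - theta - u_is) over S simulated extreme value draws. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' method: the function names are hypothetical, the draws are made once and reused at every theta (the log instead resets the seed at each evaluation, which has the same effect), and a grid search followed by golden-section search stands in for Stata's ml optimizer.

```python
import math
import random

def gumbel_draw(rng):
    """Type 1 extreme value draw -ln(-ln(u)), with u strictly inside (0, 1)."""
    u = rng.random()
    while u <= 0.0:
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_loglik(theta, y, u_draws):
    """Sum over i of ln[(1/S) * Sum_s phi(y_i - theta - u_is)]."""
    c = 1.0 / math.sqrt(2.0 * math.pi)
    total = 0.0
    for yi, ui in zip(y, u_draws):
        dens = sum(c * math.exp(-0.5 * (yi - theta - us) ** 2) for us in ui)
        total += math.log(max(dens / len(ui), 1e-300))  # guard against log(0)
    return total

def msl_estimate(y, n_sims=100, seed=10101, lo=-3.0, hi=5.0, step=0.25, tol=1e-6):
    """Maximize the simulated log likelihood over a scalar theta."""
    rng = random.Random(seed)
    # Draw the simulation variates once and hold them fixed across theta
    u_draws = [[gumbel_draw(rng) for _ in range(n_sims)] for _ in y]

    def f(t):
        return simulated_loglik(t, y, u_draws)

    # Coarse grid search to bracket the maximum
    grid = [lo + step * k for k in range(int((hi - lo) / step) + 1)]
    best = max(grid, key=f)
    a, b = best - step, best + step
    # Golden-section search for the maximum inside the bracket
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:              # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
        else:                    # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
    return 0.5 * (a + b)
```

Holding the draws fixed keeps the simulated objective a smooth function of theta, which is why the log resets the seed inside the likelihood evaluator.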
.
. ********** (2) METHOD OF SIMULATED MOMENTS (Table 12.3 p.404) **********
.
. clear
. set obs $numobs
obs was 0, now 100
. set seed 10101
. gen u = -log(-log(uniform()))
. gen e = invnorm(uniform())
. gen y = 1 + u + e
. summarize u e y
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |       100    .7236045    1.372637  -1.827296   6.423636
           e |       100    .0415449    .9472174  -2.906972   2.302204
           y |       100    1.765149    1.684177  -2.227185   8.143228
.
. global isim=1
. gen usim = -log(-log(uniform()))
. gen esim = invnorm(uniform())
. while $isim < $simreps {
2. quietly replace usim = usim-log(-log(uniform()))
3. quietly replace esim = esim+invnorm(uniform())
4. global isim=$isim+1
5. }
. gen usimbar = usim/$isim
. gen esimbar = esim/$isim
. gen theta = y - usimbar - esimbar
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |       100    .7236045    1.372637  -1.827296   6.423636
           e |       100    .0415449    .9472174  -2.906972   2.302204
           y |       100    1.765149    1.684177  -2.227185   8.143228
        usim |       100    57.36345    13.16979   21.96637   90.07499
        esim |       100   -.9702956    11.38655  -26.38858   33.28406
-------------+--------------------------------------------------------
     usimbar |       100    .5736345    .1316979   .2196637   .9007499
     esimbar |       100    -.009703    .1138655  -.2638858   .3328406
       theta |       100    1.201218    1.681435  -2.757669    7.75245
.
. * Results for Table 12.3 on page 404
. * Here the st. error of theta_MSM is approximated by the st. dev. of theta
. * divided by the square root of S (the number of simulations)
. quietly sum theta
. scalar theta_MSM = r(mean)
. scalar approx_sterror = r(sd)/sqrt($simreps)
.
. * Display MSM results in one column of Table 12.3 p.404
. di "For number of simulations S = " $simreps
For number of simulations S = 100
. di "MSM estimator: " theta_MSM
MSM estimator: 1.2012178
. di "Approximate standard error: " approx_sterror
Approximate standard error: .16814348
.
. * As written this will not give the correct standard errors (see p.403).
. * Can get this by also computing the squared rv to get E[y^2]
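
The MSM calculation above averages, over observations, y_i minus the means of S simulated draws of u and e. A minimal Python sketch of that estimator follows, as an illustration only: the function names are assumptions, the draws will not match Stata's, and like the log it does not compute the correct standard error.

```python
import math
import random

def gumbel_draw(rng):
    """Type 1 extreme value draw -ln(-ln(u)), with u strictly inside (0, 1)."""
    u = rng.random()
    while u <= 0.0:
        u = rng.random()
    return -math.log(-math.log(u))

def msm_estimate(y, n_sims=100, seed=10101):
    """theta_MSM = (1/N) * Sum_i [ y_i - ubar_i - ebar_i ], where ubar_i and
    ebar_i are averages of n_sims simulated draws for observation i."""
    rng = random.Random(seed)
    contributions = []
    for yi in y:
        ubar = sum(gumbel_draw(rng) for _ in range(n_sims)) / n_sims
        ebar = sum(rng.gauss(0.0, 1.0) for _ in range(n_sims)) / n_sims
        contributions.append(yi - ubar - ebar)
    return sum(contributions) / len(contributions)

def simulate_sample(n, theta=1.0, seed=10101):
    """Generate y = theta + u + e with u extreme value type 1 and e ~ N(0,1)."""
    rng = random.Random(seed)
    return [theta + gumbel_draw(rng) + rng.gauss(0.0, 1.0) for _ in range(n)]
```

Because the moment condition is linear in theta here, the estimator has a closed form and no numerical optimization is needed.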
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section3\mma12p2mslmsm.txt
log type: text
closed on: 18 May 2005, 21:46:28
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section3\mma12p3draws.txt
log type: text
opened on: 18 May 2005, 21:48:36
.
. ********** OVERVIEW OF MMA12P3DRAWS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 12.8.2 pages 412-5
. * Draws figures that illustrate two common ways to draw random variates
.
. * (1) Illustrate Inverse Transformation method: Figure 12.2
. * (2) Illustrate Envelope method: Figure 12.3
.
. * No data need be read in.
.
. ********** SETUP **********
.
. set more off
. version 8
. set scheme s1mono
.
. ********** (1) INVERSE TRANSFORMATION - FIGURE 12.2 page 413 **********
.
. * Graph is for x = 0 to 5 in increments of 0.05
. set obs 100
obs was 0, now 100
. gen x = 0.05*_n
. * Unit Exponential cdf
. gen Fx = 1 - exp(-x)
. * Suppose uniform draw is 0.64
. gen uniformdraw = 0.64
.
. graph twoway (line Fx x, yline(0.64) xline(1.02)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Inverse Transformation Method") /*
> */ xtitle("Random variable x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cdf F(x)", size(medlarge)) yscale(titlegap(*5)) /*
> */ caption(" " "Draw of 0.64 (vertical axis) yields x = 1.02 (horizontal axis).")
. graph save ch12fig2invtransform, replace
(file ch12fig2invtransform.gph saved)
. graph export ch12fig2invtransform.wmf, replace
(file c:\Imbook\bwebpage\Section3\ch12fig2invtransform.wmf written in Windows Metafile format)
.
. ********** (2) ENVELOPE METHOD - FIGURE 12.3 **********
.
. * The following is a modification of the figure in the book
. * making clear that the envelope is a scaling up of g(x)
.
. clear
.
. * Graph is for x = 0 to 10 in increments of 0.1
. set obs 101
obs was 0, now 101
. gen x = -0.05 + 0.1*_n
. * Desired density f(x) is N(4,1); plotted envelope is 1.5*f(x) + 0.005
. gen fx = normden(x-4)
. gen gx = 1.5*normden(x-4)+0.005
.
. graph twoway (line fx x, clstyle(p1)) /*
> */ (line gx x, clstyle(p1) clwidth(*2) clcolor(gs12)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Accept-reject Method") /*
> */ xtitle("Random variable x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("f(x) and kg(x)", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Desired density f(x)") label(2 "Envelope kg(x)") )
. graph save ch12fig3envelope, replace
(file ch12fig3envelope.gph saved)
. graph export ch12fig3envelope.wmf, replace
(file c:\Imbook\bwebpage\Section3\ch12fig3envelope.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section3\mma12p3draws.txt
log type: text
closed on: 18 May 2005, 21:48:42
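
The two figures drawn by the program above illustrate methods that are short to code. The Python sketch below is an illustration under stated assumptions, not taken from the book's programs: the inverse transformation uses the unit exponential cdf from Figure 12.2, while the accept-reject example uses a hypothetical target density f(x) = 6x(1-x) on [0,1] with a uniform envelope scaled by k = 1.5, chosen only because it is easy to verify.

```python
import math
import random

def exponential_inverse_cdf(u):
    """Invert F(x) = 1 - exp(-x): x = -ln(1 - u) for 0 <= u < 1."""
    return -math.log(1.0 - u)

def accept_reject_draw(rng):
    """Draw from f(x) = 6x(1-x) on [0,1] by the accept-reject method.

    Candidate density g is uniform(0,1) and k = 1.5 is the maximum of f,
    so the envelope k*g(x) = 1.5 lies on or above f(x) everywhere.
    """
    k = 1.5
    while True:
        x = rng.random()                  # candidate draw from g
        u = rng.random()                  # uniform draw for the accept step
        if u <= 6.0 * x * (1.0 - x) / k:  # accept with probability f/(k*g)
            return x

if __name__ == "__main__":
    print("Uniform draw 0.64 maps to x =", exponential_inverse_cdf(0.64))
    rng = random.Random(10101)
    draws = [accept_reject_draw(rng) for _ in range(5)]
    print("Accept-reject draws:", draws)
```

The first function maps the uniform draw 0.64 to roughly 1.02, the value marked in Figure 12.2. In the accept-reject step a tighter envelope (smaller k) means fewer rejected candidates.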
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section3\mma13p1bayesthm.txt
log type: text
opened on: 24 May 2005, 11:04:08
.
. ********** OVERVIEW OF MMA13P1BAYESTHM.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 13.2.2 page 424
. * Create Figure 13.1
. * (1) Bayes Analysis illustrated using normal distribution and prior
.
. * No data are needed.
.
. ********** SETUP **********
.
. set more off
. version
version 8.2
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * Model is y ~ normal(theta, sigmasq) where sigmasq is known
. * and the prior is theta ~ normal(mu, tausq)
. * which gives a normal posterior
. * n is set below in set obs
.
. ********** CREATE DATA **********
.
. * The likelihood and prior are normal so the posterior is also normal
.
. * Will evaluate the densities at points between 0 and 15
. set obs 150
obs was 0, now 150
. gen xeval = 0.1*_n
.
. * Likelihood with sigmasq known
. scalar nobs = 50
. scalar ybar = 10
. scalar sigmasq = 100
. gen likelihood = normden(xeval,ybar,sqrt(sigmasq/nobs))
.
. * Prior
. scalar mu = 5
. scalar tausq = 3
. gen prior = normden(xeval,mu,sqrt(tausq))
.
. * Posterior given the sample mean ybar
. scalar tau1sq=1/((nobs/sigmasq)+(1/tausq))
. scalar mu1 = tau1sq*((ybar*nobs/sigmasq)+(mu/tausq))
. gen posterior = normden(xeval,mu1,sqrt(tau1sq))
.
. scalar list
       mu1 =          8
    tau1sq =        1.2
     tausq =          3
        mu =          5
   sigmasq =        100
      ybar =         10
      nobs =         50

. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       xeval |       150        7.55    4.344537         .1         15
  likelihood |       150    .0666548    .0944174   6.44e-12   .2820948
       prior |       150    .0665247    .0804685   1.33e-08   .2303294
   posterior |       150    .0666667    .1131755   1.85e-12   .3641828
.
. graph twoway (line likelihood xeval, clstyle(p2)) /*
> */ (line prior xeval, clstyle(p3)) /*
> */ (line posterior xeval, clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Bayes: Likelihood, Prior and Posterior") /*
> */ xtitle("Evaluation point", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(10) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Likelihood N[10,2]") label(2 "Prior N[5,3]") /*
> */ label(3 "Posterior N[8,1.2]") )
. graph save Ch13_Bayes1, replace
(file Ch13_Bayes1.gph saved)
. graph export Ch13_Bayes1.wmf, replace
(file c:\Imbook\bwebpage\Section3\Ch13_Bayes1.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section3\mma13p1bayesthm.txt
log type: text
closed on: 24 May 2005, 11:04:12
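
The posterior scalars mu1 and tau1sq computed in the log above follow from the standard normal-normal update: the posterior precision is the sum of the data precision nobs/sigmasq and the prior precision 1/tausq, and the posterior mean is the precision-weighted average of ybar and mu. A minimal Python sketch, with a hypothetical function name, reproduces the two values listed by `scalar list`.

```python
def normal_posterior(ybar, nobs, sigmasq, mu, tausq):
    """Posterior mean and variance of theta when y ~ N(theta, sigmasq) with
    sigmasq known and the prior is theta ~ N(mu, tausq)."""
    precision = nobs / sigmasq + 1.0 / tausq   # data precision + prior precision
    tau1sq = 1.0 / precision
    mu1 = tau1sq * (ybar * nobs / sigmasq + mu / tausq)
    return mu1, tau1sq

if __name__ == "__main__":
    mu1, tau1sq = normal_posterior(ybar=10, nobs=50, sigmasq=100, mu=5, tausq=3)
    print("Posterior mean:", mu1, "posterior variance:", tau1sq)
```

With the inputs from the log, the data precision is 0.5 and the prior precision is 1/3, so the posterior mean lies closer to ybar = 10 than to the prior mean 5.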
1                              The SAS System          08:50 Wednesday, May 25, 2005

NOTE: Copyright (c) 2002-2003 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) 9.1 (TS1M2)
Licensed to UNIV OF CA/DAVIS, Site 0029107010.
NOTE: This session is executing on the SunOS 5.9 platform.

You are running SAS 9. Some SAS 8 files will be automatically converted
by the V9 engine; others are incompatible. Please see
http://support.sas.com/rnd/migration/planning/platform/64bit.html
PROC MIGRATE will preserve current SAS file attributes and is
recommended for converting all your SAS libraries from any
SAS 8 release to SAS 9. For details and examples, please see
http://support.sas.com/rnd/migration/index.html

This message is contained in the SAS news file, and is presented upon
initialization. Edit the file "news" in the "misc/base" directory to
display site-specific news and information in the program log.
The command line option "-nonews" will prevent this display.

NOTE: SAS initialization used:
      real time           0.11 seconds
      cpu time            0.10 seconds

* MMA13P2BAYES.SAS March 2005 for SAS version 8.2

********** OVERVIEW OF MMA13P2BAYES.SAS **********


* SAS Program
* copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
* used for "Microeconometrics: Methods and Applications"
* by A. Colin Cameron and Pravin K. Trivedi (2005)
* Cambridge University Press
* Chapter 13.6 p.452-4
* MCMC Example: Gibbs Sampler for 2 equation SUR
* Program creates the first column of Table 13.3
* (though differs somewhat due to use of different seed)
* For different columns of Table 13.3 change
* nobs = Sample size N (1000 or 10000)
* replics = Gibbs sample replications (50000 or 100000)
* tau = 1, 10 or 0.1
* This program does first column: tau=10, nobs=1000, replics=50000
* Note that the program does not exactly replicate Table 13.3
* Table 13.3 used the computer clock for seed,
* with third argument zero in rannor(j( , ,0))
* Here instead the seed is consecutively 10101, 20101, ... , 70101
* so third argument is eg rannor(j( , ,10101))
* to permit reproducibility by other users
* This program creates

* MMA13P2BAYES.1ST SAS Output with one column of Table 13.3
* MMA13P2BAYES.LOG SAS log file

* This program uses generated data - so no data set required
* This program uses a lot of memory - 1 gigabyte should do
* In Unix give command sas -MEMSIZE 1G mma13p2bayesgibbs.sas

*********************************************************************;
***** BIVARIATE NORMAL-BAYESIAN-ESTIMATION-BY-MCMC **************;
*********************************************************************;

OPTIONS LS=75;
options NOTES;

PROC IML;
NOTE: IML Ready
start main;

print "A. Colin Cameron and Pravin K. Trivedi (2005)";
print "Microeconometrics: Methods and Applications, CUP";
print "MCMC Example: Gibbs Sampler for SUR";

************* GENERATING DATA: 2 EQUATION SUR ****************;

nobs = 1000;
replics = 50000;
burn = 5000;
replics = replics + burn;

npar1 = 2;
npar2 = 2;

alpha1 ={1,1};
alpha2 ={1,1};

sigma = {1 -0.5,-0.5 1};
T = {0.15 2.18 0.725 0.45};
EPS = 1e-20;
IC = (1/2.506628275);

R1 = j(nobs,1,1)||rannor(j(nobs,1,10101));

R2 = j(nobs,1,1)||rannor(j(nobs,1,20101));

e = rannor(j(nobs,2,30101))*root(sigma);
e1 = e[,1];
e2 = e[,2];

Y1 = R1*alpha1 + e1;
Y2 = R2*alpha2 + e2;

************* SPECIFY PRIOR DISTRIBUTIONS ******************;

alpha01 = j(npar1,1,0);
alpha02 = j(npar2,1,0);

sigma = I(2);
p = 3;
df = 5;
tau = 10;

MUalpha = alpha01//alpha02;
OMalpha = tau*I(npar1+npar2);
OMphi = I(2);

************ ANALYSIS: GIBBS SAMPLING BEGINS HERE ***************;

do rep = 1 to replics;

************* GENERATE ALPHA1 ALPHA2 RHO *******************;

isigma = inv(sigma);

LL = ((isigma[1,1]*R1`*R1||isigma[1,2]*R1`*R2)//
      (isigma[2,1]*R2`*R1||isigma[2,2]*R2`*R2));
LisigY = ((isigma[1,1]*R1`*Y1+isigma[1,2]*R1`*Y2)//
          (isigma[2,1]*R2`*Y1+isigma[2,2]*R2`*Y2));

alpha = inv(inv(OMalpha)+ LL)*(LisigY + inv(OMalpha)*MUalpha)
        + root(inv(inv(OMalpha)+ LL))`*rannor(j(npar1+npar2,1,40101));

alpha1 = alpha[1:npar1];
alpha2 = alpha[npar1+1:npar1+npar2];

e1 = Y1 - R1*alpha1;
e2 = Y2 - R2*alpha2;

************* GENERATE SIGMA *******************;

mt = (sqrt((rannor(j(1,nobs+df,50101))##2)[,+])||0)//
     (rannor(j(1,1,60101))||sqrt((rannor(j(1,nobs+df-1,70101))##2)[,+]));
mv = mt*mt`;
e=(e1||e2);
ms = e`*e+inv(OMphi);
ml = root(inv(ms))`;
mg = ml*mv*ml`;
sigma = inv(mg);

free mt mv e ml;

************* WRITE TO OUTPUT FILE IF AFTER BURN-IN **************;

if rep <= burn then goto point300;
sigma3 = sigma[1,1]||sigma[1,2]||sigma[2,2];
out1 = alpha1`||alpha2`||sigma3;

output1=output1//out1;

point300:
end;

************* END OF GIBBS SAMPLING **************;

*********************************************************************;
***** RESULTS: COMPARE LAST HALF WITH ALL (AFTER BURN-IN) *******;
*********************************************************************;

replics = replics-burn;

out1 = output1[replics/2+1:replics,];
out = output1[1:replics,];

create exp from out1;
append from out1;
summary var _num_;
close exp;

create exp from out;
append from out;
summary var _num_;
close exp;

*********************************************************************;
****** RESULTS: POSTERIOR MEAN AND SD - TABLE 13.3 P.454 ********;
*********************************************************************;

xnames1 = {"CONSTANT"} || {"R1"};
xnames2 = {"CONSTANT"} || {"R2"};
parnames = concat({"d1"}," ",xnames1)||concat({"d2"}," ",xnames2)||{"SIGMA11"}||{"SIGMA12"}||{"SIGMA22"};

meanout = out[+,]/replics;
stderr = sqrt(((out-j(replics,1,1)*meanout)##2)[+,]/(replics-1));
parm = meanout;
stderr = stderr`;
tnpar = npar1 + npar2 + 3;

tstat = parm`/ stderr;
coeff = parm` || stderr || tstat;
info = tau // nobs // replics // burn // tnpar;
rowinfo={'TAU' '# OBSERVATIONS' '# REPLICATIONS' '# BURN-IN' '# PARAMETERS'};
estcol ={ 'ESTIMATE' 'STD ERR' 'T-STAT'};
mattrib info rowname=rowinfo label={" "};
mattrib coeff rowname=parnames colname=estcol label={" "};
print / "Results for Table 13.3 p.454";
print info;
print coeff;

*********************************************************************;
********** RESULTS: CONVERGENCE CHECK: SEE P.454 ***************;
*********************************************************************;

print / "Convergence check on p.454";

corr = j(20,7,0);

do i = 1 to 7;
cov = covlag(out[,i],20)`;
corr[,i] = cov/cov[1];
end;

covd1 = j(20,2,0);

do k = 1 to 3;
covd1 = corr[,2*k-1:2*k];
print covd1;
end;

covd1 = corr[,7];
print covd1;

finish main;
NOTE: Module MAIN defined.

run main;
NOTE: The data set WORK.EXP has 25000 observations and 7 variables.
NOTE: The data set WORK.EXP has 50000 observations and 7 variables.
NOTE: Exiting IML.
NOTE: 65925 workspace compresses.
NOTE: The PROCEDURE IML printed pages 1-6.
NOTE: PROCEDURE IML used (Total process time):
      real time           5:44.35
      cpu time            5:44.04

NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           5:45.48
      cpu time            5:45.15
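
The convergence check in the SAS program above divides each lagged autocovariance of a chain by its lag-0 value (`corr[,i] = cov/cov[1]`). The following is a minimal Python sketch of that ratio for a single chain. It assumes that the lags run from 0 to `nlags - 1` and that each autocovariance uses divisor n; the divisor cancels in the ratio, so this assumption does not affect the result. The function name `autocorrelations` and the toy chain are hypothetical illustrations, not output from the Gibbs sampler.

```python
def autocorrelations(chain, nlags=20):
    """Autocorrelations at lags 0..nlags-1 for one chain of MCMC draws."""
    n = len(chain)
    mean = sum(chain) / n
    dev = [x - mean for x in chain]
    # Lag-k autocovariance with divisor n (the divisor cancels in the ratio)
    cov = [sum(dev[t] * dev[t - k] for t in range(k, n)) / n
           for k in range(nlags)]
    # Normalize by the lag-0 autocovariance, as in corr[,i] = cov/cov[1]
    return [c / cov[0] for c in cov]


# Toy chain that alternates in sign, so adjacent draws are negatively correlated
toy = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
rho = autocorrelations(toy, nlags=3)
print([round(r, 2) for r in rho])  # [1.0, -0.99, 0.98]
```

For a well-mixing chain the autocorrelations would be expected to fall toward zero as the lag increases; slowly decaying values suggest that more replications or thinning may be needed.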


-------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma14p1binary.txt
  log type:  text
 opened on:  19 May 2005, 09:01:28
.
. ********** OVERVIEW OF MMA14P1BINARY.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 14.2 (pages 464-6) Logit and probit models.
. * Provides
. * (1) Table 14.1: Data summary
. * (2) Table 14.2: Logit, Probit and OLS slope estimates
. * (3) Figure 14.1: Plot of Logit Probit and OLS predicted probabilities
.
. * To run this program you need data file
. * Nldata.asc
.
. ********** SETUP
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION
.
. * Data Set comes from :
. * J. A. Herriges and C. L. Kling,
. * "Nonlinear Income Effects in Random Utility Models",
. * Review of Economics and Statistics, 81(1999): 62-72
.
. * The data are given as a combined observation with data on all 4 choices.
. * This will work for multinomial logit program.
. * For conditional logit will need to make a new data set which has
. * four separate entries for each observation as there are four alternatives.
.
. * Filename: NLDATA.ASC
. * Format: Ascii
. * Number of Observations: 1182
. * Each observation appears over 3 lines with 4 variables per line
. * so 4 x 1182 = 4728 observations
. * Variable Number and Description
. * 1 Recreation mode choice. = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. * 2 Price for chosen alternative
. * 3 Catch rate for chosen alternative
. * 4 = 1 if beach mode chosen; = 0 otherwise
. * 5 = 1 if pier mode chosen; = 0 otherwise
. * 6 = 1 if private boat mode chosen; = 0 otherwise
. * 7 = 1 if charter boat mode chosen; = 0 otherwise
. * 8 = price for beach mode
. * 9 = price for pier mode
. * 10 = price for private boat mode
. * 11 = price for charter boat mode
. * 12 = catch rate for beach mode
. * 13 = catch rate for pier mode
. * 14 = catch rate for private boat mode
. * 15 = catch rate for charter boat mode
. * 16 = monthly income
.
. ********** READ IN DATA **********
.
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. * Divide income by 1000 so that results are easy to read
. gen ydiv1000 = income/1000
.
. label define modetype 1 "beach" 2 "pier" 3 "private" 4 "charter"
. label values mode modetype
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
.
. ********** CREATE BINARY DATA: CHARTER vs PIER **********
.
. * Binary logit of charter (mode = 4) versus pier (mode = 2)
. keep if mode == 2 | mode == 4
(552 observations deleted)
. * charter is 1 if fish from charter boat and 0 if fish from pier
. gen charter = 0
. replace charter = 1 if mode == 4
(452 real changes made)
.
. gen pratio = 100*ln(pcharter/ppier)
. gen lnrelp = ln(pchart/ppier)
.
. * Overall summary
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       630    3.434921    .9011843          2          4
       price |       630    62.51669    52.31219       1.29    387.208
       crate |       630    .5533478    .6953035      .0014     2.3101
      dbeach |       630           0           0          0          0
       dpier |       630    .2825397    .4505921          0          1
-------------+--------------------------------------------------------
    dprivate |       630           0           0          0          0
    dcharter |       630    .7174603    .4505921          0          1
      pbeach |       630    95.19802    95.62037       1.29    578.048
       ppier |       630    95.19802    95.62037       1.29    578.048
    pprivate |       630    55.26221    59.99482       2.29    494.058
-------------+--------------------------------------------------------
    pcharter |       630    84.89158    60.79327      27.29    529.058
      qbeach |       630    .2546022    .1983357      .0678      .5333
       qpier |       630    .1716835    .1687288      .0014      .4522
    qprivate |       630    .1695303    .2033172      .0014      .7369
    qcharter |       630    .6368509     .688508      .0029     2.3101
-------------+--------------------------------------------------------
      income |       630    3741.402     2145.71   416.6667      12500
    ydiv1000 |       630    3.741402     2.14571   .4166667       12.5
     charter |       630    .7174603    .4505921          0          1
      pratio |       630    27.45581    126.2598  -215.3976   406.2712
      lnrelp |       630    .2745581    1.262598  -2.153976   4.062713


. * Summary by charter or by pier
. sort mode
. by mode: summarize

----------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       178           2           0          2          2
       price |       178    30.57133    35.58442       1.29    224.296
       crate |       178    .2025348    .1702942      .0014      .4522
      dbeach |       178           0           0          0          0
       dpier |       178           1           0          1          1
-------------+--------------------------------------------------------
    dprivate |       178           0           0          0          0
    dcharter |       178           0           0          0          0
      pbeach |       178    30.57133    35.58442       1.29    224.296
       ppier |       178    30.57133    35.58442       1.29    224.296
    pprivate |       178    82.42908    69.30802       2.29    494.058
-------------+--------------------------------------------------------
    pcharter |       178    109.7633    72.37726      27.29    529.058
      qbeach |       178    .2614444    .1949684      .0678      .5333
       qpier |       178    .2025348    .1702942      .0014      .4522
    qprivate |       178    .1501489    .0968393      .0014      .2601
    qcharter |       178    .4980798    .3756255      .0029     1.0266
-------------+--------------------------------------------------------
      income |       178    3387.172    2340.324   416.6667      12500
    ydiv1000 |       178    3.387172    2.340324   .4166667       12.5
     charter |       178           0           0          0          0
      pratio |       178    164.2956    104.3052  -79.13918   406.2712
      lnrelp |       178    1.642956    1.043052  -.7913917   4.062713

----------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       452           4           0          4          4
       price |       452    75.09694    52.51942      27.29    387.208
       crate |       452    .6914998    .7714728      .0029     2.3101
      dbeach |       452           0           0          0          0
       dpier |       452           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       452           0           0          0          0
    dcharter |       452           1           0          1          1
      pbeach |       452    120.6483    99.78664       4.29    578.048
       ppier |       452    120.6483    99.78664       4.29    578.048
    pprivate |       452    44.56376    52.23744       2.29    362.208
-------------+--------------------------------------------------------
    pcharter |       452    75.09694    52.51942      27.29    387.208
      qbeach |       452    .2519077    .1997956      .0678      .5333
       qpier |       452    .1595341    .1667353      .0014      .4522
    qprivate |       452    .1771628    .2318749      .0014      .7369
    qcharter |       452    .6914998    .7714728      .0029     2.3101
-------------+--------------------------------------------------------
      income |       452      3880.9    2050.028   416.6667      12500
    ydiv1000 |       452      3.8809    2.050028   .4166667       12.5
     charter |       452           1           0          1          1
      pratio |       452   -26.43243    87.53686  -215.3976   235.8242
      lnrelp |       452   -.2643243    .8753686  -2.153976   2.358242

.
. * Write final data to a text (ascii) file so can use with programs other than Stata
. outfile charter lnrelp using mma14p1binary.asc, replace
.
. ********** TABLE 14.1 - DATA SUMMARY BY OUTCOME AND OVERALL **********
.
. * Following gives Table 14.1 page 464
. summarize charter pcharter ppier lnrelp

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     charter |       630    .7174603    .4505921          0          1
    pcharter |       630    84.89158    60.79327      27.29    529.058
       ppier |       630    95.19802    95.62037       1.29    578.048
      lnrelp |       630    .2745581    1.262598  -2.153976   4.062713

. sort mode
. by mode: summarize charter pcharter ppier lnrelp

----------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     charter |       178           0           0          0          0
    pcharter |       178    109.7633    72.37726      27.29    529.058
       ppier |       178    30.57133    35.58442       1.29    224.296
      lnrelp |       178    1.642956    1.043052  -.7913917   4.062713

----------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     charter |       452           1           0          1          1
    pcharter |       452    75.09694    52.51942      27.29    387.208
       ppier |       452    120.6483    99.78664       4.29    578.048
      lnrelp |       452   -.2643243    .8753686  -2.153976   2.358242

.
. ********** TABLE 14.2 - ESTIMATE LOGIT, PROBIT AND OLS MODELS
.
. logit charter lnrelp

Iteration 0:   log likelihood = -375.06167
Iteration 1:   log likelihood = -223.44527
Iteration 2:   log likelihood = -208.29369
Iteration 3:   log likelihood = -206.84942
Iteration 4:   log likelihood = -206.82698
Iteration 5:   log likelihood = -206.82697

Logit estimates                                   Number of obs   =        630
                                                  LR chi2(1)      =     336.47
                                                  Prob > chi2     =     0.0000
Log likelihood = -206.82697                       Pseudo R2       =     0.4486

------------------------------------------------------------------------------
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |   -1.82253   .1445681   -12.61   0.000    -2.105879   -1.539182
       _cons |   2.053125   .1689307    12.15   0.000     1.722027    2.384223
------------------------------------------------------------------------------

. estimates store blogit
.
. probit charter lnrelp

Iteration 0:   log likelihood = -375.06167
Iteration 1:   log likelihood = -221.55989
Iteration 2:   log likelihood = -205.42312
Iteration 3:   log likelihood = -204.41773
Iteration 4:   log likelihood = -204.41087

Probit estimates                                  Number of obs   =        630
                                                  LR chi2(1)      =     341.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -204.41087                       Pseudo R2       =     0.4550

------------------------------------------------------------------------------
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -1.055515   .0761117   -13.87   0.000    -1.204691   -.9063383
       _cons |    1.19436    .089504    13.34   0.000     1.018936    1.369785
------------------------------------------------------------------------------

. estimates store bprobit
.
. regress charter lnrelp

      Source |       SS       df       MS              Number of obs =     630
-------------+------------------------------           F(  1,   628) =  542.12
       Model |  59.1676598     1  59.1676598           Prob > F      =  0.0000
    Residual |  68.5402767   628  .109140568           R-squared     =  0.4633
-------------+------------------------------           Adj R-squared =  0.4624
       Total |  127.707937   629  .203033285           Root MSE      =  .33036

------------------------------------------------------------------------------
     charter |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -.2429137   .0104328   -23.28   0.000    -.2634011   -.2224262
       _cons |   .7841542   .0134701    58.21   0.000     .7577023    .8106061
------------------------------------------------------------------------------

. estimates store bOLS
.
. * Heteroskedastic robust standard errors only needed for OLS
. * but given for other models for completeness
.
. logit charter lnrelp, robust

Iteration 0:   log pseudo-likelihood = -375.06167
Iteration 1:   log pseudo-likelihood = -223.44527
Iteration 2:   log pseudo-likelihood = -208.29369
Iteration 3:   log pseudo-likelihood = -206.84942
Iteration 4:   log pseudo-likelihood = -206.82698
Iteration 5:   log pseudo-likelihood = -206.82697

Logit estimates                                   Number of obs   =        630
                                                  Wald chi2(1)    =     194.28
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -206.82697                Pseudo R2       =     0.4486

------------------------------------------------------------------------------
             |               Robust
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |   -1.82253   .1307556   -13.94   0.000    -2.078807   -1.566254
       _cons |   2.053125   .1473477    13.93   0.000     1.764329    2.341921
------------------------------------------------------------------------------

. estimates store bloghet
.
. probit charter lnrelp, robust

Iteration 0:   log pseudo-likelihood = -375.06167
Iteration 1:   log pseudo-likelihood = -221.55989
Iteration 2:   log pseudo-likelihood = -205.42312
Iteration 3:   log pseudo-likelihood = -204.41773
Iteration 4:   log pseudo-likelihood = -204.41087

Probit estimates                                  Number of obs   =        630
                                                  Wald chi2(1)    =     232.07
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -204.41087                Pseudo R2       =     0.4550

------------------------------------------------------------------------------
             |               Robust
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -1.055515   .0692881   -15.23   0.000    -1.191317   -.9197122
       _cons |    1.19436   .0794429    15.03   0.000     1.038655    1.350066
------------------------------------------------------------------------------

. estimates store bprobhet
.
. regress charter lnrelp, robust

Regression with robust standard errors                 Number of obs =     630
                                                       F(  1,   628) =  792.44
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4633
                                                       Root MSE      =  .33036

------------------------------------------------------------------------------
             |               Robust
     charter |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -.2429137   .0086292   -28.15   0.000    -.2598592   -.2259681
       _cons |   .7841542   .0119566    65.58   0.000     .7606744    .8076341
------------------------------------------------------------------------------

. estimates store bOLShet
.
. * Following gives Table 14.2 page 465
. estimates table blogit bprobit bOLS bloghet bprobhet bOLShet, /*
> */ t stats(N ll r2 r2_p) b(%8.3f) keep(_cons lnrelp)

--------------------------------------------------------------------------
    Variable |   blogit    bprobit      bOLS    bloghet   bprobhet   bOLShet
-------------+------------------------------------------------------------
       _cons |    2.053      1.194     0.784      2.053      1.194     0.784
             |    12.15      13.34     58.21      13.93      15.03     65.58
      lnrelp |   -1.823     -1.056    -0.243     -1.823     -1.056    -0.243
             |   -12.61     -13.87    -23.28     -13.94     -15.23    -28.15
-------------+------------------------------------------------------------
           N |  630.000    630.000   630.000    630.000    630.000   630.000
          ll | -206.827   -204.411  -195.167   -206.827   -204.411  -195.167
          r2 |                         0.463                           0.463
        r2_p |    0.449      0.455                0.449      0.455
--------------------------------------------------------------------------
                                                               legend: b/t
.
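
The `LR chi2(1)` and `Pseudo R2` entries in the logit and probit output above can be recovered from the reported log likelihoods. The sketch below assumes that the iteration 0 value (-375.06167) is the log likelihood of the intercept-only model and that the pseudo R-squared is the likelihood-ratio index 1 - lnL(fit)/lnL(null); the function name `lr_and_pseudo_r2` is hypothetical.

```python
def lr_and_pseudo_r2(ll_null, ll_fit):
    """Likelihood-ratio statistic and likelihood-ratio-index pseudo R-squared."""
    lr = 2.0 * (ll_fit - ll_null)        # LR test of the slope(s) against the null
    pseudo_r2 = 1.0 - ll_fit / ll_null   # 1 - lnL(fit)/lnL(null)
    return lr, pseudo_r2


ll0 = -375.06167  # iteration 0 log likelihood, taken as the intercept-only fit
results = {}
for name, ll in [("logit", -206.82697), ("probit", -204.41087)]:
    results[name] = lr_and_pseudo_r2(ll0, ll)
    print(name, round(results[name][0], 2), round(results[name][1], 4))
```

Under these assumptions the computed values agree with the log to the precision shown (336.47 and 0.4486 for logit, 341.30 and 0.4550 for probit).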
. ********** FIGURE 14.1 - PLOT PREDICTED PROBABILITY AGAINST X FOR MODELS
.
. quietly logit charter lnrelp
. predict plogit, p
.
. quietly probit charter lnrelp
. predict pprobit, p
.
. quietly regress charter lnrelp
. predict pOLS
(option xb assumed; fitted values)
.
. sum charter plogit pprobit pOLS

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     charter |       630    .7174603    .4505921          0          1
      plogit |       630    .7174603    .3193077   .0047196   .9974746
     pprobit |       630      .72019    .3196164   .0009877   .9997377
        pOLS |       630    .7174603    .3067022  -.2027341   1.307384
.
. sort lnrelp
.
. * Following gives Figure 14.1 page 466
. graph twoway (scatter charter lnrelp, msize(vsmall) jitter(3)) /*
> */ (line plogit lnrelp, clstyle(p1)) /*
> */ (line pprobit lnrelp, clstyle(p2)) /*
> */ (line pOLS lnrelp, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Predicted Probabilities Across Models") /*
> */ xtitle("Log relative price (lnrelp)", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Predicted probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data (jittered)") label(2 "Logit") /*
> */ label(3 "Probit") label(4 "OLS"))
. graph export ch14binary.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch14binary.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma14p1binary.txt
log type: text
closed on: 19 May 2005, 09:01:31
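
The predicted probabilities summarized in the log above can be reproduced from the reported coefficients. The following Python sketch evaluates the fitted logit, probit and OLS models at the smallest and largest sample values of `lnrelp`; the function names `p_logit`, `p_probit` and `p_ols` are hypothetical, and the coefficients are copied from the Stata output, so results agree only up to the rounding of those coefficients.

```python
import math


def p_logit(x, b0=2.053125, b1=-1.82253):
    """Logit probability: logistic cdf of the linear index."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def p_probit(x, b0=1.19436, b1=-1.055515):
    """Probit probability: standard normal cdf of the linear index."""
    z = b0 + b1 * x
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_ols(x, b0=0.7841542, b1=-0.2429137):
    """Linear probability model: the fitted value is not bounded by 0 and 1."""
    return b0 + b1 * x


# Smallest and largest lnrelp in the estimation sample, from the summarize output
for x in (-2.153976, 4.062713):
    print(x, round(p_logit(x), 4), round(p_probit(x), 4), round(p_ols(x), 4))
```

The logit and probit predictions stay inside the unit interval, while the OLS fitted values fall below 0 and above 1 at the extremes, which is consistent with the minimum and maximum of `pOLS` reported in the log.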


-------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma15p1mnl.txt
  log type:  text
 opened on:  19 May 2005, 12:16:20
.
. ********** OVERVIEW OF MMA15P1MNL.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 15.2.1-3 pages 491-5
. * Multinomial and conditional logit models analysis.
. * It provides ....
. * (0) Data summary (Table 15.1)
. * (1A) Multinomial Logit estimates (Table 15.1)
. * (1B) Multinomial Logit marginal effects (text page 494)
. * (2A) Conditional Logit estimates (Table 15.2)
. * (2B) Conditional Logit marginal effects (Table 15.3)
. * (3) Multinomial estimates obtained using Conditional Logit
. * (4) "Mixed Model" estimates (Table 15.1)
.
. * Related programs are
. * mma15p2gev.do estimates a nested logit model using Stata
. * mma15p3mnl.lim estimates multinomial models using Limdep
. * mma15p4gev.lim estimates conditional and nested logit models using Limdep
.
. * To run this program you need data file
. * Nldata.asc
.
. /* Program summary:
>
> (1) Multinomial logit of mode on alternative-invariant regressor (income)
>       mlogit mode income
>
> (2) Conditional logit of mode on alternative-specific regressor (price, catch rate)
>       First reshape data so 4 observations per individual - one for each mode.
>       clogit mode p q
>
> (3) Conditional logit of mode on alternative-invariant regressor (income)
>       First reshape data so 4 observations per individual - one for each mode.
>       Then create dummy variables for each mode d2 d3 d4
>       clogit mode d2 d3 d4 d2y d3y d4y
>       This gives same results as (1)
>
> (4) Conditional logit of mode on alternative-invariant regressor (income)
>       and on alternative-specific regressor (price, catch rate)
>       First reshape data so 4 observations per individual - one for each mode.
>       Then create dummy variables for each mode d2 d3 d4
>       clogit mode d2 d3 d4 d2y d3y d4y p q
> */
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * Data Set comes from :
. * J. A. Herriges and C. L. Kling,
. * "Nonlinear Income Effects in Random Utility Models",
. * Review of Economics and Statistics, 81(1999): 62-72
.
. * The data are given as a combined observation with data on all 4 choices.
. * This will work for multinomial logit program.
. * For conditional logit will need to make a new data set which has
. * four separate entries for each observation as there are four alternatives.
.
. * Filename: NLDATA.ASC
. * Format: Ascii
. * Number of Observations: 1182
. * Each observation appears over 3 lines with 4 variables per line
. * so 4 x 1182 = 4728 observations
. * Variable Number and Description
. * 1 Recreation mode choice. = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. * 2 Price for chosen alternative
. * 3 Catch rate for chosen alternative
. * 4 = 1 if beach mode chosen; = 0 otherwise
. * 5 = 1 if pier mode chosen; = 0 otherwise
. * 6 = 1 if private boat mode chosen; = 0 otherwise
. * 7 = 1 if charter boat mode chosen; = 0 otherwise
. * 8 = price for beach mode
. * 9 = price for pier mode
. * 10 = price for private boat mode
. * 11 = price for charter boat mode
. * 12 = catch rate for beach mode
. * 13 = catch rate for pier mode
. * 14 = catch rate for private boat mode
. * 15 = catch rate for charter boat mode
. * 16 = monthly income
.
. ********** READ IN DATA and SUMMARIZE (Table 15.1, p.492) **********
.
. * Method to read in depends on model used
.
. /* Data are on fishing mode: 1 beach, 2 pier, 3 private boat, 4 charter
> Data come as one observation having data for all 4 modes.
> Both alternative specific and alternative invariant regressors.
> */
.
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. gen ydiv1000 = income/1000
.
. * Look at data by alternative
. label define modetype 1 "beach" 2 "pier" 3 "private" 4 "charter"
. label values mode modetype
.
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5

. sort mode
. by mode: summarize

----------------------------------------------------------------------
-> mode = beach

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       134           1           0          1          1
       price |       134    35.69949    43.09414       1.29     306.82
       crate |       134    .2791948    .1938734      .0678      .5333
      dbeach |       134           1           0          1          1
       dpier |       134           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       134           0           0          0          0
    dcharter |       134           0           0          0          0
      pbeach |       134    35.69949    43.09414       1.29     306.82
       ppier |       134    35.69949    43.09414       1.29     306.82
    pprivate |       134    97.80913    75.43844       2.29    392.946
-------------+--------------------------------------------------------
    pcharter |       134    125.0032    78.37641      27.29    427.946
      qbeach |       134    .2791948    .1938734      .0678      .5333
       qpier |       134    .2190015    .1677117      .0025      .4522
    qprivate |       134    .1593985    .0948855      .0008      .2601
    qcharter |       134    .5176089    .3629096      .0027     1.0266
-------------+--------------------------------------------------------
      income |       134    4051.617     2505.42   416.6667      12500
    ydiv1000 |       134    4.051617     2.50542   .4166667       12.5

----------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |       178           2           0          2          2
       price |       178    30.57133    35.58442       1.29    224.296
       crate |       178    .2025348    .1702942      .0014      .4522
      dbeach |       178           0           0          0          0
       dpier |       178           1           0          1          1
-------------+--------------------------------------------------------
    dprivate |       178           0           0          0          0
    dcharter |       178           0           0          0          0
      pbeach |       178    30.57133    35.58442       1.29    224.296
       ppier |       178    30.57133    35.58442       1.29    224.296
    pprivate |       178    82.42908    69.30802       2.29    494.058
-------------+--------------------------------------------------------
    pcharter |       178    109.7633    72.37726      27.29    529.058
      qbeach |       178    .2614444    .1949684      .0678      .5333
       qpier |       178    .2025348    .1702942      .0014      .4522
    qprivate |       178    .1501489    .0968393      .0014      .2601
    qcharter |       178    .4980798    .3756255      .0029     1.0266
-------------+--------------------------------------------------------
      income |       178    3387.172    2340.324   416.6667      12500
    ydiv1000 |       178    3.387172    2.340324   .4166667       12.5

----------------------------------------------------------------------
-> mode = private

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |       418           3           0          3          3
       price |       418    41.60681    55.90806       2.29     666.11
       crate |       418    .1775411    .2435798      .0002      .7369
      dbeach |       418           0           0          0          0
       dpier |       418           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       418           1           0          1          1
    dcharter |       418           0           0          0          0
      pbeach |       418    137.5271    115.3058       2.29    843.186
       ppier |       418    137.5271    115.3058       2.29    843.186
    pprivate |       418    41.60681    55.90806       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |       418    70.58409    56.39575      27.29     691.11
      qbeach |       418    .2082868    .1729351      .0678      .5333
       qpier |       418    .1297646    .1368029      .0025      .4522
    qprivate |       418    .1775411    .2435798      .0002      .7369
    qcharter |       418    .6539167    .8064379      .0021     2.3101
-------------+--------------------------------------------------------
      income |       418    4654.107    2777.898   416.6667      12500
    ydiv1000 |       418    4.654107    2.777898   .4166667       12.5

----------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |       452           4           0          4          4
       price |       452    75.09694    52.51942      27.29    387.208
       crate |       452    .6914998    .7714728      .0029     2.3101
      dbeach |       452           0           0          0          0
       dpier |       452           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       452           0           0          0          0
    dcharter |       452           1           0          1          1
      pbeach |       452    120.6483    99.78664       4.29    578.048
       ppier |       452    120.6483    99.78664       4.29    578.048
    pprivate |       452    44.56376    52.23744       2.29    362.208
-------------+--------------------------------------------------------
    pcharter |       452    75.09694    52.51942      27.29    387.208
      qbeach |       452    .2519077    .1997956      .0678      .5333
       qpier |       452    .1595341    .1667353      .0014      .4522
    qprivate |       452    .1771628    .2318749      .0014      .7369
    qcharter |       452    .6914998    .7714728      .0029     2.3101
-------------+--------------------------------------------------------
      income |       452      3880.9    2050.028   416.6667      12500
    ydiv1000 |       452      3.8809    2.050028   .4166667       12.5

.
. * Following commands give Table 15.1, p.492
. summarize ydiv1000 pbeach ppier pprivate pcharter qbeach qpier /*
> */ qprivate qcharter dbeach dpier dprivate dcharter

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
    pcharter |      1182    84.37924    63.54465      27.29     691.11
-------------+--------------------------------------------------------
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
-------------+--------------------------------------------------------
       dpier |      1182    .1505922    .3578023          0          1
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1

. sort mode
. by mode: summarize ydiv1000 pbeach ppier pprivate pcharter qbeach qpier /*
> */ qprivate qcharter dbeach dpier dprivate dcharter
----------------------------------------------------------------------
-> mode = beach

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       134    4.051617     2.50542   .4166667       12.5
      pbeach |       134    35.69949    43.09414       1.29     306.82
       ppier |       134    35.69949    43.09414       1.29     306.82
    pprivate |       134    97.80913    75.43844       2.29    392.946
    pcharter |       134    125.0032    78.37641      27.29    427.946
-------------+--------------------------------------------------------
      qbeach |       134    .2791948    .1938734      .0678      .5333
       qpier |       134    .2190015    .1677117      .0025      .4522
    qprivate |       134    .1593985    .0948855      .0008      .2601
    qcharter |       134    .5176089    .3629096      .0027     1.0266
      dbeach |       134           1           0          1          1
-------------+--------------------------------------------------------
       dpier |       134           0           0          0          0
    dprivate |       134           0           0          0          0
    dcharter |       134           0           0          0          0

----------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       178    3.387172    2.340324   .4166667       12.5
      pbeach |       178    30.57133    35.58442       1.29    224.296
       ppier |       178    30.57133    35.58442       1.29    224.296
    pprivate |       178    82.42908    69.30802       2.29    494.058
    pcharter |       178    109.7633    72.37726      27.29    529.058
-------------+--------------------------------------------------------
      qbeach |       178    .2614444    .1949684      .0678      .5333
       qpier |       178    .2025348    .1702942      .0014      .4522
    qprivate |       178    .1501489    .0968393      .0014      .2601
    qcharter |       178    .4980798    .3756255      .0029     1.0266
      dbeach |       178           0           0          0          0
-------------+--------------------------------------------------------
       dpier |       178           1           0          1          1
    dprivate |       178           0           0          0          0
    dcharter |       178           0           0          0          0

----------------------------------------------------------------------
-> mode = private

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       418    4.654107    2.777898   .4166667       12.5
      pbeach |       418    137.5271    115.3058       2.29    843.186
       ppier |       418    137.5271    115.3058       2.29    843.186
    pprivate |       418    41.60681    55.90806       2.29     666.11
    pcharter |       418    70.58409    56.39575      27.29     691.11
-------------+--------------------------------------------------------
      qbeach |       418    .2082868    .1729351      .0678      .5333
       qpier |       418    .1297646    .1368029      .0025      .4522
    qprivate |       418    .1775411    .2435798      .0002      .7369
    qcharter |       418    .6539167    .8064379      .0021     2.3101
      dbeach |       418           0           0          0          0
-------------+--------------------------------------------------------
       dpier |       418           0           0          0          0
    dprivate |       418           1           0          1          1
    dcharter |       418           0           0          0          0

----------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       452      3.8809    2.050028   .4166667       12.5
      pbeach |       452    120.6483    99.78664       4.29    578.048
       ppier |       452    120.6483    99.78664       4.29    578.048
    pprivate |       452    44.56376    52.23744       2.29    362.208
    pcharter |       452    75.09694    52.51942      27.29    387.208
-------------+--------------------------------------------------------
      qbeach |       452    .2519077    .1997956      .0678      .5333
       qpier |       452    .1595341    .1667353      .0014      .4522
    qprivate |       452    .1771628    .2318749      .0014      .7369
    qcharter |       452    .6914998    .7714728      .0029     2.3101
      dbeach |       452           0           0          0          0
-------------+--------------------------------------------------------
       dpier |       452           0           0          0          0
    dprivate |       452           0           0          0          0
    dcharter |       452           1           0          1          1

.
. ********** (1) MULTINOMIAL LOGIT: ALTERNATIVE-INVARIANT REGRESSOR *********
.
. *** (1A) Estimate the model
.
. * Data are already in form for mlogit
.
. * The following gives MNL column of Table 15.2, p.493
. mlogit mode ydiv1000, basecategory(1)

Iteration 0:   log likelihood = -1497.7229
Iteration 1:   log likelihood = -1477.5265
Iteration 2:   log likelihood = -1477.1514
Iteration 3:   log likelihood = -1477.1506

Multinomial logistic regression                   Number of obs   =       1182
                                                  LR chi2(3)      =      41.14
                                                  Prob > chi2     =     0.0000
Log likelihood = -1477.1506                       Pseudo R2       =     0.0137

------------------------------------------------------------------------------
        mode |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
pier         |
    ydiv1000 |  -.1434029   .0532882    -2.69   0.007    -.2478459     -.03896
       _cons |   .8141503   .2286316     3.56   0.000     .3660405     1.26226
-------------+----------------------------------------------------------------
private      |
    ydiv1000 |   .0919064   .0406638     2.26   0.024     .0122069    .1716059
       _cons |   .7389208   .1967309     3.76   0.000     .3533352    1.124506
-------------+----------------------------------------------------------------
charter      |
    ydiv1000 |  -.0316399   .0418463    -0.76   0.450    -.1136571    .0503774
       _cons |   1.341291   .1945167     6.90   0.000     .9600457    1.722537
------------------------------------------------------------------------------
(Outcome mode==beach is the comparison group)

.
. *** (1B) Calculate the marginal effects
.
. quietly mlogit mode ydiv1000, basecategory(1)
. * Predict by default gives the probabilities
. predict p1 p2 p3 p4
(option p assumed; predicted probabilities)
.
. * As check compare predicted to actual probabilities
. summarize dbeach p1 dpier p2 dprivate p3 dcharter p4

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      dbeach |      1182    .1133672    .3171753          0          1
          p1 |      1182    .1133672    .0036716   .0947395   .1153659
       dpier |      1182    .1505922    .3578023          0          1
          p2 |      1182    .1505922    .0444575   .0356142   .2342903
    dprivate |      1182    .3536379    .4783008          0          1
-------------+--------------------------------------------------------
          p3 |      1182    .3536379    .0797714   .2396973    .625706
    dcharter |      1182    .3824027    .4861799          0          1
          p4 |      1182    .3824027    .0346281   .2439403   .4158273

.
. * Quick way to compute marginal effects (or semi-elasticities dp/dlnx or elasticities)
. * is to use the built-in Stata command mfx, which evaluates at the sample mean
. * Options are dydx, eyex, dyex or eydx
. mfx compute, dydx predict(outcome(1))

Marginal effects after mlogit
      y  = Pr(mode==1) (predict, outcome(1))
         =  .11541492
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |    .000075      .00393    0.02   0.985  -.007635  .007785   4.09934
------------------------------------------------------------------------------

. mfx compute, dydx predict(outcome(2))

Marginal effects after mlogit
      y  = Pr(mode==2) (predict, outcome(2))
         =  .14472379
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |  -.0206598      .00487   -4.24   0.000  -.030212 -.011108   4.09934
------------------------------------------------------------------------------

. mfx compute, dydx predict(outcome(3))

Marginal effects after mlogit
      y  = Pr(mode==3) (predict, outcome(3))
         =  .35220366
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |   .0325985      .00569    5.73   0.000   .021442  .043755   4.09934
------------------------------------------------------------------------------

. mfx compute, dydx predict(outcome(4))

Marginal effects after mlogit
      y  = Pr(mode==4) (predict, outcome(4))
         =  .38765763
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |  -.0120137      .00608   -1.98   0.048  -.023922 -.000106   4.09934
------------------------------------------------------------------------------
.
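
The at-means marginal effects reported by mfx can be cross-checked against the closed-form multinomial logit derivative. The notation below (p_j, alpha_j, beta_j, x) is introduced here for illustration and does not appear in the log; x stands for ydiv1000, and alpha_1 = beta_1 = 0 for the base category (beach).

```latex
p_j(x) = \frac{\exp(\alpha_j + \beta_j x)}{\sum_{l=1}^{4}\exp(\alpha_l + \beta_l x)},
\qquad
\frac{\partial p_j}{\partial x} = p_j\Bigl(\beta_j - \sum_{l=1}^{4} p_l\,\beta_l\Bigr).

% At x = 4.09934, with the reported coefficients and predicted probabilities:
\sum_{l} p_l\,\beta_l \approx 0.1447(-0.1434) + 0.3522(0.0919) + 0.3877(-0.0316) \approx -0.00065,

\frac{\partial p_1}{\partial x} \approx 0.1154\,(0 + 0.00065) \approx 0.000075,
\qquad
\frac{\partial p_2}{\partial x} \approx 0.1447\,(-0.1434 + 0.00065) \approx -0.0207 .
```

Both values agree with the dy/dx column for outcomes 1 and 2 above; the effects sum to zero across the four outcomes because the probabilities sum to one.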
. * Better is to evaluate marginal effect for each observation and average
. * The following calculates marginal effects using noncalculus methods
. * by comparing the predicted probability before and after change in x
. * Here consider small change of 0.0001 - then multiply by 10000
. * So should be similar to using calculus methods.
. replace ydiv1000 = ydiv1000 + 0.0001
(1182 real changes made)
. predict p1new p2new p3new p4new
(option p assumed; predicted probabilities)
. gen dp1dy = 10000*(p1new - p1)
. gen dp2dy = 10000*(p2new - p2)
. gen dp3dy = 10000*(p3new - p3)
. gen dp4dy = 10000*(p4new - p4)
.
. * The computed marginal effects follow.
. * These are close to those given in text page 494 (which were calculated using Limdep)
. sum dp1dy dp2dy dp3dy dp4dy

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       dp1dy |      1182    .0001549    .0015919  -.0042468   .0027567
       dp2dy |      1182   -.0207849    .0046004  -.0278652  -.0067055
       dp3dy |      1182    .0318045    .0014852   .0280142   .0336766
       dp4dy |      1182   -.0111929    .0041308  -.0190735  -.0026822

.
. * Note that here these are similar to the earlier values at means
. * This is because little variation in predicted probability across individuals here
.
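
In later Stata releases the same two quantities can be obtained without manual differencing. The lines below are a sketch only, assuming Stata 11 or later, where margins supersedes mfx; they are not part of the original session. Because the log leaves ydiv1000 shifted by 0.0001, the sketch first rebuilds it from income and refits the model.

```stata
* Sketch only; assumes Stata 11 or later (margins is not used in the original log)
* Rebuild ydiv1000 from income to remove the 0.0001 shift applied above
replace ydiv1000 = income/1000
quietly mlogit mode ydiv1000, baseoutcome(1)
* Marginal effect on Pr(mode==2) at the sample mean, comparable to the mfx output
margins, dydx(ydiv1000) atmeans predict(outcome(2))
* Average over observations of the individual marginal effects, comparable to dp2dy
margins, dydx(ydiv1000) predict(outcome(2))
```

Changing outcome(2) to outcome(1), outcome(3) or outcome(4) gives the other three alternatives.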
. * ASIDE: Binary logit will differ a little from MNL
. keep if mode == 1 | mode == 2
(870 observations deleted)
. mlogit mode ydiv1000

Iteration 0:   log likelihood = -213.14899
Iteration 1:   log likelihood = -210.28877
Iteration 2:   log likelihood = -210.28833

Multinomial logistic regression                   Number of obs   =        312
                                                  LR chi2(1)      =       5.72
                                                  Prob > chi2     =     0.0168
Log likelihood = -210.28833                       Pseudo R2       =     0.0134

------------------------------------------------------------------------------
        mode |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
beach        |
    ydiv1000 |   .1134757   .0481736     2.36   0.018     .0190571    .2078942
       _cons |  -.7037127   .2125851    -3.31   0.001    -1.120372   -.2870535
------------------------------------------------------------------------------
(Outcome mode==pier is the comparison group)
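
To see what "differ a little" means here, the binary estimates can be set against the pier equation of the four-outcome model. The symbols alpha_2 and beta_2 (pier intercept and income slope, beach as base) are notation added for this comparison, not taken from the log.

```latex
% Under the MNL, the odds between any two alternatives depend only on those two:
\ln\frac{p_{\text{beach}}}{p_{\text{pier}}} = -\alpha_2 - \beta_2\,x .

% Implied by the four-outcome MNL estimates:
-\hat\alpha_2 = -0.814, \qquad -\hat\beta_2 = 0.143 .

% Estimated directly by the binary logit on the 312 beach/pier observations:
\hat\alpha_{\text{beach}} = -0.704, \qquad \hat\beta_{\text{beach}} = 0.113 .
```

Both sets of estimates target the same log-odds; they differ because the binary fit discards the 870 private and charter observations.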
.
. ******* (2) CONDITIONAL LOGIT: ALTERNATIVE-SPECIFIC REGRESSOR *********
.
. *** (2A) Estimate the model
.
. * This requires reshaping the data
. clear
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. gen ydiv1000 = income/1000
.
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this which also creates variable (see below)
. * alternatv = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter


. gen id = _n
. gen d1 = dbeach
. gen p1 = pbeach
. gen q1 = qbeach
. gen d2 = dpier
. gen p2 = ppier
. gen q2 = qpier
. gen d3 = dprivate
. gen p3 = pprivate
. gen q3 = qprivate
. gen d4 = dcharter
. gen p4 = pcharter
. gen q4 = qcharter
. describe

Contains data
  obs:         1,182
 vars:            30
 size:       146,568 (98.6% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
id              float  %9.0g
d1              float  %9.0g
p1              float  %9.0g
q1              float  %9.0g
d2              float  %9.0g
p2              float  %9.0g
q2              float  %9.0g
d3              float  %9.0g
p3              float  %9.0g
q3              float  %9.0g
d4              float  %9.0g
p4              float  %9.0g
q4              float  %9.0g
------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
          id |      1182       591.5    341.3583          1       1182
          d1 |      1182    .1133672    .3171753          0          1
          p1 |      1182     103.422     103.641       1.29    843.186
-------------+--------------------------------------------------------
          q1 |      1182    .2410113    .1907524      .0678      .5333
          d2 |      1182    .1505922    .3578023          0          1
          p2 |      1182     103.422     103.641       1.29    843.186
          q2 |      1182    .1622237    .1603898      .0014      .4522
          d3 |      1182    .3536379    .4783008          0          1
-------------+--------------------------------------------------------
          p3 |      1182    55.25657    62.71344       2.29     666.11
          q3 |      1182    .1712146    .2097885      .0002      .7369
          d4 |      1182    .3824027    .4861799          0          1
          p4 |      1182    84.37924    63.54465      27.29     691.11
          q4 |      1182    .6293679    .7061142      .0021     2.3101
.
. reshape long d p q, i(id) j(alterntv)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  30   ->      22
j variable (4 values)                     ->   alterntv
xij variables:
                           d1 d2 ... d4   ->   d
                           p1 p2 ... p4   ->   p
                           q1 q2 ... q4   ->   q
-----------------------------------------------------------------------------
. * This automatically creates alterntv = 1 (beach), ... 4 (charter)
. describe

Contains data
  obs:         4,728
 vars:            22
 size:       420,792 (95.9% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
id              float  %9.0g
alterntv        byte   %9.0g
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
d               float  %9.0g
p               float  %9.0g
q               float  %9.0g
------------------------------------------------------------------------------
Sorted by:  id  alterntv
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
.
. clogit d q, group(id)

Iteration 0:   log likelihood = -1627.3339
Iteration 1:   log likelihood = -1604.8049
Iteration 2:   log likelihood = -1604.6163
Iteration 3:   log likelihood = -1604.6163

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(1)      =      67.97
                                                  Prob > chi2     =     0.0000
Log likelihood = -1604.6163                       Pseudo R2       =     0.0207

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           q |   .6307908   .0757624     8.33   0.000     .4822993    .7792823
------------------------------------------------------------------------------

. clogit d p, group(id)

Iteration 0:   log likelihood = -1595.7652
Iteration 1:   log likelihood = -1411.4335
Iteration 2:   log likelihood = -1376.0224
Iteration 3:   log likelihood = -1372.9619
Iteration 4:   log likelihood = -1372.9332
Iteration 5:   log likelihood = -1372.9332

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(1)      =     531.33
                                                  Prob > chi2     =     0.0000
Log likelihood = -1372.9332                       Pseudo R2       =     0.1621

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0179501   .0010694   -16.79   0.000    -.0200461   -.0158542
------------------------------------------------------------------------------
.
. * The following gives CL column of Table 15.2
. clogit d p q, group(id)

Iteration 0:   log likelihood = -1581.9099
Iteration 1:   log likelihood = -1363.5718
Iteration 2:   log likelihood = -1317.8453
Iteration 3:   log likelihood = -1312.1013
Iteration 4:   log likelihood = -1311.9797
Iteration 5:   log likelihood = -1311.9796

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(2)      =     653.24
                                                  Prob > chi2     =     0.0000
Log likelihood = -1311.9796                       Pseudo R2       =     0.1993

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0204765   .0012231   -16.74   0.000    -.0228737   -.0180794
           q |   .9530985   .0894134    10.66   0.000     .7778514    1.128346
------------------------------------------------------------------------------

.
. *** (2B) Calculate the marginal effects
.
. quietly clogit d p q, group(id)
. predict pinitial
(option pc1 assumed; conditional probability for single outcome within group)
.
. * Now compute marginal effects
. * Consider in turn a change in each price and catch rate
. * Change price by 1 unit and then multiply by 100 as in Table 15.2
. * Change catch rate by 0.001 and then multiply by 1000
.
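
The finite differences computed in this part approximate the analytical conditional logit derivatives. The notation below (P_ij and Q_ij for the price and catch rate of alternative j facing individual i, beta_p and beta_q for their coefficients, delta_jk equal to 1 if j = k and 0 otherwise) is introduced here for illustration and is not from the log.

```latex
p_{ij} = \frac{\exp(\beta_p P_{ij} + \beta_q Q_{ij})}{\sum_{l=1}^{4}\exp(\beta_p P_{il} + \beta_q Q_{il})},
\qquad
\frac{\partial p_{ij}}{\partial P_{ik}} = p_{ij}\,(\delta_{jk} - p_{ik})\,\beta_p,
\qquad
\frac{\partial p_{ij}}{\partial Q_{ik}} = p_{ij}\,(\delta_{jk} - p_{ik})\,\beta_q .

% Own effects are largest in magnitude at p_{ij} = 1/2, where p_{ij}(1 - p_{ij}) = 1/4:
100 \times \frac{\hat\beta_p}{4} = 100 \times \frac{-0.0204765}{4} \approx -0.5119,
\qquad
\frac{\hat\beta_q}{4} = \frac{0.9530985}{4} \approx 0.2383 .
```

These bounds match the extreme own-effect values in the summaries at the end of this part (minimum of about -.5119 for the own-price effects, maximum of about .2383 for the own-catch-rate effects). Own effects carry the sign of the coefficient and cross effects the opposite sign.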
. * Change p1: price beach
. replace p = p + 1 if alterntv==1
(1182 real changes made)
. predict pnewp1
(option pc1 assumed; conditional probability for single outcome within group)
. gen mep1 = 100*(pnewp1 - pinitial)
. replace p = p - 1 if alterntv==1
(1182 real changes made)
.
. * Change p2: price pier
. replace p = p + 1 if alterntv==2
(1182 real changes made)
. predict pnewp2
(option pc1 assumed; conditional probability for single outcome within group)
. gen mep2 = 100*(pnewp2 - pinitial)
. replace p = p - 1 if alterntv==2
(1182 real changes made)
.
. * Change p3: price private boat
. replace p = p + 1 if alterntv==3
(1182 real changes made)
. predict pnewp3
(option pc1 assumed; conditional probability for single outcome within group)
. gen mep3 = 100*(pnewp3 - pinitial)
. replace p = p - 1 if alterntv==3
(1182 real changes made)


.
. * Change p4: price charter boat
. replace p = p + 1 if alterntv==4
(1182 real changes made)
. predict pnewp4
(option pc1 assumed; conditional probability for single outcome within group)
. gen mep4 = 100*(pnewp4 - pinitial)
. replace p = p - 1 if alterntv==4
(1182 real changes made)
.
. * Change q1: catch rate beach
. replace q = q + 0.001 if alterntv==1
(1182 real changes made)
. predict pnewq1
(option pc1 assumed; conditional probability for single outcome within group)
. gen meq1 = 1000*(pnewq1 - pinitial)
. replace q = q - 0.001 if alterntv==1
(1182 real changes made)
.
. * Change q2: catch rate pier
. replace q = q + 0.001 if alterntv==2
(1182 real changes made)
. predict pnewq2
(option pc1 assumed; conditional probability for single outcome within group)
. gen meq2 = 1000*(pnewq2 - pinitial)
. replace q = q - 0.001 if alterntv==2
(1182 real changes made)
.
. * Change q3: catch rate private boat
. replace q = q + 0.001 if alterntv==3
(1182 real changes made)
. predict pnewq3
(option pc1 assumed; conditional probability for single outcome within group)
. gen meq3 = 1000*(pnewq3 - pinitial)

. replace q = q - 0.001 if alterntv==3


(1182 real changes made)
.
. * Change q4: catch rate charter boat
. replace q = q + 0.001 if alterntv==4
(1182 real changes made)
. predict pnewq4
(option pc1 assumed; conditional probability for single outcome within group)
. gen meq4 = 1000*(pnewq4 - pinitial)
. replace q = q - 0.001 if alterntv==4
(1182 real changes made)
.
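
The eight perturb-and-predict blocks above follow one pattern, so they can be written as a loop. The lines below are a sketch only and are not part of the original session; the variable names pbase, ptmp, mep_j and meq_j are illustrative and chosen so as not to clash with the variables already created.

```stata
* Sketch only: loop form of the eight blocks above (not part of the original log)
* Assumes the long-form data with variables d, p, q, id and alterntv
quietly clogit d p q, group(id)
predict pbase
forvalues j = 1/4 {
    * Raise the price of alternative j by 1, re-predict, scale by 100
    quietly replace p = p + 1 if alterntv == `j'
    quietly predict ptmp
    generate mep_`j' = 100*(ptmp - pbase)
    quietly replace p = p - 1 if alterntv == `j'
    drop ptmp
    * Raise the catch rate of alternative j by 0.001, re-predict, scale by 1000
    quietly replace q = q + 0.001 if alterntv == `j'
    quietly predict ptmp
    generate meq_`j' = 1000*(ptmp - pbase)
    quietly replace q = q - 0.001 if alterntv == `j'
    drop ptmp
}
bysort alterntv: summarize pbase mep_* meq_*
```

Each pass undoes its own change before the next one, so every effect is measured from the same baseline.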
. * Following gives Table 15.3 on page 493
. sort alterntv
. by alterntv: sum pinitial mep1 mep2 mep3 mep4 meq1 meq2 meq3 meq4

----------------------------------------------------------------------
-> alterntv = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .1942074    .1545855   6.19e-08   .6159062
        mep1 |      1182   -.2703818    .1753241  -.5119085  -1.26e-07
        mep2 |      1182    .1183563    .1425011          0   .5107701
        mep3 |      1182    .0846517    .0561764   6.24e-08   .1818448
        mep4 |      1182    .0675326    .0398588   6.44e-08   .1960158
-------------+--------------------------------------------------------
        meq1 |      1182    .1264198    .0817316   5.91e-08   .2382994
        meq2 |      1182   -.0552685    .0664207  -.2378225          0
        meq3 |      1182   -.0395602    .0262581  -.0849366  -2.91e-08
        meq4 |      1182   -.0315872    .0186528  -.0915527  -3.00e-08

----------------------------------------------------------------------
-> alterntv = 2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .1832872    .1456892   5.73e-08    .484103
        mep1 |      1182    .1184102    .1425963          0   .5111754
        mep2 |      1182   -.2618934    .1742628  -.5112112  -1.16e-07
        mep3 |      1182    .0801368    .0543153   5.78e-08   .1729459
        mep4 |      1182    .0636229    .0381182   5.96e-08   .1775354
-------------+--------------------------------------------------------
        meq1 |      1182   -.0552672    .0664175  -.2378225          0
        meq2 |      1182    .1224849    .0812789   5.47e-08   .2380311
        meq3 |      1182   -.0374514    .0253908  -.0807345  -2.69e-08
        meq4 |      1182   -.0297604    .0178421  -.0829101  -2.78e-08

----------------------------------------------------------------------
-> alterntv = 3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .3298317     .173932   .0000756   .6739099
        mep1 |      1182     .084509    .0561326          0   .1815647
        mep2 |      1182    .0799891    .0542687          0    .172469
        mep3 |      1182   -.3897785    .1364849  -.5119085  -.0001532
        mep4 |      1182    .2248109    .1606873   1.24e-08   .5118489
-------------+--------------------------------------------------------
        meq1 |      1182   -.0395636      .02626  -.0849366          0
        meq2 |      1182   -.0374553    .0253917  -.0807345          0
        meq3 |      1182    .1818861    .0633881   .0000721   .2382994
        meq4 |      1182    -.104879    .0748259  -.2382398  -7.28e-09

----------------------------------------------------------------------
-> alterntv = 4

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .2926737    .1807255    .000078   .7322331
        mep1 |      1182    .0674624    .0398696          0   .1958013
        mep2 |      1182    .0635479    .0381287          0   .1772434
        mep3 |      1182      .22499    .1608719   1.24e-08    .511682
        mep4 |      1182   -.3559665    .1370352  -.5119085  -.0001582
-------------+--------------------------------------------------------
        meq1 |      1182   -.0315891     .018653  -.0915825          0
        meq2 |      1182   -.0297618    .0178418  -.0829399          0
        meq3 |      1182   -.1048757    .0748219  -.2382398  -7.28e-09
        meq4 |      1182    .1662257    .0636901   .0000744   .2382994

.
. ******* (3) CONDITIONAL LOGIT: ALTERNATIVE-INVARIANT REGRESSOR *********
.
. * Here we get clogit to do something that is easier done by mlogit
.
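
The reason clogit can reproduce the mlogit fit is that an alternative-invariant regressor enters the conditional logit index through interactions with alternative dummies. In the notation below, which is added here for illustration, y_i is ydiv1000 for individual i and d_jk = 1 if j = k and 0 otherwise.

```latex
V_{ij} = \sum_{k=2}^{4} \bigl(\alpha_k + \beta_k\, y_i\bigr)\, d_{jk}
\quad\Longrightarrow\quad
p_{ij} = \frac{\exp(V_{ij})}{\sum_{l=1}^{4}\exp(V_{il})}
       = \frac{\exp(\alpha_j + \beta_j y_i)}{1 + \sum_{l=2}^{4}\exp(\alpha_l + \beta_l y_i)},
\qquad \alpha_1 = \beta_1 = 0 .
```

This is the multinomial logit probability with beach (alternative 1) as the base category, so the coefficients on the three dummies and the three income interactions correspond to the _cons and ydiv1000 rows of the mlogit output in part (1).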
. clear
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. gen ydiv1000 = income/1000

.
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this but first create variable
. * Alternative = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. gen id = _n
. gen d1 = dbeach
. gen d2 = dpier
. gen d3 = dprivate
. gen d4 = dcharter
. describe

Contains data
  obs:         1,182
 vars:            22
 size:       108,744 (98.9% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
id              float  %9.0g
d1              float  %9.0g
d2              float  %9.0g
d3              float  %9.0g
d4              float  %9.0g
------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
          id |      1182       591.5    341.3583          1       1182
          d1 |      1182    .1133672    .3171753          0          1
          d2 |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
          d3 |      1182    .3536379    .4783008          0          1
          d4 |      1182    .3824027    .4861799          0          1
.
. reshape long d, i(id) j(alterntv)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  22   ->      20
j variable (4 values)                     ->   alterntv
xij variables:
                           d1 d2 ... d4   ->   d
-----------------------------------------------------------------------------
. describe

Contains data
  obs:         4,728
 vars:            20
 size:       382,968 (96.3% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
id              float  %9.0g
alterntv        byte   %9.0g
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
d               float  %9.0g
------------------------------------------------------------------------------
Sorted by:  id  alterntv
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1

.
. gen obsnum=_n
. gen d2 = 0
. replace d2 = 1 if mod(obsnum,4)==2
(1182 real changes made)
. gen d3 = 0
. replace d3 = 1 if mod(obsnum,4)==3
(1182 real changes made)
. gen d4 = 0
. replace d4 = 1 if mod(obsnum,4)==0
(1182 real changes made)
. gen d2y = 0
. replace d2y = d2*ydiv1000
(1182 real changes made)
. gen d3y = 0
. replace d3y = d3*ydiv1000
(1182 real changes made)
. gen d4y = 0
. replace d4y = d4*ydiv1000
(1182 real changes made)
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
      obsnum |      4728      2364.5        1365          1       4728
          d2 |      4728         .25    .4330585          0          1
          d3 |      4728         .25    .4330585          0          1
          d4 |      4728         .25    .4330585          0          1
         d2y |      4728    1.024834    2.160064          0       12.5
-------------+--------------------------------------------------------
         d3y |      4728    1.024834    2.160064          0       12.5
         d4y |      4728    1.024834    2.160064          0       12.5
.
. * The following gives MNL column of Table 15.2, p.493,
. * which was more easily obtained using mlogit earlier
. clogit d d2 d3 d4 d2y d3y d4y, group(id)

Iteration 0:   log likelihood = -1570.1863
Iteration 1:   log likelihood = -1479.3713
Iteration 2:   log likelihood =  -1477.159
Iteration 3:   log likelihood = -1477.1506
Iteration 4:   log likelihood = -1477.1506

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(6)      =     322.90
                                                  Prob > chi2     =     0.0000
Log likelihood = -1477.1506                       Pseudo R2       =     0.0985

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          d2 |   .8141503    .228632     3.56   0.000     .3660399    1.262261
          d3 |   .7389208   .1967309     3.76   0.000     .3533352    1.124506
          d4 |   1.341291   .1945167     6.90   0.000     .9600457    1.722537
         d2y |  -.1434029   .0532884    -2.69   0.007    -.2478463   -.0389595
         d3y |   .0919064   .0406637     2.26   0.024     .0122069    .1716058
         d4y |  -.0316399   .0418463    -0.76   0.450    -.1136571    .0503774
------------------------------------------------------------------------------
.
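The clogit coefficients above can be turned into predicted choice probabilities. The following is a minimal Python sketch (outside Stata, for illustration only) that assumes beach (alternative 1) is the base category with utility index zero, and that evaluates the probabilities at the sample mean of ydiv1000 reported in the summarize output. The function name mnl_probabilities is hypothetical and not part of the original program.

```python
import math

# Coefficients copied from the clogit output above.
# Alternative 1 (beach) is the omitted base category, so its index is 0.
INTERCEPT = {1: 0.0, 2: 0.8141503, 3: 0.7389208, 4: 1.341291}
INCOME_SLOPE = {1: 0.0, 2: -0.1434029, 3: 0.0919064, 4: -0.0316399}


def mnl_probabilities(income_thousands):
    """Return {alternative: probability} for one value of ydiv1000."""
    v = {j: INTERCEPT[j] + INCOME_SLOPE[j] * income_thousands for j in INTERCEPT}
    vmax = max(v.values())  # subtract the maximum for numerical stability
    expv = {j: math.exp(vj - vmax) for j, vj in v.items()}
    total = sum(expv.values())
    return {j: e / total for j, e in expv.items()}


# Evaluate at the sample mean of ydiv1000 (4.099337 in the summarize output).
probs = mnl_probabilities(4.099337)
for j in sorted(probs):
    print(j, round(probs[j], 4))
```

Probabilities evaluated at mean income need not equal the sample shares of each mode, because the logit transformation is nonlinear in income.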

. ******* (4) "MIXED LOGIT" = CONDITIONAL LOGIT WITH BOTH
. *           ALTERNATIVE-SPECIFIC REGRESSOR
. *           AND ALTERNATIVE INVARIANT REGRESSOR *********
.
. clear
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. gen ydiv1000 = income/1000
.
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this but first create variable
. * Alternative = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. gen id = _n
. gen d1 = dbeach
. gen p1 = pbeach
. gen q1 = qbeach
. gen d2 = dpier
. gen p2 = ppier
. gen q2 = qpier
. gen d3 = dprivate
. gen p3 = pprivate
. gen q3 = qprivate
. gen d4 = dcharter
. gen p4 = pcharter
. gen q4 = qcharter
.
. reshape long d p q, i(id) j(alterntv)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  30   ->      22
j variable (4 values)                     ->   alterntv
xij variables:
                           d1 d2 ... d4   ->   d
                           p1 p2 ... p4   ->   p
                           q1 q2 ... q4   ->   q
-----------------------------------------------------------------------------
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
.
. * Bring in alternative specific dummies
. * Since d2-d4 already used instead call them dummy2 - dummy4
. gen obsnum=_n
. gen dummy1 = 0
. replace dummy1 = 1 if mod(obsnum,4)==1
(1182 real changes made)
. gen dummy2 = 0
. replace dummy2 = 1 if mod(obsnum,4)==2
(1182 real changes made)
. gen dummy3 = 0
. replace dummy3 = 1 if mod(obsnum,4)==3
(1182 real changes made)
. gen dummy4 = 0
. replace dummy4 = 1 if mod(obsnum,4)==0
(1182 real changes made)
. * And interact with income
. gen d1y = 0
. replace d1y = dummy1*ydiv1000
(1182 real changes made)
. gen d2y = 0
. replace d2y = dummy2*ydiv1000
(1182 real changes made)
. gen d3y = 0
. replace d3y = dummy3*ydiv1000
(1182 real changes made)
. gen d4y = 0
. replace d4y = dummy4*ydiv1000
(1182 real changes made)
.
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
      obsnum |      4728      2364.5        1365          1       4728
      dummy1 |      4728         .25    .4330585          0          1
      dummy2 |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
      dummy3 |      4728         .25    .4330585          0          1
      dummy4 |      4728         .25    .4330585          0          1
         d1y |      4728    1.024834    2.160064          0       12.5
         d2y |      4728    1.024834    2.160064          0       12.5
         d3y |      4728    1.024834    2.160064          0       12.5
-------------+--------------------------------------------------------
         d4y |      4728    1.024834    2.160064          0       12.5
.
. clogit d dummy2 dummy3 dummy4 p q, group(id)

Iteration 0:   log likelihood = -1548.5161
Iteration 1:   log likelihood = -1311.3761
Iteration 2:   log likelihood = -1247.5777
Iteration 3:   log likelihood = -1232.1412
Iteration 4:   log likelihood = -1230.7975
Iteration 5:   log likelihood = -1230.7838
Iteration 6:   log likelihood = -1230.7838

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(5)      =     815.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -1230.7838                       Pseudo R2       =     0.2489

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      dummy2 |   .3070552   .1145738     2.68   0.007     .0824947    .5316158
      dummy3 |   .8713749   .1140428     7.64   0.000     .6478551    1.094895
      dummy4 |   1.498888   .1329328    11.28   0.000     1.238345    1.759432
           p |  -.0247896   .0017044   -14.54   0.000    -.0281301    -.021449
           q |   .3771689   .1099707     3.43   0.001     .1616303    .5927074
------------------------------------------------------------------------------
.
. * The following gives Mixed column of Table 15.2, p.493
. clogit d p q dummy2 dummy3 dummy4 d2y d3y d4y, group(id)

Iteration 0:   log likelihood =  -1538.389
Iteration 1:   log likelihood = -1297.4143
Iteration 2:   log likelihood = -1233.5431
Iteration 3:   log likelihood = -1216.8043
Iteration 4:   log likelihood = -1215.1582
Iteration 5:   log likelihood = -1215.1376
Iteration 6:   log likelihood = -1215.1376

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(8)      =     846.92
                                                  Prob > chi2     =     0.0000
Log likelihood = -1215.1376                       Pseudo R2       =     0.2584

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0251166   .0017317   -14.50   0.000    -.0285106   -.0217225
           q |    .357782   .1097733     3.26   0.001     .1426302    .5729337
      dummy2 |   .7779594   .2204939     3.53   0.000     .3457992     1.21012
      dummy3 |   .5272788   .2227927     2.37   0.018     .0906131    .9639444
      dummy4 |   1.694366   .2240506     7.56   0.000     1.255235    2.133497
         d2y |  -.1275771   .0506395    -2.52   0.012    -.2268288   -.0283255
         d3y |   .0894398   .0500671     1.79   0.074    -.0086898    .1875695
         d4y |  -.0332917   .0503409    -0.66   0.508     -.131958    .0653746
------------------------------------------------------------------------------
.
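The LR chi2 and Pseudo R2 lines in the header above can be checked by hand. The sketch below (Python, for illustration only) assumes that the comparison model has no regressors, so that each of the 1182 individuals chooses each of the 4 alternatives with probability 1/4; that assumption is not stated in the log but is consistent with the reported figures. The function name fit_stats is hypothetical.

```python
import math

N_INDIVIDUALS = 1182   # groups defined by group(id)
N_ALTERNATIVES = 4     # rows per individual in the long data

# Assumed null model: every alternative equally likely for every individual.
ll_null = N_INDIVIDUALS * math.log(1.0 / N_ALTERNATIVES)


def fit_stats(ll_model):
    """Return (LR chi-square, pseudo R-squared) relative to the null model."""
    lr = 2.0 * (ll_model - ll_null)
    pseudo_r2 = 1.0 - ll_model / ll_null
    return lr, pseudo_r2


# Log likelihood of the Mixed model reported above.
lr_mixed, r2_mixed = fit_stats(-1215.1376)
print(round(lr_mixed, 2), round(r2_mixed, 4))
```

The same function applied to -1230.7838 and -1477.1506 gives values that agree with the headers of the two earlier clogit runs, up to rounding of the printed log likelihoods.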
. * Output data file for Read into Limdep program mma15p4gev.lim
. outfile id d p q ydiv1000 dummy2 dummy3 dummy4 d2y d3y d4y using mma15p4gev.asc, replace
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section4\mma15p1mnl.txt
  log type:  text
 closed on:  19 May 2005, 12:16:24
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section4\mma15p2gev.txt
  log type:  text
 opened on:  19 May 2005, 12:16:29
.
. ********** OVERVIEW OF MMA15P2GEV.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 15.6.3 page 511
. * Nested logit (GEV) model analysis.
. * (1) Set data up and reproduce Mixed estimates in Table 15.2 p.493
. * (2A) Nested logit model estimates (page 511)
. * (2B) Restricted nested logit model estimates (page 511)
. * (2C) Equivalent conditional logit model estimates (same as (2B))
.
. * Related programs are
. * mma15p1mnl.do multinomial and conditional logit using Stata
. * mma15p3mnl.lim multinomial logit using Limdep
. * mma15p4gev.lim conditional and nested logit using Limdep and Nlogit
.
. * To run this program you need data file
. * Nldata.asc
.
. * NOTE: The example here is deliberately simple and merely illustrative.
. *       with nesting structure
. *                /      \
. *              /  \    /  \
. * In this case with parameter rho_j differing across alternatives
. * Stata 8 estimates the earlier variant of the nested logit model
. * rather than the preferred variant given in the text.
. * See the discussion at bottom of page 511 and also Train (2003, p.88)
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * Data Set comes from :
. * J. A. Herriges and C. L. Kling,
. * "Nonlinear Income Effects in Random Utility Models",
. * Review of Economics and Statistics, 81(1999): 62-72
.
. * The data are given as a combined observation with data on all 4 choices.
. * This will work for multinomial logit program.
. * For conditional logit will need to make a new data set which has
. * four separate entries for each observation as there are four alternatives.
.
. * Filename: NLDATA.ASC
. * Format: Ascii
. * Number of Observations: 1182
. * Each observation appears over 3 lines with 4 variables per line
. * so 4 x 1182 = 4728 observations
. * Variable Number and Description
. * 1 Recreation mode choice. = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. * 2 Price for chosen alternative
. * 3 Catch rate for chosen alternative
. * 4 = 1 if beach mode chosen; = 0 otherwise
. * 5 = 1 if pier mode chosen; = 0 otherwise
. * 6 = 1 if private boat mode chosen; = 0 otherwise
. * 7 = 1 if charter boat mode chosen; = 0 otherwise
. * 8 = price for beach mode
. * 9 = price for pier mode
. * 10 = price for private boat mode
. * 11 = price for charter boat mode
. * 12 = catch rate for beach mode
. * 13 = catch rate for pier mode
. * 14 = catch rate for private boat mode
. * 15 = catch rate for charter boat mode
. * 16 = monthly income
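
The infile command used below reads NLDATA.ASC in free format: values are taken in order and assigned to the 16 variable names, so a record may wrap over several lines. As a rough illustration of that reading rule, here is a minimal Python sketch. It handles whitespace-separated numbers only, the function name read_free_format is hypothetical, and the two sample records are made up rather than taken from the data file.

```python
NAMES = ["mode", "price", "crate", "dbeach", "dpier", "dprivate", "dcharter",
         "pbeach", "ppier", "pprivate", "pcharter",
         "qbeach", "qpier", "qprivate", "qcharter", "income"]


def read_free_format(text, names=NAMES):
    """Split text on whitespace and group the values into records."""
    values = [float(tok) for tok in text.split()]
    k = len(names)
    if len(values) % k != 0:
        raise ValueError("number of values is not a multiple of the variable count")
    return [dict(zip(names, values[i:i + k])) for i in range(0, len(values), k)]


# Two made-up records, each wrapped over several lines.
sample = """
4 180 0.5 0 0 0 1
150 150 160 180 0.07 0.05
0.16 0.5 7000
1 15 0.1 1 0 0 0 15 15 10 35
0.1 0.05 0.26 1.0 1250
"""

records = read_free_format(sample)
print(len(records), records[0]["mode"], records[0]["income"])
```

Because line breaks are ignored, the layout of the file on disk does not matter as long as each individual contributes exactly 16 values in the documented order.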
.
. ******* (1) CONDITIONAL LOGIT MODEL (Table 15.2 p.493 Mixed column) *********
.
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. gen ydiv1000 = income/1000
.
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this, which also creates the variable (see below)
. * alterntv = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. gen id = _n
. gen d1 = dbeach
. gen p1 = pbeach
. gen q1 = qbeach
. gen d2 = dpier
. gen p2 = ppier
. gen q2 = qpier
. gen d3 = dprivate
. gen p3 = pprivate
. gen q3 = qprivate
. gen d4 = dcharter
. gen p4 = pcharter
. gen q4 = qcharter
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
          id |      1182       591.5    341.3583          1       1182
          d1 |      1182    .1133672    .3171753          0          1
          p1 |      1182     103.422     103.641       1.29    843.186
-------------+--------------------------------------------------------
          q1 |      1182    .2410113    .1907524      .0678      .5333
          d2 |      1182    .1505922    .3578023          0          1
          p2 |      1182     103.422     103.641       1.29    843.186
          q2 |      1182    .1622237    .1603898      .0014      .4522
          d3 |      1182    .3536379    .4783008          0          1
-------------+--------------------------------------------------------
          p3 |      1182    55.25657    62.71344       2.29     666.11
          q3 |      1182    .1712146    .2097885      .0002      .7369
          d4 |      1182    .3824027    .4861799          0          1
          p4 |      1182    84.37924    63.54465      27.29     691.11
          q4 |      1182    .6293679    .7061142      .0021     2.3101
.
. reshape long d p q, i(id) j(alterntv)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  30   ->      22
j variable (4 values)                     ->   alterntv
xij variables:
                           d1 d2 ... d4   ->   d
                           p1 p2 ... p4   ->   p
                           q1 q2 ... q4   ->   q
-----------------------------------------------------------------------------
. * This automatically creates alterntv = 1 (beach), ... 4 (charter)
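
The wide-to-long step reported above turns each individual's row into four rows, one per alternative, collapsing d1-d4, p1-p4 and q1-q4 into d, p and q and adding the index alterntv. That is why the variable count goes from 30 to 22 (12 stub variables removed, 3 stacked variables and 1 index added). A minimal Python sketch of the same transformation follows; the function name reshape_long and the single example row are hypothetical and not taken from the data.

```python
def reshape_long(wide_rows, stubs=("d", "p", "q"), alternatives=(1, 2, 3, 4)):
    """Stack stub1..stub4 columns into one row per (individual, alternative)."""
    stub_names = {f"{s}{j}" for s in stubs for j in alternatives}
    long_rows = []
    for row in wide_rows:
        # Variables that are not stub+number are copied to every new row.
        fixed = {k: v for k, v in row.items() if k not in stub_names}
        for j in alternatives:
            new_row = dict(fixed)
            new_row["alterntv"] = j
            for s in stubs:
                new_row[s] = row[f"{s}{j}"]
            long_rows.append(new_row)
    return long_rows


# One made-up individual who chose charter (alternative 4).
wide = [{"id": 1, "ydiv1000": 7.0,
         "d1": 0, "d2": 0, "d3": 0, "d4": 1,
         "p1": 150.0, "p2": 150.0, "p3": 160.0, "p4": 180.0,
         "q1": 0.07, "q2": 0.05, "q3": 0.16, "q4": 0.5}]

long_rows = reshape_long(wide)
for r in long_rows:
    print(r["id"], r["alterntv"], r["d"], r["p"], r["q"], r["ydiv1000"])
```

In the long layout exactly one of the four rows per id has d equal to 1, which is the structure clogit with group(id) relies on.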
. describe

Contains data
  obs:         4,728
 vars:            22
 size:       420,792 (95.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
alterntv        byte   %9.0g
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
d               float  %9.0g
p               float  %9.0g
q               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  id  alterntv
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
.
. * Bring in alternative specific dummies
. * Since d2-d4 already used instead call them dummy2 - dummy4
. gen obsnum=_n
. gen dummy1 = (mod(obsnum,4)==1) * 1
. gen dummy2 = (mod(obsnum,4)==2) * 1
. gen dummy3 = (mod(obsnum,4)==3) * 1
. gen dummy4 = (mod(obsnum,4)==0) * 1
. gen d1y = (mod(obsnum,4)==1) * ydiv1000
. gen d2y = (mod(obsnum,4)==2) * ydiv1000
. gen d3y = (mod(obsnum,4)==3) * ydiv1000
. gen d4y = (mod(obsnum,4)==0) * ydiv1000
.
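
The gen statements above build the alternative-specific dummies from the row position, using mod(obsnum,4). This works because reshape leaves the data sorted by id and alterntv with exactly four rows per individual. The Python sketch below, on a small made-up data set, shows that under that sort order the position-based rule agrees with a rule based directly on alterntv, and then forms the income interactions. All values are hypothetical.

```python
# Long data sorted by (id, alterntv): four rows per individual.
rows = [{"id": i, "alterntv": j, "ydiv1000": 2.5 * i}
        for i in (1, 2, 3) for j in (1, 2, 3, 4)]

for obsnum, row in enumerate(rows, start=1):  # obsnum mimics Stata's _n
    for j in (1, 2, 3, 4):
        # mod(obsnum,4) equals j for j = 1, 2, 3 and equals 0 for j = 4.
        by_position = int(obsnum % 4 == j % 4)
        by_label = int(row["alterntv"] == j)
        assert by_position == by_label
        row[f"dummy{j}"] = by_label
        row[f"d{j}y"] = by_label * row["ydiv1000"]

print(rows[5])
```

If the data were re-sorted or an individual had missing rows, the position-based rule would no longer line up with alterntv, so the label-based comparison is the more robust of the two.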
. summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
      obsnum |      4728      2364.5        1365          1       4728
      dummy1 |      4728         .25    .4330585          0          1
      dummy2 |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
      dummy3 |      4728         .25    .4330585          0          1
      dummy4 |      4728         .25    .4330585          0          1
         d1y |      4728    1.024834    2.160064          0       12.5
         d2y |      4728    1.024834    2.160064          0       12.5
         d3y |      4728    1.024834    2.160064          0       12.5
-------------+--------------------------------------------------------
         d4y |      4728    1.024834    2.160064          0       12.5
.
. * The following gives Mixed column of Table 15.2 p.493
. * Note that dummy1 and d1y are omitted to avoid the dummy variable trap
.
. clogit d dummy2 dummy3 dummy4 d2y d3y d4y p q, group(id)

Iteration 0:   log likelihood =  -1538.389
Iteration 1:   log likelihood = -1297.4143
Iteration 2:   log likelihood = -1233.5431
Iteration 3:   log likelihood = -1216.8043
Iteration 4:   log likelihood = -1215.1582
Iteration 5:   log likelihood = -1215.1376
Iteration 6:   log likelihood = -1215.1376

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(8)      =     846.92
                                                  Prob > chi2     =     0.0000
Log likelihood = -1215.1376                       Pseudo R2       =     0.2584

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      dummy2 |   .7779594   .2204939     3.53   0.000     .3457992     1.21012
      dummy3 |   .5272788   .2227927     2.37   0.018     .0906131    .9639444
      dummy4 |   1.694366   .2240506     7.56   0.000     1.255235    2.133497
         d2y |  -.1275771   .0506395    -2.52   0.012    -.2268288   -.0283255
         d3y |   .0894398   .0500671     1.79   0.074    -.0086898    .1875695
         d4y |  -.0332917   .0503409    -0.66   0.508     -.131958    .0653746
           p |  -.0251166   .0017317   -14.50   0.000    -.0285106   -.0217225
           q |    .357782   .1097733     3.26   0.001     .1426302    .5729337
------------------------------------------------------------------------------
.
. ******* (2) NESTED LOGIT MODEL (p.511) *********
.
. * Define the Tree for Nested logit
. *       with nesting structure
. *                /      \
. *              /  \    /  \
. * In this case with parameter rho_j differing across alternatives
. * Stata 8 estimates the earlier variant of the nested logit model
. * rather than the preferred variant given in the text.
. * See the discussion at bottom of page 511 and also Train (2003, p.88)
.
. nlogitgen type = alterntv(shore: 1 | 2 , boat: 3 | 4)
new variable type is generated with 2 groups
label list lb_type
lb_type:
1 shore
2 boat
. nlogittree alterntv type

tree structure specified for the nested logit model

top --> bottom

    type       alterntv
--------------------------
   shore           1
                   2
    boat           3
                   4
.
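
The tree above groups alternatives 1 and 2 under shore and 3 and 4 under boat. As a rough illustration of how choice probabilities are formed in a nested logit with this tree, the Python sketch below uses the variant in which the lower-level index is divided by the nest's scale parameter rho, which collapses to the conditional logit when every rho equals 1. This is an assumption about the form intended by the text's "preferred variant" and is not the variant that the log says Stata 8 estimates. The utility values and rho values are made up, and the function names are hypothetical.

```python
import math

# Tree from the nlogitgen call: shore = {1, 2}, boat = {3, 4}.
NESTS = {"shore": (1, 2), "boat": (3, 4)}


def nested_logit_probs(v, rho, w=None):
    """v: {alt: index}, rho: {nest: scale}, w: {nest: nest-level index}."""
    if w is None:
        w = {nest: 0.0 for nest in NESTS}
    # Inclusive value of each nest: log-sum of the scaled lower-level indexes.
    inclusive = {
        nest: math.log(sum(math.exp(v[k] / rho[nest]) for k in alts))
        for nest, alts in NESTS.items()
    }
    denom = sum(math.exp(w[m] + rho[m] * inclusive[m]) for m in NESTS)
    probs = {}
    for nest, alts in NESTS.items():
        p_nest = math.exp(w[nest] + rho[nest] * inclusive[nest]) / denom
        for k in alts:
            p_within = math.exp(v[k] / rho[nest] - inclusive[nest])
            probs[k] = p_nest * p_within
    return probs


def conditional_logit_probs(v):
    """Plain logit probabilities, used as the rho = 1 benchmark."""
    total = sum(math.exp(x) for x in v.values())
    return {k: math.exp(x) / total for k, x in v.items()}


v_example = {1: 0.0, 2: 0.2, 3: 1.1, 4: 1.2}  # made-up utility indexes
nested = nested_logit_probs(v_example, {"shore": 0.6, "boat": 0.8})
collapsed = nested_logit_probs(v_example, {"shore": 1.0, "boat": 1.0})
plain = conditional_logit_probs(v_example)
print(nested)
```

Comparing collapsed with plain confirms that, in this variant, setting both scale parameters to 1 reproduces the conditional logit probabilities.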
. *** (2A) Estimate the nested logit model
. ***      This is the model on p.511 that has "higher log-likelihood"
.
. * For the top level we use regressors that do not vary at the lower level
. * So not p or q, but could be income or alternative dummy
. * Here use income and alternative dummy
. gen dshore = (type ==1) * 1
. gen dshorey = (type ==1) * ydiv1000
. nlogit d (alterntv = p q) (type = dshore dshorey), group(id)

tree structure specified for the nested logit model

top --> bottom

    type       alterntv
--------------------------
   shore           1
                   2
    boat           3
                   4

initial:       log likelihood = -1256.8179
rescale:       log likelihood = -1256.8179
rescale eq:    log likelihood = -1228.6278
Iteration 0: log likelihood = -1228.6278
Iteration 1: log likelihood = -1227.407 (backed up)
Iteration 2: log likelihood = -1225.366 (backed up)
Iteration 3: log likelihood = -1216.5831 (backed up)
Iteration 4: log likelihood = -1210.9623
Iteration 5: log likelihood = -1210.323 (backed up)
Iteration 6: log likelihood = -1199.5959
Iteration 7: log likelihood = -1198.2166
Iteration 8: log likelihood = -1193.1834
Iteration 9: log likelihood = -1190.8805
Iteration 10: log likelihood = -1188.0112
Iteration 11: log likelihood = -1185.7944
Iteration 12: log likelihood = -1184.8715
Iteration 13: log likelihood = -1183.776
Iteration 14: log likelihood = -1182.6316
Iteration 15: log likelihood = -1182.1119
Iteration 16: log likelihood = -1181.8783
Iteration 17: log likelihood = -1181.323
Iteration 18: log likelihood = -1181.162
Iteration 19: log likelihood = -1180.912
Iteration 20: log likelihood = -1180.7877
Iteration 21: log likelihood = -1180.5545
Iteration 22: log likelihood = -1180.4177
Iteration 23: log likelihood = -1180.2966
BFGS stepping has contracted, resetting BFGS Hessian (0)
Iteration 24: log likelihood = -1180.2253
Iteration 25: log likelihood = -1180.2209 (backed up)
Iteration 26: log likelihood = -1180.2139 (backed up)
Iteration 27: log likelihood = -1180.2137 (backed up)
Iteration 28: log likelihood = -1180.2113
Iteration 29: log likelihood = -1180.2019
Iteration 30: log likelihood = -1180.1739
Iteration 31: log likelihood = -1180.1278
BFGS stepping has contracted, resetting BFGS Hessian (1)
Iteration 32: log likelihood = -1180.0852
Iteration 33: log likelihood = -1180.0773 (backed up)
Iteration 34: log likelihood = -1180.0762 (backed up)
Iteration 35: log likelihood = -1180.0762 (backed up)
Iteration 36: log likelihood = -1180.0758
Iteration 37: log likelihood = -1180.0694
Iteration 38: log likelihood = -1180.0671
Iteration 39: log likelihood = -1180.0664
BFGS stepping has contracted, resetting BFGS Hessian (2)
Iteration 40: log likelihood = -1180.058
Iteration 41: log likelihood = -1180.0576 (backed up)
Iteration 42: log likelihood = -1180.0575 (backed up)
Iteration 43: log likelihood = -1180.0575 (backed up)
Iteration 44: log likelihood = -1180.0573
Iteration 45: log likelihood = -1180.0466
Iteration 46: log likelihood = -1180.0434
BFGS stepping has contracted, resetting BFGS Hessian (3)
Iteration 47: log likelihood = -1180.043
Iteration 48: log likelihood = -1180.0427 (backed up)
Iteration 49: log likelihood = -1180.0427 (backed up)
Iteration 50: log likelihood = -1180.0427 (backed up)
Iteration 51: log likelihood = -1180.0427
Iteration 52: log likelihood = -1180.0422
BFGS stepping has contracted, resetting BFGS Hessian (4)
Iteration 53: log likelihood = -1180.0414
Iteration 54: log likelihood = -1180.0412 (backed up)
Iteration 55: log likelihood = -1180.0412 (backed up)
Iteration 56: log likelihood = -1180.0412 (backed up)
Iteration 57: log likelihood = -1180.0411
Iteration 58: log likelihood = -1180.0404
Iteration 59: log likelihood = -1180.0401
BFGS stepping has contracted, resetting BFGS Hessian (5)
Iteration 60: log likelihood = -1180.0381
Iteration 61: log likelihood = -1180.038 (backed up)
Iteration 62: log likelihood = -1180.0364 (backed up)
Iteration 63: log likelihood = -1180.0364 (backed up)
Iteration 64: log likelihood = -1180.0364
Iteration 65: log likelihood = -1180.0361
Iteration 66: log likelihood = -1180.0357
BFGS stepping has contracted, resetting BFGS Hessian (6)
Iteration 67: log likelihood = -1180.0348
Iteration 68: log likelihood = -1180.0348 (backed up)
Iteration 69: log likelihood = -1180.0348 (backed up)
Iteration 70: log likelihood = -1180.0348 (backed up)
Iteration 71: log likelihood = -1180.0348
Iteration 72: log likelihood = -1180.0331
Iteration 73: log likelihood = -1180.0328
BFGS stepping has contracted, resetting BFGS Hessian (7)
Iteration 74: log likelihood = -1180.0319
Iteration 75: log likelihood = -1180.0318 (backed up)
Iteration 76: log likelihood = -1180.0317 (backed up)
Iteration 77: log likelihood = -1180.0317 (backed up)
Iteration 78: log likelihood = -1180.0317 (backed up)
Iteration 79: log likelihood = -1180.0313
BFGS stepping has contracted, resetting BFGS Hessian (8)
Iteration 80: log likelihood = -1180.031
Iteration 81: log likelihood = -1180.031 (backed up)
Iteration 82: log likelihood = -1180.031 (backed up)
Iteration 83: log likelihood = -1180.031 (backed up)
Iteration 84: log likelihood = -1180.031 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (9)
Iteration 85: log likelihood = -1180.0305
Iteration 86: log likelihood = -1180.0304 (backed up)
Iteration 87: log likelihood = -1180.0304 (backed up)
Iteration 88: log likelihood = -1180.0304 (backed up)
Iteration 89: log likelihood = -1180.0304
Iteration 90: log likelihood = -1180.0303
Iteration 91: log likelihood = -1180.0301
BFGS stepping has contracted, resetting BFGS Hessian (10)
Iteration 92: log likelihood = -1180.0296
Iteration 93: log likelihood = -1180.0295 (backed up)
Iteration 94: log likelihood = -1180.0295 (backed up)
Iteration 95: log likelihood = -1180.0295 (backed up)
Iteration 96: log likelihood = -1180.0295
Iteration 97: log likelihood = -1180.0292
Iteration 98: log likelihood = -1180.029
BFGS stepping has contracted, resetting BFGS Hessian (11)
Iteration 99: log likelihood = -1180.0288
Iteration 100: log likelihood = -1180.0288 (backed up)
Iteration 101: log likelihood = -1180.0288 (backed up)
Iteration 102: log likelihood = -1180.0288 (backed up)
Iteration 103: log likelihood = -1180.0288 (backed up)
Iteration 104: log likelihood = -1180.0285
BFGS stepping has contracted, resetting BFGS Hessian (12)
Iteration 105: log likelihood = -1180.0283
Iteration 106: log likelihood = -1180.0283 (backed up)
Iteration 107: log likelihood = -1180.0283 (backed up)
Iteration 108: log likelihood = -1180.0283 (backed up)
Iteration 109: log likelihood = -1180.0283
Iteration 110: log likelihood = -1180.0282
Iteration 111: log likelihood = -1180.028
BFGS stepping has contracted, resetting BFGS Hessian (13)
Iteration 112: log likelihood = -1180.0274
Iteration 113: log likelihood = -1180.0274 (backed up)
Iteration 114: log likelihood = -1180.0274 (backed up)
Iteration 115: log likelihood = -1180.0274 (backed up)
Iteration 116: log likelihood = -1180.0274 (backed up)
Iteration 117: log likelihood = -1180.0266
BFGS stepping has contracted, resetting BFGS Hessian (14)
Iteration 118: log likelihood = -1180.0265
Iteration 119: log likelihood = -1180.0265 (backed up)
Iteration 120: log likelihood = -1180.0265 (backed up)
Iteration 121: log likelihood = -1180.0265 (backed up)
Iteration 122: log likelihood = -1180.0265 (backed up)
Iteration 123: log likelihood = -1180.0263
BFGS stepping has contracted, resetting BFGS Hessian (15)
Iteration 124: log likelihood = -1180.0261
Iteration 125: log likelihood = -1180.0261 (backed up)
Iteration 126: log likelihood = -1180.0261 (backed up)
Iteration 127: log likelihood = -1180.0261 (backed up)
Iteration 128: log likelihood = -1180.0261 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (16)
Iteration 129: log likelihood = -1180.026
Iteration 130: log likelihood = -1180.026 (backed up)
Iteration 131: log likelihood = -1180.026 (backed up)
Iteration 132: log likelihood = -1180.026 (backed up)
Iteration 133: log likelihood = -1180.026 (backed up)
Iteration 134: log likelihood = -1180.0259
BFGS stepping has contracted, resetting BFGS Hessian (17)
Iteration 135: log likelihood = -1180.0213
Iteration 136: log likelihood = -1180.0208 (backed up)
Iteration 137: log likelihood = -1180.0207 (backed up)
Iteration 138: log likelihood = -1180.0207 (backed up)
Iteration 139: log likelihood = -1180.0206
Iteration 140: log likelihood = -1180.0191
Iteration 141: log likelihood = -1180.0186
BFGS stepping has contracted, resetting BFGS Hessian (18)
Iteration 142: log likelihood = -1180.0185
Iteration 143: log likelihood = -1180.0185 (backed up)
Iteration 144: log likelihood = -1180.0185 (backed up)
Iteration 145: log likelihood = -1180.0185
Iteration 146: log likelihood = -1180.0185 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (19)
Iteration 147: log likelihood = -1180.0184
Iteration 148: log likelihood = -1180.0184 (backed up)
Iteration 149: log likelihood = -1180.0184 (backed up)
Iteration 150: log likelihood = -1180.0184 (backed up)
Iteration 151: log likelihood = -1180.0184 (backed up)
Iteration 152: log likelihood = -1180.0184
Iteration 153: log likelihood = -1180.0183
BFGS stepping has contracted, resetting BFGS Hessian (20)
Iteration 154: log likelihood = -1180.0177
Iteration 155: log likelihood = -1180.0176 (backed up)
Iteration 156: log likelihood = -1180.0176 (backed up)
Iteration 157: log likelihood = -1180.0176 (backed up)
Iteration 158: log likelihood = -1180.0176 (backed up)
Iteration 159: log likelihood = -1180.0172
Iteration 160: log likelihood = -1180.0171
BFGS stepping has contracted, resetting BFGS Hessian (21)
Iteration 161: log likelihood = -1180.017
Iteration 162: log likelihood = -1180.017 (backed up)
Iteration 163: log likelihood = -1180.017 (backed up)
Iteration 164: log likelihood = -1180.017 (backed up)
Iteration 165: log likelihood = -1180.017
Iteration 166: log likelihood = -1180.017
BFGS stepping has contracted, resetting BFGS Hessian (22)
Iteration 167: log likelihood = -1180.0169
Iteration 168: log likelihood = -1180.0169 (backed up)
Iteration 169: log likelihood = -1180.0169 (backed up)
Iteration 170: log likelihood = -1180.0169 (backed up)
Iteration 171: log likelihood = -1180.0169 (backed up)
Iteration 172: log likelihood = -1180.0169
Iteration 173: log likelihood = -1180.0169
BFGS stepping has contracted, resetting BFGS Hessian (23)
Iteration 174: log likelihood = -1180.0167
Iteration 175: log likelihood = -1180.0167 (backed up)
Iteration 176: log likelihood = -1180.0167 (backed up)
Iteration 177: log likelihood = -1180.0167 (backed up)
Iteration 178: log likelihood = -1180.0167 (backed up)
Iteration 179: log likelihood = -1180.0166
BFGS stepping has contracted, resetting BFGS Hessian (24)
Iteration 180: log likelihood = -1180.0165
Iteration 181: log likelihood = -1180.0165 (backed up)
Iteration 182: log likelihood = -1180.0165 (backed up)
Iteration 183: log likelihood = -1180.0165 (backed up)
Iteration 184: log likelihood = -1180.0165 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (25)
Iteration 185: log likelihood = -1180.0165
Iteration 186: log likelihood = -1180.0165 (backed up)
Iteration 187: log likelihood = -1180.0165 (backed up)
Iteration 188: log likelihood = -1180.0164 (backed up)
Iteration 189: log likelihood = -1180.0164 (backed up)
Iteration 190: log likelihood = -1180.0164
BFGS stepping has contracted, resetting BFGS Hessian (26)
Iteration 191: log likelihood = -1180.0164
Iteration 192: log likelihood = -1180.0164 (backed up)
Iteration 193: log likelihood = -1180.0164 (backed up)
Iteration 194: log likelihood = -1180.0164 (backed up)
Iteration 195: log likelihood = -1180.0164 (backed up)
Iteration 196: log likelihood = -1180.0164
BFGS stepping has contracted, resetting BFGS Hessian (27)
Iteration 197: log likelihood = -1180.0163
Iteration 198: log likelihood = -1180.0163 (backed up)
Iteration 199: log likelihood = -1180.0163 (backed up)
Iteration 200: log likelihood = -1180.0163 (backed up)
Iteration 201: log likelihood = -1180.0163 (backed up)
Iteration 202: log likelihood = -1180.0162
BFGS stepping has contracted, resetting BFGS Hessian (28)
Iteration 203: log likelihood = -1180.0162
Iteration 204: log likelihood = -1180.0162 (backed up)
Iteration 205: log likelihood = -1180.0162 (backed up)
Iteration 206: log likelihood = -1180.0162 (backed up)
Iteration 207: log likelihood = -1180.0162 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (29)
Iteration 208: log likelihood = -1180.0161
Iteration 209: log likelihood = -1180.0161 (backed up)
Iteration 210: log likelihood = -1180.0161 (backed up)
Iteration 211: log likelihood = -1180.0161 (backed up)
Iteration 212: log likelihood = -1180.0161
Iteration 213: log likelihood = -1180.0161
BFGS stepping has contracted, resetting BFGS Hessian (30)
Iteration 214: log likelihood = -1180.016
Iteration 215: log likelihood = -1180.016 (backed up)
Iteration 216: log likelihood = -1180.016 (backed up)
Iteration 217: log likelihood = -1180.016 (backed up)
Iteration 218: log likelihood = -1180.016 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (31)
Iteration 219: log likelihood = -1180.016
Iteration 220: log likelihood = -1180.016 (backed up)
Iteration 221: log likelihood = -1180.016 (backed up)
Iteration 222: log likelihood = -1180.016 (backed up)
Iteration 223: log likelihood = -1180.016 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (32)
Iteration 224: log likelihood = -1180.0159
Iteration 225: log likelihood = -1180.0159 (backed up)
Iteration 226: log likelihood = -1180.0159 (backed up)
Iteration 227: log likelihood = -1180.0159 (backed up)
Iteration 228: log likelihood = -1180.0159
Iteration 229: log likelihood = -1180.0159
Iteration 230: log likelihood = -1180.0159
BFGS stepping has contracted, resetting BFGS Hessian (33)
Iteration 231: log likelihood = -1180.0157
Iteration 232: log likelihood = -1180.0157 (backed up)
Iteration 233: log likelihood = -1180.0157 (backed up)
Iteration 234: log likelihood = -1180.0157 (backed up)
Iteration 235: log likelihood = -1180.0157 (backed up)
Iteration 236: log likelihood = -1180.0156

Nested logit estimates
Levels             =          2                 Number of obs      =      4728
Dependent variable =          d                 LR chi2(6)         =  917.1687
Log likelihood     = -1180.0156                 Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
alterntv     |
           p |  -.0013303    .001081    -1.23   0.218     -.003449    .0007883
           q |   .1284825   .1038986     1.24   0.216     -.075155      .33212
-------------+----------------------------------------------------------------
type         |
      dshore |  -11.40196    9.15307    -1.25   0.213    -29.34164    6.537733
     dshorey |   .1108341   .0531049     2.09   0.037     .0067505    .2149178
-------------+----------------------------------------------------------------
(incl. value |
 parameters) |
type         |
      /shore |   29.98591   24.40089     1.23   0.219    -17.83896    77.81078
       /boat |   14.06438   11.39886     1.23   0.217    -8.276971    36.40572
------------------------------------------------------------------------------
LR test of homoskedasticity (iv = 1): chi2(2)=  145.39    Prob > chi2 = 0.0000
------------------------------------------------------------------------------
. estimates store nlogitunrest
.
. *** (2B) Estimate the restricted nested logit model
. *** This is the model on p.511 that has log L = -1252
.
. * Set the inclusive value parameters to 1
. nlogit d (alterntv = p q) (type = dshore dshorey), group(id) ivc(shore=1, boat=1)
tree structure specified for the nested logit model

 top --> bottom

   type       alterntv
--------------------------
  shore          1
                 2
   boat          3
                 4

User-defined constraint(s):
IV constraint(s):
   [shore]_cons = 1
   [boat]_cons = 1

initial:       log likelihood = -1256.8179
rescale:       log likelihood = -1256.8179
rescale eq:    log likelihood = -1228.6278
Iteration 0: log likelihood = -1264.4012
Iteration 1: log likelihood = -1264.1213 (backed up)
Iteration 2: log likelihood = -1256.9241 (backed up)
Iteration 3: log likelihood = -1255.0984 (backed up)
Iteration 4: log likelihood = -1254.4838
Iteration 5: log likelihood = -1252.7216
Iteration 6: log likelihood = -1252.7111
Iteration 7: log likelihood = -1252.711
Nested logit estimates
Levels             =          2                 Number of obs      =      4728
Dependent variable =          d                 LR chi2(4)         =  771.7778
Log likelihood     =  -1252.711                 Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
alterntv     |
           p |   -.020246   .0012832   -15.78   0.000     -.022761    -.017731
           q |   .7552644   .0918004     8.23   0.000      .575339    .9351899
-------------+----------------------------------------------------------------
type         |
      dshore |  -.5897435   .1565201    -3.77   0.000    -.8965172   -.2829697
     dshorey |  -.0790869   .0381453    -2.07   0.038    -.1538503   -.0043235
-------------+----------------------------------------------------------------
(incl. value |
 parameters) |
type         |
      /shore |          1          .        .       .            .           .
       /boat |          1          .        .       .            .           .
------------------------------------------------------------------------------
LR test of homoskedasticity (iv = 1): chi2(0)=    0.00    Prob > chi2 =      .
------------------------------------------------------------------------------
. estimates store nlogitrest
.
. * Perform a likelihood ratio test that inclusive parameters = 1
. lrtest nlogitunrest nlogitrest
likelihood-ratio test                                  LR chi2(2)  =    145.39
(Assumption: nlogitrest nested in nlogitunrest)        Prob > chi2 =    0.0000
.
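The likelihood ratio statistic reported by lrtest can be checked by hand from the two log likelihoods printed above. The following is not part of the original Stata program; it is a minimal Python sketch, and the function names are illustrative assumptions. It relies on the fact that a chi-square variate with 2 degrees of freedom has upper-tail probability exp(-x/2).

```python
import math

# Log likelihoods copied from the two nlogit runs in this log.
ll_unrestricted = -1180.0156   # inclusive value parameters estimated freely
ll_restricted = -1252.711      # inclusive value parameters fixed at 1


def lr_statistic(ll_u, ll_r):
    """Likelihood ratio statistic 2*(lnL_unrestricted - lnL_restricted)."""
    return 2.0 * (ll_u - ll_r)


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square(2) variate, which is exp(-x/2)."""
    return math.exp(-x / 2.0)


lr = lr_statistic(ll_unrestricted, ll_restricted)
p_value = chi2_sf_2df(lr)
print(f"LR chi2(2) = {lr:.2f}, Prob > chi2 = {p_value:.4f}")
```

The two restrictions (one inclusive value parameter per nest) give the 2 degrees of freedom, and the statistic agrees with the value Stata prints both in the lrtest output and in the "LR test of homoskedasticity" line of the unrestricted model.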
. *** (2C) As a check, verify that this restricted nested logit = conditional logit
.
. clogit d p q dshore dshorey, group(id)
Iteration 0:   log likelihood = -1547.6028
Iteration 1:   log likelihood = -1317.5764
Iteration 2:   log likelihood = -1262.8183
Iteration 3:   log likelihood =  -1253.096
Iteration 4:   log likelihood = -1252.7117
Iteration 5:   log likelihood =  -1252.711

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(4)      =     771.78
                                                  Prob > chi2     =     0.0000
Log likelihood =  -1252.711                       Pseudo R2       =     0.2355

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0202461   .0012832   -15.78   0.000    -.0227611   -.0177311
           q |   .7552646   .0918003     8.23   0.000     .5753392    .9351899
      dshore |  -.5897442     .15652    -3.77   0.000    -.8965178   -.2829706
     dshorey |  -.0790866   .0381453    -2.07   0.038    -.1538499   -.0043232
------------------------------------------------------------------------------
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma15p2gev.txt
log type: text
closed on: 19 May 2005, 12:19:10

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma16p1tobit.txt
log type: text
opened on: 19 May 2005, 13:00:31
.
. ********** OVERVIEW OF MMA16P1TOBIT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 16.2.1 pages 530-1 and 16.9.2 page 565
. * Classic Tobit model with generated data
. * Provides
. * (1) Graph of various conditional means Figure 16.1 (ch16condmeans.wmf)
. * (2) Tobit model estimation: various estimators not reported in book
. * (3) Tobit model estimation: CLAD estimation mentioned on page 565
. * using generated data (see below)
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Data generating process is
. * Regressor:           lnwage ~ N(2.75, 0.6^2)
. * Error term:          e ~ N(0, 1000^2)
. * Latent variable:     ystar = -2500 + 1000*lnwage + e
. * Truncated variable:  ytrunc = 1(ystar>0)*ystar
. * Censored variable:   ycens = 1(ystar<=0)*0 + 1(ystar>0)*ystar
. * Censoring Indicator: dy = 1(ycens>0)
.
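Because the program saves the data as ASCII for use outside Stata, the same data generating process can also be sketched in another language. The following Python sketch is not part of the original program; the function name and the tuple layout are assumptions made for illustration. Python's random number generator differs from Stata's, so using the same seed value does not reproduce the draws shown in this log.

```python
import random


def generate_tobit_data(n=200, seed=10101):
    """Draw (lnwage, ystar, ytrunc, ycens, dy) following the DGP comments above.

    The seed only fixes Python's generator; it does not replicate Stata's draws.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        e = rng.gauss(0.0, 1000.0)                # error term, N(0, 1000^2)
        lnwage = rng.gauss(2.75, 0.6)             # regressor, N(2.75, 0.6^2)
        ystar = -2500.0 + 1000.0 * lnwage + e     # latent variable
        ytrunc = ystar if ystar > 0 else None     # missing when ystar <= 0
        ycens = ystar if ystar > 0 else 0.0       # censored from below at zero
        dy = 1 if ycens > 0 else 0                # 1 if a positive value is observed
        rows.append((lnwage, ystar, ytrunc, ycens, dy))
    return rows


data = generate_tobit_data()
share_positive = sum(row[4] for row in data) / len(data)
print(f"observations: {len(data)}, share with ystar > 0: {share_positive:.2f}")
```

As in the Stata code that follows, the truncated variable is set to missing rather than to zero when the latent variable is not positive, which is what distinguishes it from the censored variable.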
. set seed 10101
. set obs 200
obs was 0, now 200
. gen e = 1000*invnorm(uniform( ))
. gen lnwage = 2.75 + 0.6*invnorm(uniform( ))
. gen ystar = -2500 + 1000*lnwage + e
. gen ytrunc = ystar
. replace ytrunc = . if (ystar < 0)
(70 real changes made, 70 to missing)
. gen ycens = ystar
. replace ycens = 0 if (ystar < 0)
(70 real changes made)
. gen dy = ycens
. replace dy = 1 if (ycens>0)
(130 real changes made)
.
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           e |       200    76.96455    977.5598  -2906.972   2943.727
      lnwage |       200    2.792559    .6249093   .9039821   4.373462
       ystar |       200    369.5237    1163.722  -2852.944   3105.383
      ytrunc |       130    1047.602    712.0859   17.88135   3105.383
       ycens |       200    680.9414    761.3346          0   3105.383
-------------+--------------------------------------------------------
          dy |       200         .65    .4781665          0          1
.
. * Save data as text (ascii) so that can use programs other than Stata
. outfile e lnwage ystar ytrunc ycens dy using mma16p1tobit.asc, replace
.
. ********** (1) PLOT THEORETICAL CONDITIONAL MEANS **********
.
. * Here we use the true parameter values used in the dgp
.
. * Compute the censored and truncated means
. gen xb = -2500 + 1000*lnwage
. gen sigma = 1000
. gen capphixb = normprob(xb/sigma)
. gen phixb = normd(xb/sigma)
. gen lamda = phixb/capphixb
. gen eytrunc = xb + sigma*lamda
. gen eycens = capphixb*eytrunc
.
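The truncated and censored means computed above follow the usual Tobit formulas: with z = xb/sigma, the truncated mean is xb + sigma*phi(z)/Phi(z) and the censored mean is Phi(z) times the truncated mean. The following Python sketch is not part of the original program; the helper names are assumptions. It mirrors the gen statements using the true parameter values of the data generating process.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_means(lnwage, sigma=1000.0):
    """Return (xb, truncated mean, censored mean) at the true DGP parameters."""
    xb = -2500.0 + 1000.0 * lnwage
    z = xb / sigma
    lam = norm_pdf(z) / norm_cdf(z)      # inverse Mills ratio, as in lamda
    ey_trunc = xb + sigma * lam          # E[y | y > 0], as in eytrunc
    ey_cens = norm_cdf(z) * ey_trunc     # E[max(y, 0)], as in eycens
    return xb, ey_trunc, ey_cens


# Evaluate at the largest lnwage value reported by summarize in this log.
xb_max, ey_trunc_max, ey_cens_max = tobit_means(4.373462)
print(xb_max, ey_trunc_max, ey_cens_max)
```

Evaluated at the largest lnwage in the sample, the three values should come out close to the maxima of xb, eytrunc and eycens in the descriptive statistics that follow, up to the rounding of Stata's float storage.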
. * Descriptive Statistics
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           e |       200    76.96455    977.5598  -2906.972   2943.727
      lnwage |       200    2.792559    .6249093   .9039821   4.373462
       ystar |       200    369.5237    1163.722  -2852.944   3105.383
      ytrunc |       130    1047.602    712.0859   17.88135   3105.383
       ycens |       200    680.9414    761.3346          0   3105.383
-------------+--------------------------------------------------------
          dy |       200         .65    .4781665          0          1
          xb |       200    292.5592    624.9093  -1596.018   1873.462
       sigma |       200        1000           0       1000       1000
    capphixb |       200    .5983181    .2092614   .0552424   .9694977
       phixb |       200    .3271769    .0771531   .0689849   .3989196
-------------+--------------------------------------------------------
       lamda |       200    .6687834    .3533611   .0711553   2.020711
     eytrunc |       200    961.3426    283.2587    424.693   1944.617
      eycens |       200    631.3493    380.6074   23.46106   1885.302
.
. * Plot Figure 16.1 on page 531
. sort lnwage
. graph twoway (scatter ystar lnwage, msize(small)) /*
> */ (scatter eytrunc lnwage, c(l) msize(vtiny) clstyle(p3) clwidth(medthick)) /*
> */ (scatter eycens lnwage, c(l) msize(vtiny) clstyle(p2) clwidth(medthick)) /*
> */ (scatter xb lnwage, c(l) msize(vtiny) clstyle(p1) clwidth(medthick)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Tobit: Censored and Truncated Means") /*
> */ xtitle("Natural Logarithm of Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Different Conditional Means", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(5) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Latent Variable") label(2 "Truncated Mean") /*
> */ label(3 "Censored Mean") label(4 "Uncensored Mean"))
. graph export ch16condmeans.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch16condmeans.wmf written in Windows Metafile format)
.
. ********** (2) TOBIT MODEL ESTIMATION FOR THESE DATA **********
.
. * These are computations not reported in the book.
.
. * With only 200 observations the Heckman 2-step estimates given below
. * are very inefficient. To verify that they are consistent
. * increase the sample size e.g. set obs 20000
.
. * (2A) ESTIMATE THE VARIOUS MODELS
.
. *** UNCENSORED OLS REGRESSION
. * Possible here since for these generated data we actually know ystar
. * Yields consistent estimate. Expect slope = 1000 approximately.
. regress ystar lnwage, robust
Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   198) =   96.32
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2944
                                                       Root MSE      =     980

------------------------------------------------------------------------------
             |               Robust
       ystar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |    1010.39   102.9518     9.81   0.000     807.3673    1213.413
       _cons |   -2452.05   303.2432    -8.09   0.000    -3050.051   -1854.049
------------------------------------------------------------------------------
. estimates store ols
. predict ystarols
(option xb assumed; fitted values)
.
. *** CENSORED OLS REGRESSION
. * Yields inconsistent estimates
. * From subsection 16.3.6 for slope coefficient OLS converges to p times b
. * where p is fraction of sample with positive values. Here 0.65*1000 = 650.
. regress ycens lnwage, robust
Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   198) =   84.20
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2522
                                                       Root MSE      =  660.04

------------------------------------------------------------------------------
             |               Robust
       ycens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   611.8108   66.67493     9.18   0.000     480.3267    743.2949
       _cons |  -1027.577   176.0776    -5.84   0.000    -1374.805   -680.3484
------------------------------------------------------------------------------
. estimates store censols
. predict ycensols
(option xb assumed; fitted values)
.
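The comment before the censored OLS regression states that the OLS slope converges to p times b, where p is the fraction of positive values. The following Python sketch is not part of the original program; it is a small simulation under the same data generating process, with an arbitrary seed, an assumed sample size of 20000 and a hand-written slope function, intended only to illustrate that attenuation.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(12345)   # arbitrary seed, unrelated to Stata's generator
n = 20000                    # assumed; larger than the 200 used in this log
lnwage = [rng.gauss(2.75, 0.6) for _ in range(n)]
ystar = [-2500.0 + 1000.0 * x + rng.gauss(0.0, 1000.0) for x in lnwage]
ycens = [max(y, 0.0) for y in ystar]          # censoring from below at zero

p = sum(1 for y in ycens if y > 0) / n        # fraction with positive values
slope_latent = ols_slope(lnwage, ystar)       # uncensored OLS, near 1000
slope_cens = ols_slope(lnwage, ycens)         # censored OLS, near p*1000

print(f"p = {p:.3f}")
print(f"latent slope = {slope_latent:.1f}")
print(f"censored slope = {slope_cens:.1f}, p*1000 = {p * 1000:.1f}")
```

In a large sample the censored slope should be close to p times 1000 rather than to 1000 itself, in line with the estimate of about 612 obtained from the 200 observations above, where 65 percent of values are positive.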
. *** TRUNCATED OLS REGRESSION for POSITIVE WAGE
. * Yields inconsistent estimates
. * See subsection 16.3.6 for discussion.
. regress ytrunc lnwage, robust
Regression with robust standard errors                 Number of obs =     130
                                                       F(  1,   128) =   22.05
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1261
                                                       Root MSE      =  668.28

------------------------------------------------------------------------------
             |               Robust
      ytrunc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   442.6319   94.26938     4.70   0.000     256.1038      629.16
       _cons |  -282.4444   282.9091    -1.00   0.320    -842.2285    277.3396
------------------------------------------------------------------------------
. estimates store truncols
. predict ytrunols
(option xb assumed; fitted values)
.
. *** CENSORED TOBIT MLE REGRESSION for HWAGE
. * Yields consistent estimates
. tobit ycens lnwage, ll(0)
Tobit estimates                                   Number of obs   =        200
                                                  LR chi2(1)      =      65.64
                                                  Prob > chi2     =     0.0000
Log likelihood = -1118.3857                       Pseudo R2       =     0.0285

------------------------------------------------------------------------------
       ycens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   956.4877   116.8382     8.19   0.000     726.0879    1186.887
       _cons |  -2244.567   346.8778    -6.47   0.000    -2928.595   -1560.539
-------------+----------------------------------------------------------------
         _se |   896.6811   59.14988           (Ancillary parameter)
------------------------------------------------------------------------------

  Obs. summary:         70  left-censored observations at ycens<=0
                       130     uncensored observations

. estimates store censtobit
. predict ycenstob
(option xb assumed; fitted values)
.
. *** TRUNCATED TOBIT MLE REGRESSION for HWAGE
. * If done properly yields consistent estimates
. * Not sure how to do this in Stata
. * The obvious command is
. * tobit ytrunc lnwage, ll(0)
. * but this gives the same estimates as truncated OLS
.
. *** PROBIT REGRESSION for HWAGE
. * Yields consistent estimates for slope b/s = 1000/1000 = 1
. * but uses less information so expect less efficient than tobit
. probit dy lnwage
Iteration 0:   log likelihood = -129.48933
Iteration 1:   log likelihood = -106.07902
Iteration 2:   log likelihood = -105.30024
Iteration 3:   log likelihood = -105.29672

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =      48.39
                                                  Prob > chi2     =     0.0000
Log likelihood = -105.29672                       Pseudo R2       =     0.1868

------------------------------------------------------------------------------
          dy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   1.173851   .1870053     6.28   0.000     .8073277    1.540375
       _cons |  -2.795715    .508104    -5.50   0.000     -3.79158   -1.799849
------------------------------------------------------------------------------
. estimates store probit
. predict yprobit
(option p assumed; Pr(dy))
.
. *** HECKMAN 2-STEP ESTIMATOR DONE MANUALLY
. * Yields consistent estimates but less efficient than censored tobit MLE
. * The second stage standard errors will be incorrect
. probit dy lnwage
Iteration 0:   log likelihood = -129.48933
Iteration 1:   log likelihood = -106.07902
Iteration 2:   log likelihood = -105.30024
Iteration 3:   log likelihood = -105.29672

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =      48.39
                                                  Prob > chi2     =     0.0000
Log likelihood = -105.29672                       Pseudo R2       =     0.1868

------------------------------------------------------------------------------
          dy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   1.173851   .1870053     6.28   0.000     .8073277    1.540375
       _cons |  -2.795715    .508104    -5.50   0.000     -3.79158   -1.799849
------------------------------------------------------------------------------
. predict probity, xb
. gen invmills = normd(probity)/normprob(probity)
. summarize dy probity invmills
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          dy |       200         .65    .4781665          0          1
     probity |       200     .482335    .7335506  -1.734574    2.33808
    invmills |       200    .5867037    .3823083   .0261866   2.140342
. regress ytrunc lnwage invmills
      Source |       SS       df       MS              Number of obs =     130
-------------+------------------------------           F(  2,   127) =    9.41
       Model |  8440402.78     2  4220201.39           Prob > F      =  0.0002
    Residual |  56971158.9   127  448591.802           R-squared     =  0.1290
-------------+------------------------------           Adj R-squared =  0.1153
       Total |  65411561.6   129  507066.369           Root MSE      =  669.77

------------------------------------------------------------------------------
      ytrunc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   176.6468   418.2392     0.42   0.673    -650.9731    1004.267
    invmills |  -498.9958   760.3525    -0.66   0.513    -2003.596    1005.604
       _cons |   745.3069   1597.558     0.47   0.642    -2415.972    3906.586
------------------------------------------------------------------------------
. estimates store heck2step
. correlate lnwage invmills
(obs=200)
             |   lnwage invmills
-------------+------------------
      lnwage |   1.0000
    invmills |  -0.9745   1.0000

. * And more robust standard errors may be found by
. regress ytrunc lnwage invmills, robust
Regression with robust standard errors                 Number of obs =     130
                                                       F(  2,   127) =   13.96
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1290
                                                       Root MSE      =  669.77

------------------------------------------------------------------------------
             |               Robust
      ytrunc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   176.6468   379.1739     0.47   0.642    -573.6699    926.9636
    invmills |  -498.9958   635.4917    -0.79   0.434    -1756.519    758.5276
       _cons |   745.3069   1431.149     0.52   0.603     -2086.68    3577.293
------------------------------------------------------------------------------
. estimates store heck2srobust
.
. *** HECKMAN 2-STEP ESTIMATOR DONE USING BUILT-IN HECKMAN COMMAND
. * Yields consistent estimates but less efficient than censored tobit MLE
. heckman ytrunc lnwage, select(lnwage) twostep
Heckman selection model -- two-step estimates   Number of obs      =       200
(regression model with sample selection)        Censored obs       =        70
                                                Uncensored obs     =       130

                                                Wald chi2(2)       =     39.57
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ytrunc       |
      lnwage |   176.6469   425.0025     0.42   0.678    -656.3428    1009.636
       _cons |   745.3067   1617.583     0.46   0.645    -2425.098    3915.711
-------------+----------------------------------------------------------------
select       |
      lnwage |   1.173851   .1870053     6.28   0.000     .8073277    1.540375
       _cons |  -2.795715    .508104    -5.50   0.000     -3.79158   -1.799849
-------------+----------------------------------------------------------------
mills        |
      lambda |  -498.9957   760.5005    -0.66   0.512    -1989.549    991.5578
-------------+----------------------------------------------------------------
         rho |   -0.67419
       sigma |   740.1433
      lambda | -498.99575   760.5005
------------------------------------------------------------------------------
. estimates store heckman
. predict ystarhec, xb
. predict ytrunhec, ycond
. predict ycenshec, yexpected
. predict yinvmill, mills
. predict yprobsel, psel
. correlate lnwage yinvmill
(obs=200)
             |   lnwage yinvmill
-------------+------------------
      lnwage |   1.0000
    yinvmill |  -0.9745   1.0000

.
. * (2B) DISPLAY COEFFICIENT ESTIMATES
.
. * OLS estimates True model is -2500 + 1000*lnwage
. estimates table ols censols truncols, b(%10.2f) se(%10.2f) t stats(N ll)
----------------------------------------------------
    Variable |    ols        censols      truncols
-------------+--------------------------------------
      lnwage |    1010.39       611.81       442.63
             |     102.95        66.67        94.27
             |       9.81         9.18         4.70
       _cons |   -2452.05     -1027.58      -282.44
             |     303.24       176.08       282.91
             |      -8.09        -5.84        -1.00
-------------+--------------------------------------
           N |     200.00       200.00       130.00
          ll |   -1660.29     -1581.24     -1029.07
----------------------------------------------------
                                      legend: b/se/t
.
. * Tobit estimates True model is -2500 + 1000*lnwage
. estimates table censtobit probit, b(%10.2f) se(%10.2f) t stats(N ll)
---------------------------------------
    Variable | censtobit      probit
-------------+-------------------------
      lnwage |     956.49         1.17
             |     116.84         0.19
             |       8.19         6.28
         _se |     896.68
             |      59.15
             |      15.16
       _cons |   -2244.57        -2.80
             |     346.88         0.51
             |      -6.47        -5.50
-------------+-------------------------
           N |     200.00       200.00
          ll |   -1118.39      -105.30
---------------------------------------
                         legend: b/se/t
.
. * Tobit estimates using Heckman manual True model is -2500 + 1000*lnwage
. estimates table heck2step heck2srobust, b(%10.2f) se(%10.2f) t stats(N ll)
---------------------------------------
    Variable | heck2step    heck2sro~t
-------------+-------------------------
      lnwage |     176.65       176.65
             |     418.24       379.17
             |       0.42         0.47
    invmills |    -499.00      -499.00
             |     760.35       635.49
             |      -0.66        -0.79
       _cons |     745.31       745.31
             |    1597.56      1431.15
             |       0.47         0.52
-------------+-------------------------
           N |     130.00       130.00
          ll |   -1028.85     -1028.85
---------------------------------------
                         legend: b/se/t
.
. * Tobit estimates using Heckman built-in True model is -2500 + 1000*lnwage
. estimates table heckman, b(%10.2f) se(%10.2f) t stats(N ll)
--------------------------
    Variable |  heckman
-------------+------------
ytrunc       |
      lnwage |     176.65
             |     425.00
             |       0.42
       _cons |     745.31
             |    1617.58
             |       0.46
-------------+------------
select       |
      lnwage |       1.17
             |       0.19
             |       6.28
       _cons |      -2.80
             |       0.51
             |      -5.50
-------------+------------
mills        |
      lambda |    -499.00
             |     760.50
             |      -0.66
-------------+------------
Statistics   |
           N |     200.00
          ll |
--------------------------
            legend: b/se/t
.
. ********** (3) CLAD ESTIMATION FOR THESE DATA page 565 **********
.
. * Compare tobit MLE with censored least absolute deviations (CLAD) estimator
. * Gives results at end of section 16.9.3 page 565
.
. tobit ycens lnwage, ll(0)
Tobit estimates                                   Number of obs   =        200
                                                  LR chi2(1)      =      65.64
                                                  Prob > chi2     =     0.0000
Log likelihood = -1118.3857                       Pseudo R2       =     0.0285

------------------------------------------------------------------------------
       ycens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   956.4877   116.8382     8.19   0.000     726.0879    1186.887
       _cons |  -2244.567   346.8778    -6.47   0.000    -2928.595   -1560.539
-------------+----------------------------------------------------------------
         _se |   896.6811   59.14988           (Ancillary parameter)
------------------------------------------------------------------------------

  Obs. summary:         70  left-censored observations at ycens<=0
                       130     uncensored observations

. clad ycens lnwage, reps(100) ll(0)


Initial sample size = 200
Final sample size = 159
Pseudo R2 = .12380382

Bootstrap statistics

Variable |   Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
---------+-------------------------------------------------------------------
  lnwage |    100   838.2366   59.09127   165.7476    509.3575   1167.116  (N)
         |                                            666.9485   1298.217  (P)
         |                                             664.528   1247.371 (BC)
---------+-------------------------------------------------------------------
   const |    100  -1897.847  -184.2656   529.6713    -2948.83  -846.8643  (N)
         |                                           -3406.233  -1435.466  (P)
         |                                           -3406.233  -1435.466 (BC)
-----------------------------------------------------------------------------
                              N = normal, P = percentile, BC = bias-corrected
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section4\mma16p1tobit.txt
log type: text
closed on: 19 May 2005, 13:00:37
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma16p2mills.txt
log type: text
opened on: 19 May 2005, 13:02:12
.
. ********** OVERVIEW OF MMA16P2MILLS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 16.3.4 page 540
. * Presentation of Mills ratio
. * It provides
. * (1) Figure 16.2 (ch16millsratio.wmf)
. * This program requires no data
.
. ********** SETUP ***********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA AND FUNCTIONS
.
. * Create density cdf Mills ratio for N[0,1]
. set obs 100
obs was 0, now 100
. gen c = 4*(50-_n)/100
. gen PHIc = norm(c)
. gen phic = normden(c)
. gen lamdac = phic/(1-PHIc)
.
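The four gen statements above build a grid of cutoff points and evaluate the standard normal cdf, density and inverse Mills ratio phi(c)/(1 - Phi(c)) on it. The following Python sketch is not part of the original program; the helper names are assumptions, and it reproduces the same grid so the results can be compared with the descriptive statistics below.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(c):
    """Inverse Mills ratio for truncation from below at c: phi(c)/(1 - Phi(c))."""
    return norm_pdf(c) / (1.0 - norm_cdf(c))


# Same grid as "gen c = 4*(50-_n)/100" with _n running from 1 to 100.
cutoffs = [4.0 * (50 - n) / 100.0 for n in range(1, 101)]
ratios = [inverse_mills(c) for c in cutoffs]

print(f"c ranges from {min(cutoffs):.2f} to {max(cutoffs):.2f}")
print(f"inverse Mills ratio ranges from {min(ratios):.7f} to {max(ratios):.6f}")
```

The ratio rises with the cutoff, so its smallest and largest values occur at the ends of the grid, c = -2 and c = 1.96.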
. * Descriptive statistics
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           c |       100        -.02     1.16046         -2       1.96
        PHIc |       100    .4952275     .338039   .0227501   .9750021
        phic |       100    .2386177    .1157086    .053991   .3989423
      lamdac |       100    .9284788    .7023349   .0552479   2.337835
.
. *********** FIGURE 16.2 page 540 ***********
.
. * This graph shows Mills ratio and cdf and density
. graph twoway (scatter lamdac c, c(l) msize(vtiny) clstyle(p1) clwidth(medthick)) /*
> */ (scatter PHIc c, c(l) msize(vtiny) clstyle(p3) clwidth(medthick)) /*
> */ (scatter phic c, c(l) msize(vtiny) clstyle(p2) clwidth(medthick)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Inverse Mills Ratio as Cutoff Varies") /*
> */ xtitle("Cutoff point c", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Inverse Mills, pdf and cdf", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Inverse Mills ratio") label(2 "N[0,1] Cdf") label(3 "N[0,1] Density"))
. graph export ch16millsratio.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch16millsratio.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT ***********
. log close
log: c:\Imbook\bwebpage\Section4\mma16p2mills.txt
log type: text
closed on: 19 May 2005, 13:02:15
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma16p3selection.txt
log type: text
opened on: 19 May 2005, 13:04:33
.
. ********** OVERVIEW OF MMA16P3SELECTION.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 16.6 pages 553-5
. * Selection models example
. * It provides
. * (1) Two-part model estimation (Table 16.1)
. * (2) Selection model estimation
. * (2A) ML estimates (Table 16.1)
. * (2B) Heckman 2-step estimates (Table 16.1)
. * (2C) Check for possible collinearity problems in Heckman 2-Step
.
. * To use this program you need health expenditure data in Stata data set
. * randdata.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** DATA DESCRIPTION **********
.
. * Essentially same data as in P. Deb and P.K. Trivedi (2002)
. * "The Structure of Demand for Medical Care: Latent Class versus
. * Two-Part Models", Journal of Health Economics, 21, 601-625
. * except that paper used different outcome (counts rather than $)
.
. * Each observation is for an individual over a year.
. * Individuals may appear in up to five years.
. * All available observations are used, except that only fee-for-service plans are included.
. * In analysis here only year 2 is used so panel complications are avoided.
. * Clustering of individuals within household is ignored here.
.
. * Dependent variable is
. *    MED       meddol    Annual medical expenditures in constant dollars
. *                        excluding dental and outpatient mental
. *    LNMED     lnmeddol  Ln(Medical expenditures) given meddol > 0
. *                        Missing otherwise
. *    DMED      binexp    1 if medical expenditures > 0
.
. * Regressors are
. * - Health insurance measures
. *    LC        logc      log(coinsrate+1) where coinsurance rate is 0 to 100
. *    IDP       idp       1 if individual deductible plan
. *    LPI       lpi       log(annual participation incentive payment) or 0 if no payment
. *    FMDE      fmde      log(max(medical deductible expenditure)) if IDP=1 and MDE>1 or 0 otherwise
. * - Health status measures
. *    NDISEASE  disea     number of chronic diseases
. *    PHYSLIM   physlm    1 if physical limitation
. *    HLTHG     hlthg     1 if good health
. *    HLTHF     hlthf     1 if fair health
. *    HLTHP     hlthp     1 if poor health (omitted is excellent)
. * - Socioeconomic characteristics
. *    LINC      linc      log of annual family income (in $)
. *    LFAM      lfam      log of family size
. *    EDUCDEC   educdec   years of schooling of decision maker
. *    AGE       xage      exact age
. *    BLACK     black     1 if black
. *    FEMALE    female    1 if female
. *    CHILD     child     1 if child
. *    FEMCHILD  fchild    1 if female child
.
. * If panel data used then clustering is on
. *    zper      person id
.
. ********** READ DATA **********
.
. use randdata.dta, clear
. sum
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        plan |     20190    11.17553    3.976751          1         19
        site |     20190    3.298811     1.80382          1          6
       coins |     20190     26.3056    36.40386          0        100
    tookphys |     20190    .5974245    .4904288          0          1
        year |     20190    2.420109    1.217141          1          5
-------------+--------------------------------------------------------
        zper |     20190    357965.5    180868.1     125024     632167
       black |     20190    .1814983    .3827071          0          1
      income |     20190    8037.409    4058.371          0   29237.54
        xage |     20190    25.72233    16.76945          0   64.27515
      female |     20190    .5170381     .499722          0          1
-------------+--------------------------------------------------------
     educdec |     20186    11.96681    2.806255          0         25
        time |     20190    .9989561    .0259741   .0767123          1
     outpdol |     20190    51.12649    94.92627          0   2599.902
     drugdol |     20190     13.1687    33.76212          0   706.3979
     suppdol |     20190      6.8024    21.39346          0    1009.47
-------------+--------------------------------------------------------
     mentdol |     20190    6.870347    58.41298          0   1340.834
      inpdol |     20190    100.4694    655.6215          0   38649.81
      meddol |     20190    171.5679    698.2015          0   39182.02
      totadm |     20190    .1127291    .4111857          0          8
      inpmis |     20190    .0039624     .062824          0          1
-------------+--------------------------------------------------------
     mentvis |     20190    .4322437    3.430789          0         62
       mdvis |     20190    2.860426    4.504365          0         77
    notmdvis |     20190    .6855869    3.763543          0        109
         num |     20190    3.954235    1.853034          1         14
         mhi |     20190    76.55584    12.50224       12.2        100
-------------+--------------------------------------------------------
       disea |     20190    11.24449    6.741449          0       58.6
      physlm |     20190    .1235003    .3220164          0          1
      ghindx |     14967    73.09055    15.99371        3.7        100
      mdeoff |     20185    417.8422    384.1199          0       1000
       pioff |     20185     446.677     367.466          0    1291.68
-------------+--------------------------------------------------------
       child |     20190    .4013373    .4901812          0          1
      fchild |     20190    .1937098    .3952139          0          1
        lfam |     20190    1.248156     .539301          0   2.639057
         lpi |     20190    4.707894     2.69784          0   7.163699
         idp |     20190    .2599802    .4386343          0          1
-------------+--------------------------------------------------------
        logc |     20190    2.383342    2.041776          0   4.564348
        fmde |     20190    4.029524    3.471353          0   8.294049
       hlthg |     20190    .3620109    .4805938          0          1
       hlthf |     20190     .077266    .2670196          0          1
       hlthp |     20190    .0149579    .1213874          0          1
-------------+--------------------------------------------------------
     xghindx |     20190     73.2375     14.2332        3.7        100
        linc |     20190    8.708265    1.228309          0   10.28324
        lnum |     20190    1.248156     .539301          0   2.639057
    lnmeddol |     15737    4.109318    1.484654  -.8495329   10.57597
      binexp |     20190    .7794453     .414631          0          1
.
. /* Describe and summarize the original data.
> describe
> summarize
> * The original data are a panel.
> * The following summarizes panel features for completeness
> iis zper
> tis year
> xtdes
> xtsum meddol lnmeddol binexp
> */
.
. ********** DATA SELECTION AND TRANSFORMATIONS **********
.
. * Use only Year 2
. keep if year==2
(14615 observations deleted)


.
. * educdec is missing for one observation
. drop if educdec==.
(1 observation deleted)
.
. * rename variables
. rename meddol MED
. rename binexp DMED
. rename lnmeddol LNMED
. rename linc LINC
. rename lfam LFAM
. rename educdec EDUCDEC
. rename xage AGE
. rename female FEMALE
. rename child CHILD
. rename fchild FEMCHILD
. rename black BLACK
. rename disea NDISEASE
. rename physlm PHYSLIM
. rename hlthg HLTHG
. rename hlthf HLTHF
. rename hlthp HLTHP
. rename idp IDP
. rename logc LC
. rename lpi LPI
. rename fmde FMDE
.
. * Define the regressor list, which later commands can refer to as $XLIST
. global XLIST LC IDP LPI FMDE PHYSLIM NDISEASE HLTHG HLTHF HLTHP /*
>         */ LINC LFAM EDUCDEC AGE FEMALE CHILD FEMCHILD BLACK
.
. * Summarize the dependents and regressors
. sum MED DMED LNMED $XLIST
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         MED |      5574    169.7247    802.8303          0   39182.02
        DMED |      5574    .7680301    .4221277          0          1
       LNMED |      4281    4.069462    1.499372  -.5343859   10.57597
          LC |      5574    2.420739    2.043883          0   4.564348
         IDP |      5574     .261751    .4396272          0          1
-------------+--------------------------------------------------------
         LPI |      5574    4.726834    2.681354          0   7.163699
        FMDE |      5574    4.065015    3.450558          0   8.294049
     PHYSLIM |      5574    .1242463    .3233768          0          1
    NDISEASE |      5574    11.20526    6.788959          0       58.6
       HLTHG |      5574    .3649085    .4814477          0          1
-------------+--------------------------------------------------------
       HLTHF |      5574    .0782203     .268542          0          1
       HLTHP |      5574    .0156082     .123965          0          1
        LINC |      5574    8.696929    1.220592          0   10.28324
        LFAM |      5574    1.241407    .5403965          0   2.564949
     EDUCDEC |      5574     11.9466    2.837492          0         25
-------------+--------------------------------------------------------
         AGE |      5574    25.57613    16.73011   .0253251   63.27515
      FEMALE |      5574    .5184787    .4997032          0          1
       CHILD |      5574    .4050951    .4909545          0          1
    FEMCHILD |      5574    .1955508    .3966597          0          1
       BLACK |      5574    .1859852    .3860055          0          1
.
. * Detailed summary shows that MED>0 very skewed whereas LNMED is not
. sum MED LNMED if MED>0, detail
               medical exp excl outpatient men
-------------------------------------------------------------
      Percentiles      Smallest
 1%     2.109705       .5860291
 5%     5.752914       .6630728
10%     9.376465       .6770833       Obs                4281
25%     21.31435       .6770833       Sum of Wgt.        4281

50%     52.64357                      Mean            220.987
                        Largest       Std. Dev.      909.9021
75%     136.4518       12044.11
90%     453.8059       17465.98       Variance       827921.9
95%      904.328       18641.98       Skewness       24.00829
99%     2666.309       39182.02       Kurtosis        873.379

                            LNMED
-------------------------------------------------------------
      Percentiles      Smallest
 1%      .746548      -.5343859
 5%     1.749707      -.4108706
10%     2.238203      -.3899609       Obs                4281
25%     3.059381      -.3899609       Sum of Wgt.        4281

50%     3.963544                      Mean           4.069462
                        Largest       Std. Dev.      1.499372
75%     4.915971       9.396331
90%      6.11767        9.76801       Variance       2.248116
95%     6.807192       9.833171       Skewness        .347695
99%     7.888451       10.57597       Kurtosis        3.28909

.
. * Write final data to a text (ascii) file so can use with programs other than Stata
. outfile DMED MED LNMED LC IDP LPI FMDE PHYSLIM NDISEASE HLTHG HLTHF HLTHP /*
>         */ LINC LFAM EDUCDEC AGE FEMALE CHILD FEMCHILD BLACK /*
>         */ using mma16p3selection.asc, replace
.
. ****************** CHAPTER 16.6 REGRESSION ANALYSIS **************
.
. * The analysis below models log expenditure (lny), not expenditure (y)
. * where here y = MED and lny = LNMED.
.
. * This makes regular tobit difficult as it is not clear
. * what the censoring/truncation point is since ln(0) = -infinity
. * Also note that some LNMED<0 as 0<MED<1 is possible.
. * So just do two-part model and sample selection model.
.
. * Interested in comparing MED not LNMED at end of day.
. * So use
. * If lny = xb + u, u ~ N[0, s^2] for y > 0
. * Then E[y] = exp(xb + (s^2)/2)            for y > 0
. * and E[y] = Pr[y>0]*exp(xb + (s^2)/2) for all y
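
The retransformation from logs to levels described in the comments above can be checked numerically. The following is a minimal Python sketch and is not part of the original Stata program; the function names and the inputs passed to `expected_level` are assumptions chosen for illustration. It evaluates exp(xb + s^2/2) and shows how large the factor exp(s^2/2) becomes when s is close to the Root MSE of 1.396 reported for the OLS regression later in this log.

```python
import math


def lognormal_mean(xb: float, s: float) -> float:
    """E[y | y > 0] when ln y = xb + u and u ~ N[0, s^2]."""
    return math.exp(xb + s * s / 2.0)


def expected_level(prob_pos: float, xb: float, s: float) -> float:
    """E[y] = Pr[y > 0] * E[y | y > 0], which allows for the zeros."""
    return prob_pos * lognormal_mean(xb, s)


# Factor by which exp(xb) must be scaled up when s = 1.396
# (the Root MSE from the second-part OLS regression in this log).
retransformation_factor = math.exp(1.396 ** 2 / 2.0)

if __name__ == "__main__":
    # Illustrative inputs only; these are not estimates taken from the log.
    print("naive exp(xb)        :", math.exp(4.0))
    print("E[y | y > 0]         :", lognormal_mean(4.0, 1.396))
    print("E[y] with Pr[y>0]=.77:", expected_level(0.77, 4.0, 1.396))
    print("retransformation     :", retransformation_factor)
```

Ignoring the s^2/2 term would understate predicted expenditure by the retransformation factor, which is why the program stores e(rmse) or e(sigma) before generating predictions in levels.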
.
. * The models estimated are
. * (1) Two-part model using
. * (a) probit for whether positive y
. * (b) regress with lny as dependent variable
. * (2) Sample selection model similar to (1)
. * except that inverse Mills ratio appears in (b), estimated by
. * (a) MLE
. * (b) Heckman 2-step
.
. * Additionally censored tobit and truncated tobit commands in levels
. * are given below for completeness.
.
. ************ (1) TWO-PART MODEL ************
.
. * Two-part model: binary probit and then lognormal for expenditures
.
. * First part: probit for MED > 0
. probit DMED $XLIST            /* global XLIST defined earlier */
Iteration 0:   log likelihood = -3019.1326
Iteration 1:   log likelihood =  -2698.302
Iteration 2:   log likelihood = -2690.6146
Iteration 3:   log likelihood = -2690.5768
Iteration 4:   log likelihood = -2690.5768

Probit estimates                                  Number of obs   =       5574
                                                  LR chi2(17)     =     657.11
                                                  Prob > chi2     =     0.0000
Log likelihood = -2690.5768                       Pseudo R2       =     0.1088
------------------------------------------------------------------------------
DMED | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | -.118708 .0269005 -4.41 0.000 -.1714319 -.065984
IDP | -.1279483 .0522351 -2.45 0.014 -.2303272 -.0255693
LPI | .0283091 .0088793 3.19 0.001 .010906 .0457121
FMDE | .0075319 .0161584 0.47 0.641 -.024138 .0392018
PHYSLIM | .2732013 .0743761 3.67 0.000 .1274268 .4189758
NDISEASE | .0224861 .0035958 6.25 0.000 .0154384 .0295338
HLTHG | .0387516 .0438545 0.88 0.377 -.0472016 .1247049
HLTHF | .1920062 .0836688 2.29 0.022 .0280185 .355994
HLTHP | .6397294 .2126322 3.01 0.003 .222978 1.056481
LINC | .0518413 .0168128 3.08 0.002 .0188889 .0847938
LFAM | -.0335599 .041728 -0.80 0.421 -.1153452 .0482253
EDUCDEC | .036307 .0076536 4.74 0.000 .0213062 .0513078
AGE | .0002631 .0021606 0.12 0.903 -.0039715 .0044978
FEMALE | .4451035 .054292 8.20 0.000 .3386932 .5515138
CHILD | .111489 .0808338 1.38 0.168 -.0469424 .2699203
FEMCHILD | -.4512845 .0799219 -5.65 0.000 -.6079284 -.2946405
BLACK | -.6057367 .0523148 -11.58 0.000 -.7082718 -.5032017
_cons | -.271605 .1877345 -1.45 0.148 -.6395579 .0963478
------------------------------------------------------------------------------
. estimates store twoparta      /* version 8 command for later table */
. scalar llprobit = e(ll)       /* Log-likelihood */
. predict probsel2part, p       /* Pr[y>0] = PHI(x'b) */
. predict xbprobit, xb          /* x'b */
.
. * Second part: OLS for log of positive values
. * Here LNMED where LNMED missing if MED = 0
. regress LNMED $XLIST
      Source |       SS       df       MS              Number of obs =    4281
-------------+------------------------------           F( 17,  4263) =   39.69
       Model |  1314.70352    17   77.335501           Prob > F      =  0.0000
    Residual |  8307.23358  4263  1.94868252           R-squared     =  0.1366
-------------+------------------------------           Adj R-squared =  0.1332
       Total |   9621.9371  4280  2.24811614           Root MSE      =   1.396
------------------------------------------------------------------------------
LNMED | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | -.0164006 .0312495 -0.52 0.600 -.0776658 .0448647
IDP | -.0789998 .061796 -1.28 0.201 -.2001522 .0421526
LPI | .0027057 .0097138 0.28 0.781 -.0163383 .0217498
FMDE | -.0306123 .0180695 -1.69 0.090 -.0660379 .0048134
PHYSLIM | .2619829 .0687459 3.81 0.000 .1272052 .3967607
NDISEASE | .0198922 .0034441 5.78 0.000 .01314 .0266444
HLTHG | .1438008 .0483778 2.97 0.003 .0489553 .2386464
HLTHF | .3642649 .0881004 4.13 0.000 .1915422 .5369876
HLTHP | .7865099 .1700502 4.63 0.000 .453123 1.119897
LINC | .0931988 .0217849 4.28 0.000 .0504891 .1359085
LFAM | -.1408033 .046203 -3.05 0.002 -.2313852 -.0502214
EDUCDEC | -5.66e-06 .0082599 -0.00 0.999 -.0161993 .016188
AGE | .0055602 .002251 2.47 0.014 .0011471 .0099733
FEMALE | .3442509 .0571573 6.02 0.000 .2321929 .456309
CHILD | -.2677921 .0904307 -2.96 0.003 -.4450833 -.0905009
FEMCHILD | -.3512207 .0896517 -3.92 0.000 -.5269847 -.1754568
BLACK | -.1964412 .0677021 -2.90 0.004 -.3291725 -.0637099
_cons | 3.077182 .2213448 13.90 0.000 2.64323 3.511133
------------------------------------------------------------------------------
. estimates store twopartb
. scalar lllognormal = e(ll)    /* Log-likelihood */
. scalar sols = e(rmse)         /* Standard error of the regression */
. predict pLNMED, xb            /* Predicted mean from OLS */
. predict rLNMED, residuals
(1293 missing values generated)
.
. * Check for homoskedastic and normal errors
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of LNMED

         chi2(1)      =    17.11
         Prob > chi2  =   0.0000
. * imtest
. sktest LNMED rLNMED
                   Skewness/Kurtosis tests for Normality
                                                 ------- joint ------
    Variable |  Pr(Skewness)   Pr(Kurtosis)  adj chi2(2)    Prob>chi2
-------------+-------------------------------------------------------
       LNMED |      0.000         0.001            .         0.0000
      rLNMED |      0.000         0.000            .         0.0000
.
. * Create two-part model log-likelihood
. scalar lltwopart = llprobit + lllognormal
. di "lltwopart = " lltwopart
lltwopart = -10184.076
.
. * Create predictions of level of expenditures not logs
. * E[y] = exp(pLNMED + (s^2)/2) for y > 0
. * and E[y] = Pr[y>0]*exp(xb + (s^2)/2) for all y
. gen pMEDpos2part = exp(pLNMED + (sols^2)/2)
. gen pMEDall2part = probsel2part*pMEDpos2part
.
. * Compare predictions to actual for MED > 0
. sum LNMED pLNMED MED pMEDpos2part if MED > 0
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       LNMED |      4281    4.069462    1.499372  -.5343859   10.57597
      pLNMED |      4281    4.069462    .5542326   2.298199   6.482164
         MED |      4281     220.987    909.9021   .5860291   39182.02
pMEDpos2part |      4281     183.462    126.0213   26.37827   1731.088
. corr LNMED pLNMED MED pMEDpos2part if MED > 0
(obs=4281)
             |    LNMED   pLNMED      MED pMEDpo~t
-------------+------------------------------------
       LNMED |   1.0000
      pLNMED |   0.3696   1.0000
         MED |   0.4560   0.1576   1.0000
pMEDpos2part |   0.3387   0.9204   0.1669   1.0000

.
. * Compare predictions to actual including zeroes
. sum MED pMEDall2part DMED probsel2part
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         MED |      5574    169.7247    802.8303          0   39182.02
pMEDall2part |      5574     140.966    120.2022   4.880651   1729.783
        DMED |      5574    .7680301    .4221277          0          1
probsel2part |      5574    .7678377    .1457464   .1526731    .999246
. corr MED pMEDall2part DMED probsel2part
(obs=5574)
             |      MED pMEDal~t     DMED probse~t
-------------+------------------------------------
         MED |   1.0000
pMEDall2part |   0.1772   1.0000
        DMED |   0.1162   0.2158   1.0000
probsel2part |   0.1031   0.6380   0.3467   1.0000

.
. ************ (2) SELECTION MODEL ************
.
. * Sample selection model for log expenditures
. * Selection equation:
. *    Observe y = y* if I = z'a + u > 0     u ~ N[0,1]
. * Regression equation:
. *    y* = x'b + v     v ~ N[0,s^2] and Corr[u,v]=rho
.
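
The selection model above implies E[y*|I>0] = x'b + rho*s*lambda(z'a), where lambda is the inverse Mills ratio. The following minimal Python sketch is not part of the original Stata program; the function names are assumptions made for illustration. It computes lambda(c) = phi(c)/PHI(c) with the standard library and checks that rho*sigma reproduces the lambda coefficient reported in the ML output below (rho = .7355982, sigma = 1.570053, lambda = 1.154928).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # N[0,1]


def inverse_mills(c: float) -> float:
    """lambda(c) = phi(c) / PHI(c) for the standard normal."""
    return _STD_NORMAL.pdf(c) / _STD_NORMAL.cdf(c)


def conditional_mean(xb: float, za: float, rho: float, s: float) -> float:
    """E[y* | I > 0] = x'b + rho * s * lambda(z'a)."""
    return xb + rho * s * inverse_mills(za)


# Coefficient on the inverse Mills ratio implied by the ML estimates in this log.
lambda_coef = 0.7355982 * 1.570053

if __name__ == "__main__":
    for c in (-1.0, 0.0, 1.0, 2.0):
        print(f"c = {c:4.1f}   lambda(c) = {inverse_mills(c):.4f}")
    print("rho * sigma =", round(lambda_coef, 6))
```

With rho = 0 the correction term vanishes and the conditional mean reduces to x'b, which is the second part of the two-part model.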
. * (2A) MLE for sample selection model
. heckman LNMED $XLIST, select (DMED = $XLIST)
Iteration 0: log likelihood = -10183.753 (not concave)
Iteration 1: log likelihood = -10183.676 (not concave)
Iteration 2: log likelihood = -10183.593 (not concave)
Iteration 3: log likelihood = -10183.525 (not concave)
Iteration 4: log likelihood = -10183.467 (not concave)
Iteration 5: log likelihood = -10183.408 (not concave)
Iteration 6: log likelihood = -10183.311 (not concave)
Iteration 7: log likelihood = -10183.21 (not concave)
Iteration 8: log likelihood = -10179.155
Iteration 9: log likelihood = -10176.799
Iteration 10: log likelihood = -10170.17
Iteration 11: log likelihood = -10170.11
Iteration 12: log likelihood = -10170.11
Heckman selection model                         Number of obs      =      5574
(regression model with sample selection)        Censored obs       =      1293
                                                Uncensored obs     =      4281

                                                Wald chi2(17)      =    805.17
Log likelihood = -10170.11                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LNMED |
LC | -.0760236 .0337456 -2.25 0.024 -.1421638 -.0098833
IDP | -.1497199 .0661379 -2.26 0.024 -.2793478 -.020092
LPI | .01493 .0105015 1.42 0.155 -.0056526 .0355127
FMDE | -.023522 .0194745 -1.21 0.227 -.0616913 .0146474
PHYSLIM | .3548628 .0755425 4.70 0.000 .2068023 .5029233
NDISEASE | .0286474 .0037972 7.54 0.000 .0212051 .0360897
HLTHG | .1559173 .0521775 2.99 0.003 .0536513 .2581834
HLTHF | .4451223 .0955263 4.66 0.000 .2578942 .6323505
HLTHP | .9986065 .1878791 5.32 0.000 .6303701 1.366843
LINC | .1214009 .0230845 5.26 0.000 .0761562 .1666457
LFAM | -.1583018 .0497464 -3.18 0.001 -.255803 -.0608005
EDUCDEC | .0175951 .0090183 1.95 0.051 -.0000805 .0352707
AGE | .0057376 .0024426 2.35 0.019 .0009501 .0105251
FEMALE | .5503441 .0633313 8.69 0.000 .4262171 .6744711
CHILD | -.1976875 .097398 -2.03 0.042 -.3885841 -.006791
FEMCHILD | -.5653227 .0975292 -5.80 0.000 -.7564765 -.374169
BLACK | -.5358684 .0749191 -7.15 0.000 -.6827072 -.3890296
_cons | 2.107745 .2442285 8.63 0.000 1.629066 2.586424
-------------+----------------------------------------------------------------
DMED |
LC | -.1068027 .0264766 -4.03 0.000 -.1586959 -.0549096
IDP | -.108769 .0509938 -2.13 0.033 -.2087149 -.0088231
LPI | .0294804 .0086214 3.42 0.001 .0125827 .0463781
FMDE | .0007403 .0158738 0.05 0.963 -.0303719 .0318524
PHYSLIM | .2848256 .0722656 3.94 0.000 .1431877 .4264635
NDISEASE | .0210805 .0034967 6.03 0.000 .0142271 .027934
HLTHG | .0576901 .042799 1.35 0.178 -.0261945 .1415747
HLTHF | .2237238 .0814547 2.75 0.006 .0640755 .3833721
HLTHP | .7984291 .2048087 3.90 0.000 .3970114 1.199847
LINC | .0553122 .0166179 3.33 0.001 .0227416 .0878827
LFAM | -.031201 .0402985 -0.77 0.439 -.1101846 .0477827
EDUCDEC | .031499 .0074987 4.20 0.000 .0168018 .0461961
AGE | -.0006072 .0021064 -0.29 0.773 -.0047357 .0035212
FEMALE | .4093059 .0532548 7.69 0.000 .3049283 .5136834
CHILD | .0530643 .0786326 0.67 0.500 -.1010527 .2071813
FEMCHILD | -.3953421 .0783811 -5.04 0.000 -.5489662 -.241718
BLACK | -.5831049 .0520534 -11.20 0.000 -.6851277 -.4810822
_cons | -.2141574 .1842169 -1.16 0.245 -.5752159 .146901
-------------+----------------------------------------------------------------
/athrho | .9408188 .0736303 12.78 0.000 .796506 1.085132
/lnsigma | .4511091 .0177227 25.45 0.000 .4163732 .485845
-------------+----------------------------------------------------------------
rho | .7355982 .0337886 .6620789 .7950943
sigma | 1.570053 .0278256 1.516452 1.625548
lambda | 1.154928 .0702985 1.017145 1.29271
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0): chi2(1) = 27.93 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
. estimates store heckmle
. scalar llhecklogs = e(ll)     /* Log-likelihood */
. scalar shml = e(sigma)        /* s where Var[v]=s^2 */
.
. * Save the Stata predictions:
. * Distinguish between ystar=E[y*], ypos=E[y|I>0] and yall=E[y]
. predict ystarhml, xb          /* E[y*] = x'b */
. predict yposhml, ycond        /* E[y|I>0] = E[y*|I>0] = x'b+c*lambda(z'a) */
. predict invmillhml, mills     /* lambda(z'a) = phi(z'a)/PHI(z'a) */
. predict probselhml, psel      /* PHI(z'a) */
. * The following not appropriate here as it sets y=0 if I<0
. * whereas here data is in logs and y=ln(MED)=-infinity if I<0
. predict yallhml, yexpected    /* E[y] = PHI(z'a)*E[y|I>0] */
. sum ystarhml yposhml invmillhml probselhml yallhml
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    ystarhml |      5574    3.543161    .7462608   .9570364    6.92732
     yposhml |      5574    4.000607    .5482433    2.50515    6.92955
  invmillhml |      5574     .396082    .2165116   .0019309   1.476998
  probselhml |      5574    .7674107    .1404707   .1737047   .9994534
     yallhml |      5574    3.124032    .9125439   .4932862   6.925763
.
. * Create predictions of level of expenditures not logs
. * E[y] = exp(ypos + (s^2)/2) for y > 0 Var[v]=s^2
. * and E[y] = Pr[y>0]*exp(ypos + (s^2)/2) for all y
. gen pMEDposhml = exp(yposhml + (shml^2)/2)
. gen pMEDallhml = probselhml*pMEDposhml
.
. * Compare predictions to actual for MED > 0
. sum LNMED yposhml MED pMEDposhml if MED > 0
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       LNMED |      4281    4.069462    1.499372  -.5343859   10.57597
     yposhml |      4281    4.071295    .5573439    2.50515    6.92955
         MED |      4281     220.987    909.9021   .5860291   39182.02
  pMEDposhml |      4281    240.4096    185.0424   42.00053    3505.48
. corr LNMED yposhml MED pMEDpos2part if MED > 0
(obs=4281)
             |    LNMED  yposhml      MED pMEDpo~t
-------------+------------------------------------
       LNMED |   1.0000
     yposhml |   0.3690   1.0000
         MED |   0.4560   0.1592   1.0000
pMEDpos2part |   0.3387   0.9343   0.1669   1.0000

.
. * Compare predictions to actual including zeroes
. sum MED pMEDallhml DMED probselhml
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         MED |      5574    169.7247    802.8303          0   39182.02
  pMEDallhml |      5574    184.5571    174.1649   8.814864   3503.564
        DMED |      5574    .7680301    .4221277          0          1
  probselhml |      5574    .7674107    .1404707   .1737047   .9994534
. corr MED pMEDallhml DMED probselhml
(obs=5574)
             |      MED pMEDal~l     DMED probse~l
-------------+------------------------------------
         MED |   1.0000
  pMEDallhml |   0.1734   1.0000
        DMED |   0.1162   0.2015   1.0000
  probselhml |   0.1074   0.6092   0.3468   1.0000

.
. * (2B) Heckman 2 step for sample selection model
. * Same as MLE except add option twostep in heckman command
. heckman LNMED $XLIST, select (DMED = $XLIST) twostep
Heckman selection model -- two-step estimates   Number of obs      =      5574
(regression model with sample selection)        Censored obs       =      1293
                                                Uncensored obs     =      4281

                                                Wald chi2(34)      =    944.44
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LNMED |
LC | -.0279209 .039754 -0.70 0.482 -.1058373 .0499955
IDP | -.0922898 .0680191 -1.36 0.175 -.2256048 .0410252
LPI | .0052225 .0111057 0.47 0.638 -.0165442 .0269893
FMDE | -.0295212 .0182427 -1.62 0.106 -.0652762 .0062339
PHYSLIM | .2814948 .0804535 3.50 0.000 .1238088 .4391808
NDISEASE | .021617 .0050395 4.29 0.000 .0117398 .0314943
HLTHG | .1474026 .0490497 3.01 0.003 .051267 .2435381
HLTHF | .3821683 .0961284 3.98 0.000 .19376 .5705765
HLTHP | .833294 .1974488 4.22 0.000 .4463015 1.220287
LINC | .0990973 .0251548 3.94 0.000 .0497948 .1483998
LFAM | -.1441358 .0468074 -3.08 0.002 -.2358766 -.052395
EDUCDEC | .0033639 .0109501 0.31 0.759 -.0180979 .0248257
AGE | .0055556 .0022549 2.46 0.014 .0011361 .0099751
FEMALE | .3846323 .1032799 3.72 0.000 .1822074 .5870573
CHILD | -.2565136 .0936771 -2.74 0.006 -.4401173 -.0729098
FEMCHILD | -.392146 .125089 -3.13 0.002 -.637316 -.146976
BLACK | -.2633649 .1577542 -1.67 0.095 -.5725574 .0458276
_cons | 2.882514 .4698969 6.13 0.000 1.961533 3.803495
-------------+----------------------------------------------------------------
DMED |
LC | -.118708 .0269005 -4.41 0.000 -.1714319 -.065984
IDP | -.1279483 .0522351 -2.45 0.014 -.2303272 -.0255693
LPI | .0283091 .0088793 3.19 0.001 .010906 .0457121
FMDE | .0075319 .0161584 0.47 0.641 -.024138 .0392018
PHYSLIM | .2732013 .0743761 3.67 0.000 .1274268 .4189758
NDISEASE | .0224861 .0035958 6.25 0.000 .0154384 .0295338
HLTHG | .0387516 .0438545 0.88 0.377 -.0472016 .1247049
HLTHF | .1920062 .0836688 2.29 0.022 .0280185 .355994
HLTHP | .6397294 .2126322 3.01 0.003 .222978 1.056481
LINC | .0518413 .0168128 3.08 0.002 .0188889 .0847938
LFAM | -.0335599 .041728 -0.80 0.421 -.1153452 .0482253
EDUCDEC | .036307 .0076536 4.74 0.000 .0213062 .0513078
AGE | .0002631 .0021606 0.12 0.903 -.0039715 .0044978
FEMALE | .4451035 .054292 8.20 0.000 .3386932 .5515138
CHILD | .111489 .0808338 1.38 0.168 -.0469424 .2699203
FEMCHILD | -.4512845 .0799219 -5.65 0.000 -.6079284 -.2946405
BLACK | -.6057367 .0523148 -11.58 0.000 -.7082718 -.5032017
_cons | -.271605 .1877345 -1.45 0.148 -.6395579 .0963478
-------------+----------------------------------------------------------------
mills |
lambda | .2358048 .5018117 0.47 0.638 -.7477282 1.219338
-------------+----------------------------------------------------------------
rho | 0.16833
sigma | 1.4008246
lambda | .23580476 .5018117
------------------------------------------------------------------------------
. estimates store heck2step
. scalar sh2s = e(sigma)        /* s where Var[v]=s^2 */
.
. * Save the Stata predictions:
. * Distinguish between ystar=E[y*], ypos=E[y|I>0] and yall=E[y]
. predict ystarh2s, xb          /* E[y*] = x'b */
. predict yposh2s, ycond        /* E[y|I>0] = E[y*|I>0] = x'b+c*lambda(z'a) */
. predict invmillh2s, mills     /* lambda(z'a) = phi(z'a)/PHI(z'a) */
. predict probselh2s, psel      /* PHI(z'a) */
. * The following not appropriate here as it sets y=0 if I<0
. * whereas here data is in logs and y=ln(MED)=-infinity if I<0
. predict yallh2s, yexpected    /* E[y] = PHI(z'a)*E[y|I>0] */
. sum ystarh2s yposh2s invmillh2s probselh2s yallh2s
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    ystarh2s |      5574    3.904371     .589474   2.005307   6.573941
     yposh2s |      5574    3.997637    .5516546   2.337985   6.574553
  invmillh2s |      5574    .3955256    .2253329    .002599   1.545223
  probselh2s |      5574    .7678377    .1457464   .1526731    .999246
     yallh2s |      5574    3.124344    .9213697   .4450346   6.569597
.
. * Create predictions of level of expenditures not logs
. * E[y] = exp(ypos + (s^2)/2) for y > 0 Var[v]=s^2
. * and E[y] = Pr[y>0]*exp(ypos + (s^2)/2) for all y
. gen pMEDposh2s = exp(yposh2s + (sh2s^2)/2)
. gen pMEDallh2s = probselh2s*pMEDposh2s
.
. * Compare predictions to actual for MED > 0
. sum LNMED yposh2s MED pMEDposh2s if MED > 0
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       LNMED |      4281    4.069462    1.499372  -.5343859   10.57597
     yposh2s |      4281    4.069462    .5543231   2.337985   6.574553
         MED |      4281     220.987    909.9021   .5860291   39182.02
  pMEDposh2s |      4281    184.9993    129.5432   27.63657   1911.624
. corr LNMED yposh2s MED pMEDpos2part if MED > 0
(obs=4281)
             |    LNMED  yposh2s      MED pMEDpo~t
-------------+------------------------------------
       LNMED |   1.0000
     yposh2s |   0.3697   1.0000
         MED |   0.4560   0.1584   1.0000
pMEDpos2part |   0.3387   0.9240   0.1669   1.0000

.
. * Compare predictions to actual including zeroes
. sum MED pMEDallh2s DMED probselh2s
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         MED |      5574    169.7247    802.8303          0   39182.02
  pMEDallh2s |      5574    142.1438    123.2964   5.272963   1910.182
        DMED |      5574    .7680301    .4221277          0          1
  probselh2s |      5574    .7678377    .1457464   .1526731    .999246
. corr MED pMEDallh2s DMED probselh2s
(obs=5574)
             |      MED pMEDa~2s     DMED probs~2s
-------------+------------------------------------
         MED |   1.0000
  pMEDallh2s |   0.1772   1.0000
        DMED |   0.1162   0.2132   1.0000
  probselh2s |   0.1031   0.6298   0.3467   1.0000

.
. * (2C) Check for possible collinearity problems in Heckman 2-Step
.
. * Check variation in inverse mills ratio and related measures
. gen zprimea = invnorm(probselh2s)
. gen zprimeasq = zprimea*zprimea
. sum invmillh2s probselh2s zprimea ystarh2s
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
  invmillh2s |      5574    .3955256    .2253329    .002599   1.545223
  probselh2s |      5574    .7678377    .1457464   .1526731    .999246
     zprimea |      5574    .8217315    .5175712  -1.025036    3.17314
    ystarh2s |      5574    3.904371     .589474   2.005307   6.573941
. sum invmillh2s probselh2s zprimea ystarh2s, detail
                        Mills' ratio
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0443035        .002599
 5%     .1081773       .0065964
10%     .1479522       .0074306       Obs                5574
25%     .2404661       .0111331       Sum of Wgt.        5574

50%     .3522253                      Mean           .3955256
                        Largest       Std. Dev.      .2253329
75%     .5044507        1.42819
90%     .7088638        1.42819       Variance       .0507749
95%      .863094       1.466996       Skewness       1.105156
99%     1.080771       1.545223       Kurtosis       4.403004

                          Pr(DMED)
-------------------------------------------------------------
      Percentiles      Smallest
 1%      .338421       .1526731
 5%     .4598847       .1769602
10%     .5570307       .1900167       Obs                5574
25%     .6946899       .1900167       Sum of Wgt.        5574

50%     .7984734                      Mean           .7678377
                        Largest       Std. Dev.      .1457464
75%     .8717066       .9962835
90%      .927941       .9976236       Variance        .021242
95%     .9502093       .9979156       Skewness      -1.048826
99%     .9823552        .999246       Kurtosis       3.903288

                           zprimea
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.4167765      -1.025036
 5%    -.1007243      -.9270119
10%     .1434453      -.8778346       Obs                5574
25%     .5091883      -.8778346       Sum of Wgt.        5574

50%     .8361809                      Mean           .8217315
                        Largest       Std. Dev.      .5175712
75%     1.134495       2.676793
90%     1.460626        2.82333       Variance       .2678799
95%     1.646887       2.865093       Skewness      -.0298741
99%     2.105021        3.17314       Kurtosis       3.462529

                      Linear prediction
-------------------------------------------------------------
      Percentiles      Smallest
 1%     2.770451       2.005307
 5%     3.096997       2.005307
10%     3.248734       2.066777       Obs                5574
25%     3.460358       2.093177       Sum of Wgt.        5574

50%     3.818303                      Mean           3.904371
                        Largest       Std. Dev.       .589474
75%     4.304362       6.054721
90%      4.68132       6.055911       Variance       .3474796
95%     4.946257       6.273092       Skewness       .5047628
99%     5.495563       6.573941       Kurtosis       3.235111

.
. * Check for Mills ratio linear in zprimea
. regress invmillh2s zprimea
      Source |       SS       df       MS              Number of obs =    5574
-------------+------------------------------           F(  1,  5572) =84783.34
       Model |  265.518552     1  265.518552           Prob > F      =  0.0000
    Residual |  17.4500012  5572   .00313173           R-squared     =  0.9383
-------------+------------------------------           Adj R-squared =  0.9383
       Total |  282.968553  5573  .050774906           Root MSE      =  .05596
------------------------------------------------------------------------------
invmillh2s | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
zprimea | -.4217284 .0014484 -291.18 0.000 -.4245677 -.418889
_cons | .7420731 .0014065 527.59 0.000 .7393158 .7448305
------------------------------------------------------------------------------
. regress invmillh2s zprimea zprimeasq
      Source |       SS       df       MS              Number of obs =    5574
-------------+------------------------------           F(  2,  5571) =       .
       Model |  282.919807     2  141.459904           Prob > F      =  0.0000
    Residual |   .04874607  5571  8.7500e-06           R-squared     =  0.9998
-------------+------------------------------           Adj R-squared =  0.9998
       Total |  282.968553  5573  .050774906           Root MSE      =  .00296
------------------------------------------------------------------------------
invmillh2s | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
zprimea | -.6381933 .0001715 -3720.60 0.000 -.6385296 -.6378571
zprimeasq | .1329635 .0000943 1410.22 0.000 .1327787 .1331484
_cons | .7945547 .0000831 9556.73 0.000 .7943917 .7947177
------------------------------------------------------------------------------
. * twoway scatter yinvmill probitxb
.
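
The two regressions above show that the inverse Mills ratio is close to linear, and almost exactly quadratic, in z'a over the range observed in this sample. The following minimal Python sketch is not part of the original Stata program; it illustrates the same point on an evenly spaced grid over [-1, 3], which is an assumption chosen to roughly match the range of zprimea in this log. The helper names `solve` and `r_squared` are hypothetical. Because the grid is not the sample distribution of zprimea, the R-squared values need not equal those reported above.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # N[0,1]


def inverse_mills(c):
    """lambda(c) = phi(c) / PHI(c)."""
    return _STD_NORMAL.pdf(c) / _STD_NORMAL.cdf(c)


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, columns):
    """R-squared from OLS of y on an intercept plus the given regressor columns."""
    x = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Evenly spaced grid on [-1, 3] (an assumption, not the sample values).
z = [-1.0 + 4.0 * i / 200 for i in range(201)]
lam = [inverse_mills(v) for v in z]
r2_lin = r_squared(lam, [z])
r2_quad = r_squared(lam, [z, [v * v for v in z]])

if __name__ == "__main__":
    print("R-squared, linear in z   :", round(r2_lin, 4))
    print("R-squared, quadratic in z:", round(r2_quad, 4))
```

When the selection and outcome equations share the same regressors, as here, this near-linearity means the two-step estimator is identified mainly through the nonlinearity of lambda, which is why the program goes on to check collinearity.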
. * Check R-squared from regress yinvmill on other regressors
. regress invmillh2s $XLIST
      Source |       SS       df       MS              Number of obs =    5574
-------------+------------------------------           F( 17,  5556) = 7477.36
       Model |  271.118403    17  15.9481414           Prob > F      =  0.0000
    Residual |    11.85015  5556  .002132856           R-squared     =  0.9581
-------------+------------------------------           Adj R-squared =  0.9580
       Total |  282.968553  5573  .050774906           Root MSE      =  .04618
------------------------------------------------------------------------------
invmillh2s | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | .0529008 .000877 60.32 0.000 .0511815 .0546202
IDP | .0590603 .0017037 34.67 0.000 .0557204 .0624003
LPI | -.0113774 .0002792 -40.75 0.000 -.0119247 -.01083
FMDE | -.0054681 .0005178 -10.56 0.000 -.0064831 -.004453
PHYSLIM | -.0864947 .0021028 -41.13 0.000 -.090617 -.0823724
NDISEASE | -.0077731 .0001032 -75.31 0.000 -.0079754 -.0075707
HLTHG | -.0155696 .0013947 -11.16 0.000 -.0183037 -.0128355
HLTHF | -.0844067 .0025693 -32.85 0.000 -.0894435 -.0793698
HLTHP | -.2164141 .0052914 -40.90 0.000 -.2267872 -.206041
LINC | -.0293205 .0005678 -51.64 0.000 -.0304337 -.0282074
LFAM | .0170455 .0013216 12.90 0.000 .0144545 .0196364
EDUCDEC | -.0152414 .0002405 -63.38 0.000 -.0157128 -.01477
AGE | .0001145 .0000665 1.72 0.085 -.0000158 .0002448
FEMALE | -.1792718 .0016754 -107.00 0.000 -.1825563 -.1759873
CHILD | -.0474152 .0025807 -18.37 0.000 -.0524744 -.042356
FEMCHILD | .1803783 .002565 70.32 0.000 .1753498 .1854067
BLACK | .3020816 .0017915 168.62 0.000 .2985695 .3055937
_cons | .875215 .0061051 143.36 0.000 .8632467 .8871833
-----------------------------------------------------------------------------.
. * Find the condition number with inverse mills ratio included
. matrix accum XX = invmillh2s $XLIST
(obs=5574)
. matrix XXScaled = corr(XX)
. matrix symeigen XXSeigvec XXSeigval = XXScaled
. scalar rowsXX = rowsof(XX)
. scalar condnum1 = sqrt(XXSeigval[1,1]/XXSeigval[1,rowsXX])
. scalar condnum2 = sqrt(XXSeigval[1,1]/XXSeigval[1,(rowsXX-1)])
.
. * Find the condition number without inverse mills ratio
. matrix accum ZZ = $XLIST
(obs=5574)
. matrix ZZScaled = corr(ZZ)
. matrix symeigen ZZSeigvec ZZSeigval = ZZScaled
. scalar rowsZZ = rowsof(ZZ)
. scalar condnumnoinvmills1 = sqrt(ZZSeigval[1,1]/ZZSeigval[1,rowsZZ])


. scalar condnumnoinvmills2 = sqrt(ZZSeigval[1,1]/ZZSeigval[1,(rowsZZ-1)])
.
. * Condition numbers between 30 and 100 indicate a strong near dependency
. scalar list condnum1 condnum2
condnum1 = 82.333696
condnum2 = 24.558474
. scalar list condnumnoinvmills1 condnumnoinvmills2
condnumnoinvmills1 = 36.660119
condnumnoinvmills2 = 20.990872
.
. * (2D) Do Heckman 2 step manually (this is unnecessary)
. quietly probit DMED $XLIST     /* global XLIST defined earlier */
. predict pselmanual, p          /* Pr[y>0] = PHI(x'b) */
. predict xbmanual, xb           /* x'b */
. gen invmillsmanual = normden(xbmanual)/pselmanual
. regress LNMED $XLIST invmillsmanual if MED > 0
      Source |       SS       df       MS              Number of obs =    4281
-------------+------------------------------           F( 18,  4262) =   37.49
       Model |  1315.13292    18    73.06294           Prob > F      =  0.0000
    Residual |  8306.80418  4262  1.94903899           R-squared     =  0.1367
-------------+------------------------------           Adj R-squared =  0.1330
       Total |   9621.9371  4280  2.24811614           Root MSE      =  1.3961

------------------------------------------------------------------------------
       LNMED |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0279209   .0397381    -0.70   0.482    -.1058282    .0499864
         IDP |  -.0922898    .067979    -1.36   0.175     -.225564    .0409844
         LPI |   .0052225   .0110962     0.47   0.638    -.0165318    .0269769
        FMDE |  -.0295212     .01822    -1.62   0.105     -.065242    .0061996
     PHYSLIM |   .2814948   .0803424     3.50   0.000     .1239819    .4390076
    NDISEASE |   .0216171   .0050367     4.29   0.000     .0117426    .0314915
       HLTHG |   .1474026   .0489869     3.01   0.003     .0513627    .2434424
       HLTHF |   .3821683   .0960103     3.98   0.000     .1939381    .5703985
       HLTHP |    .833294   .1971219     4.23   0.000     .4468325    1.219756
        LINC |   .0990973   .0251514     3.94   0.000     .0497875    .1484071
        LFAM |  -.1441358   .0467495    -3.08   0.002    -.2357891   -.0524825
     EDUCDEC |   .0033639   .0109441     0.31   0.759    -.0180922    .0248201
         AGE |   .0055556   .0022512     2.47   0.014      .001142    .0099692
      FEMALE |   .3846324    .103291     3.72   0.000     .1821281    .5871366
       CHILD |  -.2565135   .0935766    -2.74   0.006    -.4399725   -.0730546
    FEMCHILD |   -.392146   .1250644    -3.14   0.002    -.6373374   -.1469547
       BLACK |  -.2633649   .1578399    -1.67   0.095    -.5728134    .0460835
invmillsma~l |    .235805   .5023784     0.47   0.639    -.7491182    1.220728
       _cons |   2.882514    .470116     6.13   0.000     1.960841    3.804186
------------------------------------------------------------------------------
. predict yposmanual, xb
. * Predictions here should equal those from heckman two-step earlier
. sum yposh2s yposmanual invmillh2s invmillsmanual probselh2s pselmanual
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     yposh2s |      5574    3.997637    .5516546   2.337985   6.574553
  yposmanual |      5574    3.997637    .5516546   2.337985   6.574553
  invmillh2s |      5574    .3955256    .2253329    .002599   1.545223
invmillsma~l |      5574    .3955256    .2253329    .002599   1.545223
  probselh2s |      5574    .7678377    .1457464   .1526731    .999246
-------------+--------------------------------------------------------
  pselmanual |      5574    .7678377    .1457464   .1526731    .999246
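
The manual two-step above builds the inverse Mills ratio as normden(x'b)/PHI(x'b). As a cross-check outside Stata, the same quantity can be sketched in Python. This is only an illustration and not part of the original program; the function names are hypothetical, and the standard normal cdf is obtained from math.erf.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf PHI(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(xb):
    """Inverse Mills ratio phi(x'b)/PHI(x'b) for a probit index x'b."""
    return norm_pdf(xb) / norm_cdf(xb)

# Illustrative index values only (not taken from the data)
for xb in (-1.0, 0.0, 1.0, 2.0):
    print(xb, round(inverse_mills(xb), 6))
```

The ratio falls as the index x'b rises, so observations with a high predicted probability of positive expenditure receive a small selection correction.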
. * And put in squared invmills ratio
. gen invmillssq = invmillsmanual*invmillsmanual
. regress LNMED $XLIST invmillsmanual invmillssq if MED > 0
      Source |       SS       df       MS              Number of obs =    4281
-------------+------------------------------           F( 19,  4261) =   35.64
       Model |  1319.30272    19  69.4369854           Prob > F      =  0.0000
    Residual |  8302.63438  4261  1.94851781           R-squared     =  0.1371
-------------+------------------------------           Adj R-squared =  0.1333
       Total |   9621.9371  4280  2.24811614           Root MSE      =  1.3959

------------------------------------------------------------------------------
       LNMED |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0793176   .0530386    -1.50   0.135    -.1833009    .0246658
         IDP |  -.1419148    .075965    -1.87   0.062    -.2908457    .0070161
         LPI |   .0174224   .0138796     1.26   0.209    -.0097888    .0446337
        FMDE |  -.0258495   .0183897    -1.41   0.160    -.0619029    .0102039
     PHYSLIM |   .3867535   .1078448     3.59   0.000     .1753217    .5981854
    NDISEASE |   .0305019   .0078898     3.87   0.000     .0150337    .0459701
       HLTHG |   .1652111   .0504705     3.27   0.001     .0662626    .2641596
       HLTHF |   .4576241   .1089774     4.20   0.000     .2439716    .6712766
       HLTHP |   1.056745   .2493566     4.24   0.000     .5678762    1.545614
        LINC |   .1169339    .027948     4.18   0.000     .0621414    .1717264
        LFAM |  -.1550441   .0473343    -3.28   0.001    -.2478439   -.0622443
     EDUCDEC |    .018452   .0150373     1.23   0.220     -.011029     .047933
         AGE |   .0057227   .0022538     2.54   0.011      .001304    .0101414
      FEMALE |   .5748999   .1660813     3.46   0.001     .2492941    .9005056
       CHILD |  -.2096856   .0988886    -2.12   0.034    -.4035587   -.0158125
    FEMCHILD |  -.5873068   .1828525    -3.21   0.001    -.9457929   -.2288207
       BLACK |  -.5010232   .2264954    -2.21   0.027    -.9450721   -.0569744
invmillsma~l |   2.159812   1.407886     1.53   0.125    -.6003768    4.920001
  invmillssq |  -1.043357   .7132265    -1.46   0.144    -2.441653    .3549381
       _cons |   1.909849   .8142753     2.35   0.019     .3134454    3.506253
------------------------------------------------------------------------------
.
. ************ (3) DISPLAY RESULTS FOR TABLE 16.1 (page 554) ************
.
. * Note for brevity the coefficients for only some of the regressors are reported
.
. * First two columns of Table 16.1 (page 554)
. * Two part estimates: probit for first part and lognormal for second
. estimates table twoparta twopartb, t stats(N ll rank aic bic) b(%10.3f)
----------------------------------------
    Variable |   twoparta     twopartb
-------------+--------------------------
          LC |     -0.119       -0.016
             |      -4.41        -0.52
         IDP |     -0.128       -0.079
             |      -2.45        -1.28
         LPI |      0.028        0.003
             |       3.19         0.28
        FMDE |      0.008       -0.031
             |       0.47        -1.69
     PHYSLIM |      0.273        0.262
             |       3.67         3.81
    NDISEASE |      0.022        0.020
             |       6.25         5.78
       HLTHG |      0.039        0.144
             |       0.88         2.97
       HLTHF |      0.192        0.364
             |       2.29         4.13
       HLTHP |      0.640        0.787
             |       3.01         4.63
        LINC |      0.052        0.093
             |       3.08         4.28
        LFAM |     -0.034       -0.141
             |      -0.80        -3.05
     EDUCDEC |      0.036       -0.000
             |       4.74        -0.00
         AGE |      0.000        0.006
             |       0.12         2.47
      FEMALE |      0.445        0.344
             |       8.20         6.02
       CHILD |      0.111       -0.268
             |       1.38        -2.96
    FEMCHILD |     -0.451       -0.351
             |      -5.65        -3.92
       BLACK |     -0.606       -0.196
             |     -11.58        -2.90
       _cons |     -0.272        3.077
             |      -1.45        13.90
-------------+--------------------------
           N |   5574.000     4281.000
          ll |  -2690.577    -7493.499
        rank |     18.000       18.000
         aic |   5417.154    15022.998
         bic |   5536.419    15137.513
----------------------------------------
                             legend: b/t
. di "lltwopart = " lltwopart
lltwopart = -10184.076
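
The two-part log likelihood reported above is the sum of the probit and lognormal log likelihoods, and the aic and bic rows of the table follow from ll, rank and N. A small Python sketch (not part of the original Stata program; function names are hypothetical) reproduces these figures, assuming the usual definitions AIC = -2 ll + 2k and BIC = -2 ll + k ln(N) with k equal to the reported rank.

```python
import math

def aic(ll, k):
    """Akaike information criterion: -2*ll + 2*k."""
    return -2.0 * ll + 2.0 * k

def bic(ll, k, n):
    """Bayesian (Schwarz) information criterion: -2*ll + k*ln(n)."""
    return -2.0 * ll + k * math.log(n)

# Values copied from the estimates table above
ll_probit, k_probit, n_probit = -2690.577, 18, 5574
ll_lognormal, k_lognormal, n_lognormal = -7493.499, 18, 4281

# The two parts are estimated separately, so their log likelihoods add
ll_two_part = ll_probit + ll_lognormal

print("lltwopart =", round(ll_two_part, 3))
print("probit    aic, bic:", round(aic(ll_probit, k_probit), 3),
      round(bic(ll_probit, k_probit, n_probit), 3))
print("lognormal aic, bic:", round(aic(ll_lognormal, k_lognormal), 3),
      round(bic(ll_lognormal, k_lognormal, n_lognormal), 3))
```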
.
. * Last four columns of Table 16.1 (page 554)
. * Sample selection estimates: 2step and MLE estimates
. set matsize 60
. estimates table heck2step heckmle, t stats(N ll rank aic bic) b(%10.3f)
----------------------------------------
    Variable |  heck2step      heckmle
-------------+--------------------------
LNMED        |
          LC |     -0.028       -0.076
             |      -0.70        -2.25
         IDP |     -0.092       -0.150
             |      -1.36        -2.26
         LPI |      0.005        0.015
             |       0.47         1.42
        FMDE |     -0.030       -0.024
             |      -1.62        -1.21
     PHYSLIM |      0.281        0.355
             |       3.50         4.70
    NDISEASE |      0.022        0.029
             |       4.29         7.54
       HLTHG |      0.147        0.156
             |       3.01         2.99
       HLTHF |      0.382        0.445
             |       3.98         4.66
       HLTHP |      0.833        0.999
             |       4.22         5.32
        LINC |      0.099        0.121
             |       3.94         5.26
        LFAM |     -0.144       -0.158
             |      -3.08        -3.18
     EDUCDEC |      0.003        0.018
             |       0.31         1.95
         AGE |      0.006        0.006
             |       2.46         2.35
      FEMALE |      0.385        0.550
             |       3.72         8.69
       CHILD |     -0.257       -0.198
             |      -2.74        -2.03
    FEMCHILD |     -0.392       -0.565
             |      -3.13        -5.80
       BLACK |     -0.263       -0.536
             |      -1.67        -7.15
       _cons |      2.883        2.108
             |       6.13         8.63
-------------+--------------------------
DMED         |
          LC |     -0.119       -0.107
             |      -4.41        -4.03
         IDP |     -0.128       -0.109
             |      -2.45        -2.13
         LPI |      0.028        0.029
             |       3.19         3.42
        FMDE |      0.008        0.001
             |       0.47         0.05
     PHYSLIM |      0.273        0.285
             |       3.67         3.94
    NDISEASE |      0.022        0.021
             |       6.25         6.03
       HLTHG |      0.039        0.058
             |       0.88         1.35
       HLTHF |      0.192        0.224
             |       2.29         2.75
       HLTHP |      0.640        0.798
             |       3.01         3.90
        LINC |      0.052        0.055
             |       3.08         3.33
        LFAM |     -0.034       -0.031
             |      -0.80        -0.77
     EDUCDEC |      0.036        0.031
             |       4.74         4.20
         AGE |      0.000       -0.001
             |       0.12        -0.29
      FEMALE |      0.445        0.409
             |       8.20         7.69
       CHILD |      0.111        0.053
             |       1.38         0.67
    FEMCHILD |     -0.451       -0.395
             |      -5.65        -5.04
       BLACK |     -0.606       -0.583
             |     -11.58       -11.20
       _cons |     -0.272       -0.214
             |      -1.45        -1.16
-------------+--------------------------
mills        |
      lambda |      0.236
             |       0.47
-------------+--------------------------
athrho       |
       _cons |                   0.941
             |                   12.78
-------------+--------------------------
lnsigma      |
       _cons |                   0.451
             |                   25.45
-------------+--------------------------
Statistics   |
           N |   5574.000     5574.000
          ll |               -10170.110
        rank |     37.000       38.000
         aic |          .    20416.221
         bic |          .    20668.004
----------------------------------------
                             legend: b/t
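
The heckmle column reports the ancillary parameters athrho and lnsigma rather than rho and sigma themselves. A minimal Python sketch (not part of the original program) of the back-transformation, assuming Stata's parametrization athrho = atanh(rho) and lnsigma = ln(sigma), and using the rounded estimates shown in the table:

```python
import math

# Rounded estimates copied from the heckmle column above
athrho = 0.941    # inverse hyperbolic tangent of rho
lnsigma = 0.451   # natural log of sigma

rho = math.tanh(athrho)       # error correlation implied by athrho
sigma = math.exp(lnsigma)     # outcome-equation error standard deviation
lambda_mle = rho * sigma      # implied coefficient on the inverse Mills ratio

print("rho    =", round(rho, 4))
print("sigma  =", round(sigma, 4))
print("lambda =", round(lambda_mle, 4))
```

Because the inputs are rounded to three decimals, the implied values are approximate. The implied lambda can be set beside the two-step estimate of lambda (0.236) in the same table.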
.
. ************ (4) A LITTLE FURTHER ANALYSIS **********
.
. * Predictions
. * Compare predictions to actual for MED > 0
. sum MED pMEDpos2part pMEDposhml pMEDposh2s if MED > 0
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         MED |      4281     220.987    909.9021   .5860291   39182.02
pMEDpos2part |      4281     183.462    126.0213   26.37827   1731.088
  pMEDposhml |      4281    240.4096    185.0424   42.00053    3505.48
  pMEDposh2s |      4281    184.9993    129.5432   27.63657   1911.624
. corr MED pMEDpos2part pMEDposhml pMEDposh2s if MED > 0
(obs=4281)

             |      MED pMEDpo~t pMEDpo~l pMEDp~2s
-------------+------------------------------------
         MED |   1.0000
pMEDpos2part |   0.1669   1.0000
  pMEDposhml |   0.1617   0.9830   1.0000
  pMEDposh2s |   0.1669   0.9994   0.9887   1.0000

.
. * Compare predictions to actual including zeroes
. sum MED pMEDall2part pMEDallhml pMEDallh2s DMED probsel2part probselhml probselh2s

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         MED |      5574    169.7247    802.8303          0   39182.02
pMEDall2part |      5574     140.966    120.2022   4.880651   1729.783
  pMEDallhml |      5574    184.5571    174.1649   8.814864   3503.564
  pMEDallh2s |      5574    142.1438    123.2964   5.272963   1910.182
        DMED |      5574    .7680301    .4221277          0          1
-------------+--------------------------------------------------------
probsel2part |      5574    .7678377    .1457464   .1526731    .999246
  probselhml |      5574    .7674107    .1404707   .1737047   .9994534
  probselh2s |      5574    .7678377    .1457464   .1526731    .999246
. corr MED pMEDall2part pMEDallhml pMEDallh2s DMED probsel2part probselhml probselh2s
(obs=5574)

             |      MED pMEDal~t pMEDal~l pMEDa~2s     DMED probse~t probse~l probs~2s
-------------+------------------------------------------------------------------------
         MED |   1.0000
pMEDall2part |   0.1772   1.0000
  pMEDallhml |   0.1734   0.9861   1.0000
  pMEDallh2s |   0.1772   0.9995   0.9909   1.0000
        DMED |   0.1162   0.2158   0.2015   0.2132   1.0000
probsel2part |   0.1031   0.6380   0.5939   0.6298   0.3467   1.0000
  probselhml |   0.1074   0.6552   0.6092   0.6468   0.3468   0.9980   1.0000
  probselh2s |   0.1031   0.6380   0.5939   0.6298   0.3467   1.0000   0.9980   1.0000
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section4\mma16p3selection.txt
log type: text
closed on: 19 May 2005, 13:04:40

-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma17p1km.txt
log type: text
opened on: 19 May 2005, 13:19:55
.
. ********** OVERVIEW OF MMA17P1KM.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 17.2 (pages 574-5) and 17.5.1 (pages 581-3)
. * Nonparametric Duration Analysis
. * It provides
. * (1) Kaplan-Meier Survival Estimate Graph (Figure 17.1: kennanstrk.wmf)
. * (2) Nelson-Aalen Cumulative Hazard Estimate Graph
. * (3) Kaplan-Meier Survivor Function Estimates (Table 17.3)
. * (4) Shows that Cox regression on intercept gives same results
.
. * To run this program you need data file
. * strkdur.dta
.
. ********** SETUP **********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
.
. ********** DATA DESCRIPTION
.
. * The data is the same data as given in Table 1 of
. * J. Kennan, "The Duration of Contract strikes in U.S. Manufacturing",
. * Journal of Econometrics, 1985, Vol. 28, pp.5-28.
.
. * There are 566 observations from 1968-1976 with two variables
. * 1. dur is duration of the strike in days
. * 2. gdp is a measure of stage of business cycle
. *    (deviation of monthly log industrial production in manufacturing
. *    from prediction from OLS on time, time-squared and monthly dummies)
.
. * All observations are complete for these data. There is no censoring !!
. * For an example with censoring see mma17p2kmextra.do or mma17p4duration.do
.
. ********** READ DATA **********
.
. use strkdur.dta
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         dur |       566    43.62367    44.66641          1        235
         gdp |       566    .0060411    .0499072    -.13996     .08554
.
. * Create ASCII data set so that can use programs other than Stata
. outfile dur gdp using strkdur.asc, replace
.
. ********* ANALYSIS: NONPARAMETRIC SURVIVAL CURVE AND HAZARD FUNCTION **********
.
. * Stata st curves require defining the dependent variable
. stset dur
     failure event:  (assumed to fail at time=dur)
obs. time interval:  (0, dur]
 exit on or before:  failure
------------------------------------------------------------------------------
      566  total obs.
        0  exclusions
------------------------------------------------------------------------------
      566  obs. remaining, representing
      566  failures in single record/single failure data
    24691  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =       235
.
. * The data here are complete. If dur is instead right-censored,
. * then also need to tell stset which observations are failures. For example
. * stset dur, fail(censor=0)
. * where the variable censor=1 if data are right-censored and =0 otherwise
. * See mma17p3duration.do
.
. * (1) GRAPH KAPLAN-MEIER SURVIVAL CURVE
.
. * Minimal command that gives 95% confidence bands
. sts graph, gwood
failure _d: 1 (meaning all fail)
analysis time _t: dur
.
. * Longer command for Figure 17.1 (page 575)
. * Nicer graphs and also confidence bands are bolder and easier to read
. sts gen surv = s
. sts gen lbsurv = lb(s)
. sts gen ubsurv = ub(s)
. sort dur
. graph twoway (line ubsurv dur, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)) /*
> */ (line surv dur, msize(vtiny) mstyle(p1) c(J) clstyle(p1)) /*
> */ (line lbsurv dur, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Kaplan-Meier Survival Function Estimate") /*
> */ xtitle("Strike duration in days", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ ylabel(0.00(0.25)1.00,grid)/*
> */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Survival Function") /*
> */ label(3 "Lower 95% confidence band") )
. graph export kennanstrk.wmf, replace
(file c:\Imbook\bwebpage\Section4\kennanstrk.wmf written in Windows Metafile format)
.
. * (2) GRAPH NELSON-AALEN CUMULATIVE HAZARD FUNCTION
.
. * Minimal command that gives 95% confidence bands
. sts graph, cna
failure _d: 1 (meaning all fail)
analysis time _t: dur
.
. * Longer command gives nicer figure
. sts graph, cna /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Nelson-Aalen Cumulative Hazard") /*
> */ xtitle("Strike duration in days", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend(label(1 "95% confidence bands") label(2 "Cumulative Hazard"))
failure _d: 1 (meaning all fail)
analysis time _t: dur
.
. * (3) LIST SURVIVOR and NELSON-AALEN CUMULATIVE HAZARD ESTIMATES
.
. * Gives a lot of output
.
. * Table 17.3: Kaplan-Meier Survivor Function (page 583)
. sts list
failure _d: 1 (meaning all fail)
analysis time _t: dur
           Beg.           Net           Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
     1      566     10      0             0.9823    0.0055     0.9674    0.9905
     2      556     21      0             0.9452    0.0096     0.9230    0.9612
     3      535     16      0             0.9170    0.0116     0.8910    0.9369
     4      519     17      0             0.8869    0.0133     0.8578    0.9104
     5      502     18      0             0.8551    0.0148     0.8234    0.8816
     6      484      9      0             0.8392    0.0154     0.8063    0.8670
     7      475     12      0             0.8180    0.0162     0.7837    0.8474
     8      463     12      0             0.7968    0.0169     0.7613    0.8277
     9      451     13      0             0.7739    0.0176     0.7371    0.8061
    10      438      8      0             0.7597    0.0180     0.7223    0.7928
    11      430      9      0             0.7438    0.0183     0.7058    0.7777
    12      421     10      0             0.7261    0.0187     0.6874    0.7609
    13      411     11      0             0.7067    0.0191     0.6673    0.7424
    14      400     11      0             0.6873    0.0195     0.6473    0.7237
    15      389     12      0             0.6661    0.0198     0.6256    0.7033
    16      377      8      0             0.6519    0.0200     0.6111    0.6896
    17      369      6      0             0.6413    0.0202     0.6003    0.6793
    18      363      8      0             0.6272    0.0203     0.5860    0.6656
    19      355      7      0             0.6148    0.0205     0.5734    0.6535
    20      348      7      0             0.6025    0.0206     0.5609    0.6415
    21      341      5      0             0.5936    0.0206     0.5519    0.6328
    22      336     11      0             0.5742    0.0208     0.5324    0.6137
    23      325     10      0             0.5565    0.0209     0.5146    0.5964
    24      315      8      0             0.5424    0.0209     0.5004    0.5824
    25      307      4      0             0.5353    0.0210     0.4934    0.5754
    26      303      7      0             0.5230    0.0210     0.4810    0.5632
    27      296      6      0             0.5124    0.0210     0.4704    0.5527
    28      290      9      0             0.4965    0.0210     0.4546    0.5369
    29      281      5      0             0.4876    0.0210     0.4458    0.5281
    30      276      5      0             0.4788    0.0210     0.4371    0.5193
    31      271      8      0             0.4647    0.0210     0.4231    0.5051
    32      263      5      0             0.4558    0.0209     0.4144    0.4963
    33      258      6      0             0.4452    0.0209     0.4039    0.4857
    34      252      5      0             0.4364    0.0208     0.3952    0.4768
    35      247      4      0             0.4293    0.0208     0.3883    0.4697
    36      243      6      0             0.4187    0.0207     0.3779    0.4590
    37      237      6      0             0.4081    0.0207     0.3675    0.4483
    38      231      8      0             0.3940    0.0205     0.3537    0.4340
    39      223      3      0             0.3887    0.0205     0.3485    0.4287
    40      220      1      0             0.3869    0.0205     0.3468    0.4269
    41      219      4      0             0.3799    0.0204     0.3399    0.4197
    42      215      8      0             0.3657    0.0202     0.3261    0.4053
    43      207      4      0             0.3587    0.0202     0.3193    0.3981
    44      203      9      0             0.3428    0.0200     0.3039    0.3819
    45      194      3      0             0.3375    0.0199     0.2988    0.3765
    46      191      4      0             0.3304    0.0198     0.2919    0.3693
    47      187      5      0             0.3216    0.0196     0.2834    0.3602
    48      182      3      0             0.3163    0.0195     0.2783    0.3548
    49      179      5      0             0.3074    0.0194     0.2698    0.3457
    50      174      8      0             0.2933    0.0191     0.2563    0.3312
    51      166      1      0             0.2915    0.0191     0.2546    0.3293
    52      165      8      0             0.2774    0.0188     0.2411    0.3147
    53      157      6      0             0.2668    0.0186     0.2310    0.3037
    54      151      1      0             0.2650    0.0186     0.2294    0.3019
    55      150      2      0             0.2615    0.0185     0.2260    0.2982
    56      148      3      0             0.2562    0.0183     0.2210    0.2927
    57      145      3      0             0.2509    0.0182     0.2159    0.2872
    58      142      1      0             0.2491    0.0182     0.2143    0.2854
    59      141      4      0             0.2420    0.0180     0.2076    0.2780
    60      137      6      0             0.2314    0.0177     0.1976    0.2669
    61      131      5      0             0.2226    0.0175     0.1893    0.2577
    62      126      2      0             0.2191    0.0174     0.1860    0.2540
    63      124      2      0             0.2155    0.0173     0.1827    0.2503
    64      122      5      0             0.2067    0.0170     0.1744    0.2410
    65      117      3      0             0.2014    0.0169     0.1695    0.2354
    67      114      1      0             0.1996    0.0168     0.1678    0.2335
    68      113      1      0             0.1979    0.0167     0.1662    0.2317
    70      112      4      0             0.1908    0.0165     0.1596    0.2242
    71      108      1      0             0.1890    0.0165     0.1580    0.2223
    72      107      1      0             0.1873    0.0164     0.1563    0.2205
    74      106      1      0             0.1855    0.0163     0.1547    0.2186
    75      105      1      0             0.1837    0.0163     0.1530    0.2167
    77      104      3      0             0.1784    0.0161     0.1481    0.2111
    82      101      2      0             0.1749    0.0160     0.1449    0.2073
    83       99      1      0             0.1731    0.0159     0.1432    0.2055
    84       98      3      0             0.1678    0.0157     0.1384    0.1998
    85       95      2      0             0.1643    0.0156     0.1351    0.1960
    86       93      1      0             0.1625    0.0155     0.1335    0.1942
    87       92      1      0             0.1608    0.0154     0.1319    0.1923
    88       91      1      0             0.1590    0.0154     0.1302    0.1904
    90       90      1      0             0.1572    0.0153     0.1286    0.1885
    91       89      2      0             0.1537    0.0152     0.1254    0.1847
    92       87      1      0             0.1519    0.0151     0.1238    0.1828
    94       86      1      0             0.1502    0.0150     0.1222    0.1809
    98       85      3      0             0.1449    0.0148     0.1173    0.1752
    99       82      3      0             0.1396    0.0146     0.1125    0.1695
   100       79      2      0             0.1360    0.0144     0.1093    0.1657
   101       77      3      0             0.1307    0.0142     0.1045    0.1600
   102       74      2      0             0.1272    0.0140     0.1013    0.1561
   103       72      1      0             0.1254    0.0139     0.0997    0.1542
   104       71      3      0             0.1201    0.0137     0.0950    0.1485
   105       68      1      0             0.1184    0.0136     0.0934    0.1465
   106       67      2      0             0.1148    0.0134     0.0902    0.1427
   107       65      2      0             0.1113    0.0132     0.0871    0.1388
   108       63      2      0             0.1078    0.0130     0.0839    0.1349
   109       61      2      0             0.1042    0.0128     0.0808    0.1311
   111       59      1      0             0.1025    0.0127     0.0792    0.1291
   112       58      1      0             0.1007    0.0126     0.0777    0.1272
   114       57      1      0             0.0989    0.0126     0.0761    0.1252
   115       56      1      0             0.0972    0.0124     0.0745    0.1233
   116       55      1      0             0.0954    0.0123     0.0730    0.1213
   117       54      2      0             0.0919    0.0121     0.0699    0.1174
   118       52      1      0             0.0901    0.0120     0.0683    0.1155
   119       51      1      0             0.0883    0.0119     0.0668    0.1135
   122       50      3      0             0.0830    0.0116     0.0622    0.1076
   123       47      1      0             0.0813    0.0115     0.0606    0.1056
   124       46      1      0             0.0795    0.0114     0.0591    0.1037
   125       45      2      0             0.0760    0.0111     0.0561    0.0997
   126       43      1      0             0.0742    0.0110     0.0545    0.0977
   127       42      2      0             0.0707    0.0108     0.0515    0.0937
   130       40      2      0             0.0671    0.0105     0.0485    0.0897
   131       38      1      0             0.0654    0.0104     0.0470    0.0877
   133       37      1      0             0.0636    0.0103     0.0455    0.0857
   135       36      1      0             0.0618    0.0101     0.0440    0.0837
   136       35      2      0             0.0583    0.0098     0.0410    0.0797
   139       33      2      0             0.0548    0.0096     0.0381    0.0756
   140       31      1      0             0.0530    0.0094     0.0366    0.0736
   141       30      3      0             0.0477    0.0090     0.0323    0.0675
   142       27      1      0             0.0459    0.0088     0.0308    0.0654
   143       26      1      0             0.0442    0.0086     0.0294    0.0633
   146       25      2      0             0.0406    0.0083     0.0265    0.0592
   147       23      1      0             0.0389    0.0081     0.0251    0.0571
   148       22      2      0             0.0353    0.0078     0.0223    0.0529
   151       20      1      0             0.0336    0.0076     0.0209    0.0508
   152       19      1      0             0.0318    0.0074     0.0196    0.0487
   153       18      2      0             0.0283    0.0070     0.0169    0.0444
   154       16      1      0             0.0265    0.0068     0.0155    0.0423
   160       15      1      0             0.0247    0.0065     0.0142    0.0401
   163       14      2      0             0.0212    0.0061     0.0116    0.0357
   165       12      1      0             0.0194    0.0058     0.0103    0.0335
   168       11      1      0             0.0177    0.0055     0.0091    0.0312
   174       10      1      0             0.0159    0.0053     0.0079    0.0290
   175        9      1      0             0.0141    0.0050     0.0067    0.0267
   179        8      1      0             0.0124    0.0046     0.0055    0.0244
   191        7      1      0             0.0106    0.0043     0.0044    0.0220
   192        6      1      0             0.0088    0.0039     0.0034    0.0196
   205        5      1      0             0.0071    0.0035     0.0024    0.0171
   208        4      1      0             0.0053    0.0031     0.0015    0.0146
   216        3      1      0             0.0035    0.0025     0.0007    0.0121
   226        2      1      0             0.0018    0.0018     0.0002    0.0095
   235        1      1      0             0.0000         .          .         .
-------------------------------------------------------------------------------
.
. * And Nelson-Aalen Integrated Hazard
. * sts list, na
.
. * (4) STCOX REGRESS ON INTERCEPT GIVES SAME RESULTS AS ABOVE
.
. * Cox Regression on an intercept
. gen one = 1
. stcox one, basesurv(coxbasesurv) basechazard(coxbasecumhaz) basehc(coxbasehaz)
failure _d: 1 (meaning all fail)
analysis time _t: dur
note: one dropped due to collinearity
Iteration 0: log likelihood = -3032.134
Refining estimates:
Iteration 0: log likelihood = -3032.134
Cox regression -- Breslow method for ties

No. of subjects =          566                     Number of obs   =       566
No. of failures =          566
Time at risk    =        24691
                                                   LR chi2(0)      =      0.00
Log likelihood  =    -3032.134                     Prob > chi2     =

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
------------------------------------------------------------------------------
.
. * Instead use sts which analyzes dependent in isolation
. * sts gen surv = s
. sts gen cumhaz = na
. sts gen haz = h
.
. * Compare to verify that same answers
. sum surv coxbasesurv cumhaz coxbasecumhaz haz coxbasehaz
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        surv |       566     .493014    .2848417          0   .9823322
 coxbasesurv |       566     .493014    .2848417          0   .9823322
      cumhaz |       566           1    .9834583   .0176678   6.871446
coxbasecum~z |       566           1    .9834583   .0176678   6.871446
         haz |       566    .0345186    .0515235   .0045455          1
-------------+--------------------------------------------------------
  coxbasehaz |       566    .0345186    .0515235   .0045455          1
. corr surv coxbasesurv
(obs=566)

             |     surv coxbas~v
-------------+------------------
        surv |   1.0000
 coxbasesurv |   1.0000   1.0000

. corr cumhaz coxbasecumhaz
(obs=566)

             |   cumhaz cox~mhaz
-------------+------------------
      cumhaz |   1.0000
coxbasecum~z |   1.0000   1.0000

. corr haz coxbasehaz
(obs=566)

             |      haz cox~ehaz
-------------+------------------
         haz |   1.0000
  coxbasehaz |   1.0000   1.0000


.
. * (5) ESTIMATE HAZARD FUNCTION
.
. * sts graph does not give the true hazard function - it instead gives the
. * difference in the cumulative hazard (without division by time difference).
.
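
The note above distinguishes the jump in the cumulative hazard from a hazard rate per unit of time. One crude way to obtain the latter, sketched here in Python rather than Stata and not part of the original program, is to divide each jump d/r by the time elapsed since the previous failure time, which assumes the hazard is constant within each interval. The function name is hypothetical; the four rows used are the strike-duration rows for t = 65, 67, 68 and 70 from the Kaplan-Meier listing, with the preceding failure time 64 as the starting point.

```python
def discrete_hazard(times, at_risk, fails, start=0.0):
    """Return (time, jump, rate) for each failure time.

    jump = d/r is the increment in the Nelson-Aalen cumulative hazard;
    rate = jump / (t - previous time) spreads it over the elapsed interval.
    """
    rows = []
    prev = start
    for t, r, d in zip(times, at_risk, fails):
        jump = d / r
        rows.append((t, jump, jump / (t - prev)))
        prev = t
    return rows

# Rows copied from the sts list output for the strike data
times = [65, 67, 68, 70]
at_risk = [117, 114, 113, 112]
fails = [3, 1, 1, 4]

rows = discrete_hazard(times, at_risk, fails, start=64)
for t, jump, rate in rows:
    print(t, round(jump, 5), round(rate, 5))
```

Where failure times are one day apart the two columns coincide; they differ only across gaps such as 65 to 67 and 68 to 70.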
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section4\mma17p1km.txt
log type: text
closed on: 19 May 2005, 13:20:01
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma17p2kmextra.txt
log type: text
opened on: 19 May 2005, 13:24:01
.
. ********** OVERVIEW OF MMA17P2KMEXTRA.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press


.
. * Chapter 17.5.1 pages 581-2
. * Nonparametric Survival Analysis
. * Provides
. * (1) K-M Survivor Function and N_A Cum Hazard Estimates (Table 17.2)
. * using artificial data
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * The time does not matter except for the hazard.
. * Here failures occur at t = 10, 20, 30, 40 and 50, with censoring in between
. * 1. At t = 10 (time t1):  6 failures
. * 2. At t = 15:            4 censored (lost) between t1 and t2
. * 3. At t = 20 (time t2):  5 failures
. * 4. At t = 25:            3 censored (lost) between t2 and t3
. * 5. At t = 30 (time t3):  2 failures
. * 6. At t = 35:            1 censored (lost) between t3 and t4
. * 7. At t = 40 (time t4):  1 failure
. * 8. At t = 45:           32 censored (lost) between t4 and t5
. * 9. At t = 50 (time t5): 26 failures
.
. * Indicator failed = 1 if fail and 0 if censored
. input duration failed

      duration     failed
  1. 10 1
  2. 10 1
  3. 10 1
  4. 10 1
  5. 10 1
  6. 10 1
  7. 15 0
  8. 15 0
  9. 15 0
 10. 15 0
 11. 20 1
 12. 20 1
 13. 20 1
 14. 20 1
 15. 20 1
 16. 25 0
 17. 25 0
 18. 25 0
 19. 30 1
 20. 30 1
 21. 35 0
 22. 40 1
 23. 45 0
 24. 45 0
 25. 45 0
 26. 45 0
 27. 45 0
 28. 45 0
 29. 45 0
 30. 45 0
 31. 45 0
 32. 45 0
 33. 45 0
 34. 45 0
 35. 45 0
 36. 45 0
 37. 45 0
 38. 45 0
 39. 45 0
 40. 45 0
 41. 45 0
 42. 45 0
 43. 45 0
 44. 45 0
 45. 45 0
 46. 45 0
 47. 45 0
 48. 45 0
 49. 45 0
 50. 45 0
 51. 45 0
 52. 45 0
 53. 45 0
 54. 45 0
 55. 50 1
 56. 50 1
 57. 50 1
 58. 50 1
 59. 50 1
 60. 50 1
 61. 50 1
 62. 50 1
 63. 50 1
 64. 50 1
 65. 50 1
 66. 50 1
 67. 50 1
 68. 50 1
 69. 50 1
 70. 50 1
 71. 50 1
 72. 50 1
 73. 50 1
 74. 50 1
 75. 50 1
 76. 50 1
 77. 50 1
 78. 50 1
 79. 50 1
 80. 50 1
 81. end

.
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    duration |        80      39.625    13.40166         10         50
      failed |        80          .5    .5031546          0          1
.
. ***** COMPUTATION USING STATA **********
.
. * Stata st curves require defining the dependent variable
. stset duration, fail(failed=1)
     failure event:  failed == 1
obs. time interval:  (0, duration]
 exit on or before:  failure
------------------------------------------------------------------------------
       80  total obs.
        0  exclusions
------------------------------------------------------------------------------
       80  obs. remaining, representing
       40  failures in single record/single failure data
     3170  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        50
. stsum
         failure _d:  failed == 1
   analysis time _t:  duration

         |               incidence       no. of    |------ Survival time -----|
         | time at risk     rate        subjects        25%       50%       75%
---------+---------------------------------------------------------------------
   total |         3170   .0126183            80         50        50        50


. stdes
         failure _d:  failed == 1
   analysis time _t:  duration

                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects               80
no. of records                80           1           1          1          1

(first) entry time                         0           0          0          0
(final) exit time                     39.625          10         45         50

subjects with gap              0
time on gap if gap             0
time at risk                3170      39.625          10         45         50

failures                      40          .5           0         .5          1
------------------------------------------------------------------------------
.
. * K-M survival graph
. * sts graph, gwood
.
. * N-A Cumulative Hazard
. * sts graph, cna
.
. * Kaplan-Meier Survivor Function listed (last column Table 17.2)
. sts list
         failure _d:  failed == 1
   analysis time _t:  duration

           Beg.           Net           Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
    10       80      6      0             0.9250    0.0294     0.8407    0.9656
    15       74      0      4             0.9250    0.0294     0.8407    0.9656
    20       70      5      0             0.8589    0.0395     0.7596    0.9193
    25       65      0      3             0.8589    0.0395     0.7596    0.9193
    30       62      2      0             0.8312    0.0428     0.7268    0.8984
    35       60      0      1             0.8312    0.0428     0.7268    0.8984
    40       59      1      0             0.8171    0.0443     0.7104    0.8875
    45       58      0     32             0.8171    0.0443     0.7104    0.8875
    50       26     26      0             0.0000         .          .         .
-------------------------------------------------------------------------------
.
. * Nelson-Aalen Cumulative Hazard Listed (second last column Table 17.2)
. sts list, na
         failure _d:  failed == 1
   analysis time _t:  duration

           Beg.           Net       Nelson-Aalen      Std.
  Time    Total   Fail   Lost          Cum. Haz.     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
    10       80      6      0             0.0750    0.0306     0.0337    0.1669
    15       74      0      4             0.0750    0.0306     0.0337    0.1669
    20       70      5      0             0.1464    0.0442     0.0810    0.2648
    25       65      0      3             0.1464    0.0442     0.0810    0.2648
    30       62      2      0             0.1787    0.0498     0.1035    0.3085
    35       60      0      1             0.1787    0.0498     0.1035    0.3085
    40       59      1      0             0.1956    0.0526     0.1155    0.3313
    45       58      0     32             0.1956    0.0526     0.1155    0.3313
    50       26     26      0             1.1956    0.2030     0.8571    1.6678
-------------------------------------------------------------------------------
.
. ***** MANUAL COMPUTATION AS IN TABLE 17.2 (page 582) **********
.
. scalar cumhaz1 = 6/80
. scalar cumhaz2 = 6/80 + 5/70
. scalar cumhaz3 = 6/80 + 5/70 + 2/62
. scalar surv1 = 1-6/80
. scalar surv2 = (1-6/80)*(1-5/70)
. scalar surv3 = (1-6/80)*(1-5/70)*(1-2/62)
. di "Cumulative hazard at t1: " cumhaz1 " at t2: " cumhaz2 " at t3: " cumhaz3
Cumulative hazard at t1: .075 at t2: .14642857 at t3: .17868664
. di "Survivor function at t1: " surv1 " at t2: " surv2 " at t3: " surv3
Survivor function at t1: .925 at t2: .85892857 at t3: .8312212
.
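
The manual computation above stops at t3. As an illustrative cross-check outside Stata (not part of the original program), the following Python sketch applies the same product-limit and Nelson-Aalen formulas to all 80 artificial observations. The function name km_na and the compact way the data are rebuilt are assumptions made for this example.

```python
from collections import Counter

def km_na(data):
    """Kaplan-Meier survivor and Nelson-Aalen cumulative hazard.

    data is a list of (duration, failed) pairs with failed = 1 for a
    failure and 0 for a right-censored spell. Returns one tuple
    (time, at_risk, failures, survivor, cum_hazard) per distinct time.
    """
    fails = Counter(t for t, f in data if f == 1)
    exits = Counter(t for t, _ in data)
    at_risk = len(data)
    surv, cumhaz = 1.0, 0.0
    rows = []
    for t in sorted(exits):
        d = fails.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk   # product-limit update
            cumhaz += d / at_risk       # Nelson-Aalen update
        rows.append((t, at_risk, d, surv, cumhaz))
        at_risk -= exits[t]             # failures and censored both leave
    return rows

# The artificial data set: (duration, failed, number of observations)
spec = [(10, 1, 6), (15, 0, 4), (20, 1, 5), (25, 0, 3), (30, 1, 2),
        (35, 0, 1), (40, 1, 1), (45, 0, 32), (50, 1, 26)]
data = [(t, f) for t, f, n in spec for _ in range(n)]

table = km_na(data)
for t, r, d, s, h in table:
    print(t, r, d, round(s, 4), round(h, 4))
```

The rows at the censoring times 15, 25, 35 and 45 simply carry the previous estimates forward, as in the sts list output.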
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma17p2kmextra.txt
log type: text
closed on: 19 May 2005, 13:24:01
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma17p3weib.txt
log type: text
opened on: 19 May 2005, 14:22:25
.
. ********** OVERVIEW OF MMA17P3WEIB.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 17.6.1 (pages 584-6)
. * Plot of Weibull density, survivor, hazard and cumulative hazard functions
. * Provides
. * (1) Figure 17.2 (ch17weibull.wmf)
.
. * This program requires no data
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA AND FUNCTIONS **********
.
. set obs 800
obs was 0, now 800
.
. gen t = 0.1*_n /* duration time */
.
. * Generate the survivor, hazard, cumulative hazard and density
. scalar g = 0.01 /* gamma */
. scalar a = 1.5 /* alpha */
. gen surv = exp(-g*(t^(a)))
. gen density = g*a*(t^(a-1))*exp(-g*(t^(a)))
. gen hazard = g*a*(t^(a-1))
. gen cumhaz = -ln(surv)
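
The four `gen` statements above encode the Weibull relationships S(t) = exp(-g t^a), hazard = g a t^(a-1), density = hazard x S(t), and cumulative hazard = -ln S(t) = g t^a. The following is a minimal Python sketch of the same formulas, using the values g = 0.01 and a = 1.5 from the log; the function name `weibull_functions` is hypothetical.

```python
import math


def weibull_functions(t, g=0.01, a=1.5):
    """Return (density, survivor, hazard, cumulative hazard) at duration t > 0."""
    surv = math.exp(-g * t ** a)
    hazard = g * a * t ** (a - 1)
    density = hazard * surv          # f(t) = h(t) * S(t)
    cumhaz = -math.log(surv)         # equals g * t**a
    return density, surv, hazard, cumhaz


# Evaluate at a few points of the grid t = 0.1, 0.2, ..., 80 used in the log.
for t in (0.1, 10.0, 80.0):
    density, surv, hazard, cumhaz = weibull_functions(t)
    print(t, density, surv, hazard, cumhaz)
```

With a = 1.5 > 1 the hazard increases in t, which is the shape plotted in the hazard panel of Figure 17.2.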
.
. ********** DO THE FOUR SEPARATE GRAPHS FOR FIGURE 17.2 **********
.
. * Weibull density
. graph twoway (scatter density t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
> */ ytitle("Weibull density", size(large)) yscale(titlegap(*5)) /*
> */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2a, replace
(file ch17fig2a.gph saved)
.
. * Weibull survivor
. graph twoway (scatter surv t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
> */ ytitle("Weibull survivor", size(large)) yscale(titlegap(*5)) /*
> */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2b, replace
(file ch17fig2b.gph saved)
.
. * Weibull hazard
. graph twoway (scatter hazard t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
> */ ytitle("Weibull hazard", size(large)) yscale(titlegap(*5)) /*
> */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2c, replace
(file ch17fig2c.gph saved)
.
. * Weibull cumulative hazard
. graph twoway (scatter cumhaz t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative hazard", size(large)) yscale(titlegap(*5)) /*
> */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2d, replace
(file ch17fig2d.gph saved)
.
. ********** COMBINE THE FOUR GRAPHS FOR FIGURE 17.2 (page 585) **********
.
. graph combine ch17fig2a.gph ch17fig2b.gph ch17fig2c.gph ch17fig2d.gph, /*
> */ title("Weibull Distribution", margin(b=2) size(vlarge))
. graph export ch17weibull.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch17weibull.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section4\mma17p3weib.txt
log type: text
closed on: 19 May 2005, 14:22:39
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma17p4duration.txt
log type: text
opened on: 19 May 2005, 15:25:00
.
. ********** OVERVIEW OF MMA17P4DURATION.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 17.11 (pages 603-8)
. * Duration regression with censored data example
. * Provides
. * (1) Data summary: Table 17.6
. * (2) List of Survivor Function and Cumulative Hazard Estimates: Table 17.7
. * (3) Various graphs describing the data
. *     (3A) K-M Survival Graph for all data (Figure 17.3: km_pt1.wmf)
. *     (3B) K-M Survival Graph by unemployment insurance (Figure 17.4: km_pt2.wmf)
. *     (3C) N-A Cumulative Hazard Graph for all data (Figure 17.5: na_pt1.wmf)
. *     (3D) N-A Cumulative Hazard Graph by unemployment insurance (Figure 17.6: na_pt2.wmf)
. * (4) Coefficient Estimates of Some Parametric Models (Table 17.8)
. * (4) Hazard Rate Estimates of Some Parametric Models (Table 17.9)
.
. * To run this program you need data file
. * ema1996.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
. set matsize 100
.
. ********** DATA DESCRIPTION **********
.
. * The data are from
. * B.P. McCall (1996), "Unemployment Insurance Rules, Joblessness,
. *     and Part-time Work," Econometrica, 64, 647-682.
.
. * McCall's data set named ema_1996_pt_lastweek.dta
. * has name changed to ema1996.dta
.
. * There are 3343 observations from the CPS Displaced Worker Surveys
. * of 1986, 1988, 1990 and 1992
. * 1. spell is length of spell in number of two-week intervals
. * 2. CENSOR1 = 1 if re-employed at full-time job
. * 3. CENSOR2 = 1 if re-employed at part-time job
. * 4. CENSOR3 = 1 if re-employed but left job: pt-ft status unknown
. * 5. CENSOR4 = 1 if still jobless
. * 6. ui (UI) = 1 if filed UI claim
. * 7. reprate (RR) = eligible replacement rate
. * 8. disrate (DR) = eligible disregard rate
. * 9. tenure (TENURE) = years tenure in lost job
. * 10. logwage (LOGWAGE) = log weekly earnings in lost job (1985$)
. * 11.-43. other variables listed in McCall (1996) Table 2 p.657
.
. ********** READ DATA **********
.
. use ema1996.dta
(Sample for 1996 EMA paper: part-time= worked part-time last week)
. sum
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       spell |      3343    6.247981    5.611271          1         28
     censor1 |      3343    .3209692    .4669188          0          1
     censor2 |      3343    .1014059    .3019106          0          1
     censor3 |      3343    .1717021    .3771777          0          1
     censor4 |      3343    .3754113    .4843014          0          1
-------------+--------------------------------------------------------
          ui |      3343    .5527969    .4972791          0          1
     reprate |      3343    .4544717    .1137918       .066      2.059
     logwage |      3343    5.692994    .5356591    2.70805   7.600402
      tenure |      3343    4.114867    5.862322          0         40
     disrate |      3343    .1094376    .0735274       .002       1.02
-------------+--------------------------------------------------------
       slack |      3343    .4884834    .4999421          0          1
     abolpos |      3343    .1456775    .3528354          0          1
     explose |      3343    .5025426    .5000683          0          1
     stateur |      3343      6.5516    1.803825        2.5         13
    houshead |      3343    .6120251    .4873617          0          1
-------------+--------------------------------------------------------
     married |      3343    .5860006    .4926221          0          1
      female |      3343    .3478911    .4763725          0          1
       child |      3343    .4501944    .4975876          0          1
      ychild |      3343    .1956327    .3967463          0          1
    nonwhite |      3343    .1390966    .3460991          0          1
-------------+--------------------------------------------------------
         age |      3343    35.44331     10.6402         20         61
     schlt12 |      3343    .2811846    .4496446          0          1
     schgt12 |      3343    .3356267    .4722797          0          1
        smsa |      3343    .7241998    .4469835          0          1
    bluecoll |      3343    .6036494     .489212          0          1
-------------+--------------------------------------------------------
      mining |      3343     .029315    .1687132          0          1
      constr |      3343    .1480706    .3552231          0          1
      transp |      3343    .0646126    .2458778          0          1
       trade |      3343    .1848639    .3882452          0          1
        fire |      3343    .0514508    .2209484          0          1
-------------+--------------------------------------------------------
    services |      3343    .1699073    .3756075          0          1
    pubadmin |      3343    .0095722     .097383          0          1
      year85 |      3343    .2677236     .442839          0          1
      year87 |      3343    .2174693    .4125862          0          1
      year89 |      3343    .1998205    .3999251          0          1
-------------+--------------------------------------------------------
      midatl |      3343    .1088842    .3115405          0          1
       encen |      3343    .1429853    .3501103          0          1
       wncen |      3343    .0643135    .2453472          0          1
    southatl |      3343    .2375112    .4256217          0          1
       escen |      3343    .0532456    .2245564          0          1
-------------+--------------------------------------------------------
       wscen |      3343    .1441819    .3513266          0          1
    mountain |      3343    .1079868    .3104102          0          1
     pacific |      3343    .0260245     .159232          0          1
.
. * The following gives variables in same order as Table 2 p.657 of McCall (1996)
. * which gives fuller names for the variables
. sum spell censor1 censor2 censor3 censor4 age /*
> */ ui reprate disrate logwage tenure slack abolpos explose bluecoll /*
> */ houshead married child ychild female schlt12 schgt12 nonwhite smsa /*
> */ midatl encen wncen southatl escen wscen mountain pacific /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       spell |      3343    6.247981    5.611271          1         28
     censor1 |      3343    .3209692    .4669188          0          1
     censor2 |      3343    .1014059    .3019106          0          1
     censor3 |      3343    .1717021    .3771777          0          1
     censor4 |      3343    .3754113    .4843014          0          1
-------------+--------------------------------------------------------
         age |      3343    35.44331     10.6402         20         61
          ui |      3343    .5527969    .4972791          0          1
     reprate |      3343    .4544717    .1137918       .066      2.059
     disrate |      3343    .1094376    .0735274       .002       1.02
     logwage |      3343    5.692994    .5356591    2.70805   7.600402
-------------+--------------------------------------------------------
      tenure |      3343    4.114867    5.862322          0         40
       slack |      3343    .4884834    .4999421          0          1
     abolpos |      3343    .1456775    .3528354          0          1
     explose |      3343    .5025426    .5000683          0          1
    bluecoll |      3343    .6036494     .489212          0          1
-------------+--------------------------------------------------------
    houshead |      3343    .6120251    .4873617          0          1
     married |      3343    .5860006    .4926221          0          1
       child |      3343    .4501944    .4975876          0          1
      ychild |      3343    .1956327    .3967463          0          1
      female |      3343    .3478911    .4763725          0          1
-------------+--------------------------------------------------------
     schlt12 |      3343    .2811846    .4496446          0          1
     schgt12 |      3343    .3356267    .4722797          0          1
    nonwhite |      3343    .1390966    .3460991          0          1
        smsa |      3343    .7241998    .4469835          0          1
      midatl |      3343    .1088842    .3115405          0          1
-------------+--------------------------------------------------------
       encen |      3343    .1429853    .3501103          0          1
       wncen |      3343    .0643135    .2453472          0          1
    southatl |      3343    .2375112    .4256217          0          1
       escen |      3343    .0532456    .2245564          0          1
       wscen |      3343    .1441819    .3513266          0          1
-------------+--------------------------------------------------------
    mountain |      3343    .1079868    .3104102          0          1
     pacific |      3343    .0260245     .159232          0          1
      mining |      3343     .029315    .1687132          0          1
      constr |      3343    .1480706    .3552231          0          1
      transp |      3343    .0646126    .2458778          0          1
-------------+--------------------------------------------------------
       trade |      3343    .1848639    .3882452          0          1
        fire |      3343    .0514508    .2209484          0          1
    services |      3343    .1699073    .3756075          0          1
    pubadmin |      3343    .0095722     .097383          0          1
      year85 |      3343    .2677236     .442839          0          1
-------------+--------------------------------------------------------
      year87 |      3343    .2174693    .4125862          0          1
      year89 |      3343    .1998205    .3999251          0          1
.
. * The following creates a space-delimited data set with
. * variables in same order as Table 2 p.657 of McCall (1996)
. * Permits use by programs other than Stata
. * Note that order has been changed a little from the original Stata data set
.
. outfile spell censor1 censor2 censor3 censor4 age /*
> */ ui reprate disrate logwage tenure slack abolpos explose bluecoll /*
> */ houshead married child ychild female schlt12 schgt12 nonwhite smsa /*
> */ midatl encen wncen southatl escen wscen mountain pacific /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 using ema1996.asc, replace
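
Since the exported file is intended for programs other than Stata, the following minimal Python sketch shows one way it might be read. It assumes that all 42 exported variables are numeric, that a missing value is written as a lone ".", and that `outfile` may wrap one observation across several lines unless its `wide` option is given; for that reason the sketch chunks whitespace-separated tokens rather than reading line by line. The name `parse_outfile` is hypothetical, and the sample text is synthetic, not real data.

```python
# Column order copied from the outfile command above (42 variables).
NAMES = (
    "spell censor1 censor2 censor3 censor4 age "
    "ui reprate disrate logwage tenure slack abolpos explose bluecoll "
    "houshead married child ychild female schlt12 schgt12 nonwhite smsa "
    "midatl encen wncen southatl escen wscen mountain pacific "
    "mining constr transp trade fire services pubadmin "
    "year85 year87 year89"
).split()


def parse_outfile(text, names=NAMES):
    """Parse whitespace-delimited numeric text into a list of dict records."""
    tokens = text.split()
    k = len(names)
    if len(tokens) % k != 0:
        raise ValueError("token count is not a multiple of the variable count")
    records = []
    for start in range(0, len(tokens), k):
        values = tokens[start:start + k]
        # Assumption: a missing numeric value appears as a lone "."
        records.append({n: (None if v == "." else float(v))
                        for n, v in zip(names, values)})
    return records


# Synthetic example: two observations, the second wrapped across two lines.
sample = (
    " ".join(str(i) for i in range(1, 43)) + "\n"
    + " ".join(str(i) for i in range(101, 121)) + "\n"
    + " ".join(str(i) for i in range(121, 143)) + "\n"
)
records = parse_outfile(sample)
print(len(records), records[1]["spell"], records[1]["year89"])
```

If the assumptions hold, reading the real file with `parse_outfile(open("ema1996.asc").read())` would be expected to return 3343 records. Note that `stateur` is not in the exported variable list.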

.
. ********* ANALYSIS: UNEMPLOYMENT DURATION **********
.
. * Stata st curves require defining the dependent variable
. * and the censoring variable if there is one
. stset spell, fail(censor1=1)
     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. stdes
         failure _d:  censor1 == 1
   analysis time _t:  spell
                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects             3343
no. of records              3343           1           1          1          1

(first) entry time                         0           0          0          0
(final) exit time                   6.247981           1          5         28

subjects with gap              0
time on gap if gap             0
time at risk               20887    6.247981           1          5         28

failures                    1073    .3209692           0          0          1
------------------------------------------------------------------------------
.
. * (1) SUMMARIZE KEY VARIABLES (Table 17.6, p.603)
.
. sum spell censor1 censor2 censor3 censor4 ui reprate disrate tenure logwage
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       spell |      3343    6.247981    5.611271          1         28
     censor1 |      3343    .3209692    .4669188          0          1
     censor2 |      3343    .1014059    .3019106          0          1
     censor3 |      3343    .1717021    .3771777          0          1
     censor4 |      3343    .3754113    .4843014          0          1
-------------+--------------------------------------------------------
          ui |      3343    .5527969    .4972791          0          1
     reprate |      3343    .4544717    .1137918       .066      2.059
     disrate |      3343    .1094376    .0735274       .002       1.02
      tenure |      3343    4.114867    5.862322          0         40
     logwage |      3343    5.692994    .5356591    2.70805   7.600402
.
. * (2) LIST SURVIVAL CURVE AND CUMULATIVE HAZARD ESTIMATES (Table 17.7, p.605)
.
. * Kaplan-Meier Estimates of Survival Function
. sts list
         failure _d:  censor1 == 1
   analysis time _t:  spell
           Beg.           Net           Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
     1     3343    294    246             0.9121    0.0049     0.9019    0.9212
     2     2803    178    304             0.8541    0.0062     0.8415    0.8659
     3     2321    119    305             0.8103    0.0071     0.7960    0.8238
     4     1897     56    165             0.7864    0.0076     0.7712    0.8008
     5     1676    104    233             0.7376    0.0085     0.7206    0.7538
     6     1339     32    111             0.7200    0.0088     0.7023    0.7369
     7     1196     85    178             0.6688    0.0098     0.6492    0.6876
     8      933     15     70             0.6581    0.0100     0.6380    0.6773
     9      848     33     98             0.6325    0.0106     0.6113    0.6528
    10      717      3     55             0.6298    0.0106     0.6086    0.6503
    11      659     26     77             0.6050    0.0113     0.5825    0.6267
    12      556      7     40             0.5974    0.0115     0.5744    0.6195
    13      509     25     69             0.5680    0.0123     0.5434    0.5918
    14      415     30     74             0.5270    0.0135     0.5001    0.5531
    15      311     19     40             0.4948    0.0146     0.4658    0.5230
    16      252     10     41             0.4751    0.0153     0.4449    0.5047
    17      201      8     24             0.4562    0.0161     0.4245    0.4874
    18      169      7     13             0.4373    0.0169     0.4040    0.4702
    19      149      4     15             0.4256    0.0174     0.3912    0.4595
    20      130      3     18             0.4158    0.0179     0.3804    0.4507
    21      109      4     23             0.4005    0.0188     0.3635    0.4372
    22       82      4      9             0.3810    0.0203     0.3412    0.4206
    23       69      0      9             0.3810    0.0203     0.3412    0.4206
    24       60      0      2             0.3810    0.0203     0.3412    0.4206
    25       58      0     10             0.3810    0.0203     0.3412    0.4206
    26       48      2     13             0.3651    0.0223     0.3214    0.4088
    27       33      5     24             0.3098    0.0296     0.2528    0.3684
    28        4      0      4             0.3098    0.0296     0.2528    0.3684
-------------------------------------------------------------------------------
.
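
The first rows of the Kaplan-Meier listing can be reproduced by hand. The following minimal Python sketch assumes that the reported standard error is Greenwood's formula, S(t) x sqrt(sum of d/(r(r-d))), which is consistent with the first three rows of the table, and that the next risk set equals the current one minus failures and net lost. The function name `kaplan_meier` is hypothetical.

```python
import math


def kaplan_meier(rows):
    """rows: (at_risk, failures, net_lost) per time; returns [(S, se), ...]."""
    s, greenwood_sum = 1.0, 0.0
    out = []
    for r, d, _lost in rows:
        s *= 1.0 - d / r
        greenwood_sum += d / (r * (r - d))
        out.append((s, s * math.sqrt(greenwood_sum)))
    return out


# First three rows of the listing above: Beg. Total, Fail, Net Lost.
rows = [(3343, 294, 246), (2803, 178, 304), (2321, 119, 305)]
km = kaplan_meier(rows)
for (r, d, lost), (s, se) in zip(rows, km):
    print(r, d, lost, round(s, 4), round(se, 4), "next risk set:", r - d - lost)
```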
. * Nelson-Aalen Estimates of Cumulative Hazard
. sts list, na
         failure _d:  censor1 == 1
   analysis time _t:  spell
           Beg.           Net       Nelson-Aalen      Std.
  Time    Total   Fail   Lost          Cum. Haz.     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
     1     3343    294    246             0.0879    0.0051     0.0784    0.0986
     2     2803    178    304             0.1514    0.0070     0.1383    0.1658
     3     2321    119    305             0.2027    0.0084     0.1869    0.2199
     4     1897     56    165             0.2322    0.0093     0.2147    0.2512
     5     1676    104    233             0.2943    0.0111     0.2733    0.3169
     6     1339     32    111             0.3182    0.0119     0.2957    0.3424
     7     1196     85    178             0.3893    0.0142     0.3624    0.4181
     8      933     15     70             0.4053    0.0148     0.3774    0.4353
     9      848     33     98             0.4443    0.0162     0.4135    0.4773
    10      717      3     55             0.4484    0.0164     0.4174    0.4818
    11      659     26     77             0.4879    0.0182     0.4536    0.5248
    12      556      7     40             0.5005    0.0188     0.4650    0.5387
    13      509     25     69             0.5496    0.0212     0.5096    0.5927
    14      415     30     74             0.6219    0.0250     0.5748    0.6728
    15      311     19     40             0.6830    0.0286     0.6291    0.7415
    16      252     10     41             0.7227    0.0313     0.6639    0.7866
    17      201      8     24             0.7625    0.0343     0.6982    0.8327
    18      169      7     13             0.8039    0.0377     0.7333    0.8812
    19      149      4     15             0.8307    0.0400     0.7559    0.9130
    20      130      3     18             0.8538    0.0422     0.7750    0.9406
    21      109      4     23             0.8905    0.0460     0.8048    0.9853
    22       82      4      9             0.9393    0.0521     0.8426    1.0470
    23       69      0      9             0.9393    0.0521     0.8426    1.0470
    24       60      0      2             0.9393    0.0521     0.8426    1.0470
    25       58      0     10             0.9393    0.0521     0.8426    1.0470
    26       48      2     13             0.9809    0.0598     0.8705    1.1055
    27       33      5     24             1.1325    0.0904     0.9685    1.3242
    28        4      0      4             1.1325    0.0904     0.9685    1.3242
-------------------------------------------------------------------------------
.
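
The Nelson-Aalen listing can be checked in the same way. The following minimal Python sketch assumes the variance estimate is the running sum of d/r^2 and that the confidence limits are formed on the log scale, H x exp(+/- z x se/H); both assumptions are consistent with the first rows of the table but are inferred from the numbers, not taken from the program. The function name `nelson_aalen` is hypothetical.

```python
import math
from statistics import NormalDist


def nelson_aalen(rows, level=0.95):
    """rows: (at_risk, failures) per time; returns [(H, se, lower, upper), ...]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    h, var = 0.0, 0.0
    out = []
    for r, d in rows:
        h += d / r
        var += d / r ** 2
        se = math.sqrt(var)
        factor = math.exp(z * se / h)   # log-scale interval (assumed)
        out.append((h, se, h / factor, h * factor))
    return out


# First three rows of the listing above: Beg. Total, Fail.
na = nelson_aalen([(3343, 294), (2803, 178), (2321, 119)])
for h, se, lo, hi in na:
    print(round(h, 4), round(se, 4), round(lo, 4), round(hi, 4))
```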
. * (3) VARIOUS GRAPHS (Figures 17.3-17.6)
.
. * (3A) Figure 17.3: Overall Survival Function (page 604)
. * sts graph, gwood
. * Nicer graphs and also confidence bands are bolder and easier to read
. sts gen surv = s
. sts gen lbsurv = lb(s)
. sts gen ubsurv = ub(s)
. sort spell
. graph twoway (line ubsurv spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)) /*
> */ (line surv spell, msize(vtiny) mstyle(p1) c(J) clstyle(p1)) /*
> */ (line lbsurv spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Overall Survival Function Estimate") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ ylabel(0.00(0.25)1.00,grid)/*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Survival Estimate") /*
> */ label(3 "Lower 95% confidence band") )
. graph export km_pt1.wmf, replace
(file c:\Imbook\bwebpage\Section4\km_pt1.wmf written in Windows Metafile format)
.
. * (3B) Figure 17.4: Survival Function by Treatment (here ui) (p.605)
. * sts graph, by(ui)
. sts graph, by(ui) /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Survival Function Estimates by UI Status") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend(label(1 "No UI (UI = 0)") label(2 "Received UI (UI = 1)") )
failure _d: censor1 == 1
analysis time _t: spell
. graph export km_pt2.wmf, replace
(file c:\Imbook\bwebpage\Section4\km_pt2.wmf written in Windows Metafile format)
.
. * (3C) Figure 17.5: Overall Cumulative Hazard Function (p.606)
. * sts graph, cna
. * Nicer graphs and also confidence bands are bolder and easier to read
. sts gen cumhaz = na
. sts gen lbcumhaz = lb(na)
. sts gen ubcumhaz = ub(na)
. sort spell
. graph twoway (line ubcumhaz spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)) /*
> */ (line cumhaz spell, msize(vtiny) mstyle(p1) c(J) clstyle(p1)) /*
> */ (line lbcumhaz spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Overall Cumulative Hazard Estimate") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ ylabel(0.00(0.50)1.50,grid)/*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Cumulative Hazard Estimate") /*
> */ label(3 "Lower 95% confidence band") )
. graph export na_pt1.wmf, replace
(file c:\Imbook\bwebpage\Section4\na_pt1.wmf written in Windows Metafile format)
.
. * (3D) Figure 17.6: Cumulative Hazard Function by Treatment (here ui) (p.606)
. * sts graph, na by(ui)
. sts graph, na by(ui) /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Cumulative Hazard Estimates by UI Status") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend(label(1 "No UI (UI = 0)") label(2 "Received UI (UI = 1)") )
failure _d: censor1 == 1
analysis time _t: spell
. graph export na_pt2.wmf, replace
(file c:\Imbook\bwebpage\Section4\na_pt2.wmf written in Windows Metafile format)
.
. * (4) VARIOUS PARAMETRIC MODELS: COEFFICIENTS (Table 17.8)
.
. * streg default is to report hazard ratios rather than coefficients
. * streg with nohr option reports coefficients
.
. * Create regressors
. gen RR = reprate
. gen DR = disrate
. gen UI = ui
. gen RRUI = RR*UI
. gen DRUI = DR*UI
. gen LOGWAGE = logwage
.
. * Define $xlist = list of regressors used in subsequent regressions
. global xlist RR DR UI RRUI DRUI LOGWAGE /*
> */ tenure slack abolpos explose stateur houshead married /*
> */ female child ychild nonwhite age schlt12 schgt12 smsa bluecoll /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 midatl /*
> */ encen wncen southatl escen wscen mountain pacific
.
. * Exponential regression
. streg $xlist, nohr robust dist(exponential)
         failure _d:  censor1 == 1
   analysis time _t:  spell
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -2810.3791
Iteration 2: log pseudo-likelihood = -2701.8024
Iteration 3: log pseudo-likelihood = -2700.6911
Iteration 4: log pseudo-likelihood = -2700.6903
Iteration 5: log pseudo-likelihood = -2700.6903
Exponential regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    565.24
Log pseudo-likelihood =   -2700.6903               Prob > chi2     =    0.0000
------------------------------------------------------------------------------
| Robust
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4720235 .6005534 0.79 0.432 -.7050396 1.649087
DR | -.5756396 .7624489 -0.75 0.450 -2.070012 .9187327
UI | -1.424561 .2493917 -5.71 0.000 -1.91336 -.9357622
RRUI | .9655904 .6118408 1.58 0.115 -.2335956 2.164776
DRUI | -.1990635 1.019118 -0.20 0.845 -2.196498 1.798371
LOGWAGE | .3508005 .115598 3.03 0.002 .1242327 .5773684
tenure | -.0001462 .0064637 -0.02 0.982 -.0128147 .0125224
slack | -.2593666 .0759363 -3.42 0.001 -.4081991 -.1105342
abolpos | -.1550897 .0953306 -1.63 0.104 -.3419342 .0317549
explose | .198458 .0648354 3.06 0.002 .071383 .3255331
stateur | -.064626 .0229903 -2.81 0.005 -.1096862 -.0195659
houshead | .3812208 .0836602 4.56 0.000 .2172499 .5451918
married | .369552 .0786145 4.70 0.000 .2154705 .5236335
female | .1164067 .0852986 1.36 0.172 -.0507754 .2835888
child | -.0333008 .0794577 -0.42 0.675 -.1890352 .1224335
ychild | -.1449722 .1022781 -1.42 0.156 -.3454336 .0554892
nonwhite | -.6692066 .1188272 -5.63 0.000 -.9021037 -.4363095
age | -.0220821 .0039256 -5.63 0.000 -.0297762 -.0143879
schlt12 | -.1231414 .0966102 -1.27 0.202 -.3124939 .066211
schgt12 | .1114395 .082945 1.34 0.179 -.0511297 .2740087
smsa | .1922291 .0799904 2.40 0.016 .0354508 .3490075
bluecoll | -.2033718 .085129 -2.39 0.017 -.3702215 -.036522
mining | -.1205818 .1973575 -0.61 0.541 -.5073955 .2662319
constr | -.04475 .1081519 -0.41 0.679 -.2567237 .1672238
transp | -.1786694 .156034 -1.15 0.252 -.4844906 .1271517
trade | -.0345159 .1019152 -0.34 0.735 -.234266 .1652341
fire | .1120549 .1386716 0.81 0.419 -.1597365 .3838462
services | .1840002 .0983911 1.87 0.061 -.0088428 .3768432
pubadmin | .1090606 .2954211 0.37 0.712 -.4699541 .6880752
year85 | .2147661 .0888664 2.42 0.016 .0405911 .388941
year87 | .3541162 .0948499 3.73 0.000 .1682139 .5400186
year89 | .467082 .1104355 4.23 0.000 .2506325 .6835316
midatl | .0264112 .1465647 0.18 0.857 -.2608503 .3136727
encen | .0043916 .1502813 0.03 0.977 -.2901544 .2989375
wncen | .1724311 .1607689 1.07 0.283 -.1426703 .4875324
southatl | .2638807 .1183726 2.23 0.026 .0318747 .4958867
escen | .35414 .19317 1.83 0.067 -.0244664 .7327463
wscen | .3385896 .1433308 2.36 0.018 .0576664 .6195128
mountain | .0063693 .1538821 0.04 0.967 -.2952341 .3079727
pacific | .0770202 .2393505 0.32 0.748 -.3920982 .5461385
_cons | -4.079107 .8767097 -4.65 0.000 -5.797426 -2.360788
------------------------------------------------------------------------------
. estimates store bexponential
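
Because the regressors include the interactions RRUI = RR*UI and DRUI = DR*UI, the effect of UI on the log hazard depends on RR and DR. As an illustration only, the following minimal Python sketch evaluates that shift at the sample means of reprate and disrate reported in the summary table, using the exponential-model point estimates above. It gives no standard error, since the covariance matrix of the estimates is not shown in the log; the function name `ui_log_hazard_shift` is hypothetical.

```python
import math

# Point estimates from the exponential regression output above.
coef = {"UI": -1.424561, "RRUI": 0.9655904, "DRUI": -0.1990635}

# Sample means of reprate (RR) and disrate (DR) from the summary table.
mean_rr, mean_dr = 0.4544717, 0.1094376


def ui_log_hazard_shift(rr, dr, b=coef):
    """Change in the log hazard when UI goes from 0 to 1 at given RR and DR."""
    return b["UI"] + b["RRUI"] * rr + b["DRUI"] * dr


shift = ui_log_hazard_shift(mean_rr, mean_dr)
ratio = math.exp(shift)   # implied hazard ratio, UI = 1 versus UI = 0
print(round(shift, 4), round(ratio, 4))
```

A ratio below one corresponds to a lower hazard of re-employment in a full-time job for those who filed a UI claim, evaluated at the mean replacement and disregard rates.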
.
. * Weibull regression
. streg $xlist, nohr robust dist(weibull)
failure _d: censor1 == 1
analysis time _t: spell
Fitting constant-only model:
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -3012.3543
Iteration 2: log pseudo-likelihood = -3012.3543
Fitting full model:
Iteration 0: log pseudo-likelihood = -3012.3543
Iteration 1: log pseudo-likelihood = -2799.9064
Iteration 2: log pseudo-likelihood = -2688.7377
Iteration 3: log pseudo-likelihood = -2687.6004
Iteration 4: log pseudo-likelihood = -2687.5995
Iteration 5: log pseudo-likelihood = -2687.5995
Weibull regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    501.65
Log pseudo-likelihood =   -2687.5995               Prob > chi2     =    0.0000
------------------------------------------------------------------------------
| Robust
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4481156 .6381895 0.70 0.483 -.8027127 1.698944
DR | -.4269187 .8086983 -0.53 0.598 -2.011938 1.158101
UI | -1.496066 .2639679 -5.67 0.000 -2.013434 -.9786984
RRUI | 1.015226 .6455611 1.57 0.116 -.2500501 2.280503
DRUI | -.2988417 1.065384 -0.28 0.779 -2.386956 1.789272
LOGWAGE | .3655253 .12212 2.99 0.003 .1261745 .6048761
tenure | -.0011127 .0068716 -0.16 0.871 -.0145809 .0123554
slack | -.2652154 .0803214 -3.30 0.001 -.4226424 -.1077883
abolpos | -.1604227 .1012942 -1.58 0.113 -.3589557 .0381103
explose | .2075085 .0684715 3.03 0.002 .0733068 .3417103
stateur | -.0708745 .0242117 -2.93 0.003 -.1183286 -.0234204
houshead | .3976626 .0887192 4.48 0.000 .2237762 .571549
married | .3786057 .0830317 4.56 0.000 .2158665 .541345
female | .1260829 .0896987 1.41 0.160 -.0497233 .301889
child | -.0336778 .0839956 -0.40 0.688 -.1983061 .1309505
ychild | -.1613066 .108947 -1.48 0.139 -.3748389 .0522256
nonwhite | -.7025504 .12426 -5.65 0.000 -.9460956 -.4590052
age | -.0235823 .0041922 -5.63 0.000 -.0317989 -.0153658
schlt12 | -.1226759 .1022762 -1.20 0.230 -.3231335 .0777816
schgt12 | .1162848 .0880692 1.32 0.187 -.0563278 .2888973
smsa | .1999567 .0841129 2.38 0.017 .0350985 .3648149
bluecoll | -.1994925 .0899354 -2.22 0.027 -.3757626 -.0232223
mining | -.1015676 .2036644 -0.50 0.618 -.5007425 .2976073
constr | -.0253737 .1135609 -0.22 0.823 -.247949 .1972016
transp | -.1981522 .1672141 -1.19 0.236 -.5258858 .1295814
trade | -.0311361 .1079502 -0.29 0.773 -.2427146 .1804423
fire | .1262153 .1492527 0.85 0.398 -.1663145 .4187452
services | .2031673 .1038945 1.96 0.051 -.0004622 .4067968
pubadmin | .1117728 .3087374 0.36 0.717 -.4933415 .716887
year85 | .2374972 .093387 2.54 0.011 .054462 .4205325
year87 | .3787397 .1011782 3.74 0.000 .1804341 .5770454
year89 | .4920278 .1180472 4.17 0.000 .2606596 .7233959
midatl | .02465 .1542139 0.16 0.873 -.2776037 .3269036
encen | -.0014111 .1579065 -0.01 0.993 -.3109023 .30808
wncen | .1844363 .1694444 1.09 0.276 -.1476687 .5165413
southatl | .2740974 .1250481 2.19 0.028 .0290076 .5191872
escen | .367742 .2024771 1.82 0.069 -.0291058 .7645899
wscen | .3440005 .1527804 2.25 0.024 .0445563 .6434446
mountain | .0159627 .1620188 0.10 0.922 -.3015883 .3335136
pacific | .0849532 .2504077 0.34 0.734 -.4058368 .5757432
_cons | -4.357886 .9196792 -4.74 0.000 -6.160424 -2.555347
-------------+----------------------------------------------------------------
/ln_p | .1215314 .0194374 6.25 0.000 .0834348 .1596281
-------------+----------------------------------------------------------------
p | 1.129225 .0219492 1.087014 1.173075
1/p | .8855632 .0172131 .8524608 .9199511
------------------------------------------------------------------------------
. estimates store bweibull
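
The rows for p and 1/p follow from the estimate of ln_p. The following minimal Python sketch assumes delta-method standard errors (se(p) = p x se(ln p), se(1/p) = se(ln p)/p) and confidence limits obtained by transforming the limits for ln_p; these assumptions reproduce the reported figures to display precision. The variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Estimate and robust standard error of ln(p) from the Weibull output above.
ln_p, se_ln_p = 0.1215314, 0.0194374
z = NormalDist().inv_cdf(0.975)

p = math.exp(ln_p)
se_p = p * se_ln_p                      # delta method for exp(ln p)
ci_p = (math.exp(ln_p - z * se_ln_p), math.exp(ln_p + z * se_ln_p))

inv_p = 1.0 / p
se_inv_p = inv_p * se_ln_p              # delta method for exp(-ln p)
ci_inv_p = (1.0 / ci_p[1], 1.0 / ci_p[0])

print(p, se_p, ci_p)
print(inv_p, se_inv_p, ci_inv_p)
```

The interval for p lies above one, which in this parameterization corresponds to a hazard that rises with duration.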
.
. * Gompertz regression
. streg $xlist, nohr robust dist(gompertz)
failure _d: censor1 == 1
analysis time _t: spell
Fitting constant-only model:
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -3002.0916
Iteration 2: log pseudo-likelihood = -3002.026
Iteration 3: log pseudo-likelihood = -3002.026
Fitting full model:
Iteration 0: log pseudo-likelihood = -3002.026
Iteration 1: log pseudo-likelihood = -2796.0001
Iteration 2: log pseudo-likelihood = -2701.6693
Iteration 3: log pseudo-likelihood = -2700.6057
Iteration 4: log pseudo-likelihood = -2700.605
Iteration 5: log pseudo-likelihood = -2700.605
Gompertz regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    529.75
Log pseudo-likelihood =    -2700.605               Prob > chi2     =    0.0000
------------------------------------------------------------------------------
| Robust
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .472405 .6033813 0.78 0.434 -.7102005 1.655011
DR | -.5627894 .7646131 -0.74 0.462 -2.061404 .9358247
UI | -1.428355 .2508349 -5.69 0.000 -1.919982 -.9367272
RRUI | .9689413 .6144464 1.58 0.115 -.2353514 2.173234
DRUI | -.2112495 1.021112 -0.21 0.836 -2.212593 1.790094
LOGWAGE | .3524722 .1162698 3.03 0.002 .1245876 .5803567
tenure | -.0002233 .0065002 -0.03 0.973 -.0129635 .0125168
slack | -.2593933 .0762829 -3.40 0.001 -.4089051 -.1098815
abolpos | -.1552595 .0958002 -1.62 0.105 -.3430244 .0325053
explose | .1991286 .0650876 3.06 0.002 .0715592 .326698
stateur | -.065244 .0231645 -2.82 0.005 -.1106456 -.0198424
houshead | .3822818 .0841671 4.54 0.000 .2173173 .5472464
married | .3700141 .0789107 4.69 0.000 .215352 .5246762
female | .1170987 .0856236 1.37 0.171 -.0507206 .2849179
child | -.0331425 .0798246 -0.42 0.678 -.1895958 .1233108
ychild | -.1466596 .102884 -1.43 0.154 -.3483085 .0549893
nonwhite | -.6720521 .1197092 -5.61 0.000 -.9066778 -.4374264
age | -.0222175 .0039787 -5.58 0.000 -.0300157 -.0144193
schlt12 | -.1228615 .097015 -1.27 0.205 -.3130075 .0672845
schgt12 | .1121295 .0831976 1.35 0.178 -.0509348 .2751938
smsa | .1925807 .0803478 2.40 0.017 .0351019 .3500596
bluecoll | -.203405 .0854986 -2.38 0.017 -.3709791 -.0358309
mining | -.1183683 .1976441 -0.60 0.549 -.5057435 .269007
constr | -.0423947 .1082891 -0.39 0.695 -.2546375 .169848
transp | -.1799724 .1570001 -1.15 0.252 -.487687 .1277422
trade | -.0341793 .1023611 -0.33 0.738 -.2348034 .1664447
fire | .1143611 .1398161 0.82 0.413 -.1596734 .3883955
services | .1854033 .0987923 1.88 0.061 -.0082261 .3790327
pubadmin | .1089298 .2965867 0.37 0.713 -.4723694 .690229
year85 | .2172389 .0890506 2.44 0.015 .0427028 .3917749
year87 | .3564181 .095298 3.74 0.000 .1696374 .5431988
year89 | .4690752 .1114266 4.21 0.000 .250683 .6874674
midatl | .026766 .1471298 0.18 0.856 -.2616031 .3151351
encen | .0043808 .15089 0.03 0.977 -.2913581 .3001198
wncen | .1735986 .1614007 1.08 0.282 -.142741 .4899382
southatl | .2647448 .1188746 2.23 0.026 .031755 .4977347
escen | .3560917 .1938142 1.84 0.066 -.0237772 .7359606
wscen | .3393956 .1442438 2.35 0.019 .0566829 .6221082
mountain | .0076507 .1545162 0.05 0.961 -.2951954 .3104969
pacific | .0778885 .2400495 0.32 0.746 -.3925999 .5483769
_cons | -4.09733 .8802997 -4.65 0.000 -5.822686 -2.371975
-------------+----------------------------------------------------------------
gamma | .002658 .0067759 0.39 0.695 -.0106225 .0159386
------------------------------------------------------------------------------
. estimates store bgompertz
.
. * Cox regression
. stcox $xlist, nohr robust
failure _d: censor1 == 1
analysis time _t: spell
Iteration 0: log pseudo-likelihood = -7981.9304
Iteration 1: log pseudo-likelihood = -7731.2822
Iteration 2: log pseudo-likelihood = -7717.3198
Iteration 3: log pseudo-likelihood = -7717.2334
Iteration 4: log pseudo-likelihood = -7717.2334
Refining estimates:
Iteration 0: log pseudo-likelihood = -7717.2334
Cox regression -- Breslow method for ties
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    540.98
Log pseudo-likelihood =   -7717.2334               Prob > chi2     =    0.0000
------------------------------------------------------------------------------
| Robust
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .5222796 .5711698 0.91 0.361 -.5971926 1.641752
DR | -.752507 .72175 -1.04 0.297 -2.167111 .6620971
UI | -1.317719 .2372893 -5.55 0.000 -1.782798 -.8526409
RRUI | .8822462 .582115 1.52 0.130 -.2586783 2.023171
DRUI | -.0951357 .977774 -0.10 0.922 -2.011538 1.821266
LOGWAGE | .3352639 .1106483 3.03 0.002 .1183972 .5521306
tenure | .0008278 .0061286 0.14 0.893 -.0111841 .0128396
slack | -.247863 .0721173 -3.44 0.001 -.3892103 -.1065158
abolpos | -.1511638 .0905035 -1.67 0.095 -.3285475 .0262198
explose | .1865068 .0615742 3.03 0.002 .0658236 .30719
stateur | -.0590475 .022085 -2.67 0.008 -.1023334 -.0157616
houshead | .3601866 .0794827 4.53 0.000 .2044035 .5159698
married | .358819 .0746355 4.81 0.000 .2125362 .5051019
female | .1002758 .0813277 1.23 0.218 -.0591236 .2596753
child | -.0396054 .0755365 -0.52 0.600 -.1876542 .1084435
ychild | -.1276638 .0967856 -1.32 0.187 -.3173602 .0620325
nonwhite | -.6394475 .1151332 -5.55 0.000 -.8651043 -.4137906
age | -.0204623 .0037593 -5.44 0.000 -.0278305 -.0130942
schlt12 | -.1220585 .0920073 -1.33 0.185 -.3023895 .0582726
schgt12 | .1104817 .0783542 1.41 0.159 -.0430897 .2640531
smsa | .1864841 .0766075 2.43 0.015 .0363361 .3366321
bluecoll | -.2108023 .080867 -2.61 0.009 -.3692986 -.052306
mining | -.1238251 .1906352 -0.65 0.516 -.4974632 .249813
constr | -.054455 .1029488 -0.53 0.597 -.256231 .1473209
transp | -.1551657 .1466515 -1.06 0.290 -.4425973 .1322659
trade | -.0383252 .0968106 -0.40 0.692 -.2280706 .1514201
fire | .1097585 .1300779 0.84 0.399 -.1451895 .3647065
services | .1666262 .0939507 1.77 0.076 -.0175138 .3507662
pubadmin | .1022002 .2829817 0.36 0.718 -.4524336 .6568341
year85 | .204162 .084908 2.40 0.016 .0377454 .3705786
year87 | .3384229 .0899115 3.76 0.000 .1621997 .5146462
year89 | .4486559 .104937 4.28 0.000 .2429832 .6543286
midatl | .0342238 .140515 0.24 0.808 -.2411805 .3096282
encen | .0174597 .1438862 0.12 0.903 -.2645521 .2994716
wncen | .1650967 .1532559 1.08 0.281 -.1352795 .4654728
southatl | .2518023 .1127138 2.23 0.025 .0308874 .4727172
escen | .3450422 .1839818 1.88 0.061 -.0155554 .7056398
wscen | .3316752 .1359801 2.44 0.015 .0651591 .5981914
mountain | .009484 .1468626 0.06 0.949 -.2783613 .2973293
pacific | .0720292 .2263339 0.32 0.750 -.3715771 .5156355
------------------------------------------------------------------------------
. estimates store bcox
.
. * Display Results for Table 17.8 (page 607)
. estimates table bexponential bweibull bgompertz, t stats(N ll) b(%8.3f) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE _cons)
----------------------------------------------
    Variable | bexpon~l   bweibull   bgompe~z
-------------+--------------------------------
          RR |    0.472      0.448      0.472
             |     0.79       0.70       0.78
          DR |   -0.576     -0.427     -0.563
             |    -0.75      -0.53      -0.74
          UI |   -1.425     -1.496     -1.428
             |    -5.71      -5.67      -5.69
        RRUI |    0.966      1.015      0.969
             |     1.58       1.57       1.58
        DRUI |   -0.199     -0.299     -0.211
             |    -0.20      -0.28      -0.21
     LOGWAGE |    0.351      0.366      0.352
             |     3.03       2.99       3.03
       _cons |   -4.079     -4.358     -4.097
             |    -4.65      -4.74      -4.65
-------------+--------------------------------
           N | 3343.000   3343.000   3343.000
          ll | -2.7e+03   -2.7e+03   -2.7e+03
----------------------------------------------
                                  legend: b/t
. estimates table bcox, t stats(N ll) b(%8.3f) keep(RR DR UI RRUI DRUI LOGWAGE)
------------------------
    Variable |   bcox
-------------+----------
          RR |    0.522
             |     0.91
          DR |   -0.753
             |    -1.04
          UI |   -1.318
             |    -5.55
        RRUI |    0.882
             |     1.52
        DRUI |   -0.095
             |    -0.10
     LOGWAGE |    0.335
             |     3.03
-------------+----------
           N | 3343.000
          ll | -7.7e+03
------------------------
             legend: b/t
.
. * (5) VARIOUS PARAMETRIC MODELS: HAZARD RATIOS (Table 17.9, page 608)
.
. * streg default is to report hazard ratios rather than coefficients
. * streg with nohr option reports coefficients
.
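The hazard ratios that streg reports by default are the exponentiated coefficients from the corresponding nohr runs. The standard error follows from the delta method, and the confidence limits are the exponentiated coefficient limits. The Python sketch below checks this for the RR row of the exponential model, using the coefficient (.4720235) and robust standard error (.6005534) reported in the nohr output elsewhere in this log. The function name is illustrative and is not part of Stata.

```python
import math

def hazard_ratio(coef, se, z=1.959964):
    """Convert a log-relative-hazard coefficient into a hazard ratio.

    Returns (hazard ratio, delta-method std. err., lower 95% limit, upper 95% limit).
    """
    hr = math.exp(coef)
    se_hr = hr * se                      # delta method: d exp(b)/db = exp(b)
    lower = math.exp(coef - z * se)      # exponentiate the coefficient interval
    upper = math.exp(coef + z * se)
    return hr, se_hr, lower, upper

# RR row, exponential model: coefficient and robust standard error
hr, se_hr, lower, upper = hazard_ratio(0.4720235, 0.6005534)
print(round(hr, 4), round(se_hr, 4), round(lower, 4), round(upper, 4))
```

The z statistic and p-value shown next to a hazard ratio are those of the underlying coefficient, so they test whether the ratio differs from 1.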
. * Exponential regression
. streg $xlist, robust dist(exponential)
failure _d: censor1 == 1
analysis time _t: spell
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -2810.3791
Iteration 2: log pseudo-likelihood = -2701.8024
Iteration 3: log pseudo-likelihood = -2700.6911
Iteration 4: log pseudo-likelihood = -2700.6903
Iteration 5: log pseudo-likelihood = -2700.6903

Exponential regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    565.24
Log pseudo-likelihood =   -2700.6903               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | 1.603235 .9628283 0.79 0.432 .494089 5.202226
DR | .5623451 .4287594 -0.75 0.450 .1261843 2.506112
UI | .2406141 .0600072 -5.71 0.000 .1475837 .3922867
RRUI | 2.626338 1.606901 1.58 0.115 .7916819 8.712654
DRUI | .8194978 .8351649 -0.20 0.845 .1111919 6.039799
LOGWAGE | 1.420204 .1641727 3.03 0.002 1.132279 1.781344
tenure | .9998539 .0064627 -0.02 0.982 .9872671 1.012601
slack | .7715401 .0585879 -3.42 0.001 .6648465 .8953557
abolpos | .8563384 .0816353 -1.63 0.104 .7103949 1.032264
explose | 1.219521 .0790681 3.06 0.002 1.073992 1.384769
stateur | .937418 .0215515 -2.81 0.005 .8961153 .9806243
houshead | 1.464071 .1224844 4.56 0.000 1.242655 1.724939
married | 1.447086 .1137619 4.70 0.000 1.240445 1.68815
female | 1.123453 .0958289 1.36 0.172 .9504921 1.327887
child | .9672475 .0768553 -0.42 0.675 .8277574 1.130244
ychild | .8650463 .0884753 -1.42 0.156 .7079133 1.057058
nonwhite | .5121147 .0608532 -5.63 0.000 .4057153 .6464176
age | .9781599 .0038399 -5.63 0.000 .9706627 .9857151
schlt12 | .8841386 .0854168 -1.27 0.202 .7316201 1.068452
schgt12 | 1.117886 .0927231 1.34 0.179 .9501554 1.315226
smsa | 1.211948 .0969443 2.40 0.016 1.036087 1.41766
bluecoll | .8159748 .0694631 -2.39 0.017 .6905813 .9641369
mining | .8864046 .1749386 -0.61 0.541 .6020616 1.305038
constr | .9562365 .1034188 -0.41 0.679 .7735819 1.182019
transp | .8363823 .1305041 -1.15 0.252 .6160109 1.135589
trade | .966073 .0984575 -0.34 0.735 .7911514 1.179669
fire | 1.118574 .1551145 0.81 0.419 .8523684 1.46792
services | 1.202016 .1182677 1.87 0.061 .9911962 1.457676
pubadmin | 1.11523 .3294624 0.37 0.712 .625031 1.989882
year85 | 1.239572 .1101563 2.42 0.016 1.041426 1.475418
year87 | 1.424921 .1351536 3.73 0.000 1.18319 1.716039
year89 | 1.595332 .1761812 4.23 0.000 1.284838 1.980861
midatl | 1.026763 .1504872 0.18 0.857 .7703962 1.368442
encen | 1.004401 .1509427 0.03 0.977 .7481481 1.348425
wncen | 1.18819 .191024 1.07 0.283 .8670399 1.628293
southatl | 1.301973 .1541179 2.23 0.026 1.032388 1.641953
escen | 1.424955 .2752586 1.83 0.067 .9758305 2.080787
wscen | 1.402967 .2010884 2.36 0.018 1.059362 1.858023
mountain | 1.00639 .1548654 0.04 0.967 .7443573 1.360664
pacific | 1.080064 .2585138 0.32 0.748 .6756378 1.726573
------------------------------------------------------------------------------
.
. * Weibull regression
. streg $xlist, robust dist(weibull)
failure _d: censor1 == 1
analysis time _t: spell
Fitting constant-only model:
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -3012.3543
Iteration 2: log pseudo-likelihood = -3012.3543
Fitting full model:

Iteration 0: log pseudo-likelihood = -3012.3543
Iteration 1: log pseudo-likelihood = -2799.9064
Iteration 2: log pseudo-likelihood = -2688.7377
Iteration 3: log pseudo-likelihood = -2687.6004
Iteration 4: log pseudo-likelihood = -2687.5995
Iteration 5: log pseudo-likelihood = -2687.5995

Weibull regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    501.65
Log pseudo-likelihood =   -2687.5995               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | 1.56536 .998996 0.70 0.483 .4481117 5.46817
DR | .6525166 .527689 -0.53 0.598 .1337292 3.183881
UI | .2240097 .0591314 -5.67 0.000 .1335294 .3757999
RRUI | 2.759988 1.781741 1.57 0.116 .7787618 9.781599
DRUI | .7416768 .7901705 -0.28 0.779 .0919091 5.985096
LOGWAGE | 1.441271 .176008 2.99 0.003 1.13448 1.831025
tenure | .9988879 .006864 -0.16 0.871 .9855249 1.012432
slack | .7670407 .0616098 -3.30 0.001 .6553129 .8978176
abolpos | .8517837 .0862808 -1.58 0.113 .6984053 1.038846
explose | 1.230608 .0842616 3.03 0.002 1.076061 1.407352
stateur | .9315788 .0225551 -2.93 0.003 .8884041 .9768517
houshead | 1.488342 .1320445 4.48 0.000 1.250791 1.771008
married | 1.460247 .1212469 4.56 0.000 1.240937 1.718316
female | 1.134376 .101752 1.41 0.160 .9514927 1.352411
child | .966883 .0812139 -0.40 0.688 .8201188 1.139911
ychild | .8510311 .0927173 -1.48 0.139 .6874 1.053613
nonwhite | .4953204 .0615485 -5.65 0.000 .388254 .6319119
age | .9766936 .0040945 -5.63 0.000 .9687014 .9847517
schlt12 | .8845503 .0904684 -1.20 0.230 .7238772 1.080887
schgt12 | 1.123316 .0989295 1.32 0.187 .9452293 1.334955
smsa | 1.22135 .1027313 2.38 0.017 1.035722 1.440247
bluecoll | .8191464 .0736702 -2.22 0.027 .6867654 .9770452
mining | .9034201 .1839945 -0.50 0.618 .6060805 1.346633
constr | .9749455 .1107157 -0.22 0.823 .7803997 1.21799
transp | .820245 .1371565 -1.19 0.236 .5910316 1.138352
trade | .9693436 .1046408 -0.29 0.773 .7844954 1.197747
fire | 1.134526 .1693311 0.85 0.398 .8467799 1.520053
services | 1.225277 .1272996 1.96 0.051 .9995379 1.501999
pubadmin | 1.118259 .3452483 0.36 0.717 .6105827 2.048048
year85 | 1.268072 .1184214 2.54 0.011 1.055972 1.522772
year87 | 1.460443 .147765 3.74 0.000 1.197737 1.780769
year89 | 1.63563 .1930814 4.17 0.000 1.297786 2.061422
midatl | 1.024956 .1580625 0.16 0.873 .757597 1.386668
encen | .9985899 .1576839 -0.01 0.993 .7327855 1.36081
wncen | 1.20254 .2037638 1.09 0.276 .8627169 1.67622
southatl | 1.315343 .1644812 2.19 0.028 1.029432 1.680661
escen | 1.444469 .292472 1.82 0.069 .9713137 2.148113
wscen | 1.410579 .2155089 2.25 0.024 1.045564 1.903025
mountain | 1.016091 .1646258 0.10 0.922 .7396425 1.395864
pacific | 1.088666 .2726104 0.34 0.734 .6664189 1.778452
-------------+----------------------------------------------------------------
/ln_p | .1215314 .0194374 6.25 0.000 .0834348 .1596281
-------------+----------------------------------------------------------------
p | 1.129225 .0219492 1.087014 1.173075
1/p | .8855632 .0172131 .8524608 .9199511
------------------------------------------------------------------------------
.
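The ancillary parameter p governs duration dependence in the Weibull model. Assuming Stata's proportional-hazards parameterization, h(t) = p t^(p-1) exp(x'b), an estimate of p above 1 implies a hazard that rises with elapsed duration. The Python sketch below recovers p and its delta-method standard error from the /ln_p row and traces the implied hazard. The function name and the baseline value xb = 0 are illustrative assumptions.

```python
import math

LN_P = 0.1215314          # /ln_p from the Weibull output
SE_LN_P = 0.0194374       # its robust standard error

p = math.exp(LN_P)        # shape parameter
se_p = p * SE_LN_P        # delta-method standard error of p

def weibull_hazard(t, p, xb=0.0):
    """Weibull hazard p * t**(p-1) * exp(xb); xb = 0 is an arbitrary baseline."""
    return p * t ** (p - 1.0) * math.exp(xb)

# Hazard relative to its value at t = 1 for a few durations (two-week intervals)
for t in (1, 5, 10, 20):
    print(t, round(weibull_hazard(t, p) / weibull_hazard(1, p), 3))
```

Because the relative hazard t^(p-1) does not depend on x'b, the same duration profile applies to every individual under this model.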
. * Gompertz regression
. streg $xlist, robust dist(gompertz)
failure _d: censor1 == 1
analysis time _t: spell
Fitting constant-only model:
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -3002.0916
Iteration 2: log pseudo-likelihood = -3002.026
Iteration 3: log pseudo-likelihood = -3002.026
Fitting full model:
Iteration 0: log pseudo-likelihood = -3002.026
Iteration 1: log pseudo-likelihood = -2796.0001
Iteration 2: log pseudo-likelihood = -2701.6693
Iteration 3: log pseudo-likelihood = -2700.6057
Iteration 4: log pseudo-likelihood = -2700.605
Iteration 5: log pseudo-likelihood = -2700.605

Gompertz regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    529.75
Log pseudo-likelihood =    -2700.605               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | 1.603847 .9677311 0.78 0.434 .4915456 5.233135
DR | .5696179 .4355373 -0.74 0.462 .1272752 2.549315
UI | .239703 .0601259 -5.69 0.000 .1466096 .3919084
RRUI | 2.635153 1.61916 1.58 0.115 .7902931 8.786655
DRUI | .809572 .8266639 -0.21 0.836 .1094166 5.990014
LOGWAGE | 1.42258 .165403 3.03 0.002 1.132681 1.786676
tenure | .9997767 .0064987 -0.03 0.973 .9871202 1.012595
slack | .7715195 .0588538 -3.40 0.001 .6643773 .8959403
abolpos | .856193 .0820234 -1.62 0.105 .7096209 1.033039
explose | 1.220339 .079429 3.06 0.002 1.074182 1.386383
stateur | .9368388 .0217014 -2.82 0.005 .895256 .9803531
houshead | 1.465625 .1233575 4.54 0.000 1.242738 1.728487
married | 1.447755 .1142433 4.69 0.000 1.240298 1.689912
female | 1.12423 .0962607 1.37 0.171 .9505442 1.329653
child | .9674007 .0772224 -0.42 0.678 .8272934 1.131236
ychild | .8635879 .0888493 -1.43 0.154 .7058811 1.056529
nonwhite | .5106596 .0611307 -5.61 0.000 .4038637 .6456961
age | .9780275 .0038913 -5.58 0.000 .9704303 .9856841
schlt12 | .8843861 .0857988 -1.27 0.205 .7312444 1.0696
schgt12 | 1.118658 .0930697 1.35 0.178 .9503406 1.316786
smsa | 1.212374 .0974117 2.40 0.017 1.035725 1.419152
bluecoll | .8159478 .0697624 -2.38 0.017 .6900584 .9648035
mining | .8883688 .1755808 -0.60 0.549 .603057 1.308664
constr | .9584913 .1037942 -0.39 0.695 .7751974 1.185125
transp | .8352933 .1311411 -1.15 0.252 .614045 1.13626
trade | .9663982 .0989216 -0.33 0.738 .7907263 1.181098
fire | 1.121157 .1567557 0.82 0.413 .8524222 1.474613
services | 1.203704 .1189167 1.88 0.061 .9918076 1.460871
pubadmin | 1.115084 .3307191 0.37 0.713 .6235232 1.994172
year85 | 1.242641 .110658 2.44 0.015 1.043628 1.479605
year87 | 1.428205 .1361051 3.74 0.000 1.184875 1.721505
year89 | 1.598515 .1781172 4.21 0.000 1.284903 1.988673
midatl | 1.027127 .1511211 0.18 0.856 .7698165 1.370444
encen | 1.00439 .1515525 0.03 0.977 .747248 1.35002
wncen | 1.189578 .1919987 1.08 0.282 .8669786 1.632215
southatl | 1.303098 .1549053 2.23 0.026 1.032265 1.644991
escen | 1.427739 .276716 1.84 0.066 .9765033 2.087486
wscen | 1.404099 .2025325 2.35 0.019 1.05832 1.862851
mountain | 1.00768 .1557029 0.05 0.961 .7443861 1.364103
pacific | 1.081002 .2594941 0.32 0.746 .6752989 1.730442
-------------+----------------------------------------------------------------
gamma | .002658 .0067759 0.39 0.695 -.0106225 .0159386
------------------------------------------------------------------------------
.
. * Cox regression
. stcox $xlist, robust
failure _d: censor1 == 1
analysis time _t: spell
Iteration 0: log pseudo-likelihood = -7981.9304
Iteration 1: log pseudo-likelihood = -7731.2822
Iteration 2: log pseudo-likelihood = -7717.3198
Iteration 3: log pseudo-likelihood = -7717.2334
Iteration 4: log pseudo-likelihood = -7717.2334
Refining estimates:
Iteration 0: log pseudo-likelihood = -7717.2334
Cox regression -- Breslow method for ties

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    540.98
Log pseudo-likelihood =   -7717.2334               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | 1.685866 .962916 0.91 0.361 .5503545 5.164209
DR | .4711838 .3400769 -1.04 0.297 .1145079 1.938854
UI | .2677452 .0635331 -5.55 0.000 .168167 .4262877
RRUI | 2.416321 1.406577 1.52 0.130 .7720714 7.562264
DRUI | .9092495 .8890406 -0.10 0.922 .1337828 6.179678
LOGWAGE | 1.398309 .1547206 3.03 0.002 1.125691 1.73695
tenure | 1.000828 .0061337 0.14 0.893 .9888782 1.012922
slack | .7804668 .0562851 -3.44 0.001 .6775918 .8989608
abolpos | .8597068 .0778065 -1.67 0.095 .7199688 1.026567
explose | 1.205033 .0741989 3.03 0.002 1.068038 1.359599
stateur | .942662 .0208187 -2.67 0.008 .9027285 .9843619
houshead | 1.433597 .1139461 4.53 0.000 1.226793 1.675262
married | 1.431638 .106851 4.81 0.000 1.236811 1.657154
female | 1.105476 .0899059 1.23 0.218 .9425903 1.296509
child | .9611687 .0726033 -0.52 0.600 .8289013 1.114542
ychild | .8801492 .0851858 -1.32 0.187 .7280685 1.063997
nonwhite | .5275839 .0607424 -5.55 0.000 .4210076 .6611394
age | .9797456 .0036832 -5.44 0.000 .9725532 .9869912
schlt12 | .8850966 .0814354 -1.33 0.185 .7390501 1.060004
schgt12 | 1.116816 .0875072 1.41 0.159 .9578255 1.302197
smsa | 1.205005 .0923125 2.43 0.015 1.037004 1.400224
bluecoll | .8099341 .0654969 -2.61 0.009 .6912189 .9490384
mining | .8835344 .1684327 -0.65 0.516 .6080713 1.283785
constr | .9470011 .0974926 -0.53 0.597 .7739632 1.158726
transp | .8562733 .1255737 -1.06 0.290 .6423659 1.141412
trade | .9623999 .0931706 -0.40 0.692 .796068 1.163485
fire | 1.116009 .1451681 0.84 0.399 .8648584 1.440091
services | 1.181313 .1109851 1.77 0.076 .9826387 1.420155
pubadmin | 1.107605 .313432 0.36 0.718 .6360783 1.928677
year85 | 1.226497 .1041394 2.40 0.016 1.038467 1.448572
year87 | 1.402734 .1261218 3.76 0.000 1.176095 1.673046
year89 | 1.566206 .1643529 4.28 0.000 1.275047 1.92385
midatl | 1.034816 .1454072 0.24 0.808 .7856998 1.362918
encen | 1.017613 .1464205 0.12 0.903 .7675496 1.349146
wncen | 1.179507 .1807665 1.08 0.281 .8734718 1.592767
southatl | 1.286342 .1449884 2.23 0.025 1.031369 1.604348
escen | 1.41205 .2597913 1.88 0.061 .984565 2.025142
wscen | 1.3933 .1894611 2.44 0.015 1.067329 1.818826
mountain | 1.009529 .148262 0.06 0.949 .7570232 1.346259
pacific | 1.074687 .243238 0.32 0.750 .6896459 1.674702
------------------------------------------------------------------------------
.
. * Display results for Table 17.9 page 608
. * Not possible here as estimates table gives coefficients not hazard ratios
. * Instead need to use output for each model
. * Not sure why t-statistics differ somewhat from those in Table 17.9
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma17p4duration.txt
log type: text
closed on: 19 May 2005, 15:25:17

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma18p1heterogeneity.txt
log type: text
opened on: 19 May 2005, 17:58:22
.
. ********** OVERVIEW OF MMA18P1HETEROGENEITY.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 18.8 Pages 632-6
. * Unobserved Heterogeneity with Duration data Example
. * (1) Exponential with and without heterogeneity
. *     Residuals Plots: Figures 18.2 (exp.wmf) and 18.3 (exp_gamma.wmf)
. *     Tabulate Model Estimates: Table 18.1
. * (2) Weibull with and without heterogeneity: Generalized Residuals Plots
. *     Residuals Plots: Figures 18.4 (Weibul16.wmf) and 18.5 (Weibul16_IG.wmf)
. *     Tabulate Model Estimates: Table 18.2
.
. * To run this program you need data file
. * ema1996.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
. set matsize 100
.
. ********** DATA DESCRIPTION **********
.
. * The data is from
. * B.P. McCall (1996), "Unemployment Insurance Rules, Joblessness,
. *   and Part-time Work," Econometrica, 64, 647-682.
.
. * There are 3343 observations from the CPS Displaced Worker Surveys
. * of 1986, 1988, 1990 and 1992 on 33 variables including
. * spell = length of spell in number of two-week intervals
. * CENSOR1 = 1 if re-employed at full-time job
.
. * See program mma17p4duration.do for further description of the data set
.
. ********** READ DATA **********
.
. use ema1996.dta
(Sample for 1996 EMA paper: part-time= worked part-time last week)
.
. ********** CREATE ADDITIONAL VARIABLES **********
.
. gen RR = reprate
. gen DR = disrate
. gen UI = ui
. gen RRUI = RR*UI
. gen DRUI = DR*UI
. gen LOGWAGE = logwage
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       spell |      3343    6.247981    5.611271          1         28
     censor1 |      3343    .3209692    .4669188          0          1
     censor2 |      3343    .1014059    .3019106          0          1
     censor3 |      3343    .1717021    .3771777          0          1
     censor4 |      3343    .3754113    .4843014          0          1
-------------+--------------------------------------------------------
          ui |      3343    .5527969    .4972791          0          1
     reprate |      3343    .4544717    .1137918       .066      2.059
     logwage |      3343    5.692994    .5356591    2.70805   7.600402
      tenure |      3343    4.114867    5.862322          0         40
     disrate |      3343    .1094376    .0735274       .002       1.02
-------------+--------------------------------------------------------
       slack |      3343    .4884834    .4999421          0          1
     abolpos |      3343    .1456775    .3528354          0          1
     explose |      3343    .5025426    .5000683          0          1
     stateur |      3343      6.5516    1.803825        2.5         13
    houshead |      3343    .6120251    .4873617          0          1
-------------+--------------------------------------------------------
     married |      3343    .5860006    .4926221          0          1
      female |      3343    .3478911    .4763725          0          1
       child |      3343    .4501944    .4975876          0          1
      ychild |      3343    .1956327    .3967463          0          1
    nonwhite |      3343    .1390966    .3460991          0          1
-------------+--------------------------------------------------------
         age |      3343    35.44331     10.6402         20         61
     schlt12 |      3343    .2811846    .4496446          0          1
     schgt12 |      3343    .3356267    .4722797          0          1
        smsa |      3343    .7241998    .4469835          0          1
    bluecoll |      3343    .6036494     .489212          0          1
-------------+--------------------------------------------------------
      mining |      3343     .029315    .1687132          0          1
      constr |      3343    .1480706    .3552231          0          1
      transp |      3343    .0646126    .2458778          0          1
       trade |      3343    .1848639    .3882452          0          1
        fire |      3343    .0514508    .2209484          0          1
-------------+--------------------------------------------------------
    services |      3343    .1699073    .3756075          0          1
    pubadmin |      3343    .0095722     .097383          0          1
      year85 |      3343    .2677236     .442839          0          1
      year87 |      3343    .2174693    .4125862          0          1
      year89 |      3343    .1998205    .3999251          0          1
-------------+--------------------------------------------------------
      midatl |      3343    .1088842    .3115405          0          1
       encen |      3343    .1429853    .3501103          0          1
       wncen |      3343    .0643135    .2453472          0          1
    southatl |      3343    .2375112    .4256217          0          1
       escen |      3343    .0532456    .2245564          0          1
-------------+--------------------------------------------------------
       wscen |      3343    .1441819    .3513266          0          1
    mountain |      3343    .1079868    .3104102          0          1
     pacific |      3343    .0260245     .159232          0          1
          RR |      3343    .4544717    .1137918       .066      2.059
          DR |      3343    .1094376    .0735274       .002       1.02
-------------+--------------------------------------------------------
          UI |      3343    .5527969    .4972791          0          1
        RRUI |      3343    .2478687    .2380667          0      2.059
        DRUI |      3343    .0602776    .0754261          0       .824
     LOGWAGE |      3343    5.692994    .5356591    2.70805   7.600402
.
. ********* ANALYSIS: UNEMPLOYMENT DURATION **********
.
. * Stata st curves require defining the dependent variable
. * and the censoring variable if there is one
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. stdes
failure _d: censor1 == 1
analysis time _t: spell
                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects             3343
no. of records              3343           1           1          1          1

(first) entry time                         0           0          0          0
(final) exit time                   6.247981           1          5         28

subjects with gap              0
time on gap if gap             0
time at risk               20887    6.247981           1          5         28

failures                    1073    .3209692           0          0          1
------------------------------------------------------------------------------
.
. * Define $xlist = list of regressors used in subsequent regressions
. global xlist RR DR UI RRUI DRUI LOGWAGE /*
> */ tenure slack abolpos explose stateur houshead married /*
> */ female child ychild nonwhite age schlt12 schgt12 smsa bluecoll /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 midatl /*
> */ encen wncen southatl escen wscen mountain pacific
.
. * (1) EXPONENTIAL REGRESSION
.
. * Estimate exponential without heterogeneity
. streg $xlist, nolog nohr dist(exponential) robust
failure _d: censor1 == 1
analysis time _t: spell
Exponential regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    565.24
Log pseudo-likelihood =   -2700.6903               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4720235 .6005534 0.79 0.432 -.7050396 1.649087
DR | -.5756396 .7624489 -0.75 0.450 -2.070012 .9187327
UI | -1.424561 .2493917 -5.71 0.000 -1.91336 -.9357622
RRUI | .9655904 .6118408 1.58 0.115 -.2335956 2.164776
DRUI | -.1990635 1.019118 -0.20 0.845 -2.196498 1.798371
LOGWAGE | .3508005 .115598 3.03 0.002 .1242327 .5773684
tenure | -.0001462 .0064637 -0.02 0.982 -.0128147 .0125224
slack | -.2593666 .0759363 -3.42 0.001 -.4081991 -.1105342
abolpos | -.1550897 .0953306 -1.63 0.104 -.3419342 .0317549
explose | .198458 .0648354 3.06 0.002 .071383 .3255331
stateur | -.064626 .0229903 -2.81 0.005 -.1096862 -.0195659
houshead | .3812208 .0836602 4.56 0.000 .2172499 .5451918
married | .369552 .0786145 4.70 0.000 .2154705 .5236335
female | .1164067 .0852986 1.36 0.172 -.0507754 .2835888
child | -.0333008 .0794577 -0.42 0.675 -.1890352 .1224335
ychild | -.1449722 .1022781 -1.42 0.156 -.3454336 .0554892
nonwhite | -.6692066 .1188272 -5.63 0.000 -.9021037 -.4363095
age | -.0220821 .0039256 -5.63 0.000 -.0297762 -.0143879
schlt12 | -.1231414 .0966102 -1.27 0.202 -.3124939 .066211
schgt12 | .1114395 .082945 1.34 0.179 -.0511297 .2740087
smsa | .1922291 .0799904 2.40 0.016 .0354508 .3490075
bluecoll | -.2033718 .085129 -2.39 0.017 -.3702215 -.036522
mining | -.1205818 .1973575 -0.61 0.541 -.5073955 .2662319
constr | -.04475 .1081519 -0.41 0.679 -.2567237 .1672238
transp | -.1786694 .156034 -1.15 0.252 -.4844906 .1271517
trade | -.0345159 .1019152 -0.34 0.735 -.234266 .1652341
fire | .1120549 .1386716 0.81 0.419 -.1597365 .3838462
services | .1840002 .0983911 1.87 0.061 -.0088428 .3768432
pubadmin | .1090606 .2954211 0.37 0.712 -.4699541 .6880752
year85 | .2147661 .0888664 2.42 0.016 .0405911 .388941
year87 | .3541162 .0948499 3.73 0.000 .1682139 .5400186
year89 | .467082 .1104355 4.23 0.000 .2506325 .6835316
midatl | .0264112 .1465647 0.18 0.857 -.2608503 .3136727
encen | .0043916 .1502813 0.03 0.977 -.2901544 .2989375
wncen | .1724311 .1607689 1.07 0.283 -.1426703 .4875324
southatl | .2638807 .1183726 2.23 0.026 .0318747 .4958867
escen | .35414 .19317 1.83 0.067 -.0244664 .7327463
wscen | .3385896 .1433308 2.36 0.018 .0576664 .6195128
mountain | .0063693 .1538821 0.04 0.967 -.2952341 .3079727
pacific | .0770202 .2393505 0.32 0.748 -.3920982 .5461385
_cons | -4.079107 .8767097 -4.65 0.000 -5.797426 -2.360788
------------------------------------------------------------------------------
. estimates store bexp
.
. * Figure 18.2 (p.633) - Generalized (Cox-Snell) Residuals for Exponential
. predict resid, csnell
. stset resid, fail(censor1)
failure event: censor1 != 0 & censor1 < .
obs. time interval: (0, resid]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
     1073  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  5.218098
. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Exponential Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export exp.wmf, replace
(file c:\Imbook\bwebpage\Section4\exp.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
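The residual plot rests on the fact that, if the model is correctly specified, the Cox-Snell residuals behave like a censored sample from the unit exponential distribution, so their estimated cumulative hazard should lie close to the 45 degree line. The Python sketch below illustrates this on simulated data rather than on ema1996.dta. The sample size, parameter values, and function name are assumptions, and the true parameters stand in for estimates.

```python
import math
import random

def km_cumulative_hazard(resid, event):
    """Return (residual, -ln Kaplan-Meier survivor) pairs in residual order.

    Assumes no tied residuals, which holds for continuous simulated data.
    """
    pairs = sorted(zip(resid, event))
    n = len(pairs)
    survivor = 1.0
    points = []
    for i, (r, d) in enumerate(pairs):
        at_risk = n - i                       # risk-set size at this residual
        if d:
            survivor *= 1.0 - 1.0 / at_risk   # product-limit update at a failure
        if survivor > 0.0:                    # -ln(0) is undefined at the last failure
            points.append((r, -math.log(survivor)))
    return points

random.seed(12345)
n = 20000
resid, event = [], []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    rate = math.exp(-1.0 + 0.5 * x)           # exponential hazard exp(x'b)
    duration = random.expovariate(rate)       # latent completed spell
    censor_time = random.expovariate(0.3)     # independent censoring time
    spell = min(duration, censor_time)
    resid.append(rate * spell)                # Cox-Snell residual = fitted cumulative hazard
    event.append(duration <= censor_time)

points = km_cumulative_hazard(resid, event)

# Largest vertical gap from the 45 degree line over residuals below 1
gap = max(abs(h - r) for r, h in points if r < 1.0)
print(round(gap, 3))
```

A systematic departure from the 45 degree line, rather than sampling noise of this size, is what would point to misspecification such as neglected heterogeneity.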
.
. * Estimate exponential with gamma heterogeneity
. stset spell, fail(censor1)
failure event: censor1 != 0 & censor1 < .
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr dist(exponential) frailty(gamma) robust


failure _d: censor1
analysis time _t: spell
Exponential regression -- log relative-hazard form
Gamma frailty

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    576.86
Log pseudo-likelihood =   -2695.3518               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .5005828 .6187508 0.81 0.419 -.7121465 1.713312
DR | -.8824469 .7894395 -1.12 0.264 -2.42972 .664826
UI | -1.584537 .2622252 -6.04 0.000 -2.098489 -1.070586
RRUI | 1.091168 .6327026 1.72 0.085 -.1489067 2.331242
DRUI | .0574048 1.047123 0.05 0.956 -1.994919 2.109729
LOGWAGE | .3792805 .1191278 3.18 0.001 .1457944 .6127666
tenure | .0007938 .0065903 0.12 0.904 -.012123 .0137106
slack | -.2862928 .0770348 -3.72 0.000 -.4372782 -.1353074
abolpos | -.1842749 .0977213 -1.89 0.059 -.3758051 .0072552
explose | .2151452 .0663117 3.24 0.001 .0851767 .3451137
stateur | -.0650451 .023552 -2.76 0.006 -.1112061 -.0188841
houshead | .3960399 .0847153 4.67 0.000 .2300009 .5620789
married | .3961194 .0806744 4.91 0.000 .2380005 .5542384
female | .1102564 .0869256 1.27 0.205 -.0601147 .2806275
child | -.0464355 .0815869 -0.57 0.569 -.206343 .113472
ychild | -.1213622 .103309 -1.17 0.240 -.3238441 .0811196
nonwhite | -.6909793 .1217489 -5.68 0.000 -.9296027 -.4523559
age | -.0225342 .0040184 -5.61 0.000 -.0304101 -.0146582
schlt12 | -.1513782 .0968026 -1.56 0.118 -.3411079 .0383515
schgt12 | .1011742 .0834622 1.21 0.225 -.0624088 .2647572
smsa | .212363 .081774 2.60 0.009 .052089 .372637
bluecoll | -.220439 .0862751 -2.56 0.011 -.3895351 -.0513429
mining | -.1721823 .2051663 -0.84 0.401 -.5743008 .2299362
constr | -.0897602 .11034 -0.81 0.416 -.3060225 .1265022
transp | -.1572488 .1563607 -1.01 0.315 -.4637102 .1492126
trade | -.0451107 .1034986 -0.44 0.663 -.2479642 .1577428
fire | .0881685 .1386688 0.64 0.525 -.1836175 .3599544
services | .1682835 .1005405 1.67 0.094 -.0287723 .3653393
pubadmin | .0961407 .3092103 0.31 0.756 -.5099004 .7021817
year85 | .1940199 .0906564 2.14 0.032 .0163366 .3717031
year87 | .3564373 .0959014 3.72 0.000 .1684741 .5444005
year89 | .4924007 .1101907 4.47 0.000 .2764308 .7083705
midatl | .0156736 .1488094 0.11 0.916 -.2759874 .3073347
encen | .0089345 .1538505 0.06 0.954 -.2926069 .3104759
wncen | .1742124 .1634726 1.07 0.287 -.1461881 .4946129
southatl | .2676635 .1192515 2.24 0.025 .0339348 .5013922
escen | .3741169 .199389 1.88 0.061 -.0166783 .7649121
wscen | .361461 .1423856 2.54 0.011 .0823903 .6405316
mountain | -.00019 .1557385 -0.00 0.999 -.3054318 .3050519
pacific | .0800478 .2463547 0.32 0.745 -.4027986 .5628941
_cons | -4.095067 .9086039 -4.51 0.000 -5.875898 -2.314236
-------------+----------------------------------------------------------------
/ln_the | -1.462995 .31608 -4.63 0.000 -2.0825 -.8434894
-------------+----------------------------------------------------------------
theta | .2315418 .0731857 .1246183 .4302067
------------------------------------------------------------------------------
. estimates store bexpgamma
.
. * Figure 18.3 (p.633) - Generalized (Cox-Snell) Residuals for Exponential-Gamma
. predict resid, csnell
(option unconditional assumed)
. stset resid, fail(censor1)
failure event: censor1 != 0 & censor1 < .
obs. time interval: (0, resid]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
     1073  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  3.971096
. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Exponential-Gamma Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export exp_gamma.wmf, replace
(file c:\Imbook\bwebpage\Section4\exp_gamma.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
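
The residual plot above rests on one fact: if the fitted model is correct, the generalized (Cox-Snell) residuals behave like a (possibly censored) sample from the unit exponential distribution, so the estimated cumulative hazard -ln(S) plotted against the residual should track the 45 degree line. The following is a minimal simulated sketch of that benchmark, not part of the original program. It uses artificial uncensored draws rather than ema1996.dta, and the names e, d, S and H are hypothetical.

```stata
* Simulated illustration only: does not use ema1996.dta
clear
set seed 10101
set obs 2000
* Uncensored unit-exponential draws standing in for Cox-Snell residuals
* (uniform() is the older name of runiform(); assumed available here)
generate double e = -ln(uniform())
generate byte d = 1
stset e, failure(d)
* Kaplan-Meier survivor function and implied cumulative hazard
sts generate S = s
generate H = -ln(S)
* H should lie close to e, that is, close to the 45 degree line
correlate H e
```

A fitted model whose cumulative hazard curve bends away from the 45 degree line, as in the figures produced by this program, is evidence against that specification.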
.
. /*
> * Following did not work, even with starting values provided
> * Results in book obtained on different computer with different Stata version
> * Estimate exponential with IG heterogeneity
> stset spell, fail(censor1=1)
> quietly streg $xlist, nolog nohr dist(exponential) robust
> matrix theta = 1.6
> matrix bstart = e(b),theta
> streg $xlist, nohr dist(exponential) frailty(invgauss) robust from(bstart)
> * estimates store bexpIG
> */
.
. * Table 18.1 (p.634) - Display Parameter Estimates
. * Note that exponential-IG is missing
. estimates table bexp bexpgamma, t(%9.3f) stats(N ll) b(%9.3f) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE _cons)
------------------------------------
    Variable |       bexp  bexpgamma
-------------+----------------------
          RR |      0.472      0.501
             |      0.786      0.809
          DR |     -0.576     -0.882
             |     -0.755     -1.118
          UI |     -1.425     -1.585
             |     -5.712     -6.043
        RRUI |      0.966      1.091
             |      1.578      1.725
        DRUI |     -0.199      0.057
             |     -0.195      0.055
     LOGWAGE |      0.351      0.379
             |      3.035      3.184
       _cons |     -4.079     -4.095
             |     -4.653     -4.507
-------------+----------------------
           N |   3343.000   3343.000
          ll |  -2700.690  -2695.352
------------------------------------
legend: b/t
.
. * (2) WEIBULL REGRESSION
.
. * Estimate Weibull without heterogeneity
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
3343 total obs.
0 exclusions
------------------------------------------------------------------------------
3343 obs. remaining, representing
1073 failures in single record/single failure data
20887 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 28
. streg $xlist, nolog nohr dist(weibull) robust
failure _d: censor1 == 1
analysis time _t: spell
Weibull regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    501.65
Log pseudo-likelihood =   -2687.5995               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4481156 .6381895 0.70 0.483 -.8027127 1.698944
DR | -.4269187 .8086983 -0.53 0.598 -2.011938 1.158101
UI | -1.496066 .2639679 -5.67 0.000 -2.013434 -.9786984
RRUI | 1.015226 .6455611 1.57 0.116 -.2500501 2.280503
DRUI | -.2988417 1.065384 -0.28 0.779 -2.386956 1.789272
LOGWAGE | .3655253 .12212 2.99 0.003 .1261745 .6048761
tenure | -.0011127 .0068716 -0.16 0.871 -.0145809 .0123554
slack | -.2652154 .0803214 -3.30 0.001 -.4226424 -.1077883
abolpos | -.1604227 .1012942 -1.58 0.113 -.3589557 .0381103
explose | .2075085 .0684715 3.03 0.002 .0733068 .3417103
stateur | -.0708745 .0242117 -2.93 0.003 -.1183286 -.0234204
houshead | .3976626 .0887192 4.48 0.000 .2237762 .571549
married | .3786057 .0830317 4.56 0.000 .2158665 .541345
female | .1260829 .0896987 1.41 0.160 -.0497233 .301889
child | -.0336778 .0839956 -0.40 0.688 -.1983061 .1309505
ychild | -.1613066 .108947 -1.48 0.139 -.3748389 .0522256
nonwhite | -.7025504 .12426 -5.65 0.000 -.9460956 -.4590052
age | -.0235823 .0041922 -5.63 0.000 -.0317989 -.0153658
schlt12 | -.1226759 .1022762 -1.20 0.230 -.3231335 .0777816
schgt12 | .1162848 .0880692 1.32 0.187 -.0563278 .2888973
smsa | .1999567 .0841129 2.38 0.017 .0350985 .3648149
bluecoll | -.1994925 .0899354 -2.22 0.027 -.3757626 -.0232223
mining | -.1015676 .2036644 -0.50 0.618 -.5007425 .2976073
constr | -.0253737 .1135609 -0.22 0.823 -.247949 .1972016
transp | -.1981522 .1672141 -1.19 0.236 -.5258858 .1295814
trade | -.0311361 .1079502 -0.29 0.773 -.2427146 .1804423
fire | .1262153 .1492527 0.85 0.398 -.1663145 .4187452
services | .2031673 .1038945 1.96 0.051 -.0004622 .4067968
pubadmin | .1117728 .3087374 0.36 0.717 -.4933415 .716887
year85 | .2374972 .093387 2.54 0.011 .054462 .4205325
year87 | .3787397 .1011782 3.74 0.000 .1804341 .5770454
year89 | .4920278 .1180472 4.17 0.000 .2606596 .7233959
midatl | .02465 .1542139 0.16 0.873 -.2776037 .3269036
encen | -.0014111 .1579065 -0.01 0.993 -.3109023 .30808
wncen | .1844363 .1694444 1.09 0.276 -.1476687 .5165413
southatl | .2740974 .1250481 2.19 0.028 .0290076 .5191872
escen | .367742 .2024771 1.82 0.069 -.0291058 .7645899
wscen | .3440005 .1527804 2.25 0.024 .0445563 .6434446
mountain | .0159627 .1620188 0.10 0.922 -.3015883 .3335136
pacific | .0849532 .2504077 0.34 0.734 -.4058368 .5757432
_cons | -4.357886 .9196792 -4.74 0.000 -6.160424 -2.555347
-------------+----------------------------------------------------------------
/ln_p | .1215314 .0194374 6.25 0.000 .0834348 .1596281
-------------+----------------------------------------------------------------
p | 1.129225 .0219492 1.087014 1.173075
1/p | .8855632 .0172131 .8524608 .9199511
------------------------------------------------------------------------------
. estimates store bweib
.
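
The rows p and 1/p in the Weibull output are transformations of the estimated /ln_p. As a rough hand check (a sketch, not part of the original program), the lines below recompute them, using the delta method for the standard errors and exponentiating the endpoints of the /ln_p interval. The scalar names are hypothetical and the numbers are copied from the output above.

```stata
* Hand check of the Weibull shape parameter, values copied from the output above
scalar lnp_hat = .1215314
scalar lnp_se  = .0194374
scalar p_hat   = exp(lnp_hat)
* Delta method: se(p) = p*se(ln p) and se(1/p) = (1/p)*se(ln p)
display "p   = " %9.6f p_hat   "  se = " %9.6f p_hat*lnp_se
display "1/p = " %9.6f 1/p_hat "  se = " %9.6f (1/p_hat)*lnp_se
* Interval for p obtained by exponentiating the interval for ln p
display "95% CI for p: " %9.6f exp(.0834348) " to " %9.6f exp(.1596281)
```

The displayed values should agree with the p and 1/p rows up to rounding. The interval for p lies above 1, in line with the z statistic of 6.25 on /ln_p, so the exponential special case (p = 1) is rejected in favour of an increasing hazard.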
. * Figure 18.4 (p.635) - Generalized (Cox-Snell) Residuals for Weibull
. predict resid, csnell
. stset resid, fail(censor1)
failure event: censor1 != 0 & censor1 < .
obs. time interval: (0, resid]
exit on or before: failure
------------------------------------------------------------------------------
3343 total obs.
0 exclusions
------------------------------------------------------------------------------
3343 obs. remaining, representing
1073 failures in single record/single failure data
1073 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 6.283261
. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Weibull Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export Weibul16.wmf, replace
(file c:\Imbook\bwebpage\Section4\Weibul16.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
.
. * Estimate Weibull with inverse-Gaussian (IG) heterogeneity
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
3343 total obs.
0 exclusions
------------------------------------------------------------------------------
3343 obs. remaining, representing
1073 failures in single record/single failure data
20887 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 28
. streg $xlist, nolog nohr dist(weibull) frailty(invgauss) robust
failure _d: censor1 == 1
analysis time _t: spell
Weibull regression -- log relative-hazard form
Inverse-Gaussian frailty
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    643.00
Log pseudo-likelihood =   -2616.3216               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .7356277 .9058181 0.81 0.417 -1.039743 2.510998
DR | -1.072566 1.149098 -0.93 0.351 -3.324758 1.179625
UI | -2.574752 .3843798 -6.70 0.000 -3.328123 -1.821381
RRUI | 1.733571 .9333928 1.86 0.063 -.0958458 3.562987
DRUI | -.060621 1.537813 -0.04 0.969 -3.07468 2.953438
LOGWAGE | .575656 .1766599 3.26 0.001 .2294089 .9219031
tenure | -.0009848 .0097472 -0.10 0.920 -.0200889 .0181194
slack | -.4416007 .1142976 -3.86 0.000 -.6656199 -.2175814
abolpos | -.2873066 .1465357 -1.96 0.050 -.5745113 -.0001019
explose | .3641943 .0976897 3.73 0.000 .1727259 .5556627
stateur | -.0981133 .0346763 -2.83 0.005 -.1660775 -.030149
houshead | .5924383 .1256739 4.71 0.000 .3461219 .8387546
married | .6083214 .1183487 5.14 0.000 .3763624 .8402805
female | .1788439 .1285074 1.39 0.164 -.0730259 .4307137
child | -.0914227 .121778 -0.75 0.453 -.3301031 .1472578
ychild | -.1805373 .1527477 -1.18 0.237 -.4799173 .1188426
nonwhite | -1.008517 .1725174 -5.85 0.000 -1.346645 -.6703894
age | -.0333776 .0059183 -5.64 0.000 -.0449772 -.0217779
schlt12 | -.2258621 .1439543 -1.57 0.117 -.5080075 .0562832
schgt12 | .1505129 .124469 1.21 0.227 -.0934418 .3944677
smsa | .3009952 .119907 2.51 0.012 .0659819 .5360086
bluecoll | -.3211857 .1253163 -2.56 0.010 -.5668012 -.0755702
mining | -.2319827 .3008491 -0.77 0.441 -.8216361 .3576708
constr | -.1260324 .1633669 -0.77 0.440 -.4462257 .1941609
transp | -.2763858 .225893 -1.22 0.221 -.7191279 .1663562
trade | -.0687616 .1518284 -0.45 0.651 -.3663399 .2288166
fire | .0668973 .2131814 0.31 0.754 -.3509306 .4847252
services | .231914 .1494712 1.55 0.121 -.0610441 .5248721
pubadmin | .0901949 .4579252 0.20 0.844 -.807322 .9877117
year85 | .2780139 .1339053 2.08 0.038 .0155644 .5404634
year87 | .5208783 .1415375 3.68 0.000 .2434699 .7982867
year89 | .7209598 .1655487 4.35 0.000 .3964903 1.045429
midatl | -.0192077 .2222646 -0.09 0.931 -.4548382 .4164228
encen | -.0297055 .2284931 -0.13 0.897 -.4775438 .4181328
wncen | .2460338 .24216 1.02 0.310 -.2285911 .7206586
southatl | .3563643 .1793284 1.99 0.047 .0048872 .7078415
escen | .5461543 .2910193 1.88 0.061 -.024233 1.116542
wscen | .4606814 .2140966 2.15 0.031 .0410598 .880303
mountain | .017581 .2293804 0.08 0.939 -.4319963 .4671584
pacific | .1379886 .3636985 0.38 0.704 -.5748475 .8508247
_cons | -5.303059 1.34133 -3.95 0.000 -7.932017 -2.6741
-------------+----------------------------------------------------------------
/ln_p | .5611667 .0225898 24.84 0.000 .5168915 .6054418
/ln_the | 1.852696 .0896755 20.66 0.000 1.676935 2.028457
-------------+----------------------------------------------------------------
p | 1.752716 .0395935 1.676807 1.832062
1/p | .570543 .0128884 .5458332 .5963715
theta | 6.376987 .5718595 5.349136 7.602343
------------------------------------------------------------------------------
. estimates store bweibIG
.
. * Figure 18.5 (p.636) - Generalized (Cox-Snell) Residuals for Weibull-IG
. predict resid, csnell
(option unconditional assumed)
. stset resid, fail(censor1)
failure event: censor1 != 0 & censor1 < .
obs. time interval: (0, resid]
exit on or before: failure
------------------------------------------------------------------------------
3343 total obs.
0 exclusions
------------------------------------------------------------------------------
3343 obs. remaining, representing
1073 failures in single record/single failure data
1073 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 5.044588
. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Weibull-IG Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export Weibul16_IG.wmf, replace
(file c:\Imbook\bwebpage\Section4\Weibul16_IG.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
.
. * Table 18.2 (p.635) - Display Parameter Estimates
. estimates table bweibIG bweib, t(%9.3f) stats(N ll) b(%9.3f) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE _cons)
------------------------------------
    Variable |    bweibIG      bweib
-------------+----------------------
          RR |      0.736      0.448
             |      0.812      0.702
          DR |     -1.073     -0.427
             |     -0.933     -0.528
          UI |     -2.575     -1.496
             |     -6.698     -5.668
        RRUI |      1.734      1.015
             |      1.857      1.573
        DRUI |     -0.061     -0.299
             |     -0.039     -0.281
     LOGWAGE |      0.576      0.366
             |      3.259      2.993
       _cons |     -5.303     -4.358
             |     -3.954     -4.738
-------------+----------------------
           N |   3343.000   3343.000
          ll |  -2616.322  -2687.600
------------------------------------
legend: b/t
.
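
The legend b/t means that each coefficient is followed by its t statistic, the ratio of the estimate to its robust standard error. As a hedged illustration, not part of the original program, the lines below recompute the UI entries from the full streg output and add a two-sided normal p-value. The scalar names are hypothetical, and normal() is assumed to be available (older releases used the name norm()).

```stata
* t statistic and two-sided p-value for UI, values copied from the streg output
scalar z_weib   = -1.496066/.2639679
scalar z_weibIG = -2.574752/.3843798
* normal() is the standard normal cdf (called norm() in older Stata releases)
display "bweib   UI: t = " %6.3f z_weib   "  p = " %6.4f 2*normal(-abs(z_weib))
display "bweibIG UI: t = " %6.3f z_weibIG "  p = " %6.4f 2*normal(-abs(z_weibIG))
```

The t values should match the second line of the UI row in the table, up to rounding.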
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma18p1heterogeneity.txt
log type: text
closed on: 19 May 2005, 17:58:38

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma19p1comprisks.txt
log type: text
opened on: 19 May 2005, 17:52:44
.
. ********** OVERVIEW OF MMA19P1COMPRISKS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 19.5 pages 658-62
. * Competing Risks Example with a censoring mechanism for each of the three risks
. * (1A) Table 19.2 p.659 Exponential
. * (1B) Table 19.2 p.659 Exponential with IG frailty
. * (2A) Table 19.3 p.659 Weibull
. * (2B) Table 19.3 p.659 Weibull with IG frailty
. * (2C) Table 19.3 p.660 Cox model
. * (2D) Graph the resulting Cox baseline survival and cumulative hazards
. *      Figure 19.1: (combined_bsf.wmf) baseline survival functions
. *      Figure 19.2: (combined_cbh.wmf) baseline cumulative hazards
.
. * To run this program you need data file
. * ema1996.dta
.
. * NOTE: The IG Heterogeneity estimation was unsuccessful for exponential
. *       but successful for Weibull
.
. ********** SETUP **********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
. set matsize 80 /* Needed for this program */

.
. ********** DATA DESCRIPTION **********
.
. * The data is from
. * B.P. McCall (1996), "Unemployment Insurance Rules, Joblessness,
. *   and Part-time Work," Econometrica, 64, 647-682.
.
. * There are 3343 observations from the CPS Displaced Worker Surveys
. * of 1986, 1988, 1990 and 1992 on 33 variables including
. * spell = length of spell in number of two-week intervals
. * CENSOR1 = 1 if re-employed at full-time job
. * CENSOR2 = 1 if re-employed at part-time job
. * CENSOR3 = 1 if re-employed but left job: pt-ft status unknown
. * CENSOR4 = 1 if still jobless
.
. * See program mma17p4duration.do for further description of the data set
.
. ********** READ DATA and CREATE ADDITIONAL VARIABLES **********
.
. use ema1996.dta
(Sample for 1996 EMA paper: part-time= worked part-time last week)
.
. gen RR = reprate
. gen DR = disrate
. gen UI = ui
. gen RRUI = RR*UI
. gen DRUI = DR*UI
. gen LOGWAGE = logwage
. sum
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
spell | 3343 6.247981 5.611271 1 28
censor1 | 3343 .3209692 .4669188 0 1
censor2 | 3343 .1014059 .3019106 0 1
censor3 | 3343 .1717021 .3771777 0 1
censor4 | 3343 .3754113 .4843014 0 1
-------------+--------------------------------------------------------
ui | 3343 .5527969 .4972791 0 1
reprate | 3343 .4544717 .1137918 .066 2.059
logwage | 3343 5.692994 .5356591 2.70805 7.600402
tenure | 3343 4.114867 5.862322 0 40
disrate | 3343 .1094376 .0735274 .002 1.02
-------------+--------------------------------------------------------
slack | 3343 .4884834 .4999421 0 1
abolpos | 3343 .1456775 .3528354 0 1
explose | 3343 .5025426 .5000683 0 1
stateur | 3343 6.5516 1.803825 2.5 13
houshead | 3343 .6120251 .4873617 0 1
-------------+--------------------------------------------------------
married | 3343 .5860006 .4926221 0 1
female | 3343 .3478911 .4763725 0 1
child | 3343 .4501944 .4975876 0 1
ychild | 3343 .1956327 .3967463 0 1
nonwhite | 3343 .1390966 .3460991 0 1
-------------+--------------------------------------------------------
age | 3343 35.44331 10.6402 20 61
schlt12 | 3343 .2811846 .4496446 0 1
schgt12 | 3343 .3356267 .4722797 0 1
smsa | 3343 .7241998 .4469835 0 1
bluecoll | 3343 .6036494 .489212 0 1
-------------+--------------------------------------------------------
mining | 3343 .029315 .1687132 0 1
constr | 3343 .1480706 .3552231 0 1
transp | 3343 .0646126 .2458778 0 1
trade | 3343 .1848639 .3882452 0 1
fire | 3343 .0514508 .2209484 0 1
-------------+--------------------------------------------------------
services | 3343 .1699073 .3756075 0 1
pubadmin | 3343 .0095722 .097383 0 1
year85 | 3343 .2677236 .442839 0 1
year87 | 3343 .2174693 .4125862 0 1
year89 | 3343 .1998205 .3999251 0 1
-------------+--------------------------------------------------------
midatl | 3343 .1088842 .3115405 0 1
encen | 3343 .1429853 .3501103 0 1
wncen | 3343 .0643135 .2453472 0 1
southatl | 3343 .2375112 .4256217 0 1
escen | 3343 .0532456 .2245564 0 1
-------------+--------------------------------------------------------
wscen | 3343 .1441819 .3513266 0 1
mountain | 3343 .1079868 .3104102 0 1
pacific | 3343 .0260245 .159232 0 1
RR | 3343 .4544717 .1137918 .066 2.059
DR | 3343 .1094376 .0735274 .002 1.02
-------------+--------------------------------------------------------
UI | 3343 .5527969 .4972791 0 1
RRUI | 3343 .2478687 .2380667 0 2.059
DRUI | 3343 .0602776 .0754261 0 .824
LOGWAGE | 3343 5.692994 .5356591 2.70805 7.600402
.
. ********* COMPETING RISKS FOR UNEMPLOYMENT DURATION **********
.
. * Stata analysis requires using stset to define the dependent variable
. * and the censoring variable if there is one
.
. * For the competing risks model there are three censoring variables
. * CENSOR1 = 1 if re-employed at full-time job
. * CENSOR2 = 1 if re-employed at part-time job
. * CENSOR3 = 1 if re-employed but left job: pt-ft status unknown
.
. * Define $xlist = list of regressors used in subsequent regressions
. global xlist RR DR UI RRUI DRUI LOGWAGE /*
> */ tenure slack abolpos explose stateur houshead married /*
> */ female child ychild nonwhite age schlt12 schgt12 smsa bluecoll /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 midatl /*
> */ encen wncen southatl escen wscen mountain pacific

.
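
Each competing risk is handled by declaring exits to that risk as failures and treating exits to the other risks as censored. The runs that follow in this log repeat the same stset / streg / estimates store sequence three times; a more compact equivalent, offered only as a sketch and not as the authors' code, is a loop over the risk index. It assumes ema1996.dta is in memory and $xlist has been defined as above; the loop index k is hypothetical.

```stata
* Assumes ema1996.dta is loaded and the global macro xlist is defined
forvalues k = 1/3 {
    * Exits of type k are failures; all other exits are treated as censored
    stset spell, failure(censor`k'==1)
    streg $xlist, nolog nohr robust dist(exponential)
    estimates store bexpr`k'
}
estimates table bexpr1 bexpr2 bexpr3, b(%10.3f) se(%10.3f) stats(N ll) /*
    */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
```

Replacing dist(exponential) with dist(weibull) and changing the stored names would give the corresponding Weibull fits.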
. *** (1A) EXPONENTIAL WITH NO HETEROGENEITY Table 19.2
.
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
3343 total obs.
0 exclusions
------------------------------------------------------------------------------
3343 obs. remaining, representing
1073 failures in single record/single failure data
20887 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 28
. streg $xlist, nolog nohr robust dist(exponential)
failure _d: censor1 == 1
analysis time _t: spell
Exponential regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    565.24
Log pseudo-likelihood =   -2700.6903               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4720235 .6005534 0.79 0.432 -.7050396 1.649087
DR | -.5756396 .7624489 -0.75 0.450 -2.070012 .9187327
UI | -1.424561 .2493917 -5.71 0.000 -1.91336 -.9357622
RRUI | .9655904 .6118408 1.58 0.115 -.2335956 2.164776
DRUI | -.1990635 1.019118 -0.20 0.845 -2.196498 1.798371
LOGWAGE | .3508005 .115598 3.03 0.002 .1242327 .5773684
tenure | -.0001462 .0064637 -0.02 0.982 -.0128147 .0125224
slack | -.2593666 .0759363 -3.42 0.001 -.4081991 -.1105342
abolpos | -.1550897 .0953306 -1.63 0.104 -.3419342 .0317549
explose | .198458 .0648354 3.06 0.002 .071383 .3255331
stateur | -.064626 .0229903 -2.81 0.005 -.1096862 -.0195659
houshead | .3812208 .0836602 4.56 0.000 .2172499 .5451918
married | .369552 .0786145 4.70 0.000 .2154705 .5236335
female | .1164067 .0852986 1.36 0.172 -.0507754 .2835888
child | -.0333008 .0794577 -0.42 0.675 -.1890352 .1224335
ychild | -.1449722 .1022781 -1.42 0.156 -.3454336 .0554892
nonwhite | -.6692066 .1188272 -5.63 0.000 -.9021037 -.4363095
age | -.0220821 .0039256 -5.63 0.000 -.0297762 -.0143879
schlt12 | -.1231414 .0966102 -1.27 0.202 -.3124939 .066211
schgt12 | .1114395 .082945 1.34 0.179 -.0511297 .2740087
smsa | .1922291 .0799904 2.40 0.016 .0354508 .3490075
bluecoll | -.2033718 .085129 -2.39 0.017 -.3702215 -.036522
mining | -.1205818 .1973575 -0.61 0.541 -.5073955 .2662319
constr | -.04475 .1081519 -0.41 0.679 -.2567237 .1672238
transp | -.1786694 .156034 -1.15 0.252 -.4844906 .1271517
trade | -.0345159 .1019152 -0.34 0.735 -.234266 .1652341
fire | .1120549 .1386716 0.81 0.419 -.1597365 .3838462
services | .1840002 .0983911 1.87 0.061 -.0088428 .3768432
pubadmin | .1090606 .2954211 0.37 0.712 -.4699541 .6880752
year85 | .2147661 .0888664 2.42 0.016 .0405911 .388941
year87 | .3541162 .0948499 3.73 0.000 .1682139 .5400186
year89 | .467082 .1104355 4.23 0.000 .2506325 .6835316
midatl | .0264112 .1465647 0.18 0.857 -.2608503 .3136727
encen | .0043916 .1502813 0.03 0.977 -.2901544 .2989375
wncen | .1724311 .1607689 1.07 0.283 -.1426703 .4875324
southatl | .2638807 .1183726 2.23 0.026 .0318747 .4958867
escen | .35414 .19317 1.83 0.067 -.0244664 .7327463
wscen | .3385896 .1433308 2.36 0.018 .0576664 .6195128
mountain | .0063693 .1538821 0.04 0.967 -.2952341 .3079727
pacific | .0770202 .2393505 0.32 0.748 -.3920982 .5461385
_cons | -4.079107 .8767097 -4.65 0.000 -5.797426 -2.360788
------------------------------------------------------------------------------
. estimates store bexpr1
.
. stset spell, fail(censor2=1)
failure event: censor2 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
3343 total obs.
0 exclusions
------------------------------------------------------------------------------
3343 obs. remaining, representing
339 failures in single record/single failure data
20887 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 28
. streg $xlist, nolog nohr robust dist(exponential)
failure _d: censor2 == 1
analysis time _t: spell
Exponential regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          339
Time at risk    =        20887
                                                   Wald chi2(40)   =    227.08
Log pseudo-likelihood =   -1250.5446               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.0928628 .9761428 -0.10 0.924 -2.006068 1.820342
DR | -.9600127 1.246692 -0.77 0.441 -3.403483 1.483458
UI | -1.047747 .5236826 -2.00 0.045 -2.074146 -.021348
RRUI | -.6698307 1.191869 -0.56 0.574 -3.005851 1.666189
DRUI | 1.987208 1.726509 1.15 0.250 -1.396688 5.371105
LOGWAGE | -.2577715 .1793075 -1.44 0.151 -.6092077 .0936646
tenure | .0053684 .0125538 0.43 0.669 -.0192366 .0299734
slack | -.2636908 .1311029 -2.01 0.044 -.5206477 -.0067339
abolpos | -.5626836 .202701 -2.78 0.006 -.9599703 -.1653969
explose | .0490271 .1130116 0.43 0.664 -.1724715 .2705258
stateur | -.1032439 .0406788 -2.54 0.011 -.182973 -.0235148
houshead | -.073544 .1343412 -0.55 0.584 -.3368479 .18976
married | -.0618813 .1339552 -0.46 0.644 -.3244287 .2006661
female | .4531912 .1384047 3.27 0.001 .181923 .7244594
child | -.2164986 .1452571 -1.49 0.136 -.5011973 .0682002
ychild | .149031 .1815684 0.82 0.412 -.2068365 .5048986
nonwhite | -.4563527 .1820135 -2.51 0.012 -.8130927 -.0996127
age | -.001781 .0064207 -0.28 0.781 -.0143653 .0108033
schlt12 | -.1803101 .1661528 -1.09 0.278 -.5059636 .1453433
schgt12 | -.0534463 .1462829 -0.37 0.715 -.3401555 .2332629
smsa | .1295376 .1384588 0.94 0.349 -.1418367 .400912
bluecoll | .0088207 .1510547 0.06 0.953 -.2872411 .3048825
mining | -.0141252 .4078632 -0.03 0.972 -.8135225 .785272
constr | .1867498 .1896106 0.98 0.325 -.1848802 .5583799
transp | -.402533 .2898061 -1.39 0.165 -.9705426 .1654766
trade | .1106678 .1735195 0.64 0.524 -.2294241 .4507598
fire | -.3396026 .3006096 -1.13 0.259 -.9287865 .2495813
services | .1619867 .1705571 0.95 0.342 -.172299 .4962724
pubadmin | .7445446 .5413463 1.38 0.169 -.3164746 1.805564
year85 | -.0548375 .149323 -0.37 0.713 -.3475052 .2378301
year87 | -.12113 .1616797 -0.75 0.454 -.4380164 .1957563
year89 | .1244437 .1950397 0.64 0.523 -.257827 .5067144
midatl | -.3969537 .2577568 -1.54 0.124 -.9021477 .1082403
encen | -.5115788 .2576815 -1.99 0.047 -1.016625 -.0065323
wncen | -.0674875 .257402 -0.26 0.793 -.5719862 .4370113
southatl | -.2719375 .1944647 -1.40 0.162 -.6530813 .1092062
escen | .065407 .3099463 0.21 0.833 -.5420766 .6728905
wscen | -.0941963 .2338712 -0.40 0.687 -.5525754 .3641827
mountain | .2287682 .2264905 1.01 0.312 -.215145 .6726814
pacific | -.2060074 .3970221 -0.52 0.604 -.9841563 .5721415
_cons | -.8636363 1.325425 -0.65 0.515 -3.461421 1.734148
------------------------------------------------------------------------------
. estimates store bexpr2
.
. stset spell, fail(censor3=1)
failure event: censor3 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
3343 total obs.
0 exclusions
------------------------------------------------------------------------------
3343 obs. remaining, representing
574 failures in single record/single failure data
20887 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 28
. streg $xlist, nolog nohr robust dist(exponential)
failure _d: censor3 == 1
analysis time _t: spell
Exponential regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          574
Time at risk    =        20887
                                                   Wald chi2(40)   =    372.34
Log pseudo-likelihood =   -1742.3964               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.6011551 .724665 -0.83 0.407 -2.021472 .8191621
DR | 1.121525 .9012528 1.24 0.213 -.6448975 2.887948
UI | -.9672682 .4486302 -2.16 0.031 -1.846567 -.0879691
RRUI | -.4326869 1.014413 -0.43 0.670 -2.4209 1.555526
DRUI | 2.102012 1.302564 1.61 0.107 -.450967 4.654991
LOGWAGE | .0029166 .1448149 0.02 0.984 -.2809153 .2867485
tenure | -.0479889 .0121403 -3.95 0.000 -.0717835 -.0241942
slack | -.4583215 .097709 -4.69 0.000 -.6498277 -.2668154
abolpos | -.2736409 .1396283 -1.96 0.050 -.5473073 .0000255
explose | .0246749 .0862551 0.29 0.775 -.144382 .1937319
stateur | -.1086692 .0319298 -3.40 0.001 -.1712504 -.046088
houshead | .5298135 .1054798 5.02 0.000 .3230769 .7365501
married | .0268657 .1062998 0.25 0.800 -.1814781 .2352095
female | .2590041 .109547 2.36 0.018 .0442959 .4737122
child | -.141802 .1114763 -1.27 0.203 -.3602915 .0766876
ychild | -.0885931 .136915 -0.65 0.518 -.3569416 .1797553
nonwhite | -.4668153 .143211 -3.26 0.001 -.7475036 -.186127
age | -.0247346 .0054431 -4.54 0.000 -.0354029 -.0140662
schlt12 | -.1034495 .1224893 -0.84 0.398 -.3435241 .1366251
schgt12 | .0952043 .1081669 0.88 0.379 -.1167988 .3072075
smsa | .0128711 .1021476 0.13 0.900 -.1873344 .2130767
bluecoll | .3098248 .1110841 2.79 0.005 .0921038 .5275457
mining | .2388579 .2604652 0.92 0.359 -.2716445 .7493603
constr | .0983356 .1419787 0.69 0.489 -.1799376 .3766088
transp | -.0783446 .1897853 -0.41 0.680 -.4503169 .2936278
trade | .1033278 .1292151 0.80 0.424 -.1499291 .3565847
fire | -.3607287 .2689374 -1.34 0.180 -.8878363 .166379
services | .0248212 .1323061 0.19 0.851 -.234494 .2841363
pubadmin | -1.770536 1.040329 -1.70 0.089 -3.809544 .2684714
year85 | .295673 .1143137 2.59 0.010 .0716222 .5197237
year87 | .4303606 .1198341 3.59 0.000 .1954901 .6652311
year89 | -.1373874 .1627204 -0.84 0.398 -.4563135 .1815386
midatl | -.5339921 .2188609 -2.44 0.015 -.9629516 -.1050326
encen | -.075022 .1998626 -0.38 0.707 -.4667454 .3167014
wncen | .1239805 .2095321 0.59 0.554 -.2866948 .5346559
southatl | .1522514 .1635982 0.93 0.352 -.1683951 .472898
escen | -.5123015 .3170723 -1.62 0.106 -1.133752 .1091488
wscen | .0198459 .1898764 0.10 0.917 -.3523051 .3919968
mountain | .1999108 .1869463 1.07 0.285 -.1664972 .5663188
pacific | .4481059 .2705097 1.66 0.098 -.0820833 .9782951
_cons | -1.620926 1.072666 -1.51 0.131 -3.723312 .4814595
------------------------------------------------------------------------------
. estimates store bexpr3
.
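
A quick plausibility check on the three stset summaries above (a sketch, not part of the original program): with no regressors, the ML estimate of a constant exponential hazard is the number of failures divided by total time at risk. The counts below are copied from the stset output; time is measured in two-week intervals, so each figure is a hazard per two-week interval.

```stata
* Crude (no-covariate) exponential hazard for each risk: failures / time at risk
display "risk 1 (full-time job)      : " %7.5f 1073/20887
display "risk 2 (part-time job)      : " %7.5f 339/20887
display "risk 3 (pt-ft status unknown): " %7.5f 574/20887
```

The ordering of these crude rates mirrors the failure counts: exits to full-time work are by far the most frequent of the three.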
. * Table 19.2 (page 658) first three columns
. estimates table bexpr1 bexpr2 bexpr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
-----------------------------------------------
    Variable |     bexpr1     bexpr2     bexpr3
-------------+---------------------------------
          RR |      0.472     -0.093     -0.601
             |      0.601      0.976      0.725
          DR |     -0.576     -0.960      1.122
             |      0.762      1.247      0.901
          UI |     -1.425     -1.048     -0.967
             |      0.249      0.524      0.449
        RRUI |      0.966     -0.670     -0.433
             |      0.612      1.192      1.014
        DRUI |     -0.199      1.987      2.102
             |      1.019      1.727      1.303
     LOGWAGE |      0.351     -0.258      0.003
             |      0.116      0.179      0.145
      tenure |     -0.000      0.005     -0.048
             |      0.006      0.013      0.012
-------------+---------------------------------
           N |   3343.000   3343.000   3343.000
          ll |  -2700.690  -1250.545  -1742.396
-----------------------------------------------
legend: b/se
.
. *** (1B) EXPONENTIAL WITH IG HETEROGENEITY Table 19.2
.
. /* Did not work even though Weibull with IG heterogeneity did
>
> stset spell, fail(censor1=1)
> streg $xlist, nohr robust dist(exponential) frailty(invgauss)
> estimates store bexpigr1
>
> stset spell, fail(censor2=1)
> streg $xlist, nolog nohr robust dist(exponential) frailty(invgauss)
> estimates store bexpigr2
>
> stset spell, fail(censor3=1)
> streg $xlist, nolog nohr robust dist(exponential) frailty(invgauss)
> estimates store bexpigr3
>
> * Table 19.2 (page 658) first three columns
> estimates table bexpigr1 bexpigr2 bexpigr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
>
> */
.
. *** (2A) WEIBULL WITH NO HETEROGENEITY Table 19.3
.
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
3343 total obs.
0 exclusions
------------------------------------------------------------------------------
3343 obs. remaining, representing
1073 failures in single record/single failure data
20887 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 28
. streg $xlist, nolog nohr robust dist(weibull)
failure _d: censor1 == 1
analysis time _t: spell
Weibull regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    501.65
Log pseudo-likelihood =   -2687.5995               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4481156 .6381895 0.70 0.483 -.8027127 1.698944
DR | -.4269187 .8086983 -0.53 0.598 -2.011938 1.158101
UI | -1.496066 .2639679 -5.67 0.000 -2.013434 -.9786984
RRUI | 1.015226 .6455611 1.57 0.116 -.2500501 2.280503
DRUI | -.2988417 1.065384 -0.28 0.779 -2.386956 1.789272
LOGWAGE | .3655253 .12212 2.99 0.003 .1261745 .6048761
tenure | -.0011127 .0068716 -0.16 0.871 -.0145809 .0123554
slack | -.2652154 .0803214 -3.30 0.001 -.4226424 -.1077883
abolpos | -.1604227 .1012942 -1.58 0.113 -.3589557 .0381103
explose | .2075085 .0684715 3.03 0.002 .0733068 .3417103
stateur | -.0708745 .0242117 -2.93 0.003 -.1183286 -.0234204
houshead | .3976626 .0887192 4.48 0.000 .2237762 .571549
married | .3786057 .0830317 4.56 0.000 .2158665 .541345
female | .1260829 .0896987 1.41 0.160 -.0497233 .301889
child | -.0336778 .0839956 -0.40 0.688 -.1983061 .1309505
ychild | -.1613066 .108947 -1.48 0.139 -.3748389 .0522256
nonwhite | -.7025504 .12426 -5.65 0.000 -.9460956 -.4590052
age | -.0235823 .0041922 -5.63 0.000 -.0317989 -.0153658
schlt12 | -.1226759 .1022762 -1.20 0.230 -.3231335 .0777816
schgt12 | .1162848 .0880692 1.32 0.187 -.0563278 .2888973
smsa | .1999567 .0841129 2.38 0.017 .0350985 .3648149
bluecoll | -.1994925 .0899354 -2.22 0.027 -.3757626 -.0232223
mining | -.1015676 .2036644 -0.50 0.618 -.5007425 .2976073
constr | -.0253737 .1135609 -0.22 0.823 -.247949 .1972016
transp | -.1981522 .1672141 -1.19 0.236 -.5258858 .1295814
trade | -.0311361 .1079502 -0.29 0.773 -.2427146 .1804423
fire | .1262153 .1492527 0.85 0.398 -.1663145 .4187452
services | .2031673 .1038945 1.96 0.051 -.0004622 .4067968
pubadmin | .1117728 .3087374 0.36 0.717 -.4933415 .716887
year85 | .2374972 .093387 2.54 0.011 .054462 .4205325
year87 | .3787397 .1011782 3.74 0.000 .1804341 .5770454
year89 | .4920278 .1180472 4.17 0.000 .2606596 .7233959
midatl | .02465 .1542139 0.16 0.873 -.2776037 .3269036
encen | -.0014111 .1579065 -0.01 0.993 -.3109023 .30808
wncen | .1844363 .1694444 1.09 0.276 -.1476687 .5165413
southatl | .2740974 .1250481 2.19 0.028 .0290076 .5191872
escen | .367742 .2024771 1.82 0.069 -.0291058 .7645899
wscen | .3440005 .1527804 2.25 0.024 .0445563 .6434446
mountain | .0159627 .1620188 0.10 0.922 -.3015883 .3335136
pacific | .0849532 .2504077 0.34 0.734 -.4058368 .5757432
_cons | -4.357886 .9196792 -4.74 0.000 -6.160424 -2.555347
-------------+----------------------------------------------------------------
/ln_p | .1215314 .0194374 6.25 0.000 .0834348 .1596281
-------------+----------------------------------------------------------------
p | 1.129225 .0219492 1.087014 1.173075
1/p | .8855632 .0172131 .8524608 .9199511
------------------------------------------------------------------------------
. estimates store bweibr1
.
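Reading aid, not part of the original log: because streg is called with nohr, the Weibull table above reports coefficients in log relative-hazard form rather than hazard ratios. Assuming the standard proportional-hazards Weibull parameterization, exponentiating a coefficient gives its multiplicative effect on the hazard. The worked value below uses the married coefficient for the censor1 event exactly as printed above; the symbols \lambda, \beta and HR are notation introduced here for illustration.

```latex
\lambda(t \mid x) = p\, t^{\,p-1} \exp(x'\beta),
\qquad
\mathrm{HR}_{\text{married}} = \exp(\hat\beta_{\text{married}}) = \exp(0.3786057) \approx 1.460
```

Under this specification, being married is associated with a hazard of the censor1 event roughly 46 percent higher, holding the other regressors fixed.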
. stset spell, fail(censor2=1)
     failure event:  censor2 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(weibull)
         failure _d:  censor2 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form

No. of subjects       =         3343            Number of obs    =        3343
No. of failures       =          339
Time at risk          =        20887
                                                Wald chi2(40)    =      222.95
Log pseudo-likelihood =   -1248.6859            Prob > chi2      =      0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.0855974 .9920715 -0.09 0.931 -2.030022 1.858827
DR | -.9387836 1.279111 -0.73 0.463 -3.445794 1.568227
UI | -1.110175 .5267037 -2.11 0.035 -2.142496 -.0778551
RRUI | -.6171912 1.203735 -0.51 0.608 -2.976469 1.742086
DRUI | 1.973269 1.756599 1.12 0.261 -1.469601 5.41614
LOGWAGE | -.2437885 .1833224 -1.33 0.184 -.6030938 .1155168
tenure | .0050643 .0127387 0.40 0.691 -.0199031 .0300317
slack | -.2689689 .133176 -2.02 0.043 -.529989 -.0079487
abolpos | -.5721689 .2059292 -2.78 0.005 -.9757826 -.1685551
explose | .0555267 .1147555 0.48 0.628 -.16939 .2804433
stateur | -.1087083 .0413647 -2.63 0.009 -.1897816 -.027635
houshead | -.0679894 .13661 -0.50 0.619 -.3357401 .1997613
married | -.060856 .1362403 -0.45 0.655 -.327882 .20617
female | .4583892 .1408831 3.25 0.001 .1822634 .734515
child | -.2228982 .147376 -1.51 0.130 -.5117499 .0659535
ychild | .1463598 .1844362 0.79 0.427 -.2151284 .507848
nonwhite | -.485664 .186033 -2.61 0.009 -.8502819 -.121046
age | -.0027009 .0065569 -0.41 0.680 -.0155521 .0101503
schlt12 | -.1837633 .1684487 -1.09 0.275 -.5139167 .1463901
schgt12 | -.0488958 .1485385 -0.33 0.742 -.340026 .2422343
smsa | .1380042 .1410747 0.98 0.328 -.1384971 .4145055
bluecoll | .0132584 .1537386 0.09 0.931 -.2880637 .3145805
mining | -.0138734 .4110202 -0.03 0.973 -.8194583 .7917115
constr | .1973771 .1920481 1.03 0.304 -.1790303 .5737845
transp | -.4116241 .2927848 -1.41 0.160 -.9854717 .1622234
trade | .1125741 .1765277 0.64 0.524 -.2334139 .4585621
fire | -.3378747 .3046641 -1.11 0.267 -.9350054 .2592561
services | .1700335 .1729565 0.98 0.326 -.1689551 .5090221
pubadmin | .7553679 .5487635 1.38 0.169 -.3201889 1.830925
year85 | -.0501695 .1515048 -0.33 0.741 -.3471135 .2467745
year87 | -.1116858 .1645254 -0.68 0.497 -.4341497 .2107781
year89 | .1344555 .1987084 0.68 0.499 -.2550059 .5239168
midatl | -.4039691 .2606153 -1.55 0.121 -.9147658 .1068276
encen | -.5105877 .2608364 -1.96 0.050 -1.021818 .0006423
wncen | -.0579723 .2607792 -0.22 0.824 -.5690902 .4531456
southatl | -.2682241 .1972983 -1.36 0.174 -.6549216 .1184733
escen | .079807 .3146812 0.25 0.800 -.5369568 .6965709
wscen | -.0854421 .2368638 -0.36 0.718 -.5496865 .3788024
mountain | .2441762 .2300886 1.06 0.289 -.2067892 .6951416
pacific | -.1999107 .4003467 -0.50 0.618 -.9845758 .5847544
_cons | -1.055211 1.353275 -0.78 0.436 -3.707582 1.597159
-------------+----------------------------------------------------------------
/ln_p | .0815649 .0308379 2.64 0.008 .0211236 .1420061
-------------+----------------------------------------------------------------
p | 1.084984 .0334587 1.021348 1.152584
1/p | .9216729 .0284225 .8676159 .9790979
------------------------------------------------------------------------------
. estimates store bweibr2
.
. stset spell, fail(censor3=1)
     failure event:  censor3 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(weibull)
         failure _d:  censor3 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form

No. of subjects       =         3343            Number of obs    =        3343
No. of failures       =          574
Time at risk          =        20887
                                                Wald chi2(40)    =      350.72
Log pseudo-likelihood =   -1729.8356            Prob > chi2      =      0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.6946399 .762754 -0.91 0.362 -2.18961 .8003305
DR | 1.361414 .9691375 1.40 0.160 -.5380611 3.260888
UI | -1.098453 .4595297 -2.39 0.017 -1.999115 -.1977918
RRUI | -.3055217 1.046769 -0.29 0.770 -2.357151 1.746107
DRUI | 1.990913 1.37004 1.45 0.146 -.6943156 4.676141
LOGWAGE | .0401096 .1526549 0.26 0.793 -.2590886 .3393078
tenure | -.0495153 .0126559 -3.91 0.000 -.0743204 -.0247103
slack | -.473113 .1025776 -4.61 0.000 -.6741614 -.2720647
abolpos | -.2910168 .1465355 -1.99 0.047 -.5782212 -.0038124
explose | .0315602 .0906338 0.35 0.728 -.1460787 .2091991
stateur | -.1199252 .0337488 -3.55 0.000 -.1860717 -.0537787
houshead | .5592843 .1107798 5.05 0.000 .3421598 .7764087
married | .032312 .1115613 0.29 0.772 -.1863442 .2509681
female | .2764899 .1147909 2.41 0.016 .0515039 .5014759
child | -.149619 .1167679 -1.28 0.200 -.3784799 .079242
ychild | -.1018703 .1436607 -0.71 0.478 -.3834401 .1796996
nonwhite | -.5164388 .1517355 -3.40 0.001 -.8138349 -.2190427
age | -.0275549 .0057648 -4.78 0.000 -.0388536 -.0162561
schlt12 | -.1115642 .1291366 -0.86 0.388 -.3646673 .1415389
schgt12 | .1015553 .1135108 0.89 0.371 -.1209217 .3240324
smsa | .0270168 .1078739 0.25 0.802 -.1844122 .2384459
bluecoll | .3229431 .1167884 2.77 0.006 .094042 .5518443
mining | .2437267 .2731206 0.89 0.372 -.2915799 .7790332
constr | .1307943 .1484399 0.88 0.378 -.1601425 .4217311
transp | -.1004424 .2004105 -0.50 0.616 -.4932397 .2923549
trade | .1181562 .136055 0.87 0.385 -.1485068 .3848192
fire | -.344603 .2792784 -1.23 0.217 -.8919787 .2027726
services | .0519644 .1386656 0.37 0.708 -.2198151 .3237438
pubadmin | -1.780582 1.049217 -1.70 0.090 -3.837009 .2758459
year85 | .311726 .1192592 2.61 0.009 .0779822 .5454698
year87 | .4514345 .126241 3.58 0.000 .2040067 .6988623
year89 | -.1180122 .1713414 -0.69 0.491 -.4538352 .2178108
midatl | -.5476552 .224463 -2.44 0.015 -.9875945 -.1077158
encen | -.084084 .20745 -0.41 0.685 -.4906786 .3225106
wncen | .1288938 .2191536 0.59 0.556 -.3006393 .5584268
southatl | .16223 .1702456 0.95 0.341 -.1714454 .4959053
escen | -.5110545 .3270884 -1.56 0.118 -1.152136 .130027
wscen | .0218047 .1978693 0.11 0.912 -.3660121 .4096214
mountain | .2045852 .1949939 1.05 0.294 -.1775957 .5867662
pacific | .4535074 .2840292 1.60 0.110 -.1031795 1.010194
_cons | -2.017592 1.123888 -1.80 0.073 -4.220372 .1851884
-------------+----------------------------------------------------------------
/ln_p | .163312 .0235045 6.95 0.000 .117244 .2093801
-------------+----------------------------------------------------------------
p | 1.177404 .0276744 1.124394 1.232914
1/p | .8493261 .019963 .8110869 .8893682
------------------------------------------------------------------------------
. estimates store bweibr3
.
. * Table 19.3 (page 659) first three columns
. estimates table bweibr1 bweibr2 bweibr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
----------------------------------------------------
    Variable |     bweibr1     bweibr2     bweibr3
-------------+--------------------------------------
          RR |       0.448      -0.086      -0.695
             |       0.638       0.992       0.763
          DR |      -0.427      -0.939       1.361
             |       0.809       1.279       0.969
          UI |      -1.496      -1.110      -1.098
             |       0.264       0.527       0.460
        RRUI |       1.015      -0.617      -0.306
             |       0.646       1.204       1.047
        DRUI |      -0.299       1.973       1.991
             |       1.065       1.757       1.370
     LOGWAGE |       0.366      -0.244       0.040
             |       0.122       0.183       0.153
      tenure |      -0.001       0.005      -0.050
             |       0.007       0.013       0.013
-------------+--------------------------------------
           N |    3343.000    3343.000    3343.000
          ll |   -2687.600   -1248.686   -1729.836
----------------------------------------------------
                                        legend: b/se
.
. *** (2B) WEIBULL WITH IG HETEROGENEITY Table 19.3
.
. stset spell, fail(censor1=1)
     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nohr robust dist(weibull) frailty(invgauss)
         failure _d:  censor1 == 1
   analysis time _t:  spell

Fitting weibull model:

Fitting constant-only model:

Iteration 0:   log pseudo-likelihood = -3134.2376  (not concave)
Iteration 1:   log pseudo-likelihood =  -2998.472
Iteration 2:   log pseudo-likelihood = -2984.8299
Iteration 3:   log pseudo-likelihood = -2960.0446
Iteration 4:   log pseudo-likelihood = -2954.9102
Iteration 5:   log pseudo-likelihood = -2954.8838
Iteration 6:   log pseudo-likelihood = -2954.8838

Fitting full model:

Iteration 0:   log pseudo-likelihood = -2656.6306
Iteration 1:   log pseudo-likelihood =  -2632.196
Iteration 2:   log pseudo-likelihood = -2616.9139
Iteration 3:   log pseudo-likelihood = -2616.3231
Iteration 4:   log pseudo-likelihood = -2616.3216
Iteration 5:   log pseudo-likelihood = -2616.3216

Weibull regression -- log relative-hazard form
                      Inverse-Gaussian frailty

No. of subjects       =         3343            Number of obs    =        3343
No. of failures       =         1073
Time at risk          =        20887
                                                Wald chi2(40)    =      643.00
Log pseudo-likelihood =   -2616.3216            Prob > chi2      =      0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .7356277 .9058181 0.81 0.417 -1.039743 2.510998
DR | -1.072566 1.149098 -0.93 0.351 -3.324758 1.179625
UI | -2.574752 .3843798 -6.70 0.000 -3.328123 -1.821381
RRUI | 1.733571 .9333928 1.86 0.063 -.0958458 3.562987
DRUI | -.060621 1.537813 -0.04 0.969 -3.07468 2.953438
LOGWAGE | .575656 .1766599 3.26 0.001 .2294089 .9219031
tenure | -.0009848 .0097472 -0.10 0.920 -.0200889 .0181194
slack | -.4416007 .1142976 -3.86 0.000 -.6656199 -.2175814
abolpos | -.2873066 .1465357 -1.96 0.050 -.5745113 -.0001019
explose | .3641943 .0976897 3.73 0.000 .1727259 .5556627
stateur | -.0981133 .0346763 -2.83 0.005 -.1660775 -.030149
houshead | .5924383 .1256739 4.71 0.000 .3461219 .8387546
married | .6083214 .1183487 5.14 0.000 .3763624 .8402805
female | .1788439 .1285074 1.39 0.164 -.0730259 .4307137
child | -.0914227 .121778 -0.75 0.453 -.3301031 .1472578
ychild | -.1805373 .1527477 -1.18 0.237 -.4799173 .1188426
nonwhite | -1.008517 .1725174 -5.85 0.000 -1.346645 -.6703894
age | -.0333776 .0059183 -5.64 0.000 -.0449772 -.0217779
schlt12 | -.2258621 .1439543 -1.57 0.117 -.5080075 .0562832
schgt12 | .1505129 .124469 1.21 0.227 -.0934418 .3944677
smsa | .3009952 .119907 2.51 0.012 .0659819 .5360086
bluecoll | -.3211857 .1253163 -2.56 0.010 -.5668012 -.0755702
mining | -.2319827 .3008491 -0.77 0.441 -.8216361 .3576708
constr | -.1260324 .1633669 -0.77 0.440 -.4462257 .1941609
transp | -.2763858 .225893 -1.22 0.221 -.7191279 .1663562
trade | -.0687616 .1518284 -0.45 0.651 -.3663399 .2288166
fire | .0668973 .2131814 0.31 0.754 -.3509306 .4847252
services | .231914 .1494712 1.55 0.121 -.0610441 .5248721
pubadmin | .0901949 .4579252 0.20 0.844 -.807322 .9877117
year85 | .2780139 .1339053 2.08 0.038 .0155644 .5404634
year87 | .5208783 .1415375 3.68 0.000 .2434699 .7982867
year89 | .7209598 .1655487 4.35 0.000 .3964903 1.045429
midatl | -.0192077 .2222646 -0.09 0.931 -.4548382 .4164228
encen | -.0297055 .2284931 -0.13 0.897 -.4775438 .4181328
wncen | .2460338 .24216 1.02 0.310 -.2285911 .7206586
southatl | .3563643 .1793284 1.99 0.047 .0048872 .7078415
escen | .5461543 .2910193 1.88 0.061 -.024233 1.116542
wscen | .4606814 .2140966 2.15 0.031 .0410598 .880303
mountain | .017581 .2293804 0.08 0.939 -.4319963 .4671584
pacific | .1379886 .3636985 0.38 0.704 -.5748475 .8508247
_cons | -5.303059 1.34133 -3.95 0.000 -7.932017 -2.6741
-------------+----------------------------------------------------------------
/ln_p | .5611667 .0225898 24.84 0.000 .5168915 .6054418
/ln_the | 1.852696 .0896755 20.66 0.000 1.676935 2.028457
-------------+----------------------------------------------------------------
p | 1.752716 .0395935 1.676807 1.832062
1/p | .570543 .0128884 .5458332 .5963715
theta | 6.376987 .5718595 5.349136 7.602343
------------------------------------------------------------------------------
. estimates store bweibigr1
.
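Reading aid, not part of the original log: the rows p, 1/p and theta at the foot of the table above are transformations of the estimated /ln_p and /ln_the. The sketch below assumes the reported standard errors come from a first-order (delta-method) approximation, which the printed numbers are consistent with. It uses the censor1 values exactly as printed, and the se(.) notation is introduced here for illustration.

```latex
\begin{aligned}
p &= \exp(\widehat{\ln p}) = \exp(0.5611667) \approx 1.7527,
& \operatorname{se}(p) &\approx p \cdot \operatorname{se}(\widehat{\ln p})
  = 1.752716 \times 0.0225898 \approx 0.0396, \\
\theta &= \exp(\widehat{\ln \theta}) = \exp(1.852696) \approx 6.377,
& \operatorname{se}(\theta) &\approx \theta \cdot \operatorname{se}(\widehat{\ln \theta})
  = 6.376987 \times 0.0896755 \approx 0.5719 .
\end{aligned}
```

Both results match the p and theta rows of the table to the digits shown.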
. stset spell, fail(censor2=1)
     failure event:  censor2 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(weibull) frailty(invgauss)
         failure _d:  censor2 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form
                      Inverse-Gaussian frailty

No. of subjects       =         3343            Number of obs    =        3343
No. of failures       =          339
Time at risk          =        20887
                                                Wald chi2(40)    =      253.77
Log pseudo-likelihood =   -1230.1643            Prob > chi2      =      0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.3802006 1.452095 -0.26 0.793 -3.226255 2.465854
DR | -1.689504 1.779553 -0.95 0.342 -5.177363 1.798355
UI | -2.063963 .7469659 -2.76 0.006 -3.527989 -.5999369
RRUI | -.3019038 1.702153 -0.18 0.859 -3.638063 3.034255
DRUI | 3.263067 2.469908 1.32 0.186 -1.577863 8.103998
LOGWAGE | -.4954862 .2614747 -1.89 0.058 -1.007967 .0169948
tenure | .0174014 .0192239 0.91 0.365 -.0202768 .0550795
slack | -.3889861 .1911789 -2.03 0.042 -.7636898 -.0142824
abolpos | -.8027208 .2877528 -2.79 0.005 -1.366706 -.2387356
explose | .1187808 .1663987 0.71 0.475 -.2073546 .4449162
stateur | -.1753726 .059272 -2.96 0.003 -.2915437 -.0592015
houshead | -.0832153 .1944376 -0.43 0.669 -.464306 .2978754
married | -.0092249 .1945187 -0.05 0.962 -.3904747 .3720248
female | .6284921 .2064768 3.04 0.002 .223805 1.033179
child | -.389325 .2127697 -1.83 0.067 -.806346 .0276959
ychild | .3144939 .2663886 1.18 0.238 -.2076182 .836606
nonwhite | -.6691885 .2633831 -2.54 0.011 -1.18541 -.1529671
age | -.0034533 .0093696 -0.37 0.712 -.0218174 .0149108
schlt12 | -.3242365 .2380109 -1.36 0.173 -.7907293 .1422562
schgt12 | -.0745655 .2138285 -0.35 0.727 -.4936618 .3445307
smsa | .2107394 .2012744 1.05 0.295 -.1837512 .60523
bluecoll | -.0065426 .2175612 -0.03 0.976 -.4329548 .4198696
mining | .1293103 .6093175 0.21 0.832 -1.06493 1.323551
constr | .2870954 .2728176 1.05 0.293 -.2476172 .8218081
transp | -.6470251 .4118414 -1.57 0.116 -1.454219 .1601692
trade | .1901489 .2529975 0.75 0.452 -.3057172 .6860149
fire | -.4680763 .4488502 -1.04 0.297 -1.347807 .411654
services | .2462185 .2531429 0.97 0.331 -.2499325 .7423696
pubadmin | 1.351206 .7621665 1.77 0.076 -.1426127 2.845025
year85 | -.1501166 .2195046 -0.68 0.494 -.5803377 .2801044
year87 | -.2400145 .236954 -1.01 0.311 -.7044358 .2244069
year89 | .1828811 .2831188 0.65 0.518 -.3720216 .7377838
midatl | -.4074373 .3806192 -1.07 0.284 -1.153437 .3385627
encen | -.6525035 .381508 -1.71 0.087 -1.400245 .0952385
wncen | -.1300751 .3835973 -0.34 0.735 -.8819119 .6217617
southatl | -.3491396 .2954776 -1.18 0.237 -.928265 .2299859
escen | .2960895 .4558667 0.65 0.516 -.5973927 1.189572
wscen | -.0903554 .3527441 -0.26 0.798 -.7817212 .6010104
mountain | .3721587 .3457717 1.08 0.282 -.3055413 1.049859
pacific | -.1996218 .6042626 -0.33 0.741 -1.383955 .9847112
_cons | 1.157635 1.957298 0.59 0.554 -2.678599 4.993869
-------------+----------------------------------------------------------------
/ln_p | .5004283 .0361284 13.85 0.000 .429618 .5712386
/ln_the | 2.896807 .1749249 16.56 0.000 2.55396 3.239653
-------------+----------------------------------------------------------------
p | 1.649428 .0595911 1.53667 1.770459
1/p | .6062709 .0219036 .5648254 .6507577
theta | 18.11621 3.168976 12.85793 25.52487
------------------------------------------------------------------------------
. estimates store bweibigr2
.
. stset spell, fail(censor3=1)
     failure event:  censor3 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(weibull) frailty(invgauss)
         failure _d:  censor3 == 1
   analysis time _t:  spell

Weibull regression -- log relative-hazard form
                      Inverse-Gaussian frailty

No. of subjects       =         3343            Number of obs    =        3343
No. of failures       =          574
Time at risk          =        20887
                                                Wald chi2(40)    =      416.91
Log pseudo-likelihood =   -1696.8456            Prob > chi2      =      0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.4326716 1.111223 -0.39 0.697 -2.610628 1.745285
DR | 1.166629 1.377826 0.85 0.397 -1.533861 3.867119
UI | -1.761667 .623017 -2.83 0.005 -2.982758 -.5405758
RRUI | -.5160276 1.418361 -0.36 0.716 -3.295964 2.263909
DRUI | 3.668779 1.93489 1.90 0.058 -.1235355 7.461093
LOGWAGE | -.0069584 .2162461 -0.03 0.974 -.4307929 .4168762
tenure | -.0677151 .0174959 -3.87 0.000 -.1020065 -.0334237
slack | -.7093182 .145145 -4.89 0.000 -.9937971 -.4248392
abolpos | -.4327781 .2106818 -2.05 0.040 -.8457069 -.0198494
explose | .0930879 .1284587 0.72 0.469 -.1586864 .3448623
stateur | -.1684826 .0472936 -3.56 0.000 -.2611764 -.0757887
houshead | .7760519 .1555864 4.99 0.000 .4711081 1.080996
married | .0849334 .1585652 0.54 0.592 -.2258487 .3957154
female | .329107 .1637254 2.01 0.044 .0082111 .6500028
child | -.2734744 .1667453 -1.64 0.101 -.6002892 .0533403
ychild | -.101407 .2021952 -0.50 0.616 -.4977024 .2948883
nonwhite | -.7325977 .211777 -3.46 0.001 -1.147673 -.3175223
age | -.0354358 .007992 -4.43 0.000 -.0510998 -.0197719
schlt12 | -.1729163 .1803828 -0.96 0.338 -.5264602 .1806275
schgt12 | .0955174 .1615133 0.59 0.554 -.2210429 .4120777
smsa | .0225321 .1500451 0.15 0.881 -.2715509 .3166151
bluecoll | .4311626 .1651405 2.61 0.009 .1074931 .7548321
mining | .4464055 .3724328 1.20 0.231 -.2835495 1.17636
constr | .1875875 .2104018 0.89 0.373 -.2247926 .5999675
transp | -.0190191 .2877627 -0.07 0.947 -.5830237 .5449855
trade | .1708654 .1960546 0.87 0.383 -.2133945 .5551253
fire | -.3548846 .3851005 -0.92 0.357 -1.109668 .3998985
services | .0199891 .1978478 0.10 0.920 -.3677854 .4077636
pubadmin | -2.249289 1.450209 -1.55 0.121 -5.091646 .5930688
year85 | .3978277 .1726143 2.30 0.021 .0595099 .7361456
year87 | .6809662 .1807412 3.77 0.000 .32672 1.035212
year89 | -.1380237 .2307311 -0.60 0.550 -.5902485 .314201
midatl | -.7908245 .3280754 -2.41 0.016 -1.43384 -.1478085
encen | -.1035781 .2984816 -0.35 0.729 -.6885913 .4814351
wncen | .2578004 .3150731 0.82 0.413 -.3597316 .8753324
southatl | .2314723 .2430344 0.95 0.341 -.2448663 .7078109
escen | -.6777305 .4486486 -1.51 0.131 -1.557065 .2016045
wscen | .0308173 .2842933 0.11 0.914 -.5263874 .5880219
mountain | .2849032 .2816226 1.01 0.312 -.267067 .8368734
pacific | .7162217 .4103619 1.75 0.081 -.0880727 1.520516
_cons | -1.42279 1.617429 -0.88 0.379 -4.592894 1.747313
-------------+----------------------------------------------------------------
/ln_p | .5795747 .026888 21.56 0.000 .5268752 .6322742
/ln_the | 2.262575 .1322516 17.11 0.000 2.003367 2.521783
-------------+----------------------------------------------------------------
p | 1.785279 .0480026 1.693632 1.881886
1/p | .5601365 .0150609 .5313819 .5904471
theta | 9.607798 1.270647 7.413974 12.45078
------------------------------------------------------------------------------
. estimates store bweibigr3
.
. * Table 19.3 (page 659) first three columns
. estimates table bweibigr1 bweibigr2 bweibigr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
----------------------------------------------------
    Variable |   bweibigr1   bweibigr2   bweibigr3
-------------+--------------------------------------
          RR |       0.736      -0.380      -0.433
             |       0.906       1.452       1.111
          DR |      -1.073      -1.690       1.167
             |       1.149       1.780       1.378
          UI |      -2.575      -2.064      -1.762
             |       0.384       0.747       0.623
        RRUI |       1.734      -0.302      -0.516
             |       0.933       1.702       1.418
        DRUI |      -0.061       3.263       3.669
             |       1.538       2.470       1.935
     LOGWAGE |       0.576      -0.495      -0.007
             |       0.177       0.261       0.216
      tenure |      -0.001       0.017      -0.068
             |       0.010       0.019       0.017
-------------+--------------------------------------
           N |    3343.000    3343.000    3343.000
          ll |   -2616.322   -1230.164   -1696.846
----------------------------------------------------
                                        legend: b/se
.
. *** (2C) ESTIMATE COX MODEL SPECIFICATION OF COMPETING RISKS
.
. stset spell, fail(censor1=1)
     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. stcox $xlist, nolog nohr robust basesurv(survrisk1) basechazard(chrisk1)
         failure _d:  censor1 == 1
   analysis time _t:  spell

Cox regression -- Breslow method for ties

No. of subjects       =         3343            Number of obs    =        3343
No. of failures       =         1073
Time at risk          =        20887
                                                Wald chi2(40)    =      540.98
Log pseudo-likelihood =   -7717.2334            Prob > chi2      =      0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .5222796 .5711698 0.91 0.361 -.5971926 1.641752
DR | -.752507 .72175 -1.04 0.297 -2.167111 .6620971
UI | -1.317719 .2372893 -5.55 0.000 -1.782798 -.8526409
RRUI | .8822462 .582115 1.52 0.130 -.2586783 2.023171
DRUI | -.0951357 .977774 -0.10 0.922 -2.011538 1.821266
LOGWAGE | .3352639 .1106483 3.03 0.002 .1183972 .5521306
tenure | .0008278 .0061286 0.14 0.893 -.0111841 .0128396
slack | -.247863 .0721173 -3.44 0.001 -.3892103 -.1065158
abolpos | -.1511638 .0905035 -1.67 0.095 -.3285475 .0262198
explose | .1865068 .0615742 3.03 0.002 .0658236 .30719
stateur | -.0590475 .022085 -2.67 0.008 -.1023334 -.0157616
houshead | .3601866 .0794827 4.53 0.000 .2044035 .5159698
married | .358819 .0746355 4.81 0.000 .2125362 .5051019
female | .1002758 .0813277 1.23 0.218 -.0591236 .2596753
child | -.0396054 .0755365 -0.52 0.600 -.1876542 .1084435
ychild | -.1276638 .0967856 -1.32 0.187 -.3173602 .0620325
nonwhite | -.6394475 .1151332 -5.55 0.000 -.8651043 -.4137906
age | -.0204623 .0037593 -5.44 0.000 -.0278305 -.0130942
schlt12 | -.1220585 .0920073 -1.33 0.185 -.3023895 .0582726
schgt12 | .1104817 .0783542 1.41 0.159 -.0430897 .2640531
smsa | .1864841 .0766075 2.43 0.015 .0363361 .3366321
bluecoll | -.2108023 .080867 -2.61 0.009 -.3692986 -.052306
mining | -.1238251 .1906352 -0.65 0.516 -.4974632 .249813
constr | -.054455 .1029488 -0.53 0.597 -.256231 .1473209
transp | -.1551657 .1466515 -1.06 0.290 -.4425973 .1322659
trade | -.0383252 .0968106 -0.40 0.692 -.2280706 .1514201
fire | .1097585 .1300779 0.84 0.399 -.1451895 .3647065
services | .1666262 .0939507 1.77 0.076 -.0175138 .3507662
pubadmin | .1022002 .2829817 0.36 0.718 -.4524336 .6568341
year85 | .204162 .084908 2.40 0.016 .0377454 .3705786
year87 | .3384229 .0899115 3.76 0.000 .1621997 .5146462
year89 | .4486559 .104937 4.28 0.000 .2429832 .6543286
midatl | .0342238 .140515 0.24 0.808 -.2411805 .3096282
encen | .0174597 .1438862 0.12 0.903 -.2645521 .2994716
wncen | .1650967 .1532559 1.08 0.281 -.1352795 .4654728
southatl | .2518023 .1127138 2.23 0.025 .0308874 .4727172
escen | .3450422 .1839818 1.88 0.061 -.0155554 .7056398
wscen | .3316752 .1359801 2.44 0.015 .0651591 .5981914
mountain | .009484 .1468626 0.06 0.949 -.2783613 .2973293
pacific | .0720292 .2263339 0.32 0.750 -.3715771 .5156355
------------------------------------------------------------------------------
. estimates store bcoxrisk1
.
. stset spell, fail(censor2=1)
     failure event:  censor2 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. stcox $xlist, nolog nohr robust basesurv(survrisk2) basechazard(chrisk2)
         failure _d:  censor2 == 1
   analysis time _t:  spell

Cox regression -- Breslow method for ties

No. of subjects       =         3343            Number of obs    =        3343
No. of failures       =          339
Time at risk          =        20887
                                                Wald chi2(40)    =      211.82
Log pseudo-likelihood =    -2444.342            Prob > chi2      =      0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.0719673 .9513101 -0.08 0.940 -1.936501 1.792566
DR | -1.0236 1.193087 -0.86 0.391 -3.362007 1.314807
UI | -.906022 .5109396 -1.77 0.076 -1.907445 .0954013
RRUI | -.7818457 1.166182 -0.67 0.503 -3.06752 1.503829
DRUI | 2.031968 1.671862 1.22 0.224 -1.244821 5.308756
LOGWAGE | -.2800345 .1736454 -1.61 0.107 -.6203732 .0603043
tenure | .0059934 .0122664 0.49 0.625 -.0180483 .0300352
slack | -.2476685 .12775 -1.94 0.053 -.498054 .0027169
abolpos | -.5434923 .1976775 -2.75 0.006 -.9309331 -.1560516
explose | .0334802 .1101886 0.30 0.761 -.1824856 .2494459
stateur | -.0923228 .0393339 -2.35 0.019 -.1694157 -.0152299
houshead | -.0864111 .1303336 -0.66 0.507 -.3418602 .1690379
married | -.065464 .1298376 -0.50 0.614 -.3199409 .189013
female | .4386603 .1340263 3.27 0.001 .1759735 .7013471
child | -.2049337 .1413612 -1.45 0.147 -.4819966 .0721293
ychild | .1556684 .1766059 0.88 0.378 -.1904727 .5018095
nonwhite | -.3956483 .1761206 -2.25 0.025 -.7408382 -.0504583
age | .0001207 .0062519 0.02 0.985 -.0121327 .0123741
schlt12 | -.1723734 .1618354 -1.07 0.287 -.489565 .1448182
schgt12 | -.0583556 .142103 -0.41 0.681 -.3368724 .2201611
smsa | .1120279 .1334106 0.84 0.401 -.1494521 .3735079
bluecoll | -.0021333 .1460376 -0.01 0.988 -.2883617 .2840951
mining | -.0132972 .401138 -0.03 0.974 -.7995132 .7729188
constr | .1654229 .1852256 0.89 0.372 -.1976127 .5284584
transp | -.3818733 .2831048 -1.35 0.177 -.9367485 .1730019
trade | .1065755 .1677346 0.64 0.525 -.2221782 .4353293
fire | -.345295 .2945472 -1.17 0.241 -.9225969 .2320068
services | .1443583 .1664345 0.87 0.386 -.1818474 .470564
pubadmin | .7203208 .5238954 1.37 0.169 -.3064953 1.747137
year85 | -.0647735 .1460286 -0.44 0.657 -.3509844 .2214373
year87 | -.138436 .1574958 -0.88 0.379 -.4471221 .1702502
year89 | .100033 .1887671 0.53 0.596 -.2699437 .4700097
midatl | -.3838124 .2529706 -1.52 0.129 -.8796257 .1120009
encen | -.5058645 .2521219 -2.01 0.045 -1.000014 -.0117146
wncen | -.081463 .2512893 -0.32 0.746 -.5739811 .411055
southatl | -.2799968 .1891246 -1.48 0.139 -.6506742 .0906805
escen | .0372908 .2993588 0.12 0.901 -.5494417 .6240233
wscen | -.1157119 .2286912 -0.51 0.613 -.5639385 .3325146
mountain | .204597 .2206239 0.93 0.354 -.2278179 .6370119
pacific | -.2138749 .3899895 -0.55 0.583 -.9782404 .5504905
------------------------------------------------------------------------------
. estimates store bcoxrisk2
.
. stset spell, fail(censor3=1)
     failure event:  censor3 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. stcox $xlist, nolog nohr robust basesurv(survrisk3) basechazard(chrisk3)
         failure _d:  censor3 == 1
   analysis time _t:  spell

Cox regression -- Breslow method for ties

No. of subjects       =         3343            Number of obs    =        3343
No. of failures       =          574
Time at risk          =        20887
                                                Wald chi2(40)    =      357.81
Log pseudo-likelihood =   -4094.2361            Prob > chi2      =      0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.4692082 .7157644 -0.66 0.512 -1.872081 .9336643
DR | .8759221 .8786992 1.00 0.319 -.8462967 2.598141
UI | -.9051384 .4449384 -2.03 0.042 -1.777202 -.0330753
RRUI | -.5392752 1.002388 -0.54 0.591 -2.503919 1.425369
DRUI | 2.293752 1.274021 1.80 0.072 -.2032836 4.790787
LOGWAGE | -.0140883 .1415912 -0.10 0.921 -.291602 .2634253
tenure | -.0465013 .0118142 -3.94 0.000 -.0696567 -.0233458
slack | -.4587556 .0952092 -4.82 0.000 -.6453621 -.2721491
abolpos | -.2743895 .136703 -2.01 0.045 -.5423223 -.0064566
explose | .0199625 .0843281 0.24 0.813 -.1453176 .1852426
stateur | -.1013309 .0311307 -3.26 0.001 -.1623459 -.0403159
houshead | .5154239 .1031203 5.00 0.000 .3133117 .717536
married | .0280002 .1037338 0.27 0.787 -.1753143 .2313148
female | .2477194 .1071841 2.31 0.021 .0376425 .4577962
child | -.1477253 .1086376 -1.36 0.174 -.3606511 .0652005
ychild | -.0702224 .1341067 -0.52 0.601 -.3330667 .1926219
nonwhite | -.4472066 .1401892 -3.19 0.001 -.7219723 -.1724409
age | -.0227849 .0053188 -4.28 0.000 -.0332096 -.0123602
schlt12 | -.1050265 .1191449 -0.88 0.378 -.3385462 .1284931
schgt12 | .0912594 .1057371 0.86 0.388 -.1159815 .2985004
smsa | .0078536 .0994133 0.08 0.937 -.1869928 .2027
bluecoll | .2916892 .1085873 2.69 0.007 .0788619 .5045165
mining | .2392902 .2514416 0.95 0.341 -.2535263 .7321067
constr | .0659352 .1393882 0.47 0.636 -.2072606 .339131
transp | -.0724276 .1845329 -0.39 0.695 -.4341054 .2892502
trade | .0824395 .1260009 0.65 0.513 -.1645178 .3293967
fire | -.3901171 .2648329 -1.47 0.141 -.90918 .1289458
services | .0007351 .1296195 0.01 0.995 -.2533144 .2547847
pubadmin | -1.749927 1.038715 -1.68 0.092 -3.785771 .2859182
year85 | .2810465 .1124259 2.50 0.012 .0606957 .5013973
year87 | .4139684 .117016 3.54 0.000 .1846212 .6433155
year89 | -.1485614 .1590621 -0.93 0.350 -.4603173 .1631946
midatl | -.5271828 .2165005 -2.44 0.015 -.9515159 -.1028497
encen | -.063171 .1962513 -0.32 0.748 -.4478166 .3214745
wncen | .134275 .2051501 0.65 0.513 -.2678118 .5363617
southatl | .1522905 .1610446 0.95 0.344 -.1633512 .4679321
escen | -.5030762 .3118938 -1.61 0.107 -1.114377 .1082245
wscen | .0116807 .1858946 0.06 0.950 -.352666 .3760273
mountain | .2043736 .1827277 1.12 0.263 -.1537662 .5625134
pacific | .4327009 .2661013 1.63 0.104 -.088848 .9542498
------------------------------------------------------------------------------

. estimates store bcoxrisk3
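Because the stcox command above uses the nohr option, the table reports coefficients rather than hazard ratios. The following is a minimal Python sketch, not part of the original program, that converts two of the logged risk-3 coefficients into hazard ratios by exponentiation. The variable names and the normal critical value 1.959964 are assumptions made for illustration.

```python
import math

# Coefficients and robust standard errors copied from the risk-3 Cox output above.
coefs = {
    "UI": (-0.9051384, 0.4449384),
    "tenure": (-0.0465013, 0.0118142),
}

Z_95 = 1.959964  # assumed two-sided 95% normal critical value

hazard_ratios = {}
for name, (b, se) in coefs.items():
    hr = math.exp(b)                # hazard ratio implied by the coefficient
    lower = math.exp(b - Z_95 * se)  # interval endpoints are transformed one by one
    upper = math.exp(b + Z_95 * se)
    hazard_ratios[name] = (hr, lower, upper)
    print(f"{name}: HR = {hr:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

A hazard ratio below one means a lower hazard of exit to this risk. For UI the whole interval lies below one, which matches the negative, significant coefficient in the table.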


.
. * Table 19.3 (page 659) last three columns
. * NOTE: The results from this program differ a little from those
. * given in text. Need to resolve this.
. estimates table bcoxrisk1 bcoxrisk2 bcoxrisk3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
----------------------------------------------------
Variable | bcoxrisk1 bcoxrisk2 bcoxrisk3
-------------+--------------------------------------
RR | 0.522 -0.072 -0.469
| 0.571 0.951 0.716
DR | -0.753 -1.024 0.876
| 0.722 1.193 0.879
UI | -1.318 -0.906 -0.905
| 0.237 0.511 0.445
RRUI | 0.882 -0.782 -0.539
| 0.582 1.166 1.002
DRUI | -0.095 2.032 2.294
| 0.978 1.672 1.274
LOGWAGE | 0.335 -0.280 -0.014
| 0.111 0.174 0.142
tenure | 0.001 0.006 -0.047
| 0.006 0.012 0.012
-------------+--------------------------------------
N | 3343.000 3343.000 3343.000
ll | -7717.233 -2444.342 -4094.236
----------------------------------------------------
legend: b/se
.
. *** (2D) GRAPHS FOR COX COMPETING RISKS MODEL
.
. * Figure 19.1 (page 661) - Plot the three baseline survival functions
. sort _t
. graph twoway (scatter survrisk1 _t, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter survrisk2 _t, c(J) msymbol(i) msize(small) clstyle(p2)) /*
> */ (scatter survrisk3 _t, c(J) msymbol(i) msize(small) clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Baseline Survival Functions") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Baseline Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Risk 1 (full-time job)") label(2 "Risk 2 (part-time job)") label(3 "Risk 3 (
> unknown job)"))
. graph export combined_bsf.wmf, replace
(file c:\Imbook\bwebpage\Section4\combined_bsf.wmf written in Windows Metafile format)
.
. * Figure 19.2 (page 659) - Plot the three baseline cumulative hazards
. sort _t
. graph twoway (scatter chrisk1 _t, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter chrisk2 _t, c(J) msymbol(i) msize(small) clstyle(p2)) /*
> */ (scatter chrisk3 _t, c(J) msymbol(i) msize(small) clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Baseline Cumulative Hazard Functions") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Baseline Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Risk 1 (full-time job)") label(2 "Risk 2 (part-time job)") label(3 "Risk 3 (
> unknown job)"))
. graph export combined_cbh.wmf, replace
(file c:\Imbook\bwebpage\Section4\combined_cbh.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma19p1comprisks.txt
log type: text
closed on: 19 May 2005, 17:53:08

-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma20p1count.txt
log type: text
opened on: 20 May 2005, 08:41:33
.
. ********* OVERVIEW OF MMA20P1COUNT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 20.3 pages 671-4 and 20.7 page 690
. * Count data regression example
. * It provides
. * (1) Frequency distribution for count (Table 20.3)
. * (2) Data summary (Table 20.4)
. * (3) Poisson regression with various standard errors (Table 20.5)
. * (4) Negative binomial regression with various standard errors (Table 20.5)
.
. * To use this program you need health expenditure data in Stata data set
. * randdata.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** DATA DESCRIPTION **********
.
. * Essentially same data as in P. Deb and P.K. Trivedi (2002)
. * "The Structure of Demand for Medical Care: Latent Class versus
. * Two-Part Models", Journal of Health Economics, 21, 601-625
. * except that paper used different outcome (counts rather than $)
.
. * Each observation is for an individual over a year.
. * Individuals may appear in up to five years.
. * All available sample is used except only fee for service plans included.
. * In analysis here only year 2 is used so panel complications are avoided.
. * Clustering of individuals within household is ignored here.
.
. * Dependent variable is
. *   MED       med       Annual medical expenditures in constant dollars
. *                       excluding dental and outpatient mental
. *   LNMED     lnmeddol  Ln(Medical expenditures) given meddol > 0
. *                       Missing otherwise
. *   DMED      binexp    1 if medical expenditures > 0
.
. * Regressors are
. * - Health insurance measures
. *   LC        logc      log(coinsrate+1) where coinsurance rate is 0 to 100
. *   IDP       idp       1 if individual deductible plan
. *   LPI       lpi       log(annual participation incentive payment) or 0 if no payment
. *   FMDE      fmde      log(max(medical deductible expenditure)) if IDP=1 and MDE>1 or 0 otherwise.
. * - Health status measures
. *   NDISEASE  disea     number of chronic diseases
. *   PHYSLIM   physlm    1 if physical limitation
. *   HLTHG     hlthg     1 if good health
. *   HLTHF     hlthf     1 if fair health
. *   HLTHP     hlthp     1 if poor health (omitted is excellent)
. * - Socioeconomic characteristics
. *   LINC      linc      log of annual family income (in $)
. *   LFAM      lfam      log of family size
. *   EDUCDEC   educdec   years of schooling of decision maker
. *   AGE       xage      exact age
. *   BLACK     black     1 if black
. *   FEMALE    female    1 if female
. *   CHILD     child     1 if child
. *   FEMCHILD  fchild    1 if female child
.
. * If panel data used then clustering is on
. *   zper                person id
.
. ********** READ DATA, SELECT AND TRANSFORM **********
.
. use randdata.dta, clear
. sum
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
plan | 20190 11.17553 3.976751 1 19
site | 20190 3.298811 1.80382 1 6
coins | 20190 26.3056 36.40386 0 100
tookphys | 20190 .5974245 .4904288 0 1
year | 20190 2.420109 1.217141 1 5
-------------+--------------------------------------------------------
zper | 20190 357965.5 180868.1 125024 632167
black | 20190 .1814983 .3827071 0 1
income | 20190 8037.409 4058.371 0 29237.54
xage | 20190 25.72233 16.76945 0 64.27515
female | 20190 .5170381 .499722 0 1
-------------+--------------------------------------------------------
educdec | 20186 11.96681 2.806255 0 25
time | 20190 .9989561 .0259741 .0767123 1
outpdol | 20190 51.12649 94.92627 0 2599.902
drugdol | 20190 13.1687 33.76212 0 706.3979
suppdol | 20190 6.8024 21.39346 0 1009.47
-------------+--------------------------------------------------------
mentdol | 20190 6.870347 58.41298 0 1340.834
inpdol | 20190 100.4694 655.6215 0 38649.81
meddol | 20190 171.5679 698.2015 0 39182.02
totadm | 20190 .1127291 .4111857 0 8
inpmis | 20190 .0039624 .062824 0 1
-------------+--------------------------------------------------------
mentvis | 20190 .4322437 3.430789 0 62
mdvis | 20190 2.860426 4.504365 0 77
notmdvis | 20190 .6855869 3.763543 0 109
num | 20190 3.954235 1.853034 1 14
mhi | 20190 76.55584 12.50224 12.2 100
-------------+--------------------------------------------------------
disea | 20190 11.24449 6.741449 0 58.6
physlm | 20190 .1235003 .3220164 0 1
ghindx | 14967 73.09055 15.99371 3.7 100
mdeoff | 20185 417.8422 384.1199 0 1000
pioff | 20185 446.677 367.466 0 1291.68
-------------+--------------------------------------------------------
child | 20190 .4013373 .4901812 0 1
fchild | 20190 .1937098 .3952139 0 1
lfam | 20190 1.248156 .539301 0 2.639057
lpi | 20190 4.707894 2.69784 0 7.163699
idp | 20190 .2599802 .4386343 0 1
-------------+--------------------------------------------------------
logc | 20190 2.383342 2.041776 0 4.564348
fmde | 20190 4.029524 3.471353 0 8.294049
hlthg | 20190 .3620109 .4805938 0 1
hlthf | 20190 .077266 .2670196 0 1
hlthp | 20190 .0149579 .1213874 0 1
-------------+--------------------------------------------------------
xghindx | 20190 73.2375 14.2332 3.7 100
linc | 20190 8.708265 1.228309 0 10.28324
lnum | 20190 1.248156 .539301 0 2.639057
lnmeddol | 15737 4.109318 1.484654 -.8495329 10.57597
binexp | 20190 .7794453 .414631 0 1
.
. /* Describe and summarize the original data.
> describe
> summarize
> * The original data are a panel.
> * The following summarizes panel features for completeness
> iis zper
> tis year
> xtdes
> xtsum meddol lnmeddol binexp
> */
.
. * Note that unlike chapter 16 we use all years, not just year 2
.
. * educdec is missing for some observations
. drop if educdec==.
(4 observations deleted)
.
. * rename variables
. rename mdvis MDU
. rename meddol MED
. rename binexp DMED
. rename lnmeddol LNMED
. rename linc LINC
. rename lfam LFAM
. rename educdec EDUCDEC
. rename xage AGE
. rename female FEMALE
. rename child CHILD
. rename fchild FEMCHILD
. rename black BLACK
. rename disea NDISEASE
. rename physlm PHYSLIM
. rename hlthg HLTHG
. rename hlthf HLTHF
. rename hlthp HLTHP
. rename idp IDP
. rename logc LC
. rename lpi LPI
. rename fmde FMDE
.
. * Define the regressor list which in commands can refer to as $XLIST
. global XLIST LC IDP LPI FMDE PHYSLIM NDISEASE HLTHG HLTHF HLTHP /*
> */ LINC LFAM EDUCDEC AGE FEMALE CHILD FEMCHILD BLACK
.
. sum MDU $XLIST
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
MDU | 20186 2.860696 4.504765 0 77
LC | 20186 2.383588 2.041713 0 4.564348
IDP | 20186 .2599822 .4386354 0 1
LPI | 20186 4.708827 2.697293 0 7.163699
FMDE | 20186 4.030322 3.471234 0 8.294049
-------------+--------------------------------------------------------
PHYSLIM | 20186 .1235247 .3220437 0 1
NDISEASE | 20186 11.2445 6.741647 0 58.6
HLTHG | 20186 .3620826 .4806144 0 1
HLTHF | 20186 .0772813 .2670439 0 1
HLTHP | 20186 .0149609 .1213992 0 1
-------------+--------------------------------------------------------
LINC | 20186 8.708167 1.22841 0 10.28324
LFAM | 20186 1.248404 .5390681 0 2.639057
EDUCDEC | 20186 11.96681 2.806255 0 25
AGE | 20186 25.71844 16.76759 0 64.27515
FEMALE | 20186 .5169424 .4997252 0 1
-------------+--------------------------------------------------------
CHILD | 20186 .4014168 .4901972 0 1
FEMCHILD | 20186 .1937481 .3952436 0 1
BLACK | 20186 .1815343 .3827365 0 1
.
. * Write final data to a text (ascii) file so can use with programs other than Stata
. outfile MDU LC IDP LPI FMDE PHYSLIM NDISEASE HLTHG HLTHF HLTHP /*
> */ LINC LFAM EDUCDEC AGE FEMALE CHILD FEMCHILD BLACK /*
> */ using mma20p1count.asc, replace
.
. ********** (1) FREQUENCIES OF COUNT (Table 20.3, page 672) **********
.
. * Following gives Table 20.3 (page 672) frequencies
. tabulate MDU
number |
face-to-fac |
t md visits | Freq. Percent Cum.
------------+-----------------------------------
0 | 6,308 31.25 31.25
1 | 3,815 18.90 50.15
2 | 2,795 13.85 63.99
3 | 1,884 9.33 73.33
4 | 1,345 6.66 79.99
5 | 968 4.80 84.79
6 | 689 3.41 88.20
7 | 531 2.63 90.83
8 | 408 2.02 92.85
9 | 287 1.42 94.27
10 | 206 1.02 95.29
11 | 190 0.94 96.24
12 | 118 0.58 96.82
13 | 109 0.54 97.36
14 | 82 0.41 97.77
15 | 59 0.29 98.06
16 | 56 0.28 98.34
17 | 33 0.16 98.50
18 | 37 0.18 98.68
19 | 35 0.17 98.86
20 | 26 0.13 98.98
21 | 22 0.11 99.09
22 | 19 0.09 99.19
23 | 19 0.09 99.28
24 | 13 0.06 99.35
25 | 8 0.04 99.39
26 | 10 0.05 99.44
27 | 6 0.03 99.46
28 | 12 0.06 99.52
29 | 6 0.03 99.55
30 | 8 0.04 99.59
31 | 8 0.04 99.63
32 | 4 0.02 99.65
33 | 5 0.02 99.68
34 | 9 0.04 99.72
35 | 5 0.02 99.75
37 | 5 0.02 99.77
38 | 9 0.04 99.82
39 | 1 0.00 99.82
40 | 3 0.01 99.84
41 | 5 0.02 99.86
44 | 6 0.03 99.89
45 | 2 0.01 99.90
46 | 2 0.01 99.91
48 | 2 0.01 99.92
51 | 1 0.00 99.93
52 | 3 0.01 99.94
55 | 1 0.00 99.95
56 | 1 0.00 99.95
57 | 1 0.00 99.96
58 | 1 0.00 99.96
62 | 1 0.00 99.97
63 | 1 0.00 99.97
65 | 1 0.00 99.98
69 | 1 0.00 99.98
72 | 1 0.00 99.99
74 | 1 0.00 99.99
76 | 1 0.00 100.00
77 | 1 0.00 100.00
------------+-----------------------------------
Total | 20,186 100.00
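The frequency table can be set against what a Poisson distribution with the same mean would predict for zero visits. The Python sketch below is an illustration added to the log, not part of the original program. It uses the unconditional sample mean of MDU (2.860696, from the summary output) and ignores the regressors, so it is only a rough benchmark.

```python
import math

n_total = 20186       # total observations in the tabulation
n_zero = 6308         # observations with MDU == 0
mean_mdu = 2.860696   # sample mean of MDU from the summary output

# Share of zeros actually observed in the data.
observed_zero_share = n_zero / n_total

# P(Y = 0) for a Poisson distribution whose mean equals the sample mean.
poisson_zero_prob = math.exp(-mean_mdu)

print(f"observed share of zeros: {observed_zero_share:.4f}")
print(f"Poisson P(Y=0) at mean : {poisson_zero_prob:.4f}")
```

The observed share of zeros is several times what an intercept-only Poisson implies, which fits the overdispersion that the regressions later in the log address.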
.
. * Histogram with kernel density estimate
. hist MDU, discrete kdensity
(start=0, width=1)
.
. ********** (2) DATA SUMMARY (Table 20.4, page 672) **********
.
. * Following gives variables in same order as Table 20.4 (page 672)
. sum MDU LC IDP LPI FMDE LINC LFAM AGE FEMALE CHILD FEMCHILD BLACK /*
> */ EDUCDEC PHYSLIM NDISEASE HLTHG HLTHF HLTHP
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
MDU | 20186 2.860696 4.504765 0 77
LC | 20186 2.383588 2.041713 0 4.564348
IDP | 20186 .2599822 .4386354 0 1
LPI | 20186 4.708827 2.697293 0 7.163699
FMDE | 20186 4.030322 3.471234 0 8.294049
-------------+--------------------------------------------------------
LINC | 20186 8.708167 1.22841 0 10.28324
LFAM | 20186 1.248404 .5390681 0 2.639057
AGE | 20186 25.71844 16.76759 0 64.27515
FEMALE | 20186 .5169424 .4997252 0 1
CHILD | 20186 .4014168 .4901972 0 1
-------------+--------------------------------------------------------
FEMCHILD | 20186 .1937481 .3952436 0 1
BLACK | 20186 .1815343 .3827365 0 1
EDUCDEC | 20186 11.96681 2.806255 0 25
PHYSLIM | 20186 .1235247 .3220437 0 1
NDISEASE | 20186 11.2445 6.741647 0 58.6
-------------+--------------------------------------------------------
HLTHG | 20186 .3620826 .4806144 0 1
HLTHF | 20186 .0772813 .2670439 0 1
HLTHP | 20186 .0149609 .1213992 0 1
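A quick check on the raw data is the ratio of the sample variance of MDU to its sample mean, which equals one under an unconditional Poisson. The Python sketch below is added for illustration and uses only the mean and standard deviation reported in the summary table. It ignores the regressors, so the ratio overstates the overdispersion that remains after conditioning on them.

```python
# Sample moments of MDU copied from the summary table above.
mean_mdu = 2.860696
sd_mdu = 4.504765

variance_mdu = sd_mdu ** 2

# Equals 1 for a Poisson variable; values well above 1 indicate overdispersion.
dispersion_ratio = variance_mdu / mean_mdu

print(f"variance = {variance_mdu:.3f}, variance/mean = {dispersion_ratio:.2f}")
```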
.
.
. *********** (3, 4) REGRESSION ANALYSIS **************
.
. * Here just two estimators - Poisson and negative binomial
. * but three ways to calculate standard errors
. * (A) default ML
. * (B) robust (to misspecification of heteroskedasticity)
. * (C) cluster-robust needed here as data are actually panel (see chapter 21, 24)
.
. *** Table 20.5 Poisson regression estimates
.
. * Default standard errors assume variance = mean (ignoring overdispersion)
. * This is first t-ratio in Table 20.5
. poisson MDU $XLIST
Iteration 0: log likelihood = -60097.599
Iteration 1: log likelihood = -60087.636
Iteration 2: log likelihood = -60087.622
Iteration 3: log likelihood = -60087.622

Poisson regression
Number of obs = 20186
LR chi2(17) = 13106.07
Prob > chi2 = 0.0000
Log likelihood = -60087.622
Pseudo R2 = 0.0983
------------------------------------------------------------------------------
MDU | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | -.0427332 .0060785 -7.03 0.000 -.0546469 -.0308195
IDP | -.1613169 .0116218 -13.88 0.000 -.1840952 -.1385385
LPI | .0128511 .0018362 7.00 0.000 .0092523 .0164499
FMDE | -.020613 .0035521 -5.80 0.000 -.027575 -.0136511
PHYSLIM | .2684048 .0123624 21.71 0.000 .2441749 .2926347
NDISEASE | .023183 .0006081 38.12 0.000 .0219912 .0243749
HLTHG | .0394004 .0095884 4.11 0.000 .0206074 .0581934
HLTHF | .2531119 .016212 15.61 0.000 .2213369 .2848869
HLTHP | .5216034 .0272382 19.15 0.000 .4682176 .5749892
LINC | .0834099 .0051656 16.15 0.000 .0732854 .0935343
LFAM | -.1296626 .0089603 -14.47 0.000 -.1472245 -.1121008
EDUCDEC | .0176149 .0016387 10.75 0.000 .0144031 .0208268
AGE | .0023756 .0004311 5.51 0.000 .0015306 .0032206
FEMALE | .3487667 .0113504 30.73 0.000 .3265203 .371013
CHILD | .3361904 .0178194 18.87 0.000 .3012649 .3711158
FEMCHILD | -.3625218 .0179396 -20.21 0.000 -.3976827 -.3273608
BLACK | -.6800518 .0155484 -43.74 0.000 -.7105262 -.6495775
_cons | -.1898766 .0491731 -3.86 0.000 -.2862541 -.093499
------------------------------------------------------------------------------
. estimates store poisml
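The header statistics of the Poisson output are linked by two identities: LR chi2 = 2(ll_full - ll_null) and pseudo R2 = 1 - ll_full/ll_null. The Python sketch below is an illustration and not part of the original program. It backs out the constant-only log likelihood from the logged LR statistic and recomputes the pseudo R2. The variable names are chosen here for exposition.

```python
# Values copied from the Poisson regression header above.
ll_full = -60087.622
lr_chi2 = 13106.07

# LR chi2 = 2 * (ll_full - ll_null), so the constant-only log likelihood is:
ll_null = ll_full - lr_chi2 / 2.0

# McFadden-type pseudo R-squared, the statistic labelled "Pseudo R2" in the header.
pseudo_r2 = 1.0 - ll_full / ll_null

print(f"ll_null = {ll_null:.3f}, pseudo R2 = {pseudo_r2:.4f}")
```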
.
. * Should always control for possible overdispersion
. * This is second t-ratio in Table 20.5
. poisson MDU $XLIST, robust
Iteration 0: log pseudo-likelihood = -60097.599
Iteration 1: log pseudo-likelihood = -60087.636
Iteration 2: log pseudo-likelihood = -60087.622
Iteration 3: log pseudo-likelihood = -60087.622

Poisson regression
Number of obs = 20186
Wald chi2(17) = 1924.78
Prob > chi2 = 0.0000
Log pseudo-likelihood = -60087.622
Pseudo R2 = 0.0983
------------------------------------------------------------------------------
| Robust
MDU | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | -.0427332 .0150712 -2.84 0.005 -.0722723 -.0131942
IDP | -.1613169 .0279441 -5.77 0.000 -.2160863 -.1065474
LPI | .0128511 .0044136 2.91 0.004 .0042007 .0215015
FMDE | -.020613 .0088874 -2.32 0.020 -.0380319 -.0031941
PHYSLIM | .2684048 .0325743 8.24 0.000 .2045604 .3322493
NDISEASE | .023183 .0017189 13.49 0.000 .019814 .0265521
HLTHG | .0394004 .023194 1.70 0.089 -.006059 .0848598
HLTHF | .2531119 .0429454 5.89 0.000 .1689405 .3372833
HLTHP | .5216034 .0748808 6.97 0.000 .3748398 .668367
LINC | .0834099 .0139182 5.99 0.000 .0561306 .1106891
LFAM | -.1296626 .0226793 -5.72 0.000 -.1741132 -.085212
EDUCDEC | .0176149 .004042 4.36 0.000 .0096927 .0255371
AGE | .0023756 .0011184 2.12 0.034 .0001837 .0045675
FEMALE | .3487667 .0283549 12.30 0.000 .293192 .4043413
CHILD | .3361904 .040411 8.32 0.000 .2569863 .4153945
FEMCHILD | -.3625218 .04415 -8.21 0.000 -.4490542 -.2759893
BLACK | -.6800518 .0368748 -18.44 0.000 -.7523252 -.6077785
_cons | -.1898766 .127516 -1.49 0.136 -.4398033 .0600502
------------------------------------------------------------------------------
. estimates store poisrobust
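The coefficients are identical in the default and robust Poisson runs; only the standard errors change. The Python sketch below, added for illustration, computes the ratio of robust to default standard errors for the four insurance variables, using the values logged above. The squared ratio is only a rough guide to how far the default variance is understated and is not a formal overdispersion estimate.

```python
# Standard errors copied from the two Poisson outputs above.
default_se = {"LC": 0.0060785, "IDP": 0.0116218, "LPI": 0.0018362, "FMDE": 0.0035521}
robust_se = {"LC": 0.0150712, "IDP": 0.0279441, "LPI": 0.0044136, "FMDE": 0.0088874}

# Ratio of robust to default standard error, variable by variable.
se_ratio = {name: robust_se[name] / default_se[name] for name in default_se}

for name, ratio in se_ratio.items():
    print(f"{name}: robust/default SE = {ratio:.2f}, squared = {ratio ** 2:.2f}")
```

For these four regressors the robust standard errors are roughly two and a half times the default ones, which is why the t-ratios fall so sharply between the two runs.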
.
. * Should also control here for clustering (see chapter 24)
. * as up to four years of data for each person.
. * Table 20.5 did not report these results
. poisson MDU $XLIST, cluster(zper)
Iteration 0: log pseudo-likelihood = -60097.599
Iteration 1: log pseudo-likelihood = -60087.636
Iteration 2: log pseudo-likelihood = -60087.622
Iteration 3: log pseudo-likelihood = -60087.622

Poisson regression
Number of obs = 20186
Wald chi2(17) = 827.07
Prob > chi2 = 0.0000
Log pseudo-likelihood = -60087.622
(standard errors adjusted for clustering on zper)
------------------------------------------------------------------------------
| Robust
MDU | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | -.0427332 .0226824 -1.88 0.060 -.0871899 .0017235
IDP | -.1613169 .0424591 -3.80 0.000 -.2445352 -.0780986
LPI | .0128511 .0067697 1.90 0.058 -.0004173 .0261195
FMDE | -.020613 .0134449 -1.53 0.125 -.0469646 .0057386
PHYSLIM | .2684048 .0491061 5.47 0.000 .1721586 .364651
NDISEASE | .023183 .0027457 8.44 0.000 .0178015 .0285645
HLTHG | .0394004 .0354001 1.11 0.266 -.0299825 .1087833
HLTHF | .2531119 .0675164 3.75 0.000 .1207822 .3854416
HLTHP | .5216034 .1163731 4.48 0.000 .2935163 .7496905
LINC | .0834099 .0200881 4.15 0.000 .0440379 .1227818
LFAM | -.1296626 .0340038 -3.81 0.000 -.1963089 -.0630164
EDUCDEC | .0176149 .0062678 2.81 0.005 .0053302 .0298996
AGE | .0023756 .0016549 1.44 0.151 -.0008681 .0056192
FEMALE | .3487667 .0432567 8.06 0.000 .263985 .4335483
CHILD | .3361904 .0586109 5.74 0.000 .2213151 .4510656
FEMCHILD | -.3625218 .0660639 -5.49 0.000 -.4920045 -.233039
BLACK | -.6800518 .0544268 -12.49 0.000 -.7867263 -.5733774
_cons | -.1898766 .1860343 -1.02 0.307 -.5544971 .174744
------------------------------------------------------------------------------
. estimates store poiscluster
.
. *** Table 20.5 Negative binomial regression estimates
.
. * Default standard errors assume the negative binomial variance is correctly specified
. * This is first t-ratio in Table 20.5
. nbreg MDU $XLIST
Fitting Poisson model:
Iteration 0: log likelihood = -60097.599
Iteration 1: log likelihood = -60087.636
Iteration 2: log likelihood = -60087.622
Iteration 3: log likelihood = -60087.622

Fitting constant-only model:
Iteration 0: log likelihood = -44579.449
Iteration 1: log likelihood = -44192.261
Iteration 2: log likelihood = -44191.615
Iteration 3: log likelihood = -44191.615

Fitting full model:
Iteration 0: log likelihood = -42968.574
Iteration 1: log likelihood = -42783.342
Iteration 2: log likelihood = -42777.614
Iteration 3: log likelihood = -42777.611

Negative binomial regression
Number of obs = 20186
LR chi2(17) = 2828.01
Prob > chi2 = 0.0000
Log likelihood = -42777.611
Pseudo R2 = 0.0320
------------------------------------------------------------------------------
MDU | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | -.0504405 .0128694 -3.92 0.000 -.0756641 -.0252169
IDP | -.1475976 .0254099 -5.81 0.000 -.1974001 -.0977951
LPI | .0158351 .0040586 3.90 0.000 .0078805 .0237898
FMDE | -.021335 .0075119 -2.84 0.005 -.036058 -.0066119
PHYSLIM | .2751715 .0295572 9.31 0.000 .2172404 .3331026
NDISEASE | .0259352 .0014827 17.49 0.000 .0230292 .0288412
HLTHG | .0065371 .0202235 0.32 0.747 -.0331002 .0461744
HLTHF | .2368643 .0374086 6.33 0.000 .1635448 .3101837
HLTHP | .4256563 .0741812 5.74 0.000 .2802638 .5710488
LINC | .0845165 .0085659 9.87 0.000 .0677277 .1013053
LFAM | -.1226764 .019308 -6.35 0.000 -.1605195 -.0848333
EDUCDEC | .0162582 .0034846 4.67 0.000 .0094285 .0230879
AGE | .0025943 .0009433 2.75 0.006 .0007455 .0044432
FEMALE | .3672884 .024005 15.30 0.000 .3202395 .4143373
CHILD | .3060317 .0385618 7.94 0.000 .230452 .3816115
FEMCHILD | -.3755503 .0371392 -10.11 0.000 -.4483418 -.3027587
BLACK | -.7104372 .0274929 -25.84 0.000 -.7643223 -.6565521
_cons | -.2069298 .0899431 -2.30 0.021 -.3832151 -.0306445
-------------+----------------------------------------------------------------
/lnalpha | .1674206 .0147901 .1384326 .1964087
-------------+----------------------------------------------------------------
alpha | 1.182251 .0174856 1.148472 1.217024
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 3.5e+04 Prob>=chibar2 = 0.000
. estimates store nbml
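The output reports both /lnalpha and alpha because the dispersion parameter is estimated on the log scale. The Python sketch below, added for illustration, recovers alpha and its interval by exponentiating the logged values. It then evaluates the NB2 variance function mu + alpha*mu^2 at the sample mean of MDU, on the assumption that nbreg was run with its default mean-dispersion (NB2) parameterization. The helper name nb2_variance is hypothetical.

```python
import math

# Values copied from the /lnalpha row of the output above.
ln_alpha = 0.1674206
ci_ln_alpha = (0.1384326, 0.1964087)

# alpha and its confidence interval are obtained by exponentiating.
alpha = math.exp(ln_alpha)
ci_alpha = tuple(math.exp(bound) for bound in ci_ln_alpha)


def nb2_variance(mu, alpha):
    """Conditional variance under NB2: mu + alpha * mu**2 (assumed parameterization)."""
    return mu + alpha * mu ** 2


mean_mdu = 2.860696  # sample mean of MDU, used only as an illustrative value of mu
implied_variance = nb2_variance(mean_mdu, alpha)

print(f"alpha = {alpha:.6f}, 95% CI = ({ci_alpha[0]:.6f}, {ci_alpha[1]:.6f})")
print(f"NB2 variance at mu = {mean_mdu}: {implied_variance:.3f}")
```

With alpha above one, the implied variance at the sample mean is several times the mean, whereas the Poisson restriction alpha = 0 would force the two to be equal.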
.
. * Robust standard errors guard against misspecification of the negative binomial variance
. * This is second t-ratio in Table 20.5
. nbreg MDU $XLIST, robust
Fitting Poisson model:
Iteration 0: log pseudo-likelihood = -60097.599
Iteration 1: log pseudo-likelihood = -60087.636
Iteration 2: log pseudo-likelihood = -60087.622
Iteration 3: log pseudo-likelihood = -60087.622

Fitting constant-only model:
Iteration 0: log pseudo-likelihood = -44579.449
Iteration 1: log pseudo-likelihood = -44192.261
Iteration 2: log pseudo-likelihood = -44191.615
Iteration 3: log pseudo-likelihood = -44191.615

Fitting full model:
Iteration 0: log pseudo-likelihood = -42968.574
Iteration 1: log pseudo-likelihood = -42783.342
Iteration 2: log pseudo-likelihood = -42777.614
Iteration 3: log pseudo-likelihood = -42777.611

Negative binomial regression
Number of obs = 20186
Wald chi2(17) = 2203.12
Prob > chi2 = 0.0000
Log pseudo-likelihood = -42777.611
Pseudo R2 = 0.0320
------------------------------------------------------------------------------
| Robust
MDU | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | -.0504405 .0156238 -3.23 0.001 -.0810625 -.0198184
IDP | -.1475976 .0303777 -4.86 0.000 -.2071367 -.0880585
LPI | .0158351 .004431 3.57 0.000 .0071505 .0245197
FMDE | -.021335 .0090748 -2.35 0.019 -.0391211 -.0035488
PHYSLIM | .2751715 .0341067 8.07 0.000 .2083235 .3420195
NDISEASE | .0259352 .0016925 15.32 0.000 .022618 .0292524
HLTHG | .0065371 .023814 0.27 0.784 -.0401375 .0532118
HLTHF | .2368643 .0436579 5.43 0.000 .1512963 .3224322
HLTHP | .4256563 .0686042 6.20 0.000 .2911945 .560118
LINC | .0845165 .0113918 7.42 0.000 .0621891 .106844
LFAM | -.1226764 .0231639 -5.30 0.000 -.1680769 -.0772759
EDUCDEC | .0162582 .0040332 4.03 0.000 .0083533 .024163
AGE | .0025943 .0011128 2.33 0.020 .0004133 .0047753
FEMALE | .3672884 .0285724 12.85 0.000 .3112876 .4232892
CHILD | .3060317 .0428976 7.13 0.000 .221954 .3901095
FEMCHILD | -.3755503 .0447039 -8.40 0.000 -.4631682 -.2879323
BLACK | -.7104372 .0359462 -19.76 0.000 -.7808903 -.639984
_cons | -.2069298 .1130753 -1.83 0.067 -.4285533 .0146938
-------------+----------------------------------------------------------------
/lnalpha | .1674206 .0187562 .1306591 .2041821
-------------+----------------------------------------------------------------
alpha | 1.182251 .0221746 1.139579 1.226522
------------------------------------------------------------------------------
. estimates store nbrobust
.
. * Should also control here for clustering (see chapter 24)
. * as up to four years of data for each person.
. * Table 20.5 did not report these results


. nbreg MDU $XLIST, cluster(zper)
Fitting Poisson model:
Iteration 0: log pseudo-likelihood = -60097.599
Iteration 1: log pseudo-likelihood = -60087.636
Iteration 2: log pseudo-likelihood = -60087.622
Iteration 3: log pseudo-likelihood = -60087.622

Fitting constant-only model:
Iteration 0: log pseudo-likelihood = -44579.449
Iteration 1: log pseudo-likelihood = -44192.261
Iteration 2: log pseudo-likelihood = -44191.615
Iteration 3: log pseudo-likelihood = -44191.615

Fitting full model:
Iteration 0: log pseudo-likelihood = -42968.574
Iteration 1: log pseudo-likelihood = -42783.342
Iteration 2: log pseudo-likelihood = -42777.614
Iteration 3: log pseudo-likelihood = -42777.611

Negative binomial regression
Number of obs = 20186
Wald chi2(17) = 1034.43
Prob > chi2 = 0.0000
Log pseudo-likelihood = -42777.611
(standard errors adjusted for clustering on zper)
------------------------------------------------------------------------------
| Robust
MDU | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | -.0504405 .0236804 -2.13 0.033 -.0968533 -.0040277
IDP | -.1475976 .0457769 -3.22 0.001 -.2373186 -.0578766
LPI | .0158351 .0066968 2.36 0.018 .0027096 .0289607
FMDE | -.021335 .0137245 -1.55 0.120 -.0482344 .0055645
PHYSLIM | .2751715 .0489905 5.62 0.000 .1791519 .371191
NDISEASE | .0259352 .0025814 10.05 0.000 .0208758 .0309946
HLTHG | .0065371 .0359676 0.18 0.856 -.0639581 .0770323
HLTHF | .2368643 .0653989 3.62 0.000 .1086848 .3650437
HLTHP | .4256563 .1000813 4.25 0.000 .2295005 .621812
LINC | .0845165 .0152197 5.55 0.000 .0546864 .1143467
LFAM | -.1226764 .0340453 -3.60 0.000 -.189404 -.0559488
EDUCDEC | .0162582 .0059501 2.73 0.006 .0045962 .0279202
AGE | .0025943 .001581 1.64 0.101 -.0005045 .0056931
FEMALE | .3672884 .0420327 8.74 0.000 .2849059 .4496709
CHILD | .3060317 .0598167 5.12 0.000 .1887932 .4232702
FEMCHILD | -.3755503 .0649845 -5.78 0.000 -.5029175 -.2481831
BLACK | -.7104372 .0531155 -13.38 0.000 -.8145417 -.6063326
_cons | -.2069298 .1576721 -1.31 0.189 -.5159613 .1021018
-------------+----------------------------------------------------------------
/lnalpha | .1674206 .0252599 .1179121 .2169291
-------------+----------------------------------------------------------------
alpha | 1.182251 .0298635 1.125145 1.242256
------------------------------------------------------------------------------
. estimates store nbcluster
.
. ************ DISPLAY RESULTS FOR TABLE 20.5 (page 673) ************
.
. * Note for brevity the coefficients for only some of the regressors
. * are given in Table 20.5
.
. * First columns of Table 20.5 (page 673) plus cluster-robust
. estimates table poisml poisrobust poiscluster, t stats(N ll rank aic bic) b(%10.4f) t(%10.3f)
----------------------------------------------------
Variable | poisml poisrobust poisclus~r
-------------+--------------------------------------
LC | -0.0427 -0.0427 -0.0427
| -7.030 -2.835 -1.884
IDP | -0.1613 -0.1613 -0.1613
| -13.881 -5.773 -3.799
LPI | 0.0129 0.0129 0.0129
| 6.999 2.912 1.898
FMDE | -0.0206 -0.0206 -0.0206
| -5.803 -2.319 -1.533
PHYSLIM | 0.2684 0.2684 0.2684
| 21.711 8.240 5.466
NDISEASE | 0.0232 0.0232 0.0232
| 38.124 13.487 8.443
HLTHG | 0.0394 0.0394 0.0394
| 4.109 1.699 1.113
HLTHF | 0.2531 0.2531 0.2531
| 15.613 5.894 3.749
HLTHP | 0.5216 0.5216 0.5216
| 19.150 6.966 4.482
LINC | 0.0834 0.0834 0.0834
| 16.147 5.993 4.152
LFAM | -0.1297 -0.1297 -0.1297
| -14.471 -5.717 -3.813
EDUCDEC | 0.0176 0.0176 0.0176
| 10.749 4.358 2.810
AGE | 0.0024 0.0024 0.0024
| 5.510 2.124 1.435
FEMALE | 0.3488 0.3488 0.3488
| 30.727 12.300 8.063
CHILD | 0.3362 0.3362 0.3362
| 18.866 8.319 5.736
FEMCHILD | -0.3625 -0.3625 -0.3625
| -20.208 -8.211 -5.487
BLACK | -0.6801 -0.6801 -0.6801
| -43.738 -18.442 -12.495
_cons | -0.1899 -0.1899 -0.1899
| -3.861 -1.489 -1.021
-------------+--------------------------------------
N | 20186.0000 20186.0000 20186.0000
ll | -6.009e+04 -6.009e+04 -6.009e+04
rank | 18.0000 18.0000 18.0000
aic | 1.202e+05 1.202e+05 1.202e+05
bic | 1.204e+05 1.204e+05 1.204e+05
----------------------------------------------------
legend: b/t
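Each t-ratio in the table is the coefficient divided by the matching standard error, so the three columns differ only through the denominator. The Python sketch below, added for illustration, reproduces the LC row from the coefficient and the three standard errors logged in the Poisson outputs. The dictionary names are chosen here for exposition.

```python
# LC coefficient and its three standard errors, copied from the Poisson outputs.
coef_lc = -0.0427332
se_lc = {"poisml": 0.0060785, "poisrobust": 0.0150712, "poiscluster": 0.0226824}

# t-ratios as printed in the estimates table.
reported_t = {"poisml": -7.030, "poisrobust": -2.835, "poiscluster": -1.884}

# Recompute each t-ratio as coefficient / standard error.
t_ratio = {name: coef_lc / se for name, se in se_lc.items()}

for name in se_lc:
    print(f"{name}: computed t = {t_ratio[name]:.3f}, reported t = {reported_t[name]:.3f}")
```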
.
. * Last columns of Table 20.5 (page 673) give nbml. Also give others.
. estimates table nbml nbrobust nbcluster, t stats(N ll rank aic bic) b(%10.4f) t(%10.3f)
----------------------------------------------------
Variable | nbml nbrobust nbcluster
-------------+--------------------------------------
MDU |
LC | -0.0504 -0.0504 -0.0504
| -3.919 -3.228 -2.130
IDP | -0.1476 -0.1476 -0.1476
| -5.809 -4.859 -3.224
LPI | 0.0158 0.0158 0.0158
| 3.902 3.574 2.365
FMDE | -0.0213 -0.0213 -0.0213
| -2.840 -2.351 -1.555
PHYSLIM | 0.2752 0.2752 0.2752
| 9.310 8.068 5.617
NDISEASE | 0.0259 0.0259 0.0259
| 17.492 15.324 10.047
HLTHG | 0.0065 0.0065 0.0065
| 0.323 0.275 0.182
HLTHF | 0.2369 0.2369 0.2369
| 6.332 5.425 3.622
HLTHP | 0.4257 0.4257 0.4257
| 5.738 6.205 4.253
LINC | 0.0845 0.0845 0.0845
| 9.867 7.419 5.553
LFAM | -0.1227 -0.1227 -0.1227
| -6.354 -5.296 -3.603
EDUCDEC | 0.0163 0.0163 0.0163
| 4.666 4.031 2.732
AGE | 0.0026 0.0026 0.0026
| 2.750 2.331 1.641
FEMALE | 0.3673 0.3673 0.3673
| 15.300 12.855 8.738
CHILD | 0.3060 0.3060 0.3060
| 7.936 7.134 5.116
FEMCHILD | -0.3756 -0.3756 -0.3756
| -10.112 -8.401 -5.779
BLACK | -0.7104 -0.7104 -0.7104
| -25.841 -19.764 -13.375
_cons | -0.2069 -0.2069 -0.2069
| -2.301 -1.830 -1.312
-------------+--------------------------------------
lnalpha |
_cons | 0.1674 0.1674 0.1674
| 11.320 8.926 6.628
-------------+--------------------------------------
Statistics |
N | 20186.0000 20186.0000 20186.0000
ll | -4.278e+04 -4.278e+04 -4.278e+04
rank | 19.0000 19.0000 19.0000
aic | 85593.2220 85593.2220 85593.2220
bic | 85743.5642 85743.5642 85743.5642
----------------------------------------------------
legend: b/t
.
. * For Poisson correcting for overdispersion is most important.
. * For negative binomial overdispersion is already incorporated.
. * For both controlling for clustering (in this example with panel data)
. * is also needed.
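
The three columns of t statistics in each table above share the same coefficients and differ only in the variance estimator. A minimal sketch of that comparison on simulated data follows. It assumes a recent Stata release that accepts the vce() option and provides rnormal() and rpoisson(); the variables y, x, a, id, t and all parameter values are made up for illustration and are not the book's data.

```stata
* Simulated panel: 200 individuals, 5 periods each (hypothetical data)
clear
set seed 12345
set obs 200
gen id = _n
gen a = rnormal()                 // individual effect, induces clustering
expand 5
bysort id: gen t = _n
gen x = rnormal()
gen y = rpoisson(exp(0.5*x + 0.5*a))

* Same Poisson point estimates, three variance estimators
poisson y x                       // default ML standard errors
estimates store ml
poisson y x, vce(robust)          // robust to overdispersion
estimates store rob
poisson y x, vce(cluster id)      // also robust to within-id correlation
estimates store clu

estimates table ml rob clu, t b(%10.4f) t(%10.3f)
```

In a design like this the coefficient on x is identical across the three columns and only the t statistics change, which is the pattern the tables above display.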
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section4\mma20p1count.txt
log type: text
closed on: 20 May 2005, 08:41:56

-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section5\mma21p1panfeandre.txt
log type: text
opened on: 23 May 2005, 11:27:25
.
. ********** OVERVIEW OF MMA21P1PANFEANDRE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 21.3.1-3 pages 709-14
. * Program performs basic panel analysis, mainly using XTREG:
. * It derives most of Table 21.1 and Figures 21.1-21.4
. * (1) pooled OLS
. * (2) between
. * (3) within (or fixed effects)
. * (4) first differences
. * (5) random effects - GLS
. * (6) random effects - MLE
. * (7) Hausman test of FE versus RE
. * Standard errors are default plus panel bootstrap
.
. * The individual effects model is
. * y_it = x_it'b + a_i + e_it
. * Default panel output assumes e_it is i.i.d.
. * This is usually too strong an assumption.
. * Instead should get panel-robust or cluster-robust errors after xtreg
. * See Section 21.2.3 pages 709-12
. * Stata Version 8 does not do this but Stata version 9 does.
.
. * Three ways to obtain panel-robust se's for fixed and random effects models:
. * (1) Use Stata version 9 and cluster option in xtreg
. * (2) Use Stata version 8 xtreg and then panel bootstrap (this program)
. * (3) Use Stata version 8 regress cluster option on transformed model (next program)
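
As a rough illustration of routes (1) and (2) in current syntax, the sketch below fits fixed and random effects models with cluster-robust and panel-bootstrap standard errors. It assumes a recent Stata release in which xtset and the vce() option are available. The data are simulated, so only the variable names lnhr, lnwg, id and year echo this program; the values do not.

```stata
* Simulated balanced panel: 100 individuals, 10 years (hypothetical data)
clear
set seed 10001
set obs 100
gen id = _n
gen a = rnormal()                          // individual effect a_i
expand 10
bysort id: gen year = 1978 + _n
gen lnwg = 2.6 + 0.5*a + rnormal()
gen lnhr = 7.4 + 0.1*lnwg + a + rnormal()

* Declare the panel structure once
xtset id year

* Route (1): cluster-robust standard errors directly in xtreg
xtreg lnhr lnwg, fe vce(cluster id)
xtreg lnhr lnwg, re vce(cluster id)

* Route (2): panel bootstrap, resampling whole individuals
* (50 replications only to keep the sketch fast)
xtreg lnhr lnwg, fe vce(bootstrap, reps(50) seed(10001))
```

The number of replications is kept small here for speed; the program itself sets nreps to 500.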
.
. * The four basic linear panel programs are
. * mma21p1panfeandre.do Linear fixed and random effects using xtreg
. * mma21p2panfeandre.do Linear fe and re using transformation and regress
. *                        plus also has valid Hausman test
. * mma21p3panresiduals.do Residual analysis after linear fe and re
. * mma21p4panpangls.do Pooled panel OLS and GLS
.
. * To run this program you need data file
. * MOM.dat
.
. * To speed up this program reduce nreps, the number of bootstraps
. * used in the panel bootstrap to get panel-robust standard errors
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
.
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
.
. * File MOM.dat is the version of the data posted at the JBES website
. * Note that in chapter 22 we instead use MOMprecise.dat
. * which is the same data set but with more significant digits
.
. ********** READ DATA **********
.
. * The data are in ascii file MOM.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOM.dat
(5320 observations read)
.
. ********** DATA TRANSFORMATIONS AND CHECK **********
.
. * Create year dummies
. tabulate year, generate(dyear)
       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       1979 |        532       10.00       10.00
       1980 |        532       10.00       20.00
       1981 |        532       10.00       30.00
       1982 |        532       10.00       40.00
       1983 |        532       10.00       50.00
       1984 |        532       10.00       60.00
       1985 |        532       10.00       70.00
       1986 |        532       10.00       80.00
       1987 |        532       10.00       90.00
       1988 |        532       10.00      100.00
------------+-----------------------------------
      Total |      5,320      100.00
.
. * The following lists the variables in data set and summarizes data
. describe
Contains data
  obs:         5,320
 vars:            18
 size:       244,720 (97.6% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
lnhr            float  %9.0g
lnwg            float  %9.0g
kids            float  %9.0g
ageh            float  %9.0g
agesq           float  %9.0g
disab           float  %9.0g
id              float  %9.0g
year            float  %9.0g
dyear1          byte   %8.0g                  year== 1979.0000
dyear2          byte   %8.0g                  year== 1980.0000
dyear3          byte   %8.0g                  year== 1981.0000
dyear4          byte   %8.0g                  year== 1982.0000
dyear5          byte   %8.0g                  year== 1983.0000
dyear6          byte   %8.0g                  year== 1984.0000
dyear7          byte   %8.0g                  year== 1985.0000
dyear8          byte   %8.0g                  year== 1986.0000
dyear9          byte   %8.0g                  year== 1987.0000
dyear10         byte   %8.0g                  year== 1988.0000
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |      5320     7.65743    .2855914       2.77       8.56
        lnwg |      5320    2.609436    .4258924       -.26       4.69
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
      dyear1 |      5320          .1    .3000282          0          1
      dyear2 |      5320          .1    .3000282          0          1
-------------+--------------------------------------------------------
      dyear3 |      5320          .1    .3000282          0          1
      dyear4 |      5320          .1    .3000282          0          1
      dyear5 |      5320          .1    .3000282          0          1
      dyear6 |      5320          .1    .3000282          0          1
      dyear7 |      5320          .1    .3000282          0          1
-------------+--------------------------------------------------------
      dyear8 |      5320          .1    .3000282          0          1
      dyear9 |      5320          .1    .3000282          0          1
     dyear10 |      5320          .1    .3000282          0          1
. save mom, replace
file mom.dta saved
.
. * The following summarizes panel features for completeness
. iis id
. tis year
. xtdes
      id:  1, 2, ..., 532                                    n =        532
    year:  1979, 1980, ..., 1988                             T =         10
           Delta(year) = 1; (1988-1979)+1 = 10
           (id*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%     50%     75%     95%     max
                        10      10      10      10      10      10      10

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+------------
      532    100.00  100.00 |  1111111111
 ---------------------------+------------
      532    100.00         |  XXXXXXXXXX
. xtsum lnhr lnwg kids ageh agesq disab
Variable         |      Mean   Std. Dev.        Min        Max |  Observations
-----------------+---------------------------------------------+---------------
lnhr     overall |   7.65743   .2855914        2.77       8.56 |   N =    5320
         between |             .1790083       6.416      8.242 |   n =     532
         within  |             .2226492     3.66943   9.001431 |   T =      10
                 |                                             |
lnwg     overall |  2.609436   .4258924        -.26       4.69 |   N =    5320
         between |             .3911937       1.346      4.543 |   n =     532
         within  |             .1691472    .0694361   4.487436 |   T =      10
                 |                                             |
kids     overall |  1.555827   1.195924           0          6 |   N =    5320
         between |             1.032205           0        5.4 |   n =     532
         within  |              .605468   -2.444173   5.055827 |   T =      10
                 |                                             |
ageh     overall |  38.91823   8.450351          22         60 |   N =    5320
         between |             7.945371        26.5       55.5 |   n =     532
         within  |             2.895916    32.71823   52.21823 |   T =      10
                 |                                             |
agesq    overall |  1586.024   689.7759         484       3600 |   N =    5320
         between |             650.9138       710.5     3088.5 |   n =     532
         within  |             229.8235    963.3239   2581.724 |   T =      10
                 |                                             |
disab    overall |  .0609023   .2391734           0          1 |   N =    5320
         between |             .1657419           0          1 |   n =     532
         within  |             .1725689   -.8390977   .9609023 |   T =      10

.
. ********** DEFINE GLOBALS INCLUDING REGRESSOR LIST **********
.
. * Number of reps for the bootstrap
. * Table 21.2 page 710 used 500
. global nreps 500
.
. * The regressions below are of lnhr on lnwg
. * Additional regressors to be included below are defined in xextra
. * Choose one of the following
.
. * No additional regressors
. global xextra
. global xextrashort
.
. * Include year dummies with one omitted (or two omitted for first differences)
. * global xextra dyear1 dyear2 dyear3 dyear4 dyear5 dyear6 dyear7 dyear8 dyear9
. * global xextrashort dyear2 dyear3 dyear4 dyear5 dyear6 dyear7 dyear8 dyear9
.
. * Include socioeconomic characteristics
. * global xextra kids ageh agesq disab
. * global xextrashort kids ageh agesq disab
.
. ********* DIFFERENT PANEL ESTIMATES pages 709-14 **********
.
. * Note that the first xt command needs the option i(id)
. * to indicate that id identifies the individual (panel) unit
.
. * XTDATA permits plots of between, within and overall
. * Useful for looking at the data. See Stata manual under xtdata for example.
. * XTREG gives between, within and RE estimates though not correct standard errors
.
. * The graphs below use new Stata 8 graphics
. * Change graphics scheme from default s2color to s1mono for printing
. set scheme s1mono
. * The following graphs include
. * legend(pos(4) ring(0) col(1))
. *   changes position of legend to four o'clock
. * legend( label(1 "Data used") label(2 "Smoothed fit") label(3 "Linear fit"))
. *   changes labels for the legends
.
. *** (1) POOLED OLS (OVERALL) REGRESSION (Table 21.2 POLS column and Figure 21.1)
.
. use mom, clear
.
. * Default OLS standard errors are wrong here: they require e_it to be i.i.d.
. regress lnhr lnwg $xextra
      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   82.22
       Model |  6.60538417     1  6.60538417           Prob > F      =  0.0000
    Residual |  427.225206  5318  .080335691           R-squared     =  0.0152
-------------+------------------------------           Adj R-squared =  0.0150
       Total |  433.830591  5319  .081562435           Root MSE      =  .28344

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0091251     9.07   0.000     .0648545    .1006326
       _cons |   7.441516   .0241265   308.44   0.000     7.394219    7.488814
------------------------------------------------------------------------------
. estimates store polsiid
.
. * White heteroskedastic-consistent standard errors are also wrong here:
. * they require that e_it is independent over both i and t
. regress lnhr lnwg $xextra, robust
Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,  5318) =   16.61
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0152
                                                       Root MSE      =  .28344

------------------------------------------------------------------------------
             |               Robust
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0203042     4.08   0.000     .0429391     .122548
       _cons |   7.441516   .0548992   135.55   0.000     7.333891    7.549141
------------------------------------------------------------------------------
. estimates store polshet
.
. * Correct panel robust standard errors
. regress lnhr lnwg $xextra, cluster(id)
Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,   531) =    7.99
                                                       Prob > F      =  0.0049
                                                       R-squared     =  0.0152
Number of clusters (id) = 532                          Root MSE      =  .28344

------------------------------------------------------------------------------
             |               Robust
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0292711     2.83   0.005     .0252421     .140245
       _cons |   7.441516    .079587    93.50   0.000     7.285172     7.59786
------------------------------------------------------------------------------
. estimates store polspanel
.
. * Correct panel bootstrap standard errors
. * Note the cluster option, so that the bootstrap resamples over i only and not over both i and t
. set seed 10001
. bs "regress lnhr lnwg $xextra" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      regress lnhr lnwg
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500   .0827435  -.0005317   .0298395    .024117   .1413701  (N)
             |                                           .027782   .1408137  (P)
             |                                          .0284079   .1434854  (BC)
       _bs_2 |   500   7.441516    .001375   .0805676   7.283223    7.59981  (N)
             |                                          7.281352   7.593587  (P)
             |                                          7.269371   7.585756  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix polsbootse = e(se)
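
The bs command used above is the Stata 8 form. In later releases the same cluster bootstrap is usually written with the bootstrap prefix. The sketch below assumes such a release and uses simulated data; the variable names lnhr, lnwg and id echo this program, but the values and the matrix name bootse are made up for illustration.

```stata
* Simulated panel: 100 individuals, 10 observations each (hypothetical data)
clear
set seed 10001
set obs 100
gen id = _n
gen a = rnormal()
expand 10
gen lnwg = 2.6 + rnormal()
gen lnhr = 7.4 + 0.1*lnwg + a + rnormal()

* Resample whole individuals (clusters), not single observations
* (50 replications only to keep the sketch fast)
bootstrap _b, reps(50) seed(10001) cluster(id): regress lnhr lnwg

* Keep the bootstrap standard errors for later display
matrix bootse = e(se)
matrix list bootse
```

Resampling by id keeps all observations on an individual together, so the within-individual error correlation is preserved in every bootstrap sample.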
.
. * Overall plot of data with lowess local regression line - Figure 21.1 page 712
. graph twoway (scatter lnhr lnwg, msize(vsmall)) (lowess lnhr lnwg) (lfit lnhr lnwg), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Pooled (Overall) Regression") /*
> */ xtitle("Log hourly wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log annual hours", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Original data") label(2 "Nonparametric fit") label(3 "Linear fit"))
. graph export ch21pantot.wmf, replace
(file c:\Imbook\bwebpage\Section5\ch21pantot.wmf written in Windows Metafile format)
.
. *** (2) BETWEEN REGRESSION (Table 21.2 Between column and Figure 21.2)
.
. use mom, clear
.
. * Usual standard errors assume iid error
. xtreg lnhr lnwg, be i(id)
Between regression (regression on group means)  Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,530)           =     11.55
sd(u_i + avg(e_i.))=  .1772555                  Prob > F           =    0.0007

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0668379   .0196635     3.40   0.001     .0282099    .1054658
       _cons |   7.483021   .0518829   144.23   0.000       7.3811    7.584943
------------------------------------------------------------------------------
. estimates store beiid
.
. * Heteroskedasticity robust standard errors
. * Stata has no option for this. See ch21panel2.do
.
. * Correct panel bootstrap standard errors
. set seed 10001
. bootstrap "xtreg lnhr lnwg, be i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      xtreg lnhr lnwg , be i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500   .0668379  -.0005547   .0192363   .0290438   .1046319  (N)
             |                                          .0240799   .1059889  (P)
             |                                          .0274993   .1066802  (BC)
       _bs_2 |   500   7.483021   .0016537   .0519151   7.381022    7.58502  (N)
             |                                          7.383433   7.595335  (P)
             |                                          7.382822   7.592656  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix bebootse = e(se)
.
. * Between plot of data with lowess local regression line - Figure 21.2 page 712
. iis id
. xtdata, be
. graph twoway (scatter lnhr lnwg, msize(vsmall)) (lowess lnhr lnwg) (lfit lnhr lnwg), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Between Regression") /*
> */ xtitle("Log hourly wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log annual hours", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Averages") label(2 "Nonparametric fit") label(3 "Linear fit"))
. graph export ch21panbe.wmf, replace
(file c:\Imbook\bwebpage\Section5\ch21panbe.wmf written in Windows Metafile format)
.
. *** (3) WITHIN (FIXED EFFECTS) REGRESSION (Table 21.2 Within column and Figure 21.3)
.
. use mom, clear
.
. * Usual standard errors assume iid error
. xtreg lnhr lnwg $xextra, fe i(id)
Fixed-effects (within) regression               Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,4787)          =     78.96
corr(u_i, Xb)  = -0.1995                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1676755     .01887     8.89   0.000     .1306816    .2046694
       _cons |   7.219892   .0493434   146.32   0.000     7.123156    7.316628
-------------+----------------------------------------------------------------
     sigma_u |  .18142881
     sigma_e |  .23278339
         rho |  .37789558   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(531, 4787) =     5.83           Prob > F = 0.0000
. estimates store feiid
.
. * Correct panel robust standard errors
. * Stata has no option for this. See ch21panel2.do
.
. * Correct panel bootstrap standard errors
. set seed 10001
. bootstrap "xtreg lnhr lnwg $xextra, fe i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level
> (95)
command:      xtreg lnhr lnwg , fe i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500   .1676755  -.0055543   .0844631   .0017284   .3336226  (N)
             |                                          .0213276   .3318829  (P)
             |                                          .0300515   .3605573  (BC)
       _bs_2 |   500   7.219892     .01461    .223047   6.781665   7.658119  (N)
             |                                          6.782279   7.604026  (P)
             |                                          6.683465   7.574718  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix febootse = e(se)
.
. * Within plot of data with lowess local regression line - Figure 21.3 page 712
. iis id
. xtdata, fe
. graph twoway (scatter lnhr lnwg, msize(vsmall)) (lowess lnhr lnwg) (lfit lnhr lnwg), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Within (Fixed Effects) Regression") /*
> */ xtitle("Log hourly wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log annual hours", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Deviations from average") label(2 "Nonparametric fit") label(3 "Linear fit")
>)
. graph export ch21panfe.wmf, replace
(file c:\Imbook\bwebpage\Section5\ch21panfe.wmf written in Windows Metafile format)
.
. *** (4) FIRST DIFFERENCES REGRESSION (Table 21.2 First diff column and Figure 21.4)
.
. * Stata has no command for first differences regression
. * Though may be possible with xtabond
. * Instead need to create differenced data
.
. use mom, clear
. * The following only works if each observation is (i,t)
. * and within i the data are ordered by t
. gen dlnhr = lnhr - lnhr[_n-1]
(1 missing value generated)
. gen dlnwg = lnwg - lnwg[_n-1]
(1 missing value generated)
. gen dkids = kids - kids[_n-1]
(1 missing value generated)
. gen dageh = ageh - ageh[_n-1]
(1 missing value generated)

. gen dagesq = agesq - agesq[_n-1]
(1 missing value generated)
. gen ddisab = disab - disab[_n-1]
(1 missing value generated)
. * The following drops the first year which here is 1979
. drop if year == 1979
(532 observations deleted)
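
The differencing above relies on the sort order of the file and on dropping the first year afterwards. A sketch of two alternatives that do not depend on either follows. It assumes a Stata release with xtset and the vce() option, and uses simulated data: the names lnhr, lnwg, id and year echo this program, while the values and the scalar b_D are made up for illustration.

```stata
* Simulated balanced panel: 100 individuals, 10 years (hypothetical data)
clear
set seed 10001
set obs 100
gen id = _n
expand 10
bysort id: gen year = 1978 + _n
gen lnwg = rnormal()
gen lnhr = 0.1*lnwg + rnormal()

* Alternative 1: declare the panel and use the D. difference operator.
* D.x is missing for the first year of each id, so no drop is needed.
xtset id year
regress D.lnhr D.lnwg, vce(cluster id)
scalar b_D = _b[D.lnwg]

* Alternative 2: difference within id explicitly, sorted by year
bysort id (year): gen double dlnhr = lnhr - lnhr[_n-1]
bysort id (year): gen double dlnwg = lnwg - lnwg[_n-1]
regress dlnhr dlnwg, vce(cluster id)
```

Both regressions use the same 900 differenced observations, so the slope estimates should agree.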
.
. * Usual standard errors assume iid error
. regress dlnhr dlnwg $xextrashort
      Source |       SS       df       MS              Number of obs =    4788
-------------+------------------------------           F(  1,  4786) =   26.09
       Model |  2.27870825     1  2.27870825           Prob > F      =  0.0000
    Residual |  417.943979  4786  .087326364           R-squared     =  0.0054
-------------+------------------------------           Adj R-squared =  0.0052
       Total |  420.222687  4787  .087784142           Root MSE      =  .29551

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1089851   .0213351     5.11   0.000     .0671584    .1508118
       _cons |   .0008283   .0042712     0.19   0.846    -.0075452    .0092018
------------------------------------------------------------------------------
. estimates store fdiffiid
.
. * Correct panel robust standard errors
. regress dlnhr dlnwg $xextrashort, cluster(id)
Regression with robust standard errors                 Number of obs =    4788
                                                       F(  1,   531) =    1.69
                                                       Prob > F      =  0.1936
                                                       R-squared     =  0.0054
Number of clusters (id) = 532                          Root MSE      =  .29551

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1089851   .0837266     1.30   0.194    -.0554909    .2734612
       _cons |   .0008283   .0016148     0.51   0.608    -.0023439    .0040005
------------------------------------------------------------------------------
. estimates store fdiffpanel
.
. * "Robust" standard errors only control for heteroskedasticity
. regress dlnhr dlnwg $xextrashort, robust
Regression with robust standard errors                 Number of obs =    4788
                                                       F(  1,  4786) =    2.51
                                                       Prob > F      =  0.1135
                                                       R-squared     =  0.0054
                                                       Root MSE      =  .29551

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1089851   .0688514     1.58   0.114    -.0259952    .2439654
       _cons |   .0008283   .0042856     0.19   0.847    -.0075735    .0092301
------------------------------------------------------------------------------
. estimates store fdiffhet
.
. * Correct panel bootstrap standard errors
. set seed 10001
. bs "regress dlnhr dlnwg $xextrashort" "_b[dlnwg] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      regress dlnhr dlnwg
statistics:   _bs_1      = _b[dlnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      4788
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500   .1089851  -.0092694   .0832844  -.0546462   .2726165  (N)
             |                                         -.0486034   .2608319  (P)
             |                                         -.0329857   .2929305  (BC)
       _bs_2 |   500   .0008283  -8.39e-06   .0015843  -.0022843    .003941  (N)
             |                                         -.0023564   .0038644  (P)
             |                                         -.0023692    .003842  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix fdiffbootse = e(se)
.
. * First differences plot with lowess local regression line - Figure 21.4 page 713
. graph twoway (scatter dlnhr dlnwg, msize(vsmall)) (lowess dlnhr dlnwg) (lfit dlnhr dlnwg), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("First Differences Regression") /*
> */ xtitle("Log hourly wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log annual hours", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "First differences") label(2 "Nonparametric fit") label(3 "Linear fit"))
. graph export ch21panfd.wmf, replace
(file c:\Imbook\bwebpage\Section5\ch21panfd.wmf written in Windows Metafile format)
.
. *** (5) RANDOM EFFECTS GLS REGRESSION (Table 21.2 RE-GLS column)
.
. use mom, clear
.
. * Usual standard errors assume iid error
. xtreg lnhr lnwg, re i(id)
Random-effects GLS regression                   Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

Random effects u_i ~ Gaussian                   Wald chi2(1)       =     76.64
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1193322   .0136312     8.75   0.000     .0926155     .146049
       _cons |   7.346041   .0363925   201.86   0.000     7.274713    7.417368
-------------+----------------------------------------------------------------
     sigma_u |  .16124733
     sigma_e |  .23278339
         rho |  .32424354   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. estimates store reglsiid
.
. * Correct panel robust standard errors
. * Stata has no option for this. See ch21panel2.do
. * or use xtgee corr(exchangeable), robust see ch21panel4.do
.
. * Correct panel bootstrap standard errors
. set seed 10001

. bootstrap "xtreg lnhr lnwg, re i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      xtreg lnhr lnwg , re i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500   .1193322   .0084025   .0563763    .008568   .2300965  (N)
             |                                          .0332454   .2379648  (P)
             |                                          .0203328   .2199058  (BC)
       _bs_2 |   500   7.346041  -.0217114   .1492226   7.052859   7.639223  (N)
             |                                          7.029869   7.577236  (P)
             |                                          7.082208   7.614716  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix reglsbootse = e(se)
.
. *** (6) RANDOM EFFECTS MLE REGRESSION (Table 21.2 RE-MLE column)
.
. use mom, clear
.
. * Usual standard errors assume iid error
. xtreg lnhr lnwg, mle i(id)
Fitting constant-only model:
Iteration 0:   log likelihood = -305.19469
Iteration 1:   log likelihood = -304.97993
Iteration 2:   log likelihood = -304.97987

Fitting full model:
Iteration 0:   log likelihood = -270.51687
Iteration 1:   log likelihood = -266.91794
Iteration 2:   log likelihood = -266.91155

Random-effects ML regression                    Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

Random effects u_i ~ Gaussian                   Obs per group: min =        10
                                                               avg =      10.0
                                                               max =        10

                                                LR chi2(1)         =     76.14
Log likelihood  = -266.91155                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1195474   .0137484     8.70   0.000      .092601    .1464938
       _cons |   7.345479   .0366973   200.16   0.000     7.273554    7.417404
-------------+----------------------------------------------------------------
    /sigma_u |    .162175   .0060469    26.82   0.000     .1503233    .1740266
    /sigma_e |   .2329172   .0023819    97.79   0.000     .2282488    .2375856
-------------+----------------------------------------------------------------
         rho |   .3265097    .017266                      .2934209    .3610233
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 1147.08 Prob>=chibar2 = 0.000
. estimates store remleiid
.
. * Correct panel robust standard errors
. * Stata has no option for this. See ch21panel2.do
.
. * Correct panel bootstrap standard errors
. set seed 10001
. bootstrap "xtreg lnhr lnwg, mle i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      xtreg lnhr lnwg , mle i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500   .1195474   .0094957   .0582585   .0050852   .2340096  (N)
             |                                          .0333037   .2445228  (P)
             |                                          .0209889   .2249033  (BC)
       _bs_2 |   500   7.345479  -.0245541   .1540811   7.042751   7.648207  (N)
             |                                          7.013718   7.577084  (P)
             |                                          7.070499   7.613971  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix remlebootse = e(se)
.
. * Population-averaged estimator is similar to RE (estimates are close to the MLE version of RE)
. * Exactly same as xtgee, i(id)
. xtreg lnhr lnwg, pa i(id)
Iteration 1: tolerance = .03364039
Iteration 2: tolerance = .00033468
Iteration 3: tolerance = 4.733e-06
Iteration 4: tolerance = 6.715e-08
GEE population-averaged model                   Number of obs      =      5320
Group variable:                         id      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                  exchangeable                     max =        10
                                                Wald chi2(1)       =     76.70
Scale parameter:                  .0805511      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1195474   .0136507     8.76   0.000     .0927925    .1463023
       _cons |   7.345479   .0364481   201.53   0.000     7.274042    7.416916
------------------------------------------------------------------------------
. estimates store paiid
.
. *** (7) HAUSMAN TEST (NOT ROBUST)
.
. * Hausman test of fixed versus random effects
. * The FE estimates are saved in feiid
. * The RE estimates are saved in reglsiid
.
. * From Section 21.4.3 pages 717-9 this usual implementation of the Hausman test
. * is invalid if there is any intracluster correlation left in the RE model
. * as then the RE estimator is no longer fully efficient
. * so Var[b_RE - b_FE] does not equal Var[b_FE] - V[b_RE]
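
With a single tested coefficient, the usual (non-robust) Hausman statistic reduces to a scalar formula, (b_FE - b_RE)^2 / (V[b_FE] - V[b_RE]). The sketch below only works through that arithmetic using the FE and RE-GLS estimates of lnwg and their default standard errors as reported earlier in this log. The scalar names are made up for illustration, and because the inputs are rounded the result matches the hausman command only approximately.

```stata
* Hand computation of the one-coefficient Hausman statistic.
* Inputs are copied from the xtreg, fe and xtreg, re output in this log.
scalar b_fe  = .1676755      // FE coefficient on lnwg
scalar se_fe = .01887        // FE default standard error
scalar b_re  = .1193322      // RE-GLS coefficient on lnwg
scalar se_re = .0136312      // RE-GLS default standard error

* Variance of the difference under the efficiency assumption
scalar vdiff = se_fe^2 - se_re^2
scalar h     = (b_fe - b_re)^2 / vdiff

display "sqrt(V_b - V_B) = " %9.7f sqrt(vdiff)
display "Hausman chi2(1) = " %6.2f h
display "p-value         = " %6.4f chi2tail(1, h)
```

The formula uses V[b_FE] - V[b_RE] in the denominator, which is exactly the step that fails when the RE estimator is not fully efficient, as the comments above note.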
.
. * Following is not valid - see MMA21P2PANMANUAL.DO for robust version
. hausman feiid reglsiid
                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     feiid      reglsiid       Difference          S.E.
-------------+----------------------------------------------------------------
        lnwg |    .1676755     .1193322        .0483432        .0130486
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       13.73
                Prob>chi2 =      0.0002
.
. ********* DISPLAY RESULTS - Table 21.2 on page 710 *********
.
. * Standard errors using iid errors and in some cases panel-robust
. estimates table polsiid polshet polspanel beiid feiid, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
------------------------------------------------------------------------------
    Variable |   polsiid     polshet   polspanel       beiid       feiid
-------------+----------------------------------------------------------------
        lnwg |     0.083       0.083       0.083       0.067       0.168
             |     0.009       0.020       0.029       0.020       0.019
       _cons |     7.442       7.442       7.442       7.483       7.220
             |     0.024       0.055       0.080       0.052       0.049
-------------+----------------------------------------------------------------
           N |  5320.000    5320.000    5320.000    5320.000    5320.000
          ll |  -840.453    -840.453    -840.453     166.573     486.743
          r2 |     0.015       0.015       0.015       0.021       0.016
         tss |                                                   433.831
         rss |   427.225     427.225     427.225      16.652     259.398
         mss |     6.605       6.605       6.605       0.363       4.279
        rmse |     0.283       0.283       0.283       0.177       0.233
        df_r |  5318.000    5318.000     531.000     530.000    4787.000
------------------------------------------------------------------------------
                                                                  legend: b/se
. estimates table fdiffiid fdiffhet fdiffpanel reglsiid remleiid, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
------------------------------------------------------------------------------
    Variable |  fdiffiid    fdiffhet  fdiffpanel    reglsiid    remleiid
-------------+----------------------------------------------------------------
_            |
       dlnwg |     0.109       0.109       0.109
             |     0.021       0.069       0.084
        lnwg |                                         0.119
             |                                         0.014
       _cons |     0.001       0.001       0.001       7.346
             |     0.004       0.004       0.002       0.036
-------------+----------------------------------------------------------------
lnhr         |
        lnwg |                                                     0.120
             |                                                     0.014
       _cons |                                                     7.345
             |                                                     0.037
-------------+----------------------------------------------------------------
sigma_u      |
       _cons |                                                     0.162
             |                                                     0.006
-------------+----------------------------------------------------------------
sigma_e      |
       _cons |                                                     0.233
             |                                                     0.002
-------------+----------------------------------------------------------------
Statistics   |
           N |  4788.000    4788.000    4788.000    5320.000    5320.000
          ll |  -956.059    -956.059    -956.059                -266.912
          r2 |     0.005       0.005       0.005
         tss |
         rss |   417.944     417.944     417.944
         mss |     2.279       2.279       2.279
        rmse |     0.296       0.296       0.296
        df_r |  4786.000    4786.000     531.000
------------------------------------------------------------------------------
                                                                  legend: b/se
. estimates table paiid, se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
--------------------------
    Variable |   paiid
-------------+------------
        lnwg |     0.120
             |     0.014
       _cons |     7.345
             |     0.036
-------------+------------
           N |  5320.000
          ll |
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
--------------------------
              legend: b/se
.
. * Standard errors using panel bootstrap (regular bootstrap for between)
. matrix list polsbootse

polsbootse[1,2]
        _bs_1      _bs_2
se  .02983953   .0805676

. matrix list bebootse

bebootse[1,2]
        _bs_1      _bs_2
se  .01923625  .05191507

. matrix list febootse

febootse[1,2]
        _bs_1      _bs_2
se  .08446309  .22304703

. matrix list fdiffbootse

fdiffbootse[1,2]
        _bs_1      _bs_2
se  .08328443  .00158427

. matrix list reglsbootse

reglsbootse[1,2]
        _bs_1      _bs_2
se  .05637633  .14922264

. matrix list remlebootse

remlebootse[1,2]
        _bs_1      _bs_2
se  .05825849  .15408111
.
. ********** CLOSE OUTPUT *********
. log close
       log:  c:\Imbook\bwebpage\Section5\mma21p1panfeandre.txt
  log type:  text
 closed on:  23 May 2005, 11:34:06
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section5\mma21p2panmanual.txt
  log type:  text
 opened on:  23 May 2005, 11:34:50
.
. ********** OVERVIEW OF MMA21P2PANMANUAL.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 21.3.1-3 pages 709-14


. * Program performs basic panel analysis and gets panel robust se's
. * by first transforming model and then using REGRESS
. * It also presents a valid Hausman test of FE versus RE model
.
. * This program estimates
. * (2) between estimator by regress y_bar on x_bar
. * (4) within estimator by regress (y - y_bar) on (x - x_bar)
. * (5) random effects gls by regress (y - lamda*y_bar) on (x - lamda*x_bar)
. * (6) random effects mle by regress (y - lamda*y_bar) on (x - lamda*x_bar)
. * (7) robust variant of the Hausman test
. * and calculates
. * - usual standard errors
. *   (which may differ from xtreg due to different degrees of freedom)
. * - panel robust standard errors
. *   (which for RE simplify by assuming lamda_hat is known not estimated)
. * - panel bootstrap standard errors
. *   (which should equal panel robust from ch21panel.do as #bootstrap reps --> infinity)
. * - heteroskedasticity robust standard errors
. *   (which are wrong but included for comparison with others)
.
. * The code is very limited:
. * - it considers only one regressor
. * - it assumes a balanced data set with exactly 10 years of data per observation
. * - it does not use loops for transformations which would generalize code
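The last limitation can be relaxed without reshaping to wide form. As a minimal sketch (not the authors' code), the transformations could be built directly in long form with egen and a foreach loop. It assumes the data are in long form with variables lnhr, lnwg and id, and that the scalar lamdaregls has already been computed from xtreg, re; the generated variable names mirror those used later in this program.

```stata
* Sketch only: individual means, mean differences and RE-GLS quasi-differences
* built in long form, so no reshape and no year-by-year gen statements
foreach v of varlist lnhr lnwg {
    * individual-specific time average
    egen ave`v' = mean(`v'), by(id)
    * mean difference used for the within (FE) estimator
    gen md`v' = `v' - ave`v'
    * quasi-difference used for the RE-GLS estimator
    gen reglsd`v' = `v' - lamdaregls*ave`v'
}
```

Because the means are computed by id, the loop does not depend on the number of years. With an unbalanced panel, however, lamda would vary with the number of observations per individual, which this sketch does not handle.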
.
. * NOTE: If have Stata Version 9 (rather than version 8) a simpler way to proceed is
. * to directly use XTREG (see program mma21p1panfeandre.do) with option cluster(id)
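A sketch of the simpler route mentioned in the note, assuming Stata 9 or later and the long-form data read in below. The option spelling follows the note; later Stata releases also accept vce(cluster id). The stored estimate names fecluster and recluster are hypothetical.

```stata
* Sketch only (Stata 9 or later): panel-robust standard errors directly from xtreg
xtreg lnhr lnwg, fe i(id) cluster(id)
estimates store fecluster
xtreg lnhr lnwg, re i(id) cluster(id)
estimates store recluster
```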
.
. * The four basic linear panel programs are
. * mma21p1panfeandre.do Linear fixed and random effects using xtreg
. * mma21p2panmanual.do Linear fe and re using transformation and regress
. *                     plus also has valid Hausman test
. * mma21p3panresiduals.do Residual analysis after linear fe and re
. * mma21p4panpangls.do Pooled panel OLS and GLS
.
. * To run this program you need data file
. * MOM.dat
. * in your directory
.
. * To speed up this program reduce nreps, the number of bootstraps
. * used in the panel bootstrap.
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */

.
. ********** DATA DESCRIPTION **********
.
. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
.
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
.
. * File MOM.dat is the version of the data posted at the JBES website
. * Note that in chapter 22 we instead use MOMprecise.dat
. * which is the same data set but with more significant digits
.
. ********** READ DATA **********
.
. * The data are in ascii file MOM.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOM.dat
(5320 observations read)
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        lnhr |      5320     7.65743    .2855914       2.77       8.56
        lnwg |      5320    2.609436    .4258924       -.26       4.69
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
.
. ********** DEFINE GLOBALS **********
.
. * Number of reps for the bootstrap
. * Table 21.1 used 500
. global nreps 500
.
. ******** RUN REGRESSIONS USING XTREG **********
.
. * This is to verify alternative estimates later on


. * And for random effects it saves lamda
. * used later on to construct transformed regression
. * of (y - lamda*y_bar) on (x - lamda*x_bar)
.
. xtreg lnhr lnwg, be i(id)
Between regression (regression on group means)  Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,530)           =     11.55
sd(u_i + avg(e_i.))=  .1772555                  Prob > F           =    0.0007

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0668379   .0196635     3.40   0.001     .0282099    .1054658
       _cons |   7.483021   .0518829   144.23   0.000       7.3811    7.584943
------------------------------------------------------------------------------
. estimates store bextreg
.
. xtreg lnhr lnwg, fe i(id)
Fixed-effects (within) regression               Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,4787)          =     78.96
corr(u_i, Xb)  = -0.1995                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1676755     .01887     8.89   0.000     .1306816    .2046694
       _cons |   7.219892   .0493434   146.32   0.000     7.123156    7.316628
-------------+----------------------------------------------------------------
     sigma_u |  .18142881
     sigma_e |  .23278339
         rho |  .37789558   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(531, 4787) =     5.83           Prob > F = 0.0000
. estimates store fextreg


.
. xtreg lnhr lnwg, re i(id)
Random-effects GLS regression                   Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

Random effects u_i ~ Gaussian                   Wald chi2(1)       =     76.64
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1193322   .0136312     8.75   0.000     .0926155     .146049
       _cons |   7.346041   .0363925   201.86   0.000     7.274713    7.417368
-------------+----------------------------------------------------------------
     sigma_u |  .16124733
     sigma_e |  .23278339
         rho |  .32424354   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. estimates store reglsxtreg
. scalar sesq = e(sigma_e)^2
. scalar susq = e(sigma_u)^2
. scalar lamdaregls = 1 - sqrt( sesq / (e(Tbar)*susq + sesq) )
. di lamdaregls
.58470925
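The scalar lamdaregls implements lamda = 1 - sqrt(sigma_e^2 / (T*sigma_u^2 + sigma_e^2)). As a hand check (not part of the original log), the same expression can be evaluated with the sigma_u and sigma_e values printed by xtreg, re and T = 10. It should reproduce the displayed value up to rounding. The scalar names ending in _chk are hypothetical.

```stata
* Hand check of the RE-GLS quasi-differencing weight (illustrative only)
* sigma_e and sigma_u are copied from the xtreg, re output above
scalar se_chk = .23278339
scalar su_chk = .16124733
* balanced panel with 10 years per individual
scalar T_chk  = 10
display 1 - sqrt(se_chk^2 / (T_chk*su_chk^2 + se_chk^2))
```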
.
. xtreg lnhr lnwg, mle i(id)
Fitting constant-only model:
Iteration 0: log likelihood = -305.19469
Iteration 1: log likelihood = -304.97993
Iteration 2: log likelihood = -304.97987
Fitting full model:
Iteration 0: log likelihood = -270.51687
Iteration 1: log likelihood = -266.91794
Iteration 2: log likelihood = -266.91155
Random-effects ML regression                    Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

Random effects u_i ~ Gaussian                   Obs per group: min =        10
                                                               avg =      10.0
                                                               max =        10

                                                LR chi2(1)         =     76.14
Log likelihood  = -266.91155                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1195474   .0137484     8.70   0.000      .092601    .1464938
       _cons |   7.345479   .0366973   200.16   0.000     7.273554    7.417404
-------------+----------------------------------------------------------------
    /sigma_u |    .162175   .0060469    26.82   0.000     .1503233    .1740266
    /sigma_e |   .2329172   .0023819    97.79   0.000     .2282488    .2375856
-------------+----------------------------------------------------------------
         rho |   .3265097    .017266                      .2934209    .3610233
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 1147.08 Prob>=chibar2 = 0.000
. estimates store remlextreg
. scalar sesq2 = e(sigma_e)^2
. scalar susq2 = e(sigma_u)^2
. scalar lamdaremle = 1 - sqrt( sesq2 / (e(g_avg)*susq2 + sesq2) )
. di lamdaremle
.58648101
.
. ******** ANALYSIS: FE, RE and FD ESTIMATORS CALCULATED MANUALLY **********
.
. *** FIRST TRANSFORM DATA FROM LONG FORM TO WIDE FORM
.
. * Here just do this for lnhr and lnwg
. keep lnhr lnwg id year
. reshape wide lnhr lnwg, i(id) j(year)
(note: j = 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988)
Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     5320   ->     532
Number of variables                   4   ->      21
j variable (10 values)             year   ->   (dropped)
xij variables:
                                   lnhr   ->   lnhr1979 lnhr1980 ... lnhr1988
                                   lnwg   ->   lnwg1979 lnwg1980 ... lnwg1988
-----------------------------------------------------------------------------
.
. * Since year is 1979 to 1988 this will create
. * lnhr1979 to lnhr1988 and lnwg1979 to lnwg1988
.
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |       532       266.5    153.7194          1        532
    lnhr1979 |       532    7.669342     .249361       5.89       8.54
    lnwg1979 |       532    2.597763    .4188951        .52       4.62
    lnhr1980 |       532    7.660094    .2691995       5.22       8.34
    lnwg1980 |       532    2.602368    .3945963         .8       4.61
-------------+--------------------------------------------------------
    lnhr1981 |       532     7.66765    .2105797       6.36        8.4
    lnwg1981 |       532    2.610959    .3870011       1.53       4.53
    lnhr1982 |       532     7.64609    .2427195       5.38       8.31
    lnwg1982 |       532     2.61468    .4014363       1.21       4.61
    lnhr1983 |       532    7.613064     .382703       2.77       8.37
-------------+--------------------------------------------------------
    lnwg1983 |       532    2.610526    .4111869       1.08       4.62
    lnhr1984 |       532    7.636523    .3316735       3.18       8.44
    lnwg1984 |       532    2.600188    .4621549       -.26       4.65
    lnhr1985 |       532    7.668365    .2597423       5.08       8.54
    lnwg1985 |       532    2.614944    .4347554       1.33       4.69
-------------+--------------------------------------------------------
    lnhr1986 |       532    7.659286    .3330862       2.77       8.38
    lnwg1986 |       532    2.602632    .4432807        .07       4.59
    lnhr1987 |       532     7.67406    .2745015       4.38       8.56
    lnwg1987 |       532    2.614699    .4300122       1.28       4.03
    lnhr1988 |       532    7.679831    .2552894       4.79       8.53
-------------+--------------------------------------------------------
    lnwg1988 |       532    2.625602    .4701759       -.22        4.6
.
. *** (1) POOLED OLS (OVERALL) REGRESSION
.
. * Not relevant
.
. *** (2) CREATE INDIVIDUAL AVERAGES AND DO BETWEEN REGRESSION
.
. gen avelnhr = (lnhr1979+lnhr1980+lnhr1981+lnhr1982+lnhr1983+lnhr1984+ /*
>         */ lnhr1985+lnhr1986+lnhr1987+lnhr1988) / 10
. gen avelnwg = (lnwg1979+lnwg1980+lnwg1981+lnwg1982+lnwg1983+lnwg1984+ /*
>         */ lnwg1985+lnwg1986+lnwg1987+lnwg1988) / 10

.
. * Should replicate xtreg, be
. regress avelnhr avelnwg
      Source |       SS       df       MS              Number of obs =     532
-------------+------------------------------           F(  1,   530) =   11.55
       Model |  .363013807     1  .363013807           Prob > F      =  0.0007
    Residual |  16.6523404   530   .03141951           R-squared     =  0.0213
-------------+------------------------------           Adj R-squared =  0.0195
       Total |  17.0153542   531  .032043982           Root MSE      =  .17726

------------------------------------------------------------------------------
     avelnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     avelnwg |   .0668379   .0196635     3.40   0.001     .0282099    .1054658
       _cons |   7.483021   .0518829   144.23   0.000       7.3811    7.584943
------------------------------------------------------------------------------
. estimates store bebyols
.
. * Better is the following as gives heteroskedastic robust standard errors
. regress avelnhr avelnwg, robust
Regression with robust standard errors                 Number of obs =     532
                                                       F(  1,   530) =    7.55
                                                       Prob > F      =  0.0062
                                                       R-squared     =  0.0213
                                                       Root MSE      =  .17726

------------------------------------------------------------------------------
             |               Robust
     avelnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     avelnwg |   .0668379   .0243185     2.75   0.006     .0190654    .1146103
       _cons |   7.483021   .0657699   113.78   0.000      7.35382    7.612223
------------------------------------------------------------------------------
. estimates store behet
.
. * Or could bootstrap
. bootstrap "regress avelnhr avelnwg" "_b[avelnwg] _b[_cons]", reps(200) level(95)
command:      regress avelnhr avelnwg
statistics:   _bs_1      = _b[avelnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =       532
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   200  .0668379 -.0010221  .0239486   .0196123   .1140634   (N)
             |                                       .0233175   .1143305   (P)
             |                                       .0266221   .1175503  (BC)
       _bs_2 |   200  7.483021  .0029632  .0648396    7.35516   7.610882   (N)
             |                                       7.362745   7.600107   (P)
             |                                       7.358079   7.591704  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix bebootse = e(se)
.
. *** (3) CREATE DIFFERENCED DATA FOR FE AND RE
.
. * Continue with the data already in wide form and then reshape back to long form
. * Mean difference for FE and quasi for RE-GLS and RE-MLE
.
. * Mean difference for FE
. gen mdlnhr1979 = lnhr1979 - avelnhr
. gen mdlnhr1980 = lnhr1980 - avelnhr
. gen mdlnhr1981 = lnhr1981 - avelnhr
. gen mdlnhr1982 = lnhr1982 - avelnhr
. gen mdlnhr1983 = lnhr1983 - avelnhr
. gen mdlnhr1984 = lnhr1984 - avelnhr
. gen mdlnhr1985 = lnhr1985 - avelnhr
. gen mdlnhr1986 = lnhr1986 - avelnhr
. gen mdlnhr1987 = lnhr1987 - avelnhr
. gen mdlnhr1988 = lnhr1988 - avelnhr
. gen mdlnwg1979 = lnwg1979 - avelnwg
. gen mdlnwg1980 = lnwg1980 - avelnwg
. gen mdlnwg1981 = lnwg1981 - avelnwg
. gen mdlnwg1982 = lnwg1982 - avelnwg

. gen mdlnwg1983 = lnwg1983 - avelnwg


. gen mdlnwg1984 = lnwg1984 - avelnwg
. gen mdlnwg1985 = lnwg1985 - avelnwg
. gen mdlnwg1986 = lnwg1986 - avelnwg
. gen mdlnwg1987 = lnwg1987 - avelnwg
. gen mdlnwg1988 = lnwg1988 - avelnwg
.
. * Quasi difference for RE - GLS
. gen reglsdlnhr1979 = lnhr1979 - lamdaregls*avelnhr
. gen reglsdlnhr1980 = lnhr1980 - lamdaregls*avelnhr
. gen reglsdlnhr1981 = lnhr1981 - lamdaregls*avelnhr
. gen reglsdlnhr1982 = lnhr1982 - lamdaregls*avelnhr
. gen reglsdlnhr1983 = lnhr1983 - lamdaregls*avelnhr
. gen reglsdlnhr1984 = lnhr1984 - lamdaregls*avelnhr
. gen reglsdlnhr1985 = lnhr1985 - lamdaregls*avelnhr
. gen reglsdlnhr1986 = lnhr1986 - lamdaregls*avelnhr
. gen reglsdlnhr1987 = lnhr1987 - lamdaregls*avelnhr
. gen reglsdlnhr1988 = lnhr1988 - lamdaregls*avelnhr
. gen reglsdlnwg1979 = lnwg1979 - lamdaregls*avelnwg
. gen reglsdlnwg1980 = lnwg1980 - lamdaregls*avelnwg
. gen reglsdlnwg1981 = lnwg1981 - lamdaregls*avelnwg
. gen reglsdlnwg1982 = lnwg1982 - lamdaregls*avelnwg
. gen reglsdlnwg1983 = lnwg1983 - lamdaregls*avelnwg
. gen reglsdlnwg1984 = lnwg1984 - lamdaregls*avelnwg
. gen reglsdlnwg1985 = lnwg1985 - lamdaregls*avelnwg
. gen reglsdlnwg1986 = lnwg1986 - lamdaregls*avelnwg
. gen reglsdlnwg1987 = lnwg1987 - lamdaregls*avelnwg
. gen reglsdlnwg1988 = lnwg1988 - lamdaregls*avelnwg


.
. * Quasi difference for RE - MLE
. gen remledlnhr1979 = lnhr1979 - lamdaremle*avelnhr
. gen remledlnhr1980 = lnhr1980 - lamdaremle*avelnhr
. gen remledlnhr1981 = lnhr1981 - lamdaremle*avelnhr
. gen remledlnhr1982 = lnhr1982 - lamdaremle*avelnhr
. gen remledlnhr1983 = lnhr1983 - lamdaremle*avelnhr
. gen remledlnhr1984 = lnhr1984 - lamdaremle*avelnhr
. gen remledlnhr1985 = lnhr1985 - lamdaremle*avelnhr
. gen remledlnhr1986 = lnhr1986 - lamdaremle*avelnhr
. gen remledlnhr1987 = lnhr1987 - lamdaremle*avelnhr
. gen remledlnhr1988 = lnhr1988 - lamdaremle*avelnhr
. gen remledlnwg1979 = lnwg1979 - lamdaremle*avelnwg
. gen remledlnwg1980 = lnwg1980 - lamdaremle*avelnwg
. gen remledlnwg1981 = lnwg1981 - lamdaremle*avelnwg
. gen remledlnwg1982 = lnwg1982 - lamdaremle*avelnwg
. gen remledlnwg1983 = lnwg1983 - lamdaremle*avelnwg
. gen remledlnwg1984 = lnwg1984 - lamdaremle*avelnwg
. gen remledlnwg1985 = lnwg1985 - lamdaremle*avelnwg
. gen remledlnwg1986 = lnwg1986 - lamdaremle*avelnwg
. gen remledlnwg1987 = lnwg1987 - lamdaremle*avelnwg
. gen remledlnwg1988 = lnwg1988 - lamdaremle*avelnwg
.
. *** NOW BACK TO LONG FORM
.
. * Then back to long form
. reshape long lnhr lnwg mdlnhr mdlnwg reglsdlnhr reglsdlnwg remledlnhr remledlnwg, i(id) j(year)
(note: j = 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                      532   ->    5320
Number of variables                  85   ->      14
j variable (10 values)                    ->   year
xij variables:
         lnhr1979 lnhr1980 ... lnhr1988   ->   lnhr
         lnwg1979 lnwg1980 ... lnwg1988   ->   lnwg
   mdlnhr1979 mdlnhr1980 ... mdlnhr1988   ->   mdlnhr
   mdlnwg1979 mdlnwg1980 ... mdlnwg1988   ->   mdlnwg
reglsdlnhr1979 reglsdlnhr1980 ... reglsdlnhr1988 -> reglsdlnhr
reglsdlnwg1979 reglsdlnwg1980 ... reglsdlnwg1988 -> reglsdlnwg
remledlnhr1979 remledlnhr1980 ... remledlnhr1988 -> remledlnhr
remledlnwg1979 remledlnwg1980 ... remledlnwg1988 -> remledlnwg
-----------------------------------------------------------------------------
.
. describe
Contains data
  obs:         5,320
 vars:            14
 size:       276,640 (97.2% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
year            int    %9.0g
lnhr            float  %9.0g
lnwg            float  %9.0g
avelnhr         float  %9.0g
avelnwg         float  %9.0g
_est_bebyols    byte   %8.0g                  esample() from estimates store
_est_behet      byte   %8.0g                  esample() from estimates store
mdlnhr          float  %9.0g
mdlnwg          float  %9.0g
reglsdlnhr      float  %9.0g
reglsdlnwg      float  %9.0g
remledlnhr      float  %9.0g
remledlnwg      float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  id  year
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
        lnhr |      5320     7.65743    .2855914       2.77       8.56
        lnwg |      5320    2.609436    .4258924       -.26       4.69
     avelnhr |      5320     7.65743    .1788568      6.416      8.242
-------------+--------------------------------------------------------
     avelnwg |      5320    2.609436    .3908626      1.346      4.543
_est_bebyols |      5320           1           0          1          1
  _est_behet |      5320           1           0          1          1
      mdlnhr |      5320   -1.21e-09    .2226492     -3.988      1.344
      mdlnwg |      5320   -9.86e-10    .1691472      -2.54      1.878
-------------+--------------------------------------------------------
  reglsdlnhr |      5320     3.18006    .2347122  -1.181465   4.008506
  reglsdlnwg |      5320    1.083675    .2344336  -1.593137   2.966892
  remledlnhr |      5320    3.166493    .2346121  -1.193439   3.997138
  remledlnwg |      5320    1.079051    .2339546  -1.597177   2.962247
. save MOM2, replace
file MOM2.dta saved
.
. *** (4) FIXED EFFECTS ESTIMATOR USING DIFFERENCED DATA
.
. * This should replicate xtreg, fe
. regress mdlnhr mdlnwg
      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   87.72
       Model |  4.27857391     1  4.27857391           Prob > F      =  0.0000
    Residual |   259.39846  5318  .048777446           R-squared     =  0.0162
-------------+------------------------------           Adj R-squared =  0.0160
       Total |  263.677034  5319   .04957267           Root MSE      =  .22086

------------------------------------------------------------------------------
      mdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mdlnwg |   .1676755   .0179032     9.37   0.000      .132578     .202773
       _cons |  -1.04e-09    .003028    -0.00   1.000    -.0059361    .0059361
------------------------------------------------------------------------------
. estimates store febyols
.
. * This gives panel corrected standard errors
. regress mdlnhr mdlnwg, cluster(id)
Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,   531) =    3.89
                                                       Prob > F      =  0.0490
                                                       R-squared     =  0.0162
Number of clusters (id) = 532                          Root MSE      =  .22086

------------------------------------------------------------------------------
             |               Robust
      mdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mdlnwg |   .1676755   .0849706     1.97   0.049     .0007557    .3345953
       _cons |  -1.04e-09   6.39e-09    -0.16   0.870    -1.36e-08    1.15e-08
------------------------------------------------------------------------------
. estimates store fepanel
.
. * This gives panel bootstrap standard errors
. * Similar to bootstrap applied to xtreg, fe
. set seed 10001
. bs "regress mdlnhr mdlnwg" "_b[mdlnwg] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      regress mdlnhr mdlnwg
statistics:   _bs_1      = _b[mdlnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .1676755 -.0055543  .0844631   .0017284   .3336226   (N)
             |                                       .0213276   .3318829   (P)
             |                                       .0300515   .3605573  (BC)
       _bs_2 |   500 -1.04e-09  2.79e-10  6.50e-09  -1.38e-08   1.17e-08   (N)
             |                                      -1.39e-08   1.28e-08   (P)
             |                                      -1.41e-08   1.17e-08  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix febootse = e(se)
.
. * This gives heteroskedasticity corrected standard errors that are not panel robust
. regress mdlnhr mdlnwg, robust
Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,  5318) =    7.79
                                                       Prob > F      =  0.0053
                                                       R-squared     =  0.0162
                                                       Root MSE      =  .22086

------------------------------------------------------------------------------
             |               Robust
      mdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mdlnwg |   .1676755   .0600942     2.79   0.005     .0498662    .2854848
       _cons |  -1.04e-09    .003028    -0.00   1.000    -.0059361    .0059361
------------------------------------------------------------------------------
. estimates store fehet
.
. *** (5) RANDOM EFFECTS - GLS ESTIMATOR USING DIFFERENCED DATA
.
. * Should give same coefficient estimates as xtreg
. * May give different standard errors as treats lamda as known
. * but in practice the difference is not great as lamda is precisely estimated
.
. * This should replicate xtreg, re
. regress reglsdlnhr reglsdlnwg
      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   76.64
       Model |  4.16279701     1  4.16279701           Prob > F      =  0.0000
    Residual |  288.860014  5318  .054317415           R-squared     =  0.0142
-------------+------------------------------           Adj R-squared =  0.0140
       Total |  293.022811  5319  .055089831           Root MSE      =  .23306

------------------------------------------------------------------------------
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .1193323   .0136312     8.75   0.000     .0926095     .146055
       _cons |   3.050743   .0151135   201.86   0.000     3.021114    3.080371
------------------------------------------------------------------------------
. estimates store reglsbyols
.
. * This gives panel corrected standard errors
. regress reglsdlnhr reglsdlnwg, cluster(id)
Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,   531) =    5.39
                                                       Prob > F      =  0.0206
                                                       R-squared     =  0.0142
Number of clusters (id) = 532                          Root MSE      =  .23306

------------------------------------------------------------------------------
             |               Robust
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .1193323   .0514016     2.32   0.021     .0183568    .2203077
       _cons |   3.050743   .0571367    53.39   0.000     2.938501    3.162984
------------------------------------------------------------------------------
. estimates store reglspanel
.
. * This gives panel bootstrap standard errors
. * Similar to bootstrap applied to xtreg, re
. set seed 10001
. bs "regress reglsdlnhr reglsdlnwg" "_b[reglsdlnwg] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      regress reglsdlnhr reglsdlnwg
statistics:   _bs_1      = _b[reglsdlnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .1193323 -.0020689  .0516757   .0178035    .220861   (N)
             |                                       .0300938   .2277364   (P)
             |                                       .0339291    .236732  (BC)
       _bs_2 |   500  3.050743  .0022622  .0571941   2.938372   3.163114   (N)
             |                                        2.93212   3.148191   (P)
             |                                       2.920954   3.143819  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix reglsbootse = e(se)
.
. * This gives heteroskedasticity corrected standard errors that are not panel robust
. regress reglsdlnhr reglsdlnwg, robust
Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,  5318) =    7.81
                                                       Prob > F      =  0.0052
                                                       R-squared     =  0.0142
                                                       Root MSE      =  .23306

------------------------------------------------------------------------------
             |               Robust
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .1193323   .0426897     2.80   0.005      .035643    .2030215
       _cons |   3.050743    .047821    63.80   0.000     2.956994    3.144491
------------------------------------------------------------------------------
. estimates store reglshet
.
. *** (6) RANDOM EFFECTS - MLE ESTIMATOR USING DIFFERENCED DATA
.
. * Should give same coefficient estimates as xtreg
. * May give different standard errors as treats lamda as known
. * but in practice the difference is not great as lamda is precisely estimated
.
. * This should replicate xtreg, mle
. regress remledlnhr remledlnwg
      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   76.67
       Model |  4.16076808     1  4.16076808           Prob > F      =  0.0000
    Residual |  288.612179  5318  .054270812           R-squared     =  0.0142
-------------+------------------------------           Adj R-squared =  0.0140
       Total |  292.772947  5319  .055042855           Root MSE      =  .23296

------------------------------------------------------------------------------
  remledlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  remledlnwg |   .1195474   .0136533     8.76   0.000     .0927814    .1463134
       _cons |   3.037495   .0150748   201.49   0.000     3.007942    3.067048
------------------------------------------------------------------------------
. estimates store remlebyols
.
. * This gives panel corrected standard errors
. regress remledlnhr remledlnwg, cluster(id)
Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,   531) =    5.38
                                                       Prob > F      =  0.0208
                                                       R-squared     =  0.0142
Number of clusters (id) = 532                          Root MSE      =  .23296

------------------------------------------------------------------------------
             |               Robust
  remledlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  remledlnwg |   .1195474   .0515474     2.32   0.021     .0182855    .2208093
       _cons |   3.037495   .0570501    53.24   0.000     2.925424    3.149567
------------------------------------------------------------------------------
. estimates store remlepanel
.
. * This gives panel bootstrap standard errors
. * Similar to bootstrap applied to xtreg, mle
. set seed 10001
. bs "regress remledlnhr remledlnwg" "_b[remledlnwg] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      regress remledlnhr remledlnwg
statistics:   _bs_1      = _b[remledlnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .1195474 -.0020813  .0518188   .0177375   .2213573   (N)
             |                                       .0300552   .2282355   (P)
             |                                       .0339668   .2372786  (BC)
       _bs_2 |   500  3.037495  .0022658  .0571042   2.925301   3.149689   (N)
             |                                       2.919076   3.134685   (P)
             |                                       2.907989    3.13043  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix remlebootse = e(se)
.
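. * This gives heteroskedasticity corrected standard errors that are not panel robust
. * NOTE: the command below is run on the RE-GLS variables (reglsdlnhr, reglsdlnwg), not the
. *       RE-MLE variables (remledlnhr, remledlnwg), so remlehet simply repeats reglshet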
. regress reglsdlnhr reglsdlnwg, robust
Regression with robust standard errors                 Number of obs =    5320
                                                       F(  1,  5318) =    7.81
                                                       Prob > F      =  0.0052
                                                       R-squared     =  0.0142
                                                       Root MSE      =  .23306

------------------------------------------------------------------------------
             |               Robust
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .1193323   .0426897     2.80   0.005      .035643    .2030215
       _cons |   3.050743    .047821    63.80   0.000     2.956994    3.144491
------------------------------------------------------------------------------
. estimates store remlehet
.
. *** (7) ROBUST VARIANT OF HAUSMAN TEST
.
. * From Section 21.4.3, pages 717-9: the usual implementation of the Hausman test
. * is invalid if any intracluster correlation remains in the RE model,
. * because the RE estimator is then no longer fully efficient,
. * so Var[b_RE - b_FE] does not equal Var[b_FE] - Var[b_RE]
.
. * (7A) Nonrobust version of Hausman test by auxiliary regression
. *      [will be similar to nonrobust version in mma21p1panfeandre.do]
. regress reglsdlnhr reglsdlnwg mdlnwg

      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  2,  5317) =   45.26
       Model |  4.90465081     2  2.45232541           Prob > F      =  0.0000
    Residual |   288.11816  5317  .054188106           R-squared     =  0.0167
-------------+------------------------------           Adj R-squared =  0.0164
       Total |  293.022811  5319  .055089831           Root MSE      =  .23278

------------------------------------------------------------------------------
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .0668379   .0196635     3.40   0.001     .0282893    .1053864
      mdlnwg |   .1008376   .0272531     3.70   0.000     .0474104    .1542648
       _cons |    3.10763   .0215465   144.23   0.000      3.06539     3.14987
------------------------------------------------------------------------------
. scalar Hnonrobust = (_b[mdlnwg]/_se[mdlnwg])^2
. di Hnonrobust
13.690344
.
. * Perform preferred valid robust version of Hausman test
. * This gives the results presented on p.719
. regress reglsdlnhr reglsdlnwg mdlnwg, cluster(id)

Regression with robust standard errors                 Number of obs =    5320
                                                       F(  2,   531) =    4.24
                                                       Prob > F      =  0.0149
                                                       R-squared     =  0.0167
Number of clusters (id) = 532                          Root MSE      =  .23278

------------------------------------------------------------------------------
             |               Robust
  reglsdlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  reglsdlnwg |   .0668379   .0243001     2.75   0.006     .0191016    .1145741
      mdlnwg |   .1008376   .0785137     1.28   0.200     -.053398    .2550732
       _cons |    3.10763    .027293   113.86   0.000     3.054014    3.161245
------------------------------------------------------------------------------
. scalar Hrobust = (_b[mdlnwg]/_se[mdlnwg])^2
. di Hrobust
1.6495074
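Both Hausman statistics are the squared t-ratio on `mdlnwg` in the auxiliary regression; only the standard error differs. The short Python check below recomputes them from the coefficient and standard errors printed in the log. The function name and the comparison against the 5% chi-square(1) critical value are additions for illustration, and the recomputed values can differ from the log in the later decimals because the printed inputs are rounded.

```python
def hausman_from_aux_regression(coef, se):
    """Squared t-ratio on the added regressor (one restriction)."""
    return (coef / se) ** 2


# Coefficient on mdlnwg with default and cluster-robust standard errors, as printed above.
h_nonrobust = hausman_from_aux_regression(0.1008376, 0.0272531)
h_robust = hausman_from_aux_regression(0.1008376, 0.0785137)

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom

print(h_nonrobust, h_nonrobust > CHI2_1_CRIT_5PCT)
print(h_robust, h_robust > CHI2_1_CRIT_5PCT)
```

With these inputs the nonrobust statistic exceeds the 5% critical value while the cluster-robust one does not, so the two versions of the test lead to opposite conclusions here.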
.
. ********* DISPLAY RESULTS - Table 21.2 on page 710 *********
.
. * The coefficient estimates should be equal for a given estimator.
. * The standard errors will vary.
. * The first and second assume iid errors and will generally be the same.
. * The third allows for heteroskedastic errors but is not panel robust.
. * The fourth is panel robust and also allows for heteroskedasticity.
. estimates table bextreg bebyols behet, b(%10.3f) se /*
> */ stats(N ll r2 tss rss mss rmse df_r)

--------------------------------------------------
    Variable |     bextreg     bebyols       behet
-------------+------------------------------------
        lnwg |       0.067
             |       0.020
     avelnwg |                   0.067       0.067
             |                   0.020       0.024
       _cons |       7.483       7.483       7.483
             |       0.052       0.052       0.066
-------------+------------------------------------
           N |    5320.000     532.000     532.000
          ll |     166.573     166.573     166.573
          r2 |       0.021       0.021       0.021
         tss |
         rss |      16.652      16.652      16.652
         mss |       0.363       0.363       0.363
        rmse |       0.177       0.177       0.177
        df_r |     530.000     530.000     530.000
--------------------------------------------------
                                      legend: b/se
. estimates table fextreg febyols fehet fepanel, b(%10.3f) se /*
> */ stats(N ll r2 tss rss mss rmse df_r)

--------------------------------------------------------------
    Variable |     fextreg     febyols       fehet     fepanel
-------------+------------------------------------------------
        lnwg |       0.168
             |       0.019
      mdlnwg |                   0.168       0.168       0.168
             |                   0.018       0.060       0.085
       _cons |       7.220      -0.000      -0.000      -0.000
             |       0.049       0.003       0.003       0.000
-------------+------------------------------------------------
           N |    5320.000    5320.000    5320.000    5320.000
          ll |     486.743     486.743     486.743     486.743
          r2 |       0.016       0.016       0.016       0.016
         tss |     433.831
         rss |     259.398     259.398     259.398     259.398
         mss |       4.279       4.279       4.279       4.279
        rmse |       0.233       0.221       0.221       0.221
        df_r |    4787.000    5318.000    5318.000     531.000
--------------------------------------------------------------
                                                  legend: b/se
. estimates table reglsxtreg reglsbyols reglshet reglspanel, b(%10.3f) se /*
> */ stats(N ll r2 tss rss mss rmse df_r)

--------------------------------------------------------------
    Variable |  reglsxtreg  reglsbyols    reglshet  reglspanel
-------------+------------------------------------------------
        lnwg |       0.119
             |       0.014
  reglsdlnwg |                   0.119       0.119       0.119
             |                   0.014       0.043       0.051
       _cons |       7.346       3.051       3.051       3.051
             |       0.036       0.015       0.048       0.057
-------------+------------------------------------------------
           N |    5320.000    5320.000    5320.000    5320.000
          ll |                 200.589     200.589     200.589
          r2 |                   0.014       0.014       0.014
         tss |
         rss |                 288.860     288.860     288.860
         mss |                   4.163       4.163       4.163
        rmse |                   0.233       0.233       0.233
        df_r |                5318.000    5318.000     531.000
--------------------------------------------------------------
                                                  legend: b/se
. estimates table remlextreg remlebyols remlehet remlepanel, b(%10.3f) se /*
> */ stats(N ll r2 tss rss mss rmse df_r)

--------------------------------------------------------------
    Variable |  remlextreg  remlebyols    remlehet  remlepanel
-------------+------------------------------------------------
lnhr         |
        lnwg |       0.120
             |       0.014
       _cons |       7.345
             |       0.037
-------------+------------------------------------------------
sigma_u      |
       _cons |       0.162
             |       0.006
-------------+------------------------------------------------
sigma_e      |
       _cons |       0.233
             |       0.002
-------------+------------------------------------------------
_            |
  remledlnwg |                   0.120                   0.120
             |                   0.014                   0.052
  reglsdlnwg |                               0.119
             |                               0.043
       _cons |                   3.037       3.051       3.037
             |                   0.015       0.048       0.057
-------------+------------------------------------------------
Statistics   |
           N |    5320.000    5320.000    5320.000    5320.000
          ll |    -266.912     202.872     200.589     202.872
          r2 |                   0.014       0.014       0.014
         tss |
         rss |                 288.612     288.860     288.612
         mss |                   4.161       4.163       4.161
        rmse |                   0.233       0.233       0.233
        df_r |                5318.000    5318.000     531.000
--------------------------------------------------------------
                                                  legend: b/se
.
. * The following are (panel) bootstrap standard errors
. matrix list bebootse

bebootse[1,2]
        _bs_1      _bs_2
se  .02394857  .06483965

. matrix list febootse

febootse[1,2]
        _bs_1      _bs_2
se  .08446309  6.497e-09

. * Note that the following two differ from mma21p1panfeandre.do
. * as here the same value of lambda is used throughout the bootstraps
. matrix list remlebootse

remlebootse[1,2]
        _bs_1      _bs_2
se  .05181879  .05710419

. matrix list reglsbootse

reglsbootse[1,2]
        _bs_1      _bs_2
se  .05167569  .05719414
.
. * For completeness give lambda
. di lamdaregls
.58470925
. di lamdaremle
.58648101
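The value of `lamdaregls` can be reproduced from variance components. The sketch below assumes the standard random-effects transformation weight, lambda = 1 - sigma_e / sqrt(T*sigma_u^2 + sigma_e^2), and assumes that the inputs are the `sigma_u` and `sigma_e` that `xtreg lnhr lnwg, re` reports for these data (.16124733 and .23278339) with T = 10 years. The function name is hypothetical.

```python
import math


def re_lambda(sigma_u, sigma_e, t):
    """Random-effects quasi-demeaning weight for a balanced panel with t periods."""
    return 1.0 - sigma_e / math.sqrt(t * sigma_u ** 2 + sigma_e ** 2)


# Assumed inputs: sigma_u and sigma_e from xtreg, re on the MOM data; T = 10.
lam = re_lambda(0.16124733, 0.23278339, 10)
print(lam)
```

A value of 0 would reduce the transformation to pooled OLS and a value of 1 to the within (fixed effects) transformation, so a weight near 0.58 places the RE estimate between the two.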
.
. * Robust and nonrobust versions of Hausman test given on p.719
. di Hnonrobust /* Not valid if intracluster correlation */
13.690344
. di Hrobust    /* Valid if intracluster correlation */
1.6495074
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section5\mma21p2panmanual.txt
log type: text
closed on: 23 May 2005, 11:35:55
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section5\mma21p2panresiduals.txt
log type: text
opened on: 23 May 2005, 11:37:22
.
. ********** OVERVIEW OF MMA21P3PANRESIDUALS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 21.3.4 pages 713-15 Residual analysis
. * This program
. * (1) estimates correlations for
. * - dependent variable
. * - regressor variable
. * - residuals from pooled ols [Table 21.3]
. * - residuals from within estimation [Table 21.4]
. * - residuals from random effects estimation
. * (2) separately estimates correlations for
. * - residuals from first differences estimation
. * (3) gets correlations for each individual observation
.
. * The code is very limited:
. * - it considers only one regressor
. * - it assumes a balanced data set with exactly 10 years of data per individual
. * - it does not use loops for transformations, which would generalize the code
.
. * The four basic linear panel programs are
. * mma21p1panfeandre.do Linear fixed and random effects using xtreg
. * mma21p2panfeandre.do Linear fe and re using transformation and regress
. *                      plus also has valid Hausman test
. * mma21p3panresiduals.do Residual analysis after linear fe and re
. * mma21p4panpangls.do Pooled panel OLS and GLS
.
. * To run you need file
. * MOM.dat
. * in your directory
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
.
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
.
. * File MOM.dat is the version of the data posted at the JBES website
. * Note that in chapter 22 we instead use MOMprecise.dat
. * which is the same data set but with more significant digits
.
. ********** READ DATA **********
.*
. * The data are in ascii file MOM.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOM.dat
(5320 observations read)

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |      5320     7.65743    .2855914       2.77       8.56
        lnwg |      5320    2.609436    .4258924       -.26       4.69
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
.
. ************ (1) ANALYSIS: OBTAIN KEY AUTOCORRELATIONS Tables 21.3, 21.4 **********
.
. ** RUN REGRESSIONS AND GET RESIDUALS OF INTEREST
.
. * pooled ols
. regress lnhr lnwg

      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   82.22
       Model |  6.60538417     1  6.60538417           Prob > F      =  0.0000
    Residual |  427.225206  5318  .080335691           R-squared     =  0.0152
-------------+------------------------------           Adj R-squared =  0.0150
       Total |  433.830591  5319  .081562435           Root MSE      =  .28344

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0091251     9.07   0.000     .0648545    .1006326
       _cons |   7.441516   .0241265   308.44   0.000     7.394219    7.488814
------------------------------------------------------------------------------
. predict upols, residuals
.
. * fixed effects (within)
. xtreg lnhr lnwg, fe i(id)

Fixed-effects (within) regression               Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,4787)          =     78.96
corr(u_i, Xb)  = -0.1995                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1676755     .01887     8.89   0.000     .1306816    .2046694
       _cons |   7.219892   .0493434   146.32   0.000     7.123156    7.316628
-------------+----------------------------------------------------------------
     sigma_u |  .18142881
     sigma_e |  .23278339
         rho |  .37789558   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(531, 4787) =     5.83           Prob > F = 0.0000
. predict ufe, e
.
. * random effects
. xtreg lnhr lnwg, re i(id)

Random-effects GLS regression                   Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

Random effects u_i ~ Gaussian                   Wald chi2(1)       =     76.64
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1193322   .0136312     8.75   0.000     .0926155     .146049
       _cons |   7.346041   .0363925   201.86   0.000     7.274713    7.417368
-------------+----------------------------------------------------------------
     sigma_u |  .16124733
     sigma_e |  .23278339
         rho |  .32424354   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. predict ure, e
.
. summarize upols ufe ure

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       upols |      5320   -1.27e-10    .2834089  -4.826247    .964581
         ufe |      5320   -5.52e-11    .2208354  -4.003929     1.2719
         ure |      5320   -9.00e-11    .2231118  -4.131111   1.085362

. save mom3, replace
file mom3.dta saved
.
. ** TRANSFORM DATA FROM LONG FORM TO WIDE FORM
.
. * Here just do this for lnhr and lnwg and the residuals
. keep lnhr lnwg id year upols ufe ure
. reshape wide lnhr lnwg upols ufe ure, i(id) j(year)
(note: j = 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     5320   ->     532
Number of variables                   7   ->      51
j variable (10 values)             year   ->   (dropped)
xij variables:
                                   lnhr   ->   lnhr1979 lnhr1980 ... lnhr1988
                                   lnwg   ->   lnwg1979 lnwg1980 ... lnwg1988
                                  upols   ->   upols1979 upols1980 ... upols1988
                                    ufe   ->   ufe1979 ufe1980 ... ufe1988
                                    ure   ->   ure1979 ure1980 ... ure1988
-----------------------------------------------------------------------------
.
. * Since year is 1979 to 1988 this will create
. * lnhr1979 to lnhr1988 and lnwg1979 to lnwg1988
.
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       532       266.5    153.7194          1        532
    lnhr1979 |       532    7.669342     .249361       5.89       8.54
    lnwg1979 |       532    2.597763    .4188951        .52       4.62
   upols1979 |       532    .0128775    .2517228  -1.764168   .8312218
     ufe1979 |       532    .0138689    .2249175  -1.578105     1.2719
-------------+--------------------------------------------------------
     ure1979 |       532    .0133046    .2200196  -1.618987   1.085362
    lnhr1980 |       532    7.660094    .2691995       5.22       8.34
    lnwg1980 |       532    2.602368    .3945963         .8       4.61
   upols1980 |       532    .0032483    .2679463  -2.354734   .6659743
     ufe1980 |       532    .0038486    .2253673  -2.085636   1.128546
-------------+--------------------------------------------------------
     ure1980 |       532    .0035069    .2238723  -2.089847   .9429754
    lnhr1981 |       532     7.66765    .2105797       6.36        8.4
    lnwg1981 |       532    2.610959    .3870011       1.53       4.53
   upols1981 |       532    .0100939    .2133106  -1.342159   .7582438
     ufe1981 |       532    .0099646     .163407  -1.001722    1.03687
-------------+--------------------------------------------------------
     ure1981 |       532    .0100382    .1596593   -1.02491   .8517824
    lnhr1982 |       532     7.64609    .2427195       5.38       8.31
    lnwg1982 |       532     2.61468    .4014363       1.21       4.61
   upols1982 |       532   -.0117742    .2422735  -2.264238   .6897579
     ufe1982 |       532   -.0122196    .1890237  -1.623214   .7918997
-------------+--------------------------------------------------------
     ure1982 |       532   -.0119661    .1875585  -1.737484   .6666697
    lnhr1983 |       532    7.613064     .382703       2.77       8.37
    lnwg1983 |       532    2.610526    .4111869       1.08       4.62
   upols1983 |       532   -.0444568    .3778255  -4.826247   .7307264
     ufe1983 |       532   -.0445494    .2836351  -3.577253   .5196197
-------------+--------------------------------------------------------
     ure1983 |       532   -.0444967     .294545  -3.804399   .5078294
    lnhr1984 |       532    7.636523    .3316735       3.18       8.44
    lnwg1984 |       532    2.600188    .4621549       -.26       4.65
   upols1984 |       532   -.0201427    .3208512  -4.240003   .8263766
     ufe1984 |       532   -.0193572     .225836  -2.810104   .8327778
-------------+--------------------------------------------------------
     ure1984 |       532   -.0198043    .2378605  -3.140221   .7036628
    lnhr1985 |       532    7.668365    .2597423       5.08       8.54
    lnwg1985 |       532    2.614944    .4347554       1.33       4.69
   upols1985 |       532    .0104785     .259051  -2.503835   .8624523
     ufe1985 |       532    .0100107    .1856724  -1.581894   .7944546
-------------+--------------------------------------------------------
     ure1985 |       532     .010277    .1886509  -1.752727   .7370209
    lnhr1986 |       532    7.659286    .3330862       2.77       8.38
    lnwg1986 |       532    2.602632    .4432807        .07       4.59
   upols1986 |       532    .0024183    .3312105  -4.801424   .7439653
     ufe1986 |       532    .0029962    .2595405  -4.003929   .6384854
-------------+--------------------------------------------------------
     ure1986 |       532    .0026673     .264328  -4.131111   .5111209
    lnhr1987 |       532     7.67406    .2745015       4.38       8.56
    lnwg1987 |       532    2.614699    .4300122       1.28       4.03
   upols1987 |       532    .0161942    .2749153  -3.283269    .964581
     ufe1987 |       532    .0157472    .2141618  -2.817174   1.009662
-------------+--------------------------------------------------------
     ure1987 |       532    .0160016    .2148092  -2.897725   .8441463
    lnhr1988 |       532    7.679831    .2552894       4.79       8.53
    lnwg1988 |       532    2.625602    .4701759       -.22        4.6
   upols1988 |       532    .0210628    .2519891  -2.633313   .9072749
     ufe1988 |       532    .0196898    .2048927   -1.68379   1.123516
-------------+--------------------------------------------------------
     ure1988 |       532    .0204713    .2022375  -1.897506   .9393954
.
. ** OBTAIN THE VARIOUS CORRELATIONS
.
. corr lnhr1979 lnhr1980 lnhr1981 lnhr1982 lnhr1983 lnhr1984 lnhr1985 lnhr1986 lnhr1987 lnhr1988
(obs=532)

             | lnhr1979 lnhr1980 lnhr1981 lnhr1982 lnhr1983 lnhr1984 lnhr1985 lnhr1986 lnhr1987
-------------+---------------------------------------------------------------------------------
    lnhr1979 |   1.0000
    lnhr1980 |   0.3220   1.0000
    lnhr1981 |   0.4321   0.4022   1.0000
    lnhr1982 |   0.2947   0.3142   0.5670   1.0000
    lnhr1983 |   0.2070   0.2324   0.3788   0.4781   1.0000
    lnhr1984 |   0.1908   0.2235   0.3141   0.3318   0.6476   1.0000
    lnhr1985 |   0.2284   0.3184   0.3999   0.3453   0.3930   0.5839   1.0000
    lnhr1986 |   0.1934   0.1931   0.2813   0.2524   0.3162   0.3595   0.4128   1.0000
    lnhr1987 |   0.1986   0.3160   0.3322   0.2951   0.3261   0.3464   0.3987   0.3603   1.0000
    lnhr1988 |   0.1640   0.2551   0.3081   0.2674   0.2267   0.2537   0.3509   0.5741   0.5248

             | lnhr1988
-------------+---------
    lnhr1988 |   1.0000

. corr lnwg1979 lnwg1980 lnwg1981 lnwg1982 lnwg1983 lnwg1984 lnwg1985 lnwg1986 lnwg1987 lnwg1988
(obs=532)

             | lnwg1979 lnwg1980 lnwg1981 lnwg1982 lnwg1983 lnwg1984 lnwg1985 lnwg1986 lnwg1987
-------------+---------------------------------------------------------------------------------
    lnwg1979 |   1.0000
    lnwg1980 |   0.8415   1.0000
    lnwg1981 |   0.8283   0.8920   1.0000
    lnwg1982 |   0.7984   0.8559   0.9015   1.0000
    lnwg1983 |   0.7795   0.8408   0.8787   0.9155   1.0000
    lnwg1984 |   0.7208   0.7737   0.8102   0.8267   0.8625   1.0000
    lnwg1985 |   0.7424   0.7929   0.8290   0.8511   0.8636   0.8620   1.0000
    lnwg1986 |   0.7250   0.7714   0.8122   0.8286   0.8530   0.8399   0.9157   1.0000
    lnwg1987 |   0.7188   0.7639   0.8029   0.8282   0.8525   0.8681   0.9117   0.9111   1.0000
    lnwg1988 |   0.7220   0.7604   0.7900   0.8139   0.8326   0.8373   0.8787   0.8743   0.9101

             | lnwg1988
-------------+---------
    lnwg1988 |   1.0000

. * The following gives Table 21.3 p.714


. corr upols1979 upols1980 upols1981 upols1982 upols1983 upols1984 upols1985 upols1986 upols1987 upols1988
(obs=532)

             | upo~1979 upo~1980 upo~1981 upo~1982 upo~1983 upo~1984 upo~1985 upo~1986 upo~1987
-------------+---------------------------------------------------------------------------------
   upols1979 |   1.0000
   upols1980 |   0.3283   1.0000
   upols1981 |   0.4442   0.4035   1.0000
   upols1982 |   0.3008   0.3140   0.5678   1.0000
   upols1983 |   0.2089   0.2298   0.3739   0.4684   1.0000
   upols1984 |   0.2025   0.2289   0.3194   0.3360   0.6398   1.0000
   upols1985 |   0.2395   0.3246   0.4087   0.3484   0.3898   0.5800   1.0000
   upols1986 |   0.1987   0.1903   0.2797   0.2470   0.3109   0.3535   0.3991   1.0000
   upols1987 |   0.2091   0.3167   0.3340   0.2877   0.3097   0.3361   0.3941   0.3496   1.0000
   upols1988 |   0.1619   0.2456   0.3016   0.2582   0.2083   0.2470   0.3436   0.5545   0.5242

             | upo~1988
-------------+---------
   upols1988 |   1.0000

. corr ure1979 ure1980 ure1981 ure1982 ure1983 ure1984 ure1985 ure1986 ure1987 ure1988
(obs=532)

             |  ure1979  ure1980  ure1981  ure1982  ure1983  ure1984  ure1985  ure1986  ure1987
-------------+---------------------------------------------------------------------------------
     ure1979 |   1.0000
     ure1980 |   0.0778   1.0000
     ure1981 |   0.1777   0.0604   1.0000
     ure1982 |  -0.0250  -0.0519   0.2492   1.0000
     ure1983 |  -0.2339  -0.2277  -0.1609   0.0587   1.0000
     ure1984 |  -0.2482  -0.2431  -0.2691  -0.1709   0.3795   1.0000
     ure1985 |  -0.1842  -0.0919  -0.1054  -0.1581  -0.0939   0.2197   1.0000
     ure1986 |  -0.1860  -0.2333  -0.2434  -0.2405  -0.1110  -0.0763  -0.0361   1.0000
     ure1987 |  -0.1665  -0.0481  -0.1580  -0.1904  -0.1710  -0.1506  -0.0646  -0.0553   1.0000
     ure1988 |  -0.1960  -0.1251  -0.1646  -0.1949  -0.3265  -0.2786  -0.1221   0.2708   0.2379

             |  ure1988
-------------+---------
     ure1988 |   1.0000

. * The following gives Table 21.4 p.715


. corr ufe1979 ufe1980 ufe1981 ufe1982 ufe1983 ufe1984 ufe1985 ufe1986 ufe1987 ufe1988
(obs=532)

             |  ufe1979  ufe1980  ufe1981  ufe1982  ufe1983  ufe1984  ufe1985  ufe1986  ufe1987
-------------+---------------------------------------------------------------------------------
     ufe1979 |   1.0000
     ufe1980 |   0.1017   1.0000
     ufe1981 |   0.2082   0.0802   1.0000
     ufe1982 |   0.0003  -0.0380   0.2631   1.0000
     ufe1983 |  -0.2632  -0.2691  -0.2113   0.0089   1.0000
     ufe1984 |  -0.2594  -0.2698  -0.3004  -0.2037   0.3249   1.0000
     ufe1985 |  -0.1757  -0.0958  -0.1069  -0.1685  -0.1617   0.1713   1.0000
     ufe1986 |  -0.1915  -0.2534  -0.2644  -0.2676  -0.1723  -0.1364  -0.0865   1.0000
     ufe1987 |  -0.1519  -0.0497  -0.1561  -0.2008  -0.2399  -0.2066  -0.0918  -0.0908   1.0000
     ufe1988 |  -0.1650  -0.1109  -0.1385  -0.1772  -0.3816  -0.3096  -0.1268   0.2420   0.2439

             |  ufe1988
-------------+---------
     ufe1988 |   1.0000

.
. * The following does estimation for just one year
. regress lnhr1979 lnwg1979

      Source |       SS       df       MS              Number of obs =     532
-------------+------------------------------           F(  1,   530) =    0.00
       Model |  .000035507     1  .000035507           Prob > F      =  0.9810
    Residual |  33.0180361   530  .062298181           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0019
       Total |  33.0180716   531  .062180926           Root MSE      =   .2496

------------------------------------------------------------------------------
    lnhr1979 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnwg1979 |   .0006173   .0258574     0.02   0.981    -.0501783    .0514129
       _cons |   7.667738   .0680375   112.70   0.000     7.534082    7.801395
------------------------------------------------------------------------------
.
. ************ (2) ANALYSIS: OBTAIN AUTOCORRELATIONS FOR FIRST DIFFERENCES
.
. ** SET UP THE DATA
. use mom, clear
. gen dlnhr = lnhr - lnhr[_n-1]
(1 missing value generated)
. gen dlnwg = lnwg - lnwg[_n-1]
(1 missing value generated)
. * The following drops the first year which here is 1979
. drop if year == 1979
(532 observations deleted)
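The differencing above subtracts the previous row without checking `id`, so each individual's 1979 row holds a difference taken across two different people (or a missing value for the very first row). Dropping 1979 removes exactly those rows, which works because the panel is balanced and sorted by person and year. A minimal Python sketch of an id-aware alternative is shown below; the row layout and function name are assumptions made for illustration.

```python
def first_differences(rows):
    """First differences within each individual.

    rows: list of (id, year, value) tuples sorted by id, then year (hypothetical layout).
    Returns (id, year, value - previous value) only where the previous row
    belongs to the same individual, so no difference spans two people.
    """
    out = []
    for prev, cur in zip(rows, rows[1:]):
        if prev[0] == cur[0]:
            out.append((cur[0], cur[1], cur[2] - prev[2]))
    return out


toy = [
    (1, 1979, 7.0), (1, 1980, 7.5),
    (2, 1979, 6.0), (2, 1980, 6.25),
]
print(first_differences(toy))
```

Unlike the drop-the-first-year approach, this version does not rely on every individual starting in the same year.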
. regress dlnhr dlnwg

      Source |       SS       df       MS              Number of obs =    4788
-------------+------------------------------           F(  1,  4786) =   26.09
       Model |  2.27870825     1  2.27870825           Prob > F      =  0.0000
    Residual |  417.943979  4786  .087326364           R-squared     =  0.0054
-------------+------------------------------           Adj R-squared =  0.0052
       Total |  420.222687  4787  .087784142           Root MSE      =  .29551

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1089851   .0213351     5.11   0.000     .0671584    .1508118
       _cons |   .0008283   .0042712     0.19   0.846    -.0075452    .0092018
------------------------------------------------------------------------------
. predict ufdiff, residuals
. * Here just do this for dlnhr and dlnwg and the residuals
. keep dlnhr dlnwg ufdiff id year
. reshape wide dlnhr dlnwg ufdiff, i(id) j(year)
(note: j = 1980 1981 1982 1983 1984 1985 1986 1987 1988)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     4788   ->     532
Number of variables                   5   ->      28
j variable (9 values)              year   ->   (dropped)
xij variables:
                                  dlnhr   ->   dlnhr1980 dlnhr1981 ... dlnhr1988
                                  dlnwg   ->   dlnwg1980 dlnwg1981 ... dlnwg1988
                                 ufdiff   ->   ufdiff1980 ufdiff1981 ... ufdiff1988
-----------------------------------------------------------------------------
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       532       266.5    153.7194          1        532
   dlnhr1980 |       532   -.0092481    .3023508       -2.5       1.71
   dlnwg1980 |       532    .0046053    .2301879      -2.12       1.05
  ufdiff1980 |       532   -.0105783    .3014161  -2.499738   1.690644
   dlnhr1981 |       532    .0075564    .2668644       -1.2       2.32
-------------+--------------------------------------------------------
   dlnwg1981 |       532    .0085902    .1818033       -.79       1.62
  ufdiff1981 |       532    .0057919    .2669213  -1.145188   2.343149
   dlnhr1982 |       532   -.0215602     .212834      -2.06       1.14
   dlnwg1982 |       532    .0037218    .1755574      -1.17        .74
  ufdiff1982 |       532   -.0227941     .213709  -2.036851   1.135902
-------------+--------------------------------------------------------
   dlnhr1983 |       532   -.0330263    .3413969      -4.51   .9899998
   dlnwg1983 |       532   -.0041541    .1673057       -.88   .6399999
  ufdiff1983 |       532   -.0334019    .3398726  -4.419281   .9780819
   dlnhr1984 |       532    .0234586    .3034213      -2.31       2.57
   dlnwg1984 |       532   -.0103383    .2342514      -2.13        .77
-------------+--------------------------------------------------------
  ufdiff1984 |       532    .0237571    .3004287  -2.168058   2.502691
   dlnhr1985 |       532    .0318421    .2772558      -1.46       3.52
   dlnwg1985 |       532    .0147556    .2371054      -1.33       3.06
  ufdiff1985 |       532    .0294057    .2697542  -1.315878   3.185677
   dlnhr1986 |       532   -.0090789    .3270724      -4.79        1.8
-------------+--------------------------------------------------------
   dlnwg1986 |       532    -.012312    .1804162      -1.83       1.04
  ufdiff1986 |       532   -.0085654    .3299129  -4.796278   1.789363
   dlnhr1987 |       532    .0147744    .3470122      -3.24       4.52
   dlnwg1987 |       532    .0120677    .1845692  -.9400001       1.95
  ufdiff1987 |       532    .0126309    .3494111  -3.243008   4.550777
-------------+--------------------------------------------------------
   dlnhr1988 |       532    .0057707    .2587991       -2.5       2.74
   dlnwg1988 |       532    .0109023     .194813       -1.5       1.22
  ufdiff1988 |       532    .0037542    .2576554  -2.337351   2.739172
.
. ** GET THE CORRELATIONS
. corr dlnhr1980 dlnhr1981 dlnhr1982 dlnhr1983 dlnhr1984 dlnhr1985 dlnhr1986 dlnhr1987 dlnhr1988
(obs=532)

             | dlnhr1~0 dlnhr1~1 dlnhr1~2 dlnhr1~3 dlnhr1~4 dlnhr1~5 dlnhr1~6 dlnhr1~7 dlnhr1~8
-------------+---------------------------------------------------------------------------------
   dlnhr1980 |   1.0000
   dlnhr1981 |  -0.6289   1.0000
   dlnhr1982 |   0.0402  -0.2306   1.0000
   dlnhr1983 |   0.0144  -0.0204  -0.2209   1.0000
   dlnhr1984 |  -0.0001  -0.0570  -0.1410  -0.4495   1.0000
   dlnhr1985 |   0.0393  -0.0320  -0.0827  -0.4035  -0.1969   1.0000
   dlnhr1986 |  -0.0629   0.0322   0.0112   0.0233  -0.1192  -0.2334   1.0000
   dlnhr1987 |   0.0811  -0.0709  -0.0029  -0.0448  -0.0202   0.0093  -0.6231   1.0000
   dlnhr1988 |  -0.0341   0.0461  -0.0082  -0.1020   0.0261   0.0682   0.2486  -0.6064   1.0000

. corr dlnwg1980 dlnwg1981 dlnwg1982 dlnwg1983 dlnwg1984 dlnwg1985 dlnwg1986 dlnwg1987 dlnwg1988
(obs=532)

             | dlnwg1~0 dlnwg1~1 dlnwg1~2 dlnwg1~3 dlnwg1~4 dlnwg1~5 dlnwg1~6 dlnwg1~7 dlnwg1~8
-------------+---------------------------------------------------------------------------------
   dlnwg1980 |   1.0000
   dlnwg1981 |  -0.3507   1.0000
   dlnwg1982 |  -0.0149  -0.2849   1.0000
   dlnwg1983 |   0.0215  -0.0351  -0.3338   1.0000
   dlnwg1984 |  -0.0112   0.0098  -0.0686  -0.1899   1.0000
   dlnwg1985 |  -0.0135  -0.0085   0.0141  -0.1179  -0.5560   1.0000
   dlnwg1986 |  -0.0121   0.0289  -0.0303   0.0725  -0.0526  -0.2665   1.0000
   dlnwg1987 |  -0.0042  -0.0119   0.0382  -0.0083   0.1200  -0.1482  -0.5043   1.0000
   dlnwg1988 |  -0.0281  -0.0377   0.0157  -0.0133  -0.0174  -0.0058  -0.0174  -0.2627   1.0000

. corr ufdiff1980 ufdiff1981 ufdiff1982 ufdiff1983 ufdiff1984 ufdiff1985 ufdiff1986 ufdiff1987 ufdiff1988
(obs=532)

             | ufd~1980 ufd~1981 ufd~1982 ufd~1983 ufd~1984 ufd~1985 ufd~1986 ufd~1987 ufd~1988
-------------+---------------------------------------------------------------------------------
  ufdiff1980 |   1.0000
  ufdiff1981 |  -0.6263   1.0000
  ufdiff1982 |   0.0451  -0.2389   1.0000
  ufdiff1983 |   0.0128  -0.0239  -0.2316   1.0000
  ufdiff1984 |  -0.0010  -0.0588  -0.1291  -0.4804   1.0000
  ufdiff1985 |   0.0453  -0.0285  -0.0868  -0.3731  -0.1853   1.0000
  ufdiff1986 |  -0.0674   0.0321   0.0110   0.0256  -0.1138  -0.2538   1.0000
  ufdiff1987 |   0.0811  -0.0711  -0.0077  -0.0533  -0.0081   0.0211  -0.6250   1.0000
  ufdiff1988 |  -0.0323   0.0499   0.0022  -0.1019   0.0368   0.0543   0.2326  -0.5943   1.0000

.
. ************ (3) ANALYSIS: CORRELATIONS FOR AN INDIVIDUAL OBSERVATION
.
. * Look at correlations for each individual
.
. ** TRANSFORM DATA FROM LONG FORM TO WIDE FORM FOR INDIVIDUALS
.
. use mom3, replace
. * Here just do this for lnhr and lnwg
. keep lnhr lnwg id year
. reshape wide lnhr lnwg, i(year) j(id)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33
> 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
65 6
> 6 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97
98
> 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
121 122 123
> 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145
146 147 1
> 48 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169
170 171 172
> 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194
195 196 1
> 97 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218
219 220 221
> 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243
244 245 2


> 46 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267
268 269 270
> 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292
293 294 2
> 95 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316
317 318 319
> 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341
342 343 3
> 44 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365
366 367 368
> 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390
391 392 3
> 93 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414
415 416 417
> 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439
440 441 4
> 42 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463
464 465 466
> 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488
489 490 4
> 91 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512
513 514 515
> 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     5320   ->      10
Number of variables                   4   ->    1065
j variable (532 values)              id   ->   (dropped)
xij variables:
                                   lnhr   ->   lnhr1 lnhr2 ... lnhr532
                                   lnwg   ->   lnwg1 lnwg2 ... lnwg532
-----------------------------------------------------------------------------
. * Note that i and j are reversed
.
. * Since id is 1 to 532 this will create
. * lnhr1 to lnhr532 and lnwg1 to lnwg532
.
. tsset year
time variable: year, 1979 to 1988
.
. * First-order Correlation over T years for the first observation
. corr lnhr1 L.lnhr1
(obs=9)

             |                 L.
             |    lnhr1    lnhr1
-------------+------------------
       lnhr1 |
          -- |   1.0000
          L1 |   0.6378   1.0000

. * First-order Correlation over T years for the second observation
. corr lnhr2 L.lnhr2
(obs=9)

             |                 L.
             |    lnhr2    lnhr2
-------------+------------------
       lnhr2 |
          -- |   1.0000
          L1 |   0.5553   1.0000

. * And so on
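Each of these per-individual figures is the Pearson correlation between a series and its own one-year lag, which is why 10 years of data give only 9 usable pairs (`obs=9`). The Python sketch below shows the same calculation for a single individual; the function name is hypothetical and the toy series are not taken from MOM.dat.

```python
import math


def lag1_correlation(series):
    """Pearson correlation between y_t and y_{t-1} for one individual's series."""
    current = series[1:]   # y_t for t = 2..T
    lagged = series[:-1]   # y_{t-1} for the same t
    n = len(current)       # T - 1 usable pairs
    mean_c = sum(current) / n
    mean_l = sum(lagged) / n
    cov = sum((c - mean_c) * (l - mean_l) for c, l in zip(current, lagged))
    var_c = sum((c - mean_c) ** 2 for c in current)
    var_l = sum((l - mean_l) ** 2 for l in lagged)
    return cov / math.sqrt(var_c * var_l)


trend = [1.0, 2.0, 3.0, 4.0, 5.0]          # steadily rising series
alternating = [1.0, -1.0, 1.0, -1.0, 1.0]  # series that flips sign each year
print(lag1_correlation(trend), lag1_correlation(alternating))
```

Looping this function over all 532 individuals would give the full set of first-order correlations that the log only begins to list.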
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section5\mma21p2panresiduals.txt
log type: text
closed on: 23 May 2005, 11:37:30
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section5\mma21p3panresiduals.txt
log type: text
opened on: 23 May 2005, 13:01:06
.
. ********** OVERVIEW OF MMA21P3PANRESIDUALS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 21.3.4 pages 713-15 Residual analysis
. * This program
. * (1) estimates correlations for
. * - dependent variable
. * - regressor variable
. * - residuals from pooled ols [Table 21.3]
. * - residuals from within estimation [Table 21.4]
. * - residuals from random effects estimation
. * (2) separately estimates correlations for
. * - residuals from first differences estimation
. * (3) gets correlations for each individual observation
.
. * The code is very limited:
. * - it considers only one regressor
. * - it assumes a balanced data set with exactly 10 years of data per observation
. * - it does not use loops for transformations which would generalize code
.
. * The four basic linear panel programs are
. * mma21p1panfeandre.do Linear fixed and random effects using xtreg
. * mma21p2panfeandre.do Linear fe and re using transformation and regress
. *                      plus also has valid Hausman test
. * mma21p3panresiduals.do Residual analysis after linear fe and re
. * mma21p4pangls.do Pooled panel OLS and GLS
.
. * To run you need file
. * MOM.dat
. * in your directory
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
.
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
.
. * File MOM.dat is the version of the data posted at the JBES website
. * Note that in chapter 22 we instead use MOMprecise.dat
. * which is the same data set but with more significant digits
.
. ********** READ DATA **********
.*
. * The data are in ascii file MOM.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOM.dat
(5320 observations read)
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |      5320     7.65743    .2855914       2.77       8.56
        lnwg |      5320    2.609436    .4258924       -.26       4.69
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
.
. ************ (1) ANALYSIS: OBTAIN KEY AUTOCORRELATIONS Tables 21.3, 21.4 **********
.
. ** RUN REGRESSIONS AND GET RESIDUALS OF INTEREST
.
. * pooled ols
. regress lnhr lnwg

      Source |       SS       df       MS              Number of obs =    5320
-------------+------------------------------           F(  1,  5318) =   82.22
       Model |  6.60538417     1  6.60538417           Prob > F      =  0.0000
    Residual |  427.225206  5318  .080335691           R-squared     =  0.0152
-------------+------------------------------           Adj R-squared =  0.0150
       Total |  433.830591  5319  .081562435           Root MSE      =  .28344

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0091251     9.07   0.000     .0648545    .1006326
       _cons |   7.441516   .0241265   308.44   0.000     7.394219    7.488814
------------------------------------------------------------------------------

. predict upols, residuals
.
. * fixed effects (within)
. xtreg lnhr lnwg, fe i(id)

Fixed-effects (within) regression               Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

                                                F(1,4787)          =     78.96
corr(u_i, Xb)  = -0.1995                        Prob > F           =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1676755     .01887     8.89   0.000     .1306816    .2046694
       _cons |   7.219892   .0493434   146.32   0.000     7.123156    7.316628
-------------+----------------------------------------------------------------
     sigma_u |  .18142881
     sigma_e |  .23278339
         rho |  .37789558   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(531, 4787) =     5.83           Prob > F = 0.0000

. predict ufe, e
.
. * random effects
. xtreg lnhr lnwg, re i(id)

Random-effects GLS regression                   Number of obs      =      5320
Group variable (i): id                          Number of groups   =       532

R-sq:  within  = 0.0162                         Obs per group: min =        10
       between = 0.0213                                        avg =      10.0
       overall = 0.0152                                        max =        10

Random effects u_i ~ Gaussian                   Wald chi2(1)       =     76.64
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1193322   .0136312     8.75   0.000     .0926155     .146049
       _cons |   7.346041   .0363925   201.86   0.000     7.274713    7.417368
-------------+----------------------------------------------------------------
     sigma_u |  .16124733
     sigma_e |  .23278339
         rho |  .32424354   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict ure, e
.
. summarize upols ufe ure

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       upols |      5320   -1.27e-10    .2834089  -4.826247    .964581
         ufe |      5320   -5.52e-11    .2208354  -4.003929     1.2719
         ure |      5320   -9.00e-11    .2231118  -4.131111   1.085362

. save mom3, replace
file mom3.dta saved
.
. ** TRANSFORM DATA FROM LONG FORM TO WIDE FORM
.
. * Here just do this for lnhr and lnwg and the residuals
. keep lnhr lnwg id year upols ufe ure
. reshape wide lnhr lnwg upols ufe ure, i(id) j(year)
(note: j = 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988)
Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     5320   ->     532
Number of variables                   7   ->      51
j variable (10 values)             year   ->   (dropped)
xij variables:
                                   lnhr   ->   lnhr1979 lnhr1980 ... lnhr1988
                                   lnwg   ->   lnwg1979 lnwg1980 ... lnwg1988
                                  upols   ->   upols1979 upols1980 ... upols1988
                                    ufe   ->   ufe1979 ufe1980 ... ufe1988
                                    ure   ->   ure1979 ure1980 ... ure1988
-----------------------------------------------------------------------------
.
. * Since year is 1979 to 1988 this will create
. * lnhr1979 to lnhr1988 and lnwg1979 to lnwg1988
.
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       532       266.5    153.7194          1        532
    lnhr1979 |       532    7.669342     .249361       5.89       8.54
    lnwg1979 |       532    2.597763    .4188951        .52       4.62
   upols1979 |       532    .0128775    .2517228  -1.764168   .8312218
     ufe1979 |       532    .0138689    .2249175  -1.578105     1.2719
-------------+--------------------------------------------------------
     ure1979 |       532    .0133046    .2200196  -1.618987   1.085362
    lnhr1980 |       532    7.660094    .2691995       5.22       8.34
    lnwg1980 |       532    2.602368    .3945963         .8       4.61
   upols1980 |       532    .0032483    .2679463  -2.354734   .6659743
     ufe1980 |       532    .0038486    .2253673  -2.085636   1.128546
-------------+--------------------------------------------------------
     ure1980 |       532    .0035069    .2238723  -2.089847   .9429754
    lnhr1981 |       532     7.66765    .2105797       6.36        8.4
    lnwg1981 |       532    2.610959    .3870011       1.53       4.53
   upols1981 |       532    .0100939    .2133106  -1.342159   .7582438
     ufe1981 |       532    .0099646     .163407  -1.001722    1.03687
-------------+--------------------------------------------------------
     ure1981 |       532    .0100382    .1596593   -1.02491   .8517824
    lnhr1982 |       532     7.64609    .2427195       5.38       8.31
    lnwg1982 |       532     2.61468    .4014363       1.21       4.61
   upols1982 |       532   -.0117742    .2422735  -2.264238   .6897579
     ufe1982 |       532   -.0122196    .1890237  -1.623214   .7918997
-------------+--------------------------------------------------------
     ure1982 |       532   -.0119661    .1875585  -1.737484   .6666697
    lnhr1983 |       532    7.613064     .382703       2.77       8.37
    lnwg1983 |       532    2.610526    .4111869       1.08       4.62
   upols1983 |       532   -.0444568    .3778255  -4.826247   .7307264
     ufe1983 |       532   -.0445494    .2836351  -3.577253   .5196197
-------------+--------------------------------------------------------
     ure1983 |       532   -.0444967     .294545  -3.804399   .5078294
    lnhr1984 |       532    7.636523    .3316735       3.18       8.44
    lnwg1984 |       532    2.600188    .4621549       -.26       4.65
   upols1984 |       532   -.0201427    .3208512  -4.240003   .8263766
     ufe1984 |       532   -.0193572     .225836  -2.810104   .8327778
-------------+--------------------------------------------------------
     ure1984 |       532   -.0198043    .2378605  -3.140221   .7036628
    lnhr1985 |       532    7.668365    .2597423       5.08       8.54
    lnwg1985 |       532    2.614944    .4347554       1.33       4.69
   upols1985 |       532    .0104785     .259051  -2.503835   .8624523
     ufe1985 |       532    .0100107    .1856724  -1.581894   .7944546
-------------+--------------------------------------------------------
     ure1985 |       532     .010277    .1886509  -1.752727   .7370209
    lnhr1986 |       532    7.659286    .3330862       2.77       8.38
    lnwg1986 |       532    2.602632    .4432807        .07       4.59
   upols1986 |       532    .0024183    .3312105  -4.801424   .7439653
     ufe1986 |       532    .0029962    .2595405  -4.003929   .6384854
-------------+--------------------------------------------------------
     ure1986 |       532    .0026673     .264328  -4.131111   .5111209
    lnhr1987 |       532     7.67406    .2745015       4.38       8.56
    lnwg1987 |       532    2.614699    .4300122       1.28       4.03
   upols1987 |       532    .0161942    .2749153  -3.283269    .964581
     ufe1987 |       532    .0157472    .2141618  -2.817174   1.009662
-------------+--------------------------------------------------------
     ure1987 |       532    .0160016    .2148092  -2.897725   .8441463
    lnhr1988 |       532    7.679831    .2552894       4.79       8.53
    lnwg1988 |       532    2.625602    .4701759       -.22        4.6
   upols1988 |       532    .0210628    .2519891  -2.633313   .9072749
     ufe1988 |       532    .0196898    .2048927   -1.68379   1.123516
-------------+--------------------------------------------------------
     ure1988 |       532    .0204713    .2022375  -1.897506   .9393954
.
. ** OBTAIN THE VARIOUS CORRELATIONS
.
. corr lnhr1979 lnhr1980 lnhr1981 lnhr1982 lnhr1983 lnhr1984 lnhr1985 lnhr1986 lnhr1987 lnhr1988
(obs=532)

             | lnhr1979 lnhr1980 lnhr1981 lnhr1982 lnhr1983 lnhr1984 lnhr1985 lnhr1986 lnhr1987
-------------+---------------------------------------------------------------------------------
    lnhr1979 |   1.0000
    lnhr1980 |   0.3220   1.0000
    lnhr1981 |   0.4321   0.4022   1.0000
    lnhr1982 |   0.2947   0.3142   0.5670   1.0000
    lnhr1983 |   0.2070   0.2324   0.3788   0.4781   1.0000
    lnhr1984 |   0.1908   0.2235   0.3141   0.3318   0.6476   1.0000
    lnhr1985 |   0.2284   0.3184   0.3999   0.3453   0.3930   0.5839   1.0000
    lnhr1986 |   0.1934   0.1931   0.2813   0.2524   0.3162   0.3595   0.4128   1.0000
    lnhr1987 |   0.1986   0.3160   0.3322   0.2951   0.3261   0.3464   0.3987   0.3603   1.0000
    lnhr1988 |   0.1640   0.2551   0.3081   0.2674   0.2267   0.2537   0.3509   0.5741   0.5248

             | lnhr1988
-------------+---------
    lnhr1988 |   1.0000

. corr lnwg1979 lnwg1980 lnwg1981 lnwg1982 lnwg1983 lnwg1984 lnwg1985 lnwg1986 lnwg1987 lnwg1988
(obs=532)

             | lnwg1979 lnwg1980 lnwg1981 lnwg1982 lnwg1983 lnwg1984 lnwg1985 lnwg1986 lnwg1987
-------------+---------------------------------------------------------------------------------
    lnwg1979 |   1.0000
    lnwg1980 |   0.8415   1.0000
    lnwg1981 |   0.8283   0.8920   1.0000
    lnwg1982 |   0.7984   0.8559   0.9015   1.0000
    lnwg1983 |   0.7795   0.8408   0.8787   0.9155   1.0000
    lnwg1984 |   0.7208   0.7737   0.8102   0.8267   0.8625   1.0000
    lnwg1985 |   0.7424   0.7929   0.8290   0.8511   0.8636   0.8620   1.0000
    lnwg1986 |   0.7250   0.7714   0.8122   0.8286   0.8530   0.8399   0.9157   1.0000
    lnwg1987 |   0.7188   0.7639   0.8029   0.8282   0.8525   0.8681   0.9117   0.9111   1.0000
    lnwg1988 |   0.7220   0.7604   0.7900   0.8139   0.8326   0.8373   0.8787   0.8743   0.9101

             | lnwg1988
-------------+---------
    lnwg1988 |   1.0000

. * The following gives Table 21.3 p.714


. corr upols1979 upols1980 upols1981 upols1982 upols1983 upols1984 upols1985 upols1986 upols1987 upols1988
(obs=532)

             | upo~1979 upo~1980 upo~1981 upo~1982 upo~1983 upo~1984 upo~1985 upo~1986 upo~1987
-------------+---------------------------------------------------------------------------------
   upols1979 |   1.0000
   upols1980 |   0.3283   1.0000
   upols1981 |   0.4442   0.4035   1.0000
   upols1982 |   0.3008   0.3140   0.5678   1.0000
   upols1983 |   0.2089   0.2298   0.3739   0.4684   1.0000
   upols1984 |   0.2025   0.2289   0.3194   0.3360   0.6398   1.0000
   upols1985 |   0.2395   0.3246   0.4087   0.3484   0.3898   0.5800   1.0000
   upols1986 |   0.1987   0.1903   0.2797   0.2470   0.3109   0.3535   0.3991   1.0000
   upols1987 |   0.2091   0.3167   0.3340   0.2877   0.3097   0.3361   0.3941   0.3496   1.0000
   upols1988 |   0.1619   0.2456   0.3016   0.2582   0.2083   0.2470   0.3436   0.5545   0.5242

             | upo~1988
-------------+---------
   upols1988 |   1.0000

. corr ure1979 ure1980 ure1981 ure1982 ure1983 ure1984 ure1985 ure1986 ure1987 ure1988
(obs=532)

             |  ure1979  ure1980  ure1981  ure1982  ure1983  ure1984  ure1985  ure1986  ure1987
-------------+---------------------------------------------------------------------------------
     ure1979 |   1.0000
     ure1980 |   0.0778   1.0000
     ure1981 |   0.1777   0.0604   1.0000
     ure1982 |  -0.0250  -0.0519   0.2492   1.0000
     ure1983 |  -0.2339  -0.2277  -0.1609   0.0587   1.0000
     ure1984 |  -0.2482  -0.2431  -0.2691  -0.1709   0.3795   1.0000
     ure1985 |  -0.1842  -0.0919  -0.1054  -0.1581  -0.0939   0.2197   1.0000
     ure1986 |  -0.1860  -0.2333  -0.2434  -0.2405  -0.1110  -0.0763  -0.0361   1.0000
     ure1987 |  -0.1665  -0.0481  -0.1580  -0.1904  -0.1710  -0.1506  -0.0646  -0.0553   1.0000
     ure1988 |  -0.1960  -0.1251  -0.1646  -0.1949  -0.3265  -0.2786  -0.1221   0.2708   0.2379

             |  ure1988
-------------+---------
     ure1988 |   1.0000

. * The following gives Table 21.4 p.715


. corr ufe1979 ufe1980 ufe1981 ufe1982 ufe1983 ufe1984 ufe1985 ufe1986 ufe1987 ufe1988
(obs=532)

             |  ufe1979  ufe1980  ufe1981  ufe1982  ufe1983  ufe1984  ufe1985  ufe1986  ufe1987
-------------+---------------------------------------------------------------------------------
     ufe1979 |   1.0000
     ufe1980 |   0.1017   1.0000
     ufe1981 |   0.2082   0.0802   1.0000
     ufe1982 |   0.0003  -0.0380   0.2631   1.0000
     ufe1983 |  -0.2632  -0.2691  -0.2113   0.0089   1.0000
     ufe1984 |  -0.2594  -0.2698  -0.3004  -0.2037   0.3249   1.0000
     ufe1985 |  -0.1757  -0.0958  -0.1069  -0.1685  -0.1617   0.1713   1.0000
     ufe1986 |  -0.1915  -0.2534  -0.2644  -0.2676  -0.1723  -0.1364  -0.0865   1.0000
     ufe1987 |  -0.1519  -0.0497  -0.1561  -0.2008  -0.2399  -0.2066  -0.0918  -0.0908   1.0000
     ufe1988 |  -0.1650  -0.1109  -0.1385  -0.1772  -0.3816  -0.3096  -0.1268   0.2420   0.2439

             |  ufe1988
-------------+---------
     ufe1988 |   1.0000

.
. * The following does estimation for just one year
. regress lnhr1979 lnwg1979

      Source |       SS       df       MS              Number of obs =     532
-------------+------------------------------           F(  1,   530) =    0.00
       Model |  .000035507     1  .000035507           Prob > F      =  0.9810
    Residual |  33.0180361   530  .062298181           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0019
       Total |  33.0180716   531  .062180926           Root MSE      =   .2496

------------------------------------------------------------------------------
    lnhr1979 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnwg1979 |   .0006173   .0258574     0.02   0.981    -.0501783    .0514129
       _cons |   7.667738   .0680375   112.70   0.000     7.534082    7.801395
------------------------------------------------------------------------------
.
. ************ (2) ANALYSIS: OBTAIN AUTOCORRELATIONS FOR FIRST DIFFERENCES
.
. ** SET UP THE DATA
. use mom, clear
. gen dlnhr = lnhr - lnhr[_n-1]
(1 missing value generated)
. gen dlnwg = lnwg - lnwg[_n-1]
(1 missing value generated)
. * The following drops the first year which here is 1979
. drop if year == 1979
(532 observations deleted)
. regress dlnhr dlnwg

      Source |       SS       df       MS              Number of obs =    4788
-------------+------------------------------           F(  1,  4786) =   26.09
       Model |  2.27870825     1  2.27870825           Prob > F      =  0.0000
    Residual |  417.943979  4786  .087326364           R-squared     =  0.0054
-------------+------------------------------           Adj R-squared =  0.0052
       Total |  420.222687  4787  .087784142           Root MSE      =  .29551

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1089851   .0213351     5.11   0.000     .0671584    .1508118
       _cons |   .0008283   .0042712     0.19   0.846    -.0075452    .0092018
------------------------------------------------------------------------------

. predict ufdiff, residuals
. * Here just do this for lnhr and lnwg and the residuals
. keep dlnhr dlnwg ufdiff id year
. reshape wide dlnhr dlnwg ufdiff, i(id) j(year)
(note: j = 1980 1981 1982 1983 1984 1985 1986 1987 1988)
Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     4788   ->     532
Number of variables                   5   ->      28
j variable (9 values)              year   ->   (dropped)
xij variables:
                                  dlnhr   ->   dlnhr1980 dlnhr1981 ... dlnhr1988
                                  dlnwg   ->   dlnwg1980 dlnwg1981 ... dlnwg1988
                                 ufdiff   ->   ufdiff1980 ufdiff1981 ... ufdiff1988
-----------------------------------------------------------------------------

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       532       266.5    153.7194          1        532
   dlnhr1980 |       532   -.0092481    .3023508       -2.5       1.71
   dlnwg1980 |       532    .0046053    .2301879      -2.12       1.05
  ufdiff1980 |       532   -.0105783    .3014161  -2.499738   1.690644
   dlnhr1981 |       532    .0075564    .2668644       -1.2       2.32
-------------+--------------------------------------------------------
   dlnwg1981 |       532    .0085902    .1818033       -.79       1.62
  ufdiff1981 |       532    .0057919    .2669213  -1.145188   2.343149
   dlnhr1982 |       532   -.0215602     .212834      -2.06       1.14
   dlnwg1982 |       532    .0037218    .1755574      -1.17        .74
  ufdiff1982 |       532   -.0227941     .213709  -2.036851   1.135902
-------------+--------------------------------------------------------
   dlnhr1983 |       532   -.0330263    .3413969      -4.51   .9899998
   dlnwg1983 |       532   -.0041541    .1673057       -.88   .6399999
  ufdiff1983 |       532   -.0334019    .3398726  -4.419281   .9780819
   dlnhr1984 |       532    .0234586    .3034213      -2.31       2.57
   dlnwg1984 |       532   -.0103383    .2342514      -2.13        .77
-------------+--------------------------------------------------------
  ufdiff1984 |       532    .0237571    .3004287  -2.168058   2.502691
   dlnhr1985 |       532    .0318421    .2772558      -1.46       3.52
   dlnwg1985 |       532    .0147556    .2371054      -1.33       3.06
  ufdiff1985 |       532    .0294057    .2697542  -1.315878   3.185677
   dlnhr1986 |       532   -.0090789    .3270724      -4.79        1.8
-------------+--------------------------------------------------------
   dlnwg1986 |       532    -.012312    .1804162      -1.83       1.04
  ufdiff1986 |       532   -.0085654    .3299129  -4.796278   1.789363
   dlnhr1987 |       532    .0147744    .3470122      -3.24       4.52
   dlnwg1987 |       532    .0120677    .1845692  -.9400001       1.95
  ufdiff1987 |       532    .0126309    .3494111  -3.243008   4.550777
-------------+--------------------------------------------------------
   dlnhr1988 |       532    .0057707    .2587991       -2.5       2.74
   dlnwg1988 |       532    .0109023     .194813       -1.5       1.22
  ufdiff1988 |       532    .0037542    .2576554  -2.337351   2.739172
.
. ** GET THE CORRELATIONS
. corr dlnhr1980 dlnhr1981 dlnhr1982 dlnhr1983 dlnhr1984 dlnhr1985 dlnhr1986 dlnhr1987 dlnhr1988
(obs=532)

             | dlnhr1~0 dlnhr1~1 dlnhr1~2 dlnhr1~3 dlnhr1~4 dlnhr1~5 dlnhr1~6 dlnhr1~7 dlnhr1~8
-------------+---------------------------------------------------------------------------------
   dlnhr1980 |   1.0000
   dlnhr1981 |  -0.6289   1.0000
   dlnhr1982 |   0.0402  -0.2306   1.0000
   dlnhr1983 |   0.0144  -0.0204  -0.2209   1.0000
   dlnhr1984 |  -0.0001  -0.0570  -0.1410  -0.4495   1.0000
   dlnhr1985 |   0.0393  -0.0320  -0.0827  -0.4035  -0.1969   1.0000
   dlnhr1986 |  -0.0629   0.0322   0.0112   0.0233  -0.1192  -0.2334   1.0000
   dlnhr1987 |   0.0811  -0.0709  -0.0029  -0.0448  -0.0202   0.0093  -0.6231   1.0000
   dlnhr1988 |  -0.0341   0.0461  -0.0082  -0.1020   0.0261   0.0682   0.2486  -0.6064   1.0000

. corr dlnwg1980 dlnwg1981 dlnwg1982 dlnwg1983 dlnwg1984 dlnwg1985 dlnwg1986 dlnwg1987 dlnwg1988
(obs=532)

             | dlnwg1~0 dlnwg1~1 dlnwg1~2 dlnwg1~3 dlnwg1~4 dlnwg1~5 dlnwg1~6 dlnwg1~7 dlnwg1~8
-------------+---------------------------------------------------------------------------------
   dlnwg1980 |   1.0000
   dlnwg1981 |  -0.3507   1.0000
   dlnwg1982 |  -0.0149  -0.2849   1.0000
   dlnwg1983 |   0.0215  -0.0351  -0.3338   1.0000
   dlnwg1984 |  -0.0112   0.0098  -0.0686  -0.1899   1.0000
   dlnwg1985 |  -0.0135  -0.0085   0.0141  -0.1179  -0.5560   1.0000
   dlnwg1986 |  -0.0121   0.0289  -0.0303   0.0725  -0.0526  -0.2665   1.0000
   dlnwg1987 |  -0.0042  -0.0119   0.0382  -0.0083   0.1200  -0.1482  -0.5043   1.0000
   dlnwg1988 |  -0.0281  -0.0377   0.0157  -0.0133  -0.0174  -0.0058  -0.0174  -0.2627   1.0000

. corr ufdiff1980 ufdiff1981 ufdiff1982 ufdiff1983 ufdiff1984 ufdiff1985 ufdiff1986 ufdiff1987 ufdiff1988
(obs=532)

             | ufd~1980 ufd~1981 ufd~1982 ufd~1983 ufd~1984 ufd~1985 ufd~1986 ufd~1987 ufd~1988
-------------+---------------------------------------------------------------------------------
  ufdiff1980 |   1.0000
  ufdiff1981 |  -0.6263   1.0000
  ufdiff1982 |   0.0451  -0.2389   1.0000
  ufdiff1983 |   0.0128  -0.0239  -0.2316   1.0000
  ufdiff1984 |  -0.0010  -0.0588  -0.1291  -0.4804   1.0000
  ufdiff1985 |   0.0453  -0.0285  -0.0868  -0.3731  -0.1853   1.0000
  ufdiff1986 |  -0.0674   0.0321   0.0110   0.0256  -0.1138  -0.2538   1.0000
  ufdiff1987 |   0.0811  -0.0711  -0.0077  -0.0533  -0.0081   0.0211  -0.6250   1.0000
  ufdiff1988 |  -0.0323   0.0499   0.0022  -0.1019   0.0368   0.0543   0.2326  -0.5943   1.0000

.
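
Part (2) of this log forms first differences with lnhr - lnhr[_n-1] on data sorted by person and year, then drops 1979 so that no difference crosses from one person to the next. A minimal Python sketch of that step, followed by OLS of the differenced dependent variable on the differenced regressor, is given here. It is an illustration under stated assumptions, not the authors' code: the function names and the six data rows are hypothetical. Differencing is done within each id explicitly, so the result does not depend on the input order.

```python
def first_differences(rows):
    """rows: list of (id, year, y, x). Returns (id, year, dy, dx) for every
    year after each individual's first year."""
    out = []
    previous = {}  # last (y, x) seen for each id
    for pid, year, y, x in sorted(rows):  # sorts by id, then year
        if pid in previous:
            py, px = previous[pid]
            out.append((pid, year, y - py, x - px))
        previous[pid] = (y, x)
    return out


def ols(x, y):
    """Intercept and slope from a one-regressor least squares fit."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical panel: y = 2*x + individual effect (5 for id 1, -1 for id 2).
rows = [
    (1, 1979, 7, 1), (1, 1980, 9, 2), (1, 1981, 13, 4),
    (2, 1979, -1, 0), (2, 1980, 5, 3), (2, 1981, 9, 5),
]
diffs = first_differences(rows)
intercept, slope = ols([d[3] for d in diffs], [d[2] for d in diffs])
print(intercept, slope)
```

Because the individual effect is constant over time it cancels in the differences, which is why the slope in this constructed example is recovered without error.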
. ************ (3) ANALYSIS: CORRELATIONS FOR AN INDIVIDUAL OBSERVATION
.
. * Look at correlations for each individual
.
. ** TRANSFORM DATA FROM LONG FORM TO WIDE FORM FOR INDIVIDUALS
.
. use mom3, replace
. * Here just do this for lnhr and lnwg and the residuals
. keep lnhr lnwg id year
. reshape wide lnhr lnwg, i(year) j(id)
(note: j = 1 2 3 4 5 ... 528 529 530 531 532)
Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     5320   ->      10
Number of variables                   4   ->    1065
j variable (532 values)              id   ->   (dropped)
xij variables:
                                   lnhr   ->   lnhr1 lnhr2 ... lnhr532
                                   lnwg   ->   lnwg1 lnwg2 ... lnwg532
-----------------------------------------------------------------------------
. * Note that i and j are reversed
.
. * Since id is 1 to 532 this will create
. * lnhr1 to lnhr532 and lnwg1 to lnwg532
.
. tsset year
time variable: year, 1979 to 1988
.
. * First-order Correlation over T years for the first observation
. corr lnhr1 L.lnhr1
(obs=9)
             |                 L.
             |    lnhr1    lnhr1
-------------+------------------
       lnhr1 |
          -- |   1.0000
          L1 |   0.6378   1.0000

. * First-order Correlation over T years for the second observation


. corr lnhr2 L.lnhr2
(obs=9)
             |                 L.
             |    lnhr2    lnhr2
-------------+------------------
       lnhr2 |
          -- |   1.0000
          L1 |   0.5553   1.0000

. * And so on
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section5\mma21p3panresiduals.txt
log type: text
closed on: 23 May 2005, 13:01:15
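
The year-by-year correlation matrices in this log (including those labelled Table 21.3 and Table 21.4) come from two steps: reshape the long panel to one row per individual, then correlate the resulting year columns across individuals. The following Python sketch shows those two steps using only the standard library. It is an illustration, not the authors' program: the function names and the small three-person panel are hypothetical, and a balanced panel is assumed, as in the Stata code.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def long_to_wide(rows):
    """rows: list of (id, year, value). Returns (years, wide), where
    wide[id] holds that individual's values in year order."""
    by_id = defaultdict(dict)
    for pid, year, value in rows:
        by_id[pid][year] = value
    years = sorted({year for _, year, _ in rows})
    wide = {}
    for pid, by_year in by_id.items():
        if set(by_year) != set(years):
            raise ValueError(f"individual {pid} is missing some years")
        wide[pid] = [by_year[t] for t in years]
    return years, wide


def year_correlations(rows):
    """Correlation matrix of the year columns, computed across individuals."""
    years, wide = long_to_wide(rows)
    cols = [[vals[k] for vals in wide.values()] for k in range(len(years))]
    matrix = [[pearson(cs, ct) for ct in cols] for cs in cols]
    return years, matrix


# Hypothetical residuals for 3 individuals over 3 years (not from MOM.dat).
rows = [
    (1, 1979, 1.0), (1, 1980, 2.0), (1, 1981, 3.0),
    (2, 1979, 2.0), (2, 1980, 4.0), (2, 1981, 1.0),
    (3, 1979, 3.0), (3, 1980, 6.0), (3, 1981, 2.0),
]
years, matrix = year_correlations(rows)
for year, row in zip(years, matrix):
    print(year, [round(r, 4) for r in row])
```

Passing pooled OLS, within, or random effects residuals as the value column would give the analogue of the corresponding matrix in the log.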
-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section5\mma21p4pangls.txt
log type: text
opened on: 23 May 2005, 11:38:01
.
. ********** OVERVIEW OF MMA21P4PANGLS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 21.5.5 page 725 Table 21.6 Pooled panel OLS and GLS
. * Demonstrate pooled GLS estimation using XTGEE
. * (1) No correlation (i.e. pooled OLS)
. * (2) Equicorrelated
. * (3) AR1
. * (4) Unrestricted
. * Standard errors are default plus panel bootstrap
.
. * To run you need file
. * MOM.dat
. * in your directory
.
. * The four basic linear panel programs are
. * mma21p1panfeandre.do Linear fixed and random effects using xtreg
. * mma21p2panfeandre.do Linear fe and re using transformation and regress
. *                      plus also has valid Hausman test
. * mma21p3panresiduals.do Residual analysis after linear fe and re
. * mma21p4pangls.do Pooled panel OLS and GLS
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
.
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
.
. * File MOM.dat is the version of the data posted at the JBES website
. * Note that in chapter 22 we instead use MOMprecise.dat
. * which is the same data set but with more significant digits
.
. ********** READ DATA AND SUMMARIZE **********
.*
. * The data are in ascii file MOM.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOM.dat
(5320 observations read)
.
. describe

Contains data
  obs:         5,320
 vars:             8
 size:       191,520 (98.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
lnhr            float  %9.0g
lnwg            float  %9.0g
kids            float  %9.0g
ageh            float  %9.0g
agesq           float  %9.0g
disab           float  %9.0g
id              float  %9.0g
year            float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lnhr |      5320     7.65743    .2855914       2.77       8.56
        lnwg |      5320    2.609436    .4258924       -.26       4.69
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+--------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
.
. ********** DEFINE GLOBALS INCLUDING REGRESSOR LIST *********
.
. * Number of reps for the bootstrap
. * Table 21.6 used 500
. global nreps 500
.
. ********* ANALYSIS: DIFFERENT POOLED GLS ESTIMATES USING XTGEE *********
.
. *** (1) NO ERROR CORRELATION - SAME AS POOLED OLS Table 21.7 first column
.
. * Default standard error
. xtgee lnhr lnwg, corr(independent) i(id)

Iteration 1: tolerance = 3.405e-13

GEE population-averaged model                   Number of obs      =      5320
Group variable:                         id      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                   independent                     max =        10
                                                Wald chi2(1)       =     82.25
Scale parameter:                  .0803055      Prob > chi2        =    0.0000

Pearson chi2(5320):                 427.23      Deviance           =    427.23
Dispersion (Pearson):             .0803055      Dispersion         =  .0803055

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0091234     9.07   0.000      .064862    .1006251
       _cons |   7.441516   .0241219   308.50   0.000     7.394238    7.488795
------------------------------------------------------------------------------

. estimates store ind
. * "Robust" standard error
. xtgee lnhr lnwg, corr(independent) i(id) robust

Iteration 1: tolerance = 3.405e-13

GEE population-averaged model                   Number of obs      =      5320
Group variable:                         id      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                   independent                     max =        10
                                                Wald chi2(1)       =      7.99
Scale parameter:                  .0803055      Prob > chi2        =    0.0047

Pearson chi2(5320):                 427.23      Deviance           =    427.23
Dispersion (Pearson):             .0803055      Dispersion         =  .0803055

                             (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0827436   .0292684     2.83   0.005     .0253785    .1401086
       _cons |   7.441516   .0795795    93.51   0.000     7.285543    7.597489
------------------------------------------------------------------------------

. estimates store indrob
. * Correct panel bootstrap standard errors
. set seed 10001
. bootstrap "xtgee lnhr lnwg, corr(independent) i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level(95)

command:      xtgee lnhr lnwg , corr(independent) i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      5320
                                                  N of clusters    =       532
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |    72  .0827435 -.0007854  .0317837   .0193687   .1461184   (N)
             |                                       .0090096   .1413525   (P)
             |                                       .0154833   .1413525  (BC)
       _bs_2 |    72  7.441516  .0024828  .0861859   7.269667   7.613366   (N)
             |                                        7.27043   7.635125   (P)
             |                                        7.27043   7.631187  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

. matrix indbootse = e(se)
.
. *** (2) EQUICORRELATED - SAME AS RE-GLS Table 21.7 second column
.
. * Default standard error
. xtgee lnhr lnwg, corr(exchangeable) i(id)

Iteration 1: tolerance = .03364039
Iteration 2: tolerance = .00033468
Iteration 3: tolerance = 4.733e-06
Iteration 4: tolerance = 6.715e-08

GEE population-averaged model                   Number of obs      =      5320
Group variable:                         id      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                  exchangeable                     max =        10
                                                Wald chi2(1)       =     76.70
Scale parameter:                  .0805511      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1195474   .0136507     8.76   0.000     .0927925    .1463023
       _cons |   7.345479   .0364481   201.53   0.000     7.274042    7.416916
------------------------------------------------------------------------------

. estimates store exch
. * "Robust" standard error
. xtgee lnhr lnwg, corr(exchangeable) i(id) robust
Iteration 1: tolerance = .03364039
Iteration 2: tolerance = .00033468
Iteration 3: tolerance = 4.733e-06
Iteration 4: tolerance = 6.715e-08

GEE population-averaged model                   Number of obs      =      5320
Group variable:                         id      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                  exchangeable                     max =        10
                                                Wald chi2(1)       =      5.38
Scale parameter:                  .0805511      Prob > chi2        =    0.0204

                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .1195474   .0515426     2.32   0.020     .0185258     .220569
       _cons |   7.345479   .1379494    53.25   0.000     7.075103    7.615855
------------------------------------------------------------------------------
. estimates store exchrob
. * Correct panel bootstrap standard errors
. set seed 10001
. bootstrap "xtgee lnhr lnwg, corr(exchangeable) i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nrep
> s) level(95)
command:      xtgee lnhr lnwg , corr(exchangeable) i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs      =      5320
                                                  N of clusters      =       532
                                                  Replications       =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |    72  .1195474  .0068755   .059895   .0001201   .2389747   (N)
             |                                       .0256504   .2573869   (P)
             |                                       .0256504   .2286118  (BC)
       _bs_2 |    72  7.345479 -.0179736  .1585556   7.029328    7.66163   (N)
             |                                       6.990765   7.605015   (P)
             |                                       7.066358   7.605015  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix exchbootse = e(se)

.
. *** (3) AR(1) Table 21.7 third column
.
. * Default standard error
. xtgee lnhr lnwg, corr(ar 1) i(id) t(year)
Iteration 1: tolerance = .001507
Iteration 2: tolerance = 2.246e-06
Iteration 3: tolerance = 1.547e-09
GEE population-averaged model                   Number of obs      =      5320
Group and time vars:               id year      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                         AR(1)                     max =        10
                                                Wald chi2(1)       =     46.73
Scale parameter:                  .0803129      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0843777   .0123428     6.84   0.000     .0601862    .1085691
       _cons |   7.439893   .0327698   227.04   0.000     7.375665     7.50412
------------------------------------------------------------------------------
. estimates store ar1
. * "Robust" standard error
. xtgee lnhr lnwg, corr(ar 1) i(id) t(year) robust
Iteration 1: tolerance = .001507
Iteration 2: tolerance = 2.246e-06
Iteration 3: tolerance = 1.547e-09
GEE population-averaged model                   Number of obs      =      5320
Group and time vars:               id year      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                         AR(1)                     max =        10
                                                Wald chi2(1)       =      5.15
Scale parameter:                  .0803129      Prob > chi2        =    0.0232

                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0843777   .0371764     2.27   0.023     .0115133    .1572421
       _cons |   7.439893    .100308    74.17   0.000     7.243293    7.636493
------------------------------------------------------------------------------
. estimates store ar1rob


. * Correct panel bootstrap standard errors
. set seed 10001
. bootstrap "xtgee lnhr lnwg, corr(ar 1) i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nreps) level
> (95)
command:      xtgee lnhr lnwg , corr(ar 1) i(id)
statistics:   _bs_1      = _b[lnwg]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs      =      5320
                                                  N of clusters      =       532
                                                  Replications       =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .0843777 -.0025819   .050393   -.014631   .1833863   (N)
             |                                      -.0060264    .184696   (P)
             |                                      -.0031327   .1860251  (BC)
       _bs_2 |   500  7.439893  .0077122   .136732   7.171251   7.708534   (N)
             |                                       7.165532   7.686645   (P)
             |                                       7.157923   7.676162  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix ar1bootse = e(se)
.
. *** (4) HOMOSKEDASTIC UNSTRUCTURED Table 21.7 fourth column
.
. * Default standard error
. xtgee lnhr lnwg, corr(unstructured) i(id) t(year)
Iteration 1: tolerance = .00721446
Iteration 2: tolerance = .0003951
Iteration 3: tolerance = .00001469
Iteration 4: tolerance = 4.230e-07
GEE population-averaged model                   Number of obs      =      5320
Group and time vars:               id year      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                  unstructured                     max =        10
                                                Wald chi2(1)       =     43.67
Scale parameter:                  .0803575      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0910023   .0137712     6.61   0.000     .0640113    .1179933
       _cons |   7.426262   .0366836   202.44   0.000     7.354363     7.49816
------------------------------------------------------------------------------
. estimates store unstr
. * "Robust" standard error
. xtgee lnhr lnwg, corr(unstructured) i(id) t(year) robust
Iteration 1: tolerance = .00721446
Iteration 2: tolerance = .0003951
Iteration 3: tolerance = .00001469
Iteration 4: tolerance = 4.230e-07
GEE population-averaged model                   Number of obs      =      5320
Group and time vars:               id year      Number of groups   =       532
Link:                             identity      Obs per group: min =        10
Family:                           Gaussian                     avg =      10.0
Correlation:                  unstructured                     max =        10
                                                Wald chi2(1)       =      3.29
Scale parameter:                  .0803575      Prob > chi2        =    0.0695

                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
        lnhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnwg |   .0910023   .0501344     1.82   0.069    -.0072594     .189264
       _cons |   7.426262   .1328255    55.91   0.000     7.165929    7.686595
------------------------------------------------------------------------------
. estimates store unstrrob
. * Correct panel bootstrap standard errors
. set seed 10001
. /* For some reason the following did not work
> bootstrap "xtgee lnhr lnwg, corr(unstructured) i(id)" "_b[lnwg] _b[_cons]", cluster(id) reps($nrep
> s) level(95)
> matrix unstrbootse = e(se)
> */
.
. ********** DISPLAY RESULTS IN TABLE 21.7 page 725 **********
.
. * Standard error using iid errors and in some cases panel
. estimates table ind indrob exch exchrob, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)

------------------------------------------------------------------
    Variable |     ind        indrob        exch      exchrob
-------------+----------------------------------------------------
        lnwg |      0.083       0.083       0.120       0.120
             |      0.009       0.029       0.014       0.052
       _cons |      7.442       7.442       7.345       7.345
             |      0.024       0.080       0.036       0.138
-------------+----------------------------------------------------
           N |   5320.000    5320.000    5320.000    5320.000
          ll |
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
------------------------------------------------------------------
                                                      legend: b/se
. estimates table ar1 ar1rob unstr unstrrob, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
------------------------------------------------------------------
    Variable |     ar1        ar1rob       unstr      unstrrob
-------------+----------------------------------------------------
        lnwg |      0.084       0.084       0.091       0.091
             |      0.012       0.037       0.014       0.050
       _cons |      7.440       7.440       7.426       7.426
             |      0.033       0.100       0.037       0.133
-------------+----------------------------------------------------
           N |   5320.000    5320.000    5320.000    5320.000
          ll |
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
------------------------------------------------------------------
                                                      legend: b/se
.
. * Standard errors using panel bootstrap (regular bootstrap for between)
. matrix list indbootse
indbootse[1,2]
         _bs_1      _bs_2
se   .03178369   .0861859
. matrix list exchbootse
exchbootse[1,2]
         _bs_1      _bs_2
se   .05989501  .15855561
. matrix list ar1bootse
ar1bootse[1,2]
         _bs_1      _bs_2
se   .05039303  .13673201
. matrix list unstrbootse
matrix unstrbootse not found
r(111);
end of do-file
r(111);
. exit, clear

------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section5\mma22p1pangmm.txt
  log type:  text
 opened on:  23 May 2005, 11:52:35
.
. ********** OVERVIEW OF MMA22P1PANGMM.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 22.3 pages 754-6
. * Panel 2SLS and GMM for a linear model with endogenous regressors
. * Fixed effects are first differenced.
. * Then 2SLS and GMM applied to first differenced model.
.
. * Program derives Table 22.2 and does other analysis in section
. * (1) pooled OLS
. * (2) 2SLS in base instruments case
. * (3) 2SLS in stacked instruments case
. * (4) 2SGMM in base instruments case
. * (5) 2SGMM in stacked instruments case
. * (6) F-statistics for weak instruments
. * (7) Partial R-squared for weak instruments
.
. * The pooled OLS and 2SLS replicate Ziliak (1997) Table 1 Top left-hand corner
. * for Base Case (9 instruments) and first Stacked Case (72 instruments)
. * 2SLS in first differences where both 1979 and 1980 are dropped
.
. * To run you need file
. * MOMprecise.dat
. * in your directory
.
. * NOTE: This data set is different from MOM.dat used in chapter 21.
. *       The data here has more significant digits,
. *       leading to some difference in resulting coefficient estimates.
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * The original data is from
. * Jim Ziliak (1997)
. * "Efficient Estimation With Panel Data when Instruments are Predetermined:
. * An Empirical Comparison of Moment-Condition Estimators"
. * Journal of Business and Economic Statistics, 15, 419-431
. * NOTE: Data originally posted on JBES website was to only 2 dec places
. * Here more accurate data is used (the same as the data used by Ziliak)
. * Ziliak used Gauss. Here Stata is used.
.
. * File MOM.dat has data on 532 men over 10 years (1979-1988)
. * Data are space-delimited ordered by person with separate line for each year
. * So id 1 1979, id 1 1980, ..., id 1 1988, id 2 1979, id 2 1980, ...
. * 8 variables:
. * lnhr lnwg kids ageh agesq disab id year
.
. * File MOMprecise.dat has more significant digits than file MOM.dat
. * (the version of the data posted at the JBES website, used in chapter 21)
.
. ********** READ DATA **********
.
. * The data are in ascii file MOMprecise.dat
. * There are 532 individuals with 10 lines (years) per individual
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. infile lnhr lnwg kids ageh agesq disab id year using MOMprecise.dat
(5320 observations read)
. describe
Contains data
  obs:         5,320
 vars:             8
 size:       191,520 (98.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
lnhr            float  %9.0g
lnwg            float  %9.0g
kids            float  %9.0g
ageh            float  %9.0g
agesq           float  %9.0g
disab           float  %9.0g
id              float  %9.0g
year            float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
        lnhr |      5320    7.657458      .28564   2.772589   8.556414
        lnwg |      5320    2.609477    .4260333  -.2613648   4.686474
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+----------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
.
. ********** FIRST DIFFERENCES REGRESSION **********
.
. * Stata has no command for first differences regression
. * Though may be possible with xtivreg
.
. * The following only works if each observation is (i,t)
. * and within i the data are ordered by t
. gen dlnhr = lnhr - lnhr[_n-1]
(1 missing value generated)
. gen dlnwg = lnwg - lnwg[_n-1]
(1 missing value generated)
. gen dkids = kids - kids[_n-1]
(1 missing value generated)
. gen dageh = ageh - ageh[_n-1]
(1 missing value generated)
. gen dagesq = agesq - agesq[_n-1]
(1 missing value generated)
. gen ddisab = disab - disab[_n-1]
(1 missing value generated)
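Because each `gen` above reports only one missing value, the differences are also taken across the boundary between one individual and the next: the 1979 row of each person is differenced against the last row of the previous person. Those rows are removed later when 1979 and 1980 are dropped. A minimal sketch of an alternative that differences within `id` only is given below. It assumes the same variable names as in this log; the new names ending in `_w` and `_ts` are hypothetical and are not used elsewhere in the program.

```stata
* Sketch only: first differences computed within each individual
* Names ending in _w and _ts are hypothetical, chosen to avoid clashing with the log
sort id year
by id: gen dlnhr_w = lnhr - lnhr[_n-1]
by id: gen dlnwg_w = lnwg - lnwg[_n-1]

* Alternative using the time-series difference operator after declaring the panel
tsset id year
gen dlnhr_ts = D.lnhr
```

With either form the first year of each individual is set to missing rather than being filled with a cross-individual difference.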
.
. * The regression is of
. * dlnhr on constant dlnwg dkids dageh dagesq ddisab
.
. ********** GENERATE THE INSTRUMENTS **********
.
. * The endogenous variable is dlnwg. The others are exogenous.
. * It is not clear whether current values of the exogenous variables are used as instruments.
. * I would think so but there is no mention in the paper of this.
. * In addition Table 1 considers various instrument sets
. * We consider the first (first rows) and second (second rows)
.
. * (1) Use the levels of the exogenous regressors lagged one and two periods
. * and the level of the endogenous regressor lagged two periods
. * This gives nine instruments
. gen kidsl1 = kids[_n-1]
(1 missing value generated)
. gen kidsl2 = kids[_n-2]
(2 missing values generated)
. gen agehl1 = ageh[_n-1]
(1 missing value generated)
. gen agehl2 = ageh[_n-2]
(2 missing values generated)
. gen agesql1 = agesq[_n-1]
(1 missing value generated)
. gen agesql2 = agesq[_n-2]
(2 missing values generated)
. gen disabl1 = disab[_n-1]
(1 missing value generated)
. gen disabl2 = disab[_n-2]
(2 missing values generated)
. gen lnwgl2 = lnwg[_n-2]
(2 missing values generated)
.
. * (2) Use the same instruments as in (1) except now stacked so that
. * now the instrument matrix is block-diagonal.
. * This gives nine instruments times number of time periods.
. * The original data are 1979 to 1988.
. * We will eventually drop the first two years as lose 2 years due to lags.
. * For short hand call the instruments z1 to z9 and the years 1981 to 1988 y1 to y8.
. * Pad out to 8 x 9 = 72 instruments for 8 years
.
. program define makeZ
1. forvalues i=1(1)8 {
2. gen z1y`i'=0
3. replace z1y`i' = ageh[_n-1] if year==1980+`i'
4. gen z2y`i'=0
5. replace z2y`i' = agesq[_n-1] if year==1980+`i'
6. gen z3y`i'=0
7. replace z3y`i' = kids[_n-1] if year==1980+`i'
8. gen z4y`i'=0
9. replace z4y`i' = disab[_n-1] if year==1980+`i'
10. gen z5y`i'=0
11. replace z5y`i' = ageh[_n-2] if year==1980+`i'
12. gen z6y`i'=0
13. replace z6y`i' = agesq[_n-2] if year==1980+`i'
14. gen z7y`i'=0
15. replace z7y`i' = kids[_n-2] if year==1980+`i'
16. gen z8y`i'=0
17. replace z8y`i' = disab[_n-2] if year==1980+`i'
18. gen z9y`i'=0
19. replace z9y`i' = lnwg[_n-2] if year==1980+`i'
20. }
21. end
. quietly makeZ
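The program above writes out all nine instruments by hand for each year. As a rough sketch of the same block-diagonal construction, the nine source variables and their lag lengths can be held in two parallel lists and looped over. The program name `makeZloop` and the variable prefix `w` are hypothetical, chosen so that the sketch does not collide with the `z` variables created by `makeZ`.

```stata
* Sketch only: same construction as makeZ, written with nested loops
* makeZloop and the prefix w are hypothetical names
program define makeZloop
    * k-th entry of src is the source variable for instrument k,
    * k-th entry of lags is the lag applied to it (matches z1,...,z9 above)
    local src  ageh agesq kids disab ageh agesq kids disab lnwg
    local lags 1 1 1 1 2 2 2 2 2
    forvalues i = 1/8 {
        forvalues k = 1/9 {
            local v : word `k' of `src'
            local l : word `k' of `lags'
            gen w`k'y`i' = 0
            replace w`k'y`i' = `v'[_n-`l'] if year == 1980 + `i'
        }
    }
end
```

Running `quietly makeZloop` would then create `w1y1` to `w9y8`, intended to mirror `z1y1` to `z9y8`.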
. sum
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
        lnhr |      5320    7.657458      .28564   2.772589   8.556414
        lnwg |      5320    2.609477    .4260333  -.2613648   4.686474
        kids |      5320    1.555827    1.195924          0          6
        ageh |      5320    38.91823    8.450351         22         60
       agesq |      5320    1586.024    689.7759        484       3600
-------------+----------------------------------------------------------
       disab |      5320    .0609023    .2391734          0          1
          id |      5320       266.5    153.5893          1        532
        year |      5320      1983.5    2.872551       1979       1988
       dlnhr |      5319    .0000192    .3016322  -4.787492   4.521109
       dlnwg |      5319    .0001115    .2718437   -2.32463   3.062298
-------------+----------------------------------------------------------
       dkids |      5319    -.000188    .6629109         -5          6
       dageh |      5319    .0030081    4.611209        -36         19
      dagesq |      5319    .2105659    371.0841      -3024       1577
      ddisab |      5319           0    .2429913         -1          1
      kidsl1 |      5319    1.555932    1.196012          0          6
-------------+----------------------------------------------------------
      kidsl2 |      5318    1.556036    1.196101          0          6
      agehl1 |      5319    38.91747     8.45096         22         60
      agehl2 |      5318    38.91707    8.451706         22         60
     agesql1 |      5319    1585.974    689.8313        484       3600
     agesql2 |      5318    1585.957    689.8949        484       3600
-------------+----------------------------------------------------------
     disabl1 |      5319    .0609137    .2391944          0          1
     disabl2 |      5318    .0609252    .2392155          0          1
      lnwgl2 |      5318    2.609513    .4261095  -.2613648   4.686474
        z1y1 |      5320    3.544549    10.92972          0         52
        z2y1 |      5320    132.0002    438.9997          0       2704
-------------+----------------------------------------------------------
        z3y1 |      5320    .1567669    .5978681          0          6
        z4y1 |      5320    .0048872    .0697442          0          1
        z5y1 |      5320    3.445489    10.64043          0         51
        z6y1 |      5320    125.0688    418.0247          0       2601
        z7y1 |      5320    .1520677    .5938801          0          6
-------------+----------------------------------------------------------
        z8y1 |      5320    .0054511    .0736372          0          1
        z9y1 |      5320    .2597756    .7905791          0    4.61522
        z1y2 |      5320     3.63891    11.20265          0         53
        z2y2 |      5320    138.7175    458.8032          0       2809
        z3y2 |      5320    .1590226    .6057112          0          6
-------------+----------------------------------------------------------
        z4y2 |      5320    .0039474    .0627099          0          1
        z5y2 |      5320    3.544549    10.92972          0         52
        z6y2 |      5320    132.0002    438.9997          0       2704
        z7y2 |      5320    .1567669    .5978681          0          6
        z8y2 |      5320    .0048872    .0697442          0          1
-------------+----------------------------------------------------------
        z9y2 |      5320    .2602349    .7906729          0    4.60976
        z1y3 |      5320    3.737218    11.49054          0         54
        z2y3 |      5320    145.9744    480.6547          0       2916
        z3y3 |      5320    .1637218    .6172305          0          6
        z4y3 |      5320    .0052632    .0723633          0          1
-------------+----------------------------------------------------------
        z5y3 |      5320     3.63891    11.20265          0         53
        z6y3 |      5320    138.7175    458.8032          0       2809
        z7y3 |      5320    .1590226    .6057112          0          6
        z8y3 |      5320    .0039474    .0627099          0          1
        z9y3 |      5320    .2610997    .7928738          0    4.52656
-------------+----------------------------------------------------------
        z1y4 |      5320     3.83985    11.79093          0         55
        z2y4 |      5320    153.7444    503.9576          0       3025
        z3y4 |      5320    .1620301    .6132476          0          6
        z4y4 |      5320    .0037594    .0612043          0          1
        z5y4 |      5320    3.737218    11.49054          0         54
-------------+----------------------------------------------------------
        z6y4 |      5320    145.9744    480.6547          0       2916
        z7y4 |      5320    .1637218    .6172305          0          6
        z8y4 |      5320    .0052632    .0723633          0          1
        z9y4 |      5320    .2614749    .7946793          0   4.607767
        z1y5 |      5320    3.940414    12.08767          0         56
-------------+----------------------------------------------------------
        z2y5 |      5320    161.6111    527.9522          0       3136
        z3y5 |      5320    .1595865     .608814          0          6
        z4y5 |      5320     .006015    .0773303          0          1
        z5y5 |      5320     3.83985    11.79093          0         55
        z6y5 |      5320    153.7444    503.9576          0       3025
-------------+----------------------------------------------------------
        z7y5 |      5320    .1620301    .6132476          0          6
        z8y5 |      5320    .0037594    .0612043          0          1
        z9y5 |      5320    .2610663    .7939903          0   4.618777
        z1y6 |      5320    4.047368    12.40128          0         57
        z2y6 |      5320     170.144    553.5552          0       3249
-------------+----------------------------------------------------------
        z3y6 |      5320    .1575188    .6042401          0          5
        z4y6 |      5320    .0065789    .0808511          0          1
        z5y6 |      5320    3.940414    12.08767          0         56
        z6y6 |      5320    161.6111    527.9522          0       3136
        z7y6 |      5320    .1595865     .608814          0          6
-------------+----------------------------------------------------------
        z8y6 |      5320     .006015    .0773303          0          1
        z9y6 |      5320    .2600271    .7937085  -.2613648   4.648325
        z1y7 |      5320    4.140602    12.67474          0         58
        z2y7 |      5320    177.7635    576.2959          0       3364
        z3y7 |      5320    .1537594    .5983346          0          5
-------------+----------------------------------------------------------
        z4y7 |      5320     .006203    .0785219          0          1
        z5y7 |      5320    4.047368    12.40128          0         57
        z6y7 |      5320     170.144    553.5552          0       3249
        z7y7 |      5320    .1575188    .6042401          0          5
        z8y7 |      5320    .0065789    .0808511          0          1
-------------+----------------------------------------------------------
        z9y7 |      5320     .261494    .7964894          0   4.686474
        z1y8 |      5320    4.240414    12.96638          0         59
        z2y8 |      5320    186.0765    600.9297          0       3481
        z3y8 |      5320    .1494361    .5901043          0          5
        z4y8 |      5320    .0090226    .0945665          0          1
-------------+----------------------------------------------------------
        z5y8 |      5320    4.140602    12.67474          0         58
        z6y8 |      5320    177.7635    576.2959          0       3364
        z7y8 |      5320    .1537594    .5983346          0          5
        z8y8 |      5320     .006203    .0785219          0          1
        z9y8 |      5320    .2602616    .7933278          0     4.5933
.
. * Define variable lists for regressors X and instruments Z
.
. global XREG dlnwg dkids dageh dagesq ddisab
.
. global ZBASECASE kidsl1 agehl1 agesql1 disabl1 agehl2 kidsl2 agesql2 disabl2 lnwgl2
.
. global ZSTACKED z1y1 z2y1 z3y1 z4y1 z5y1 z6y1 z7y1 z8y1 z9y1 /*
> */ z1y2 z2y2 z3y2 z4y2 z5y2 z6y2 z7y2 z8y2 z9y2 /*
> */ z1y3 z2y3 z3y3 z4y3 z5y3 z6y3 z7y3 z8y3 z9y3 /*
> */ z1y4 z2y4 z3y4 z4y4 z5y4 z6y4 z7y4 z8y4 z9y4 /*
> */ z1y5 z2y5 z3y5 z4y5 z5y5 z6y5 z7y5 z8y5 z9y5 /*
> */ z1y6 z2y6 z3y6 z4y6 z5y6 z6y6 z7y6 z8y6 z9y6 /*
> */ z1y7 z2y7 z3y7 z4y7 z5y7 z6y7 z7y7 z8y7 z9y7 /*
> */ z1y8 z2y8 z3y8 z4y8 z5y8 z6y8 z7y8 z8y8 z9y8
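The 72-name instrument list above is typed out in full. A minimal sketch of building the same list, in the same order, with two loops is shown here. The macro name `ZSTACKLOOP` is hypothetical and is not used elsewhere in the program.

```stata
* Sketch only: build the stacked instrument list year by year
* ZSTACKLOOP is a hypothetical macro name
global ZSTACKLOOP
forvalues i = 1/8 {
    forvalues k = 1/9 {
        global ZSTACKLOOP $ZSTACKLOOP z`k'y`i'
    }
}
* Count the names as a quick check on the list length
local nz : word count $ZSTACKLOOP
display `nz'
```

The outer loop runs over years and the inner loop over instruments, so the names come out as `z1y1 ... z9y1 z1y2 ...`, the order used in `$ZSTACKED`.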
.
. * Define variable lists for weak instruments test which drops
.
. save momfdiffgmm, replace
file momfdiffgmm.dta saved

.
. ********** (1)-(3) 2SLS USING IVREG IS STRAIGHTFORWARD (Table 22.2, p.755) **********
.
. * Note that this will automatically include the exogenous variables as instruments
. * It is not clear that Ziliak does this
.
. * The following drops the first two years which here are 1979 and 1980
. drop if year == 1979 | year == 1980
(1064 observations deleted)
.
. * (1) OLS results at bottom Ziliak table 1
. * Table 22.2 (page 755) OLS column with various standard errors estimates
. regress dlnhr $XREG, noconstant
      Source |       SS       df       MS              Number of obs =    4256
-------------+------------------------------           F(  5,  4251) =    5.38
       Model |   2.3389287     5  .467785741           Prob > F      =  0.0001
    Residual |  369.369193  4251  .086889954           R-squared     =  0.0063
-------------+------------------------------           Adj R-squared =  0.0051
       Total |  371.708121  4256  .087337435           Root MSE      =  .29477

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1115114   .0230566     4.84   0.000     .0663084    .1567144
       dkids |  -.0062887   .0116719    -0.54   0.590    -.0291717    .0165943
       dageh |   .0066935   .0212744     0.31   0.753    -.0350154    .0484025
      dagesq |  -.0000797   .0002644    -0.30   0.763     -.000598    .0004387
      ddisab |  -.0352603   .0199796    -1.76   0.078    -.0744306    .0039101
------------------------------------------------------------------------------
. estimates store olsiid

. regress dlnhr $XREG, noconstant robust

Regression with robust standard errors                 Number of obs =    4256
                                                       F(  5,  4251) =    0.70
                                                       Prob > F      =  0.6246
                                                       R-squared     =  0.0063
                                                       Root MSE      =  .29477

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1115114   .0791674     1.41   0.159     -.043698    .2667207
       dkids |  -.0062887    .011057    -0.57   0.570    -.0279662    .0153888
       dageh |   .0066935   .0243788     0.27   0.784    -.0411016    .0544887
      dagesq |  -.0000797   .0003147    -0.25   0.800    -.0006965    .0005372
      ddisab |  -.0352603   .0364021    -0.97   0.333    -.1066273    .0361067
------------------------------------------------------------------------------
. estimates store olshet
. regress dlnhr $XREG, noconstant cluster(id)
Regression with robust standard errors                 Number of obs =    4256
                                                       F(  5,   531) =    0.52
                                                       Prob > F      =  0.7617
                                                       R-squared     =  0.0063
Number of clusters (id) = 532                          Root MSE      =  .29477

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .1115114   .0960926     1.16   0.246    -.0772569    .3002797
       dkids |  -.0062887   .0109558    -0.57   0.566    -.0278107    .0152333
       dageh |   .0066935    .012339     0.54   0.588    -.0175458    .0309328
      dagesq |  -.0000797   .0001551    -0.51   0.608    -.0003843     .000225
      ddisab |  -.0352603   .0452557    -0.78   0.436    -.1241625     .053642
------------------------------------------------------------------------------
. estimates store olspanel
.
. * (2) 2SLS using the base case instrument set
. * Table 22.2 (page 755) 2SLS column base case with various se estimates
. ivreg dlnhr ($XREG = $ZBASECASE), noconstant
Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    4256
-------------+------------------------------           F(  5,  4251) =       .
       Model |  .164904559     5  .032980912           Prob > F      =       .
    Residual |  371.543217  4251  .087401368           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  371.708121  4256  .087337435           Root MSE      =  .29564

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .2091087   .3886332     0.54   0.591    -.5528154    .9710328
       dkids |  -.0296864   .0437001    -0.68   0.497    -.1153615    .0559886
       dageh |    .026388   .0289908     0.91   0.363     -.030449    .0832251
      dagesq |  -.0003411   .0003688    -0.92   0.355    -.0010641     .000382
      ddisab |    .000402   .0429076     0.01   0.993    -.0837194    .0845233
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   kidsl1 agehl1 agesql1 disabl1 agehl2 kidsl2 agesql2 disabl2
               lnwgl2
------------------------------------------------------------------------------
. estimates store baseiid
. ivreg dlnhr ($XREG = $ZBASECASE), noconstant robust
IV (2SLS) regression with robust standard errors       Number of obs =    4256
                                                       F(  5,  4251) =    0.23
                                                       Prob > F      =  0.9510
                                                       R-squared     =       .
                                                       Root MSE      =  .29564

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .2091087    .423312     0.49   0.621    -.6208038    1.039021
       dkids |  -.0296864   .0400461    -0.74   0.459    -.1081977    .0488249
       dageh |    .026388   .0361631     0.73   0.466    -.0445106    .0972866
      dagesq |  -.0003411   .0004555    -0.75   0.454    -.0012342     .000552
      ddisab |    .000402   .0731433     0.01   0.996     -.142997     .143801
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   kidsl1 agehl1 agesql1 disabl1 agehl2 kidsl2 agesql2 disabl2
               lnwgl2
------------------------------------------------------------------------------
. estimates store basehet
. ivreg dlnhr ($XREG = $ZBASECASE), noconstant cluster(id)
IV (2SLS) regression with robust standard errors       Number of obs =    4256
                                                       F(  5,   531) =    1.44
                                                       Prob > F      =  0.2087
                                                       R-squared     =       .
Number of clusters (id) = 532                          Root MSE      =  .29564

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |   .2091087   .3741705     0.56   0.576    -.5259273    .9441447
       dkids |  -.0296864   .0293678    -1.01   0.313    -.0873777    .0280048
       dageh |    .026388   .0153921     1.71   0.087    -.0038488    .0566249
      dagesq |  -.0003411   .0001837    -1.86   0.064    -.0007019    .0000198
      ddisab |    .000402   .0667719     0.01   0.995    -.1307674    .1315714
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   kidsl1 agehl1 agesql1 disabl1 agehl2 kidsl2 agesql2 disabl2
               lnwgl2
------------------------------------------------------------------------------
. estimates store basepanel
.
. * (3) 2SLS using the stacked instrument set
. * Table 22.2 (page 755) 2SLS column stacked case with various se estimates
. set matsize 100
. ivreg dlnhr ($XREG = $ZSTACKED), noconstant
Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    4256
-------------+------------------------------           F(  5,  4251) =       .
       Model | -29.3711267     5 -5.87422533           Prob > F      =       .
    Residual |  401.079248  4251  .094349388           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  371.708121  4256  .087337435           Root MSE      =  .30716

------------------------------------------------------------------------------
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |    .542827   .1691348     3.21   0.001     .2112345    .8744195
       dkids |  -.0482932   .0393723    -1.23   0.220    -.1254834     .028897
       dageh |   .0268935   .0288808     0.93   0.352     -.029728    .0835151
      dagesq |  -.0003511   .0003671    -0.96   0.339    -.0010709    .0003687
      ddisab |   .0079759   .0397995     0.20   0.841    -.0700519    .0860037
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   z1y1 z2y1 z3y1 z4y1 z5y1 z6y1 z7y1 z8y1 z9y1 z1y2 z2y2 z3y2 z4y2
               z5y2 z6y2 z7y2 z8y2 z9y2 z1y3 z2y3 z3y3 z4y3 z5y3 z6y3 z7y3 z8y3
               z9y3 z1y4 z2y4 z3y4 z4y4 z5y4 z6y4 z7y4 z8y4 z9y4 z1y5 z2y5 z3y5
               z4y5 z5y5 z6y5 z7y5 z8y5 z9y5 z1y6 z2y6 z3y6 z4y6 z5y6 z6y6 z7y6
               z8y6 z9y6 z1y7 z2y7 z3y7 z4y7 z5y7 z6y7 z7y7 z8y7 z9y7 z1y8 z2y8
               z3y8 z4y8 z5y8 z6y8 z7y8 z8y8 z9y8
------------------------------------------------------------------------------
. estimates store stackiid
. ivreg dlnhr ($XREG = $ZSTACKED), noconstant robust
IV (2SLS) regression with robust standard errors       Number of obs =    4256
                                                       F(  5,  4251) =    1.59
                                                       Prob > F      =  0.1596
                                                       R-squared     =       .
                                                       Root MSE      =  .30716

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |    .542827   .2260738     2.40   0.016     .0996043    .9860497
       dkids |  -.0482932   .0350149    -1.38   0.168    -.1169408    .0203544
       dageh |   .0268935   .0339561     0.79   0.428    -.0396781    .0934652
      dagesq |  -.0003511   .0004324    -0.81   0.417    -.0011989    .0004966
      ddisab |   .0079759    .064012     0.12   0.901    -.1175211    .1334729
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   z1y1 z2y1 z3y1 z4y1 z5y1 z6y1 z7y1 z8y1 z9y1 z1y2 z2y2 z3y2 z4y2
               z5y2 z6y2 z7y2 z8y2 z9y2 z1y3 z2y3 z3y3 z4y3 z5y3 z6y3 z7y3 z8y3
               z9y3 z1y4 z2y4 z3y4 z4y4 z5y4 z6y4 z7y4 z8y4 z9y4 z1y5 z2y5 z3y5
               z4y5 z5y5 z6y5 z7y5 z8y5 z9y5 z1y6 z2y6 z3y6 z4y6 z5y6 z6y6 z7y6
               z8y6 z9y6 z1y7 z2y7 z3y7 z4y7 z5y7 z6y7 z7y7 z8y7 z9y7 z1y8 z2y8
               z3y8 z4y8 z5y8 z6y8 z7y8 z8y8 z9y8
------------------------------------------------------------------------------
. estimates store stackhet
. ivreg dlnhr ($XREG = $ZSTACKED), noconstant cluster(id)
IV (2SLS) regression with robust standard errors       Number of obs =    4256
                                                       F(  5,   531) =    2.41
                                                       Prob > F      =  0.0357
                                                       R-squared     =       .
Number of clusters (id) = 532                          Root MSE      =  .30716

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |    .542827   .2085225     2.60   0.009     .1331968    .9524572
       dkids |  -.0482932   .0245011    -1.97   0.049    -.0964242   -.0001622
       dageh |   .0268935   .0149934     1.79   0.073    -.0025602    .0563473
      dagesq |  -.0003511   .0001866    -1.88   0.060    -.0007176    .0000154
      ddisab |   .0079759   .0624423     0.13   0.898    -.1146884    .1306402
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   z1y1 z2y1 z3y1 z4y1 z5y1 z6y1 z7y1 z8y1 z9y1 z1y2 z2y2 z3y2 z4y2
               z5y2 z6y2 z7y2 z8y2 z9y2 z1y3 z2y3 z3y3 z4y3 z5y3 z6y3 z7y3 z8y3
               z9y3 z1y4 z2y4 z3y4 z4y4 z5y4 z6y4 z7y4 z8y4 z9y4 z1y5 z2y5 z3y5
               z4y5 z5y5 z6y5 z7y5 z8y5 z9y5 z1y6 z2y6 z3y6 z4y6 z5y6 z6y6 z7y6
               z8y6 z9y6 z1y7 z2y7 z3y7 z4y7 z5y7 z6y7 z7y7 z8y7 z9y7 z1y8 z2y8
               z3y8 z4y8 z5y8 z6y8 z7y8 z8y8 z9y8
------------------------------------------------------------------------------
. estimates store stackpanel
. ivreg dlnhr ($XREG = $ZSTACKED), noconstant robust cluster(id)
IV (2SLS) regression with robust standard errors       Number of obs =    4256
                                                       F(  5,   531) =    2.41
                                                       Prob > F      =  0.0357
                                                       R-squared     =       .
Number of clusters (id) = 532                          Root MSE      =  .30716

------------------------------------------------------------------------------
             |               Robust
       dlnhr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dlnwg |    .542827   .2085225     2.60   0.009     .1331968    .9524572
       dkids |  -.0482932   .0245011    -1.97   0.049    -.0964242   -.0001622
       dageh |   .0268935   .0149934     1.79   0.073    -.0025602    .0563473
      dagesq |  -.0003511   .0001866    -1.88   0.060    -.0007176    .0000154
      ddisab |   .0079759   .0624423     0.13   0.898    -.1146884    .1306402
------------------------------------------------------------------------------
Instrumented:  dlnwg dkids dageh dagesq ddisab
Instruments:   z1y1 z2y1 z3y1 z4y1 z5y1 z6y1 z7y1 z8y1 z9y1 z1y2 z2y2 z3y2 z4y2
               z5y2 z6y2 z7y2 z8y2 z9y2 z1y3 z2y3 z3y3 z4y3 z5y3 z6y3 z7y3 z8y3
               z9y3 z1y4 z2y4 z3y4 z4y4 z5y4 z6y4 z7y4 z8y4 z9y4 z1y5 z2y5 z3y5
               z4y5 z5y5 z6y5 z7y5 z8y5 z9y5 z1y6 z2y6 z3y6 z4y6 z5y6 z6y6 z7y6
               z8y6 z9y6 z1y7 z2y7 z3y7 z4y7 z5y7 z6y7 z7y7 z8y7 z9y7 z1y8 z2y8
               z3y8 z4y8 z5y8 z6y8 z7y8 z8y8 z9y8
------------------------------------------------------------------------------
.
. * DISPLAY THE OLS AND 2SLS RESULTS
.
. * The following are used in Table 22.2 (page 755)
.
. * OLS column with various standard errors estimates
. estimates table olspanel olshet olsiid, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
----------------------------------------------------
    Variable |    olspanel      olshet      olsiid
-------------+--------------------------------------
       dlnwg |       0.112       0.112       0.112
             |       0.096       0.079       0.023
       dkids |      -0.006      -0.006      -0.006
             |       0.011       0.011       0.012
       dageh |       0.007       0.007       0.007
             |       0.012       0.024       0.021
      dagesq |      -0.000      -0.000      -0.000
             |       0.000       0.000       0.000
      ddisab |      -0.035      -0.035      -0.035
             |       0.045       0.036       0.020
-------------+--------------------------------------
           N |    4256.000    4256.000    4256.000
          ll |    -837.557    -837.557    -837.557
          r2 |       0.006       0.006       0.006
         tss |
         rss |     369.369     369.369     369.369
         mss |       2.339       2.339       2.339
        rmse |       0.295       0.295       0.295
        df_r |     531.000    4251.000    4251.000
----------------------------------------------------
                                        legend: b/se
.
. * 2SLS column base case with various standard errors estimates
. estimates table basepanel basehet baseiid, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
----------------------------------------------------
    Variable |   basepanel     basehet     baseiid
-------------+--------------------------------------
       dlnwg |       0.209       0.209       0.209
             |       0.374       0.423       0.389
       dkids |      -0.030      -0.030      -0.030
             |       0.029       0.040       0.044
       dageh |       0.026       0.026       0.026
             |       0.015       0.036       0.029
      dagesq |      -0.000      -0.000      -0.000
             |       0.000       0.000       0.000
      ddisab |       0.000       0.000       0.000
             |       0.067       0.073       0.043
-------------+--------------------------------------
           N |    4256.000    4256.000    4256.000
          ll |
          r2 |           .           .           .
         tss |
         rss |     371.543     371.543     371.543
         mss |       0.165       0.165       0.165
        rmse |       0.296       0.296       0.296
        df_r |     531.000    4251.000    4251.000
----------------------------------------------------
                                        legend: b/se
.
. * 2SLS column stacked case with various standard errors estimates
. estimates table stackpanel stackhet stackiid, /*
> */ se stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
----------------------------------------------------
    Variable |  stackpanel    stackhet    stackiid
-------------+--------------------------------------
       dlnwg |       0.543       0.543       0.543
             |       0.209       0.226       0.169
       dkids |      -0.048      -0.048      -0.048
             |       0.025       0.035       0.039
       dageh |       0.027       0.027       0.027
             |       0.015       0.034       0.029
      dagesq |      -0.000      -0.000      -0.000
             |       0.000       0.000       0.000
      ddisab |       0.008       0.008       0.008
             |       0.062       0.064       0.040
-------------+--------------------------------------
           N |    4256.000    4256.000    4256.000
          ll |
          r2 |           .           .           .
         tss |
         rss |     401.079     401.079     401.079
         mss |     -29.371     -29.371     -29.371
        rmse |       0.307       0.307       0.307
        df_r |     531.000    4251.000    4251.000
----------------------------------------------------
                                        legend: b/se
.
. ********** (4)-(5) 2SGMM REQUIRES SPECIAL MATRIX CODING **********
.
. *** PROGRAM PANELGMM DOES 2SLS (as check) and 2SGMM USING MATRIX COMMANDS
.
. * This program:
. * - requires as inputs the global macros
. *     y gives the dependent variable name
. *     X gives the list of regressor names
. *     Z gives the list of instrument names
. * - assumes the appropriate data is in memory
. * - assumes the cluster identifier is called id
.
. * If the regressors and instruments include an intercept include
. * this as a separate regressor, say called ONE, in X and Z.
. * Then continue to use the following code with the noconstant option for accum and optaccum.
. * (accum and optaccum automatically include a constant AT THE END,
. * which is not where we want the constant.)
.
. * This program computes the 2SLS and two-step GMM estimators
. *     [(X'Z)(Z'Z)_inv Z'X]_inv (X'Z)(Z'Z)_inv Z'y
. * and [(X'Z)S_inv Z'X]_inv (X'Z)S_inv Z'y
. * and appropriate panel robust standard errors
. * assuming a short panel with errors correlated over t for given i and heteroskedastic.
.
. program define panelgmm
1.
. * (1) Create Z'Z and check that full rank
. matrix accum ZZ = $Z, noconstant
2. scalar dimz = rowsof(ZZ)
3. scalar detzz = det(ZZ)
4. di "Redundant instruments if det(Z'Z) zero. Here det(Z'Z) = " detzz
5.
. * (2) Create Z'X which is trickier
. * Create ZX'ZX = [Z X]' [Z X] using accum which automatically adds a constant
. matrix accum ZXZX = $Z $X, noconstant
6. * Then Z'X is the (1,2) submatrix: rows 1 to dimz and columns dimz+1 to dimzx
. scalar dimzx = rowsof(ZXZX)
7. * Also need dimension of X
. matrix accum XX = $X, noconstant
8. scalar dimx = rowsof(XX)
9. matrix ZX = ZXZX[1..dimz,dimz+1...]
10.
. * (3) Create Z'y
. * Create Zy'Zy = [Z y]' [Z y] using accum which automatically adds a constant
. matrix accum ZyZy = $Z $y, noconstant
11. * Then Z'y is the (1,2) submatrix: rows 1 to dimz and the last column
. matrix Zy = ZyZy[1..dimz,dimz+1]
12.
. * (4) Compute 2SLS Estimator
. di " "
13. di "2SLS results: "
14. matrix b2SLS = syminv(ZX'*syminv(ZZ)*ZX)*ZX'*syminv(ZZ)*Zy
15. matrix list b2SLS
16.
. * (5) Compute S = Sum_i Zi'u_i*u_i'Z_i using opaccum
. * Key is use of opaccum.
. * Need to compute the residuals.
. gen yhat = 0
17. foreach var of varlist $X {
18. matrix a`var' = b2SLS["`var'",1]
19. scalar b`var' = trace(a`var') /* converts matrix to scalar */
20. quietly replace yhat = yhat + (b`var')*(`var')
21. }
22. gen uhat = $y - yhat
23. gen uhatsq = uhat*uhat
24. quietly sum(uhatsq)
25. scalar rmse = sqrt(r(sum)/(_N-dimx))
26. di "rmse = " rmse
27. * Alternative and check uses ivreg.
. quietly ivreg $y ($X = $Z), noconstant cluster(id)
28. predict uhat2, residuals
29. quietly sum uhat uhat2
30. * Sort data for opaccum to work
. preserve
31. sort id
32. matrix opaccum S = $Z, group(id) opvar(uhat) noconstant
33. /*
> * Ziliak uses heteroskedastic errors but not correlated.
> * Then instead use the following which assumes time identifier is year.
> * Make a unique identifier obsid so that group(obsid) does not group
> gen obsid = 10000*id + year
> sort obsid
> matrix opaccum S = $Z, group(obsid) opvar(uhat) noconstant
> */
. restore
34.
. * (6) Compute Variance of 2SLS.
. matrix v2SLS = syminv(ZX'*syminv(ZZ)*ZX)*ZX'*syminv(ZZ)*S*syminv(ZZ)*ZX*syminv(ZX'*syminv(ZZ)*ZX)
35. * matrix list v2SLS
. * Now need to get standard errors
. matrix se2SLS = J(dimx,1,0) /* Initially column vector of zeroes */
36. scalar icol = 1
37. * Need loop here as Stata does not do square root on a vector
. while icol <= dimx {
38. matrix se2SLS[icol,1] = sqrt(v2SLS[icol,icol])
39. scalar icol = icol+1
40. }
41. matrix list se2SLS
42.
. * (7) Compute Two-step GMM
. di " "
43. di "2SGMM results: "
44. matrix b2SGMM = syminv(ZX'*syminv(S)*ZX)*ZX'*syminv(S)*Zy
45. matrix list b2SGMM
46.
. * (8) Compute Variance of Two-step GMM
. * Compute the residuals to recompute S at the new estimates.
. * Note that could just use the old S
. drop yhat uhat uhatsq
47. gen yhat = 0
48. foreach var of varlist $X {
49. matrix a`var' = b2SGMM["`var'",1]
50. scalar b`var' = trace(a`var') /* converts matrix to scalar */
51. quietly replace yhat = yhat + (b`var')*(`var')
52. }
53. gen uhat = $y - yhat
54. gen uhatsq = uhat*uhat
55. quietly sum(uhatsq)
56. scalar rmse = sqrt(r(sum)/(_N-dimx))
57. di "rmse = " rmse
58. * Sort data for opaccum to work
. preserve
59. sort id
60. matrix opaccum S = $Z, group(id) opvar(uhat) noconstant
61. matrix v2SGMM = syminv(ZX'*syminv(S)*ZX)
62. * matrix list v2SGMM
. matrix se2SGMM = J(dimx,1,0) /* Initially column vector of zeroes */
63. scalar icol = 1
64. * Need loop here as Stata does not do square root on a vector
. while icol <= dimx {
65. matrix se2SGMM[icol,1] = sqrt(v2SGMM[icol,icol])
66. scalar icol = icol+1
67. }
68. matrix list se2SGMM
69.
. * (9) Compute the overidentifying restrictions test
. * Create row vector u'Z using vecaccum which automatically adds a constant
. matrix vecaccum uZ = uhat $Z, noconstant
70. matrix maxobjfunction = uZ*syminv(S)*uZ'
71. scalar ortest = maxobjfunction[1,1]
72. scalar dof = dimz - dimx
73. di " Over-identifying restrictions test " ortest " dof " dof " p-value " chi2tail(dof,ortest)
74.
. end
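
The estimator formulas stated in the comments before the program can be checked outside Stata. The following is a minimal sketch in plain Python (standard library only), not a translation of panelgmm: the helper names transpose, matmul, inverse, linear_gmm, two_sls and cluster_moment_matrix are hypothetical, and matrices are assumed to be small lists of rows. linear_gmm evaluates [(X'Z) W (Z'X)]_inv (X'Z) W (Z'y); setting W = (Z'Z)_inv gives 2SLS and W = S_inv gives two-step GMM, where S = Sum_i Z_i'u_i u_i'Z_i is accumulated by cluster.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Product of two conformable matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def linear_gmm(X, Z, y, W):
    """b = [(X'Z) W (Z'X)]^-1 (X'Z) W (Z'y); y is an N x 1 column."""
    ZX = matmul(transpose(Z), X)
    Zy = matmul(transpose(Z), y)
    XZW = matmul(transpose(ZX), W)
    return matmul(inverse(matmul(XZW, ZX)), matmul(XZW, Zy))


def two_sls(X, Z, y):
    """2SLS is linear GMM with weighting matrix (Z'Z)^-1."""
    return linear_gmm(X, Z, y, inverse(matmul(transpose(Z), Z)))


def cluster_moment_matrix(Z, u, ids):
    """S = sum over clusters i of (Z_i'u_i)(Z_i'u_i)'; u is a list of residuals."""
    ncol = len(Z[0])
    sums = {}
    for z_row, resid, cid in zip(Z, u, ids):
        g = sums.setdefault(cid, [0.0] * ncol)
        for j in range(ncol):
            g[j] += z_row[j] * resid
    S = [[0.0] * ncol for _ in range(ncol)]
    for g in sums.values():
        for j in range(ncol):
            for k in range(ncol):
                S[j][k] += g[j] * g[k]
    return S
```

Given first-step residuals u from two_sls, a two-step GMM estimate under these assumptions would be linear_gmm(X, Z, y, inverse(cluster_moment_matrix(Z, u, ids))), provided S is nonsingular.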
.
. *** EXECUTE THE PROGRAM PANEL GMM FOR THESE DATA
.
. * Note that Ziliak does not use an intercept.
. * If have an intercept then need to add in the constant explicitly
. * generate ONE = 1
. * and then add this to the X and Z
.
. * Define the dependent variable
. global y dlnhr
.
. * Define the regressors.
. global X $XREG
.
. * (4) 2SGMM (and 2SLS as check) using the base case instrument set
. * Gives 2SGMM Base Case column of Table 22.2 (page 755)
.
. global Z $ZBASECASE
. panelgmm
(obs=4256)
Redundant instruments if det(Z'Z) zero. Here det(Z'Z) = 6.375e+37
(obs=4256)
(obs=4256)
(obs=4256)
2SLS results:
b2SLS[5,1]
dlnhr
dlnwg .20910869
dkids -.02968643
dageh .02638804
dagesq -.00034108
ddisab .00040197
rmse = .29563723
se2SLS[5,1]
c1
r1 .3736429
r2 .02932634
r3 .01537039
r4 .00018343
r5 .06667771
2SGMM results:
b2SGMM[5,1]
dlnhr
dlnwg .54679602
dkids -.04490416
dageh .02747594
dagesq -.00035912
ddisab -.0468348
rmse = .30719932
se2SGMM[5,1]
c1
r1 .32762396
r2 .02714405
r3 .01295984
r4 .00015941
r5 .06236006
Over-identifying restrictions test 5.4503878 dof 4 p-value .24412497
.
. * (5) 2SGMM (and 2SLS as check) using the stacked instrument set
. * Gives 2SGMM Stacked Case column of Table 22.2 (page 755)
.
. drop uhat yhat uhatsq uhat2 /* Obtained in panelgmm */
. global Z $ZSTACKED

. * dlnwg dkids dageh dagesq ddisab
. panelgmm
(obs=4256)
Redundant instruments if det(Z'Z) zero. Here det(Z'Z) = 7.52e+234
(obs=4256)
(obs=4256)
(obs=4256)
2SLS results:
b2SLS[5,1]
dlnhr
dlnwg .54282703
dkids -.0482932
dageh .02689353
dagesq -.00035113
ddisab .0079759
rmse = .30716345
se2SLS[5,1]
c1
r1 .20822845
r2 .02446659
r3 .01497229
r4 .0001863
r5 .0623543
2SGMM results:
b2SGMM[5,1]
dlnhr
dlnwg .32999732
dkids -.01681724
dageh .01637783
dagesq -.00019221
ddisab -.02010632
rmse = .29791501
se2SGMM[5,1]
c1
r1 .10965082
r2 .01356737
r3 .00834178
r4 .0001037
r5 .02357317
Over-identifying restrictions test 69.506226 dof 67 p-value .39307324
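
The p-values printed by the over-identifying restrictions test come from Stata's chi2tail(dof,ortest), with dof = dimz - dimx (9 - 5 = 4 for the base case and 72 - 5 = 67 for the stacked case). As a cross-check, the sketch below recomputes the upper tail probability in plain Python. The function name chi2_upper_tail is hypothetical, and the approach (the recurrence Q(a+1,t) = Q(a,t) + t^a e^(-t)/Gamma(a+1), started from Q(1,t) = e^(-t) or Q(1/2,t) = erfc(sqrt(t))) is an assumption of this sketch for integer degrees of freedom, not how Stata computes it.

```python
import math


def chi2_upper_tail(dof, x):
    """P(chi-square with integer dof >= 1 exceeds x), via the gamma recurrence."""
    if dof < 1 or x < 0:
        raise ValueError("need dof >= 1 and x >= 0")
    if x == 0:
        return 1.0
    t = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-t)                 # Q(1, t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))      # Q(1/2, t)
    while a < dof / 2.0:
        # add t^a * exp(-t) / Gamma(a + 1), computed on the log scale
        q += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Statistics copied from the log above
print(chi2_upper_tail(4, 5.4503878))
print(chi2_upper_tail(67, 69.506226))
```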
.
. ********** (6) F-STATISTICS FOR WEAK INSTRUMENTS (page 756) **********
.
. * (1) Weak Instruments using base case instrument set
.
. * Test weak instruments for dlnwg using panel robust inference
. quietly regress dlnwg $ZBASECASE, cluster(id)
. quietly test $ZBASECASE
. * This value should have been reported in the text on page 756
. * [Instead by mistake the F assuming iid errors below was reported]
. di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df)
r2 = .00590049 F = 2.3790046 p = .01209278 dof = 9
.
. * Same except use wrong inference assuming iid errors
. quietly regress dlnwg $ZBASECASE
. quietly test $ZBASECASE
. di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df)
r2 = .00590049 F = 2.800243 p = .00281135 dof = 9
.
. * (2) Weak Instruments using stacked instrument set
.
. * Test weak instruments for dlnwg using panel robust inference
. quietly regress dlnwg $ZSTACKED, cluster(id)
. quietly test $ZSTACKED
. * This value was reported in the text on page 756
. di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df)
r2 = .02256803 F = 1.9000813 p = .00003808 dof = 72
.
. * Same except use wrong inference assuming iid errors
. quietly regress dlnwg $ZSTACKED
. quietly test $ZSTACKED
. di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df)
r2 = .02256803 F = 1.341413 p = .02961833 dof = 72
.
. * (3) Weak Instruments for other regressors
. * Here all regressors are instrumented. So should test all as above.
. * These find no problems.
. * For example, for dkids and the other regressors using the stacked instrument set
. quietly regress dkids $ZSTACKED, cluster(id)
. quietly test $ZSTACKED
. di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df)
r2 = .16281613 F = 8.4145744 p = 3.349e-52 dof = 72
. quietly regress dageh $ZSTACKED, cluster(id)
. quietly test $ZSTACKED
. di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df)
r2 = .22076423 F = 24.002499 p = 6.30e-126 dof = 72
. quietly regress dagesq $ZSTACKED, cluster(id)
. quietly test $ZSTACKED
. di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df)
r2 = .36856999 F = 150.79951 p = 4.10e-309 dof = 72
. quietly regress ddisab $ZSTACKED, cluster(id)
. quietly test $ZSTACKED
. di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df)
r2 = .28591864 F = 25.786283 p = 4.70e-132 dof = 72
.
. ********** PARTIAL R-SQUARED FOR WEAK INSTRUMENTS (page 756) **********
.
. * (1) Weak Instruments using base case instrument set
.
. * Test weak instruments for dlnwg using panel robust inference
. quietly regress dlnwg $ZBASECASE, cluster(id)
. quietly test $ZBASECASE
. di "r2 = " e(r2) " F = " r(F) " p = " r(p) " dof = " r(df)
r2 = .00590049 F = 2.3790046 p = .01209278 dof = 9
.
. **** (D) Shea (1997) partial R-squared
.
. * Here we have five endogenous regressors and no exogenous regressors.
. * Need to change code below if there are exogenous regressors. See ch4ivkling.do
. * Focus on the endogenous wage regressor.
. * For the other four just need to replace dlnwg in the first line of (1)
. * and replace the first line of (2B)
.
. * (1) Form x1 - x1tilda: residual from regress x1 on other regressors
. quietly reg dlnwg dkids dageh dagesq ddisab
. * quietly reg dkids dlnwg dageh dagesq ddisab
. predict x1minusx1tilda, resid

.
. * (2) Form x1hat - x1hattilda: residual from regress x1hat on fitted values of other regressors
. * (2A) First get the fitted values from regress endogenous on instruments
. quietly reg dlnwg $ZBASECASE
. predict dlnwghat, xb
. di e(r2) " r2 from regress x1 on Z"
.00590049 r2 from regress x1 on Z
. quietly reg dkids $ZBASECASE
. predict dkidshat, xb
. di e(r2) " r2 from regress second endog regressor on Z"
.1473738 r2 from regress second endog regressor on Z
. quietly reg dageh $ZBASECASE
. predict dagehhat, xb
. di e(r2) " r2 from regress third endog regressor on Z"
.13903221 r2 from regress third endog regressor on Z
. quietly reg dagesq $ZBASECASE
. predict dagesqhat, xb
. di e(r2) " r2 from regress fourth endog regressor on Z"
.3049799 r2 from regress fourth endog regressor on Z
. quietly reg ddisab $ZBASECASE
. predict ddisabhat, xb
. di e(r2) " r2 from regress fifth endog regressor on Z"
.26087493 r2 from regress fifth endog regressor on Z
. * (2B) Run the regression of x1hat on fitted values of other regressors
. quietly reg dlnwghat dkidshat dagehhat dagesqhat ddisabhat
. * quietly reg dkidshat dlnwghat dagehhat dagesqhat ddisabhat
. di e(r2) " r2 from regress prediction of x1 on predictions of x2"
.38268288 r2 from regress prediction of x1 on predictions of x2
. predict x1hatminusx1hattilda, resid
.
. * (3) Form the correlation between (1) and (2)
. * This value is reported in the text on page 756
. corr x1minusx1tilda x1hatminusx1hattilda
(obs=4256)

             | x1minu~a x1hatm~a
-------------+------------------
x1minusx1t~a |   1.0000
x1hatminus~a |   0.0604   1.0000

. di r(rho)^2 " Shea's partial R-squared measure"
.00364741 Shea's partial R-squared measure
.
. ********** CLOSE OUTPUT
.
. log close
log: c:\Imbook\bwebpage\Section5\mma22p1pangmm.txt
log type: text
closed on: 23 May 2005, 11:52:42

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section5\mma23p1pannonlin.txt
log type: text
opened on: 23 May 2005, 12:46:16
.
. ********** OVERVIEW OF MMA23P1PANNONLIN.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 23.3 pages 792-5
. * Example of nonlinear model (multiplicative effects)
.
. * This program derives Table 23.1 and Figure 23.1.
. * It performs nonlinear panel analysis for multiplicative effects model
. * y_it = a_i*exp(x_it'b) = exp(c_i+x_it'b)
. * and parametric count data models
.
. * (1) Linear (xtreg) for log(PAT) with adjustment for PAT=0
. *     Output includes Figure 23.1
. * (2) Poisson (xtpoisson) fixed and random effects
. * (3) GEE (xtgee) which includes pooled NLS
.
. * The Poisson individual effects model is
. * y_it ~ Poisson(x_it'b + a_i)
. * The standard errors assume this model correctly specified
. * i.e. Variance = mean given x_it and a_i
.
. * For panel robust se's see section 23.2.6 pages 788-791
. * To obtain more panel robust standard errors this program panel bootstraps
. * Note that the panel se entries of 0.033 under GEE, Poisson-RE and Poisson-FE
. * are not panel robust to the extent that the bootstrap se's are panel robust
. * and in fact are the usual se's in the case of Poisson-RE and Poisson-FE
. * Unlike ch.21 here "panel se" means "default panel se" and not "panel-robust se".
.
. * To speed up program reduce nreps, the number of bootstrap replications
.
. * To run this program you need data file
. * patr7079.asc
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * There are ten years of data but only five years 1975-79 are used in estimation
.
. * The original data is from
. * Bronwyn Hall, Zvi Griliches, and Jerry Hausman (1986),
. * "Patents and R&D: Is There a Lag?",
. * International Economic Review, 27, 265-283.
.
. * File patr7079.dat has data on 346 firms
. * There are 4 lines per firm, with 25 variables
. * Time-invariant: CUSIP,ARDSSIC,SCISECT,LOGK,SUMPAT,
. * Time-varying X: LOGR70,LOGR71,LOGR72, ....., LOGR77,LOGR78,LOGR79
. * Time-varying Y: PAT70,PAT71,PAT72, ....., PAT77,PAT78,PAT79
. * in the format:
. * I7,I3,I2,5F12.6/6F12.6/6F12.6/5F12.6/
. * where
. * CUSIP   Compustat's identifying number for the firm (Committee on
. *         Uniform Security Identification Procedures number).
. * ARDSSIC A two-digit code for the applied R&D industrial classification
. *         (roughly that in Bound, Cummins, Griliches, Hall, and Jaffe, in
. *         the Griliches R&D, Patents, and Productivity volume).
. * SCISECT Dummy equal to one for firms in the scientific sector.
. * LOGK The logarithm of the book value of capital in 1972.
. * SUMPAT The sum of patents applied for between 1972-1979.
. * LOGR70- The logarithm of R&D spending during the year (in 1972 dollars).
. * LOGR79
. * PAT70- The number of patents applied for during the year that were
. * PAT79 eventually granted.
.
. ********** READ DATA **********
.
. * The data are in ascii file patr7079.asc
. * There are 346 observations on 25 variables with four lines per obs
. * The data are fixed format with
. * line 1 variables 1-8 I7,I3,I2,5F12.6
. * line 2 variables 9-14 6F12.6
. * line 3 variables 15-20 6F12.6
. * line 4 variables 21-25 5F12.6
.
. * Read in using Infile: FREE FORMAT WITHOUT DICTIONARY
. * As there is space between each observation data is also space-delimited
. * free format and then there is no need for a dictionary file
. * The following command spans more than one line so use /* and */
. infile CUSIP ARDSSIC SCISECT LOGK SUMPAT LOGR70 LOGR71 LOGR72 LOGR73 /*
> */ LOGR74 LOGR75 LOGR76 LOGR77 LOGR78 LOGR79 PAT70 PAT71 PAT72 /*
> */ PAT73 PAT74 PAT75 PAT76 PAT77 PAT78 PAT79 using patr7079.asc
(346 observations read)

.
. ********** DATA TRANSFORMATIONS **********
.
. * Use observation number as an identifier, not just CUSIP
. gen id = _n
. label variable id "id"
. * The following lists the variables in data set and summarizes data
. describe
Contains data
  obs:           346
 vars:            26
 size:        37,368 (99.6% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
CUSIP           float  %9.0g
ARDSSIC         float  %9.0g
SCISECT         float  %9.0g
LOGK            float  %9.0g
SUMPAT          float  %9.0g
LOGR70          float  %9.0g
LOGR71          float  %9.0g
LOGR72          float  %9.0g
LOGR73          float  %9.0g
LOGR74          float  %9.0g
LOGR75          float  %9.0g
LOGR76          float  %9.0g
LOGR77          float  %9.0g
LOGR78          float  %9.0g
LOGR79          float  %9.0g
PAT70           float  %9.0g
PAT71           float  %9.0g
PAT72           float  %9.0g
PAT73           float  %9.0g
PAT74           float  %9.0g
PAT75           float  %9.0g
PAT76           float  %9.0g
PAT77           float  %9.0g
PAT78           float  %9.0g
PAT79           float  %9.0g
id              float  %9.0g                  id
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       CUSIP |       346    531201.2    282074.9        800     989399
     ARDSSIC |       336     9.97619    5.459706          1         21
     SCISECT |       346    .4248555    .4950369          0          1
        LOGK |       346    3.921063    2.095542   -1.76965    9.66626
      SUMPAT |       346    284.7312    571.1136          0       3806
-------------+--------------------------------------------------------
      LOGR70 |       346    1.198348    1.941968   -3.67354    6.56641
      LOGR71 |       346    1.169182    1.929444   -3.53055    6.95687
      LOGR72 |       346    1.185953    1.929078   -3.35241    6.97009
      LOGR73 |       346    1.231135    1.934896   -3.67395    7.06211
      LOGR74 |       346    1.232636    1.946417   -3.15274    7.06524
-------------+--------------------------------------------------------
      LOGR75 |       346    1.165802     1.98001    -3.5476    6.76486
      LOGR76 |       346    1.212888    1.979273   -3.84868     6.8285
      LOGR77 |       346    1.250034    2.003002   -3.47884    6.90253
      LOGR78 |       346    1.306511    2.019792    -3.2832    6.96345
      LOGR79 |       346    1.345581    2.054982   -3.57742    7.03432
-------------+--------------------------------------------------------
       PAT70 |       346    40.00289    82.50335          0        608
       PAT71 |       346    38.10983    78.40308          0        553
       PAT72 |       346    36.30925    74.81591          0        557
       PAT73 |       346    36.95376    77.91971          0        595
       PAT74 |       346    37.60983    75.94388          0        528
-------------+--------------------------------------------------------
       PAT75 |       346    36.87283    75.98788          0        508
       PAT76 |       346    35.84682    73.31613          0        487
       PAT77 |       346    36.23121    72.75146          0        456
       PAT78 |       346    32.80636     65.6505          0        434
       PAT79 |       346    32.10116    66.36197          0        515
-------------+--------------------------------------------------------
          id |       346       173.5    100.0258          1        346
.
. ******** CHANGE ORGANIZATION OF DATA USING RESHAPE AND MORE TRANSFORMATIONS
.
. reshape long PAT LOGR, i(id) j(year)
(note: j = 70 71 72 73 74 75 76 77 78 79)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                      346   ->    3460
Number of variables                  26   ->       9
j variable (10 values)                    ->   year
xij variables:
                  PAT70 PAT71 ... PAT79   ->   PAT
               LOGR70 LOGR71 ... LOGR79   ->   LOGR
-----------------------------------------------------------------------------
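
The reshape step above turns one record per firm (PAT70 ... PAT79, LOGR70 ... LOGR79) into one record per firm-year, which is why 346 observations become 3460. A minimal Python sketch of the same idea follows. It is an illustration only: the function reshape_long and its arguments are hypothetical names, and each wide record is assumed to be a dict keyed by variable name.

```python
def reshape_long(wide_rows, stubs, years, id_key="id"):
    """Convert wide records (e.g. PAT70, PAT71, ...) to one record per id-year."""
    wide_names = {f"{stub}{year}" for stub in stubs for year in years}
    long_rows = []
    for row in wide_rows:
        # time-invariant variables are copied to every year of the firm
        fixed = {k: v for k, v in row.items() if k not in wide_names}
        for year in years:
            rec = dict(fixed)
            rec["year"] = year
            for stub in stubs:
                rec[stub] = row[f"{stub}{year}"]
            long_rows.append(rec)
    long_rows.sort(key=lambda r: (r[id_key], r["year"]))
    return long_rows


# Tiny made-up example with two firms and two years
wide = [
    {"id": 1, "LOGK": 2.5, "PAT70": 3, "PAT71": 0, "LOGR70": 1.1, "LOGR71": 1.2},
    {"id": 2, "LOGK": 4.0, "PAT70": 7, "PAT71": 9, "LOGR70": 2.1, "LOGR71": 2.2},
]
long_data = reshape_long(wide, stubs=["PAT", "LOGR"], years=[70, 71])
print(len(long_data))
```

With N firms and T years the result has N*T records, matching the 346 -> 3460 line in the reshape report.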

. describe
Contains data
  obs:         3,460
 vars:             9
 size:       128,020 (98.7% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g                  id
year            byte   %9.0g
CUSIP           float  %9.0g
ARDSSIC         float  %9.0g
SCISECT         float  %9.0g
LOGK            float  %9.0g
SUMPAT          float  %9.0g
LOGR            float  %9.0g
PAT             float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  id  year
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      3460       173.5    99.89562          1        346
        year |      3460        74.5    2.872696         70         79
       CUSIP |      3460    531201.2    281707.7        800     989399
     ARDSSIC |      3360     9.97619    5.452387          1         21
     SCISECT |      3460    .4248555    .4943925          0          1
-------------+--------------------------------------------------------
        LOGK |      3460    3.921063    2.092814   -1.76965    9.66626
      SUMPAT |      3460    284.7312    570.3701          0       3806
        LOGR |      3460    1.229807    1.970524   -3.84868    7.06524
         PAT |      3460    36.28439    74.46563          0        608
.
. * Create new variable log(patents) with adjustment for patents = 0
. gen NEWPAT = PAT
. replace NEWPAT = 0.5 if NEWPAT==0.
(605 real changes made)
. gen LPAT = ln(NEWPAT)
. label variable LPAT "Ln(Patents)"
. label variable PAT "Patents"
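
The adjustment above replaces zero patent counts by 0.5 before taking logs, so that LPAT is defined for all 3460 observations. The short Python sketch below mirrors that rule. The function name log_patents and the default argument are hypothetical, and 0.5 is simply the value used in this program.

```python
import math


def log_patents(pat, zero_replacement=0.5):
    """ln(PAT), with PAT == 0 replaced by zero_replacement before taking logs."""
    adjusted = zero_replacement if pat == 0 else pat
    return math.log(adjusted)


# The smallest and largest patent counts in the data are 0 and 608
print(log_patents(0), log_patents(608))
```

The two printed values can be compared with the Min and Max reported for LPAT in the summarize output that follows the transformations.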

. * Dummy variable for logit analysis
. gen DPAT = 0
. replace DPAT = 1 if PAT>0
(2855 real changes made)
. label variable DPAT "Patent Indicator"
. * R and D
. gen RANDD = exp(LOGR)
. label variable LOGR "Ln(R&D)"
. label variable RANDD "R&D"
. * Lagged log R and D
. tsset id year
panel variable: id, 1 to 346
time variable: year, 70 to 79
. gen LOGRL1 = L1.LOGR
(346 missing values generated)
. gen LOGRL2 = L2.LOGR
(692 missing values generated)
. gen LOGRL3 = L3.LOGR
(1038 missing values generated)
. gen LOGRL4 = L4.LOGR
(1384 missing values generated)
. gen LOGRL5 = L5.LOGR
(1730 missing values generated)
. label variable LOGRL1 "Ln(R&D) lagged once"
. label variable LOGRL2 "Ln(R&D) lagged twice"
. label variable LOGRL3 "Ln(R&D) lagged three times"
. label variable LOGRL4 "Ln(R&D) lagged four times"
. label variable LOGRL5 "Ln(R&D) lagged five times"
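
After tsset, the L1. to L5. operators lag LOGR within each firm, so lag k is missing for the first k years of every firm: 346, 692, ..., 1730 missing values for the 346 firms. A minimal Python sketch of a within-panel lag is given below. The function add_lag and the dict-per-record layout are assumptions of this illustration, with None standing in for Stata's missing value.

```python
def add_lag(rows, var, k, id_key="id", time_key="year"):
    """Add var lagged k periods within each panel unit; None where unavailable."""
    lookup = {(r[id_key], r[time_key]): r[var] for r in rows}
    name = f"{var}L{k}"
    for r in rows:
        r[name] = lookup.get((r[id_key], r[time_key] - k))
    return name


# Balanced panel shaped like the patents data: 346 firms, years 70 to 79
panel = [{"id": i, "year": y, "LOGR": 0.1 * y + i}
         for i in range(1, 347) for y in range(70, 80)]
for k in range(1, 6):
    add_lag(panel, "LOGR", k)
missing_counts = [sum(1 for r in panel if r[f"LOGRL{k}"] is None) for k in range(1, 6)]
print(missing_counts)
```

Looking up (id, year - k) rather than shifting by row position keeps the lag from crossing firm boundaries, which is the same protection tsset provides.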
. * Year dummies
. gen dyear2 = 0
. replace dyear2 = 1 if year==76
(346 real changes made)

. gen dyear3 = 0
. replace dyear3 = 1 if year==77
(346 real changes made)
. gen dyear4 = 0
. replace dyear4 = 1 if year==78
(346 real changes made)
. gen dyear5 = 0
. replace dyear5 = 1 if year==79
(346 real changes made)
.
. * Check data and Save data as Stata data set
. describe
Contains data
  obs:         3,460
 vars:            22
 size:       307,940 (97.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g                  id
year            byte   %9.0g
CUSIP           float  %9.0g
ARDSSIC         float  %9.0g
SCISECT         float  %9.0g
LOGK            float  %9.0g
SUMPAT          float  %9.0g
LOGR            float  %9.0g                  Ln(R&D)
PAT             float  %9.0g                  Patents
NEWPAT          float  %9.0g
LPAT            float  %9.0g                  Ln(Patents)
DPAT            float  %9.0g                  Patent Indicator
RANDD           float  %9.0g                  R&D
LOGRL1          float  %9.0g                  Ln(R&D) lagged once
LOGRL2          float  %9.0g                  Ln(R&D) lagged twice
LOGRL3          float  %9.0g                  Ln(R&D) lagged three times
LOGRL4          float  %9.0g                  Ln(R&D) lagged four times
LOGRL5          float  %9.0g                  Ln(R&D) lagged five times
dyear2          float  %9.0g
dyear3          float  %9.0g
dyear4          float  %9.0g
dyear5          float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  id  year
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      3460       173.5    99.89562          1        346
        year |      3460        74.5    2.872696         70         79
       CUSIP |      3460    531201.2    281707.7        800     989399
     ARDSSIC |      3360     9.97619    5.452387          1         21
     SCISECT |      3460    .4248555    .4943925          0          1
-------------+--------------------------------------------------------
        LOGK |      3460    3.921063    2.092814   -1.76965    9.66626
      SUMPAT |      3460    284.7312    570.3701          0       3806
        LOGR |      3460    1.229807    1.970524   -3.84868    7.06524
         PAT |      3460    36.28439    74.46563          0        608
      NEWPAT |      3460    36.37182    74.42325         .5        608
-------------+--------------------------------------------------------
        LPAT |      3460    1.935464    1.949421  -.6931472   6.410175
        DPAT |      3460    .8251445    .3798984          0          1
       RANDD |      3460    23.02263    82.90186   .0213078   1170.563
      LOGRL1 |      3114    1.216943    1.960836   -3.84868    7.06524
      LOGRL2 |      2768    1.205747    1.953427   -3.84868    7.06524
-------------+--------------------------------------------------------
      LOGRL3 |      2422     1.19942    1.946583   -3.84868    7.06524
      LOGRL4 |      2076    1.197176    1.941555   -3.67395    7.06524
      LOGRL5 |      1730    1.203451    1.934293   -3.67395    7.06524
      dyear2 |      3460          .1    .3000434          0          1
      dyear3 |      3460          .1    .3000434          0          1
-------------+--------------------------------------------------------
      dyear4 |      3460          .1    .3000434          0          1
      dyear5 |      3460          .1    .3000434          0          1
. drop NEWPAT
. save patr7079, replace
file patr7079.dta saved
. summarize
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
          id |      3460       173.5    99.89562          1        346
        year |      3460        74.5    2.872696         70         79
       CUSIP |      3460    531201.2    281707.7        800     989399
     ARDSSIC |      3360     9.97619    5.452387          1         21
     SCISECT |      3460    .4248555    .4943925          0          1
-------------+---------------------------------------------------------
        LOGK |      3460    3.921063    2.092814   -1.76965    9.66626
      SUMPAT |      3460    284.7312    570.3701          0       3806
        LOGR |      3460    1.229807    1.970524   -3.84868    7.06524
         PAT |      3460    36.28439    74.46563          0        608
        LPAT |      3460    1.935464    1.949421  -.6931472   6.410175
-------------+---------------------------------------------------------
        DPAT |      3460    .8251445    .3798984          0          1
       RANDD |      3460    23.02263    82.90186   .0213078   1170.563
      LOGRL1 |      3114    1.216943    1.960836   -3.84868    7.06524
      LOGRL2 |      2768    1.205747    1.953427   -3.84868    7.06524
      LOGRL3 |      2422     1.19942    1.946583   -3.84868    7.06524
-------------+---------------------------------------------------------
      LOGRL4 |      2076    1.197176    1.941555   -3.67395    7.06524
      LOGRL5 |      1730    1.203451    1.934293   -3.67395    7.06524
      dyear2 |      3460          .1    .3000434          0          1
      dyear3 |      3460          .1    .3000434          0          1
      dyear4 |      3460          .1    .3000434          0          1
-------------+---------------------------------------------------------
      dyear5 |      3460          .1    .3000434          0          1
. xtsum, i(id)
Variable         |      Mean   Std. Dev.        Min        Max |   Observations
-----------------+---------------------------------------------+---------------
id       overall |     173.5   99.89562          1        346 |     N =    3460
         between |             100.0258          1        346 |     n =     346
         within  |                    0      173.5      173.5 |     T =      10
                 |                                             |
year     overall |      74.5   2.872696         70         79 |     N =    3460
         between |                    0       74.5       74.5 |     n =     346
         within  |             2.872696         70         79 |     T =      10
                 |                                             |
CUSIP    overall |  531201.2   281707.7        800     989399 |     N =    3460
         between |             282074.9        800     989399 |     n =     346
         within  |                    0   531201.2   531201.2 |     T =      10
                 |                                             |
ARDSSIC  overall |   9.97619   5.452387          1         21 |     N =    3360
         between |             5.459706          1         21 |     n =     336
         within  |                    0    9.97619    9.97619 |     T =      10
                 |                                             |
SCISECT  overall |  .4248555   .4943925          0          1 |     N =    3460
         between |             .4950369          0          1 |     n =     346
         within  |                    0   .4248555   .4248555 |     T =      10
                 |                                             |
LOGK     overall |  3.921063   2.092814   -1.76965    9.66626 |     N =    3460
         between |             2.095542   -1.76965    9.66626 |     n =     346
         within  |                    0   3.921063   3.921063 |     T =      10
                 |                                             |
SUMPAT   overall |  284.7312   570.3701          0       3806 |     N =    3460
         between |             571.1136          0       3806 |     n =     346
         within  |                    0   284.7312   284.7312 |     T =      10
                 |                                             |
LOGR     overall |  1.229807   1.970524   -3.84868    7.06524 |     N =    3460
         between |             1.944421  -3.120133   6.911438 |     n =     346
         within  |             .3347099   -1.19673   4.218814 |     T =      10
                 |                                             |
PAT      overall |  36.28439   74.46563          0        608 |     N =    3460
         between |              72.5989          0      484.8 |     n =     346
         within  |             16.97772  -177.7156   224.3844 |     T =      10
                 |                                             |
LPAT     overall |  1.935464   1.949421  -.6931472   6.410175 |     N =    3460
         between |             1.873181  -.6931472   6.180623 |     n =     346
         within  |             .5482375  -.2643028   4.368045 |     T =      10
                 |                                             |
DPAT     overall |  .8251445   .3798984          0          1 |     N =    3460
         between |             .2831052          0          1 |     n =     346
         within  |             .2537376  -.0748555   1.725145 |     T =      10
                 |                                             |
RANDD    overall |  23.02263   82.90186   .0213078   1170.563 |     N =    3460
         between |             81.69163   .0582575   1014.058 |     n =     346
         within  |             14.71596  -280.2214     311.47 |     T =      10
                 |                                             |
LOGRL1   overall |  1.216943   1.960836   -3.84868    7.06524 |     N =    3114
         between |             1.937733  -3.123236   6.897784 |     n =     346
         within  |             .3157841  -.6151992   4.203909 |     T =       9
                 |                                             |
LOGRL2   overall |  1.205747   1.953427   -3.84868    7.06524 |     N =    2768
         between |             1.932143   -3.12461   6.889576 |     n =     346
         within  |             .3035537   -.486563   4.187752 |     T =       8
                 |                                             |
LOGRL3   overall |   1.19942   1.946583   -3.84868    7.06524 |     N =    2422
         between |             1.926813  -3.074006   6.887726 |     n =     346
         within  |             .2928787  -.2381882   4.153968 |     T =       7
                 |                                             |
LOGRL4   overall |  1.197176   1.941555   -3.67395    7.06524 |     N =    2076
         between |             1.923302  -2.989647   6.897597 |     n =     346
         within  |             .2818841  -.2335892   4.095286 |     T =       6
                 |                                             |
LOGRL5   overall |  1.203451   1.934293   -3.67395    7.06524 |     N =    1730
         between |             1.917687   -2.99075   6.924144 |     n =     346
         within  |             .2692134  -.1899074   4.062701 |     T =       5
                 |                                             |
dyear2   overall |        .1   .3000434          0          1 |     N =    3460
         between |                    0         .1         .1 |     n =     346
         within  |             .3000434          0          1 |     T =      10
                 |                                             |
dyear3   overall |        .1   .3000434          0          1 |     N =    3460
         between |                    0         .1         .1 |     n =     346
         within  |             .3000434          0          1 |     T =      10
                 |                                             |
dyear4   overall |        .1   .3000434          0          1 |     N =    3460
         between |                    0         .1         .1 |     n =     346
         within  |             .3000434          0          1 |     T =      10
                 |                                             |
dyear5   overall |        .1   .3000434          0          1 |     N =    3460
         between |                    0         .1         .1 |     n =     346
         within  |             .3000434          0          1 |     T =      10
.
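The overall, between and within standard deviations reported by xtsum can be cross-checked outside Stata. The following is a minimal Python sketch, not part of the original program. It assumes the usual convention that the within variable is x_it - xbar_i + xbar and that all three standard deviations use an n-1 divisor; the panel built here is a hypothetical stand-in for a year dummy in a balanced panel of 346 firms over years 70 to 79.

```python
import math

def sample_sd(values):
    """Standard deviation with an n-1 divisor."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

# Hypothetical balanced panel: 346 firms, years 70..79, dummy = 1 in year 76
firms = range(1, 347)
years = range(70, 80)
panel = {(i, t): 1.0 if t == 76 else 0.0 for i in firms for t in years}

overall_mean = sum(panel.values()) / len(panel)
overall_sd = sample_sd(list(panel.values()))

# Between variation: standard deviation of the firm-specific means
firm_means = {i: sum(panel[(i, t)] for t in years) / len(years) for i in firms}
between_sd = sample_sd(list(firm_means.values()))

# Within variation: deviations from the firm mean, with the overall mean added back
within = [panel[(i, t)] - firm_means[i] + overall_mean for i in firms for t in years]
within_sd = sample_sd(within)

print(round(overall_sd, 7), round(between_sd, 7), round(within_sd, 7))
```

Because a year dummy has the same mean for every firm, all of its variation is within variation, which is the pattern shown for dyear2 to dyear5 in the xtsum output.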
. ********** DEFINE GLOBALS INCLUDING REGRESSOR LIST **********
.
. * Number of reps for the bootstrap
. * Table 23.1 used 500
. global nreps 500
.
. * The regressions below are of patents on LOGR ??? on ???
. * Additional regressors to be included below are defined in xextra
. * Here no additional regressors
. global xextra
.
. ********** (1) LINEAR PANEL RANDOM AND FIXED EFFECTS FOR LOG(PAT) **********
.
. * This ad hoc method uses as dependent variable
. *   LPAT = ln(PAT) if PAT > 0
. *        = ln(0.5) if PAT = 0
. * which is analyzed using chapter 21 methods
.
. * Note that the first xt command must include the option i(id)
. * to identify id as the panel (group) variable
. * Time invariant regressors LOGK SCISECT are not included
.
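The definition of LPAT in the comments above can be written out as a small function. This is a hedged Python sketch for checking the arithmetic only; the function name log_patents is hypothetical and does not appear in the original program.

```python
import math

def log_patents(pat):
    """LPAT as described in the comments: ln(PAT) if PAT > 0, ln(0.5) if PAT = 0."""
    if pat < 0:
        raise ValueError("patent counts cannot be negative")
    return math.log(pat) if pat > 0 else math.log(0.5)

# The extremes of PAT in the summary statistics are 0 and 608
lpat_min = log_patents(0)
lpat_max = log_patents(608)
print(round(lpat_min, 7), round(lpat_max, 6))
```

The two values agree with the minimum and maximum of LPAT in the summarize output (-.6931472 and 6.410175).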
. use patr7079, clear
. drop if year<75
(1730 observations deleted)
.
. * Overall plot of data
. * The graphs below use new Stata 8 graphics
. * Change graphics scheme from default s2color to s1mono for printing
. set scheme s1mono
.
. * Figure 23.1 page 792 [with axis labels corrected - book is wrong]
. graph twoway (scatter LPAT LOGR, msize(vsmall)) (lowess LPAT LOGR) (lfit LPAT LOGR), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Pooled (Overall) Regression") /*
> */ xtitle("Log R&D Spending", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log Patents", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Original data") label(2 "Nonparametric fit") label(3 "Linear fit"))
. graph export ch23fig1.wmf, replace
(file c:\Imbook\bwebpage\Section5\ch23fig1.wmf written in Windows Metafile format)


.
. * OLS
. regress LPAT LOGR $xextra, cluster(id)
Regression with robust standard errors                 Number of obs =    1730
                                                       F(  1,   345) = 1330.60
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7192
Number of clusters (id) = 346                          Root MSE      =  1.0461
------------------------------------------------------------------------------
             |               Robust
        LPAT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .8340745   .0228655    36.48   0.000     .7891012    .8790478
       _cons |   .7954785   .0579246    13.73   0.000     .6815487    .9094083
------------------------------------------------------------------------------
. estimates store linolspan
.
. * Fixed effects
. xtreg LPAT LOGR $xextra, fe i(id)
Fixed-effects (within) regression               Number of obs      =      1730
Group variable (i): id                          Number of groups   =       346

R-sq:  within  = 0.0026                         Obs per group: min =         5
       between = 0.7669                                        avg =       5.0
       overall = 0.7192                                        max =         5

                                                F(1,1383)          =      3.63
corr(u_i, Xb)  = 0.8405                         Prob > F           =    0.0570

------------------------------------------------------------------------------
        LPAT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .1067505   .0560364     1.91   0.057    -.0031749     .216676
       _cons |   1.709116   .0714557    23.92   0.000     1.568943    1.849289
-------------+----------------------------------------------------------------
     sigma_u |  1.7380872
     sigma_e |  .51119065
         rho |  .92038546   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(345, 1383) =    16.96           Prob > F = 0.0000
. estimates store linfe
.
. * Random effects
. xtreg LPAT LOGR $xextra, re i(id)
Random-effects GLS regression                   Number of obs      =      1730
Group variable (i): id                          Number of groups   =       346

R-sq:  within  = 0.0026                         Obs per group: min =         5
       between = 0.7669                                        avg =       5.0
       overall = 0.7192                                        max =         5

Random effects u_i ~ Gaussian                   Wald chi2(1)       =    915.90
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        LPAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .7202377   .0237986    30.26   0.000     .6735932    .7668821
       _cons |   .9384761   .0599584    15.65   0.000     .8209598    1.055992
-------------+----------------------------------------------------------------
     sigma_u |  .90057544
     sigma_e |  .51119065
         rho |   .7563152   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. estimates store linre
.
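The rho line in the xtreg output is labelled as the fraction of variance due to u_i. As a quick cross-check, the following Python sketch (not part of the original Stata program) assumes this means rho = sigma_u^2 / (sigma_u^2 + sigma_e^2) and recomputes it from the sigma_u and sigma_e values printed above.

```python
def variance_share(sigma_u, sigma_e):
    """Fraction of total error variance attributed to the individual effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)

# Values copied from the fixed effects and random effects output
rho_fe = variance_share(1.7380872, 0.51119065)
rho_re = variance_share(0.90057544, 0.51119065)
print(round(rho_fe, 5), round(rho_re, 5))
```

Both values match the reported rho to the displayed precision (about .92039 for fixed effects and .75632 for random effects).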
.
. ********** (2) POISSON RANDOM AND FIXED EFFECTS (Table 23.1 p.794 ) **********
.
. use patr7079, clear
. drop if year<75
(1730 observations deleted)
.
. * Poisson Cross-section with Poisson standard errors
. * Table 23.1 Poisson column
.
. poisson PAT LOGR $xextra
Iteration 0: log likelihood = -21030.607
Iteration 1: log likelihood = -21030.583
Iteration 2: log likelihood = -21030.583
Poisson regression                                Number of obs   =       1730
                                                  LR chi2(1)      =  108479.76
                                                  Prob > chi2     =     0.0000
Log likelihood = -21030.583                       Pseudo R2       =     0.7206
------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .6929337   .0022454   308.61   0.000     .6885329    .6973346
       _cons |   1.711528    .009767   175.24   0.000     1.692385    1.730671
------------------------------------------------------------------------------
. estimates store poisiid
.
. * Poisson Cross-section with heteroskedastic robust standard errors
. poisson PAT LOGR $xextra, robust
Iteration 0: log pseudo-likelihood = -21030.607
Iteration 1: log pseudo-likelihood = -21030.583
Iteration 2: log pseudo-likelihood = -21030.583
Poisson regression                                Number of obs   =       1730
                                                  Wald chi2(1)    =    1223.63
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -21030.583                Pseudo R2       =     0.7206
------------------------------------------------------------------------------
             |               Robust
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .6929337   .0198092    34.98   0.000     .6541084     .731759
       _cons |   1.711528   .0620025    27.60   0.000     1.590006    1.833051
------------------------------------------------------------------------------
. estimates store poishet
.
. * Poisson Cross-section with panel robust standard errors
. poisson PAT LOGR $xextra, cluster(id)
Iteration 0: log pseudo-likelihood = -21030.607
Iteration 1: log pseudo-likelihood = -21030.583
Iteration 2: log pseudo-likelihood = -21030.583
Poisson regression                                Number of obs   =       1730
                                                  Wald chi2(1)    =     259.15
Log pseudo-likelihood = -21030.583                Prob > chi2     =     0.0000

(standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .6929337   .0430441    16.10   0.000     .6085688    .7772987
       _cons |   1.711528   .1340309    12.77   0.000     1.448832    1.974224
------------------------------------------------------------------------------
. estimates store poispan
.
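The three pooled Poisson runs give the same coefficient on LOGR but very different standard errors. The Python sketch below is a cross-check added for illustration, not part of the original program. It assumes the reported intervals are the usual normal-based ones, coefficient plus or minus 1.96 standard errors, and recomputes them from the printed coefficient and standard errors.

```python
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96

def wald_interval(coef, se):
    """Return (z statistic, lower bound, upper bound) for a 95% normal-based interval."""
    return coef / se, coef - Z975 * se, coef + Z975 * se

coef_logr = 0.6929337
# Standard errors copied from the three pooled Poisson outputs
standard_errors = {"iid": 0.0022454, "robust": 0.0198092, "cluster": 0.0430441}
intervals = {name: wald_interval(coef_logr, se) for name, se in standard_errors.items()}

for name, (z, lower, upper) in intervals.items():
    print(f"{name:8s} z = {z:7.2f}   [{lower:.7f}, {upper:.7f}]")

# Ratio of the cluster-robust to the default Poisson standard error
se_ratio = standard_errors["cluster"] / standard_errors["iid"]
```

On these numbers the cluster-robust standard error is roughly 19 times the default Poisson standard error, which is why the choice of standard error matters for inference here.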
. * Poisson panel fixed effects
. * Table 23.1 p.794 Poisson-FE column
.
. * Poisson fixed effects
. xtpoisson PAT LOGR $xextra, fe i(id)
note: 22 groups (110 obs) dropped due to all zero outcomes
Iteration 0: log likelihood = -3660.2656
Iteration 1: log likelihood = -3659.5926
Iteration 2: log likelihood = -3659.5926
Conditional fixed-effects Poisson regression    Number of obs      =      1620
Group variable (i): id                          Number of groups   =       324

                                                Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5

                                                Wald chi2(1)       =      1.35
Log likelihood  = -3659.5926                    Prob > chi2        =    0.2460

------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |  -.0377642   .0325518    -1.16   0.246    -.1015645     .026036
------------------------------------------------------------------------------
. estimates store poisfe
.
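The note in the fixed effects output says that 22 groups (110 observations) were dropped because all their outcomes are zero. Such groups carry no information in the conditional likelihood, so they are excluded before estimation. The Python sketch below illustrates that filtering step on a hypothetical toy panel; the function name and the data are made up for illustration and are not taken from the original program.

```python
def drop_all_zero_groups(rows):
    """Keep only rows whose group has at least one positive outcome.

    rows is a list of (group_id, period, outcome) tuples.
    """
    totals = {}
    for group_id, _, outcome in rows:
        totals[group_id] = totals.get(group_id, 0) + outcome
    return [row for row in rows if totals[row[0]] > 0]

# Hypothetical toy panel: firm 2 never patents in any of the five years
toy = [(firm, year, count)
       for firm, counts in {1: [0, 3, 1, 0, 2], 2: [0, 0, 0, 0, 0], 3: [5, 4, 6, 7, 5]}.items()
       for year, count in zip(range(75, 80), counts)]
kept = drop_all_zero_groups(toy)
kept_firms = sorted({firm for firm, _, _ in kept})

# Bookkeeping for the sample actually used in the log: 346 firms, 5 years each
groups_used = 346 - 22
obs_used = 1730 - 110
print(kept_firms, groups_used, obs_used)
```

The bookkeeping reproduces the 324 groups and 1620 observations shown in the fixed effects output header.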
. /*
> * Alternative way is to put in dummy variables
> set matsize 400
> xi: poisson PAT LOGR $xextra i.id
> */
.
. * Poisson panel random effects
. * Table 23.1 p.794 Poisson-RE column
.
. * Poisson random effects
. xtpoisson PAT LOGR $xextra, re i(id)
Fitting Poisson model:
Iteration 0: log likelihood = -21030.607
Iteration 1: log likelihood = -21030.583
Iteration 2: log likelihood = -21030.583

Fitting full model:

Iteration 0:   log likelihood = -5633.1283
Iteration 1:   log likelihood = -5560.1171
Iteration 2:   log likelihood = -5553.2991
Iteration 3:   log likelihood = -5553.1788
Iteration 4:   log likelihood = -5553.1787

Random-effects Poisson regression               Number of obs      =      1730
Group variable (i): id                          Number of groups   =       346

Random effects u_i ~ Gamma                      Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5

                                                Wald chi2(1)       =    110.20
Log likelihood  = -5553.1787                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .3487832   .0332254    10.50   0.000     .2836625    .4139039
       _cons |   2.312705    .124758    18.54   0.000     2.068184    2.557226
-------------+----------------------------------------------------------------
    /lnalpha |   .5454692   .0899144                      .3692402    .7216983
-------------+----------------------------------------------------------------
       alpha |   1.725418   .1551399                      1.446635    2.057925
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 3.1e+04 Prob>=chibar2 = 0.000
. estimates store poisre
.
. * Poisson random effects with normal error
. xtpoisson PAT LOGR $xextra, re i(id) normal
Fitting comparison Poisson model:
Iteration 0: log likelihood = -21030.607
Iteration 1: log likelihood = -21030.583
Iteration 2: log likelihood = -21030.583

Fitting constant-only model:

tau =  0.0     log likelihood = -55439.205
tau =  0.1     log likelihood = -12594.935
tau =  0.2     log likelihood = -8669.2146
tau =  0.3     log likelihood = -8107.7532
tau =  0.4     log likelihood = -7634.0488
tau =  0.5     log likelihood = -8046.3947

Iteration 0:   log likelihood = -7634.0488
Iteration 1:   log likelihood = -7586.9889
Iteration 2:   log likelihood = -7586.5899
Iteration 3:   log likelihood = -7586.5898

Fitting full model:

tau =  0.0     log likelihood = -19363.106
tau =  0.1     log likelihood = -6602.7685
tau =  0.2     log likelihood = -6335.5261
tau =  0.3     log likelihood = -6556.0614

Iteration 0:   log likelihood = -6335.5261
Iteration 1:   log likelihood = -6310.8821
Iteration 2:   log likelihood = -6261.9825

Random-effects Poisson regression               Number of obs      =      1730
Group variable (i): id                          Number of groups   =       346

Random effects u_i ~ Gaussian                   Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5

                                                LR chi2(0)         =   2649.21
Log likelihood  = -6261.9825                    Prob > chi2        =

------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |    .815977          .        .       .            .           .
       _cons |   1.156293          .        .       .            .           .
-------------+----------------------------------------------------------------
    /lnsig2u |  -1.310299          .        .       .            .           .
-------------+----------------------------------------------------------------
     sigma_u |   .5193643          .                             .           .
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01) = 3.0e+04 Pr>=chibar2 = 0.000
. estimates store poisrenormal
.
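Both random effects Poisson models are parameterized on a log scale: the gamma model reports /lnalpha alongside alpha, and the normal model reports /lnsig2u alongside sigma_u. The Python sketch below is a cross-check added for illustration, not part of the original program. It assumes alpha = exp(lnalpha) and sigma_u = exp(lnsig2u / 2), and recovers the reported values from the printed log-scale estimates.

```python
import math

# Log-scale estimates copied from the two random effects Poisson outputs
lnalpha = 0.5454692
lnsig2u = -1.310299

alpha = math.exp(lnalpha)        # gamma random effects: variance parameter alpha
sigma_u = math.exp(lnsig2u / 2)  # normal random effects: standard deviation of u_i

# The interval for alpha is assumed to be the exponentiated interval for lnalpha
alpha_ci = (math.exp(0.3692402), math.exp(0.7216983))
print(round(alpha, 6), round(sigma_u, 7), [round(v, 6) for v in alpha_ci])
```

The results agree with the alpha row (1.725418, interval 1.446635 to 2.057925) and the sigma_u row (.5193643) in the output.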
. * Poisson random effects population averaged
. xtpoisson PAT LOGR $xextra, pa i(id)
Iteration 1: tolerance = .09172122
Iteration 2: tolerance = .02686915
Iteration 3: tolerance = .00712438
Iteration 4: tolerance = .00159015
Iteration 5: tolerance = .00032104
Iteration 6: tolerance = .00006195
Iteration 7: tolerance = .00001174
Iteration 8: tolerance = 2.209e-06
Iteration 9: tolerance = 4.146e-07

GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(1)       =  16317.27
Scale parameter:                         1      Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5595302   .0043803   127.74   0.000      .550945    .5681153
       _cons |   2.067515   .0185166   111.66   0.000     2.031223    2.103807
------------------------------------------------------------------------------
. estimates store poispa
.
. * Poisson random effects population averaged with robust se
. xtpoisson PAT LOGR $xextra, robust pa i(id)
Iteration 1: tolerance = .09172122
Iteration 2: tolerance = .02686915
Iteration 3: tolerance = .00712438
Iteration 4: tolerance = .00159015
Iteration 5: tolerance = .00032104
Iteration 6: tolerance = .00006195
Iteration 7: tolerance = .00001174
Iteration 8: tolerance = 2.209e-06
Iteration 9: tolerance = 4.146e-07
GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(1)       =    293.80
Scale parameter:                         1      Prob > chi2        =    0.0000

(standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5595302   .0326436    17.14   0.000     .4955499    .6235104
       _cons |   2.067515   .1113256    18.57   0.000     1.849321    2.285709
------------------------------------------------------------------------------
. estimates store poispapan
.
. ********** (3) POISSON GEE (GENERALIZED ESTIMATING EQUATIONS) **********
.
. * Xtgee should reproduce Poisson random effects population averaged
. xtgee PAT LOGR $xextra, corr(exchangeable) family(poisson) link(log) i(id)
Iteration 1: tolerance = .09172122
Iteration 2: tolerance = .02686915
Iteration 3: tolerance = .00712438
Iteration 4: tolerance = .00159015
Iteration 5: tolerance = .00032104
Iteration 6: tolerance = .00006195
Iteration 7: tolerance = .00001174
Iteration 8: tolerance = 2.209e-06
Iteration 9: tolerance = 4.146e-07
GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(1)       =  16317.27
Scale parameter:                         1      Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5595302   .0043803   127.74   0.000      .550945    .5681153
       _cons |   2.067515   .0185166   111.66   0.000     2.031223    2.103807
------------------------------------------------------------------------------
. estimates store poisgee
.
. * Xtgee should reproduce Poisson random effects population averaged with robust se
. xtgee PAT LOGR $xextra, corr(exchangeable) family(poisson) link(log) i(id) robust
Iteration 1: tolerance = .09172122
Iteration 2: tolerance = .02686915
Iteration 3: tolerance = .00712438
Iteration 4: tolerance = .00159015
Iteration 5: tolerance = .00032104
Iteration 6: tolerance = .00006195
Iteration 7: tolerance = .00001174
Iteration 8: tolerance = 2.209e-06
Iteration 9: tolerance = 4.146e-07
GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                            Poisson                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(1)       =    293.80
Scale parameter:                         1      Prob > chi2        =    0.0000

(standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5595302   .0326436    17.14   0.000     .4955499    .6235104
       _cons |   2.067515   .1113256    18.57   0.000     1.849321    2.285709
------------------------------------------------------------------------------
. estimates store poisgeepan
.
. * Xtgee should give NLS of exponential mean with iid standard errors
. xtgee PAT LOGR $xextra, corr(independent) family(gaussian) link(log) i(id)
Iteration 1: tolerance = 8.014e-08
GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                           Gaussian                     avg =       5.0
Correlation:                   independent                     max =         5
                                                Wald chi2(1)       =   2316.87
Scale parameter:                  2060.724      Prob > chi2        =    0.0000

Pearson chi2(1730):              3565052.8      Deviance           = 3565052.8
Dispersion (Pearson):             2060.724      Dispersion         =  2060.724
------------------------------------------------------------------------------
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5084673   .0105636    48.13   0.000      .487763    .5291716
       _cons |   2.528729   .0544558    46.44   0.000     2.421997     2.63546
------------------------------------------------------------------------------
. estimates store nls
.
. * Xtgee should give NLS of exponential mean with robust standard errors
. xtgee PAT LOGR $xextra, corr(independent) family(gaussian) link(log) i(id) robust
Iteration 1: tolerance = 8.014e-08
GEE population-averaged model                   Number of obs      =      1730
Group variable:                         id      Number of groups   =       346
Link:                                  log      Obs per group: min =         5
Family:                           Gaussian                     avg =       5.0
Correlation:                   independent                     max =         5
                                                Wald chi2(1)       =     85.32
Scale parameter:                  2060.724      Prob > chi2        =    0.0000

Pearson chi2(1730):              3565052.8      Deviance           = 3565052.8
Dispersion (Pearson):             2060.724      Dispersion         =  2060.724

(standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         PAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LOGR |   .5084673    .055046     9.24   0.000     .4005791    .6163554
       _cons |   2.528729   .2176674    11.62   0.000     2.102109    2.955349
------------------------------------------------------------------------------
. estimates store nlspan
.
. ********** (4) PANEL ROBUST STANDARD ERRORS BY BOOTSTRAP **********
.
. * For discussion of panel robust standard errors
. * see text Section 23.2.6 page 788-9 (nonlinear panel)
. * and text Section 21.2.3 page 705-8 (linear panel)
.
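The bootstrap commands in this section use the cluster(id) option, so whole firms rather than single firm-year observations are resampled. The Python sketch below illustrates that resampling scheme on simulated data. It is not the book's implementation: the function names, the toy data and the choice of a pooled OLS slope as the statistic are all assumptions made for illustration.

```python
import random
import statistics

def ols_slope(rows):
    """Slope from a pooled OLS regression of y on x; rows are (x, y) pairs."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def cluster_bootstrap_se(clusters, statistic, reps, seed):
    """Resample whole clusters with replacement and return (se, replicates)."""
    rng = random.Random(seed)
    ids = list(clusters)
    replicates = []
    for _ in range(reps):
        sample = []
        for cluster_id in rng.choices(ids, k=len(ids)):
            sample.extend(clusters[cluster_id])  # all rows of a drawn cluster stay together
        replicates.append(statistic(sample))
    return statistics.stdev(replicates), replicates

# Hypothetical toy panel: 40 firms, 5 periods, with a firm effect shared by x and y
data_rng = random.Random(1)
toy = {}
for firm in range(40):
    effect = data_rng.gauss(0, 1)
    toy[firm] = []
    for _ in range(5):
        x = effect + data_rng.gauss(0, 1)
        toy[firm].append((x, 0.5 * x + effect + data_rng.gauss(0, 1)))

se, draws = cluster_bootstrap_se(toy, ols_slope, reps=200, seed=10001)
print(round(se, 4))
```

Keeping each firm's observations together preserves the within-firm correlation in every bootstrap sample, which is what makes the resulting standard error panel robust.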
. * Pooled Poisson panel robust bootstrap standard errors
. set seed 10001
. bootstrap "poisson PAT LOGR $xextra" "_b[LOGR] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      poisson PAT LOGR
statistics:   _bs_1      = _b[LOGR]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      1730
                                                  N of clusters    =       346
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .6929337  .0081667  .0473006   .6000008   .7858666   (N)
             |                                       .6250867   .8100113   (P)
             |                                       .6209522   .8025689  (BC)
       _bs_2 |   500  1.711528 -.0267995   .141745   1.433038   1.990019   (N)
             |                                       1.336657   1.924925   (P)
             |                                       1.355381   1.935691  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix poisbootse = e(se)
.
. * Poisson fixed effects panel bootstrap standard errors
. set seed 10001
. bootstrap "xtpoisson PAT LOGR $xextra, fe i(id)" "_b[LOGR]", cluster(id) reps($nreps) level(95)
command:      xtpoisson PAT LOGR , fe i(id)
statistic:    _bs_1      = _b[LOGR]

Bootstrap statistics                              Number of obs    =      1620
                                                  N of clusters    =       324
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500 -.0377642  .0057448  .1067039  -.2474085     .17188   (N)
             |                                      -.2458792   .1454112   (P)
             |                                      -.3182177   .1310303  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix poisfebootse = e(se)
.
. * Poisson random effects panel bootstrap standard errors
. set seed 10001
. bootstrap "xtpoisson PAT LOGR $xextra, re i(id)" "_b[LOGR] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      xtpoisson PAT LOGR , re i(id)
statistics:   _bs_1      = _b[LOGR]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      1730
                                                  N of clusters    =       346
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .3487832 -.1581585  .1194127   .1141695   .5833969   (N)
             |                                      -.0414326   .4028537   (P)
             |                                       .2775298   .5040658  (BC)
       _bs_2 |   500  2.312705  .5382745  .4384781   1.451214   3.174196   (N)
             |                                       2.104445   3.743506   (P)
             |                                       1.804036   2.552794  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix poisrebootse = e(se)
.
. * Poisson population averaged panel bootstrap standard errors
. set seed 10001
. bootstrap "xtpoisson PAT LOGR $xextra, pa i(id)" "_b[LOGR] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      xtpoisson PAT LOGR , pa i(id)
statistics:   _bs_1      = _b[LOGR]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      1730
                                                  N of clusters    =       346
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   338  .5595301 -.0013448  .1072904   .3484868   .7705734   (N)
             |                                       .1938364   .6946551   (P)
             |                                       .0630385   .6535396  (BC)
       _bs_2 |   338  2.067515 -.0016997  .2940233   1.489163   2.645867   (N)
             |                                       1.675453   3.034075   (P)
             |                                        1.80883   3.352539  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix poispabootse = e(se)
. set seed 10001
.
. * Xtgee should give exponential mean (NLS) with iid errors with bootstrap se's
. bootstrap "xtgee PAT LOGR $xextra, corr(independent) family(gaussian) link(log) i(id)" "_b[LOGR] _b[_cons]", cluster(id) reps($nreps) level(95)
command:      xtgee PAT LOGR , corr(independent) family(gaussian) link(log) i(id)
statistics:   _bs_1      = _b[LOGR]
              _bs_2      = _b[_cons]

Bootstrap statistics                              Number of obs    =      1730
                                                  N of clusters    =       346
                                                  Replications     =       500

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   500  .5084673  .0122215  .0541264   .4021235    .614811   (N)
             |                                       .4453159   .6547906   (P)
             |                                       .4372376   .6397901  (BC)
       _bs_2 |   500  2.528729 -.0502655   .198022   2.139669   2.917789   (N)
             |                                       1.953206   2.763821   (P)
             |                                       2.084754   2.820513  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
.
. * Results given in same order as in Table 23.1 page 794
. matrix nlsbootse = e(se)
. matrix list poisbootse
poisbootse[1,2]
        _bs_1      _bs_2
se  .04730061  .14174498
. matrix list poisfebootse
symmetric poisfebootse[1,1]
        _bs_1
se  .10670389
. matrix list poisrebootse
poisrebootse[1,2]
        _bs_1      _bs_2
se  .11941272  .43847813
. matrix list poispabootse
poispabootse[1,2]
        _bs_1      _bs_2
se  .10729042  .29402327
.
. ********** DISPLAY RESULTS FOR (1)-(3) GIVEN IN TABLE 23.1 page 794 **********
.
. * Standard errors assume iid errors and in some cases are panel robust
.
. estimates table linolspan linfe linre, t se /*
> */ stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
----------------------------------------------------
    Variable | linolspan      linfe       linre
-------------+--------------------------------------
        LOGR |      0.834       0.107       0.720
             |      0.023       0.056       0.024
             |      36.48        1.91       30.26
       _cons |      0.795       1.709       0.938
             |      0.058       0.071       0.060
             |      13.73       23.92       15.65
-------------+--------------------------------------
           N |   1730.000    1730.000    1730.000
          ll |  -2531.658   -1100.267
          r2 |      0.719       0.003
         tss |               6732.584
         rss |   1890.831     361.400
         mss |   4841.753       0.948
        rmse |      1.046       0.511
        df_r |    345.000    1383.000
----------------------------------------------------
                                      legend: b/se/t
. estimates table poisiid poishet poispan, t se /*
> */ stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
----------------------------------------------------
    Variable |  poisiid     poishet     poispan
-------------+--------------------------------------
        LOGR |      0.693       0.693       0.693
             |      0.002       0.020       0.043
             |     308.61       34.98       16.10
       _cons |      1.712       1.712       1.712
             |      0.010       0.062       0.134
             |     175.24       27.60       12.77
-------------+--------------------------------------
           N |   1730.000    1730.000    1730.000
          ll | -21030.583  -21030.583  -21030.583
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
----------------------------------------------------
                                      legend: b/se/t
. estimates table poisfe poisre poisrenormal poispa poispapan, t se /*
> */ stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
------------------------------------------------------------------------------
    Variable |   poisfe      poisre   poisreno~l    poispa    poispapan
-------------+----------------------------------------------------------------
PAT          |
        LOGR |     -0.038       0.349       0.816
             |      0.033       0.033       0.000
             |      -1.16       10.50           .
       _cons |                  2.313       1.156
             |                  0.125       0.000
             |                  18.54           .
-------------+----------------------------------------------------------------
lnalpha      |
       _cons |                  0.545
             |                  0.090
             |                   6.07
-------------+----------------------------------------------------------------
lnsig2u      |
       _cons |                             -1.310
             |                              0.000
             |                                  .
-------------+----------------------------------------------------------------
_            |
        LOGR |                                          0.560       0.560
             |                                          0.004       0.033
             |                                         127.74       17.14
       _cons |                                          2.068       2.068
             |                                          0.019       0.111
             |                                         111.66       18.57
-------------+----------------------------------------------------------------
Statistics   |
           N |   1620.000    1730.000    1730.000    1730.000    1730.000
          ll |  -3659.593   -5553.179   -6261.982
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
------------------------------------------------------------------------------
                                                                legend: b/se/t
. estimates table poisgee poisgeepan nls nlspan, t se /*
> */ stats(N ll r2 tss rss mss rmse df_r) b(%10.3f)
-----------------------------------------------------------------
    Variable |  poisgee    poisgeepan      nls        nlspan
-------------+---------------------------------------------------
        LOGR |      0.560       0.560       0.508       0.508
             |      0.004       0.033       0.011       0.055
             |     127.74       17.14       48.13        9.24
       _cons |      2.068       2.068       2.529       2.529
             |      0.019       0.111       0.054       0.218
             |     111.66       18.57       46.44       11.62
-------------+---------------------------------------------------
           N |   1730.000    1730.000    1730.000    1730.000
          ll |
          r2 |
         tss |
         rss |
         mss |
        rmse |
        df_r |
-----------------------------------------------------------------
                                                   legend: b/se/t
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section5\mma23p1pannonlin.txt
log type: text
closed on: 23 May 2005, 12:53:45

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section6\mma24p1olscluster.txt
log type: text
opened on: 24 May 2005, 14:33:58
.
. ********** OVERVIEW OF MMA24P1OLSCLUSTER.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 24.7 pages 848-53 Table 24.4
. * Cluster robust inference for OLS cross-section application using
. * Vietnam Living Standard Survey data
.
. * (0) Descriptive Statistics (Table 24.3 first half)
. * (1) Linear regression (in logs) with household data (Table 24.4)
.
. * For Tables 24.5-6 for clustered count data see MMA24P2POISCLUSTER.DO
.
. * The cluster effects model is
. * y_it = x_it'b + a_i + e_it
. * Default xtreg output assumes e_it is iid.
. * This is usually too strong an assumption.
. * Instead should get cluster-robust errors after xtreg
. * See Section 21.2.3 pages 709-12
. * Stata Version 8 does not do this but Stata version 9 does.
. * Here we do a panel bootstrap - results not reported in the text
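The cluster-robust alternative described in the comments above can be sketched for later Stata releases as follows. This is a minimal sketch and not part of the original program: it assumes a release in which xtreg accepts the vce(cluster ...) option and in which xtset replaces the older iis declaration (the log itself only states that version 9 can produce cluster-robust errors after xtreg). It also assumes vietnam_ex1.dta is in the working directory, and it uses the dataset's original lowercase variable names rather than the renamed ones.

```stata
* Sketch only: cluster-robust standard errors after xtreg in a later Stata release
* Assumes vietnam_ex1.dta is available; variable names are those shown by -desc-
use vietnam_ex1.dta, clear
* Declare the cluster (panel) identifier
xtset commune
* Random effects with standard errors robust to within-commune correlation
xtreg lhhex12m lhhexp1 age sex hhsize farm comped98, re vce(cluster commune)
* Fixed effects (within) estimator with the same cluster-robust correction
xtreg lhhex12m lhhexp1 age sex hhsize farm comped98, fe vce(cluster commune)
```

The point estimates are unchanged by the vce() option; only the reported standard errors and test statistics differ from the default output.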
.
. * To speed up programs reduce breps - the number of bootstrap reps
.
. * To run this program you need data set
. * vietnam_ex1.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** DATA DESCRIPTION **********
.
. * The data comes from World Bank 1997 Vietnam Living Standards Survey
. * A subset was used in chapter 4.6.4.
. * The larger sample here is described on pages 848-9
.
. * The data are HOUSEHOLD data
. * There are N=5006 households in 194 clusters
.
. * The separate data set vietnam_ex2.dta has individual-level data
.
. ********** READ IN HOUSEHOLD DATA and SUMMARIZE (Table 24.3) **********
.
. use vietnam_ex1.dta
. desc
Contains data from vietnam_ex1.dta
  obs:         5,999
 vars:             8                          11 Apr 2005 12:39
 size:       185,969 (98.2% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sex             byte   %8.0g                  Gender of HH.head (1:M;2:F)
age             int    %8.0g                  Age of household head
comped98        float  %9.0g       diploma    completed diploma HH.head
farm            float  %9.0g       loaiho     Type of HH (1:farm; 0:nonfarm)
hhsize          long   %12.0g                 Household size
commune         float  %9.0g                  commune code PSU-SVY commands
lhhexp1         float  %9.0g
lhhex12m        float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |      5999    1.270712    .4443645          1          2
         age |      5999    48.01284     13.7702         16         95
    comped98 |      5999    3.385564    2.037543          0          9
        farm |      5999    .5730955    .4946694          0          1
      hhsize |      5999    4.752292    1.954292          1         19
-------------+--------------------------------------------------------
     commune |      5999    98.26588    56.00461          1        194
     lhhexp1 |      5999    9.341561    .6877458   6.543108   12.20242
    lhhex12m |      5006    6.310585    1.593083          0   12.36325
.
. rename sex SEX
. rename age AGE
. rename comped98 EDUC
. rename farm FARM
. rename hhsize HHSIZE
. rename commune COMMUNE
. rename lhhexp1 LNHHEXP
. rename lhhex12m LNEXP12M
. gen HHEXP = exp(LNHHEXP)
.
. * Following should give same descriptive statistics
. * as in top half (Household) in Table 24.3 p.850
. * But there are some differences plus here have FARM not URBAN
. sum LNEXP12M AGE SEX HHSIZE FARM EDUC HHEXP LNHHEXP COMMUNE
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    LNEXP12M |      5006    6.310585    1.593083          0   12.36325
         AGE |      5999    48.01284     13.7702         16         95
         SEX |      5999    1.270712    .4443645          1          2
      HHSIZE |      5999    4.752292    1.954292          1         19
        FARM |      5999    .5730955    .4946694          0          1
-------------+--------------------------------------------------------
        EDUC |      5999    3.385564    2.037543          0          9
       HHEXP |      5999    14599.23    12582.31   694.4419     199271
     LNHHEXP |      5999    9.341561    .6877458   6.543108   12.20242
     COMMUNE |      5999    98.26588    56.00461          1        194
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. * Note that LNEXP12M has some missing values coded as .
. outfile LNEXP12M AGE SEX HHSIZE FARM EDUC LNHHEXP COMMUNE /*
> */using vietnam_ex1.asc, replace
.
. ********** ANALYSIS: CLUSTER ANALYSIS FOR LINEAR MODEL [Table 24.4 p.851] **********
.
. * Regressor list for the linear regressions
. global XLISTLINEAR LNHHEXP AGE SEX HHSIZE FARM EDUC
.
. * OLS with usual standard errors (Table 24.4 columns 1-2)
. regress LNEXP12M $XLISTLINEAR
      Source |       SS       df       MS              Number of obs =    5006
-------------+------------------------------           F(  6,  4999) =   82.02
       Model |  1138.38332     6  189.730553           Prob > F      =  0.0000
    Residual |   11563.877  4999  2.31323805           R-squared     =  0.0896
-------------+------------------------------           Adj R-squared =  0.0885
       Total |  12702.2603  5005  2.53791415           Root MSE      =  1.5209

------------------------------------------------------------------------------
    LNEXP12M |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6702328   .0418711    16.01   0.000     .5881472    .7523185
         AGE |   .0105766   .0016554     6.39   0.000     .0073312     .013822
         SEX |    .097444   .0518961     1.88   0.060    -.0042952    .1991832
      HHSIZE |   .0289812   .0132524     2.19   0.029     .0030007    .0549617
        FARM |   .1346891   .0493325     2.73   0.006     .0379757    .2314025
        EDUC |  -.0903599   .0122803    -7.36   0.000    -.1144346   -.0662852
       _cons |  -.5107135   .3799642    -1.34   0.179     -1.25561     .234183
------------------------------------------------------------------------------
. estimates store olsiid
.
. * OLS with heteroskedastic-robust standard errors (Table 24.4 column 3)
. regress LNEXP12M $XLISTLINEAR, robust
Regression with robust standard errors                 Number of obs =    5006
                                                       F(  6,  4999) =   80.80
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0896
                                                       Root MSE      =  1.5209

------------------------------------------------------------------------------
             |               Robust
    LNEXP12M |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6702328   .0425223    15.76   0.000     .5868705    .7535952
         AGE |   .0105766   .0016634     6.36   0.000     .0073157    .0138376
         SEX |    .097444   .0519606     1.88   0.061    -.0044217    .1993096
      HHSIZE |   .0289812   .0134698     2.15   0.031     .0025744     .055388
        FARM |   .1346891   .0494286     2.72   0.006     .0377873    .2315908
        EDUC |  -.0903599   .0127869    -7.07   0.000    -.1154278   -.0652919
       _cons |  -.5107135   .3812665    -1.34   0.180    -1.258163    .2367362
------------------------------------------------------------------------------
. estimates store olshet
.
. * OLS with cluster-robust standard errors (Table 24.4 column 4)
. regress LNEXP12M $XLISTLINEAR, cluster(COMMUNE)
Regression with robust standard errors                 Number of obs =    5006
                                                       F(  6,   193) =   54.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0896
Number of clusters (COMMUNE) = 194                     Root MSE      =  1.5209

------------------------------------------------------------------------------
             |               Robust
    LNEXP12M |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6702328   .0528536    12.68   0.000      .565988    .7744777
         AGE |   .0105766   .0019371     5.46   0.000     .0067561    .0143972
         SEX |    .097444   .0595084     1.64   0.103    -.0199263    .2148142
      HHSIZE |   .0289812   .0153602     1.89   0.061    -.0013142    .0592766
        FARM |   .1346891   .0608046     2.22   0.028     .0147622    .2546159
        EDUC |  -.0903599   .0149743    -6.03   0.000    -.1198942   -.0608255
       _cons |  -.5107135   .4706163    -1.09   0.279    -1.438925    .4174979
------------------------------------------------------------------------------
. estimates store olsclust
.
. * Random effects estimation (FGLS) (Table 24.4 columns 5-6)
. * This uses the xtreg command which first requires identifying the cluster
. iis COMMUNE
. xtreg LNEXP12M $XLISTLINEAR, re
Random-effects GLS regression                   Number of obs      =      5006
Group variable (i): COMMUNE                     Number of groups   =       194

R-sq:  within  = 0.0518                         Obs per group: min =
       between = 0.2884                                        avg =      25.8
       overall = 0.0883                                        max =        39

Random effects u_i ~ Gaussian                   Wald chi2(6)       =    335.12
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    LNEXP12M |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6268899   .0468004    13.39   0.000     .5351627     .718617
         AGE |   .0112334   .0016411     6.85   0.000      .008017    .0144499
         SEX |   .1069915   .0511849     2.09   0.037     .0066709    .2073121
      HHSIZE |   .0158302   .0135166     1.17   0.242    -.0106618    .0423222
        FARM |   .0928509   .0549544     1.69   0.091    -.0148578    .2005595
        EDUC |  -.0638447   .0129744    -4.92   0.000    -.0892741   -.0384153
       _cons |  -.1660698   .4202027    -0.40   0.693     -.989652    .6575123
-------------+----------------------------------------------------------------
     sigma_u |  .46739871
     sigma_e |  1.4526468
         rho |  .09381491   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. estimates store refgls
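The rho reported in the random effects output is the share of the composite error variance attributed to the commune effect, rho = sigma_u^2/(sigma_u^2 + sigma_e^2). As a quick arithmetic check, the sketch below recomputes it from the two standard deviations printed above. The scalar names are illustrative and not part of the original program.

```stata
* Sketch only: recompute rho from sigma_u and sigma_e as printed by xtreg, re
scalar sigma_u = .46739871
scalar sigma_e = 1.4526468
* Intraclass correlation implied by the random effects model
scalar rho_check = sigma_u^2/(sigma_u^2 + sigma_e^2)
display "rho = " %9.7f rho_check
```

The result should agree with the reported rho of about 0.094, meaning that roughly 9 percent of the error variance is common to households in the same commune.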


.
. * Note that can cluster bootstrap if desired to get more robust standard errors
. * This is done at end of program
.
. * Fixed effects estimation (within estimator) (Table 24.4 columns 7-8)
. xtreg LNEXP12M $XLISTLINEAR, fe
Fixed-effects (within) regression               Number of obs      =      5006
Group variable (i): COMMUNE                     Number of groups   =       194

R-sq:  within  = 0.0520                         Obs per group: min =
       between = 0.2787                                        avg =      25.8
       overall = 0.0865                                        max =        39

                                                F(6,4806)          =     43.92
corr(u_i, Xb)  = 0.0797                         Prob > F           =    0.0000

------------------------------------------------------------------------------
    LNEXP12M |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6037139   .0520178    11.61   0.000     .5017352    .7056926
         AGE |   .0115845   .0016706     6.93   0.000     .0083092    .0148597
         SEX |    .112821   .0520014     2.17   0.030     .0108745    .2147675
      HHSIZE |   .0107124   .0141127     0.76   0.448     -.016955    .0383797
        FARM |   .0693037   .0609002     1.14   0.255    -.0500885    .1886959
        EDUC |  -.0510325   .0135817    -3.76   0.000    -.0776588   -.0244062
       _cons |   .0361552    .461482     0.08   0.938    -.8685606    .9408711
-------------+----------------------------------------------------------------
     sigma_u |  .57732514
     sigma_e |  1.4526468
         rho |  .13640519   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(193, 4806) =     3.49           Prob > F = 0.0000
. estimates store fe
.
. * Note that can cluster bootstrap if desired to get more robust standard errors
. * This is done at end of program
.
. * Random effects estimation by MLE assuming normality (Table 24.4 columns 5-6)
. * This uses the xtreg command which first requires identifying the cluster
. iis COMMUNE
. xtreg LNEXP12M $XLISTLINEAR, mle
Fitting constant-only model:
Iteration 0:   log likelihood = -9262.6182
Iteration 1:   log likelihood = -9252.6974
Iteration 2:   log likelihood = -9252.1542
Iteration 3:   log likelihood = -9252.1493

Fitting full model:
Iteration 0:   log likelihood = -9096.5264
Iteration 1:   log likelihood = -9092.5585
Iteration 2:   log likelihood = -9092.5546

Random-effects ML regression                    Number of obs      =      5006
Group variable (i): COMMUNE                     Number of groups   =       194

Random effects u_i ~ Gaussian                   Obs per group: min =
                                                               avg =      25.8
                                                               max =        39

                                                LR chi2(6)         =    319.19
Log likelihood  = -9092.5546                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    LNEXP12M |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6276456   .0467072    13.44   0.000      .536101    .7191901
         AGE |     .01122   .0016406     6.84   0.000     .0080045    .0144354
         SEX |   .1067788   .0511618     2.09   0.037     .0065035     .207054
      HHSIZE |     .01603   .0135121     1.19   0.235    -.0104533    .0425133
        FARM |   .0936529   .0548379     1.71   0.088    -.0138274    .2011332
        EDUC |  -.0643046   .0130222    -4.94   0.000    -.0898277   -.0387816
       _cons |  -.1718111   .4192856    -0.41   0.682    -.9935959    .6499737
-------------+----------------------------------------------------------------
    /sigma_u |    .455472   .0329742    13.81   0.000     .3908438    .5201002
    /sigma_e |   1.452303   .0148092    98.07   0.000     1.423278    1.481329
-------------+----------------------------------------------------------------
         rho |   .0895499   .0120221                      .0682208    .1154799
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=  212.57 Prob>=chibar2 = 0.000
. estimates store remle
.
. * Test of the RE specification using Breusch-Pagan test
. * This is statistic in third bottom row of Table 24.4
. quietly xtreg LNEXP12M $XLISTLINEAR, re
. xttest0
Breusch and Pagan Lagrangian multiplier test for random effects:

        LNEXP12M[COMMUNE,t] = Xb + u[COMMUNE] + e[COMMUNE,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                LNEXP12M |   2.537914       1.593083
                       e |   2.110183       1.452647
                       u |   .2184615       .4673987

        Test:   Var(u) = 0
                              chi2(1) =   432.75
                          Prob > chi2 =     0.0000
.
. * Hausman test of FE vs. RE specification
. * This test is not a robust version.
. * Its validity assumes that errors are iid after including COMMUNE-specific effect
. * For this example this may be reasonable as cluster bootstrap se's close to usual se's
. xthausman
(Warning: xthausman is no longer a supported command; use -hausman-. For instructions, see help hausman.)

Hausman specification test

                ---- Coefficients ----
             |      Fixed       Random
    LNEXP12M |    Effects      Effects       Difference
-------------+-----------------------------------------
     LNHHEXP |   .6037139     .6268899        -.0231759
         AGE |   .0115845     .0112334          .000351
         SEX |    .112821     .1069915         .0058295
      HHSIZE |   .0107124     .0158302        -.0051179
        FARM |   .0693037     .0928509        -.0235472
        EDUC |  -.0510325    -.0638447         .0128122

    Test:  Ho:  difference in coefficients not systematic

                 chi2(  6) = (b-B)'[S^(-1)](b-B), S = (S_fe - S_re)
                           =    17.89
                 Prob>chi2 =     0.0065
.
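The warning in the output above points to the supported hausman command, which works on stored estimation results. A minimal sketch of that route follows, assuming vietnam_ex1.dta is available with its original lowercase variable names and a Stata release that provides xtset (use iis in version 8). The names fe_sketch and re_sketch are illustrative. Like xthausman, this is the non-robust form of the test.

```stata
* Sketch only: the supported -hausman- command applied to stored estimates
* Assumes vietnam_ex1.dta is available; fe_sketch and re_sketch are illustrative names
use vietnam_ex1.dta, clear
xtset commune
quietly xtreg lhhex12m lhhexp1 age sex hhsize farm comped98, fe
estimates store fe_sketch
quietly xtreg lhhex12m lhhexp1 age sex hhsize farm comped98, re
estimates store re_sketch
* Estimator that is consistent under both hypotheses first, efficient one second
hausman fe_sketch re_sketch
```

The order of the two names matters: the fixed effects results are listed first because they remain consistent when the commune effects are correlated with the regressors.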
. * Alternative GLS estimation using the GEE approach
. * Same as xtgee with family(gaussian) link(id) corr(exchangeable)
. * So GLS with equicorrelated errors
. xtreg LNEXP12M $XLISTLINEAR, pa
Iteration 1: tolerance = .21691897
Iteration 2: tolerance = .00610852
Iteration 3: tolerance = .00014606
Iteration 4: tolerance = 3.479e-06
Iteration 5: tolerance = 8.285e-08

GEE population-averaged model                   Number of obs      =      5006
Group variable:                    COMMUNE      Number of groups   =       194
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =      25.8
Correlation:                  exchangeable                     max =        39
                                                Wald chi2(6)       =    338.97
Scale parameter:                  2.314413      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    LNEXP12M |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |   .6281447   .0466076    13.48   0.000     .5367955     .719494
         AGE |   .0112111   .0016411     6.83   0.000     .0079946    .0144275
         SEX |   .1066389   .0511914     2.08   0.037     .0063056    .2069722
      HHSIZE |   .0161625    .013502     1.20   0.231    -.0103009    .0426259
        FARM |   .0941811   .0547349     1.72   0.085    -.0130973    .2014594
        EDUC |  -.0646085   .0129528    -4.99   0.000    -.0899956   -.0392215
       _cons |  -.1756087   .4185566    -0.42   0.675    -.9959645    .6447472
------------------------------------------------------------------------------
. estimates store pa
.
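The comments before the population-averaged fit note that xtreg with the pa option is the same as xtgee with a Gaussian family, identity link and exchangeable correlation. A minimal sketch of that equivalent call is given below. It assumes vietnam_ex1.dta is available with its original lowercase variable names and a Stata release that provides xtset (use iis in version 8).

```stata
* Sketch only: the xtgee call that the log says is equivalent to xtreg, pa
use vietnam_ex1.dta, clear
xtset commune
* GLS with equicorrelated (exchangeable) errors within commune, written as a GEE
xtgee lhhex12m lhhexp1 age sex hhsize farm comped98, family(gaussian) link(identity) corr(exchangeable)
```

The coefficients should match the GEE population-averaged output shown above, since only the command name differs.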
. ********** DISPLAY TABLE 24.4 RESULTS page 851 **********
.
. estimates table olsiid olshet olsclust, /*
> */ b(%10.3f) t(%10.2f) stats(r2 N)
------------------------------------------------
    Variable |     olsiid     olshet   olsclust
-------------+----------------------------------
     LNHHEXP |      0.670      0.670      0.670
             |      16.01      15.76      12.68
         AGE |      0.011      0.011      0.011
             |       6.39       6.36       5.46
         SEX |      0.097      0.097      0.097
             |       1.88       1.88       1.64
      HHSIZE |      0.029      0.029      0.029
             |       2.19       2.15       1.89
        FARM |      0.135      0.135      0.135
             |       2.73       2.72       2.22
        EDUC |     -0.090     -0.090     -0.090
             |      -7.36      -7.07      -6.03
       _cons |     -0.511     -0.511     -0.511
             |      -1.34      -1.34      -1.09
-------------+----------------------------------
          r2 |      0.090      0.090      0.090
           N |   5006.000   5006.000   5006.000
------------------------------------------------
                                     legend: b/t
. estimates table pa fe refgls remle, /*
> */ b(%10.3f) t(%10.2f) stats(r2 N)

----------------------------------------------------------
    Variable |         pa         fe     refgls      remle
-------------+--------------------------------------------
_            |
     LNHHEXP |      0.628      0.604      0.627
             |      13.48      11.61      13.39
         AGE |      0.011      0.012      0.011
             |       6.83       6.93       6.85
         SEX |      0.107      0.113      0.107
             |       2.08       2.17       2.09
      HHSIZE |      0.016      0.011      0.016
             |       1.20       0.76       1.17
        FARM |      0.094      0.069      0.093
             |       1.72       1.14       1.69
        EDUC |     -0.065     -0.051     -0.064
             |      -4.99      -3.76      -4.92
       _cons |     -0.176      0.036     -0.166
             |      -0.42       0.08      -0.40
-------------+--------------------------------------------
LNEXP12M     |
     LNHHEXP |                                       0.628
             |                                       13.44
         AGE |                                       0.011
             |                                        6.84
         SEX |                                       0.107
             |                                        2.09
      HHSIZE |                                       0.016
             |                                        1.19
        FARM |                                       0.094
             |                                        1.71
        EDUC |                                      -0.064
             |                                       -4.94
       _cons |                                      -0.172
             |                                       -0.41
-------------+--------------------------------------------
sigma_u      |
       _cons |                                       0.455
             |                                       13.81
-------------+--------------------------------------------
sigma_e      |
       _cons |                                       1.452
             |                                       98.07
-------------+--------------------------------------------
Statistics   |
          r2 |                 0.052
           N |   5006.000   5006.000   5006.000   5006.000
----------------------------------------------------------
                                               legend: b/t

.
. ********** ADDITIONALLY DO CLUSTER BOOTSTRAPS **********
.
. * These results not given in the text
.
. global breps = 500
.
. * Note that can bootstrap if desired to get more robust standard errors
. * The first reproduces reg , cluster(COMMUNE)
. bootstrap "reg LNEXP12M $XLISTLINEAR" _b, cluster(COMMUNE) reps($breps) level(95)
command:      reg LNEXP12M LNHHEXP AGE SEX HHSIZE FARM EDUC
statistics:   b_LNHHEXP  = _b[LNHHEXP]
              b_AGE      = _b[AGE]
              b_SEX      = _b[SEX]
              b_HHSIZE   = _b[HHSIZE]
              b_FARM     = _b[FARM]
              b_EDUC     = _b[EDUC]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =      5006
                                                  N of clusters    =       194
                                                  Replications     =       500

------------------------------------------------------------------------------
    Variable |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
   b_LNHHEXP |   500   .6702328   .0000939   .0546562    .5628482   .7776175   (N)
             |                                           .5575338   .7715588   (P)
             |                                           .5502583   .7638555  (BC)
       b_AGE |   500   .0105766   .0000108   .0019538    .0067379   .0144154   (N)
             |                                           .0067395   .0143774   (P)
             |                                            .006576   .0141968  (BC)
       b_SEX |   500    .097444  -.0023301   .0602315   -.0208945   .2157825   (N)
             |                                          -.0210348   .2196117   (P)
             |                                          -.0261246   .2083439  (BC)
    b_HHSIZE |   500   .0289812  -.0008009   .0160043   -.0024629   .0604252   (N)
             |                                          -.0004838   .0628019   (P)
             |                                           .0028144   .0662394  (BC)
      b_FARM |   500   .1346891   .0026611   .0560327    .0245999   .2447782   (N)
             |                                           .0293473   .2510255   (P)
             |                                           .0202142   .2483591  (BC)
      b_EDUC |   500  -.0903599    -.00006    .014992    -.119815  -.0609047   (N)
             |                                          -.1205786  -.0618314   (P)
             |                                          -.1204532  -.0615499  (BC)
      b_cons |   500  -.5107135   .0044955   .4893788    -1.47221   .4507834   (N)
             |                                          -1.435498   .4444398   (P)
             |                                          -1.388972   .4859312  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. * The t-statistic vector is e(b)./e(se) where ./ is elt. by elt. division
. * But Stata Version 8 does not do ./ so instead need the following
. matrix tols = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tols, format(%10.2f)
tols[7,1]
r1
b_LNHHEXP 12.26
b_AGE 5.41
b_SEX 1.62
b_HHSIZE 1.81
b_FARM 2.40
b_EDUC -6.03
b_cons -1.04
.
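The matrix expression above works around the lack of element-by-element division in Stata version 8. In releases that include Mata, the colon operator :/ does this directly. The sketch below is an assumption-labelled illustration rather than part of the original program: it applies the operator to the first two bootstrap coefficients and standard errors copied from the output above, and the names b, se and t are illustrative.

```stata
* Sketch only: element-by-element division with Mata (assumes a release that includes Mata)
mata:
// Observed coefficients for LNHHEXP and AGE, copied from the bootstrap output
b  = (0.6702328, 0.0105766)
// Matching bootstrap standard errors
se = (0.0546562, 0.0019538)
// The colon operator divides element by element, giving the t-statistics
t  = b :/ se
t
end
```

The two ratios should agree with the first two entries of tols listed above (12.26 and 5.41).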
. * The next two reproduce xtreg , cluster(COMMUNE)
. * but the cluster option for xtreg is not available for Stata version 8
.
. * For this example the cluster bootstrap se's are within 10 percent
. * of the usual xtreg se's, so usual se's may be okay here
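For readers on later Stata releases, the cluster bootstrap can be requested without the quoted-command syntax used in this log. The following is a minimal sketch under stated assumptions: a release that supports the bootstrap prefix and the vce(bootstrap) option for xtreg, vietnam_ex1.dta in the working directory with its original lowercase variable names, and an arbitrary seed value chosen here only for reproducibility.

```stata
* Sketch only: cluster bootstrap in later Stata releases; the seed 10101 is arbitrary
use vietnam_ex1.dta, clear
xtset commune
* Prefix form: resample whole communes and re-estimate OLS each time
bootstrap _b, cluster(commune) reps(500) seed(10101): regress lhhex12m lhhexp1 age sex hhsize farm comped98
* For xtreg, vce(bootstrap) resamples whole panels (here communes)
xtreg lhhex12m lhhexp1 age sex hhsize farm comped98, fe vce(bootstrap, reps(500) seed(10101))
```

Because the seed and the random-number generator differ from the run logged here, the bootstrap standard errors will not match the log digit for digit.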
.
. * Fixed effects estimator
. bootstrap "xtreg LNEXP12M $XLISTLINEAR, fe" _b, cluster(COMMUNE) reps($breps) level(95)
command:      xtreg LNEXP12M LNHHEXP AGE SEX HHSIZE FARM EDUC , fe
statistics:   b_LNHHEXP  = _b[LNHHEXP]
              b_AGE      = _b[AGE]
              b_SEX      = _b[SEX]
              b_HHSIZE   = _b[HHSIZE]
              b_FARM     = _b[FARM]
              b_EDUC     = _b[EDUC]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =      5006
                                                  N of clusters    =       194
                                                  Replications     =       500

------------------------------------------------------------------------------
    Variable |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
   b_LNHHEXP |   500   .6037139  -.0006143   .0583525    .4890671   .7183608   (N)
             |                                           .4852716   .7172067   (P)
             |                                           .4841806   .7148217  (BC)
       b_AGE |   500   .0115845   5.02e-06   .0017464    .0081532   .0150157   (N)
             |                                           .0082637   .0151613   (P)
             |                                           .0084701   .0152766  (BC)
       b_SEX |   500    .112821  -.0017372   .0546362    .0054756   .2201664   (N)
             |                                           .0129603   .2214846   (P)
             |                                            .017047    .235448  (BC)
    b_HHSIZE |   500   .0107124  -.0004379   .0150286   -.0188148   .0402395   (N)
             |                                          -.0195233   .0415316   (P)
             |                                          -.0184428    .044119  (BC)
      b_FARM |   500   .0693037  -.0010067   .0497627   -.0284666    .167074   (N)
             |                                          -.0291446   .1679352   (P)
             |                                          -.0259051   .1705921  (BC)
      b_EDUC |   500  -.0510325   .0003307   .0153224    -.081137   -.020928   (N)
             |                                          -.0818133  -.0219096   (P)
             |                                          -.0844261  -.0230367  (BC)
      b_cons |   500   .0361552   .0087515   .5186644   -.9828799    1.05519   (N)
             |                                           -.934128   1.087458   (P)
             |                                           -.934128   1.087458  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix tfe = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tfe, format(%10.2f)
tfe[7,1]
r1
b_LNHHEXP 10.35
b_AGE 6.63
b_SEX 2.06
b_HHSIZE 0.71
b_FARM 1.39
b_EDUC -3.33
b_cons 0.07
.
. * Random effects estimator
. bootstrap "xtreg LNEXP12M $XLISTLINEAR, re" _b, cluster(COMMUNE) reps($breps) level(95)
command:      xtreg LNEXP12M LNHHEXP AGE SEX HHSIZE FARM EDUC , re
statistics:   b_LNHHEXP  = _b[LNHHEXP]
              b_AGE      = _b[AGE]
              b_SEX      = _b[SEX]
              b_HHSIZE   = _b[HHSIZE]
              b_FARM     = _b[FARM]
              b_EDUC     = _b[EDUC]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =      5006
                                                  N of clusters    =       194
                                                  Replications     =       500

------------------------------------------------------------------------------
    Variable |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
   b_LNHHEXP |   500   .6268899  -.0079169   .0486878    .5312314   .7225483   (N)
             |                                           .5261016   .7155449   (P)
             |                                            .540477   .7254891  (BC)
       b_AGE |   500   .0112334   .0001211   .0017668    .0077622   .0147047   (N)
             |                                           .0080698   .0152565   (P)
             |                                           .0077655   .0147142  (BC)
       b_SEX |   500   .1069915   .0058127   .0561182   -.0032656   .2172486   (N)
             |                                           .0046711   .2187323   (P)
             |                                          -.0109273   .2045939  (BC)
    b_HHSIZE |   500   .0158302  -.0014562   .0146506   -.0129543   .0446147   (N)
             |                                           -.017179   .0459636   (P)
             |                                          -.0108163   .0482198  (BC)
      b_FARM |   500   .0928509  -.0071707   .0442312    .0059485   .1797532   (N)
             |                                          -.0014455   .1728321   (P)
             |                                           .0053411   .1906732  (BC)
      b_EDUC |   500  -.0638447   .0049481    .014058   -.0914648  -.0362246   (N)
             |                                          -.0871102   -.029496   (P)
             |                                           -.094956  -.0407984  (BC)
      b_cons |   500  -.1660698   .0535286   .4305953   -1.012073   .6799335   (N)
             |                                          -.8970464   .6892154   (P)
             |                                          -.9512222   .6032417  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. matrix tre = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tre, format(%10.2f)
tre[7,1]
r1
b_LNHHEXP 12.88
b_AGE 6.36
b_SEX 1.91
b_HHSIZE 1.08
b_FARM 2.10
b_EDUC -4.54
b_cons -0.39
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section6\mma24p1olscluster.txt
log type: text
closed on: 24 May 2005, 14:44:12
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section6\mma24p2poiscluster.txt
log type: text
opened on: 24 May 2005, 16:35:22
.
. ********** OVERVIEW OF MMA24P2POISCLUSTER.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 24.7 pages 848-53 Table 24.6
. * Cluster robust inference for Poisson cross-section application using
. * Vietnam Living Standard Survey data
.
. * (0) Descriptive Statistics (Table 24.3 second half)
. * (1) Frequencies of data (Table 24.5)
. * (2) Poisson regression with individual-level data (Table 24.6)
.
. * The results differ in second significant digit from those in text
. * despite same sample size. Not sure why.
.
. * For Table 24.4 for clustered household data see MMA24P1OLSCLUSTER.DO
.
. * The Poisson cluster effects model is
. * y_it ~ Poisson with mean exp(x_it'b + a_i)
. * Default xtpois output assumes Poisson distribution - var = mean.
. * This is usually too strong an assumption.
. * Instead should get cluster-robust errors after xtpois
. * See Section 21.2.3 pages 709-12 and section 23.26 pages 788-9
. * Stata Version 8 does not do this.
. * Here we do a panel bootstrap - results not reported in the text
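The remark that the Poisson restriction var = mean is usually too strong can be checked informally against the summary statistics for PHARVIS reported further down in this log (mean .5117594, standard deviation 1.313427). The sketch below only redoes that arithmetic; the scalar names are illustrative and not part of the original program.

```stata
* Sketch only: raw variance-to-mean ratio of PHARVIS from the logged summary statistics
scalar mean_pharvis = .5117594
scalar sd_pharvis   = 1.313427
* Under an unconditional Poisson distribution this ratio would be close to 1
scalar var_to_mean  = sd_pharvis^2/mean_pharvis
display "variance/mean ratio of PHARVIS = " %6.3f var_to_mean
```

A ratio well above 1 (here a little over 3) indicates overdispersion in the raw counts, which is consistent with the robust standard errors being considerably larger than the default ones later in this log. This is only an unconditional check; the relevant restriction for the regression is on the conditional variance.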
.
. * To speed up programs reduce breps - the number of bootstrap reps
. * This program takes a long time if bootstrap
.
. * To run this program you need data set
. * vietnam_ex2.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */

.
. ********** DATA DESCRIPTION **********
.
. * The data comes from World Bank 1997 Vietnam Living Standards Survey
. * A subset was used in chapter 4.6.4.
. * The larger sample here is described on pages 848-9
.
. * The data are INDIVIDUAL-level data
. * There are N=27765 individuals in 194 clusters (communes)
.
. * The separate data set vietnam_ex1.dta has household-level data
.
. ********** READ IN INDIVIDUAL-LEVEL DATA and SUMMARIZE (Table 24.3) **********
.
. use vietnam_ex2.dta, clear
. desc
Contains data from vietnam_ex2.dta
  obs:        27,766
 vars:            12                          11 Apr 2005 12:33
 size:     1,443,832 (85.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
COMPED98        float  %9.0g
SEX             float  %9.0g
AGE             float  %9.0g
MARRIED         float  %9.0g
ILLDUM          float  %9.0g
INJDUM          float  %9.0g
ILLDAYS         float  %9.0g
ACTDAYS         float  %9.0g
PHARVIS         float  %9.0g
HLTHINS         float  %9.0g
lnhhinc         float  %9.0g
commune         float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    COMPED98 |     27765    3.390672     1.93115          0         11
         SEX |     27765    .5111471    .4998847          0          1
         AGE |     27765    2.977504    .9671446          0    4.59512
     MARRIED |     27765    .3988835    .4896775          0          1
      ILLDUM |     27765    .6219701    .8995068          0          9
-------------+--------------------------------------------------------
      INJDUM |     27765    .0096885    .0979537          0          1
     ILLDAYS |     27765    2.804034     5.45823          0         60
     ACTDAYS |     27765    .0657302    1.115939          0         30
     PHARVIS |     27765    .5117594    1.313427          0         30
     HLTHINS |     27765    .1625788    .3689876          0          1
-------------+--------------------------------------------------------
     lnhhinc |     27765     2.60261    .6244145   .0467014   5.405502
     commune |     27765    101.5266    56.28334          1        194
.
. rename COMPED98 EDUC
. rename ILLDUM ILLNESS
. rename INJDUM INJURY
. rename HLTHINS INSURANCE
. rename lnhhinc LNHHEXP
. rename commune COMMUNE
.
. * Following should give same descriptive statistics
. * as in bottom half (Individual) in Table 24.3 p.850
. * But there is a difference for LNHHEXP plus here no data on MEDEXP
. sum PHARVIS LNHHEXP AGE SEX MARRIED EDUC ILLNESS INJURY ILLDAYS ACTDAYS INSURANCE COMMUNE
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     PHARVIS |     27765    .5117594    1.313427          0         30
     LNHHEXP |     27765     2.60261    .6244145   .0467014   5.405502
         AGE |     27765    2.977504    .9671446          0    4.59512
         SEX |     27765    .5111471    .4998847          0          1
     MARRIED |     27765    .3988835    .4896775          0          1
-------------+--------------------------------------------------------
        EDUC |     27765    3.390672     1.93115          0         11
     ILLNESS |     27765    .6219701    .8995068          0          9
      INJURY |     27765    .0096885    .0979537          0          1
     ILLDAYS |     27765    2.804034     5.45823          0         60
     ACTDAYS |     27765    .0657302    1.115939          0         30
-------------+--------------------------------------------------------
   INSURANCE |     27765    .1625788    .3689876          0          1
     COMMUNE |     27765    101.5266    56.28334          1        194
. sum LNHHEXP, detail
                           LNHHEXP
-------------------------------------------------------------
      Percentiles      Smallest
 1%     1.302267       .0467014
 5%     1.658267       .1111674
10%     1.875315       .3755146       Obs               27765
25%     2.188848       .4177101       Sum of Wgt.       27765

50%     2.534935                      Mean            2.60261
                        Largest       Std. Dev.      .6244145
75%     2.962732       5.405502
90%     3.458658       5.405502       Variance       .3898934
95%     3.737957       5.405502       Skewness       .4925002
99%     4.295394       5.405502       Kurtosis       3.583693
.
. * Following gives Table 24.5 (page 852) frequencies
. * These differ in some places from Table 24.5 - especially for number = 0
. tabulate PHARVIS
    PHARVIS |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     20,668       74.44       74.44
          1 |      3,829       13.79       88.23
          2 |      1,716        6.18       94.41
          3 |        777        2.80       97.21
          4 |        359        1.29       98.50
          5 |        174        0.63       99.13
          6 |         64        0.23       99.36
          7 |         43        0.15       99.51
          8 |         16        0.06       99.57
          9 |          4        0.01       99.59
         10 |         78        0.28       99.87
         11 |          1        0.00       99.87
         12 |          5        0.02       99.89
         13 |          1        0.00       99.89
         14 |          3        0.01       99.90
         15 |          9        0.03       99.94
         16 |          1        0.00       99.94
         20 |          8        0.03       99.97
         22 |          2        0.01       99.97
         27 |          1        0.00       99.98
         28 |          3        0.01       99.99
         30 |          3        0.01      100.00
------------+-----------------------------------
      Total |     27,765      100.00
.
. * Histogram with kernel density estimate
. hist PHARVIS, discrete kdensity
(start=0, width=1)
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile PHARVIS LNHHEXP AGE SEX MARRIED EDUC ILLNESS INJURY ILLDAYS /*
> */ ACTDAYS INSURANCE COMMUNE using vietnam_ex2.asc, replace
.
. ********** ANALYSIS: CLUSTER ANALYSIS FOR POISSON MODEL [Table 24.6 p.851] **********
.
. * Regressor list for the Poisson regressions
. global XLISTPOISSON LNHHEXP INSURANCE SEX AGE MARRIED ILLDAYS ACTDAYS INJURY ILLNESS EDUC
.
. * Poisson with usual standard errors (Table 24.6 columns 1-2)
. poisson PHARVIS $XLISTPOISSON
Iteration 0:   log likelihood = -26309.924
Iteration 1:   log likelihood = -25300.337
Iteration 2:   log likelihood = -25281.839
Iteration 3:   log likelihood = -25281.786
Iteration 4:   log likelihood = -25281.786

Poisson regression                                Number of obs   =      27765
                                                  LR chi2(10)     =   13226.50
                                                  Prob > chi2     =     0.0000
Log likelihood = -25281.786                       Pseudo R2       =     0.2073

------------------------------------------------------------------------------
     PHARVIS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     LNHHEXP |    .078686   .0138419     5.68   0.000     .0515564    .1058156
   INSURANCE |  -.2485716   .0259704    -9.57   0.000    -.2994727   -.1976706
         SEX |   .0851733   .0171697     4.96   0.000     .0515213    .1188253
         AGE |   .0252426   .0106126     2.38   0.017     .0044423    .0460429
     MARRIED |   .1239639   .0209267     5.92   0.000     .0829483    .1649795
     ILLDAYS |   .0429083   .0010728    40.00   0.000     .0408057    .0450109
     ACTDAYS |   .0089793   .0052409     1.71   0.087    -.0012927    .0192514
      INJURY |   .1717029   .0747292     2.30   0.022     .0252364    .3181694
     ILLNESS |   .5623976   .0064536    87.15   0.000     .5497488    .5750464
        EDUC |  -.0524459   .0048173   -10.89   0.000    -.0618878   -.0430041
       _cons |  -1.640821   .0458542   -35.78   0.000    -1.730694   -1.550949
------------------------------------------------------------------------------
. estimates store poisiid
.
. * Poisson with heteroskedastic-robust standard errors (Table 24.6 column 3)
. poisson PHARVIS $XLISTPOISSON, robust
Iteration 0: log pseudo-likelihood = -26309.924
Iteration 1: log pseudo-likelihood = -25300.337
590

Iteration 2: log pseudo-likelihood = -25281.839


Iteration 3: log pseudo-likelihood = -25281.786
Iteration 4: log pseudo-likelihood = -25281.786
Poisson regression

Number of obs =
27765
Wald chi2(10) = 2423.07
Prob > chi2 = 0.0000
Log pseudo-likelihood = -25281.786
Pseudo R2
= 0.2073
-----------------------------------------------------------------------------|
Robust
PHARVIS |
Coef. Std. Err.
z P>|z| [95% Conf. Interval]
-------------+---------------------------------------------------------------LNHHEXP | .078686 .0255091 3.08 0.002 .0286891 .1286829
INSURANCE | -.2485716 .0437892 -5.68 0.000 -.3343969 -.1627464
SEX | .0851733 .030907 2.76 0.006 .0245967 .1457499
AGE | .0252426 .0198448 1.27 0.203 -.0136526 .0641377
MARRIED | .1239639 .0419107 2.96 0.003 .0418205 .2061073
ILLDAYS | .0429083 .0028779 14.91 0.000 .0372678 .0485488
ACTDAYS | .0089793 .0207444 0.43 0.665 -.031679 .0496377
INJURY | .1717029 .2043534 0.84 0.401 -.2288224 .5722282
ILLNESS | .5623976 .0228635 24.60 0.000
.517586 .6072092
EDUC | -.0524459 .0081043 -6.47 0.000 -.0683301 -.0365618
_cons | -1.640821 .0872497 -18.81 0.000 -1.811828 -1.469815
-----------------------------------------------------------------------------
. estimates store poishet
.
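The z statistic and 95% confidence interval in each row of the table above follow from the coefficient and its standard error alone. The sketch below is in Python rather than Stata and recomputes the INSURANCE row of the robust output. The function name `wald_summary` is illustrative and not part of the original program, and the sketch assumes the interval is the usual normal-based Wald interval, which is consistent with the reported figures.

```python
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """Return (z, lower, upper) for a coefficient and its standard error."""
    # Two-sided normal critical value, about 1.96 for a 95% interval
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = coef / se
    return z, coef - z_crit * se, coef + z_crit * se


# INSURANCE row of the heteroskedastic-robust Poisson output
z, lower, upper = wald_summary(-0.2485716, 0.0437892)
print(f"z = {z:.2f}, 95% CI = [{lower:.7f}, {upper:.7f}]")
```

Only the standard error differs between the default, robust and cluster-robust runs, so the same function applied to each reported standard error gives the corresponding z statistics and intervals.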
. * Poisson with cluster-robust standard errors (Table 24.6 column 4)
. poisson PHARVIS $XLISTPOISSON, cluster(COMMUNE)
Iteration 0: log pseudo-likelihood = -26309.924
Iteration 1: log pseudo-likelihood = -25300.337
Iteration 2: log pseudo-likelihood = -25281.839
Iteration 3: log pseudo-likelihood = -25281.786
Iteration 4: log pseudo-likelihood = -25281.786

Poisson regression
Number of obs = 27765
Wald chi2(10) = 1295.38
Prob > chi2 = 0.0000
Log pseudo-likelihood = -25281.786
(standard errors adjusted for clustering on COMMUNE)
-----------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+---------------------------------------------------------------
LNHHEXP | .078686 .0472052 1.67 0.096 -.0138344 .1712065
INSURANCE | -.2485716 .0617873 -4.02 0.000 -.3696725 -.1274708
SEX | .0851733 .0327427 2.60 0.009 .0209988 .1493478
AGE | .0252426 .0262626 0.96 0.336 -.0262311 .0767163
MARRIED | .1239639 .048607 2.55 0.011 .028696 .2192318
ILLDAYS | .0429083 .0037384 11.48 0.000 .0355811 .0502355
ACTDAYS | .0089793 .0190493 0.47 0.637 -.0283567 .0463154
INJURY | .1717029 .2214258 0.78 0.438 -.2622836 .6056894
ILLNESS | .5623976 .028512 19.72 0.000 .506515 .6182802
EDUC | -.0524459 .0153841 -3.41 0.001 -.0825982 -.0222937
_cons | -1.640821 .1541108 -10.65 0.000 -1.942873 -1.33877
-----------------------------------------------------------------------------
. estimates store poisclust
.
. * Random effects estimation (Table 24.6 columns 5-6)
. * This uses the xtpois command which first requires identifying the cluster
. iis COMMUNE
. xtpois PHARVIS $XLISTPOISSON, re
Fitting Poisson model:
Iteration 0: log likelihood = -26309.924
Iteration 1: log likelihood = -25300.337
Iteration 2: log likelihood = -25281.839
Iteration 3: log likelihood = -25281.786
Iteration 4: log likelihood = -25281.786

Fitting full model:
Iteration 0: log likelihood = -23538.342
Iteration 1: log likelihood = -23430.615
Iteration 2: log likelihood = -23419.142
Iteration 3: log likelihood = -23419.132
Iteration 4: log likelihood = -23419.132

Random-effects Poisson regression
Group variable (i): COMMUNE
Random effects u_i ~ Gamma
Number of obs = 27765
Number of groups = 194
Obs per group: min = 51
               avg = 143.1
               max = 206
Wald chi2(10) = 13723.01
Prob > chi2 = 0.0000
Log likelihood = -23419.132
-----------------------------------------------------------------------------
PHARVIS | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+---------------------------------------------------------------
LNHHEXP | -.1013746 .0187549 -5.41 0.000 -.1381336 -.0646157
INSURANCE | -.1675953 .0273642 -6.12 0.000 -.2212283 -.1139624
SEX | .099303 .0172541 5.76 0.000 .0654855 .1331206
AGE | .0047406 .0107899 0.44 0.660 -.0164073 .0258884
MARRIED | .1579958 .0212825 7.42 0.000 .1162828 .1997088
ILLDAYS | .046055 .0011422 40.32 0.000 .0438164 .0482937
ACTDAYS | .0186084 .0054546 3.41 0.001 .0079176 .0292991
INJURY | .1479464 .0780863 1.89 0.058 -.0051 .3009928
ILLNESS | .5801872 .0076855 75.49 0.000 .565124 .5952505
EDUC | -.0284493 .0055827 -5.10 0.000 -.0393911 -.0175075
_cons | -1.276974 .0723199 -17.66 0.000 -1.418718 -1.135229
-------------+---------------------------------------------------------------
/lnalpha | -1.039839 .1035295 -1.242753 -.836925
-------------+---------------------------------------------------------------
alpha | .3535115 .0365989 .2885885 .4330401
-----------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 3725.31 Prob>=chibar2 = 0.000
. estimates store poisre
.
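The random-effects output reports both /lnalpha and alpha. The reported figures are consistent with alpha = exp(lnalpha), a delta-method standard error, and an interval obtained by exponentiating the lnalpha endpoints. The Python sketch below checks this arithmetic against the numbers in the table. It illustrates the relationship only and does not describe how xtpois computes these quantities internally.

```python
import math

# Values copied from the /lnalpha row of the random-effects output
lnalpha = -1.039839
se_lnalpha = 0.1035295
ci_lnalpha = (-1.242753, -0.836925)

# Point estimate: alpha = exp(lnalpha)
alpha = math.exp(lnalpha)

# Delta method: d exp(x)/dx = exp(x), so se(alpha) is about alpha * se(lnalpha)
se_alpha = alpha * se_lnalpha

# Interval for alpha: exponentiate the endpoints of the lnalpha interval
ci_alpha = tuple(math.exp(v) for v in ci_lnalpha)

print(f"alpha = {alpha:.7f}, se = {se_alpha:.7f}")
print(f"95% CI = [{ci_alpha[0]:.7f}, {ci_alpha[1]:.7f}]")
```

Because the interval is transformed from the log scale, it is not symmetric around alpha and is guaranteed to stay positive.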
. * The following shows that the cluster option for xtpois does nothing in this Stata version
. xtpois PHARVIS $XLISTPOISSON, i(COMMUNE) re
Fitting Poisson model:
Iteration 0: log likelihood = -26309.924
Iteration 1: log likelihood = -25300.337
Iteration 2: log likelihood = -25281.839
Iteration 3: log likelihood = -25281.786
Iteration 4: log likelihood = -25281.786

Fitting full model:
Iteration 0: log likelihood = -23538.342
Iteration 1: log likelihood = -23430.615
Iteration 2: log likelihood = -23419.142
Iteration 3: log likelihood = -23419.132
Iteration 4: log likelihood = -23419.132

Random-effects Poisson regression
Group variable (i): COMMUNE
Random effects u_i ~ Gamma
Number of obs = 27765
Number of groups = 194
Obs per group: min = 51
               avg = 143.1
               max = 206
Wald chi2(10) = 13723.01
Prob > chi2 = 0.0000
Log likelihood = -23419.132
-----------------------------------------------------------------------------
PHARVIS | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+---------------------------------------------------------------
LNHHEXP | -.1013746 .0187549 -5.41 0.000 -.1381336 -.0646157
INSURANCE | -.1675953 .0273642 -6.12 0.000 -.2212283 -.1139624
SEX | .099303 .0172541 5.76 0.000 .0654855 .1331206
AGE | .0047406 .0107899 0.44 0.660 -.0164073 .0258884
MARRIED | .1579958 .0212825 7.42 0.000 .1162828 .1997088
ILLDAYS | .046055 .0011422 40.32 0.000 .0438164 .0482937
ACTDAYS | .0186084 .0054546 3.41 0.001 .0079176 .0292991
INJURY | .1479464 .0780863 1.89 0.058 -.0051 .3009928
ILLNESS | .5801872 .0076855 75.49 0.000 .565124 .5952505
EDUC | -.0284493 .0055827 -5.10 0.000 -.0393911 -.0175075
_cons | -1.276974 .0723199 -17.66 0.000 -1.418718 -1.135229
-------------+---------------------------------------------------------------
/lnalpha | -1.039839 .1035295 -1.242753 -.836925
-------------+---------------------------------------------------------------
alpha | .3535115 .0365989 .2885885 .4330401
-----------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 3725.31 Prob>=chibar2 = 0.000
.
. * Note that one can cluster bootstrap if desired to get more robust standard errors
. * This is done at end of program
.
. * Fixed effects estimation (FGLS) (Table 24.6 columns 7-8)
. xtpois PHARVIS $XLISTPOISSON, fe
note: 1 group (94 obs) dropped due to all zero outcomes
Iteration 0: log likelihood = -28435.61
Iteration 1: log likelihood = -24231.502
Iteration 2: log likelihood = -22468.078
Iteration 3: log likelihood = -22446.225
Iteration 4: log likelihood = -22446.002
Iteration 5: log likelihood = -22446.002

Conditional fixed-effects Poisson regression
Group variable (i): COMMUNE
Number of obs = 27671
Number of groups = 193
Obs per group: min = 51
               avg = 143.4
               max = 206
Wald chi2(10) = 13621.76
Prob > chi2 = 0.0000
Log likelihood = -22446.002
-----------------------------------------------------------------------------
PHARVIS | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+---------------------------------------------------------------
LNHHEXP | -.1146402 .019025 -6.03 0.000 -.1519285 -.0773519
INSURANCE | -.163603 .0274193 -5.97 0.000 -.2173438 -.1098622
SEX | .0997415 .0172564 5.78 0.000 .0659195 .1335635
AGE | .0033591 .0107945 0.31 0.756 -.0177977 .024516
MARRIED | .1606792 .0212958 7.55 0.000 .1189403 .2024182
ILLDAYS | .046148 .0011453 40.29 0.000 .0439032 .0483929
ACTDAYS | .0189184 .0054666 3.46 0.001 .008204 .0296328
INJURY | .1479319 .078183 1.89 0.058 -.0053039 .3011677
ILLNESS | .5803719 .0077289 75.09 0.000 .5652235 .5955203
EDUC | -.0272099 .0056191 -4.84 0.000 -.0382232 -.0161966
-----------------------------------------------------------------------------
. estimates store poisfe
.
. * Note that one can cluster bootstrap if desired to get more robust standard errors
. * This is done at end of program
.
. ********** DISPLAY TABLE 24.6 RESULTS page 852 **********
.
. * The results here differ in the second significant digit from those in text
. * despite same sample size. Not sure why.
.
. estimates table poisiid poishet poisclust, /*
> */ b(%10.3f) t(%10.2f) stats(r2 N)
----------------------------------------------------
    Variable |   poisiid     poishet   poisclust
-------------+--------------------------------------
     LNHHEXP |     0.079       0.079       0.079
             |      5.68        3.08        1.67
   INSURANCE |    -0.249      -0.249      -0.249
             |     -9.57       -5.68       -4.02
         SEX |     0.085       0.085       0.085
             |      4.96        2.76        2.60
         AGE |     0.025       0.025       0.025
             |      2.38        1.27        0.96
     MARRIED |     0.124       0.124       0.124
             |      5.92        2.96        2.55
     ILLDAYS |     0.043       0.043       0.043
             |     40.00       14.91       11.48
     ACTDAYS |     0.009       0.009       0.009
             |      1.71        0.43        0.47
      INJURY |     0.172       0.172       0.172
             |      2.30        0.84        0.78
     ILLNESS |     0.562       0.562       0.562
             |     87.15       24.60       19.72
        EDUC |    -0.052      -0.052      -0.052
             |    -10.89       -6.47       -3.41
       _cons |    -1.641      -1.641      -1.641
             |    -35.78      -18.81      -10.65
-------------+--------------------------------------
          r2 |
           N | 27765.000   27765.000   27765.000
----------------------------------------------------
legend: b/t
. estimates table poisre poisfe, /*
> */ b(%10.3f) t(%10.2f) stats(r2 N)
---------------------------------------
    Variable |    poisre      poisfe
-------------+-------------------------
PHARVIS      |
     LNHHEXP |    -0.101      -0.115
             |     -5.41       -6.03
   INSURANCE |    -0.168      -0.164
             |     -6.12       -5.97
         SEX |     0.099       0.100
             |      5.76        5.78
         AGE |     0.005       0.003
             |      0.44        0.31
     MARRIED |     0.158       0.161
             |      7.42        7.55
     ILLDAYS |     0.046       0.046
             |     40.32       40.29
     ACTDAYS |     0.019       0.019
             |      3.41        3.46
      INJURY |     0.148       0.148
             |      1.89        1.89
     ILLNESS |     0.580       0.580
             |     75.49       75.09
        EDUC |    -0.028      -0.027
             |     -5.10       -4.84
       _cons |    -1.277
             |    -17.66
-------------+-------------------------
lnalpha      |
       _cons |    -1.040
             |    -10.04
-------------+-------------------------
Statistics   |
          r2 |
           N | 27765.000   27671.000
---------------------------------------
legend: b/t
.
. ********** ADDITIONALLY DO CLUSTER BOOTSTRAPS **********
.
. * These results not given in the text
.
. * Output at website uses breps 500
. global breps 50
.
. * Note that one can bootstrap if desired to get more robust standard errors
. * The first reproduces pois , cluster(COMMUNE)
. bootstrap "poisson PHARVIS $XLISTPOISSON" _b, cluster(COMMUNE) reps($breps) level(95)
command: poisson PHARVIS LNHHEXP INSURANCE SEX AGE MARRIED ILLDAYS ACTDAYS INJURY ILLNESS EDUC
statistics: b_LNHHEXP = [PHARVIS]_b[LNHHEXP]
            b_INSURA~E = [PHARVIS]_b[INSURANCE]
            b_SEX = [PHARVIS]_b[SEX]
            b_AGE = [PHARVIS]_b[AGE]
            b_MARRIED = [PHARVIS]_b[MARRIED]
            b_ILLDAYS = [PHARVIS]_b[ILLDAYS]
            b_ACTDAYS = [PHARVIS]_b[ACTDAYS]
            b_INJURY = [PHARVIS]_b[INJURY]
            b_ILLNESS = [PHARVIS]_b[ILLNESS]
            b_EDUC = [PHARVIS]_b[EDUC]
            b_cons = [PHARVIS]_b[_cons]

Bootstrap statistics
Number of obs = 27765
N of clusters = 194
Replications = 50
-----------------------------------------------------------------------------
Variable | Reps Observed Bias Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
b_LNHHEXP | 50 .078686 .0072233 .0475425 -.0168542 .1742262 (N)
| -.0050689 .1878158 (P)
| -.0204097 .1710779 (BC)
b_INSURANCE | 50 -.2485716 .0013929 .0770506 -.4034107 -.0937326 (N)
| -.3640907 -.1004183 (P)
| -.4677969 -.1004183 (BC)
b_SEX | 50 .0851733 -.0039062 .0345537 .0157351 .1546115 (N)
| .022333 .1494552 (P)
| .022333 .1494552 (BC)
b_AGE | 50 .0252426 .0012812 .0270715 -.0291596 .0796447 (N)
| -.025843 .0726057 (P)
| -.0479862 .0726057 (BC)
b_MARRIED | 50 .1239639 -.0017894 .0406114 .0423522 .2055756 (N)
| .0132484 .2024732 (P)
| .0132484 .2101617 (BC)
b_ILLDAYS | 50 .0429083 -.0005122 .0034 .0360757 .0497409 (N)
| .0358535 .0481521 (P)
| .0363203 .0500312 (BC)
b_ACTDAYS | 50 .0089793 -.0021093 .0249974 -.0412549 .0592135 (N)
| -.0343906 .0573651 (P)
| -.0352626 .0573651 (BC)
b_INJURY | 50 .1717029 -.0321969 .2090263 -.2483512 .591757 (N)
| -.3271621 .4807015 (P)
| -.1896703 .648314 (BC)
b_ILLNESS | 50 .5623976 .0061368 .0294736 .5031682 .621627 (N)
| .5206931 .6271017 (P)
| .5192547 .6206369 (BC)
b_EDUC | 50 -.0524459 .0027244 .01598 -.0845589 -.0203329 (N)
| -.0825952 -.017323 (P)
| -.0850821 -.0256777 (BC)
b_cons | 50 -1.640821 -.0414073 .1460702 -1.93436 -1.347282 (N)
| -1.984352 -1.399226 (P)
| -1.867373 -1.310915 (BC)
-----------------------------------------------------------------------------
Note: N = normal
      P = percentile
      BC = bias-corrected
. * The t-statistic vector is e(b)./e(se) where ./ is element-by-element division
. * But Stata Version 8 does not do ./ so instead the following is needed
. matrix tpois = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tpois, format(%10.2f)
tpois[11,1]
r1
b_LNHHEXP 1.66
b_INSURANCE -3.23
b_SEX 2.46
b_AGE 0.93
b_MARRIED 3.05
b_ILLDAYS 12.62
b_ACTDAYS 0.36
b_INJURY 0.82
b_ILLNESS 19.08
b_EDUC -3.28
b_cons -11.23
.
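The matrix expression used above obtains element-by-element division from ordinary matrix operations: multiplying diag(b) by the inverse of diag(se) gives a diagonal matrix whose entries are b[i]/se[i], and vecdiag extracts them. The Python sketch below mimics this with small helper functions (`diag`, `matmul`, `inv_diag` and `vecdiag` are illustrative names written for this example, not Stata functions) and checks the first three bootstrap t-statistics listed above.

```python
def diag(v):
    """Square matrix with the vector v on the diagonal and zeros elsewhere."""
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inv_diag(d):
    """Inverse of a diagonal matrix: reciprocal of each diagonal entry."""
    return diag([1.0 / d[i][i] for i in range(len(d))])


def vecdiag(m):
    """Diagonal of a square matrix as a vector."""
    return [m[i][i] for i in range(len(m))]


# Observed coefficients and bootstrap standard errors for
# b_LNHHEXP, b_INSURANCE and b_SEX from the table above
b = [0.078686, -0.2485716, 0.0851733]
se = [0.0475425, 0.0770506, 0.0345537]

t_matrix = vecdiag(matmul(diag(b), inv_diag(diag(se))))
t_direct = [bi / si for bi, si in zip(b, se)]

print([round(t, 2) for t in t_matrix])
```

Both routes give the same ratios. The matrix route is only needed where an element-wise division operator is unavailable.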
. * The next two reproduce xtpois , cluster(COMMUNE)
. * but xtpois has no cluster option so instead cluster bootstrap
.
. * Fixed effects estimator
. bootstrap "xtpois PHARVIS $XLISTPOISSON, fe" _b, cluster(COMMUNE) reps($breps) level(95)
command: xtpois PHARVIS LNHHEXP INSURANCE SEX AGE MARRIED ILLDAYS ACTDAYS INJURY ILLNESS EDUC , fe
statistics: b_LNHHEXP = [PHARVIS]_b[LNHHEXP]
            b_INSURA~E = [PHARVIS]_b[INSURANCE]
            b_SEX = [PHARVIS]_b[SEX]
            b_AGE = [PHARVIS]_b[AGE]
            b_MARRIED = [PHARVIS]_b[MARRIED]
            b_ILLDAYS = [PHARVIS]_b[ILLDAYS]
            b_ACTDAYS = [PHARVIS]_b[ACTDAYS]
            b_INJURY = [PHARVIS]_b[INJURY]
            b_ILLNESS = [PHARVIS]_b[ILLNESS]
            b_EDUC = [PHARVIS]_b[EDUC]

Bootstrap statistics
Number of obs = 27671
N of clusters = 193
Replications = 50
-----------------------------------------------------------------------------
Variable | Reps Observed Bias Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
b_LNHHEXP | 50 -.1146402 .0046925 .042981 -.2010138 -.0282666 (N)
| -.1801919 -.0258064 (P)
| -.1841975 -.043704 (BC)
b_INSURANCE | 50 -.163603 .0145077 .0513299 -.2667543 -.0604516 (N)
| -.2391983 -.0581847 (P)
| -.269962 -.0993868 (BC)
b_SEX | 50 .0997415 .0030381 .0298361 .0397836 .1596994 (N)
| .0581716 .1630876 (P)
| .055771 .1562326 (BC)
b_AGE | 50 .0033591 -.0017336 .0228288 -.042517 .0492353 (N)
| -.0508069 .040935 (P)
| -.0508069 .0541492 (BC)
b_MARRIED | 50 .1606793 .009603 .0435503 .0731616 .2481969 (N)
| .1091381 .260388 (P)
| .0877519 .2407327 (BC)
b_ILLDAYS | 50 .046148 -.0004107 .0027904 .0405406 .0517555 (N)
| .0397139 .0504146 (P)
| .0397139 .050898 (BC)
b_ACTDAYS | 50 .0189184 -.0049228 .0176306 -.0165115 .0543484 (N)
| -.0169987 .0490534 (P)
| -.0158923 .0497731 (BC)
b_INJURY | 50 .1479319 .0204617 .2194316 -.2930323 .5888962 (N)
| -.2735089 .5520838 (P)
| -.3044733 .5520838 (BC)
b_ILLNESS | 50 .5803719 .0003675 .0199171 .540347 .6203969 (N)
| .5370637 .6163648 (P)
| .5370637 .6163648 (BC)
b_EDUC | 50 -.0272099 -.0003993 .0112987 -.0499155 -.0045043 (N)
| -.0521668 -.0068456 (P)
| -.0531845 -.0068456 (BC)
-----------------------------------------------------------------------------
Note: N = normal
      P = percentile
      BC = bias-corrected
. matrix tpoisfe = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tpoisfe, format(%10.2f)
tpoisfe[10,1]
r1
b_LNHHEXP -2.67
b_INSURANCE -3.19
b_SEX 3.34
b_AGE 0.15
b_MARRIED 3.69
b_ILLDAYS 16.54
b_ACTDAYS 1.07
b_INJURY 0.67
b_ILLNESS 29.14
b_EDUC -2.41
.
. * Random effects estimator
. bootstrap "xtpois PHARVIS $XLISTPOISSON, re" _b, cluster(COMMUNE) reps($breps) level(95)
command: xtpois PHARVIS LNHHEXP INSURANCE SEX AGE MARRIED ILLDAYS ACTDAYS INJURY ILLNESS EDUC , re
statistics: b_LNHHEXP = [PHARVIS]_b[LNHHEXP]
            b_INSURA~E = [PHARVIS]_b[INSURANCE]
            b_SEX = [PHARVIS]_b[SEX]
            b_AGE = [PHARVIS]_b[AGE]
            b_MARRIED = [PHARVIS]_b[MARRIED]
            b_ILLDAYS = [PHARVIS]_b[ILLDAYS]
            b_ACTDAYS = [PHARVIS]_b[ACTDAYS]
            b_INJURY = [PHARVIS]_b[INJURY]
            b_ILLNESS = [PHARVIS]_b[ILLNESS]
            b_EDUC = [PHARVIS]_b[EDUC]
            b_cons = [PHARVIS]_b[_cons]
            b_1cons = [lnalpha]_b[_cons]

Bootstrap statistics
Number of obs = 27765
N of clusters = 194
Replications = 50
-----------------------------------------------------------------------------
Variable | Reps Observed Bias Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
b_LNHHEXP | 50 -.1013746 .0038095 .0406385 -.1830407 -.0197086 (N)
| -.1794194 -.0319058 (P)
| -.1977448 -.0319058 (BC)
b_INSURANCE | 50 -.1675954 -.0053195 .04945 -.2669688 -.0682219 (N)
| -.2912881 -.0900193 (P)
| -.2677689 -.088337 (BC)
b_SEX | 50 .099303 -.0008622 .032962 .0330634 .1655427 (N)
| .0463968 .1569125 (P)
| .0463968 .1569125 (BC)
b_AGE | 50 .0047406 -.002087 .0196285 -.0347045 .0441856 (N)
| -.0319554 .0398893 (P)
| -.0212454 .0454795 (BC)
b_MARRIED | 50 .1579958 .0045701 .0386327 .0803604 .2356311 (N)
| .1002202 .2446688 (P)
| .0595091 .2383231 (BC)
b_ILLDAYS | 50 .046055 -.0000891 .0033445 .039334 .0527761 (N)
| .0400018 .0525925 (P)
| .0400018 .0528012 (BC)
b_ACTDAYS | 50 .0186084 -.0013996 .0204209 -.022429 .0596457 (N)
| -.0251694 .0533912 (P)
| -.0251694 .0624974 (BC)
b_INJURY | 50 .1479464 -.0122248 .2130704 -.2802346 .5761274 (N)
| -.2971589 .4662884 (P)
| -.3564237 .4662884 (BC)
b_ILLNESS | 50 .5801873 .002013 .019375 .5412517 .6191228 (N)
| .5488635 .621733 (P)
| .5488635 .6328769 (BC)
b_EDUC | 50 -.0284493 -.0017922 .0117021 -.0519655 -.0049331 (N)
| -.050308 -.0116823 (P)
| -.050308 -.0065941 (BC)
b_cons | 50 -1.276974 -.0036143 .1309168 -1.540061 -1.013887 (N)
| -1.523902 -.9686469 (P)
| -1.523902 -.9686469 (BC)
b_1cons | 50 -1.039839 .0148765 .0966908 -1.234147 -.8455317 (N)
| -1.170977 -.8494586 (P)
| -1.183111 -.8494586 (BC)
-----------------------------------------------------------------------------
Note: N = normal
      P = percentile
      BC = bias-corrected
. matrix tpoisre = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
. matrix list tpoisre, format(%10.2f)
tpoisre[12,1]
r1
b_LNHHEXP -2.49
b_INSURANCE -3.39
b_SEX 3.01
b_AGE 0.24
b_MARRIED 4.09
b_ILLDAYS 13.77
b_ACTDAYS 0.91
b_INJURY 0.69
b_ILLNESS 29.95
b_EDUC -2.43
b_cons -9.75
b_1cons -10.75
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section6\mma24p2poiscluster.txt
log type: text
closed on: 24 May 2005, 16:50:38
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section6\mma25p1treatment.txt
log type: text
opened on: 26 May 2005, 10:26:17
.
. ********** OVERVIEW OF MMA25P1TREATMENT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 25.8.1-25.8.4 pages 889-893 Tables 25.3-25.4 and Fig. 25.3
. * Evaluating treatment effect of training on Earnings
. * using Dehejia-Wahba data (originally Lalonde data)
.
. * (0) Summarize data for treatments and controls (Table 25.3)
. * (1) Calculate the treatment effect by simple methods (Table 25.4)
. * To replicate some results in DW 1999
. * (1A) treatment-control
. * (1B) control function
. * (1C) before-after comparison
. * (1D) differences-in-differences
. * (2) Calculate treatment effect by propensity score (matching by strata)
. * Last entry in Table 25.4 and Figure 25.3.
.
. * The program MMA25P2MATCHING.DO uses propensity scores with matching
. * methods more sophisticated than those used in MMA25P1TREATMENT.DO
.
. * To run this program you need file
. * nswpsid.da1
.
. ********** STATA SETUP **********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
.
. ********** DATA DESCRIPTION **********
.
. * Data set nswpsid.da1 is data set nswpsid.da1 from Guido Imbens
. * http://emlab.berkeley.edu/users/imbens/index.shtml
.
. * Data originally from DW99
. * R.H. Dehejia and S. Wahba (1999)
. * "Causal Effects in Nonexperimental Studies: Reevaluating the
. * Evaluation of Training Programs", JASA, 1053-1062
. * or DW02
. * R.H. Dehejia and S. Wahba (2002)
. * "Propensity-score Matching Methods for Nonexperimental Causal
. * Studies", ReStat, 151-161
. * which in turn are from
. * Lalonde, R. (1986), "Evaluating the Econometric Evaluations of
. * Training Programs with Experimental Data," AER, 604-620.
.
. * Each observation is for an individual.
. * There are 2,675 observations: 185 in treated group and 2490 in control
.
. * Variables are
. * TREAT 1 if treated (NSW treated) and 0 if not (PSID-1 control)
. * AGE in years
. * EDUC in years
. * BLACK 1 if black
. * HISP 1 if hispanic
. * MARR 1 if married
. * RE74 Real annual earnings in 1974 (pre-treatment)
. * RE75 Real annual earnings in 1975 (pre-treatment)
. * RE78 Real annual earnings in 1978 (post-treatment)
. * U74 1 if unemployed in 1974
. * U75 1 if unemployed in 1975
.
. * NOTE: U74 and U75 are miscoded in these data and also in the
. *       summary statistics table of DW02
. *       See below for correction to data
.
. ********** READ DATA AND TRANSFORMATIONS **********
.
. infile TREAT AGE EDUC BLACK HISP MARR RE74 RE75 RE78 U74 U75 /*
> */ using nswpsid.da1
(2675 observations read)
.
. * The original data reversed U74 and U75
. * Should be U74=1 if RE74=0 and U74=0 if RE74>0 and similarly for U75
. * This affects results with propensity score though not earlier results
.
. * Wrong U74 and U75
. sum U74 U75
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
U74 | 2675 .1345794 .3413376 0 1
U75 | 2675 .1293458 .335645 0 1
.
. * Correct the original data
. drop U74 U75
. gen U74 = cond(RE74 == 0, 1, 0)
. gen U75 = cond(RE75 == 0, 1, 0)
.
. * Correct U74 and U75
. sum U74 U75
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
U74 | 2675 .1293458 .335645 0 1
U75 | 2675 .1345794 .3413376 0 1
.
. * Create regressors used as additional controls in regressions below
. gen AGESQ = AGE*AGE
. gen EDUCSQ = EDUC*EDUC
. * DW99 do not define NODEGREE but following gives Table 1 means
. gen NODEGREE = 0
. replace NODEGREE = 1 if EDUC < 12
(891 real changes made)
. gen RE74SQ = RE74*RE74
. gen RE75SQ = RE75*RE75
. gen U74BLACK = U74*BLACK
. gen U74HISP = U74*HISP
.
. sum AGE EDUC NODEGREE BLACK HISP MARR U74 U75 RE74 RE75 RE78 TREAT /*
> */ AGESQ EDUCSQ RE74SQ RE75SQ U74BLACK U74HISP
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
AGE | 2675 34.22579 10.49984 17 55
EDUC | 2675 11.99439 3.053556 0 17
NODEGREE | 2675 .3330841 .4714045 0 1
BLACK | 2675 .2915888 .4545789 0 1
HISP | 2675 .0343925 .1822693 0 1
-------------+--------------------------------------------------------
MARR | 2675 .8194393 .3847257 0 1
U74 | 2675 .1293458 .335645 0 1
U75 | 2675 .1345794 .3413376 0 1
RE74 | 2675 18230 13722.25 0 137149
RE75 | 2675 17850.89 13877.78 0 156653
-------------+--------------------------------------------------------
RE78 | 2675 20502.38 15632.52 0 121174
TREAT | 2675 .0691589 .2537716 0 1
AGESQ | 2675 1281.61 766.8415 289 3025
EDUCSQ | 2675 153.1862 70.62231 0 289
RE74SQ | 2675 5.21e+08 8.47e+08 0 1.88e+10
-------------+--------------------------------------------------------
RE75SQ | 2675 5.11e+08 8.91e+08 0 2.45e+10
U74BLACK | 2675 .0549533 .2279316 0 1
U74HISP | 2675 .0056075 .0746868 0 1
.
. * Reproduce DW99 Table 1: RE74subset Treated and PSID-1 rows
. * Same as CT Table 25.3 page 890
. * except for changes to U74, U75 and U74BLACK
. bysort TREAT: sum AGE EDUC NODEGREE BLACK HISP MARR U74 U75 RE74 RE75 RE78 TREAT /*
> */ AGESQ EDUCSQ RE74SQ RE75SQ U74BLACK
-----------------------------------------------------------------------------------------------------
-> TREAT = 0
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
AGE | 2490 34.8506 10.44076 18 55
EDUC | 2490 12.11687 3.082435 0 17
NODEGREE | 2490 .3052209 .4605934 0 1
BLACK | 2490 .2506024 .433447 0 1
HISP | 2490 .0325301 .1774389 0 1
-------------+--------------------------------------------------------
MARR | 2490 .8662651 .3404357 0 1
U74 | 2490 .0863454 .2809298 0 1
U75 | 2490 .1 .3000603 0 1
RE74 | 2490 19428.75 13406.88 0 137149
RE75 | 2490 19063.34 13596.95 0 156653
-------------+--------------------------------------------------------
RE78 | 2490 21553.92 15555.35 0 121174
TREAT | 2490 0 0 0 0
AGESQ | 2490 1323.53 769.796 324 3025
EDUCSQ | 2490 156.3161 71.43048 0 289
RE74SQ | 2490 5.57e+08 8.66e+08 0 1.88e+10
-------------+--------------------------------------------------------
RE75SQ | 2490 5.48e+08 9.12e+08 0 2.45e+10
U74BLACK | 2490 .0144578 .1193923 0 1
-----------------------------------------------------------------------------------------------------
-> TREAT = 1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
AGE | 185 25.81622 7.155019 17 48
EDUC | 185 10.34595 2.01065 4 16
NODEGREE | 185 .7081081 .4558666 0 1
BLACK | 185 .8432432 .3645579 0 1
HISP | 185 .0594595 .2371244 0 1
-------------+--------------------------------------------------------
MARR | 185 .1891892 .3927217 0 1
U74 | 185 .7081081 .4558666 0 1
U75 | 185 .6 .4912274 0 1
RE74 | 185 2095.574 4886.623 0 35040.1
RE75 | 185 1532.056 3219.251 0 25142.2
-------------+--------------------------------------------------------
RE78 | 185 6349.145 7867.405 0 60307.9
TREAT | 185 1 0 1 1
AGESQ | 185 717.3946 431.2517 289 2304
EDUCSQ | 185 111.0595 39.30388 16 256
RE74SQ | 185 2.81e+07 1.14e+08 0 1.23e+09
-------------+--------------------------------------------------------
RE75SQ | 185 1.27e+07 5.60e+07 0 6.32e+08
U74BLACK | 185 .6 .4912274 0 1

.
. save nswpsid, replace
file nswpsid.dta saved
.
. ********** ANALYSIS: (1) CALCULATE EFFECT OF TRAINING (Table 25.4, p.891) **********
.
. ***** (1A) TREATMENT-CONTROL COMPARISON USING POST_TREATMENT EARNINGS
. ***** [Difference in means]
.
. * DW99 Table 5 column 1 and Table 3 column 1
. regress RE78 T
      Source |       SS       df       MS              Number of obs = 2675
-------------+------------------------------           F( 1, 2673) = 173.41
       Model | 3.9811e+10     1 3.9811e+10             Prob > F = 0.0000
    Residual | 6.1365e+11  2673 229573201              R-squared = 0.0609
-------------+------------------------------           Adj R-squared = 0.0606
       Total | 6.5346e+11  2674 244375675              Root MSE = 15152
-----------------------------------------------------------------------------
RE78 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+---------------------------------------------------------------
TREAT | -15204.78 1154.614 -13.17 0.000 -17468.8 -12940.75
_cons | 21553.92 303.6414 70.98 0.000 20958.53 22149.32
-----------------------------------------------------------------------------
.
. * CT Table 25.4 p.891 first row uses heteroskedastic-robust standard errors
. regress RE78 TREAT, robust
Regression with robust standard errors
Number of obs = 2675
F( 1, 2673) = 537.36
Prob > F = 0.0000
R-squared = 0.0609
Root MSE = 15152
-----------------------------------------------------------------------------
| Robust
RE78 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+---------------------------------------------------------------
TREAT | -15204.78 655.9143 -23.18 0.000 -16490.93 -13918.63
_cons | 21553.92 311.785 69.13 0.000 20942.56 22165.29
-----------------------------------------------------------------------------
. estimates store treatcontrol
.
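With a single binary regressor, the OLS slope equals the difference between the two group means and the intercept equals the mean of the group coded 0. The Python sketch below demonstrates this on a small made-up sample (the `y` and `d` values are hypothetical and chosen only for illustration), then repeats the subtraction with the RE78 group means reported in the summary statistics above. The function name `ols_on_dummy` is illustrative.

```python
from statistics import mean


def ols_on_dummy(y, d):
    """Simple OLS of y on a constant and one regressor d; returns (intercept, slope)."""
    ybar, dbar = mean(y), mean(d)
    sxy = sum((di - dbar) * (yi - ybar) for yi, di in zip(y, d))
    sxx = sum((di - dbar) ** 2 for di in d)
    slope = sxy / sxx
    return ybar - slope * dbar, slope


# Hypothetical toy data: three controls (d = 0) and two treated (d = 1)
y = [10.0, 12.0, 14.0, 3.0, 5.0]
d = [0, 0, 0, 1, 1]
intercept, slope = ols_on_dummy(y, d)
toy_diff = mean([3.0, 5.0]) - mean([10.0, 12.0, 14.0])

# Group means of RE78 reported in the log (TREAT = 1 and TREAT = 0)
treated_mean, control_mean = 6349.145, 21553.92
reported_diff = treated_mean - control_mean

print(intercept, slope, toy_diff)
print(round(reported_diff, 2))
```

The subtraction of the reported means agrees with the TREAT coefficient of -15204.78 up to rounding of the displayed means, and the control mean matches the reported constant.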
. ***** (1B) CONTROL FUNCTION ESTIMATOR Additionally Include pre-treatment controls
.
. * DW99 Table 5 column 2 using regressors in footnote a
. * Same as DW99 Table 2 column 14
. regress RE78 TREAT AGE AGESQ EDUC NODEGREE BLACK HISP RE74 RE75
      Source |       SS       df       MS              Number of obs = 2675
-------------+------------------------------           F( 9, 2665) = 419.22
       Model | 3.8296e+11     9 4.2551e+10             Prob > F = 0.0000
    Residual | 2.7050e+11  2665 101500967              R-squared = 0.5860
-------------+------------------------------           Adj R-squared = 0.5847
       Total | 6.5346e+11  2674 244375675              Root MSE = 10075
-----------------------------------------------------------------------------
RE78 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+---------------------------------------------------------------
TREAT | 217.9438 866.1968 0.25 0.801 -1480.542 1916.43
AGE | 158.5058 155.4065 1.02 0.308 -146.2239 463.2354
AGESQ | -3.232885 2.11617 -1.53 0.127 -7.382386 .9166173
EDUC | 564.6237 103.56 5.45 0.000 361.5577 767.6898
NODEGREE | 502.0912 647.0243 0.78 0.438 -766.6292 1770.812
BLACK | -699.3353 493.1811 -1.42 0.156 -1666.392 267.7211
HISP | 2226.535 1092.71 2.04 0.042 83.88965 4369.181
RE74 | .2791682 .0279297 10.00 0.000 .2244021 .3339343
RE75 | .5680874 .0275763 20.60 0.000 .5140143 .6221605
_cons | -2836.703 2901.443 -0.98 0.328 -8526.01 2852.604
-----------------------------------------------------------------------------
.
. * CT Table 25.4 p.891 second row uses heteroskedastic-robust standard errors
. regress RE78 TREAT AGE AGESQ EDUC NODEGREE BLACK HISP RE74 RE75, robust
Regression with robust standard errors
Number of obs = 2675
F( 9, 2665) = 232.85
Prob > F = 0.0000
R-squared = 0.5860
Root MSE = 10075
-----------------------------------------------------------------------------
| Robust
RE78 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+---------------------------------------------------------------
TREAT | 217.9438 767.8811 0.28 0.777 -1287.759 1723.647
AGE | 158.5058 151.0305 1.05 0.294 -137.6431 454.6546
AGESQ | -3.232885 2.103324 -1.54 0.124 -7.357197 .891428
EDUC | 564.6237 121.6483 4.64 0.000 326.0891 803.1583
NODEGREE | 502.0912 632.3685 0.79 0.427 -737.8914 1742.074
BLACK | -699.3353 432.4582 -1.62 0.106 -1547.323 148.6523
HISP | 2226.535 1219.08 1.83 0.068 -163.9034 4616.974
RE74 | .2791682 .0618802 4.51 0.000 .1578301 .4005063
RE75 | .5680874 .0663995 8.56 0.000 .4378876 .6982872
_cons | -2836.703 2937.385 -0.97 0.334 -8596.487 2923.081
-----------------------------------------------------------------------------
. estimates store controlfunction
.
. * Variation that lets OLS coefficients differ across treatment and controls
. * Interaction of regressors with T
. gen TAGE = TREAT*AGE
. gen TAGESQ = TREAT*AGESQ
. gen TEDUC = TREAT*EDUC
. gen TNODEGREE = TREAT*NODEGREE
. gen TBLACK = TREAT*BLACK
. gen THISP = TREAT*HISP
. gen TRE74 = TREAT*RE74
. gen TRE75 = TREAT*RE75
. regress RE78 TREAT AGE AGESQ EDUC NODEGREE BLACK HISP RE74 RE75 /*
> */TAGE TAGESQ TEDUC TNODEGREE TBLACK THISP TRE74 TRE75
      Source |       SS       df       MS              Number of obs = 2675
-------------+------------------------------           F( 17, 2657) = 223.17
       Model | 3.8431e+11    17 2.2607e+10             Prob > F = 0.0000
    Residual | 2.6915e+11  2657 101297131              R-squared = 0.5881
-------------+------------------------------           Adj R-squared = 0.5855
       Total | 6.5346e+11  2674 244375675              Root MSE = 10065
-----------------------------------------------------------------------------
RE78 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+---------------------------------------------------------------
TREAT | -8202.823 11960.39 -0.69 0.493 -31655.45 15249.8
AGE | 79.46291 165.6177 0.48 0.631 -245.2897 404.2155
AGESQ | -2.260967 2.239074 -1.01 0.313 -6.651471 2.129537
EDUC | 567.4906 106.2026 5.34 0.000 359.2424 775.7388
NODEGREE | 655.3534 679.5015 0.96 0.335 -677.052 1987.759
BLACK | -707.0551 505.0048 -1.40 0.162 -1697.297 283.1872
HISP | 2553.662 1154.726 2.21 0.027 289.4107 4817.914
RE74 | .2869368 .0282197 10.17 0.000 .231602 .3422715
RE75 | .5677759 .0277689 20.45 0.000 .5133251 .6222267
TAGE | 668.0022 745.1401 0.90 0.370 -793.1112 2129.116
TAGESQ | -8.651515 12.26876 -0.71 0.481 -32.7088 15.40577
TEDUC | -27.54033 529.1855 -0.05 0.958 -1065.197 1010.117
TNODEGREE | -963.4163 2410.973 -0.40 0.689 -5690.989 3764.157
TBLACK | -384.5853 2593.349 -0.15 0.882 -5469.772 4700.601
THISP | -2126.096 4086.539 -0.52 0.603 -10139.22 5887.023
TRE74 | -.2540934 .2070566 -1.23 0.220 -.6601018 .1519151
TRE75 | -.472797 .3097211 -1.53 0.127 -1.080116 .1345218
_cons | -1603.593 3069.895 -0.52 0.601 -7623.219 4416.032
-----------------------------------------------------------------------------
.
. ***** (1D) DIFFERENCE-IN-DIFFERENCES
.
. * Need to stack two separate years of data RE75 and RE78
. * into a panel of two years on RE
. gen id = _n
. label variable id "id"
. gen EARNS1 = RE75
. gen EARNS2 = RE78
. reshape long EARNS, i(id) j(year)
(note: j = 1 2)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     2675   ->   5350
Number of variables                  31   ->   31
j variable (2 values)                     ->   year
xij variables:
                          EARNS1 EARNS2   ->   EARNS
-----------------------------------------------------------------------------
. gen dyear2 = 0
. replace dyear2 = 1 if year==2
(2675 real changes made)
. gen Tdyear2 = TREAT*dyear2
. regress EARNS Tdyear2 TREAT dyear2
      Source |       SS       df       MS              Number of obs =    5350
-------------+------------------------------           F(  3,  5346) =  169.20
       Model |  1.0214e+11     3  3.4047e+10           Prob > F      =  0.0000
    Residual |  1.0757e+12  5346   201218724           R-squared     =  0.0867
-------------+------------------------------           Adj R-squared =  0.0862
       Total |  1.1779e+12  5349   220201247           Root MSE      =   14185
------------------------------------------------------------------------------
EARNS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Tdyear2 | 2326.505 1528.712 1.52 0.128 -670.3928 5323.403
TREAT | -17531.28 1080.962 -16.22 0.000 -19650.41 -15412.15
dyear2 | 2490.585 402.0217 6.20 0.000 1702.458 3278.711
_cons | 19063.34 284.2723 67.06 0.000 18506.05 19620.63
------------------------------------------------------------------------------
.
. * CT Table 25.4 p.891 fourth row uses heteroskedastic-robust standard errors
. regress EARNS Tdyear2 TREAT dyear2, robust
Regression with robust standard errors                 Number of obs =    5350
                                                       F(  3,  5346) = 1222.98
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0867
                                                       Root MSE      =   14185

------------------------------------------------------------------------------
             |               Robust
EARNS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Tdyear2 | 2326.505 748.5021 3.11 0.002 859.1359 3793.875
TREAT | -17531.28 360.5992 -48.62 0.000 -18238.2 -16824.36
dyear2 | 2490.585 414.1056 6.01 0.000 1678.769 3302.4
_cons | 19063.34 272.5318 69.95 0.000 18529.06 19597.61
------------------------------------------------------------------------------
. estimates store diffindiff
.
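. * Editorial cross-check (in Python, not part of the original Stata program).
. * With only the regressors Tdyear2, TREAT and dyear2, the coefficient on Tdyear2
. * equals the change in mean earnings for the treated minus the change for the
. * controls. The sketch below assumes the four group means of RE75 and RE78
. * reported in the summary statistics for these data; the function name
. * did_from_means is hypothetical and used only for illustration.

```python
def did_from_means(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences computed from four group means."""
    change_treated = treated_post - treated_pre
    change_control = control_post - control_pre
    return change_treated - change_control


# Group means of RE75 (pre) and RE78 (post) as reported for these data:
# treated 1532.056 -> 6349.145, controls 19063.34 -> 21553.92
did = did_from_means(1532.056, 6349.145, 19063.34, 21553.92)
print(round(did, 1))
```

. * The result is about 2326.5, which agrees with the Tdyear2 coefficient up to
. * rounding of the reported means.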
. * Adding pretreatment controls makes no difference as they are time-invariant
. regress EARNS Tdyear2 TREAT dyear2 AGE AGESQ EDUC NODEGREE BLACK HISP


      Source |       SS       df       MS              Number of obs =    5350
-------------+------------------------------           F(  9,  5340) =  184.54
       Model |  2.7943e+11     9  3.1048e+10           Prob > F      =  0.0000
    Residual |  8.9843e+11  5340   168245017           R-squared     =  0.2372
-------------+------------------------------           Adj R-squared =  0.2359
       Total |  1.1779e+12  5349   220201247           Root MSE      =   12971
------------------------------------------------------------------------------
EARNS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Tdyear2 | 2326.505 1397.856 1.66 0.096 -413.8634 5066.874
TREAT | -9766.469 1043.296 -9.36 0.000 -11811.76 -7721.183
dyear2 | 2490.585 367.6092 6.78 0.000 1769.92 3211.249
AGE | 1357.093 139.6885 9.72 0.000 1083.246 1630.939
AGESQ | -15.23373 1.911801 -7.97 0.000 -18.98164 -11.48582
EDUC | 1504.728 91.99622 16.36 0.000 1324.377 1685.078
NODEGREE | -447.8275 588.8841 -0.76 0.447 -1602.281 706.6257
BLACK | -3177.524 446.5098 -7.12 0.000 -4052.865 -2302.182
HISP | -360.5058 993.7164 -0.36 0.717 -2308.596 1587.584
_cons | -25357.74 2618.207 -9.69 0.000 -30490.49 -20224.98
-----------------------------------------------------------------------------.
. ***** (1C) BEFORE-AFTER COMPARISON
.
. * Regression for treated only
. regress EARNS Tdyear2 if TREAT==1
      Source |       SS       df       MS              Number of obs =     370
-------------+------------------------------           F(  1,   368) =   59.41
       Model |  2.1464e+09     1  2.1464e+09           Prob > F      =  0.0000
    Residual |  1.3296e+10   368  36129816.6           R-squared     =  0.1390
-------------+------------------------------           Adj R-squared =  0.1367
       Total |  1.5442e+10   369  41848713.4           Root MSE      =  6010.8
------------------------------------------------------------------------------
EARNS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Tdyear2 | 4817.09 624.9741 7.71 0.000 3588.121 6046.058
_cons | 1532.056 441.9234 3.47 0.001 663.0436 2401.068
-----------------------------------------------------------------------------.
. * CT Table 25.4 p.891 third row uses heteroskedastic-robust standard errors
. regress EARNS Tdyear2 if TREAT==1, robust
Regression with robust standard errors                 Number of obs =     370
                                                       F(  1,   368) =   59.41
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1390
                                                       Root MSE      =  6010.8

------------------------------------------------------------------------------
             |               Robust
EARNS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Tdyear2 | 4817.09 624.9741 7.71 0.000 3588.121 6046.058
_cons | 1532.056 236.684 6.47 0.000 1066.633 1997.478
------------------------------------------------------------------------------
. estimates store beforeafter
.
. ***** DISPLAY RESULTS FOR FIRST FOUR ROWS OF Table 25.4, p.891
.
. estimates table treatcontrol controlfunction beforeafter diffindiff, /*
> */ b(%10.0f) se(%10.0f) stats(N)
------------------------------------------------------------------
    Variable |   treatcon~l   controlf~n   beforeaf~r   diffindiff
-------------+----------------------------------------------------
       TREAT |       -15205          218                    -17531
             |          656          768                       361
         AGE |                       159
             |                       151
       AGESQ |                        -3
             |                         2
        EDUC |                       565
             |                       122
    NODEGREE |                       502
             |                       632
       BLACK |                      -699
             |                       432
        HISP |                      2227
             |                      1219
        RE74 |                         0
             |                         0
        RE75 |                         1
             |                         0
     Tdyear2 |                                   4817         2327
             |                                    625          749
      dyear2 |                                                2491
             |                                                 414
       _cons |        21554        -2837         1532        19063
             |          312         2937          237          273
-------------+----------------------------------------------------
           N |         2675         2675          370         5350
------------------------------------------------------------------
                                                      legend: b/se
.


. ********** ANALYSIS: (2) PROPENSITY SCORE USING STRATA (Table 25.4, p.891) **********
.
. use nswpsid, clear
.
. ***** (2A) COMPUTE PROPENSITY SCORE
.
. * Calculate propensity score using regressors in DW99 Table 3 footnote e
. logit TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75
RE74SQ RE75SQ U74BLACK
Iteration 0: log likelihood = -672.64954
Iteration 1: log likelihood = -499.56574
Iteration 2: log likelihood = -318.55053
Iteration 3: log likelihood = -248.28844
Iteration 4: log likelihood = -225.08984
Iteration 5: log likelihood = -219.00396
Iteration 6: log likelihood = -209.30653
Iteration 7: log likelihood = -208.38887
Iteration 8: log likelihood = -205.17689
Iteration 9: log likelihood = -204.93156
Iteration 10: log likelihood = -204.92951
Iteration 11: log likelihood = -204.9295
Logit estimates                                   Number of obs   =       2675
                                                  LR chi2(13)     =     935.44
                                                  Prob > chi2     =     0.0000
Log likelihood =  -204.9295                       Pseudo R2       =     0.6953

------------------------------------------------------------------------------
TREAT | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
AGE | .3305734 .1203353 2.75 0.006 .0947206 .5664262
AGESQ | -.0063429 .0018561 -3.42 0.001 -.0099808 -.0027049
EDUC | .8247711 .3534216 2.33 0.020 .1320775 1.517465
EDUCSQ | -.0483153 .0186057 -2.60 0.009 -.0847819 -.0118488
MARR | -1.884062 .2994614 -6.29 0.000 -2.470996 -1.297129
NODEGREE | .1299868 .4284278 0.30 0.762 -.7097163 .96969
BLACK | 1.132961 .352088 3.22 0.001 .4428814 1.823041
HISP | 1.962762 .5673735 3.46 0.001 .8507302 3.074793
RE74 | -.0001047 .0000355 -2.95 0.003 -.0001743 -.0000351
RE75 | -.0002172 .0000415 -5.23 0.000 -.0002986 -.0001357
RE74SQ | 2.36e-09 6.57e-10 3.59 0.000 1.07e-09 3.65e-09
RE75SQ | 1.58e-10 6.68e-10 0.24 0.813 -1.15e-09 1.47e-09
U74BLACK | 2.137042 .4273667 5.00 0.000 1.299419 2.974665
_cons | -7.552458 2.451721 -3.08 0.002 -12.35774 -2.747173
------------------------------------------------------------------------------
note: 19 failures and 0 successes completely determined.

. * Note that Table 25.6 footnote b is wrong in stating that RE74*RE75 is a regressor


. predict PSCORE
(option p assumed; Pr(TREAT))
.
. ***** (2B) PLOT PROPENSITY SCORE BY TREATMENT STATUS TO SEE OVERLAP
.
. * Observations with no overlap in propensity score across treatment status are dropped
.
. sum PSCORE if TREAT==1
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      PSCORE |       185    .6876511    .3095136   .0006526   .9748755
. scalar PTMIN = r(min)
. scalar PTMAX = r(max)
. sum PSCORE if TREAT==0
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      PSCORE |      2490    .0232066    .0901373   4.49e-11   .9735255
. scalar PCMIN = r(min)
. scalar PCMAX = r(max)
. drop if PSCORE < PTMIN
(1344 observations deleted)
. drop if PSCORE < PCMIN
(0 observations deleted)
. drop if PSCORE > PTMAX
(0 observations deleted)
. drop if PSCORE > PCMAX
(6 observations deleted)
. * Following gives number of observations left
. sum PSCORE
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      PSCORE |      1325    .1350934    .2703797   .0006526   .9735255
.
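. * Editorial sketch (in Python, not part of the original Stata program).
. * The four "drop if" commands keep only observations whose propensity score
. * lies inside the score range of both the treated and the comparison group.
. * The function name trim_to_overlap and the six scores are made up for
. * illustration; they are not taken from the NSW-PSID data.

```python
def trim_to_overlap(pscores, treated):
    """Return indices of observations inside the common score range."""
    t_scores = [p for p, t in zip(pscores, treated) if t == 1]
    c_scores = [p for p, t in zip(pscores, treated) if t == 0]
    # Same effect as dropping scores below either group minimum
    # and above either group maximum
    lower = max(min(t_scores), min(c_scores))
    upper = min(max(t_scores), max(c_scores))
    return [i for i, p in enumerate(pscores) if lower <= p <= upper]


# Made-up scores for illustration only
pscores = [0.01, 0.05, 0.20, 0.60, 0.90, 0.99]
treated = [0, 1, 0, 1, 0, 1]
kept = trim_to_overlap(pscores, treated)
print(kept)
```

. * In this toy example the lowest comparison score and the highest treated
. * score fall outside the overlap region and are dropped.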
. * This differs from CT text page 893 as now U74 and U75 are corrected
. * Instead of losing 1423 controls and 8 treated leaving 1244
. * now lose 1344 controls and 6 treated leaving 1325
. * versus DW Figure 1 1333 controls are dropped leaving 1342
. * and DW Table 3 column 6 says that there are 1255 left
.
. ***** (2C) CREATE FIGURE 25.3 ON PAGE 892
.
. * This will differ a little from figure in text due to U74 and U75 corrected
.
. label define tstatus 0 Comparison_sample 1 Treated_sample
. label values TREAT tstatus
. label variable TREAT "Treatment Status"
. graph twoway (scatter RE78 PSCORE if RE78 < 20000, msize(small)) /*
> */ (lowess RE78 PSCORE, bwidth(0.5) clpattern(solid)), /*
>
*/ by(TREAT, title("Post-treatment Earnings against Propensity Score", margin(b=3)
size(vlarge))
> ) /*
> */ subtitle(, bfcolor(none)) /*
> */ scale (1.2) plotregion(style(none)) /*
> */ xtitle(" Propensity Score
Propensity Score", size(medlarge))
> xscale(titlegap(*5)) /*
> */ ytitle("Real Earnings 1978", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(2)) /*
> */ legend( label(1 "Original data") label(2 "Nonparametric regression"))
. graph export ch25treatment.wmf, replace
(file c:\Imbook\bwebpage\Section6\ch25treatment.wmf written in Windows Metafile format)
.
. ***** (2D) ADJUSTED DIFFERENCE Use PSCORE to summarize pre-treatment controls
.
. * A simple method regresses RE78 on a quadratic in PSCORE and on TREAT
. * and measures the treatment effect as the coefficient of TREAT
.
. gen PSCORESQ = PSCORE*PSCORE
. regress RE78 TREAT PSCORE PSCORESQ
      Source |       SS       df       MS              Number of obs =    1325
-------------+------------------------------           F(  3,  1321) =   46.14
       Model |  1.5152e+10     3  5.0505e+09           Prob > F      =  0.0000
    Residual |  1.4458e+11  1321   109450232           R-squared     =  0.0949
-------------+------------------------------           Adj R-squared =  0.0928
       Total |  1.5974e+11  1324   120645977           Root MSE      =   10462
------------------------------------------------------------------------------
RE78 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
TREAT | 301.5344 1388.756 0.22 0.828 -2422.874 3025.943
PSCORE | -39475.21 4836.678 -8.16 0.000 -48963.62 -29986.8
PSCORESQ | 33122.86 5037.943 6.57 0.000 23239.61 43006.1
_cons | 14560.51 347.3596 41.92 0.000 13879.07 15241.95
------------------------------------------------------------------------------
.
. * This yields coefficient of 301 with nonrobust se of 1388
. * which is close to DW 99 Table 3 column 3
. * coefficient of 294 with nonrobust se of 1389
.
. ***** (2E) CREATE STRATA
.
. * DW are not clear on how the strata are formed.
. * NBER Working Paper W6829 appendix suggests forming five cells
. * according to range of PSCORE (where nonoverlapping PSCOREs are already dropped)
.
. * Here we instead create ten strata
. * for PSCORE <0.1, 0.1-0.2, ...., 0.8-0.9 and > 0.9
. global cut1 = 0.1
. global cut2 = 0.2
. global cut3 = 0.3
. global cut4 = 0.4
. global cut5 = 0.5
. global cut6 = 0.6
. global cut7 = 0.7
. global cut8 = 0.8
. global cut9 = 0.9
. gen STRATA = 1
. replace STRATA = 2 if PSCORE > $cut1 & PSCORE <= $cut2
(60 real changes made)
. replace STRATA = 3 if PSCORE > $cut2 & PSCORE <= $cut3
(35 real changes made)
. replace STRATA = 4 if PSCORE > $cut3 & PSCORE <= $cut4
(33 real changes made)
. replace STRATA = 5 if PSCORE > $cut4 & PSCORE <= $cut5
(13 real changes made)
. replace STRATA = 6 if PSCORE > $cut5 & PSCORE <= $cut6
(21 real changes made)


. replace STRATA = 7 if PSCORE > $cut6 & PSCORE <= $cut7
(22 real changes made)
. replace STRATA = 8 if PSCORE > $cut7 & PSCORE <= $cut8
(13 real changes made)
. replace STRATA = 9 if PSCORE > $cut8 & PSCORE <= $cut9
(13 real changes made)
. replace STRATA = 10 if PSCORE > $cut9
(86 real changes made)
.
. tab STRATA T
           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 |     1,018         11 |     1,029
         2 |        53          7 |        60
         3 |        24         11 |        35
         4 |        17         16 |        33
         5 |         8          5 |        13
         6 |         6         15 |        21
         7 |         8         14 |        22
         8 |         5          8 |        13
         9 |         0         13 |        13
        10 |         7         79 |        86
-----------+----------------------+----------
     Total |     1,146        179 |     1,325

.
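. * Editorial sketch (in Python, not part of the original Stata program).
. * The chain of "replace STRATA" commands assigns stratum k to scores with
. * cut(k-1) < PSCORE <= cut(k), stratum 1 to scores at or below 0.1 and
. * stratum 10 to scores above 0.9. A sorted-list lookup gives the same rule.
. * The names CUTS and stratum are hypothetical and used only for illustration.

```python
from bisect import bisect_left

# Upper cut points of strata 1 to 9; stratum 10 is everything above 0.9
CUTS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def stratum(pscore, cuts=CUTS):
    """Return the stratum number, with intervals closed on the right."""
    # bisect_left counts the cut points strictly below pscore
    return bisect_left(cuts, pscore) + 1


for p in (0.05, 0.1, 0.15, 0.9, 0.95):
    print(p, stratum(p))
```

. * A score exactly equal to a cut point falls in the lower stratum, as in the
. * Stata commands. Differences in floating-point storage between the two
. * languages are ignored in this sketch.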
. ***** (2F) Test for similar regressor means for treated and nontreated within each Strata
.
. * Compare means within Strata across treatment status
. tab STRATA TREAT, sum(AGE) nostand nofreq
Means of AGE

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | 31.427308  30.363636 | 31.415938
         2 | 28.037736  28.714286 | 28.116667
         3 | 27.833333  27.909091 | 27.857143
         4 | 27.529412      28.25 | 27.878788
         5 |    28.875       27.8 | 28.461538
         6 |        25       23.4 | 23.857143
         7 |    24.875       24.5 | 24.636364
         8 |      24.8         32 | 29.230769
         9 |         .  29.461538 | 29.461538
        10 | 23.285714  23.367089 | 23.360465
-----------+----------------------+----------
     Total | 30.961606  25.765363 | 30.259623
. tab STRATA TREAT, sum(EDUC) nostand nofreq
Means of EDUC

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | 11.229862  11.545455 | 11.233236
         2 | 10.433962  10.714286 | 10.466667
         3 | 10.583333  10.181818 | 10.457143
         4 | 10.647059    10.0625 | 10.363636
         5 |    10.625        9.4 | 10.153846
         6 | 9.3333333  10.066667 | 9.8571429
         7 |     9.875  11.071429 | 10.636364
         8 |      10.8      11.25 | 11.076923
         9 |         .         11 |        11
        10 | 10.571429  10.164557 | 10.197674
-----------+----------------------+----------
     Total | 11.141361  10.413408 | 11.043019
. tab STRATA TREAT, sum(MARR) nostand nofreq
Means of MARR

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 |  .8280943  .81818182 | .82798834
         2 | .56603774  .85714286 |        .6
         3 | .29166667  .18181818 | .25714286
         4 | .23529412        .25 | .24242424
         5 |       .25          0 | .15384615
         6 | .16666667  .06666667 |  .0952381
         7 |      .125  .07142857 | .09090909
         8 |        .2       .625 | .46153846
         9 |         .  .53846154 | .53846154
        10 |         0          0 |         0
-----------+----------------------+----------
     Total | .77574171  .19553073 | .69735849
. tab STRATA TREAT, sum(NODEGREE) nostand nofreq
Means of NODEGREE

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | .38408644  .36363636 | .38386783
         2 | .62264151  .57142857 | .61666667
         3 |      .625  .54545455 |        .6
         4 | .52941176       .625 | .57575758
         5 |      .625         .8 | .69230769
         6 | .83333333         .8 | .80952381
         7 |      .625  .64285714 | .63636364
         8 |        .8        .75 | .76923077
         9 |         .  .76923077 | .76923077
        10 | .71428571  .75949367 | .75581395
-----------+----------------------+----------
     Total | .41186736  .69832402 | .45056604
. tab STRATA TREAT, sum(BLACK) nostand nofreq
Means of BLACK

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | .36247544  .63636364 |  .3654033
         2 | .60377358  .57142857 |        .6
         3 | .66666667  .54545455 | .62857143
         4 | .88235294       .875 | .87878788
         5 |         1         .4 | .76923077
         6 | .83333333         .6 | .66666667
         7 |      .875  .92857143 | .90909091
         8 |        .8          1 | .92307692
         9 |         .  .92307692 | .92307692
        10 |         1  .94936709 | .95348837
-----------+----------------------+----------
     Total | .40401396  .83798883 | .46264151
. tab STRATA TREAT, sum(HISP) nostand nofreq
Means of HISP

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | .04911591          0 | .04859086
         2 |  .0754717  .28571429 |        .1
         3 | .08333333          0 | .05714286
         4 |         0          0 |         0
         5 |         0         .2 | .07692308
         6 | .16666667  .13333333 | .14285714
         7 |      .125  .07142857 | .09090909
         8 |        .2          0 | .07692308
         9 |         .  .07692308 | .07692308
        10 |         0  .05063291 | .04651163
-----------+----------------------+----------
     Total | .05148342  .06145251 | .05283019
. tab STRATA TREAT, sum(RE74) nostand nofreq
Means of RE74

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | 12216.528   12142.62 | 12215.738
         2 | 5989.8844  2031.6573 | 5528.0912
         3 | 6476.1906  5884.7335 | 6290.3041
         4 |  4790.868    4895.09 | 4841.3999
         5 | 2375.3662  5715.8799 | 3660.1792
         6 | 3173.6867  2402.9567 | 2623.1653
         7 | 1533.1259  2269.1672 | 2001.5158
         8 |  1567.414          0 | 602.85154
         9 |         .  34.243847 | 34.243847
        10 |         0          0 |         0
-----------+----------------------+----------
     Total | 11386.483  2165.8167 | 10140.823
. tab STRATA TREAT, sum(RE75) nostand nofreq
Means of RE75

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | 10352.924  8964.4728 | 10338.081
         2 |  3916.448  3250.0113 |  3838.697
         3 | 2417.8314  2694.2624 | 2504.7097
         4 |   3134.96   2905.615 | 3023.7624
         5 | 3204.6788   1917.262 | 2709.5185
         6 |   2878.54  1731.1554 | 2058.9796
         7 | 643.84411  1230.5051 | 1017.1739
         8 | 2539.0337  1501.9275 | 1900.8145
         9 |         .  201.91542 | 201.91542
        10 | 127.88014  234.47151 | 225.79547
-----------+----------------------+----------
     Total | 9528.6389  1583.4094 | 8455.2834
. tab STRATA TREAT, sum(U74BLACK) nostand nofreq
Means of U74BLACK

           |   Treatment Status
    STRATA | Compariso  Treated_s |     Total
-----------+----------------------+----------
         1 | .01473477          0 | .01457726
         2 | .05660377  .14285714 | .06666667
         3 | .08333333  .09090909 | .08571429
         4 | .17647059      .1875 | .18181818
         5 |       .25         .2 | .23076923
         6 | .16666667  .06666667 |  .0952381
         7 |      .125  .21428571 | .18181818
         8 |        .4          1 | .76923077
         9 |         .  .92307692 | .92307692
        10 |         1  .94936709 | .95348837
-----------+----------------------+----------
     Total | .03141361  .58659218 | .10641509
.
. * Formal test of difference in means within strata across treatment status
. * Example is for education
. * bysort STRATA: oneway EDUC T
.
. ***** (2G) Calculate weighted average of within strata mean difference in outcome
.
. #delimit ;
delimiter now ;
. global sum = 0 ;
.
* Sums the estimate of interest over strata ;
. global sumwgt = 0 ;
. /* Sums the number of treated obs over strata */
> global count = 0 ;
.
/* This gives the number of Strata used
> global numcut = 10;

*/

. * Possibly include extra regressors.


> * Not clear which ones, so same as DW99 Table 3 footnote a for column 2
> global XLIST AGE AGESQ EDUC NODEGREE BLACK HISP RE74 RE75;
. forvalues i = 1/$numcut { ;
  2.   global addon = 0 ;
  3.   /* Within strata estimate of interest */
>      global tobs = 0 ;
  4.   /* Within strata number of treated obs */
>      capture { ;
  5.     quiet regress RE78 TREAT $XLIST if STRATA == `i' ;
  6.     global addon = _b[TREAT] ;
  7.     quiet summarize TREAT if TREAT==1 & STRATA==`i' ;
  8.     global tobs = _result(1) ;
  9.     * # of treatment observations ;
.      };
 10.   di "`i' estimate = $addon   Top cut = ${cut`i'} #treat obs = $tobs" ;
 11.   if $addon ~= 0 { ;
 12.     global sum = $sum + $addon * $tobs ;
 13.     global sumwgt = $sumwgt + $tobs ;
 14.     global count = $count + 1 ;
 15.   } ;
 16. } ;
1 estimate = -4410.946812653378   Top cut = .1 #treat obs = 11
2 estimate = -2113.275144674707   Top cut = .2 #treat obs = 7
3 estimate = 1486.684503266305   Top cut = .3 #treat obs = 11
4 estimate = -6085.742371951832   Top cut = .4 #treat obs = 16
5 estimate = 1899.984014892578   Top cut = .5 #treat obs = 5
6 estimate = -411.1481648763024   Top cut = .6 #treat obs = 15
7 estimate = 133.9267490931921   Top cut = .7 #treat obs = 14
8 estimate = 1848.656362915039   Top cut = .8 #treat obs = 8
9 estimate = 0   Top cut = .9 #treat obs = 13
10 estimate = 4857.563579676591   Top cut =  #treat obs = 79

. #delimit cr ;
delimiter now cr
.
.
. ***** DISPLAY RESULT: "Propensity Score" estimate in last row Table 25.4
.
. * Weighted estimate
. di $sum / $sumwgt "   Count = " $count
1562.7274   Count = 9
.
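. * Editorial cross-check (in Python, not part of the original Stata program).
. * The propensity score estimate is the average of the within-stratum
. * coefficients on TREAT, weighted by the number of treated observations,
. * skipping any stratum whose estimate is exactly zero (stratum 9, which has
. * no comparison observations). The sketch copies the estimates and treated
. * counts from the loop output; the function name weighted_effect is
. * hypothetical and used only for illustration.

```python
# (within-stratum estimate, number of treated observations), strata 1 to 10
strata = [
    (-4410.946812653378, 11),
    (-2113.275144674707, 7),
    (1486.684503266305, 11),
    (-6085.742371951832, 16),
    (1899.984014892578, 5),
    (-411.1481648763024, 15),
    (133.9267490931921, 14),
    (1848.656362915039, 8),
    (0.0, 13),
    (4857.563579676591, 79),
]


def weighted_effect(strata):
    """Treated-count weighted mean over strata with a nonzero estimate."""
    used = [(est, n) for est, n in strata if est != 0]
    total_weight = sum(n for _, n in used)
    estimate = sum(est * n for est, n in used) / total_weight
    return estimate, len(used)


estimate, count = weighted_effect(strata)
print(f"{estimate:.4f}   Count = {count}")
```

. * This reproduces the weighted estimate of about 1562.73 based on 9 strata.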
. * This differs from value 995 given in text due to
. * previously mentioned correction of U74 and U75.
. * Now get 1562 with se not estimated
. * compared to DW99 estimates Table 3 column 4 1608 and column 5 1494
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section6\mma25p1treatment.txt
log type: text
closed on: 26 May 2005, 10:26:22

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section6\mma25p2matching.txt
log type: text
opened on: 26 May 2005, 10:26:31
.
. ********** OVERVIEW OF MMA25P2MATCHING.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press


.
. * Chapter 25.8.5 pages 893-6 Tables 25.5-25.7
. * Evaluating treatment effect of training on Earnings
. * using Dehejia-Wahba data (originally Lalonde data)
.
. * (1) For DW 2002 specification of the logit model for propensity score
. * calculate treatment effect by matching methods (Tables 25.5-6)
. * ( ) give distribution of propensity score (Table 25.5)
. * (1A) nearest neighbor matching
. * (1B) radius matching r = 0.001
. * (1C) radius matching r = 0.001
. * (1D) radius matching r = 0.001
. * (1E) stratification
. * (1F) kernel matching
. * (2) For DW 1999 specification of the logit model for propensity score
. * calculate treatment effect by matching methods (Table 25.6)
.
. * The program MMA25P1TREATMENT.DO provides simpler nonmatching methods
. * for the same data.
.
. * To run this program you need data file
. * nswpsid.da1
.
. * To run this program you need the Stata add-ons
. * pscore.ado, atts.ado, attr.ado, attnd.ado, attnw.ado
. * due to Sascha O. Becker and Andrea Ichino (2002)
. * "Estimation of average treatment effects based on propensity scores",
. * The Stata Journal, Vol.2, No.4, pp. 358-377.
.
. * This program uses version 2.02 May 13 2005 for Stata version 8
. * downloadable from http://www.iue.it/Personal/Ichino/#pscore
. * We earlier used version 1.29 October 8 2002 for Stata version 7
. * downloadable from http://www.iue.it/Personal/Ichino/#pscore
. * and obtained the same results
.
. * To speed up the program reduce breps: the number of bootstrap
. * replications used to obtain bootstrap standard errors
. * Bootstrap se's will differ from text as here seed is set to 10101
.
. ********** STATA SETUP **********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
.
. ********** DATA DESCRIPTION **********
.
. * Data set nswpsid.da1 is data set nswpsid.da1 from Guido Imbens


. * http://emlab.berkeley.edu/users/imbens/index.shtml
.
. * Data originally from DW99
. * R.H. Dehejia and S. Wahba (1999)
. * "Causal Effects in Nonexperimental Studies: Reevaluating the
. * Evaluation of Training Programs", JASA, 1053-1062
. * or DW02
. * R.H. Dehejia and S. Wahba (2002)
. * "Propensity-score Matching Methods for Nonexperimental Causal
. * Studies", ReStat, 151-161
. * which in turn are from
. * Lalonde, R. (1986), "Evaluating the Econometric Evaluations of
. * Training Programs with Experimental Data," AER, 604-620.
.
. * Each observation is for an individual.
. * There are 2,675 observations: 185 in treated group and 2490 in control
.
. * Variables are
. * TREAT 1 if treated (NSW treated) and 0 if not (PSID-1 control)
. * AGE in years
. * EDUC in years
. * BLACK 1 if black
. * HISP 1 if hispanic
. * MARR 1 if married
. * RE74 Real annual earnings in 1974 (pre-treatment)
. * RE75 Real annual earnings in 1975 (pre-treatment)
. * RE78 Real annual earnings in 1978 (post-treatment)
. * U74 1 if unemployed in 1974
. * U75 1 if unemployed in 1975
.
. * NOTE: U74 and U75 are miscoded in these data and also in the
. *       summary statistics table of DW02
. *       See below for correction to data
.
. ********** READ DATA AND TRANSFORMATIONS **********
.
. ****** propensity score for nsw-psid composite sample*************
. ****** output for MMA Tables 25.6 & 25.7 ***********************
.
. infile TREAT AGE EDUC BLACK HISP MARR RE74 RE75 RE78 U74 U75 /*
> */ using nswpsid.da1
(2675 observations read)
.
. * The original data reversed U74 and U75
. * Should be U74=1 if RE74=0 and U74=0 if RE74>0 and similar for U75
. * This affects results with propensity score though not earlier results
.
. * Wrong U74 and U75
. sum U74 U75
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         U74 |      2675    .1345794    .3413376          0          1
         U75 |      2675    .1293458     .335645          0          1
.
. * Correct the original data
. drop U74 U75
. gen U74 = cond(RE74 == 0, 1, 0)
. gen U75 = cond(RE75 == 0, 1, 0)
.
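. * Editorial sketch (in Python, not part of the original Stata program).
. * The two cond() commands rebuild each unemployment indicator directly from
. * earnings: the indicator is 1 when real earnings in that year are exactly
. * zero and 0 otherwise. The function name unemployment_indicator and the
. * earnings values are made up for illustration.

```python
def unemployment_indicator(real_earnings):
    """Return 1 if earnings in the year are exactly zero, else 0."""
    return 1 if real_earnings == 0 else 0


# Made-up earnings values for illustration only
re74 = [0.0, 12000.0, 0.0, 3500.5]
u74 = [unemployment_indicator(x) for x in re74]
print(u74)
```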
. * Correct U74 and U75
. sum U74 U75
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         U74 |      2675    .1293458     .335645          0          1
         U75 |      2675    .1345794    .3413376          0          1
.
. * Create regressors used as additional controls in regressions below
. gen AGESQ = AGE*AGE
. gen EDUCSQ = EDUC*EDUC
. * DW99 do not define NODEGREE but following gives Table 1 means
. gen NODEGREE = 0
. replace NODEGREE = 1 if EDUC < 12
(891 real changes made)
. gen RE74SQ = RE74*RE74
. gen RE75SQ = RE75*RE75
. gen U74BLACK = U74*BLACK
. gen U74HISP = U74*HISP
.
. sum AGE EDUC NODEGREE BLACK HISP MARR U74 U75 RE74 RE75 RE78 TREAT /*
> */ AGESQ EDUCSQ RE74SQ RE75SQ U74BLACK U74HISP
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         AGE |      2675    34.22579    10.49984         17         55
        EDUC |      2675    11.99439    3.053556          0         17
    NODEGREE |      2675    .3330841    .4714045          0          1
       BLACK |      2675    .2915888    .4545789          0          1
        HISP |      2675    .0343925    .1822693          0          1
-------------+--------------------------------------------------------
        MARR |      2675    .8194393    .3847257          0          1
         U74 |      2675    .1293458     .335645          0          1
         U75 |      2675    .1345794    .3413376          0          1
        RE74 |      2675       18230    13722.25          0     137149
        RE75 |      2675    17850.89    13877.78          0     156653
-------------+--------------------------------------------------------
        RE78 |      2675    20502.38    15632.52          0     121174
       TREAT |      2675    .0691589    .2537716          0          1
       AGESQ |      2675     1281.61    766.8415        289       3025
      EDUCSQ |      2675    153.1862    70.62231          0        289
      RE74SQ |      2675    5.21e+08    8.47e+08          0   1.88e+10
-------------+--------------------------------------------------------
      RE75SQ |      2675    5.11e+08    8.91e+08          0   2.45e+10
    U74BLACK |      2675    .0549533    .2279316          0          1
     U74HISP |      2675    .0056075    .0746868          0          1
.
. bysort TREAT: sum AGE EDUC NODEGREE BLACK HISP MARR U74 U75 RE74 RE75 RE78 TREAT /*
> */ AGESQ EDUCSQ RE74SQ RE75SQ U74BLACK U74HISP

------------------------------------------------------------------------------
-> TREAT = 0

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         AGE |      2490     34.8506    10.44076         18         55
        EDUC |      2490    12.11687    3.082435          0         17
    NODEGREE |      2490    .3052209    .4605934          0          1
       BLACK |      2490    .2506024     .433447          0          1
        HISP |      2490    .0325301    .1774389          0          1
-------------+--------------------------------------------------------
        MARR |      2490    .8662651    .3404357          0          1
         U74 |      2490    .0863454    .2809298          0          1
         U75 |      2490          .1    .3000603          0          1
        RE74 |      2490    19428.75    13406.88          0     137149
        RE75 |      2490    19063.34    13596.95          0     156653
-------------+--------------------------------------------------------
        RE78 |      2490    21553.92    15555.35          0     121174
       TREAT |      2490           0           0          0          0
       AGESQ |      2490     1323.53     769.796        324       3025
      EDUCSQ |      2490    156.3161    71.43048          0        289
      RE74SQ |      2490    5.57e+08    8.66e+08          0   1.88e+10
-------------+--------------------------------------------------------
      RE75SQ |      2490    5.48e+08    9.12e+08          0   2.45e+10
    U74BLACK |      2490    .0144578    .1193923          0          1
     U74HISP |      2490    .0036145    .0600237          0          1

------------------------------------------------------------------------------
-> TREAT = 1

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         AGE |       185    25.81622    7.155019         17         48
        EDUC |       185    10.34595     2.01065          4         16
    NODEGREE |       185    .7081081    .4558666          0          1
       BLACK |       185    .8432432    .3645579          0          1
        HISP |       185    .0594595    .2371244          0          1
-------------+--------------------------------------------------------
        MARR |       185    .1891892    .3927217          0          1
         U74 |       185    .7081081    .4558666          0          1
         U75 |       185          .6    .4912274          0          1
        RE74 |       185    2095.574    4886.623          0    35040.1
        RE75 |       185    1532.056    3219.251          0    25142.2
-------------+--------------------------------------------------------
        RE78 |       185    6349.145    7867.405          0    60307.9
       TREAT |       185           1           0          1          1
       AGESQ |       185    717.3946    431.2517        289       2304
      EDUCSQ |       185    111.0595    39.30388         16        256
      RE74SQ |       185    2.81e+07    1.14e+08          0   1.23e+09
-------------+--------------------------------------------------------
      RE75SQ |       185    1.27e+07    5.60e+07          0   6.32e+08
    U74BLACK |       185          .6    .4912274          0          1
     U74HISP |       185    .0324324    .1776263          0          1

.
. *** NOTE: The benchmark estimate obtained from NSW experiment is
. ***       $1,794 = Average(RE_78 for NSW treated) - Average(RE_78 for NSW controls)
. ***       See MMA25P3EXTRA.DO
.
. ********** (1) ANALYSIS for DW02 SPECIFICATION OF THE PROPENSITY SCORE **********
.
. * Following defines number of bootstrap replications
. * Table 25.6 used 200 (or 100 in some places)
. global breps 200
.
. * From DW02 Table 3 footnote a the propensity score uses the following regressors
. global XDW02 AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75
RE74SQ U74 U75 U74HISP
.
. **** Table 25.5 p.894 summarizes propensity score
. **** using just those observations with common support
.

. pscore TREAT $XDW02, pscore(myscore) comsup blockid(myblock) numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is TREAT


      TREAT |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      2,490       93.08       93.08
          1 |        185        6.92      100.00
------------+-----------------------------------
      Total |      2,675      100.00

Estimation of the propensity score


Iteration 0:   log likelihood = -672.64954
Iteration 1:   log likelihood = -551.87026
Iteration 2:   log likelihood = -355.56578
Iteration 3:   log likelihood = -234.78051
Iteration 4:   log likelihood =  -208.2965
Iteration 5:   log likelihood = -199.26423
Iteration 6:   log likelihood = -197.26114
Iteration 7:   log likelihood =  -197.1054
Iteration 8:   log likelihood = -197.10179
Iteration 9:   log likelihood = -197.10175

Logit estimates                                   Number of obs   =       2675
                                                  LR chi2(14)     =     951.10
                                                  Prob > chi2     =     0.0000
Log likelihood = -197.10175                       Pseudo R2       =     0.7070

------------------------------------------------------------------------------
TREAT | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
AGE | .2628422 .120206 2.19 0.029 .0272428 .4984416
AGESQ | -.0053794 .0018341 -2.93 0.003 -.0089742 -.0017846
EDUC | .7149774 .3418173 2.09 0.036 .0450278 1.384927
EDUCSQ | -.0426178 .0179039 -2.38 0.017 -.0777088 -.0075269
MARR | -1.780857 .301802 -5.90 0.000 -2.372378 -1.189336
NODEGREE | .1891046 .4257533 0.44 0.657 -.6453564 1.023566
BLACK | 2.519383 .370358 6.80 0.000 1.793495 3.245272
HISP | 3.087327 .7340486 4.21 0.000 1.648618 4.526036
RE74 | -.0000448 .0000425 -1.05 0.292 -.000128 .0000385
RE75 | -.0002678 .0000485 -5.52 0.000 -.0003628 -.0001727
RE74SQ | 1.99e-09 7.75e-10 2.57 0.010 4.72e-10 3.51e-09
U74 | 3.100056 .5187391 5.98 0.000 2.083346 4.116766
U75 | -1.273525 .4644557 -2.74 0.006 -2.183842 -.3632088
U74HISP | -1.925803 1.07186 -1.80 0.072 -4.02661 .1750032
_cons | -7.407524 2.445692 -3.03 0.002 -12.20099 -2.614056
------------------------------------------------------------------------------
note: 65 failures and 0 successes completely determined.

Note: the common support option has been selected


The region of common support is [.00036433, .98576756]

Description of the estimated propensity score
in region of common support

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0003871       .0003643
 5%     .0004805       .0003669
10%     .0006343       .0003702       Obs                1271
25%     .0016393       .0003714       Sum of Wgt.        1271

50%     .0090427                      Mean           .1447205
                        Largest       Std. Dev.      .2809511
75%     .0897599       .9803043
90%      .656286       .9830988       Variance       .0789335
95%     .9392306       .9855413       Skewness       2.049999
99%     .9640553       .9857676       Kurtosis       5.748631

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 6


This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         TREAT
 of pscore |         0          1 |     Total
-----------+----------------------+----------
  .0003643 |       960          9 |       969
        .1 |        56         10 |        66
        .2 |        33         14 |        47
        .4 |        22         24 |        46
        .6 |         7         33 |        40
        .8 |         8         95 |       103
-----------+----------------------+----------
     Total |     1,086        185 |     1,271

Note: the common support option has been selected
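
The block counts above are what a stratification estimator needs for its weights. The following is a minimal sketch in Python (for illustration only, not the Stata code used in this log). It assumes the standard stratification formula, in which the ATT is the sum over blocks of the within-block treated-minus-control mean difference, weighted by the block's share of treated units. The counts are copied from the table; the helper name `stratified_att` is hypothetical, and the within-block outcome differences are not reported in the log, so none are supplied here.

```python
# (lower bound of block, number of controls, number of treated),
# copied from the block table above.
blocks = [
    (0.0003643, 960, 9),
    (0.1, 56, 10),
    (0.2, 33, 14),
    (0.4, 22, 24),
    (0.6, 7, 33),
    (0.8, 8, 95),
]

n_controls = sum(c for _, c, _ in blocks)
n_treated = sum(t for _, _, t in blocks)

# Assumed weighting: each block's share of all treated units.
weights = [t / n_treated for _, _, t in blocks]


def stratified_att(block_diffs, block_weights):
    """Weighted sum of within-block mean outcome differences."""
    return sum(w * diff for w, diff in zip(block_weights, block_diffs))


for (lower, _, t), w in zip(blocks, weights):
    print(f"block starting at {lower}: {t} treated, weight {w:.4f}")
```

More than half of the weight falls on the top block (95 of 185 treated), where only 8 controls are available, so the stratification estimate leans heavily on a small comparison group.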

*******************************************
End of the algorithm to estimate the pscore
*******************************************
.
. **** For completeness do same with common support option NOT selected
.
. drop myscore myblock
. pscore TREAT $XDW02, pscore(myscore) blockid(myblock) numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is TREAT


      TREAT |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      2,490       93.08       93.08
          1 |        185        6.92      100.00
------------+-----------------------------------
      Total |      2,675      100.00

Estimation of the propensity score


Iteration 0:   log likelihood = -672.64954
Iteration 1:   log likelihood = -551.87026
Iteration 2:   log likelihood = -355.56578
Iteration 3:   log likelihood = -234.78051
Iteration 4:   log likelihood =  -208.2965
Iteration 5:   log likelihood = -199.26423
Iteration 6:   log likelihood = -197.26114
Iteration 7:   log likelihood =  -197.1054
Iteration 8:   log likelihood = -197.10179
Iteration 9:   log likelihood = -197.10175

Logit estimates                                   Number of obs   =       2675
                                                  LR chi2(14)     =     951.10
                                                  Prob > chi2     =     0.0000
Log likelihood = -197.10175                       Pseudo R2       =     0.7070

------------------------------------------------------------------------------
       TREAT |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |   .2628422    .120206     2.19   0.029     .0272428    .4984416
       AGESQ |  -.0053794   .0018341    -2.93   0.003    -.0089742   -.0017846
        EDUC |   .7149774   .3418173     2.09   0.036     .0450278    1.384927
      EDUCSQ |  -.0426178   .0179039    -2.38   0.017    -.0777088   -.0075269
        MARR |  -1.780857    .301802    -5.90   0.000    -2.372378   -1.189336
    NODEGREE |   .1891046   .4257533     0.44   0.657    -.6453564    1.023566
       BLACK |   2.519383    .370358     6.80   0.000     1.793495    3.245272
        HISP |   3.087327   .7340486     4.21   0.000     1.648618    4.526036
        RE74 |  -.0000448   .0000425    -1.05   0.292     -.000128    .0000385
        RE75 |  -.0002678   .0000485    -5.52   0.000    -.0003628   -.0001727
      RE74SQ |   1.99e-09   7.75e-10     2.57   0.010     4.72e-10    3.51e-09
         U74 |   3.100056   .5187391     5.98   0.000     2.083346    4.116766
         U75 |  -1.273525   .4644557    -2.74   0.006    -2.183842   -.3632088
     U74HISP |  -1.925803    1.07186    -1.80   0.072     -4.02661    .1750032
       _cons |  -7.407524   2.445692    -3.03   0.002    -12.20099   -2.614056
------------------------------------------------------------------------------
note: 65 failures and 0 successes completely determined.

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     2.36e-09       1.76e-12
 5%     8.39e-08       5.07e-12
10%     4.47e-07       1.14e-11       Obs                2675
25%     .0000107       1.14e-11       Sum of Wgt.        2675

50%     .0002558                      Mean           .0691589
                        Largest       Std. Dev.      .2074207
75%     .0071195       .9830988
90%      .129801       .9855413       Variance       .0430234
95%     .6394923       .9857676       Skewness       3.407447
99%     .9572224        .986626       Kurtosis       13.56404

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 7


This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************
Variable BLACK is not balanced in block 1
The balancing property is not satisfied
Try a different specification of the propensity score

  Inferior |
  of block |         TREAT
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |     2,265          7 |     2,272
       .05 |        98          2 |       100
        .1 |        56         10 |        66
        .2 |        33         14 |        47
        .4 |        22         24 |        46
        .6 |         7         33 |        40
        .8 |         9         95 |       104
-----------+----------------------+----------
     Total |     2,490        185 |     2,675

*******************************************
End of the algorithm to estimate the pscore
*******************************************
.
. **** All of the following use common support
.
. ****************************************************************************
. **** Note: The results in the first half of Table 25.6
. ****       erroneously added RE75SQ as a regressor.
. ****       This does not affect Table 25.5 (done correctly) or
. ****       stratification estimates (which used myscore from correct model).
. ****       But it does affect NN, radius and kernel estimates.
. ****       To enable comparison with the text we do analysis here
. ****       both with and without RE75SQ.
. ****       Even dropping RE75SQ the results continue to differ from DW02.
. ****                            Text         Corrected
. ****                            Table 25.6   Table 25.6   DW 2002
. ****       NN                      2385         1286        1202
. ****       Radius = 0.001         -7815        -7808        1187
. ****       Radius = 0.0001        -9333        -6401        1191
. ****       Radius = 0.00001       -2200        -1135        1198
. ****       Stratification          1497         1497
. ****       Kernel                  1309         1342
. ****************************************************************************
.
. **** Row 1 Table 25.6: Nearest neighbor matching (random version)
. set seed 10101
. attnd RE78 TREAT $XDW02 RE75SQ, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit.
This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185          53    2385.430    1792.028       1.331
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

Bootstrapping of standard errors

command:      attnd RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP RE75SQ , pscore() logit comsup
statistic:    attnd      = r(attnd)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200    2385.43  -859.5093   1094.969    226.1985   4544.661   (N)
             |                                          -937.0529   3515.425   (P)
             |                                           1202.547   4697.713  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185          53    2385.430    1094.969       2.179
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches
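
The arithmetic linking the bootstrap table to the final ATT table can be checked directly. The short Python sketch below (illustrative only) uses the numbers printed above for the first nearest-neighbor run: the reported t statistic is the ATT divided by the bootstrap standard error, and the normal-based interval is symmetric around the observed ATT. The critical value implied by the printed interval is close to 1.972, which is consistent with a t distribution on 199 degrees of freedom rather than the standard normal 1.96; that reading is an inference from the numbers, not something stated in the log.

```python
# Values copied from the bootstrap output above (attnd, with RE75SQ).
att = 2385.430
boot_se = 1094.969
ci_normal = (226.1985, 4544.661)

# t statistic reported in the final table.
t_stat = att / boot_se

# The "(N)" interval should be centred on the observed ATT.
midpoint = (ci_normal[0] + ci_normal[1]) / 2

# Critical value implied by the half-width of the printed interval.
half_width = (ci_normal[1] - ci_normal[0]) / 2
implied_crit = half_width / boot_se

print(round(t_stat, 3), round(midpoint, 2), round(implied_crit, 3))
```

The percentile (P) and bias-corrected (BC) intervals are built from the empirical distribution of the 200 bootstrap estimates, which the log does not list, so they cannot be reproduced from the printed summary alone.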
. set seed 10101

. attnd RE78 TREAT $XDW02, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit.
This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185          60    1285.782    3895.044       0.330
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

Bootstrapping of standard errors

command:      attnd RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP , pscore() logit comsup
statistic:    attnd      = r(attnd)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200   1285.782    319.006   1275.405   -1229.261   3800.825   (N)
             |                                          -1128.466   3835.567   (P)
             |                                          -2181.243   3294.797  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185          60    1285.782    1275.405       1.008
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches
.
. **** Row 2 Table 25.6: Radius matching for Radius=0.001
. set seed 10101
. attr RE78 TREAT $XDW02 RE75SQ, comsup boot reps($breps) dots logit radius(0.001)

The program is searching for matches of treated units within radius.
This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       54         517   -7815.382    1118.181      -6.989
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP RE75SQ , pscore() logit comsup radius(.001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200  -7815.381   1345.983   3794.466    -15297.9  -332.8595   (N)
             |                                          -18163.96   936.3913   (P)
             |                                          -21184.98  -2839.753  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       54         517   -7815.381    3794.466      -2.060
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
. set seed 10101
. attr RE78 TREAT $XDW02, comsup boot reps($breps) dots logit radius(0.001)

The program is searching for matches of treated units within radius.
This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       51         541   -7808.241    1146.418      -6.811
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP , pscore() logit comsup radius(.001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200  -7808.242   1022.016   3770.093    -15242.7  -373.7819   (N)
             |                                          -16697.45   1438.308   (P)
             |                                          -18942.21  -1204.325  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       51         541   -7808.242    3770.093      -2.071
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
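
Only 51 to 54 of the 185 treated units find any control within a radius of 0.001, which is why the radius estimates differ so sharply from the other rows. A minimal Python sketch of radius matching follows, for illustration only. It assumes a textbook version of the estimator: each treated unit is compared with the unweighted mean outcome of all controls whose propensity score lies strictly within the radius, and treated units with no such control are dropped. Whether the `attr` command uses exactly this rule (for example a strict rather than weak inequality) is an assumption. The function name `radius_att` and the toy data are hypothetical.

```python
def radius_att(y, d, p, radius):
    """Radius-matching ATT sketch.

    y: outcomes, d: treatment indicators (1/0), p: propensity scores.
    Returns (att, number of treated units with at least one match),
    or (None, 0) when no treated unit has a match.
    """
    treated = [(yi, pi) for yi, di, pi in zip(y, d, p) if di == 1]
    controls = [(yi, pi) for yi, di, pi in zip(y, d, p) if di == 0]

    diffs = []
    for y_t, p_t in treated:
        matched = [y_c for y_c, p_c in controls if abs(p_c - p_t) < radius]
        if matched:  # treated units without a match are dropped
            diffs.append(y_t - sum(matched) / len(matched))

    if not diffs:
        return None, 0
    return sum(diffs) / len(diffs), len(diffs)


# Hypothetical toy data: three treated units, four controls.
y = [10, 12, 20, 5, 7, 9, 30]
d = [1, 1, 1, 0, 0, 0, 0]
p = [0.20, 0.50, 0.90, 0.21, 0.19, 0.52, 0.60]

for r in (0.05, 0.015):
    print(r, radius_att(y, d, p, r))
```

In the toy data, shrinking the radius reduces the number of matched treated units, the same pattern the log shows as the radius goes from 0.001 to 0.00001.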
.
. **** Row 3 Table 25.6: Radius matching for Radius=0.0001


. set seed 10101
. attr RE78 TREAT $XDW02 RE75SQ, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius.
This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       24          92   -9333.120    2285.624      -4.083
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP RE75SQ , pscore() logit comsup radius(.0001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200   -9333.12   4076.044    5211.11    -19609.2   942.9621   (N)
             |                                          -19094.04   4604.865   (P)
             |                                          -22414.52  -4341.134  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       24          92   -9333.120    5211.110      -1.791
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
. set seed 10101
. attr RE78 TREAT $XDW02, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius.
This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       27          91   -6401.345    2054.218      -3.116
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP , pscore() logit comsup radius(.0001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200  -6401.345   310.4673    5618.88   -17481.53   4678.842   (N)
             |                                          -18778.71   4636.073   (P)
             |                                          -21404.97   3740.767  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       27          91   -6401.345    5618.880      -1.139
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
.
. **** Row 4 Table 25.6: Radius matching for Radius=0.00001
. set seed 10101
. attr RE78 TREAT $XDW02 RE75SQ, comsup boot reps($breps) dots logit radius(0.00001)

The program is searching for matches of treated units within radius.
This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       15          19   -2200.022    2986.211      -0.737
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP RE75SQ , pscore() logit comsup radius(.00001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200  -2200.022   626.9762    7009.51   -16022.47   11622.43   (N)
             |                                          -24355.12   8831.196   (P)
             |                                           -31741.1   4217.228  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       15          19   -2200.022    7009.510      -0.314
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

. set seed 10101


. attr RE78 TREAT $XDW02, comsup boot reps($breps) dots logit radius(0.00001)

The program is searching for matches of treated units within radius.
This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       16          17   -1135.184    3189.367      -0.356
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP , pscore() logit comsup radius(.00001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   199  -1135.184   -2079.93   7030.204   -14998.87    12728.5   (N)
             |                                           -23808.6     8048.6   (P)
             |                                          -16939.85   9102.585  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       16          17   -1135.184    7030.204      -0.161
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
.
. **** Row 5 Table 25.6: Stratification Matching
. set seed 10101
. atts RE78 TREAT, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        1086    1497.484     920.688       1.626
---------------------------------------------------------

Bootstrapping of standard errors

command:      atts RE78 TREAT , pscore(myscore) blockid(myblock) comsup
statistic:    atts       = r(atts)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |   200   1497.484   91.22797    913.129   -303.1669   3298.134   (N)
             |                                          -16.69353    3509.36   (P)
             |                                          -64.37524   3306.115  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        1086    1497.484     913.129       1.640
---------------------------------------------------------
.
. **** Row 6 Table 25.6: Kernel Matching
. set seed 10101
. attk RE78 TREAT $XDW02 RE75SQ, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit.
This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        1058    1309.217
---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use
the bootstrap option to get bootstrapped standard errors.

Bootstrapping of standard errors

command:      attk RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP RE75SQ , pscore() logit comsup bwidth(.06)
statistic:    attk       = r(attk)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attk |   200   1309.217   45.93746   958.1801   -580.2722   3198.707   (N)
             |                                          -412.7856   3416.999   (P)
             |                                          -374.4567   3450.043  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        1058    1309.217     958.180       1.366
---------------------------------------------------------
. set seed 10101


. attk RE78 TREAT $XDW02, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit.
This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        1086    1342.016
---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use
the bootstrap option to get bootstrapped standard errors.

Bootstrapping of standard errors

command:      attk RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ U74 U75 U74HISP , pscore() logit comsup bwidth(.06)
statistic:    attk       = r(attk)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attk |   200   1342.016   61.94744   933.8668   -499.5284   3183.561   (N)
             |                                          -378.5027   3354.131   (P)
             |                                          -405.7551   3349.118  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        1086    1342.016     933.867       1.437
---------------------------------------------------------
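
Kernel matching uses every control for every treated unit, with weights that decline in the propensity-score distance, which is why the number of controls reported equals the whole common-support comparison group. The Python sketch below is illustrative only. It assumes a Gaussian kernel and takes the bandwidth 0.06 from the `bwidth(.06)` shown in the echoed command; the kernel actually used by `attk` in this run is not shown in the log. The function name `kernel_att` and the toy data are hypothetical.

```python
import math


def kernel_att(y, d, p, bwidth=0.06):
    """Kernel-matching ATT sketch with an assumed Gaussian kernel.

    For each treated unit the counterfactual is a weighted mean of all
    control outcomes, with weight exp(-0.5 * ((p_c - p_t) / bwidth)**2).
    """
    treated = [(yi, pi) for yi, di, pi in zip(y, d, p) if di == 1]
    controls = [(yi, pi) for yi, di, pi in zip(y, d, p) if di == 0]

    diffs = []
    for y_t, p_t in treated:
        weights = [math.exp(-0.5 * ((p_c - p_t) / bwidth) ** 2)
                   for _, p_c in controls]
        total = sum(weights)
        if total == 0:  # all weights underflowed: no usable comparison
            continue
        counterfactual = sum(w * y_c
                             for w, (y_c, _) in zip(weights, controls)) / total
        diffs.append(y_t - counterfactual)

    return sum(diffs) / len(diffs)


# Hypothetical toy data: two treated units, three controls.
y = [10.0, 14.0, 4.0, 6.0, 8.0]
d = [1, 1, 0, 0, 0]
p = [0.30, 0.60, 0.25, 0.50, 0.70]

att = kernel_att(y, d, p, bwidth=0.06)
print(round(att, 3))
```

Because no treated unit is ever discarded, the kernel estimate is much less sensitive to the matching window than the radius estimates, at the cost of giving some weight to distant controls.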

.
. ********** (2) ANALYSIS for DW99 SPECIFICATION OF THE PROPENSITY SCORE **********
.
. * From DW99 Table 3 footnote e the propensity score uses the following regressors
. global XDW99 AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ U74BLACK
.
. * Note that CT Table 25.6 footnote b erroneously lists RE74*RE75 as regressor
. * but this program (correctly) did not include RE74*RE75
.
. **** Propensity score with just those observations with common support
.
. drop myscore myblock
. pscore TREAT $XDW99, pscore(myscore) comsup blockid(myblock) numblo($breps) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is TREAT


      TREAT |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      2,490       93.08       93.08
          1 |        185        6.92      100.00
------------+-----------------------------------
      Total |      2,675      100.00

Estimation of the propensity score


Iteration 0: log likelihood = -672.64954
Iteration 1: log likelihood = -499.56574
Iteration 2: log likelihood = -318.55053
Iteration 3: log likelihood = -248.28844
Iteration 4: log likelihood = -225.08984
Iteration 5: log likelihood = -219.00396
Iteration 6: log likelihood = -209.30653
Iteration 7: log likelihood = -208.38887
Iteration 8: log likelihood = -205.17689
Iteration 9: log likelihood = -204.93156
Iteration 10: log likelihood = -204.92951
Iteration 11: log likelihood = -204.9295


Logit estimates                                   Number of obs   =       2675
                                                  LR chi2(13)     =     935.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -204.9295                        Pseudo R2       =     0.6953

------------------------------------------------------------------------------
       TREAT |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |   .3305734   .1203353     2.75   0.006     .0947206    .5664262
       AGESQ |  -.0063429   .0018561    -3.42   0.001    -.0099808   -.0027049
        EDUC |   .8247711   .3534216     2.33   0.020     .1320775    1.517465
      EDUCSQ |  -.0483153   .0186057    -2.60   0.009    -.0847819   -.0118488
        MARR |  -1.884062   .2994614    -6.29   0.000    -2.470996   -1.297129
    NODEGREE |   .1299868   .4284278     0.30   0.762    -.7097163      .96969
       BLACK |   1.132961    .352088     3.22   0.001     .4428814    1.823041
        HISP |   1.962762   .5673735     3.46   0.001     .8507302    3.074793
        RE74 |  -.0001047   .0000355    -2.95   0.003    -.0001743   -.0000351
        RE75 |  -.0002172   .0000415    -5.23   0.000    -.0002986   -.0001357
      RE74SQ |   2.36e-09   6.57e-10     3.59   0.000     1.07e-09    3.65e-09
      RE75SQ |   1.58e-10   6.68e-10     0.24   0.813    -1.15e-09    1.47e-09
    U74BLACK |   2.137042   .4273667     5.00   0.000     1.299419    2.974665
       _cons |  -7.552458   2.451721    -3.08   0.002    -12.35774   -2.747173
------------------------------------------------------------------------------
note: 19 failures and 0 successes completely determined.

Note: the common support option has been selected


The region of common support is [.00065257, .97487544]

Description of the estimated propensity score
in region of common support

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0006813       .0006526
 5%     .0008363       .0006581
10%     .0011416       .0006593       Obs                1331
25%     .0024351       .0006598       Sum of Wgt.        1331

50%     .0111854                      Mean           .1388772
                        Largest       Std. Dev.       .275571
75%     .0779976       .9744237
90%     .6200607       .9747552       Variance       .0759394
95%     .9494181       .9747918       Skewness        2.17177
99%      .970738       .9748754       Kurtosis       6.296349

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 195


This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         TREAT
 of pscore |         0          1 |     Total
-----------+----------------------+----------
  .0006526 |       501          2 |       503
      .005 |       143          3 |       146
       .01 |        78          0 |        78
      .015 |        42          0 |        42
       .02 |        38          0 |        38
      .025 |        29          1 |        30
       .03 |        22          0 |        22
      .035 |        23          0 |        23
       .04 |        22          0 |        22
      .045 |        17          1 |        18
       .05 |        23          0 |        23
      .055 |        13          1 |        14
       .06 |        12          0 |        12
      .065 |         9          0 |         9
       .07 |        11          1 |        12
      .075 |         9          1 |        10
       .08 |         6          0 |         6
      .085 |         6          0 |         6
       .09 |         8          1 |         9
      .095 |         6          0 |         6
        .1 |         9          0 |         9
      .105 |         4          0 |         4
       .11 |         8          0 |         8
      .115 |         3          0 |         3
       .12 |         1          0 |         1
      .125 |         2          3 |         5
       .13 |         6          1 |         7
      .135 |         1          0 |         1
       .14 |         1          1 |         2
      .145 |         1          0 |         1
       .15 |         2          0 |         2
      .155 |         4          0 |         4
       .16 |         3          0 |         3
      .165 |         2          0 |         2
      .175 |         1          0 |         1
       .18 |         0          1 |         1
      .185 |         1          0 |         1
       .19 |         2          0 |         2
      .195 |         2          1 |         3
        .2 |         1          0 |         1
      .205 |         1          0 |         1
      .215 |         5          0 |         5
      .225 |         2          1 |         3
       .23 |         2          1 |         3
      .235 |         2          3 |         5
       .24 |         2          0 |         2
      .245 |         0          1 |         1
       .25 |         0          2 |         2
       .26 |         1          1 |         2
      .265 |         1          0 |         1
       .27 |         1          0 |         1
       .28 |         1          0 |         1
      .285 |         1          0 |         1
       .29 |         2          1 |         3
      .295 |         2          1 |         3
        .3 |         2          0 |         2
      .305 |         0          1 |         1
      .315 |         1          0 |         1
       .32 |         0          1 |         1
      .325 |         2          1 |         3
       .33 |         1          0 |         1
      .335 |         0          1 |         1
       .34 |         1          1 |         2
      .345 |         1          2 |         3
       .35 |         2          0 |         2
      .355 |         0          1 |         1
      .365 |         1          0 |         1
       .37 |         2          0 |         2
      .375 |         2          2 |         4
       .38 |         1          2 |         3
      .385 |         1          4 |         5
        .4 |         0          1 |         1
      .405 |         0          2 |         2
       .42 |         0          1 |         1
      .425 |         1          0 |         1
       .45 |         2          0 |         2
       .47 |         1          0 |         1
       .48 |         1          1 |         2
      .485 |         2          0 |         2
      .495 |         1          0 |         1
        .5 |         0          2 |         2
       .51 |         0          2 |         2
      .515 |         2          1 |         3
      .525 |         0          1 |         1
       .53 |         0          2 |         2
      .535 |         0          1 |         1
       .54 |         1          0 |         1
      .555 |         0          1 |         1
       .56 |         1          1 |         2
      .565 |         1          0 |         1
       .57 |         0          1 |         1
      .575 |         1          1 |         2
       .59 |         0          1 |         1
      .595 |         0          1 |         1
        .6 |         0          1 |         1
      .605 |         0          1 |         1
       .61 |         1          2 |         3
      .615 |         0          1 |         1
       .62 |         0          1 |         1
      .625 |         0          1 |         1
      .635 |         1          2 |         3
       .64 |         1          1 |         2
      .645 |         2          0 |         2
      .665 |         0          1 |         1
       .67 |         1          0 |         1
      .675 |         0          3 |         3
       .68 |         1          0 |         1
       .69 |         1          0 |         1
       .71 |         1          1 |         2
      .735 |         0          1 |         1
       .74 |         1          0 |         1
      .745 |         2          0 |         2
      .765 |         1          1 |         2
       .79 |         0          4 |         4
      .795 |         0          1 |         1
        .8 |         0          1 |         1
      .805 |         0          2 |         2
      .815 |         0          3 |         3
      .825 |         0          1 |         1
       .84 |         0          1 |         1
      .845 |         0          1 |         1
       .85 |         0          1 |         1
       .86 |         0          1 |         1
      .865 |         0          1 |         1
      .895 |         0          1 |         1
        .9 |         0          2 |         2
      .905 |         0          2 |         2
      .915 |         0          1 |         1
       .92 |         0          1 |         1
      .925 |         0          7 |         7
       .93 |         0          2 |         2
      .935 |         0          1 |         1
       .94 |         0          3 |         3
      .945 |         1          6 |         7
       .95 |         1         14 |        15
      .955 |         0         16 |        16
       .96 |         1          5 |         6
      .965 |         3         12 |        15
       .97 |         1         13 |        14
-----------+----------------------+----------
     Total |     1,146        185 |     1,331
Note: the common support option has been selected

*******************************************
End of the algorithm to estimate the pscore
*******************************************
.
. **** For completeness do same with common support option NOT selected
.
. drop myscore myblock
. pscore TREAT $XDW99, pscore(myscore) blockid(myblock) numblo($breps) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is TREAT


      TREAT |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      2,490       93.08       93.08
          1 |        185        6.92      100.00
------------+-----------------------------------
      Total |      2,675      100.00

Estimation of the propensity score


Iteration 0: log likelihood = -672.64954
Iteration 1: log likelihood = -499.56574
Iteration 2: log likelihood = -318.55053
Iteration 3: log likelihood = -248.28844
Iteration 4: log likelihood = -225.08984
Iteration 5: log likelihood = -219.00396
Iteration 6: log likelihood = -209.30653
Iteration 7: log likelihood = -208.38887
Iteration 8: log likelihood = -205.17689
Iteration 9: log likelihood = -204.93156
Iteration 10: log likelihood = -204.92951
Iteration 11: log likelihood = -204.9295
Logit estimates                                   Number of obs   =       2675
                                                  LR chi2(13)     =     935.44
                                                  Prob > chi2     =     0.0000
Log likelihood =  -204.9295                       Pseudo R2       =     0.6953

------------------------------------------------------------------------------
       TREAT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |   .3305734   .1203353     2.75   0.006     .0947206    .5664262
       AGESQ |  -.0063429   .0018561    -3.42   0.001    -.0099808   -.0027049
        EDUC |   .8247711   .3534216     2.33   0.020     .1320775    1.517465
      EDUCSQ |  -.0483153   .0186057    -2.60   0.009    -.0847819   -.0118488
        MARR |  -1.884062   .2994614    -6.29   0.000    -2.470996   -1.297129
    NODEGREE |   .1299868   .4284278     0.30   0.762    -.7097163      .96969
       BLACK |   1.132961    .352088     3.22   0.001     .4428814    1.823041
        HISP |   1.962762   .5673735     3.46   0.001     .8507302    3.074793
        RE74 |  -.0001047   .0000355    -2.95   0.003    -.0001743   -.0000351
        RE75 |  -.0002172   .0000415    -5.23   0.000    -.0002986   -.0001357
      RE74SQ |   2.36e-09   6.57e-10     3.59   0.000     1.07e-09    3.65e-09
      RE75SQ |   1.58e-10   6.68e-10     0.24   0.813    -1.15e-09    1.47e-09
    U74BLACK |   2.137042   .4273667     5.00   0.000     1.299419    2.974665
       _cons |  -7.552458   2.451721    -3.08   0.002    -12.35774   -2.747173
------------------------------------------------------------------------------
note: 19 failures and 0 successes completely determined.

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     2.84e-08       4.49e-11
 5%     4.47e-07       4.88e-10
10%     2.07e-06       4.88e-10       Obs                2675
25%      .000034       4.95e-10       Sum of Wgt.        2675

50%     .0006388                      Mean           .0691589
                        Largest       Std. Dev.      .2063646
75%      .010941       .9744237
90%     .1336877       .9747552       Variance       .0425863
95%     .6200607       .9747918       Skewness       3.471137
99%     .9651648       .9748754       Kurtosis       14.05057

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 195


This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************
Variable BLACK is not balanced in block 1
The balancing property is not satisfied
Try a different specification of the propensity score
  Inferior |
  of block |         TREAT
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |     1,845          2 |     1,847
      .005 |       143          3 |       146
       .01 |        78          0 |        78
      .015 |        42          0 |        42
       .02 |        38          0 |        38
      .025 |        29          1 |        30
       .03 |        22          0 |        22
      .035 |        23          0 |        23
       .04 |        22          0 |        22
      .045 |        17          1 |        18
       .05 |        23          0 |        23
      .055 |        13          1 |        14
       .06 |        12          0 |        12
      .065 |         9          0 |         9
       .07 |        11          1 |        12
      .075 |         9          1 |        10
       .08 |         6          0 |         6
      .085 |         6          0 |         6
       .09 |         8          1 |         9
      .095 |         6          0 |         6
        .1 |         9          0 |         9
      .105 |         4          0 |         4
       .11 |         8          0 |         8
      .115 |         3          0 |         3
       .12 |         1          0 |         1
      .125 |         2          3 |         5
       .13 |         6          1 |         7
      .135 |         1          0 |         1
       .14 |         1          1 |         2
      .145 |         1          0 |         1
       .15 |         2          0 |         2
      .155 |         4          0 |         4
       .16 |         3          0 |         3
      .165 |         2          0 |         2
      .175 |         1          0 |         1
       .18 |         0          1 |         1
      .185 |         1          0 |         1
       .19 |         2          0 |         2
      .195 |         2          1 |         3
        .2 |         1          0 |         1
      .205 |         1          0 |         1
      .215 |         5          0 |         5
      .225 |         2          1 |         3
       .23 |         2          1 |         3
      .235 |         2          3 |         5
       .24 |         2          0 |         2
      .245 |         0          1 |         1
       .25 |         0          2 |         2
       .26 |         1          1 |         2
      .265 |         1          0 |         1
       .27 |         1          0 |         1
       .28 |         1          0 |         1
      .285 |         1          0 |         1
       .29 |         2          1 |         3
      .295 |         2          1 |         3
        .3 |         2          0 |         2
      .305 |         0          1 |         1
      .315 |         1          0 |         1
       .32 |         0          1 |         1
      .325 |         2          1 |         3
       .33 |         1          0 |         1
      .335 |         0          1 |         1
       .34 |         1          1 |         2
      .345 |         1          2 |         3
       .35 |         2          0 |         2
      .355 |         0          1 |         1
      .365 |         1          0 |         1
       .37 |         2          0 |         2
      .375 |         2          2 |         4
       .38 |         1          2 |         3
      .385 |         1          4 |         5
        .4 |         0          1 |         1
      .405 |         0          2 |         2
       .42 |         0          1 |         1
      .425 |         1          0 |         1
       .45 |         2          0 |         2
       .47 |         1          0 |         1
       .48 |         1          1 |         2
      .485 |         2          0 |         2
      .495 |         1          0 |         1
        .5 |         0          2 |         2
       .51 |         0          2 |         2
      .515 |         2          1 |         3
      .525 |         0          1 |         1
       .53 |         0          2 |         2
      .535 |         0          1 |         1
       .54 |         1          0 |         1
      .555 |         0          1 |         1
       .56 |         1          1 |         2
      .565 |         1          0 |         1
       .57 |         0          1 |         1
      .575 |         1          1 |         2
       .59 |         0          1 |         1
      .595 |         0          1 |         1
        .6 |         0          1 |         1
      .605 |         0          1 |         1
       .61 |         1          2 |         3
      .615 |         0          1 |         1
       .62 |         0          1 |         1
      .625 |         0          1 |         1
      .635 |         1          2 |         3
       .64 |         1          1 |         2
      .645 |         2          0 |         2
      .665 |         0          1 |         1
       .67 |         1          0 |         1
      .675 |         0          3 |         3
       .68 |         1          0 |         1
       .69 |         1          0 |         1
       .71 |         1          1 |         2
      .735 |         0          1 |         1
       .74 |         1          0 |         1
      .745 |         2          0 |         2
      .765 |         1          1 |         2
       .79 |         0          4 |         4
      .795 |         0          1 |         1
        .8 |         0          1 |         1
      .805 |         0          2 |         2
      .815 |         0          3 |         3
      .825 |         0          1 |         1
       .84 |         0          1 |         1
      .845 |         0          1 |         1
       .85 |         0          1 |         1
       .86 |         0          1 |         1
      .865 |         0          1 |         1
      .895 |         0          1 |         1
        .9 |         0          2 |         2
      .905 |         0          2 |         2
      .915 |         0          1 |         1
       .92 |         0          1 |         1
      .925 |         0          7 |         7
       .93 |         0          2 |         2
      .935 |         0          1 |         1
       .94 |         0          3 |         3
      .945 |         1          6 |         7
       .95 |         1         14 |        15
      .955 |         0         16 |        16
       .96 |         1          5 |         6
      .965 |         3         12 |        15
       .97 |         1         13 |        14
-----------+----------------------+----------
     Total |     2,490        185 |     2,675
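
Editorial note: the block table above differs from the common-support run earlier in this log only in its first block, where the control count is 1,845 instead of 501. The Python sketch below illustrates one way such a restriction can arise. It assumes the common support is the range between the smallest and largest estimated score among treated units; that rule is an assumption about the comsup option, not something this log states, and the function name and toy data are hypothetical.

```python
def common_support(scores, treat):
    """Return (lo, hi, keep) where keep flags units inside the treated-score range.

    Assumed rule (not confirmed by the log): lo and hi are the minimum and
    maximum propensity score among treated units.
    """
    treated_scores = [p for p, d in zip(scores, treat) if d == 1]
    lo, hi = min(treated_scores), max(treated_scores)
    keep = [lo <= p <= hi for p in scores]
    return lo, hi, keep

# Toy data (hypothetical): two controls fall outside the treated range.
scores = [0.0001, 0.002, 0.01, 0.30, 0.95, 0.97]
treat = [0, 1, 0, 0, 1, 0]
lo, hi, keep = common_support(scores, treat)

# Counts copied from the two block tables in this log.
controls_all, controls_cs = 2490, 1146
first_block_all, first_block_cs = 1845, 501
dropped_controls = controls_all - controls_cs
dropped_first_block = first_block_all - first_block_cs
print(lo, hi, keep, dropped_controls, dropped_first_block)
```

On the log's own counts, every control that leaves the sample under the common-support option comes from the lowest block, and all 185 treated units remain in both runs.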

*******************************************
End of the algorithm to estimate the pscore
*******************************************
.
. **** All of the following use common support
.
. **** Row 7 Table 25.6: Nearest neighbor matching (random version)
. set seed 10101
. attnd RE78 TREAT $XDW99, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit.


This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185          57     560.287    2205.663       0.254
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

Bootstrapping of standard errors


command:      attnd RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75S
> Q U74BLACK , pscore() logit comsup
statistic:    attnd      = r(attnd)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200  560.2872   1104.87  1331.294  -2064.967   3185.542   (N)
             |                                      -785.5272   4190.844   (P)
             |                                      -2615.809   2016.239  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185          57     560.287    1331.294       0.421
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches
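
Editorial note: the t column in the two nearest-neighbor tables above is the ATT divided by the reported standard error. A minimal Python sketch of that arithmetic follows, using numbers copied from this log; the variable names are illustrative and are not part of the matching programs.

```python
# Reproduce the t column for the nearest-neighbor (attnd) results in this log.
att = 560.287              # ATT reported by attnd
se_analytical = 2205.663   # analytical standard error
se_bootstrap = 1331.294    # bootstrap standard error (200 replications)

t_analytical = att / se_analytical
t_bootstrap = att / se_bootstrap

print(f"t with analytical SE: {t_analytical:.3f}")
print(f"t with bootstrap SE:  {t_bootstrap:.3f}")
```

The point estimate is the same in both tables; only the standard error, and therefore the t ratio, changes.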
.
. **** Row 8 Table 25.6: Radius matching for Radius=0.001
. set seed 10101
. attr RE78 TREAT $XDW99, comsup boot reps($breps) dots logit radius(0.001)

The program is searching for matches of treated units within radius.


This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       57         583   -9358.228     997.561      -9.381
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors


command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ
> U74BLACK , pscore() logit comsup radius(.001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200 -9358.228  2589.204  3079.824  -15431.51  -3284.949   (N)
             |                                      -11328.39   901.8873   (P)
             |                                      -13053.95  -6956.288  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       57         583   -9358.228    3079.824      -3.039
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
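
Editorial note: with radius 0.001 only 57 of the 185 treated units find a control within the radius. The Python sketch below shows one simple form of radius matching on toy data. It assumes each treated unit is compared with the mean outcome of all controls whose score lies within the radius, and that treated units without such a control are dropped; the function name, the toy data, and the handling of ties are hypothetical and may differ from attr.ado.

```python
from statistics import mean

def radius_att(treated, controls, radius):
    """treated, controls: lists of (pscore, outcome) pairs.

    Returns (att, n_treated_matched, n_controls_used); att is None when
    no treated unit has a control within the radius.
    """
    diffs = []
    used = set()
    for p_t, y_t in treated:
        matches = [(j, y_c) for j, (p_c, y_c) in enumerate(controls)
                   if abs(p_c - p_t) <= radius]
        if not matches:
            continue  # unmatched treated units drop out of the estimate
        used.update(j for j, _ in matches)
        diffs.append(y_t - mean(y for _, y in matches))
    if not diffs:
        return None, 0, 0
    return mean(diffs), len(diffs), len(used)

# Toy data (hypothetical).
treated = [(0.50, 10.0), (0.90, 20.0)]
controls = [(0.49, 4.0), (0.52, 6.0), (0.10, 1.0)]
wide = radius_att(treated, controls, 0.05)
narrow = radius_att(treated, controls, 0.001)
print(wide, narrow)
```

In the toy data the second treated unit has no control nearby, so it is excluded at either radius, and shrinking the radius to 0.001 removes every match.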
.
. **** Row 9 Table 25.6: Radius matching for Radius=0.0001
. set seed 10101
. attr RE78 TREAT $XDW99, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius.


This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       27          76   -7847.460    2066.697      -3.797
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors


command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ
> U74BLACK , pscore() logit comsup radius(.0001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200  -7847.46  2920.804  4850.874  -17413.17   1718.251   (N)
             |                                      -13423.91   5223.634   (P)
             |                                      -15432.32   632.0693  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       27          76   -7847.460    4850.874      -1.618
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
.
. **** Row 10 Table 25.6: Radius matching for Radius=0.00001
. set seed 10101
. attr RE78 TREAT $XDW99, comsup boot reps($breps) dots logit radius(0.00001)


The program is searching for matches of treated units within radius.


This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       16          13     223.468    4551.850       0.049
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors


command:      attr RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ
> U74BLACK , pscore() logit comsup radius(.00001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   199  223.4685 -1272.487  5608.927  -10837.43   11284.37   (N)
             |                                      -14600.21   8548.427   (P)
             |                                      -10778.17   11039.05  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       16          13     223.468    5608.927       0.040
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
.
. **** Row 11 Table 25.6: Stratification Matching
. set seed 10101
. atts RE78 TREAT, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       98        1233    1322.160
---------------------------------------------------------

Bootstrapping of standard errors


command:      atts RE78 TREAT , pscore(myscore) blockid(myblock) comsup
statistic:    atts       = r(atts)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |   200   1322.16  -51.6285  1276.237  -1194.524   3838.844   (N)
             |                                      -1515.399   3960.787   (P)
             |                                      -1383.034   4034.298  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       98        1233    1322.160    1276.237       1.036
---------------------------------------------------------
.
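
Editorial note: stratification matching combines within-block differences in mean outcomes. The Python sketch below assumes the ATT is a weighted average of block-level treated-minus-control mean differences, with weights equal to each block's share of treated units, and that blocks lacking either group are skipped. The function name and toy data are hypothetical, and the exact rule used by atts.ado is not shown in this log.

```python
from collections import defaultdict
from statistics import mean

def stratification_att(rows):
    """rows: iterable of (block_id, treat, outcome) with treat in {0, 1}."""
    treated = defaultdict(list)
    controls = defaultdict(list)
    for block, treat, y in rows:
        (treated if treat == 1 else controls)[block].append(y)
    # Assumed rule: only blocks containing both groups contribute.
    usable = [b for b in treated if b in controls]
    n_treated = sum(len(treated[b]) for b in usable)
    if n_treated == 0:
        return None
    return sum(
        len(treated[b]) / n_treated * (mean(treated[b]) - mean(controls[b]))
        for b in usable
    )

# Toy data (hypothetical): block 3 has a treated unit but no controls.
rows = [
    (1, 1, 10.0), (1, 1, 12.0), (1, 0, 8.0),
    (2, 1, 20.0), (2, 0, 15.0), (2, 0, 17.0),
    (3, 1, 30.0),
]
att = stratification_att(rows)
print(att)
```

Here block 1 contributes a difference of 3 with weight 2/3 and block 2 a difference of 4 with weight 1/3, while block 3 is not used.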
. **** Row 12 Table 25.6: Kernel Matching
. * pscore TREAT $XDW99, pscore(myscore) comsup blockid(myblock) numblo($breps) level(0.005) logit
. set seed 10101
. attk RE78 TREAT $XDW99, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit.


This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        1146    1518.694
---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use
the bootstrap option to get bootstrapped standard errors.

Bootstrapping of standard errors

command:      attk RE78 TREAT AGE AGESQ EDUC EDUCSQ MARR NODEGREE BLACK HISP RE74 RE75 RE74SQ RE75SQ
> U74BLACK , pscore() logit comsup bwidth(.06)
statistic:    attk       = r(attk)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =      2675
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attk |   200  1518.694  130.8493  808.3386  -75.31444   3112.703   (N)
             |                                       212.6286   3165.292   (P)
             |                                       96.05106   2991.407  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
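
Editorial note: the (N) intervals in the bootstrap tables of this log are symmetric about the observed statistic. The Python sketch below backs out the implied multiplier on the bootstrap standard error from numbers copied from those tables. The result, about 1.972, is consistent with a Student-t critical value on roughly 199 degrees of freedom, but that reading is inferred from the numbers and is not stated in the log.

```python
# (statistic, observed, std_err, normal_lo, normal_hi), copied from this log.
rows = [
    ("attnd",          560.2872,  1331.294, -2064.967,  3185.542),
    ("attr r=.001",   -9358.228,  3079.824, -15431.51, -3284.949),
    ("attr r=.0001",  -7847.46,   4850.874, -17413.17,  1718.251),
    ("attr r=.00001",  223.4685,  5608.927, -10837.43,  11284.37),
    ("atts",           1322.16,   1276.237, -1194.524,  3838.844),
    ("attk",           1518.694,  808.3386, -75.31444,  3112.703),
]

# Half-width of each normal interval, expressed in standard errors.
multipliers = {name: (hi - lo) / (2 * se) for name, obs, se, lo, hi in rows}
# Distance between each interval midpoint and the observed statistic.
midpoint_gaps = {name: abs((lo + hi) / 2 - obs) for name, obs, se, lo, hi in rows}

for name in multipliers:
    print(f"{name:15s} multiplier = {multipliers[name]:.4f}")
```

The percentile (P) and bias-corrected (BC) intervals are built from the bootstrap distribution itself, so they need not be symmetric and are not covered by this check.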

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        1146    1518.694     808.339       1.879
---------------------------------------------------------
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section6\mma25p2matching.txt
log type: text
closed on: 26 May 2005, 11:15:53
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section6\mma25p3extra.txt
log type: text
opened on: 26 May 2005, 11:33:04
.
. ********** OVERVIEW OF MMA25P3EXTRA.DO **********
.

. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 25.8 pages 889-893
. * Evaluating treatment effect of training on Earnings
. * This program provides additional analysis and data not in the book
. * (1) Compare NSW experiment treated to NSW experiment controls
. * (2) Compare NSW experiment treated to CPS "controls"
. * [Same as text except "controls" are from CPS not PSID]
.
. * The program is based on
. *     MMA25P2MATCHING.DO propensity score matching
.
. * To run this program you need STATA data files
. * nswre74_treated.dta NSW Treated sample
. * nswre74_control.dta NSW Control sample (not analyzed earlier)
. * propensity_cps.dta  CPS Control sample (rather than PSID)
.
. * To run this program you need the Stata add-ons
. * pscore.ado, atts.ado, attr.ado, attnd.ado, attnw.ado
. * due to Sascha O. Becker and Andrea Ichino (2002)
. * "Estimation of average treatment effects based on propensity scores",
. * The Stata Journal, Vol.2, No.4, pp. 358-377.
.
. * This program uses version 2.02 May 13 2005 for Stata version 8
. * downloadable from http://www.iue.it/Personal/Ichino/#pscore
. * We earlier used version 1.29 October 8 2002 for Stata version 7
. * downloadable from http://www.iue.it/Personal/Ichino/#pscore
. * and obtained the same results
.
. * To speed up the program reduce breps: the number of bootstrap
. * replications used to obtain bootstrap standard errors
. * Bootstrap se's will differ from text as here seed is set to 10101
.
. ********** STATA SETUP **********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
.
. ********** DATA DESCRIPTION **********
.
. * Data originally from DW99
. * R.H. Dehejia and S. Wahba (1999)
. * "Causal Effects in Nonexperimental Studies: reevaluating the
. * Evaluation of Training Programs", JASA, 1053-1062
. * or DW02
. * R.H. Dehejia and S. Wahba (2002)
. * "Propensity-score Matching Methods for Nonexperimental Causal
. * Studies", ReStat, 151-161
. * which in turn are from
. * Lalonde, R. (1986), "Evaluating the Econometric Evaluations of
. * Training Programs with Experimental Data," AER, 604-620.
.
. * nswre74_treated.dta N=185 NSW Treated sample only
. * nswre74_control.dta N=260 NSW Control sample only
. * propensity_cps.dta N=16177 NSW Treated + CPS Control sample (Full CPS or CPS-1)
.
. ********** (1) ANALYSIS: NSW TREATED VERSUS NSW CONTROLS **********
.
. * Read in NSW treated and control and combine
. use nswre74_treated.dta, clear
. append using nswre74_control.dta
.
. ** Summarize these data
. sum
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       treat |       445    .4157303    .4934022          0          1
         age |       445    25.37079    7.100282         17         55
         edu |       445    10.19551    1.792119          3         16
       black |       445    .8337079    .3727617          0          1
        hisp |       445    .0876404    .2830895          0          1
-------------+--------------------------------------------------------
     married |       445    .1685393    .3747658          0          1
    nodegree |       445    .7820225    .4133367          0          1
        re74 |       445    2102.265    5363.582          0   39570.68
        re75 |       445    1377.138    3150.961          0   25142.24
        re78 |       445    5300.764    6631.492          0   60307.93
-------------+--------------------------------------------------------
         u74 |       445    .2674157    .4431092          0          1
         u75 |       445    .3505618    .4776829          0          1
. bysort treat: sum
----------------------------------------------------------------------------------------------------
-> treat = 0

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       treat |       260           0           0          0          0
         age |       260    25.05385    7.057745         17         55
         edu |       260    10.08846    1.614325          3         14
       black |       260    .8269231    .3790434          0          1
        hisp |       260    .1076923    .3105893          0          1
-------------+--------------------------------------------------------
     married |       260    .1538462    .3614971          0          1
    nodegree |       260    .8346154    .3722439          0          1
        re74 |       260    2107.027    5687.906          0   39570.68
        re75 |       260    1266.909    3102.982          0   23031.98
        re78 |       260    4554.801    5483.836          0   39483.53
-------------+--------------------------------------------------------
         u74 |       260         .25    .4338478          0          1
         u75 |       260    .3153846    .4655651          0          1

----------------------------------------------------------------------------------------------------
-> treat = 1

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       treat |       185           1           0          1          1
         age |       185    25.81622    7.155019         17         48
         edu |       185    10.34595     2.01065          4         16
       black |       185    .8432432    .3645579          0          1
        hisp |       185    .0594595    .2371244          0          1
-------------+--------------------------------------------------------
     married |       185    .1891892    .3927217          0          1
    nodegree |       185    .7081081    .4558666          0          1
        re74 |       185    2095.574     4886.62          0   35040.07
        re75 |       185    1532.055    3219.251          0   25142.24
        re78 |       185    6349.144    7867.402          0   60307.93
-------------+--------------------------------------------------------
         u74 |       185    .2918919    .4558666          0          1
         u75 |       185          .4    .4912274          0          1

.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile treat age edu black hisp married nodegree re74 re75 re78 u74 u75 /*
> */using nswre74_all.asc, replace
.
. ** Calculate the benchmark Treatment Effect
. ** Same as DW02 Tables 2 and 3 NSW row second last column
. ** and is the number given in CT page 894 second last line
.
. regress re78 treat
      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  1,   443) =    8.04
       Model |   348013183     1   348013183           Prob > F      =  0.0048
    Residual |  1.9178e+10   443  43290369.3           R-squared     =  0.0178
-------------+------------------------------           Adj R-squared =  0.0156
       Total |  1.9526e+10   444  43976681.9           Root MSE      =  6579.5

------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   1794.342   632.8534     2.84   0.005     550.5745     3038.11
       _cons |   4554.801   408.0459    11.16   0.000     3752.855    5356.747
------------------------------------------------------------------------------
.
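
Editorial note: in a regression of the outcome on a constant and a single 0/1 treatment dummy, the slope equals the difference between the treated and control sample means, and the intercept equals the control mean. The Python sketch below checks this against the group summaries reported earlier in this log; the variable names are illustrative.

```python
# Group sizes and mean re78 copied from the "bysort treat: sum" output.
n_treated, mean_treated = 185, 6349.144
n_control, mean_control = 260, 4554.801

# Slope of "regress re78 treat" is the difference in group means.
effect = mean_treated - mean_control

# Weighted average of group means recovers the pooled mean of re78.
pooled_mean = (n_treated * mean_treated + n_control * mean_control) / (
    n_treated + n_control
)

print(f"difference in means = {effect:.3f}")
print(f"pooled mean of re78 = {pooled_mean:.3f}")
```

The difference agrees with the reported coefficient of 1794.342 up to rounding in the displayed means, and the control mean 4554.801 matches the reported constant.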
. ********** (2) ANALYSIS: NSW TREATED VERSUS CPS CONTROLS **********
.
. * This data set has NSW treated and full CPS controls
. use propensity_cps.dta, clear
.
. * Variables u74, u75 were evaluated wrongly in the original file
. * So make the following correction
. drop u74 u75
. gen u74=0
. replace u74=1 if re74==0
(2044 real changes made)
. gen u75=0
. replace u75=1 if re75==0
(1859 real changes made)
. gen age2=age*age
. gen age3=age2*age
. gen edu2=edu*edu
. gen edure74=edu*re74
. * Not sure whether this is needed
. * Does DW99 use edu*re74*age3 or separately edu*re74 and age3 ?
. gen edre74age3=edu*re74*age3
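
Editorial note: the commands above rebuild the zero-earnings indicators and create polynomial and interaction terms. The Python sketch below mirrors those steps on toy records and checks that the reported numbers of changes match the sample means of u74 and u75 in the summary that follows. The helper name and the toy records are hypothetical.

```python
def add_derived(row):
    """Return a copy of row with the indicators and derived terms added."""
    out = dict(row)
    out["u74"] = 1 if row["re74"] == 0 else 0   # gen u74=0; replace u74=1 if re74==0
    out["u75"] = 1 if row["re75"] == 0 else 0   # gen u75=0; replace u75=1 if re75==0
    out["age2"] = row["age"] ** 2
    out["age3"] = out["age2"] * row["age"]
    out["edu2"] = row["edu"] ** 2
    out["edure74"] = row["edu"] * row["re74"]
    return out

# Toy records (hypothetical values).
toy = [
    {"age": 25, "edu": 10, "re74": 0.0, "re75": 1500.0},
    {"age": 40, "edu": 12, "re74": 9000.0, "re75": 0.0},
]
derived = [add_derived(r) for r in toy]

# Counts of changes reported by the two replace commands, out of N = 16177.
share_u74 = 2044 / 16177
share_u75 = 1859 / 16177
print(derived[0], round(share_u74, 7), round(share_u75, 7))
```

The shares 2044/16177 and 1859/16177 reproduce the means .1263522 and .1149162 reported for u74 and u75 in the full-sample summary.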
.
. ** Summarize these data
. sum
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       treat |     16177     .011436    .1063292          0          1
         age |     16177    33.14051    11.03651         16         55
         edu |     16177    12.00828    2.868005          0         18
       black |     16177    .0823391    .2748892          0          1
        hisp |     16177    .0718922    .2583173          0          1
-------------+--------------------------------------------------------
     married |     16177    .7057551    .4557167          0          1
    nodegree |     16177    .3005502    .4585115          0          1
        re74 |     16177    13880.47    9613.115          0   35040.07
        re75 |     16177    13512.21    9313.207          0   25243.55
        re78 |     16177    14749.48    9670.996          0   60307.93
-------------+--------------------------------------------------------
         u74 |     16177    .1263522    .3322562          0          1
         u75 |     16177    .1149162    .3189307          0          1
        age2 |     16177     1220.09    783.4604        256       3025
        age3 |     16177    48988.49    45032.59       4096     166375
        edu2 |     16177    152.4238    67.06033          0        324
-------------+--------------------------------------------------------
     edure74 |     16177    169452.3    129585.8          0     490561
  edre74age3 |     16177    9.53e+09    1.21e+10          0   7.75e+10
. bysort treat: sum
----------------------------------------------------------------------------------------------------
-> treat = 0

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       treat |     15992           0           0          0          0
         age |     15992    33.22524    11.04522         16         55
         edu |     15992    12.02751    2.870846          0         18
       black |     15992    .0735368    .2610237          0          1
        hisp |     15992     .072036    .2585556          0          1
-------------+--------------------------------------------------------
     married |     15992    .7117309    .4529712          0          1
    nodegree |     15992    .2958354    .4564316          0          1
        re74 |     15992     14016.8    9569.796          0   25862.32
        re75 |     15992     13650.8    9270.403          0   25243.55
        re78 |     15992    14846.66    9647.392          0   25564.67
-------------+--------------------------------------------------------
         u74 |     15992    .1196223    .3245295          0          1
         u75 |     15992    .1093047    .3120308          0          1
        age2 |     15992    1225.906    784.7382        256       3025
        age3 |     15992    49305.85    45139.01       4096     166375
        edu2 |     15992    152.9023    67.16633          0        324
-------------+--------------------------------------------------------
     edure74 |     15992    171147.6    129218.8          0   465521.8
  edre74age3 |     15992    9.64e+09    1.21e+10          0   7.75e+10

----------------------------------------------------------------------------------------------------
-> treat = 1

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       treat |       185           1           0          1          1
         age |       185    25.81622    7.155019         17         48
         edu |       185    10.34595     2.01065          4         16
       black |       185    .8432432    .3645579          0          1
        hisp |       185    .0594595    .2371244          0          1
-------------+--------------------------------------------------------
     married |       185    .1891892    .3927217          0          1
    nodegree |       185    .7081081    .4558666          0          1
        re74 |       185    2095.574     4886.62          0   35040.07
        re75 |       185    1532.055    3219.251          0   25142.24
        re78 |       185    6349.144    7867.402          0   60307.93
-------------+--------------------------------------------------------
         u74 |       185    .7081081    .4558666          0          1
         u75 |       185          .6    .4912274          0          1
        age2 |       185    717.3946    431.2517        289       2304
        age3 |       185    21554.66    20964.71       4913     110592
        edu2 |       185    111.0595    39.30388         16        256
-------------+--------------------------------------------------------
     edure74 |       185    22898.73    57393.97          0     490561
  edre74age3 |       185    4.28e+08    1.24e+09          0   8.75e+09

.
. * Write data to a text (ascii) file so can use with programs other than Stata
. * This has data as original except for recode of u74 and u75
. outfile treat age edu black hisp married nodegree re74 re75 re78 u74 u75 /*
> */ using propensity_cps.asc, replace
.
. ** Number of replications to use in the bootstrap
. ** Ideally at least 400
. global breps 200
.
. *** (2A) CPS propensity score model from DW02 Table 2 footnote A
.
. global CPSDW02 age age2 age3 edu edu2 married nodegree black hisp re74 re75 u74 u75 edure74
.
. * With common support option
. pscore treat $CPSDW02, pscore(myscore) blockid(myblock) comsup numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is treat


      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score


Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -612.55814
Iteration 2:   log likelihood = -481.71035
Iteration 3:   log likelihood =  -428.3351
Iteration 4:   log likelihood = -409.00437
Iteration 5:   log likelihood = -404.57736
Iteration 6:   log likelihood = -404.16676
Iteration 7:   log likelihood = -404.15991
Iteration 8:   log likelihood = -404.15991

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(14)     =    1213.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -404.15991                       Pseudo R2       =     0.6003

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2.425229   .3500652     6.93   0.000     1.739114    3.111344
        age2 |  -.0672395   .0111308    -6.04   0.000    -.0890555   -.0454234
        age3 |   .0005685   .0001113     5.11   0.000     .0003505    .0007866
         edu |   .9247848   .2500694     3.70   0.000     .4346577    1.414912
        edu2 |  -.0572021   .0136202    -4.20   0.000    -.0838972   -.0305071
     married |  -1.556471   .2517687    -6.18   0.000    -2.049929   -1.063014
    nodegree |   .9270591   .3254621     2.85   0.004     .2891651    1.564953
       black |   3.850668   .2662868    14.46   0.000     3.328755     4.37258
        hisp |   1.673885    .409913     4.08   0.000     .8704705      2.4773
        re74 |  -.0002203   .0001086    -2.03   0.043    -.0004332   -7.40e-06
        re75 |  -.0001969   .0000378    -5.21   0.000     -.000271   -.0001228
         u74 |   1.749522   .2897311     6.04   0.000      1.18166    2.317385
         u75 |     .00944    .257531     0.04   0.971    -.4953115    .5141915
     edure74 |   .0000222   9.08e-06     2.45   0.014     4.43e-06      .00004
       _cons |  -35.22098   3.797922    -9.27   0.000    -42.66477   -27.77719
------------------------------------------------------------------------------
note: 3 failures and 0 successes completely determined.

Note: the common support option has been selected


The region of common support is [.00106139, .93845543]

Description of the estimated propensity score
in region of common support

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0010892       .0010614
 5%      .001221       .0010615
10%     .0013925       .0010625       Obs                4041
25%     .0021398       .0010632       Sum of Wgt.        4041

50%     .0053823                      Mean           .0452964
                        Largest       Std. Dev.      .1326324
75%     .0156111       .9356451
90%     .0856723         .93718       Variance       .0175914
95%      .282253       .9374608       Skewness       4.475994
99%      .822637       .9384554       Kurtosis       24.36564

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 8


This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
  .0010614 |     3,214         18 |     3,232
      .025 |       240          8 |       248
       .05 |       172         14 |       186
        .1 |        96         19 |       115
        .2 |        86         32 |       118
        .4 |        31         38 |        69
        .6 |         9         20 |        29
        .8 |         8         36 |        44
-----------+----------------------+----------
     Total |     3,856        185 |     4,041
Note: the common support option has been selected

*******************************************
End of the algorithm to estimate the pscore
*******************************************
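
The region of common support reported above has the same endpoints as the smallest and largest scores in the restricted sample. A minimal sketch of such a restriction follows, assuming the region is taken as the range of the estimated score among treated units. The data are simulated, and the names x, ps, lo, hi and insupport are hypothetical; none of them come from the original program.

```stata
* Simulated data: one covariate, treatment more likely when x is large
clear
set seed 10101
set obs 2000
generate double x = rnormal()
generate byte treat = runiform() < invlogit(-2 + x)

* Logit propensity score
quietly logit treat x
predict double ps, pr

* Assumed rule: keep scores inside the range observed for treated units
summarize ps if treat == 1, meanonly
scalar lo = r(min)
scalar hi = r(max)
generate byte insupport = inrange(ps, lo, hi)

display "region of common support: [" lo ", " hi "]"
tabulate treat insupport
```

Under this rule every treated unit stays in the sample by construction, and only controls with scores outside the treated range are dropped, which matches the pattern in the log (all 185 treated retained).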
.
. * Without common support option
. drop myscore myblock
. pscore treat $CPSDW02, pscore(myscore) blockid(myblock) numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is treat


      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -612.55814
Iteration 2:   log likelihood = -481.71035
Iteration 3:   log likelihood =  -428.3351
Iteration 4:   log likelihood = -409.00437
Iteration 5:   log likelihood = -404.57736
Iteration 6:   log likelihood = -404.16676
Iteration 7:   log likelihood = -404.15991
Iteration 8:   log likelihood = -404.15991

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(14)     =    1213.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -404.15991                       Pseudo R2       =     0.6003

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2.425229   .3500652     6.93   0.000     1.739114    3.111344
        age2 |  -.0672395   .0111308    -6.04   0.000    -.0890555   -.0454234
        age3 |   .0005685   .0001113     5.11   0.000     .0003505    .0007866
         edu |   .9247848   .2500694     3.70   0.000     .4346577    1.414912
        edu2 |  -.0572021   .0136202    -4.20   0.000    -.0838972   -.0305071
     married |  -1.556471   .2517687    -6.18   0.000    -2.049929   -1.063014
    nodegree |   .9270591   .3254621     2.85   0.004     .2891651    1.564953
       black |   3.850668   .2662868    14.46   0.000     3.328755     4.37258
        hisp |   1.673885    .409913     4.08   0.000     .8704705      2.4773
        re74 |  -.0002203   .0001086    -2.03   0.043    -.0004332   -7.40e-06
        re75 |  -.0001969   .0000378    -5.21   0.000     -.000271   -.0001228
         u74 |   1.749522   .2897311     6.04   0.000      1.18166    2.317385
         u75 |     .00944    .257531     0.04   0.971    -.4953115    .5141915
     edure74 |   .0000222   9.08e-06     2.45   0.014     4.43e-06      .00004
       _cons |  -35.22098   3.797922    -9.27   0.000    -42.66477   -27.77719
------------------------------------------------------------------------------
note: 3 failures and 0 successes completely determined.

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     5.92e-07       1.18e-09
 5%     1.72e-06       4.07e-09
10%     3.63e-06       4.24e-09       Obs               16177
25%     .0000196       1.55e-08       Sum of Wgt.       16177

50%     .0001247                      Mean            .011436
                        Largest       Std. Dev.      .0691037
75%     .0010579       .9356451
90%     .0073933         .93718       Variance       .0047753
95%     .0250635       .9374608       Skewness       9.281842
99%     .3620009       .9384554       Kurtosis       99.39697

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 13


This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,635          0 |    11,635
  .0007813 |     1,056          2 |     1,058
  .0015625 |       932          5 |       937
   .003125 |       712          2 |       714
    .00625 |       709          2 |       711
     .0125 |       306          7 |       313
      .025 |       240          8 |       248
       .05 |       172         14 |       186
        .1 |        96         19 |       115
        .2 |        86         32 |       118
        .4 |        31         38 |        69
        .6 |         9         20 |        29
        .8 |         8         36 |        44
-----------+----------------------+----------
     Total |    15,992        185 |    16,177

*******************************************
End of the algorithm to estimate the pscore
*******************************************

.
. * Nearest neighbor matching (random version)
. attnd re78 treat $CPSDW02, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit.


This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185         155     730.380    1049.321       0.696

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

Bootstrapping of standard errors

command:      attnd re78 treat age age2 age3 edu edu2 married nodegree black hisp re74 re75 u74 u75 edure74 , pscore() logit comsup
statistic:    attnd      = r(attnd)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200  730.3805  1280.829  941.0756   -1125.38   2586.141   (N)
             |                                        151.7753   3865.059   (P)
             |                                       -601.5495   1317.795  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185         155     730.380     941.076       0.776

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches
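
The t ratio and the normal-based (N) interval for the nearest-neighbor estimate can be recomputed from the reported point estimate and bootstrap standard error. The following is a short check, assuming the (N) interval uses a t critical value with reps - 1 = 199 degrees of freedom; this is an inference from the reported bounds, not something stated in the log. The scalar names are hypothetical.

```stata
* Values copied from the nearest-neighbor bootstrap output (attnd, 200 reps)
scalar att   = 730.3805
scalar se    = 941.0756

* Assumption: critical value from a t distribution with reps - 1 = 199 df
scalar tcrit = invttail(199, 0.025)

display "t ratio         = " %6.3f (att/se)
display "normal-based CI = [" %9.3f (att - tcrit*se) ", " %9.3f (att + tcrit*se) "]"
```

The same arithmetic applies to the radius, kernel and stratification estimates that follow, using their own point estimates and bootstrap standard errors.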
.
. * Radius matching: Radius=0.0001
. attr re78 treat $CPSDW02, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius.


This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       67        1027   -2935.932     888.041      -3.306

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr re78 treat age age2 age3 edu edu2 married nodegree black hisp re74 re75 u74 u75 edure74 , pscore() logit comsup radius(.0001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200 -2935.932  472.0703  1332.096  -5562.767  -309.0973   (N)
             |                                       -5186.873   438.6902   (P)
             |                                       -5999.987  -950.2962  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       67        1027   -2935.932    1332.096      -2.204

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
.
. * Kernel Matching
. attk re78 treat $CPSDW02, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit.


This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1267.716

---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use
the bootstrap option to get bootstrapped standard errors.

Bootstrapping of standard errors

command:      attk re78 treat age age2 age3 edu edu2 married nodegree black hisp re74 re75 u74 u75 edure74 , pscore() logit comsup bwidth(.06)
statistic:    attk       = r(attk)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attk |   200  1267.716 -64.23519  720.5805  -153.2374   2688.669   (N)
             |                                       -211.0497   2559.206   (P)
             |                                       -136.5283   2594.417  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1267.716     720.580       1.759

---------------------------------------------------------
.
. * Stratification Matching
. atts re78 treat, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1505.512     734.270       2.050

---------------------------------------------------------

Bootstrapping of standard errors

command:      atts re78 treat , pscore(myscore) blockid(myblock) comsup
statistic:    atts       = r(atts)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |   200  1505.512 -9.343635  665.1843   193.7979   2817.227   (N)
             |                                        251.7493   2958.461   (P)
             |                                        252.6815   2985.052  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1505.512     665.184       2.263

---------------------------------------------------------
.
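
A stratification estimate of the ATT averages within-block differences in mean outcomes, weighting each block by its share of treated units. The following is a minimal sketch on simulated data, assuming five equal-width score blocks rather than the data-driven blocks stored in myblock. All names (x, y, ps, block, ybar1, ybar0, diff, att_strat) are hypothetical, and this is not the code used by atts.

```stata
* Simulated data with a true treatment effect of 1500
clear
set seed 12345
set obs 5000
generate double x = rnormal()
generate byte treat = runiform() < invlogit(-2 + x)
generate double y = 1000 + 500*x + 1500*treat + rnormal(0, 300)

* Estimated propensity score and five equal-width blocks (an assumption)
quietly logit treat x
predict double ps, pr
generate byte block = min(5, 1 + floor(5*ps))

* Within-block mean outcome for treated and for controls
egen double ybar1 = mean(cond(treat == 1, y, .)), by(block)
egen double ybar0 = mean(cond(treat == 0, y, .)), by(block)
generate double diff = ybar1 - ybar0

* Averaging over treated units weights each block by its treated count;
* blocks with no controls have missing diff and are skipped
summarize diff if treat == 1, meanonly
scalar att_strat = r(mean)
display "stratification ATT (sketch) = " %9.3f att_strat
```

With blocks this coarse some covariate imbalance remains inside each block, so the sketch is not expected to recover the simulated effect exactly; finer, balance-tested blocks are the reason for the pscore step in the log.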
. *** (2B) CPS propensity score model from DW99 Table 2 footnote A
.
. global CPSDW99 age age2 edu edu2 nodegree married black hisp re74 re75 u74 u75 edure74 age3
.
. * With common support option
. drop myscore myblock
. pscore treat $CPSDW99, pscore(myscore) blockid(myblock) comsup numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is treat


      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -612.55814
Iteration 2:   log likelihood = -481.71035
Iteration 3:   log likelihood =  -428.3351
Iteration 4:   log likelihood = -409.00437
Iteration 5:   log likelihood = -404.57736
Iteration 6:   log likelihood = -404.16676
Iteration 7:   log likelihood = -404.15991
Iteration 8:   log likelihood = -404.15991

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(14)     =    1213.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -404.15991                       Pseudo R2       =     0.6003

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2.425229   .3500652     6.93   0.000     1.739114    3.111344
        age2 |  -.0672395   .0111308    -6.04   0.000    -.0890555   -.0454234
         edu |   .9247848   .2500694     3.70   0.000     .4346577    1.414912
        edu2 |  -.0572021   .0136202    -4.20   0.000    -.0838972   -.0305071
    nodegree |   .9270591   .3254621     2.85   0.004     .2891651    1.564953
     married |  -1.556471   .2517687    -6.18   0.000    -2.049929   -1.063014
       black |   3.850668   .2662868    14.46   0.000     3.328755     4.37258
        hisp |   1.673885    .409913     4.08   0.000     .8704705      2.4773
        re74 |  -.0002203   .0001086    -2.03   0.043    -.0004332   -7.40e-06
        re75 |  -.0001969   .0000378    -5.21   0.000     -.000271   -.0001228
         u74 |   1.749522   .2897311     6.04   0.000      1.18166    2.317385
         u75 |     .00944    .257531     0.04   0.971    -.4953115    .5141915
     edure74 |   .0000222   9.08e-06     2.45   0.014     4.43e-06      .00004
        age3 |   .0005685   .0001113     5.11   0.000     .0003505    .0007866
       _cons |  -35.22098   3.797922    -9.27   0.000    -42.66477   -27.77719
------------------------------------------------------------------------------
note: 3 failures and 0 successes completely determined.

Note: the common support option has been selected


The region of common support is [.00106139, .93845543]

Description of the estimated propensity score
in region of common support

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0010892       .0010614
 5%      .001221       .0010615
10%     .0013925       .0010625       Obs                4041
25%     .0021398       .0010632       Sum of Wgt.        4041

50%     .0053823                      Mean           .0452964
                        Largest       Std. Dev.      .1326324
75%     .0156111       .9356451
90%     .0856723         .93718       Variance       .0175914
95%      .282253       .9374608       Skewness       4.475994
99%      .822637       .9384554       Kurtosis       24.36564

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 8


This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
  .0010614 |     3,214         18 |     3,232
      .025 |       240          8 |       248
       .05 |       172         14 |       186
        .1 |        96         19 |       115
        .2 |        86         32 |       118
        .4 |        31         38 |        69
        .6 |         9         20 |        29
        .8 |         8         36 |        44
-----------+----------------------+----------
     Total |     3,856        185 |     4,041
Note: the common support option has been selected

*******************************************
End of the algorithm to estimate the pscore
*******************************************
.
. * Without common support option
. drop myscore myblock
. pscore treat $CPSDW99, pscore(myscore) blockid(myblock) numblo(5) level(0.005) logit


****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is treat


      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -612.55814
Iteration 2:   log likelihood = -481.71035
Iteration 3:   log likelihood =  -428.3351
Iteration 4:   log likelihood = -409.00437
Iteration 5:   log likelihood = -404.57736
Iteration 6:   log likelihood = -404.16676
Iteration 7:   log likelihood = -404.15991
Iteration 8:   log likelihood = -404.15991

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(14)     =    1213.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -404.15991                       Pseudo R2       =     0.6003

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2.425229   .3500652     6.93   0.000     1.739114    3.111344
        age2 |  -.0672395   .0111308    -6.04   0.000    -.0890555   -.0454234
         edu |   .9247848   .2500694     3.70   0.000     .4346577    1.414912
        edu2 |  -.0572021   .0136202    -4.20   0.000    -.0838972   -.0305071
    nodegree |   .9270591   .3254621     2.85   0.004     .2891651    1.564953
     married |  -1.556471   .2517687    -6.18   0.000    -2.049929   -1.063014
       black |   3.850668   .2662868    14.46   0.000     3.328755     4.37258
        hisp |   1.673885    .409913     4.08   0.000     .8704705      2.4773
        re74 |  -.0002203   .0001086    -2.03   0.043    -.0004332   -7.40e-06
        re75 |  -.0001969   .0000378    -5.21   0.000     -.000271   -.0001228
         u74 |   1.749522   .2897311     6.04   0.000      1.18166    2.317385
         u75 |     .00944    .257531     0.04   0.971    -.4953115    .5141915
     edure74 |   .0000222   9.08e-06     2.45   0.014     4.43e-06      .00004
        age3 |   .0005685   .0001113     5.11   0.000     .0003505    .0007866
       _cons |  -35.22098   3.797922    -9.27   0.000    -42.66477   -27.77719
------------------------------------------------------------------------------
note: 3 failures and 0 successes completely determined.

Description of the estimated propensity score

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     5.92e-07       1.18e-09
 5%     1.72e-06       4.07e-09
10%     3.63e-06       4.24e-09       Obs               16177
25%     .0000196       1.55e-08       Sum of Wgt.       16177

50%     .0001247                      Mean            .011436
                        Largest       Std. Dev.      .0691037
75%     .0010579       .9356451
90%     .0073933         .93718       Variance       .0047753
95%     .0250635       .9374608       Skewness       9.281842
99%     .3620009       .9384554       Kurtosis       99.39697

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 13


This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************

The balancing property is satisfied

This table shows the inferior bound, the number of treated
and the number of controls for each block

  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,635          0 |    11,635
  .0007813 |     1,056          2 |     1,058
  .0015625 |       932          5 |       937
   .003125 |       712          2 |       714
    .00625 |       709          2 |       711
     .0125 |       306          7 |       313
      .025 |       240          8 |       248
       .05 |       172         14 |       186
        .1 |        96         19 |       115
        .2 |        86         32 |       118
        .4 |        31         38 |        69
        .6 |         9         20 |        29
        .8 |         8         36 |        44
-----------+----------------------+----------
     Total |    15,992        185 |    16,177

*******************************************
End of the algorithm to estimate the pscore
*******************************************
.
. * Nearest neighbor matching (random version)
. attnd re78 treat $CPSDW99, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit.


This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185         155     730.380    1049.321       0.696

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

Bootstrapping of standard errors

command:      attnd re78 treat age age2 edu edu2 nodegree married black hisp re74 re75 u74 u75 edure74 age3 , pscore() logit comsup
statistic:    attnd      = r(attnd)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200  730.3805  1179.371  964.5437  -1171.658   2632.419   (N)
             |                                       -9.143144   3738.959   (P)
             |                                       -638.1188   1625.387  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185         155     730.380     964.544       0.757

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches
.
. * Radius matching: Radius=0.0001
. attr re78 treat $CPSDW99, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius.


This operation may take a while.


ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       67        1027   -2935.932     888.041      -3.306

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:      attr re78 treat age age2 edu edu2 nodegree married black hisp re74 re75 u74 u75 edure74 age3 , pscore() logit comsup radius(.0001)
statistic:    attr       = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200 -2935.932  522.4813  1276.508   -5453.15  -418.7147   (N)
             |                                       -5239.598   302.9884   (P)
             |                                       -6023.029  -1232.031  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

       67        1027   -2935.932    1276.508      -2.300

---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
.
. * Kernel Matching
. attk re78 treat $CPSDW99, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit.


This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1267.716

---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use
the bootstrap option to get bootstrapped standard errors.

Bootstrapping of standard errors

command:      attk re78 treat age age2 edu edu2 nodegree married black hisp re74 re75 u74 u75 edure74 age3 , pscore() logit comsup bwidth(.06)
statistic:    attk       = r(attk)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attk |   200  1267.716 -57.76407  751.2898  -213.7948   2749.227   (N)
             |                                         -304.83   2488.355   (P)
             |                                       -314.1009   2459.423  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1267.716     751.290       1.687

---------------------------------------------------------
.
. * Stratification Matching
. atts re78 treat, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1505.512     734.270       2.050

---------------------------------------------------------

Bootstrapping of standard errors

command:      atts re78 treat , pscore(myscore) blockid(myblock) comsup
statistic:    atts       = r(atts)
....................................................................................................
> ..................................................................................................
> ..


Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |   200  1505.512  61.77066  741.7862    42.7422   2968.282   (N)
             |                                        245.6284   2880.622   (P)
             |                                         348.125   2849.896  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------

      185        3856    1505.512     741.786       2.030

---------------------------------------------------------
.
. *** (2C) CPS propensity score model from Becker-Ichino, 2002 (BI02)
.
. gen re742 = re74*re74
. gen re752 = re75*re75
. gen blacku74 = black*u74
. global CPSBI02 age age2 edu edu2 married black hisp re74 re75 re742 re752 blacku74
.
. * With common support option
. drop myscore myblock
. pscore treat $CPSBI02, pscore(myscore) blockid(myblock) comsup numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************


The treatment is treat


      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score

Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -660.17479
Iteration 2:   log likelihood = -533.64831
Iteration 3:   log likelihood = -462.67008
Iteration 4:   log likelihood = -435.22392
Iteration 5:   log likelihood = -427.14921
Iteration 6:   log likelihood = -425.78297
Iteration 7:   log likelihood = -425.64689
Iteration 8:   log likelihood = -425.64309

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(12)     =    1170.86
                                                  Prob > chi2     =     0.0000
Log likelihood = -425.64309                       Pseudo R2       =     0.5790

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .7902073   .0940972     8.40   0.000     .6057803    .9746344
        age2 |  -.0128161   .0015894    -8.06   0.000    -.0159313   -.0097009
         edu |   .9953909   .2558663     3.89   0.000     .4939022     1.49688
        edu2 |  -.0636036   .0131378    -4.84   0.000    -.0893532   -.0378541
     married |  -1.534639   .2516679    -6.10   0.000    -2.027899   -1.041379
       black |   3.340175   .3032312    11.02   0.000     2.745853    3.934497
        hisp |   1.636367   .3971529     4.12   0.000     .8579614    2.414772
        re74 |  -.0001744   .0000626    -2.79   0.005    -.0002971   -.0000517
        re75 |   -.000168   .0000693    -2.42   0.015    -.0003039   -.0000322
       re742 |   8.06e-09   2.61e-09     3.09   0.002     2.95e-09    1.32e-08
       re752 |  -2.05e-09   3.97e-09    -0.52   0.605    -9.83e-09    5.73e-09
    blacku74 |   1.033264    .288037     3.59   0.000     .4687217    1.597806
       _cons |  -18.16269   1.865757    -9.73   0.000    -21.81951   -14.50588
------------------------------------------------------------------------------
note: 112 failures and 0 successes completely determined.

Note: the common support option has been selected



The region of common support is [.00065577, .90386519]

Description of the estimated propensity score


in region of common support

                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0006768       .0006558
 5%     .0007912        .000656
10%     .0009583       .0006562       Obs                5354
25%     .0016749       .0006566       Sum of Wgt.        5354

50%     .0040446                      Mean           .0343457
                        Largest       Std. Dev.      .1120884
75%     .0089357       .8905055
90%     .0495031        .898552       Variance       .0125638
95%     .1913766       .9023286       Skewness       4.931471
99%     .6773557       .9038652       Kurtosis       29.27201

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 10


This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************
Variable blacku74 is not balanced in block 3
The balancing property is not satisfied
Try a different specification of the propensity score
  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |     4,230         13 |     4,243
     .0125 |       330          7 |       337
      .025 |       231          9 |       240
       .05 |       126         14 |       140
        .1 |       108         23 |       131
        .2 |        87         30 |       117
        .4 |        29         20 |        49
        .5 |        10         24 |        34
        .6 |        12         25 |        37
        .8 |         6         20 |        26
-----------+----------------------+----------
     Total |     5,169        185 |     5,354
Note: the common support option has been selected

*******************************************
End of the algorithm to estimate the pscore
*******************************************
.
. * Without common support option
. drop myscore myblock
. pscore treat $CPSBI02, pscore(myscore) blockid(myblock) numblo(5) level(0.005) logit

****************************************************
Algorithm to estimate the propensity score
****************************************************

The treatment is treat


      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     15,992       98.86       98.86
          1 |        185        1.14      100.00
------------+-----------------------------------
      Total |     16,177      100.00

Estimation of the propensity score


Iteration 0:   log likelihood = -1011.0713
Iteration 1:   log likelihood = -660.17479
Iteration 2:   log likelihood = -533.64831
Iteration 3:   log likelihood = -462.67008
Iteration 4:   log likelihood = -435.22392
Iteration 5:   log likelihood = -427.14921
Iteration 6:   log likelihood = -425.78297
Iteration 7:   log likelihood = -425.64689
Iteration 8:   log likelihood = -425.64309

Logit estimates                                   Number of obs   =      16177
                                                  LR chi2(12)     =    1170.86
                                                  Prob > chi2     =     0.0000
Log likelihood = -425.64309                       Pseudo R2       =     0.5790

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .7902073   .0940972     8.40   0.000     .6057803    .9746344
        age2 |  -.0128161   .0015894    -8.06   0.000    -.0159313   -.0097009
         edu |   .9953909   .2558663     3.89   0.000     .4939022     1.49688
        edu2 |  -.0636036   .0131378    -4.84   0.000    -.0893532   -.0378541
     married |  -1.534639   .2516679    -6.10   0.000    -2.027899   -1.041379
       black |   3.340175   .3032312    11.02   0.000     2.745853    3.934497
        hisp |   1.636367   .3971529     4.12   0.000     .8579614    2.414772
        re74 |  -.0001744   .0000626    -2.79   0.005    -.0002971   -.0000517
        re75 |   -.000168   .0000693    -2.42   0.015    -.0003039   -.0000322
       re742 |   8.06e-09   2.61e-09     3.09   0.002     2.95e-09    1.32e-08
       re752 |  -2.05e-09   3.97e-09    -0.52   0.605    -9.83e-09    5.73e-09
    blacku74 |   1.033264    .288037     3.59   0.000     .4687217    1.597806
       _cons |  -18.16269   1.865757    -9.73   0.000    -21.81951   -14.50588
------------------------------------------------------------------------------
note: 112 failures and 0 successes completely determined.

Description of the estimated propensity score


                 Estimated propensity score
-------------------------------------------------------------
      Percentiles      Smallest
 1%     2.89e-08       1.94e-10
 5%     3.05e-07       1.94e-10
10%     1.20e-06       1.94e-10       Obs               16177
25%     .0000148       1.94e-10       Sum of Wgt.       16177

50%     .0001313                      Mean            .011436
                        Largest       Std. Dev.      .0664629
75%     .0016513       .8905055
90%     .0074369        .898552       Variance       .0044173
95%     .0234798       .9023286       Skewness       8.811019
99%     .3855562       .9038652       Kurtosis       89.82108

******************************************************
Step 1: Identification of the optimal number of blocks
Use option detail if you want more detailed output
******************************************************

The final number of blocks is 14


This number of blocks ensures that the mean propensity score
is not different for treated and controls in each blocks

**********************************************************
Step 2: Test of balancing property of the propensity score
Use option detail if you want more detailed output
**********************************************************
Variable blacku74 is not balanced in block 7
The balancing property is not satisfied
Try a different specification of the propensity score
  Inferior |
  of block |         treat
 of pscore |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,076          1 |    11,077
  .0007813 |       968          2 |       970
  .0015625 |     1,020          2 |     1,022
   .003125 |     1,185          3 |     1,188
    .00625 |       804          5 |       809
     .0125 |       330          7 |       337
      .025 |       231          9 |       240
       .05 |       126         14 |       140
        .1 |       108         23 |       131
        .2 |        87         30 |       117
        .4 |        29         20 |        49
        .5 |        10         24 |        34
        .6 |        12         25 |        37
        .8 |         6         20 |        26
-----------+----------------------+----------
     Total |    15,992        185 |    16,177

*******************************************
End of the algorithm to estimate the pscore
*******************************************
.
. * Nearest neighbor matching (random version)
. attnd re78 treat $CPSBI02, comsup boot reps($breps) dots logit

The program is searching the nearest neighbor of each treated unit.


This operation may take a while.

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185         147    1214.888     988.298       1.229
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches

Bootstrapping of standard errors

command:   attnd re78 treat age age2 edu edu2 married black hisp re74 re75 re742 re752 blacku74 ,
> pscore() logit comsup
statistic: attnd = r(attnd)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
       attnd |   200   1214.888   379.5276   924.3417   -607.8733    3037.65   (N)
             |                                           -199.325   3378.257   (P)
             |                                          -1646.026   2654.964  (BC)
------------------------------------------------------------------------------
Note:  N  = normal
       P  = percentile
       BC = bias-corrected

ATT estimation with Nearest Neighbor Matching method
(random draw version)
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185         147    1214.888     924.342       1.314
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
nearest neighbour matches
.
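
The nearest-neighbour estimate above pairs each treated unit with the control whose propensity score is closest. A minimal Python sketch of that idea follows. It is not the `attnd` implementation: it matches with replacement and breaks ties by taking the first closest control rather than a random draw, and the function name and toy data are hypothetical.

```python
def att_nearest_neighbor(y, d, p):
    """ATT by one-to-one nearest-neighbour matching on the score p.

    y: outcomes, d: treatment indicators (1/0), p: propensity scores.
    Matching is with replacement; ties go to the first closest control
    (an assumption made here for simplicity).
    Returns (att, number of distinct controls actually used).
    """
    treated = [i for i, di in enumerate(d) if di == 1]
    controls = [i for i, di in enumerate(d) if di == 0]
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    used = set()
    diffs = []
    for i in treated:
        # Control with the smallest absolute score distance to unit i.
        j = min(controls, key=lambda c: abs(p[c] - p[i]))
        used.add(j)
        diffs.append(y[i] - y[j])
    return sum(diffs) / len(diffs), len(used)


# Toy data (hypothetical): two treated units, three controls.
y = [10, 12, 5, 7, 20]
d = [1, 1, 0, 0, 0]
p = [0.30, 0.70, 0.28, 0.65, 0.10]
att, n_controls_used = att_nearest_neighbor(y, d, p)
```

The second return value corresponds to the "n. contr." column in the log, which counts controls actually used as matches rather than all available controls.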
. * Radius matching: Radius=0.0001
. attr re78 treat $CPSBI02, comsup boot reps($breps) dots logit radius(0.0001)

The program is searching for matches of treated units within radius.


This operation may take a while.

ATT estimation with the Radius Matching method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       65        1089   -3094.104     857.247      -3.609
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius

Bootstrapping of standard errors

command:   attr re78 treat age age2 edu edu2 married black hisp re74 re75 re742 re752 blacku74 ,
> pscore() logit comsup radius(.0001)
statistic: attr = r(attr)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attr |   200  -3094.104   603.6858   1724.927   -6495.585   307.3775   (N)
             |                                          -5865.623   247.5659   (P)
             |                                          -8184.668  -474.5812  (BC)
------------------------------------------------------------------------------
Note:  N  = normal
       P  = percentile
       BC = bias-corrected

ATT estimation with the Radius Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
       65        1089   -3094.104    1724.927      -1.794
---------------------------------------------------------
Note: the numbers of treated and controls refer to actual
matches within radius
.
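
The bootstrap tables report a bias, a standard error, and a normal-based (N) interval, and the final table reports t as the ATT divided by the bootstrapped standard error. The sketch below shows that arithmetic in Python. The critical value 1.972 is an assumption (roughly the 97.5th percentile of a t distribution with 199 degrees of freedom); it reproduces the radius-matching (N) interval to within rounding, but the log itself does not state the formula. Function names are hypothetical.

```python
import statistics


def bootstrap_summary(observed, replicates):
    """Bias and standard error from a list of bootstrap replicates."""
    bias = statistics.fmean(replicates) - observed
    se = statistics.stdev(replicates)  # sample standard deviation
    return bias, se


def normal_ci(observed, se, crit=1.972):
    """Normal-based interval: observed +/- crit * se.

    crit = 1.972 is an assumed critical value, not taken from the log.
    """
    return observed - crit * se, observed + crit * se


def t_ratio(observed, se):
    """Ratio of the estimate to its standard error."""
    return observed / se


# Radius-matching figures copied from the log above.
lo, hi = normal_ci(-3094.104, 1724.927)
t = t_ratio(-3094.104, 1724.927)

# Toy replicates (hypothetical) to exercise bootstrap_summary.
bias, se = bootstrap_summary(2.5, [1.0, 2.0, 3.0, 4.0, 5.0])
```

The percentile (P) and bias-corrected (BC) intervals need the full set of replicates, which the log does not print, so they are not reproduced here.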
. * Kernel Matching
. attk re78 treat $CPSBI02, comsup boot reps($breps) dots logit

The program is searching for matches of each treated unit.


This operation may take a while.

ATT estimation with the Kernel Matching method

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        5169     881.520           .
---------------------------------------------------------
Note: Analytical standard errors cannot be computed. Use
the bootstrap option to get bootstrapped standard errors.

Bootstrapping of standard errors

command:   attk re78 treat age age2 edu edu2 married black hisp re74 re75 re742 re752 blacku74 ,
> pscore() logit comsup bwidth(.06)
statistic: attk = r(attk)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attk |   200   881.5195   193.3904   741.3048   -580.3012    2343.34   (N)
             |                                          -375.8089   2373.732   (P)
             |                                          -776.3726   2117.355  (BC)
------------------------------------------------------------------------------
Note:  N  = normal
       P  = percentile
       BC = bias-corrected

ATT estimation with the Kernel Matching method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        5169     881.520     741.305       1.189
---------------------------------------------------------
.
. * Stratification Matching
. atts re78 treat, pscore(myscore) blockid(myblock) comsup boot reps($breps) dots

ATT estimation with the Stratification method
Analytical standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        5169    1538.713
---------------------------------------------------------

Bootstrapping of standard errors

command:   atts re78 treat , pscore(myscore) blockid(myblock) comsup
statistic: atts = r(atts)
....................................................................................................
> ..................................................................................................
> ..

Bootstrap statistics                              Number of obs    =     16177
                                                  Replications     =       200

------------------------------------------------------------------------------
Variable     |  Reps   Observed       Bias  Std. Err.   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        atts |   200   1538.713   18.76738   748.4438    62.81438   3014.612   (N)
             |                                           249.6562   3263.537   (P)
             |                                           225.0108   3230.658  (BC)
------------------------------------------------------------------------------
Note:  N  = normal
       P  = percentile
       BC = bias-corrected

ATT estimation with the Stratification method
Bootstrapped standard errors

---------------------------------------------------------
n. treat.   n. contr.         ATT   Std. Err.           t
---------------------------------------------------------
      185        5169    1538.713     748.444       2.056
---------------------------------------------------------

.
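
The stratification estimate above combines within-block comparisons. A minimal Python sketch of the usual stratification formula follows, assuming the ATT is the average of within-block differences in mean outcomes weighted by each block's share of treated units. The function name and toy data are hypothetical, and this is not the `atts` code itself.

```python
from statistics import fmean


def att_stratification(y, d, block):
    """ATT as a treated-share-weighted average of block mean differences.

    y: outcomes, d: treatment indicators (1/0), block: block identifiers.
    Blocks with no treated units get zero weight; a block with treated
    units but no controls raises an error.
    """
    n_treated = sum(d)
    if n_treated == 0:
        raise ValueError("no treated units")
    att = 0.0
    for b in sorted(set(block)):
        y_t = [yi for yi, di, bi in zip(y, d, block) if bi == b and di == 1]
        y_c = [yi for yi, di, bi in zip(y, d, block) if bi == b and di == 0]
        if not y_t:
            continue  # block contributes nothing to the ATT
        if not y_c:
            raise ValueError(f"block {b} has treated units but no controls")
        att += (len(y_t) / n_treated) * (fmean(y_t) - fmean(y_c))
    return att


# Toy data (hypothetical): two blocks.
y = [10, 12, 6, 8, 20, 15, 14]
d = [1, 1, 0, 0, 1, 0, 0]
block = [1, 1, 1, 1, 2, 2, 2]
att = att_stratification(y, d, block)
```

In the toy data block 1 has weight 2/3 and a mean difference of 4, and block 2 has weight 1/3 and a mean difference of 5.5.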
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section6\mma25p3extra.txt
log type: text
closed on: 26 May 2005, 13:26:49
----------------------------------------------------------------------------------------------------

FIGURES
Most of these figures are produced by Stata programs given at this website.
Page  Figure  Brief caption                                                                      File
50    3.1     Social experiment with random assignment                                           ch3-fig1.wmf
89    4.1     Quantile regression estimates of slope coefficient                                 ch4fig1qr.wmf
90    4.2     Quantile regression estimated lines                                                ch4fig2qr.wmf
249   7.1     Power of Wald chi-square test                                                      ch7power.wmf
253   7.2     Density of Wald test statistic of zero slope coefficient                           ch7montecarlo.wmf
296   9.1     Histogram for log wage                                                             ch9hist.wmf
296   9.2     Kernel density estimates for log wage                                              ch9kd1.wmf
297   9.3     Nonparametric regression of log wage on education                                  ch9ksm1.wmf
300   9.4     Kernel density estimates using different kernels                                   ch9kdensu1.wmf
309   9.5     k-NN regression                                                                    ch9ksmma.wmf
310   9.6     Nonparametric regression using Lowess                                              ch9ksmlowess.wmf
317   9.7     Nonparametric estimate of derivative of y with respect to x                        ch9kderiv.wmf
368   11.1    Bootstrap estimate of the density of t-test statistic                              ch11boot.wmf
411   12.1    Halton sequence draws compared to pseudo-random draws
413   12.2    Inverse transformation method for unit exponential draws                           ch12fig2invtransform.wmf
414   12.3    Accept-reject method for random draws                                              ch12fig3envelope.wmf
424   13.1    Bayesian analysis for mean parameter of normal density                             ch13_bayes1.wmf
466   14.1    Charter boat fishing: probit and logit predictions                                 ch14binary.wmf
516   15.1    Generalized random utility model                                                   ch15-Gen-RUM2.wmf
531   16.1    Tobit regression example                                                           ch16condmeans.wmf
540   16.2    Inverse Mills ratio as censoring point c increases                                 ch16millsratio.wmf
575   17.1    Strike duration: Kaplan-Meier survival function                                    kennanstrk.wmf
585   17.2    Weibull distribution: density, survivor, hazard and cumulative hazard functions    ch17weibull.wmf
604   17.3    Unemployment duration: Kaplan-Meier survival function                              km_pt1.wmf
605   17.4    Unemployment duration: survival functions by unemployment insurance                km_pt2.wmf
606   17.5    Unemployment duration: Nelson-Aalen cumulative hazard function                     na_pt1.wmf
606   17.6    Unemployment duration: cumulative hazard functions by unemployment insurance       na_pt2.wmf
627   18.1    Length-biased sampling under stock sampling                                        ch18lbias.wmf
633   18.2    Unemployment duration: exponential-gamma model generalized residuals               exp.wmf
633   18.3    Unemployment duration: Weibull model generalized residuals                         exp_gamma.wmf
635   18.4    Unemployment duration: Weibull model generalized residuals                         weibul16.wmf
636   18.5    Unemployment duration: Weibull-Inverse Gaussian model generalized residuals        weibul16_ig.wmf
661   19.1    Unemployment duration: Cox Competing Risks baseline survival functions             combined_bsf.wmf
662   19.2    Unemployment duration: Cox Competing Risks baseline cumulative hazards             combined_cbh.wmf
712   21.1    Hours and wages: pooled (overall) regression                                       ch21pantot.wmf
713   21.2    Hours and wages: between regression                                                ch21panbe.wmf
713   21.3    Hours and wages: within (fixed effects) regression                                 ch21panfe.wmf
714   21.4    Hours and wages: first differences regression                                      ch21panfd.wmf
793   23.1    Patents and R&D spending: pooled (overall) regression [with corrected labelling of axes]  ch23fig1.wmf
880   25.1    Regression-discontinuity design: example                                           ch25-fig1-rd.wmf
883   25.2    Treatment assignment in sharp and fuzzy RD designs.                                ch25-fig2-rd.wmf
892   25.3    Training impact: earnings against propensity score by treatment status             ch25treatment.wmf
924   27.1    Missing data: examples of missing regressors                                       ch27fig1.wmf

[Figure: flow chart. Eligible subject invited to participate; agrees to participate? (Yes / No); randomize; assign to treatment; assign to control; drop from study.]

[Figure: Slope Estimates as Quantile Varies. Axes: slope and confidence bands; quantile. Series: quantile slope coefficient, upper 95% confidence band, lower 95% confidence band, OLS slope coefficient.]

[Figure: Regression Lines as Quantile Varies. Axes: log household medical expenditure; log household total expenditure. Series: actual data, 90th percentile, median, 10th percentile.]

[Figure: Test Power as a function of the ncp. Axes: test power; noncentrality parameter lambda. Series: test size = 0.10, test size = 0.05, test size = 0.01.]

[Figure: Monte Carlo Simulations of Wald Test. Axes: density; Wald test statistic. Series: Monte Carlo, standard normal.]

[Figure: Histogram for Log Wage. Axes: density; log hourly wage.]

[Figure: Density Estimates as Bandwidth Varies. Axes: kernel density estimates; log hourly wage. Series: one-half plug-in, plug-in, two times plug-in.]

[Figure: Nonparametric Regression as Bandwidth Varies. Axes: log hourly wage; years of schooling. Series: actual data, bandwidth h=0.8, bandwidth h=0.4, bandwidth h=0.1.]

[Figure: Density Estimates as Kernel Varies. Axes: kernel density estimates; log hourly wage. Series: Epanechnikov (h=0.545), Gaussian (h=0.246), Quartic (h=0.646), Uniform (h=0.214).]

[Figure: k-Nearest Neighbours Regression as k Varies. Axes: dependent variable y; regressor x. Series: actual data, kNN (k=5), kNN (k=25), linear OLS.]

[Figure: Lowess Nonparametric Regression. Axes: dependent variable y; regressor x. Series: actual data, Lowess (k=25), OLS cubic regression.]

[Figure: Nonparametric Derivative Estimation. Axis labels: dependent variable y; regressor x. Series: from OLS cubic regression, from Lowess (k=25).]

[Figure: Bootstrap Density of 't-Statistic'. Axes: density; t-statistic from each bootstrap replication. Series: bootstrap estimate, standard normal.]

[Figure: Inverse Transformation Method. Axes: cdf F(x); random variable x. Note: draw of 0.64 (vertical axis) yields x = 1.02 (horizontal axis).]

[Figure: Accept-reject Method. Axes: f(x) and kg(x); random variable x. Series: desired density f(x), envelope kg(x).]

[Figure: Bayes: Likelihood, Prior and Posterior. Axes: density; evaluation point. Series: likelihood N[10,2], prior N[5,3], posterior N[8,1.2].]

[Figure: Predicted Probabilities Across Models. Axes: predicted probability; log relative price (lnrelp). Series: actual data (jittered), logit, probit, OLS.]

[Figure: path diagram. Elements: explanatory variables, disturbances, latent classes, latent variables, indicators, stated preference indicators, utilities, revealed preference indicator y. Legend: observable variable, unobservable variable, structural relationship, measurement relationship, disturbances.]

[Figure: Tobit: Censored and Truncated Means. Axes: different conditional means; natural logarithm of wage. Series: actual latent variable, truncated mean, censored mean, uncensored mean.]

[Figure: Inverse Mills Ratio as Cutoff Varies. Axes: inverse Mills, pdf and cdf; cutoff point c. Series: inverse Mills ratio, N[0,1] cdf, N[0,1] density.]

[Figure: Kaplan-Meier Survival Function Estimate. Axes: survival probability; strike duration in days. Series: survival function, upper 95% confidence band, lower 95% confidence band.]

[Figure: Weibull Distribution. Four panels against duration time: Weibull density, Weibull survivor, Weibull hazard, cumulative hazard.]

[Figure: Overall Survival Function Estimate. Axes: survival probability; unemployment duration in 2-week intervals. Series: survival estimate, upper 95% confidence band, lower 95% confidence band.]

[Figure: Survival Function Estimates by UI Status. Axes: survival probability; unemployment duration in 2-week intervals. Series: No UI (UI = 0), Received UI (UI = 1).]

[Figure: Overall Cumulative Hazard Estimate. Axes: cumulative hazard; unemployment duration in 2-week intervals. Series: cumulative hazard estimate, upper 95% confidence band, lower 95% confidence band.]

[Figure: Cumulative Hazard Estimates by UI Status. Axes: cumulative hazard; unemployment duration in 2-week intervals. Series: No UI (UI = 0), Received UI (UI = 1).]

[Figure: diagram of spells S1 to S9 relative to the survey date and the 12-month survey period.]

[Figure: Exponential Model Residuals. Axes: cumulative hazard; generalized (Cox-Snell) residual. Series: cumulative hazard, 45 degree line.]

[Figure: Exponential-Gamma Model Residuals. Axes: cumulative hazard; generalized (Cox-Snell) residual. Series: cumulative hazard, 45 degree line.]

[Figure: Weibull Model Residuals. Axes: cumulative hazard; generalized (Cox-Snell) residual. Series: cumulative hazard, 45 degree line.]

[Figure: Weibull-IG Model Residuals. Axes: cumulative hazard; generalized (Cox-Snell) residual. Series: cumulative hazard, 45 degree line.]

[Figure: Baseline Survival Functions. Axes: baseline survival probability; unemployment duration in 2-week intervals. Series: Risk 1 (full-time job), Risk 2 (part-time job), Risk 3 (unknown job).]

[Figure: Baseline Cumulative Hazard Functions. Axes: baseline cumulative hazard; unemployment duration in 2-week intervals. Series: Risk 1 (full-time job), Risk 2 (part-time job), Risk 3 (unknown job).]

[Figure: Pooled (Overall) Regression. Axes: log annual hours; log hourly wage. Series: original data, nonparametric fit, linear fit.]

[Figure: Between Regression. Axes: log annual hours; log hourly wage. Series: averages, nonparametric fit, linear fit.]

[Figure: Within (Fixed Effects) Regression. Axes: log annual hours; log hourly wage. Series: deviations from average, nonparametric fit, linear fit.]

[Figure: First Differences Regression. Axes: log annual hours; log hourly wage. Series: first differences, nonparametric fit, linear fit.]

[Figure: Pooled (Overall) Regression. Axes: log patents; log R&D spending. Series: original data, nonparametric fit, linear fit.]

[Figure: Regression Discontinuity Example. Axes: outcome y; selection variable S. Series: actual data, no treat (low), treat (high).]

[Figure: Post-treatment Earnings against Propensity Score. Panels by treatment status: Comparison_sample, Treated_sample. Axes: real earnings 1978; propensity score. Series: original data, nonparametric regression.]

[Figure: Sharp and Fuzzy RD Designs. Axes: propensity score Pr[D=1|S]; selection variable S. Series: sharp design, fuzzy design.]

BOOK CORRECTIONS - June 9, 2005 plus some but not all corrections since
then added
Page        Date Posted  Correction or Addition
p. 85       2/18/2006    Bottom line should be "censored models (see Section 16.9.2)." [Jeff Smith, Michigan]
p. 68, 147  11/22/2005   Liebler should be spelt Leibler [Joerg Stoye, NYU]
p. 89       3/30/2006    Third last line should be "q = 0.1, 0.5, and 0.9" and not "q = 0.1, 0.2, ..., 0.9" [James MacKinnon, Queen's]
p. 113      5/27/2005    Exercise 4-2 part (b) should be "Hence directly obtain a consistent estimate of the variance of _hat" (and not "Hence directly obtain the variance of y_bar")
p. 114      6/9/2005     Exercise 4-7 parts (d)-(f) need to be replaced. See mmaex04_7.pdf.
p. 164      6/9/2005     Exercise 5-1 is correct but the function is close to linear. A better example uses E[y|x]=exp(0+0.04x)/[1+exp(0+0.04x)].
p. 165      6/9/2005     Exercise 5-7 part (c) is ML estimation (delete the word NLS).
p. 168      3/3/2006     Second line after first displayed equation should be E[h(x)(y-g(x,beta))] = 0 (and not E[h(x)(y-x'beta)]) [Doug Miller, UC-Davis]
p. 178      3/3/2006     Last displayed equation. The first and third matrices are wrong and should be similar to G_hat in (6.21). For these matrices the two terms being summed over i should be x_i*x_i' and 3*utilde_i^2*x_i*x_i'. [Doug Miller, UC-Davis]
p. 189      3/6/2006     Theil's interpretation. Change "suppose that in the reduced form model" to "Suppose that we specify a first-stage model where" [Doug Miller, UC-Davis]
p. 190      3/6/2006     Basmann's interpretation. Change "OLS reduced form prediction" to "OLS first-stage predictions". [Doug Miller, UC-Davis]
p. 193      3/6/2006     Top line change "because to regressors" to "because the regressors" [Doug Miller, UC-Davis]
p. 199      5/18/2005    In Table 6.4 NL2SLS column is 0.969, 0.041, 0.84 (and not 0.960, 0.046, 0.85)
p. 214      3/28/2006    In the displayed equation for the 3SLS estimator the matrix OMEGA_hat should be SIGMA_hat. Same change two lines down and four lines down. SIGMA_hat = definition given for OMEGA_hat.
p. 220      5/27/2005    Exercise 6-1 part (a) should be (y - exp(x'beta))^2 (and not (y - (x'beta))^2)
p. 220      5/27/2005    Exercise 6-1 part (d) should be E[x(y - exp(x'beta))] = 0 (so add = 0)
p. 255      5/18/2005    Sample size was N=40 (and not N=30)
p. 255      5/18/2005    Five lines from bottom should be z = (0.817 - 1) / 0.376 = -0.487
p. 256      5/18/2005    In section 7.8.3 the percentiles should be -1.89 and 1.80 (and not -2.62 and 1.83)
p. 278, 280 11/22/2005   Liebler should be spelt Leibler [Joerg Stoye, NYU]
p. 414      5/18/2005    Figure 12.3 vertical axis label should be f(x) and kg(x) and legend should be kg(x) (and not g(x))

p. 493      2/18/2006    First two lines should be "in the probability of fishing from a beach, and an increase of 0.119, 0.080, and 0.068, respectively, in the probability of fishing from a pier, a private boat, and a charter boat." [Jeff Smith, Michigan]
p. 501      3/22/2006    (15.17) and the line before should have minus sign before the expected Hessian. [Frank Windmeijer, Bristol]
p. 505      3/22/2006    Fifth line should be "computer-intensive" not "computer-intesive". [Frank Windmeijer, Bristol]
p. 508      3/22/2006    Possible error in (15.31) needs to be checked
p. 569      5/19/2005    Bibliographic note 16.3 should refer to Tobin (1958) (and not Tobit (1958)) [Kevin Hoover, UCD]
p. 793      4/7/2005     Figure 23.1 axes labels are reversed. Vertical axis is log(patents) and horizontal axis is log(R&D)
p. 839      4/10/2006    Second equality for SIGMA_c^-1 should not have the inverse at the end.
p. 839      4/10/2006    Formula for [I + aee']^(1/2) should finish with ee' and not Mee'.
p. 895      5/26/2005    Table 25.6 footnote b drop RE74*RE75 from the list of regressors
