
Journal of Econometrics 144 (2008) 81–117

Partial identification of probability distributions with misclassified data
Francesca Molinari

Department of Economics, Cornell University, 492 Uris Hall, Ithaca, NY 14853-7601, USA
Received 14 March 2006; received in revised form 19 August 2007; accepted 11 December 2007
Available online 6 January 2008
Abstract
This paper addresses the problem of data errors in discrete variables. When data errors occur, the observed variable is a misclassified version of the variable of interest, whose distribution is not identified. Inferential problems caused by data errors have been conceptualized through convolution and mixture models. This paper introduces the direct misclassification approach. The approach is based on the observation that in the presence of classification errors, the relation between the distribution of the true but unobservable variable and its misclassified representation is given by a linear system of simultaneous equations, in which the coefficient matrix is the matrix of misclassification probabilities. Formalizing the problem in these terms allows one to incorporate any prior information into the analysis through sets of restrictions on the matrix of misclassification probabilities. Such information can have strong identifying power; the direct misclassification approach fully exploits it to derive identification regions for any real functional of the distribution of interest. A method for estimating the identification regions and constructing their confidence sets is given, and illustrated with an empirical analysis of the distribution of pension plan types using data from the Health and Retirement Study.
© 2007 Elsevier B.V. All rights reserved.
JEL classification: C10; C13; C14; J26
Keywords: Misclassification; Partial identification; Direct misclassification approach
1. Introduction
Error-ridden data constitute a significant problem in nearly all fields of science. There are many possible sources of data errors. Examples include use of inexact measures because of high costs or infeasibility of exact evaluation, tendency of study subjects to underreport socially undesirable behaviors and attitudes and overreport socially desirable ones, or imperfect recall (or lack of knowledge) by study subjects. When data errors are present, often the sampling process does not identify the probability distribution of interest, and inference is impaired.
www.elsevier.com/locate/jeconom
0304-4076/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2007.12.003
⁎ Tel.: +1 607 255 6367; fax: +1 607 255 2818.
E-mail address: fm72@cornell.edu

This paper addresses the problem of data errors in discrete variables. Interest in the question emerges from the observation that much of the empirical work in economics and related fields is based on the analysis of survey data. The reliability of these data is well documented to be less than perfect (see for example Bound
et al., 2001). Although survey questions may gather information on variables that are conceptualized as continuous (e.g., age, earnings, etc.), a considerable part of the collected data is in the form of variables taking values in finite sets. Examples include educational attainment, language proficiency, workers' union status, employment status, health conditions, and health/functional status.
When data errors occur in variables of this type, it is natural to think about the problem in terms of classification errors (see for example Bross, 1954; Aigner, 1973). An example may clarify this point. Suppose that an analyst is interested in learning the distribution of pension plan types in the American population. Three types are possible: defined benefit (DB), defined contribution (DC), and plans incorporating features of both. Suppose that the analyst has data from a nationally representative survey which queried a random sample of American households about their pension plans' characteristics. Validation studies document that a significant fraction of the reported plan types differ from the truth; for example, some people who truly have a DB plan are erroneously classified as having a DC plan (Gustman and Steinmeier, 2001).
To formalize the problem, suppose that each member l of a population L is characterized by the vector (w_l, x_l) ∈ X × X, where X is a discrete set, not necessarily ordered, denoted by X ≡ {1, 2, …, J}, 2 ≤ J < ∞. Let a sampling process draw persons at random from L. Suppose that the analyst is interested in learning features of the distribution P(x) from the available data. However, she does not observe realizations of x, but observes realizations of w, which can either be equal to or differ from the realizations of x. In the above example, x denotes the true pension plan type and w the type reported in the survey.
Much of the existing literature on drawing inference in the presence of error-ridden data has conceptualized the problem using either convolution models or mixture models. In the case of convolution models, a latent variable v ∈ V is introduced and w is assumed to measure x with chronic (i.e., affecting each observation) errors-in-variables: w = x + v. Researchers using convolution models commonly assume that the latent variable v is statistically independent from x, or uncorrelated with x and with mean zero (see, e.g., Klepper and Leamer, 1984). In the case of mixture models, latent variables v ∈ V and z ∈ {0, 1} are introduced and w is viewed as a contaminated version of x, generated by the mixture w = zx + (1 − z)v. In this model, z denotes whether x or v is observed, and realizations of w with z = 1 are said to be error-free. Researchers using mixture models commonly assume that the error probability Pr(z = 0) is known, or at least that it can be bounded non-trivially from above (see, e.g., Horowitz and Manski, 1995).
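In simulation, the mixture model takes one line. The sketch below (illustrative distributions and error probability, chosen for this example and not taken from the paper) draws a contaminated sample and confirms that Pr(w = x) is at least Pr(z = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
J = 3

# True variable x and an arbitrary contaminating latent variable v, both on {1, ..., J}.
x = rng.choice(np.arange(1, J + 1), size=n, p=[0.5, 0.3, 0.2])
v = rng.choice(np.arange(1, J + 1), size=n, p=[0.2, 0.3, 0.5])

# z = 1 means the draw is error-free; Pr(z = 0) is the error probability.
error_prob = 0.1
z = rng.random(n) > error_prob

# Mixture model: w = z*x + (1 - z)*v.
w = np.where(z, x, v)

# Since w = x whenever z = 1, Pr(w = x) >= Pr(z = 1) = 1 - error_prob.
agreement = np.mean(w == x)
print(f"Pr(w = x) ~= {agreement:.3f}, lower bound {1 - error_prob}")
```

The bound is typically not tight, since v may coincide with x by chance even when z = 0.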
When a variable with finite support is imperfectly classified, it is widely recognized that the assumption, typical in convolution models, of independence between measurement error and true variable cannot hold (see, for example, Bound et al., 2001, p. 3735). Moreover, compelling evidence from validation studies suggests that errors in the data are occasional rather than chronic: "a significant part of the observed data are error free." Mixture models therefore seem more suited for the analysis of such data. However, often the researcher has prior information on the nature of the misclassification pattern that has transformed x into w. This information may aid in identification, but cannot easily be exploited through a mixture model.
In this paper I propose an alternative framework, which I call the direct misclassification approach, to draw inference on the distribution of discrete variables subject to classification errors. The approach does not rely on the introduction of latent variables, but is based on the observation that in the presence of misclassification, the relation between the observable distribution of w and the unobservable distribution of x is given by

\[
\begin{bmatrix} \Pr(w = 1) \\ \vdots \\ \Pr(w = J) \end{bmatrix}
=
\begin{bmatrix}
\Pr(w = 1 \mid x = 1) & \cdots & \Pr(w = 1 \mid x = J) \\
\vdots & \ddots & \vdots \\
\Pr(w = J \mid x = 1) & \cdots & \Pr(w = J \mid x = J)
\end{bmatrix}
\begin{bmatrix} \Pr(x = 1) \\ \vdots \\ \Pr(x = J) \end{bmatrix}.
\tag{1.1}
\]
In all that follows I denote by Π* the matrix of elements {Pr(w = i | x = j)}_{i,j ∈ X} which appears on the right-hand side of the above equation. For i ≠ j, Pr(w = i | x = j) is generally referred to as a misclassification probability. Eq. (1.1) is a simple formalism and does not have content per se. However, it becomes potentially informative when combined with assumptions on the matrix of misclassification probabilities Π*; such assumptions generate a misclassification model.
The method that I introduce allows one to draw inference on P(x) and on any real functional of this distribution using Eq. (1.1) directly, when restrictions on the elements of Π* are imposed. Due to the classification errors, the identification of the probability distribution P(x) is partial, and the inference on any of its real functionals is in the form of identification regions, that is, sets collecting the feasible values of such functionals. I show that these regions are sharp, in the sense that they exhaust all the available information, given the sampling process and the maintained assumptions. Manski (2003) gives an overview of the literature on partial identification; for other work see, e.g., Hotz et al. (1997) and Blundell et al. (2007).
The restrictions imposed on Π* can have several origins, including validation studies, economic theory, cognitive and social psychology, or information on the circumstances under which the data have been collected. In this paper I study their identifying power in general. I then consider a few specific examples. As a starting point, I assume that the researcher has a known lower bound on the probability that the realizations of w and x coincide, i.e., Pr(w = x) ≥ 1 − λ, or, strengthening this assumption, that the researcher has a known lower bound on the probability of correct report for each value that x can take, i.e., Pr(w = j | x = j) ≥ 1 − λ, ∀ j ∈ X. This information is often provided by validation studies or knowledge of the circumstances under which the data have been collected.^1 In this paper it is regarded as base-case information, and the identification regions derived under these assumptions constitute the baseline of the analysis. Then I consider the case of constant probability of correct report and the case of monotonicity in correct reporting. I show that these assumptions can have identifying power when maintained alone, as well as when imposed jointly with the base-case assumptions.
The assumption of constant probability of correct report is motivated by the findings of validation studies. For specific survey inquiries, these studies suggest that the probability of correct report, for at least a subset of the values that x can take, is constant. For example, in the context of self-reports of employment status, the analysis of Poterba and Summers (1995) suggests that there is approximately the same probability of correct report for people who are employed and for those who are not in the labor force, but a much lower probability of correct report for people who are unemployed.
The assumption of monotonicity in correct reporting is motivated by social psychology, which suggests that when survey respondents are asked questions about socially and personally sensitive topics, they tend to underreport socially undesirable behaviors and attitudes, and overreport socially desirable ones. This suggestion is supported by validation studies, which often document, within a given survey inquiry, that the probability of correct report of a certain alternative is greater than or equal to the probability of correct report of a less socially desirable alternative. This is the case, for example, when survey respondents are asked about their participation in welfare programs.
The proposed method allows the researcher to easily incorporate these assumptions, and in general any restriction on the misclassification pattern, into the analysis. The method is easy to implement and often computationally tractable (see Section 2.2 for a discussion of computational issues). Despite the fact that the results of validation studies on discrete variables are often presented in the form of matrices of misclassification probabilities (see, e.g., Bound et al., 2001), and despite the appeal of the simple formalization given by the misclassification models, there appear to be no precedents to the direct use of Eq. (1.1) to deal with the identification problems caused by classification errors.
However, there are precedents to the use of specific restrictions on misclassification probabilities. Aigner (1973), Klepper (1988) and Bollinger (1996) imposed different sets of assumptions on the probabilities of misclassifying a dichotomous variable x and derived sharp non-parametric bounds on the mean regression E(y | x). Their approach is close in spirit to the one in this paper, but their methods are designed exclusively for binary variables and for the case in which specific assumptions hold. Swartz et al. (2004) discuss identification problems due to misclassification from a Bayesian perspective. In particular, they focus on permutation-type non-identifiability, by which switching the positions of Pr(x = i) and Pr(x = j), and those of P(w | x = i) and P(w | x = j), leaves the implied distribution P(w) unchanged. They introduce several assumptions on the matrix of misclassification probabilities which overcome this type of problem, and achieve point identification by imposing a prior on the misclassification matrix and on P(x).
^1 Availability of a lower bound on the error probability is a commonplace assumption in the statistics literature on robust estimation, which makes use of mixture models. For example, Hampel (1974) and Hampel et al. (1986) state that the proportion of gross errors in data, depending on circumstances, is normally "between 0.1% and 10%, with several percent being the rule rather than the exception" (p. 387 and p. 28, respectively).
Most of the related literature in econometrics (e.g., Card, 1996; Hausman et al., 1998; Abrevaya and Hausman, 1999; Lewbel, 2000; Dustmann and van Soest, 2000; Kane et al., 1999; Ramalho, 2002) proposes methods imposing restrictions on misclassification probabilities to achieve parametric or semiparametric identification of the quantities of interest (i.e., features of P(y | x), or, less often, P(x)).^2 As such, these methods are subject to criticisms against possible misspecifications; moreover, while the assumptions employed might hold in some data sets, there might be other data sets for which they do not hold, and in that case the methods cannot be applied. Additionally, these assumptions are often maintained for technical reasons and do not have an obvious interpretation.
Horowitz and Manski (1995, HM henceforth) introduced fully non-parametric methods to draw inference on features of the distribution of a random variable x when the sampling process is corrupted or contaminated. They adopted a mixture model and showed that if the researcher has a (non-trivial) lower bound 1 − λ on the probability that the realization of w is drawn from the distribution of x, informative bounds can be obtained on any parameter of the distribution P(x) that respects stochastic dominance. HM showed that these bounds are sharp, in the sense that they exhaust all the available information, given the sampling process and the maintained assumptions. The assumptions they entertain imply the base-case assumptions on Π* introduced above, namely Pr(w = x) ≥ 1 − λ, and Pr(w = j | x = j) ≥ 1 − λ, ∀ j ∈ X.^3 When only these assumptions are maintained, in terms of identification of the types of parameters considered by HM, the method developed in this paper is equivalent to the one they proposed.
However, the applied researcher often has different, and perhaps more, information available beyond that maintained by HM. This information can have strong identifying power, but cannot easily be used within a mixture model. In particular, for each additional assumption that the researcher wants to bring to bear, she needs to derive new sharp identification regions for the parameters of interest. Closed-form results are often not easy to obtain, and different (possibly computationally challenging) calculation methods for the bounds and confidence sets may need to be devised for each different set of assumptions.
The direct misclassification approach, on the other hand, does not rely on any specific set of assumptions, but can incorporate any prior information on the misreporting pattern into the analysis. For any set of maintained assumptions, the method guarantees sharpness of the implied identification regions, and these regions and their confidence sets can be estimated using a relatively simple method introduced in Section 2.
In this paper I focus on a single misclassified variable x. The method easily extends to drawing inference on features of the distribution of x conditional on a perfectly observed covariate, or on the joint distribution of several misclassified variables taking values in finite sets. Given an outcome variable of interest y ∈ Y, the approach also extends to drawing inference on features of the distribution P(y | x) when x is subject to classification errors. Moreover, it can allow one to draw inference when the data are not only error-ridden, but also incomplete, a situation very common in practice. In fact, in the presence of both misclassified and missing data, the matrix in Eq. (1.1) simply becomes rectangular rather than square, with additional rows giving the probabilities of having missing data, conditional on the true values of x.
The paper is organized as follows. Section 2 introduces the method, describes connectedness properties of the identification regions, outlines how the identification regions can be estimated consistently, and proposes a procedure to calculate confidence sets for the identification regions. Section 3 studies the identifying power of a few specific assumptions, some of which have not been previously considered in the literature. Section 4 illustrates the estimation method with an application to data on the distribution of pension plan characteristics in the American population. Section 5 discusses extensions of the direct misclassification approach. Section 6 concludes. All of the mathematical details are in Appendix A.
^2 Specific restrictions include the following: Bross (1954), when introducing the misclassification problem for binary data, assumed that Pr(w = 1 | x = 0) and Pr(w = 0 | x = 1) are of the same order of magnitude. Usually with binary data it is assumed either that Pr(w = 1 | x = 0) = Pr(w = 0 | x = 1) < 1/2 (e.g., Klepper, 1988; Card, 1996), or that Pr(w = 1 | x = 0) + Pr(w = 0 | x = 1) < 1 (e.g., Bollinger, 1996; Hausman et al., 1998). When J > 2, it is assumed that other monotonicity restrictions between the elements of Π* hold (e.g., Abrevaya and Hausman, 1999; Dustmann and van Soest, 2000), or that specific types of misclassification do not occur (Gong et al., 1990).
^3 If the researcher has an upper bound λ on the error probability, and the sampling process is corrupted, the first assumption follows; if the sampling process is contaminated, the second assumption follows. These results are rigorously proved in Molinari (2003).
2. The direct misclassification approach

In all that follows, to keep the focus on identification, I treat identified quantities as population parameters, and I assume that Pr(w = j) > 0 ∀ j ∈ X. A method to consistently estimate the identification regions and construct their confidence sets is provided at the end of this section.
Let P^w denote the column vector [P^w_j, j ∈ X] ≡ [Pr(w = j), j ∈ X], P^x the column vector [Pr(x = j), j ∈ X], and Π* the stochastic matrix which, through Eq. (1.1), generates the misclassification of x into w. Denote the elements of Π* by π*_{ij} ≡ {Pr(w = i | x = j)}, i, j ∈ X, and the columns of Π* by π*_j. Let Ψ_X denote the space of all probability distributions on X, and define analogously Ψ_{X×X}; let ℝ denote the real line. Let τ : Ψ_X → ℝ be a real functional of P(x), denoted τ[P^x], with analogous definitions for functionals of the joint distribution of (w, x). A particularly simple functional of P(x) is τ[P^x] = E[1(x = j)] = Pr(x = j), j ∈ X. For any given matrix of functionals of interest Θ, let H[Θ] denote its identification region.
Given this notation, I can rewrite Eq. (1.1) as

\[
P^w = \Pi^* P^x. \tag{2.1}
\]
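A small numerical instance of (2.1), with a hypothetical Π* and P^x chosen purely for illustration:

```python
import numpy as np

# Illustrative true distribution P^x on X = {1, 2, 3} and a stochastic matrix Pi
# whose (i, j) entry is Pr(w = i | x = j); each column sums to one.
P_x = np.array([0.5, 0.3, 0.2])
Pi = np.array([
    [0.8, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.1, 0.1, 0.8],
])

# Eq. (2.1): the observable distribution of w is the matrix product Pi @ P^x.
P_w = Pi @ P_x
print(P_w)  # a valid probability mass function
```

Each entry of P_w mixes the true probabilities according to the corresponding row of misclassification probabilities.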
The direct misclassification approach starts from the observation that Pr(x = j), j ∈ X, enters each of the J equations in system (1.1). Hence, each one of these equations can, potentially, imply restrictions on Pr(x = j), and therefore on P^x and τ[P^x]. The extent to which this is the case crucially depends on what assumptions are imposed on the misreporting pattern.
The approach is quite intuitive. If Π* were known, and of full rank, I would be able to solve the system of linear equations in (2.1) and uniquely identify P^x, and therefore τ[P^x]. In practice, the misclassification probabilities π*_{ij}, i, j ∈ X, are known only to belong to a set H[Π*], defined below. This set accounts both for the restrictions coming from probability theory and for the restrictions on the misreporting pattern coming from validation studies, social and cognitive psychology, economic theory, etc. Denote the elements of H[Π*] by Π ≡ {π_{ij}}_{i,j ∈ X}, and the columns of this matrix by π_j, j ∈ X. When H[Π*] is not a singleton, P^x is not identified and τ[P^x] need not be identified, but only known, respectively, to lie in the identification regions H[P^x] and H{τ[P^x]}.
The identification region H[P^x] is defined as the set of column vectors p^x = [p^x_k, k ∈ X] such that, given Π ∈ H[Π*], p^x solves system (2.1):

\[
H[P^x] = \{ p^x : P^w = \Pi p^x,\ \Pi \in H[\Pi^*] \}. \tag{2.2}
\]

In the next subsection, H[Π*] is formally defined and characterized in such a way that ∀ Π ∈ H[Π*], p^x_k ≥ 0, ∀ k ∈ X, and Σ_{k=1}^J p^x_k = 1.
Throughout this paper, the notation p^x is reserved for elements of H[P^x] and the notation p^x_k for the kth component of a vector p^x. Hence, p^x_k and p^x represent, respectively, feasible values of Pr(x = k), k ∈ X, and [Pr(x = j), j ∈ X], given Π ∈ H[Π*] and Eq. (2.1). By construction,

p^x ≡ p^x(Π, P^w),
p^x_k = p^x_k(Π, P^w), k ∈ X.

For ease of notation, I omit the arguments of p^x_k and p^x. The identification region H{τ[P^x]} is then defined as

\[
H\{\tau[P^x]\} = \{ \tau[p^x] : p^x \in H[P^x] \}. \tag{2.3}
\]
The set H[Π*] is of central importance for the identification of P^x and τ[P^x], as the identification regions of these functionals are defined on the basis of H[Π*]. I denote by H_P[Π*] the set of matrices that satisfy the probabilistic constraints and by H_E[Π*] the set of matrices satisfying the constraints coming from validation studies and theories developed in the social sciences. Hence,

H[Π*] = H_P[Π*] ∩ H_E[Π*].
The geometry of H[Π*] and its connectedness properties are of particular interest, because the continuous image of a connected set is connected. Hence, if H[Π*] is connected and p^x is a continuous function of Π, H[P^x] is connected as well, and so is H{τ[P^x]} if τ(·) is a continuous functional. Conversely, if H[Π*] is not connected, or if the functionals are not continuous, H[P^x] and H{τ[P^x]} need not be connected. This has implications for the estimation of the identification regions. Consider, for example, the case in which interest centers on a real-valued functional τ[P^x]. When H{τ[P^x]} is a connected set, it is given by the entire interval between its smallest and its largest points; hence, by estimating these two points one obtains an estimate of the entire identification region. When H{τ[P^x]} is disconnected, parts of the interval between the smallest and the largest points are not feasible and therefore are not elements of the identification region. Section 2.2 introduces a method to estimate H{τ[P^x]} when this is the case.
A relevant example of a case in which p^x is a continuous function of Π is obtained when each matrix Π ∈ H[Π*] is of full rank. In this case, for each Π ∈ H[Π*], one can solve the linear system in (2.1), obtaining p^x = Π^{−1} P^w. It is a well-known result in matrix algebra that the inverse of a non-singular matrix is continuous in the elements of the matrix (see, e.g., Campbell and Meyer, 1991, Chapter 10). A very simple condition ensuring that each matrix Π ∈ H[Π*] is of full rank is to assume that the probability of correct report is greater than 1/2 for each of the values that x can take.^4 Validation studies suggest that this requirement is often satisfied in practice.^5
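This sufficient condition is easy to probe numerically. The sketch below (random illustrative matrices, my own construction) draws column-stochastic matrices with every diagonal entry above 1/2, so that the transpose is strictly diagonally dominant, and confirms invertibility and exact recovery of P^x:

```python
import numpy as np

rng = np.random.default_rng(1)
J = 4

for _ in range(100):
    # Build a column-stochastic Pi with each diagonal entry above 1/2:
    # split the remaining column mass among the off-diagonal entries.
    Pi = np.zeros((J, J))
    for j in range(J):
        diag = rng.uniform(0.5 + 1e-6, 1.0)
        off = rng.dirichlet(np.ones(J - 1)) * (1.0 - diag)
        Pi[:, j] = np.insert(off, j, diag)
    # Pi^T is strictly diagonally dominant, hence Pi is non-singular ...
    assert np.linalg.matrix_rank(Pi) == J
    # ... and a point-identified P^x is recovered by solving the linear system.
    P_x = rng.dirichlet(np.ones(J))
    P_w = Pi @ P_x
    assert np.allclose(np.linalg.solve(Pi, P_w), P_x)
print("all sampled matrices invertible; P^x recovered exactly")
```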
2.1. The set H[Π*] and its geometry
I start by characterizing the set H_P[Π*] and its geometry. Probability theory requires that Σ_{i=1}^J π_{ij} = 1, ∀ j ∈ X; that π_{ij} ≥ 0, ∀ i, j ∈ X; and that, given P^w, Eq. (2.1), and Π, the implied p^x gives a valid probability measure. Denote by H_P[Π*] the set of Π's that satisfy these probabilistic requirements, so that, throughout the entire paper,

\[
H_P[\Pi^*] \equiv \left\{ \Pi :\ \pi_{ij} \ge 0,\ \forall i, j \in X;\ \sum_{i=1}^J \pi_{ij} = 1,\ \forall j \in X;\ p^x_h \ge 0,\ \forall h \in X;\ \sum_{h=1}^J p^x_h = 1 \right\}. \tag{2.4}
\]
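For a candidate Π that is invertible, the implied p^x is unique and the requirements in (2.4) can be checked directly. A sketch (the function name and tolerance are mine, not the paper's):

```python
import numpy as np

def in_H_P(Pi, P_w, tol=1e-9):
    """Check the requirements in Eq. (2.4) for an invertible candidate Pi:
    its columns are probability mass functions, and the implied p^x is one too."""
    Pi = np.asarray(Pi, dtype=float)
    if (Pi < -tol).any() or not np.allclose(Pi.sum(axis=0), 1.0):
        return False
    p_x = np.linalg.solve(Pi, P_w)  # unique solution of P^w = Pi p^x
    return bool((p_x >= -tol).all() and np.isclose(p_x.sum(), 1.0))

P_w = np.array([0.3, 0.7])
ok = in_H_P([[0.8, 0.1], [0.2, 0.9]], P_w)   # implied p^x = (2/7, 5/7): valid
bad = in_H_P([[0.9, 0.6], [0.1, 0.4]], P_w)  # implied "p^x" has a negative entry
print(ok, bad)
```

When Π is rank-deficient, uniqueness fails and membership requires checking whether some solution of the system is a valid probability mass function, e.g. by linear programming.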
Notice that the set H_P[Π*] can be defined alternatively using the notions of the (J − 1)-dimensional simplex and of the convex hull of a set of vectors. I use the following definitions:

Definition 1. The (J − 1)-dimensional simplex is the set Δ^{J−1} ≡ {δ ∈ ℝ^J_+ : δ_1 + δ_2 + ⋯ + δ_J = 1}.

Definition 2. The convex hull of a finite subset {ν_1, ν_2, …, ν_J} of ℝ^J, denoted conv{ν_1, ν_2, …, ν_J}, consists of all the vectors of the form α_1 ν_1 + α_2 ν_2 + ⋯ + α_J ν_J with α_i ≥ 0, ∀ i = 1, …, J, and Σ_{i=1}^J α_i = 1 (Rockafellar, 1970, Corollary 2.3.1).
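Definition 2 also suggests a computational membership test: P^w ∈ conv{π_1, …, π_J} exactly when a feasibility linear program has a solution. A sketch using scipy's linprog (scipy assumed available; the helper is illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def in_conv_hull(point, vertices):
    """Is `point` a convex combination of the columns of `vertices`?
    Feasibility LP: find a >= 0 with vertices @ a = point and sum(a) = 1."""
    J = vertices.shape[1]
    A_eq = np.vstack([vertices, np.ones(J)])
    b_eq = np.append(point, 1.0)
    res = linprog(c=np.zeros(J), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * J)
    return res.success

Pi = np.array([[0.8, 0.1],
               [0.2, 0.9]])
print(in_conv_hull(np.array([0.3, 0.7]), Pi))    # inside the segment of columns
print(in_conv_hull(np.array([0.95, 0.05]), Pi))  # outside it
```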
By definition, P^w ∈ Δ^{J−1}. The set H_P[Π*] can be rewritten as

\[
H_P[\Pi^*] \equiv \{ \Pi :\ \pi_j \in \Delta^{J-1} \text{ and } p^x_j \ge 0\ \forall j \in X,\ \text{and } P^w \in \operatorname{conv}\{\pi_1, \pi_2, \ldots, \pi_J\} \}. \tag{2.5}
\]

In words, a matrix Π is an element of H_P[Π*] if its columns are probability mass functions, the implied p^x is a probability mass function, and the vector P^w can be expressed as a convex combination of the columns of Π. This set of matrices also contains matrices that are not of full rank. Notably, it contains the matrix with each column identical to P^w, denoted Π̃. This matrix plays an important role in Proposition 1 below.
To describe the geometry of H_P[Π*] I need to introduce another definition:

Definition 3. A subset G of ℝ^n is star convex with respect to c_0 ∈ G if for each c ∈ G the line segment joining c and c_0 lies in G (Munkres, 1991, p. 330).

Star convexity implies path-connectedness, which in turn implies connectedness. Given a set of matrices P ⊆ ℝ^{J×J}, define the line segment between two matrices Π_1, Π_2 ∈ P as

Π_α = αΠ_1 + (1 − α)Π_2, α ∈ [0, 1].
^4 If π_{jj} > 1/2, ∀ j ∈ X, ∀ Π ∈ H[Π*], then Π^T is strictly diagonally dominant, and hence Π is non-singular. An n × n matrix A = {a_{ij}} is said to be strictly diagonally dominant if, for i = 1, 2, …, n, |a_{ii}| > Σ_{j=1, j≠i}^n |a_{ij}|. A proof of the fact that if A is strictly diagonally dominant, then A is non-singular, can be found in Horn and Johnson (1999, Theorem 6.1.10).
^5 Among others, this is the case in the context of workers' union status (see, e.g., Card, 1996), transfer program recipiency (see, e.g., Moore et al., 1996), employment status (see, e.g., Poterba and Summers, 1995), and 1- and 3-digit level classification of industry and occupation (see, e.g., Mellow and Sider, 1983).
Then the set P is convex if, given any two matrices Π_1, Π_2 ∈ P, Π_α ∈ P for all α ∈ (0, 1). Connectedness of the set H_P[Π*] is established in the following proposition:
Proposition 1. The set H_P[Π*] is star convex with respect to Π̃. However, it is not star convex with respect to any other of its elements.

The result in Proposition 1 implies that the set H_P[Π*] is not convex, because a convex set is star convex with respect to each of its elements. The set H_P[Π*] is illustrated in Example 1 and in the first panel of Fig. 1.
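Proposition 1 can be probed numerically in the binary setting. In the sketch below (my own helpers), membership in H_P[Π*] reduces to an interval check, since the map p ↦ π_11 p + (1 − π_22)(1 − p) is linear in p; segments toward Π̃ stay in the set, while midpoints of other feasible pairs need not:

```python
import numpy as np

P_w1 = 0.3

def in_H_P(p11, p22, tol=1e-12):
    """Binary case: (p11, p22) is in H_P iff some p^x_1 in [0, 1] solves
    P^w_1 = p11*p + (1 - p22)*(1 - p).  The map is linear in p, so its
    range is the interval between its values at p = 0 and p = 1."""
    lo, hi = sorted((1.0 - p22, p11))
    return lo - tol <= P_w1 <= hi + tol

# Pi_tilde (both columns equal to P^w) has p11 = P^w_1 and p22 = 1 - P^w_1.
t11, t22 = P_w1, 1.0 - P_w1

rng = np.random.default_rng(2)
feasible = [(a, b) for a, b in rng.random((2000, 2)) if in_H_P(a, b)]

# Star convexity with respect to Pi_tilde: segments toward (t11, t22) stay inside.
for p11, p22 in feasible:
    for a in np.linspace(0.0, 1.0, 21):
        assert in_H_P(a * p11 + (1 - a) * t11, a * p22 + (1 - a) * t22)

# Non-convexity: the midpoint of two feasible matrices can fall outside the set.
assert in_H_P(0.1, 0.1) and in_H_P(0.9, 0.9)
assert not in_H_P(0.5, 0.5)
print(f"{len(feasible)} feasible draws; every segment toward Pi_tilde stayed inside")
```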
Example 1. Suppose that x and w are binary, i.e., that J = 2, and let P^w_1 = 0.3. Then the matrix Π is determined by its two diagonal elements, π_11 and π_22, and

∃ p^x_1 ∈ [0, 1] : P^w_1 = π_11 p^x_1 + (1 − π_22)(1 − p^x_1).

It is easy to verify that

H_P[Π*] = {π_11, π_22 : (π_11 ∈ [0, P^w_1], π_22 ∈ [0, 1 − P^w_1]) ∪ (π_11 ∈ [P^w_1, 1], π_22 ∈ [1 − P^w_1, 1])}.

This set is plotted in the first panel of Fig. 1, and its star convexity is apparent.
[Fig. 1 appears here. Panel 1: H_P[Π*]. Panel 2: H[Π*] assuming π_11 = π_22. Panel 3: H[Π*] assuming π_11 ≥ π_22. Panel 4: H[Π*] assuming π_11 ≤ π_22. Panel 5: H[Π*] assuming π_jj ≥ 0.2 ∀ j ∈ X. Panel 6: H[Π*] assuming π_jj ≥ 0.8 ∀ j ∈ X. Each panel plots π_11 (horizontal axis) against π_22 (vertical axis) on [0, 1] × [0, 1].]
Fig. 1. Geometry of the set H_P[Π*], and of the set H[Π*] under different assumptions, when J = 2 and Pr(w = 1) = 0.3.
The problem which occurs in this example relates to the permutation-type non-identifiability considered by Swartz et al. (2004). For a given Π_1 ∈ H_P[Π*], one can obtain another Π_2 ∈ H_P[Π*] by letting π^2_{11} = 1 − π^1_{22} and π^2_{22} = 1 − π^1_{11}. Letting p̃^x_1 = 1 − p^x_1 yields

π^1_{11} p^x_1 + (1 − π^1_{22})(1 − p^x_1) = π^2_{11} p̃^x_1 + (1 − π^2_{22})(1 − p̃^x_1).

This explains the symmetry of H_P[Π*] around the line π_22 = 1 − π_11.
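This binary-case symmetry can be checked with a one-line computation (illustrative numbers):

```python
import numpy as np

# Permutation-type non-identifiability (Swartz et al., 2004), binary case:
# (p11, p22) paired with p^x_1 and (1 - p22, 1 - p11) paired with 1 - p^x_1
# imply the same observable P^w_1.
p11_1, p22_1, p_x1 = 0.8, 0.6, 0.25
p11_2, p22_2 = 1.0 - p22_1, 1.0 - p11_1
p_x1_tilde = 1.0 - p_x1

P_w1_a = p11_1 * p_x1 + (1 - p22_1) * (1 - p_x1)
P_w1_b = p11_2 * p_x1_tilde + (1 - p22_2) * (1 - p_x1_tilde)
print(P_w1_a, P_w1_b)  # identical
```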
Denote by H_E[Π*] the set of matrices that satisfy the restrictions on the misreporting pattern coming from prior information. If, for example, validation studies suggest a uniform lower bound on the probability of correct report for each j ∈ X, then

H_E[Π*] = {Π : π_{jj} ≥ 1 − λ ∀ j ∈ X}.

If social psychology suggests that individuals, when asked about the frequency with which they engage in a certain socially desirable activity, either provide correct reports or overreport, then

H_E[Π*] = {Π : π_{ij} = 0 ∀ i < j ∈ X}.

Of course, plenty of other restrictions are possible.
Because H_P[Π*] is connected, but not convex, when I take its intersection with the set H_E[Π*] I obtain a set H[Π*] that might be disconnected, connected, or convex, depending on how H_E[Π*] slices H_P[Π*]. Below I provide three examples of sets H_E[Π*], which are further analyzed in Section 3. Each of these sets is trivially convex, as it is linear in Π, but its intersection with H_P[Π*] generates sets H[Π*] that can be disconnected, connected, or convex. These examples are illustrated in the six panels of Fig. 1.
Example 2 (Constant probability of correct report). Let H_E[Π*] = {Π : π_{jj} = π ∀ j ∈ X}. Suppose that x and w are binary, i.e., that J = 2. Then

H[Π*] = {π : π ∈ [0, P^w_1] ∪ [1 − P^w_1, 1]} if P^w_1 < 1/2;
H[Π*] = {π : π ∈ [0, 1 − P^w_1] ∪ [P^w_1, 1]} if P^w_1 > 1/2;
H[Π*] = {π : π ∈ [0, 1]} if P^w_1 = 1/2.

Hence, if P^w_1 ≠ 1/2, H[Π*] is disconnected. This set is plotted in the second panel of Fig. 1, and its disconnectedness is apparent. The set H[Π*] remains disconnected, if P^w_1 ≠ 1/2, even if the assumption of constant probability of correct report is weakened to requiring that π_22 = π_11 + ε, as long as |ε| < |1 − 2P^w_1| (and ε is such that π_22 ∈ [0, 1]).
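The disconnectedness in Example 2 can be seen numerically: under π_11 = π_22 = π, feasibility reduces to an interval check, and with P^w_1 = 0.3 the feasible π's form two separated pieces (a sketch with an illustrative grid):

```python
import numpy as np

P_w1 = 0.3

def feasible(p, tol=1e-12):
    """Constant probability of correct report: p11 = p22 = p.  Feasible iff
    P^w_1 = p * p^x_1 + (1 - p) * (1 - p^x_1) has a solution p^x_1 in [0, 1],
    i.e. iff P^w_1 lies between p and 1 - p."""
    lo, hi = sorted((p, 1.0 - p))
    return lo - tol <= P_w1 <= hi + tol

grid = np.linspace(0.0, 1.0, 1001)
mask = np.array([feasible(p) for p in grid])
region = grid[mask]

# With P^w_1 = 0.3 < 1/2, the region should be [0, 0.3] U [0.7, 1]: disconnected.
gaps = np.diff(region)
print(region.min(), region.max(), gaps.max())
```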
Example 3 (Monotonicity in correct reporting). Let H_E[Π*] = {Π : π_{jj} ≥ π_{(j+1)(j+1)} ∀ j ∈ X}. Suppose that x and w are binary, i.e., that J = 2, so that the monotonicity assumption simplifies to π_11 ≥ π_22. Then, if P^w_1 < 1/2,

H[Π*] = {π_11, π_22 : (π_11 ∈ [0, P^w_1], π_22 ∈ [0, π_11]) ∪ (π_11 ∈ [1 − P^w_1, 1], π_22 ∈ [1 − P^w_1, π_11])}.

If P^w_1 ≥ 1/2,

H[Π*] = {π_11, π_22 : (π_11 ∈ [0, P^w_1], π_22 ∈ [0, min(1 − P^w_1, π_11)]) ∪ (π_11 ∈ [P^w_1, 1], π_22 ∈ [1 − P^w_1, π_11])}.

Hence, if P^w_1 < 1/2, H[Π*] is disconnected, but otherwise it is connected. This set is plotted in the third panel of Fig. 1. Its disconnectedness is apparent given the choice of P^w_1 = 0.3. To see why the set can be connected, the fourth panel of Fig. 1 plots the set H[Π*] obtained when the monotonicity assumption is π_11 ≤ π_22 (in the binary case, reversing the sign of the monotonicity assumption has an effect similar to maintaining π_11 ≥ π_22 but having P^w_1 > 1/2).
Example 4 (Lower bound on the probability of correct report). Let H_E[Π*] = {Π : π_{jj} ≥ 1 − λ ∀ j ∈ X}. Suppose that x and w are binary, i.e., that J = 2. Then, if 1 > λ > max{P^w_1, 1 − P^w_1},

H[Π*] = {π_11, π_22 : (π_11 ∈ [1 − λ, P^w_1], π_22 ∈ [1 − λ, 1 − P^w_1]) ∪ (π_11 ∈ [P^w_1, 1], π_22 ∈ [1 − P^w_1, 1])}.

This set is connected through the point π_11 = P^w_1, π_22 = 1 − P^w_1, and is plotted in the fifth panel of Fig. 1 for P^w_1 = 0.3 and λ = 0.8.
If max{P^w_1, 1 − P^w_1} > λ, then

H[Π*] = {π_11, π_22 : π_11 ∈ [max{1 − λ, P^w_1}, 1], π_22 ∈ [max{1 − λ, 1 − P^w_1}, 1]},

and H[Π*] is convex. This set is plotted in the sixth panel of Fig. 1. Its convexity is apparent given the choice of P^w_1 = 0.3 and λ = 0.2.
2.2. Consistent estimation of the identification regions
The set H[P*] can be disconnected, connected or convex. These properties are reflected in the shape of the identification regions of the functionals of interest, namely H[P_x], H{τ[P_x]} and H{ψ[P_x]}, for some k-dimensional vector of functionals ψ(·) mapping probability distributions on X into R^k. Hence, it is important to have a method to calculate and consistently estimate the entire identification regions, that is able to capture their possible disconnectedness and non-convexities. While the general identification approach proposed in Section 2.1 is valid for any set of restrictions on P*, here I focus on restrictions that satisfy certain regularity conditions, described in Assumptions C0 and C1 below, so that a simple estimator can be utilized.
Manski and Tamer (2002) introduced methods to estimate the entire identification region of a vector of parameters of interest when the identification region cannot be expressed in closed form solution, but is given by all values of the vector that minimize a specified objective function. Here I introduce a related nonlinear programming estimator, using the same insight as in the linear programming estimator proposed by Honoré and Tamer (2006) and further discussed by Honoré and Lleras-Muney (2006). Observe that if I can calculate H[P_x], I can then calculate H{τ[P_x]} and H{ψ[P_x]} for any functionals τ(·) and ψ(·) (for example, the mean of x, its variance, the Gini coefficient, etc.); hence, I focus on the calculation of H[P_x].⁶
The set H[P_x] consists of the vectors p_x ∈ Δ^{J−1} for which the equations

P_w = P p_x,   p_j ∈ Δ^{J−1} ∀j,   P ∈ H_E[P*],   (2.6)

have a solution for P. In general, H_E[P*] can be written as

H_E[P*] = {P : f_j(P) ≥ m_j, j = 1, …, q_1;  g_i(P) ≤ m_{q_1+i}, i = 1, …, q_2;  h_k(P) = m_{q_1+q_2+k}, k = 1, …, q_3},

where q_1 + q_2 + q_3 = q is the number of constraints imposed, m_j ∈ [0, M], j = 1, …, q, is a non-negative parameter bounded by some constant M, and f_j : R^{J²} → R, g_i : R^{J²} → R, and h_k : R^{J²} → R are functions taking as arguments the elements of the matrix P.
To give a concrete example, if X = {1, 2, 3} and

H_E[P*] = {P : p_jj ≥ 0.8 ∀j ∈ X; 0.125 ≤ p_12 + p_13 ≤ 0.33; p_11 = p_22},

then q_1 = 4, q_2 = 1, q_3 = 1, q = 6, and

f_j(P) = p_jj,  m_j = 0.8,  j = 1, 2, 3,
f_4(P) = p_12 + p_13,  m_4 = 0.125,
g_1(P) = p_12 + p_13,  m_5 = 0.33,
h_1(P) = p_11 − p_22,  m_6 = 0.
The equations in (2.6) have the same structure as the constraints in a nonlinear programming problem. Hence one can check whether a particular vector η ∈ Δ^{J−1} belongs to H[P_x] by checking if a nonlinear programming problem that has constraints given by (2.6) has a solution with a specific value for the objective
⁶ If the researcher is interested in a scalar valued functional of P_x, say, for example, τ[P_x] = Pr(x = j), j ∈ X, and the matrix P is of full rank for any P ∈ H[P*], the extreme points of the identification region of this functional can be calculated and consistently estimated by solving nonlinear optimization problems subject to linear and nonlinear constraints. In particular, let p_x = P^{−1} P_w, P ∈ H[P*]. Then the smallest and the largest points in H[Pr(x = j)], j ∈ X, can be calculated as p^{x,L}_j = inf_{P ∈ H[P*]} p^x_j(P, P_w) and p^{x,U}_j = sup_{P ∈ H[P*]} p^x_j(P, P_w). These extreme points are continuous functions of P_w, and therefore one can consistently estimate them by replacing P_w with P_{w,N}.
function. Consider the nonlinear programming problem

Q(η) = max_{{p_ij}, {v_k}}  − Σ_k v_k   (2.7)

subject to

v_k ≥ 0  ∀k;
p_ij ≥ 0,  i, j = 1, …, J;
1 − Σ_{i=1}^J p_ij = v_j,  j = 1, …, J;
P_w − P η = [v_{J+1} … v_{2J}]^T;
f_l(P) − m_l + v_{2J+l} ≥ 0,  l = 1, …, q_1;
m_{q_1+m} − g_m(P) + v_{2J+q_1+m} ≥ 0,  m = 1, …, q_2;
h_s(P) − m_{q_1+q_2+s} + v_{2J+q_1+q_2+s} = 0,  s = 1, …, q_3.   (2.8)
I consider restrictions determining the set H_E[P*] that satisfy the following conditions:

Assumption C0. For each j = 1, …, q_1, i = 1, …, q_2, and k = 1, …, q_3, f_j(P)|_{P=0} = g_i(P)|_{P=0} = h_k(P)|_{P=0} = 0, and f_j(P), g_i(P), and h_k(P) are continuous on [0, 1]^{J²}.
Let PV denote the constraint set defined by (2.8). Assumption C0 is imposed to establish that the objective function in (2.7) achieves a maximum on (2.8). Observe that the set PV is closed, because the constraints defining it are continuous, and non-empty, because it contains the vector [p^0_1, …, p^0_J, v^0], with p^0_ij = 0 for i, j = 1, …, J, v^0_j = 1 for j = 1, …, J, v^0_{J+j} = P^w_j for j = 1, …, J, v^0_{2J+l} = m_l, l = 1, …, q_1, v^0_{2J+q_1+m} = 0, m = 1, …, q_2, and v^0_{2J+q_1+q_2+s} = m_{q_1+q_2+s}, s = 1, …, q_3. Hence maximization of (2.7) on PV is equivalent to maximization of (2.7) on

P̃Ṽ = {[p_1, …, p_J, v] ∈ PV : Σ_k v_k ≤ Σ_k v^0_k},

which is a closed and bounded set. The objective function in (2.7) is continuous, and therefore the result follows by the Bolzano–Weierstrass theorem.⁷ The optimal function has value zero if and only if all v_k = 0, that is if a solution exists to (2.6). Hence, for given η ∈ Δ^{J−1} one can check whether η ∈ H[P_x] by solving the above nonlinear programming problem and checking whether v_k = 0 for all k.
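When every restriction in H_E[P*] is linear in the elements of P (for example, lower bounds on the diagonal, as in Example 4), the feasibility check in (2.6) can be carried out directly as a linear program with a zero objective, without the slack variables v_k. A minimal sketch (the function name and the scipy-based formulation are mine, not the paper's):

```python
import numpy as np
from scipy.optimize import linprog

def in_identification_region(eta, pw, lam):
    """Check whether eta (a candidate P_x) solves P_w = P @ eta for some
    misclassification matrix P with column sums 1, entries in [0, 1], and
    diagonal entries >= 1 - lam.  With linear restrictions, the existence
    question in (2.6) reduces to LP feasibility."""
    J = len(pw)
    n = J * J                      # unknowns: vec(P), p[i, j] stored at i + J*j
    A_eq, b_eq = [], []
    for i in range(J):             # P @ eta = P_w  (J equations)
        row = np.zeros(n)
        for j in range(J):
            row[i + J * j] = eta[j]
        A_eq.append(row); b_eq.append(pw[i])
    for j in range(J):             # each column of P sums to 1
        row = np.zeros(n)
        for i in range(J):
            row[i + J * j] = 1.0
        A_eq.append(row); b_eq.append(1.0)
    bounds = [(0.0, 1.0)] * n
    for j in range(J):             # diagonal restriction p_jj >= 1 - lam
        bounds[j + J * j] = (1.0 - lam, 1.0)
    res = linprog(np.zeros(n), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=bounds, method="highs")
    return res.status == 0         # status 0: a feasible solution was found

pw = np.array([0.3, 0.7])
print(in_identification_region(np.array([0.3, 0.7]), pw, lam=0.2))  # True
print(in_identification_region(np.array([0.9, 0.1]), pw, lam=0.2))  # False
```

The first candidate is feasible (P equal to the identity works); the second is not, since p_11 ≥ 0.8 forces Pr(w = 1) ≥ 0.72 when Pr(x = 1) = 0.9.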
The above method for calculating identification regions has a natural sample analog counterpart, and under some regularity conditions about the functions defining the set H_E[P*] and the sampling process, this estimator is consistent. In particular, I maintain the following:
Assumption C1. For each j = 1, …, q_1, i = 1, …, q_2, and k = 1, …, q_3, either (i) f_j(P), g_i(P) and h_k(P) are homogeneous functions of degree (respectively) r_j, r_i, r_k ≥ 1, or (ii) f_j(P), g_i(P) and h_k(P) are multivariate polynomials in P with non-negative coefficients, or (iii) f_j(P) are convex functions, g_i(P) are concave functions, and either (i) or (ii) holds for h_k(P). Additionally, g_i(P) ≥ 0 and h_k(P) ≥ 0 on [0, 1]^{J²}.
Assumption C2. (a) Let a random sample {w_i}, i = 1, …, N, be available, and let P^w_{i,N} = (1/N) Σ_{j=1}^N 1(w_j = i), i = 1, …, J. (b) If the set H_E[P*] contains constraints involving any parameters to be estimated, let these parameters enter the constraints additively. Without loss of generality, to simplify the notation, let the parameters to be estimated be m_l, l = 1, …, q̄ ≤ q. (c) Suppose that a random sample of size n = N/k for some constant k such that 0 < k < ∞ is available to estimate m_l, l = 1, …, q̄, so that √N (m_{l,n} − m_l) →_d N(0, k V_{m_l}). (d) Let m_l satisfy m_l > 0, l = 1, …, q̄ ≤ q.
In Section 3 I consider several examples of restrictions defining the set H_E[P*] that satisfy Assumptions C0–C1. For example, suppose that a validation study provides a lower bound on the probability of correct
⁷ Alternative assumptions replacing Assumption C0 and yielding a non-empty closed and bounded constraint set for every η ∈ Δ^{J−1} would also imply this result.
report for each type j = 1, …, J, so that H_E[P*] = {P : p_jj ≥ m_j, j ∈ X}. Then Assumptions C0–C1 are clearly satisfied. Moreover, if a validation (random) sample {w̃_i, x̃_i}, i = 1, …, n, is available (with n = N/k, 0 < k < ∞), which does not point identify P_x and P*, but allows one to conclude that

p_jj ≥ m_{j,n} = [Σ_{i=1}^n 1(w̃_i = j, x̃_i = j)] / [Σ_{i=1}^n 1(x̃_i = j)],

then Assumption C2 is satisfied. The empirical analysis conducted in Section 4 shows that there are important cases in which a validation sample allows for root-N consistent estimation of m_{j,n}, but does not allow for point identification of P_x or P*.
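As an illustration of how the estimates m_{j,n} would be computed from a validation sample, consider the following sketch (the helper function and toy data are mine):

```python
import numpy as np

def correct_report_lower_bounds(w_tilde, x_tilde, J):
    """Estimate m_{j,n} = sum 1(w=j, x=j) / sum 1(x=j): the fraction of
    validation observations of true type j that are correctly reported.
    These estimates feed the restrictions H_E[P*] = {P : p_jj >= m_j}."""
    w_tilde, x_tilde = np.asarray(w_tilde), np.asarray(x_tilde)
    m = np.empty(J)
    for j in range(1, J + 1):
        mask = (x_tilde == j)
        m[j - 1] = np.mean(w_tilde[mask] == j)
    return m

# toy validation sample with J = 2
x_val = np.array([1, 1, 1, 1, 2, 2, 2, 2, 2])
w_val = np.array([1, 1, 1, 2, 2, 2, 2, 1, 2])
print(correct_report_lower_bounds(w_val, x_val, J=2))  # [0.75 0.8 ]
```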
Let H_{E,N}[P*] denote the set H_E[P*] obtained when m_l is replaced by m_{l,n}, l = 1, …, q̄, with the convention that m_{l,n} = m_l for l = q̄ + 1, …, q. Define an objective function Q_N(η) as in (2.7)–(2.8), with m_{l,n}, l = 1, …, q, replacing m_l and P_{w,N} replacing P_w. Then the following consistency result holds:
Proposition 2. Let Assumptions C0, C1 and C2 hold. Define the set

H_N[P_x] = {p^x_N ∈ Δ^{J−1} : Q_N(p^x_N) ≥ sup_{η ∈ Δ^{J−1}} Q_N(η) − ε_N},   (2.9)

where ε_N = N^{−t}, 0 < t < 1/2. Then the set H_N[P_x] is a consistent estimator of H[P_x], in the sense that

ρ(H_N[P_x], H[P_x]) ≡ max{ sup_{p^x_N ∈ H_N[P_x]} inf_{p_x ∈ H[P_x]} ‖p^x_N − p_x‖,  sup_{p_x ∈ H[P_x]} inf_{p^x_N ∈ H_N[P_x]} ‖p^x_N − p_x‖ } →_p 0.
Most of the calculations and estimations of H[P_x] presented in this paper are performed using this nonlinear programming method. The method requires checking the value function of the sample analog of (2.7)–(2.8) for each η ∈ Δ^{J−1}. Hence it works best, and the computations are easiest, when J is a relatively small number. This is the case in many applications of interest. Examples include educational attainment, language proficiency, workers' union status, employment status, health conditions, and health/functional status. When J is a large number, the nonlinear programming problem becomes computationally harder. This issue has been acknowledged in the related literature on partial identification, and some solutions have been proposed. For example, Chernozhukov et al. (2004) and Ciliberto and Tamer (2004) have suggested the use of the Metropolis–Hastings algorithm to generate adaptive grid sets or the use of simulated annealing to perform the optimization over Δ^{J−1}. While Ciliberto and Tamer's empirical analysis is based on the optimization of a different objective function and the parameter space for η in their case is not Δ^{J−1}, their work shows that the computational problem is feasible for values of J as large as 13.
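For J = 2 the set estimator in (2.9) reduces to a one-dimensional grid search. The sketch below (my own illustration, not the paper's code) stands in for the nonlinear program with a closed-form objective: Q_N(η) equals minus the distance of η from the interval implied by Eq. (3.2), so Q_N is zero exactly on the identification region; the estimator then keeps every grid point within ε_N = N^{−t} of the supremum:

```python
import numpy as np

def estimate_H(Q_N, N, t=0.25, grid_size=501):
    """Sample-analog set estimator (2.9) for J = 2: keep the grid points
    eta (parameterized by Pr(x=1)) whose sample objective Q_N(eta) is
    within eps_N = N**(-t) of the supremum over the grid."""
    etas = np.linspace(0.0, 1.0, grid_size)
    Q = np.array([Q_N(e) for e in etas])
    return etas[Q >= Q.max() - N ** (-t)]

# Stand-in objective: under Assumption 2 with J = 2, the region for Pr(x=1)
# is the interval of Eq. (3.2), so Q_N(eta) = -(distance of eta from it).
lam, pw1_N, N = 0.2, 0.3, 10_000
lo = max((pw1_N - lam) / (1 - lam), 0.0)
hi = min(pw1_N / (1 - lam), 1.0)
Q_N = lambda e: -max(lo - e, 0.0) - max(e - hi, 0.0)
H_N = estimate_H(Q_N, N)
print(round(H_N.min(), 3), round(H_N.max(), 3))  # slightly wider than [lo, hi]
```

The ε_N slack makes the estimate a little wider than [lo, hi], which is the price paid for consistency when the supremum itself must be estimated.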
2.3. Confidence sets for the identification regions⁸
The problem of the construction of confidence intervals for partially identified parameters was addressed by Horowitz and Manski (1998, 2000). They considered the case in which the identification region of the parameter of interest is an interval whose lower and upper bounds can be estimated from sample data, and proposed confidence intervals that asymptotically cover the entire identification region with fixed probability. For the same class of problems, Imbens and Manski (2004) suggested shorter confidence intervals that uniformly cover the parameter of interest, rather than its identification region, with a prespecified probability. Beresteanu and Molinari (2007) provide confidence sets and confidence collections for partially identified parameters whose convex identification region is equal to the expectation of a properly defined set valued random variable. These approaches are not applicable to the problem studied here, because our identification regions are given by the set of values of the parameters of interest that solve a maximization problem, do not have a closed form solution, and are not necessarily convex. The problem of construction of confidence sets for identification regions of parameters obtained as the solution of the minimization of a criterion function has recently been addressed by Chernozhukov et al. (2007). They provided a method to construct confidence sets
⁸ I am very grateful to Elie Tamer for suggestions that led to the construction of these confidence sets.
that cover the identification region with probability asymptotically equal to 1 − α, and developed subsampling methods to implement this procedure. Here I introduce a different procedure, and show that the coverage property of these confidence sets follows directly from well known results in the literature (e.g., Rao, 1973; Cox and Hinkley, 1974). The counterpart of the simplicity of this approach is that the confidence sets may be conservative, in the sense that given a prespecified confidence coefficient 1 − α, 0 < α < 1, the confidence sets asymptotically cover the identification region with probability at least equal to 1 − α.
The main insight for the construction of the confidence sets for H[P_x], denoted C_N^{H[P_x]}, is given by observing that the only parameters to be estimated for obtaining H_N[P_x] in (2.9) are P^w_{i,N}, i = 1, …, J − 1, and m_{l,n}, l = 1, …, q̄. Let θ̂_N denote the (J − 1 + q̄)-vector collecting these estimators. Under Assumption C2, θ̂_N is root-N consistent and asymptotically normal, and has a covariance matrix Var(θ) that can be consistently estimated from the data (V̂ar(θ̂_N)). Hence, if c_{1−α} denotes the 1 − α quantile of the χ²(J − 1 + q̄) distribution, I construct a joint confidence ellipsoid for θ ≡ [(P^w_i)_{i=1,…,J−1}, (m_l)_{l=1,…,q̄}] as

C_N^θ ≡ {θ_0 : (θ̂_N − θ_0)′ (V̂ar(θ̂_N))^{−1} (θ̂_N − θ_0) ≤ c_{1−α}}.

It follows from the results in Rao (1973, Section 7b) that

lim_{N→∞} Pr(θ ∈ C_N^θ) = 1 − α.
Given C_N^θ, I construct C_N^{H[P_x]} as follows. For a given θ_0 ∈ C_N^θ, let H_{θ_0}[P_x] denote the identification region for P_x obtained when θ̂_N is replaced by θ_0 in the estimation procedure described in the previous section. Let

C_N^{H[P_x]} = ∪_{θ_0 ∈ C_N^θ} H_{θ_0}[P_x].

Then

θ ∈ C_N^θ ⟹ H[P_x] ⊆ C_N^{H[P_x]},

and therefore

lim_{N→∞} Pr(H[P_x] ⊆ C_N^{H[P_x]}) ≥ 1 − α.
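Computationally, the union defining the confidence set can be approximated by gridding the ellipsoid and pooling the implied regions. The sketch below is purely illustrative (made-up estimates and variances, a tabulated χ²(2) critical value, and regions given by the closed-form interval of Eq. (3.2) with λ = 1 − m, where m is a lower bound on the probability of correct report):

```python
import numpy as np

def confidence_ellipsoid_points(theta_hat, V_hat, c_crit, grid):
    """Keep the grid points theta0 satisfying
    (theta_hat - theta0)' V_hat^{-1} (theta_hat - theta0) <= c_crit."""
    Vinv = np.linalg.inv(V_hat)
    dev = grid - theta_hat
    keep = np.einsum("ni,ij,nj->n", dev, Vinv, dev) <= c_crit
    return grid[keep]

# theta = (P^w_1, m); H_{theta0}[Pr(x=1)] is the interval of Eq. (3.2)
theta_hat = np.array([0.30, 0.85])                        # illustrative estimates
V_hat = np.diag([0.25 ** 2 / 10_000, 0.05 ** 2 / 1_000])  # illustrative variances
c_crit = 5.991                                            # chi2(2) 0.95 quantile (tabulated)
g1 = np.linspace(0.2, 0.4, 81)
g2 = np.linspace(0.7, 1.0, 61)
grid = np.array([(a, b) for a in g1 for b in g2])
inside = confidence_ellipsoid_points(theta_hat, V_hat, c_crit, grid)
# Union of the intervals H_{theta0} over the ellipsoid: most extreme endpoints.
lowers = np.maximum((inside[:, 0] - (1 - inside[:, 1])) / inside[:, 1], 0.0)
uppers = np.minimum(inside[:, 0] / inside[:, 1], 1.0)
print(lowers.min(), uppers.max())   # endpoints of the (conservative) confidence set
```

Because the set is a union over the whole ellipsoid, it is wider than the region evaluated at θ̂_N alone, which is exactly the conservativeness discussed above.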
The confidence sets presented in Section 4 are obtained using this procedure. Using similar procedures one can construct confidence regions for H{τ[P_x]} and H{ψ[P_x]}, where again τ(·) and ψ(·) denote functionals of P_x.
3. Analysis of the identifying power of specific restrictions on P*
This section analyzes in detail examples of restrictions on the matrix P* (which satisfy Assumptions C0–C1) coming from validation studies and theories developed in the social sciences. I suggest settings in which such assumptions may be credible, show their implications for the structure of H[P*], and present results on the inferences that they allow one to draw on P_x and τ[P_x]. I show that when the base-case assumptions are maintained, the direct misclassification approach is equivalent to the method proposed by HM and therefore it gives the same identification regions for H[Pr(x = j)], j ∈ X, as the ones they derived. Hence, I use these results as a benchmark to evaluate the identifying power of additional assumptions. Notice however that H[Pr(x = j)], j ∈ X, is just the projection of H[P_x] on its jth component. Therefore, when J > 2, a comparison based simply on H[Pr(x = j)], j ∈ X, understates the identifying power of the additional assumptions. When J = 2, H[P_x] is entirely described by H[Pr(x = 1)] and closed form bounds can be derived under different sets of assumptions, hence allowing for a full comparison.
3.1. Upper bound on the probability of data errors
Suppose that the researcher has a known lower bound on the probability that the realizations of w and x coincide, i.e., Pr(w = x) ≥ 1 − λ, or, strengthening this assumption, that the researcher has a known lower
bound on the probability of correct report for each value that x can take, i.e., Pr(w = j|x = j) ≥ 1 − λ, ∀j ∈ X. Formally, consider the following:

Assumption 1. Pr(w = x) ≥ 1 − λ > 0,

or, as a stronger version of Assumption 1, that:

Assumption 2. Pr(w = j|x = j) ≥ 1 − λ > 0, ∀j ∈ X.
Assumptions 1 and 2 are quite often satisfied in practice, mainly due to the availability of results of validation studies, and are therefore of particular interest. Additionally, Assumptions 1 and 2 exhaust the implications for the structure of P* of the assumptions typically maintained by researchers adopting mixture models. Hence, the results obtained under these base-case assumptions are particularly suited to evaluate the identifying power of additional prior information. In the next section I show that informative identification regions might be obtained even if one dispenses with Assumptions 1 and 2, when other information is available. When the researcher has prior information suggesting that either Assumption 1 or the stronger Assumption 2 holds, she can specify the set H_E[P*], respectively, as follows:
H_{E,1}[P*] = {P : Σ_{h=1}^J p_hh p^x_h ≥ 1 − λ},
H_{E,2}[P*] = {P : p_jj ≥ 1 − λ ∀j ∈ X},

where H_{E,1}[P*] denotes the set H_E[P*] when Assumption 1 is maintained, and H_{E,2}[P*] denotes the set H_E[P*] when Assumption 2 is maintained. Notice that H_{E,2}[P*] ⊆ H_{E,1}[P*]. Proposition 3 gives closed form bounds on Pr(x = j), j ∈ X, for the case in which either Assumption 1 or 2 holds.
Proposition 3. (a) Suppose that Assumption 1 holds, and that no other information is available. Then from system (1.1) one can learn that

H[Pr(x = j)] = [max(P^w_j − λ, 0), min(1, P^w_j + λ)],  j ∈ X.   (3.1)

(b) Suppose that Assumption 2 holds, and that no other information is available. Then from system (1.1) one can learn that

H[Pr(x = j)] = [max((P^w_j − λ)/(1 − λ), 0), min(1, P^w_j/(1 − λ))],  j ∈ X.   (3.2)
The proof of Proposition 3 proceeds in two steps. First, it is shown that from the jth equation of system (1.1) one can learn, depending on the maintained assumption, that Pr(x = j) lies in one of the intervals in (3.1)–(3.2). Then it is shown that there exists a P ∈ H[P*] for which the extreme values of these intervals solve system (1.1). This implies that the bounds are sharp. This result establishes that when only Assumption 1 or Assumption 2 is maintained, only the jth equation in system (1.1) implies restrictions on Pr(x = j), j ∈ X. In the next section I show that when more structure is imposed on the matrix P, several of the equations in system (1.1) imply restrictions on Pr(x = j), j ∈ X, and additional progress can be made.
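The closed-form bounds (3.1) and (3.2) are straightforward to compute; a small sketch (function names are mine):

```python
def bounds_assumption1(pw_j, lam):
    """Eq. (3.1): bounds on Pr(x=j) when Pr(w = x) >= 1 - lam."""
    return max(pw_j - lam, 0.0), min(1.0, pw_j + lam)

def bounds_assumption2(pw_j, lam):
    """Eq. (3.2): bounds on Pr(x=j) when Pr(w=j|x=j) >= 1 - lam for all j."""
    return max((pw_j - lam) / (1 - lam), 0.0), min(1.0, pw_j / (1 - lam))

lo1, hi1 = bounds_assumption1(0.3, 0.1)
lo2, hi2 = bounds_assumption2(0.3, 0.1)
print(round(lo1, 4), round(hi1, 4))  # 0.2 0.4
print(round(lo2, 4), round(hi2, 4))  # 0.2222 0.3333
```

With Pr(w = j) = 0.3 and λ = 0.1, the interval under the stronger Assumption 2 is nested inside the interval under Assumption 1, mirroring H_{E,2}[P*] ⊆ H_{E,1}[P*].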
The same identification regions as those in Proposition 3 were obtained by HM. They used a mixture model to study the problem of inference with corrupted and contaminated data, and assumed that a known lower bound is available on the probability that a realization of w is drawn from the distribution of x. Molinari (2003) shows that under Assumptions 1 and 2, the identification regions for parameters that respect stochastic dominance obtained using the direct misclassification approach are also equivalent to those obtained by HM.
3.2. Constant probability of correct report
Consider the case that, conditional on the value of x, there is constant probability of correct report for at least a subset of the values that x can take. Formally:

Assumption 3. Pr(w = j|x = j) = p* ≥ 1 − λ ≥ 0, ∀j ∈ X̃ ⊆ X,
where p* is known only to lie in [1 − λ, 1], and λ is strictly less than 1 if a non-trivial upper bound on the probability of a data error is available.
There are various situations in which this assumption may be credible. For example, Poterba and Summers (1995) use CPS data (with Reinterview Survey) and provide evidence for the reinterviewed sub-sample that the rate of correct report of employment status is similar for individuals who are employed or not in the labor force (Pr(w = j|x = j) ≈ 0.99), but much lower for individuals who are unemployed (Pr(w = j|x = j) ≈ 0.86). Kane et al. (1999) provide evidence (Table 5, p. 18) that self-report of educational attainment is correct with similar probabilities for individuals with no college, some college but no AA degree, and AA degree (Pr(w = j|x = j) ≈ 0.92), and is higher for individuals with at least a bachelor degree (Pr(w = j|x = j) ≈ 0.99). Assumption 3 may hold with X̃ = X when the misclassification is generated by specific types of interviewer recording errors. For example, the interviewer may sometimes mark one box at random in the questionnaire. Additionally, in the special case of dichotomous variables, some have argued that the misreporting of health disability is independent from true disability status (see Kreider and Pepper, 2007 for a discussion of this issue), or that the misreporting of workers' union status is independent from true union status (see Bollinger, 1996 for a discussion of this issue). When this is the case, Assumption 3 holds.
In general, Assumption 3 does not place any restriction on Pr(w = i|x = j), i ≠ j, i, j ∈ X, other than that the misreporting probabilities need to satisfy

Σ_{i≠j} Pr(w = i|x = j) = 1 − p*,  ∀j ∈ X̃.

When J = 2, this implies that the two off-diagonal elements of P* are equal; hence the only unknown element of P* is p*.
Suppose first that X̃ ⊂ X, and without loss of generality let X̃ ≡ {1, 2, …, h}, 2 ≤ h < J. When this is the case, Eq. (1.1) can be rewritten as

⎡ p*     p*_12  …  p*_1J ⎤ ⎡ Pr(x = 1) ⎤   ⎡ Pr(w = 1) ⎤
⎢ p*_21  p*     …  p*_2J ⎥ ⎢ Pr(x = 2) ⎥   ⎢ Pr(w = 2) ⎥
⎢   ⋮      ⋮    ⋱    ⋮   ⎥ ⎢     ⋮     ⎥ = ⎢     ⋮     ⎥    (3.3)
⎣ p*_J1  p*_J2  …  p*_JJ ⎦ ⎣ Pr(x = J) ⎦   ⎣ Pr(w = J) ⎦
where p* ≥ 1 − λ and, assuming that λ constitutes a uniform upper bound for all the misclassification probabilities, p*_ll ≥ 1 − λ, ∀l ∈ X∖X̃. Then H_E[P*] is defined as

H_{E,3}[P*] = {P : p_jj = p ≥ 1 − λ ∀j ∈ X̃; p_ll ≥ 1 − λ ∀l ∈ X∖X̃}.
Let H_3[P*] = H_P[P*] ∩ H_{E,3}[P*], where H_P[P*] was defined in (2.4). Then one can immediately calculate H[P_x] and H{τ[P_x]} using the nonlinear programming method described in Section 2, with H_E[P*] = H_{E,3}[P*].
It is natural to ask whether Assumption 3 does have identifying power. To answer this question, I consider the case that the researcher has a non-trivial upper bound on the probability of data errors, i.e., that λ < 1, and compare the bounds on Pr(x = j), j ∈ X, derived in Proposition 3, Eq. (3.2), with the extreme points obtained using the nonlinear programming method, with H_E[P*] = H_{E,3}[P*]. In Section 3.4 I consider the case in which x and w are binary (J = 2) and show that Assumption 3 can have identifying power even when λ = 1.
Proposition 4 shows that if P^w_i > 0, for some i ∈ X̃∖{j}, the base case lower bound on Pr(x = j), j ∈ X̃, if informative, is never feasible when Assumption 3 (with X̃ ⊂ X) is maintained; hence the lower bound on Pr(x = j), j ∈ X̃, under Assumption 3 is strictly greater than that in (3.2). For the case in which the base case upper bound on Pr(x = j), j ∈ X̃, is informative, Proposition 5 derives conditions under which such upper bound is not feasible when Assumption 3 (with X̃ ⊂ X) is maintained, and shows that when those conditions are satisfied, this upper bound is strictly smaller than that in (3.2). When the base case lower and upper bounds (respectively) are not informative, Assumption 3 has no additional identifying power.
Proposition 4. (a) Suppose that Assumption 3 holds, with X̃ ⊂ X, and that P^w_j > λ. Then the lower bound on Pr(x = j), j ∈ X̃, is strictly greater than the base case lower bound in (3.2). The base case lower bound in (3.2) is the sharp lower bound for Pr(x = k), k ∈ X∖X̃.

(b) Suppose that Assumption 3 holds, with X̃ ⊂ X, and that P^w_j ≤ λ. Then the sharp lower bound on Pr(x = j), j ∈ X, coincides with the base case lower bound in (3.2), and is equal to 0.
Proposition 5. (a) Suppose that Assumption 3 holds, with X̃ ⊂ X, and that P^w_j < 1 − λ.

If λ ≤ 1/2, the upper bound on Pr(x = j), j ∈ X̃, is strictly smaller than the base case upper bound in (3.2) if and only if

∃ k ∈ X̃∖{j} : P^w_j + P^w_k > (1 − λ) + λ P^w_j/(1 − λ).   (3.4)

If λ > 1/2, the upper bound on Pr(x = j), j ∈ X̃, is strictly smaller than the base case upper bound in (3.2) if

∃ k ∈ X̃∖{j} : P^w_k > λ.   (3.5)

The base case upper bound in (3.2) is the sharp upper bound for Pr(x = k), k ∈ X∖X̃.

(b) Suppose that Assumption 3 holds, with X̃ ⊂ X, and that P^w_j ≥ 1 − λ. Then the sharp upper bound on Pr(x = j), j ∈ X, coincides with the base case upper bound in (3.2) and is equal to 1.
The proofs of Propositions 4–5, parts (a), are based on showing that there is no P ∈ H_3[P*] for which the lower bound in (3.2) for Pr(x = j), j ∈ X̃, solves system (3.3), and that when condition (3.4) or condition (3.5) is satisfied, there is no P ∈ H_3[P*] for which the upper bound in (3.2) for Pr(x = j), j ∈ X̃, solves system (3.3). When the inference is on Pr(x = k), k ∈ X∖X̃, there is a P ∈ H_3[P*] that allows for the base case bounds in (3.2) to solve system (3.3). The proofs of Propositions 4–5, parts (b), are based on showing that when the bounds on Pr(x = j), j ∈ X, in (3.2) are not informative, one can find values of P ∈ H_3[P*] for which p^x_j = 0 and p^x_j = 1 solve system (3.3).
The results in Propositions 4–5 can be explained as follows: only a subset X̃ of the equations in system (1.1) are related between each other. Therefore, when drawing inference on Pr(x = j), j ∈ X, an improvement on the base case bound in (3.2) can be achieved only for j ∈ X̃. Consider now the case in which X̃ = X. In this case the results of Propositions 4–5 apply directly, with X replacing X̃. Of course, the identifying power of Assumption 3 is the highest in this case. In particular, Proposition 4 establishes that the lower bound for Pr(x = j), j ∈ X, if informative, improves for all j when Assumption 3 is maintained with X̃ = X.
A final consideration is relevant. Often the researcher might have prior information suggesting that Assumption 3 holds, but not exactly. That is, she might have prior information that the probability of correct report is only approximately constant: Pr(w = j|x = j) ≈ p*, ∀j ∈ X̃ ⊆ X. Then it is natural to ask how much variation in the probabilities of correct report is consistent with the conclusions of Propositions 4–5. For ease of exposition, consider the identification of Pr(x = 1), and let p_11 = p.⁹ Molinari (2003) shows that as long as |p_jj − p_11| < λ, ∀j ∈ X̃∖{1}, and X̃ ⊂ X, or X̃ = X, the results of Proposition 4 continue to hold. A similar condition is derived for the results of Proposition 5.
Example 6 in Section 3.4 illustrates the identifying power of Assumption 3, both for the case in which X̃ ⊂ X and X̃ = X, by comparing the identification regions H[Pr(x = j)], j ∈ X, H[P_x] and H[E(x)] obtained using the nonlinear programming method with H_E[P*] = H_{E,3}[P*] with those obtained when only Assumption 2 is maintained.
3.3. Monotonicity in correct reporting
Social psychology suggests that when survey respondents are asked questions relative to socially and personally sensitive topics, they tend to underreport socially undesirable behaviors and attitudes, and overreport socially desirable ones. This suggestion is often supported by validation studies. In the context of questions of the type described above, these studies often document that Pr(w = j|x = j) ≥ Pr(w = j+1|x = j+1), ∀j ∈ X̃ ⊆ X. This is the case for example when survey respondents are asked about their participation
⁹ When drawing inference on Pr(x = j), j ∈ X̃, we can always define p_jj = p, and look at p_kk, k ∈ X̃∖{j}, as deviations from p.
in welfare programs, and j = 1 indicates non-participation, while j = 2 indicates participation, or when they are asked about their employment status, and j = 1, 2 indicates, respectively, employed or not in the labor force, while j = 3 indicates unemployed.
Suppose that the set X ≡ {1, 2, …, J} can be ordered according to the social desirability of the values that x can take, with x = 1 being the most desirable, and x = J the least desirable. Suppose further that the researcher believes that there is monotonicity in correct reporting. Then she can maintain the following:

Assumption 4. Pr(w = j|x = j) ≥ Pr(w = j+1|x = j+1), ∀j ∈ X∖{J}; Pr(w = J|x = J) ≥ 1 − λ ≥ 0,

where λ is strictly less than 1 if a non-trivial upper bound on the probability of a data error is available. When this assumption holds, H_E[P*] is defined as

H_{E,4}[P*] = {P : p_jj ≥ p_{(j+1)(j+1)}, ∀j ∈ X∖{J}; p_JJ ≥ 1 − λ}.
Let H_4[P*] = H_P[P*] ∩ H_{E,4}[P*], where H_P[P*] was defined in (2.4). Then H[P_x] and H{τ[P_x]} can be calculated using the nonlinear programming method described in Section 2, with H_E[P*] = H_{E,4}[P*].
To verify that Assumption 4 does have identifying power I again consider the case that λ < 1, and compare the results obtained using the nonlinear programming method when Assumption 4 is maintained, with those of Proposition 3. In Section 3.4 I consider the case in which x and w are binary (J = 2), and show that Assumption 4 can have identifying power even when λ = 1.
Suppose that Assumption 4 holds. Proposition 6 shows that the base case lower bound in (3.2), when informative, is feasible for Pr(x = 1). However, for j ∈ X∖{1}, if P^w_l > 0 for some l ∈ {1, …, j − 1}, the base case lower bound in (3.2), when informative, is not feasible for Pr(x = j), and hence the lower bound under Assumption 4 is strictly greater than that in (3.2). Regarding the base case upper bound in (3.2), the same results as those in Proposition 5 hold, with X̃ = {j, j+1, …, J}. The proof of this proposition derives almost directly from the proofs of Propositions 4–5.
Proposition 6. Suppose that Assumption 4 holds.

(a) Let P^w_j > λ. Then if j = 1, the base case lower bound in (3.2) is the sharp lower bound for Pr(x = 1). The lower bound for Pr(x = j), j ∈ X∖{1}, is strictly greater than the base case lower bound in (3.2). The result of Proposition 4, part (b), is unchanged.

(b) Let P^w_j < 1 − λ. Then the same results as in Proposition 5 hold, with X̃ = {j, j+1, …, J}. The result of Proposition 5, part (b), is unchanged.
Example 6 in Section 3.4 illustrates the identifying power of Assumption 4, by comparing the identification regions obtained using the nonlinear programming method with H_E[P*] = H_{E,4}[P*] with those obtained when only Assumption 2 is maintained.
3.4. Dichotomous variables and numerical examples
When x and w are dichotomous variables, the identifying power of Assumptions 3 and 4 can be more easily appreciated, since the bounds on H[P_x] can be derived explicitly. This section shows how. It then provides numerical examples of the identification regions obtained under Assumptions 2, 3 and 4, both for the case of J = 2 and J = 3.
Let X = {1, 2}. The problem of misclassification of a dichotomous variable has received much attention in the econometric, statistical, and epidemiological literature. It is in the context of misclassified dichotomous variables that most of the precedents to the use of restrictions on the misclassification probabilities take place.
To start, suppose that Assumption 3 holds. In the related literature it has often been assumed that Pr(w = 1|x = 2) = Pr(w = 2|x = 1), and additionally that these misclassification probabilities are less than 1/2 (see, e.g., Klepper, 1988; Card, 1996). Notice that with dichotomous variables Assumption 3 implies that Eq. (1.1)
can be rewritten as

⎡ Pr(w = 1) ⎤   ⎡ p*      1 − p* ⎤ ⎡ Pr(x = 1) ⎤
⎣ Pr(w = 2) ⎦ = ⎣ 1 − p*  p*     ⎦ ⎣ Pr(x = 2) ⎦ .
Hence, the identification region H[P_x] can be inferred from the identification region

H[Pr(x = 1)] = {p^x_1 : P^w_1 = p p^x_1 + (1 − p)(1 − p^x_1), p ∈ H_3[P*]},
where H_3[P*] was defined in Example 2. Notice that if p = 1/2, P^w_1 = 1/2; in this case, P(w|x) = P(w), i.e., x and w are statistically independent, and obviously knowledge of P(w) does not provide any information on P(x). If P^w_1 ≠ 1/2, then p ≠ 1/2. The following proposition characterizes explicitly H[Pr(x = 1)].
Proposition 7. Let Assumption 3 hold, with X̃ = X ≡ {1, 2}.

(a) If λ < 1/2, then

H[Pr(x = 1)] = [P^w_1, min((P^w_1 − λ)/(1 − 2λ), 1)]  if P^w_1 ≥ 0.5;
H[Pr(x = 1)] = [max((P^w_1 − λ)/(1 − 2λ), 0), P^w_1]  otherwise.

(b) If λ ≥ 1/2, then

H[Pr(x = 1)] = [P^w_1, 1]  if P^w_1 > λ;
H[Pr(x = 1)] = [0, (P^w_1 − λ)/(1 − 2λ)] ∪ [P^w_1, 1]  if λ ≥ P^w_1 > 1/2;
H[Pr(x = 1)] = [0, 1]  if P^w_1 = 1/2;
H[Pr(x = 1)] = [0, P^w_1] ∪ [(P^w_1 − λ)/(1 − 2λ), 1]  if 1/2 > P^w_1 ≥ 1 − λ;
H[Pr(x = 1)] = [0, P^w_1]  if 1 − λ > P^w_1.

These identification regions are a subset of those in (3.2).
The fact that if $\lambda \ge \frac{1}{2}$, $H[\Pr(x=1)]$ can be given by two disjoint intervals is a direct consequence of the possible disconnectedness of $H[\Pi^*]$ arising when one assumes constant probability of correct report, described in Section 2 and in Example 2.
Suppose now that Assumption 4 holds. Also in this case the identification region $H[P_x]$ can be inferred from the identification region
$$
H[\Pr(x=1)] = \left\{ p^x_1 : P^w_1 = \pi_{11}\, p^x_1 + (1-\pi_{22})(1-p^x_1),\ (\pi_{11}, \pi_{22}) \in H_4[\Pi^*] \right\}, \tag{3.6}
$$
where $H_4[\Pi^*]$ was defined in Example 3. Notice that again if $\pi_{11} = \pi_{22} = \frac{1}{2}$, $P^w_1 = \frac{1}{2}$; in this case, $P(w \mid x) = P(w)$, i.e., $x$ and $w$ are statistically independent, and obviously knowledge of $P(w)$ does not provide any information on $P(x)$. If $P^w_1 \ne \frac{1}{2}$, then $\pi_{11}$ and $\pi_{22}$ cannot be jointly equal to $\frac{1}{2}$. The following proposition characterizes explicitly $H[\Pr(x=1)]$.
Proposition 8. Let Assumption 4 hold.

(a) If $\lambda < \frac{1}{2}$, then
$$
H[\Pr(x=1)] =
\begin{cases}
\left[\, \max\!\left\{ \dfrac{P^w_1 - \lambda}{1 - \lambda},\, 0 \right\},\ \min\!\left\{ \dfrac{P^w_1 - \lambda}{1 - 2\lambda},\, 1 \right\} \right] & \text{if } P^w_1 \ge 0.5, \\[6pt]
\left[\, \max\!\left\{ \dfrac{P^w_1 - \lambda}{1 - \lambda},\, 0 \right\},\ P^w_1 \right] & \text{otherwise.}
\end{cases} \tag{3.7}
$$

(b) If $\lambda \ge \frac{1}{2}$, then
$$
H[\Pr(x=1)] =
\begin{cases}
\left[ \dfrac{P^w_1 - \lambda}{1 - \lambda},\, 1 \right] & \text{if } P^w_1 > \lambda, \\[4pt]
[0, 1] & \text{if } \lambda \ge P^w_1 \ge \frac{1}{2}, \\[4pt]
\left[ 0,\, P^w_1 \right] \cup \left[ \dfrac{P^w_1 - \lambda}{1 - 2\lambda},\, 1 \right] & \text{if } \frac{1}{2} > P^w_1 \ge 1 - \lambda, \\[4pt]
\left[ 0,\, P^w_1 \right] & \text{if } 1 - \lambda > P^w_1.
\end{cases} \tag{3.8}
$$

These identification regions are a subset of those in (3.2).
Again, the fact that if $\lambda \ge \frac{1}{2}$ and $P^w_1 < \frac{1}{2}$, $H[\Pr(x=1)]$ can be given by two disjoint intervals is a direct consequence of the possible disconnectedness of $H[\Pi^*]$ arising when one assumes monotonicity in correct reporting, described in Section 2 and in Example 3.

The following numerical example illustrates the identifying power of Assumptions 3 and 4 with $X = \{1, 2\}$ by comparing the bounds in Propositions 7 and 8 with those in (3.2), and showing how the bounds improve as $\lambda$ gets closer to the true misclassification parameter.
Example 5. Let $\Pr(x=1) = 0.3$ and $\pi^* = 0.9$, so that $P^w_1 = 0.34$. Table 1 gives lower and upper bounds on $\Pr(x=1)$, when Assumptions 2, 3 and 4 are maintained, as $\lambda$ approaches $1 - \pi^*$. Notice that the identification region for $\Pr(x=1)$ when Assumptions 3 and 4 are maintained is informative even when $\lambda = 1$.
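The closed forms of Propositions 7 and 8 are straightforward to code; the sketch below (implementation mine; the knife-edge case $\lambda = \frac{1}{2}$ is omitted for brevity) reproduces the "Monotonicity in correct reporting" (Proposition 8) and "Constant probability of correct report" (Proposition 7) columns of Table 1:

```python
# Closed-form bounds of Propositions 7 and 8, evaluated at the Table 1
# values: Pr(x=1) = 0.3 and pi* = 0.9, so P_w1 = 0.34.

def prop7(Pw1, lam):
    """H[Pr(x=1)] under Assumption 3 (constant probability of correct report)."""
    if lam < 0.5:
        b = (Pw1 - lam) / (1 - 2 * lam)
        return [(Pw1, min(b, 1.0))] if Pw1 >= 0.5 else [(max(b, 0.0), Pw1)]
    if Pw1 > lam:
        return [(Pw1, 1.0)]
    b = (Pw1 - lam) / (1 - 2 * lam)
    if Pw1 > 0.5:
        return [(0.0, b), (Pw1, 1.0)]
    if Pw1 == 0.5:
        return [(0.0, 1.0)]
    if Pw1 >= 1 - lam:
        return [(0.0, Pw1), (b, 1.0)]
    return [(0.0, Pw1)]

def prop8(Pw1, lam):
    """H[Pr(x=1)] under Assumption 4 (monotonicity in correct reporting)."""
    if lam < 0.5:
        lo = max((Pw1 - lam) / (1 - lam), 0.0)
        if Pw1 >= 0.5:
            return [(lo, min((Pw1 - lam) / (1 - 2 * lam), 1.0))]
        return [(lo, Pw1)]
    if Pw1 > lam:
        return [((Pw1 - lam) / (1 - lam), 1.0)]
    if Pw1 >= 0.5:
        return [(0.0, 1.0)]
    if Pw1 >= 1 - lam:
        return [(0.0, Pw1), ((Pw1 - lam) / (1 - 2 * lam), 1.0)]
    return [(0.0, Pw1)]

for lam in (1.0, 0.75, 0.4, 0.25, 0.1):
    fmt = lambda r: " U ".join(f"[{a:.2f}, {b:.2f}]" for a, b in r)
    print(f"lam={lam}: monotonicity {fmt(prop8(0.34, lam))}, "
          f"constant {fmt(prop7(0.34, lam))}")
```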
To conclude this section, I illustrate the identifying power of Assumption 3 (both for the case in which $\tilde X \subset X$ and $\tilde X = X$) and Assumption 4, when $J = 3$. I compare the identification regions $H[\Pr(x=j)]$, $j \in X$, $H[P_x]$ and $H[E(x)]$ obtained using the nonlinear programming method with $H_E[\Pi^*] = H_{E,3}[\Pi^*]$ and with $H_E[\Pi^*] = H_{E,4}[\Pi^*]$ with those obtained when only Assumption 2 is maintained.
Table 1
Identifying power of assuming monotonicity in correct reporting or constant probability of correct report vs. base-case, with dichotomous variables, for different values of $\lambda$

            Maintained assumptions
$\lambda$   Base-case         Monotonicity in          Constant probability
                              correct reporting        of correct report
            H[Pr(x=1)]        H[Pr(x=1)]               H[Pr(x=1)]
1.000       [0, 1]            [0, 0.34] ∪ [0.66, 1]    [0, 0.34] ∪ [0.66, 1]
0.750       [0, 1]            [0, 0.34] ∪ [0.82, 1]    [0, 0.34] ∪ [0.82, 1]
0.400       [0.00, 0.57]      [0.00, 0.34]             [0.00, 0.34]
0.250       [0.12, 0.45]      [0.12, 0.34]             [0.18, 0.34]
0.100       [0.27, 0.38]      [0.27, 0.34]             [0.30, 0.34]
Table 2
Identifying power of assuming monotonicity in correct reporting or constant probability of correct report vs. base-case

          Maintained assumptions                                                             Exact value
          Base-case        Monotonicity in     Constant probability of correct report
                           correct reporting   $\tilde X = \{1, 2\}$   $\tilde X = X$
Pr(x=1)   [0.180, 0.425]   [0.180, 0.415]      [0.235, 0.415]          [0.235, 0.415]       0.3
Pr(x=2)   [0.434, 0.687]   [0.525, 0.687]      [0.525, 0.687]          [0.551, 0.687]       0.6
Pr(x=3)   [0.000, 0.138]   [0.000, 0.138]      [0.000, 0.138]          [0.000, 0.137]       0.1
E(x)      [1.575, 1.955]   [1.585, 1.955]      [1.585, 1.899]          [1.585, 1.899]       1.8
Example 6. Let $X = \{1, 2, 3\}$, $\lambda = 0.2$, $\pi^* = 0.85$, $[\Pr(x=j);\ j \in X] = [0.3\ \ 0.6\ \ 0.1]^{\mathrm T}$, and suppose that $\pi^*_{21} = 0.11$, $\pi^*_{12} = 0.13$, $\pi^*_{13} = 0.04$, so that $P_w = [0.34\ \ 0.55\ \ 0.11]^{\mathrm T}$; with these values, $E(x) = 1.8$. Table 2 gives the identification regions for $\tau[P_x] = \Pr(x=j)$, $j \in X$, and for $\tau[P_x] = E(x)$, when Assumption 2 alone is maintained, when Assumptions 2 and 3 are jointly maintained with $\tilde X = X$ and with $\tilde X = \{1, 2\}$, and when Assumptions 2 and 4 are jointly maintained. The improvement in the upper bound on $\Pr(x=1)$ comes from the second equation of system (1.1); indeed $P^w_1 + P^w_2 = 0.89 > 0.885 = (1-\lambda) + \frac{\lambda}{1-\lambda}\, P^w_1$. Fig. 2 plots the identification regions $H[P_x]$ obtained under the different assumptions, mapping them in $\mathbb{R}^2$.
4. Estimation and inference for the distribution of pension plan types in the US

To illustrate estimation of the bounds and construction of the confidence sets, I consider data on the distribution of pension plan characteristics in the American population age 51–61. The data are based on household interviews obtained in the Health and Retirement Study (HRS), a longitudinal, nationally representative study of older Americans, which in its base year of 1992 surveyed 12,652 individuals from 7,607 households, with at least one household member born between 1931 and 1941. The survey has been updated every two years since 1992, and in 1998 a new cohort of 2,529 individuals born between 1942 and 1947 (so-called War Babies) was added to the HRS sample. I use data from the first HRS wave and from the War Babies wave, focusing on the information collected on pension plan characteristics for people age 51–61 and
[Fig. 2. Comparison of the identifying power of different assumptions for $H[P_x]$.]
employed at the time of the survey. This provides two nationally representative cross-sections of the population of interest. The question to be addressed is:

How did the distribution of pension plan types in the population of currently employed Americans, age 51–61, change between 1992 and 1998?
Three pension plan types are possible: DB, DC, and plans incorporating features of both (Both). DB and DC plans differ greatly in their characteristics. As described by Gustman et al. (2000), in a DB pension the benefit formula is specified by the plan sponsor, usually as a function of the worker's highest salary, years of service, and retirement age. Typically such plans reduce the benefit amount for retirement prior to the so-called normal retirement age, and are financed by employer (pre-tax) contributions. DC plans do not specify the retirement benefit, but they set how much is contributed into the account each year the worker remains with the plan. The benefit payout is then determined at retirement, as a function of how much has accumulated in the worker's account. The plan type can affect several pension-related variables, including pension wealth and pension accrual. For example, there are DB plans in which an additional year of service is rewarded by greater retirement benefits up to the firm's early retirement age. Beyond that age the benefit accrual profile may flatten out, and even become negative, if retirement is delayed further. By contrast, DC plans tend to be actuarially neutral with regard to the retirement age, rewarding delayed retirement more monotonically.
It is then of interest to learn how the distribution of pension plan types has changed over time, as a preliminary step before studying the relation between pension incentives and retirement and saving behavior. The HRS data can provide valuable information in this direction. However, there is evidence that workers are particularly misinformed about their pension plans' characteristics, and it is therefore not obvious how to make use of their reported pension plan descriptions to draw the inference of interest. Gustman and Steinmeier (2001) linked data from the first HRS wave with restricted data from the Social Security Administration and employer-provided pension plan descriptions, and documented that individuals with matched data (approximately 51% of the entire HRS sample and 67% of currently employed respondents) approaching retirement age are remarkably misinformed with regard to their pension plans' characteristics. Their results are reported in Table 3, and suggest that, overall, approximately 49% of the currently employed individuals with matched data correctly identify their pension plan type, the remaining 51% providing a wrong report.

For the individuals in the first HRS wave without a matched pension (33% of the sample) it is difficult to determine the true plan type: on one side, Gustman and Steinmeier (2001) document that the sub-sample without a matched pension differs from the sub-sample with a matched pension; on the other side, the evidence for the sub-sample with matched pension casts doubt on the reliability of the self-reports. Moreover, linked data are not available for individuals in subsequent waves, or for individuals in the War Babies wave.[10] Yet, the results of Gustman and Steinmeier's (2001) analysis provide information on the misreporting pattern,
Table 3
Percentage with self-reported plan type conditional on firm report of plan type, for respondents reporting pension coverage on current job with a matched employer plan description

Self-report    Provider report
               DB      DC      Both
DB             0.56    0.26    0.45
DC             0.15    0.54    0.18
Both           0.27    0.18    0.35
Don't Know     0.02    0.02    0.02

Sample size: 2,907. Source: Gustman and Steinmeier (2001, Table 6C).
[10] Additionally, employer-provided pension plan descriptions are not publicly accessible by HRS users. In particular, such data are not available for the analysis carried out in this paper.
and such information can be exploited through the direct misclassification approach to draw inference on the question of interest.

In all that follows I assume that the HRS respondents correctly report whether they are covered by a pension,[11] and I take firm-reported plan types to be the true plan types. I also ignore the observations with missing data (about 2% of the sample). Let $x = 1$ if the individual has a DB plan, $x = 2$ if the individual has a DC plan, and $x = 3$ if the individual has a plan combining features of both, so that $X \equiv \{1, 2, 3\}$. As before, $w \in X$ denotes the reported pension plan type. Let $P_{w,t} \equiv [\Pr_t(w=j);\ j \in X]$ and $P_{x,t} \equiv [\Pr_t(x=j);\ j \in X]$ denote, respectively, the vectors of fractions of reported pension plan types and true pension plan types at time $t = 1992, 1998$. For the respondents in the first HRS wave, let $s_l = 1$ denote the fact that individual $l \in L_{1992}$ has a matched pension plan description, $s_l = 0$ otherwise, and denote by $\Pi^{*1}_{1992}$ the matrix of misclassification probabilities that maps the true pension plan types into the reported types for individuals with matched pension plan descriptions. Let $\Pi^{*0}_{1992}$ denote the matrix of misclassification probabilities for the respondents in the first HRS wave without a matched plan description, and let $\Pi^{*}_{1998}$ denote the matrix of misclassification probabilities for the entire sample of respondents in the War Babies wave. Table 3 reveals, up to statistical considerations, $\Pi^{*1}_{1992}$. From the HRS data and from Gustman and Steinmeier's (2001) results one can learn $P_{w,1992}$, $P_{w,1998}$, and $[\Pr_{1992}(x=j \mid s=1);\ j \in X]$. These values are reported in Table 4, along with 95% confidence intervals.
One might expect the misclassification pattern reported by Gustman and Steinmeier (2001) to hold for the entire set of respondents to the 1992 HRS survey. On the other hand, one might expect that the misclassification structure mapping true pension plan types into reported types changes over time, so that $\Pi^{*1}_{1992}$ can help in constructing $H[\Pi^{*}_{1998}]$, but not reduce this set to a singleton. However, one might as well be tempted to entertain assumptions strong enough to achieve point identification of the quantity of interest. To test the credibility of these conjectures, I examine the following assumptions:

Assumption E1 (No selection). $\Pi^{*}_{1992} = \Pi^{*1}_{1992}$.

Assumption E2 (No selection and no variation over time). $\Pi^{*}_{1998} = \Pi^{*1}_{1992}$.
The first assumption states that the misreporting pattern for respondents in the first HRS wave with matched pension plan description holds for the entire sample of the first HRS wave. The second assumption states that the misreporting pattern for the respondents in the War Babies wave is the same as that for the respondents with matched data in the first HRS wave. When these assumptions are maintained, $\Pi^{*}_{1992}$ and $\Pi^{*}_{1998}$ are identified, and, since $\Pi^{*1}_{1992}$ is non-singular, one can use the equation $p_x = \Pi^{-1} P_w$ to attempt to learn $[\Pr_t(x=j);\ j \in X]$, $t = 1992, 1998$. Table 5 reports the results of such a procedure, along with 95% bootstrap confidence intervals. The data reject the assumption that $\Pi^{*}_{1998} = \Pi^{*1}_{1992}$: the vector obtained from solving $(\Pi^{*1}_{1992})^{-1} P_{w,1998}$ does not generate a valid probability measure. In particular, the first element of the implied
Table 4
True fractions of pension plan types for the subset of respondents with matched data for 1992, as calculated by Gustman and Steinmeier (2001, Table 6A), and reported fractions of pension plan types for 1992 and 1998 (author's calculations)

                     t = 1992                             t = 1992                t = 1998
                     Point est.  95% CI                   Point est.  95% CI      Point est.  95% CI
Pr_t(x=1 | s=1)      0.48   [0.46, 0.50]   Pr_t(w=1)      0.44   [0.42, 0.45]    0.28   [0.25, 0.30]
Pr_t(x=2 | s=1)      0.21   [0.19, 0.22]   Pr_t(w=2)      0.30   [0.29, 0.32]    0.38   [0.35, 0.41]
Pr_t(x=3 | s=1)      0.31   [0.29, 0.33]   Pr_t(w=3)      0.26   [0.24, 0.27]    0.34   [0.31, 0.37]
Sample size          N = 2,907                            N = 4,244              N = 1,124
[11] This assumption is based on Gustman and Steinmeier's (2001) comparison between people's reports on their pension coverage in both the 1992 and 1994 waves of the HRS. This comparison shows that 93% of the respondents who declared either to be covered or not to be covered by a pension in 1992 give the same answer in 1994. Of the remaining 7%, approximately 80% are individuals who declared not to be covered by a pension in 1992 but to be covered in 1994.
vector is negative and its 95% confidence interval does not cover zero, and the last element is greater than one. Hence, point identification of $P_{x,1998}$ through Assumption E2 is not possible. On the other hand, the data do not reject the assumption that $\Pi^{*}_{1992} = \Pi^{*1}_{1992}$, despite the possible selection problem. In all that follows I maintain Assumption E1 and focus attention on the problem of inferring $H[P_{x,1998}]$. Of course, Assumption E1 can be relaxed, and $H[P_{x,1992}]$ can be estimated under weaker assumptions using the direct misclassification approach.
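The rejection of Assumption E2 can be replicated from the rounded entries of Tables 3 and 4. Dropping the small "Don't Know" row so that $\Pi^{*1}_{1992}$ is square is a simplification of mine; the paper works with unrounded data, which is why it reports $(-0.86, 0.48, 1.38)$ in Table 5 while the rounded inputs below give roughly $(-0.80, 0.48, 1.34)$, with the same violations:

```python
# Point identification under E1/E2: solve Pi * p = P_w for p.
# Pi*1_1992 is taken from Table 3; the "Don't Know" row is dropped so the
# matrix is square (a simplification; the paper uses unrounded data).

def solve3(A, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    def det(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
              - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
              + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    d = det(A)
    return [det([[b[i] if k == j else A[i][k] for k in range(3)]
                 for i in range(3)]) / d for j in range(3)]

Pi1992 = [[0.56, 0.26, 0.45],
          [0.15, 0.54, 0.18],
          [0.27, 0.18, 0.35]]
p92 = solve3(Pi1992, [0.44, 0.30, 0.26])  # E1: a valid probability vector
p98 = solve3(Pi1992, [0.28, 0.38, 0.34])  # E2: NOT a valid probability vector
print([round(v, 2) for v in p92])  # roughly comparable to (0.46, 0.37, 0.17)
print([round(v, 2) for v in p98])  # first element < 0, last element > 1
```

The differences from the published point estimates come entirely from rounding and from the dropped row.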
The main assumption that I maintain throughout the entire analysis, and that I use to exploit part of the information in $\Pi^{*1}_{1992}$ to learn $H[P_{x,1998}]$, is the following:

Assumption E3 (No reduction in awareness). $\pi_{jj,1998} \ge \pi_{jj,1992}$, $\forall j \in X$.

This assumption says that the fraction of individuals correctly identifying their pension plan type does not decline over time. This in turn implies that lower bounds on the probability of correct report in 1992 provide lower bounds on the probability of correct report in 1998. Assumption E3 is motivated by the observation that in recent years the Social Security Administration and the Department of Labor have increasingly expanded their efforts to improve individuals' knowledge about pensions and about retirement saving in general (see Gustman and Steinmeier, 2001 for a summary of recent interventions).
I now introduce two sets of assumptions, which I entertain along with Assumption E3 to construct the set $H[\Pi^{*}_{1998}]$ and derive $H[P_{x,1998}]$. Of course, different empirical researchers might hold disparate beliefs about which of the assumptions in Cases 1 and 2 hold; moreover, they might bring to bear different prior information.

The identification regions $H[P_{x,1998}]$ are plotted in Fig. 3, along with their 95% confidence sets. The identification regions $H[\Pr_{1998}(x=j)]$, $j \in X$, are reported in Table 6, again with their 95% confidence intervals.
Case 1:
$$
H[\Pi^{*}_{1998}] = H_P[\Pi^*] \equiv \left\{ \Pi : \pi_{11} = \pi_{22} \ge 0.54;\ \pi_{22} \ge \pi_{33} \ge 0.35;\ \pi_{21} \le \pi_{12};\ \pi_{31} \le \pi_{13};\ \pi_{23} \le \pi_{13} \right\}.
$$

Case 1 maintains Assumption E3 and builds on Assumption E1. I assume that certain of the findings of Gustman and Steinmeier (2001) for matched respondents in 1992 are informative about respondents in 1998. I assume that the probability of correct report for 1998 respondents who truly have a DB or a DC plan is at least as large as the corresponding probability for 1992 respondents. I also assume that persons with DB and DC pensions have the same probabilities of correct report, these being at least as large as the probability of correct report by those whose pensions are of the Both type. This assumption is motivated by Table 3, which shows this pattern for 1992 respondents.
I also assume that various other features of Table 3 carry over to respondents in the War Babies wave. I assume that persons who truly have a pension plan of the Both type report their plan as DB more often than the reverse pattern, whereby persons with DB plans report themselves as having a plan of the Both type. I assume that persons who truly have a DC plan report a DB plan more often than individuals with a DB plan report a DC one. And I assume that persons who truly have a plan of the Both type report a DB plan more often than a DC one. These assumptions are expressed through the inequalities $\pi_{21} \le \pi_{12}$, $\pi_{31} \le \pi_{13}$, $\pi_{23} \le \pi_{13}$.
Table 5
Implications of Assumption E1 (no selection) and Assumption E2 (no selection and no variation over time) for the identification regions of $[\Pr_t(x=j);\ j \in X]$, $t = 1992, 1998$

                               t = 1992: No selection          t = 1998: No selection and no variation over time
                               Point est.   Bootstrap 95% CI   Point est.   Bootstrap 95% CI
$(\Pi^{*1}_{1992})^{-1}P_{w,t}$   0.46      [0.36, 0.60]       -0.86        [-1.82, -0.43]
                               0.37         [0.33, 0.41]       0.48         [0.31, 0.63]
                               0.17         [0.02, 0.28]       1.38         [0.89, 2.37]
Sample size                    N = 4,244                       N = 1,124
The first panel of Fig. 3 shows the estimate of $H[P_{x,1998}]$ obtained in Case 1, and its confidence set, mapped in $\mathbb{R}^2$. Interestingly, these sets are non-convex. For the construction of the confidence set, I estimated $P_{w,1998}$ using sample means, and took as estimates of the lower bounds in $H_E[\Pi^*]$ the values $m_{1,n}$, $m_{2,n}$ in the (2,2) and (3,3) entries of Table 3. These estimates are borrowed from Gustman and Steinmeier (2001) and are based on validation data (respondents to the 1992 wave with matched pension plan descriptions) independent of the 1998 data, with $n = 2{,}907$. For the construction of the confidence ellipsoid for $[P^{w,1998}_1, P^{w,1998}_2, m_1, m_2]$ I used $\kappa = \frac{N}{n} = \frac{1{,}124}{2{,}907}$. The estimates of $\Pr_{1992}(x=1)$ and $H[\Pr_{1998}(x=1)]$ reported in Table 6 suggest that the fraction of individuals having a DB plan should have declined between 1992 and 1998. However, the confidence set for $H[\Pr_{1992}(x=1) - \Pr_{1998}(x=1)]$ covers negative numbers, and therefore the hypothesis $\Pr_{1992}(x=1) - \Pr_{1998}(x=1) < 0$ cannot be rejected. This shows that relatively mild restrictions yield a strong conclusion regarding the question of interest, although more assumptions are needed to obtain statistical significance.
Table 6
Identification regions in Cases 1–2 for $\Pr_{1998}(x=j)$, and point estimates for $\Pr_{1992}(x=j)$

                 H[Pr_t(x=1)]                 H[Pr_t(x=2)]                 H[Pr_t(x=3)]
                 Estimate       95% CI        Estimate       95% CI        Estimate       95% CI
t = 1992         0.46           [0.36, 0.60]  0.37           [0.33, 0.41]  0.17           [0.02, 0.28]
Case 1, 1998     [0.00, 0.42]   [0.00, 0.44]  [0.11, 0.72]   [0.10, 0.87]  [0.00, 0.89]   [0.00, 0.91]
Case 2, 1998     [0.00, 0.28]   [0.00, 0.34]  [0.35, 0.61]   [0.28, 0.80]  [0.11, 0.50]   [0.00, 0.67]
Sample size: N = 4,244 for 1992; N = 1,124 for 1998.
[Fig. 3. Identification regions and confidence sets for $H[P_{x,1998}]$ under different assumptions. Two panels (Case 1 and Case 2) plot $H[P_{x,1998}]$ and its confidence set $C_N^{H[P_{x,1998}]}$ in the probability simplex with vertices $[1\ 0\ 0]$, $[0\ 1\ 0]$, $[0\ 0\ 1]$.]
Case 2:
$$
H[\Pi^{*}_{1998}] = H_P[\Pi^*] \equiv \left\{ \Pi :
\begin{array}{l}
\pi_{11} = \pi_{22} \ge \pi_{33} \ge 0.54; \\
\pi_{21} \le \pi_{12};\ \pi_{31} \le \pi_{13};\ \pi_{23} \le \pi_{13}; \\
\pi_{21} \ge 0.10;\ \pi_{ij} \ge 0.15 \text{ for all other } i, j \in X,\ i \ne j
\end{array}
\right\}.
$$
Case 2 builds on Case 1, as it retains all the assumptions maintained there. However, it is crucially set apart from the previous case in that it requires a lower bound on each probability of misclassification. This in turn implies that, given any true pension plan type, the probability of correct report has to be necessarily less than one. This assumption is motivated by the large amount of misreporting of pension plan types which appears in Table 3, and which is documented at large by Gustman and Steinmeier (2001). Additionally, $\pi_{33}$ is required to have the same lower bound as $\pi_{11}$ and $\pi_{22}$. This is motivated by the large number of information campaigns on DC plans (in particular 401(k) plans) that characterized the mid to late 1990s.
Under these assumptions, the estimate of $H[P_{x,1998}]$ shrinks further. This allows one to conclude that the fraction of individuals having DB plans decreased between 1992 and 1998; in particular, $\Pr_{1992}(x=1) - \Pr_{1998}(x=1) \ge 0.18$. This in turn implies that the fraction of individuals having either DC plans or plans incorporating features of both increased sharply between 1992 and 1998. The confidence set for $H[\Pr_{1992}(x=1) - \Pr_{1998}(x=1)]$ does not contain negative numbers, so that the hypothesis $\Pr_{1992}(x=1) - \Pr_{1998}(x=1) < 0$ can be rejected. The confidence set for $H[P_{x,1998}]$ in Case 2 is constructed again by estimating $P_{w,1998}$ using sample means, and taking as estimate of the lower bound for $\pi_{jj}$, $j = 1, 2, 3$, in $H_E[\Pi^*]$ the value $m_n$ in the (2,2) entry of Table 3. However, the lower bounds for the other parameters are treated as constants, so that the confidence ellipsoid is constructed exclusively for the vector $[P^{w,1998}_1, P^{w,1998}_2, m]$.
By comparison, if one did not use all the information provided by Gustman and Steinmeier's (2001) analysis, but imposed only a uniform lower bound on the probability of correct report (Assumption 2), the results of HM would apply. If one assumed $1 - \lambda = 0.35$, one would learn that $\Pr_{1998}(x=1) \in [0, 0.79]$, $\Pr_{1998}(x=2) \in [0, 1]$, $\Pr_{1998}(x=3) \in [0, 0.97]$. If one assumed $1 - \lambda = 0.54$, one would learn that $\Pr_{1998}(x=1) \in [0, 0.51]$, $\Pr_{1998}(x=2) \in [0, 0.71]$, $\Pr_{1998}(x=3) \in [0, 0.63]$. These bounds do not allow one to identify the sign of the change in the fraction of individuals having a DB plan.
5. Extensions

The direct misclassification approach can be easily extended to drawing inference in the presence of multiple misclassified variables, regression with misclassified outcome, regression with misclassified regressor, and jointly missing and misclassified outcomes. Below I briefly list the modifications of the approach that allow inference in each of these cases.
5.1. Two or more misclassified variables

In this case, the researcher simply has to redefine variables. Suppose that interest centers on features of $P(x_1, x_2)$, $x_1 \in X_1 \equiv \{1, 2, \ldots, J_1\}$, $x_2 \in X_2 \equiv \{1, 2, \ldots, J_2\}$, $2 \le J_1, J_2 < \infty$, and the researcher observes only $(w_1, w_2)$, a misclassified version of $(x_1, x_2)$. She can then construct random variables $s$ and $r$, taking values in $S \equiv \{1, 2, \ldots, J_1 J_2\}$, such that $s = (l-1)J_1 + j$ if $x_1 = j$ and $x_2 = l$, and $r = (k-1)J_1 + i$ if $w_1 = i$ and $w_2 = k$. She can then write the analogue of Eq. (1.1) for $r$ and $s$, and use the method proposed here to draw the inference of interest.
5.2. Regressions

(a) If interest centers on features of $P(x \mid s = s_0)$, where $s \in S$ is a perfectly observable discrete covariate with $\Pr(s = s_0) > 0$, and the researcher has prior information on $\Pi^*_{s_0} \equiv [\Pr(w = i \mid x = j, s = s_0)]_{i,j \in X}$, the proposed method can be applied directly, with the event $s = s_0$ conditioning all the probabilities involved.
(b) Consider now the case in which interest centers on features of $P(y \mid x)$, where $y$ is a perfectly observed outcome variable. The problem of regression with misclassified covariates has been widely studied (e.g., Aigner, 1973; Klepper, 1988; Bollinger, 1996; Card, 1996; Kane et al., 1999; Hu, 2006; Mahajan, 2006), and point-identified or interval-identified estimators have been proposed under specific sets of assumptions. The direct misclassification approach can be used to estimate the smallest point and the largest point in the identification region of (for example) a mean regression under any set of assumptions. Molinari (2003) shows how. Here I present the ideas for the special case in which the probability of correct report is greater than $\frac{1}{2}$ for each of the values that $x$ can take (and any additional assumption might hold). In this case any $\Pi \in H[\Pi^*]$ is of full rank, so that $p_x = \Pi^{-1} P_w$. This implies that a feasible value of $[\Pr(x = j \mid w = i);\ i, j \in X]$ can be uniquely expressed as a function of $\Pi$. Hence, for each $\Pi \in H[\Pi^*]$, I can use the results of HM to obtain sharp bounds on $E(y \mid w = i, x = j)$ and use the Law of Total Probability to infer sharp $\Pi$-dependent bounds on $E(y \mid x = j)$. Taking the infimum and the supremum, respectively, of the smallest and largest points in these bounds for $\Pi \in H[\Pi^*]$ gives the smallest and the largest point in $H[E(y \mid x = j)]$, $j \in X$.

This same argument has been proposed by Dominitz and Sherman (2006), who studied the problem of inferring the distribution of test scores for truly English proficient students ($x = 1$), when only an imperfect indicator of English proficiency is available ($w = 1$). They used a mixture model with verification and assumed that students classified as English proficient ($w = 1$) are more likely to be truly English proficient ($x = 1$) than students classified as limited English proficient ($w = 2$). In terms of misclassification probabilities, this assumption translates into $\pi_{11} \ge P^w_1$.
5.3. Jointly missing and misclassified data

The data available to the empirical researcher are often not only error ridden, but also incomplete. Consider the example of survey respondents being asked about their pension plan type: not only can they report DB, DC, or Both, but they can as well choose not to respond to the question. Let $w = J+1$ denote this outcome. Then system (1.1) needs to be enlarged to include the equation
$$
\Pr(w = J+1) = \sum_{j=1}^{J} \Pr(w = J+1 \mid x = j) \Pr(x = j).
$$
This simply implies that the set $H[\Pi^*]$ is a set of rectangular matrices. The identification regions $H[P_x]$ and $H\{\tau[P_x]\}$ are still defined as in (2.2) and (2.3), and the nonlinear programming method can be used to consistently estimate them. Of course, there are additional constraints, one coming from the $(J+1)$th equation in the above system, and the others from possible assumptions on the relationship between misreporting and non-response.
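A minimal sketch of the enlarged system, with hypothetical numbers (all matrix entries below are illustrative, not estimates): $\Pi$ becomes a $(J+1) \times J$ column-stochastic matrix, and the extra row generates $\Pr(w = J+1)$:

```python
# Enlarged system with non-response: Pi is now (J+1) x J, columns still sum
# to one, and the extra row carries Pr(w = J+1 | x = j).  Numbers are
# hypothetical, for illustration only.
J = 3
px = [0.3, 0.6, 0.1]
Pi = [[0.80, 0.10, 0.05],
      [0.05, 0.75, 0.10],
      [0.05, 0.05, 0.70],
      [0.10, 0.10, 0.15]]   # row J+1: Pr(w = J+1 | x = j), non-response
assert all(abs(sum(Pi[i][j] for i in range(J + 1)) - 1) < 1e-12
           for j in range(J))
Pw = [sum(Pi[i][j] * px[j] for j in range(J)) for i in range(J + 1)]
print([round(v, 3) for v in Pw])  # J+1 reported-category fractions, summing to one
```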
6. Conclusions

This paper has studied the problem of drawing inference when a discrete variable is subject to classification errors. This is a commonplace problem in surveys and elsewhere. The problem has long been conceptualized through convolution and mixture models. This paper introduced the direct misclassification approach. The approach is based on the observation that in the presence of classification errors, the relation between the distribution of the true but unobservable variable and its misclassified representation is given by a linear system of simultaneous equations, in which the coefficient matrix is the matrix of misclassification probabilities.

While this matrix is unknown, validation studies, economic theory, cognitive and social psychology, or knowledge of the circumstances under which the data have been collected can provide information on the misclassification pattern that has transformed the true but unobservable variable into the observable but possibly misclassified variable. The method introduced in this paper shows how to transform such prior information into sets of restrictions on the (unknown) matrix of misclassification probabilities, and exploit these restrictions to derive identification regions for any real functional of the distribution of interest. By contrast, mixture models do not allow the researcher to easily exploit this type of prior information to learn features of the distribution of interest. Convolution models, as usually implemented with the assumption of independence between measurement error and true variable, are not suited to analyze errors in discrete data. The direct misclassification approach does not rely on any specific set of assumptions, but it can incorporate into the analysis any prior information that the researcher might have on the misreporting pattern. In some cases the implied identification regions have a simple closed-form solution that allows for straightforward estimation using sample analogs. When this is not the case, the identification regions can be estimated using the nonlinear programming estimator introduced in this paper. Confidence sets that cover the true identification region with probability at least equal to a prespecified confidence level can be constructed using a simple procedure based on the inversion of a Wald statistic.
Acknowledgments

I am grateful to the Associate Editor, two anonymous reviewers, Tim Conley, Joel Horowitz, Rosa Matzkin, and especially Chuck Manski for helpful comments and suggestions. I have benefited from discussions with T. Bar, G. Barlevy, L. Barseghyan, L. Blume, R. DiCecio, M. Goltsman, A. Guerdjikova, G. Jakubson, N. Kiefer, R. Lentz, G. Menzio, B. Meyer, M. Peski, J. Sullivan, C. Taber, E. Tamer, T. Tatur, and T. Vogelsang, and from the comments of seminar participants at Boston College, Chicago GSB, Cornell, Duke, Georgetown, Pittsburgh, Penn, Penn State, Princeton, Purdue, Toronto, UCLA, UCL, Virginia, and at the 2003 Southern Economic Association Meetings. All remaining errors are my own. Research support from a Northwestern University Dissertation Year Fellowship, the Center for Analytic Economics at Cornell University, and National Science Foundation Grant SES-0617482 is gratefully acknowledged.
Appendix A. Proofs of Propositions

A.1. Propositions in Section 2

A.1.1. Proposition 1

Proof. Let $\Pi^1 \in H_P[\Pi^*]$. This means that $\exists\, p^1 \in \Delta_{J-1}$ such that $\Pi^1 p^1 = P_w$. Now observe that for any $p \in \Delta_{J-1}$, $\tilde\Pi p = P_w$. Hence, for any $\alpha \in (0, 1)$ it holds that $(\alpha \Pi^1 + (1-\alpha)\tilde\Pi)\, p^1 = P_w$, and therefore $(\alpha \Pi^1 + (1-\alpha)\tilde\Pi) \in H_P[\Pi^*]$. To show that $H_P[\Pi^*]$ is not star convex with respect to any other of its elements, consider a matrix $\Pi^1 \in H_P[\Pi^*]$ with $\Pi^1 \ne \tilde\Pi$. Because $\Pi^1 \ne \tilde\Pi$, it follows that there exists an $i \in X$ such that not all elements of the $i$th row of $\Pi^1$ are equal to $P^w_i$. Without loss of generality, let $i = 1$. Let $\pi^1_{1j} > P^w_1 > 0$ (a similar argument works for the case that $\pi^1_{1j} < P^w_1$), and without loss of generality suppose $j = 1$. Construct $\Pi^2$ as follows: the first column of $\Pi^2$ equals $P_w$, and $\pi^2_{1k} = 1$ $\forall k \in X \setminus \{1\}$. Then $\Pi^2 \in H_P[\Pi^*]$. Let $\Pi^\alpha = \alpha \Pi^1 + (1-\alpha)\Pi^2$. Then for any $\alpha \in (0, 1 - P^w_1)$ it follows that $\Pi^\alpha \notin H_P[\Pi^*]$, because every element in the first row of the resulting matrix is strictly greater than $P^w_1$. $\square$
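The geometry of this argument can be illustrated numerically (the numbers below are mine, chosen for $J = 3$): for $\alpha \in (0, 1 - P^w_1)$, every entry in the first row of $\Pi^\alpha$ exceeds $P^w_1$, so $(\Pi^\alpha p)_1 > P^w_1$ for every $p$ in the simplex and $\Pi^\alpha$ cannot reproduce $P_w$:

```python
# Numeric illustration of the Proposition 1 argument (J = 3).
Pw = [0.5, 0.3, 0.2]
Pi1 = [[1, 0, 0],           # Pi1 @ Pw = Pw, and pi1_11 = 1 > Pw[0]:
       [0, 1, 0],           # Pi1 is in H_P[Pi*] and satisfies the premise
       [0, 0, 1]]
Pi2 = [[0.5, 1.0, 1.0],     # first column equals P_w, remaining first-row
       [0.3, 0.0, 0.0],     # entries equal 1: Pi2 is in H_P[Pi*] via
       [0.2, 0.0, 0.0]]     # p = (1, 0, 0)
for alpha in (0.1, 0.25, 0.49):   # alpha in (0, 1 - Pw[0]) = (0, 0.5)
    row1 = [alpha * Pi1[0][k] + (1 - alpha) * Pi2[0][k] for k in range(3)]
    # (Pi_alpha p)_1 is a convex combination of row1, so it exceeds Pw[0]
    # for every p in the simplex: Pi_alpha cannot reproduce P_w.
    assert min(row1) > Pw[0]
print("segment toward Pi2 leaves H_P[Pi*] for alpha in (0, 1 - Pw[0])")
```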
A.1.2. Proposition 2

For given vectors of positive probabilities $P_w$ and positive constants $\lambda \equiv [m_1, \ldots, m_q]$, let $Q(p; P_w, \lambda)$ denote the value function in the nonlinear programming problem (2.7)–(2.8). Observe that $Q_N(p) = Q(p; P_w^N, \lambda_n)$. Let $(v^1, \Pi^1)$ be the maximizer of $Q(p; P_w^1, \lambda^1)$, where $P_w^1, \lambda^1$ are arbitrary values of $P_w, \lambda$ (recall that this problem always has an optimal solution). I show that for different feasible arbitrary values $P_w^2, \lambda^2$, the difference $|Q(p; P_w^1, \lambda^1) - Q(p; P_w^2, \lambda^2)|$ is $O_p(N^{-1/2})$. The strategy of this proof is similar to the one in Honoré and Lleras-Muney (2006), except that here some more complications arise due to the possible nonlinearity of some of the constraints. This establishes that
$$
\sup_{p \in \Delta_{J-1}} |Q_N(p) - Q(p)| \overset{p}{\longrightarrow} 0 \quad \text{and} \quad \sup_{p \in \Delta_{J-1}} \frac{|Q_N(p) - Q(p)|}{\epsilon_N} = o_p(1).
$$
The consistency result then follows from Manski and Tamer (2002, Proposition 5).
To simplify the notation, let q = q, and assume that q
1
components of l are estimated for the greater-than-
or-equal constraints, q
2
for the less-than-or-equal constraints, and q
3
for the equality constraints,
ARTICLE IN PRESS
F. Molinari / Journal of Econometrics 144 (2008) 81117 106
$q_1 + q_2 + q_3 = q$. Let

$$c_1 = \min\left\{ \min_j \frac{P^{w,2}_j}{P^{w,1}_j},\ \min_{l \in \{1,\ldots,q_1\}} \frac{m^2_l}{m^1_l},\ \min_{m \in \{1,\ldots,q_2\}} \frac{m^1_{q_1+m}}{m^2_{q_1+m}},\ \min_{s \in \{1,\ldots,q_3\}} \frac{m^2_{q_1+q_2+s}}{m^1_{q_1+q_2+s}},\ \min_{s \in \{1,\ldots,q_3\}} \frac{m^1_{q_1+q_2+s}}{m^2_{q_1+q_2+s}} \right\}.$$

This implies that $0 < c_1 \le 1$. Let $\bar{\Pi} \equiv c_1 \Pi^1 \ge 0$, and

$$\bar{v}_j \equiv 1 - \sum_{i=1}^J \bar{\pi}_{ij} = 1 - c_1 \sum_{i=1}^J \pi^1_{ij} \ge 1 - \sum_{i=1}^J \pi^1_{ij} \ge 0, \quad j = 1, \ldots, J,$$

$$\bar{v}_{J+j} \equiv P^{w,2}_j - \sum_{i=1}^J \bar{\pi}_{ji} \xi_i = P^{w,2}_j - c_1 \sum_{i=1}^J \pi^1_{ji} \xi_i \ge \frac{P^{w,2}_j}{P^{w,1}_j} v^1_{J+j} \ge 0, \quad j = 1, \ldots, J.$$

Notice that $|\bar{v}_j - v^1_j| \le J(1 - c_1)$ and $|\bar{v}_{J+j} - v^1_{J+j}| \le (1 + J)\left(\frac{1 - c_1}{c_1}\right)$.

Consider now the constraints defining $H_E[\Pi^\star]$. Let $r \equiv \max(t, \max_l r_l)$, where $t$ is the degree of the polynomial. Observe that if $f_l(\Pi^1) \ge m^1_l$ then $v^1_{2J+l} = 0$; if $f_l(\Pi^1) < m^1_l$ then $v^1_{2J+l} = m^1_l - f_l(\Pi^1)$. For $l = 1, \ldots, q_1$, let

$$\bar{v}_{2J+l} \equiv \begin{cases} 0 & \text{if } f_l(\Pi^1) \ge m^1_l \text{ and } f_l(\bar{\Pi}) \ge m^2_l; \\ f_l(\Pi^1) - m^1_l - (f_l(\bar{\Pi}) - m^2_l) & \text{if } f_l(\Pi^1) \ge m^1_l \text{ and } f_l(\bar{\Pi}) < m^2_l; \\ m^2_l - f_l(\bar{\Pi}) & \text{if } f_l(\Pi^1) < m^1_l. \end{cases} \tag{A.1}$$

The suggested values of $\bar{v}_{2J+l}$ are feasible. In fact, if $f_l(\Pi^1) \ge m^1_l$ the implied $\bar{v}_{2J+l}$ is obviously non-negative. If $f_l(\Pi^1) < m^1_l$,

$$\bar{v}_{2J+l} = m^2_l - f_l(\bar{\Pi}) = m^2_l - f_l(c_1 \Pi^1) \ge \frac{m^2_l}{m^1_l} m^1_l - c_1 f_l(\Pi^1) \ge c_1 v^1_{2J+l} \ge 0,$$

where the first inequality follows from Assumption C1. Moreover,

$$|\bar{v}_{2J+l} - v^1_{2J+l}| \le |m^2_l - m^1_l| + |f_l(\bar{\Pi}) - f_l(\Pi^1)| \le M\left(\frac{1 - c_1}{c_1}\right) + \max_{\Pi \in [0,1]^{J^2}} |f_l(\Pi)| \,(1 - c_1^{r+1}),$$

where $\max_{\Pi \in [0,1]^{J^2}} |f_l(\Pi)|$ is bounded because $f_l(\cdot)$ is a continuous function on a compact set.

Regarding the less-than-or-equal constraints, observe that under Assumption C1 a monotone transformation of $g_m(\Pi)$ and $m_{q_1+m}$ leaves the constraint unaltered. Hence without loss of generality when $g_m(\cdot)$ satisfies Assumption C1(i), let $r_m = 1$.

Now, notice that if $g_m(\Pi^1) \le m^1_{q_1+m}$ then $v^1_{2J+q_1+m} = 0$; if $g_m(\Pi^1) > m^1_{q_1+m}$ then $v^1_{2J+q_1+m} = g_m(\Pi^1) - m^1_{q_1+m}$. For $m = 1, \ldots, q_2$, let

$$\bar{v}_{2J+q_1+m} \equiv \begin{cases} 0 & \text{if } g_m(\Pi^1) \le m^1_{q_1+m} \text{ and } g_m(\bar{\Pi}) \le m^2_{q_1+m}; \\ m^1_{q_1+m} - g_m(\Pi^1) + \left[\dfrac{1}{c_1^{r+1}} g_m(\bar{\Pi}) - m^2_{q_1+m}\right] & \text{if } g_m(\Pi^1) \le m^1_{q_1+m} \text{ and } g_m(\bar{\Pi}) > m^2_{q_1+m}; \\ \dfrac{1}{c_1^{r+1}} g_m(\bar{\Pi}) - m^2_{q_1+m} & \text{if } g_m(\Pi^1) > m^1_{q_1+m}. \end{cases}$$

This choice of $\bar{v}_{2J+q_1+m}$ satisfies the constraint in (2.8). In fact, if $g_m(\bar{\Pi}) \le m^2_{q_1+m}$ the constraint is satisfied with $\bar{v}_{2J+q_1+m} = 0$, and in the other cases

$$m^2_{q_1+m} - g_m(\bar{\Pi}) + \bar{v}_{2J+q_1+m} \ge m^2_{q_1+m} - g_m(\bar{\Pi}) + \frac{1}{c_1^{r+1}} g_m(\bar{\Pi}) - m^2_{q_1+m} = \left(\frac{1}{c_1^{r+1}} - 1\right) g_m(\bar{\Pi}) \ge 0,$$

where the last inequality follows because by Assumption C1 $g_m(\cdot)$ is non-negative on $[0,1]^{J^2}$ and $0 < c_1 \le 1$ by construction. Notice also that the suggested values of $\bar{v}_{2J+q_1+m}$ are feasible. In fact, if $g_m(\Pi^1) \le m^1_{q_1+m}$ the
implied $\bar{v}_{2J+q_1+m}$ is obviously non-negative, because $\frac{1}{c_1^{r+1}} g_m(\bar{\Pi}) \ge g_m(\bar{\Pi})$. On the other hand, recalling that by construction $c_1 \le \min_{m \in \{1,\ldots,q_2\}} \frac{m^1_{q_1+m}}{m^2_{q_1+m}}$, if $g_m(\Pi^1) > m^1_{q_1+m}$,

$$\bar{v}_{2J+q_1+m} = \frac{1}{c_1^{r+1}} g_m(\bar{\Pi}) - m^2_{q_1+m} = \frac{1}{c_1^{r+1}} g_m(c_1 \Pi^1) - m^2_{q_1+m} \ge \frac{1}{c_1} g_m(\Pi^1) - m^2_{q_1+m} \ge \frac{1}{c_1} v^1_{2J+q_1+m} \ge 0.$$

Moreover, by Assumption C1(i),

$$|\bar{v}_{2J+q_1+m} - v^1_{2J+q_1+m}| \le |m^2_{q_1+m} - m^1_{q_1+m}| + \left|\frac{1}{c_1^{r+1}} g_m(\bar{\Pi}) - g_m(\Pi^1)\right| \le \left(\frac{1 - c_1^{r+1}}{c_1^{r+1}}\right)\left[M + \max_{\Pi \in [0,1]^{J^2}} g_m(\Pi)\right],$$

where $\max_{\Pi \in [0,1]^{J^2}} g_m(\Pi)$ is bounded because $g_m(\cdot)$ is a continuous function on a compact set. Finally, observe that for the equality constraints the same calculations as above can be applied to $h_k(\Pi) \ge m_{q_1+q_2+k}$ and $h_k(\Pi) \le m_{q_1+q_2+k}$, $k = 1, \ldots, q_3$.

Hence, for each $\xi$, $Q(\xi; P^{w,2}, \lambda^2) \ge Q(\xi; P^{w,1}, \lambda^1) - \mathrm{const}\left(\frac{1 - c_1^{r+1}}{c_1^{r+1}}\right)$. Interchanging the roles of $P^{w,1}$ and $P^{w,2}$ yields $Q(\xi; P^{w,1}, \lambda^1) \ge Q(\xi; P^{w,2}, \lambda^2) - \mathrm{const}\left(\frac{1 - c_2^{r+1}}{c_2^{r+1}}\right)$, where

$$c_2 = \min\left\{ \min_j \frac{P^{w,1}_j}{P^{w,2}_j},\ \min_{l \in \{1,\ldots,q_1\}} \frac{m^1_l}{m^2_l},\ \min_{m \in \{1,\ldots,q_2\}} \frac{m^2_{q_1+m}}{m^1_{q_1+m}},\ \min_{s \in \{1,\ldots,q_3\}} \frac{m^2_{q_1+q_2+s}}{m^1_{q_1+q_2+s}},\ \min_{s \in \{1,\ldots,q_3\}} \frac{m^1_{q_1+q_2+s}}{m^2_{q_1+q_2+s}} \right\}$$

with $0 < c_2 \le 1$, so that

$$|Q(\xi; P^{w,2}, \lambda^2) - Q(\xi; P^{w,1}, \lambda^1)| \le \mathrm{const}\left(\frac{1 - c_1^{r+1}}{c_1^{r+1}}\right) + \mathrm{const}\left(\frac{1 - c_2^{r+1}}{c_2^{r+1}}\right).$$

Finally, under Assumption C2 the estimators $\hat{P}^w_N$ and $\hat{\lambda}_N$ are root-$N$ consistent, so that

$$\sup_{\xi \in \Delta^{J-1}} |Q_N(\xi) - Q(\xi)| = O_p(N^{-1/2}).$$
A.2. Propositions in Section 3
I first introduce and prove a lemma that is useful for the proof of some of the following propositions.

Lemma 1. Suppose that Assumption 2 holds, and that $P^w_j > \lambda$, $j \in X$. Then $\frac{P^w_j - \lambda}{1 - \lambda}$ is an admissible value of $p^x_j$, and therefore solves the $j$th equation of system (1.1), if and only if the following conditions jointly hold: (a) $\pi_{jj} = 1$, and (b) $\pi_{ji} = \lambda\ \forall i \in X \setminus \{j\}$ such that $p^x_i > 0$, so that $\sum_{i \neq j} \pi_{ji} p^x_i = \lambda \frac{1 - P^w_j}{1 - \lambda}$.
Proof. For $\frac{P^w_j - \lambda}{1 - \lambda} > 0$ to be an admissible value of $p^x_j$, the $j$th equation of system (1.1) requires that

$$\pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \sum_{i \neq j} \pi_{ji} p^x_i = P^w_j, \tag{A.2}$$
and $\sum_{i \neq j} p^x_i = \frac{1 - P^w_j}{1 - \lambda}$. By Assumption 2, $\pi_{ji} \in [0, \lambda]$, $\forall i \in X \setminus \{j\}$, and $\pi_{jj} \in [1 - \lambda, 1]$. Notice that it is possible for $\pi_{ji} = \lambda$, $\forall i \in X \setminus \{j\}$, because the $\pi_{ji}$ are not related across $i$. (Recall that $1 - \pi_{kk} = \sum_{l \neq k} \pi_{lk} \le \lambda$, $\forall k \in X$.) Therefore,

$$\pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \sum_{i \neq j} \pi_{ji} p^x_i \le \pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \lambda \sum_{i \neq j} p^x_i = \pi_{jj} \frac{P^w_j - \lambda}{1 - \lambda} + \lambda \frac{1 - P^w_j}{1 - \lambda} \le \frac{P^w_j - \lambda}{1 - \lambda} + \lambda \frac{1 - P^w_j}{1 - \lambda} = P^w_j.$$

Hence, Eq. (A.2) can be satisfied if and only if $\pi_{jj} = 1$ and $\sum_{i \neq j} \pi_{ji} p^x_i = \lambda \frac{1 - P^w_j}{1 - \lambda}$. That is, $\pi_{ji} = \lambda\ \forall i \in X \setminus \{j\}$ such that $p^x_i > 0$. Notice that at least one value of $p^x_i$ is strictly positive, because $p^x_j = \frac{P^w_j - \lambda}{1 - \lambda} < 1$. $\square$
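The "only if" direction of the lemma can be spot-checked numerically for $J = 2$. In the sketch below (an illustration added here, not from the paper; $P^w_1 = 0.6$ and $\lambda = 0.25$ are arbitrary values with $P^w_1 > \lambda$), a grid search over admissible $(\pi_{11}, \pi_{12})$ shows that the value $(P^w_1 - \lambda)/(1 - \lambda)$ is attained only at $\pi_{11} = 1$ (condition (a)) and $\pi_{12} = \lambda$ (condition (b)):

```python
import numpy as np

# Illustrative values (not from the paper): J = 2, Pw1 > lam as the lemma requires.
Pw1, lam = 0.6, 0.25
target = (Pw1 - lam) / (1 - lam)     # the candidate value of p^x_1 in the lemma

best, arg = np.inf, None
for p11 in np.linspace(1 - lam, 1.0, 201):   # Pr(w=1|x=1): at least 1 - lam
    for p12 in np.linspace(0.0, lam, 201):   # Pr(w=1|x=2): at most lam
        # For J = 2 the first equation p11*px1 + p12*(1-px1) = Pw1 gives:
        px1 = (Pw1 - p12) / (p11 - p12)
        if 0.0 <= px1 <= 1.0 and px1 < best:
            best, arg = px1, (p11, p12)

# The minimum equals (Pw1 - lam)/(1 - lam) and is attained only at
# p11 = 1 (condition (a)) and p12 = lam (condition (b)).
assert abs(best - target) < 1e-9 and arg == (1.0, lam)
```

Since $p^x_1$ is strictly decreasing in both $\pi_{11}$ and $\pi_{12}$ on this box, the candidate value is attained only at the corner, mirroring the argument in the proof.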
A.2.1. Proposition 3
Proof. Without loss of generality, suppose that interest is in characterizing the identification region $H[\Pr(x = 1)]$.

(a) Assumption 1 holds: For the first equation of system (1.1) to be satisfied it must be that

$$p^x_1 = P^w_1 - \sum_{j=2}^J \pi_{1j} p^x_j + p^x_1 \sum_{i=2}^J \pi_{i1}.$$

From the definition of $H_1[\Pi^\star]$ it follows that

$$\lambda \ge 1 - \sum_{h=1}^J \pi_{hh} p^x_h = \sum_{h=1}^J \left( \sum_{i \neq h} \pi_{ih} p^x_h \right) \ge \sum_{j=2}^J \pi_{1j} p^x_j + p^x_1 \sum_{i=2}^J \pi_{i1}.$$

Hence from the first equation of system (1.1) one can learn that $p^x_1 \ge \max\{P^w_1 - \lambda, 0\}$ and $p^x_1 \le \min\{1, P^w_1 + \lambda\}$. If $P^w_1 > \lambda$, the lower bound is achieved for $\sum_{j=2}^J \pi_{1j} p^x_j = \lambda$ and $p^x_1 \sum_{i=2}^J \pi_{i1} = 0$. If $P^w_1 < 1 - \lambda$, the upper bound is achieved for $\sum_{j=2}^J \pi_{1j} p^x_j = 0$ and $p^x_1 \sum_{i=2}^J \pi_{i1} = \lambda$. I now show that there are values of $p^x_j$, $j \in X \setminus \{1\}$, and $\Pi \in H_1[\Pi^\star]$ such that the corresponding $p^x \in H[P(x)]$.

(a.1.1) Upper bound, with $P^w_1 < 1 - \lambda$: Let $\pi_{11} = \frac{P^w_1}{P^w_1 + \lambda}$, $\pi_{jj} = 1$, $j \in X \setminus \{1\}$, $\pi_{ij} = 0$, $i, j \in X \setminus \{1\}$, $i \neq j$, and define $\pi_{i1}$, $i \in X \setminus \{1\}$, as follows:

if $\exists\, j > 1 : P^w_j \ge \lambda$,
$$\pi_{i1} = \begin{cases} \dfrac{\lambda}{P^w_1 + \lambda} & \text{for } i = j = \min\{k = 2, \ldots, J : P^w_k \ge \lambda\}; \\ 0, & \forall i \in X,\ i \notin \{1, j\}; \end{cases}$$

if $P^w_j < \lambda$, $\forall j \in X \setminus \{1\}$,
$$\pi_{i1} = \begin{cases} \dfrac{P^w_2}{P^w_1 + \lambda} & \text{for } i = 2; \\ \min\left\{ \dfrac{\lambda}{P^w_1 + \lambda} - \displaystyle\sum_{k=2}^{i-1} \dfrac{P^w_k}{P^w_1 + \lambda},\ \dfrac{P^w_i}{P^w_1 + \lambda} \right\} & \text{for } i \in X \setminus \{1, 2\},\ i : \displaystyle\sum_{k=2}^{i-1} \dfrac{P^w_k}{P^w_1 + \lambda} \le \dfrac{\lambda}{P^w_1 + \lambda}; \\ 0 & \text{for } i \in X \setminus \{1, 2\},\ i : \displaystyle\sum_{k=2}^{i-1} \dfrac{P^w_k}{P^w_1 + \lambda} > \dfrac{\lambda}{P^w_1 + \lambda}. \end{cases}$$

It is easy to show that the suggested $\Pi$ belongs to $H_1[\Pi^\star]$, and allows for $p^x_1 = P^w_1 + \lambda$ and the implied $p^x_j$, $j \in X \setminus \{1\}$, to solve system (1.1). Hence, $p^x_1 = P^w_1 + \lambda$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.

(a.1.2) Upper bound, with $P^w_1 \ge 1 - \lambda$: In this case the upper bound is not informative, but just set equal to 1. Let $p^x_1 = 1$; this in turn implies $p^x_j = 0$, $\forall j \in X \setminus \{1\}$. Let $\sum_{i=2}^J \pi_{i1} = 1 - P^w_1 \le \lambda$, and $\pi_{i1} p^x_1 = \pi_{i1} = P^w_i \le \lambda$, $\forall i \in X$, $i \neq 1$. It is straightforward to verify that the suggested $\Pi \in H_1[\Pi^\star]$, and allows for $p^x_1 = 1$, and the
implied $p^x_j = 0$, $\forall j \in X \setminus \{1\}$, to solve system (1.1). Hence $p^x_1 = 1$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.

(a.2.1) Lower bound, with $P^w_1 > \lambda$: Let $p^x_2 = P^w_2 + \lambda$, and $\pi_{12} = \frac{\lambda}{p^x_2}$, $\pi_{22} = 1 - \frac{\lambda}{p^x_2}$, and $\pi_{jj} = 1$, $\forall j \in X \setminus \{2\}$, so that $\pi_{i2} = 0$, $\forall i \in X \setminus \{1, 2\}$, and $\pi_{ij} = 0$, $\forall i, j \in X$, $i \neq j$, $(i, j) \neq (1, 2)$. Then it is straightforward to verify that the suggested $\Pi \in H_1[\Pi^\star]$, and allows for $p^x_1 = P^w_1 - \lambda$ and the implied $p^x_j$, $j \in X \setminus \{1\}$, to solve system (1.1). Hence $P^w_1 - \lambda$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.

(a.2.2) Lower bound, with $P^w_1 \le \lambda$: Then the lower bound is not informative, but just set equal to 0. Let $p^x_1 = 0$; this in turn implies $\sum_{j=2}^J p^x_j = 1$. Let $\pi_{12} = \pi_{13} = \cdots = \pi_{1J} = P^w_1$. Then $\sum_{j=2}^J \pi_{1j} p^x_j = P^w_1$. Moreover $\sum_{j=2}^J P^w_j = 1 - P^w_1 \ge 1 - \lambda$, hence $P^w_j \le 1 - P^w_1$ for each $j \in X \setminus \{1\}$. Let $\pi_{jj} = 1 - P^w_1$, $\forall j \in X \setminus \{1\}$, and $\pi_{ij} = 0$, $\forall i, j \in X$, $i \neq j$, $i \neq 1$. Then $p^x_j = \frac{P^w_j}{1 - P^w_1} \le 1$, $j \in X \setminus \{1\}$, and $\sum_{j=2}^J p^x_j = 1$. It follows that when $P^w_1 \le \lambda$, there exist values of $\Pi \in H_1[\Pi^\star]$ for which $p^x_1 = 0$ and the implied $p^x_j$, $j \in X \setminus \{1\}$, solve system (1.1), and hence it is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.

(a.3) The entire interval between the extreme points is feasible: To prove the claim I need to distinguish four cases: (1) $\lambda \le P^w_1 \le 1 - \lambda$; (2) $P^w_1 \le \min\{\lambda, 1 - \lambda\}$; (3) $P^w_1 \ge \max\{\lambda, 1 - \lambda\}$; (4) $1 - \lambda < P^w_1 < \lambda$. Here I describe in detail the proof for case (1); the other cases can be proved using similar arguments. See Molinari (2003) for a detailed proof of all cases.

Let $\lambda \le P^w_1 \le 1 - \lambda$. It then follows that $P^w_1 - \lambda \le p^x_1 \le P^w_1 + \lambda$. Let $p^x_1 = P^w_1 + (1 - 2\alpha)\lambda$, for any $\alpha \in (0, 1)$. To find values of $p^x_j$, $j \in X \setminus \{1\}$, and $\Pi \in H_1[\Pi^\star]$ such that the corresponding $p^x \in H[P(x)]$, I distinguish two sub-cases:

1. If $\alpha \le \frac{1}{2}$, let $\pi_{11} = \frac{P^w_1}{P^w_1 + (1 - 2\alpha)\lambda}$, $\pi_{jj} = 1$ and $\pi_{ij} = 0$, $\forall i \neq j$, $i \in X$, $j \in X \setminus \{1\}$. Choose $\pi_{j1}$ and $p^x_j$, $j \in X \setminus \{1\}$, as follows: if $\exists\, j : P^w_j \ge 1 - \frac{P^w_1}{P^w_1 + (1 - 2\alpha)\lambda}$,

$$\pi_{k1} = \begin{cases} 1 - \dfrac{P^w_1}{P^w_1 + (1 - 2\alpha)\lambda} & \text{for } k = j = \min\{i = 2, \ldots, J : P^w_i \ge \lambda\}; \\ 0, & \forall k \in X,\ k \notin \{1, j\}. \end{cases}$$

If $P^w_j < 1 - \frac{P^w_1}{P^w_1 + (1 - 2\alpha)\lambda}$, $\forall j \in X \setminus \{1\}$,

$$\pi_{k1} = \begin{cases} P^w_2 & \text{for } k = 2; \\ \min\left\{ 1 - \dfrac{P^w_1}{P^w_1 + (1 - 2\alpha)\lambda} - \displaystyle\sum_{i=2}^{k-1} \pi_{i1},\ P^w_k \right\} & \forall k \in X \setminus \{1, 2\}; \end{cases}$$

$$p^x_j = P^w_j - \pi_{j1} (P^w_1 + (1 - 2\alpha)\lambda).$$

2. If $\alpha > \frac{1}{2}$, let $\pi_{jj} = 1$, $\forall j \in X \setminus \{2\}$, $\pi_{22} = \frac{P^w_2}{P^w_2 + (2\alpha - 1)\lambda}$, $\pi_{12} = \frac{(2\alpha - 1)\lambda}{P^w_2 + (2\alpha - 1)\lambda}$, and $p^x_2 = P^w_2 + (2\alpha - 1)\lambda$.

(b) Assumption 2 holds: For the first equation of system (1.1) to be satisfied, I need $\pi_{11} p^x_1 + \sum_{j=2}^J \pi_{1j} p^x_j = P^w_1$, where $\sum_{j=2}^J p^x_j = 1 - p^x_1$. From the definition of $H_2[\Pi^\star]$, $\pi_{1j} \le \lambda$, $\forall j \in X \setminus \{1\}$, and $\pi_{11} \ge 1 - \lambda$. Let $\sum_{j=2}^J \pi_{1j} p^x_j \equiv \bar{\pi}(1 - p^x_1)$, where $\bar{\pi} \in [0, \lambda]$. Then

$$p^x_1 = \frac{P^w_1 - \bar{\pi}}{\pi_{11} - \bar{\pi}},$$
and $p^x_1$ is well defined as long as $\pi_{11} \neq \bar{\pi}$. I distinguish a few cases.

1. If $P^w_1 < \min\{\lambda, 1 - \lambda\}$, one can pick $\bar{\pi} = P^w_1 < \lambda$, and $p^x_1 = 0$ is the lower bound. As for the upper bound, when $P^w_1 < 1 - \lambda \le \pi_{11}$, by the first equation of system (1.1) $\bar{\pi} \le P^w_1 \le \pi_{11}$, and $p^x_1$ is decreasing in both $\pi_{11}$ and $\bar{\pi}$. Hence the upper bound is achieved for $\pi_{11} = 1 - \lambda$ and $\bar{\pi} = 0$, and is given by $p^x_1 = \frac{P^w_1}{1 - \lambda}$.

2. If $\lambda \le P^w_1 \le 1 - \lambda$, by the first equation of system (1.1) $\bar{\pi} \le P^w_1 \le \pi_{11}$, and $p^x_1$ is decreasing in both $\pi_{11}$ and $\bar{\pi}$. Hence the upper bound is achieved for $\pi_{11} = 1 - \lambda$ and $\bar{\pi} = 0$, and is given by $p^x_1 = \frac{P^w_1}{1 - \lambda}$, and the lower bound is achieved for $\pi_{11} = 1$ and $\bar{\pi} = \lambda$, and is given by $p^x_1 = \frac{P^w_1 - \lambda}{1 - \lambda}$.

3. If $1 - \lambda \le P^w_1 \le \lambda$, pick $\bar{\pi} = P^w_1 \le \lambda$, and $p^x_1 = 0$ is the lower bound. Pick $\pi_{11} = P^w_1 \ge 1 - \lambda$, and $p^x_1 = 1$ is the upper bound.

4. If $P^w_1 > \max\{\lambda, 1 - \lambda\}$, pick $\pi_{11} = P^w_1 \ge 1 - \lambda$, and $p^x_1 = 1$ is the upper bound. As for the lower bound, when $P^w_1 > \lambda \ge \bar{\pi}$, by the first equation of system (1.1) $\bar{\pi} \le P^w_1 \le \pi_{11}$, and $p^x_1$ is decreasing in both $\pi_{11}$ and $\bar{\pi}$. Hence the lower bound is achieved for $\pi_{11} = 1$ and $\bar{\pi} = \lambda$, and is given by $p^x_1 = \frac{P^w_1 - \lambda}{1 - \lambda}$.

To summarize, from the first equation of system (1.1) one can learn that $p^x_1 \ge \max\left\{\frac{P^w_1 - \lambda}{1 - \lambda}, 0\right\}$ and $p^x_1 \le \min\left\{1, \frac{P^w_1}{1 - \lambda}\right\}$. I am left to show that one can find values of $p^x_j$, $j \in X \setminus \{1\}$, and $\Pi \in H_2[\Pi^\star]$ such that for any $p^x_1 \in \left[\max\left\{\frac{P^w_1 - \lambda}{1 - \lambda}, 0\right\},\ \min\left\{1, \frac{P^w_1}{1 - \lambda}\right\}\right]$ the corresponding $p^x \in H[P(x)]$. I first show that this holds for the extreme points, and then that it holds for any point in the closed interval between the lower and the upper bound.

(b.1.1) Upper bound, with $P^w_1 < 1 - \lambda$: Let $\pi_{11} = 1 - \lambda$ and $\pi_{jj} = 1$, $\forall j > 1$. Then the system reduces to

$$(1 - \lambda) \frac{P^w_1}{1 - \lambda} = P^w_1; \qquad \pi_{j1} \frac{P^w_1}{1 - \lambda} + p^x_j = P^w_j, \quad j = 2, \ldots, J,$$

where $\sum_{j=2}^J \pi_{j1} = \lambda$, and $\sum_{j=2}^J P^w_j > \lambda$. Choose $\pi_{k1}$, $k \in X \setminus \{1\}$, as follows:

if $\exists\, j : P^w_j \ge \lambda$,
$$\pi_{k1} = \begin{cases} \lambda & \text{for } k = j = \min\{i = 2, \ldots, J : P^w_i \ge \lambda\}; \\ 0, & \forall k \in X,\ k \notin \{1, j\}; \end{cases}$$

if $P^w_j < \lambda$, $\forall j \in X \setminus \{1\}$,
$$\pi_{k1} = \begin{cases} P^w_2 & \text{for } k = 2; \\ \min\left\{\lambda - \displaystyle\sum_{i=2}^{k-1} \pi_{i1},\ P^w_k\right\} & \forall k \in X \setminus \{1, 2\}. \end{cases} \tag{A.3}$$

It is easy to show that the suggested $\Pi$ belongs to $H_2[\Pi^\star]$, and allows for $p^x_1 = \frac{P^w_1}{1 - \lambda}$ and the implied $p^x_j$, $j \in X \setminus \{1\}$, to solve system (1.1). Hence, $p^x_1 = \frac{P^w_1}{1 - \lambda}$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.

(b.1.2) Upper bound, with $P^w_1 \ge 1 - \lambda$: In this case the upper bound is not informative, but just set equal to 1. Let $p^x_1 = 1$; this in turn implies $p^x_j = 0$, $\forall j \in X \setminus \{1\}$. Let $\pi_{j1} = P^w_j$, $j = 1, \ldots, J$. It is straightforward to verify that this $\Pi \in H_2[\Pi^\star]$, and obviously allows for $p^x_1 = 1$ and the implied $p^x_j = 0$, $\forall j \in X \setminus \{1\}$, to solve system (1.1). Hence $p^x_1 = 1$ is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.

(b.2.1) Lower bound, with $P^w_1 > \lambda$: Let $\pi_{j1} = 0$, $\forall j \in X \setminus \{1\}$, and $\pi_{12} = \cdots = \pi_{1J} = \lambda$; then the first equation of system (1.1) is satisfied, and the implied $\Pi \in H_2[\Pi^\star]$. Let $p^x_j = \frac{P^w_j}{1 - \lambda} \ge 0$, $j \in X \setminus \{1\}$. It is straightforward to verify that system (1.1) is satisfied. Hence $p^x_1 = \frac{P^w_1 - \lambda}{1 - \lambda}$ is a feasible value for $\Pr(x = 1)$ given the maintained assumptions.
(b.2.2) Lower bound, with $P^w_1 \le \lambda$: Let $p^x_1 = 0$; this in turn implies $\sum_{j=2}^J p^x_j = 1$. Let $\pi_{1j} = P^w_1$ and $\pi_{jj} = 1 - P^w_1$, $\forall j > 1$. Then $p^x_j = \frac{P^w_j}{1 - P^w_1} \ge 0$, $j \in X \setminus \{1\}$, and $\sum_{j=2}^J p^x_j = 1$. It follows that when $P^w_1 \le \lambda$, there exist values of $\Pi \in H_2[\Pi^\star]$ for which $p^x_1 = 0$ and the implied $p^x_j$, $j \in X \setminus \{1\}$, solve system (1.1), and hence it is a feasible value of $\Pr(x = 1)$ given the maintained assumptions.

(b.3) The entire interval between the extreme points is feasible: To prove the claim I need to distinguish four cases: (1) $\lambda \le P^w_1 \le 1 - \lambda$; (2) $P^w_1 \le \min\{\lambda, 1 - \lambda\}$; (3) $P^w_1 \ge \max\{\lambda, 1 - \lambda\}$; (4) $1 - \lambda < P^w_1 < \lambda$. Here I describe in detail the proof for case (1); the other cases can be proved using similar arguments. See Molinari (2003) for a detailed proof of all cases.

Let $\lambda \le P^w_1 \le 1 - \lambda$. It then follows that $\frac{P^w_1 - \lambda}{1 - \lambda} \le p^x_1 \le \frac{P^w_1}{1 - \lambda}$. Let $p^x_1 = \frac{P^w_1 - \alpha\lambda}{1 - \lambda}$, for any $\alpha \in (0, 1)$. I show that there are values of $p^x_j$, $j \in X \setminus \{1\}$, and $\Pi \in H_2[\Pi^\star]$ such that the corresponding $p^x \in H[P(x)]$. Let $\pi_{11} = 1 - \lambda(1 - \alpha)$, $\pi_{1j} = \alpha\lambda$, $\forall j \in X \setminus \{1\}$, $\pi_{ij} = 0$, $\forall i, j \in X \setminus \{1\}$, $i \neq j$. Choose $\pi_{j1}$ and $p^x_j$, $j \in X \setminus \{1\}$, as follows:

if $\exists\, j : P^w_j \ge \lambda(1 - \alpha)$,
$$\pi_{k1} = \begin{cases} \lambda(1 - \alpha) & \text{for } k = j = \min\{i = 2, \ldots, J : P^w_i \ge \lambda\}; \\ 0, & \forall k \in X,\ k \notin \{1, j\}; \end{cases}$$

if $P^w_j < \lambda(1 - \alpha)$, $\forall j \in X \setminus \{1\}$,
$$\pi_{k1} = \begin{cases} P^w_2 & \text{for } k = 2; \\ \min\left\{\lambda(1 - \alpha) - \displaystyle\sum_{i=2}^{k-1} \pi_{i1},\ P^w_k\right\} & \forall k \in X \setminus \{1, 2\}; \end{cases}$$

$$p^x_j = \frac{1}{1 - \alpha\lambda}\left(P^w_j - \pi_{j1} \frac{P^w_1 - \alpha\lambda}{1 - \lambda}\right). \qquad \square$$
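For $J = 2$ the bounds in part (b) can be recovered by brute force, since the columns of $\Pi$ and the entries of $p^x$ each sum to one, so the second equation of system (1.1) is implied by the first. The sketch below (an illustration added here, not from the paper; the values $P^w_1 = 0.5$, $\lambda = 0.2$ fall in case 2) grids over all matrices satisfying Assumption 2:

```python
import numpy as np

# Illustrative values (not from the paper): lam <= Pw1 <= 1 - lam (case 2 above).
Pw1, lam = 0.5, 0.2

vals = []
for p11 in np.linspace(1 - lam, 1.0, 401):   # Pr(w=1|x=1) >= 1 - lam (Assumption 2)
    for p12 in np.linspace(0.0, lam, 401):   # Pr(w=1|x=2) <= lam (Assumption 2)
        # For J = 2, solve the first equation p11*px1 + p12*(1-px1) = Pw1.
        px1 = (Pw1 - p12) / (p11 - p12)
        if 0.0 <= px1 <= 1.0:
            vals.append(px1)

lower, upper = min(vals), max(vals)
# Proposition 3(b): H[Pr(x=1)] = [(Pw1-lam)/(1-lam), Pw1/(1-lam)] = [0.375, 0.625]
assert abs(lower - (Pw1 - lam) / (1 - lam)) < 1e-9
assert abs(upper - Pw1 / (1 - lam)) < 1e-9
```

The extremes are hit exactly at the matrices used in parts (b.1.1) and (b.2.1): $(\pi_{11}, \pi_{12}) = (1 - \lambda, 0)$ for the upper bound and $(1, \lambda)$ for the lower bound.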
A.2.2. Proposition 4
Proof. (a) Suppose, without loss of generality, that $\tilde{X} = \{1, 2, \ldots, h\}$, $2 \le h < J$, and consider $\Pr(x = 1)$. By Lemma 1, for $\frac{P^w_1 - \lambda}{1 - \lambda} > 0$ to solve the first equation of system (1.1), it must be that $\pi_{11} = \pi = 1$, and either $\pi_{1i} = \lambda$ or $p^x_i = 0$, $\forall i \in X \setminus \{1\}$, with $\sum_{i=2}^J \pi_{1i} p^x_i = \lambda \frac{1 - P^w_1}{1 - \lambda}$. Since $\pi_{22} = \pi$ by assumption, and $\pi = 1$, it follows that $\pi_{12} = 0$. Hence, for the first equation in system (1.1) to hold, $p^x_2 = 0$. Consider the second equation in system (1.1): when the first equation of the system holds, the second reduces to $\sum_{i=3}^J \pi_{2i} p^x_i = P^w_2$. However, for each $i \in X \setminus \{1\}$, if $\pi_{1i} = \lambda$, it follows that $\pi_{2i} = 0$, since $\sum_{k \neq l} \pi_{kl} = 1 - \pi_{ll} \le \lambda$, $\forall l \in X$. On the other hand, if $\pi_{1i} < \lambda$, for the first equation in system (1.1) to hold it must be the case that $p^x_i = 0$. Hence, $\sum_{i=3}^J \pi_{2i} p^x_i = 0$. Therefore, since $P^w_2 > 0$, the lower bound in (3.2) is not feasible for $\Pr(x = 1)$, because the second equation of system (1.1) is not satisfied. Notice now that repeating the same argument for each of equations 3 to $h$ in system (1.1) implies by a symmetry argument that $\Pr(x = 1)$ cannot achieve the lower bound in (3.2).

For $k \in X \setminus \tilde{X}$, $\Pr(x = k)$ can achieve the lower bound in (3.2). Consider for example $\Pr(x = J)$. Let $\pi_{JJ} = 1$ and $\pi_{Ji} = \lambda$, $\forall i \in X \setminus \{J\}$. Then the last equation of system (1.1) is satisfied. These values of $\pi_{Ji}$, $i \in X$, imply that $\pi = 1 - \lambda$, and that $p^x_j = \frac{P^w_j}{1 - \lambda}$ for each $j \in X \setminus \{J\}$. It is obvious that the suggested $\Pi \in H_3[\Pi^\star]$, and the implied $p^x_j$ solves system (1.1).

(b) Suppose that $P^w_1 \le \lambda$ and that $p^x_1 = 0$. Then $\sum_{j=2}^J p^x_j = 1$, and $p^x_j \ge 0$, $\forall j = 2, \ldots, J$. Then the proof of Proposition 3, part (b.2.2), applies, with $\pi = 1 - P^w_1$, $\pi_{12} = \pi_{13} = \cdots = \pi_{1J} = P^w_1$, and $\pi_{ij} = 0$, $\forall i, j \in X$, $i \neq j$, $i \neq 1$. Hence, it follows that $p^x_1 = 0$ is a value consistent with Assumption 3 if $P^w_1 \le \lambda$. $\square$
A.2.3. Proposition 5
Proof. (a) Suppose, without loss of generality, that $\tilde{X} = \{1, 2, \ldots, h\}$, $2 \le h < J$, and consider $\Pr(x = 1)$. For $p^x_1 = \frac{P^w_1}{1 - \lambda} < 1$ to be admissible in the first equation of system (1.1), it must be that $\pi = 1 - \lambda$ and
$\sum_{j=2}^J \pi_{1j} p^x_j = 0$. Since $\pi_{jj} = \pi$, $\forall j \in \tilde{X}$, the second equation of the system becomes

$$\pi_{21} \frac{P^w_1}{1 - \lambda} + (1 - \lambda) p^x_2 + \sum_{j=3}^J \pi_{2j} p^x_j = P^w_2,$$

where $\sum_{j=3}^J p^x_j = 1 - \frac{P^w_1}{1 - \lambda} - p^x_2$. Let $\sum_{j=3}^J \pi_{2j} p^x_j \equiv \bar{\pi}\left(1 - \frac{P^w_1}{1 - \lambda} - p^x_2\right)$, where $\bar{\pi} \in [0, \lambda]$, since the constraints $\pi_{ij} \le 1 - \pi \le \lambda$, $\forall i \neq j \in \tilde{X}$, and $\pi_{lk} \le \lambda$, $\forall l \neq k \in X \setminus \tilde{X}$, allow for $\pi_{1j} = 0$ or $\pi_{1j} = \lambda$, $\forall j = 2, \ldots, J$. It follows that

$$p^x_2 = \frac{P^w_2 - \bar{\pi} - (\pi_{21} - \bar{\pi}) \frac{P^w_1}{1 - \lambda}}{1 - \lambda - \bar{\pi}}.$$

Notice that $p^x_2$ must lie in $\left[0,\ 1 - \frac{P^w_1}{1 - \lambda}\right]$. I need to distinguish three cases.

1. $1 - \lambda - \bar{\pi} > 0$. Then

$$\frac{P^w_2 - \bar{\pi} - (\pi_{21} - \bar{\pi}) \frac{P^w_1}{1 - \lambda}}{1 - \lambda - \bar{\pi}} \ge 0 \iff \pi_{21} \le \bar{\pi} + (P^w_2 - \bar{\pi}) \frac{1 - \lambda}{P^w_1},$$

and one can always find values of $\pi_{21}, \bar{\pi} \in [0, \lambda]$ for which this inequality is satisfied. For $p^x_2 \le 1 - \frac{P^w_1}{1 - \lambda}$ it must be that

$$\frac{P^w_2 - \bar{\pi} - (\pi_{21} - \bar{\pi}) \frac{P^w_1}{1 - \lambda}}{1 - \lambda - \bar{\pi}} \le 1 - \frac{P^w_1}{1 - \lambda} \iff \pi_{21} \ge (\lambda - 1 + P^w_1 + P^w_2) \frac{1 - \lambda}{P^w_1}.$$

As long as there exist values of $\pi_{21} \le \lambda$ that satisfy the above inequality, the upper bound in (3.2) is admissible. However,

$$(\lambda - 1 + P^w_1 + P^w_2) \frac{1 - \lambda}{P^w_1} > \lambda \iff P^w_1 + P^w_2 > (1 - \lambda) + P^w_1 \frac{\lambda}{1 - \lambda}.$$

Hence, the upper bound in (3.2) can be rejected if

$$P^w_1 + P^w_2 > (1 - \lambda) + P^w_1 \frac{\lambda}{1 - \lambda}. \tag{A.4}$$

2. $1 - \lambda - \bar{\pi} = 0$. Then $\pi_{21} = (P^w_1 + P^w_2 - (1 - \lambda)) \frac{1 - \lambda}{P^w_1}$. Hence, the upper bound in (3.2) can be rejected if condition (A.4) is satisfied.

3. $1 - \lambda - \bar{\pi} < 0$. Then

$$\frac{P^w_2 - \bar{\pi} - (\pi_{21} - \bar{\pi}) \frac{P^w_1}{1 - \lambda}}{1 - \lambda - \bar{\pi}} \ge 0 \iff \pi_{21} \ge \bar{\pi} + (P^w_2 - \bar{\pi}) \frac{1 - \lambda}{P^w_1}.$$

As long as there exist values of $\pi_{21} \le \lambda$ that satisfy the above inequality, the upper bound in (3.2) is admissible. However,

$$\bar{\pi} + (P^w_2 - \bar{\pi}) \frac{1 - \lambda}{P^w_1} > \lambda \iff P^w_2 > \bar{\pi} + \frac{P^w_1 (\lambda - \bar{\pi})}{1 - \lambda}.$$
Hence, given that by assumption $\pi_{ij} \le \lambda$, $\forall i \neq j$, $i, j \in X$, the upper bound in (3.2) can be rejected if $P^w_2 > \lambda$. For $p^x_2 \le 1 - \frac{P^w_1}{1 - \lambda}$ it must be that

$$\frac{P^w_2 - \bar{\pi} - (\pi_{21} - \bar{\pi}) \frac{P^w_1}{1 - \lambda}}{1 - \lambda - \bar{\pi}} \le 1 - \frac{P^w_1}{1 - \lambda} \iff \pi_{21} \le (\lambda - 1 + P^w_1 + P^w_2) \frac{1 - \lambda}{P^w_1}.$$

As long as there exist values of $\pi_{21} \ge 0$ that satisfy the above inequality, the upper bound in (3.2) is admissible. However,

$$(\lambda - 1 + P^w_1 + P^w_2) \frac{1 - \lambda}{P^w_1} < 0 \iff P^w_1 + P^w_2 < (1 - \lambda).$$

Hence, the upper bound in (3.2) can be rejected if one of the following holds: (i) $P^w_2 > \lambda$, or (ii) $P^w_1 + P^w_2 < (1 - \lambda)$.

Finally, notice that

if $\lambda \le \frac{1}{2}$: $(1 - \lambda - \pi_{ij}) > 0$, $\forall i \neq j$, $i, j \in X$;

if $\lambda > \frac{1}{2}$: $P^w_2 > \lambda \implies \left\{ P^w_1 + P^w_2 > (1 - \lambda) + P^w_1 \frac{\lambda}{1 - \lambda} \ \text{and} \ P^w_1 + P^w_2 > (1 - \lambda) \right\}$; and $P^w_1 + P^w_2 < (1 - \lambda) \implies \left\{ P^w_1 + P^w_2 < (1 - \lambda) + P^w_1 \frac{\lambda}{1 - \lambda} \ \text{and} \ P^w_2 < \lambda \right\}$.

When $\lambda \le \frac{1}{2}$, condition (A.4) is necessary and sufficient to define the cases in which the upper bound in (3.2) is not feasible. When $\lambda > \frac{1}{2}$, it can still be the case that $(1 - \lambda - \bar{\pi}) > 0$ (but it does not need to be). If $P^w_2 > \lambda$, (A.4) is implied, and the upper bound in (3.2) is not feasible. If $P^w_1 + P^w_2 < (1 - \lambda)$, then condition (A.4) is not satisfied, and if $(1 - \lambda - \bar{\pi}) > 0$, the upper bound in (3.2) can be feasible. Hence, when $\lambda \ge \frac{1}{2}$, $P^w_2 > \lambda$ is a sufficient condition for the upper bound in (3.2) to be not feasible.

Notice now that repeating the same argument for each of equations 3 to $h$ in system (3.3), and solving each one of them, respectively, for $p^x_3, p^x_4, \ldots, p^x_h$, implies by a symmetry argument that if $\lambda \le \frac{1}{2}$, the upper bound in (3.2) can be rejected if and only if

$$P^w_1 + P^w_j > (1 - \lambda) + P^w_1 \frac{\lambda}{1 - \lambda} \quad \text{for some } j \in \tilde{X} \setminus \{1\},$$

while if $\lambda > \frac{1}{2}$, the upper bound in (3.2) can be rejected if

$$P^w_j > \lambda \quad \text{for some } j \in \tilde{X} \setminus \{1\}.$$

Equations $h + 1$ to $J$ in system (3.3) do not imply any additional conditions under which the upper bound in (3.2) is not feasible. Indeed, let $k \in X \setminus \tilde{X}$; then

$$\pi_{k1} \frac{P^w_1}{1 - \lambda} + \pi_{kk} p^x_k + \sum_{j \in X \setminus \{1, k\}} \pi_{kj} p^x_j = P^w_k.$$

Let $\pi_{kk} = 1$, and, by the same argument as above, let $\sum_{j \in X \setminus \{1, k\}} \pi_{kj} p^x_j \equiv \bar{\pi}\left(1 - \frac{P^w_1}{1 - \lambda} - p^x_k\right)$, where $\bar{\pi}$ must lie in $[0, \lambda]$. Then

$$p^x_k = \frac{P^w_k - \pi_{k1} \frac{P^w_1}{1 - \lambda} - \bar{\pi}\left(1 - \frac{P^w_1}{1 - \lambda}\right)}{1 - \bar{\pi}},$$
where $1 - \bar{\pi} \ge 1 - \lambda > 0$. It is straightforward to verify that there are values of $\pi_{k1}, \bar{\pi} \in [0, \lambda]$ for which $p^x_k \in \left[0,\ 1 - \frac{P^w_1}{1 - \lambda}\right]$. For example, if $P^w_k \le 1 - \frac{P^w_1}{1 - \lambda}$, let $\bar{\pi} = \pi_{k1} = 0$, so that $p^x_k = P^w_k$. If $P^w_k > 1 - \frac{P^w_1}{1 - \lambda}$ and $P^w_k > \lambda$, let $\bar{\pi} = \pi_{k1} = \lambda$, so that $p^x_k = \frac{P^w_k - \lambda}{1 - \lambda} \le 1 - \frac{P^w_1}{1 - \lambda}$.

(b) Suppose that $P^w_1 > 1 - \lambda$, and that $p^x_1 = 1$. Then $p^x_j = 0$, $\forall j = 2, \ldots, J$. Then pick $\pi = P^w_1$ (notice that $P^w_1 > 1 - \lambda$, hence the proposed value of $\pi$ is admissible), and $\pi_{j1} = P^w_j$, $\forall j = 2, 3, \ldots, J$. Since $P^w_1 > 1 - \lambda$, it follows that $P^w_j < \lambda$, $\forall j = 2, 3, \ldots, J$, hence the proposed values of $\pi_{j1}$, $\forall j = 2, 3, \ldots, J$, are admissible, and therefore $p^x_1 = 1$ is admissible, and hence it is the upper bound. $\square$
A.2.4. Proposition 6
Proof. (a) Lower bound.

Suppose that $j > 1$, and without loss of generality consider $\Pr(x = 2)$. By Lemma 1, for $p^x_2 = \frac{P^w_2 - \lambda}{1 - \lambda} > 0$ to solve the second equation of system (1.1), it must be that $\pi_{22} = 1$, and either $\pi_{2i} = \lambda$ or $p^x_i = 0$, $\forall i \in X \setminus \{2\}$, with $\sum_{i \neq 2} \pi_{2i} p^x_i = \lambda \frac{1 - P^w_2}{1 - \lambda}$. Since $\pi_{22} \le \pi_{11}$ by assumption, and $\pi_{22} = 1$, it follows that $\pi_{11} = 1$; hence, the first equation of system (1.1) reduces to $\sum_{i=3}^J \pi_{1i} p^x_i = P^w_1$. However, for each $i \in X \setminus \{1, 2\}$, if $\pi_{2i} = \lambda$, it follows that $\pi_{1i} = 0$, since $\sum_{k \neq l} \pi_{kl} = 1 - \pi_{ll} \le \lambda$, $\forall l \in X$. On the other hand, if $\pi_{2i} < \lambda$, for the second equation in system (1.1) to hold it must be the case that $p^x_i = 0$. Hence, $\sum_{i=3}^J \pi_{1i} p^x_i = 0$. Therefore, since $P^w_1 > 0$, the lower bound in (3.2) is not feasible for $\Pr(x = 2)$. Notice now that repeating the same argument for $\Pr(x = j)$, $j \ge 3$, implies that $\Pr(x = j)$ cannot achieve the lower bound in (3.2).

Consider now $\Pr(x = 1)$, and let $\pi_{11} = 1$ and $\pi_{1i} = \lambda$, $\forall i \in X \setminus \{1\}$. Then the first equation of system (1.1) is satisfied. Let $p^x_j = \frac{P^w_j}{1 - \lambda}$ and $\pi_{jj} = 1 - \lambda$ for each $j \in X \setminus \{1\}$. It is obvious that the suggested $\Pi \in H_4[\Pi^\star]$, and the implied $p^x_j$ solves system (1.1).

(b) Upper bound.

First, let $j = 1$, and $P^w_1 < (1 - \lambda)$. Then, as shown in the proof of Proposition 5, for $p^x_1 = \frac{P^w_1}{1 - \lambda}$ it must be that $\pi_{11} = 1 - \lambda$ and $\sum_{i=2}^J \pi_{1i} p^x_i = 0$. But by Assumption 4, $\pi_{11} \ge \pi_{22} \ge \cdots \ge \pi_{JJ} \ge 1 - \lambda$, and therefore for $p^x_1 = \frac{P^w_1}{1 - \lambda}$ to solve the first equation of system (1.1) it must be that $\pi_{jj} = 1 - \lambda$, $\forall j \in X$, and I am back to the case of constant probability of correct report, with $\tilde{X} = X$. Now let $j > 1$, and $P^w_j < (1 - \lambda)$. Then, again, for $p^x_j = \frac{P^w_j}{1 - \lambda}$ it must be that $\pi_{jj} = 1 - \lambda$ and $\sum_{i \neq j} \pi_{ji} p^x_i = 0$. But by Assumption 4, $\pi_{jj} \ge \pi_{(j+1)(j+1)} \ge \cdots \ge \pi_{JJ} \ge 1 - \lambda$, and therefore it must be that $\pi_{kk} = 1 - \lambda$, $\forall k \in \{j, j+1, \ldots, J\}$, and I am back to the case of constant probability of correct report, with $\tilde{X} = \{j, j+1, \ldots, J\}$. The result of Proposition 5 applies. $\square$
A.2.5. Proposition 7
Proof. With dichotomous variables,

$$p^x_1(\pi) = \frac{P^w_1 - (1 - \pi)}{\pi - (1 - \pi)} = \frac{1}{2}\left(\frac{2P^w_1 - 1}{2\pi - 1} + 1\right), \quad \pi \in H_3[\Pi^\star].$$

Hence,

1. If $\lambda < \frac{1}{2}$ and $P^w_1 \ge \frac{1}{2}$, then $1 - \pi \le P^w_1 \le \pi$ and $\frac{\partial p^x_1(\pi)}{\partial \pi} \le 0$. Hence the lower bound on $\Pr(x = 1)$ is achieved for $\pi = 1$ and the upper bound for $\pi = \max(1 - \lambda, P^w_1)$.

2. If $\lambda \ge \frac{1}{2}$ and $P^w_1 \ge \frac{1}{2}$, then for $p^x_1 \in [0, 1]$ I need one of the following: (a) $1 - \pi \le P^w_1 \le \pi \implies \pi \ge P^w_1 \ge \frac{1}{2}$; or (b) $\pi \le P^w_1 \le 1 - \pi \implies \pi \le 1 - P^w_1 \le \frac{1}{2}$; additionally, I need $\pi \ge 1 - \lambda$. Hence, the feasible values of $\pi$ are given by $\pi \in [1 - \lambda, 1 - P^w_1] \cup [P^w_1, 1]$. Notice that if $\lambda < P^w_1$, the feasible values of $\pi$ are given by $\pi \in [P^w_1, 1]$, and $p^x_1$ is decreasing in $\pi$; therefore the lower bound is achieved for $\pi = 1$ and the upper bound for $\pi = P^w_1$. When
$\lambda > P^w_1$, for values of $\pi \in [P^w_1, 1]$ the previous result applies. For values of $\pi \in [1 - \lambda, 1 - P^w_1]$, $p^x_1$ is decreasing in $\pi$; therefore the upper bound is achieved for $\pi = 1 - \lambda$ and the lower bound for $\pi = 1 - P^w_1$.

3. If $\lambda < \frac{1}{2}$ and $P^w_1 < \frac{1}{2}$, then $1 - \pi \le P^w_1 \le \pi$ and $\frac{\partial p^x_1(\pi)}{\partial \pi} \ge 0$. Hence the lower bound on $\Pr(x = 1)$ is achieved for $\pi = 1 - \min(\lambda, P^w_1)$ and the upper bound for $\pi = 1$.

4. If $\lambda \ge \frac{1}{2}$ and $P^w_1 < \frac{1}{2}$, then for $p^x_1 \in [0, 1]$ I need one of the following: (a) $1 - \pi \le P^w_1 \le \pi \implies \pi \ge 1 - P^w_1 > \frac{1}{2}$; or (b) $\pi \le P^w_1 \le 1 - \pi \implies \pi \le P^w_1 < \frac{1}{2}$; additionally, I need $\pi \ge 1 - \lambda$. Hence, the feasible values of $\pi$ are given by $\pi \in [1 - \lambda, P^w_1] \cup [1 - P^w_1, 1]$. Notice that if $1 - \lambda > P^w_1$, the feasible values of $\pi$ are given by $\pi \in [1 - P^w_1, 1]$, and $p^x_1$ is increasing in $\pi$; therefore the lower bound is achieved for $\pi = 1 - P^w_1$ and the upper bound for $\pi = 1$. When $1 - \lambda < P^w_1$, for values of $\pi \in [1 - P^w_1, 1]$ the previous result applies. For values of $\pi \in [1 - \lambda, P^w_1]$, $p^x_1$ is increasing in $\pi$; therefore the upper bound is achieved for $\pi = P^w_1$ and the lower bound for $\pi = 1 - \lambda$.

It is easy to verify that these bounds are a subset of those in (3.2). $\square$
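The case analysis can be checked by gridding over the single parameter $\pi$. The sketch below (an illustration added here, not from the paper) uses $P^w_1 = 0.7$ and $\lambda = 0.3$, which fall in case 1, and recovers the interval $[P^w_1, 1]$, with the lower bound at $\pi = 1$ and the upper bound at $\pi = \max(1 - \lambda, P^w_1)$:

```python
import numpy as np

# Illustrative values (not from the paper): case 1, lam < 1/2 <= Pw1.
Pw1, lam = 0.7, 0.3

feasible = []
for pi in np.linspace(1 - lam, 1.0, 10001):  # constant probability of correct report
    px1 = (Pw1 - (1 - pi)) / (2 * pi - 1)    # p^x_1(pi); denominator >= 1 - 2*lam > 0
    if -1e-9 <= px1 <= 1.0 + 1e-9:           # p^x_1 must be a probability
        feasible.append(px1)

lower, upper = min(feasible), max(feasible)
# Case 1: lower bound p^x_1(1) = Pw1, upper bound p^x_1(max(1-lam, Pw1)) = 1.
assert abs(lower - Pw1) < 1e-6
assert abs(upper - 1.0) < 1e-6
```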
A.2.6. Proposition 8
Proof. In this case,

$$p^x_1(\pi) = \frac{P^w_1 - (1 - \pi_{22})}{\pi_{11} - (1 - \pi_{22})}, \quad (\pi_{11}, \pi_{22}) \in H_4[\Pi^\star].$$

Hence,

1. If $\lambda < \frac{1}{2}$, then $1 - \pi_{22} \le P^w_1 \le \pi_{11}$, and $p^x_1(\pi)$ is increasing in $\pi_{22}$ and decreasing in $\pi_{11}$. Hence the lower bound is achieved for $\pi_{22} = 1 - \lambda$ and $\pi_{11} = 1$. The upper bound is achieved with $\pi_{22} = \pi_{11}$, since $\pi_{11}$ bounds $\pi_{22}$ from above. Hence if $P^w_1 \ge \frac{1}{2}$, the upper bound is achieved for $\pi_{11} = \pi_{22} = \max(1 - \lambda, P^w_1)$. If $P^w_1 < \frac{1}{2}$, the upper bound is achieved for $\pi_{11} = \pi_{22} = 1$.

2. If $\lambda \ge \frac{1}{2}$ and $P^w_1 < \frac{1}{2}$, either $1 - \pi_{22} \le P^w_1 \le \pi_{11}$ or $1 - \pi_{22} \ge P^w_1 \ge \pi_{11}$. Hence, either $\pi_{11} \in [1 - P^w_1, 1]$ and $\pi_{22} \in [1 - P^w_1, \pi_{11}]$, or $\pi_{11} \in [1 - \lambda, P^w_1]$ and $\pi_{22} \in [1 - \lambda, \pi_{11}]$. In the first case $p^x_1$ is increasing in $\pi_{22}$ and decreasing in $\pi_{11}$; the lower bound is achieved for $\pi_{11} = 1$, $\pi_{22} = 1 - P^w_1$. The upper bound is achieved with $\pi_{22} = \pi_{11} = 1$. In the second case $p^x_1$ is decreasing in $\pi_{22}$ and increasing in $\pi_{11}$; the lower bound is achieved with $\pi_{22} = \pi_{11} = 1 - \lambda$. The upper bound is achieved with $\pi_{11} = P^w_1$ and $\pi_{22} = 1 - \lambda$.

3. If $\lambda \ge \frac{1}{2}$ and $P^w_1 \ge \frac{1}{2}$, consider the following two cases. If $\lambda > P^w_1$, then $\pi_{11} = \pi_{22} = 1 - P^w_1$ are admissible values, and the implied $p^x_1 = 0$. Also, $\pi_{11} = P^w_1$ is an admissible value, and the implied $p^x_1 = 1$. If $\lambda < P^w_1$, then $\pi_{11} \in [P^w_1, 1]$, $\pi_{22} \in [1 - \lambda, \pi_{11}]$ and $1 - \pi_{22} \le P^w_1 \le \pi_{11}$. Then $p^x_1$ is decreasing in $\pi_{11}$ and increasing in $\pi_{22}$. Hence the lower bound is achieved for $\pi_{11} = 1$ and $\pi_{22} = 1 - \lambda$, and the upper bound is achieved with $\pi_{22} = \pi_{11} = P^w_1$. $\square$
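A two-dimensional grid over $(\pi_{11}, \pi_{22})$ confirms case 1. The sketch below (an illustration added here, not from the paper) uses $P^w_1 = 0.7$ and $\lambda = 0.3$, and recovers the lower bound $(P^w_1 - \lambda)/(1 - \lambda)$ at $(\pi_{11}, \pi_{22}) = (1, 1 - \lambda)$ and the upper bound $1$ at $\pi_{11} = \pi_{22} = \max(1 - \lambda, P^w_1)$:

```python
import numpy as np

# Illustrative values (not from the paper): case 1, lam < 1/2, with Pw1 >= 1/2.
Pw1, lam = 0.7, 0.3

grid = np.linspace(1 - lam, 1.0, 301)
vals = []
for p11 in grid:                               # probability of correct report for x = 1
    for p22 in grid[grid <= p11 + 1e-12]:      # Assumption 4 ordering: p22 <= p11
        px1 = (Pw1 - (1 - p22)) / (p11 - (1 - p22))
        if -1e-9 <= px1 <= 1.0 + 1e-9:         # p^x_1 must be a probability
            vals.append(px1)

lower, upper = min(vals), max(vals)
# Case 1: lower = (Pw1-lam)/(1-lam) at (p11, p22) = (1, 1-lam);
#         upper = 1 at p11 = p22 = max(1-lam, Pw1).
assert abs(lower - (Pw1 - lam) / (1 - lam)) < 1e-6
assert abs(upper - 1.0) < 1e-6
```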
References
Abrevaya, J., Hausman, J.A., 1999. Semiparametric estimation with mismeasured dependent variables: an application to duration models for unemployment spells. Annales d'Économie et de Statistique 55–56, 243–275.
Aigner, D.J., 1973. Regression with a binary independent variable subject to errors of observation. Journal of Econometrics 1, 49–60.
Beresteanu, A., Molinari, F., 2007. Asymptotic properties for a class of partially identified models. Econometrica, forthcoming.
Blundell, R., Gosling, A., Ichimura, H., Meghir, C., 2007. Changes in the distribution of male and female wages accounting for employment composition using bounds. Econometrica 75, 323–363.
Bollinger, C.R., 1996. Bounding mean regressions when a binary regressor is mismeasured. Journal of Econometrics 73, 387–399.
Bound, J., Brown, C., Mathiowetz, N., 2001. Measurement error in survey data. In: Heckman, J.J., Leamer, E. (Eds.), Handbook of Econometrics, vol. 5. North-Holland, Elsevier Science, pp. 3705–3843.
Bross, I., 1954. Misclassification in 2×2 tables. Biometrics 10 (4), 478–486.
Campbell, S.L., Meyer, C.D., 1991. Generalized Inverses of Linear Transformations. Dover Publications, Inc., New York.
Card, D., 1996. The effect of unions on the structure of wages: a longitudinal analysis. Econometrica 64 (4), 957–979.
Chernozhukov, V., Hong, H., Tamer, E., 2004. Inference on parameter sets in econometric models. Discussion paper, MIT, Duke and Northwestern University.
Chernozhukov, V., Hong, H., Tamer, E., 2007. Estimation and confidence regions for parameter sets in econometric models. Econometrica 75, 1243–1284.
Ciliberto, F., Tamer, E., 2004. Market structure and multiple equilibria in airline markets. Discussion paper, University of Virginia and Northwestern University.
Cox, D.R., Hinkley, D.V., 1974. Theoretical Statistics. Chapman and Hall, London, UK.
Dominitz, J., Sherman, R.P., 2006. Identication and estimation of bounds on school performance measures: a nonparametric analysis of
a mixture model with verication. Journal of Applied Econometrics 21, 12951326.
Dustmann, C., van Soest, A., 2000. Parametric and semiparametric estimation in models with misclassied dependent variables. IZA
Discussion Paper 218.
Gong, G., Whittemore, A.S., Grosser, S., 1990. Censored survival data with misclassied covariates: a case study of breast cancer
mortality. Journal of the American Statistical Association 85 (409), 2028.
Gustman, A.L., Steinmeier, T.L., 2001. What people dont know about their pension and social security. In: Gale, W.G., Shoven, J.B.,
Warshawsky, M.J. (Eds.), Public Policies and Private Pensions. Brookings Institution, Washington D.C.
Gustman, A.L., Mitchell, O.S., Samwick, A.A., Steinmeier, T.L., 2000. Evaluating pension entitlements. In: Mitchell, O.S., Hammond,
P.B., Rappaport, A.M. (Eds.), Forecasting Retirement Needs and Retirement Wealth. University of Pennsylvania.
Hampel, F.R., 1974. The inuence curve and its role in robust estimation. Journal of the American Statistical Association 69 (346),
383393.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A., 1986. Robust Statistics: The Approach Based on Inuence Functions.
Wiley, New York.
Hausman, J., Abrevaya, J., Scott-Morton, F.M., 1998. Misclassication of the dependent variable in a discrete-response setting. Journal of
Econometrics 87, 239269.
Honore, B.E., Lleras-Muney, A., 2006. Bounds in competing risks models and the war on cancer. Econometrica 74, 16751698.
Honore, B.E., Tamer, E., 2006. Bounds on parameters in panel dynamic discrete choice models. Econometrica 74, 611629.
Horn, R.A., Johnson, C.R., 1999. Matrix Analysis. Cambridge University Press, New York.
Horowitz, J.L., Manski, C.F., 1995. Identification and robustness with contaminated and corrupted data. Econometrica 63 (2), 281–302.
Horowitz, J.L., Manski, C.F., 1998. Censoring of outcomes and regressors due to survey nonresponse: identification and estimation using
weights and imputations. Journal of Econometrics 84, 37–58.
Horowitz, J.L., Manski, C.F., 2000. Nonparametric analysis of randomized experiments with missing covariate and outcome data. Journal
of the American Statistical Association 95 (449), 77–84.
Hotz, V.J., Mullin, C.H., Sanders, S.G., 1997. Bounding causal effects using data from a contaminated natural experiment: analyzing the
effects of teenage childbearing. Review of Economic Studies 64, 575–603.
Hu, Y., 2006. Bounding parameters in a linear regression model with a mismeasured regressor using additional information. Journal of
Econometrics 133, 51–70.
Imbens, G.W., Manski, C.F., 2004. Confidence intervals for partially identified parameters. Econometrica 72 (6), 1845–1857.
Kane, T.J., Rouse, C.E., Staiger, D., 1999. Estimating returns to schooling when schooling is misreported, NBER Working Paper 7235.
Klepper, S., 1988. Bounding the effects of measurement error in regressions involving dichotomous variables. Journal of Econometrics 37,
343–359.
Klepper, S., Leamer, E.E., 1984. Consistent sets of estimates for regressions with errors in all variables. Econometrica 52 (1), 163–183.
Kreider, B., Pepper, J., 2007. Inferring disability status from corrupt data. Journal of Applied Econometrics, forthcoming.
Lewbel, A., 2000. Identification of the binary choice model with misclassification. Econometric Theory 16, 603–609.
Mahajan, A., 2006. Identification and estimation of regression models with misclassification. Econometrica 74, 631–665.
Manski, C.F., 2003. Partial Identification of Probability Distributions. Springer Series in Statistics. Springer, New York.
Manski, C.F., Tamer, E., 2002. Inference on regressions with interval data on a regressor or outcome. Econometrica 70 (2), 519–546.
Mellow, W., Sider, H., 1983. Accuracy of response in labor market surveys: evidence and implications. Journal of Labor Economics 1 (4),
331–344.
Molinari, F., 2003. Contaminated, corrupted, and missing data, Ph.D. Thesis, Northwestern University, available at http://
www.arts.cornell.edu/econ/fmolinari/dissertation.pdf.
Moore, J.C., Marquis, K.H., Bogen, K., 1996. The SIPP Cognitive Research Evaluation Experiment: Basic Results and Documentation,
Unpublished Report, U.S. Bureau of the Census.
Munkres, J.R., 1991. Analysis on Manifolds. Addison-Wesley, Reading, MA.
Poterba, J.M., Summers, L.H., 1995. Unemployment benefits and labor market transitions: a multinomial logit model with errors in
classification. The Review of Economics and Statistics 77 (2), 201–216.
Ramalho, E.A., 2002. Regression models for choice-based samples with misclassification in the response variable. Journal of Econometrics
106, 171–201.
Rao, C.R., 1973. Linear Statistical Inference and its Applications. Wiley, New York.
Rockafellar, R.T., 1970. Convex Analysis. Princeton University Press, Princeton, New Jersey.
Swartz, T., Haitovsky, Y., Vexler, A., Yang, T., 2004. Bayesian identifiability and misclassification in multinomial data. Canadian Journal
of Statistics 32, 285–302.