2003 Joint Statistical Meetings - Section on Survey Research Methods
A NEW PERSPECTIVE ON CALIBRATION ESTIMATORS
Victor M. Estevao and Carl-Erik Särndal

Statistics Canada, Ottawa, Ontario, Canada, K1T 3L2
KEY WORDS: Auxiliary Information, Calibration Weights, Automated Linearization, Population Residuals.

1 Introduction

"Complex sampling design" and "complex parameter" are familiar concepts in survey sampling. Another survey feature that often extends beyond the simple ordinary formulation is the use of auxiliary information at the estimation stage. As the recent literature has shown, auxiliary information can be more or less complex, depending on the survey design. Estimation for complex cases is not well covered by standard textbook techniques. A broader framework for estimation is needed with such auxiliary information. We use the term "complex auxiliary information" for the several non-standard cases examined in this paper.

The most basic use of auxiliary information occurs for a single stage, single phase sampling design, where a known set of auxiliary variables and their corresponding totals are used to compute calibrated weights for the estimate of a population total. This procedure is reviewed in section 2. More complex cases considered in later sections arise when auxiliary information is available at different stages and phases of sampling. Then it is not always evident how to make efficient use of the available auxiliary information. In this paper, we look at different ways of using complex auxiliary information to produce efficient calibration estimators in two-stage and two-phase sampling. The derivation of the variance of these estimators requires a simple procedure to linearize the expression for a nonlinear calibration estimator. This simple procedure is introduced as the method of automated linearization.

The paper is arranged as follows. In section 2 we explain automated linearization for calibration estimators in a one-stage, one-phase sampling design. Section 3 examines estimation for two-phase sampling designs, and section 4 looks at estimation in two-stage sampling with and without integrated weighting. A brief summary and comments are given in the concluding section 5.

2 Automated linearization

We first look at the simple case of auxiliary information for a one-stage, one-phase unit sampling design. Consider a finite population $U = \{1, 2, \ldots, k, \ldots, N\}$ from which a probability sample $s$ is drawn. We denote by $\pi_k$ the inclusion probability of unit $k$ and by $a_k = 1/\pi_k$ the sampling weight of $k$. Let $y$ be the variable of interest. Its value for unit $k$, $y_k$, is observed for $k \in s$. The unknown total to be estimated is $Y = \sum_U y_k$.

We denote by $\mathbf{x}$ an auxiliary vector of dimension $J \times 1$, and by $\mathbf{x}_k$ its value for unit $k$. We assume that we have the following auxiliary information:
(i) The population vector total $\mathbf{X} = \sum_U \mathbf{x}_k$ is known.
(ii) The vector value $\mathbf{x}_k$ is known for every $k \in s$.

Here, $\mathbf{X}$ is assumed known from an outside source such as a census. If we know the value $\mathbf{x}_k$ for every $k \in U$, as when $\mathbf{x}_k$ is on the population frame $U$ for every $k$, then both (i) and (ii) are met. We can compute the simple unbiased estimator of the known $\mathbf{X}$ as $\hat{\mathbf{X}} = \sum_s a_k \mathbf{x}_k$. Under general conditions $N^{-1}(\hat{\mathbf{X}} - \mathbf{X})$ is $O_p(n^{-1/2})$.

Our objective is to estimate $Y = \sum_U y_k$. One possibility is the simple unbiased Horvitz-Thompson (HT) estimator $\hat{Y} = \sum_s a_k y_k$. However, a more efficient weighting of the observed $y_k$ is one that takes the auxiliary information into account. Let us consider instead $\hat{Y}_{CAL} = \sum_s w_k y_k$, where the weights $\{w_k;\ k \in s\}$ satisfy the calibration equation $\sum_s w_k \mathbf{x}_k = \mathbf{X}$. We say that the weights $\{w_k;\ k \in s\}$ are calibrated to $\mathbf{X} = \sum_U \mathbf{x}_k$.

Alternative sets of calibrated weights can be derived by the distance measure approach, as for example in Huang and Fuller (1978) and Deville and Särndal (1992). The minimization of each distance measure produces a different set of calibrated weights. However, the proposed distance measures are fairly similar so they tend to produce estimators with similar properties. Instead we use the instrument vector approach, also called generalized calibration, as in Deville (2002) and Le Guennec and Sautory (2002). This method allows a more general parameterization of
the calibration weights. We specify a vector $\mathbf{z}_k$ of the same dimension as $\mathbf{x}_k$ and compute the weights

$$w_k = a_k (1 + \boldsymbol{\lambda}_s^T \mathbf{z}_k), \quad k \in s \tag{2.1}$$

where $\boldsymbol{\lambda}_s^T = (\mathbf{X} - \hat{\mathbf{X}})^T (\sum_s a_k \mathbf{z}_k \mathbf{x}_k^T)^{-1}$. The mapping from $\mathbf{z}_k$ to $w_k$ is not one-to-one. Different choices of $\mathbf{z}_k$ can produce the same weights $w_k$. We are free to choose the form of $\mathbf{z}_k$ as long as the $J \times J$ matrix $(\sum_s a_k \mathbf{z}_k \mathbf{x}_k^T)$ has an inverse for every possible sample $s$. We refer to such a $\mathbf{z}_k$ as a valid instrument vector. The standard choice of $\mathbf{z}_k = \mathbf{x}_k$ produces the generalized regression estimator, although as explained later, this choice is not necessarily optimal for any given design. For any valid instrument vector $\mathbf{z}_k$, the weights satisfy $\sum_s w_k \mathbf{x}_k = \mathbf{X}$, and the estimator can be written as $\hat{Y}_{CAL} = \sum_s w_k y_k = \hat{Y} + (\mathbf{X} - \hat{\mathbf{X}})^T \hat{\mathbf{B}}$, where $\hat{\mathbf{B}} = (\sum_s a_k \mathbf{z}_k \mathbf{x}_k^T)^{-1} \sum_s a_k \mathbf{z}_k y_k$.

Here $\hat{\mathbf{B}}$ is a nonlinear design-weighted statistic; thus it is not what we call an HT statistic. Although $\hat{Y}$ is a linear statistic, the term $(\mathbf{X} - \hat{\mathbf{X}})^T \hat{\mathbf{B}}$ makes $\hat{Y}_{CAL}$ a nonlinear estimator. This causes no problem for point estimation since $\hat{Y}_{CAL}$ can be readily computed. But the nonlinear form of the estimator creates a problem for obtaining a simple exact expression for the variance of $\hat{Y}_{CAL}$ and for finding a corresponding sample-based estimate of this variance. Linearization is the usual technique for circumventing this difficulty with nonlinear statistics. Woodruff (1971) is a basic reference. Since then, many papers have appeared on the linearization of complex statistics of interest in survey sampling, for example, Binder (1996), Binder and Kovačević (1995), and Deville (1999). The emphasis in these references is on linearization of statistics for estimating complex parameters, a purpose somewhat different from ours, which is the study of calibration estimators of a total. Théberge (1999) presents a linearization approach similar to the one given here. His development is based on the use of distance functions rather than an instrument vector.

Linearization of the nonlinear $\hat{Y}_{CAL}$ involves a power series expansion, including an evaluation of partial derivatives. The rather lengthy derivation is given for example in Särndal, Swensson and Wretman (1992). This method isolates a main term, $\hat{Y}_{CAL,lin}$, which is a linear statistic. The remainder term is of lower order in probability and assumed negligible compared to the main term. The expression for the remainder term is usually not made explicit in Woodruff linearization. This is not a serious drawback, because standard practice is to discard this term and simply take $\hat{Y}_{CAL,lin}$ to be a sufficiently good linear approximation to $\hat{Y}_{CAL}$. Under general conditions, $N^{-1}(\hat{Y}_{CAL} - \hat{Y}_{CAL,lin})$ is $O_p(n^{-1})$, not just $O_p(n^{-1/2})$, permitting the easily derived variance of $\hat{Y}_{CAL,lin}$ to be used as an accurate approximation of the variance of $\hat{Y}_{CAL}$, even for modest sample sizes.

Instead of the standard linearization approach, we introduce the method of automated linearization. This simple two-step procedure automatically makes explicit both the linearized statistic and the lower order term. In contrast to Woodruff linearization, automated linearization requires no evaluation of partial derivatives. For the case of simple auxiliary information in this section, we confirm the well-known expression for the variance of $\hat{Y}_{CAL,lin}$. Automated linearization has two steps:

Step 1. In the expression $\hat{Y}_{CAL} = \hat{Y} + (\mathbf{X} - \hat{\mathbf{X}})^T \hat{\mathbf{B}}$, create a term of lower order in probability by centering $\hat{\mathbf{B}}$ on the population vector $\mathbf{B} = (\sum_U \mathbf{z}_k \mathbf{x}_k^T)^{-1} \sum_U \mathbf{z}_k y_k$ to which $\hat{\mathbf{B}}$ converges in probability. Then $\hat{\mathbf{B}} - \mathbf{B}$ is $O_p(n^{-1/2})$, and we have

$$\hat{Y}_{CAL} = \hat{Y} - (\hat{\mathbf{X}} - \mathbf{X})^T \mathbf{B} - (\hat{\mathbf{X}} - \mathbf{X})^T (\hat{\mathbf{B}} - \mathbf{B}) \tag{2.2}$$

where $N^{-1}(\hat{\mathbf{X}} - \mathbf{X})^T (\hat{\mathbf{B}} - \mathbf{B})$ is $O_p(n^{-1})$, a lower order compared to $N^{-1}(\hat{\mathbf{X}} - \mathbf{X})^T \mathbf{B}$, which is $O_p(n^{-1/2})$.

Step 2. Rewrite (2.2) as

$$\hat{Y}_{CAL} = (\hat{Y} - \hat{\mathbf{X}}^T \mathbf{B}) + \mathbf{X}^T \mathbf{B} - (\hat{\mathbf{X}} - \mathbf{X})^T (\hat{\mathbf{B}} - \mathbf{B}). \tag{2.3}$$

The calibration estimator is the sum of three terms: the constant term $\mathbf{X}^T \mathbf{B}$, the design-based linear term $\hat{Y} - \hat{\mathbf{X}}^T \mathbf{B}$, and the design-based nonlinear term $-(\hat{\mathbf{X}} - \mathbf{X})^T (\hat{\mathbf{B}} - \mathbf{B})$ of lower order. The first two terms on the right hand side of (2.3) define the linearized statistic

$$\hat{Y}_{CAL,lin} = (\hat{Y} - \hat{\mathbf{X}}^T \mathbf{B}) + \mathbf{X}^T \mathbf{B} = \sum_s a_k e_k + \mathbf{X}^T \mathbf{B} \tag{2.4}$$

where $e_k = y_k - \mathbf{x}_k^T \mathbf{B}$.
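The weight formula (2.1) and the decomposition (2.3) can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' software: it uses an artificial population, simple random sampling, and the standard instrument z_k = x_k. The helper names `solve`, `dot` and `weighted_system`, and all data values, are hypothetical and introduced only for illustration.

```python
import random

def solve(A, b):
    """Solve A u = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def weighted_system(w, z, x, y):
    """Return the matrix sum(w z x^T) and the vector sum(w z y)."""
    J = len(x[0])
    A = [[sum(wk * zk[r] * xk[c] for wk, zk, xk in zip(w, z, x))
          for c in range(J)] for r in range(J)]
    b = [sum(wk * zk[r] * yk for wk, zk, yk in zip(w, z, y)) for r in range(J)]
    return A, b

# Artificial population (assumption): x_k = (1, x_k)^T, y roughly linear in x.
random.seed(1)
N, n = 200, 40
xs = [random.uniform(0, 10) for _ in range(N)]
pop_x = [[1.0, v] for v in xs]
pop_y = [2.0 + 3.0 * v + random.gauss(0, 1) for v in xs]

# Simple random sample without replacement, design weights a_k = N/n.
sample = random.sample(range(N), n)
a = [N / n] * n
x = [pop_x[k] for k in sample]
y = [pop_y[k] for k in sample]
z = x  # standard instrument z_k = x_k

X = [sum(col) for col in zip(*pop_x)]                      # known totals
X_hat = [sum(ak * xk[j] for ak, xk in zip(a, x)) for j in range(2)]
diff = [X[j] - X_hat[j] for j in range(2)]                 # X - X_hat

# (2.1): w_k = a_k (1 + lambda^T z_k), lambda^T = (X - X_hat)^T M^{-1}.
M, b = weighted_system(a, z, x, y)
M_t = [list(r) for r in zip(*M)]
lam = solve(M_t, diff)
w = [ak * (1 + dot(lam, zk)) for ak, zk in zip(a, z)]

B_hat = solve(M, b)
Y_hat = sum(ak * yk for ak, yk in zip(a, y))
Y_cal = sum(wk * yk for wk, yk in zip(w, y))

# Population vector B and residuals e_k = y_k - x_k^T B, as in (2.4).
Mp, bp = weighted_system([1.0] * N, pop_x, pop_x, pop_y)
B = solve(Mp, bp)
e = [yk - dot(xk, B) for xk, yk in zip(x, y)]
lin = sum(ak * ek for ak, ek in zip(a, e)) + dot(X, B)     # linearized statistic
rem = -dot([X_hat[j] - X[j] for j in range(2)],
           [B_hat[j] - B[j] for j in range(2)])            # lower order term

print("calibrated totals:", [sum(wk * xk[j] for wk, xk in zip(w, x)) for j in range(2)])
print("known totals:     ", X)
print("Y_cal, lin + rem: ", Y_cal, lin + rem)
```

Because the population B is available only in a simulation like this one, the split into `lin` and `rem` is a device for studying the estimator, not something computed in practice; the point estimate itself needs only the sample and the known totals.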
Our point estimator of $Y$ is $\hat{Y}_{CAL}$. It has a small bias, $E(\hat{Y}_{CAL}) - Y = -E\{(\hat{\mathbf{X}} - \mathbf{X})^T (\hat{\mathbf{B}} - \mathbf{B})\}$, since $N^{-1} E\{(\hat{\mathbf{X}} - \mathbf{X})^T (\hat{\mathbf{B}} - \mathbf{B})\}$ is of order $O(n^{-1})$. Therefore, the variance of $\hat{Y}_{CAL}$ is approximately the variance of the linearized statistic $\hat{Y}_{CAL,lin}$. Since $\mathbf{X}^T \mathbf{B}$ is a constant, the use of auxiliary information reduces the variance of the estimator from $\mathrm{Var}(\sum_s a_k y_k)$ to approximately $\mathrm{Var}(\sum_s a_k e_k)$. It is important to note that the $e_k$ are fixed but unknown values and that $\sum_s a_k e_k$ is an HT statistic in the $e_k$. Although the $e_k$ resemble regression residuals, they arise automatically from steps 1 and 2, without any explicit regression model or fit. Because $\sum_s a_k e_k$ is an HT statistic, we obtain immediately,

$$\mathrm{Var}(\hat{Y}_{CAL}) \approx \mathrm{Var}(\hat{Y}_{CAL,lin}) = \sum\sum_U F_{kl}\, e_k e_l \tag{2.5}$$

where $F_{kl} = \frac{a_k a_l}{a_{kl}} - 1$ for $l \neq k$, $F_{kl} = F_{kk} = a_k - 1$ for $l = k$, with $a_{kl} = 1/\pi_{kl}$ where $\pi_{kl}$ is the joint inclusion probability of $k$ and $l$. We use $\sum\sum_U$ as shorthand for the double sum $\sum_{k \in U} \sum_{l \in U}$. To estimate the variance of $\hat{Y}_{CAL}$, we use the sample-based analogue of (2.5),

$$\hat{V}(\hat{Y}_{CAL}) = \sum\sum_s (a_k a_l - a_{kl})\, \hat{e}_k \hat{e}_l \tag{2.6}$$

where $\hat{e}_k = y_k - \mathbf{x}_k^T \hat{\mathbf{B}}$ and $\sum\sum_s$ stands for $\sum_{k \in s} \sum_{l \in s}$.

The weights $w_k$ in $\hat{Y}_{CAL} = \sum_s w_k y_k$ depend on the instrument vector $\mathbf{z}_k$. For every choice of $\mathbf{z}_k$ for $k \in U$, there corresponds a vector $\mathbf{B}$ satisfying the equation $(\sum_U \mathbf{z}_k \mathbf{x}_k^T)\, \mathbf{B} = \sum_U \mathbf{z}_k y_k$. We can find an optimal $\mathbf{B}$, and a corresponding $\mathbf{z}_k$, by minimizing $\mathrm{Var}(\hat{Y}_{CAL,lin})$ given by (2.5). This $\mathbf{z}_k$ is asymptotically optimal for $\hat{Y}_{CAL}$ in that it minimizes $\mathrm{Var}(\hat{Y}_{CAL,lin}) \approx \mathrm{Var}(\hat{Y}_{CAL})$. The optimal $\mathbf{B}$ is $\mathbf{B}^0$, defined as the solution of the normal equation

$$\left(\sum\sum_U F_{kl}\, \mathbf{x}_l \mathbf{x}_k^T\right) \mathbf{B}^0 = \sum\sum_U F_{kl}\, \mathbf{x}_l y_k. \tag{2.7}$$

A comparison with the general form $(\sum_U \mathbf{z}_k \mathbf{x}_k^T)\, \mathbf{B} = \sum_U \mathbf{z}_k y_k$ defining $\mathbf{B}$ suggests that an optimal instrument vector is $\mathbf{z}_k = \mathbf{z}_k^0$, where $\mathbf{z}_k^0 = \sum_{l \in U} F_{kl} \mathbf{x}_l$. The result agrees with Montanari's (1987) determination of $\mathbf{B}$ so as to minimize the variance of the unbiased difference estimator $\hat{Y} + (\mathbf{X} - \hat{\mathbf{X}})^T \mathbf{B}$.

To see the features of the weights, let us write them as $w_k = a_k \{1 + (\sum_U \mathbf{x}_k - \sum_s a_k \mathbf{x}_k)^T (\sum_s a_k \mathbf{z}_k \mathbf{x}_k^T)^{-1} \mathbf{z}_k\}$. We note the following:
(i) The computation of the weights $w_k$ for $k \in s$ requires the design weights $a_k$, the auxiliary vector values $\mathbf{x}_k$, the instrument vector values $\mathbf{z}_k$, and the known auxiliary vector of totals $\sum_U \mathbf{x}_k$.
(ii) The $a_k$ are fixed by the design.
(iii) We are free to choose the $\mathbf{z}_k$ as long as the matrix $\sum_s a_k \mathbf{z}_k \mathbf{x}_k^T$ is invertible.
(iv) The weights $w_k$ calibrate to the known totals $\sum_U \mathbf{x}_k$ for any valid instrument vector $\mathbf{z}_k$.
(v) The weights $w_k$ do not depend on $y$ or on any presumed relationship between $y$ and $\mathbf{x}$, as they would in a model dependent approach.

Some choices of $\mathbf{z}_k$ are better than others. The optimal choice, as noted above, is $\mathbf{z}_k = \mathbf{z}_k^0 = \sum_{l \in U} F_{kl} \mathbf{x}_l$. It makes sense that the optimal choice depends on the sampling design. The sample-based choice corresponding to $\mathbf{z}_k^0$ is $\mathbf{z}_k = \mathbf{z}_k^* = a_k^{-1} \sum_{l \in s} a_{kl} F_{kl} \mathbf{x}_l$. The weights $w_k$ do not depend on the values $y_k$ of the variable of interest $y$ and thus the optimal weights do not depend on $y_k$. Once the $\mathbf{z}_k$ are specified, the same weights can be used for all $y$-variables in the survey. The estimator $\hat{Y}_{CAL}$ is free of any unverifiable assumptions about a possible regression of $y$ on $\mathbf{x}$. In the application of this approach it does not matter whether there exists a linear relationship between $y$ and $\mathbf{x}$. Furthermore, no assumptions are required on the properties of the residuals $e_k$. These are treated as fixed but unknown values over the population $U$ rather than random variables from a hypothetical superpopulation model.

As a simple illustration, consider Simple Random Sampling (SRS) from $U$ with the sampling fraction $f = n/N$ and consider $\mathbf{x}_k = (1, x_k)^T$, where $x_k$ is a scalar variable value. The required population information is $\sum_U \mathbf{x}_k = (N, \sum_U x_k)^T$. Then the optimal instrument is found to be $\mathbf{z}_k = \mathbf{z}_k^0 = \sum_{l \in U} F_{kl} \mathbf{x}_l =$
$\frac{N}{N-1} \left(\frac{1}{f} - 1\right) (\mathbf{x}_k - \bar{\mathbf{x}}_U)$ where $\bar{\mathbf{x}}_U = \sum_U \mathbf{x}_k / N$. The corresponding sample based choice is $\mathbf{z}_k = \mathbf{z}_k^* = \frac{n}{n-1} \left(\frac{1}{f} - 1\right) (\mathbf{x}_k - \bar{\mathbf{x}}_s)$ where $\bar{\mathbf{x}}_s = \sum_s \mathbf{x}_k / n$. However both $\mathbf{z}_k^0$ and $\mathbf{z}_k^*$ are invalid because the first component of these vectors is always zero, leading to a singular matrix $\sum_s a_k \mathbf{z}_k \mathbf{x}_k^T$. We drop the first auxiliary variable with the known total $N$ and work instead with the vector $\mathbf{x}_k = x_k$. This gives $z_k = z_k^* = \frac{n}{n-1} \left(\frac{1}{f} - 1\right) (x_k - \bar{x}_s)$. The result is the familiar $\hat{Y}_{CAL} = N \{\bar{y}_s + (\bar{x}_U - \bar{x}_s)\, \hat{b}\}$ with $\hat{b} = \{\sum_s (x_k - \bar{x}_s)(y_k - \bar{y}_s)\} / \{\sum_s (x_k - \bar{x}_s)^2\}$. As is easily verified, this estimator gives $\hat{Y}_{CAL} = N$ when $y_k = 1$ for all $k$, and $\hat{Y}_{CAL} = N \bar{x}_U = \sum_U x_k$ when $y_k = x_k$. That is, even though we must reduce the auxiliary vector from $\mathbf{x}_k = (1, x_k)^T$ to $\mathbf{x}_k = x_k$, the resulting set of weights still reproduces the two known quantities $N$ and $\sum_U x_k$. No loss of information is incurred from the non-invertibility of $\sum_s a_k \mathbf{z}_k \mathbf{x}_k^T$ with $\mathbf{x}_k = (1, x_k)^T$.

We end this section by listing the steps of the preceding argument. These important steps are applied in each of the subsequent sections, where the auxiliary information is more complex.

Step 1 The auxiliary information: Specify an $\mathbf{x}_k$-vector with known totals.
Step 2 Point estimation: Specify a valid $\mathbf{z}_k$, compute the calibrated weights and the resulting point estimate.
Step 3 Variance and variance estimation: Use automated linearization to (a) identify the linearized statistic, (b) obtain the residuals that determine the variance, and (c) transform that variance into an estimated variance.

3 Calibration estimation in two-phase sampling

We now consider the setup for sampling in two phases. From the population $U = \{1, 2, \ldots, k, \ldots, N\}$, a large probability sample, $s_1$, is drawn with known first-phase inclusion probabilities $\pi_{1k}$. The first-phase sampling weights are $a_{1k} = 1/\pi_{1k}$ for $k \in s_1$. One or more variables are observed for $k \in s_1$. Then, from $s_1$, a sub-sample, $s$, is drawn with known conditional probabilities $\pi_{2k}$. The second-phase sampling weights are $a_{2k} = 1/\pi_{2k}$, conditionally on the realized $s_1$. We denote by $a_k = a_{1k} a_{2k}$ the overall sampling weight for unit $k$. The value $y_k$ of the variable of interest is observed for all $k \in s$. The objective is to find a more efficient alternative for estimating $Y = \sum_U y_k$ than the two-phase double expansion estimator $\hat{Y} = \sum_s a_k y_k$.

We need to consider two auxiliary vectors for each unit $k$. We denote these by $\mathbf{x}_1$ and $\mathbf{x}_2$, with $\mathbf{x}_{1k}$ and $\mathbf{x}_{2k}$ representing their respective values for unit $k$. Their dimensions are $J_1 \times 1$ and $J_2 \times 1$ respectively. The auxiliary information for $\mathbf{x}_1$ and $\mathbf{x}_2$ is as follows:
(i) The population vector total $\mathbf{X}_1 = \sum_U \mathbf{x}_{1k}$ is known while the population vector total $\mathbf{X}_2 = \sum_U \mathbf{x}_{2k}$ is not known.
(ii) For every $k \in s_1$, the vector values $\mathbf{x}_{1k}$ and $\mathbf{x}_{2k}$ are known.
(iii) For every $k \in s$, the vector values $\mathbf{x}_{1k}$ and $\mathbf{x}_{2k}$ are known.

The information given by (i), (ii) and (iii) is used to compute the weights for the calibration estimator $\hat{Y}_{CAL} = \sum_s w_k y_k$ in an effort to improve on $\hat{Y} = \sum_s a_k y_k$. There are different ways to produce these weights $w_k$, depending on how we use (i) to (iii). For example, we can carry out a single calibration step, or arrive at the $w_k$ in two steps by first producing a set of first-phase weights $w_{1k}$. Each step requires starting weights, an auxiliary vector and a valid instrument vector. We consider the following alternatives:

(a) One step calibration. Starting from $a_k = a_{1k} a_{2k}$, compute directly final weights $w_k$ for $k \in s$, calibrated to satisfy $\sum_s w_k \mathbf{x}_k = \begin{pmatrix} \sum_U \mathbf{x}_{1k} \\ \sum_{s_1} a_{1k} \mathbf{x}_{2k} \end{pmatrix}$, with $\mathbf{x}_k = \begin{pmatrix} \mathbf{x}_{1k} \\ \mathbf{x}_{2k} \end{pmatrix}$ of dimension $J_1 + J_2$. This is case B1 in Estevao and Särndal (2002).
(b) Two step calibration. In step one, starting from $a_{1k}$, compute first-phase weights $w_{1k}$ for $k \in s_1$, such that $\sum_{s_1} w_{1k} \mathbf{x}_{1k} = \sum_U \mathbf{x}_{1k}$. In step two, starting from $a_k = a_{1k} a_{2k}$, and using the $w_{1k}$ from step one, compute final weights $w_k$ for $k \in s$, such that
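$\sum_s w_k \mathbf{x}_k = \sum_{s_1} w_{1k} \mathbf{x}_k$. The final weights satisfy $\sum_s w_k \mathbf{x}_k = \begin{pmatrix} \sum_U \mathbf{x}_{1k} \\ \sum_{s_1} w_{1k} \mathbf{x}_{2k} \end{pmatrix}$. This is case A1 in Estevao and Särndal (2002).

The two procedures make slightly different use of the auxiliary information and in general, they produce different weights $w_k$ for $k \in s$. The use of information is somewhat more extensive in (b) than (a), in that it requires the information about the individual values $\mathbf{x}_{1k}$ for $k \in s_1$. This may or may not lead to an increase in efficiency, depending on the relation between $\mathbf{x}_{1k}$, $\mathbf{x}_{2k}$ and $y_k$. These questions are discussed in Estevao and Särndal (2002). We can use automated linearization to obtain the form of the residuals and the variance of each estimator in (a) and (b). We examine case (b) below.

The first-phase calibrated weights for case (b) are computed as

$$w_{1k} = a_{1k} (1 + \boldsymbol{\lambda}_{s_1}^T \mathbf{z}_{1k}) \quad \text{with} \quad \boldsymbol{\lambda}_{s_1}^T = \left(\sum_U \mathbf{x}_{1k} - \sum_{s_1} a_{1k} \mathbf{x}_{1k}\right)^T \left(\sum_{s_1} a_{1k} \mathbf{z}_{1k} \mathbf{x}_{1k}^T\right)^{-1}$$

for some valid instrument vector $\mathbf{z}_{1k}$. The $w_{1k}$ are then used as input to compute the final calibrated weights as $w_k = a_k (1 + \boldsymbol{\lambda}_s^T \mathbf{z}_k)$ where

$$\boldsymbol{\lambda}_s^T = \left(\sum_{s_1} w_{1k} \mathbf{x}_k - \sum_s a_k \mathbf{x}_k\right)^T \left(\sum_s a_k \mathbf{z}_k \mathbf{x}_k^T\right)^{-1}$$

where $\mathbf{x}_k = \begin{pmatrix} \mathbf{x}_{1k} \\ \mathbf{x}_{2k} \end{pmatrix}$ and $\mathbf{z}_k$ is another valid instrument vector. We derive the variance by using automated linearization. First, we insert into $\hat{Y}_{CAL} = \sum_s w_k y_k$ the expression for $w_k$. Then, we define $\hat{\mathbf{B}} = \begin{pmatrix} \hat{\mathbf{B}}_1 \\ \hat{\mathbf{B}}_2 \end{pmatrix} = (\sum_s a_k \mathbf{z}_k \mathbf{x}_k^T)^{-1} (\sum_s a_k \mathbf{z}_k y_k)$ and center it on its population counterpart, the non-random vector $\mathbf{B} = \begin{pmatrix} \mathbf{B}_1 \\ \mathbf{B}_2 \end{pmatrix} = (\sum_U \mathbf{z}_k \mathbf{x}_k^T)^{-1} (\sum_U \mathbf{z}_k y_k)$. After some algebra, we then define the statistic $\hat{\mathbf{B}}_1^* = (\sum_{s_1} a_{1k} \mathbf{z}_{1k} \mathbf{x}_{1k}^T)^{-1} (\sum_{s_1} a_{1k} \mathbf{z}_{1k} (\mathbf{x}_k^T \hat{\mathbf{B}}))$, which we center on its population counterpart $\mathbf{B}_1^* = (\sum_U \mathbf{z}_{1k} \mathbf{x}_{1k}^T)^{-1} (\sum_U \mathbf{z}_{1k} (\mathbf{x}_k^T \mathbf{B}))$. The result is

$$\hat{Y}_{CAL} = \sum_s a_k (y_k - \mathbf{x}_k^T \mathbf{B}) + \sum_{s_1} a_{1k} (\mathbf{x}_k^T \mathbf{B} - \mathbf{x}_{1k}^T \mathbf{B}_1^*) + \mathbf{X}_1^T \mathbf{B}_1^* + R \tag{3.1}$$

where $R$ is the lower order term given by

$$R = -\begin{pmatrix} \hat{\mathbf{X}}_1 - \tilde{\mathbf{X}}_1 \\ \hat{\mathbf{X}}_2 - \tilde{\mathbf{X}}_2 \end{pmatrix}^T \begin{pmatrix} \hat{\mathbf{B}}_1 - \mathbf{B}_1 \\ \hat{\mathbf{B}}_2 - \mathbf{B}_2 \end{pmatrix} - (\tilde{\mathbf{X}}_1 - \mathbf{X}_1)^T (\hat{\mathbf{B}}_1^* - \mathbf{B}_1^*)$$

with $\tilde{\mathbf{X}}_j = \sum_{s_1} a_{1k} \mathbf{x}_{jk}$ and $\hat{\mathbf{X}}_j = \sum_s a_k \mathbf{x}_{jk}$, $j = 1, 2$. The term of main interest is the linear statistic

$$\hat{Y}_{CAL,lin} = \sum_s a_k e_k + \sum_{s_1} a_{1k} e_k^* + \mathbf{X}_1^T \mathbf{B}_1^* \tag{3.2}$$

where $\mathbf{X}_1^T \mathbf{B}_1^* = (\sum_U \mathbf{x}_{1k})^T \mathbf{B}_1^*$ is a constant, and the residuals in the two random terms are $e_k = y_k - \mathbf{x}_k^T \mathbf{B} = y_k - \mathbf{x}_{1k}^T \mathbf{B}_1 - \mathbf{x}_{2k}^T \mathbf{B}_2$ and $e_k^* = \mathbf{x}_k^T \mathbf{B} - \mathbf{x}_{1k}^T \mathbf{B}_1^*$.

By ignoring the lower-order term in (3.1), we can use the linear design-weighted statistic (3.2) to obtain the approximate variance of $\hat{Y}_{CAL}$. Then, conditioning on $s_1$, we obtain

$$\mathrm{Var}(\hat{Y}_{CAL}) \approx \mathrm{Var}(\hat{Y}_{CAL,lin}) = V_1(E_c) + E_1(V_c)$$

where $E_c = \sum_{s_1} a_{1k} (y_k - \mathbf{x}_{1k}^T \mathbf{B}_1^*)$ and $V_c$ is the conditional variance of $\sum_s a_k e_k$, given $s_1$. The expressions for $V_1(E_c)$ and $E_1(V_c)$, and for their respective estimates are not detailed here. They follow well-known patterns for two-phase sampling as shown for example in Estevao and Särndal (2002). For the estimated variance, we use $\hat{\mathbf{B}}$ and $\hat{\mathbf{B}}_1^*$ instead of $\mathbf{B}$ and $\mathbf{B}_1^*$. Note that $E_1(V_c) = 0$ if sample selection stops after the first phase.

The first term, $V_1(E_c)$, is reduced by the presence in $e_k^*$ of the regressor $\mathbf{x}_{1k}$ only, whereas the second term, $E_1(V_c)$, gets reduced by both regressors, $\mathbf{x}_{1k}$ and $\mathbf{x}_{2k}$. These features seem logical under the survey conditions. An interesting question, which we leave unresolved here, is the jointly optimal choice for the two instruments, $\mathbf{z}_{1k}$ and $\mathbf{z}_k$. The simple standard choices are $\mathbf{z}_{1k} = \mathbf{x}_{1k}$ and $\mathbf{z}_k = \mathbf{x}_k$.

We comment briefly on the automated linearization in case (a). The outcome is also a variance of the form $V_1(E_c) + E_1(V_c)$, with one residual for the first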
component, $V_1(E_c)$, and another residual for the second, $E_1(V_c)$. But these residuals are somewhat different in (a) and (b). For (b), we stated earlier in this section the first component residuals as $e_k = y_k - \mathbf{x}_{1k}^T \mathbf{B}_1 - \mathbf{x}_{2k}^T \mathbf{B}_2$, showing a removal of the influence of both $\mathbf{x}_{1k}$ and $\mathbf{x}_{2k}$, and those of the second component as $e_k^* = \mathbf{x}_k^T \mathbf{B} - \mathbf{x}_{1k}^T \mathbf{B}_1^*$, showing a removal of the influence on $\mathbf{x}_k^T \mathbf{B}$ (rather than on $y_k$) of $\mathbf{x}_{1k}$ alone. The same pattern holds for (a), in that both $\mathbf{x}_{1k}$ and $\mathbf{x}_{2k}$ are removed in the first residual, and $\mathbf{x}_{1k}$ alone in the second. Cases (a) and (b) differ in the $\mathbf{B}$-coefficients of the two kinds of residuals. The automated linearization of (a) readily reveals the form of these $\mathbf{B}$-coefficients. We do not show them here. The important point is that the influence of the $\mathbf{x}$-vectors is removed according to a common pattern, although the values of $e_k$ and $e_k^*$ are not the same. Thus we can expect that (a) and (b) will usually generate rather small differences in the variance of the corresponding $\hat{Y}_{CAL}$ estimators. This is confirmed by the simulations in Estevao and Särndal (2002). For unusual relationships between $y_k$, $\mathbf{x}_{1k}$ and $\mathbf{x}_{2k}$, the differences can be more significant. Further studies are needed to examine this.

4 Calibration estimation in two-stage sampling

We start from the usual formulation of sampling in two stages. A sample of units is realized by two-stage selection from a population $U = \{1, 2, \ldots, k, \ldots, N\}$ grouped into clusters. This design involves sampling from two distinct populations of interest: (i) the population of first stage clusters, $U_I = \{1, 2, \ldots, i, \ldots, N_I\}$, and (ii) the population of second stage units $U = \{1, 2, \ldots, k, \ldots, N\}$. For simplicity, we refer to them as the population of clusters and the population of units respectively. The population $U = \{1, 2, \ldots, k, \ldots, N\}$ is the union of all the units $U_i$, $i \in U_I$ in the $N_I$ clusters.

First, we draw a sample of clusters $s_I$ from $U_I$, with known first-stage inclusion probabilities $\pi_{Ii}$. The first-stage sampling weights are $a_{Ii} = 1/\pi_{Ii}$ for $i \in s_I$. At the second stage, we sample units within each of the selected clusters. From $U_i$, we draw a sample $s_i$ of units, with known second-stage probabilities $\pi_{k|i}$ conditional on $s_I$. The conditional sampling weights are $a_{k|i} = 1/\pi_{k|i}$ for $k \in s_i$. Thus, $a_k = a_{Ii}\, a_{k|i}$ is the overall sampling weight for unit $k$, and $s = \bigcup_{i \in s_I} s_i$ is the sample of units. The value $y_k$ of the variable of interest is observed for all units $k \in s$. We want to estimate the total $Y = \sum_U y_k$, but more efficiently than with the simple unbiased $\hat{Y} = \sum_s a_k y_k = \sum_{s_I} a_{Ii} (\sum_{s_i} a_{k|i}\, y_k)$.

In general, auxiliary information exists for both the units and the clusters. We denote by $\mathbf{x}_{Ii}$ an auxiliary vector value associated with cluster $i$, and by $\mathbf{x}_k$ an auxiliary vector value associated with unit $k$. We consider the following information to be available:

(i) The cluster population vector total $\sum_{U_I} \mathbf{x}_{Ii}$ is known.
(ii) For every $i \in s_I$, the cluster vector $\mathbf{x}_{Ii}$ is known.
(iii) The unit population vector total $\sum_U \mathbf{x}_k$ is known.
(iv) For every $k \in s$, the unit vector $\mathbf{x}_k$ is known.

If $\mathbf{x}_{Ii}$ is known for every $i \in U_I$, then (i) and (ii) are met. This occurs, for example, in area sampling where each cluster is a geographical entity for which we have a useful auxiliary measurement vector, for example, the surface area and/or the number of inhabitants. On the other hand, it is unlikely that we would have information $\mathbf{x}_k$ about every unit $k \in U$ in a survey where the absence of a list frame of units precludes single stage sampling and forces us to use two-stage sampling. But conditions (iii) and (iv) are met if $\mathbf{x}_k$ is recorded for all sampled units and the total $\sum_U \mathbf{x}_k$ can be imported from an accurate outside source, a census or a census projection, as it is, for example, in the Canadian Labour Force survey. This section examines calibration estimators derived from some or all of the information (i) to (iv).

The information is somewhat different when there is a known value $\mathbf{x}_k$ for every unit $k \in U_i$, where $U_i$ is a selected cluster, $i \in s_I$. This case is covered by (i) to (iv) and we need not consider it, because the known cluster total $\mathbf{t}_{xi} = \sum_{U_i} \mathbf{x}_k$ for $i \in s_I$ can then be entered into $\mathbf{x}_{Ii}$ in (ii), assuming $\sum_{U_I} \mathbf{t}_{xi} = \sum_U \mathbf{x}_k$ is also known.

Surveys involving sampling of clusters often have the double objective of computing estimates of totals for both the population of units $U$ (referred to here as unit statistics) and the population of clusters $U_I$ (cluster statistics). Then, we observe both the value of a cluster
variable of interest, $y_{Ii}$ for $i \in s_I$, and the value of a unit variable of interest, $y_k$ for $k \in s$. For example, if households are clusters, $y_{Ii}$ may be the value for household $i$ of the variable $y_I$ = "household income"; and if units are persons in the households, $y_k$ may be the value of the variable of interest $y$ = "employment status" (0 if employed, 1 if unemployed).

The totals to be estimated are then $Y_I = \sum_{U_I} y_{Ii}$ for statistics on household income, and $Y = \sum_U y_k$ for statistics on individuals' employment. We thus examine the calibration estimators $\hat{Y}_{I,CAL} = \sum_{s_I} w_{Ii}\, y_{Ii}$ and $\hat{Y}_{CAL} = \sum_s w_k y_k$ with cluster weights $w_{Ii}$ satisfying

$$\sum_{s_I} w_{Ii}\, \mathbf{x}_{Ii} = \sum_{U_I} \mathbf{x}_{Ii} \tag{4.1}$$

and unit weights $w_k$ satisfying

$$\sum_s w_k \mathbf{x}_k = \sum_U \mathbf{x}_k. \tag{4.2}$$

We also allow for the fact that many two-stage designs call for some form of integrated weighting. Its objective is to impose a simple relation between a cluster weight $w_{Ii}$ and the weights $w_k$ for the selected units $k$ of that cluster. The interest in integrated weighting is promoted by Eurostat in its efforts to harmonize the estimation methods used by the member states of the European Union. Also, integrated weighting schemes are of interest for the further development of generalized estimation systems such as Bascula, CLAN and GES. We examine two options for integrated weighting:

(1) $\sum_{s_i} w_k = N_i\, w_{Ii}$ for every $i \in s_I$, where $N_i$ is the known size of cluster $i$.
(2) $w_k = a_{k|i}\, w_{Ii}$ for the selected units $k$ in cluster $i \in s_I$.

Each option imposes a simple relationship between the $w_k$ and the $w_{Ii}$. Depending on the option selected, we can write (4.1) and (4.2) in terms of either $w_k$ or $w_{Ii}$. We assume that the resulting set of equations is consistent. Depending on the choice, there is some effect on the precision of the resulting calibration estimates, as discussed in this section. Option (1) is based on the requirement that the estimated number of units within any group of clusters must be the same whether the cluster weights or the unit weights are used to create that estimate. Option (2) preserves the conditional design weights. One can argue that option (2) is slightly simpler than (1) but it actually imposes more severe restrictions on the unit weights. As we see later, this has implications on the variance of the estimators.

A special case of (2) that has drawn considerable attention occurs for single stage cluster sampling, see for example Lemaître and Dufour (1987), Andersson (1997), and Nieuwenbroek (1993). Since all $k$ in cluster $i$ are observed, $a_{k|i} = 1$ and (2) implies $w_k = w_{Ii}$. It is practical to assign the same weight to all units in a cluster, for the calculation of unit statistics, and this common weight is the cluster weight for cluster statistics. The approach of Lemaître and Dufour (1987) differs from ours. They find $w_k$ to satisfy (4.2) but in such a way that the known auxiliary vector value $\mathbf{x}_k$ is replaced by one and the same constructed value, $\sum_{U_i} \mathbf{x}_k / N_i$, for every $k$ in cluster $i$. By contrast, we keep the individual $\mathbf{x}_k$ and use one of the integrated weighting options to set up the calibration problem, leading to integrated $w_k$ and $w_{Ii}$. The calculation of the weights of the calibration estimators $\hat{Y}_{CAL} = \sum_s w_k y_k$ and $\hat{Y}_{I,CAL} = \sum_{s_I} w_{Ii}\, y_{Ii}$ is described below.

(a) Non-integrated calibration: Starting from $a_{Ii}$, compute cluster weights $w_{Ii}$ for $i \in s_I$, calibrated to the cluster information in the manner of (4.1); in an independent second calibration, starting from $a_k = a_{Ii}\, a_{k|i}$, compute unit weights $w_k$ for $k \in s$ calibrated to the unit information as stated in (4.2).

(b) Calibration with integration option (1): In (4.1), replace $w_{Ii}$ by $\sum_{s_i} w_k / N_i$, making that equation a function of the $w_k$. Assign the "equal shares" value $\mathbf{x}_{ik} = \mathbf{x}_{Ii} / N_i$ to every selected unit $k$ in cluster $i$. Then starting from $a_k = a_{Ii}\, a_{k|i}$, compute unit weights $w_k$ for $k \in s$, calibrated to satisfy $\sum_s w_k \begin{pmatrix} \mathbf{x}_{ik} \\ \mathbf{x}_k \end{pmatrix} = \begin{pmatrix} \sum_{U_I} \mathbf{x}_{Ii} \\ \sum_U \mathbf{x}_k \end{pmatrix}$. Then compute the cluster weights as $w_{Ii} = \sum_{s_i} w_k / N_i$.

(c) Calibration with integration option (2): In (4.2), replace $w_k$ by $a_{k|i}\, w_{Ii}$, making that equation a function of the $w_{Ii}$. Starting from $a_{Ii}$, compute cluster weights $w_{Ii}$ for $i \in s_I$, calibrated to satisfy
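Table 1. Summary of Calibrated Weighting Methods for Two-Stage Estimation.

Case (a)
  Integrated weighting option: None
  Method: Using $a_{Ii}$ as starting weights, compute $w_{Ii}$ to satisfy (4.1). Independently, using $a_k$ as starting weights, compute $w_k$ to satisfy (4.2).
  Calibration equation(s): $\sum_{s_I} w_{Ii}\, \mathbf{x}_{Ii} = \sum_{U_I} \mathbf{x}_{Ii}$ and $\sum_s w_k \mathbf{x}_k = \sum_U \mathbf{x}_k$

Case (b)
  Integrated weighting option: $\sum_{s_i} w_k = N_i\, w_{Ii}$
  Method: In (4.1), replace $w_{Ii}$ by $\sum_{s_i} w_k / N_i$. With $a_k$ as starting weights, compute $w_k$ to satisfy both (4.1) and (4.2). Then compute the $w_{Ii}$.
  Calibration equation(s): $\sum_s w_k \begin{pmatrix} \mathbf{x}_{ik} \\ \mathbf{x}_k \end{pmatrix} = \begin{pmatrix} \sum_{U_I} \mathbf{x}_{Ii} \\ \sum_U \mathbf{x}_k \end{pmatrix}$

Case (c)
  Integrated weighting option: $w_k = a_{k|i}\, w_{Ii}$
  Method: In (4.2), replace $w_k$ by $a_{k|i}\, w_{Ii}$. With $a_{Ii}$ as starting weights, compute $w_{Ii}$ to satisfy both (4.1) and (4.2). Then compute the $w_k$.
  Calibration equation(s): $\sum_{s_I} w_{Ii} \begin{pmatrix} \mathbf{x}_{Ii} \\ \hat{\mathbf{t}}_{xi} \end{pmatrix} = \begin{pmatrix} \sum_{U_I} \mathbf{x}_{Ii} \\ \sum_U \mathbf{x}_k \end{pmatrix}$

$\sum_{s_I} w_{Ii} \begin{pmatrix} \mathbf{x}_{Ii} \\ \hat{\mathbf{t}}_{xi} \end{pmatrix} = \begin{pmatrix} \sum_{U_I} \mathbf{x}_{Ii} \\ \sum_U \mathbf{x}_k \end{pmatrix}$, where $\hat{\mathbf{t}}_{xi} = \sum_{s_i} a_{k|i}\, \mathbf{x}_k$ is the sample-weighted, unbiased estimator of $\mathbf{t}_{xi} = \sum_{U_i} \mathbf{x}_k$. Then compute the unit weights as $w_k = a_{k|i}\, w_{Ii}$.

The procedures for (a), (b) and (c) are summarized in Table 1. All three cases reduce to a weight calculation of the form (2.1), just as in single-stage unit sampling. That is, despite the two stages of sampling, the point estimation does not become any more complex than in single-stage unit sampling. Software programmed to compute formula (2.1), such as CLAN97 or GES, can be used to compute the calibration estimators in (a), (b) and (c). However, the two-stage design leads to a more complicated variance than in section 2. The variance has two components, one for each stage of selection, as shown later in the section.

We have three cases, (a) to (c), and for each, both cluster statistics and unit statistics are examined. There are thus $3 \times 2 = 6$ situations to examine. For each of these, the approximate variance of the calibrated estimator (which equals the variance of the linearized statistic) has the form $V_1(E_c) + E_1(V_c)$, where $V_1(E_c)$ is the first stage variance component and $E_1(V_c)$ the second stage variance component. The latter is zero if there is no sampling at the second stage, that is, all units in selected clusters are observed (single stage cluster sampling). It is straightforward to carry out the automated linearization in the 6 situations. This leads to the expression for the residuals, one for each component of variance. These residuals are summarized in Table 2.

Consider case (b) for unit statistics. The total to estimate is $Y = \sum_U y_k$. The weights for $\hat{Y}_{CAL} = \sum_s w_k y_k$ are computed for $k \in s$ as $w_k = a_k (1 + \boldsymbol{\lambda}_s^T \mathbf{z}_k)$ with

$$\boldsymbol{\lambda}_s^T = \begin{pmatrix} \sum_{U_I} \mathbf{x}_{Ii} - \sum_s a_k \mathbf{x}_{ik} \\ \sum_U \mathbf{x}_k - \sum_s a_k \mathbf{x}_k \end{pmatrix}^T \left( \sum_s a_k \mathbf{z}_k \begin{pmatrix} \mathbf{x}_{ik} \\ \mathbf{x}_k \end{pmatrix}^T \right)^{-1},$$

where $\mathbf{z}_k$ is any valid instrument, and $\mathbf{x}_{ik} = \mathbf{x}_{Ii} / N_i$ for every selected unit $k$ in cluster $i$. The estimator of the total for units is $\hat{Y}_{CAL} = \sum_s w_k y_k$. Automated linearization gives $\hat{Y}_{CAL} = \hat{Y}_{CAL,lin} + R$, where $R$ is the lower order term

$$R = -\begin{pmatrix} \sum_s a_k \mathbf{x}_{ik} - \sum_{U_I} \mathbf{x}_{Ii} \\ \sum_s a_k \mathbf{x}_k - \sum_U \mathbf{x}_k \end{pmatrix}^T (\hat{\mathbf{B}} - \mathbf{B})$$

with $\hat{\mathbf{B}} = \begin{pmatrix} \hat{\mathbf{B}}_1 \\ \hat{\mathbf{B}}_2 \end{pmatrix} = \left( \sum_s a_k \mathbf{z}_k \begin{pmatrix} \mathbf{x}_{ik} \\ \mathbf{x}_k \end{pmatrix}^T \right)^{-1} (\sum_s a_k \mathbf{z}_k y_k)$, and the linearized statistic is

$$\hat{Y}_{CAL,lin} = \sum_s a_k e_k + \begin{pmatrix} \sum_{U_I} \mathbf{x}_{Ii} \\ \sum_U \mathbf{x}_k \end{pmatrix}^T \begin{pmatrix} \mathbf{B}_1 \\ \mathbf{B}_2 \end{pmatrix}. \tag{4.3}$$

The second term on the right hand side is a constant, and the preceding random term, $\sum_s a_k e_k = \sum_{s_I} a_{Ii} (\sum_{s_i} a_{k|i}\, e_k)$, has the residuals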
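$$e_k = y_k - \begin{pmatrix} x_{ik} \\ x_k \end{pmatrix}^T \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} \quad \text{with} \quad B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} = \left( \sum_U z_k \begin{pmatrix} x_{ik} \\ x_k \end{pmatrix}^T \right)^{-1} \left( \sum_U z_k y_k \right). \qquad (4.4)$$

By conditioning on $s_I$, and using $Var(\hat{Y}_{CAL,lin}) = V_1(E_c) + E_1(V_c)$, we find

$$Var(\hat{Y}_{CAL}) \approx Var(\hat{Y}_{CAL,lin}) = \sum_{i \in U_I} \sum_{j \in U_I} F_{Iij} \, e_{Ii} e_{Ij} + \sum_{U_I} a_{Ii} V_i \qquad (4.5)$$

where

$$e_{Ii} = \sum_{U_i} e_k = t_{yi} - \begin{pmatrix} x_{Ii} \\ t_{xi} \end{pmatrix}^T \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}$$

and

$$V_i = \sum_{k \in U_i} \sum_{l \in U_i} F_{kl|i} \, e_k e_l \quad \text{with} \quad F_{kl|i} = \frac{a_{k|i} \, a_{l|i}}{a_{kl|i}} - 1 \quad \text{and} \quad F_{Iij} = \frac{a_{Ii} \, a_{Ij}}{a_{Iij}} - 1.$$

The residual $e_{Ii}$ in the first component $V_1(E_c) = \sum_{i \in U_I} \sum_{j \in U_I} F_{Iij} \, e_{Ii} e_{Ij}$ equals the cluster total of the residuals $e_k$ in the second component $E_1(V_c) = \sum_{U_I} a_{Ii} V_i$. Both $e_{Ii}$ and $e_k$ have their magnitude reduced by both the cluster auxiliary and the unit auxiliary. The regressor is $(x_{Ii}^T, t_{xi}^T)$ in $e_{Ii}$ and $(x_{ik}^T, x_k^T)$ in $e_k$. In particular, in single stage cluster sampling, $V_i = 0$ for all $i$, and only the first variance component remains.

Consider now case (b) for cluster statistics. The total $Y_I = \sum_{U_I} y_{Ii}$ is estimated by $\hat{Y}_{I,CAL} = \sum_{s_I} w_{Ii} y_{Ii}$, where the $w_{Ii}$ are computed from the already available $w_k$ as $w_{Ii} = \sum_{s_i} w_k / N_i$. We can write $\hat{Y}_{I,CAL}$ as a sum of unit values $\hat{Y}_{I,CAL} = \sum_s w_k y_{ik}$, if we define $y_{ik} = y_{Ii} / N_i$ for all $k$ in cluster $i$. To obtain its variance, we simply change the variable of interest in equations (4.3) to (4.5). We replace $y_k$ by $y_{ik}$, keeping other quantities intact. Denote by $B^{(c)} = \begin{pmatrix} B_1^{(c)} \\ B_2^{(c)} \end{pmatrix}$ the result of replacing $y_k$ by $y_{ik}$ in $B$ of (4.4). The approximation to $Var(\hat{Y}_{I,CAL})$ is then given by (4.5) with

$$e_k = y_{ik} - \begin{pmatrix} x_{ik} \\ x_k \end{pmatrix}^T \begin{pmatrix} B_1^{(c)} \\ B_2^{(c)} \end{pmatrix} \quad \text{and} \quad e_{Ii} = y_{Ii} - \begin{pmatrix} x_{Ii} \\ t_{xi} \end{pmatrix}^T \begin{pmatrix} B_1^{(c)} \\ B_2^{(c)} \end{pmatrix}.$$

The residuals for unit and cluster statistics are summarized in Table 2.

Another issue of interest in case (b) is the choice of the instrument $z_k$. The standard choice is to take $z_k = \begin{pmatrix} x_{ik} \\ x_k \end{pmatrix}$ for $k$ in cluster $i$. But one can derive a $B = B_0$ that minimizes (4.5), with a corresponding optimal $z_k = z_{0k}$. Some algebra shows that

$$z_{0k} = a_{Ii} \sum_{l \in U_i} F_{kl|i} \begin{pmatrix} x_{il} \\ x_l \end{pmatrix} + \sum_{j \in U_I} F_{Iij} \begin{pmatrix} x_{Ij} \\ t_{xj} \end{pmatrix}$$

for $k$ in cluster $i$. It is no surprise that $z_{0k}$ depends on the sampling design at both stages. Future work will examine when $z_k = z_{0k}$ is a valid instrument and whether $z_k = z_{0k}$ gives any appreciable variance advantage over the simple standard $z_k = x_k$. If this advantage is minimal, the preferred choice in practice is the simple $z_k = x_k$.

Consider now case (c). The calibrated estimator of the cluster total $Y_I = \sum_{U_I} y_{Ii}$ is $\hat{Y}_{I,CAL} = \sum_{s_I} w_{Ii} y_{Ii}$, with the cluster weights $w_{Ii} = a_{Ii} (1 + \lambda_{s_I}^T z_i)$ for $i \in s_I$, where

$$\lambda_{s_I}^T = \begin{pmatrix} \sum_{U_I} x_{Ii} - \sum_{s_I} a_{Ii} x_{Ii} \\ \sum_{U_I} t_{xi} - \sum_{s_I} a_{Ii} \hat{t}_{xi} \end{pmatrix}^T \left( \sum_{s_I} a_{Ii} z_i \begin{pmatrix} x_{Ii} \\ \hat{t}_{xi} \end{pmatrix}^T \right)^{-1}$$

and $\hat{t}_{xi} = \sum_{s_i} a_{k|i} x_k$. The calibrated estimator of the unit total $Y = \sum_U y_k$ is $\hat{Y}_{CAL} = \sum_s w_k y_k$, where the integrated unit weights for $k$ in cluster $i$ are $w_k = a_{k|i} w_{Ii}$, using the computed $w_{Ii}$. We can use automated linearization on $\hat{Y}_{I,CAL}$ and $\hat{Y}_{CAL}$ to obtain the linearized statistic and the residuals that determine the two components of the approximate variance. The details of the derivations are omitted. The residuals, given in Table 2, are expressed in terms of the vectors

$$B_I = \begin{pmatrix} B_{I1} \\ B_{I2} \end{pmatrix} = \left( \sum_{U_I} z_i \begin{pmatrix} x_{Ii} \\ t_{xi} \end{pmatrix}^T \right)^{-1} \left( \sum_{U_I} z_i y_{Ii} \right) \quad \text{and} \quad B_I^{(u)} = \begin{pmatrix} B_{I1}^{(u)} \\ B_{I2}^{(u)} \end{pmatrix},$$

where $B_I^{(u)}$ is obtained by replacing $y_{Ii}$ in $B_I$ by $t_{yi} = \sum_{U_i} y_k$.

Case | Integrated weighting option | Estimation of a total for | Residual $e_{Ii}$ | Residual $e_k$
-----|-----------------------------|---------------------------|-------------------|---------------
(a) | None | Units | $t_{yi} - t_{xi}^T B$ | $y_k - x_k^T B$
(a) | None | Clusters | $y_{Ii} - x_{Ii}^T B_I$ | $0$
(b) | $\sum_{s_i} w_k = N_i w_{Ii}$ | Units | $t_{yi} - x_{Ii}^T B_1 - t_{xi}^T B_2$ | $y_k - x_{ik}^T B_1 - x_k^T B_2$
(b) | $\sum_{s_i} w_k = N_i w_{Ii}$ | Clusters | $y_{Ii} - x_{Ii}^T B_1^{(c)} - t_{xi}^T B_2^{(c)}$ | $y_{ik} - x_{ik}^T B_1^{(c)} - x_k^T B_2^{(c)}$
(c) | $w_k = a_{k|i} w_{Ii}$ | Units | $t_{yi} - x_{Ii}^T B_{I1}^{(u)} - t_{xi}^T B_{I2}^{(u)}$ | $y_k - x_k^T B_{I2}^{(u)}$
(c) | $w_k = a_{k|i} w_{Ii}$ | Clusters | $y_{Ii} - x_{Ii}^T B_{I1} - t_{xi}^T B_{I2}$ | $-x_k^T B_{I2}$

Table 2. Summary of residuals in the Components of the Variance (4.5) for Two-Stage Sampling and Estimation. The notation is explained in the text.

The residuals given in Table 2 for case (a) are simple to explain. In case (a) for unit statistics, the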
automated linearization of $\hat{Y}_{CAL} = \sum_s w_k y_k$ produces $B = \left( \sum_U z_k x_k^T \right)^{-1} \left( \sum_U z_k y_k \right)$, and the residuals in Table 2 follow from case (b) for unit statistics by setting $x_{Ii} = 0$ and $x_{ik} = 0$ for all $i$ and $k$, because case (a) involves no cluster related information in estimating for units. For case (a) for cluster statistics, the automated linearization of $\hat{Y}_{I,CAL} = \sum_{s_I} w_{Ii} y_{Ii}$ leads to $B_I = \left( \sum_{U_I} z_i x_{Ii}^T \right)^{-1} \left( \sum_{U_I} z_i y_{Ii} \right)$, and the residuals follow from case (c) for cluster statistics by setting $x_k = 0$ and $t_{xi} = 0$ for all $i$ and $k$, because case (a) uses no unit related information in estimating for clusters.

An examination of the residuals in Table 2 leads to some interesting conclusions. Let us first compare the residuals for unit statistics. In (b) and (c), the residuals $e_{Ii}$ are adjusted for both $x_{Ii}$ and $t_{xi}$, but in (a) they are only adjusted for $t_{xi}$. Thus (b) and (c) are better than (a) for the first component. The residual $e_k$ is adjusted for both auxiliaries in (b), but not in (a) and (c), where it is only adjusted for $x_k$. Thus (b) has the best potential for efficient estimation of unit statistics. Compare now the residuals for cluster statistics. In (b) and (c), the residual $e_{Ii}$ is adjusted for both $x_{Ii}$ and $t_{xi}$, but in (a) it is only adjusted for $x_{Ii}$. By design, the residual $e_k$ is always zero in (a). A particularly unfavourable situation for the second variance component arises for case (c), where the residual is $-x_k^T B_{I2}$. Thus (a) or possibly (b) has the best potential for efficient estimation of cluster statistics.

5 Summary and discussion

The question of efficient weighting of the observed values has always been important in survey sampling theory. An important step was the formulation in 1952 of the HT estimator, prescribing that the weight of each unit equals the inverse of the probability of its inclusion in the sample. Thus, in stratified simple random sampling (STSRS), the weight given to all units sampled from a stratum equals the inverse of the sampling rate in the stratum. Neyman's convincing results in 1934 on optimal estimation under STSRS laid the foundation of what is now commonly called the design-based theory of estimation. Another important principle embodied in HT estimation is that the same weight system applies to all y-variables of interest in a multi-purpose survey. This preserves the design unbiasedness for every y-variable. Assuming no non-response, the sampling design alone determines once and for all the weighting and the construction of the point estimator.

The principle of a single weight system extends to the calibration estimators in this paper. However, unlike the sampling weights $a_k$, the calibrated weights $w_k$ are calculated only after drawing the sample. They are
usually more efficient (give a smaller variance) than the $a_k$ for every single y-variable and they produce estimators with a negligible bias.

The literature on calibration has been based on a model oriented construction of these estimators. Both the model assisted and model dependent approaches to calibration involve an explicit assumption of a linear superpopulation model between x and y. This model is of the form $y_k = x_k^T B + \varepsilon_k$ where it is assumed that $E(\varepsilon_k) = 0$ and $Var(\varepsilon_k) = c_k \sigma^2$ with $c_k > 0$. In practice however, this model is often invalid.

In our approach, the use of auxiliary information is not linked to model fitting. We define a parameterization of the calibration weights involving the instrument vector $z_k$ and then apply the method of automated linearization to obtain a linear approximation of the calibration estimator. This linear approximation is a design-based function of a set of fixed but unknown population residuals determined implicitly without any modelling. The $w_k$ are calculated using all or part of the available auxiliary information. We have shown how to do this for different designs including one-phase and two-stage designs. It is important to note that the construction of the point estimator has nothing to do with the y-variables; the same weights apply to all y-variables as is the case for the HT estimator. However, the calibration estimator can be considerably more efficient for some y-variables than others. This depends on the resulting population residuals.

References

Andersson, C. (1997). Continuous labour force surveys: performance analysis of a single weight procedure. Internal report, Statistical Methodology Unit, Statistics Sweden.

Binder, D. (1996). The right and wrong ways to Taylor linearize for single phase and two phase samples: a cookbook approach. Survey Methodology 22, 17-22.

Binder, D. and Kovačević, M.S. (1995). Estimating some measures of income inequality from survey data: an application of the estimating equation approach. Survey Methodology 21, 137-145.

Deville, J.C. and Särndal, C.E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association 87, 376-382.

Deville, J.C. (1999). Variance estimation for complex statistics and estimators: Linearization and residual techniques. Survey Methodology 25, 193-203.

Deville, J.C. (2002). Correction for non-response by generalized calibration. Report, ENSAI, France.

Estevao, V. and Särndal, C.E. (2002). The ten cases of auxiliary information for calibration in two-phase sampling. Journal of Official Statistics 18, 233-255.

Huang, E.T. and Fuller, W.A. (1978). Nonnegative regression estimation for sample survey data. Proceedings Social Statistics Section, American Statistical Association, 300-305.

Lemaître, G.E. and Dufour, J. (1987). An integrated method for weighting persons and families. Survey Methodology 13, 199-207.

Le Guennec, J. and Sautory, O. (2002). Application of generalized calibration to the correction of non-response: An experiment. Report, ENSAI, France.

Montanari, G.E. (1987). Post-sampling efficient QR-prediction in large-sample surveys. International Statistical Review 55, 191-202.

Nieuwenbroek, N.J. (1993). An integrated method for weighting characteristics of persons and households using the linear regression estimator. Internal report, Central Bureau of Statistics, The Netherlands.

Särndal, C.E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

Théberge, A. (1999). Extensions of Calibration Estimators in Survey Sampling. Journal of the American Statistical Association 94, 635-644.

Woodruff, R.S. (1971). A simple method for approximating the variance of a complicated estimate. Journal of the American Statistical Association 66, 411-414.
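As an illustration of the calibrated weights $w_k = a_k(1 + \lambda_s^T z_k)$ discussed in this paper, the following minimal Python sketch computes them for a single scalar auxiliary variable with the instrument $z_k = x_k$. The function name `calibrated_weights` and the toy sample are hypothetical and are not taken from the paper; the sketch only checks two properties stated in the text, namely that the weights reproduce the known total $X$ and that the calibration estimator equals the HT estimate plus a regression-type adjustment $(X - \hat{X})\hat{B}$.

```python
from typing import List, Sequence


def calibrated_weights(a: Sequence[float], x: Sequence[float],
                       z: Sequence[float], X: float) -> List[float]:
    """Linear calibration weights w_k = a_k * (1 + lam * z_k), scalar auxiliary.

    a: sampling weights, x: auxiliary values, z: instrument values,
    X: known population total of x (all names are illustrative).
    """
    x_hat = sum(ak * xk for ak, xk in zip(a, x))        # HT estimate of X
    denom = sum(ak * zk * xk for ak, zk, xk in zip(a, z, x))
    lam = (X - x_hat) / denom                           # scalar lambda
    return [ak * (1.0 + lam * zk) for ak, zk in zip(a, z)]


# Hypothetical toy sample of four units.
a = [10.0, 10.0, 20.0, 20.0]   # sampling weights a_k
x = [1.0, 2.0, 3.0, 4.0]       # auxiliary values x_k
y = [2.0, 3.5, 6.5, 8.0]       # study variable y_k
X = 200.0                      # assumed known population total of x

w = calibrated_weights(a, x, x, X)   # instrument z_k = x_k

# Calibration property: the weights reproduce the known total X.
x_cal = sum(wk * xk for wk, xk in zip(w, x))

# Calibration estimator of the y-total.
y_cal = sum(wk * yk for wk, yk in zip(w, y))

# Equivalent regression form: HT estimate plus (X - X_hat) * B_hat.
x_hat = sum(ak * xk for ak, xk in zip(a, x))
y_ht = sum(ak * yk for ak, yk in zip(a, y))
b_hat = (sum(ak * xk * yk for ak, xk, yk in zip(a, x, y))
         / sum(ak * xk * xk for ak, xk in zip(a, x)))
y_reg = y_ht + (X - x_hat) * b_hat

print(x_cal, y_cal, y_reg)
```

The same weights `w` would be applied to any other y-variable, which is the single weight system property emphasized in Section 5. The vector-valued and two-stage cases replace the scalar division by the solution of a linear system.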
