Example 1: Estimating a proportion

Data: i.i.d. random sample $Y_1, \dots, Y_n$ with $Y_i \in \{0, 1\}$ and $P(Y_i = 1) = p$. The parameter $p$ is estimated by $\hat p = S/n$, where $S$ denotes the number of observations $Y_i$ which are equal to 1. We are interested in evaluating the distribution of $\hat p - p$. By the central limit theorem,
$$\frac{\sqrt{n}(\hat p - p)}{\sqrt{p(1-p)}} \xrightarrow{L} N(0, 1).$$

The bootstrap approach:

1) Re-sampling: Random samples $Y_1^*, \dots, Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$. Let $S^*$ denote the number of $Y_i^*$ which are equal to 1.

2) Bootstrap estimates: $\hat p^* = \hat p(Y_1^*, \dots, Y_n^*)$

3) In practice: Steps 1) and 2) are repeated $m$ times (e.g. $m = 2000$) → $m$ values $\hat p_1^*, \hat p_2^*, \dots, \hat p_m^*$

4) The (empirical) distribution of $\hat p^* - \hat p$ is used to approximate the distribution of $\hat p - p$.
Inference@LS-Kneip 13
Bootstrap estimate of $p$: $\hat p^* = S^*/n$

The distribution of $\hat p^* - \hat p$ given the observed sample $\mathcal{Y}_n$ is used to approximate the distribution of $\hat p - p$. The bootstrap is called consistent if asymptotically ($n \to \infty$) the conditional distribution of $\hat p^* - \hat p$ coincides with the true distribution of $\hat p - p$ (note: a proper scaling is required!).

We obtain
$$P^*(Y_i^* = 1) = P(Y_i^* = 1 \mid \mathcal{Y}_n) = \hat p, \qquad P^*(Y_i^* = 0) = P(Y_i^* = 0 \mid \mathcal{Y}_n) = 1 - \hat p$$
and
$$E^*(\hat p^*) = E(\hat p^* \mid \mathcal{Y}_n) = \hat p, \qquad Var^*(\hat p^*) = E[(\hat p^* - \hat p)^2 \mid \mathcal{Y}_n] = \frac{\hat p(1 - \hat p)}{n}.$$

The conditional distribution of $n \hat p^* = S^*$ given $\mathcal{Y}_n$ is equal to $B(n, \hat p)$. In a slight abuse of notation we will write
$$(n \hat p^* \mid \mathcal{Y}_n) \sim B(n, \hat p) \qquad \text{or} \qquad distr(n \hat p^* \mid \mathcal{Y}_n) = B(n, \hat p)$$
As $n \to \infty$ the central limit theorem implies that the (conditional) distribution of $\left(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{\hat p(1 - \hat p)}} \,\middle|\, \mathcal{Y}_n\right)$ converges (stochastically) to a $N(0,1)$-distribution. Moreover, $\hat p$ is a consistent estimator of $p$ and therefore $\hat p(1 - \hat p) \xrightarrow{P} p(1-p)$ as $n \to \infty$. This implies that asymptotically $\hat p(1 - \hat p)$ may be replaced by $p(1-p)$, and

• The law of $\left(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{p(1-p)}} \,\middle|\, \mathcal{Y}_n\right)$ converges stochastically to a $N(0,1)$-distribution.

More precisely, as $n \to \infty$
$$\sup_{\tau} \left| P\left(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{p(1-p)}} \le \tau \,\middle|\, \mathcal{Y}_n\right) - \Phi(\tau) \right| \xrightarrow{P} 0,$$
where $\Phi$ denotes the distribution function of the standard normal distribution.

We can conclude that for large $n$
$$distr(\sqrt{n}(\hat p^* - \hat p) \mid \mathcal{Y}_n) \approx distr(\sqrt{n}(\hat p - p)) \quad \text{and} \quad distr(\hat p^* - \hat p \mid \mathcal{Y}_n) \approx distr(\hat p - p) \approx N(0, p(1-p)/n)$$

⇒ Bootstrap consistent
Inference@LS-Kneip 15
Example 2: Estimating a population mean

Let $Y_1, \dots, Y_n$ denote an i.i.d. random sample with mean $\mu$ and variance $\sigma^2$. In the following $F$ will denote the corresponding distribution function.

$\bar Y = \frac{1}{n} \sum_{i=1}^n Y_i$ is an unbiased estimator of $\mu$.

Problem: Construct a confidence interval

Traditional approach for constructing a $1-\alpha$ confidence interval:

$$\bar Y \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Estimation of $\sigma^2$: $S^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar Y)^2$

This implies: $\sqrt{n}\, \frac{\bar Y - \mu}{S} \sim t_{n-1}$, and hence
$$P\left(-t_{n-1, 1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}} \le \bar Y - \mu \le t_{n-1, 1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right) = 1 - \alpha$$

$1-\alpha$ confidence interval:
$$\left[\bar Y - t_{n-1, 1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}},\; \bar Y + t_{n-1, 1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right]$$

Remark: The construction relies on the assumption that $\bar Y \sim N(\mu, \frac{\sigma^2}{n})$. This is necessarily true if $Y$ is normally distributed. If the underlying distribution is not normal, then this condition is approximately fulfilled if the sample size $n$ is sufficiently large (central limit theorem). In this case the constructed confidence interval must also be seen as an approximation.

The bootstrap offers an alternative method for constructing such confidence intervals.
The bootstrap approach:

Random samples $Y_1^*, \dots, Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

$Y_1^*, \dots, Y_n^*$ → estimator $\bar Y^* = \frac{1}{n} \sum_{i=1}^n Y_i^*$

Means and variances of the conditional distributions of $Y_i^*$ and $\bar Y^*$ given $\mathcal{Y}_n$:
$$E^*(Y_i^*) = E(Y_i^* \mid \mathcal{Y}_n) = \bar Y, \qquad Var^*(Y_i^*) = E[(Y_i^* - \bar Y)^2 \mid \mathcal{Y}_n] = \tilde S^2 := \frac{1}{n} \sum_{i=1}^n (Y_i - \bar Y)^2$$

Moreover,
$$E^*(\bar Y^*) = \bar Y, \qquad Var^*(\bar Y^*) = \tilde S^2 / n$$

As $n \to \infty$ the central limit theorem implies that the (conditional) distribution of $\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{\tilde S} \,\middle|\, \mathcal{Y}_n\right)$ converges (stochastically) to a $N(0,1)$-distribution. Moreover, $\tilde S^2$ is a consistent estimator of $\sigma^2$ and therefore $\tilde S^2 \xrightarrow{P} \sigma^2$ as $n \to \infty$. This implies that asymptotically $\tilde S$ may be replaced by $\sigma$, and

• The law of $\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{\sigma} \,\middle|\, \mathcal{Y}_n\right)$ converges stochastically to a $N(0,1)$-distribution.

More precisely, as $n \to \infty$
$$\sup_{\tau} \left| P\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{\sigma} \le \tau \,\middle|\, \mathcal{Y}_n\right) - \Phi(\tau) \right| \xrightarrow{P} 0,$$
where $\Phi$ denotes the distribution function of the standard normal distribution.
We can conclude that for large $n$
$$distr(\sqrt{n}(\bar Y^* - \bar Y) \mid \mathcal{Y}_n) \approx distr(\sqrt{n}(\bar Y - \mu)) \approx N(0, \sigma^2)$$
as well as
$$distr(\bar Y^* - \bar Y \mid \mathcal{Y}_n) \approx distr(\bar Y - \mu) \approx N(0, \sigma^2/n)$$

⇒ Bootstrap consistent

Construction of a symmetric confidence interval of level $1-\alpha$:

Determine $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\bar Y^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$ (the bootstrap distribution):
$$P^*(\bar Y^* \le \hat t_{\frac{\alpha}{2}}) \approx \frac{\alpha}{2}, \qquad P^*(\bar Y^* > \hat t_{\frac{\alpha}{2}}) \approx 1 - \frac{\alpha}{2},$$
$$P^*(\bar Y^* \le \hat t_{1-\frac{\alpha}{2}}) \approx 1 - \frac{\alpha}{2}, \qquad P^*(\bar Y^* > \hat t_{1-\frac{\alpha}{2}}) \approx \frac{\alpha}{2},$$
Here, $P^*$ denotes the conditional probability given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

In practice:

• Draw $m$ bootstrap samples (e.g. $m = 2000$) and calculate the corresponding estimates $\bar Y_1^*, \bar Y_2^*, \dots, \bar Y_m^*$.

• Order the resulting estimates $\bar Y_{(1)}^* \le \bar Y_{(2)}^* \le \dots \le \bar Y_{(m)}^*$.

• Set $\hat t_{\frac{\alpha}{2}} := \bar Y^*_{([m+1]\frac{\alpha}{2})}$ and $\hat t_{1-\frac{\alpha}{2}} := \bar Y^*_{([m+1][1-\frac{\alpha}{2}])}$.
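The three practical steps above translate directly into code. A minimal Python sketch (illustrative only; the quantile positions follow the $[m+1]\frac{\alpha}{2}$ rule of the notes, converted to 0-based indices):

```python
import random
import statistics

def bootstrap_mean_quantiles(y, alpha=0.05, m=2000, seed=0):
    """Quantiles t_hat of the bootstrap distribution of the resampled mean."""
    rng = random.Random(seed)
    n = len(y)
    # draw m bootstrap samples and order the resulting means
    means = sorted(statistics.fmean(rng.choices(y, k=n)) for _ in range(m))
    lo_idx = int((m + 1) * alpha / 2) - 1          # position of t_hat_{alpha/2}
    hi_idx = int((m + 1) * (1 - alpha / 2)) - 1    # position of t_hat_{1-alpha/2}
    return means[max(lo_idx, 0)], means[min(hi_idx, m - 1)]
```

The returned pair $(\hat t_{\frac{\alpha}{2}}, \hat t_{1-\frac{\alpha}{2}})$ feeds into the confidence-interval constructions discussed next.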
A basic bootstrap confidence interval:

By construction of $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ we have
$$P^*(\bar Y^* - \bar Y \le \hat t_{\frac{\alpha}{2}} - \bar Y) \approx \frac{\alpha}{2}, \qquad P^*(\bar Y^* - \bar Y > \hat t_{1-\frac{\alpha}{2}} - \bar Y) \approx \frac{\alpha}{2}.$$

We have seen that the bootstrap is consistent, and therefore $distr(\bar Y^* - \bar Y \mid \mathcal{Y}_n) \approx distr(\bar Y - \mu)$, so that
$$P(\bar Y - \mu \le \hat t_{\frac{\alpha}{2}} - \bar Y) \approx \frac{\alpha}{2}, \qquad P(\bar Y - \mu > \hat t_{1-\frac{\alpha}{2}} - \bar Y) \approx \frac{\alpha}{2},$$
and therefore
$$P\left(\bar Y - (\hat t_{1-\frac{\alpha}{2}} - \bar Y) \le \mu \le \bar Y - (\hat t_{\frac{\alpha}{2}} - \bar Y)\right) \approx 1 - \alpha$$

Approximate $1-\alpha$ (symmetric) confidence interval:
$$[2\bar Y - \hat t_{1-\frac{\alpha}{2}},\; 2\bar Y - \hat t_{\frac{\alpha}{2}}]$$

The percentile interval:

In the older bootstrap literature the so-called percentile interval
$$[\hat t_{\frac{\alpha}{2}},\; \hat t_{1-\frac{\alpha}{2}}]$$
is usually recommended as a $1-\alpha$ confidence interval.

The percentile interval can easily be justified if all underlying distributions are symmetric, $distr(\bar Y^* - \bar Y \mid \mathcal{Y}_n) \approx distr(\bar Y - \bar Y^* \mid \mathcal{Y}_n)$, $distr(\bar Y - \mu) \approx distr(\mu - \bar Y)$.

In practice the percentile interval is usually less precise than the standard interval discussed above; there are however some bias-corrected modifications of the percentile interval which allow better approximations.
General Setup: The nonparametric (naive) bootstrap

Data: Random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; the distribution of $Y_i$ depends on an unknown parameter (vector) $\theta$

The data $Y_1, \dots, Y_n$ is used to estimate $\theta$ → estimator $\hat\theta = \hat\theta(Y_1, \dots, Y_n)$

Bootstrap: Random samples $Y_1^*, \dots, Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $Y_1, \dots, Y_n$ → Bootstrap estimates $\hat\theta^* = \hat\theta(Y_1^*, \dots, Y_n^*)$

$distr(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n)$ is used to approximate $distr(\hat\theta - \theta)$

The bootstrap works for a large number of statistical and econometrical problems. Indeed, it can be shown that under some mild regularity conditions the bootstrap is consistent, if

1) Generation of the bootstrap sample reflects appropriately the way in which the original sample has been generated (i.i.d. sampling!).

2) The distribution of the estimator $\hat\theta$ is asymptotically normal. More precisely,
• single parameter ($\theta \in \mathbb{R}$): $\sqrt{n}(\hat\theta - \theta) \to N(0, v^2)$; $v$ – standard error of $\sqrt{n}(\hat\theta - \theta)$
• multivariate parameter vector ($\theta \in \mathbb{R}^d$): $\sqrt{n}(\hat\theta - \theta) \to N_d(0, V)$; $V$ – covariance matrix of $\sqrt{n}(\hat\theta - \theta)$

Consistent Bootstrap: $distr(\sqrt{n}(\hat\theta^* - \hat\theta) \mid \mathcal{Y}_n) \approx distr(\sqrt{n}(\hat\theta - \theta))$ [and $distr(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n) \approx distr(\hat\theta - \theta)$] if $n$ is sufficiently large.

→ Bootstrap confidence intervals, tests, etc.
Note:

• Standard approaches to construct confidence intervals and tests are usually based on asymptotic normal approximations. For example, if $\theta \in \mathbb{R}$ and $\sqrt{n}(\hat\theta - \theta) \to N(0, v^2)$ one usually tries to determine an approximation $\hat v$ of $v$ from the data. An approximate $1-\alpha$ confidence interval is then given by
$$\left[\hat\theta - z_{1-\frac{\alpha}{2}} \frac{\hat v}{\sqrt{n}},\; \hat\theta + z_{1-\frac{\alpha}{2}} \frac{\hat v}{\sqrt{n}}\right]$$

• In some cases it is very difficult to obtain approximations $\hat v$ of $v$. Statistical inference is then usually based on the bootstrap.

• In contemporary statistical analysis the bootstrap is frequently used even for standard problems, where estimates $\hat v$ of $v$ are easily constructed. The reason is that in many situations it can be shown that bootstrap confidence intervals or tests are more precise than those determined analytically based on asymptotic formulas.

It must be emphasized that the bootstrap does not always work. The bootstrap may fail if one of the above conditions 1) or 2) is violated. Examples are

• The naive bootstrap will not work if the i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $Y_1, \dots, Y_n$ does not properly reflect the way how $Y_1, \dots, Y_n$ is generated from the underlying population (e.g. dependent data; $Y_1, \dots, Y_n$ not i.i.d.).

• The distribution of the estimator $\hat\theta$ is not asymptotically normal (e.g. extreme value problems)
General approach: Basic bootstrap $1-\alpha$ confidence interval

Random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; unknown parameter (vector) $\theta$

We will assume that the bootstrap is consistent: $distr(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n) \approx distr(\hat\theta - \theta)$ if $n$ is sufficiently large.

Determine $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$ (the bootstrap distribution):
$$P^*(\hat\theta^* \le \hat t_{\frac{\alpha}{2}}) \approx \frac{\alpha}{2}, \qquad P^*(\hat\theta^* > \hat t_{\frac{\alpha}{2}}) \approx 1 - \frac{\alpha}{2},$$
$$P^*(\hat\theta^* \le \hat t_{1-\frac{\alpha}{2}}) \approx 1 - \frac{\alpha}{2}, \qquad P^*(\hat\theta^* > \hat t_{1-\frac{\alpha}{2}}) \approx \frac{\alpha}{2},$$
Here, $P^*$ denotes the conditional probability given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

Consistency of the bootstrap implies that for large $n$
$$P(\hat\theta - \theta \le \hat t_{\frac{\alpha}{2}} - \hat\theta) \approx \frac{\alpha}{2}, \qquad P(\hat\theta - \theta > \hat t_{1-\frac{\alpha}{2}} - \hat\theta) \approx \frac{\alpha}{2},$$
and therefore
$$P\left(\hat\theta - (\hat t_{1-\frac{\alpha}{2}} - \hat\theta) \le \theta \le \hat\theta - (\hat t_{\frac{\alpha}{2}} - \hat\theta)\right) \approx 1 - \alpha$$

Approximate $1-\alpha$ (symmetric) confidence interval:
$$[2\hat\theta - \hat t_{1-\frac{\alpha}{2}},\; 2\hat\theta - \hat t_{\frac{\alpha}{2}}]$$
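The general recipe above applies to any estimator. A minimal Python sketch of the basic bootstrap interval, written as a higher-order function (illustrative only; function names and defaults are my own choices):

```python
import random
import statistics

def basic_bootstrap_ci(y, estimator, alpha=0.05, m=2000, seed=0):
    """Basic bootstrap interval [2*theta_hat - t_{1-a/2}, 2*theta_hat - t_{a/2}].

    estimator: any function mapping a list of observations to a real number.
    """
    rng = random.Random(seed)
    n = len(y)
    theta_hat = estimator(y)
    # bootstrap replications of the estimator, ordered
    stats = sorted(estimator(rng.choices(y, k=n)) for _ in range(m))
    t_lo = stats[max(int((m + 1) * alpha / 2) - 1, 0)]
    t_hi = stats[min(int((m + 1) * (1 - alpha / 2)) - 1, m - 1)]
    return 2 * theta_hat - t_hi, 2 * theta_hat - t_lo
```

For the population-mean example one would call `basic_bootstrap_ci(data, statistics.fmean)`.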
Example: Bootstrap confidence interval for a median

Given: i.i.d. sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; $Y_i$ possesses a continuous distribution with (unknown) density $f$.

We are now interested in estimating the median $\mu_{med}$ of the underlying distribution. Recall that the median is defined by
$$P(Y_i \le \mu_{med}) = P(Y_i \ge \mu_{med}) = 0.5$$

$\mu_{med}$ is estimated by the sample median $\hat\mu_{med}$. Based on the ordered sample $Y_{(1)} \le Y_{(2)} \le \dots \le Y_{(n)}$, $\hat\mu_{med}$ is given by
$$\hat\mu_{med} = \begin{cases} Y_{(\frac{n+1}{2})} & \text{if } n \text{ is an odd number} \\ (Y_{(\frac{n}{2})} + Y_{(\frac{n}{2}+1)})/2 & \text{if } n \text{ is an even number} \end{cases}$$

Construction of a confidence interval for $\mu_{med}$ is not an easy task. Asymptotically we obtain
$$\sqrt{n}(\hat\mu_{med} - \mu_{med}) \xrightarrow{L} N\left(0, \frac{1}{4 f(\mu_{med})^2}\right)$$

The problem is that the density $f$ is unknown. In principle it may be estimated by nonparametric kernel density estimation, and a corresponding plug-in estimate $\hat f(\hat\mu_{med})$ may be used to approximate the asymptotic variance. However, the bootstrap offers a simple alternative.

Construction of a bootstrap confidence interval:

• Draw i.i.d. random samples $Y_1^*, \dots, Y_n^*$ from $\mathcal{Y}_n$ and determine the corresponding medians $\hat\mu_{med}^*$

• Determine $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\hat\mu_{med}^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

• Approximate $1-\alpha$ (symmetric) confidence interval:
$$[2\hat\mu_{med} - \hat t_{1-\frac{\alpha}{2}},\; 2\hat\mu_{med} - \hat t_{\frac{\alpha}{2}}]$$
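The three steps above can be sketched in Python, implementing the sample median exactly as in the odd/even definition of the notes (a minimal illustration; names and defaults are my own):

```python
import random

def sample_median(y):
    """Sample median following the odd/even case distinction in the notes."""
    s = sorted(y)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]            # Y_((n+1)/2)
    return (s[n // 2 - 1] + s[n // 2]) / 2    # (Y_(n/2) + Y_(n/2+1)) / 2

def bootstrap_median_ci(y, alpha=0.05, m=2000, seed=0):
    """Basic bootstrap CI for the median: [2*med - t_{1-a/2}, 2*med - t_{a/2}]."""
    rng = random.Random(seed)
    n = len(y)
    med = sample_median(y)
    meds = sorted(sample_median(rng.choices(y, k=n)) for _ in range(m))
    t_lo = meds[max(int((m + 1) * alpha / 2) - 1, 0)]
    t_hi = meds[min(int((m + 1) * (1 - alpha / 2)) - 1, m - 1)]
    return 2 * med - t_hi, 2 * med - t_lo
```

No density estimate is needed: the bootstrap quantiles replace the unknown asymptotic variance $1/(4 f(\mu_{med})^2)$.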
1.2 Pivot statistics and the bootstrap-t method

In many situations it is possible to get more accurate bootstrap confidence intervals by using the bootstrap-t method (one also speaks of studentized bootstrap confidence intervals). The construction relies on so-called pivot statistics.

Let $Y_1, \dots, Y_n$ be an i.i.d. random sample and assume that the distribution of $Y$ depends on an unknown parameter (or parameter vector) $\theta$.

• A statistic $T_n \equiv T(Y_1, \dots, Y_n)$ is called a pivot statistic, if the distribution of $T_n$ does not depend on any unknown parameter.

• A statistic $T_n \equiv T(Y_1, \dots, Y_n)$ is called an asymptotic pivot statistic, if for suitable sequences $a_n, b_n$ of real numbers the transformed statistic $a_n T_n + b_n$ possesses a well-defined, non-degenerate asymptotic distribution, which does not depend on the parameters of the unknown distribution of $Y$.

Example: Population mean: $Y_1, \dots, Y_n$ with mean $\mu$, variance $\sigma^2 > 0$, and $E|Y|^3 = \gamma < \infty$. If $Y$ is normally distributed we obtain
$$T_n = \frac{\sqrt{n}(\bar Y - \mu)}{S} \sim t_{n-1}$$
with $S^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar Y)^2$, where $t_{n-1}$ denotes Student's t-distribution with $n-1$ degrees of freedom. We can conclude that $T_n$ is a pivot statistic.

Even if $Y$ is not normally distributed, the central limit theorem implies that
$$T_n = \frac{\sqrt{n}(\bar Y - \mu)}{S} \xrightarrow{L} N(0, 1)$$
In this case $T_n$ is an asymptotic pivot statistic.

Bootstrap:

i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $\mathcal{Y}_n$ → estimators
$$\bar Y^* = \frac{1}{n} \sum_{i=1}^n Y_i^* \quad \text{and} \quad S^{*2} = \frac{1}{n-1} \sum_{i=1}^n (Y_i^* - \bar Y^*)^2$$

$n$ large → approximately
$$distr\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{S^*} \,\middle|\, \mathcal{Y}_n\right) \approx distr\left(\frac{\sqrt{n}(\bar Y - \mu)}{S}\right) \approx N(0, 1)$$
or
$$distr\left(\frac{\bar Y^* - \bar Y}{S^*} \,\middle|\, \mathcal{Y}_n\right) \approx distr\left(\frac{\bar Y - \mu}{S}\right)$$

Therefore, the (conditional) distribution of $\frac{\bar Y^* - \bar Y}{S^*}$ (given $\mathcal{Y}_n$) can be used to approximate the distribution of $\frac{\bar Y - \mu}{S}$.

Construction of a bootstrap-t confidence interval of level $1-\alpha$:

Determine $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\frac{\bar Y^* - \bar Y}{S^*}$ given $\mathcal{Y}_n$:
$$P^*\left(\frac{\bar Y^* - \bar Y}{S^*} \le \hat\tau_{\frac{\alpha}{2}}\right) \approx \frac{\alpha}{2}, \qquad P^*\left(\frac{\bar Y^* - \bar Y}{S^*} > \hat\tau_{\frac{\alpha}{2}}\right) \approx 1 - \frac{\alpha}{2},$$
$$P^*\left(\frac{\bar Y^* - \bar Y}{S^*} \le \hat\tau_{1-\frac{\alpha}{2}}\right) \approx 1 - \frac{\alpha}{2}, \qquad P^*\left(\frac{\bar Y^* - \bar Y}{S^*} > \hat\tau_{1-\frac{\alpha}{2}}\right) \approx \frac{\alpha}{2},$$

In practice:

• Draw $m$ bootstrap samples (e.g. $m = 2000$) and calculate the corresponding statistics
$$Z_1^* := \frac{\bar Y_1^* - \bar Y}{S_1^*}, \quad Z_2^* := \frac{\bar Y_2^* - \bar Y}{S_2^*}, \quad \dots, \quad Z_m^* := \frac{\bar Y_m^* - \bar Y}{S_m^*}.$$

• Order the resulting values $Z_{(1)}^* \le Z_{(2)}^* \le \dots \le Z_{(m)}^*$.
• Set $\hat\tau_{\frac{\alpha}{2}} := Z^*_{([m+1]\frac{\alpha}{2})}$ and $\hat\tau_{1-\frac{\alpha}{2}} := Z^*_{([m+1][1-\frac{\alpha}{2}])}$.

Consistency of the bootstrap implies that asymptotically also
$$P\left(\frac{\bar Y - \mu}{S} \le \hat\tau_{\frac{\alpha}{2}}\right) \approx \frac{\alpha}{2}, \qquad P\left(\frac{\bar Y - \mu}{S} > \hat\tau_{\frac{\alpha}{2}}\right) \approx 1 - \frac{\alpha}{2},$$
$$P\left(\frac{\bar Y - \mu}{S} \le \hat\tau_{1-\frac{\alpha}{2}}\right) \approx 1 - \frac{\alpha}{2}, \qquad P\left(\frac{\bar Y - \mu}{S} > \hat\tau_{1-\frac{\alpha}{2}}\right) \approx \frac{\alpha}{2},$$

This yields the $1-\alpha$ confidence interval
$$[\bar Y - \hat\tau_{1-\frac{\alpha}{2}} S,\; \bar Y - \hat\tau_{\frac{\alpha}{2}} S]$$
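A minimal Python sketch of this studentized construction (illustrative only). It follows the un-normalized studentization $Z^* = (\bar Y^* - \bar Y)/S^*$ used in the notes; degenerate resamples with $S^* = 0$ are simply skipped, which is my own guard, not part of the notes:

```python
import random
import statistics

def bootstrap_t_ci_mean(y, alpha=0.05, m=2000, seed=0):
    """Bootstrap-t interval [ybar - tau_{1-a/2}*S, ybar - tau_{a/2}*S]."""
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    s = statistics.stdev(y)                 # S, with the 1/(n-1) normalization
    z = []
    for _ in range(m):
        res = rng.choices(y, k=n)
        s_star = statistics.stdev(res)
        if s_star == 0:                     # degenerate resample: skip it
            continue
        z.append((statistics.fmean(res) - ybar) / s_star)   # Z* = (ybar*-ybar)/S*
    z.sort()
    k = len(z)
    tau_lo = z[max(int((k + 1) * alpha / 2) - 1, 0)]
    tau_hi = z[min(int((k + 1) * (1 - alpha / 2)) - 1, k - 1)]
    return ybar - tau_hi * s, ybar - tau_lo * s
```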
General construction of a bootstrap-t interval (unknown real-valued parameter $\theta \in \mathbb{R}$):

Random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; unknown parameter (vector) $\theta$. Assume that the estimator $\hat\theta$ of $\theta$ is asymptotically normal,
$$\sqrt{n}(\hat\theta - \theta) \xrightarrow{L} N(0, v^2) \quad \Leftrightarrow \quad \frac{\sqrt{n}(\hat\theta - \theta)}{v} \xrightarrow{L} N(0, 1)$$
and that a consistent estimator $\hat v \equiv \hat v(Y_1, \dots, Y_n)$ of $v$ is available. One might then replace $v$ by $\hat v$ to obtain
$$\frac{\sqrt{n}(\hat\theta - \theta)}{\hat v} \xrightarrow{L} N(0, 1)$$
Obviously, $\frac{\sqrt{n}(\hat\theta - \theta)}{v}$ and $\frac{\sqrt{n}(\hat\theta - \theta)}{\hat v}$ are asymptotic pivot statistics.

• Based on an i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $\{Y_1, \dots, Y_n\}$, calculate bootstrap estimates $\hat\theta^*$ and $\hat v^*$.

• Determine $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\frac{\hat\theta^* - \hat\theta}{\hat v^*}$ given $\mathcal{Y}_n$.

• Bootstrap-t interval:
$$[\hat\theta - \hat\tau_{1-\frac{\alpha}{2}} \hat v,\; \hat\theta - \hat\tau_{\frac{\alpha}{2}} \hat v]$$
1.3 The parametric bootstrap

A further increase of accuracy can be obtained in applications where the distribution of $Y$ is known up to some parameter vectors $\theta$, $\eta$ (e.g.: $Y$ is normal with mean $\mu$ and variance $\sigma^2$; $Y$ follows an exponential distribution with parameter $\theta$). The difference to the nonparametric bootstrap discussed above consists in the way how to generate a bootstrap re-sample $Y_1^*, \dots, Y_n^*$.

• Given estimates $\hat\theta$, $\hat\eta$, a re-sample $Y_1^*, \dots, Y_n^*$ is generated by randomly drawing observations from a $F(\cdot, \hat\theta, \hat\eta)$ distribution (using a random number generator).

• The conditional distribution of $\hat\theta^*$ given $F(\cdot, \hat\theta, \hat\eta)$ is used to approximate the distribution of the estimator $\hat\theta$.

In almost all cases of practical interest, confidence intervals based on the parametric bootstrap are more accurate than standard intervals based on first order asymptotic approximations. The parametric bootstrap usually also provides more accurate approximations than its nonparametric counterpart discussed above. Of course, this requires that the underlying distributional assumption is satisfied (otherwise, the parametric bootstrap will lead to incorrect results).
Basic parametric bootstrap confidence interval:
$$[2\hat\theta - \hat t_{1-\frac{\alpha}{2}},\; 2\hat\theta - \hat t_{\frac{\alpha}{2}}],$$
where $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ now denote the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\hat\theta^*$ given $F(\cdot, \hat\theta, \hat\eta)$.

Bootstrap-t intervals:

Assume that the standard error $v(\theta, \eta)$ of $\sqrt{n}(\hat\theta - \theta)$ can be determined in dependence of the parameter (vectors) $\theta$, $\eta$.

• i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ generated by randomly drawing observations from a $F(\cdot, \hat\theta, \hat\eta)$ distribution

• Parameter estimates $\hat\theta^*$, $\hat\eta^*$ → Bootstrap-t interval
$$[\hat\theta - \hat\tau_{1-\frac{\alpha}{2}}\, v(\hat\theta, \hat\eta),\; \hat\theta - \hat\tau_{\frac{\alpha}{2}}\, v(\hat\theta, \hat\eta)],$$
where $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ now denote the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\frac{\hat\theta^* - \hat\theta}{v(\hat\theta^*, \hat\eta^*)}$ given $F(\cdot, \hat\theta, \hat\eta)$.

Note: Sometimes the following modification leads to even more accurate intervals:

• Determine the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\frac{\hat\theta^* - \hat\theta}{v(\hat\theta^*, \hat\eta^*)}$ given $F(\cdot, \hat\theta, \hat\eta)$.

• Asymptotically we obtain
$$P\left(\hat\tau_{\frac{\alpha}{2}} \le \frac{\hat\theta - \theta}{v(\theta, \eta)} \le \hat\tau_{1-\frac{\alpha}{2}}\right) \approx 1 - \alpha$$

• $1-\alpha$ confidence interval: Set of all $\theta$ with
$$\hat\tau_{\frac{\alpha}{2}} \le \frac{\hat\theta - \theta}{v(\theta, \eta)} \le \hat\tau_{1-\frac{\alpha}{2}}$$
Example: Exponential distribution

Assume that $Y$ follows an exponential distribution with parameter $\theta$. Density and distribution function are then given by
$$f(y, \theta) = \frac{1}{\theta} e^{-y/\theta}, \qquad F(y, \theta) = 1 - e^{-y/\theta}$$

We have $E(Y_i) = \theta$ and $Var(Y_i) = \theta^2$. The maximum likelihood estimator of $\theta$ is given by $\hat\theta = \frac{1}{n} \sum_{i=1}^n Y_i$, and $Var(\hat\theta) = \frac{\theta^2}{n}$.

The parametric bootstrap can then be used to construct confidence intervals. The following procedure is straightforward, but there also exist alternative approaches.

• An i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ is generated by randomly drawing observations from an exponential distribution with parameter $\hat\theta$.

• $Y_1^*, \dots, Y_n^*$ → Estimator $\hat\theta^*$

• Calculation of $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ with
$$P^*\left(\frac{\hat\theta^*}{\hat\theta} \le \hat\tau_{\frac{\alpha}{2}}\right) = \frac{\alpha}{2}, \qquad P^*\left(\frac{\hat\theta^*}{\hat\theta} \le \hat\tau_{1-\frac{\alpha}{2}}\right) = 1 - \frac{\alpha}{2},$$
where $P^*$ denotes the conditional probability given $\hat\theta$.

• Confidence interval: $\left[\frac{\hat\theta}{\hat\tau_{1-\frac{\alpha}{2}}},\; \frac{\hat\theta}{\hat\tau_{\frac{\alpha}{2}}}\right]$

It can be shown that for any finite sample of size $n$ the coverage probability of this interval is exactly equal to $1 - \alpha$.
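The exponential example can be sketched in Python. This is an illustration under the assumptions above: resamples come from an exponential distribution with mean $\hat\theta$, and the ratio $\hat\theta^*/\hat\theta$ plays the role of the pivot; names and defaults are my own choices.

```python
import random
import statistics

def parametric_bootstrap_exp_ci(y, alpha=0.05, m=2000, seed=0):
    """CI [theta_hat / tau_{1-a/2}, theta_hat / tau_{a/2}] for an Exp(theta) mean."""
    rng = random.Random(seed)
    n = len(y)
    theta_hat = statistics.fmean(y)        # ML estimator: the sample mean
    ratios = []
    for _ in range(m):
        # expovariate takes a rate; rate 1/theta_hat gives mean theta_hat
        res = [rng.expovariate(1.0 / theta_hat) for _ in range(n)]
        ratios.append(statistics.fmean(res) / theta_hat)   # theta* / theta_hat
    ratios.sort()
    tau_lo = ratios[max(int((m + 1) * alpha / 2) - 1, 0)]
    tau_hi = ratios[min(int((m + 1) * (1 - alpha / 2)) - 1, m - 1)]
    return theta_hat / tau_hi, theta_hat / tau_lo
```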
1.4 More on Bootstrap Confidence Intervals

Setup: i.i.d. random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; unknown parameter (vector) $\theta$

We will assume that the bootstrap is consistent: $distr(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n) \approx distr(\hat\theta - \theta)$ if $n$ is sufficiently large.

In the previous sections we have already defined basic bootstrap confidence intervals as well as bootstrap-t intervals.

1.4.1 Basic confidence interval
$$[2\hat\theta - \hat t_{1-\frac{\alpha}{2}},\; 2\hat\theta - \hat t_{\frac{\alpha}{2}}],$$
where $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ are the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$.

1.4.2 Bootstrap-t Intervals
$$[\hat\theta - \hat\tau_{1-\frac{\alpha}{2}} \hat v,\; \hat\theta - \hat\tau_{\frac{\alpha}{2}} \hat v],$$
where $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ are the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\frac{\hat\theta^* - \hat\theta}{\hat v^*}$ given $\mathcal{Y}_n$.

1.4.3 Percentile Intervals

The classical percentile confidence interval is given by
$$[\hat t_{\frac{\alpha}{2}},\; \hat t_{1-\frac{\alpha}{2}}]$$
Generally, this interval does not work extremely well in practice.
The so-called $BC_a$ method allows to construct better confidence intervals. The term $BC_a$ stands for bias-corrected and accelerated. The $BC_a$ interval of intended coverage $1-\alpha$ is given by
$$[\hat t_{\alpha_1},\; \hat t_{\alpha_2}],$$
where $\hat t_{\alpha_1}$ and $\hat t_{\alpha_2}$ are the $\alpha_1$ and $\alpha_2$ quantiles of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$, and
$$\alpha_1 = \Phi\left(\hat z_0 + \frac{\hat z_0 + z_{\frac{\alpha}{2}}}{1 - \hat a(\hat z_0 + z_{\frac{\alpha}{2}})}\right), \qquad \alpha_2 = \Phi\left(\hat z_0 + \frac{\hat z_0 + z_{1-\frac{\alpha}{2}}}{1 - \hat a(\hat z_0 + z_{1-\frac{\alpha}{2}})}\right),$$
where $\Phi$ is the standard normal distribution function, and where
$$\hat z_0 = \Phi^{-1}\left(P^*[\hat\theta^* < \hat\theta]\right)$$

Calculation of the acceleration $\hat a$ is slightly more complicated. It is based on jackknife values of the estimator $\hat\theta$: For any $i = 1, \dots, n$ calculate the estimate $\hat\theta_{(i)}$ from the sample $Y_1, \dots, Y_{i-1}, Y_{i+1}, \dots, Y_n$ with the $i$-th observation deleted. Let $\hat\theta_{(\cdot)} = \frac{1}{n} \sum_{i=1}^n \hat\theta_{(i)}$ and determine
$$\hat a = \frac{\sum_{i=1}^n (\hat\theta_{(\cdot)} - \hat\theta_{(i)})^3}{6\left[\sum_{i=1}^n (\hat\theta_{(\cdot)} - \hat\theta_{(i)})^2\right]^{3/2}}$$

The $BC_a$ interval is motivated by theoretical results which show that it is second order accurate.
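The two correction terms $\hat z_0$ and $\hat a$ can be computed as sketched below (a minimal illustration; the clipping of the proportion away from 0 and 1 is my own guard to keep $\Phi^{-1}$ finite, not part of the notes):

```python
import statistics
from statistics import NormalDist

def bca_z0_a(theta_stars, theta_hat, jackknife_thetas):
    """Bias-correction z0 and acceleration a for the BCa interval.

    theta_stars: bootstrap replications of the estimator.
    jackknife_thetas: leave-one-out estimates theta_(i).
    """
    nd = NormalDist()
    # z0 = Phi^{-1}( proportion of bootstrap values below theta_hat )
    prop = sum(1 for t in theta_stars if t < theta_hat) / len(theta_stars)
    z0 = nd.inv_cdf(min(max(prop, 1e-6), 1 - 1e-6))   # clip to keep inv_cdf finite
    # acceleration from the jackknife values
    tbar = statistics.fmean(jackknife_thetas)
    num = sum((tbar - t) ** 3 for t in jackknife_thetas)
    den = 6 * sum((tbar - t) ** 2 for t in jackknife_thetas) ** 1.5
    a = num / den if den else 0.0
    return z0, a
```

With `z0` and `a` in hand, the adjusted levels $\alpha_1, \alpha_2$ follow from the displayed formulas via `NormalDist().cdf(...)`.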
Consider generally $1-\alpha$ confidence intervals of the form $[\hat t_{low}, \hat t_{up}]$ for $\theta$. Upper and lower bounds of such intervals are determined from the data, $\hat t_{low} \equiv \hat t_{low}(Y_1, \dots, Y_n)$, $\hat t_{up} \equiv \hat t_{up}(Y_1, \dots, Y_n)$, and their accuracy depends on the particular procedure applied.

• (Symmetric) confidence intervals are said to be first-order accurate if there exist some constants $d_1, d_2 < \infty$ such that for sufficiently large $n$
$$\left|P(\theta < \hat t_{low}) - \frac{\alpha}{2}\right| \le \frac{d_1}{\sqrt{n}}, \qquad \left|P(\theta > \hat t_{up}) - \frac{\alpha}{2}\right| \le \frac{d_2}{\sqrt{n}}.$$

• (Symmetric) confidence intervals are said to be second-order accurate if there exist some constants $d_3, d_4 < \infty$ such that for sufficiently large $n$
$$\left|P(\theta < \hat t_{low}) - \frac{\alpha}{2}\right| \le \frac{d_3}{n}, \qquad \left|P(\theta > \hat t_{up}) - \frac{\alpha}{2}\right| \le \frac{d_4}{n}.$$

If the distribution of $\hat\theta$ is asymptotically normal, then under some additional regularity conditions it can usually be shown that

• Standard confidence intervals based on asymptotic approximations are first-order accurate. The same holds for the basic bootstrap intervals $[2\hat\theta - \hat t_{1-\frac{\alpha}{2}}, 2\hat\theta - \hat t_{\frac{\alpha}{2}}]$ as well as for the classical percentile method.

• Bootstrap-t intervals as well as $BC_a$ intervals are second-order accurate.

The difference between first and second-order accuracy is not just a theoretical nicety. In many practically important situations second-order accurate intervals lead to much better approximations.

Another approach for constructing confidence intervals is the ABC method: ABC, standing for approximate bootstrap confidence intervals, allows to approximate the $BC_a$ interval endpoints analytically, without using any Monte Carlo replications at all (reduced computational costs). The procedure works by approximating the bootstrap sampling results by Taylor expansions. It is then, however, required that $\hat\theta = \hat\theta(Y_1, \dots, Y_n)$ is a smooth function of $Y_1, \dots, Y_n$. This is for example not true for the sample median.
1.5 Subsampling: Inference for a sample maximum

Data: i.i.d. random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

We now consider the situation that the $Y_i$ only take values in a compact interval $[0, \theta]$ such that
$$P(Y_i \in [0, \theta]) = 1.$$
Furthermore, $Y_i$ possesses a density $f$ which is continuous on $[0, \theta]$ and satisfies $f(y) > 0$ for $y \in (0, \theta]$, and $f(y) = 0$ for $y \notin [0, \theta]$. The maximum $\theta$ of $Y_i$ is unknown and has to be estimated from the data.

Similar types of extreme value problems frequently arise in econometrics. An example is the analysis of production efficiencies of different firms. The above situation may arise if we consider production outputs $Y_i$ of a sample of firms with identical inputs. A firm then is efficient if its output equals the maximal possible value $\theta$. Note that in practice usually more complicated problems have to be considered, where production outputs depend on individually different values of input variables → Frontier Analysis.

Consistent estimator $\hat\theta$ of $\theta$:
$$\hat\theta := \max_{i=1,\dots,n} Y_i$$

Constructing a confidence interval for $\theta$ is not an easy task. The distribution of $\hat\theta$ is not asymptotically normal. Indeed, it can be shown that $n(\theta - \hat\theta)$ follows asymptotically an exponential distribution with parameter $\lambda = \frac{1}{f(\theta)}$:
$$n(\theta - \hat\theta) \xrightarrow{L} Exp\left(\frac{1}{f(\theta)}\right)$$
The naive bootstrap fails:

i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $\{Y_1, \dots, Y_n\}$ → bootstrap estimator
$$\hat\theta^* := \max_{i=1,\dots,n} Y_i^*$$

Unfortunately, the bootstrap is not consistent. The reason is as follows: $\hat\theta = Y_{(n)}$, and hence $\hat\theta^* = \hat\theta = Y_{(n)}$ whenever $Y_{(n)} \in \{Y_1^*, \dots, Y_n^*\}$. Some calculations then show that for large $n$
$$P^*(\hat\theta - \hat\theta^* = 0) = P(\hat\theta - \hat\theta^* = 0 \mid \mathcal{Y}_n) \approx 1 - e^{-1},$$
while $P(\theta - \hat\theta = 0) = 0$!

One can conclude that even for large sample sizes $distr(\hat\theta - \hat\theta^* \mid \mathcal{Y}_n)$ will be very different from $distr(\theta - \hat\theta)$ → basic bootstrap confidence intervals are incorrect.

A possible remedy is to use subsampling. Similar to the ordinary bootstrap, subsampling relies on i.i.d. re-sampling from $\mathcal{Y}_n$, and the only difference consists in the fact that subsampling is based on drawing a smaller number $k < n$ of observations.
Subsampling bootstrap:

• Choose some $k < n$

• Determine an i.i.d. re-sample $Y_1^*, \dots, Y_k^*$ by drawing randomly $k$ observations from $\{Y_1, \dots, Y_n\}$ → bootstrap estimator
$$\hat\theta_k^* := \max_{i=1,\dots,k} Y_i^*$$

For the above problem subsampling is consistent.

• If $k = k_n \to \infty$ with $k/n \to 0$, the law of $(k(\hat\theta - \hat\theta_k^*) \mid \mathcal{Y}_n)$ converges stochastically to an $Exp(\frac{1}{f(\theta)})$-distribution.

More precisely, as $n \to \infty$, $k = k_n \to \infty$, $k/n \to 0$,
$$\sup_{\tau} \left| P\left(k(\hat\theta - \hat\theta_k^*) \le \tau \,\middle|\, \mathcal{Y}_n\right) - F\left(\tau; \tfrac{1}{f(\theta)}\right) \right| \xrightarrow{P} 0,$$
where $F(\cdot; \frac{1}{f(\theta)})$ denotes the distribution function of an exponential distribution with parameter $\lambda = \frac{1}{f(\theta)}$.

• Asymptotically: $distr(k(\hat\theta - \hat\theta_k^*) \mid \mathcal{Y}_n) \approx distr(n(\theta - \hat\theta))$.

The subsampling bootstrap works under extremely general conditions, and it can often be applied in situations where the ordinary bootstrap fails. However, it usually does not make any sense to apply subsampling in regular cases, where the standard nonparametric bootstrap is consistent. Then subsampling is less efficient, and confidence intervals based on subsampling are less accurate. In practice, a major problem is the choice of $k$.
Confidence interval based on subsampling:

• Calculation of $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ with
$$P^*\left(k(\hat\theta - \hat\theta_k^*) \le \hat t_{\frac{\alpha}{2}}\right) = \frac{\alpha}{2}, \qquad P^*\left(k(\hat\theta - \hat\theta_k^*) \le \hat t_{1-\frac{\alpha}{2}}\right) = 1 - \frac{\alpha}{2},$$
where $P^*$ denotes the conditional probability given $\mathcal{Y}_n$.

• This yields
$$P^*\left(\hat t_{\frac{\alpha}{2}} \le k(\hat\theta - \hat\theta_k^*) \le \hat t_{1-\frac{\alpha}{2}}\right) \approx 1 - \alpha,$$
and consistency of the bootstrap implies
$$P\left(\hat t_{\frac{\alpha}{2}} \le n(\theta - \hat\theta) \le \hat t_{1-\frac{\alpha}{2}}\right) \approx 1 - \alpha.$$

• Confidence interval for $\theta$:
$$\left[\hat\theta + \frac{1}{n}\,\hat t_{\frac{\alpha}{2}},\; \hat\theta + \frac{1}{n}\,\hat t_{1-\frac{\alpha}{2}}\right]$$
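The subsampling interval for the endpoint can be sketched in Python. This is a minimal illustration under the assumptions of the notes; the choice of $k$ is left to the caller, and the re-sample is drawn i.i.d. as in the text:

```python
import random

def subsample_max_ci(y, k, alpha=0.05, m=2000, seed=0):
    """Subsampling CI for the endpoint theta based on the sample maximum.

    Quantiles of k*(theta_hat - theta_hat*_k) are rescaled by 1/n,
    following the construction in the notes.
    """
    rng = random.Random(seed)
    n = len(y)
    theta_hat = max(y)
    # i.i.d. re-samples of the smaller size k (k < n)
    devs = sorted(k * (theta_hat - max(rng.choices(y, k=k))) for _ in range(m))
    t_lo = devs[max(int((m + 1) * alpha / 2) - 1, 0)]
    t_hi = devs[min(int((m + 1) * (1 - alpha / 2)) - 1, m - 1)]
    return theta_hat + t_lo / n, theta_hat + t_hi / n
```

Since the deviations $k(\hat\theta - \hat\theta_k^*)$ are nonnegative, the whole interval lies to the right of $\hat\theta$, as it should for an upper endpoint.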
1.6 Appendix

1.6.1 The empirical distribution function

Data: i.i.d. sample $X_1, \dots, X_n$; ordered sample $X_{(1)} \le \dots \le X_{(n)}$. The distribution of $X_i$ possesses a distribution function $F$ defined by
$$F(x) = P(X_i \le x)$$

Let $H_n(x)$ denote the number of observations $X_i$ satisfying $X_i \le x$. The empirical distribution function is then defined by
$$F_n(x) = H_n(x)/n = \text{Proportion of observations } X_i \text{ with } X_i \le x$$

Properties:
• $0 \le F_n(x) \le 1$
• $F_n(x) = 0$ if $x < X_{(1)}$; $F_n(x) = 1$ if $x \ge X_{(n)}$
• $F_n$ is a monotonically increasing step function
Example:

x_1 | x_2 | x_3 | x_4 | x_5 | x_6 | x_7 | x_8
5.20 | 4.80 | 5.40 | 4.60 | 6.10 | 5.40 | 5.80 | 5.50

Empirical distribution function:

[Figure: step plot of the empirical distribution function of the eight observations; x-axis from 4.0 to 6.5, y-axis from 0.0 to 1.0]
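The step function in the figure can be computed directly. A small Python sketch of $F_n$ (illustrative; it returns the ECDF as a closure over the sorted data):

```python
import bisect

def ecdf(sample):
    """Return the empirical distribution function F_n of the sample."""
    data = sorted(sample)
    n = len(data)

    def F_n(x):
        # F_n(x) = proportion of observations <= x, via binary search
        return bisect.bisect_right(data, x) / n

    return F_n
```

Evaluating it on the example data reproduces the plotted steps, e.g. $F_8(5.40) = 5/8 = 0.625$.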
Theoretical properties of $F_n$

Theorem: For every $x \in \mathbb{R}$ we obtain
$$n F_n(x) \sim B(n, F(x)),$$
i.e. $n F_n(x)$ follows a binomial distribution with parameters $n$ and $F(x)$. The probability distribution of $F_n(x)$ is thus given by
$$P\left(F_n(x) = \frac{m}{n}\right) = \binom{n}{m} F(x)^m (1 - F(x))^{n-m}, \qquad m = 0, 1, \dots, n$$

Consequences:
• $E(F_n(x)) = F(x)$, i.e. $F_n(x)$ is an unbiased estimator of $F(x)$
• $Var(F_n(x)) = \frac{1}{n} F(x)(1 - F(x))$ → the standard error of $F_n(x)$ decreases as $n$ increases ($F_n(x)$ is a consistent estimator of $F(x)$).

Theorem of Glivenko-Cantelli:
$$P\left(\lim_{n \to \infty} \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| = 0\right) = 1$$
1.6.2 Consistency of estimators

Any reasonable estimator $\hat\theta$ of a parameter $\theta$ must be consistent. Intuitively this means that the distribution of $\hat\theta \equiv \hat\theta_n$ must become more and more concentrated around the true value $\theta$ as $n \to \infty$. The mathematical formalization of consistency relies on general concepts quantifying convergence of random variables.

Convergence in probability:

Let $X_1, X_2, \dots$ and $X$ be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. $X_n$ converges in probability to $X$ if
$$\lim_{n \to \infty} P[|X_n - X| < \epsilon] = 1$$
for every $\epsilon > 0$. One often uses the notation $X_n \xrightarrow{P} X$.

Weak consistency:

An estimator $\hat\theta$ is called weakly consistent if $\hat\theta_n \xrightarrow{P} \theta$ as $n \to \infty$.
Stronger notions are almost sure convergence, $X_n \xrightarrow{a.s.} X$, and convergence in mean square, $X_n \xrightarrow{MSE} X$ (i.e. $E((X_n - X)^2) \to 0$). Both imply convergence in probability:
• $X_n \xrightarrow{MSE} X$ implies $X_n \xrightarrow{P} X$
• $X_n \xrightarrow{a.s.} X$ implies $X_n \xrightarrow{P} X$

Application: Law of large numbers

We obtain $E(\bar X) = \mu$ as well as $Var(\bar X) = \frac{\sigma^2}{n}$
$$MSE(\bar X) := E((\bar X - \mu)^2) = Var(\bar X) = \frac{\sigma^2}{n} \xrightarrow{n \to \infty} 0 \quad \Rightarrow \quad \bar X \xrightarrow{P} \mu \text{ as } n \to \infty$$

Example: Consider a normally distributed random variable $X \sim N(\mu, 0.18^2)$ with unknown mean but known standard deviation $\sigma = 0.18$.

Random sample $X_1, \dots, X_n$ → Estimator $\bar X$ of $\mu$.

Recall: $\bar X \sim N(\mu, \frac{\sigma^2}{n}) = N(\mu, \frac{0.18^2}{n})$.

$n = 9$: standard error $= 0.06$, $MSE(\bar X) = 0.0036$
$n = 144$: standard error $= 0.015$, $MSE(\bar X) = 0.000225$
$n = 9$: $P[\mu - 0.1176 \le \bar X \le \mu + 0.1176] = 0.95$
$n = 144$: $P[\mu - 0.0294 \le \bar X \le \mu + 0.0294] = 0.95$

[Figure: densities of $\bar X$ for $n = 9$ and $n = 144$; the two shaded tails of each density carry probability 0.025]
1.6.3 Convergence in distribution

Let $Z_1, Z_2, \dots$ be a sequence of random variables with distribution functions $F_1, F_2, \dots$, and let $Z$ be a random variable with distribution function $F$. $Z_n$ converges in distribution to $Z$ if
$$\lim_{n \to \infty} F_n(t) = F(t) \quad \text{at every continuity point } t \text{ of } F$$
Notation: $Z_n \xrightarrow{L} Z$

The central limit theorem

Theorem (Ljapunov): Let $X_1, X_2, \dots$ be a sequence of independent random variables with means $E(X_i) = \mu_i$ and variances $Var(X_i) = E((X_i - \mu_i)^2) = \sigma_i^2 > 0$. Furthermore assume that $E(|X_i - \mu_i|^3) = \gamma_i < \infty$. If
$$\frac{\left(\sum_{i=1}^n \gamma_i\right)^{1/3}}{\left(\sum_{i=1}^n \sigma_i^2\right)^{1/2}} \to 0 \quad \text{as } n \to \infty$$
then
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{\left(\sum_{i=1}^n \sigma_i^2\right)^{1/2}} \xrightarrow{L} N(0, 1)$$

Sometimes the notation $Z_n \sim AN(0, 1)$ is used instead of $Z_n \xrightarrow{L} N(0, 1)$.

Important information about the speed of convergence to a normal distribution is given by the Berry-Esséen theorem:
Theorem (Berry-Esséen): Let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables with mean $E(X_i) = \mu$ and variance $Var(X_i) = E((X_i - \mu)^2) = \sigma^2 > 0$. Then, if $G_n$ denotes the distribution function of $\frac{\sqrt{n}(\bar X - \mu)}{\sigma}$,
$$\sup_t |G_n(t) - \Phi(t)| \le \frac{33}{4} \cdot \frac{E(|X_i - \mu|^3)}{\sigma^3 n^{1/2}}$$
1.6.4 Stochastic order symbols (rates of convergence)

In mathematical notation the symbols $O(\cdot)$ and $o(\cdot)$ are often used in order to quantify the speed (rate) of convergence of a sequence of numbers.

Let $Z_1, Z_2, Z_3, \dots$ and $r_1, r_2, r_3, \dots$ be (deterministic) sequences of numbers.

• The notation $Z_n = O(1)$ indicates that the sequence $Z_1, Z_2, \dots$ is bounded. More precisely, there exists an $M < \infty$ such that $|Z_n| \le M$ for all $n \in \mathbb{N}$.
• $Z_n = o(1)$ means that $Z_n \to 0$.
• $Z_n = O(r_n)$ means that $|Z_n|/|r_n| = O(1)$.
• $Z_n = o(r_n)$ means that $|Z_n|/|r_n| \to 0$.

Examples: $\sum_{i=1}^n i = O(n^2)$, $\sum_{i=1}^n i = o(n^3)$

Stochastic order symbols $O_P(\cdot)$ and $o_P(\cdot)$ are used to quantify the speed (rate) of convergence of a sequence of random variables. Let $Z_1, Z_2, Z_3, \dots$ be a sequence of random variables, and let $r_1, r_2, \dots$ be either a deterministic sequence of numbers or a sequence of random variables.
• We will write $Z_n = O_P(1)$ if for every $\epsilon > 0$ there exists an $M_\epsilon < \infty$ and an $n_\epsilon \in \mathbb{N}$ such that
$$P(|Z_n| > M_\epsilon) \le \epsilon \quad \text{for all } n \ge n_\epsilon$$
In other words, $Z_n = O_P(1)$ indicates that the r.v. $Z_n$ are stochastically bounded.
• We will write $Z_n = o_P(1)$ if and only if $Z_n \xrightarrow{P} 0$.
• $Z_n = O_P(V_n)$ means $|Z_n|/|V_n| = O_P(1)$.
• $Z_n = o_P(V_n)$ means that $|Z_n|/|V_n| \xrightarrow{P} 0$.

Example: $\bar X - \mu = O_P(n^{-1/2})$
1.6.5 Important inequalities

Inequality of Chebychev:
$$P[|X - \mu| > k\sigma] \le \frac{1}{k^2} \quad \text{for all } k > 0$$
$$\Leftrightarrow \quad P[\mu - k\sigma \le X \le \mu + k\sigma] \ge 1 - \frac{1}{k^2}$$

k | $P[\mu - k\sigma \le X \le \mu + k\sigma]$
2 | $\ge 1 - \frac{1}{4} = 0.75$
3 | $\ge 1 - \frac{1}{9} \approx 0.89$
4 | $\ge 1 - \frac{1}{16} = 0.9375$

Generalization:
$$P[|X - \mu| > k] \le \frac{E(|X - \mu|^r)}{k^r} \quad \text{for all } k > 0, \; r = 1, 2, \dots$$
Cauchy-Schwarz inequality:

Let $x_1, \dots, x_n$ and $y_1, \dots, y_n$ be arbitrary real numbers. Then
$$\left(\sum_{i=1}^n x_i y_i\right)^2 \le \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right)$$
Integrated version:
$$\left(\int_a^b f(x) g(x)\, dx\right)^2 \le \left(\int_a^b f(x)^2\, dx\right)\left(\int_a^b g(x)^2\, dx\right)$$
Application to random variables:
$$(E(XY))^2 \le E(X^2)\, E(Y^2)$$

Hölder inequality:

Let $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. Let $x_i, y_i \ge 0$, $i = 1, \dots, n$ be arbitrary numbers. Then
$$\sum_{i=1}^n x_i y_i \le \left(\sum_{i=1}^n x_i^p\right)^{1/p} \left(\sum_{i=1}^n y_i^q\right)^{1/q}$$
Integrated version ($f(x) \ge 0$, $g(x) \ge 0$):
$$\int_a^b f(x) g(x)\, dx \le \left(\int_a^b f(x)^p\, dx\right)^{1/p} \left(\int_a^b g(x)^q\, dx\right)^{1/q}$$
Application to random variables:
$$E(|X|\,|Y|) \le (E(|X|^p))^{1/p}\, (E(|Y|^q))^{1/q}$$
2 Bootstrap and Regression Models

Problem: Analyze the influence of some explanatory (independent) variables $X_1, X_2, \dots, X_p$ on a response variable (or dependent variable) $Y$.

Observations:
$$(Y_1, X_{11}, \dots, X_{1p}), (Y_2, X_{21}, \dots, X_{2p}), \dots, (Y_n, X_{n1}, \dots, X_{np})$$

Model:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \epsilon_i$$
$$\epsilon_1, \dots, \epsilon_n \text{ i.i.d.}, \quad E(\epsilon_i) = 0, \quad Var(\epsilon_i) = \sigma^2 \quad [\epsilon_i \sim N(0, \sigma^2)]$$

The condition
$$\beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} = m(X_{i1}, \dots, X_{ip}) = E(Y \mid X_1 = X_{i1}, \dots, X_p = X_{ip})$$
is necessarily fulfilled if $(Y_i, X_{i1}, X_{i2}, \dots, X_{ip})^T$ is a multivariate normal random vector.
Remark: Regression analysis is usually a conditional analysis. The goal is to estimate the regression function $m$, which is the conditional expectation of $Y$ given $X_1, \dots, X_p$. Standard inference studies the behavior of estimators conditional on the observed values.

However, different types of bootstrap may be used depending on how the data is generated.

1) Random design:
$$(Y_1, X_{11}, \dots, X_{1p}), (Y_2, X_{21}, \dots, X_{2p}), \dots, (Y_n, X_{n1}, \dots, X_{np})$$
is a sample of i.i.d. random vectors, i.e. observations are independent and identically distributed. Example: $p + 1$ measurements from $n$ individuals randomly drawn from an underlying population.

2) $(X_{j1}, \dots, X_{jp})$, $j = 1, \dots, n$, are random vectors which are, however, not independent or not identically distributed (e.g. time series data, the X-variables are observed in successive time periods).

3) Fixed design: Data are collected at pre-specified, non-random values $X_{jk}$ (corresponding for example to different experimental conditions).
The model can be rewritten in matrix notation:
$$Y = X\beta + \epsilon, \qquad E(\epsilon) = 0, \quad Cov(\epsilon) = \sigma^2 I_n, \quad [\epsilon \sim N_n(0, \sigma^2 I_n)]$$
with
$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1p} \\ 1 & X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{np} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

The parameter vector $\beta = (\beta_0, \dots, \beta_p)^T$ is usually estimated by least squares:

Least squares method: Determine $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_p$ by minimizing
$$Q(\beta_0, \dots, \beta_p) = \sum_{i=1}^n (Y_i - \hat Y_i)^2 = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_{i1} - \dots - \beta_p X_{ip})^2$$

Least squares estimator:
$$\hat\beta = [X^T X]^{-1} X^T Y$$
Properties of the least squares estimator $\hat\beta$:

1. $\hat\beta$ is an unbiased estimator of $\beta$:
$$E(\hat\beta) = \begin{pmatrix} E(\hat\beta_0) \\ \vdots \\ E(\hat\beta_p) \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix} = \beta$$

2. Covariance matrix:
$$Cov(\hat\beta) = Cov([X^T X]^{-1} X^T Y) = [X^T X]^{-1} X^T Cov(Y) X [X^T X]^{-1} = \sigma^2 [X^T X]^{-1} X^T X [X^T X]^{-1} = \sigma^2 [X^T X]^{-1}$$

3. Distribution under normality: If $\epsilon_i \sim N(0, \sigma^2)$ then $\epsilon \sim N_n(0, \sigma^2 I_n)$, and consequently
$$\hat\beta \sim N_{p+1}\left(\beta,\; \sigma^2 [X^T X]^{-1}\right)$$

4. Asymptotic distribution: Assume that $\frac{1}{n}\sum_i X_{ij} X_{ik} \to c_{jk}$ as well as $\frac{1}{n}\sum_i X_{ij} \to c_{0j}$ as $n \to \infty$. Note that $c_{jk} = E(X_j X_k)$ and $c_{0j} = E(X_j)$ in the case of random design. Furthermore, let $C$ denote the $(p+1) \times (p+1)$ matrix with elements $c_{jk}$, $j, k = 0, \dots, p$, $c_{00} = 1$, $c_{j0} = c_{0j}$, and assume that $C$ is of full rank. Then
$$\sqrt{n}(\hat\beta - \beta) \to N_{p+1}\left(0,\; \sigma^2 C^{-1}\right)$$
Estimation of $\sigma^2$:

The residuals $\hat\epsilon_i = Y_i - \hat Y_i = Y_i - \hat\beta_0 - \sum_{j=1}^p \hat\beta_j X_{ij}$ estimate the error terms $\epsilon_i$.

Estimator $\hat\sigma^2$ of $\sigma^2$:
$$\hat\sigma^2 = \frac{1}{n - p - 1} \sum_{i=1}^n (Y_i - \hat Y_i)^2$$

$\hat\sigma^2$ is an unbiased estimator of $\sigma^2$. If the true error terms $\epsilon_i$ are normally distributed, then
$$(n - p - 1)\,\frac{\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p-1}$$
Let $s_{jk}$, $j, k = 1, \dots, p + 1$ denote the elements of the matrix $S = [X^T X]^{-1}$. Then, for normal errors,
$$\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{s_{jj}}} \sim t_{n-p-1}$$
$\Rightarrow$ standard confidence intervals and tests for the parameter estimates.

Note: Under the normality assumption, $\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{s_{jj}}}$ is a pivot statistic. In the general case (under some weak regularity conditions), this quantity is an asymptotic pivot statistic: $n\, s_{jj}$ converges to the $j$-th diagonal element of the matrix $C^{-1}$, and therefore
$$\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{s_{jj}}} \;\to_L\; N(0, 1) \quad \text{as } n \to \infty$$
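These $t$-based intervals can be sketched in a few lines (simulated data; scipy is assumed to be available for the $t$-quantile):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, alpha = 100, 1, 0.05
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p - 1)      # unbiased estimator of sigma^2
S = np.linalg.inv(X.T @ X)                    # matrix with elements s_jk

# standard intervals: beta_hat_j +/- t-quantile * sigma_hat * sqrt(s_jj)
tq = stats.t.ppf(1 - alpha / 2, df=n - p - 1)
se = np.sqrt(sigma2_hat * np.diag(S))
ci = np.column_stack([beta_hat - tq * se, beta_hat + tq * se])
```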
2.1 Bootstrapping Pairs

The usual nonparametric bootstrap is applicable if the data are generated by a random design. Let $X_i = (X_{i1}, \dots, X_{ip})$. The construction of bootstrap confidence intervals then proceeds as follows:

Basic bootstrap confidence interval:

- Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$
- Random samples $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{(Y_1, X_1), \dots, (Y_n, X_n)\}$.
- $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ $\Rightarrow$ least squares estimators $\hat\beta_j^*$, $j = 1, \dots, p + 1$.
- Determine $\frac{\alpha}{2}$- and $1 - \frac{\alpha}{2}$-quantiles $\hat t_{\alpha/2,j}$ and $\hat t_{1-\alpha/2,j}$ of the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$:
$$P^*\big(\hat\beta_j^* \le \hat t_{\alpha/2,j}\big) \approx \tfrac{\alpha}{2}, \qquad P^*\big(\hat\beta_j^* > \hat t_{\alpha/2,j}\big) \approx 1 - \tfrac{\alpha}{2},$$
$$P^*\big(\hat\beta_j^* \le \hat t_{1-\alpha/2,j}\big) \approx 1 - \tfrac{\alpha}{2}, \qquad P^*\big(\hat\beta_j^* > \hat t_{1-\alpha/2,j}\big) \approx \tfrac{\alpha}{2}.$$
Here, $P^*$ denotes the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.
- Approximate $1 - \alpha$ (symmetric) confidence interval:
$$\big[\,2\hat\beta_j - \hat t_{1-\alpha/2,j},\; 2\hat\beta_j - \hat t_{\alpha/2,j}\,\big]$$
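The steps above can be sketched for the slope of a simple linear model (the sample size, number of replications $m$ and the heteroscedastic error structure are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, alpha = 150, 1000, 0.05
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# heteroscedastic errors: the pairs bootstrap remains first-order accurate here
Y = X @ np.array([1.0, 2.0]) + (1 + 0.5 * np.abs(X[:, 1])) * rng.normal(size=n)

def ols(Xm, Ym):
    return np.linalg.solve(Xm.T @ Xm, Xm.T @ Ym)

beta_hat = ols(X, Y)
boot = np.empty((m, 2))
for b in range(m):
    idx = rng.integers(0, n, size=n)          # draw pairs (Y_i, X_i) with replacement
    boot[b] = ols(X[idx], Y[idx])

# basic bootstrap interval [2*beta_hat_j - t_{1-alpha/2,j}, 2*beta_hat_j - t_{alpha/2,j}]
q_lo, q_hi = np.quantile(boot[:, 1], [alpha / 2, 1 - alpha / 2])
ci_slope = (2 * beta_hat[1] - q_hi, 2 * beta_hat[1] - q_lo)
```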
Remark: Under some weak regularity conditions the bootstrap is consistent whenever
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \epsilon_i$$
for independent errors $\epsilon_i$ with $E(\epsilon_i) = 0$ and $var(\epsilon_i) = \sigma^2(X_i) < \infty$. In other words, the basic bootstrap confidence interval provides an asymptotically (first order) accurate confidence interval, even if the errors are heteroscedastic (unequal variances)! This is not true for the standard $t$-intervals.
Modification: Bootstrap-t intervals:

- Random samples $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{(Y_1, X_1), \dots, (Y_n, X_n)\}$.
- Use $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ to determine least squares estimators $\hat\beta_j^*$, $j = 1, \dots, p + 1$, as well as an estimator $(\hat\sigma^2)^*$ of the error variance $\sigma^2$.
- With $s_{jj}^*$ denoting the $j$-th diagonal element of the matrix $S^* = [(X^*)^T X^*]^{-1}$, compute
$$\frac{\hat\beta_j^* - \hat\beta_j}{\hat\sigma^* \sqrt{s_{jj}^*}}$$
- Determine $\frac{\alpha}{2}$- and $1 - \frac{\alpha}{2}$-quantiles $\hat t_{\alpha/2,j}^*$ and $\hat t_{1-\alpha/2,j}^*$ of the conditional distribution of $\frac{\hat\beta_j^* - \hat\beta_j}{\hat\sigma^* \sqrt{s_{jj}^*}}$ given $\mathcal{Y}_n$.
- This yields the $1 - \alpha$ confidence interval
$$\Big[\,\hat\beta_j - \hat t_{1-\alpha/2,j}^*\, \hat\sigma \sqrt{s_{jj}},\;\; \hat\beta_j - \hat t_{\alpha/2,j}^*\, \hat\sigma \sqrt{s_{jj}}\,\Big]$$

Different from the basic bootstrap interval, this bootstrap-t interval will be incorrect for heteroscedastic errors.
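The studentized variant differs from the basic interval only in that each resample also produces a variance estimate; a sketch for one coordinate $j$ (simulated data with homoscedastic errors, as this interval requires):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, alpha, j = 120, 1000, 0.05, 1
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([0.0, 1.0]) + rng.normal(size=n)

def fit(Xm, Ym):
    """OLS estimate and the standard error sigma_hat * sqrt(s_jj) for coordinate j."""
    b = np.linalg.solve(Xm.T @ Xm, Xm.T @ Ym)
    r = Ym - Xm @ b
    s2 = r @ r / (len(Ym) - Xm.shape[1])
    return b, np.sqrt(s2 * np.linalg.inv(Xm.T @ Xm)[j, j])

beta_hat, se_hat = fit(X, Y)
t_star = np.empty(m)
for b in range(m):
    idx = rng.integers(0, n, size=n)          # resample pairs
    bb, seb = fit(X[idx], Y[idx])
    t_star[b] = (bb[j] - beta_hat[j]) / seb   # studentized bootstrap statistic

q_lo, q_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
ci = (beta_hat[j] - q_hi * se_hat, beta_hat[j] - q_lo * se_hat)
```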
In order to understand bootstrap behavior for random design let us analyze the simplest case with $p = 1$. Then $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$. Consider the estimator
$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$$
of the slope $\beta_1$.

Random design implies that $(Y_i, X_i)$, and hence $(\epsilon_i, X_i)$, $i = 1, \dots, n$, are independent and identically distributed. Under some regularity conditions (existence of moments) we have
$$\frac{1}{n}\sum_i (X_i - \bar X)^2 \;\to_P\; E\big((X_i - \mu_X)^2\big) = \sigma_X^2,$$
and the central limit theorem implies that
$$\frac{1}{\sqrt n}\sum_i (X_i - \bar X)\epsilon_i \;\to_L\; N(0, v_{\epsilon,X}^2), \qquad \text{where } v_{\epsilon,X}^2 = E\big((X_i - \mu_X)^2 \epsilon_i^2\big).$$

If $\epsilon_i$ and $X_i$ are independent and $\sigma^2 = var(\epsilon_i)$ does not depend on $X_i$, then $v_{\epsilon,X}^2 = \sigma_X^2 \sigma^2$. We then generally obtain for large $n$
$$distr\big(\sqrt n(\hat\beta_1 - \beta_1)\big) \approx distr\left(\frac{\frac{1}{\sqrt n}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}\right) \approx distr\left(\frac{\frac{1}{\sqrt n}\sum_i (X_i - \bar X)\epsilon_i}{\sigma_X^2}\right) \approx N\Big(0, \frac{v_{\epsilon,X}^2}{\sigma_X^4}\Big)$$
Now consider the bootstrap estimator $\hat\beta_1^*$,
$$\hat\beta_1^* = \frac{\sum_i (X_i^* - \bar X^*) Y_i^*}{\sum_i (X_i^* - \bar X^*)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i^* - \bar X^*)\hat\epsilon_i^*}{\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2},$$
where $\hat\epsilon_i^* = Y_i^* - \hat\beta_0 - \hat\beta_1 X_i^*$.

Recall that by definition, $(Y_i^*, X_i^*)$, and hence $(\hat\epsilon_i^*, X_i^*)$, $i = 1, \dots, n$, are independent and identically distributed observations (conditional on $\mathcal{Y}_n$). We obtain $E\big(\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2 \,\big|\, \mathcal{Y}_n\big) = \frac{1}{n}\sum_i (X_i - \bar X)^2 =: \hat\sigma_X^2$, and
$$\Big|\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2 - \hat\sigma_X^2\Big| \;\to_P\; 0$$
as $n \to \infty$. Moreover, $E\big(\frac{1}{\sqrt n}\sum_i (X_i^* - \bar X^*)\hat\epsilon_i^* \,\big|\, \mathcal{Y}_n\big) = 0$ and
$$var\Big(\frac{1}{\sqrt n}\sum_i (X_i^* - \bar X^*)\hat\epsilon_i^* \,\Big|\, \mathcal{Y}_n\Big) = \frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\epsilon_i^2$$

By the central limit theorem we obtain that for large $n$
$$distr\big(\sqrt n(\hat\beta_1^* - \hat\beta_1) \,\big|\, \mathcal{Y}_n\big) \approx N\Big(0, \frac{\frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\epsilon_i^2}{\hat\sigma_X^4}\Big).$$

Since $\frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\epsilon_i^2 \to_P v_{\epsilon,X}^2$ and $\hat\sigma_X \to_P \sigma_X$, we can conclude that asymptotically
$$distr\big(\sqrt n(\hat\beta_1 - \beta_1)\big) \approx distr\big(\sqrt n(\hat\beta_1^* - \hat\beta_1) \,\big|\, \mathcal{Y}_n\big)$$
$\Rightarrow$ Bootstrap consistent.
2.2 Bootstrapping Residuals

Bootstrapping residuals is applicable independent of the particular design of the regression model. The only crucial assumption is that the error terms $\epsilon_i$ are i.i.d. with constant variance $\sigma^2$.

Residuals: $\hat\epsilon_i = Y_i - \hat Y_i = Y_i - \hat\beta_0 - \sum_{j=1}^p \hat\beta_j X_{ij}$

Matrix notation:
$$\hat\epsilon = \begin{pmatrix} \hat\epsilon_1 \\ \vdots \\ \hat\epsilon_n \end{pmatrix} = \big(I - \underbrace{X[X^T X]^{-1} X^T}_{H}\big) Y$$
$$Cov(\hat\epsilon) = \sigma^2 (I - H)$$

With $h_{ii} > 0$ denoting the $i$-th diagonal element of $H$ we thus obtain
$$var(\hat\epsilon_i) = \sigma^2 (1 - h_{ii}) < \sigma^2$$

Standardized residuals:
$$r_i = \frac{\hat\epsilon_i}{\sqrt{1 - h_{ii}}}, \qquad var(r_i) = \sigma^2$$

We have $\sum_i \hat\epsilon_i = 0$. For the standardized residuals it is, however, not guaranteed that $\bar r = \frac{1}{n}\sum_i r_i$ is equal to zero. The residual bootstrap thus relies on resampling centered standardized residuals $\tilde r_i := r_i - \bar r$.
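The hat matrix, standardized residuals and their centering can be checked numerically (simulated data; note that $\sum_i \hat\epsilon_i = 0$ and $\mathrm{tr}(H) = p + 1$ for a design with intercept):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix H = X [X^T X]^{-1} X^T
eps_hat = (np.eye(n) - H) @ Y                # residuals (I - H) Y
h = np.diag(H)                               # leverages h_ii
r = eps_hat / np.sqrt(1 - h)                 # standardized residuals
r_tilde = r - r.mean()                       # centered: resampled by the residual bootstrap
```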
Note: Residual plots play an important role in validating regression models.

a) Nonlinear model: [Figure: residuals plotted against fitted values $\hat y$; title "Lack of model fit"]

b) Heteroscedasticity: [Figure: residuals plotted against fitted values $\hat y_i$; title "Heteroscedasticity"]
Bootstrapping Residuals

- Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$ $\Rightarrow$ estimator $\hat\beta$
- Generate random samples $\epsilon_1^*, \dots, \epsilon_n^*$ of residuals by drawing observations independently and with replacement from $\{\tilde r_1, \dots, \tilde r_n\}$.
- Calculate
$$Y_i^* = \hat\beta_0 + \sum_{j=1}^p \hat\beta_j X_{ij} + \epsilon_i^*, \quad i = 1, \dots, n$$
- Bootstrap estimators $\hat\beta_j^*$ from $(Y_1^*, X_1), \dots, (Y_n^*, X_n)$.

Basic bootstrap confidence intervals:

- Determine $\frac{\alpha}{2}$- and $1 - \frac{\alpha}{2}$-quantiles $\hat t_{\alpha/2,j}$ and $\hat t_{1-\alpha/2,j}$ of the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$:
$$P^*\big(\hat\beta_j^* \le \hat t_{\alpha/2,j}\big) \approx \tfrac{\alpha}{2}, \qquad P^*\big(\hat\beta_j^* > \hat t_{\alpha/2,j}\big) \approx 1 - \tfrac{\alpha}{2},$$
$$P^*\big(\hat\beta_j^* \le \hat t_{1-\alpha/2,j}\big) \approx 1 - \tfrac{\alpha}{2}, \qquad P^*\big(\hat\beta_j^* > \hat t_{1-\alpha/2,j}\big) \approx \tfrac{\alpha}{2}.$$
Here, $P^*$ denotes the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.
- Approximate $1 - \alpha$ (symmetric) confidence interval:
$$\big[\,2\hat\beta_j - \hat t_{1-\alpha/2,j},\; 2\hat\beta_j - \hat t_{\alpha/2,j}\,\big]$$
Bootstrap-t intervals can be determined similarly.
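The steps above can be sketched for a simple linear model (illustrative simulated data; note that the design matrix stays fixed across resamples):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, alpha, j = 100, 1000, 0.05, 1
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
H = X @ np.linalg.solve(X.T @ X, X.T)
r = (Y - X @ beta_hat) / np.sqrt(1 - np.diag(H))    # standardized residuals
r_tilde = r - r.mean()                              # centered

boot = np.empty(m)
for b in range(m):
    eps_star = rng.choice(r_tilde, size=n, replace=True)  # resample residuals
    Y_star = X @ beta_hat + eps_star                      # X is kept fixed
    boot[b] = np.linalg.solve(X.T @ X, X.T @ Y_star)[j]

q_lo, q_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
ci = (2 * beta_hat[j] - q_hi, 2 * beta_hat[j] - q_lo)
```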
In order to understand the residual bootstrap let us again analyze the simplest case with $p = 1$, and recall that
$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$$

Let $\hat\sigma_X^2 := \frac{1}{n}\sum_i (X_i - \bar X)^2$. If the errors $\epsilon_i$ are i.i.d. zero mean random variables with $var(\epsilon_i) = \sigma^2$, then (under some regularity conditions) the central limit theorem implies that conditional on the observed values $X_1, \dots, X_n$
$$distr\big(\sqrt n(\hat\beta_1 - \beta_1)\big) = distr\left(\frac{\frac{1}{\sqrt n}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}\right) \approx N\Big(0, \frac{\sigma^2}{\hat\sigma_X^2}\Big)$$
holds for large $n$.
By definition,
$$\hat\beta_1^* = \frac{\sum_i (X_i - \bar X) Y_i^*}{\sum_i (X_i - \bar X)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i^*}{\frac{1}{n}\sum_i (X_i - \bar X)^2}.$$

We have
$$E(\epsilon_i^* \,|\, \mathcal{Y}_n) = 0, \qquad var(\epsilon_i^* \,|\, \mathcal{Y}_n) = \frac{1}{n}\sum_i \tilde r_i^2 =: \tilde\sigma^2,$$
and therefore
$$var\Big(\frac{1}{\sqrt n}\sum_i (X_i - \bar X)\epsilon_i^* \,\Big|\, \mathcal{Y}_n\Big) = \frac{1}{n}\sum_i (X_i - \bar X)^2 \cdot \tilde\sigma^2$$

The central limit theorem then leads to
$$distr\big(\sqrt n(\hat\beta_1^* - \hat\beta_1) \,\big|\, \mathcal{Y}_n\big) \approx N\Big(0, \frac{\tilde\sigma^2}{\hat\sigma_X^2}\Big).$$

$\Rightarrow$ Bootstrap consistent, since $\tilde\sigma^2 \to_P \sigma^2$ as $n \to \infty$.
2.3 Wild Bootstrap

The residual bootstrap is not consistent if the errors $\epsilon_i$ are heteroscedastic, i.e. $var(\epsilon_i) = \sigma_i^2$. In this case the wild bootstrap offers an alternative.

There are several versions of the wild bootstrap. In its simplest form this procedure works as follows: Conditional on $\mathcal{Y}_n$, a bootstrap sample $\epsilon_1^*, \dots, \epsilon_n^*$ of residuals is determined by generating $n$ independent random variables from the following binary distributions:
$$P\Big(\epsilon_i^* = \hat\epsilon_i \cdot \frac{1 - \sqrt 5}{2}\Big) = \gamma, \qquad P\Big(\epsilon_i^* = \hat\epsilon_i \cdot \frac{1 + \sqrt 5}{2}\Big) = 1 - \gamma,$$
$i = 1, \dots, n$, where $\gamma = \frac{5 + \sqrt 5}{10}$.

The constants are chosen in such a way that
$$E(\epsilon_i^* \,|\, \mathcal{Y}_n) = E^*(\epsilon_i^*) = 0$$
$$var(\epsilon_i^* \,|\, \mathcal{Y}_n) = var^*(\epsilon_i^*) = \hat\epsilon_i^2$$
$$E\big((\epsilon_i^*)^3 \,|\, \mathcal{Y}_n\big) = E^*\big((\epsilon_i^*)^3\big) = \hat\epsilon_i^3$$
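The three moment conditions can be verified directly from the two-point distribution:

```python
import numpy as np

a = (1 - np.sqrt(5)) / 2           # value taken with probability gamma
b = (1 + np.sqrt(5)) / 2           # value taken with probability 1 - gamma
gamma = (5 + np.sqrt(5)) / 10

# eps*_i = eps_hat_i * V with P(V = a) = gamma, P(V = b) = 1 - gamma,
# so the moment conditions reduce to E(V) = 0, E(V^2) = 1, E(V^3) = 1.
mean   = gamma * a    + (1 - gamma) * b
second = gamma * a**2 + (1 - gamma) * b**2
third  = gamma * a**3 + (1 - gamma) * b**3
```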
Implementation of the wild bootstrap:

- Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$ $\Rightarrow$ estimator $\hat\beta$
- Generate $\epsilon_1^*, \dots, \epsilon_n^*$ from the binary distributions
$$P\Big(\epsilon_i^* = \hat\epsilon_i \cdot \frac{1 - \sqrt 5}{2}\Big) = \gamma, \qquad P\Big(\epsilon_i^* = \hat\epsilon_i \cdot \frac{1 + \sqrt 5}{2}\Big) = 1 - \gamma,$$
$i = 1, \dots, n$, where $\gamma = \frac{5 + \sqrt 5}{10}$.
- Calculate
$$Y_i^* = \hat\beta_0 + \sum_{j=1}^p \hat\beta_j X_{ij} + \epsilon_i^*, \quad i = 1, \dots, n$$
- Bootstrap estimators $\hat\beta_j^*$ from $(Y_1^*, X_1), \dots, (Y_n^*, X_n)$.

Basic bootstrap confidence intervals:

- Determine $\frac{\alpha}{2}$- and $1 - \frac{\alpha}{2}$-quantiles $\hat t_{\alpha/2,j}$ and $\hat t_{1-\alpha/2,j}$ of the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.
- Approximate $1 - \alpha$ (symmetric) confidence interval:
$$\big[\,2\hat\beta_j - \hat t_{1-\alpha/2,j},\; 2\hat\beta_j - \hat t_{\alpha/2,j}\,\big]$$

Bootstrap-t intervals can be determined similarly.
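The procedure can be sketched for the slope under heteroscedastic errors (the simulated data and the variance function are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, alpha, j = 150, 1000, 0.05, 1
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma_i = 0.5 + np.abs(X[:, 1])                        # heteroscedastic error scale
Y = X @ np.array([1.0, 2.0]) + sigma_i * rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
eps_hat = Y - X @ beta_hat

a, b = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2
gamma = (5 + np.sqrt(5)) / 10

boot = np.empty(m)
for k in range(m):
    V = np.where(rng.random(n) < gamma, a, b)          # two-point multipliers
    Y_star = X @ beta_hat + eps_hat * V                # eps*_i = eps_hat_i * V_i
    boot[k] = np.linalg.solve(X.T @ X, X.T @ Y_star)[j]

q_lo, q_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
ci = (2 * beta_hat[j] - q_hi, 2 * beta_hat[j] - q_lo)
```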
In order to understand the basic intuition let us again analyze the simplest case with $p = 1$, and recall that
$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$$

It is now assumed that the errors $\epsilon_i$ are independent with $var(\epsilon_i) = \sigma_i^2$. Let $\hat\sigma_X^2 := \frac{1}{n}\sum_i (X_i - \bar X)^2$ and $v_{\epsilon,X}^2 = \frac{1}{n}\sum_i (X_i - \bar X)^2 \sigma_i^2$. Under some regularity conditions the central limit theorem implies that conditional on the observed values $X_1, \dots, X_n$
$$distr\big(\sqrt n(\hat\beta_1 - \beta_1)\big) = distr\left(\frac{\frac{1}{\sqrt n}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}\right) \approx N\Big(0, \frac{v_{\epsilon,X}^2}{\hat\sigma_X^4}\Big)$$
holds for large $n$.

As above,
$$\hat\beta_1^* = \frac{\sum_i (X_i - \bar X) Y_i^*}{\sum_i (X_i - \bar X)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i^*}{\frac{1}{n}\sum_i (X_i - \bar X)^2},$$
and by construction
$$var\Big(\frac{1}{\sqrt n}\sum_i (X_i - \bar X)\epsilon_i^* \,\Big|\, \mathcal{Y}_n\Big) = \frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\epsilon_i^2 =: \hat w_{\epsilon,X}^2.$$

For large $n$, the central limit theorem then leads to
$$distr\big(\sqrt n(\hat\beta_1^* - \hat\beta_1) \,\big|\, \mathcal{Y}_n\big) \approx N\Big(0, \frac{\hat w_{\epsilon,X}^2}{\hat\sigma_X^4}\Big).$$

We have $E(\hat\epsilon_i^2) = \sigma_i^2 + O(\frac{1}{n})$, and thus for large $n$
$$E(\hat w_{\epsilon,X}^2) = \frac{1}{n}\sum_i (X_i - \bar X)^2 E(\hat\epsilon_i^2) \approx v_{\epsilon,X}^2$$

Under some regularity conditions the law of large numbers then implies that $|\hat w_{\epsilon,X}^2 - v_{\epsilon,X}^2| \to_P 0$ as $n \to \infty$. $\Rightarrow$ Wild bootstrap consistent.
2.4 Generalizations

The above types of bootstrap (bootstrapping pairs, bootstrapping residuals, wild bootstrap) can also be useful in more complex regression setups. An appropriate method then has to be selected depending on existing knowledge about the underlying design and the structure of the residuals.

1) Nonlinear regression:
$$Y_i = g(X_i, \beta) + \epsilon_i,$$
where $g$ is a nonlinear function of $\beta$.

Example: Depreciation of a car (CV Citroen)

- $X$ = age of the car (in years)
- $Y$ = depreciation = $\dfrac{\text{selling price}}{\text{original price (new car)}}$
[Figure: relative depreciation $Y$ plotted against age $X$ in years (two panels, $X$ from 0 to 10, $Y$ from 0.0 to 1.0); title "Depreciation of a car"]
Model: $Y_i = e^{\beta X_i} + \epsilon_i$

An estimator $\hat\beta$ is determined by (nonlinear) least squares; residuals: $\hat\epsilon_i = Y_i - e^{\hat\beta X_i}$

Bootstrap: Random design $\Rightarrow$ bootstrapping pairs; bootstrapping residuals for homoscedastic errors; wild bootstrap for heteroscedastic errors.
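A nonlinear least squares fit of such a model can be sketched with scipy (the simulated ages, noise level and decay rate are assumptions for the demo, not values from the slides):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(7)
n = 200
X = rng.uniform(0, 10, size=n)                  # age of the car in years
beta_true = -0.3                                # assumed decay rate for the simulation
Y = np.exp(beta_true * X) + rng.normal(scale=0.05, size=n)

def g(x, beta):
    return np.exp(beta * x)                     # nonlinear regression function

(beta_hat,), _ = curve_fit(g, X, Y, p0=[-0.1])  # nonlinear least squares
eps_hat = Y - g(X, beta_hat)                    # residuals, usable for a residual/wild bootstrap
```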
2) Median Regression:

Linear model: $Y_i = \beta_0 + \sum_j \beta_j X_{ij} + \epsilon_i$

In some applications the errors possess heavy tails ($\Rightarrow$ outliers!). In such situations estimation of $\beta$ by least squares may not be appropriate, and statisticians tend to use more robust methods. A sensible procedure then is to determine estimates $\hat\beta$ by minimizing
$$\sum_{i=1}^n \Big| Y_i - \beta_0 - \sum_j \beta_j X_{ij} \Big|$$
over all possible $\beta$. Solutions can be determined by numerical optimization algorithms.

Inference is then usually based on the bootstrap. Random design $\Rightarrow$ bootstrapping pairs; bootstrapping residuals for homoscedastic errors; wild bootstrap for heteroscedastic errors.
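The $L_1$ criterion can be minimized by a generic numerical optimizer; a sketch with heavy-tailed (Cauchy-type) errors where least squares would be unreliable (illustrative simulation):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n = 300
x = rng.normal(size=n)
Y = 1.0 + 2.0 * x + rng.standard_t(df=1, size=n)   # t_1 = Cauchy errors: very heavy tails

def lad(beta):
    # sum_i |Y_i - beta_0 - beta_1 x_i|
    return np.sum(np.abs(Y - beta[0] - beta[1] * x))

res = minimize(lad, x0=np.zeros(2), method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-6, "maxiter": 2000})
beta_hat = res.x
```

Specialized linear-programming formulations of this problem are faster, but a derivative-free optimizer suffices for small $p$.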
3) Nonparametric regression:

Model: $Y_i = m(X_i) + \epsilon_i$

for some unknown function $m$. The function $m$ can be estimated by nonparametric smoothing procedures (kernel estimation; local linear estimation; spline estimation). Inference is often based on the bootstrap.
2.5 Time series

The general idea of the residual bootstrap can be adapted to many different situations. For example, it can also be used in the context of time series models.

Example: AR(1)-process:
$$X_t = \beta X_{t-1} + \epsilon_t, \quad t = 1, \dots, n$$
for i.i.d. zero mean error terms with $var(\epsilon_t) = \sigma^2$. If $|\beta| < 1$ this defines a stationary stochastic process.

Standard estimator of $\beta$:
$$\hat\beta = \frac{\sum_{t=2}^n (X_t - \bar X)(X_{t-1} - \bar X)}{\sum_{t=1}^n (X_t - \bar X)^2}$$

Asymptotic distribution:
$$\sqrt n(\hat\beta - \beta) \;\to_L\; N(0, 1 - \beta^2)$$
Bootstrapping residuals

- Calculate centered residuals:
$$\hat\epsilon_t = X_t - \hat\beta X_{t-1}, \qquad \tilde\epsilon_t = \hat\epsilon_t - \frac{1}{n-1}\sum_s \hat\epsilon_s, \quad t = 2, \dots, n$$
- For some $k > 0$ generate random samples $\epsilon_{-k}^*, \epsilon_{-k+1}^*, \dots, \epsilon_0^*, \epsilon_1^*, \dots, \epsilon_n^*$ of residuals by drawing $n + k + 1$ observations independently and with replacement from $\{\tilde\epsilon_2, \dots, \tilde\epsilon_n\}$.
- Generate a bootstrap time series by $X_{-k}^* = \epsilon_{-k}^*$ and
$$X_t^* = \hat\beta X_{t-1}^* + \epsilon_t^*, \quad t = -k + 1, \dots, n$$
- Determine bootstrap estimators $\hat\beta^*$ from $X_1^*, \dots, X_n^*$.
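The AR(1) residual bootstrap above, in a compact sketch ($n$, $m$, the burn-in length $k$ and the AR coefficient are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)
n, m, k = 300, 500, 50
beta_true = 0.6                                  # assumed AR coefficient, |beta| < 1

# simulate a stationary AR(1) series X_t = beta X_{t-1} + eps_t
X = np.empty(n)
X[0] = rng.normal() / np.sqrt(1 - beta_true**2)  # stationary start
for t in range(1, n):
    X[t] = beta_true * X[t - 1] + rng.normal()

def ar1_hat(x):
    xb = x.mean()
    return np.sum((x[1:] - xb) * (x[:-1] - xb)) / np.sum((x - xb) ** 2)

beta_hat = ar1_hat(X)
eps = X[1:] - beta_hat * X[:-1]                  # residuals, t = 2, ..., n
eps_tilde = eps - eps.mean()                     # centered residuals

boot = np.empty(m)
for b in range(m):
    e_star = rng.choice(eps_tilde, size=n + k + 1, replace=True)
    x_star = np.empty(n + k + 1)
    x_star[0] = e_star[0]                        # X*_{-k} = eps*_{-k}
    for t in range(1, n + k + 1):
        x_star[t] = beta_hat * x_star[t - 1] + e_star[t]
    boot[b] = ar1_hat(x_star[-n:])               # estimate from X*_1, ..., X*_n only
```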
Under the standard assumptions of AR(1) models this bootstrap is consistent.

Basic bootstrap confidence intervals:

- Determine $\frac{\alpha}{2}$- and $1 - \frac{\alpha}{2}$-quantiles $\hat t_{\alpha/2}$ and $\hat t_{1-\alpha/2}$ of the conditional distribution of $\hat\beta^*$.
- Approximate $1 - \alpha$ (symmetric) confidence interval:
$$\big[\,2\hat\beta - \hat t_{1-\alpha/2},\; 2\hat\beta - \hat t_{\alpha/2}\,\big]$$

Bootstrap-t intervals can be determined similarly.