
1 An introduction to the Bootstrap

The bootstrap is an important tool of modern statistical analysis. It establishes a general framework for simulation-based statistical inference. In simple situations the uncertainty of an estimate may be gauged by analytical calculations leading, for example, to the construction of confidence intervals based on an assumed probability model for the available data. The bootstrap replaces complicated and often inaccurate approximations to biases, variances and other measures of uncertainty by computer simulations.

The idea of the bootstrap:

- The random sample $Y_1, \dots, Y_n$ is generated by drawing observations independently and with replacement from the underlying population (with distribution function $F$). For each interval $[a, b]$ the probability of drawing an observation in $[a, b]$ is given by $P(Y \in [a, b]) = F(b) - F(a)$.

- $n$ large: The empirical distribution of the sample values is close to the distribution of $Y$ in the underlying population. The relative frequency $F_n(b) - F_n(a)$ of observations in $[a, b]$ converges to $P(Y \in [a, b]) = F(b) - F(a)$ as $n \to \infty$.

- The idea of the bootstrap consists in mimicking the data generating process: random sampling from the true population is replaced by random sampling from the observed data. This is justified by the insight that the empirical distribution of the observed data is similar to the true distribution ($F_n \to F$ as $n \to \infty$).

Literature: Davison, A.C. and Hinkley, D.V. (2005): Bootstrap Methods and their Application; Cambridge University Press.
Setup:

- Original data: i.i.d. random sample $Y_1, \dots, Y_n$; the distribution of $Y_i$ depends on an unknown parameter (vector) $\theta$.

- The data $Y_1, \dots, Y_n$ is used to estimate $\theta$ ⇒ estimator $\hat\theta \equiv \hat\theta(Y_1, \dots, Y_n)$.

- We are interested in evaluating the distribution of $\hat\theta$ (resp. $\hat\theta - \theta$) in order to provide standard errors, to construct confidence intervals, or to perform tests of hypotheses.

The bootstrap approach:

1) Bootstrap samples: Random samples $Y_1^*, \dots, Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $Y_1, \dots, Y_n$.

2) Bootstrap estimates: $\hat\theta^* \equiv \hat\theta(Y_1^*, \dots, Y_n^*)$.

3) In practice: Steps 1) and 2) are repeated $m$ times (e.g. $m = 2000$) ⇒ $m$ values $\hat\theta_1^*, \hat\theta_2^*, \dots, \hat\theta_m^*$.

4) The (empirical) distribution of $\hat\theta^*$ is used to approximate the distribution of $\hat\theta$.
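The four steps above translate directly into a few lines of code. The following is a minimal sketch (added here for illustration, not part of the original notes) in Python/NumPy; the `statistic` argument plays the role of $\hat\theta$, and `np.median` in the usage example is just an arbitrary choice.

```python
import numpy as np

def bootstrap_distribution(y, statistic, m=2000, rng=None):
    """Return m bootstrap replicates of statistic(y).

    Each replicate is computed from a sample of size n drawn
    independently and with replacement from the observed data y.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    n = len(y)
    # Draw all m resamples at once: an (m, n) array of indices into y.
    idx = rng.integers(0, n, size=(m, n))
    return np.array([statistic(y[row]) for row in idx])

# Example usage: approximate the sampling distribution of the median.
y = np.random.default_rng(0).exponential(scale=2.0, size=100)
theta_star = bootstrap_distribution(y, np.median, m=2000, rng=1)
print(theta_star.mean(), theta_star.std())
```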
1.1 Why does the bootstrap work?

The theoretical justification of the bootstrap is based on asymptotic arguments. Usually the bootstrap does not provide very good approximations for extremely small sample sizes. It must, however, be emphasized that in some cases bootstrap confidence intervals can be more accurate for moderate sample sizes than confidence intervals based on standard asymptotic approximations.

Example 1: Estimating a proportion

- Data: i.i.d. random sample $Y_1, \dots, Y_n$; $Y_i \in \{0, 1\}$ is dichotomous, $P(Y_i = 1) = p$, $P(Y_i = 0) = 1 - p$. The problem is to estimate $p$.

- Let $S$ denote the number of $Y_i$ which are equal to 1. The maximum likelihood estimate of $p$ is $\hat p = S/n$.

- Recall: $n\hat p = S \sim B(n, p)$.

- As $n \to \infty$ the central limit theorem implies that
$$\frac{\sqrt{n}(\hat p - p)}{\sqrt{p(1-p)}} \to_L N(0, 1)$$

- $n$ large: the distributions of $\sqrt{n}(\hat p - p)$ and $\hat p - p$ can be approximated by $N(0, p(1-p))$ and $N(0, p(1-p)/n)$, respectively. For simplicity we will write $\mathrm{distr}(\sqrt{n}(\hat p - p)) \approx N(0, p(1-p))$ as well as $\mathrm{distr}(\hat p - p) \approx N(0, p(1-p)/n)$.

Bootstrap:

- Random sample $Y_1^*, \dots, Y_n^*$ generated by drawing observations independently and with replacement from $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

- Let $S^*$ denote the number of $Y_i^*$ which are equal to 1.
- Bootstrap estimate of $p$: $\hat p^* = S^*/n$.

The distribution of $\hat p^*$ depends on the observed sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$! A different sample will lead to a different distribution. The bootstrap now tries to approximate the true distribution of $\hat p - p$ by the conditional distribution of $\hat p^* - \hat p$ given the observed sample $\mathcal{Y}_n$. The bootstrap is called consistent if asymptotically ($n \to \infty$) the conditional distribution of $\hat p^* - \hat p$ coincides with the true distribution of $\hat p - p$ (note: a proper scaling is required!).

We obtain
$$P^*(Y_i^* = 1) = P(Y_i^* = 1 \mid \mathcal{Y}_n) = \hat p, \quad P^*(Y_i^* = 0) = P(Y_i^* = 0 \mid \mathcal{Y}_n) = 1 - \hat p$$
and
$$E^*(\hat p^*) = E(\hat p^* \mid \mathcal{Y}_n) = \hat p, \quad Var^*(\hat p^*) = E[(\hat p^* - \hat p)^2 \mid \mathcal{Y}_n] = \frac{\hat p(1 - \hat p)}{n}$$

The conditional distribution of $n\hat p^* = S^*$ given $\mathcal{Y}_n$ is equal to $B(n, \hat p)$. In a slight abuse of notation we will write
$$(n\hat p^* \mid \mathcal{Y}_n) \sim B(n, \hat p) \quad \text{or} \quad \mathrm{distr}(n\hat p^* \mid \mathcal{Y}_n) = B(n, \hat p)$$
As $n \to \infty$ the central limit theorem implies that the (conditional) distribution of $\left(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{\hat p(1-\hat p)}} \,\middle|\, \mathcal{Y}_n\right)$ converges (stochastically) to a $N(0,1)$-distribution. Moreover, $\hat p$ is a consistent estimator of $p$ and therefore $\hat p(1-\hat p) \to_P p(1-p)$ as $n \to \infty$. This implies that asymptotically $\hat p(1-\hat p)$ may be replaced by $p(1-p)$, and

- The law of $\left(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{p(1-p)}} \,\middle|\, \mathcal{Y}_n\right)$ converges stochastically to a $N(0,1)$-distribution.

More precisely, as $n \to \infty$
$$\sup_{\delta} \left| P\left(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{p(1-p)}} \le \delta \,\middle|\, \mathcal{Y}_n\right) - \Phi(\delta) \right| \to_P 0,$$
where $\Phi$ denotes the distribution function of the standard normal distribution.

We can conclude that for large $n$
$$\mathrm{distr}(\sqrt{n}(\hat p^* - \hat p) \mid \mathcal{Y}_n) \approx \mathrm{distr}(\sqrt{n}(\hat p - p)) \approx N(0, p(1-p))$$
as well as
$$\mathrm{distr}(\hat p^* - \hat p \mid \mathcal{Y}_n) \approx \mathrm{distr}(\hat p - p) \approx N(0, p(1-p)/n)$$
⇒ Bootstrap consistent
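A small simulation makes this consistency result tangible. The sketch below (an added illustration, not from the notes) compares the spread of the bootstrap replicates $\hat p^* - \hat p$ with the plug-in normal approximation $\sqrt{\hat p(1-\hat p)/n}$; the sample size, true $p$ and seeds are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, m = 200, 0.3, 2000

# Observed dichotomous sample and ML estimate p_hat = S/n.
y = rng.binomial(1, p, size=n)
p_hat = y.mean()

# Conditional distribution of p_hat* given the sample: resampling with
# replacement from a 0/1 sample is exactly B(n, p_hat)/n.
p_star = rng.choice(y, size=(m, n), replace=True).mean(axis=1)

# The bootstrap spread should match N(0, p_hat(1-p_hat)/n).
print("sd of p_hat* - p_hat   :", (p_star - p_hat).std())
print("sqrt(p_hat(1-p_hat)/n) :", np.sqrt(p_hat * (1 - p_hat) / n))
```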
Example 2: Estimating a population mean

- Let $Y_1, \dots, Y_n$ denote an i.i.d. random sample with mean $\mu$ and variance $\sigma^2$. In the following $F$ will denote the corresponding distribution function.

- $\bar Y = \frac{1}{n}\sum_{i=1}^n Y_i$ is an unbiased estimator of $\mu$.

- Problem: Construct a confidence interval.

Traditional approach for constructing a $1 - \alpha$ confidence interval:

- $\bar Y \sim N(\mu, \frac{\sigma^2}{n})$

- Estimation of $\sigma^2$: $S^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y)^2$

- This implies $\sqrt{n}\,\frac{\bar Y - \mu}{S} \sim t_{n-1}$, and hence
$$P\left(-t_{n-1, 1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}} \le \bar Y - \mu \le t_{n-1, 1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right) = 1 - \alpha$$

- $1 - \alpha$ confidence interval:
$$\left[\bar Y - t_{n-1, 1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}},\; \bar Y + t_{n-1, 1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right]$$

Remark: The construction relies on the assumption that $\bar Y \sim N(\mu, \frac{\sigma^2}{n})$. This is necessarily true if $Y$ is normally distributed. If the underlying distribution is not normal, then this condition is approximately fulfilled if the sample size $n$ is sufficiently large (central limit theorem). In this case the constructed confidence interval must also be seen as an approximation.

The bootstrap offers an alternative method for constructing such confidence intervals.
The bootstrap approach:

- Random samples $Y_1^*, \dots, Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

- $Y_1^*, \dots, Y_n^*$ ⇒ estimator $\bar Y^* = \frac{1}{n}\sum_{i=1}^n Y_i^*$

Means and variances of the conditional distributions of $Y_i^*$ and $\bar Y^*$ given $\mathcal{Y}_n$:
$$E^*(Y_i^*) = E(Y_i^* \mid \mathcal{Y}_n) = \bar Y, \quad Var^*(Y_i^*) = E[(Y_i^* - \bar Y)^2 \mid \mathcal{Y}_n] = \tilde S^2 := \frac{1}{n}\sum_{i=1}^n (Y_i - \bar Y)^2$$
Moreover,
$$E^*(\bar Y^*) = \bar Y, \quad Var^*(\bar Y^*) = \tilde S^2 / n$$

As $n \to \infty$ the central limit theorem implies that the (conditional) distribution of $\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{\tilde S} \,\middle|\, \mathcal{Y}_n\right)$ converges (stochastically) to a $N(0,1)$-distribution. Moreover, $\tilde S^2$ is a consistent estimator of $\sigma^2$ and therefore $\tilde S^2 \to_P \sigma^2$ as $n \to \infty$. This implies that asymptotically $\tilde S$ may be replaced by $\sigma$, and

- The law of $\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{\sigma} \,\middle|\, \mathcal{Y}_n\right)$ converges stochastically to a $N(0,1)$-distribution.

More precisely, as $n \to \infty$
$$\sup_{\delta} \left| P\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{\sigma} \le \delta \,\middle|\, \mathcal{Y}_n\right) - \Phi(\delta) \right| \to_P 0,$$
where $\Phi$ denotes the distribution function of the standard normal distribution.
We can conclude that for large $n$
$$\mathrm{distr}(\sqrt{n}(\bar Y^* - \bar Y) \mid \mathcal{Y}_n) \approx \mathrm{distr}(\sqrt{n}(\bar Y - \mu)) \approx N(0, \sigma^2)$$
as well as
$$\mathrm{distr}(\bar Y^* - \bar Y \mid \mathcal{Y}_n) \approx \mathrm{distr}(\bar Y - \mu) \approx N(0, \sigma^2/n)$$
⇒ Bootstrap consistent
Construction of a symmetric confidence interval of level $1 - \alpha$:

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\bar Y^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$ (the bootstrap distribution):
$$P^*(\bar Y^* \le \hat t_{\frac{\alpha}{2}}) \le \frac{\alpha}{2}, \quad P^*(\bar Y^* > \hat t_{\frac{\alpha}{2}}) \ge 1 - \frac{\alpha}{2},$$
$$P^*(\bar Y^* \le \hat t_{1-\frac{\alpha}{2}}) \ge 1 - \frac{\alpha}{2}, \quad P^*(\bar Y^* > \hat t_{1-\frac{\alpha}{2}}) \le \frac{\alpha}{2}.$$
Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\bar Y^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

In practice:

- Draw $m$ bootstrap samples (e.g. $m = 2000$) and calculate the corresponding estimates $\bar Y_1^*, \bar Y_2^*, \dots, \bar Y_m^*$.

- Order the resulting estimates: $\bar Y_{(1)}^* \le \bar Y_{(2)}^* \le \dots \le \bar Y_{(m)}^*$.

- Set $\hat t_{\frac{\alpha}{2}} := \bar Y^*_{([m+1]\frac{\alpha}{2})}$ and $\hat t_{1-\frac{\alpha}{2}} := \bar Y^*_{([m+1][1-\frac{\alpha}{2}])}$.
A basic bootstrap confidence interval:

By construction of $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ we have
$$P^*(\bar Y^* - \bar Y \le \hat t_{\frac{\alpha}{2}} - \bar Y) \le \frac{\alpha}{2}, \quad P^*(\bar Y^* - \bar Y \le \hat t_{1-\frac{\alpha}{2}} - \bar Y) \ge 1 - \frac{\alpha}{2}.$$
We have seen that the bootstrap is consistent, and therefore $\mathrm{distr}(\bar Y^* - \bar Y \mid \mathcal{Y}_n) \approx \mathrm{distr}(\bar Y - \mu)$ asymptotically. This implies that for large $n$
$$P(\bar Y - \mu \le \hat t_{\frac{\alpha}{2}} - \bar Y) \approx \frac{\alpha}{2}, \quad P(\bar Y - \mu \le \hat t_{1-\frac{\alpha}{2}} - \bar Y) \approx 1 - \frac{\alpha}{2},$$
and therefore
$$P\left(\bar Y - (\hat t_{1-\frac{\alpha}{2}} - \bar Y) \le \mu \le \bar Y - (\hat t_{\frac{\alpha}{2}} - \bar Y)\right) \approx 1 - \alpha$$

Approximate $1 - \alpha$ (symmetric) confidence interval:
$$\left[2\bar Y - \hat t_{1-\frac{\alpha}{2}},\; 2\bar Y - \hat t_{\frac{\alpha}{2}}\right]$$

The percentile interval:

In the older bootstrap literature the so-called percentile interval
$$\left[\hat t_{\frac{\alpha}{2}},\; \hat t_{1-\frac{\alpha}{2}}\right]$$
is usually recommended as a $1 - \alpha$ confidence interval.

The percentile interval can easily be justified if all underlying distributions are symmetric, $\mathrm{distr}(\bar Y^* - \bar Y \mid \mathcal{Y}_n) \approx \mathrm{distr}(\bar Y - \bar Y^* \mid \mathcal{Y}_n)$, $\mathrm{distr}(\bar Y - \mu) \approx \mathrm{distr}(\mu - \bar Y)$.

In practice the percentile interval is usually less precise than the standard interval discussed above; there are however some bias-corrected modifications of the percentile interval which allow better approximations.
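Both intervals can be computed from the same set of bootstrap replicates. A minimal sketch (added here, not from the notes) for the mean, assuming NumPy; the lognormal sample in the usage example is an arbitrary skewed test case.

```python
import numpy as np

def basic_and_percentile_ci(y, m=2000, alpha=0.05, rng=None):
    """Basic and percentile bootstrap intervals for the mean.

    t_lo, t_hi are the alpha/2 and 1-alpha/2 quantiles of the
    bootstrap distribution of the resampled mean Ybar*.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    ybar = y.mean()
    means = rng.choice(y, size=(m, len(y)), replace=True).mean(axis=1)
    t_lo, t_hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    basic = (2 * ybar - t_hi, 2 * ybar - t_lo)   # [2 Ybar - t_hi, 2 Ybar - t_lo]
    percentile = (t_lo, t_hi)                    # [t_lo, t_hi]
    return basic, percentile

y = np.random.default_rng(0).lognormal(size=50)  # a skewed sample
print(basic_and_percentile_ci(y, rng=1))
```

For a symmetric bootstrap distribution the two intervals nearly coincide; for skewed samples such as this one they differ, which is exactly the situation where the basic interval is usually preferable.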
General Setup: The nonparametric ("naive") bootstrap

- Data: Random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; the distribution of $Y_i$ depends on an unknown parameter (vector) $\theta$.

- The data $Y_1, \dots, Y_n$ is used to estimate $\theta$ ⇒ estimator $\hat\theta \equiv \hat\theta(Y_1, \dots, Y_n)$.

- Bootstrap: Random samples $Y_1^*, \dots, Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $Y_1, \dots, Y_n$ ⇒ bootstrap estimates $\hat\theta^* \equiv \hat\theta(Y_1^*, \dots, Y_n^*)$.

- $\mathrm{distr}(\hat\theta^* \mid \mathcal{Y}_n)$ is used to approximate $\mathrm{distr}(\hat\theta)$.

The bootstrap works for a large number of statistical and econometrical problems. Indeed, it can be shown that under some mild regularity conditions the bootstrap is consistent, if

1) Generation of the bootstrap sample reflects appropriately the way in which the original sample has been generated (i.i.d. sampling!).

2) The distribution of the estimator $\hat\theta$ is asymptotically normal. More precisely,
   - single parameter ($\theta \in \mathbb{R}$): $\sqrt{n}(\hat\theta - \theta) \to_L N(0, v^2)$; $v$ — standard error of $\sqrt{n}(\hat\theta - \theta)$
   - multivariate parameter vector ($\theta \in \mathbb{R}^d$): $\sqrt{n}(\hat\theta - \theta) \to_L N_d(0, V)$; $V$ — covariance matrix of $\sqrt{n}(\hat\theta - \theta)$

Consistent bootstrap: $\mathrm{distr}(\sqrt{n}(\hat\theta^* - \hat\theta) \mid \mathcal{Y}_n) \approx \mathrm{distr}(\sqrt{n}(\hat\theta - \theta))$ [and $\mathrm{distr}(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n) \approx \mathrm{distr}(\hat\theta - \theta)$] if $n$ is sufficiently large. ⇒ Bootstrap confidence intervals, tests, etc.
Note:

- Standard approaches to construct confidence intervals and tests are usually based on asymptotic normal approximations. For example, if $\theta \in \mathbb{R}$ and $\sqrt{n}(\hat\theta - \theta) \to_L N(0, v^2)$ one usually tries to determine an approximation $\hat v$ of $v$ from the data. An approximate $1 - \alpha$ confidence interval is then given by
$$\left[\hat\theta - z_{1-\frac{\alpha}{2}} \frac{\hat v}{\sqrt{n}},\; \hat\theta + z_{1-\frac{\alpha}{2}} \frac{\hat v}{\sqrt{n}}\right]$$

- In some cases it is very difficult to obtain approximations $\hat v$ of $v$. Statistical inference is then usually based on the bootstrap.

- In contemporary statistical analysis the bootstrap is frequently used even for standard problems, where estimates $\hat v$ of $v$ are easily constructed. The reason is that in many situations it can be shown that bootstrap confidence intervals or tests are more precise than those determined analytically based on asymptotic formulas.

It must be emphasized that the bootstrap does not always work. The bootstrap may fail if one of the above conditions 1) or 2) is violated. Examples are:

- The naive bootstrap will not work if the i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $Y_1, \dots, Y_n$ does not properly reflect the way the $Y_1, \dots, Y_n$ are generated from the underlying population (e.g. dependent data; $Y_1, \dots, Y_n$ not i.i.d.).

- The distribution of the estimator $\hat\theta$ is not asymptotically normal (e.g. extreme value problems).
General approach: Basic bootstrap $1 - \alpha$ confidence interval

Random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; unknown parameter (vector) $\theta$. We will assume that the bootstrap is consistent: $\mathrm{distr}(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n) \approx \mathrm{distr}(\hat\theta - \theta)$ if $n$ is sufficiently large.

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$ (the bootstrap distribution):
$$P^*(\hat\theta^* \le \hat t_{\frac{\alpha}{2}}) \le \frac{\alpha}{2}, \quad P^*(\hat\theta^* > \hat t_{\frac{\alpha}{2}}) \ge 1 - \frac{\alpha}{2},$$
$$P^*(\hat\theta^* \le \hat t_{1-\frac{\alpha}{2}}) \ge 1 - \frac{\alpha}{2}, \quad P^*(\hat\theta^* > \hat t_{1-\frac{\alpha}{2}}) \le \frac{\alpha}{2}.$$
Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

- Consistency of the bootstrap implies that for large $n$
$$P(\hat\theta - \theta \le \hat t_{\frac{\alpha}{2}} - \hat\theta) \approx \frac{\alpha}{2}, \quad P(\hat\theta - \theta \le \hat t_{1-\frac{\alpha}{2}} - \hat\theta) \approx 1 - \frac{\alpha}{2},$$
and therefore
$$P\left(\hat\theta - (\hat t_{1-\frac{\alpha}{2}} - \hat\theta) \le \theta \le \hat\theta - (\hat t_{\frac{\alpha}{2}} - \hat\theta)\right) \approx 1 - \alpha$$

- Approximate $1 - \alpha$ (symmetric) confidence interval:
$$\left[2\hat\theta - \hat t_{1-\frac{\alpha}{2}},\; 2\hat\theta - \hat t_{\frac{\alpha}{2}}\right]$$
Example: Bootstrap confidence interval for a median

Given: i.i.d. sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; $Y_i$ possesses a continuous distribution with (unknown) density $f$. We are now interested in estimating the median $\theta_{med}$ of the underlying distribution. Recall that the median is defined by
$$P(Y_i \le \theta_{med}) = P(Y_i \ge \theta_{med}) = 0.5$$

$\theta_{med}$ is estimated by the sample median $\hat\theta_{med}$. Based on the ordered sample $Y_{(1)} \le Y_{(2)} \le \dots \le Y_{(n)}$, $\hat\theta_{med}$ is given by
$$\hat\theta_{med} = \begin{cases} Y_{(\frac{n+1}{2})} & \text{if } n \text{ is an odd number} \\ (Y_{(\frac{n}{2})} + Y_{(\frac{n}{2}+1)})/2 & \text{if } n \text{ is an even number} \end{cases}$$

Construction of a confidence interval for $\theta_{med}$ is not an easy task. Asymptotically we obtain
$$\sqrt{n}(\hat\theta_{med} - \theta_{med}) \to_L N\left(0, \frac{1}{4 f(\theta_{med})^2}\right)$$
The problem is that the density $f$ is unknown. In principle it may be estimated by nonparametric kernel density estimation, and a corresponding plug-in estimate $\hat f(\hat\theta_{med})$ may be used to approximate the asymptotic variance. However, the bootstrap offers a simple alternative.

Construction of a bootstrap confidence interval:

- Draw i.i.d. random samples $Y_1^*, \dots, Y_n^*$ from $\mathcal{Y}_n$ and determine the corresponding medians $\hat\theta^*_{med}$.

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\hat\theta^*_{med}$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

- Approximate $1 - \alpha$ (symmetric) confidence interval:
$$\left[2\hat\theta_{med} - \hat t_{1-\frac{\alpha}{2}},\; 2\hat\theta_{med} - \hat t_{\frac{\alpha}{2}}\right]$$
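A minimal sketch of this construction (added illustration, not from the notes), assuming NumPy; the Cauchy sample in the usage example is an arbitrary heavy-tailed test case where the mean would be useless but the median is well-behaved.

```python
import numpy as np

def median_basic_ci(y, m=2000, alpha=0.05, rng=None):
    """Basic bootstrap CI for the median: [2*med - t_hi, 2*med - t_lo]."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    med = np.median(y)
    # m resampled medians form the bootstrap distribution.
    med_star = np.median(rng.choice(y, size=(m, len(y)), replace=True), axis=1)
    t_lo, t_hi = np.quantile(med_star, [alpha / 2, 1 - alpha / 2])
    return 2 * med - t_hi, 2 * med - t_lo

y = np.random.default_rng(0).standard_cauchy(size=101)  # heavy-tailed sample
print(median_basic_ci(y, rng=1))
```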
1.2 Pivot statistics and the bootstrap-t method

In many situations it is possible to get more accurate bootstrap confidence intervals by using the bootstrap-t method (one also speaks of studentized bootstrap confidence intervals). The construction relies on so-called pivot statistics.

Let $Y_1, \dots, Y_n$ be an i.i.d. random sample and assume that the distribution of $Y$ depends on an unknown parameter (or parameter vector) $\theta$.

- A statistic $T_n \equiv T(Y_1, \dots, Y_n)$ is called a pivot statistic if the distribution of $T_n$ does not depend on any unknown parameter.

- A statistic $T_n \equiv T(Y_1, \dots, Y_n)$ is called an asymptotic pivot statistic if for suitable sequences $a_n$, $b_n$ of real numbers the transformed statistic $a_n T_n + b_n$ possesses a well-defined, non-degenerate asymptotic distribution which does not depend on the parameters of the unknown distribution of $Y$.

Example: Population mean: $Y_1, \dots, Y_n$ with mean $\mu$, variance $\sigma^2 > 0$, and $E|Y|^3 < \infty$. If $Y$ is normally distributed we obtain
$$T_n = \frac{\sqrt{n}(\bar Y - \mu)}{S} \sim t_{n-1}$$
with $S^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y)^2$, where $t_{n-1}$ denotes Student's t-distribution with $n-1$ degrees of freedom. We can conclude that $T_n$ is a pivot statistic.

Even if $Y$ is not normally distributed, the central limit theorem implies that
$$\frac{\sqrt{n}(\bar Y - \mu)}{S} \to_L N(0, 1)$$
In this case $T_n$ is an asymptotic pivot statistic.

Bootstrap:

- i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $\mathcal{Y}_n$ ⇒ estimators
$$\bar Y^* = \frac{1}{n}\sum_{i=1}^n Y_i^* \quad \text{and} \quad S^{*2} = \frac{1}{n-1}\sum_{i=1}^n (Y_i^* - \bar Y^*)^2$$

- $n$ large ⇒ approximately
$$\mathrm{distr}\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{S^*} \,\middle|\, \mathcal{Y}_n\right) \approx \mathrm{distr}\left(\frac{\sqrt{n}(\bar Y - \mu)}{S}\right) \approx N(0, 1)$$
or
$$\mathrm{distr}\left(\frac{\bar Y^* - \bar Y}{S^*} \,\middle|\, \mathcal{Y}_n\right) \approx \mathrm{distr}\left(\frac{\bar Y - \mu}{S}\right)$$

- Therefore, the (conditional) distribution of $\frac{\bar Y^* - \bar Y}{S^*}$ (given $\mathcal{Y}_n$) can be used to approximate the distribution of $\frac{\bar Y - \mu}{S}$.

Construction of a bootstrap-t confidence interval of level $1 - \alpha$:

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat\kappa_{\frac{\alpha}{2}}$ and $\hat\kappa_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\frac{\bar Y^* - \bar Y}{S^*}$ given $\mathcal{Y}_n$:
$$P^*\left(\frac{\bar Y^* - \bar Y}{S^*} \le \hat\kappa_{\frac{\alpha}{2}}\right) \le \frac{\alpha}{2}, \quad P^*\left(\frac{\bar Y^* - \bar Y}{S^*} > \hat\kappa_{\frac{\alpha}{2}}\right) \ge 1 - \frac{\alpha}{2},$$
$$P^*\left(\frac{\bar Y^* - \bar Y}{S^*} \le \hat\kappa_{1-\frac{\alpha}{2}}\right) \ge 1 - \frac{\alpha}{2}, \quad P^*\left(\frac{\bar Y^* - \bar Y}{S^*} > \hat\kappa_{1-\frac{\alpha}{2}}\right) \le \frac{\alpha}{2}.$$

In practice:

- Draw $m$ bootstrap samples (e.g. $m = 2000$) and calculate the corresponding studentized statistics
$$Z_1^* := \frac{\bar Y_1^* - \bar Y}{S_1^*}, \; Z_2^* := \frac{\bar Y_2^* - \bar Y}{S_2^*}, \; \dots, \; Z_m^* := \frac{\bar Y_m^* - \bar Y}{S_m^*}.$$

- Order the resulting values: $Z_{(1)}^* \le Z_{(2)}^* \le \dots \le Z_{(m)}^*$.
- Set $\hat\kappa_{\frac{\alpha}{2}} := Z^*_{([m+1]\frac{\alpha}{2})}$ and $\hat\kappa_{1-\frac{\alpha}{2}} := Z^*_{([m+1][1-\frac{\alpha}{2}])}$.

Consistency of the bootstrap implies that asymptotically also
$$P\left(\frac{\bar Y - \mu}{S} \le \hat\kappa_{\frac{\alpha}{2}}\right) \approx \frac{\alpha}{2}, \quad P\left(\frac{\bar Y - \mu}{S} > \hat\kappa_{\frac{\alpha}{2}}\right) \approx 1 - \frac{\alpha}{2},$$
$$P\left(\frac{\bar Y - \mu}{S} \le \hat\kappa_{1-\frac{\alpha}{2}}\right) \approx 1 - \frac{\alpha}{2}, \quad P\left(\frac{\bar Y - \mu}{S} > \hat\kappa_{1-\frac{\alpha}{2}}\right) \approx \frac{\alpha}{2}.$$

This yields the $1 - \alpha$ confidence interval
$$\left[\bar Y - \hat\kappa_{1-\frac{\alpha}{2}} S,\; \bar Y - \hat\kappa_{\frac{\alpha}{2}} S\right]$$
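A sketch of the bootstrap-t interval for the mean (added illustration, not from the notes), assuming NumPy. One presentational choice: the factor $1/\sqrt{n}$ is folded into the scale estimate `s` below, which is equivalent to the construction above because the $\hat\kappa$ quantiles rescale accordingly.

```python
import numpy as np

def bootstrap_t_ci(y, m=2000, alpha=0.05, rng=None):
    """Studentized (bootstrap-t) interval for the mean.

    Z*_b = (Ybar*_b - Ybar) / S*_b; the interval is
    [Ybar - k_hi * S, Ybar - k_lo * S] with k the Z* quantiles.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    n = len(y)
    ybar = y.mean()
    s = y.std(ddof=1) / np.sqrt(n)        # 1/sqrt(n) folded into the scale
    idx = rng.integers(0, n, size=(m, n))
    ystar = y[idx]
    ybar_star = ystar.mean(axis=1)
    s_star = ystar.std(axis=1, ddof=1) / np.sqrt(n)
    z = (ybar_star - ybar) / s_star        # studentized replicates Z*
    k_lo, k_hi = np.quantile(z, [alpha / 2, 1 - alpha / 2])
    return ybar - k_hi * s, ybar - k_lo * s

y = np.random.default_rng(0).exponential(size=40)
print(bootstrap_t_ci(y, rng=1))
```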
General construction of a bootstrap-t interval (unknown real-valued parameter $\theta \in \mathbb{R}$):

Random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; unknown parameter (vector) $\theta$. Assume that the estimator $\hat\theta$ of $\theta$ is asymptotically normal,
$$\sqrt{n}(\hat\theta - \theta) \to_L N(0, v^2) \quad \Leftrightarrow \quad \frac{\sqrt{n}(\hat\theta - \theta)}{v} \to_L N(0, 1),$$
and that a consistent estimator $\hat v \equiv \hat v(Y_1, \dots, Y_n)$ of $v$ is available. One might then replace $v$ by $\hat v$ to obtain
$$\frac{\sqrt{n}(\hat\theta - \theta)}{\hat v} \to_L N(0, 1)$$
Obviously, $\frac{\sqrt{n}(\hat\theta - \theta)}{v}$ and $\frac{\sqrt{n}(\hat\theta - \theta)}{\hat v}$ are asymptotic pivot statistics.

- Based on an i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $\{Y_1, \dots, Y_n\}$, calculate bootstrap estimates $\hat\theta^*$ and $\hat v^*$.

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat\kappa_{\frac{\alpha}{2}}$ and $\hat\kappa_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\frac{\sqrt{n}(\hat\theta^* - \hat\theta)}{\hat v^*}$ given $\mathcal{Y}_n$.

- Bootstrap-t interval:
$$\left[\hat\theta - \hat\kappa_{1-\frac{\alpha}{2}} \frac{\hat v}{\sqrt{n}},\; \hat\theta - \hat\kappa_{\frac{\alpha}{2}} \frac{\hat v}{\sqrt{n}}\right]$$
1.3 The parametric bootstrap

A further increase of accuracy can be obtained in applications where the distribution of $Y$ is known up to some parameter vectors $\theta, \eta$ (e.g. $Y$ is normal with mean $\mu$ and variance $\sigma^2$; $Y$ follows an exponential distribution with parameter $\theta$). The difference to the nonparametric bootstrap discussed above consists in the way how to generate a bootstrap re-sample $Y_1^*, \dots, Y_n^*$.

Let $\theta = (\theta_1, \dots, \theta_p)^T$, and for some known $F$ let $F(y, \theta, \eta)$ denote the distribution function of $Y$ as a function of $\theta, \eta$. $F$ is assumed to be known. For simplicity, we will concentrate on constructing a confidence interval for $\theta$.

The parametric bootstrap now proceeds as follows:

- The unknown parameter vectors $\theta, \eta$ are estimated by the maximum likelihood method ⇒ likelihood estimators $\hat\theta, \hat\eta$.

- An i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ is generated by randomly drawing observations from a $F(\cdot, \hat\theta, \hat\eta)$ distribution (using a random number generator) ⇒ bootstrap estimates $\hat\theta^*, \hat\eta^*$.

- The conditional distribution of $\hat\theta^*$ given $F(\cdot, \hat\theta, \hat\eta)$ is used to approximate the distribution of the estimator $\hat\theta$.

In almost all cases of practical interest confidence intervals based on the parametric bootstrap are more accurate than standard intervals based on first order asymptotic approximations. The parametric bootstrap usually also provides more accurate approximations than its nonparametric counterpart discussed above. Of course, this requires that the underlying distributional assumption is satisfied (otherwise, the parametric bootstrap will lead to incorrect results).
Basic parametric bootstrap confidence interval:
$$\left[2\hat\theta - \hat t_{1-\frac{\alpha}{2}},\; 2\hat\theta - \hat t_{\frac{\alpha}{2}}\right],$$
where $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ now denote the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles of the conditional distribution of $\hat\theta^*$ given $F(\cdot, \hat\theta, \hat\eta)$.

Bootstrap-t intervals:

- Assume that the standard error $v(\theta, \eta)$ of $\sqrt{n}(\hat\theta - \theta)$ can be determined in dependence of the parameter (vectors) $\theta, \eta$.

- i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ generated by randomly drawing observations from a $F(\cdot, \hat\theta, \hat\eta)$ distribution ⇒ parameter estimates $\hat\theta^*, \hat\eta^*$ as well as bootstrap approximations $v(\hat\theta^*, \hat\eta^*)$ of the standard error.

- Bootstrap-t interval:
$$\left[\hat\theta - \hat\kappa_{1-\frac{\alpha}{2}} \frac{v(\hat\theta, \hat\eta)}{\sqrt{n}},\; \hat\theta - \hat\kappa_{\frac{\alpha}{2}} \frac{v(\hat\theta, \hat\eta)}{\sqrt{n}}\right],$$
where $\hat\kappa_{\frac{\alpha}{2}}$ and $\hat\kappa_{1-\frac{\alpha}{2}}$ now denote the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles of the conditional distribution of $\frac{\sqrt{n}(\hat\theta^* - \hat\theta)}{v(\hat\theta^*, \hat\eta^*)}$ given $F(\cdot, \hat\theta, \hat\eta)$.

Note: Sometimes the following modification leads to even more accurate intervals:

- Determine the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat\kappa_{\frac{\alpha}{2}}$ and $\hat\kappa_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\frac{\sqrt{n}(\hat\theta^* - \hat\theta)}{v(\hat\theta^*, \hat\eta^*)}$ given $F(\cdot, \hat\theta, \hat\eta)$.

- Asymptotically we obtain
$$P\left(\hat\kappa_{\frac{\alpha}{2}} \le \frac{\sqrt{n}(\hat\theta - \theta)}{v(\theta, \eta)} \le \hat\kappa_{1-\frac{\alpha}{2}}\right) \approx 1 - \alpha$$

- $1 - \alpha$ confidence interval: set of all $\theta$ with
$$\hat\kappa_{\frac{\alpha}{2}} \le \frac{\sqrt{n}(\hat\theta - \theta)}{v(\theta, \hat\eta)} \le \hat\kappa_{1-\frac{\alpha}{2}}$$
Example: Exponential distribution

Assume that $Y$ follows an exponential distribution with parameter $\theta$. Density and distribution function are then given by
$$f(y, \theta) = \frac{1}{\theta} e^{-y/\theta}, \quad F(y, \theta) = 1 - e^{-y/\theta}$$
We have $E(Y_i) = \theta$ and $Var(Y_i) = \theta^2$. The maximum likelihood estimator of $\theta$ is given by $\hat\theta = \frac{1}{n}\sum_{i=1}^n Y_i$, and $Var(\hat\theta) = \frac{\theta^2}{n}$.

The parametric bootstrap can then be used to construct confidence intervals. The following procedure is straightforward, but there also exist alternative approaches.

- An i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ is generated by randomly drawing observations from an exponential distribution with parameter $\hat\theta$.

- $Y_1^*, \dots, Y_n^*$ ⇒ estimator $\hat\theta^*$.

- Calculation of $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat\kappa_{\frac{\alpha}{2}}$ and $\hat\kappa_{1-\frac{\alpha}{2}}$ with
$$P^*\left(\frac{\hat\theta^*}{\hat\theta} \le \hat\kappa_{\frac{\alpha}{2}}\right) = \frac{\alpha}{2}, \quad P^*\left(\frac{\hat\theta^*}{\hat\theta} \le \hat\kappa_{1-\frac{\alpha}{2}}\right) = 1 - \frac{\alpha}{2},$$
where $P^*(\cdot)$ denotes probabilities calculated with respect to the exponential distribution with parameter $\hat\theta$.

- Since $\hat\theta/\theta$ and $\hat\theta^*/\hat\theta$ possess exactly the same distribution, this yields
$$P\left(\hat\kappa_{\frac{\alpha}{2}} \le \frac{\hat\theta}{\theta} \le \hat\kappa_{1-\frac{\alpha}{2}}\right) = 1 - \alpha$$

- Confidence interval:
$$\left[\frac{\hat\theta}{\hat\kappa_{1-\frac{\alpha}{2}}},\; \frac{\hat\theta}{\hat\kappa_{\frac{\alpha}{2}}}\right]$$

It can be shown that for any finite sample of size $n$ the coverage probability of this interval is exactly equal to $1 - \alpha$.
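A sketch of this parametric bootstrap (added illustration, not from the notes), assuming NumPy; the quantiles of $\hat\theta^*/\hat\theta$ are approximated by Monte Carlo rather than computed analytically.

```python
import numpy as np

def exp_parametric_ci(y, m=20000, alpha=0.05, rng=None):
    """Parametric bootstrap CI for the mean theta of an Exp(theta) sample.

    Uses the pivot theta_hat/theta: quantiles k of theta*/theta_hat
    (theta* from Exp(theta_hat) resamples) give
    [theta_hat / k_hi, theta_hat / k_lo].
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    n = len(y)
    theta_hat = y.mean()                     # ML estimator
    # Parametric resamples: draw from Exp(theta_hat), not from the data.
    theta_star = rng.exponential(scale=theta_hat, size=(m, n)).mean(axis=1)
    k_lo, k_hi = np.quantile(theta_star / theta_hat, [alpha / 2, 1 - alpha / 2])
    return theta_hat / k_hi, theta_hat / k_lo

y = np.random.default_rng(0).exponential(scale=3.0, size=30)
print(exp_parametric_ci(y, rng=1))
```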
1.4 More on Bootstrap Confidence Intervals

Setup: i.i.d. random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; unknown parameter (vector) $\theta$. We will assume that the bootstrap is consistent: $\mathrm{distr}(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n) \approx \mathrm{distr}(\hat\theta - \theta)$ if $n$ is sufficiently large.

In the previous sections we have already defined basic bootstrap confidence intervals as well as bootstrap-t intervals.

1.4.1 Basic confidence interval

$$\left[2\hat\theta - \hat t_{1-\frac{\alpha}{2}},\; 2\hat\theta - \hat t_{\frac{\alpha}{2}}\right],$$
where $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ are the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$.

1.4.2 Bootstrap-t Intervals

$$\left[\hat\theta - \hat\kappa_{1-\frac{\alpha}{2}} \frac{\hat v}{\sqrt{n}},\; \hat\theta - \hat\kappa_{\frac{\alpha}{2}} \frac{\hat v}{\sqrt{n}}\right],$$
where $\hat\kappa_{\frac{\alpha}{2}}$ and $\hat\kappa_{1-\frac{\alpha}{2}}$ are the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles of the conditional distribution of $\frac{\sqrt{n}(\hat\theta^* - \hat\theta)}{\hat v^*}$ given $\mathcal{Y}_n$.

1.4.3 Percentile Intervals

The classical percentile confidence interval is given by
$$\left[\hat t_{\frac{\alpha}{2}},\; \hat t_{1-\frac{\alpha}{2}}\right]$$
Generally, this interval does not work extremely well in practice.
The so-called BC_a method allows to construct better confidence intervals. The term BC_a stands for "bias-corrected and accelerated". The BC_a interval of intended coverage $1 - \alpha$ is given by
$$\left[\hat t_{\alpha_1},\; \hat t_{\alpha_2}\right],$$
where $\hat t_{\alpha_1}$ and $\hat t_{\alpha_2}$ are the $\alpha_1$ and $\alpha_2$ quantiles of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$, and
$$\alpha_1 = \Phi\left(\hat z + \frac{\hat z + z_{\frac{\alpha}{2}}}{1 - \hat a(\hat z + z_{\frac{\alpha}{2}})}\right), \quad \alpha_2 = \Phi\left(\hat z + \frac{\hat z + z_{1-\frac{\alpha}{2}}}{1 - \hat a(\hat z + z_{1-\frac{\alpha}{2}})}\right),$$
where $\Phi$ is the standard normal distribution function, and where $z_{\beta}$ is the $\beta$ quantile of a standard normal distribution.

Note that the BC_a interval reduces to a standard percentile interval if $\hat z = \hat a = 0$. However, a different choice of $\hat z$ and $\hat a$ leads to more accurate intervals.

The value of the bias-correction $\hat z$ can be obtained from the proportion of the bootstrap replications less than the original estimate $\hat\theta$:
$$\hat z = \Phi^{-1}\left(P^*[\hat\theta^* < \hat\theta]\right)$$

Calculation of the acceleration $\hat a$ is slightly more complicated. It is based on jackknife values of the estimator $\hat\theta$: for any $i = 1, \dots, n$ calculate the estimate $\hat\theta_{(i)}$ from the sample $Y_1, \dots, Y_{i-1}, Y_{i+1}, \dots, Y_n$ with the $i$-th observation deleted. Let $\hat\theta_{(\cdot)} = \frac{1}{n}\sum_{i=1}^n \hat\theta_{(i)}$ and determine
$$\hat a = \frac{\sum_{i=1}^n (\hat\theta_{(\cdot)} - \hat\theta_{(i)})^3}{6\left[\sum_{i=1}^n (\hat\theta_{(\cdot)} - \hat\theta_{(i)})^2\right]^{3/2}}$$

The BC_a interval is motivated by theoretical results which show that it is second order accurate.
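The formulas above fit in a short function. The following sketch (added here, not from the notes) assumes SciPy is available for $\Phi$ and $\Phi^{-1}$; it deliberately omits guards against degenerate cases (e.g. all replicates on one side of $\hat\theta$, which would make $\hat z$ infinite).

```python
import numpy as np
from scipy.stats import norm  # Phi (cdf) and its inverse (ppf)

def bca_ci(y, statistic, m=2000, alpha=0.05, rng=None):
    """BC_a interval for statistic(y), following the formulas above."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    n = len(y)
    theta_hat = statistic(y)
    theta_star = np.array([statistic(y[rng.integers(0, n, n)])
                           for _ in range(m)])
    # Bias correction: proportion of replicates below the original estimate.
    z_hat = norm.ppf((theta_star < theta_hat).mean())
    # Acceleration from jackknife (leave-one-out) values.
    jack = np.array([statistic(np.delete(y, i)) for i in range(n)])
    d = jack.mean() - jack
    a_hat = (d ** 3).sum() / (6 * ((d ** 2).sum()) ** 1.5)
    # Adjusted quantile levels alpha_1, alpha_2.
    z = norm.ppf([alpha / 2, 1 - alpha / 2])
    adj = norm.cdf(z_hat + (z_hat + z) / (1 - a_hat * (z_hat + z)))
    return tuple(np.quantile(theta_star, adj))

y = np.random.default_rng(0).exponential(size=50)
print(bca_ci(y, np.mean, rng=1))
```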
Consider generally $1 - \alpha$ confidence intervals of the form $[t_{low}, t_{up}]$ for $\theta$. Upper and lower bounds of such intervals are determined from the data, $t_{low} \equiv t_{low}(Y_1, \dots, Y_n)$, $t_{up} \equiv t_{up}(Y_1, \dots, Y_n)$, and their accuracy depends on the particular procedure applied.

- (Symmetric) confidence intervals are said to be first-order accurate if there exist some constants $d_1, d_2 < \infty$ such that for sufficiently large $n$
$$\left|P(\theta < t_{low}) - \frac{\alpha}{2}\right| \le \frac{d_1}{\sqrt{n}}, \quad \left|P(\theta > t_{up}) - \frac{\alpha}{2}\right| \le \frac{d_2}{\sqrt{n}}.$$

- (Symmetric) confidence intervals are said to be second-order accurate if there exist some constants $d_3, d_4 < \infty$ such that for sufficiently large $n$
$$\left|P(\theta < t_{low}) - \frac{\alpha}{2}\right| \le \frac{d_3}{n}, \quad \left|P(\theta > t_{up}) - \frac{\alpha}{2}\right| \le \frac{d_4}{n}.$$

If the distribution of $\hat\theta$ is asymptotically normal, then under some additional regularity conditions it can usually be shown that

- Standard confidence intervals based on asymptotic approximations are first-order accurate. The same holds for the basic bootstrap intervals $[2\hat\theta - \hat t_{1-\frac{\alpha}{2}}, 2\hat\theta - \hat t_{\frac{\alpha}{2}}]$ as well as for the classical percentile method.

- Bootstrap-t intervals as well as BC_a intervals are second-order accurate.

The difference between first and second-order accuracy is not just a theoretical nicety. In many practically important situations second-order accurate intervals lead to much better approximations.
Another approach for constructing confidence intervals is the ABC method: ABC, standing for "approximate bootstrap confidence" intervals, allows to approximate the BC_a interval endpoints analytically, without using any Monte Carlo replications at all (reduced computational costs). The procedure works by approximating the bootstrap sampling results by Taylor expansions. It is then, however, required that $\hat\theta \equiv \hat\theta(Y_1, \dots, Y_n)$ is a smooth function of $Y_1, \dots, Y_n$. This is for example not true for the sample median.
1.5 Subsampling: Inference for a sample maximum

Data: i.i.d. random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$. We now consider the situation that $Y_i$ only takes values in a compact interval $[0, \theta]$ such that
$$P(Y_i \in [0, \theta]) = 1.$$
Furthermore, $Y_i$ possesses a density $f$ which is continuous on $[0, \theta]$ and satisfies $f(y) > 0$ for $y \in (0, \theta]$, and $f(y) = 0$ for $y \notin [0, \theta]$. The maximum $\theta$ of $Y_i$ is unknown and has to be estimated from the data.

Similar types of extreme value problems frequently arise in econometrics. An example is the analysis of production efficiencies of different firms. The above situation may arise if we consider production outputs $Y_i$ of a sample of firms with identical inputs. A firm then is efficient if its output equals the maximal possible value $\theta$. Note that in practice usually more complicated problems have to be considered, where production outputs depend on individually different values of input variables ⇒ Frontier Analysis.

Consistent estimator $\hat\theta$ of $\theta$:
$$\hat\theta := \max_{i=1,\dots,n} Y_i$$

Constructing a confidence interval for $\theta$ is not an easy task. The distribution of $\hat\theta$ is not asymptotically normal. Indeed, it can be shown that $n(\theta - \hat\theta)$ follows asymptotically an exponential distribution with parameter $\lambda = \frac{1}{f(\theta)}$:
$$n(\theta - \hat\theta) \to_L Exp\left(\frac{1}{f(\theta)}\right)$$
The naive bootstrap fails:

- i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $\{Y_1, \dots, Y_n\}$ ⇒ bootstrap estimator $\hat\theta^* := \max_{i=1,\dots,n} Y_i^*$

- Unfortunately, the bootstrap is not consistent.

The reason is as follows: $\hat\theta = Y_{(n)}$, and hence $\hat\theta^* = \hat\theta = Y_{(n)}$ whenever $Y_{(n)} \in \{Y_1^*, \dots, Y_n^*\}$, which happens with probability $1 - (1 - \frac{1}{n})^n$. Some calculations then show that for large $n$
$$P^*(\hat\theta^* - \hat\theta = 0) = P(\hat\theta^* - \hat\theta = 0 \mid \mathcal{Y}_n) \approx 1 - e^{-1},$$
while $P(\hat\theta - \theta = 0) = 0$!

One can conclude that even for large sample sizes $\mathrm{distr}(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n)$ will be very different from $\mathrm{distr}(\hat\theta - \theta)$ ⇒ basic bootstrap confidence intervals are incorrect.

A possible remedy is to use subsampling. Similar to the ordinary bootstrap, subsampling relies on i.i.d. re-sampling from $\mathcal{Y}_n$, and the only difference consists in the fact that subsampling is based on drawing a smaller number $\kappa < n$ of observations.
Subsampling bootstrap:

- Choose some $\kappa < n$.

- Determine an i.i.d. re-sample $Y_1^*, \dots, Y_\kappa^*$ by drawing randomly $\kappa$ observations from $\{Y_1, \dots, Y_n\}$ ⇒ bootstrap estimator $\hat\theta^* := \max_{i=1,\dots,\kappa} Y_i^*$

For the above problem subsampling is consistent. If $\kappa = n^{\beta}$ for some $0 < \beta < 1$, then

- The law of $(\kappa(\hat\theta - \hat\theta^*) \mid \mathcal{Y}_n)$ converges stochastically to an $Exp(\frac{1}{f(\theta)})$-distribution.

More precisely, as $n \to \infty$, $\kappa = n^{\beta}$ for some $0 < \beta < 1$,
$$\sup_{\delta} \left| P\left(\kappa(\hat\theta - \hat\theta^*) \le \delta \,\middle|\, \mathcal{Y}_n\right) - F\left(\delta; \tfrac{1}{f(\theta)}\right) \right| \to_P 0,$$
where $F(\cdot; \frac{1}{f(\theta)})$ denotes the distribution function of an exponential distribution with parameter $\lambda = \frac{1}{f(\theta)}$.

⇒ Asymptotically: $\mathrm{distr}(\kappa(\hat\theta - \hat\theta^*) \mid \mathcal{Y}_n) \approx \mathrm{distr}(n(\theta - \hat\theta))$.

The subsampling bootstrap works under extremely general conditions, and it can often be applied in situations where the ordinary bootstrap fails. However, it usually does not make any sense to apply subsampling in regular cases, where the standard nonparametric bootstrap is consistent. Then subsampling is less efficient, and confidence intervals based on subsampling are less accurate. In practice, a major problem is the choice of $\kappa$.
Confidence interval based on subsampling:

- Calculation of $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ with
$$P^*\left(\kappa(\hat\theta - \hat\theta^*) \le \hat t_{\frac{\alpha}{2}}\right) = \frac{\alpha}{2}, \quad P^*\left(\kappa(\hat\theta - \hat\theta^*) \le \hat t_{1-\frac{\alpha}{2}}\right) = 1 - \frac{\alpha}{2},$$
where $P^*(\cdot)$ denotes probabilities calculated with respect to the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$.

- This yields
$$P^*\left(\hat t_{\frac{\alpha}{2}} \le \kappa(\hat\theta - \hat\theta^*) \le \hat t_{1-\frac{\alpha}{2}}\right) \approx 1 - \alpha,$$
and consistency of the bootstrap implies
$$P\left(\hat t_{\frac{\alpha}{2}} \le n(\theta - \hat\theta) \le \hat t_{1-\frac{\alpha}{2}}\right) \approx 1 - \alpha.$$

- Confidence interval for $\theta$:
$$\left[\hat\theta + \frac{\hat t_{\frac{\alpha}{2}}}{n},\; \hat\theta + \frac{\hat t_{1-\frac{\alpha}{2}}}{n}\right]$$
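A sketch of this construction (added illustration, not from the notes), assuming NumPy; $\beta = 0.7$ is an arbitrary choice for the subsample-size exponent, reflecting the choice-of-$\kappa$ problem mentioned above.

```python
import numpy as np

def subsampling_ci_max(y, beta=0.7, m=5000, alpha=0.05, rng=None):
    """Subsampling CI for the upper endpoint theta, estimated by max(Y).

    kappa = n**beta observations are drawn per resample; t_lo, t_hi are
    quantiles of kappa*(theta_hat - theta*) and the interval is
    [theta_hat + t_lo/n, theta_hat + t_hi/n].
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    n = len(y)
    kappa = int(n ** beta)
    theta_hat = y.max()
    theta_star = rng.choice(y, size=(m, kappa), replace=True).max(axis=1)
    stat = kappa * (theta_hat - theta_star)
    t_lo, t_hi = np.quantile(stat, [alpha / 2, 1 - alpha / 2])
    return theta_hat + t_lo / n, theta_hat + t_hi / n

y = np.random.default_rng(0).uniform(0, 2.0, size=500)  # true theta = 2
print(subsampling_ci_max(y, rng=1))
```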
1.6 Appendix

1.6.1 The empirical distribution function

Data: i.i.d. sample $X_1, \dots, X_n$; ordered sample $X_{(1)} \le \dots \le X_{(n)}$. The distribution of $X_i$ possesses a distribution function $F$ defined by
$$F(x) = P(X_i \le x)$$
Let $H_n(x)$ denote the number of observations $X_i$ satisfying $X_i \le x$. The empirical distribution function is then defined by
$$F_n(x) = H_n(x)/n = \text{proportion of observations } X_i \text{ with } X_i \le x$$

Properties:

- $0 \le F_n(x) \le 1$

- $F_n(x) = 0$ if $x < X_{(1)}$; $F_n(x) = 1$ if $x \ge X_{(n)}$

- $F_n$ is a monotonically increasing step function
Example:

x_1    x_2    x_3    x_4    x_5    x_6    x_7    x_8
5.20   4.80   5.40   4.60   6.10   5.40   5.80   5.50

[Figure: empirical distribution function of this sample, a step function rising from 0 to 1 over the range 4.0 to 6.5]
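The empirical distribution function is trivial to compute. A minimal sketch (added illustration, not from the notes) using the eight observations of the example:

```python
import numpy as np

def ecdf(x):
    """Return F_n: the proportion of observations <= t."""
    xs = np.sort(np.asarray(x))
    return lambda t: np.searchsorted(xs, t, side="right") / len(xs)

x = [5.20, 4.80, 5.40, 4.60, 6.10, 5.40, 5.80, 5.50]
F_n = ecdf(x)
print(F_n(5.40))            # 5/8 = 0.625: five observations are <= 5.40
print(F_n(4.0), F_n(6.5))   # 0.0 below the minimum, 1.0 above the maximum
```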
Theoretical properties of $F_n$

Theorem: For every $x \in \mathbb{R}$ we obtain
$$n F_n(x) \sim B(n, F(x)),$$
i.e. $n F_n(x)$ follows a binomial distribution with parameters $n$ and $F(x)$. The probability distribution of $F_n(x)$ is thus given by
$$P\left(F_n(x) = \frac{m}{n}\right) = \binom{n}{m} F(x)^m (1 - F(x))^{n-m}, \quad m = 0, 1, \dots, n$$

Consequences:

- $E(F_n(x)) = F(x)$, i.e. $F_n(x)$ is an unbiased estimator of $F(x)$.

- $Var(F_n(x)) = \frac{1}{n} F(x)(1 - F(x))$ ⇒ the standard error of $F_n(x)$ decreases as $n$ increases. ($F_n(x)$ is a consistent estimator of $F(x)$.)

Theorem of Glivenko-Cantelli:
$$P\left(\lim_{n \to \infty} \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| = 0\right) = 1$$
1.6.2 Consistency of estimators

Any reasonable estimator $\hat\theta$ of a parameter $\theta$ must be consistent. Intuitively this means that the distribution of $\hat\theta \equiv \hat\theta_n$ must become more and more concentrated around the true value $\theta$ as $n \to \infty$. The mathematical formalization of consistency relies on general concepts quantifying convergence of random variables.

- Convergence in probability: Let $X_1, X_2, \dots$ and $X$ be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. $X_n$ converges in probability to $X$ if
$$\lim_{n \to \infty} P[|X_n - X| < \epsilon] = 1$$
for every $\epsilon > 0$. One often uses the notation $X_n \to_P X$.

- Weak consistency: An estimator $\hat\theta$ is called weakly consistent if $\hat\theta_n \to_P \theta$.

- Convergence in mean square: Let $X_1, X_2, \dots$ and $X$ be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. $X_n$ converges in mean square to $X$ if
$$\lim_{n \to \infty} E\left[|X_n - X|^2\right] = 0$$
Notation: $X_n \to_{MSE} X$.

- Mean square consistency: $\hat\theta$ is mean square consistent if $\hat\theta_n \to_{MSE} \theta$.
- Strong convergence (convergence with probability 1): Let $X_1, X_2, \dots$ and $X$ be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. $X_n$ converges with probability 1 (or almost surely) to $X$ if
$$P\left[\lim_{n \to \infty} X_n = X\right] = 1$$
Notation: $X_n \to_{a.s.} X$.

- Strong consistency (consistency with probability 1): An estimator $\hat\theta$ is strongly consistent if $\hat\theta_n \to_{a.s.} \theta$.

- $X_n \to_{MSE} X$ implies $X_n \to_P X$; $X_n \to_{a.s.} X$ implies $X_n \to_P X$.

Application: Law of large numbers

We obtain $E(\bar X) = \mu$ as well as $Var(\bar X) = \frac{\sigma^2}{n}$ ⇒
$$MSE(\bar X) := E((\bar X - \mu)^2) = Var(\bar X) = \frac{\sigma^2}{n} \to 0 \text{ as } n \to \infty \quad \Rightarrow \quad \bar X \to_P \mu \text{ as } n \to \infty$$

Example: Consider a normally distributed random variable $X \sim N(\mu, 0.18^2)$ with unknown mean $\mu$ but known standard deviation $\sigma = 0.18$. Random sample $X_1, \dots, X_n$ ⇒ estimator $\bar X$ of $\mu$. Recall: $\bar X \sim N(\mu, \frac{\sigma^2}{n}) = N(\mu, \frac{0.18^2}{n})$.

- $n = 9$: standard error $= 0.06$, $MSE(\bar X) = 0.0036$; $P[\mu - 0.1176 \le \bar X \le \mu + 0.1176] = 0.95$

- $n = 144$: standard error $= 0.015$, $MSE(\bar X) = 0.000225$; $P[\mu - 0.0294 \le \bar X \le \mu + 0.0294] = 0.95$

[Figure: densities of $\bar X$ for $n = 9$ and $n = 144$, each with the 2.5% tail areas marked]
1.6.3 Convergence in distribution

- Let $Z_1, Z_2, \dots$ be a sequence of random variables with distribution functions $F_1, F_2, \dots$, and let $Z$ be a random variable with distribution function $F$. $Z_n$ converges in distribution to $Z$ if
$$\lim_{n \to \infty} F_n(t) = F(t) \quad \text{at every continuity point } t \text{ of } F$$
Notation: $Z_n \to_L Z$.

The central limit theorem

Theorem (Ljapunov): Let $X_1, X_2, \dots$ be a sequence of independent random variables with means $E(X_i) = \mu_i$ and variances $Var(X_i) = E((X_i - \mu_i)^2) = \sigma_i^2 > 0$. Furthermore assume that $E(|X_i - \mu_i|^3) = \kappa_i < \infty$. If
$$\frac{\left(\sum_{i=1}^n \kappa_i\right)^{1/3}}{\left(\sum_{i=1}^n \sigma_i^2\right)^{1/2}} \to 0 \quad \text{as } n \to \infty,$$
then
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{\left(\sum_{i=1}^n \sigma_i^2\right)^{1/2}} \to_L N(0, 1)$$

Sometimes the notation $Z_n \sim AN(0, 1)$ is used instead of $Z_n \to_L N(0, 1)$.

Important information about the speed of convergence to a normal distribution is given by the Berry-Esséen theorem:
Theorem (Berry-Esséen): Let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables with mean $E(X_i) = \mu$ and variance $Var(X_i) = E((X_i - \mu)^2) = \sigma^2 > 0$. Then, if $G_n$ denotes the distribution function of $\frac{\sqrt{n}(\bar X - \mu)}{\sigma}$,
$$\sup_t |G_n(t) - \Phi(t)| \le \frac{33}{4} \cdot \frac{E(|X_i - \mu|^3)}{\sigma^3\, n^{1/2}}$$
1.6.4 Stochastic order symbols (rates of convergence)

In mathematical notation the symbols $O(\cdot)$ and $o(\cdot)$ are often used in order to quantify the speed (rate) of convergence of a sequence of numbers.

Let $\delta_1, \delta_2, \delta_3, \dots$ and $r_1, r_2, r_3, \dots$ be (deterministic) sequences of numbers.

- The notation $\delta_n = O(1)$ indicates that the sequence $\delta_1, \delta_2, \dots$ is bounded. More precisely, there exists an $M < \infty$ such that $|\delta_n| \le M$ for all $n \in \mathbb{N}$.

- $\delta_n = o(1)$ means that $\delta_n \to 0$.

- $\delta_n = O(r_n)$ means that $|\delta_n|/|r_n| = O(1)$.

- $\delta_n = o(r_n)$ means that $|\delta_n|/|r_n| \to 0$.

Examples: $\sum_{i=1}^n i = O(n^2)$, $\sum_{i=1}^n i = o(n^3)$.

Stochastic order symbols $O_P(\cdot)$ and $o_P(\cdot)$ are used to quantify the speed (rate) of convergence of a sequence of random variables. Let $Z_1, Z_2, Z_3, \dots$ be a sequence of random variables, and let $r_1, r_2, \dots$ be either a deterministic sequence of numbers or a sequence of random variables.
- We will write $Z_n = O_P(1)$ if for every $\epsilon > 0$ there exists an $M_\epsilon < \infty$ and an $n_\epsilon \in \mathbb{N}$ such that
$$P(|Z_n| > M_\epsilon) \le \epsilon \quad \text{for all } n \ge n_\epsilon$$
In other words, $Z_n = O_P(1)$ indicates that the random variables $Z_n$ are stochastically bounded.

- We will write $Z_n = o_P(1)$ if and only if $Z_n \to_P 0$.

- $Z_n = O_P(V_n)$ means that $|Z_n|/|V_n| = O_P(1)$.

- $Z_n = o_P(V_n)$ means that $|Z_n|/|V_n| \to_P 0$.

Example: $\bar X - \mu = O_P(n^{-1/2})$
1.6.5 Important inequalities

Inequality of Chebychev:
$$P[|X - \mu| > k\sigma] \le \frac{1}{k^2} \quad \text{for all } k > 0$$
$$\Leftrightarrow \quad P[\mu - k\sigma \le X \le \mu + k\sigma] \ge 1 - \frac{1}{k^2}$$

k    P[μ − kσ ≤ X ≤ μ + kσ]
2    ≥ 1 − 1/4 = 0.75
3    ≥ 1 − 1/9 ≈ 0.89
4    ≥ 1 − 1/16 = 0.9375

Generalization:
$$P[|X - \mu| > k] \le \frac{E(|X - \mu|^r)}{k^r} \quad \text{for all } k > 0, \; r = 1, 2, \dots$$
Cauchy-Schwarz inequality:

Let $x_1, \dots, x_n$ and $y_1, \dots, y_n$ be arbitrary real numbers. Then
$$\left(\sum_{i=1}^n x_i y_i\right)^2 \le \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right)$$
Integrated version:
$$\left(\int_a^b f(x) g(x)\, dx\right)^2 \le \left(\int_a^b f(x)^2\, dx\right)\left(\int_a^b g(x)^2\, dx\right)$$
Application to random variables:
$$(E(XY))^2 \le E(X^2) \cdot E(Y^2)$$

Hölder inequality:

Let $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. Let $x_i, y_i \ge 0$, $i = 1, \dots, n$ be arbitrary numbers. Then
$$\sum_{i=1}^n x_i y_i \le \left(\sum_{i=1}^n x_i^p\right)^{1/p} \left(\sum_{i=1}^n y_i^q\right)^{1/q}$$
Integrated version ($f(x) \ge 0$, $g(x) \ge 0$):
$$\int_a^b f(x) g(x)\, dx \le \left(\int_a^b f(x)^p\, dx\right)^{1/p} \left(\int_a^b g(x)^q\, dx\right)^{1/q}$$
Application to random variables:
$$E(|X| \cdot |Y|) \le (E(|X|^p))^{1/p} (E(|Y|^q))^{1/q}$$
2 Bootstrap and Regression Models

Problem: Analyze the influence of some explanatory (independent) variables $X_1, X_2, \dots, X_p$ on a response variable (or dependent variable) $Y$.

Observations:
$$(Y_1, X_{11}, \dots, X_{1p}), (Y_2, X_{21}, \dots, X_{2p}), \dots, (Y_n, X_{n1}, \dots, X_{np})$$

Model:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \epsilon_i$$
$$\epsilon_1, \dots, \epsilon_n \text{ i.i.d.}, \quad E(\epsilon_i) = 0, \quad Var(\epsilon_i) = \sigma^2 \quad [\epsilon_i \sim N(0, \sigma^2)]$$

The linear structure of the regression function as postulated by the model,
$$\beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} = m(X_{i1}, \dots, X_{ip}) = E(Y \mid X_1 = X_{i1}, \dots, X_p = X_{ip}),$$
is necessarily fulfilled if $(Y_i, X_{i1}, X_{i2}, \dots, X_{ip})^T$ is a multivariate normal random vector.
Remark: Regression analysis is usually a conditional analysis. The goal is to estimate the regression function $m$, which is the conditional expectation of $Y$ given $X_1, \dots, X_p$. Standard inference studies the behavior of estimators conditional on the observed values.

However, different types of bootstrap may be used depending on how the data is generated.

1) Random design: $(Y_1, X_{11}, \dots, X_{1p}), (Y_2, X_{21}, \dots, X_{2p}), \dots, (Y_n, X_{n1}, \dots, X_{np})$ is a sample of i.i.d. random vectors, i.e. observations are independent and identically distributed. Example: $p + 1$ measurements from $n$ individuals randomly drawn from an underlying population.

2) $(X_{j1}, \dots, X_{jp})$, $j = 1, \dots, n$, are random vectors which are, however, not independent or not identically distributed (e.g. time series data, where the X-variables are observed in successive time periods).

3) Fixed design: Data are collected at pre-specified, non-random values $X_{jk}$ (corresponding for example to different experimental conditions).
The model can be rewritten in matrix notation:
$$Y = X\beta + \epsilon, \quad E(\epsilon) = 0, \quad Cov(\epsilon) = \sigma^2 I_n, \quad [\epsilon \sim N_n(0, \sigma^2 I_n)]$$
with
$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1p} \\ 1 & X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{np} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

The parameter vector $\beta = (\beta_0, \dots, \beta_p)^T$ is usually estimated by least squares:

Least squares method: Determine $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_p$ by minimizing
$$Q(\beta_0, \dots, \beta_p) = \sum_{i=1}^n (Y_i - \hat Y_i)^2 = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_{i1} - \dots - \beta_p X_{ip})^2$$

Least squares estimator:
$$\hat\beta = [X^T X]^{-1} X^T Y$$
Let $E^*$ and $Cov^*$ denote conditional expectation and covariances given the observed X-values.

Properties of $\hat\beta$:

1. $\hat\beta$ is an unbiased estimator of $\beta$:
$$E^*(\hat\beta) = \begin{pmatrix} E^*(\hat\beta_0) \\ \vdots \\ E^*(\hat\beta_p) \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix} = \beta$$

2. Covariance matrix:
$$Cov^*(\hat\beta) = Cov^*([X^T X]^{-1} X^T Y) = [X^T X]^{-1} X^T Cov(Y) X [X^T X]^{-1} = \sigma^2 [X^T X]^{-1} X^T X [X^T X]^{-1} = \sigma^2 [X^T X]^{-1}$$

3. Distribution under normality: If $\epsilon_i \sim N(0, \sigma^2)$ then $\epsilon \sim N_n(0, \sigma^2 I_n)$, and consequently
$$\hat\beta \sim N_{p+1}\left(\beta, \sigma^2 [X^T X]^{-1}\right)$$

4. Asymptotic distribution: Assume that $\frac{1}{n}\sum_i X_{ij} X_{ik} \to c_{jk}$ as well as $\frac{1}{n}\sum_i X_{ij} \to c_{0j}$ as $n \to \infty$. Note that $c_{jk} = E(X_j X_k)$ and $c_{0j} = E(X_j)$ in the case of random design. Furthermore, let $C$ denote the $(p+1) \times (p+1)$ matrix with elements $c_{jk}$, $j, k = 0, \dots, p$, $c_{00} = 1$, $c_{j0} = c_{0j}$, and assume that $C$ is of full rank. Then
$$\sqrt{n}(\hat\beta - \beta) \to_L N_{p+1}\left(0, \sigma^2 C^{-1}\right)$$
Estimation of $\sigma^2$:

- The residuals $\hat\epsilon_i = Y_i - \hat Y_i = Y_i - \sum_{j=0}^p \hat\beta_j X_{ij}$ (with $X_{i0} := 1$) estimate the error terms $\epsilon_i$.

- Estimator $\hat\sigma^2$ of $\sigma^2$:
$$\hat\sigma^2 = \frac{1}{n - p - 1}\sum_{i=1}^n (Y_i - \hat Y_i)^2$$

- $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$.

- If the true error terms $\epsilon_i$ are normally distributed, then $(n - p - 1)\frac{\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p-1}$.

- Let $a_{ij}$, $i, j = 0, \dots, p$, denote the elements of the matrix $A = [X^T X]^{-1}$. Then, for normal errors,
$$\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{a_{jj}}} \sim t_{n-p-1}$$
⇒ standard confidence intervals and tests for the parameter estimates.

Note: Under the normality assumption, $\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{a_{jj}}}$ is a pivot statistic. In the general case (under some weak regularity conditions), this quantity is an asymptotic pivot statistic: $n\, a_{jj}$ converges to the $j$-th diagonal element of the matrix $C^{-1}$, and therefore
$$\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{a_{jj}}} \to_L N(0, 1) \quad \text{as } n \to \infty$$
2.1 Bootstrapping Pairs

The usual, nonparametric bootstrap is applicable if the data is generated by a random design. Let $X_i = (X_{i1}, \dots, X_{ip})$. The construction of bootstrap confidence intervals then proceeds as follows:

Basic bootstrap confidence interval:

- Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$.

- Random samples $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{(Y_1, X_1), \dots, (Y_n, X_n)\}$.

- $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ ⇒ least squares estimators $\hat\beta_j^*$, $j = 0, \dots, p$.

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2},j}$ and $\hat t_{1-\frac{\alpha}{2},j}$ of the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n := \{(Y_1, X_1), \dots, (Y_n, X_n)\}$:
$$P^*(\hat\beta_j^* \le \hat t_{\frac{\alpha}{2},j}) \le \frac{\alpha}{2}, \quad P^*(\hat\beta_j^* > \hat t_{\frac{\alpha}{2},j}) \ge 1 - \frac{\alpha}{2},$$
$$P^*(\hat\beta_j^* \le \hat t_{1-\frac{\alpha}{2},j}) \ge 1 - \frac{\alpha}{2}, \quad P^*(\hat\beta_j^* > \hat t_{1-\frac{\alpha}{2},j}) \le \frac{\alpha}{2}.$$
Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.

- Approximate $1 - \alpha$ (symmetric) confidence interval for $\beta_j$:
$$\left[2\hat\beta_j - \hat t_{1-\frac{\alpha}{2},j},\; 2\hat\beta_j - \hat t_{\frac{\alpha}{2},j}\right]$$
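A sketch of pairs resampling (added illustration, not from the notes), assuming NumPy; the simulated data in the usage example deliberately has heteroscedastic errors, the case where pairs bootstrapping remains valid (see the Remark below).

```python
import numpy as np

def pairs_bootstrap(y, X, m=2000, alpha=0.05, rng=None):
    """Basic bootstrap CIs for all coefficients by resampling (Y_i, X_i) pairs.

    X is the n x (p+1) design matrix including the column of ones.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    beta_star = np.empty((m, X.shape[1]))
    for b in range(m):
        idx = rng.integers(0, n, n)          # resample rows with replacement
        beta_star[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    t_lo, t_hi = np.quantile(beta_star, [alpha / 2, 1 - alpha / 2], axis=0)
    return 2 * beta_hat - t_hi, 2 * beta_hat - t_lo

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * (1 + x))  # unequal error variances
X = np.column_stack([np.ones_like(x), x])
lo, hi = pairs_bootstrap(y, X, rng=1)
print(lo, hi)
```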
Remark: Under some weak regularity conditions the bootstrap is consistent whenever
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \epsilon_i$$
for independent errors $\epsilon_i$ with $E(\epsilon_i) = 0$ and $var(\epsilon_i) = \sigma^2(X_i) < \infty$. In other words, the basic bootstrap confidence interval provides an asymptotically (first order) accurate confidence interval even if the errors are heteroscedastic (unequal variances)! This is not true for the standard t-intervals.

Modification: Bootstrap-t intervals:

- Random samples $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{(Y_1, X_1), \dots, (Y_n, X_n)\}$.

- Use $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ to determine least squares estimators $\hat\beta_j^*$, $j = 0, \dots, p$, as well as estimators $(\hat\sigma^2)^*$ of the error variance $\sigma^2$.

- With $a_{jj}^*$ denoting the $j$-th diagonal element of the matrix $A^* = [(X^*)^T X^*]^{-1}$, compute
$$\frac{\hat\beta_j^* - \hat\beta_j}{\hat\sigma^* \sqrt{a_{jj}^*}}$$

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat\kappa_{\frac{\alpha}{2},j}$ and $\hat\kappa_{1-\frac{\alpha}{2},j}$ of the conditional distribution of $\frac{\hat\beta_j^* - \hat\beta_j}{\hat\sigma^* \sqrt{a_{jj}^*}}$.

- This yields the $1 - \alpha$ confidence interval
$$\left[\hat\beta_j - \hat\kappa_{1-\frac{\alpha}{2},j}\, \hat\sigma \sqrt{a_{jj}},\; \hat\beta_j - \hat\kappa_{\frac{\alpha}{2},j}\, \hat\sigma \sqrt{a_{jj}}\right]$$

Different from the basic bootstrap interval, this bootstrap-t interval will be incorrect for heteroscedastic errors.
In order to understand bootstrap behavior for random design let us analyze the simplest case with $p = 1$. Then $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$. Consider the estimator
$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$$
of the slope $\beta_1$.

Random design implies that $(Y_i, X_i)$, and hence $(\epsilon_i, X_i)$, $i = 1, \dots, n$, are independent and identically distributed. Under some regularity conditions (existence of moments) we have
$$\frac{1}{n}\sum_i (X_i - \bar X)^2 \to_P E(X_i - \mu_X)^2 = \sigma_X^2,$$
and the central limit theorem implies that
$$\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\epsilon_i \to_L N(0, v^2_{\epsilon,X}),$$
where
$$v^2_{\epsilon,X} = E\left[(X_i - \mu_X)^2 \epsilon_i^2\right].$$
If $\epsilon_i$ and $X_i$ are independent and $\sigma^2 = var(\epsilon_i)$ does not depend on $X_i$, then $v^2_{\epsilon,X} = \sigma_X^2 \sigma^2$. We then generally obtain for large $n$
$$\mathrm{distr}(\sqrt{n}(\hat\beta_1 - \beta_1)) = \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}\right) \approx \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\epsilon_i}{\sigma_X^2}\right) \approx N\left(0, \frac{v^2_{\epsilon,X}}{\sigma_X^4}\right)$$
Now consider the bootstrap estimator $\hat\beta_1^*$,
$$\hat\beta_1^* = \frac{\sum_i (X_i^* - \bar X^*) Y_i^*}{\sum_i (X_i^* - \bar X^*)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i^* - \bar X^*)\hat\epsilon_i^*}{\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2},$$
where $\hat\epsilon_i^* = Y_i^* - \hat\beta_0 - \hat\beta_1 X_i^*$.

Recall that by definition, $(Y_i^*, X_i^*)$, and hence $(\hat\epsilon_i^*, X_i^*)$, $i = 1, \dots, n$, are independent and identically distributed observations (conditional on $\mathcal{Y}_n$). We obtain
$$E\left(\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2 \,\middle|\, \mathcal{Y}_n\right) \approx \frac{1}{n}\sum_i (X_i - \bar X)^2 =: \hat\sigma_X^2, \quad \left|\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2 - \hat\sigma_X^2\right| \to_P 0$$
as $n \to \infty$. Moreover, $E\left(\frac{1}{\sqrt{n}}\sum_i (X_i^* - \bar X^*)\hat\epsilon_i^* \,\middle|\, \mathcal{Y}_n\right) \approx 0$ and
$$var\left(\frac{1}{\sqrt{n}}\sum_i (X_i^* - \bar X^*)\hat\epsilon_i^* \,\middle|\, \mathcal{Y}_n\right) \approx \frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\epsilon_i^2$$
By the central limit theorem we obtain that for large $n$
$$\mathrm{distr}\left(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \,\middle|\, \mathcal{Y}_n\right) \approx N\left(0, \frac{\frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\epsilon_i^2}{\hat\sigma_X^4}\right).$$
Since $\frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\epsilon_i^2 \to_P v^2_{\epsilon,X}$ and $\hat\sigma_X^2 \to_P \sigma_X^2$, we can conclude that asymptotically
$$\mathrm{distr}(\sqrt{n}(\hat\beta_1 - \beta_1)) \approx \mathrm{distr}\left(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \,\middle|\, \mathcal{Y}_n\right)$$
⇒ Bootstrap consistent
2.2 Bootstrapping Residuals

Bootstrapping residuals is applicable independent of the particular design of the regression model. The only crucial assumption is that the error terms $\epsilon_i$ are i.i.d. with constant variance $\sigma^2$.

Residuals: $\hat\epsilon_i = Y_i - \hat Y_i = Y_i - \sum_{j=0}^p \hat\beta_j X_{ij}$ (with $X_{i0} := 1$)

Matrix notation:
$$\hat\epsilon = \begin{pmatrix} \hat\epsilon_1 \\ \vdots \\ \hat\epsilon_n \end{pmatrix} = (I - \underbrace{X[X^T X]^{-1} X^T}_{H})\, Y, \quad Cov(\hat\epsilon) = \sigma^2 (I - H)$$

With $h_{ii} > 0$ denoting the $i$-th diagonal element of $H$ we thus obtain
$$var(\hat\epsilon_i) = \sigma^2 (1 - h_{ii}) < \sigma^2$$

Standardized residuals:
$$r_i = \frac{\hat\epsilon_i}{\sqrt{1 - h_{ii}}} \quad \Rightarrow \quad var(r_i) = \sigma^2$$

We have $\sum_i \hat\epsilon_i = 0$. For the standardized residuals it is, however, not guaranteed that $\bar r = \frac{1}{n}\sum_i r_i$ is equal to zero. The residual bootstrap thus relies on resampling centered standardized residuals $\tilde r_i := r_i - \bar r$.
Note: Residual plots play an important role in validating regression models.

a.) Nonlinear model: [Figure "Mangelnde Modellanpassung" (lack of fit): residuals vs. fitted y, showing a systematic pattern]

b.) Heteroscedasticity: [Figure "Heteroskedastizität": residuals vs. fitted y_i, with spread increasing in the fitted values]
Bootstrapping Residuals

- Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$ ⇒ estimator $\hat\beta$.

- Calculate (centered) standardized residuals
$$r_i = \frac{\hat\epsilon_i}{\sqrt{1 - h_{ii}}}, \quad \tilde r_i = r_i - \bar r, \quad i = 1, \dots, n$$

- Generate random samples $\epsilon_1^*, \dots, \epsilon_n^*$ of residuals by drawing observations independently and with replacement from $\{\tilde r_1, \dots, \tilde r_n\}$.

- Calculate
$$Y_i^* = \hat\beta_0 + \sum_{j=1}^p \hat\beta_j X_{ij} + \epsilon_i^*, \quad i = 1, \dots, n$$

- Bootstrap estimators $\hat\beta^*$ are determined by least squares estimation from the data $(Y_1^*, X_1), \dots, (Y_n^*, X_n)$.

Basic bootstrap confidence intervals:

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2},j}$ and $\hat t_{1-\frac{\alpha}{2},j}$ of the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$:
$$P^*(\hat\beta_j^* \le \hat t_{\frac{\alpha}{2},j}) \le \frac{\alpha}{2}, \quad P^*(\hat\beta_j^* > \hat t_{\frac{\alpha}{2},j}) \ge 1 - \frac{\alpha}{2},$$
$$P^*(\hat\beta_j^* \le \hat t_{1-\frac{\alpha}{2},j}) \ge 1 - \frac{\alpha}{2}, \quad P^*(\hat\beta_j^* > \hat t_{1-\frac{\alpha}{2},j}) \le \frac{\alpha}{2}.$$
Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.

- Approximate $1 - \alpha$ (symmetric) confidence interval:
$$\left[2\hat\beta_j - \hat t_{1-\frac{\alpha}{2},j},\; 2\hat\beta_j - \hat t_{\frac{\alpha}{2},j}\right]$$

Bootstrap-t intervals can be determined similarly.
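A sketch of the residual bootstrap (added illustration, not from the notes), assuming NumPy and homoscedastic errors as required above; note that the design matrix X is held fixed across replicates.

```python
import numpy as np

def residual_bootstrap(y, X, m=2000, rng=None):
    """Bootstrap replicates of beta_hat by resampling centered
    standardized residuals (i.i.d., constant-variance errors assumed)."""
    rng = np.random.default_rng(rng)
    n = len(y)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta_hat
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
    r = (y - fitted) / np.sqrt(1 - np.diag(H))    # standardized residuals
    r_tilde = r - r.mean()                        # centered
    beta_star = np.empty((m, X.shape[1]))
    for b in range(m):
        eps_star = rng.choice(r_tilde, size=n, replace=True)
        beta_star[b] = np.linalg.lstsq(X, fitted + eps_star, rcond=None)[0]
    return beta_hat, beta_star

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)
X = np.column_stack([np.ones_like(x), x])
beta_hat, beta_star = residual_bootstrap(y, X, rng=1)
print(beta_hat, beta_star.std(axis=0))  # estimates and bootstrap std. errors
```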
In order to understand the residual bootstrap let us again analyze the simplest case with $p = 1$, and recall that
$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$$
Let $\hat\sigma_X^2 := \frac{1}{n}\sum_i (X_i - \bar X)^2$. If the errors $\epsilon_i$ are i.i.d. zero mean random variables with $var(\epsilon_i) = \sigma^2$, then (under some regularity conditions) the central limit theorem implies that conditional on the observed values $X_1, \dots, X_n$
$$\mathrm{distr}(\sqrt{n}(\hat\beta_1 - \beta_1)) = \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}\right) \approx N\left(0, \frac{\sigma^2}{\hat\sigma_X^2}\right)$$
holds for large $n$.

By definition,
$$\hat\beta_1^* = \frac{\sum_i (X_i - \bar X) Y_i^*}{\sum_i (X_i - \bar X)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i^*}{\frac{1}{n}\sum_i (X_i - \bar X)^2}.$$
We have
$$E(\epsilon_i^* \mid \mathcal{Y}_n) = 0, \quad var(\epsilon_i^* \mid \mathcal{Y}_n) = \frac{1}{n}\sum_i \tilde r_i^2 =: \hat\sigma^2,$$
and therefore
$$var\left(\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\epsilon_i^* \,\middle|\, \mathcal{Y}_n\right) = \frac{1}{n}\sum_i (X_i - \bar X)^2\, \hat\sigma^2$$
The central limit theorem then leads to
$$\mathrm{distr}\left(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \,\middle|\, \mathcal{Y}_n\right) \approx N\left(0, \frac{\hat\sigma^2}{\hat\sigma_X^2}\right).$$
⇒ Bootstrap consistent, since $\hat\sigma^2 \to_P \sigma^2$ as $n \to \infty$.
2.3 Wild Bootstrap

The residual bootstrap is not consistent if the errors $\epsilon_i$ are heteroscedastic, i.e. $var(\epsilon_i) = \sigma_i^2$. In this case the wild bootstrap offers an alternative.

There are several versions of the wild bootstrap. In its simplest form this procedure works as follows: Conditional on $\mathcal{Y}_n$, a bootstrap sample $\epsilon_1^*, \dots, \epsilon_n^*$ of residuals is determined by generating $n$ independent random variables from the following binary distributions:
$$P\left(\epsilon_i^* = \tilde r_i \cdot \frac{1 - \sqrt{5}}{2}\right) = \gamma, \quad P\left(\epsilon_i^* = \tilde r_i \cdot \frac{1 + \sqrt{5}}{2}\right) = 1 - \gamma,$$
$i = 1, \dots, n$, where $\gamma = \frac{5 + \sqrt{5}}{10}$.

The constants are chosen in such a way that
$$E(\epsilon_i^* \mid \mathcal{Y}_n) = E^*(\epsilon_i^*) = 0,$$
$$var(\epsilon_i^* \mid \mathcal{Y}_n) = var^*(\epsilon_i^*) = \tilde r_i^2,$$
$$E((\epsilon_i^*)^3 \mid \mathcal{Y}_n) = E^*((\epsilon_i^*)^3) = \tilde r_i^3$$
Implementation of the wild bootstrap:

- Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$ ⇒ estimator $\hat\beta$.

- Calculate (centered) standardized residuals
$$r_i = \frac{\hat\epsilon_i}{\sqrt{1 - h_{ii}}}, \quad \tilde r_i = r_i - \bar r, \quad i = 1, \dots, n$$

- Generate $n$ independent random variables $\epsilon_i^*$ from binary distributions,
$$P\left(\epsilon_i^* = \tilde r_i \cdot \frac{1 - \sqrt{5}}{2}\right) = \gamma, \quad P\left(\epsilon_i^* = \tilde r_i \cdot \frac{1 + \sqrt{5}}{2}\right) = 1 - \gamma,$$
$i = 1, \dots, n$, where $\gamma = \frac{5 + \sqrt{5}}{10}$.

- Calculate
$$Y_i^* = \hat\beta_0 + \sum_{j=1}^p \hat\beta_j X_{ij} + \epsilon_i^*, \quad i = 1, \dots, n$$

- Bootstrap estimators $\hat\beta^*$ are determined by least squares estimation from the data $(Y_1^*, X_1), \dots, (Y_n^*, X_n)$.

Basic bootstrap confidence intervals:

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2},j}$ and $\hat t_{1-\frac{\alpha}{2},j}$ of the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.

- Approximate $1 - \alpha$ (symmetric) confidence interval:
$$\left[2\hat\beta_j - \hat t_{1-\frac{\alpha}{2},j},\; 2\hat\beta_j - \hat t_{\frac{\alpha}{2},j}\right]$$

Bootstrap-t intervals can be determined similarly.
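A sketch of this implementation (added illustration, not from the notes), assuming NumPy; the two-point weights are exactly those given above, and the usage example uses deliberately heteroscedastic errors.

```python
import numpy as np

def wild_bootstrap(y, X, m=2000, rng=None):
    """Bootstrap replicates of beta_hat with the two-point wild-bootstrap
    weights defined above; valid under heteroscedastic errors."""
    rng = np.random.default_rng(rng)
    n = len(y)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta_hat
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    r = (y - fitted) / np.sqrt(1 - np.diag(H))
    r_tilde = r - r.mean()
    # Two-point distribution: (1 - sqrt(5))/2 with prob gamma, else (1 + sqrt(5))/2.
    low, high = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2
    gamma = (5 + np.sqrt(5)) / 10
    beta_star = np.empty((m, X.shape[1]))
    for b in range(m):
        v = np.where(rng.random(n) < gamma, low, high)
        beta_star[b] = np.linalg.lstsq(X, fitted + r_tilde * v, rcond=None)[0]
    return beta_hat, beta_star

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3 * (1 + x))  # unequal variances
X = np.column_stack([np.ones_like(x), x])
beta_hat, beta_star = wild_bootstrap(y, X, rng=1)
print(beta_hat, beta_star.std(axis=0))
```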
In order to understand the basic intuition let us again analyze the simplest case with $p = 1$, and recall that
$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$$
It is now assumed that the errors $\epsilon_i$ are independent with $var(\epsilon_i) = \sigma_i^2$. Let $\hat\sigma_X^2 := \frac{1}{n}\sum_i (X_i - \bar X)^2$ and $v^2_{\epsilon,X} = \frac{1}{n}\sum_i (X_i - \bar X)^2 \sigma_i^2$. Under some regularity conditions the central limit theorem implies that conditional on the observed values $X_1, \dots, X_n$
$$\mathrm{distr}(\sqrt{n}(\hat\beta_1 - \beta_1)) = \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}\right) \approx N\left(0, \frac{v^2_{\epsilon,X}}{\hat\sigma_X^4}\right)$$
holds for large $n$.

As above,
$$\hat\beta_1^* = \frac{\sum_i (X_i - \bar X) Y_i^*}{\sum_i (X_i - \bar X)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i^*}{\frac{1}{n}\sum_i (X_i - \bar X)^2},$$
and by construction
$$var\left(\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\epsilon_i^* \,\middle|\, \mathcal{Y}_n\right) = \frac{1}{n}\sum_i (X_i - \bar X)^2 \tilde r_i^2 =: \hat w^2_{\epsilon,X}.$$
For large $n$, the central limit theorem then leads to
$$\mathrm{distr}\left(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \,\middle|\, \mathcal{Y}_n\right) \approx N\left(0, \frac{\hat w^2_{\epsilon,X}}{\hat\sigma_X^4}\right).$$
We have $E(\tilde r_i^2) = \sigma_i^2 + O(\frac{1}{n})$, and thus for large $n$
$$E(\hat w^2_{\epsilon,X}) = \frac{1}{n}\sum_i (X_i - \bar X)^2\, E(\tilde r_i^2) \approx v^2_{\epsilon,X}$$
Under some regularity conditions the law of large numbers then implies that $|\hat w^2_{\epsilon,X} - v^2_{\epsilon,X}| \to 0$ as $n \to \infty$. ⇒ Wild bootstrap consistent.
2.4 Generalizations

The above types of bootstrap (bootstrapping pairs, bootstrapping residuals, wild bootstrap) can also be useful in more complex regression setups. An appropriate method then has to be selected in dependence of existing knowledge about the underlying design and structure of residuals.

1) Nonlinear regression:
$$Y_i = g(X_i, \theta) + \epsilon_i,$$
where $g$ is a nonlinear function of $\theta$.

Example: Depreciation of a car (CV Citroën); $X$ — age of the car (in years); $Y$ — depreciation = selling price / original price (new car).

[Figure "Wertverlust eines Autos" (depreciation of a car): scatterplot of relative depreciation $Y$ against age $X$ in years, with a fitted exponentially decaying curve]
Model: $Y_i = e^{-\theta X_i} + \epsilon_i$

An estimator $\hat\theta$ is determined by (nonlinear) least squares; residuals: $\hat\epsilon_i = Y_i - e^{-\hat\theta X_i}$.

Bootstrap: Random design ⇒ bootstrapping pairs; bootstrapping residuals for homoscedastic errors; wild bootstrap for heteroscedastic errors.

2) Median regression:

Linear model: $Y_i = \beta_0 + \sum_j \beta_j X_{ij} + \epsilon_i$

In some applications the errors possess heavy tails (⇒ outliers!). In such situations estimation of $\beta$ by least squares may not be appropriate, and statisticians tend to use more robust methods. A sensible procedure then is to determine estimates $\hat\beta$ by minimizing
$$\sum_{i=1}^n \left|Y_i - \beta_0 - \sum_j \beta_j X_{ij}\right|$$
over all possible $\beta$. Solutions can be determined by numerical optimization algorithms.

Inference is then usually based on the bootstrap. Random design ⇒ bootstrapping pairs; bootstrapping residuals for homoscedastic errors; wild bootstrap for heteroscedastic errors.

3) Nonparametric regression:

Model:
$$Y_i = m(X_i) + \epsilon_i$$
for some unknown function $m$. The function $m$ can be estimated by nonparametric smoothing procedures (kernel estimation; local linear estimation; spline estimation). Inference is often based on the bootstrap.
2.5 Time series

The general idea of the residual bootstrap can be adapted to many different situations. For example, it can also be used in the context of time series models.

Example: AR(1)-process:
$$X_t = \rho X_{t-1} + \epsilon_t, \quad t = 1, \dots, n$$
for i.i.d. zero mean error terms with $var(\epsilon_t) = \sigma^2$. If $|\rho| < 1$ this defines a stationary stochastic process.

Standard estimator of $\rho$:
$$\hat\rho = \frac{\sum_{t=2}^n (X_t - \bar X)(X_{t-1} - \bar X)}{\sum_{t=1}^n (X_t - \bar X)^2}$$

Asymptotic distribution:
$$\sqrt{n}(\hat\rho - \rho) \to_L N(0, 1 - \rho^2)$$

Bootstrapping residuals:

- Calculate centered residuals
$$\hat\epsilon_t = X_t - \hat\rho X_{t-1}, \quad \tilde\epsilon_t = \hat\epsilon_t - \frac{1}{n-1}\sum_t \hat\epsilon_t, \quad t = 2, \dots, n$$

- For some $k > 0$ generate random samples $\epsilon_{-k}^*, \epsilon_{-k+1}^*, \dots, \epsilon_0^*, \epsilon_1^*, \dots, \epsilon_n^*$ of residuals by drawing $n + k + 1$ observations independently and with replacement from $\{\tilde\epsilon_2, \dots, \tilde\epsilon_n\}$.

- Generate a bootstrap time series by $X_{-k}^* = \epsilon_{-k}^*$ and
$$X_t^* = \hat\rho X_{t-1}^* + \epsilon_t^*, \quad t = -k + 1, \dots, n$$

- Determine bootstrap estimators $\hat\rho^*$ from $X_1^*, \dots, X_n^*$.

Under the standard assumptions of AR(1) models this bootstrap is consistent.

Basic bootstrap confidence intervals:

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\hat\rho^*$.

- Approximate $1 - \alpha$ (symmetric) confidence interval:
$$\left[2\hat\rho - \hat t_{1-\frac{\alpha}{2}},\; 2\hat\rho - \hat t_{\frac{\alpha}{2}}\right]$$

Bootstrap-t intervals can be determined similarly.
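A sketch of this AR(1) residual bootstrap (added illustration, not from the notes), assuming NumPy; the burn-in length k = 50 is an arbitrary choice so that the bootstrap series is approximately stationary by t = 1.

```python
import numpy as np

def ar1_residual_bootstrap(x, m=2000, k=50, rng=None):
    """Bootstrap replicates of the AR(1) coefficient rho_hat."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    xc = x - x.mean()
    rho_hat = (xc[1:] * xc[:-1]).sum() / (xc ** 2).sum()
    eps = x[1:] - rho_hat * x[:-1]
    eps_tilde = eps - eps.mean()                  # centered residuals
    rho_star = np.empty(m)
    for b in range(m):
        e = rng.choice(eps_tilde, size=n + k + 1, replace=True)
        xs = np.empty(n + k + 1)
        xs[0] = e[0]
        for t in range(1, n + k + 1):             # X*_t = rho_hat X*_{t-1} + eps*_t
            xs[t] = rho_hat * xs[t - 1] + e[t]
        xb = xs[-n:] - xs[-n:].mean()             # keep the last n observations
        rho_star[b] = (xb[1:] * xb[:-1]).sum() / (xb ** 2).sum()
    return rho_hat, rho_star

# Usage: simulate an AR(1) series, then compare the bootstrap standard
# error of rho_hat with the asymptotic value sqrt((1 - rho^2)/n).
rng = np.random.default_rng(0)
n, rho = 300, 0.6
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]
rho_hat, rho_star = ar1_residual_bootstrap(x, rng=1)
print(rho_hat, rho_star.std(), np.sqrt((1 - rho_hat ** 2) / n))
```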