
Statistics 512 Notes 10: Bootstrap Procedures

Bootstrap standard errors


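Let $X_1, \ldots, X_n$ be iid with CDF $F$ and variance $\sigma^2$. Then

$$\mathrm{Var}(\bar{X}_n) = \mathrm{Var}\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{1}{n^2}\left(\sigma^2 + \cdots + \sigma^2\right) = \frac{\sigma^2}{n},$$

so $SD(\bar{X}_n) = \sigma/\sqrt{n}$. We estimate $SD(\bar{X}_n)$ by $SE(\bar{X}_n) = s/\sqrt{n}$, where $s$ is the sample standard deviation.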
What about $SD\{\mathrm{Median}(X_1, \ldots, X_n)\}$? This SD depends in a complicated way on the distribution $F$ of the $X$s. How to approximate it?
Real World: $F \Rightarrow X_1, \ldots, X_n \Rightarrow T_n = \mathrm{Median}(X_1, \ldots, X_n)$.
The bootstrap principle is to approximate the real world by assuming that $F = \hat{F}_n$, where $\hat{F}_n$ is the empirical CDF, i.e., the distribution that puts probability $1/n$ on each of $X_1, \ldots, X_n$. We simulate from $\hat{F}_n$ by drawing one point at random from the original data set.
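As a minimal sketch in R (the variable names are illustrative), base R's `ecdf()` builds $\hat{F}_n$ as a step function, and `sample(..., replace = TRUE)` draws iid observations from it, each observed point being chosen with probability $1/n$. The sample 4, 6, 8 is the one used in the example that follows.

```r
x <- c(4, 6, 8)                     # observed sample
Fhat <- ecdf(x)                     # empirical CDF: mass 1/n at each observation
Fhat(c(3, 4, 5, 6, 8))              # 0, 1/3, 1/3, 2/3, 1

set.seed(1)                         # arbitrary seed, for reproducibility
draws <- sample(x, size = 30000, replace = TRUE)  # iid draws from Fhat
prop.table(table(draws))            # each value appears with frequency close to 1/3
```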
Bootstrap World: $\hat{F}_n \Rightarrow X_1^*, \ldots, X_n^* \Rightarrow T_n^* = \mathrm{Median}(X_1^*, \ldots, X_n^*)$
The bootstrap estimate of $SD\{\mathrm{Median}(X_1, \ldots, X_n)\}$ is $SD_{\hat{F}_n}\{\mathrm{Median}(X_1^*, \ldots, X_n^*)\}$, where $X_1^*, \ldots, X_n^*$ are iid draws from $\hat{F}_n$, i.e., $X_1^*, \ldots, X_n^*$ is a sample drawn with replacement from the observed sample $(x_1, \ldots, x_n)$.
Example: Suppose we draw $X_1, X_2, X_3$ iid from an unknown distribution. The observed sample is 4, 6, 8. The distribution of samples of size 3 from the empirical distribution puts probability 1/27 on each of the following 27 samples:
{(4,4,4), (4,4,6), (4,4,8), (4,6,4), (4,6,6), (4,6,8), (4,8,4),
(4,8,6), (4,8,8), (6,4,4), (6,4,6), (6,4,8), (6,6,4), (6,6,6),
(6,6,8), (6,8,4), (6,8,6), (6,8,8), (8,4,4), (8,4,6), (8,4,8),
(8,6,4), (8,6,6), (8,6,8), (8,8,4), (8,8,6), (8,8,8)}
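The distribution of the median puts probability 1/27 on each of the following values of the median (listed in the same order as the samples):
{4, 4, 4, 4, 6, 6, 4, 6, 8, 4, 6, 6, 6, 6, 6, 6, 6, 8, 4, 6, 8, 6, 6, 8, 8, 8, 8}
So the median equals 4 with probability 7/27, 6 with probability 13/27, and 8 with probability 7/27; its mean is 6 and its variance is $(7 \cdot 2^2 + 7 \cdot 2^2)/27 = 56/27$. The standard deviation of the median of samples of size three from the empirical distribution is therefore

$$SD\{\mathrm{Median}(X_1^*, X_2^*, X_3^*)\} = \sqrt{56/27} \approx 1.44.$$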
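The exact calculation in this example can be checked in R by enumerating all $3^3 = 27$ equally likely resamples. This is a sketch for the toy sample only: the number of ordered resamples, $n^n$, grows far too fast for enumeration to work at realistic sample sizes. Because each resample has probability exactly 1/27, the SD is computed with divisor 27 rather than R's `sd()` divisor of 26.

```r
x <- c(4, 6, 8)
# All 3^3 = 27 equally likely ordered resamples of size 3 from x
resamples <- as.matrix(expand.grid(x, x, x))
meds <- apply(resamples, 1, median)   # median of each resample
table(meds)                           # 4: 7 times, 6: 13 times, 8: 7 times

# Exact SD under the empirical distribution (divide by 27, not 26)
sd_exact <- sqrt(mean((meds - mean(meds))^2))
sd_exact                              # sqrt(56/27), about 1.44
```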
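How to approximate $SD_{\hat{F}_n}\{\mathrm{Median}(X_1^*, \ldots, X_n^*)\}$ when $n$ is too large to list all $n^n$ equally likely resamples?
The Monte Carlo method. If $X_1, \ldots, X_m$ are iid with the same distribution as $X$, then by the law of large numbers, as $m \to \infty$,

$$\frac{1}{m}\sum_{i=1}^m g(X_i) \xrightarrow{P} E[g(X)], \qquad \frac{1}{m}\sum_{i=1}^m \left[g(X_i) - \frac{1}{m}\sum_{i=1}^m g(X_i)\right]^2 \xrightarrow{P} E\left\{\left[g(X) - E[g(X)]\right]^2\right\} = \mathrm{Var}[g(X)].$$

For the bootstrap, each $X_i$ is a whole resample $(X_{i1}^*, \ldots, X_{in}^*)$ drawn from $\hat{F}_n$ and $g$ is the statistic, so averaging over many simulated resamples approximates the bootstrap SD as closely as we like.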
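Applied to the toy sample 4, 6, 8, the Monte Carlo approximation can be compared with the exact value 1.44 from the example (which equals $\sqrt{56/27}$). In this sketch the seed and the values of $m$ are arbitrary choices; as $m$ grows, the printed estimates should settle near 1.44, as the law of large numbers suggests.

```r
set.seed(512)                       # arbitrary seed, for reproducibility
x <- c(4, 6, 8)                     # observed sample from the example
n <- length(x)
for (m in c(100, 1000, 10000, 100000)) {
  # m bootstrap resamples of size n, one per row
  Xstar <- matrix(sample(x, n * m, replace = TRUE), nrow = m)
  Tstar <- apply(Xstar, 1, median)  # median of each resample
  # Monte Carlo estimate of the SD, averaging squared deviations over m
  se_mc <- sqrt(mean((Tstar - mean(Tstar))^2))
  cat(sprintf("m = %6d: %.4f\n", m, se_mc))
}
```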
Bootstrap Standard Error Estimation for Statistic $T_n = g(X_1, \ldots, X_n)$:
1. Draw $X_1^*, \ldots, X_n^*$ as a sample with replacement from the observed sample $x_1, \ldots, x_n$.
2. Compute $T_n^* = g(X_1^*, \ldots, X_n^*)$.
3. Repeat steps 1 and 2 $m$ times to get $T_{n,1}^*, \ldots, T_{n,m}^*$.
4. Let

$$se_{boot} = \sqrt{\frac{1}{m}\sum_{i=1}^m \left(T_{n,i}^* - \frac{1}{m}\sum_{r=1}^m T_{n,r}^*\right)^2}.$$
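The four steps can be written once for any statistic $g$ by passing $g$ as a function argument. A minimal sketch, where `boot_se` is a hypothetical helper name and base R's `replicate()` does the repetition in step 3. As a sanity check it uses the sample mean: applying the formula $SD(\bar{X}_n) = \sigma/\sqrt{n}$ to $\hat{F}_n$, whose variance is $\hat{\sigma}^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2$, gives the exact bootstrap SD $\hat{\sigma}/\sqrt{n}$, which should be close to the usual $s/\sqrt{n}$.

```r
# Hypothetical helper: bootstrap SE of the statistic g (steps 1-4)
boot_se <- function(x, g, m = 2000) {
  n <- length(x)
  # Steps 1-3: m resamples of size n drawn with replacement, g applied to each
  Tstar <- replicate(m, g(sample(x, size = n, replace = TRUE)))
  # Step 4: root mean squared deviation of the m replicates (divisor m)
  sqrt(mean((Tstar - mean(Tstar))^2))
}

set.seed(1)
x <- rexp(40)                       # an illustrative sample
n <- length(x)
se_boot_mean <- boot_se(x, mean, m = 20000)
# Exact bootstrap SD of the mean: plug-in SD (divisor n) over sqrt(n)
se_exact <- sqrt(mean((x - mean(x))^2)) / sqrt(n)
round(c(bootstrap = se_boot_mean, exact = se_exact, usual = sd(x) / sqrt(n)), 4)

boot_se(x, median)                  # any statistic works the same way
```

R's `var()` and `sd()` divide by $m - 1$ rather than $m$, as `bootstrapmedianfunc` in these notes does; with thousands of resamples the difference is negligible.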
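The bootstrap involves two approximations:

$$SD_F(T_n) \;\underbrace{\approx}_{\text{not so small approx. error}}\; SD_{\hat{F}_n}(T_n) \;\underbrace{\approx}_{\text{small approx. error}}\; se_{boot}$$

The first error comes from replacing $F$ by $\hat{F}_n$ and shrinks only as the sample size $n$ grows; the second is Monte Carlo error, which can be made as small as desired by increasing $m$.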
R function for bootstrap estimate of SE(Median)
bootstrapmedianfunc=function(X,bootreps){
medianX=median(X);
# vector that will store the bootstrapped medians
bootmedians=rep(0,bootreps);
for(i in 1:bootreps){
# Draw a sample of size n from X with replacement and
# calculate median of sample
Xstar=sample(X,size=length(X),replace=TRUE);
bootmedians[i]=median(Xstar);
}
seboot=var(bootmedians)^.5;
list(medianX=medianX,seboot=seboot);
}
Example: In a study of the natural variability of rainfall, the
rainfall of summer storms was measured by a network of
rain gauges in southern Illinois for the year 1960.
> rainfall=c(.02,.01,.05,.21,.003,.45,.001,.01,2.13,.07,.01,.01,
.001,.003,.04,.32,.19,.18,.12,.001,1.1,.24,.002,.67,.08,.003,
.02,.29,.01,.003,.42,.27,.001,.001,.04,.01,1.72,.001,.14,.29,
.002,.04,.05,.06,.08,1.13,.07,.002)
> median(rainfall)
[1] 0.045
> bootstrapmedianfunc(rainfall,10000)
$medianX
[1] 0.045
$seboot
[1] 0.02167736
Bootstrap for skewness
The skewness of a distribution $F$ is

$$\theta = \int \frac{(x - \mu)^3}{\sigma^3}\, dF(x),$$

where $\mu$ and $\sigma$ are the mean and standard deviation of $F$.
The skewness is a measure of asymmetry; it equals 0 for a symmetric distribution. A natural point estimate of the skewness based on an iid sample $X_1, \ldots, X_n$ is

$$\hat{\theta} = \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^3}{s^3}.$$
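A direct translation of $\hat{\theta}$ into R, as a sketch (`skew_hat` is a hypothetical helper name), with two quick checks of the properties just described: a symmetric sample gives 0, and rescaling the data by a positive constant leaves $\hat{\theta}$ unchanged, because the numerator and $s^3$ both scale by the cube of that constant.

```r
# Sample skewness: (1/n) * sum((X_i - mean)^3) / s^3,
# where s = sd(x) uses the n - 1 divisor, as var() does in these notes
skew_hat <- function(x) {
  mean((x - mean(x))^3) / sd(x)^3
}

skew_hat(c(1, 2, 3, 4, 5))          # symmetric sample: 0
skew_hat(c(0, 0, 3))                # right-skewed sample: 2 / 3^1.5, about 0.385
skew_hat(10 * c(0, 0, 3))           # same value: the estimator is scale-invariant
```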
> thetahat=(sum((rainfall-mean(rainfall))^3)/length(rainfall))/var(rainfall)^1.5
For the rainfall data this gives $\hat{\theta} \approx 2.86$, a strongly right-skewed sample.
Bootstrap estimate of standard error of $\hat{\theta}$:
bootstrapskewnessfunc=function(X,bootreps){
# skewness estimate: third central moment divided by s^3
skewnessX=(sum((X-mean(X))^3)/length(X))/var(X)^1.5;
# vector that will store the bootstrapped skewness estimates
bootskewness=rep(0,bootreps);
for(i in 1:bootreps){
# Draw a sample of size n from X with replacement and
# calculate skewness estimate of sample
Xstar=sample(X,size=length(X),replace=TRUE);
bootskewness[i]=(sum((Xstar-mean(Xstar))^3)/length(Xstar))/var(Xstar)^1.5;
}
seboot=var(bootskewness)^.5;
list(skewnessX=skewnessX,seboot=seboot);
}
> bootstrapskewnessfunc(rainfall,10000)
Bootstrap confidence intervals
There are several ways of using the bootstrap idea to obtain approximate confidence intervals. Here we present one approach: percentile bootstrap confidence intervals.
Suppose we want to estimate a parameter $\theta$ and have a point estimator $\hat{\theta}(X_1, \ldots, X_n)$. To form a $(1-\alpha)$ percentile bootstrap confidence interval, we obtain $m$ bootstrap resamples (typically $m = 3000$ or more), $(X_{11}^*, \ldots, X_{1n}^*), \ldots, (X_{m1}^*, \ldots, X_{mn}^*)$. We calculate $\hat{\theta}_1^*, \ldots, \hat{\theta}_m^*$ based on the bootstrap resamples.
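Let $\hat{\theta}_{(1)}^* \le \hat{\theta}_{(2)}^* \le \cdots \le \hat{\theta}_{(m)}^*$ denote the ordered values of $\hat{\theta}_1^*, \ldots, \hat{\theta}_m^*$. Let $[\,\cdot\,]$ denote the greatest integer function. The percentile bootstrap confidence interval is

$$\left(\hat{\theta}_{([m\alpha/2])}^*,\ \hat{\theta}_{(m+1-[m\alpha/2])}^*\right).$$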
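For instance, with $m = 10000$ resamples and $\alpha = 0.05$ (the values used in the rainfall examples in these notes), $[m\alpha/2] = 250$, so the interval runs from the 250th to the 9751st ordered bootstrap estimate. A small R check of this arithmetic; the `percentciboot` functions in these notes compute the cutoff as `floor((alpha/2)*(m+1))`, which gives the same index here.

```r
m <- 10000
alpha <- 0.05
k <- floor(m * alpha / 2)           # [m * alpha / 2] = 250
c(lower = k, upper = m + 1 - k)     # ordered positions 250 and 9751
floor((alpha / 2) * (m + 1))        # cutoff used in percentciboot: also 250
```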
Motivation for percentile bootstrap confidence intervals:
The interval from the $(\alpha/2)$ quantile to the $(1-\alpha/2)$ quantile of the sampling distribution of $\hat{\theta}$ has a $(1-\alpha)$ probability of containing the true $\theta$. We can view $\hat{\theta}_1^*, \ldots, \hat{\theta}_m^*$ as approximate samples from the sampling distribution of $\hat{\theta}$, and $\hat{\theta}_{([m\alpha/2])}^*$ and $\hat{\theta}_{(m+1-[m\alpha/2])}^*$ as estimates of the $(\alpha/2)$ quantile and the $(1-\alpha/2)$ quantile of the sampling distribution of $\hat{\theta}$.
Let $\hat{\theta}$ be a point estimate of $\theta$ and $\widehat{se}$ be the estimated standard error of $\hat{\theta}$. Suppose
Example: Percentile Bootstrap Confidence Interval for Median
# Function that forms percentile bootstrap confidence
# interval for median. To find the percentile bootstrap
# confidence interval for a parameter other than the median,
# substitute the appropriate function at both calls to the
# median.
percentciboot=function(X,m,alpha){
# X is a vector containing the original sample
# m is the desired number of bootstrap replications
theta=median(X);
thetastar=rep(0,m); # stores bootstrap estimates of theta
n=length(X);
# Carry out m bootstrap resamples and estimate theta for
# each resample
for(i in 1:m){
Xstar=sample(X,n,replace=TRUE);
thetastar[i]=median(Xstar);
}
thetastarordered=sort(thetastar); # order the thetastars
cutoff=floor((alpha/2)*(m+1));
lower=thetastarordered[cutoff]; # lower CI endpoint
upper=thetastarordered[m+1-cutoff]; # upper CI endpoint
list(theta=theta,lower=lower,upper=upper);
}
> percentciboot(rainfall,10000,.05)
$theta
[1] 0.045
$lower
[1] 0.01
$upper
[1] 0.1
Percentile bootstrap confidence interval for skewness:
Replace the median in the R function above by the skewness estimate:
percentciboot=function(X,m,alpha){
# X is a vector containing the original sample
# m is the desired number of bootstrap replications
theta=(sum((X-mean(X))^3)/length(X))/var(X)^1.5;
thetastar=rep(0,m); # stores bootstrap estimates of theta
n=length(X);
# Carry out m bootstrap resamples and estimate theta for
# each resample (using the resample's own mean and s)
for(i in 1:m){
Xstar=sample(X,n,replace=TRUE);
thetastar[i]=(sum((Xstar-mean(Xstar))^3)/length(Xstar))/var(Xstar)^1.5;
}
thetastarordered=sort(thetastar); # order the thetastars
cutoff=floor((alpha/2)*(m+1));
lower=thetastarordered[cutoff]; # lower CI endpoint
upper=thetastarordered[m+1-cutoff]; # upper CI endpoint
list(theta=theta,lower=lower,upper=upper);
}
> percentciboot(rainfall,10000,.05)
Validity of bootstrap CIs: This is a complicated subject and
there is a huge literature on it, but for most problems,
bootstrap CIs have asymptotically correct coverage.
Davison and Hinkley (1997, Bootstrap Methods and Their
Applications) is a good reference.
Bootstrap summary
1. The bootstrap is a powerful method for estimating
standard errors and obtaining approximate confidence
intervals for point estimators.
2. There are several methods other than the percentile bootstrap confidence interval for using the bootstrap to obtain approximate confidence intervals. One idea is to standardize the estimator $\hat{\theta}$ by an estimate of scale (see Problem 5.9.5). Efron and Tibshirani's An Introduction to the Bootstrap is an excellent book on the bootstrap.
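Item 2 mentions standardizing $\hat{\theta}$ by an estimate of scale. One common version of that idea is the bootstrap-t (studentized) interval: for each resample compute $t_i^* = (\hat{\theta}_i^* - \hat{\theta})/\widehat{se}_i^*$, then use $\left(\hat{\theta} - t_{1-\alpha/2}^*\,\widehat{se},\ \hat{\theta} - t_{\alpha/2}^*\,\widehat{se}\right)$, where $t_p^*$ is the $p$ quantile of the $t_i^*$. The sketch that follows applies it to the mean, with $s/\sqrt{n}$ as the scale estimate; `boot_t_ci_mean` is a hypothetical name, and this is not necessarily the construction intended in Problem 5.9.5.

```r
# Sketch: bootstrap-t (studentized) interval for a population mean,
# standardizing the sample mean by its usual scale estimate s / sqrt(n)
boot_t_ci_mean <- function(x, m = 3000, alpha = 0.05) {
  n <- length(x)
  theta_hat <- mean(x)
  se_hat <- sd(x) / sqrt(n)
  tstar <- replicate(m, {
    xs <- sample(x, size = n, replace = TRUE)
    (mean(xs) - theta_hat) / (sd(xs) / sqrt(n))   # studentized replicate
  })
  # Empirical quantiles of t*; the upper quantile gives the lower endpoint
  q <- quantile(tstar, c(1 - alpha / 2, alpha / 2), names = FALSE)
  c(lower = theta_hat - q[1] * se_hat, upper = theta_hat - q[2] * se_hat)
}

set.seed(2)
x <- rexp(50)                       # an illustrative right-skewed sample
ci <- boot_t_ci_mean(x)
ci
```

Unlike the percentile interval, this interval need not be symmetric about $\hat{\theta}$ in a way that mirrors the bootstrap distribution; the reversal of quantiles is part of the construction.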
