We estimate $SD(\bar X)$ by
$$SE(\bar X) = \frac{s}{\sqrt{n}},$$
where $s$ is the sample standard deviation.
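As a small sketch in R, the standard error of the mean is the sample SD divided by $\sqrt{n}$. The vector `x` below is only the toy sample 4, 6, 8 that is used later in this section.

```r
# Toy sample (illustrative only)
x <- c(4, 6, 8)
# sd() uses divisor n - 1, i.e. it is the sample standard deviation s
se_mean <- sd(x) / sqrt(length(x))
se_mean  # 2 / sqrt(3), about 1.1547
```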
What about $SD\{\mathrm{Median}(X_1,\ldots,X_n)\}$? This SD depends in a complicated way on the distribution $F$ of the $X$s. How can we approximate it?
Real World:
$$F \longrightarrow X_1,\ldots,X_n \longrightarrow T_n = \mathrm{Median}(X_1,\ldots,X_n).$$
The bootstrap principle is to approximate the real world by assuming that $F = \hat F_n$, where $\hat F_n$ is the empirical CDF, i.e., the distribution that puts probability $1/n$ on each of $X_1,\ldots,X_n$. We simulate one draw from $\hat F_n$ by picking one point at random from the original data set.
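A minimal R sketch of drawing from $\hat F_n$, using the toy sample 4, 6, 8 (an illustrative assumption). `ecdf()` builds the empirical CDF. A draw from $\hat F_n$ is one observed value chosen uniformly at random, and $n$ iid draws form a resample taken with replacement.

```r
set.seed(1)
x <- c(4, 6, 8)
Fn <- ecdf(x)   # empirical CDF: jumps of size 1/n at each observed value
Fn(6)           # P(X <= 6) under Fn, i.e. 2/3
# One draw from Fn: pick one observed value uniformly at random.
# (Indexing with sample.int avoids sample()'s special case for length-1 x.)
one_draw <- x[sample.int(length(x), 1)]
# n iid draws from Fn: a resample of size n taken with replacement
xstar <- x[sample.int(length(x), length(x), replace = TRUE)]
xstar
```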
Bootstrap World:
$$\hat F_n \longrightarrow X_1^*,\ldots,X_n^* \longrightarrow T_n^* = \mathrm{Median}(X_1^*,\ldots,X_n^*)$$
The bootstrap estimate of $SD\{\mathrm{Median}(X_1,\ldots,X_n)\}$ is $SD\{\mathrm{Median}(X_1^*,\ldots,X_n^*)\}$, where $X_1^*,\ldots,X_n^*$ are iid draws from $\hat F_n$. That is, $X_1^*,\ldots,X_n^*$ is a sample drawn with replacement from the observed sample $(x_1,\ldots,x_n)$.
Example: Suppose we draw $X_1, X_2, X_3$ iid from an unknown distribution and observe the sample 4, 6, 8. The distribution of samples of size 3 from the empirical distribution puts probability 1/27 on each of the following 27 samples:
{(4,4,4), (4,4,6), (4,4,8), (4,6,4), (4,6,6), (4,6,8), (4,8,4),
(4,8,6), (4,8,8), (6,4,4), (6,4,6), (6,4,8), (6,6,4), (6,6,6),
(6,6,8), (6,8,4), (6,8,6), (6,8,8), (8,4,4), (8,4,6), (8,4,8),
(8,6,4), (8,6,6), (8,6,8), (8,8,4), (8,8,6), (8,8,8)}
The distribution of the median puts probability 1/27 on each of the following values of the median (listed in the same order as the samples above):
{4, 4, 4, 4, 6, 6, 4, 6, 8, 4, 6, 6, 6, 6, 6, 6, 6, 8, 4, 6, 8, 6, 6, 8, 8, 8, 8}
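So the median equals 4 with probability 7/27, 6 with probability 13/27, and 8 with probability 7/27. Its mean is 6 and its variance is $(7\cdot 4 + 7\cdot 4)/27 = 56/27$. The standard deviation of the median of samples of size three from the empirical distribution is therefore
$$SD\{\mathrm{Median}(X_1^*,X_2^*,X_3^*)\} = \sqrt{56/27} \approx 1.44.$$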
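The exact calculation above can be checked in R by enumerating all $3^3 = 27$ equally likely ordered resamples. `expand.grid` lists them in a different order than the list above, but the counts of each median value are the same. Note that the SD here uses divisor 27, because it is the SD of the resampling distribution itself rather than an estimate from a sample.

```r
x <- c(4, 6, 8)
# All 27 equally likely ordered resamples (each has probability 1/27)
grid <- expand.grid(x1 = x, x2 = x, x3 = x)
meds <- apply(grid, 1, median)
table(meds)            # 4 occurs 7 times, 6 occurs 13 times, 8 occurs 7 times
# SD under the resampling distribution: divisor 27 (population SD), not 26
sd_exact <- sqrt(mean((meds - mean(meds))^2))
round(sd_exact, 2)     # 1.44
```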
How can we approximate $SD\{\mathrm{Median}(X_1^*,\ldots,X_n^*)\}$ when $n$ is too large to enumerate all $n^n$ equally likely resamples?
The Monte Carlo method.
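If $X_1,\ldots,X_m$ are iid with the same distribution as $X$, then by the law of large numbers, as $m \to \infty$,
$$\frac{1}{m}\sum_{i=1}^{m} g(X_i)^2 - \left[\frac{1}{m}\sum_{i=1}^{m} g(X_i)\right]^2 \xrightarrow{P} E[g(X)^2] - \{E[g(X)]\}^2 = \mathrm{Var}[g(X)].$$
Equivalently, $\frac{1}{m}\sum_{i=1}^{m}\left[g(X_i) - \frac{1}{m}\sum_{r=1}^{m} g(X_r)\right]^2 \xrightarrow{P} \mathrm{Var}[g(X)]$. So any variance (and hence SD) of a quantity we can simulate can be approximated by simulating it many times.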
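A small Monte Carlo sketch of the limit above. The choices $X \sim \mathrm{Uniform}(0,1)$ and $g(x) = x^2$ are illustrative assumptions, picked because the exact answers are known: $E[g(X)] = 1/3$ and $\mathrm{Var}[g(X)] = 1/5 - 1/9 = 4/45 \approx 0.0889$.

```r
set.seed(1)            # for reproducibility
m <- 1e5
X <- runif(m)          # iid draws X_1, ..., X_m from Uniform(0, 1)
gX <- X^2              # g(x) = x^2, chosen only for illustration
mc_mean <- mean(gX)                  # approximates E[g(X)] = 1/3
mc_var  <- mean(gX^2) - mean(gX)^2   # approximates Var[g(X)] = 4/45
c(mc_mean = mc_mean, exact_mean = 1/3, mc_var = mc_var, exact_var = 4/45)
```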
Bootstrap Standard Error Estimation for Statistic $T_n = g(X_1,\ldots,X_n)$:
1. Draw $X_1^*,\ldots,X_n^*$ as a sample with replacement from the observed sample $x_1,\ldots,x_n$.
2. Compute $T_n^* = g(X_1^*,\ldots,X_n^*)$.
3. Repeat steps 1 and 2 $m$ times to get $T_{n,1}^*,\ldots,T_{n,m}^*$.
4. Let
$$se_{boot} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(T_{n,i}^* - \frac{1}{m}\sum_{r=1}^{m}T_{n,r}^*\right)^2}.$$
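Steps 1 to 4 can be sketched as a generic R function that takes any statistic $g$. The name `boot_se` and its arguments are hypothetical, not from these notes. Unlike `sd()`, this sketch uses divisor $m$, matching the formula in step 4.

```r
# Hypothetical helper; the name boot_se and its arguments are illustrative only.
boot_se <- function(x, statistic, m = 2000) {
  n <- length(x)
  # Steps 1-3: draw m resamples of size n with replacement from x
  # and evaluate the statistic on each one
  tstar <- replicate(m, statistic(x[sample.int(n, n, replace = TRUE)]))
  # Step 4: square root of the average squared deviation (divisor m)
  sqrt(mean((tstar - mean(tstar))^2))
}

set.seed(2)
boot_se(c(4, 6, 8), median, m = 20000)  # should be close to the exact value 1.44
```

With `mean` in place of `median`, the result should approach $\sqrt{8/9} \approx 0.943$, since the resampling variance of the mean is $\hat\sigma^2/n$ with $\hat\sigma^2 = 8/3$ the divisor-$n$ variance of 4, 6, 8.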
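The bootstrap involves two approximations:
$$SD_F(T_n) \;\underbrace{\approx}_{\text{not so small approx. error}}\; SD_{\hat F_n}(T_n) \;\underbrace{\approx}_{\text{small approx. error}}\; se_{boot}$$
The first error comes from replacing $F$ by $\hat F_n$ and does not shrink as $m$ grows. The second is Monte Carlo error, which can be made as small as we like by increasing $m$.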
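To illustrate the two approximations, the sketch below assumes a known $F$ (Exponential(1)) and $n = 25$. Both choices are purely illustrative: in practice $F$ is unknown. It approximates $SD_F(T_n)$ by brute-force simulation from $F$, then computes $se_{boot}$ from a single observed sample for two values of $m$. Increasing $m$ only removes Monte Carlo noise. The gap between $se_{boot}$ and $SD_F(T_n)$ that remains comes from using $\hat F_n$ for this particular sample.

```r
set.seed(3)
n <- 25
# Assumed "true" F: Exponential(1). In practice F is unknown; we pick it
# here only so that SD_F(T_n) can be approximated by brute-force simulation.
sd_true <- sd(replicate(20000, median(rexp(n))))
# One observed sample and its bootstrap SEs for two Monte Carlo sizes m
x <- rexp(n)
se_boot_m200   <- sd(replicate(200,   median(sample(x, n, replace = TRUE))))
se_boot_m20000 <- sd(replicate(20000, median(sample(x, n, replace = TRUE))))
# sd() divides by m - 1 rather than m; the difference is negligible here
round(c(sd_true = sd_true, m200 = se_boot_m200, m20000 = se_boot_m20000), 3)
```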
R function for bootstrap estimate of SE(Median)
bootstrapmedianfunc=function(X,bootreps){
  medianX=median(X);
  # vector that will store the bootstrapped medians
  bootmedians=rep(0,bootreps);
  for(i in 1:bootreps){
    # Draw a sample of size n from X with replacement and
    # calculate median of sample
    Xstar=sample(X,size=length(X),replace=TRUE);
    bootmedians[i]=median(Xstar);
  }
  # bootstrap SE; var() divides by bootreps-1 rather than bootreps (negligible difference)
  seboot=var(bootmedians)^.5;
  list(medianX=medianX,seboot=seboot);
}
Example: In a study of the natural variability of rainfall, the
rainfall of summer storms was measured by a network of
rain gauges in southern Illinois for the year 1960.
> rainfall=c(.02,.01,.05,.21,.003,.45,.001,.01,2.13,.07,.01,.01,
.001,.003,.04,.32,.19,.18,.12,.001,1.1,.24,.002,.67,.08,.003,
.02,.29,.01,.003,.42,.27,.001,.001,.04,.01,1.72,.001,.14,.29,
.002,.04,.05,.06,.08,1.13,.07,.002)
> median(rainfall)
[1] 0.045
> bootstrapmedianfunc(rainfall,10000)
$medianX
[1] 0.045
$seboot
[1] 0.02167736
Bootstrap for skewness
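The skewness of a distribution $F$ is
$$\theta = \frac{\int (x-\mu)^3\,dF(x)}{\sigma^3},$$
where $\mu$ and $\sigma$ are the mean and standard deviation of $F$. We estimate it by
$$\hat\theta = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X_n)^3}{s^3},$$
where $s$ is the sample standard deviation.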
> thetahat=(sum((rainfall-mean(rainfall))^3)/length(rainfall))/sd(rainfall)^3
> round(thetahat,2)
[1] 2.86
Bootstrap estimate of standard error of $\hat\theta$:
bootstrapskewnessfunc=function(X,bootreps){
  # sample skewness: third central moment (divisor n) divided by s^3
  skewnessX=(sum((X-mean(X))^3)/length(X))/sd(X)^3;
  # vector that will store the bootstrapped skewness estimates
  bootskewness=rep(0,bootreps);
  for(i in 1:bootreps){
    # Draw a sample of size n from X with replacement and
    # calculate skewness estimate of the resample Xstar
    Xstar=sample(X,size=length(X),replace=TRUE);
    bootskewness[i]=(sum((Xstar-mean(Xstar))^3)/length(Xstar))/sd(Xstar)^3;
  }
  seboot=var(bootskewness)^.5;
  list(skewnessX=skewnessX,seboot=seboot);
}
> out=bootstrapskewnessfunc(rainfall,10000)
> round(out$skewnessX,2)
[1] 2.86
The component out$seboot is the bootstrap standard error of $\hat\theta$. Its exact value changes slightly from run to run because the resamples are random, so call set.seed() first if you need a reproducible result.
Bootstrap confidence intervals
There are several ways of using the bootstrap idea to obtain approximate confidence intervals. Here we present one approach: percentile bootstrap confidence intervals.
Suppose we want to estimate a parameter $\theta$ and have a point estimator $\hat\theta = \hat\theta(X_1,\ldots,X_n)$. To form a $(1-\alpha)$ percentile bootstrap confidence interval, we obtain $m$ bootstrap resamples (typically $m = 3000$ or more),
$$(X_{11}^*,\ldots,X_{1n}^*),\;\ldots,\;(X_{m1}^*,\ldots,X_{mn}^*).$$
We calculate $\hat\theta_1^*,\ldots,\hat\theta_m^*$ from the bootstrap resamples. Let $\hat\theta_{(1)}^* \le \hat\theta_{(2)}^* \le \cdots \le \hat\theta_{(m)}^*$ denote the ordered values of $\hat\theta_1^*,\ldots,\hat\theta_m^*$, and let $[\cdot]$ denote the greatest integer function. The percentile bootstrap confidence interval is
$$\left(\hat\theta^*_{([m\alpha/2])},\; \hat\theta^*_{(m+1-[m\alpha/2])}\right).$$
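The interval above can be sketched in R as a small function of the bootstrap replicates $\hat\theta_1^*,\ldots,\hat\theta_m^*$. The name `percentile_ci` is hypothetical. It uses `floor()` for the greatest integer function and stops when $[m\alpha/2] = 0$, because the order statistic index would then be invalid.

```r
# Hypothetical helper: percentile bootstrap CI from bootstrap replicates thetastar
percentile_ci <- function(thetastar, alpha = 0.05) {
  m <- length(thetastar)
  k <- floor(m * alpha / 2)          # [m * alpha / 2], greatest integer function
  if (k < 1) stop("need m * alpha / 2 >= 1; increase m")
  sorted <- sort(thetastar)          # theta*_(1) <= ... <= theta*_(m)
  c(lower = sorted[k], upper = sorted[m + 1 - k])
}

# Toy check with made-up "replicates" 1, 2, ..., 1000
ci <- percentile_ci(1:1000, alpha = 0.05)
ci   # lower = 25, upper = 976
```

With `rainfall` defined as in the example above, `percentile_ci(replicate(3000, median(sample(rainfall, replace = TRUE))))` would give an approximate 95% interval for the median. The endpoints depend on the random resamples.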
Motivation for percentile bootstrap confidence intervals: The interval from the $\alpha/2$ quantile to the $1-\alpha/2$ quantile of the sampling distribution of $\hat\theta$ has a $(1-\alpha)$ probability of containing the true $\theta$. We can view $\hat\theta_1^*,\ldots,\hat\theta_m^*$ as approximate samples from the sampling distribution of $\hat\theta$. We can then view $\hat\theta^*_{([m\alpha/2])}$ and $\hat\theta^*_{(m+1-[m\alpha/2])}$ as estimates of the $\alpha/2$ quantile and the $1-\alpha/2$ quantile of the sampling distribution of $\hat\theta$.
Let