
Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy
Author(s): B. Efron and R. Tibshirani
Source: Statistical Science, Vol. 1, No. 1 (Feb., 1986), pp. 54-75
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2245500
Accessed: 19/10/2010 11:46



Statistical Science 1986, Vol. 1, No. 1, 54-77

Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy


B. Efron and R. Tibshirani

B. Efron is Professor of Statistics and Biostatistics, and Chairman of the Program in Mathematical and Computational Science at Stanford University. His mailing address is Department of Statistics, Sequoia Hall, Stanford University, Stanford, CA 94305. R. Tibshirani is a Postdoctoral Fellow in the Department of Preventive Medicine and Biostatistics, Faculty of Medicine, University of Toronto, McMurrick Building, Toronto, Ontario, M5S 1A8, Canada.

Abstract. This is a review of bootstrap methods, concentrating on basic ideas and applications rather than theoretical considerations. It begins with an exposition of the bootstrap estimate of standard error for one-sample situations. Several examples, some involving quite complicated statistical procedures, are given. The bootstrap is then extended to other measures of statistical accuracy such as bias and prediction error, and to complicated data structures such as time series, censored data, and regression models. Several more examples are presented illustrating these ideas. The last third of the paper deals mainly with bootstrap confidence intervals.

Key words: Bootstrap method, estimated standard errors, approximate confidence intervals, nonparametric methods.

1. INTRODUCTION

A typical problem in applied statistics involves the estimation of an unknown parameter $\theta$. The two main questions asked are (1) what estimator $\hat\theta$ should be used? (2) Having chosen to use a particular $\hat\theta$, how accurate is it as an estimator of $\theta$? The bootstrap is a general methodology for answering the second question. It is a computer-based method, which substitutes considerable amounts of computation in place of theoretical analysis. As we shall see, the bootstrap can routinely answer questions which are far too complicated for traditional statistical analysis. Even for relatively simple problems, computer-intensive methods like the bootstrap are an increasingly good data analytic bargain in an era of exponentially declining computational costs.

This paper describes the basis of the bootstrap theory, which is very simple, and gives several examples of its use. Related ideas like the jackknife, the delta method, and Fisher's information bound are also discussed. Most of the proofs and technical details are omitted. These can be found in the references given, particularly Efron (1982a). Some of the discussion here is abridged from Efron and Gong (1983) and also from Efron (1984).

Before beginning the main exposition, we will describe how the bootstrap works in terms of a problem where it is not needed, assessing the accuracy of the sample mean. Suppose that our data consists of a random sample from an unknown probability distribution $F$ on the real line,
(1.1)  $X_1, X_2, \ldots, X_n \stackrel{\text{iid}}{\sim} F$.

Having observed $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we compute the sample mean $\bar x = \sum_{i=1}^n x_i/n$, and wonder how accurate it is as an estimate of the true mean $\theta = E_F\{X\}$.

If the second central moment of $F$ is $\mu_2(F) \equiv E_F X^2 - (E_F X)^2$, then the standard error $\sigma(F; n, \bar x)$, that is the standard deviation of $\bar x$ for a sample of size $n$ from distribution $F$, is

(1.2)  $\sigma(F) = [\mu_2(F)/n]^{1/2}$.

The shortened notation $\sigma(F) \equiv \sigma(F; n, \bar x)$ is allowable because the sample size $n$ and the statistic of interest $\bar x$ are known, only $F$ being unknown. The standard error is the traditional measure of $\bar x$'s accuracy. Unfortunately, we cannot actually use (1.2) to assess the accuracy of $\bar x$, since we do not know $\mu_2(F)$, but we can use the estimated standard error

(1.3)  $\bar\sigma = [\bar\mu_2/n]^{1/2}$,

where $\bar\mu_2 = \sum_{i=1}^n (x_i - \bar x)^2/(n-1)$, the unbiased estimate of $\mu_2(F)$.

There is a more obvious way to estimate $\sigma(F)$.

BOOTSTRAP METHODS FOR MEASURES OF STATISTICAL ACCURACY


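Let $\hat F$ indicate the empirical probability distribution,

(1.4)  $\hat F$: probability mass $1/n$ on $x_1, x_2, \ldots, x_n$.

Then we can simply replace $F$ by $\hat F$ in (1.2), obtaining

(1.5)  $\hat\sigma = \sigma(\hat F) = [\mu_2(\hat F)/n]^{1/2}$,

as the estimated standard error for $\bar x$. This is the bootstrap estimate. The reason for the name "bootstrap" will be apparent in Section 2, when we evaluate $\sigma(\hat F)$ for statistics more complicated than $\bar x$. Since

(1.6)  $\mu_2(\hat F) = \sum_{i=1}^n (x_i - \bar x)^2/n$,

we have $\mu_2(\hat F) = [(n-1)/n]\,\bar\mu_2$, and therefore $\hat\sigma = [(n-1)/n]^{1/2}\,\bar\sigma$.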
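A small numerical check of (1.3) against (1.5) may be useful. Because (1.6) divides by $n$ where $\bar\mu_2$ divides by $n - 1$, the two estimates should differ exactly by the factor $[(n-1)/n]^{1/2}$. The sketch below assumes nothing beyond the formulas above. The function names `sigma_bar` and `sigma_hat` are illustrative, and the 15 LSAT averages listed with Fig. 1 serve only as a convenient sample.

```python
import math

def sigma_bar(x):
    """Estimated standard error (1.3), built on the unbiased estimate of mu_2(F)."""
    n = len(x)
    xbar = sum(x) / n
    mu2_bar = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    return math.sqrt(mu2_bar / n)

def sigma_hat(x):
    """Bootstrap estimate (1.5): F replaced by the empirical distribution F-hat in (1.2)."""
    n = len(x)
    xbar = sum(x) / n
    mu2_hat = sum((xi - xbar) ** 2 for xi in x) / n   # mu_2(F-hat), equation (1.6)
    return math.sqrt(mu2_hat / n)

# Any sample will do; these are the 15 LSAT averages listed with Fig. 1.
lsat = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]
n = len(lsat)
ratio = sigma_hat(lsat) / sigma_bar(lsat)
print(ratio, math.sqrt((n - 1) / n))   # the two numbers agree up to rounding error
```

For $n = 15$ the factor is about 0.966. This illustrates why the difference between the two estimates rarely matters in practice.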

Thus $\hat\sigma$ is not quite the same as $\bar\sigma$, but the difference is too small to be important in most applications.

Of course we do not really need an alternative formula to (1.3) in this case. The trouble begins when we want a standard error for estimators more complicated than $\bar x$, for example a median, a correlation, or a slope coefficient from a robust regression. In most cases there is no equivalent to formula (1.2), which expresses the standard error $\sigma(F)$ as a simple function of the sampling distribution $F$. As a result, formulas like (1.3) do not exist for most statistics.

This is where the computer comes in. It turns out that we can always numerically evaluate the bootstrap estimate $\hat\sigma = \sigma(\hat F)$, without knowing a simple expression for $\sigma(F)$. The evaluation of $\hat\sigma$ is a straightforward Monte Carlo exercise described in the next section. In a good computing environment, as described in the remarks in Section 2, the bootstrap effectively gives the statistician a simple formula like (1.3) for any statistic, no matter how complicated.

Standard errors are crude but useful measures of statistical accuracy. They are frequently used to give approximate confidence intervals for an unknown parameter $\theta$,

(1.7)  $\theta \in \hat\theta \pm \hat\sigma z^{(\alpha)}$,

where $z^{(\alpha)}$ is the $100 \cdot \alpha$ percentile point of a standard normal variate, e.g., $z^{(.95)} = 1.645$. Interval (1.7) is sometimes good, and sometimes not so good. Sections 7 and 8 discuss a more sophisticated use of the bootstrap, which gives better approximate confidence intervals than (1.7).

The standard interval (1.7) is based on taking literally the large sample normal approximation $(\hat\theta - \theta)/\hat\sigma \sim N(0, 1)$. Applied statisticians use a variety of tricks to improve this approximation. For instance, if $\theta$ is the correlation coefficient and $\hat\theta$ the sample correlation, then the transformation $\phi = \tanh^{-1}(\theta)$, $\hat\phi = \tanh^{-1}(\hat\theta)$ greatly improves the normal approximation, at least in those cases where the underlying sampling distribution is bivariate normal. The correct tactic then is to transform, compute the interval (1.7) for $\phi$, and transform this interval back to the $\theta$ scale.

We will see that bootstrap confidence intervals can automatically incorporate tricks like this, without requiring the data analyst to produce special techniques, like the $\tanh^{-1}$ transformation, for each new situation. An important theme of what follows is the substitution of raw computing power for theoretical analysis. This is not an argument against theory, of course, only against unnecessary theory. Most common statistical methods were developed in the 1920s and 1930s, when computation was slow and expensive. Now that computation is fast and cheap we can hope for and expect changes in statistical methodology. This paper discusses one such potential change; Efron (1979b) discusses several others.

2. THE BOOTSTRAP ESTIMATE OF STANDARD ERROR

This section presents a more careful description of the bootstrap estimate of standard error. For now we will assume that the observed data $y = (x_1, x_2, \ldots, x_n)$ consists of independent and identically distributed (iid) observations $X_1, X_2, \ldots, X_n \stackrel{\text{iid}}{\sim} F$, as in (1.1). Here $F$ represents an unknown probability distribution on $\mathcal{X}$, the common sample space of the observations. We have a statistic of interest, say $\hat\theta(y)$, to which we wish to assign an estimated standard error.

Fig. 1 shows an example. The sample space $\mathcal{X}$ is $\mathbb{R}^2_+$, the positive quadrant of the plane. We have observed $n = 15$ bivariate data points, each corresponding to an American law school. Each point $x_i$ consists of two summary statistics for the 1973 entering class at law school $i$,

(2.1)  $x_i = (\mathrm{LSAT}_i, \mathrm{GPA}_i)$;

$\mathrm{LSAT}_i$ is the class' average score on a nationwide exam called "LSAT"; $\mathrm{GPA}_i$ is the class' average undergraduate grades.

[Figure 1: scatter plot of GPA (vertical axis, about 2.7 to 3.5) against LSAT (horizontal axis, about 540 to 680) for the 15 law schools, each point labeled with its school number.]

FIG. 1. The law school data (Efron, 1979b). The data points, beginning with School 1, are (576, 3.39), (635, 3.30), (558, 2.81), (578, 3.03), (666, 3.44), (580, 3.07), (555, 3.00), (661, 3.43), (651, 3.36), (605, 3.13), (653, 3.12), (575, 2.74), (545, 2.76), (572, 2.88), (594, 2.96).


B. EFRON AND R. TIBSHIRANI

The observed Pearson correlation coefficient for these 15 points is $\hat\theta = .776$. We wish to assign a standard error to this estimate.

Let $\sigma(F)$ indicate the standard error of $\hat\theta$, as a function of the unknown sampling distribution $F$,

(2.2)  $\sigma(F) = [\mathrm{Var}_F\{\hat\theta(y)\}]^{1/2}$.

Of course $\sigma(F)$ is also a function of the sample size $n$ and the form of the statistic $\hat\theta(y)$, but since both of these are known they need not be indicated in the notation. The bootstrap estimate of standard error is

(2.3)  $\hat\sigma = \sigma(\hat F)$,

where $\hat F$ is the empirical probability distribution (1.4), putting probability $1/n$ on each observed data point $x_i$. In the law school example, $\hat F$ is the distribution putting mass $1/15$ on each point in Fig. 1, and $\hat\sigma$ is the standard deviation of the correlation coefficient for 15 iid points drawn from $\hat F$.

In most cases, including that of the correlation coefficient, there is no simple expression for the function $\sigma(F)$ in (2.2). Nevertheless, it is easy to numerically evaluate $\hat\sigma = \sigma(\hat F)$ by means of a Monte Carlo algorithm, which depends on the following notation: $y^* = (x_1^*, x_2^*, \ldots, x_n^*)$ indicates $n$ independent draws from $\hat F$, called a bootstrap sample. Because $\hat F$ is the empirical distribution of the data, a bootstrap sample turns out to be the same as a random sample of size $n$ drawn with replacement from the actual sample $\{x_1, x_2, \ldots, x_n\}$.

The Monte Carlo algorithm proceeds in three steps: (i) using a random number generator, independently draw a large number of bootstrap samples, say $y^*(1), y^*(2), \ldots, y^*(B)$; (ii) for each bootstrap sample $y^*(b)$, evaluate the statistic of interest, say $\hat\theta^*(b) = \hat\theta(y^*(b))$, $b = 1, 2, \ldots, B$; and (iii) calculate the sample standard deviation of the $\hat\theta^*(b)$ values,

(2.4)  $\hat\sigma_B = \left\{ \dfrac{\sum_{b=1}^{B} [\hat\theta^*(b) - \hat\theta^*(\cdot)]^2}{B - 1} \right\}^{1/2}, \qquad \hat\theta^*(\cdot) = \dfrac{1}{B} \sum_{b=1}^{B} \hat\theta^*(b)$.

It is easy to see that as $B \to \infty$, $\hat\sigma_B$ will approach $\hat\sigma = \sigma(\hat F)$, the bootstrap estimate of standard error. All we are doing is evaluating a standard deviation by Monte Carlo sampling. Later, in Section 9, we will discuss how large $B$ need be taken. For most situations $B$ in the range 50 to 200 is quite adequate. In what follows we will usually ignore the difference between $\hat\sigma_B$ and $\hat\sigma$, calling both simply "$\hat\sigma$".

Why is each bootstrap sample taken with the same sample size $n$ as the original data set? Remember that the standard error $\sigma(F)$ is actually $\sigma(F, n, \hat\theta)$, the standard error for the statistic $\hat\theta(\cdot)$ based on a random sample of size $n$ from the unknown distribution $F$. The bootstrap estimate $\hat\sigma$ is actually $\sigma(F, n, \hat\theta)$ evaluated at $F = \hat F$. The Monte Carlo algorithm will not converge to $\hat\sigma$ if the bootstrap sample size differs from the true $n$. Bickel and Freedman (1981) show how to correct the algorithm to give $\hat\sigma$ if in fact the bootstrap sample size is taken different than $n$, but so far there does not seem to be any practical advantage to be gained in this way.

Fig. 2 shows the histogram of $B = 1000$ bootstrap replications of the correlation coefficient from the law school data. For convenient reference the abscissa is plotted in terms of $\hat\theta^* - \hat\theta = \hat\theta^* - .776$. Formula (2.4) gives $\hat\sigma = .127$ as the bootstrap estimate of standard error. This can be compared with the usual normal theory estimate of standard error for $\hat\theta$,

(2.5)  $\bar\sigma_{\mathrm{NORM}} = (1 - \hat\theta^2)/(n - 3)^{1/2} = .115$

[Johnson and Kotz (1970, p. 229)].

[Figure 2: histogram of $\hat\theta^* - \hat\theta$ with the normal theory density curve superimposed; horizontal axis from -0.4 to 0.2; the 16%, 50% and 84% percentiles of the histogram are marked.]

FIG. 2. Histogram of $B = 1000$ bootstrap replications of $\hat\theta^*$ for the law school data. The normal theory density curve has a similar shape, but falls off more quickly at the upper tail.

REMARK. The Monte Carlo algorithm leading to $\hat\sigma_B$ (2.4) is simple to program. On the Stanford version of the statistical computing language S, Professor Arthur Owen has introduced a single command which bootstraps any statistic in the S catalog. For instance, the bootstrap results in Fig. 2 are obtained simply by typing

tboot(lawdata, correlation, B = 1000).

The execution time is about a factor of $B$ greater than that for the original computation.

There is another way to describe the bootstrap standard error: $\hat F$ is the nonparametric maximum likelihood estimate (MLE) of the unknown distribution $F$ (Kiefer and Wolfowitz, 1956). This means that the bootstrap estimate $\hat\sigma = \sigma(\hat F)$ is the nonparametric MLE of $\sigma(F)$, the true standard error.

In fact there is nothing which says that the bootstrap must be carried out nonparametrically. Suppose for instance that in the law school example we believe the true sampling distribution $F$ must be bivariate normal. Then we could estimate $F$ with its parametric MLE $\hat F_{\mathrm{NORM}}$, the bivariate normal distribution having the same mean vector and covariance matrix as the data. The bootstrap samples at step (i) of the algorithm could then be drawn from $\hat F_{\mathrm{NORM}}$ instead of $\hat F$, and steps (ii) and (iii) carried out as before.
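The three-step algorithm (i)-(iii) and its normal-theory variant are short enough to write out directly. What follows is a minimal sketch in plain Python (standard library only). The names `corr`, `sd`, `bootstrap_se` and `normal_bootstrap_se` are illustrative and do not come from any package. The original S command `tboot` is not reproduced. The data are the 15 law school points listed with Fig. 1. In the parametric version, step (i) draws from the bivariate normal with the sample mean vector and the maximum likelihood covariance matrix (divisor $n$).

```python
import math
import random

# Law school data from the caption of Fig. 1: (LSAT, GPA) for the 15 schools.
LAW = [(576, 3.39), (635, 3.30), (558, 2.81), (578, 3.03), (666, 3.44),
       (580, 3.07), (555, 3.00), (661, 3.43), (651, 3.36), (605, 3.13),
       (653, 3.12), (575, 2.74), (545, 2.76), (572, 2.88), (594, 2.96)]

def corr(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return sxy / math.sqrt(sxx * syy)

def sd(values):
    """Sample standard deviation with divisor B - 1, as in (2.4)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def bootstrap_se(data, stat, B=1000, seed=0):
    """Nonparametric bootstrap: steps (i)-(iii) with samples drawn from F-hat."""
    rng = random.Random(seed)
    n = len(data)
    # (i) draw n points with replacement and (ii) recompute the statistic, B times
    reps = [stat(rng.choices(data, k=n)) for _ in range(B)]
    return sd(reps)                                   # (iii)

def normal_bootstrap_se(data, stat, B=1000, seed=0):
    """Parametric bootstrap: step (i) draws from F_NORM, the bivariate normal MLE."""
    rng = random.Random(seed)
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    # MLE covariance entries (divisor n) and the Cholesky factor of the 2 x 2 matrix
    a = sum((p[0] - mx) ** 2 for p in data) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in data) / n
    c = sum((p[1] - my) ** 2 for p in data) / n
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 ** 2)
    reps = []
    for _ in range(B):
        sample = []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            sample.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
        reps.append(stat(sample))
    return sd(reps)

theta_hat = corr(LAW)
sigma_norm = (1 - theta_hat ** 2) / math.sqrt(len(LAW) - 3)   # formula (2.5)
print(round(theta_hat, 3), round(sigma_norm, 3))               # 0.776 0.115
print(bootstrap_se(LAW, corr), normal_bootstrap_se(LAW, corr))
```

The replications are random, so the two bootstrap values printed on the last line change with the seed and with $B$. The nonparametric value should land in the neighborhood of the .127 reported for $B = 1000$, and the normal-theory value in the neighborhood of (2.5). Neither is expected to match exactly.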



The smooth curve in Fig. 2 shows the results of carrying out this "normal theory bootstrap" on the law school data. Actually there is no need to do the bootstrap sampling in this case, because of Fisher's formula for the sampling density of a correlation coefficient in the bivariate normal situation (see Chapter 32 of Johnson and Kotz, 1970). This density can be thought of as the bootstrap distribution for $B = \infty$. Expression (2.5) is a close approximation to $\hat\sigma_{\mathrm{NORM}} = \sigma(\hat F_{\mathrm{NORM}})$, the parametric bootstrap estimate of standard error.

In considering the merits or demerits of the bootstrap, it is worth remembering that all of the usual formulas for estimating standard errors, like $J^{-1/2}$ where $J$ is the observed Fisher information, are essentially bootstrap estimates carried out in a parametric framework. This point is carefully explained in Section 5 of Efron (1982c). The straightforward nonparametric algorithm (i)-(iii) has the virtues of avoiding all parametric assumptions, all approximations (such as those involved with the Fisher information expression for the standard error of an MLE), and in fact all analytic difficulties of any kind. The data analyst is free to obtain standard errors for enormously complicated estimators, subject only to the constraints of computer time. Sections 3 and 6 discuss some interesting applied problems which are far too complicated for standard analyses.

How well does the bootstrap work? Table 1 shows the answer in one situation. Here $\mathcal{X}$ is the real line, $n = 15$, and the statistic $\hat\theta$ of interest is the 25% trimmed mean. If the true sampling distribution $F$ is $N(0, 1)$, then the true standard error is $\sigma(F) = .286$. The bootstrap estimate $\hat\sigma$ is nearly unbiased, averaging .287 in a large sampling experiment. The standard deviation of the bootstrap estimate $\hat\sigma$ is itself .071 in this case, with coefficient of variation $.071/.287 = .25$. (Notice that there are two levels of Monte Carlo involved in Table 1: first drawing the actual samples $y = (x_1, x_2, \ldots, x_{15})$ from $F$, and then drawing bootstrap samples $(x_1^*, x_2^*, \ldots, x_{15}^*)$ with $y$ held fixed. The bootstrap samples evaluate $\hat\sigma$ for a fixed value of $y$. The standard deviation .071 refers to the variability of $\hat\sigma$ due to the random choice of $y$.)

The jackknife, another common method of assigning nonparametric standard errors, is discussed in Section 10. The jackknife estimate $\hat\sigma_J$ is also nearly unbiased for $\sigma(F)$, but has higher coefficient of variation (CV). The minimum possible CV for a scale-invariant estimate of $\sigma(F)$, assuming full knowledge of the parametric model, is shown in brackets. The nonparametric bootstrap is seen to be moderately efficient in both cases considered in Table 1.

Table 2 returns to the case of $\hat\theta$, the correlation coefficient. Instead of real data we have a sampling experiment in which the true $F$ is bivariate normal, true correlation $\theta = .50$, sample size $n = 14$. Table 2 is abstracted from a larger table in Efron (1981b), in which some of the methods for estimating a standard error required the sample size to be even.

TABLE 1
A sampling experiment comparing the bootstrap and jackknife estimates of standard error for the 25% trimmed mean, sample size n = 15

                                 F standard normal         F negative exponential
                                 Ave     SD      CV        Ave     SD      CV
Bootstrap $\hat\sigma$ (B = 200)   .287    .071    .25       .242    .078    .32
Jackknife $\hat\sigma_J$           .280    .084    .30       .224    .085    .38
True (minimum CV)                .286            (.19)     .232            (.27)
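A small simulation makes the two levels of Monte Carlo behind Table 1 concrete. This is a sketch under stated assumptions. The 25% trimmed mean is taken to drop `int(0.25 * n)` observations from each end, and other conventions exist. The default number of trials is far smaller than in the original experiment. The output should therefore only roughly resemble the bootstrap entries for the standard normal case in Table 1. `sampling_experiment` and the other function names are illustrative.

```python
import math
import random

def trimmed_mean(x, alpha=0.25):
    """Mean after discarding int(alpha * n) observations from each end (one convention)."""
    xs = sorted(x)
    k = int(alpha * len(xs))
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

def sd(values):
    """Sample standard deviation with divisor len(values) - 1."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def bootstrap_se(y, stat, B, rng):
    """Inner level of Monte Carlo: bootstrap samples drawn with y held fixed."""
    return sd([stat(rng.choices(y, k=len(y))) for _ in range(B)])

def sampling_experiment(trials=200, n=15, B=200, seed=0):
    """Outer level: fresh samples y drawn from F = N(0, 1).

    Returns the average, standard deviation and CV of the bootstrap
    estimates, i.e. the Ave, SD and CV columns of Table 1.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(bootstrap_se(y, trimmed_mean, B, rng))
    ave = sum(estimates) / trials
    spread = sd(estimates)
    return ave, spread, spread / ave

print(sampling_experiment())
```

The variability reported as SD comes only from the outer loop, that is, from the random choice of $y$. Increasing `B` reduces the inner Monte Carlo noise but leaves that outer variability essentially unchanged.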
TABLE 2
Estimates of standard error for the correlation coefficient $\hat\theta$ and for $\hat\phi = \tanh^{-1}\hat\theta$; sample size $n = 14$, distribution $F$ bivariate normal with true correlation $\rho = .5$ (from a larger table in Efron, 1981b)

Summary statistics for 200 trials
                                          Standard error estimates for $\hat\theta$    Standard error estimates for $\hat\phi$
                                          Ave    SD     CV    $\sqrt{MSE}$           Ave    SD     CV    $\sqrt{MSE}$
1. Bootstrap B = 128                      .206   .066   .32   .067                   .301   .065   .22   .065
2. Bootstrap B = 512                      .206   .063   .31   .064                   .301   .062   .21   .062
3. Normal smoothed bootstrap B = 128      .200   .060   .30   .063                   .296   .041   .14   .041
4. Uniform smoothed bootstrap B = 128     .205   .061   .30   .062                   .298   .058   .19   .058
5. Uniform smoothed bootstrap B = 512     .205   .059   .29   .060                   .296   .052   .18   .052
6. Jackknife                              .223   .085   .38   .085                   .314   .090   .29   .091
7. Delta method                           .175   .058   .33   .072                   .244   .052   .21   .076
   (infinitesimal jackknife)
8. Normal theory                          .217   .056   .26   .056                   .302   0      0     .003
True standard error                       .218                                       .299



The left side of Table 2 refers to $\hat\theta$, while the right side refers to $\hat\phi = \tanh^{-1}(\hat\theta) = .5 \log[(1 + \hat\theta)/(1 - \hat\theta)]$. For each estimator of standard error, the root mean squared error of estimation $[E(\hat\sigma - \sigma)^2]^{1/2}$ is given in the column headed $\sqrt{MSE}$.

The bootstrap was run with $B = 128$ and also with $B = 512$, the latter value yielding only slightly better estimates, in accordance with the results of Section 9. Further increasing $B$ would be pointless. It can be shown that $B = \infty$ gives $\sqrt{MSE} = .063$ for $\hat\theta$, only .001 less than $B = 512$. The normal theory estimate (2.5), which we know to be ideal for this sampling experiment, has $\sqrt{MSE} = .056$.

We can compromise between the totally nonparametric bootstrap estimate $\hat\sigma$ and the totally parametric bootstrap estimate $\hat\sigma_{\mathrm{NORM}}$. This is done in lines 3, 4, and 5 of Table 2. Let $\hat\Sigma = \sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)'/n$ be the sample covariance matrix of the observed data. The normal smoothed bootstrap draws the bootstrap sample from $\hat F \oplus N_2(0, .25\hat\Sigma)$, with $\oplus$ indicating convolution. This amounts to estimating $F$ by an equal mixture of the $n$ distributions $N_2(x_i, .25\hat\Sigma)$, that is, by a normal window estimate. Each point $x_i^*$ in a smoothed bootstrap sample is the sum of a randomly selected original data point $x_j$, plus an independent bivariate normal point $z_j \sim N_2(0, .25\hat\Sigma)$. Smoothing makes little difference on the left side of the table, but is spectacularly effective in the $\hat\phi$ case. The latter result is suspect, since the true sampling distribution is bivariate normal, and the function $\phi = \tanh^{-1}\theta$ is specifically chosen to have nearly constant standard error in the bivariate normal family. The uniform smoothed bootstrap samples from $\hat F \oplus \mathcal{U}(0, .25\hat\Sigma)$, where $\mathcal{U}(0, .25\hat\Sigma)$ is the uniform distribution on a rhombus selected so that $\mathcal{U}$ has mean vector 0 and covariance matrix $.25\hat\Sigma$. It yields moderate reductions in $\sqrt{MSE}$ for both sides of the table.

Line 7 of Table 2 refers to the delta method, which is the most common method of assigning nonparametric standard errors. Surprisingly enough, it is badly biased downward on both sides of the table. The delta method, also known as the method of statistical differentials, the Taylor series method, and the infinitesimal jackknife, is discussed in Section 10.
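The normal smoothed bootstrap changes only step (i) of the algorithm. Each resampled point $x_j$ receives an independent perturbation $z_j \sim N_2(0, .25\hat\Sigma)$. The sketch below shows this in plain Python. As a hypothetical illustration it is applied to the law school data of Fig. 1, not to the simulated $n = 14$ samples behind Table 2, so its output is not comparable to the table. The uniform rhombus version is not shown. `smoothed_bootstrap_se` and `fisher_z` are illustrative names, and the smoothing factor `h2 = 0.25` is the value quoted in the text.

```python
import math
import random

# Law school data (caption of Fig. 1), used here purely for illustration.
LAW = [(576, 3.39), (635, 3.30), (558, 2.81), (578, 3.03), (666, 3.44),
       (580, 3.07), (555, 3.00), (661, 3.43), (651, 3.36), (605, 3.13),
       (653, 3.12), (575, 2.74), (545, 2.76), (572, 2.88), (594, 2.96)]

def corr(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return sxy / math.sqrt(sxx * syy)

def fisher_z(points):
    """phi-hat = tanh^{-1}(theta-hat)."""
    return math.atanh(corr(points))

def smoothed_bootstrap_se(data, stat, B=128, h2=0.25, seed=0):
    """Normal smoothed bootstrap: x* = x_j + z_j with z_j ~ N2(0, h2 * Sigma-hat)."""
    rng = random.Random(seed)
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    # h2 * Sigma-hat (divisor n) and its 2 x 2 Cholesky factor
    a = h2 * sum((p[0] - mx) ** 2 for p in data) / n
    b = h2 * sum((p[0] - mx) * (p[1] - my) for p in data) / n
    c = h2 * sum((p[1] - my) ** 2 for p in data) / n
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 ** 2)
    reps = []
    for _ in range(B):
        sample = []
        for xj, yj in rng.choices(data, k=n):      # randomly selected original points
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            sample.append((xj + l11 * z1, yj + l21 * z1 + l22 * z2))
        reps.append(stat(sample))
    m = sum(reps) / B
    return math.sqrt(sum((r - m) ** 2 for r in reps) / (B - 1))

print(smoothed_bootstrap_se(LAW, corr), smoothed_bootstrap_se(LAW, fisher_z))
```

The window covariance is taken proportional to $\hat\Sigma$. The smoothed distribution $\hat F \oplus N_2(0, .25\hat\Sigma)$ then has covariance $1.25\hat\Sigma$, so its correlation equals the sample correlation $\hat\theta$.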

3. EXAMPLES

In this section we apply bootstrap standard error estimation to some complicated statistics.

Example 1. Cox's Proportional Hazards Model

The data for this example come from a study of leukemia remission times in mice, taken from Cox (1972). They consist of measurements of remission time ($y$) in weeks for two groups, treatment ($x = 0$) and control ($x = 1$), and a 0-1 variable ($\delta_i$) indicating whether the remission time is censored (0) or complete (1). There are 21 mice in each group.

The standard regression model for censored data is Cox's proportional hazards model (Cox, 1972). It assumes that the hazard function $h(t \mid x)$, the probability of going into remission in the next instant given no remission up to time $t$ for a mouse with covariate $x$, is of the form

(3.1)  $h(t \mid x) = h_0(t) e^{\beta x}$.

Here $h_0(t)$ is an arbitrary unspecified function. Since $x$ here is a group indicator, this means simply that the hazard for the control group is $e^{\beta}$ times the hazard for the treatment group. The regression parameter $\beta$ is estimated independently of $h_0(t)$ through maximization of the so-called "partial likelihood"

(3.2)  $\mathrm{PL} = \prod_{i \in D} \dfrac{e^{\beta x_i}}{\sum_{j \in R_i} e^{\beta x_j}}$,

where $D$ is the set of indices of the failure times and $R_i$ is the set of indices of those at risk at time $y_i$. This maximization requires an iterative computer search.

The estimate $\hat\beta$ for these data turns out to be 1.51. Taken literally, this says that the hazard rate is $e^{1.51} = 4.53$ times higher in the control group than in the treatment group, so the treatment is very effective. What is the standard error of $\hat\beta$? The usual asymptotic maximum likelihood theory, one over the square root of the observed Fisher information, gives an estimate of .41. Despite the complicated nature of the estimation procedure, we can also estimate the standard error using the bootstrap. We sample with replacement from the triples $\{(y_1, x_1, \delta_1), \ldots, (y_{42}, x_{42}, \delta_{42})\}$. For each bootstrap sample $\{(y_1^*, x_1^*, \delta_1^*), \ldots, (y_{42}^*, x_{42}^*, \delta_{42}^*)\}$ we form the partial likelihood and numerically maximize it to produce the bootstrap estimate $\hat\beta^*$. A histogram of 1000 bootstrap values is shown in Fig. 3. The bootstrap estimate of the standard error of $\hat\beta$ based on these 1000 numbers is .42. Although the bootstrap and standard estimates agree, it is interesting to note that the bootstrap distribution is skewed to the right. This leads us to ask: is there other information that we can extract from the bootstrap distribution other than a standard error estimate? The answer is yes; in particular, the bootstrap distribution can be used to form a confidence interval for $\beta$, as we will see in Section 9. The shape of the bootstrap distribution will help determine the shape of the confidence interval.

In this example our resampling unit was the triple $(y_i, x_i, \delta_i)$, and we ignored the unique elements of the problem, i.e., the censoring and the particular model being used. In fact, there are other ways to bootstrap this problem. We will see this when we discuss bootstrapping censored data in Section 5.
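The resampling scheme above, with triples drawn with replacement and $\beta$ refitted by maximizing (3.2), can be sketched in plain Python. This is a minimal illustration under several assumptions. The covariate is a scalar. The risk set is taken literally as $R_i = \{j : y_j \ge y_i\}$, which amounts to Breslow-style handling of tied times. The maximization uses Newton-Raphson. The data are simulated and hypothetical, because the Cox (1972) mouse data are not reproduced here. `score_and_information`, `fit_cox` and `bootstrap_cox` are illustrative names, not a library API.

```python
import math
import random

def score_and_information(beta, data):
    """Derivative and minus second derivative of log PL (3.2), scalar covariate.

    data is a list of triples (y, x, delta); the risk set R_i is {j : y_j >= y_i}.
    """
    score, info = 0.0, 0.0
    for yi, xi, di in data:
        if di != 1:
            continue                       # only failure times (i in D) contribute
        risk = [xj for yj, xj, _ in data if yj >= yi]
        w = [math.exp(beta * xj) for xj in risk]
        total = sum(w)
        m1 = sum(wj * xj for wj, xj in zip(w, risk)) / total
        m2 = sum(wj * xj * xj for wj, xj in zip(w, risk)) / total
        score += xi - m1
        info += m2 - m1 * m1
    return score, info

def fit_cox(data, tol=1e-10, max_iter=50):
    """Newton-Raphson maximization of the partial likelihood; None if it fails."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, data)
        if info <= 0:
            return None                    # e.g. a resample with one covariate value
        step = score / info
        beta += step
        if abs(beta) > 20:
            return None                    # diverging fit (monotone likelihood)
        if abs(step) < tol:
            return beta
    return None

def bootstrap_cox(data, B=200, seed=0):
    """Resample whole triples (y_i, x_i, delta_i), refit; return SE and replications."""
    rng = random.Random(seed)
    fits = (fit_cox(rng.choices(data, k=len(data))) for _ in range(B))
    reps = [b for b in fits if b is not None]
    m = sum(reps) / len(reps)
    return math.sqrt(sum((r - m) ** 2 for r in reps) / (len(reps) - 1)), reps

# Hypothetical data, NOT the Cox (1972) mouse data: 21 subjects per group, the
# control group (x = 1) having e^1 times the hazard of the treatment group (x = 0).
gen = random.Random(42)
data = []
for x in [0] * 21 + [1] * 21:
    t = gen.expovariate(math.exp(1.0 * x))    # event time
    c = gen.expovariate(0.3)                  # independent censoring time
    data.append((min(t, c), x, 1 if t <= c else 0))

beta_hat = fit_cox(data)
se_boot, reps = bootstrap_cox(data)
_, info = score_and_information(beta_hat, data)
print(beta_hat, se_boot, 1 / math.sqrt(info))  # bootstrap vs. observed-information SE
```

The last value printed is the usual $J^{-1/2}$ estimate. Comparing it with the bootstrap value mirrors the .41 versus .42 comparison in the text, although with different data. A histogram of `reps` plays the role of Fig. 3.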

[Figure 3: histogram of the 1000 bootstrap replications $\hat\beta^*$; horizontal axis from about 0.5 to 2.5, vertical axis (counts) up to 200.]

FIG. 3. Histogram of 1000 bootstrap replications for the mouse leukemia data.

Example 2: Linear and Projection Pursuit Regression

We illustrate an application of the bootstrap to standard linear least squares regression as well as to a nonparametric regression technique.

Consider the standard regression setup. We have $n$ observations on a response $Y$ and covariates $(X_1, X_2, \ldots, X_p)$. Denote the $i$th observed vector of covariates by $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})'$. The usual linear regression model assumes

(3.3)  $E(Y_i) = \alpha + \sum_{j=1}^{p} \beta_j x_{ij}$.

Friedman and Stuetzle (1981) introduced a more general model, the projection pursuit regression model

(3.4)  $E(Y_i) = \sum_{j=1}^{m} s_j(a_j \cdot x_i)$.

The $p$ vectors $a_j$ are unit vectors ("directions"), and the functions $s_j(\cdot)$ are unspecified. Estimation of $\{a_1, s_1(\cdot)\}, \ldots, \{a_m, s_m(\cdot)\}$ is performed in a forward stepwise manner as follows. Consider $\{a_1, s_1(\cdot)\}$. Given a direction $a_1$, $s_1(\cdot)$ is estimated by a nonparametric smoother (e.g., running mean) of $y$ on $a_1 \cdot x$. The projection pursuit regression algorithm searches over all unit directions to find the direction $\hat a_1$ and associated function $\hat s_1(\cdot)$ that minimize $\sum_{i=1}^{n} (y_i - s_1(a_1 \cdot x_i))^2$. Then residuals are taken and the next direction and function are determined. This process is continued until no additional term significantly reduces the residual sum of squares.

Notice the relation of the projection pursuit regression model to the standard linear regression model. When the function $s_1(\cdot)$ is forced to be linear and is estimated by the usual least squares method, a one-term projection pursuit model is exactly the same as the standard linear regression model. That is to say, the fitted model $\hat s_1(\hat a_1 \cdot x_i)$ exactly equals the least squares fit $\hat\alpha + \sum_{j=1}^{p} \hat\beta_j x_{ij}$. This is because the least squares fit, by definition, finds the best direction and the best linear function of that direction. Note also that adding another linear term $s_2(a_2 \cdot x)$ would not change the fitted model, since the sum of two linear functions is another linear function.

Hastie and Tibshirani (1984) applied the bootstrap to the linear and projection pursuit regression models to assess the variability of the coefficients in each. The data they considered are taken from Breiman and Friedman (1985). The response $Y$ is Upland atmospheric ozone concentration (ppm); the covariates are $X_1$ = Sandburg Air Force base temperature (°C), $X_2$ = inversion base height (ft), $X_3$ = Daggot pressure gradient (mm Hg), $X_4$ = visibility (miles), and $X_5$ = day of the year. There are 330 observations. The number of terms ($m$) in the model (3.4) is taken to be two. The projection pursuit algorithm chose directions $\hat a_1 = (.80, -.38, .37, -.24, -.14)'$ and $\hat a_2 = (.07, .16, .04, -.05, -.98)'$. These directions consist mostly of Sandburg Air Force temperature and day of the year, respectively. (We do not show graphs of the estimated functions $\hat s_1(\cdot)$ and $\hat s_2(\cdot)$, although in a full analysis of the data they would also be of interest.) Forcing $\hat s_1(\cdot)$ to be linear results in the direction $\hat a_1 = (.90, -.37, .03, -.14, -.19)'$. These are just the usual least squares estimates $\hat\beta_1, \ldots, \hat\beta_p$, scaled so that $\sum_{j=1}^{p} \hat\beta_j^2 = 1$.

To assess the variability of the directions, a bootstrap sample is drawn with replacement from $(y_1, x_{11}, \ldots, x_{15}), \ldots, (y_{330}, x_{330,1}, \ldots, x_{330,5})$ and the projection pursuit algorithm is applied. Figs. 4 and 5 show histograms of the directions $\hat a_1^*$ and $\hat a_2^*$ for 200 bootstrap replications. Also shown in Fig. 4 (broken histogram) are the bootstrap replications of $\hat a_1$ with $\hat s_1(\cdot)$ forced to be linear.

[Figure 4: five panels, one for each component $a_1, \ldots, a_5$ of the first direction, showing smoothed histograms of the bootstrapped coefficients on a horizontal axis from -1 to 0.5.]

FIG. 4. Smoothed histograms of the bootstrapped coefficients for the first term in the projection pursuit regression model. Solid histograms are for the usual projection pursuit model; the dotted histograms are for linear $\hat s(\cdot)$.

[Figure 5: five panels, one for each component $a_1, \ldots, a_5$ of the second direction, showing smoothed histograms of the bootstrapped coefficients on a horizontal axis from -1 to 0.5.]

FIG. 5. Smoothed histograms of the bootstrapped coefficients for the second term in the projection pursuit model.

The first direction of the projection pursuit model is quite stable and only slightly more variable than the corresponding linear regression direction. But the second direction is extremely unstable! It is clearly unwise to put any faith in the second direction of the original projection pursuit model.

Example 3: Cox's Model and Local Likelihood Estimation

In this example, we return to Cox's proportional hazards model described in Example 1, but with a few added twists.

The data that we will discuss come from the Stanford heart transplant program and are given in Miller and Halpern (1982). The response $y$ is survival time in weeks after a heart transplant, the covariate $x$ is age at transplant, and the 0-1 variable $\delta$ indicates whether the survival time is censored (0) or complete (1). There are measurements on 157 patients. A proportional hazards model was fit to these data, with a quadratic term, i.e., $h(t \mid x) = h_0(t) e^{\beta_1 x + \beta_2 x^2}$. Both $\hat\beta_1$ and $\hat\beta_2$ are highly significant; the broken curve in Fig. 6 is $\hat\beta_1 x + \hat\beta_2 x^2$ as a function of $x$.

For comparison, Fig. 6 shows (solid line) another estimate. This was computed using local likelihood estimation (Tibshirani and Hastie, 1984). Given a general proportional hazards model of the form $h(t \mid x) = h_0(t) e^{s(x)}$, the local likelihood technique assumes nothing about the parametric form of $s(x)$; instead it estimates $s(x)$ nonparametrically using a kind of local averaging. The algorithm is very computationally intensive, and standard maximum likelihood theory cannot be applied.

A comparison of the two functions reveals an important qualitative difference: the parametric estimate suggests that the hazard decreases sharply up to age 34, then rises; the local likelihood estimate stays approximately constant up to age 45, then rises. Has the forced fitting of a quadratic function produced a misleading result? To answer this question, we can bootstrap the local likelihood estimate. We sample with replacement from the triples $\{(y_1, x_1, \delta_1), \ldots, (y_{157}, x_{157}, \delta_{157})\}$ and apply the local likelihood algorithm to each bootstrap sample. Fig. 7 shows estimated curves from 20 bootstrap samples.

Some of the curves are flat up to age 45, others are decreasing. Hence the original local likelihood estimate is highly variable in this region, and on the basis of these data we cannot determine the true behavior of the function there. A look back at the original data shows that while half of the patients were under 45, only 13% of the patients were under 30. Fig. 7 also shows that the estimate is stable near the middle ages but unstable for the older patients.

[Figure 6: log relative risk plotted against age (horizontal axis, about 10 to 60).]

FIG. 6. Estimates of log relative risk for the Stanford heart transplant data. Broken curve: parametric estimate. Solid curve: local likelihood estimate.



TABLE 3
BHCG blood serum levels for 54 patients having metastasized breast cancer, in ascending order

0.1, 0.1, 0.2, 0.4, 0.4, 0.6, 0.8, 0.8, 0.9, 0.9, 1.3, 1.3, 1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.7, 1.8, 2.0, 2.0, 2.2, 2.2, 2.2, 2.3, 2.3, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4, 2.5, 2.5, 2.5, 2.7, 2.7, 2.8, 2.9, 2.9, 2.9, 3.0, 3.1, 3.1, 3.2, 3.2, 3.3, 3.3, 3.5, 4.4, 4.5, 6.4, 9.4

[Figure 7: twenty bootstrap curves of the local likelihood estimate plotted against age (horizontal axis, about 10 to 60).]

FIG. 7. 20 bootstraps of the local likelihood estimate for the Stanford heart transplant data.

4. OTHER MEASURES OF STATISTICAL ERROR

So far we have discussed statistical error, or accuracy, in terms of the standard error. It is easy to assess other measures of statistical error, such as bias or prediction error, using the bootstrap.

Consider the estimation of bias. For a given statistic $\hat\theta(y)$, and a given parameter $\mu(F)$, let

(4.1)  $R(y, F) = \hat\theta(y) - \mu(F)$.

(It will help keep our notation clear to call the parameter of interest $\mu$ rather than $\theta$.) For example, $\mu$ might be the mean of the distribution $F$, assuming the sample space $\mathcal{X}$ is the real line, and $\hat\theta$ the 25% trimmed mean. The bias of $\hat\theta$ for estimating $\mu$ is

(4.2)  $\beta(F) = E_F R(y, F) = E_F\{\hat\theta(y)\} - \mu(F)$.

The notation $E_F$ indicates expectation with respect to the probability mechanism appropriate to $F$, in this case $y = (x_1, x_2, \ldots, x_n)$ a random sample from $F$.

The bootstrap estimate of bias is

(4.3)  $\hat\beta = \beta(\hat F) = E_{\hat F} R(y^*, \hat F) = E_{\hat F}\{\hat\theta(y^*)\} - \mu(\hat F)$.

As in Section 2, $y^*$ denotes a random sample $(x_1^*, x_2^*, \ldots, x_n^*)$ from $\hat F$, i.e., a bootstrap sample. To numerically evaluate $\hat\beta$, all we do is change step (iii) of the bootstrap algorithm in Section 2 to

(4.4)  $\hat\beta_B = \dfrac{1}{B} \sum_{b=1}^{B} R(y^*(b), \hat F) = \hat\theta^*(\cdot) - \mu(\hat F)$.

As an example, consider the blood serum data of Table 3. Suppose we wish to estimate the true mean $\mu = E_F\{X\}$ of this population using $\hat\theta$, the 25% trimmed mean. We calculate $\hat\mu = \mu(\hat F) = 2.32$, the sample mean of the 54 observations, and $\hat\theta = 2.24$, the trimmed mean. The trimmed mean is lower because it discounts the effect of the large observations 6.4 and 9.4. It looks like the trimmed mean might be more robust for this type of data, and as a matter of fact a bootstrap analysis, $B = 1000$, gave estimated standard error $\hat\sigma = .16$ for $\hat\theta$, compared to .21 for the sample mean. But what about bias? The same 1000 bootstrap replications which gave $\hat\sigma = .16$ also gave $\hat\theta^*(\cdot) = 2.29$, so

(4.5)  $\hat\beta_B = 2.29 - 2.32 = -0.03$,

according to (4.4).

2.27. Removing bias in thisway is frequently a bad idea (see Hinkley, 1978),but at least the bootstrap has given us a reasonable analysis ofthebias picture and standard error of0. Here is anothermeasureof statistical accuracy, different from either bias or standard error. Let 0(y) be the 25% trimmed meanand M(F) be the meanof F, as in the serum example, and also let i(y) be the interquartile range, thedistance the25thand between 75thpercentiles of the sampley = (x1, x2, *--, xn).
Define (4.6) R(y, F) =
(y)-

bias by subtraction, we get 0-

according to (4.4). (The estimated standard deviation of fBdue to the limitations of having B = 1000 is only0.005in thiscase,so we can ignore bootstraps the difference between fB and A3.) Whether or not a bias of magnitude -0.03 is too largedependson the context oftheproblem. If we attempt to remove the
= 2.24 - (-0.03)-

I(y)

(F)

(4.4)

0*(b) -()
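The bias calculation (4.4) for the serum data can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the authors' code. The following are assumptions:

- the trimming rule, which drops floor(0.25 n) = 13 observations from each end;
- the random seed;
- the helper names `trimmed_mean` and `bootstrap_bias_and_se`.

With that trimming rule the point estimates reproduce $\hat\mu = 2.32$ and $\hat\theta = 2.24$. The bootstrap bias and standard error depend on the random draws, so they need not match -0.03 and .16 exactly.

```python
import numpy as np

# Table 3: BHCG blood serum levels for 54 patients, in ascending order
SERUM = np.array([
    0.1, 0.1, 0.2, 0.4, 0.4, 0.6, 0.8, 0.8, 0.9, 0.9,
    1.3, 1.3, 1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.7, 1.8,
    2.0, 2.0, 2.2, 2.2, 2.2, 2.3, 2.3, 2.4, 2.4, 2.4,
    2.4, 2.4, 2.4, 2.5, 2.5, 2.5, 2.7, 2.7, 2.8, 2.9,
    2.9, 2.9, 3.0, 3.1, 3.1, 3.2, 3.2, 3.3, 3.3, 3.5,
    4.4, 4.5, 6.4, 9.4,
])

def trimmed_mean(x, prop=0.25):
    """Mean after dropping floor(prop * n) order statistics from each end (assumed rule)."""
    x = np.sort(np.asarray(x))
    k = int(prop * len(x))
    return x[k:len(x) - k].mean()

def bootstrap_bias_and_se(x, stat, B=1000, seed=0):
    """Bootstrap bias (4.4) of stat for the mean mu(F), plus its bootstrap standard error."""
    rng = np.random.default_rng(seed)
    n = len(x)
    reps = np.array([stat(x[rng.integers(0, n, size=n)]) for _ in range(B)])
    mu_hat = x.mean()                 # mu(F-hat): mean of the empirical distribution
    bias = reps.mean() - mu_hat       # theta*(.) - mu(F-hat), as in (4.4)
    se = reps.std(ddof=1)             # bootstrap standard error of theta-hat
    return bias, se

theta_hat = trimmed_mean(SERUM)
bias, se = bootstrap_bias_and_se(SERUM, trimmed_mean, B=1000, seed=1)
print(f"sample mean = {SERUM.mean():.2f}, trimmed mean = {theta_hat:.2f}")
print(f"bootstrap bias = {bias:.3f}, bootstrap SE = {se:.3f}")
print(f"bias-corrected estimate = {theta_hat - bias:.2f}")
```

The first line printed is "sample mean = 2.32, trimmed mean = 2.24". The bias is measured against the sample mean $\mu(\hat F)$ because the parameter of interest is the population mean $\mu$, not a population trimmed mean.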

R is like a Student's t statistic, except that we have substituted the 25% trimmed mean for the sample mean and the interquartile range for the standard deviation. Suppose we know the 5th and 95th percentiles of R(y, F), say $\rho^{(.05)}(F)$ and $\rho^{(.95)}(F)$, where the definition of $\rho^{(.05)}(F)$ is

(4.7)  $\text{Prob}_F\{R(y, F) < \rho^{(.05)}(F)\} = .05$,

and similarly for $\rho^{(.95)}(F)$. The relationship $\text{Prob}_F\{\rho^{(.05)}(F) \le R < \rho^{(.95)}(F)\} = .90$ combines with definition (4.6) to give a central 90% "t interval" for the mean $\mu(F)$:

(4.8)  $\mu \in [\hat\theta - \hat\imath\,\rho^{(.95)},\ \hat\theta - \hat\imath\,\rho^{(.05)}]$.

³ As $B \to \infty$, $\hat\beta_B$ goes to $\hat\beta$, (4.3).

Of course we do not know $\rho^{(.05)}(F)$ and $\rho^{(.95)}(F)$, but we can approximate them by their bootstrap estimates $\rho^{(.05)}(\hat F)$ and $\rho^{(.95)}(\hat F)$. A bootstrap sample $y^*$ gives a bootstrap value of (4.6), $R(y^*, \hat F) = (\hat\theta(y^*) - \mu(\hat F))/\hat\imath(y^*)$, where $\hat\imath(y^*)$ is the interquartile range of the bootstrap data $x_1^*, x_2^*, \ldots, x_n^*$. For any fixed number $\rho$, the bootstrap estimate of $\text{Prob}_F\{R < \rho\}$ based on B bootstrap samples is

(4.9)  $\#\{R(y^*(b), \hat F) < \rho\}/B$.

By keeping track of the empirical distribution of the $R(y^*(b), \hat F)$, we can pick off the values of $\rho$ that make (4.9) equal .05 and .95. These approach $\rho^{(.05)}(\hat F)$ and $\rho^{(.95)}(\hat F)$ as $B \to \infty$. For the serum data, B = 1000 bootstrap replications gave $\rho^{(.05)}(\hat F) = -.303$ and $\rho^{(.95)}(\hat F) = .078$. We substitute these values into (4.8) together with the observed estimates $\hat\theta = 2.24$ and $\hat\imath = 1.40$. This gives

(4.10)  $\mu \in [2.13, 2.66]$

as a central 90% "bootstrap t interval" for the true mean $\mu(F)$. This is considerably shorter than the standard t interval for $\mu$ based on 53 degrees of freedom, $\hat\mu \pm 1.67\,\hat\sigma = [1.97, 2.67]$. Here $\hat\sigma = .21$ is the usual estimate of standard error (1.3).

Bootstrap confidence intervals are discussed further in Sections 7 and 8. They require more bootstrap replications than bootstrap standard errors do, on the order of B = 1000 rather than B = 50 or 100. This point is discussed briefly in Section 9.

By now it should be clear that we can use any random variable $R(y, F)$ to measure accuracy, not just (4.1) or (4.6). We then estimate $E_F\{R(y, F)\}$ by its bootstrap value $E_{\hat F}\{R(y^*, \hat F)\} \doteq \sum_{b=1}^{B} R(y^*(b), \hat F)/B$. Similarly, we can estimate $E_F R(y, F)^2$ by $E_{\hat F} R(y^*, \hat F)^2$, and so on.

Efron (1983) considers the prediction problem, in which a training set of data is used to construct a prediction rule. A naive estimate of the rule's accuracy is the proportion of correct guesses it makes on its own training set. This can be greatly overoptimistic, since the prediction rule is explicitly constructed to minimize errors on the training set. In this case, a natural choice of $R(y, F)$ is the overoptimism: the difference between the naive estimate and the actual success rate of the prediction rule for new data. Efron (1983) gives the bootstrap estimate of overoptimism and shows that it is closely related to cross-validation, the usual method of estimating overoptimism. The paper goes on to show that some modifications of the bootstrap estimate greatly outperform both cross-validation and the bootstrap.

5. MORE COMPLICATED DATA SETS

The bootstrap is not restricted to situations where the data are a simple random sample from a single distribution. Suppose, for instance, that the data consist of two independent random samples,

(5.1)  $U_1, U_2, \ldots, U_m \sim F$ and $V_1, V_2, \ldots, V_n \sim G$,

where F and G are possibly different distributions on the real line. Suppose also that the statistic of interest is the Hodges-Lehmann shift estimate

(5.2)  $\hat\theta = \text{median}\{V_j - U_i,\ i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, n\}$.

Having observed $U_1 = u_1, U_2 = u_2, \ldots, V_n = v_n$, we desire an estimate for $\sigma(F, G)$, the standard error of $\hat\theta$. The bootstrap estimate of $\sigma(F, G)$ is $\hat\sigma = \sigma(\hat F, \hat G)$, where $\hat F$ is the empirical distribution of $u_1, u_2, \ldots, u_m$ and $\hat G$ is the empirical distribution of $v_1, v_2, \ldots, v_n$.

It is easy to modify the Monte Carlo algorithm of Section 2 to numerically evaluate $\hat\sigma$. Let $y = (u_1, u_2, \ldots, v_n)$ be the observed data vector. A bootstrap sample $y^* = (u_1^*, u_2^*, \ldots, u_m^*, v_1^*, v_2^*, \ldots, v_n^*)$ consists of a random sample $U_1^*, \ldots, U_m^*$ from $\hat F$ and an independent random sample $V_1^*, \ldots, V_n^*$ from $\hat G$. With only this modification, steps (i) through (iii) of the Monte Carlo algorithm produce $\hat\sigma_B$, (2.4), which approaches $\hat\sigma$ as $B \to \infty$.

Table 4 reports on a simulation experiment investigating how well the bootstrap works on this problem. 100 trials of situation (5.1) were run, with m = 6, n = 9, and F and G both Uniform [0, 1]. For each trial, both B = 100 and B = 200 bootstrap replications were generated. The bootstrap estimate $\hat\sigma_B$ was nearly unbiased for the true standard error $\sigma(F, G) = .167$ for either B = 100 or B = 200, with quite a small standard deviation from trial to trial. The improvement in going from B = 100 to B = 200 is too small to show up in this experiment.

TABLE 4
Bootstrap estimate of standard error for the Hodges-Lehmann two-sample shift estimate; 100 trials

Summary statistics for $\hat\sigma_B$:
          B = 100   B = 200   True σ
  Ave     .165      .166      .167
  SD      .030      .031
  CV      .18       .19

Note: m = 6, n = 9; true distributions F and G both uniform [0, 1].

In practice, statisticians must often consider quite complicated data structures: time series models, multifactor layouts, sequential sampling, censored and missing data, and so on. Fig. 8 illustrates how the bootstrap estimation process proceeds in a general situation. The actual probability mechanism P that generates the observed data y belongs to some family $\mathcal{P}$ of possible probability mechanisms. In the Hodges-Lehmann example:

- P = (F, G), a pair of distributions on the real line;
- $\mathcal{P}$ equals the family of all such pairs;
- $y = (u_1, u_2, \ldots, u_m, v_1, v_2, \ldots, v_n)$ is generated by random sampling m times from F and n times from G.

We have a random variable of interest R(y, P), which depends on both y and the unknown model P. We wish to estimate some aspect of the distribution of R. In the Hodges-Lehmann example, $R(y, P) = \hat\theta(y) - E_P\{\hat\theta\}$, and we estimated $\sigma(P) = \{E_P R(y, P)^2\}^{1/2}$, the standard error of $\hat\theta$. As before, the notation $E_P$ indicates expectation when y is generated according to mechanism P.

We assume that we have some way of estimating the entire probability model P from the data y, producing the estimate called $\hat P$ in Fig. 8. (In the two-sample problem, $\hat P = (\hat F, \hat G)$, the pair of empirical distributions.) This is the crucial step for the bootstrap. It can be carried out either parametrically or nonparametrically, by maximum likelihood or by some other estimation technique.

Once we have $\hat P$, we can use Monte Carlo methods to generate bootstrap data sets $y^*$, following the same rules by which y is generated from P. The bootstrap random variable $R(y^*, \hat P)$ is observable, since we know $\hat P$ as well as $y^*$. So the distribution of $R(y^*, \hat P)$ can be found by Monte Carlo sampling. The bootstrap estimate of $E_P R(y, P)$ is then $E_{\hat P} R(y^*, \hat P)$, and likewise for estimating any other aspect of the distribution of $R(y, P)$.

A regression model is a familiar example of a complicated data structure. We observe $y = (y_1, y_2, \ldots, y_n)$, where
(5.3)  $y_i = g(\beta, t_i) + \varepsilon_i$,  i = 1, 2, ..., n.

Here $\beta$ is a vector of unknown parameters we wish to estimate; for each i, $t_i$ is an observed vector of covariates; and g is a known function of $\beta$ and $t_i$, for instance $e^{\beta' t_i}$. The $\varepsilon_i$ are an iid sample from some unknown distribution F on the real line,

(5.4)  $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n \sim_{\text{iid}} F$,

where F is usually assumed to be centered at 0 in some sense, perhaps $E\{\varepsilon\} = 0$ or $\text{Prob}\{\varepsilon < 0\} = .5$. The probability model is $P = (\beta, F)$; (5.3) and (5.4) describe the step $P \to y$ in Fig. 8. The covariates $t_1, t_2, \ldots, t_n$, like the sample size n in the simple problem (1.1), are considered fixed at their observed values.

For every choice of $\beta$ we have a vector $g(\beta) = (g(\beta, t_1), g(\beta, t_2), \ldots, g(\beta, t_n))$ of predicted values for y. Having observed y, we estimate $\beta$ by minimizing some measure of distance between $g(\beta)$ and y,

(5.5)  $\hat\beta: \min_\beta D(y, g(\beta))$.

The most common choice of D is $D(y, g) = \sum_{i=1}^{n} \{y_i - g(\beta, t_i)\}^2$.

How accurate is $\hat\beta$ as an estimate of $\beta$? Let R(y, P) equal the vector $\hat\beta - \beta$. A familiar measure of accuracy is the mean square error matrix

(5.6)  $\Sigma(P) = E_P(\hat\beta - \beta)(\hat\beta - \beta)' = E_P R(y, P) R(y, P)'$.

The bootstrap estimate of accuracy $\hat\Sigma = \Sigma(\hat P)$ is obtained by following through Fig. 8. There is an obvious choice for $\hat P = (\hat\beta, \hat F)$ in this case. The estimate $\hat\beta$ is obtained from (5.5). Then $\hat F$ is the empirical distribution of the residuals,

(5.7)  $\hat F$: mass 1/n on $\hat\varepsilon_i = y_i - g(\hat\beta, t_i)$,  i = 1, ..., n.

A bootstrap sample $y^*$ is obtained by following rules (5.3) and (5.4),

(5.8)  $y_i^* = g(\hat\beta, t_i) + \varepsilon_i^*$,  i = 1, 2, ..., n,

where $\varepsilon_1^*, \varepsilon_2^*, \ldots, \varepsilon_n^*$ is an iid sample from $\hat F$. Notice that the $\varepsilon_i^*$ are independent bootstrap variates, even though the $\hat\varepsilon_i$ are not independent variates in the usual sense. Each bootstrap sample $y^*(b)$ gives a bootstrap value $\hat\beta^*(b)$,

(5.9)  $\hat\beta^*(b): \min_\beta D(y^*(b), g(\beta))$,

as in (5.5). The estimate

(5.10)  $\hat\Sigma_B = \dfrac{1}{B}\sum_{b=1}^{B} \{\hat\beta^*(b) - \hat\beta^*(\cdot)\}\{\hat\beta^*(b) - \hat\beta^*(\cdot)\}'$

approaches the bootstrap estimate $\hat\Sigma$ as $B \to \infty$. (We could just as well divide by B - 1 in (5.10).)

FIG. 8. A schematic illustration of the bootstrap process for a general probability model P. The expectation of R(y, P) is estimated by the bootstrap expectation of R(y*, P̂). The double arrow indicates the crucial step in applying the bootstrap. (Panel labels: family of possible probability models; actual probability model P; observed data y; R(y, P), random variable of interest; estimated probability model P̂; bootstrap data y*; R(y*, P̂), bootstrap random variable.)

Consider the case of ordinary least squares regression, where $g(\beta, t_i) = \beta' t_i$ and $D(y, g) = \sum_{i=1}^{n} (y_i - \beta' t_i)^2$. Section 7 of Efron (1979a) shows that the bootstrap estimate, B = ∞, can then be calculated without Monte Carlo sampling, and is

(5.11)  $\hat\Sigma = \hat\sigma^2 \Big(\sum_{i=1}^{n} t_i t_i'\Big)^{-1}$,  $\hat\sigma^2 = \sum_{i=1}^{n} (y_i - \hat\beta' t_i)^2 / n$.

This is the usual Gauss-Markov answer, except for the divisor n in the definition of $\hat\sigma^2$.

There is another, simpler way to bootstrap a regression problem. We can consider each covariate-response pair $x_i = (t_i, y_i)$ to be a single data point obtained by simple random sampling from a distribution F. If the covariate vector $t_i$ is p-dimensional, F is a distribution on p + 1 dimensions. Then we apply the bootstrap as originally described in Section 2 to the data set $x_1, x_2, \ldots, x_n \sim_{\text{iid}} F$.

The two bootstrap methods for the regression problem are asymptotically equivalent, but can perform quite differently in small-sample situations. The class of possible probability models $\mathcal{P}$ is different for the two methods. The simple method, described last, takes less advantage of the special structure of the regression problem. It does not give answer (5.11) in the case of ordinary least squares. On the other hand, the simple method gives a trustworthy estimate of $\hat\beta$'s variability even if the regression model (5.3) is not correct. The bootstrap, as outlined in Fig. 5, is very general. Because of this generality there will often be more than one bootstrap solution for a given problem.

As the final example of this section, we discuss censored data. The ages of 97 men at a California retirement center, Channing House, were observed either at death (an uncensored observation) or at the time the study ended (a censored observation). The data set is $y = \{(x_1, d_1), (x_2, d_2), \ldots, (x_{97}, d_{97})\}$, where $x_i$ was the age of the ith man observed, and

$d_i = 1$ if $x_i$ is uncensored, $d_i = 0$ if $x_i$ is censored.

Thus (777, 1) represents a Channing House man observed to die at age 777 months, while (843, 0) represents a man 843 months old when the study ended. His observation could be written as "843+". In fact $d_i$ is just an indicator for the absence or presence of "+". A full description of the Channing House data appears in Hyde (1980).

A typical data point $(X_i, D_i)$ can be thought of as generated in the following way. A real lifetime $X_i^0$ is selected randomly according to a survival curve

(5.12)  $S^0(t) = \text{Prob}\{X_i^0 > t\}$,  $(0 \le t < \infty)$,

and a censoring time $W_i$ is independently selected according to another survival curve

(5.13)  $R(t) = \text{Prob}\{W_i > t\}$,  $(0 \le t < \infty)$.

The statistician gets to observe

(5.14)  $X_i = \min\{X_i^0, W_i\}$

and

(5.15)  $D_i = 1$ if $X_i = X_i^0$, $D_i = 0$ if $X_i = W_i$.

Note: $1 - S^0(t)$ and $1 - R(t)$ are the cumulative distribution functions (cdf) for $X_i^0$ and $W_i$, respectively. With censored data it is more convenient to consider survival curves than cdfs.

Under assumptions (5.12)-(5.15) there is a simple formula for the nonparametric MLE of $S^0(t)$, called the Kaplan-Meier estimator (Kaplan and Meier, 1958). For convenience, suppose $x_1 < x_2 < x_3 < \cdots < x_n$, with n = 97. Then the Kaplan-Meier estimate is

(5.16)  $\hat S^0(t) = \prod_{k=1}^{k_t} \left(\dfrac{n - k}{n - k + 1}\right)^{d_k}$,

where $k_t$ is the value of k such that $t \in [x_k, x_{k+1})$. In the case of no censoring, $\hat S^0(t)$ is equivalent to the observed empirical distribution of $x_1, x_2, \ldots, x_n$. Otherwise (5.16) corrects the empirical distribution to account for censoring. Likewise,

(5.17)  $\hat R(t) = \prod_{k=1}^{k_t} \left(\dfrac{n - k}{n - k + 1}\right)^{1 - d_k}$

is the Kaplan-Meier estimate of the censoring curve R(t).

Fig. 9 shows $\hat S^0(t)$ for the Channing House men. It crosses the 50% survival level at $\hat\theta = 1044$ months. Call this value the observed median lifetime. We can use the bootstrap to assign a standard error to the observed median.

FIG. 9. Kaplan-Meier estimated survival curve for the Channing House men; t = age in months. The median survival age is estimated to be 1044 months (87 years).

The probability mechanism is $P = (S^0, R)$. P produces $(X_i, D_i)$ according to (5.12)-(5.15), and produces $y = \{(x_1, d_1), \ldots, (x_n, d_n)\}$ by n = 97 independent repetitions of this process. An obvious choice of the estimate $\hat P$ in Fig. 8 is $(\hat S^0, \hat R)$, (5.16), (5.17). The rest of

the bootstrap process is automatic. First, $\hat S^0$ and $\hat R$ replace $S^0$ and R in (5.12) and (5.13). Next, n pairs $(X_i^*, D_i^*)$ are independently generated according to rules (5.12)-(5.15), giving the bootstrap data set $y^* = \{(x_1^*, d_1^*), \ldots, (x_n^*, d_n^*)\}$. Finally, the bootstrap Kaplan-Meier curve $\hat S^{0*}$ is constructed according to formula (5.16), and the bootstrap observed median $\hat\theta^*$ is calculated. For the Channing House data, B = 1600 bootstrap replications of $\hat\theta^*$ gave estimated standard error $\hat\sigma = 14.0$ months for $\hat\theta$. An estimated bias of 4.1 months was calculated as in (4.4). Efron (1981b) gives a fuller description.

Once again there is a simpler way to apply the bootstrap. Consider each pair $y_i = (x_i, d_i)$ as an observed point obtained by simple random sampling from a bivariate distribution F. Then apply the bootstrap as described in Section 2 to the data set $y_1, y_2, \ldots, y_n \sim_{\text{iid}} F$. This method makes no use of the special structure (5.12)-(5.15). Surprisingly, it gives exactly the same answers as the more complicated bootstrap method described earlier (Efron, 1981a). This leads to a striking conclusion: bootstrap estimates of variability for the Kaplan-Meier curve give correct standard errors even when the usual assumptions about the censoring mechanism, (5.12)-(5.15), fail.

6. EXAMPLES WITH MORE COMPLICATED DATA STRUCTURES

Example 1: Autoregressive Time Series Model

This example illustrates an application of the bootstrap to a famous time series. The data are the Wolfer annual sunspot numbers for the years 1770-1889 (taken from Anderson, 1975). Let the count for the ith year be $z_i$. After centering the data (replacing $z_i$ by $z_i - \bar z$), we fit a first-order autoregressive model

(6.1)  $z_i = \phi z_{i-1} + \varepsilon_i$, where $\varepsilon_i \sim_{\text{iid}} N(0, \sigma^2)$.

The estimate $\hat\phi$ turned out to be .815. Its estimated standard error, one over the square root of the Fisher information, was .053.

A bootstrap estimate of the standard error of $\hat\phi$ can be obtained as follows. Define the residuals $\hat\varepsilon_i = z_i - \hat\phi z_{i-1}$ for i = 2, 3, ..., 120.
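The residual-resampling scheme for model (6.1) parallels the regression recipe (5.7)-(5.10). It can be sketched as follows. The sunspot numbers are not reproduced here, so the example runs on a simulated AR(1) series. The following are illustrative assumptions rather than the paper's computation:

- the simulated value φ = 0.8 and the series length 120;
- the seeds;
- the least-squares fit `fit_ar1` and the other helper names.

```python
import numpy as np

def fit_ar1(z):
    """Least-squares estimate of phi in z_i = phi * z_{i-1} + eps_i (series assumed centered)."""
    return np.dot(z[1:], z[:-1]) / np.dot(z[:-1], z[:-1])

def ar1_residual_bootstrap(z, B=1000, seed=0):
    """Bootstrap replications of phi-hat obtained by resampling the residuals of model (6.1)."""
    rng = np.random.default_rng(seed)
    z = z - z.mean()                           # center the data
    phi_hat = fit_ar1(z)
    resid = z[1:] - phi_hat * z[:-1]           # eps-hat_i, i = 2, ..., n
    reps = np.empty(B)
    for b in range(B):
        eps_star = rng.choice(resid, size=len(resid), replace=True)
        z_star = np.empty_like(z)
        z_star[0] = z[0]                       # z*_1 = z_1
        for i in range(1, len(z)):             # z*_i = phi-hat * z*_{i-1} + eps*_i
            z_star[i] = phi_hat * z_star[i - 1] + eps_star[i - 1]
        z_star -= z_star.mean()                # re-center before refitting
        reps[b] = fit_ar1(z_star)
    return phi_hat, reps

# Illustrative data only: a simulated AR(1) series, not the Wolfer sunspot numbers.
sim = np.random.default_rng(42)
z = np.zeros(120)
for i in range(1, 120):
    z[i] = 0.8 * z[i - 1] + sim.normal()

phi_hat, reps = ar1_residual_bootstrap(z, B=500, seed=1)
print(f"phi-hat = {phi_hat:.3f}, bootstrap SE = {reps.std(ddof=1):.3f}")
```

For the real series, the text reports $\hat\phi = .815$ and a bootstrap standard error of .055.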
A bootstrap sample $z_1^*, z_2^*, \ldots, z_{120}^*$ is created in three steps:

1. Sample $\varepsilon_2^*, \varepsilon_3^*, \ldots, \varepsilon_{120}^*$ with replacement from the residuals.
2. Let $z_1^* = z_1$ and $z_i^* = \hat\phi z_{i-1}^* + \varepsilon_i^*$, i = 2, ..., 120.
3. After centering the time series $z_1^*, z_2^*, \ldots, z_{120}^*$, take $\hat\phi^*$ as the estimate of the autoregressive parameter for this new time series.

(We could, if we wished, sample the $\varepsilon_i^*$ from a fitted normal distribution.) A histogram of 1000 such bootstrap values $\hat\phi_1^*, \hat\phi_2^*, \ldots, \hat\phi_{1000}^*$ is shown in Fig. 10.

FIG. 10. Bootstrap histogram of $\hat\phi_1^*, \ldots, \hat\phi_{1000}^*$ for the Wolfer sunspot data, model (6.1).

The bootstrap estimate of standard error was .055, agreeing nicely with the usual formula. Note, however, that the distribution is skewed to the left, so a confidence interval for $\phi$ might be asymmetric about $\hat\phi$, as discussed in Sections 8 and 9.

In bootstrapping the residuals, we have assumed that the first-order autoregressive model is correct. (Recall the discussion of regression models in Section 5.) In fact, the first-order autoregressive model is far from adequate for these data. A fit of the second-order autoregressive model

(6.2)  $z_i = \alpha z_{i-1} + \theta z_{i-2} + \varepsilon_i$

gave estimates $\hat\alpha = 1.37$ and $\hat\theta = -.677$. Both had an estimated standard error of .067, based on Fisher information calculations. We applied the bootstrap to this model, producing the histograms for $\hat\alpha_1^*, \ldots, \hat\alpha_{100}^*$ and $\hat\theta_1^*, \ldots, \hat\theta_{100}^*$ shown in Figs. 11 and 12, respectively. The bootstrap standard errors were .070 and .068, respectively, both close to the usual value. Note that the additional term has reduced the skewness of the first coefficient.

FIG. 11. Bootstrap histogram of $\hat\alpha_1^*, \ldots, \hat\alpha_{100}^*$ for the Wolfer sunspot data, model (6.2).

FIG. 12. Bootstrap histogram of $\hat\theta_1^*, \ldots, \hat\theta_{100}^*$ for the Wolfer sunspot data, model (6.2).

Example 2: Estimating a Response Transformation in Regression

Box and Cox (1964) introduced a parametric family for estimating a transformation of the response in a regression. Given regression data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, their model takes the form

(6.3)  $z_i(\lambda) = x_i \cdot \beta + \varepsilon_i$,

where $z_i(\lambda) = (y_i^{\lambda} - 1)/\lambda$ for $\lambda \ne 0$ and $\log y_i$ for $\lambda = 0$, and $\varepsilon_i \sim_{\text{iid}} N(0, \sigma^2)$. Estimates of $\lambda$ and $\beta$ are found by minimizing $\sum_i (z_i(\lambda) - x_i \cdot \beta)^2$.

Breiman and Friedman (1985) proposed a nonparametric solution for this problem. Their so-called ACE (alternating conditional expectation) model generalizes (6.3) to

(6.4)  $s(y_i) = x_i \cdot \beta + \varepsilon_i$,

where s(·) is an unspecified smooth function. (In its most general form, ACE allows for transformations of the covariates as well.) The function s(·) and the parameter $\beta$ are estimated in an alternating fashion, using a nonparametric smoother to estimate s(·).

In the following example, taken from Friedman and Tibshirani (1984), we compare the Box and Cox procedure to ACE and use the bootstrap to assess the variability of ACE. The data from Box and Cox (1964) consist of a 3 × 3 × 3 experiment on the strength of yarns. The response Y is the number of cycles to failure, and the factors are:

- length of test specimen ($X_1$): 250, 300, and 350 mm;
- amplitude of loading cycle ($X_2$): 8, 9, or 10 mm;
- load ($X_3$): 40, 45, or 50 g.

As in Box and Cox, we treat the factors as quantitative and allow only a linear term for each. Box and Cox found that a logarithmic transformation was appropriate. Their procedure produced a value of -.06 for $\lambda$, with an estimated 95% confidence interval of (-.18, .06).

Fig. 13 shows the transformation selected by the ACE algorithm. For comparison, the log function is plotted (normalized) on the same figure. The similarity is truly remarkable! To assess the variability of the ACE curve, we can apply the bootstrap. Since the X matrix in this problem is fixed by design, we resampled from the residuals instead of from the $(x_i, y_i)$ pairs. The bootstrap procedure was the following:

  Calculate residuals $\hat\varepsilon_i = \hat s(y_i) - x_i \cdot \hat\beta$, i = 1, 2, ..., n.
  Repeat B times:
      Choose a sample $\varepsilon_1^*, \ldots, \varepsilon_n^*$ with replacement from $\hat\varepsilon_1, \ldots, \hat\varepsilon_n$.
      Calculate $y_i^* = \hat s^{-1}(x_i \cdot \hat\beta + \varepsilon_i^*)$, i = 1, 2, ..., n.
      Compute $\hat s^*(\cdot)$ = result of the ACE algorithm applied to $(x_1, y_1^*), \ldots, (x_n, y_n^*)$.
  End

The number of bootstrap replications B was 20. Note that the residuals are computed on the s(·) scale, not the y scale. This is because it is on the s(·) scale that the true residuals are assumed to be approximately iid. The 20 estimated transformations, $\hat s_1^*(\cdot), \ldots, \hat s_{20}^*(\cdot)$, are shown in Fig. 14. The tight clustering of the smooths indicates that the original estimate $\hat s(\cdot)$ has low variability, especially for smaller values of Y. This agrees qualitatively with the short confidence interval for $\lambda$ in the Box and Cox analysis.

FIG. 13. Estimated transformation from ACE and the log function for the Box and Cox example (curves labeled "Estimated" and "Log Function", plotted against y).

FIG. 14. Bootstrap replications of ACE transformations for the Box and Cox example.

7. BOOTSTRAP CONFIDENCE INTERVALS

This section presents three closely related methods of using the bootstrap to set confidence intervals. The discussion is in terms of simple parametric models, where the logical basis of the bootstrap methods is easiest to see. Section 8 extends the methods to multiparameter and nonparametric models.

We have discussed obtaining $\hat\sigma$, the estimated standard error of an estimator $\hat\theta$. In practice, $\hat\theta$ and $\hat\sigma$ are usually used together to form the approximate confidence interval $\theta \in \hat\theta \pm \hat\sigma z^{(\alpha)}$, (1.7). Here $z^{(\alpha)}$ is the $100\cdot\alpha$ percentile point of a standard normal distribution. The interval (1.7) is claimed to have approximate coverage probability $1 - 2\alpha$. For the law school example of Section 2, the values $\hat\theta = .776$, $\hat\sigma = .115$, and $z^{(.05)} = -1.645$ give $\theta \in [.587, .965]$ as an approximate 90% central interval for the true correlation coefficient.

We will call (1.7) the standard interval for $\theta$. When working within parametric families like the bivariate normal, $\hat\sigma$ in (1.7) is usually obtained by differentiating the log likelihood function; see Section 5a of Rao (1973). In the context of this paper, however, we might prefer to use the parametric bootstrap estimate of $\sigma$, e.g., $\hat\sigma_{NORM}$ in Section 2.

The standard intervals are an immensely useful statistical tool. They have the great virtue of being automatic: a computer program can be written that produces (1.7) directly from the data y and the form of the density function for y, with no further input required from the statistician. Nevertheless, the standard intervals can be quite inaccurate, as Table 5 shows. The standard interval (1.7), using $\hat\sigma_{NORM}$, (2.5), is strikingly different from the exact normal theory interval, which is based on the assumption of a bivariate normal sampling distribution F.

TABLE 5
Exact and approximate central 90% confidence intervals for θ, the true correlation coefficient, from the law school data of Fig. 1

  1. Exact (normal theory)            [.496, .898]   R/L = .44
  2. Standard (1.7)                   [.587, .965]   R/L = 1.00
  3. Transformed standard             [.508, .907]   R/L = .49
  4. Parametric bootstrap (BC)        [.488, .900]   R/L = .43
  5. Nonparametric bootstrap (BCa)    [.43, .92]     R/L = .42

Note: R/L = ratio of the right side of the interval, measured from $\hat\theta = .776$, to the left side. The exact interval is strikingly asymmetric about $\hat\theta$. Section 8 discusses the nonparametric method of line 5.

In this case, it is well known that it is better to proceed in three steps:

1. Make the transformation $\phi = \tanh^{-1}(\theta)$, $\hat\phi = \tanh^{-1}(\hat\theta)$.
2. Apply (1.7) on the $\phi$ scale.
3. Transform back to the $\theta$ scale.

The resulting interval, line 3 of Table 5, is moved closer to the exact interval. However, there is nothing automatic about the $\tanh^{-1}$ transformation. For a statistic other than the correlation coefficient, or a distributional family other than the bivariate normal, we might very well need other tricks to make (1.7) perform satisfactorily.

The bootstrap can be used to produce approximate confidence intervals in an automatic way. The following discussion is abridged from Efron (1984 and 1985) and Efron (1982a, Chapter 10). Line 4 of Table 5 shows that the parametric bootstrap interval for the correlation coefficient $\theta$ is nearly identical to the exact interval. "Parametric" in this case means that the bootstrap algorithm begins from the bivariate normal MLE $\hat F_{NORM}$, as for the normal theory curve of Fig. 2. This good performance is no accident. In effect, the bootstrap method used in line 4 does three things:

- it transforms $\hat\theta$ to the best (most normal) scale;
- it finds the appropriate interval on that scale;
- it transforms this interval back to the $\theta$ scale.

All of this is done automatically by the bootstrap algorithm, without requiring special intervention from the statistician. The price paid is a large amount of computing, perhaps B = 1000 bootstrap replications, as discussed in Section 10.

Define $\hat G(s)$ to be the parametric bootstrap cdf of $\hat\theta^*$,

(7.1)  $\hat G(s) = \text{Prob}_*\{\hat\theta^* < s\}$,

where $\text{Prob}_*$ indicates probability computed according to the bootstrap distribution of $\hat\theta^*$. In Fig. 2, $\hat G(s)$ is obtained by integrating the normal theory curve. We will present three different kinds of bootstrap confidence intervals in order of increasing generality. All three methods use percentiles of $\hat G$ to define the confidence interval. They differ in which percentiles are used.
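Lines 2 and 3 of Table 5 can be checked with a few lines of standard-library Python. The sketch assumes a standard error of 1/sqrt(n - 3) on the $\phi = \tanh^{-1}(\theta)$ scale, which is the usual choice for that transformation, with n = 15 law schools. The text does not state this value, but it reproduces line 3 to three decimals. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def standard_interval(theta_hat, sigma_hat, alpha=0.05):
    """Standard interval (1.7): theta_hat + sigma_hat * z^(alpha) and theta_hat + sigma_hat * z^(1 - alpha)."""
    z = NormalDist().inv_cdf(1 - alpha)        # z^(1 - alpha) = -z^(alpha)
    return theta_hat - z * sigma_hat, theta_hat + z * sigma_hat

def transformed_standard_interval(r, n, alpha=0.05):
    """Apply (1.7) on the phi = atanh(theta) scale, then map the endpoints back with tanh.

    Assumes sd(atanh(r)) is approximately 1 / sqrt(n - 3) for a bivariate normal sample.
    """
    lo, hi = standard_interval(math.atanh(r), 1 / math.sqrt(n - 3), alpha)
    return math.tanh(lo), math.tanh(hi)

std = standard_interval(0.776, 0.115)
trans = transformed_standard_interval(0.776, 15)
print("standard   :", [round(v, 3) for v in std])
print("transformed:", [round(v, 3) for v in trans])
```

Running it prints [0.587, 0.965] for the standard interval and [0.508, 0.907] for the transformed one. These match lines 2 and 3 of Table 5.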

68

B. EFRON AND R. TIBSHIRANI

method inter10.4 of Efron(1982a). The percentile between the 100 - a and 100 val is just theinterval of distribution of the bootstrap (1 - a) percentiles We willuse the notation 0[a] forthe a levelendconfidence interval for0, so pointofan approximate 0 E [6[a], 0[1 - a]] is the central1 - 2a interval. thevarious be usedtoindicate different will Subscripts methods. The percentile interval has endpoints
(7.2) (7.3) Op[a] _ G-'(a) Os[a] = 0 + 6*.

The simplestmethodis to take 0 E [G-'(a), G '(1 - a)] as an approximate1 - 2a centralinterval for0. This is calledthepercentilemethodin Section

forsome monotonetransformation = g X= () where r is a constant. In the correlationcoefficient example the functiong was tanh-'. The standard limits(7.2) can now be grosslyinaccurate.Howeverit is easy to verifythat the percentile limits (7.2) are still correct. "Correct" here means that (7.2) is the mappingof the obvious intervalforXz, + r , back to the 0 scale, Op[a] = g-(4 + TZ(a)). It is also correct in the sense of having exactly the claimed converge probability 1 - 2a. Anotherway to state things is that the percentile intervalsare transformation invariant, (7.7)
OP[a] = g(Op[al)

interval, with thestandard This compares


iz(a)

thesedefinitions. Lines1 and 2 ofTable 6 summarize cdf G is the normal, say Suppose bootstrap perfectly
(7.4) G(s) = -((s-

normal dt,thestandard where 4(1s)= fs (2wr)-1/2e_t2/2 cdf.In otherwords,supposethat 6* has bootstrap method distribution N(O,a2). In thiscase thestandard
and the percentile method agree, Os[a] = Op[a]. In

G is markedly nonsituations likethatofFig.2,where from interval is quitedifferent normal, the standard (7.2). Whichis better? thesimplest posconsider To answer thisquestion, all 0 siblesituation, where for (7.5)
0 - N(6, a2)

0 with parameter That is, we have a singleunknown and a single summary statisno nuisance parameters, about0 with constant standdistributed tic0 normally cdf a. In thiscase theparametric ard error bootstrap a' equals a.) estimate thatinsteadof (7.5) we have,for Supposethough all 0,
(7.6 N)
2

is given by (7.4), so Os[a] = Op[a]. (The bootstrap

forany monotonetransformation g. This impliesthat if the percentileintervalsare correcton some transformed scale X = g(6), then theymust also be correct on the originalscale 0. The statisticiandoes not need to know the normalizingtransformation g, only that it exists. Definition(7.2) automaticallytakes care of the bookkeepinginvolved in the use of normalizing transformations forconfidenceintervals. Fisher's theoryof maximumlikelihood estimation says that we are always in situation (7.5) to a first orderof asymptoticapproximation.However,we are also in situation(7.6), forany choice ofg, to the same order of approximation.Efron (1984 and 1985) uses higher order asymptotictheory to differentiate between the standard and bootstrapintervals.It is the orderasymptotic termswhichoftenmake exact higher intervalsstrongly about the MLE 0 as in asymmetric Table 5. The bootstrapintervalsare effective at capturingthis asymmetry. The percentilemethod automaticallyincorporates as in going from(7.5)normalizing transformations, It turns are out that there two other important (7.6). the first waysthat assumption(7.5) can be misleading, of which relates to possible bias in 0. For example the familyof densitiesforthe observed considerfo(O), correlationcoefficient 0 when samplingn = 15 times froma bivariate normal distribution with true corre6

TABLE

intervals fora real valuedparameter6 approximate confidence Four methodsofsetting Method Abbreviation a level endpoint
a

Correctif

1. Standard

s[a]

+ ^ (a)

N(O, U2)

constant

2. Percentile 3. Bias-corrected 4. BCa

Op[a]
OBC[a]

G'(a)
G-1(cf2zo + z(a)1)
( {

There exists monotonetransformation + = g(6), 4 = g(6) such that: T constant N(O, T2) T2) constant T zo, N(O-ZOT,
N(-zoTO, Tz, = 1+ a4 where rO

OBCj[a]

z +

+ z (a) -(zo a(zo +Za)

zo, a

constant

Note: Each methodis correctundermoregeneralassumptionsthan its predecessor.Methods 2, 3, and 4 are definedin termsofthe percentiles of G, the bootstrapdistribution (7.1).

BOOTSTRAP METHODS FOR MEASURES OF STATISTICAL ACCURACY

69

lation θ. In fact it is easy to see that no monotone mapping $\phi = g(\theta)$, $\hat\phi = g(\hat\theta)$ transforms this family to $\hat\phi \sim N(\phi, \tau^2)$, as in (7.6). If there were such a g, then $\mathrm{Prob}_\theta\{\hat\theta < \theta\} = \mathrm{Prob}_\phi\{\hat\phi < \phi\} = .50$, but for $\theta = .776$ integrating the density function $f_{.776}(\hat\theta)$ gives $\mathrm{Prob}_{\theta=.776}\{\hat\theta < \theta\} = .431$.

The bias-corrected percentile method (BC method), line 3 of Table 6, makes an adjustment for this type of bias. Let

$$z_0 \equiv \Phi^{-1}\{\hat G(\hat\theta)\}, \qquad (7.8)$$

where $\Phi^{-1}$ is the inverse function of the standard normal cdf. The BC method has $\alpha$ level endpoint

$$\hat\theta_{BC}[\alpha] \equiv \hat G^{-1}\big(\Phi\{2z_0 + z^{(\alpha)}\}\big). \qquad (7.9)$$

Note: if $\hat G(\hat\theta) = .50$, that is if half of the bootstrap distribution of $\hat\theta^*$ is less than the observed value $\hat\theta$, then $z_0 = 0$ and $\hat\theta_{BC}[\alpha] = \hat\theta_P[\alpha]$. Otherwise definition (7.9) makes a bias correction.

Section 10.7 of Efron (1982a) shows that the BC interval for θ is exactly correct if

$$\hat\phi \sim N(\phi - z_0\tau, \tau^2) \qquad (7.10)$$

for some monotone transformation $\phi = g(\theta)$, $\hat\phi = g(\hat\theta)$ and some constant $z_0$. It does not look like (7.10) is much more general than (7.6), but in fact the bias correction is often important.

In the example of Table 5, the percentile method (7.2) gives central 90% interval [.536, .911], compared to the BC interval [.488, .900] and the exact interval [.496, .898]. By definition the endpoints of the exact interval satisfy

$$\mathrm{Prob}_{\theta=.496}\{\hat\theta > .776\} = .05 = \mathrm{Prob}_{\theta=.898}\{\hat\theta < .776\}. \qquad (7.11)$$

The corresponding quantities for the BC endpoints are

$$\mathrm{Prob}_{\theta=.488}\{\hat\theta > .776\} = .0465, \qquad \mathrm{Prob}_{\theta=.900}\{\hat\theta < .776\} = .0475, \qquad (7.12)$$

compared to

$$\mathrm{Prob}_{\theta=.536}\{\hat\theta > .776\} = .0725, \qquad \mathrm{Prob}_{\theta=.911}\{\hat\theta < .776\} = .0293 \qquad (7.13)$$

for the percentile endpoints. The bias correction is quite important in equalizing the error probabilities at the two endpoints. If $z_0$ can be approximated accurately (as mentioned in Section 9), then it is preferable to use the BC intervals.

Table 7 shows a simple example where the BC method is less successful. The data consists of the single observation $\hat\theta \sim \theta(\chi^2_{19}/19)$, the notation indicating an unknown scale parameter θ times a random variable with distribution $\chi^2_{19}/19$. (This definition makes $\hat\theta$ unbiased for θ.) A confidence interval is desired for the scale parameter θ. In this case the BC interval based on $\hat\theta$ is a definite improvement over the standard interval (1.7), but goes only about half as far as it should toward achieving the asymmetry of the exact interval.

TABLE 7
Central 90% confidence intervals for θ having observed $\hat\theta \sim \theta(\chi^2_{19}/19)$
(R/L is the ratio of the right arm to the left arm of the interval, $(\text{upper} - \hat\theta)/(\hat\theta - \text{lower})$)

                              Interval                                    R/L
1. Exact                      [.631 $\hat\theta$, 1.88 $\hat\theta$]      2.38
2. Standard (1.7)             [.466 $\hat\theta$, 1.53 $\hat\theta$]      1.00
3. BC (7.9)                   [.580 $\hat\theta$, 1.69 $\hat\theta$]      1.64
4. BCa (7.15)                 [.630 $\hat\theta$, 1.88 $\hat\theta$]      2.37
5. Nonparametric BCa          [.640 $\hat\theta$, 1.68 $\hat\theta$]      1.88

Note: The exact interval is sharply skewed to the right of $\hat\theta$. The BC method is only a partial improvement over the standard interval. The BCa interval, a = .108, agrees almost perfectly with the exact interval.

It turns out that the parametric family $\hat\theta \sim \theta(\chi^2_{19}/19)$ cannot be transformed into (7.10), not even approximately. The results of Efron (1982b) show that there does exist a monotone transformation g such that $\hat\phi = g(\hat\theta)$, $\phi = g(\theta)$ satisfy, to a high degree of approximation,

$$\hat\phi \sim N(\phi - z_0\sigma_\phi, \sigma_\phi^2), \qquad \sigma_\phi = 1 + a\phi. \qquad (7.14)$$

The constants in (7.14) are $z_0 = .1082$, $a = .1077$.

The BCa method (Efron, 1984), line 4 of Table 6, is a method of assigning bootstrap confidence intervals which are exactly right for problems which can be mapped into form (7.14). This method has $\alpha$ level endpoint

$$\hat\theta_{BCa}[\alpha] \equiv \hat G^{-1}\left(\Phi\left\{z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a(z_0 + z^{(\alpha)})}\right\}\right). \qquad (7.15)$$

If a = 0 then $\hat\theta_{BCa}[\alpha] = \hat\theta_{BC}[\alpha]$, but otherwise the BCa intervals can be a substantial improvement over the BC method, as shown in Table 7.

The constant $z_0$ in (7.15) is given by $z_0 = \Phi^{-1}\{\hat G(\hat\theta)\}$, (7.8), and so can be computed directly from the bootstrap distribution. How do we know a? It turns out that in one-parameter families $f_\theta(\hat\theta)$, a good approximation is

$$a \cong \frac{\mathrm{SKEW}_{\theta=\hat\theta}\big(\dot l_\theta(t)\big)}{6}, \qquad (7.16)$$

where $\mathrm{SKEW}_{\theta=\hat\theta}(\dot l_\theta(t))$ is the skewness at parameter value $\theta = \hat\theta$ of the score statistic $\dot l_\theta(t) = (\partial/\partial\theta)\log f_\theta(t)$. For $\hat\theta \sim \theta(\chi^2_{19}/19)$ this gives $a \cong .1081$, compared to the actual value a = .1077 derived in Efron (1984). For the normal theory correlation family of Table 5, $a \cong 0$, which explains why the BC method, which takes a = 0, works so well there.
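The endpoint formulas (7.8), (7.9) and (7.15) can be turned into code once the bootstrap replications $\hat\theta^*_1, \ldots, \hat\theta^*_B$ are available. The sketch below shows one way to do this. It reads $\hat G^{-1}$ off the sorted replications with a simple order-statistic rule, which is our own illustrative choice; the helper names are also hypothetical. It also checks the value $a \cong .1081$ quoted above for the $\chi^2_{19}/19$ family. There the score is linear in t, so its skewness is that of $\chi^2_{19}$, namely $(8/19)^{1/2}$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Phi via .cdf, Phi^{-1} via .inv_cdf


def quantile(sorted_reps, prob):
    """G_hat^{-1}(prob) by a simple order-statistic rule (an illustrative choice)."""
    k = min(len(sorted_reps) - 1, max(0, int(prob * len(sorted_reps))))
    return sorted_reps[k]


def bca_endpoint(theta_hat, reps, alpha, a=0.0):
    """alpha-level endpoint (7.15); with a = 0 it is the BC endpoint (7.9)."""
    reps = sorted(reps)
    g_hat = sum(r < theta_hat for r in reps) / len(reps)  # G_hat(theta_hat)
    z0 = STD_NORMAL.inv_cdf(g_hat)  # (7.8); undefined if G_hat is 0 or 1
    w = z0 + STD_NORMAL.inv_cdf(alpha)  # z0 + z^(alpha)
    return quantile(reps, STD_NORMAL.cdf(z0 + w / (1.0 - a * w)))


def percentile_endpoint(reps, alpha):
    """Percentile endpoint G_hat^{-1}(alpha), for comparison."""
    return quantile(sorted(reps), alpha)


# (7.16) for theta_hat ~ theta * chi2_19 / 19: the score
# (d/dtheta) log f_theta(t) = -(19/2)/theta + (19/2) t / theta**2 is linear
# in t, so its skewness equals that of chi2_19, which is sqrt(8/19).
a_chi2 = math.sqrt(8 / 19) / 6
print(round(a_chi2, 4))  # 0.1081
```

When exactly half the replications fall below $\hat\theta$, $z_0 = 0$ and the BC endpoint coincides with the percentile endpoint. A positive a moves both endpoints to the right.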



The advantage of formula (7.16) is that we need not know the transformation g leading to (7.14) in order to approximate a. In fact $\hat\theta_{BCa}[\alpha]$, like $\hat\theta_{BC}[\alpha]$ and $\hat\theta_P[\alpha]$, is transformation invariant, as in (7.7). Like the bootstrap methods, the BCa intervals are computed directly from the form of the density function $f_\theta(\cdot)$, for θ near $\hat\theta$.

Formula (7.16) applies to the case where θ is the only parameter. Section 8 briefly discusses the more challenging problem of setting confidence intervals for a parameter θ in a multiparameter family, and also in nonparametric situations where the number of nuisance parameters is effectively infinite.

To summarize this section, the progression from the standard intervals to the BCa method is based on a series of increasingly less restrictive assumptions, as shown in Table 6. Each successive method in Table 6 requires the statistician to do a greater amount of computation: first the bootstrap distribution $\hat G$, then the bias correction constant $z_0$, and finally the constant a. However, all of these computations are algorithmic in character, and can be carried out in an automatic fashion.

Chapter 10 of Efron (1982a) discusses several other ways of using the bootstrap to construct approximate confidence intervals, which will not be presented here. One of these methods, the "bootstrap t," was used in the blood serum example of Section 4.

8. NONPARAMETRIC AND MULTIPARAMETER CONFIDENCE INTERVALS

Section 7 focused on the simple case $\hat\theta \sim f_\theta$, where we have only a real valued parameter θ and a real valued summary statistic $\hat\theta$ from which we are trying to construct a confidence interval for θ. Various favorable properties of the bootstrap confidence intervals were demonstrated in the simple case, but of course the simple case is where we least need a general method like the bootstrap.

Now we will discuss the more common situation where there are nuisance parameters besides the parameter of interest θ, or even more generally the nonparametric case, where the number of nuisance parameters is effectively infinite. The discussion is limited to a few examples. Efron (1984 and 1985) develops the theoretical basis of bootstrap approximate confidence intervals for complicated situations, and gives many more examples. The word "approximate" is important here since exact nonparametric confidence intervals do not exist for most parameters (see Bahadur and Savage, 1956).

Example 1. Ratio Estimation

The data consists of $y = (y_1, y_2)$, assumed to come from a bivariate normal distribution with unknown mean vector η and covariance matrix the identity,

$$y \sim N_2(\eta, I). \qquad (8.1)$$

The parameter of interest, for which we desire a confidence interval, is the ratio

$$\theta = \eta_2/\eta_1. \qquad (8.2)$$

Fieller (1954) provided well known exact intervals for θ in this case. The Fieller intervals are based on a clever trick, which seems very special to situation (8.1), (8.2).

Table 8 shows Fieller's central 90% interval for θ having observed y = (8, 4). Also shown is the Fieller interval for $\phi = 1/\theta = \eta_1/\eta_2$, which equals $[.76^{-1}, .29^{-1}]$, the obvious transformation of the interval for θ. The standard interval (1.7) is satisfactory for θ, but not for φ. Notice that the standard interval does not transform correctly from θ to φ.

TABLE 8
Central 90% confidence intervals for $\theta = \eta_2/\eta_1$ and for $\phi = 1/\theta$ having observed $(y_1, y_2) = (8, 4)$ from a bivariate normal distribution $y \sim N_2(\eta, I)$

                              For θ             For φ
1. Exact (Fieller)            [.29, .76]        [1.32, 3.50]
2. Parametric boot (BC)       [.29, .76]        [1.32, 3.50]
3. Standard (1.7)             [.27, .73]        [1.08, 2.92]
MLE                           $\hat\theta$ = .5      $\hat\phi$ = 2

Note: The BC intervals, line 2, are based on the parametric bootstrap distribution of $\hat\theta = y_2/y_1$.

Line 2 shows the BC intervals obtained by applying definitions (7.8) and (7.9) to the parametric bootstrap distribution of $\hat\theta = y_2/y_1$ (or $\hat\phi = y_1/y_2$). This is the distribution of $\hat\theta^* = y_2^*/y_1^*$ when sampling $y^* = (y_1^*, y_2^*)$ from $\hat F_{NORM} \sim N_2((y_1, y_2), I)$. The bootstrap intervals transform correctly, and in this case they agree with the exact interval to three decimal places.

Example 2. Product of Normal Means

For most multiparameter situations, there do not exist exact confidence intervals for a single parameter of interest. Suppose for instance that (8.2) is changed to

$$\theta = \eta_1\eta_2, \qquad (8.3)$$

still assuming (8.1). Table 9 shows approximate intervals for θ, and also for $\phi = \theta^2$, having observed y = (2, 4). The "almost exact" intervals are based on an analog of Fieller's argument (Efron, 1985), which with suitable care can be carried through to a high degree of accuracy. Once again, the parametric BC intervals are a close match to line 1. The fact that the standard intervals do not transform correctly is particularly obvious here.
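The parametric bootstrap behind line 2 of Table 8 can be sketched as below. The sketch assumes the unit covariance of (8.1); B = 20,000 and the seed are arbitrary illustrative choices, and the function names are hypothetical. In this example $y_2^* - .5y_1^*$ has mean zero, so $\hat G(\hat\theta)$ is essentially .5 and $z_0 \approx 0$. The BC interval is then close to the percentile interval. Running the same function on the reciprocals $1/\hat\theta^*$ illustrates the transformation invariance described above.

```python
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal cdf and inverse cdf


def bc_interval(theta_hat, reps, alpha=0.05):
    """Central BC interval from (7.8) and (7.9), reading G_hat^{-1} off sorted replications."""
    reps = sorted(reps)
    B = len(reps)
    z0 = PHI.inv_cdf(sum(r < theta_hat for r in reps) / B)  # (7.8)

    def endpoint(level):
        prob = PHI.cdf(2.0 * z0 + PHI.inv_cdf(level))  # (7.9)
        return reps[min(B - 1, max(0, int(prob * B)))]

    return endpoint(alpha), endpoint(1.0 - alpha)


def parametric_replications(y1, y2, B=20000, seed=1):
    """Draws y* ~ N2((y1, y2), I) B times and returns the pairs (y1*, y2*)."""
    rng = random.Random(seed)
    return [(rng.gauss(y1, 1.0), rng.gauss(y2, 1.0)) for _ in range(B)]


y1, y2 = 8.0, 4.0
boot = parametric_replications(y1, y2)
lo, hi = bc_interval(y2 / y1, [b2 / b1 for b1, b2 in boot])  # theta = eta2/eta1
phi_lo, phi_hi = bc_interval(y1 / y2, [b1 / b2 for b1, b2 in boot])  # phi = 1/theta
print(round(lo, 3), round(hi, 3), round(phi_lo, 3), round(phi_hi, 3))
```

Up to Monte Carlo error, the endpoints should land near [.29, .76] for θ and near [1.32, 3.50] for φ. Those are the reciprocals of each other, as in line 2 of Table 8.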



TABLE 9
Central 90% confidence intervals for $\theta = \eta_1\eta_2$ and $\phi = \theta^2$ having observed $y = (2, 4)$, where $y \sim N_2(\eta, I)$

                              For θ             For φ
1. Almost exact               [1.77, 17.03]     [3.1, 290.0]
2. Parametric boot (BC)       [1.77, 17.12]     [3.1, 293.1]
3. Standard (1.7)             [0.64, 15.36]     [-53.7, 181.7]
MLE                           $\hat\theta$ = 8       $\hat\phi$ = 64

Note: The almost exact intervals are based on the high order approximation theory of Efron (1985). The BC intervals of line 2 are based on the parametric bootstrap distribution of $\hat\theta = y_1y_2$.

The good performance of the parametric BC intervals is not accidental. The theory developed in Efron (1985) shows that the BC intervals, based on bootstrapping the MLE $\hat\theta$, agree to high order with the almost exact intervals in the following class of problems: the data y comes from a multiparameter family of densities $f_\eta(y)$, both y and η k-dimensional vectors; the real valued parameter of interest θ is a smooth function of η, $\theta = t(\eta)$; and the family $f_\eta(y)$ can be transformed to multivariate normality, say

$$g(y) \sim N_k(h(\eta), I), \qquad (8.4)$$

by some one-to-one transformations g and h.

Just as in Section 7, it is not necessary for the statistician to know the normalizing transformations g and h, only that they exist. The BC intervals are obtained directly from the original densities $f_\eta$: we find $\hat\eta = \hat\eta(y)$, the MLE of η; sample $y^* \sim f_{\hat\eta}$; compute $\hat\theta^*$, the bootstrap MLE of θ; calculate $\hat G$, the bootstrap cdf of $\hat\theta^*$, usually by Monte Carlo sampling; and finally apply definitions (7.8) and (7.9). This process gives the same interval for θ whether or not the transformation to form (8.4) has been made.

Not all problems can be transformed as in (8.4) to a normal distribution with constant covariance. The case considered in Table 7 is a one-dimensional counterexample. As a result the BC intervals do not always work as well as in Tables 8 and 9, although they usually improve on the standard method. However, in order to take advantage of the BCa method, which is based on more general assumptions, we need to be able to calculate the constant a.

Efron (1984) gives expressions for "a" generalizing (7.16) to multiparameter families, and also to nonparametric situations. If (8.4) holds, then "a" will have value zero, and the BCa method reduces to the BC case. Otherwise the two intervals differ.

Here we will discuss only the nonparametric situation. The observed data $y = (x_1, x_2, \ldots, x_n)$ consists of iid observations $X_1, X_2, \ldots, X_n \sim F$, where F can be any distribution on the sample space $\mathcal{X}$. We want a confidence interval for $\theta = t(F)$, some real valued functional of F, and the bootstrap intervals are based on bootstrapping $\hat\theta = t(\hat F)$, which is the nonparametric MLE of θ. In this case a good approximation to the constant a is given in terms of the empirical influence function $U_i^\circ$, defined in Section 10 at (10.11),

$$a \cong \frac{1}{6}\,\frac{\sum_{i=1}^n (U_i^\circ)^3}{\left[\sum_{i=1}^n (U_i^\circ)^2\right]^{3/2}}. \qquad (8.5)$$

This is a convenient formula, since it is easy to numerically evaluate the $U_i^\circ$ by simply substituting a small value of ε into (10.11).

Example 3. The Law School Data

For θ the correlation coefficient, the values of $U_i^\circ$ corresponding to the 15 data points shown in Fig. 1 are -1.507, .168, .273, .004, .525, -.049, -.100, .477, .310, .004, -.526, -.091, .434, .125, -.048. (Notice how influential law school 1 is.) Formula (8.5) gives $a \cong -.0817$. B = 100,000 bootstrap replications, about 100 times more than was actually necessary (see Section 9), gave $z_0 = -.0927$, and the central 90% interval $\theta \in [.43, .92]$ shown in Table 5. The nonparametric BCa interval is quite reasonable in this example, particularly considering that there is no guarantee that the true law school distribution F is anywhere near bivariate normal.

Example 4. Mouse Leukemia Data (the First Example in Section 3)

The standard central 90% interval for β in formula (3.1) is [.835, 2.18]. The bias correction constant is $z_0 = .0275$, giving BC interval [1.00, 2.39]. This is shifted far right of the standard interval, reflecting the long right tail of the bootstrap histogram seen in Fig. 3. We can calculate "a" from (8.5), considering each of the n = 42 data points to be a triple $(y_i, x_i, \delta_i)$: $a \cong -.152$. Because a is negative, the BCa interval is shifted back to the left, equaling [.788, 2.10]. This contrasts with the law school example, where a, $z_0$, and the skewness of the bootstrap distribution added to each other rather than cancelling out, resulting in a BCa interval much different from the standard interval.

Efron (1984) provides some theoretical support for the nonparametric BCa method. However, the problem of setting approximate nonparametric confidence intervals is still far from well understood, and all methods should be interpreted with some caution. We end this section with a cautionary example.
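The nonparametric recipe used in Examples 3 and 4 can be sketched in a few functions: $z_0$ from the bootstrap distribution via (7.8), a from (8.5), and $U_i^\circ$ approximated by putting a small ε into (10.11). The `stat(x, p)` interface is our own convention for a statistic of functional form evaluated at weights P. The value eps = 1e-4, B = 2000, the seeds, and the simulated N(0, 1) data (used here with the variance as the statistic) are all illustrative assumptions. The first lines simply recheck $a \cong -.0817$ from the law school values listed above.

```python
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal cdf and inverse cdf


def acceleration(u):
    """Formula (8.5): a = sum(U^3) / (6 [sum(U^2)]^(3/2))."""
    return sum(v ** 3 for v in u) / (6.0 * sum(v ** 2 for v in u) ** 1.5)


# U_i values for the law school correlation, as listed in Example 3.
law_u = [-1.507, .168, .273, .004, .525, -.049, -.100, .477,
         .310, .004, -.526, -.091, .434, .125, -.048]
print(round(acceleration(law_u), 4))  # -0.0817


def weighted_variance(x, p):
    """theta(F_hat(P)) for theta = Var_F X; the weights p sum to 1."""
    m = sum(pi * xi for pi, xi in zip(p, x))
    return sum(pi * (xi - m) ** 2 for pi, xi in zip(p, x))


def influence_values(x, stat, eps=1e-4):
    """U_i of (10.11), with a small finite eps in place of the limit."""
    n = len(x)
    base = stat(x, [1.0 / n] * n)
    u = []
    for i in range(n):
        p = [(1.0 - eps) / n] * n  # (1 - eps) P0
        p[i] += eps                # + eps * delta_i
        u.append((stat(x, p) - base) / eps)
    return u


def nonparametric_bca(x, stat, alpha=0.05, B=2000, seed=0):
    """Central BCa interval (7.15), z0 from (7.8), a from (8.5)."""
    rng = random.Random(seed)
    n = len(x)
    theta_hat = stat(x, [1.0 / n] * n)
    reps = []
    for _ in range(B):
        counts = [0] * n
        for _draw in range(n):
            counts[rng.randrange(n)] += 1
        reps.append(stat(x, [c / n for c in counts]))  # theta_hat(P*)
    reps.sort()
    z0 = PHI.inv_cdf(sum(r < theta_hat for r in reps) / B)
    a = acceleration(influence_values(x, stat))

    def endpoint(level):
        w = z0 + PHI.inv_cdf(level)
        prob = PHI.cdf(z0 + w / (1.0 - a * w))
        return reps[min(B - 1, max(0, int(prob * B)))]

    return endpoint(alpha), endpoint(1.0 - alpha), a, z0


rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(20)]
lo, hi, a_hat, z0_hat = nonparametric_bca(sample, weighted_variance)
```

For the variance, the finite-ε values should agree closely with the exact empirical influence function $(x_i - \bar x)^2 - \hat\sigma^2$. This gives a quick check that the numerical shortcut mentioned after (8.5) behaves as intended.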
Example 5. The Variance

Suppose $\mathcal{X}$ is the real line, and $\theta = \mathrm{Var}_F X$, the variance. Line 5 of Table 7 shows the result of applying the nonparametric BCa method to data sets $x_1, x_2, \ldots, x_{20}$ which were actually iid samples from a N(0, 1) distribution. The number .640, for example, is the average of $\hat\theta_{BCa}[.05]/\hat\theta$ over 40 such data sets, with B = 4000 bootstrap replications per data set. The upper limit 1.68 $\hat\theta$ is noticeably small, as pointed out by Schenker (1985). The reason is simple: the nonparametric bootstrap distribution of $\hat\theta^*$ has a short upper tail, compared to the parametric bootstrap distribution, which is a scaled $\chi^2_{19}$ random variable. The results of Beran (1984), Bickel and Freedman (1981), and Singh (1981) show that the nonparametric bootstrap distribution is highly accurate asymptotically, but of course that is not a guarantee of good small sample behavior. Bootstrapping from a smoothed version of $\hat F$, as in lines 3, 4, and 5 of Table 2, alleviates the problem in this particular example.

9. BOOTSTRAP SAMPLE SIZES

How many bootstrap replications must we take? Consider the standard error estimate $\hat\sigma_B$ based on B bootstrap replications, (2.4). As $B \to \infty$, $\hat\sigma_B$ approaches $\hat\sigma$, the bootstrap estimate of standard error as originally defined in (2.3). Because $\hat F$ does not estimate F perfectly, $\hat\sigma = \sigma(\hat F)$ will have a non-zero coefficient of variation for estimating the true standard error $\sigma = \sigma(F)$; $\hat\sigma_B$ will have a larger CV because of the randomness added by the Monte Carlo bootstrap sampling.

It is easy to derive the following approximation,

$$\mathrm{CV}(\hat\sigma_B) \cong \left\{\mathrm{CV}(\hat\sigma)^2 + \frac{E\{\hat\delta\} + 2}{4B}\right\}^{1/2}, \qquad (9.1)$$

where $\hat\delta$ is the kurtosis of the bootstrap distribution of $\hat\theta^*$, given the data y, and $E\{\hat\delta\}$ its expected value averaged over y. For typical situations, $\mathrm{CV}(\hat\sigma)$ lies between .10 and .30. For example, if $\hat\theta = \bar x$, n = 20, and $X_i \stackrel{iid}{\sim} N(0, 1)$, then $\mathrm{CV}(\hat\sigma) \cong .16$.

Table 10 shows $\mathrm{CV}(\hat\sigma_B)$ for various values of B and $\mathrm{CV}(\hat\sigma)$, assuming $E\{\hat\delta\} = 0$ in (9.1). For values of $\mathrm{CV}(\hat\sigma) \ge .10$, there is little improvement past B = 100. B as small as 25 gives reasonable results. Even smaller values of B can be quite informative, as we saw in the Stanford Heart Transplant Data (Fig. 7 of Section 3).

TABLE 10
Coefficient of variation of $\hat\sigma_B$, the bootstrap estimate of standard error based on B Monte Carlo replications, as a function of B and $\mathrm{CV}(\hat\sigma)$, the limiting CV as $B \to \infty$

CV($\hat\sigma$)    B = 25   B = 50   B = 100   B = 200   B = ∞
.25                  .29      .27      .26       .25       .25
.20                  .24      .22      .21       .21       .20
.15                  .21      .18      .17       .16       .15
.10                  .17      .14      .12       .11       .10
.05                  .15      .11      .09       .07       .05
0                    .14      .10      .07       .05       0

Note: Based on (9.1), assuming $E\{\hat\delta\} = 0$.

The situation is quite different for setting bootstrap confidence intervals. The calculations of Efron (1984), Section 8, show that B = 1000 is a rough minimum for the number of Monte Carlo bootstraps necessary to compute the BC or BCa intervals. Somewhat smaller values, say B = 250, can give a useful percentile interval, the difference being that then the constant $z_0$ need not be computed. Confidence intervals are a fundamentally more ambitious measure of statistical accuracy than standard errors, so it is not surprising that they require more computational effort.

10. THE JACKKNIFE AND THE DELTA METHOD

This section returns to the simple case of assigning a standard error to $\hat\theta(y)$, where $y = (x_1, \ldots, x_n)$ is obtained by random sampling from a single unknown distribution, $X_1, \ldots, X_n \stackrel{iid}{\sim} F$. We will give another description of the bootstrap estimate $\hat\sigma$, which illustrates the bootstrap's relationship to older techniques of assigning standard errors, like the jackknife and the delta method.

For a given bootstrap sample $y^* = (x_1^*, x_2^*, \ldots, x_n^*)$, as in step (i) of the algorithm described in Section 2, let $P_i^*$ indicate the proportion of the bootstrap sample equal to $x_i$,

$$P_i^* = \#\{x_j^* = x_i\}/n, \qquad i = 1, 2, \ldots, n, \qquad (10.1)$$

with $P^* = (P_1^*, P_2^*, \ldots, P_n^*)$. The vector $P^*$ has a rescaled multinomial distribution

$$P^* \sim \mathrm{Mult}_n(n, P^0)/n, \qquad P^0 = (1/n, 1/n, \ldots, 1/n), \qquad (10.2)$$

where the notation indicates the proportions observed from n random draws on n categories, each with probability 1/n.

For n = 3 there are 10 possible bootstrap vectors $P^*$. These are indicated in Fig. 15 along with their multinomial probabilities from (10.2). For example, $P^* = (1/3, 0, 2/3)$, corresponding to $x^* = (x_1, x_3, x_3)$ or any permutation of these values, has bootstrap probability 1/9.

FIG. 15. The bootstrap and jackknife sampling points in the case n = 3. The bootstrap points (•) are shown with their probabilities.

To make our discussion easier, suppose that the statistic of interest $\hat\theta$ is of functional form: $\hat\theta = \theta(\hat F)$, where $\theta(F)$ is a functional assigning a real number to any distribution F on the sample space $\mathcal{X}$. The mean, the correlation coefficient, and the trimmed mean are all of functional form. Statistics of functional form have the same value as a function of $\hat F$, no matter what the sample size n may be, which is convenient for discussing the jackknife and delta method.

For any vector $P = (P_1, P_2, \ldots, P_n)$ having nonnegative weights summing to 1, define the weighted empirical distribution

$$\hat F(P): \text{ probability } P_i \text{ on } x_i, \qquad i = 1, \ldots, n. \qquad (10.3)$$

For $P = P^0$, the weighted empirical distribution equals $\hat F$, (1.4). Corresponding to P is a resampled value of $\hat\theta$,

$$\hat\theta(P) \equiv \theta(\hat F(P)). \qquad (10.4)$$

The shortened notation $\hat\theta(P)$ assumes that the data $(x_1, x_2, \ldots, x_n)$ is considered fixed. Notice that $\hat\theta(P^0) = \theta(\hat F)$ is the observed value of the statistic of interest. The bootstrap estimate $\hat\sigma$, (2.3), can then be written

$$\hat\sigma = [\mathrm{var}_*\,\hat\theta(P^*)]^{1/2}, \qquad (10.5)$$

where $\mathrm{var}_*$ indicates variance with respect to distribution (10.2). In terms of Fig. 15, $\hat\sigma$ is the standard deviation of the ten possible bootstrap values $\hat\theta(P^*)$, weighted as shown.

It looks like we could always calculate $\hat\sigma$ simply by doing a finite sum. Unfortunately, the number of bootstrap points is $\binom{2n-1}{n}$, which is 77,558,760 for n = 15, so straightforward calculation of $\hat\sigma$ is usually impractical. That is why we have emphasized Monte Carlo approximations to $\hat\sigma$. Therneau (1983) considers the question of methods more efficient than pure Monte Carlo, but at present there is no generally better method available.

However, there is another approach to approximating (10.5). We can replace the usually complicated function $\hat\theta(P)$ by an approximation linear in P, and then use the well known formula for the multinomial variance of a linear function. The jackknife approximation $\hat\theta_J(P)$ is the linear function of P which matches $\hat\theta(P)$, (10.4), at the n points corresponding to the deletion of a single $x_i$ from the observed data set,

$$P_{(i)} = \frac{1}{n-1}(1, \ldots, 1, 0, 1, \ldots, 1), \qquad i = 1, 2, \ldots, n, \qquad (10.6)$$

with the 0 in the ith position. Fig. 15 indicates the jackknife points for n = 3. Because $\hat\theta$ is of functional form, (10.4), it does not matter that the jackknife points correspond to sample size n - 1 rather than n. The linear function $\hat\theta_J(P)$ is calculated to be

$$\hat\theta_J(P) = \hat\theta_{(\cdot)} + (P - P^0)\cdot U, \qquad (10.7)$$

where, in terms of $\hat\theta_{(i)} \equiv \hat\theta(P_{(i)})$ and $\hat\theta_{(\cdot)} \equiv \sum_{i=1}^n \hat\theta_{(i)}/n$, U is the vector with ith coordinate

$$U_i = (n-1)\big(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\big). \qquad (10.8)$$

The jackknife estimate of standard error (Tukey, 1958; Miller, 1974) is

$$\hat\sigma_J = \left[\frac{n-1}{n}\sum_{i=1}^n \big(\hat\theta_{(i)} - \hat\theta_{(\cdot)}\big)^2\right]^{1/2} = \left[\frac{\sum_{i=1}^n U_i^2}{n(n-1)}\right]^{1/2}. \qquad (10.9)$$
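For very small n, the finite sum behind (10.5) can be written out directly. The sketch below enumerates every bootstrap vector $P^*$ with its probability from (10.2). It reproduces the ten points of Fig. 15 and their probabilities 1/27, 1/9 and 2/9, and it evaluates $\binom{2n-1}{n}$ for n = 15. The `stat(x, p)` interface for a statistic of functional form is a hypothetical convention of ours. Full enumeration is only feasible for tiny n, which is exactly the point made in the text.

```python
import itertools
import math
from fractions import Fraction


def bootstrap_points(n):
    """Every bootstrap vector P* = counts/n with its probability under (10.2)."""
    points = []
    for counts in itertools.product(range(n + 1), repeat=n):
        if sum(counts) == n:
            prob = Fraction(math.factorial(n), n ** n)  # n! / n^n ...
            for c in counts:
                prob /= math.factorial(c)  # ... divided by c_1! ... c_n!
            points.append(([c / n for c in counts], prob))
    return points


def exact_bootstrap_se(x, stat):
    """(10.5): standard deviation of theta_hat(P*) over all bootstrap points."""
    vals = [(stat(x, p), float(w)) for p, w in bootstrap_points(len(x))]
    mean = sum(w * v for v, w in vals)
    return math.sqrt(sum(w * (v - mean) ** 2 for v, w in vals))


def weighted_mean(x, p):
    """theta(F_hat(P)) for the mean, a statistic of functional form."""
    return sum(pi * xi for pi, xi in zip(p, x))


pts = bootstrap_points(3)
print(len(pts))  # 10, as in Fig. 15
print(sorted({str(w) for _, w in pts}))  # ['1/27', '1/9', '2/9']
print(math.comb(2 * 15 - 1, 15))  # 77558760
print(round(exact_bootstrap_se([1.0, 2.0, 6.0], weighted_mean), 4))  # 1.2472
```

For the mean, the exact bootstrap standard error is $[\sum (x_i - \bar x)^2/n^2]^{1/2}$, here $(14/9)^{1/2}$. The enumeration reproduces that value.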

A standard multinomial calculation gives the following theorem (Efron, 1982a).

THEOREM. The jackknife estimate of standard error equals $[n/(n-1)]^{1/2}$ times the bootstrap estimate of standard error for $\hat\theta_J$,

$$\hat\sigma_J = \left[\frac{n}{n-1}\right]^{1/2}\big[\mathrm{var}_*\,\hat\theta_J(P^*)\big]^{1/2}. \qquad (10.10)$$
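The theorem can be checked numerically with the following steps:

1. Compute the jackknife quantities (10.6) and (10.8) for a nonlinear statistic of functional form. Here that is a ratio of weighted means, chosen only for illustration.
2. Form the linear function (10.7).
3. Take its exact bootstrap variance by enumerating every bootstrap point.

The four data pairs are made up, and the function names are hypothetical.

```python
import itertools
import math


def bootstrap_points(n):
    """All bootstrap vectors P* = counts/n with probabilities from (10.2)."""
    points = []
    for counts in itertools.product(range(n + 1), repeat=n):
        if sum(counts) == n:
            prob = math.factorial(n) / n ** n
            for c in counts:
                prob /= math.factorial(c)
            points.append(([c / n for c in counts], prob))
    return points


def ratio_of_means(x, p):
    """A nonlinear statistic of functional form: sum p_i y_i / sum p_i z_i."""
    num = sum(pi * y for pi, (y, z) in zip(p, x))
    den = sum(pi * z for pi, (y, z) in zip(p, x))
    return num / den


def jackknife(x, stat):
    """theta_(i) at the points (10.6), their mean, U from (10.8), sigma_J from (10.9)."""
    n = len(x)
    theta_i = []
    for i in range(n):
        p = [1.0 / (n - 1)] * n
        p[i] = 0.0  # delete x_i: the point P_(i)
        theta_i.append(stat(x, p))
    theta_dot = sum(theta_i) / n
    u = [(n - 1) * (theta_dot - t) for t in theta_i]
    sigma_j = math.sqrt(sum(v * v for v in u) / (n * (n - 1)))
    return theta_i, theta_dot, u, sigma_j


x = [(1.0, 2.0), (3.0, 2.5), (2.0, 4.0), (5.0, 3.0)]  # made-up (y_i, z_i) pairs
n = len(x)
theta_i, theta_dot, u, sigma_j = jackknife(x, ratio_of_means)


def theta_lin(p):
    """The linear jackknife approximation theta_J(P) of (10.7)."""
    return theta_dot + sum((pi - 1.0 / n) * ui for pi, ui in zip(p, u))


pts = bootstrap_points(n)
mean_lin = sum(w * theta_lin(p) for p, w in pts)
var_lin = sum(w * (theta_lin(p) - mean_lin) ** 2 for p, w in pts)
# Theorem (10.10): sigma_J = [n/(n-1)]^(1/2) [var_* theta_J(P*)]^(1/2)
print(abs(sigma_j - math.sqrt(n / (n - 1) * var_lin)) < 1e-9)  # True
```

The check works because the $U_i$ of (10.8) sum to zero. For that reason the multinomial variance of $(P^* - P^0)\cdot U$ reduces to $\sum U_i^2/n^2$.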

In other words, the jackknife estimate is itself almost a bootstrap estimate applied to a linear approximation of $\hat\theta$. The factor $[n/(n-1)]^{1/2}$ in (10.10) makes $\hat\sigma_J^2$ unbiased for $\sigma^2$ in the case where $\hat\theta = \bar x$, the sample mean. We could multiply the bootstrap estimate $\hat\sigma$ by this same factor and achieve the same unbiasedness, but there does not seem to be any consistent advantage to doing so. The jackknife requires n, rather than B = 50 to 200, resamples, at the expense of adding a linear approximation to the standard error estimate. Tables 1 and 2 indicate that there is some estimating efficiency lost in making this approximation. For statistics like the sample median which are difficult to approximate linearly, the jackknife is useless (see Section 3.4 of Efron, 1982a).

There is a more obvious linear approximation to $\hat\theta(P)$ than $\hat\theta_J(P)$. Why not use the first-order Taylor series expansion for $\hat\theta(P)$ about the point $P = P^0$? This is the idea of Jaeckel's infinitesimal jackknife (1972). The Taylor series approximation turns out to be

$$\hat\theta_T(P) = \hat\theta(P^0) + (P - P^0)\cdot U^\circ,$$

where

$$U_i^\circ = \lim_{\epsilon \to 0}\frac{\hat\theta\big((1-\epsilon)P^0 + \epsilon\delta_i\big) - \hat\theta(P^0)}{\epsilon}, \qquad (10.11)$$

$\delta_i$ being the ith coordinate vector. This suggests the


infinitesimal jackknife estimate of standard error

$$\hat\sigma_{IJ} = \big[\mathrm{var}_*\,\hat\theta_T(P^*)\big]^{1/2} = \left[\sum_{i=1}^n (U_i^\circ)^2/n^2\right]^{1/2}, \qquad (10.12)$$

with $\mathrm{var}_*$ still indicating variance under (10.2). The ordinary jackknife can be thought of as taking $\epsilon = -1/(n-1)$ in the definition of $U_i^\circ$, while the infinitesimal jackknife lets $\epsilon \to 0$, thereby earning the name.

The $U_i^\circ$ are values of what Mallows (1974) calls the empirical influence function. Their definition is a nonparametric estimate of the true influence function

$$\mathrm{IF}(x) = \lim_{\epsilon \to 0}\frac{\theta\big((1-\epsilon)F + \epsilon\Delta_x\big) - \theta(F)}{\epsilon},$$

$\Delta_x$ being the degenerate distribution putting mass 1 on x. The right side of (10.12) is then the obvious estimate of the influence function approximation to the standard error of $\hat\theta$ (Hampel, 1974), $\sigma(F) \cong [\int \mathrm{IF}^2(x)\,dF(x)/n]^{1/2}$. The empirical influence function method and the infinitesimal jackknife give identical estimates of standard error.

How have statisticians gotten along for so many years without methods like the jackknife and the bootstrap? The answer is the delta method, which is still the most commonly used device for approximating standard errors. The method applies to statistics of the form $t(\bar Q_1, \bar Q_2, \ldots, \bar Q_A)$, where $t(\cdot, \cdot, \ldots, \cdot)$ is a known function and each $\bar Q_a$ is an observed average, $\bar Q_a = \sum_{i=1}^n Q_a(X_i)/n$. For example, the correlation $\hat\theta$ is a function of A = 5 such averages: the average of the first coordinate values, of the second coordinates, of the first coordinates squared, of the second coordinates squared, and of the cross-products.

In its nonparametric formulation, the delta method works by (a) expanding t in a linear Taylor series about the expectations of the $\bar Q_a$; (b) evaluating the standard error of the Taylor series using the usual expressions for variances and covariances of averages; and (c) substituting $\gamma(\hat F)$ for any unknown quantity $\gamma(F)$ occurring in (b). For example, the nonparametric delta method estimates the standard error of the correlation $\hat\theta$ by

$$\left[\frac{\hat\theta^2}{4n}\left(\frac{\hat\mu_{40}}{\hat\mu_{20}^2} + \frac{\hat\mu_{04}}{\hat\mu_{02}^2} + \frac{2\hat\mu_{22}}{\hat\mu_{20}\hat\mu_{02}} + \frac{4\hat\mu_{22}}{\hat\mu_{11}^2} - \frac{4\hat\mu_{31}}{\hat\mu_{11}\hat\mu_{20}} - \frac{4\hat\mu_{13}}{\hat\mu_{11}\hat\mu_{02}}\right)\right]^{1/2},$$

where, in terms of $x_i = (y_i, z_i)$,

$$\hat\mu_{gh} = \sum_{i=1}^n (y_i - \bar y)^g (z_i - \bar z)^h / n$$

(Cramer (1946), p. 359).

THEOREM. For statistics of the form $\hat\theta = t(\bar Q_1, \ldots, \bar Q_A)$, the nonparametric delta method and the infinitesimal jackknife give the same estimate of standard error (Efron, 1982c).

The infinitesimal jackknife, the delta method, and the empirical influence function approach are three names for the same method. Notice that the results reported in line 7 of Table 2 show a severe downward bias. Efron and Stein (1981) show that the ordinary jackknife is always biased upward, in a sense made precise in that paper. In the authors' opinion the ordinary jackknife is the method of choice if one does not want to do the bootstrap computations.

ACKNOWLEDGMENT

This paper is based on a previous review article appearing in Behaviormetrika. The authors and Editor are grateful to that journal for graciously allowing this revision to appear here.

REFERENCES

ANDERSON, O. D. (1975). Time Series Analysis and Forecasting: The Box-Jenkins Approach. Butterworth, London.
BAHADUR, R. and SAVAGE, L. (1956). The nonexistence of certain statistical procedures in nonparametric problems. Ann. Math. Statist. 27 1115-1122.
BERAN, R. (1984). Bootstrap methods in statistics. Jahrb. Math. Ver. 86 14-30.
BICKEL, P. J. and FREEDMAN, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9 1196-1217.
BOX, G. E. P. and COX, D. R. (1964). An analysis of transformations. J. R. Statist. Soc. Ser. B 26 211-252.
BREIMAN, L. and FRIEDMAN, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. J. Amer. Statist. Assoc. 80 580-619.
COX, D. R. (1972). Regression models and life tables. J. R. Statist. Soc. Ser. B 34 187-202.
CRAMER, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton, New Jersey.
EFRON, B. (1979a). Bootstrap methods: another look at the jackknife. Ann. Statist. 7 1-26.
EFRON, B. (1979b). Computers and the theory of statistics: thinking the unthinkable. Soc. Ind. Appl. Math. 21 460-480.
EFRON, B. (1981a). Censored data and the bootstrap. J. Amer. Statist. Assoc. 76 312-319.
EFRON, B. (1981b). Nonparametric estimates of standard error: the jackknife, the bootstrap, and other resampling methods. Biometrika 68 589-599.
EFRON, B. (1982a). The jackknife, the bootstrap, and other resampling plans. Soc. Ind. Appl. Math. CBMS-Natl. Sci. Found. Monogr. 38.
EFRON, B. (1982b). Transformation theory: how normal is a one parameter family of distributions? Ann. Statist. 10 323-339.
EFRON, B. (1982c). Maximum likelihood and decision theory. Ann. Statist. 10 340-356.
EFRON, B. (1983). Estimating the error rate of a prediction rule: improvements in cross-validation. J. Amer. Statist. Assoc. 78 316-331.
EFRON, B. (1984). Better bootstrap confidence intervals. Tech. Rep. Stanford Univ. Dept. Statist.
EFRON, B. (1985). Bootstrap confidence intervals for a class of parametric problems. Biometrika 72 45-58.
EFRON, B. and GONG, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. Amer. Statistician 37 36-48.
EFRON, B. and STEIN, C. (1981). The jackknife estimate of variance. Ann. Statist. 9 586-596.
FIELLER, E. C. (1954). Some problems in interval estimation. J. R. Statist. Soc. Ser. B 16 175-183.
FRIEDMAN, J. H. and STUETZLE, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76 817-823.
FRIEDMAN, J. H. and TIBSHIRANI, R. J. (1984). The monotone smoothing of scatter-plots. Technometrics 26 243-250.
HAMPEL, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383-393.
HASTIE, T. J. and TIBSHIRANI, R. J. (1985). Discussion of Peter Huber's "Projection Pursuit." Ann. Statist. 13 502-508.
HINKLEY, D. V. (1978). Improving the jackknife with special reference to correlation estimation. Biometrika 65 13-22.
HYDE, J. (1980). Survival Analysis with Incomplete Observations. Biostatistics Casebook. Wiley, New York.
JAECKEL, L. (1972). The infinitesimal jackknife. Memorandum MM 72-1215-11. Bell Laboratories, Murray Hill, New Jersey.
JOHNSON, N. and KOTZ, S. (1970). Continuous Univariate Distributions, Vol. 2. Houghton Mifflin, Boston.
KAPLAN, E. L. and MEIER, P. (1958). Nonparametric estimation from incomplete samples. J. Amer. Statist. Assoc. 53 457-481.
KIEFER, J. and WOLFOWITZ, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27 887-906.
MALLOWS, C. (1974). On some topics in robustness. Memorandum, Bell Laboratories, Murray Hill, New Jersey.
MILLER, R. G. (1974). The jackknife-a review. Biometrika 61 1-17.
MILLER, R. G. and HALPERN, J. (1982). Regression with censored data. Biometrika 69 521-531.
RAO, C. R. (1973). Linear Statistical Inference and Its Applications. Wiley, New York.
SCHENKER, N. (1985). Qualms about bootstrap confidence intervals. J. Amer. Statist. Assoc. 80 360-361.
SINGH, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9 1187-1195.
THERNEAU, T. (1983). Variance reduction techniques for the bootstrap. Ph.D. thesis, Department of Statistics, Stanford University.
TIBSHIRANI, R. J. and HASTIE, T. J. (1984). Local likelihood estimation. Tech. Rep. 97, Stanford Univ. Dept. Statist.
TUKEY, J. (1958). Bias and confidence in not quite large samples, abstract. Ann. Math. Statist. 29 614.

Comment

J. A. Hartigan

J. A. Hartigan is Eugene Higgins Professor of Statistics, Yale University, Box 2179 Yale Station, New Haven, CT 06520.

Efron and Tibshirani are to be congratulated on a wide-ranging persuasive survey of the many uses of the bootstrap technology. They are a bit cagey on what is or is not a bootstrap, but the description at the end of Section 4 seems to cover all the cases: some data y comes from an unknown probability distribution F; it is desired to estimate the distribution of some function R(y, F) given F; and this is done by estimating the distribution of $R(y^*, \hat F)$ given $\hat F$, where $\hat F$ is an estimate of F based on y, and $y^*$ is sampled from the known $\hat F$.

There will be three problems in any application of the bootstrap: (1) how to choose the estimate $\hat F$? (2) how much sampling of $y^*$ from $\hat F$? and (3) how close is the distribution of $R(y^*, \hat F)$ given $\hat F$ to that of R(y, F) given F? Efron and Tibshirani suggest a variety of estimates $\hat F$ for simple random sampling, regression, and autoregression; their remarks about (3) are confined mainly to empirical demonstrations of the bootstrap in specific situations.

I have some general reservations about the bootstrap based on my experiences with subsampling techniques (Hartigan, 1969, 1975). Let $X_1, \ldots, X_n$ be a random sample from a distribution F, let $F_n$ be the empirical distribution, and suppose that $t(F_n)$ is an estimate of some population parameter t(F). The statistic $t(F_n)$ is computed for several random subsamples (each observation appearing in the subsample with probability 1/2), and the set of $t(F_n)$ values obtained is regarded as a sample from the posterior distribution of t(F). For example, the standard deviation of the $t(F_n)$ is an estimate of the standard error of $t(F_n)$ from t(F); however, the procedure is not restricted to real valued t.

The procedure seems to work not too badly at getting the first and second-order behaviors of $t(F_n)$ when $t(F_n)$ is near normal, but it is not effective in handling third-order behavior, bias, and skewness. Thus there is not much point in taking huge samples of $t(F_n)$, since the third-order behavior is not relevant; and if the procedure works only for $t(F_n)$ near normal, there are less fancy procedures for estimating standard error, such as dividing the sample up into 10 subsamples of equal size and computing their standard deviation. (True, this introduces more bias than having random subsamples each containing about half the observations.) Indeed, even if $t(F_n)$ is not normal, we can obtain exact confidence intervals for the median of $t(F_{n/10})$ using the 10 subsamples. Even five subsamples will give a respectable idea of the standard error.

Transferring back to the bootstrap: (A) is the boot