
Local Likelihood Estimation

Author(s): Robert Tibshirani and Trevor Hastie


Source: Journal of the American Statistical Association, Vol. 82, No. 398 (Jun., 1987), pp. 559-567
Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association
Stable URL: http://www.jstor.org/stable/2289465 .
Accessed: 11/11/2014 14:30

This content downloaded from 128.211.171.2 on Tue, 11 Nov 2014 14:30:10 PM


All use subject to JSTOR Terms and Conditions

Local Likelihood Estimation

ROBERT TIBSHIRANI and TREVOR HASTIE*

A scatterplot smoother is applied to data of the form $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ and uses local fitting to estimate the dependence of Y on X. A simple example is the running lines smoother, which fits a least squares line to the y values falling in a window around each x value. The value of the estimated function at x is given by the value of the least squares line at x. A smoother generalizes the least squares line, which assumes that the dependence of Y on X is linear. In this article, we extend the idea of local fitting to likelihood-based regression models. One such application is to the class of generalized linear models (Nelder and Wedderburn 1972). We enlarge this class by replacing the covariate form $\beta_0 + x\beta_1$ with an unspecified smooth function $s(x)$. This function is estimated from the data by a technique we call local likelihood estimation. The method consists of maximum likelihood estimation for $\beta_0$ and $\beta_1$, applied in a window around each x value. Multiple covariates are incorporated through an iterative algorithm. We also apply the local likelihood technique to the proportional hazards model of Cox (1972), a model for censored data. The proportional hazards assumption $\lambda(t \mid x) = \lambda_0(t)\exp(x\beta)$ is replaced by $\lambda(t \mid x) = \lambda_0(t)\exp(s(x))$, and the function $s(x)$ is estimated from the data by the local likelihood method. In some real data examples, the local likelihood technique proves to be effective in uncovering nonlinear dependencies. It is useful as a descriptive tool or to suggest transformations of the covariates. We also discuss some methods for inference.

KEY WORDS: Smoothing; Generalized linear models; Nonparametric regression.

* Robert Tibshirani is NSERC University Fellow and Assistant Professor, Department of Preventive Medicine and Biostatistics and Department of Statistics, University of Toronto, Toronto, Canada M5S 1A8. Trevor Hastie is a member of the research staff in the Statistics and Data Analysis Research Group, AT&T Bell Laboratories, Murray Hill, NJ 07974. The authors thank Tom DiCiccio, Bradley Efron, Jerome Friedman, and Paul Switzer for their helpful comments, and Art Owen for his ideas on finding degrees of freedom by simulation. Suggestions by two editors and a referee improved this article substantially. A large part of this research was completed at Stanford University, where the first author was supported by the Natural Sciences and Engineering Research Council of Canada and both authors were supported by the Department of Energy, Office of Naval Research and by the U.S. Army Research Office.

© 1987 American Statistical Association. Journal of the American Statistical Association, June 1987, Vol. 82, No. 398, Theory and Methods.

1. INTRODUCTION

Figure 1 plots 100 data pairs along with the least squares line summarizing the relationship of a response (Y) and a covariate (X). Also shown in Figure 1 is a scatterplot smooth. This was computed by a type of local fitting: around each x value a window of 20 points was formed and a least squares line was fit to the points in the window. The value of the smooth at x is given by the value of the local line at x. As we can see, the smooth captures the trend of the data better than the least squares line. The reason is simple: the smooth does not make as rigid an assumption as the straight line about the form of the relationship between Y and X.

In recent years, there has been a great deal of interest in scatterplot smoothing by local fitting (see, e.g., Cleveland 1979; Friedman and Stuetzle 1981), and the availability of fast computers has been essential in this development. These smooths are useful as a descriptive tool (as we have seen above) and also as building blocks for nonparametric regression models. Important developments in the latter area can be found in Friedman and Stuetzle (1981) and Breiman and Friedman (1985).

In this article, we extend smoothing ideas to other kinds of data. In particular, we consider (X, Y) data whose relationship is expressible through a likelihood function. Our idea is to replace a simple parametric function like $\beta_0 + x\beta_1$ appearing in the likelihood with an unspecified smooth function $s(x)$ and to estimate $s(x)$ locally. Take, for example, the situation in which y is a 0-1 response and x is a covariate. The usual linear logistic model assumes that $\log(p(x)/(1 - p(x))) = \beta_0 + x\beta_1$, where $p(x) = \Pr(Y = 1 \mid X = x)$. For such a data set, Figure 2 shows the logistic regression line, estimated by maximum likelihood. On the same plot, observed logits are shown. (Since we cannot take the logit of 0 or 1, the y's were grouped first.) Also appearing in Figure 2 is a smooth estimate, based on the more general model $\log(p(x)/(1 - p(x))) = s(x)$, with $s(x)$ an arbitrary smooth function. As was the case in the scatterplot example, the smooth does a better job of capturing the relationship between Y and X than the line does.

The smooth in Figure 2 was produced by a technique we call local likelihood estimation. The basic idea is a simple extension of the local fitting technique used in scatterplot smoothing. Given a global method for estimating a linear response (e.g., maximum likelihood estimation in the linear logistic model), we apply it locally, estimating a separate line in a window around each x value. The value of the estimated line at x is the estimate of the smooth response function at x. Multiple covariates are incorporated in an additive model that is estimated iteratively.

The function estimates produced by local likelihood estimation are useful for descriptive, exploratory analysis, or to suggest a transformation of a covariate. By varying the window size, we can control the smoothness of the estimated function: the larger the windows, the smoother the estimated function. When each window contains 100% of the data, the local likelihood procedure corresponds exactly to the global linear method.

An outline of the article is as follows. In Section 2 we review scatterplot smoothing and introduce the local likelihood idea. The logistic and proportional hazard models are used for illustration. In Section 3 we discuss "degrees of freedom" approximations, and finally in Section 4 we discuss the relationship of this work to other techniques.

Early development of this work can be found in Tibshirani (1982) and Hastie (1983), and the reader interested in further details may refer to Tibshirani (1984).
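
The local fitting described in the Introduction (a least squares line fitted in a window around each x value, evaluated at that x) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the names `least_squares_line`, `running_lines`, and `half_width` are hypothetical, the x values are assumed to be sorted, and the window is taken to be `half_width` points on either side of the target point, truncated at the ends of the data.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        # All x values in the window coincide: fall back to the window mean.
        return my, 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def running_lines(x, y, half_width=10):
    """Running lines smooth of y on x (x assumed sorted in increasing order).

    For each point i, a least squares line is fitted to the points within
    half_width positions of i, and the smooth is that line evaluated at x[i].
    """
    n = len(x)
    smooth = []
    for i in range(n):
        lo = max(i - half_width, 0)        # window is truncated at the left end
        hi = min(i + half_width, n - 1)    # and at the right end
        b0, b1 = least_squares_line(x[lo:hi + 1], y[lo:hi + 1])
        smooth.append(b0 + b1 * x[i])
    return smooth
```

If the data lie exactly on a straight line, every local line coincides with it, so the smooth returns the data unchanged; with a window large enough to cover all points, the smooth is the global least squares line.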

2. LOCAL LIKELIHOOD ESTIMATION: A DESCRIPTION

2.1 A Review of Scatterplot Smoothing

Given independent data pairs $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ assumed to be realizations of a response variable Y and a predictor X, a scatterplot smoother estimates

$$s(x) = E(Y \mid X = x) \tag{1}$$

where $s(\cdot)$ is a smooth function. We will not define exactly what "smooth" means here; vaguely speaking, we are thinking of $s(\cdot)$ as a function less smooth than a straight line but smoother than an interpolating polynomial.

There are many ways to estimate $s(\cdot)$; we will concentrate here on the method of "local fitting." This is defined as follows. Let $\mathrm{Fit}(D, x)$ be some real-valued function of x, depending on the data D, representing the value at x of some function fitted to the data. For example, $\mathrm{Fit}(D, x)$ could be $\hat\beta_0 + x\hat\beta_1$, where $\hat\beta_0$ and $\hat\beta_1$ are the regression coefficients based on D. A local fit estimate is defined as

$$\hat{s}(x_i) = \mathrm{Fit}(\{(x_j, y_j) : j \in N_i\}, x_i), \tag{2}$$

where $N_i$ is a "neighborhood" of $x_i$ (a set of indexes of points whose x values are "close" to $x_i$). The only neighborhoods we will consider in this article are symmetric nearest neighborhoods. Associated with a neighborhood is the span or window size w; this is the proportion of the total points that each neighborhood contains. Let $[x]$ represent the integer part of x and assume that $[wn]$ is odd. Then a span w symmetric nearest neighborhood contains $[wn]$ points: the ith point plus $([wn] - 1)/2$ points on either side of the ith point. Assuming that the data points are sorted by increasing x value, a formal definition is

$$N_i = \{\max(i - ([wn] - 1)/2, 1), \ldots, i - 1, i, i + 1, \ldots, \min(i + ([wn] - 1)/2, n)\}. \tag{3}$$

The neighborhoods are truncated near the endpoints if $([wn] - 1)/2$ points on one side are not available. The span controls the smoothness of the resulting estimate: larger spans will produce smoother (less variable) estimates but with possibly more bias. A span of $1/n$ corresponds to 1 point per neighborhood. The span is either fixed a priori or chosen adaptively from the data.

If Fit stands for arithmetic mean, then $\hat{s}(\cdot)$ is the running mean, a very simple scatterplot smoother. The running mean is not usually satisfactory, because it creates large biases at the endpoints. In addition, unless the abscissa values are equally spaced it does not reproduce straight lines (i.e., if the data lie exactly along a straight line, the smooth of the data will not be a straight line). A refinement of the running average, the running lines smoother, alleviates these problems. Instead of fitting a mean in a neighborhood, it fits a least squares line. The value of the least squares line at $x_i$ is the estimated smooth there.

The running lines smoother is the most obvious generalization of the least squares line. When w is 2 (so that every neighborhood contains all of the data points), the smooth agrees exactly with the least squares line. Although very simple in nature, the running lines smoother produces reasonable results and has the advantage that the estimates can be updated. That is, to find $\hat{s}(x_{i+1})$ from $\hat{s}(x_i)$, only an $O(1)$ operation is needed. This makes the entire smoothing algorithm $O(n)$.

Figure 1. Least Squares Line and Scatterplot Smooth.

2.2 Local Likelihood Estimation: Definition

Suppose that we have n independent realizations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ of random variables X and Y with $Y \mid X = x \sim f(Y, \theta)$, where $\theta$ is a function of x. The likelihood is given by $L(\theta_1, \theta_2, \ldots, \theta_n) = \prod_{i=1}^{n} f(y_i, \theta_i)$. A standard modeling procedure would assume a parsimonious form for the $\theta_i$'s, say $\theta_i = \beta_0 + x_i\beta_1$. Then $L(\cdot)$ would be a function of $\beta_0$ and $\beta_1$; these parameters would be estimated by maximizing $L(\cdot)$. The local likelihood method assumes only that $\theta_i$ is a "smooth" function of x:

$$\theta_i = s(x_i). \tag{4}$$

Figure 2. Logistic Regression Line and Local Likelihood Smooth.
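
For contrast with (4), the standard parametric procedure just described (assume $\theta_i = \beta_0 + x_i\beta_1$ and maximize $L$) can be sketched for a 0-1 response with the logit as $\theta$. The sketch below is an illustration under assumptions, not the authors' code: the function name `fit_logistic_line`, its arguments, and its stopping rule are hypothetical, and it has no safeguards for data in which the 0's and 1's are perfectly separated (where the maximum does not exist).

```python
import math


def fit_logistic_line(x, y, start=(0.0, 0.0), max_iter=25, tol=1e-10):
    """Maximize the Bernoulli log-likelihood with logit p_i = b0 + x_i * b1.

    Uses Newton-Raphson on the two parameters (b0, b1); y must contain 0/1.
    """
    b0, b1 = start
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - p
            w = p * (1.0 - p)
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break                # information matrix is numerically singular
        # Newton step: solve the 2x2 system  H * d = g  in closed form.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

At the returned values the score equations $\sum (y_i - p_i) = 0$ and $\sum (y_i - p_i)x_i = 0$ hold to numerical accuracy, which is a convenient check on any implementation.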

To estimate $\{s(x_1), s(x_2), \ldots, s(x_n)\}$, we could try to maximize $L(s(x_1), s(x_2), \ldots, s(x_n))$; however, this would result in an unsatisfactory estimate due to overfitting. In many situations, it would simply reproduce the data. As an alternative, we define the local likelihood estimate of $s(x_i)$ as

$$\hat{s}(x_i) = \hat\beta_{0i} + x_i\hat\beta_{1i}, \tag{5}$$

where $\hat\beta_{0i}$ and $\hat\beta_{1i}$ maximize the local likelihood

$$L_i(\beta_{0i}, \beta_{1i}) = \prod_{j \in N_i} f(y_j, \beta_{0i} + x_j\beta_{1i}). \tag{6}$$

Note that i is fixed in (6), with j varying over the points of the neighborhood.

The local likelihood method produces a smooth estimate of the curve $s(\cdot)$ at the points $\{x_1, x_2, \ldots, x_n\}$. It avoids overfitting by averaging over neighborhoods. Note that values of $\hat{s}(x)$ for x not equal to one of the $x_i$'s can be obtained by some sort of interpolation.

If we take $\beta_{1i} = 0$ for every i, what we call local likelihood estimation with constants, we have a procedure analogous to the running mean. This is not as useful, because it tends to produce large biases at the endpoints, but is more tractable theoretically.

More generally, suppose that we have n data tuples of the form $(y_i, x_i, c_i)$, where y is a response variable, x is a covariate, and c is a vector containing any additional information. (In censored data problems, c would indicate whether y is censored; in many problems, like regression, c is empty.) Suppose that modeling considerations lead to maximization of a function of the form

$$L(\theta_1, \theta_2, \ldots, \theta_n) = g^{n}(y_1, y_2, \ldots, y_n, \theta_1, \theta_2, \ldots, \theta_n, c_1, c_2, \ldots, c_n), \tag{7}$$

with the superscript denoting that g is based on n observations. The local likelihood estimate of $s(\cdot)$ is $\hat{s}(x_i) = \hat\beta_{0i} + x_i\hat\beta_{1i}$, where $\hat\beta_{0i}$ and $\hat\beta_{1i}$ maximize the local likelihood

$$L_i(\beta_{0i}, \beta_{1i}) = g^{[N_i]}(\{y_j, \beta_{0i} + x_j\beta_{1i}, c_j\}, j \in N_i), \tag{8}$$

$[N_i]$ denoting the number of observations in the neighborhood. The proportional hazards model of Cox (1972) is an example of a nonstandard model that fits into this framework (see Sec. 2.5).

A special case of model (6) occurs when Y has an exponential family density of the form

$$\exp[\{y_j\theta_j - b(\theta_j) - h(y_j, a(\phi))\}/a(\phi)] \tag{9}$$

with respect to some carrier measure. If the scale parameter $\phi$ is known, then (9) is an exponential family; if $\phi$ is unknown, (9) is not generally an exponential family, but the estimation procedure is unchanged because the local score function for $\theta$ does not involve $a(\phi)$. Letting $\mu = EY$, we assume that $\theta = g(\mu) = s(x)$. The function $g(\cdot)$ is called the link function, and the relation $\theta = g(\mu)$ is the canonical or natural link. We proceed by forming the local likelihood [as in (6)] and estimate the local slope and intercept $\beta_{0i}$ and $\beta_{1i}$. Note that in the Gaussian case, $\hat{s}(\cdot)$ reduces to the running lines smooth defined earlier. This procedure can be viewed as an extension of the family of generalized linear models (Nelder and Wedderburn 1972). A generalized linear model is defined by $Y \mid x \sim f(Y, \theta)$ and $g(\mu) = \beta_0 + x\beta_1$. In the local likelihood procedure $\beta_0 + x\beta_1$ is generalized to $s(x)$.

Note that in this formulation we have modeled the natural parameter $\theta$. We could just as well model some other parameter (like EY); in any specific problem, there may be reasons to prefer one parameterization to another. For example, in the binary response problem, it is more convenient to model the natural parameter $\log[p/(1 - p)]$ than the expectation p, because the latter would require that the estimated smooth stay between 0 and 1.

Estimation of $\hat\beta_i = (\hat\beta_{0i}, \hat\beta_{1i})$ in the exponential family model or any local likelihood model of the form (5) and (6) is performed using a Newton-Raphson search in each neighborhood, going in order as i runs from 1 to n. The local likelihood estimate $\hat\beta_i$ is used as a starting value for the maximization of $L_{i+1}(\cdot)$; because the estimates do not tend to differ much from one neighborhood to the next, convergence is typically achieved in two or three iterations.

If $k_n$ is the number of points in a neighborhood, an $O(k_n)$ operation is required for computing each local likelihood estimate and thus the entire procedure is $O(k_n n)$. This is not a problem for moderate n (say $n \le 200$), because of the small number of iterations required. For larger data sets, we speed up the procedure by calculating the fit at every mth point; this reduces the running time by about a factor of m. The smooths for the remaining x values are obtained by interpolation.

Some subtleties may arise in the estimation in the nonstandard case; see Example 2, the Cox model (Sec. 2.5).

2.3 Remarks

2.3.1. Asymptotic Properties. If the neighborhoods shrink in size but the number of points in each neighborhood goes to infinity at an appropriate rate, then it is reasonable to expect that the local likelihood estimate will be (pointwise) consistent and efficient. In Tibshirani and Hastie (1985) we proved this for a single covariate in the exponential family. Note that when fitting local constants in that setting, no iteration is necessary for the fitted value because $\hat\mu$ is simply the mean of the y's in the neighborhood and $\hat{s}(x_i)$ can be obtained by applying the link function to this mean. Thus we prove the result by (a) showing that the contribution of the slope parameter is asymptotically negligible and (b) applying a central limit theorem to the running mean estimate. As pointed out by a referee, this begs the question: Is it worthwhile fitting local lines instead of local constants? We believe that fitting lines is worthwhile because it reduces the bias at the endpoints. The simulation described in Section 2.4 provides some evidence of this. It is important to note that if multiple covariates are present then iteration is required for local constants, even in the exponential family.

2.3.2. Number of Parameters: "Degrees of Freedom." Given a local likelihood fit based on span w, $1/n < w <$
2, we would expect the "number of parameters" it uses to be somewhere between 2 (the number for span = 2) and n (the number for span = $1/n$). In Section 3 we provide a definition of "number of parameters," or "degrees of freedom" (df), and a method for computing it. This can be used, in conjunction with the value of the overall likelihood $\prod_{1}^{n} f(y_i, \hat\theta_i)$, to assess the fit of the model.

2.3.3. Span Selection. The local likelihood procedure requires the choice of a span size w. One method is to try a range of spans and examine the resulting estimate and the value of the global likelihood that it produces. Sometimes, however, it may be desirable to have an automatic method for span selection. In scatterplot smoothing, one popular method for choosing the span is cross-validation (see, e.g., Friedman and Stuetzle 1982). In the local likelihood setting, cross-validation turns out to be very expensive computationally. As an alternative, we use, as a rough rule of thumb, a form of Akaike's information criterion (AIC) (Akaike 1973). Having fit a model with maximized likelihood L and p independent parameters, the AIC is defined by $-2 \log L + 2p$. The first term measures the goodness of fit of the model, and the second term penalizes the number of parameters used. Hence the AIC attempts to trade off variability and bias. The AIC can be thought of as an extension of Mallows's (1973) $C_p$ statistic to likelihood models. (The two criteria coincide in the linear regression context.) In this setting we select w to minimize AIC based on the value of the global likelihood and the number of parameters, as described in Section 3. We do not have any results, however, on the asymptotic correctness of this method of span selection.

2.3.4. Handling of Ties. For data with tied x values, two things are done. First, each neighborhood is expanded (if necessary) to ensure that if a point j is in a given neighborhood, so is any other point k having $x_k = x_j$. This makes the estimation procedure invariant to the incoming order of the data points. Second, the smooths for each of the tied values are averaged and each smooth value is assigned the average. That is, if $x_j = x_{j+1} = \cdots = x_{j+m}$, then for each $j \le i \le j + m$, $\hat{s}(x_i)$ is assigned the value $\sum_{l=j}^{j+m} \hat{s}(x_l)/(m + 1)$.

2.3.5. Multiple Covariates. The previous discussion shows how the local likelihood idea can be used to estimate the smooth for a single covariate. If p covariates are available, we assume a model of the form $\theta = \sum_{1}^{p} s_j(x_j)$. Tibshirani (1984) discussed a forward stepwise approach to the estimation of such a model, with readjustment through "backfitting." This procedure works by holding all but one smooth fixed and reestimating the remaining smooth. This process is iterated until convergence. Backfitting can be used for detecting nonlinearity in one covariate by forcing all of the other covariates to have a linear fit. Many theoretical details need to be worked out, including the convergence of the algorithm and selection rules.

2.3.6. Other Fitting Procedures. The local likelihood procedure uses local linear estimation because it works well (especially in reducing bias at the endpoints) and is simple. More sophisticated kernel-type estimates could be used to make the procedure robust and increase the smoothness of the estimated function. Borrowing ideas from scatterplot smoothing (see, e.g., Cleveland 1979), we can downweight points based both on their distance from the center of the neighborhood and the size of the residual. Although we have not investigated downweighting in general, a robust algorithm for the proportional hazards model is discussed in Tibshirani (1984).

Figure 3. Estimate for Number of Nodes (circles: local likelihood smooth; broken curve: parametric function). The area of each circle is proportional to the number of data points.

2.4 Example 1: The Logistic Model for Binary Data

Suppose that we have data of the form $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where the response y is 0 or 1, x is an explanatory variable, and the observations are assumed to be independent.

Let $\mathbf{x} = (1, x)$ and let $p(x) = \Pr(y = 1 \mid x)$. The log-likelihood of the data is

$$\log L = \sum_{j=1}^{n} \{y_j \log p_j + (1 - y_j)\log(1 - p_j)\}, \tag{10}$$

where $p_j = p(x_j)$. The linear logistic model assumes that $\mathrm{logit}\, p(x) = \mathbf{x}^t\boldsymbol\beta$.

The local likelihood method, on the other hand, assumes that $\mathrm{logit}\, p(x) = s(x)$ and $s(x)$ is estimated through the local likelihood corresponding to (6). With multiple covariates, the model takes the form $\mathrm{logit}\, p(\mathbf{x}) = \alpha + \sum_{1}^{p} s_j(x_j)$, where each $s_j(\cdot)$ is assumed to have mean 0 to ensure identifiability.

We now illustrate this technique on some real data. A study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital concerned the survival of patients who had undergone surgery for breast cancer (Haberman 1976). There are 306 observations on 4 variables: $y = 1$ if patient survived 5 years, 0
otherwise; $x_1$ = age of patient at time of operation; $x_2$ = year of operation; and $x_3$ = number of positive axillary nodes detected.

A local likelihood fit with the single variable number of nodes reduced the null "deviance" (i.e., twice log-likelihood ratio statistic) from 353.7 to 319.9 using an estimated 2.4 df. The smooth in Figure 2 is the estimate $\hat{s}$(number of nodes). By comparison, a linear logistic fit (straight line in Figure 2) reduced the deviance to 330.7 on 1 df. When all three variables were put into the model, the local likelihood procedure (with backfitting) produced the smooths shown in Figures 3 and 4 (year of operation had little effect and is not shown). The final model has a deviance of 307.74 on (306 - 2.4 - 2.5 - 2.4) = 298.7 df.

Landwehr, Pregibon, and Shoemaker (1984) analyzed this data set to explore the usefulness of partial residual plots in identifying parametric forms of covariate effects. Their final model was

$$\mathrm{logit}\, p(\mathbf{x}) = \beta_0 + x_1\beta_1 + x_1^2\beta_2 + x_1^3\beta_3 + x_2\beta_4 + x_1x_2\beta_5 + (\log(1 + x_3))\beta_6. \tag{11}$$

The deviance of this model is 302.3 on 299 df. The fitted terms for each covariate are superimposed on Figures 3 and 4 (broken lines), ignoring the marginally significant $x_1x_2$ term. The functions are very similar.

Figure 4. Estimate for Age (circles: local likelihood smooth; broken line: parametric function). The area of each circle is proportional to the number of data points.

Hastie (1984) and Hastie and Tibshirani (1986) discussed the relative merits of the local likelihood and partial residual plot procedures. They gave two reasons to suggest why the local likelihood procedure is preferable:

1. The partial residual technique, in suggesting the parametric form for a covariate effect, relies on the assumption that the covariate forms for the other effects are correct. Indeed, these effects are usually assumed to be linear. The local likelihood procedure finds the best functional form for all covariates simultaneously.
2. The partial residual technique requires quite a bit of ingenuity in identifying the various covariate effects. The local likelihood procedure, on the other hand, is automatic.

The local likelihood technique also has advantages over smoothing of the y's directly, that is, fitting a model of the form $y = \sum s_j(x_j)$. For more than one covariate, one would have to truncate the smooths to ensure that the fitted values stay between 0 and 1. By modeling the logit, the local likelihood technique avoids this problem. It also inherits (locally) the usual advantages of logistic regression.

To determine the value (if any) of fitting local lines over local constants, we ran a small simulation. We chose a sample size of 51 and fixed the x values equally spaced on $[-1, 1]$. Bernoulli random variables were generated from the model $\log(p(x)/(1 - p(x))) = 2 - 3x^2$. Figures 5 and 6 show the median and quartiles for 100 local likelihood estimates, fitting lines and constants, respectively. A span of .5 was used in each case. The asterisks are the true quadratic function. We notice that the estimates of Figure 6 are considerably more biased near the endpoints, whereas the two methods are similar in the middle of the data. This confirms the usefulness of fitting local lines to reduce endpoint bias.

Figure 5. True Quadratic (asterisks) and Quartiles of Local Likelihood Estimates, Fitting Local Lines.

2.5 Example 2: Cox's Proportional Hazards Model

In the censored data problem we observe data triples $(y_i, x_i, \delta_i)$ $(i = 1, 2, \ldots, n)$, where $\delta_i$ indicates whether or not the response $y_i$ is censored. The data are assumed to be sorted by the covariate x, that is, $x_1 \le x_2 \le \cdots \le x_n$. The proportional hazards model of Cox (1972) models the relationship between y and x by assuming that x acts on the hazard function in a multiplicative way, that is, $\lambda(y \mid x) = \lambda_0(y)\exp(x\beta)$, where $\lambda_0(y)$ is an unspecified function and $\lambda(\cdot \mid x)$ is the hazard function at covariate
level $x$. This assumption allows $\beta$ to be estimated independently of $\lambda_0(y)$ by maximizing the partial likelihood (PL):

$$PL = \prod_{i \in D} \frac{\exp(x_i\beta)}{\sum_{j \in R_i} \exp(x_j\beta)}, \tag{12}$$

where D is the set of indexes of the uncensored y's and $R_i = \{j \mid y_j \ge y_i\}$, the risk set at time $y_i - 0$. We generalize this to $\lambda(y \mid x) = \lambda_0(y)\exp(s(x))$. Note that because of the arbitrary baseline hazard, the function $s(x)$ is only determined up to an additive constant, so for definitiveness we define $s(x_1) = 0$. To estimate $s(x_1), s(x_2), \ldots, s(x_n)$, we apply the local likelihood technique. The local PL for the data in $N_i$ is

$$PL_i = \prod_{l \in D \cap N_i} \frac{\exp(\alpha_i + x_l\beta_i)}{\sum_{j \in R_l \cap N_i} \exp(\alpha_i + x_j\beta_i)}. \tag{13}$$

Note, however, that $\alpha_i$ is not estimable from $PL_i$ since the $\exp(\alpha_i)$ terms cancel one another, giving

$$PL_i = \prod_{l \in D \cap N_i} \frac{\exp(x_l\beta_i)}{\sum_{j \in R_l \cap N_i} \exp(x_j\beta_i)}. \tag{14}$$

Let $\hat\beta_i$ maximize $PL_i(\cdot)$. Although $\alpha_i$ [and thus $s(x_i)$] is not estimable locally, we can use the slope estimates $\{\hat\beta_1, \ldots, \hat\beta_n\}$ to estimate $\{s(x_1), \ldots, s(x_n)\}$, as follows. We have $s(x_i) = \int_{x_1}^{x_i} s'(z)\,dz$ and $s'(x) \approx \beta_j$ for $x \in N_j$; hence to estimate $s(x_i)$ we can use any estimate of $\int_{x_1}^{x_i} s'(z)\,dz$ based on $(x_1, \hat\beta_1), \ldots, (x_n, \hat\beta_n)$. We use the trapezoidal rule defined by

$$\hat{s}(x_i) = \sum_{j=2}^{i} (x_j - x_{j-1})\,\frac{\hat\beta_j + \hat\beta_{j-1}}{2}. \tag{15}$$

With more than one covariate, the model takes the form $\lambda(y \mid \mathbf{x}) = \lambda_0(y)\exp(\sum_{1}^{p} s_j(x_j))$.

Figure 6. True Quadratic (asterisks) and Quartiles of Local Likelihood Estimates, Fitting Local Constants.

Kalbfleisch and Prentice (1980) analyzed the results of a study designed to examine the genetic and viral factors that may influence the development of spontaneous leukemia in AKR mice. The original data set contains 204 observations, with six covariates and both cancerous and noncancerous deaths recorded. Kalbfleisch and Prentice performed a number of analyses; we will consider any death as the endpoint and the single covariate antibody level (%). Antibody level took on continuous values, although about half of the mice had a value of 0.

Table 1 shows the results of the local likelihood procedure applied to these data, and a graph of the estimated smooth for antibody level is shown in Figure 7. It is markedly nonlinear, changing slope at antibody level 7.5%. Also included in Table 1 are linear and quadratic terms for antibody. Even with a quadratic term, the fit of the parametric Cox model is significantly worse than the local likelihood smooth.

Table 1. Mouse Leukemia Data

Model                 -2 log PL    Number of parameters
Null                  1189.06      0
Smooth, span = .7     1173.98      1.85
Linear                1183.16      1
Linear + quadratic    1183.07      2
Piecewise linear      1177.34      2

Based on Figure 7, a piecewise linear covariate was created by joining each of the leftmost and rightmost smooth values to the bending point by straight lines. For this covariate, $-2 \log PL$ was 1177.34, still significantly worse than the smooth model. This indicates that the bowed shape of the smooth between antibody levels 7.5% and 80% is supported by the data.

Figure 7. Local Likelihood Smooth for Mouse Leukemia Data (the area of each circle is proportional to the number of data points).
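
Returning to the single-covariate logistic model of Section 2.4, the local likelihood estimate (5)-(6) can be sketched as follows. This is a rough illustration under assumptions, not the authors' implementation: the names `newton_logistic` and `local_logistic_smooth` are hypothetical, the x values are assumed sorted, ties are not handled, the linear predictor is clipped to avoid overflow, and no special treatment is given to neighborhoods in which the 0's and 1's are perfectly separated. Following Section 2.2, each neighborhood's Newton-Raphson search starts from the previous neighborhood's estimate.

```python
import math


def newton_logistic(x, y, start, max_iter=25, tol=1e-8):
    """Newton-Raphson for logit p_j = b0 + x_j * b1 on one neighborhood."""
    b0, b1 = start
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xj, yj in zip(x, y):
            eta = max(-30.0, min(30.0, b0 + b1 * xj))   # guard against overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            r, w = yj - p, p * (1.0 - p)
            g0 += r
            g1 += r * xj
            h00 += w
            h01 += w * xj
            h11 += w * xj * xj
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break                                       # singular information
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def local_logistic_smooth(x, y, span=0.5):
    """Local likelihood estimate of s(x_i) = logit p(x_i) at each data point."""
    n = len(x)
    half = max((int(span * n) - 1) // 2, 1)   # points on either side of x_i
    beta = (0.0, 0.0)
    fitted = []
    for i in range(n):
        lo, hi = max(i - half, 0), min(i + half, n - 1)
        # Warm start: the previous neighborhood's estimate is the starting value.
        beta = newton_logistic(x[lo:hi + 1], y[lo:hi + 1], beta)
        fitted.append(beta[0] + beta[1] * x[i])
    return fitted
```

With a span of 2 every neighborhood contains all of the data, so the sketch returns the fitted values of the global linear logistic model, in line with the remark in Section 2.1 about the running lines smoother.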

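
For the Cox model of Section 2.5, the step that turns local slope estimates into a smooth is a trapezoidal integration starting from $s(x_1) = 0$. A minimal sketch follows; the function name `integrate_slopes` is hypothetical, and the slope estimates are taken as given (maximizing each local partial likelihood is not shown).

```python
def integrate_slopes(x, slopes):
    """Trapezoidal estimate of s(x_i) from local slopes, with s(x_1) = 0.

    x      : covariate values sorted in increasing order
    slopes : local slope estimates, one per point (same length as x)
    """
    s = [0.0]
    for j in range(1, len(x)):
        # Add the area of the trapezoid between x[j-1] and x[j].
        s.append(s[-1] + (x[j] - x[j - 1]) * (slopes[j] + slopes[j - 1]) / 2.0)
    return s
```

The trapezoidal rule is exact when the slopes vary linearly in x: for example, slopes $2x$ at $x = 0, 1, 2, 4$ give back $x^2$ at those points.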

3. DEGREES OF FREEDOM APPROXIMATIONS

In generalized linear models, the goodness of fit of an estimate $\hat\mu$ is measured by the deviance. Wilks's theorem tells us that, given two nested linear models and the hypothesis that the smaller model is correct, the deviance decrease in fitting the larger model is asymptotically $\chi^2_{p_2 - p_1}$, where $p_1$ and $p_2$ are the ranks of the two linear spaces. That is, the additional number of parameters fit gives the number of degrees of freedom (df) of the corresponding deviance decrease.

This leads us to ask similar questions for the smooth estimates described in this article. We will restrict our discussion to the exponential family case. The question of interest is, How many "parameters" are fit by a smooth? This will depend on the span. With a span of 2 (i.e., every neighborhood contains all of the data points), 2 parameters are used. With a span of $1/n$ (i.e., 1 point per neighborhood), n independent parameters are used. Thus for spans in the range $1/n$ to 2, the number of parameters should be somewhere between 2 and n.

We first define the "number of parameters" or "degrees of freedom" of a scatterplot smoother, along the same lines as Cleveland (1979). A running lines smoother is a linear smoother; that is, the fit $\hat{y}$ can be written as $\hat{y} = Sy$, where S is called a smoother matrix. (This follows because least squares fitting is a linear operation.) We expand the expected residual sum of squares (RSS),

$$E(\mathrm{RSS}(y, \hat{y})) = E(y - \hat{y})^t(y - \hat{y}) = (n - [2\,\mathrm{tr}(S) - \mathrm{tr}(S^tS)])\sigma^2 + f^t(I - S)^t(I - S)f, \tag{16}$$

where $f = Ey$ and $\sigma^2 = \mathrm{var}(y_i)$. The first term in (16) relates to the variance of the fitted values, and the second term measures the bias. By analogy with linear regression, we define the number of df used in a fit $\hat{y}$ by

$$\mathrm{df}(\hat{y}) = 2\,\mathrm{tr}(S) - \mathrm{tr}(S^tS). \tag{17}$$

In standard linear regression this expression reduces to p, the number of parameters in the model. A simplification of (17) occurs for a running lines smoother, because one can show that $\mathrm{tr}(S^tS) = \mathrm{tr}(S)$ because of the fact that each row of S is a row of a projection matrix, although S itself is not a projection matrix. [In particular, if $S = \{s_{ij}\}$, then $s_{ii} = \sum_j s_{ij}^2$, so $\mathrm{tr}(S^tS) = \mathrm{tr}(SS^t) = \mathrm{tr}(S)$.] Hence (17) simplifies to $\mathrm{tr}(S)$. Note that $\mathrm{tr}(SS^t)\sigma^2$ is also a relevant quantity; it is the sum of the variances of the fitted values $\sum \mathrm{var}(\hat{y}_i)$.

The quantity $\mathrm{df}(\hat{y})$ is useful both for assessing a single fit and for comparing two fits. For example, suppose that we want to compare two fits $\hat{y}_1 = S_1y$ and $\hat{y}_2 = S_2y$. If we assume that the biases are the same, we have $E(\mathrm{RSS}(y, \hat{y}_2) - \mathrm{RSS}(y, \hat{y}_1)) = [(2\,\mathrm{tr}(S_1) - \mathrm{tr}(S_1^tS_1)) - (2\,\mathrm{tr}(S_2) - \mathrm{tr}(S_2^tS_2))]\sigma^2$, or simply $[\mathrm{tr}(S_1) - \mathrm{tr}(S_2)]\sigma^2$ for running lines smoothers.

Analogous results also hold approximately for any local likelihood fit $\hat\mu$ in the exponential family. Assume that the Y's are independently distributed with density of the form (9) with $\phi = 1$, and define the deviance as

$$\mathrm{Dev}(y, \mu) = 2[l(y) - l(\mu)], \tag{18}$$

where $l(\mu) = \sum \log f_Y(y_i, \mu_i, \phi)$ is the log-likelihood, written for convenience as a function of $\mu$ instead of $\theta$. We generalize (17) and define (implicitly) the df of a fit $\hat\mu$ by

$$E\,\mathrm{Dev}(y, \hat\mu) = E\,\mathrm{Dev}(y, \mu) - \mathrm{df}(\hat\mu) + I(\mu, \bar\mu), \tag{19}$$

where $\bar\mu = E\hat\mu$ and $I(\mu, \bar\mu)$ is twice the Kullback-Leibler distance between $\mu$ and $\bar\mu$ (see the Appendix). The first term on the right is independent of the fitting mechanism and compares with n in (17) and, in many cases, is asymptotically equal to n; the last term is a bias term. Now suppose that $\hat\mu$ is based on a covariate vector x with span w. We show in the Appendix that $\mathrm{df}(\hat\mu)$ can be approximated by $\mathrm{tr}(S)$, if S is the running means smoother matrix based on x with span w.

This definition is once again useful for testing. Consider two fits, $\hat\mu_1$ and $\hat\mu_2$, based on a covariate vector x and spans $w_1$ and $w_2$. Let $S_1$ and $S_2$ be the smoother matrices, based on x, that produce running means smooths of spans $w_1$ and $w_2$, respectively. Then assuming that $E\hat\mu_1 \approx E\hat\mu_2$, we have that $E(\mathrm{Dev}(y, \hat\mu_1) - \mathrm{Dev}(y, \hat\mu_2)) \approx \mathrm{df}(\hat\mu_2) - \mathrm{df}(\hat\mu_1) \approx \mathrm{tr}(S_2) - \mathrm{tr}(S_1)$. The derivations of these approximations, given in the Appendix, are rough, and we do not provide error bounds. Additional assumptions are also needed about the variation in the variance function.

Given a local likelihood fit, we can easily work out $\mathrm{tr}(S)$ and use it to determine the approximate significance of the smooth. As an example, for 200 equally spaced X values and a span of .5, $\mathrm{tr}(S)$ is about 3.6. Hence the smooth uses roughly 3.6 parameters.

So far, we have discussed only the expectation of the deviance, not its distribution. In Tibshirani (1984) we described a simulation study to assess the accuracy of the trace formula and to study the distribution of the deviance decrease. The formula turns out to be quite good for the Gaussian and logistic models but not very good for the Cox model. Hence we resort to simulation to estimate the df for the Cox model. As for the distribution of the deviance decrease, it is not $\chi^2$, but is somewhat more spread out. Hence the $\chi^2$ distribution with the appropriate df (or more specifically, the corresponding gamma distribution) should be used only as a rough reference.

4. DISCUSSION AND RELATED WORK

The local likelihood method extends nonparametric regression techniques to likelihood-based regression models. The literature on nonparametric regression is rich; see, for example, Rosenblatt (1971), Wahba and Wold (1975), Stone (1977), Cleveland (1979), Li (1984), and Silverman (1985). In the multiple covariate case, the local likelihood technique provides a method for estimating what Hastie and Tibshirani (1986) called a "generalized additive model," any model in which the linear term $\sum x_j\beta_j$ is replaced by


an additive term $\sum s_j(x_j)$. In that article, we discussed another closely related estimation technique, local scoring, and compared it with local likelihood estimation.

Generalized additive models provide one way of extending the additive model $E(Y \mid x) = \sum_1^p s_j(x_j)$ [see Hastie, Tibshirani, and Buja (1987) for a discussion of the additive model]. At least two other extensions have been proposed. Friedman and Stuetzle (1981) introduced the projection pursuit regression model:

$$E(Y \mid x) = \sum_1^p s_j(a_j^t x). \tag{20}$$

The directions $a_j$ are found by a numerical search, and the $s_j(\cdot)$'s are estimated by smoothers. The alternating conditional expectations (ACE) model (Breiman and Friedman 1985) generalizes the additive model by estimating a transformation of the response:

$$E(\theta(Y) \mid x) = \sum_1^p s_j(x_j). \tag{21}$$

The local likelihood idea could also be used to estimate a response transformation, or indeed any other function appearing in a model, for example, a link or variance function in a generalized linear model. We have not pursued this, however.

In the local likelihood procedure we have used local linear fitting. O'Sullivan, Yandell, and Raynor (1986) looked at splines for general exponential family models. They emerge as the solution to a penalized likelihood problem. An additive model is not considered; instead, a general surface is fitted. Green (1985) and Green and Yandell (1985) looked at similar techniques, with an emphasis on semiparametric models. Brant (1985) discussed a technique involving local likelihood estimation with constants.

APPENDIX: DERIVATION OF THE TRACE FORMULA FOR DEGREES OF FREEDOM

Given a fit $\hat{\mu}$ based on the local likelihood fit of a response vector $y$, a covariate $x$, and span $w$, let $S$ be the smoother matrix, based on $x$, that produces a running means smooth of span $w$. As before, we assume that $\phi = 1$. Let $\mu = Ey$, $h = E\hat{\mu}$, and $I(\mu^1, \mu^2) = 2E_{\mu^1}\sum_i \log[f_y(y_i, \mu_i^1, \phi)/f_y(y_i, \mu_i^2, \phi)]$, or twice the Kullback-Leibler distance between $\mu^1$ and $\mu^2$. We wish to establish that

$$\mathrm{df}(\hat{\mu}) \approx \mathrm{tr}(S), \tag{A.1}$$

where $\mathrm{df}(\hat{\mu})$ is implicitly defined by

$$E\,\mathrm{Dev}(y, \hat{\mu}) = E\,\mathrm{Dev}(y, \mu) - \mathrm{df}(\hat{\mu}) + I(\mu, h).$$

We can replace the terms $\mathrm{Dev}(y, \cdot)$ by $I(y, \cdot)$ (Hoeffding's lemma; see Efron 1975). Now by the definition of Kullback-Leibler distance in the exponential family, $EI(y, h) = EI(y, \mu) + I(\mu, h)$, and we can expand $I(y, h)$ as

$$I(y, h) = I(y, \hat{\mu}) + I(\hat{\mu}, h) + \Delta, \tag{A.2}$$

where $\Delta = 2(g(\hat{\mu}) - g(h))^t(y - \hat{\mu})$. By using these relationships we get

$$\mathrm{df}(\hat{\mu}) = [E(I(\hat{\mu}, h)) + E\Delta].$$

First we concentrate on the term $E(I(\hat{\mu}, h))$. This can be thought of as the analog of the sum of the variances of the fits, using $I$ as a "metric." We show now that it is approximately a weighted sum of variances and that $E(I(\hat{\mu}, h)) \approx \mathrm{tr}(S)$. Denoting by $V(\cdot)$ the variance function, let $a_i = V(\mu_i)$ and $\tilde{a}_i = V(h_i)$. In addition, let $D(a_i)$ be a diagonal matrix with $ii$th entry $a_i$ and denote by $v_i$ the variance of $\hat{\mu}_i$. Then a standard Taylor series approximation gives

$$E(I(\hat{\mu}, h)) \approx E(\hat{\mu} - h)^t D(\tilde{a}_i)^{-1}(\hat{\mu} - h) = \sum_1^n v_i\tilde{a}_i^{-1}. \tag{A.3}$$

We assume that $\hat{\mu}$ was obtained by local likelihood fits with constants; this means that $\hat{\mu} = Sy$, where $S$ is the running means smoother matrix. Then $v_i = \sum_{j \in N_i} a_j/[N_i]^2$, and if we assume that $\mathrm{ave}_{N_i}(a_j) \approx a_i$, then $v_i \approx a_i/[N_i]$. Finally, assuming that $a_i \approx \tilde{a}_i$, we have

$$E(I(\hat{\mu}, h)) \approx \sum_1^n \frac{1}{N_i} = \mathrm{tr}(S). \tag{A.4}$$

Finally, we show that $E\Delta \approx 0$. Note that in the case of generalized linear models, we have

$$\Delta = 2(\hat{\beta} - \beta)^t X^t(y - \hat{\mu}) = 0. \tag{A.5}$$

Then (A.2) is a special case of Simon's theorem (Simon 1973), a Pythagorean relation for Kullback-Leibler distance. For running lines fits $\hat{\mu} = Sy$, $\Delta = 2(Sy - h)^t(y - \hat{\mu}) = 2(y - f)^tS^t(I - S)y$, with expectation $2\sigma^2\,\mathrm{tr}(S^t(I - S)) = 0$. For local likelihood fits in the exponential family we can write $g(\hat{\mu}) - g(h) \approx D^{-1}(\tilde{a}_i)(\hat{\mu} - h)$. Since once again $\hat{\mu} = Sy$, we have

$$E\Delta \approx 2E(y - \mu)^tS^tD^{-1}(\tilde{a}_i)(I - S)y = 2\,\mathrm{tr}(S^tD^{-1}(\tilde{a}_i)(I - S)D(a_i)) = 2[\mathrm{tr}(SD^{-1}(\tilde{a}_i)D(a_i)) - \mathrm{tr}(S^tD^{-1}(\tilde{a}_i)SD(a_i))].$$

The term on the right, however, is another representation for $E(\hat{\mu} - h)^tD(\tilde{a}_i)^{-1}(\hat{\mu} - h)$, and thus from the derivation leading to (A.4), $E\Delta \approx 0$.

[Received February 1983. Revised August 1986.]

REFERENCES

Akaike, H. (1973), "Information Theory and an Extension of the Entropy Maximization Principle," in 2nd International Symposium on Information Theory, eds. B. N. Petrov and F. Csak, Kiado: Akademia, pp. 267-281.

Brant, R. (1985), "Smooth Residual Plots for Generalized Linear Models," technical report, University of Minnesota, Dept. of Applied Statistics.

Breiman, L., and Friedman, J. H. (1985), "Estimating Optimal Correlations for Multiple Regression and Correlation," Journal of the American Statistical Association, 80, 580-597.

Cleveland, W. S. (1979), "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal of the American Statistical Association, 74, 828-836.

Cox, D. R. (1972), "Regression Models and Life Tables," Journal of the Royal Statistical Society, Ser. B, 34, 187-202.

Efron, B. (1975), "The Geometry of Exponential Families," The Annals of Statistics, 6, 362-376.

Friedman, J. H., and Stuetzle, W. (1981), "Projection Pursuit Regression," Journal of the American Statistical Association, 76, 817-823.

Friedman, J. H., and Stuetzle, W. (1982), "Smoothing of Scatterplots," technical report (Orion 003), Stanford University, Dept. of Statistics.

Green, P. J. (1985), "Penalized Likelihood for General Semi-parametric Regression Models," Technical Report 2819, University of Wisconsin-Madison, Dept. of Statistics.
Green, P., and Yandell, B. (1985), "Semiparametric Generalized Linear Models," in Proceedings of the 2nd International GLIM Conference (Lecture Notes in Statistics 32), Berlin: Springer-Verlag.

Haberman, S. (1976), "Generalized Residuals for Log-linear Models," in Proceedings of the 9th International Biostatistics Conference, Boston, pp. 104-122.

Hastie, T. (1983), "Non-parametric Logistic Regression," technical report (Orion 016), Stanford University, Dept. of Statistics.

Hastie, T. (1984), Comment on "Graphical Methods for Assessing Logistic Regression Models," by J. M. Landwehr, D. Pregibon, and A. Shoemaker, Journal of the American Statistical Association, 79, 77-78.

Hastie, T., and Tibshirani, R. (1986), "Generalized Additive Models" (with discussion), Statistical Science, 1, No. 3, 297-318.

Hastie, T., Tibshirani, R., and Buja, A. (1987), "Linear Smoothers and Additive Models," submitted for publication.

Kalbfleisch, J. D., and Prentice, R. L. (1980), The Statistical Analysis of Failure Time Data, New York: John Wiley.

Landwehr, J. M., Pregibon, D., and Shoemaker, A. (1984), "Graphical Methods for Assessing Logistic Regression Models," Journal of the American Statistical Association, 79, 61-71.

Li, K. C. (1984), "Regression Models With Infinitely Many Parameters: Consistency of Bounded Linear Functionals," The Annals of Statistics, 12, 601-611.

Mallows, C. L. (1973), "Some Comments on Cp," Technometrics, 15, 661-675.

Nelder, J. A., and Wedderburn, R. W. M. (1972), "Generalized Linear Models," Journal of the Royal Statistical Society, Ser. A, 135, 370-384.

O'Sullivan, F., Yandell, B., and Raynor, W. (1986), "Automatic Smoothing of Regression Functions in Generalized Linear Models," Journal of the American Statistical Association, 81, 96-103.

Rosenblatt, M. (1971), "Curve Estimates," Annals of Mathematical Statistics, 42, 1815-1841.

Silverman, B. W. (1985), "Some Aspects of the Spline Smoothing Approach to Non-parametric Regression Curve Fitting" (with discussion), Journal of the Royal Statistical Society, Ser. B, 36, 111-147.

Simon, G. (1973), "Additivity of Information in Exponential Family Laws," Journal of the American Statistical Association, 68, 478-482.

Stone, C. (1977), "Consistent Non-parametric Regression," The Annals of Statistics, 5, 595-620.

Tibshirani, R. (1982), "Non-parametric Estimation of Relative Risk," technical report (Orion 22), Stanford University, Dept. of Statistics.

Tibshirani, R. (1984), "Local Likelihood Estimation," unpublished Ph.D. dissertation, Stanford University, Dept. of Statistics.

Tibshirani, R., and Hastie, T. (1985), "Local Likelihood Estimation," unpublished manuscript, University of Toronto, Dept. of Statistics.

Wahba, G., and Wold, S. (1975), "A Completely Automatic French Curve: Fitting Spline Functions by Cross-Validation," Communications in Statistics, Part A-Theory and Methods, 4, 1-7.
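
The degrees-of-freedom calculation of Section 3 can be illustrated numerically. The following Python sketch is not part of the original article. It builds the smoother matrix $S$ of a running means or running lines smoother and evaluates $\mathrm{tr}(S)$ and $2\,\mathrm{tr}(S) - \mathrm{tr}(S^tS)$. The function names, the use of NumPy, and the neighborhood convention are assumptions made for illustration: the covariate is taken to be sorted, and a span $w$ is taken to mean $\lfloor wn/2 \rfloor$ points on each side of the target point, truncated at the ends. This convention was chosen so that a span of 2 puts all points in every neighborhood and a span of $1/n$ leaves one point per neighborhood, as described in the text; the article's exact convention may differ.

```python
import numpy as np


def smoother_matrix(x, span, degree=1):
    """Smoother matrix S (fit = S @ y) for a running means (degree=0)
    or running lines (degree=1) smoother.

    Assumptions (hypothetical, for illustration only): x is sorted in
    increasing order, and the neighborhood of x[i] holds the
    k = floor(span * n / 2) nearest points on each side, truncated at the ends.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    k = int(span * n / 2)
    S = np.zeros((n, n))
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        # Local design matrix: a column of ones, plus x for a running line.
        X = np.vander(x[lo:hi], degree + 1, increasing=True)
        # Projection (hat) matrix of the local least squares fit.
        H = X @ np.linalg.pinv(X)
        # Row i of S is the row of the local hat matrix belonging to x[i].
        S[i, lo:hi] = H[i - lo]
    return S


def df_trace(S):
    """Degrees of freedom 2 tr(S) - tr(S^t S), as in equation (17)."""
    return 2.0 * np.trace(S) - np.trace(S.T @ S)


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 200)
    for degree, name in [(0, "running means"), (1, "running lines")]:
        S = smoother_matrix(x, span=0.5, degree=degree)
        print(name, "tr(S) =", np.trace(S), " df =", df_trace(S))
```

Because each row of $S$ is a row of a local projection matrix, $\mathrm{tr}(S^tS) = \mathrm{tr}(S)$ for both smoothers, so the two printed quantities agree up to rounding error. For the running means smoother, $\mathrm{tr}(S)$ is the sum of $1/N_i$ over the neighborhoods, which is the quantity appearing in (A.4).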
