Liu R. (1990) On a Notion of Data Depth Based on Random Simplices

On a Notion of Data Depth Based on Random Simplices
Author(s): Regina Y. Liu
Source: The Annals of Statistics, Vol. 18, No. 1 (Mar., 1990), pp. 405-414
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2241550
Accessed: 11/10/2012 18:48
The Annals of Statistics 1990, Vol. 18, No. 1, 405-414
ON A NOTION OF DATA DEPTH BASED ON RANDOM SIMPLICES¹
By Regina Y. Liu
Rutgers University
To the memory of my teacher and friend John Van Ryzin
For a distribution $F$ on $\mathbb{R}^p$ and a point $x$ in $\mathbb{R}^p$, the simplicial depth $D(x)$ is introduced, which is the probability that the point $x$ is contained inside a random simplex whose vertices are $p + 1$ independent observations from $F$. Mathematically and heuristically it is argued that $D(x)$ indeed can be viewed as a measure of depth of the point $x$ with respect to $F$. An empirical version of $D(\cdot)$ gives rise to a natural ordering of the data points from the center outward. The ordering thus obtained leads to the introduction of multivariate generalizations of the univariate sample median and L-statistics. This generalized sample median and L-statistics are affine equivariant.
Received July 1987; revised March 1989.
¹Research supported by NSF grants DMS-85-02945 and DMS-88-02558.
AMS 1980 subject classifications. Primary 62H05, 62H12; secondary 60D05.
Key words and phrases. Simplex, simplicial depth, multivariate median, L-statistics, location estimators, angularly symmetric distributions, affine equivariance, consistency.

1. Introduction. The main goal of this paper is to introduce a new notion of data depth. This notion emerges naturally out of a fundamental concept underlying affine geometry, namely that of a simplex, and it satisfies the requirements one would expect from a notion of data depth. Thus it leads to an affine invariant, center-outward ranking of the data points. We now turn to a detailed description.

Let $X_1, \ldots, X_n$ be a bivariate data set. Given any three data points $X_i$, $X_j$ and $X_k$, we can form the closed triangle with vertices $X_i$, $X_j$ and $X_k$, which we denote by $\Delta(X_i, X_j, X_k)$. From the $n$ data points, we generate in this way $\binom{n}{3}$ triangles. To any point $x$ in $\mathbb{R}^2$ we can then associate the number of those triangles which contain $x$ inside. This number should be larger if $x$ is "deep" inside or near the "center" of the data cloud, and smaller if $x$ is relatively near its outskirts. This suggests the following notion of depth measure, which we shall call simplicial depth since it is based on triangles and their $p$-dimensional generalizations, which are simplices. Denote by $x \in \Delta(X_i, X_j, X_k)$ the event that $x$ falls inside the closed random triangle $\Delta(X_i, X_j, X_k)$ and by $I(A)$ the indicator function of an event $A$, i.e., $I(A) = 1$ if $A$ occurs and $I(A) = 0$ otherwise. Then

(1.1)  $D_n(x) = \binom{n}{3}^{-1} \sum_{1 \le i < j < k \le n} I\bigl(x \in \Delta(X_i, X_j, X_k)\bigr)$

expresses the proportion of triangles containing $x$. To visualize the situation, we may imagine placing a layer of clay with thickness $\binom{n}{3}^{-1}$ on the region corresponding to each triangle $\Delta(X_i, X_j, X_k)$, one by one until all $\binom{n}{3}$ triangles are exhausted. The resulting solid will represent the exact shape of $D_n(\cdot)$.

It is clear that $D_n(x)$ defined in (1.1) is an empirical version of the probability

(1.2)  $D(x) = P_F\bigl(x \in \Delta(X_1, X_2, X_3)\bigr)$

if the $X_i$'s are i.i.d. with a common distribution function $F$. The quantity $D(x)$ should assume higher values when $x$ is near the center of the distribution and should tend to 0 as $x$ moves away from the center. We shall refer to $D(x)$ in (1.2) as the simplicial depth (SD) of $x$ with respect to $F$ in $\mathbb{R}^2$ and to $D_n(x)$ in (1.1) as the sample simplicial depth of $x$ with respect to the data cloud $X_1, \ldots, X_n$.

It may be instructive to consider the univariate analog of SD, namely,

(1.3)  $D(x) = P\bigl(x \in \overline{X_1 X_2}\bigr)$.

Here $x$ is in $\mathbb{R}^1$, $X_1$ and $X_2$ are two independent observations from a univariate c.d.f. $F$ and $\overline{X_1 X_2}$ represents the closed line segment connecting $X_1$ and $X_2$. When $F$ is continuous,

(1.4)  $D(x) = 2F(x)[1 - F(x)]$.
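The univariate analog can be illustrated numerically. The following is only a minimal sketch: the function names `univariate_sample_depth` and `depth_from_cdf` are hypothetical, and the sample version (the fraction of pairs whose closed segment contains $x$) is the natural empirical counterpart of (1.3) rather than a formula stated in the text.

```python
from itertools import combinations


def univariate_sample_depth(x, data):
    """Fraction of closed segments [X_i, X_j], i < j, that contain x.

    Empirical counterpart of (1.3); assumes at least two observations.
    """
    pairs = list(combinations(data, 2))
    hits = sum(1 for a, b in pairs if min(a, b) <= x <= max(a, b))
    return hits / len(pairs)


def depth_from_cdf(F_x):
    """Population depth 2 F(x) [1 - F(x)] from (1.4), given the value F(x)."""
    return 2.0 * F_x * (1.0 - F_x)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
# Two observations lie below 2.5 and three above, so 2 * 3 = 6 of the
# 10 segments contain the point.
print(univariate_sample_depth(2.5, data))
# At a median, F(x) = 1/2 and the depth attains its maximum.
print(depth_from_cdf(0.5))
```

Evaluating `depth_from_cdf` on a grid of values in [0, 1] shows the depth rising to its peak at $F(x) = 1/2$ and falling off symmetrically on either side.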
It follows immediately that any point which maximizes $D(x)$ is a population median. The maximum value of $D(\cdot)$ is $1/2$ in this case, and $D(x)$ decreases monotonically to 0 as $x$ is pulled away from the median.

The above observation suggests that we call a point in $\mathbb{R}^2$ which maximizes $D(\cdot)$ a bivariate simplicial median. We denote such a point by $\mu$ and will also call it center when geometric understanding is emphasized. The sample version of the bivariate median is then

(1.5)  $\hat{\mu}_n$ = the data point $X_{i_0}$ attaining highest sample SD.
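The definitions (1.1) and (1.5) translate directly into a brute-force computation. The sketch below is illustrative only: the helper names are hypothetical, the data points are assumed to be in general position (no three collinear), and the $O(n^3)$ enumeration of triangles is chosen for clarity, not efficiency.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of the triangle (a, b, c)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_closed_triangle(x, a, b, c):
    """True if x lies in the closed triangle with vertices a, b, c.

    Assumes a, b, c are not collinear.
    """
    d1, d2, d3 = _orient(a, b, x), _orient(b, c, x), _orient(c, a, x)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def sample_simplicial_depth(x, data):
    """D_n(x) of (1.1): proportion of data triangles containing x."""
    triangles = list(combinations(data, 3))
    return sum(in_closed_triangle(x, *t) for t in triangles) / len(triangles)


def deepest_points(data):
    """Data points attaining the highest sample SD, as in (1.5)."""
    depths = [sample_simplicial_depth(p, data) for p in data]
    best = max(depths)
    return [p for p, d in zip(data, depths) if d == best]


data = [(0, 0), (6, 0), (0, 6), (1, 2), (2, 1)]
for point in data:
    print(point, sample_simplicial_depth(point, data))
print(deepest_points(data))
```

In this small data set the three hull vertices receive the lowest depth and the two interior points tie for the highest, which shows that the maximizer in (1.5) need not be unique.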
If the maximum is achieved at more than one data point, we can define $\hat{\mu}_n$ as the average of those data points which maximize $D_n(\cdot)$. The heuristic motivation for $\hat{\mu}_n$ as the sample median is the following: If $D(\cdot)$ is continuous and $\mu$ is the unique maximizer for $D(\cdot)$ in $\mathbb{R}^2$, an estimator for $\mu$ would be a point $x_0$ in the plane which maximizes $D_n(\cdot)$. If $F$ has a nonzero density in the neighborhood of $\mu$, we would expect the data point $X_{i_0}$ which maximizes $D_n(\cdot)$ among all the data points to be close to $x_0$ and, hence, to $\mu$. These arguments can actually be made rigorous, as we shall see in Section 3.

A major task here is to show that $D(x)$ defined in (1.2) can indeed be viewed as a measure of depth; that is, to show formally that it possesses some kind of monotonicity property similar to the one that $D(\cdot)$ possesses in the univariate analog. This is established in Theorem 3 of Section 2. To be more precise, the theorem asserts that when the underlying distribution is angularly symmetric (see Section 2 for the definition) about a point $\mu$, then $D(x)$ decreases monotonically as $x$ moves away from $\mu$ along any fixed ray.

All the concepts introduced so far can be easily extended to higher dimensions. For a distribution $F$ on $\mathbb{R}^p$, a random triangle in the definition of SD is now replaced by a random simplex whose vertices are $p + 1$ independent observations from $F$. Consequently, we define:

1. the simplicial depth (SD) function $D(\cdot)$ on $\mathbb{R}^p$ with respect to $F$ to be

(1.6)  $D(x) = P_F\bigl(x \in S[X_1, \ldots, X_{p+1}]\bigr)$,

where $X_1, \ldots, X_{p+1}$ are independent observations from $F$ and $S[X_1, \ldots, X_{p+1}]$ is the simplex with vertices $X_1, \ldots, X_{p+1}$ (in other words, $S[X_1, \ldots, X_{p+1}]$ is the set of all points in $\mathbb{R}^p$ which are convex combinations of $X_1, \ldots, X_{p+1}$);

2. a (multivariate) simplicial median of $F$, $\mu$, to be a point which maximizes $D(\cdot)$;

3. the sample simplicial depth function $D_n(\cdot)$ to be

(1.7)  $D_n(x) = \binom{n}{p+1}^{-1} \sum_{1 \le i_1 < \cdots < i_{p+1} \le n} I\bigl(x \in S[X_{i_1}, \ldots, X_{i_{p+1}}]\bigr)$

if $X_1, \ldots, X_n$ is a random sample from $F$;

4. the (multivariate) sample simplicial median $\hat{\mu}_n$ to be the sample point which maximizes $D_n(\cdot)$, or the average of such points if there are many.

We observe that it is straightforward to check whether or not a point $x$ in $\mathbb{R}^p$ is inside the simplex $S[x_1, \ldots, x_{p+1}]$. It actually amounts to solving the following system of linear equations:

(1.8)  $x = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_{p+1} x_{p+1}; \qquad \alpha_1 + \alpha_2 + \cdots + \alpha_{p+1} = 1$.
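The system (1.8) can be solved with any linear solver. The sketch below uses a small Gauss-Jordan routine so that it stays self-contained. The names `solve_linear`, `barycentric_coordinates` and `in_simplex` are hypothetical, and treating the simplex as closed (coefficients allowed to equal zero) is an assumption of this sketch.

```python
def solve_linear(M, rhs):
    """Solve M a = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate simplex: system (1.8) is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def barycentric_coordinates(x, vertices):
    """Coefficients (alpha_1, ..., alpha_{p+1}) solving (1.8)."""
    p = len(x)
    # First p rows: one equation per coordinate of x.
    # Last row: the constraint that the coefficients sum to 1.
    M = [[v[k] for v in vertices] for k in range(p)] + [[1.0] * (p + 1)]
    return solve_linear(M, list(x) + [1.0])


def in_simplex(x, vertices):
    """True if x is a convex combination of the p + 1 vertices."""
    return all(a >= 0.0 for a in barycentric_coordinates(x, vertices))


tetrahedron = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(barycentric_coordinates((0.25, 0.25, 0.25), tetrahedron))
print(in_simplex((0.25, 0.25, 0.25), tetrahedron))
print(in_simplex((1, 1, 1), tetrahedron))
```

Replacing the triangle test in a brute-force depth computation with `in_simplex` gives the sample depth (1.7) in any dimension, at the cost of one $(p+1) \times (p+1)$ solve per simplex.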
For a nondegenerate simplex, this system of $p + 1$ equations with $p + 1$ unknowns $\alpha_1, \alpha_2, \ldots, \alpha_{p+1}$ has a unique solution, and $x$ is inside the simplex if and only if $\alpha_1, \alpha_2, \ldots, \alpha_{p+1}$ are all positive.

Let $A$ be a nonsingular $p \times p$ matrix and $b \in \mathbb{R}^p$. Then (1.8) immediately implies that

(1.9)  $D_{A,b}(Ax + b) = D(x)$,
where $D_{A,b}(y)$ is the probability that $y$ ($\in \mathbb{R}^p$) is contained inside the simplex with vertices $AX_i + b$, $i = 1, \ldots, p + 1$. In other words, the function $D(\cdot)$ is invariant under affine transformations. Such invariance property clearly holds for the sample SD $D_n(\cdot)$ as well. This property of $D_n(\cdot)$ is sufficient to assert the affine equivariance property of all location estimators proposed in this article.

Some applications of simplicial depth.

(A) Multivariate L-statistics. In addition to giving the above generalized median, the notion of SD leads to a new way of ordering data points and, consequently, a generalization of the so-called L-statistics (linear combinations of order statistics) in the multivariate setting. Let $X_{[i]}$ be the data point associated with the $i$th highest sample SD value. Then $X_{[1]}, X_{[2]}, \ldots, X_{[n]}$ are the order statistics of the $X_i$'s with an ordering from the center outward. Let $w(\cdot)$ be a nonincreasing weight function on $[0, 1]$. We define a class of multivariate L-statistics as

(1.10)  $L_w = \sum_{i=1}^{n} X_{[i]}\, w(i/n) \Big/ \sum_{j=1}^{n} w(j/n)$.
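Given sample depth values, (1.10) is a weighted average of the depth-ordered points. The sketch below assumes the depths have already been computed by some routine. The function name `depth_l_statistic`, the cutoff weight and the tie-breaking rule (ties keep their input order) are illustrative choices, not part of the definition.

```python
def depth_l_statistic(points, depths, weight):
    """L_w of (1.10) for the given points and their sample SD values.

    `weight` is a nonincreasing function on [0, 1]; the point with the
    i-th highest depth receives weight w(i / n).
    """
    n = len(points)
    # Center-outward ordering X_[1], ..., X_[n]; sorted() is stable, so
    # points with equal depth keep their input order.
    order = sorted(range(n), key=lambda i: -depths[i])
    weights = [weight((rank + 1) / n) for rank in range(n)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    dim = len(points[0])
    return tuple(
        sum(w * points[i][k] for w, i in zip(weights, order)) / total
        for k in range(dim)
    )


points = [(0, 0), (6, 0), (0, 6), (1, 2), (2, 1)]
depths = [0.6, 0.6, 0.6, 0.8, 0.8]  # illustrative depth values


# Indicator weight that keeps only the deepest half of the sample.
def keep_deepest_half(t):
    return 1.0 if t <= 0.5 else 0.0


print(depth_l_statistic(points, depths, keep_deepest_half))
```

With a constant weight function the statistic reduces to the ordinary sample mean, so the weight controls how strongly the outlying, low-depth points are discounted.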
When $w(t) = I(t \le 1/n)$, $L_w$ is the same as the sample median $\hat{\mu}_n$ if $D_n(\cdot)$ is uniquely maximized among the sample points. When $w(t) = I(t \le 1 - \alpha)$, $L_w$ is a $100\alpha\%$ trimmed mean. In practice, $\alpha = 0.05$ or $0.1$ are the commonly used values. We would like to mention that the trimmed mean with $\alpha = 0.95$ (or so) should be an appealing alternative to $\hat{\mu}_n$ when the population SD is not uniquely maximized.

(B) Directional data and simplicial depth. A direction in the plane can be viewed as a point on a unit circle, while a direction in three-dimensional space can be similarly viewed as a point on a unit sphere. The study of directional data leads to situations where the ambient space is not a $p$-dimensional Euclidean space, but rather a sphere in $(p - 1)$ dimensions. The notion of simplicial depth can be adapted by using geodesic simplices instead of simplices. For example, in the case of a circle, the short arc connecting two observations is to replace the random line segment used to define SD in $\mathbb{R}^1$. This is investigated in Liu and Singh (1988).

(C) Testing the center of (angular) symmetry. We are often required to determine the center of some symmetric population. A class of distributions somewhat broader than the usual symmetric distributions is the class of angularly symmetric distributions. Roughly speaking, a distribution is angularly symmetric about a point $x$ if every hyperplane passing through $x$ divides the whole space into two half-spaces with equal probabilities. (For the precise definition and further discussions, see Section 2.) In the present paper, it is shown (cf. Theorems 3 and 4) that the SD is maximized at the center of angular symmetry and takes there the value $2^{-p}$ in $\mathbb{R}^p$. Thus, if $b_0$ is a hypothesized center of angular symmetry, then a large value of $(2^{-p} - D_n(b_0))$ is an indication of the null hypothesis being false. The observation in Remark B of Section 2 says that the test statistic $(2^{-p} - D_n(b_0))$ is a degenerate U-statistic. This fact leads us to conclude that $n(2^{-p} - D_n(b_0))$ has as its weak limit a linear combination of $\chi^2$-distributions [cf. Gregory (1977)]. A detailed study of this testing procedure will appear separately.

Other applications of SD include deriving a class of multivariate scales and a multivariate classification rule. In fact, a measure of scale can be derived by considering how far away one has to move from the center (i.e., the maximum point of the sample SD) in order to reduce the SD value to a fraction of its maximum. As for classification, the idea there is roughly the following [see Gross and Liu (1988)]: Suppose that two training samples from two different populations are given. A classification rule is a way of assigning any new data point $Z$ to one of these two populations. Such a rule can be obtained by comparing the relative center-outward ranks of $Z$ w.r.t. the training samples. $Z$ should be
assigned to the population whose training sample leads to a smaller relative rank for $Z$.

General remarks. An earlier concept of data depth was introduced by Tukey (1975). Tukey's data depth and the related sample median studied in Stahel (1981), Donoho (1982), and Donoho and Gasko (1988) are based on the inspection of "every" one-dimensional projection of the sample data. In a different direction, Oja (1983) defined a sample median in $\mathbb{R}^p$ as a point $x$ which yields the minimum total volume of all simplices formed by $x$ and $p$ of the data points. As far as the "generalized" multivariate median is concerned, there is an extensive literature and a thorough coverage can be found in Rousseeuw and Leroy (1987).

2. Main properties of the simplicial depth function $D(\cdot)$. The main properties of $D(\cdot)$ are summarized in Theorems 1-4.

THEOREM 1. For any $F$ on $\mathbb{R}^p$ and $x \in \mathbb{R}^p$, $\sup_{\|x\| \ge M} D(x) \to 0$ as $M \to \infty$.
THEOREM 2 [Continuity of $D(\cdot)$]. If $F$ is an absolutely continuous distribution on $\mathbb{R}^p$, then $D(\cdot)$ is continuous.
The next two theorems are stated for angularly symmetric distributions. The reason we focus on these distributions is that they form a large class of distributions possessing an obvious center, and we shall show that this center agrees with the one predicted by the simplicial depth function.

DEFINITION. A random variable $X$ in $\mathbb{R}^p$ or its distribution $F$ is said to be angularly symmetric about the point $b$ (in $\mathbb{R}^p$) if and only if the random variables $(X - b)/\|X - b\|$ and $-(X - b)/\|X - b\|$ are identically distributed, where $\|\cdot\|$ stands for the Euclidean norm.

For $p = 2$, $F$ is angularly symmetric about $b$ simply means $\alpha_b(\theta) = \alpha_b(\theta + \pi)$ for all $\theta$, $0 \le \theta \le \pi$, where $\alpha_b(\cdot)$ is the angular density around the point $b$ induced by $F$, provided that such angular density exists. It is easy to see that if $F$ is symmetric about $b$, then $F$ is angularly symmetric about $b$. It is also easy to see that if $F$ is angularly symmetric about $b$, then any hyperplane passing through $b$ will divide $\mathbb{R}^p$ into two open half-spaces with equal probabilities. This probability is $1/2$ if the distribution is absolutely continuous. Thus the center of angular symmetry is what one would want as a (multivariate) median. In view of this and Theorem 3, it is only natural to define a median by the maximal point of $D(\cdot)$. Finally, we note that the center of angular symmetry is unique when it exists, except in the case when the distribution $F$ has its whole probability mass concentrated on a line and its probability distribution along that line has more than one median. In fact, if $b_1$ and $b_2$ are two different centers of angular symmetry, then the region between any two parallel hyperplanes passing through $b_1$ and $b_2$, respectively, would have zero probability. Rotating these two hyperplanes, it follows that the entire $\mathbb{R}^p$ except for the line passing through $b_1$ and $b_2$ has zero probability.
THEOREM 3 [Monotonicity of $D(\cdot)$]. If $F$ is absolutely continuous and angularly symmetric about the origin, then $D(\alpha x)$ is monotone nonincreasing in $\alpha \ge 0$ for all $x \in \mathbb{R}^p$.

THEOREM 4. If $F$ is an absolutely continuous distribution on $\mathbb{R}^p$ and it is angularly symmetric about $b \in \mathbb{R}^p$, then $D(b) = 2^{-p}$.
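Theorems 3 and 4 can be checked numerically in the plane. The following Monte Carlo sketch estimates $D(x)$ for the standard bivariate normal distribution, which is absolutely continuous and angularly symmetric about the origin, so Theorem 4 gives $D(0) = 2^{-2} = 1/4$. The function names, the seed and the number of simulated triangles are arbitrary assumptions of this illustration.

```python
import random


def _orient(a, b, c):
    """Twice the signed area of the triangle (a, b, c)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_closed_triangle(x, a, b, c):
    """True if x lies in the closed triangle with vertices a, b, c."""
    d1, d2, d3 = _orient(a, b, x), _orient(b, c, x), _orient(c, a, x)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def standard_normal_2d(rng):
    """One draw from the standard bivariate normal distribution."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def monte_carlo_depth(x, draw, n_triangles, seed=0):
    """Estimate D(x) as the fraction of random triangles containing x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_triangles):
        a, b, c = draw(rng), draw(rng), draw(rng)
        hits += in_closed_triangle(x, a, b, c)
    return hits / n_triangles


# Using the same seed means the same triangles are used for every point,
# so the estimates along the ray are directly comparable.
for alpha in (0.0, 0.5, 1.0, 2.0):
    x = (alpha, 0.0)
    print(alpha, monte_carlo_depth(x, standard_normal_2d, 20000))
```

The estimate at the origin should be close to 0.25, and the estimates along the ray $\alpha \mapsto (\alpha, 0)$ should decrease as $\alpha$ grows, up to Monte Carlo error.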
In particular, Theorems 3 and 4 imply that for any point $a$ in $\mathbb{R}^p$, $D(a) \le 2^{-p}$ if $F$ is an angularly symmetric distribution.

Before discussing the proofs of Theorems 1-4, we pause to make two remarks.
REMARK A. Theorem 3 is equivalent to saying that the contours defined by $\{x \in \mathbb{R}^p: D(x) = c\}$ for positive numbers $c \le 2^{-p}$ are nested within one another. As $c$ decreases, they move further and further away from the center. Their geometry should contain useful information about the distribution $F$. In the special case when $F$ is spherical, each contour is a circle and $D(x)$ is a monotonic function of $\|x\|$. In the case of an elliptical distribution, i.e., when the density at $x$ is a function of $(x - \mu)'V^{-1}(x - \mu)$, it is not hard to show that $D(x)$ is also a function of $(x - \mu)'V^{-1}(x - \mu)$. In other words, the contours of $D(\cdot)$ resemble the contours of the underlying density in the elliptical case. This observation again confirms that $D(\cdot)$ indeed provides us with an appropriate notion of ordering.

REMARK B. The Proof of Theorem 4 will further imply the following fact: Under the assumption of angular symmetry at a center $b_0$, the conditional SD value at $b_0$ given one of the random vertices is the same as the unconditional one. In other words,

(2.1)  $P\bigl(b_0 \in S[X_1, \ldots, X_{p+1}] \mid X_i\bigr) = 2^{-p}$

for each $i = 1, \ldots, p + 1$. Evidently, (2.1) implies that $(2^{-p} - D_n(b_0))$ is a degenerate U-statistic, that is, $E[(D_n(b_0) - 2^{-p}) \mid X_i] = 0$ for all $i$, $1 \le i \le n$.
PROOF OF THEOREM 1. Given $x$ in $\mathbb{R}^p$, we observe that the event $\{x \in S[X_1, \ldots, X_{p+1}]\}$ is contained in the event $\bigcup_{i=1}^{p+1} \{\|X_i\| \ge \|x\|\}$. The theorem follows. $\square$
to the difference can contribute D(x) - D(y) onlyifit points. A randomtriangle however other. This impliesthat theremust be a contains one point but not the intersects the line segmentxy. It line segmentjoining two data points which to followsthat if xn is a sequence in R2 whichconverges x, then
ID(x)
- D(Xn)l
Let x and y be twodistinct simplicity. PROOF OF THEOREM 2. Let p = 2 for
< 3P(An))
DATA DEPTH BASED ON RANDOMSIMPLICES
411
whereAn = {(X1, X2): X1X2intersects xx,}. Note that limsup,,- P(AJ) < 0 An = {(X1, X2): x E XlX2). P(limsup,, . A). Note also that limsup,, C Since P(limsup, , . An) = 0, thetheorem follows. contribute to the difference different eventswhich D(x) - D(ax), a 2 1. They x to ax enters or leavestherandom from are the events thatthearrow triangle A(X1,X2,X3). We shallcall themAmand Aut, respectively. For anytwodistinct we needsomenotation. To makethe argument precise, If theplaneintotwohalf-planes. aibdivides contains pointsa, b, the linewhich the whichcontains the origin, we call thehalf-plane that line does not contain originthe "inner side" I(a, b). Let x, a 2 1 be fixedand C = {(a, b): ab n xx, whichintersect with x, ax. Then, ax=# 0 } be the set of all segments A12= {(X1, X2) E C) n {X3 where null sets,A = A12u A?3u A31, neglecting o I(X1, X2)) and thethree and equallyprobable. events AiJaredisjoint Similar holdforAout remarks to the definition of D(.), If B, = {ax e A(XP,X2, XA3), then,according
D(x)
-
PROOF OF THEOREM 3.
The idea is to examinetwo Let p = 2 forsimplicity.
D(ax) = P(B1\Ba)
P(BG\B,). Now,
B, \ Ba = Aout \ Ain, Ba \ B, = Ain \ Aout
and
D(x)
-
D(ax)
= P(A.Ut) = 3P(A
2-
P(Aout n Ain) 3P(A12) [P(X3EI(X1,
[P(Am)
P(Ain n Aout)]
)
C
(2.2)
=3
X2)) x2))] dF(Xl) dF(X2)*

2
(X1, X2)
P(X3
? I(X1,
P(X3 E I(x1,x2)) 2 Because of the angularsymmetry, theassertion. This proves C1 nonnegative.
and the integrand is
REMARK C. From (2.2) in the Proof of Theorem 3, we may deduce that $D(x) - D(\alpha x) > 0$ for any $\alpha > 1$ if the following two additional conditions hold: (i) $f$ is positive in a neighborhood of the origin, and (ii) $f$ is positive in a neighborhood of $\beta x$ for some $\beta$ such that $1 < \beta < \alpha$, where $f$ denotes the density of $F$. Clearly, (i) implies that the integrand in (2.2) is positive almost surely, and (ii) implies that the domain of the integral in (2.2) has positive probability. In particular, $D(\cdot)$ is uniquely maximized at the origin under condition (i).
PROOF OF THEOREM 4. W.l.o.g. we may assume that $F$ is angularly symmetric about the origin 0. In this case $X_i^* \equiv X_i/\|X_i\|$ and $(-X_i^*)$ are identically distributed, and the following four events are equivalent except for a null set:

(i) $\{(X_1, \ldots, X_p, X_{p+1}): 0 \in S[X_1, \ldots, X_p, X_{p+1}]\}$;

(ii) $\{(X_1, \ldots, X_p, X_{p+1}): 0 \in S[X_1^*, \ldots, X_p^*, X_{p+1}^*]\}$;

(iii) $\{(X_1, \ldots, X_p, X_{p+1}): 0 \in S[e_1, \ldots, e_p, [[X_1^*, \ldots, X_p^*]]^{-1} X_{p+1}^*]\}$, where $e_i$ is the $i$th unit vector in $\mathbb{R}^p$ and $[[X_1^*, \ldots, X_p^*]]$ is the matrix with columns $X_1^*, \ldots, X_p^*$;

(iv) $\{(X_1, \ldots, X_p, X_{p+1}): W_1 < 0, \ldots, W_p < 0\}$, where $W_i$ is the $i$th component of the vector $[[X_1^*, \ldots, X_p^*]]^{-1} X_{p+1}^*$.

By exchanging $X_i^*$ with $-X_i^*$, we can show that the random vector $[[X_1^*, \ldots, X_p^*]]^{-1} X_{p+1}^*$ is coordinatewise symmetric about the origin. This implies that each "orthant" determined by $(W_1, \ldots, W_p)'$ has an equal probability, which must be $2^{-p}$. Therefore the event (iv) has the probability $2^{-p}$ and Theorem 4 follows. $\square$

3. Consistency of the sample simplicial depth $D_n(\cdot)$.

THEOREM 5. Let $F$ be an absolutely continuous distribution on $\mathbb{R}^p$ with bounded density $f$. Then:

(a) The uniform consistency of $D_n(\cdot)$ holds, i.e.,

$\sup_{x \in \mathbb{R}^p} |D_n(x) - D(x)| \to 0$ a.s. as $n \to \infty$.

(b) Furthermore, if $f$ does not vanish in a neighborhood of $\mu$ and if $D(\cdot)$ is uniquely maximized at $\mu$, then $\hat{\mu}_n \to \mu$ a.s. as $n \to \infty$.
The proof of Theorem 5 is based on the following three lemmas.

LEMMA 1. For any $F$ on $\mathbb{R}^p$ and $x \in \mathbb{R}^p$,

$\sup_{\|x\| \ge M} D_n(x) \to 0$ a.s. as $M \to \infty$.
LEMMA 2. Suppose that $F$ is absolutely continuous. Let $\delta$ and $c$ be arbitrary but fixed positive constants. Then, for any positive $\varepsilon$, we have

$\sup_{\{x, y \in \mathrm{Ball}(\mu, c),\ \|x - y\| < \varepsilon\}} |D_n(x) - D_n(y)| \le \gamma(\varepsilon) + \delta + R_n$,

where $\gamma(\varepsilon)$ is nonrandom, $\gamma(\varepsilon) \to 0$ as $\varepsilon \to 0$ and $R_n \to 0$ a.s. as $n \to \infty$.

LEMMA 3. Let $F$ be a distribution on $\mathbb{R}^p$ and $X_1, \ldots, X_n$ be a random sample from $F$. Let $U_n = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \cdots < i_m \le n} h(X_{i_1}, \ldots, X_{i_m})$ be a U-statistic with the kernel $h(\cdot)$ of degree $m$. If $h$ is bounded, say by $c$, then for any $r \ge 2$,

$E(U_n - EU_n)^r \le K n^{-r/2}$,

where $K$ depends on $c$.

Note that Lemmas 1 and 2 are close in spirit to Theorems 1 and 2, respectively. Since Lemma 2 concerns the sample version of $D(\cdot)$, we need to invoke the crucial fact that the class of all convex Borel measurable sets in $\mathbb{R}^p$ forms a Glivenko-Cantelli class if $F$ has a density w.r.t. Lebesgue measure [cf. Gaenssler and Stute (1979)]. In other words,

$\sup_{A \in \mathcal{C}} |F_n(A) - F(A)| \to 0$ a.s.,

where $\mathcal{C}$ is the class of all convex Borel measurable sets. We refer to Liu (1987) for the details. Lemma 3 is needed because $D_n(x)$ is a U-statistic. In fact, Lemma 3 is essentially Lemma A on page 185 of Serfling (1980).

Finally, we come to the Proof of Theorem 5.

PROOF OF THEOREM 5. By Theorem 1 and Lemma 1, part (a) will follow if we can show that, for a chosen $M > 0$,

(3.1)  $\sup_{x \in Q(\mu, M)} |D_n(x) - D(x)| \to 0$ a.s. as $n \to \infty$,

where $Q(\mu, M)$ is the hypercube with $\mu$ as its center and $M$ as the length of its sides. Divide each side of $Q(\mu, M)$ into $N$ equal pieces to form $N^p$ subhypercubes. In view of Theorem 2 and Lemma 2, since $N$ can be arbitrarily large, we only need to show that

(3.2)  $\max_{x \in C(\mu, M)} |D_n(x) - D(x)| \to 0$ a.s. as $n \to \infty$,

where $C(\mu, M)$ is the set of all corner points of the subhypercubes.

Using Lemma 3 with m = p + 1, r = 4 and c = 1, we obtain
$P\Bigl(\max_{x \in C(\mu, M)} |D_n(x) - D(x)| > \varepsilon\Bigr) \le N^p \max_{x \in C(\mu, M)} P\bigl(|D_n(x) - D(x)| > \varepsilon\bigr) = O(n^{-2})$.
The claim (3.2) therefore follows from the Borel-Cantelli lemma.

The idea of the proof of part (b) can be outlined as follows: We begin with two balls, each centered at $\mu$. The radius of the bigger ball is arbitrarily small but fixed. Then it is shown that the $D_n$ value at any point inside the inner ball is larger than that at any point outside the bigger ball, for all large $n$. Since, for all large $n$, at least one data point will fall inside the inner ball, the possibility of $\hat{\mu}_n$ lying outside the bigger ball is ruled out.

Assuming that $D(\cdot)$ is uniquely maximized at $\mu$, we see that, for any $\varepsilon > 0$, there exists a $\delta > 0$ such that $D(x) < D(\mu) - \delta$ for all $x \notin \mathrm{Ball}(\mu, \varepsilon)$. By the continuity of $D(\cdot)$ (cf. Theorem 2), we may choose $\varepsilon_1 < \varepsilon$ such that $|D(y) - D(\mu)| < \delta/2$ for all $y \in \mathrm{Ball}(\mu, \varepsilon_1)$. Thus, $D(x) < D(y) - \delta/2$ for all $x \notin \mathrm{Ball}(\mu, \varepsilon)$ and $y \in \mathrm{Ball}(\mu, \varepsilon_1)$. The uniform convergence of $D_n$ to $D$ given in part (a) of Theorem 5 guarantees that, starting from a certain $n$, $D_n(x) < D_n(y) - \delta/4$ for all $x \notin \mathrm{Ball}(\mu, \varepsilon)$ and $y \in \mathrm{Ball}(\mu, \varepsilon_1)$.

Now we claim that there is at least one sample point inside the smaller ball $\mathrm{Ball}(\mu, \varepsilon_1)$ for all large $n$, almost surely. Since $f$ does not vanish in a neighborhood of $\mu$, we have $p \equiv P(X_1 \in \mathrm{Ball}(\mu, \varepsilon_1)) > 0$. Consequently,

$P\bigl(\mathrm{Ball}(\mu, \varepsilon_1) \text{ does not contain any of } X_1, \ldots, X_n\bigr) = (1 - p)^n$.

Therefore, almost surely, after a certain $n$, there exists some sample point, say $X_k$, inside $\mathrm{Ball}(\mu, \varepsilon_1)$. By the definition of $\hat{\mu}_n$, $D_n(\hat{\mu}_n) \ge D_n(X_k)$ and, hence, $\hat{\mu}_n \in \mathrm{Ball}(\mu, \varepsilon)$. Since $\varepsilon$ can be chosen arbitrarily small, part (b) follows. $\square$

Acknowledgments. I would like to thank Joop Kemperman for his encouragement and many helpful discussions. I am grateful to the referee and the Editor for their kind suggestions, which helped improve the article greatly.
REFERENCES
DONOHO, D. L. (1982). Breakdown properties of multivariate location estimators. Qualifying paper, Harvard Univ.
DONOHO, D. and GASKO, M. (1988). Multivariate generalization of the median and trimmed means. I. Unpublished.
GAENSSLER, P. and STUTE, W. (1979). Empirical processes: A survey of results for independent and identically distributed random variables. Ann. Probab. 7 193-243.
GREGORY, G. G. (1977). Large sample theory for U-statistics and tests of fit. Ann. Statist. 5 110-123.
GROSS, S. and LIU, R. (1988). Classification rules based on concepts of data depth. Unpublished.
LIU, R. (1987). Simplicial depth and the related location estimators. Technical report, Dept. Statist., Rutgers Univ.
LIU, R. and SINGH, K. (1988). On the ordering of directional data: Concepts of data depth on circles and spheres. Unpublished.
OJA, H. (1983). Descriptive statistics for multivariate distribution. Statist. Probab. Lett. 1 327-333.
ROUSSEEUW, P. J. and LEROY, A. (1987). Robust Regression and Outlier Detection. Wiley, New York.
SERFLING, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
STAHEL, W. A. (1981). Robuste Schätzungen: Infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. Ph.D. dissertation, ETH, Zürich.
TUKEY, J. W. (1975). Mathematics and picturing data. Proceedings of International Congress of Mathematics, Vancouver, 2 523-531.
DEPARTMENT OF STATISTICS RUTGERS UNIVERSITY NEW BRUNSWICK, NEW JERSEY 08903
