Springer Series in Statistics
Advisors:
J. Berger, S. Fienberg, J. Gani,
K. Krickeberg, B. Singer
Springer Series in Statistics
Andrews/Herzberg: Data: A Collection of Problems from Many Fields for the
Student and Research Worker.
Anscombe: Computing in Statistical Science through APL.
Berger: Statistical Decision Theory and Bayesian Analysis, 2nd edition.
Brémaud: Point Processes and Queues: Martingale Dynamics.
Brockwell/Davis: Time Series: Theory and Methods.
Daley/Vere-Jones: An Introduction to the Theory of Point Processes.
Dzhaparidze: Parameter Estimation and Hypothesis Testing in Spectral Analysis
of Stationary Time Series.
Farrell: Multivariate Calculation.
Goodman/Kruskal: Measures of Association for Cross Classifications.
Hartigan: Bayes Theory.
Heyer: Theory of Statistical Experiments.
Jolliffe: Principal Component Analysis.
Kres: Statistical Tables for Multivariate Analysis.
Leadbetter/Lindgren/Rootzén: Extremes and Related Properties of Random
Sequences and Processes.
Le Cam: Asymptotic Methods in Statistical Decision Theory.
Manoukian: Modern Concepts and Theorems of Mathematical Statistics.
Miller, Jr.: Simultaneous Statistical Inference, 2nd edition.
Mosteller/Wallace: Applied Bayesian and Classical Inference: The Case of The
Federalist Papers.
Pollard: Convergence of Stochastic Processes.
Pratt/Gibbons: Concepts of Nonparametric Theory.
Read/Cressie: Goodness-of-Fit Statistics for Discrete Multivariate Data.
Reiss: Approximate Distributions of Order Statistics: With Applications to
Nonparametric Statistics.
Sachs: Applied Statistics: A Handbook of Techniques, 2nd edition.
Seneta: Non-Negative Matrices and Markov Chains.
Siegmund: Sequential Analysis: Tests and Confidence Intervals.
Vapnik: Estimation of Dependences Based on Empirical Data.
Wolter: Introduction to Variance Estimation.
Yaglom: Correlation Theory of Stationary and Related Random Functions I:
Basic Results.
Yaglom: Correlation Theory of Stationary and Related Random Functions II:
Supplementary Notes and References.
R.D. Reiss
Approximate Distributions
of Order Statistics
With Applications to Nonparametric
Statistics
With 30 Illustrations
Springer-Verlag
New York Berlin Heidelberg
London Paris Tokyo
R.D. Reiss
Universität Gesamthochschule Siegen
Fachbereich 6, Mathematik
D-5900 Siegen
Federal Republic of Germany
Mathematics Subject Classification (1980): 62-07, 62B15, 62E20, 62G05, 62G10, 62G30
Library of Congress Cataloging-in-Publication Data
Reiss, Rolf-Dieter.
Approximate distributions of order statistics.
(Springer series in statistics)
Bibliography: p.
Includes indexes.
1. Order statistics. 2. Asymptotic distribution
(Probability theory) 3. Nonparametric statistics.
I. Title. II. Series.
QA278.7.R45 1989
519.5
88-24844
Printed on acid-free paper.
© 1989 by Springer-Verlag New York Inc.
Softcover reprint of the hardcover 1st edition 1989
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer-Verlag, 175 Fifth Avenue, New York, NY 10010,
USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection
with any form of information storage and retrieval, electronic adaptation, computer software, or
by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc. in this publication, even if
the former are not especially identified, is not to be taken as a sign that such names, as understood
by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
Typeset by Asco Trade Typesetting Ltd., Hong Kong.
9 8 7 6 5 4 3 2 1
ISBN-13: 978-1-4613-9622-2
e-ISBN-13: 978-1-4613-9620-8
DOI: 10.1007/978-1-4613-9620-8
To Margit, Maximilian, Cornelia, and Thomas
Preface
This book is designed as a unified and mathematically rigorous treatment of
some recent developments of the asymptotic distribution theory of order
statistics (including the extreme order statistics) that are relevant for statistical
theory and its applications. Particular emphasis is placed on results concerning the accuracy of limit theorems, on higher order approximations, and on other approximations in quite a general sense.
In contrast to the classical limit theorems, which primarily concern the weak
convergence of distribution functions, our main results will be formulated in
terms of the variational and the Hellinger distance. These results will form the
proper springboard for the investigation of parametric approximations of
nonparametric models of joint distributions of order statistics. The approximating models include normal as well as extreme value models. Several
applications will show the usefulness of this approach.
Other recent developments in statistics, like nonparametric curve estimation and the bootstrap method, will be studied as far as order statistics are concerned. In connection with this, graphical methods will, to some extent,
be explored.
The prerequisite for handling the indicated problems is a profound knowledge of distributional properties of order statistics. Thus, we collect several basic tools (of a finite-sample and of an asymptotic nature) that are either scattered in the literature or not elaborated to an extent that would satisfy our present requirements. For example, the Markov property of order statistics is studied in detail. This part of the book, which has the character of a textbook, is supplemented by several well-known results.
The book is intended for students and research workers in probability and
statistics, and practitioners involved in applications of mathematical results
concerning order statistics and extremes. The knowledge of standard calculus
and of topics that are taught in introductory probability and statistics courses is necessary for an understanding of this book. To reinforce previous knowledge, as well as to fill gaps, we shall frequently give a short exposition of probabilistic and statistical concepts (e.g., that of conditional distribution and approximate sufficiency).
The results are often formulated for distributions themselves (and not only for distribution functions), and so we need, as far as order statistics are concerned, the notion of Borel sets in a Euclidean space. Intervals, open sets, and closed sets are special Borel sets. Large parts of this book can be understood without prior knowledge of technical details of a measure-theoretic nature.
My research work on order statistics started at the University of Cologne, where, influenced by J. Pfanzagl, I became familiar with expansions and statistical problems. Lecture notes of a course on order statistics held at the University of Freiburg during the academic year 1976/77 can be regarded as an early forerunner of this book.
I would like to thank my students B. Dohmann, G. Heer, and E. Kaufmann
for their programming assistance. G. Heer also skillfully read through large parts of the manuscript. It gives me great pleasure to acknowledge the cooperation, documented by several articles, with my colleague M. Falk. The excellent atmosphere within the small statistical research group at the University of Siegen, including A. Janssen and F. Marohn, facilitated the writing of this book. Finally, I would like to thank W. Stute, and those not mentioned individually, for their comments.
Siegen, FR Germany
Rolf-Dieter Reiss
Contents

Preface  vii

CHAPTER 0
Introduction  1
0.1. Weak and Strong Convergence  1
0.2. Approximations  3
0.3. The Role of Order Statistics in Nonparametric Statistics  4
0.4. Central and Extreme Order Statistics  5
0.5. The Restriction to Independent and Identically Distributed Random Variables  6
0.6. Graphical Methods  6
0.7. A Guide to the Contents  7
0.8. Notation and Conventions  8

PART I
Exact Distributions and Basic Tools

CHAPTER 1
Distribution Functions, Densities, and Representations  11
1.1. Introduction to Basic Concepts  11
1.2. The Quantile Transformation  14
1.3. Single Order Statistics, Extremes  20
1.4. Joint Distribution of Several Order Statistics  27
1.5. Extensions to Continuous and Discontinuous Distribution Functions  32
1.6. Spacings, Representations, Generalized Pareto Distribution Functions  36
1.7. Moments, Modes, and Medians  44
1.8. Conditional Distributions of Order Statistics  51
P.1. Problems and Supplements  56
Bibliographical Notes  61

CHAPTER 2
Multivariate Order Statistics  64
2.1. Introduction  64
2.2. Distribution Functions and Densities  68
P.2. Problems and Supplements  78
Bibliographical Notes  81

CHAPTER 3
Inequalities and the Concept of Expansions  83
3.1. Inequalities for Distributions of Order Statistics  83
3.2. Expansions of Finite Length  89
3.3. Distances of Measures: Convergence and Inequalities  94
P.3. Problems and Supplements  102
Bibliographical Notes  104

PART II
Asymptotic Theory

CHAPTER 4
Approximations to Distributions of Central Order Statistics  107
4.1. Asymptotic Normality of Central Sequences  108
4.2. Expansions: A Single Central Order Statistic  114
4.3. Asymptotic Independence from the Underlying Distribution Function  123
4.4. The Approximate Multivariate Normal Distribution  129
4.5. Asymptotic Normality and Expansions of Joint Distributions  131
4.6. Expansions of Distribution Functions of Order Statistics  138
4.7. Local Limit Theorems and Moderate Deviations  142
P.4. Problems and Supplements  145
Bibliographical Notes  148

CHAPTER 5
Approximations to Distributions of Extremes  151
5.1. Asymptotic Distributions of Extreme Sequences  152
5.2. Hellinger Distance between Exact and Approximate Distributions of Sample Maxima  164
5.3. The Structure of Asymptotic Joint Distributions of Extremes  176
5.4. Expansions of Distributions of Extremes of Generalized Pareto Random Variables  181
5.5. Variational Distance between Exact and Approximate Joint Distributions of Extremes  186
5.6. Variational Distance between Empirical and Poisson Processes  190
P.5. Problems and Supplements  194
Bibliographical Notes  201

CHAPTER 6
Other Important Approximations  206
6.1. Approximations of Moments and Quantiles  206
6.2. Functions of Order Statistics  209
6.3. Bahadur Approximation  216
6.4. Bootstrap Distribution Function of a Quantile  220
P.6. Problems and Supplements  226
Bibliographical Notes  227

CHAPTER 7
Approximations in the Multivariate Case  229
7.1. Asymptotic Normality of Central Order Statistics  229
7.2. Multivariate Extremes  232
P.7. Problems and Supplements  237
Bibliographical Notes  238

PART III
Statistical Models and Procedures

CHAPTER 8
Evaluating the Quantile and Density Quantile Function  243
8.1. Sample Quantiles  243
8.2. Kernel Type Estimators of Quantiles  248
8.3. Asymptotic Performance of Quantile Estimators  260
8.4. Bootstrap via Smooth Sample Quantile Function  265
P.8. Problems and Supplements  268
Bibliographical Notes  270

CHAPTER 9
Extreme Value Models  272
9.1. Some Basic Concepts of Statistical Theory  273
9.2. Efficient Estimation in Extreme Value Models  276
9.3. Semiparametric Models for Sample Maxima  279
9.4. Parametric Models Belonging to Upper Extremes  281
9.5. Inference Based on Upper Extremes  283
9.6. Comparison of Different Approaches  284
9.7. Estimating the Quantile Function Near the Endpoints  286
P.9. Problems and Supplements  289
Bibliographical Notes  290

CHAPTER 10
Approximate Sufficiency of Sparse Order Statistics  292
10.1. Comparison of Statistical Models via Markov Kernels  292
10.2. Approximate Sufficiency over a Neighborhood of a Fixed Distribution  299
10.3. Approximate Sufficiency over a Neighborhood of a Family of Distributions  305
10.4. Local Comparison of a Nonparametric Model and a Normal Model  310
P.10. Problems and Supplements  315
Bibliographical Notes  317

Appendix 1. The Generalized Inverse  318
Appendix 2. Two Technical Lemmas on Expansions  321
Appendix 3. Further Results on Distances of Measures  325

Bibliography  331
Author Index  345
Subject Index  349
CHAPTER 0
Introduction
Let us start with a detailed outline of the intentions and of certain characteristics of this book.
0.1. Weak and Strong Convergence
For good reasons the concept of weak convergence of random variables (in short, r.v.'s) ξ_n plays a preeminent role in the literature. Whenever the distribution functions (in short, d.f.'s) F_n of the r.v.'s ξ_n are not necessarily continuous then, in general, only the weak convergence holds, that is,

    F_n(t) → F_0(t),    n → ∞,    (1)

at every point of continuity t of F_0. If F_0 is continuous then it is well known that the convergence in (1) holds uniformly in t. This may be written in terms of the Kolmogorov-Smirnov distance as

    sup_t |F_n(t) − F_0(t)| → 0,    n → ∞.
In the sequel let us assume that F_0 is continuous. It follows from (1) that

    P{ξ_n ∈ I} → P{ξ_0 ∈ I},    n → ∞,    (2)

uniformly over all intervals I. In general, (2) does not hold for every Borel set I. However, if the d.f.'s F_n have densities, say, f_n, such that f_n(t) → f_0(t), n → ∞, for almost all t, then it is well known that (2) is valid w.r.t. the variational distance, that is,

    sup_B |P{ξ_n ∈ B} − P{ξ_0 ∈ B}| → 0,    n → ∞,    (3)

where the sup is taken over all Borel sets B.
Next, the remarks above will be specialized to order statistics. It is well known that central order statistics X_{r(n):n} of a sample of size n are asymptotically normally distributed under weak conditions on the underlying d.f. F. In terms of weak convergence this may be written

    P{a_n^{-1}(X_{r(n):n} − b_n) ≤ t} → P{ξ_0 ≤ t},    n → ∞,    (4)

for every t, with ξ_0 denoting a standard normal r.v. and a_n, b_n normalizing constants. The two classical methods of proving (4) are
(a) an application of the central limit theorem to binomial r.v.'s,
(b) a direct proof of the pointwise convergence of the corresponding densities (e.g. H. Cramér (1946)).
However, it is clear that (b) yields the convergence in a stronger sense, namely, w.r.t. the variational distance. We have

    sup_B |P{a_n^{-1}(X_{r(n):n} − b_n) ∈ B} − P{ξ_0 ∈ B}| → 0,    n → ∞,    (5)

where the sup is taken over all Borel sets B. A more systematic study of the strong convergence of distributions of order statistics was initiated by L. Weiss (1959, 1969a) and S. Ikeda (1963). These results particularly concern the joint asymptotic normality of an increasing number of order statistics.
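The asymptotic normality in (4) and (5) can be checked numerically. The sketch below is our own illustration (not taken from the book): for i.i.d. uniform r.v.'s the sample median, normalized with b_n = 1/2 and a_n = 1/(2 n^{1/2}), is compared with the standard normal d.f. via the Kolmogorov-Smirnov distance.

```python
import math
import random

def normalized_medians(n, trials, seed=0):
    # Simulate the central order statistic X_{r(n):n} with r(n) close to n/2
    # for uniform (0, 1) samples and normalize it toward standard normality.
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        xs = sorted(rng.random() for _ in range(n))
        out.append((xs[n // 2] - 0.5) * 2.0 * math.sqrt(n))
    return out

def ks_distance_to_normal(values):
    # Kolmogorov-Smirnov distance between the empirical d.f. of the
    # simulated values and the standard normal d.f.
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    values = sorted(values)
    m = len(values)
    return max(max(abs((i + 1) / m - Phi(v)), abs(i / m - Phi(v)))
               for i, v in enumerate(values))
```

For moderate n the distance is already small, in line with (4).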
The convergence of densities of central order statistics was originally studied for technical reasons; these densities are of a simpler analytical form than the corresponding d.f.'s. On the other hand, when treating the weak convergence of extreme order statistics it is natural to work directly with d.f.'s. To highlight the foregoing remark, the reader is reminded of the fact that F^n is the d.f. of the largest order statistic (maximum) X_{n:n} of n independent and identically distributed r.v.'s with common d.f. F.
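Since P{X_{n:n} ≤ t} = P{ξ_i ≤ t for all i} = F(t)^n, this identity is easy to verify by simulation; the following sketch (ours, not from the book) uses the uniform d.f. F(t) = t.

```python
import random

def empirical_max_df(n, t, trials=20000, seed=1):
    # Relative frequency of the event {X_{n:n} <= t} for uniform samples;
    # it should be close to F(t)^n = t**n.
    rng = random.Random(seed)
    hits = sum(max(rng.random() for _ in range(n)) <= t for _ in range(trials))
    return hits / trials
```

For example, with n = 5 and t = 0.8 the exact value is 0.8**5 = 0.32768.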
The by now classical theory for extreme order statistics provides necessary and sufficient conditions for a d.f. F to belong to the domain of attraction of a nondegenerate d.f. G; that is, the weak convergence

    F^n(a_n t + b_n) → G(t),    n → ∞,    (6)

holds for some choice of constants a_n > 0 and reals b_n. If F has a density then one can make use of the celebrated von Mises conditions to verify (6). These conditions are also necessary for (6) under further mild conditions imposed on F. In particular, the d.f.'s treated in statistical textbooks satisfy one of the von Mises conditions. Moreover, it turns out that convergence w.r.t. the variational distance holds. This may be written

    sup_B |P{a_n^{-1}(X_{n:n} − b_n) ∈ B} − G(B)| → 0,    n → ∞,    (7)
where the sup is taken over all Borel sets B. Note that the symbol G is also used for the probability measure corresponding to the d.f. G. Apparently, (7) implies (6).
The relation (7) can be generalized to the joint distribution of the upper extremes X_{n−k+1:n}, X_{n−k+2:n}, ..., X_{n:n}, where k ≡ k(n) is allowed to increase to infinity as the sample size n increases.
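A concrete numerical instance of (6) may be sketched as follows (our illustration, not from the book): for standard exponential r.v.'s one may take a_n = 1 and b_n = log n, and the distance between F^n(t + log n) and the Gumbel d.f. G(t) = exp(−e^{−t}) decreases quickly in n.

```python
import math

def max_df_error(n):
    # sup over a grid of |F^n(t + log n) - G(t)| for the standard
    # exponential d.f. F and the Gumbel d.f. G.
    G = lambda t: math.exp(-math.exp(-t))
    F = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0
    grid = [i / 10.0 for i in range(-30, 81)]
    return max(abs(F(t + math.log(n)) ** n - G(t)) for t in grid)
```
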
We want to give some arguments why our emphasis is on the variational and the Hellinger distance instead of the Kolmogorov-Smirnov distance:
(a) Mathematical reasons: we want to formulate the results as strongly as possible. One may add that the problems involved are very challenging.
(b) Results in terms of d.f.'s look awkward if the dimension increases with the sample size. Of course, the alternative is a formulation in terms of stochastic processes.
(c) It is necessary to use the variational distance (and, as an auxiliary tool, the Hellinger distance) in connection with model approximation. In other words, certain problems cannot be solved in a different way.
0.2. Approximations
The joint distributions of order statistics can be described explicitly by analytical expressions involving the underlying d.f. F and density f. However, in most cases it is extremely cumbersome to compute the exact numerical values of probabilities concerning order statistics or to find the analytical form of d.f.'s of functions of order statistics. Hence, it is desirable to find approximate distributions. In view of practical and theoretical applications these approximations should be of a simple form.
The classical approach of finding approximate distributions is given by the
asymptotic theory for sequences of order statistics X_{r(n):n} with the sample size n tending to infinity:
(a) If r(n) → ∞ and n − r(n) → ∞ as n → ∞ then the order statistics are asymptotically normal under mild regularity conditions imposed on F.
(b) If r(n) = k or r(n) = n − k + 1 for every n, with k being fixed, then the order statistics are asymptotically distributed according to an extreme value distribution (which is different from the normal distribution).
In the intermediate cases, that is, r(n) → ∞ and r(n)/n → 0, or n − r(n) → ∞ and (n − r(n))/n → 0 as n → ∞, one can either use the normal approximation or an approximation by means of a sequence of extreme value distributions.
Thus, the problem of computing an estimate of the remainder term enters the
scene; sharp estimates will make the different approximations comparable.
In the case of maxima of normal r.v.'s we shall see that a certain sequence
of extreme value distributions provides a better approximation than the limit
distribution.
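This phenomenon can be made visible with a small computation (our sketch; the norming constants below are the classical textbook choice a_n = (2 log n)^{-1/2} and b_n = (2 log n)^{1/2} − [log log n + log 4π]/(2 (2 log n)^{1/2})): for normal maxima the Kolmogorov-Smirnov distance to the Gumbel limit decays only at a rate of order 1/log n, so it remains sizable even for large n.

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_normal_maxima(n):
    # sup over a grid of |Phi^n(a_n t + b_n) - G(t)| with the Gumbel d.f. G.
    s = math.sqrt(2.0 * math.log(n))
    b = s - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * s)
    a = 1.0 / s
    G = lambda t: math.exp(-math.exp(-t))
    grid = [i / 10.0 for i in range(-30, 101)]
    return max(abs(Phi(a * t + b) ** n - G(t)) for t in grid)
```

Even at n = 100000 the distance is only modestly smaller than at n = 100, which is why a penultimate approximation can outperform the limit.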
Better insight into the problem of computing accurate approximations is
obtained when higher order approximations are available. There is a trade-off
between the two requirements that the higher order approximation should be
of a simple form and also of a better performance than the limiting distribution.
In particular, we shall study finite expansions of length m + 1 which may be written

    Q + Σ_{i=1}^m ν_{i,n},

where Q is the limiting distribution and the ν_{i,n} are signed measures depending on the sample size n. A prominent example is provided by Edgeworth expansions. Usually, the signed measures have polynomials h_{i,n} as densities w.r.t. Q. If Q has a density g then the expansion may be written

    Q(B) + Σ_{i=1}^m ν_{i,n}(B) = ∫_B (1 + Σ_{i=1}^m h_{i,n}(x)) g(x) dx    (8)

for every Borel set B. Specializing (8) to B = (−∞, t], one gets approximations to d.f.'s of order statistics.
The bound on the remainder term of an approximation will involve
(a) unknown universal constants, and
(b) some known terms which specify the dependence on the underlying d.f.
and the index of the order statistic.
Since the universal constants are not explicitly stated, our considerations
belong to the realm of asymptotics.
The bounds give a clear picture of the dependence of the remainder terms on the underlying distribution. Much emphasis is laid on providing numerical examples to show that the asymptotic results are relevant for small and moderate sample sizes.
0.3. The Role of Order Statistics in
Nonparametric Statistics
The sample d.f. F_n is the natural nonparametric estimator of the unknown d.f. F and, likewise, the sample quantile function (in short, sample q.f.) F_n^{-1} may be regarded as a natural estimator of the unknown q.f. F^{-1}. For any functional T(F^{-1}) of F^{-1} a plausible choice of an estimator will be T(F_n^{-1}) if no further information is given about the underlying model.
Note that T(F_n^{-1}) can be expressed as t(X_{1:n}, ..., X_{n:n}) since F_n^{-1}(q) = X_{r(q):n} where r(q) = nq if nq is an integer and r(q) = [nq] + 1 otherwise.
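This convention for r(q) translates directly into code (a sketch of ours; the function name is hypothetical):

```python
import math

def sample_quantile(data, q):
    # F_n^{-1}(q) = X_{r(q):n} with r(q) = nq if nq is an integer
    # and r(q) = [nq] + 1 otherwise.
    xs = sorted(data)                 # X_{1:n} <= ... <= X_{n:n}
    n = len(xs)
    nq = n * q
    r = int(nq) if nq == int(nq) else int(math.floor(nq)) + 1
    return xs[r - 1]
```
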
In many nonparametric problems one is only concerned with the local behavior of the q.f. F^{-1}, so that it suffices to base a statistic on a small set of order statistics, like the upper extremes

    X_{n−k+1:n} ≤ ... ≤ X_{n:n}

or certain central order statistics

    X_{[nq]:n} ≤ ... ≤ X_{[np]:n},

where 0 < q < p < 1.
Thus, one is interested in the distribution of functions of order statistics of the form T(X_{r:n}, X_{r+1:n}, ..., X_{s:n}) where 1 ≤ r ≤ s ≤ n. This problem can be studied for a particular statistic T or within a certain class of statistics T, like linear combinations of order statistics.
If the type of the statistic T is not fixed in advance, one can simplify the stochastic analysis by establishing an approximation of the joint distribution of order statistics. Upper extremes X_{n:n}, ..., X_{n−k+1:n} may be replaced by r.v.'s Y_1, ..., Y_k that are jointly distributed according to a multivariate extreme value distribution so that the error term

    sup_B |P{(X_{n:n}, ..., X_{n−k+1:n}) ∈ B} − P{(Y_1, ..., Y_k) ∈ B}| =: δ(F)    (9)

is sufficiently small. (9) implies that for any statistic T

    sup_B |P{T(X_{n:n}, ..., X_{n−k+1:n}) ∈ B} − P{T(Y_1, ..., Y_k) ∈ B}| ≤ δ(F),    (10)

and hence statistical problems concerning upper extremes can approximately be solved within the parametric extreme value model. These arguments also hold for lower extremes.
A similar, yet slightly more complicated, operation is needed in the case of central order statistics. Now the joint distribution of order statistics is replaced by a multivariate normal distribution. To return from the normal model to the original model one needs a fixed Markov kernel, which will be constructed by means of a conditional distribution of order statistics.
0.4. Central and Extreme Order Statistics
There are good reasons for a separate treatment of extreme order statistics and central order statistics; one can argue, e.g., that the asymptotic distributions of extreme order statistics are different from those of central order statistics.
However, as already mentioned above, intermediate order statistics can be regarded as central order statistics as well as extremes, so that a clear distinction between the two classes of order statistics is not possible. Statistical extreme value theory is concerned with the evaluation of parameters of the tail of a distribution, like the upper and lower endpoints. In many situations the asymptotically efficient estimator will depend on intermediate order statistics and will itself be asymptotically normal. Thus, from a certain conservative point of view, statistical extreme value theory does not belong to extreme value theory.
On the other hand, some knowledge of stochastic properties of extreme order statistics is needed to examine certain aspects of the behavior of central order statistics. To highlight this point we note that spacings X_{r:n} − X_{r−1:n} of exponential r.v.'s have the same distribution as sample maxima. Another example is provided by the conditional distribution of the order statistic X_{r:n} given X_{r+1:n} = x, which is given by distributions of sample maxima.
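The spacings remark rests on the classical normalization of exponential spacings, which can be checked by simulation (our sketch, not from the book): for i.i.d. standard exponential r.v.'s, (n − r + 1)(X_{r:n} − X_{r−1:n}) is again standard exponential (with X_{0:n} = 0).

```python
import math
import random

def normalized_spacings(n, r, trials=3000, seed=2):
    # Simulate the normalized spacing (n - r + 1)(X_{r:n} - X_{r-1:n})
    # of standard exponential samples.
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        xs = sorted(rng.expovariate(1.0) for _ in range(n))
        lower = xs[r - 2] if r >= 2 else 0.0
        out.append((n - r + 1) * (xs[r - 1] - lower))
    return out

def ks_distance_to_exponential(values):
    # Kolmogorov-Smirnov distance to the standard exponential d.f.
    F = lambda x: 1.0 - math.exp(-x)
    values = sorted(values)
    m = len(values)
    return max(max(abs((i + 1) / m - F(v)), abs(i / m - F(v)))
               for i, v in enumerate(values))
```
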
0.5. The Restriction to Independent and
Identically Distributed Random Variables
The classical theory of extreme values deals with the weak convergence of
distributions of maxima of independent and identically distributed r.v.'s. The
extension of these classical results to dependent sequences was one of the
celebrated achievements of the last decades. This extension was necessary to
justify the applicability of classical results to many natural phenomena.
A similar development can be observed in the literature concerning the distributional properties of central order statistics; however, these results are
more sporadic than systematic. In this book we shall indicate some extensions
of the classical results to dependent sequences, but our attention will primarily
be focused upon strengthening classical results by obtaining convergence in
a stronger sense and deriving higher order approximations. Our results may
also be of interest for problems which concern dependent r.v.'s like
(a) testing problems where, under the null hypothesis, the r.v.'s are assumed to
be independent, and
(b) cases where results for dependent random variables are formulated via a
comparison with the corresponding results for independent r.v.'s.
0.6. Graphical Methods
Despite the preference for mathematical results, the author strongly believes in the usefulness of graphical methods. I have developed a very enthusiastic attitude toward graphical methods, but only when the methods are controlled by a mathematical background.
The traditional method of visually discriminating between distributions is
the use of probability papers. This method is highly successful since the eye
can easily recognize whether a curve deviates from a straight line. Perhaps the
disadvantages are
(a) that one can no longer see the original form of the "theoretical" d.f.,
(b) that small oscillations of the density (and thus also of probabilities) are difficult to detect via d.f.'s.
Alternatively, one may use densities, which play a key role in our methodology. As far as visual aspects are concerned, the maximum deviation of densities is more relevant than the L1 distance (which is equivalent to the variational distance of distributions).
The problem that discrete d.f.'s (like sample d.f.'s) have no densities can be overcome by using smoothing techniques like histograms or kernel density estimates. Thus the data points can be visualized by densities. The q.f. is another useful diagnostic tool to study the tails of the distribution.
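A minimal kernel smoothing sketch (our illustration, independent of the software used for the book's figures): a Gaussian kernel turns the data points into a density that can be plotted and inspected.

```python
import math

def kernel_density(data, bandwidth):
    # Gaussian kernel density estimate: average of normal densities
    # centered at the data points.
    c = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return lambda t: c * sum(math.exp(-0.5 * ((t - x) / bandwidth) ** 2)
                             for x in data)
```

The resulting function integrates to one and can be evaluated on any grid for plotting.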
The graphical illustrations in the book were produced by means of the
interactive statistical software package ADO.
0.7. A Guide to the Contents
This volume is organized in three parts, each of which is divided into chapters
where univariate and multivariate order statistics are studied. The treatment
of univariate order statistics is separated completely from the multivariate
case.
The chapters start, as a warm-up, with an elementary treatment of the topic or with an outline of the basic ideas and concepts. In order not to overload the sections with too many details, some of the results are shifted to the Problems and Supplements. The Supplements also include important theorems which are not central to this book. Historical remarks and discussions of further results in the literature are collected in the Bibliographical Notes.
Given the choice between different proofs, we prefer the one which can also be made applicable within the asymptotic setup. For example, our way of establishing the joint density of several order statistics is also applicable to deriving the joint asymptotic normality of several central order statistics.
Part I lays out the basic notions and tools. In Chapter 1 we explain in detail
the transformation technique, compute the densities of order statistics and
study the structure of order statistics as far as representations and conditional
distributions are concerned.
Chapter 2 is devoted to the multivariate case. We discuss the problem of
defining order statistics in higher dimensions and study some basic properties
in the special case of order statistics that are defined componentwise.
Chapter 3 contains some simple inequalities for distributions of order
statistics. Moreover, concepts and auxiliary tools are developed which are
needed in Part II for the construction of approximate distributions of order
statistics.
Part II provides the basic approximations of distributions of order statistics.
Chapters 4 and 5 are concerned with the asymptotic normality of central order
statistics and the asymptotic distributions of extreme order statistics. Both
chapters start with an introduction to asymptotic theory; in a second step the
accuracy of approximation is investigated. Some asymptotic properties of
functionals of order statistics, the Bahadur statistic and the bootstrap method
are treated in Chapter 6. Certain aspects of asymptotic theory of order
statistics in the multivariate case are studied in Chapter 7.
Our own interests heavily influence the selection of statistical problems in
Part III, and we believe the topics are of sufficient importance to be generally
interesting.
In Chapter 8 we study the problem of estimating the q.f. and related
problems within the nonparametric framework. Comparisons of semiparametric models of actual distributions with extreme value and normal
models are made in Chapters 9 and 10. The applicability of these comparisons
is illustrated by several examples.
0.8. Notation and Conventions
Given some random variables (in short: r.v.'s) ~l' ... '
ity space (0, d, P) we write:
F 1
IX(F)
w(F)
IB
x4,y
w.p.l
~n
defined on a probabil
ith order statistic of ~ 1, ... , ~n'
ith order statistic of n independent and identically distributed
(i.i.d.) r.v.'s with uniform distribution on (0, 1),
quantile function (qJ.) corresponding to the distribution function
(dJ.) F,
= inf {x: F(x) > O}
"left endpoint of dJ. F,"
= sup{ x: F(x) < 1}
"right endpoint of dJ. F,"
indicator function of a set B; thus IB(x) = 1 if x E Band IB(x) = 0
if x ~ B,
equality of r.v.'s in distribution,
with probability one.
We shall say, in short, density instead of Lebesgue density. In other cases, the dominating measure is stated explicitly. The family of all Borel sets is the smallest σ-field generated by the intervals. When writing sup_B without any comment, it is understood that the sup ranges over all Borel sets of the respective Euclidean space. Given a d.f. F we will also use this symbol for the corresponding probability measure. Frequently, we shall use the notation TP for the distribution of T.
PART I
EXACT DISTRIBUTIONS
AND BASIC TOOLS
CHAPTER 1
Distribution Functions, Densities,
and Representations
After an introduction to the basic notation and to elementary, important techniques concerning the distribution of order statistics, we derive, in Section 1.3, the d.f. and density of a single order statistic. From this result, and from the well-known fact that the spacings of exponential r.v.'s are independent (the proof is given in Section 1.6), we deduce the joint density of several order statistics in Section 1.4.
In Sections 1.3 and 1.4 we shall always assume that the underlying dJ. is
absolutely continuous. Section 1.5 will provide extensions to continuous and
discontinuous d.f.'s.
In Section 1.6, the independence of spacings of exponential r.v.'s and the independence of ratios of order statistics of uniform r.v.'s are treated in detail.
Furthermore, we study the wellknown representation of order statistics of
uniform r.v.'s by means of exponential r.v.'s. This section includes extensions
from the case of uniform r.v.'s to that of generalized Pareto r.v.'s.
In Section 1.7 various results are collected concerning functional parameters of order statistics, like moments, modes, and medians.
Finally, Section 1.8 provides a detailed study of the conditional distribution of one collection of order statistics given another collection of order statistics. This result, which is related to the Markov property of order statistics, will be one of the basic tools in this book.
1.1. Introduction to Basic Concepts
Order Statistics, Sample Maximum, Sample Minimum
Let ξ_1, ..., ξ_n be n r.v.'s. If one is not interested in the order of the outcomes of ξ_1, ..., ξ_n but in the order of their magnitudes, then one has to examine the ordered sample values

    X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n},    (1.1.1)

which are the order statistics of a sample of size n.
We say that X_{r:n} is the rth order statistic and the random vector (X_{1:n}, ..., X_{n:n}) is the order statistic. Note that X_{1:n} is the sample minimum and X_{n:n} is the sample maximum. We may write

    X_{1:n} = min(ξ_1, ..., ξ_n)    (1.1.2)

and

    X_{n:n} = max(ξ_1, ..., ξ_n).    (1.1.3)
When treating a sequence X_{r(n):n} of order statistics, one may distinguish between the following cases: A central sequence of order statistics is given if r(n) → ∞ and n − r(n) → ∞ as n → ∞. A sequence of lower (upper) extremes is given if r(n) (respectively, n − r(n)) is bounded. If r(n) → ∞ and r(n)/n → 0, or n − r(n) → ∞ and (n − r(n))/n → 0 as n → ∞, then one can also speak of an intermediate sequence.
One should know that the asymptotic properties of central and extreme sequences are completely different; however, it is one of the aims of this book to show that it can be useful to combine the different results to solve certain problems.
From (1.1.2) and (1.1.3) we see that the minimum X_{1:n} and the maximum X_{n:n} may be written as a composition of the random vector (ξ_1, ..., ξ_n) and the functions min and max. Sometimes it will be convenient to extend this notion to the rth order statistic. For this purpose define
Z_{r:n}(x_1, ..., x_n) = z_r,   (1.1.4)
where z_1 ≤ ... ≤ z_n are the values of the reals x_1, ..., x_n arranged in nondecreasing order. Using this notation one may write
X_{r:n} = Z_{r:n}(ξ_1, ..., ξ_n).   (1.1.5)
As special cases we obtain Z_{1:n} = min and Z_{n:n} = max. Such a representation of order statistics is convenient when order statistics of different samples have to be dealt with simultaneously. Then, given another sequence ξ'_1, ..., ξ'_n of r.v.'s, we can write X'_{r:n} = Z_{r:n}(ξ'_1, ..., ξ'_n).
Sample Quantile Function, Sample Distribution Function
There is a simple device by which we may derive results for order statistics from corresponding results concerning the frequency of the r.v.'s ξ_i. Let 1_{(-∞,t]} denote the indicator function of the interval (-∞, t]; then the frequency of the data x_i in (-∞, t] may be written Σ_{i=1}^n 1_{(-∞,t]}(x_i). A moment's reflection shows that
z_r ≤ t   iff   Σ_{i=1}^n 1_{(-∞,t]}(x_i) ≥ r,   (1.1.6)
with z_1 ≤ ... ≤ z_n denoting again the ordered values of x_1, ..., x_n. From (1.1.6) it is immediate that
X_{r:n} ≤ t   iff   Σ_{i=1}^n 1_{(-∞,t]}(ξ_i) ≥ r,   (1.1.7)
and hence,
{X_{r:n} ≤ t} = {nF_n(t) ≥ r},   (1.1.8)
with
F_n(t) = n^{-1} Σ_{i=1}^n 1_{(-∞,t]}(ξ_i)   (1.1.9)
defining the sample d.f. F_n.
Given a sequence of independent and identically distributed (in short, i.i.d.) r.v.'s, the d.f. of an order statistic can easily be derived from (1.1.8) by using binomial probabilities. Keep in mind that (1.1.8) holds for every sequence ξ_1, ..., ξ_n of r.v.'s.
Next, we turn to the basic relation between order statistics and the sample quantile function (in short, sample q.f.) F_n^{-1}. For this purpose we introduce the notion of the quantile function (in short, q.f.) of a d.f. F. Define
F^{-1}(q) = inf{t: F(t) ≥ q},   q ∈ (0, 1).   (1.1.10)
Notice that the q.f. F^{-1} is a real-valued function. One could also define F^{-1}(0) := α(F) = inf{x: F(x) > 0} and F^{-1}(1) := ω(F) = sup{x: F(x) < 1}; then, however, F^{-1} is no longer real-valued in general.
In Section 1.2 we shall indicate the possibility of defining a q.f. without referring to a d.f.
F^{-1}(q) is the smallest q-quantile of F; that is, if ξ is a r.v. with d.f. F then F^{-1}(q) is the smallest value t such that
P{ξ < t} ≤ q ≤ P{ξ ≤ t}.   (1.1.11)
The q-quantile of F is unique if F is strictly increasing. Moreover, F^{-1} is the inverse of F in the usual sense if F is continuous and strictly increasing.
As an illustration we state three simple examples.
EXAMPLES 1.1.1. (i) Let Φ denote the standard normal d.f. Then Φ^{-1} is the usual inverse of Φ.
(ii) The standard exponential d.f. is given by F(x) = 1 − e^{−x}, x ≥ 0. We have F^{-1}(q) = −log(1 − q), q ∈ (0, 1).
(iii) Let z_1 < z_2 < ... < z_n and F(t) = n^{-1} Σ_{i=1}^n 1_{(-∞,t]}(z_i). Then,
F^{-1}(q) = z_i   if (i − 1)/n < q ≤ i/n, i = 1, ..., n.
From Example 1.1.1(iii), with n = 1, we know that if F is a degenerate d.f. with jump at a point z then F^{-1} is the constant function with value z. Notice that the converse also holds. In this case we have F(F^{-1}(q)) = 1 for every q ∈ (0, 1). Thus F^{-1} is not the inverse of F in the usual sense.
If ξ_1, ..., ξ_n are r.v.'s with continuous d.f.'s then one can ignore the possibility of ties, which occur with probability zero. Then, according to Example 1.1.1(iii) we obtain for every q ∈ (0, 1):
F_n^{-1}(q) = X_{i:n}   if (i − 1)/n < q ≤ i/n, i = 1, ..., n.   (1.1.12)
Alternatively, we may write
F_n^{-1}(q) = X_{nq:n} if nq is an integer, and F_n^{-1}(q) = X_{[nq]+1:n} otherwise,   (1.1.13)
where [nq] denotes the integer part of nq. Thus, we have
F_n^{-1}(q) = X_{⟨nq⟩:n}   (1.1.13')
with ⟨nq⟩ = min{m: m ≥ nq}.
The r.v. F_n^{-1}(q) is the smallest sample q-quantile. If q = 1/2 then one also speaks of the sample median.
The considerations above show that order statistics are more closely related to q.f.'s than to d.f.'s. Finally, we remark that according to (1.1.12), F_n^{-1}(i/n) = X_{i:n}, which implies F_n(F_n^{-1}(i/n)) = i/n for i = 1, ..., n − 1.
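The rule (1.1.13') is easy to check numerically. The following Python sketch (the function and variable names are ours, purely illustrative) computes the smallest sample q-quantile directly from the sorted sample:

```python
import math

def sample_quantile(xs, q):
    """Smallest sample q-quantile F_n^{-1}(q) = X_{<nq>:n},
    where <nq> = min{m : m >= n*q}; cf. (1.1.13')."""
    n = len(xs)
    m = math.ceil(n * q)        # <nq> for q in (0, 1]
    return sorted(xs)[m - 1]    # the order statistic X_{m:n}

data = [3.1, 0.2, 5.7, 1.4, 2.9]          # a sample of size n = 5
median = sample_quantile(data, 0.5)       # X_{3:5}, the sample median
```

For q = 1/2 and n = 5 we get ⟨5/2⟩ = 3, so the sample median is the third order statistic, in agreement with (1.1.12).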
1.2. The Quantile Transformation
In the finite and asymptotic treatment of order statistics we shall make use of certain special properties of order statistics of uniform and exponential r.v.'s. In a first step, one has to establish the required results for these particular cases. The extension to other r.v.'s will be accomplished by a transformation technique.
Introduction and Main Results
To be more precise, let us introduce i.i.d. random variables η_1, ..., η_n and ξ_1, ..., ξ_n, where the η_i are (0, 1)-uniformly distributed and the ξ_i have the common d.f. F. Then, the following two relations hold:
(η_1, ..., η_n) =_d (F(ξ_1), ..., F(ξ_n))   (1.2.1)
if F is continuous, and
(ξ_1, ..., ξ_n) =_d (F^{-1}(η_1), ..., F^{-1}(η_n)),   (1.2.2)
where F^{-1} is the q.f. of F and =_d denotes equality in distribution.
Let U_{1:n} ≤ ... ≤ U_{n:n} and, respectively, X_{1:n} ≤ ... ≤ X_{n:n} be the order statistics of η_1, ..., η_n and ξ_1, ..., ξ_n. Since the increasing order of the observations is not destroyed by a monotone (nondecreasing) transformation, one obtains
(U_{1:n}, ..., U_{n:n}) =_d (F(X_{1:n}), ..., F(X_{n:n}))   (1.2.3)
and
(X_{1:n}, ..., X_{n:n}) =_d (F^{-1}(U_{1:n}), ..., F^{-1}(U_{n:n})).   (1.2.4)
For the details we refer to Lemma 1.2.4 and Theorem 1.2.5.
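The deterministic fact underlying (1.2.3) and (1.2.4), namely that a nondecreasing transformation commutes with ordering, can be illustrated in a few lines of Python (the names are ours; the transformation used is the exponential q.f. of Example 1.1.1(ii)):

```python
import math

def F_inv(q):
    # q.f. of the standard exponential d.f.: nondecreasing on (0, 1)
    return -math.log(1.0 - q)

xs = [0.9, 0.1, 0.5, 0.3]

# transforming then ordering ...
lhs = sorted(F_inv(x) for x in xs)
# ... equals ordering then transforming
rhs = [F_inv(x) for x in sorted(xs)]
```

Both lists coincide, which is exactly the pointwise identity behind (1.2.5) below.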
Some Preliminaries
Let us begin by noting the simple fact that, given ordered values z_1 ≤ ... ≤ z_n, we get φ(z_1) ≤ ... ≤ φ(z_n) if φ is nondecreasing [respectively, φ(z_1) ≥ ... ≥ φ(z_n) if φ is nonincreasing].
Lemma 1.2.1. Let X_{r:n} be the rth order statistic of r.v.'s ξ_1, ..., ξ_n with range R, φ a real-valued function with domain R, and X'_{r:n} the rth order statistic of the r.v.'s φ(ξ_1), ..., φ(ξ_n). Then,
(i) X'_{r:n} = φ(X_{r:n}) if φ is nondecreasing,
(ii) X'_{r:n} = φ(X_{n−r+1:n}) if φ is nonincreasing.
Alternatively, using the notation in (1.1.4) one can write
Z_{r:n}(φ(x_1), ..., φ(x_n)) = φ(Z_{r:n}(x_1, ..., x_n))   (1.2.5)
if φ is nondecreasing, and
Z_{r:n}(φ(x_1), ..., φ(x_n)) = φ(Z_{n−r+1:n}(x_1, ..., x_n))   (1.2.6)
if φ is nonincreasing.
Lemma 1.2.1(i) shows that one can interchange the nondecreasing function φ and the function Z_{r:n} without changing the r.v. The main results of the present section are applications of Lemma 1.2.1(i) to φ = F and φ = F^{-1} where F is a d.f.
Another example is φ(x) = −x. According to (1.2.6),
Z_{r:n}(−ξ_1, ..., −ξ_n) = −Z_{n−r+1:n}(ξ_1, ..., ξ_n).
In particular, the identity
Z_{1:n}(ξ_1, ..., ξ_n) = −Z_{n:n}(−ξ_1, ..., −ξ_n)   (1.2.7)
indicates that results for the sample minimum can easily be deduced from those for the sample maximum.
1. Distribution Functions, Densities, and Representations
16
We mention an application of Lemma 1.2.1(ii) to φ(x) = 1 − x.
EXAMPLE 1.2.2. Let U_{1:n}, ..., U_{n:n} be the order statistics of n i.i.d. (0, 1)-uniformly distributed r.v.'s η_1, ..., η_n. Then,
(U_{1:n}, ..., U_{n:n}) =_d (1 − U_{n:n}, ..., 1 − U_{1:n}).   (1.2.8)
To prove this, make use of the well-known fact that
(1 − η_1, ..., 1 − η_n) =_d (η_1, ..., η_n).
In Lemma A.1.1 it will be proved, within a more general framework, that
q ≤ F(x)   iff   F^{-1}(q) ≤ x   (1.2.9)
for every real x and q ∈ (0, 1). Notice that (1.2.9) is equivalent to
q > F(x)   iff   F^{-1}(q) > x.   (1.2.10)
Deduce from (1.2.9) that the q.f. of the d.f. x → F((x − μ)/σ), with location and scale parameters μ and σ > 0, is given by μ + σF^{-1}.
From (1.2.9) one also obtains
F(F^{-1}(q)−) ≤ q ≤ F(F^{-1}(q)),   0 < q < 1,   (1.2.11)
[where F(x−) denotes the left-hand limit of F at x] and,
F^{-1}(F(x)) ≤ x ≤ F^{-1}(F(x)+)   if 0 < F(x) < 1,   (1.2.12)
[where F^{-1}(q+) denotes the right-hand limit of F^{-1} at q].
Criterion 1.2.3. A d.f. F is continuous if, and only if,
F(F^{-1}(q)) = q   for every q ∈ (0, 1).   (1.2.13)
PROOF. Obvious from (1.2.11) and the fact that F is continuous iff every q ∈ (0, 1) lies in the range of F. □
Notice that F(F^{-1}(q)) = q if F^{-1}(q) is a continuity point of F. Moreover, from (1.2.12) we get
F^{-1}(F(x)) = x   if F(x) is a continuity point of F^{-1}.
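The equivalence (1.2.9) holds even for discontinuous d.f.'s, which a small Python check makes tangible. The two-point d.f. below and all names are illustrative choices of ours:

```python
def F(t):
    # a discontinuous d.f.: mass 0.3 at 0 and mass 0.7 at 1
    if t < 0:
        return 0.0
    if t < 1:
        return 0.3
    return 1.0

def F_inv(q):
    # generalized inverse F^{-1}(q) = inf{t : F(t) >= q}, q in (0,1); cf. (1.1.10)
    return 0.0 if q <= 0.3 else 1.0

# (1.2.9): q <= F(x)  iff  F^{-1}(q) <= x
qs = [0.1, 0.3, 0.5, 0.99]
xs = [-1.0, 0.0, 0.5, 1.0, 2.0]
checks = all((q <= F(x)) == (F_inv(q) <= x) for q in qs for x in xs)
```

Note how the "inf" in the definition, i.e. the left continuity of F^{-1}, is exactly what makes the equivalence hold at the jump points.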
Quantile and Probability Integral Transformation
Criterion 1.2.3 will be the decisive tool to prove
Lemma 1.2.4. Let η be a (0, 1)-uniformly distributed r.v. Then for any d.f. F the following two results hold:
(i) (Quantile transformation) F^{-1}(η) has the d.f. F.
(ii) (Probability integral transformation) Let ξ be a r.v. with d.f. F. Then,
F(ξ) =_d η   iff F is continuous.
PROOF. (i) From (1.2.9) it is immediate that
P{F^{-1}(η) ≤ x} = P{η ≤ F(x)} = F(x).
(ii) From (i) we know that ξ =_d F^{-1}(η). Thus, Criterion 1.2.3 implies that F(ξ) =_d F(F^{-1}(η)) = η if F is continuous. Conversely, for every x,
P{ξ = x} ≤ P{F(ξ) = F(x)} = 0
if F(ξ) =_d η, and hence the d.f. F of ξ is continuous. □
Let us note a direct consequence of the quantile transformation and the transformation theorem for integrals. Apparently,
∫ g dF = ∫_0^1 g(F^{-1}(x)) dx,   (1.2.14)
provided one of the integrals exists.
For independent r.v.'s ξ_1, ξ_2 with common continuous d.f. F we deduce from (1.2.9), (1.2.10), (1.2.13) and Lemma 1.2.4(i) that
P{ξ_1 ≤ ξ_2} = P{F^{-1}(η_1) ≤ F^{-1}(η_2)} = P{η_1 ≤ F(F^{-1}(η_2))} = P{η_1 ≤ η_2},
and likewise
P{ξ_1 < ξ_2} = P{F(F^{-1}(η_1)) < η_2} = P{η_1 < η_2},
where η_1, η_2 are independent (0, 1)-uniformly distributed r.v.'s. Thus, the probability P{ξ_1 ≤ ξ_2} is independent of the continuous d.f. F.
We remark that the probability integral transformation in the case of not necessarily continuous d.f.'s will be given in Section 1.5.
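The quantile transformation of Lemma 1.2.4(i) is also the standard simulation recipe: feeding uniform variates through F^{-1} produces a sample from F. A minimal sketch for the standard exponential d.f. of Example 1.1.1(ii) (sample size and tolerance are our choices):

```python
import math
import random

random.seed(0)

def F_inv(q):
    # q.f. of the standard exponential d.f., F(x) = 1 - exp(-x)
    return -math.log(1.0 - q)

# Lemma 1.2.4(i): F^{-1}(eta) has d.f. F when eta is (0,1)-uniform
sample = [F_inv(random.random()) for _ in range(200_000)]
mean = sum(sample) / len(sample)   # the standard exponential has mean 1
```

With a large sample the empirical mean is close to 1, as the exponential distribution requires.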
The Quantile Transformation of Order Statistics
Combining Lemma 1.2.1 and Lemma 1.2.4 we obtain the main result of this section [as already formulated in (1.2.3) and (1.2.4)].
Theorem 1.2.5. Let X_{1:n}, ..., X_{n:n} be the order statistics of n i.i.d. random variables with common d.f. F. Then,
(i) (F^{-1}(U_{1:n}), ..., F^{-1}(U_{n:n})) =_d (X_{1:n}, ..., X_{n:n}),
and if, in addition, F is continuous, then
(ii) (F(X_{1:n}), ..., F(X_{n:n})) =_d (U_{1:n}, ..., U_{n:n}).
PROOF. (i) Using the quantile transformation we obtain
(ξ_1, ..., ξ_n) =_d (F^{-1}(η_1), ..., F^{-1}(η_n)),
where ξ_1, ..., ξ_n are i.i.d. random variables with common d.f. F and η_1, ..., η_n are i.i.d. random variables with common uniform distribution on (0, 1). Moreover, w.l.g. the r.v.'s η_i are (0, 1)-valued. Since F^{-1} is a nondecreasing function it is immediate from Lemma 1.2.1 that
(X_{1:n}, ..., X_{n:n}) =_d (Z_{1:n}(F^{-1}(η_1), ..., F^{-1}(η_n)), ..., Z_{n:n}(F^{-1}(η_1), ..., F^{-1}(η_n)))
= (F^{-1}(Z_{1:n}(η_1, ..., η_n)), ..., F^{-1}(Z_{n:n}(η_1, ..., η_n)))
=_d (F^{-1}(U_{1:n}), ..., F^{-1}(U_{n:n})).
(ii) From (i) it is obvious that
(F(X_{1:n}), ..., F(X_{n:n})) =_d (F(F^{-1}(U_{1:n})), ..., F(F^{-1}(U_{n:n}))) = (U_{1:n}, ..., U_{n:n}),
where the second identity follows from Criterion 1.2.3. □
Combining the two results of Theorem 1.2.5 we obtain
Corollary 1.2.6. Suppose that X_{1:n}, ..., X_{n:n} are the order statistics of n i.i.d. random variables with common continuous d.f. F and X'_{1:n}, ..., X'_{n:n} are the order statistics of n i.i.d. random variables with common d.f. G. Then,
(X'_{1:n}, ..., X'_{n:n}) =_d (G^{-1}(F(X_{1:n})), ..., G^{-1}(F(X_{n:n}))).   (1.2.15)
Since G^{-1} is defined on (0, 1) it may happen that the right-hand side of (1.2.15) is only defined on a set with probability one. This, however, creates no difficulties under the convention that the right-hand side is equal to some fixed constant on the set ∪_{i=1}^n {F(X_{i:n}) ∈ {0, 1}}, which has probability zero.
Corollary 1.2.7. Let U_{r:n} and X_{r:n} be as in Theorem 1.2.5(i). Then, for reals t_1, ..., t_k and integers 1 ≤ r_1 < r_2 < ... < r_k ≤ n we obtain
P{X_{r_1:n} ≤ t_1, ..., X_{r_k:n} ≤ t_k} = P{U_{r_1:n} ≤ F(t_1), ..., U_{r_k:n} ≤ F(t_k)}.
PROOF. Theorem 1.2.5 and (1.2.9) yield
P{X_{r_1:n} ≤ t_1, ..., X_{r_k:n} ≤ t_k} = P{F^{-1}(U_{r_1:n}) ≤ t_1, ..., F^{-1}(U_{r_k:n}) ≤ t_k}
= P{U_{r_1:n} ≤ F(t_1), ..., U_{r_k:n} ≤ F(t_k)}. □
An Alternative Approach to Q.F.'s
Next, we investigate the question whether it makes sense to speak of a q.f. without referring to a d.f. In order to treat this question in a satisfactory way it is useful to study the inverse of a nondecreasing function in greater generality. The proof of Theorem 1.2.8 and further technical details are postponed until Appendix 1.
Theorem 1.2.8. (i) The q.f. F^{-1} of a d.f. F is nondecreasing and left continuous.
(ii) For every real-valued, nondecreasing and left continuous function G with domain (0, 1) there exists a unique d.f. F such that G = F^{-1}.
We remark that the d.f. can be regained from its q.f. by
F(x) = sup{q ∈ (0, 1): F^{-1}(q) ≤ x}.
From Theorem 1.2.8 we know that it makes sense to say that a real-valued function G with domain (0, 1) is a q.f. if G is nondecreasing and left continuous.
Since order statistics are more closely related to q.f.'s than to d.f.'s it is tempting to formulate assumptions via conditions imposed on q.f.'s instead of d.f.'s. However, we shall not follow this advice because of the dominant role of d.f.'s in the literature.
Weak Convergence of Q.F.'s
Finally, we treat the well-known result that the weak convergence of d.f.'s is equivalent to the "weak convergence" of q.f.'s.
Lemma 1.2.9. A sequence of d.f.'s F_n converges weakly to a d.f. F_0 if, and only if,
F_n^{-1}(q) → F_0^{-1}(q),   n → ∞,
at every continuity point q of F_0^{-1}.
PROOF. First let us assume that F_n converges weakly to F_0. Let q be a continuity point of F_0^{-1}. Since the set of all discontinuity points of F_0 is finite or countable, it is obvious that for every ε > 0 we find continuity points y_1, y_2 of F_0 such that y_1 < F_0^{-1}(q) < y_2 and |y_1 − y_2| ≤ ε.
From (1.2.10) we conclude that F_0(y_1) < q ≤ F_0(y_2). Moreover, q < F_0(y_2), because q = F_0(y_2) implies F_0^{-1}(q) = F_0^{-1}(F_0(y_2)) = y_2, since y_2 is a continuity point of F_0 [compare with (1.2.12)], which is a contradiction.
Thus, F_n(y_1) < q < F_n(y_2) for all sufficiently large n because F_n(y_i) → F_0(y_i), n → ∞. Now it is immediate from (1.2.9) that y_1 ≤ F_n^{-1}(q) ≤ y_2 and hence |F_n^{-1}(q) − F_0^{-1}(q)| ≤ ε for all sufficiently large n. Since ε > 0 is arbitrary we know that |F_n^{-1}(q) − F_0^{-1}(q)| → 0, n → ∞.
To prove the converse conclusion repeat the argument above with (1.2.9) and (1.2.12) replaced by Lemma A.1.3 and (1.2.11). □
Let F_n denote again the sample d.f. According to the Glivenko-Cantelli theorem, sup_t |F_n(t) − F(t)| → 0, n → ∞, w.p. 1. Thus one obtains as an immediate consequence of Lemma 1.2.9 that, w.p. 1, the sample q.f. F_n^{-1} converges to the underlying q.f. F^{-1} at every continuity point of F^{-1}.
1.3. Single Order Statistic, Extremes
In this section we derive the explicit form of the d.f. and the density of a single order statistic.
The D.F. of a Single Order Statistic
Let us start with the most simple result.
Lemma 1.3.1. Let X_{r:n} be the rth order statistic of n i.i.d. random variables ξ_1, ..., ξ_n with common d.f. F. Then, for every t,
P{X_{r:n} ≤ t} = Σ_{i=r}^n (n choose i) F(t)^i (1 − F(t))^{n−i}.   (1.3.1)
PROOF. Obvious from (1.1.8) by noting that Σ_{i=1}^n 1_{(-∞,t]}(ξ_i) is a binomial r.v. with parameters n and F(t). □
Lemma 1.3.1 proves once more the special case k = 1 of Corollary 1.2.7. It is obvious from (1.3.1) that
P{X_{r:n} ≤ t} = P{U_{r:n} ≤ F(t)},
where U_{r:n} is again the rth order statistic of n i.i.d. random variables with common uniform d.f. on (0, 1).
As special cases of Lemma 1.3.1 we note the d.f. of the maximum X_{n:n} and the minimum X_{1:n}. We have
P{X_{n:n} ≤ t} = F(t)^n,   (1.3.2)
and
P{X_{1:n} ≤ t} = 1 − (1 − F(t))^n.   (1.3.3)
Notice that (1.3.2) can easily be proved in a direct way since for i.i.d. random variables ξ_1, ..., ξ_n we have
P{X_{n:n} ≤ t} = P{ξ_1 ≤ t, ..., ξ_n ≤ t} = F(t)^n.
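The binomial formula (1.3.1) and its special cases (1.3.2) and (1.3.3) can be cross-checked in a few lines of Python (names and the test values are ours):

```python
from math import comb

def cdf_order_stat(r, n, Ft):
    # (1.3.1): P{X_{r:n} <= t} = sum_{i=r}^{n} C(n,i) F(t)^i (1-F(t))^{n-i}
    return sum(comb(n, i) * Ft**i * (1 - Ft)**(n - i) for i in range(r, n + 1))

n, Ft = 7, 0.4                          # sample size and a value F(t)
max_cdf = cdf_order_stat(n, n, Ft)      # r = n: should equal F(t)^n, cf. (1.3.2)
min_cdf = cdf_order_stat(1, n, Ft)      # r = 1: should equal 1-(1-F(t))^n, cf. (1.3.3)
```

For r = n only the single term i = n survives, and for r = 1 the sum is the complement of the term i = 0, which is precisely how (1.3.2) and (1.3.3) arise from (1.3.1).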
It is apparent that if ξ_1, ..., ξ_n are independent and not necessarily identically distributed (in short, i.n.n.i.d.) r.v.'s then
P{X_{n:n} ≤ t} = Π_{i=1}^n F_i(t),   (1.3.4)
with F_i denoting the d.f. of ξ_i.
The Density of a Single Order Statistic
It is easily seen from Lemma 1.3.1 that the d.f. of the rth order statistic is absolutely continuous if F is absolutely continuous. To prove this, recall that the composition of monotone absolutely continuous functions is absolutely continuous (see e.g. Hewitt-Stromberg, Exercise (18.37)) or use the argument given at the beginning of Section 1.5. Hence, the density of X_{r:n} can easily be established as the derivative of its d.f. (compare e.g. with Hewitt-Stromberg, Theorem (18.3)).
Theorem 1.3.2. Let X_{r:n} be the rth order statistic of n i.i.d. random variables with common d.f. F and density f. Then, X_{r:n} has the density
f_{r:n} = n! f F^{r−1}(1 − F)^{n−r} / ((r − 1)!(n − r)!).   (1.3.5)
PROOF. From Lemma 1.3.1 we know that the d.f. of X_{r:n}, say G, can be written as the composition G = H ∘ F where the function H is defined by H(t) = Σ_{i=r}^n (n choose i) t^i (1 − t)^{n−i}. For every t where f(t) is the derivative of F at t we know that the derivative of G at t exists and G'(t) = f(t)H'(F(t)); it suffices to prove that G'(t) = f_{r:n}(t). The derivative of H is given by
H'(t) = n! t^{r−1}(1 − t)^{n−r} / ((r − 1)!(n − r)!),   (1.3.6)
and hence the assertion of the theorem holds. For proving (1.3.6) check that
H'(t) = Σ_{i=r}^n i (n choose i) t^{i−1}(1 − t)^{n−i} − Σ_{i=r}^n (n − i) (n choose i) t^i (1 − t)^{n−i−1}
= n! t^{r−1}(1 − t)^{n−r} / ((r − 1)!(n − r)!),
where the final step is obvious (the two sums telescope) from the identities
i (n choose i) = (n − i + 1) (n choose i−1) = n!/((i − 1)!(n − i)!).
An alternative, more elegant proof of Theorem 1.3.2 will be given in Section 1.5. That proof will enable us to replace the condition that F is absolutely continuous by the weaker condition that F is continuous.
We note simple special cases of (1.3.5). The densities of the sample maximum and the sample minimum are given by
f_{n:n} = n f F^{n−1}   and   f_{1:n} = n f (1 − F)^{n−1}.   (1.3.7)
Moreover, observe that U_{r:n} is a beta r.v. with parameters r and n − r + 1. This becomes obvious by noting that a beta r.v. with parameters r and s has the density
x^{r−1}(1 − x)^{s−1} / b(r, s),   0 < x < 1,   (1.3.8)
where b(r, s) = ∫_0^1 x^{r−1}(1 − x)^{s−1} dx is the beta function. Recall that b(r, s) = Γ(r)Γ(s)/Γ(r + s) where Γ is the gamma function [with Γ(r) = (r − 1)! for positive integers r].
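For integer parameters the beta density of U_{r:n} can be written with factorials alone, which makes a direct normalization check easy. The following Python sketch (function name, r, n and the grid size are our choices) evaluates (1.3.5) with F(x) = x, f(x) = 1 and verifies by a crude midpoint rule that it integrates to 1:

```python
from math import factorial

def f_uniform_order_stat(r, n, x):
    # (1.3.5) for the uniform case F(x)=x, f(x)=1 on (0,1):
    # the beta(r, n-r+1) density n!/((r-1)!(n-r)!) x^{r-1}(1-x)^{n-r}
    c = factorial(n) // (factorial(r - 1) * factorial(n - r))
    return c * x**(r - 1) * (1 - x)**(n - r)

r, n, m = 3, 8, 100_000
# midpoint-rule approximation of the integral over (0, 1)
total = sum(f_uniform_order_stat(r, n, (j + 0.5) / m) for j in range(m)) / m
```

The approximation is close to 1, confirming that the constant n!/((r − 1)!(n − r)!) = 1/b(r, n − r + 1) is the right normalization.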
The following example, concerning sample medians, gives a flavor of the asymptotic treatment of central order statistics. It indicates that central order statistics are asymptotically normal.
Let φ denote the standard normal density given by
φ(x) = (2π)^{−1/2} exp(−x²/2).
Deduce from (1.3.8) that the density h_m of the normalized sample median
2(2m)^{1/2}(U_{m+1:2m+1} − 1/2)
is given by
h_m(x) = c(m)^{−1}(1 − x²/(2m))^m   if x² < 2m,
and = 0, otherwise, where c(m) is a normalizing constant. Since
(1 − x²/(2m))^m → exp(−x²/2),   m → ∞,
and 0 ≤ (1 − x²/(2m))^m ≤ exp(−x²/2), it follows from Lebesgue's dominated convergence theorem that
c(m) → ∫ exp(−x²/2) dx = (2π)^{1/2},   m → ∞,
and hence
h_m(x) → φ(x),   m → ∞,   (1.3.9)
for every x. The Scheffé Lemma 3.3.2 yields that the distribution of the normalized sample median converges to the standard normal distribution w.r.t. the variational distance as the sample size goes to infinity.
Extreme Value D.F.'s
Next, (1.3.2) and (1.3.3) will be examined in the special case of limiting d.f.'s of sample maxima or sample minima (in other words: extreme value d.f.'s).
The nondegenerate limiting d.f.'s of sample maxima are of the type
G_{1,α}(x) = 0 if x ≤ 0, and = exp(−x^{−α}) if x > 0   ("Fréchet"),
G_{2,α}(x) = exp(−(−x)^α) if x ≤ 0, and = 1 if x > 0   ("Weibull"),
G_3(x) = exp(−e^{−x}) for every x   ("Gumbel"),   (1.3.10)
where α > 0 is a shape parameter. We say that two d.f.'s G_1 and G_2 are of the same type if G_1(h + ax) = G_2(x) for some a > 0 and real h.
Frequently, it will be convenient to write G_{3,α} in place of G_3 where α is always understood to be equal to 1. The following identities show that the d.f.'s G_{i,α} are in fact limiting d.f.'s of sample maxima. We have
G_{1,α}^n(n^{1/α}x) = G_{1,α}(x),
G_{2,α}^n(n^{−1/α}x) = G_{2,α}(x),
G_3^n(x + log n) = G_3(x).   (1.3.11)
Every limiting d.f. has to be max-stable in the sense of (1.3.11). It is one of the admirable achievements of the classical extreme value theory that one can show that the d.f.'s in (1.3.10) are the possible nondegenerate limiting d.f.'s of sample maxima [see e.g. Galambos (1987), Theorems 2.4.1 and 2.4.2, or Leadbetter et al. (1983), Theorem 1.4.2]. It is understood that the nondegenerate limiting d.f.'s have to be of the same type as G_{1,α}, G_{2,α}, G_3.
Frequently, (1.3.11) will be summarized by
G_{i,α}^n(c_n x + d_n) = G_{i,α}(x),   (1.3.12)
where
c_n = n^{1/α}, d_n = 0 if i = 1;   c_n = n^{−1/α}, d_n = 0 if i = 2;   c_n = 1, d_n = log n if i = 3.   (1.3.13)
Notice that G_{2,1}(x) = e^x, x ≤ 0, defines the "negative" standard exponential d.f. The d.f. G_{2,1} will usually be taken as a starting point of our investigations. This is partly due to the fact that G_{2,1} is the limiting d.f. of the maximum U_{n:n} of (0, 1)-uniformly distributed r.v.'s. To prove this, notice that
P{n(U_{n:n} − 1) ≤ x} = (1 + x/n)^n,   −n ≤ x ≤ 0,   (1.3.14)
and
(1 + x/n)^n → e^x = G_{2,1}(x),   n → ∞,   x ≤ 0.
It is obvious that the pertaining densities (1 + x/n)^{n−1} 1_{[−n,0]}(x) converge to the density e^x, x ≤ 0, of G_{2,1}, which again yields the convergence w.r.t. the variational distance. We remark that (1.3.14) will be extended from the special case of uniform r.v.'s to generalized Pareto r.v.'s in Section 1.6. A detailed study of the asymptotic behavior of extremes will be made in Chapter 5.
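The convergence in (1.3.14) is elementary to observe numerically. The short Python check below (the grid of sample sizes and the test point are arbitrary choices of ours) shows the error (1 + x/n)^n − e^x shrinking as n grows:

```python
import math

# (1.3.14): P{n(U_{n:n}-1) <= x} = (1 + x/n)^n -> e^x = G_{2,1}(x), x <= 0
x = -1.5
errors = [abs((1 + x / n) ** n - math.exp(x)) for n in (10, 100, 1000)]
```

The errors decrease monotonically here, consistent with the well-known O(1/n) rate of (1 + x/n)^n → e^x for fixed x.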
Lemma 1.3.1 may be applied to show that a stability relation corresponding
to (1.3.12) does not hold for the kth largest order statistic if k > 1.
For the sake of completeness we also state the nondegenerate limiting d.f.'s of sample minima (again with parameters α > 0):
F_{1,α}(x) = 1 − G_{1,α}(−x), x < 0,
F_{2,α}(x) = 1 − G_{2,α}(−x), x > 0,
F_3(x) = 1 − G_3(−x).   (1.3.15)
The pertaining stability relations may be summarized by
1 − (1 − F_{i,α}(c_n x + d_n))^n = F_{i,α}(x),   (1.3.16)
where c_n and d_n are the constants in (1.3.13).
Von Mises Parametrization
In the statistical context, one includes a location parameter μ and a scale parameter σ > 0 into the considerations. Starting with the standard Fréchet, Weibull, and Gumbel d.f.'s as given in (1.3.10) we obtain d.f.'s of the form
x → G_{i,α}((x − μ)/σ).
If the index i is unknown then these d.f.'s should be unified to a 3-parameter family by using the von Mises parametrization: For β ≠ 0 define
H_β(x) = exp[−(1 + βx)^{−1/β}],   1 + βx > 0.   (1.3.17)
Moreover,
H_0(x) = G_3(x) = exp(−e^{−x}).   (1.3.18)
Since (1 + βx)^{−1/β} → e^{−x}, β → 0, it is clear that H_β(x) → H_0(x), β → 0. The Fréchet and Weibull d.f.'s can be regained from H_β by the identities
G_{1,1/β}(x) = H_β((x − 1)/β)   if β > 0,
and
G_{2,−1/β}(x) = H_β(−(x + 1)/β)   if β < 0.   (1.3.19)
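Both the limit H_β → H_0 and the Fréchet identity in (1.3.19) (in the reconstructed form stated above, which is an assumption of ours about the exact display) reduce to one-line numerical checks:

```python
import math

def H(beta, x):
    # von Mises family (1.3.17); beta = 0 gives the Gumbel d.f. H_0 = G_3 (1.3.18)
    if beta == 0.0:
        return math.exp(-math.exp(-x))
    return math.exp(-(1 + beta * x) ** (-1 / beta))   # requires 1 + beta*x > 0

x = 0.7
# H_beta(x) -> H_0(x) as beta -> 0
gap = [abs(H(b, x) - H(0.0, x)) for b in (0.5, 0.1, 0.01)]

# Frechet case regained from H_beta: for beta > 0 and y > 0,
# G_{1,1/beta}(y) = exp(-y^{-1/beta}) should equal H_beta((y - 1)/beta)
beta, y = 0.5, 2.0
frechet = math.exp(-y ** (-1 / beta))
via_H = H(beta, (y - 1) / beta)
```

The substitution (y − 1)/β turns 1 + βx into y, which is the whole content of the identity.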
Graphical Representation of von Mises Densities
To get a visual impression of the "von Mises densities" we include their graphs for special parameters. We shall concentrate our attention on the behavior of the densities with parameter β close to zero. The explicit form of the densities h_β = H'_β is given by
h_0(x) = exp(−x) exp(−e^{−x})
if β = 0, and
h_β(x) = (1 + βx)^{−(1+1/β)} exp(−(1 + βx)^{−1/β})
if x > −1/β, β > 0, or x < −1/β, β < 0, and = 0, otherwise.
Figure 1.3.1 shows the standard Gumbel density h_0. Notice that the mode of the standard Gumbel density is equal to zero.
Figure 1.3.2 indicates the convergence of the rescaled Fréchet densities to the Gumbel density as β ↓ 0. Figure 1.3.3 concerns the convergence of the rescaled Weibull densities to the Gumbel density as β ↑ 0.
[Figure 1.3.1. Gumbel density h_0.]
[Figure 1.3.2. Gumbel density h_0 and Fréchet densities h_β (von Mises parametrization) with parameters β = 0.3, 0.6, 0.9.]
[Figure 1.3.3. Gumbel density h_0 and Weibull densities h_β (von Mises parametrization) with parameters β = −0.75, −0.5, −0.25.]
The illustrations indicate that extreme value densities, in their von Mises parametrization, form a nice, smooth family of densities. Fréchet densities (recall that this is the case of β > 0 in the von Mises parametrization) are skewed to the right. This property is shared by the Gumbel density and Weibull densities for β = −1/α larger than −1/3.6. For parameters β close to −1/3.6 (that is, α close to 3.6) the Weibull densities look symmetrical. Finally, for parameters β smaller than −1/3.6 the Weibull densities are skewed to the left. For illustrations of Fréchet and Weibull densities with large parameters |β|, we refer to Figures 5.1.1 and 5.1.2.
In Figure 1.3.4 we demonstrate that for certain location, scale and shape parameters μ, σ and α = 1/β it is difficult to distinguish visually the Weibull density from a normal density. Those readers having good eyes will recognize a difference at the tails of the densities (with the dotted line indicating the Weibull density).
[Figure 1.3.4. Standard normal density and Weibull density (dotted line) with parameters μ = 3.14, σ = 3.48, and α = 3.6.]
1.4. Joint Distribution of Several Order Statistics
In analogy to the proof of Lemma 1.3.1, which led to the explicit form of the d.f. of a single order statistic, one can find the joint d.f. of several order statistics X_{r_1:n}, ..., X_{r_k:n} by using multinomial probabilities. The resulting expression looks even more complicated than that in the case of a single order statistic.
Thus, we prefer to work with densities instead of d.f.'s. The basic results that will enable us to derive the joint density of several order statistics are (a) Theorem 1.3.2, which provides the explicit form of the density of a single order statistic in the special case of exponential r.v.'s, and (b) Theorem 1.4.1, which concerns the density of the order statistic.
Density of the Order Statistic
The density of the order statistic X_n can be established by some straightforward arguments.
Theorem 1.4.1. Suppose that ξ_1, ..., ξ_n are i.i.d. random variables having the common density f. Then, the order statistic X_n has the density f_{1,2,...,n:n} given by
f_{1,2,...,n:n}(x_1, ..., x_n) = n! Π_{i=1}^n f(x_i),   x_1 < x_2 < ... < x_n,
and = 0, otherwise.
PROOF. Let S_n be the permutation group on {1, ..., n}; thus, (r(1), ..., r(n)) is a permutation of (1, ..., n) for every r ∈ S_n. Define B_r = {ξ_{r(1)} < ξ_{r(2)} < ... < ξ_{r(n)}} for every r ∈ S_n. Note that
(X_{1:n}, ..., X_{n:n}) = (ξ_{r(1)}, ..., ξ_{r(n)}) on B_r,
and (ξ_{r(1)}, ..., ξ_{r(n)}) has the same distribution as (ξ_1, ..., ξ_n).
Moreover, since the r.v.'s ξ_i have a continuous d.f. we know that ξ_i and ξ_j have no ties for i ≠ j (that is, P{ξ_i = ξ_j} = 0) so that P(∪_{r∈S_n} B_r) = 1.
Finally, notice that the sets B_r, r ∈ S_n, are mutually disjoint. Let A_0 = {(x_1, ..., x_n): x_1 < x_2 < ... < x_n}, and let A be any Borel set. We obtain
P{X_n ∈ A} = Σ_{r∈S_n} P({X_n ∈ A} ∩ B_r) = Σ_{r∈S_n} P({(ξ_{r(1)}, ..., ξ_{r(n)}) ∈ A} ∩ B_r)
= Σ_{r∈S_n} P{(ξ_{r(1)}, ..., ξ_{r(n)}) ∈ A ∩ A_0} = n! P{(ξ_1, ..., ξ_n) ∈ A ∩ A_0}
= ∫_A f_{1,2,...,n:n}(x_1, ..., x_n) dx_1 ... dx_n,
which is the desired representation. □
Theorem 1.4.1 will be specialized to the order statistic of exponential and uniform r.v.'s.
EXAMPLES 1.4.2. (i) If ξ_1, ..., ξ_n are i.i.d. standard exponential r.v.'s then
f_{1,2,...,n:n}(x_1, ..., x_n) = n! exp(−Σ_{i=1}^n x_i),   0 < x_1 < ... < x_n,   (1.4.2)
and = 0, otherwise.
(ii) If ξ_1, ..., ξ_n are i.i.d. random variables with uniform distribution on (0, 1) then
f_{1,2,...,n:n}(x_1, ..., x_n) = n!,   0 < x_1 < ... < x_n < 1,   (1.4.3)
and = 0, otherwise.
Using Example 1.4.2(i) we shall prove that the spacings X_{r:n} − X_{r−1:n} of exponential r.v.'s are independent (see Theorem 1.6.1). As an application one obtains the following lemma, which will be the decisive tool to establish the joint density of several (in other words, sparse) order statistics X_{r_1:n}, ..., X_{r_k:n}.
Lemma 1.4.3. Let X_{i:n} be the ith order statistic of n i.i.d. standard exponential r.v.'s. Then, for 1 ≤ r_1 < ... < r_k ≤ n, the following two results hold:
(i) The spacings X_{r_1:n}, X_{r_2:n} − X_{r_1:n}, ..., X_{r_k:n} − X_{r_{k−1}:n} are independent, and
(ii) X_{r_i:n} − X_{r_{i−1}:n} =_d X_{r_i−r_{i−1}:n−r_{i−1}}
for i = 1, ..., k (where r_0 = 0 and X_{0:n} = 0).
PROOF. (i) follows from Theorem 1.6.1 since X_{1:n}, X_{2:n} − X_{1:n}, ..., X_{n:n} − X_{n−1:n} are independent.
(ii) From Theorem 1.6.1 we also know that (n − r + 1)(X_{r:n} − X_{r−1:n}) is a standard exponential r.v. Hence, using an appropriate representation of X_{s:n} − X_{r:n} by means of spacings we obtain for 0 ≤ r < s ≤ n,
X_{s:n} − X_{r:n} = Σ_{i=1}^{s−r} [(n − (r + i) + 1)(X_{r+i:n} − X_{r+i−1:n})] / (n − (r + i) + 1)
=_d Σ_{i=1}^{s−r} [((n − r) − i + 1)(X_{i:n−r} − X_{i−1:n−r})] / ((n − r) − i + 1) = X_{s−r:n−r}. □
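A consequence of the independent, rescaled exponential spacings is the classical moment formula E X_{r:n} = Σ_{i=1}^r 1/(n − i + 1), which lends itself to a Monte Carlo check. The sketch below is illustrative only (sample sizes, seed and tolerance are our choices, and the agreement is approximate by nature):

```python
import random

random.seed(1)

n, r, reps = 10, 4, 50_000
sims = []
for _ in range(reps):
    xs = sorted(random.expovariate(1.0) for _ in range(n))
    sims.append(xs[r - 1])                     # the order statistic X_{r:n}
sim_mean = sum(sims) / reps

# Lemma 1.4.3 / Theorem 1.6.1: the normalized spacings (n-i+1)(X_{i:n}-X_{i-1:n})
# are i.i.d. standard exponential, hence E X_{r:n} = sum_{i=1}^r 1/(n-i+1)
exact_mean = sum(1.0 / (n - i + 1) for i in range(1, r + 1))
```

With 50,000 replications the simulated mean of X_{4:10} matches the spacings formula to within a few thousandths.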
From Lemma 1.4.3 and Theorem 1.3.2 we shall deduce the density of X_{r_i:n} − X_{r_{i−1}:n}, and at the next step the joint density of
X_{r_1:n}, X_{r_2:n} − X_{r_1:n}, ..., X_{r_k:n} − X_{r_{k−1}:n}
in the special case of exponential r.v.'s. Therefore, the joint density of order statistics X_{r_1:n}, ..., X_{r_k:n} of exponential r.v.'s can easily be established by means of a simple application of the transformation theorem for densities.
Transformation Theorem for Densities
The following version of the well-known transformation theorem for densities will frequently be used in the sequel.
Let ξ be a random vector with density f and range B where B is an open set in the Euclidean k-space R^k. Moreover, let T = (T_1, ..., T_k) be an R^k-valued, injective map with domain B such that all partial derivatives ∂T_i/∂x_j are continuous. Denote by (∂T/∂x) the matrix (∂T_i/∂x_j)_{i,j} of all partial derivatives. Assume that det(∂T/∂x) is unequal to zero on B. Then, the density of T(ξ) is given by
(f ∘ T^{-1})|det(∂T^{-1}/∂x)| 1_{T(B)},   (1.4.4)
where T^{-1} denotes the inverse of T. It is well known that
det(∂T^{-1}/∂x) = [1/det(∂T/∂x)] ∘ T^{-1}   (1.4.5)
under the conditions imposed on T.
EXAMPLE 1.4.4. Let ξ_1, ..., ξ_k be i.i.d. standard exponential r.v.'s. Put x = (x_1, ..., x_k). The joint distribution of the partial sums ξ_1, ξ_1 + ξ_2, ..., Σ_{i=1}^k ξ_i has the density
exp(−y_k) 1_D(y),   (1.4.6)
where D = {y: 0 < y_1 < ... < y_k}. This is immediate from (1.4.4) applied to B = (0, ∞)^k and T_i(x) = Σ_{j=1}^i x_j. Notice that T(B) = D, T^{-1}(x) = (x_1, x_2 − x_1, ..., x_k − x_{k−1}) and det(∂T/∂x) = 1 since (∂T/∂x) is a triangular matrix with ∂T_i/∂x_i = 1 for i = 1, ..., k.
The reader is reminded of the fact that Σ_{i=1}^k ξ_i is a gamma r.v. with parameter k (see also Lemma 1.6.6(ii)).
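The partial-sum map T of Example 1.4.4 and its inverse, whose Jacobian determinant is 1, can be coded directly (function names are ours):

```python
from itertools import accumulate

def T(x):
    # T_i(x) = x_1 + ... + x_i, the partial-sum map of Example 1.4.4
    return list(accumulate(x))

def T_inv(y):
    # T^{-1}(y) = (y_1, y_2 - y_1, ..., y_k - y_{k-1});
    # both T and T^{-1} are triangular, so det(dT/dx) = det(dT^{-1}/dx) = 1
    return [y[0]] + [y[i] - y[i - 1] for i in range(1, len(y))]

x = [0.4, 1.1, 0.2, 2.5]
roundtrip = T_inv(T(x))    # recovers x, confirming the stated inverse
```

The unit Jacobian is why (1.4.6) is simply the original density rewritten in the new coordinates, with exp(−Σ x_i) collapsing to exp(−y_k).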
The Joint Density of Several Order Statistics
To establish the joint density of X_{r_1:n}, ..., X_{r_k:n} we shall first examine the special cases of exponential and uniform r.v.'s. Part III of the proof of Theorem 1.4.5 will concern the general case. The proof looks a little bit technical; however, it can be developed step by step without much effort or imagination. Another advantage of this method is that it is applicable to r.v.'s with continuous d.f.'s (see Theorem 1.5.2).
Theorem 1.4.5. Let 1 ≤ k ≤ n and 0 = r_0 < r_1 < ... < r_k < r_{k+1} = n + 1. Suppose that the common d.f. F of the i.i.d. random variables ξ_1, ..., ξ_n is absolutely continuous and has the density f.
Then, X_{r_1:n}, ..., X_{r_k:n} have the joint density f_{r_1,r_2,...,r_k:n} given by
f_{r_1,...,r_k:n}(x) = n! [Π_{i=1}^k f(x_i)] Π_{i=1}^{k+1} (F(x_i) − F(x_{i−1}))^{r_i−r_{i−1}−1} / (r_i − r_{i−1} − 1)!
if 0 < F(x_1) < F(x_2) < ... < F(x_k) < 1, and = 0, otherwise. [We use the convention that F(x_0) = 0 and F(x_{k+1}) = 1.]
PROOF. (I) First assume that ξ_1, ..., ξ_n are standard exponential r.v.'s. Lemma 1.4.3 and Theorem 1.3.2 imply that the joint density g of
X_{r_1:n}, X_{r_2:n} − X_{r_1:n}, ..., X_{r_k:n} − X_{r_{k−1}:n}
is given by
g(x) = Π_{i=1}^k [(n − r_{i−1})! / ((r_i − r_{i−1} − 1)!(n − r_i)!)] (1 − e^{−x_i})^{r_i−r_{i−1}−1} e^{−(n−r_i+1)x_i},   x_i ≥ 0, i = 1, ..., k,
and = 0, otherwise.
From (1.4.4) and Example 1.4.4 we get, writing in short f_{r:n} instead of f_{r_1,...,r_k:n}, that for 0 = x_0 < x_1 < ... < x_k,
f_{r:n}(x) = Π_{i=1}^k [(n − r_{i−1})! / ((r_i − r_{i−1} − 1)!(n − r_i)!)] e^{−(n−r_i+1)(x_i−x_{i−1})} [1 − e^{−(x_i−x_{i−1})}]^{r_i−r_{i−1}−1}
= Π_{i=1}^k [(n − r_{i−1})! / ((r_i − r_{i−1} − 1)!(n − r_i)!)] e^{−(n−r_i+1)x_i} e^{(n−r_{i−1})x_{i−1}} [e^{−x_{i−1}} − e^{−x_i}]^{r_i−r_{i−1}−1},
and f_{r:n}(x) = 0, otherwise; this is the asserted density in the special case F(x) = 1 − e^{−x}, f(x) = e^{−x}. The proof for the exponential case is complete.
(II) For X_{i:n} as in part I we obtain, according to Theorem 1.2.5(ii), that
(U_{r_1:n}, ..., U_{r_k:n}) =_d (G(X_{r_1:n}), ..., G(X_{r_k:n})),
where G(x) = 1 − e^{−x}, x ≥ 0. Using this representation, the assertion in the uniform case is immediate from part I and (1.4.4) applied to
B = {x: 0 < x_1 < ... < x_k}   and   T(x) = (G(x_1), ..., G(x_k)).
(III) Denote by Q the probability measure pertaining to F, and by g_{r:n} the density of (U_{r_1:n}, ..., U_{r_k:n}). It suffices to prove that for all t_1, ..., t_k the identity
P{X_{r_1:n} ≤ t_1, ..., X_{r_k:n} ≤ t_k} = ∫_{×_{i=1}^k (−∞,t_i]} g_{r:n}(F(x_1), ..., F(x_k)) dQ^k(x_1, ..., x_k)
holds, since Q^k has the density x → Π_{i=1}^k f(x_i). From Corollary 1.2.7 and part II we get
P{X_{r_1:n} ≤ t_1, ..., X_{r_k:n} ≤ t_k} = P{U_{r_1:n} ≤ F(t_1), ..., U_{r_k:n} ≤ F(t_k)}
= ∫_{×_{i=1}^k (−∞,F(t_i)]} g_{r:n}(x_1, ..., x_k) dx_1 ... dx_k
= ∫ Π_{i=1}^k 1_{(−∞,F(t_i)]}(F(x_i)) g_{r:n}(F(x_1), ..., F(x_k)) dQ^k(x_1, ..., x_k),
where the third identity follows by means of the probability integral transformation (Lemma 1.2.4(ii)). This lemma is applicable since F is continuous. The proof is complete if
1_{(−∞,F(t)]}(F(x)) = 1_{(−∞,t]}(x)   for Q-almost all x.
This, however, is obvious from the fact that (−∞, t] ⊂ {y: F(y) ≤ F(t)} and that both sets have equal probability w.r.t. Q (prove this by applying the probability integral transformation). □
Remark 1.4.6. The condition 0 < F(x_1) < ... < F(x_k) < 1 in Theorem 1.4.5 can be replaced by the condition x_1 < ... < x_k. To prove this notice that
\[
\{0 < F(\xi_1) < \cdots < F(\xi_k) < 1\} \subset \{\xi_1 < \cdots < \xi_k\}
\]
and show that both sets have the same probability.
We mention some special cases. For k = 1 and k = n we obtain again Theorem 1.3.2 and Theorem 1.4.1. Moreover, we note the joint density of the k smallest and k largest order statistics. We have
\[
f_{1,2,\ldots,k:n}(x) = n!\,\Bigl[\prod_{i=1}^{k} f(x_i)\Bigr]\frac{(1-F(x_k))^{n-k}}{(n-k)!} \quad\text{if } x_1 < \cdots < x_k, \tag{1.4.7}
\]
and = 0, otherwise. Moreover,
\[
f_{n-k+1,\ldots,n:n}(x) = n!\,\Bigl[\prod_{i=1}^{k} f(x_i)\Bigr]\frac{F(x_1)^{n-k}}{(n-k)!} \quad\text{if } x_1 < \cdots < x_k, \tag{1.4.8}
\]
and = 0, otherwise. The joint density of (X_{1:n}, X_{n:n}) is given by
\[
f_{1,n:n}(x_1, x_2) = n(n-1)\,f(x_1)\,f(x_2)\,\bigl(F(x_2)-F(x_1)\bigr)^{n-2} \quad\text{if } x_1 < x_2,
\]
and = 0, otherwise.
A slight modification of the proof of Theorem 1.4.5 will enable us to
establish the corresponding result for continuous d.f.'s.
1.5. Extensions to Continuous and
Discontinuous Distribution Functions
The results of this section are not required for the understanding of the main
ideas of this book and can be omitted at the first reading.
Let ξ_1, ..., ξ_n again be i.i.d. random variables with common distribution Q and d.f. F. It is easy to check that the joint distribution of k order statistics possesses a Q^k-density. To simplify the arguments let us treat the case of a single order statistic X_{r:n}. Since {X_{r:n} ∈ B} ⊂ ∪_{i=1}^n {ξ_i ∈ B} we have P{X_{r:n} ∈ B} ≤ n P{ξ_1 ∈ B}; thus, P{ξ_1 ∈ B} = 0 implies P{X_{r:n} ∈ B} = 0 for every Borel set B. Therefore, the distribution of X_{r:n} is absolutely continuous w.r.t. Q, and hence the Radon–Nikodym theorem implies that X_{r:n} has a Q-density.
The knowledge of the existence of the density stimulates the interest in its explicit form. One can argue that Theorem 1.5.1 is highly sophisticated; however, in many cases one would otherwise only be able to prove less elegant results (see, e.g., P.1.31).
Density of a Single Order Statistic under a Continuous D.F.
First we give an alternative proof to Theorem 1.3.2. This proof enables us to
weaken the condition that F is absolutely continuous to the condition that F
is continuous.
Theorem 1.5.1. Let X_{r:n} be the rth order statistic of n i.i.d. random variables with common continuous d.f. F. Then, X_{r:n} has the F-density
\[
\frac{n!}{(r-1)!\,(n-r)!}\, F^{r-1}(1-F)^{n-r}. \tag{1.5.1}
\]
PROOF. It suffices to prove that
\[
P\{X_{r:n} \le x\} = \int_{-\infty}^{x} H'(F)\, dF
\]
with H' as in (1.3.6). According to (1.2.4), Criterion 1.2.3 and (1.2.9), the right-hand side above is equal to ∫_0^{F(x)} H'(t) dt. Moreover,
\[
\int_0^{F(x)} H'(t)\, dt = H(F(x)) = P\{X_{r:n} \le x\}. \qquad\square
\]
Notice that Theorem 1.3.2 is immediate from Theorem 1.5.1 under the
condition that F is absolutely continuous.
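Since (1.5.1) is a density with respect to F, the substitution t = F(x) reduces its integral to the beta integral ∫₀¹ n!/((r−1)!(n−r)!) t^{r−1}(1−t)^{n−r} dt = 1. The following small Python check (an added numerical illustration, not part of the original argument) verifies this normalization with a midpoint rule:

```python
from math import factorial

def beta_density(t, r, n):
    # integrand obtained from (1.5.1) by substituting t = F(x)
    c = factorial(n) // (factorial(r - 1) * factorial(n - r))
    return c * t ** (r - 1) * (1 - t) ** (n - r)

def midpoint(f, a, b, steps=100000):
    # simple midpoint rule; ample accuracy for these smooth integrands
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

for n in (5, 10):
    for r in (1, (n + 1) // 2, n):
        total = midpoint(lambda t: beta_density(t, r, n), 0.0, 1.0)
        assert abs(total - 1.0) < 1e-6, (n, r, total)
```

Each choice of r and n integrates to 1 up to the quadrature error, as it must for a probability density.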
Joint Density of Several Order Statistics
under a Continuous D.F.
Another look at the proof of Theorem 1.4.5 reveals that the essential condition adopted in the proof was the continuity of the d.f. F. In a second step we also made use of the density x ↦ ∏_{i=1}^k f(x_i). When omitting the second step in the proof one gets the following theorem for continuous d.f.'s, which is an extension of Theorem 1.4.5.
Theorem 1.5.2. Let 1 ≤ k ≤ n and 0 = r_0 < r_1 < ... < r_k < r_{k+1} = n + 1. Let ξ_1, ..., ξ_n be i.i.d. random variables with common distribution Q and d.f. F. If F is continuous then the order statistics X_{r_1:n}, ..., X_{r_k:n} have the joint Q^k-density g_{r_1,...,r_k:n} given by
\[
g_{r_1,\ldots,r_k:n}(x) = n!\,\prod_{i=1}^{k+1}\frac{\bigl(F(x_i)-F(x_{i-1})\bigr)^{r_i-r_{i-1}-1}}{(r_i-r_{i-1}-1)!} \tag{1.5.2}
\]
if x_1 < x_2 < ... < x_k, and = 0, otherwise (where again F(x_0) = 0 and F(x_{k+1}) = 1).

Note that Theorem 1.4.5 is immediate from Theorem 1.5.2 since Q^k has the Lebesgue density x ↦ ∏_{i=1}^k f(x_i) if Q has the Lebesgue density f.
Remark 1.5.3. Part (III) of the proof of Theorem 1.4.5 shows that the following result holds true: Let Q_0 be the uniform distribution on (0, 1) and let Q_1 be a probability measure with continuous d.f. F. If (ξ_1, ..., ξ_k) is a random vector with Q_0^k-density g, then the random vector (F^{-1}(ξ_1), ..., F^{-1}(ξ_k)) has the Q_1^k-density
\[
x \mapsto g(F(x_1), \ldots, F(x_k)).
\]
Probability Integral Transformation for Discontinuous D.F.'s

Let ξ be a r.v. with distribution Q having a continuous d.f. F. The uniformly distributed r.v. F(ξ), as studied in Lemma 1.2.4(ii), corresponds to the following experiment: If x is a realization of ξ, then in a second step the realization F(x) will be observed.

Next, let F be discontinuous at x. Consider a 2-stage random experiment where we include a further r.v. which is uniformly distributed on the interval (F(x−), F(x)). Here, F(x−) denotes again the left-hand limit of F at x. For example, we may take the r.v. F(x−) + η(F(x) − F(x−)) where η is uniformly distributed on (0, 1).

If x is a realization of ξ, and y is a realization of η, then the final outcome of the experiment will be F(x−) + y(F(x) − F(x−)). This 2-stage random experiment is also governed by the uniform distribution. This idea will be made rigorous in the following lemma.
Lemma 1.5.4. Suppose that ξ is a r.v. with d.f. F, and that η is a r.v. with uniform distribution on (0, 1). Moreover, ξ and η are assumed to be independent. Define
\[
H(y, x) = F(x-) + y\,\bigl(F(x) - F(x-)\bigr). \tag{1.5.3}
\]
Then, H(η, ξ) is uniformly distributed on (0, 1).

PROOF. It suffices to prove that P{H(η, ξ) < q} = q for every q ∈ (0, 1). From (1.2.9) we know that ξ < F^{-1}(q) implies F(ξ) < q, and ξ > F^{-1}(q) implies F(ξ) ≥ q. Therefore, by setting x = F^{-1}(q), we have
\[
P\{H(\eta, \xi) < q\} = P\{\xi < x\} + P\{H(\eta, \xi) < q,\ \xi = x\}
= F(x-) + P\{F(x-) + \eta(F(x) - F(x-)) < q\}\, P\{\xi = x\} = q. \qquad\square
\]
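Lemma 1.5.4 is easy to confirm empirically. The sketch below (an added illustration using a hypothetical two-point distribution with P{ξ = 0} = 0.7 and P{ξ = 1} = 0.3) applies H of (1.5.3) and checks that the resulting sample is uniform on (0, 1):

```python
import random

def H(y, x, F, F_left):
    # H(y, x) = F(x-) + y (F(x) - F(x-)), cf. (1.5.3)
    return F_left(x) + y * (F(x) - F_left(x))

# two-point distribution: P{xi = 0} = 0.7, P{xi = 1} = 0.3
F = lambda x: 0.7 if x == 0 else 1.0        # d.f. evaluated at the atoms
F_left = lambda x: 0.0 if x == 0 else 0.7   # left-hand limits F(x-)

rng = random.Random(1)
sample = [H(rng.random(), 0 if rng.random() < 0.7 else 1, F, F_left)
          for _ in range(100000)]

# H(eta, xi) should be uniformly distributed on (0, 1)
for q in (0.2, 0.35, 0.5, 0.9):
    freq = sum(s < q for s in sample) / len(sample)
    assert abs(freq - q) < 0.01, (q, freq)
```

Conditional on ξ = 0 the outcome is uniform on (0, 0.7), conditional on ξ = 1 uniform on (0.7, 1); the mixture with weights 0.7 and 0.3 is exactly the uniform distribution on (0, 1).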
Lemma 1.5.4 will be reformulated by using a Markov kernel K. Note that inducing with the d.f. F is equivalent to inducing with the Markov kernel (B, x) ↦ 1_B(F(x)).
Corollary 1.5.5. Let Q be a probability measure with d.f. F. Define K(B|x) = 1_B(F(x)) for every Borel set B if x is a continuity point of the d.f. F, and let K(·|x) be the uniform distribution on (F(x−), F(x)) if F is discontinuous at x. Then,
\[
KQ = \int K(\cdot\,|x)\, dF(x)
\]
is the uniform distribution on (0, 1).
PROOF. Let ξ and η be as in Lemma 1.5.4. Thus, K(·|x) is the distribution of F(x−) + η(F(x) − F(x−)). By Fubini's theorem we obtain for every t,
\[
\int K((-\infty, t]\,|x)\, dF(x) = \int P\{F(x-) + \eta(F(x) - F(x-)) \le t\}\, dF(x)
= P\{F(\xi-) + \eta(F(\xi) - F(\xi-)) \le t\} = t
\]
where the final identity is obvious from Lemma 1.5.4. □
Joint Density of Order Statistics under a Discontinuous D.F.

Hereafter, let ξ_1, ..., ξ_n be i.i.d. random variables with common distribution Q and d.f. F. For example, F is allowed to be a discrete d.f. Let again
\[
H(y, x) = F(x-) + y\,\bigl(F(x) - F(x-)\bigr).
\]

Theorem 1.5.6. For 1 ≤ k ≤ n and 0 = r_0 < r_1 < ... < r_k < r_{k+1} = n + 1 the Q^k-density of (X_{r_1:n}, ..., X_{r_k:n}), say, f_{r_1,...,r_k:n} is given by
\[
f_{r_1,\ldots,r_k:n}(x_1, \ldots, x_k) = \int_{(0,1)^k} g_{r_1,\ldots,r_k:n}\bigl(H(y_1, x_1), \ldots, H(y_k, x_k)\bigr)\, dy_1 \cdots dy_k
\]
where g_{r_1,...,r_k:n} is the joint density of U_{r_1:n}, ..., U_{r_k:n}.
PROOF. The proof runs along the lines of part (III) in the proof of Theorem 1.4.5. Instead of Lemma 1.2.4(ii) apply its extension, Lemma 1.5.4, to discontinuous d.f.'s. We have
\[
P\{X_{r_1:n} \le t_1, \ldots, X_{r_k:n} \le t_k\}
= E\Bigl[1_{\times_{i=1}^{k}(-\infty,\,F(t_i)]}\bigl(H(\eta_1, \xi_1), \ldots, H(\eta_k, \xi_k)\bigr)\, g_{r_1,\ldots,r_k:n}\bigl(H(\eta_1, \xi_1), \ldots, H(\eta_k, \xi_k)\bigr)\Bigr]
\]
\[
= E\Bigl[1_{\times_{i=1}^{k}(-\infty,\,t_i]}(\xi_1, \ldots, \xi_k)\, g_{r_1,\ldots,r_k:n}\bigl(H(\eta_1, \xi_1), \ldots, H(\eta_k, \xi_k)\bigr)\Bigr]
\]
where η_1, ξ_1, ..., η_k, ξ_k are independent r.v.'s such that ξ_1, ..., ξ_k possess the common d.f. F, and η_1, ..., η_k are uniformly distributed on (0, 1). The second identity is established in the same way as the corresponding step in the proof of Theorem 1.4.5, by applying Lemma 1.5.4 instead of Lemma 1.2.4(ii). Now the assertion is immediate by applying Fubini's theorem. □
Notice that H(y_1, x_1) < H(y_2, x_2) if and only if either x_1 < x_2, or x_1 = x_2 and y_1 < y_2. Hence, by using the lexicographical ordering one may write Theorem 1.5.6 in a different way:

Corollary 1.5.7. Define B_k as the set of all vectors (x_1, y_1, ..., x_k, y_k) with 0 < y_i < 1, i = 1, ..., k, and x_i < x_{i+1} or x_i = x_{i+1} and y_i < y_{i+1} for i = 1, ..., k − 1. Then, the density f_{r_1,...,r_k:n} given in Theorem 1.5.6 is of the following form:
\[
f_{r_1,\ldots,r_k:n}(x) = n! \int_{B_k} \prod_{i=1}^{k+1} \frac{\bigl(H(y_i, x_i) - H(y_{i-1}, x_{i-1})\bigr)^{r_i-r_{i-1}-1}}{(r_i-r_{i-1}-1)!}\, dy_1 \cdots dy_k
\]
(with the convention that H(y_0, x_0) = 0 and H(y_{k+1}, x_{k+1}) = 1).
I.N.N.I.D. Random Variables

This is perhaps the proper place to mention an interesting result due to Guilbaud (1982). This result connects the distribution of order statistics of i.n.n.i.d. (independent not necessarily identically distributed) random variables to that of order statistics of i.i.d. random variables.

Theorem 1.5.8. Let X_{1:n} ≤ ... ≤ X_{n:n} be the order statistics of i.n.n.i.d. random variables ξ_1, ..., ξ_n. Denote by F_i the d.f. of ξ_i. Then, for every Borel set B, the probability P{X_{r:n} ∈ B} can be represented as a linear combination of the corresponding probabilities P{X^S_{r:n} ∈ B}, where the summation runs over all subsets S of {1, ..., n} with m elements. Moreover, X^S_{1:n} ≤ ... ≤ X^S_{n:n} are the order statistics of n i.i.d. random variables with common d.f.
\[
F^S = |S|^{-1} \sum_{i \in S} F_i.
\]
We do not know whether Theorem 1.5.8 is of any practical relevance.
1.6. Spacings, Representations, Generalized
Pareto Distribution Functions
In this section we collect some results concerning spacings (and thus also order statistics) of generalized Pareto r.v.'s. We start with the particular cases of exponential and uniform r.v.'s.

Spacings of Exponential R.V.'s

The independence of spacings of exponential r.v.'s was already applied to establish the joint density of several order statistics. The following well-known result is due to Sukhatme (1937) and Rényi (1953).
Theorem 1.6.1. If X_{1:n}, ..., X_{n:n} are the order statistics of i.i.d. standard exponential r.v.'s η_1, ..., η_n then
(i) the spacings X_{1:n}, X_{2:n} − X_{1:n}, ..., X_{n:n} − X_{n−1:n} are independent, and
(ii) (n − r + 1)(X_{r:n} − X_{r−1:n}) is again a standard exponential r.v. for each r = 1, ..., n (with the convention that X_{0:n} = 0).

PROOF. Put x = (x_1, ..., x_n). It suffices to prove that the function
\[
x \mapsto \prod_{i=1}^{n} \exp(-x_i)\, 1_{(0,\infty)}(x_i)
\]
is a joint density of
\[
nX_{1:n},\ (n-1)(X_{2:n} - X_{1:n}),\ \ldots,\ (X_{n:n} - X_{n-1:n}).
\]
From Example 1.4.2(i), where the density of the order statistics of exponential r.v.'s was established, the desired result is immediate by applying the transformation theorem for densities to the map T = (T_1, ..., T_n) defined by T_i(x) = (n − i + 1)(x_i − x_{i−1}), i = 1, ..., n. Notice that det(∂T/∂x) = n! and T^{-1}(x) = (Σ_{j=1}^i x_j/(n − j + 1))_{i=1}^n. Moreover, use the fact that Σ_{i=1}^n Σ_{j=1}^i x_j/(n − j + 1) = Σ_{j=1}^n x_j. □
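Theorem 1.6.1 is easy to confirm by simulation. The Python sketch below (an added illustration) checks that each normalized spacing (n − r + 1)(X_{r:n} − X_{r−1:n}) has empirical mean close to 1, the mean of the standard exponential distribution:

```python
import random

rng = random.Random(5)
n, trials = 6, 100000
sums = [0.0] * n
for _ in range(trials):
    x = sorted(rng.expovariate(1.0) for _ in range(n))
    prev = 0.0
    for r in range(1, n + 1):
        sums[r - 1] += (n - r + 1) * (x[r - 1] - prev)  # normalized spacing
        prev = x[r - 1]

# each normalized spacing is standard exponential, hence has mean 1
for r in range(n):
    assert abs(sums[r] / trials - 1.0) < 0.02, (r, sums[r] / trials)
```

A full check of independence would require a joint test; the mean check above only illustrates assertion (ii).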
From Theorem 1.6.1 the following representation for order statistics X_{r:n} of exponential r.v.'s is immediate:
\[
(X_{r:n})_{r=1}^{n} \stackrel{d}{=} \Bigl( \sum_{i=1}^{r} \eta_i/(n-i+1) \Bigr)_{r=1}^{n}. \tag{1.6.1}
\]
Note that spacings of independent r.v.'s η_1, ..., η_n with common d.f. F(x) = 1 − exp(−a(x − b)), x ≥ b, are also independent. It is well known (see e.g. Galambos (1987), Theorem 1.6.3) that these d.f.'s are the only continuous d.f.'s for which the spacings are independent.
Ratios of Order Statistics of Uniform R.V.'s
Spacings of uniform r.v.'s cannot be independent. However, it was shown by Malmquist (1950) that certain ratios of order statistics U_{i:n} of uniform r.v.'s are independent. This will be immediate from Theorem 1.6.1. A simple generalization may be found at the end of the section.
Corollary 1.6.2.
(i) 1 − U_{1:n}, (1 − U_{2:n})/(1 − U_{1:n}), ..., (1 − U_{n:n})/(1 − U_{n−1:n}) are independent r.v.'s, and
(ii) (1 − U_{r:n})/(1 − U_{r−1:n}) \stackrel{d}{=} U_{n−r+1:n−r+1}, r = 1, ..., n (with the convention that U_{0:n} = 0).
PROOF. Let X_{r:n} be as in Theorem 1.6.1 and let F be the standard exponential d.f. Since U_{r:n} \stackrel{d}{=} F(X_{r:n}) we get
\[
\bigl[(1 - U_{r:n})/(1 - U_{r-1:n})\bigr]_{r=1}^{n} \stackrel{d}{=} \bigl[(1 - F(X_{r:n}))/(1 - F(X_{r-1:n}))\bigr]_{r=1}^{n} = \bigl[\exp(-(X_{r:n} - X_{r-1:n}))\bigr]_{r=1}^{n}
\]
which yields (i) according to Theorem 1.6.1. Moreover, by Lemma 1.4.3(ii) and Example 1.2.2 we obtain
\[
\exp(-(X_{r:n} - X_{r-1:n})) \stackrel{d}{=} 1 - F(X_{1:n-r+1}) \stackrel{d}{=} 1 - U_{1:n-r+1} \stackrel{d}{=} U_{n-r+1:n-r+1}.
\]
The proof of (ii) is complete. □
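A quick Monte Carlo illustration of Corollary 1.6.2(ii) (added as a sketch): for n = 5 and r = 3 the ratio (1 − U_{3:5})/(1 − U_{2:5}) should be distributed as U_{3:3}, the maximum of three uniforms, whose expectation is 3/4.

```python
import random

rng = random.Random(7)
n, r, trials = 5, 3, 100000
total = 0.0
for _ in range(trials):
    u = sorted(rng.random() for _ in range(n))
    total += (1 - u[r - 1]) / (1 - u[r - 2])  # (1 - U_{r:n})/(1 - U_{r-1:n})

mean = total / trials
# E U_{m:m} = m/(m+1) with m = n - r + 1 = 3
assert abs(mean - 0.75) < 0.005, mean
```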
The original result of Malmquist is a slight modification of Corollary 1.6.2.

Corollary 1.6.3.
(i) U_{1:n}/U_{2:n}, ..., U_{n−1:n}/U_{n:n}, U_{n:n} are independent r.v.'s, and
(ii) U_{r:n}/U_{r+1:n} \stackrel{d}{=} U_{r:r} for r = 1, ..., n (with the convention that U_{n+1:n} = 1).

PROOF. Immediate from Corollary 1.6.2 since by Example 1.2.2
\[
(U_{r:n}/U_{r+1:n})_{r=1}^{n} \stackrel{d}{=} \bigl[(1 - U_{n-r+1:n})/(1 - U_{n-r:n})\bigr]_{r=1}^{n}. \qquad\square
\]
Since U_{r:n}/U_{r+1:n} and U_{r+1:n} are independent one could have the idea that U_{r:n} and U_{r:n}/U_{r+1:n} are also independent, which however is wrong. This becomes obvious by noting that 0 ≤ U_{r:n} ≤ U_{r:n}/U_{r+1:n} ≤ 1.
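The independence asserted in Corollary 1.6.3(i) can be checked empirically; the sketch below (an added illustration) verifies that consecutive Malmquist ratios for n = 4 are essentially uncorrelated and that U_{1:4}/U_{2:4} has the uniform mean 1/2, in agreement with (ii):

```python
import random

rng = random.Random(3)
n, trials = 4, 100000
a, b = [], []
for _ in range(trials):
    u = sorted(rng.random() for _ in range(n))
    a.append(u[0] / u[1])  # U_{1:4}/U_{2:4}, distributed as U_{1:1}
    b.append(u[1] / u[2])  # U_{2:4}/U_{3:4}, distributed as U_{2:2}

def corr(x, y):
    N = len(x)
    mx, my = sum(x) / N, sum(y) / N
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / N
    vx = sum((p - mx) ** 2 for p in x) / N
    vy = sum((q - my) ** 2 for q in y) / N
    return cov / (vx * vy) ** 0.5

# consecutive ratios are independent, so the empirical correlation is near 0;
# a =d U_{1:1} is uniform on (0, 1) and has mean 1/2
assert abs(corr(a, b)) < 0.015
assert abs(sum(a) / trials - 0.5) < 0.01
```

Uncorrelatedness is of course weaker than independence; the check only illustrates, and does not prove, the corollary.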
Representations of Order Statistics of Uniform R.V.'s

One purpose of the following lines will be to establish a representation of the order statistics U_{1:n}, ..., U_{n:n} related to that in (1.6.1). In a preparatory step we prove the following.

Lemma 1.6.4. Let η_1, ..., η_{n+1} be independent exponential r.v.'s with η_i having the d.f. F_i(x) = 1 − exp(−α_i x) for x ≥ 0 where α_i > 0. Put ζ_i = η_i/(Σ_{j=1}^{n+1} η_j), i = 1, ..., n, and ζ_{n+1} = Σ_{j=1}^{n+1} η_j. Then, the joint density of ζ_1, ..., ζ_{n+1}, say g_{n+1}, is given by
\[
g_{n+1}(x_1, \ldots, x_{n+1}) = \Bigl( \prod_{i=1}^{n+1} \alpha_i \Bigr) x_{n+1}^{n} \exp\Bigl[ -x_{n+1} \Bigl( \alpha_{n+1} + \sum_{i=1}^{n} (\alpha_i - \alpha_{n+1}) x_i \Bigr) \Bigr]
\]
if x_i > 0 for i = 1, ..., n + 1 and Σ_{i=1}^n x_i < 1, and g_{n+1} = 0, otherwise.
PROOF. The transformation theorem for densities (see (1.4.4)) is applicable to B = (0, ∞)^{n+1} and T = (T_1, ..., T_{n+1}) where T_{n+1}(x_1, ..., x_{n+1}) = Σ_{j=1}^{n+1} x_j and T_i(x_1, ..., x_{n+1}) = x_i/Σ_{j=1}^{n+1} x_j for i = 1, ..., n. The range of T is given by
\[
T(B) = \Bigl\{ (x_1, \ldots, x_{n+1})\colon x_i > 0 \ \text{for } i = 1, \ldots, n+1 \ \text{and} \ \sum_{j=1}^{n} x_j < 1 \Bigr\}.
\]
The inverse function S = (S_1, ..., S_{n+1}) of T is given by S_i(x_1, ..., x_{n+1}) = x_i x_{n+1} for i = 1, ..., n and S_{n+1}(x_1, ..., x_{n+1}) = (1 − Σ_{j=1}^n x_j) x_{n+1}. Since the joint density of η_1, ..., η_{n+1} is given by
\[
(y_1, \ldots, y_{n+1}) \mapsto \Bigl( \prod_{i=1}^{n+1} \alpha_i \Bigr) \exp\Bigl( -\sum_{i=1}^{n+1} \alpha_i y_i \Bigr), \qquad y_1, \ldots, y_{n+1} > 0,
\]
the asserted form of g_{n+1} is immediate from (1.4.4) if det(∂S/∂x) = x_{n+1}^n (where (∂S/∂x) is the matrix of partial derivatives). This, however, follows at once by adding the first n rows of (∂S/∂x) to its last row: this operation leaves the determinant unchanged and produces an upper triangular matrix with diagonal entries x_{n+1}, ..., x_{n+1}, 1. □

The joint density of the r.v.'s ζ_i = η_i/(Σ_{j=1}^{n+1} η_j), i = 1, ..., n, was computed in a more direct way by Weiss (1965).
Corollary 1.6.5. The r.v.'s ζ_i, i = 1, ..., n, above have the joint density h_n given by
\[
h_n(x_1, \ldots, x_n) = n! \Bigl( \prod_{i=1}^{n+1} \alpha_i \Bigr) \Bigl[ \alpha_{n+1} + \sum_{i=1}^{n} (\alpha_i - \alpha_{n+1}) x_i \Bigr]^{-(n+1)}
\]
if x_i > 0 for i = 1, ..., n and Σ_{i=1}^n x_i < 1, and h_n = 0, otherwise.

PROOF. Straightforward by applying Lemma 1.6.4 and by computing the density of the marginal distribution in the first n coordinates. □
Lemma 1.6.4 will only be applied in the special case of i.i.d. random variables. We specialize Lemma 1.6.4 to the case α_1 = α_2 = ... = α_{n+1} = 1.

Lemma 1.6.6. Let η_1, ..., η_{n+1} be i.i.d. standard exponential r.v.'s. Then,
(i) (η_r/(Σ_{j=1}^{n+1} η_j))_{r=1}^{n} and Σ_{j=1}^{n+1} η_j are independent,
(ii) Σ_{j=1}^{n+1} η_j is a gamma r.v. with parameter n + 1 (thus having the density x ↦ e^{-x} x^n/n!, x ≥ 0),
(iii) η_1, η_1 + η_2, ..., Σ_{j=1}^n η_j have the joint density
\[
(x_1, \ldots, x_n) \mapsto \exp(-x_n) \quad\text{if } 0 < x_1 < \cdots < x_n,
\]
and the density is zero, otherwise.

PROOF. (i) and (ii) are obvious since the density g_{n+1} in Lemma 1.6.4 is of the form
\[
g_{n+1}(x_1, \ldots, x_{n+1}) = n! \cdot \exp(-x_{n+1})\, x_{n+1}^{n}/n!
\]
if 0 < Σ_{i=1}^n x_i < 1 and x_{n+1} > 0.
(iii) Standard calculations! See Example 1.4.4. □
We prove that spacings of (0, 1)-uniformly distributed r.v.'s have the same joint distribution as the r.v.'s η_r/(Σ_{j=1}^{n+1} η_j) above by comparing the densities of the distributions.

Theorem 1.6.7. If η_1, ..., η_{n+1} are i.i.d. standard exponential r.v.'s, then
\[
(U_{1:n},\ U_{2:n} - U_{1:n},\ \ldots,\ U_{n:n} - U_{n-1:n},\ 1 - U_{n:n}) \stackrel{d}{=} \Bigl( \eta_r \Big/ \sum_{j=1}^{n+1} \eta_j \Bigr)_{r=1}^{n+1}. \tag{1.6.2}
\]

PROOF. It suffices to prove that
\[
(U_{r:n} - U_{r-1:n})_{r=1}^{n} \stackrel{d}{=} \Bigl( \eta_r \Big/ \sum_{j=1}^{n+1} \eta_j \Bigr)_{r=1}^{n}
\]
(where U_{0:n} = 0) because the random vectors with n + 1 components are induced by those above and the map (x_1, ..., x_n) ↦ (x_1, ..., x_n, 1 − Σ_{i=1}^n x_i). From Corollary 1.6.5 we know that (η_r/Σ_{j=1}^{n+1} η_j)_{r=1}^n has the density h_n(x_1, ..., x_n) = n! if x_i > 0, i = 1, ..., n, and Σ_{i=1}^n x_i < 1. Starting with the density of (U_{r:n})_{r=1}^n (see Example 1.4.2(ii)) it is immediate from (1.4.4) and Example 1.4.4 that h_n is also the density of (U_{r:n} − U_{r−1:n})_{r=1}^n. □
Since i.i.d. random variables are exchangeable it is obvious that the r.v.'s η_1/(Σ_{j=1}^{n+1} η_j), ..., η_{n+1}/(Σ_{j=1}^{n+1} η_j) are also exchangeable. Thus, Theorem 1.6.7 yields that the distribution of (U_{r:n} − U_{r−1:n})_{r=1}^{n+1} (where U_{n+1:n} = 1) is invariant under permutations of its components. This implies, in particular, that all marginal distributions of (U_{r:n} − U_{r−1:n})_{r=1}^{n+1} of equal dimension are equal.

Corollary 1.6.8. For every permutation τ on {1, ..., n + 1},
\[
(U_{\tau(r):n} - U_{\tau(r)-1:n})_{r=1}^{n+1} \stackrel{d}{=} (U_{r:n} - U_{r-1:n})_{r=1}^{n+1}. \tag{1.6.3}
\]

Let us also formulate Theorem 1.6.7 in terms of the order statistics U_{r:n} themselves. Since U_{r:n} = Σ_{i=1}^r (U_{i:n} − U_{i−1:n}) we obtain

Corollary 1.6.9. If η_1, ..., η_{n+1} are i.i.d. standard exponential r.v.'s, then
\[
(U_{r:n})_{r=1}^{n} \stackrel{d}{=} \Bigl( \sum_{j=1}^{r} \eta_j \Big/ \sum_{j=1}^{n+1} \eta_j \Bigr)_{r=1}^{n}. \tag{1.6.4}
\]
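The representation in Corollary 1.6.9 can be illustrated by simulation (an added sketch): generating uniform order statistics via normalized partial sums of exponentials must reproduce the known means E U_{r:n} = r/(n + 1).

```python
import random

rng = random.Random(11)
n, trials = 6, 100000
emp = [0.0] * n
for _ in range(trials):
    e = [rng.expovariate(1.0) for _ in range(n + 1)]
    tot, s = sum(e), 0.0
    for r in range(n):
        s += e[r]
        emp[r] += s / tot  # partial-sum ratio, distributed as U_{r+1:n}

# E U_{r:n} = r/(n+1); the exponential construction reproduces these means
for r in range(1, n + 1):
    assert abs(emp[r - 1] / trials - r / (n + 1)) < 0.005, r
```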
Reformulation of Results

In a first step, the results above will be reformulated for order statistics V_{i:n} of n i.i.d. random variables uniformly distributed on (−1, 0). From Section 1.2 we know that
\[
(V_{i:n})_{i=1}^{n} \stackrel{d}{=} (U_{i:n} - 1)_{i=1}^{n}. \tag{1.6.5}
\]
In the sequel, we shall deal with "negative" standard exponential r.v.'s −η_i in place of standard exponential r.v.'s η_i. Thus, ξ_1, ..., ξ_{n+1} are i.i.d. random variables with common d.f. G_{2,1} (compare with (1.3.10)). We introduce the partial sums
\[
S_k = \sum_{j=1}^{k} \xi_j, \qquad k = 1, \ldots, n+1. \tag{1.6.6}
\]
From Lemma 1.6.6(ii) it is obvious that S_k is a "negative" gamma r.v. with parameter k having the density x ↦ e^{x}(−x)^{k−1}/(k − 1)!, x < 0. Corollary 1.6.9 is equivalent to

Corollary 1.6.10.
\[
(V_{n-r+1:n})_{r=1}^{n} \stackrel{d}{=} (-S_r/S_{n+1})_{r=1}^{n}. \tag{1.6.7}
\]

Notice that S_{n+1}/n → −1, n → ∞, w.p. 1, which in conjunction with (1.6.7) indicates that, for every fixed k, asymptotically in distribution,
\[
(nV_{n:n}, \ldots, nV_{n-k+1:n}) \stackrel{d}{\approx} (S_1, \ldots, S_k). \tag{1.6.8}
\]
Recall that for k = 1 such a relation was proved in (1.3.14). For further details see Section 5.3.
Next, we reformulate Malmquist's result.

Corollary 1.6.11. We have
(i)
\[
\Bigl( \frac{V_{n:n}}{V_{n-1:n}},\ \frac{V_{n-1:n}}{V_{n-2:n}},\ \ldots,\ \frac{V_{2:n}}{V_{1:n}},\ V_{1:n} \Bigr) \stackrel{d}{=} \Bigl( \frac{S_1}{S_2},\ \frac{S_2}{S_3},\ \ldots,\ \frac{S_{n-1}}{S_n},\ -\frac{S_n}{S_{n+1}} \Bigr),
\]
(ii)
\[
V_{n-r+1:n}/V_{n-r:n} \stackrel{d}{=} -V_{1:r} \qquad\text{for } r = 1, \ldots, n-1,
\]
(iii)
\[
S_1/S_2,\ S_2/S_3,\ \ldots,\ S_n/S_{n+1},\ S_{n+1} \quad\text{are independent r.v.'s.} \tag{1.6.9}
\]
PROOF. (i) is obvious from (1.6.7). (ii) is immediate from Corollary 1.6.2(ii). Ad (iii): From Corollary 1.6.2(i) we know that the first n of the r.v.'s in (1.6.9) are independent. Moreover, it is immediate from Lemma 1.6.6(i) that (S_r/S_{n+1})_{r=1}^n and S_{n+1} are independent, and this property also holds for (S_r/S_{r+1})_{r=1}^n and S_{n+1}. Thus, (iii) holds. □
Generalized Pareto D.F.'s

The uniform distribution on (−1, 0) is the generalized Pareto d.f. W_{2,1}. We introduce the class {W_{1,α}, W_{2,α}, W_3: α > 0} of generalized Pareto d.f.'s and extend the results above to this class.

Associated with the extreme value d.f. G_{i,α} is the generalized Pareto d.f. W_{i,α} that will be introduced by means of the map T_{i,α} defined on the support of G_{2,1}. Explicitly, we have for x ∈ (−∞, 0),
\[
T_{i,\alpha}(x) = \begin{cases} (-x)^{-1/\alpha} & \text{if } i = 1, \\ -(-x)^{1/\alpha} & \text{if } i = 2, \\ -\log(-x) & \text{if } i = 3, \end{cases} \tag{1.6.10}
\]
with the convention that T_{3,α} ≡ T_{3,1} ≡ T_3.

If ξ is a r.v. with "negative" exponential d.f. G_{2,1} then we know (see (1.2.15)) that T_{i,α}(ξ) is distributed according to G_{i,α}. In analogy to this construction we get for a (−1, 0)-uniformly distributed r.v. η that T_{i,α}(η) is distributed according to W_{i,α} with
\[
W_{i,\alpha} = 1 + \log G_{i,\alpha}
\]
whenever −1 < log G_{i,α} < 0. Thus, the class of generalized Pareto d.f.'s arises out of W_{2,1} in the same way as the extreme value d.f.'s arise out of G_{2,1}.
For α > 0 we have
\[
\begin{aligned}
W_{1,\alpha}(x) &= 1 - x^{-\alpha}, & x \ge 1, &\qquad \text{("Pareto")} \\
W_{2,\alpha}(x) &= 1 - (-x)^{\alpha}, & -1 \le x \le 0, &\qquad \text{("uniform etc.")} \\
W_{3}(x) &= 1 - e^{-x}, & x \ge 0, &\qquad \text{("exponential")}
\end{aligned} \tag{1.6.11}
\]
(with W_{i,α} = 0 to the left of the indicated range and = 1 to the right of it).

This class of d.f.'s was introduced by J. Pickands (1975) in extreme value theory. The importance of the generalized Pareto d.f.'s will become apparent later.
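The identity W_{i,α} = 1 + log G_{i,α} is easy to verify numerically. The following check (an added illustration; the explicit extreme value d.f.'s G_{1,α}(x) = exp(−x^{−α}), G_{2,α}(x) = exp(−(−x)^α) and G_3(x) = exp(−e^{−x}) are taken here as in standard extreme value theory) compares both sides on the ranges given in (1.6.11):

```python
import math

# extreme value d.f.'s (Frechet, Weibull, Gumbel) on their relevant ranges
G1 = lambda x, a: math.exp(-x ** (-a))      # x > 0
G2 = lambda x, a: math.exp(-((-x) ** a))    # x <= 0
G3 = lambda x: math.exp(-math.exp(-x))

# generalized Pareto d.f.'s of (1.6.11)
W1 = lambda x, a: 1 - x ** (-a)             # x >= 1
W2 = lambda x, a: 1 - (-x) ** a             # -1 <= x <= 0
W3 = lambda x: 1 - math.exp(-x)             # x >= 0

a = 1.7
for x in (1.0, 1.5, 3.0, 10.0):
    assert abs((1 + math.log(G1(x, a))) - W1(x, a)) < 1e-12
for x in (-0.9, -0.5, -0.1):
    assert abs((1 + math.log(G2(x, a))) - W2(x, a)) < 1e-12
for x in (0.0, 0.5, 2.0):
    assert abs((1 + math.log(G3(x))) - W3(x)) < 1e-12
```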
Order Statistics of Generalized Pareto R.V.'s

For the rth order statistic X_{r:n} of n i.i.d. random variables with common generalized Pareto d.f. W_{i,α} we obtain the representation
\[
(X_{r:n})_{r=1}^{n} \stackrel{d}{=} (T_{i,\alpha}(V_{r:n}))_{r=1}^{n}. \tag{1.6.12}
\]
The use of the transformation T_{i,α} automatically leads to the proper normalization. Check that
\[
T_{i,\alpha}(x/n) = c_n T_{i,\alpha}(x) + d_n \tag{1.6.13}
\]
where c_n and d_n are the normalizing constants as defined in (1.3.13). By combining (1.6.13) and (1.6.8) one finds that
\[
(c_n^{-1}(X_{n:n} - d_n), \ldots, c_n^{-1}(X_{n-k+1:n} - d_n)) \stackrel{d}{\approx} (T_{i,\alpha}(S_1), \ldots, T_{i,\alpha}(S_k)),
\]
asymptotically in distribution, for every fixed k ≥ 1.

Next, Malmquist's result will be extended to generalized Pareto r.v.'s. Here the cases i = 1, 2 are relevant. Check that for negative reals a, b,
\[
T_{1,\alpha}(a)/T_{1,\alpha}(b) = T_{1,\alpha}(-a/b) \quad\text{and}\quad T_{2,\alpha}(a)/T_{2,\alpha}(b) = -T_{2,\alpha}(-a/b). \tag{1.6.14}
\]
Combining (1.6.12) and Corollary 1.6.11 one obtains

Corollary 1.6.12. Let X_{r:n} be the rth order statistic of n i.i.d. random variables with common d.f. W_{i,α} for i ∈ {1, 2} and α > 0. Then,
(i) X_{n:n}/X_{n−1:n}, ..., X_{2:n}/X_{1:n}, X_{1:n} are independent r.v.'s,
(ii)
\[
X_{n-r+1:n}/X_{n-r:n} \stackrel{d}{=} \begin{cases} X_{1:r} & \text{if } i = 1, \\ -X_{1:r} & \text{if } i = 2, \end{cases} \qquad\text{for } r = 1, \ldots, n-1.
\]

It can easily be seen that the independence of ratios of consecutive order statistics still holds if we include a scale parameter in our considerations. As mentioned above, spacings of i.i.d. random variables with common continuous d.f. F are independent if, and only if, F is an exponential d.f. As a consequence of this result one obtains (see Rossberg (1972) or Galambos (1987), Corollary 1.6.2) that the ratios of consecutive order statistics of positive or negative i.i.d. random variables with common continuous d.f. F are independent if, and only if, F is of the type W_{1,α} or W_{2,α} (where a scale parameter has to be included).
1.7. Moments, Modes, and Medians
The calculation of the exact values of moments of order statistics has received much attention in the literature. Since this aspect will not be central to our investigations we shall only touch on moments of order statistics of uniform and exponential r.v.'s.
Two results are included concerning conditions which ensure that moments
of order statistics exist and are finite. This topic will further be pursued in
Section 3.1 where some inequalities for moments of order statistics will be
established. The section concludes with a short summary of results concerning
modes and medians of distributions of order statistics.
Exact Moments

Let U_{1:n}, ..., U_{n:n} again denote the order statistics of n i.i.d. random variables with common uniform distribution on (0, 1). The first result is a nice application of Malmquist's lemma (see Corollary 1.6.3).

Lemma 1.7.1. Let 0 < r_1 < ... < r_k < r_{k+1} = n + 1, and let m_1, ..., m_k be integers such that r_i + Σ_{j=i}^k m_j ≥ 1 for i = 1, ..., k. Then,
\[
E \prod_{i=1}^{k} U_{r_i:n}^{m_i} = \prod_{i=1}^{k} b\Bigl( r_i + \sum_{j=i}^{k} m_j,\ r_{i+1} - r_i \Bigr) \Big/ b(r_i,\ r_{i+1} - r_i) \tag{1.7.1}
\]
where b(r, s) = (r − 1)!(s − 1)!/(r + s − 1)! is the beta function.

PROOF. Put U_{n+1:n} = 1 and s_i = Σ_{j=i}^k m_j. By Corollary 1.6.3 (see also P.1.16) and by inserting the explicit form of the density of U_{r_i:r_{i+1}-1} we obtain
\[
E \prod_{i=1}^{k} U_{r_i:n}^{m_i} = E \prod_{i=1}^{k} \bigl( U_{r_i:n}/U_{r_{i+1}:n} \bigr)^{s_i} = \prod_{i=1}^{k} E\, U_{r_i:r_{i+1}-1}^{s_i}
\]
which easily leads to the right-hand side of (1.7.1). □
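For k = 1, formula (1.7.1) must agree with the explicit moment formula E U_{r:n}^m = r(r+1)···(r+m−1)/((n+1)(n+2)···(n+m)) stated in (1.7.4) below. The following deterministic check (an added illustration) confirms this agreement for a range of small parameter values:

```python
from math import factorial

def b(r, s):
    # beta function at integer arguments: b(r, s) = (r-1)!(s-1)!/(r+s-1)!
    return factorial(r - 1) * factorial(s - 1) / factorial(r + s - 1)

def moment_171(r, m, n):
    # right-hand side of (1.7.1) for k = 1 (a single order statistic)
    return b(r + m, n + 1 - r) / b(r, n + 1 - r)

def moment_174(r, m, n):
    # E U_{r:n}^m = r(r+1)...(r+m-1)/((n+1)(n+2)...(n+m))
    num = den = 1
    for j in range(m):
        num *= r + j
        den *= n + 1 + j
    return num / den

for n in range(2, 8):
    for r in range(1, n + 1):
        for m in range(1, 4):
            assert abs(moment_171(r, m, n) - moment_174(r, m, n)) < 1e-12
```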
From (1.7.1) we obtain as special cases:
\[
E U_{r:n} = \frac{r}{n+1} =: \mu_{r,n} \tag{1.7.3}
\]
(1. 7.3)
and, more generally,
m
EUr ' n
.
r,(r_+:::1),'_..'::(r,+,.m_....:.1):(n + 1)(n + 2) ... (n + m)
_
c
After some busy calculations one also gets, for r
E[(U
ron
and, for r
E[(U
ron
Jlr,n
(1.7.4)
s,
)(U _
)] = Jlr,n(1  Jls,n)
son Jls,n
n+2
(1.7.5)
t,
Jlr,n
)(U _
)(U _
)]
son Jls,n I:n JlI,n
= 2Jlr,n(1
(n
 2Jls,n)(1  JlI,n)
+ 2)(n + 3)
.
(1.7.6)
For r = s we obtain in (1.7.5) that
2
r(n  r + 1)
E[Ur:n  Jlr,n] = (n + If(n + 2)'
Next we state the expectation and the variance of the rth order statistic X_{r:n} of i.i.d. standard exponential r.v.'s. From Theorem 1.6.1 we know that X_{r:n} \stackrel{d}{=} Σ_{i=1}^r η_i/(n − i + 1) where η_1, ..., η_r are standard exponential r.v.'s (thus, having common expectation and variance equal to 1). This implies immediately that
\[
E X_{r:n} = \sum_{i=1}^{r} (n - i + 1)^{-1} =: \mu_{r,n} \tag{1.7.7}
\]
and
\[
E(X_{r:n} - \mu_{r,n})^2 = \sum_{i=1}^{r} (n - i + 1)^{-2}. \tag{1.7.8}
\]
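Formula (1.7.7) can be cross-checked against the density of X_{r:n}, which for the standard exponential d.f. F is r·C(n,r)·F^{r−1}(1 − F)^{n−r}·f by Theorem 1.3.2. The numerical integration below (an added sketch) reproduces the harmonic sums:

```python
from math import comb, exp

def f_rn(x, r, n):
    # density of X_{r:n} for i.i.d. standard exponential r.v.'s
    F = 1 - exp(-x)
    return r * comb(n, r) * F ** (r - 1) * (1 - F) ** (n - r) * exp(-x)

def mean_by_integration(r, n, upper=50.0, steps=200000):
    # midpoint rule for E X_{r:n}; the tail beyond `upper` is negligible
    h = upper / steps
    return sum((i + 0.5) * h * f_rn((i + 0.5) * h, r, n) for i in range(steps)) * h

for n, r in ((5, 2), (6, 6)):
    harmonic = sum(1 / (n - i + 1) for i in range(1, r + 1))  # (1.7.7)
    assert abs(mean_by_integration(r, n) - harmonic) < 1e-3, (n, r)
```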
Inequalities for Moments

The first result yields that the mth moment of any order statistic X_{r:n} exists and is finite if the mth absolute moment of the underlying distribution is finite.

Lemma 1.7.2. Let 0 = r_0 < r_1 < ... < r_k < r_{k+1} = n + 1. Let X_{r_1:n}, ..., X_{r_k:n} be order statistics of i.i.d. random variables ξ_1, ..., ξ_n. Then for every nonnegative, measurable function g on the Euclidean k-space we have
\[
E\, g(X_{r_1:n}, \ldots, X_{r_k:n}) \le \Bigl[ n! \Big/ \prod_{i=1}^{k+1} (r_i - r_{i-1} - 1)! \Bigr] E\, g(\xi_1, \ldots, \xi_k).
\]

PROOF. Let F be the d.f. of ξ_1. Put C = n!/∏_{i=1}^{k+1}(r_i − r_{i−1} − 1)!, B = {(x_1, ..., x_k): 0 < x_1 < ... < x_k < 1}, x_0 = 0 and x_{k+1} = 1. From Theorem 1.2.5(i) and Theorem 1.4.5 we get
\[
E\, g(X_{r_1:n}, \ldots, X_{r_k:n}) = C \int_B g(F^{-1}(x_1), \ldots, F^{-1}(x_k)) \prod_{i=1}^{k+1} (x_i - x_{i-1})^{r_i - r_{i-1} - 1}\, dx_1 \cdots dx_k
\]
\[
\le C \int_{(0,1)^k} g(F^{-1}(x_1), \ldots, F^{-1}(x_k))\, dx_1 \cdots dx_k = C\, E\, g(\xi_1, \ldots, \xi_k)
\]
where the final identity becomes obvious by using the quantile transformation. □

For g(x) = |x|^m we obtain as a special case
\[
E|X_{r:n}|^m \le \frac{n!}{(r-1)!\,(n-r)!}\, E|\xi_1|^m. \tag{1.7.9}
\]
Next, we find some necessary and sufficient conditions which ensure that moments of central order statistics exist and are finite if the sample size n is sufficiently large.

Lemma 1.7.3. Let X_{i:j} be the ith order statistic of j i.i.d. random variables ξ_1, ..., ξ_j with common d.f. F. Assume that
\[
E|X_{s:j}|^m < \infty \tag{1.7.10}
\]
for some positive integers j, m and s ∈ {1, ..., j}. Then there exists a constant C > 0 such that
\[
|F^{-1}(x)|^m x^{s}(1-x)^{j-s+1} \le C, \qquad x \in (0, 1). \tag{1.7.11}
\]
Conversely, (1.7.11) implies that
\[
E|X_{r:n}|^k < \infty \tag{1.7.12}
\]
whenever 1 + ks/m ≤ r ≤ n − (j − s + 1)k/m.

PROOF. By the same arguments as in the proof of Lemma 1.7.2 we get
\[
E|X_{s:j}|^m = \frac{j!}{(s-1)!\,(j-s)!} \int_0^1 |F^{-1}(x)|^m x^{s}(1-x)^{j-s+1} \big/ \bigl(x(1-x)\bigr)\, dx
\]
and hence, (1.7.11) holds under condition (1.7.10) since
\[
\int_0^{1/2} (1/x)\, dx = \int_{1/2}^{1} (1/(1-x))\, dx = \infty.
\]
Moreover, (1.7.11) implies (1.7.12) since
\[
E|X_{r:n}|^k = \frac{n!}{(r-1)!\,(n-r)!} \int_0^1 |F^{-1}(x)|^k x^{r-1}(1-x)^{n-r}\, dx
\le \frac{n!}{(r-1)!\,(n-r)!}\, C^{k/m} \int_0^1 x^{r-1-ks/m}(1-x)^{n-r-(j-s+1)k/m}\, dx < \infty,
\]
and r − 1 − ks/m ≥ 0 as well as n − r − (j − s + 1)k/m ≥ 0. □
We formulate a slightly weaker version of Lemma 1.7.3.

Corollary 1.7.4. For every positive integer k and 0 < α < 1/2 the following three conditions are equivalent:
(i) E|X_{r:n}|^k < ∞ for all sufficiently large n and nα ≤ r ≤ (1 − α)n.
(ii) There exists δ > 0 such that
\[
\sup_{q \in (0,1)} |F^{-1}(q)|^{\delta}\, q(1-q) < \infty. \tag{1.7.13}
\]
(iii) There exists β > 0 such that
\[
\sup_{x} |x|^{\beta}\, F(x)(1 - F(x)) < \infty. \tag{1.7.14}
\]

PROOF. If (i) holds for all n ≥ n_0, say, then the implication (1.7.10) ⇒ (1.7.11) yields (ii) with δ = k/(n_0 α + 2). Moreover, if (ii) holds then (1.7.11) ⇒ (1.7.12) yields (i) for n_0 = [(1 + k(1 + 1/δ))/α]. Thus (i) and (ii) are equivalent.
To prove the equivalence of (ii) and (iii) notice that (1.7.13) holds iff there exists δ > 0 such that
\[
\text{(a)}\ |F^{-1}(q)|^{\delta} q < 1 \quad\text{and}\quad \text{(b)}\ |F^{-1}(q)|^{\delta}(1-q) < 1 \tag{1.7.13'}
\]
for sufficiently small values of q in (a) and of 1 − q in (b). Moreover, (iii) holds iff there exists β > 0 such that
\[
\text{(a)}\ |y|^{\beta} F(y) < 1 \quad\text{and}\quad \text{(b)}\ |y|^{\beta}(1 - F(y)) < 1 \tag{1.7.14'}
\]
for sufficiently small values of F(y) in (a) and of 1 − F(y) in (b).
We are going to prove the equivalence of (1.7.13')(a) and (1.7.14')(a): For sufficiently small q, the inequality |F^{-1}(q)|^{\delta} q < 1 is equivalent to F^{-1}(q) > −q^{-1/\delta}, which holds, according to (1.2.10), iff q > F(−q^{-1/\delta}). Setting y = −q^{-1/\delta} we see that (1.7.13')(a) holds iff for all sufficiently small y we have |y|^{-\delta} > F(y), which is equivalent to (1.7.14')(a).
In a similar manner one can prove that (1.7.13')(b) is equivalent to (1.7.14')(b), which completes the proof. □
Unimodality of D.F.'s of Order Statistics

In this part of the section we find conditions which imply the unimodality of the d.f. of an order statistic.

A d.f. F is unimodal if there exists a number u such that the restriction F|(−∞, u) of F to the interval (−∞, u) is convex and F|(u, ∞) is concave. Every u with this property is a mode of F. If u is a mode of F and F is continuous at u then F possesses a density, say f, where f is nondecreasing on (−∞, u] and nonincreasing on [u, ∞). We also say that a density f is unimodal if it has these properties.

Hereafter let X_{i:n} be the order statistic of n i.i.d. random variables with common d.f. F and density f. Moreover, assume that f is differentiable and strictly positive on (α(F), ω(F)). Denote by f_{r:n} again the density of X_{r:n}. Given a real number u we write I(u) = (α(F), ω(F)) ∩ (−∞, u) and J(u) = (α(F), ω(F)) ∩ (u, ∞). The following results are essentially due to Alam (1972).

Standard calculations yield that f_{r:n} is unimodal if, and only if, there exists some u such that
\[
f_{r:n}'|I(u) \ge 0 \quad\text{and}\quad f_{r:n}'|J(u) \le 0. \tag{1.7.15}
\]
Check that
\[
f_{r:n}' = b(r, n-r+1)^{-1}\, f^2\, F^{r-1}(1-F)^{n-r}\, g_{r,n} \quad\text{on } (\alpha(F), \omega(F)),
\]
where
\[
g_{r,n} = \frac{f'}{f^2} + \frac{r-1}{F} - \frac{n-r}{1-F} \quad\text{on } (\alpha(F), \omega(F)). \tag{1.7.16}
\]
The unimodality of f_{r:n} will be characterized by means of the function g_{r,n}.

Lemma 1.7.5. The density f_{r:n} of X_{r:n} is unimodal if, and only if, there exists u such that g_{r,n}|I(u) ≥ 0 and g_{r,n}|J(u) ≤ 0.

PROOF. Immediate from (1.7.15) and (1.7.16). Define u := sup{x: α(F) < x < ω(F) and g_{r,n}(x) ≥ 0} if {x: α(F) < x < ω(F), g_{r,n}(x) ≥ 0} ≠ ∅, and u = inf{x: α(F) < x < ω(F), g_{r,n}(x) < 0}, otherwise. □
The density f_{r:n} is not unimodal, in general, if the underlying density f is unimodal. We mention the following counterexample due to Huang and Ghosh (1982): Consider the density f defined by
\[
f(x) = \begin{cases} 3/2 & \text{if } -1/2 < x < 0, \\ 1/4 & \text{if } 0 \le x < 1, \end{cases}
\]
that is zero otherwise. Obviously, f is unimodal. However, it can be shown that the density of the kth order statistic of a sample of size n is not unimodal for k > (n + 1)/2.

However, if f is strongly unimodal [that is, log f is concave on the support (α(F), ω(F))] then it can be shown that f_{r:n} is unimodal. Notice that the strong unimodality of f implies that f'/f^2 is nonincreasing on (α(F), ω(F)). This follows at once from the fact that 1/f is convex if f is strongly unimodal.
Corollary 1.7.6. (i) If f'/f^2 is nonincreasing on (α(F), ω(F)) then f_{r:n} is unimodal.
(ii) If, in addition, g_{r,n}(u) = 0 for some u ∈ (α(F), ω(F)) and n ≥ 2 then u is the unique mode of f_{r:n}.

PROOF. (i) Obvious from Lemma 1.7.5 since F is nondecreasing.
(ii) Since F is strictly increasing on (α(F), ω(F)) we know that g_{r,n} is strictly decreasing. This implies that the solution of the equation g_{r,n}(u) = 0 (that is necessarily a mode of f_{r:n}) is unique. □

The Cauchy distribution provides an example of a unimodal density which is not strongly unimodal; however, f'/f^2 is nonincreasing.

EXAMPLES 1.7.7. (i) The normal, exponential and uniform densities are strongly unimodal.
(ii) If f = 1_{[0,1]} and n ≥ 2 then (r − 1)/(n − 1) is the unique mode of f_{r:n}.
(iii) The condition that f'/f^2 is nonincreasing is not necessary for the unimodality of f_{r:n}: Let F(x) = x^α for x ∈ (0, 1) and some α ∈ (0, 1). Then f_{r:n} is unimodal; however, f'/f^2 is strictly increasing on (0, 1).

It follows from Corollary 1.7.6 and P.3.4 that the weak convergence of distributions of order statistics is equivalent to the convergence w.r.t. the variational distance if the underlying density is strongly unimodal (or if f'/f^2 is nonincreasing).
Medians
As a third functional parameter of order statistics we consider the median of
the distribution of an order statistic. Again we are interested in the relationship
between the underlying distribution and the distributions of order statistics.
Recall that a median u of a r.v. ~ is defined by the property that
(1.7.17)
(1.7.17) holds if F(u) = t. Moreover, if the dJ. F of
(1.7.17) is equivalent to the condition F(u) = t.
is continuous, then
Lemma 1.7.8. Let X_{i:2m+1} be the ith order statistic of i.i.d. random variables ξ₁, …, ξ_{2m+1} with common d.f. F, where m is a positive integer. Then, every median of ξ₁ is a median of X_{m+1:2m+1}.

PROOF. Let u be a median of ξ₁. Since F(u) ≥ 1/2 we obtain from Corollary 1.2.7 that

P{X_{m+1:2m+1} ≤ u} = P{U_{m+1:2m+1} ≤ F(u)} ≥ P{U_{m+1:2m+1} ≤ 1/2}.

Example 1.2.2 implies that P{U_{m+1:2m+1} ≤ 1/2} = P{U_{m+1:2m+1} ≥ 1/2}. Hence P{U_{m+1:2m+1} ≤ 1/2} = 1/2 and, thus, P{X_{m+1:2m+1} ≤ u} ≥ 1/2.
1. Distribution Functions, Densities, and Representations
Since P{X_{m+1:2m+1} ≤ v} ↑ P{X_{m+1:2m+1} < u} as v ↑ u it remains to prove that P{X_{m+1:2m+1} ≤ v} ≤ 1/2 for every v < u. This follows by the same arguments as in the first part of the proof by using the fact that F(v) ≤ 1/2. □
Lemma 1.7.8 reveals that the sample medians for odd sample sizes are median unbiased estimators of the underlying (unknown) median. However, this is an exceptional case. For even sample sizes 2m it is impossible, in general, to find some r ∈ {1, …, 2m} such that the underlying median is the median of X_{r:2m}.
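The contrast between odd and even sample sizes is easy to see in a small simulation. The sketch below uses the standard exponential distribution, an arbitrary choice whose median is log 2; the sample sizes, seed, and number of repetitions are likewise arbitrary:

```python
import random
from math import log

random.seed(1)

def sample_median_prob(n, r, reps=20000):
    # estimate P{X_{r:n} <= log 2} for n i.i.d. standard exponential
    # random variables; log 2 is the median of that distribution
    med = log(2.0)
    hits = 0
    for _ in range(reps):
        xs = sorted(random.expovariate(1.0) for _ in range(n))
        hits += xs[r - 1] <= med
    return hits / reps

# odd sample size 2m + 1 = 7: the sample median is median unbiased
print(sample_median_prob(7, 4))            # close to 1/2
# even sample size 2m = 6: both central order statistics miss 1/2
print(sample_median_prob(6, 3), sample_median_prob(6, 4))
```

For n = 6 the two printed probabilities are approximately 0.656 and 0.344 (the exact values are 42/64 and 22/64), so neither X_{3:6} nor X_{4:6} is median unbiased.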
EXAMPLE 1.7.9. For every positive integer m and r ∈ {1, …, 2m} we have

P{U_{r:2m} ≤ 1/2} ≠ 1/2.  (1.7.18)

To prove this notice that for r ≠ 2m − r + 1 we have P{U_{r:2m} ≤ 1/2} ≠ P{U_{2m−r+1:2m} ≤ 1/2} and hence by Example 1.2.2

P{U_{r:2m} ≤ 1/2} = 1 − P{U_{2m−r+1:2m} ≤ 1/2} ≠ 1 − P{U_{r:2m} ≤ 1/2}.

This implies (1.7.18).
The discussion above can be extended to the question whether the q-quantile F⁻¹(q) is a median of the distribution of the sample q-quantile F_n⁻¹(q); in other words, whether the sample q-quantile is a median unbiased estimator of the underlying q-quantile. Clearly, the answer is negative in general; however, as pointed out in (8.1.9), randomized sample q-quantiles have this property. In the present section we shall only examine randomized sample medians. The reader not familiar with Markov kernels and their interpretation is advised first to read Section 10.1.
Denote by ε_x the Dirac measure with mass 1 at x; thus, we have ε_x(B) = 1_B(x). Define the Markov kernel M_{r,n} by

M_{r,n}(B|x₁, …, xₙ) = (1/2)[ε_{x_{r:n}}(B) + ε_{x_{n−r+1:n}}(B)],  (1.7.19)

which is a randomized sample median if r = [(n + 1)/2]. Thus, X_{r:n} as well as X_{n−r+1:n} are chosen with probability 1/2. Notice that if n = 2m + 1 and r = m + 1 then the (nonrandomized) sample median X_{m+1:2m+1} is taken. Denote by M_{r,n}P the distribution of the Markov kernel M_{r,n} (compare with (10.1.2)). We have (M_{r,n}P)(B) = E M_{r,n}(B|·).
Lemma 1.7.10. Let X_{i:n} denote the ith order statistic of n i.i.d. random variables ξ₁, …, ξₙ with continuous d.f. F. Then every median of ξ₁ is a median of M_{r,n}P.

PROOF. Since F is continuous we have F(u) = 1/2 for every median u of ξ₁. We will prove that (M_{r,n}P)(−∞, u] = 1/2 and hence u is a median of M_{r,n}P. From Corollary 1.2.7 and Example 1.2.2 we get

(M_{r,n}P)(−∞, u] = (1/2)[P{X_{r:n} ≤ u} + P{X_{n−r+1:n} ≤ u}]
 = (1/2)[P{U_{r:n} ≤ 1/2} + P{U_{n−r+1:n} ≤ 1/2}]
 = (1/2)[P{U_{r:n} ≤ 1/2} + P{U_{r:n} > 1/2}]
 = 1/2. □
Lemma 1.7.10 shows that M_{r,n} is a median unbiased estimator of the underlying median.
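For even n, the kernel (1.7.19) restores median unbiasedness. The following sketch estimates (M_{r,n}P)(−∞, u] at the true median u = 1/2 of the uniform distribution (the parameter choices and the helper name are illustrative only):

```python
import random

random.seed(2)

def randomized_median_prob(n, r, reps=20000):
    # estimate (M_{r,n}P)(-inf, 1/2] for n i.i.d. uniform(0,1) r.v.'s;
    # X_{r:n} and X_{n-r+1:n} are each selected with probability 1/2
    hits = 0
    for _ in range(reps):
        xs = sorted(random.random() for _ in range(n))
        pick = xs[r - 1] if random.random() < 0.5 else xs[n - r]
        hits += pick <= 0.5
    return hits / reps

print(randomized_median_prob(6, 3))   # close to 1/2
```

Although P{X_{3:6} ≤ 1/2} ≈ 0.656 and P{X_{4:6} ≤ 1/2} ≈ 0.344, the fifty-fifty mixture attains 1/2, in line with Lemma 1.7.10.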
1.8. Conditional Distributions of Order Statistics
Throughout this section, we shall assume that X_{1:n}, …, X_{n:n} are the order statistics of n i.i.d. random variables with common continuous d.f. F. The aim of the following lines will be to establish the conditional distribution of (X_{s₁:n}, …, X_{s_m:n}) conditioned on (X_{r₁:n}, …, X_{r_k:n}).
Introductory Remarks
At the beginning let us touch on some essential definitions and properties concerning the conditional distribution P(Y ∈ ·|X) of Y given X.
In the present context it is always possible to factorize the conditional distribution P(Y ∈ ·|X) by means of the conditional distribution P(Y ∈ ·|X = x) of Y given X = x. Moreover, P(Y ∈ B|X) is the composition of P(Y ∈ B|X = ·) and X. By writing, in short, P(Y ∈ B|·) in place of P(Y ∈ B|X = ·) we have P(Y ∈ B|X) = P(Y ∈ B|·) ∘ X.
Apart from a measurability condition and the fact that P(Y ∈ ·|X = x) is a probability measure, the defining property of P(Y ∈ ·|X) is

E(1_A(X) P(Y ∈ B|X)) = P{X ∈ A, Y ∈ B}  (1.8.1)
for all Borel sets (in general, measurable sets) A and B.
From (1.8.1) we see that P(Y ∈ ·|X = x) has only to be defined for elements x in a set having probability 1 w.r.t. the distribution of X. For x in the complement of this set, P(Y ∈ ·|X = x) may e.g. be defined as the distribution of Y.
In the statistical context, one is primarily interested in the consequence that the distribution of Y can be rebuilt by means of the conditional distribution P(Y ∈ ·|X = ·) and the distribution of X. Obviously,

E P(Y ∈ B|X) = P{Y ∈ B}.  (1.8.2)
Assume that the joint distribution of X and Y has a density, say, f w.r.t. some product measure μ₁ × μ₂. Then we know that the conditional distribution P(Y ∈ ·|X = x) has a μ₂-density, say, f₂(·|x) which, by the definition of a density, has the property

P(Y ∈ B|X = x) = ∫_B f₂(·|x) dμ₂.

The density f₂(·|x) is the conditional density of Y given X = x. It is well known that f₂(·|x) = f(x, ·)/f₁(x) if f₁(x) > 0 where f₁ is a μ₁-density of the distribution of X.
We mention another simple consequence of (1.8.1). The conditional distribution

P((X, Y) ∈ ·|X = x)  of (X, Y) given X = x

is the product of P(Y ∈ ·|X = x) and the Dirac measure δ_x at x defined by δ_x(B) = 1_B(x). This becomes obvious by noting that

E[1_A(X) P(Y ∈ B₂|X) δ_X(B₁)] = P{X ∈ A, (X, Y) ∈ B₁ × B₂}.  (1.8.3)
The Basic Theorem
Starting with the joint density of order statistics it is straightforward to deduce the desired conditional distributions. A detailed proof of this result is justified because of its importance. We remark that the proof can be slightly clarified (however not shortened) if P.1.32, which concerns conditional independence under the Markov property, is utilized.
Let r₁ < … < r_k. The conditional distribution of the order statistic Y := (X_{1:n}, …, X_{n:n}) given

X := (X_{r₁:n}, …, X_{r_k:n}) = (x_{r₁}, …, x_{r_k}) =: x

has only to be computed for vectors x with α(F) < x_{r₁} < … < x_{r_k} < ω(F) (compare with Theorem 1.5.2). We shall prove that P(Y ∈ ·|X = x) is the joint distribution of certain independent order statistics W_i and degenerate r.v.'s Y_{r_i}. More precisely, W_i is the order statistic of i.i.d. random variables with common d.f. F_{i,x} which is F truncated on the left of x_{r_{i−1}} and on the right of x_{r_i} (where x_{r₀} = α(F) and x_{r_{k+1}} = ω(F)). Thus,

F_{i,x}(y) = [F(y) − F(x_{r_{i−1}})]/[F(x_{r_i}) − F(x_{r_{i−1}})],  x_{r_{i−1}} ≤ y ≤ x_{r_i},

and i = 1, …, k + 1.
Theorem 1.8.1. Let F be a continuous d.f., and let 0 = r₀ < r₁ < … < r_k < r_{k+1} = n + 1. If α(F) = x_{r₀} < x_{r₁} < … < x_{r_k} < x_{r_{k+1}} = ω(F) then the conditional distribution of (X_{1:n}, …, X_{n:n}) given (X_{r₁:n}, …, X_{r_k:n}) = (x_{r₁}, …, x_{r_k}) is the joint distribution of the r.v.'s Y₁, …, Yₙ which are characterized by the following three properties:

(a) For every i ∈ I := {j: 1 ≤ j ≤ k + 1, r_j − r_{j−1} > 1} the random vector

W_i := (Y_{r_{i−1}+1}, …, Y_{r_i−1})

is the order statistic of r_i − r_{i−1} − 1 i.i.d. random variables with common d.f. F_{i,x}.
(b) Y_{r_i} is a degenerate r.v. with fixed value x_{r_i} for i = 1, …, k.
(c) W_i, i ∈ I, are independent.
PROOF. Put M := {1, …, n} \ {r₁, …, r_k}. In view of (1.8.3) it suffices to show that the conditional distribution of the order statistics X_{i:n}, i ∈ M, given X := (X_{r₁:n}, …, X_{r_k:n}) = (x_{r₁}, …, x_{r_k}) =: x is equal to the joint distribution of the r.v.'s Y_j, j ∈ M. This will be verified by constructing the conditional density in the way described above.
Denote by Q the probability measure corresponding to the d.f. F. Let f be the Qⁿ-density of the order statistic (X_{1:n}, …, X_{n:n}) and g the Qᵏ-density of X (as computed in Theorem 1.5.2). Then, the conditional Q^{n−k}-density, say, f(·|x) of X_{i:n}, i ∈ M, given X = x has the representation

f(z|x) = f(x₁, …, xₙ)/g(x)

if g(x) > 0 where z denotes the vector (x_i)_{i∈M}. Notice that the condition g(x) > 0 is equivalent to α(F) < x_{r₁} < … < x_{r_k} < ω(F). Check that f(z|x) may be written

f(z|x) = ∏_{i∈I} h_i(x_{r_{i−1}+1}, …, x_{r_i−1})/(F(x_{r_i}) − F(x_{r_{i−1}}))^{r_i−r_{i−1}−1}

where h_i is the Q_{i,x}^{r_i−r_{i−1}−1}-density of W_i and Q_{i,x} is the probability measure corresponding to the truncated d.f. F_{i,x}.
Since 1/[F(x_{r_i}) − F(x_{r_{i−1}})] defines a Q-density of Q_{i,x} it follows that f(·|x) is the Q^{n−k}-density of Y_j, j ∈ M. The particular structure of f(·|x) shows that the random vectors W_i, i ∈ I, are independent and W_i is the asserted order statistic. □
Theorem 1.8.1 shows that the following two random experiments are equivalent as far as their distributions are concerned. First, generate the ordered values x₁ < … < xₙ according to the d.f. F. Then, take x_{r₁} < … < x_{r_k} and replace the ordered values x_{r_{i−1}+1} < … < x_{r_i−1} by the ordered values y_{r_{i−1}+1} < … < y_{r_i−1} which are generated according to the truncated d.f. F_{i,x} as defined above. Then, in view of Theorem 1.8.1 the final outcomes

y₁ < … < y_{r₁−1} < x_{r₁} < y_{r₁+1} < … < y_{r₂−1} < x_{r₂} < … < x_{r_k} < y_{r_k+1} < … < yₙ

as well as x₁ < … < xₙ are governed by the same distribution.
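This equivalence can be illustrated by simulation. The sketch below conditions on X_{3:5} for uniform random variables, regenerates the two values below it from the right-truncated d.f. and the two values above it from the left-truncated d.f., and compares the mean of X_{2:5} under the two experiments (all numerical choices are arbitrary):

```python
import random

random.seed(3)
reps = 40000
sum_direct = sum_twostep = 0.0
for _ in range(reps):
    xs = sorted(random.random() for _ in range(5))
    x = xs[2]                                   # condition on X_{3:5} = x
    # regenerate: below x from F truncated right of x,
    # above x from F truncated left of x (both uniform again)
    lower = sorted(random.uniform(0, x) for _ in range(2))
    upper = sorted(random.uniform(x, 1) for _ in range(2))
    ys = lower + [x] + upper
    sum_direct += xs[1]                         # X_{2:5}, direct experiment
    sum_twostep += ys[1]                        # X_{2:5}, two-step experiment
print(sum_direct / reps, sum_twostep / reps)    # both close to 2/6
```

E X_{2:5} = 2/6 for uniform samples; any other functional of the order statistic could be compared in the same way.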
In Corollary 1.8.2 we shall consider the conditional distribution of (X_{s₁:n}, …, X_{s_m:n}) given (X_{r₁:n}, …, X_{r_k:n}) = (x_{r₁}, …, x_{r_k}) instead of the conditional distribution of the order statistic (X_{1:n}, …, X_{n:n}). This corollary will be an immediate consequence of Theorem 1.8.1 and the following trivial remarks.
Let X and Y be r.v.'s, and g a measurable map defined on the range of Y. Then,

P(g(Y) ∈ ·|X) = P(Y ∈ g⁻¹(·)|X)  (1.8.4)

is the conditional distribution of g(Y) given X. This becomes obvious by noting that as a consequence of (1.8.1), for measurable sets A,

E[1_A(X) P(Y ∈ g⁻¹(C)|X)] = P{X ∈ A, g(Y) ∈ C}.  (1.8.5)

An application of (1.8.4), with g being the projection (x₁, …, xₙ) → (x_{s₁}, …, x_{s_m}), yields
Corollary 1.8.2. Let 1 ≤ s₁ < … < s_m ≤ n. The conditional distribution of (X_{s₁:n}, …, X_{s_m:n}) given (X_{r₁:n}, …, X_{r_k:n}) = (x_{r₁}, …, x_{r_k}) is the joint distribution of the r.v.'s Y_{s₁}, …, Y_{s_m} with Y_i defined as in Theorem 1.8.1.
As an illustration of Theorem 1.8.1 and Corollary 1.8.2 we note several special cases.

EXAMPLES 1.8.3. (i) The conditional distribution of X_{s:n} given X_{r:n} = x is the distribution of
(a) the (s − r)th order statistic Y_{s−r:n−r} of n − r i.i.d. random variables with d.f. F_{(x,∞)} (the truncation of F on the left of x) if 1 ≤ r < s ≤ n,
(b) the sth order statistic Y_{s:r−1} of r − 1 i.i.d. random variables with d.f. F_{(−∞,x)} (the truncation of F on the right of x) if 1 ≤ s < r ≤ n,
(c) a degenerate r.v. with fixed value x if r = s.
(ii) More generally, if in (i) X_{s:n} is replaced by
(a) (X_{r+1:n}, …, X_{s:n}), r < s ≤ n, then in (i)(a) Y_{s−r:n−r} has to be replaced by (Y_{1:n−r}, …, Y_{s−r:n−r}),
(b) (X_{1:n}, …, X_{s:n}), 1 ≤ s < r, then in (i)(b) Y_{s:r−1} has to be replaced by (Y_{1:r−1}, …, Y_{s:r−1}).
(iii) The conditional distribution of X_{r+1:n}, …, X_{s−1:n} given X_{r:n} = x and X_{s:n} = y is the distribution of the order statistic (Y_{1:s−r−1}, …, Y_{s−r−1:s−r−1}) of s − r − 1 i.i.d. random variables with d.f. F_{(x,y)} (the truncation of F on the left of x and on the right of y).
(iv) (Markov property) The conditional distribution of X_{s:n} given X_{1:n} = x₁, …, X_{s−1:n} = x_{s−1} is the conditional distribution of X_{s:n} given X_{s−1:n} = x_{s−1}. Hence, the sequence X_{1:n}, …, X_{n:n} has the Markov property.
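Example 1.8.3(i)(a) can be verified directly from densities when F is the uniform d.f. on (0, 1): dividing the joint density of (X_{r:n}, X_{s:n}) by the density of X_{r:n} must reproduce the density of the (s − r)th order statistic of n − r i.i.d. random variables uniform on (x, 1). A quick numerical check in plain Python (the chosen values of r, s, n, x are arbitrary):

```python
from math import factorial

def joint_rs(x, t, r, s, n):
    # joint density of (X_{r:n}, X_{s:n}) for uniform(0,1) samples, x < t
    c = factorial(n) / (factorial(r - 1) * factorial(s - r - 1) * factorial(n - s))
    return c * x**(r - 1) * (t - x)**(s - r - 1) * (1 - t)**(n - s)

def marg_r(x, r, n):
    # density of X_{r:n} for uniform(0,1) samples
    c = factorial(n) / (factorial(r - 1) * factorial(n - r))
    return c * x**(r - 1) * (1 - x)**(n - r)

def trunc_os(t, x, j, m):
    # density of the jth order statistic of m i.i.d. uniform(x,1) r.v.'s
    c = factorial(m) / (factorial(j - 1) * factorial(m - j))
    return c * (t - x)**(j - 1) * (1 - t)**(m - j) / (1 - x)**m

r, s, n, x = 2, 4, 6, 0.3
for t in (0.4, 0.6, 0.8):
    cond = joint_rs(x, t, r, s, n) / marg_r(x, r, n)
    trunc = trunc_os(t, x, s - r, n - r)
    print(round(cond, 12), round(trunc, 12))   # the two columns agree
```

The agreement is exact up to floating-point error, since the two expressions coincide algebraically.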
The Conditional Distribution of Exceedances
Let again X_{i:n} be the ith order statistic of n i.i.d. random variables ξ₁, …, ξₙ with common continuous d.f. F. As a special case of Example 1.8.3(ii) we obtain the following result concerning the k largest order statistics: The conditional distribution of (X_{n−k+1:n}, …, X_{n:n}) given X_{n−k:n} = x is the distribution of the order statistic (Y_{1:k}, …, Y_{k:k}) of k i.i.d. random variables η₁, …, η_k with common d.f. F_{(x,∞)}.
By rearranging X_{n−k+1:n}, …, X_{n:n} in the original order of their outcome we obtain the k exceedances, say, ζ₁, …, ζ_k of the r.v.'s ξ₁, …, ξₙ over the "random threshold" X_{n−k:n}.
We have (ζ₁, …, ζ_k) = (ξ_{i(1)}, …, ξ_{i(k)}) whenever 1 ≤ i(1) < … < i(k) ≤ n and min(ξ_{i(1)}, …, ξ_{i(k)}) > X_{n−k:n}. This defines the exceedances ζ_i with probability one because F is assumed to be continuous.
Corollary 1.8.4. Let α(F) < x < ω(F). The conditional distribution of the exceedances ζ₁, …, ζ_k given X_{n−k:n} = x is the joint distribution of k i.i.d. random variables η₁, …, η_k with common d.f. F_{(x,∞)} (the truncation of the d.f. F on the left of x).
PROOF. Let S_k be the permutation group on {1, …, k}. For every permutation τ ∈ S_k we get the representation

(ζ₁, …, ζ_k) = (X_{n−τ(1)+1:n}, …, X_{n−τ(k)+1:n})

on the set A_τ where

A_τ = {(R_{i(1)}, …, R_{i(k)}) = τ for some 1 ≤ i(1) < … < i(k) ≤ n}

and (R₁, …, Rₙ) is the rank statistic (see P.1.30). Check that P(A_τ) = 1/k! for every τ ∈ S_k. Using the fact that the order statistic and the rank statistic are independent we obtain for every Borel set B

P((ζ₁, …, ζ_k) ∈ B|X_{n−k:n} = x)
 = Σ_{τ∈S_k} P(A_τ ∩ {(X_{n−τ(1)+1:n}, …, X_{n−τ(k)+1:n}) ∈ B}|X_{n−k:n} = x)
 = (1/k!) Σ_{τ∈S_k} P((X_{n−τ(1)+1:n}, …, X_{n−τ(k)+1:n}) ∈ B|X_{n−k:n} = x)
 = (1/k!) Σ_{τ∈S_k} P{(Y_{τ(1):k}, …, Y_{τ(k):k}) ∈ B}

where the Y_{i:k} are the order statistics of the r.v.'s η_j. The last step follows from Example 1.8.3(ii). By P.1.30,

P((ζ₁, …, ζ_k) ∈ B|X_{n−k:n} = x) = P{(η₁, …, η_k) ∈ B}.

The proof is complete. □
Extensions of Corollary 1.8.4 can be found in P.1.33 and P.2.1.
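A central step of the proof is that P(A_τ) = 1/k!, i.e. that the k exceedances occur in each of the k! relative time orders with equal probability. A seeded simulation sketch (the values of n, k and the number of repetitions are arbitrary):

```python
import random
from collections import Counter
from itertools import permutations

random.seed(4)
n, k, reps = 6, 3, 30000
counts = Counter()
for _ in range(reps):
    xs = [random.random() for _ in range(n)]
    thresh = sorted(xs)[n - k - 1]              # X_{n-k:n}
    exceed = [v for v in xs if v > thresh]      # exceedances, time order
    # relative order (rank pattern) of the k exceedances
    pattern = tuple(sorted(range(k), key=lambda i: exceed[i]))
    counts[pattern] += 1
for p in permutations(range(k)):
    print(p, counts[p] / reps)                  # each close to 1/k! = 1/6
```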
Convex Combination of Two Order Statistics
From Example 1.8.3(i) we deduce the following result which will further be
pursued in Section 6.2.
Corollary 1.8.5. Let F be a continuous d.f., and let 1 ≤ r < s ≤ n. Then, for every p and t,

P{(1 − p)X_{r:n} + pX_{s:n} ≤ t} = F_{r:n}(t) − ∫_{(−∞,t]} P{p(Y_{s−r:n−r} − x) > t − x} dF_{r:n}(x)

where F_{r:n} is the d.f. of X_{r:n}, and Y_{s−r:n−r} is the (s − r)th order statistic of n − r i.i.d. random variables with common d.f. F_{(x,∞)} [the truncation of F on the left of x].
This identity shows that it is possible to get an approximation to the d.f. of the convex combination of two order statistics by using approximations to distributions of single order statistics.
In Section 6.2 we shall study the special case of the convex combination of consecutive order statistics X_{r:n} and X_{r+1:n} where X_{r:n} is a central order statistic and, thus, Y_{s−r:n−r} is a sample minimum.
PROOF OF COROLLARY 1.8.5. Example 1.8.3(i) implies that

P{(1 − p)X_{r:n} + pX_{s:n} ≤ t} = ∫_{(−∞,t]} P{(1 − p)x + pY_{s−r:n−r} ≤ t} dF_{r:n}(x)
 = ∫_{(−∞,t]} P{p(Y_{s−r:n−r} − x) ≤ t − x} dF_{r:n}(x)

since P{Y_{s−r:n−r} ≤ x} = 0. This implies the assertion. □
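For the uniform d.f. both sides of the identity in Corollary 1.8.5 can be evaluated numerically: the left-hand side by a seeded Monte Carlo run, the right-hand side by combining the binomial formula for the d.f. of a uniform order statistic with a midpoint quadrature for the integral with respect to dF_{r:n}. The following is a sketch; all numerical parameters are arbitrary choices:

```python
import random
from math import comb

def os_cdf(y, j, m):
    # P{U_{j:m} <= y} for m i.i.d. uniform(0,1) r.v.'s (binomial formula)
    y = min(max(y, 0.0), 1.0)
    return sum(comb(m, i) * y**i * (1 - y)**(m - i) for i in range(j, m + 1))

def rhs(r, s, n, p, t, grid=4000):
    # F_{r:n}(t) minus the integral over (0, t] of
    # P{p(Y_{s-r:n-r} - x) > t - x} with respect to dF_{r:n}(x)
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) * t / grid
        dens = r * comb(n, r) * x**(r - 1) * (1 - x)**(n - r)
        # Y_{s-r:n-r} lives on (x, 1); rescale to a uniform(0,1) order statistic
        y = x + (t - x) / p
        tail = 1.0 - os_cdf((y - x) / (1 - x), s - r, n - r)
        total += tail * dens * (t / grid)
    return os_cdf(t, r, n) - total

random.seed(5)
r, s, n, p, t = 2, 4, 6, 0.4, 0.5
reps = 40000
hits = 0
for _ in range(reps):
    xs = sorted(random.random() for _ in range(n))
    hits += (1 - p) * xs[r - 1] + p * xs[s - 1] <= t
print(hits / reps, rhs(r, s, n, p, t))   # the two values nearly agree
```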
P.1. Problems and Supplements
Let ξ₁, …, ξₙ be i.i.d. random variables with common d.f. F, and let X_{r:n} denote the rth order statistic.
1. Prove that the order statistic is measurable.
2. Denote by I(q) the set of all q-quantiles of F. If r(n)/n → q as n → ∞ then X_{r(n):n} ∈ U, eventually, w.p. 1 for every open interval U containing I(q).
3. Denote by Sₙ the group of permutations on {1, …, n}.
(i) For every function f,

Σ_{τ∈Sₙ} f(X_{τ(1):n}, …, X_{τ(n):n}) = Σ_{τ∈Sₙ} f(ξ_{τ(1)}, …, ξ_{τ(n)}).

(ii) Using the notation of (1.1.4),

Z_{r:n}(ξ₁, …, ξₙ) = Z_{r:n}(ξ_{τ(1)}, …, ξ_{τ(n)})

(that is, the order statistic is invariant w.r.t. the permutation of the given r.v.'s).
4. (i) A d.f. F is continuous if F⁻¹ is strictly increasing.
(ii) F⁻¹ is continuous if F is strictly increasing on (α(F), ω(F)).
(iii) Denote by F_z the truncation of the d.f. F on the left of z. Prove that

F_z⁻¹(q) = F⁻¹[(1 − F(z))q + F(z)].
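The identity in (iii) is easy to sanity-check numerically; the sketch below uses the standard exponential d.f., for which F and F⁻¹ are available in closed form, and inverts the truncated d.f. by bisection (an illustration, not part of the original problem):

```python
from math import log, exp

def F(x):            # standard exponential d.f.
    return 1 - exp(-x)

def Finv(q):         # its quantile function
    return -log(1 - q)

def trunc_quantile(q, z):
    # quantile of F truncated on the left of z, found by bisection on
    # F_z(x) = (F(x) - F(z)) / (1 - F(z)), x >= z
    lo, hi = z, z + 60.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if (F(mid) - F(z)) / (1 - F(z)) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = 1.3
for q in (0.1, 0.5, 0.9):
    direct = trunc_quantile(q, z)
    closed = Finv((1 - F(z)) * q + F(z))   # identity of (iii)
    print(round(direct, 8), round(closed, 8))   # the two columns agree
```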
5. Let η be a (0, 1)-valued r.v. with d.f. F. Then, G⁻¹(η) has the d.f. F∘G for every d.f. G.
6. Let η be a r.v. with uniform distribution on the interval (u₁, u₂) where 0 ≤ u₁ < u₂ ≤ 1. Let F be a d.f. and put vᵢ = F⁻¹(uᵢ) [with the convention that F⁻¹(0) = α(F) and F⁻¹(1) = ω(F)]. Then, F⁻¹(η) has the d.f.

G(x) = (F(x) − F(v₁))/(F(v₂) − F(v₁)).

7. Let F and G be d.f.'s. If F(x) ≥ G(x) for every x ≤ u then F⁻¹(q) ≤ G⁻¹(q) for every q ≤ G(u).
8. Let ξᵢ, i = 1, 2, 3, … be r.v.'s which weakly converge to ξ₀. Then, there exist r.v.'s ξᵢ* such that ξᵢ =d ξᵢ* and ξᵢ*, i = 1, 2, 3, … converge pointwise to ξ₀* w.p. 1. [Hint: Use Lemma 1.2.9.]
9. For the beta d.f. I_{r,s} with parameters r and s [compare with (1.3.8)] the following recurrence relation holds:

(r + s)I_{r,s} = rI_{r+1,s} + sI_{r,s+1}.
10. (Joint d.f. of two order statistics)
Let X_{i:n} be the ith order statistic of n i.i.d. random variables with common d.f. F.
(i) If 1 ≤ r < s ≤ n then for u < v,

P{X_{r:n} ≤ u, X_{s:n} ≤ v} = Σ_{i=r}^{n} Σ_{j=max(0,s−i)}^{n−i} [n!/(i!j!(n − i − j)!)] F(u)^i (F(v) − F(u))^j (1 − F(v))^{n−i−j}

and for u ≥ v,

P{X_{r:n} ≤ u, X_{s:n} ≤ v} = P{X_{s:n} ≤ v}.

[Hint: Use the fact that Σ_{k=1}^{n} (1_{(−∞,u]}(ξ_k), 1_{(u,v]}(ξ_k), 1_{(v,∞)}(ξ_k)) is a multinomial random vector.]
(ii) Denote again by I_{r,s} the beta d.f. Then for u < v, the probability in (i) admits the beta-integral representation

P{X_{r:n} ≤ u, X_{s:n} ≤ v} = [n!/((r − 1)!(s − r − 1)!(n − s)!)] ∫₀^{F(u)} ∫_x^{F(v)} x^{r−1}(y − x)^{s−r−1}(1 − y)^{n−s} dy dx.

(Wilks, 1962)
11. (Transformation theorem)
Let ν be a finite signed measure with density f. Let T be a strictly monotone, real-valued function defined on an open interval J. Assume that I = T(J) is an open interval and that the inverse S: I → J of T is absolutely continuous. Then |S′|(f∘S)1_I is a density of Tν (the measure induced by ν and T).
[Hint: Apply Hewitt & Stromberg, 1975, Corollary (20.5).]
12. Derive Theorem 1.3.2 from Theorem 1.4.1 by computing the density of the rth marginal distribution in the usual way by integration.
(Hájek & Šidák, 1967, pages 39, 78)
13. Extension of Theorem 1.4.1: Suppose that the random vector (ξ₁, …, ξₙ) has the (Lebesgue) density g. Then, the order statistic (X_{1:n}, …, X_{n:n}) has the density f_{1,…,n:n} given by

f_{1,…,n:n}(x) = Σ_{τ∈Sₙ} g(x_{τ(1)}, …, x_{τ(n)}),  x₁ < … < xₙ,

and = 0, otherwise (here Sₙ again denotes the permutation group).
(Hájek & Šidák, 1967, page 36)
14. For i = 1, 2 let X_{1:n}^{(i)}, …, X_{n:n}^{(i)} be the order statistics of n i.i.d. random variables with common continuous d.f. F_i. If the restrictions F₁|B_j and F₂|B_j are equal on the fixed measurable sets B_j, j = 1, …, k, then for every measurable set B ⊂ B₁ × … × B_k and 1 ≤ r₁ < … < r_k ≤ n:

P{(X_{r₁:n}^{(1)}, …, X_{r_k:n}^{(1)}) ∈ B} = P{(X_{r₁:n}^{(2)}, …, X_{r_k:n}^{(2)}) ∈ B}.
15. If the continuity condition in P.1.14 is omitted then the result remains valid if the sets B_j are open.
16. (Modifications of Malmquist's result)
Let 1 ≤ r₁ < … < r_k ≤ n.
(i) Prove that the following r.v.'s are independent:

1 − U_{r₁:n}, (1 − U_{r₂:n})/(1 − U_{r₁:n}), …, (1 − U_{r_k:n})/(1 − U_{r_{k−1}:n}).

Moreover,

(1 − U_{r_i:n})/(1 − U_{r_{i−1}:n}) =d U_{n−r_i+1:n−r_{i−1}}

for i = 1, …, k (with r₀ = 0 and U_{0:n} = 0).
(ii) Prove that the following r.v.'s are independent:

U_{r₁:n}/U_{r₂:n}, …, U_{r_k:n}/U_{r_{k+1}:n}.

Moreover,

U_{r_i:n}/U_{r_{i+1}:n} =d U_{r_i:r_{i+1}−1}

for i = 1, …, k (with r_{k+1} = n + 1 and U_{n+1:n} = 1).
(iii) Prove that the following r.v.'s are independent:

U_{r₁:n}, (U_{r₂:n} − U_{r₁:n})/(1 − U_{r₁:n}), …, (U_{r_k:n} − U_{r_{k−1}:n})/(1 − U_{r_{k−1}:n}).

Moreover,

(U_{r_i:n} − U_{r_{i−1}:n})/(1 − U_{r_{i−1}:n}) =d U_{r_i−r_{i−1}:n−r_{i−1}}

for i = 1, …, k (with r₀ = 0 and U_{0:n} = 0).
(iv) Prove that the following r.v.'s are independent:

(U_{r₂:n} − U_{r₁:n})/U_{r₂:n}, …, (U_{r_k:n} − U_{r_{k−1}:n})/U_{r_k:n}, 1 − U_{r_k:n}.

Moreover,

(U_{r_{i+1}:n} − U_{r_i:n})/U_{r_{i+1}:n} =d U_{r_{i+1}−r_i:r_{i+1}−1}

for i = 1, …, k (with r_{k+1} = n + 1 and U_{n+1:n} = 1).
17. Denote by ξᵢ independent standard normal r.v.'s. It is well known that (ξ₁² + ξ₂²)/2 is a standard exponential r.v. Prove that

(U_{1:n}, …, U_{n:n}) =d ((Σ_{i=1}^{2r} ξᵢ²)/(Σ_{i=1}^{2(n+1)} ξᵢ²))_{r=1}^{n}.
18. Let ξ₁, …, ξ_{k+1} be independent gamma r.v.'s with parameters s₁, …, s_{k+1}.
(i) Then, (ξᵢ/Σ_{j=1}^{k+1} ξⱼ)_{i=1}^{k} has a k-variate Dirichlet distribution with parameter vector (s₁, …, s_{k+1}).
(Wilks, 1962)
(ii) Show that for 0 = r₀ < r₁ < … < r_k < r_{k+1} = n + 1, the vector (U_{r₁:n} − U_{r₀:n}, …, U_{r_k:n} − U_{r_{k−1}:n}) (with U_{0:n} = 0) has a k-variate Dirichlet distribution with parameter vector (r₁ − r₀, …, r_{k+1} − r_k).
19. Let Fₙ denote the sample d.f. of n i.i.d. (0, 1)-uniformly distributed r.v.'s, and η₁, …, η_{n+1} independent standard exponential r.v.'s. Then,

Fₙ(t) =d n⁻¹ Σ_{i=1}^{n} 1_{(−∞,t]}(Σ_{j=1}^{i} ηⱼ/Σ_{j=1}^{n+1} ηⱼ).

20.
(i) Let X_{i:n} denote the ith order statistic of n i.i.d. random variables with common density f. As an extension of Theorem 1.6.1 one obtains that (X_{r:n} − X_{r−1:n})_{r=1}^{n} (with X_{0:n} = 0) has the density

x → n! ∏_{i=1}^{n} f(Σ_{j=1}^{i} xⱼ),  xᵢ > 0, i = 1, …, n,

and the density is zero, otherwise.
(ii) The density of (U_{r:n} − U_{r−1:n})_{r=1}^{n} is given by

x → n!  if xᵢ > 0, i = 1, …, n, and Σ_{j=1}^{n} xⱼ < 1,

and the density is zero, otherwise.
(iii) For 1 ≤ r < s ≤ n the density of (U_{r:n} − U_{r−1:n}, U_{s:n} − U_{s−1:n}) is given by

(x, y) → n(n − 1)(1 − x − y)^{n−2}  if x, y > 0 and x + y < 1,

and the density is zero, otherwise.
21. (Convolutions of gamma r.v.'s)
(i) Give a direct proof of Lemma 1.6.6 by induction over n and by using the convolution formula P{ξ + η ≤ t} = ∫ G(t − s) dF(s) where ξ and η are independent r.v.'s with d.f.'s G and F.
(ii) It is clear that ξ + η is a gamma r.v. with parameter m + n if ξ and η are gamma r.v.'s with parameters m and n.
22. Let α > 0 and i = 1 or i = 2. Prove that the sample minimum of n i.i.d. random variables with common generalized Pareto d.f. W_{i,α} has the d.f. W_{i,nα}.
23. Prove that

E U_{r:n}^{−j} = ∏_{m=1}^{j} (n − m + 1)/(r − m)  if 1 ≤ j < r.

[Hint: Use the method of the proof of Lemma 1.7.1.]
24. Put λ_r = r/(n + 1), U_{n+1:n} = 1 and U_{0:n} = 0. Prove that
(i)
if 1 ≤ r < s ≤ n + 1, and
(ii)
if 0 ≤ r < s ≤ n.
25. For 0 = r₀ < r₁ < … < r_k < r_{k+1} = n + 1 and reals aᵢ, i = 1, …, k,

Σ_{i=1}^{k+1} (rᵢ − r_{i−1})⁻¹ E(aᵢ(U_{rᵢ:n} − λ_{rᵢ}) − a_{i−1}(U_{r_{i−1}:n} − λ_{r_{i−1}}))²

where a₀ = a_{k+1} = 0.
26. Let X_{r:n} be the rth order statistic of n i.i.d. random variables with common d.f. F(x) = 1 − 1/log x for x ≥ e. Then, for every positive integer k,

E|X_{r:n}|^k = ∞.
27. For the order statistics X_{1:1} and X_{1:2} from the Pareto d.f. W_{1,1} we get

EX_{1:1} = ∞  and  EX_{1:2} = 2.
28. Let M_{r,n} be the randomized sample median as defined in (1.7.19) and

N_{r,n} = X_{r:n} 1_{(1/2,1)}(ϑ) + X_{n−r+1:n} 1_{(0,1/2]}(ϑ)

where ϑ is a (0, 1)-uniformly distributed r.v. that is independent of (ξ₁, …, ξₙ). Show that the distributions of M_{r,n} and N_{r,n} are equal.
29. (Conditional distribution of (ξ₁, …, ξₙ) given (X_{1:n}, …, X_{n:n}))
Let X_{i:n} be the order statistics of n i.i.d. random variables ξ₁, …, ξₙ. Let Sₙ denote the group of permutations on {1, …, n}. Then, the conditional distribution of (ξ₁, …, ξₙ) given (X_{1:n}, …, X_{n:n}) is defined by

P((ξ₁, …, ξₙ) ∈ A|(X_{1:n}, …, X_{n:n})) = (n!)⁻¹ Σ_{τ∈Sₙ} 1_A(X_{τ(1):n}, …, X_{τ(n):n}).

Thus, the conditional expectation of f(ξ₁, …, ξₙ) given (X_{1:n}, …, X_{n:n}) is defined by

E(f(ξ₁, …, ξₙ)|(X_{1:n}, …, X_{n:n})) = (n!)⁻¹ Σ_{τ∈Sₙ} f(X_{τ(1):n}, …, X_{τ(n):n}).
30. (Rank statistic and order statistic)
The rank of ξᵢ is defined by R_{i:n} = nFₙ(ξᵢ) where Fₙ is the sample d.f. based on ξ₁, …, ξₙ. Moreover, Rₙ = (R_{1:n}, …, R_{n:n}) is the rank statistic.
Suppose that (ξ₁, …, ξₙ) has the density g. Then:
(i) (ξ₁, …, ξₙ) = (X_{R_{1:n}:n}, …, X_{R_{n:n}:n}).
(ii) The conditional distribution of Rₙ given Xₙ = (X_{1:n}, …, X_{n:n}) is defined by

P(Rₙ = κ|Xₙ) = g(X_{κ(1):n}, …, X_{κ(n):n})/Σ_{τ∈Sₙ} g(X_{τ(1):n}, …, X_{τ(n):n})

for κ = (κ(1), …, κ(n)) ∈ Sₙ.
(iii) If, in addition, ξ₁, …, ξₙ are i.i.d. random variables then Rₙ and Xₙ are independent and P{Rₙ = κ} = 1/n! for every κ ∈ Sₙ.
(Hájek & Šidák, 1967, pages 36–38)
31. (Positive dependence of order statistics)
Let X_{i:n} denote the ith order statistic of n i.i.d. random variables with common continuous d.f. F. Assume that E|X_{i:n}| < ∞, E|X_{j:n}| < ∞ and E|X_{i:n}X_{j:n}| < ∞. Then, Cov(X_{i:n}, X_{j:n}) ≥ 0.
(Proved by P. Bickel (1967) under stronger conditions.)
32. (Conditional independence under Markov property)
Let Y₁, …, Yₙ be real-valued r.v.'s which possess the Markov property. Let 1 ≤ r₁ < … < r_k ≤ n. Then, conditioned on Y_{r₁}, …, Y_{r_k}, the random vectors (Y₁, …, Y_{r₁}), (Y_{r₁+1}, …, Y_{r₂}), …, (Y_{r_k+1}, …, Yₙ) are independent; that is, the product measure

P((Y₁, …, Y_{r₁}) ∈ ·|Y_{r₁}) × P((Y_{r₁+1}, …, Y_{r₂}) ∈ ·|(Y_{r₁}, Y_{r₂})) × … × P((Y_{r_k+1}, …, Yₙ) ∈ ·|Y_{r_k})

is the conditional distribution of (Y₁, …, Yₙ) given (Y_{r₁}, …, Y_{r_k}).
33. Let F, rᵢ, x, and F_{i,x} be as in Theorem 1.8.1.
(i) For i ∈ I := {j: 1 ≤ j ≤ k + 1, r_j − r_{j−1} > 1} define the random vector (ζ_{r_{i−1}+1}, …, ζ_{r_i−1}) by the original r.v.'s ξⱼ lying strictly between X_{r_{i−1}:n} and X_{r_i:n}, in the original order of the outcome.
Then, the conditional distribution of (ζ_{r_{i−1}+1}, …, ζ_{r_i−1}), i ∈ I, given X_{r₁:n} = x_{r₁}, …, X_{r_k:n} = x_{r_k} is the joint distribution of the independent random vectors (η_{r_{i−1}+1}, …, η_{r_i−1}), i ∈ I, where for every i ∈ I the components of the vector are i.i.d. with common d.f. F_{i,x}.
(ii) Notice that

(ζ_{r_{i−1}+1}, …, ζ_{r_i−1}) = (ξ_{j(1)}, …, ξ_{j(r_i−r_{i−1}−1)})

whenever 1 ≤ j(1) < … < j(r_i − r_{i−1} − 1) ≤ n, and

X_{r_{i−1}:n} < min(ξ_{j(1)}, …, ξ_{j(r_i−r_{i−1}−1)}) ≤ max(ξ_{j(1)}, …, ξ_{j(r_i−r_{i−1}−1)}) < X_{r_i:n}.
34. (Conditional d.f. of exceedances)
Let Fₙ be the sample d.f. of r.v.'s with common uniform d.f. on (0, 1). nFₙ(t), 0 ≤ t ≤ 1, is a Markov process such that nFₙ(t), x₀ ≤ t ≤ 1, conditioned on nFₙ(x₀) = k, is distributed as k + (n − k)F*_{n−k}(t), x₀ ≤ t ≤ 1, where F*_{n−k} denotes the sample d.f. of n − k i.i.d. random variables uniformly distributed on (x₀, 1).
Bibliographical Notes
Ordering observations according to their magnitude and identifying central or extreme events is among the simplest of human activities. Thus, one can give early reference to the subject of order statistics by quotations from any number of ancient books. For example, J. Tiago de Oliveira gives reference
to the age of Methuselah (Genesis, The Bible) in the preface of Statistical Extremes and Applications (1984). By the way, Methuselah is reported to have lived 969 years. This should not merely be regarded as a curiosity but also as a comment indicating the difficulties for the proper choice of a model; here in connection with the question (compare with E.J. Gumbel (1933), Das Alter des Methusalem): Does the distribution of mortality have a bounded support?
An exhaustive chronological bibliography on order statistics of pre-1950 and 1950–1959 publications with summaries, references and citations has been compiled by L. Harter. The first relevant result is that of Nicolas Bernoulli (1709) which may be interpreted as the expectation of the maximum of uniform random variables.
In the early period, the sample median was of some importance because of
its property of minimizing the sum of absolute deviations. It is noteworthy
that Laplace (1818) proved the asymptotic normality of the sample median.
This result showed that the sample median, as an estimator of the center of
the normal distribution, is asymptotically inefficient w.r.t. the sample mean.
From our point of view, the statistical theory in the 19th century may be
characterized by (a) the widely accepted role of the normal distribution as a
"universal" law and (b) the beginning of a critical phase which arose from the
fact that extremes often do not fit that assumption. Extremes were regarded
as doubtful, outlying observations (outliers) which had to be rejected. The
attitude toward extremes at that time may be interpreted as an attempt to
"immunize" the normality assumption against experience.
Modern statistical theory is connected with the name of R.A. Fisher who in 1921 discussed the problem of outliers: "…, the rejection of observations is too crude to be defended; and unless there are other reasons for rejection than mere divergences from the majority, it would be more philosophical to accept these extreme values, not as gross errors, but as indications that the distribution of errors is not normal."
A paper by L. von Bortkiewicz in 1922 aroused the interest of some of his contemporaries (E.L. Dodd (1923), R. von Mises (1923), L.H.C. Tippett (1925)). Von Bortkiewicz studied the sample range of normal random variables. An important step toward the asymptotic theory of extremes was made by E.L. Dodd and R. von Mises. Both authors studied the asymptotic behavior of the sample maximum of normal and nonnormal random variables. The article of von Mises is written in a very attractive, modern style. Under weak regularity conditions, e.g. satisfied by the normal d.f., von Mises proved that the expectation of the sample maximum is asymptotically equal to F⁻¹(1 − 1/n); moreover, he proved that

P{|X_{n:n} − F⁻¹(1 − 1/n)| ≤ ε} → 1,  n → ∞,

for every ε > 0.
A similar result was also deduced by Dodd for various classes of distributions. This development culminated in the article of R.A. Fisher and L.H.C. Tippett (1928), who derived the three types of extreme value distributions and discussed the stability problem. The limiting d.f. G_{1,α} was independently discovered by M. Fréchet (1927). As mentioned by Wilks (1948), Fréchet's result and that of Fisher and Tippett actually appeared almost simultaneously in 1928.
We mention some of the early results obtained for central order statistics: In 1902, K. Pearson derived the expectation of a spacing under a continuous d.f. (Galton difference problem) and, in 1920, investigated the performance of "systematic statistics" as estimators of the median by computing asymptotic expectations and covariances of sample quantiles. Craig (1932) established densities of sample quantiles in special cases. Thompson (1936) treated confidence intervals for the q-quantile. Compared to the development in extreme value theory the results concerning central order statistics were obtained more sporadically than systematically.
It is clear that the considerations in this book concerning exact distributions of order statistics are not exhaustive. For example, it is worthwhile studying distributions of order statistics in the discrete case as was done by Nagaraja (1982, 1986), Arnold et al. (1984), and Rüschendorf (1985a). B.C. Arnold and his coauthors showed that order statistics of a sample of size n ≥ 3 possess the Markov property if, and only if, there does not exist an atom x of the underlying d.f. F such that 0 < F(x−) and F(x) < 1. In that paper one may also find expressions for the density of order statistics in the discrete case. We also note that densities of order statistics in case of a random sample size are given in an explicit form by Consul (1984); see also Smith (1984, pages 631, 632). Further results concerning exact distributions of order statistics may be found in the books mentioned below.
Apart from the books of E.J. Gumbel (1958), L. de Haan (1970), H.A. David (1981), J. Galambos (1987), M.R. Leadbetter et al. (1983), and S.I. Resnick (1987), mentioned in the various sections, we refer to the books of Johnson and Kotz (1970, 1972) (order statistics for special distributions), Barnett and Lewis (1978) (outliers), and R.R. Kinnison (1985) (applied aspects of extreme value theory). The reading of survey articles about order statistics written by S.S. Wilks (1948), A. Rényi (1953), and J. Galambos (1984) can be highly recommended. For an elementary, enjoyable introduction to classical results of extreme value theory we refer to de Haan (1976).
CHAPTER 2
Multivariate Order Statistics
This chapter is primarily concerned with the marginal ordering of the observations. Thus, the restriction to one component again leads to the order statistics dealt with in Chapter 1. Our treatment of multivariate order statistics will not be as exhaustive as that in the univariate case because of the technical difficulties and the complicated formulae for d.f.'s and densities.
There is one exception, namely, the case of multivariate maxima of i.i.d. random vectors with d.f. F. This case is comparatively easy to deal with since the d.f. of the multivariate maximum is again given by Fⁿ, and the density is consequently of a simple form.
2.1. Introduction
Multivariate order statistics (including extremes) will be defined by taking order statistics componentwise (in other words, we consider marginal ordering). It is by no means self-evident that order statistics and extremes should be defined in this particular way, and we do not deny that other definitions of multivariate order statistics are perhaps of equal importance. Some other possibilities will be indicated at the end of this section. One reason why our emphasis is laid on this particular definition is that it fits favorably to our present program and purposes.
In the sequel, the relations and arithmetic operations are always taken componentwise. Given x = (x₁, …, x_d) and y = (y₁, …, y_d) we write

x ≤ y  if  xᵢ ≤ yᵢ,  i = 1, …, d,  (2.1.1)
and
(2.1.2)
The Definition of Multivariate Order Statistics
Let ξ_1, …, ξ_n be n random vectors of dimension d where ξ_i = (ξ_{i,1}, ξ_{i,2}, …, ξ_{i,d}). The ordered values of the jth components ξ_{1,j}, ξ_{2,j}, …, ξ_{n,j} are denoted by

X^{(j)}_{1:n} ≤ X^{(j)}_{2:n} ≤ ⋯ ≤ X^{(j)}_{n:n}.  (2.1.3)

Using the map Z_{r:n} as defined in (1.1.4) we have

X^{(j)}_{r:n} = Z_{r:n}(ξ_{1,j}, ξ_{2,j}, …, ξ_{n,j}).  (2.1.4)

We also write

X_{r:n} = (X^{(1)}_{r:n}, X^{(2)}_{r:n}, …, X^{(d)}_{r:n}).  (2.1.5)

Using the order relation as defined in (2.1.1) we obtain

X_{1:n} ≤ X_{2:n} ≤ ⋯ ≤ X_{n:n}.  (2.1.6)

Notice that

X_{1:n} = (X^{(1)}_{1:n}, X^{(2)}_{1:n}, …, X^{(d)}_{1:n})  (2.1.7)

is the d-variate sample minimum, and

X_{n:n} = (X^{(1)}_{n:n}, X^{(2)}_{n:n}, …, X^{(d)}_{n:n})  (2.1.8)

is the d-variate sample maximum.
Observe that realizations of X_{j:n} are in general not realizations of ξ_1, …, ξ_n.
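The componentwise definition above is straightforward to realize numerically. The following sketch (my own illustration in NumPy, not from the text) sorts a sample of random vectors coordinate by coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
xi = rng.normal(size=(n, d))     # rows are the random vectors xi_1, ..., xi_n

# Marginal (componentwise) ordering: sort each column separately.
X = np.sort(xi, axis=0)          # row r-1 holds X_{r:n} = (X^(1)_{r:n}, ..., X^(d)_{r:n})

X_min, X_max = X[0], X[-1]       # d-variate sample minimum / maximum, cf. (2.1.7), (2.1.8)

# The rows are ordered componentwise, cf. (2.1.6).
assert np.all(X[:-1] <= X[1:])

# In general a row of X is NOT one of the original observations xi_i.
assert np.allclose(X_max, xi.max(axis=0))
```

The last line makes the remark above concrete: X_{n:n} collects the d coordinatewise maxima, which typically come from different observations.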
The Relation to Frequencies
For certain problems the results of the previous sections can easily be extended to the multivariate setup. As an example we mention that (1.1.7) implies that

P{X_{r:n} ≤ t} = P{ Σ_{i=1}^n (1_{(−∞,t_1]}(ξ_{i,1}), …, 1_{(−∞,t_d]}(ξ_{i,d})) ≥ r }  (2.1.9)

where t = (t_1, t_2, …, t_d) and r = (r, r, …, r). Notice that in (2.1.9) we obtain a sum of independent random vectors if the random vectors ξ_1, ξ_2, …, ξ_n are independent. It takes no effort to extend (2.1.9) to any subclass of the r.v.'s X^{(j)}_{r:n}. For I ⊂ {(j, r): j = 1, …, d and r = 1, …, n} we have

P{X^{(j)}_{r:n} ≤ t_{j,r}, (j, r) ∈ I} = P{ Σ_{i=1}^n 1_{(−∞,t_{j,r}]}(ξ_{i,j}) ≥ r, (j, r) ∈ I }.  (2.1.10)

Thus, again the joint distribution of the r.v.'s X^{(j)}_{r:n}, (j, r) ∈ I, can be represented by means of the distribution of a sum of independent random vectors if the random vectors ξ_1, …, ξ_n are independent. Note that a similar result holds if maxima

X^{(1)}_{n(1):n(1)}, …, X^{(d)}_{n(d):n(d)}

are treated with different sample sizes for each component.
Further Concepts of Multivariate Ordering
A particular characteristic of univariate order statistics was that the ordered values no longer contain any information about the order of their outcome. Recall that this information is presented by the rank statistic R_n (see P.1.30). The corresponding general formulation of this aspect in Euclidean d-space is given by the definition of the order statistic via sets of observations. Thus, given r.v.'s or random vectors ξ_1, …, ξ_n we may also call the set {ξ_1, …, ξ_n} the order statistic. It is well known that for i.i.d. random vectors these random sets form a minimal sufficient statistic.
Other concepts are more closely related to ordering according to the magnitude of the observations, as in the univariate case. Our enthusiasm for this topic is rather limited because no successful theory exists (besides the particular case of sample maxima and sample minima as defined in (2.1.7) and (2.1.8)). However, this topic has met increasing interest since Barnett's brilliant paper of 1976, which is full of ideas, suggestions, and applications.
Some brief comments about the different concepts of multivariate ordering:
(a) The convex hull of the data points and the subsequent "peeling" of the multidimensional sample provides one possibility of a multivariate ordering. This concept is appealing from a geometric point of view. The convex hull can e.g. be used as an estimator of the support of the distribution.
(b) The concomitants are obtained (in the bivariate case) by arranging the data in the second component according to the ordering in the first component.
(c) The multivariate sample median is a solution of the equation

Σ_{i=1}^n ||x_i − x||_2 = min! (in x)  (2.1.11)

where || · ||_2 denotes the Euclidean norm. The median of a multivariate probability measure Q is defined by

∫ ||y − x||_2 dQ(y) = min! (in x).  (2.1.12)
Total ψ-Ordering
Last but not least, we mention the ordering of multivariate data according to the ranking method everyone is familiar with from daily life. The importance of this concept is apparent.
Following Plackett (1976) we introduce a total order of the points x_1, …, x_n by means of a real-valued function ψ. Define

x ≤ y  (2.1.13)

if

ψ(x) ≤ ψ(y).  (2.1.14)

Usually one is not only interested in the ranking of the data x_1, …, x_n expressed in the numbers 1, …, n but also in the total information contained in x_1, …, x_n, thus getting the representation of the original data by

x_{1:n}, …, x_{n:n}.  (2.1.15)

One advantage of this type of ordering compared to the marginal ordering is that x_{i:n} is a point of the original sample. It is clear that the ordering (2.1.15) heavily depends on the selection procedure represented by the function ψ.
As an example, consider the function ψ(x) = ||x − x_0||_2. Other reasonable functions ψ may be found in Barnett (1976) and Plackett (1976). Given the random vectors ξ_1, …, ξ_n let

X_{1:n}, …, X_{n:n}  (2.1.16)

denote the ψ-order statistics defined according to (2.1.15) with ψ(x) = ||x − x_0||_2. Define

R_{k:n} = ||X_{k:n} − x_0||_2,  (2.1.17)

which is the distance of the kth ψ-order statistic from the center x_0.
Obviously,

R_{k:n}  (2.1.18)

is the kth order statistic of the n i.i.d. univariate r.v.'s ||ξ_1 − x_0||_2, …, ||ξ_n − x_0||_2 with common d.f. F(x_0, ·), where

F(x_0, r) = P{ξ_1 ∈ B(x_0, r)}.  (2.1.19)

Here

B(x_0, r) = {x: ||x − x_0||_2 ≤ r}

is the ball with center x_0 and radius r.
Notice that the probability

P{X_{k:n} ∈ B(x_0, r)}  (2.1.20)

may easily be computed since this quantity is equal to P{R_{k:n} ≤ r}.
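For ψ(x) = ||x − x_0||_2 the ψ-ordering amounts to sorting the sample by distance from x_0. A small illustration (my own, in NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 2
xi = rng.normal(size=(n, d))
x0 = np.zeros(d)                          # center of the ordering

# psi-ordering with psi(x) = ||x - x0||_2: sort the sample by distance from x0.
dist = np.linalg.norm(xi - x0, axis=1)
X_psi = xi[np.argsort(dist)]              # psi-order statistics X_{1:n}, ..., X_{n:n}

# R_{k:n} = ||X_{k:n} - x0||_2 is the kth order statistic of the distances, cf. (2.1.18).
R = np.linalg.norm(X_psi - x0, axis=1)
assert np.allclose(R, np.sort(dist))

# In contrast to the marginal ordering, every X_{k:n} is an original observation.
assert all(any(np.array_equal(row, v) for v in xi) for row in X_psi)
```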
We also mention a result related to that of Corollary 1.8.4 in the univariate case.
By rearranging X_{n−k+1:n}, …, X_{n:n} in the original order of their outcome we obtain the k exceedances, say, ξ*_1, …, ξ*_k of the random vectors ξ_1, …, ξ_n. It is well known that the conditional distribution of the exceedances ξ*_1, …, ξ*_k given R_{n−k:n} = r is the joint distribution of k i.i.d. random vectors η_1, …, η_k with common distribution equal to the original distribution of ξ_1 truncated outside of

C(x_0, r) = {x: ||x − x_0||_2 > r}.  (2.1.21)
The author is grateful to Peter Hall for communicating a three-line sketch of the proof of this result. An extension can be found in P.2.1.
If F(x_0, ·) is continuous then we deduce from Theorem 1.5.1 that for the ψ-maximum X_{n:n} the following identities hold:

P{X_{n:n} ∈ B} = ∫ P(X_{n:n} ∈ B | R_{n−1:n}) dP
= n(n − 1) ∫ P{ξ_1 ∈ B ∩ C(x_0, r)} F(x_0, r)^{n−2} dF(x_0, r).  (2.1.22)

The construction in (2.1.16) can be generalized to the case where x_0 is replaced by a random vector ξ_0, leading to the kth ordered distance r.v. R_{k:n} as studied in Dziubdziela (1976) and Reiss (1985b). Now the ranking is carried out according to the random function ψ(x) = ||x − ξ_0||_2. A possible application of such a concept is the definition of a trimmed mean (2.1.23) centered at the random vector ξ_0.
2.2. Distribution Functions and Densities
From (2.1.9) and (2.1.10) it is obvious that the joint d.f. of order statistics X^{(j)}_{r:n} can be established by means of multinomial probabilities of appropriate "cell frequency vectors" N_1, …, N_k where N_j = Σ_{i=1}^n 1_{R_j}(ξ_i) and the R_1, …, R_k form a partition of Euclidean d-space.
The D.F. of Multivariate Extremes
Let ξ, ξ_1, ξ_2, …, ξ_n be i.i.d. random vectors. We start with a simple result concerning the d.f. of multivariate order statistics. For the sample maximum X_{n:n} based on ξ_1, ξ_2, …, ξ_n we obtain as an extension of (1.3.2) that

P{X_{n:n} ≤ t} = F^n(t).  (2.2.1)

This becomes obvious by writing

P{X_{n:n} ≤ t} = P{X^{(1)}_{n:n} ≤ t_1, …, X^{(d)}_{n:n} ≤ t_d}
= P{max(ξ_{1,1}, …, ξ_{n,1}) ≤ t_1, …, max(ξ_{1,d}, …, ξ_{n,d}) ≤ t_d}
= P{ξ_1 ≤ t, …, ξ_n ≤ t} = F^n(t).
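The identity P{X_{n:n} ≤ t} = F^n(t) can be checked by simulation; the sketch below (my own, assuming independent uniform components so that F(t) = t_1 t_2) compares the empirical probability with F^n(t):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 5, 200_000                  # sample size n, number of Monte Carlo repetitions N
t = np.array([0.9, 0.8])

# N independent samples of n bivariate vectors with independent uniform components,
# so F(t) = t_1 * t_2 and P{X_{n:n} <= t} = F(t)^n by (2.2.1).
xi = rng.random((N, n, 2))
X_max = xi.max(axis=1)             # componentwise sample maxima
emp = np.mean(np.all(X_max <= t, axis=1))
exact = (t[0] * t[1]) ** n

assert abs(emp - exact) < 0.005
```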
The extension of (2.2.1) to the case of i.n.n.i.d. r.v.'s is straightforward. Moreover, in analogy to (2.2.1) one gets for the sample minimum X_{1:n} the formula

P{X_{1:n} > t} = L(t)^n  (2.2.2)

where L(t) = P{ξ > t} is the survivor function.
For d = 2, the following representation of the bivariate survivor function holds:

L(x, y) = P{ξ > (x, y)} = 1 − F_1(x) − F_2(y) + F(x, y)

with F_i denoting the marginal d.f.'s of F. Hence,

F(x, y) = 1 − (1 − F_1(x)) − (1 − F_2(y)) + L(x, y).

An extension of this representation to the d-variate d.f. may be found in P.2.5.
Formula (2.2.2) in conjunction with (1.3.3) yields

P{X_{1:n} ≤ (x, y)} = 1 − (1 − F_1(x))^n − (1 − F_2(y))^n + L(x, y)^n.  (2.2.3)
If a d.f. on Euclidean d-space has d continuous partial derivatives then we know (see e.g. Bhattacharya and Rao (1976), Theorem A.2.2) that the dth partial derivative ∂^d F/(∂t_1 ⋯ ∂t_d) is a density of F. Thus, if f is a density of F then, for d = 2,

f_{(n,n):n} = nF^{n−1}f + n(n − 1)F^{n−2} (∂F/∂x)(∂F/∂y)  (2.2.4)

is the density of the sample maximum X_{n:n} = (X^{(1)}_{n:n}, X^{(2)}_{n:n}) for n ≥ 2.
The density of the sample minimum X_{1:n} = (X^{(1)}_{1:n}, X^{(2)}_{1:n}) is given by

nL^{n−1}f + n(n − 1)L^{n−2} (∂L/∂x)(∂L/∂y).  (2.2.5)

For an extension and a reformulation of (2.2.4) we refer to (2.2.7) and (2.2.8).
The D.F. of Bivariate Order Statistics
The exact joint d.f. and joint density of order statistics X^{(j)}_{r:n} can be established via multinomial random vectors. The joint distribution of X^{(1)}_{r:n} and X^{(2)}_{s:n} will be examined in detail.
Let again ξ_i = (ξ_{i,1}, ξ_{i,2}), i = 1, …, n, be independent copies of the random vector ξ = (ξ_1, ξ_2) with common d.f. F and marginals F_i. Thus, F(x, y) = P{ξ ≤ (x, y)}, F_1(x) = P{ξ_1 ≤ x} and F_2(y) = P{ξ_2 ≤ y}.
A partition of the plane into the four quadrants

R_1 = (−∞, x] × (−∞, y],  R_2 = (−∞, x] × (y, ∞),
R_3 = (x, ∞) × (−∞, y],  R_4 = (x, ∞) × (y, ∞)
(where the dependence of R_j on (x, y) will be suppressed) leads to the configuration of the four quadrants around the point (x, y).
Put

L_j(x, y) = P{ξ ∈ R_j},  j = 1, …, 4.

Notice that L_4 is the bivariate survivor function as mentioned above. We have

L_1(x, y) = F(x, y)

and hence

L_2(x, y) = F_1(x) − F(x, y),  L_3(x, y) = F_2(y) − F(x, y),

and as noted above

L_4(x, y) = 1 − F_1(x) − F_2(y) + F(x, y).

Denote by

N_j = Σ_{i=1}^n 1_{R_j}(ξ_i)

the frequency of the ξ_i in R_j. From (1.1.7) it is immediate that

F_{(r,s):n}(x, y) = P{X^{(1)}_{r:n} ≤ x, X^{(2)}_{s:n} ≤ y}
= P{ Σ_{i=1}^n 1_{(−∞,x]}(ξ_{i,1}) ≥ r, Σ_{i=1}^n 1_{(−∞,y]}(ξ_{i,2}) ≥ s }
= P{N_1 + N_2 ≥ r, N_1 + N_3 ≥ s}
= Σ_{k=r}^n Σ_{l=s}^n Σ_m P{N_1 = m, N_2 = k − m, N_3 = l − m}.

Inserting the probabilities of the multinomial random vector (N_1, N_2, N_3, N_4) we get

Lemma 2.2.1. The d.f. F_{(r,s):n} of (X^{(1)}_{r:n}, X^{(2)}_{s:n}) is given by

F_{(r,s):n} = Σ_{k=r}^n Σ_{l=s}^n Σ_{m=max(k+l−n,0)}^{min(k,l)} n! L_1^m L_2^{k−m} L_3^{l−m} L_4^{n−k−l+m} / [m!(k − m)!(l − m)!(n − k − l + m)!].
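Lemma 2.2.1 can be evaluated directly once the quadrant probabilities L_1, …, L_4 are known; the following sketch (my own, assuming independent uniform components so the quadrant probabilities factor) compares the multinomial sum with a Monte Carlo estimate:

```python
import numpy as np
from math import factorial

def F_rs_n(r, s, n, L1, L2, L3, L4):
    """D.f. of (X^(1)_{r:n}, X^(2)_{s:n}) via the multinomial formula of Lemma 2.2.1."""
    total = 0.0
    for k in range(r, n + 1):
        for l in range(s, n + 1):
            for m in range(max(k + l - n, 0), min(k, l) + 1):
                coef = factorial(n) // (
                    factorial(m) * factorial(k - m)
                    * factorial(l - m) * factorial(n - k - l + m)
                )
                total += coef * L1**m * L2**(k - m) * L3**(l - m) * L4**(n - k - l + m)
    return total

# Independent uniform components: F(x, y) = x * y, so the quadrant probabilities are
x, y, n, r, s = 0.6, 0.7, 5, 2, 4
L1 = x * y; L2 = x * (1 - y); L3 = (1 - x) * y; L4 = (1 - x) * (1 - y)
exact = F_rs_n(r, s, n, L1, L2, L3, L4)

rng = np.random.default_rng(3)
xi = rng.random((100_000, n, 2))
ordered = np.sort(xi, axis=1)
emp = np.mean((ordered[:, r - 1, 0] <= x) & (ordered[:, s - 1, 1] <= y))
assert abs(emp - exact) < 0.01
```

(For n = 1 the sum collapses to the single term L_1 = F(x, y), as it must.)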
The Density of Bivariate Order Statistics
If F_{(r,s):n} possesses two partial derivatives, one may use the representation (∂²/∂x∂y)F_{(r,s):n} of the density of F_{(r,s):n}; however, it is difficult to arrange the terms in an appropriate way.
A different method will allow us to compute the density of (X^{(1)}_{r:n}, X^{(2)}_{s:n}) under the condition that F has a density, say, f. To make the proof rigorous one has to use the Radon–Nikodym theorem and Lebesgue's differentiation theorem for integrals.
In a first step we shall prove that a density of F_{(r,s):n} exists if F has a density. Notice that for every Borel set B we have

P{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ B} ≤ Σ_{i,j=1}^n P{(ξ_{i,1}, ξ_{j,2}) ∈ B}
= Σ_{i≠j} ∫_B f_1(x) f_2(y) dx dy + Σ_{i=j} ∫_B f(x, y) dx dy

where f_1 = ∫ f(·, v) dv and f_2 = ∫ f(u, ·) du are the densities of F_1 and F_2. Thus, if B has Lebesgue measure zero then P{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ B} = 0, and hence the Radon–Nikodym theorem implies that F_{(r,s):n} has a (Lebesgue) density.
The proof of Lemma 2.2.2 below will be based on the fact that for every integrable function g on Euclidean k-space almost all x = (x_1, …, x_k) are Lebesgue points of g, that is,

lim_{h↓0} (2h)^{−k} ∫_{x_1−h}^{x_1+h} ⋯ ∫_{x_k−h}^{x_k+h} g(z) dz = g(x)  (2.2.6)

for (Lebesgue) almost all x (see e.g. Floret (1981), page 276).
The following lemma was established in cooperation with W. Kohne.
Lemma 2.2.2. If the bivariate i.i.d. random vectors ξ_1, ξ_2, …, ξ_n have the common density f then the random vector (X^{(1)}_{r:n}, X^{(2)}_{s:n}) has the density

f_{(r,s):n} = n! Σ_{m=0}^{n} (L_1^m / m!)
× [ L_2^{r−1−m} L_3^{s−1−m} L_4^{n−r−s+m+1} f / ((r − 1 − m)!(s − 1 − m)!(n − r − s + m + 1)!)
+ L_2^{r−2−m} L_3^{s−1−m} L_4^{n−r−s+m+1} L_5 L_6 / ((r − 2 − m)!(s − 1 − m)!(n − r − s + m + 1)!)
+ L_2^{r−2−m} L_3^{s−2−m} L_4^{n−r−s+m+2} L_5 L_8 / ((r − 2 − m)!(s − 2 − m)!(n − r − s + m + 2)!)
+ L_2^{r−1−m} L_3^{s−1−m} L_4^{n−r−s+m} L_6 L_7 / ((r − 1 − m)!(s − 1 − m)!(n − r − s + m)!)
+ L_2^{r−1−m} L_3^{s−2−m} L_4^{n−r−s+m+1} L_7 L_8 / ((r − 1 − m)!(s − 2 − m)!(n − r − s + m + 1)!) ]

with the convention that terms involving negative factorials are replaced by zero. The functions L_1, …, L_4 are defined as above. Moreover,

L_5(x, y) = ∫_{−∞}^x f(u, y) du,  L_7(x, y) = ∫_x^∞ f(u, y) du,
L_6(x, y) = ∫_y^∞ f(x, v) dv,  L_8(x, y) = ∫_{−∞}^y f(x, v) dv.

Notice that Σ_{m=0}^{n} can be replaced by Σ_{m=0}^{min(r,s)−1}.
PROOF. Put S_0 = S_{0,h}(x, y) = (x − h, x + h] × (y − h, y + h], where the indices h, x, y will be suppressed as far as no confusion can arise. According to (2.2.6) it suffices to show that

(2h)^{−2} P{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ S_0} → f_{(r,s):n}(x, y)  (1)

as h ↓ 0 for almost all (x, y). To compute P{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ S_0} we shall make use of the following configuration: besides the square S_0 there are the four strips of width 2h,

S_5 = (−∞, x − h] × (y − h, y + h],  S_6 = (x − h, x + h] × (y + h, ∞),
S_7 = (x + h, ∞) × (y − h, y + h],  S_8 = (x − h, x + h] × (−∞, y − h],

and the four remaining "quadrants" S_1, …, S_4 (the sets R_1, …, R_4 with the strips removed).
Put N_j = Σ_{i=1}^n 1_{S_j}(ξ_i) and q_j = P{ξ ∈ S_j} = ∫_{S_j} f(u, v) du dv for 0 ≤ j ≤ 8. Obviously, q_j → L_j as h ↓ 0 for j = 1, …, 4. Moreover, by applying (2.2.6) it is straightforward to prove that almost everywhere

(2h)^{−1} q_j → L_j,  j = 5, …, 8,  and  (2h)^{−2} q_0 → f.  (2)

First, observe that for all (x, y) such that (2) holds we have

h^{−2} P{N_0 ≥ 2} → 0,  h^{−2} P{N_0 = 1, Σ_{j=5}^8 N_j ≥ 1} → 0,

and

h^{−2} P{ Σ_{j=5}^8 N_j ≥ 3 } → 0

as h ↓ 0, and hence it remains to prove that, almost everywhere,

(2h)^{−2} [ P{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ S_0, N_0 = 1, N_5 = N_6 = N_7 = N_8 = 0}
+ P{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ S_0, N_0 = 0, Σ_{j=5}^8 N_j = 2} ] → f_{(r,s):n}(x, y).  (3)

Applying (1.1.7) we conclude that

{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ S_0}
= {x − h < X^{(1)}_{r:n} ≤ x + h, y − h < X^{(2)}_{s:n} ≤ y + h}
= { Σ_{i=1}^n 1_{(−∞,x−h]}(ξ_{i,1}) < r ≤ Σ_{i=1}^n 1_{(−∞,x+h]}(ξ_{i,1}),
Σ_{i=1}^n 1_{(−∞,y−h]}(ξ_{i,2}) < s ≤ Σ_{i=1}^n 1_{(−∞,y+h]}(ξ_{i,2}) }
= { N_1 + N_2 + N_5 < r ≤ N_1 + N_2 + N_5 + N_0 + N_6 + N_8,
N_1 + N_3 + N_8 < s ≤ N_1 + N_3 + N_8 + N_0 + N_5 + N_7 }.

Thus, for m = 0, …, n,

{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ S_0, N_0 = 1, N_5 = N_6 = N_7 = N_8 = 0, N_1 = m}
= {N_1 + N_2 < r ≤ N_1 + N_2 + 1, N_1 + N_3 < s ≤ N_1 + N_3 + 1,
N_0 = 1, N_5 = N_6 = N_7 = N_8 = 0, N_1 = m}
= {N_0 = 1, N_1 = m, N_2 = r − 1 − m, N_3 = s − 1 − m, N_5 = N_6 = N_7 = N_8 = 0}.  (4)

By (4) we also get for m = 0, …, n,

{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) ∈ S_0, N_0 = 0, Σ_{j=5}^8 N_j = 2, N_1 = m}  (5)
= {N_0 = 0, N_1 + N_2 + N_5 = r − 1, N_1 + N_3 + N_8 = s − 1, N_6 + N_8 = 1, N_5 + N_7 = 1, N_1 = m}
= {N_0 = 0, N_1 = m, N_2 = r − 2 − m, N_3 = s − 1 − m, N_5 = 1, N_6 = 1, N_7 = 0, N_8 = 0}
+ {N_0 = 0, N_1 = m, N_2 = r − 2 − m, N_3 = s − 2 − m, N_5 = 1, N_6 = 0, N_7 = 0, N_8 = 1}
+ {N_0 = 0, N_1 = m, N_2 = r − 1 − m, N_3 = s − 1 − m, N_5 = 0, N_6 = 1, N_7 = 1, N_8 = 0}
+ {N_0 = 0, N_1 = m, N_2 = r − 1 − m, N_3 = s − 2 − m, N_5 = 0, N_6 = 0, N_7 = 1, N_8 = 1}.  (6)

Now (3) is immediate from (2), (4), (5), and (6). The proof is complete.
In the special case of the sample maximum (that is, r = n and s = n) we have

f_{(n,n):n} = nF^{n−1}f + n(n − 1)F^{n−2} L_5 L_8,  (2.2.7)

which is a generalization of (2.2.4) in the bivariate case. If the partial derivatives exist then f = ∂²F/∂x∂y,

L_5(x, y) = ∫_{−∞}^x f(u, y) du = (∂F/∂y)(x, y),

and

L_8(x, y) = ∫_{−∞}^y f(x, v) dv = (∂F/∂x)(x, y).

Let ξ = (ξ_1, ξ_2) again be a random vector with d.f. F and density f. Let f_1(x) = ∫ f(x, v) dv and f_2(y) = ∫ f(u, y) du be the marginal densities, and let

F_1(x|y) = P(ξ_1 ≤ x | ξ_2 = y) = L_5(x, y)/f_2(y)

and

F_2(y|x) = P(ξ_2 ≤ y | ξ_1 = x) = L_8(x, y)/f_1(x)

be the conditional d.f.'s. Now, (2.2.7) may be written

f_{(n,n):n}(x, y) = nF^{n−1}(x, y)f(x, y) + n(n − 1)F^{n−2}(x, y)F_1(x|y)F_2(y|x)f_1(x)f_2(y).  (2.2.8)
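For independent components the two sample maxima X^{(1)}_{n:n} and X^{(2)}_{n:n} are independent, so (2.2.7) must reduce to the product of the univariate maximum densities; a quick check (my own illustration, for uniform marginals where F(x, y) = xy):

```python
import numpy as np

n = 4
x, y = 0.3, 0.8

# Independent uniform components on (0,1): F(x, y) = x*y, f = 1,
# L5 = dF/dy = x, L8 = dF/dx = y.
F, f, L5, L8 = x * y, 1.0, x, y

# Density of the bivariate sample maximum via (2.2.7):
dens = n * F**(n - 1) * f + n * (n - 1) * F**(n - 2) * L5 * L8

# Under independence X^(1)_{n:n} and X^(2)_{n:n} are independent with marginal
# densities n*x^(n-1) and n*y^(n-1); the two expressions agree.
assert np.isclose(dens, (n * x**(n - 1)) * (n * y**(n - 1)))
```

Indeed, algebraically n(xy)^{n−1} + n(n − 1)(xy)^{n−2}xy = n²(xy)^{n−1}.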
The Partial Maxima Process
A vector of extremes with different sample sizes in the different components has to be treated in connection with the partial maxima process X_n defined by

X_n(t) = (X_{⌊nt⌋:⌊nt⌋} − b_n)/a_n  (2.2.9)

for t > 0, where the reals b_n and a_n > 0 are appropriate normalizing constants. In order to calculate the finite dimensional marginal d.f.'s of X_n one needs the following.

Lemma 2.2.3. Let 1 ≤ s_1 < s_2 < ⋯ < s_k be integers and ξ_1, …, ξ_{s_k} i.i.d. random variables with common d.f. F. Then,

P{X_{s_1:s_1} ≤ x_1, …, X_{s_k:s_k} ≤ x_k} = F^{s_1}(y_1) F^{s_2−s_1}(y_2) ⋯ F^{s_k−s_{k−1}}(y_k)

where y_j = min(x_j, x_{j+1}, …, x_k).

PROOF. Obvious by noting that

{X_{s_1:s_1} ≤ x_1, …, X_{s_k:s_k} ≤ x_k} = {X_{s_1:s_1} ≤ y_1, …, X_{s_k:s_k} ≤ y_k}.  (2.2.10)

We remark that a corresponding formula for sample minima can be established via the equality

{X_{1:s_1} > x_1, …, X_{1:s_k} > x_k} = {min(ξ_1, …, ξ_{s_1}) > y_1, …, min(ξ_{s_{k−1}+1}, …, ξ_{s_k}) > y_k}  (2.2.11)

where now y_j = max(x_1, …, x_j) is replaced accordingly by the largest relevant bound.
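Lemma 2.2.3 is easy to verify by simulation; the following sketch (my own, for uniform r.v.'s where F(t) = t) checks the product formula for k = 2:

```python
import numpy as np

rng = np.random.default_rng(4)
s1, s2 = 3, 7
x1, x2 = 0.5, 0.8
y1, y2 = min(x1, x2), x2            # y_j = min(x_j, ..., x_k) as in Lemma 2.2.3

N = 200_000
xi = rng.random((N, s2))            # uniform(0,1), so F(t) = t on [0,1]
M1 = xi[:, :s1].max(axis=1)         # X_{s1:s1}
M2 = xi.max(axis=1)                 # X_{s2:s2}

emp = np.mean((M1 <= x1) & (M2 <= x2))
exact = y1**s1 * y2**(s2 - s1)      # F^{s1}(y1) * F^{s2-s1}(y2)
assert abs(emp - exact) < 0.01
```

The factorization reflects the independence of the blocks ξ_1, …, ξ_{s_1} and ξ_{s_1+1}, …, ξ_{s_2}.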
Multivariate Extreme Value Distributions
In Section 1.3 we mentioned that the limiting (thus, also stable) d.f.'s of the univariate maximum X_{n:n} are the Fréchet, Weibull, and Gumbel d.f.'s G_{i,α}. The situation in the multivariate case is much more complex. First, we mention two trivial examples of limiting multivariate d.f.'s.

EXAMPLES 2.2.4. Let X_{n:n} = (X^{(1)}_{n:n}, …, X^{(d)}_{n:n}) be the sample maximum based on i.i.d. random vectors ξ_1, …, ξ_n which are distributed like ξ = (η_1, …, η_d).

(i) (Complete dependence)
Our first example concerns the case where the components η_1, …, η_d of ξ are identical; i.e. we have η_1 = η_2 = ⋯ = η_d. Let F_1 denote the d.f. of η_1. Then, the d.f. F of ξ is given by F(t) = F_1(min(t_1, …, t_d)) and hence

P{X_{n:n} ≤ t} = F^n(t) = F_1^n(min(t_1, …, t_d)).  (2.2.12)

If F_1 = G_{i,α} then with c_n and d_n as in (1.3.13):

F^n(c_n t_1 + d_n, …, c_n t_d + d_n) = G_{i,α}^n(c_n min(t_1, …, t_d) + d_n) = G_{i,α}(min(t_1, …, t_d)).

(ii) (Independence)
Secondly, assume that the components η_1, …, η_d of ξ are independent. Then it is clear that X^{(1)}_{n:n}, …, X^{(d)}_{n:n} are independent. If G_{i(j),α(j)} is the d.f. of η_j then with c_{n,j} and d_{n,j} as in (1.3.13):

F^n(c_{n,1}t_1 + d_{n,1}, …, c_{n,d}t_d + d_{n,d}) = ∏_{j=1}^d G_{i(j),α(j)}(t_j).  (2.2.13)

(iii) (Asymptotic independence)
Given ξ = (−η, η), we have

X_{n:n} = (X^{(1)}_{n:n}, X^{(2)}_{n:n}) = (−X_{1:n}, X_{n:n})

where X_{1:n} and X_{n:n} are the sample minimum and sample maximum based on the independent copies η_1, …, η_n of η. In Section 4.2 we shall see that X_{1:n} and X_{n:n} (and, thus, X^{(1)}_{n:n} and X^{(2)}_{n:n}) are asymptotically independent. Thus, again we are getting independent r.v.'s in the limit.
Contrary to the univariate case, the multivariate extreme value d.f.'s form a nonparametric family of distributions. There is a simple device which enables us to check whether a given d.f. is a multivariate extreme value d.f.
We say that a d-variate d.f. G is nondegenerate if the univariate marginals are nondegenerate. A nondegenerate d-variate d.f. G is a limiting d.f. of sample maxima if, and only if, G is max-stable, that is,

G^n(a_{n,1}t_1 + b_{n,1}, …, a_{n,d}t_d + b_{n,d}) = G(t),  n = 1, 2, 3, …,  (2.2.14)

for some normalizing constants a_{n,j} > 0 and b_{n,j} (compare e.g. with Galambos (1987), page 295, or Resnick (1987), Proposition 5.9).
If a d-variate d.f. is max-stable then it is easy to show that the univariate marginals are max-stable and, hence, these d.f.'s have to be of the type G_{1,α}, G_{2,α} or G_3 with α > 0.
On the other hand, if the jth univariate marginal d.f. is G_{i(j),α(j)} for j = 1, …, d, one can take the normalizing constants as given in (1.3.13) to verify the max-stability.
Again the transformation technique works: Let G be a stable d.f. with univariate marginals G_{i(j),α(j)} for j = 1, …, d. Writing again T_{i,α} = G_{i,α}^{−1} ∘ G_{2,1} we obtain that

G(T_{i(1),α(1)}(x_1), …, T_{i(d),α(d)}(x_d)),  x_1 < 0, …, x_d < 0,  (2.2.15)

defines a stable d.f. with univariate marginal d.f.'s G_{2,1} (the standard exponential d.f. on the negative halfline).
EXAMPLE 2.2.5. Check that

(i) G defined by

G(x, y) = G_{2,1}(x) G_{2,1}(y) exp(−xy/(x + y)),  x, y < 0,

is an extreme value d.f. with "negative" exponential marginals G_{2,1}, and

(ii) G(x, y) = exp(−e^{−x} − e^{−y} + (e^x + e^y)^{−1})

is the corresponding extreme value d.f. with Gumbel marginals.

A bivariate d.f. with marginals G_{2,1} is max-stable if and only if the Pickands (1981) representation holds; that is,

G(x, y) = exp( ∫_{[0,1]} min(ux, (1 − u)y) dν(u) ),  x, y < 0,  (2.2.16)

where ν is any finite measure having the property

∫_{[0,1]} u dν(u) = ∫_{[0,1]} (1 − u) dν(u) = 1.  (2.2.17)
Recall that the marginals are given by G_1(x) = lim_{y↑0} G(x, y) and G_2(y) = lim_{x↑0} G(x, y), and hence (2.2.17) immediately implies that, in fact, the marginals in (2.2.16) are equal to G_{2,1}.
If ν is the Dirac measure putting mass 2 on the point 1/2 then G(x, y) = exp(min(x, y)). If ν is concentrated on {0, 1} and puts masses 1 on the points 0 and 1 then G(x, y) = G_{2,1}(x) G_{2,1}(y).
The transformation technique immediately leads to the corresponding representations for marginals different from G_{2,1}. Check that e.g.

G(x, y) = exp( − ∫_{[0,1]} max(u e^{−x}, (1 − u) e^{−y}) dν(u) )  (2.2.18)

is the representation in the case of standard Gumbel marginals if again (2.2.17) holds.
For the extension of (2.2.16) to higher dimensions we refer to P.2.10.
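The max-stability (2.2.14) and the dependence-function form of Example 2.2.5(i) can be verified numerically; a small check (my own illustration):

```python
import numpy as np

def G(x, y):
    """Bivariate extreme value d.f. of Example 2.2.5(i); arguments x, y < 0."""
    return np.exp(x) * np.exp(y) * np.exp(-x * y / (x + y))

# Max-stability (2.2.14) with a_n = 1/n, b_n = 0: G^n(x/n, y/n) = G(x, y).
x, y = -0.8, -1.7
for n in (2, 5, 50):
    assert np.isclose(G(x / n, y / n) ** n, G(x, y))

# Dependence function d(w) = 1 - w + w^2 (cf. P.2.9): G(x, y) = exp[(x+y) d(x/(x+y))].
w = x / (x + y)
assert np.isclose(G(x, y), np.exp((x + y) * (1 - w + w**2)))
```

The second assertion uses the elementary identity x + y − xy/(x + y) = (x + y)(1 − w(1 − w)) with w = x/(x + y).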
Multivariate D.F.'s
This section will be concluded with some general remarks about multivariate d.f.'s.
First recall that multivariate d.f.'s are characterized by the following three properties:

(a) F is right continuous; that is, if x_n ↓ x_0 then F(x_n) ↓ F(x_0).
(b) F is normed; that is, if x_n = (x_{n,1}, …, x_{n,d}) are such that x_{n,i} ↑ ∞ for every i = 1, …, d then F(x_n) ↑ 1; moreover, if x_n ≥ x_{n+1} and x_{n,i} ↓ −∞ for some i ∈ {1, …, d} then F(x_n) → 0, n → ∞.
(c) F is Δ-monotone; that is, for all a = (a_1, …, a_d) and b = (b_1, …, b_d) with a ≤ b,

Δ_a^b F := Σ_{m∈{0,1}^d} (−1)^{d−Σ_{i=1}^d m_i} F(c_1(m_1), …, c_d(m_d)) ≥ 0,  (2.2.19)

where c_i(1) = b_i and c_i(0) = a_i.

Recall that if Q is the probability measure corresponding to F then Q(a, b] = Δ_a^b F.
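The Δ-monotonicity (2.2.19) can be evaluated mechanically by summing over the 2^d corners of the rectangle (a, b]; a sketch (my own, for the independent-uniform d.f. F(x) = x_1 x_2):

```python
import numpy as np
from itertools import product

def delta(F, a, b):
    """Delta_a^b F from (2.2.19): alternating sum over the 2^d corners of (a, b]."""
    d = len(a)
    total = 0.0
    for m in product((0, 1), repeat=d):
        corner = [b[i] if m[i] == 1 else a[i] for i in range(d)]
        total += (-1) ** (d - sum(m)) * F(corner)
    return total

# Independent uniforms on the unit square: F(x) = x_1 * x_2 on [0,1]^2.
F = lambda x: x[0] * x[1]
a, b = (0.2, 0.1), (0.7, 0.5)

# Delta_a^b F equals the rectangle probability Q(a, b] = (b1-a1)(b2-a2), hence >= 0.
val = delta(F, a, b)
assert np.isclose(val, (0.7 - 0.2) * (0.5 - 0.1))
assert val >= 0
```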
From the representations (2.2.16) and (2.2.17) we already know that multivariate extreme value d.f.'s are continuous. However, notice that the continuity is a simple consequence of the fact that the univariate marginal d.f.'s are continuous. This is immediate from inequality (2.2.20).

Lemma 2.2.6. Let F be a d-variate d.f. with univariate marginal d.f.'s F_i, i = 1, …, d. Then, for every x, y,

|F(x) − F(y)| ≤ Σ_{i=1}^d |F_i(x_i) − F_i(y_i)|.  (2.2.20)
PROOF. Let Q be the probability measure pertaining to F. Given x, y we write

B_i = (x_i, y_i] if x_i ≤ y_i,  and  B_i = (y_i, x_i] if x_i > y_i.

We get

|F(x) − F(y)| = | Σ_{i=1}^d [F(y_1, …, y_{i−1}, x_i, …, x_d) − F(y_1, …, y_i, x_{i+1}, …, x_d)] |
≤ Σ_{i=1}^d Q( ⨯_{j=1}^{i−1} (−∞, y_j] × B_i × ⨯_{j=i+1}^d (−∞, x_j] )
≤ Σ_{i=1}^d |F_i(x_i) − F_i(y_i)|.
P.2. Problems and Supplements
1. Let ξ_1, …, ξ_n be i.i.d. random vectors with common continuous d.f. F. For i ∈ I := {j: 1 ≤ j ≤ k + 1, r_j − r_{j−1} > 1} define the random vectors ξ*_{r_{i−1}+1}, …, ξ*_{r_i−1} by the original random vectors ξ_j (in the order of their outcome) which have the property

R_{r_{i−1}:n} < ||ξ_j − x_0||_2 < R_{r_i:n}

(with the convention that R_{r_0:n} = 0 and R_{r_{k+1}:n} = ∞). Then the conditional distribution of (ξ*_{r_{i−1}+1}, …, ξ*_{r_i−1}), i ∈ I, given R_{r_1:n} = z_1, …, R_{r_k:n} = z_k is the joint distribution of the independent random vectors (η_{r_{i−1}+1}, …, η_{r_i−1}), i ∈ I, where for every i ∈ I the components of the vector are i.i.d. random vectors with common distribution equal to the distribution of ξ_1 truncated to {x: z_{i−1} < ||x − x_0||_2 < z_i} with z_0 = 0 and z_{k+1} = ∞.
2. (Distribution of ψ-order statistics)
(i) Prove the analogue of (2.1.21) for the kth ψ-order statistic X_{k:n}.
(ii) (Problem) Derive the asymptotic distributions of central and extreme ψ-order statistics X_{k:n}.
(iii) (Problem) Derive the asymptotic distribution of the trimmed mean in (2.1.23) for different centering random vectors ξ_0.
3. Let A_i ∈ 𝒜, i = 1, …, n, and m ∈ {0, …, n}. With S_0 = 1 and

S_j = Σ_{1≤i_1<⋯<i_j≤n} P(A_{i_1} ∩ ⋯ ∩ A_{i_j}),  j = 1, …, n,

one gets

(i)

(ii)

(iii) P{ Σ_{i=1}^n 1_{A_i} = m } ≤ (≥) Σ_{j=m}^{k} (−1)^{j−m} binom(j, m) S_j  if k − m even (k − m odd).
4. (i)

P( ⋃_{i=1}^n A_i ) = Σ_{j=1}^n (−1)^{j−1} S_j.

(ii) (Bonferroni inequality)

P( ⋃_{i=1}^n A_i ) ≤ (≥) Σ_{j=1}^k (−1)^{j−1} S_j  if k odd (k even).
5. Let ξ = (ξ_1, …, ξ_d) be a random vector with d.f. F.
(i) Prove that

1 − F(t) = Σ_{j=1}^d (−1)^{j+1} h_j(t)

where

h_j(t) = Σ_{1≤i_1<⋯<i_j≤d} P{ξ_{i_1} > t_{i_1}, …, ξ_{i_j} > t_{i_j}}.

(ii) Moreover,

1 − F(t) ≤ (≥) Σ_{j=1}^k (−1)^{j+1} h_j(t)  if k odd (k even),  k = 1, …, d.

(iii) Find C > 0 such that for every positive integer n and x ∈ [0, 1],

exp(−nx) − Cn^{−1} ≤ (1 − x)^n ≤ exp(−nx).

(iv) Check that

F(t)^n ≤ exp( Σ_{j=1}^k (−1)^j n h_j(t) )  if k even or k = d.

Moreover, for some universal constant C > 0,

F(t)^n ≥ exp( Σ_{j=1}^k (−1)^j n h_j(t) ) − Cn^{−1}  if k odd or k = d.
6. (Uniform distribution on A = {(x, y): x, y ≥ 0, x + y ≤ 1})
The density f_{(n,n):n} of (X^{(1)}_{n:n}, X^{(2)}_{n:n}) under the uniform distribution on A is given by

f_{(n,n):n}(x, y) = 2nF^{n−1}(x, y)1_A(x, y) + 4n(n − 1)F^{n−2}(x, y) min(x, 1 − y) min(1 − x, y)

for 0 ≤ x, y ≤ 1, where F is the underlying d.f. given by

F(x, y) = 2xy if x + y ≤ 1,  and  F(x, y) = 2xy − (x + y − 1)² if x + y ≥ 1,

for 0 ≤ x, y ≤ 1.
7. Let the underlying density be given by f(x, y) = x + y for 0 ≤ x, y ≤ 1 and f(x, y) = 0 otherwise. Then, the d.f. F is given by

F(x, y) = xy(x + y)/2,  0 ≤ x, y ≤ 1.

The density f_{(n,n):n} of (X^{(1)}_{n:n}, X^{(2)}_{n:n}) is given by

f_{(n,n):n}(x, y) = nF^{n−1}(x, y)f(x, y) + n(n − 1)F^{n−2}(x, y)(xy + x²/2)(xy + y²/2)

for 0 ≤ x, y ≤ 1.
8. (Problem) Let (ξ_1, ξ_2) be a random vector with continuous d.f. F. Denote by F_1 and F_2 the d.f.'s of ξ_1 and ξ_2. Extend (2.2.8) to

P{(X^{(1)}_{n:n}, X^{(2)}_{n:n}) ∈ B} = ∫_B nF^{n−1}(x, y) dF(x, y)
+ ∫_B n(n − 1)F^{n−2}(x, y)F_1(x|y)F_2(y|x) d(F_1 × F_2)(x, y).
9. (i) Prove that a bivariate extreme value d.f. G with standard "negative" exponential marginals (see (2.2.16)) can be written

G(x, y) = exp[ (x + y) d(x/(x + y)) ],  x, y < 0,

where the "dependence function" d is given by

d(w) = ∫_{[0,1]} max(u(1 − w), (1 − u)w) dν(u)

and ν is a finite measure on [0, 1] satisfying condition (2.2.17).
(ii) Check that d(0) = d(1) = 1. Moreover, d ≡ 1 under independence and d(w) = max(1 − w, w) under complete dependence.
(iii) Check that d(w) = 1 − w + w² in Example 2.2.5(ii).
10. A d-variate d.f. with marginals G_{2,1} is max-stable if, and only if,

G(x) = exp( ∫_S min(u_1 x_1, …, u_d x_d) dμ(u) )

where μ is a finite measure on the d-variate unit simplex

S := {u: Σ_{i=1}^d u_i = 1, u_i ≥ 0}

having the property

∫_S u_i dμ(u) = 1  for i = 1, …, d.

(Pickands, 1981; for the proof see Galambos, 1987)
11. (Pickands estimator of the dependence function)
(i) Let (η_1, η_2) have the d.f. G as given in P.2.9(i). Prove that for every t < 0 and w ∈ (0, 1),

P{ max(η_1/w, η_2/(1 − w)) ≤ t } = exp[t d(w)].

(ii) Let (η_{1,i}, η_{2,i}), i = 1, …, n, be i.i.d. random vectors with common d.f. G as given in P.2.9(i). Define

1/d̂_n(w) = −n^{−1} Σ_{i=1}^n max(η_{1,i}/w, η_{2,i}/(1 − w))

as an estimator of the dependence function d. Prove that

E(1/d̂_n(w)) = 1/d(w)  and  Var(1/d̂_n(w)) = 1/(n d(w)²).
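Under independence the dependence function is d ≡ 1, which makes P.2.11 easy to test by simulation; the sketch below is my own illustration of the estimator (using that −max(η_1/w, η_2/(1 − w)) is exponentially distributed with parameter d(w)):

```python
import numpy as np

rng = np.random.default_rng(5)
n, w = 100_000, 0.3

# Independent components (dependence function d == 1): eta_j = -E_j with E_j ~ Exp(1),
# so eta_j has the "negative" exponential d.f. G_{2,1}(t) = exp(t), t < 0.
eta1 = -rng.exponential(size=n)
eta2 = -rng.exponential(size=n)

# Estimator of 1/d(w): the r.v.'s -max(eta1/w, eta2/(1-w)) are Exp(d(w)) distributed.
inv_d_hat = -np.mean(np.maximum(eta1 / w, eta2 / (1 - w)))

assert abs(inv_d_hat - 1.0) < 0.02   # d(w) = 1 under independence
```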
12. (Multivariate transformation technique)
Let ξ = (ξ_1, …, ξ_d) be a random vector with continuous d.f. F. We use the notation

F_i(· | x_{i−1}, …, x_1) = P(ξ_i ≤ · | ξ_{i−1} = x_{i−1}, …, ξ_1 = x_1)

for the conditional d.f. of ξ_i given ξ_{i−1} = x_{i−1}, …, ξ_1 = x_1.
(i) Put

T(x) = (T_1(x), …, T_d(x)) = (F_1(x_1), F_2(x_2 | x_1), …, F_d(x_d | x_{d−1}, …, x_1)).

Prove that T_1(ξ), …, T_d(ξ) are i.i.d. (0, 1)-uniformly distributed r.v.'s.
(ii) Define T^{−1}(q) = (S_1(q), …, S_d(q)) by

S_1(q) = F_1^{−1}(q_1),
S_i(q) = F_i^{−1}(q_i | S_{i−1}(q), …, S_1(q))  for i = 2, …, d.

Prove that P{T^{−1}(T(ξ)) = ξ} = 1. Moreover, if η_1, …, η_d are i.i.d. (0, 1)-uniformly distributed r.v.'s then T^{−1}(η_1, …, η_d) has the d.f. F.
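The transformation T of P.2.12 can be illustrated with the density f(x, y) = x + y of P.2.7; the code below (my own sketch; the rejection sampler and the helper names F1, F2_cond are not from the text) checks that T(ξ) has independent uniform components:

```python
import numpy as np

rng = np.random.default_rng(6)

# Example density (from P.2.7): f(x, y) = x + y on the unit square.
# Marginal and conditional d.f.'s needed for the transformation T:
F1 = lambda x: (x**2 + x) / 2                            # d.f. of xi_1
F2_cond = lambda y, x: (x * y + y**2 / 2) / (x + 0.5)    # d.f. of xi_2 given xi_1 = x

# Draw from f by rejection sampling (f <= 2 on the unit square).
N = 400_000
u = rng.random((N, 3))
keep = u[:, 2] * 2.0 <= u[:, 0] + u[:, 1]
x, y = u[keep, 0], u[keep, 1]

# T(xi) = (F1(xi_1), F2(xi_2 | xi_1)) should yield two independent uniform(0,1) r.v.'s.
q1, q2 = F1(x), F2_cond(y, x)
assert abs(q1.mean() - 0.5) < 0.01 and abs(q2.mean() - 0.5) < 0.01
assert abs(np.corrcoef(q1, q2)[0, 1]) < 0.01
```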
13. Compute the probability

P{X_{n:n} = ξ_j for some j ∈ {1, …, n}}.
Bibliographical Notes
It is likely that Gini and Galvani (1929) were the first to consider the bivariate median defined by the property of minimizing the sum of the deviations w.r.t. the Euclidean norm (see (2.1.11)). This is the "spatial" median as dealt with by Oja and Niinimaa (1985). In that paper the asymptotic performance of a "generalized sample median" as an estimator of the symmetry center of a multivariate normal distribution is investigated. Another notable article related to this is Isogai (1985).
The result concerning the conditional distribution of exceedances (see
(2.1.21)) and its extension in P.2.1 was e.g. applied by Moore and Yackel (1977)
and Hall (1983) in connection with nearest neighbor density estimators;
however, a detailed proof does not seem to exist.
New insight into the asymptotic, stochastic behavior of the convex hull of data points has been obtained by the recent work of Eddy and Gale (1981) and Brozius and de Haan (1987). This approach connects the asymptotic treatment of convex hulls with that of multivariate extremes (w.r.t. the marginal ordering).
For a different representation of the density of multivariate order statistics
we refer to Galambos (1975).
In the multivariate setup we only made use of the transformation technique to transform a multivariate extreme value d.f. to a d.f. with predetermined margins. P.2.12 describes the multivariate transformation technique as developed by Rosenblatt (1952), O'Reilly and Quesenberry (1973), Raoult et al. (1983), and Rüschendorf (1985b). It does not seem to be possible to make this technique applicable to multivariate order statistics (with the exception of concomitants).
Further references concerning multivariate order statistics will be given in
Chapter 7.
CHAPTER 3
Inequalities and the Concept of Expansions
In order to obtain rough estimates of probabilities of certain events which involve order statistics, we shall apply exponential bound theorems. These bounds correspond to those for sums of independent r.v.'s. In Section 3.1 such bounds are established in the particular case of order statistics of i.i.d. random variables with common uniform d.f. on (0, 1). This section also contains two applications to moments of order statistics.
Apart from the basic notion of expansions of finite length, Section 3.2 will provide some useful auxiliary results for the treatment of expansions.
In Parts II and III of this volume we shall make extensive use of inequalities for the distance between probability measures. As pointed out before, the variational distance will be central to our investigations. However, we shall also need the Hellinger distance, a weighted L_2-distance (in other words, the χ²-distance), and the Kullback–Leibler distance.
In Section 3.3 our main interest will be focused on bounds for the distance between product measures via the distance between the single components. We shall start with some results connected to the Scheffé lemma.
3.1. Inequalities for Distributions of Order Statistics
In this section we deduce exponential bounds for the distributions of order statistics from the corresponding result for binomial r.v.'s. By applying this result we shall also obtain bounds for moments of order statistics.
Let us start with the following well-known exponential bound (see Loève (1963), page 255) for the distribution of sums of i.i.d. random variables ξ_1, …, ξ_n with Eξ_i = 0 and |ξ_i| ≤ 1: We have
P{ Σ_{i=1}^n ξ_i ≥ ετ_n } ≤ exp(−εt + (3/4)t²)  (3.1.1)

for every ε ≥ 0 and 0 ≤ t ≤ τ_n, where τ_n² = Σ_{i=1}^n Eξ_i². Because of relation (1.1.8) between distributions of order statistics and binomial probabilities one can expect that a result similar to (3.1.1) also holds for order statistics in place of sums.
Exponential Bounds for Order Statistics of Uniform R.V.'s
First, our result will be formulated for order statistics U_{1:n} ≤ ⋯ ≤ U_{n:n} of i.i.d. random variables η_i which are uniformly distributed on (0, 1). The transformation technique leads to the general case of i.i.d. random variables with common d.f. F.
Lemma 3.1.1. For every ε ≥ 0 and r ∈ {1, …, n} we have

P{ ±(n^{1/2}/σ)(U_{r:n} − μ) ≥ ε } ≤ exp( −ε²/(3(1 + ε/(σn^{1/2}))) )  (3.1.2)

where μ = r/(n + 1) and σ² = μ(1 − μ).

PROOF. (I) First, we prove the upper bound of P{(n^{1/2}/σ)(U_{r:n} − μ) ≤ −ε}. W.l.g. assume that α := μ − εσ/n^{1/2} > 0; otherwise, the upper bound in (3.1.2) is trivial. In particular, α ∈ (0, 1). By (1.1.8), putting ε_0 = (r − nα)/(nα(1 − α))^{1/2} and ξ_i = 1_{(−∞,α]}(η_i) − α, we get

P{(n^{1/2}/σ)(U_{r:n} − μ) ≤ −ε} = P{ Σ_{i=1}^n 1_{(−∞,α]}(η_i) ≥ r }
= P{ Σ_{i=1}^n ξ_i ≥ r − nα } ≤ exp(−ε_0 t + (3/4)t²)

if 0 ≤ t ≤ (nα(1 − α))^{1/2}, where the last step is an application of (3.1.1) to the ξ_i and ε = ε_0. It is easy to see that t = 2ε(α(1 − α))^{1/2}/(3σ(1 + ε/(σn^{1/2}))) fulfills the condition 0 ≤ t ≤ (nα(1 − α))^{1/2}. Moreover, −ε_0 t + (3/4)t² ≤ −ε²/(3(1 + ε/(σn^{1/2}))) since ε_0 ≥ εσ/(α(1 − α))^{1/2} and α(1 − α)/σ² ≤ 1 + ε/(σn^{1/2}). This proves the first inequality.

(II) Secondly, recall that U_{r:n} =_d 1 − U_{n−r+1:n} (see Example 1.2.2); hence we obtain from part (I) that

P{(n^{1/2}/σ)(U_{r:n} − μ) ≥ ε} = P{(n^{1/2}/σ)(1 − U_{n−r+1:n} − μ) ≥ ε}
= P{(n^{1/2}/σ)(U_{n−r+1:n} − (n − r + 1)/(n + 1)) ≤ −ε}
≤ exp(−ε²/(3(1 + ε/(σn^{1/2})))).
The right-hand side of (3.1.2) can be written in a simpler form for a special choice of ε. We have

P{ [n^{1/2}/max{σ, (6s(log n)/n)^{1/2}}] |U_{r:n} − μ| ≥ (6s log n)^{1/2} } ≤ 2n^{−s}.  (3.1.3)

Moreover, a crude estimate is obtained by

P{ (n^{1/2}/σ)|U_{r:n} − μ| ≥ ε } ≤ 2 exp(−ε/5),  ε ≥ 0.  (3.1.4)

Notice that 2 exp(−ε/5) ≥ 1, and hence (3.1.4) is trivial, whenever ε ≤ 5 log 2. It is apparent that (3.1.4) is weaker than (3.1.3) for small and moderate ε.
As a supplement to Lemma 3.1.1 we shall prove another bound of P{U_{r:n} ≤ δ} that is sharp for small δ > 0. Note that P{U_{r:n} ≤ δ} ↓ 0 as δ ↓ 0; however, this cannot be deduced from Lemma 3.1.1.
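The bounds (3.1.2) and (3.1.4) can be compared with the empirical tail of U_{r:n}; the following sketch is my own illustration (the two-sided bound is taken as twice the one-sided one):

```python
import numpy as np

rng = np.random.default_rng(7)
n, r, eps = 50, 10, 3.0
mu = r / (n + 1)
sigma = np.sqrt(mu * (1 - mu))

# Empirical tail probability of the standardized uniform order statistic U_{r:n}.
N = 200_000
U = np.sort(rng.random((N, n)), axis=1)[:, r - 1]
Z = np.sqrt(n) / sigma * (U - mu)
tail = np.mean(np.abs(Z) >= eps)

# Two-sided versions of the exponential bound (3.1.2) and the crude estimate (3.1.4).
bound_312 = 2 * np.exp(-eps**2 / (3 * (1 + eps / (sigma * np.sqrt(n)))))
bound_314 = 2 * np.exp(-eps / 5)
assert tail <= bound_312 <= bound_314
```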
Lemma 3.1.2. If U_{r:n} and μ are as above then for every ε ≥ 0:

P{U_{r:n} ≤ με} ≤ e^{1/r}(eε)^r/(2πr)^{1/2}.

PROOF. From Theorem 1.3.2 and Stirling's formula we get

P{U_{r:n} ≤ με} = [n!/((r − 1)!(n − r)!)] ∫_0^{με} x^{r−1}(1 − x)^{n−r} dx
≤ [n^r/(r − 1)!] ∫_0^{rε/n} x^{r−1} dx = (r^r/r!)ε^r
= (exp(r + θ(r)/r)/(2πr)^{1/2})ε^r

where |θ(r)| < 1. Now the proof can easily be completed.
Extension to the General Case
The investigation of exponential bounds for distributions of order statistics will be continued in Section 4.7 where local limit results are established. To prove these results we need, however, the inequalities above. The extension of inequality (3.1.2) to arbitrary d.f.'s is accomplished by means of Corollary 1.2.7. For order statistics X_{1:n}, ..., X_{n:n} of n i.i.d. random variables with common d.f. F we have

  P{[n^{1/2}g(μ)/σ](X_{r:n} − F^{−1}(μ)) ≥ ε} ≤ P{(n^{1/2}/σ)(U_{r:n} − μ) ≥ h(ε)}  (3.1.5)

where g(μ) is a nonnegative constant and h(x) = (n^{1/2}/σ)[F(F^{−1}(μ) + xσ/(g(μ)n^{1/2})) − μ]. Thus, upper bounds for the left-hand side of (3.1.5) can be deduced from (3.1.2) by using bounds for h(ε) and h(−ε). Notice that if F has a bounded second derivative on a neighborhood of F^{−1}(μ) then, by taking g(μ) = F'(F^{−1}(μ)), we get

  h(x) = x + O(x²σ/(g²(μ)n^{1/2})).  (3.1.6)
If one needs an upper bound of the left-hand side of (3.1.5) for a fixed sample size n then one has to formulate the smoothness condition for F in a more explicit way so that the capital O in (3.1.6) can be replaced by a constant. This should always be done for the given specific problem.
Inequalities for Moments of Order Statistics
Let U_{r:n}, μ and σ be given as in Lemma 3.1.1. From (1.7.5) we know that E((U_{r:n} − μ)²) = σ²/(n + 2). The following lemma due to Wellner (1977) gives upper bounds for absolute central moments of U_{r:n}.

Lemma 3.1.3. For every positive integer j and r ∈ {1, ..., n}:

  E|U_{r:n} − μ|^j ≤ 2·j!·5^j σ^j n^{−j/2}.
PROOF. By partial integration (or Fubini's theorem) we obtain for every d.f. G with bounded support that

  ∫₀^∞ x^j dG(x) = j ∫₀^∞ x^{j−1}(1 − G(x)) dx

so that, by writing G(x) = P{(n^{1/2}/σ)|U_{r:n} − μ| ≤ x}, the exponential bound in (3.1.4) applied to 1 − G(x) yields

  E|(n^{1/2}/σ)(U_{r:n} − μ)|^j = ∫₀^∞ x^j dG(x) = j ∫₀^∞ x^{j−1}(1 − G(x)) dx
    ≤ 2j ∫₀^∞ x^{j−1} exp(−x/5) dx = 2·j!·5^j.  □
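As a sanity check of Lemma 3.1.3 (an illustration added here, not part of the text), the following sketch estimates E|U_{r:n} − μ|^j by simulation and compares it with the bound 2·j!·5^j σ^j n^{−j/2}; for j = 2 the exact value σ²/(n + 2) from (1.7.5) is also available for comparison. All parameter choices are arbitrary.

```python
import math
import random

def abs_central_moment(n, r, j, reps=5000, seed=3):
    """Monte Carlo estimate of E|U_{r:n} - mu|^j with mu = r/(n+1)."""
    rng = random.Random(seed)
    mu = r / (n + 1)
    total = 0.0
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))[r - 1]
        total += abs(u - mu) ** j
    return total / reps

n, r, j = 100, 50, 2
mu = r / (n + 1)
sigma = math.sqrt(mu * (1 - mu))
est = abs_central_moment(n, r, j)
bound = 2 * math.factorial(j) * 5 ** j * sigma ** j / n ** (j / 2)
print(est, bound)   # for j = 2 the exact value is sigma^2/(n + 2)
```

The constant 5^j makes the bound very generous; its point is the correct order σ^j n^{−j/2}, not sharpness.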
To prove an expansion of the kth absolute moment E|X_{r:n}|^k (see Section 6.1) we shall use an expansion of E(|X_{r:n}|^k 1_{{|X_{r:n}| ≤ u}}) and, furthermore, an upper bound of E(|X_{r:n}|^k 1_{{|X_{r:n}| > u}}) for appropriately chosen numbers u. Such a bound can again be derived from the exponential bound (3.1.2).

Lemma 3.1.4. Let X_{i:n} be the ith order statistic of n i.i.d. random variables with common d.f. F. Assume that E|X_{s:j}| < ∞ for some positive integers j and s ∈ {1, ..., j}.
Then there exists a constant C > 0 such that for every real u and integers n, k and r ∈ {1, ..., n} with 1 ≤ i := r − ks ≤ m := n − (j + 1)k the following two inequalities hold:
PROOF. We shall only verify the upper bound of E(|X_{r:n}|^k 1_{{X_{r:n} > u}}). The other inequality may be established in a similar way.
Since X_{r:n} has the same distribution as F^{−1}(U_{r:n}), and F^{−1}(q) > u iff q > F(u), we get

  E(|X_{r:n}|^k 1_{{X_{r:n} > u}}) = E(|F^{−1}(U_{r:n})|^k 1_{{F^{−1}(U_{r:n}) > u}})
    = [1/b(r, n − r + 1)] ∫_{F(u)}^{1} |F^{−1}(x)|^k x^{r−1}(1 − x)^{n−r} dx
    = [1/b(r, n − r + 1)] ∫_{F(u)}^{1} (|F^{−1}(x)| x^s (1 − x)^{j−s+1})^k x^{i−1}(1 − x)^{m−i} dx
    ≤ [b(i, m − i + 1)/b(r, n − r + 1)] C^k P{U_{i:m} > F(u)}

where C is the constant of (1.7.11). Since P{U_{i:m} > F(u)} = P{X_{i:m} > u} the proof is complete.  □
Bounds for the Maximum Deviation of Sample Q.F.'s
This section will be concluded with some simple applications of inequality (3.1.3) to the sample q.f. Let G_n^{−1} be the sample q.f. based on n i.i.d. (0,1)-uniformly distributed r.v.'s. The first result concerns the maximum deviation of G_n^{−1} from the underlying q.f. G^{−1}(q) = q.

Lemma 3.1.5. For every s > 0 there exists a constant B(s) > 0 such that

  P{|G_n^{−1}(q) − q| > ((log n)/n)^{1/2} K(q, s, n) for some q ∈ (0, 1)} ≤ B(s)n^{−s}

where K(q, s, n) = (7(s + 1) max{q(1 − q), 7(s + 1)(log n)/n})^{1/2}.
PROOF. By (3.1.3),

  P{|G_n^{−1}(q) − q| > ((log n)/n)^{1/2} K̃(q, s, n) for some q ∈ (0, 1)} ≤ 2n^{−s}

where K̃(q, s, n) = (6(s + 1))^{1/2} max{σ̃(q), (6(s + 1)(log n)/n)^{1/2}} + 1/n, with σ̃²(q) = (r(q)/(n + 1))(1 − r(q)/(n + 1)) and r(q) = nq if nq is an integer and r(q) = [nq] + 1, otherwise. Now check that K̃(q, s, n) ≤ K(q, s, n) for sufficiently large n.  □
3. Inequalities and the Concept of Expansions
From Lemma 3.1.5 it is immediate that

  P{ n^{1/2}|G_n^{−1}(q) − q| / max{(q(1 − q))^{1/2}, ((log n)/n)^{1/2}} > C(s)(log n)^{1/2} for some q ∈ (0, 1) } ≤ B(s)n^{−s}  (3.1.7)

for some constant C(s) > 0.
Oscillation of Sample Q.F.

From Theorem 1.6.7 we know that the spacing U_{s:n} − U_{r:n} of n i.i.d. (0,1)-uniformly distributed r.v.'s has the same distribution as U_{s−r:n}. This relation makes (3.1.3) applicable to spacings, too. The details of the proof can be left to the reader.
Lemma 3.1.6. For every s > 0 there exist constants B(s) > 0 and C(s) > 0 such that

  P{ n^{1/2}|G_n^{−1}(p₂) − G_n^{−1}(p₁) − (p₂ − p₁)| / max{(p₂ − p₁)^{1/2}, ((log n)/n)^{1/2}} > C(s)(log n)^{1/2} for some 0 < p₁ ≤ p₂ < 1 } < B(s)n^{−s}.
Extensions of (3.1.7) and Lemma 3.1.6 will be proved under appropriate smoothness conditions on the underlying q.f. F^{−1}.

Lemma 3.1.7. Assume that the q.f. F^{−1} has a derivative on the interval (q₁ − δ, q₂ + δ) for some δ > 0. Put

  D₁ = sup{|(F^{−1})'(p)|: q₁ − δ < p < q₂ + δ}.

Then, for every s > 0 there exist constants B(s, δ) > 0 and C(s, δ) > 0 (only depending on s and δ) such that

(i)  P{ sup_{q₁ ≤ p ≤ q₂} n^{1/2}|F_n^{−1}(p) − F^{−1}(p)| / max{(p(1 − p))^{1/2}, ((log n)/n)^{1/2}} > C(s, δ)D₁(log n)^{1/2} } < B(s, δ)n^{−s},

and if, in addition, the derivative (F^{−1})' satisfies a Lipschitz condition of order β ∈ [1/2, 1], that is,

  |(F^{−1})'(p₂) − (F^{−1})'(p₁)| ≤ D₂|p₂ − p₁|^β  for q₁ − δ < p₁, p₂ ≤ q₂ + δ
for some D₂ > 0, then

(ii)  P{ sup_{q₁ ≤ p₁ ≤ p₂ ≤ q₂} n^{1/2}|F_n^{−1}(p₂) − F_n^{−1}(p₁) − (F^{−1}(p₂) − F^{−1}(p₁))| / max{(p₂ − p₁)^{1/2}, ((log n)/n)^{1/2}} > C(s, δ)(log n)^{1/2} } < B(s, δ)n^{−s}.
PROOF. In view of the quantile transformation we may take the version F^{−1}(G_n^{−1}) of the sample q.f. F_n^{−1} where G_n^{−1} is defined as in Lemma 3.1.5. Now, applying (3.1.7) and the inequality

  |F^{−1}(G_n^{−1}(p)) − F^{−1}(p)| ≤ D₁|G_n^{−1}(p) − p|

we obtain (i).
Using the auxiliary function

  ψ(y) = F^{−1}(p₂ + y(G_n^{−1}(p₂) − p₂)) − F^{−1}(p₁ + y(G_n^{−1}(p₁) − p₁))

we obtain the representation

  F^{−1}(G_n^{−1}(p₂)) − F^{−1}(G_n^{−1}(p₁)) − [F^{−1}(p₂) − F^{−1}(p₁)]
    = ψ(1) − ψ(0)
    = (F^{−1})'(p₂ + θ(G_n^{−1}(p₂) − p₂))[G_n^{−1}(p₂) − p₂]
      − (F^{−1})'(p₁ + θ(G_n^{−1}(p₁) − p₁))[G_n^{−1}(p₁) − p₁]

with 0 < θ < 1. Now, standard calculations and Lemma 3.1.6 lead to (ii).  □

From the proof of Lemma 3.1.7 it is obvious that (i) still holds if F^{−1} satisfies a Lipschitz condition of order 1.
3.2. Expansions of Finite Length
When analyzing higher order approximations one realizes that in many cases these approximations have a similar structure. As an example, we mention the Edgeworth expansions which occur in connection with the central limit theorem. In this case, a normal distribution is necessarily the leading term of the expansion. The concept of Edgeworth expansions is not general enough to cover the higher order approximations studied in the present context. Apart from the fact that our attention is not restricted to sequences of distributions, one also has to consider nonnormal limiting distributions in the field of extreme order statistics. Thus, an extension of the notion of Edgeworth expansions to the more general notion of expansions of finite length is necessary.
It is not the purpose of this section to develop a theory for expansions of finite length, and it is by no means necessary to have this notion in mind to understand our results concerning order statistics. However, at least in this section, we want to make clear what is meant by speaking of expansions. Moreover, this notion can serve as a guide for finding higher order approximations.
A Definition of Expansions of Finite Length
Let g_γ and g_{0,γ}, ..., g_{m−1,γ} be real-valued functions with domain A for every index γ ∈ Γ so that Σ_{i=0}^{m−1} g_{i,γ} can be regarded as an approximation to g_γ. We say that g_γ, γ ∈ Γ, admits the expansion Σ_{i=0}^{m−1} g_{i,γ} of length m arranged in powers of h(γ) > 0 if for every x ∈ A there exists a constant C(x) > 0 such that

  |g_γ(x) − Σ_{i=0}^{j} g_{i,γ}(x)| ≤ C(x)h(γ)^{j+1},  γ ∈ Γ,  (3.2.1)

for every j = 0, ..., m − 1.
The expansion is said to hold uniformly over A₀ ⊂ A if sup{C(x): x ∈ A₀} < ∞.
If sup{h(γ): γ ∈ Γ} < ∞, we may assume w.l.g. that

  |g_{i,γ}| ≤ Ch(γ)^i

by putting C·sup{1 + h(γ): γ ∈ Γ} in place of C.
In our context the functions g_γ etc. will mainly be d.f.'s or probability measures.
A significant feature of an expansion of length m + 1 is that the first m terms
coincide with the expansion of length m. Thus, one always has the choice
between the simplicity and the accuracy of an approximation. The first term
of an expansion (giving the simplest approximation) is usually known from a
limit theorem; an error bound for the limit theorem leads to an expansion of
length one. One purpose of asymptotic expansions is to give a better insight
into the remainder term of the limit theorem.
EXAMPLE 3.2.1. If a real-valued function f defined on the real line has m bounded derivatives then

  |f(y) − Σ_{i=0}^{m−1} f^{(i)}(y₀)(y − y₀)^i/i!| ≤ C|y − y₀|^m

and hence the Taylor expansion Σ_{i=0}^{m−1} f^{(i)}(y₀)(y − y₀)^i/i! of f about y₀ is an expansion arranged in powers of h(y) = |y − y₀|.
EXAMPLE 3.2.2. Let Φ denote the standard normal d.f. and φ = Φ'. By noting that yφ(y) = −φ'(y) one easily obtains by partial integration and by using the induction scheme that

  1 − Φ(y) = (φ(y)/y)(1 + Σ_{i=1}^{m−1} (−1)^i·1·3·5···(2i − 1)/y^{2i})
             + (−1)^m·1·3·5···(2m − 1) ∫_y^∞ φ(x)/x^{2m} dx  (3.2.2)

for every positive integer m and y > 0 (where Σ_{i=1}^{0} equals zero by convention).
An application of (3.2.2) in the cases m = 1 and m = 2 leads to

  φ(y)(1/y − 1/y³) ≤ 1 − Φ(y) ≤ φ(y)/y  (3.2.3)

for y > 0.
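The Mills-ratio bounds (3.2.3) are easy to verify numerically, since 1 − Φ(y) is available through the complementary error function. The following check is an illustration added here (not from the text); the grid of y-values is arbitrary.

```python
import math

def std_normal_sf(y):
    """1 - Phi(y) computed via the complementary error function."""
    return 0.5 * math.erfc(y / math.sqrt(2))

def phi(y):
    """Standard normal density."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

for y in (0.5, 1.0, 2.0, 4.0):
    lower = phi(y) * (1 / y - 1 / y ** 3)
    upper = phi(y) / y
    sf = std_normal_sf(y)
    print(y, lower <= sf <= upper)
```

For large y both bounds are close, reflecting the fact that the expansion (3.2.2) improves as y grows.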
By means of (3.2.2) we get an expansion of (1 − Φ(y))y/φ(y) in powers of h(y) = y^{−2}. We have

  |(1 − Φ(y))y/φ(y) − (1 + Σ_{i=1}^{m−1} (−1)^i·1·3·5···(2i − 1)/y^{2i})| ≤ C_m y^{−2m}.  (3.2.4)

Notice that (1 − Φ(y))y/φ(y) cannot be represented by means of the formal series 1 + Σ_{i=1}^∞ (−1)^i·1·3·5···(2i − 1)/y^{2i} since 1·3·5···(2i − 1) → ∞ as i → ∞. However, (3.2.4) provides a useful inequality if y is large. Moreover, the approximation for m + 1 is more accurate than that for m if y is sufficiently large.
EXAMPLE 3.2.3. A sequence of d.f.'s H_n admits an Edgeworth expansion of length m if

  sup_t |H_n(t) − (Φ(t) + φ(t) Σ_{i=1}^{m−1} n^{−i/2} L_i(t))| ≤ Cn^{−m/2}  (3.2.5)

where the L_i are polynomials. This is a special case of (3.2.1) with g_n = H_n, g_{0,n} = Φ, g_{i,n} = n^{−i/2}φL_i for i = 1, ..., m − 1 and h(n) = n^{−1/2}.
Expansions of Probability Measures
We will primarily be interested in expansions

  P_{0,γ} + Σ_{i=1}^{m−1} ν_{i,γ}

of probability measures P_γ which hold uniformly over all measurable sets. If the probability measure P_{0,γ} is the first order approximation to P_γ then the approximation can be improved by adding to P_{0,γ} an approximation, say ν_{1,γ}, to P_γ − P_{0,γ}. Since P_γ − P_{0,γ} is a signed measure with total mass equal to zero it is clear that the set function ν_{1,γ} will typically also have this property.
Lemma 3.2.4. Let P_γ and P_{0,γ} be probability measures and let ν_{i,γ} be finite signed measures on a measurable space (S, ℬ).
If P_{0,γ} + Σ_{i=1}^{m−1} ν_{i,γ} is an expansion of P_γ, γ ∈ Γ, uniformly over ℬ arranged in powers of h(γ), γ ∈ Γ, then there exists an expansion P_{0,γ} + Σ_{i=1}^{m−1} μ_{i,γ} such that the μ_{i,γ} are finite signed measures with μ_{i,γ}(S) = 0. Moreover, one may take

  μ_{i,γ} = ν_{i,γ} − ν_{i,γ}(S)P_{0,γ},  i = 1, ..., m − 1,  (3.2.6)

or

  μ_{i,γ} = ν_{i,γ} − ν_{i,γ}(S)P_γ,  i = 1, ..., m − 1.  (3.2.7)

PROOF. Straightforward by using the fact that |Σ_{i=1}^{j} ν_{i,γ}(S)| ≤ Ch(γ)^{j+1} for some constant C > 0.  □
According to Lemma 3.2.4 we can assume w.l.g. that the term ν_{i,γ} of an expansion has the property ν_{i,γ}(S) = 0. Another useful tool in this context is the following.

Lemma 3.2.5. Let P_γ, P_{0,γ} and ν_{i,γ} be as in Lemma 3.2.4. Suppose that there exists C > 0 such that

  sup_{B∈ℬ} |P_γ(B) − (P_{0,γ}(B) + Σ_{i=1}^{j} ν_{i,γ}(B))/(1 + Σ_{i=1}^{j} ν_{i,γ}(S))| ≤ Ch(γ)^{j+1}  (3.2.8)

and |ν_{i,γ}(S)| ≤ Ch(γ)^i for every j = 0, ..., m − 1 and γ ∈ Γ [where (3.2.8) has to hold whenever 1 + Σ_{i=1}^{j} ν_{i,γ}(S) > 0].
Then, P_{0,γ} + Σ_{i=1}^{m−1} μ_{i,γ}, γ ∈ Γ, is an expansion of P_γ, γ ∈ Γ, uniformly over ℬ arranged in powers of h(γ) where μ_{i,γ} is inductively defined by

  μ_{i,γ} = ν_{i,γ} − ν_{i,γ}(S)P_{0,γ} − Σ_{k=1}^{i−1} ν_{k,γ}(S)μ_{i−k,γ},  i = 1, ..., m − 1.  (3.2.9)
PROOF. First notice that from the inequality |ν_{i,γ}(S)| ≤ Ch(γ)^i it is immediate by induction over i = 1, ..., m − 1 that

  |μ_{i,γ}(B)| ≤ Ch(γ)^i,  B ∈ ℬ,  (3.2.10)

where C will be used as a generic constant which only depends on m.
The triangle inequality and (3.2.10) yield

  |P_γ − (P_{0,γ} + Σ_{i=1}^{j} μ_{i,γ})|
    ≤ |P_γ − (P_{0,γ} + Σ_{i=1}^{j} ν_{i,γ})/(1 + Σ_{i=1}^{j} ν_{i,γ}(S))|
      + |(P_{0,γ} + Σ_{i=1}^{j} ν_{i,γ})/(1 + Σ_{i=1}^{j} ν_{i,γ}(S)) − (P_{0,γ} + Σ_{i=1}^{j} μ_{i,γ})|
    ≤ Ch(γ)^{j+1}

since

  P_{0,γ} + Σ_{i=1}^{j} ν_{i,γ} = (1 + Σ_{i=1}^{j} ν_{i,γ}(S))(P_{0,γ} + Σ_{i=1}^{j} μ_{i,γ}) + Σ_{i=j+1}^{2j} Σ_{k=i−j}^{j} ν_{k,γ}(S)μ_{i−k,γ}.

Thus, the assertion is proved for those γ for which h(γ) is sufficiently small. By (3.2.10) again it can easily be seen that, otherwise, the assertion trivially holds by choosing the constant C sufficiently large.  □
By induction over i = 1, ..., m − 1 it is easy to see that the signed measures μ_{i,γ} in Lemma 3.2.5 already fulfill the condition μ_{i,γ}(S) = 0.
Expansions of D.F.'s
An expansion of probability measures which holds uniformly over all measurable sets on the real line yields an expansion

  P_{0,γ}(−∞, t] + Σ_{i=1}^{m−1} ν_{i,γ}(−∞, t]

of d.f.'s.
Assume that P_{0,γ} = N(0, 1) and that ν_{i,γ} has a density φR_{i,γ} where R_{i,γ} is a polynomial and the mass of ν_{i,γ} is equal to zero. Then the expansion of the d.f.'s can always be written in the form

  Φ(t) + φ(t) Σ_{i=1}^{m−1} L_{γ,i}(t)

where the L_{γ,i} are polynomials. This is immediate from the following lemma which yields that one can find polynomials L_{γ,i} such that (φL_{γ,i})' = φR_{γ,i}.
Lemma 3.2.6. For every positive integer k,

  φ(x)(x^{2k} − ∫ x^{2k}φ(x) dx) = [−φ(x) Σ_{i=1}^{k} a_i x^{2i−1}]'  (3.2.11)

where a_k = 1 and a_i = (2i + 1)a_{i+1}, i = 1, ..., k − 1. Secondly,

  φ(x)x^{2k−1} = [−φ(x) Σ_{i=1}^{k} ā_i x^{2(i−1)}]'  (3.2.12)

where ā_k = 1 and ā_i = 2iā_{i+1}, i = 1, ..., k − 1.

PROOF. (3.2.11) and (3.2.12) can be proved in a straightforward way. Observe that a₁ in (3.2.11) is given by

  a₁ = 1·3·5···(2k − 1) = ∫ x^{2k}φ(x) dx.  (3.2.13)  □
Two further technical lemmas that provide the basic tools for proving expansions for extreme and central order statistics will be given in Appendix 2.
3.3. Distances of Measures: Convergence
and Inequalities
Given the r.v.'s ξ and η with values in a measurable space (S, ℬ) [in our context, S will be the real line or the Euclidean k-space] the variational distance is defined by

  sup_{B∈ℬ} |P{ξ ∈ B} − P{η ∈ B}|.  (3.3.1)

In the sequel, we shall write sup_B in place of sup_{B∈ℬ}. Let Q₀ and Q₁ denote the distributions of ξ and η. Then, we write

  ‖Q₀ − Q₁‖ = sup_B |Q₀(B) − Q₁(B)|.  (3.3.2)
Since the variational distance is difficult to deal with, we shall also introduce related distances such as the L₁-distance, the Hellinger distance, a weighted L₂-distance and the Kullback-Leibler distance. These distances will enable us to establish important estimates of the variational distance.
The Variational Distance and the L₁-Distance

Representing the probability measures by their μ-densities f_i [in our context, the f_i are usually Lebesgue densities] one obtains the following well-known relation between the variational and the L₁-distance.

Lemma 3.3.1.

  ‖Q₀ − Q₁‖ = 2^{−1} ∫ |f₀ − f₁| dμ.  (3.3.3)

PROOF. Check that

  ∫ |f₀ − f₁| dμ = 2 ∫ (f₀ − f₁)⁺ dμ

where f⁺ denotes the positive part of a function f. This implies for every B:

  ∫ |f₀ − f₁| dμ = 2 ∫_{{f₀ > f₁}} (f₀ − f₁) dμ ≥ 2(Q₀(B) − Q₁(B))

with "=" for B = {f₀ > f₁}. Hence

  sup_B (Q₀(B) − Q₁(B)) = 2^{−1} ∫ |f₀ − f₁| dμ.

This yields the assertion.  □
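For distributions on a finite set, (3.3.3) can be verified by brute force over all events. The toy probability vectors in the sketch below are arbitrary choices (an illustration added here, not from the text).

```python
# Two probability vectors on the points {0, 1, 2, 3} (arbitrary toy example).
q0 = [0.1, 0.4, 0.3, 0.2]
q1 = [0.25, 0.25, 0.25, 0.25]

# Variational distance: maximize Q0(B) - Q1(B) over all 2^4 subsets B,
# encoding each subset by a bit mask.
var_dist = max(
    sum(q0[i] - q1[i] for i in range(4) if mask >> i & 1)
    for mask in range(16)
)

# L1 formula (3.3.3): ||Q0 - Q1|| = 2^(-1) * sum |q0 - q1|.
l1_half = 0.5 * sum(abs(a - b) for a, b in zip(q0, q1))
print(var_dist, l1_half)
```

The maximizing event is exactly {f₀ > f₁}, as the proof of Lemma 3.3.1 indicates.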
The Scheffé Lemma and Related Results

We continue our calculations with some simple results concerning the pointwise convergence of densities and the convergence w.r.t. the L₁-distance.

Lemma 3.3.2. For every nonnegative integer n, let f_n be the μ-density of the probability measure Q_n. Then, f_n → f₀ μ-a.e. implies ∫ |f_n − f₀| dμ → 0.

PROOF. We know (compare with the proof above) that

  ∫ |f_n − f₀| dμ = 2 ∫ (f₀ − f_n)⁺ dμ.  (1)

Moreover, f₀ ≥ (f₀ − f_n)⁺ ≥ 0 and (f₀ − f_n)⁺ → 0 μ-a.e. Therefore, the dominated convergence theorem implies that ∫ (f₀ − f_n)⁺ dμ → 0. This together with (1) yields the assertion.  □
A short look at the proof above reveals that the following extension also holds.

Lemma 3.3.3. Let the f_n be nonnegative, μ-integrable functions. If

  limsup_n ∫ f_n dμ ≤ ∫ f₀ dμ  and  lim_n f_n = f₀ μ-a.e.

then

  lim_n ∫ |f_n − f₀| dμ = 0.

It is well known (and easy to show by examples) that the conditions of Lemma 3.3.3 are not necessary for the L₁-convergence of f_n to f₀. We also prove the following stronger version of the Scheffé lemma.
Lemma 3.3.4. With f_n as in Lemma 3.3.3 the following conditions (i)–(iii) are equivalent:

(i) lim_n ∫ |f_n − f₀| dμ = 0.
(ii) lim_n ∫ f_n dμ = ∫ f₀ dμ, and for every subsequence i(n) there exists a subsequence k(n) = i(j(n)) such that

  lim_n f_{k(n)} = f₀  μ-a.e.

(iii) For every subsequence i(n) there exists a subsequence k(n) = i(j(n)) such that

  limsup_n ∫ f_{k(n)} dμ ≤ ∫ f₀ dμ  and  liminf_n f_{k(n)} ≥ f₀  μ-a.e.
PROOF. We prove (i) ⇒ (ii) ⇒ (iii) ⇒ (i).
(i) ⇒ (ii): It is immediate that lim_n ∫ f_n dμ = ∫ f₀ dμ. Moreover, for every subsequence i(n) there exists a subsequence k(n) = i(j(n)) such that

  Σ_{n=1}^∞ ∫ |f_{k(n)} − f₀| dμ < ∞.

This implies Σ_{n=1}^∞ |f_{k(n)} − f₀| < ∞ μ-a.e. and hence lim_n f_{k(n)} = f₀ μ-a.e.
(ii) ⇒ (iii): Obvious.
(iii) ⇒ (i): It suffices to prove that for every subsequence i(n) there exists a subsequence k(n) = i(j(n)) such that

  lim_n ∫ |f_{k(n)} − f₀| dμ = 0.

Condition (iii) implies that there exists k(n) = i(j(n)) such that

  lim_n (f₀ − f_{k(n)})⁺ = 0  μ-a.e.

Thus, by repeating the arguments of the proof of Lemma 3.3.2 we obtain the desired conclusion.  □
The following version of the Scheffé lemma will be particularly useful in cases where the measurable space varies with n.

Lemma 3.3.5. Let g_n and f_n be nonnegative, measurable functions. Assume that ∫ g_n dμ_n, n = 1, 2, 3, ..., is a bounded sequence, and that lim_n ∫ (g_n − f_n) dμ_n = 0. Then the following three conditions are equivalent:

(i) lim_n ∫ |g_n − f_n| dμ_n = 0,
(ii) lim_n ∫ |f_n/g_n − 1| g_n dμ_n = 0,
(iii) lim_n ∫_{{|f_n/g_n − 1| ≥ ε}} g_n dμ_n = 0 for every ε > 0.
PROOF. (i) ⇒ (ii) ⇒ (iii): Obvious from

  ∫_{{|f_n/g_n − 1| ≥ ε}} g_n dμ_n ≤ ε^{−1} ∫ |f_n/g_n − 1| g_n dμ_n ≤ ε^{−1} ∫ |g_n − f_n| dμ_n.

(iii) ⇒ (i): For ε > 0 put B = B(n, ε) = {g_n > 0, |f_n/g_n − 1| < ε}. If (iii) holds then

  ∫_B g_n dμ_n = ∫ g_n dμ_n − ∫_{{|f_n/g_n − 1| ≥ ε}} g_n dμ_n ≥ ∫ g_n dμ_n − ε  (1)

for sufficiently large n. Moreover,

  |∫_B f_n dμ_n − ∫_B g_n dμ_n| ≤ ∫_B |f_n − g_n| dμ_n = ∫_B |f_n/g_n − 1| g_n dμ_n ≤ ε ∫_B g_n dμ_n.  (2)

Combining (1) and (2),

  ∫_B f_n dμ_n ≥ ∫ g_n dμ_n − ε − ε ∫_B g_n dμ_n ≥ ∫ f_n dμ_n − 2ε − ε ∫_B g_n dμ_n  (3)

if n is sufficiently large. By (1)–(3),

  ∫ |f_n − g_n| dμ_n ≤ ∫_B |f_n − g_n| dμ_n + (∫ f_n dμ_n − ∫_B f_n dμ_n) + (∫ g_n dμ_n − ∫_B g_n dμ_n)
    ≤ 2ε ∫ g_n dμ_n + 3ε

if n is sufficiently large. Since ε is arbitrary this implies (i).  □
Finally, Lemma 3.3.5 will be formulated for the particular case of probability measures.

Corollary 3.3.6. For probability measures Q_n and P_n with μ_n-densities f_n and g_n the following two assertions are equivalent:

(i) lim_n ∫ |g_n − f_n| dμ_n = 0,
(ii) lim_n P_n{|f_n/g_n − 1| ≥ ε} = 0 for every ε > 0.
The Variational Distance between Product Measures
The aim of the following is to prove estimates of the variational distance between products of probability measures in terms of distances between the single components. Our starting point is an upper bound in terms of the variational distances of the components. The technical details and a generalization of the present result to signed measures can be found in Appendix 3.

Lemma 3.3.7. For probability measures Q_i and P_i, i = 1, ..., k,

  ‖Π_{i=1}^{k} Q_i − Π_{i=1}^{k} P_i‖ ≤ Σ_{i=1}^{k} ‖Q_i − P_i‖.  (3.3.4)
The following example shows that the inequality is sharp as far as the order
of the upper bound is concerned. However, we will realize later that this is
not the typical situation.
EXAMPLE 3.3.8. Let Q_t be the uniform distribution on the interval [0, t]. We show that for 0 ≤ s ≤ k:

  s + O(s²) = 1 − exp(−s) ≤ ‖Q₁^k − Q_{1/(1−s/k)}^k‖ ≤ k‖Q₁ − Q_{1/(1−s/k)}‖ = s.

The two upper bounds are immediate from (3.3.4) and the identity

  ‖Q₁^k − Q_t^k‖ = 1 − t^{−k},  t ≥ 1.

This also implies

  ‖Q₁^k − Q_{1/(1−s/k)}^k‖ = 1 − (1 − s/k)^k ≥ 1 − exp(−s).
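The quantities in Example 3.3.8 are explicit, so the sandwich can be evaluated directly. The sketch below (an illustration added here; k and s are arbitrary, the function name is ours) computes the exact product distance and the two bounds.

```python
import math

def product_uniform_dist(k, t):
    """||Q_1^k - Q_t^k|| for Q_t uniform on [0, t], t >= 1: equals 1 - t^(-k)."""
    return 1 - t ** (-k)

k, s = 10, 1.5
t = 1 / (1 - s / k)
exact = product_uniform_dist(k, t)
lower = 1 - math.exp(-s)          # since 1 - (1 - s/k)^k >= 1 - e^(-s)
upper = k * (1 - 1 / t)           # k * ||Q_1 - Q_t|| = s, via (3.3.4)
print(lower, exact, upper)
```

Here the sum bound (3.3.4) grows linearly in s while the exact distance saturates at 1, which is what makes this example exceptional.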
The Hellinger Distance and Other Distances

To obtain sharp estimates of the variational distance of product measures we introduce further distances and show their relation to the variational distance. Let again Q_i be a probability measure with μ-density f_i. Put

  H(Q₀, Q₁) = [∫ (f₀^{1/2} − f₁^{1/2})² dμ]^{1/2}  "Hellinger distance"

  D(Q₀, Q₁) = [∫ (f₁/f₀ − 1)² dQ₀]^{1/2}  "χ²-distance"

  K(Q₀, Q₁) = −∫ (log f₁/f₀) dQ₀.  "Kullback-Leibler distance"

It can be shown that these distances are independent of the particular choice of the dominating measure μ and of the densities f₀ and f₁. Keep in mind that the distances ‖·‖ and H are symmetric whereas this does not hold for the distances D and K.
Notice that ‖Q₀ − Q₁‖ ≤ 1 and H(Q₀, Q₁) ≤ 2^{1/2}. Moreover, ‖Q₀ − Q₁‖ = 1 and H(Q₀, Q₁) = 2^{1/2} if the densities f₀ and f₁ have disjoint supports. We remark that, in the literature, 2^{−1/2}H is also used as the definition of the Hellinger distance.
The definition of the χ²-distance will be extended to finite signed measures in Appendix 3.
Check that

  H(Q₀, Q₁) = [2(1 − ∫ (f₀f₁)^{1/2} dμ)]^{1/2}  (3.3.5)

and

  H(Q₀, Q₁) ≤ (2‖Q₀ − Q₁‖)^{1/2}.  (3.3.6)
Lemma 3.3.9. (i) ‖Q₀ − Q₁‖ ≤ H(Q₀, Q₁).
(ii) If Q₁ is dominated by Q₀ then

  H(Q₀, Q₁) ≤ D(Q₀, Q₁).  (3.3.7)

PROOF. Ad (i): (3.3.3) and the Schwarz inequality yield

  ‖Q₀ − Q₁‖ = 2^{−1} ∫ |f₀ − f₁| dμ = 2^{−1} ∫ |f₀^{1/2} − f₁^{1/2}| (f₀^{1/2} + f₁^{1/2}) dμ
    ≤ 2^{−1} [∫ (f₀^{1/2} − f₁^{1/2})² dμ]^{1/2} [∫ (f₀^{1/2} + f₁^{1/2})² dμ]^{1/2}
    = H(Q₀, Q₁)[2(1 + ∫ (f₀f₁)^{1/2} dμ)]^{1/2}/2 ≤ H(Q₀, Q₁).

Ad (ii): Let f₁ be a Q₀-density of Q₁. We have

  H(Q₀, Q₁)² = ∫ (1 − f₁^{1/2})² dQ₀ ≤ ∫ [(1 − f₁^{1/2})(1 + f₁^{1/2})]² dQ₀ = D(Q₀, Q₁)².  □
Note that (3.3.7) does not hold if the condition that Q₁ is dominated by Q₀ is omitted. Without this condition one can easily prove (use (3.3.5)) that

  H(Q₀, Q₁) ≤ [2D(Q₀, Q₁)]^{1/2}.

Under the condition of Lemma 3.3.9 it is clear that ‖Q₀ − Q₁‖ ≤ D(Q₀, Q₁). This inequality can be slightly improved by applying the Schwarz inequality to ∫ |1 − f₁| dQ₀. We have

  ‖Q₀ − Q₁‖ ≤ 2^{−1} D(Q₀, Q₁).  (3.3.8)

Another bound for the Hellinger distance (and thus for the variational distance) can be constructed by using the Kullback-Leibler distance. This bound is nontrivial if Q₀ is dominated by Q₁. We have

  H(Q₀, Q₁)² ≤ K(Q₀, Q₁).  (3.3.9)

A modification and the proof of this result can be found in Appendix 3.
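For discrete distributions the chain ‖Q₀ − Q₁‖ ≤ H(Q₀, Q₁) ≤ D(Q₀, Q₁) from Lemmas 3.3.1 and 3.3.9 can be checked directly. The sketch below is an illustration added here; the toy vectors are arbitrary (with q₀ > 0 everywhere, so that Q₁ is dominated by Q₀).

```python
import math

# Toy distributions on three points; Q1 is dominated by Q0.
q0 = [0.5, 0.3, 0.2]
q1 = [0.4, 0.35, 0.25]

var_dist = 0.5 * sum(abs(a - b) for a, b in zip(q0, q1))            # (3.3.3)
hellinger = math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                          for a, b in zip(q0, q1)))                  # H(Q0, Q1)
chi2 = math.sqrt(sum((b / a - 1) ** 2 * a for a, b in zip(q0, q1)))  # D(Q0, Q1)
print(var_dist, hellinger, chi2)
```

The chain is typically tight at the lower end: here the three values are of the same order of magnitude.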
The use of the Kullback-Leibler distance has the following advantages: If f₁/f₀ is a product of several terms, say g_i, then we get an upper bound of log(f₁/f₀) by summing up estimates of log(g_i). Moreover, it will be extremely useful in applications that only integrals of bounds of log(g_i) have to be treated.
Further Inequalities for Distances of Product Measures
In the sequel, it is understood that for every i = 1, ..., k the probability measures Q_i and P_i are defined on the same measurable space.

Lemma 3.3.10. (i) H(Π_{i=1}^{k} Q_i, Π_{i=1}^{k} P_i)² ≤ Σ_{i=1}^{k} H(Q_i, P_i)².
(ii) K(Π_{i=1}^{k} Q_i, Π_{i=1}^{k} P_i) = Σ_{i=1}^{k} K(Q_i, P_i).
(iii) If, in addition, P_i is dominated by Q_i for i = 1, ..., k, then

  D(Π_{i=1}^{k} Q_i, Π_{i=1}^{k} P_i) ≤ exp[2^{−1} Σ_{i=1}^{k} D(Q_i, P_i)²](Σ_{i=1}^{k} D(Q_i, P_i)²)^{1/2}.

PROOF. Ad (i): Suppose that Q_i and P_i have the μ_i-densities f_i and g_i. By (3.3.5),

  H(Π Q_i, Π P_i)² = 2[1 − ∫ Π_{i=1}^{k} (f_i g_i)^{1/2}(x_i) d(Π μ_i)(x₁, ..., x_k)]
    = 2[1 − Π_{i=1}^{k} ∫ (f_i g_i)^{1/2} dμ_i]
    = 2[1 − Π_{i=1}^{k} (1 − 2^{−1} H(Q_i, P_i)²)] ≤ Σ_{i=1}^{k} H(Q_i, P_i)²

where the final inequality is immediate from

  Π_{i=1}^{k} (1 − u_i) ≥ 1 − Σ_{i=1}^{k} u_i  for 0 ≤ u_i ≤ 1.

Ad (ii): Obvious.
Ad (iii): Since D(Q_i, P_i)² = ∫ f_i² dQ_i − 1, where f_i is the Q_i-density of P_i, we obtain by straightforward calculations that

  D(Π Q_i, Π P_i)² = Π_{i=1}^{k} [1 + D(Q_i, P_i)²] − 1
    ≤ exp[Σ_{i=1}^{k} D(Q_i, P_i)²] − 1
    ≤ exp[Σ_{i=1}^{k} D(Q_i, P_i)²](Σ_{i=1}^{k} D(Q_i, P_i)²).  □
Combining the results above we get

Corollary 3.3.11.

  ‖Π_{i=1}^{k} Q_i − Π_{i=1}^{k} P_i‖ ≤ (Σ_{i=1}^{k} H(Q_i, P_i)²)^{1/2} ≤ (Σ_{i=1}^{k} D(Q_i, P_i)²)^{1/2}.  (3.3.10)

Recall that the second inequality in (3.3.10) only holds if P_i is dominated by Q_i.
If Q_i = Q and P_i = P for i = 1, ..., k then by (3.3.4),

  ‖Q^k − P^k‖ ≤ k‖Q − P‖,  (3.3.11)

and by (3.3.10),

  ‖Q^k − P^k‖ ≤ k^{1/2} H(Q, P).  (3.3.12)

Thus, if ‖Q − P‖ and H(Q, P) are of the same order (Example 3.3.8 treats an exceptional case where this is not true) then (3.3.12) provides a more accurate inequality than (3.3.11). From (3.3.10) it is obvious that also ‖Q^k − P^k‖ ≤ k^{1/2} D(Q, P). A refinement of this inequality will be studied in Appendix 3.
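The gain of (3.3.12) over (3.3.11) is easy to see numerically: for a fixed pair (Q, P) the variational bound grows like k while the Hellinger bound grows like k^{1/2}. The sketch below is an illustration added here, with an arbitrary toy pair of discrete distributions.

```python
import math

q = [0.5, 0.3, 0.2]
p = [0.4, 0.35, 0.25]
k = 50

tv = 0.5 * sum(abs(a - b) for a, b in zip(q, p))                    # ||Q - P||
hell = math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                     for a, b in zip(q, p)))                         # H(Q, P)

bound_tv = k * tv                 # (3.3.11): linear in k
bound_hell = math.sqrt(k) * hell  # (3.3.12): square-root in k
print(bound_tv, bound_hell)
```

For this pair the linear bound is already vacuous (larger than 1) while the Hellinger bound is still informative.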
Distances of Induced Probability Measures
Let Q and P be probability measures on the same measurable space and T a measurable map into another measurable space. Denote by TQ the probability measure induced by Q and T; we have

  TQ(B) = Q{T ∈ B}.

Thus, in this context, the symbol T also denotes a map from one family of probability measures into another family.
The following result is obvious.

Lemma 3.3.12.

  ‖TQ − TP‖ ≤ ‖Q − P‖.

To highlight the relevance of this inequality let us consider a statistic T(X_{r:n}, ..., X_{s:n}) based on the order statistics X_{r:n}, ..., X_{s:n}. If Q is an approximation to the distribution P of (X_{r:n}, ..., X_{s:n}) then TQ is an approximation to the distribution TP of T(X_{r:n}, ..., X_{s:n}). An upper bound for the error ‖TQ − TP‖ of this approximation is given by ‖Q − P‖.
In view of the results above it is also desirable to obtain corresponding results for the distances H and D.

Lemma 3.3.13.

  H(TQ, TP) ≤ H(Q, P).

PROOF. We repeat in short the arguments in Pitman [1979, (2.2)]. Let g₀ and f₀ be μ-densities of Q and P where w.l.g. μ is a probability measure. If g₁ ∘ T and f₁ ∘ T are conditional expectations of g₀ and f₀ given T (relative to μ) then g₁ and f₁ are densities of TQ and TP w.r.t. Tμ.
Thus, by applying the Schwarz inequality for conditional expectations [see e.g. Chow and Teicher (1978), page 215] to the conditional expectation of (g₀f₀)^{1/2} given T we obtain in a straightforward way that

  ∫ (g₀f₀)^{1/2} dμ ≤ ∫ (g₁f₁)^{1/2} d(Tμ)

which implies the assertion according to (3.3.5).  □

Lemma 3.3.14. Under the condition that P is dominated by Q,

  D(TQ, TP) ≤ D(Q, P).

PROOF. Check that ∫ (f₁)² d(TQ) ≤ ∫ (f₀)² dQ where f₀ is a Q-density of P and f₁ is a TQ-density of TP. Moreover, use arguments similar to those in the proof of Lemma 3.3.13.  □
P.3. Problems and Supplements
1. (i) For every x > k/n,

  P{U_{k:n} > x} ≤ exp[−n(x − k/n)²/3].

(ii) Let x > 0 be fixed. Then, for every positive integer m we find a constant C(m, x) such that for every n and k ≤ n,

  P{U_{k:n} > x} ≤ C(m, x)(k/n)^m.

2. Let X_{n:n} be the maximum of the r.v.'s ξ₁, ..., ξ_n. For k = 1, ..., n:

  P{X_{n:n} ≤ x} ≥ 1 + Σ_{j=1}^{k} (−1)^j S_j(x)  if k is odd,
  P{X_{n:n} ≤ x} ≤ 1 + Σ_{j=1}^{k} (−1)^j S_j(x)  if k is even,

with

  S_j(x) = Σ_{1 ≤ i₁ < ··· < i_j ≤ n} P{ξ_{i₁} > x, ..., ξ_{i_j} > x},  j = 1, ..., n.
3. Prove that

  N_{(μ₁,σ₁²)}(B) − N_{(μ₀,σ₀²)}(B)
    = (σ₀/σ₁ − 1) ∫_B (1 − ((x − μ₁)/σ₀)²) dN_{(μ₁,σ₀²)}(x)
      + ((μ₁ − μ₀)/σ₀²) ∫_B (x − μ₀) dN_{(μ₀,σ₀²)}(x)
      + O[((μ₁ − μ₀)/σ₀)² + (σ₀/σ₁ − 1)²].

(see Falk and Reiss, 1988)
4. For n = 0, 1, 2, ... let P_n be unimodal probability measures which are dominated by the Lebesgue measure. Then,

  ‖P_n − P₀‖ → 0, n → ∞,  iff  P_n → P₀ weakly.

(see Ibragimov, 1956, and Reiss, 1973)

5. Let ν₁ and ν₂ be finite signed measures on a measurable space (S, ℬ). Let ℳ be a system of [0, 1]-valued, ℬ-measurable functions defined on S.
(i) Define

  𝒯 = {ψ^{−1}(t, 1]: t ∈ [0, 1], ψ ∈ ℳ}.

Then,

  sup_{ψ∈ℳ} |∫ ψ dν₁ − ∫ ψ dν₂| ≤ sup_{B∈𝒯} |ν₁(B) − ν₂(B)|.

(ii) As a special case we obtain for the system ℳ of all ℬ-measurable, [0, 1]-valued functions that

  sup_{ψ∈ℳ} |∫ ψ dν₁ − ∫ ψ dν₂| = sup_{B∈ℬ} |ν₁(B) − ν₂(B)|.

(iii) If ℳ is the system of all [0, 1]-valued, unimodal functions on the real line then

  sup_{ψ∈ℳ} |∫ ψ dν₁ − ∫ ψ dν₂| = sup_{I∈𝒥} |ν₁(I) − ν₂(I)|

where 𝒥 is the system of all intervals on the real line.
6. Let F₀ be a d.f. Then for every positive integer m there exists a finite set A_m such that for every d.f. F₁ the following inequality holds:

  sup_t |F₀(t) − F₁(t)| ≤ m^{−1} + max_{t∈A_m} max(0, F₀(t) − F₁(t), F₁(t) − F₀(t)).

7. Prove that

  sup_B |P{ξ₁ ∈ B} − P{ξ₂ ∈ B}| ≤ P{ξ₁ ≠ ξ₂}.
8. Let Q_{0,n} and Q_{1,n} be probability measures such that Q_{1,n} is dominated by Q_{0,n}. Find conditions under which
(i)
(Reiss, 1980)
(ii) the most powerful test of level α for testing Q_{0,n}^n against Q_{1,n}^n has the power

  Φ(Φ^{−1}(α) + n^{1/2} D(Q_{0,n}, Q_{1,n})) + O(n^{−1/2}).

(Weiss, 1974; Reiss, 1980)

9. (Jensen inequality)
Let h be a convex function on an open interval I and ξ a r.v. with range I such that ξ and h(ξ) are finitely integrable. Then,

  h(Eξ) ≤ Eh(ξ).

(see e.g. Ferguson, 1967, Lemma 1, page 76)
10. (Dvoretzky-Kiefer-Wolfowitz inequality)
Let G_n^{−1} be the sample q.f. in Lemma 3.1.5 and G_n the sample d.f. Then for every ε > 0,

  P{ sup_{q∈(0,1)} n^{1/2}|G_n^{−1}(q) − q| > ε } = P{ sup_{q∈(0,1)} n^{1/2}|G_n(q) − q| > ε } ≤ C exp(−2ε²)

for some C > 0.
(see e.g. Serfling, 1980, page 59)
Bibliographical Notes

This chapter is not central to our considerations and so it suffices to make only some short remarks.
Exponential bounds for order statistics related to (3.1.2) have been discovered and successfully applied by various authors (e.g. Reiss (1974a, 1975a), Wellner (1977)).
The upper bound for the variational distance using the Kullback-Leibler distance was established by Hoeffding and Wolfowitz (1958). In this context we also refer to Ikeda (1963, 1975) and Csiszar (1975). The upper bound for the variational distance between products of probability measures in terms of the variational distances between the single components has been proved in various articles; nevertheless, this inequality does not seem to be well known. It was established by Hoeffding and Wolfowitz (1958) and generalized by Blum and Pathak (1972) and Sendler (1975). The extension to signed measures (see Lemma A.3.3) was given in Reiss (1981b). Investigations along these lines allowing a deviation from the independence condition are carried out by Hillion (1983).
PART II
ASYMPTOTIC THEORY
CHAPTER 4
Approximations to Distributions of
Central Order Statistics
Under weak conditions on the underlying d.f. it can be proved that central (as well as intermediate) order statistics are asymptotically normally distributed. This result easily extends to the case of the joint distribution of a fixed number of central order statistics. In Section 4.1 we shall discuss some conditions which yield the weak and strong asymptotic normality of central order statistics.
Expansions of distributions of single central order statistics will be established in Section 4.2. The leading term in such an expansion is the normal distribution, whereas the higher order terms are given by integrals of polynomials w.r.t. the normal distribution. These expansions differ from the well-known Edgeworth expansions for distributions of sums of independent r.v.'s in that the higher order terms do not only depend on the sample size n but also on the index r of the order statistic. In the particular case of sample quantiles the accuracy of the normal approximation is shown to be of order O(n^{−1/2}).
In Section 4.3 it is proved that the usual normalization of joint distributions of order statistics makes these distributions asymptotically independent of the underlying d.f. This result still holds under conditions where the asymptotic normality is not valid.
In Section 4.4 we give a detailed description of the multivariate normal distribution which will serve as an approximation to the joint distribution of central order statistics.
Combining the results of Sections 4.3 and 4.4, the asymptotic normality and expansions of the joint distribution of order statistics X_{r₁:n}, ..., X_{r_k:n} (with 0 = r₀ < r₁ < ··· < r_k < r_{k+1} = n + 1) are proved in Section 4.5. It is shown that the accuracy of this approximation is of order

  O((Σ_{i=1}^{k+1} (r_i − r_{i−1})^{−1})^{1/2})

under weak regularity conditions. These approximations again hold w.r.t. the variational distance.
Some supplementary results concerning the d.f.'s of order statistics and moderate deviations are collected in Sections 4.6 and 4.7.
4.1. Asymptotic Normality of Central Sequences
Convergence in Distribution of a Single Order Statistic
To begin with, let us consider the special case of order statistics $U_{1:n} \le U_{2:n} \le \cdots \le U_{n:n}$ of $n$ i.i.d. (0,1)-uniformly distributed r.v.'s $\eta_1, \ldots, \eta_n$. If $r(n) \to \infty$ and $n - r(n) \to \infty$ as $n \to \infty$ then one can easily show that the order statistics $U_{r(n):n}$ (if appropriately normalized) converge in distribution to a standard normal r.v. as $n \to \infty$. Thus, with $\Phi$ denoting the standard normal d.f., we have
$$P\{a_{r(n),n}^{-1}(U_{r(n):n} - b_{r(n),n}) \le t\} \to \Phi(t), \qquad n \to \infty, \tag{4.1.1}$$
for every $t$, where $a_{r,n} = (r(n-r+1))^{1/2}/(n+1)^{3/2}$ and $b_{r,n} = r/(n+1)$. Since $\Phi$ is continuous we also know that the convergence in (4.1.1) holds uniformly in $t$. In the sequel, we prefer to write $a(n)$ and $b(n)$ instead of $a_{r(n),n}$ and $b_{r(n),n}$, thus suppressing the dependence on $r(n)$.
If $(r(n)/n - q) = o(n^{-1/2})$ for some $q \in (0,1)$ (a condition which is, e.g., satisfied in the case of sample $q$-quantiles), another natural choice of the constants $a(n)$ and $b(n)$ is $a(n) = (q(1-q))^{1/2}/n^{1/2}$ and $b(n) = q$.
Applying (1.1.8) we obtain
$$P\{a(n)^{-1}(U_{r(n):n} - b(n)) \le t\} = P\Big\{-\sum_{i=1}^{n}\big[1_{(-\infty,p(n,t))}(\eta_i) - p(n,t)\big] \le -r(n) + np(n,t)\Big\} \tag{4.1.2}$$
where $p(n,t) = b(n) + ta(n)$. Since $(-r(n) + np(n,t))/[np(n,t)(1 - p(n,t))]^{1/2} \to t$ as $n \to \infty$, the convergence to $\Phi(t)$ is immediate from the central limit theorem for a triangular array of i.i.d. random variables (or some other appropriate limit theorem for binomial r.v.'s). It is easy to see that this method also applies to other r.v.'s. However, to extend (4.1.1) to other cases we shall follow another standard device, namely, to use the transformation technique.
If $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ are the order statistics of $n$ i.i.d. random variables with d.f. $F$ then, according to Corollary 1.2.7, $P\{X_{r(n):n} \le t\} = P\{U_{r(n):n} \le F(t)\}$ and hence by (4.1.1),
$$P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \le t\} = P\{U_{r(n):n} \le F(b'(n) + ta'(n))\}$$
$$= \Phi\big[a(n)^{-1}[F(b'(n) + ta'(n)) - b(n)]\big] + o(1) = \Phi(t) + o(1) \tag{4.1.3}$$
if $a'(n)$ and $b'(n)$ are chosen so that
$$a(n)^{-1}\big[F(b'(n) + ta'(n)) - b(n)\big] \to t, \qquad n \to \infty.$$
Our first example concerns central order statistics.
EXAMPLE 4.1.1. Let $q \in (0,1)$ be fixed. Assume that $F$ is differentiable at $F^{-1}(q)$ and $F'(F^{-1}(q)) > 0$. If $n^{1/2}(r(n)/n - q) \to 0$, $n \to \infty$, then
$$P\Big\{\frac{n^{1/2}F'(F^{-1}(q))}{(q(1-q))^{1/2}}\big(X_{r(n):n} - F^{-1}(q)\big) \le t\Big\} \to \Phi(t), \qquad n \to \infty, \tag{4.1.4}$$
for every $t$. This is immediate from (4.1.3) by taking $a(n) = (q(1-q))^{1/2}/n^{1/2}$, $b(n) = q$, $a'(n) = a(n)/F'(F^{-1}(q))$, and $b'(n) = F^{-1}(q)$.

As a special case we have
$$P\Big\{\frac{n^{1/2}F'(F^{-1}(q))}{(q(1-q))^{1/2}}\big(F_n^{-1}(q) - F^{-1}(q)\big) \le t\Big\} \to \Phi(t), \qquad n \to \infty. \tag{4.1.5}$$
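The convergence in (4.1.4) is easy to check by simulation. The sketch below does this for standard exponential observations, for which $F^{-1}(q) = -\log(1-q)$ and $F'(F^{-1}(q)) = 1-q$; the sample sizes, the seed and the tolerance are illustrative choices, not taken from the text.

```python
# Monte Carlo check of (4.1.4), assuming standard exponential observations:
# F(x) = 1 - exp(-x), so F^{-1}(q) = -log(1-q) and F'(F^{-1}(q)) = 1 - q.
import math
import random

def standardized_quantile(n, q, rng):
    """Return n^{1/2} F'(F^{-1}(q)) / (q(1-q))^{1/2} * (X_{r:n} - F^{-1}(q))."""
    r = round(q * (n + 1))                 # central index with r/n - q = O(1/n)
    xs = sorted(rng.expovariate(1.0) for _ in range(n))
    x_rn = xs[r - 1]                       # r-th order statistic
    scale = math.sqrt(n) * (1 - q) / math.sqrt(q * (1 - q))
    return scale * (x_rn + math.log(1 - q))

rng = random.Random(1)
n, q, reps = 400, 0.3, 2000
sample = [standardized_quantile(n, q, rng) for _ in range(reps)]

phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))   # standard normal d.f.
for t in (-1.0, 0.0, 1.0):
    emp = sum(s <= t for s in sample) / reps
    print(t, round(emp, 3), round(phi(t), 3))            # empirical vs normal d.f.
```

For fixed $t$ the empirical d.f. of the standardized quantiles settles near $\Phi(t)$ at the $O(n^{-1/2})$ rate discussed in Section 4.2.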
The next example deals with upper intermediate order statistics.
EXAMPLE 4.1.2. Assume that $n - r(n) \to \infty$ and $r(n)/n \to 1$ as $n \to \infty$. Moreover, assume that $\omega(F) < \infty$ and that $F$ has a derivative, say, $f$ on the interval $(\omega(F) - \varepsilon, \omega(F))$ for some $\varepsilon > 0$ where $f$ is uniformly continuous and bounded away from zero. These conditions are e.g. fulfilled for uniform r.v.'s. Then,
$$P\Big\{\frac{(n+1)^{3/2}f\big(F^{-1}\big(\frac{r(n)}{n+1}\big)\big)}{(r(n)(n-r(n)+1))^{1/2}}\Big(X_{r(n):n} - F^{-1}\Big(\frac{r(n)}{n+1}\Big)\Big) \le t\Big\} \to \Phi(t), \qquad n \to \infty, \tag{4.1.6}$$
for every $t$. The proof is straightforward and can be left to the reader.
When treating intermediate order statistics the underlying d.f. $F$ has to satisfy certain regularity conditions on a neighborhood of $\alpha(F)$ or $\omega(F)$. From this point of view intermediate order statistics are connected with extreme order statistics. The extreme value theory will provide conditions better tailored to this situation than those stated in Example 4.1.2 (see Theorem 5.1.7).
The Joint Asymptotic Normality
In a second step, consider the joint distribution of $k$ order statistics where $k \ge 1$ is fixed. Our arguments above can easily be extended to the case of joint distributions. Here we shall restrict our attention to an extension of Example 4.1.1.
Theorem 4.1.3. Let $0 < q_1 < q_2 < \cdots < q_k < 1$ be fixed. Assume that $F$ is differentiable at $F^{-1}(q_i)$ and that $f(F^{-1}(q_i)) > 0$ for $i = 1, \ldots, k$ where $f = F'$. Then, if $(r(n,i)/n - q_i) = o(n^{-1/2})$ for every $i = 1, \ldots, k$,
$$P\big\{\big(n^{1/2}f(F^{-1}(q_i))(X_{r(n,i):n} - F^{-1}(q_i))\big)_{i=1}^{k} \le t\big\} \to \Phi_\Sigma(t), \qquad n \to \infty, \tag{4.1.7}$$
for every $t = (t_1, \ldots, t_k)$ where $\Phi_\Sigma$ is the d.f. of the $k$-variate normal distribution with mean vector zero and covariances $q_i(1-q_j)$ for $1 \le i \le j \le k$. As a special case we have
$$P\big\{\big(n^{1/2}f(F^{-1}(q_i))(F_n^{-1}(q_i) - F^{-1}(q_i))\big)_{i=1}^{k} \le t\big\} \to \Phi_\Sigma(t), \qquad n \to \infty. \tag{4.1.8}$$
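The covariance structure $q_i(1-q_j)$ of Theorem 4.1.3 can also be checked empirically. For (0,1)-uniform r.v.'s one has $f(F^{-1}(q)) = 1$, so no density factor is needed; all sizes below are illustrative assumptions.

```python
# Empirical covariance of two standardized central order statistics of a
# uniform sample; Theorem 4.1.3 predicts the limit q1*(1 - q2).
import math
import random

rng = random.Random(7)
n, reps = 500, 3000
q1, q2 = 0.25, 0.75
r1, r2 = round(q1 * (n + 1)), round(q2 * (n + 1))

pairs = []
for _ in range(reps):
    u = sorted(rng.random() for _ in range(n))
    pairs.append((math.sqrt(n) * (u[r1 - 1] - q1),
                  math.sqrt(n) * (u[r2 - 1] - q2)))

m1 = sum(p[0] for p in pairs) / reps
m2 = sum(p[1] for p in pairs) / reps
cov = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / reps
print(round(cov, 3), q1 * (1 - q2))   # empirical covariance vs q1*(1-q2)
```

With these choices the theoretical limit is $q_1(1-q_2) = 0.0625$, and the empirical covariance agrees up to Monte Carlo noise.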
Convergence w.r.t. the Variational Distance
One of the advantages of the representation (4.1.2) is that one can treat the asymptotic behavior of the distribution of order statistics whenever a limit theorem for the r.v.'s $\sum_{i=1}^{n} 1_{(-\infty,p(n,t))}(\eta_i)$ is at hand. The disadvantage of this approach is that the convergence cannot be proved in a stronger sense since we have to deal with discrete r.v.'s although the order statistics have a continuous d.f.
Another well-known method tackles this problem in a successful way. Let us return to the distribution of a single order statistic $U_{r(n):n}$. In the i.i.d. case we know the explicit form of the density. By showing that the density of $a(n)^{-1}(U_{r(n):n} - b(n))$ converges pointwise to the standard normal density (compare with (1.3.9)) we know from the Scheffé lemma that the convergence of the distributions holds w.r.t. the variational distance; that is,
$$\sup_B \big|P\{a(n)^{-1}(U_{r(n):n} - b(n)) \in B\} - N_{(0,1)}(B)\big| \to 0, \qquad n \to \infty, \tag{4.1.9}$$
where $N_{(0,1)}$ denotes the standard normal distribution.

Notice that (4.1.9) is in fact stronger than (4.1.1) since (4.1.1) can be written
$$\sup_t \big|P\{a(n)^{-1}(U_{r(n):n} - b(n)) \in (-\infty,t]\} - N_{(0,1)}((-\infty,t])\big| \to 0, \qquad n \to \infty.$$
Next, the problem arises to extend (4.1.9) to a certain class of d.f.'s $F$. This is again possible by using the transformation technique.
Theorem 4.1.4. (i) Let $q \in (0,1)$ be fixed. Assume that $F$ has a derivative, say, $f$ on the interval $(F^{-1}(q) - \varepsilon, F^{-1}(q) + \varepsilon)$ for some $\varepsilon > 0$. Moreover, assume that $f$ is continuous at $F^{-1}(q)$ and that $f(F^{-1}(q)) > 0$. Then, if $r(n)/n \to q$ as $n \to \infty$,
$$\sup_B \big|P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \in B\} - N_{(0,1)}(B)\big| \to 0, \qquad n \to \infty, \tag{4.1.10}$$
where $b'(n) = F^{-1}(b(n))$ and $a'(n) = a(n)/f(b'(n))$.

(ii) Moreover, if $(r(n)/n - q) = o(n^{-1/2})$ then
$$\sup_B \Big|P\Big\{\frac{n^{1/2}f(F^{-1}(q))}{(q(1-q))^{1/2}}\big(X_{r(n):n} - F^{-1}(q)\big) \in B\Big\} - N_{(0,1)}(B)\Big| \to 0, \qquad n \to \infty. \tag{4.1.11}$$

(iii) (4.1.10) also holds under the conditions of Example 4.1.2.
(iii) (4.1.10) also holds under the conditions of Example 4.1.2.
Before sketching the proof of Theorem 4.1.4 let us examine an example which shows that we have to impose stronger regularity conditions on the underlying d.f. $F$ than those in Example 4.1.1 to guarantee the convergence w.r.t. the variational distance.
EXAMPLE 4.1.5. Let $F$ have the density
$$f = 1_{[-1/2,0]} + \sum_i \frac{2i+1}{i+1}\,1_{[1/(2i+1),\,1/(2i))}$$
where the summation runs over all positive integers $i$. By verifying the conditions of Example 4.1.1 we shall obtain that the d.f.'s of the standardized sample medians weakly converge to the standard normal d.f. $\Phi$. Since
$$\sum_{i=n+1}^{\infty} \frac{2i+1}{i+1}\Big(\frac{1}{2i} - \frac{1}{2i+1}\Big) = \frac{1}{2(n+1)} \tag{4.1.12}$$
it is easily seen that $\int f(x)\,dx = 1$. By (4.1.12),
$$F\Big(\frac{1}{2n+1}\Big) = F\Big(\frac{1}{2(n+1)}\Big) = \frac{1}{2} + \frac{1}{2(n+1)},$$
and hence, for every positive integer $n$,
$$F(x) - \frac{1}{2} = \begin{cases} \dfrac{1}{2(n+1)} & \text{if } x \in \Big[\dfrac{1}{2(n+1)}, \dfrac{1}{2n+1}\Big], \\[1.5ex] \dfrac{1}{2(n+1)} + \dfrac{2n+1}{n+1}\Big(x - \dfrac{1}{2n+1}\Big) & \text{if } x \in \Big[\dfrac{1}{2n+1}, \dfrac{1}{2n}\Big]. \end{cases}$$
This implies that $x - x^2 \le F(x) - 1/2 \le x$ for $|x| \le 1/2$, showing that $F$ is differentiable at $F^{-1}(1/2) = 0$ with $F^{(1)}(0) = 1$. Thus, by Example 4.1.1,
$$P\{2n^{1/2}X_{[n/2]:n} \le t\} \to \Phi(t), \qquad n \to \infty, \quad \text{for every } t,$$
which proves the weak convergence. On the other hand,
$$P\{2n^{1/2}X_{[n/2]:n} \in B_n\} = 0 < \liminf_k N_{(0,1)}(B_k) \tag{4.1.13}$$
for every $n$ where $B_n = \bigcup_i \big(2n^{1/2}/(2(i+1)),\, 2n^{1/2}/(2i+1)\big)$ with $i$ taken over all positive integers.

To prove (4.1.13) verify that the Lebesgue measure of $B_n \cap (0,1)$ is $\ge 1/4$ and that $f(x/(2n^{1/2})) = 0$ for $x \in B_n$.
The proof of Theorem 4.1.4 starts with the representation
$$a'(n)^{-1}(X_{r(n):n} - b'(n)) \stackrel{d}{=} T_n\big[a(n)^{-1}(U_{r(n):n} - b(n))\big]$$
where $T_n(x) = a'(n)^{-1}[F^{-1}(b(n) + xa(n)) - b'(n)]$. According to (4.1.9),
$$\sup_B \big|P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \in B\} - P\{T_n(\eta) \in B\}\big| \to 0 \tag{4.1.14}$$
as $n \to \infty$ where $\eta$ is a standard normal r.v.

To complete the proof of Theorem 4.1.4 it suffices to examine functions of standard normal r.v.'s. Denote by $S_n$ the inverse of $T_n$. Under appropriate regularity conditions, $S_n'(\varphi \circ S_n)$ is the density of $T_n(\eta)$. If $S_n(x) \to x$ and $S_n'(x) \to 1$ as $n \to \infty$ for every $x$ then $S_n'(\varphi \circ S_n) \to \varphi$, $n \to \infty$. Therefore, the Scheffé lemma implies the convergence to the standard normal distribution w.r.t. the variational distance.
This idea will be made rigorous within some general framework. The
following lemma should be regarded as a useful technicality.
Lemma 4.1.6. Let $Y_{i:n}$ be the order statistics of $n$ i.i.d. random variables with common continuous d.f. $F_0$ and $X_{i:n}$ be the order statistics of $n$ i.i.d. random variables with d.f. $F_1$. Let $h$ and $g(h \circ G)$ be probability densities where $h$ is assumed to be continuous at $x$ for almost all $x$. Then, if
$$\sup_B \Big|P\{a(n)^{-1}(Y_{r(n):n} - b(n)) \in B\} - \int_B h(x)\,dx\Big| \to 0, \qquad n \to \infty, \tag{4.1.15}$$
we have
$$\sup_B \Big|P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \in B\} - \int_B g(x)h(G(x))\,dx\Big| \to 0, \qquad n \to \infty, \tag{4.1.16}$$
provided the functions $S_n$ defined by
$$S_n(x) = a(n)^{-1}\big[F_0^{-1}(F_1(b'(n) + xa'(n))) - b(n)\big]$$
are

(a) strictly increasing and absolutely continuous on intervals $(\alpha(n), \beta(n))$ where $\alpha(n) \to -\infty$ and $\beta(n) \to \infty$, and

(b) $S_n(x) \to G(x)$ and $S_n'(x) \to g(x)$ as $n \to \infty$ for almost all $x$.
PROOF. Write $T_n(x) = a'(n)^{-1}[F_1^{-1}(F_0(b(n) + xa(n))) - b'(n)]$. Since $F_0$ is continuous we obtain from Corollary 1.2.6 that
$$P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \in B\} = P\{T_n[a(n)^{-1}(Y_{r(n):n} - b(n))] \in B\}$$
and hence condition (4.1.15) yields
$$\sup_B \Big|P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \in B\} - \int_B g(x)h(G(x))\,dx\Big|$$
$$\le \sup_B \Big|\int_{\{T_n \in B\}} h(x)\,dx - \int_B g(x)h(G(x))\,dx\Big| + o(1). \tag{4.1.17}$$

The image of $(\alpha(n), \beta(n))$ under $S_n$, say, $J_n$ is an open interval, and $T_n|J_n$ is the inverse of $S_n|(\alpha(n), \beta(n))$. By P.1.11,
$$\int_{\{T_n \in B\}} h(x)\,dx = \int_B h_n(x)\,dx \tag{4.1.18}$$
for every Borel set $B \subset (\alpha(n), \beta(n))$ where $h_n = S_n'(h \circ S_n)1_{(\alpha(n),\beta(n))}$. Notice that w.l.g. $S_n'$ can be assumed to be measurable. Since $\int h_n(x)\,dx \le 1$ and $h_n \to g(h \circ G)$ almost everywhere, the Scheffé lemma 3.3.2 yields
$$\sup_B \Big|\int_B h_n(x)\,dx - \int_B g(x)h(G(x))\,dx\Big| \to 0, \qquad n \to \infty.$$
This together with (4.1.18) yields
$$\sup_B \Big|\int_{\{T_n \in B\}} h(x)\,dx - \int_B g(x)h(G(x))\,dx\Big| \to 0, \qquad n \to \infty. \tag{4.1.19}$$
Combining (4.1.17) and (4.1.19) the proof is completed. □
Whereas the constants $a(n)$ and $b(n)$ are usually predetermined, the constants $a'(n)$ and $b'(n)$ should be chosen in a way such that $S_n$ fulfills the required conditions. If $G(x) = x$ and $g(x) = 1$ (that is, the limiting expressions in (4.1.15) and (4.1.16) are equal) then a natural choice of the constants $a'(n)$ and $b'(n)$ is
$$b'(n) = F_1^{-1}(F_0(b(n))) \qquad \text{and} \qquad a'(n) = a(n)/(F_0^{-1} \circ F_1)'(b'(n)). \tag{4.1.20}$$
Then $S_n(0) = 0$ and $S_n'(0) = 1$ so that a Taylor expansion of $S_n$ about zero yields that $S_n(x)$ is approximately equal to $x$ in a neighborhood of zero.

Now the proof of Theorem 4.1.4 will be a triviality.
PROOF OF THEOREM 4.1.4. We shall only prove (4.1.10) since (4.1.11) and (iii) follow in an analogous way.

Lemma 4.1.6 will be applied to $F_0$ being the uniform d.f. on (0,1), $F_1 = F$, $a(n) = (r(n)(n - r(n) + 1))^{1/2}/(n+1)^{3/2}$, $b(n) = r(n)/(n+1)$, $h = \varphi$, $g = 1$ and $G(x) = x$. (4.1.15) holds according to (4.1.9). Moreover, choose $b'(n) = F^{-1}(b(n))$ and $a'(n) = a(n)/f(b'(n))$. Since $f$ is continuous at $F^{-1}(q)$ and $f(F^{-1}(q)) > 0$ we know that $f$ is strictly positive on an interval $(F^{-1}(q) - \kappa, F^{-1}(q) + \kappa)$ for some $\kappa > 0$. This implies that $S_n(x) = a(n)^{-1}[F(b'(n) + xa'(n)) - b(n)]$ is strictly increasing and absolutely continuous on the interval $(-\kappa/2a'(n), \kappa/2a'(n))$, eventually, and hence condition (a) in Lemma 4.1.6 is satisfied. It is straightforward to verify condition (b). The proof is complete. □
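Since $U_{r:n}$ has the Beta$(r, n-r+1)$ density, the variational distance in (4.1.9) can be evaluated by direct numerical integration, which makes the convergence of this section visible without any simulation. The grid parameters and the choice $r \approx n/3$ below are illustrative assumptions.

```python
# Variational distance between the standardized uniform order statistic
# a_{r,n}^{-1}(U_{r:n} - b_{r,n}) and N(0,1), via a Riemann sum over the
# exact Beta(r, n-r+1) density.
import math

def tv_to_normal(n, r, grid=20000, lim=8.0):
    b = r / (n + 1)
    a = math.sqrt(r * (n - r + 1)) / (n + 1) ** 1.5
    log_beta = math.lgamma(r) + math.lgamma(n - r + 1) - math.lgamma(n + 1)
    def beta_pdf(u):
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return math.exp((r - 1) * math.log(u) + (n - r) * math.log(1 - u) - log_beta)
    h = 2 * lim / grid
    total = 0.0
    for i in range(grid):
        y = -lim + (i + 0.5) * h
        g = a * beta_pdf(b + a * y)        # density of a^{-1}(U_{r:n} - b)
        phi = math.exp(-y * y / 2) / math.sqrt(2 * math.pi)
        total += abs(g - phi) * h
    return 0.5 * total

for n in (20, 80, 320):
    print(n, round(tv_to_normal(n, n // 3), 4))   # decreases roughly like n^{-1/2}
```

The decrease of the printed values with $n$ reflects the $O((n/r(n-r))^{1/2})$ error bound established in Section 4.2.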
4.2. Expansions: A Single Central Order Statistic
The starting point for our study of expansions of distributions of central order statistics will be an expansion of the distribution of an order statistic $U_{r:n}$ of i.i.d. (0,1)-uniformly distributed r.v.'s. The leading term in the expansion will be the standard normal distribution $N_{(0,1)}$. The expansion will be ordered in powers of $(n/r(n-r))^{1/2}$. This shows that the accuracy of the approximation by $N_{(0,1)}$ is bad if $r$ or $n-r$ is small. The quantile transformation will lead to expansions in the case of order statistics of other r.v.'s.
Order Statistics of Uniform R.V.'s
For positive integers $n$ and $r \in \{1, \ldots, n\}$ put $a_{r,n}^2 = r(n-r+1)/(n+1)^3$ and $b_{r,n} = r/(n+1)$. Recall from Section 1.7 that $b_{r,n}$ and $a_{r,n}$ are the expectation and, approximately, the standard deviation of $U_{r:n}$.
Theorem 4.2.1. For every positive integer $m$ there exists a constant $C_m > 0$ such that for every $n$ and $r \in \{1, \ldots, n\}$,
$$\sup_B \Big|P\{a_{r,n}^{-1}(U_{r:n} - b_{r,n}) \in B\} - \int_B \Big(1 + \sum_{i=1}^{m-1} L_{i,r,n}\Big)dN_{(0,1)}\Big| \le C_m (n/r(n-r))^{m/2} \tag{4.2.1}$$
where $L_{i,r,n}$ is a polynomial of degree $\le 3i$.
PROOF. Throughout this proof, the indices $r$ and $n$ will be suppressed. Moreover, $C$ will be used as a generic constant which only depends on $m$. Put $\alpha = r$ and $\beta = n - r + 1$. From Theorem 1.3.2 it is immediate that the density of
$$a_{r,n}^{-1}(U_{r:n} - b_{r,n}) = \big((\alpha+\beta)^{3/2}/(\alpha\beta)^{1/2}\big)\big(U_{r:n} - \alpha/(\alpha+\beta)\big)$$
is of the form $pg$ where $p$ is a normalizing constant and
$$g(x) = \big[1 + (\beta/(\alpha+\beta)\alpha)^{1/2}x\big]^{\alpha-1}\big[1 - (\alpha/(\alpha+\beta)\beta)^{1/2}x\big]^{\beta-1}$$
if $-((\alpha+\beta)\alpha/\beta)^{1/2} < x < ((\alpha+\beta)\beta/\alpha)^{1/2}$. Notice that $\min[(\alpha+\beta)\alpha/\beta, (\alpha+\beta)\beta/\alpha] \ge \alpha\beta/(\alpha+\beta)$. Corollary A.2.3 yields
$$\Big|\exp(x^2/2)g(x) - \Big(1 + \sum_{i=1}^{m-1} h_i\Big)\Big| \le C\big[(\alpha+\beta)/\alpha\beta\big]^{m/2}(|x|^m + |x|^{3m}) \tag{1}$$
for $|x| \le [\alpha\beta/(\alpha+\beta)]^{1/6}$ where $h_i$ are the polynomials as described in Corollary A.2.3. Define the signed measure $\nu$ by
$$\nu(A) = \int_A \Big(1 + \sum_{i=1}^{m-1} h_i\Big)dN_{(0,1)}\Big/\int \Big(1 + \sum_{i=1}^{m-1} h_i\Big)dN_{(0,1)}.$$
W.l.g., by choosing the constant $C$ sufficiently large, we may assume that the term $\int(1 + \sum_{i=1}^{m-1} h_i)\,dN_{(0,1)}$ is bounded away from zero. By (1), the exponential bound (3.1.2) and Lemma A.3.2, applied to the functions $g$ and $f = \exp(-x^2/2)(1 + \sum_{i=1}^{m-1} h_i)$ and to the set $B = \{x: |x| \le [\alpha\beta/(\alpha+\beta)]^{1/6}\}$, we obtain
$$\sup_A \big|P\{((\alpha+\beta)^{3/2}/(\alpha\beta)^{1/2})(U_{r:n} - \alpha/(\alpha+\beta)) \in A\} - \nu(A)\big|$$
$$\le C((\alpha+\beta)/\alpha\beta)^{m/2}\int(|x|^m + |x|^{3m})\,dN_{(0,1)}\Big/\int\Big(1 + \sum_{i=1}^{m-1} h_i\Big)dN_{(0,1)}$$
$$\qquad + P\{((\alpha+\beta)^{3/2}/(\alpha\beta)^{1/2})(U_{r:n} - \alpha/(\alpha+\beta)) \in B^c\} + |\nu|(B^c)$$
$$\le C((\alpha+\beta)/\alpha\beta)^{m/2}.$$
Now the assertion is immediate from Lemma 3.2.5. □
Addendum 4.2.2. The application of Lemma 3.2.5 in the proof of Theorem 4.2.1 gives more precise information about the polynomials $L_{i,r,n}$.

(i) The polynomials $L_{i,r,n}$ are recursively defined by
$$L_{i,r,n} = h_{i,r,n} - \int h_{i,r,n}\,dN_{(0,1)} - \sum_{k=1}^{i-1}\Big(\int h_{k,r,n}\,dN_{(0,1)}\Big)L_{i-k,r,n}$$
where $h_{i,r,n} \equiv h_i$.

(ii) $\int L_{i,r,n}\,dN_{(0,1)} = 0$, $i = 1, \ldots, m-1$.

(iii) The coefficients of $L_{i,r,n}$ are of order $O((n/r(n-r))^{i/2})$.

(iv) For $i = 1, 2$ we have
$$L_{1,r,n}(x) = \frac{n-2r+1}{(r(n-r+1)(n+1))^{1/2}}\Big[\frac{x^3}{3} - x\Big] \tag{4.2.2}$$
and
$$L_{2,r,n}(x) = \frac{1}{r(n-r+1)(n+1)}\Big[(n-2r+1)^2\frac{x^6-15}{18} - \big[7(n-2r+1)^2 + 3r(n-r+1)\big]\frac{x^4-3}{12} + \big[r^2 - r(n-r+1) + (n-r+1)^2\big](x^2-1)\Big].$$
Before turning to the extension of Theorem 4.2.1 to a certain class of d.f.'s we make some comments:
(a) Perhaps the most important consequence of Theorem 4.2.1 is that we get a normal approximation with an error term of order $O((n/r(n-r))^{1/2})$. Thus, if $r = r(n) = [nq]$ where $0 < q < 1$ then the error bound is of order $O(n^{-1/2})$. In the intermediate case the approximation is less accurate and, moreover, if $r$ or $n-r$ is fixed (that is, the case of extreme order statistics) we have no approximation at all.

(b) When taking the expansion of length 2 (that is, we include the polynomial $L_{1,r,n}$ into our considerations) then the accuracy of the approximation improves considerably. We also get a better insight into the accuracy of the normal approximation.

For example, given the sample median $U_{n+1:2n+1}$ we see that the corresponding polynomial $L_{1,n+1,2n+1}$ is equal to zero and, thus, the accuracy of the normal approximation is of order $O(n^{-1})$. A similar conclusion can be made for order statistics which are close (as far as the indices are concerned) to the sample median. For sample quantiles different from the sample median the accuracy of the normal approximation cannot be better than $O(n^{-1/2})$.

Finally, we mention that for symmetric Borel sets $B$ (that is, $B$ has the property that $x \in B$ implies $-x \in B$) we have
$$\int_B L_{1,r,n}\,dN_{(0,1)} = 0,$$
so that for symmetric sets the normal approximation is of order $O(n/r(n-r))$.

(c) Numerical calculations show that for $n = 1, 2, \ldots, 250$ we can take $C_1 = .14$ and $C_2 = .12$ in Theorem 4.2.1.
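Comment (b) can be checked exactly: $P\{U_{r:n} \le u\}$ equals the binomial tail $P\{\mathrm{Bin}(n,u) \ge r\}$, and the term of the expansion of length 2 integrates in closed form, $\int_{-\infty}^{t} L_{1,r,n}\,dN_{(0,1)} = -(c/3)(t^2-1)\varphi(t)$ with $c = (n-2r+1)/(r(n-r+1)(n+1))^{1/2}$ as in (4.2.2). The particular $n$, $r$ and the $t$-grid below are illustrative choices.

```python
# Exact d.f. of U_{r:n} versus the normal approximation and the expansion of
# length 2; the latter should have the smaller maximal error (comment (b)).
import math

def exact_cdf(n, r, u):
    u = min(max(u, 0.0), 1.0)
    return sum(math.comb(n, k) * u ** k * (1 - u) ** (n - k) for k in range(r, n + 1))

n, r = 100, 30
b = r / (n + 1)
a = math.sqrt(r * (n - r + 1)) / (n + 1) ** 1.5
c = (n - 2 * r + 1) / math.sqrt(r * (n - r + 1) * (n + 1))  # coefficient in L_{1,r,n}

Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
phi = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

err_normal, err_expanded = 0.0, 0.0
for i in range(-30, 31):
    t = i / 10
    exact = exact_cdf(n, r, b + t * a)
    corr = -(c / 3) * (t * t - 1) * phi(t)   # integral of L_{1,r,n} over (-inf, t]
    err_normal = max(err_normal, abs(exact - Phi(t)))
    err_expanded = max(err_expanded, abs(exact - (Phi(t) + corr)))

print(round(err_normal, 4), round(err_expanded, 4))
```

For this off-median index the first-order correction is not negligible, and including it reduces the maximal error on the grid noticeably.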
The General Case
The extension of Theorem 4.2.1 to more general r.v.'s will be achieved by means of the transformation technique. If $X_{r:n}$ is the $r$th order statistic of $n$ i.i.d. random variables with common d.f. $F$ then $X_{r:n} \stackrel{d}{=} F^{-1}(U_{r:n})$. Notice that $F^{-1}$ is monotone. Apart from this special case one is also interested in other monotone transformations of $U_{r:n}$.
As a refinement of the idea which led to Lemma 4.1.6 we get the following
highly technical result.
Lemma 4.2.3. Let $m$ be a positive integer and $\varepsilon > 0$. Suppose that $S$ is a function with the properties $S(0) = 0$, $S$ is continuously differentiable on the interval $(-\varepsilon, \varepsilon)$, and
$$\Big|S'(x) - \Big[1 + \sum_{i=1}^{m-1} \alpha_i x^i/i!\Big]\Big| \le \alpha_m |x|^m/m!, \qquad |x| < \varepsilon, \tag{4.2.3}$$
with $|\alpha_i| \le \exp(-i\varepsilon)$ for $i = 1, \ldots, m$.

Moreover, let $R_i$ be polynomials of degree $\le 3i$ so that the absolute values of the coefficients are $\le \exp(-i\varepsilon)$ for $i = 1, \ldots, m-1$.

Then there exist constants $C > 0$ and $d \in (0,1)$ [which only depend on $m$] such that

(i) $S$ is strictly increasing on the interval $I = (-d\varepsilon, d\varepsilon)$.

(ii) For every monotone, real-valued function $T$ such that the restriction of $T$ to the set $S(I)$ is the inverse of the restriction $S|I$ we have
$$\sup_B \Big|\int_{\{T \in B\}}\Big(1 + \sum_{i=1}^{m-1} R_i\Big)dN_{(0,1)} - \int_B \Big(1 + \sum_{i=1}^{m-1} L_i\Big)dN_{(0,1)}\Big| \le C\exp(-m\varepsilon)$$
where $L_i$ is a polynomial of degree $\le 3i$ and the absolute values of the coefficients are $\le C\exp(-i\varepsilon)$ for $i = 1, \ldots, m-1$.

(iii) We have
$$L_1(x) = R_1(x) + \alpha_1(x - x^3/2) \tag{4.2.4}$$
and
$$L_2(x) = R_2(x) + \alpha_1\big[x^2R_1'(x)/2 + (x - x^3/2)R_1(x)\big] + \alpha_1^2\big[x^6/8 - 5x^4/8\big] + \alpha_2\big[x^2/2 - x^4/6\big].$$

PROOF. Since $\varepsilon^p\exp(-\varepsilon)$ is uniformly bounded on $[0, \infty)$ for every $p \ge 1$ there exists $d \in (0,1)$ such that
$$S'(x) \ge 1 - \sum_{i=1}^{m} [d\varepsilon\exp(-\varepsilon)]^i/i! \ge 1/2, \qquad |x| \le d\varepsilon. \tag{1}$$
The assertion (i) is immediate from (1). Moreover, (1) implies that
$$S(-d\varepsilon) \le -d\varepsilon/2 \qquad \text{and} \qquad S(d\varepsilon) \ge d\varepsilon/2. \tag{2}$$
From the condition $S(0) = 0$ and from (4.2.3) we deduce by integration that
$$\Big|S(x) - \Big(x + \sum_{i=1}^{m-1}\frac{x^{i+1}}{(i+1)!}\alpha_i\Big)\Big| \le \frac{|x|^{m+1}}{(m+1)!}\alpha_m, \qquad |x| < \varepsilon. \tag{3}$$
Using (3) we get in analogy to (1) that
$$(1 + |x|)|S(x) - x| \text{ is uniformly bounded over } |x| \le d\varepsilon. \tag{4}$$
Applying the transformation theorem for densities (1.4.4) we obtain for every Borel set $B \subset (-d\varepsilon, d\varepsilon)$ that
$$\int_{\{T \in B\}}\Big(1 + \sum_{i=1}^{m-1} R_i\Big)dN_{(0,1)} = \int_B h(x)\,dx \tag{5}$$
where
$$h(x) = S'(x)\varphi(S(x))\Big(1 + \sum_{i=1}^{m-1} R_i(S(x))\Big). \tag{6}$$
Expanding $\varphi$ about $x$ we obtain from (4)
$$\Big|\varphi(S(x)) - \varphi(x)\Big(1 + \sum_{i=1}^{m-1} w_i(x)(S(x) - x)^i\Big)\Big| \le C\varphi(x)\big|w_m(x + \theta(S(x) - x))\big|\,|S(x) - x|^m \tag{7}$$
for $|x| \le d\varepsilon$ and some $\theta \in (0,1)$. Moreover, $w_i = \varphi^{(i)}/(i!\varphi)$ is a polynomial of degree $\le i$ and $C$ denotes a generic constant which only depends on $m$. For $i = 1, 2$ we get
$$w_1(x) = -x \qquad \text{and} \qquad w_2(x) = (x^2 - 1)/2.$$
Writing
$$\psi(x) = \sum_{i=1}^{m-1}\frac{x^{i+1}}{(i+1)!}\alpha_i,$$
we obtain from (7) that
$$\Big|h(x) - \varphi(x)\big[1 + \psi^{(1)}(x)\big]\Big[1 + \sum_{i=1}^{m-1} w_i(x)\psi(x)^i\Big]\Big[1 + \sum_{i=1}^{m-1} R_i(x + \psi(x))\Big]\Big| \le C\varphi(x)\exp(-m\varepsilon)(1 + |x|^{6(m+1)^2}) \tag{8}$$
for $|x| < d\varepsilon$. From (8) we conclude that
$$\Big|h(x) - \varphi(x)\Big[1 + \sum_{i=1}^{m-1} L_i(x)\Big]\Big| \le C\varphi(x)\exp(-m\varepsilon)(1 + |x|^{6(m+1)^2}) \tag{9}$$
for $|x| < d\varepsilon$ where $L_i$ are polynomials which have the asserted property.
(5) and (9) we deduce by integration that
If (1 ~f
: ; f.
{TEB}
,=1
Ri)dN(o.l) 
Ih(X) 
for Borel sets B
by (2)
If (1 ~f
{TEB}
,=1
c (
(1 + ~:
r (1 + ~f
JB
,=1
L i)dN(o.l)1
Li(X)<p(X)ldX::; Cexp(me)
de, de). Moreover, for Borel sets B
Ri)dN(o. 1)
(10)
fB (1 + ~f
L i)dN(o.l)1
,=1
where A is the complement of (  de/2, de/2).
Combining (10) and (11) the proof is complete.
c (
de, dey we get
(11 )
Note that Lemma 4.2.3 still holds if the condition that S has a continuous
derivative is replaced by the weaker condition that S is absolutely continuous.
Next, an expansion of length $m$ will be established under the condition that the underlying d.f. $F$ has $m+1$ derivatives on some appropriate interval. Let again $a_{r,n}^2 = r(n-r+1)/(n+1)^3$ and $b_{r,n} = r/(n+1)$. Based on Theorem 4.2.1 and Lemma 4.2.3 the proof of Theorem 4.2.4 will be a triviality.
Theorem 4.2.4. For some $r \in \{1, \ldots, n\}$ let $X_{r:n}$ be the $r$th order statistic of $n$ i.i.d. random variables with common d.f. $F$ and density $f$. Assume that $f(F^{-1}(b_{r,n})) > 0$ and that the function $S_{r,n}$ defined by
$$S_{r,n}(x) = a_{r,n}^{-1}\big(F\big[F^{-1}(b_{r,n}) + xa_{r,n}/f(F^{-1}(b_{r,n}))\big] - b_{r,n}\big)$$
has $m+1$ derivatives on the interval
$$I_{r,n} := \{x: |x| < 2^{-1}\log(r(n-r+1)/(n+1))\}.$$
Then there exists a constant $C_m > 0$ (only depending on $m$) such that
$$\sup_B \Big|P\{a_{r,n}^{-1}f(F^{-1}(b_{r,n}))[X_{r:n} - F^{-1}(b_{r,n})] \in B\} - \int_B\Big(1 + \sum_{i=1}^{m-1} L_{i,r,n}\Big)dN_{(0,1)}\Big| \le C_m\Big[(n/r(n-r))^{m/2} + \max_{j=1}^{m}|\alpha_{j,r,n}|^{m/j}\Big] \tag{4.2.5}$$
where $L_{i,r,n}$ is a polynomial of degree $\le 3i$. Moreover, $\alpha_{j,r,n} = S_{r,n}^{(j+1)}(0)$ for $j = 1, \ldots, m-1$ and $\alpha_{m,r,n} = \sup\{|S_{r,n}^{(m+1)}(x)|: x \in I_{r,n}\}$.
PROOF. Throughout the proof, the indices $r$ and $n$ will be suppressed. Writing
$$T(x) = a^{-1}f(F^{-1}(b))\big[F^{-1}(b + xa) - F^{-1}(b)\big] \tag{1}$$
and denoting by $R_i$ the polynomials of Theorem 4.2.1 we obtain from Theorem 1.2.5 and Theorem 4.2.1 that for every Borel set $B$,
$$\Big|P\{a^{-1}f(F^{-1}(b))[X_{r:n} - F^{-1}(b)] \in B\} - \int_{\{T \in B\}}\Big(1 + \sum_{i=1}^{m-1} R_i\Big)dN_{(0,1)}\Big| \le C(n/r(n-r))^{m/2}.$$
It remains to prove that
$$\Big|\int_{\{T \in B\}}\Big(1 + \sum_{i=1}^{m-1} R_i\Big)dN_{(0,1)} - \int_B\Big(1 + \sum_{i=1}^{m-1} L_i\Big)dN_{(0,1)}\Big| \le C\Big[(n/r(n-r))^{m/2} + \max_{j=1}^{m}|\alpha_{j,r,n}|^{m/j}\Big]. \tag{2}$$
Put $\varepsilon = -\log[(n/r(n-r))^{1/2} + \max_{j=1}^{m}|\alpha_{j,r,n}|^{1/j}]$, and assume w.l.g. that $r(n-r)$ is sufficiently large so that $\varepsilon > 0$. A Taylor expansion of $S'$ about zero yields that condition (4.2.3) is satisfied for $\varepsilon$ and the $\alpha_i$. Moreover, $T|S(I)$ is the inverse of $S|I$. Thus, Lemma 4.2.3 implies (2). □
Addendum 4.2.5. From the proof of Theorem 4.2.4 we see that

(i) $\int L_{i,r,n}\,dN_{(0,1)} = 0$, $i = 1, \ldots, m-1$.

(ii) The coefficients of $L_{i,r,n}$ are of order
$$O\Big[(n/r(n-r))^{i/2} + \max_{j=1}^{m}|\alpha_{j,r,n}|^{i/j}\Big].$$

(iii) For $i = 1, 2$, we have (with $R_{i,r,n}$ denoting the polynomials of (4.2.2)),
$$L_{1,r,n}(x) = R_{1,r,n}(x) + \alpha_{1,r,n}(x - x^3/2)$$
and
$$L_{2,r,n}(x) = R_{2,r,n}(x) + \alpha_{1,r,n}\big[x^2R_{1,r,n}'(x)/2 + (x - x^3/2)R_{1,r,n}(x)\big] + \alpha_{1,r,n}^2(x^6/8 - 5x^4/8) + \alpha_{2,r,n}(x^2/2 - x^4/6).$$

Notice that Theorem 4.2.1 is immediate from Theorem 4.2.4 applied to $S_{r,n}(x) = x$. In this case we have $\alpha_{j,r,n} = 0$, $j = 1, \ldots, m$.

EXAMPLE 4.2.6. In many cases one can omit the term $\max_{j=1}^{m}|\alpha_{j,r,n}|^{m/j}$ on the right-hand side of (4.2.5).

Let $0 < q_1 < q_2 < 1$ and suppose that the density $f$ is bounded away from zero on the interval $J = (F^{-1}(q_1) - \varepsilon, F^{-1}(q_2) + \varepsilon)$ for some $\varepsilon > 0$. If $f$ has $m$ bounded derivatives on $J$ then $\max_{j=1}^{m}|\alpha_{j,r,n}|^{m/j} = O(n^{-m/2})$ uniformly over $r \in \{[nq_1], \ldots, [nq_2] + 1\}$.
Order Statistics of Exponential R.V.'s
Careful calculations will show that in the case of exponential r.v.'s the right-hand side of (4.2.5) is again of order $O((n/r(n-r))^{m/2})$.
Corollary 4.2.7. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. standard exponential r.v.'s (having the d.f. $G(x) = 1 - e^{-x}$ and density $g(x) = e^{-x}$, $x \ge 0$). Let again $a_{r,n}^2 = r(n-r+1)/(n+1)^3$ and $b_{r,n} = r/(n+1)$.

Then there exists a constant $C_m > 0$ (only depending on $m$) such that
$$\sup_B \Big|P\{a_{r,n}^{-1}g(G^{-1}(b_{r,n}))[X_{r:n} - G^{-1}(b_{r,n})] \in B\} - \int_B\Big(1 + \sum_{i=1}^{m-1} L_{i,r,n}\Big)dN_{(0,1)}\Big| \le C_m(n/r(n-r))^{m/2} \tag{4.2.6}$$
where the polynomials $L_{i,r,n}$ are defined as in Theorem 4.2.4 with
$$\alpha_{i,r,n} = (-1)^i\big(r/(n+1)(n-r+1)\big)^{i/2}.$$
In particular, for $i = 1, 2$,
$$L_{1,r,n}(x) = (r(n-r+1)(n+1))^{-1/2}\big[(2n - r + 2)x^3/6 - (n-r+1)x\big],$$
and
$$L_{2,r,n}(x) = R_{2,r,n}(x) + ((n-r+1)(n+1))^{-1}\big[r(-5x^6/24 + 15x^4/8 - 5x^2/2) - (n+1)(-x^6/6 + 4x^4/3 - 3x^2/2)\big]$$
where $R_{2,r,n}$ is the corresponding polynomial in Theorem 4.2.1.
PROOF. Since $g^{(i)}(G^{-1}(q)) = (-1)^i(1-q)$ it is immediate that $\alpha_{i,r,n}$ is of the desired form. Moreover, $|\alpha_{i,r,n}|^{1/i} \le (n/r(n-r+1))^{1/2}$.

Let $S_{r,n}$ and $I_{r,n}$ be defined as in Theorem 4.2.4. Since $\log(1+x) \le x$ for $x > -1$, and hence, also $\log x < x$, $x > 0$, we obtain
$$G^{-1}(b_{r,n}) - \log\big[(r(n-r+1)/(n+1))^{1/2}\big]a_{r,n}/g(G^{-1}(b_{r,n})) \ge b_{r,n} - \log\big[(r(n-r+1)/(n+1))^{1/2}\big]a_{r,n}/(1 - b_{r,n}) > 0.$$
Using this inequality we see that $S_{r,n}$ has $m+1$ derivatives on the interval $I_{r,n}$. Moreover, by straightforward calculations we obtain $\alpha_{m,r,n} \le C(n/r(n-r+1))^{m/2}$ where $C$ is a universal constant. Thus, Theorem 4.2.4 is applicable and yields the assertion. □
Numerical computations show that one can take $C_1 = .15$ and $C_2 = .12$ in Corollary 4.2.7 for $n = 1, \ldots, 250$. From the expansion of length 2 in Corollary 4.2.7 we obtain an upper bound of the remainder term of the normal approximation. Moreover,
$$\int L_{1,r,n}^2\,dN_{(0,1)} = \frac{8(n-r+1)^2 + 8r(n-r+1) + 5r^2}{12r(n-r+1)(n+1)} \le \frac{2(n+1)}{3r(n-r+1)}. \tag{4.2.7}$$
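Identity (4.2.7) can be verified symbolically from the coefficients of $L_{1,r,n}$ as reconstructed in Corollary 4.2.7, writing $L_{1,r,n}(x) = Ax^3 + Bx$ and using the normal moments $\int x^2\,dN_{(0,1)} = 1$, $\int x^4\,dN_{(0,1)} = 3$, $\int x^6\,dN_{(0,1)} = 15$; the brute-force range below is an arbitrary choice.

```python
# Exact rational check of (4.2.7): E(Ax^3 + Bx)^2 = 15 A^2 + 6 A B + B^2
# under N(0,1), with A, B read off from L_{1,r,n} in Corollary 4.2.7.
from fractions import Fraction

def int_L1_squared(n, r):
    s2 = Fraction(1, r * (n - r + 1) * (n + 1))       # 1/(r(n-r+1)(n+1))
    A2 = Fraction((2 * n - r + 2) ** 2, 36) * s2      # A^2
    AB = -Fraction((2 * n - r + 2) * (n - r + 1), 6) * s2
    B2 = Fraction((n - r + 1) ** 2, 1) * s2           # B^2
    return 15 * A2 + 6 * AB + B2

def rhs(n, r):
    return Fraction(8 * (n - r + 1) ** 2 + 8 * r * (n - r + 1) + 5 * r * r,
                    12 * r * (n - r + 1) * (n + 1))

for n in range(2, 30):
    for r in range(1, n + 1):
        assert int_L1_squared(n, r) == rhs(n, r)
        assert int_L1_squared(n, r) <= Fraction(2 * (n + 1), 3 * r * (n - r + 1))
print("(4.2.7) verified for n < 30")
```

Exact `Fraction` arithmetic avoids any floating-point doubt; both the equality and the upper bound in (4.2.7) hold identically in $r$ and $n$.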
Stochastic Independence of Certain Groups of Order Statistics
This section will be concluded with an application of the expansion of length
2 of distributions of order statistics Ui : n In the proof below we shall only
indicate the decisive step which is based on the expansion of length 2.
Hereafter, let 1 $; s < n  m + 1. Let Y.:n and v,,m+l:n be independent
r.v.'s such that Y.:n 4: Us :n and v,,m+l:n 4: Un m+1:no The basic inequality
is given by
$;
sm
C [ n(n _ s  m)
12
/
(4.2.8)
where $C > 0$ is a universal constant. Thus, if $s$ and $m$ are fixed then the upper bound is of order $O(n^{-1})$. If $s$ is fixed and $(n-m)/n$ is bounded away from 0 and 1 then the bound is of order $O(n^{-1/2})$. Finally, if $s$ is fixed and $n - m = o(n)$ then the bound is of order $O((n-m)^{-1/2})$. This shows that extremes and intermediate order statistics are asymptotically independent.

The proof of (4.2.8) is based on Theorem 1.8.1 and Theorem 4.2.1. Conditioning on $U_{n-m+1:n}$ one obtains
$$P\{(U_{s:n}, U_{n-m+1:n}) \in B\} - P\{(Y_{s:n}, Y_{n-m+1:n}) \in B\} = ET(U_{n-m+1:n}) \tag{4.2.9}$$
where
$$T(x) = P\{xU_{s:n-m} \in B_x\} - P\{U_{s:n} \in B_x\}$$
with $B_x$ denoting the $x$-section of the set $B$.

The function $T$ is of a rather complicated structure and has to be replaced by a simpler one. This can be achieved by expansions of length 2. The approximate representation of $T$ as the difference of two expansions of length 2 simplifies further computations. We remark that a normal approximation instead of an expansion of length 2 leads to an inaccurate upper bound in (4.2.8). For details of the proof we refer to Falk and Reiss (1988) where the following two extensions of (4.2.8) can also be found.
Theorem 4.2.8. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. random variables with common d.f. $F$. Given $1 \le s < n - m + 1 \le n$ we consider two vectors of order statistics, namely,
$$X_l = (X_{1:n}, \ldots, X_{s:n}) \qquad \text{and} \qquad X_u = (X_{n-m+1:n}, \ldots, X_{n:n}).$$
Now let $Y_l$ and $Y_u$ be independent random vectors so that $Y_l \stackrel{d}{=} X_l$ and $Y_u \stackrel{d}{=} X_u$. Then,
$$\sup_B\big|P\{(X_l, X_u) \in B\} - P\{(Y_l, Y_u) \in B\}\big| \le C\Big[\frac{sm}{n(n-s-m)}\Big]^{1/2} \tag{4.2.10}$$
where $C$ is the constant in (4.2.8).
A further extension is obtained when treating three groups of order
statistics.
Theorem 4.2.9. Let $X_{i:n}$ be as above. Given $1 \le k < r < s < n - m + 1 \le n$ we obtain three vectors of order statistics, namely,
$$X_l = (X_{1:n}, \ldots, X_{k:n}), \quad X_c = (X_{r:n}, \ldots, X_{s:n}), \quad X_u = (X_{n-m+1:n}, \ldots, X_{n:n}).$$
Now let $Y_l$, $Y_c$ and $Y_u$ be independent random vectors so that $Y_l \stackrel{d}{=} X_l$, $Y_c \stackrel{d}{=} X_c$ and $Y_u \stackrel{d}{=} X_u$. Then there exists a universal constant $C > 0$ such that
$$\sup_B\big|P\{(X_l, X_c, X_u) \in B\} - P\{(Y_l, Y_c, Y_u) \in B\}\big| \le C\Big[\frac{k(n-r)}{n(r-k)} + \frac{sm}{n(n-s-m)}\Big]^{1/2}. \tag{4.2.11}$$
Both theorems are deduced from (4.2.8) by means of the quantile transformation and by conditioning on order statistics.
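The asymptotic independence expressed by Theorem 4.2.8 is easy to observe in the simplest case $s = m = 1$ for a uniform sample, where $X_l = X_{1:n}$ and $X_u = X_{n:n}$: the joint probability of a lower and an upper extreme event factorizes approximately. Thresholds, sizes and the seed below are illustrative assumptions.

```python
# Monte Carlo illustration: P(min <= a, max >= b) vs P(min <= a) * P(max >= b)
# for a uniform sample; the gap is O(1/n) here by (4.2.8).
import random

rng = random.Random(3)
n, reps = 200, 4000
a_thr, b_thr = 1.0 / n, 1.0 - 1.0 / n

joint = lo = hi = 0
for _ in range(reps):
    u = [rng.random() for _ in range(n)]
    mn, mx = min(u), max(u)
    lo += mn <= a_thr
    hi += mx >= b_thr
    joint += (mn <= a_thr) and (mx >= b_thr)

p_lo, p_hi, p_joint = lo / reps, hi / reps, joint / reps
print(round(p_joint, 3), round(p_lo * p_hi, 3))   # nearly equal for large n
```

Here the exact joint probability $1 - 2(1-1/n)^n + (1-2/n)^n$ and the product $(1-(1-1/n)^n)^2$ both tend to $(1-e^{-1})^2$, so the difference seen in the simulation is dominated by Monte Carlo noise.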
4.3. Asymptotic Independence from the
Underlying Distribution Function
From the preceding section we know that the normalized central order statistic $f(F^{-1}(b_{r,n}))(X_{r:n} - F^{-1}(b_{r,n}))$ is asymptotically normal with expectation $\mu = 0$ and variance $a_{r,n}^2 = r(n-r+1)/(n+1)^3$ up to a remainder term of order $O(n^{-1/2})$ if, roughly speaking, the underlying density $f$ is bounded away from zero. In the present section we shall primarily be interested in the property that the approximating normal distribution is independent from the underlying d.f. $F$. Consequently,
$$\sup_B\big|P\{f(F^{-1}(b_{r,n}))(X_{r:n} - F^{-1}(b_{r,n})) \in B\} - P\{(U_{r:n} - b_{r,n}) \in B\}\big| = O(n^{-1/2}) \tag{4.3.1}$$
where $U_{r:n}$ is the $r$th order statistic of $n$ i.i.d. (0,1)-uniformly distributed r.v.'s. Notice that the error bound above is sharp since the second term of the expansion of length two depends on the density $f$.
The Main Result
In analogy to (4.3.1) it will be shown in Theorem 4.3.1 that the variational distance between standardized joint distributions of $k$ order statistics is of order $O((k/n)^{1/2})$. That means, after a linear transformation which depends on the underlying d.f. $F$ the joint distribution of order statistics becomes independent from $F$ within an error bound of order $O((k/n)^{1/2})$.

When treating the normal approximation, the situation is completely different. It is clear that the joint asymptotic normality of order statistics $X_{r:n}$ and $X_{s:n}$ implies that the spacings $X_{s:n} - X_{r:n}$ also have this property. However, if $s - r$ is fixed then spacings behave like extreme order statistics, and hence, the limiting distribution is different from the normal distribution.
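The remark on spacings can be illustrated directly: for uniform order statistics the normalized adjacent spacing $(n+1)(U_{r+1:n} - U_{r:n})$ is approximately standard exponential rather than normal, which is the extreme-value type behavior referred to above. Sizes, seed and tolerances below are illustrative.

```python
# Adjacent spacing of uniform order statistics: mean approx 1 and
# P(spacing > 1) approx exp(-1), as for a standard exponential r.v.
import math
import random

rng = random.Random(11)
n, r, reps = 300, 150, 4000
spacings = []
for _ in range(reps):
    u = sorted(rng.random() for _ in range(n))
    spacings.append((n + 1) * (u[r] - u[r - 1]))   # (n+1)(U_{r+1:n} - U_{r:n})

mean = sum(spacings) / reps
tail = sum(s > 1.0 for s in spacings) / reps
print(round(mean, 2), round(tail, 3), round(math.exp(-1), 3))
```

In fact $U_{r+1:n} - U_{r:n}$ has the Beta$(1,n)$ distribution for every $r$, so the exponential limit holds exactly in this sense, no matter which (fixed or central) index $r$ is chosen.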
Theorem 4.3.1. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. random variables with common d.f. $F$ and density $f$.

Let $0 = r_0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$ with $r_i - r_{i-1} \ge 4$ for $i = 1, 2, \ldots, k+1$. Put $b_i = r_i/(n+1)$ and $a_i^2 = b_i(1 - b_i)$ for $i = 1, \ldots, k$.

Assume that $f > 0$ and $f$ has three derivatives on the interval $I$ where $I = (F^{-1}(b_1) - \varepsilon_1, F^{-1}(b_k) + \varepsilon_k)$ with $\varepsilon_i = 5n^{-1/2}(\log n)a_i/f(F^{-1}(b_i))$ for $i = 1, k$. Then, there exists a universal constant $C > 0$ such that
$$\sup_B\big|P\{[f(F^{-1}(b_i))(X_{r_i:n} - F^{-1}(b_i))]_{i=1}^{k} \in B\} - P\{[(U_{r_i:n} - b_i)]_{i=1}^{k} \in B\}\big|$$
$$\le C(k/n)^{1/2}\big[c(f)^{1/2} + c(f)^2 + n^{-1/2}\big]$$
where $c(f) = \max_{j=1}^{3}\big[\sup_{y \in I}|f^{(j)}(y)|/\inf_{y \in I}f^{j+1}(y)\big]$.
At the end of this section we shall give an example showing that Theorem 4.3.1 does not hold for $r_i - r_{i-1} = 1$. It is difficult to make a conjecture whether the result holds for $r_i - r_{i-1} = 2$ or $r_i - r_{i-1} = 3$. As we will see in the proof of Theorem 4.3.1 one reason for the restriction $r_i - r_{i-1} \ge 4$ is that the supports of the two joint distributions are unequal.

Theorem 4.3.1 is a slight improvement of Theorem 2.1 in Reiss (1981b) which was proved under the stronger condition that $r_i - r_{i-1} \ge 5$. Therefore, the proof is given in its full length. Another reason for running through all the technical details is to facilitate and to encourage further research work. Theorem 4.3.1 may be of interest as a challenging problem that can only be solved when having a profound knowledge of the distributional properties of order statistics.
Theorem 4.3.1 also serves as a powerful tool to prove various results for
order statistics. As an example we mention a result of Section 4.5 stating that
several order statistics of i.i.d. exponential r.v.'s are jointly asymptotically
normal. By making use of Theorem 4.3.1, this may easily be extended to other
r.v.'s. However, one should notice that a stronger result may be achieved by
using a method adjusted to the particular problem. Thus, applications of
Theorem 4.3.1 will lead to results of a preliminary character which may
stimulate further research work. Another application of Theorem 4.3.1 will
concern linear combinations of order statistics (see Section 6.2).
PROOF OF THEOREM 4.3.1. Part I. We write $\mu_i = F^{-1}(b_i)$, $f_i = f(\mu_i)$ and, more generally, $f_i^{(j)} = f^{(j)}(\mu_i)$. Denote by $Q_0$ and $Q_1$ the distributions of
$$(U_{r_i:n} - b_i)_{i=1}^{k} \qquad \text{and, respectively,} \qquad (f_i(X_{r_i:n} - \mu_i))_{i=1}^{k},$$
and by $g_0$ and $g_1$ the corresponding densities.

From Lemma 3.3.9(i) and Lemma A.3.5 we obtain
$$\sup_B|Q_0(B) - Q_1(B)| \le \Big[2\Big(Q_0(A^c) - \int_A \log\frac{g_1}{g_0}\,dQ_0\Big)\Big]^{1/2} \tag{1}$$
for some Borel set $A$ to be fixed later. The main difficulty of the proof is to obtain a sharp lower bound of $\int_A \log(g_1/g_0)\,dQ_0$.
We have
$$g_0(x) = K\prod_{i=1}^{k+1}\psi_i(x)^{r_i - r_{i-1} - 1}$$
and
$$g_1(x) = K\Big(\prod_{i=1}^{k+1}\big[\psi_i(x) + \delta_i(x)\big]^{r_i - r_{i-1} - 1}\Big)\prod_{i=1}^{k} h_i(x)$$
on the set
$$A_1 = \{x: F(\mu_1 + x_1/f_1) < \cdots < F(\mu_k + x_k/f_k)\}.$$
Moreover, $K$ is a normalizing constant, $h_i(x) = f(\mu_i + x_i/f_i)/f_i$, $\psi_i(x) = x_i - x_{i-1} + (b_i - b_{i-1})$, $\delta_i(x) = F(\mu_i + x_i/f_i) - F(\mu_{i-1} + x_{i-1}/f_{i-1}) - \psi_i(x)$ for $i = 1, \ldots, k+1$ [with the convention that $x_0 = x_{k+1} = 0$, $F(\mu_0 + x_0/f_0) = 0$ and $F(\mu_{k+1} + x_{k+1}/f_{k+1}) = 1$]. Thus, for $A \subset A_1$ we have
$$\int_A\Big(\log\frac{g_1}{g_0}\Big)dQ_0 = \sum_{i=1}^{k}\int_A(\log h_i)\,dQ_0 + \sum_{i=1}^{k+1}(r_i - r_{i-1} - 1)\int_A\log\Big(1 + \frac{\delta_i}{\psi_i}\Big)dQ_0. \tag{2}$$
To obtain an expansion of $\log(1 + \delta_i/\psi_i)$, we introduce the sets
$$A_{2,i} = \{x: |\delta_i(x)| \le \psi_i(x)/2\}, \qquad i = 1, \ldots, k+1.$$
Notice that
$$\big|\log(1 + \delta_i/\psi_i) - \delta_i/\psi_i\big| \le C(\delta_i/\psi_i)^2 \tag{3}$$
on $A_{2,i}$ where, throughout the proof, $C$ denotes a universal constant that is not necessarily the same at each appearance. Moreover, we write
$$A_{3,i} = \{x: |x_i| \le 5n^{-1/2}(\log n)a_i\}$$
and
$$A = A_1 \cap \bigcap_{i=1}^{k+1} A_{2,i} \cap \bigcap_{i=1}^{k} A_{3,i}. \tag{4}$$
We shall verify that the following three inequalities hold:
$$\Big|\sum_{i=1}^{k}\int_A(\log h_i)\,dQ_0\Big| \le C\big[c(f)Q_0(A^c)^{2/3}k/n^{1/2} + (c(f) + c(f)^2)k/n\big], \tag{5}$$
$$\Big|\sum_{i=1}^{k+1}(r_i - r_{i-1} - 1)\int_A\log(1 + \delta_i/\psi_i)\,dQ_0\Big| \le C\big[c(f)Q_0(A^c)^{2/3}k/n^{1/2} + (c(f) + c(f)^2)k/n\big], \tag{6}$$
$$Q_0(A^c) \le C\big[n^{-1} + c(f)^4(\log n)^{1/2}n^{-1}\big]. \tag{7}$$
The assertion of the theorem is immediate from (1), (2), and (5)-(7).

A Taylor expansion of $\log(f/f_i)$ about $\mu_i$ yields
$$\big|\log h_i(x) - (f_i^{(1)}/f_i^2)x_i\big| \le C(c(f) + c(f)^2)x_i^2$$
for $x \in A_{3,i}$ and $i = 1, \ldots, k$. Since $\int x_i\,dQ_0(x) = 0$ we obtain
$$\Big|\int_A(\log h_i)\,dQ_0\Big| \le \Big|\int_{A^c}(f_i^{(1)}/f_i^2)x_i\,dQ_0\Big| + C(c(f) + c(f)^2)\int x_i^2\,dQ_0, \tag{8}$$
and hence, (5) is immediate from (1.7.4).
Next, we shall prove a lower bound of Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) ∫_A log(1 + δ_i/ψ_i) dQ_0. It is obvious from (3) that

|Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) ∫_A log(1 + δ_i/ψ_i) dQ_0| ≤ C(|ρ_1| + |ρ_2| + ρ_3)    (9)

with

ρ_1 = Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) ∫_A (a_i x_i² − a_{i−1} x_{i−1}²)/ψ_i(x) dQ_0(x),

ρ_2 = Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) ∫_A (δ_i(x) − (a_i x_i² − a_{i−1} x_{i−1}²))/ψ_i(x) dQ_0(x),

ρ_3 = Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1) ∫_A (δ_i/ψ_i)² dQ_0,

where the constants a_i are given by a_i = f_i^{(1)}/(2f_i²) for i = 1, ..., k, and a_0 = a_{k+1} = 0. From P.1.25 it is easily seen that

∫ ψ_i^{-1} dQ_0 ≤ C(n + 1)/(r_i − r_{i−1} − 1).
Some straightforward calculations yield

|a_i x_i² − a_{i−1} x_{i−1}²| ≤ C c(f)(x_i² + x_{i−1}²)

for every x and i = 2, ..., k. Moreover, Σ_{i=1}^{k+1} (a_i x_i² − a_{i−1} x_{i−1}²) = 0 and r_i − r_{i−1} − (n+1)ψ_i = −(n+1)(x_i − x_{i−1}). Combining these relations and applying the Hölder inequality we obtain
Since r_i − r_{i−1} ≥ 4 we know that P.1.23 is applicable to ∫ ψ_i^{-3} dQ_0 and hence the Hölder inequality, Lemma 3.1.3 and Corollary 1.6.8 yield

|ρ_1| ≤ C(c(f) + c(f)²) k/n.    (11)
To obtain a sharp upper bound of |ρ_2| one has to utilize some tedious estimates of |δ_i(x) − (a_i x_i² − a_{i−1} x_{i−1}²)|. A Taylor expansion of G(y) = F(μ_i + y x_i/f_i) − F(μ_{i−1} + y x_{i−1}/f_{i−1}) about y = 0 yields

|δ_i(x) − (a_i x_i² − a_{i−1} x_{i−1}²)| = 6^{-1} |f^{(2)}(μ_i + θx_i/f_i) x_i³/f_i³ − f^{(2)}(μ_{i−1} + θx_{i−1}/f_{i−1}) x_{i−1}³/f_{i−1}³|

for every i = 2, ..., k and x ∈ A_{3,i} ∩ A_{3,i−1} where θ ∈ (0,1). Thus, by further Taylor expansions of f and of the derivatives of F we get

|δ_i(x) − (a_i x_i² − a_{i−1} x_{i−1}²)| ≤ C(c(f)|x_i³ − x_{i−1}³| + |x_{i−1}|³[c(f)|x_i − x_{i−1}| + (c(f) + c(f)²)(b_i − b_{i−1})]) =: η_i(x).    (12)

For i = 1 and x ∈ A_{3,1} and, respectively, i = k+1 and x ∈ A_{3,k} we get

|δ_i(x) − (a_i x_i² − a_{i−1} x_{i−1}²)| ≤ C c(f)|x_i − x_{i−1}|³ =: η_i(x).    (13)
Since Σ_{i=1}^{k+1} [δ_i(x) − (a_i x_i² − a_{i−1} x_{i−1}²)] = 0 we obtain, using again the Hölder inequality and applying (12) and (13), that

|ρ_2| ≤ Σ_{i=1}^{k+1} ∫ [η_i(x)(1 + (n+1)|x_i − x_{i−1}|)/ψ_i(x)] dQ_0(x)
     ≤ Σ_{i=1}^{k+1} (∫ [η_i(x)(1 + (n+1)|x_i − x_{i−1}|)]² dQ_0(x))^{1/2} (∫ ψ_i^{-2} dQ_0)^{1/2}.

Proceeding as in the proof of (11) we obtain

|ρ_2| ≤ C(c(f) + c(f)²) k/n.    (14)
Moreover, the arguments used to prove (11) and (14) also lead to

ρ_3 ≤ Σ_{i=1}^{k+1} (r_i − r_{i−1} − 1)(∫ [η_i(x) + c(f)|x_i² − x_{i−1}²| + (c(f) + c(f)²)(b_i − b_{i−1}) x_{i−1}²]⁶ dQ_0(x))^{1/3} (∫ ψ_i^{-3} dQ_0)^{2/3}
    ≤ C(c(f) + c(f)²) k/n.    (15)

Combining (9), (11), (14), and (15) we obtain (6).
Finally, we prove (7). Applying Lemma 3.1.1 we get

Q_0{x: |x_i| ≥ 5σ_i(log n)/n^{1/2}} ≤ Cn^{-3}    (16)
for i = 1, ..., k. Hence

Q_0(A_{3,i}^c) ≤ Cn^{-3}    (17)

for i = 1, ..., k, and in view of Corollary 1.6.8,

Q_0{x: |x_i − x_{i−1}| ≥ 5(b_i − b_{i−1})^{1/2}(log n)/n^{1/2}} ≤ Cn^{-3}    (18)

for i = 2, ..., k. From (12), (13), (17), and (18) we infer that

Q_0{|δ_i| ≤ ε_n} ≥ 1 − Cn^{-3}    (19)

for i = 1, ..., k+1 where ε_n = c(f)(b_i − b_{i−1})^{1/2}(log n)³/n^{1/2}. Since r_i − r_{i−1} ≥ 4 we deduce from Lemma 3.1.2 that

Q_0{ψ_i ≥ 3ε_n} ≥ 1 − C c(f)⁴(log n)^{1/2}/n²    (20)

for i = 1, ..., k+1. Combining (19) and (20) we get

Q_0(A_{2,i}^c) ≤ C[n^{-3} + c(f)⁴(log n)^{1/2}/n²]    (21)

for i = 1, ..., k+1. It is immediate that Q_0(A_1) ≥ Q_0(∩_{i=1}^k A_{3,i}). This together with (17) and (21) yields

Q_0(A^c) ≤ C[k/n³ + c(f)⁴(log n)^{1/2} k/n²].    (22)

Thus, (7) holds and the proof is complete.
Counterexample

Theorem 4.3.1 was proved under the condition r_i − r_{i−1} ≥ 4. A counterexample in Reiss (1981b) shows that this result does not hold if r_i − r_{i−1} = 1 for i = 1, 2, ..., k.

EXAMPLE 4.3.2. Let X_{i:n} be the ith order statistic of n i.i.d. standard exponential r.v.'s (with common d.f. G and density g). Then, if n^{1/2} = o(k(n)) and [nq] + k(n) ≤ n where q ∈ (0,1) is fixed, we obviously have

P{U_{i:n} − U_{i−1:n} > 0 for i = [nq], ..., [nq] + k(n)} = 1

and, with b_i = i/(n+1) and μ_i = G^{-1}(b_i), it can be verified that

lim sup_n P{g(μ_i)(X_{i:n} − μ_i) − g(μ_{i−1})(X_{i−1:n} − μ_{i−1}) + (b_i − b_{i−1}) > 0 for i = [nq], ..., [nq] + k(n)} < 1.

Thus, the remainder term in Theorem 4.3.1 is not of order O((k/n)^{1/2}) for such sets.
4.4. The Approximate Multivariate Normal Distribution
From Section 4.3 we already know that normalized joint distributions of central order statistics are asymptotically independent of the underlying d.f. F. In Section 4.5 we shall prove that, under appropriate regularity conditions, the joint distributions are approximately normal. In the present section we introduce and study some properties of such normal distributions.

To find these approximate normal distributions it suffices to consider order statistics U_{r_1:n} ≤ U_{r_2:n} ≤ ... ≤ U_{r_k:n} of n i.i.d. random variables uniformly distributed on (0,1). Put b_i = r_i/(n+1). Then the normalized order statistics

(n+1)^{1/2}(U_{r_i:n} − b_i), i = 1, ..., k,

have expectation equal to zero and covariances approximately equal to b_i(1 − b_j) for i ≤ j. Thus, adequate candidates of approximate joint normal distributions of central order statistics are the k-variate normal distributions N_{(0,Σ)} with mean vector zero and covariance matrix Σ = (σ_{i,j}) where σ_{i,j} = b_i(1 − b_j) for 1 ≤ i ≤ j ≤ k. Below the b_i are replaced by arbitrary λ_i.
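These moment claims are easy to probe by simulation. The following sketch is not from the book; the sample size n = 99, the ranks, and the replication count are arbitrary choices. It estimates the means and covariances of (n+1)^{1/2}(U_{r_i:n} − b_i) by Monte Carlo and compares them with b_i(1 − b_j):

```python
# Monte Carlo check (illustrative sketch) that the normalized uniform order
# statistics (n+1)^{1/2}(U_{r_i:n} - b_i) have mean ~ 0 and covariance
# ~ b_i (1 - b_j) for i <= j.
import random

random.seed(1)
n = 99
ranks = (25, 50, 75)                      # r_1 < r_2 < r_3 (arbitrary)
b = [r / (n + 1) for r in ranks]          # b_i = r_i/(n+1)
M = 20000                                 # number of replications

samples = []
for _ in range(M):
    u = sorted(random.random() for _ in range(n))
    samples.append([(n + 1) ** 0.5 * (u[r - 1] - b[i])
                    for i, r in enumerate(ranks)])

def cov(i, j):
    mi = sum(s[i] for s in samples) / M
    mj = sum(s[j] for s in samples) / M
    return sum((s[i] - mi) * (s[j] - mj) for s in samples) / M

for i in range(3):
    for j in range(i, 3):
        print(i, j, round(cov(i, j), 3), round(b[i] * (1 - b[j]), 3))
```

The empirical values agree with b_i(1 − b_j) up to the O(1/n) bias and the Monte Carlo error.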
Ml 
Representations
Our first aim is to represent N(o,l:.) as a distribution induced by the kvariate standard normal distribution N(O,I) where I denotes the unit matrix.
Obviously, N(O,I) = N/'o, 1)' Given = Ao < A1 < ... < Ak < 1 define the linear
map Tby
(4.4.1)
TN(o,I) = N(o,l:.)
Lemma 4.4.1.
(that is, N(o, I) {T E B} = N(o,l:.)(B) for every Borel set B).
PROOF. Let T also denote the matrix which corresponds to the linear map.
The standard formula for normal distributions yields that T~O,I) has the
covariance matrix H = ('1i) = TTl where yt is the transposed of T. Thus,
~
Am  Am1
'1i,j = (1  Ai)(1 . Aj) m~l (1  Am 1 )(1  Am)
for i ~j.
By induction over j = 1, ... , k we get
Am  Aml
m=l (1  Amd(l  Am)
and hence '1i,j = (1  Ai)AJor i
~ j.
Aj
(1  A)
Since '1i,j = '1j,i the proof is complete.
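Lemma 4.4.1 can also be verified numerically. The sketch below (illustrative only; the grid of λ_i values is an arbitrary choice) builds the lower-triangular matrix of (4.4.1) and checks that TT^t reproduces σ_{i,j} = λ_i(1 − λ_j), i ≤ j:

```python
# Numerical sanity check (sketch) of Lemma 4.4.1: the lower-triangular map T
# from (4.4.1) satisfies T T^t = Sigma with Sigma_{ij} = lambda_i(1 - lambda_j).
lam = [0.2, 0.45, 0.7, 0.9]               # 0 < lambda_1 < ... < lambda_k < 1
k = len(lam)
lam0 = [0.0] + lam                        # prepend lambda_0 = 0

# T[i][m] = (1-lam_i) * sqrt((lam_m - lam_{m-1}) / ((1-lam_{m-1})(1-lam_m))), m <= i
T = [[(1 - lam[i]) * ((lam0[m + 1] - lam0[m])
      / ((1 - lam0[m]) * (1 - lam0[m + 1]))) ** 0.5 if m <= i else 0.0
      for m in range(k)] for i in range(k)]

TTt = [[sum(T[i][m] * T[j][m] for m in range(k)) for j in range(k)]
       for i in range(k)]
Sigma = [[min(lam[i], lam[j]) * (1 - max(lam[i], lam[j])) for j in range(k)]
         for i in range(k)]
err = max(abs(TTt[i][j] - Sigma[i][j]) for i in range(k) for j in range(k))
print(err)  # should be ~ 0 up to rounding
```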
From standard calculus for normal distributions we know that the density φ_{(0,Σ)} of N_{(0,Σ)} is given by

φ_{(0,Σ)}(x) = [(2π)^k det Σ]^{-1/2} exp(−2^{-1} x^t Σ^{-1} x)    (4.4.2)

where x = (x_1, ..., x_k)^t and Σ^{-1} is the inverse matrix of Σ. By elementary calculations and by formula (4.4.4) below we get an alternative representation of φ_{(0,Σ)}, namely,

φ_{(0,Σ)}(x) = [(2π)^k ∏_{i=1}^{k+1} (λ_i − λ_{i−1})]^{-1/2} exp(−2^{-1} Σ_{i=1}^{k+1} (x_i − x_{i−1})²/(λ_i − λ_{i−1}))    (4.4.3)

where λ_0 = 0, λ_{k+1} = 1 and x_0 = x_{k+1} = 0.

Lemma 4.4.2. (i) The matrix Σ^{-1} = (α_{i,j}) is given by

α_{i,i} = (λ_{i+1} − λ_{i−1})/((λ_{i+1} − λ_i)(λ_i − λ_{i−1})), i = 1, ..., k,

α_{i,i−1} = α_{i−1,i} = −(λ_i − λ_{i−1})^{-1}, i = 2, ..., k, and α_{i,j} = 0, otherwise.

(ii) det Σ^{-1} = ∏_{i=1}^{k+1} (λ_i − λ_{i−1})^{-1}.    (4.4.4)
PROOF. (i) Let T be defined as in (4.4.1). The inverse of T is represented by the matrix B = (β_{i,j}) given by

β_{i,i} = [(1 − λ_{i−1})/((1 − λ_i)(λ_i − λ_{i−1}))]^{1/2}, i = 1, ..., k,

and

β_{i,i−1} = −[(1 − λ_i)/((1 − λ_{i−1})(λ_i − λ_{i−1}))]^{1/2}, i = 2, ..., k,

and β_{i,j} = 0, otherwise. Notice that Σ^{-1} = B^tB = (Σ_{m=1}^k β_{m,i}β_{m,j})_{i,j} and, thus, α_{i,i} = β_{i,i}² + β_{i+1,i}², α_{i,i−1} = α_{i−1,i} = β_{i,i}β_{i,i−1} and α_{i,j} = 0, otherwise. The proof of (i) is complete.

(ii) Moreover,

det Σ^{-1} = (det B)² = ∏_{i=1}^k β_{i,i}² = ∏_{i=1}^k (1 − λ_{i−1})/((1 − λ_i)(λ_i − λ_{i−1})) = ∏_{i=1}^{k+1} (λ_i − λ_{i−1})^{-1}. □
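As with Lemma 4.4.1, the inverse formula of Lemma 4.4.2(i) admits a quick numerical sanity check (a sketch with an arbitrarily chosen λ grid): multiplying Σ by the tridiagonal matrix (α_{i,j}) should return the identity.

```python
# Sketch check of Lemma 4.4.2(i): the tridiagonal matrix (alpha_{ij}) built
# from the lambda_i inverts Sigma_{ij} = lambda_min (1 - lambda_max).
lam = [0.1, 0.3, 0.6, 0.85]
k = len(lam)
ext = [0.0] + lam + [1.0]                 # lambda_0 = 0, lambda_{k+1} = 1

alpha = [[0.0] * k for _ in range(k)]
for i in range(1, k + 1):                 # book-style indices 1..k
    alpha[i-1][i-1] = ((ext[i+1] - ext[i-1])
                       / ((ext[i+1] - ext[i]) * (ext[i] - ext[i-1])))
    if i >= 2:
        alpha[i-1][i-2] = alpha[i-2][i-1] = -1.0 / (ext[i] - ext[i-1])

Sigma = [[min(lam[i], lam[j]) * (1 - max(lam[i], lam[j])) for j in range(k)]
         for i in range(k)]
prod = [[sum(Sigma[i][m] * alpha[m][j] for m in range(k)) for j in range(k)]
        for i in range(k)]
err = max(abs(prod[i][j] - (1.0 if i == j else 0.0))
          for i in range(k) for j in range(k))
print(err)
```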
Moments

Recall that the absolute moments of the standard normal distribution N_{(0,1)} are given by

∫ |x|^j dN_{(0,1)}(x) = 1·3·5·...·(j−1) if j is even, and = (2^j/π)^{1/2}((j−1)/2)! if j is odd,    (4.4.5)

for j = 1, 2, ....
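The two branches of (4.4.5) agree with the closed form E|Z|^j = 2^{j/2}Γ((j+1)/2)/π^{1/2} for Z standard normal; a short sketch (not from the book) compares them:

```python
# Check (sketch) of the absolute-moment formula (4.4.5) against the closed
# form E|Z|^j = 2^{j/2} Gamma((j+1)/2) / sqrt(pi) for Z ~ N(0,1).
import math

def moment_445(j):
    if j % 2 == 0:
        out = 1.0
        for m in range(1, j, 2):          # 1 * 3 * 5 * ... * (j-1)
            out *= m
        return out
    return (2 ** j / math.pi) ** 0.5 * math.factorial((j - 1) // 2)

def moment_gamma(j):
    return 2 ** (j / 2) * math.gamma((j + 1) / 2) / math.sqrt(math.pi)

for j in range(1, 9):
    print(j, moment_445(j), moment_gamma(j))
```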
Since N_{(0,CΣC^t)} is the normal distribution induced by N_{(0,Σ)} and the map x → Cx, where C is an (m,k)-matrix with rank m, we know that the distribution induced by N_{(0,Σ)} and the map x → x_i − x_{i−1} is the univariate normal distribution N_{(0,(λ_i−λ_{i−1})(1−(λ_i−λ_{i−1})))}. This together with (4.4.5) implies that

∫ |x_i − x_{i−1}|^j dN_{(0,Σ)}(x) = 1·3·5·...·(j−1)[(λ_i − λ_{i−1})(1 − (λ_i − λ_{i−1}))]^{j/2} if j is even, and
                                 = (2^j/π)^{1/2}((j−1)/2)! [(λ_i − λ_{i−1})(1 − (λ_i − λ_{i−1}))]^{j/2} if j is odd.    (4.4.6)

Further, by applying Lemma 4.4.1, we obtain for i = 2, ..., k,

∫ x_i² x_{i−1} dN_{(0,Σ)}(x) = ∫ x_i x_{i−1}² dN_{(0,Σ)}(x) = 0.    (4.4.7)
4.5. Asymptotic Normality and Expansions of Joint Distributions
In the particular case of exponential r.v.'s we know that spacings are
independent so that it will be easy to deduce the asymptotic normality and
an expansion of the joint distribution of several central order statistics from
the corresponding expansion for a single order statistic.
In a second step the result will be extended to a larger class of order statistics
by using the transformation technique.
We will use the abbreviations of Section 4.4: Given positive integers n, k, and r_i with 1 ≤ r_1 < r_2 < ... < r_k ≤ n, put b_i = r_i/(n+1) and σ_{i,j} = b_i(1 − b_j) for 1 ≤ i ≤ j ≤ k. Moreover, denote by N_{(0,Σ)} the k-variate normal distribution with mean vector zero and covariance matrix Σ = (σ_{i,j}). Again, the unit matrix is denoted by I.
Normal Approximation: Exponential R.V.'s
First let us consider the case of order statistics from exponential r.v.'s. Before
treating the expansion of length two we shall discuss the result and the proof
in connection with the simpler normal approximation.
Let X_{i:n} be the ith order statistic of n i.i.d. standard exponential r.v.'s. Denote by P_n the joint distribution of

(n+1)^{1/2} g(G^{-1}(b_i))(X_{r_i:n} − G^{-1}(b_i)), i = 1, ..., k,    (4.5.1)

where G is the standard exponential d.f. with density g. Moreover, ‖·‖ denotes again the variational distance.

Theorem 4.5.1. For all positive integers k and r_i with 0 = r_0 < r_1 < r_2 < ... < r_k < r_{k+1} = n+1 the following inequality holds:

‖P_n − N_{(0,Σ)}‖ ≤ C exp(Cρ_n) ρ_n^{1/2}    (4.5.2)

where C = max(1, 2C_2), C_2 is the constant in Theorem 4.2.4 for m = 2, and ρ_n is defined by

ρ_n = 2 Σ_{i=1}^{k+1} (r_i − r_{i−1})^{-1}.    (4.5.3)
Since Σ_{i=1}^{k+1} (r_i − r_{i−1})/(n+1) = 1 we infer from Jensen's inequality (see P.3.9) that

ρ_n ≥ 2k²/n

which shows that N_{(0,Σ)} will provide an accurate approximation to P_n only if the number of order statistics under consideration is of smaller order than n^{1/2}. From the expansion of length 2 we shall learn that the bound in (4.5.2) is sharp.
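The quantity ρ_n and the Jensen bound are simple to compute. The sketch below (rank configurations and n, k are arbitrary choices, not taken from the book) illustrates that equispaced ranks essentially attain the lower bound 2(k+1)²/(n+1), while clustered ranks inflate ρ_n:

```python
# Sketch: rho_n = 2 * sum_{i=1}^{k+1} (r_i - r_{i-1})^{-1} with r_0 = 0 and
# r_{k+1} = n+1; Jensen's inequality gives rho_n >= 2(k+1)^2/(n+1).
def rho(n, ranks):
    ext = [0] + list(ranks) + [n + 1]
    return 2 * sum(1 / (ext[i] - ext[i - 1]) for i in range(1, len(ext)))

n, k = 1000, 30
equi = [round(j * (n + 1) / (k + 1)) for j in range(1, k + 1)]  # equispaced
skew = list(range(1, k + 1))                                    # clustered
print(rho(n, equi), rho(n, skew), 2 * (k + 1) ** 2 / (n + 1))
```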
Next we make some comments about the proof of Theorem 4.5.1. Notice that the asymptotic normality of several order statistics holds if the corresponding spacings have this property. Let Q_n denote the joint distribution of the normalized spacings

[(n+1)(1 − b_{i−1})(1 − b_i)/(b_i − b_{i−1})]^{1/2} (X_{r_i:n} − X_{r_{i−1}:n} − (G^{-1}(b_i) − G^{-1}(b_{i−1})))    (4.5.4)

for i = 1, ..., k (with the convention that b_0 = 0 and G^{-1}(b_0) = 0).

Denote again by T the map in (4.4.1) which transforms N_{(0,I)} to N_{(0,Σ)} [that is, TN_{(0,I)} = N_{(0,Σ)}]. Since G^{-1}(b_i) = −log(1 − b_i) and hence g(G^{-1}(b_i)) = 1 − b_i it is easy to see that P_n = TQ_n. Therefore,

‖P_n − N_{(0,Σ)}‖ ≤ ‖Q_n − N_{(0,I)}‖.    (4.5.5)
On the right-hand side of (4.5.5) one has to calculate the variational distance of the two product measures Q_n = ⊗_{i=1}^k Q_{n,i} and N_{(0,I)} = N_{(0,1)}^k where Q_{n,i} is the distribution of the ith spacing as given in (4.5.4).

From Lemma 1.4.3 we know that spacings of exponential r.v.'s are distributed like order statistics of exponential r.v.'s. Since G^{-1}(b_i) − G^{-1}(b_{i−1}) = G^{-1}((r_i − r_{i−1})/(n − r_{i−1} + 1)) we obtain that Q_{n,i} is the distribution of the normalized order statistic

(m_i + 1)^{3/2} g(G^{-1}(s_i/(m_i + 1)))(X_{s_i:m_i} − G^{-1}(s_i/(m_i + 1)))/(s_i(m_i − s_i + 1))^{1/2}    (4.5.6)

where m_i = n − r_{i−1} and s_i = r_i − r_{i−1}.
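The distributional fact behind Lemma 1.4.3 can be illustrated by simulation: for standard exponential order statistics X_{1:n} ≤ ... ≤ X_{n:n}, the scaled spacings (n − i + 1)(X_{i:n} − X_{i−1:n}) are again i.i.d. standard exponential. The sketch below (not from the book; n, seed, and replication count are arbitrary) checks that each scaled spacing has mean close to 1:

```python
# Illustrative check of the exponential-spacings fact: the scaled spacings
# (n - i + 1)(X_{i:n} - X_{i-1:n}) of exponential order statistics are i.i.d.
# standard exponential (X_{0:n} := 0).
import random

random.seed(7)
n, M = 8, 40000
cols = [[] for _ in range(n)]
for _ in range(M):
    x = sorted(random.expovariate(1.0) for _ in range(n))
    prev = 0.0
    for i in range(n):                    # 0-based i, so scale is n - i
        cols[i].append((n - i) * (x[i] - prev))
        prev = x[i]

means = [sum(c) / M for c in cols]
print([round(m, 2) for m in means])       # each should be close to 1
```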
Section 3.3 provides the inequalities

‖Q_n − N_{(0,I)}‖ ≤ Σ_{i=1}^k ‖Q_{n,i} − N_{(0,1)}‖

as well as

‖Q_n − N_{(0,I)}‖ ≤ C(Σ_{i=1}^k H(Q_{n,i}, N_{(0,1)})²)^{1/2}

where H denotes the Hellinger distance. The first inequality and upper bounds of ‖Q_{n,i} − N_{(0,1)}‖, i = 1, ..., k (compare with Corollary 4.2.7), lead to an inaccurate upper bound of ‖Q_n − N_{(0,I)}‖. The second inequality is not applicable since a bound of the Hellinger distance between Q_{n,i} and N_{(0,1)} is not at our disposal. The way out of this dilemma will be the use of an expansion of length two.
Expansion of Length Two: Exponential R.V.'s
To simplify our notation we shall only establish an expansion of length two.
Expansions of length m can be proved by the same method.
Theorem 4.5.2. Let C, X_{i:n}, r_i, ρ_n and P_n be as in Theorem 4.5.1. Then, the following inequality holds:

sup_B |P_n(B) − ∫_B (1 + L_{r,n}) dN_{(0,Σ)}| ≤ C exp(Cρ_n) ρ_n    (4.5.7)

where L_{r,n} is the polynomial defined by

L_{r,n}(x) = Σ_{i=1}^k L_{1,r_i−r_{i−1},n−r_{i−1}}(γ_{i,i} x_i − γ_{i,i−1} x_{i−1})

with L_{1,r,n} defined as in Corollary 4.2.7, x_0 = 0 and

γ_{i,j} = (1 − b_i)[(b_j − b_{j−1})/((1 − b_{j−1})(1 − b_j))]^{-1/2}.
PROOF. From (4.5.6) and Corollary 4.2.7 it is immediate that

sup_B |Q_{n,i}(B) − ∫_B (1 + L_{1,r_i−r_{i−1},n−r_{i−1}}) dN_{(0,1)}| ≤ C_2 (n − r_{i−1})/((r_i − r_{i−1})(n − r_i + 1)) =: C_2 δ_i.    (1)

The bound for the variational distance between product measures via the variational distance between the single components (compare with Corollary A.3.4) yields

sup_B |(⊗_{i=1}^k Q_{n,i})(B) − ∫_B ∏_{i=1}^k (1 + L_{1,r_i−r_{i−1},n−r_{i−1}}(x_i)) dN_{(0,1)}^k(x)| ≤ C_2 exp[2C_2 Σ_{i=1}^k δ_i] Σ_{i=1}^k δ_i.    (2)
Next we verify that the integral in (2) can be replaced by that in (4.5.7). Lemma A.3.6, applied to g_i = L_{1,r_i−r_{i−1},n−r_{i−1}}, yields

sup_B |∫_B ∏_{i=1}^k [1 + L_{1,r_i−r_{i−1},n−r_{i−1}}(x_i)] dN_{(0,1)}^k(x) − ∫_B [1 + Σ_{i=1}^k L_{1,r_i−r_{i−1},n−r_{i−1}}(x_i)] dN_{(0,1)}^k(x)|
≤ 8^{-1/2} exp[3^{-1} Σ_{i=1}^k ∫ L²_{1,r_i−r_{i−1},n−r_{i−1}} dN_{(0,1)}] Σ_{i=1}^k ∫ L²_{1,r_i−r_{i−1},n−r_{i−1}} dN_{(0,1)}
≤ 8^{-1/2} exp[3^{-1} Σ_{i=1}^k δ_i] Σ_{i=1}^k δ_i    (4.5.8)

where the last step is immediate from (4.2.7).
Check that Σ_{i=1}^k δ_i ≤ ρ_n. Combining (2) and (4.5.8) we obtain

sup_B |(⊗_{i=1}^k Q_{n,i})(B) − ∫_B [1 + Σ_{i=1}^k L_{1,r_i−r_{i−1},n−r_{i−1}}(x_i)] dN_{(0,1)}^k(x)|
≤ C_2 exp[2C_2 ρ_n] ρ_n + 8^{-1/2} exp[3^{-1} ρ_n] ρ_n ≤ C exp(Cρ_n) ρ_n.    (4.5.9)

Now, the transformation, as explained in (4.5.5), yields the desired inequality (4.5.7). For this purpose apply the transformation theorem for densities. Note that the inverse S of T is given by

(Sx)_i = γ_{i,i} x_i − γ_{i,i−1} x_{i−1}, i = 1, ..., k (with x_0 = 0). □

From (4.5.9) we also deduce for the normalized, joint distribution P_n of order statistics that

‖P_n − N_{(0,Σ)}‖ ≤ C exp(Cρ_n) ρ_n + sup_B |∫_B L_{r,n} dN_{(0,Σ)}| ≤ 2^{-1} ρ_n^{1/2} + O(ρ_n)    (4.5.10)

where the last inequality follows by means of the Schwarz inequality.

Notice that (4.5.10) is equivalent to (4.5.2) as far as the order of the normal approximation is concerned. However, to prove (4.5.2) with the constant as stated there one has to utilize a slight modification of the proof of Theorem 4.5.2.
PROOF OF THEOREM 4.5.1. Applying Lemma A.3.6 again we obtain

sup_B |∫_B ∏_{i=1}^k [1 + L_{1,r_i−r_{i−1},n−r_{i−1}}(x_i)] dN_{(0,1)}^k(x) − N_{(0,1)}^k(B)| ≤ exp[3^{-1} ρ_n](ρ_n/6)^{1/2}    (4.5.8')

showing that (4.5.2) can be proved in the same way as (4.5.7) by applying (4.5.8') in place of (4.5.8). □
Normal Approximation: General Case

Hereafter, let P_n denote the joint distribution of the normalized order statistics

(n+1)^{1/2} f(F^{-1}(b_i))(X_{r_i:n} − F^{-1}(b_i)), i = 1, ..., k,    (4.5.11)

where X_{r_i:n} is the r_ith order statistic of n i.i.d. random variables with common d.f. F and density f, and b_i = r_i/(n+1). Recall that the covariance matrix Σ is defined by σ_{i,j} = b_i(1 − b_j) for 1 ≤ i ≤ j ≤ k.

From Theorems 4.3.1 and 4.5.1 it is easily seen that, under certain regularity conditions,

‖P_n − N_{(0,Σ)}‖ = O(ρ_n^{1/2})    (4.5.12)

with ρ_n as in (4.5.3). The crucial point is that the underlying density is assumed to possess three bounded derivatives. The aim of the following considerations is to show that (4.5.12) holds if f has two bounded derivatives. The bound O(ρ_n^{1/2}) is sharp as far as the normal approximation is concerned; however, ρ_n^{1/2} is of a larger order than the upper bound in Theorem 4.3.1.

Theorem 4.5.3. Denote by P_n the joint distribution of the normalized order statistics in (4.5.11). Assume that the underlying density f has two derivatives on the intervals I_i = (F^{-1}(b_i) − ε_i, F^{-1}(b_i) + ε_i), i = 1, ..., k, where ε_i = 5[σ_{i,i} log(n)/(n+1)]^{1/2}/f(F^{-1}(b_i)). Moreover, assume that min(b_1, 1 − b_k) ≥ 10 log(n)/(n+1).

Then there is a universal constant C > 0 such that

‖P_n − N_{(0,Σ)}‖ ≤ C(1 + d(f)) ρ_n^{1/2}

where d(f) = max_{i=1}^k max_{j=1}^2 (sup_{y∈I_i} |f^{(j)}(y)|/inf_{y∈I_i} f(y)^{j+1}).
PROOF. In the first part of the proof we deal with the special case of order statistics U_{r:n} of n i.i.d. random variables with uniform distribution on (0,1). In this case, an application of Theorem 4.3.1 would yield a result which is only slightly weaker than that stated above. The present method has the advantage of being simpler than that of Theorem 4.3.1 and, moreover, it will also be applicable in the second part.
I. Let Q_n denote the joint distribution of normalized order statistics X_{r_1:n}, ..., X_{r_k:n} of standard exponential r.v.'s with common d.f. G and density g. Write g_i = g(G^{-1}(b_i)). Denote by Q_n^0 the joint distribution of (n+1)^{1/2}(U_{r_i:n} − b_i), i = 1, ..., k. From Corollary 1.2.6 it is easily seen that

Q_n^0 = TQ_n    (1)

where T(x) = (T_1(x_1), ..., T_k(x_k)) and

T_i(x_i) = (n+1)^{1/2}(G(G^{-1}(b_i) + x_i/((n+1)^{1/2} g_i)) − b_i)

for every x such that G^{-1}(b_i) + x_i/((n+1)^{1/2} g_i) > 0, i = 1, ..., k. Theorem 4.5.1 and (1) yield

‖Q_n^0 − N_{(0,Σ)}‖ ≤ ‖TQ_n − TN_{(0,Σ)}‖ + ‖TN_{(0,Σ)} − N_{(0,Σ)}‖ ≤ Cρ_n^{1/2} + ‖TN_{(0,Σ)} − N_{(0,Σ)}‖    (2)

where, throughout, C denotes a universal constant that will not be the same at each appearance. Thus, it remains to prove that

‖TN_{(0,Σ)} − N_{(0,Σ)}‖ ≤ Cρ_n^{1/2}.    (3)

The inverse S of T is given by S(x) = (S_1(x_1), ..., S_k(x_k)) where

S_i(x_i) = (n+1)^{1/2} g_i(G^{-1}(b_i + x_i/(n+1)^{1/2}) − G^{-1}(b_i)), i = 1, ..., k,    (4)

for x with 0 < b_i + x_i/(n+1)^{1/2} < 1. Inequality (3) holds if

‖N_{(0,Σ)} − SN_{(0,Σ)}‖ ≤ Cρ_n^{1/2}.    (5)

We prefer to prove (5) instead of (3) since this is the inequality that also has to be verified in the second part of the proof with G replaced by F.
Denote by N_T and N_S the restrictions of N_{(0,Σ)} to the domains D_T of T and D_S of S. Check that

‖TN_{(0,Σ)} − N_{(0,Σ)}‖ ≤ ‖(T∘S∘T)N_{(0,Σ)} − (T∘S)N_{(0,Σ)}‖ + ‖N_S − N_{(0,Σ)}‖ + N_{(0,Σ)}(D_T^c) + N_{(0,Σ)}(D_S^c)
≤ ‖N_{(0,Σ)} − SN_{(0,Σ)}‖ + ‖N_S − N_{(0,Σ)}‖ + N_{(0,Σ)}(D_T^c) + N_{(0,Σ)}(D_S^c)

which shows that (5) implies (3) since

N_{(0,Σ)}(D_T^c) + N_{(0,Σ)}(D_S^c) ≤ Cρ_n.    (6)

(6) in conjunction with (A.3.5) yields

‖N_{(0,Σ)} − SN_{(0,Σ)}‖ ≤ Cρ_n + [2(N_{(0,Σ)}(B^c) − ∫_B (log(f_1/f_0)) dN_{(0,Σ)})]^{1/2}    (7)

for sets B in the domain of T, and f_0, f_1 being the densities of N_{(0,Σ)} and SN_{(0,Σ)}. Applying the transformation theorem for densities (1.4.4) we obtain
log(f_1/f_0)(x) = Σ_{i=1}^k log T_i'(x_i) − Σ_{i=1}^{k+1} (δ_i(x)(x_i − x_{i−1}) + δ_i(x)²/2)/(b_i − b_{i−1}),    (8)

x ∈ B, where δ_i(x) = (T_i(x_i) − x_i) − (T_{i−1}(x_{i−1}) − x_{i−1}) (with the convention that T_{k+1}(x_{k+1}) = T_0(x_0) = x_{k+1} = x_0 = 0 and b_{k+1} = 1, b_0 = 0).

Check that

log T_i'(x_i) = −x_i/((n+1)^{1/2}(1 − b_i))    (9)

and, for x_i ≥ −(n+1)^{1/2} σ_{i,i},

|δ_i(x)| ≤ C(x_i²/(1 − b_i) + x_{i−1}²/(1 − b_{i−1}))/(n+1)^{1/2}.    (10)

Define

B = {x: x_i > −(10(log n)σ_{i,i})^{1/2}, i = 1, ..., k}.

Applying the inequality 1 − Φ(x) ≤ φ(x)/x we obtain

N_{(0,Σ)}(B^c) ≤ n^{-4}.    (11)

The condition min(b_1, 1 − b_k) ≥ 10 log(n)/(n+1) yields B ⊂ D_T and (10) holds for x ∈ B for i = 1, ..., k. Since

∫ x_i dN_{(0,Σ)}(x) = 0, i = 1, ..., k,    (12)

we obtain, by applying (9) and the Schwarz inequality, that

|∫_B (Σ_{i=1}^k log T_i'(x_i)) dN_{(0,Σ)}(x)| ≤ Σ_{i=1}^k ∫_{B^c} |x_i|/((n+1)^{1/2}(1 − b_i)) dN_{(0,Σ)}(x) ≤ Cn^{-1}.    (13)
Notice that according to (4.4.7),

Σ_{i=1}^{k+1} ∫ (x_i² − x_{i−1}²)(x_i − x_{i−1}) dN_{(0,Σ)}(x) = 0,    (14)

and hence, applying (4.4.5) and (4.4.6), we obtain by means of some straightforward calculations that

|∫_B Σ_{i=1}^{k+1} (δ_i(x)(x_i − x_{i−1}) + δ_i(x)²/2)/(b_i − b_{i−1}) dN_{(0,Σ)}(x)| ≤ Cρ_n.    (15)

Combining (11), (13), and (15) we see that the assertion of Part I holds.
II. Notice that P_n = SQ_n^0 where S is defined as in (4) with G and g_i replaced by F and f(F^{-1}(b_i)). Using Taylor expansions of log T_i'(x_i) and T_i(x_i), the proof of this part runs along the lines of Part I. □
Final Remarks

In Reiss (1981a) one can also find expansions of length m > 2 for the joint distribution of central order statistics of exponential r.v.'s. Starting with this special case, one may derive expansions in the case of r.v.'s with sufficiently smooth d.f. by using the method adopted in Reiss (1975a); that is, one has to expand the densities and to integrate the densities over Borel sets in a more direct way.
4.6. Expansions of Distribution Functions of Order Statistics
In Sections 4.2 and 4.5, expansions of distributions of central order statistics were established which hold w.r.t. the variational distance. These expansions can be represented by means of polynomials that are densities w.r.t. the standard normal distribution.

Expansions for d.f.'s can be written in a form which is better adjusted to d.f.'s. The results for d.f.'s of order statistics hold under conditions which are weaker than those required for approximations in the strong sense. Along with the reformulation of the results of Section 4.2 we shall study expansions of d.f.'s of order statistics under conditions that hold for order statistics of discrete r.v.'s.
Write again

a_{r,n}² = r(n − r + 1)/(n + 1)³ and b_{r,n} = r/(n + 1).
Continuous D.F.'s
First, the results of Section 4.2 will be rewritten in terms of d.f.'s.
Corollary 4.6.1. Under the conditions of Theorem 4.2.4 there exist polynomials S_{i,r,n} of degree ≤ 3i − 1 such that

sup_t |P{a_{r,n}^{-1} f(F^{-1}(b_{r,n}))(X_{r:n} − F^{-1}(b_{r,n})) ≤ t} − (Φ(t) + φ(t) Σ_{i=1}^{m−1} S_{i,r,n}(t))|
≤ C_m[(n/(r(n − r)))^{m/2} + max_{j=1}^m |α_{j,r,n}|^{m/j}]    (4.6.1)

where α_{j,r,n} are the terms in Theorem 4.2.4.

PROOF. Apply Lemma 3.2.6. □
Let us note the explicit form of S_{1,r,n} and S_{2,r,n}. We have

(φS_{i,r,n})' = φL_{i,r,n}

with L_{i,r,n} as in Addendum 4.2.5. Moreover,

S_{1,r,n}(t) = (n − 2r + 1)/(3[r(n − r + 1)(n + 1)]^{1/2}) (1 − t²) + α_{1,r,n} t²/2    (4.6.2)

and

S_{2,r,n}(t) = [r(n − r + 1)(n + 1)]^{-1}[(n − 2r + 1)²(15t + 5t³ + t⁵)/18 + [7(n − 2r + 1)² + 3r(n − r + 1)](3t + t³)/12 + (n − r + 1)² t]
             + α_{1,r,n} 2^{-1} L_{1,r,n}(t) + α_{1,r,n}² t⁵/8 + α_{2,r,n} t³/6    (4.6.3)

with α_{j,r,n} as in Theorem 4.2.4 and L_{1,r,n} as in (4.2.2).
EXAMPLE 4.6.2. We have

sup_t |P{a_{r,n}^{-1}(U_{r:n} − b_{r,n}) ≤ t} − (Φ(t) + φ(t) Σ_{i=1}^{m−1} S_{i,r,n}(t))| ≤ C_m(n/(r(n − r)))^{m/2}    (4.6.4)

where S_{i,r,n} are the polynomials of Corollary 4.6.1 with α_{i,r,n} = 0.
Discrete D.F.'s
The conditions of Theorem 4.2.4 exclude discrete d.f.'s F. The key idea of the
following is to approximate the d.f. F (which may be discrete) by some function
G which fulfills an appropriate Taylor expansion.
As an example we shall treat the case of d.f.'s F that permit an Edgeworth
expansion (like binomial d.f.'s).
We start with a technical lemma.
Lemma 4.6.3. Let X_{i:n} be the order statistics of n i.i.d. random variables with common d.f. F. Let G be a function and u a fixed real number such that for all reals y,

|G(u + y) − G(u) − Σ_{i=1}^m (c_i/i!) y^i| ≤ (c_{m+1}/(m + 1)!)|y|^{m+1}.    (4.6.5)
Then, if c_1 > 0 there exists a universal constant C_m > 0 and polynomials S_{i,r,n} of degree ≤ 3i − 1 such that for all reals t the following inequality holds:

|P{a_{r,n}^{-1} c_1(X_{r:n} − u) ≤ t} − (Φ(t) + φ(t) Σ_{i=1}^{m−1} S_{i,r,n}(t))|
≤ C_m[(n/(r(n − r + 1)))^{m/2} + a_{r,n}^m max_{j=1}^m (c_{j+1}/c_1^{j+1})^{m/j}].    (4.6.6)
PROOF. Writing x = u + t a_{r,n}/c_1 we get

P{a_{r,n}^{-1} c_1(X_{r:n} − u) ≤ t} = P{a_{r,n}^{-1}(U_{r:n} − b_{r,n}) ≤ a_{r,n}^{-1}(F(x) − b_{r,n})}.

Denote by S_{i,r,n} the polynomials of Example 4.6.2. Since

a_{r,n}^{-1}(F(x) − b_{r,n}) = a_{r,n}^{-1}(F(x) − G(x)) + V(t) + a_{r,n}^{-1}(G(u) − b_{r,n}),

with V(t) = a_{r,n}^{-1}(G(x) − G(u)), it is immediate from Example 4.6.2 that

|P{a_{r,n}^{-1} c_1(X_{r:n} − u) ≤ t} − [Φ(V(t)) + φ(V(t)) Σ_{i=1}^{m−1} S_{i,r,n}(V(t))]|
≤ C_m[(n/(r(n − r + 1)))^{m/2} + a_{r,n}^{-1}(|F(x) − G(x)| + |G(u) − b_{r,n}|)].

Using condition (4.6.5) we obtain an expansion of V(t) of length m, namely,

V(t) = t + Σ_{i=2}^m (c_i a_{r,n}^{i−1}/(i! c_1^i)) t^i + e_m(t)(c_{m+1} a_{r,n}^m/((m + 1)! c_1^{m+1}))|t|^{m+1}

where |e_m(t)| ≤ 1. Now arguments analogous to those of the proof of Theorem 4.2.4 lead to (4.6.6). □
The polynomials in Lemma 4.6.3 are of the same form as those in Corollary 4.6.1 with α_{j,r,n} replaced by a_{r,n}^j c_{j+1}/c_1^{j+1}.
Next, Lemma 4.6.3 will be specialized to d.f.'s F ≡ F_N permitting an Edgeworth expansion G ≡ G_{M,N} of the form

G_{M,N}(t) = Φ(t) + φ(t) Σ_{i=1}^{M−1} N^{-i/2} Q_i(t)

where M and N are positive integers, and Q_i is a polynomial for i = 1, ..., M−1. Let us assume that

|F_N(t) − G_{M,N}(t)| = O(N^{-M/2})    (4.6.7)

uniformly over t ∈ I where I will be specified below.

If F_N stems from an N-fold convolution, typically one has the following two cases:
(i) I is the real line if the Cramér-von Mises condition holds,
(ii) I = {y + kh: k integer} where y and h > 0 are fixed.

Moreover, define an "inverse" G*_{M,N} of G_{M,N} by

G*_{M,N} = Φ^{-1} + Σ_{i=1}^{M−1} N^{-i/2} Q_i*(Φ^{-1})

where the Q_i* are the polynomials as described in Pfanzagl (1973c), Lemma 7. We note that

G_{M,N}(G*_{M,N}(t)) = t + O(N^{-M/2}) and G*_{M,N}(G_{M,N}(t)) = t + O(N^{-M/2}).    (4.6.8)
Since G_{M,N} is an approximation to F_N we know that G*_{M,N} is an approximation to F_N^{-1}. As an application of Lemma 4.6.3 to F ≡ F_N, G ≡ G_{M,N}, and u = G*_{M,N}(b_{r,n}) we obtain the following

Corollary 4.6.4. Under condition (4.6.7) there exists C_{m,M} > 0 such that for every positive integer n, r ∈ {1, ..., n} and t ∈ I:

|P{X_{r:n} ≤ t} − (Φ + φ Σ_{i=1}^{m−1} S_{i,r,n})(S_M(t))|
≤ C_{m,M}[(n/(r(n − r + 1)))^{m/2} + a_{r,n}^m max_{j=1}^m (c_{j+1}/c_1^{j+1})^{m/j} + a_{r,n}^{-1} N^{-M/2}]    (4.6.9)

where

S_M(t) = a_{r,n}^{-1} G'_{M,N}[G*_{M,N}(b_{r,n})](t − G*_{M,N}(b_{r,n}))

and the S_{i,r,n} are the polynomials of Lemma 4.6.3 with c_i = G^{(i)}_{M,N}(G*_{M,N}(b_{r,n})).
PROOF. To make Lemma 4.6.3 applicable one has to verify that

G_{M,N}(G*_{M,N}(b_{r,n})) = b_{r,n} + O(N^{-M/2}).    (1)

It suffices to prove that (1) holds uniformly over all r and n such that |Φ^{-1}(b_{r,n})| = O(log N). A standard technique [see Pfanzagl (1973c), page 1016] yields

G_{M,N}(Ψ_{M,N}(t)) = Φ(t) + O(N^{-M/2})    (2)

uniformly over |t| = O(log N) where Ψ_{M,N}(t) = t + Σ_{i=1}^{M−1} N^{-i/2} Q_i*(t). Thus, (1) is immediate from (2) applied to t = Φ^{-1}(b_{r,n}). □
To exemplify the usefulness of Corollary 4.6.4 we study the d.f. of an order statistic X_{r:n} of n i.i.d. binomial r.v.'s with parameters N and p ∈ (0,1). It is clear that the underlying d.f. is given by

F_N(t) = Σ_{k=0}^{[t]} (N choose k) p^k(1 − p)^{N−k}

with [·] denoting the integer function. Moreover, P{X_{r:n} ≤ t} = P{X_{r:n} ≤ [t]} so that P{X_{r:n} ≤ t} has to be evaluated at t ∈ {0, ..., N} only.

As an approximation to the normalized version of F_N we use the standard normal d.f. Φ and the Edgeworth expansion Φ + N^{-1/2}φQ_1 of length 2 where (see Bhattacharya and Rao (1976), Theorem 23.1)

Q_1(t) = (2p − 1)(t² − 1)/(6(p(1 − p))^{1/2}).
Table 4.6.1. Maximum Absolute Deviation of Exact Values and Expansions

                p = .5         p = .2         p = .2            p = .5
                N = n          N = n          N = [n^{4/3}]     N = [n^{4/3}]
                r = [n/2]      r = [n/4]      r = [n/4]         r = [n/2]
   n   (m,M):   (1,1)  (2,2)   (1,1)  (2,2)   (1,1)  (2,2)      (1,1)  (2,2)
  20            .33    .01     .35    .01     .29    .007       .27    .006
  80            .38    .002    .32    .003    .22    .002       .20    .0028
 200            .42    .0001   .31    .0001   .20    .0007      .16    .0005
Table 4.6.1 presents a numerical comparison of the approximations in Corollary 4.6.4 in the special cases of (m,M) = (1,1) and (m,M) = (2,2). Thus, if (m,M) = (1,1) we compute the maximum value of

|P{X_{r:n} ≤ k} − Φ[a_{r,n}^{-1} φ(Φ^{-1}(b_{r,n}))((k + 2^{-1} − Np)/(Np(1 − p))^{1/2} − Φ^{-1}(b_{r,n}))]|

over k = 0, ..., N.
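The benefit of the Edgeworth term Q_1 for the binomial d.f. itself can be reproduced in a few lines. The sketch below is illustrative only (N = 40 and p = 0.2 are arbitrary choices, and a continuity correction k + 1/2 is used); it compares the maximal error of Φ alone with that of Φ + N^{-1/2}φQ_1:

```python
# Numerical illustration (sketch) of the Edgeworth term for the binomial d.f.,
# with Q1(t) = (2p-1)(t^2-1) / (6 (p(1-p))^{1/2}) and a continuity correction.
import math

def phi(t): return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
def Phi(t): return 0.5 * (1 + math.erf(t / math.sqrt(2)))

N, p = 40, 0.2
sd = math.sqrt(N * p * (1 - p))
pmf = [math.comb(N, k) * p ** k * (1 - p) ** (N - k) for k in range(N + 1)]
cdf = [sum(pmf[:k + 1]) for k in range(N + 1)]

err_norm = err_edge = 0.0
for k in range(N + 1):
    t = (k + 0.5 - N * p) / sd                      # continuity correction
    q1 = (2 * p - 1) * (t * t - 1) / (6 * math.sqrt(p * (1 - p)))
    err_norm = max(err_norm, abs(cdf[k] - Phi(t)))
    err_edge = max(err_edge, abs(cdf[k] - (Phi(t) + q1 * phi(t) / math.sqrt(N))))
print(err_norm, err_edge)
```

The Edgeworth-corrected error should be markedly smaller, reflecting the skewness term that the plain normal approximation misses.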
4.7. Local Limit Theorems and Moderate Deviations
In Section 4.2 we proved expansions of distributions of single order statistics uniformly over the Borel sets. The main technical tool was an expansion of one factor of the density (compare with the proof of (4.7.2)). The expansion of the density was not given explicitly there, in order to concentrate our attention on the result of statistical relevance, namely, the expansion of distributions.

The final section of this chapter is the proper place to give some explicit formulas for expansions of densities with an error bound that is nonuniform in x. By integration we shall also get inequalities which are relevant for probabilities of moderate deviations.
Let again

a_{r,n}² = r(n − r + 1)/(n + 1)³ and b_{r,n} = r/(n + 1).

Denote again by U_{r:n} the rth order statistic of n i.i.d. (0,1)-uniformly distributed r.v.'s. From Lemma 3.1.1 we obtain

P{a_{r,n}^{-1}|U_{r:n} − b_{r,n}| ≥ ε} ≤ 2 exp(−ε²/(3[1 + n^{-1} + ε/(a_{r,n} n)])), ε > 0.    (4.7.1)
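A Monte Carlo sketch (n, r, ε, and the seed are arbitrary choices) comparing the empirical tail probability with the bound of (4.7.1):

```python
# Monte Carlo sanity check (sketch) of the tail bound (4.7.1) for the rth
# order statistic of n i.i.d. uniforms, with a_{r,n}^2 = r(n-r+1)/(n+1)^3.
import random, math

random.seed(3)
n, r, eps, M = 99, 50, 2.0, 20000
a = math.sqrt(r * (n - r + 1) / (n + 1) ** 3)
b = r / (n + 1)

hits = 0
for _ in range(M):
    u = sorted(random.random() for _ in range(n))[r - 1]
    if abs(u - b) / a >= eps:
        hits += 1
emp = hits / M
bound = 2 * math.exp(-eps * eps / (3 * (1 + 1 / n + eps / (a * n))))
print(emp, bound)   # empirical tail should sit below the bound
```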
A refinement of this result will be obtained in the second part of this section.
Local Limit Theorems

Denote by g_{r,n} the density of

a_{r,n}^{-1}(U_{r:n} − b_{r,n})

and by Φ and φ the standard normal d.f. and density. The most simple "local limit theorem" is given by the inequality

|g_{r,n}(x) − φ(x)| ≤ Cφ(x)(n/(r(n − r + 1)))^{1/2}(1 + |x|³)    (4.7.2)

which holds for

x ∈ A(r, n) := {x: |x| ≤ (r(n − r)/n)^{1/6}}    (4.7.3)

where the constant C > 0 is independent of x.
To prove (4.7.2) let us follow the lines of the proof of Theorem 4.2.1. The density g_{r,n} of a_{r,n}^{-1}(U_{r:n} − b_{r,n}) is written as β_{r,n} h_{r,n} where β_{r,n} is a normalizing constant. From the proof of Theorem 4.2.1 (1) we know that

|h_{r,n}(x) − exp(−x²/2)| ≤ C exp(−x²/2)(n/(r(n − r + 1)))^{1/2}(1 + |x|³)    (4.7.4)

for x ∈ A(r, n).

We also need an expansion of the factor β_{r,n}. By integration over an interval B we get uniformly in r and n that

β_{r,n} = ∫_B g_{r,n}(x) dx / ∫_B h_{r,n}(x) dx
        = P{a_{r,n}^{-1}(U_{r:n} − b_{r,n}) ∈ B}/[(2π)^{1/2} N_{(0,1)}(B) + O((n/(r(n − r)))^{1/2})]
        = (2π)^{-1/2} + O((n/(r(n − r)))^{1/2})    (4.7.5)

where the final step is immediate by specifying B = {x: |x| ≤ log(r(n − r)/n)} and applying (4.7.1) to ε = log(r(n − r)/n).
An expansion of length m can be established in the same way. For some constant C_m > 0 we get

|g_{r,n}(x) − φ(x)(1 + Σ_{i=1}^{m−1} L_{i,r,n}(x))| ≤ C_m φ(x)(n/(r(n − r + 1)))^{m/2}(1 + |x|^{3m})    (4.7.6)

for x ∈ A(r, n) with polynomials L_{i,r,n} as given in Theorem 4.2.1.
In analogy to Theorem 4.2.4 we also establish an expansion of the density of the normalized rth order statistic under the condition that the underlying d.f. has m + 1 derivatives.
Theorem 4.7.1. For some r ∈ {1, ..., n} let X_{r:n} be the rth order statistic of n i.i.d. random variables with common d.f. F and density f. Assume that f(F^{-1}(b_{r,n})) > 0 and that the function S_{r,n} defined by

S_{r,n}(x) = a_{r,n}^{-1}(F[F^{-1}(b_{r,n}) + x a_{r,n}/f(F^{-1}(b_{r,n}))] − b_{r,n})

has m + 1 derivatives on the interval I_{r,n} := {x: |x| ≤ c_{r,n}} where log(r(n − r)/n) ≤ c_{r,n} ≤ (r(n − r)/n)^{1/6}/2. Denote by f_{r,n} the density of

a_{r,n}^{-1} f(F^{-1}(b_{r,n}))(X_{r:n} − F^{-1}(b_{r,n})).

Then there exists a constant C_m > 0 (only depending on m) such that

|f_{r,n}(x) − φ(x)(1 + Σ_{i=1}^{m−1} L_{i,r,n}(x))| ≤ C_m φ(x)(1 + |x|^{3m})[(n/(r(n − r)))^{m/2} + max_{j=1}^m |a_{j,r,n}|^{m/j}]    (4.7.7)

for x ∈ I_{r,n} with polynomials L_{i,r,n} as given in Theorem 4.2.4. Moreover,

a_{j,r,n} = S^{(j+1)}_{r,n}(0), j = 1, ..., m−1, and a_{m,r,n} = sup{|S^{(m+1)}_{r,n}(x)|: x ∈ I_{r,n}}.
PROOF. We give a short sketch of the proof. Check that

f_{r,n}(x) = S'_{r,n}(x) g_{r,n}(S_{r,n}(x))

with g_{r,n} as above. Applying (4.7.6) we obtain

|f_{r,n} − S'_{r,n} φ(S_{r,n})(1 + Σ_{i=1}^{m−1} L_{i,r,n}(S_{r,n}))| ≤ C_m |S'_{r,n}| φ(S_{r,n})(n/(r(n − r)))^{m/2}(1 + |S_{r,n}|^{3m})

with polynomials L_{i,r,n} as given in (4.7.6). Now, using Taylor expansions of S'_{r,n} and S_{r,n} about zero and of φ about x, we obtain the desired result by arranging the terms in the appropriate order. □
Moderate Deviations
We shall only study a simple application of (4.7.1). It will be shown that
the righthand side of (4.7.1) can be replaced by a term Cexp(e 2 /2)/e for
certain e.
Lemma 4.7.2. For some constant C > 0,

(i) |P{a_{r,n}^{-1}(U_{r:n} − b_{r,n}) ∈ B} − N_{(0,1)}(B)| ≤ C(n/(r(n − r + 1)))^{1/2} ∫_B (1 + |x|³) φ(x) dx

for every Borel set B ⊂ A(r, n) [defined in (4.7.3)].

(ii) Moreover,

P{a_{r,n}^{-1}|U_{r:n} − b_{r,n}| ≥ ε} ≤ C exp(−ε²/2)/ε

if 0 < ε ≤ (r(n − r + 1)/n)^{1/6}/2.

PROOF. (i) is immediate from (4.7.2) by integrating over B.

(ii) follows from (4.7.1) and (i). Put d = (r(n − r + 1)/n)^{1/6}. We get

P{a_{r,n}^{-1}|U_{r:n} − b_{r,n}| ≥ ε} = P{a_{r,n}^{-1}|U_{r:n} − b_{r,n}| ≥ d} + P{ε ≤ a_{r,n}^{-1}|U_{r:n} − b_{r,n}| < d}
≤ 2 exp(−d²/(3[1 + n^{-1} + d/(a_{r,n} n)])) + C((1 − Φ(ε)) + (n/(r(n − r + 1)))^{1/2} ∫_{|x|≥ε} |x|³ φ(x) dx)
≤ C exp(−ε²/2)/ε

where the final step is immediate from (3.2.3) and (3.2.12). □
P.4. Problems and Supplements
1. (Asymptotic d.f.'s of central order statistics)
(i) Let r(n) ∈ {1, ..., n} be such that n^{1/2}(r(n)/n − q) → 0, n → ∞, for some q ∈ (0,1). The possible nondegenerate limiting d.f.'s of the sequence of order statistics X_{r(n):n} of i.i.d. r.v.'s are of the following types:

H_{1,α}(x) = 0 if x < 0, and = Φ(x^α) if x ≥ 0,
H_{2,α}(x) = Φ(−(−x)^α) if x < 0, and = 1 if x ≥ 0,
H_{3,α}(x) = H_{1,α}(x/σ)1_{[0,∞)}(x) + H_{2,α}(x)1_{(−∞,0)}(x),
H_4 = (1_{[−1,∞)} + 1_{[1,∞)})/2,

where α, σ > 0.
(Smirnov, 1949)
(ii) There exists an absolutely continuous d.f. F such that for every q ∈ [0,1] and every d.f. H there exists r(n) with r(n)/n → q and min(r(n), n − r(n)) → ∞ as n → ∞ having the following property: Let X_{r(n):n} denote the r(n)th order statistic of n i.i.d. random variables with common d.f. F. Then, the d.f. of a_n^{-1}(X_{r(n):n} − b_n) converges weakly to H for certain a_n > 0 and b_n.
(Balkema and de Haan, 1978b)
(iii) The set of all d.f.'s F such that (ii) holds is dense in the set of d.f.'s w.r.t. the topology of weak convergence.
(Balkema and de Haan, 1978b)
(iv) Let X_1, X_2, X_3, ... be a stationary, standard normal sequence with covariances r(n) = EX_1X_{n+1} satisfying the condition Σ_{n=1}^∞ |r(n)| < ∞. Let r(n) ∈ {1, ..., n} be such that r(n)/n → λ, n → ∞, where 0 < λ < 1. Denote by X_{r(n):n} the r(n)th order statistic of X_1, ..., X_n. Then, for every x,
n → ∞,
where
(Rootzén, 1985)
2. Let again N_{(μ,Σ)} be the k-variate normal distribution with mean vector μ and covariance matrix Σ = (σ_{i,j}). Moreover, let I denote the unit matrix.
(i) Prove that
||N_{(0,Σ)} − N_{(0,I)}|| ≤ 2^{-1}[Σ_{i=1}^k (σ_{i,i} − 1) − log(det(Σ))]^{1/2}.
[Hint: Apply (4.4.2) and an inequality involving the Kullback-Leibler distance.]
(ii) If Σ is a diagonal matrix then (i) yields
(iii) Alternatively,
||N_{(0,Σ)} − N_{(0,I)}|| ≤ k2^{k+1}||Σ − I||_2,
where ||·||_2 denotes the Euclidean norm.
(Pfanzagl, 1973b, Lemma 12)
(iv) Denote again by K the Kullback-Leibler distance. Prove that
K(N_{(μ,I)}, N_{(0,I)}) = 2^{-1}||μ||_2^2.
(v) Prove that
||N_{(μ_1,I)} − N_{(μ_2,I)}|| ≤ 2^{-1}||μ_1 − μ_2||_2.
3. Let N_{(0,Σ)} be the k-variate normal distribution given in Lemma 4.4.1. Define the linear map S by
Then, with I denoting the unit matrix, we have
(Reiss, 1975a)
4. (Spacings)
Given 1 ≤ r_1 < ... < r_k ≤ n put again λ_i = r_i/(n + 1), σ_{i,j} = λ_i(1 − λ_j) for 1 ≤ i ≤ j ≤ k, and f_i = F'(F^{-1}(λ_i)). Moreover, we introduce
a_i^2 = σ_{i-1,i-1}/f_{i-1}^2 − 2σ_{i-1,i}/(f_{i-1}f_i) + σ_{i,i}/f_i^2
for i = 1, ..., k (with the convention that a_1^2 = σ_{1,1}/f_1^2).
Let X_{i:n} be the order statistics of n i.i.d. random variables with common d.f. F. Denote by Q_n the joint distribution of the normalized spacings, i = 1, ..., k, and by P_n the joint distribution of the normalized order statistics, i = 1, ..., k.
After this long introduction we can offer some simple problems.
(i) Show that
||Q_n − N_{(0,I)}|| ≤ ||P_n − N_{(0,Σ)}|| + Δ^{1/2}
where I is the unit matrix, Σ = (σ_{i,j}) and
Δ = 1 − (1 − λ_k)^{1/2} Σ_{i=1}^k (λ_i − λ_{i-1})^{1/2}/(a_i f_i).
(ii) Δ = 0 if k = 1.
(iii) If F is the uniform d.f. on (0, 1) then
and as one could expect
5. (Asymptotic expansions centered at F^{-1}(q))
Let q ∈ (0, 1) be fixed. Assume that the d.f. F has m + 1 bounded derivatives on a neighborhood of F^{-1}(q), and that f(F^{-1}(q)) > 0 where f = F'. Moreover, assume that (r(n)/n − q) = O(n^{-1}). Put σ^2 = q(1 − q). Then there exist polynomials S_{i,n} of degree ≤ 3i − 1 (having coefficients uniformly bounded over n) such that
(i) sup_B |P{n^{1/2}f(F^{-1}(q))σ^{-1}(X_{r(n):n} − F^{-1}(q)) ∈ B} − ∫_B dG_{r(n),n}| = O(n^{-m/2})
where
G_{r(n),n} = Φ + φ Σ_{i=1}^{m-1} n^{-i/2} S_{i,n}.
In particular,
S_{1,n}(t) = [(2q − 1)/(3σ) + σf'(F^{-1}(q))/(2f^2(F^{-1}(q)))] t^2 + n^{1/2}σ^{-1}[q − r(n)/(n + 1)] + 2(2q − 1)/(3σ).
(ii) If the condition (r(n)/n − q) = O(n^{-1}) is replaced by (r(n)/n − q) = o(n^{-1/2}) then (i) holds for m = 2 with O(n^{-1}) replaced by o(n^{-1/2}).
(iii) Formulate weaker conditions under which (i) holds uniformly over intervals.
(iv) Denote by f_{r(n),n} the density of the normalized distribution of X_{r(n):n} in (i), and put g_{r(n),n} = G'_{r(n),n}. Show that
|f_{r(n),n}(x) − g_{r(n),n}(x)| = O(n^{-m/2}φ(x)(1 + |x|^{3m}))
uniformly over x ∈ [−log n, log n].
6. (Asymptotic independence)
Given n i.n.n.i.d. random variables with d.f.'s F_1, ..., F_n we have, for x ≤ y,
P{X_{1:n} ≤ x, X_{n:n} ≤ y} − P{X_{1:n} ≤ x}P{X_{n:n} ≤ y}
= ∏_{i=1}^n [F_i(y)(1 − F_i(x))] − ∏_{i=1}^n [F_i(y) − F_i(x)].
(Walsh, 1969)
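Walsh's identity above holds for arbitrary (also discrete) d.f.'s, so it can be verified by brute-force enumeration. The following Python sketch (an added illustration, not part of the original text) checks both sides for three hypothetical discrete distributions:

```python
import itertools

# Three independent discrete r.v.'s with hypothetical pmfs on {0,1,2,3}.
pmfs = [
    {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2},
    {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25},
    {0: 0.5, 1: 0.2, 2: 0.2, 3: 0.1},
]

def F(i, t):
    """D.f. F_i(t) = P{X_i <= t}."""
    return sum(p for v, p in pmfs[i].items() if v <= t)

x, y = 1, 2  # arguments of the sample minimum and maximum (x <= y)

# Left-hand side by enumeration of the joint distribution.
lhs_joint = lhs_min = lhs_max = 0.0
for vals in itertools.product(*[list(m) for m in pmfs]):
    p = 1.0
    for i, v in enumerate(vals):
        p *= pmfs[i][v]
    if min(vals) <= x and max(vals) <= y:
        lhs_joint += p
    if min(vals) <= x:
        lhs_min += p
    if max(vals) <= y:
        lhs_max += p
lhs = lhs_joint - lhs_min * lhs_max

# Right-hand side: the two products of Walsh's identity.
prod1 = prod2 = 1.0
for i in range(3):
    prod1 *= F(i, y) * (1 - F(i, x))
    prod2 *= F(i, y) - F(i, x)
rhs = prod1 - prod2

assert abs(lhs - rhs) < 1e-12
```

The identity follows from P{X_{n:n} ≤ y} = ∏F_i(y), P{X_{1:n} > x} = ∏(1 − F_i(x)), and P{X_{1:n} > x, X_{n:n} ≤ y} = ∏(F_i(y) − F_i(x)).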
Bibliographical Notes
Laplace (1818) derived the asymptotic normality of sample medians. He
computed the density of the sample median (within a more general framework)
and proved a limit theorem for the pointwise convergence of the densities. For
a discussion of this result and applications we refer to Stigler (1973). This
method was also used by Smirnov (1935) to obtain the asymptotic normality
of central order statistics in greater generality. Other approaches reduce the
problem to an application of the central limit theorem (that includes as a
special case the asymptotic normality of binomial r.v.'s). The reduction is
achieved either by means of the representations given in Section 1.6 (Cramér, 1946, and Rényi, 1953), the equality in (1.1.8) (Smirnov, 1949, van der Vaart, 1961, and Iglehart, 1976), or the Bahadur approximation (Sen, 1968).
The problem of characterizing the possible limiting d.f.'s of central order
statistics was dealt with by Smirnov (1949) (see P.4.1(i)) and Balkema and de
Haan (1978a, b). If no regularity conditions are supposed, every d.f. is a limiting d.f. of central order statistics (see P.4.1(ii)).
An interesting problem, not treated in the book, occurs if the value of the
underlying density at the qquantile is equal to zero or if the qquantile is not
unique; in this context we refer to the articles of Feldman and Tucker (1966),
Kiefer (1969b), Umbach (1981), and Landers and Rogge (1985) for important
contributions.
A bound for the accuracy of the normal approximation to the d.f. of a single
order statistic was established by Reiss (1974a) (where the terms of the error
bound are given explicitly), Egorov and Nevzorov (1976), and Englund (1980).
Expansions of distributions of sample quantiles were established in Reiss
(1976). There it was merely assumed that the underlying d.f. F has derivatives on (F^{-1}(q) − ε, F^{-1}(q)] and [F^{-1}(q), F^{-1}(q) + ε) for some ε > 0. If the left and right derivatives of F at F^{-1}(q) are unequal, then the leading term of the
expansion is a certain mixture of normal distributions (compare this with
P.4.1(i)). In this context, we also refer to Weiss (1969c) who proved a limit
theorem under such conditions.
Puri and Ralescu (1986) studied order statistics of a nonrandom sample
size n and a random index which converges to q E (0, 1) in probability. Among
others, the asymptotic normality and a BerryEsseen type theorem is proved.
A result concerning sample quantiles with random sample sizes related to that
for maxima (see P.5.11(i)) does not seem to exist in literature.
The problem of asymptotic independence between different groups of order
statistics provides an excellent example where a joint treatment of extreme
and central order statistics is preferable. The asymptotic independence of
lower and upper extremes was first observed by Gumbel (1946). A precise
characterization of the conditions that guarantee the asymptotic independence is due to Rossberg (1965, 1967). The corresponding result in the strong
sense (that is, approximation w.r.t. the variational distance) was proved by
Ikeda (1963) and Ikeda and Matsunawa (1970). In the i.n.n.i.d. case, Walsh
(1969) proved the asymptotic independence of sample minimum and sample
maximum under the condition that one or several dJ.'s do not dominate the
other dJ.'s.
First investigations concerning the accuracy of the asymptotic results were
made by Walsh (1970). Sharp bounds of the variational distance in case of
extremes were established by Falk and Kohne (1986). Tiago de Oliveira (1961),
Rosengard (1962), Rossberg (1965), and Ikeda and Matsunawa (1970) proved
independence results that include central order statistics and sample means.
The sharp inequalities in Section 4.2 concerning extreme and central order
statistics are taken from Falk and Reiss (1988).
The asymptotic independence of ratios of consecutive order statistics was
proved by Lamperti (1964) and Dwass (1966); a corresponding result holds
for spacings. Smid and Stam (1975) showed that the condition, sufficient for
this result, is also necessary.
In Lemma 4.4.3 an upper bound of the distance between the normal
distribution N(O,I) and a distribution induced by N(O,I) and a function close to
the identity is computed. For related results we refer to Pfanzagl [1973a,
Lemma 1] and Bhattacharya and Ghosh [1978, Theorem 1]. These results are
formulated in terms of sequences of arbitrary normal distributions of a
fixed dimension and therefore not applicable for our purposes. The normal
comparison lemma (see e.g. Leadbetter et al. (1983), Theorem 4.2.1) is related
to this.
4. Approximations to Distributions of Central Order Statistics
150
For r(i) = r(i, n), i = 1, ..., k, satisfying the condition r(i, n)/n → q_i, n → ∞, where 0 < q_1 < ... < q_k < 1, the weak convergence of the standardized joint distributions of order statistics X_{r(i):n} to the normal distribution N_{(0,Σ)} was proved by Smirnov (1935, 1944), Kendall (1940), and Mosteller (1946).
The normal distributions N_{(0,Σ)} are the finite dimensional marginals of the "Brownian bridge" W°, a special Gaussian process with mean function zero and covariance function E W°(q)W°(p) = q(1 − p) for 0 ≤ q ≤ p ≤ 1. The sample quantile process (indexed by q ∈ [0, 1]), here given for (0,1)-uniformly distributed r.v.'s, converges to W° in distribution. Thus, the result for order statistics describes the weak convergence of the
finite dimensional marginals of the quantile process. For a short discussion
of this subject we refer to Serfling (1980). In view of the technique which is
needed to rigorously investigate the weak convergence of the quantile process,
a detailed study has to be done in conjunction with empirical processes in
general (see e.g. M. Csörgő and P. Révész (1981) and G.R. Shorack and
J.A. Wellner (1986)). The invariance principle for the sample quantile process
provides a powerful tool to establish limit theorems (in the weak sense) for
functionals of the sample quantile process; however, one cannot indicate the
rate at which the limit theorems are valid. For statistical applications of the
quantile process we refer to M. Csörgő (1983) and Shorack and Wellner (1986).
Weiss (1969b) studied the normal approximation of joint distributions of central order statistics w.r.t. the variational distance under the condition that k = k(n) is of order O(n^{1/4}). Ikeda and Matsunawa (1972) and Weiss (1973) obtained corresponding results under the weaker condition that k(n) is of order O(n^{1/3}). Reiss (1975a) established the asymptotic normality with a bound of order O((Σ_{i=1}^k (r_i − r_{i-1})^{-1})^{1/2}) for the remainder term. We also refer to Reiss (1975a) for an expansion of the joint distribution of central order statistics (see Section 4.5 for an expansion of length two in the special case of exponential r.v.'s). Other notable articles pertaining to this are those of Matsunawa (1975), Weiss (1979a), and Ikeda and Nonaka (1983).
An approximation to the multinomial distribution, with an increasing
number of cells as the sample size tends to infinity, by means of the distribution
of certain rounded-off normal r.v.'s may be found in Weiss (1976); this method
seems to be superior to a more direct approximation by means of a normal
distribution as pointed out by Weiss (1978).
The expansions of d.f.'s of order statistics in Section 4.6, taken from Nowak
and Reiss (1983), are refinements of those given by Ivchenko (1971, 1974).
Ivchenko also considers the multivariate case. In conjunction with this, we
mention the article of Kolchin (1980), who established corresponding results
for extremes.
CHAPTER 5
Approximations to Distributions
of Extremes
The nondegenerate limiting d.f.'s of sample maxima X_{n:n} are the Fréchet d.f.'s G_{1,α}, the Weibull d.f.'s G_{2,α}, and the Gumbel d.f. G_3. Thus, with regard to the variety of limiting d.f.'s the situation of the present chapter turns out to be more complex than that of the preceding chapter, where weak regularity conditions guarantee the asymptotic normality of the order statistics.
As stated in (1.3.11) the limiting d.f.'s are max-stable, that is, for G ∈ {G_{1,α}, G_{2,α}, G_3 : α > 0} we find constants c_n > 0 and reals d_n such that
G^n(d_n + xc_n) = G(x).
Another interesting class of d.f.'s is that of the generalized Pareto d.f.'s W ∈ {W_{1,α}, W_{2,α}, W_3 : α > 0} as introduced in (1.6.11). These d.f.'s can also be used as a starting point when investigating distributional properties of sample maxima.
Given G ∈ {G_{1,α}, G_{2,α}, G_3 : α > 0} we obtain the associated generalized Pareto d.f. W by restricting the function Ψ = 1 + log G to certain intervals. The generalized Pareto d.f. W has the property
W^n(d_n + xc_n) = G(x) + O(n^{-1})
where c_n and d_n are the constants for which G^n(d_n + xc_n) = G(x) holds. The class of generalized Pareto d.f.'s includes as special cases Pareto d.f.'s, uniform d.f.'s, and exponential d.f.'s.
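The O(n^{-1}) approximation above can be observed numerically. The following Python sketch (an added illustration, not part of the original text) takes the exponential d.f. W_3(x) = 1 − e^{-x} with the constants d_n = log n, c_n = 1 of the Gumbel case and watches the error shrink:

```python
import math

def W3(t):
    """Standard exponential d.f., i.e. the generalized Pareto d.f. W_3."""
    return 1.0 - math.exp(-t) if t >= 0 else 0.0

def G3(x):
    """Gumbel d.f. G_3(x) = exp(-e^{-x})."""
    return math.exp(-math.exp(-x))

x = 0.7
errors = []
for n in [10, 100, 1000, 10000]:
    dn = math.log(n)  # d_n = log n, c_n = 1 for the Gumbel case
    errors.append(abs(W3(dn + x) ** n - G3(x)))

# W3^n(d_n + x) = (1 - e^{-x}/n)^n -> exp(-e^{-x}); the error behaves like 1/n.
for e_coarse, e_fine in zip(errors, errors[1:]):
    assert e_fine < e_coarse / 5
```

Since W_3(log n + x) = 1 − e^{-x}/n, the n-th power is (1 − e^{-x}/n)^n, whose deviation from exp(−e^{-x}) is of order 1/n, in accordance with the display above.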
An introduction to our particular point of view for the treatment of
extremes will be given in Section 5.1. This section also includes results for the
kth largest order statistic.
In Section 5.2 we shall establish bounds for the remainder terms in the limit
theorems for sample maxima. In view of statistical applications the distance
between the exact and limiting distributions will be measured w.r.t. the
Hellinger distance.
In Section 5.3 some preparations are made for the study of the joint
distribution of the k largest order statistics; it is shown that there is a close
connection between the limiting distributions of the kth largest order statistic X_{n-k+1:n} and the k largest order statistics.
Higher order approximations in case of extremes of generalized Pareto
r.v.'s are studied in Section 5.4. The accuracy of the approximations to the
distribution of the kth largest order statistics and the joint distribution of
extreme order statistics is dealt with in Section 5.5.
Finally, in Section 5.6, we shall make some remarks about the connection
between extreme order statistics, empirical point processes, and certain
Poisson processes.
5.1. Asymptotic Distributions of Extreme Sequences
In this section we shall examine the weak convergence of distributions
of extreme order statistics. Moreover, it will be indicated that the strong convergence (that is, the convergence w.r.t. the variational distance) holds under the well-known von Mises conditions.
Let X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} be the order statistics of n i.i.d. random variables with common d.f. F. A nondegenerate limiting d.f. of the sample maximum X_{n:n} has to be (as already pointed out in Section 1.3) one of the Fréchet, Weibull, or Gumbel d.f.'s; that is, if there exist constants a_n > 0 and reals b_n such that
F^n(b_n + xa_n) → G(x), n → ∞, (5.1.1)
for every continuity point of the nondegenerate limiting d.f. G then G has to be of the type G_{1,α}, G_{2,α}, G_3 for some α > 0.
Recall that G_{1,α}(x) = exp(−x^{-α}) for x > 0, G_{2,α}(x) = exp(−(−x)^α) for x < 0, and G_3(x) = exp(−e^{-x}) for every x.
Graphical Representation of Extreme Value Densities
The densities g_{i,α} of G_{i,α} are given by
g_{1,α}(x) = αx^{-(1+α)} exp(−x^{-α}), 0 < x,
g_{2,α}(x) = α(−x)^{α-1} exp(−(−x)^α), x < 0,
g_3(x) = e^{-x} exp(−e^{-x}).
Figure 5.1.1. Fréchet densities g_{1,α} with parameters α = 0.33, 0.5, 1, 3, 5; the mode increases as α increases.
Frechet Densities
Figure 5.1.1 is misleading insofar as one density seems to have a pole at zero. A closer look shows that this is not the case. Moreover, from the definition of g_{1,α} it is evident that every Fréchet density is infinitely often differentiable. For α = 5 the density already looks like a Gumbel density (compare with Figure 1.3.1).
The density g_{1,α} is unimodal with mode
m(1, α) = (α/(1 + α))^{1/α}.
It is easy to verify that
m(1, α) → 0 as α → 0,
and
m(1, α) → 1, g_{1,α}(m(1, α)) → ∞, as α → ∞.
Weibull Densities
The "negative" standard exponential density g_{2,1} possesses a central position within the family of Weibull densities. The Weibull densities are again unimodal. From the visual as well as the statistical point of view the most significant characteristic of a Weibull density g_{2,α} is its behavior at zero (Figure 5.1.2). Notice that
g_{2,α}(x) ∼ α(−x)^{α-1}, x ↑ 0.
One may distinguish between five different classes of Weibull densities as
far as the behavior at zero is concerned:
Figure 5.1.2. Weibull densities g_{2,α} with parameters α = 0.5, 1, 1.5, 2, 4; the mode decreases as α increases.
α ∈ (0, 1): pole
α = 1: jump
α ∈ (1, 2): continuous, not differentiable from the left at zero
α = 2: differentiable from the left at zero
α > 2: differentiable at zero.
If α > 1 then the mode of g_{2,α} is equal to
m(2, α) = −((α − 1)/α)^{1/α} < 0.
Moreover,
m(2, α) → 0 and g_{2,α}(m(2, α)) → 1, as α ↓ 1,
and
m(2, α) → −1, g_{2,α}(m(2, α)) → ∞, as α → ∞.
Gumbel Density
The Gumbel density g_3(x) = e^{-x} exp(−e^{-x}) approximately behaves like the standard exponential density e^{-x} as x → ∞. The mode of g_3 is equal to zero. For the graph of g_3 we refer to Figure 1.3.1.
Weak Domains of Attraction
If (5.1.1) holds then F is said to belong to the weak domain of attraction of G.
We shall discuss some conditions imposed on F which guarantee the weak
convergence of upper extremes.
As mentioned above, c_n^{-1}(X_{n:n} − d_n) has the d.f. G_{i,α} if F = G_{i,α} and if the constants are appropriately chosen. Thus e.g. the sample maximum X_{n:n} of the negative exponential d.f. G_{2,1} may serve as a starting point for the study of asymptotic distributions of sample maxima. However, to extend such a result one has to use the transformation technique (or some equivalent more direct method) so that it can be preferable to work with the sample maximum U_{n:n} or V_{n:n} of n i.i.d. random variables uniformly distributed on (0, 1) or, respectively, (−1, 0). In this case the limiting d.f. will again be G_{2,1}. Recall that the uniform distribution on (−1, 0) is the generalized Pareto distribution W_{2,1}.
As pointed out in (1.3.14) we have
P{n(U_{n:n} − 1) ≤ x} = P{nV_{n:n} ≤ x} → G_{2,1}(x), n → ∞, (5.1.2)
for every x.
(5.1.2) and Corollary 1.2.7 imply that
F^n(b_n + xa_n) = G_{2,1}(n(F(b_n + xa_n) − 1)) + o(1), n → ∞, (5.1.3)
for every x. Moreover, for G ∈ {G_{1,α}, G_{2,α}, G_3 : α > 0} we may write
G = G_{2,1}(log G) on (α(G), ω(G)).
This yields
F^n(b_n + xa_n) → G(x), n → ∞, for every x,
if, and only if,
n(1 − F(b_n + xa_n)) → −log G(x) = 1 − Ψ(x), n → ∞, (5.1.4)
for every x ∈ (α(G), ω(G)).
This well-known equivalence is one of the basic tools to establish necessary and sufficient conditions for the weak convergence of extremes. These conditions [due to Gnedenko (1943) and de Haan (1970)] in their elegance and completeness can be regarded as a cornerstone of the classical extreme value theory.
A d.f. F belongs to the weak domain of attraction of an extreme value d.f. G_{i,α} if, and only if, one of the following conditions holds:
(1, α): ω(F) = ∞, lim_{t→∞} [1 − F(tx)]/[1 − F(t)] = x^{-α}, x > 0; (5.1.5)
(2, α): ω(F) < ∞, lim_{t↓0} [1 − F(ω(F) + xt)]/[1 − F(ω(F) − t)] = (−x)^α, x < 0; (5.1.6)
(3): lim_{t↑ω(F)} [1 − F(t + xg(t))]/[1 − F(t)] = e^{-x}, −∞ < x < ∞, (5.1.7)
where g(t) = ∫_t^{ω(F)} (1 − F(y)) dy/(1 − F(t)).
Moreover the constants a_n and b_n can be chosen in the following way:
(1, α): b_n* = 0, a_n* = F^{-1}(1 − 1/n); (5.1.8)
(2, α): b_n* = ω(F), a_n* = ω(F) − F^{-1}(1 − 1/n); (5.1.9)
(3): b_n* = F^{-1}(1 − 1/n), a_n* = g(b_n*) (5.1.10)
where g is defined in (5.1.7).
It is well known that the weak convergence to the limiting d.f. G holds for other choices of constants a_n and b_n if, and only if,
a_n/a_n* → 1 and a_n*^{-1}(b_n − b_n*) → 0 as n → ∞. (5.1.11)
For a well-known extension of this result we refer to P.5.3.
Tail Equivalence of D.F.'s
Further insight into the property that a d.f. belongs to the weak domain of attraction of G = G_{i,α} may be gained by conditions that are more closely related to (5.1.4). Observe that the two statements in (5.1.4) are equivalent to
n(1 − F(b_n + xa_n))/[1 − Ψ(x)] → 1, n → ∞, (5.1.12)
for every x ∈ (α(G), ω(G)) where Ψ = 1 + log G.
Recall that the restriction of Ψ to an appropriate interval is a generalized Pareto d.f. W ∈ {W_{1,α}, W_{2,α}, W_3 : α > 0}.
Theorem 5.1.1. Let G = G_{i,α} and W = W_{i,α} for some i ∈ {1, 2, 3} and α > 0. Then the following three statements are equivalent:
(i) F^n(b_n + xa_n) → G(x), n → ∞, for every x, (5.1.13)
(ii) (1 − F(b_n + xa_n))/(1 − W(d_n + xc_n)) → 1, n → ∞, (5.1.14)
(iii) (1 − F(b_n + xa_n))/(1 − G(d_n + xc_n)) → 1, n → ∞, (5.1.15)
where (5.1.14) and (5.1.15) have to hold for every x ∈ (α(G), ω(G)).
Moreover, d_n = 0 if i = 1, 2, d_n = log n if i = 3, c_n = n^{1/α} if i = 1, c_n = n^{-1/α} if i = 2, and c_n = 1 if i = 3.
PROOF. The equivalence of (5.1.13) and (5.1.14) is immediate from (5.1.12) by writing (1 − Ψ(x))/n = 1 − W(d_n + xc_n). Moreover, from (1.3.11) and the first equivalence we conclude that [1 − G(d_n + xc_n)]/[1 − W(d_n + xc_n)] → 1, n → ∞, and hence, obviously, the second equivalence is also valid. □
Notice that, necessarily, b_n + xa_n → ω(F), n → ∞, if F^n(b_n + xa_n) → G(x), n → ∞, and α(G) < x < ω(G). Thus, Theorem 5.1.1 reveals that F belongs to the weak domain of attraction of G if, and only if, the upper tail of F can asymptotically be made equivalent to G(d_n + xc_n). Below we shall prefer to work with the generalized Pareto d.f.'s W instead of the extreme value d.f.'s G because of technical advantages and other reasons which will become apparent when treating joint distributions of extremes.
Strong Domain of Attraction
Recall that the symbol G is used for the d.f. as well as for the corresponding probability measure. In analogy to the notion of the weak domain of attraction, F is said to belong to the strong domain of attraction of G if
sup_B |P{a_n^{-1}(X_{n:n} − b_n) ∈ B} − G(B)| → 0, n → ∞, (5.1.16)
where the sup is taken over all Borel sets B.
Notice that condition (5.1.16) implies that F belongs to the weak domain of attraction of G. Thus, necessarily the normalizing constants are again those of the weak convergence. Moreover, it can easily be verified that (5.1.11) carries over to the strong convergence.
The following result was already indicated in (1.3.14).
Lemma 5.1.2.
sup_B |P{n(U_{n:n} − 1) ∈ B} − G_{2,1}(B)| → 0, n → ∞. (5.1.17)
PROOF. From Theorem 1.3.2 we deduce that n(U_{n:n} − 1) has the density f_n given by f_n(x) = (1 + x/n)^{n-1}, −n < x < 0, and = 0, otherwise. Thus, f_n(x) → e^x = g_{2,1}(x), n → ∞, x < 0, and hence the Scheffé lemma implies the assertion. □
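The convergence in Lemma 5.1.2 can be made visible numerically: the variational distance equals half the L1 distance between the densities f_n and g_{2,1}. The following Python sketch (an added illustration, not part of the original text) approximates that integral with a midpoint rule:

```python
import math

def fn(x, n):
    """Density of n(U_{n:n} - 1): (1 + x/n)^{n-1} on (-n, 0)."""
    return (1.0 + x / n) ** (n - 1) if -n < x < 0 else 0.0

def g21(x):
    """Density g_{2,1}(x) = e^x of G_{2,1} on (-infinity, 0)."""
    return math.exp(x) if x < 0 else 0.0

def variational_distance(n, grid=100000, lo=-50.0):
    # sup_B |P_n(B) - G_{2,1}(B)| is half the L1 distance of the densities.
    h = -lo / grid
    total = 0.0
    for i in range(grid):
        x = lo + (i + 0.5) * h
        total += abs(fn(x, n) - g21(x)) * h
    return 0.5 * total

d = [variational_distance(n) for n in (10, 100, 1000)]
assert d[0] > d[1] > d[2] > 0
assert d[2] < 0.01
```

The computed distances decrease roughly like 1/n, which anticipates the bound of order O(k/n) stated in (5.1.41) for the case k = 1.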
Next, we study conditions under which F belongs to the strong domain of attraction of an extreme value d.f.
Tail Equivalence of Densities
In the sequel let us assume that F has a density f. Denote by w the density of the generalized Pareto d.f. W. Notice that
w = g/G
on appropriate intervals where G is the corresponding extreme value d.f. and g = G'. Explicitly, we have
w_{1,α}(x) = αx^{-(1+α)} if x ≥ 1, and = 0 if x < 1; "Pareto" (5.1.18)
w_{2,α}(x) = α(−x)^{α-1} if −1 ≤ x ≤ 0, and = 0 if x < −1 or x > 0; "Type II" (5.1.19)
w_3(x) = e^{-x} if x ≥ 0, and = 0 if x < 0. "Exponential" (5.1.20)
The generalized Pareto densities as well as the extreme value densities are unimodal. The particular feature of the generalized Pareto densities is the tail equivalence to the corresponding extreme value densities at the right endpoint of the support.
The counterpart to Theorem 5.1.1 (with respect to the strong convergence) is the following.
Lemma 5.1.3. Assume that the constants a_n > 0 and b_n are chosen so that the weak convergence holds, that is, F^n(b_n + xa_n) → G(x), n → ∞, for every x, where G ∈ {G_{1,α}, G_{2,α}, G_3 : α > 0}. Then,
sup_B |P{a_n^{-1}(X_{n:n} − b_n) ∈ B} − G(B)| → 0, n → ∞, (5.1.21)
if, and only if, for every subsequence i(n) there exists a subsequence k(n) = i(j(n)) such that
a_{k(n)} f(b_{k(n)} + xa_{k(n)})/(c_{k(n)} w(d_{k(n)} + xc_{k(n)})) → 1, n → ∞, (5.1.22)
for Lebesgue-almost all x ∈ (α(G), ω(G)) where w is the corresponding generalized Pareto density and c_n and d_n are the constants of Theorem 5.1.1.
Condition (5.1.22) is equivalent to the condition that for every subsequence i(n) there exists a subsequence k(n) = i(j(n)) such that
k(n) a_{k(n)} f(b_{k(n)} + xa_{k(n)}) → ψ(x) = (g/G)(x), n → ∞, (5.1.22')
for almost all x ∈ (α(G), ω(G)). The equivalence of (5.1.22) and (5.1.22') becomes obvious by noting that d_n + xc_n ∈ (α(W), ω(W)) for every x ∈ (α(G), ω(G)), and ψ(x)/n = c_n w(d_n + xc_n), eventually.
Without the condition F^n(b_n + xa_n) → G(x), n → ∞, (5.1.22) does not necessarily imply (5.1.21) as can be shown by examples. If the weak convergence holds then a sufficient condition for the convergence w.r.t. the variational distance is
a_n f(b_n + xa_n)/(c_n w(d_n + xc_n)) → 1, n → ∞, x ∈ (α(G), ω(G)). (5.1.23)
Note that the rate of convergence in (5.1.23) will also determine the rate at which the strong convergence of the distributions holds. We remark that the generalized Pareto density w can be replaced by the density g of G in condition (5.1.23).
Notice that (5.1.23) is equivalent to
n a_n f(b_n + xa_n) → ψ(x), n → ∞, x ∈ (α(G), ω(G)). (5.1.23')
PROOF OF LEMMA 5.1.3. Since
x → n a_n f(b_n + xa_n) F^{n-1}(b_n + xa_n)
is the density of a_n^{-1}(X_{n:n} − b_n) it is immediate from the Scheffé lemma 3.3.4 that (5.1.21) is equivalent to (5.1.22'). □
Lemma 5.1.3 will be the decisive tool to prove the following equivalence: F belongs to the strong domain of attraction of an extreme value distribution if, and only if, the corresponding result holds for the joint distribution of the k largest extremes for every positive integer k. For details we refer to Section 5.3.
From the mathematical point of view, condition (5.1.22) is more satisfactory than the sufficient condition (5.1.23). However, for practical purposes condition (5.1.23) can be useful; e.g. to verify that a given d.f. belongs to the strong domain of attraction of a particular extreme value distribution G ∈ {G_{1,α}, G_{2,α}, G_3 : α > 0}. It was proved by Falk (1985a) that the von Mises conditions (5.1.24) imply (5.1.23) and that (5.1.23) implies the convergence in the strong sense. Sweeting (1985) was able to show that the von Mises conditions (5.1.24) are equivalent to the uniform convergence of the densities in (5.1.23') on finite intervals if the density f is positive on a left neighborhood of ω(F).
Von Mises-Type Conditions
Hereafter, we assume that F has a positive derivative f on (x_0, ω(F)) where x_0 < ω(F). The following conditions (1, α), (2, α), and (3) are sufficient for F to belong to the strong domain of attraction of G_{1,α}, G_{2,α}, and G_3, respectively.
(1, α): ω(F) = ∞, and lim_{t→∞} t f(t)/[1 − F(t)] = α;
(2, α): ω(F) < ∞, and lim_{t↑ω(F)} [ω(F) − t] f(t)/[1 − F(t)] = α; (5.1.24)
(3): ∫^{ω(F)} (1 − F(u)) du < ∞, and
lim_{t↑ω(F)} f(t) ∫_t^{ω(F)} (1 − F(u)) du/[1 − F(t)]^2 = 1.
Another set of sufficient conditions can be formulated if, in addition, F has a second derivative on (x_0, ω(F)) where x_0 < ω(F):
lim_{t↑ω(F)} [(1 − F)/f]'(t) = 1/α if i = 1, = −1/α if i = 2, = 0 if i = 3. (5.1.25)
If i = 3 then the normalizing constant a_n* = g(b_n*) as given in (5.1.10) can be replaced by
a_n = 1/(nf(b_n*)) (5.1.26)
where again b_n* = F^{-1}(1 − 1/n).
Notice that
[(1 − F)/f]' = −(1 − F)f'/f^2 − 1. (5.1.27)
Thus, (5.1.25), i = 3, is equivalent to lim_{t↑ω(F)} (1 − F(t))f'(t)/f^2(t) = −1.
(5.1.25) can be formulated in the following way: If the limit in (5.1.25) exists then F belongs to the strong domain of attraction of the von Mises d.f. H_β with parameter
β = lim_{t↑ω(F)} [(1 − F)/f]'(t).
Since the conditions (5.1.5)-(5.1.7) and (5.1.24)-(5.1.26) are deduced from (5.1.4) it is not very amazing that these conditions are trivially fulfilled for the generalized Pareto d.f.'s W, that is, the equalities t f(t)/[1 − F(t)] = α etc. hold for every t in the support of W.
The von Mises-type conditions are sufficient for a d.f. to belong to the strong domain of attraction of an extreme value d.f. However, as examples show these conditions are not necessary. This is intuitively clear since for every density f which fulfills a von Mises-type condition we can find (by slightly varying f in the tail of the distribution) a density g which violates the von Mises-type condition whereas the stochastic properties of the sample maximum remain to hold asymptotically.
The main purpose of the following example is to clarify the connection between the different normalizing constants used in the literature for the maximum of normal r.v.'s.
EXAMPLE 5.1.4. Let X_{n:n} be the maximum of n standard normal r.v.'s. Write again φ = Φ'. Since φ'(x) = −xφ(x) we get
((1 − Φ)/φ)'(x) = (1 − Φ(x))x/φ(x) − 1.
It is immediate from (3.2.3) that this expression tends to zero as x → ∞. Thus, condition (5.1.25), i = 3, implies that Φ belongs to the domain of attraction of the Gumbel d.f. G_3. Hence, according to (5.1.26), with b_n = Φ^{-1}(1 − 1/n),
sup_B |P{nφ(b_n)(X_{n:n} − b_n) ∈ B} − G_3(B)| → 0, n → ∞. (1)
Direct calculations or an application of Example 5.2.4 shows that (1) holds with a remainder term of order O(1/log n).
Next a_n = 1/(nφ(b_n)) and b_n will be replaced by other normalizing constants that satisfy (5.1.11). Obviously, b_n is the solution of the equation
1 − Φ(b) = 1/n. (2)
Since 1 − Φ(x) ∼ φ(x)/x as x → ∞ it is immediate that (2 log n)^{1/2} may be taken as a first approximate solution of (2). Moreover, (2) may be written
(1 − Φ(b))/φ(b) = 1/(nφ(b))
and hence a solution of the equation
nφ(b) = b, (3)
say, b_n', will be an approximate solution of (2). It can be shown (compare also with Example 5.2.4) that (1) still holds with a remainder term of order O(1/log n) if a_n and b_n are replaced by a_n' and b_n' where a_n' = (b_n')^{-1}.
(3) is equivalent to the equation
b = (2 log n − log 2π − 2 log b)^{1/2}. (4)
A Taylor expansion of length two about 2 log n leads to the equation
b = (2 log n)^{1/2} − (log 2π + 2 log b)/(2(2 log n)^{1/2}).
Replacing b on the right-hand side by (2 log n)^{1/2} we get
b_n* = (2 log n)^{1/2} − (log 4π + log log n)/(2(2 log n)^{1/2}). (5)
Use P.5.7 to prove that
sup_B |P{(2 log n)^{1/2}(X_{n:n} − b_n*) ∈ B} − G_3(B)| = O((log log n)^2/log n). (6)
We remark that the bound in (6) is sharp. Moreover, the same rates are obtained if d.f.'s are considered.
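The three constants b_n, b_n', and b_n* of Example 5.1.4 can be compared numerically. The following Python sketch (an added illustration, not part of the original text; the bisection solver stands in for the asymptotic analysis) solves (2) and (3) and evaluates the explicit formula (5):

```python
import math

def Phi(x):
    """Standard normal d.f. via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes a sign change of f on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

n = 10 ** 6
# b_n solves 1 - Phi(b) = 1/n, equation (2)
b_n = bisect(lambda b: 1.0 - Phi(b) - 1.0 / n, 0.0, 10.0)
# b_n' solves n*phi(b) = b, equation (3)
b_prime = bisect(lambda b: n * phi(b) - b, 1.0, 10.0)
# b_n* from the explicit formula (5)
L = 2.0 * math.log(n)
b_star = math.sqrt(L) - (math.log(4.0 * math.pi) + math.log(math.log(n))) / (2.0 * math.sqrt(L))

assert abs(b_prime - b_n) < 0.05
assert abs(b_star - b_n) < 0.05
```

All three constants agree to within a few hundredths already for n = 10^6, in accordance with (5.1.11); the slow O(1/log n) rate in (1) and (6) is reflected in how slowly these differences shrink with n.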
The kth Largest Order Statistic
The results given above can easily be extended to the case of the kth largest order statistic. It is well known that
P{a(n)^{-1}(X_{n:n} − b(n)) ≤ x} → G_{i,α}(x), n → ∞,
implies for every fixed k that
P{a(n)^{-1}(X_{n-k+1:n} − b(n)) ≤ x} → G_{i,α,k}(x), n → ∞, (5.1.28)
where the d.f.'s G_{i,α,k} are given by
G_{1,α,k}(x) = exp(−x^{-α}) Σ_{j=0}^{k-1} x^{-jα}/j!, x > 0,
G_{2,α,k}(x) = exp(−(−x)^α) Σ_{j=0}^{k-1} (−x)^{jα}/j!, x < 0, (5.1.29)
G_{3,k}(x) = exp(−e^{-x}) Σ_{j=0}^{k-1} e^{-jx}/j!, −∞ < x < ∞.
With the convention G_{3,α,k} ≡ G_{3,k} we have
G_{i,α,k} = G_{i,α} Σ_{j=0}^{k-1} (−log G_{i,α})^j/j! (5.1.30)
on the support of G_{i,α}.
To prove (5.1.28) recall that, necessarily, for G ∈ {G_{1,α}, G_{2,α}, G_3 : α > 0} and x ∈ (α(G), ω(G)),
n(1 − F(u_n)) → −log G(x),
with u_n = b_n + a_n x. According to (1.1.8), as n → ∞,
P{X_{n-k+1:n} ≤ u_n} = P{Σ_{i=1}^n 1_{(u_n,∞)}(X_i) ≤ k − 1}
= B_{(n, 1−F(u_n))}({0, 1, ..., k − 1})
→ P_{−log G(x)}({0, 1, ..., k − 1}) (5.1.31)
where P_t denotes the Poisson distribution with parameter t > 0. Thus, (5.1.28) holds.
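The binomial-to-Poisson step in (5.1.31) can be checked directly. The following Python sketch (an added illustration, not part of the original text) compares B_{(n,t/n)}({0, ..., k−1}) with P_t({0, ..., k−1}) for a hypothetical parameter t standing in for −log G(x):

```python
import math

def binom_cdf(n, p, k):
    """P{Bin(n, p) <= k} by direct summation."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

def poisson_cdf(t, k):
    """P{Poisson(t) <= k} by direct summation."""
    return sum(math.exp(-t) * t ** j / math.factorial(j) for j in range(k + 1))

k = 3
t = 1.3   # hypothetical value standing in for -log G(x)
errs = []
for n in [50, 500, 5000]:
    p = t / n               # chosen so that n(1 - F(u_n)) = t
    errs.append(abs(binom_cdf(n, p, k - 1) - poisson_cdf(t, k - 1)))

# The binomial probability of {0, ..., k-1} approaches the Poisson one like 1/n.
assert errs[0] > errs[1] > errs[2]
assert errs[-1] < 1e-3
```

This is exactly the mechanism behind (5.1.28): the number of exceedances of u_n is binomial, and its distribution is asymptotically Poisson with parameter −log G(x).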
Moreover, it is well known that every nondegenerate limiting d.f. of the kth largest order statistic X_{n-k+1:n} has to be one of the d.f.'s in (5.1.29) (see e.g. Galambos (1987), Theorem 2.8.1) where it is always understood that we have to include a location and scale parameter if the d.f. of X_{n-k+1:n} is not properly standardized.
Note that in analogy to (1.3.15) the nondegenerate limiting d.f.'s F_{i,α,k} of the kth smallest order statistics X_{k:n} are given by
F_{1,α,k}(x) = 1 − G_{1,α,k}(−x), x < 0,
F_{2,α,k}(x) = 1 − G_{2,α,k}(−x), x > 0, (5.1.32)
F_{3,k}(x) = 1 − G_{3,k}(−x)
where again α > 0.
Obviously, G_{2,1,k} is the "negative" gamma d.f. with parameter k; thus, the density g_{2,1,k} of G_{2,1,k} is given by
g_{2,1,k}(x) = e^x(−x)^{k-1}/(k − 1)!, x < 0, and = 0, otherwise.
We also note the explicit form of the densities g_{i,α,k} of G_{i,α,k}. Since
G_{i,α,k} = G_{2,1,k}(log G_{i,α}) on (α(G_{i,α}), ω(G_{i,α})) (5.1.33)
we know that
g_{i,α,k}(x) = g_{2,1,k}(log G_{i,α}(x)) g_{i,α}(x)/G_{i,α}(x),
and = 0, otherwise. Explicitly, we have
g_{1,α,k}(x) = α exp(−x^{-α}) x^{-(αk+1)}/(k − 1)!, x > 0,
g_{2,α,k}(x) = α exp(−(−x)^α)(−x)^{αk-1}/(k − 1)!, x < 0, (5.1.34)
g_{3,k}(x) = exp(−e^{-x}) e^{-kx}/(k − 1)!, −∞ < x < ∞.
Notice that
g_{i,α,k} = g_{i,α}(−log G_{i,α})^{k-1}/(k − 1)!.
Lemma 1.6.6 yields that G_{2,1,k} is the d.f. of the partial sum S_k = Σ_{i=1}^k ξ_i where ξ_1, ..., ξ_k are i.i.d. random variables with common d.f. F(x) = e^x, x < 0.
Next it will be proved that n(U_{n-k+1:n} − 1) is asymptotically distributed according to G_{2,1,k} (in other words, can asymptotically be represented by S_k). As an extension of Lemma 5.1.2 we obtain
Lemma 5.1.5. For every positive integer k,
sup_B |P{n(U_{n-k+1:n} − 1) ∈ B} − G_{2,1,k}(B)| → 0, n → ∞. (5.1.35)
PROOF. Obvious by noting that n(U_{n-k+1:n} − 1) has the density f_n given by
f_n(x) = [∏_{i=1}^{k-1} (1 − i/n)] (1 + x/n)^{n-k}(−x)^{k-1}/(k − 1)!, −n < x < 0,
and = 0, otherwise. □
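The pointwise convergence of this density to the "negative" gamma density g_{2,1,k} can be observed numerically. The following Python sketch (an added illustration, not part of the original text) evaluates both densities for k = 3 at a few points:

```python
import math

def f_n(x, n, k):
    """Density of n(U_{n-k+1:n} - 1) on (-n, 0), as in the proof above."""
    if not (-n < x < 0):
        return 0.0
    prod = 1.0
    for i in range(1, k):
        prod *= 1.0 - i / n
    return prod * (1.0 + x / n) ** (n - k) * (-x) ** (k - 1) / math.factorial(k - 1)

def g_21k(x, k):
    """Negative gamma density g_{2,1,k}(x) = e^x (-x)^{k-1}/(k-1)! for x < 0."""
    return math.exp(x) * (-x) ** (k - 1) / math.factorial(k - 1) if x < 0 else 0.0

k = 3
for x in (-0.5, -2.0, -6.0):
    vals = [abs(f_n(x, n, k) - g_21k(x, k)) for n in (10, 100, 1000)]
    assert vals[0] > vals[1] > vals[2]
    assert vals[-1] < 1e-2
```

The gaps shrink roughly like 1/n, which is consistent with the remainder term of order O(k/n) announced for Section 5.4.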
Obviously, Lemma 5.1.5 can be written
sup_B |P{nV_{n-k+1:n} ∈ B} − G_{2,1,k}(B)| → 0, n → ∞, (5.1.36)
where V_{n-k+1:n} is the kth largest order statistic of n i.i.d. random variables that are uniformly distributed on (−1, 0).
Recall that the uniform distribution on (−1, 0) is the generalized Pareto distribution W_{2,1}. (5.1.36) can easily be extended to the other generalized Pareto distributions W_{i,α} by using the transformation technique.
Let again T_{i,α} be defined as in (1.6.10). For x < 0, we have T_{1,α}(x) = (−x)^{-1/α}, T_{2,α}(x) = −(−x)^{1/α}, and T_{3,1}(x) = −log(−x).
Since Ii,/l(nV,:n) = c;l(Xr:n  dn) where cn, dn are the constants of
Theorem 5.1.1 and since Gi,/l,k is induced by G2 ,l,k and Ii,/l [recall that
Ii:,} = Gi,\ 0 Gi,/l = log Gi,/l] the following result is immediate from Lemma
5.1.5.
Corollary 5.1.6. Let $X_{n-k+1:n}$ be the $k$th largest order statistic of $n$ i.i.d. random variables with common generalized Pareto d.f. $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3 : \alpha > 0\}$.
Then, for every fixed $k$, as $n \to \infty$,
$\sup_B |P\{n^{-1/\alpha} X_{n-k+1:n} \in B\} - G_{1,\alpha,k}(B)| \to 0,$ (5.1.37)
$\sup_B |P\{n^{1/\alpha} X_{n-k+1:n} \in B\} - G_{2,\alpha,k}(B)| \to 0,$ (5.1.38)
$\sup_B |P\{(X_{n-k+1:n} - \log n) \in B\} - G_{3,k}(B)| \to 0.$ (5.1.39)
In Section 5.4 it will be shown that Lemma 5.1.5 (and thus also Corollary
5.1.6) is valid with a remainder term of order O(k/n).
Intermediate Order Statistics
From Chapter 4 we already know that intermediate order statistics are asymptotically normal under weak regularity conditions. For example, according to Theorem 4.2.1,
$\sup_B |P\{a_{r,n}^{-1}(U_{r:n} - b_{r,n}) \in B\} - N(0,1)(B)| \le C (n/(r(n-r)))^{1/2}$ (5.1.40)
where $C > 0$ is a universal constant, and $a_{r,n} > 0$ and $b_{r,n}$ are normalizing constants. In Section 5.4 it will be proved that
$\sup_B |P\{n(U_{n-k+1:n} - 1) \in B\} - G_{2,1,k}(B)| \le Ck/n,$ (5.1.41)
where $G_{2,1,k}$ is the "negative" gamma distribution. We also refer to P.5.18 where a rate of order $O(k^{1/2}/n)$ is achieved in (5.1.41) by using other normalizing constants. Approximations of joint distributions of intermediate order statistics are established in Sections 4.5, 5.4, and 5.5.
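The asymptotic normality in (5.1.40) can be illustrated by a quick simulation (a sketch of mine; the centering $b_{r,n}$ and scaling $a_{r,n}$ below are natural choices based on the mean and approximate standard deviation of $U_{r:n}$, not necessarily the book's constants):

```python
import math
import random

def Phi(x):
    # Standard normal d.f.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(2)
n, r, N = 400, 200, 5_000
b = r / (n + 1.0)                      # centering: mean of U_{r:n} (assumed choice)
a = math.sqrt(r * (n - r)) / n ** 1.5  # scaling of order sqrt(r(n-r))/n^(3/2) (assumed choice)
z = []
for _ in range(N):
    u = sorted(rng.random() for _ in range(n))
    z.append((u[r - 1] - b) / a)
frac0 = sum(v <= 0.0 for v in z) / N   # should be close to Phi(0) = 1/2
frac1 = sum(v <= 1.0 for v in z) / N   # should be close to Phi(1)
print(frac0, frac1)
```

For a central order statistic ($r = n/2$) the standardized empirical d.f. matches the standard normal d.f. well at moderate $n$, in line with the $O((n/(r(n-r)))^{1/2})$ bound.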
The following theorem is taken from Falk (1989b).

Theorem 5.1.7. Assume that one of the von Mises conditions (5.1.24) holds. Let $k(n) \in \{1, \ldots, n\}$ be such that $k(n) \to \infty$ and $k(n)/n \to 0$ as $n \to \infty$. Then, with $b_n = F^{-1}(1 - k(n)/n)$, the asymptotic normality stated in (5.1.42) holds as $n \to \infty$.

The proof of (5.1.42) is based on (5.1.40) and the transformation technique.
5.2. Hellinger Distance between Exact and
Approximate Distributions of Sample Maxima
Given $n$ i.i.d. random variables $\xi_1, \ldots, \xi_n$ with common d.f. $F$ we know that the d.f. of the sample maximum $M_n = X_{n:n}$ is given by $F^n$. In Section 5.1 we gave a short outline of classical results concerning the weak convergence of $F^n$ (if appropriately normalized) to a limiting d.f. $G$. Moreover, we know that
under von Mises-type conditions the weak convergence is equivalent to the convergence w.r.t. the variational distance. In the present section we study the accuracy of such approximations. Again we use the same symbol for a d.f. and the pertaining probability measure to simplify the notation.
The Hellinger Distance
In statistical applications it is desirable to use the Hellinger distance instead of the variational distance. To highlight this point consider the sample maxima
$M_{n,i} = \max(\xi_{i,1}, \ldots, \xi_{i,n}), \quad i = 1, \ldots, N,$
where the random variables $\xi_{1,1}, \ldots, \xi_{1,n}, \xi_{2,1}, \ldots, \xi_{2,n}, \ldots, \xi_{N,1}, \ldots, \xi_{N,n}$ are i.i.d. with common d.f. $F$. Thus, $M_{n,1}, \ldots, M_{n,N}$ are i.i.d. random variables with common d.f. $F^n$. If $\eta_1, \ldots, \eta_N$ are i.i.d. random variables with d.f. $G$ then we know from Corollary 3.3.11 that for every Borel set $B$,
$|P\{(M_{n,1}, \ldots, M_{n,N}) \in B\} - P\{(\eta_1, \ldots, \eta_N) \in B\}| \le N^{1/2} H(F^n, G)$ (5.2.1)
where $H(F^n, G)$ is the Hellinger distance between $F^n$ and $G$.
Given d.f.'s $F$ and $G$ with Lebesgue densities $f$ and $g$, the Hellinger distance of $F$ and $G$ is defined by
$H(F, G) = \left[\int (f^{1/2}(x) - g^{1/2}(x))^2 \, dx\right]^{1/2}.$ (5.2.2)
In general, if $F$ and $G$ have densities $f$ and $g$ with respect to some $\sigma$-finite measure $\mu$ then
$H(F, G) = \left[\int (f^{1/2} - g^{1/2})^2 \, d\mu\right]^{1/2}$ (5.2.3)
and the distance is independent of the particular representation. Thus (5.2.2) and (5.2.3) lead to the same distance (if Lebesgue densities exist). We refer to Section 3.3 for further details.
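Definition (5.2.2) is straightforward to evaluate numerically. The following sketch (an illustration of mine, not from the text) approximates $H(F, G)$ by the midpoint rule for two Lebesgue densities, here the Gumbel density against a slightly shifted copy:

```python
import math

def hellinger(f, g, lo, hi, steps=20_000):
    # H(F, G) = [ integral of (f^(1/2) - g^(1/2))^2 dx ]^(1/2), midpoint rule on [lo, hi].
    h = (hi - lo) / steps
    s = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        s += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
    return math.sqrt(s * h)

def g3(x):
    # Gumbel density g_3(x) = exp(-x - e^(-x))
    return math.exp(-x - math.exp(-x))

d = hellinger(g3, lambda x: g3(x - 0.1), -6.0, 14.0)
print(d)  # small but positive; H(F, F) is exactly 0
```

The truncation interval $[-6, 14]$ carries essentially all Gumbel mass, so the quadrature error is negligible compared with the distances of interest.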
(5.2.1) also holds with $N^{1/2} H(F^n, G)$ replaced by $N \|F^n - G\|$ where $\|F^n - G\|$ is the variational distance between $F^n$ and $G$. However, the use of $N \|F^n - G\|$ yields an inaccurate inequality in those cases where $\|F^n - G\|$ and $H(F^n, G)$ are of the same magnitude.
An Auxiliary Approximation
According to (5.1.4),
$F^n(b_n + x a_n) \to G(x) =: \exp(-h(x)), \quad n \to \infty,$ (5.2.4)
if, and only if,
$n(1 - F(b_n + x a_n)) \to h(x), \quad n \to \infty.$
Since $F$ is a d.f. it is obvious that also $D_n$ defined by
$D_n = [\exp[-n(1 - F_n)] - e^{-n}]/(1 - e^{-n})$
is a d.f. where $F_n(x) = F(b_n + x a_n)$. Now (5.2.4) may be written
$D_n \to G$ (5.2.4')
w.r.t. the pointwise convergence.
According to Lemma 5.2.1, $D_n \to G$ implies $F_n^n \to G$ where the convergence is taken w.r.t. the Hellinger distance $H$. Notice that the d.f. $G$ in Lemma 5.2.1 is not necessarily an extreme value d.f. In particular, $G$ may also depend on $n$.
Lemma 5.2.1. There exists a universal constant $C > 0$ such that for every $n$ and all d.f.'s $F$ and $G$ the following inequality holds:
$H(F^n, G) \le H(D_n, G) + C/n$
where $D_n = [\exp[-n(1 - F)] - e^{-n}]/(1 - e^{-n})$.
PROOF. Since $H(F^n, G) \le H(F^n, D_n) + H(D_n, G)$ we know that the assertion holds if
$H(F^n, D_n) \le C/n.$ (1)
First, (1) will be verified in the special case of $F_0(x) = 1 + x/n$, $-n < x < 0$. Notice that $F_0$ is the d.f. of $n(U_{n:n} - 1)$.
In this case we have $D_n(x) \equiv D_{0,n}(x) = (e^x - e^{-n})/(1 - e^{-n})$, $-n < x < 0$, and, therefore, $D_{0,n}$ is the normalized restriction of the extreme value d.f. $G_{2,1}$ to the interval $(-n, 0)$.
Denote by $f_0$ and $d_{0,n}$ the densities of $F_0$ and $D_{0,n}$. Since
$H(F_0^n, D_{0,n}) \le \left[\int (n f_0 F_0^{n-1}/d_{0,n} - 1)^2 \, dD_{0,n}\right]^{1/2}$
(see Lemma 3.3.9(ii)) it is immediate that (1) holds for $F_0$ and $D_{0,n}$ if
$\int_{-n}^{0} \left[e^{-x}\left(1 + \frac{x}{n}\right)^{n-1}(1 - e^{-n}) - 1\right]^2 e^x/(1 - e^{-n}) \, dx \le (C/n)^2.$ (2)
This inequality can be verified by means of some straightforward calculations.
The extension to arbitrary d.f.'s is obtained by means of the transformation technique. If $\xi$ and $\eta$ are r.v.'s with d.f.'s $F_0$ and $D_{0,n}$ then $F^{-1}(1 + \xi/n)$ and $F^{-1}(1 + \eta/n)$ are r.v.'s with d.f.'s $F^n$ and $D_n = [\exp[-n(1 - F)] - e^{-n}]/(1 - e^{-n})$. Now, Lemma 3.3.13, which concerns the Hellinger distance between induced probability measures, implies (1) in the general case. $\Box$
Let $F_0$ be defined as in the proof of Lemma 5.2.1, that is, $F_0$ is the d.f. of $n(U_{n:n} - 1)$. It is easy to see that also
$H(F_0^n, G_{2,1}) \le C/n.$ (5.2.5)
According to our considerations in Section 5.1, this inequality can easily be extended to sample maxima under arbitrary generalized Pareto d.f.'s.
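(5.2.5) can be checked numerically. With $F_0^n$ having density $(1+x/n)^{n-1}$ on $(-n, 0)$ and $G_{2,1}$ the density $e^x$ on $(-\infty, 0)$, the squared Hellinger distance is an explicit one-dimensional integral. The crude midpoint-rule sketch below (mine, not the book's) shows the expected $O(1/n)$ behavior:

```python
import math

def hell_F0n_G21(n, steps=200_000):
    # H(F_0^n, G_{2,1})^2 = int_{-n}^0 (sqrt((1+x/n)^(n-1)) - e^(x/2))^2 dx + e^(-n);
    # the last term is the G_{2,1}-mass below -n, where F_0^n has none.
    h = n / steps
    s = 0.0
    for i in range(steps):
        x = -n + (i + 0.5) * h
        s += (math.sqrt((1.0 + x / n) ** (n - 1)) - math.exp(x / 2.0)) ** 2
    return math.sqrt(s * h + math.exp(-n))

H20, H100 = hell_F0n_G21(20), hell_F0n_G21(100)
print(H20, H100)  # roughly proportional to 1/n
```

Doubling or quintupling $n$ shrinks the distance by about the same factor, which matches the $C/n$ bound.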
The Main Results
Notice that Lemma 5.2.1 holds for arbitrary d.f.'s $F$ and $G$. Hereafter, we shall assume that $F$ and $G$ possess densities $f$ and $g$.
In the next step we establish an upper bound for $H(D_n, G)$ (and thus for $H(F^n, G)$) which depends on $F$ through the density $f$ only.
Lemma 5.2.2. Let $F$ and $G$ be d.f.'s with densities $f$ and $g$. Define $\psi = g/G$ on the support of $G$. Then, for every $x_0 \ge -\infty$,
$H(F^n, G) \le \left[2G(B^c) - \int_{B^c} (1 + \log G) \, dG + \int_B [nf/\psi - 1 - \log(nf/\psi)] \, dG + \int_{\{g=0\}} nG \, dF\right]^{1/2} + C/n$ (5.2.6)
where $B = \{x : x > x_0, f(x) > 0\}$ and $C > 0$ is a universal constant.
PROOF. Let the d.f. $D_n$ be defined as in Lemma 5.2.1. Notice that $D_n$ has the density $x \mapsto nf \exp[-n(1 - F)]/(1 - e^{-n})$. To prove this apply e.g. Remark 1.5.3. Now, by Lemma 5.2.1 and Lemma A.3.5, applied to $H(D_n, G)$, we obtain
$H(F^n, G) \le \left[2G(B^c) + \int_B [n(1 - F) - \log(nf) + \log(G\psi)] \, dG\right]^{1/2} + C/n.$ (1)
Recall that $G = g/\psi$ on the set $\{g > 0\}$. Hence, by Fubini's theorem,
$\int_B (1 - F) \, dG = \int_{x_0}^{\infty} \left(\int_x^{\infty} f(y) \, dy\right) dG(x)$
$= \int\!\!\int 1_{[x_0,\infty)}(x) \, 1_{(-\infty,y)}(x) \, f(y) g(x) \, dx \, dy$
$= \int f(y) \, 1_{[x_0,\infty)}(y) \left(\int_{x_0}^{y} g(x) \, dx\right) dy$ (2)
$\le \int_{x_0}^{\infty} f(y) G(y) \, dy = \int_B (f/\psi) \, dG + \int_{\{g=0\}} G(y) \, dF(y).$
Combining (1) and (2) we obtain inequality (5.2.6). $\Box$
In special cases the term on the right-hand side of (5.2.6) simplifies considerably.
Corollary 5.2.3. Assume in addition to the conditions of Lemma 5.2.2 that $F$ and $G$ are mutually absolutely continuous (that is, $G\{f > 0\} = F\{g > 0\} = 1$). Then,
$H(F^n, G) \le \left[\int (nf/\psi - 1 - \log(nf/\psi)) \, dG\right]^{1/2} + C/n.$ (5.2.7)

PROOF. Lemma 5.2.2 will be applied to $x_0 = -\infty$. It suffices to prove that
$\int \log G \, dG = -1.$ (1)
Notice that according to Lemma 1.2.4,
$\int (1 + \log G) \, dG = \int_0^1 (1 + \log(G \circ G^{-1})(x)) \, dx = \int_0^1 (1 + \log x) \, dx = x \log x \big|_0^1 = 0$
since $x \log x \to 0$ as $x \to 0$.
The proof of (1) shows that $\int \log G \, dG = -1$ for continuous d.f.'s $G$. If $G$ has a density $g$ then $\int g(x)(-\log G(x)) \, dx = 1$ so that $g(x)(-\log G(x))$ is a probability density. In Section 5.1, we already obtained a special case, namely, that $g_{i,\alpha,2} = g_{i,\alpha}(-\log G_{i,\alpha})$ where $g_{i,\alpha,2}$ is the limiting density of the second largest order statistic.
Thus, if $g$ is an approximation to the density of the standardized sample maximum then $g(-\log G)$ will be the proper candidate as an approximate density of the second largest order statistic. The extension of this argument to $k > 2$ is straightforward and can be left to the reader.
Since $x - 1 - \log x \le x - 1 + 1/x - 1 = (x - 1)^2/x$ we obtain from Corollary 5.2.3 that
$H(F^n, G) \le \left[\int \frac{(nf/\psi - 1)^2}{nf/\psi} \, dG\right]^{1/2} + C/n$ (5.2.8)
where again $\psi = g/G$. This inequality shows once more (see also Section 5.1) that the approximating d.f. $G$ should be chosen in such a way that $nf/\psi$ is close to one.
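The elementary inequality behind (5.2.8) is easy to confirm directly; the following sketch (my own check) verifies $x - 1 - \log x \le (x-1)^2/x$ on a grid of positive $x$:

```python
import math

# x - 1 - log x <= (x - 1)^2 / x for all x > 0: this follows from log x >= 1 - 1/x.
# The grid check below is a quick numerical confirmation of the analytic argument.
worst = 0.0
for i in range(1, 2001):
    x = i / 100.0  # x in (0, 20]
    gap = (x - 1.0) ** 2 / x - (x - 1.0 - math.log(x))
    worst = min(worst, gap)  # stays >= 0 exactly when the inequality holds on the grid
print(worst)
```

The gap is nonnegative everywhere on the grid, as the analytic bound $\log x \ge 1 - 1/x$ guarantees.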
EXAMPLE 5.2.4. Let $F(x) = \Phi(b_n + b_n^{-1} x)$ where $\Phi$ is the standard normal d.f. and $b_n$ is the solution of $b_n = n\varphi(b_n)$ with $\varphi = \Phi'$. Then,
$H(F^n, G_3) = O(1/\log n).$ (5.2.9)
To prove this, we apply (5.2.7) with $\psi(x) = e^{-x}$. We have
$H(F^n, G_3) \le \left[\int (n\varphi(b_n + b_n^{-1}x) b_n^{-1} e^x - 1 - \log(n\varphi(b_n + b_n^{-1}x) b_n^{-1} e^x)) \, dG_3(x)\right]^{1/2} + C/n$
$= \left[\int (\exp(-x^2/2b_n^2) - 1 + x^2/2b_n^2) \, dG_3(x)\right]^{1/2} + C/n$
$\le \left[\int (x^4/8b_n^4) \, dG_3(x)\right]^{1/2} + C/n \le C(b_n^{-2} + n^{-1}).$
Thus, (5.2.9) holds since $b_n^{-2} = O(1/\log n)$.
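Example 5.2.4 can be illustrated numerically. The sketch below (my own, using the same standardization $F(x) = \Phi(b_n + b_n^{-1}x)$ with $b_n$ solving $b_n = n\varphi(b_n)$) computes the Hellinger distance between the density of the standardized normal maximum and the Gumbel density; the decay is indeed very slow, of order $1/\log n$:

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def b_of(n):
    # Solve b = n * phi(b) by bisection; n*phi(b) - b is decreasing on [1, 10].
    lo, hi = 1.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if n * phi(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hell_normal_gumbel(n, lo=-4.0, hi=14.0, steps=40_000):
    b = b_of(n)
    h = (hi - lo) / steps
    s = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        fn = (n / b) * phi(b + x / b) * Phi(b + x / b) ** (n - 1)  # density of b(M_n - b)
        g3 = math.exp(-x - math.exp(-x))                           # Gumbel density
        s += (math.sqrt(fn) - math.sqrt(g3)) ** 2
    return math.sqrt(s * h)

h100, h10000 = hell_normal_gumbel(100), hell_normal_gumbel(10_000)
print(h100, h10000)  # decreases, but only logarithmically in n
```

Increasing $n$ by a factor of 100 reduces the distance only modestly, in line with the $O(1/\log n)$ rate.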
Next, Lemma 5.2.2 will be applied to extreme value d.f.'s $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$. Note that the function $\psi = g/G$ is given by
$\psi_{1,\alpha}(x) = \alpha x^{-(1+\alpha)}, \quad x > 0,$
$\psi_{2,\alpha}(x) = \alpha(-x)^{\alpha - 1}, \quad x < 0,$ (5.2.10)
$\psi_3(x) = e^{-x}, \quad -\infty < x < \infty.$
Recall that $\omega(F) < \infty$ if $F$ belongs to the domain of attraction of $G_{2,\alpha}$. In this case, the usual choice of the constant $b_n$ is $\omega(F)$ so that we may assume w.l.g. that $\omega(F) = \omega(G_{2,\alpha}) = 0$. If $F$ belongs to the domain of attraction of $G_{1,\alpha}$ then $\omega(F) = \omega(G_{1,\alpha}) = \infty$. Let us also assume that $\omega(F) = \omega(G_3) = \infty$ if $i = 3$ to make the inequality in Theorem 5.2.5 as simple as possible without losing too much generality.
Theorem 5.2.5. Let $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$. Let $F$ be a d.f. with density $f$ such that $f(x) > 0$ for $x_0 < x < \omega(F)$. Assume that $\omega(F) = \omega(G)$. Then,
$H(F^n, G) \le \left[\int_{x_0}^{\omega(G)} [nf/\psi - 1 - \log(nf/\psi)] \, dG + 2G(x_0) - G(x_0)\log G(x_0)\right]^{1/2} + C/n$
where $C > 0$ is a universal constant.

PROOF. Immediate from Lemma 5.2.2 since $\int_{\{g=0\}} nG \, dF = 0$, and
$2G(B^c) - \int_{B^c} (1 + \log G) \, dG = G_{i,\alpha}(B^c) + G_{i,\alpha,2}(B^c) = G_{i,\alpha}(x_0) + G_{i,\alpha,2}(x_0) = 2G(x_0) - G(x_0)\log(G(x_0)). \quad \Box$
Limit Distributions
The results above provide us with useful auxiliary inequalities which, in the next step, have to be applied to special examples or certain classes of underlying d.f.'s to obtain a more explicit form of the error bound.
Our first example again reveals the exceptional role of the generalized Pareto d.f.'s $W_{i,\alpha}$ (at least, from a technical point of view).
EXAMPLE 5.2.6. (i) Let $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3 : \alpha > 0\}$ and $c_n, d_n$ be the constants of Theorem 5.1.1. Put
$F_n(x) = W(d_n + x c_n).$
The density $f_n$ of $F_n$ is given by
$f_n(x) = c_n w(d_n + x c_n) = \psi(x)/n$
for every $x$ with $f_n(x) > 0$. Thus, we have
$\int_{\{f_n > 0\}} (nf_n/\psi - 1 - \log(nf_n/\psi)) \, dG = 0.$
Applying Theorem 5.2.5 to $x_0 = (\alpha(W) - d_n)/c_n$ we obtain again
$H(F_n^n, G) \le C/n.$
(ii) Let in (i) the generalized Pareto d.f. $W$ be replaced by a d.f. $F$ which has the same tail as $W$. More precisely,
$f(x) = w(x), \quad T(x_0) < x < \omega(G),$
where $-1 < x_0 < 0$ and $T$ is the corresponding transformation as defined in (1.6.10). Then,
$H(F_n^n, G) \le C_0/n$
where $C_0$ is a constant which only depends on $x_0$.
Notice that the condition $T(x_0) < x$ in Example 5.2.6(ii) makes the accuracy of the approximation independent of the special underlying d.f. $F$.
Example 5.2.6 will be generalized to classes of d.f.'s which include the generalized Pareto d.f.'s as well as the extreme value d.f.'s. Since our calculations are always carried out within an error bound of order $O(n^{-1})$ it is clear that the estimates will be inaccurate for extreme value d.f.'s.
Assume that the underlying density $f$ is of the form
$f = \psi e^{h}$
where $h(x) \to 0$ as $x \to \omega(G)$. Equivalently, one may use the representation $f = \psi(1 + h)$ by writing $f = \psi e^{h} = \psi(1 + (e^{h} - 1))$.
Corollary 5.2.7. Assume that $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$ and $\psi$, $T$ are the corresponding auxiliary functions with $\psi = g/G$ and $T = G^{-1} \circ G_{2,1}$.
Assume that the density $f$ of the d.f. $F$ has the representation
$f(x) = \psi(x) e^{h(x)}, \quad T(x_0) < x < \omega(G),$ (5.2.11)
and $= 0$, if $x > \omega(G)$, where $x_0 < 0$ and $h$ satisfies the condition
$|h(x)| \le L x^{-\alpha\delta}$ if $i = 1$, $\quad |h(x)| \le L(-x)^{\alpha\delta}$ if $i = 2$, $\quad |h(x)| \le L e^{-\delta x}$ if $i = 3$, (5.2.12)
and $L$, $\delta$ are positive constants. Write
$F_n(x) = F(d_n + x c_n)$
where $c_n, d_n$ are the constants of Theorem 5.1.1. We have $d_n = 0$ if $i = 1, 2$, and $d_n = \log n$ if $i = 3$; moreover, $c_n = n^{1/\alpha}$ if $i = 1$, $c_n = n^{-1/\alpha}$ if $i = 2$, and $c_n = 1$ if $i = 3$.
Then, the following inequality holds:
$H(F_n^n, G) \le D n^{-\delta}$ if $0 < \delta \le 1$, and $H(F_n^n, G) \le D n^{-1}$ if $\delta > 1$, (5.2.13)
where $D$ is a constant which only depends on $x_0$, $L$, and $\delta$.
PROOF. W.l.g. we may assume that $G = G_{2,1}$. The other cases can easily be deduced by using the transformations $T = T_{i,\alpha}$.
Theorem 5.2.5 will be applied to $x_{0,n} = n x_0$. It is straightforward that the term $2G_{2,1}(nx_0) - G_{2,1}(nx_0)\log G_{2,1}(nx_0)$ can be neglected. Put $f_n(x) = f(x/n)/n$. Since $h$ is bounded on $(x_0, 0)$ we have
$\int_{nx_0}^{0} (nf_n/\psi_{2,1} - 1 - \log(nf_n/\psi_{2,1})) \, dG_{2,1} = \int_{nx_0}^{0} (e^{h(x/n)} - 1 - h(x/n)) \, dG_{2,1}(x)$
$\le \tilde{D} \int_{nx_0}^{0} (h(x/n))^2 \, dG_{2,1}(x) \le \tilde{D} L^2 n^{-2\delta} \int_{-\infty}^{0} |x|^{2\delta} \, dG_{2,1}(x)$
where $\tilde{D}$ only depends on $x_0$, $L$ and $\delta$. Now the assertion is immediate from Theorem 5.2.5. $\Box$
Extreme value d.f.'s have representations as given in (5.2.11) with $\delta = 1$ and $h(x) = -x^{-\alpha}$ if $i = 1$, $h(x) = -(-x)^{\alpha}$ if $i = 2$, and $h(x) = -e^{-x}$ if $i = 3$. Moreover, the special case of $h = 0$ concerns the generalized Pareto densities.
Remark 5.2.8. Corollary 5.2.7 can as well be formulated for densities having the representation
$f(x) = \psi(x)(1 + h(x)), \quad T(x_0) < x < \omega(G),$ (5.2.14)
and $= 0$, if $x > \omega(G)$, where $h$ satisfies the condition (5.2.12).
Maximum of Normal R.V.'s: Penultimate Distributions
Inequality (5.2.6) is also applicable to problems where approximate distributions which are different from the limiting ones are taken. The first example will show that Weibull distributions $G_{2,\alpha(n)}$ with $\alpha(n) \to \infty$ as $n \to \infty$ provide more accurate approximations to distributions of sample maxima of normal r.v.'s than the limiting distribution $G_3$.
The use of a "penultimate" distribution was already suggested by Tippett in 1925. For a numerical comparison of the "ultimate" and "penultimate" approximation we also refer to Fisher and Tippett (1928).
EXAMPLE 5.2.9. Let $F(x) = \Phi(b - b^{-1} + b^{-1}x)$ where $b$ is the solution of the equation
$n\varphi(b - b^{-1}) = b.$
Notice that $b$ and thus also $F$ depends on $n$; we have $b^{-2} = O(1/\log n)$.
Below we shall use the von Mises parametrization (see (1.3.17)) of Weibull d.f.'s, namely,
$H_{-b^{-2}}(x) = G_{2,b^2}(-1 + x/b^2).$
Applying Lemma 5.2.2 to $G = H_{-b^{-2}}$ we obtain after some straightforward but tedious calculations that
$H(F^n, H_{-b^{-2}}) = O((\log n)^{-2}).$ (5.2.15)
We indicate some details of the proof of (5.2.15). Check that
$nf(x)/\psi(x) = \exp(-x + x/b^2 - x^2/2b^2)(1 - x/b^2)^{1 - b^2}.$
To establish a sharp estimate of $\int (nf/\psi - 1 - \log(nf/\psi)) \, dG$ proceed in the following way: (a) Apply Lemma A.2.1 to the integrand evaluated over the interval $[-cb, cb]$ with $c$ being sufficiently small. (b) Use the crude inequalities $e^{\beta x/\alpha} \ge (1 + x/\alpha)^{\beta}$ for $x > -\alpha$ and $(1 + x/\alpha)^{\alpha} \ge 1 + x + (\alpha - 1)x^2/2\alpha$ for $x > 0$ and $\alpha \ge 2$ to obtain estimates of the integral over $(-\infty, -cb)$ and $(cb, b^2)$.
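As a numerical cross-check of the penultimate idea (a sketch of mine; the constants follow Example 5.2.9, but the sup-distance of d.f. values over a grid is used instead of the Hellinger distance), one can compare the Gumbel and the Weibull approximations of $\Phi^n(b - b^{-1} + b^{-1}x)$:

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def b_of(n):
    # Solve n * phi(b - 1/b) = b by bisection (the left side minus b is decreasing).
    lo, hi = 1.1, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if n * phi(mid - 1.0 / mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 100
b = b_of(n)
b2 = b * b
e_gum = e_pen = 0.0
for i in range(-200, 601):
    x = i / 100.0
    exact = Phi(b - 1.0 / b + x / b) ** n
    gumbel = math.exp(-math.exp(-x))
    # penultimate Weibull: H_{-1/b^2}(x) = G_{2,b^2}(-1 + x/b^2) = exp(-(1 - x/b^2)^(b^2))
    pen = math.exp(-(1.0 - x / b2) ** b2) if x < b2 else 1.0
    e_gum = max(e_gum, abs(exact - gumbel))
    e_pen = max(e_pen, abs(exact - pen))
print(e_gum, e_pen)  # the penultimate error should be noticeably smaller
```

Already at $n = 100$ the penultimate Weibull approximation beats the Gumbel approximation by roughly a factor of two in this metric.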
Maximum of Normal R.V.'s: Expansions of Length Two
From Lemma 5.2.10 it will become obvious that
$G_3(1 + h_n)$ (5.2.16)
provides an expansion of length two of $\Phi^n(b - b^{-1} + b^{-1}x)$.
However, since this expansion is not monotone increasing it is evident that (5.2.15) cannot be formulated with $H_{-b^{-2}}$ replaced by this expansion since the Hellinger distance is only defined for d.f.'s. One might overcome this problem
by extending the definition of the Hellinger distance to signed measures. Another possibility is to redefine the expansion in such a way that one obtains a probability measure; this was e.g. achieved in Example 5.2.9. To reformulate (5.2.15) we need the following lemma which concerns an expansion of length two of von Mises d.f.'s $H_\beta$.

Lemma 5.2.10. For every real $\beta$ denote by $\mu_\beta$ the signed measure which corresponds to the measure generating function of the expansion of length two. Let again $H_\beta$ denote the von Mises distribution with parameter $\beta$. Then,
$\sup_B |H_\beta(B) - \mu_\beta(B)| = O(\beta^2).$

PROOF. Apply Lemma A.2.1 and Lemma A.3.2. $\Box$
Thus as an analogue to (5.2.15) we get
$\sup_B |P\{b_n(X_{n:n} - (b_n - b_n^{-1})) \in B\} - \mu_{-b_n^{-2}}(B)| = O((\log n)^{-2})$ (5.2.17)
where $X_{n:n}$ is the maximum of $n$ i.i.d. standard normal r.v.'s, and $b_n$ is the solution of the equation $n\varphi(b - b^{-1}) = b$.
Figures 5.2.1-5.2.3 concern the density $f_n$ of $\Phi^n(b_n + a_n \cdot)$, with $b_n = \Phi^{-1}(1 - 1/n)$ and $a_n = 1/(n\varphi(b_n))$ (compare with P.5.8), the Gumbel density $g_3$ and the derivative $g_3(1 + h_n)$ of the expansion in (5.2.16).
Observe that $f_n$ and $g_3(1 + h_n)$ have modes larger than zero; moreover, $g_3(1 + h_n)$ provides a better approximation to $f_n$ than $g_3$.
Figure 5.2.1. Normalized density $f_n$ (dotted line) of maximum of normal r.v.'s, Gumbel density $g_3$, and expansion $g_3(1 + h_n)$ for $n = 40$.
In order to get a better insight into the approximation indicated by Figure 5.2.1, we also give illustrations concerning the error of the approximation.
Figure 5.2.2. $f_n - g_3$ and $f_n - g_3(1 + h_n)$ for $n = 40$.

Figure 5.2.3. $f_n - g_3$ and $f_n - g_3(1 + h_n)$ for $n = 400$.
We are well aware that some statisticians take the slow convergence rate of order $O(1/\log n)$ as an argument against the asymptotic theory of extremes, perhaps believing that a rate of order $O(n^{-1/2})$ ensures a much better accuracy of an approximation for small sample sizes. However, one may argue that from the historical and mathematical point of view it is always challenging to tackle this and related problems. Moreover, one should know that typical statistical problems in extreme value theory do not concern normal r.v.'s.
The illustrations above and further numerical computations show that the Gumbel approximation to the normalized d.f. and density of the maximum of normal r.v.'s is of a reasonable accuracy for small sample sizes. This may
serve as an example that the applicability of an approximation not only
depends on the rate of convergence but also on the constant involved in the
error bound.
If a more accurate approximation is needed then, instead of increasing the sample size, it is advisable to use an expansion of length two or a penultimate distribution. Comparing Figures 5.2.2 and 5.2.3 we see that the expansion of length two for $n = 40$ is of a higher accuracy than the Gumbel approximation for $n = 400$.
The limit theorem and the expansion give some insight into the asymptotic behavior of the sample maximum. Keep in mind that the d.f. $\Phi^n$ of the sample maximum itself may serve as an approximate d.f. in certain applications (see Reiss, 1978a).
Expansions of Length Two
Another example of an expansion of length two is obtained by treating a refinement of Corollary 5.2.7 and Remark 5.2.8. In Remark 5.2.8 we studied distributions of sample maxima under densities of the form $f = \psi(1 + h)$ where $h$ varies over a certain class of functions. Next, we consider densities of the form
$f = \psi(1 + p + h)$
with $p$ being fixed. Moreover, $\psi$ is given as in (5.2.10).
Below, an expansion of length two of distributions of sample maxima is established where the leading term of the expansion is an extreme value distribution $G$ and the second term depends on $G$ and $p$. Let
$p(x) = K x^{-\alpha p}$ if $i = 1$, $\quad p(x) = K(-x)^{\alpha p}$ if $i = 2$, $\quad p(x) = K e^{-px}$ if $i = 3$, (5.2.18)
for some fixed $K \ne 0$ and $p > 0$, and
$|h(x)| \le L x^{-\alpha\delta}$ if $i = 1$, $\quad |h(x)| \le L(-x)^{\alpha\delta}$ if $i = 2$, $\quad |h(x)| \le L e^{-\delta x}$ if $i = 3$, (5.2.19)
where $L > 0$ and $0 < p, \delta \le 1$. The expansion of length two is given by
$G_{p,n}(x) = G(x)\left[1 - n^{-p} \int_x^{\omega(G)} p(y)\psi(y) \, dy\right]$ (5.2.20)
for $\alpha(G) < x < \omega(G)$. This may be written
$G_{p,n}(x) = G(x)\left[1 - n^{-p} \frac{K}{1+p} \, u_i(x)\right], \quad u_1(x) = x^{-(1+p)\alpha}, \; u_2(x) = (-x)^{(1+p)\alpha}, \; u_3(x) = e^{-(1+p)x}.$ (5.2.21)
Notice that $f = \psi(1 + p + h)$ and $G_{p,n}$ arise from the special case with $i = 2$ and $\alpha = 1$ via the transformation $T_{i,\alpha} = G_{i,\alpha}^{-1} \circ G_{2,1}$.
It is easy to check that $G_{p,n}$ is a d.f. if $n$ is sufficiently large; more precisely, this holds if, and only if, the condition in (5.2.22) is satisfied.
Theorem 5.2.11. Let $G$, $\psi$ and $T$ be as in Corollary 5.2.7. Assume that the underlying density $f$ has the representation
$f(x) = \psi(x)(1 + p(x) + h(x)), \quad T(x_0) < x < \omega(G),$ (5.2.23)
and $= 0$, if $x > \omega(G)$, where $x_0 < 0$ and $p$, $h$ satisfy (5.2.18) and (5.2.19). Put
$F_n(x) = F(d_n + x c_n)$
where $c_n, d_n$ are the constants of Theorem 5.1.1. Then,
$H(F_n^n, G_{p,n}) = O(n^{-\min(\delta, 2p)}).$

PROOF. Apply Lemma 5.2.2. $\Box$
It was observed by Radtke (1988) (compare with P.5.16) that for a special case the expansion $G_{p,n}(x)$ can be replaced by $G(b_n + a_n x)$ where $G$ is the leading term of the expansion and $b_n \to 0$ and $a_n \to 1$ as $n \to \infty$. Notice that $G(b_n + a_n x)$ can be written, up to terms of higher order, as
$G(x)[1 + \psi(x)(b_n + (a_n - 1)x)]$
where again $\psi = G'/G$. One can easily check that such a representation holds in (5.2.21) if, and only if, $i = 1$ and $p = 1/\alpha$.
5.3. The Structure of Asymptotic Joint
Distributions of Extremes
Let us reconsider the stochastic model which was studied in Section 5.2. The sample maxima $M_{n,i} := \max(\xi_{n(i-1)+1}, \ldots, \xi_{ni})$ are the observed r.v.'s, and it is assumed that
(a) $M_{n,1}, \ldots, M_{n,N}$ are i.i.d. random variables,
(b) the (possibly, nonobservable) r.v.'s $\xi_{n(i-1)+1}, \ldots, \xi_{ni}$ are i.i.d. for every $i = 1, \ldots, N$.
The r.v.'s $\xi_{n(i-1)+1}, \ldots, \xi_{ni}$ may correspond to data which are collected within the $i$th period (as e.g. the amount of daily rainfall within a year). Then, the sample $M_{n,1}, \ldots, M_{n,N}$ of the annual maxima can be used to estimate the unknown distribution of the maximum daily rainfall within a year. Condition (a) seems to be justified in this example; however, the second condition is severely violated. It would be desirable to get some insight (within a mathematical model) into the influence of a deviation from condition (b); however, this problem is beyond the scope of this book. With the present state-of-the-art one can take some comfort from experience and from statements as e.g. made in Pickands (1975, page 120) that "the method has been shown to be very robust against dependence" of the r.v.'s $\xi_{n(i-1)+1}, \ldots, \xi_{ni}$.
It may happen that a certain amount of information is lost if the statistical inference is only based on maxima. Thus, a different method was proposed by Pickands (1975), namely, to consider the $k$ largest observations of the original data. This method is only applicable if these data can be observed. For the mathematical treatment of this problem it is assumed (by combining the conditions (a) and (b)) that $\xi_1, \ldots, \xi_{nN}$ are i.i.d. random variables. The statistical inference will be based on the $k$ largest order statistics $X_{nN-k+1:nN} \le \cdots \le X_{nN:nN}$ of $\xi_1, \ldots, \xi_{nN}$. In the sequel, the sample size will again be denoted by $n$ instead of $nN$.
In special cases, a comparison of the two different methods will be made
in Section 9.6. The information which is lost or gained by one or the
other method can be indicated by the relative efficiency between statistical
procedures which are constructed according to the respective methods.
One should keep in mind that such a comparison heavily depends on the
conditions stated above. For example one can argue that the dependence of
the rainfall on consecutive days has less influence on the stochastic properties
of the annual maxima compared to the influence on the k largest observations
within the whole period. Thus, the second method may be less robust against
the departure from the condition of independence.
The main purpose of this section is to introduce the asymptotic distributions of the $k$ largest order statistics. Moreover, it will be of great importance to find appropriate representations for these distributions. For the aims of this section it suffices to consider order statistics from generalized Pareto r.v.'s as introduced in (1.6.11). Notice again that the same symbol will be used for the d.f. and the pertaining probability measure.
Upper Extremes of Uniform R.V.'s
Let $V_{n-k+1:n}$ be the $k$th largest order statistic of $n$ i.i.d. random variables with common d.f. $W_{2,1}$ (the uniform distribution on $(-1, 0)$). In Section 5.1 it was proved that $nV_{n-k+1:n}$ is asymptotically equal (in distribution) to a "negative" gamma r.v.
$S_k = \xi_1 + \cdots + \xi_k$
where $\xi_1, \ldots, \xi_k$ are i.i.d. random variables with common "negative" exponential d.f. $F(x) = e^x$ for $x < 0$. An extension of the result for a single order statistic
to joint distributions of upper extremes can easily be established by utilizing
the following lemma.
Lemma 5.3.1. For every $k = 1, \ldots, n$ we have
$\sup_B |P\{(nV_{n:n}, nV_{n-1:n}, \ldots, nV_{n-k+1:n}) \in B\} - P\{(S_1, S_2, \ldots, S_k) \in B\}|$
$= \sup_B |P\{nV_{n-k+1:n} \in B\} - P\{S_k \in B\}|.$
It is obvious that "$\ge$" holds. At first sight the equality looks surprising; however, the miracle will have a simple explanation when the distributions are represented in an appropriate way.
From Corollary 1.6.11 it is immediate that
$\left(\frac{V_{n:n}}{V_{n-1:n}}, \ldots, \frac{V_{n-k+2:n}}{V_{n-k+1:n}}, V_{n-k+1:n}\right) \stackrel{d}{=} \left(\frac{S_1}{S_2}, \ldots, \frac{S_{k-1}}{S_k}, -\frac{S_k}{S_{n+1}}\right).$ (5.3.1)
Thus we easily get
$\sup_B |P\{(nV_{n:n}, nV_{n-1:n}, \ldots, nV_{n-k+1:n}) \in B\} - P\{(S_1, S_2, \ldots, S_k) \in B\}|$
$= \sup_B \left|P\left\{\left(\frac{S_1}{S_2}, \ldots, \frac{S_{k-1}}{S_k}, \frac{S_k}{-S_{n+1}/n}\right) \in B\right\} - P\left\{\left(\frac{S_1}{S_2}, \ldots, \frac{S_{k-1}}{S_k}, S_k\right) \in B\right\}\right| =: A.$
Notice that the first $k - 1$ components in the random vectors above are equal. Moreover, it is straightforward to verify that the components in each vector are independent since according to Corollary 1.6.11(iii) the r.v.'s $S_1/S_2, \ldots, S_n/S_{n+1}, S_{n+1}$ are independent. An application of inequality (3.3.4) (which concerns an upper bound for the variational distance of product measures via the variational distances of the single components) yields
$A \le \sup_B \left|P\left\{\frac{S_k}{-S_{n+1}/n} \in B\right\} - P\{S_k \in B\}\right| = \sup_B |P\{nV_{n-k+1:n} \in B\} - P\{S_k \in B\}|.$
Thus, Lemma 5.3.1 is proved. $\Box$
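The representation used in the proof can be probed by simulation (a sketch of mine; it assumes the sign convention $nV_{n-k+1:n} \stackrel{d}{=} S_k/(-S_{n+1}/n)$ with the $S_j$ being partial sums of "negative" exponentials):

```python
import random

rng = random.Random(3)
n, k, N, x = 100, 3, 20_000, -2.0

c_os = c_rep = 0
for _ in range(N):
    v = sorted(rng.random() - 1.0 for _ in range(n))  # n uniforms on (-1, 0)
    if n * v[n - k] <= x:                             # n V_{n-k+1:n}, k-th largest scaled
        c_os += 1
    e = [rng.expovariate(1.0) for _ in range(n + 1)]
    S_k, S_np1 = -sum(e[:k]), -sum(e)                 # partial sums of "negative" exponentials
    if S_k / (-S_np1 / n) <= x:
        c_rep += 1
diff = abs(c_os - c_rep) / N
print(diff)  # both counts estimate the same probability, so this is Monte Carlo noise
```

Both empirical frequencies estimate the same probability, so their difference is pure Monte Carlo noise, consistent with the distributional identity.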
Combining Lemma 5.1.5 and Lemma 5.3.1 we get

Lemma 5.3.2. For every fixed $k \ge 1$, as $n \to \infty$,
$\sup_B |P\{(nV_{n:n}, nV_{n-1:n}, \ldots, nV_{n-k+1:n}) \in B\} - P\{(S_1, S_2, \ldots, S_k) \in B\}| \to 0.$

The limiting distribution in Lemma 5.3.2 will be denoted by $G_{2,1,k}$. It is apparent that $G_{2,1,j}$, the limiting distribution of the $j$th largest order statistic, is the $j$th marginal distribution of $G_{2,1,k}$. From Lemma 1.6.6(iii) we know that the density, say, $g_{2,1,k}$ of $G_{2,1,k}$ is given by
$g_{2,1,k}(x) = \exp(x_k), \quad x_k < \cdots < x_1 < 0,$ (5.3.2)
and $= 0$, otherwise.
Upper Extremes of Generalized Pareto R.V.'s
The extension of Lemma 5.3.2 to other generalized Pareto d.f.'s $W_{i,\alpha}$ is straightforward.
Let again $T_{i,\alpha}$ denote the transformation in (1.6.10). We have $T_{1,\alpha}(x) = (-x)^{-1/\alpha}$, $T_{2,\alpha}(x) = -(-x)^{1/\alpha}$, and $T_3(x) = -\log(-x)$ for $x < 0$.
Denote by $G_{i,\alpha,k}$ the distribution of the random vector
$(T_{i,\alpha}(S_1), \ldots, T_{i,\alpha}(S_k)).$ (5.3.3)
The transformation theorem for densities (see (1.4.4)) enables us to compute the density, say, $g_{i,\alpha,k}$ of $G_{i,\alpha,k}$. We have
$g_{1,\alpha,k}(x) = \alpha^k \exp(-x_k^{-\alpha}) \prod_{j=1}^{k} x_j^{-(\alpha+1)},$
$g_{2,\alpha,k}(x) = \alpha^k \exp(-(-x_k)^{\alpha}) \prod_{j=1}^{k} (-x_j)^{\alpha-1},$ (5.3.4)
$g_{3,k}(x) = \exp(-e^{-x_k}) \exp\Big(-\sum_{j=1}^{k} x_j\Big),$
and the densities are zero, otherwise.
Notice that the following representation of the density $g_{i,\alpha,k}$ holds:
$g_{i,\alpha,k}(x) = G_{i,\alpha}(x_k) \prod_{j=1}^{k} \psi_{i,\alpha}(x_j) = g_{i,\alpha}(x_k) \prod_{j=1}^{k-1} \psi_{i,\alpha}(x_j).$ (5.3.5)
Corollary 5.3.3. Let $X_{r:n}$ be the $r$th order statistic of $n$ i.i.d. random variables with common generalized Pareto d.f. $W_{i,\alpha}$. Then,
$\sup_B |P\{(c_n^{-1}(X_{n-j+1:n} - d_n))_{j=1}^{k} \in B\} - G_{i,\alpha,k}(B)| \to 0, \quad n \to \infty,$
where $c_n$ and $d_n$ are the constants of Theorem 5.1.1.

PROOF. Straightforward from Lemma 5.3.2, the definition of $G_{i,\alpha,k}$ and the fact that $T_{i,\alpha}(nV_{r:n}) = c_n^{-1}(X_{r:n} - d_n)$. $\Box$
Domains of Attraction
This section concludes with a characterization of the domains of attraction of joint distributions of a fixed number of upper extremes by means of the corresponding result for sample maxima.
First, we refer to the well-known result (see e.g. Galambos (1987), Theorem 2.8.2) that a d.f. belongs to the weak domain of attraction of an extreme value d.f. $G_{i,\alpha}$ if, and only if, the corresponding result holds for the $k$th largest order statistic with $G_{i,\alpha,k}$ as the limiting d.f.
Our interest is focused on the convergence w.r.t. the variational distance.
Theorem 5.3.4. Let $F$ be a d.f. with density $f$. Then, the following two statements are equivalent:
(i) $F$ belongs to the strong domain of attraction of an extreme value distribution $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$.
(ii) There exist constants $a_n > 0$ and $b_n$ such that for every positive integer $k$ there is a nondegenerate distribution $G^{(k)}$ such that
$\sup_B |P\{(a_n^{-1}(X_{n-j+1:n} - b_n))_{j=1}^{k} \in B\} - G^{(k)}(B)| \to 0, \quad n \to \infty.$
In addition, if (i) holds for $G = G_{i,\alpha}$ then (ii) is valid for $G^{(k)} = G_{i,\alpha,k}$.
PROOF. (ii) $\Rightarrow$ (i): Obvious.
(i) $\Rightarrow$ (ii): Let $a_n > 0$ and $b_n$ be such that for every $x$,
$F^n(b_n + x a_n) \to G(x), \quad n \to \infty,$ (1)
where $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$. According to Lemma 5.1.3, (i) is equivalent to the condition that for every subsequence $i(n)$ there exists a subsequence $m(n) := i(j(n))$ such that
$m(n) a_{m(n)} f(b_{m(n)} + x a_{m(n)}) \to \psi(x), \quad n \to \infty,$
for Lebesgue-almost all $x \in (\alpha(G), \omega(G))$ where again $\psi = G'/G$. Thus, also
$\prod_{j=1}^{k} m(n) a_{m(n)} f(b_{m(n)} + x_j a_{m(n)}) \to \prod_{j=1}^{k} \psi(x_j), \quad n \to \infty,$ (2)
for Lebesgue-almost all $x = (x_1, \ldots, x_k) \in (\alpha(G), \omega(G))^k$. Furthermore, deduce with the help of (1.4.4) that the density of $(a_n^{-1}(X_{n-j+1:n} - b_n))_{j=1}^{k}$, say, $f_{n,k}$ is given by
$f_{n,k}(x) = F^{n-k}(b_n + x_k a_n) \prod_{j=1}^{k} [(n - j + 1) a_n f(b_n + x_j a_n)],$ (3)
and $= 0$, otherwise. Combining (1)-(3) with (5.3.5) we obtain for $G = G_{i,\alpha}$ that
$f_{n,k}(x) \to g_{i,\alpha,k}(x), \quad n \to \infty,$
for Lebesgue-almost all $x$ with $\alpha(G) < x_k < \cdots < x_1 < \omega(G)$. Thus the Scheffé Lemma 3.3.2 implies (ii) with $G^{(k)} = G_{i,\alpha,k}$. $\Box$
5.4. Expansions of Distributions of Extremes
of Generalized Pareto Random Variables
In this section we establish higher order approximations to the distribution of upper extremes of generalized Pareto r.v.'s. First, we prove an expansion of the distribution of the $k$th largest order statistic of uniform r.v.'s. The leading term of the expansion is a "negative" gamma distribution $G_{2,1,k}$. By using the transformation technique the result is extended to generalized Pareto r.v.'s. Finally, the results of Section 5.3 enable us to examine joint distributions of upper extremes.
Let $V_{n-k+1:n}$ again be the $k$th largest order statistic of $n$ i.i.d. $(-1, 0)$-uniformly distributed r.v.'s. From (5.1.35) we already know that
$\sup_B |P\{nV_{n-k+1:n} \in B\} - G_{2,1,k}(B)| \to 0, \quad n \to \infty.$
We shall prove that the remainder term is bounded by $Ck/n$ where $C$ is a universal constant. The expansion of length two will show that this bound is sharp. The extension from $W_{2,1}$ to a generalized Pareto d.f. $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3 : \alpha > 0\}$ is straightforward. We have
$\sup_B |P\{c_n^{-1}(X_{n-k+1:n} - d_n) \in B\} - G_{i,\alpha,k}(B)| \le Ck/n$ (5.4.1)
where $c_n$ and $d_n$ are the usual normalizing constants.
In Section 5.5 we shall see that if the generalized Pareto d.f. $W$ is replaced by an extreme value d.f. $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$ then the bound in (5.4.1) is of order $O(k^{3/2}/n)$.
Moreover, as it will be indicated at the end of this section, $F$ has the tail of a generalized Pareto d.f. if an inequality of the form (5.4.1) holds. Therefore, in a certain sense, the generalized Pareto d.f.'s occupy the place of the max-stable extreme value d.f.'s as far as joint distributions of extremes are concerned.
Extremes of Uniform R.V.'s
Let us begin with a simple result concerning central moments of the gamma distribution $G_{2,1,k}$.

Lemma 5.4.1. The $i$th central moment
$u(i, k) = \int (x + k)^i \, dG_{2,1,k}(x)$
of $G_{2,1,k}$ fulfills the recurrence relation
$u(i + 2, k) = (i + 1)[k u(i, k) - u(i + 1, k)].$ (5.4.2)
Moreover,
$\int |x + k|^i \, dG_{2,1,k}(x) \le i! \, k^{i/2}.$ (5.4.3)
As special cases we note $u(1, k) = 0$, $u(2, k) = k$, $u(3, k) = -2k$, $u(4, k) = 3k^2 + 6k$.
PROOF. Recall that the density of G_{2,1,k} is given by g_{2,1,k}(x) = e^x (-x)^{k-1}/(k-1)!, x < 0. By partial integration we get

-∫ (i+1)(x+k)^i x dG_{2,1,k}(x) = ∫ (x+k)^{i+1} x dG_{2,1,k}(x) + k u(i+1, k).

Now, (5.4.2) is straightforward since

u(i+2, k) = ∫ (x+k)^{i+1} x dG_{2,1,k}(x) + k u(i+1, k) = -∫ (i+1)(x+k)^i x dG_{2,1,k}(x) = (i+1)[k u(i, k) - u(i+1, k)].

Moreover, because of (i+1)[(i+1)! + i!] = (i+2)! we obtain by induction over i that |u(i, k)| ≤ i! k^{i/2}/2. This implies (5.4.3) for every even i. Finally, the Schwarz inequality yields

∫ |x + k|^{2i+1} dG_{2,1,k}(x) ≤ (2i+1)! k^{(2i+1)/2}.

The proof is complete. □
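Because u(i, k) = E(k - S)^i for S ~ Gamma(k, 1), the recurrence (5.4.2) and the special cases can be checked in exact integer arithmetic; a short sketch with our own helper names:

```python
from math import comb

def raw_moment(k, m):
    """E S^m for S ~ Gamma(k,1): the rising factorial k(k+1)...(k+m-1)."""
    out = 1
    for j in range(m):
        out *= k + j
    return out

def u(i, k):
    """i-th central moment of G_{2,1,k}: E[(X+k)^i] for X = -S."""
    return sum(comb(i, m) * (-1)**m * raw_moment(k, m) * k**(i - m)
               for m in range(i + 1))
```

Since everything stays in integers, the identities hold exactly, not merely up to rounding.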
A preliminary higher order approximation is obtained in Lemma 5.4.2.

Lemma 5.4.2. For every positive integer m there exists a constant C_m > 0 such that for n and k ∈ {1, ..., n} with k/n sufficiently small (so that the denominators below are bounded away from zero) the following inequality holds:

sup_B |P{n U_{n-k+1:n} ∈ B} - [G_{2,1,k}(B) + Σ_{i=2}^{2(m-1)} β(i, n-k) ∫_B (x+k)^i dG_{2,1,k}(x)] / [1 + Σ_{i=2}^{2(m-1)} β(i, n-k) u(i, k)]| ≤ C_m (k/n)^m.

Here β(i, n) denotes the coefficient of y^i in the expansion (1 + y/n)^n = e^y (1 + Σ_{i≥2} β(i, n) y^i), |y| < n, and u(i, k) is the ith central moment of G_{2,1,k}.

As special cases we note β(2, n) = -1/(2n), β(3, n) = 1/(3n²), β(4, n) = 1/(8n²) - 1/(4n³). Moreover, |β(2i-1, n)|, |β(2i, n)| ≤ C_m n^{-i}, i = 1, ..., m-1.
PROOF. Put

g_n(x) = e^{-k} (1 + (x+k)/(n-k))^{n-k} (-x)^{k-1}/(k-1)! 1_{(-n,0)}(x).

From Theorem 1.3.2 we conclude that g_n/∫ g_n(x) dx is the density of n U_{n-k+1:n}. Moreover, we write

f_n(x) = [1 + Σ_{i=2}^{2(m-1)} β(i, n-k)(x+k)^i] g_{2,1,k}(x).

Lemma A.2.1 yields

|g_n(x) - f_n(x)| ≤ C(n-k)^{-m} [|x+k|^{2m-1} + (x+k)^{2m}] g_{2,1,k}(x)   (1)

for every x ∈ A_n := {x < 0: |x+k| ≤ (n-k)^{1/2}} where, throughout, C will be used as a generic constant that only depends on m.
From (5.4.3) and from the upper bound of β(i, n-k) as given in Lemma A.2.1 we conclude that ∫ f_n(x) dx ≥ 1/2 if k/n is sufficiently small. Thus, by (1), Lemma A.3.2, and (5.4.3) we obtain

sup_B |P{n U_{n-k+1:n} ∈ B} - ∫_B f_n(x) dx / ∫ f_n(x) dx| ≤ C ∫ |g_n(x) - f_n(x)| dx
≤ C [∫_{A_n} |g_n(x) - f_n(x)| dx + ∫_{A_n^c} |g_n(x) - f_n(x)| dx]
≤ C(k/n)^m + ∫_{A_n^c} |g_n(x) - f_n(x)| dx.   (2)

Moreover, because of (1 + x/n)^n ≤ exp(x) we have g_n ≤ g_{2,1,k}. Thus, the Schwarz inequality yields

∫_{A_n^c} |g_n(x) - f_n(x)| dx ≤ 2G_{2,1,k}(A_n^c) + Σ_{i=2}^{2(m-1)} |β(i, n-k)| ∫_{A_n^c} |x+k|^i dG_{2,1,k}(x) ≤ C(k/n)^m.

Combining this and (2) the proof is completed. □
The following theorem is an immediate consequence of Lemma 5.4.2 and Lemma 3.2.5. Moreover, we remark that the polynomials P_{j,k,n} can easily be constructed by means of formula (3.2.9).

Theorem 5.4.3. For every positive integer m there exists a constant C_m > 0 such that for every n and k ∈ {1, ..., n} the following inequality holds:

sup_B |P{n U_{n-k+1:n} ∈ B} - [G_{2,1,k}(B) + Σ_{j=1}^{m-1} ∫_B P_{j,k,n} dG_{2,1,k}]| ≤ C_m (k/n)^m

where P_{j,k,n} are polynomials of degree 2j.
We note the explicit form of P_{1,k,n} and P_{2,k,n}. We have

P_{1,k,n}(x) = -[(x+k)² - k] / (2(n-k))   (5.4.4)

and

P_{2,k,n}(x) = β(4, n-k)[(x+k)⁴ - u(4, k)] + β(3, n-k)[(x+k)³ - u(3, k)] - β(2, n-k)² u(2, k)[(x+k)² - u(2, k)].
Lemma 5.4.2 as well as Theorem 5.4.3, applied to m = 1, yield (5.4.1) in the particular case of W = W_{2,1}.
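The expansion of length 2 can be checked numerically: the exact d.f. of n U_{n-k+1:n} is a binomial tail, and the correction term ∫ P_{1,k,n} dG_{2,1,k} is a one-dimensional integral. A sketch (our names; simple trapezoidal quadrature):

```python
import math

def binom_cdf(n, k, t):
    """Exact P{n * U_{n-k+1:n} <= t}: a Bin(n, -t/n) tail."""
    p = -t / n
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k))

def g_density(k, x):
    return math.exp(x) * (-x)**(k - 1) / math.factorial(k - 1)

def gamma_cdf(k, t):
    return math.exp(t) * sum((-t)**j / math.factorial(j) for j in range(k))

def corrected_cdf(n, k, t, lo=-60.0, steps=4000):
    """Length-2 expansion: G_{2,1,k}(t) + int_{-inf}^t P_{1,k,n} dG_{2,1,k}
    with P_{1,k,n}(x) = -((x+k)**2 - k)/(2*(n-k))."""
    h = (t - lo) / steps
    vals = [-((lo + j * h + k)**2 - k) / (2 * (n - k)) * g_density(k, lo + j * h)
            for j in range(steps + 1)]
    return gamma_cdf(k, t) + h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
```

For k = 1 the correction can even be computed in closed form: it equals -e^t t²/(2(n-1)), matching the Taylor expansion of (1 + t/n)^n.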
Extremes of Generalized Pareto R.V.'s
The extension of the results above to the kth largest order statistic X_{n-k+1:n} under a generalized Pareto d.f. W ∈ {W_{1,α}, W_{2,α}, W_3 : α > 0} is immediate. By using the transformation technique we easily obtain (5.4.1) and the following expansion

sup_B |P{c_n^{-1}(X_{n-k+1:n} - d_n) ∈ B} - [G_{i,α,k}(B) + Σ_{j=1}^{m-1} ∫_B P_{j,k,n}(log G_{i,α}) dG_{i,α,k}]| ≤ C_m (k/n)^m   (5.4.5)

where c_n and d_n are the constants of Theorem 5.1.1 and P_{j,k,n} are the polynomials of Theorem 5.4.3.
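For the Pareto d.f. W_{1,α} the marginal version of (5.4.1) for the kth largest observation can again be evaluated exactly through a binomial tail, which makes the Ck/n rate easy to inspect. A sketch under our own function names, with c_n = n^{1/α} and d_n = 0:

```python
import math

def pareto_kth_cdf(n, k, alpha, x):
    """P{n**(-1/alpha) * X_{n-k+1:n} <= x} under W_{1,alpha}(t) = 1 - t**(-alpha),
    t >= 1: at most k-1 exceedances of n**(1/alpha)*x, a Bin(n, x**(-alpha)/n)
    tail (valid once n**(1/alpha)*x >= 1)."""
    p = x**(-alpha) / n
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k))

def limit_cdf(k, alpha, x):
    """Marginal limit for the kth largest: a Poisson(x**(-alpha)) tail."""
    lam = x**(-alpha)
    return math.exp(-lam) * sum(lam**j / math.factorial(j) for j in range(k))

def sup_error(n, k, alpha):
    xs = [0.7 + 0.05 * i for i in range(90)]
    return max(abs(pareto_kth_cdf(n, k, alpha, x) - limit_cdf(k, alpha, x))
               for x in xs)
```

The transformation technique is visible here: the binomial parameter x^{-α}/n is exactly the uniform-case parameter -t/n after the substitution t = -x^{-α}.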
Next, we prove the corresponding result for joint distributions of upper
extremes.
Theorem 5.4.4. Let X_{n:n}, ..., X_{n-k+1:n} be the k largest order statistics under the generalized Pareto d.f. W ∈ {W_{1,α}, W_{2,α}, W_3 : α > 0}. Let c_n, d_n, C_m, and P_{j,k,n} be as above. Then,

sup_B |P{(c_n^{-1}(X_{n:n} - d_n), ..., c_n^{-1}(X_{n-k+1:n} - d_n)) ∈ B} - [G_{i,α,k}(B) + Σ_{j=1}^{m-1} ∫_B P_{j,k,n}(log G_{i,α}(x_k)) dG_{i,α,k}(x)]| ≤ C_m (k/n)^m.   (5.4.6)
PROOF. It suffices to prove the assertion in the special case of i = 2 and α = 1. The general case can easily be deduced by means of the transformation technique. Thus, we have to prove that

sup_B |P{(n U_{n:n}, ..., n U_{n-k+1:n}) ∈ B} - [G_{2,1,k}(B) + Σ_{j=1}^{m-1} ∫_B P_{j,k,n}(x_k) dG_{2,1,k}(x)]| ≤ C_m (k/n)^m.   (5.4.7)

If m = 1 then the proof of Lemma 5.3.2 carries over if Lemma 5.1.5 is replaced by Theorem 5.4.3.
If m > 1 then one has to deal with signed measures; however, the method of the proof of Lemma 5.3.2 is still applicable. Notice that the approximating signed measure in (5.4.7) has the density

(1 + Σ_{j=1}^{m-1} P_{j,k,n}(x_k)) g_{2,1,k}(x).

By inducing with x ↦ (x_1 - x_2, ..., x_{k-1} - x_k, x_k) one obtains a product measure where the kth component has the density

(1 + Σ_{j=1}^{m-1} P_{j,k,n}) g_{2,1,k}.

Now inequality (A.3.3), which holds for signed measures, and Theorem 5.4.3 imply the assertion. □
Next, Theorem 5.4.4 will be stated once more in the particular case of m = 1. In an earlier version of this book we conjectured that a d.f. F has the tail of a generalized Pareto d.f. if an inequality of the form (5.4.1) (formulated for d.f.'s) holds. This was confirmed in Falk (1989a).
Theorem 5.4.5. (i) If X_{n:n}, ..., X_{n-k+1:n} are the k largest order statistics under a generalized Pareto d.f. W ∈ {W_{1,α}, W_{2,α}, W_3 : α > 0} then there exists a constant C > 0 such that for every k ∈ {1, ..., n},

sup_B |P{(c_n^{-1}(X_{n:n} - d_n), ..., c_n^{-1}(X_{n-k+1:n} - d_n)) ∈ B} - G_{i,α,k}(B)| ≤ Ck/n   (5.4.8)

with c_n and d_n as in Theorem 5.1.1.
(ii) Let F be a d.f. which is strictly increasing and continuous on a left neighborhood of ω(F). If (5.4.8) holds with W, c_n, and d_n replaced by F and any normalizing constants a_n > 0 and b_n then there exist c > 0 and d such that

F((x - d)/c) = W_{i,α}(x)

for x in a neighborhood of ω(W_{i,α}).

For a slightly stronger formulation of (ii) and for the proof we refer to Falk (1989a).
5.5. Variational Distance between Exact and
Approximate Joint Distributions of Extremes
In this section we prove a version of Theorem 5.2.5 valid for the joint distribution of the upper extremes. In view of our applications, and to avoid technical complications, the results will be proved w.r.t. the variational distance.
The Main Results
In a preparatory step we prove the following technical lemma. Notice that the upper bound in (5.5.1) still depends on the underlying distribution through the d.f. F. The main purpose of the subsequent considerations will be to cancel the d.f. F in the upper bound to facilitate further computations. We remark that the results below are useful modifications of results of Falk (1986a).
Lemma 5.5.1. Given G^{(k)} ∈ {G_{1,α,k}, G_{2,α,k}, G_{3,k} : α > 0} let G denote the first marginal d.f. Let X_{n:n} ≥ ... ≥ X_{n-k+1:n} be the k largest order statistics of n i.i.d. random variables with d.f. F and density f. Define again ψ = g/G on the support of G where g is the density of G. Moreover, fix x_0 ≥ -∞. Then,

sup_B |P{(X_{n:n}, ..., X_{n-k+1:n}) ∈ B} - G^{(k)}(B)|
≤ 2G^{(k)}(M^c) + [∫_M (n(1 - F(x_k)) + log G(x_k) - Σ_{j=1}^{k} log((nf/ψ)(x_j))) dG^{(k)}(x)]^{1/2} + Ck/n   (5.5.1)

where M = {x: x_j > x_0, f(x_j) > 0, j = 1, ..., k} and C is a universal constant.
PROOF. The quantile transformation and inequality (5.4.9) yield

P{(X_{n:n}, ..., X_{n-k+1:n}) ∈ B} = P{[F^{-1}(1 + (n U_{n-j+1:n})/n)]_{j=1}^{k} ∈ B} = μ_n(B) + O(k/n)   (1)

uniformly over n, k, and Borel sets B where the measure μ_n is defined by

μ_n(B) = G_{2,1,k}{x: -n < x_k < ... < x_1, [F^{-1}(1 + x_j/n)]_{j=1}^{k} ∈ B}.

In analogy to the proof of Theorem 1.4.5, part III (see also Remark 1.5.3) deduce that μ_n has the density h_n defined by

h_n(x) = exp[-n(1 - F(x_k))] Π_{j=1}^{k} (nf(x_j))

for x_k < ... < x_1, and = 0, otherwise.
In (1), the measure μ_n can be replaced by the probability measure Q_n = μ_n/b_n where

b_n = 1 - exp(-n) Σ_{j=0}^{k-1} n^j/j! = 1 + O(k/n).

Denote by g^{(k)} the density of G^{(k)}. Recall that g^{(k)}(x) = G(x_k) Π_{j=1}^{k} ψ(x_j) for α(G) < x_k < ... < x_1 < ω(G). Now, Lemma A.3.5, applied to Q_n and G^{(k)}, implies the asserted inequality (5.5.1). □
Next we formulate a simple version of Theorem 5.5.4 as an analogue to Corollary 5.2.3. The proof can be left to the reader.

Corollary 5.5.2. Denote by G_j the jth marginal d.f. of G^{(k)} ∈ {G_{1,α,k}, G_{2,α,k}, G_{3,k} : α > 0}, and write G = G_1. If, in addition to the conditions of Lemma 5.5.1, G{f > 0} = 1 and ω(F) = 0 for i = 2, then

sup_B |P{(X_{n:n}, ..., X_{n-k+1:n}) ∈ B} - G^{(k)}(B)|
≤ [Σ_{j=1}^{k} ∫ (nf/ψ - 1 - log(nf/ψ)) dG_j]^{1/2} + Ck/n
≤ [Σ_{j=1}^{k} ∫ ((nf/ψ - 1)²/(nf/ψ)) dG_j]^{1/2} + Ck/n.
As a consequence of Corollary 5.5.2 one gets the following example taken from Falk (1986a).

EXAMPLE 5.5.3. Let φ denote the standard normal density. Define b_n by the equation b_n = nφ(b_n). Let X_{n:n} ≥ ... ≥ X_{n-k+1:n} be the k largest order statistics of n i.i.d. standard normal r.v.'s. Then,

sup_B |P{[b_n(X_{n-j+1:n} - b_n)]_{j=1}^{k} ∈ B} - G_{3,k}(B)| ≤ C k^{1/2} (log(k+1))² / log n.
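The defining equation for b_n (as reconstructed here, b = nφ(b)) can be solved by a fixed-point iteration. As a crude plausibility check — for the maximum only (k = 1), and for the d.f. distance rather than the variational distance — we compare Φⁿ with the Gumbel d.f.; all names below are ours:

```python
import math

def solve_bn(n, iters=100):
    """Solve b = n*phi(b), phi the standard normal density, by iteration
    of the equivalent equation b**2 = 2*log(n) - log(2*pi) - 2*log(b)."""
    b = math.sqrt(2 * math.log(n))
    for _ in range(iters):
        b = math.sqrt(2 * math.log(n) - math.log(2 * math.pi) - 2 * math.log(b))
    return b

def max_cdf(n, b, x):
    """P{b*(X_{n:n} - b) <= x} = Phi(b + x/b)**n, via erfc for accuracy."""
    z = b + x / b
    return math.exp(n * math.log1p(-0.5 * math.erfc(z / math.sqrt(2))))

def gumbel_err(n):
    b = solve_bn(n)
    xs = [-2 + 0.05 * i for i in range(121)]
    return max(abs(max_cdf(n, b, x) - math.exp(-math.exp(-x))) for x in xs)
```

The error decays only logarithmically in n, which is the slow normal-extremes rate visible in the log n denominator of the bound above.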
The following theorem can be regarded as the main result of this section. Notice that the integrals in the upper bound have only to be computed on (x_0, ω(F)). Moreover, the condition G{f > 0} = 1 as used in Corollary 5.5.2 is omitted.

Theorem 5.5.4. Denote by G_j the jth marginal d.f. of G^{(k)} ∈ {G_{1,α,k}, G_{2,α,k}, G_{3,k} : α > 0}, and put G = G_1. Let F be a d.f. with density f such that f(x) > 0 for x_0 < x < ω(F). Assume that ω(F) = ω(G). Define again ψ = g/G on the support of G where g is the density of G. Then,

sup_B |P{(X_{n:n}, ..., X_{n-k+1:n}) ∈ B} - G^{(k)}(B)|
≤ [Σ_{j=1}^{k} ∫_{x_0}^{ω(G)} (nf/ψ - 1 - log(nf/ψ)) dG_j + (1 + (k-1)(-log G(x_0))) G_k(x_0) + k G_{k+1}(x_0)]^{1/2} + Ck/n.   (5.5.2)
PROOF. To prove (5.5.2) one has to establish an upper bound of the right-hand side of (5.5.1).
Note that under the present conditions

M = {x: x_j > x_0, f(x_j) > 0, j = 1, ..., k} = {x: x_0 < x_j < ω(G), j = 1, ..., k}.

Moreover, recall that x_1 ≥ ... ≥ x_k for every x in the support of G^{(k)}. Obviously,

G^{(k)}(M^c) = G_k(x_0).   (1)

Denote by g_k the density of G_k. Recall that G_k = G Σ_{j=0}^{k-1} (-log G)^j/j! and g_k = g(-log G)^{k-1}/(k-1)!. In analogy to inequality (2) in the proof of Corollary 5.5.2 we obtain

∫_M [1 - F(x_k)] dG^{(k)}(x) = ∫_{x_0}^{ω(G)} (1 - F) dG_k ≤ ∫_{x_0}^{ω(G)} f(y) G_k(y) dy = ∫_{x_0}^{ω(G)} (f/ψ) [Σ_{j=0}^{k-1} (-log G)^j/j!] dG = Σ_{j=1}^{k} ∫_{x_0}^{ω(G)} (f/ψ) dG_j.   (2)

Moreover,

∫_M (-log G(x_k)) dG^{(k)}(x) = ∫_{x_0}^{ω(G)} (-log G(x)) dG_k(x) = k ∫_{x_0}^{ω(G)} g(x)(-log G(x))^k/k! dx = k(1 - G_{k+1}(x_0)).   (3)

Now the proof can easily be completed by combining (5.5.1) with (1)-(3). □
Notice that Theorem 5.2.5 is a special case of Theorem 5.5.4.
Special Classes of Densities
Finally, Theorem 5.5.4 will be applied to the particular densities dealt with in Corollary 5.2.7.
Corollary 5.5.5. Assume that G ∈ {G_{1,α}, G_{2,α}, G_3 : α > 0} and ψ, T are the corresponding auxiliary functions with ψ = g/G and T = G^{-1} ∘ G_{2,1}. Assume that the density f of the d.f. F has the representation

f(x) = ψ(x) e^{h(x)},   T(x_0) < x < ω(G),   (5.5.3)

and = 0 if x > ω(G), where x_0 < 0 and h satisfies the condition

|h(x)| ≤ L x^{-αδ}   if i = 1,
|h(x)| ≤ L (-x)^{αδ}   if i = 2,   (5.5.4)
|h(x)| ≤ L e^{-δx}   if i = 3,

where L and δ are positive constants. Then,

sup_B |P{[c_n^{-1}(X_{n-j+1:n} - d_n)]_{j=1}^{k} ∈ B} - G_{i,α,k}(B)| ≤ D[(k/n)^δ k^{1/2} + k/n]

where c_n, d_n are the constants of Theorem 5.1.1 and D > 0 is a constant which only depends on x_0, δ, and L.
We have d_n = 0 if i = 1, 2, and d_n = log n if i = 3; moreover, c_n = n^{1/α} if i = 1, c_n = n^{-1/α} if i = 2, and c_n = 1 if i = 3.
PROOF. Again it suffices to prove the result for the particular case G = G_{2,1}. Theorem 5.5.4 will be applied to x_{0,n} = nx_0 and f_n(x) = f(x/n)/n. We obtain

sup_B |P{(nX_{n:n}, ..., nX_{n-k+1:n}) ∈ B} - G_{2,1,k}(B)|
≤ [Σ_{j=1}^{k} ∫_{nx_0}^{0} [e^{h(x/n)} - 1 - h(x/n)] dG_j(x) + (1 + (k-1)(-nx_0)) G_k(nx_0) + k G_{k+1}(nx_0)]^{1/2} + Ck/n.   (1)

Check that G_k(x) = O((k/|x|)^m) uniformly in k and x < 0 for every positive integer m. Moreover, since h is bounded on (x_0, 0) we have

Σ_{j=1}^{k} ∫_{nx_0}^{0} [e^{h(x/n)} - 1 - h(x/n)] dG_j(x) ≤ D n^{-2δ} Σ_{j=1}^{k} ∫_{-∞}^{0} |x|^{2δ+j-1} exp(x)/(j-1)! dx = D n^{-2δ} Σ_{j=1}^{k} Γ(2δ+j)/Γ(j)   (2)

where Γ(t) = ∫_0^∞ x^{t-1} exp(-x) dx denotes the gamma function.
Finally, observe that (compare with Erdélyi et al. (1953), formula (5), page 47)

Σ_{j=1}^{k} Γ(2δ+j)/Γ(j) ≤ D Σ_{j=1}^{k} j^{2δ}.   (3)

Now by choosing m ≥ 2δ the asserted inequality is immediate from (1)-(3). □
EXAMPLE 5.5.6. If f ∈ {g_{1,α}, g_{2,α}, g_3 : α > 0} — that is, in the case of extreme value densities — then Corollary 5.5.5 is applicable with δ = 1. Thus, the error bound is of order O(k^{3/2}/n), which is a rate worse than that in the case of generalized Pareto densities. Direct calculations show that the bound O(k^{3/2}/n) is sharp for k > 1.
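For i = 2 the representation (5.5.3) is explicit: g_{2,α}/ψ_{2,α} = exp(-(-x)^α), so h(x) = -(-x)^α and (5.5.4) holds with L = 1 and δ = 1. A quick numerical confirmation (function names are ours):

```python
import math

def g_extreme(alpha, x):
    """Extreme value density g_{2,alpha}(x) = alpha*(-x)**(alpha-1)*exp(-(-x)**alpha), x < 0."""
    return alpha * (-x)**(alpha - 1) * math.exp(-(-x)**alpha)

def psi(alpha, x):
    """psi = g/G for G = G_{2,alpha}: the generalized Pareto density alpha*(-x)**(alpha-1)."""
    return alpha * (-x)**(alpha - 1)

def h(alpha, x):
    """Perturbation in the representation f = psi * e^h."""
    return math.log(g_extreme(alpha, x) / psi(alpha, x))
```

Since |h(x)| = (-x)^α exactly, no smaller exponent δ > 1 is available, which is one way to see why extreme value densities cannot beat the (k/n)·k^{1/2} term.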
5.6. Variational Distance between Empirical
and Poisson Processes
In this section we shall study the asymptotic behavior of extremes according
to their multitude in Borel sets. This topic does not directly concern order
statistics. It is the purpose of this section to show that the results for
order statistics can be applied to obtain approximations for empirical point
processes.
Preliminaries
Let ξ_1, ..., ξ_n be i.i.d. random variables with common d.f. F which belongs to the weak domain of attraction of G ∈ {G_{1,α}, G_{2,α}, G_3 : α > 0}. Hence, according to (5.1.4), there exist a_n > 0 and b_n such that

n(1 - F(b_n + a_n x)) → -log G(x),   n → ∞,   (5.6.1)

for x ∈ (α(G), ω(G)). According to the Poisson approximation to binomial r.v.'s we know that

Σ_{j=1}^{n} 1_{(x,∞)}(a_n^{-1}(ξ_j - b_n))   (5.6.2)

is asymptotically a Poisson r.v. with parameter λ = -log G(x).
Our investigations will be carried out within the framework of point processes, and in this context the expression in (5.6.2) is usually written in the form

Σ_{j=1}^{n} ε_{(ξ_j - b_n)/a_n}(B)   (5.6.3)

where ε_z(B) = 1_B(z) and B = (x, ∞). With B varying over all Borel sets we obtain the empirical (point) process

N_n = Σ_{j=1}^{n} ε_{(ξ_j - b_n)/a_n}   (5.6.4)

of a sample of size n with values in the set of point measures.
Recall that μ is a point measure if there exists a denumerable set of points x_j, j ∈ J, such that

μ = Σ_{j∈J} ε_{x_j}

and μ(K) < ∞ for every relatively compact set K. The set of all point measures M is endowed with the smallest σ-field ℳ such that the "projections" μ ↦ μ(B) are measurable. It is apparent that N: Ω → M is measurable if N(B): Ω → [0, ∞] is measurable for every Borel set B. If N is measurable then N is called a point process. Hence, the empirical process is a point process. Certain Poisson processes will be the limiting processes of empirical processes.
Homogeneous Poisson Process
Let ξ_1, ..., ξ_n be i.i.d. random variables with common d.f. W_{2,1}, the uniform d.f. on (-1, 0). In this case, the empirical process is given by

N_n = Σ_{j=1}^{n} ε_{nξ_j}.   (5.6.5)

In the limit this point process will be the homogeneous Poisson process N_0 with unit rate. The Poisson process N_0 is defined by

N_0 = Σ_{j=1}^{∞} ε_{S_j}   (5.6.6)

where S_j is the sum of j i.i.d. standard "negative" exponential r.v.'s. Moreover, M is the set of all point measures on the Borel sets in (-∞, 0).
For every s > 0 and n = 0, 1, 2, ... define the truncation N_n^{(s)} by

N_n^{(s)} = N_n(· ∩ [-s, 0)).   (5.6.7)
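A minimal simulation sketch of (5.6.6)-(5.6.7) (helper names and the seeded RNG are ours): the points S_1 > S_2 > ... walk left from 0 by standard exponential steps, so the number landing in [-s, 0) should be Poisson distributed with mean s.

```python
import random

def n0_count(s, rng):
    """One realization of N_0([-s, 0)), N_0 = sum_j eps_{S_j}."""
    t, c = 0.0, 0
    while True:
        t -= rng.expovariate(1.0)   # next point S_j of the process
        if t < -s:
            return c
        c += 1

rng = random.Random(0)
counts = [n0_count(3.0, rng) for _ in range(20000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
```

With 20000 replications both the empirical mean and variance of N_0([-3, 0)) come out close to 3, as a unit-rate Poisson process requires.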
Theorem 5.6.1. There exists a universal constant C > 0 such that for every positive integer n and s ≥ log(n) the following inequality holds:

sup_{M∈ℳ} |P{N_n^{(s)} ∈ M} - P{N_0^{(s)} ∈ M}| ≤ Cs/n.   (5.6.8)
PROOF. Let U_{1:n} ≤ ... ≤ U_{n:n} be the order statistics of n i.i.d. random variables with uniform distribution on (-1, 0). Let k ≡ k(n) be the smallest integer such that

P{n U_{n-k:n} ≥ -s} ≤ n^{-1}.   (1)

In the sequel, C will denote a constant which is independent of n and s ≥ log(n). It follows from the exponential bound theorem for order statistics (see Lemma 3.1.1) that k ≤ Cs. Write

N_{0,k}^{(s)} = Σ_{i=1}^{k} ε_{S_i}(· ∩ [-s, 0))   and   N_{n,k}^{(s)} = Σ_{i=1}^{k} ε_{nU_{n-i+1:n}}(· ∩ [-s, 0)).   (2)

It is immediate from (1) that for n ≥ 1,

sup_{M∈ℳ} |P{N_n^{(s)} ∈ M} - P{N_{n,k}^{(s)} ∈ M}| ≤ n^{-1}.   (3)

From Theorem 5.4.4 we know that

sup_B |P{(nU_{n:n}, ..., nU_{n-k+1:n}) ∈ B} - G_{2,1,k}(B)| ≤ Ck/n,   (4)

G_{2,1,k} being the distribution of (S_1, ..., S_k). Note that N_{n,k}^{(s)}, n ≥ 1, and N_{0,k}^{(s)} may be written as the composition of the random vectors (nU_{n:n}, ..., nU_{n-k+1:n}), n ≥ 1, and (S_1, ..., S_k), respectively, and the measurable map

(x_1, ..., x_k) ↦ Σ_{i=1}^{k} ε_{x_i}

having its values in the set of point measures.
Therefore, (4) yields

sup_{M∈ℳ} |P{N_{n,k}^{(s)} ∈ M} - P{N_{0,k}^{(s)} ∈ M}| ≤ Ck/n.   (5)

Moreover, (1) and (4) yield

P{S_k ≥ -s} ≤ Ck/n   (6)

and hence, in analogy to (3),

sup_{M∈ℳ} |P{N_{0,k}^{(s)} ∈ M} - P{N_0^{(s)} ∈ M}| ≤ Ck/n.   (7)

Now (3), (5), (7), and the triangle inequality imply the asserted inequality. □
The bound in Theorem 5.6.1 is sharp. Notice that for every k ∈ {1, ..., n}

sup_{t≤s} |P{N_n((-t, 0)) ≤ k-1} - P{N_0((-t, 0)) ≤ k-1}| = sup_{t≤s} |P{nU_{n-k+1:n} ≤ -t} - G_{2,1,k}(-t)|.   (5.6.9)

Hence a remainder term of a smaller order than that in (5.6.8) would yield a result for order statistics which does not hold according to the expansion of length 2 in Theorem 5.4.3.
Extensions
Denote by ν_0 the Lebesgue measure restricted to (-∞, 0). Recall that ν_0 is the intensity measure of the homogeneous Poisson process N_0; that is,

E N_0(B) = ν_0(B)   (5.6.10)

for every Borel set B ⊂ (-∞, 0). Write again T_{i,α} = G_{i,α}^{-1} ∘ G_{2,1} (see (1.6.10)). Denote by M_i the set of point measures on (α(G_{i,α}), ω(G_{i,α})) and by ℳ_i the pertaining σ-field. Denote by T_{i,α} also the map from M to M_i where T_{i,α}μ is the measure induced by μ and T_{i,α}. Notice that if μ = Σ_{j∈J} ε_{x_j} then

T_{i,α}μ = Σ_{j∈J} ε_{T_{i,α}(x_j)}.

Define

N_{i,α,n} = T_{i,α}(N_n)   (5.6.11)
for N_n as in (5.6.5) and (5.6.6). It is obvious that for n = 1, 2, ...

N_{i,α,n} = Σ_{k=1}^{n} ε_{(ξ_k - d_n)/c_n}   (5.6.12)

where ξ_1, ..., ξ_n are i.i.d. random variables with common generalized Pareto d.f. W_{i,α}; moreover, c_n > 0 and d_n are the usual normalizing constants as defined in (1.3.13).
It is well known that N_{i,α} ≡ N_{i,α,0} is a Poisson process with intensity measure ν_{i,α} = T_{i,α}ν_0 (having the mean value function -log(G_{i,α})). Recall that the distribution of N_{i,α} is uniquely characterized by the following two properties:

(a) N_{i,α}(B) is a Poisson r.v. with parameter ν_{i,α}(B) if ν_{i,α}(B) < ∞, and
(b) N_{i,α}(B_1), ..., N_{i,α}(B_m) are independent r.v.'s for mutually disjoint Borel sets B_1, ..., B_m.

Define the truncated point processes N_{i,α,n}^{(s)} by

N_{i,α,n}^{(s)} = T_{i,α}(N_n^{(s)}).   (5.6.13)

From Theorem 5.6.1 and (5.6.11) it is obvious that the following result holds.

Corollary 5.6.2. There exists a universal constant C > 0 such that for every positive integer n and s ≥ log(n) the following inequality holds:

sup_{M∈ℳ_i} |P{N_{i,α,n}^{(s)} ∈ M} - P{N_{i,α,0}^{(s)} ∈ M}| ≤ Cs/n.   (5.6.14)

Notice that Corollary 5.6.2 specialized to i = 2 and α = 1 yields Theorem 5.6.1.
Final Remarks
Theorem 5.6.1 and Corollary 5.6.2 can easily be extended to a large class of d.f.'s F belonging to a neighborhood of a generalized Pareto d.f. W_{i,α}, with N_{i,α,0}^{(s)} again being the approximating Poisson process. This can be proved just by replacing Theorem 5.4.3 in the proof of Theorem 5.6.1 (for appropriate inequalities we refer to Section 5.5). Moreover, in view of (5.6.9) and Theorem 5.4.5(ii) it is apparent that a bound of order O(s/n) can only be achieved if F has the upper tail of a generalized Pareto d.f. The details will be omitted since this topic will not be pursued further in this book.
In statistical applications one gets, in the most simple case, a model of independent Poisson r.v.'s by choosing mutually disjoint sets. The value of s has to be large to gain efficiency; on the other hand, the Poisson model provides an accurate approximation only if s is sufficiently small compared to n. The limiting model is represented by the unrestricted Poisson processes N_{i,α}. One has to consider Poisson processes with intensity measures depending on location and scale parameters if the original model includes such parameters. This family of Poisson processes can again be studied within a 3-parameter representation.
P.5. Problems and Supplements

1. Check that the max-stability G^n(d_n + xc_n) = G(x) of extreme value d.f.'s has its counterpart in the equation

n(1 - W(d_n + xc_n)) = 1 - W(x)

for the generalized Pareto d.f.'s W ∈ {W_{1,α}, W_{2,α}, W_3 : α > 0}.
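The Pareto and exponential cases of this identity can be verified mechanically with c_n = n^{1/α}, d_n = 0 and c_n = 1, d_n = log n respectively (a sketch; the function names W1 and W3 are ours):

```python
import math

def W1(alpha, x):
    """Pareto d.f. W_{1,alpha}(x) = 1 - x**(-alpha), x >= 1."""
    return 1 - x**(-alpha)

def W3(x):
    """Exponential d.f. W_3(x) = 1 - exp(-x), x > 0."""
    return 1 - math.exp(-x)
```

Both identities hold exactly for every n, not merely in the limit — this is the "Pareto stability" counterpart of max-stability.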
2. Check that the necessary and sufficient conditions (5.1.5)-(5.1.7) are trivially satisfied by the generalized Pareto d.f.'s in the following sense:
(i) For x > 0 and t such that tx > 1:

(1 - W_{1,α}(tx))/(1 - W_{1,α}(t)) = x^{-α}.

(ii) For x < 0 and t > 0 such that tx > -1:

(1 - W_{2,α}(tx))/(1 - W_{2,α}(-t)) = (-x)^α.

(iii) For t, x > 0:

g(t) = ∫_t^∞ (1 - W_3(y)) dy / (1 - W_3(t)) = 1

and

(1 - W_3(t + x))/(1 - W_3(t)) = e^{-x}.
3. Let F_1, F_2, F_3, ... be d.f.'s. Define G_n^*(x) = F_n(b_n^* + a_n^* x) and G_n(x) = F_n(b_n + a_n x) where a_n^*, a_n > 0. Assume that for some nondegenerate d.f. G^*,

G_n^* → G^*   weakly.

(i) The following two assertions are equivalent:
(a) For some nondegenerate d.f. G,  G_n → G weakly.
(b) For some constants a > 0 and b,  a_n/a_n^* → a and (b_n - b_n^*)/a_n^* → b as n → ∞.
(ii) Moreover, if (a) or (b) holds then

G(x) = G^*(b + ax)   for all real x.

[Hint: Use Lemma 1.2.9; see also de Haan, 1976.]
4. (i) Let c be the unique solution of the equation

x² sin(1/x) + 4x + 1 = 0

on the interval (-1, 0). Define the d.f. F by

F(x) = x² sin(1/x) + 4x + 1,   x ∈ (c, 0).

Then, for every x,

F^n(x/4n) → G_{2,1}(x)   as n → ∞.

However, F does not belong to the strong domain of attraction of G_{2,1}. (Falk, 1985b)
(ii) The Cauchy d.f. F and density f are given by

F(x) = 1/2 + (1/π) arctan x   and   f(x) = 1/(π(1 + x²)).

Verify the von Mises condition (5.1.24) with i = 1 and α = 1.
[Hint: Use de l'Hospital's rule.]
5. (Asymptotic d.f.'s of intermediate order statistics)
Let k(n) ∈ {1, ..., n} be such that k(n) ↑ ∞ and k(n)/n → 0 as n → ∞.
(i) The nondegenerate limiting d.f.'s of the k(n)th order statistic are given by

Φ(G_3^{-1}(G))   on (α(G), ω(G))

where G ∈ {G_{1,α}, G_{2,α}, G_3 : α > 0}. (Chibisov, 1964; Wu, 1966)
(ii) The weak convergence of the distribution of a_n^{-1}(X_{k(n):n} - b_n) to the limiting d.f. defined by G holds if, and only if,

[nF(b_n + a_n x) - k(n)]/k(n)^{1/2} → G_3^{-1}(G(x)),   n → ∞,   x ∈ (α(G), ω(G)).

(Chibisov, 1964)

6. Let ξ_1, ξ_2, ξ_3, ... be i.i.d. symmetric random variables (that is, ξ_i and -ξ_i have the same distribution). Prove that

sup_B |P{max(|ξ_1|, ..., |ξ_n|) ∈ B} - P{max(ξ_1, ..., ξ_{2n}) ∈ B}| = O(n^{-1}).

[Hint: Apply (4.2.10).]
7. Let b_n^* be defined as in Example 5.1.4(5) and b_n = Φ^{-1}(1 - 1/n). Let a_n = 1/(nφ(b_n)) and a_n^* = (2 log n)^{-1/2}. Show that

|b_n - b_n^*| = O((log log n)²/(log n)^{3/2})

and

|a_n - a_n^*| = O((log log n)/(log n)^{3/2}).

(Reiss, 1977a, Lemma 15.11)
8. For γ > 0 and a real number x_0 let F_γ be a d.f. with

F_γ(x) = γΦ(x) + (1 - γ)   for x ≥ x_0.

Put b_n = Φ^{-1}(1 - 1/(nγ)) and a_n = 1/(nγφ(b_n)). Show that

sup_x |F_γ^n(b_n + a_n x) - G_3(x)(1 + x²e^{-x}/(4 log n))| = O((log n)^{-2})

and, thus,

sup_x |F_γ^n(b_n + x/b_n) - G_3(x)| = O((log n)^{-1}).

(Reiss, 1977a, Theorem 15.17 and Remark 15.18)
9. (Graphical representation of generalized Pareto densities)
Recall that for the Pareto densities w_{1,α}(x) = αx^{-(1+α)}, x ≥ 1, we have w_{1,α}(1) = α (Fig. P.5.1). For the generalized Pareto type II densities w_{2,α} we have w_{2,α}(x) = α(-x)^{α-1} ~ g_{2,α}(x) as x ↑ 0 (Fig. P.5.2).

Figure P.5.1. Pareto densities w_{1,α} with α = 0.1, 0.5, 1.5.

Figure P.5.2. Generalized Pareto densities w_{2,α} with α = 0.5, 1, 1.5, 2, 3.
10. (Von Mises parametrization of generalized Pareto d.f.'s)
For β > 0, define

V_β(x) = 1 - (1 + βx)^{-1/β}   if 0 < x.

For β < 0, define

V_β(x) = 1 - (1 + βx)^{-1/β}   if 0 < x < -1/β,   and = 1 if x ≥ -1/β.

For β = 0, define

V_0(x) = 1 - e^{-x}   for x > 0.

Show that

W_{1,1/β}(x) = V_β((x - 1)/β)   if β > 0,
W_{2,1/|β|}(x) = V_β((x + 1)/|β|)   if β < 0,
W_3(x) = V_0(x).

The density v_β of V_β is equal to zero for x < 0. Moreover, if β > 0 then

v_β(x) = (1 + βx)^{-(1+1/β)},   x > 0.

If β < 0 then

v_β(x) = (1 + βx)^{-(1+1/β)}   if 0 < x < -1/β,   and = 0 if x ≥ -1/β.
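The stated identities are easy to confirm numerically (a sketch; the function names are ours):

```python
import math

def V(beta, x):
    """Von Mises parametrization V_beta; standard exponential at beta = 0."""
    if beta == 0:
        return 1 - math.exp(-x)
    if beta < 0 and x >= -1 / beta:
        return 1.0
    return 1 - (1 + beta * x) ** (-1 / beta)

def W1(alpha, x):
    """Pareto d.f., x >= 1."""
    return 1 - x ** (-alpha)

def W2(alpha, x):
    """Generalized Pareto type II d.f., -1 < x < 0."""
    return 1 - (-x) ** alpha
```

The beta → 0 limit recovers the exponential d.f., mirroring the convergence of the densities v_β to v_0 noted below.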
Figure P.5.3. Standard exponential and Pareto densities v_β with β = 0, 0.6, 2.

Figure P.5.4. Standard exponential and generalized Pareto type II densities v_β with β = -1, -0.75, -0.5, -0.4, 0.

The Pareto densities v_β with β ↓ 0 (Figure P.5.3) and the generalized Pareto type II densities v_β with β ↑ 0 (Figure P.5.4) approach the standard exponential density v_0 (dotted curve).
11. (Maxima with random indices)
Let ξ_i, i = 1, 2, ... be i.i.d. random variables and let N(i), i = 0, 1, ... be positive integer-valued r.v.'s.
(i) If G is an extreme value d.f., and
(a) P{a_n^{-1}(X_{n:n} - b_n) ≤ x} → G(x), n → ∞,
(b) N(i)/i → N(0), i → ∞, in probability,
then

P{a_i^{-1}(X_{N(i):N(i)} - b_i) ≤ x} → E(G(x)^{N(0)}),   i → ∞.

(Barndorff-Nielsen, 1964)
(ii) If the sequence (ξ_i) and N(j) are independent for every j then the condition (i)(b) can be replaced by
(b') N(i)/i → N(0) in distribution, i → ∞.
(iii) Show that the independence condition in (ii) cannot be omitted without compensation. [Hint: Define N(i) = min{j: ξ_j > log i} for standard exponential r.v.'s.]
(M. Falk)
12. Show that the Cauchy d.f. with scale parameter σ = π satisfies condition (5.2.14) for i = 1 and α = 1 with δ = 1. As a consequence one gets for the maximum X_{n:n} of standard Cauchy r.v.'s that

sup_B |P{(π/n)X_{n:n} ∈ B} - G_{1,1}(B)| = O(n^{-1}).
13. Under the Weibull d.f. F_α on the positive halfline defined by

F_α(x) = 1 - exp(-x^α),   x > 0,

one gets for every α ≠ 1,

sup_B |P{α(log n)^{1-1/α}(X_{n:n} - (log n)^{1/α}) ∈ B} - G_3(B)| = O(1/log n).
14. (Bounds for remainder terms involving von Mises conditions)
Assume that the d.f. F has three continuous derivatives and that f = F' > 0 on the interval (x_0, ω(F)). Put

H = (1 - F)/f.

Assume that the von Mises condition (5.1.25) holds for some i ∈ {1, 2, 3} and α > 0 (with α = 1 if i = 3). Thus, we have

h_{i,α}(x) := αH'(x) - T_{i,α}(-1) → 0   as x ↑ ω(F)

where again T_{i,α} = G_{i,α}^{-1} ∘ G_{2,1}. Notice that T_{1,α}(-1) = 1, T_{2,α}(-1) = -1 and T_3(-1) = 0. Then for α(G_{i,α}) < x < ω(G_{i,α}),

|F^n(b_n + a_n x) - G_{i,α}(x)| = O(|h_{i,α}(x_n)| + n^{-1})

with x_n = F^{-1}(1 - 1/n), and the normalizing constants are given by

a_n = α/(nf(x_n))   and   b_n = x_n - T_{i,α}(-1)a_n.

(Radtke, 1988)
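As an illustration tying back to P.5.4(ii): for the standard Cauchy d.f. one has H(x) = (1 + x²)(π/2 - arctan x), and h_{1,1}(x) = H'(x) - 1 ≈ -2/(3x²) → 0. A finite-difference sketch (our names; the step size eps is an arbitrary choice):

```python
import math

def H(x):
    """(1 - F)/f for the standard Cauchy d.f."""
    return (1 + x * x) * (math.pi / 2 - math.atan(x))

def h11(x, eps=1e-6):
    """h_{1,1}(x) = H'(x) - T_{1,1}(-1) = H'(x) - 1, via a central difference."""
    return (H(x + eps) - H(x - eps)) / (2 * eps) - 1
```

The O(x^{-2}) decay of h_{1,1} is what produces the O(n^{-1}) remainder for Cauchy maxima noted in P.5.12.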
15. (Expansions involving von Mises conditions)
Assume, in addition to the conditions of P.5.14, that f'' > 0 on the interval (x_0, ω(F)). Then for α(G_{i,α}) < x < ω(G_{i,α}),

|F^n(b_n + a_n x) - G_{i,α}(x)(1 - h_{i,α}(x_n)ψ_{i,α}(x)[x - T_{i,α}(-1)]²/2)| = O(h_{i,α}(x_n)² + |h_{i,α}(x_n)||g_{i,α}(x_n)| + n^{-1})

where g_{i,α} is another auxiliary function. We have

g_{i,α} = h'_{i,α}H/h_{i,α} + T_{i,α}(-1)/α,

implicitly assuming that h_{i,α} ≠ 0. Moreover, assume that lim_{x↑ω(F)} g_{i,α}(x) exists in [-∞, ∞], and there exist real numbers K_t such that

g_{1,α}(tx) = g_{1,α}(x)(K_t + o(1))   as x → ω(F) for all t > 0 if i = 1,
g_{2,α}(ω(F) - tx) = g_{2,α}(ω(F) - x)(K_t + o(1))   as x ↓ 0 for all t > 0 if i = 2,
g_3(x + tH(x)) = g_3(x)(K_t + o(1))   as x ↑ ω(F) for all real t if i = 3.

(Radtke, 1988)
16. (Special cases)
(i) Let

F(x) = 1 - ...,   x ≥ 1,

for some α > 0 and 0 < β ≤ 1. Then ... with a_n and b_n as above. Moreover, g_{i,α}(x_n) does not converge to zero as n → ∞ (compare with P.5.15).
(ii) Let

F(x) = 1 - x^{-α}...,   x ≥ 1,

for α > 0. Then

|F^n(b_n + a_n x) - G_{1,α}(x)(1 - h_{1,α}(x_n)ψ_{1,α}(x)[x - 1]²/2)| = O((log n)²/n² + n^{-1})

and

h_{1,α}(x_n) = O(n^{-1}).
17. (i) Prove that for a d.f. F and a positive integer k the following two statements are equivalent:
(a) F belongs to the weak domain of attraction of an extreme value d.f. G ∈ {G_{1,α}, G_{2,α}, G_3 : α > 0}.
(b) There are constants a_n > 0 and b_n such that the d.f.'s F_{n,k} defined by

F_{n,k}(x) = P{a_n^{-1}(X_{n:n} - b_n) ≤ x_1, ..., a_n^{-1}(X_{n-k+1:n} - b_n) ≤ x_k}

converge weakly to a nondegenerate d.f. G^{(k)}.
(ii) In addition, if (a) holds for G = G_{i,α} then (b) is valid for G^{(k)} = G_{i,α,k}.

18.
(i) There exists a constant C > 0 such that for every positive integer n and k ∈ {1, 2, ..., [n/2]} the following inequality holds:

sup_B |P{(n³/(n-k))^{1/2}(U_{n-k+1:n} - (n-k)/n) - k ∈ B} - G_{2,1,k}(B)| ≤ Ck^{1/2}/n.

(Kohne and Reiss, 1983)
(ii) It is unknown whether the standardized distribution of U_{n-k+1:n} admits an expansion of length m arranged in powers of k^{1/2}/n where again G_{2,1,k} is the leading term of the expansion.
(iii) Reformulate (i) by using N_{(0,1)} in place of G_{2,1,k}.
19. (Asymptotic independence of spacings)
There exists a constant C > 0 such that for every positive integer n and k ∈ {1, 2, ..., n} the following inequality holds:

sup_B |P{(nU_{1:n}, n(U_{2:n} - U_{1:n}), ..., n(U_{k:n} - U_{k-1:n})) ∈ B} - P{(ξ_1, ..., ξ_k) ∈ B}| ≤ Ck/n

where ξ_1, ..., ξ_k are i.i.d. random variables with standard exponential d.f.
20. Show that under the triangular density

f(x) = 1 - |x|,   |x| ≤ 1,

one gets

sup_B |P{((n/2)^{1/2}(X_{n-i+1:n} - 1))_{i=1}^{k} ∈ B} - G_{2,2,k}(B)| ≤ Ck/n

where C > 0 is a universal constant.
21. (Problem) Prove inequalities w.r.t. the Hellinger distance corresponding to those in Lemma 5.5.1 and Theorem 5.5.4.
22. For the k largest order statistics of standard Cauchy r.v.'s one gets

sup_B |P{((π/n)X_{n:n}, ..., (π/n)X_{n-k+1:n}) ∈ B} - G_{1,1,k}(B)| ≤ Ck^{3/2}/n

where C > 0 is a universal constant.
23. Extend Corollary 5.6.2 to d.f.'s that satisfy condition (5.2.11).
Bibliographical Notes
An excellent survey of the literature concerning classical extreme value theory can be found in the book of Galambos (1987). Therefore it suffices here to repeat only some of the basic facts of the classical part and, in addition, to give a more detailed account of the recent developments concerning approximations w.r.t. the variational distance etc. and higher order approximations.
Out of the long history of the meanwhile classical part of extreme value theory, we have already mentioned the pioneering work of Fisher and Tippett (1928), who provided a complete list of all possible limiting d.f.'s of sample maxima. Gnedenko (1943) found necessary and sufficient conditions for a d.f. to belong to the weak domain of attraction of an extreme value d.f. De Haan (1970) achieved a specification of the auxiliary function in Gnedenko's characterization of F belonging to the domain of attraction of the Gumbel d.f. G_3.
The conditions (1, α) and (2, α) in (5.1.24), which are sufficient for a d.f. to belong to the weak domain of attraction of the extreme value d.f.'s G_{1,α} and G_{2,α}, are due to von Mises (1936). The corresponding condition (5.1.24)(3) for the Gumbel d.f. G_3 was found by de Haan (1970). Another set of "von Mises conditions" is given in (5.1.25) for d.f.'s having two derivatives. For i = 3 this condition is due to von Mises (1936). Its extension to the cases i = 1, 2 appeared in Pickands (1986).
In conjunction with the strong domain of attraction, the von Mises conditions have gained new interest. The pointwise convergence of the densities of sample maxima under the von Mises condition (5.1.25), i = 3, was proved in Pickands (1967) and independently in Reiss (1977a, 1981d). A thorough study of this subject was carried out by de Haan and Resnick (1982), Falk (1985b), and Sweeting (1985).
Sweeting, in his brilliant work, was able to show that the von Mises conditions (5.1.24) are equivalent to the uniform convergence of densities of normalized maxima on finite intervals. We also mention the article of Pickands (1986) where a result closely related to that of Sweeting is proved under certain differentiability conditions imposed on F.
In (5.1.31) the number of exceedances of n i.i.d. random variables over a threshold u_n was studied to establish the limit law of the kth largest order statistic. The key argument was that the number of exceedances is asymptotically a Poisson r.v. This result also holds under weaker conditions. We mention Leadbetter's conditions D(u_n) and D'(u_n) for a stationary sequence (for details see Leadbetter et al. (1983)).
A necessary and sufficient condition (see P.5.5(ii)) for the weak convergence of normalized distributions of intermediate order statistics is due to Chibisov (1964). The possible limiting d.f.'s were characterized by Chibisov (1964) and Wu (1966) (see P.5.5(i)). Theorem 5.1.7, formulated for G_{3,k} instead of N_{(0,1)}, is given in Reiss (1981d) under the stronger condition that the von Mises condition (5.1.25), i = 3, holds; by the way, this result was proved via the normal approximation. The weak convergence of intermediate order statistics was extensively dealt with by Cooil (1985, 1988). Cooil proved the asymptotic joint normality of a fixed number of suitably normalized intermediate order statistics under conditions that correspond to those in Theorem 5.1.7. For the treatment of intermediate order statistics under dependence conditions we refer to Watts et al. (1982).
Bounds for the remainder terms of limit laws concerning maxima were
established by various authors. We refer to W.J. Hall and J.A. Wellner (1979),
P. Hall (1979), R.A. Davis (1982), and the book of Galambos (1987) for bounds
with explicit constants.
As pointed out by Fisher and Tippett (1928), extreme value d.f.'s different
from the limiting ones (penultimate d.f.'s) may provide a more accurate
approximation to d.f.'s of sample maxima. This line of research was taken up
by Gomes (1978, 1984) and Cohen (1982a, b). Cohen (1982b), Smith (1982),
and Anderson (1984) found conditions that allow the computation of the rate
of convergence w.r.t. the Kolmogorov-Smirnov distance. Another notable
article pertaining to this is Zolotarev and Rachev (1985) who applied the
method of metric distances.
Bibliographical Notes
It can easily be deduced from a result of Matsunawa and Ikeda (1976) that
the variational distance between the normalized distribution of the k(n)th
largest order statistic of n independent, identically (0,1)-uniformly distributed
r.v.'s and the gamma distribution with parameter k(n) tends to zero as n → ∞
if k(n)/n tends to zero as n → ∞. In Reiss (1981d) it was proved that the
accuracy of this approximation is ≤ Ck/n for some universal constant C. This
result was taken up by Falk (1986a) to prove an inequality related to (5.2.6)
w.r.t. the variational distance. A further improvement was achieved in Reiss
(1984): by proving the result in Reiss (1981d) w.r.t. the Hellinger distance
and by using an inequality for induced probability measures (compare with
Lemma 3.3.13) it was shown that Falk's result still holds if the variational
distance is replaced by the Hellinger distance. The present result is a further
improvement since the upper bound only depends on the upper tail of the
underlying distribution.
The investigation of extremes under densities of the form (5.2.14) was
initiated by L. Weiss (1971) who studied the particular case of a neighborhood
of Weibull densities. The class of densities defined by (5.2.18) and (5.2.19)
corresponds to the class of d.f.'s introduced by Hall (1982a).
It is evident that if the underlying d.f. only slightly deviates from an extreme
value d.f. then the rate of convergence of the d.f. of the normalized maximum
to the limit d.f. can be of order o(n^{-1}). The rate is of exponential order if F
has the same upper tail as an extreme value d.f. It was shown by Rootzén
(1984) that this is the best order achievable under a d.f. unequal to an extreme
value d.f. It would be of interest to explore, in detail, the rates for the second
largest order statistic.
For historical reasons we note the explicit form of the interesting
expansion in Uzgören (1954), which could have served as a guide to the
mathematical research of expansions in extreme value theory:

  log(−log F^n(b_n + xg(b_n)))
    = −x + (x²/2!)g'(b_n) + (x³/3!)[g(b_n)g''(b_n) − 2g'²(b_n)] + ···
      + e^{−x+···}/(2n) + 5e^{−2x+···}/(24n²) + e^{−3x+···}/(8n³) + ···

where b_n = F^{−1}(1 − 1/n) and g = (1 − F)/f. The first two terms of the expansion
formally agree with those in (5.2.16) in the Gumbel case. However, as reported by
T.J. Sweeting (talk at the Oberwolfach meeting on "Extreme Value Theory,"
1987) the expansion is not valid as far as the third term is concerned.
Other references pertaining to this are Dronkers (1958), who established an
approximate density of the k(n)th largest order statistic, and Haldane and
Jayakar (1963), who studied the particular case of extremes of normal r.v.'s.
Expansions of length 2 related to that in (5.2.16) are well known in the literature
(e.g. Anderson (1971) and Smith (1982)). These expansions were established in
a particularly appealing form by Radtke (1988) (see P.5.15). From P.5.15 we
see that the rate of convergence at which the von Mises condition holds also
determines the rate at which the convergence to the limiting extreme value
d.f. holds. The available results do not fit into our present program since only
expansions of d.f.'s are treated. In spite of the importance of these results,
details are given in the Supplements. It is an open problem under which
conditions the expansions in P.5.15 lead to higher order approximations that
are valid w.r.t. the variational or the Hellinger distance; (5.2.15) and (5.2.16)
only provide a particular example. A certain characterization of possible types
of expansions of distributions of maxima was given by Goldie and Smith
(1987).
Weinstein (1973) and Pantcheva (1985) adopted a nonlinear normalization
in order to derive a more accurate approximation of the d.f. of sample
maxima by means of the limiting extreme value d.f. From our point of view,
a systematic treatment of this approach would be the following: first, find an
expansion of finite length; second, construct a nonlinear normalization by
using the "inverse" of the expansion as was done in Section 4.6 (see also
Theorem 6.1.2).
The method of basing the statistical inference on the k largest order statistics
may be regarded as Type II censoring. Censoring plays an important role in
applications like reliability and life-testing. This subject is extensively studied
in books by N.R. Mann et al. (1974), A.J. Gross (1975), L.J. Bain (1978),
W. Nelson (1982), and J.F. Lawless (1982).
Upper bounds for the variational distance between the counting processes
N_n[−t, 0), 0 ≤ t ≤ s, and N_{2,1}[−t, 0), 0 ≤ t ≤ s, may also be found in
Kabanov and Liptser (1983) and Jacod and Shiryaev (1987). The bounds given
there are of order s²/n and s/n^{1/2} and therefore not sharp. Another reference
is Karr (1986) who proved an upper bound of order n^{-1} for fixed s.
In Chapter 4 of the book by Resnick (1987), the weak convergence of certain
point processes connected to extreme value theory is studied. For this purpose
one has to verify that the σ-field on the set of point measures is the
Borel σ-field generated by the topology of vague convergence. The weak
convergence of empirical processes can be formulated in such a way that it is
equivalent to the condition that the underlying d.f. belongs to the domain of
attraction of an extreme value d.f. Note that the "empirical point processes"
studied by Resnick (1987, Corollary 4.19) are of the form

  Σ_{k=1}^{∞} ε_{(k/n, (ξ_k − d_n)/c_n)},

thus allowing a simultaneous treatment of the time scale and the sequence of
observations.
From the statistical point of view the weak convergence is not satisfactory.
The condition that F belongs to the domain of attraction of an extreme value
d.f. is not strong enough to yield, e.g., the existence of a consistent estimator of
the tail index (that is, the index γ of the domain of attraction). Thus, the weak
convergence cannot be of any help either if F satisfies stronger regularity
conditions.
We briefly mention a recent article by Deheuvels and Pfeifer (1988) who
independently proved a result related to Theorem 5.6.1 by using the coupling
method. We do not know whether their method is also applicable to prove
the extension, as indicated at the end of Section 5.6, where F belongs to a
neighborhood of a generalized Pareto d.f.
CHAPTER 6
Other Important Approximations
In Chapters 4 and 5 we studied approximations to distributions of central and
extreme order statistics uniformly over Borel sets.
The approximation over Borel sets is equivalent to the approximation of
integrals of bounded measurable functions. In Section 6.1 we shall indicate
the extension of such approximations to unbounded functions, thus getting
approximations to moments of order statistics.
From approximations of joint distributions of order statistics one can easily
deduce limit theorems for certain functions of order statistics. Results of this
type will be studied in Section 6.2. We also mention other important results
concerning linear combinations of order statistics which, however, have to be
proved by means of a different approach.
Sections 6.3 and 6.4 deal with approximations of a completely different
type. In Section 6.3 we give an outline of the well-known stochastic approximation
of the sample d.f. to the sample q.f., connected with the name of R.R.
Bahadur.
Section 6.4 deals with the bootstrap, a resampling method introduced by
B. Efron in 1979. We indicate the stochastic behavior of the bootstrap d.f. of
the sample q-quantile.
6.1. Approximations of Moments and Quantiles
This section provides approximations to functional parameters of distributions
of order statistics by means of the corresponding functional parameters
of the limiting distributions and of finite expansions. We shall only consider
central and intermediate order statistics.
Moments of Central and Intermediate Order Statistics
We shall utilize the result of Section 4.7 that concerns Edgeworth-type
expansions of densities of central and intermediate order statistics.
Theorem 6.1.1. Let q ∈ (0,1) be fixed. Assume that the d.f. F has m + 1 bounded
derivatives on a neighborhood of F^{−1}(q), and that f(F^{−1}(q)) > 0 where f = F'.
Assume that (r(n)/n − q) = O(n^{−1}). Put σ² = q(1 − q).
Moreover, assume that

  E|X_{s:j}| < ∞

for some positive integer j and s ∈ {1, …, j}.
Then, for every measurable function h with |h(x)| ≤ |x|^k the following relation holds:

  E h( n^{1/2} f(F^{−1}(q)) (X_{r(n):n} − F^{−1}(q)) / σ ) = ∫ h dG_{r(n),n} + O(n^{−m/2})   (6.1.1)

where

  G_{r(n),n} = Φ + φ Σ_{i=1}^{m−1} n^{−i/2} S_{i,n}   (6.1.2)

and S_{i,n} is a polynomial of degree 3i − 1 with coefficients uniformly bounded
over n.
In particular, we have

  S_{1,n}(t) = [ (2q − 1)/(3σ) + σ f'(F^{−1}(q)) / (2 f(F^{−1}(q))²) ] t²
             + [ (nq − q − r(n) + 1)/σ + 2(2q − 1)/(3σ) ].   (6.1.3)
PROOF. Denote by f_{r(n),n} the density of the normalized distribution of X_{r(n):n}
and by g_{r(n),n} the density (that is, the derivative) of G_{r(n),n}. Put B_n =
[−log n, log n]. By P.4.5,

  | ∫ h(x) f_{r(n),n}(x) dx − ∫_{B_n} h(x) g_{r(n),n}(x) dx |
    ≤ ∫_{B_n} |h(x)| |f_{r(n),n}(x) − g_{r(n),n}(x)| dx + ∫_{B_n^c} |h(x)| (f_{r(n),n}(x) + |g_{r(n),n}(x)|) dx   (1)
    = O( n^{−m/2} ∫ |x|^k φ(x)(1 + |x|^{3m}) dx + ∫_{B_n^c} |x|^k (f_{r(n),n}(x) + |g_{r(n),n}(x)|) dx ).

It remains to prove an upper bound for the second term on the right-hand
side of (1). Straightforward calculations yield

  ∫_{B_n^c} |x|^k |g_{r(n),n}(x)| dx = O(n^{−m/2}).

The decisive step is to prove that

  α_n := ∫_{B_n^c} |x|^k f_{r(n),n}(x) dx = O(n^{−m/2}).

Apparently,

  α_n = ∫_{{X_{r(n):n} < F^{−1}(q) − t_n}} |X_{r(n):n} − F^{−1}(q)|^k dP
      + ∫_{{X_{r(n):n} > F^{−1}(q) + t_n}} |X_{r(n):n} − F^{−1}(q)|^k dP =: α_{n,1} + α_{n,2}

where t_n = (log n)σ/[n^{1/2} f(F^{−1}(q))]. Applying Lemma 3.1.4 and Corollary
1.2.7 we get

  α_{n,1} = O( P{X_{r(n):n} < F^{−1}(q) − t_n} + ∫_{{X_{r(n):n} < F^{−1}(q) − t_n}} |X_{r(n):n}|^k dP )
         = O( P{U_{r(n):n} ≤ F(F^{−1}(q) − t_n)}
             + [ b(r(n) − ks, n − (j + 1)k − r(n) + ks + 1) / b(r(n), n − r(n) + 1) ]
               P{U_{r(n)−ks : n−(j+1)k} ≤ F(F^{−1}(q) − t_n)} )

where b denotes the beta function. Applying Lemma 3.1.1 one obtains that
α_{n,1} = O(n^{−m/2}). We may also prove α_{n,2} = O(n^{−m/2}), which completes the
proof. □
As a special case of (6.1.1) we obtain

  E|F_n^{−1}(q) − F^{−1}(q)|^k = ( (q(1 − q))^{k/2} / ( n^{k/2} f(F^{−1}(q))^k ) ) ∫ |x|^k dΦ(x) + O(n^{−(k+1)/2}).   (6.1.4)
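The leading term of (6.1.4) is easy to check numerically. Below is a minimal Monte Carlo sketch (assuming NumPy; the standard exponential model, k = 1, q = 1/2, and all sample sizes are our illustrative choices, not the book's): the mean absolute error of the sample median should be close to (q(1 − q))^{1/2} E|Z| / (n^{1/2} f(F^{−1}(q))) with Z standard normal.

```python
import numpy as np

# Monte Carlo sketch of the first term of (6.1.4) for k = 1 (illustrative
# choices: standard exponential F, q = 1/2; not taken from the book).
rng = np.random.default_rng(0)
n, reps, q = 400, 4000, 0.5

true_quantile = -np.log(1 - q)        # F^{-1}(q) for the standard exponential
density_at_quantile = 1 - q           # f(F^{-1}(q)) = 1 - F(F^{-1}(q))

samples = rng.exponential(size=(reps, n))
sample_medians = np.quantile(samples, q, axis=1)
mc_abs_error = np.abs(sample_medians - true_quantile).mean()

# First term of (6.1.4): E|Z| = (2/pi)^{1/2} for standard normal Z.
predicted = (np.sqrt(q * (1 - q)) / (np.sqrt(n) * density_at_quantile)
             * np.sqrt(2 / np.pi))
print(mc_abs_error, predicted)
```

With these choices the two numbers agree to within a few percent; the discrepancy is of the higher order promised by (6.1.4) plus Monte Carlo noise.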
Expansions of Quantiles of Distributions of Order Statistics
Recall the result of Section 4.6 where we obtained an expansion concerning
the "inverse" of an Edgeworth expansion. A corresponding result holds for
expansions of d.f.'s of order statistics.
Theorem 6.1.2. Let q ∈ (0,1) be fixed. Suppose that the d.f. F has m + 1
derivatives on a neighborhood of F^{−1}(q), and that f(F^{−1}(q)) > 0 where f = F'.
Suppose that (r(n)/n − q) = O(n^{−1}).
Then there exist polynomials R_{i,n}, i = 1, …, m − 1, such that uniformly over
|x| ≤ log n,

  (6.1.5)

With S_{i,n} denoting the polynomials in (6.1.2) we have

  (6.1.6)

and
PROOF. Apply P.4.5 and use the arguments of Section 4.6. □

(6.1.5), applied to x = Φ^{−1}(α), yields

  P{ X_{r(n):n} ≤ F^{−1}(q) + ( Φ^{−1}(α) + Σ_{i=1}^{m−1} n^{−i/2} R_{i,n}(Φ^{−1}(α)) ) σ/(n^{1/2} f(F^{−1}(q))) }
    = α + O(n^{−m/2}).   (6.1.7)

This result may be adopted to justify a formal expansion given by F.N.
David and N.L. Johnson (1954) in the case of sample medians (see P.6.2).
6.2. Functions of Order Statistics
With regard to functions of order statistics a predominant role is played by
linear combinations of order statistics. A comprehensive presentation of this
subject is given in the book by Helmers (1982), so it suffices to make some
introductory remarks. In special cases we are able to prove supplementary
results by using the tools developed in this book.
Chapters 4 and 5 provide approximations of joint distributions of central
and extreme order statistics by means of normal and extreme value distributions.
Thus, asymptotic distributions of certain functions of order statistics
can easily be established. In this context, we also refer to Sections 9.5 and 10.4
where we shall study Hill's estimator and a certain χ²-test.
Asymptotic Normality of a Linear Combination
of Uniform R.V.'s
From a certain technical point of view the existing results for linear combinations
of order statistics are very satisfactory. However, the question is still
open whether one can find a condition which guarantees the asymptotic
normality of a linear combination of order statistics and is related to the
Lindeberg condition for sums of independent r.v.'s or martingales.
Such a condition (see (6.2.4)) was found by Hecker (1976) in the special case
of order statistics of i.i.d. uniform r.v.'s. His theorem is a simple application
of the central limit theorem.
Theorem 6.2.1. Given a triangular array of constants a_{i,n}, i = 1, …, n, define

  b_{j,n} = Σ_{i=j}^{n} a_{i,n} − Σ_{i=1}^{n} (i/(n + 1)) a_{i,n},   j = 1, …, n + 1,   (6.2.1)

and

  τ_n² = Σ_{j=1}^{n+1} b_{j,n}².   (6.2.2)

Then,

  P{ τ_n^{−1} Σ_{i=1}^{n} (n + 1) a_{i,n} ( U_{i:n} − i/(n + 1) ) ≤ t } → Φ(t),   n → ∞,   (6.2.3)

for every t if, and only if,

  τ_n^{−1} max_{j=1,…,n+1} |b_{j,n}| → 0,   n → ∞.   (6.2.4)
PROOF. Let η₁, η₂, η₃, … be i.i.d. standard exponential r.v.'s. Put S_i = Σ_{j=1}^{i} η_j.
From Corollary 1.6.9 it is immediate that

  Σ_{i=1}^{n} (n + 1) a_{i,n} ( U_{i:n} − i/(n + 1) ) =_d Σ_{i=1}^{n} a_{i,n} [ S_i − iS_{n+1}/(n + 1) ] / [ S_{n+1}/(n + 1) ].   (1)

Check that

  Σ_{i=1}^{n} a_{i,n} [ S_i − iS_{n+1}/(n + 1) ] = Σ_{j=1}^{n+1} b_{j,n} η_j.   (2)

From (2) and the fact that Eη_j = 1 it is clear that

  E Σ_{j=1}^{n+1} b_{j,n} η_j = Σ_{j=1}^{n+1} b_{j,n} = 0.   (3)

Consequently, τ_n² is the variance of Σ_{j=1}^{n+1} b_{j,n} η_j. Moreover, since S_{n+1}/(n + 1) →
1 in probability as n → ∞, we deduce from (1)–(3) that (6.2.3) holds if, and only
if,

  P{ τ_n^{−1} Σ_{j=1}^{n+1} b_{j,n} η_j ≤ t } → Φ(t),   n → ∞.   (4)

The equivalence of (4) and (6.2.4) is a particular case of the Lindeberg–Lévy–Feller
theorem as proved in Chernoff et al. (1967), Lemma 1. □
Example 6.2.2. If a_{r,n} = 1 and a_{i,n} = 0 for i ≠ r (that is, we consider the order
statistic U_{r:n}) then τ_n² = r(n − r + 1)/(n + 1) in (6.2.2). Furthermore, (6.2.4)
is equivalent to r(n) → ∞ and n − r(n) → ∞ as n → ∞ (with r(n) in place of r).
As an immediate consequence of Theorem 6.2.1 and Theorem 4.3.1 we
obtain the following result of preliminary character: Assume that the density
f is strictly larger than zero and has three derivatives on the interval
(F^{−1}(q) − a, F^{−1}(q) + a), a > 0, for some q ∈ (0,1). Define I_n = {r(n) + 4i: i = 1, …,
k(n)} where r(n)/n → q and k(n)/n → 0 as n → ∞. Assume that a_{i,n} = 0 for i ∉ I_n.
Then, with τ_n as in (6.2.2), as n → ∞, the analogue of (6.2.3) with the X_{i:n} in
place of the U_{i:n} holds,

  (6.2.5)

for every t if, and only if, (6.2.4) holds.
Of course, this result is very artificial. It would be interesting to know
whether the index set I_n can be replaced by {r(n) + i: i = 1, …, k(n)}, etc. It is
left to the reader to formulate other theorems of this type by using Theorem
4.3.1 or Theorem 4.5.3.
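Hecker's condition is easy to evaluate in concrete cases. The following sketch (assuming NumPy; the formulas for b_{j,n} and τ_n² follow our reading of the garbled displays (6.2.1)–(6.2.2), so treat it as a reconstruction) computes both quantities for a triangular array and checks them against Example 6.2.2.

```python
import numpy as np

# b_{j,n} = sum_{i=j}^n a_{i,n} - sum_{i=1}^n (i/(n+1)) a_{i,n} and
# tau_n^2 = sum_{j=1}^{n+1} b_{j,n}^2, as reconstructed from (6.2.1)-(6.2.2).
def hecker_b_tau(a):
    n = len(a)
    i = np.arange(1, n + 1)
    center = np.sum(i / (n + 1) * a)
    # tail sums sum_{i=j}^n a_{i,n} for j = 1, ..., n+1 (empty sum for j = n+1)
    tails = np.concatenate([np.cumsum(a[::-1])[::-1], [0.0]])
    b = tails - center
    return b, np.sum(b ** 2)

# Example 6.2.2: a_{r,n} = 1 and a_{i,n} = 0 otherwise (the order statistic U_{r:n}).
n, r = 20, 7
a = np.zeros(n)
a[r - 1] = 1.0
b, tau2 = hecker_b_tau(a)
print(np.sum(b), tau2, r * (n - r + 1) / (n + 1))
```

The first printed value illustrates the centering identity (3), Σ_j b_{j,n} = 0; the last two values agree, matching the closed form of Example 6.2.2.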
The Trimmed Mean
The trimmed mean Σ_{i=r}^{s} X_{i:n} is another exceptional case of a linear combination
of order statistics which can easily be treated by conditioning on the order
statistics X_{r:n} and X_{s:n}.
Denote by Y_{i:s−r−1} the ith order statistic of s − r − 1 r.v.'s with common
d.f. F_{x,y} [the truncation of F on the left of x and on the right of y]. Moreover,
denote by Q_{r,s,n} the joint distribution of X_{r:n} and X_{s:n}. Then, according to
Example 1.8.3(iii),

  P{ Σ_{i=r}^{s} X_{i:n} ≤ t } = ∫ P{ x + Σ_{i=1}^{s−r−1} Y_{i:s−r−1} + y ≤ t } dQ_{r,s,n}(x, y)
                            = ∫ P{ Σ_{i=1}^{s−r−1} Y_i ≤ t − (x + y) } dQ_{r,s,n}(x, y)

where Y₁, …, Y_{s−r−1} are i.i.d. random variables with common d.f. F_{x,y}.
Now we are able to apply the classical results for sums of i.i.d. random
variables to the integrand. Moreover, Section 4.5 provides a normal approximation
to Q_{r,s,n}. Concerning more details we refer again to Helmers (1982).
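The conditioning structure can be checked by simulation. The sketch below (assuming NumPy; the uniform model and the particular n, r, s are our illustrative choices) compares the trimmed sum Σ_{i=r}^{s} X_{i:n} with the two-step construction: draw X_{r:n} and X_{s:n}, then fill in s − r − 1 i.i.d. values from the truncated distribution.

```python
import numpy as np

# Monte Carlo check that, given X_{r:n} = x and X_{s:n} = y, the intermediate
# order statistics behave like s-r-1 i.i.d. draws from F truncated to (x, y).
rng = np.random.default_rng(1)
n, r, s, reps = 12, 3, 9, 20000

u = np.sort(rng.uniform(size=(reps, n)), axis=1)
direct = u[:, r - 1:s].sum(axis=1)              # sum_{i=r}^{s} X_{i:n}

x, y = u[:, r - 1], u[:, s - 1]                 # X_{r:n} and X_{s:n}
middle = rng.uniform(size=(reps, s - r - 1))    # i.i.d. uniforms, rescaled below
resampled = x + y + (x[:, None] + (y - x)[:, None] * middle).sum(axis=1)

print(direct.mean(), resampled.mean(), direct.std(), resampled.std())
```

Both constructions produce the same distribution, so the printed means and standard deviations agree up to Monte Carlo error.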
Systematic Statistics
The notion of systematic statistics goes back to Mosteller (1946); we mention
this expression for historical reasons only, because nowadays one would speak
of a linear combination of order statistics when treating this type of statistic.
Based on the asymptotic normality of a fixed number of sample q-quantiles
one can easily verify the asymptotic normality of a linear combination of these
order statistics. Given a location and scale parameter family of distributions
one can, e.g., try to find the optimum estimator based on k order statistics.
Below we shall only touch on the simplest case, namely, that of k = 2.
Lemma 6.2.3. (i) Let 0 < q₀ < 1. Assume that the d.f. F has two bounded
derivatives on a neighborhood of F^{−1}(q₀) and that f₀ := F'(F^{−1}(q₀)) > 0.
Then,

  sup_t |P{ (n^{1/2}/σ₀)(X_{[nq₀]:n} − F^{−1}(q₀)) ≤ t } − Φ(t)| = O(n^{−1/2})

where σ₀² = q₀(1 − q₀)/f₀².
(ii) Let 0 < q₁ < q₂ < 1. Assume that the d.f. F has two bounded derivatives
on a neighborhood of F^{−1}(q_i) and that f_i := F'(F^{−1}(q_i)) > 0 for i = 1, 2.
Then,

  sup_t |P{ (n^{1/2}/σ)( X_{[nq₂]:n} − X_{[nq₁]:n} − (F^{−1}(q₂) − F^{−1}(q₁)) ) ≤ t } − Φ(t)| = O(n^{−1/2})

where σ² = q₁(1 − q₁)/f₁² − 2q₁(1 − q₂)/(f₁f₂) + q₂(1 − q₂)/f₂².

PROOF. Immediate from Theorem 4.5.3 by routine calculations. □
Sample quantiles and spacings (= differences of order statistics) provide
quick estimators of the location and scale parameters. Recall that a d.f.

  F_{μ,σ}(x) := F((x − μ)/σ)

has the q.f.

  F_{μ,σ}^{−1}(q) = μ + σF^{−1}(q).

Under the conditions of Lemma 6.2.3 we obtain for the sample quantiles
X_{[nq_i]:n} of n i.i.d. random variables with common d.f. F_{μ,σ} that, with σ₀ = σ₀(F)
and σ(F) as in Lemma 6.2.3,

  sup_t |P{ (n^{1/2}/(σσ₀(F)))(X_{[nq₀]:n} − μ) ≤ t } − Φ(t)| = O(n^{−1/2})   (6.2.6)

if w.l.g. F^{−1}(q₀) = 0; moreover,

  sup_t |P{ (n^{1/2}(F^{−1}(q₂) − F^{−1}(q₁))/(σσ(F)))(σ̂_n − σ) ≤ t } − Φ(t)| = O(n^{−1/2})   (6.2.7)

where the estimator σ̂_n is given by

  σ̂_n = ( X_{[nq₂]:n} − X_{[nq₁]:n} ) / ( F^{−1}(q₂) − F^{−1}(q₁) ).
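A concrete instance of these quick estimators, in the spirit of (6.2.6) and (6.2.7) (assuming NumPy; the normal model, the quantile levels q₀ = 1/2, q₁ = 1/4, q₂ = 3/4, and the constant Φ^{−1}(3/4) ≈ 0.6745 are our illustrative choices, not the book's):

```python
import numpy as np

# Quick location/scale estimators from sample quantiles: for F = standard
# normal, F^{-1}(1/2) = 0 and F^{-1}(3/4) - F^{-1}(1/4) = 2 * 0.67448975...
Z75 = 0.6744897501960817          # Phi^{-1}(0.75)

rng = np.random.default_rng(2)
mu, sigma, n = 2.0, 3.0, 200_000
x = rng.normal(mu, sigma, n)

loc_hat = np.quantile(x, 0.5)                                   # estimates mu
scale_hat = (np.quantile(x, 0.75) - np.quantile(x, 0.25)) / (2 * Z75)
print(loc_hat, scale_hat)
```

Both estimators are n^{1/2}-consistent, with normal fluctuations of the kind described in Lemma 6.2.3.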
An Expansion of Length Two for the Convex Combination
of Consecutive Order Statistics
Let X_{r:n} be the rth order statistic of n i.i.d. random variables with common
continuous d.f. F.
We shall study statistics of the form

  (1 − γ)X_{r:n} + γX_{r+1:n},   γ ∈ [0,1],

which may be used as estimators of the q-quantile. The most important case
is the sample median for even sample sizes.
It is apparent that this statistic has the same asymptotic behavior for every
γ ∈ [0,1] as far as the first order performance is concerned. The different
performance of the statistics for varying γ can be detected if the second order
term is studied. For this purpose we shall establish an expansion of length 2.
Denote by F_{r,n} the d.f. of c^{−1}(X_{r:n} − d). From Corollary 1.8.5 it is
immediate that for all γ and t,

  P{ (1 − γ)X_{r:n} + γX_{r+1:n} ≤ d + ct }
    = F_{r,n}(t) − ∫_{−∞}^{t} P{ γ[Y_{1:n−r} − (d + cx)] > c(t − x) } dF_{r,n}(x)   (6.2.8)

where Y_{1:n−r} is the sample minimum of n − r i.i.d. random variables with
common d.f. F_{d+cx} [the truncation of F on the left of d + cx].
Let G_{r,n} be an approximation to F_{r,n} such that

  sup_B | P{ c^{−1}(X_{r:n} − d) ∈ B } − ∫_B dG_{r,n} | = O(n^{−1}).   (6.2.9)

From Corollary 1.2.7 and Theorem 5.4.3 we get uniformly in t and x,

  P{ γ[Y_{1:n−r} − (d + cx)] > c(t − x) }
    = P{ (n − r)U_{1:n−r} > (n − r)F_{d+cx}[d + cx + c(t − x)/γ] }   (6.2.10)
    = exp[ −(n − r)F_{d+cx}[d + cx + c(t − x)/γ] ] + O(n^{−1}).

Combining (6.2.8)–(6.2.10) and applying P.3.5 we get

  sup_t | P{ (1 − γ)X_{r:n} + γX_{r+1:n} ≤ d + ct }
    − [ G_{r,n}(t) − ∫_{−∞}^{t} exp[ −(n − r)F_{d+cx}[d + cx + c(t − x)/γ] ] dG_{r,n}(x) ] | = O(n^{−1}).   (6.2.11)

Notice that if γ = 0 then, in view of (6.2.10), the integral in (6.2.11) can be
replaced by zero.
Specifying normalizing constants and an expansion G_{r,n} of F_{r,n} we obtain
the following theorem.
Theorem 6.2.4. Let q ∈ (0,1) be fixed. Assume that F has three bounded
derivatives on a neighborhood of F^{−1}(q) and that f(F^{−1}(q)) > 0 where f = F'.
Moreover, assume that (r(n)/n − q) = O(n^{−1/2}). Put σ² = q(1 − q).
Then, uniformly in γ ∈ [0,1],

  sup_t | P{ (n^{1/2} f(F^{−1}(q))/σ) [ (1 − γ)X_{r(n):n} + γX_{r(n)+1:n} − F^{−1}(q) ] ≤ t }
    − ( Φ(t) + n^{−1/2} φ(t) R_n(t) ) | = o(n^{−1/2})

where

  R_n(t) = −[ (1 − 2q)/(3σ) − σ f'(F^{−1}(q))/(2 f(F^{−1}(q))²) ] t²
          − [ (q − nq + r(n) + γ − 1)/σ + 2(1 − 2q)/(3σ) ].
PROOF. The basic formula (6.2.11) will be applied to d = F^{−1}(q) and c =
σ/(n^{1/2} f(d)). In view of P.4.5, which supplies us with an expansion G_{r,n} =
Φ + n^{−1/2} φ S_{r,n} of F_{r,n}, it suffices to prove that

  ∫_{−∞}^{t} exp[ −(n − r)F_{d+cx}[d + cx + c(t − x)/γ] ] φ(x) dx = n^{−1/2}(γ/σ)φ(t) + o(n^{−1/2})   (1)

and

  ∫_{−∞}^{t} exp[ −(n − r)F_{d+cx}[d + cx + c(t − x)/γ] ] |(φ S_{r,n})'(x)| dx = o(n⁰)   (2)

uniformly in γ and t. The proof of (1) will be carried out in detail. Similar
arguments lead to (2).
Since Φ(−(log n)/2) = O(n^{−1}), it is obvious that ∫_{−∞}^{t} can be replaced by
∫_{−log n}^{t} for t ≤ (log n)/2. Then, the integrand is of order O(n^{−1}) for those x with
c(t − x)/γ > s(log n)/n for some sufficiently large s > 0. Thus, ∫_{−log n}^{t} can be
replaced by ∫_{u(n)}^{t} where u(n) = max(−log n, t − γs(log n)/(cn)).
Under the condition that F has three bounded derivatives it is not difficult
to check that for u(n) ≤ x ≤ t,

  F_{d+cx}[d + cx + c(t − x)/γ] = f(d)c(t − x)/((1 − q)γ) + O[ c|x|( c(t − x)/γ + (c(t − x)/γ)² ) ].   (3)

Thus, (1) has to be verified with the left-hand side replaced by the term

  ∫_{u(n)}^{t} exp[ −(n − r) n^{−1/2} σ(t − x)/((1 − q)γ) ] φ(x) dx   (4)

which, by substituting y = n^{1/2} σ(t − x)/γ, can easily be verified to be equal to

  n^{−1/2}(γ/σ) ∫_0^{v(n)} exp[ −((1 − r/n)/(1 − q)) y ] φ(t − n^{−1/2} γy/σ) dy   (5)

where v(n) = n^{1/2} σ(t − u(n))/γ. Since

  exp[ −((1 − r/n)/(1 − q)) y ] = exp(−y)[1 + o(n⁰)]   and
  φ(t − n^{−1/2} γy/σ) = φ(t)[1 + o(n⁰)]   (6)

we obtain that the term in (5) is equal to

  n^{−1/2}(γ/σ) φ(t) ∫_0^{v(n)} exp(−y) dy (1 + o(n⁰)).   (7)

Notice that the relations above hold uniformly in γ and t. Now (1) is
immediate. □
Notice that for γ = 0 we again get the expansion of length two of the
normalized d.f. of X_{r(n):n} as given in P.4.5. Moreover, for γ = 0 and r(n)
replaced by r(n) + 1 we get the same expansion as for γ = 1 and r(n).
If q = 1/2, f'(F^{−1}(1/2)) = 0, n = 2m, and r = m, then

  P{ 2(2m)^{1/2} f(F^{−1}(1/2)) [ (X_{m:2m} + X_{m+1:2m})/2 − F^{−1}(1/2) ] ≤ t }
    = Φ(t) + o(n^{−1/2}).   (6.2.12)

Thus, the sample median for even sample sizes is asymptotically normal with
a remainder term of order o(n^{−1/2}). For odd sample sizes the corresponding
result was proved in Section 4.2.
Remark 6.2.5. Let q₀ ∈ (0,1). Assume that F has three bounded derivatives on
a neighborhood of F^{−1}(q₀) and that f(F^{−1}(q₀)) > 0. Then a short examination
of the proof of Theorem 6.2.4 reveals that the assertion holds uniformly over
all q in a sufficiently small neighborhood of q₀ and r(n) ≡ r(q, n) such that
sup_q |r(q, n)/n − q| = o(n^{−1/2}). This yields the version of Theorem 6.2.4 as cited
in Pfanzagl (1985).
The Meanwhile Classical Theory of Linear
Combinations of Order Statistics
The central idea of the classical approach is to use weight functions to
represent a linear combination of order statistics in an elegant way.
Linear combinations of order statistics of the form

  T_n = n^{−1} Σ_{i=1}^{n} J(i/(n + 1)) X_{i:n}   (6.2.13)

are estimators of the functional

  μ(F) = ∫ J(s) F^{−1}(s) ds.   (6.2.14)

Notice that according to (1.2.13) and (1.2.14),

  μ(F) = ∫ x J(F(x)) dF(x)   (6.2.15)

for continuous d.f.'s F.
The following theorem is due to Helmers (1981). The proof of Theorem
6.2.6 (see also Helmers (1982, Theorem 3.1.2)) is based on the calculus of
characteristic functions.

Theorem 6.2.6. Suppose that E|ξ₁|³ < ∞ and

  σ²(F) := ∫∫ J(F(x)) J(F(y)) ( min(F(x), F(y)) − F(x)F(y) ) dx dy > 0.

Moreover, let the weight function J satisfy a Lipschitz condition of order 1 on
(0,1). Then,

  sup_t | P{ n^{1/2} (T_n − μ(F))/σ(F) ≤ t } − Φ(t) | = O(n^{−1/2}).
The smoothness condition imposed on J can be weakened by imposing
appropriate smoothness conditions on F.
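For illustration, here is a minimal sketch of the L-statistic (6.2.13) and its target (6.2.14) (assuming NumPy; the weight function J(s) = s and the uniform model are our toy choices and need not satisfy all hypotheses of Theorem 6.2.6):

```python
import numpy as np

# L-statistic T_n = n^{-1} sum_i J(i/(n+1)) X_{i:n} and its target
# mu(F) = int_0^1 J(s) F^{-1}(s) ds; for J(s) = s and uniform F, mu(F) = 1/3.
def l_statistic(x, J):
    n = len(x)
    return np.mean(J(np.arange(1, n + 1) / (n + 1)) * np.sort(x))

rng = np.random.default_rng(3)
x = rng.uniform(size=100_000)
t_n = l_statistic(x, lambda s: s)
mu = 1.0 / 3.0                      # int_0^1 s * s ds
print(t_n, mu)
```

The printed values agree up to fluctuations of order n^{−1/2}, in accordance with Theorem 6.2.6.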
6.3. Bahadur Approximation
In Section 1.1 we have seen that the d.f. of an order statistic, and thus
that of the sample q.f., can be represented by means of the sample d.f.
It was observed by R.R. Bahadur (1966) that an amazingly accurate stochastic
approximation of the sample d.f. to the sample q.f. holds.
Motivation
To get some insight into the nature of this approximation let us consider the
special case of i.i.d. (0,1)-uniformly distributed r.v.'s η₁, η₂, …, η_n. Denote by
G_n and U_{i:n} the pertaining sample d.f. and the ith order statistic. We already
know that the distributions of

  G_n(r/n) = n^{−1} Σ_{i=1}^{n} 1_{(−∞, r/n]}(η_i)

and of U_{r:n} are concentrated about r/n. Moreover, relation (1.1.6) shows that
pointwise

  U_{r:n} − r/n ≤ 0   iff   G_n(r/n) − r/n ≥ 0.   (6.3.1)
Thus, it is plausible that the distribution of

  (U_{r:n} − r/n) + (G_n(r/n) − r/n)

is more closely concentrated about zero than each of the distributions of
U_{r:n} − r/n and G_n(r/n) − r/n. Instead of (U_{r:n} − r/n) + (G_n(r/n) − r/n), the
so-called Bahadur statistic

  (G_n^{−1}(q) − q) + (G_n(q) − q),   q ∈ (0,1),   (6.3.2)

may apparently be studied as well.
Recall that G_n^{−1}(q) = U_{r(q):n} where r(q) = nq if nq is an integer and r(q) =
[nq] + 1 otherwise.
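The heuristic above is easy to visualize numerically. In the sketch below (assuming NumPy; n, the number of replications, and r = n/2 are our illustrative choices), the sum (U_{r:n} − r/n) + (G_n(r/n) − r/n) fluctuates on the scale n^{−3/4}, much smaller than the n^{−1/2} scale of each summand:

```python
import numpy as np

# Both U_{r:n} - r/n and G_n(r/n) - r/n have standard deviation of order
# n^{-1/2}; their sum (the Bahadur statistic at q = r/n) is of order n^{-3/4}.
rng = np.random.default_rng(4)
n, reps = 1000, 2000
r = n // 2

u = rng.uniform(size=(reps, n))
u_rn = np.sort(u, axis=1)[:, r - 1]         # U_{r:n}
g_n = (u <= r / n).mean(axis=1)             # G_n(r/n)

term1, term2 = u_rn - r / n, g_n - r / n
bahadur = term1 + term2
print(term1.std(), term2.std(), bahadur.std())
```

With these choices the standard deviation of the sum is several times smaller than that of either summand, in line with Kiefer's n^{−3/4}(log n)^{1/2} rate quoted below.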
In the general case of order statistics X_{i:n} from n i.i.d. random variables ξ_i
with common d.f. F and derivative f(F^{−1}(q)), the Bahadur statistic is given by

  f(F^{−1}(q))(F_n^{−1}(q) − F^{−1}(q)) + (F_n(F^{−1}(q)) − q),   q ∈ (0,1),   (6.3.3)

where F_n and F_n^{−1} are the sample d.f. and sample q.f. based on the r.v.'s ξ_i.
The connection between (6.3.2) and (6.3.3) becomes obvious by noting that
the transformation technique yields

  f(F^{−1}(q))(F_n^{−1}(q) − F^{−1}(q)) + (F_n(F^{−1}(q)) − q)
    =_d f(F^{−1}(q))(F^{−1}(G_n^{−1}(q)) − F^{−1}(q)) + (G_n(F(F^{−1}(q))) − q).   (6.3.4)

If F^{−1}(q) is a continuity point of F then F(F^{−1}(q)) can be replaced by q and,
moreover, if F^{−1} has a bounded second derivative then

  f(F^{−1}(q))(F^{−1}(G_n^{−1}(q)) − F^{−1}(q)) = (G_n^{−1}(q) − q) + O((G_n^{−1}(q) − q)²)

and hence results for the Bahadur statistic in the uniform case can easily be
extended to continuous d.f.'s F.
Probabilities of Moderate Deviation
Since we are interested in the Bahadur statistic as a technical tool we shall
confine our attention to a result concerning moderate deviations. The upper
bound for the accuracy of the stochastic approximation will be nonuniform
in q.
Theorem 6.3.1. For every s > 0 there exists a constant C(s) such that

  P{ |(G_n^{−1}(q) − q) + (G_n(q) − q)| > ((log n)/n)^{3/4} δ(q, s, n)   for some q ∈ (0,1) } ≤ C(s)n^{−s}

where δ(q, s, n) = 7(s + 3) max{ (q(1 − q))^{1/4}, (7(s + 3)(log n)/n)^{1/2} }.
Before proving Theorem 6.3.1 we make some comments and preparations.
Theorem 6.3.1 is sufficient as a technical tool in statistical applications;
however, one should know that sharp results concerning the stochastic
behavior of the Bahadur statistic exist in the literature. The following limit
theorem is due to Kiefer (1969a): For every t > 0,

  P{ sup_{q∈(0,1)} |(G_n^{−1}(q) − q) + (G_n(q) − q)| > n^{−3/4}(log n)^{1/2} t } → 2 Σ_{m=1}^{∞} (−1)^{m+1} e^{−2m²t⁴}

as n → ∞, where the summation runs over all positive integers m.
Kiefer's result indicates that Theorem 6.3.1 is sharp insofar as the factor
((log n)/n)^{3/4} cannot be replaced by some term of order o[((log n)/n)^{3/4}].
To prove Theorem 6.3.1 we shall use a simple result concerning the
oscillation of the sample d.f. For this purpose define the sample probability
measure Q_n by

  Q_n(A) = n^{−1} Σ_{i=1}^{n} 1_A(η_i)

where the η_i are i.i.d. random variables with common uniform distribution Q₀
on (0,1). Recall that the Glivenko–Cantelli theorem yields

  sup_{I∈𝓘} |Q_n(I) − Q₀(I)| → 0,   n → ∞,   w.p. 1,   (6.3.5)

where 𝓘 is the system of all intervals in (0,1).
Lemma 6.3.2 will indicate the rate of convergence in (6.3.5); moreover, this
result will show that the rate is better for those intervals I for which σ²(I) =
Q₀(I)(1 − Q₀(I)) is small.
Lemma 6.3.2. For every s > 0 there exists a constant A(s) such that for every n:

  P{ sup_{I∈𝓘} n^{1/2}|Q_n(I) − Q₀(I)| / max{σ(I), ((log n)/n)^{1/2}} ≥ (s + 3)(log n)^{1/2} } ≤ A(s)n^{−s}.

PROOF. Given ε, ρ > 0 we shall prove that

  K(ε, n) := P{ sup_{I∈𝓘} n^{1/2}|Q_n(I) − Q₀(I)| / max{σ(I), ρn^{−1/2}} ≥ ε }
           ≤ 2n² exp[ −ερ + 3ρ²/4 + 7/2 + 3/n ].   (6.3.6)
Then, an application of (6.3.6) to ρ = (log n)^{1/2} and ε = (s + 3)(log n)^{1/2}
yields the assertion.
Put 𝓘₀ = {(i/n, j/n]: 0 ≤ i < j ≤ n}. Straightforward calculations yield

  K(ε, n) ≤ P{ sup_{I∈𝓘₀} ( n^{1/2}|Q_n(I) − Q₀(I)| + 2n^{−1/2} ) / max{σ²(I) − 2/n − 4/n², ρ²/n}^{1/2} ≥ ε }
          ≤ Σ_{I∈𝓘₀} P{ n^{1/2}|Q_n(I) − Q₀(I)| ≥ ε(I) }   (1)

where ε(I) = ε max{σ²(I) − 2/n − 4/n², ρ²/n}^{1/2} − 2n^{−1/2}. Let I ∈ 𝓘₀ be fixed.
Assume w.l.g. that σ(I) > 0 and ερ ≥ 7/2 so that ε(I) > 0. Using the exponential
bound (3.1.1) with t = ε(I)/max{(σ²(I) − 2/n − 4/n²)^{1/2}, ρn^{−1/2}} we obtain

  P{ n^{1/2}|Q_n(I) − Q₀(I)| ≥ ε(I) } ≤ 2 exp[ −ερ + 3ρ²/4 + 7/2 + 3/n ].   (2)

Now, (1) and (2) yield (6.3.6). The proof is complete. □
Remark 6.3.3. Lemma 6.3.2 holds for any i.i.d. random variables (with
arbitrary common distribution Q in place of Q₀). The general case can be reduced
to the special case of Lemma 6.3.2 by means of the quantile transformation.
Lemma 6.3.2 together with the Borel–Cantelli lemma yields

  limsup_n sup_{I∈𝓘_n} n^{1/2}|Q_n(I) − Q₀(I)| / ( σ(I)(log n)^{1/2} ) ≤ 5   w.p. 1   (6.3.7)

where 𝓘_n = {I ∈ 𝓘: σ²(I) = Q₀(I)(1 − Q₀(I)) ≥ (log n)/n}. In this context, we
mention a result of Stute (1982) who proved a sharp result concerning the
almost sure behavior of the oscillation of the sample d.f.:

  limsup_n sup_{I∈𝓘_n*} n^{1/2}|Q_n(I) − Q₀(I)| / ( 2Q₀(I) log a_n^{−1} )^{1/2} = 1   w.p. 1   (6.3.8)

where 𝓘_n* = {I ∈ 𝓘: I = (a, b], αa_n ≤ Q₀(I) ≤ βa_n} with 0 < α < β < ∞, and
a_n has the properties a_n ↓ 0, na_n ↑ ∞, log a_n^{−1} = o(na_n) and (log a_n^{−1})/(log log n) →
∞ as n → ∞. Note that (6.3.8) shows that the rate in (6.3.7) is sharp.
Theorem 6.3.1 will be an immediate consequence of Lemma 6.3.2 and
Lemma 3.1.5, which concerns the maximum deviation of the sample q.f. G_n^{−1}
from the (0,1)-uniform q.f.
PROOF OF THEOREM 6.3.1. Since |G_n(G_n^{−1}(q)) − q| ≤ 1/n we obtain

  |G_n^{−1}(q) − q + (G_n(q) − q)| ≤ |G_n^{−1}(q) − G_n(G_n^{−1}(q)) + (G_n(q) − q)| + 1/n
    ≤ sup_{|x−q|≤κ} |x − G_n(x) + G_n(q) − q| + 1/n
    = sup_{I(q)} |Q_n(I(q)) − Q₀(I(q))| + 1/n

whenever |G_n^{−1}(q) − q| ≤ κ, and I(q) runs over all intervals (x, q] and (q, x] with
|x − q| ≤ κ. Thus, by Lemma 6.3.2 and Lemma 3.1.5 applied to κ = κ(q, s, n),
we get

  P{ |G_n^{−1}(q) − q + (G_n(q) − q)| ≤ ((log n)/n)^{3/4} δ(q, s, n),   q ∈ (0,1) }
    ≥ P{ sup_{I(q)} |Q_n(I(q)) − Q₀(I(q))| ≤ (s + 3)((log n)/n)^{1/2} κ(q, s, n)^{1/2},   q ∈ (0,1) } − B(s)n^{−s}
    ≥ 1 − [A(s) + B(s)]n^{−s}

where A(s) and B(s) are the constants of Lemma 6.3.2 and Lemma 3.1.5. The
proof is complete. □
6.4. Bootstrap Distribution Function of a Quantile
In this section we give a short introduction to Efron's bootstrap technique
and indicate its applicability to problems concerning order statistics.
Introduction
Since the sample d.f. F_n is a natural nonparametric estimator of the unknown
underlying d.f. F, it is plausible that the statistical functional T(F_n) is an
appropriate estimator of T(F) for a large class of functionals T.
In connection with covering probabilities and confidence intervals one is
interested in the d.f.

  T_n(F, t) = P_F{ T(F_n) − T(F) ≤ t }

of the centered statistic T(F_n) − T(F).
The basic idea of the bootstrap approach is to estimate the d.f. T_n(F, ·) by
means of the bootstrap d.f. T_n(F_n, ·). Thus, the underlying d.f. F is simply
replaced by the sample d.f. F_n. Let us touch on the following aspects:
(a) the calculation of the bootstrap d.f. by enumeration or, alternatively, by
Monte Carlo resampling,
(b) the validity of the bootstrap approach,
(c) the construction of confidence intervals for T(F) via the bootstrap
approach.
Evaluation of Bootstrap D.F.: Enumeration and Monte Carlo
Hereafter, let the observations x₁, …, x_n be generated according to n i.i.d.
random variables with common d.f. F. Denote by F_n^x the corresponding
realization of the sample d.f. F_n; thus, we have

  F_n^x(t) = n^{−1} Σ_{i=1}^{n} 1_{(−∞, t]}(x_i).

Since F_n^x is a discrete d.f. it is clear that the realization T_n(F_n^x, ·) of the
bootstrap d.f. T_n(F_n, ·) can be calculated by enumeration:
If x_i ≠ x_j for i ≠ j then T_n(F_n^x, t) is the relative frequency of vectors
z ∈ {x₁, …, x_n}^n which satisfy the condition

  T(F_n^z) − T(F_n^x) ≤ t.   (6.4.1)

Notice that inequality (6.4.1) has to be checked for n^n vectors z.
A Monte Carlo approximation to T_n(F_n^x, t) is given by the relative frequency
of pseudo-random vectors z₁, …, z_m satisfying (6.4.1), where z_i = (z_{i,1}, …, z_{i,n}).
The values z_{1,1}, …, z_{1,n}, z_{2,1}, …, z_{m,n} are pseudo-random numbers generated
according to the d.f. F_n^x. The sample size m should be large enough so that the
deviation of the Monte Carlo approximation from T_n(F_n^x, t) is negligible.
The 3σ-rule leads to a crude estimate of the necessary sample size. It says that the absolute deviation of the Monte Carlo approximation from T_n(F_n^x, t) is smaller than 3/(2m^{1/2}) with probability ≥ 0.99. Thus, if, e.g., a deviation of 0.005 is negligible, then one should take m = 90000.
These considerations show that the Monte Carlo procedure is preferable to the exact calculation of the bootstrap estimate by enumeration if m is small compared to n^n (which will be the case if n ≥ 10). In special cases it is possible to represent the bootstrap estimate by some analytical expression (see (6.4.2)).
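For illustration, the Monte Carlo procedure just described can be sketched in a few lines of code. The sketch below is our own illustration (the data, the function name, and the resampling size m are hypothetical choices, not part of the text); it estimates the bootstrap d.f. T_n(F_n^x, t) of the sample q-quantile by resampling with replacement. The 3σ-rule above would suggest m = 90000 for an accuracy of about 0.005.

```python
import random

def bootstrap_df_quantile(x, q, t, m, seed=0):
    """Monte Carlo estimate of T_n(F_n^x, t): the relative frequency among m
    resamples z from x whose sample q-quantile exceeds the observed one by at most t."""
    rng = random.Random(seed)
    n = len(x)
    # m(n) = nq if nq is an integer, [nq] + 1 otherwise
    mn = int(n * q) if n * q == int(n * q) else int(n * q) + 1
    observed = sorted(x)[mn - 1]                 # F_n^{-1}(q)
    hits = 0
    for _ in range(m):
        z = sorted(rng.choice(x) for _ in range(n))
        if z[mn - 1] - observed <= t:
            hits += 1
    return hits / m

# illustrative data; the 3-sigma rule would suggest m = 90000 for accuracy 0.005
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(20)]
p = bootstrap_df_quantile(data, q=0.4, t=0.5, m=2000)
```

Note that the exact enumeration would require n^n = 20^20 evaluations for this data set, whereas the Monte Carlo sketch needs only m resamples.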
A Counterexample: Sample Minima
Next, we examine the statistical performance of bootstrap estimates in the
particular cases of sample minima. This problem will serve as an example
where the bootstrap approach is not valid.
Let again α(F) = inf{x: F(x) > 0} denote the left endpoint of the d.f. F. The corresponding statistical functional α(F_n) is the sample minimum X_{1:n}. If α(F) > −∞ then, according to (1.3.3),

  T_n(F, t) = P{X_{1:n} − α(F) ≤ t} = 1 − [1 − F(α(F) + t)]^n,

and the same formula with F_n in place of F yields the bootstrap d.f. If F is continuous then, w.p. 1,

  T_n(F_n, 0) − T_n(F, 0) = 1 − (1 − 1/n)^n → 1 − exp(−1),   n → ∞.
Hence the bootstrap method leads to an inconsistent sequence of estimators.
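The inconsistency is easy to observe numerically. The following sketch (our own illustration; the function name and the sample sizes are hypothetical) estimates T_n(F_n, 0) = P*{X*_{1:n} = X_{1:n}} by resampling; the exact value 1 − (1 − 1/n)^n approaches 1 − e^{−1} ≈ 0.632 rather than 0.

```python
import random

def bootstrap_min_prob(n, m, seed=1):
    """Fraction of bootstrap resamples whose minimum equals the observed
    sample minimum; its expectation is 1 - (1 - 1/n)^n (ties occur w.p. 0)."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]   # continuous F: no ties w.p. 1
    xmin = min(x)
    hits = sum(min(rng.choice(x) for _ in range(n)) == xmin for _ in range(m))
    return hits / m

exact = lambda n: 1 - (1 - 1 / n) ** n     # -> 1 - exp(-1) ~ 0.632 as n grows
```

The resample minimum equals the observed minimum exactly when the smallest observation is drawn at least once, which happens with probability 1 − (1 − 1/n)^n.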
6. Other Important Approximations
Sample Quantiles: Exact Evaluation of Bootstrap D.F.
Monte Carlo simulations provide some knowledge about the accuracy of the
bootstrap procedure for a fixed sample size. Further insight into the validity
of the bootstrap method is obtained by asymptotic considerations.
The consistency of T_n(F_n, ·) holds if, e.g., the normalized d.f.'s T_n(F_n, ·) and T_n(F, ·) have the same limit as n goes to infinity. Then the accuracy of the bootstrap approximation will be determined by the rates of convergence of the two sequences of d.f.'s to the limiting d.f. As an example we study the bootstrap approximation to the d.f. of the sample q-quantile.
If T(F) = F^{−1}(q) then T(F_n) = F_n^{−1}(q) = X_{m(n):n}, where m(n) = nq if nq is an integer, and m(n) = [nq] + 1 otherwise. By Lemma 1.3.1,
  T_n(F, t) = Σ_{i=m(n)}^n C(n,i) [F(F^{−1}(q) + t)]^i [1 − F(F^{−1}(q) + t)]^{n−i}      (6.4.2)
and the same representation holds for T_n(F_n, t) with F^{−1} replaced by F_n^{−1}.
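Since F_n assigns mass 1/n to each observation, this representation reduces the bootstrap d.f. to a binomial tail probability which can be evaluated exactly. A small sketch (our own illustration; the function names are hypothetical):

```python
from math import comb

def binom_tail(n, k, p):
    """P{Bin(n, p) >= k} = sum_{i=k}^{n} C(n, i) p^i (1-p)^{n-i}."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def bootstrap_df_exact(x, q, t):
    """Exact bootstrap d.f. T_n(F_n, t) of the sample q-quantile via (6.4.2),
    with F replaced by the sample d.f. F_n of the observations x."""
    n = len(x)
    xs = sorted(x)
    mn = int(n * q) if n * q == int(n * q) else int(n * q) + 1
    u = xs[mn - 1] + t                        # F_n^{-1}(q) + t
    p = sum(v <= u for v in xs) / n           # F_n(F_n^{-1}(q) + t)
    return binom_tail(n, mn, p)
```

This replaces the enumeration over n^n resample vectors by a single binomial tail sum of at most n + 1 terms.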
From Theorem 4.1.4 we know that T_n(F, t), suitably normalized, approaches the standard normal d.f. Φ as n → ∞. The normalized version of T_n(F, t) is given by

  (6.4.3)

if F = Φ.
To prove that the bootstrap d.f. T_n(F_n, ·) is a consistent estimator of T_n(Φ, ·), one has to show that T_n^*(F_n, t) → Φ(t), n → ∞, for every t, w.p. 1.
Figure 6.4.1. Normalized d.f. T_n^*(Φ, ·) of sample q-quantile and bootstrap d.f. T_n^*(F_n, ·) for q = 0.4 and n = 20, 200.
The numerical calculations above were carried out by using the normal approximation to the d.f. of the sample quantile of i.i.d. (0,1)-uniformly distributed r.v.'s. Otherwise, the computation of the binomial coefficients would cause numerical difficulties. Computations for the sample size n = 20 showed that the error of this approximation is negligible.
From Figure 6.4.1 we see that T_20^*(Φ, ·) and T_200^*(Φ, ·) are close together (and, by the way, close to Φ), indicating quick convergence. The bootstrap d.f. T_n^*(F_n, ·) is a step function which slowly approaches Φ. Next, we indicate this rate of convergence.
Asymptotic Investigations
The further analysis will be simplified by using the normal approximation to the d.f. of the sample q-quantile of n i.i.d. (0,1)-uniformly distributed r.v.'s. From Corollary 1.2.7 and (4.2.1), applied to m = 1, we deduce
  T_n(F_n, t/n^{1/2}) − T_n(F, t/n^{1/2})
    = Φ[n^{1/2}(F_n(F_n^{−1}(q) + t/n^{1/2}) − q)/(q(1 − q))^{1/2}]
      − Φ[n^{1/2}(F(F^{−1}(q) + t/n^{1/2}) − q)/(q(1 − q))^{1/2}] + O(n^{−1/2})      (6.4.4)
    = Φ[tg_{n,t}/(q(1 − q))^{1/2}] − Φ[tf(F^{−1}(q))/(q(1 − q))^{1/2}] + o(1)
uniformly over t [where the second relation holds if F has a derivative, say, f(F^{−1}(q)) at F^{−1}(q)]. Moreover, the function g_{n,t} is defined by

  g_{n,t} = [F_n(F_n^{−1}(q) + t/n^{1/2}) − F_n(F_n^{−1}(q))]/(t/n^{1/2})   for t ≠ 0,      (6.4.5)

with a corresponding convention for t = 0.
The auxiliary function g_{n,t} is a "naive" estimator of the density at the random point F_n^{−1}(q). Thus, the stochastic behavior of the bootstrap error T_n(F_n, t/n^{1/2}) − T_n(F, t/n^{1/2}) is closely related to that of a density estimator. We have

  sup_t |T_n(F_n, t) − T_n(F, t)| → 0,   n → ∞,   w.p. 1,      (6.4.6)

that is, the bootstrap estimator is strongly consistent, if, w.p. 1, for every t ≠ 0,

  g_{n,t} → f(F^{−1}(q)),   n → ∞.      (6.4.7)
Let us assume that F has a derivative, say, f near F^{−1}(q) and that f is continuous at F^{−1}(q). From Lemma 3.1.7(ii) and the Borel–Cantelli lemma it follows that, w.p. 1, for every t ≠ 0,
  g_{n,t} = [F(F_n^{−1}(q) + t/n^{1/2}) − F(F_n^{−1}(q))]/(t/n^{1/2})
              + O[(F(F_n^{−1}(q) + t/n^{1/2}) − F(F_n^{−1}(q)))^{1/2}(log n)^{1/2}/n^{1/4} + (log n)/n^{1/2}]
           = f(F_n^{−1}(q) + θ_{n,t}t/n^{1/2})
              + O[(f(F_n^{−1}(q) + θ_{n,t}t/n^{1/2}))^{1/2}(log n)^{1/2}/n^{1/4} + (log n)/n^{1/2}]

eventually, for some θ_{n,t} ∈ (0, 1).
Thus, (6.4.7) holds because F_n^{−1}(q) is a strongly consistent estimator of F^{−1}(q) under the present conditions (compare with Lemma 1.2.9).
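The naive estimator g_{n,t} from (6.4.5) can be tried out directly. For (0,1)-uniform data and q = 0.4 the target value is f(F^{−1}(q)) = 1; the sketch below (our own illustration, with hypothetical names) is the literal difference quotient of the sample d.f.:

```python
import random

def g_nt(x, q, t):
    """Naive density estimator g_{n,t} of (6.4.5): difference quotient of the
    sample d.f. F_n at F_n^{-1}(q) with bandwidth t/n^{1/2}."""
    n = len(x)
    xs = sorted(x)
    mn = int(n * q) if n * q == int(n * q) else int(n * q) + 1
    xi = xs[mn - 1]                             # F_n^{-1}(q)
    h = t / n ** 0.5
    fn = lambda u: sum(v <= u for v in xs) / n  # sample d.f. F_n
    return (fn(xi + h) - fn(xi)) / h

rng = random.Random(6)
data = [rng.random() for _ in range(10000)]
g = g_nt(data, 0.4, 2.0)    # should be near f(F^{-1}(0.4)) = 1 for uniform data
```

The bandwidth t/n^{1/2} shrinks at the rate dictated by (6.4.4), which is why the bootstrap error behaves like a density-estimation error.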
It is easy to see that the proof developed above also leads to a bound on the rate of convergence which is, roughly speaking, of order O(n^{−1/4}) under slightly stronger conditions imposed on F.
An exact answer to the question concerning the accuracy of the bootstrap approximation can, e.g., be obtained by a law of the iterated logarithm as proved by Singh (1981): if F has a bounded second derivative near F^{−1}(q) and f(F^{−1}(q)) > 0, then
  lim sup_n (n^{1/4}/(log log n)^{1/2}) sup_t |T_n(F_n, t) − T_n(F, t)| = K_{q,F} > 0   w.p. 1,

where K_{q,F} is a constant depending on q and F only.
The accuracy of the bootstrap approach is also described in a theorem due to Falk and Reiss (1989) which concerns the weak convergence of the process Z_n defined by

where

  (6.4.8)

and φ = Φ'.
Theorem 6.4.1. Assume that F is a continuous d.f. having a derivative f near F^{−1}(q) which satisfies a local Lipschitz condition of order δ > 1/2, and that f(F^{−1}(q)) > 0. Then Z_n weakly converges to a process Z defined by

  Z(t) = B_1(−t) if t ≤ 0,   and   Z(t) = B_2(t) if t > 0,

where B_1 and B_2 are independent standard Brownian motions on [0, ∞).
We refer to Falk and Reiss (1989) for a detailed proof of Theorem 6.4.1 and for a definition of the weak convergence on the set of all right continuous functions on the real line having left-hand limits.
The basic idea of the proof is to examine the expressions in (6.4.3) and (6.4.5) conditioned on the sample q-quantile F_n^{−1}(q). Notice that the r.v.'s g_{n,t} only
depend on order statistics smaller (larger) than F_n^{−1}(q) if t ≤ 0 (if t > 0). Thus, it follows from Theorem 1.8.1 that, conditioned on F_n^{−1}(q), the processes (g_{n,t})_{t≤0} and (g_{n,t})_{t>0} are conditionally independent. Theorem 6.4.1 reveals that we get the unconditional independence in the limit.
The Maximum Deviation
Let T_n^*(F_n, ·) be the normalized bootstrap d.f. as defined in (6.4.3). Denote by H_n the normalized d.f. of the maximum deviation of the bootstrap d.f. T_n^*(F_n, t) from T_n^*(Φ, t) over |t| ≤ 3. More precisely, we have

  H_n(s) = P{n^{1/4} max_{|t|≤3} |T_n^*(F_n, t) − T_n^*(Φ, t)| ≤ s}.      (6.4.9)

We present a Monte Carlo result based on a sample of size N = 5000. Figure 6.4.2 shows that the asymptotic result in Theorem 6.4.1 is of relevance for small and moderate sample sizes.
Figure 6.4.2. Normalized d.f. H_n of maximum bootstrap error for q = 0.5 and n = 200, 2000, with H_200 ≈ H_2000.
Confidence Bounds
Next, we consider the problem of setting two-sided confidence bounds for the unknown parameter T(F). First, let us look at the problem from the point of view of a practitioner. One has to find a random variable c_n(α) such that

  (6.4.10)

The bootstrap solution is to take c_n(α) such that the bootstrap d.f. satisfies
  T_n(F_n, c_n(α)) − T_n(F_n, −c_n(α)) = 1 − α.
The validity of (6.4.10) can be made plausible by the argument that, uniformly over all t,

This idea will be made rigorous in the particular case of the q-quantile via asymptotic considerations.
If F has a derivative f near F^{−1}(q) and f is continuous at F^{−1}(q), then we know that

  n → ∞,

where u_n(α) = Φ^{−1}(1 − α/2)(q(1 − q)/n)^{1/2}/f(F^{−1}(q)). Moreover, by using the fact that

  sup_t |T_n(F_n, t) − T_n(F, t)| → 0,   n → ∞,   w.p. 1,

we obtain c_n(α)/u_n(α) → 1, n → ∞, w.p. 1. Hence, Slutsky's lemma yields (6.4.10).
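In code, c_n(α) can be read off from the Monte Carlo version of the bootstrap d.f. The sketch below is our own illustration (names and the resampling size are hypothetical); it returns bootstrap two-sided bounds for the sample q-quantile.

```python
import random

def bootstrap_bound(x, q, alpha, m=2000, seed=2):
    """Monte Carlo c_n(alpha): (1 - alpha)-quantile of the absolute bootstrap
    deviations |T(F_n*) - T(F_n)| of the sample q-quantile, so that
    T_n(F_n, c) - T_n(F_n, -c) is roughly 1 - alpha."""
    rng = random.Random(seed)
    n = len(x)
    mn = int(n * q) if n * q == int(n * q) else int(n * q) + 1
    theta = sorted(x)[mn - 1]                     # T(F_n) = F_n^{-1}(q)
    devs = sorted(abs(sorted(rng.choice(x) for _ in range(n))[mn - 1] - theta)
                  for _ in range(m))
    c = devs[min(m - 1, int((1 - alpha) * m))]
    return theta - c, theta + c                   # confidence bounds for T(F)
```

By construction the interval is symmetric about T(F_n), in line with the bootstrap equation for c_n(α) above.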
For a continuation of this topic we refer to Section 8.4 where the smooth
bootstrap is examined.
P.6. Problems and Supplements
1. (GramCharlier series of type A)
Let φ denote the density of the standard normal d.f. Φ. The Chebyshev–Hermite polynomials H_i = (−1)^i φ^{(i)}/φ are orthonormal w.r.t. the inner product (h, g) = ∫h(x)g(x)φ(x) dx (see Kendall and Stuart (1958), page 155). Write H_i = Σ_{j=0}^i e_{j,i}x^j. Denote by P_n the distribution and by μ_{n,j} the j-th moment of

  n^{1/2} f(F^{−1}(q)) (q(1 − q))^{−1/2} (X_{r(n):n} − F^{−1}(q)).
Prove, under the conditions of Theorem 6.1.1, that
  sup_B |P_n(B) − ∫_B φ(x)[1 + Σ_{i=1}^{3(m−1)} ((−1)^i c_i/i!) H_i(x)] dx| = O(n^{−m/2})

(Reiss, 1974b)
2. Under the conditions of Theorem 6.1.2, with q = 1/2, m = 3 and odd sample sizes n,
  P{X_{[n/2]+1:n} > F^{−1}(1/2) + λ/(2f_0n^{1/2}) − f_1λ²/(4n^{1/2}f_0²) − (1/(4n))(λ + λ³(1 − 2f_1²/f_0)/(6f_0³))} = α + O(n^{−3/2})

where λ = Φ^{−1}(1 − α) and f_i = f^{(i)}(F^{−1}(1/2)).
(F.N. David and N.L. Johnson, 1954)
(F.N. David and N. L. Johnson, 1954)
3. Let X_{i:n} be the i-th order statistic of n i.i.d. exponential r.v.'s ξ_1, …, ξ_n. Show that Σ_{i=1}^n a_iX_{i:n} is distributed as Σ_{j=1}^n b_{j:n}ξ_j, where

  b_{j:n} = (n − j + 1)^{−1} Σ_{i=j}^n a_i.

Moreover, with τ_n² = Σ_{i=1}^n b_{i:n}², we have

  P{τ_n^{−1} Σ_{i=1}^n a_i(X_{i:n} − EX_{i:n}) ≤ t} → Φ(t),   n → ∞,

if, and only if,

  max_{j=1,…,n} τ_n^{−1}|b_{j:n}| → 0,   n → ∞.
4. Prove an expansion of length 2 in Lemma 6.2.3(ii).
5. Show that the accuracy of the bootstrap approximation can be improved by treating
the standardized version
[Hint: Use (6.4.4).]
Bibliographical Notes
An approach related to that in Theorem 6.1.1 was adopted by Hodges and
Lehmann (1967) for expanding the variance of the sample median (without
rigorous proof). These investigations led to the famous paper by Hodges and
Lehmann (1970) concerning the second order efficiency (deficiency).
Concerning limit theorems for moments of extremes we refer to Pickands
(1968), Polfeldt (1970), Ramachandran (1984), and Resnick (1987).
Concerning linear combinations of order statistics we already mentioned
the book of Helmers (1982). A survey of other approaches for deriving limit
theorems for linear combinations of order statistics is given in the book of
Serfling (1980). A more recent result concerning linear combinations of order
statistics is due to van Zwet (1984): a representation as a symmetric statistic leads to a Berry–Esseen type theorem that is essentially equivalent to
Theorem 6.2.6.
Limit laws for sums of extremes and intermediate order statistics have attracted considerable attention in recent years. This problem is related to
that of weak convergence of sums of i.i.d. random variables to a stable law
(see Feller (1972)). Concerning weak laws we refer to the articles of M. Csörgö et al. (1986), S. Csörgö and D.M. Mason (1986), and S. Csörgö et al. (1986).
A. Janssen (1988) proved a corresponding limit law w.r.t. the variational
distance. An earlier notable article pertaining to this is that of Teugels (1981),
among others.
Spacings and functions of spacings (understood in the greater generality of m-step spacings) are dealt with in several parts of the book, e.g. in the context of estimating the quantile density function. We did not make any attempt to
cover this field to its full extent. For a comprehensive treatment of spacings see Pyke (1965, 1972). Several test statistics in nonparametric statistics are based on spacings. In the present context, the most interesting ones are perhaps those based on m-step spacings. For a survey of recent results we refer to the article of Jammalamadaka S. Rao and M. Kuo (1984). Interesting results concerning "systematic" statistics (including the χ²-test) are given by Miyamoto (1976).
A first improvement of Bahadur's original result in 1966 was achieved by
Kiefer (1967), namely a law of the iterated logarithm analogue for the Bahadur
approximation evaluated at a single point. Limit theorems like that stated in
Section 6.3 are contained in the article of Kiefer (1969a). Further extensions
concern (a) the weakening of conditions imposed on the underlying r.v.'s (see
e.g. Sen, 1972) and (b) nonuniform bounds for the remainder term of the
Bahadur approximation (e.g. Singh, 1979).
It was observed by Bickel and Freedman (1981) that bootstrapping leads
to inconsistent estimators in case of extremes. An interesting recent survey of
various techniques related to bootstrap was given by Beran (1985). We refer
to Klenk and Stute (1987) for an application of the bootstrap method to linear
combinations of order statistics.
CHAPTER 7
Approximations in the Multivariate Case
The title of this chapter should be regarded more as a program than as a
description of the content (in view of the declared aims of this book).
In Section 7.1 we shall give an outline of the present state of the art of the
asymptotic treatment of multivariate central order statistics.
In contrast to the field of central order statistics, a huge amount of literature
exists concerning the asymptotic behavior of multivariate extremes. For an
excellent treatment of this subject we refer to Galambos (1987) and Resnick
(1987). In Section 7.2 we shall present some elementary results concerning the
rate of convergence in the weak sense. Our interest will be focused on maxima
where the marginals are asymptotically independent. As an example we shall
compute the rate at which the marginal maxima of normal random vectors
become independent.
7.1. Asymptotic Normality of Central Order Statistics
Throughout this section, we assume that ξ_1, ξ_2, ξ_3, … is a sequence of i.i.d. random vectors of dimension d with common d.f. F. Let X_{r:n}^{(j)} be the r-th order statistic in the j-th component as defined in (2.1.4).
For j = 1, …, d, let I(j) ⊂ {1, …, n}. If F satisfies some mild regularity conditions then it is plausible that a collection of order statistics

  X_{r(j):n}^{(j)},   j = 1, …, d,   r(j) ∈ I(j),      (7.1.1)

is jointly asymptotically normal if for each j = 1, …, d the order statistics

  X_{r(j):n}^{(j)},   r(j) ∈ I(j),      (7.1.2)
7. Approximations in the Multivariate Case
230
have this property. We do not know whether this idea can be made rigorous,
though.
The asymptotic normality of order statistics can be proved via the device of Section 2.1, namely, to represent the d.f. of order statistics as the d.f. of a sum of i.i.d. random vectors. To simplify the writing let us study the 2-dimensional case. According to Section 2.1 we have

  P{X_{r(1,n):n}^{(1)} ≤ t_{1,n}, X_{r(2,n):n}^{(2)} ≤ t_{2,n}}
    = P{Σ_{i=1}^n (1_{(−∞,t_{1,n}]}(ξ_{i,1}), 1_{(−∞,t_{2,n}]}(ξ_{i,2})) ≥ r_n}      (7.1.3)

where ξ_i = (ξ_{i,1}, ξ_{i,2}), r_n = (r(1,n), r(2,n)), and the inequality is understood componentwise. On the right-hand side we are given the distribution of a sum of i.i.d. random vectors, whence the multidimensional central limit theorem is applicable.
Let 0 < q_1, q_2 < 1 be fixed and assume that

  n^{1/2}(r(i,n)/n − q_i) → 0,   n → ∞,   i = 1, 2.      (7.1.4)

According to the univariate case the appropriate choice of constants t_n = (t_{1,n}, t_{2,n}) is

  t_{i,n} = F_i^{−1}(q_i) + x_i/(n^{1/2}f_i),   i = 1, 2,      (7.1.5)
where F_i is the i-th marginal d.f. of F and f_i = F_i'(F_i^{−1}(q_i)). Let us rewrite the right-hand side of (7.1.3) as

  (7.1.6)

where the random vectors η_{i,n} are given by

  η_{i,n} = (1_{(−∞,t_{1,n}]}(ξ_{i,1}), 1_{(−∞,t_{2,n}]}(ξ_{i,2})) − (F_1(t_{1,n}), F_2(t_{2,n})),      (7.1.7)

and

  (7.1.8)

Obviously, η_{i,n}, i = 1, 2, …, n, are bounded i.i.d. random vectors with mean vector zero and covariance matrix Σ_n = (σ_{i,j,n}) given by

  σ_{i,i,n} = F_i(t_{i,n})(1 − F_i(t_{i,n})),   i = 1, 2,      (7.1.9)

and
Theorem 7.1.1. Assume that F is continuous at the point (F_1^{−1}(q_1), F_2^{−1}(q_2)). Moreover, for i = 1, 2, let F_i be differentiable at F_i^{−1}(q_i) with f_i = F_i'(F_i^{−1}(q_i)) > 0. Define Σ = (σ_{i,j}) by

  σ_{i,i} = q_i(1 − q_i),   i = 1, 2,

and

  (7.1.10)
231
7.1. Asymptotic Normality of Central Order Statistics
If det(Σ) ≠ 0 and condition (7.1.4) holds, then for every (x_1, x_2),

  n → ∞,      (7.1.11)

where Φ_Σ is the bivariate normal d.f. with mean vector zero and covariance matrix Σ.
PROOF. Let Σ_n and η_{i,n} be as in (7.1.9) and (7.1.7). Since Σ_n → Σ, n → ∞, we may assume w.l.g. that det(Σ_n) ≠ 0. Let T_n be a matrix such that T_n² = Σ_n^{−1} [compare with Bhattacharya and Rao (1976), (16.3) and (16.4)]. Then, according to a Berry–Esseen type theorem (see Bhattacharya and Rao (1976), Corollary 18.3) we get

  sup_z |P{n^{−1/2} Σ_{i=1}^n η_{i,n} ≤ z} − Φ_{Σ_n}(z)| ≤ cn^{−1/2} E‖T_nη_{1,n}‖₂³ = O(n^{−1/2})      (7.1.12)

for some constant c > 0. Here ‖·‖₂ denotes the Euclidean norm.
The differentiability of F_i at F_i^{−1}(q_i) and condition (7.1.4) yield that x_{i,n} → x_i, n → ∞, and hence

  n → ∞.      (7.1.13)

Combining (7.1.3), (7.1.6), (7.1.12), and (7.1.13) we obtain (7.1.11). □
The error rates in (7.1.11) can easily be computed under slightly stronger
regularity conditions imposed on F.
The condition det(Σ) ≠ 0 is a rather mild one. If ξ_i = (ζ_i, ζ_i) are random vectors having the same r.v. in both components then det(Σ) = 0 if q_1 = q_2 and det(Σ) ≠ 0 if q_1 ≠ q_2. It is clear that the two procedures of taking two order statistics X_{r:n}, X_{s:n} according to ζ_1, …, ζ_n, or order statistics X_{r:n}^{(1)}, X_{s:n}^{(2)} according to ξ_1, …, ξ_n, are identical. Thus, the situation of Section 4.5 can be regarded as a special case of the multivariate one.
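A quick Monte Carlo check of the marginal statement in Theorem 7.1.1 (our own illustration, under assumed uniform marginals; all names are hypothetical): for (0,1)-uniform observations and q = 1/2 one has f(F^{−1}(1/2)) = 1, so n^{1/2}(X_{m(n):n} − 1/2) should have variance close to q(1 − q) = 1/4.

```python
import random

def median_scaled_var(n, reps, seed=4):
    """Sample variance of n^{1/2} * (sample median - 1/2) over `reps`
    simulated uniform samples; the limiting value is q(1-q) = 1/4."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        med = sorted(rng.random() for _ in range(n))[n // 2]  # q = 1/2
        vals.append(n ** 0.5 * (med - 0.5))
    mean = sum(vals) / reps
    return sum((v - mean) ** 2 for v in vals) / (reps - 1)
```

The same experiment, run componentwise on dependent bivariate samples, would also exhibit the nonzero off-diagonal covariance discussed above.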
Next we give a straightforward generalization of Theorem 7.1.1 to the case d ≥ 3. We take one order statistic X_{r(i,n):n}^{(i)} out of each of the d components.
Theorem 7.1.2. Let ξ_1, ξ_2, … be a sequence of d-variate i.i.d. random vectors with common d.f. F. Denote by F_i and F_{i,j} the univariate and bivariate marginal d.f.'s of F. Let 0 < q_i < 1 for i = 1, …, d. Assume that F_{i,j} is continuous at the point (F_i^{−1}(q_i), F_j^{−1}(q_j)) for i, j = 1, …, d. Moreover, for i = 1, …, d, let F_i be differentiable at F_i^{−1}(q_i) with f_i = F_i'(F_i^{−1}(q_i)) > 0. Assume that

  n^{1/2}(r(i,n)/n − q_i) → 0,   n → ∞,   i = 1, …, d.      (7.1.14)

Define Σ = (σ_{i,j}) by

  σ_{i,i} = q_i(1 − q_i),   i = 1, …, d,

and

  (7.1.15)
If det(Σ) ≠ 0, then for every x = (x_1, …, x_d),

  P{n^{1/2}f_i[X_{r(i,n):n}^{(i)} − F_i^{−1}(q_i)] ≤ x_i, i = 1, …, d} → Φ_Σ(x),   n → ∞,      (7.1.16)

where Φ_Σ is the d-variate normal d.f. with mean vector zero and covariance matrix Σ.
7.2. Multivariate Extremes
In this section, we shall deal exclusively with maxima of d-variate i.i.d. random vectors ξ_{1,n}, …, ξ_{n,n} with common d.f. F_n. It is assumed that F_n has identical univariate marginals F_{n,i}. Thus,

  F_{n,1} = ⋯ = F_{n,d}.

It will be convenient to denote the d-variate maximum by

  M_n = (M_{n,1}, …, M_{n,d})

where M_{n,1}, …, M_{n,d} are the identically distributed univariate marginal maxima (compare with (2.1.8)) with common d.f. F_{n,1}^n. Recall that F_n^n is the d.f. of M_n.
Weak Convergence
Weak convergence is again the pointwise convergence of d.f.'s if the limiting d.f. is continuous, which will always be assumed in this section. The weak convergence of d-variate d.f.'s implies the weak convergence of the univariate marginal d.f.'s (since the projections are continuous). In particular, if F_n^n converges weakly to G_0 then the univariate marginal d.f.'s F_{n,1}^n also converge weakly to the univariate marginal G_{0,1} of G_0. Notice that G_0 also has identical univariate marginals. If G_{0,1} is nondegenerate then the results of Chapter 5 already give some insight into the present problem.
Recall from Section 2.2 that the d-variate d.f.'s x → Π_{i=1}^d G_{0,1}(x_i) and x → G_{0,1}(min(x_1, …, x_d)) represent the cases of independence and complete dependence.
Lemma 7.2.1. Assume that the univariate marginals F_{n,1}^n converge pointwise to the d.f. G_{0,1}.

(i) Then, for every x,

  Π_{i=1}^d G_{0,1}(x_i) ≤ lim inf_n F_n^n(x) ≤ lim sup_n F_n^n(x) ≤ G_{0,1}(min(x_1, …, x_d)).

(ii) If F_n^n converges pointwise to some right continuous function G then G is a d.f.

PROOF. Ad (i): Check that F_n^n(x) ≤ F_{n,1}^n(min(x_1, …, x_d)). Now, the upper bound is obvious.
Secondly, Bonferroni's inequality (see P.2.5(iv)) yields

  F_n^n(x) ≥ exp[−Σ_{j=1}^d n(1 − F_{n,1}(x_j))] + o(1)
           = Π_{j=1}^d exp[−n(1 − F_{n,1}(x_j))] + o(1)
           = Π_{j=1}^d G_{0,1}(x_j) + o(1).

Therefore, the lower bound also holds.
Ad (ii): Use (i) to prove that G is a normed function. Moreover, the pointwise convergence of F_n^n to G implies that G is Δ-monotone (see (2.2.19)). □
It is immediate from Lemma 7.2.1 that max-stable d.f.'s G_0 have the property

  Π_{i=1}^d G_{0,1}(x_i) ≤ G_0(x) ≤ G_{0,1}(min(x_1, …, x_d)).      (7.2.1)

Let ξ = (ξ_1, …, ξ_d) be a random vector with d.f. F. Recall from P.2.5 that for some universal constant C > 0,

  sup_t |F^n(t) − exp(Σ_{j=1}^d (−1)^j nh_j(t))| ≤ Cn^{−1}      (7.2.2)

where

  h_j(t) = Σ_{1≤i_1<⋯<i_j≤d} P{ξ_{i_1} > t_{i_1}, …, ξ_{i_j} > t_{i_j}},   j = 1, …, d.      (7.2.3)
Combining Lemma 7.2.1 and (7.2.2) we obtain

Corollary 7.2.2. Let ξ_n be a d-variate random vector with d.f. F_n. Define h_{n,j} in analogy to h_j in (7.2.3) with ξ replaced by ξ_n. Suppose that the univariate marginals F_{n,1}^n converge pointwise to a d.f. Moreover, for every j = 1, …, d,

  nh_{n,j} → h_{0,j},   n → ∞,   pointwise,

where h_{0,j}, j = 1, …, d, are right continuous functions. Then,

(i)

  G_0 = exp(Σ_{j=1}^d (−1)^j h_{0,j})

is a d.f., and

(ii)

  F_n^n(x) → G_0(x),   n → ∞,

for every x.
The formulation of Lemma 7.2.1 and Corollary 7.2.2 is influenced by a recent result due to Hüsler and Reiss (1989) where maxima of multivariate normal vectors, with correlation coefficients ρ(n) tending to 1 as n → ∞, are studied. In the bivariate case the following result holds: if

  (1 − ρ(n)) log n → λ²,   n → ∞,
then the normalized distributions of maxima weakly converge to a d.f. H_λ defined by

  H_λ(x, y) = exp[−Φ(λ + (x − y)/(2λ))e^{−y} − Φ(λ + (y − x)/(2λ))e^{−x}]      (7.2.4)

with

  H_0 = lim_{λ↓0} H_λ   and   H_∞ = lim_{λ↑∞} H_λ.

If λ = 0, the marginal maxima are asymptotically completely dependent; if λ = ∞, we have asymptotic independence. Notice that H_λ is max-stable and thus belongs to the usual class of multivariate extreme value d.f.'s.
Next, (7.2.2) will be specialized to the bivariate case. Let (ξ_n, η_n) be a random vector with d.f. F_n. The identical marginal d.f.'s are again denoted by F_{n,1} and F_{n,2}. According to (7.2.2),

  sup_{(x,y)} |F_n^n(x,y) − exp(−n(1 − F_{n,1}(x)) − n(1 − F_{n,1}(y)) + nL_n(x,y))| ≤ Cn^{−1}      (7.2.5)

where

  L_n(x,y) = P{ξ_n > x, η_n > y}

is the bivariate survivor function. Assume that

  F_{n,1}^n(x) → G_{0,1}(x),   n → ∞,      (7.2.6)

for every x, where G_{0,1} is a d.f. Then,

  F_n^n(x,y) = exp[−n(1 − F_{n,1}(x)) − n(1 − F_{n,1}(y)) + nL_n(x,y)] + O(n^{−1})
             = G_{0,1}(x)G_{0,1}(y) exp[nL_n(x,y)] + o(1).      (7.2.7)
Therefore, the asymptotic behavior of the bivariate survivor function is
decisive for the asymptotic behavior of the bivariate maximum. The convergence rate in the univariate case and the convergence rate of the survivor
functions determine the convergence rate for the bivariate maxima.
Asymptotic (Quadrant) Independence
We discuss the particular situation where the term nL_n(x,y) in (7.2.7) goes to zero as n → ∞. The following result is a trivial consequence of (7.2.7).

Lemma 7.2.3. Assume that (7.2.6) holds. For every (x,y) with G_{0,1}(x)G_{0,1}(y) > 0 the following equivalence holds:

  F_n^n(x,y) → G_{0,1}(x)G_{0,1}(y),   n → ∞,

if, and only if,

  nL_n(x,y) → 0,   n → ∞.      (7.2.8)
Thus, under condition (7.2.8) the marginal maxima M_{n,1} and M_{n,2} are asymptotically independent in the sense that (M_{n,1}, M_{n,2}) converges in distribution to a random vector with independent marginals.

Corollary 7.2.4. Let ξ and η be r.v.'s with common d.f. F such that F^n(b_n + a_n ·) → G weakly. Then the pertaining normalized maxima a_n^{−1}(M_{n,i} − b_n), i = 1, 2, are asymptotically independent if

  lim_{x↑ω(F)} P(ξ > x | η > x) = 0.      (7.2.9)

PROOF. Notice that (b_n + a_nx) ↑ ω(F) and n(1 − F(b_n + a_nx)) → −log G(x), n → ∞, for α(G_0) < x < ω(G_0), and hence the assertion is immediate from Lemma 7.2.3 applied to ξ_n = a_n^{−1}(ξ − b_n) and η_n = a_n^{−1}(η − b_n). □
It is well known that (7.2.9) is also necessary for the asymptotic independence. Moreover, Corollary 7.2.4 can easily be extended to the d-variate case
(see Galambos (1987), page 301, and Resnick (1987), Proposition 5.27).
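Condition (7.2.9) is easy to examine numerically for a bivariate normal pair, where it holds for ρ < 1 (cf. Sibuya's observation in the next subsection). The following Monte Carlo sketch is our own illustration (ρ and the sample sizes are hypothetical choices); it estimates P(ξ > x | η > x) for increasing x:

```python
import random

def cond_tail(rho, x, n, seed=5):
    """Monte Carlo estimate of P(xi > x | eta > x) for a standard bivariate
    normal (xi, eta) with correlation rho, via xi = rho*eta + sqrt(1-rho^2)*Z."""
    rng = random.Random(seed)
    s = (1 - rho * rho) ** 0.5
    both = tail = 0
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        if eta > x:
            tail += 1
            if rho * eta + s * rng.gauss(0.0, 1.0) > x:
                both += 1
    return both / tail

# the conditional probability decreases as x grows, in line with (7.2.9)
c0, c1, c2 = (cond_tail(0.5, x, 200000) for x in (0.0, 1.0, 2.0))
```

Since the same seed is reused for each x, the three estimates share common random numbers, which makes the monotone decrease easier to see.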
Next, Lemma 7.2.3 will be applied to prove that, for multivariate extremes, asymptotic pairwise independence of the marginal maxima implies asymptotic independence.

Theorem 7.2.5. Assume that (M_{n,1}, …, M_{n,d}) converges in distribution to a d-variate random vector with d.f. G_0. Then, asymptotic pairwise independence of the marginal maxima implies asymptotic independence.
PROOF. The Bonferroni inequality (see P.2.4 and P.2.5) implies that

  P{M_n ≤ x} ≤ exp(−Σ_{i=1}^d n(1 − F_{n,1}(x_i)) + Σ_{1≤i<j≤d} nL_{n,i,j}(x_i, x_j)) + o(1)
             = (Π_{i=1}^d G_{0,1}(x_i)) exp(Σ_{1≤i<j≤d} nL_{n,i,j}(x_i, x_j)) + o(1)

and

  P{M_n ≤ x} ≥ exp(−Σ_{i=1}^d n(1 − F_{n,1}(x_i))) + o(1) = Π_{i=1}^d G_{0,1}(x_i) + o(1),

where

  L_{n,i,j}(x_i, x_j) = P{ξ_i > x_i, ξ_j > x_j}

for a random vector (ξ_1, …, ξ_d) with d.f. F_n.
It remains to prove that

  exp(Σ_{1≤i<j≤d} nL_{n,i,j}(x_i, x_j)) → 1,   n → ∞,
for every x with Π_{i=1}^d G_{0,1}(x_i) > 0. This, however, is obvious from the fact that, according to Lemma 7.2.3, the pairwise independence implies

  nL_{n,i,j}(x_i, x_j) → 0,   n → ∞,

for every 1 ≤ i < j ≤ d. □
As an immediate consequence of Theorem 7.2.5 one gets

Theorem 7.2.6. Let ξ = (ξ_1, …, ξ_d) have a max-stable d.f. Then, the pairwise independence of ξ_1, …, ξ_d implies the independence.

In fact a much stronger result holds, as pointed out to me by J. Hüsler. If the r.v.'s ξ_1, …, ξ_d are uncorrelated and jointly have a max-stable d.f. then they are mutually independent (see P.7.2).
Rates for the Distance from Independence
For notational convenience we shall only study the bivariate case. From (7.2.5) and by noting that

  sup_x |F_{n,1}^n(x) − exp(−n(1 − F_{n,1}(x)))| ≤ Cn^{−1}      (7.2.10)

(compare with P.2.5(ii)) we get

  F_n^n(x,y) = F_{n,1}^n(x)F_{n,1}^n(y) exp[nL_n(x,y)] + O(n^{−1}).      (7.2.11)

From (7.2.11) we see that the term nL_n(x,y) determines the rate at which the independence of the marginal maxima is attained.
It is apparent from the proof of Theorem 7.2.5 that (7.2.11) can easily be extended to the case d ≥ 2.
Next, (7.2.11) will be specialized to bivariate normal vectors. It was observed by Sibuya (1960) that the marginal maxima of i.i.d. normal random vectors are asymptotically independent. In the following example we shall calculate the rate at which the marginal maxima become quadrant-independent.

EXAMPLE 7.2.7. Let F be the d.f. of a normal vector (ξ, η) where ξ and η are standard normal r.v.'s. Let ρ denote the covariance of ξ and η where −1 < ρ < 1. Put u_n(x) = b_n + b_n^{−1}x where again b_n = nφ(b_n). Then, for every x, y,

  F^n(u_n(x), u_n(y)) = Φ^n(u_n(x))Φ^n(u_n(y))[1 + O(n^{−(1−ρ)/(1+ρ)}(log n)^{−ρ/(1+ρ)})] + O(n^{−1}).      (7.2.12)

According to (7.2.11) we have to prove that

  nL_n(x,y) = nL(u_n(x), u_n(y)) = O(n^{−(1−ρ)/(1+ρ)}(log n)^{−ρ/(1+ρ)}).
It is well known that the normal distribution N(ρz, 1 − ρ²) is the conditional distribution of ξ given η = z. Thus,

  nL(u_n(x), u_n(y)) = n ∫_{u_n(y)}^∞ (1 − N(ρz, 1 − ρ²)((−∞, u_n(x)]))φ(z) dz
                     = n ∫_{u_n(y)}^∞ (1 − Φ[(u_n(x) − ρz)/(1 − ρ²)^{1/2}])φ(z) dz
                     = O(b_n^{−1} φ(b_n)^{(1−ρ)/(1+ρ)})

where the final step is carried out by using the inequality 1 − Φ(x) ≤ φ(x)/x, x > 0. We remark that for ρ > 0 the integration over z with y ≤ (u_n(x) − ρu_n(z))/(1 − ρ²)^{1/2} ≤ b_n has to be dealt with separately. Since b_n = O((log n)^{1/2}) the proof can easily be completed.
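The bound can be checked numerically. The sketch below is our own illustration (all names are hypothetical; b_n is obtained by bisection from b = nφ(b)); it evaluates nL(u_n(x), u_n(y)) by one-dimensional integration and shows it shrinking as n grows, in line with the rate O(n^{−(1−ρ)/(1+ρ)}(log n)^{−ρ/(1+ρ)}).

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def b_n(n):
    """Solve b = n*phi(b) by bisection (the root exceeds 1 for the n used here)."""
    lo, hi = 1.0, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if n * phi(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def nL(n, rho, x, y):
    """n * P{xi > u_n(x), eta > u_n(y)} for a standard bivariate normal pair,
    using xi | eta = z ~ N(rho*z, 1 - rho^2) and a Riemann sum over z."""
    b = b_n(n)
    ux, uy = b + x / b, b + y / b
    s = sqrt(1.0 - rho * rho)
    h, total, z = 1e-3, 0.0, uy
    while z < uy + 12.0:                  # the tail beyond this range is negligible
        total += (1.0 - Phi((ux - rho * z) / s)) * phi(z) * h
        z += h
    return n * total
```

For ρ = 1/2 the exponent (1 − ρ)/(1 + ρ) equals 1/3, so nL should drop by roughly a factor n^{1/3} between moderate and large n.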
Final Remarks
If one confines attention to asymptotically independent r.v.'s then it is natural to replace, in a first step, the original marginal r.v.'s by some independent versions. The calculation of an upper bound of the Hellinger distance between the distribution of a multivariate maximum and the joint distribution of the independent versions of the marginals is an open problem. In a second step one could apply Lemma 3.3.10 and the results of Section 5.2 to obtain an upper bound of the Hellinger distance between the original distribution and a limit distribution.
If we analyze the density of the normalized bivariate maximum, with normalizing constants a_n > 0 and b_n, in the form given in (2.2.8), we find that the decisive condition for the asymptotic independence, in the strong sense, is that the conditional d.f.'s F_1(b_n + a_nx | b_n + a_ny) and F_2(b_n + a_ny | b_n + a_nx) converge to 1 as n → ∞. Recall that the related condition (7.2.9) yields the asymptotic independence in the weak sense.
In the case of asymptotic independence the statistical results in the univariate case carry over to the multivariate case. If the marginals are asymptotically dependent then new statistical problems have to be solved (see e.g. P.2.11 and Bibliographical Notes).
P.7. Problems and Supplements
1. Denote by M_n the number of random vectors ξ_i in the random quadrant (−∞, X_{r(n):n}^{(1)}] × (−∞, X_{s(n):n}^{(2)}]. Under the conditions of Theorem 7.1.1 the random vectors (M_n, X_{r(n):n}^{(1)}, X_{s(n):n}^{(2)}) are asymptotically normal.
(Siddiqui, 1960)
2. (i) Let ξ = (ξ_1, …, ξ_d) have a max-stable d.f. Then, ξ_1, …, ξ_d are associated, that is, cov(g(ξ), f(ξ)) ≥ 0 for all componentwise nondecreasing, real-valued functions f, g, whenever the relevant expectations exist.
(Marshall and Olkin, 1983; for an extension see Resnick, 1987)
(ii) If ξ_1, …, ξ_d are associated and uncorrelated then they are mutually independent.
(Joag-Dev, 1983)
Bibliographical Notes
Under slightly stronger conditions than those stated in Theorems 7.1.1 and 7.1.2, Weiss (1964) proved the asymptotic normality of the d.f.'s of multivariate central order statistics. The proof is based on the normal approximation of the multinomial distribution.
The asymptotic normality of multivariate central order statistics was already proved by Mood (1941), in the special case of sample medians, and by Siddiqui (1960). In both articles the exact densities of the order statistics are computed. By using the normal approximation of the multinomial distribution it is then shown that the densities converge pointwise to the normal density. Thus, according to the Scheffé lemma, one also gets the convergence in the variational distance. Kuan and Ali (1960) verified the joint asymptotic normality of multivariate order statistics, including the case where several order statistics are taken from each component. It is evident that such ordered values define a grid in the Euclidean d-space. The frequencies of sample points in the cells of the grid define further r.v.'s. Weiss (1982) proved the joint asymptotic normality of multivariate order statistics and such associated cell frequencies.
The research work on multivariate maxima of i.i.d. random vectors started with the articles of J. Tiago de Oliveira (1958), J. Geffroy (1958/59), and M. Sibuya (1960). In the literature, further reference is given to Finkelstein (1953). From the beginning much attention was focused on the case where the marginal maxima are asymptotically independent. It was observed by S.M. Berman (1961) that for the components of an extreme value vector independence is equivalent to pairwise independence. In this context one also has to note that the marginal maxima are asymptotically, mutually (quadrant) independent when, and only when, this is true for each pair of marginal maxima [see e.g. Galambos (1987, Corollary 5.3.1) or Resnick (1987, Proposition 5.27)].
If measurements of a certain phenomenon are made at places close together then there will be a certain dependence between the observations which, in the present context, are supposed to be maxima. From the results of Section 2.2 it is apparent that the family of max-stable distributions is large enough to serve as a model for this situation. One may argue that this model is even so large that the problem has to be tackled of finding a smaller nonparametric or a parametric model. If one has some knowledge of the mechanism underlying the maxima then, speaking in mathematical terms, a limit theorem for maxima will single out certain max-stable distributions.
However, one has to face the difficulty of finding "attractive" multivariate
distributions under which the asymptotic distribution of maxima reflects the
dependence of the observed marginal maxima. In this context, the result
obtained by Hüsler and Reiss (1989) (see (7.2.4)) looks promising: distributions
of bivariate maxima are studied under normal distributions where the
correlation coefficient ρ(n) varies as the sample size increases. In the limit one
obtains a family of max-stable distributions describing situations between
independence and complete dependence.
We refer to Tiago de Oliveira (1984) for a review of parametric submodels
of bivariate max-stable d.f.'s. The nonparametric approach of Pickands for
estimating the dependence function (see P.2.11) has been pursued further by
Smith (1985b) by introducing the smoothing technique to multivariate extreme
value theory. This work has been continued in Smith et al. (1987), where
the kernel method is applied to the estimation of max-stable d.f.'s.
PART III
STATISTICAL MODELS
AND PROCEDURES
CHAPTER 8
Evaluating the Quantile and
Density Quantile Function
In this chapter
(a) we start with the "pure" nonparametric, statistical model,
(b) introduce smoothness conditions.
In Chapter 9 this discussion will be continued by studying
(c) semiparametric models,
(d) parametric extreme value models.
As pointed out in the Introduction, the sample q.f. F_n^{-1} is the natural
estimator of the underlying q.f. F^{-1}. In Section 8.1 some results will be
collected which concern the statistical performance of sample quantiles. It will
be shown, in particular, that statistical procedures built on sample quantiles
are optimal if the model is large enough.
Given the information that the unknown q.f. F^{-1} is a smooth function, one
should not use step functions generated by the sample q.f. as estimates of the
q.f. Consequently, two different classes of kernel type estimators will be
introduced in Section 8.2.
The first class of estimators is obtained by smoothing the sample q.f. by
means of a kernel. The second class of estimators is established in analogy to
the construction of the sample q.f. as the "inverse" of the sample d.f.: take the
"inverse" of the kernel type estimator of the d.f. Derivatives of the kernel type
estimates of the q.f. will be appropriate estimates of the density quantile
function (F^{-1})' = 1/f(F^{-1}), where f is the density of F.
8.1. Sample Quantiles
In this section we shall primarily study results for a fixed sample size. The
statistical procedures for evaluating the unknown q-quantile will be optimal
if the underlying model is large enough. The test, estimation, and confidence
procedures have to be randomized to satisfy the usual requirements in an exact
way (e.g. attainment of a level or median unbiasedness).
One-Sided Test of Quantiles

Let X_{i:n} be the ith order statistic of n i.i.d. random variables ξ_1, ξ_2, ..., ξ_n
with common continuous d.f. F. A basic problem is to test the null hypothesis
F^{-1}(q) ≤ u against F^{-1}(q) > u.
We shall briefly summarize some well-known facts concerning tests based
on sample quantiles.
Given α, q ∈ (0, 1) and a positive integer n, let r(α) ≡ r(α, q, n) be the largest
integer r ∈ {0, ..., n} such that

∑_{i=0}^{r-1} \binom{n}{i} q^i (1-q)^{n-i} ≤ α.   (8.1.1)

Notice that the left-hand side of (8.1.1) is equal to P{X_{r:n} > F^{-1}(q)}.
Keep in mind that r(α) also depends on q and n. Put X_{0:n} = -∞ and
X_{n+1:n} = ∞.
It is clear that {X_{r(α):n} > u} is a critical region of level α for testing

F^{-1}(q) ≤ u against F^{-1}(q) > u;

however, the level α will not be attained on the null hypothesis except in those
cases where equality holds in (8.1.1).
To define a test which is similar on {F: F^{-1}(q) = u} we introduce a
randomized test procedure based on two order statistics. Define the critical
function φ by

φ = 1      if X_{r(α):n} > u,
φ = γ(α)   if X_{r(α):n} ≤ u and X_{r(α)+1:n} > u,   (8.1.2)
φ = 0      if X_{r(α)+1:n} ≤ u,

where γ(α) ≡ γ(α, q, n) is the unique solution of the equation

∑_{i=0}^{r(α)-1} \binom{n}{i} q^i (1-q)^{n-i} + γ \binom{n}{r(α)} q^{r(α)} (1-q)^{n-r(α)} = α   (8.1.3)

with 0 ≤ γ < 1.
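The constants r(α) and γ(α) determined by (8.1.1) and (8.1.3) can be computed directly from the binomial probabilities; a minimal sketch in Python (the function names are ours, not the book's):

```python
from math import comb

def binom_head(r, q, n):
    # left-hand side of (8.1.1): sum_{i=0}^{r-1} C(n,i) q^i (1-q)^(n-i),
    # which equals P{X_{r:n} > F^{-1}(q)}
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(r))

def randomized_test_constants(alpha, q, n):
    # r(alpha): the largest r in {0,...,n} with binom_head(r, q, n) <= alpha
    r = max(s for s in range(n + 1) if binom_head(s, q, n) <= alpha)
    # gamma(alpha): the unique solution of (8.1.3) with 0 <= gamma < 1
    gamma = (alpha - binom_head(r, q, n)) / (comb(n, r) * q**r * (1 - q)**(n - r))
    return r, gamma
```

For example, α = 0.05, q = 1/2, n = 10 gives r(α) = 2 and γ(α) ≈ 0.893, so the test rejects when X_{2:10} > u and rejects with probability γ(α) when X_{2:10} ≤ u < X_{3:10}.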
Simple calculations show that the left-hand side of (8.1.3) is equal to

E_F φ = P{X_{r(α):n} > u} + γ P{X_{r(α):n} ≤ u, X_{r(α)+1:n} > u}.

We have

E_F φ ≤ α if F^{-1}(q) ≤ u,  and  E_F φ = α if F^{-1}(q) = u.   (8.1.4)
Moreover,

E_F φ = α  if F(u) = q.
The critical function φ as defined in (8.1.2) is uniformly most powerful for
testing F^{-1}(q) ≤ u against F^{-1}(q) > u. To prove this, consider the simple
testing problem F_0 against F_1, where F_1 is a d.f. with 0 < q_1 := F_1(u) < q; notice
that F_1(u) < q is equivalent to F_1^{-1}(q) > u. Define the d.f. F_0 via f_0 by
F_0(t) = ∫_{-∞}^{t} f_0 dF_1, where

f_0 = (q/q_1) 1_{(-∞,u]} + ((1-q)/(1-q_1)) 1_{(u,∞)}.   (8.1.5)

Denote by Q_i the probability measure belonging to F_i. Then f_0 is the
Q_1-density of Q_0. Easy calculations show that F_0(u) = q and hence F_0^{-1}(q) ≤ u.
It will turn out that F_0 is a "least favorable" null hypothesis.

Lemma 8.1.1. φ as defined in (8.1.2) is a most powerful critical function of level
α for testing F_0 against F_1.
PROOF. In view of the Fundamental Lemma of Neyman and Pearson it suffices
to prove that

φ = 1 iff ∏_{i=1}^{n} f_0(ξ_i) < c,   φ = 0 iff ∏_{i=1}^{n} f_0(ξ_i) > c

for some c > 0. Put S_n = ∑_{i=1}^{n} 1_{(-∞,u]}(ξ_i). We have

∏_{i=1}^{n} f_0(ξ_i) = ( q(1-q_1) / (q_1(1-q)) )^{S_n} ( (1-q)/(1-q_1) )^{n},

and hence ∏_{i=1}^{n} f_0(ξ_i) < c iff S_n < r(α), where c > 0 is defined by the equation

r(α) = [ log c − n log((1-q)/(1-q_1)) ] / log( q(1-q_1)/(q_1(1-q)) ).

From (1.1.8) we know that

S_n < r(α) iff X_{r(α):n} > u   and   S_n > r(α) iff X_{r(α)+1:n} ≤ u,

and hence φ is of the desired form. □

Corollary 8.1.2. The critical function φ defined in (8.1.2) is uniformly most
powerful of level α for testing F^{-1}(q) ≤ u against F^{-1}(q) > u.
PROOF. Obvious from (8.1.4) and Lemma 8.1.1 since the d.f. F_0 defined in
Lemma 8.1.1 is continuous. □
For k = 1, 2, 3, ... or k = ∞ we define ℱ_k as the family of all d.f.'s F which
possess the following properties:

(i) F has a (Lebesgue) density f,
(ii) f > 0 on (α(F), ω(F)),
(iii) f has k bounded derivatives on (α(F), ω(F)).   (8.1.6)

The crucial point of the conditions above is that the derivatives need
not be uniformly bounded over the given model.
Lemma 8.1.3. Let k = 1, 2, ... or k = ∞ be fixed.
Then φ as defined in (8.1.2) is a uniformly most powerful critical function of
level α for testing F^{-1}(q) ≤ u against F^{-1}(q) > u with F ∈ ℱ_k.

PROOF. Notice that F_0 (see the line before (8.1.5)) does not belong to ℱ_k.
If f_1 is the density of F_1 ∈ ℱ_k then F_0 has the density

f_0 = f_1 ( (q/q_1) 1_{(-∞,u]} + ((1-q)/(1-q_1)) 1_{(u,∞)} ).   (8.1.7)

Since q_1 < q it is clear that f_0 has a jump at u, thus F_0 ∉ ℱ_k. To make Lemma
8.1.1 applicable to the case k ≥ 1 one can choose d.f.'s G_m ∈ ℱ_k with G_m^{-1}(q) = u
having densities g_m such that g_m(x) → f_0(x) as m → ∞ for every x ≠ u. Then,
applying Fatou's lemma, one can prove that every critical function ψ of level
α on {F ∈ ℱ_k: F^{-1}(q) ≤ u} has the property E_{F_0} ψ ≤ α. Thus, Lemma 8.1.1
yields E_{F_1} ψ ≤ E_{F_1} φ and hence φ is uniformly most powerful. □
Randomized Estimators of Quantiles

Whereas randomized test procedures expressed in the form of critical
functions are widely accepted in statistics, this cannot be said of randomized
estimators. Therefore, we keep our explanations here as short as possible.
Nevertheless, we hope that the following lines and some further details in the
Supplements will create some interest.
Recall that the randomized sample median was defined in (1.7.19) as the
Markov kernel

M_n(·|ξ) = ( ε_{X_{[(n+1)/2]:n}} + ε_{X_{[(n+1)/2]+1:n}} ) / 2,   (8.1.8)

where ε_x again denotes the Dirac measure with mass 1 at x.
In Lemma 1.7.10 it was proved that M_n is median unbiased; that is, the
median of the underlying distribution is a median of the distribution of the
Markov kernel M_n. In analogy to (8.1.8) one can also construct a randomized
sample q-quantile which is a median unbiased estimator of the unknown
q-quantile.
Given q ∈ (0, 1) and the sample size n let

r ≡ r(1/2, q, n)  and  γ ≡ γ(1/2, q, n)

be defined as in (8.1.1) and (8.1.3). Define the randomized estimator Q_n by

Q_n(·|ξ) = (1 − γ) ε_{X_{r:n}} + γ ε_{X_{r+1:n}},   (8.1.9)

where X_{r:n} is the rth order statistic of n i.i.d. random variables with common
continuous d.f. F.
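In a simulation, the randomized estimator Q_n can be realized by drawing one of the two order statistics at random; a minimal sketch (the helper name and the explicit uniform draw are our choices):

```python
import random

def randomized_quantile(x_sorted, r, gamma, u=None):
    # Q_n from (8.1.9): X_{r:n} with probability 1 - gamma,
    # X_{r+1:n} with probability gamma (order statistics are 1-based)
    if u is None:
        u = random.random()
    return x_sorted[r - 1] if u >= gamma else x_sorted[r]
```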
From the results concerning test procedures one can deduce by routine
calculations that the randomized sample q-quantile is an optimal estimator
of the q-quantile in the class of all randomized, median unbiased estimators
which are equivariant under translations. Nonrandomized estimators will be
studied at the end of this section.
Randomized One-Sided Confidence Procedures

Another relevant source is Chapter 12 in Pfanzagl (1985). There the quantiles
serve as an example of an irregular functional in the sense that the standard
theory of 2nd order efficiency is not applicable. This is due to the fact that for
this particular functional a certain 2nd derivative does not exist. Hence, a
direct approach is necessary to establish upper bounds for the 2nd order
efficiency of the relevant statistical procedures.
Randomized statistics of the form (8.1.9) with r ≡ r(1 − p, q, n) and γ ≡
γ(1 − p, q, n) also define randomized, one-sided confidence procedures where
the lower confidence bound is X_{r:n} with probability 1 − γ and X_{r+1:n} with
probability γ. These confidence procedures are optimal among all procedures
that exactly attain the confidence level p. Pfanzagl proves that the asymptotic
efficiency still holds within an error bound of order o(n^{-1/2}) in the class of all
confidence procedures attaining the confidence level p + o(n^{-1/2}) uniformly in
a local sense (compare with Pfanzagl (1985), Proposition 12.3.3). A corresponding
result can be proved for test and estimation procedures.
Estimator Based on a Convex Combination
of Two Consecutive Order Statistics

For some fixed q ∈ (0, 1) define

q̂_n = (1 − γ(n)) X_{r(n):n} + γ(n) X_{r(n)+1:n},   (8.1.10)

where r(n) ≡ r(q, n) ∈ {1, ..., n} and γ(n) ∈ [0, 1) satisfy the equation

nq − r − γ + (1 + q)/3 = 0.   (8.1.11)
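Solving (8.1.11) for r(n) and γ(n) and forming (8.1.10) is a one-liner: the requirement γ(n) ∈ [0, 1) forces r(n) = ⌊nq + (1+q)/3⌋. A sketch (assuming q is not so extreme that r(n) falls outside {1, ..., n−1}):

```python
import math

def convex_quantile(x_sorted, q):
    # (8.1.10)-(8.1.11): q_hat = (1 - gamma) X_{r:n} + gamma X_{r+1:n}
    # with n*q - r - gamma + (1 + q)/3 = 0 and gamma in [0, 1)
    n = len(x_sorted)
    t = n * q + (1 + q) / 3
    r = math.floor(t)           # gamma = t - r then lies in [0, 1)
    gamma = t - r
    # assumes 1 <= r <= n - 1 (q bounded away from 0 and 1)
    return (1 - gamma) * x_sorted[r - 1] + gamma * x_sorted[r]
```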
Put σ² = q(1 − q). Under the conditions of Theorem 6.2.4 we get

sup_t | P{ n^{1/2} (f(F^{-1}(q))/σ) (q̂_n − F^{-1}(q)) ≤ t }
− ( Φ(t) − n^{-1/2} φ(t) [ (1 − 2q)/(3σ) − σ f'(F^{-1}(q)) t² / (2 f(F^{-1}(q))²) ] ) | = o(n^{-1/2}).   (8.1.12)

It is immediate that q̂_n is median unbiased of order o(n^{-1/2}).
Moreover, notice that q̂_n is equivariant under translations; that is, shifting
the observations amounts to the same as shifting the distribution of q̂_n. One
can prove that q̂_n is optimal in the class of all estimators that are equivariant
under translations and median unbiased of order o(n^{-1/2}). The related result
for confidence intervals is proved in Pfanzagl (1985), Proposition 12.3.9.
In the present section the statistical procedures are, roughly speaking,
based on the sample q-quantile. These procedures possess an optimality
property because the class of competitors was restricted by strong conditions
like exact median unbiasedness or median unbiasedness of order o(n^{-1/2}). If
these conditions are weakened then one can find better procedures. We refer
to Section 8.3 for a continuation of this discussion.
8.2. Kernel Type Estimators of Quantiles
Recall that the sample q-quantile F_n^{-1}(q) is given by

F_n^{-1}(q) = X_{i:n}  if (i − 1)/n < q ≤ i/n and q ∈ (0, 1), for i = 1, ..., n.

Thus, F_n^{-1} generates increasing step functions which have jumps at the
points i/n for i = 1, ..., n − 1. Throughout we define F_n^{-1}(0) = F_n^{-1}(0+) = X_{1:n}
and F_n^{-1}(1) = F_n^{-1}(1−) = X_{n:n}.
If the underlying q.f. F^{-1} is continuous or differentiable then it is desirable
to construct functions as estimates which share this property. Moreover, the
information that F^{-1} is a smooth curve should be utilized to obtain estimators
of a better statistical performance than that of the sample q.f. F_n^{-1}. The key
idea will be to average over the order statistics close to the sample q-quantile
for every q ∈ (0, 1).
The Polygon
In a first step we construct a piecewise linear version of the sample q.f. F_n^{-1}
by means of linear interpolation. Thus, given a predetermined partition
0 = q_0 < q_1 < ... < q_k < q_{k+1} = 1, we get an estimator of the form

F_n^{-1}(q_{j-1}) + ((q − q_{j-1})/(q_j − q_{j-1})) [F_n^{-1}(q_j) − F_n^{-1}(q_{j-1})],   q_{j-1} ≤ q ≤ q_j.   (8.2.1)

For j = 2, ..., k we may take values q_j such that q_j − q_{j-1} = β for some
appropriate "bandwidth" β > 0. This estimator evaluated at q is equal to the
sample q-quantile if q = q_j, and equal to [F_n^{-1}(q − β/2) + F_n^{-1}(q + β/2)]/2 if
q = (q_{j-1} + q_j)/2 for j = 2, ..., k. Notice that the derivative of the polygon on
(q_{j-1}, q_j) is equal to

[F_n^{-1}(q_j) − F_n^{-1}(q_{j-1})] / (q_j − q_{j-1}).
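The polygon (8.2.1), together with the sample q.f. it interpolates, can be sketched as follows (the helper names are ours):

```python
import bisect
import math

def sample_qf(x_sorted, p):
    # F_n^{-1}(p) = X_{i:n} for (i-1)/n < p <= i/n; endpoints as in the text
    n = len(x_sorted)
    i = min(n, max(1, math.ceil(p * n)))
    return x_sorted[i - 1]

def polygon_qf(x_sorted, q, grid):
    # linear interpolation of the sample q.f. over the partition
    # 0 = q_0 < q_1 < ... < q_{k+1} = 1                        (8.2.1)
    j = bisect.bisect_left(grid, q)      # grid[j-1] < q <= grid[j]
    q0, q1 = grid[j - 1], grid[j]
    f0, f1 = sample_qf(x_sorted, q0), sample_qf(x_sorted, q1)
    return f0 + (q - q0) / (q1 - q0) * (f1 - f0)
```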
Moving Scheme

This gives reason to construct another estimator of F^{-1} by using a "moving
scheme." For every q ∈ (0, 1) define the estimator of F^{-1}(q) by

[F_n^{-1}(q − β(q)) + F_n^{-1}(q + β(q))]/2,   (8.2.2)

where the "bandwidth function" β(q) has to be defined in such a way that
0 ≤ q − β(q) < q + β(q) ≤ 1. Given a predetermined value β ∈ (0, 1/2), the
bandwidth function β(q) can e.g. be defined by

β(q) = q if 0 < q < β,   β(q) = β if β ≤ q ≤ 1 − β,   β(q) = 1 − q if 1 − β < q < 1.   (8.2.3)

Another reasonable choice of a bandwidth function is

β(q) = q − q²/(4β) if 0 < q < 2β,   β(q) = β if 2β ≤ q ≤ 1 − 2β,
β(q) = (1 − q) − (1 − q)²/(4β) if 1 − 2β < q < 1,   (8.2.4)

where it is assumed that β ≤ 1/4. Notice that the bandwidth function in (8.2.4)
is differentiable.
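Both bandwidth functions are easy to code; a sketch of (8.2.3) and (8.2.4) side by side (one can check that the smooth version equals β at q = 2β, where its derivative also vanishes):

```python
def bandwidth_piecewise(q, beta):
    # (8.2.3): beta(q) = q, beta, or 1 - q on the three pieces
    if q < beta:
        return q
    if q > 1 - beta:
        return 1 - q
    return beta

def bandwidth_smooth(q, beta):
    # (8.2.4): differentiable on (0, 1); assumes beta <= 1/4
    if q < 2 * beta:
        return q - q * q / (4 * beta)
    if q > 1 - 2 * beta:
        return (1 - q) - (1 - q) ** 2 / (4 * beta)
    return beta
```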
The use of bandwidths depending on q can be justified by the following
arguments: since F_n^{-1}(q) is the natural, nonparametric estimator of F^{-1}(q),
it is clear that (8.2.2) defines an estimator of [F^{-1}(q − β(q)) + F^{-1}(q + β(q))]/2,
which in turn is approximately equal to F^{-1}(q) if F^{-1} is a smooth function near q
and if β(q) is not too large. However, if q is close to one of the endpoints of
the domain of F^{-1}, then one has to be cautious. If q or 1 − q is small then the
usual q.f.'s (e.g. normal or exponential) do not fulfill the required smoothness
condition. Thus, without further information about the form of the q.f. at the
endpoints of (0, 1), a statistician should again adopt the sample q.f. or an
estimator close to the sample q.f. This aim is achieved by using bandwidths
as defined above.
The use of variable bandwidths also enters the scene when a pointwise
optimal bandwidth (depending on the underlying d.f.) is estimated from
the data. In this case the bandwidth is random and depends on the given
argument q.
The polygon (Figure 8.2.1) and the moving scheme (Figure 8.2.2) are
based on n = 50 pseudo standard exponential random numbers. F^{-1} is the
standard exponential q.f.

Figure 8.2.1. F^{-1}, F_n^{-1}, polygon with n = 50, β = 0.1.

Figure 8.2.2. F^{-1}, moving scheme with n = 50, β = 0.1.
Quasi-Quantiles and Trimmed Means

The estimator in (8.2.2) can be written as

(X_{r(q):n} + X_{s(q):n})/2.   (8.2.5)

If n(q − β(q)) and n(q + β(q)) are not integers then we have r(q) = max(1, [n(q −
β(q))] + 1) and s(q) = min(n, [n(q + β(q))] + 1).
Another ad hoc estimator of the q-quantile is a certain "trimmed mean"
defined by

(s(q) − r(q) + 1)^{-1} ∑_{i=r(q)}^{s(q)} X_{i:n}.   (8.2.6)
To extend the class of estimators of the q-quantile we introduce estimators
of the form

F_{n,0}^{-1}(q) = ∑_{i=1}^{n} a_{i,n}(q) X_{i:n},   (8.2.7)

where the scores a_{i,n}(q) satisfy the condition

∑_{i=1}^{n} a_{i,n}(q) = 1.   (8.2.8)

Within this class of estimators we shall study those where the scores are
defined by a kernel. The "trimmed mean" will be closely related to a kernel
estimator which is based on a uniform kernel.
The Kernel Method
Since we shall also need a kernel estimator Fn o of the dJ. F we discuss the
method of smoothing a function via a kernel within a general framework.
Notice that the qJ. of Fn o will be another competitor of the sample qJ. as
an estimator of the underlying qJ. F 1
Hereafter, let H be a realvalued function with domain (a, b). Particular
cases are qJ.'s and dJ.'s with domain (0, 1) and, respectively, the real line.
We say that a realvalued function k with domain (a, b) x (a, b) is a kernel
iffor every x E (a, b),
= 1.
(8.2.9)
k(x,y)Hn(y)dy.
(8.2.10)
k(x, y) dy
Given an initial estimator Hn define
Hn.o(x) =
By partial integration we get the representation
Hn.o(x) =
K(x,y)dHn(y)
+ Hn(a+)
(8.2.11)
if Hn(a+) and Hn(b) exist and are finite where the function K is defined by
K(x,z) =
k(x, y) dy.
(8.2.12)
We shall study special kernels of the form

k(x, y) = (1/β(x)) u((x − y)/β(x)).   (8.2.13)

The function u is again called a kernel.
Kernel Estimators of Q.F.

Smoothing the sample q.f. according to (8.2.10) yields

F_{n,0}^{-1}(q) = ∫ k(q, y) F_n^{-1}(y) dy = ∑_{i=1}^{n} a_{i,n}(q) X_{i:n},   (8.2.14)

where the score functions a_{i,n} are given by

a_{i,n}(q) = ∫_{(i−1)/n}^{i/n} k(q, y) dy.

Obviously, condition (8.2.9) implies that the scores a_{i,n}(q) satisfy condition
(8.2.8).
Let u have the properties ∫ u(y) dy = 1 and u(x) = 0 for |x| > 1. Moreover,
assume that the bandwidth function β satisfies the condition β(q) ≤
min(q, 1 − q); e.g. the bandwidth functions in (8.2.3) and (8.2.4) satisfy this
condition. Then the kernel k defined in (8.2.13) satisfies (8.2.9). Now, F_{n,0}^{-1} can
be written in the form

F_{n,0}^{-1}(q) = ∫ u(y) F_n^{-1}(q − β(q)y) dy.   (8.2.15)

For β(q) defined in (8.2.3) and (8.2.4), the function q → q − β(q)y is nondecreasing
for every |y| ≤ 1, showing that F_{n,0}^{-1} is nondecreasing if u ≥ 0. Thus,
F_{n,0}^{-1} is in fact a q.f. Moreover, this construction has the favorable property
that the range of F_{n,0}^{-1} is a subset of the support of the underlying d.f. F.
Writing U(z) = ∫_{−1}^{z} u(y) dy we have

F_{n,0}^{-1}(q) = ∑_{i=1}^{n} [ U((q − (i−1)/n)/β(q)) − U((q − i/n)/β(q)) ] X_{i:n}.   (8.2.16)

It is easy to verify that the coefficients are equal to zero if i ≤ n(q − β(q)) or
i ≥ n(q + β(q)) + 1.
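Formula (8.2.16) translates directly into code once U is available; a sketch with U passed in as a function (the coefficients telescope, so they sum to 1 whenever β(q) ≤ min(q, 1 − q)):

```python
def kernel_qf(x_sorted, q, beta_q, U):
    # (8.2.16): sum_i [U((q-(i-1)/n)/beta(q)) - U((q-i/n)/beta(q))] X_{i:n}
    n = len(x_sorted)
    return sum(
        (U((q - (i - 1) / n) / beta_q) - U((q - i / n) / beta_q)) * x_sorted[i - 1]
        for i in range(1, n + 1)
    )
```

With a constant sample the estimator reproduces that constant, since the weights sum to 1.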
Kernel Estimators of D.F.

The kernel estimators of the d.f. are of the form

F_{n,0}(x) = β^{-1} ∫ u((x − y)/β) F_n(y) dy   (8.2.17)

or, alternatively,

F_{n,0}(x) = n^{-1} ∑_{i=1}^{n} U((x − ξ_i)/β),   (8.2.18)

where U(z) = ∫_{−∞}^{z} u(y) dy and u is a function such that ∫ u(y) dy = 1. If u ≥ 0
then F_{n,0} generates d.f.'s; hence, by constructing the corresponding q.f.'s, we
obtain a further estimator (F_{n,0})^{-1} of the q.f. F^{-1}.
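Form (8.2.18) is just an average of rescaled U-values; a minimal sketch:

```python
def kernel_df(data, x, beta, U):
    # (8.2.18): F_{n,0}(x) = n^{-1} sum_i U((x - xi_i)/beta)
    return sum(U((x - xi) / beta) for xi in data) / len(data)
```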
Density Estimation

The kernel method enables us to construct differentiable functions as estimates
of the d.f. F and the q.f. F^{-1}, although the initial estimates are step functions.
Thus, we get estimators of the density f = F' and the density quantile function
(F^{-1})' = 1/f(F^{-1}) as well.
From (8.2.18) we obtain

F_{n,1}(x) = F'_{n,0}(x) = (nβ)^{-1} ∑_{i=1}^{n} u((x − ξ_i)/β).   (8.2.19)

A corresponding formula holds for F_{n,1}^{-1} := (F_{n,0}^{-1})'. If the bandwidth function
is defined as in (8.2.3) then F_{n,0}^{-1} is differentiable on the interval (β, 1 − β). We
get

(F_{n,0}^{-1})'(q) = β^{-1} ∑_{i=1}^{n} [ u((q − (i−1)/n)/β) − u((q − i/n)/β) ] X_{i:n}   (8.2.20)

for β ≤ q ≤ 1 − β. With β(q) as in (8.2.4) the same representation of (F_{n,0}^{-1})' holds
for 2β ≤ q ≤ 1 − 2β. However, now (F_{n,0}^{-1})' also exists on (0, 1) and can easily be
computed.
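The density estimator (8.2.19) is the classical kernel density estimate; a sketch:

```python
def kernel_density(data, x, beta, u):
    # (8.2.19): F_{n,1}(x) = (n beta)^{-1} sum_i u((x - xi_i)/beta)
    n = len(data)
    return sum(u((x - xi) / beta) for xi in data) / (n * beta)
```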
Some Illustrations

In the sequel we shall apply the Epanechnikov kernel defined by

u(x) = (3/4)(1 − x²) 1_{[−1,1]}(x).   (8.2.21)

Notice that

U(x) = 0 if x < −1,   U(x) = 1/2 + 3x/4 − x³/4 if −1 ≤ x ≤ 1,   U(x) = 1 if x > 1.
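The Epanechnikov kernel (8.2.21) and its integral U in code:

```python
def epanechnikov(x):
    # u(x) = (3/4)(1 - x^2) on [-1, 1], zero outside           (8.2.21)
    return 0.75 * (1.0 - x * x) if -1.0 <= x <= 1.0 else 0.0

def epanechnikov_U(x):
    # U(x) = integral of u from -1 to x: 1/2 + 3x/4 - x^3/4 on [-1, 1]
    if x < -1.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return 0.5 + 0.75 * x - 0.25 * x ** 3
```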
In Figures 8.2.3 and 8.2.4, the kernel q.f. F_{n,0}^{-1} and the q.f. (F_{n,0})^{-1} of the kernel
d.f. F_{n,0} are based on n = 100 pseudo standard exponential random numbers.
For q bounded away from 0 and 1 one realizes that F_{n,0}^{-1} and (F_{n,0})^{-1} have
about the same performance. Near 0 and 1 the estimate taken from (F_{n,0})^{-1}
has the unpleasant property that (a) it is inaccurate and (b) it attains values
which do not belong to the support of the exponential d.f. The second property
is of course not very surprising. To avoid this unpleasant behavior of (F_{n,0})^{-1}
one should modify F_{n,0}(x) in such a way that the bandwidth depends on x.

Figure 8.2.3. F^{-1}, F_n^{-1}, and F_{n,0}^{-1} with n = 100, β = 0.08.

Figure 8.2.4. F^{-1}, F_n^{-1}, and (F_{n,0})^{-1} with n = 100, β = 0.08.
Figures 8.2.3 and 8.2.4 show clearly that the kernel estimates reduce the
random fluctuation of the "natural" estimates, thus also reducing the maximum
deviation from the underlying d.f.
Next, F_{n,0}^{-1} and (F_{n,0})^{-1} will be evaluated at the right end of the domain.
Again F_{n,0}^{-1} is defined with the bandwidth function in (8.2.4).

Figure 8.2.5. F^{-1}, F_n^{-1}, and F_{n,0}^{-1} with n = 100, β = 0.08.

Figure 8.2.6. F^{-1}, F_n^{-1}, and (F_{n,0})^{-1} with n = 100, β = 0.08.

At first I thought there was an error in the computer program
when the graph in Figure 8.2.6 appeared on the screen. The graph of (F_{n,0})^{-1}
can hardly be distinguished from the sample q.f. The explanation for (F_{n,0})^{-1}
being close to the sample q.f. is that the largest order statistics are not close
to each other, and so the kernel d.f. with the bandwidth β = 0.08 does not
smooth the sample d.f.
Parametric versus Nonparametric Estimation

Finally, we examine the estimation of the standard normal q.f. This situation
is related to estimating the exponential q.f. near the right endpoint of the
domain.
In addition to the smoothed sample q.f. F_{n,0}^{-1}, we shall take the estimator
μ̂_n + σ̂_n Φ^{-1}, where (μ̂_n, σ̂_n) is the maximum likelihood (m.l.) estimator of the
location and scale parameter of the normal d.f. The kernel q.f. is again defined
by means of the Epanechnikov kernel.
In Figures 8.2.7 and 8.2.8 the observations are sampled according to the
standard normal d.f. We remark that the m.l. estimate (μ̂_n, σ̂_n) of (μ, σ) has the
value (0.028, 1.032).
The performance of the estimators is bad near the endpoints 0 and 1 of the
domain. This is not surprising since the parametric estimate μ̂_n + σ̂_n Φ^{-1} does
not converge to Φ^{-1} uniformly over (0, 1). Notice that

μ̂_n + σ̂_n Φ^{-1} − Φ^{-1}

is an unbounded function whenever σ̂_n ≠ 1. Thus, Figure 8.2.8 is misleading
to some extent.

Figure 8.2.7. F_n^{-1} and F_{n,0}^{-1} with n = 100, β = 0.08.

Figure 8.2.8. Φ^{-1} (dotted curve), μ̂_n + σ̂_n Φ^{-1}, and F_{n,0}^{-1} with n = 100, β = 0.08.
Fitting a Density to Data

For the visual comparison of two different d.f.'s the probability paper plays a
dominant role (see e.g. Gumbel's book or Barnett (1975)). For this purpose
the "theoretical" d.f. is transformed to a straight line. When applying the same
transformation to the sample d.f., a deviation of the transformed sample d.f.
from the straight line can easily be detected.
It can be advisable to compare distributions by their densities. One advantage
is that one can see the original form of the distribution. The data will
visually be represented by means of the kernel density f_n = F_{n,1} as introduced
in (8.2.19). In a second step an extreme value density is fitted to the kernel
density. We suppose that the graphs given in Sections 1.3 and 5.1 have already
sensitized the reader to extreme value densities.
We shall examine the monthly and annual maxima of the temperature at
De Bilt (Netherlands). Data of 133 years (1849–1981) are available and were
first studied by M.A.J. van Montfort (1982). The plot of the annual
maxima on normal probability paper shows an excellent fit of a normal
distribution. Van Montfort points out the resemblance of normal distributions
and certain "symmetric" Weibull distributions (compare also Figure
1.3.4). The author is grateful to van Montfort for a translation of his paragraph
8.3 (written in Dutch) and for providing the data. Despite van Montfort's
remark, Sneyers (1984) considers this as an "... example of an extreme value
distribution following not a Fisher–Tippett asymptote ...".
Below, Weibull densities with location, scale, and shape parameters μ, σ,
and α are fitted to kernel densities based on monthly maxima of the temperature.
The kernel density is defined with the Epanechnikov kernel and the
bandwidth β = 2.0. A better fit can be achieved by more smoothing, that is,
by a larger bandwidth.

Figure 8.2.9. July: kernel density and Weibull density with parameters μ = 37.3,
σ = 8.5, α = 2.7.

Figure 8.2.10. September: kernel density and Weibull density with parameters
μ = 44.0, σ = 19.8, α = 7.0.
We see that the densities of the maxima of temperature in July (Fig. 8.2.9)
are skewed to the left; those for September (Fig. 8.2.10) are skewed to the right.
Below we also include the corresponding Weibull densities for June and
August, which are close together. That for June is nearly symmetric and that
for August is slightly skewed to the right. The kernel density for annual
maxima is nearly symmetric.

Figure 8.2.11. Kernel density for annual maxima; Weibull densities: June: μ = 38.2,
σ = 10.6, α = 3.9; July: μ = 37.3, σ = 8.5, α = 2.7; August: μ = 39.8, σ = 12.2, α = 4.1;
September: μ = 44.0, σ = 19.8, α = 7.0.

The largest observed values of monthly maxima within 133 years are
(a) 36.8 in June 1947, (b) 35.6 in July 1911, (c) 35.8 in August 1857, and (d) 34.2
in September 1949.
We suggest classifying the annual maximum as a maximum of independent,
not identically distributed Weibull r.v.'s according to the maxima in June,
July, August, and September. According to (1.3.4), the calculation of the d.f.
and the density of the maximum of not identically distributed r.v.'s creates no
difficulties. The resulting density shows an excellent fit to the kernel density
of the annual maxima as given in Figure 8.2.11.
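The calculation referred to is elementary: for independent random variables the d.f. of the maximum is the product of the d.f.'s, and the density follows by the product rule. A sketch, under our assumption that each month is modeled by a Weibull distribution with right endpoint μ (the parametrization matching the fitted densities above):

```python
import math

def weibull_max_cdf(x, mu, sigma, alpha):
    # d.f. with right endpoint mu: exp(-((mu - x)/sigma)^alpha) for x < mu
    if x >= mu:
        return 1.0
    return math.exp(-(((mu - x) / sigma) ** alpha))

def weibull_max_pdf(x, mu, sigma, alpha):
    if x >= mu:
        return 0.0
    z = (mu - x) / sigma
    return (alpha / sigma) * z ** (alpha - 1) * math.exp(-(z ** alpha))

def annual_max_cdf_pdf(x, params):
    # F(x) = prod_i F_i(x);  f(x) = sum_i f_i(x) * prod_{j != i} F_j(x)
    Fs = [weibull_max_cdf(x, *p) for p in params]
    F = math.prod(Fs)
    f = sum(
        weibull_max_pdf(x, *p) * math.prod(Fs[:i] + Fs[i + 1:])
        for i, p in enumerate(params)
    )
    return F, f
```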
The choice of the Weibull density was accomplished by some visual,
subjective judgment. To obtain an automatic procedure one should fix a
distance between densities like the maximum deviation, the χ²-distance, the
Hellinger distance, or some other distance. Then, take the parameter (μ, σ, α)
which minimizes the distance between the kernel density and the Weibull
density. From the foregoing remarks it becomes obvious that our estimates
are produced by some kind of minimum distance method. By using this method
we obtain larger estimates of the unknown right endpoint than by taking the
sample maximum. Recall that the "minimum distance" estimates are 38.2, 37.3,
39.8, and 44.0 compared to the sample maxima 36.8, 35.6, 35.8, and 34.2. The
difference is particularly significant in those cases where the density is skewed
to the right.
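An automatic version of this fit is a simple grid search: fix a distance (here the maximum deviation on a grid of temperatures, one of the distances mentioned above) and pick the parameter triple minimizing it. A sketch (the candidate grid and the Weibull parametrization with right endpoint μ are our assumptions):

```python
import math

def weibull_density(x, mu, sigma, alpha):
    # density with right endpoint mu (our parametrization of the fitted maxima)
    if x >= mu:
        return 0.0
    z = (mu - x) / sigma
    return (alpha / sigma) * z ** (alpha - 1) * math.exp(-(z ** alpha))

def minimum_distance_fit(target_density, grid, candidates):
    # pick (mu, sigma, alpha) minimizing the maximum deviation on `grid`
    def max_dev(p):
        return max(abs(target_density(x) - weibull_density(x, *p)) for x in grid)
    return min(candidates, key=max_dev)
```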
Hosking (1985) developed a modified Newton–Raphson iteration algorithm
for solving the maximum likelihood equation in the 3-parameter extreme
value model (given by the von Mises parametrization). This algorithm
seems to work if |β| < 0.5. When using the "minimum distance" estimates
given in Figure 8.2.11 as initial estimates, one obtains the following
estimates:
June: μ = 39.1, σ = 11.4, α = 4.3;    July: μ = 36.2, σ = 7.4, α = 2.4;
August: μ = 38.7, σ = 11.1, α = 4.0;    September: μ = 38.8, σ = 14.4, α = 5.2.

The densities pertaining to the maximum likelihood estimates again show
an excellent fit to the kernel (sample) densities.
8.3. Asymptotic Performance of Quantile Estimators

The kernel estimator of the q.f. is given by

F_{n,0}^{-1}(q) = β^{-1} ∫ u((q − y)/β) F_n^{-1}(y) dy   (8.3.1)

if 2β < q < 1 − 2β. Notice that under appropriate regularity conditions the
ith derivative F_{n,0}^{-1(i)} of F_{n,0}^{-1} is given by

F_{n,0}^{-1(i)}(q) = β^{-i} ∫ u^{(i)}(y) F_n^{-1}(q − βy) dy.   (8.3.2)
Moderate Deviations

Our first aim will be to deduce rough bounds for the rate of convergence of
kernel estimators of the q.f. and its derivatives. For this purpose we shall
again study the oscillation property of the sample q.f.
The basic tool for the following considerations will be Lemma 3.1.7(ii),
which describes the stochastic behavior of

F_n^{-1}(q_2) − F_n^{-1}(q_1) − (F^{-1}(q_2) − F^{-1}(q_1))   (8.3.3)

uniformly over q_1, q_2 with 0 < p_1 ≤ q_1 < q_2 ≤ p_2 < 1.
In the sequel we shall assume that the kernel u satisfies the following
regularity conditions:

Condition 8.3.1. Let m be a positive integer. Assume that
(i) u has the support [−1, 1],
(ii) u has m + 1 derivatives,
(iii) ∫ u(y) dy = 1.

Integration by parts yields

∫ u^{(i)}(y) y^i dy = (−1)^i i!,  i = 0, ..., m + 1,   (8.3.4)

and

∫ u^{(i)}(y) y^j dy = 0,  0 ≤ j < i ≤ m + 1.
Condition 8.3.2. Let k be a positive integer. Assume that

∫ u(y) y^j dy = 0,  j = 1, ..., k.

Under Conditions 8.3.1 and 8.3.2 we get, by means of integration by parts,
that

∫ u^{(i)}(y) y^{i+j} dy = 0,  i = 0, ..., m + 1 and j = 1, ..., k.   (8.3.5)

The following representation of F_{n,0}^{-1(i)} will be useful:

F_{n,0}^{-1(i)}(q) = (F^{-1})^{(i)}(q) + R_{i,n}(q),   (8.3.6)

where the remainder term is given by

R_{i,n}(q) = β^{-i} ∫ u^{(i)}(y) [ F_n^{-1}(q − βy) − F_n^{-1}(q) − (F^{-1}(q − βy) − F^{-1}(q)) ] dy
+ β^{-i} ∫ u^{(i)}(y) [ F^{-1}(q − βy) − F^{-1}(q) − ∑_{j=1}^{i+k} ((−βy)^j / j!) (F^{-1})^{(j)}(q) ] dy   (8.3.7)

if again 2β < q < 1 − 2β and if the derivatives of F^{-1} at q exist.
We remark that (8.3.7) always holds for k = 0.
The representation above shows that R_{i,n}(q) splits up (a) into a random
part which is governed by the oscillation behavior of the sample q.f. and (b)
into a nonrandom part which depends on the remainder term of a Taylor
expansion of F^{-1} about q.
It is evident that a similar representation holds for the sample d.f. F_n in
place of the sample q.f. F_n^{-1}. Recall that the oscillation behavior of F_n was
studied in Remark 6.3.3.
The histograms with random or nonrandom cells are based on terms of
the form

(F_n^{-1}(q_2) − F_n^{-1}(q_1))/(q_2 − q_1)   or   (F_n(t_2) − F_n(t_1))/(t_2 − t_1).

Thus, the oscillation behavior of the sample q.f. and the sample d.f. can be
regarded as a property which summarizes the properties of histograms.
The representation (8.3.6) shows that the stochastic behavior of kernel
estimators of the q.f. is exhaustively determined by the oscillation behavior of
the sample q.f.
Next, we give a technical result which concerns the moderate deviation of
the kernel q.f. from the underlying q.f.

Lemma 8.3.3. Suppose that Conditions 8.3.1 and 8.3.2 hold for some m ≥ 1 and
k = m − 1. Moreover, assume that the q.f. F^{-1} has m + 1 bounded derivatives
on a neighborhood of the interval (p_1, p_2).
Then, for every s > 0 and every sufficiently small β ≥ (log n)/n there exist
constants B, C > 0 (being independent of β and n) such that

(i) P{ sup_{p_1 ≤ q ≤ p_2} |F_{n,0}^{-1}(q) − F^{-1}(q)| > C[ (β(log n)/n)^{1/2} + β^m ] } < B n^{-s},

(ii) P{ sup_{p_1 ≤ q ≤ p_2} |F_{n,0}^{-1(i)}(q) − (F^{-1})^{(i)}(q)| > C[ ((log n)/(nβ^{2i−1}))^{1/2} + β^{m+1−i} ] } < B n^{-s}
for i = 1, ..., m, and

(iii) P{ sup_{p_1 ≤ q ≤ p_2} |F_{n,0}^{-1(m+1)}(q)| > C[ ((log n)/(nβ^{2m+1}))^{1/2} + 1 ] } < B n^{-s}.

PROOF. Immediate from Lemma 3.1.7(ii) and (8.3.6). □
It is easy to see that Lemma 8.3.3(i) holds with ((log n)/n)^{1/2} in place of
(β(log n)/n)^{1/2} if 0 < β ≤ (log n)/n. This yields that for every ε > 0,

P{ sup_{p_1 ≤ q ≤ p_2} | n^{1/2}(F_{n,0}^{-1}(q) − F^{-1}(q)) − n^{1/2}(F_n^{-1}(q) − F^{-1}(q)) | ≥ ε } → 0   (8.3.8)

as n → ∞ for every sequence of bandwidths β ≡ β_n with nβ_n^{2m} → 0, n → ∞.
This means that the quantile process n^{1/2}(F_n^{-1} − F^{-1}) and the smooth
quantile process n^{1/2}(F_{n,0}^{-1} − F^{-1}) have the same asymptotic behavior on the
interval (p_1, p_2). Lemma 8.3.3 shows that, with high probability, the kernel
estimates of the q.f. are remarkably smooth. This fact is basic for the considerations
of Section 8.4.
Kernel Estimators Evaluated at a Fixed Point

The results above do not enable us to distinguish between the asymptotic
performance of the sample q.f. and the kernel estimator of the q.f. This is
possible if a limit theorem together with a bound for the remainder term is
established. The first theorem, taken from Reiss (1981c), concerns the estimation
of the d.f.

Theorem 8.3.4. Let F_{n,0} be the kernel estimator of the d.f. as given in (8.2.18).
Suppose that the kernel u satisfies Conditions 8.3.1(i), (ii), and 8.3.2 for some
k ≥ 1. Moreover, let F have k + 1 derivatives on a neighborhood of the fixed
point t such that |F^{(k+1)}| ≤ A.
Then, uniformly over the bandwidths β ∈ (0, 1),

| E(F_{n,0}(t) − F(t))² − E(F_n(t) − F(t))² + 2(β/n) F'(t) ∫ x u(x) U(x) dx |
≤ ( β^{k+1} A ∫ |u(x) x^{k+1}| dx / (k+1)! )² + O(β²/n).   (8.3.9)

This result enables us to compare the mean square error E(F_{n,0}(t) − F(t))²
of F_{n,0}(t) and the variance E(F_n(t) − F(t))² = F(t)(1 − F(t))/n of the sample d.f.
F_n(t) evaluated at t. If F'(t) > 0 and the bandwidth β is chosen so that the
right-hand side of (8.3.9) is sufficiently small, then the term ∫ x u(x) U(x) dx can
be taken as a measure of the performance of F_{n,0}(t). If

∫ x u(x) U(x) dx > 0   (8.3.10)

then, obviously, F_{n,0}(t) is of a better performance than F_n(t).
If u is a nonnegative, symmetric kernel then
xu(x) U(x) dx =
xU(X) [2U(x)  1Jdx > 0
since the integrand on the righthand is nonnegative. Notice that a nonnegative kernel u satisfies Condition 8.3.2 only if k = 1.
From (8.3.9) we see that F_{n,0}(t) and F_n(t) have the same asymptotic efficiency; however, F_n(t) is asymptotically deficient w.r.t. F_{n,0}(t). The concept of deficiency was introduced by Hodges and Lehmann (1970). Define
\[
i(n) = \min\{m \in \mathbb{N}: E(F_m(t) - F(t))^2 \le E(F_{n,0}(t) - F(t))^2\}. \tag{8.3.11}
\]
Thus, i(n) is the smallest integer m such that F_m(t) has the same or a better performance than F_{n,0}(t). Since i(n)/n → 1, n → ∞, we know that F_{n,0}(t) and F_n(t) have the same asymptotic efficiency. However, the relative deficiency i(n) − n of F_n(t) w.r.t. F_{n,0}(t) quickly tends to infinity as n → ∞. In short, we may say that the relative deficiency i(n) − n is the number of observations that are wasted if we use the sample d.f. instead of the kernel estimator.
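To make the variance reduction behind this deficiency tangible, here is a small Monte Carlo sketch (my own illustration, not from the text): it compares the mean square error of F_n(0) with that of the kernel d.f. estimator F_{n,0}(0) = n^{-1} Σ U((t − X_i)/b) for standard normal data; the bandwidth b = 0.4 is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(4)

def epan_cdf(t):
    # integrated Epanechnikov kernel U(t), clipped to its support [-1, 1]
    t = np.clip(t, -1.0, 1.0)
    return 0.5 + 0.75 * t - 0.25 * t**3

n, reps, b, t = 100, 20000, 0.4, 0.0
x = rng.normal(size=(reps, n))                  # standard normal data, F(0) = 1/2

emp = (x <= t).mean(axis=1)                     # sample d.f. F_n(t)
smooth = epan_cdf((t - x) / b).mean(axis=1)     # kernel d.f. F_{n,0}(t)

mse_emp = float(((emp - 0.5) ** 2).mean())      # theory: F(t)(1 - F(t))/n = 0.0025
mse_smooth = float(((smooth - 0.5) ** 2).mean())
print(mse_emp, mse_smooth)                      # the kernel estimator has smaller MSE
```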
The comparison of F_n(t) and F_{n,0}(t) may as well be based on covering probabilities. The Berry-Esseen theorem yields
\[
P\{(n^{1/2}/\sigma)|F_n(t) - F(t)| \le y\} = 2\Phi(y) - 1 + O(n^{-1/2}) \tag{8.3.12}
\]
where σ² = F(t)(1 − F(t)). The Berry-Esseen theorem, Theorem 8.3.4, and P.8.6 lead to the following theorem.

Theorem 8.3.5. Under the conditions of Theorem 8.3.4 we get, uniformly over β > 0,
\[
P\{(n^{1/2}/\sigma)|F_{n,0}(t) - F(t)| \le y\}
= 2\Phi\Bigl[y\Bigl(\frac{3}{2} - \frac{E(F_{n,0}(t) - F(t))^2}{2E(F_n(t) - F(t))^2}\Bigr)\Bigr] - 1 + O\bigl(n^{-1/2} + (\beta + n\beta^{2(m+1)})^{3/2}\bigr). \tag{8.3.13}
\]
8. Evaluating the Quantile and Density Quantile Function
We see that the performance of Fn,o(t) again depends on the mean square
error. A modified definition of the relative deficiency, given w.r.t. covering
probabilities, leads to the same conclusion as in the case of the mean square
error.
In analogy to the results above, one may compare the performance of the sample q-quantile F_n^{-1}(q) and a kernel estimator F_{n,β}^{-1}(q). If the comparison is based on the mean square error, one has to impose appropriate moment conditions. To avoid this, we restrict our attention to covering probabilities. Recall from Section 4.2 that under weak regularity conditions,
\[
P\{(n^{1/2}/\sigma_0)|F_n^{-1}(q) - F^{-1}(q)| \le y\} = 2\Phi(y) - 1 + O(n^{-1/2}) \tag{8.3.14}
\]
with σ_0² = q(1 − q)/[f(F^{-1}(q))]² and f denoting the derivative of F.
The following lemma is taken from Falk (1985a, Proposition 1.5).
Lemma 8.3.6. Let F_{n,β}^{-1} be the kernel estimator of the q.f. as given in (8.3.1). Suppose that the kernel u satisfies Conditions 8.3.1(i), (ii). Suppose that the q.f. F^{-1} has a bounded second derivative on a neighborhood of the fixed point q ∈ (0, 1), and that f(F^{-1}(q)) > 0.

Then, if β ≡ β(n) → 0, n → ∞, we have
\[
P\{(n^{1/2}/\sigma_n)(F_{n,\beta}^{-1}(q) - \mu_n) \le y\} = \Phi(y) + O(\log(n)\,n^{-1/4}) \tag{8.3.15}
\]
where
\[
\mu_n = \int u(x)F^{-1}(q - \beta x)\,dx \tag{8.3.16}
\]
and
\[
\sigma_n^2 = \int_0^1\Bigl(\int u(x)\bigl[q - \beta x - 1_{(0,q-\beta x)}(y)\bigr](F^{-1})'(q - \beta x)\,dx\Bigr)^2\,dy. \tag{8.3.17}
\]
Moreover,
\[
\sigma_n^2 \to \sigma_0^2, \qquad n \to \infty. \tag{8.3.18}
\]
Thus, from Lemma 8.3.6 we know that F_{n,β}^{-1}(q) is asymptotically normal with mean μ_n and variance σ_n²/n. The proof of Lemma 8.3.6 is based on a Bahadur approximation argument. (8.3.18) indicates that F_{n,β}^{-1}(q) and F_n^{-1}(q) have the same asymptotic efficiency. It would be of interest to know whether the remainder term in (8.3.15) is of order O(n^{-1/2}). Applying P.8.6 we obtain as a counterpart of Theorem 8.3.5 the following result.

Under the conditions of Lemma 8.3.6,
\[
P\{(n^{1/2}/\sigma_0)|F_{n,\beta}^{-1}(q) - F^{-1}(q)| \le y\}
= 2\Phi\Bigl[y\Bigl(\frac{3}{2} - \frac{\sigma_n^2/n + (\mu_n - F^{-1}(q))^2}{2\sigma_0^2/n}\Bigr)\Bigr] - 1 + \cdots \tag{8.3.19}
\]
This shows that the performance of F_{n,β}^{-1}(q) depends on the "mean square error" σ_n²/n + (μ_n − F^{-1}(q))². As in Falk (1985a, proof of Theorem 2.3) one may prove that
\[
\sigma_n^2 = \sigma_0^2 - 2\beta(n)\bigl[(F^{-1})'(q)\bigr]^2\int x\,u(x)U(x)\,dx + O(\beta(n)^2) \tag{8.3.20}
\]
and
\[
|\mu_n - F^{-1}(q)| = O(\beta(n)^{k+1}) \tag{8.3.21}
\]
if F^{-1} has k + 1 derivatives on a neighborhood of q and the kernel u satisfies Condition 8.3.2 for k. Thus, the results for the q-quantile are analogous to those for the sample d.f.
8.4. Bootstrap via Smooth Sample Quantile Function
In Section 6.4 we introduced the bootstrap d.f. T_n(F_n, ·) as an estimator of the d.f.
\[
T_n(F, \cdot) = P_F\{T(F_n) - T(F) \le \cdot\}.
\]
Thus, T_n(F, ·) is the centered d.f. of the statistical functional T(F_n). Then, in the next step, the bootstrap d.f. T_n(F_n, ·) is the statistical functional of T_n(F, ·). For the q-quantile (which is the functional T(F) = F^{-1}(q)) it was indicated that the bootstrap error T_n(F_n, t) − T_n(F, t) is of order O(n^{-1/4}).

Thus, the rate of convergence of the bootstrap estimator is very slow. We also refer to the illustrations in Section 6.4 which reveal the poor performance for small sample sizes. Another unpleasant feature of the bootstrap estimate is that it is a step function.

In the present section we shall indicate that under appropriate regularity conditions the bootstrap estimator based on a smooth version of the sample d.f. has a better performance.
The Smooth Bootstrap D.F.
Let again F_{n,β}^{-1} denote the kernel q.f. as defined in Section 8.2. We have
\[
F_{n,\beta}^{-1}(q) = \int_0^1 \frac{1}{\beta(q)}\,u\Bigl(\frac{q - y}{\beta(q)}\Bigr)F_n^{-1}(y)\,dy \tag{8.4.1}
\]
where the kernel u satisfies the conditions u ≥ 0, u(x) = 0 for |x| > 1, and ∫u(x) dx = 1. Moreover, the bandwidth function β(q) is defined as in (8.2.3) or (8.2.4). Denote by F_{n,0} the smooth sample d.f. which is defined as the inverse of the kernel q.f. F_{n,β}^{-1}.

By plugging F_{n,0} into T_n(·, t) (instead of F_n) we get the smooth bootstrap d.f. T_n(F_{n,0}, ·).
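For concreteness, the integral defining the kernel q.f. can be evaluated exactly as a weighted sum of order statistics, since F_n^{-1} is constant on each cell ((i − 1)/n, i/n]. The following sketch is my own illustration, assuming a constant bandwidth β and the Epanechnikov kernel (both illustrative choices; in the text the bandwidth function β(q) may vary with q):

```python
import numpy as np

def epan_cdf(t):
    # U(t) = ∫_{-1}^{t} u, with u(x) = 3/4 (1 - x^2) on [-1, 1]
    t = np.clip(t, -1.0, 1.0)
    return 0.5 + 0.75 * t - 0.25 * t**3

def kernel_quantile(sample, q, beta):
    """Evaluate ∫ β^{-1} u((q - y)/β) F_n^{-1}(y) dy exactly: F_n^{-1} equals
    X_{i:n} on ((i-1)/n, i/n], so the integral reduces to a weighted sum."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    edges = np.arange(n + 1) / n
    # weight of X_{i:n}: ∫ over its cell of the rescaled kernel
    w = epan_cdf((q - edges[:-1]) / beta) - epan_cdf((q - edges[1:]) / beta)
    return float(np.dot(w, x) / w.sum())      # renormalizing is a pragmatic boundary fix

rng = np.random.default_rng(0)
sample = rng.random(2000)                     # uniform(0, 1): true median is 0.5
est = kernel_quantile(sample, 0.5, beta=0.05)
print(est)
```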
We remark that one may also use the kernel estimator of the d.f. as introduced in Section 8.2. Since F_{n,0} is absolutely continuous, one can expect that the smooth bootstrap d.f. T_n(F_{n,0}, ·) is also absolutely continuous. This will be illustrated in the particular case of the q-quantile.
Illustration
Given n i.i.d. random variables with standard normal d.f. Φ define again, as in Section 6.4, the normalized d.f. of the sample q-quantile by
\[
T_n^*(F, t) = T_n\bigl(F,\ (q(1-q))^{1/2}t\big/\bigl(n^{1/2}\varphi(\Phi^{-1}(q))\bigr)\bigr).
\]
For a sample of size n = 20 (Figure 8.4.1) and n = 200 (Figure 8.4.2) we
Figure 8.4.1. T_n*(Φ, ·), T_n*(F_n, ·), T_n*(F_{n,0}, ·) for q = .4, n = 20.

Figure 8.4.2. T_n*(Φ, ·), T_n*(F_n, ·), T_n*(F_{n,0}, ·) for q = .4, n = 200.
compare the normalized d.f. T_n*(F, ·) of the sample q-quantile, the normalized bootstrap d.f. T_n*(F_n, ·), and the normalized smooth bootstrap d.f. T_n*(F_{n,0}, ·). The kernel q.f. F_{n,β}^{-1} is defined with the bandwidth function in (8.2.4) with β = 0.07. Moreover, u is the Epanechnikov kernel.
Smooth Bootstrap Error Process

In the sequel, let us again use the same symbol for the d.f. and the corresponding probability measure. Write
\[
T_n(F, B) = P_F\{(T(F_n) - T(F)) \in B\} \tag{8.4.2}
\]
for Borel sets B. Define the bootstrap error process μ_n(F, ·) by
\[
\mu_n(F, B) = T_n(F_{n,0}, B) - T_n(F, B). \tag{8.4.3}
\]
Notice that μ_n(F, ·) is the difference of two random probability measures and thus a random signed measure. Below we shall study the stochastic behavior of μ_n(F, ·) as n → ∞ in the particular case of the q-quantile T(F) = F^{-1}(q).
Let 𝒮 be a system of Borel sets. We shall study the asymptotic behavior of
\[
\sup_{B\in\mathscr{S}}|\mu_n(F, B)|
\]
in the particular case of the functional T(F) = F^{-1}(q) for some fixed q ∈ (0, 1). Put
\[
\sigma_n^2 = q(1-q)\big/\bigl(nf(F^{-1}(q))^2\bigr) \tag{8.4.4}
\]
and
\[
\nu_n(F, B) = \int_B\bigl[1 - (x/\sigma_n)^2\bigr]\,dN_{(0,\sigma_n^2)}(x).
\]
Straightforward calculations show that
\[
\sup_t|\nu_n(F, (-\infty, t])| = (2\pi e)^{-1/2} \tag{8.4.5}
\]
and
\[
\sup_{t>0}|\nu_n(F, [-t, t])| = \sup_B|\nu_n(F, B)| = (2/\pi e)^{1/2}.
\]
Notice that these expressions do not depend on the underlying d.f. F.
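The two constants in (8.4.5) are easy to confirm numerically (an independent check, not part of the text): pick an arbitrary σ, accumulate the integrand [1 − (x/σ)²] against the N(0, σ²) density, and scan for the suprema.

```python
import numpy as np

sigma = 2.0                                    # arbitrary: the suprema do not depend on it
x = np.linspace(-12 * sigma, 12 * sigma, 500001)
phi = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
cum = np.cumsum((1 - (x / sigma)**2) * phi) * (x[1] - x[0])   # ν(F, (-∞, t]) on the grid

sup_halfline = float(np.abs(cum).max())        # should be (2πe)^{-1/2}
# ν(F, [-t, t]) = ν((-∞, t]) - ν((-∞, -t]); the grid is symmetric about 0
sup_symmetric = float(np.abs(cum - cum[::-1]).max())
print(sup_halfline, sup_symmetric)             # ≈ 0.2420 and ≈ 0.4839
```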
Theorem 8.4.1. Assume that
(a) F^{-1} has a bounded second derivative near q and that (F^{-1})'(q) > 0 for some fixed q ∈ (0, 1),
(b) the bandwidth β_n satisfies the conditions nβ_n^4 → 0 and nβ_n^2 → ∞ as n → ∞,
(c) the kernel u has a bounded second derivative.

Then,
\[
P_F\Bigl\{\Bigl(n\beta_n\Big/\int y^2u^2(y)\,dy\Bigr)^{1/2}\sup_{B\in\mathscr{S}}|\mu_n(F, B)|\Big/\sup_{B\in\mathscr{S}}|\nu_n(F, B)| \le t\Bigr\} \to 2\Phi(t) - 1 \tag{8.4.6}
\]
as n → ∞ for every t ≥ 0 whenever sup_{B∈𝒮} |ν_n(F, B)| > 0.
The key idea of the proof is to establish the asymptotic normality of the sample q-quantile of i.i.d. random variables with common q.f. F_{n,β}^{-1}. According to Lemma 8.3.3 such q.f.'s satisfy the required smoothness conditions with high probability.

A version of Theorem 8.4.1, with 𝒮 = {(−∞, t]} and F_{n,0} being the smooth sample d.f., is proved in Falk and Reiss (1989). A detailed proof of the present result will be given elsewhere.
If β_n = n^{-1/3} then the accuracy of the bootstrap approximation is, roughly speaking, of order O(n^{-1/3}). The choice β_n = n^{-1/2} leads to a bootstrap estimator related to that of Section 6.4 as far as the rate of convergence is concerned.

Under stronger regularity conditions it is possible to construct bootstrap estimates of higher accuracy. Assume that F^{-1} has three bounded derivatives near q and that the kernel u has three bounded derivatives. Moreover, assume that ∫u(x)x dx = 0. Notice that nonnegative, symmetric kernels u satisfy this condition. Then the condition nβ_n^4 → 0 in Theorem 8.4.1 can be weakened to nβ_n^6 → 0 as n → ∞. This yields that the rate of convergence of the smooth bootstrap d.f. is, roughly speaking, of order O(n^{-2/5}) for an appropriate choice of β_n.
P.8. Problems and Supplements
1. (Randomized sample quantiles)
(i) Define a class of median unbiased estimators of the q-quantile by choosing X_{r:n} with probability p(r) where Σ_{r=0}^n p(r) = 1 and
\[
\sum_{r=0}^{n}\Bigl[\sum_{k=r}^{n}\binom{n}{k}q^k(1-q)^{n-k}\Bigr]p(r) = \frac{1}{2}.
\]
(Pfanzagl, 1985, page 435)
(ii) Establish a representation corresponding to that in P.1.28 for the randomized sample median.
2. (Testing the q-quantile)
(i) Let f_0 and f_1 be the densities in (8.1.7). Construct d.f.'s G_m ∈ 𝔉 with densities g_m for m = 1, 2, 3, … such that G_m^{-1}(q) = u and g_m(x) → f_0(x), m → ∞, for every x ≠ u.
(ii) Let φ and 𝔉 be as in Lemma 8.1.3. Prove that for every critical function ψ of level α such that E_F ψ = α if F ∈ 𝔉 and F^{-1}(q) = u the following relation holds:
\[
E_F\psi \le \alpha \quad\text{if } F \in \mathfrak{F} \text{ and } F^{-1}(q) < u.
\]
3. Let φ and 𝔉 be as in Lemma 8.1.3 and let 𝒢 be a subfamily of 𝔉. For ε > 0 define an "ε-neighborhood" 𝒢_ε of 𝒢 by
\[
\mathscr{G}_\varepsilon = \{F \in \mathfrak{F}: |f - g| \le \varepsilon g \text{ for some } G \in \mathscr{G}\}
\]
where f and g denote the differentiable densities of F and G. Then for every critical function ψ which has the property
\[
E_F\psi \le \alpha \quad\text{if } F \in \mathscr{G}_\varepsilon \text{ and } F^{-1}(q) \le u
\]
we have
\[
E_F\psi \le \alpha \quad\text{if } F \in \mathscr{G}_{\varepsilon/2} \text{ and } q - \frac{\varepsilon q(1-q)}{4(1+\varepsilon)} \le F(u) < q.
\]
4. (Stochastic properties of kernel density estimator)
Find conditions under which the density estimator f_n ≡ F'_{n,0} [see (8.2.19)] has the following properties:
(i) ∫ f_n(y) dy = 1.
(ii) E f_n(x) = ∫ u(y)f(x + βy) dy.
(iii) E f_n(x) → f(x) as β → 0.
(iv) |E f_n(x) − f(x) − β² f^{(2)}(x) ∫ u(y)y² dy/2| = o(β²).
(v) |E[f_n(x) − E f_n(x)]² − (nβ)^{-1} f(x) ∫ u²(y) dy| = O(n^{-1}).
(vi) Let ∫ u(y)y² dy > 0. Show that
\[
\beta = n^{-1/5}\Bigl[f(x)\int u^2(y)\,dy\Bigr]^{1/5}\Big/\Bigl[f^{(2)}(x)\int u(y)y^2\,dy\Bigr]^{2/5}
\]
minimizes the term
\[
(n\beta)^{-1}f(x)\int u^2(y)\,dy + \beta^4\Bigl[f^{(2)}(x)\int u(y)y^2\,dy\big/2\Bigr]^2.
\]
For this choice of β, the mean square error of f_n(x) satisfies the relation
\[
E[f_n(x) - f(x)]^2 = n^{-4/5}\,\frac{5}{4}\Bigl[f(x)\int u^2(y)\,dy\Bigr]^{4/5}\Bigl[f^{(2)}(x)\int u(y)y^2\,dy\Bigr]^{2/5} + o(n^{-4/5}).
\]
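As a numerical cross-check of (vi) (my own sketch, not from the text): for the standard normal density at x = 0, with the Epanechnikov kernel (∫u² = 3/5, ∫y²u = 1/5), the closed-form bandwidth matches a brute-force minimization of the asymptotic MSE, and the minimal value matches the stated formula.

```python
import numpy as np

fx = 1 / np.sqrt(2 * np.pi)      # f(0) for the standard normal density
f2x = -fx                        # f''(0) = -f(0) for the standard normal density
int_u2, int_y2u = 0.6, 0.2       # ∫ u²(y) dy and ∫ y² u(y) dy for the Epanechnikov kernel
n = 10_000

beta_opt = n**(-0.2) * (fx * int_u2)**0.2 / (abs(f2x) * int_y2u)**0.4

betas = np.linspace(0.01, 2.0, 200001)
amse = fx * int_u2 / (n * betas) + betas**4 * (f2x * int_y2u / 2)**2
beta_num = float(betas[np.argmin(amse)])

mse_formula = n**(-0.8) * 1.25 * (fx * int_u2)**0.8 * (abs(f2x) * int_y2u)**0.4
print(beta_opt, beta_num)        # both ≈ 0.33
```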
5. (Orthogonal series estimator)
For x ∈ [0, 1] define e_0(x) = 1 and
\[
e_{2j-1}(x) = 2^{1/2}\cos(2\pi jx), \qquad e_{2j}(x) = 2^{1/2}\sin(2\pi jx), \qquad j = 1, 2, 3, \ldots
\]
(i) (a) e_0, e_1, e_2, … are orthonormal [w.r.t. the inner product (f, g) = ∫_0^1 f(x)g(x) dx].
(b) Let
\[
f = 1 + \sum_{i=1}^{s} a_i e_i
\]
be a probability density and let ξ_1, …, ξ_n be i.i.d. random variables with common density f. Then, for every x ∈ [0, 1],
\[
\hat f_n(x) = 1 + \sum_{i=1}^{s}\Bigl(n^{-1}\sum_{j=1}^{n}e_i(\xi_j)\Bigr)e_i(x)
\]
is an expectation unbiased estimator of f(x) having the integrated variance
\[
\int\operatorname{Var}(\hat f_n(x))\,dx = n^{-1}\sum_{i=1}^{s}\int e_i^2(x)f(x)\,dx - n^{-1}\Bigl(1 + \sum_{i=1}^{s}a_i^2\Bigr) = O(s/n)
\]
(see Prakasa Rao, 1983, Example 2.2.1)
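A runnable sketch of this estimator (my own illustration; the target density f = 1 + a₁e₁ and all numerical choices below are made up for the demo):

```python
import numpy as np

def basis(i, x):
    """Trigonometric basis: e_0 = 1, e_{2j-1} = √2 cos(2πjx), e_{2j} = √2 sin(2πjx)."""
    if i == 0:
        return np.ones_like(x)
    j = (i + 1) // 2
    trig = np.cos if i % 2 == 1 else np.sin
    return np.sqrt(2.0) * trig(2 * np.pi * j * x)

def series_estimator(sample, s):
    """f̂_n(x) = 1 + Σ_{i≤s} (n^{-1} Σ_j e_i(ξ_j)) e_i(x): empirical Fourier coefficients."""
    coef = [np.mean(basis(i, sample)) for i in range(1, s + 1)]
    return lambda x: 1.0 + sum(c * basis(i, x) for i, c in enumerate(coef, start=1))

rng = np.random.default_rng(0)
a1 = 0.3
f = lambda x: 1.0 + a1 * np.sqrt(2.0) * np.cos(2 * np.pi * x)   # a valid density on [0, 1]

# rejection sampling from f, which is bounded by 1 + a1*√2
bound = 1.0 + a1 * np.sqrt(2.0)
cand = rng.random(300000)
sample = cand[rng.random(300000) * bound < f(cand)]

fhat = series_estimator(sample, s=4)
grid = np.array([0.1, 0.25, 0.6, 0.9])
print(np.abs(fhat(grid) - f(grid)).max())   # small: the estimator tracks f
```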
(ii) (Problem) Investigate the asymptotic performance of
\[
\int(\hat f_n(x) - 1)^2\,dx
\]
as a test statistic for testing the uniform distribution on (0, 1) against alternatives as given in (i)(b) with s ≡ s(n) → ∞ as n → ∞.
(Compare with Example 10.4.1.)
6. There exists a constant C(ρ) > 0, only depending on ρ > 0, such that
\[
\Bigl|N_{(\mu_n,v_n^2)}\{[\mu - ayn^{-1/2}, \mu + ayn^{-1/2})\} - \bigl(2\Phi\bigl(y\bigl[1 + \{1 - (n/a^2)(v_n^2 + (\mu_n - \mu)^2)\}/2\bigr]\bigr) - 1\bigr)\Bigr|
\le C(\rho)\bigl(\max\bigl(|n^{1/2}v_n/a - 1|,\ (n/a^2)(\mu_n - \mu)^2\bigr)\bigr)^{3/2}
\]
for every y ≥ 0, v_n > 0, a > 0 with ρ ≤ n^{1/2}v_n/a, −∞ < μ, μ_n < ∞, and positive integers n.
(Reiss, 1981c)
7. Denote by G_n^{-1} the sample q.f. if F_{n,0} is the underlying d.f. Prove that
\[
P_{F_{n,0}}\{(G_n^{-1}(q) - F_{n,\beta}^{-1}(q)) \in B\}
\]
is a more accurate approximation to P_F{(F_n^{-1}(q) − F^{-1}(q)) ∈ B} than the bootstrap distribution T_n(F_{n,0}, ·) is to T_n(F, ·).
8. (Generating pseudo-random variables)
Generate pseudo-random numbers according to the kernel q.f. F_{n,β}^{-1} and the kernel d.f. F_{n,0}.
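A sketch of one half of this problem (my own illustration; the function names are made up): to sample from a kernel d.f. of the form F_{n,0}(t) = n^{-1} Σ U((t − X_i)/b) with a constant bandwidth b, pick a data point uniformly at random and perturb it by b times kernel-distributed noise. The median-of-three-uniforms trick below generates Epanechnikov noise exactly.

```python
import numpy as np

def epanechnikov_noise(size, rng):
    # the median of three independent U(-1, 1) variables has density 3/4 (1 - x^2)
    return np.median(rng.uniform(-1.0, 1.0, size=(3, size)), axis=0)

def smooth_bootstrap_draw(data, b, size, rng):
    """Sample from the kernel d.f.: resample a data point, add kernel perturbation."""
    idx = rng.integers(0, len(data), size)
    return data[idx] + b * epanechnikov_noise(size, rng)

rng = np.random.default_rng(1)
data = rng.normal(size=1000)
draws = smooth_bootstrap_draw(data, b=0.3, size=200000, rng=rng)
# the mean is preserved; the variance is inflated by b^2 * Var(noise) = b^2 / 5
print(draws.mean() - data.mean(), draws.var() - data.var())
```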
Bibliographical Notes
It was proved by Pfanzagl (1975) that the sample q-quantile (including the sample median) is an asymptotically efficient estimator of the q-quantile (the median) in the class of all asymptotically median unbiased estimators. It is well known that for symmetric densities one can find nonparametric estimators of the symmetry point which are as efficient as parametric estimators; according to Pfanzagl's result a corresponding procedure is not possible if there is even the slightest violation of the symmetry condition.
In Section 8.2 we studied special topics belonging to nonparametric density estimation or, in other words, nonparametric curve estimation. We refer to the book of Prakasa Rao (1983) for a comprehensive account of this field. In data analysis, extensive use of histograms, which are closely related to kernel estimators, has been made for a long time. As early as 1944, Smirnov established an interesting mathematical result concerning the maximum deviation of the histogram from the underlying density. Since the celebrated articles of Rosenblatt (1956) and Parzen (1962) much research work has been done in this field.
The kernel estimator of the d.f. was studied by Nadaraya (1964), Yamato (1973), Winter (1973), and Reiss (1981c). It was proved by Falk (1983) that kernels u exist which satisfy condition (8.3.10) as well as Condition 8.3.2 for k > 1. Falk (1983) and Mammitzsch (1984) solved the question of the optimal choice of kernels in the context of estimating d.f.'s and q.f.'s.

The basic idea behind the kernel estimator of the q-quantile is to average over order statistics close to the sample q-quantile. The simplest case is given by quasiquantiles, which are built from two or, more generally, a fixed number k of order statistics. In the nonparametric context, quasiquantiles were used by Hodges and Lehmann (1967) in order to estimate the center of a symmetric distribution and by Reiss (1980, 1982) to estimate and test q-quantiles. The kernel estimator of the q.f. was introduced by Parzen (1979) and, independently, by Reiss (1982). The asymptotic performance of the kernel estimator of the q.f. was investigated by Falk (1984a, 1985a). Other notable articles pertaining to this topic are Brown (1981), Harrell and Davis (1982), and Yang (1985).
The derivative of the q.f. (= quantile density function) can easily be estimated by means of the difference of two order statistics. An estimator of the quantile density function may e.g. be applied to construct confidence bounds for the q-quantile. The estimation of the quantile density function is closely related to the estimation of the density by means of histograms with random cell boundaries. Such histograms were dealt with by Siddiqui (1960), Bloch and Gastwirth (1968), van Ryzin (1973), Tusnády (1974), and Reiss (1975a, 1978). A confidence band, based on the moving scheme (see (8.2.2)), was established in Reiss (1977b) by applying a result for kernel density estimators due to Bickel and Rosenblatt (1973) and a Bahadur approximation result like Theorem 6.3.1.
Another example of the kernel method is provided by smoothing the log
survivor function and taking the derivative which leads to a kernel estimator
of the hazard function (see Rice and Rosenblatt, 1976). A related estimator
of the hazard function was earlier investigated by Watson and Leadbetter
(1964a, 1964b).
Sharp results for the almost sure behavior of kernel density estimators were proved by Stute (1982) by applying the result concerning the oscillation of the sample d.f. A notable article pertaining to this is Reiss (1975b).
CHAPTER 9
Extreme Value Models
This chapter is devoted to parametric and nonparametric extreme value
models. The parametric models result from the limiting distributions of sample extremes, whereas the nonparametric models contain actual distributions
of sample extremes. The statistical inference within the nonparametric framework will be carried out by applying the parametric results.
The importance of parametric statistical procedures for the nonparametric setup (see also Section 10.4) may possibly revive the interest in parametric problems. However, it is not our intention to give a detailed, exhaustive survey of the various statistical procedures concerning extreme values.

The central idea of our approach will be pointed out by studying the simple, nevertheless important, problem of estimating a parameter α which describes the shape of the distribution in the parametric model and the domain of attraction in the nonparametric model.
In Section 9.1 we give an outline of some important statistical ideas which are basic for our considerations. In particular, we explain in detail the straightforward and widely adopted device of transforming a given model in order to simplify the statistical inference. A continuation of this discussion can be found in Section 10.1 where the concept of "sufficiency" is included in our considerations.
Sections 9.2 and 9.3 deal with the sampling of independent maxima. Section
9.4 introduces the parametric model which describes the sampling of the k
largest order statistics. It is shown that in important cases the given model
can be transformed into a model defined by independent observations. The
nonparametric counterpart is treated in Section 9.5.
A comparison of the results of Sections 9.3 and 9.5 is given in Section 9.6. The 3-parameter extreme value family contains regular and nonregular subfamilies and hence the statistical inference can be intricate. However, the classical model has a rather limited range; it can be enlarged by adding further parameters, as will be indicated in Section 9.6.
In Section 9.7 we continue our research concerning the evaluation of the unknown q.f. The information that the underlying d.f. belongs to the domain of attraction of an extreme value d.f. is used to construct a competitor of the sample q.f. near the endpoints.
9.1. Some Basic Concepts of Statistical Theory
In the present section we shall recall some simple facts from statistical theory. The first part mainly concerns the estimation of an unknown parameter such as the shape parameter of an extreme value distribution. The second part deals with the comparison of statistical models.

Remarks about Estimation Theory

Consider the fairly general estimation problem where a sequence ξ_1, ξ_2, … of r.v.'s (with common distribution P_θ, θ ∈ Θ) is given which enables us to construct a consistent estimator of a real-valued parameter θ as the sample size k tends to infinity.

In applications the sample size k will be predetermined or chosen by the statistician so that the estimation procedure attains a certain accuracy. Then one faces two problems, namely that of measuring the accuracy of estimators and, in a second step, that of finding an optimal estimator in order not to waste observations (although in some cases it may be preferable to use quick estimators in order not to waste time).
For an estimator θ_k^* ≡ θ_k^*(ξ_1, …, ξ_k) of the parameter θ a widely accepted measure of accuracy is the mean square error
\[
E_\theta(\theta_k^* - \theta)^2. \tag{9.1.1}
\]
If necessary, the expectation is denoted by E_θ etc. instead of E in order to indicate that the r.v.'s ξ_1, …, ξ_k, and thus the expectation as well, depend on the parameter θ. Since
\[
E_\theta(\theta_k^* - \theta)^2 = E_\theta(\theta_k^* - E_\theta\theta_k^*)^2 + (E_\theta\theta_k^* - \theta)^2 \tag{9.1.2}
\]
we know that the mean square error is the variance if θ_k^* is expectation unbiased.
In general, the accuracy of the estimator can be measured by the expected loss
\[
E_\theta L(\theta_k^* \mid \theta) \tag{9.1.3}
\]
where L is an appropriate loss function. Note that A. Wald in his supreme wisdom decided to call E_θL(θ_k^* | θ) risk instead of expected loss. For a detailed discussion of the problem of comparing estimators and of the definitions of optimal estimators we refer to Pfanzagl (1982), pages 151-154. We indicate some basic facts.
There does not exist a canonical criterion for the selection of an optimal estimator. However, one basic idea behind any definition of optimality is to exclude degenerate estimators such as an estimator which is a constant.

An estimator θ̂_k is optimal w.r.t. the global minimax criterion if
\[
\sup_\theta E_\theta L(\hat\theta_k \mid \theta) = \inf\sup_\theta E_\theta L(\theta_k^* \mid \theta) \tag{9.1.4}
\]
where the inf is taken over the given class of estimators θ_k^*. Notice that (9.1.4) can be modified to a local minimax criterion by taking the sup over a neighborhood of θ_0 for each θ_0 ∈ Θ.
The Bayes risk of an estimator θ_k^* w.r.t. a "prior distribution λ" is given by the weighted risk
\[
\int E_\theta L(\theta_k^* \mid \theta)\,d\lambda(\theta)
\]
where λ is a probability measure on the parameter space Θ equipped with a σ-field. The optimum estimator is now the Bayes estimator θ̂_k which minimizes the Bayes risk; that is,
\[
\int E_\theta L(\hat\theta_k \mid \theta)\,d\lambda(\theta) = \inf\int E_\theta L(\theta_k^* \mid \theta)\,d\lambda(\theta) \tag{9.1.5}
\]
where the inf is taken over the given class of estimators θ_k^*. In certain applications one also considers generalized Bayes estimators where λ is a measure; this generalization e.g. leads to Pitman estimators (compare with (10.1.23)). For a detailed treatment of Bayes and minimax procedures we refer to Ibragimov and Has'minskii (1981) and Witting (1985).
Alternatively, one can try to find an optimal estimator within a class of estimators which satisfy an additional regularity condition. Recall that if the estimators are assumed to be expectation unbiased then the use of (9.1.1) leads to the famous Cramér-Rao bound as a lower bound for the variance. In the nonparametric context (e.g. when estimating a density) one has to admit a certain amount of bias of the estimator to gain a smaller mean square error.

The extension of the concept above to randomized estimators (Markov kernels having their distributions on the parameter space Θ) is straightforward. Notice that E_θL(θ̂_k | θ) = ∫ L(t | θ) dQ_θ(t) where Q_θ is the distribution of θ̂_k. The extension is easily obtained by putting the distribution of the randomized estimator in place of Q_θ.

A different restriction is obtained by the requirement that the estimator θ̂_k is median unbiased or asymptotically median unbiased (compare with Section 8.1).
Moreover, we shall base our calculations on covering probabilities of the form
\[
P\{t' \le \theta_k^* - \theta \le t''\} \tag{9.1.6}
\]
which measure the concentration of the estimator θ_k^* about θ.

Let L(θ_1 | θ_2) be of the form L(θ_1 − θ_2). An estimator θ_k^* which is maximally concentrated about the true parameter θ will also minimize the risk E_θL(θ_k^* − θ) for bounded, negative unimodal loss functions L having the mode at zero [that is, L is nonincreasing on (−∞, 0] and nondecreasing on [0, ∞)]. This can easily be deduced from P.3.5.
Comparison of Statistical Models

Next we describe the simplest version of the fundamental operation of replacing a given statistical model by another one which might be more accessible to the statistician. The model
\[
\mathscr{P} = \{P_\theta: \theta \in \Theta\} \tag{9.1.7}
\]
will be replaced by
\[
\mathscr{Q} = \{Q_\theta: \theta \in \Theta\}. \tag{9.1.8}
\]
The two models can be compared by means of a map T or, in general, by a Markov kernel (the latter case will be dealt with in Chapter 10). The crucial point is that the map T is independent of the parameter θ.

Given θ ∈ Θ and a r.v. ξ with distribution P_θ, let η = T(ξ) be distributed according to Q_θ. Then, obviously, for any estimator θ̂(η) operating on 𝒬 [or, in greater generality, any statistical procedure] we find an estimator operating on 𝒫, namely
\[
\hat\theta^*(\xi) = \hat\theta(T(\xi)),
\]
having the same distribution as θ̂(η). In terms of risks this yields that for every loss function L
\[
E_\theta L(\hat\theta^*(\xi) \mid \theta) = E_\theta L(\hat\theta(\eta) \mid \theta), \qquad \theta \in \Theta. \tag{9.1.9}
\]
An extension of the framework above is needed in Section 9.3 where 𝒬 and 𝒫 have different parameter sets. Let 𝒬 be as in (9.1.8) and
\[
\mathscr{P} = \{P_{\theta,h}: \theta \in \Theta,\ h \in H(\theta)\}. \tag{9.1.10}
\]
Let T be a map such that for every r.v. ξ with distribution P_{θ,h} and r.v. η with distribution Q_θ,
\[
\sup_B|P\{T(\xi) \in B\} - P\{\eta \in B\}| \le \varepsilon(\theta, h). \tag{9.1.11}
\]
This implies (compare with P.3.5) that with θ̂*(ξ) = θ̂(T(ξ)),
\[
|E_{\theta,h}L(\hat\theta^*(\xi) \mid \theta) - E_\theta L(\hat\theta(\eta) \mid \theta)| \le \varepsilon(\theta, h)\sup_t L(t \mid \theta) \tag{9.1.12}
\]
for every loss function L(· | θ).
For every procedure acting on 𝒬 we have found a procedure on 𝒫 with the same performance (within a certain error bound). Until now we have not excluded the possibility that there exists a procedure on 𝒫 which is superior to those carried over from 𝒬 to 𝒫. However, if T is a one-to-one map (as e.g. in Example 9.1.1), one may interchange the roles of 𝒬 and 𝒫 by taking the inverse T^{-1} instead of T. Thus, the optimal procedure on 𝒫 can be regained from the corresponding one on 𝒬.

In connection with loss functions the parameter θ is not necessarily real-valued. The extension of the concept to functional parameters is obvious.
EXAMPLE 9.1.1. Section 9.2 will provide a simple example for the comparison of two models. Here, with θ = (σ, α), P_θ is the Fréchet distribution with scale parameter σ and shape parameter 1/α, and Q_θ is the Gumbel distribution with location parameter log σ and scale parameter α. The transformation T is given by T = log. Moreover, given a sample of size k one has to take the transformation
\[
T(x_1, \ldots, x_k) = (\log x_1, \ldots, \log x_k).
\]
A continuation of this discussion can be found in Section 10.1.
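This transformation is easy to verify by simulation (an illustration of Example 9.1.1 with arbitrary parameter values, not from the text): a Fréchet sample with scale σ and shape 1/α, generated by inverse transform, yields after taking logarithms a sample whose mean and variance agree with the Gumbel values log σ + αγ and α²π²/6 derived in Section 9.2.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, alpha = 2.0, 0.5
gamma = 0.57721566490153286                  # Euler's constant

u = rng.random(200000)
frechet = sigma * (-np.log(u)) ** (-alpha)   # inverse of the Fréchet d.f. exp(-(x/σ)^{-1/α})
eta = np.log(frechet)                        # should be Gumbel(log σ, α)

print(eta.mean(), np.log(sigma) + alpha * gamma)   # both ≈ 0.98
print(eta.var(), alpha**2 * np.pi**2 / 6)          # both ≈ 0.41
```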
9.2. Efficient Estimation in Extreme Value Models

Given a d.f. G denote by G^{(μ,σ)} the corresponding d.f. with location parameter μ and scale parameter σ; thus, we have
\[
G^{(\mu,\sigma)}(x) = G((x - \mu)/\sigma).
\]

Fréchet and Gumbel Model

The starting point is the scale and shape parameter family of Fréchet d.f.'s G_{1,1/α}^{(0,σ)}. We have
\[
G_{1,1/\alpha}^{(0,\sigma)}(x) = \exp(-(x/\sigma)^{-1/\alpha}), \qquad x \ge 0. \tag{9.2.1}
\]
The usual procedure of treating the estimation problem is to transform the given model to the location and scale parameter family of Gumbel d.f.'s G_3^{(θ,α)} where θ = log σ. Notice that if ξ is a r.v. with d.f. G_{1,1/α}^{(0,σ)} then η = log ξ is a r.v. with d.f. G_3^{(θ,α)}. The density of G_3^{(θ,α)} will be denoted by g_3^{(θ,α)}.
Gumbel Model: Fisher Information Matrix

For the calculation of the Fisher information matrix within the location and scale parameter family of Gumbel d.f.'s we need the first two moments of the distributions. The following two formulas are well known (see e.g. Johnson and Kotz (1970)):
\[
\int x\,dG_3(x) = -\int_0^\infty(\log x)e^{-x}\,dx = \gamma \tag{9.2.2}
\]
where γ = 0.5772… is Euler's constant. Moreover,
\[
\int x^2\,dG_3(x) = \gamma^2 + \pi^2/6. \tag{9.2.3}
\]
From (9.2.2) and (9.2.3) it is obvious that a r.v. η with d.f. G_3^{(θ,α)} has the expectation
\[
E\eta = \theta + \alpha\gamma \tag{9.2.4}
\]
and variance
\[
\operatorname{Var}\eta = \alpha^2\pi^2/6. \tag{9.2.5}
\]
The Fisher information matrix can be written as
\[
I(\theta_1, \theta_2) = \Bigl[\int\Bigl[\frac{\partial}{\partial\theta_i}\log g_3^{(\theta_1,\theta_2)}(x)\Bigr]\Bigl[\frac{\partial}{\partial\theta_j}\log g_3^{(\theta_1,\theta_2)}(x)\Bigr]dG_3^{(\theta_1,\theta_2)}(x)\Bigr]_{i,j}.
\]
By partial integration one can easily deduce from (9.2.4) and (9.2.5) that
\[
I(\theta, \alpha) = \alpha^{-2}\begin{bmatrix}1 & \gamma - 1\\ \gamma - 1 & \pi^2/6 + (1 - \gamma)^2\end{bmatrix}. \tag{9.2.6}
\]
Check that the inverse matrix I(θ, α)^{-1} of I(θ, α) is given by
\[
I(\theta, \alpha)^{-1} = (6\alpha^2/\pi^2)\begin{bmatrix}\pi^2/6 + (1 - \gamma)^2 & 1 - \gamma\\ 1 - \gamma & 1\end{bmatrix}. \tag{9.2.7}
\]
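Both matrices are easy to verify numerically (an independent cross-check, not part of the text): compute the score functions of the standard Gumbel density g(x) = e^{−x−e^{−x}} and integrate their products by quadrature.

```python
import numpy as np

gamma = 0.57721566490153286
x = np.linspace(-8.0, 40.0, 500001)
dens = np.exp(-x - np.exp(-x))              # standard Gumbel density g_3^{(0,1)}
s_loc = 1.0 - np.exp(-x)                    # score ∂/∂θ log g at (θ, α) = (0, 1)
s_scale = -1.0 + x * (1.0 - np.exp(-x))     # score ∂/∂α log g at (θ, α) = (0, 1)

def integrate(f):
    return float(np.sum(f[1:] + f[:-1]) * (x[1] - x[0]) / 2)   # trapezoidal rule

I = np.array([[integrate(s_loc * s_loc * dens), integrate(s_loc * s_scale * dens)],
              [integrate(s_scale * s_loc * dens), integrate(s_scale * s_scale * dens)]])
expected = np.array([[1.0, gamma - 1.0],
                     [gamma - 1.0, np.pi**2 / 6 + (1.0 - gamma)**2]])
inv_expected = (6 / np.pi**2) * np.array([[np.pi**2 / 6 + (1.0 - gamma)**2, 1.0 - gamma],
                                          [1.0 - gamma, 1.0]])
print(np.abs(I - expected).max())           # ≈ 0: quadrature matches the closed form
```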
Gumbel Model: The Maximum Likelihood Estimator

The maximum likelihood (m.l.) estimator (θ̂_k, α̂_k) of the location and scale parameters in the Gumbel model is asymptotically normal with mean vector (θ, α) and covariance matrix k^{-1}I(θ, α)^{-1}. The rate of convergence to the limiting normal distribution is of order O(k^{-1/2}) (proof!).

In the sequel, the estimators will be written in a factorized form: if the m.l. estimator is based on k i.i.d. random variables η_1, …, η_k we shall write α̂_k(η_1, …, η_k) instead of α̂_k.

If the r.v.'s η_1, …, η_k have the common d.f. G_3^{(θ,α)} then we obtain according to (9.2.7) that
\[
P\{(k/V(\alpha))^{1/2}(\hat\alpha_k(\eta_1, \ldots, \eta_k) - \alpha) \le t\} \to \Phi(t), \qquad k \to \infty, \tag{9.2.8}
\]
where V(α) = 6α²/π².
Given the observations x_1, …, x_k the m.l. estimate α̂_k(x_1, …, x_k) is the solution of the two log-likelihood equations
\[
\sum_{i=1}^{k}e^{-(x_i - \theta)/\alpha} = k \tag{9.2.9}
\]
and
\[
\sum_{i=1}^{k}(x_i - \theta)\bigl[1 - e^{-(x_i - \theta)/\alpha}\bigr] = k\alpha. \tag{9.2.10}
\]
Notice that (9.2.9) is equivalent to the equation
\[
\theta = -\alpha\log\Bigl[k^{-1}\sum_{i=1}^{k}e^{-x_i/\alpha}\Bigr] \tag{9.2.11}
\]
so that by inserting this expression for θ in (9.2.10) we get the equation
\[
g(\alpha) = 0 \tag{9.2.12}
\]
with g defined by
\[
g(\alpha) = k^{-1}\sum_{i=1}^{k}x_i - \sum_{i=1}^{k}x_ie^{-x_i/\alpha}\Big/\sum_{i=1}^{k}e^{-x_i/\alpha} - \alpha. \tag{9.2.13}
\]
Observe that the solution α̂_k(x_1, …, x_k) of the equation (9.2.12) has the following property: for reals θ and α > 0 we have
\[
\hat\alpha_k(\theta + \alpha x_1, \ldots, \theta + \alpha x_k) = \alpha\,\hat\alpha_k(x_1, \ldots, x_k). \tag{9.2.14}
\]
This property yields that there exist correction terms which make the m.l. estimator of α median unbiased. The corresponding result also holds w.r.t. expectation unbiasedness.

Equation (9.2.12) has to be solved numerically; however, this can hardly be regarded as a serious drawback in the computer era. Approximate solutions can be obtained by the Newton-Raphson iteration procedure. Notice that (6^{1/2}/π)s_k(η_1, …, η_k) may serve as an initial estimator of α, where
\[
s_k^2(x_1, \ldots, x_k) = (k - 1)^{-1}\sum_{i=1}^{k}\Bigl[x_i - k^{-1}\sum_{j=1}^{k}x_j\Bigr]^2
\]
is the sample variance. The asymptotic performance of (6^{1/2}/π)s_k is indicated in P.9.2. We remark that the first iteration leads to
\[
\alpha_k^+ = \alpha_k^{(0)} - g(\alpha_k^{(0)})/g'(\alpha_k^{(0)}), \qquad \alpha_k^{(0)} = (6^{1/2}/\pi)s_k. \tag{9.2.15}
\]
The estimator α_k^+(η_1, …, η_k) has the same asymptotic performance as the m.l. estimator. Further iterations may improve the finite sample properties of the estimator.

From (9.2.11) we know that the m.l. estimator of the location parameter is given by
\[
\hat\theta_k(\eta_1, \ldots, \eta_k) = -\hat\alpha_k(\eta_1, \ldots, \eta_k)\log\Bigl[k^{-1}\sum_{i=1}^{k}e^{-\eta_i/\hat\alpha_k(\eta_1,\ldots,\eta_k)}\Bigr]. \tag{9.2.16}
\]
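The profile equation g(α) = 0 is one-dimensional, so even plain bisection works well in place of Newton-Raphson; the sketch below is my own implementation (the shift by min xᵢ is a numerical-stability device, not part of the text) and recovers θ via (9.2.11):

```python
import numpy as np

def gumbel_ml(x, tol=1e-10):
    """Maximum likelihood fit of the Gumbel location θ and scale α:
    solve g(α) = mean(x) - Σ x_i e^{-x_i/α} / Σ e^{-x_i/α} - α = 0 by bisection."""
    x = np.asarray(x, dtype=float)
    M = x.min()
    def g(a):
        w = np.exp(-(x - M) / a)             # shifting by min keeps all exponents ≤ 0
        return x.mean() - np.dot(x, w) / w.sum() - a
    lo, hi = 1e-6, 10.0 * x.std() + 1e-6     # g(lo) > 0 > g(hi) for non-degenerate data
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    a = 0.5 * (lo + hi)
    theta = M - a * np.log(np.mean(np.exp(-(x - M) / a)))
    return theta, a

rng = np.random.default_rng(3)
theta0, alpha0 = 1.0, 2.0
sample = theta0 - alpha0 * np.log(-np.log(rng.random(20000)))   # Gumbel(θ, α) draws
print(gumbel_ml(sample))                      # ≈ (1.0, 2.0)
```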
(9.2.16)
279
9.3. Semiparametric Models for Sample Maxima
Efficient Estimation of (X
Let us concentrate on estimating the parameter rx.
(9.2.14) yields that (9.2.8) holds uniformly over the location and scale
parameters () and rx. A further consequence is that the m.l. estimator is
asymptotically efficient in the class of all estimators rxt (11 1 , ... , 11k) which are
asymptotically median unbiased in a locally uniform way. For such estimators
we get for every t', t" > 0,
P{ _t'k 1/2 ~ rxt(111, ... ,l1k)  rx ~ t"k 1/2 }
~ P{ _t'k 1/2 ~ rX k(111, ... ,l1k) 
rx ~ t"k 1/2 } + o(kO).
(9.2.17)
We return to the Frechet model of dJ.'s Gi~i/; with scale parameter (J and
shape parameter l/rx. The results above can easily be made applicable to the
Frechet model.
If ~ l ' ... , ~k are i.i.d. random variables with common dJ. Gi~i/; then it
follows from (9.2.8) and the discussion in Section 9.1 that
n .......
00.
(9.2.18)
The rate of convergence in (9.2.18) is again of order O(kl/2). Moreover, the
efficiency of rX k(111"'" 11k) as an estimator of the scale parameter ofthe Gumbel
distribution carries over to rXk(log ~ 1, ... , log ~k) as an estimator of the shape
parameter of the Frechet distribution.
9.3. Semiparametric Models for Sample Maxima

The parametric models as studied in Section 9.2 reflect the ideal world where we are allowed to replace the actual distributions of sample maxima by the limiting ones. By stating that the parametric model is an approximation to the real world one acknowledges that the parametric model is incorrect although in many cases the error of the approximation can be neglected.

In the present section we shall study a nonparametric approach, give some bounds for the error of the parametric approximation, and discuss the meaning of a statistical decision within the parametric model for the nonparametric model.

Fréchet Type Model

We observe the sample maximum X_{m:m} of m i.i.d. random variables with common d.f. F belonging to the domain of attraction of a Fréchet d.f. G_{1,1/α}. Our aim is to find an estimator of the shape parameter α.

More precisely, we assume that F is close to a Pareto d.f. W_{1,1/α}^{(0,σ)} (with unknown scale parameter σ) in the following sense: F has a density f satisfying the condition
\[
f(x) = (\sigma\alpha)^{-1}(x/\sigma)^{-(1+1/\alpha)}e^{h(x/\sigma)} \qquad\text{for } x \ge x_0\sigma \tag{9.3.1}
\]
where x_0 > 0 is fixed and h is a (measurable) function such that
\[
|h(x)| \le L|x|^{-\delta/\alpha}
\]
for some constants L > 0 and δ > 0.

Condition (9.3.1) is formulated in such a way that the results will hold uniformly over σ and α. It is apparent that the Pareto and Fréchet densities satisfy this condition with h = 0 and, respectively, h(x) = −x^{−1/α}.

The present model can be classified as a semiparametric (in other words, semi-nonparametric) model where the shape parameter α and (or) the scale parameter σ have to be evaluated and the function h is a nonparametric nuisance parameter which satisfies certain side conditions.
Let X~1'>m' ... , X~~m be independent repetitions of X m:m. The joint distribution of X~1:)m"'" X~~m will heavily depend on the parameters CT and oc whereas
the dependence on h, xo, and L can be neglected if m is sufficiently large and
k is small compared to m.
Let ξ_1, ..., ξ_k be i.i.d. random variables with common d.f. G_{1,1/α}. From
(3.3.12) and Corollary 5.2.7 it follows that

sup_B |P{(X^{(1)}_{m:m}, ..., X^{(k)}_{m:m}) ∈ B} - P{(σm^α ξ_1, ..., σm^α ξ_k) ∈ B}|
= O(k^{1/2}(m^{-δ} + m^{-1}))   (9.3.2)

uniformly over k, m and densities f which satisfy (9.3.1) for some fixed values
x_0, L, and δ. Notice that σm^α ξ_i has the d.f. G^{(0,σm^α)}_{1,1/α}.
Let again α̂_k be the solution of the m.l. equation (9.2.12). Combining (9.2.18)
and (9.3.2) we get

Theorem 9.3.1.

P{(k/V(α))^{1/2} [α̂_k(log X^{(1)}_{m:m}, ..., log X^{(k)}_{m:m}) - α] ≤ t}
= Φ(t) + O(k^{1/2}(m^{-δ} + m^{-1}) + k^{-1/2})   (9.3.3)

uniformly over t, k, m and densities f which satisfy condition (9.3.1) for some
fixed constants x_0, L, and δ. Moreover, V(α) = 6α²/π².

The properties of the m.l. estimator carry over from the parametric to the
nonparametric framework.
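A minimal numerical sketch of Theorem 9.3.1 (the parameter values are hypothetical, and the m.l. step is implemented by the standard fixed-point iteration for the Gumbel scale parameter rather than by equation (9.2.12) itself): for exact Pareto data, log X_{m:m} is approximately Gumbel with scale α, so fitting the Gumbel scale to the logarithms of k subsample maxima estimates α.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, m, k = 2.0, 200, 400        # true shape, subsample size, number of maxima

# k independent sample maxima of m i.i.d. Pareto r.v.'s with d.f. 1 - x**(-1/alpha)
x = rng.pareto(1.0 / alpha, size=(k, m)) + 1.0
y = np.log(x.max(axis=1))          # approximately Gumbel with scale alpha

# m.l. estimate of the Gumbel scale via the standard fixed-point iteration
beta = np.sqrt(6.0) * y.std() / np.pi        # moment estimate as starting value
for _ in range(50):
    w = np.exp(-y / beta)
    beta = y.mean() - (y * w).sum() / w.sum()

alpha_hat = beta
print(alpha_hat)                   # should be close to alpha = 2
```

By Theorem 9.3.1 the standard deviation of the estimate is roughly α(6/π²)^{1/2}/k^{1/2} ≈ 0.08 for these values.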
Sample Maxima within a Fixed Period
If the practitioner insists on observing the data within a fixed period, then it
is necessary to modify the results above since now the sample size is random.
This situation occurs e.g. in insurance mathematics. So let us speak for a
while in terms of claims and claim sizes.
Assume that the claims come in according to a Poisson process N(s), s ≥ 0,
and that, independently, the claim sizes η_1, η_2, ... have the common density f
which satisfies condition (9.3.1). Thus, the number of claims within a period
of length s will be N(s). The claims will be arranged in k groups. Write

M ≡ M(s, k) = [N(s)/k].   (9.3.4)

Denote by X^{(i)}_{M:M} the maximum claim size of the r.v.'s η_{(i-1)M+1}, ..., η_{iM}. Thus,
using the notation of (1.1.4) we get the representation

X^{(i)}_{M:M} = Z_{M:M}(η_{(i-1)M+1}, ..., η_{iM}).   (9.3.5)
In analogy to Theorem 9.3.1 we get

Theorem 9.3.2.

P{(k/V(α))^{1/2} [α̂_k(log X^{(1)}_{M:M}, ..., log X^{(k)}_{M:M}) - α] ≤ t}
= Φ(t) + O(k^{1/2} [Σ_{m=1}^∞ (m^{-δ} + m^{-1}) P{M = m}] + k^{-1/2})   (9.3.6)

uniformly over t, k and densities f which satisfy condition (9.3.1) for some
fixed constants x_0, L, and δ. Moreover, V(α) = 6α²/π².
PROOF. Writing α*_k = α̂_k(log X^{(1)}_{M:M}, ..., log X^{(k)}_{M:M}) and conditioning on M we
get

P{α*_k ≤ t} = Σ_{m=1}^∞ P{α*_k ≤ t | M = m} P{M = m}
= Σ_{m=1}^∞ P{α̂_k(log X^{(1)}_{m:m}, ..., log X^{(k)}_{m:m}) ≤ t} P{M = m}

with X^{(i)}_{m:m} as in Theorem 9.3.1. Now the assertion is immediate since Theorem
9.3.1 holds uniformly over m. □
If the distribution of M is highly concentrated about a fixed value, say m,
then it is apparent that the right-hand side of (9.3.6) is again that of (9.3.3).
Another interesting problem arises if k periods of length t_i - t_{i-1} are fixed.
Notice that the claim numbers N(t_1), N(t_2) - N(t_1), ..., N(t_k) - N(t_{k-1}) of
the k periods are independent. Again the statistical inference can be based on
the maximum claim sizes of each period. After conditioning on the claim
numbers, the maximum claim sizes can again approximately be represented
by independent Gumbel r.v.'s which, however, are not identically distributed.
9.4. Parametric Models Belonging to Upper Extremes
In Section 9.2 we studied the classical problem of evaluating the unknown
parameter in the extreme value model by means of estimators based on i.i.d.
random variables. A model of a different kind arises in connection with the
limiting joint distributions G_{i,α,k} of the k largest extremes of a sample of size
n as introduced in Section 5.3. More precisely, one has to speak of approximate
distributions when the number k = k(n) of extremes goes to infinity as n goes
to infinity.
Now the statistical procedures will be based on k r.v.'s which are dependent.
However, we shall only study certain submodels which can be transformed
to models involving i.i.d. random variables.
Fréchet Type Model
First we examine a model that corresponds to that in (9.2.1), namely,

{G^{(0,σ)}_{1,1/α,k}: σ > 0, α > 0}   (9.4.1)

with location parameter 0 and scale parameter σ. This model arises out of the
Fréchet distributions G_{1,1/α}. The model in (9.4.1) can be transformed to the
model

{Q_α^{k-1} × G^{(0,σ)}_{1,1/α,k}: σ > 0, α > 0}   (9.4.2)

where Q_α is the exponential distribution with scale parameter α and the
second factor of the product is the kth marginal distribution of the joint d.f.
in (9.4.1).
More precisely, if (ξ_1, ..., ξ_k) is a random vector with distribution G^{(0,σ)}_{1,1/α,k}
then, according to (5.3.3), (1.6.14), and Corollary 1.6.11(iii), the random vector

(η_1, ..., η_k) := (log(ξ_1/ξ_2), 2 log(ξ_2/ξ_3), ..., (k-1) log(ξ_{k-1}/ξ_k), ξ_k)   (9.4.3)

has the distribution Q_α^{k-1} × G^{(0,σ)}_{1,1/α,k}.
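The distributional statement behind (9.4.3) is easy to check by simulation: for a Fréchet sample, the normalized log-spacings of the upper order statistics should behave like i.i.d. exponential r.v.'s with mean α. A minimal sketch (hypothetical parameter values; the k largest observations of a finite Fréchet sample only approximately follow the limiting joint distribution):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n, k = 1.5, 10_000, 100

# Fréchet sample via inversion: (-log U)**(-alpha) has d.f. exp(-x**(-1/alpha))
xi = np.sort((-np.log(rng.uniform(size=n))) ** (-alpha))[::-1]  # descending

i = np.arange(1, k)                    # i = 1, ..., k-1
eta = i * np.log(xi[i - 1] / xi[i])    # log-spacings as in (9.4.3)

print(eta.mean())                      # should be close to alpha
```

The spacings are positive and, for k small compared to n, nearly i.i.d. with common distribution Q_α.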
Exponential Model
The statistical inference is particularly simple in the exponential model

{Q_α^{k-1}: α > 0}.   (9.4.4)

Asymptotically, one does not lose information by restricting model (9.4.2)
to model (9.4.4) as far as the evaluation of the parameter α is concerned
(proof!).
The m.l. estimator

α̂_{k-1}(η_1, ..., η_{k-1}) = (k-1)^{-1} Σ_{i=1}^{k-1} η_i   (9.4.5)

is an (asymptotically) efficient estimator of α. This estimator is unbiased
and has the variance

Var(α̂_k(η_1, ..., η_k)) = α²/k.   (9.4.6)

Moreover, the Fisher information J(α) is given by
J(α) = ∫ [(∂/∂α) log(exp(-x/α)/α)]² dQ_α(x) = α^{-2},   (9.4.7)

thus, α̂_k(η_1, ..., η_k) attains the Cramér-Rao bound (kJ(α))^{-1}. The central limit
theorem yields the asymptotic normality of α̂_k(η_1, ..., η_k). We have

P{k^{1/2} α^{-1} (α̂_k(η_1, ..., η_k) - α) ≤ t} = Φ(t) + o(1).   (9.4.8)

Moreover, (9.4.8) holds with O(k^{-1/2}) in place of o(1) according to the
Berry-Esséen theorem.
Corresponding to the results of Section 9.2, the m.l. estimator is asymptotically efficient within the class of all locally uniformly asymptotically
median unbiased estimators α*_k(η_1, ..., η_k). For t', t'' > 0 we get

P{-t'k^{-1/2} ≤ α*_k(η_1, ..., η_k) - α ≤ t''k^{-1/2}}
≤ P{-t'k^{-1/2} ≤ α̂_k(η_1, ..., η_k) - α ≤ t''k^{-1/2}} + o(1).   (9.4.9)
9.5. Inference Based on Upper Extremes
In analogy to the investigations in Section 9.3 we are going to examine the
relation between the actual model of distributions of upper extremes and the
model built by the limiting distributions

{G^{(0,σ)}_{1,1/α,k}: σ > 0, α > 0}

as introduced in Section 9.4. Let f be a density which satisfies condition (9.3.1),
that is,

f(x) = (σα)^{-1} (x/σ)^{-(1+1/α)} e^{h(x/σ)}   for x ≥ (x_0 σ)^α,   (9.5.1)

where x_0 > 0 is fixed and h is a (measurable) function such that

|h(x)| ≤ L |x|^{-δ/α}

for some constants L > 0 and δ > 0.
Contrary to Section 9.3, the statistical inference will now be based on the
k upper extremes (X_{n:n}, ..., X_{n-k+1:n}) of a sample of n i.i.d. random variables
with common density f.
The distribution of (X_{n:n}, ..., X_{n-k+1:n}) will heavily depend on the parameters α and σ, whereas the dependence on h, x_0, and L can be neglected if n is
sufficiently large and k is small compared to n.
It is immediate from Corollary 5.5.5 that

sup_B |P{(X_{n:n}, ..., X_{n-k+1:n}) ∈ B} - G^{(0,σn^α)}_{1,1/α,k}(B)| = O((k/n)^δ k^{1/2} + k/n)   (9.5.2)

uniformly over n, k ∈ {1, ..., n} and densities f which satisfy (9.5.1) for some
fixed constants δ, L, and x_0.
Thus, the transformation as introduced in (9.4.3) yields

sup_B |P{(log(X_{n:n}/X_{n-1:n}), ..., (k-1) log(X_{n-k+2:n}/X_{n-k+1:n}), X_{n-k+1:n}) ∈ B}
- (Q_α^{k-1} × G^{(0,σn^α)}_{1,1/α,k})(B)| = O((k/n)^δ k^{1/2} + k/n).   (9.5.3)
The optimal estimator in the exponential model {Q_α^{k-1}: α > 0} with unknown scale parameter α (compare with Section 9.4) is the m.l. estimator
α̂_{k-1}(η_1, ..., η_{k-1}) = (k-1)^{-1} Σ_{i=1}^{k-1} η_i, where η_1, ..., η_{k-1} are i.i.d. random variables
with common distribution Q_α. Thus, within the error bound given in (9.5.3),
the estimator

α*_{k,n} = (k-1)^{-1} Σ_{i=1}^{k-1} i log(X_{n-i+1:n}/X_{n-i:n})   (9.5.4)
      = [(k-1)^{-1} Σ_{i=1}^{k-1} log X_{n-i+1:n}] - log X_{n-k+1:n}

has the same performance as the m.l. estimator α̂_{k-1}(η_1, ..., η_{k-1}) as far as covering
probabilities are concerned. We remark that α*_{k,n} is Hill's (1975) estimator. The
optimality property carries over from α̂_{k-1}(η_1, ..., η_{k-1}) to α*_{k,n}. From (9.5.3) we get,
for t', t'' > 0,
P{-t'k^{-1/2} ≤ α^{-1}(α*_{k,n} - α) ≤ t''k^{-1/2}}
= P{k(1 - t'k^{-1/2}) ≤ Y_{k-1} ≤ k(1 + t''k^{-1/2})} + O((k/n)^δ k^{1/2} + k/n)   (9.5.5)
= Φ(t'') - Φ(-t') + O((k/n)^δ k^{1/2} + k/n + k^{-1/2})

where Y_{k-1} is a gamma r.v. with parameter k-1.
From (9.5.5) we see that the gamma approximation is preferable to the
normal approximation if k is small. From an Edgeworth expansion of length
2 one obtains that the term k^{-1/2} in the 3rd line of (9.5.5) can be replaced by
k^{-1} if t' = t''.
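Both representations of Hill's estimator in (9.5.4) can be coded directly. The following sketch (the helper `hill` and all sample sizes are hypothetical choices of this illustration) checks that the two forms agree and applies the estimator to an exact Pareto sample:

```python
import numpy as np

def hill(x, k):
    """Hill's estimator alpha*_{k,n} based on the k largest order statistics."""
    top = np.sort(x)[-k:][::-1]            # descending: X_{n:n}, ..., X_{n-k+1:n}
    i = np.arange(1, k)
    # first form in (9.5.4): weighted log-spacings
    spacings = (i * np.log(top[i - 1] / top[i])).sum() / (k - 1)
    # second form in (9.5.4): mean log-excess over the k-th largest observation
    excess = np.log(top[:-1]).mean() - np.log(top[-1])
    assert np.allclose(spacings, excess)   # the two forms coincide
    return excess

rng = np.random.default_rng(2)
alpha, n = 0.8, 20_000
x = rng.pareto(1.0 / alpha, size=n) + 1.0  # Pareto d.f. 1 - x**(-1/alpha), x >= 1
alpha_hat = hill(x, k=500)
print(alpha_hat)                           # should be close to alpha = 0.8
```

For exact Pareto data the log-excesses are exactly i.i.d. exponential with mean α, so the estimate is unbiased here.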
9.6. Comparison of Different Approaches
In Sections 9.3 and 9.5 we studied the nonparametric model given by
densities f of the form

f(x) = (σα)^{-1} (x/σ)^{-(1+1/α)} e^{h(x/σ)}   for x ≥ (x_0 σ)^α,   (9.6.1)

where h satisfies the condition

|h(x)| ≤ L |x|^{-δ/α}.

Let n = mk. Given the i.i.d. random variables ξ_1, ..., ξ_n with common density f,
let X^{(j)}_{m:m} be the maximum based on the jth subsample of r.v.'s ξ_{(j-1)m+1}, ...,
ξ_{jm} for j = 1, ..., k. Moreover, X_{n-k+1:n}, ..., X_{n:n} are the k largest order
statistics of ξ_1, ..., ξ_n. We write
α̂_{k,n} = α̂_k(log X^{(1)}_{m:m}, ..., log X^{(k)}_{m:m})   (9.6.2)

where α̂_k is the solution of (9.2.12). From (9.3.3) we know that, for every t,

P{(kπ²/6)^{1/2} α^{-1}(α̂_{k,n} - α) ≤ t} = Φ(t) + O(k^{1/2}(m^{-δ} + m^{-1}) + k^{-1/2}).   (9.6.3)

Recall from (9.5.6) that Hill's estimator α*_{k,n}, which is based on the k largest
order statistics, has the following property:

P{k^{1/2} α^{-1}(α*_{k,n} - α) ≤ t} = Φ(t) + O((k/n)^δ k^{1/2} + k/n + k^{-1/2})   (9.6.4)

for every t.
A comparison of (9.6.3) and (9.6.4) shows that the asymptotic relative
efficiency of Hill's estimator α*_{k,n} w.r.t. the estimator α̂_{k,n}, based on the sample
maxima of subsamples, is given by

ARE(α*_{k,n}, α̂_{k,n}) = 6/π² = 0.6079....   (9.6.5)

Thus, Hill's estimator is asymptotically inefficient if both estimators are based
on the same number k ≡ k(n) of observations (where, of course, the error
bounds in (9.6.3) and (9.6.4) have to go to zero as n → ∞). Notice that the error
bounds in (9.6.3) and (9.6.4) are of the same order if δ ≤ 1, which is perhaps
the most interesting case. A numerical comparison of both estimators for small
sample sizes showed an excellent agreement with the asymptotic results.
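The constant in (9.6.5) is the ratio of the asymptotic variances 6α²/(π²k) (Theorem 9.3.1) and α²/k (exponential model). A small sketch (hypothetical parameters; for exact Pareto data Hill's estimator is exactly a mean of k-1 exponential r.v.'s, so its variance is α²/(k-1)):

```python
import math
import numpy as np

# the ARE constant in (9.6.5): ratio of the two asymptotic variances
are = 6.0 / math.pi ** 2
print(round(are, 4))                   # 0.6079

# Monte Carlo check of Hill's variance alpha^2/(k-1) for exact Pareto data
rng = np.random.default_rng(3)
alpha, n, k, reps = 1.0, 2_000, 50, 2_000
est = np.empty(reps)
for r in range(reps):
    x = rng.pareto(1.0 / alpha, size=n) + 1.0
    top = np.sort(x)[-k:]              # ascending: top[0] = X_{n-k+1:n}
    est[r] = np.log(top[1:]).mean() - np.log(top[0])
print(est.var() * (k - 1))             # should be close to alpha^2 = 1
```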
The crucial point is the choice of the number k. This problem is similar to
that of choosing the bandwidth in the context of kernel density estimators as
discussed in Section 8.2.
The above results are applicable if (k/n)^δ k^{1/2} is sufficiently small, where for
the sake of simplicity it is assumed that δ ≤ 1. On the other hand, the relations
(9.6.3) and (9.6.4) show that k should be large to obtain estimators of a good
performance. This leads to the proposal to take

k = c n^{2δ/(2δ+1)}   (9.6.6)

for some appropriate choice of the constant c. If δ is known to satisfy a
condition 0 < δ_0 ≤ δ ≤ 1, where δ_0 is known, then one may take k as in (9.6.6)
with δ replaced by δ_0.
Within a smaller model, that is, when the densities f satisfy a stronger
regularity condition, it was proved by Hall and Welsh (1985) that δ can
consistently be estimated from the data, obtaining in this way an adaptive
version of Hill's estimator. S. Csörgő et al. (1985) were able to show that the
bias term of Hill's estimator (and of related estimators) restricts the choice of
the number k; the balance between the variance and the bias determines the
performance of the estimator and the optimal choice of k. These results are
proved under conditions weaker than that given in (5.2.18). By using (5.2.18),
thus strengthening (9.6.1), we may suppose that the density f satisfies the
condition

f(x) = (σα)^{-1} (x/σ)^{-(1+1/α)} (1 - K(x/σ)^{-β/α} + h(x/σ)),   (9.6.7)

where h again satisfies |h(x)| ≤ L |x|^{-δ/α} and 0 < β ≤ δ ≤ 1. According to the
results of Section 5.2, the expansion of
length 2 of the form

G_{1,1/α}(x/σ) (1 + m^{-β} (K/(1+β)) (x/σ)^{-(1+β)/α})   (9.6.8)

provides a better approximation to the normalized d.f. of the maximum X^{(j)}_{m:m}
than the Fréchet d.f. G_{1,1/α}.
The d.f.'s in (9.6.8) define an extended extreme value model that contains
the classical one for K = 0. Notice that the restricted original model of
distributions of sample maxima is approximated by the extended extreme
value model with a higher accuracy.
The approach developed in this chapter is again applicable. By constructing an estimator of α in the extended model one is able to find an estimator
of α in the original model of densities satisfying condition (9.6.7). The details
are carried out in Reiss (1989).
It is needless to say that our approach also helps to solve various other
problems. We mention two-sample problems or, more generally, m-sample
problems. If every sample consists of the k largest order statistics with k ≥ 2
and m tends to infinity, then one needs modified versions of the results of
Section 5.5, namely, a formulation w.r.t. the Hellinger distance instead of the
variational distance, to obtain sharp bounds for the remainder terms of the
approximations. Such situations are discussed in articles by R.L. Smith (1986),
testing the trend of the Venice sea-level data, and I. Gomes (1981).
9.7. Estimating the Quantile Function Near the Endpoints
Let us recall the basic idea behind the method adopted in Section 8.2 to
estimate the underlying q.f. F^{-1}. Under the condition that F^{-1} has bounded
derivatives it is plausible to use an estimator which also has bounded
derivatives. Thus, the sample q.f. F_n^{-1} has been smoothed by means of an
appropriate kernel. One has to choose a bandwidth which controls to some
extent the degree of smoothness of the resulting kernel estimator F^{-1}_{n,β}.
For q close to 0 or 1 the required smoothness condition imposed on
F^{-1} will hold only in exceptional cases. So if no further information about
F^{-1} is available, it is advisable to reduce the degree of smoothing when q
approaches 0 or 1 (as was done in Section 8.2).
However, for q close to 0 or 1 we are in the realm of extreme value theory.
In many situations the statistician will accept the condition that the underlying d.f. F belongs to the domain of attraction of an extreme value distribution. As pointed out in Section 5.1, this condition can be interpreted in the
way that the tail of F lies in a neighborhood of a generalized Pareto distribution W_{i,α} with shape parameter α.
This suggests estimating the unknown q.f. F^{-1} near the endpoints by
means of the q.f. of a generalized Pareto distribution where the unknown
parameters are replaced by estimates.
When treating the full extreme value model it is advisable to make
use of the von Mises parametrization of generalized Pareto distributions as
given in Section 5.1. Then, in a first step, one has to estimate the unknown
parameters. As already pointed out, the full 3-parameter model contains
regular as well as non-regular submodels, so that a satisfactory treatment of
this problem seems to be quite challenging from the mathematical point of
view.
In practice the statistician will often be able to specify a certain submodel.
We shall confine ourselves to the treatment of the upper tails of d.f.'s F which
belong to a neighborhood of a Pareto d.f. W^{(0,σ)}_{1,1/α} with scale parameter σ.
Thus,

W^{(0,σ)}_{1,1/α}(x) = 1 - (x/σ)^{-1/α},   x > σ,   (9.7.1)

and the q.f. is given by

(W^{(0,σ)}_{1,1/α})^{-1}(q) = σ(1 - q)^{-α},   0 < q < 1.   (9.7.2)
The estimator G_n^{-1} is defined by

G_n^{-1}(q) = F^{-1}_{n,β}(q)   if q ≤ x_0,
G_n^{-1}(q) = F^{-1}_{n,β}(x_0) ((1-q)/(1-x_0))^{-α*_{k,n}}   if x_0 < q,   (9.7.3)

where F^{-1}_{n,β} is the kernel q.f. as defined in Section 8.2 and α*_{k,n} is the Hill estimator
defined in (9.5.4).
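A minimal sketch of this construction, in which the kernel q.f. F^{-1}_{n,β} is replaced for simplicity by the sample q.f. (an assumption of this illustration only, as is the helper name `tail_quantile_estimator`) and α* is computed by Hill's estimator from the observations above the x_0-quantile:

```python
import numpy as np

def tail_quantile_estimator(x, q, x0=0.9):
    """Sample q.f. below x0, fitted Pareto tail as in (9.7.3) beyond x0."""
    xs = np.sort(x)
    n = len(xs)
    k = n - int(np.ceil(n * x0)) + 1                       # top observations
    top = xs[-k:]                                          # ascending
    alpha_star = np.log(top[1:]).mean() - np.log(top[0])   # Hill's estimator
    def sample_qf(p):
        return xs[min(int(np.ceil(n * p)) - 1, n - 1)]
    if q <= x0:
        return sample_qf(q)
    return sample_qf(x0) * ((1.0 - q) / (1.0 - x0)) ** (-alpha_star)

rng = np.random.default_rng(4)
x = (-np.log(rng.uniform(size=1000))) ** (-1.0)   # standard Fréchet G_{1,1} sample
q99 = tail_quantile_estimator(x, 0.99)
print(q99, (-np.log(0.99)) ** (-1.0))             # estimate vs. true 0.99-quantile
```

The two branches join continuously at q = x_0, since the Pareto factor equals one there.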
In Figures 9.7.1 and 9.7.2, n = 100 pseudo-random numbers were drawn
according to the standard Fréchet d.f. G_{1,1}. The point x_0 was chosen to be
equal to 0.9; the estimate of α is equal to 1.012.
In Figure 9.7.1 the inverse (F_{n,β})^{-1} of the kernel estimator of the d.f. cannot
visually be distinguished from the sample q.f. F_n^{-1} (compare this with the
remarks to Figure 8.2.6).
As indicated above, the philosophy behind this procedure is the following:
up to some point x_0 we only have the information that the underlying q.f. is
smooth; thus, the kernel method is applicable. Beyond the point x_0 we are in
the realm of extreme value theory, and hence the use of a Pareto tail with
estimated parameters may be appropriate. The choice of the point x_0 is crucial.
There seems to be some relationship to the well-known problem of estimating
Figure 9.7.1. G^{-1}_{1,1}, F_n^{-1}, F^{-1}_{n,β}, and (F_{n,β})^{-1} with β = 0.08.

Figure 9.7.2. G^{-1}_{1,1}, F_n^{-1}, and the estimated Pareto tail G_n^{-1}.
a change point of a sequence of r.v.'s, where the underlying distribution
changes its parameter after an unknown time point.
P.9. Problems and Supplements
1. Prove that there exists a unique solution of the log-likelihood equations (9.2.9) and
(9.2.10) provided the values x_1, ..., x_k are not all identical.
2. (Estimators based on sample mean and sample standard deviation)
Let η_1, ..., η_k be i.i.d. random variables with mean μ and variance σ². Denote by
μ_i the ith central moment, by m_k the sample mean and by s_k² the sample variance.
(i) Prove that k^{1/2}(m_k - μ, s_k² - σ²) is asymptotically normal with mean vector
zero and covariance matrix given by σ_{1,1} = σ², σ_{1,2} = σ_{2,1} = μ_3, σ_{2,2} = μ_4 - σ⁴
(see Serfling, 1980, page 114).
(ii) Prove the corresponding result for the sample mean m_k and the sample standard
deviation s_k.
[Hint: Apply (i) and Theorem A, Serfling, 1980, page 122.]
(iii) Let η_1, ..., η_k be i.i.d. random variables with common Gumbel d.f. G_3^{(θ,α)}, where
θ and α denote the location and scale parameters. Define

θ*_k = m_k - γα*_k   and   α*_k = (6^{1/2}/π) s_k,

where γ is Euler's constant. Prove that k^{1/2}(θ*_k - θ, α*_k - α) is asymptotically
normal with mean vector zero and covariance matrix given by

σ_{1,1} = [π²/6 + γ²(β_2 - 1)/4 - γπβ_1/6^{1/2}] α²,
σ_{1,2} = σ_{2,1} = [β_1 - 6^{1/2} γ(β_2 - 1)/2π] 6^{1/2} π α²/12,
σ_{2,2} = (β_2 - 1) α²/4,

where β_1 = μ_3/μ_2^{3/2} and β_2 = μ_4/μ_2²
(see Tiago de Oliveira, 1963, and Johnson and Kotz, 1970).
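The estimators in (iii) are easily checked for consistency by simulation (a sketch with hypothetical parameter values):

```python
import numpy as np

rng = np.random.default_rng(7)
theta, alpha, k = 2.0, 3.0, 100_000
gamma = 0.5772156649015329                     # Euler's constant

# Gumbel(theta, alpha) sample via inversion of exp(-exp(-(x - theta)/alpha))
eta = theta - alpha * np.log(-np.log(rng.uniform(size=k)))

alpha_star = np.sqrt(6.0) / np.pi * eta.std()  # (6^{1/2}/pi) s_k
theta_star = eta.mean() - gamma * alpha_star   # m_k - gamma alpha*_k
print(theta_star, alpha_star)                  # should be close to (2, 3)
```

The estimators use that the Gumbel distribution has mean θ + γα and standard deviation απ/6^{1/2}.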
3. (Estimators based on order statistics)
Prove a result corresponding to that in P.9.2(iii) by using estimators as given in
(6.2.6) and (6.2.7).
4. In Figure 9.7.2 we see that the second largest and largest observations are about 60
and 180.
(i) Let X_{r:n} be the rth order statistic of n i.i.d. random variables with common
Pareto d.f. W_{1,α}. Then, for u ≥ 1,

P{X_{n:n} > u X_{n-1:n}} = u^{-α}.

[Hint: Apply Corollary 1.6.12(ii).]
(ii) Let α = 1 as in Figure 9.7.2. Notice that

P{X_{n:n} > 3 X_{n-1:n}} = 1/3.
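The numerical value in (ii) is easily confirmed by simulation (hypothetical sample and replication sizes):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 20_000

# standard Pareto samples with alpha = 1: d.f. W_{1,1}(x) = 1 - 1/x, x >= 1
x = 1.0 / rng.uniform(size=(reps, n))
xs = np.sort(x, axis=1)
freq = (xs[:, -1] > 3.0 * xs[:, -2]).mean()   # frequency of {X_{n:n} > 3 X_{n-1:n}}
print(freq)                                   # should be close to 1/3
```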
Bibliographical Notes
In this book we primarily explore the distributional properties of order
statistics and relations between models of actual distributions of order statistics and approximate, parametric models. Statistical procedures are studied
as examples to show in which way parametric statistical procedures become
relevant within the nonparametric context.
A proper place for an exhaustive list of a greater number of parametric
statistical procedures in extreme value models is a book like that of Johnson
and Kotz (1970): Chapters 18, 20, and 21 deal with exponential, Weibull, and
Gumbel models. We will only give a summary by using keywords out of these
chapters: maximum likelihood (m.l.), minimum variance unbiased, Bayesian,
censoring, quick estimators, method of moments, best linear unbiased.
One might add (compare with Herbach (1984) and Mann (1984)) the
additional keywords: best linear invariant, unbiased nearly best linear, simplified linear.
Further references may be found in the following articles. Smith (1985a)
studied the asymptotic behavior of m.l. estimators in non-regular models like
the Weibull model; see also Polfeldt (1970). By the way, see Reiss (1973, 1978b)
and Pitman (1979) for consistency results concerning m.l. estimators in models
of unimodal d.f.'s and, respectively, in models with location and scale parameters. New quick estimators of location and scale parameters in the Gumbel
model have been proposed by Hüsler and Schüpbach (1986). Quick tests and
a locally most powerful test have been studied by van Montfort and Gomes
(1985) for testing Gumbel d.f.'s against Fréchet and Weibull alternatives.
In Section 9.2 we mentioned the asymptotic normality of the m.l. estimator
of the location and scale parameters in the Gumbel model. Higher order
approximations of the distribution of the m.l. estimator can be obtained by
means of expansions. These expansions may e.g. be applied to establish
asymptotic median unbiasedness of a higher order. We refer to R. Michel
(1975) for expansions in the case of vector parameters and to Miebach
(1977) for a specialization of these results to families with location and scale
parameters.
Next, we make some further comments about the estimation of the tail
index and related problems. Statistical extreme value theory is based on
the idea that the parametric extreme value model is an approximation of the
model of actual distributions of maxima. This idea was made rigorous by
Weiss (1971) in a particular case by treating a model of densities in a neighborhood of Weibull densities. Weiss constructed quick estimators of the location,
scale, and shape (= tail index) parameters based on extreme and intermediate
order statistics. The estimator of the tail index is based on two intermediate
order statistics. This is of interest because an alternative approach, namely,
the use of the k largest order statistics, with k being fixed, fails to entail
consistent estimators. The article of Hill (1975) attracted more attention than
that of Weiss. Presumably, the reason for this is that Hill's estimator is efficient
and, moreover, is related to the m.l. estimator of the scale parameter in the
exponential model. Notice that the estimation of the tail index based on the
k largest order statistics, with k fixed and n → ∞, is equivalent to estimating
the scale parameter in the exponential model for the fixed sample size k. Hill's
estimator and related estimators were extensively studied in the literature [e.g. de
Haan and Resnick (1980), Hall (1982b), Hall and Welsh (1984), Haeusler and
Teugels (1985), and Smith (1987)]. The estimation of the endpoint of d.f.'s in
the Weibull case was treated by Hall (1982a).
Falk (1985b) took up Weiss' approach of approximating models and derived the properties of Hill's estimator, as essentially known in the literature (Hall (1982b), Haeusler
and Teugels (1985)), by using the properties of the m.l.
estimator in the exponential model (compare with Sections 9.4 and 9.5).
The method of taking maxima of subsamples is due to Gumbel; a typical
example is to take annual maxima. The results of Sections 9.5 and 9.6 are
partly taken from Reiss (1987). A comparison of the two different methods,
namely, to base the inference on the k largest order statistics and, respectively,
to use the subsample method, was also carried out in the paper by Hüsler and
Tiago de Oliveira (1988) within a parametric framework.
The estimation of the parameters of extreme value d.f.'s is related to the
estimation of the q.f. near the ends of the support (see Section 9.7). This
subject was dealt with in the articles by Weissman (1978), Boos (1984), Joe
(1987), Smith (1987), and Smith and Weissman (1987), among others. In this
context another interesting paper is that of Heidelberger and Lewis (1984)
who suggested applying the subsample method to reduce the possible correlation of the r.v.'s and to reduce the problem of estimating extreme quantiles to
that of estimating the median; moreover, it may have computational advantages to reduce the sample size in certain simulations by applying the subsample method.
The statistical procedures in Sections 9.2-9.6 are either based on the k
largest order statistics or on k subsamples. The choice of the number k
is crucial for the performance of the statistical procedures. The optimal
choice heavily depends on the given model, as is pointed out in this chapter.
Some work has been done concerning the selection of the model; we refer to
Pickands (1975), Hall and Welsh (1985), and to Section 9.5 for some results.
The advice of DuMouchel (1983) to take the upper 10 per cent of the sample
might be valuable for practitioners. The visual comparison between sample
and extreme value d.f.'s gives further insight into the problem.
CHAPTER 10
Approximate Sufficiency of
Sparse Order Statistics
This chapter starts with an introduction to "comparison of statistical models"
where, in addition to Section 9.1, we also make use of Markov kernels.
In Section 10.2 it is shown that sparse order statistics X_{r_1:n}, X_{r_2:n}, ..., X_{r_k:n}
are approximately sufficient over a nonparametric neighborhood of a fixed
d.f. F_0. This result will be proved under particularly weak conditions.
In Section 10.3 the fixed d.f. F_0 will be replaced by a parametric family of
d.f.'s. In the case of the location and scale parameter family of uniform
distributions, the extended result follows immediately from Section 10.2. In
other cases, one has to include an auxiliary estimator of the unknown parameter into the considerations.
Since sparse order statistics are asymptotically jointly normal, one obtains
a normal approximation of the nonparametric model of distributions of
(X_{r:n}, X_{r+1:n}, ..., X_{s:n}) or (X_{1:n}, ..., X_{n:n}). The usefulness of this approach
will be demonstrated in Section 10.4 by considering a nonparametric testing
problem.
10.1. Comparison of Statistical Models via Markov Kernels
The statistical models 𝒫 and 𝒬 we are primarily concerned with are built by
the joint distributions of the order statistics X_{1:n}, ..., X_{n:n} and, respectively,
X_{r_1:n}, X_{r_2:n}, ..., X_{r_k:n} where 1 ≤ r_1 ≤ ... ≤ r_k ≤ n. The order statistics come
from n i.i.d. random variables with common d.f. F which belongs to a certain
nonparametric family of d.f.'s.
It is obvious that the projection T, defined by

T(x_1, ..., x_n) = (x_{r_1}, ..., x_{r_k}),   (10.1.1)

carries the model 𝒫 to the model 𝒬. Notice that the map T is not one-to-one,
and hence to return from 𝒬 to 𝒫 one has to make use of a Markov kernel.
Markov Kernels
A Markov kernel K carrying mass from the probability space (S_1, ℬ_1, Q) to
(S_0, ℬ_0) has the following two properties:
(a) K(·|y) is a probability measure on ℬ_0 for every y ∈ S_1, and
(b) K(B|·) is measurable for every B ∈ ℬ_0.
Recall that

KQ(B) := ∫ K(B|y) dQ(y)   (10.1.2)

defines a probability measure on ℬ_0. KQ is the distribution of the Markov
kernel K (under Q). Thus, the symbol K also denotes a map from the family
of probability measures on ℬ_1 into that on ℬ_0.
The reader is reminded of the following interpretation of KQ. First observe
y, which is an outcome of an experiment governed by Q. Secondly, carry out
an experiment governed by K(·|y) and observe x. Then, the two-step experiment
with the final outcome x is governed by KQ.
Note that the distribution of a map T (under Q) can be written as KQ where
K is the Markov kernel defined by

K(B|y) = 1_B(T(y)) = ε_{T(y)}(B)

with ε_x denoting the Dirac measure with mass 1 at x. In this case, given y, the
value T(y) is chosen "with probability one."
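The two-step interpretation can be sketched in a few lines of simulation; in the following illustration Q and K are purely hypothetical choices (Q the uniform distribution on {0, 1}, and K(·|y) the normal distribution with mean y), so that the empirical moments of x match those of the mixture KQ:

```python
import numpy as np

rng = np.random.default_rng(6)
reps = 50_000

y = rng.integers(0, 2, size=reps)      # first step:  y distributed according to Q
x = rng.normal(loc=y, scale=1.0)       # second step: x distributed according to K(.|y)

# KQ is the equal-weight mixture of N(0,1) and N(1,1):
# its mean is 1/2 and its variance is 1 + 1/4
print(x.mean(), x.var())
```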
More Informative and Blackwell-Sufficiency
In the sequel, we are given two models 𝒫 = {P_θ: θ ∈ Θ} and 𝒬 = {Q_θ: θ ∈ Θ}
such that TP_θ = Q_θ, θ ∈ Θ (in other words, if ξ is a r.v. with distribution P_θ
then η = T(ξ) is distributed according to Q_θ).
Notice that the models 𝒫 and 𝒬 may be defined on different measurable
spaces (S_0, ℬ_0) and (S_1, ℬ_1), like Euclidean spaces of different dimensions.
It is desirable to find a Markov kernel K (independent of the parameter
θ) such that

KQ_θ = P_θ,   θ ∈ Θ,   (10.1.3)

which means that P_θ can be reconstructed from Q_θ by means of the Markov
kernel K.
If (10.1.3) holds then 𝒬 is said to be more informative than 𝒫. If, in
addition, TP_θ = Q_θ, θ ∈ Θ, then both models are equivalent, and T is said to be
Blackwell-sufficient.
Recall from Section 9.1 that TP_θ = Q_θ, θ ∈ Θ, implies that for every statistical procedure on 𝒬 one finds a procedure on 𝒫 of equal performance.
Under (10.1.3) also the converse conclusion holds. Let us exemplify this idea
in the context of the testing problem.
Let C ∈ ℬ_0 be a critical region (acting on 𝒫). Then, the critical function
K(C|·): S_1 → [0, 1] is of equal performance if, as usual, the comparison is
based on power functions. This becomes obvious by noting that, according to
(10.1.2) and (10.1.3),

P_θ(C) = ∫ K(C|y) dQ_θ(y),   θ ∈ Θ.   (10.1.4)
The same conclusion holds if one starts with a critical function ψ defined
on S_0. The Fubini theorem for Markov kernels implies that

∫ [∫ ψ(x) K(dx|y)] dQ_θ(y) = ∫ ψ dP_θ,   θ ∈ Θ,   (10.1.5)

and hence the critical functions ψ and ψ̄ = ∫ ψ(x) K(dx|·) are of equal
performance.
Blackwell-Sufficiency and Sufficiency
We continue our discussion of basic statistical concepts, being aware that there
is a good chance of boring some readers. However, if this is the case, omit the
next lines and continue with Example 10.1.2 and the definition of the ε-deficiency for unequal parameter sets.
The classical concept of sufficiency is closely related to that of Blackwell-sufficiency. In fact, under mild regularity conditions, which are always satisfied
in our context, Blackwell-sufficiency and sufficiency are equivalent [see e.g.
Heyer (1982, Theorem 22.12)].
Recall that T: S_0 → S_1 is sufficient if for every critical function ψ defined on
S_0 there exists a version E(ψ|T) of the conditional expectation w.r.t. T which
does not depend on the parameter θ. Then, the Blackwell-sufficiency holds
with a Markov kernel defined by

K(B|y) = Q(B|T = y)

where Q(B|T = y) are appropriate versions of the conditional probability of
B given T = y (in other words, K is the factorization of the conditional
distribution of the identity on S_0 given T). Check that

E(ψ|T) = ∫ ψ(x) K(dx|T)   w.p. 1.
Recall that the Neyman criterion provides a powerful tool for the verification of sufficiency of T. The sufficiency holds if the density p_θ of P_θ (w.r.t. some
dominating measure) can be factorized in the form p_θ = r (h_θ ∘ T).
EXAMPLES 10.1.1.
(i) Let 𝒫 be a family of uniform distributions with unknown location parameter. Then (X_{1:n}, X_{n:n}) is sufficient.
(ii) Let 𝒫 be a family of exponential distributions with unknown location
parameter. Then X_{1:n} is sufficient.
The concept of Blackwell-sufficiency will be extended in two steps. First we
consider the situation where (10.1.3) holds with a remainder term. The second
extension also includes the case where the parameter sets of 𝒫 and 𝒬 are
unequal.
Approximate Sufficiency and ε-Deficiency
If (10.1.3) does not hold for any Markov kernel then one may try to find a
Markov kernel K such that the variational distances sup_B |P_θ(B) - KQ_θ(B)|,
θ ∈ Θ, are small. We say that 𝒬 is ε-deficient w.r.t. 𝒫 if

sup_B |P_θ(B) - KQ_θ(B)| ≤ ε(θ),   θ ∈ Θ,

for some Markov kernel K.
In this context, the map T may be called approximately sufficient if ε(θ) is
small. Define the one-sided deficiency δ(𝒬, 𝒫) of 𝒬 w.r.t. 𝒫 by

δ(𝒬, 𝒫) := inf_K sup_{θ∈Θ} sup_B |P_θ(B) - KQ_θ(B)|   (10.1.6)

where K ranges over all Markov kernels from (S_1, ℬ_1) to (S_0, ℬ_0). The deficiency δ(𝒬, 𝒫) measures the amount of information which is
needed so that 𝒬 is more informative than 𝒫. If TP_θ = Q_θ, θ ∈ Θ, then
δ(𝒫, 𝒬) = 0.
Notice that δ(𝒬, 𝒫) is not symmetric. To obtain a symmetric distance
between 𝒬 and 𝒫 define the symmetric deficiency

Δ(𝒬, 𝒫) = max(δ(𝒬, 𝒫), δ(𝒫, 𝒬)).   (10.1.7)
The arguments in (10.1.4) and (10.1.5) carry over to the present situation;
now, we have to include some remainder term into our considerations. Let
again K be a Markov kernel carrying mass from (S_1, ℬ_1, 𝒬) to (S_0, ℬ_0). If ψ*
is an optimal critical function acting on 𝒬 then ψ** = ψ*(T) is optimal on 𝒫
within the error bound δ(𝒬, 𝒫). To prove this, consider a critical function ψ
on S_0. We have

∫ ψ** dP_θ = ∫ ψ* dQ_θ ≥ ∫ [∫ ψ(x) K(dx|y)] dQ_θ(y) = ∫ ψ d(KQ_θ)
≥ ∫ ψ dP_θ - sup_B |P_θ(B) - KQ_θ(B)|   (10.1.8)

for every Markov kernel K, and hence

∫ ψ** dP_θ ≥ ∫ ψ dP_θ - δ(𝒬, 𝒫),   (10.1.9)

showing the desired conclusion.
Next we shall give a simple and, as we hope, illuminating example in order not
to remain too theoretical. The technical details are omitted in order not to
disturb the flow of the main ideas.

EXAMPLE 10.1.2. Consider the location parameter model 𝒫_{0,n} = {P_{0,n,θ}} of a
sample of size n arising out of the densities

x → f(x - θ)   (10.1.10)

with f being fixed. Assume that f(x) > 0 for a ≤ x ≤ b, and f(x) = 0 otherwise.
A typical example is given by the uniform density

f = 2^{-1} 1_{[-1,1]}.   (10.1.11)

Denote by 𝒫⁰_{0,n} the special model under condition (10.1.11). Recall from
Example 10.1.1 that (X_{1:n}, X_{n:n}) is a sufficient statistic in this case.
Step 1 (Approximate Sufficiency of $(X_{1:n}, X_{n:n})$). Under weak regularity conditions it can be shown that $(X_{1:n}, X_{n:n})$ is still approximately sufficient for
the location model $\mathcal{P}_{0,n}$. We refer to Weiss (1979b) for a global treatment and
to Janssen and Reiss (1988) for a local "one-sided" treatment of this problem.
The technique for proving such a result will be developed in the next section.
Regularity conditions have to guarantee that no further jumps of the density
occur besides those at the points $a$, $b$.

Let $\mathcal{P}_{1,n} = \{P_{1,n,\theta}\}$ denote the model of distributions of $(X_{1:n}, X_{n:n})$ under
the parameter $\theta$.

Approximate sufficiency means that there exists a Markov kernel $K_1$ such
that $P_{0,n,\theta}$ can approximately be rebuilt by $K_1 P_{1,n,\theta}$. In terms of $\varepsilon$-deficiency
we have

$$\Delta(\mathcal{P}_{0,n}, \mathcal{P}_{1,n}) \le \varepsilon(n) \qquad (10.1.12)$$

where $\varepsilon(n) \to 0$, $n \to \infty$. We remark that $\varepsilon(n) = O(n^{-1})$ under certain regularity
conditions.

In the special case of (10.1.11), obviously,

$$\Delta(\mathcal{P}_{0,n}^*, \mathcal{P}_{1,n}) = 0. \qquad (10.1.13)$$

Notice that $\mathcal{P}_{1,n}$ is again a location parameter model.
10.1. Comparison of Statistical Models via Markov Kernels
Step 2 (Asymptotic Independence of $X_{1:n}$ and $X_{n:n}$). Next $X_{1:n}$ and $X_{n:n}$ will
be replaced by independent versions $Y_{1:n}$ and $Y_{n:n}$; that is,

$$Y_{i:n} \stackrel{d}{=} X_{i:n}, \qquad i = 1, n, \qquad (10.1.14)$$

and $Y_{1:n}$, $Y_{n:n}$ are independent.

From (4.2.10) we know that the variational distance between the distributions of $(X_{1:n}, X_{n:n})$ and $(Y_{1:n}, Y_{n:n})$ is of order $O(n^{-1})$.

In this case, the Markov kernel which carries one model to the other is
simply represented by the identity. Denote by $\mathcal{P}_{2,n} = \{P_{2,n,\theta}\}$ the location
parameter model which consists of the distributions of $(Y_{1:n}, Y_{n:n})$. Then,

$$\Delta(\mathcal{P}_{1,n}, \mathcal{P}_{2,n}) = O(n^{-1}). \qquad (10.1.15)$$
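The near-independence of the two extremes can be checked by simulation. The sketch below (illustrative only; the sample sizes, replication count, and seed are arbitrary choices of ours) estimates the correlation between $X_{1:n}$ and $X_{n:n}$ for uniform samples; for the uniform case the exact correlation is $1/n$, in line with the $O(n^{-1})$ variational-distance statement quoted from (4.2.10).

```python
import numpy as np

rng = np.random.default_rng(0)

def minmax_corr(n, reps=20000):
    # Monte Carlo correlation between the sample minimum and maximum
    u = rng.uniform(size=(reps, n))
    lo, hi = u.min(axis=1), u.max(axis=1)
    return np.corrcoef(lo, hi)[0, 1]

# Exact correlation of (X_{1:n}, X_{n:n}) for uniform samples is 1/n.
for n in (5, 25, 125):
    print(n, minmax_corr(n))   # roughly 1/5, 1/25, 1/125
```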
Step 3 (Limiting Distributions of Extremes). Our journey through several
models is not yet finished. Under mild conditions (see Section 5.2), the extremes $Y_{1:n}$ and $Y_{n:n}$ have an exponential distribution with remainder term
of order $O(n^{-1})$. More precisely, if the extremes $Y_{i:n}$, $i = 1, n$, are generated
under the parameter $\theta$ then

$$\sup_B |P_\theta\{(Y_{1:n} - a) \in B\} - Q_{1,n,\theta}(B)| = O(n^{-1}) \qquad (10.1.16)$$

and

$$\sup_B |P_\theta\{(Y_{n:n} - b) \in B\} - Q_{2,n,\theta}(B)| = O(n^{-1})$$

where the $Q_{i,n,\theta}$ have the densities $q_{i,n}(\cdot - \theta)$ defined by

$$q_{1,n}(x) = \begin{cases} nf(a)\exp[-nf(a)x] & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \qquad (10.1.17)$$

and

$$q_{2,n}(y) = \begin{cases} nf(b)\exp[nf(b)y] & \text{if } y \le 0 \\ 0 & \text{if } y > 0. \end{cases}$$

We introduce the ultimate model $\mathcal{P}_{3,n} = \{P_{3,n,\theta}\}$ where

$$P_{3,n,\theta} = Q_{1,n,\theta} \times Q_{2,n,\theta}.$$

Note that $\mathcal{P}_{3,n}$ is again a location parameter model.

Summarizing the Steps 1-3 we get

$$\Delta(\mathcal{P}_{0,n}, \mathcal{P}_{3,n}) = O(\varepsilon(n) + n^{-1}). \qquad (10.1.18)$$

One may obtain a fixed asymptotic model by starting with the model of
distributions of $n(Y_{1:n} - a)$ and $n(Y_{n:n} - b)$ under local parameters $n\theta$ in place
of $\theta$.
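Step 3 can be made tangible by a small Monte Carlo experiment (an illustration of ours under the uniform density (10.1.11) with $\theta = 0$, so $a = -1$ and $f(a) = 1/2$; the sample size and replication count are arbitrary): the gap between the minimum and the left endpoint behaves like an exponential variable with mean $1/(nf(a))$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Under f = (1/2) 1_{[-1,1]}, the density q_{1,n} of (10.1.17) says that
# X_{1:n} - a is approximately exponential with mean 1/(n f(a)) = 2/n.
n, reps = 200, 50000
x = rng.uniform(-1.0, 1.0, size=(reps, n))
gap = x.min(axis=1) + 1.0            # X_{1:n} - a
print(gap.mean(), 2.0 / n)           # empirical vs exponential mean, both near 0.01
```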
Step 4 (Estimation of the Location Parameter). In a location parameter model
it makes sense to choose an optimal estimator out of the class of estimators
that are equivariant under translations; that is, given the model $\mathcal{P}_{3,n}$, the
estimator has the property

$$\theta_n(x + c, y + c) = \theta_n(x, y) + c. \qquad (10.1.19)$$

If $\theta_n$ is an optimal equivariant estimator on $\mathcal{P}_{3,n}$ then

$$\theta_n(X_{1:n} - a, X_{n:n} - b) \qquad (10.1.20)$$

is an equivariant estimator operating on $\mathcal{P}_{0,n}$ having the same performance
as $\theta_n$ besides a remainder term of order $O(\varepsilon(n) + n^{-1})$.

We remark that in order to show that $\theta_n(X_{1:n} - a, X_{n:n} - b)$ is the optimal
estimator on $\mathcal{P}_{0,n}$ one has to verify that $\theta_n$ is optimal within the class of all
randomized equivariant estimators operating on $\mathcal{P}_{3,n}$.
Let us examine the special case of uniform densities as given in (10.1.11). A
moment's reflection shows that necessarily

$$X_{n:n} - 1 \le \theta \le X_{1:n} + 1 \qquad (10.1.21)$$

so that any reasonable estimator has to lie between $X_{n:n} - 1$ and $X_{1:n} + 1$.

One could try to adopt the maximum likelihood (m.l.) principle for finding
an optimal estimator. However, the likelihood function

$$\theta \mapsto 2^{-n}\prod_{i=1}^n 1_{[\theta-1,\theta+1]}(x_{i:n})$$

has its maximum at any $\theta$ between $X_{n:n} - 1$ and $X_{1:n} + 1$. Hence, the m.l.
principle does not lead to a reasonable solution of the problem.

For location parameter models it is well known that Pitman estimators are
optimal within the class of equivariant estimators (see e.g. Ibragimov and
Has'minskii (1981), page 22, lines 1-9).

It is a simple exercise to verify that the midrange

$$(X_{1:n} + X_{n:n})/2 \qquad (10.1.22)$$

is a Pitman estimator w.r.t. any subconvex loss function $L(\cdot - \cdot)$. Note
that $L(\cdot - \cdot)$ is subconvex if $L$ is symmetric about zero and $L|[0, \infty)$ is
nondecreasing.
If $L$ is strictly increasing then the Pitman estimator is uniquely determined.

Let us return to the ultimate model $\mathcal{P}_{3,n}$. A Pitman estimate $\theta_n(x, y)$ w.r.t.
the loss function $L(\cdot - \cdot)$ minimizes

$$\int L(\theta - u)q_{1,n}(x - u)q_{2,n}(y - u)\,du \qquad (10.1.23)$$

in $\theta$. [Recall that the Pitman estimator is a generalized Bayes estimator with
the Lebesgue measure being the prior "distribution."]

Check that (10.1.23) is equivalent to solving the problem

$$\int_y^x L(\theta - u)\exp[n(f(a) - f(b))u]\,du = \min_\theta! \qquad (10.1.24)$$
If $f(a) = f(b)$ then for subconvex loss functions,

$$\theta_n(x, y) = (x + y)/2$$

is a solution of (10.1.24). Moreover, this is the unique solution if $L$ is strictly
increasing on $[0, \infty)$.

Thus,

$$[X_{1:n} + X_{n:n} - (a + b)]/2 \qquad (10.1.25)$$

is an "approximate" Pitman estimator in the original model $\mathcal{P}_{0,n}$.

The finding of explicit solutions of (10.1.24) for $f(a) \ne f(b)$ is an open
problem.
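The superiority of the midrange over generic equivariant estimators in the uniform location model is easy to see numerically. The following sketch (an illustration of ours; the parameter value, sample size, and seed are arbitrary) compares the midrange (10.1.22) with the sample mean: the midrange error is of order $n^{-1}$, the mean only achieves the $n^{-1/2}$ rate.

```python
import numpy as np

rng = np.random.default_rng(2)

# Uniform location model with f = (1/2) 1_{[-1,1]}, i.e. support (theta-1, theta+1).
theta, n, reps = 3.0, 400, 20000
x = rng.uniform(theta - 1.0, theta + 1.0, size=(reps, n))
midrange = (x.min(axis=1) + x.max(axis=1)) / 2.0
mean_est = x.mean(axis=1)
print(np.std(midrange), np.std(mean_est))   # midrange error is far smaller
```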
Unequal Parameter Sets

Corresponding to (9.1.7) and (9.1.8) we introduce models

$$\mathcal{P} = \{P_{\theta,g}: \theta \in \Theta,\ g \in G(\theta)\}$$

and

$$\mathcal{Q} = \{Q_{\theta,h}: \theta \in \Theta,\ h \in H(\theta)\}$$

where $g$ and $h$ may be regarded as nuisance parameters. The notions and the
results above carry over to the present framework.

$\mathcal{Q}$ is said to be $\varepsilon$-deficient w.r.t. $\mathcal{P}$ if

$$\sup_B |P_{\theta,g}(B) - KQ_{\theta,h}(B)| \le \varepsilon(\theta, g, h), \qquad \theta \in \Theta,\ g \in G(\theta),\ h \in H(\theta), \qquad (10.1.26)$$

for some Markov kernel $K$.

Define the "one-sided" deficiency $\delta(\mathcal{Q},\mathcal{P})$ of $\mathcal{Q}$ w.r.t. $\mathcal{P}$ by

$$\delta(\mathcal{Q},\mathcal{P}) := \inf_K \sup_{\theta,g,h} \sup_B |P_{\theta,g}(B) - KQ_{\theta,h}(B)| \qquad (10.1.27)$$

where $K$ ranges over all Markov kernels from $(S_1,\mathscr{B}_1)$ to $(S_0,\mathscr{B}_0)$. Moreover,
the symmetric deficiency of $\mathcal{Q}$ and $\mathcal{P}$ is again defined by

$$\Delta(\mathcal{Q},\mathcal{P}) = \max(\delta(\mathcal{Q},\mathcal{P}), \delta(\mathcal{P},\mathcal{Q})). \qquad (10.1.28)$$
10.2. Approximate Sufficiency over a Neighborhood
of a Fixed Distribution

In this section we compute an upper bound for the deficiency (in the sense of
(10.1.7)) of a model defined by the distributions of the order statistic and a
second model defined by the joint distribution, say, $P_n$ of sparse order statistics
$X_{r_1:n} \le X_{r_2:n} \le \cdots \le X_{r_k:n}$ [suppressing the dependence on $r_1, \ldots, r_k$]. To
prove such a result one has to construct an appropriate Markov kernel which
carries the second model back to the original model.

Let $X_{1:n} \le \cdots \le X_{n:n}$ be the order statistics of $n$ i.i.d. random variables
with common d.f. $F$ which is assumed to be continuous. Theorem 1.8.1
provides the conditional distribution

$$K_n(\cdot|x) \qquad (10.2.1)$$

of the order statistic $(X_{1:n}, \ldots, X_{n:n})$ conditioned on $(X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}) = x$.
Recall that $K_n$ is a Markov kernel having the "reproducing" property

$$K_nP_n(B) = \int K_n(B|\cdot)\,dP_n = P\{(X_{1:n}, X_{2:n}, \ldots, X_{n:n}) \in B\} \qquad (10.2.2)$$

for every Borel set $B$.

Let $K_n^*$ denote the special Markov kernel which is obtained if $F$ is the
uniform d.f. on $(0,1)$, say, $F_0$. Thus, we have

$$K_n^*(\cdot|x) = P((U_{1:n}, \ldots, U_{n:n}) \in \cdot\,|\,(U_{r_1:n}, U_{r_2:n}, \ldots, U_{r_k:n}) = x).$$

If $F$ is close to $F_0$ (in a sense to be described later) then one can hope
that (10.2.2) approximately holds when $K_n$ is replaced by $K_n^*$. The decisive
point is that $K_n^*$ does not depend on the d.f. $F$.

In the light of the foregoing remark the $k$ order statistics $X_{r_1:n}, \ldots, X_{r_k:n}$ carry
approximately as much information about $F$ as the full order statistic.
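The kernel $K_n^*$ is easy to realize by simulation: conditionally on the sparse values, the missing order statistics in each gap are distributed as uniform order statistics on that gap. The sketch below (a hypothetical implementation of ours for d.f.'s on $(0,1)$, following the conventions $x_0 = 0$, $x_{k+1} = 1$) draws one rebuilt vector; note that it uses no knowledge of $F$.

```python
import numpy as np

rng = np.random.default_rng(3)

def kernel_fill(x_sparse, ranks, n):
    """One draw from K_n^*: given the values x_sparse of the order statistics
    with ranks r_1 < ... < r_k (sample size n, d.f. on (0,1)), fill each gap
    (x_{i-1}, x_i) with i.i.d. uniform draws on that interval, sorted. This is
    the exact conditional distribution when F is the uniform d.f. F_0."""
    pts = np.concatenate(([0.0], np.asarray(x_sparse, float), [1.0]))
    rks = np.concatenate(([0], np.asarray(ranks), [n + 1]))
    out = []
    for lo, hi, rlo, rhi in zip(pts[:-1], pts[1:], rks[:-1], rks[1:]):
        gap = rhi - rlo - 1                  # number of missing order statistics
        out.append(np.sort(rng.uniform(lo, hi, size=gap)))
        if rhi <= n:
            out.append([hi])                 # the observed sparse value itself
    return np.concatenate(out)

# Rebuild n = 9 order statistics from the sparse ranks (3, 6).
full = kernel_fill([0.3, 0.7], [3, 6], 9)
print(len(full), full[2], full[5])  # 9 values; ranks 3 and 6 are 0.3 and 0.7
```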
The Main Results

We shall prove a bound for the accuracy of the approximation introduced
above under particularly weak conditions on the underlying d.f. $F$.

Theorem 10.2.1. Let $1 \le k \le n$ and $0 = r_0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$. Denote
again by $P_n$ the joint distribution of order statistics $X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}$
[of $n$ i.i.d. random variables with common d.f. $F$ and density $f$].

Assume that $\alpha(F) = 0$ and $\omega(F) = 1$, and that $f$ has a derivative on $(0,1)$.
Then,

$$\sup_B |P\{(X_{1:n}, X_{2:n}, \ldots, X_{n:n}) \in B\} - K_n^*P_n(B)| \le \delta(F)\Big[\sum_{j=1}^{k+1}(r_j - r_{j-1} - 1)\Big(\frac{r_j - r_{j-1} + 1}{n+1}\Big)^2\Big]^{1/2} \qquad (10.2.3)$$

where

$$\delta(F) = \sup_{y \in (0,1)}|f'(y)| \Big/ \inf_{y \in (0,1)} f^2(y). \qquad (10.2.4)$$
PROOF. Let $K_n$ denote the Markov kernel in (10.2.1) given the d.f. $F$. Applying
Theorem 1.8.1 we obtain

$$\sup_B |P\{(X_{1:n}, \ldots, X_{n:n}) \in B\} - K_n^*P_n(B)| \le \int \sup_B |K_n(B|\cdot) - K_n^*(B|\cdot)|\,dP_n \qquad (1)$$
$$\le \int \sup_B \Big|\Big(\prod_{j=1}^n P_{j,x}\Big)(B) - \Big(\prod_{j=1}^n Q_{j,x}\Big)(B)\Big|\,dP_n(x)$$

where $P_{r_i,x} = Q_{r_i,x}$ are the Dirac measures at $x_i$ for $i = 1, \ldots, k$; moreover,
for $i = 1, \ldots, k+1$ and $j = r_{i-1} + 1, \ldots, r_i - 1$ the probability measures $P_{j,x}$
and $Q_{j,x}$ are defined by the densities

$$p_{j,x} = f1_{(x_{i-1},x_i)}/(F(x_i) - F(x_{i-1}))$$

and

$$q_{j,x} = 1_{(x_{i-1},x_i)}/(x_i - x_{i-1})$$

[with the convention that $x_0 = 0$ and $x_{n+1} = 1$].

Writing

$$g \equiv g_{j,x} = (p_{j,x}/q_{j,x}) - 1$$

we obtain from (3.3.10) that for every $x$ with $x_j - x_{j-1} > 0$, $j = 1, \ldots, k+1$,

$$\sup_B \Big|\Big(\prod_{j=1}^n P_{j,x}\Big)(B) - \Big(\prod_{j=1}^n Q_{j,x}\Big)(B)\Big| \le \Big[\sum_{j=1}^{k+1}(r_j - r_{j-1} - 1)\int g_{j,x}^2\,dQ_{j,x}\Big]^{1/2} \le \Big[\rho(F)\sum_{j=1}^{k+1}(r_j - r_{j-1} - 1)(x_j - x_{j-1})^2\Big]^{1/2} \qquad (2)$$

where

$$\rho(F) = \Big[\sup_{y \in (0,1)}|f'(y)| \Big/ \inf_{y \in (0,1)} f(y)\Big]^2.$$

The second inequality in (2) can easily be verified by using the representation

$$g_{j,x}(y) = \frac{f'(v)}{f(u)}(y - u), \qquad (3)$$

with $u$ and $v$ strictly between $x_{j-1}$ and $x_j$ and $u$ not depending on $y$.

Combining (1) and (2) and applying the Schwarz inequality we obtain

$$\sup_B |P\{(X_{1:n}, X_{2:n}, \ldots, X_{n:n}) \in B\} - K_n^*P_n(B)| \le \Big[\rho(F)\sum_{j=1}^{k+1}(r_j - r_{j-1} - 1)E(X_{r_j:n} - X_{r_{j-1}:n})^2\Big]^{1/2} \qquad (4)$$

where $X_{0:n} = 0$ and $X_{n+1:n} = 1$. Finally, Theorem 1.2.5(i) and (1.7.4) yield

$$E(X_{r_j:n} - X_{r_{j-1}:n})^2 = E[F^{-1}(U_{r_j:n}) - F^{-1}(U_{r_{j-1}:n})]^2 \le E[U_{r_j - r_{j-1}:n}^2] \Big/ \inf_{y \in (0,1)} f^2(y) \le \Big(\frac{r_j - r_{j-1} + 1}{n+1}\Big)^2 \Big/ \inf_{y \in (0,1)} f^2(y); \qquad (5)$$

thus, (10.2.3) is immediate from (3)-(5). $\square$
Notice that $\delta(F)$ can be regarded as a distance between $F$ and the uniform
d.f. $F_0$.

EXAMPLE 10.2.2. If the differences $r_i - r_{i-1}$ are of order $O(m(n))$ and $k \equiv k(n)$
is of order $O(n/m(n))$, which means that, roughly speaking, the indices $r_i$ are
equidistant, then the right-hand side of (10.2.3) is of order

$$O(\delta(F)m(n)/n^{1/2}).$$

Thus, if $m(n) = o(n^{1/2})$ (entailing that the number $k$ of order statistics has
to be larger than $n^{1/2}$) then the right-hand side of (10.2.3) goes to zero as $n$
goes to infinity even if $\delta(F)$ is bounded away from zero.

If $n^{1/2} = O(m(n))$ then $F$ should also depend on $n$. In the statistical context
this means that our model has to shrink towards the uniform d.f. as $n$ goes to
infinity.

A typical situation for such a dependence on the sample size $n$ occurs in
the context of a goodness-of-fit test when one is testing the uniform d.f. $F_0$
against an alternative $F_n$ having a density $f_n$ given by

$$f_n(x) = 1 + \beta(n)n^{-1/2}h(x).$$

Note that $\beta(n)$ is a fixed constant in classical test problems. In Example
10.4.1 we shall study the situation where the dimension of the alternative
increases as the sample size increases. Then, $\beta(n)$ has to go to infinity as $n \to \infty$
in order to attain rejection probabilities bounded away from the level $\alpha$ under
alternatives of the above form.

If $h$ and $h'$ are bounded then $\delta(F_n) = O(\beta(n)n^{-1/2})$ and, therefore, the right-hand
side of (10.2.3) is of order $O[\beta(n)/k(n)]$.
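The rate claimed in Example 10.2.2 can be checked numerically by evaluating the bracket on the right-hand side of (10.2.3) for equidistant ranks (an illustrative computation of ours; the choices of $n$ and $m$ are arbitrary):

```python
import numpy as np

def bound_factor(n, m):
    # the bracket in (10.2.3) for equidistant ranks r_j = j*m:
    # [ sum_j (r_j - r_{j-1} - 1) * ((r_j - r_{j-1} + 1)/(n+1))^2 ]^{1/2}
    r = np.arange(0, n + 1, m)              # r_0 = 0, r_1 = m, r_2 = 2m, ...
    r = np.append(r, n + 1)                 # convention r_{k+1} = n + 1
    d = np.diff(r)
    return float(np.sqrt(np.sum((d - 1) * ((d + 1) / (n + 1)) ** 2)))

# Doubling m roughly doubles the factor; quadrupling n roughly halves it,
# in line with the O(delta(F) m(n)/n^{1/2}) rate stated above.
for n, m in ((10000, 10), (10000, 20), (40000, 10)):
    print(n, m, bound_factor(n, m))
```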
Local Formulation

Theorem 10.2.1 may be extended in various directions. In cases where one is
only interested in local properties of $F$, our considerations will be based on a
statistic only depending on certain extreme or central order statistics, say,
$X_{r:n} \le X_{r+1:n} \le \cdots \le X_{s:n}$ where $1 \le r \le s \le n$. Again the number of order
statistics may be reduced. If $r_1 = r$ and $r_k = s$ then, in contrast to the conditions
of Theorem 10.2.1, it suffices to assume that $0 \le \alpha(F) < \omega(F) \le 1$.

For the formulation of Addendum 10.2.3 we introduce the projection
$\pi \equiv \pi(r, s)$ defined by

$$\pi(x_1, \ldots, x_n) = (x_r, \ldots, x_s).$$

Note that in the following context, Markov kernels will rebuild the joint
distribution of $X_{r:n}, X_{r+1:n}, \ldots, X_{s:n}$. Define a Markov kernel adjusted to the
present problem, namely,

$$K_{n,\pi}^*(\cdot|x) = \pi K_n^*(\cdot|x).$$

Note that $K_{n,\pi}^*(\cdot|x)$ is a marginal distribution of $K_n^*(\cdot|x)$. Check that
$K_{n,\pi}^*(\cdot|x)$ is the conditional distribution of $(U_{r:n}, U_{r+1:n}, \ldots, U_{s:n})$ given
$(U_{r_1:n}, U_{r_2:n}, \ldots, U_{r_k:n}) = x$.
Addendum 10.2.3. Assume that $1 \le r = r_1 < r_2 < \cdots < r_k = s \le n$. Denote
again by $P_n$ the joint distribution of the order statistics $X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}$.
Assume that $0 \le \alpha(F) < \omega(F) \le 1$, and that $f$ has a derivative on $(\alpha(F), \omega(F))$.
Then,

$$\sup_B |P\{(X_{r:n}, X_{r+1:n}, \ldots, X_{s:n}) \in B\} - K_{n,\pi}^*P_n(B)| \le \delta(F)\Big[\sum_{j=2}^{k}(r_j - r_{j-1} - 1)\Big(\frac{r_j - r_{j-1} + 1}{n+1}\Big)^2\Big]^{1/2} \qquad (10.2.5)$$

where

$$\delta(F) = \sup_{y \in (\alpha(F),\omega(F))}|f'(y)| \Big/ \inf_{y \in (\alpha(F),\omega(F))} f^2(y).$$

The proof of (10.2.5) is an almost verbatim repetition of that of (10.2.3) and
can be left to the reader. We remark that Addendum 10.2.3 is an immediate
consequence of Theorem 10.2.1 if again $\alpha(F) = 0$ and $\omega(F) = 1$.
Transformed Models

The results until now are concerned with d.f.'s $F$ close to the uniform d.f. $F_0$
on $(0,1)$. If we fix some other continuous d.f., say, $G_0$ in place of $F_0$ then the
probability integral transformation may be applied to reduce the problem
again to the former case.

The d.f.'s $G$ close to $G_0$ have to be of the form $G = F \circ G_0$ where $F$ (being
equal to $G \circ G_0^{-1}$) has to fulfill the conditions of Theorem 10.2.1. If $Y_{i:n}$ are
the order statistics of r.v.'s with common d.f. $G$ then $X_{i:n} = G_0(Y_{i:n})$ are the
order statistics of r.v.'s with common d.f. $F$. Thus, Theorem 10.2.1 applies to
$X_{i:n}$.

In order to formulate the problem for the original order statistics $Y_{i:n}$ we
introduce the Markov kernel $M_n^*$ where $M_n^*(\cdot|y)$ is the conditional distribution
of $(Y_{1:n}, Y_{2:n}, \ldots, Y_{n:n})$ given $(Y_{r_1:n}, Y_{r_2:n}, \ldots, Y_{r_k:n}) = y$ in the special
case of $G = G_0$ (where again the dependence of $M_n^*$ on $r_1, \ldots, r_k$ will be
suppressed).
Theorem 10.2.4. Let $1 \le k \le n$ and $0 = r_0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$. Let
$F$ be a continuous d.f. with $\alpha(F) = 0$ and $\omega(F) = 1$. Assume that $F$ has two
derivatives on $(0,1)$. Put $f = F'$.

Denote by $Q_n$ the joint distribution of the order statistics $Y_{r_1:n}, \ldots, Y_{r_k:n}$
where the $Y_{i:n}$ are the order statistics of $n$ i.i.d. random variables with common
d.f. $G_1 = F \circ G_0$. Then,

$$\sup_B |P\{(Y_{1:n}, Y_{2:n}, \ldots, Y_{n:n}) \in B\} - M_n^*Q_n(B)| \le \delta(F)\Big[\sum_{j=1}^{k+1}(r_j - r_{j-1} - 1)\Big(\frac{r_j - r_{j-1} + 1}{n+1}\Big)^2\Big]^{1/2} \qquad (10.2.6)$$

where again

$$\delta(F) = \sup_{y \in (0,1)}|f'(y)| \Big/ \inf_{y \in (0,1)} f^2(y).$$
Theorem 10.2.4 was stated in such a way that it can easily be deduced from
Theorem 10.2.1; however, this formulation looks rather artificial. Further
insight into the nature of the term $\delta(F)$ can be obtained by means of a different
representation of the density $f$.

From P.1.5 and Remark 1.5.3 we conclude that $G_1$ has the $G_0$-density
$g = f \circ G_0$. Hence, $f = g \circ G_0^{-1}$, according to Criterion 1.2.3. Thus, the conditions
of Theorem 10.2.4 can be reformulated in the following way:

Assume that $G_1$ has the $G_0$-density $g$ so that $g \circ G_0^{-1}$ is differentiable.
Moreover, the term $\delta(F)$ can be replaced by

$$\sup_{y \in (0,1)}|(g \circ G_0^{-1})'(y)| \Big/ \inf_{y \in (0,1)} (g \circ G_0^{-1})^2(y).$$

Theorem 10.2.4 is an immediate consequence of Theorem 10.2.1 and the
following lemma which may also be applied to prove extensions of Addendum
10.2.3.
Lemma 10.2.5. Let $1 \le r_1 < r_2 < \cdots < r_k \le n$. Let $X_{i:n}$ and $Y_{i:n}$ be the order
statistics of $n$ i.i.d. random variables with common d.f. $F$ and, respectively, d.f.
$G_1 = F \circ G_0$. The d.f.'s $F$ and $G_0$ are assumed to be continuous, and $0 \le \alpha(F) <
\omega(F) \le 1$.

Denote by $P_n$ and $Q_n$ the joint distributions of $X_{r_1:n}, \ldots, X_{r_k:n}$ and $Y_{r_1:n},
\ldots, Y_{r_k:n}$. Then, with $M_n^*$ and $K_n^*$ as defined above, we have

$$\sup_B |P\{(Y_{1:n}, Y_{2:n}, \ldots, Y_{n:n}) \in B\} - M_n^*Q_n(B)| \le \sup_B |P\{(X_{1:n}, X_{2:n}, \ldots, X_{n:n}) \in B\} - K_n^*P_n(B)|. \qquad (10.2.7)$$

PROOF. From P.1.5 we know that $G_0^{-1}(\eta)$ is a r.v. with d.f. $F \circ G_0$ if $\eta$ is a r.v.
with d.f. $F$. This implies that

$$(Y_{1:n}, Y_{2:n}, \ldots, Y_{n:n}) \stackrel{d}{=} (G_0^{-1}(X_{1:n}), G_0^{-1}(X_{2:n}), \ldots, G_0^{-1}(X_{n:n})). \qquad (1)$$
From (1) we know that

$$P\{(Y_{1:n}, Y_{2:n}, \ldots, Y_{n:n}) \in B\} = P\{(X_{1:n}, X_{2:n}, \ldots, X_{n:n}) \in \bar B\} \qquad (2)$$

where $\bar B = \{x: (G_0^{-1}(x_1), G_0^{-1}(x_2), \ldots, G_0^{-1}(x_n)) \in B\}$. If, moreover,

$$M_n^*Q_n(B) = K_n^*P_n(\bar B) \qquad (3)$$

then it is apparent that (10.2.7) holds. From (1) we also know that

$$M_n^*Q_n(B) = EM_n^*(B|G_0^{-1}(X_{r_1:n}), G_0^{-1}(X_{r_2:n}), \ldots, G_0^{-1}(X_{r_k:n})).$$

Thus, in view of (3) it remains to prove that

$$K_n^*(\bar B|x_1, \ldots, x_k) = M_n^*(B|y_1, y_2, \ldots, y_k) \qquad (4)$$

whenever $\alpha(F) < x_1 < x_2 < \cdots < x_k < \omega(F)$, with $y_i$ denoting $G_0^{-1}(x_i)$.

Since $G_0$ is continuous we know that $G_0^{-1}$ is strictly increasing; thus,

$$\alpha(G_0) =: y_0 < y_1 < \cdots < y_k < y_{k+1} := \omega(G_0).$$

Put $x_0 = 0$ and $x_{k+1} = 1$. Let $\varepsilon_x$ denote the Dirac measure at $x$ (with mass
1). Moreover, $\bar Q$ denotes the probability measure corresponding to $G_0$, and $Q$
is the uniform distribution on $(0,1)$.

It is obvious that $\varepsilon_{y_i}$ is induced by $\varepsilon_{x_i}$ and $G_0^{-1}$. Moreover, from P.1.6 we
know that the truncation of $\bar Q$ to the interval $(y_{i-1}, y_i)$, say, $\bar Q_{y_{i-1},y_i}$, is induced
by $Q_{x_{i-1},x_i}$ and $G_0^{-1}$ for $i = 1, \ldots, k+1$. Thus, Theorem 1.2.5(i) yields that
$M_n^*(\cdot|y_1, \ldots, y_k)$ is induced by $K_n^*(\cdot|x_1, \ldots, x_k)$ and the map

$$(u_1, u_2, \ldots, u_n) \to (G_0^{-1}(u_1), G_0^{-1}(u_2), \ldots, G_0^{-1}(u_n)).$$

This implies (4). The proof is complete. $\square$
10.3. Approximate Sufficiency over a Neighborhood
of a Family of Distributions

In the preceding section it was proved that a small number of order statistics
carries nearly all the information about the underlying d.f. $G$ if this d.f. is close
to a fixed d.f. $G_0$. Now we start with a family of d.f.'s $G(\cdot, \theta)$, $\theta \in \Theta$, and build
a model containing joint distributions of order statistics under a d.f. $G$ close
to one of the d.f.'s $G(\cdot, \theta)$, $\theta \in \Theta$.

Near to Uniform Distributions

The location and scale parameter family of uniform distributions provides an
exceptional case. Here the approximate sufficiency of sparse order statistics
can directly be proved by means of the results in Section 10.2. The uniform
d.f.'s

$$G(\cdot, \theta) \equiv G(\cdot, (\mu, \sigma))$$

are given by $G(x, (\mu, \sigma)) = (x - \mu)/\sigma$ for $\mu < x < \mu + \sigma$.
Corollary 10.3.1. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. random variables
with d.f. $G$ given by $G(x) = F((x - \mu)/\sigma)$ for $\mu < x < \mu + \sigma$ with $-\infty < \mu < \infty$
and $\sigma > 0$. It is assumed that $F$ is continuous and has two derivatives on
$(0,1) = (\alpha(F), \omega(F))$. Put $f = F'$.

Let $1 \le r = r_1 < r_2 < \cdots < r_k = s \le n$. Let $K_{n,\pi}^*$ denote the Markov kernel
defined in Addendum 10.2.3, and let again $P_n$ denote the joint distribution of the
sparse order statistics $X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}$. Then,

$$\sup_B |P\{(X_{r:n}, X_{r+1:n}, \ldots, X_{s:n}) \in B\} - K_{n,\pi}^*P_n(B)| \le \delta(F)\Big[\sum_{j=2}^{k}(r_j - r_{j-1} - 1)\Big(\frac{r_j - r_{j-1} + 1}{n+1}\Big)^2\Big]^{1/2}$$

where again

$$\delta(F) = \sup_{y \in (0,1)}|f'(y)| \Big/ \inf_{y \in (0,1)} f^2(y).$$

PROOF. Immediate by applying Theorem 10.2.4 to $G_0 = G(\cdot, (\mu, \sigma))$ and noting
that $M_{n,\pi}^* = K_{n,\pi}^*$. $\square$

In Corollary 10.3.1 it may as well be assumed that $F$ has two derivatives
on $(\alpha(F), \omega(F))$ with $0 \le \alpha(F) < \omega(F) \le 1$. However, this would not yield an
extension of Corollary 10.3.1 since the d.f. $G$ can be represented in the former
way by choosing different parameters $\mu$ and $\sigma$. It is also of importance to take
$r_1 = r$ and $r_k = s$ since, otherwise, the identity $M_{n,\pi}^* = K_{n,\pi}^*$ does not hold.

Corollary 10.3.1 was immediate from the results of Section 10.2 since the
conditional distribution of $(Y_{r:n}, Y_{r+1:n}, \ldots, Y_{s:n})$ given $(Y_{r_1:n}, Y_{r_2:n}, \ldots, Y_{r_k:n})$
is independent of the parameter $(\mu, \sigma)$, where the $Y_{i:n}$ are the order statistics
under the uniform d.f. $G(\cdot, (\mu, \sigma))$. This property is not shared by other parametric
families of d.f.'s (proof!), and so we need a modification of the concept.
The Main Results

In the sequel, we shall assume that $r = r_1 < \cdots < r_k = s$, and the parameter
space $\Theta$ is a subset of the Euclidean $d$-space equipped with the Euclidean
norm $\|\cdot\|_2$.

For every vector $\theta = (\theta_1, \theta_2, \ldots, \theta_d) \in \Theta$ let $M_{n,\theta}^*$ be the conditional
distribution of $(Y_{r:n}, Y_{r+1:n}, \ldots, Y_{s:n})$ given $(Y_{r_1:n}, Y_{r_2:n}, \ldots, Y_{r_k:n}) = x$ where the $Y_{i:n}$
are the order statistics under the d.f. $G(\cdot, \theta)$. Notice that the Markov kernel
$M_{n,\theta}^*$ also depends on $r = (r_1, r_2, \ldots, r_k)$.

In the next step the unknown parameter $\theta$ will be replaced by an estimator
$\hat\theta_n$ based on the order statistics $X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}$ under a d.f. $G$ which
does not necessarily belong to the parametric family $\{G(\cdot, \theta): \theta \in \Theta\}$. Thus, a new
problem arises, namely, one has to estimate the parameter under a model
which is incorrect.

The conditions in Theorem 10.3.2 will guarantee that $\hat M_n^*$ defined by

$$\hat M_n^*(\cdot|x) = M_{n,\hat\theta_n(x)}^*(\cdot|x) \qquad (10.3.1)$$

is a Markov kernel.

Let again $P_n$ denote the joint distribution of the sparse order statistics $X_{r_1:n},
X_{r_2:n}, \ldots, X_{r_k:n}$. We shall use $\hat M_n^*P_n$ as an approximation to the joint distribution
of $X_{r:n}, X_{r+1:n}, \ldots, X_{s:n}$. The accuracy of this approximation will depend
on the performance of the estimator $\hat\theta_n$ and the distance of $G$ from the
parametric family $\{G(\cdot, \theta): \theta \in \Theta\}$. We assume that the d.f.'s $G(\cdot, \theta)$ have
densities, say, $g(\cdot, \theta)$.
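The plug-in construction (10.3.1) can be sketched in code. The following Python fragment is a hypothetical illustration of ours for the normal location family $G(x, \theta) = \Phi(x - \theta)$; the quantile-matching estimator is our own simple choice, not one prescribed by the text. It estimates $\theta$ from the sparse order statistics and then fills each gap with ordered draws from the fitted density truncated to that gap.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

def plug_in_fill(x_sparse, ranks, n):
    """One draw from a plug-in kernel in the spirit of (10.3.1) for the normal
    location family G(x, theta) = Phi(x - theta): estimate theta by quantile
    matching, then fill each gap (x_{j-1}, x_j) with ordered draws from the
    fitted density truncated to that gap, via inversion of the fitted c.d.f."""
    lam = np.asarray(ranks) / (n + 1)
    theta_hat = np.mean(np.asarray(x_sparse) - norm.ppf(lam))
    cdf = lambda x: norm.cdf(x - theta_hat)
    ppf = lambda u: norm.ppf(u) + theta_hat
    pts = np.concatenate(([-np.inf], x_sparse, [np.inf]))
    rk = np.concatenate(([0], ranks, [n + 1]))
    out = []
    for lo, hi, rlo, rhi in zip(pts[:-1], pts[1:], rk[:-1], rk[1:]):
        u = rng.uniform(cdf(lo), cdf(hi), size=rhi - rlo - 1)
        out.append(np.sort(ppf(u)))          # truncated draws, ordered
        if rhi <= n:
            out.append([hi])                 # keep the observed sparse value
    return theta_hat, np.concatenate(out)

n = 99
sample = np.sort(rng.normal(5.0, 1.0, size=n))
ranks = np.arange(10, n, 10)                 # sparse ranks 10, 20, ..., 90
theta_hat, full = plug_in_fill(sample[ranks - 1], ranks, n)
print(round(theta_hat, 2), len(full))
```

When the true d.f. $G$ lies near the parametric family, the rebuilt vector is approximately distributed as the full vector of order statistics, which is exactly what Theorem 10.3.2 quantifies.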
Theorem 10.3.2 will be proved under a local Lipschitz condition. Given a
fixed parameter $\theta_0 \in \Theta$ assume that

$$\Big|\frac{(\partial/\partial y)G(G^{-1}(y_1, \theta_0), \theta)}{(\partial/\partial y)G(G^{-1}(y_2, \theta_0), \theta)} - 1\Big| \le C\|\theta - \theta_0\|_2|y_1 - y_2| \qquad (10.3.2)$$

for every $\theta \in \Theta$ with $\|\theta - \theta_0\|_2 \le \varepsilon$, $C \ge 0$, and $y_i$ with $0 < q_1 < y_i < q_2 < 1$
for $i = 1, 2$.

In (10.3.2) it is implicitly assumed that $g(x, \theta) > 0$ for every $x$ with
$G^{-1}(q_1, \theta_0) < x < G^{-1}(q_2, \theta_0)$ and with $\|\theta - \theta_0\|_2 \le \varepsilon$.

If $\Theta = \{\theta_0\}$ (that is, the problem of Section 10.2) then (10.3.2) holds with
$C = 0$.

Another set of conditions involving the partial derivatives

$$(\partial^2/\partial\theta_i\partial y)\log g$$

will be examined in Criterion 10.3.3.
Theorem 10.3.2. Let $1 \le k \le n$ and $1 \le r = r_1 < r_2 < \cdots < r_k = s \le n$. Let
$X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. random variables with common d.f.
$G = F \circ G(\cdot, \theta_0)$ where $\theta_0 \in \Theta$ and $F$ is a d.f. with $\alpha(F) = 0$ and $\omega(F) = 1$.
Moreover, suppose that $F$ has two derivatives on $(0,1)$. Put again $f = F'$.

Suppose that the d.f.'s $G(\cdot, \theta)$ fulfill condition (10.3.2) for some constants $\varepsilon$,
$C > 0$ and $0 < q_1 < q_2 < 1$.

Then for every measurable and $\Theta$-valued estimator $\hat\theta_n$ we have, with $\hat M_n^*$
defined as in (10.3.1),

$$\sup_B |P\{(X_{r:n}, X_{r+1:n}, \ldots, X_{s:n}) \in B\} - \hat M_n^*P_n(B)|$$
$$\le [\delta(F) + \rho(F, \hat\theta_n, C, \varepsilon)]\Big[\sum_{j=2}^{k}(r_j - r_{j-1} - 1)\Big(\frac{r_j - r_{j-1} + 3}{n+1}\Big)^2\Big]^{1/2}$$
$$+ P\{\|\hat\theta_n(X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}) - \theta_0\|_2 > \varepsilon\}$$
$$+ P\{X_{r:n} \le G^{-1}(q_1, \theta_0)\} + P\{X_{s:n} \ge G^{-1}(q_2, \theta_0)\}$$

with $\delta(F)$ as in (10.2.4) and

$$\rho(F, \hat\theta_n, C, \varepsilon) = \Big(C \Big/ \inf_{y \in (0,1)} f(y)\Big)\min\big(\varepsilon, [E\|\hat\theta_n(X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}) - \theta_0\|_2^4]^{1/4}\big).$$
Our first aim is to prove that
sup IP{ (X"n, X r+ Ln ,, X"n)
B
B}  M: Pn(B) 1
1)2 J1 /2
+
~ c5(F) [ Ik (rj  rj  1  1) ( rj  r~j_1~_
j=2
+ P{ IIOn(Xr1 ,n' X r2 ,n,"" Xrk,n)  00112 > c;}
+ P{X"n ~ G 1(Q1,00)} + P{X"n?: G 1(Q2'00)}
+[
where, with
.I (rj k
J=2
rj 1  1)
(1)
/
I/!ix) dPn(x) J1 2
9 == 9n ,
hj ,x(y,8)
= g(y, 0) l(xj_1>xj)(y)/[G(xj , 0)  G(Xjl> 0)].
Applying the triangle inequality and Theorem 10.2.4 one obtains
sup P {(X"n, X r+ 1 ,n" .. , X"n)
1
B}  M: Pn(B)
~ sup IP{(X"n,Xr+1,n"",X"n)
+ sup 1M: 0
B
~ c5(F) [
Pn(B)  M: Pn(B) 1
j=2
B}  M:ooPn(B)
(rj _ rj  1 _
1) (rj 
rj  1 +
n+ 1
1)2 J1 /2
where Pr"x and Qr"x are the Diracmeasures at Xi' and for i = 2, ... , k and
j = ri  1 + 1, ... , ri  1 ~he probability measures IJ,x and Qj,X are defined
by the densities hj,x(', O(x)) and hj,x(', ( 0 ), Now (1) is immediate from
inequality (3.3.10) and the Schwarz inequality.
For every x E A we obtain, with Zj = G(xj , ( 0 ), that
2
2
2
I/!j(x) ~ C 110(x)  001121Zj  Zj11 .
(2)
A
From the mean value theorem and substituting y by G 1 (z, ( 0 ) we obtain
for some uj between Zj1 and Zj that
309
10.3. Approximate Sufficiency over a Neighborhood
1
t/!j(x) =   Zj  Zj1
X
G(G
1
1
Zj  Zj1
G  1 (Zj,9 0 )
Gl(Zj_l,90 )
[9(Y,9(X))
g(y, (0)
Zj  Zj1 1
(Zj'Oo), O(x))  G(G (Zj1'00)'0(x))
A
Zj
Zjl
J2 g(y, )dy
(8 j 8ZG(G 1(Z,00)' ~(x)) _ 1)2 dz
8j8z G(G 1(Uj' (0), O(x))
and hence (2) follows at once from condition (10.3.2) by noting that q1 < Zl <
Z2 < .. , < Zk < q2'
It is immediate from (2) and the Schwarz inequality that
t/!j(x)dPn(x):::.;; C 2min{e 2, (EI19(Xrl:n,Xr2:n"",Xrk:n)  001IW/2}
x (E(G(Xrj:n,Oo)  G(Xrj _ 1 : n ,00))4)1/2.
(3)
Applying (1.7.4) we obtain (as in the proof of Theorem 10.2.1) that
E(G(Xrj:n' (0)
G(Xrj _ 1 on' ( 0 ))4
= E(F 1(Urj :n)  F 1(Urj _ :n))4
1
:::.;; (
inf f(y))4 EUr>rj_l:n
(4)
YE(O,l)
:::.;; (
inf f(y))4 ((rj  rj 1 + 3)j(n
(0,1)
+ 1W.
YE
Combining (1), (3), and (4) the proof is complete.
Condition (10.3.2) holds, as already mentioned, in the degenerate case
where $\Theta = \{\theta_0\}$. Another special case will be studied in the following.

Criterion 10.3.3. Assume that $\Theta$ is an open and convex subset of the Euclidean
$d$-space. Assume that the partial derivatives $(\partial^2/\partial\theta_i\partial y)\log g$ exist.
Then condition (10.3.2) holds with

$$C = \exp[\varepsilon|q_2 - q_1|K(g)]K(g)$$

where

$$K(g) = \sup\big\|\big((\partial^2/\partial\theta_i\partial y)\log g(G^{-1}(y, \theta_0), \theta)\big)_{i=1}^d\big\|_2$$

with the supremum ranging over all $(y, \theta)$ with $q_1 < y < q_2$ and $\|\theta - \theta_0\|_2 \le \varepsilon$.

PROOF. Applying the mean value theorem we get

$$\Big|\log\frac{\partial}{\partial y}G(G^{-1}(y_1, \theta_0), \theta) - \log\frac{\partial}{\partial y}G(G^{-1}(y_2, \theta_0), \theta)\Big|$$
$$= \Big|\frac{\partial}{\partial y}\log\frac{\partial}{\partial y}G(G^{-1}(\bar y, \theta_0), \theta)\Big||y_1 - y_2|$$
$$= \Big|\frac{\partial}{\partial y}\log g(G^{-1}(\bar y, \theta_0), \theta) - \frac{\partial}{\partial y}\log g(G^{-1}(\bar y, \theta_0), \theta_0)\Big||y_1 - y_2|$$
$$\le K(g)\|\theta - \theta_0\|_2|y_1 - y_2|$$

with $\bar y$ between $y_1$ and $y_2$. Since $z_1/z_2 = \exp(\log z_1 - \log z_2)$ and $|\exp(z) - 1| \le
\exp(z)z$ for $z, z_1, z_2 > 0$ the proof can easily be completed. $\square$
Final Remarks

Let us examine the problem of testing the parametric null hypothesis
$\{G(\cdot, \theta): \theta \in \Theta\}$ against certain nonparametric alternatives $G_n$.

It is easy to see that $G_n$ is of the form $F_n \circ G(\cdot, \theta_0)$, where $F_n$ has the density
$f_n(y) = 1 + h(G^{-1}(y, \theta_0))\alpha(n)$, if, and only if, $G_n$ has the density

$$g_n(x) = g(x, \theta_0)(1 + h(x)\alpha(n))$$

where $\int h(x)g(x, \theta_0)\,dx = 0$. In this case, if $h$ and $h'(G^{-1}(\cdot, \theta_0))/g(G^{-1}(\cdot, \theta_0))$ are
bounded, we have $\delta(F_n) = O(\alpha(n))$ and $\inf_{y \in (0,1)} f_n(y) \ge 1 - O(\alpha(n))$.

Within the present framework one has to find an appropriate estimator of
$\theta$. The problem of constructing estimators which are optimal in the sense
of minimizing the upper bound in Theorem 10.3.2 is also connected to
the problem of finding an "optimal" parameter $\theta_0$ which makes $\delta(F) =
\delta(G \circ G^{-1}(\cdot, \theta_0))$ small.

Given a functional $T$ on the family of all q.f.'s so that $T(G^{-1}(\cdot, \theta)) = \theta$, the
statistical functional $T(F_n^{-1})$ is an appropriate estimator of $T(G^{-1})$ and thus
of $\theta_0$ if $G^{-1}$ is close to $G^{-1}(\cdot, \theta_0)$. Since the estimator $\hat\theta_n$ is only allowed to
depend on the sparse order statistics $X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}$, one has to take
a statistical functional w.r.t. a version of the sample q.f. which is based on these
sparse order statistics.
10.4. Local Comparison of a Nonparametric
Model and a Normal Model

Let us summarize the results of Sections 10.2 and 10.3 without going into the
technical details. The nucleus of our model is a parametric family $G(\cdot, \theta)$,
$\theta \in \Theta$, of d.f.'s. In Section 10.2 we studied the particular case where $\Theta$ consists
of one parameter. In Section 10.3 the model is built by d.f.'s $G$ close to $G(\cdot, \theta)$
for some $\theta \in \Theta$. Under appropriate conditions on $r = (r_1, \ldots, r_k)$ and $G$ we find
a Markov kernel $M_n^*$ such that

$$\sup_B |P\{(X_{r:n}, X_{r+1:n}, \ldots, X_{s:n}) \in B\} - M_n^*P_n(B)| \le \varepsilon_0(G, r, n) \qquad (10.4.1)$$

where $X_{1:n} \le \cdots \le X_{n:n}$ are the order statistics of $n$ i.i.d. random variables
with common d.f. $G$, and $P_n$ is the joint distribution of $X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}$.
The decisive point in (10.4.1) is that the Markov kernel $M_n^*$ is independent
of $G$.

Let us also apply the result of Section 4.5, namely, that central order
statistics $X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}$ are approximately normally distributed.
Denote by $g$ the density of $G$. We have

$$\sup_B |P\{(X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}) \in B\} - P\{(Y_1^*, Y_2^*, \ldots, Y_k^*) \in B\}| \le \varepsilon_1(G, r, n) \qquad (10.4.2)$$

where the explicit form of $\varepsilon_1(G, r, n)$ is given in Theorem 4.5.3, and
$(Y_1^*, Y_2^*, \ldots, Y_k^*)$ is a normal random vector with mean vector

$$\mu(G) = \Big(G^{-1}\Big(\frac{r_1}{n+1}\Big), \ldots, G^{-1}\Big(\frac{r_k}{n+1}\Big)\Big) \qquad (10.4.3)$$

and covariance matrix $\Sigma(G) = (\sigma_{i,j})$ given by

$$\sigma_{i,j} = \frac{r_i}{n+1}\Big(1 - \frac{r_j}{n+1}\Big)\Big/\Big[(n+1)g\Big(G^{-1}\Big(\frac{r_i}{n+1}\Big)\Big)g\Big(G^{-1}\Big(\frac{r_j}{n+1}\Big)\Big)\Big] \qquad (10.4.4)$$

for $1 \le i \le j \le k$.
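The quantities (10.4.3) and (10.4.4) are easily computed for a concrete $G$. The sketch below (an illustration of ours for the standard exponential d.f.; the sample size, ranks, and replication count are arbitrary) compares the asymptotic mean vector and a covariance entry with Monte Carlo values for two central order statistics.

```python
import numpy as np

rng = np.random.default_rng(5)

n, r = 999, np.array([250, 750])
lam = r / (n + 1)
q = -np.log1p(-lam)                      # G^{-1}(lambda) for G(x) = 1 - exp(-x)
g = np.exp(-q)                           # density g at the quantiles
mu = q                                   # mean vector (10.4.3)
# covariance (10.4.4) is stated for i <= j; symmetrize via min/max
sigma = np.array([[lam[min(i, j)] * (1 - lam[max(i, j)]) /
                   ((n + 1) * g[i] * g[j]) for j in range(2)] for i in range(2)])

x = np.sort(rng.exponential(size=(5000, n)), axis=1)[:, r - 1]
print(mu, x.mean(axis=0))                # asymptotic vs Monte Carlo means
print(sigma[0, 1], np.cov(x.T)[0, 1])    # asymptotic vs Monte Carlo covariance
```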
Since (10.4.2) can be extended to $[0,1]$-valued measurable functions (see
P.3.5) we obtain

$$\sup_B |M_n^*P_n(B) - M_n^*N_{(\mu(G),\Sigma(G))}(B)| \le \varepsilon_1(G, r, n). \qquad (10.4.5)$$

Combining (10.4.1) and (10.4.5) we have

$$\sup_B |P\{(X_{r:n}, X_{r+1:n}, \ldots, X_{s:n}) \in B\} - M_n^*N_{(\mu(G),\Sigma(G))}(B)|$$
$$\le \varepsilon(G, r, n) := \varepsilon_0(G, r, n) + \varepsilon_1(G, r, n). \qquad (10.4.6)$$

(10.4.6) connects the following two models. The first one is given by joint
distributions of order statistics $X_{r:n}, \ldots, X_{s:n}$ with "parameter" $G$; the second
one is a family of $k$-dimensional normal distributions with parameters
$(\mu(G), \Sigma(G))$. In the sense of (10.1.26), the model given by the normal distributions
$N_{(\mu(G),\Sigma(G))}$ is $\varepsilon(G, r, n)$-deficient w.r.t. the model determined by the order
statistics $X_{r:n}, X_{r+1:n}, \ldots, X_{s:n}$.

If (10.4.6) holds for $r = 1$ and $s = n$ then the following result also holds: Let
$\xi_1, \xi_2, \ldots, \xi_n$ be the original i.i.d. random variables. Since the order statistic
is sufficient we find a Markov kernel $M_n^{**}$ (see also P.1.29) such that

$$\sup_B |P\{(\xi_1, \xi_2, \ldots, \xi_n) \in B\} - M_n^{**}N_{(\mu(G),\Sigma(G))}(B)| \le \varepsilon(G, r, n). \qquad (10.4.7)$$
Next we present the main ideas of an example due to Weiss (1974, 1977)
where the approximating normal distribution depends on the original d.f. $F$
only through the mean vector. Moreover, we indicate the possibility of calculating
a bound for the remainder term of the approximation.

EXAMPLE 10.4.1. As a continuation of Example 10.2.2, the uniform d.f. $F_0$ on
$(0,1)$ will be tested against a composite alternative of d.f.'s $F_n$ having densities
$f_n$ given by

$$f_n(x) = 1 + \beta(n)n^{-1/2}h(x), \qquad 0 \le x \le 1,$$

and $= 0$, otherwise, where $\int_0^1 h(x)\,dx = 0$. The term $\beta(n)$ will be specified later.
Part 1 (Asymptotic Sufficiency). Recall from Example 10.2.2 that sparse order
statistics $X_{r_1:n}, X_{r_2:n}, \ldots, X_{r_k:n}$ are asymptotically sufficient under weak conditions.

Part 2 (Asymptotic Normality). Put again

$$\lambda_i = r_i/(n + 1).$$

Let $\beta_{i,i}$ and $\beta_{i,i-1}$ be given as in the proof of Lemma 4.4.2. Recall that the $\beta_{i,j}$
define a map $S$ such that $SN_{(0,\Sigma)} = N_{(0,I)}$ where $\Sigma = (\sigma_{i,j})$ and $\sigma_{i,j} = \lambda_i(1 - \lambda_j)$,
$i \le j$. The decisive point is that these values do not depend on $F$. Define

$$Z_i = n^{1/2}[\beta_{i,i}(X_{r_i:n} - \lambda_i) + \beta_{i,i-1}(X_{r_{i-1}:n} - \lambda_{i-1})] \qquad (10.4.8)$$

for $i = 1, \ldots, k$ where $\beta_{1,0} = 0$. Notice that $Z_1, \ldots, Z_k$ are known to the
statistician, and hence tests may be based on these r.v.'s. The $Z_i$ are closely
related to spacings; however, the use of spacings would not lead to asymptotically
independent r.v.'s (compare with P.4.4).

Applying (10.4.2) we obtain that $Z_1, \ldots, Z_k$ can be replaced by independent
normal r.v.'s $Y_1, \ldots, Y_k$ with unit variances and expectations equal to

$$\mu_i = \beta(n)(\lambda_i - \lambda_{i-1})^{1/2}h(\lambda_i), \qquad i = 1, \ldots, k. \qquad (10.4.9)$$

Thus, we have

$$\sup_B |P\{(Z_1, \ldots, Z_k) \in B\} - P\{(Y_1, \ldots, Y_k) \in B\}| = o(1). \qquad (10.4.10)$$

A bound for the remainder term in (10.4.10) may be proved by means of
P.4.2(i) and P.4.2(v) [see also P.10.7].
Thus, the original testing problem has become a problem of testing, within
a model of normal distributions $N_{(\mu,I)}$, the null hypothesis

$$\mu = (\mu_1, \ldots, \mu_k) = 0 \qquad (10.4.11)$$

against

$$\mu_i = \beta(n)(\lambda_i - \lambda_{i-1})^{1/2}h(\lambda_i), \qquad i = 1, \ldots, k,$$

where the alternative has to be specified more precisely.

Part 3 (Discussion). The above considerations enable us to apply the nonasymptotic
theory of linear models to the original problem of testing the
uniform distribution against a parametric or nonparametric alternative. By
finding an optimum procedure within the linear model one gets an approximately
optimum procedure for the original model.
Recall from P.3.8 that the most powerful level-$\alpha$ test of a sample of size $n$,
for testing the uniform density against the density $1 + \beta(n)n^{-1/2}h$, rejects the
null hypothesis with probability

$$\Phi\Big(\Phi^{-1}(\alpha) + \beta(n)\Big(\int_0^1 h^2(x)\,dx\Big)^{1/2}\Big) + o(1) \qquad (10.4.12)$$

under appropriate regularity conditions. However, in general, this power
cannot be attained uniformly over a composite alternative. It is well known
that test procedures with high efficiency w.r.t. one "direction" $h$ have a bad
efficiency w.r.t. other directions. The Kolmogorov-Smirnov test provides a
typical example of a test having such a behavior.

In view of (10.4.12) a plausible requirement is that a test in the original
model should be of equal performance under every alternative $1 + \beta(n)n^{-1/2}h$
satisfying the condition

$$\Big(\int_0^1 h^2(x)\,dx\Big)^{1/2} = \delta \qquad (10.4.13)$$

for fixed $\delta > 0$.

Let again $Y_1, \ldots, Y_k$ be independent normal r.v.'s with unit variance and mean
vector $\mu = (\mu_1, \ldots, \mu_k)$ as given in (10.4.11). Denote again by $\|\cdot\|_2$ the Euclidean
norm. Notice that $\sum_{i=1}^k (\lambda_i - \lambda_{i-1})h(\lambda_i)^2$ is an approximation to $\delta^2$ and hence
$\|\mu\|_2$ is an approximation to $\beta(n)\delta$.

Thus, within the normal model, one has to test the null hypothesis $\mathcal{H}_0 =
\{0\} = \{\mu: \|\mu\|_2 = 0\}$ against an alternative

$$\mathcal{H}_1 \subset \{\mu: \|\mu\|_2 > 0\} \qquad (10.4.14)$$

under the additional requirement that the performance of the test procedure
depends on the underlying parameter $\mu$ through $\|\mu\|_2$ only; thus, the test is
invariant under orthogonal transformations. In Parts 4 and 5 we shall recall
some basic facts from classical, parametric statistics.
Part 4 (A χ²-Test). Let us first consider the case where H_1 = {μ: ‖μ‖_2 > 0} without taking into account that h has to satisfy a certain smoothness condition that also restricts the choice of the parameters μ. The uniformly most powerful, invariant test of level α is given by the critical region

C_k = {T_k > χ²_{k,α}}   (10.4.15)

where

T_k = Σ_{i=1}^k Y_i²   (10.4.16)

and χ²_{k,α} is the (1 − α)-quantile of the central χ²-distribution with k degrees of freedom. According to Weiss (1977) the critical region C_k is also a Bayes test for testing ‖μ‖_2 = 0 against ‖μ‖_2 = δ with prior probability uniformly distributed over the sphere {μ: ‖μ‖_2 = δ} (proof!). Moreover, C_k is minimax for this testing problem.
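The χ²-test of Part 4 can be illustrated numerically. The following sketch (our own illustration; the Wilson-Hilferty quantile approximation and all parameter values are assumptions, not taken from the text) simulates T_k = Σ Y_i² under the null hypothesis and under a fixed alternative and checks that the critical region (10.4.15) keeps level approximately α.

```python
import math
import random
from statistics import NormalDist

def chi2_quantile(k, alpha):
    # Wilson-Hilferty approximation to the (1 - alpha)-quantile of chi-square, k d.f.
    z = NormalDist().inv_cdf(1 - alpha)
    return k * (1 - 2 / (9 * k) + z * math.sqrt(2 / (9 * k))) ** 3

def chi2_test_rejects(mu, alpha):
    # T_k = sum of Y_i^2 with Y_i ~ N(mu_i, 1); reject if T_k exceeds the quantile
    k = len(mu)
    t = sum(random.gauss(m, 1.0) ** 2 for m in mu)
    return t > chi2_quantile(k, alpha)

random.seed(0)
k, alpha, reps = 20, 0.05, 20000
level = sum(chi2_test_rejects([0.0] * k, alpha) for _ in range(reps)) / reps
mu = [math.sqrt(12.0 / k)] * k          # noncentrality ||mu||_2^2 = 12
power = sum(chi2_test_rejects(mu, alpha) for _ in range(reps)) / reps
```

With this seed, 20 degrees of freedom and 20000 replications the empirical level lies within simulation error of α = 0.05, and the empirical power under ‖μ‖_2² = 12 clearly exceeds it, in line with the discussion around (10.4.18).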
Since Y_k = (Y_1, ..., Y_k) is a vector of normal r.v.'s with unit variance and mean vector μ we know that T_k is distributed according to a noncentral χ²-distribution with k degrees of freedom and noncentrality parameter ‖μ‖_2².
If k ≡ k(n) tends to infinity as n → ∞, the central limit theorem implies that

(2k + 4‖μ‖_2²)^{-1/2} (Σ_{i=1}^k (Y_i² − 1) − ‖μ‖_2²)   (10.4.17)

is asymptotically standard normal. Consequently, C_k has the asymptotic power function

Φ(Φ^{-1}(α) + ‖μ‖_2²/(2k)^{1/2}) + o(1).   (10.4.18)

This yields that asymptotically the rejection probability is strictly larger than α if ‖μ‖_2²/k^{1/2} is bounded away from zero.
In the original model, the critical region

Ĉ_k = {Σ_{i=1}^k Z_i² > χ²_{k,α}},   (10.4.19)

with Z_i defined in (10.4.8), attains the rejection probability

Φ(Φ^{-1}(α) + ∫_0^1 h²(x) dx) + o(k⁰)   (10.4.20)

under alternatives 1 + [(2k)^{1/2}/n]^{1/2} h.
The critical region Ĉ_k is closely related to a χ²-test based on a random partition of the interval [0, 1].
Part 5 (Linear Regression). We indicate a natural generalization of Part 4 that also takes into account the required smoothness condition imposed on h. Assume that

μ = Σ_{j=1}^s c_j v_j   (10.4.21)

where v_j = (v_j^{(1)}, ..., v_j^{(k)}), j = 1, ..., s, are orthonormal vectors w.r.t. the inner product (x, y) = Σ_{i=1}^k x_i y_i. The well-known solution of the problem is to take the critical region

C_s = {T_s > χ²_{s,α}}   (10.4.22)

where

T_s = Σ_{j=1}^s (v_j, Y_k)².   (10.4.23)

Notice that T_s = ‖Ŷ_k‖_2² where Ŷ_k = Σ_{j=1}^s (v_j, Y_k)v_j is the orthogonal projection of Y_k onto the s-dimensional linear subspace. The statistic T_s is again
distributed according to a noncentral χ²-distribution with s degrees of freedom and noncentrality parameter ‖μ‖_2². We refer to Witting and Nölle (1970) or Lehmann (1986) for the details. Now the remarks made above concerning the asymptotic performance of the critical regions C_k and Ĉ_k carry over with k replaced by s.
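The identity T_s = ‖Ŷ_k‖_2² behind (10.4.23) is easy to check numerically. In the sketch below (our own toy example; the three Walsh-type orthonormal vectors in R⁴ are illustrative, not from the text) the projection statistic coincides with the squared norm of the projection and is dominated by the full χ²-statistic.

```python
# Orthonormal vectors w.r.t. (x, y) = sum x_i y_i (hypothetical example in R^4)
v = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
]
y = [1.2, -0.7, 0.4, 2.1]                       # observed vector Y_k

def dot(a, b):
    return sum(x * z for x, z in zip(a, b))

# T_s = sum_j (v_j, Y_k)^2  -- the statistic of (10.4.23)
t_s = sum(dot(vj, y) ** 2 for vj in v)

# Orthogonal projection of Y_k onto span{v_1, v_2, v_3}
proj = [sum(dot(vj, y) * vj[i] for vj in v) for i in range(len(y))]
t_s_alt = dot(proj, proj)                       # equals ||projection||_2^2
t_k = dot(y, y)                                 # full chi-square statistic
```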
Part 6 (Parametric and Nonparametric Statistics). If s is fixed as n → ∞ then, obviously, our asymptotic considerations belong to parametric statistics. If s ≡ s(n) → ∞ as n → ∞ then, e.g. in view of the Fourier expansion of square integrable functions, the sequence of original models approaches the space of square integrable densities close to the uniform density, showing that the testing problem is of a nonparametric nature.
The foregoing remarks seem to be of some importance for nonparametric density testing (and estimation). Note that the functions h may belong to the linear space spanned by the trigonometric functions e_1, ..., e_s (see P.8.5(i)). So there is some relationship to the orthogonal series method adopted in nonparametric density estimation. The crucial problem in nonparametric density estimation is to find a certain balance between the variance and the bias of estimation procedures. Our present point of view differs from that taken up in the literature. First, we deduce the asymptotically optimum procedure w.r.t. the s(n)-dimensional model. These considerations belong to classical statistics. In a second step, we may examine the performance of the test procedure if the s(n)-dimensional model is incorrect.
P.10. Problems and Supplements
1. Let ξ_1, ..., ξ_n and, respectively, η_1, ..., η_n be i.i.d. random variables and denote by X_{1:n} ≤ ... ≤ X_{n:n} and Y_{1:n} ≤ ... ≤ Y_{n:n} the corresponding order statistics. Prove that

sup_B |P{(ξ_1, ..., ξ_n) ∈ B} − P{(η_1, ..., η_n) ∈ B}| = sup_B |P{(X_{1:n}, ..., X_{n:n}) ∈ B} − P{(Y_{1:n}, ..., Y_{n:n}) ∈ B}|.
2. Prove that Theorem 10.2.1 holds with

δ(F) = exp(sup_{y∈(0,1)} |f′(y)/f(y)|) sup_{y∈(0,1)} |f′(y)/f(y)| / inf_{y∈(0,1)} f(y).

[Hint: Use the fact that f(y)/f(x) = exp[(f′(z)/f(z))(y − x)] with z between x and y.]
3. Theorem 10.2.1 holds with the upper bound replaced by

(C / inf_{y∈(0,1)} f²(y)) [Σ_{j=1}^k (r_j − r_{j−1} − 1)((r_j − r_{j−1} + 1)/(n + 1))^{2α}]^{1/2}

if the density f satisfies a Lipschitz condition of order α ∈ (0, 1] on (0, 1).
4. (i) If 0 = r_0 < r_1 < r_2 < ... < r_k = s, r_1 = 1 and α(F) = 0 then (10.2.5) holds with Σ_{j=2} replaced by Σ_{j=1}.
(ii) If r_1 < r_2 < ... < r_k < r_{k+1} = n + 1, s = n and ω(F) = 1 then (10.2.5) holds with Σ_{j=2} replaced by Σ_{j=2}^{k+1}.
5. Let T(x_1, ..., x_n) = (x_{n−k+1}, ..., x_n). Under the conditions of Addendum 10.2.3, if α(F) ≥ 0 and ω(F) = 1,

sup_B |P{(X_{n−k+1:n}, ..., X_{n:n}) ∈ B} − K(B)| ≤ [sup_{y∈(α(F),1)} |f′(y)| / inf_{y∈(α(F),1)} f²(y)] k^{3/2}/n,

where K denotes the approximating distribution of Addendum 10.2.3.
6. (i) Verify condition (10.3.2) with C = ρ(g)K(g) where K(g) is given as in Criterion 10.3.3 and

ρ(g) = sup_{‖θ−θ_0‖_2 ≤ δ} (sup_{q_1<y<q_2} g(G^{-1}(y, θ_0), θ) / inf_{q_1<y<q_2} g(G^{-1}(y, θ_0), θ)).
(ii) Prove a modified version of Theorem 10.3.2 under the condition

|(∂/∂y)G(G^{-1}(y_1, θ_0), θ) / (∂/∂y)G(G^{-1}(y_2, θ_0), θ) − 1| ≤ C_1‖θ − θ_0‖_2 |y_1 − y_2| + C_2(‖θ − θ_0‖_2 |y_1 − y_2|)²

for every θ ∈ Θ with ‖θ − θ_0‖_2 ≤ δ, and q_1 < y_1, y_2 < q_2. Here, C_1 = K(g) and C_2 = ρ(g)K_2(g).
7. In analogy to (10.4.8) define

Z′_i = (n + 1)^{1/2}(ρ_{i,i}(X_{r_i:n} − G^{-1}(λ_i)) + ρ_{i,i−1}(X_{r_{i−1}:n} − G^{-1}(λ_{i−1}))).

Denote by P_n the joint distribution of Z′_i, i = 1, ..., k, where g_i = g(G^{-1}(λ_i)). N_{(0,Σ)} again denotes the k-variate normal distribution with mean vector zero and covariances σ_{i,j} = λ_i(1 − λ_j), 1 ≤ i ≤ j ≤ k. Prove that

sup_B |P{(Z′_1, ..., Z′_k) ∈ B} − N_{(0,Σ)}(B)| ≤ ‖P_n − N_{(0,Σ)}‖ + 2^{-1/2}[Σ_{i=1}^k (σ′_{i,i} − 1 + 2 log g_i)]^{1/2}

where σ′_{1,1} = g_1^{-2} and σ′_{i,i}, i = 2, ..., k, is given in terms of g_{i−1}, g_i, λ_{i−1}, λ_i.
[Hint: Let H be the diagonal matrix with diagonal elements h_{i,i} = 1/g_i. Let Σ′ = B ∘ H ∘ Σ ∘ H^t ∘ B^t where B is defined as in the proof of Lemma 4.4.2. Notice that det(Σ′) = (det(H))² det(Σ).]
8. Specialize Example 10.4.1, Part 5, to trigonometric functions (see P.8.5).
9. Extend Example 10.4.1 to the composite null hypothesis of uniform distributions.
Bibliographical Notes
The reader who is interested in the theoretical background concerning the
comparison of experiments is referred to Torgersen (1976), Strasser (1985), and
Le Cam (1986). The article of Torgersen gives a short, illuminating introduction to this subject.
The magnificent idea to study a construction like that in Theorem 10.2.1
is due to L. Weiss (1974) who also gave some asymptotic results. The extension
of the problem from a single d.f. to a parametric family of d.f.'s was suggested
by Weiss (1980). Weiss carried out a detailed study in the location and scale
parameter case. Further insight into the problem of comparing models based
on order statistics was obtained by Reiss et al. (1984) where a sharp bound of
the remainder term of the approximation was also established. The present
approach is taken from Reiss (1986). Some results concerning the sufficiency
of extremes within a parametric framework can be found in the articles by
Weiss (1979b) and Janssen and Reiss (1988). In the second article the location
model of a Weibull sample is locally compared with location models defined
by
(S_m^{1/α} + θ)_{m≤k}  and, respectively,  (S_m^{1/α} + θ)_{m=1,2,3,...}

where θ is the location parameter, and S_m is the sum of m i.i.d. standard exponential r.v.'s.
The optimum test procedure described in Example 10.4.1, (10.4.22), depends on the special choice of the set of alternatives. Weiss (1977) also describes a Bayes test that has the properties of an "all purpose" test. Moreover, it is apparent that, by using the approach of Section 10.4, larger parts of the theory of linear models can be made applicable to nonparametric statistics. A similar procedure based on spacings is dealt with by Weiss (1965).
APPENDIX 1
The Generalized Inverse
Extending the definition of a q.f. (see (1.1.10)) we define the inverse ψ* of a real-valued, nondecreasing and right continuous function ψ with domain (α, ω) by setting

ψ*(y) = inf{t ∈ (α, ω): ψ(t) ≥ y}  for −∞ < y < ∞   (A.1.1)

(with the convention that inf ∅ = ω). Moreover, we define

ψ^{-1} = ψ* | (inf ψ(s), sup ψ(s));   (A.1.2)

that is, ψ^{-1} is the restriction of ψ* to the interval (inf ψ(s), sup ψ(s)).
Thus, in the particular case of the q.f. we have ψ = F, (α, ω) = the real line, (inf ψ(s), sup ψ(s)) = (0, 1), and ψ^{-1} = F^{-1}. From the definitions of ψ* and ψ^{-1} one can easily conclude that ψ* is [α, ω]-valued and ψ^{-1} is (α, ω)-valued.
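Definition (A.1.1) can be made concrete for an empirical d.f., where the infimum is attained at an order statistic. The sketch below (our own illustration, not code from the book) computes ψ* for the empirical d.f. of a small sample and checks the equivalence (A.1.3) on a grid.

```python
import math

def ecdf(sample):
    # Right-continuous empirical d.f. of a finite sample
    xs = sorted(sample)
    n = len(xs)
    def F(t):
        return sum(1 for x in xs if x <= t) / n
    return F

def generalized_inverse(sample, y):
    # psi*(y) = inf{t: F(t) >= y}; for an empirical d.f. this is the
    # ceil(n*y)-th order statistic (convention: inf over the empty set = +inf)
    xs = sorted(sample)
    n = len(xs)
    if y <= 0:
        return -math.inf          # every t satisfies F(t) >= y
    if y > 1:
        return math.inf
    return xs[math.ceil(n * y) - 1]

sample = [3.0, 1.0, 4.0, 1.0, 5.0]
F = ecdf(sample)
# Galois property (A.1.3): y <= F(x)  iff  psi*(y) <= x
ok = all(
    (y <= F(x)) == (generalized_inverse(sample, y) <= x)
    for y in [0.1, 0.2, 0.4, 0.5, 0.8, 1.0]
    for x in [0.0, 1.0, 2.5, 3.0, 4.5, 5.0, 6.0]
)
```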
Lemma A.1.1. For ψ as above, if α < x < ω then for every real y,

y ≤ ψ(x)  iff  ψ*(y) ≤ x.   (A.1.3)

PROOF. Since ψ*(y) is the inf of all t ∈ (α, ω) such that ψ(t) ≥ y it is clear that ψ(x) ≥ y implies x ≥ ψ*(y). Conversely, for every z > x ≥ ψ*(y) we have ψ(z) ≥ y, and thus, y ≤ lim_{z↓x} ψ(z) = ψ(x) since ψ is right continuous. □
It is clear that (A.1.3) also holds for ψ^{-1} and inf ψ(s) ≤ y ≤ sup ψ(s) in place of ψ* and −∞ < y < ∞. Thus (1.2.9) is a special case of (A.1.3).
We already know that ψ^{-1} is an (α, ω)-valued function with domain (inf ψ(s), sup ψ(s)). More precisely, one can easily check that ψ^{-1} is an (α(ψ), ω(ψ))-valued function where

α(ψ) = inf{t ∈ (α, ω): ψ(t) > inf ψ(s)}   (A.1.4)
and

ω(ψ) = sup{t ∈ (α, ω): ψ(t) < sup ψ(s)}.   (A.1.5)

It is clear that α ≤ α(ψ) ≤ ω(ψ) ≤ ω. Notice that in the particular case of a d.f. F we have

α(F) = inf{t: F(t) > 0}   (A.1.6)

and

ω(F) = sup{t: F(t) < 1}.   (A.1.7)
For the proof of Theorem 1.2.8 we also need the following auxiliary result.

Lemma A.1.2. If ψ is as above then ψ* is nondecreasing and left continuous. Moreover,

lim_{y→−∞} ψ*(y) = α,  lim_{y→∞} ψ*(y) = ω,

and

lim_{y↓inf ψ(s)} ψ^{-1}(y) = α(ψ),  lim_{y↑sup ψ(s)} ψ^{-1}(y) = ω(ψ).

PROOF. From the definition of ψ* it is obvious that ψ* is nondecreasing. Moreover, ψ* is left continuous if y_n ↑ y implies ψ*(y_n) > t, eventually, whenever α < t < ψ*(y). Lemma A.1.1 implies that ψ(t) < y. Consequently, ψ(t) < y_n and thus, by Lemma A.1.1 again, t < ψ*(y_n), eventually. By similar arguments one can verify the other assertions. □
In analogy to the inverse of a nondecreasing, right continuous function ψ one can define the inverse of a nondecreasing, left continuous function, say, φ with domain (α, ω). Put

φ**(y) = sup{t ∈ (α, ω): φ(t) ≤ y}   (A.1.8)

for −∞ < y < ∞ (with the convention that sup ∅ = α). An application of Lemma A.1.1 to the nondecreasing, right continuous function defined by ψ(x) = −φ(−x) leads to

Lemma A.1.3. For φ as above, if α < x < ω, then for every y,

φ(x) ≤ y  iff  φ**(y) ≥ x.

PROOF. Verify that φ**(y) = −ψ*(−y). □
From Lemma A.1.2 we conclude

Lemma A.1.4. (i) φ** is nondecreasing and right continuous.
(ii) Moreover,

lim_{y→−∞} φ**(y) = α  and  lim_{y→∞} φ**(y) = ω.
Now we are in the proper position to carry out the

PROOF OF THEOREM 1.2.8. (i) is immediate from Lemma A.1.2.
(ii) Put F = G**. From Lemma A.1.4 it is clear that F is a d.f. To prove that G = F^{-1} we apply Lemma A.1.1 and Lemma A.1.3. For q ∈ (0, 1) and −∞ < x < ∞ we have G(q) ≤ x iff q ≤ F(x), and this holds iff F^{-1}(q) ≤ x. This equivalence implies that G = F^{-1}.
Finally we show that for d.f.'s F_1 and F_2 with F_1^{-1} = F_2^{-1} = G we have F_1 = F_2. Suppose that F_1^{-1} = F_2^{-1} and F_1(x) ≠ F_2(x) for some x. W.l.g. we can assume that F_1(x) < q < F_2(x) for some q ∈ (0, 1). Lemma A.1.1 implies that F_2^{-1}(q) ≤ x < F_1^{-1}(q) which is a contradiction to F_1^{-1} = F_2^{-1}. Thus, F_1 = F_2. □

From the proof of Theorem 1.2.8 we also know that (F^{-1})** = F. Thus F is the "generalized inverse" of F^{-1} which does, however, not imply that F ∘ F^{-1} is the identity function as we already know from Criterion 1.2.3.
In analogy to Criterion 1.2.3 we obtain

Criterion A.1.5. The q.f. F^{-1} is continuous if, and only if, F^{-1}(F(x)) = x for every x with 0 < F(x) < 1.
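A two-point distribution already illustrates Criterion A.1.5 and the remark preceding it: compositions of F and F^{-1} fail to be the identity exactly on the flat parts of F. The toy example below is our own illustration, not taken from the text.

```python
def F(t):
    # d.f. of the uniform distribution on {0, 2}
    if t < 0:
        return 0.0
    if t < 2:
        return 0.5
    return 1.0

def F_inv(q):
    # q.f. F^{-1}(q) = inf{t: F(t) >= q} for 0 < q <= 1
    return 0.0 if q <= 0.5 else 2.0

# On the flat part of F the composition F^{-1}(F(x)) falls back to the
# left endpoint of the constancy interval:
gap = F_inv(F(1.0))        # F(1.0) = 0.5, F^{-1}(0.5) = 0.0 != 1.0
# At a point of increase the composition is the identity:
fixed = F_inv(F(0.0))      # = 0.0
# F(F^{-1}(q)) >= q always holds, with strict inequality at q = 0.3:
overshoot = F(F_inv(0.3))  # = 0.5 > 0.3
```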
APPENDIX 2
Two Technical Lemmas on Expansions
The results below will provide us with the basic tools for proving asymptotic expansions for distributions of extreme and central order statistics.

Expansion of (1 + x/n)^n

When studying extreme order statistics we are interested in an expansion of finite length of e^{-x}(1 + x/n)^n where n is a positive integer and x > 0. We remark that e^{-x}(1 + x/n)^n can easily be written as an infinite series by multiplying the absolutely convergent series

Σ_{i=0}^n (n choose i)(x/n)^i  and  Σ_{i=0}^∞ (−x)^i/i!.

We have

e^{-x}(1 + x/n)^n = Σ_{i=0}^∞ β(i, n)x^i   (A.2.1)

where

β(i, n) = Σ_{j=0}^i ((−1)^j/j!)(n choose i−j) n^{-(i−j)}.   (A.2.2)

We will prove that also an expansion of finite length arranged in powers of n^{-1} holds. This result will be proved for real numbers α ≥ 1 instead of positive integers n.
If k = 1, 3, 5, ... and α ≥ 1 then by writing (1 + x/α)^α as exp[α log(1 + x/α)] and by using a Taylor expansion of log about 1 it is immediate that
exp[−(2x)^{k+1}/((k + 1)α^k)] ≤ e^{-x}(1 + x/α)^α exp[−Σ_{i=2}^k (−1)^{i+1} x^i/(iα^{i−1})] ≤ 1   (A.2.3)

for x ≥ −α/2. Moreover, the upper bound still holds for x ≥ −α. The inequalities are strict for x ≠ 0. Since exp(x) ≥ 1 + x we obtain from (A.2.3), applied to k = 1, that

1 − 2x²/α ≤ e^{-x}(1 + x/α)^α ≤ 1.   (A.2.4)

For k = 3, 5, 7, ... the term exp[Σ_{i=2}^k (−1)^{i+1} x^i/(iα^{i−1})] is a higher order approximation to e^{-x}(1 + x/α)^α but this approximation is not an expansion of the type as discussed in Section 3.2. However, a Taylor expansion of exp about zero yields the following result.
Lemma A.2.1. For every positive integer m there exists a constant C_m > 0 such that for every α ≥ 1 and x with −α/2 ≤ x ≤ α^{2/3} the following inequality holds:

|e^{-x}(1 + x/α)^α − [1 + Σ_{i=2}^{2(m−1)} β(i, α)x^i]| ≤ C_m α^{-m}(|x|^{2m−1} + |x|^{2m})   (A.2.5)

where the β(i, α) are real numbers which have the property max{|β(2k − 1, α)|, |β(2k, α)|} ≤ C_m α^{-k} for k = 1, ..., m − 1.
Moreover, we have β(2, α) = −1/(2α) and β(3, α) = 1/(3α²).
PROOF. We have

|exp[Σ_{i=2}^{2m−1} (−1)^{i+1} x^i/(iα^{i−1})] − Σ_{j=0}^{m−1} (1/j!)(Σ_{i=1}^{2(m−1)} (−1)^i x^{i+1}/((i + 1)α^i))^j| ≤ Cα^{-m}|x|^{2m}   (A.2.6)

where C will be used as a generic constant which only depends on m. By some tedious (however straightforward) computations one can prove that

|Σ_{j=0}^{m−1} (1/j!)(Σ_{i=1}^{2(m−1)} (−1)^i x^{i+1}/((i + 1)α^i))^j − Σ_{i=2}^{2(m−1)} β(i, α)x^i| ≤ Cα^{-m}(|x|^{2m−1} + |x|^{2m})

where the values β(i, α) have the desired property. This together with (A.2.3) and (A.2.6) implies (A.2.5). □
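The leading terms of Lemma A.2.1 can be checked numerically. The sketch below (our own verification, with illustrative values of α and x) compares e^{-x}(1 + x/α)^α with 1 + β(2, α)x² + β(3, α)x³ and confirms that the error decreases roughly like α^{-2}, as the bound for m = 2 predicts.

```python
import math

def target(x, a):
    # e^{-x} (1 + x/a)^a
    return math.exp(-x) * (1.0 + x / a) ** a

def expansion(x, a):
    # 1 + beta(2, a) x^2 + beta(3, a) x^3 with beta(2, a) = -1/(2a), beta(3, a) = 1/(3a^2)
    return 1.0 - x ** 2 / (2 * a) + x ** 3 / (3 * a ** 2)

x = 0.5
err_50 = abs(target(x, 50.0) - expansion(x, 50.0))
err_100 = abs(target(x, 100.0) - expansion(x, 100.0))
ratio = err_50 / err_100   # close to 4 if the error is of order a^{-2}
```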
By writing down the proof of Lemma A.2.1 in detail one realizes that the upper bound in (A.2.5) still holds for values x with −α ≤ x ≤ α^{2/3}.
For every positive integer n the terms β(i, n) in (A.2.5) are identical to the corresponding values, say, β*(i, n), in (A.2.2). This becomes obvious by noting that there exist A > 0 and B > 0 such that

|e^{-x}(1 + x/n)^n − [1 + Σ_{i=2}^{2(m−1)} β*(i, n)x^i]| ≤ B(|x|^{2m−1} + |x|^{2m})

for every |x| ≤ A. Now a comparison of this inequality to (A.2.5) leads to the desired identification.
The Second Lemma

The next lemma will provide us with an expansion of the function

g_{α,β}(x) = e^{x²/2}[1 + (β/((α + β)α))^{1/2} x]^α [1 − (α/((α + β)β))^{1/2} x]^β   (A.2.7)

where α ≥ 1 and β ≥ 1. This expansion will be arranged in powers of the terms ((α + β)/(αβ))^{1/2}. As an application of this result one obtains expansions of densities and, in a second step, of distributions of central order statistics (see Section 4.2).
Lemma A.2.2. For every positive integer m there exists a constant C_m > 0 such that for every α ≥ 1, β ≥ 1 and |x| ≤ (αβ/(α + β))^{1/6} the following inequality holds:

|g_{α,β}(x) − Σ_{i=0}^{m−1} g_{i,α,β}(x)| ≤ C_m ((α + β)/(αβ))^{m/2}(|x|^{m+2} + |x|^{3m})

where g_{0,α,β} = 1 and g_{i,α,β} is a polynomial of degree ≤ 3i whose coefficients are smaller than C_m((α + β)/(αβ))^{i/2} for i = 1, ..., m − 1.
In the proof of Lemma A.2.2 one has to choose the polynomials g_{i,α,β} in such a way that

|Σ_{j=1}^{m−1} (1/j!)(Σ_{i=1}^{m−1} a_{i,α,β} x^{i+2})^j − Σ_{i=1}^{m−1} g_{i,α,β}(x)| ≤ C_m (((α + β)/(αβ))^{1/2} |x|³)^m   (A.2.8)

where for i = 1, ..., m − 1

a_{i,α,β} = (1/(i + 2))[(−1)^{i+1} α(β/((α + β)α))^{(i+2)/2} − β(α/((α + β)β))^{(i+2)/2}].

Particularly, for i = 1, 2, 3

g_{1,α,β}(x) = a_{1,α,β} x³,
g_{2,α,β}(x) = a_{2,α,β} x⁴ + a_{1,α,β}² x⁶/2,   (A.2.9)
g_{3,α,β}(x) = a_{3,α,β} x⁵ + a_{1,α,β} a_{2,α,β} x⁷ + a_{1,α,β}³ x⁹/6.
PROOF OF LEMMA A.2.2. Starting as in the proof of Lemma A.2.1 we obtain, by using a Taylor expansion of log(1 + x) of length m + 1, that for every |x| ≤ (αβ/(α + β))^{1/6}:

|g_{α,β}(x) − exp[Σ_{i=1}^{m−1} a_{i,α,β} x^{i+2}]| ≤ C_m ((α + β)/(αβ))^{m/2} |x|^{m+2}

and

|exp[Σ_{i=1}^{m−1} a_{i,α,β} x^{i+2}] − Σ_{j=0}^{m−1} (1/j!)(Σ_{i=1}^{m−1} a_{i,α,β} x^{i+2})^j| ≤ C_m ((α + β)/(αβ))^{m/2}(|x|^{m+2} + |x|^{3m}).

Now the proof can easily be completed by choosing the polynomials g_{i,α,β} as indicated in (A.2.8). □
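Similarly, the polynomials (A.2.9) can be tested against g_{α,β} directly. The sketch below (our own illustration with arbitrary α, β and x) verifies that adding g_{2,α,β} to 1 + g_{1,α,β} reduces the approximation error.

```python
import math

def g(x, a, b):
    # g_{alpha,beta}(x) of (A.2.7)
    c = math.sqrt(b / ((a + b) * a))
    d = math.sqrt(a / ((a + b) * b))
    return math.exp(x * x / 2) * (1 + c * x) ** a * (1 - d * x) ** b

def coeff(i, a, b):
    # a_{i,alpha,beta} as in (A.2.8)
    c = math.sqrt(b / ((a + b) * a))
    d = math.sqrt(a / ((a + b) * b))
    return ((-1) ** (i + 1) * a * c ** (i + 2) - b * d ** (i + 2)) / (i + 2)

a, b, x = 30.0, 50.0, 0.3
a1, a2 = coeff(1, a, b), coeff(2, a, b)
g1 = a1 * x ** 3                         # g_{1,alpha,beta}(x)
g2 = a2 * x ** 4 + a1 ** 2 * x ** 6 / 2  # g_{2,alpha,beta}(x)

err_m2 = abs(g(x, a, b) - (1 + g1))
err_m3 = abs(g(x, a, b) - (1 + g1 + g2))
```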
If α = β then Lemma A.2.1 and Lemma A.2.2 roughly coincide for nonnegative x.
We believe that an expansion of the function g_{α,β} is of some interest in its own right; however, this function is not properly adjusted to the particular problem of computing an expansion of the distribution of a central order statistic. For this purpose one has to deal with functions h_{α,β} defined by

h_{α,β}(x) = e^{x²/2}[1 + (β/((α + β)α))^{1/2} x]^{α−1} [1 − (α/((α + β)β))^{1/2} x]^{β−1}   (A.2.10)

or with some other variation of the function g_{α,β} according to the standardization of the distribution of the order statistic. By using Taylor expansions of

[1 + (β/((α + β)α))^{1/2} x]^{-1}  and  [1 − (α/((α + β)β))^{1/2} x]^{-1}

about 1, one can easily deduce from Lemma A.2.2 the following:

Corollary A.2.3. For α, β ≥ 1 let h_{α,β} be defined as in (A.2.10).
Then, Lemma A.2.2 holds true for h_{α,β}, h_{i,α,β} and the term (|x|^m + |x|^{3m}) in place of g_{α,β}, g_{i,α,β} and (|x|^{m+2} + |x|^{3m}) where h_{i,α,β} is a polynomial which has the same properties as g_{i,α,β} for i = 1, ..., m − 1.
The polynomials h_{1,α,β} and h_{2,α,β} are given by

h_{1,α,β}(x) = g_{1,α,β}(x) − [(β/((α + β)α))^{1/2} − (α/((α + β)β))^{1/2}] x,

h_{2,α,β}(x) = g_{2,α,β}(x) − [(β/((α + β)α))^{1/2} − (α/((α + β)β))^{1/2}] x g_{1,α,β}(x) + [β/((α + β)α) − 1/(α + β) + α/((α + β)β)] x².
APPENDIX 3
Further Results on Distances of Measures
The aim of the following lines is to extend some of the results of Section 3.3 to finite signed measures. Moreover, we prove some highly technical inequalities which do not belong to the necessary prerequisites for the understanding of the main ideas of this volume. However, these inequalities are useful for certain computations.
In the sequel, let ν_i be a finite signed measure (on a measurable space (S, 𝔅)) represented by the density f_i w.r.t. a dominating measure μ.

A Further Remark about the Scheffé Lemma

An extension of Lemma 3.3.4 to finite signed measures is easily obtained by splitting the measure ν_i into the positive and negative part ν_i⁺ and ν_i⁻ with the respective densities f_i⁺ and f_i⁻ (the positive and negative part of f_i). Check that

|f_0 − f_n| = |f_0⁺ − f_n⁺| + |f_0⁻ − f_n⁻|.
Now, if the conditions of Lemma 3.3.4 are satisfied by f_i⁺ and f_i⁻ then again

lim_{n→∞} ∫ |f_0 − f_n| dμ = 0.   (A.3.1)

The Variational Distance and the L_1-Distance

Define again

‖ν_0 − ν_1‖ = sup_B |ν_0(B) − ν_1(B)|   (A.3.2)
as the variational distance between ν_0 and ν_1. As an extension of Lemma 3.3.1 (the proof can be left to the reader) we get

Lemma A.3.1. (i) ‖ν_0 − ν_1‖ ≤ ∫ |f_0 − f_1| dμ ≤ 2‖ν_0 − ν_1‖.
(ii) If ν_0(S) = ν_1(S) then

‖ν_0 − ν_1‖ = (1/2) ∫ |f_0 − f_1| dμ.

We note that under the condition that ν_0(S) = ν_1(S) we have again

‖ν_0 − ν_1‖ = ν_0{f_0 > f_1} − ν_1{f_0 > f_1}.
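For measures on a finite set, Lemma A.3.1(ii) and the remark following it can be verified by enumerating all events. The sketch below (our own toy example, with counting measure as dominating measure) does this for two probability vectors on four points.

```python
from itertools import combinations

# Two probability vectors on a four-point space (densities w.r.t. counting measure)
f0 = [0.1, 0.4, 0.2, 0.3]
f1 = [0.3, 0.1, 0.4, 0.2]

points = range(4)
subsets = [set(c) for r in range(5) for c in combinations(points, r)]

# sup over all events B of |nu_0(B) - nu_1(B)|
tv_sup = max(abs(sum(f0[i] for i in B) - sum(f1[i] for i in B)) for B in subsets)
# half the L1-distance of the densities
l1_half = 0.5 * sum(abs(a - b) for a, b in zip(f0, f1))
# the supremum is attained at the set {f0 > f1}
pos = {i for i in points if f0[i] > f1[i]}
attained = sum(f0[i] for i in pos) - sum(f1[i] for i in pos)
```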
The following modification of Lemma A.3.1(i) will be useful when the error term of an approximation has to be computed. In our applications no estimate of the term ∫ g dμ has to be computed.

Lemma A.3.2. Let f and g be μ-integrable functions with g ≥ 0, ∫ g dμ > 0, and ∫ f dμ > 0. Denote by Q the probability measure with μ-density g_0 = g/∫ g dμ, and by ν the signed measure with μ-density f_0 = f/∫ f dμ.
Then for every B ∈ 𝔅,

‖Q − ν‖ ≤ (∫ f dμ)^{-1} ∫_B |g − f| dμ + ∫_{B^c} |g_0 − f_0| dμ

where B^c denotes the complement of B.
PROOF. From Lemma A.3.1(ii) and the triangle inequality we get

2‖Q − ν‖ = ∫ |g_0 − f_0| dμ ≤ ∫_B |g_0 − g/∫ f dμ| dμ + (∫ f dμ)^{-1} ∫_B |g − f| dμ + ∫_{B^c} |g_0 − f_0| dμ.   (1)

Moreover, since g ≥ 0 and ∫ g_0 dμ = ∫ f_0 dμ = 1 we have

∫_B |g_0 − g/∫ f dμ| dμ = |∫_B (g_0 − g/∫ f dμ) dμ|
= |1 − ∫_B (g/∫ f dμ) dμ − ∫_{B^c} g_0 dμ|
≤ (∫ f dμ)^{-1} ∫_B |g − f| dμ + ∫_{B^c} |g_0 − f_0| dμ.

This together with (1) implies the asserted inequality. □
In the applications the set B in Lemma A.3.2 is an exceptional set (with Q(B) and ν(B) close to one) on which the integrand |g − f| can easily be computed.
Appendix 3. Further Results on Distances of Measures
327
The Variational Distance between Product Measures

Given finite signed measures ν_i with μ_i-density f_i put |ν_i| ≡ |ν_i|(S_i) = ∫ |f_i| dμ_i where S_i denotes the underlying space. Fubini's theorem implies that |ν_1 × ν_2| = |ν_1| |ν_2| where the product measure ν_1 × ν_2 is defined by

(ν_1 × ν_2)(B) = ∫_B f_1(x_1) f_2(x_2) d(μ_1 × μ_2)(x_1, x_2).

Lemma A.3.3. Let ν_i and λ_i be finite signed measures for i = 1, ..., k. Then,

‖×_{i=1}^k ν_i − ×_{i=1}^k λ_i‖ ≤ Σ_{i=1}^k (Π_{j=1}^{i−1} |ν_j|)(Π_{j=i+1}^k |λ_j|) ‖ν_i − λ_i‖.
PROOF. For notational simplicity we will prove the assertion for k = 2 only. The general case can easily be proved by induction over k.
For measurable sets A in the product space we get by Fubini's theorem

|(ν_1 × λ_2)(A) − (λ_1 × λ_2)(A)| = |∫ (ν_1(A_{x_2}) − λ_1(A_{x_2})) dλ_2(x_2)| ≤ |λ_2| sup_{x_2} |ν_1(A_{x_2}) − λ_1(A_{x_2})|   (1)

where A_{x_2} is the x_2-section of A. In analogy to (1) we get

|(ν_1 × ν_2)(A) − (ν_1 × λ_2)(A)| ≤ |ν_1| sup_{x_1} |ν_2(A_{x_1}) − λ_2(A_{x_1})|.   (2)

Thus, combining (1) and (2) we get the desired inequality in the case of k = 2. The proof is complete. □
Notice that Lemma 3.3.7 is an immediate consequence of Lemma A.3.3. Moreover, Lemma A.3.3 is an extension of the following well-known formula

|Π_{i=1}^k a_i − Π_{i=1}^k b_i| ≤ Σ_{i=1}^k (Π_{j=1}^{i−1} |a_j|)(Π_{j=i+1}^k |b_j|) |a_i − b_i|

which holds for all real (as well as complex) numbers a_i and b_i.
Corollary A.3.4. For probability measures Q_i and finite signed measures λ_i with λ_i(S_i) = 1 we have

‖×_{i=1}^k Q_i − ×_{i=1}^k λ_i‖ ≤ exp[2 Σ_{i=1}^k ‖Q_i − λ_i‖] Σ_{i=1}^k ‖Q_i − λ_i‖.   (A.3.3)

PROOF. Check that

Π_{j=1}^k |λ_j| ≤ Π_{j=1}^k (1 + 2‖Q_j − λ_j‖) ≤ exp[2 Σ_{i=1}^k ‖Q_i − λ_i‖],

and apply Lemma A.3.3. □
328
Appendix 3. Further Results on Distances of Measures
The proof of Lemma A.3.3 gives a little bit more than stated there. For every measurable set A we obtain

|(×_{i=1}^k ν_i)(A) − (×_{i=1}^k λ_i)(A)| ≤ Σ_{i=1}^k (Π_{j=1}^{i−1} |ν_j|)(Π_{j=i+1}^k |λ_j|) sup_{x̄_i} |ν_i(A_{x̄_i}) − λ_i(A_{x̄_i})|

where x̄_i = (x_1, ..., x_{i−1}, x_{i+1}, ..., x_k) and A_{x̄_i} is the x̄_i-section of A; that is, A_{x̄_i} = {x_i: (x_1, ..., x_k) ∈ A}. Thus, if e.g. A is a convex set then A_{x̄_i} is an interval.
The Hellinger Distance and the Kullback-Leibler Distance

Next we give the proof of the inequality (3.3.9) where it was stated that H(Q_0, Q_1) ≤ K(Q_0, Q_1)^{1/2}. This is immediate from inequality (A.3.4) applied to B = S.

Lemma A.3.5. Let Q_0 and Q_1 be probability measures with μ-densities f_0 and f_1. Then, for every measurable set B,

H(Q_0, Q_1) ≤ [2Q_0(B^c) − ∫_B log(f_1/f_0) dQ_0]^{1/2}.   (A.3.4)
PROOF. According to (3.3.5) we have to establish a lower bound of ∫ (f_1 f_0)^{1/2} dμ. W.l.g. let Q_0(B) > 0. Since exp(x) ≥ 1 + x, we obtain from the Jensen inequality that

∫ (f_1 f_0)^{1/2} dμ ≥ Q_0(B) ∫_B (f_1/f_0)^{1/2} d(Q_0/Q_0(B))
≥ Q_0(B) exp[(2Q_0(B))^{-1} ∫_B log(f_1/f_0) dQ_0]
≥ Q_0(B) + (1/2) ∫_B log(f_1/f_0) dQ_0.

Now the assertion is immediate from (3.3.5). □
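For discrete distributions both sides of H(Q_0, Q_1) ≤ K(Q_0, Q_1)^{1/2} reduce to finite sums, with H(Q_0, Q_1)² = Σ(√f_0 − √f_1)² and K(Q_0, Q_1) = Σ f_0 log(f_0/f_1). The sketch below (our own example; this normalization of H is the one entering via (3.3.5)) confirms the inequality numerically.

```python
import math

def hellinger(p, q):
    # H(Q0, Q1) with H^2 = sum (sqrt(f0) - sqrt(f1))^2
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def kullback_leibler(p, q):
    # K(Q0, Q1) = sum f0 log(f0/f1)
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

q0 = [0.5, 0.5]
q1 = [0.9, 0.1]
h = hellinger(q0, q1)
k = kullback_leibler(q0, q1)
```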
Further Bounds for the Variational Distance of Product Measures

Finally, we establish upper bounds for the variational distance of product measures via the χ²-distance D. One special case was already proved in (3.3.10) for probability measures Q_i and P_i where P_i has to be dominated by Q_i.
Next P_i will be replaced by a signed measure ν_i with ν_i(S) = 1. Again one has to assume that ν_i is dominated by Q_i; Lemma A.3.6, applied to m = 0, then yields the corresponding bound. At the end of this section we will discuss in detail the special case of m = 1.
Lemma A.3.6. Assume that Q_i and ν_i satisfy the conditions above. Let 1 + g_i be a Q_i-density of ν_i. Then, for every m ∈ {0, ..., k},

sup_B |(×_{i=1}^k ν_i)(B) − ∫_B h_m d(×_{i=1}^k Q_i)| ≤ [((m + 1)!)^{-1} exp(Σ_{i=1}^k D(Q_i, ν_i)²)]^{1/2} [Σ_{i=1}^k D(Q_i, ν_i)²]^{(m+1)/2}

where

h_m(x_1, ..., x_k) = 1 + Σ_{j=1}^m Σ_{1≤i_1<...<i_j≤k} Π_{r=1}^j g_{i_r}(x_{i_r}).
PROOF. Notice that

Π_{i=1}^k (1 + a_i) = 1 + Σ_{j=1}^k Σ_{1≤i_1<...<i_j≤k} Π_{r=1}^j a_{i_r}

and, therefore, h_k is the ×_{i=1}^k Q_i-density of ×_{i=1}^k ν_i. From the Schwarz inequality and the fact that the functions (x_1, ..., x_k) → Π_{r=1}^j g_{i_r}(x_{i_r}) for 1 ≤ i_1 < ... < i_j ≤ k and j = 1, ..., k form a multiplicative system w.r.t. ×_{i=1}^k Q_i we obtain

sup_B |(×_{i=1}^k ν_i)(B) − ∫_B h_m d(×_{i=1}^k Q_i)| ≤ [Σ_{j=m+1}^k Σ_{1≤i_1<...<i_j≤k} Π_{r=1}^j D(Q_{i_r}, ν_{i_r})²]^{1/2}.

This implies the asserted inequality since, with R_k := Σ_{i=1}^k D(Q_i, ν_i)², the right-hand side is bounded by [Σ_{j=m+1}^∞ R_k^j/j!]^{1/2}, and

Σ_{i=m}^∞ z^i/i! ≤ exp(z)z^m/m!  for z ≥ 0.   (A.3.6)
In the special case of m = 1 we get

sup_B |(×_{i=1}^k ν_i)(B) − ∫_B h_1 d(×_{i=1}^k Q_i)| ≤ [2^{-1} exp(R_k)]^{1/2} R_k

and hence the distance between ×_{i=1}^k ν_i and ×_{i=1}^k Q_i is governed by the term Σ_{i=1}^k ∫_B g_i(x_i) d(×_{j=1}^k Q_j) coming from h_1. This shows that for k → ∞ further insight into the variational distance of product measures may be gained by means of the central limit theorem.
Bibliography
Alam, K. (1972). Unimodality of the distribution of an order statistic. Ann. Math. Statist. 43, 2041–2044.
Albers, W., Bickel, P.J. and van Zwet, W.R. (1976). Asymptotic expansions for the power of distribution-free tests in the one-sample problem. Ann. Statist. 4, 108–156.
Ali, M.M. and Kuan, K.S. (1977). On the joint asymptotic normality of quantiles. Nanta Math. 10, 161–165.
Anderson, C.W. (1971). Contributions to the Asymptotic Theory of Extreme Values.
Ph.D. Thesis, University of London.
Anderson, C.W. (1984). Large deviations of extremes. In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 325–340. Dordrecht: Reidel.
Arnold, B.C., Becker, A., Gather, U. and Zahedi, H. (1984). On the Markov property of order statistics. J. Statist. Plann. Inference 9, 147–154.
Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577–580.
Bain, L.J. (1978). Statistical Analysis of Reliability and LifeTesting Models. New York:
Marcel Dekker.
Balkema, A.A. and Haan, L. de (1978a). Limit distributions for order statistics I. Theory Probab. Appl. 23, 77–92.
Balkema, A.A. and Haan, L. de (1978b). Limit distributions for order statistics II. Theory Probab. Appl. 23, 341–358.
Barndorff-Nielsen, O. (1964). On the limit distribution of the maximum of a random number of independent random variables. Acta Math. Acad. Sci. Hungar. 15, 399–403.
Barnett, V. (1975). Probability plotting methods and order statistics. Appl. Statist. 24, 95–108.
Barnett, V. (1976). The ordering of multivariate data. J. Roy. Statist. Soc., Ser. A, 139, 318–344.
Barnett, V. and Lewis, T. (1978). Outliers in Statistical Data. Chichester: Wiley.
Beirlant, J. and Teugels, J.L. (1987). Asymptotics of Hill's estimator. Theory Probab. Appl. 31, 463–469.
Beran, J. (1985). Stochastic procedures: Bootstrap and random search methods in statistics. Proceedings of the 45th Session of the ISI, Vol. 4 (Amsterdam), 25.1.
Berman, S.M. (1961). Convergence to bivariate limiting extreme value distributions. Ann. Inst. Statist. Math. 13, 217–223.
Bhattacharya, R.N. and Rao, R.R. (1976). Normal Approximation and Asymptotic Expansion. New York: Wiley.
Bhattacharya, R.N. and Ghosh, J.K. (1978). On the validity of the formal Edgeworth expansions. Ann. Statist. 6, 434–451.
Bickel, P.J. (1967). Some contributions to the theory of order statistics. In: Proc. 5th Berkeley Symp. Math. Statistics and Prob., Vol. I, pp. 575–591. Berkeley: Univ. California Press.
Bickel, P.J. and Freedman, D.A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9, 1196–1217.
Bickel, P.J. and Rosenblatt, M. (1973). On some global measures of the deviation of density function estimates. Ann. Statist. 1, 1071–1095.
Bickel, P.J. and Rosenblatt, M. (1975). Correction to "On some global measures of the deviation of density function estimates". Ann. Statist. 3, 1370.
Bloch, D.A. and Gastwirth, J.L. (1968). On a simple estimate of the reciprocal of the density function. Ann. Math. Statist. 39, 1083–1085.
Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables. New York: Wiley.
Blum, J.R. and Pathak, P.K. (1972). A note on the zero-one law. Ann. Math. Statist. 43, 1008–1009.
Boos, D.D. (1984). Using extreme value theory to estimate large percentiles. Technometrics 26, 33–39.
Bortkiewicz, L. von (1922). Variationsbreite und mittlerer Fehler. Sitzungsberichte Berliner Math. Ges. 21, 3–11.
Brown, B.M. (1981). Symmetric quantile averages and related estimators. Biometrika 68, 235–242.
Brozius, H. and Haan, L. de (1987). On limit laws for the convex hull of a sample. J. Appl. Probab. 24, 863–874.
Chernoff, H., Gastwirth, J.L. and Johns, M.V. (1967). Asymptotic distribution of linear combinations of functions of order statistics with applications to estimation. Ann. Math. Statist. 38, 52–72.
Chibisov, D.M. (1964). On limit distributions for order statistics. Theory Probab. Appl. 9, 150–165.
Chow, Y.S. and Teicher, H. (1978). Probability Theory. New York: Springer.
Cohen, J.P. (1982a). The penultimate form of approximation to normal extremes. Adv. Appl. Probab. 14, 324–339.
Cohen, J.P. (1982b). Convergence rates for the ultimate and penultimate approximations in extreme-value theory. Adv. Appl. Probab. 14, 833–854.
Cohen, J.P. (1984). The asymptotic behaviour of the maximum likelihood estimates for univariate extremes. In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 435–442. Dordrecht: Reidel.
Cooil, B. (1985). Limiting multivariate distributions of intermediate order statistics. Ann. Probab. 13, 469–477.
Cooil, B. (1988). When are intermediate processes of the same stochastic order? Statist. Probab. Letters 6, 159–162.
Consul, P.C. (1984). On the distributions of order statistics for a random sample size. Statist. Neerlandica 38, 249–256.
Craig, A.T. (1932). On the distribution of certain statistics. Amer. J. Math. 54, 353–366.
Cramér, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton Univ. Press.
Csiszár, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3, 146–158.
Csörgő, M. (1983). Quantile Processes with Statistical Applications. Philadelphia: SIAM.
Csörgő, M., Csörgő, S., Horváth, L. and Mason, D.M. (1986). Normal and stable convergence of integral functions of the empirical distribution function. Ann. Probab. 14, 86–118.
Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. New York: Academic Press.
Csörgő, S., Deheuvels, P. and Mason, D.M. (1985). Kernel estimates of the tail index of a distribution. Ann. Statist. 13, 1050–1078.
Csörgő, S., Horváth, L. and Mason, D.M. (1986). What portion of the sample makes a partial sum asymptotically stable or normal? Probab. Th. Rel. Fields 72, 1–16.
Csörgő, S. and Mason, D.M. (1986). The asymptotic distributions of sums of extreme values from a regularly varying distribution. Ann. Probab. 14, 974–983.
David, F.N. and Johnson, N.L. (1954). Statistical treatment of censored data, Part I, Fundamental formulae. Biometrika 44, 228–240.
David, H.A. (1981). Order Statistics. 2nd ed. New York: Wiley.
Davis, R.A. (1982). The rate of convergence in distribution of the maxima. Statist. Neerlandica 36, 31–35.
Deheuvels, P. and Pfeifer, D. (1988). Poisson approximations of multinomial distributions and point processes. J. Multivariate Anal. 25, 65–89.
Dodd, E.L. (1923). The greatest and the least variate under general laws of error. Trans. Amer. Math. Soc. 25, 525–539.
Dronkers, J.J. (1958). Approximate formulae for the statistical distributions of extreme values. Biometrika 45, 447–470.
Du Mouchel, W. (1983). Estimating the stable index α in order to measure tail thickness. Ann. Statist. 11, 1019–1036.
Dwass, M. (1966). Extremal processes, II. Illinois J. Math. 10, 381–391.
Dziubdziela, W. (1976). A note on the kth distance random variables. Zastosowania Matematyki 15, 289–291.
Eddy, W.F. and Gale, J.D. (1981). The convex hull of a spherically symmetric sample.
Adv. App!. Probab. 13, 751763.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist. 7,
126.
Egorov, V.A. and Nevzorov, V.B. (1976). Limit theorems for linear combinations of
order statistics. In: Proc. 3rd JapanUSSR Symp. Probab. Theory, Eds. G. Maruyama
and J.V. Prokhorov, pp. 6379. Lecture Notes in Mathematics 550. New York:
Springer.
Englund, G. (1980). Remainder term estimates for the asymptotic normality of order
statistics. Scand. J. Statist. 7, 197202.
Erdelyi, A., Magnus, W., Oberhettinger, F. and Tricomi, F.G. (1953). Higher Transcendental Functions, Vol. I. New York: McGrawHill.
Falk, M. (1983). Relative efficiency and deficiency of kernel type estimators of smooth
distribution functions. Statist. Neerlandica 37, 7383.
Falk, M. (1984a). Relative deficiency of kernel type estimators of quantiles. Ann. Statist.
12,261268.
Falk, M. (1984b). BerryEsseen theorems for a global measure of performance of kernel
density estimators. South African Statist. J. 19, 119.
Falk, M. (1985a). Asymptotic normality of the kernel quantile estimator. Ann. Statist.
13, 428433.
Falk, M. (1985b). Uniform convergence of extreme order statistics. Habilitationsschrift,
University of Siegen.
Falk, M. (1986a). Rates of uniform convergence of extreme order statistics. Ann. Inst.
Statist. Math., Ser. A, 38, 245262.
Falk, M. (1986b). On the estimation of the quantile density function. Statist. Probab. Letters 4, 69-73.
Falk, M. (1989a). Best attainable rate of joint convergence of extremes. In: Extreme Value Theory, Eds. J. Hüsler and R.D. Reiss, pp. 1-9. Lecture Notes in Statistics 51. New York: Springer.
Falk, M. (1989b). A note on uniform asymptotic normality of intermediate order statistics. Ann. Inst. Statist. Math., Ser. A.
Falk, M. and Kohne, W. (1986). On the rate at which the sample extremes become independent. Ann. Probab. 14, 1339-1346.
Falk, M. and Reiss, R.D. (1988). Independence of order statistics. Ann. Probab. 16, 854-862.
Falk, M. and Reiss, R.D. (1989). Weak convergence of smoothed and nonsmoothed bootstrap quantile estimates. Ann. Probab. 17.
Feldman, D. and Tucker, H.G. (1966). Estimation of non-unique quantiles. Ann. Math. Statist. 37, 451-457.
Feller, W. (1972). An Introduction to Probability Theory and its Applications. Vol. 2, 2nd ed. New York: Wiley.
Ferguson, T.S. (1967). Mathematical Statistics. New York: Academic Press.
Finkelstein, B.V. (1953). Limiting distribution of extreme terms of a variational series of a two-dimensional random variable. Dokl. Ak. Nauk S.S.S.R. 91, 209-211 (in Russian).
Fisher, R.A. (1922). On the mathematical foundation of theoretical statistics. Phil. Trans. Roy. Soc. A 222, 309-368. Reprint in: Collected Papers of R.A. Fisher, Vol. I, Ed. J.H. Bennett, pp. 276-335. University of Adelaide.
Fisher, R.A. and Tippett, L.H.C. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Camb. Phil. Soc. 24, 180-190.
Floret, K. (1981). Maß- und Integrationstheorie. Stuttgart: Teubner.
Fréchet, M. (1927). Sur la loi de probabilité de l'écart maximum. Ann. de la Soc. Polonaise de Math. 6, 93-116.
Galambos, J. (1975). Order statistics of samples from multivariate distributions. J. Amer. Statist. Assoc. 70, 674-680.
Galambos, J. (1984). Order statistics. In: Handbook of Statistics, Vol. 4, Eds. P.R. Krishnaiah and P.K. Sen, pp. 359-382. Amsterdam: North-Holland.
Galambos, J. (1987). The Asymptotic Theory of Extreme Order Statistics. 2nd ed. Malabar, Florida: Krieger.
Geffroy, J. (1958/59). Contributions à la théorie des valeurs extrêmes. Publ. Inst. Statist. Univ. Paris 7/8, 37-185.
Gini, C. and Galvani, L. (1929). Di talune estensioni dei concetti di media ai caratteri qualitativi. Metron 8. Partial English translation in: J. Amer. Statist. Assoc. 25, 448-450.
Gnedenko, B. (1943). Sur la distribution limite du terme maximum d'une série aléatoire. Ann. Math. 44, 423-453.
Goldie, C.M. and Smith, R.L. (1987). Slow variation with remainder: Theory and applications. Quart. J. Math. Oxford 38, 45-71.
Gomes, M.I. (1978). Some probabilistic and statistical problems in extreme value theory. Ph.D. Thesis, University of Sheffield.
Gomes, M.I. (1981). An i-dimensional limiting distribution function of largest values and its relevance to the statistical theory of extremes. In: Statistical Distributions in Scientific Work, Eds. C. Taillie et al., Vol. 6, pp. 389-410. Dordrecht: Reidel.
Gomes, M.I. (1984). Penultimate limiting forms in extreme value theory. Ann. Inst. Statist. Math., Ser. A, 36, 71-85.
Gross, A.J. (1975). Survival Distributions: Reliability Applications in the Biomedical Sciences. New York: Wiley.
Guilbaud, O. (1982). Functions of non-i.i.d. random vectors expressed as functions of i.i.d. random vectors. Scand. J. Statist. 9, 229-233.
Gumbel, E.J. (1933). Das Alter des Methusalem. Z. Schweizerische Statistik und Volkswirtschaft 69, 516-530.
Gumbel, E.J. (1946). On the independence of the extremes in a sample. Ann. Math. Statist. 17, 78-81.
Gumbel, E.J. (1958). Statistics of Extremes. New York: Columbia Univ. Press.
Haan, L. de (1970). On Regular Variation and its Application to the Weak Convergence of Sample Extremes. Amsterdam, Math. Centre Tracts 32.
Haan, L. de (1976). Sample extremes: an elementary introduction. Statist. Neerlandica 30, 161-172.
Haan, L. de and Resnick, S.I. (1980). A simple asymptotic estimate for the index of a stable distribution. J. Roy. Statist. Soc., Ser. B, 42, 83-87.
Haan, L. de and Resnick, S.I. (1982). Local limit theorems for sample extremes. Ann. Probab. 10, 396-413.
Häusler, E. and Teugels, J.L. (1985). On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Statist. 13, 743-756.
Haldane, J.B.S. and Jayakar, S.G. (1963). The distribution of extremal and nearly extremal values in samples from a normal distribution. Biometrika 50, 89-94.
Hall, P. (1978). Some asymptotic expansions of moments of order statistics. Stoch. Proc. Appl. 7, 265-275.
Hall, P. (1979). On the rate of convergence of normal extremes. J. Appl. Probab. 16, 433-439.
Hall, P. (1982a). On estimating the endpoint of a distribution. Ann. Statist. 10, 556-568.
Hall, P. (1982b). On some simple estimates of an exponent of regular variation. J. Roy. Statist. Soc., Ser. B, 44, 37-42.
Hall, P. (1983). On near neighbour estimates of a multivariate density. J. Multivariate Anal. 13, 24-39.
Hall, P. and Welsh, A.H. (1984). Best attainable rates of convergence for estimates of parameters of regular variation. Ann. Statist. 12, 1079-1084.
Hall, P. and Welsh, A.H. (1985). Adaptive estimates of parameters of regular variation. Ann. Statist. 13, 331-341.
Hall, W.J. and Wellner, J.A. (1979). The rate of convergence in law of the maximum of an exponential sample. Statist. Neerlandica 33, 151-154.
Hájek, J. and Šidák, Z. (1967). Theory of Rank Tests. New York: Academic Press.
Harrell, F.E. and Davis, C.E. (1982). A new distribution-free quantile estimator. Biometrika 69, 635-640.
Harter, H.L. (1983). The chronological annotated bibliography of order statistics. Vol. I: pre-1950; Vol. II: 1950-1959. Columbus, Ohio: American Sciences Press.
Hecker, H. (1976). A characterization of the asymptotic normality of linear combinations of order statistics from the uniform distribution. Ann. Statist. 4, 1244-1246.
Heidelberger, P. and Lewis, P.A.W. (1984). Quantile estimation in dependent sequences. Opns. Res. 32, 185-209.
Helmers, R. (1981). A Berry-Esseen theorem for linear combinations of order statistics. Ann. Probab. 9, 342-347.
Helmers, R. (1982). Edgeworth Expansions for Linear Combinations of Order Statistics. Amsterdam, Math. Centre Tracts 105.
Herbach, L. (1984). Introduction, Gumbel model. In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 49-80. Dordrecht: Reidel.
Hewitt, E. and Stromberg, K. (1975). Real and Abstract Analysis. 3rd ed. New York: Springer.
Heyer, H. (1982). Theory of Statistical Experiments. Springer Series in Statistics. New York: Springer.
Hill, B.M. (1975). A simple approach to inference about the tail of a distribution. Ann. Statist. 3, 1163-1174.
Hillion, A. (1983). On the use of some variation distance inequalities to estimate the difference between sample and perturbed sample. In: Specifying Statistical Models, Eds. J.P. Florens et al., pp. 163-175. Lecture Notes in Statistics 16. New York: Springer.
Hodges, J.L. Jr. and Lehmann, E.L. (1967). On medians and quasi-medians. J. Amer. Statist. Assoc. 62, 926-931.
Hodges, J.L. Jr. and Lehmann, E.L. (1970). Deficiency. Ann. Math. Statist. 41, 783-801.
Hoeffding, W. and Wolfowitz, J. (1958). Distinguishability of sets of distributions. Ann. Math. Statist. 29, 700-718.
Hosking, J.R.M. (1985). Maximum-likelihood estimation of the parameter of the generalized extreme-value distribution. Applied Statistics 34, 301-310.
Huang, J.S. and Ghosh, M. (1982). A note on the strong unimodality of order statistics. J. Amer. Statist. Assoc. 77, 929-930.
Hüsler, J. and Reiss, R.D. (1989). Maxima of normal random vectors: Between independence and complete dependence. Statist. Probab. Letters 7.
Hüsler, J. and Schüpbach, M. (1988). On simple block estimators for the parameters of the extreme-value distribution. Commun. Statist.-Simula. 15, 61-76.
Hüsler, J. and Tiago de Oliveira, J. (1986). The usage of the largest observations for parameter and quantile estimation for the Gumbel distribution; an efficiency analysis. Publ. Inst. Stat. Univ. 33, 41-56.
Ibragimov, I.A. (1956). On the composition of unimodal distributions. Theory Probab. Appl. 1, 225-260.
Ibragimov, I.A. and Has'minskii, R.Z. (1981). Statistical Estimation. Berlin: Springer-Verlag.
Iglehart, D.L. (1976). Simulating stable stochastic systems, VI: Quantile estimation. J. Assoc. Comput. Mach. 23, 347-360.
Ikeda, S. (1963). Asymptotic equivalence of probability distributions with applications to some problems of asymptotic independence. Ann. Inst. Statist. Math. 15, 87-116.
Ikeda, S. (1975). Some criteria for uniform asymptotic equivalence of real probability distributions. Ann. Inst. Statist. Math. 27, 421-428.
Ikeda, S. and Matsunawa, T. (1970). On asymptotic independence of order statistics. Ann. Inst. Statist. Math. 22, 435-449.
Ikeda, S. and Matsunawa, T. (1972). On the uniform asymptotic joint normality of sample quantiles. Ann. Inst. Statist. Math. 24, 33-52.
Ikeda, S. and Nonaka, Y. (1983). Uniform asymptotic joint normality of a set of increasing number of sample quantiles. Ann. Inst. Statist. Math. 35, Ser. A, 329-341.
Isogai, T. (1985). Some extensions of Haldane's multivariate median and its applications. Ann. Inst. Statist. Math. 37, Ser. A, 289-301.
Ivchenko, G.I. (1971). On limit distributions for the order statistics of the multinomial distribution. Theory Probab. Appl. 16, 102-115.
Ivchenko, G.I. (1974). On limit distributions for middle order statistics for a double sequence. Theory Probab. Appl. 19, 267-277.
Jacod, J. and Shiryaev, A.N. (1987). Limit Theorems for Stochastic Processes. Berlin: Springer.
Janssen, A. (1988). Uniform convergence of sums of order statistics to stable laws. Probab. Th. Rel. Fields 78, 261-272.
Janssen, A. and Reiss, R.D. (1988). Comparison of location models of Weibull type samples and extreme value processes. Probab. Th. Rel. Fields 78, 273-292.
Joag-Dev, K. (1983). Independence via uncorrelatedness under certain dependence structures. Ann. Probab. 11, 1037-1041.
Joe, H. (1987). Estimation of quantiles of the maximum of N observations. Biometrika 74, 347-354.
Johnson, N.L. and Kotz, S. (1970). Distributions in Statistics: Continuous Univariate Distributions-1. New York: Wiley.
Johnson, N.L. and Kotz, S. (1972). Distributions in Statistics: Continuous Multivariate Distributions. New York: Wiley.
Kabanov, Yu. and Liptser, R.S. (1983). On convergence in variation of the distributions of multivariate point processes. Z. Wahrsch. verw. Geb. 63, 475-485.
Karr, A.F. (1986). Point Processes and their Statistical Inference. New York: Marcel Dekker.
Kendall, M.G. (1940). Note on the distributions of quantiles for large samples. J. Roy. Statist. Soc., Suppl. 7, 83-85.
Kendall, M.G. and Stuart, A. (1958). The Advanced Theory of Statistics. Vol. 1. London: Griffin.
Kiefer, J. (1967). On Bahadur's representation of sample quantiles. Ann. Math. Statist. 38, 1323-1342.
Kiefer, J. (1969a). Deviations between the sample quantile process and the sample df. In: Nonparametric Techniques in Statistical Inference, Ed. M.L. Puri, pp. 299-319. Cambridge: Cambridge Univ. Press.
Kiefer, J. (1969b). Old and new methods for studying order statistics and sample quantiles. In: Nonparametric Techniques in Statistical Inference, Ed. M.L. Puri, pp. 349-357. Cambridge: Cambridge Univ. Press.
Kinnison, R.R. (1985). Applied Extreme Value Statistics. Columbus: Battelle Press.
Klenk, A. and Stute, W. (1987). Bootstrapping of L-estimates. Statist. Decisions 5, 77-87.
Kohne, W. and Reiss, R.D. (1983). A note on uniform approximation to distributions of extreme order statistics. Ann. Inst. Statist. Math., Ser. A, 35, 343-345.
Kolchin, V.F. (1980). On the limiting behaviour of extreme order statistics in a polynomial scheme. Theory Probab. Appl. 14, 458-469.
Koziol, J.A. (1980). A note on limiting distributions for spacings statistics. Z. Wahrsch. verw. Gebiete 51, 55-62.
Kuan, K.S. and Ali, M.M. (1960). Asymptotic distribution of quantiles from a multivariate distribution. In: Mult. Statist. Analysis, Ed. R.P. Gupta, pp. 109-120. Amsterdam: North-Holland.
Lamperti, J. (1964). On extreme order statistics. Ann. Math. Statist. 35, 1726-1736.
Landers, D. and Rogge, L. (1985). Asymptotic normality of the estimators of the natural median. Statist. Decisions 3, 77-90.
Laplace, P.S. de (1818). Deuxième supplément à la théorie analytique des probabilités. Paris: Courcier. Reprint (1886) in: Oeuvres complètes de Laplace 7, pp. 531-580. Paris: Gauthier-Villars.
Lawless, J.F. (1982). Statistical Models and Methods for Lifetime Data. New York: Wiley.
Leadbetter, M.R., Lindgren, G. and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer Series in Statistics. New York: Springer.
Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer Series in Statistics. New York: Springer.
Lehmann, E.L. (1986). Testing Statistical Hypotheses. 2nd ed. New York: Wiley.
Loève, M. (1963). Probability Theory. 3rd ed. New York: Van Nostrand.
Mack, Y.P. (1984). Remarks on some smoothed empirical distribution functions and processes. Bull. Informatics Cybernetics 21, 29-35.
Malmquist, S. (1950). On a property of order statistics from a rectangular distribution. Skand. Aktuar. 33, 214-222.
Mammitzsch, V. (1984). On the asymptotically optimal solution within a certain class of kernel type estimators. Statist. Decisions 2, 247-255.
Mann, N.R., Schafer, R.E. and Singpurwalla, N.D. (1974). Methods for Statistical Analysis of Reliability and Life Data. New York: Wiley.
Mann, N.R. (1984). Statistical estimation of the Weibull and Fréchet distributions. In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 81-89. Dordrecht: Reidel.
Marshall, A.W. and Olkin, I. (1983). Domains of attraction of multivariate extreme value distributions. Ann. Probab. 11, 168-177.
Matsunawa, T. (1975). On the error evaluation of the joint normal approximation for sample quantiles. Ann. Inst. Statist. Math. 27, 189-199.
Matsunawa, T. and Ikeda, S. (1976). Uniform asymptotic distribution of extremes. In: Essays in Probab. Statist., Eds. S. Ikeda et al., pp. 419-432. Tokyo: Shinko Tsusho.
Michel, R. (1975). An asymptotic expansion for the distribution of asymptotic maximum likelihood estimators of vector parameters. J. Multivariate Anal. 5, 67-85.
Miebach, B. (1977). Asymptotische Theorie für Familien von Maßen mit Lokalisations- und Dispersionsparameter. Diploma Thesis, University of Cologne.
Mises von, R. (1923). Über die Variationsbreite einer Beobachtungsreihe. Sitzungsberichte Berliner Math. Ges. 22, 3-8.
Mises von, R. (1936). La distribution de la plus grande de n valeurs. Rev. Math. Union Interbalcanique 1, 141-160. Reproduced in: Selected Papers of Richard von Mises, Amer. Math. Soc. 2 (1964), 271-294.
Miyamoto, Y. (1976). Optimum spacings for goodness of fit tests based on sample quantiles. In: Essays in Probab. Statist., Eds. S. Ikeda et al., pp. 475-483. Tokyo: Shinko Tsusho.
Montfort, M.A.J. van (1982). Modellen voor maximum en minima, schattingen en betrouwbaarheidsintervallen, keuze tussen modellen. Agricultural University Wageningen, Netherlands, Dept. Math., Statist. Division, Technical Note 8202.
Montfort, M.A.J. van and Gomes, M.I. (1985). Statistical choice of extremal models for complete and censored data. J. Hydrology 77, 77-87.
Mood, A. (1941). On the joint distribution of the medians in samples from a multivariate population. Ann. Math. Statist. 12, 268-278.
Moore, D.S. and Yackel, J.W. (1977). Large sample properties of nearest neighbour density function estimates. In: Statistical Decision Theory and Related Topics, Eds. S.S. Gupta and D.S. Moore, pp. 269-279. New York: Academic Press.
Mosteller, F. (1946). On some useful inefficient statistics. Ann. Math. Statist. 17, 377-408.
Nadaraya, E.A. (1964). Some new estimates for distribution functions. Theory Probab. Appl. 10, 186-190.
Nagaraja, H.N. (1982). On the non-Markovian structure of discrete order statistics. J. Statist. Plann. Inference 7, 29-33.
Nagaraja, H.N. (1986). Structure of discrete order statistics. J. Statist. Plann. Inference 13, 165-177.
Nelson, W. (1982). Applied Life Data Analysis. New York: Wiley.
Nowak, W. and Reiss, R.D. (1983). Asymptotic expansions of distributions of central order statistics under discrete distributions. Technical Report 101, University of Siegen.
Oja, H. and Niinimaa, A. (1985). Asymptotic properties of the generalized median in the case of multivariate normality. J. Roy. Statist. Soc., Ser. B, 47, 372-377.
O'Reilly, F.J. and Quesenberry, C.P. (1973). The conditional probability integral transformation and applications to obtain composite chi-square goodness-of-fit tests. Ann. Statist. 1, 74-83.
Pantcheva, E.I. (1985). Limit theorems for extreme order statistics under nonlinear normalization. In: Stability Problems for Stochastic Models, Eds. V.V. Kalashnikov and V.M. Zolotarev, pp. 284-309. Lecture Notes in Mathematics 1155. Berlin: Springer.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist. 33, 1065-1076.
Parzen, E. (1979). Nonparametric statistical data modeling. J. Amer. Statist. Assoc. 74, 105-121.
Pearson, K. (1902). Note on Francis Galton's problem. Biometrika 1, 390-399.
Pearson, K. (1920). On the probable errors of frequency constants. Biometrika 13, 113-132.
Pfanzagl, J. (1973a). Asymptotically optimum estimation and test procedures. In: Proc. Prague Symp. Asymptotic Statistics, Vol. 1, Ed. J. Hájek, pp. 201-272. Prague: Charles University.
Pfanzagl, J. (1973b). The accuracy of the normal approximation for estimates of vector parameters. Z. Wahrsch. verw. Gebiete 25, 171-198.
Pfanzagl, J. (1973c). Asymptotic expansions related to minimum contrast estimators. Ann. Statist. 1, 993-1026.
Pfanzagl, J. (1975). Investigating the quantile of an unknown distribution. In: Statistical Methods in Biometry, Ed. W.J. Ziegler, pp. 111-126. Basel: Birkhäuser.
Pfanzagl, J. (1982). Contributions to a General Asymptotic Statistical Theory. (With the assistance of W. Wefelmeyer.) Lecture Notes in Statistics 13. New York: Springer.
Pfanzagl, J. (1985). Asymptotic Expansions for General Statistical Models. (With the assistance of W. Wefelmeyer.) Lecture Notes in Statistics 31. New York: Springer.
Pickands, J. (1967). Sample sequences of maxima. Ann. Math. Statist. 38, 1570-1574.
Pickands, J. (1968). Moment convergence of sample extremes. Ann. Math. Statist. 39, 881-889.
Pickands, J. (1975). Statistical inference using extreme order statistics. Ann. Statist. 3, 119-131.
Pickands, J. (1981). Multivariate extreme value distributions. Proc. 43rd Session of the ISI (Buenos Aires), 859-878.
Pickands, J. (1986). The continuous and differentiable domains of attraction of the extreme value distributions. Ann. Probab. 14, 996-1004.
Pitman, E.J.G. (1979). Some Basic Theory for Statistical Inference. London: Chapman and Hall.
Plackett, R.L. (1976). In: Discussion of Professor Barnett's paper. J. Roy. Statist. Soc., Ser. A, 139, 344-346.
Polfeldt, T. (1970). Asymptotic results in non-regular estimation. Skand. Aktuar., Suppl. 12, 278.
Prakasa Rao, B.L.S. (1983). Nonparametric Functional Estimation. Orlando: Academic Press.
Puri, M.L. and Ralescu, S.S. (1986). Limit theorems for random central order statistics. In: Adaptive Statistical Procedures and Related Topics, Ed. J. van Ryzin, pp. 447-475. IMS Lecture Notes 8.
Pyke, R. (1965). Spacings. J. Roy. Statist. Soc., Ser. B, 27, 395-436. Discussion: 437-449.
Pyke, R. (1972). Spacings revisited. In: Proc. 6th Berkeley Symp. Math. Statist. Probability, Vol. 1, Eds. L.M. Le Cam et al., pp. 417-427. Berkeley: Univ. California Press.
Radtke, M. (1988). Konvergenzraten und Entwicklungen unter von Mises Bedingungen der Extremwerttheorie. Ph.D. Thesis, University of Siegen.
Ramachandran, G. (1984). Approximate values for the moments of extreme order statistics in large samples. In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 563-578. Dordrecht: Reidel.
Rao, J.S. and Kuo, M. (1984). Asymptotic results on the Greenwood statistic and some of its generalizations. J. Roy. Statist. Soc., Ser. B, 46, 228-237.
Raoult, J.P., Criticou, D. and Terzakis, D. (1983). The probability integral transformation for not necessarily absolutely continuous distribution functions, and its application to goodness-of-fit tests. In: Specifying Statistical Models, Eds. J.P. Florens et al., pp. 36-49. New York: Springer.
Reiss, R.D. (1973). On the measurability and consistency of maximum likelihood estimates for unimodal densities. Ann. Statist. 1, 888-901.
Reiss, R.D. (1974a). On the accuracy of the normal approximation for quantiles. Ann. Probab. 2, 741-744.
Reiss, R.D. (1974b). Asymptotic expansions for sample quantiles. Technical Report 6, University of Cologne.
Reiss, R.D. (1975a). The asymptotic normality and asymptotic expansions for the joint distribution of several order statistics. In: Limit Theorems of Prob. Theory, Ed. P. Révész, pp. 297-340. Amsterdam: North-Holland.
Reiss, R.D. (1975b). Consistency of a certain class of empirical density functions. Metrika 22, 189-203.
Reiss, R.D. (1976). Asymptotic expansions for sample quantiles. Ann. Probab. 4, 249-258.
Reiss, R.D. (1977a). Asymptotic Theory of Order Statistics. Lecture Notes, University of Freiburg.
Reiss, R.D. (1977b). Optimum confidence bands for density functions. Studia Sci. Math. Hungar. 12, 207-214.
Reiss, R.D. (1978a). Approximate distribution of the maximum deviation of histograms. Metrika 25, 9-26.
Reiss, R.D. (1978b). Consistency of minimum contrast estimators in non-standard cases. Metrika 25, 129-142.
Reiss, R.D. (1980). Estimation of quantiles in certain nonparametric models. Ann. Statist. 8, 87-105.
Reiss, R.D. (1981a). Approximation of product measures with an application to order statistics. Ann. Probab. 9, 335-341.
Reiss, R.D. (1981b). Asymptotic independence of distributions of normalized order statistics of the underlying probability measure. J. Multivariate Anal. 11, 386-399.
Reiss, R.D. (1981c). Nonparametric estimation of smooth distribution functions. Scand. J. Statist. 8, 116-119.
Reiss, R.D. (1981d). Uniform approximation to distributions of extreme order statistics. Adv. Appl. Probab. 13, 533-547.
Reiss, R.D. (1982). One-sided tests for quantiles in certain nonparametric models. In: Nonparametric Statistical Inference, Colloq. Math. Soc. János Bolyai 32, Eds. B.V. Gnedenko et al., pp. 759-772. Amsterdam: North-Holland.
Reiss, R.D. (1984). Statistical inference using approximate extreme value models. Technical Report 124, University of Siegen.
Reiss, R.D. (1985a). Asymptotic expansions of moments of central order statistics. In: Probability and Statistical Decision Theory, Vol. A, Proc. 4th Pann. Symp., Eds. Mogyoródi et al., pp. 293-300. Dordrecht: Reidel.
Reiss, R.D. (1985b). Approximations to the distributions of ordered distance random variables. Ann. Inst. Statist. Math., Ser. A, 37, 529-533.
Reiss, R.D. (1986). A new proof of the approximate sufficiency of sparse order statistics. Statist. Probab. Letters 4, 233-235.
Reiss, R.D. (1987). Estimating the tail index of the claim size distribution. Blätter DGVM 18, 21-25.
Reiss, R.D. (1989). Extended extreme value models and adaptive estimation of the tail index. In: Extreme Value Theory, Eds. J. Hüsler and R.D. Reiss, pp. 156-165. Lecture Notes in Statistics 51. New York: Springer.
Reiss, R.D., Falk, M. and Weller, M. (1984). Inequalities for the relative sufficiency between sets of order statistics. In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 597-610. Dordrecht: Reidel.
Rényi, A. (1953). On the theory of order statistics. Acta Math. Acad. Sci. Hungar. 4, 191-231.
Resnick, S.I. (1987). Extreme Values, Regular Variation, and Point Processes. Applied Probability, Vol. 4. New York: Springer.
Rice, J. and Rosenblatt, M. (1976). Estimation of the log survivor function and hazard function. Sankhya, Ser. A, 38, 60-78.
Rootzén, H. (1984). Attainable rates of convergence of maxima. Statist. Probab. Letters 2, 219-221.
Rootzén, H. (1985). Asymptotic distributions of order statistics from stationary normal sequences. In: Contributions to Probability and Statistics in Honour of Gunnar Blom, Eds. J. Lanke and G. Lindgren, pp. 291-302. University of Lund.
Rosenblatt, M. (1952). Remarks on a multivariate transformation. Ann. Math. Statist. 23, 470-472.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27, 832-837.
Rosengard, A. (1962). Étude des lois limites jointes et marginales de la moyenne et des valeurs extrêmes d'un échantillon. Publ. Inst. Statist. Univ. Paris 11, 3-53.
Rossberg, H.J. (1965). Die asymptotische Unabhängigkeit der kleinsten und größten Werte einer Stichprobe vom Stichprobenmittel. Math. Nachr. 28, 305-318.
Rossberg, H.J. (1967). Über das asymptotische Verhalten der Rand- und Zentralglieder einer Variationsreihe (II). Publ. Math. Debrecen 14, 83-90.
Rossberg, H.J. (1972). Characterization of the exponential and the Pareto distribution by means of some properties of the distributions which the differences and quotients of order statistics are subject to. Math. Operationsforsch. Statist. 3, 207-316.
Rüschendorf, L. (1985a). Two remarks on order statistics. J. Statist. Plann. Inference 11, 71-74.
Rüschendorf, L. (1985b). The Wasserstein distance and approximation theorems. Z. Wahrsch. verw. Geb. 66, 117-129.
Ryzin, J. van (1973). A histogram method of density estimation. Commun. Statist. 2, 493-506.
Sen, P.K. (1968). Asymptotic normality of sample quantiles for m-dependent processes. Ann. Math. Statist. 39, 1724-1730.
Sen, P.K. (1972). On the Bahadur representation of sample quantiles for sequences of φ-mixing random variables. J. Multivariate Anal. 2, 77-95.
Sendler, W. (1975). A note on the proof of the zero-one law of Blum and Pathak. Ann. Probab. 3, 1055-1058.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.
Shaked, M. and Tong, Y.L. (1984). Stochastic ordering of spacings from dependent random variables. In: Inequalities in Statistics and Probability. IMS Lecture Notes 5, 141-149.
Shorack, G.R. and Wellner, J.A. (1986). Empirical Processes with Applications to Statistics. New York: Wiley.
Sibuya, M. (1960). Bivariate extreme statistics. Ann. Inst. Stat. Math. 19, 195-210.
Siddiqui, M.M. (1960). Distribution of quantiles in samples from a bivariate population. J. Res. Nat. Bureau Standards 64, Ser. B, 124-150.
Singh, K. (1979). Representation of quantile processes with non-uniform bounds. Sankhya, Ser. A, 41, 271-277.
Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9, 1187-1195.
Smid, B. and Stam, A.J. (1975). Convergence in distribution of quotients of order statistics. Stoch. Proc. Appl. 3, 287-292.
Smirnov, N.V. (1935). Über die Verteilung des allgemeinen Gliedes in der Variationsreihe. Metron 12, 59-81.
Smirnov, N.V. (1944). Approximation of distribution laws of random variables by empirical data. Uspechi Mat. Nauk 10, 179-206 (in Russian).
Smirnov, N.V. (1949). Limit distributions for the term of a variational series. Trudy Mat. Inst. Steklov 25, 1-60 (in Russian). English translation in: Amer. Math. Soc. Transl. (1) 11 (1952), 82-143.
Smirnov, N.V. (1967). Some remarks on limit laws for order statistics. Theory Probab. Appl. 12, 337-339.
Smith, R.L. (1982). Uniform rates of convergence in extreme value theory. Adv. Appl. Probab. 14, 600-622.
Smith, R.L. (1984). Threshold methods for sample extremes. In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 621-638. Dordrecht: Reidel.
Smith, R.L. (1985a). Maximum likelihood estimation in a class of nonregular cases. Biometrika 72, 67-92.
Smith, R.L. (1985b). Statistics of extreme values. Proc. 45th Session of the ISI, Vol. 4 (Amsterdam), 26.1.
Smith, R.L. (1986). Extreme value theory based on the r largest annual events. J. Hydrology 86, 27-43.
Smith, R.L. (1987). Estimating tails of probability distributions. Ann. Statist. 15, 1174-1207.
Smith, R.L. and Weissman, I. (1987). Large deviations of tail estimators based on the Pareto approximation. J. Appl. Probab. 24, 619-630.
Smith, R.L., Tawn, J.A. and Yuen, H.K. (1987). Statistics of multivariate extremes. Preprint, University of Surrey.
Sneyers, R. (1984). Extremes in meteorology. In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 235-252. Dordrecht: Reidel.
Stigler, S.M. (1973). Studies in the history of probability and statistics, XXXII. Biometrika 60, 439-445.
Strasser, H. (1985). Mathematical Theory of Statistics. De Gruyter Studies in Math. 7. Berlin: De Gruyter.
Stute, W. (1982). The oscillation behaviour of empirical processes. Ann. Probab. 10, 86-107.
Sukhatme, P.V. (1937). Tests of significance for samples of the χ² population with two degrees of freedom. Ann. Eugenics 8, 52-56.
Sweeting, T.J. (1985). On domains of uniform local attraction in extreme value theory. Ann. Probab. 13, 196-205.
Teugels, J.L. (1981). Limit theorems on order statistics. Ann. Probab. 9, 868-880.
Thompson, W.R. (1936). On confidence ranges for the median and other expectation distributions for populations of unknown distribution form. Ann. Math. Statist. 7, 122-128.
Tiago de Oliveira, J. (1958). Extremal distributions. Rev. Fac. Cienc. Univ. Lisboa A 7, 215-227.
Tiago de Oliveira, J. (1961). The asymptotic independence of the sample means and the extremes. Rev. Fac. Cienc. Univ. Lisboa A 8, 299-310.
Tiago de Oliveira, J. (1963). Decision results for the parameters of the extreme value (Gumbel) distribution based on the mean and standard deviation. Trabajos de Estadistica 14, 61-81.
Tiago de Oliveira, J. (1984). Bivariate models for extremes; statistical decisions. In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 131-153. Dordrecht: Reidel.
Tippett, L.H.C. (1925). On the extreme individuals and the range of samples taken from a normal population. Biometrika 17, 364-387.
Torgersen, E.N. (1976). Comparison of statistical experiments. Scand. J. Statist. 3, 186-208.
Tusnády, G. (1974). On testing density functions. Period. Math. Hungar. 5, 161-169.
Umbach, D. (1981). A note on the median of a distribution. Ann. Inst. Statist. Math. 33, Ser. A, 135-140.
Uzgören, N.T. (1954). The asymptotic development of the distribution of the extreme values of a sample. In: Studies in Mathematics and Mechanics, Presented to Richard von Mises, pp. 346-353. New York: Academic Press.
Vaart, H.P. van der (1961). A simple derivation of the limiting distribution function of a sample quantile with increasing sample size. Statist. Neerlandica 15, 239-242.
Walsh, J.E. (1969). Asymptotic independence between largest and smallest of a set of independent observations. Ann. Inst. Statist. Math. 21, 287-289.
Walsh, J.E. (1970). Sample sizes for approximate independence of largest and smallest order statistic. J. Amer. Statist. Assoc. 65, 860-863.
Watson, G. and Leadbetter, M. (1964a). Hazard analysis I. Biometrika 51, 175-184.
Watson, G. and Leadbetter, M. (1964b). Hazard analysis II. Sankhya, Ser. A, 26, 101-116.
Watts, V., Rootzén, H. and Leadbetter, M.R. (1982). On limiting distributions of intermediate order statistics from stationary sequences. Ann. Probab. 10, 653-662.
Weinstein, S.B. (1973). Theory and applications of some classical and generalized asymptotic distributions of extreme values. IEEE Trans. Inf. Theory 19, 148-154.
Weiss, L. (1959). The limiting joint distribution of the largest and smallest sample spacings. Ann. Math. Statist. 30, 590-593.
Weiss, L. (1964). On the asymptotic joint normality of quantiles from a multivariate distribution. J. Res. Nat. Bureau Standards 68, Ser. B, 65-66.
Weiss, L. (1965). On asymptotic sampling theory for distributions approaching the uniform distribution. Z. Wahrsch. verw. Gebiete 4, 217-221.
Weiss, L. (1969a). The joint asymptotic distribution of the k smallest sample spacings. J. Appl. Probab. 6, 442-448.
Weiss, L. (1969b). The asymptotic joint distribution of an increasing number of sample quantiles. Ann. Inst. Statist. Math. 21, 257-263.
Weiss, L. (1969c). Asymptotic distributions of quantiles in some nonstandard cases. In: Nonparametric Techniques in Statistical Inference, Ed. M.L. Puri, pp. 343-348. Cambridge: Cambridge Univ. Press.
Weiss, L. (1971). Asymptotic inference about a density function at an end of its range. Nav. Res. Logist. Quart. 18, 111-114.
Weiss, L. (1973). Statistical procedures based on a gradually increasing number of order
statistics. Commun. Statist. 2, 95114.
Weiss, L. (1974). The asymptotic sufficiency of a relatively small number of order
statistics in test of fit. Ann. Statist. 2, 795802.
Weiss, L. (1976). The normal approximations to the multinomial with an increasing
number of classes. Nav. Res. Logist. Quart. 23, 139149.
Weiss, L. (1977). Asymptotic properties of Bayes tests of nonparametric hypothesis.
In: Statistical Decision Theory and Related Topics, II, Eds. D.S. Moore and S.S.
Gupta, pp. 439450. New York: Academic Press.
Weiss, L. (1978). The error in the normal approximation to the multinomial with an
increasing number of classes. Nav. Res. Logist. Quart. 25,257261.
Weiss, L. (1979a). The asymptotic distribution of order statistics. Nav. Res. Logist.
Quart. 26,437445.
Weiss, L. (1979b). Asymptotic sufficiency in a class of nonregular cases. Selecta Statistica Canadiana V, 141150.
Weiss, L. (1980). The asymptotic sufficiency of sparse order statistics in test of fit with
nuisance parameters. Nav. Res. Logist. Quart. 27, 397406.
Weiss, L. (1982). Asymptotic joint normality of an increasing number of multivariate order statistics and associated cell frequencies. Nav. Res. Logist. Quart. 29,
7596.
Weissman, I. (1975). Multivariate extremal processes generated by independent nonidentically distributed random variables. J. Appl. Probab. 12,477487.
Weissman, I. (1978). Estimation of parameters and large quantiles based on the k largest observations. J. Amer. Statist. Assoc. 73, 812–815.
Wellner, J.A. (1977). A law of the iterated logarithm for functions of order statistics. Ann. Statist. 5, 481–494.
Wilks, S.S. (1948). Order statistics. Bull. Amer. Math. Soc. 54, 6–50.
Wilks, S.S. (1962). Mathematical Statistics. New York: Wiley.
Winter, B.B. (1973). Strong uniform consistency of integrals of density estimators. Canad. J. Statist. 1, 247–253.
Witting, H. (1985). Mathematische Statistik I (Parametrische Verfahren bei festem Stichprobenumfang). Stuttgart: Teubner.
Witting, H. and Nölle, G. (1970). Angewandte Mathematische Statistik. Stuttgart: Teubner.
Wu, C.Y. (1966). The types of limit distributions for some terms of variational series. Sci. Sinica 15, 749–762.
Yamato, H. (1973). Uniform convergence of an estimator of a distribution function. Bull. Math. Statist. 15, 69–78.
Yang, S.S. (1985). A smooth nonparametric estimator of a quantile function. J. Amer. Statist. Assoc. 80, 1004–1011.
Zolotarev, V.M. and Rachev, S.T. (1985). Rate of convergence in limit theorems for the max scheme. In: Stability Problems for Stochastic Models, Eds. V.V. Kalashnikov and V.M. Zolotarev, pp. 415–442. Lecture Notes in Mathematics 1155. Berlin: Springer.
Zwet, W.R. van (1964). Convex Transformations of Random Variables. Amsterdam: Math. Centre Tracts 7.
Zwet, W.R. van (1984). A Berry–Esseen bound for symmetric statistics. Z. Wahrsch. verw. Gebiete 66, 425–440.
Author Index
A
Alam, K., 48
Ali, M.M., 238
Anderson, C.W., 202, 203
Arnold, B.C., 63
B
Bahadur, R.R., 216, 228
Bain, L.J., 204
Balkema, A.A., 146, 148
Barndorff-Nielsen, O., 198
Barnett, V., 63, 66, 67, 257
Becker, A., 63
Beran, J., 228
Berman, S.M., 238
Bernoulli, N., 62
Bhattacharya, R.N., 69, 149, 231
Bickel, P.J., 61, 228, 271
Bloch, D.A., 271
Blum, J.R., 104
Boos, D.D., 291
Bortkiewicz, L. von, 62
Brown, B.M., 271
Brozius, H., 82
C
Chernoff, H., 210
Chibisov, D.M., 195, 202
Chow, Y.S., 102
Cohen, J.P., 202
Consul, P.C., 63
Cooil, B., 202
Craig, A.T., 63
Cramér, H., 2, 148
Criticou, D., 82
Csiszár, I., 104
Csörgő, M., 150, 227
Csörgő, S., 227, 285
D
David, F.N., 205, 226
David, H.A., 63
Davis, C.E., 271
Davis, R.A., 202
Deheuvels, P., 205, 285
Dodd, E.L., 62
Dronkers, J.J., 203
Du Mouchel, W., 291
Dwass, M., 149
Dziubdziela, W., 68
E
Eddy, W.F., 82
Efron, B., 220
Egorov, V.A., 149
Englund, G., 149
Erdélyi, A., 190
F
Falk, M., 102, 122, 149, 159, 164, 185, 186, 187, 195, 199, 202, 203, 224, 264, 265, 271, 291, 317
Feldman, D., 148
Feller, W., 227
Ferguson, T.S., 103
Finkelstein, B.V., 238
Fisher, R.A., 62, 172, 201, 202
Floret, K., 71
Fréchet, M., 63
Freedman, D.A., 228
G
Galambos, J., 23, 37, 43, 63, 76, 80, 82, 162, 180, 201, 202, 228, 235, 238
Gale, J.D., 82
Galvani, L., 81
Gastwirth, J.L., 210, 271
Gather, U., 63
Geffroy, J., 238
Gini, C., 81
Gnedenko, B., 155, 201
Goldie, C.M., 202, 204
Gomes, M.I., 286, 290
Ghosh, J.K., 149
Ghosh, M., 48
Gross, A.J., 204
Guilbaud, O., 36
Gumbel, E.J., 62, 63, 149, 257
H
Haan, L. de, 63, 82, 146, 148, 155, 195,
201, 202, 291
Hájek, J., 58, 61
Haldane, J.B.S., 203
Hall, P., 68, 81, 202, 203, 285, 291
Hall, W.J., 202
Harrel, F.E., 271
Harter, H.L., 62
Has'minskii, R.Z., 274, 298
Häusler, E., 291
Hecker, H., 210
Heidelberger, P., 291
Helmers, R., 209, 211, 216, 227
Herbach, L., 290
Hewitt, E., 21, 57
Heyer, H., 294
Hill, B.M., 284, 290
Hillion, A., 104
Hodges, J.L. Jr., 227, 263, 271
Hoeffding, W., 104
Horváth, L., 227
Hosking, J.R.M., 259
Huang, J.S., 48
Hüsler, J., 233, 236, 239, 290, 291
I
Ibragimov, I.A., 103, 274, 298
Iglehart, D.L., 148
Ikeda, S., 2, 104, 149, 150, 203
Isogai, T., 81
Ivchenko, G.I., 150
J
Jacod, J., 204
Jammalamadaka, S.R., 228
Janssen, A., 227, 296, 317
Jayakar, S.G., 203
Joag-Dev, K., 238
Johns, M.V., 210
Johnson, N.L., 63, 209, 226, 277, 289,
290
K
Kabanov, Yu., 204
Karr, A.F., 204
Kendall, M.G., 150, 226
Kiefer, J., 148, 218, 228
Kinnison, R.R., 63
Klenk, A., 228
Köhne, W., 71, 149, 200
Kolchin, V.F., 150
Kotz, S., 63, 277, 289, 290
Kuan, K.S., 238
Kuo, M., 228
L
Lamperti, J., 149
Landers, D., 148
Laplace, P.S. de, 62, 148
Lawless, J.F., 204
Leadbetter, M.R., 23, 63, 149, 202, 271
Le Cam, L., 317
Lehmann, E.L., 227, 263, 271, 315
Lewis, P.A.W., 291
Lewis, T., 63
Lindgren, G., 23, 63, 149
Lipster, R.S., 204
Loève, M., 83
M
Magnus, W., 190
Malmquist, S., 37
Mammitzsch, V., 271
Mann, N.R., 204, 290
Marshall, A.W., 238
Mason, D.M., 227, 285
Matsunawa, T., 149, 150, 203
Michel, R., 290
Miebach, B., 290
Mises, R. von, 62, 201, 202
Miyamoto, Y., 228
Montfort, M.A.J. van, 257, 290
Mood, A., 238
Moore, D.S., 81
Mosteller, F., 150, 211
N
Nadaraya, E.A., 271
Nagaraja, H.N., 63
Nelson, W., 204
Nevzorov, V.B., 149
Niinimaa, A., 81
Nölle, G., 315
Nonaka, Y., 150
Nowak, W., 150
O
Oberhettinger, F., 190
Oja, H., 81
Olkin, I., 238
O'Reilly, F.J., 82
P
Pantcheva, E.I., 204
Parzen, E., 271
Pathak, P.K., 104
Pearson, K., 63
Pfanzagl, J., 141, 146, 149, 215, 247, 248, 268, 270, 274
Pfeifer, D., 205
Pickands, J., 43, 76, 80, 177, 202, 227, 239, 291
Pitman, E.J.G., 102, 290
Plackett, R.L., 66, 67
Polfeldt, T., 227, 290
Prakasa Rao, B.L.S., 270, 271
Puri, M.L., 149
Pyke, R., 228
Q
Quesenberry, C.P., 82
R
Rachev, S.T., 202
Radtke, M., 176, 199, 200, 204
Ralescu, S.S., 149
Ramachandran, G., 227
Rao, R.R., 69, 142, 231
Raoult, J.P., 82
Reiss, R.D., 68, 102, 103, 104, 122, 124, 128, 138, 147, 149, 150, 175, 196, 200, 202, 203, 224, 226, 233, 239, 262, 268, 270, 271, 286, 290, 291, 296, 317
Rényi, A., 36, 63, 148
Resnick, S.I., 63, 76, 202, 204, 227,
228, 235, 238, 291
Révész, P., 150
Rice, J., 271
Rogge, L., 148
Rootzén, H., 23, 63, 146, 149, 202, 203
Rosenblatt, M., 82, 271
Rosengard, A., 149
Rossberg, H.J., 43, 149
Rüschendorf, L., 63, 82
Ryzin, J. van, 271
S
Schafer, R.E., 204
Schüpbach, M., 290
Sen, P.K., 148, 228
Sendler, W., 104
Serfling, R.J., 104, 150, 227, 289
Shiryaev, A.N., 204
Shorack, G.R., 150
Sibuya, M., 236, 238
Šidák, Z., 58, 61
Siddiqui, M.M., 237, 238, 271
Singh, K., 224, 228
Singpurwalla, N.D., 204
Smid, B., 149
Smirnov, N.V., 145, 148, 150, 271
Smith, R.L., 63, 202, 203, 204, 239, 286, 290, 291
Sneyers, R., 257
Stam, A.J., 149
Stigler, S.M., 148
Strasser, H., 317
Stromberg, K., 21, 57
Stuart, A., 226
Stute, W., 219, 228, 271
Sukhatme, P.V., 36
Sweeting, T.J., 159, 202, 203
T
Tawn, J.A., 239
Teicher, H., 102
Terzakis, D., 82
Teugels, J.L., 227, 291
Thompson, W.R., 63
Tiago de Oliveira, J., 61, 149, 238, 239, 289, 291
Tippett, L.H.C., 62, 172, 201, 202
Torgersen, E.N., 317
Tricomi, F.G., 150
Tucker, H.G., 148
Tusnády, G., 271
U
Umbach, D., 148
Uzgoren, N.T., 203
V
Vaart, H.P. van der, 148
W
Wald, A., 273
Walsh, J.E., 148, 149
Watson, G., 271
Watts, V., 202
Weinstein, S.B., 204
Weiss, L., 2, 39, 103, 149, 150, 203, 238, 290, 296, 311, 313, 317
Weissman, I., 291
Weller, M., 317
Wellner, J.A., 86, 104, 150, 202
Welsh, A.H., 285, 291
Wilks, S.S., 57, 59, 63
Winter, B.B., 271
Witting, H., 274, 315
Wolfowitz, J., 104
Wu, C.Y., 195, 202
Y
Yackel, J.W., 81
Yamato, H., 271
Yang, S.S., 271
Yuen, H.K., 239
Z
Zahedi, H., 63
Zolotarev, V.M., 202
Zwet, W.R. van, 227
Subject Index
[Abbr.: o.s. = order statistic]
A
ADO (software package), 7
Annual maxima method, see Subsample
method
Associated r.v.'s, 238
Asymptotic distribution of
central o.s.'s, 145–146; see also
Asymptotic normality
extreme o.s.'s
k largest o.s.'s, 177–179
kth largest o.s.'s, 161–163
maxima, see (univariate, multivariate) Extreme value d.f.
minima, 24, 162
intermediate o.s.'s, 164, 195; see also
Asymptotic normality
Asymptotic independence of
groups of o.s.'s, 75, 121–123, 297
marginal maxima, 234–237
ratios of o.s.'s, 149
spacings, 201
Asymptotic normality of
central o.s.'s, multivariate, 229–232
central o.s.'s, univariate
strong, 22, 110–114, 131–142
weak, 108–110, 129
intermediate o.s.'s
strong, 164
weak, 109
kernel estimator, 263–264
linear combination of o.s.'s, 209–211, 215–216, 227
multinomial distributions, 150
B
Bahadur approximation, 216–220
Bandwidth, 249
Beta
function, 22
r.v., 22
Bonferroni inequality, 79, 102,
233
Bootstrap
distribution
of linear combination of o.s.'s,
228
of sample quantile, 222226
smooth, of sample quantile, 265–268
error process, 224, 267
Borel set, 8
Brownian
bridge, 150
motion, 224
C
Cauchy distribution, 49, 199
Central limit theorem
Lindeberg–Lévy–Feller, 210
multidimensional, 231
Central o.s., see (central) Sequence
χ²
distance, see Distance
distribution
central, 313
noncentral, 315
Comparison of models, 275–276, 292–299, 317
Componentwise ordering, see (multivariate) O.s.'s
Concomitant, 66
Conditional
density, 52
distribution, 51
of exceedances, 54–55, 61, 78
of i.i.d. random variables given the
o.s., 60
of o.s., see (univariate) O.s.'s
of rank statistic given the o.s., 60–61
independence under Markov property,
53, 61
Confidence procedure
bootstrap, 225–226
for quantile, 247
Convex hull of data, 66, 81–82
D
Data
temperature (De Bilt), 257–260
Venice sea-level, 286
Deficiency
ε-, of models, 295, 299
of estimators, 263
~monotone, 77
Density quantile function, 243, 253
Dependence function, 80
Pickands estimator of, 80; see also
Kernel estimator
D.f., see Distribution function
Dirichlet distribution, 59
Distance
χ²-, 98–102, 328–330
between induced probability measures, 102
between product measures, 100
Hellinger, 98102, 328
between induced probability measures, 101
between product measures, 100
Kolmogorov–Smirnov, 2
Kullback–Leibler, 98–100, 328
between product measures, 100
L1-, 94, 326
variational, 94, 326
between induced probability measures, 101
between product measures, 97–98, 327, 328–330
Distribution function (d.f.)
continuity criterion, 16
degenerate, 14, 76
endpoints of, 8
multivariate, 77–78
weak convergence, 2, 194–195
Domain of attraction, see (univariate) Extreme value d.f.
Dvoretzky–Kiefer–Wolfowitz inequality, 104
E
Edgeworth expansion, 91, 140
inverse of, 141
Efficiency, 273–275, 279, 283, 284
second order, see Deficiency, of estimators
Estimator
Bayes, 274
equivariant under translations, 284
kernel, see Kernel estimator
maximum likelihood, 259–260, 277–279, 298
minimum distance, 259
nearest neighbor density, 81
orthogonal series, 269–270, 315
Pitman, 274, 298
quick, of location and scale parameters, 212–213, 289
randomized, 274; see also Sample,
median; Sample, qquantile
of shape parameter, 277–279, 281–283, 284–286
of tail index, 279–281, 283–284, 284–286
Exceedances, see also (truncated, empirical) Point process
multivariate, 67–68, 81
univariate, 54, 190–193
Expansion of finite length, 90
of d.f.'s, 93
of distributions of
central o.s.'s, several, 131–135
central o.s.'s, single, 114–121, 138–140, 147–148; see also Gram–Charlier series
convex combination of o.s.'s, 213–215
k largest o.s.'s, 182, 184
kth largest o.s.'s, 184
maxima, 172–176
of moments of o.s.'s, 207–208
of normal distributions, 90–91, 102
of probability measures, 91–93
of quantiles of o.s.'s, 208–209, 226
Expected loss, see Risk
Exponential
d.f., 13, 42; see also Generalized Pareto d.f.
model, 282–283
Exponential bound theorem for
i.i.d. random variables, 83–84
kernel estimator, 262
o.s.'s, 84–86, 144–145
sample d.f., 218–219
sample q.f., 87–89
Extreme o.s.'s, see (extreme) Sequence,
maximum, and minimum
Extreme value d.f., multivariate, 75–77
max-stability of, 77
Pickands representation of, 76–77,
80
Extreme value d.f., univariate, 23, 24;
see also Fréchet, Gumbel, and
Weibull d.f.
density of, 152
domain of attraction of, 24, 154–156,
157, 180, 194
max-stability of, 23
Extreme value model, see also Fréchet,
Weibull, and Gumbel model
extended, 286
of Poisson processes, 194
3-parameter, see von Mises parametrization
F
Finite expansion, see Expansion of finite
length
Fisher information, 282
matrix, 276277
Fisher–Tippett asymptote, see (univariate)
Extreme value d.f.
type I, see Gumbel d.f.
type II, see Fréchet d.f.
type III, see Weibull d.f.
Fourier expansion, 269, 315
Fréchet
d.f., 23; see also Extreme value d.f.
illustrations, 26, 153
mode of, 153
model, 276, 279
multivariate, model, 282
semiparametric, type model, 279–280, 283–284
G
Galton difference problem, 63
Gamma
function, 22
r.v., 39–40, 59
moments of, 181–182
Generalized Pareto
density, 157
illustrations, 196–198
d.f., 42
characterization of, 37, 43, 185
type I, see Pareto d.f.
type II, 42, 196
type III, see Exponential d.f.
Gram–Charlier series, 226
Gumbel
d.f., 23; see also Extreme value d.f.
illustrations, 25, 26
method, see Subsample method
model, 276–279
H
Hellinger distance, see Distance
Hill estimator, 284–285
Homogeneous Poisson process, see Point
process
I
Independent not necessarily identically
distributed (i.n.n.i.d.) r.v.'s
distribution of the o.s. of, 36
maximum of, 21
Informative, more, 294
Intensity measure, see Point process
Inverse, generalized, 318–320; see also
Q.f.
J
Jenkinson parametrization, see von Mises
parametrization
Jensen inequality, 103
K
Kernel
Epanechnikov, 253
method, 251–252
Kernel estimator of
density, 253, 269
illustrations, 258–259
density quantile function, 253, 260–262
dependence function, 239
d.f., 252–253, 262–264
inverse of: illustrations, 254–255
hazard function, 271
q.f., 252, 260–262, 264–265, 286–289
illustrations, 254–255, 288
Kolmogorov–Smirnov
distance, see Distance
test, see Test
L
Leadbetter's conditions, 202
Lebesgue's differentiation theorem, 71
L-statistic, see (linear combination of)
O.s.'s
M
Malmquist's result, 37–38
Marginal ordering, see Multivariate o.s.'s
Markov
kernel, 34, 293
distribution of, 34, 50, 293
property
conditional independence under, 61
of o.s.'s, 54
Maximum (also: sample maximum)
multivariate, 65
density of, 69
d.f. of, 68
univariate, 12, 21
density of, 22
dependence of, and minimum, see
Asymptotic independence
d.f. of, 20
with random index, 198, 280–281
Maximum likelihood, see Estimator
Max-stability, see Extreme value d.f.
Mean value function, see Point process
Median
multivariate, 66
univariate, 49
Minimax criterion, 274
Minimum (also: sample minimum)
multivariate, 65
density of, 69
d.f. of, 69
univariate, 12
density of, 22
d.f. of, 20
Mises, von
parametrization
of extreme value d.f.'s, 24–26
of generalized Pareto d.f.'s, 197–198
of Poisson processes, 194
type conditions, 159–160, 199–200
Moderate deviation, see Exponential
bound theorem
Moving scheme, 249–250
illustration, 250
N
Newton–Raphson iteration, 259, 278
Normal
approximation, see Asymptotic
normality
comparison lemma, 149
distributions
expansion of, see Expansion of finite
length
moments of, 130–131
multivariate, 129–130, 146
univariate, 13
model, multivariate, 310–315
Normalization of maxima, 23, 156, 161,
200
nonlinear, 204
of normal r.v.'s, 160–161
O
Ordered distance r.v., 68
Ordering, total I\J, 66–68
Order statistics (o.s.'s), multivariate, 65
density of, 71, 73–74
d.f. of, 69–70, 229–232, 232–237
I\J, see Ordering
Order statistics (o.s.'s), univariate, 12
of binomial r.v.'s, 141–142
central, see (central) Sequence
conditional distribution of, given
o.s.'s, 52–54
convex combination of, 55–56
density of single, 21, 33
d.f. of, 20, 57
mode of, 49
unimodality of, 48–49
of discrete r.v.'s, 35–36, 139–142
extreme, see (extreme) Sequence
independence of, from underlying d.f.,
123–128
intermediate, see (intermediate) Sequence
joint density of
absolutely continuous case, 27–28, 30–32
continuous case, 33
discontinuous case, 35–36, 58
linear combination of, 56, 209–216,
227
local limit theorem for, 142–144
Markov property of, 54
moments of
exact, 44–45, 59–60
inequalities for, 45–47, 86–87
positive dependence of, 61
with random sample size, 149
ratios of
of generalized Pareto r.v.'s, 43
of uniform r.v.'s, 37–38, 58
representation of
of exponential r.v.'s, 37
of uniform r.v.'s, 38–42, 59
sparse, 28
of stationary normal sequence,
146
Outlier, 62, 63
P
Pareto d.f., 42, 196, 289; see also Generalized Pareto d.f.
illustrations, 196, 198
Partial maxima process, 74–75
Penultimate distribution, 172
Pickands estimator, see Dependence
function
Point process
empirical, 190–194
intensity measure of, 193
mean value function of, 193
Poisson, 190–194
homogeneous, 191–192
truncated, 190–194
Poisson
approximation of
binomial r.v., 162, 190
empirical point process, 190–194, 204–205
process, 281; see also Point process
Polygon, 248–249
illustration, 250
Probability integral transformation
multivariate, 81
of o.s.'s, 18
univariate, 14, 17, 34
Probability paper, 6, 257
Q
Q.f., see Quantile function
q-quantile, 13
Quantile
function (q.f.), 13, 19
continuity criterion, 320
estimation of, 286–289
parametric estimation of, 256; see
also Kernel estimator
weak convergence, 19
process, 150, 264
smooth, 264
transformation
multivariate, 81
of o.s.'s, 15, 17–18, 76
univariate, 14, 17
Quasiquantile, 250–251
R
Ranking, see Ordering
Rank statistic, 55, 60
Regression, linear, 314
Risk, 273
Bayes, 274
S
Sample
d.f., 13, 59
oscillation of, 218–219
maximum, see Maximum
median, multivariate, 66
median, univariate, 14
randomized, 50, 60, 246
minimum, see Minimum
q.f., 13
illustrations, 250, 254–255,
288
maximum deviation of, 87–88
oscillation of, 88–89, 261
smooth, see Kernel estimator
q-quantile, 14, 247–248
randomized, 247, 268
Scheffé lemma, 95–97, 325
Sequence
of lower or upper extremes,
12
of o.s.'s
central, extreme or intermediate,
12
Skewness of extreme value density, 25–26
Smoothing technique, see Kernel method
Spacings, 29, 36–37, 147, 201, 212, 227–228
Strong convergence of unimodal probability measures, 103
Subsample method, 165, 176, 185
Sufficiency, 294–295; see also Deficiency
approximate, 295
Blackwell, 293–295
Sukhatme's result, 36
Sum of extremes, 227
Survivor function, 69, 79, 234
Sweeting's result, 159
Systematic statistic, 211213
T
Tail
equivalence of
densities, 157–159
d.f.'s, 156
index, 204, 280–286
Test
χ², 313
Kolmogorov–Smirnov, 313
of quantiles, 244–246, 268–269
Threshold
nonrandom, 191, 193
random, 55
Transformation
of models, see Comparison of models
technique, see Quantile, transformation; Probability integral transformation
theorem for densities, 29, 57
Trimmed mean, 68, 211, 251
Truncation
of d.f., 52, 57, 194
of point process, 191, 193
U
Unbiased estimation
expectation, 273
median, 50–51, 247, 248, 268, 274
Unimodal density, 48
mode of, 48
strongly, 48
V
Variational distance, see Distance
W
Weibull
d.f., 23, 199; see also Extreme value
d.f.
illustrations, 26, 27, 154, 258–259
mode of, 154
model, 317