
Springer Series in Statistics

Advisors:

J. Berger, S. Fienberg, J. Gani, K. Krickeberg, B. Singer

Springer Series in Statistics


Andrews/Herzberg: Data: A Collection of Problems from Many Fields for the Student and Research Worker.
Anscombe: Computing in Statistical Science through APL.
Berger: Statistical Decision Theory and Bayesian Analysis, 2nd edition.
Brémaud: Point Processes and Queues: Martingale Dynamics.
Brockwell/Davis: Time Series: Theory and Methods.
Daley/Vere-Jones: An Introduction to the Theory of Point Processes.
Dzhaparidze: Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series.
Farrell: Multivariate Calculation.
Goodman/Kruskal: Measures of Association for Cross Classifications.
Hartigan: Bayes Theory.
Heyer: Theory of Statistical Experiments.
Jolliffe: Principal Component Analysis.
Kres: Statistical Tables for Multivariate Analysis.
Leadbetter/Lindgren/Rootzén: Extremes and Related Properties of Random Sequences and Processes.
Le Cam: Asymptotic Methods in Statistical Decision Theory.
Manoukian: Modern Concepts and Theorems of Mathematical Statistics.
Miller, Jr.: Simultaneous Statistical Inference, 2nd edition.
Mosteller/Wallace: Applied Bayesian and Classical Inference: The Case of The Federalist Papers.
Pollard: Convergence of Stochastic Processes.
Pratt/Gibbons: Concepts of Nonparametric Theory.
Read/Cressie: Goodness-of-Fit Statistics for Discrete Multivariate Data.
Reiss: Approximate Distributions of Order Statistics: With Applications to Nonparametric Statistics.
Sachs: Applied Statistics: A Handbook of Techniques, 2nd edition.
Seneta: Non-Negative Matrices and Markov Chains.
Siegmund: Sequential Analysis: Tests and Confidence Intervals.
Vapnik: Estimation of Dependences Based on Empirical Data.
Wolter: Introduction to Variance Estimation.
Yaglom: Correlation Theory of Stationary and Related Random Functions I: Basic Results.
Yaglom: Correlation Theory of Stationary and Related Random Functions II: Supplementary Notes and References.

R.-D. Reiss

Approximate Distributions
of Order Statistics
With Applications to Nonparametric
Statistics

With 30 Illustrations

Springer-Verlag
New York Berlin Heidelberg
London Paris Tokyo

R.-D. Reiss
Universität Gesamthochschule Siegen
Fachbereich 6, Mathematik
D-5900 Siegen
Federal Republic of Germany

Mathematics Subject Classification (1980): 62-07, 62B15, 62E20, 62G05, 62G10, 62G30
Library of Congress Cataloging-in-Publication Data
Reiss, Rolf-Dieter.
Approximate distributions of order statistics.
(Springer series in statistics)
Bibliography: p.
Includes indexes.
1. Order statistics. 2. Asymptotic distribution
(Probability theory) 3. Nonparametric statistics.
I. Title. II. Series.
QA278.7.R45 1989
519.5
88-24844
Printed on acid-free paper.

© 1989 by Springer-Verlag New York Inc.


Softcover reprint of the hardcover 1st edition 1989
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer-Verlag, 175 Fifth Avenue, New York, NY 10010,
USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection
with any form of information storage and retrieval, electronic adaptation, computer software, or
by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc. in this publication, even if
the former are not especially identified, is not to be taken as a sign that such names, as understood
by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
Typeset by Asco Trade Typesetting Ltd., Hong Kong.

ISBN-13:978-1-4613-9622-2
e-ISBN-13:978-1-4613-9620-8
DOI: 10.1007/978-1-4613-9620-8

To Margit, Maximilian, Cornelia, and Thomas

Preface

This book is designed as a unified and mathematically rigorous treatment of some recent developments of the asymptotic distribution theory of order statistics (including the extreme order statistics) that are relevant for statistical theory and its applications. Particular emphasis is placed on results concerning the accuracy of limit theorems, on higher order approximations, and on other approximations in quite a general sense.
In contrast to the classical limit theorems, which primarily concern the weak convergence of distribution functions, our main results will be formulated in terms of the variational and the Hellinger distance. These results will form the proper springboard for the investigation of parametric approximations of nonparametric models of joint distributions of order statistics. The approximating models include normal as well as extreme value models. Several applications will show the usefulness of this approach.
Other recent developments in statistics like nonparametric curve estimation and the bootstrap method will be studied as far as order statistics are concerned. In connection with this, graphical methods will, to some extent, be explored.
The prerequisite for handling the indicated problems is a profound knowledge of distributional properties of order statistics. Thus, we collect several basic tools (of finite and asymptotic nature) that are either scattered in the literature or are not elaborated to the extent required by our present purposes. For example, the Markov property of order statistics is studied in detail. This part of the book, which has the character of a textbook, is supplemented by several well-known results.
The book is intended for students and research workers in probability and statistics, and practitioners involved in applications of mathematical results concerning order statistics and extremes. Knowledge of standard calculus and of topics that are taught in introductory probability and statistics courses is necessary for understanding this book. To reinforce previous knowledge as well as to fill gaps, we shall frequently give a short exposition of probabilistic and statistical concepts (e.g., that of conditional distribution and approximate sufficiency).
The results are often formulated for distributions themselves (and not only
for distribution functions) and so we need, as far as order statistics are
concerned, the notion of Borel sets in a Euclidean space. Intervals, open sets,
and closed sets are special Borel sets. Large parts of this book can be understood without prior knowledge of technical details of measure-theoretic
nature.
My research work on order statistics started at the University of Cologne, where, influenced by J. Pfanzagl, I became familiar with expansions and statistical problems. Lecture notes of a course on order statistics held at the University of Freiburg during the academic year 1976/77 can be regarded as an early forerunner of the book.
I would like to thank my students B. Dohmann, G. Heer, and E. Kaufmann
for their programming assistance. G. Heer also skillfully read through larger
parts of the manuscript. It gives me great pleasure to acknowledge the cooperation, documented by several articles, with my colleague M. Falk. The excellent atmosphere within the small statistical research group at the University of Siegen, including A. Janssen and F. Marohn, facilitated the writing of this book. Finally, I would like to thank W. Stute, and those not mentioned individually, for their comments.
Siegen, FR Germany

Rolf-Dieter Reiss

Contents

Preface

CHAPTER 0

Introduction

0.1. Weak and Strong Convergence
0.2. Approximations
0.3. The Role of Order Statistics in Nonparametric Statistics
0.4. Central and Extreme Order Statistics
0.5. The Restriction to Independent and Identically Distributed Random Variables
0.6. Graphical Methods
0.7. A Guide to the Contents
0.8. Notation and Conventions
PART I

Exact Distributions and Basic Tools

CHAPTER 1

Distribution Functions, Densities, and Representations

1.1. Introduction to Basic Concepts
1.2. The Quantile Transformation
1.3. Single Order Statistics, Extremes
1.4. Joint Distribution of Several Order Statistics
1.5. Extensions to Continuous and Discontinuous Distribution Functions
1.6. Spacings, Representations, Generalized Pareto Distribution Functions
1.7. Moments, Modes, and Medians
1.8. Conditional Distributions of Order Statistics
P.1. Problems and Supplements
Bibliographical Notes

CHAPTER 2

Multivariate Order Statistics

2.1. Introduction
2.2. Distribution Functions and Densities
P.2. Problems and Supplements
Bibliographical Notes

CHAPTER 3

Inequalities and the Concept of Expansions

3.1. Inequalities for Distributions of Order Statistics
3.2. Expansions of Finite Length
3.3. Distances of Measures: Convergence and Inequalities
P.3. Problems and Supplements
Bibliographical Notes

PART II

Asymptotic Theory

CHAPTER 4

Approximations to Distributions of Central Order Statistics

4.1. Asymptotic Normality of Central Sequences
4.2. Expansions: A Single Central Order Statistic
4.3. Asymptotic Independence from the Underlying Distribution Function
4.4. The Approximate Multivariate Normal Distribution
4.5. Asymptotic Normality and Expansions of Joint Distributions
4.6. Expansions of Distribution Functions of Order Statistics
4.7. Local Limit Theorems and Moderate Deviations
P.4. Problems and Supplements
Bibliographical Notes

CHAPTER 5

Approximations to Distributions of Extremes

5.1. Asymptotic Distributions of Extreme Sequences
5.2. Hellinger Distance between Exact and Approximate Distributions of Sample Maxima
5.3. The Structure of Asymptotic Joint Distributions of Extremes
5.4. Expansions of Distributions of Extremes of Generalized Pareto Random Variables
5.5. Variational Distance between Exact and Approximate Joint Distributions of Extremes
5.6. Variational Distance between Empirical and Poisson Processes
P.5. Problems and Supplements
Bibliographical Notes

CHAPTER 6

Other Important Approximations

6.1. Approximations of Moments and Quantiles
6.2. Functions of Order Statistics
6.3. Bahadur Approximation
6.4. Bootstrap Distribution Function of a Quantile
P.6. Problems and Supplements
Bibliographical Notes

CHAPTER 7

Approximations in the Multivariate Case

7.1. Asymptotic Normality of Central Order Statistics
7.2. Multivariate Extremes
P.7. Problems and Supplements
Bibliographical Notes

PART III

Statistical Models and Procedures

CHAPTER 8

Evaluating the Quantile and Density Quantile Function

8.1. Sample Quantiles
8.2. Kernel Type Estimators of Quantiles
8.3. Asymptotic Performance of Quantile Estimators
8.4. Bootstrap via Smooth Sample Quantile Function
P.8. Problems and Supplements
Bibliographical Notes

CHAPTER 9

Extreme Value Models

9.1. Some Basic Concepts of Statistical Theory
9.2. Efficient Estimation in Extreme Value Models
9.3. Semiparametric Models for Sample Maxima
9.4. Parametric Models Belonging to Upper Extremes
9.5. Inference Based on Upper Extremes
9.6. Comparison of Different Approaches
9.7. Estimating the Quantile Function Near the Endpoints
P.9. Problems and Supplements
Bibliographical Notes

CHAPTER 10

Approximate Sufficiency of Sparse Order Statistics

10.1. Comparison of Statistical Models via Markov Kernels
10.2. Approximate Sufficiency over a Neighborhood of a Fixed Distribution
10.3. Approximate Sufficiency over a Neighborhood of a Family of Distributions
10.4. Local Comparison of a Nonparametric Model and a Normal Model
P.10. Problems and Supplements
Bibliographical Notes

Appendix 1. The Generalized Inverse
Appendix 2. Two Technical Lemmas on Expansions
Appendix 3. Further Results on Distances of Measures

Bibliography
Author Index
Subject Index

CHAPTER 0

Introduction

Let us start with a detailed outline of the intentions and of certain characteristics of this book.

0.1. Weak and Strong Convergence


For good reasons the concept of weak convergence of random variables (in short, r.v.'s) $\xi_n$ plays a preeminent role in the literature. Whenever the distribution functions (in short, d.f.'s) $F_n$ of the r.v.'s $\xi_n$ are not necessarily continuous then, in general, only weak convergence holds, that is,

$$F_n(t) \to F_0(t), \qquad n \to \infty, \tag{1}$$

at every point of continuity $t$ of $F_0$. If $F_0$ is continuous then it is well known that the convergence in (1) holds uniformly in $t$. This may be written in terms of the Kolmogorov-Smirnov distance as

$$\sup_t |F_n(t) - F_0(t)| \to 0, \qquad n \to \infty.$$

In the sequel let us assume that $F_0$ is continuous. It follows from (1) that

$$|P\{\xi_n \in I\} - P\{\xi_0 \in I\}| \to 0, \qquad n \to \infty, \tag{2}$$

uniformly over all intervals $I$. In general, (2) does not hold for every Borel set $I$. However, if the d.f.'s $F_n$ have densities, say, $f_n$ such that $f_n(t) \to f_0(t)$, $n \to \infty$, for almost all $t$, then it is well known that (2) is valid w.r.t. the variational distance, that is,

$$\sup_B |P\{\xi_n \in B\} - P\{\xi_0 \in B\}| \to 0, \qquad n \to \infty, \tag{3}$$

where the sup is taken over all Borel sets $B$.
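To make the gap between (1) and (3) concrete, here is a minimal Python sketch that computes both the Kolmogorov-Smirnov distance and the variational distance for the hypothetical sequence $\xi_n \sim N(0, 1 + 1/n)$ converging to $\xi_0 \sim N(0, 1)$. It relies on Scheffé's identity $\sup_B |P(B) - Q(B)| = \frac{1}{2}\int |p - q|$ for distributions with densities $p$ and $q$, evaluated by the trapezoidal rule on a truncated grid. The example distributions, the grid and the function names are our own assumptions, not taken from the text.

```python
import math

def normal_pdf(x, sigma=1.0):
    """Density of N(0, sigma^2)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, sigma=1.0):
    """Distribution function of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def ks_and_variational(sigma, lo=-12.0, hi=12.0, steps=24000):
    """Approximate (KS distance, variational distance) between N(0, sigma^2) and N(0, 1)."""
    h = (hi - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    # Kolmogorov-Smirnov distance: sup over half-lines (-inf, t], evaluated on the grid
    ks = max(abs(normal_cdf(t, sigma) - normal_cdf(t)) for t in grid)
    # Variational distance via Scheffe's identity: half the L1 distance of the densities
    diffs = [abs(normal_pdf(x, sigma) - normal_pdf(x)) for x in grid]
    l1 = h * (sum(diffs) - 0.5 * (diffs[0] + diffs[-1]))  # trapezoidal rule
    return ks, 0.5 * l1

for n in (1, 10, 100):
    ks, tv = ks_and_variational(math.sqrt(1.0 + 1.0 / n))
    print(f"n={n:4d}  KS={ks:.5f}  variational={tv:.5f}")
```

Since half-lines are Borel sets, the variational distance always dominates the Kolmogorov-Smirnov distance; in this symmetric example it comes out at about twice the KS value.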


Next, the remarks above will be specialized to order statistics. It is well known that central order statistics $X_{r(n):n}$ of a sample of size $n$ are asymptotically normally distributed under weak conditions on the underlying d.f. $F$. In terms of weak convergence this may be written

$$P\{a_n^{-1}(X_{r(n):n} - b_n) \le t\} \to P\{\xi_0 \le t\}, \qquad n \to \infty, \tag{4}$$

for every $t$, with $\xi_0$ denoting a standard normal r.v. and $a_n$, $b_n$ normalizing constants. The two classical methods of proving (4) are
(a) an application of the central limit theorem to binomial r.v.'s,
(b) a direct proof of the pointwise convergence of the corresponding densities (e.g., H. Cramér (1946)).
However, it is clear that (b) yields the convergence in a stronger sense, namely, w.r.t. the variational distance. We have

$$\sup_B |P\{a_n^{-1}(X_{r(n):n} - b_n) \in B\} - P\{\xi_0 \in B\}| \to 0, \qquad n \to \infty, \tag{5}$$

where the sup is taken over all Borel sets $B$. A more systematic study of the strong convergence of distributions of order statistics was initiated by L. Weiss (1959, 1969a) and S. Ikeda (1963). These results particularly concern the joint asymptotic normality of an increasing number of order statistics.
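For uniformly distributed observations the density in (5) is explicit: the $r$th order statistic $U_{r:n}$ of $n$ i.i.d. uniform r.v.'s on $(0, 1)$ has the beta density $\frac{n!}{(r-1)!(n-r)!}\, u^{r-1}(1-u)^{n-r}$. The Python sketch below uses this to evaluate the variational distance in (5) numerically for uniform $F$, under the illustrative assumptions $r(n) = [nq] + 1$, $b_n = q$ and $a_n = (q(1-q)/n)^{1/2}$; these normalizing constants, the grid and the function names are hypothetical choices made for the example.

```python
import math

def uniform_order_stat_logpdf(u, r, n):
    """Log density of the rth order statistic of n i.i.d. uniform(0, 1) r.v.'s (beta(r, n - r + 1))."""
    return (math.lgamma(n + 1) - math.lgamma(r) - math.lgamma(n - r + 1)
            + (r - 1) * math.log(u) + (n - r) * math.log1p(-u))

def variational_to_normal(n, q=0.5, lo=-12.0, hi=12.0, steps=24000):
    """Approximate sup_B |P{(U_{r:n} - b_n)/a_n in B} - P{xi_0 in B}| by half the L1 distance."""
    r = math.floor(n * q) + 1          # hypothetical rank sequence r(n) = [nq] + 1
    a_n, b_n = math.sqrt(q * (1.0 - q) / n), q
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        u = b_n + a_n * x
        # density of the normalized order statistic (zero outside the support (0, 1))
        f = a_n * math.exp(uniform_order_stat_logpdf(u, r, n)) if 0.0 < u < 1.0 else 0.0
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoidal rule weights
        total += weight * abs(f - phi)
    return 0.5 * h * total

for n in (10, 100, 1000):
    print(n, round(variational_to_normal(n), 4))
```

The printed distances decrease as $n$ grows, in line with (5).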
The convergence of densities of central order statistics was originally studied for technical reasons; these densities are of a simpler analytical form than the corresponding d.f.'s. On the other hand, when treating weak convergence of extreme order statistics it is natural to work directly with d.f.'s. To highlight the foregoing remark the reader is reminded of the fact that $F^n$ is the d.f. of the largest order statistic (maximum) $X_{n:n}$ of $n$ independent and identically distributed r.v.'s with common d.f. $F$.
The by now classical theory of extreme order statistics provides necessary and sufficient conditions for a d.f. $F$ to belong to the domain of attraction of a nondegenerate d.f. $G$; that is, the weak convergence

$$F^n(b_n + a_n t) \to G(t), \qquad n \to \infty, \tag{6}$$

holds for some choice of constants $a_n > 0$ and reals $b_n$. If $F$ has a density then one can make use of the celebrated von Mises conditions to verify (6). These conditions are also necessary for (6) under further mild conditions imposed on $F$. In particular, the d.f.'s treated in statistical textbooks satisfy one of the von Mises conditions. Moreover, it turns out that convergence w.r.t. the variational distance holds. This may be written

$$\sup_B |P\{a_n^{-1}(X_{n:n} - b_n) \in B\} - G(B)| \to 0, \qquad n \to \infty, \tag{7}$$

where the sup is taken over all Borel sets $B$. Note that the symbol $G$ is also used for the probability measure corresponding to the d.f. $G$. Evidently, (7) implies (6).
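A classical case in which (7) can be checked numerically is the standard exponential d.f. $F(x) = 1 - e^{-x}$, for which $F^n(x + \log n) = (1 - e^{-x}/n)^n \to G(x) = \exp(-e^{-x})$ (the Gumbel d.f.) with $a_n = 1$ and $b_n = \log n$. The normalized maximum $X_{n:n} - \log n$ then has the density $e^{-x}(1 - e^{-x}/n)^{n-1}$ for $x > -\log n$. The Python sketch below, whose example d.f., grid and function names are our own choices, evaluates the variational distance in (7) as half the $L_1$ distance of the two densities.

```python
import math

def max_density(x, n):
    """Density of X_{n:n} - log n for n i.i.d. standard exponential r.v.'s."""
    y = math.exp(-x)
    if y >= n:                      # x <= -log n lies outside the support
        return 0.0
    return y * (1.0 - y / n) ** (n - 1)

def gumbel_density(x):
    """Density of the Gumbel d.f. G(x) = exp(-exp(-x))."""
    y = math.exp(-x)
    return y * math.exp(-y)

def variational_max_gumbel(n, lo=-10.0, hi=30.0, steps=40000):
    """Half the L1 distance between the two densities (trapezoidal rule on [lo, hi])."""
    h = (hi - lo) / steps
    diffs = [abs(max_density(lo + i * h, n) - gumbel_density(lo + i * h))
             for i in range(steps + 1)]
    return 0.5 * h * (sum(diffs) - 0.5 * (diffs[0] + diffs[-1]))

for n in (10, 100, 1000):
    print(n, variational_max_gumbel(n))
```

In this example the distance shrinks roughly in proportion to $1/n$.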
The relation (7) can be generalized to the joint distribution of upper extremes $X_{n-k+1:n}, X_{n-k+2:n}, \dots, X_{n:n}$, where $k = k(n)$ is allowed to increase to infinity as the sample size $n$ increases.
We want to give some arguments why our emphasis is on the variational and Hellinger distances instead of the Kolmogorov-Smirnov distance:
(a) There are mathematical reasons, namely, to formulate the results as strongly as possible. One can add that the problems involved are very challenging.
(b) Results in terms of d.f.'s look awkward if the dimension increases with the sample size. Of course, the alternative is a formulation in terms of stochastic processes.
(c) It is necessary to use the variational distance (and, as an auxiliary tool, the Hellinger distance) in connection with model approximation. In other words, certain problems cannot be solved in a different way.

0.2. Approximations
The joint distributions of order statistics can be described explicitly by analytical expressions involving the underlying d.f. $F$ and density $f$. However, in most cases it is extremely cumbersome to compute the exact numerical values of probabilities concerning order statistics or to find the analytical form of d.f.'s of functions of order statistics. Hence, it is desirable to find approximate distributions. In view of practical and theoretical applications these approximations should be of a simple form.
The classical approach of finding approximate distributions is given by the asymptotic theory for sequences of order statistics $X_{r(n):n}$ with the sample size $n$ tending to infinity:
(a) If $r(n) \to \infty$ and $n - r(n) \to \infty$ as $n \to \infty$, then the order statistics are asymptotically normal under mild regularity conditions imposed on $F$.
(b) If $r(n) = k$ or $r(n) = n - k + 1$ for every $n$ with $k$ being fixed, then the order statistics are asymptotically distributed according to an extreme value distribution (which differs from the normal distribution).
In the intermediate cases, that is, $r(n) \to \infty$ and $r(n)/n \to 0$, or $n - r(n) \to \infty$ and $(n - r(n))/n \to 0$ as $n \to \infty$, one can either use the normal approximation or an approximation by means of a sequence of extreme value distributions. Thus, the problem of computing an estimate of the remainder term enters the scene; sharp estimates will make the different approximations comparable.
In the case of maxima of normal r.v.'s we shall see that a certain sequence of extreme value distributions provides a better approximation than the limit distribution.

Better insight into the problem of computing accurate approximations is obtained when higher order approximations are available. There is a trade-off between the two requirements that the higher order approximation should be of a simple form and should also perform better than the limiting distribution.
In particular, we shall study finite expansions of length $m + 1$, which may be written

$$Q + \sum_{i=1}^{m} \nu_{i,n},$$

where $Q$ is the limiting distribution and the $\nu_{i,n}$ are signed measures depending on the sample size $n$. A prominent example is provided by Edgeworth expansions. Usually, the signed measures have polynomials $h_{i,n}$ as densities w.r.t. $Q$. If $Q$ has a density $g$ then the expansion may be written

$$Q(B) + \sum_{i=1}^{m} \nu_{i,n}(B) = \int_B \Big(1 + \sum_{i=1}^{m} h_{i,n}(x)\Big)\, g(x)\, dx \tag{8}$$

for every Borel set $B$. Specializing (8) to $B = (-\infty, t]$, one gets approximations to d.f.'s of order statistics.
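As an illustration of an expansion of length two specialized to half-lines, consider the hypothetical example of $n$ i.i.d. standard exponential r.v.'s with $a_n = 1$ and $b_n = \log n$. Writing $y = e^{-x}$, one has $n \log(1 - y/n) = -y - y^2/(2n) + O(n^{-2})$ for fixed $x$, so that

$$F^n(x + \log n) = G(x)\Big(1 - \frac{e^{-2x}}{2n}\Big) + O(n^{-2}), \qquad G(x) = \exp(-e^{-x}).$$

The Python sketch below compares, on a grid, the maximal error of the limiting d.f. $G$ with that of this expansion; the example, the grid and all names are our own assumptions.

```python
import math

def exact_df(x, n):
    """P{X_{n:n} - log n <= x} = (1 - e^{-x}/n)^n for standard exponential r.v.'s."""
    y = math.exp(-x)
    return 0.0 if y >= n else (1.0 - y / n) ** n

def gumbel_df(x):
    """Limiting Gumbel d.f. G(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

def expansion_df(x, n):
    """Limit plus first-order correction term: G(x) * (1 - e^{-2x} / (2n))."""
    return gumbel_df(x) * (1.0 - math.exp(-2.0 * x) / (2.0 * n))

def max_errors(n, lo=-5.0, hi=20.0, steps=5000):
    """Maximal absolute errors of the limit and of the expansion on an equidistant grid."""
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    err_limit = max(abs(exact_df(x, n) - gumbel_df(x)) for x in grid)
    err_expansion = max(abs(exact_df(x, n) - expansion_df(x, n)) for x in grid)
    return err_limit, err_expansion

for n in (10, 100, 1000):
    print(n, max_errors(n))
```

The error of the limiting d.f. decreases like $1/n$, whereas the error of the expansion decreases like $1/n^2$.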
The bound of the remainder term of an approximation will involve
(a) unknown universal constants, and
(b) some known terms which specify the dependence on the underlying d.f.
and the index of the order statistic.
Since the universal constants are not explicitly stated, our considerations
belong to the realm of asymptotics.
The bounds give a clear picture of how the remainder terms depend on the underlying distribution. Much emphasis is placed on numerical examples showing that the asymptotic results are relevant for small and moderate sample sizes.

0.3. The Role of Order Statistics in Nonparametric Statistics

The sample d.f. $F_n$ is the natural, nonparametric estimator of the unknown d.f. $F$, and, likewise, the sample quantile function (in short, sample q.f.) $F_n^{-1}$ may be regarded as a natural estimator of the unknown q.f. $F^{-1}$. For any functional $T(F^{-1})$ of $F^{-1}$ a plausible choice of an estimator will be $T(F_n^{-1})$ if no further information is given about the underlying model.
Note that $T(F_n^{-1})$ can be expressed as $t(X_{1:n}, \dots, X_{n:n})$ since $F_n^{-1}(q) = X_{r(q):n}$, where $r(q) = nq$ if $nq$ is an integer and $r(q) = [nq] + 1$ otherwise.
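The identity $F_n^{-1}(q) = X_{r(q):n}$ can be checked directly. The Python sketch below assumes the usual generalized-inverse definition $F_n^{-1}(q) = \inf\{x : F_n(x) \ge q\}$ for $0 < q \le 1$ (not restated at this point of the text) and compares it with the order statistic of rank $r(q) = \lceil nq \rceil$. The level $q$ is passed as a `Fraction` so that $nq$ is computed exactly; the function names and data are hypothetical.

```python
import math
from fractions import Fraction

def sample_df(data, x):
    """Sample d.f. F_n(x) = (number of observations <= x) / n, as an exact fraction."""
    return Fraction(sum(1 for d in data if d <= x), len(data))

def sample_qf_by_definition(data, q):
    """F_n^{-1}(q) = inf{x : F_n(x) >= q}; the infimum is attained at an observation."""
    return min(x for x in data if sample_df(data, x) >= q)

def sample_qf_by_order_statistic(data, q):
    """X_{r(q):n} with r(q) = nq if nq is an integer and [nq] + 1 otherwise."""
    n = len(data)
    r = math.ceil(n * q)            # exact because q is a Fraction
    return sorted(data)[r - 1]      # rth order statistic (ranks are 1-based)

data = [2.3, -0.7, 1.1, 4.0, 0.5, 1.1, 3.2, -1.5, 0.9, 2.8]
for q in (Fraction(1, 10), Fraction(3, 10), Fraction(1, 2), Fraction(37, 100), Fraction(1)):
    print(q, sample_qf_by_definition(data, q), sample_qf_by_order_statistic(data, q))
```

Using floating-point levels such as `0.1` instead of `Fraction(1, 10)` could shift $\lceil nq \rceil$ by one because of rounding, which is why exact fractions are used here.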
In many nonparametric problems one is only concerned with the local behavior of the q.f. $F^{-1}$, so that it suffices to base a statistic on a small set of order statistics like upper extremes

$$X_{n-k+1:n} \le \dots \le X_{n:n}$$

or certain central order statistics

$$X_{[nq]:n} \le \dots \le X_{[np]:n},$$

where $0 < q < p < 1$.

Thus, one is interested in the distribution of functions of order statistics of the form $T(X_{r:n}, X_{r+1:n}, \dots, X_{s:n})$, where $1 \le r \le s \le n$. This problem can be studied for a particular statistic $T$ or within a certain class of statistics $T$ like linear combinations of order statistics.
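A simple member of the class of linear combinations of order statistics is the $k$-fold trimmed mean, $T = \sum_{i=1}^{n} c_i X_{i:n}$ with $c_i = 1/(n - 2k)$ for $k < i \le n - k$ and $c_i = 0$ otherwise. The Python sketch below is a hypothetical illustration of this representation; the weights, data and names are our own choices.

```python
def linear_combination(data, weights):
    """T = sum_i c_i X_{i:n} for given weights c_1, ..., c_n."""
    if len(weights) != len(data):
        raise ValueError("need one weight per order statistic")
    return sum(c * x for c, x in zip(weights, sorted(data)))

def trimmed_mean_weights(n, k):
    """Weights of the k-fold trimmed mean: equal weight on X_{k+1:n}, ..., X_{n-k:n}."""
    if not 0 <= 2 * k < n:
        raise ValueError("require 0 <= 2k < n")
    return [1.0 / (n - 2 * k) if k < i <= n - k else 0.0 for i in range(1, n + 1)]

data = [2.3, -0.7, 1.1, 4.0, 0.5, 1.1, 3.2, -1.5, 0.9, 25.0]
print(linear_combination(data, trimmed_mean_weights(len(data), 2)))
```

Choosing $k = 0$ recovers the sample mean, while with $k = 2$ the outlying value 25.0 receives weight zero.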
If the type of the statistic $T$ is not fixed in advance, one can simplify the stochastic analysis by establishing an approximation of the joint distribution of order statistics. Upper extremes $X_{n:n}, \dots, X_{n-k+1:n}$ may be replaced by r.v.'s $Y_1, \dots, Y_k$ that are jointly distributed according to a multivariate extreme value distribution so that the error term

$$\sup_B |P\{(X_{n:n}, \dots, X_{n-k+1:n}) \in B\} - P\{(Y_1, \dots, Y_k) \in B\}| =: \delta(F) \tag{9}$$

is sufficiently small. Relation (9) implies that for any statistic $T$

$$\sup_B |P\{T(X_{n:n}, \dots, X_{n-k+1:n}) \in B\} - P\{T(Y_1, \dots, Y_k) \in B\}| \le \delta(F), \tag{10}$$

and hence statistical problems concerning upper extremes can approximately be solved within the parametric extreme value model. These arguments also hold for lower extremes.
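Inequality (10) holds because the event $\{T(X_{n:n}, \dots, X_{n-k+1:n}) \in B\}$ is the event that the vector of upper extremes lies in $T^{-1}(B)$, so the supremum in (10) runs over a subfamily of the sets in (9). The Python sketch below checks this on a finite sample space, where the variational distance equals half the $\ell_1$ distance of the probability mass functions. The two distributions and the statistic $T$ are arbitrary hypothetical examples.

```python
from collections import defaultdict

def variational(p, q):
    """sup_B |P(B) - Q(B)| for pmfs on a finite set, computed as half the l1 distance."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in support)

def image(p, T):
    """Distribution of T under P (push-forward of the pmf p through the map T)."""
    out = defaultdict(float)
    for z, mass in p.items():
        out[T(z)] += mass
    return dict(out)

# Two hypothetical distributions of a pair (x1, x2) with x1 >= x2, e.g. the two largest values
P = {(3, 1): 0.2, (3, 2): 0.3, (2, 1): 0.5}
Q = {(3, 1): 0.4, (3, 2): 0.1, (2, 1): 0.4, (2, 2): 0.1}

def T(z):
    return z[0]   # an arbitrary statistic: the larger of the two values

print(variational(P, Q), variational(image(P, T), image(Q, T)))
```

Here the distance drops from 0.3 for the pair to 0 for the statistic, and it can never increase under such a mapping.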
A similar, yet slightly more complicated, operation is needed in the case of central order statistics. Now the joint distribution of order statistics is replaced by a multivariate normal distribution. To return from the normal model to the original model one needs a fixed Markov kernel, which will be constructed by means of a conditional distribution of order statistics.

0.4. Central and Extreme Order Statistics


There are good reasons for a separate treatment of extreme order statistics and central order statistics; one can, e.g., argue that the asymptotic distributions of extreme order statistics are different from those of central order statistics.
However, as already mentioned above, intermediate order statistics can be regarded as central order statistics as well as extremes, so that a clear distinction between the two classes of order statistics is not possible. Statistical extreme value theory is concerned with the evaluation of parameters of the tail of a distribution, like the upper and lower endpoints. In many situations the asymptotically efficient estimator will depend on intermediate order statistics and will itself be asymptotically normal. Thus, from a certain conservative point of view, statistical extreme value theory does not belong to extreme value theory.


On the other hand, some knowledge of stochastic properties of extreme order statistics is needed to examine certain aspects of the behavior of central order statistics. To highlight this point we note that spacings $X_{r:n} - X_{r-1:n}$ of exponential r.v.'s have the same distribution as sample minima (namely, of $n - r + 1$ exponential r.v.'s). Another example is provided by the conditional distribution of the order statistic $X_{r:n}$ given $X_{r+1:n} = x$, which is given by the distribution of a sample maximum.
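The distributional identity for exponential spacings can be checked by simulation. For $n$ i.i.d. standard exponential r.v.'s, the spacing $X_{r:n} - X_{r-1:n}$ (with $X_{0:n} = 0$) is exponential with mean $1/(n - r + 1)$, which is the distribution of the minimum of $n - r + 1$ i.i.d. standard exponential r.v.'s. The Monte Carlo sketch below, with hypothetical choices of $n$, $r$, seed and number of replications, compares both empirical means with $1/(n - r + 1)$.

```python
import random

def spacing_sample(n, r, rng):
    """One draw of X_{r:n} - X_{r-1:n} for n i.i.d. standard exponential r.v.'s (X_{0:n} = 0)."""
    xs = sorted(rng.expovariate(1.0) for _ in range(n))
    return xs[r - 1] - (xs[r - 2] if r >= 2 else 0.0)

def minimum_sample(m, rng):
    """One draw of the minimum of m i.i.d. standard exponential r.v.'s."""
    return min(rng.expovariate(1.0) for _ in range(m))

rng = random.Random(12345)
n, r, reps = 10, 4, 20000
spacings = [spacing_sample(n, r, rng) for _ in range(reps)]
minima = [minimum_sample(n - r + 1, rng) for _ in range(reps)]
print(sum(spacings) / reps, sum(minima) / reps, 1.0 / (n - r + 1))
```

Both empirical means should be close to $1/7 \approx 0.143$ for these choices; only Monte Carlo error separates them.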

0.5. The Restriction to Independent and Identically Distributed Random Variables

The classical theory of extreme values deals with the weak convergence of distributions of maxima of independent and identically distributed r.v.'s. The extension of these classical results to dependent sequences was one of the celebrated achievements of the last decades. This extension was necessary to justify the applicability of classical results to many natural phenomena.
A similar development can be observed in the literature concerning the distributional properties of central order statistics; however, these results are more sporadic than systematic. In this book we shall indicate some extensions of the classical results to dependent sequences, but our attention will primarily be focused upon strengthening classical results by obtaining convergence in a stronger sense and deriving higher order approximations. Our results may also be of interest for problems which concern dependent r.v.'s like
(a) testing problems where under the null hypothesis the r.v.'s are assumed to be independent, and
(b) cases where results for dependent random variables are formulated via a comparison with the corresponding results for independent r.v.'s.

0.6. Graphical Methods


Despite the preference for mathematical results, the author strongly believes in the usefulness of graphical methods. I have developed a very enthusiastic attitude toward graphical methods, but only when the methods are controlled by a mathematical background.
The traditional method of visually discriminating between distributions is the use of probability papers. This method is highly successful since the eye can easily recognize whether a curve deviates from a straight line. Perhaps the disadvantages are
(a) that one can no longer see the original form of the "theoretical" d.f., and
(b) that small oscillations of the density (thus, also of probabilities) are difficult to detect by the approach via d.f.'s.


Alternatively, one may use densities, which play a key role in our methodology. As far as visual aspects are concerned, the maximum deviation of densities is more relevant than the $L_1$-distance (which is equivalent to the variational distance of distributions).
The problem that discrete d.f.'s (like sample d.f.'s) have no densities can be overcome by using smoothing techniques like histograms or kernel density estimates. Thus the data points can be visualized by densities. The q.f. is another useful diagnostic tool to study the tails of the distribution.
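A kernel density estimate turns a sample into a smooth density that can be plotted. The Python sketch below uses a Gaussian kernel and a fixed, hand-chosen bandwidth $h$; both are illustrative assumptions, since the text fixes neither a kernel nor a bandwidth rule, and the data are hypothetical.

```python
import math

def gaussian_kde(data, h):
    """Return x -> (1/(n h)) * sum_i phi((x - X_i) / h) with the standard normal kernel phi."""
    n = len(data)
    c = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    def density(x):
        return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return density

data = [2.3, -0.7, 1.1, 4.0, 0.5, 1.1, 3.2, -1.5, 0.9, 2.8]
f_hat = gaussian_kde(data, h=0.6)
for x in (-2.0, 0.0, 1.0, 2.0, 4.0):
    print(x, round(f_hat(x), 4))
```

The estimate is a genuine probability density (nonnegative, integrating to one), so it can be compared visually with a hypothesized density.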
The graphical illustrations in the book were produced by means of the
interactive statistical software package ADO.

0.7. A Guide to the Contents


This volume is organized in three parts, each of which is divided into chapters
where univariate and multivariate order statistics are studied. The treatment
of univariate order statistics is separated completely from the multivariate
case.
The chapters start, as a warm-up, with an elementary treatment of the topic or with an outline of the basic ideas and concepts. In order not to overload the sections with too many details, some of the results are shifted to the Problems and Supplements. The Supplements also include important theorems which are not central to this book. Historical remarks and discussions of further results in the literature are collected in the Bibliographical Notes.
Given the choice between different proofs, we prefer the one which can also be made applicable within the asymptotic set-up. For example, our way of establishing the joint density of several order statistics is also applicable to derive the joint asymptotic normality of several central order statistics.
Part I lays out the basic notions and tools. In Chapter 1 we explain in detail
the transformation technique, compute the densities of order statistics and
study the structure of order statistics as far as representations and conditional
distributions are concerned.
Chapter 2 is devoted to the multivariate case. We discuss the problem of defining order statistics in higher dimensions and study some basic properties in the special case where order statistics are defined componentwise.
Chapter 3 contains some simple inequalities for distributions of order
statistics. Moreover, concepts and auxiliary tools are developed which are
needed in Part II for the construction of approximate distributions of order
statistics.
Part II provides the basic approximations of distributions of order statistics. Chapters 4 and 5 are concerned with the asymptotic normality of central order statistics and the asymptotic distributions of extreme order statistics. Both chapters start with an introduction to asymptotic theory; in a second step the accuracy of approximation is investigated. Some asymptotic properties of functionals of order statistics, the Bahadur statistic and the bootstrap method are treated in Chapter 6. Certain aspects of asymptotic theory of order statistics in the multivariate case are studied in Chapter 7.
Our own interests heavily influence the selection of statistical problems in
Part III, and we believe the topics are of sufficient importance to be generally
interesting.
In Chapter 8 we study the problem of estimating the q.f. and related problems within the nonparametric framework. Comparisons of semiparametric models of actual distributions with extreme value and normal models are made in Chapters 9 and 10. The applicability of these comparisons is illustrated by several examples.

0.8. Notation and Conventions


Given some random variables (in short: r.v.'s) $\xi_1, \dots, \xi_n$ defined on a probability space $(\Omega, \mathcal{A}, P)$ we write:

$X_{i:n}$: $i$th order statistic of $\xi_1, \dots, \xi_n$,
$U_{i:n}$: $i$th order statistic of $n$ independent and identically distributed (i.i.d.) r.v.'s with uniform distribution on $(0, 1)$,
$F^{-1}$: quantile function (q.f.) corresponding to the distribution function (d.f.) $F$,
$\alpha(F) = \inf\{x : F(x) > 0\}$: "left endpoint of d.f. $F$,"
$\omega(F) = \sup\{x : F(x) < 1\}$: "right endpoint of d.f. $F$,"
$1_B$: indicator function of a set $B$; thus $1_B(x) = 1$ if $x \in B$ and $1_B(x) = 0$ if $x \notin B$,
$X \overset{d}{=} Y$: equality of r.v.'s in distribution,
w.p.1: with probability one.

We shall say, in short, density instead of Lebesgue density. In other cases, the dominating measure is stated explicitly. The family of all Borel sets is the smallest σ-field generated by intervals. When writing $\sup_B$ without any comment, it is understood that the sup ranges over all Borel sets of the respective Euclidean space. Given a d.f. $F$ we will also use this symbol for the corresponding probability measure. Frequently, we shall use the notation $TP$ for the distribution of $T$.
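The symbols $F^{-1}$, $U_{i:n}$ and $X_{i:n}$ are linked by the quantile transformation (treated in Chapter 1): if $\xi_1, \dots, \xi_n$ are i.i.d. with d.f. $F$, then $(X_{1:n}, \dots, X_{n:n}) \overset{d}{=} (F^{-1}(U_{1:n}), \dots, F^{-1}(U_{n:n}))$, since $F^{-1}$ is nondecreasing. The Python sketch below checks this by simulation for the hypothetical choice of the standard exponential d.f. $F(x) = 1 - e^{-x}$, for which $F^{-1}(q) = -\log(1 - q)$ and $E X_{r:n} = \sum_{j=n-r+1}^{n} 1/j$; the sample size, rank, seed and names are our own assumptions.

```python
import math
import random

def exp_quantile(q):
    """Quantile function of the standard exponential d.f. F(x) = 1 - exp(-x)."""
    return -math.log1p(-q)

def order_stat_direct(n, r, rng):
    """X_{r:n} from n i.i.d. standard exponential r.v.'s sampled directly."""
    return sorted(rng.expovariate(1.0) for _ in range(n))[r - 1]

def order_stat_via_uniforms(n, r, rng):
    """F^{-1}(U_{r:n}): apply the quantile function to the rth uniform order statistic."""
    return exp_quantile(sorted(rng.random() for _ in range(n))[r - 1])

rng = random.Random(2024)
n, r, reps = 8, 6, 20000
direct = sum(order_stat_direct(n, r, rng) for _ in range(reps)) / reps
via_u = sum(order_stat_via_uniforms(n, r, rng) for _ in range(reps)) / reps
exact = sum(1.0 / j for j in range(n - r + 1, n + 1))
print(direct, via_u, exact)
```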

PART I

EXACT DISTRIBUTIONS
AND BASIC TOOLS

CHAPTER 1

Distribution Functions, Densities, and Representations

After an introduction to the basic notation and to elementary but important techniques concerning the distribution of order statistics, we derive, in Section 1.3, the d.f. and density of a single order statistic. From this result and from the well-known fact that the spacings of exponential r.v.'s are independent (the proof is given in Section 1.6) we deduce the joint density of several order statistics in Section 1.4.
In Sections 1.3 and 1.4 we shall always assume that the underlying d.f. is absolutely continuous. Section 1.5 will provide extensions to continuous and discontinuous d.f.'s.
In Section 1.6, the independence of spacings of exponential r.v.'s and the independence of ratios of order statistics of uniform r.v.'s are treated in detail. Furthermore, we study the well-known representation of order statistics of uniform r.v.'s by means of exponential r.v.'s. This section includes extensions from the case of uniform r.v.'s to that of generalized Pareto r.v.'s.
In Section 1.7 various results are collected concerning functional parameters of order statistics, like moments, modes, and medians.
Finally, Section 1.8 provides a detailed study of the conditional distribution of one collection of order statistics conditioned on another collection of order statistics. This result, which is related to the Markov property of order statistics, will be one of the basic tools in this book.

1.1. Introduction to Basic Concepts


Order Statistics, Sample Maximum, Sample Minimum
Let $\xi_1, \dots, \xi_n$ be $n$ r.v.'s. If one is interested not in the order of the outcomes of $\xi_1, \dots, \xi_n$ but in their order of magnitude, then one has to examine the ordered sample values

$$X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}, \tag{1.1.1}$$

which are the order statistics of a sample of size $n$.


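We say that $X_{r:n}$ is the $r$th order statistic and the random vector $(X_{1:n},\ldots,X_{n:n})$ is the order statistic. Note that $X_{1:n}$ is the sample minimum and $X_{n:n}$ is the sample maximum. We may write
$$X_{1:n} = \min(\xi_1,\ldots,\xi_n) \qquad (1.1.2)$$
and
$$X_{n:n} = \max(\xi_1,\ldots,\xi_n). \qquad (1.1.3)$$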
When treating a sequence $X_{r(n):n}$ of order statistics, one may distinguish between the following different cases: A central sequence of order statistics is given if $r(n)\to\infty$ and $n-r(n)\to\infty$ as $n\to\infty$. A sequence of lower (upper) extremes is given if $r(n)$ (respectively, $n-r(n)$) is bounded. If $r(n)\to\infty$ and $r(n)/n\to0$, or $n-r(n)\to\infty$ and $(n-r(n))/n\to0$ as $n\to\infty$, then one can also speak of an intermediate sequence.
One should keep in mind that the asymptotic properties of central and extreme sequences are completely different; however, it is one of the aims of this book to show that it can be useful to combine the different results to solve certain problems.
From (1.1.2) and (1.1.3) we see that the minimum $X_{1:n}$ and the maximum $X_{n:n}$ may be written as a composition of the random vector $(\xi_1,\ldots,\xi_n)$ and the functions min and max. Sometimes it will be convenient to extend this notion to the $r$th order statistic. For this purpose define
$$Z_{r:n}(x_1,\ldots,x_n) = z_r \qquad (1.1.4)$$
where $z_1\le\cdots\le z_n$ are the values of the reals $x_1,\ldots,x_n$ arranged in nondecreasing order. Using this notation one may write
$$X_{r:n} = Z_{r:n}(\xi_1,\ldots,\xi_n). \qquad (1.1.5)$$
As special cases we obtain $Z_{1:n} = \min$ and $Z_{n:n} = \max$. Such a representation of order statistics is convenient when order statistics of different samples have to be dealt with simultaneously. Then, given another sequence $\xi_1',\ldots,\xi_n'$ of r.v.'s, we can write $X_{r:n}' = Z_{r:n}(\xi_1',\ldots,\xi_n')$.
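Since $Z_{r:n}$ only sorts its arguments and picks the $r$th entry, it is easy to express in code. The following Python sketch is ours (the name `z_rn` and the Gaussian test data are illustrative choices, not notation from the text); it evaluates (1.1.5) on one simulated sample and confirms the special cases $Z_{1:n} = \min$ and $Z_{n:n} = \max$:

```python
import random


def z_rn(r, xs):
    """Z_{r:n}(x_1, ..., x_n) of (1.1.4): the r-th smallest of the reals xs (1-based r)."""
    n = len(xs)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    return sorted(xs)[r - 1]


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(7)]   # one outcome of xi_1, ..., xi_7
order_statistic = [z_rn(r, sample) for r in range(1, len(sample) + 1)]

# Special cases: Z_{1:n} = min and Z_{n:n} = max
assert order_statistic[0] == min(sample)
assert order_statistic[-1] == max(sample)
```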

Sample Quantile Function, Sample Distribution Function


There is a simple device by which we may derive results for order statistics from corresponding results concerning the frequency of the r.v.'s $\xi_i$. Let $1_{(-\infty,t]}$ denote the indicator function of the interval $(-\infty,t]$; then the frequency of the data $x_i$ in $(-\infty,t]$ may be written $\sum_{i=1}^n 1_{(-\infty,t]}(x_i)$. A moment's reflection shows that
$$z_r\le t\quad\text{iff}\quad\sum_{i=1}^n 1_{(-\infty,t]}(x_i)\ge r \qquad (1.1.6)$$
with $z_1\le\cdots\le z_n$ denoting again the ordered values of $x_1,\ldots,x_n$. From (1.1.6) it is immediate that
$$P\{X_{r:n}\le t\} = P\Big\{\sum_{i=1}^n 1_{(-\infty,t]}(\xi_i)\ge r\Big\} \qquad (1.1.7)$$
and hence,
$$P\{X_{r:n}\le t\} = P\{F_n(t)\ge r/n\} \qquad (1.1.8)$$
with
$$F_n(t) = \frac{1}{n}\sum_{i=1}^n 1_{(-\infty,t]}(\xi_i) \qquad (1.1.9)$$
defining the sample d.f. $F_n$.
Given a sequence of independent and identically distributed (in short, i.i.d.) r.v.'s, the d.f. of an order statistic can easily be derived from (1.1.8) by using binomial probabilities. Keep in mind that (1.1.8) holds for every sequence $\xi_1,\ldots,\xi_n$ of r.v.'s.
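Because (1.1.6) is a purely deterministic statement about real numbers, it can be checked mechanically, even in the presence of ties. The sketch below (the helper names `sample_df` and `check_equivalence` are hypothetical) tests it on random integer-valued data, where ties are frequent, together with the event identity $\{X_{r:n}\le t\} = \{F_n(t)\ge r/n\}$ that underlies (1.1.8):

```python
import random


def sample_df(xs, t):
    """Sample d.f. F_n(t) of (1.1.9): the fraction of observations <= t."""
    return sum(1 for x in xs if x <= t) / len(xs)


def check_equivalence(xs, r, t):
    """True if both sides of (1.1.6) agree: z_r <= t iff #{i : x_i <= t} >= r."""
    z_r = sorted(xs)[r - 1]
    count = sum(1 for x in xs if x <= t)
    return (z_r <= t) == (count >= r)


rng = random.Random(2)
for _ in range(1000):
    n = rng.randint(1, 10)
    xs = [rng.randint(0, 5) for _ in range(n)]   # small integer range, so ties are common
    r = rng.randint(1, n)
    t = rng.choice([-1, 0, 1, 2, 2.5, 3, 4, 5])
    assert check_equivalence(xs, r, t)
    # event form behind (1.1.8): {X_{r:n} <= t} = {F_n(t) >= r/n}
    assert (sorted(xs)[r - 1] <= t) == (sample_df(xs, t) >= r / n)
```

Since the identity holds for every data set, no distributional assumption is needed, which matches the remark that (1.1.8) holds for every sequence of r.v.'s.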
Next, we turn to the basic relation between order statistics and the sample quantile function (in short, sample q.f.) $F_n^{-1}$. For this purpose we introduce the notion of the quantile function (in short, q.f.) of a d.f. $F$. Define
$$F^{-1}(q) = \inf\{t: F(t)\ge q\},\qquad q\in(0,1). \qquad (1.1.10)$$
Notice that the q.f. $F^{-1}$ is a real-valued function. One could also define $F^{-1}(0) := \alpha(F) = \inf\{x: F(x)>0\}$ and $F^{-1}(1) := \omega(F) = \sup\{x: F(x)<1\}$; then, however, $F^{-1}$ is no longer real-valued in general.
In Section 1.2 we shall indicate the possibility of defining a q.f. without referring to a d.f.
$F^{-1}(q)$ is the smallest $q$-quantile of $F$, that is, if $\xi$ is a r.v. with d.f. $F$ then $F^{-1}(q)$ is the smallest value $t$ such that
$$P\{\xi<t\}\le q\le P\{\xi\le t\}. \qquad (1.1.11)$$

The $q$-quantile of $F$ is unique if $F$ is strictly increasing. Moreover, $F^{-1}$ is the inverse of $F$ in the usual sense if $F$ is continuous and strictly increasing.
As an illustration we state three simple examples.
EXAMPLES 1.1.1. (i) Let $\Phi$ denote the standard normal d.f. Then $\Phi^{-1}$ is the usual inverse of $\Phi$.
(ii) The standard exponential d.f. is given by $F(x) = 1-e^{-x}$, $x\ge0$. We have $F^{-1}(q) = -\log(1-q)$, $q\in(0,1)$.
(iii) Let $z_1<z_2<\cdots<z_n$ and $F(t) = n^{-1}\sum_{i=1}^n 1_{(-\infty,t]}(z_i)$. Then,
$$F^{-1}(q) = z_i\quad\text{if } (i-1)/n<q\le i/n,\ i = 1,\ldots,n.$$
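Definition (1.1.10) can be evaluated directly for a step d.f. like the one in Example 1.1.1(iii): the infimum is attained at the first sorted value where the d.f. reaches level $q$. A small sketch, with the hypothetical helper names `step_df_quantile` and `exp_quantile`:

```python
import math


def step_df_quantile(zs, q):
    """F^{-1}(q) = inf{t : F(t) >= q} for F(t) = n^{-1} #{i : z_i <= t}.

    F jumps only at the z_i, so the infimum is the smallest sorted value
    z_(i) with F(z_(i)) >= q, i.e. with i/n >= q.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in (0, 1)")
    zs = sorted(zs)
    n = len(zs)
    for i, z in enumerate(zs, start=1):
        if i / n >= q:
            return z
    return zs[-1]  # not reached in exact arithmetic, since n/n = 1 >= q


def exp_quantile(q):
    """Example 1.1.1(ii): q.f. of the standard exponential d.f."""
    return -math.log(1.0 - q)


zs = [0.3, 1.7, 2.2, 4.0]   # z_1 < ... < z_4, so n = 4
# Example 1.1.1(iii): F^{-1}(q) = z_i whenever (i - 1)/n < q <= i/n
assert step_df_quantile(zs, 0.25) == 0.3
assert step_df_quantile(zs, 0.26) == 1.7
assert step_df_quantile(zs, 1.0 - 1e-9) == 4.0
# Example 1.1.1(ii): here F^{-1} is a genuine inverse, F(F^{-1}(q)) = q
assert abs(1.0 - math.exp(-exp_quantile(0.9)) - 0.9) < 1e-12
```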


From Example 1.1.1(iii), with $n = 1$, we know that if $F$ is a degenerate d.f. with jump at $z = z_1$ then $F^{-1}$ is a constant function with value $z$. Notice that the converse also holds. In this case we have $F(F^{-1}(q)) = 1$ for every $q\in(0,1)$. Thus $F^{-1}$ is not the inverse of $F$ in the usual sense.
If $\xi_1,\ldots,\xi_n$ are r.v.'s with continuous d.f.'s then one can ignore the possibility of ties, which occur with probability zero. Then, according to Example 1.1.1(iii), we obtain for every $q\in(0,1)$:
$$F_n^{-1}(q) = X_{i:n}\quad\text{if } (i-1)/n<q\le i/n,\ i = 1,\ldots,n. \qquad (1.1.12)$$

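Alternatively, we may write
$$F_n^{-1}(q) = \begin{cases}X_{nq:n} & \text{if } nq \text{ is an integer},\\ X_{[nq]+1:n} & \text{otherwise},\end{cases} \qquad (1.1.13)$$
where $[nq]$ denotes the integer part of $nq$. Thus, we have
$$F_n^{-1}(q) = X_{\langle nq\rangle:n} \qquad (1.1.13')$$
with $\langle nq\rangle = \min\{m: m\ge nq\}$.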


The r.v. $F_n^{-1}(q)$ is the smallest sample $q$-quantile. If $q = 1/2$ then one also speaks of the sample median.
The considerations above show that order statistics are more closely related to q.f.'s than to d.f.'s. Finally, we remark that according to (1.1.12), $F_n^{-1}(i/n) = X_{i:n}$, which implies $F_n(F_n^{-1}(i/n)) = i/n$ for $i = 1,\ldots,n-1$.
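Formula (1.1.13') says that the smallest sample $q$-quantile is the order statistic with index $\lceil nq\rceil$. The sketch below (the name `sample_quantile` is ours) uses Python's `fractions.Fraction` so that $nq$ is computed exactly; with a float $q$ such as 0.1 the binary rounding of $q$ itself can shift $\lceil nq\rceil$ by one. It then checks (1.1.12) on every interval $((i-1)/n, i/n]$:

```python
import math
import random
from fractions import Fraction


def sample_quantile(xs, q):
    """Smallest sample q-quantile F_n^{-1}(q) = X_{<nq>:n}, cf. (1.1.13').

    <nq> = min{m : m >= nq} is the ceiling of nq. Passing q as a Fraction
    keeps nq exact; a float q is converted exactly from its binary value.
    """
    if not 0 < q < 1:
        raise ValueError("q must lie in (0, 1)")
    m = math.ceil(len(xs) * Fraction(q))
    return sorted(xs)[m - 1]


rng = random.Random(3)
xs = [rng.random() for _ in range(10)]   # continuous data: no ties with probability one
n = len(xs)
ordered = sorted(xs)
for i in range(1, n + 1):
    midpoint = Fraction(2 * i - 1, 2 * n)              # inside ((i - 1)/n, i/n)
    assert sample_quantile(xs, midpoint) == ordered[i - 1]
    if i < n:
        assert sample_quantile(xs, Fraction(i, n)) == ordered[i - 1]   # right endpoint
        assert sum(x <= ordered[i - 1] for x in xs) == i               # F_n(F_n^{-1}(i/n)) = i/n

median = sample_quantile(xs, Fraction(1, 2))   # sample median, here X_{5:10}
```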

1.2. The Quantile Transformation


In the finite and asymptotic treatment of order statistics we shall make use of certain special properties of order statistics of uniform and exponential r.v.'s. In a first step, one has to establish the required results for these particular cases. The extension to other r.v.'s will be accomplished by a transformation technique.

Introduction and Main Results


To be more precise let us introduce i.i.d. random variables $\eta_1,\ldots,\eta_n$ and $\xi_1,\ldots,\xi_n$ where the $\eta_i$ are (0,1)-uniformly distributed and the $\xi_i$ have the common d.f. $F$. Then, the following two relations hold:
$$(\eta_1,\ldots,\eta_n)\stackrel{d}{=}(F(\xi_1),\ldots,F(\xi_n)) \qquad (1.2.1)$$
if $F$ is continuous, and
$$(\xi_1,\ldots,\xi_n)\stackrel{d}{=}(F^{-1}(\eta_1),\ldots,F^{-1}(\eta_n)) \qquad (1.2.2)$$
where $F^{-1}$ is the q.f. of $F$.


Let $U_{1:n}\le\cdots\le U_{n:n}$ and, respectively, $X_{1:n}\le\cdots\le X_{n:n}$ be the order statistics of $\eta_1,\ldots,\eta_n$ and $\xi_1,\ldots,\xi_n$. Since an increasing order of the observations is not destroyed by a monotone (nondecreasing) transformation one obtains
$$(U_{1:n},\ldots,U_{n:n})\stackrel{d}{=}(F(X_{1:n}),\ldots,F(X_{n:n})) \qquad (1.2.3)$$
if $F$ is continuous, and
$$(X_{1:n},\ldots,X_{n:n})\stackrel{d}{=}(F^{-1}(U_{1:n}),\ldots,F^{-1}(U_{n:n})). \qquad (1.2.4)$$
For the details we refer to Lemma 1.2.4 and Theorem 1.2.5.

Some Preliminaries
Let us begin by noting the simple fact that, given ordered values $z_1\le\cdots\le z_n$, we get $\varphi(z_1)\le\cdots\le\varphi(z_n)$ if $\varphi$ is nondecreasing [respectively, $\varphi(z_1)\ge\cdots\ge\varphi(z_n)$ if $\varphi$ is nonincreasing].

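Lemma 1.2.1. Let $X_{r:n}$ be the $r$th order statistic of r.v.'s $\xi_1,\ldots,\xi_n$ with range $R$, $\varphi$ a real-valued function with domain $R$, and $X_{r:n}^*$ the $r$th order statistic of the r.v.'s $\varphi(\xi_1),\ldots,\varphi(\xi_n)$. Then,
(i) $X_{r:n}^* = \varphi(X_{r:n})$ if $\varphi$ is nondecreasing,
(ii) $X_{r:n}^* = \varphi(X_{n-r+1:n})$ if $\varphi$ is nonincreasing.
Alternatively, using the notation in (1.1.4) one can write
$$Z_{r:n}(\varphi(\xi_1),\ldots,\varphi(\xi_n)) = \varphi(Z_{r:n}(\xi_1,\ldots,\xi_n)) \qquad (1.2.5)$$
if $\varphi$ is nondecreasing, and
$$Z_{r:n}(\varphi(\xi_1),\ldots,\varphi(\xi_n)) = \varphi(Z_{n-r+1:n}(\xi_1,\ldots,\xi_n)) \qquad (1.2.6)$$
if $\varphi$ is nonincreasing.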
Lemma 1.2.1(i) shows that one can interchange the nondecreasing function $\varphi$ and the function $Z_{r:n}$ without changing the r.v. The main results of the present section are applications of Lemma 1.2.1(i) to $\varphi = F$ and $\varphi = F^{-1}$ where $F$ is a d.f.
Another example is $\varphi(x) = -x$. According to (1.2.6),
$$Z_{r:n}(\xi_1,\ldots,\xi_n) = -Z_{n-r+1:n}(-\xi_1,\ldots,-\xi_n).$$
In particular, the identity
$$Z_{1:n}(\xi_1,\ldots,\xi_n) = -Z_{n:n}(-\xi_1,\ldots,-\xi_n) \qquad (1.2.7)$$
indicates that results for the sample minimum can easily be deduced from those for the sample maximum.
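Lemma 1.2.1 is again a deterministic statement, so it can be checked on any data set. The sketch below takes $\varphi = \arctan$ as an example of a nondecreasing function and $\varphi(x) = -x^3$ as a nonincreasing one (both are our arbitrary choices) and confirms (1.2.5), (1.2.6) and (1.2.7) for every $r$:

```python
import math
import random


def z_rn(r, xs):
    """Z_{r:n}(x_1, ..., x_n): the r-th smallest value (1-based r)."""
    return sorted(xs)[r - 1]


rng = random.Random(4)
xs = [rng.uniform(-3.0, 3.0) for _ in range(9)]
n = len(xs)

increasing = math.atan             # a nondecreasing phi
decreasing = lambda x: -(x ** 3)   # a nonincreasing phi

for r in range(1, n + 1):
    # (1.2.5): Z_{r:n}(phi(x_1), ..., phi(x_n)) = phi(Z_{r:n}(x_1, ..., x_n))
    assert z_rn(r, [increasing(x) for x in xs]) == increasing(z_rn(r, xs))
    # (1.2.6): for a nonincreasing phi the index flips to n - r + 1
    assert z_rn(r, [decreasing(x) for x in xs]) == decreasing(z_rn(n - r + 1, xs))

# (1.2.7): the minimum equals minus the maximum of the negated sample
assert z_rn(1, xs) == -z_rn(n, [-x for x in xs])
```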


We mention an application of Lemma 1.2.1(ii) to $\varphi(x) = 1-x$.
EXAMPLE 1.2.2. Let $U_{1:n},\ldots,U_{n:n}$ be the order statistics of $n$ i.i.d. (0,1)-uniformly distributed r.v.'s $\eta_1,\ldots,\eta_n$. Then,
$$(U_{1:n},\ldots,U_{n:n})\stackrel{d}{=}(1-U_{n:n},\ldots,1-U_{1:n}). \qquad (1.2.8)$$
To prove this make use of the well-known fact that
$$(1-\eta_1,\ldots,1-\eta_n)\stackrel{d}{=}(\eta_1,\ldots,\eta_n).$$
In Lemma A.1.1 it will be proved, within a more general framework, that
$$q\le F(x)\quad\text{iff}\quad F^{-1}(q)\le x \qquad (1.2.9)$$
for every real $x$ and $q\in(0,1)$. Notice that (1.2.9) is equivalent to
$$q>F(x)\quad\text{iff}\quad F^{-1}(q)>x. \qquad (1.2.10)$$

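Deduce from (1.2.9) that the q.f. of the d.f.
$$x\mapsto F((x-\mu)/\sigma),$$
with location and scale parameters $\mu$ and $\sigma$, is given by $\mu+\sigma F^{-1}$.
From (1.2.9) one also obtains
$$F(F^{-1}(q)-)\le q\le F(F^{-1}(q)),\qquad 0<q<1, \qquad (1.2.11)$$
[where $F(x-)$ denotes the left-hand limit of $F$ at $x$] and,
$$F^{-1}(F(x))\le x\le F^{-1}(F(x)+)\qquad\text{if } 0<F(x)<1 \qquad (1.2.12)$$
[where $F^{-1}(q+)$ denotes the right-hand limit of $F^{-1}$ at $q$].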

Criterion 1.2.3. A d.f. $F$ is continuous if, and only if,
$$F(F^{-1}(q)) = q\quad\text{for every } q\in(0,1). \qquad (1.2.13)$$
PROOF. Obvious from (1.2.11) and the fact that every $q\in(0,1)$ lies in the range of $F$ if, and only if, $F$ is continuous. □

Notice that $F(F^{-1}(q)) = q$ if $F^{-1}(q)$ is a continuity point of $F$. Moreover, from (1.2.12) we get
$$F^{-1}(F(x)) = x$$
if $F(x)$ is a continuity point of $F^{-1}$.
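Criterion 1.2.3 can be seen numerically by contrasting a continuous and a discontinuous d.f. The bisection routine `quantile` below is a hypothetical numerical approximation of (1.1.10), not a method from the text. It assumes that the search interval $[-50, 50]$ brackets $F^{-1}(q)$, and it keeps the invariant $F(\text{hi})\ge q > F(\text{lo})$, so `hi` approaches the infimum from above:

```python
import math


def quantile(F, q, lo=-50.0, hi=50.0, iters=200):
    """Approximate F^{-1}(q) = inf{t : F(t) >= q} by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) >= q:
            hi = mid
        else:
            lo = mid
    return hi


def exp_df(t):
    """Standard exponential d.f. (continuous)."""
    return 1.0 - math.exp(-t) if t >= 0 else 0.0


def bernoulli_df(t):
    """D.f. of a fair 0/1 coin: jumps at 0 and 1, hence not continuous."""
    if t < 0:
        return 0.0
    return 0.5 if t < 1 else 1.0


for q in (0.1, 0.3, 0.5, 0.9):
    # continuous F: F(F^{-1}(q)) = q, cf. (1.2.13)
    assert abs(exp_df(quantile(exp_df, q)) - q) < 1e-9

# discontinuous F: (1.2.13) fails, since F^{-1}(0.3) = 0 but F(0) = 0.5;
# (1.2.11) still holds: F(0-) = 0 <= 0.3 <= 0.5 = F(0)
assert quantile(bernoulli_df, 0.3) == 0.0
assert bernoulli_df(quantile(bernoulli_df, 0.3)) == 0.5
```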

Quantile and Probability Integral Transformation


Criterion 1.2.3 will be the decisive tool to prove
Lemma 1.2.4. Let $\eta$ be a (0,1)-uniformly distributed r.v. Then for any d.f. $F$ the following two results hold:
(i) (Quantile transformation) $F^{-1}(\eta)$ has the d.f. $F$.
(ii) (Probability integral transformation) Let $\xi$ be a r.v. with d.f. $F$. Then,
$$F(\xi)\stackrel{d}{=}\eta\quad\text{iff } F \text{ is continuous}.$$
PROOF. (i) From (1.2.9) it is immediate that
$$P\{F^{-1}(\eta)\le x\} = P\{\eta\le F(x)\} = F(x).$$
(ii) From (i) we know that $\xi\stackrel{d}{=}F^{-1}(\eta)$. Thus, Criterion 1.2.3 implies that $F(\xi)\stackrel{d}{=}F(F^{-1}(\eta)) = \eta$ if $F$ is continuous. Conversely, for every $x$,
$$P\{\xi = x\}\le P\{F(\xi) = F(x)\} = 0$$
if $F(\xi)\stackrel{d}{=}\eta$, and hence the d.f. $F$ of $\xi$ is continuous. □
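Both parts of Lemma 1.2.4 can be illustrated by simulation. In the sketch below (sample size, seed and tolerance are arbitrary choices of ours), uniforms from Python's `random` are pushed through the exponential q.f. of Example 1.1.1(ii), and the empirical d.f. is compared with $1-e^{-x}$. For the discontinuous d.f. of a fair 0/1 coin, $F(\xi)$ takes only the two values 0.5 and 1, so it cannot be uniform:

```python
import math
import random

rng = random.Random(5)
N = 20000
etas = [rng.random() for _ in range(N)]     # eta ~ U(0, 1); random() returns values in [0, 1)

# (i) Quantile transformation: F^{-1}(eta) = -log(1 - eta) has the standard exponential d.f.
xis = [-math.log(1.0 - u) for u in etas]
for x in (0.5, 1.0, 2.0):
    empirical = sum(1 for v in xis if v <= x) / N
    assert abs(empirical - (1.0 - math.exp(-x))) < 0.02

# (ii) The probability integral transformation fails for a discontinuous F:
# for a fair 0/1 coin xi, F(xi) only takes the values 0.5 and 1.0
coin_df = lambda t: 0.0 if t < 0 else (0.5 if t < 1 else 1.0)
coins = [1 if rng.random() < 0.5 else 0 for _ in range(N)]
values = {coin_df(c) for c in coins}
assert values == {0.5, 1.0}   # so F(xi) is not uniformly distributed on (0, 1)
```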

Let us note a direct consequence of the quantile transformation and the transformation theorem for integrals. Evidently,
$$\int g\,dF = \int_0^1 g(F^{-1}(x))\,dx \qquad (1.2.14)$$
provided one of the integrals exists.
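Formula (1.2.14) turns an integral with respect to $F$ into an ordinary integral over (0,1). As a numerical illustration (the midpoint rule and the helper name `integral_via_quantile` are our choices), the first two moments of the standard exponential d.f., which equal 1 and 2, can be recovered from its q.f. $-\log(1-q)$ even though this q.f. is unbounded near 1:

```python
import math


def integral_via_quantile(g, Finv, m=200000):
    """Midpoint-rule approximation of the right-hand side of (1.2.14)."""
    h = 1.0 / m
    return h * sum(g(Finv((j + 0.5) * h)) for j in range(m))


exp_q = lambda q: -math.log(1.0 - q)   # q.f. of the standard exponential d.f.
mean = integral_via_quantile(lambda x: x, exp_q)
second = integral_via_quantile(lambda x: x * x, exp_q)
```

The midpoints never touch 1, so the singularity of $F^{-1}$ at 1 causes no evaluation error, only a small discretization error.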


For independent r.v.'s $\xi_1,\xi_2$ with common continuous d.f. $F$ we deduce from (1.2.9), (1.2.10), (1.2.13) and Lemma 1.2.4(i) that
$$P\{\xi_1\le\xi_2\} = P\{F^{-1}(\eta_1)\le F^{-1}(\eta_2)\} = P\{\eta_1\le F(F^{-1}(\eta_2))\} = P\{\eta_1\le\eta_2\},$$
$$P\{\xi_1<\xi_2\} = P\{F^{-1}(\eta_1)<F^{-1}(\eta_2)\} = P\{F(F^{-1}(\eta_1))<\eta_2\} = P\{\eta_1<\eta_2\},$$
where $\eta_1,\eta_2$ are independent (0,1)-uniformly distributed r.v.'s. Thus, the probability $P\{\xi_1\le\xi_2\}$ (and likewise $P\{\xi_1<\xi_2\}$) is independent of the continuous d.f. $F$.
We remark that the probability integral transformation in the case of not necessarily continuous d.f.'s will be given in Section 1.5.

The Quantile Transformation of Order Statistics


Combining Lemma 1.2.1 and Lemma 1.2.4 we obtain the main result of this section [as already formulated in (1.2.3) and (1.2.4)].

Theorem 1.2.5. Let $X_{1:n},\ldots,X_{n:n}$ be the order statistics of $n$ i.i.d. random variables with common d.f. $F$. Then,
(i) $(F^{-1}(U_{1:n}),\ldots,F^{-1}(U_{n:n}))\stackrel{d}{=}(X_{1:n},\ldots,X_{n:n})$,
and if, in addition, $F$ is continuous, then
(ii) $(F(X_{1:n}),\ldots,F(X_{n:n}))\stackrel{d}{=}(U_{1:n},\ldots,U_{n:n})$.
PROOF. (i) Using the quantile transformation we obtain
$$(\xi_1,\ldots,\xi_n)\stackrel{d}{=}(F^{-1}(\eta_1),\ldots,F^{-1}(\eta_n))$$
where $\xi_1,\ldots,\xi_n$ are i.i.d. random variables with common d.f. $F$ and $\eta_1,\ldots,\eta_n$ are i.i.d. random variables with common uniform distribution on (0,1). Moreover, w.l.o.g. the r.v.'s $\eta_i$ are (0,1)-valued. Since $F^{-1}$ is a nondecreasing function it is immediate from Lemma 1.2.1 that
$$(X_{1:n},\ldots,X_{n:n})\stackrel{d}{=}\big(Z_{1:n}(F^{-1}(\eta_1),\ldots,F^{-1}(\eta_n)),\ldots,Z_{n:n}(F^{-1}(\eta_1),\ldots,F^{-1}(\eta_n))\big)$$
$$= \big(F^{-1}(Z_{1:n}(\eta_1,\ldots,\eta_n)),\ldots,F^{-1}(Z_{n:n}(\eta_1,\ldots,\eta_n))\big)\stackrel{d}{=}(F^{-1}(U_{1:n}),\ldots,F^{-1}(U_{n:n})).$$
(ii) From (i) it is obvious that
$$(F(X_{1:n}),\ldots,F(X_{n:n}))\stackrel{d}{=}(F(F^{-1}(U_{1:n})),\ldots,F(F^{-1}(U_{n:n}))) = (U_{1:n},\ldots,U_{n:n})$$
where the second identity follows from Criterion 1.2.3. □
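Theorem 1.2.5(i) gives a simple recipe for simulating order statistics: sort uniforms, then apply $F^{-1}$. The sketch below does this for the standard exponential d.f. and compares the simulated means with the standard identity $E X_{r:n} = \sum_{i=n-r+1}^{n} 1/i$ for exponential order statistics (a consequence of the independence of exponential spacings treated in Section 1.6, quoted here without proof). The function name and simulation sizes are ours:

```python
import math
import random


def exp_order_statistics_via_uniforms(n, rng):
    """Theorem 1.2.5(i) for the standard exponential d.f.: F^{-1}(U_{1:n}), ..., F^{-1}(U_{n:n})."""
    us = sorted(rng.random() for _ in range(n))   # (U_{1:n}, ..., U_{n:n})
    return [-math.log(1.0 - u) for u in us]       # F^{-1}(q) = -log(1 - q) is nondecreasing


rng = random.Random(6)
n, reps = 5, 20000
sums = [0.0] * n
for _ in range(reps):
    xs = exp_order_statistics_via_uniforms(n, rng)
    assert xs == sorted(xs)                       # the monotone F^{-1} preserves the order
    for r in range(n):
        sums[r] += xs[r]
means = [s / reps for s in sums]

# Exact means of standard exponential order statistics:
# E X_{r:n} = 1/n + 1/(n - 1) + ... + 1/(n - r + 1)
exact = [sum(1.0 / i for i in range(n - r + 1, n + 1)) for r in range(1, n + 1)]
```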


Combining the two results of Theorem 1.2.5 we obtain
Corollary 1.2.6. Suppose that $X_{1:n},\ldots,X_{n:n}$ are the order statistics of $n$ i.i.d. random variables with common continuous d.f. $F$ and $X'_{1:n},\ldots,X'_{n:n}$ are the order statistics of $n$ i.i.d. random variables with common d.f. $G$. Then,
$$(X'_{1:n},\ldots,X'_{n:n})\stackrel{d}{=}(G^{-1}(F(X_{1:n})),\ldots,G^{-1}(F(X_{n:n}))). \qquad (1.2.15)$$
Since $G^{-1}$ is defined on (0,1) it may happen that the right-hand side of (1.2.15) is only defined on a set with probability one. This, however, creates no difficulties under the convention that the right-hand side is equal to some fixed constant on the set $\bigcup_{i=1}^n\{F(X_{i:n})\in\{0,1\}\}$, which has probability zero.

Corollary 1.2.7. Let $U_{r:n}$ and $X_{r:n}$ be as in Theorem 1.2.5(i). Then, for reals $t_1,\ldots,t_k$ and integers $1\le r_1<r_2<\cdots<r_k\le n$ we obtain,
$$P\{X_{r_1:n}\le t_1,\ldots,X_{r_k:n}\le t_k\} = P\{U_{r_1:n}\le F(t_1),\ldots,U_{r_k:n}\le F(t_k)\}.$$
PROOF. Theorem 1.2.5 and (1.2.9) yield
$$P\{X_{r_1:n}\le t_1,\ldots,X_{r_k:n}\le t_k\} = P\{F^{-1}(U_{r_1:n})\le t_1,\ldots,F^{-1}(U_{r_k:n})\le t_k\}$$
$$= P\{U_{r_1:n}\le F(t_1),\ldots,U_{r_k:n}\le F(t_k)\}.$$ □


An Alternative Approach to Q.F.'s


Next, we investigate the question whether it makes sense to speak of a q.f. without referring to a d.f. In order to treat this question in a satisfactory way it is useful to study the inverse of a nondecreasing function in greater generality. The proof of Theorem 1.2.8 and further technical details are postponed until Appendix 1.
Theorem 1.2.8. (i) The q.f. $F^{-1}$ of a d.f. $F$ is nondecreasing and left continuous.
(ii) For every real-valued, nondecreasing and left continuous function $G$ with domain (0,1) there exists a unique d.f. $F$ such that $G = F^{-1}$.
We remark that the d.f. can be regained from its q.f. by
$$F(x) = \sup\{q\in(0,1): F^{-1}(q)\le x\}.$$
From Theorem 1.2.8 we know that it makes sense to say that a real-valued function $G$ with domain (0,1) is a q.f. if $G$ is nondecreasing and left continuous.
Since order statistics are more closely related to q.f.'s than to d.f.'s, it is tempting to formulate assumptions via conditions imposed on q.f.'s instead of d.f.'s. However, we shall not follow this approach because of the dominant role of d.f.'s in the literature.
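The formula $F(x) = \sup\{q\in(0,1): F^{-1}(q)\le x\}$ can be approximated by scanning a finite grid of levels $q$. The sketch below (the helper `df_from_quantile` and the grid size are hypothetical choices) uses the convention that the supremum of the empty set is 0, and exploits monotonicity of $F^{-1}$ to stop early. It recovers the exponential d.f. up to the grid spacing and returns the exact jump height for the q.f. of a fair 0/1 coin:

```python
import math


def df_from_quantile(Finv, x, grid_size=100000):
    """Approximate F(x) = sup{q in (0,1) : F^{-1}(q) <= x} on the grid q = j/grid_size.

    The result is a lower bound for the supremum, accurate to about 1/grid_size.
    """
    best = 0.0   # sup of the empty set taken to be 0
    for j in range(1, grid_size):
        q = j / grid_size
        if Finv(q) <= x:
            best = q
        else:
            break  # F^{-1} is nondecreasing, so no larger q can qualify
    return best


exp_q = lambda q: -math.log(1.0 - q)          # q.f. of the standard exponential d.f.
for x in (0.1, 1.0, 3.0):
    assert abs(df_from_quantile(exp_q, x) - (1.0 - math.exp(-x))) < 3e-5

coin_q = lambda q: 0.0 if q <= 0.5 else 1.0   # q.f. of a fair 0/1 coin (left continuous)
```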

Weak Convergence of Q.F.'s


Finally, we treat the well-known result that the weak convergence of d.f.'s is equivalent to the "weak convergence" of q.f.'s.
Lemma 1.2.9. A sequence of d.f.'s $F_n$ converges weakly to a d.f. $F_0$ if, and only if,
$$F_n^{-1}(q)\to F_0^{-1}(q),\qquad n\to\infty,$$
at every continuity point $q$ of $F_0^{-1}$.
PROOF. First let us assume that $F_n$ converges weakly to $F_0$. Let $q$ be a continuity point of $F_0^{-1}$. Since the set of all discontinuity points of $F_0$ is finite or countable, it is obvious that for every $\varepsilon>0$ we find continuity points $y_1,y_2$ of $F_0$ such that $y_1<F_0^{-1}(q)<y_2$ and $|y_1-y_2|\le\varepsilon$.
From (1.2.9) and (1.2.10) we conclude that $F_0(y_1)<q\le F_0(y_2)$. Moreover, $q<F_0(y_2)$, because $q = F_0(y_2)$ would imply $F_0^{-1}(q) = F_0^{-1}(F_0(y_2)) = y_2$, since then $F_0(y_2) = q$ is a continuity point of $F_0^{-1}$ [compare with (1.2.12)]; this is a contradiction.
Thus, $F_n(y_1)<q<F_n(y_2)$ for all sufficiently large $n$ because $F_n(y_i)\to F_0(y_i)$, $n\to\infty$. Now it is immediate from (1.2.9) that $y_1\le F_n^{-1}(q)\le y_2$ and hence $|F_n^{-1}(q)-F_0^{-1}(q)|\le\varepsilon$ for all sufficiently large $n$. Since $\varepsilon>0$ is arbitrary we know that $|F_n^{-1}(q)-F_0^{-1}(q)|\to0$, $n\to\infty$.
To prove the converse conclusion repeat the argument above with (1.2.9) and (1.2.12) replaced by Lemma A.1.3 and (1.2.11). □
Let $F_n$ denote again the sample d.f. According to the Glivenko-Cantelli theorem, $\sup_t|F_n(t)-F(t)|\to0$, $n\to\infty$, w.p. 1. Thus one obtains as an immediate consequence of Lemma 1.2.9 that, w.p. 1, the sample q.f. $F_n^{-1}$ converges to the underlying q.f. $F^{-1}$ at every continuity point of $F^{-1}$.
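A small simulation illustrates (but of course cannot prove) this almost sure convergence. It assumes standard exponential data drawn with Python's `random.expovariate`, and tracks the distance between the sample median $F_n^{-1}(1/2)$ and $F^{-1}(1/2) = \log 2$ for growing $n$:

```python
import math
import random


def sample_quantile(sorted_xs, q):
    """F_n^{-1}(q) = X_{<nq>:n} for an already sorted sample (here nq is exact in floating point)."""
    n = len(sorted_xs)
    return sorted_xs[max(math.ceil(n * q), 1) - 1]


rng = random.Random(7)
q = 0.5
true_value = math.log(2.0)   # F^{-1}(1/2) for the standard exponential d.f.
errors = {}
for n in (100, 1000, 100000):
    xs = sorted(rng.expovariate(1.0) for _ in range(n))
    errors[n] = abs(sample_quantile(xs, q) - true_value)
```

Since $F^{-1}$ is continuous at $1/2$, the errors should shrink roughly like $n^{-1/2}$.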

1.3. Single Order Statistic, Extremes


In this section we derive the explicit form of the d.f. and the density of a single order statistic.

The D.F. of a Single Order Statistic


Let us start with the simplest result.
Lemma 1.3.1. Let $X_{r:n}$ be the $r$th order statistic of $n$ i.i.d. random variables $\xi_1,\ldots,\xi_n$ with common d.f. $F$. Then, for every $t$,
$$P\{X_{r:n}\le t\} = \sum_{i=r}^n\binom{n}{i}F(t)^i(1-F(t))^{n-i}. \qquad (1.3.1)$$
PROOF. Obvious from (1.1.8) by noting that $\sum_{i=1}^n 1_{(-\infty,t]}(\xi_i)$ is a binomial r.v. with parameters $n$ and $F(t)$. □
Lemma 1.3.1 proves once more the special case of $k = 1$ in Corollary 1.2.7. It is obvious from (1.3.1) that
$$P\{X_{r:n}\le t\} = P\{U_{r:n}\le F(t)\},$$
where $U_{r:n}$ is again the $r$th order statistic of $n$ i.i.d. random variables with common uniform d.f. on (0,1).
As special cases of Lemma 1.3.1 we note the d.f. of the maximum $X_{n:n}$ and the minimum $X_{1:n}$. We have
$$P\{X_{n:n}\le t\} = F(t)^n, \qquad (1.3.2)$$
and
$$P\{X_{1:n}\le t\} = 1-(1-F(t))^n. \qquad (1.3.3)$$
Notice that (1.3.2) can easily be proved in a direct way since for i.i.d. random variables $\xi_1,\ldots,\xi_n$ we have
$$P\{X_{n:n}\le t\} = P\{\xi_1\le t,\ldots,\xi_n\le t\} = F(t)^n.$$
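The binomial tail (1.3.1) is straightforward to evaluate. The sketch below (the name `order_stat_df` is ours; `math.comb` needs Python 3.8 or later) checks the special cases (1.3.2) and (1.3.3) and compares (1.3.1) with a Monte Carlo estimate for standard exponential data; the values $n = 6$, $r = 3$, $t = 0.8$ are arbitrary:

```python
import math
import random


def order_stat_df(r, n, Ft):
    """(1.3.1): P{X_{r:n} <= t} as a binomial tail, given the value Ft = F(t)."""
    return sum(math.comb(n, i) * Ft ** i * (1.0 - Ft) ** (n - i) for i in range(r, n + 1))


# Special cases (1.3.2) and (1.3.3)
Ft, n = 0.3, 6
assert abs(order_stat_df(n, n, Ft) - Ft ** n) < 1e-15
assert abs(order_stat_df(1, n, Ft) - (1.0 - (1.0 - Ft) ** n)) < 1e-12

# Monte Carlo check with standard exponential data, F(t) = 1 - exp(-t)
rng = random.Random(8)
r, t, reps = 3, 0.8, 20000
hits = sum(sorted(rng.expovariate(1.0) for _ in range(n))[r - 1] <= t for _ in range(reps))
mc = hits / reps
exact = order_stat_df(r, n, 1.0 - math.exp(-t))
```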


It is apparent that if $\xi_1,\ldots,\xi_n$ are independent and not necessarily identically distributed (in short, i.n.n.i.d.) r.v.'s then
$$P\{X_{n:n}\le t\} = \prod_{i=1}^n F_i(t) \qquad (1.3.4)$$
with $F_i$ denoting the d.f. of $\xi_i$.

The Density of a Single Order Statistic


It is easily seen from Lemma 1.3.1 that the d.f. of the $r$th order statistic is absolutely continuous if $F$ is absolutely continuous. To prove this recall that the composition of monotone absolutely continuous functions is absolutely continuous (see e.g. Hewitt-Stromberg, Exercise (18.37)) or use the argument as given at the beginning of Section 1.5. Hence, the density of $X_{r:n}$ can easily be established as the derivative of its d.f. (compare e.g. with Hewitt-Stromberg, Theorem (18.3)).

Theorem 1.3.2. Let $X_{r:n}$ be the $r$th order statistic of $n$ i.i.d. random variables with common d.f. $F$ and density $f$. Then, $X_{r:n}$ has the density
$$f_{r:n} = n!\,f\,\frac{F^{r-1}(1-F)^{n-r}}{(r-1)!\,(n-r)!}. \qquad (1.3.5)$$
PROOF. From Lemma 1.3.1 we know that the d.f. of $X_{r:n}$, say $G$, can be written as the composition $G = H\circ F$ where the function $H$ is defined by $H(t) = \sum_{i=r}^n\binom{n}{i}t^i(1-t)^{n-i}$. For every $t$ where $f(t)$ is the derivative of $F$ at $t$ we know that the derivative of $G$ at $t$ exists and $G'(t) = f(t)H'(F(t))$; it suffices to prove that $G'(t) = f_{r:n}(t)$. The derivative of $H$ is given by
$$H'(t) = n!\,\frac{t^{r-1}(1-t)^{n-r}}{(r-1)!\,(n-r)!} \qquad (1.3.6)$$
and hence the assertion of the theorem holds. For proving (1.3.6) check that
$$H'(t) = \sum_{i=r}^{n} i\binom{n}{i}t^{i-1}(1-t)^{n-i} - \sum_{i=r}^{n-1}(n-i)\binom{n}{i}t^{i}(1-t)^{n-i-1}$$
$$= \sum_{i=r}^{n} i\binom{n}{i}t^{i-1}(1-t)^{n-i} - \sum_{i=r+1}^{n} i\binom{n}{i}t^{i-1}(1-t)^{n-i} = n!\,\frac{t^{r-1}(1-t)^{n-r}}{(r-1)!\,(n-r)!}$$
where the final steps are obvious from the identities
$$i\binom{n}{i} = (n-i+1)\binom{n}{i-1} = \frac{n!}{(i-1)!\,(n-i)!}.$$ □
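The telescoping argument for (1.3.6) can be double-checked numerically: a central difference quotient of $H$ should agree with the closed form for every $r$. A minimal sketch (step size, $n = 7$ and the evaluation points are arbitrary choices):

```python
import math


def H(t, r, n):
    """H(t) = sum_{i=r}^n C(n,i) t^i (1-t)^(n-i), so that G = H o F in Theorem 1.3.2."""
    return sum(math.comb(n, i) * t ** i * (1.0 - t) ** (n - i) for i in range(r, n + 1))


def H_prime(t, r, n):
    """Closed form (1.3.6): n! t^(r-1) (1-t)^(n-r) / ((r-1)! (n-r)!)."""
    return (math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
            * t ** (r - 1) * (1.0 - t) ** (n - r))


n, eps = 7, 1e-6
for r in range(1, n + 1):
    for t in (0.1, 0.4, 0.75):
        numeric = (H(t + eps, r, n) - H(t - eps, r, n)) / (2 * eps)   # central difference
        assert abs(numeric - H_prime(t, r, n)) < 1e-6
```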

An alternative, more elegant proof of Theorem 1.3.2 will be given in Section 1.5. This proof will enable us to replace the condition that $F$ is absolutely continuous by the weaker condition that $F$ is continuous.
We note simple special cases of (1.3.5). The densities of the sample maximum and the sample minimum are given by
$$f_{n:n} = nfF^{n-1}\quad\text{and}\quad f_{1:n} = nf(1-F)^{n-1}. \qquad (1.3.7)$$
Moreover, observe that $U_{r:n}$ is a beta r.v. with parameters $r$ and $n-r+1$. This becomes obvious by noting that a beta r.v. with parameters $r$ and $s$ has the density
$$x\mapsto\frac{x^{r-1}(1-x)^{s-1}}{b(r,s)},\qquad 0<x<1, \qquad (1.3.8)$$
where $b(r,s) = \int_0^1 x^{r-1}(1-x)^{s-1}\,dx$ is the beta function. Recall that $b(r,s) = \Gamma(r)\Gamma(s)/\Gamma(r+s)$ where $\Gamma$ is the gamma function [with $\Gamma(r) = (r-1)!$ for positive integers $r$].
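The claim that $U_{r:n}$ is a beta r.v. with parameters $r$ and $n-r+1$ can be checked in two ways: pointwise, by comparing (1.3.5) with $f = 1$, $F(x) = x$ against (1.3.8) evaluated through `math.gamma`, and by simulation, using the standard fact that a beta r.v. with parameters $r$ and $s$ has mean $r/(r+s)$. Function names and simulation sizes are ours:

```python
import math
import random


def beta_density(x, a, b):
    """(1.3.8): x^(a-1) (1-x)^(b-1) / b(a, b), with b(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    beta_fn = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1.0 - x) ** (b - 1) / beta_fn


def uniform_order_stat_density(x, r, n):
    """(1.3.5) with f = 1 and F(x) = x on (0, 1)."""
    return (math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
            * x ** (r - 1) * (1.0 - x) ** (n - r))


n = 9
for r in range(1, n + 1):
    for x in (0.2, 0.5, 0.9):
        assert math.isclose(uniform_order_stat_density(x, r, n),
                            beta_density(x, r, n - r + 1), rel_tol=1e-12)

# Monte Carlo: E U_{r:n} = r/(n + 1), the mean of a beta(r, n - r + 1) r.v.
rng = random.Random(9)
r, reps = 3, 20000
mc_mean = sum(sorted(rng.random() for _ in range(n))[r - 1] for _ in range(reps)) / reps
```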
The following example, concerning sample medians, gives a flavor of the asymptotic treatment of central order statistics. It indicates that central order statistics are asymptotically normal.
Let $\varphi$ denote the standard normal density given by
$$\varphi(x) = (2\pi)^{-1/2}\exp(-x^2/2).$$
Deduce from (1.3.8) that the density $h_m$ of the normalized sample median
$$2(2m)^{1/2}(U_{m+1:2m+1}-1/2)$$
is given by
$$h_m(x) = c(m)^{-1}\big(1-x^2/(2m)\big)^m,\qquad |x|<(2m)^{1/2},$$
and $= 0$, otherwise, where $c(m)$ is a constant. Since
$$\big(1-x^2/(2m)\big)^m\to\exp(-x^2/2),\qquad m\to\infty,$$
and $0\le(1-x^2/(2m))^m\le\exp(-x^2/2)$ for $|x|<(2m)^{1/2}$, it follows from Lebesgue's dominated convergence theorem that
$$c(m)\to\int\exp(-x^2/2)\,dx = (2\pi)^{1/2},\qquad m\to\infty,$$
and hence
$$h_m(x)\to\varphi(x),\qquad m\to\infty, \qquad (1.3.9)$$
for every $x$. The Scheffe lemma 3.3.2 yields that the distribution of the normalized sample median converges to the standard normal distribution w.r.t. the variational distance as the sample size goes to infinity.
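The convergence (1.3.9) can be made concrete by computing $h_m$ exactly. Substituting $x = (2m)^{1/2}s$ gives $c(m) = (2m)^{1/2}\int_{-1}^{1}(1-s^2)^m\,ds = (2m)^{1/2}\,b(1/2, m+1)$ (our own short calculation, not taken from the text). The sketch evaluates this through `math.lgamma` to avoid overflow and reports the largest deviation from $\varphi$ at a few points:

```python
import math


def normalized_median_density(x, m):
    """h_m(x) = c(m)^{-1} (1 - x^2/(2m))^m for |x| < sqrt(2m), and 0 otherwise.

    Uses c(m) = sqrt(2m) * b(1/2, m + 1), with the beta function computed via log-gamma.
    """
    if abs(x) >= math.sqrt(2 * m):
        return 0.0
    log_beta = math.lgamma(0.5) + math.lgamma(m + 1) - math.lgamma(m + 1.5)
    c_m = math.sqrt(2 * m) * math.exp(log_beta)
    return (1.0 - x * x / (2 * m)) ** m / c_m


phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

gaps = {}
for m in (10, 100, 1000):
    gaps[m] = max(abs(normalized_median_density(x, m) - phi(x)) for x in (0.0, 0.5, 1.0, 2.0, 3.0))
```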

Extreme Value D.F.'s


Next, (1.3.2) and (1.3.3) will be examined in the special case of limiting d.f.'s of sample maxima or sample minima (in other words: extreme value d.f.'s).
The nondegenerate limiting d.f.'s of sample maxima are of the type
$$G_{1,\alpha}(x) = \begin{cases}0 & \text{if } x\le0,\\ \exp(-x^{-\alpha}) & \text{if } x>0,\end{cases}\qquad\text{"Frechet"},$$
$$G_{2,\alpha}(x) = \begin{cases}\exp(-(-x)^{\alpha}) & \text{if } x\le0,\\ 1 & \text{if } x>0,\end{cases}\qquad\text{"Weibull"},$$
$$G_3(x) = \exp(-e^{-x})\quad\text{for every } x,\qquad\text{"Gumbel"}, \qquad (1.3.10)$$
where $\alpha>0$ is a shape parameter. We say that two d.f.'s $G_1$ and $G_2$ are of the same type if $G_1(b+ax) = G_2(x)$ for some $a>0$ and real $b$.
Frequently, it will be convenient to write $G_{3,\alpha}$ in place of $G_3$ where $\alpha$ is always understood to be equal to 1. The following identities show that the d.f.'s $G_{i,\alpha}$ are in fact limiting d.f.'s of sample maxima. We have
$$G_{1,\alpha}^n(n^{1/\alpha}x) = G_{1,\alpha}(x),\qquad G_{2,\alpha}^n(n^{-1/\alpha}x) = G_{2,\alpha}(x),\qquad G_3^n(x+\log n) = G_3(x). \qquad (1.3.11)$$

Every limiting d.f. has to be max-stable in the sense of (1.3.11). It is one of the admirable achievements of the classical extreme value theory that one can show that the d.f.'s in (1.3.10) are the possible nondegenerate limiting d.f.'s of sample maxima [see e.g. Galambos (1987), Theorems 2.4.1 and 2.4.2, or Leadbetter et al. (1983), Theorem 1.4.2]. It is understood that the nondegenerate limiting d.f.'s have to be of the same type as $G_{1,\alpha}$, $G_{2,\alpha}$, $G_3$.
Frequently, (1.3.11) will be summarized by
$$G_{i,\alpha}^n(d_n+c_nx) = G_{i,\alpha}(x) \qquad (1.3.12)$$
where
$$(c_n,d_n) = \begin{cases}(n^{1/\alpha},\,0) & \text{if } i = 1,\\ (n^{-1/\alpha},\,0) & \text{if } i = 2,\\ (1,\,\log n) & \text{if } i = 3.\end{cases} \qquad (1.3.13)$$
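The max-stability relation (1.3.12) with the constants (1.3.13) is an exact identity, so it can be verified to floating-point accuracy. In the sketch below the shape parameter $\alpha = 2.5$ and the evaluation points are arbitrary choices, and `norming_constants` is a hypothetical helper:

```python
import math


def G1(x, alpha):   # Frechet
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0


def G2(x, alpha):   # Weibull
    return math.exp(-(-x) ** alpha) if x <= 0 else 1.0


def G3(x):          # Gumbel
    return math.exp(-math.exp(-x))


def norming_constants(i, n, alpha=1.0):
    """(c_n, d_n) from (1.3.13)."""
    if i == 1:
        return n ** (1.0 / alpha), 0.0
    if i == 2:
        return n ** (-1.0 / alpha), 0.0
    return 1.0, math.log(n)


alpha = 2.5
cases = [(lambda x: G1(x, alpha), 1), (lambda x: G2(x, alpha), 2), (G3, 3)]
for G, i in cases:
    for n in (2, 10, 1000):
        c, d = norming_constants(i, n, alpha)
        for x in (-2.0, -0.5, 0.3, 1.0, 4.0):
            # max-stability (1.3.12): G^n(d_n + c_n x) = G(x)
            assert math.isclose(G(d + c * x) ** n, G(x), rel_tol=1e-9, abs_tol=1e-15)
```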

Notice that $G_{2,1}(x) = e^x$, $x<0$, defines the "negative" standard exponential d.f. The d.f. $G_{2,1}$ will usually be taken as a starting point of our investigations. This is partly due to the fact that $G_{2,1}$ is the limiting d.f. of the maximum $U_{n:n}$ of (0,1)-uniformly distributed r.v.'s. To prove this, notice that
$$P\{n(U_{n:n}-1)\le x\} = (1+x/n)^n,\qquad -n\le x\le0,$$
and
$$(1+x/n)^n\to e^x = G_{2,1}(x),\qquad n\to\infty,\ x\le0. \qquad (1.3.14)$$
It is obvious that the pertaining densities $(1+x/n)^{n-1}1_{[-n,0]}(x)$ converge to the density $e^x$, $x\le0$, of $G_{2,1}$, which again yields the convergence w.r.t. the variational distance. We remark that (1.3.14) will be extended from the special case of uniform r.v.'s to generalized Pareto r.v.'s in Section 1.6. A detailed study of the asymptotic behavior of extremes will be made in Chapter 5.
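The exact d.f. $(1+x/n)^n$ of $n(U_{n:n}-1)$ and its limit $e^x$ are easy to compare numerically. The sketch below also checks the exact formula by simulation for $n = 50$. The point $x = -1$ and all simulation sizes are arbitrary choices, and `max_uniform_df` is a hypothetical helper name:

```python
import math
import random


def max_uniform_df(x, n):
    """P{n(U_{n:n} - 1) <= x} = (1 + x/n)^n for -n <= x <= 0 (0 below -n, 1 above 0)."""
    if x < -n:
        return 0.0
    if x > 0:
        return 1.0
    return (1.0 + x / n) ** n


x = -1.0
approx = {n: max_uniform_df(x, n) for n in (1, 10, 100, 10000)}
limit = math.exp(x)   # G_{2,1}(x) = e^x for x <= 0

rng = random.Random(10)
n, reps = 50, 20000
mc = sum(n * (max(rng.random() for _ in range(n)) - 1.0) <= x for _ in range(reps)) / reps
```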
Lemma 1.3.1 may be applied to show that a stability relation corresponding
to (1.3.12) does not hold for the kth largest order statistic if k > 1.
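For the sake of completeness we also state the nondegenerate limiting d.f.'s of sample minima (again with parameters $\alpha>0$):
$$F_{1,\alpha}(x) = 1-G_{1,\alpha}(-x),\qquad x<0,$$
$$F_{2,\alpha}(x) = 1-G_{2,\alpha}(-x),\qquad x>0,$$
$$F_3(x) = 1-G_3(-x). \qquad (1.3.15)$$
The pertaining stability relations may be summarized by
$$1-\big(1-F_{i,\alpha}(c_nx-d_n)\big)^n = F_{i,\alpha}(x) \qquad (1.3.16)$$
where $c_n$ and $d_n$ are the constants in (1.3.13); the shift $d_n$ enters with a minus sign because the minimum of the $\xi_i$ equals minus the maximum of the $-\xi_i$.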

Von Mises Parametrization


In the statistical context, one includes a location parameter $\mu$ and a scale parameter $\sigma>0$ in the considerations. Starting with the standard Frechet, Weibull, and Gumbel d.f.'s as given in (1.3.10) we obtain d.f.'s of the form
$$x\mapsto G_{i,\alpha}((x-\mu)/\sigma).$$
If the index $i$ is unknown then these d.f.'s should be unified into a 3-parameter family by using the von Mises parametrization: For $\beta\ne0$ define
$$H_\beta(x) = \exp\big[-(1+\beta x)^{-1/\beta}\big],\qquad 1+\beta x>0. \qquad (1.3.17)$$
Moreover,
$$H_0(x) = G_3(x) = \exp(-e^{-x})\quad\text{for every } x. \qquad (1.3.18)$$
Since $(1+\beta x)^{1/\beta}\to e^x$, $\beta\to0$, it is clear that $H_\beta(x)\to H_0(x)$, $\beta\to0$. The Frechet and Weibull d.f.'s can be regained from $H_\beta$ by the identities
$$G_{1,1/\beta}(x) = H_\beta((x-1)/\beta)\quad\text{if }\beta>0,$$
and
$$G_{2,-1/\beta}(x) = H_\beta(-(x+1)/\beta)\quad\text{if }\beta<0. \qquad (1.3.19)$$
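The identities (1.3.19) and the limit $H_\beta\to H_0$ can be verified numerically. The sketch below is ours: outside the region $1+\beta x>0$ it sets $H_\beta = 0$ for $\beta>0$ and $H_\beta = 1$ for $\beta<0$. This is the natural d.f. convention, but it is an assumption, since the text only defines $H_\beta$ where $1+\beta x>0$:

```python
import math


def H(beta, x):
    """Von Mises d.f. (1.3.17)/(1.3.18), extended by 0 or 1 outside 1 + beta*x > 0 (assumed convention)."""
    if beta == 0.0:
        return math.exp(-math.exp(-x))
    y = 1.0 + beta * x
    if y <= 0.0:
        return 0.0 if beta > 0 else 1.0
    return math.exp(-y ** (-1.0 / beta))


def G1(x, alpha):
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0


def G2(x, alpha):
    return math.exp(-(-x) ** alpha) if x <= 0 else 1.0


for beta in (0.3, 1.0, 2.0):          # Frechet case of (1.3.19)
    for x in (0.2, 1.0, 5.0):
        assert math.isclose(G1(x, 1.0 / beta), H(beta, (x - 1.0) / beta), rel_tol=1e-9)
for beta in (-0.25, -0.5, -1.0):      # Weibull case of (1.3.19)
    for x in (-3.0, -1.0, -0.1):
        assert math.isclose(G2(x, -1.0 / beta), H(beta, -(x + 1.0) / beta), rel_tol=1e-9)

# H_beta(x) -> H_0(x) as beta -> 0, here at x = 1
gaps = [abs(H(b, 1.0) - H(0.0, 1.0)) for b in (0.1, 0.01, 0.001)]
```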

Graphical Representation of von Mises Densities


To get a visual impression of the "von Mises densities" we include their graphs for special parameters. We shall concentrate our attention on the behavior of the densities with parameter $\beta$ close to zero. The explicit form of the densities $h_\beta = H_\beta'$ is given by
$$h_0(x) = e^{-x}\exp(-e^{-x})\quad\text{for every } x$$
if $\beta = 0$, and
$$h_\beta(x) = (1+\beta x)^{-(1+1/\beta)}\exp\big(-(1+\beta x)^{-1/\beta}\big)\quad\text{if } x>-1/\beta,\ \beta>0, \text{ or } x<-1/\beta,\ \beta<0,$$
and $h_\beta(x) = 0$, otherwise.
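The density formula can be cross-checked against a numerical derivative of $H_\beta$, and a grid search confirms that the standard Gumbel density $h_0$ has its mode at 0, as stated below for Figure 1.3.1. This is a self-contained sketch (it repeats a small `H` helper, with the same assumed 0/1 convention outside the support); the step size and grid are arbitrary:

```python
import math


def H(beta, x):
    """Von Mises d.f.; 0 or 1 outside 1 + beta*x > 0 (assumed convention)."""
    if beta == 0.0:
        return math.exp(-math.exp(-x))
    y = 1.0 + beta * x
    if y <= 0.0:
        return 0.0 if beta > 0 else 1.0
    return math.exp(-y ** (-1.0 / beta))


def h(beta, x):
    """Von Mises density h_beta = H_beta' as displayed above."""
    if beta == 0.0:
        return math.exp(-x) * math.exp(-math.exp(-x))
    y = 1.0 + beta * x
    if y <= 0.0:
        return 0.0
    return y ** (-(1.0 + 1.0 / beta)) * math.exp(-y ** (-1.0 / beta))


eps = 1e-6
for beta in (-0.5, 0.0, 0.6):
    for x in (-1.0, 0.0, 0.7):   # all inside the support for these betas
        numeric = (H(beta, x + eps) - H(beta, x - eps)) / (2 * eps)
        assert abs(numeric - h(beta, x)) < 1e-6

# The standard Gumbel density h_0 has its mode at 0
grid = [k / 1000 for k in range(-3000, 3001)]
mode = max(grid, key=lambda x: h(0.0, x))
```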
Figure 1.3.1 shows the standard Gumbel density $h_0$. Notice that the mode of the standard Gumbel density is equal to zero.
Figure 1.3.2 indicates the convergence of the rescaled Frechet densities to the Gumbel density as $\beta\downarrow0$. Figure 1.3.3 concerns the convergence of the rescaled Weibull densities to the Gumbel density as $\beta\uparrow0$.
The illustrations indicate that extreme value densities, in their von Mises parametrization, form a nice, smooth family of densities. Frechet densities (recall that this is the case of $\beta>0$ in the von Mises parametrization) are skewed to the right. This property is shared by the Gumbel density and the Weibull densities for $\beta = -1/\alpha$ larger than $-1/3.6$. For parameters $\beta$ close to $-1/3.6$ (that is, $\alpha$ close to 3.6) the Weibull densities look symmetrical. Finally, for parameters $\beta$ smaller than $-1/3.6$ the Weibull densities are skewed to the left. For illustrations of Frechet and Weibull densities with large parameters $|\beta|$, we refer to Figures 5.1.1 and 5.1.2.

[Figure 1.3.1. Gumbel density $h_0$.]
[Figure 1.3.2. Gumbel density $h_0$ and Frechet densities $h_\beta$ (von Mises parametrization) with parameters $\beta = 0.3, 0.6, 0.9$.]
[Figure 1.3.3. Gumbel density $h_0$ and Weibull densities $h_\beta$ (von Mises parametrization) with parameters $\beta = -0.75, -0.5, -0.25$.]

In Figure 1.3.4 we demonstrate that for certain location, scale and shape parameters $\mu$, $\sigma$ and $\alpha = -1/\beta$ it is difficult to distinguish visually the Weibull density from a normal density. Those readers having good eyes will recognize a difference at the tails of the densities (with the dotted line indicating the Weibull density).

[Figure 1.3.4. Standard normal density and Weibull density (dotted line) with parameters $\mu = 3.14$, $\sigma = 3.48$, and $\alpha = 3.6$.]
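One way to see why the two curves in Figure 1.3.4 nearly coincide is to compute the mean and standard deviation of the Weibull law used there. This assumes the location-scale form $x\mapsto G_{2,\alpha}((x-\mu)/\sigma)$ introduced above, with $\mu = 3.14$, $\sigma = 3.48$, $\alpha = 3.6$. If $Y$ has d.f. $G_{2,\alpha}$ then $-Y$ has the usual Weibull d.f. $1-\exp(-y^\alpha)$, $y>0$, whose $k$th moment is $\Gamma(1+k/\alpha)$. The computation below gives a mean close to 0 and a standard deviation close to 1:

```python
import math

# Weibull d.f. x -> G_{2,alpha}((x - mu)/sigma) with the parameters of Figure 1.3.4
mu, sigma, alpha = 3.14, 3.48, 3.6

# -Y has d.f. 1 - exp(-y^alpha), y > 0, with k-th moment Gamma(1 + k/alpha)
m1 = math.gamma(1.0 + 1.0 / alpha)
m2 = math.gamma(1.0 + 2.0 / alpha)
mean = mu - sigma * m1                 # E[mu + sigma * Y]
sd = sigma * math.sqrt(m2 - m1 ** 2)   # standard deviation of mu + sigma * Y
```

Matching the first two moments explains much of the visual agreement. The remaining differences sit in the tails, because the Weibull law has a finite right endpoint at $\mu$.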

1.4. Joint Distribution of Several Order Statistics


In analogy to the proof of Lemma 1.3.1, which led to the explicit form of the d.f. of a single order statistic, one can find the joint d.f. of several order statistics $X_{r_1:n},\ldots,X_{r_k:n}$ by using multinomial probabilities. The resulting expression looks even more complicated than that in the case of a single order statistic. Thus, we prefer to work with densities instead of d.f.'s. The basic results that will enable us to derive the joint density of several order statistics are (a) Theorem 1.3.2, which provides the explicit form of the density of a single order statistic, applied in the special case of exponential r.v.'s, and (b) Theorem 1.4.1, which concerns the density of the order statistic.

Density of the Order Statistic


The density of the order statistic $\mathbf{X}_n = (X_{1:n},\ldots,X_{n:n})$ can be established by some straightforward arguments.
Theorem 1.4.1. Suppose that $\xi_1,\ldots,\xi_n$ are i.i.d. random variables having the common density $f$. Then, the order statistic $\mathbf{X}_n$ has the density $f_{1,2,\ldots,n:n}$ given by
$$f_{1,2,\ldots,n:n}(x_1,\ldots,x_n) = n!\prod_{i=1}^n f(x_i),\qquad x_1<x_2<\cdots<x_n, \qquad (1.4.1)$$
and $= 0$, otherwise.
PROOF. Let $S_n$ be the permutation group on $\{1,\ldots,n\}$; thus, $(\tau(1),\ldots,\tau(n))$ is a permutation of $(1,\ldots,n)$ for every $\tau\in S_n$. Define $B_\tau = \{\xi_{\tau(1)}<\xi_{\tau(2)}<\cdots<\xi_{\tau(n)}\}$ for every $\tau\in S_n$. Note that
$$(X_{1:n},\ldots,X_{n:n}) = (\xi_{\tau(1)},\ldots,\xi_{\tau(n)})\quad\text{on } B_\tau,$$
and $(\xi_{\tau(1)},\ldots,\xi_{\tau(n)})$ has the same distribution as $(\xi_1,\ldots,\xi_n)$. Moreover, since the r.v.'s $\xi_i$ have a continuous d.f. we know that $\xi_i$ and $\xi_j$ have no ties for $i\ne j$ (that is, $P\{\xi_i = \xi_j\} = 0$) so that $P(\bigcup_{\tau\in S_n}B_\tau) = 1$. Finally, notice that the sets $B_\tau$, $\tau\in S_n$, are mutually disjoint. Let $A_0 = \{(x_1,\ldots,x_n): x_1<x_2<\cdots<x_n\}$, and let $A$ be any Borel set. We obtain
$$P\{\mathbf{X}_n\in A\} = \sum_{\tau\in S_n}P(\{\mathbf{X}_n\in A\}\cap B_\tau) = \sum_{\tau\in S_n}P(\{(\xi_{\tau(1)},\ldots,\xi_{\tau(n)})\in A\}\cap B_\tau)$$
$$= \sum_{\tau\in S_n}P\{(\xi_{\tau(1)},\ldots,\xi_{\tau(n)})\in A\cap A_0\} = n!\,P\{(\xi_1,\ldots,\xi_n)\in A\cap A_0\} = \int_A f_{1,2,\ldots,n:n}(x_1,\ldots,x_n)\,dx_1\cdots dx_n$$
which is the desired representation. □

Theorem 1.4.1 will be specialized to the order statistic of exponential and uniform r.v.'s.
EXAMPLES 1.4.2. (i) If $\xi_1,\ldots,\xi_n$ are i.i.d. standard exponential r.v.'s then
$$f_{1,2,\ldots,n:n}(x_1,\ldots,x_n) = n!\exp\Big[-\sum_{i=1}^n x_i\Big],\qquad 0<x_1<\cdots<x_n, \qquad (1.4.2)$$
and $= 0$, otherwise.
(ii) If $\xi_1,\ldots,\xi_n$ are i.i.d. random variables with uniform distribution on (0,1) then
$$f_{1,2,\ldots,n:n}(x_1,\ldots,x_n) = n!,\qquad 0<x_1<\cdots<x_n<1, \qquad (1.4.3)$$
and $= 0$, otherwise.
Using Example 1.4.2(i) we shall prove that spacings $X_{r:n}-X_{r-1:n}$ of exponential r.v.'s are independent (see Theorem 1.6.1). As an application one obtains the following lemma, which will be the decisive tool to establish the joint density of several (in other words, sparse) order statistics $X_{r_1:n},\ldots,X_{r_k:n}$.

Lemma 1.4.3. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. standard exponential r.v.'s. Then, for $1\le r_1<\cdots<r_k\le n$, the following two results hold:
(i) The spacings $X_{r_1:n},\ X_{r_2:n}-X_{r_1:n},\ \ldots,\ X_{r_k:n}-X_{r_{k-1}:n}$ are independent, and
(ii) $X_{r_i:n}-X_{r_{i-1}:n}\stackrel{d}{=}X_{r_i-r_{i-1}:n-r_{i-1}}$ for $i = 1,\ldots,k$ (where $r_0 = 0$ and $X_{0:n} = 0$).
PROOF. (i) follows from Theorem 1.6.1 since $X_{1:n},\ X_{2:n}-X_{1:n},\ \ldots,\ X_{n:n}-X_{n-1:n}$ are independent.
(ii) From Theorem 1.6.1 we also know that $(n-r+1)(X_{r:n}-X_{r-1:n})$ is a standard exponential r.v. Hence, using an appropriate representation of $X_{s:n}-X_{r:n}$ by means of spacings we obtain for $0\le r<s\le n$,
$$X_{s:n}-X_{r:n} = \sum_{i=1}^{s-r}\frac{(n-(r+i)+1)(X_{r+i:n}-X_{r+i-1:n})}{n-(r+i)+1}$$
$$\stackrel{d}{=}\sum_{i=1}^{s-r}\frac{((n-r)-i+1)(X_{i:n-r}-X_{i-1:n-r})}{(n-r)-i+1} = X_{s-r:n-r}.$$ □
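The spacing facts behind Lemma 1.4.3 can be illustrated by simulation. The normalized spacings $(n-j+1)(X_{j:n}-X_{j-1:n})$ should each have mean 1, and $X_{s:n}-X_{r:n}$ should have the mean of $X_{s-r:n-r}$, namely $\sum_{i=n-s+1}^{n-r}1/i$ (the standard exponential order-statistic mean, quoted here without proof). The choices $n = 6$, $r = 2$, $s = 5$ and the simulation sizes are arbitrary:

```python
import random

rng = random.Random(11)
n, reps = 6, 40000
r, s = 2, 5
norm_spacing_sums = [0.0] * n   # sums of (n - j + 1)(X_{j:n} - X_{j-1:n}), j = 1, ..., n
diff_sum = 0.0                  # sum of X_{s:n} - X_{r:n}
for _ in range(reps):
    xs = sorted(rng.expovariate(1.0) for _ in range(n))
    prev = 0.0                  # convention X_{0:n} = 0
    for j in range(1, n + 1):
        norm_spacing_sums[j - 1] += (n - j + 1) * (xs[j - 1] - prev)
        prev = xs[j - 1]
    diff_sum += xs[s - 1] - xs[r - 1]

spacing_means = [t / reps for t in norm_spacing_sums]   # each should be close to 1
diff_mean = diff_sum / reps
# Lemma 1.4.3(ii): X_{s:n} - X_{r:n} has the law of X_{s-r:n-r}, whose mean is
# 1/(n - r) + 1/(n - r - 1) + ... + 1/(n - s + 1)
target = sum(1.0 / i for i in range(n - s + 1, n - r + 1))
```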

From Lemma 1.4.3 and Theorem 1.3.2 we shall deduce the density of $X_{r_i:n}-X_{r_{i-1}:n}$, and at the next step the joint density of
$$X_{r_1:n},\ X_{r_2:n}-X_{r_1:n},\ \ldots,\ X_{r_k:n}-X_{r_{k-1}:n}$$
in the special case of exponential r.v.'s. Therefore, the joint density of order statistics $X_{r_1:n},\ldots,X_{r_k:n}$ of exponential r.v.'s can easily be established by means of a simple application of the transformation theorem for densities.

Transformation Theorem for Densities


The following version of the well-known transformation theorem for densities will frequently be used in the sequel.
Let $\xi$ be a random vector with density $f$ and range $B$ where $B$ is an open set in the Euclidean $k$-space $\mathbb{R}^k$. Moreover, let $T = (T_1,\ldots,T_k)$ be an $\mathbb{R}^k$-valued, injective map with domain $B$ such that all partial derivatives $\partial T_i/\partial x_j$ are continuous. Denote by $(\partial T/\partial x)$ the matrix $(\partial T_i/\partial x_j)_{i,j}$ of all partial derivatives. Assume that $\det(\partial T/\partial x)$ is unequal to zero on $B$. Then, the density of $T(\xi)$ is given by
$$(f\circ T^{-1})\,|\det(\partial T^{-1}/\partial x)|\,1_{T(B)} \qquad (1.4.4)$$
where $T^{-1}$ denotes the inverse of $T$. It is well known that
$$\det(\partial T^{-1}/\partial x) = 1/\det(\partial T/\partial x)\circ T^{-1} \qquad (1.4.5)$$
under the conditions imposed on $T$.


EXAMPLE 1.4.4. Let $\xi_1,\ldots,\xi_k$ be i.i.d. standard exponential r.v.'s. Put $x = (x_1,\ldots,x_k)$. The joint distribution of the partial sums $\xi_1,\ \xi_1+\xi_2,\ \ldots,\ \sum_{i=1}^k\xi_i$ has the density
$$x\mapsto\exp(-x_k)\,1_D(x) \qquad (1.4.6)$$
where $D = \{y: 0<y_1<\cdots<y_k\}$. This is immediate from (1.4.4) applied to $B = (0,\infty)^k$ and $T_i(x) = \sum_{j=1}^i x_j$. Notice that $T(B) = D$, $T^{-1}(x) = (x_1,x_2-x_1,\ldots,x_k-x_{k-1})$ and $\det(\partial T/\partial x) = 1$ since $(\partial T/\partial x)$ is a triangular matrix with $\partial T_i/\partial x_i = 1$ for $i = 1,\ldots,k$.
The reader is reminded of the fact that $\sum_{i=1}^k\xi_i$ is a gamma r.v. with parameter $k$ (see also Lemma 1.6.6(ii)).
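Example 1.4.4 is easy to illustrate by simulation: the partial sums always lie in $D$, and the last one should follow the gamma law with parameter $k$, which has mean $k$, variance $k$ and d.f. $1-e^{-t}\sum_{j=0}^{k-1}t^j/j!$ (standard facts, not derived in the text). The values $k = 4$ and $t = 3$ are arbitrary:

```python
import math
import random

rng = random.Random(12)
k, reps = 4, 40000
last_sums = []
increasing = True
for _ in range(reps):
    partial, total = [], 0.0
    for _ in range(k):
        total += rng.expovariate(1.0)
        partial.append(total)   # (xi_1, xi_1 + xi_2, ..., xi_1 + ... + xi_k)
    increasing = increasing and all(a < b for a, b in zip(partial, partial[1:]))
    last_sums.append(partial[-1])

mean = sum(last_sums) / reps
var = sum((v - mean) ** 2 for v in last_sums) / (reps - 1)

# Gamma(k) d.f. at t: 1 - exp(-t) * (1 + t + t^2/2! + ... + t^(k-1)/(k-1)!)
t = 3.0
empirical = sum(v <= t for v in last_sums) / reps
exact = 1.0 - math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(k))
```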

The Joint Density of Several Order Statistics


To establish the joint density of $X_{r_1:n},\ldots,X_{r_k:n}$ we shall first examine the special cases of exponential and uniform r.v.'s. Part III of the proof of Theorem 1.4.5 will concern the general case. The proof looks a little technical; however, it can be developed step by step without much effort or imagination. Another advantage of this method is that it is applicable to r.v.'s with continuous d.f.'s (see Theorem 1.5.2).
Theorem 1.4.5. Let $1\le k\le n$ and $0 = r_0<r_1<\cdots<r_k<r_{k+1} = n+1$. Suppose that the common d.f. $F$ of the i.i.d. random variables $\xi_1,\ldots,\xi_n$ is absolutely continuous and has the density $f$. Then, $X_{r_1:n},\ldots,X_{r_k:n}$ have the joint density $f_{r_1,r_2,\ldots,r_k:n}$ given by
$$f_{r_1,r_2,\ldots,r_k:n}(x) = n!\Big(\prod_{i=1}^k f(x_i)\Big)\prod_{i=1}^{k+1}\frac{(F(x_i)-F(x_{i-1}))^{r_i-r_{i-1}-1}}{(r_i-r_{i-1}-1)!}$$
if $0<F(x_1)<F(x_2)<\cdots<F(x_k)<1$, and $= 0$, otherwise. [We use the convention that $F(x_0) = 0$ and $F(x_{k+1}) = 1$.]

PROOF. (I) First assume that $\xi_1, \ldots, \xi_n$ are standard exponential r.v.'s. Lemma 1.4.3 and Theorem 1.3.2 imply that the joint density $g$ of
$$X_{r_1:n},\ X_{r_2:n} - X_{r_1:n},\ \ldots,\ X_{r_k:n} - X_{r_{k-1}:n}$$
is given by
$$g(x) = \prod_{i=1}^k\left[\frac{(n - r_{i-1})!}{(r_i - r_{i-1} - 1)!\,(n - r_i)!}\,(1 - e^{-x_i})^{r_i - r_{i-1} - 1}\,(e^{-x_i})^{n - r_i + 1}\right]$$
if $x_i \ge 0$, $i = 1, \ldots, k$, and $= 0$ otherwise.
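From (1.4.4) and Example 1.4.4 we get, writing in short $f_{r,n}$ instead of $f_{r_1,\ldots,r_k:n}$ and $C = n!/\prod_{i=1}^{k+1}(r_i - r_{i-1} - 1)!$, that for $0 = x_0 < x_1 < \cdots < x_k$,
$$\begin{aligned}
f_{r,n}(x) &= g(x_1 - x_0, \ldots, x_k - x_{k-1})\\
&= C\prod_{i=1}^k e^{-(n - r_i + 1)(x_i - x_{i-1})}\left[1 - e^{-(x_i - x_{i-1})}\right]^{r_i - r_{i-1} - 1}\\
&= C\prod_{i=1}^k e^{-(n - r_i + 1)(x_i - x_{i-1})}\,e^{(r_i - r_{i-1} - 1)x_{i-1}}\left[e^{-x_{i-1}} - e^{-x_i}\right]^{r_i - r_{i-1} - 1},
\end{aligned}$$
and $f_{r,n} = 0$ otherwise. Collecting the exponential factors, the last product equals $\left(\prod_{i=1}^k e^{-x_i}\right)(e^{-x_k})^{n - r_k}\prod_{i=1}^k\left[e^{-x_{i-1}} - e^{-x_i}\right]^{r_i - r_{i-1} - 1}$, which is the asserted density for $F(x) = 1 - e^{-x}$ and $f(x) = e^{-x}$, $x \ge 0$, since $n - r_k = r_{k+1} - r_k - 1$. The proof for the exponential case is complete.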
(II) For $X_{i:n}$ as in part I we obtain, according to Theorem 1.2.5(ii), that
$$(U_{r_1:n}, \ldots, U_{r_k:n}) \stackrel{d}{=} (G(X_{r_1:n}), \ldots, G(X_{r_k:n}))$$
where $G(x) = 1 - e^{-x}$, $x \ge 0$. Using this representation, the assertion in the uniform case is immediate from part I and (1.4.4) applied to $B = \{x: 0 < x_1 < \cdots < x_k\}$ and $T(x) = (G(x_1), \ldots, G(x_k))$.

(III) Denote by $Q$ the probability measure pertaining to $F$, and by $g_{r,n}$ the density of $(U_{r_1:n}, \ldots, U_{r_k:n})$. Since $Q^k$ has the density $x \mapsto \prod_{i=1}^k f(x_i)$, it suffices to prove that for all $t_1, \ldots, t_k$ the identity
$$P\{X_{r_1:n} \le t_1, \ldots, X_{r_k:n} \le t_k\} = \int_{\times_{i=1}^k(-\infty, t_i]} g_{r,n}(F(x_1), \ldots, F(x_k))\,dQ^k(x_1, \ldots, x_k)$$
holds. From Corollary 1.2.7 and part II we get
$$\begin{aligned}
P\{X_{r_1:n} \le t_1, \ldots, X_{r_k:n} \le t_k\} &= P\{U_{r_1:n} \le F(t_1), \ldots, U_{r_k:n} \le F(t_k)\}\\
&= \int_{\times_{i=1}^k(-\infty, F(t_i)]} g_{r,n}(x_1, \ldots, x_k)\,dx_1\cdots dx_k\\
&= \int 1_{\times_{i=1}^k(-\infty, F(t_i)]}(F(x_1), \ldots, F(x_k))\,g_{r,n}(F(x_1), \ldots, F(x_k))\,dQ^k(x_1, \ldots, x_k)
\end{aligned}$$
where the 3rd identity follows by means of the probability integral transformation (Lemma 1.2.4(ii)). This lemma is applicable since $F$ is continuous. The proof is complete if
$$1_{(-\infty, F(t)]}(F(x)) = 1_{(-\infty, t]}(x) \quad \text{for } Q\text{-almost all } x.$$
This, however, is obvious from the fact that $(-\infty, t] \subset \{y: F(y) \le F(t)\}$ and that both sets have equal probability w.r.t. $Q$ (prove this by applying the probability integral transformation).
□
Remark 1.4.6. The condition $0 < F(x_1) < \cdots < F(x_k) < 1$ in Theorem 1.4.5 can be replaced by the condition $x_1 < \cdots < x_k$. To prove this, notice that
$$\{0 < F(\xi_1) < \cdots < F(\xi_k) < 1\} \subset \{\xi_1 < \cdots < \xi_k\}$$
and show that both sets have the same probability.

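We mention some special cases. For $k = 1$ and $k = n$ we obtain again Theorem 1.3.2 and Theorem 1.4.1. Moreover, we note the joint density of the $k$ smallest and of the $k$ largest order statistics. We have
$$f_{1,2,\ldots,k:n}(x) = n!\left[\prod_{i=1}^k f(x_i)\right]\frac{(1 - F(x_k))^{n-k}}{(n-k)!}, \quad x_1 < \cdots < x_k, \tag{1.4.7}$$
and $= 0$ otherwise. Moreover,
$$f_{n-k+1,\ldots,n:n}(x) = n!\left[\prod_{i=1}^k f(x_i)\right]\frac{F(x_1)^{n-k}}{(n-k)!}, \quad x_1 < \cdots < x_k, \tag{1.4.8}$$
and $= 0$ otherwise. The joint density of $(X_{1:n}, X_{n:n})$ is given by
$$f_{1,n:n}(x, y) = n(n-1)f(x)f(y)(F(y) - F(x))^{n-2}, \quad x < y,$$
and $= 0$ otherwise.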
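The density of Theorem 1.4.5 is easy to evaluate numerically. The following Python function is a sketch for illustration only: its name and the callables `F` and `f` (the d.f. and the density) are hypothetical names chosen here. It uses the conventions $r_0 = 0$, $r_{k+1} = n + 1$, $F(x_0) = 0$, $F(x_{k+1}) = 1$ and returns 0 outside the region $0 < F(x_1) < \cdots < F(x_k) < 1$.

```python
from math import factorial, prod


def joint_os_density(x, ranks, n, F, f):
    """Joint density of (X_{r_1:n}, ..., X_{r_k:n}) at x = (x_1, ..., x_k)
    as in Theorem 1.4.5, for an absolutely continuous d.f. F with density f."""
    k = len(ranks)
    if len(x) != k:
        raise ValueError("need one argument x_i per rank r_i")
    if any(a >= b for a, b in zip(ranks, ranks[1:])) or ranks[0] < 1 or ranks[-1] > n:
        raise ValueError("need 1 <= r_1 < ... < r_k <= n")
    # Conventions r_0 = 0, r_{k+1} = n + 1, F(x_0) = 0 and F(x_{k+1}) = 1.
    r = [0, *ranks, n + 1]
    Fx = [0.0, *(F(t) for t in x), 1.0]
    # The density vanishes unless 0 < F(x_1) < ... < F(x_k) < 1.
    if any(Fx[i] >= Fx[i + 1] for i in range(k + 1)):
        return 0.0
    value = factorial(n) * prod(f(t) for t in x)
    for i in range(1, k + 2):
        gap = r[i] - r[i - 1] - 1  # number of observations strictly between x_{i-1} and x_i
        value *= (Fx[i] - Fx[i - 1]) ** gap / factorial(gap)
    return value


# Uniform distribution on (0, 1), used as a test case.
def uniform_cdf(t):
    return min(max(t, 0.0), 1.0)


def uniform_pdf(t):
    return 1.0 if 0.0 < t < 1.0 else 0.0
```

For instance, `joint_os_density([0.2, 0.9], [1, 4], 4, uniform_cdf, uniform_pdf)` evaluates the density of $(X_{1:4}, X_{4:4})$ and agrees, up to rounding, with $n(n-1)(F(y) - F(x))^{n-2} = 12 \cdot 0.7^2 = 5.88$.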
A slight modification of the proof of Theorem 1.4.5 will enable us to
establish the corresponding result for continuous d.f.'s.

1.5. Extensions to Continuous and Discontinuous Distribution Functions
The results of this section are not required for the understanding of the main
ideas of this book and can be omitted at the first reading.
Let $\xi_1, \ldots, \xi_n$ be again i.i.d. random variables with common distribution $Q$ and d.f. $F$. It is easy to check that the joint distribution of $k$ order statistics possesses a $Q^k$-density. To simplify the arguments, let us treat the case of a single order statistic $X_{r:n}$. Since $\{X_{r:n} \in B\} \subset \bigcup_{i=1}^n\{\xi_i \in B\}$ we have $P\{X_{r:n} \in B\} \le nP\{\xi_1 \in B\}$; thus, $P\{\xi_1 \in B\} = 0$ implies $P\{X_{r:n} \in B\} = 0$ for every Borel set $B$. Therefore, the distribution of $X_{r:n}$ is absolutely continuous w.r.t. $Q$, and hence the Radon-Nikodym theorem implies that $X_{r:n}$ has a $Q$-density.
The knowledge that this density exists stimulates interest in its explicit form. One can argue that Theorem 1.5.1 is highly sophisticated; however, in many cases one would otherwise only be able to prove less elegant results (see e.g. P.1.31).

Density of a Single Order Statistic under a Continuous D.F.


First we give an alternative proof of Theorem 1.3.2. This proof enables us to weaken the condition that $F$ is absolutely continuous to the condition that $F$ is continuous.


Theorem 1.5.1. Let $X_{r:n}$ be the $r$th order statistic of $n$ i.i.d. random variables with common continuous d.f. $F$. Then $X_{r:n}$ has the $F$-density
$$n!\,\frac{F^{r-1}(1 - F)^{n-r}}{(r-1)!\,(n-r)!}. \tag{1.5.1}$$

PROOF. It suffices to prove that
$$P\{X_{r:n} \le x\} = \int_{-\infty}^x H'(F)\,dF$$
with $H'$ as in (1.3.6). According to (1.2.4), Criterion 1.2.3 and (1.2.9), the right-hand side above is equal to $\int_0^{F(x)} H'(y)\,dy$. Moreover,
$$\int_0^{F(x)} H'(y)\,dy = H(F(x)) = P\{X_{r:n} \le x\}.$$

Notice that Theorem 1.3.2 is immediate from Theorem 1.5.1 under the
condition that F is absolutely continuous.

Joint Density of Several Order Statistics under a Continuous D.F.
Another look at the proof of Theorem 1.4.5 reveals that the essential condition adopted in the proof was the continuity of the d.f. $F$. Only in a second step did we make use of the density $x \mapsto \prod_{i=1}^k f(x_i)$. When this second step is omitted from the proof, one gets the following theorem for continuous d.f.'s, which is an extension of Theorem 1.4.5.
Theorem 1.5.2. Let $1 \le k \le n$ and $0 = r_0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$. Let $\xi_1, \ldots, \xi_n$ be i.i.d. random variables with common distribution $Q$ and d.f. $F$. If $F$ is continuous, then the order statistics $X_{r_1:n}, \ldots, X_{r_k:n}$ have the joint $Q^k$-density $g_{r_1,\ldots,r_k:n}$ given by
$$g_{r_1,\ldots,r_k:n}(x) = n!\prod_{i=1}^{k+1}\frac{(F(x_i) - F(x_{i-1}))^{r_i - r_{i-1} - 1}}{(r_i - r_{i-1} - 1)!} \tag{1.5.2}$$
if $x_1 < x_2 < \cdots < x_k$, and $= 0$ otherwise (where again $F(x_0) = 0$ and $F(x_{k+1}) = 1$).

Note that Theorem 1.4.5 is immediate from Theorem 1.5.2 since $Q^k$ has the Lebesgue density $x \mapsto \prod_{i=1}^k f(x_i)$ if $Q$ has the Lebesgue density $f$.
Remark 1.5.3. Part III of the proof of Theorem 1.4.5 shows that the following result holds true: Let $Q_0$ be the uniform distribution on $(0, 1)$ and let $Q_1$ be a probability measure with continuous d.f. $F$. If $(\zeta_1, \ldots, \zeta_k)$ is a random vector with $Q_0^k$-density $g$, then the random vector $(F^{-1}(\zeta_1), \ldots, F^{-1}(\zeta_k))$ has the $Q_1^k$-density
$$x \mapsto g(F(x_1), \ldots, F(x_k)).$$

Probability Integral Transformation for Discontinuous D.F.'s


Let $\xi$ be a r.v. with distribution $Q$ having a continuous d.f. $F$. The uniformly distributed r.v. $F(\xi)$, as studied in Lemma 1.2.4(ii), corresponds to the following experiment: if $x$ is a realization of $\xi$, then in a second step the realization $F(x)$ is observed.
Next, let $F$ be discontinuous at $x$. Consider a 2-stage random experiment where we include a further r.v. which is uniformly distributed on the interval $(F(x-), F(x))$. Here, $F(x-)$ denotes again the left-hand limit of $F$ at $x$. For example, we may take the r.v. $F(x-) + \eta(F(x) - F(x-))$ where $\eta$ is uniformly distributed on $(0, 1)$.
If $x$ is a realization of $\xi$ and $y$ is a realization of $\eta$, then the final outcome of the experiment will be $F(x-) + y(F(x) - F(x-))$. This 2-stage random experiment is also governed by the uniform distribution. This idea will be made rigorous in the following lemma.
Lemma 1.5.4. Suppose that $\xi$ is a r.v. with d.f. $F$, and that $\eta$ is a r.v. with uniform distribution on $(0, 1)$. Moreover, $\xi$ and $\eta$ are assumed to be independent. Define
$$H(y, x) = F(x-) + y(F(x) - F(x-)). \tag{1.5.3}$$
Then $H(\eta, \xi)$ is uniformly distributed on $(0, 1)$.

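PROOF. It suffices to prove that $P\{H(\eta, \xi) < q\} = q$ for every $q \in (0, 1)$. From (1.2.9) we know that $\xi < F^{-1}(q)$ implies $F(\xi) < q$, and $\xi > F^{-1}(q)$ implies $F(\xi-) \ge q$. Since $F(\xi-) \le H(\eta, \xi) \le F(\xi)$, we have, by setting $x = F^{-1}(q)$ (recall that $F(x-) \le q \le F(x)$),
$$\begin{aligned}
P\{H(\eta, \xi) < q\} &= P\{\xi < x\} + P\{H(\eta, \xi) < q,\ \xi = x\}\\
&= F(x-) + P\{F(x-) + \eta(F(x) - F(x-)) < q\}\,P\{\xi = x\} = q.
\end{aligned}$$
□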
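Lemma 1.5.4 can be illustrated by simulation. The sketch below is a hypothetical illustration: the three-point distribution, the seed, the sample size and the helper names are choices made here, not taken from the text. It applies $H(\eta, x) = F(x-) + \eta(F(x) - F(x-))$ with an independent uniform $\eta$ and reports the empirical probabilities $P\{H(\eta, \xi) < q\}$, which should be close to $q$.

```python
import random


def randomized_pit(sample, support, probs, rng):
    """Return H(eta, x) = F(x-) + eta * (F(x) - F(x-)) for each x in sample,
    where F is the d.f. of the discrete law putting mass probs[j] on support[j]
    (support sorted increasingly) and eta is uniform on (0, 1)."""
    left, jump = {}, {}   # F(x-) and F(x) - F(x-) at each atom x
    cumulative = 0.0
    for x, p in zip(support, probs):
        left[x] = cumulative
        jump[x] = p
        cumulative += p
    return [left[x] + rng.random() * jump[x] for x in sample]


def empirical_cdf_below(values, q):
    """Fraction of values that are strictly smaller than q."""
    return sum(v < q for v in values) / len(values)


rng = random.Random(12345)
support, probs = [0, 1, 2], [0.2, 0.5, 0.3]
xi = rng.choices(support, weights=probs, k=100_000)
h = randomized_pit(xi, support, probs, rng)
for q in (0.1, 0.35, 0.6, 0.9):
    print(q, round(empirical_cdf_below(h, q), 3))
```

By contrast, the non-randomized values $F(\xi)$ take only three distinct values here, so they cannot be uniformly distributed.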
Lemma 1.5.4 will be reformulated by using a Markov kernel $K$. Note that inducing with the d.f. $F$ is equivalent to inducing with the Markov kernel $(B, x) \mapsto 1_B(F(x))$.

Corollary 1.5.5. Let $Q$ be a probability measure with d.f. $F$. Define $K(B|x) = 1_B(F(x))$ for every Borel set $B$ if $x$ is a continuity point of the d.f. $F$, and let $K(\cdot|x)$ be the uniform distribution on $(F(x-), F(x))$ if $F$ is discontinuous at $x$. Then
$$KQ = \int K(\cdot|x)\,dF(x)$$
is the uniform distribution on $(0, 1)$.


PROOF. Let $\xi$ and $\eta$ be as in Lemma 1.5.4. Thus, $K(\cdot|x)$ is the distribution of $F(x-) + \eta(F(x) - F(x-))$. By Fubini's theorem we obtain for every $t \in (0, 1)$,
$$\begin{aligned}
\int K((-\infty, t]|x)\,dF(x) &= \int P\{F(x-) + \eta(F(x) - F(x-)) \le t\}\,dF(x)\\
&= P\{F(\xi-) + \eta(F(\xi) - F(\xi-)) \le t\} = t
\end{aligned}$$
where the final identity is obvious from Lemma 1.5.4.

Joint Density of Order Statistics under a Discontinuous D.F.


Hereafter, let $\xi_1, \ldots, \xi_n$ be i.i.d. random variables with common distribution $Q$ and d.f. $F$. For example, $F$ is allowed to be a discrete d.f. Let again
$$H(y, x) = F(x-) + y(F(x) - F(x-)).$$

Theorem 1.5.6. For $1 \le k \le n$ and $0 = r_0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$ the $Q^k$-density of $(X_{r_1:n}, \ldots, X_{r_k:n})$, say $f_{r_1,\ldots,r_k:n}$, is given by
$$f_{r_1,\ldots,r_k:n}(x_1, \ldots, x_k) = \int_{(0,1)^k} g_{r_1,\ldots,r_k:n}(H(y_1, x_1), \ldots, H(y_k, x_k))\,dy_1\cdots dy_k$$
where $g_{r_1,\ldots,r_k:n}$ is the joint density of $U_{r_1:n}, \ldots, U_{r_k:n}$.
PROOF. The proof runs along the lines of part (III) in the proof of Theorem 1.4.5. Instead of Lemma 1.2.4(ii) apply its extension Lemma 1.5.4 to discontinuous d.f.'s. We have
$$\begin{aligned}
P\{X_{r_1:n} \le t_1, \ldots, X_{r_k:n} \le t_k\} &= E\left[1_{\times_{i=1}^k(-\infty, F(t_i)]}(H(\eta_1, \xi_1), \ldots, H(\eta_k, \xi_k))\,g_{r_1,\ldots,r_k:n}(H(\eta_1, \xi_1), \ldots, H(\eta_k, \xi_k))\right]\\
&= E\left[1_{\times_{i=1}^k(-\infty, t_i]}(\xi_1, \ldots, \xi_k)\,g_{r_1,\ldots,r_k:n}(H(\eta_1, \xi_1), \ldots, H(\eta_k, \xi_k))\right]
\end{aligned}$$
where $\eta_1, \xi_1, \ldots, \eta_k, \xi_k$ are independent r.v.'s such that $\xi_1, \ldots, \xi_k$ possess the common d.f. $F$, and $\eta_1, \ldots, \eta_k$ are uniformly distributed on $(0, 1)$. The second identity is established in the same way as the corresponding step in the proof of Theorem 1.4.5 by applying Lemma 1.5.4 instead of Lemma 1.2.4(ii). Now the assertion is immediate by applying Fubini's theorem.
□
Notice that $H(y_1, x_1) < H(y_2, x_2)$ if and only if either $x_1 < x_2$, or $x_1 = x_2$ and $y_1 < y_2$. Hence, by using the lexicographical ordering one may write Theorem 1.5.6 in a different way:
Corollary 1.5.7. Define $B_k$ as the set of all vectors $(x_1, y_1, \ldots, x_k, y_k)$ with $0 < y_i < 1$, $i = 1, \ldots, k$, and $x_i < x_{i+1}$, or $x_i = x_{i+1}$ and $y_i < y_{i+1}$, for $i = 1, \ldots, k - 1$. Then the density $f_{r_1,\ldots,r_k:n}$ given in Theorem 1.5.6 is of the following form:
$$f_{r_1,\ldots,r_k:n}(x) = n!\int 1_{B_k}(x_1, y_1, \ldots, x_k, y_k)\prod_{i=1}^{k+1}\frac{[H(y_i, x_i) - H(y_{i-1}, x_{i-1})]^{r_i - r_{i-1} - 1}}{(r_i - r_{i-1} - 1)!}\,dy_1\cdots dy_k$$
(with the convention that $H(y_0, x_0) = 0$ and $H(y_{k+1}, x_{k+1}) = 1$).
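For $k = 1$, Theorem 1.5.6 yields a computable formula for discrete d.f.'s: integrating $g_{r:n}(H(y, x))$ over $y \in (0, 1)$ and multiplying by the atom mass $Q(\{x\}) = F(x) - F(x-)$ gives $P\{X_{r:n} = x\} = P\{F(x-) < U_{r:n} \le F(x)\}$. The Python sketch below, with hypothetical function names, checks this against brute-force enumeration of all $n$-tuples; the three-point distribution is an arbitrary choice for illustration.

```python
from itertools import product
from math import comb


def uniform_os_cdf(p, r, n):
    """P{U_{r:n} <= p}: at least r of n uniform r.v.'s fall into [0, p]."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r, n + 1))


def os_pmf_via_theorem(r, n, support, probs):
    """P{X_{r:n} = x} for each atom x via Theorem 1.5.6 with k = 1: the Q-density
    at x times Q({x}) equals G(F(x)) - G(F(x-)), where G is the d.f. of U_{r:n}."""
    pmf, f_left = {}, 0.0
    for x, p in zip(support, probs):
        f_right = f_left + p
        pmf[x] = uniform_os_cdf(f_right, r, n) - uniform_os_cdf(f_left, r, n)
        f_left = f_right
    return pmf


def os_pmf_brute_force(r, n, support, probs):
    """Exact P{X_{r:n} = x} by summing over all n-tuples of atoms."""
    pmf = {x: 0.0 for x in support}
    for idx in product(range(len(support)), repeat=n):
        weight = 1.0
        for j in idx:
            weight *= probs[j]
        pmf[support[sorted(idx)[r - 1]]] += weight  # support is sorted increasingly
    return pmf


support, probs = [0, 1, 2], [0.2, 0.5, 0.3]
for r in range(1, 5):
    a = os_pmf_via_theorem(r, 4, support, probs)
    b = os_pmf_brute_force(r, 4, support, probs)
    print(r, [round(a[x], 6) for x in support], [round(b[x], 6) for x in support])
```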

I.N.N.I.D. Random Variables


This is perhaps the proper place to mention an interesting result due to Guilbaud (1982). This result connects the distribution of order statistics of i.n.n.i.d. (independent, not necessarily identically distributed) random variables with that of order statistics of i.i.d. random variables.
Theorem 1.5.8. Let $X_{1:n} \le \cdots \le X_{n:n}$ be the order statistics of i.n.n.i.d. random variables $\xi_1, \ldots, \xi_n$. Denote by $F_i$ the d.f. of $\xi_i$.
Then, for every Borel set $B$,

where the summation runs over all subsets $S$ of $\{1, \ldots, n\}$ with $m$ elements. Moreover, $X^S_{1:n} \le \cdots \le X^S_{n:n}$ are the order statistics of $n$ i.i.d. random variables with common d.f.
$$F_S = |S|^{-1}\sum_{i \in S} F_i.$$

We do not know whether Theorem 1.5.8 is of any practical relevance.

1.6. Spacings, Representations, Generalized Pareto Distribution Functions
In this section we collect some results concerning spacings (and thus also order statistics) of generalized Pareto r.v.'s. We start with the particular cases of exponential and uniform r.v.'s.

Spacings of Exponential R.V.'s


The independence of spacings of exponential r.v.'s was already applied to establish the joint density of several order statistics. The following well-known result is due to Sukhatme (1937) and Rényi (1953).
Theorem 1.6.1. If $X_{1:n}, \ldots, X_{n:n}$ are the order statistics of i.i.d. standard exponential r.v.'s $\eta_1, \ldots, \eta_n$, then
(i) the spacings $X_{1:n}, X_{2:n} - X_{1:n}, \ldots, X_{n:n} - X_{n-1:n}$ are independent, and
(ii) $(n - r + 1)(X_{r:n} - X_{r-1:n})$ is again a standard exponential r.v. for each $r = 1, \ldots, n$ (with the convention that $X_{0:n} = 0$).
PROOF. Put $x = (x_1, \ldots, x_n)$. It suffices to prove that the function
$$x \mapsto \prod_{i=1}^n \exp(-x_i)\,1_{(0,\infty)}(x_i)$$
is a joint density of
$$nX_{1:n},\ (n-1)(X_{2:n} - X_{1:n}),\ \ldots,\ X_{n:n} - X_{n-1:n}.$$
From Example 1.4.2(i), where the density of the order statistics of exponential r.v.'s was established, the desired result is immediate by applying the transformation theorem for densities to the map $T = (T_1, \ldots, T_n)$ defined by $T_i(x) = (n - i + 1)(x_i - x_{i-1})$, $i = 1, \ldots, n$ (with $x_0 = 0$).
Notice that $\det(\partial T/\partial x) = n!$ and $T^{-1}(x) = \left(\sum_{j=1}^i x_j/(n - j + 1)\right)_{i=1}^n$. Moreover, use the fact that $\sum_{i=1}^n x_i = \sum_{i=1}^n\sum_{j=1}^i x_j/(n - j + 1)$.
□
From Theorem 1.6.1 the following representation for order statistics $X_{r:n}$ of exponential r.v.'s is immediate:
$$(X_{r:n})_{r=1}^n \stackrel{d}{=} \left(\sum_{i=1}^r \frac{\eta_i}{n - i + 1}\right)_{r=1}^n. \tag{1.6.1}$$
Note that spacings of independent r.v.'s $\eta_1, \ldots, \eta_n$ with common d.f. $F(x) = 1 - \exp[-\alpha(x - b)]$, $x \ge b$, are also independent. It is well known (see e.g. Galambos (1987), Theorem 1.6.3) that these d.f.'s are the only continuous d.f.'s for which spacings are independent.
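Representation (1.6.1) gives a way to generate exponential order statistics without sorting. The Monte Carlo sketch below (seeds, sample sizes and helper names are arbitrary choices made here) compares the componentwise averages of sorted exponential samples and of the partial sums $\sum_{i \le r} \eta_i/(n - i + 1)$ with the exact means $\sum_{i=1}^r (n - i + 1)^{-1}$, cf. (1.7.7).

```python
import random


def sorted_exponentials(n, rng):
    """(X_{1:n}, ..., X_{n:n}) by sorting n standard exponential draws."""
    return sorted(rng.expovariate(1.0) for _ in range(n))


def renyi_representation(n, rng):
    """(X_{1:n}, ..., X_{n:n}) via partial sums of eta_i / (n - i + 1), cf. (1.6.1)."""
    out, total = [], 0.0
    for i in range(1, n + 1):
        total += rng.expovariate(1.0) / (n - i + 1)
        out.append(total)
    return out


def mean_vector(sampler, n, reps, seed):
    """Componentwise Monte Carlo means of the vectors produced by sampler."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(reps):
        for r, value in enumerate(sampler(n, rng)):
            sums[r] += value
    return [s / reps for s in sums]


n = 5
exact = [sum(1.0 / (n - i + 1) for i in range(1, r + 1)) for r in range(1, n + 1)]
by_sorting = mean_vector(sorted_exponentials, n, 20_000, seed=1)
by_renyi = mean_vector(renyi_representation, n, 20_000, seed=2)
```

Agreement of means is of course only a plausibility check; the distributional identity itself is the content of Theorem 1.6.1.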

Ratios of Order Statistics of Uniform R.V.'s


Spacings of uniform r.v.'s cannot be independent. However, Malmquist (1950) showed that certain ratios of order statistics $U_{i:n}$ of uniform r.v.'s are independent. This will be immediate from Theorem 1.6.1. A simple generalization may be found at the end of the section.
Corollary 1.6.2.
(i) $1 - U_{1:n},\ (1 - U_{2:n})/(1 - U_{1:n}),\ \ldots,\ (1 - U_{n:n})/(1 - U_{n-1:n})$ are independent r.v.'s, and
(ii) $(1 - U_{r:n})/(1 - U_{r-1:n}) \stackrel{d}{=} U_{n-r+1:n-r+1}$, $r = 1, \ldots, n$
(with the convention that $U_{0:n} = 0$).

PROOF. Let $X_{r:n}$ be as in Theorem 1.6.1 and let $F$ be the standard exponential d.f. Since $(U_{r:n})_{r=1}^n \stackrel{d}{=} (F(X_{r:n}))_{r=1}^n$ we get
$$\left[\frac{1 - U_{r:n}}{1 - U_{r-1:n}}\right]_{r=1}^n \stackrel{d}{=} \left[\frac{1 - F(X_{r:n})}{1 - F(X_{r-1:n})}\right]_{r=1}^n = \left[\exp(-(X_{r:n} - X_{r-1:n}))\right]_{r=1}^n$$
which yields (i) according to Theorem 1.6.1.
Moreover, by Lemma 1.4.3(ii) and Example 1.2.2 we obtain
$$\exp(-(X_{r:n} - X_{r-1:n})) \stackrel{d}{=} 1 - F(X_{1:n-r+1}) \stackrel{d}{=} 1 - U_{1:n-r+1} \stackrel{d}{=} U_{n-r+1:n-r+1}.$$
The proof of (ii) is complete.

The original result of Malmquist is a slight modification of Corollary 1.6.2.
Corollary 1.6.3.
(i) $U_{1:n}/U_{2:n}, \ldots, U_{n-1:n}/U_{n:n}, U_{n:n}$ are independent r.v.'s, and
(ii) $U_{r:n}/U_{r+1:n} \stackrel{d}{=} U_{r:r}$ for $r = 1, \ldots, n$
(with the convention that $U_{n+1:n} = 1$).
PROOF. Immediate from Corollary 1.6.2 since by Example 1.2.2
$$\left(U_{r:n}/U_{r+1:n}\right)_{r=1}^n \stackrel{d}{=} \left[(1 - U_{n-r+1:n})/(1 - U_{n-r:n})\right]_{r=1}^n.$$
Since $U_{r:n}/U_{r+1:n}$ and $U_{r+1:n}$ are independent, one might expect that $U_{r:n}$ and $U_{r:n}/U_{r+1:n}$ are also independent; this, however, is wrong. This becomes obvious by noting that $0 \le U_{r:n} \le U_{r:n}/U_{r+1:n} \le 1$.
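Corollary 1.6.3 and the warning after it can be checked numerically. In this sketch (seed, $n = 6$, $r = 3$ and the helper names are choices made here) the empirical value of $P\{U_{r:n}/U_{r+1:n} \le 1/2\}$ should be near $P\{U_{r:r} \le 1/2\} = 2^{-r}$, the sample correlation between $U_{r:n}/U_{r+1:n}$ and $U_{r+1:n}$ should be near zero, while the correlation between $U_{r:n}/U_{r+1:n}$ and $U_{r:n}$ is clearly positive. Vanishing correlation is only a necessary consequence of independence, not a proof of it.

```python
import random


def correlation(a, b):
    """Sample correlation coefficient of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def malmquist_sample(n, r, reps, seed):
    """Draw U_{r:n}/U_{r+1:n}, U_{r+1:n} and U_{r:n} (1 <= r < n) reps times."""
    rng = random.Random(seed)
    ratios, upper, lower = [], [], []
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))
        ratios.append(u[r - 1] / u[r])
        upper.append(u[r])
        lower.append(u[r - 1])
    return ratios, upper, lower


n, r = 6, 3
ratios, upper, lower = malmquist_sample(n, r, 40_000, seed=7)
p_half = sum(x <= 0.5 for x in ratios) / len(ratios)  # compare with 0.5 ** r
corr_independent = correlation(ratios, upper)         # should be near 0
corr_dependent = correlation(ratios, lower)           # clearly positive
```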

Representations of Order Statistics of Uniform R.V.'s


One purpose of the following lines will be to establish a representation of the order statistics $U_{1:n}, \ldots, U_{n:n}$ related to that in (1.6.1). In a preparatory step we prove the following.
Lemma 1.6.4. Let $\eta_1, \ldots, \eta_{n+1}$ be independent exponential r.v.'s with $\eta_i$ having the d.f. $F_i(x) = 1 - \exp(-\alpha_i x)$ for $x \ge 0$, where $\alpha_i > 0$. Put $\zeta_i = \eta_i/(\sum_{j=1}^{n+1}\eta_j)$, $i = 1, \ldots, n$, and $\zeta_{n+1} = \sum_{j=1}^{n+1}\eta_j$. Then the joint density of $\zeta_1, \ldots, \zeta_{n+1}$, say $g_{n+1}$, is given by
$$g_{n+1}(x_1, \ldots, x_{n+1}) = \left(\prod_{i=1}^{n+1}\alpha_i\right)x_{n+1}^n\exp\left[-x_{n+1}\left(\alpha_{n+1} + \sum_{i=1}^n(\alpha_i - \alpha_{n+1})x_i\right)\right]$$
if $x_i > 0$ for $i = 1, \ldots, n + 1$ and $\sum_{i=1}^n x_i < 1$, and $g_{n+1} = 0$ otherwise.


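PROOF. The transformation theorem for densities (see (1.4.4)) is applicable to $B = (0, \infty)^{n+1}$ and $T = (T_1, \ldots, T_{n+1})$ where $T_{n+1}(x) = \sum_{j=1}^{n+1}x_j$ and $T_i(x) = x_i/\sum_{j=1}^{n+1}x_j$ for $i = 1, \ldots, n$. The range of $T$ is given by
$$T(B) = \left\{x: x_i > 0 \text{ for } i = 1, \ldots, n + 1 \text{ and } \sum_{j=1}^n x_j < 1\right\}.$$
The inverse function $S = (S_1, \ldots, S_{n+1})$ of $T$ is given by $S_i(x) = x_i x_{n+1}$ for $i = 1, \ldots, n$ and $S_{n+1}(x) = (1 - \sum_{i=1}^n x_i)x_{n+1}$. Since the joint density of $\eta_1, \ldots, \eta_{n+1}$ is given by
$$x \mapsto \prod_{i=1}^{n+1}\alpha_i\exp(-\alpha_i x_i), \quad x_i > 0,$$
the asserted form of $g_{n+1}$ is immediate from (1.4.4) if $\det(\partial S/\partial x) = x_{n+1}^n$ (where $\partial S/\partial x$ is the matrix of partial derivatives). This, however, follows at once from the equation
$$\begin{pmatrix} 1 & & & 0\\ & \ddots & & \vdots\\ & & 1 & 0\\ -1 & \cdots & -1 & 1 \end{pmatrix}\begin{pmatrix} x_{n+1} & & & x_1\\ & \ddots & & \vdots\\ & & x_{n+1} & x_n\\ 0 & \cdots & 0 & 1 \end{pmatrix} = \begin{pmatrix} x_{n+1} & & & x_1\\ & \ddots & & \vdots\\ & & x_{n+1} & x_n\\ -x_{n+1} & \cdots & -x_{n+1} & 1 - \sum_{i=1}^n x_i \end{pmatrix}$$
since $\det(AB) = \det(A)\det(B)$; the first two matrices are triangular with determinants $1$ and $x_{n+1}^n$. Notice that the 3rd matrix is $\partial S/\partial x$.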

The joint density of the r.v.'s $\zeta_i = \eta_i/(\sum_{j=1}^{n+1}\eta_j)$, $i = 1, \ldots, n$, was computed in a more direct way by Weiss (1965).
Corollary 1.6.5. The r.v.'s $\zeta_i$, $i = 1, \ldots, n$, above have the joint density $h_n$ given by
$$h_n(x_1, \ldots, x_n) = n!\left(\prod_{i=1}^{n+1}\alpha_i\right)\left[\alpha_{n+1} + \sum_{i=1}^n(\alpha_i - \alpha_{n+1})x_i\right]^{-(n+1)}$$
if $x_i > 0$ for $i = 1, \ldots, n$ and $\sum_{i=1}^n x_i < 1$, and $h_n = 0$ otherwise.

PROOF. Straightforward by applying Lemma 1.6.4 and by computing the density of the marginal distribution in the first $n$ coordinates (use $\int_0^\infty t^n e^{-ct}\,dt = n!/c^{n+1}$ for $c > 0$).
□

Lemma 1.6.4 will only be applied in the special case of i.i.d. random variables. We specialize Lemma 1.6.4 to the case $\alpha_1 = \alpha_2 = \cdots = \alpha_{n+1} = 1$.
Lemma 1.6.6. Let $\eta_1, \ldots, \eta_{n+1}$ be i.i.d. standard exponential r.v.'s. Then,
(i) $(\eta_r/\sum_{j=1}^{n+1}\eta_j)_{r=1}^n$ and $\sum_{j=1}^{n+1}\eta_j$ are independent,
(ii) $\sum_{j=1}^{n+1}\eta_j$ is a gamma r.v. with parameter $n + 1$ (thus having the density $x \mapsto e^{-x}x^n/n!$, $x \ge 0$),
(iii) $\eta_1, \eta_1 + \eta_2, \ldots, \sum_{j=1}^n\eta_j$ have the joint density
$$x \mapsto \exp(-x_n) \quad \text{if } 0 < x_1 < \cdots < x_n,$$
and the density is zero otherwise.

PROOF. (i) and (ii) are obvious since the density $g_{n+1}$ in Lemma 1.6.4 is of the form
$$g_{n+1}(x_1, \ldots, x_{n+1}) = n!\cdot\exp(-x_{n+1})x_{n+1}^n/n!$$
if $x_1, \ldots, x_n > 0$, $\sum_{i=1}^n x_i < 1$ and $x_{n+1} > 0$.
(iii) Standard calculations! See Example 1.4.4.

We prove that spacings of $(0, 1)$-uniformly distributed r.v.'s have the same joint distribution as the r.v.'s $\eta_r/(\sum_{j=1}^{n+1}\eta_j)$ above by comparing the densities of the distributions.
Theorem 1.6.7. If $\eta_1, \ldots, \eta_{n+1}$ are i.i.d. standard exponential r.v.'s, then
$$(U_{1:n}, U_{2:n} - U_{1:n}, \ldots, U_{n:n} - U_{n-1:n}, 1 - U_{n:n}) \stackrel{d}{=} \left(\eta_r\Big/\sum_{j=1}^{n+1}\eta_j\right)_{r=1}^{n+1}. \tag{1.6.2}$$
PROOF. It suffices to prove that
$$(U_{r:n} - U_{r-1:n})_{r=1}^n \stackrel{d}{=} \left(\eta_r\Big/\sum_{j=1}^{n+1}\eta_j\right)_{r=1}^n$$
(where $U_{0:n} = 0$) because the random vectors with $n + 1$ components are induced by those above and the map $x \mapsto (x_1, \ldots, x_n, 1 - \sum_{i=1}^n x_i)$.
From Corollary 1.6.5 we know that $(\eta_r/\sum_{j=1}^{n+1}\eta_j)_{r=1}^n$ has the density $h_n(x) = n!$ if $x_i > 0$, $i = 1, \ldots, n$, and $\sum_{i=1}^n x_i < 1$. Starting with the density of $(U_{r:n})_{r=1}^n$ (see Example 1.4.2(ii)) it is immediate from (1.4.4) and Example 1.4.4 that $h_n$ is also the density of $(U_{r:n} - U_{r-1:n})_{r=1}^n$.
□
0
Since i.i.d. random variables are exchangeable, it is obvious that the r.v.'s $\eta_1/(\sum_{j=1}^{n+1}\eta_j), \ldots, \eta_{n+1}/(\sum_{j=1}^{n+1}\eta_j)$ are also exchangeable. Thus, Theorem 1.6.7 yields that the distribution of $(U_{r:n} - U_{r-1:n})_{r=1}^{n+1}$ (where $U_{n+1:n} = 1$) is invariant under permutations of its components. This implies, in particular, that all marginal distributions of $(U_{r:n} - U_{r-1:n})_{r=1}^{n+1}$ of equal dimension are equal.
Corollary 1.6.8. For every permutation $\tau$ on $\{1, \ldots, n + 1\}$,
$$(U_{\tau(r):n} - U_{\tau(r)-1:n})_{r=1}^{n+1} \stackrel{d}{=} (U_{r:n} - U_{r-1:n})_{r=1}^{n+1}. \tag{1.6.3}$$
Let us also formulate Theorem 1.6.7 in terms of the order statistics $U_{r:n}$ themselves. Since $U_{r:n} = \sum_{i=1}^r(U_{i:n} - U_{i-1:n})$ we obtain


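Corollary 1.6.9. If $\eta_1, \ldots, \eta_{n+1}$ are i.i.d. standard exponential r.v.'s, then
$$(U_{r:n})_{r=1}^n \stackrel{d}{=} \left(\sum_{i=1}^r\eta_i\Big/\sum_{j=1}^{n+1}\eta_j\right)_{r=1}^n. \tag{1.6.4}$$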
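Corollary 1.6.9 yields a sorting-free sampler for uniform order statistics. The following sketch (function name, seed and simulation settings are illustrative choices made here) normalizes partial sums of $n + 1$ standard exponentials and compares Monte Carlo means with $EU_{r:n} = r/(n+1)$, cf. (1.7.3).

```python
import random


def uniform_order_statistics(n, rng):
    """Sample (U_{1:n}, ..., U_{n:n}) as (eta_1 + ... + eta_r) / (eta_1 + ... + eta_{n+1}),
    r = 1, ..., n, with i.i.d. standard exponential eta_i, cf. (1.6.4)."""
    eta = [rng.expovariate(1.0) for _ in range(n + 1)]
    total = sum(eta)
    out, partial = [], 0.0
    for value in eta[:n]:
        partial += value
        out.append(partial / total)
    return out


rng = random.Random(2024)
n, reps = 4, 20_000
one_draw = uniform_order_statistics(n, rng)
means = [0.0] * n
for _ in range(reps):
    for r, u in enumerate(uniform_order_statistics(n, rng)):
        means[r] += u / reps
```

The output is increasing by construction, so no sort is needed; this is one practical use of the representation.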

Reformulation of Results
In a first step, the results above will be reformulated for order statistics $V_{i:n}$ of $n$ i.i.d. random variables uniformly distributed on $(-1, 0)$. From Section 1.2 we know that
$$(V_{r:n})_{r=1}^n \stackrel{d}{=} (U_{r:n} - 1)_{r=1}^n \stackrel{d}{=} (-U_{n-r+1:n})_{r=1}^n. \tag{1.6.5}$$
In the sequel, we shall deal with "negative" standard exponential r.v.'s $-\eta_i$ in place of standard exponential r.v.'s $\eta_i$. Thus, $\xi_i = -\eta_i$, $i = 1, \ldots, n + 1$, are i.i.d. random variables with common d.f. $G_{2,1}$ (compare with (1.3.10)). We introduce the partial sums
$$S_k = \sum_{i=1}^k \xi_i, \quad k = 1, \ldots, n + 1. \tag{1.6.6}$$
From Lemma 1.6.6(ii) it is obvious that $S_k$ is a "negative" gamma r.v. with parameter $k$ having density $x \mapsto e^x(-x)^{k-1}/(k-1)!$, $x < 0$. Corollary 1.6.9 is equivalent to
Corollary 1.6.10.
$$(V_{n-r+1:n})_{r=1}^n \stackrel{d}{=} \left(S_r/(-S_{n+1})\right)_{r=1}^n. \tag{1.6.7}$$
Notice that $-S_{n+1}/n \to 1$, $n \to \infty$, w.p. 1, which in conjunction with (1.6.7) indicates that, for every fixed $k$, asymptotically in distribution,
$$n(V_{n:n}, \ldots, V_{n-k+1:n}) \stackrel{d}{=} (S_1, \ldots, S_k). \tag{1.6.8}$$
Recall that for $k = 1$ such a relation was proved in (1.3.14). For further details see Section 5.3.
Next, we reformulate Malmquist's result.
Corollary 1.6.11. We have
(i) $(V_{n:n}/V_{n-1:n}, \ldots, V_{2:n}/V_{1:n}, V_{1:n}) \stackrel{d}{=} (S_1/S_2, S_2/S_3, \ldots, S_{n-1}/S_n, -S_n/S_{n+1})$,
(ii) $V_{n-r+1:n}/V_{n-r:n} \stackrel{d}{=} -V_{1:r}$ for $r = 1, \ldots, n - 1$,
(iii) the r.v.'s
$$S_1/S_2,\ S_2/S_3,\ \ldots,\ S_{n-1}/S_n,\ -S_n/S_{n+1},\ S_{n+1} \quad \text{are independent.} \tag{1.6.9}$$

PROOF. (i) is obvious from (1.6.7). (ii) is immediate from Corollary 1.6.2(ii).
Ad (iii): From Corollary 1.6.2(i) we know that the first $n$ components of the vectors in (1.6.9) are independent. Moreover, it is immediate from Lemma 1.6.6(i) that $(S_r/S_{n+1})_{r=1}^n$ and $S_{n+1}$ are independent, and this property also holds for $(S_r/S_{r+1})_{r=1}^n$ and $S_{n+1}$. Thus, (iii) holds.
□

Generalized Pareto D.F.'s


The uniform distribution on $(-1, 0)$ is the generalized Pareto d.f. $W_{2,1}$. We introduce the class $\{W_{1,\alpha}, W_{2,\alpha}, W_3: \alpha > 0\}$ of generalized Pareto d.f.'s and extend the results above to this class.
Associated with the extreme value d.f. $G_{i,\alpha}$ is the generalized Pareto d.f. $W_{i,\alpha}$ that will be introduced by means of the map $T_{i,\alpha}$ defined on the support of $G_{2,1}$. Explicitly, we have for $x \in (-\infty, 0)$
$$T_{i,\alpha}(x) = \begin{cases} (-x)^{-1/\alpha} & \text{if } i = 1,\\ -(-x)^{1/\alpha} & \text{if } i = 2,\\ -\log(-x) & \text{if } i = 3, \end{cases} \tag{1.6.10}$$
with the convention that $T_{3,\alpha} \equiv T_{3,1} \equiv T_3$.

If $\xi$ is a r.v. with "negative" exponential d.f. $G_{2,1}$, then we know (see (1.2.15)) that $T_{i,\alpha}(\xi)$ is distributed according to $G_{i,\alpha}$.
In analogy to this construction we get for a $(-1, 0)$-uniformly distributed r.v. $\eta$ that $T_{i,\alpha}(\eta)$ is distributed according to $W_{i,\alpha}$ with
$$W_{i,\alpha} = 1 + \log G_{i,\alpha}$$
whenever $-1 < \log G_{i,\alpha} < 0$. Thus, the class of generalized Pareto d.f.'s arises out of $W_{2,1}$ in the same way as the extreme value d.f.'s arise out of $G_{2,1}$.
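For $\alpha > 0$ we have
$$\begin{aligned}
W_{1,\alpha}(x) &= \begin{cases} 0 & \text{if } x \le 1,\\ 1 - x^{-\alpha} & \text{if } x > 1, \end{cases} && \text{``Pareto''}\\
W_{2,\alpha}(x) &= \begin{cases} 0 & \text{if } x \le -1,\\ 1 - (-x)^{\alpha} & \text{if } x \in (-1, 0),\\ 1 & \text{if } x \ge 0, \end{cases} && \text{``Uniform etc.''}\\
W_3(x) &= \begin{cases} 0 & \text{if } x \le 0,\\ 1 - e^{-x} & \text{if } x > 0, \end{cases} && \text{``Exponential''}
\end{aligned} \tag{1.6.11}$$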
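The link between (1.6.10) and (1.6.11) can be verified numerically: if $\eta$ is uniform on $(-1, 0)$, then $P\{T_{i,\alpha}(\eta) \le T_{i,\alpha}(x)\} = 1 + x$ because $T_{i,\alpha}$ is increasing, so $W_{i,\alpha}(T_{i,\alpha}(x))$ must equal $1 + x$ for $x \in (-1, 0)$. The Python functions below are a sketch with names chosen here; the value $\alpha = 2.5$ is arbitrary.

```python
import math


def T(i, alpha, x):
    """The map T_{i,alpha} of (1.6.10), defined for x < 0 (alpha unused for i = 3)."""
    if x >= 0:
        raise ValueError("T_{i,alpha} is defined on (-inf, 0)")
    if i == 1:
        return (-x) ** (-1.0 / alpha)
    if i == 2:
        return -((-x) ** (1.0 / alpha))
    if i == 3:
        return -math.log(-x)
    raise ValueError("i must be 1, 2 or 3")


def W(i, alpha, x):
    """Generalized Pareto d.f. W_{i,alpha} of (1.6.11) (alpha unused for i = 3)."""
    if i == 1:  # "Pareto"
        return 0.0 if x <= 1 else 1.0 - x ** (-alpha)
    if i == 2:  # "Uniform etc."
        if x <= -1:
            return 0.0
        return 1.0 - (-x) ** alpha if x < 0 else 1.0
    if i == 3:  # "Exponential"
        return 0.0 if x <= 0 else 1.0 - math.exp(-x)
    raise ValueError("i must be 1, 2 or 3")


alpha = 2.5
for i in (1, 2, 3):
    for x in (-0.9, -0.5, -0.1):
        print(i, x, W(i, alpha, T(i, alpha, x)))  # should reproduce 1 + x
```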


This class of d.f.'s was introduced by J. Pickands (1975) in extreme value theory. The importance of the generalized Pareto d.f.'s will become apparent later.

Order Statistics of Generalized Pareto R.V.'s


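For the $r$th order statistic $X_{r:n}$ of $n$ i.i.d. random variables with common generalized Pareto d.f. $W_{i,\alpha}$ we obtain, since $T_{i,\alpha}$ is increasing, the representation
$$(X_{r:n})_{r=1}^n \stackrel{d}{=} (T_{i,\alpha}(V_{r:n}))_{r=1}^n. \tag{1.6.12}$$
The use of the transformation $T_{i,\alpha}$ automatically leads to the proper normalization. Check that
$$c_n^{-1}(T_{i,\alpha}(x/n) - d_n) = T_{i,\alpha}(x), \quad x < 0, \tag{1.6.13}$$
where $c_n$ and $d_n$ are the normalizing constants as defined in (1.3.13). By combining (1.6.13) and (1.6.8) one finds that
$$(c_n^{-1}(X_{n:n} - d_n), \ldots, c_n^{-1}(X_{n-k+1:n} - d_n)) \stackrel{d}{=} (T_{i,\alpha}(S_1), \ldots, T_{i,\alpha}(S_k)),$$
asymptotically in distribution, for every fixed $k \ge 1$.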


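Next, Malmquist's result will be extended to generalized Pareto r.v.'s. Here the cases $i = 1, 2$ are relevant. Check that for negative reals $a, b$,
$$T_{1,\alpha}(a)/T_{1,\alpha}(b) = T_{1,\alpha}(-a/b)$$
and
$$T_{2,\alpha}(a)/T_{2,\alpha}(b) = -T_{2,\alpha}(-a/b). \tag{1.6.14}$$
Combining (1.6.12) and Corollary 1.6.11 one obtains
Corollary 1.6.12. Let $X_{r:n}$ be the $r$th order statistic of $n$ i.i.d. random variables with common d.f. $W_{i,\alpha}$ for $i \in \{1, 2\}$ and $\alpha > 0$. Then,
(i) $X_{n:n}/X_{n-1:n}, \ldots, X_{2:n}/X_{1:n}, X_{1:n}$ are independent r.v.'s,
(ii) for $r = 1, \ldots, n - 1$,
$$X_{n-r+1:n}/X_{n-r:n} \stackrel{d}{=} \begin{cases} X_{1:r} & \text{if } i = 1,\\ -X_{1:r} & \text{if } i = 2. \end{cases}$$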

It can easily be seen that the independence of ratios of consecutive order statistics still holds if we include a scale parameter into our considerations.
As mentioned above, spacings of i.i.d. random variables with common continuous d.f. $F$ are independent if, and only if, $F$ is an exponential d.f. As a consequence of this result one obtains (see Rossberg (1972) or Galambos (1987), Corollary 1.6.2) that the ratios of consecutive order statistics of positive or negative i.i.d. random variables with common continuous d.f. $F$ are independent if, and only if, $F$ is of the type $W_{1,\alpha}$ or $W_{2,\alpha}$ (where a scale parameter has to be included).


1.7. Moments, Modes, and Medians


The calculation of the exact values of moments of order statistics has received much attention in the literature. Since this aspect will not be central to our investigations, we shall only touch on moments of order statistics of uniform and exponential r.v.'s.
Two results are included concerning conditions which ensure that moments of order statistics exist and are finite. This topic will be pursued further in Section 3.1, where some inequalities for moments of order statistics will be established. The section concludes with a short summary of results concerning modes and medians of distributions of order statistics.

Exact Moments
Let $U_{1:n}, \ldots, U_{n:n}$ again denote the order statistics of $n$ i.i.d. random variables with common uniform distribution on $(0, 1)$. The first result is a nice application of Malmquist's lemma (see Corollary 1.6.3).
Lemma 1.7.1. Let $0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$, and let $m_1, \ldots, m_k$ be integers such that $r_i + \sum_{j=1}^i m_j \ge 1$ for $i = 1, \ldots, k$. Then,
$$E\left(\prod_{i=1}^k U_{r_i:n}^{m_i}\right) = \prod_{i=1}^k b\left(r_i + \sum_{j=1}^i m_j,\ r_{i+1} - r_i\right)\Big/b(r_i, r_{i+1} - r_i) \tag{1.7.1}$$
where $b(r, s) = (r-1)!(s-1)!/(r + s - 1)!$ is the beta function.

PROOF. Put $U_{n+1:n} = 1$ and $s_i = \sum_{j=1}^i m_j$. By Corollary 1.6.3 (see also P.1.16) and by inserting the explicit form of the density of $U_{r_i:r_{i+1}-1}$ we obtain
$$E\prod_{i=1}^k U_{r_i:n}^{m_i} = E\prod_{i=1}^k\left(U_{r_i:n}/U_{r_{i+1}:n}\right)^{s_i} = \prod_{i=1}^k E\,U_{r_i:r_{i+1}-1}^{s_i}$$
which easily leads to the right-hand side of (1.7.1).


(1.7.1) may alternatively be written in the following form.


From (1.7.2) we obtain as special cases:
$$EU_{r:n} = r/(n + 1) =: \mu_{r,n}, \tag{1.7.3}$$
and, more generally,
$$EU_{r:n}^m = \frac{r(r+1)\cdots(r+m-1)}{(n+1)(n+2)\cdots(n+m)}. \tag{1.7.4}$$
After some busy calculations one also gets, for $r \le s$,
$$E[(U_{r:n} - \mu_{r,n})(U_{s:n} - \mu_{s,n})] = \frac{\mu_{r,n}(1 - \mu_{s,n})}{n + 2} \tag{1.7.5}$$
and, for $r \le s \le t$,
$$E[(U_{r:n} - \mu_{r,n})(U_{s:n} - \mu_{s,n})(U_{t:n} - \mu_{t,n})] = \frac{2\mu_{r,n}(1 - 2\mu_{s,n})(1 - \mu_{t,n})}{(n+2)(n+3)}. \tag{1.7.6}$$
For $r = s$ we obtain in (1.7.5) that
$$E(U_{r:n} - \mu_{r,n})^2 = \frac{r(n - r + 1)}{(n+1)^2(n+2)}.$$
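Formula (1.7.1) is easy to evaluate in exact rational arithmetic. The sketch below (function names chosen here) implements (1.7.1) with Python's `fractions.Fraction`, which makes it possible to confirm special cases such as (1.7.3) to (1.7.6) without rounding error.

```python
from fractions import Fraction
from math import factorial


def beta_int(r, s):
    """b(r, s) = (r-1)!(s-1)!/(r+s-1)! for positive integers r, s."""
    return Fraction(factorial(r - 1) * factorial(s - 1), factorial(r + s - 1))


def mixed_moment(ranks, powers, n):
    """E prod_i U_{r_i:n}^{m_i} by (1.7.1); ranks strictly increasing in 1..n,
    powers integers with r_i + m_1 + ... + m_i >= 1."""
    r = list(ranks) + [n + 1]  # r_{k+1} = n + 1
    result, s = Fraction(1), 0
    for i, m in enumerate(powers):
        s += m  # s_i = m_1 + ... + m_i
        if r[i] + s < 1:
            raise ValueError("need r_i + m_1 + ... + m_i >= 1")
        gap = r[i + 1] - r[i]
        result *= beta_int(r[i] + s, gap) / beta_int(r[i], gap)
    return result


def mu(r, n):
    """mu_{r,n} = E U_{r:n} = r/(n+1), cf. (1.7.3)."""
    return Fraction(r, n + 1)


n, r, s = 5, 2, 4
cov = mixed_moment([r, s], [1, 1], n) - mu(r, n) * mu(s, n)
print(cov, mu(r, n) * (1 - mu(s, n)) / (n + 2))  # both equal 1/63, cf. (1.7.5)
```

Negative powers are allowed as long as the condition of Lemma 1.7.1 holds; for example, `mixed_moment([3], [-1], 5)` returns $E U_{3:5}^{-1} = 5/2$.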

Next we state the expectation and the variance of the $r$th order statistic $X_{r:n}$ of i.i.d. standard exponential r.v.'s. From Theorem 1.6.1 we know that $X_{r:n} \stackrel{d}{=} \sum_{i=1}^r\eta_i/(n - i + 1)$ where $\eta_1, \ldots, \eta_r$ are i.i.d. standard exponential r.v.'s (thus having common expectation and variance equal to 1). This implies immediately that
$$EX_{r:n} = \sum_{i=1}^r (n - i + 1)^{-1} =: \mu_{r,n} \tag{1.7.7}$$
and
$$E(X_{r:n} - \mu_{r,n})^2 = \sum_{i=1}^r (n - i + 1)^{-2}. \tag{1.7.8}$$
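Formulas (1.7.7) and (1.7.8) translate directly into code. The sketch below (names chosen here) computes them exactly and, as a plausibility check only, compares them with a small Monte Carlo experiment based on sorted exponential samples; seed and sample size are arbitrary.

```python
import random
from fractions import Fraction


def exp_os_mean_var(r, n):
    """Exact mean and variance of X_{r:n} for i.i.d. standard exponentials, (1.7.7)-(1.7.8)."""
    mean = sum(Fraction(1, n - i + 1) for i in range(1, r + 1))
    var = sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, r + 1))
    return mean, var


def simulate_mean_var(r, n, reps, seed):
    """Monte Carlo estimates of the same two quantities."""
    rng = random.Random(seed)
    draws = [sorted(rng.expovariate(1.0) for _ in range(n))[r - 1] for _ in range(reps)]
    m = sum(draws) / reps
    v = sum((d - m) ** 2 for d in draws) / (reps - 1)
    return m, v


exact_mean, exact_var = exp_os_mean_var(3, 5)  # 47/60 and 769/3600
sim_mean, sim_var = simulate_mean_var(3, 5, 50_000, seed=3)
```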

Inequalities for Moments


The first result yields that the $m$th moment of any order statistic $X_{r:n}$ exists and is finite if the $m$th absolute moment of the underlying distribution is finite.
Lemma 1.7.2. Let $0 = r_0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$. Let $X_{r_1:n}, \ldots, X_{r_k:n}$ be order statistics of i.i.d. random variables $\xi_1, \ldots, \xi_n$.
Then for every non-negative, measurable function $g$ on the Euclidean $k$-space we have
$$Eg(X_{r_1:n}, \ldots, X_{r_k:n}) \le \frac{n!}{\prod_{i=1}^{k+1}(r_i - r_{i-1} - 1)!}\,Eg(\xi_1, \ldots, \xi_k).$$

PROOF. Let $F$ be the d.f. of $\xi_1$. Put $C = n!/\prod_{i=1}^{k+1}(r_i - r_{i-1} - 1)!$, $B = \{(x_1, \ldots, x_k): 0 < x_1 < \cdots < x_k < 1\}$, $x_0 = 0$ and $x_{k+1} = 1$. From Theorem 1.2.5(i) and Theorem 1.4.5 we get
$$\begin{aligned}
Eg(X_{r_1:n}, \ldots, X_{r_k:n}) &= Eg(F^{-1}(U_{r_1:n}), \ldots, F^{-1}(U_{r_k:n}))\\
&= C\int_B g(F^{-1}(x_1), \ldots, F^{-1}(x_k))\prod_{i=1}^{k+1}(x_i - x_{i-1})^{r_i - r_{i-1} - 1}\,dx_1\cdots dx_k\\
&\le C\int_{(0,1)^k} g(F^{-1}(x_1), \ldots, F^{-1}(x_k))\,dx_1\cdots dx_k = C\,Eg(\xi_1, \ldots, \xi_k)
\end{aligned}$$
where the inequality holds since $0 < x_i - x_{i-1} < 1$ on $B$, and the final identity becomes obvious by using the quantile transformation.
□
For $g(x) = |x|^m$ we obtain as a special case
$$E|X_{r:n}|^m \le \frac{n!}{(r-1)!\,(n-r)!}\,E|\xi_1|^m. \tag{1.7.9}$$

Next, we find some necessary and sufficient conditions which ensure that
moments of central order statistics exist and are finite if the sample size n is
sufficiently large.

Lemma 1.7.3. Let $X_{i:j}$ be the $i$th order statistic of $j$ i.i.d. random variables $\xi_1, \ldots, \xi_j$ with common d.f. $F$. Assume that
$$E|X_{s:j}|^m < \infty \tag{1.7.10}$$
for some positive integers $j$, $m$ and $s \in \{1, \ldots, j\}$. Then there exists a constant $C > 0$ such that
$$|F^{-1}(x)|^m x^s(1 - x)^{j-s+1} \le C, \quad x \in (0, 1). \tag{1.7.11}$$
Conversely, (1.7.11) implies that
$$E|X_{r:n}|^k < \infty \tag{1.7.12}$$
whenever $1 + ks/m \le r \le n - (j - s + 1)k/m$.

PROOF. By the same arguments as in the proof of Lemma 1.7.2 we get
$$E|X_{s:j}|^m = \frac{j!}{(s-1)!\,(j-s)!}\int_0^1 |F^{-1}(x)|^m x^s(1-x)^{j-s+1}/(x(1-x))\,dx$$
and hence, (1.7.11) holds under condition (1.7.10) since
$$\int_0^{1/2}(1/x)\,dx = \int_{1/2}^1 (1/(1-x))\,dx = \infty.$$
Moreover, (1.7.11) implies (1.7.12) since
$$\begin{aligned}
E|X_{r:n}|^k &= \frac{n!}{(r-1)!\,(n-r)!}\int_0^1 |F^{-1}(x)|^k x^{r-1}(1-x)^{n-r}\,dx\\
&\le \frac{n!}{(r-1)!\,(n-r)!}\,C^{k/m}\int_0^1 x^{r-1-ks/m}(1-x)^{n-r-(j-s+1)k/m}\,dx < \infty
\end{aligned}$$
and $r - 1 - ks/m \ge 0$ as well as $n - r - (j - s + 1)k/m \ge 0$.

We formulate a slightly weaker version of Lemma 1.7.3.

Corollary 1.7.4. For every positive integer $k$ and $0 < \alpha < 1/2$ the following three conditions are equivalent:
(i) $E|X_{r:n}|^k < \infty$ for all sufficiently large $n$ and $n\alpha \le r \le (1 - \alpha)n$.
(ii) There exists $\delta > 0$ such that
$$\sup_{q \in (0,1)} |F^{-1}(q)|^\delta q(1 - q) < \infty. \tag{1.7.13}$$
(iii) There exists $\delta > 0$ such that
$$\sup_x |x|^\delta F(x)(1 - F(x)) < \infty. \tag{1.7.14}$$

PROOF. If (i) holds for all $n \ge n_0$, say, then the implication (1.7.10) $\Rightarrow$ (1.7.11) yields (ii) with $\delta = k/(n_0\alpha + 2)$.
Moreover, if (ii) holds then (1.7.11) $\Rightarrow$ (1.7.12) yields (i) for $n_0 = [(1 + k(1 + 1/\delta))/\alpha]$. Thus (i) and (ii) are equivalent.
To prove the equivalence of (ii) and (iii) notice that (1.7.13) holds iff there exists $\delta > 0$ such that
$$\text{(a) } |F^{-1}(q)|^\delta q < 1 \quad\text{and}\quad \text{(b) } |F^{-1}(q)|^\delta(1 - q) \le 1 \tag{1.7.13'}$$
for sufficiently small values of $q$ in (a) and $1 - q$ in (b).
Moreover, (iii) holds iff there exists $\delta > 0$ such that
$$\text{(a) } |y|^\delta F(y) < 1 \quad\text{and}\quad \text{(b) } |y|^\delta(1 - F(y)) < 1 \tag{1.7.14'}$$
for sufficiently small values of $F(y)$ in (a) and $1 - F(y)$ in (b).
We are going to prove the equivalence of (1.7.13')(a) and (1.7.14')(a): For sufficiently small $q$, the inequality $|F^{-1}(q)|^\delta q < 1$ is equivalent to $F^{-1}(q) > -q^{-1/\delta}$, which holds, according to (1.2.10), iff $q > F(-q^{-1/\delta})$. Setting $y = -q^{-1/\delta}$ we see that (1.7.13')(a) holds iff for all sufficiently small $y$ we have $|y|^{-\delta} > F(y)$, which is equivalent to (1.7.14')(a).
In a similar manner one can prove that (1.7.13')(b) is equivalent to (1.7.14')(b), which completes the proof.
□


Unimodality of D.F.'s of Order Statistics


In this part of the section we find conditions which imply the unimodality of the d.f. of an order statistic.
A d.f. $F$ is unimodal if there exists a number $u$ such that the restriction $F|(-\infty, u)$ of $F$ to the interval $(-\infty, u)$ is convex and $F|(u, \infty)$ is concave. Every $u$ with this property is a mode of $F$. If $u$ is a mode of $F$ and $F$ is continuous at $u$, then $F$ possesses a density, say $f$, where $f$ is nondecreasing on $(-\infty, u]$ and nonincreasing on $[u, \infty)$. We also say that a density $f$ is unimodal if it has these properties.
Hereafter let $X_{i:n}$ be the order statistics of $n$ i.i.d. random variables with common d.f. $F$ and density $f$. Moreover, assume that $f$ is differentiable and strictly positive on $(\alpha(F), \omega(F))$. Denote by $f_{r:n}$ again the density of $X_{r:n}$. Given a real number $u$ we write $I(u) = (\alpha(F), \omega(F)) \cap (-\infty, u)$ and $J(u) = (\alpha(F), \omega(F)) \cap (u, \infty)$. The following results are essentially due to Alam (1972).
Standard calculations yield that $f_{r:n}$ is unimodal if, and only if, there exists some $u$ such that
$$f'_{r:n}|I(u) \ge 0 \quad\text{and}\quad f'_{r:n}|J(u) \le 0. \tag{1.7.15}$$
Check that $f'_{r:n} = b(r, n - r + 1)^{-1}f^2F^{r-1}(1 - F)^{n-r}g_{r,n}$ on $(\alpha(F), \omega(F))$ where
$$g_{r,n} = \frac{f'}{f^2} + \frac{r - 1}{F} - \frac{n - r}{1 - F} \quad \text{on } (\alpha(F), \omega(F)). \tag{1.7.16}$$
The unimodality of $f_{r:n}$ will be characterized by means of the function $g_{r,n}$.

Lemma 1.7.5. The density $f_{r:n}$ of $X_{r:n}$ is unimodal if, and only if, there exists $u$ such that $g_{r,n}|I(u) \ge 0$ and $g_{r,n}|J(u) \le 0$.

PROOF. Immediate from (1.7.15) and (1.7.16). Define $u := \sup\{x: \alpha(F) < x < \omega(F) \text{ and } g_{r,n}(x) \ge 0\}$ if $\{x: \alpha(F) < x < \omega(F),\ g_{r,n}(x) \ge 0\} \ne \emptyset$, and $u = \inf\{x: \alpha(F) < x < \omega(F),\ g_{r,n}(x) < 0\}$ otherwise.
□

The density $f_{r:n}$ is, in general, not unimodal if the underlying density f is unimodal. We mention the following counterexample due to Huang and Ghosh (1982): Consider the density f defined by
$$f(x) = \begin{cases} 1 & \text{if } -1/2 < x < 0, \\ 1/2 & \text{if } 0 \le x < 1, \end{cases}$$
that is zero otherwise. Obviously, f is unimodal. However, it can be shown that the density of the kth order statistic of a sample of size n is not unimodal for $k > (n + 1)/2$.
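The failure of unimodality in this counterexample can be seen by counting local maxima of $f_{k:n}$ on a fine grid. A minimal sketch, assuming a midpoint grid of 3000 points on (-1/2, 1) is fine enough; the normalizing constant of $f_{k:n}$ is dropped since it does not affect the location of maxima. For $k > (n+1)/2$ one maximum sits just left of 0, where f drops from 1 to 1/2, and a second one where $F = (k-1)/(n-1)$, or at the right end point if k = n.

```python
def f(x):
    """Density of the example: 1 on (-1/2, 0), 1/2 on [0, 1), 0 otherwise."""
    if -0.5 < x < 0:
        return 1.0
    if 0 <= x < 1:
        return 0.5
    return 0.0

def F(x):
    """Corresponding d.f."""
    if x <= -0.5:
        return 0.0
    if x < 0:
        return x + 0.5
    if x < 1:
        return 0.5 + 0.5 * x
    return 1.0

def shape_of_density(x, k, n):
    """f_{k:n}(x) up to its normalizing constant (irrelevant for modes)."""
    return F(x) ** (k - 1) * (1 - F(x)) ** (n - k) * f(x)

def count_local_maxima(k, n, m=3000):
    """Count strict local maxima of f_{k:n} on a midpoint grid of (-1/2, 1),
    including the two end points of the grid."""
    xs = [-0.5 + 1.5 * (i + 0.5) / m for i in range(m)]
    ys = [shape_of_density(x, k, n) for x in xs]
    count = int(ys[0] > ys[1]) + int(ys[-1] > ys[-2])
    for i in range(1, m - 1):
        if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1]:
            count += 1
    return count

for n in (4, 5, 6):
    print(n, [count_local_maxima(k, n) for k in range(1, n + 1)])
```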
However, if f is strongly unimodal [that is, log f is concave on the support $(\alpha(F), \omega(F))$] then it can be shown that $f_{r:n}$ is unimodal. Notice that the strong unimodality of f implies that $f'/f^2$ is nonincreasing on $(\alpha(F), \omega(F))$. This follows at once from the identity $f'/f^2 = -(1/f)'$ and the fact that $1/f$ is convex if f is strongly unimodal.


Corollary 1.7.6. (i) If $f'/f^2$ is nonincreasing on $(\alpha(F), \omega(F))$ then $f_{r:n}$ is unimodal.
(ii) If, in addition, $g_{r,n}(u) = 0$ for some $u \in (\alpha(F), \omega(F))$ and $n \ge 2$ then u is the unique mode of $f_{r:n}$.

PROOF. (i) Obvious from Lemma 1.7.5 since F is nondecreasing, so that $g_{r,n}$ is nonincreasing.
(ii) Since F is strictly increasing on $(\alpha(F), \omega(F))$ we know that $g_{r,n}$ is strictly decreasing. This implies that the solution of the equation $g_{r,n}(u) = 0$ (that is necessarily a mode of $f_{r:n}$) is unique. □

The Cauchy distribution provides an example of a unimodal density which is not strongly unimodal; however, $f'/f^2$ is nonincreasing.
EXAMPLES 1.7.7. (i) The normal, exponential and uniform densities are strongly unimodal.
(ii) If $f = 1_{[0,1]}$ and $n \ge 2$ then $(r-1)/(n-1)$ is the unique mode of $f_{r:n}$.
(iii) The condition that $f'/f^2$ is nonincreasing is not necessary for the unimodality of $f_{r:n}$: Let $F(x) = x^\alpha$ for $x \in (0, 1)$ and some $\alpha \in (0, 1)$. Then $f_{r:n}$ is unimodal; however, $f'/f^2$ is strictly increasing on (0, 1).
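Corollary 1.7.6(ii) suggests computing the mode of $f_{r:n}$ as the zero of the strictly decreasing function $g_{r,n}$. A minimal bisection sketch, assuming $f'/f^2$ is supplied in closed form: for the uniform density $f'/f^2 = 0$ on (0, 1), which reproduces Example 1.7.7(ii); for the standard Cauchy density $f(x) = 1/(\pi(1+x^2))$ one has $f'/f^2 = -2\pi x$, which is nonincreasing although log f is not concave. The bracketing intervals and the tolerance are ad hoc choices, and the helper names are hypothetical.

```python
import math

def root_of_decreasing(g, lo, hi, tol=1e-12):
    """Zero of a strictly decreasing g on (lo, hi), assuming g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def make_g(ratio, F, r, n):
    """g_{r,n} = f'/f^2 + (r-1)/F - (n-r)/(1-F), with ratio(x) = f'(x)/f(x)^2."""
    return lambda x: ratio(x) + (r - 1) / F(x) - (n - r) / (1 - F(x))

def uniform_mode(r, n):
    # Uniform on (0, 1): f' = 0 and F(x) = x.
    g = make_g(lambda x: 0.0, lambda x: x, r, n)
    return root_of_decreasing(g, 1e-12, 1 - 1e-12)

def cauchy_mode(r, n):
    # Standard Cauchy: F(x) = 1/2 + arctan(x)/pi and f'/f^2 = -2*pi*x.
    F = lambda x: 0.5 + math.atan(x) / math.pi
    g = make_g(lambda x: -2 * math.pi * x, F, r, n)
    return root_of_decreasing(g, -1e6, 1e6)

print(uniform_mode(3, 7), (3 - 1) / (7 - 1))
print(cauchy_mode(2, 6), cauchy_mode(5, 6))  # symmetric about 0
```

For the Cauchy case the modes of $f_{r:n}$ and $f_{n-r+1:n}$ are mirror images, since $g_{n-r+1,n}(x) = -g_{r,n}(-x)$.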
It follows from Corollary 1.7.6 and P.3.4 that the weak convergence of distributions of order statistics is equivalent to the convergence w.r.t. the variational distance if the underlying density is strongly unimodal (or if $f'/f^2$ is nonincreasing).

Medians
As a third functional parameter of order statistics we consider the median of
the distribution of an order statistic. Again we are interested in the relationship
between the underlying distribution and the distributions of order statistics.
Recall that a median u of a r.v. ξ with d.f. F is defined by the property that
$$P\{\xi < u\} \le 1/2 \le P\{\xi \le u\}. \qquad (1.7.17)$$
(1.7.17) holds if F(u) = 1/2. Moreover, if the d.f. F of ξ is continuous, then (1.7.17) is equivalent to the condition F(u) = 1/2.

Lemma 1.7.8. Let $X_{i:2m+1}$ be the ith order statistic of i.i.d. random variables $\xi_1, \ldots, \xi_{2m+1}$ with common d.f. F where m is a positive integer. Then, every median of $\xi_1$ is a median of $X_{m+1:2m+1}$.

PROOF. Let u be a median of $\xi_1$. Since $F(u) \ge 1/2$ we obtain from Corollary 1.2.7 that
$$P\{X_{m+1:2m+1} \le u\} = P\{U_{m+1:2m+1} \le F(u)\} \ge P\{U_{m+1:2m+1} \le 1/2\}.$$
Example 1.2.2 implies that $P\{U_{m+1:2m+1} \le 1/2\} = P\{U_{m+1:2m+1} \ge 1/2\}$. Hence $P\{U_{m+1:2m+1} \le 1/2\} = 1/2$ and, thus, $P\{X_{m+1:2m+1} \le u\} \ge 1/2$.
Since $P\{X_{m+1:2m+1} \le v\} \uparrow P\{X_{m+1:2m+1} < u\}$ as $v \uparrow u$, it remains to prove that $P\{X_{m+1:2m+1} \le v\} \le 1/2$ for every v < u. This follows by the same arguments as in the first part of the proof by using the fact that $F(v) \le 1/2$. □
Lemma 1.7.8 reveals that the sample medians for odd sample sizes are median unbiased estimators of the underlying (unknown) median. However, this is an exceptional case. For even sample sizes 2m it is impossible, in general, to find some $r \in \{1, \ldots, 2m\}$ such that the underlying median is the median of $X_{r:2m}$.
EXAMPLE 1.7.9. For every positive integer m and $r \in \{1, \ldots, 2m\}$ we have
$$P\{U_{r:2m} \le 1/2\} \ne 1/2. \qquad (1.7.18)$$
To prove this notice that $r \ne 2m - r + 1$ (since 2m + 1 is odd), so that $P\{U_{r:2m} \le 1/2\} \ne P\{U_{2m-r+1:2m} \le 1/2\}$, and hence by Example 1.2.2
$$P\{U_{r:2m} \le 1/2\} = 1 - P\{U_{2m-r+1:2m} \le 1/2\} \ne 1 - P\{U_{r:2m} \le 1/2\}.$$
This implies (1.7.18).
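For continuous F, Corollary 1.2.7 reduces the question to uniform order statistics, and $P\{U_{r:n} \le 1/2\}$ is the probability that at least r of n uniform r.v.'s fall into [0, 1/2], a binomial tail. A minimal exact sketch with rational arithmetic (the helper name is hypothetical):

```python
from fractions import Fraction
from math import comb

def prob_uniform_os_le_half(r, n):
    """Exact P{U_{r:n} <= 1/2} = P{Binomial(n, 1/2) >= r}."""
    return Fraction(sum(comb(n, j) for j in range(r, n + 1)), 2 ** n)

# Odd sample size 2m+1: the sample median U_{m+1:2m+1} has median 1/2.
print([prob_uniform_os_le_half(m + 1, 2 * m + 1) for m in range(1, 5)])
# Even sample size 2m: no U_{r:2m} has 1/2 as its median, cf. (1.7.18).
print([prob_uniform_os_le_half(r, 4) for r in range(1, 5)])
```

For n = 4 the values for r = 1, ..., 4 are 15/16, 11/16, 5/16 and 1/16, none of which equals 1/2.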


The discussion above can be extended to the question whether the q-quantile $F^{-1}(q)$ is a median of the distribution of the sample q-quantile $F_n^{-1}(q)$; in other words, whether the sample q-quantile is a median unbiased estimator of the underlying q-quantile. Clearly, the answer is negative in general; however, as pointed out in (8.1.9), randomized sample q-quantiles have this property. In the present section we shall only examine randomized sample medians. The reader not familiar with Markov kernels and their interpretation is advised first to read Section 10.1.
Denote by $\varepsilon_x$ the Dirac measure with mass 1 at x; thus, we have $\varepsilon_x(B) = 1_B(x)$. Define the Markov kernel $M_{r,n}$ by
$$M_{r,n}(B \mid \cdot) = \tfrac{1}{2}\left(\varepsilon_{X_{r:n}}(B) + \varepsilon_{X_{n-r+1:n}}(B)\right), \qquad (1.7.19)$$
which is a randomized sample median if $r = [(n + 1)/2]$. Thus, $X_{r:n}$ as well as $X_{n-r+1:n}$ are chosen with probability 1/2. Notice that if n = 2m + 1 and r = m + 1 then the (non-randomized) sample median $X_{m+1:2m+1}$ is taken.
Denote by $M_{r,n}P$ the distribution of the Markov kernel $M_{r,n}$ (compare with (10.1.2)). We have $(M_{r,n}P)(B) = E M_{r,n}(B \mid \cdot)$.

Lemma 1.7.10. Let $X_{i:n}$ denote the ith order statistic of n i.i.d. random variables $\xi_1, \ldots, \xi_n$ with continuous d.f. F. Then every median of $\xi_1$ is a median of $M_{r,n}$.

PROOF. Since F is continuous we have F(u) = 1/2 for every median u of $\xi_1$. We will prove that $(M_{r,n}P)(-\infty, u] = 1/2$ and hence u is a median of $M_{r,n}$. From Corollary 1.2.7 and Example 1.2.2 we get
$$\begin{aligned}
(M_{r,n}P)(-\infty, u] &= \tfrac{1}{2}\left[P\{X_{r:n} \le u\} + P\{X_{n-r+1:n} \le u\}\right] \\
&= \tfrac{1}{2}\left[P\{U_{r:n} \le 1/2\} + P\{U_{n-r+1:n} \le 1/2\}\right] \\
&= \tfrac{1}{2}\left[P\{U_{r:n} \le 1/2\} + P\{U_{r:n} > 1/2\}\right] = \tfrac{1}{2}.
\end{aligned}$$
□

Lemma 1.7.10 shows that $M_{r,n}$ is a median unbiased estimator of the underlying median.
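A small Monte Carlo sketch of Lemma 1.7.10, assuming standard exponential data (median log 2) and n = 4, r = [(n + 1)/2] = 2. The randomized choice between $X_{r:n}$ and $X_{n-r+1:n}$ follows (1.7.19); for comparison the non-randomized $X_{2:4}$ is included, whose probability of falling below the true median is $P\{\mathrm{Bin}(4, 1/2) \ge 2\} = 11/16$. Function names, sample sizes and seeds are illustrative assumptions.

```python
import math
import random

def randomized_median(sample, r, rng):
    """One draw from the kernel (1.7.19): X_{r:n} or X_{n-r+1:n}, each w.p. 1/2."""
    xs = sorted(sample)
    n = len(xs)
    return xs[r - 1] if rng.random() < 0.5 else xs[n - r]

def coverage(n, r, trials, seed=0, randomized=True):
    """Monte Carlo estimate of P{estimator <= true median} for Exp(1) data."""
    rng = random.Random(seed)
    true_median = math.log(2)
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if randomized:
            estimate = randomized_median(sample, r, rng)
        else:
            estimate = sorted(sample)[r - 1]  # X_{r:n}
        hits += estimate <= true_median
    return hits / trials

print(coverage(4, 2, 40000, seed=1))                    # close to 1/2
print(coverage(4, 2, 40000, seed=2, randomized=False))  # close to 11/16
```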

1.8. Conditional Distributions of Order Statistics


Throughout this section, we shall assume that $X_{1:n}, \ldots, X_{n:n}$ are the order statistics of n i.i.d. random variables with common continuous d.f. F. The aim of the following lines will be to establish the conditional distribution of $(X_{s_1:n}, \ldots, X_{s_m:n})$ conditioned on $(X_{r_1:n}, \ldots, X_{r_k:n})$.

Introductory Remarks
At the beginning let us touch on some essential definitions and properties concerning the conditional distribution $P(Y \in \cdot \mid X)$ of Y given X.

In the present context it is always possible to factorize the conditional distribution $P(Y \in \cdot \mid X)$ by means of the conditional distribution $P(Y \in \cdot \mid X = x)$ of Y given X = x. Moreover, $P(Y \in B \mid X)$ is the composition of $P(Y \in B \mid X = \cdot)$ and X. By writing, in short, $P(Y \in B \mid \cdot)$ in place of $P(Y \in B \mid X = \cdot)$ we have $P(Y \in B \mid X) = P(Y \in B \mid \cdot) \circ X$.
Apart from a measurability condition and the fact that $P(Y \in \cdot \mid X = x)$ is a probability measure, the defining property of $P(Y \in \cdot \mid X)$ is
$$E\left(1_A(X)\, P(Y \in B \mid X)\right) = P\{X \in A,\ Y \in B\} \qquad (1.8.1)$$
for all Borel sets (in general, measurable sets) A and B.


From (1.8.1) we see that $P(Y \in \cdot \mid X = x)$ has only to be defined for elements x in a set having probability 1 w.r.t. the distribution of X. For x in the complement of this set, $P(Y \in \cdot \mid X = x)$ may, e.g., be defined as the distribution of Y.
In the statistical context, one is primarily interested in the consequence that the distribution of Y can be rebuilt by means of the conditional distribution $P(Y \in \cdot \mid X = \cdot)$ and the distribution of X. Obviously,
$$E\, P(Y \in B \mid X) = P\{Y \in B\}. \qquad (1.8.2)$$
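For discrete X and Y the defining property (1.8.1) and its consequence (1.8.2) can be checked exhaustively over all sets A and B. A minimal sketch with a hypothetical joint probability function; the value x = 3 has probability zero, and there, as suggested above, the conditional distribution is set equal to the distribution of Y.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical joint probabilities of (X, Y); x = 3 has probability zero.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10),
    (2, 0): Fraction(0, 1),  (2, 1): Fraction(3, 10),
    (3, 0): Fraction(0, 1),  (3, 1): Fraction(0, 1),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})
p_x = {x: sum(joint[x, y] for y in ys) for x in xs}
p_y = {y: sum(joint[x, y] for x in xs) for y in ys}

def cond(B, x):
    """P(Y in B | X = x); on the null set {p_x = 0} it is the law of Y."""
    if p_x[x] == 0:
        return sum(p_y[y] for y in B)
    return sum(joint[x, y] for y in B) / p_x[x]

def subsets(s):
    return chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))

def check_defining_property():
    """E[1_A(X) P(Y in B | X)] == P{X in A, Y in B} for all A, B, i.e. (1.8.1)."""
    for A in subsets(xs):
        for B in subsets(ys):
            lhs = sum(p_x[x] * cond(B, x) for x in A)
            rhs = sum(joint[x, y] for x in A for y in B)
            if lhs != rhs:
                return False
    return True

print(check_defining_property())
# (1.8.2) with B = {1}: averaging the conditional probabilities recovers P{Y = 1}.
print(sum(p_x[x] * cond((1,), x) for x in xs), p_y[1])
```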

Assume that the joint distribution of X and Y has a density, say, f w.r.t. some product measure $\mu_1 \times \mu_2$. Then we know that the conditional distribution $P(Y \in \cdot \mid X = x)$ has a $\mu_2$-density, say, $f_2(\cdot \mid x)$ which, by the definition of a density, has the property
$$P(Y \in B \mid X = x) = \int_B f_2(\cdot \mid x)\, d\mu_2.$$
The density $f_2(\cdot \mid x)$ is the conditional density of Y given X = x. It is well known that $f_2(\cdot \mid x) = f(x, \cdot)/f_1(x)$ if $f_1(x) > 0$, where $f_1$ is a $\mu_1$-density of the distribution of X.
We mention another simple consequence of (1.8.1). The conditional distribution $P((X, Y) \in \cdot \mid X = x)$ of (X, Y) given X = x is the product measure $\delta_x \times P(Y \in \cdot \mid X = x)$, where $\delta_x$ is the Dirac measure at x defined by $\delta_x(B) = 1_B(x)$. This becomes obvious by noting that
$$E\left[1_A(X)\, \delta_X(B_1)\, P(Y \in B_2 \mid X)\right] = P\{X \in A,\ (X, Y) \in B_1 \times B_2\}. \qquad (1.8.3)$$

The Basic Theorem


Starting with the joint density of order statistics it is straightforward to deduce the desired conditional distributions. A detailed proof of this result is justified because of its importance. We remark that the proof can be slightly clarified (though not shortened) if P.1.32, which concerns conditional independence under the Markov property, is utilized.
Let $r_1 < \cdots < r_k$. The conditional distribution of the order statistic $Y := (X_{1:n}, \ldots, X_{n:n})$ given
$$X := (X_{r_1:n}, \ldots, X_{r_k:n}) = (x_{r_1}, \ldots, x_{r_k}) =: x$$
has only to be computed for vectors x with $\alpha(F) < x_{r_1} < \cdots < x_{r_k} < \omega(F)$ (compare with Theorem 1.5.2). We shall prove that $P(Y \in \cdot \mid X = x)$ is the joint distribution of certain independent order statistics $W_i$ and degenerate r.v.'s $Y_{r_j}$. More precisely, $W_i$ is the order statistic of i.i.d. random variables with common d.f. $F_{i,x}$ which is F truncated on the left of $x_{r_{i-1}}$ and on the right of $x_{r_i}$ (where $x_{r_0} = \alpha(F)$ and $x_{r_{k+1}} = \omega(F)$). Thus,
$$F_{i,x}(y) = \frac{F(y) - F(x_{r_{i-1}})}{F(x_{r_i}) - F(x_{r_{i-1}})}, \qquad x_{r_{i-1}} \le y \le x_{r_i},$$
and i = 1, ..., k + 1.

Theorem 1.8.1. Let F be a continuous d.f., and let $0 = r_0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$. If $\alpha(F) = x_{r_0} < x_{r_1} < \cdots < x_{r_k} < x_{r_{k+1}} = \omega(F)$ then the conditional distribution of $(X_{1:n}, \ldots, X_{n:n})$ given $(X_{r_1:n}, \ldots, X_{r_k:n}) = (x_{r_1}, \ldots, x_{r_k})$ is the joint distribution of the r.v.'s $Y_1, \ldots, Y_n$ which are characterized by the following three properties:
(a) For every $i \in I := \{j: 1 \le j \le k + 1,\ r_j - r_{j-1} > 1\}$ the random vector
$$W_i := (Y_{r_{i-1}+1}, \ldots, Y_{r_i - 1})$$
is the order statistic of $r_i - r_{i-1} - 1$ i.i.d. random variables with common d.f. $F_{i,x}$.
(b) $Y_{r_i}$ is a degenerate r.v. with fixed value $x_{r_i}$ for i = 1, ..., k.
(c) $W_i$, $i \in I$, are independent.

PROOF. Put $M := \{1, \ldots, n\} \setminus \{r_1, \ldots, r_k\}$. In view of (1.8.3) it suffices to show that the conditional distribution of the order statistics $X_{i:n}$, $i \in M$, given $X := (X_{r_1:n}, \ldots, X_{r_k:n}) = (x_{r_1}, \ldots, x_{r_k}) =: x$ is equal to the joint distribution of the r.v.'s $Y_j$, $j \in M$. This will be verified by constructing the conditional density in the way described above.
Denote by Q the probability measure corresponding to the d.f. F. Let f be the $Q^n$-density of the order statistic $(X_{1:n}, \ldots, X_{n:n})$ and g the $Q^k$-density of X (as computed in Theorem 1.5.2). Then, the conditional $Q^{n-k}$-density, say, $f(\cdot \mid x)$ of $X_{i:n}$, $i \in M$, given X = x has the representation
$$f(z \mid x) = f(x_1, \ldots, x_n)/g(x)$$
if g(x) > 0, where z denotes the vector $(x_i)_{i \in M}$. Notice that the condition g(x) > 0 is equivalent to $\alpha(F) < x_{r_1} < \cdots < x_{r_k} < \omega(F)$. Check that $f(z \mid x)$ may be written
$$f(z \mid x) = \prod_{i \in I} \frac{h_i(x_{r_{i-1}+1}, \ldots, x_{r_i - 1})}{\left(F(x_{r_i}) - F(x_{r_{i-1}})\right)^{r_i - r_{i-1} - 1}}$$
where $h_i$ is the $Q_{i,x}^{r_i - r_{i-1} - 1}$-density of $W_i$ and $Q_{i,x}$ is the probability measure corresponding to the truncated d.f. $F_{i,x}$.
Since $1/[F(x_{r_i}) - F(x_{r_{i-1}})]$ defines a Q-density of $Q_{i,x}$, it follows that $f(\cdot \mid x)$ is the $Q^{n-k}$-density of $Y_j$, $j \in M$. The particular structure of $f(\cdot \mid x)$ shows that the random vectors $W_i$, $i \in I$, are independent and $W_i$ is the asserted order statistic. □
Theorem 1.8.1 shows that the following two random experiments are equivalent as far as their distributions are concerned. First, generate the ordered values $x_1 < \cdots < x_n$ according to the d.f. F. Then, take $x_{r_1} < \cdots < x_{r_k}$ and replace the ordered values $x_{r_{i-1}+1} < \cdots < x_{r_i - 1}$ by the ordered values $y_{r_{i-1}+1} < \cdots < y_{r_i - 1}$ which are generated according to the truncated d.f. $F_{i,x}$ as defined above. Then, in view of Theorem 1.8.1 the final outcomes
$$y_1 < \cdots < y_{r_1 - 1} < x_{r_1} < y_{r_1 + 1} < \cdots < y_{r_2 - 1} < x_{r_2} < \cdots < x_{r_k} < y_{r_k + 1} < \cdots < y_n$$
as well as $x_1 < \cdots < x_n$ are governed by the same distribution.
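The equivalence of the two experiments can be illustrated by simulation. A minimal sketch, assuming F is uniform on (0, 1), so that each truncated d.f. $F_{i,x}$ is the uniform d.f. on the gap $(x_{r_{i-1}}, x_{r_i})$; it only compares coordinatewise means with $E U_{i:n} = i/(n+1)$, which is a necessary, not a sufficient, check. The function names are hypothetical.

```python
import random

def experiment_direct(n, rng):
    """Ordered values x_1 < ... < x_n of n uniform observations."""
    return sorted(rng.random() for _ in range(n))

def experiment_resampled(n, ranks, rng):
    """Keep x_{r_1} < ... < x_{r_k}; redraw each block between them from the
    truncated d.f. (uniform on the gap, since F is uniform) and sort the block."""
    x = experiment_direct(n, rng)
    out = list(x)
    bounds = [0] + list(ranks) + [n + 1]  # r_0 = 0, r_{k+1} = n + 1
    for lo, hi in zip(bounds, bounds[1:]):
        a = 0.0 if lo == 0 else x[lo - 1]      # x_{r_0} = alpha(F) = 0
        b = 1.0 if hi == n + 1 else x[hi - 1]  # x_{r_{k+1}} = omega(F) = 1
        m = hi - lo - 1                        # number of values strictly inside
        out[lo:lo + m] = sorted(a + (b - a) * rng.random() for _ in range(m))
    return out

def coordinate_means(sampler, trials):
    sums = None
    for _ in range(trials):
        v = sampler()
        sums = list(v) if sums is None else [s + t for s, t in zip(sums, v)]
    return [s / trials for s in sums]

rng = random.Random(7)
print(coordinate_means(lambda: experiment_direct(5, rng), 40000))
print(coordinate_means(lambda: experiment_resampled(5, (2, 4), rng), 40000))
```

Both printed vectors should be close to (1/6, 2/6, ..., 5/6).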
In Corollary 1.8.2 we shall consider the conditional distribution of $(X_{s_1:n}, \ldots, X_{s_m:n})$ given $(X_{r_1:n}, \ldots, X_{r_k:n}) = (x_{r_1}, \ldots, x_{r_k})$ instead of the conditional distribution of the order statistic $(X_{1:n}, \ldots, X_{n:n})$. This corollary will be an immediate consequence of Theorem 1.8.1 and the following trivial remarks.
Let X and Y be r.v.'s, and g a measurable map defined on the range of Y. Then,
$$P(Y \in g^{-1}(\cdot) \mid X) \qquad (1.8.4)$$
is the conditional distribution of g(Y) given X. This becomes obvious by noting that, as a consequence of (1.8.1), for measurable sets A and C,
$$E\left[1_A(X)\, P(Y \in g^{-1}(C) \mid X)\right] = P\{X \in A,\ g(Y) \in C\}. \qquad (1.8.5)$$
An application of (1.8.4), with g being the projection $(x_1, \ldots, x_n) \to (x_{s_1}, \ldots, x_{s_m})$, yields

Corollary 1.8.2. Let $1 \le s_1 < \cdots < s_m \le n$. The conditional distribution of $(X_{s_1:n}, \ldots, X_{s_m:n})$ given $(X_{r_1:n}, \ldots, X_{r_k:n}) = (x_{r_1}, \ldots, x_{r_k})$ is the joint distribution of the r.v.'s $Y_{s_1}, \ldots, Y_{s_m}$ with $Y_i$ defined as in Theorem 1.8.1.
As an illustration of Theorem 1.8.1 and Corollary 1.8.2 we note several special cases.
EXAMPLES 1.8.3. (i) The conditional distribution of $X_{s:n}$ given $X_{r:n} = x$ is the distribution of
(a) the (s - r)th order statistic $Y_{s-r:n-r}$ of n - r i.i.d. random variables with d.f. $F_{(x,\infty)}$ (the truncation of F on the left of x) if $1 \le r < s \le n$,
(b) the sth order statistic $Y_{s:r-1}$ of r - 1 i.i.d. random variables with d.f. $F_{(-\infty,x)}$ (the truncation of F on the right of x) if $1 \le s < r \le n$,
(c) a degenerate r.v. with fixed value x if r = s.
(ii) More generally, if in (i) $X_{s:n}$ is replaced by
(a) $(X_{r+1:n}, \ldots, X_{s:n})$, $r < s \le n$, then in (i)(a) $Y_{s-r:n-r}$ has to be replaced by $(Y_{1:n-r}, \ldots, Y_{s-r:n-r})$,
(b) $(X_{s:n}, \ldots, X_{r-1:n})$, $1 \le s < r$, then in (i)(b) $Y_{s:r-1}$ has to be replaced by $(Y_{s:r-1}, \ldots, Y_{r-1:r-1})$.
(iii) The conditional distribution of $(X_{r+1:n}, \ldots, X_{s-1:n})$ given $X_{r:n} = x$ and $X_{s:n} = y$ is the distribution of the order statistic $(Y_{1:s-r-1}, \ldots, Y_{s-r-1:s-r-1})$ of s - r - 1 i.i.d. random variables with d.f. $F_{(x,y)}$ (the truncation of F on the left of x and on the right of y).
(iv) (Markov property) The conditional distribution of $X_{s:n}$ given $X_{1:n} = x_1, \ldots, X_{s-1:n} = x_{s-1}$ is the conditional distribution of $X_{s:n}$ given $X_{s-1:n} = x_{s-1}$. Hence, the sequence $X_{1:n}, \ldots, X_{n:n}$ has the Markov property.
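The case k = 1 of Theorem 1.8.1 can be verified directly from densities when F is uniform on (0, 1): the ratio of the joint density of two order statistics to the density of $X_{r:n}$ at x should coincide with the density of the (s - r)th order statistic of n - r uniform r.v.'s on (x, 1) if s > r, and with that of the sth order statistic of the r - 1 uniform r.v.'s on (0, x) below x if s < r. A minimal sketch using the standard closed-form densities of uniform order statistics; names are hypothetical.

```python
from math import factorial

def uniform_os_density(y, i, m, a, b):
    """Density at y of the ith order statistic of m i.i.d. uniforms on (a, b)."""
    if not a < y < b:
        return 0.0
    t = (y - a) / (b - a)
    c = factorial(m) / (factorial(i - 1) * factorial(m - i))
    return c * t ** (i - 1) * (1 - t) ** (m - i) / (b - a)

def joint_density(u, v, i, j, n):
    """Joint density of (U_{i:n}, U_{j:n}) at (u, v), for i < j."""
    if not 0 < u < v < 1:
        return 0.0
    c = factorial(n) / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
    return c * u ** (i - 1) * (v - u) ** (j - i - 1) * (1 - v) ** (n - j)

def conditional_density(y, s, r, x, n):
    """Density of U_{s:n} at y given U_{r:n} = x (s != r): joint / marginal."""
    marginal = uniform_os_density(x, r, n, 0.0, 1.0)
    if s > r:
        return joint_density(x, y, r, s, n) / marginal
    return joint_density(y, x, s, r, n) / marginal

n, r, x = 7, 3, 0.4
print(conditional_density(0.7, 5, r, x, n), uniform_os_density(0.7, 5 - r, n - r, x, 1.0))
print(conditional_density(0.25, 2, r, x, n), uniform_os_density(0.25, 2, r - 1, 0.0, x))
```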

The Conditional Distribution of Exceedances


Let again $X_{i:n}$ be the ith order statistic of n i.i.d. random variables $\xi_1, \ldots, \xi_n$ with common continuous d.f. F. As a special case of Example 1.8.3(ii) we obtain the following result concerning the k largest order statistics:

The conditional distribution of $(X_{n-k+1:n}, \ldots, X_{n:n})$ given $X_{n-k:n} = x$ is the distribution of the order statistic $(Y_{1:k}, \ldots, Y_{k:k})$ of k i.i.d. random variables $\eta_1, \ldots, \eta_k$ with common d.f. $F_{(x,\infty)}$.
By rearranging $X_{n-k+1:n}, \ldots, X_{n:n}$ in the original order of their outcome we obtain the k exceedances, say, $\zeta_1, \ldots, \zeta_k$ of the r.v.'s $\xi_1, \ldots, \xi_n$ over the "random threshold" $X_{n-k:n}$.
We have $(\zeta_1, \ldots, \zeta_k) = (\xi_{i(1)}, \ldots, \xi_{i(k)})$ whenever $1 \le i(1) < \cdots < i(k) \le n$ and $\min(\xi_{i(1)}, \ldots, \xi_{i(k)}) > X_{n-k:n}$. This defines the exceedances $\zeta_i$ with probability one because F is assumed to be continuous.
Corollary 1.8.4. Let $\alpha(F) < x < \omega(F)$. The conditional distribution of the exceedances $\zeta_1, \ldots, \zeta_k$ given $X_{n-k:n} = x$ is the joint distribution of k i.i.d. random variables $\eta_1, \ldots, \eta_k$ with common d.f. $F_{(x,\infty)}$ (the truncation of the d.f. F on the left of x).

PROOF. Let $S_k$ be the permutation group on {1, ..., k}. For every permutation $\tau \in S_k$ we get the representation
$$(\zeta_1, \ldots, \zeta_k) = (X_{n-\tau(1)+1:n}, \ldots, X_{n-\tau(k)+1:n})$$
on the set $A_\tau$ where
$$A_\tau = \{(R_{i(1)}, \ldots, R_{i(k)}) = (n - \tau(1) + 1, \ldots, n - \tau(k) + 1) \text{ for some } 1 \le i(1) < \cdots < i(k) \le n\}$$
and $(R_1, \ldots, R_n)$ is the rank statistic (see P.1.30). Check that $P(A_\tau) = 1/k!$ for every $\tau \in S_k$. Using the fact that the order statistic and the rank statistic are independent we obtain for every Borel set B
$$\begin{aligned}
P((\zeta_1, \ldots, \zeta_k) \in B \mid X_{n-k:n} = x)
&= \sum_{\tau \in S_k} P\left(A_\tau \cap \{(X_{n-\tau(1)+1:n}, \ldots, X_{n-\tau(k)+1:n}) \in B\} \mid X_{n-k:n} = x\right) \\
&= (1/k!) \sum_{\tau \in S_k} P\left((X_{n-\tau(1)+1:n}, \ldots, X_{n-\tau(k)+1:n}) \in B \mid X_{n-k:n} = x\right) \\
&= (1/k!) \sum_{\tau \in S_k} P\{(Y_{k-\tau(1)+1:k}, \ldots, Y_{k-\tau(k)+1:k}) \in B\}
\end{aligned}$$
where the $Y_{i:k}$ are the order statistics of the r.v.'s $\eta_j$. The last step follows from Example 1.8.3(ii). By P.1.30,
$$P((\zeta_1, \ldots, \zeta_k) \in B \mid X_{n-k:n} = x) = P\{(\eta_1, \ldots, \eta_k) \in B\}.$$
The proof is complete.
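Corollary 1.8.4 can be illustrated by simulation. A minimal sketch, assuming F uniform on (0, 1): then $F_{(x,\infty)}$ is uniform on (x, 1), so the rescaled exceedances $(\zeta_j - x)/(1 - x)$ should behave like i.i.d. uniform r.v.'s on (0, 1); in particular they should be exchangeable, with mean 1/2 and $E\zeta_1'\zeta_2' = 1/4$ for the rescaled values. The sample size, k and seed are arbitrary choices, and the helper names are hypothetical.

```python
import random

def exceedances(sample, k):
    """Exceedances over the random threshold X_{n-k:n}, in the original order."""
    n = len(sample)
    threshold = sorted(sample)[n - k - 1]  # X_{n-k:n} (rank n-k, 0-based index n-k-1)
    return threshold, [v for v in sample if v > threshold]

def normalized_exceedances(n, k, trials, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        x, zeta = exceedances(sample, k)
        # F_(x, inf) is uniform on (x, 1) here, so rescale to (0, 1).
        rows.append([(z - x) / (1 - x) for z in zeta])
    return rows

rows = normalized_exceedances(10, 3, 20000, seed=3)
print(sum(row[0] for row in rows) / len(rows))             # near 1/2
print(sum(row[0] < row[1] for row in rows) / len(rows))    # near 1/2
print(sum(row[0] * row[1] for row in rows) / len(rows))    # near 1/4
```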

Extensions of Corollary 1.8.4 can be found in P.1.33 and P.2.1.

Convex Combination of Two Order Statistics


From Example 1.8.3(i) we deduce the following result which will be pursued further in Section 6.2.

Corollary 1.8.5. Let F be a continuous d.f., and let $1 \le r < s \le n$. Then, for every $0 \le p \le 1$ and every real t,
$$P\{(1-p)X_{r:n} + pX_{s:n} \le t\} = F_{r,n}(t) - \int_{-\infty}^{t} P\{p(Y_{s-r:n-r} - x) > t - x\}\, dF_{r,n}(x)$$
where $F_{r,n}$ is the d.f. of $X_{r:n}$, and $Y_{s-r:n-r}$ is the (s - r)th order statistic of n - r i.i.d. random variables with common d.f. $F_{(x,\infty)}$ [the truncation of F on the left of x].
This identity shows that it is possible to get an approximation to the d.f. of the convex combination of two order statistics by using approximations to distributions of single order statistics.
In Section 6.2 we shall study the special case of the convex combination of consecutive order statistics $X_{r:n}$ and $X_{r+1:n}$ where $X_{r:n}$ is a central order statistic and, thus, $Y_{s-r:n-r}$ is a sample minimum.
PROOF OF COROLLARY 1.8.5. Example 1.8.3(i) implies that
$$\begin{aligned}
P\{(1-p)X_{r:n} + pX_{s:n} \le t\} &= \int P\{(1-p)x + pY_{s-r:n-r} \le t\}\, dF_{r,n}(x) \\
&= \int_{-\infty}^{t} P\{p(Y_{s-r:n-r} - x) \le t - x\}\, dF_{r,n}(x)
\end{aligned}$$
since $P\{Y_{s-r:n-r} \le x\} = 0$. This implies the assertion.
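As a check of Corollary 1.8.5, a minimal sketch assuming F uniform on (0, 1), 0 < p ≤ 1 and 0 < t < 1: then $Y_{s-r:n-r} = x + (1-x)V$ with V distributed as $U_{s-r:n-r}$, and d.f.'s of uniform order statistics are binomial tails, $P\{U_{k:m} \le y\} = P\{\mathrm{Bin}(m, y) \ge k\}$. The integral is evaluated by the midpoint rule and compared with a Monte Carlo estimate of the left-hand side; the helper names and parameters are illustrative.

```python
import random
from math import comb, factorial

def binom_tail(m, y, k):
    """P{Binomial(m, y) >= k}, which equals P{U_{k:m} <= y}; y is clipped to [0, 1]."""
    y = min(max(y, 0.0), 1.0)
    return sum(comb(m, j) * y ** j * (1 - y) ** (m - j) for j in range(k, m + 1))

def rhs(r, s, n, p, t, steps=4000):
    """F_{r,n}(t) - int_0^t P{p(Y - x) > t - x} dF_{r,n}(x) for uniform F."""
    c = factorial(n) / (factorial(r - 1) * factorial(n - r))
    h = t / steps
    integral = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint rule on (0, t)
        density = c * x ** (r - 1) * (1 - x) ** (n - r)
        # Y = x + (1 - x) V with V distributed as U_{s-r:n-r}
        tail = 1.0 - binom_tail(n - r, (t - x) / (p * (1 - x)), s - r)
        integral += tail * density * h
    return binom_tail(n, t, r) - integral

def lhs_monte_carlo(r, s, n, p, t, trials, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = sorted(rng.random() for _ in range(n))
        hits += (1 - p) * u[r - 1] + p * u[s - 1] <= t
    return hits / trials

print(rhs(2, 4, 6, 0.3, 0.4), lhs_monte_carlo(2, 4, 6, 0.3, 0.4, 40000, seed=5))
```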

P.1. Problems and Supplements


Let $\xi_1, \ldots, \xi_n$ be i.i.d. random variables with common d.f. F, and let $X_{r:n}$ denote the rth order statistic.
1. Prove that the order statistic is measurable.
2. Denote by I(q) the set of all q-quantiles of F. If $r(n)/n \to q$ as $n \to \infty$ then $X_{r(n):n} \in U$ eventually, w.p. 1, for every open interval U containing I(q).
3. Denote by $S_n$ the group of permutations on {1, ..., n}.
(i) For every function f,
$$\sum_{\tau \in S_n} f(X_{\tau(1):n}, \ldots, X_{\tau(n):n}) = \sum_{\tau \in S_n} f(\xi_{\tau(1)}, \ldots, \xi_{\tau(n)}).$$
(ii) Using the notation of (1.1.4),
$$Z_{r:n}(\xi_1, \ldots, \xi_n) = Z_{r:n}(\xi_{\tau(1)}, \ldots, \xi_{\tau(n)}), \qquad \tau \in S_n$$
(that is, the order statistic is invariant w.r.t. the permutation of the given r.v.'s).

4. (i) A d.f. F is continuous if $F^{-1}$ is strictly increasing.
(ii) $F^{-1}$ is continuous if F is strictly increasing on $(\alpha(F), \omega(F))$.
(iii) Denote by $F_z$ the truncation of the d.f. F on the left of z. Prove that
$$F_z^{-1}(q) = F^{-1}\left[(1 - F(z))q + F(z)\right].$$

5. Let η be a (0, 1)-valued r.v. with d.f. F. Then, $G^{-1}(\eta)$ has the d.f. $F \circ G$ for every d.f. G.
6. Let η be a r.v. with uniform distribution on the interval $(u_1, u_2)$ where $0 \le u_1 < u_2 \le 1$. Let F be a d.f. and put $v_i = F^{-1}(u_i)$ [with the convention that $F^{-1}(0) = \alpha(F)$ and $F^{-1}(1) = \omega(F)$]. Then, $F^{-1}(\eta)$ has the d.f.
$$G(x) = (F(x) - F(v_1))/(F(v_2) - F(v_1)).$$
7. Let F and G be d.f.'s. If $F(x) \ge G(x)$ for every $x \ge u$ then $F^{-1}(q) \le G^{-1}(q)$ for every $q > G(u)$.
8. Let $\xi_i$, i = 1, 2, 3, ..., be r.v.'s which weakly converge to $\xi_0$. Then, there exist r.v.'s $\xi_i'$ such that $\xi_i \stackrel{d}{=} \xi_i'$ and $\xi_i'$, i = 1, 2, 3, ..., converge pointwise to $\xi_0'$ w.p. 1. [Hint: Use Lemma 1.2.9.]
9. For the beta d.f. $I_{r,s}$ with parameters r and s [compare with (1.3.8)] the following recurrence relation holds:
$$(r + s) I_{r,s} = r I_{r+1,s} + s I_{r,s+1}.$$

10. (Joint d.f. of two order statistics)
Let $X_{i:n}$ be the ith order statistic of n i.i.d. random variables with common d.f. F.
(i) If $1 \le r < s \le n$ then for u < v,
$$P\{X_{r:n} \le u, X_{s:n} \le v\} = \sum_{i=r}^{n} \sum_{j=\max(0,\, s-i)}^{n-i} \frac{n!}{i!\, j!\, (n-i-j)!}\, F(u)^i \left(F(v) - F(u)\right)^j \left(1 - F(v)\right)^{n-i-j}$$
and for u ≥ v,
$$P\{X_{r:n} \le u, X_{s:n} \le v\} = P\{X_{s:n} \le v\}.$$
[Hint: Use the fact that $\sum_{k=1}^{n} \left[1_{(-\infty,u]}(\xi_k), 1_{(u,v]}(\xi_k), 1_{(v,\infty)}(\xi_k)\right]$ is a multinomial random vector.]
(ii) Denote again by $I_{r,s}$ the beta d.f. Then for u < v,

P{X".

u, X.,.

v}

= 1".-,+1 (F(u)) _ _ n_!_

(r - 1)!

'-f1 (_1)i F(ur+J.-'+1"-'-i(~


- F(~)) .
n!(n - r - I)!(r + I)
i=O

(Wilks, 1962)
11. (Transformation theorem)
Let ν be a finite signed measure with density f. Let T be a strictly monotone, real-valued function defined on an open interval J. Assume that I = T(J) is an open interval and that the inverse $S: I \to J$ of T is absolutely continuous. Then $|S'|\,(f \circ S)\,1_I$ is a density of $T\nu$ (the measure induced by ν and T).
[Hint: Apply Hewitt & Stromberg, 1975, Corollary (20.5).]

12. Derive Theorem 1.3.2 from Theorem 1.4.1 by computing the density of the rth marginal distribution in the usual way by integration.
(Hajek & Sidak, 1967, pages 39, 78)
13. Extension to Theorem 1.4.1: Suppose that the random vector $(\xi_1, \ldots, \xi_n)$ has the (Lebesgue) density g. Then, the order statistic $(X_{1:n}, \ldots, X_{n:n})$ has the density $f_{1,\ldots,n:n}$ given by
$$f_{1,\ldots,n:n}(x) = \sum_{\tau \in S_n} g(x_{\tau(1)}, \ldots, x_{\tau(n)}), \qquad x_1 < \cdots < x_n,$$
and $= 0$, otherwise (here $S_n$ again denotes the permutation group).
(Hajek & Sidak, 1967, page 36)
14. For i = 1, 2 let $X_{1:n}^{(i)}, \ldots, X_{n:n}^{(i)}$ be the order statistics of n i.i.d. random variables with common continuous d.f. $F_i$. If the restrictions $F_1|B_j$ and $F_2|B_j$ are equal on the fixed measurable sets $B_j$, j = 1, ..., k, then for every measurable set $B \subset B_1 \times \cdots \times B_k$ and $1 \le r_1 < \cdots < r_k \le n$:
$$P\{(X_{r_1:n}^{(1)}, \ldots, X_{r_k:n}^{(1)}) \in B\} = P\{(X_{r_1:n}^{(2)}, \ldots, X_{r_k:n}^{(2)}) \in B\}.$$
15. If the continuity condition in P.1.14 is omitted then the result continues to hold if the sets $B_j$ are open.
16. (Modifications of Malmquist's result)
Let $1 \le r_1 < \cdots < r_k \le n$.
(i) Prove that the following r.v.'s are independent:
$$1 - U_{r_1:n},\ (1 - U_{r_2:n})/(1 - U_{r_1:n}),\ \ldots,\ (1 - U_{r_k:n})/(1 - U_{r_{k-1}:n}).$$
Moreover,
$$(1 - U_{r_i:n})/(1 - U_{r_{i-1}:n}) \stackrel{d}{=} U_{n-r_i+1:n-r_{i-1}}$$
for i = 1, ..., k (with $r_0 = 0$ and $U_{0:n} = 0$).
(ii) Prove that the following r.v.'s are independent:
Moreover,
for i = 1, ..., k (with $r_{k+1} = n + 1$ and $U_{n+1:n} = 1$).
(iii) Prove that the following r.v.'s are independent:
$$U_{r_1:n},\ (U_{r_2:n} - U_{r_1:n})/(1 - U_{r_1:n}),\ \ldots,\ (U_{r_k:n} - U_{r_{k-1}:n})/(1 - U_{r_{k-1}:n}).$$
Moreover,
for i = 1, ..., k (with $r_0 = 0$ and $U_{0:n} = 0$).
(iv) Prove that the following r.v.'s are independent:
$$(U_{r_2:n} - U_{r_1:n})/U_{r_2:n},\ \ldots,\ (U_{r_k:n} - U_{r_{k-1}:n})/U_{r_k:n},\ 1 - U_{r_k:n}.$$
Moreover,
for i = 1, ..., k (with $r_{k+1} = n + 1$ and $U_{n+1:n} = 1$).

17. Denote by $\xi_i$ independent standard normal r.v.'s. It is well known that $(\xi_1^2 + \xi_2^2)/2$ is a standard exponential r.v. Prove that
$$(U_{1:n}, \ldots, U_{n:n}) \stackrel{d}{=} \left(\sum_{i=1}^{2r} \xi_i^2 \Big/ \sum_{i=1}^{2(n+1)} \xi_i^2\right)_{r=1}^{n}.$$

18. Let $\xi_1, \ldots, \xi_{k+1}$ be independent gamma r.v.'s with parameters $s_1, \ldots, s_{k+1}$.
(i) Then, $\left(\xi_i \big/ \sum_{j=1}^{k+1} \xi_j\right)_{i=1}^{k}$ has a k-variate Dirichlet distribution with parameter vector $(s_1, \ldots, s_{k+1})$.
(Wilks, 1962)
(ii) Show that for $0 = r_0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$,
19. Let $F_n$ denote the sample d.f. of n i.i.d. (0, 1)-uniformly distributed r.v.'s, and $\eta_1, \ldots, \eta_{n+1}$ independent standard exponential r.v.'s. Then,
$$F_n(t) \stackrel{d}{=} n^{-1} \sum_{i=1}^{n} 1_{(-\infty,t]}\left(\sum_{j=1}^{i} \eta_j \Big/ \sum_{j=1}^{n+1} \eta_j\right).$$
20. (i) Let $X_{i:n}$ denote the ith order statistic of n i.i.d. random variables with common density f. As an extension of Theorem 1.6.1 one obtains that $(X_{r:n} - X_{r-1:n})_{r=1}^{n}$ has the density
$$x \to n! \prod_{i=1}^{n} f\Big(\sum_{j=1}^{i} x_j\Big), \qquad x_i > 0,\ i = 1, \ldots, n,$$
and the density is zero, otherwise.
(ii) The density of $(U_{r:n} - U_{r-1:n})_{r=1}^{n}$ is given by
$$x \to n! \qquad \text{if } x_i > 0,\ i = 1, \ldots, n, \text{ and } \sum_{j=1}^{n} x_j < 1,$$
and the density is zero, otherwise.
(iii) For $1 \le r < s \le n$ the density of $(U_{r:n} - U_{r-1:n},\ U_{s:n} - U_{s-1:n})$ is given by
$$(x, y) \to n(n-1)(1 - x - y)^{n-2} \qquad \text{if } x, y > 0 \text{ and } x + y < 1,$$
and the density is zero, otherwise.


21. (Convolutions of gamma r.v.'s)
(i) Give a direct proof of Lemma 1.6.6 by induction over n and by using the convolution formula $P\{\xi + \eta \le t\} = \int G(t - s)\, dF(s)$ where ξ and η are independent r.v.'s with d.f.'s G and F.
(ii) It is clear that ξ + η is a gamma r.v. with parameter m + n if ξ and η are independent gamma r.v.'s with parameters m and n.
22. Let α > 0 and i = 1 or i = 2. Prove that the sample minimum of n i.i.d. random variables with common generalized Pareto d.f. $W_{i,\alpha}$ has the d.f. $W_{i,n\alpha}$.
23. Prove that
$$E\, U_{r:n}^{-j} = \prod_{m=1}^{j} \frac{n - m + 1}{r - m} \qquad \text{if } 1 \le j < r.$$
[Hint: Use the method of the proof to Lemma 1.7.1.]

24. Put $\lambda_r = r/(n + 1)$, $U_{n+1:n} = 1$ and $U_{0:n} = 0$. Prove that
(i)
if $1 \le r < s \le n + 1$, and
(ii)
if $0 \le r < s \le n$.
25. For $0 = r_0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$ and reals $a_i$, i = 1, ..., k,
$$\sum_{i=1}^{k+1} (r_i - r_{i-1} - 1)\, E\left[\frac{a_i (U_{r_i:n} - \lambda_{r_i})^2 - a_{i-1} (U_{r_{i-1}:n} - \lambda_{r_{i-1}})^2}{U_{r_i:n} - U_{r_{i-1}:n}}\right] = 0$$
where $a_0 = a_{k+1} = 0$.

26. Let $X_{r:n}$ be the rth order statistic of n i.i.d. random variables with common d.f. $F(x) = 1 - 1/\log x$ for $x \ge e$. Then, for every positive integer k,
$$E|X_{r:n}|^k = \infty.$$
27. For the order statistics $X_{1:1}$ and $X_{1:2}$ from the Pareto d.f. $W_{1,1}$ we get
$$EX_{1:1} = \infty \quad\text{and}\quad EX_{1:2} = 2.$$
28. Let $M_{r,n}$ be the randomized sample median as defined in (1.7.19) and
$$N_{r,n} = X_{r:n} 1_{(1/2,1)}(\eta) + X_{n-r+1:n} 1_{(0,1/2]}(\eta)$$
where η is a (0, 1)-uniformly distributed r.v. that is independent of $(\xi_1, \ldots, \xi_n)$. Show that the distributions of $M_{r,n}$ and $N_{r,n}$ are equal.

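29. (Conditional distribution of $(\xi_1, \ldots, \xi_n)$ given $(X_{1:n}, \ldots, X_{n:n})$)
Let $X_{i:n}$ be the order statistics of n i.i.d. random variables $\xi_1, \ldots, \xi_n$. Let $S_n$ denote the group of permutations on {1, ..., n}. Then, the conditional distribution of $(\xi_1, \ldots, \xi_n)$ given $(X_{1:n}, \ldots, X_{n:n})$ is defined by
$$P\left((\xi_1, \ldots, \xi_n) \in A \mid (X_{1:n}, \ldots, X_{n:n})\right) = (n!)^{-1} \sum_{\tau \in S_n} 1_A(X_{\tau(1):n}, \ldots, X_{\tau(n):n}).$$
Thus, the conditional expectation of $f(\xi_1, \ldots, \xi_n)$ given $(X_{1:n}, \ldots, X_{n:n})$ is defined by
$$E\left(f(\xi_1, \ldots, \xi_n) \mid (X_{1:n}, \ldots, X_{n:n})\right) = (n!)^{-1} \sum_{\tau \in S_n} f(X_{\tau(1):n}, \ldots, X_{\tau(n):n}).$$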

30. (Rank statistic and order statistic)
The rank of $\xi_i$ is defined by $R_{i:n} = nF_n(\xi_i)$ where $F_n$ is the sample d.f. based on $\xi_1, \ldots, \xi_n$. Moreover, $R_n = (R_{1:n}, \ldots, R_{n:n})$ is the rank statistic.
Suppose that $(\xi_1, \ldots, \xi_n)$ has the density g. Then:
(i)
(ii) The conditional distribution of $R_n$ given $X_n = (X_{1:n}, \ldots, X_{n:n})$ is defined by
$$P(R_n = \kappa \mid X_n) = g(X_{\kappa(1):n}, \ldots, X_{\kappa(n):n}) \Big/ \sum_{\tau \in S_n} g(X_{\tau(1):n}, \ldots, X_{\tau(n):n})$$
for $\kappa = (\kappa(1), \ldots, \kappa(n)) \in S_n$.
(iii) If, in addition, $\xi_1, \ldots, \xi_n$ are i.i.d. random variables then $R_n$ and $X_n$ are independent and $P\{R_n = \kappa\} = 1/n!$ for every $\kappa \in S_n$.
(Hajek & Sidak, 1967, pages 36-38)
31. (Positive dependence of order statistics)
Let $X_{i:n}$ denote the ith order statistic of n i.i.d. random variables with common continuous d.f. F. Assume that $E|X_{i:n}| < \infty$, $E|X_{j:n}| < \infty$ and $E|X_{i:n}X_{j:n}| < \infty$. Then, $\mathrm{Cov}(X_{i:n}, X_{j:n}) \ge 0$.
(Proved by P. Bickel (1967) under stronger conditions.)
32. (Conditional independence under Markov property)
Let $Y_1, \ldots, Y_n$ be real-valued r.v.'s which possess the Markov property. Let $1 \le r_1 < \cdots < r_k \le n$. Then, conditioned on $Y_{r_1}, \ldots, Y_{r_k}$, the random vectors $(Y_1, \ldots, Y_{r_1}), (Y_{r_1+1}, \ldots, Y_{r_2}), \ldots, (Y_{r_k+1}, \ldots, Y_n)$ are independent; that is, the product measure
$$P\left((Y_1, \ldots, Y_{r_1}) \in \cdot \mid Y_{r_1}\right) \times P\left((Y_{r_1+1}, \ldots, Y_{r_2}) \in \cdot \mid (Y_{r_1}, Y_{r_2})\right) \times \cdots \times P\left((Y_{r_k+1}, \ldots, Y_n) \in \cdot \mid Y_{r_k}\right)$$
is the conditional distribution of $(Y_1, \ldots, Y_n)$ given $(Y_{r_1}, \ldots, Y_{r_k})$.


33. Let F, $r_i$, x, and $F_{i,x}$ be as in Theorem 1.8.1.
(i) For $i \in I := \{j: 1 \le j \le k + 1,\ r_j - r_{j-1} > 1\}$ define the random vector $(\zeta_{r_{i-1}+1}, \ldots, \zeta_{r_i - 1})$ by the original r.v.'s $\xi_j$ lying strictly between $X_{r_{i-1}:n}$ and $X_{r_i:n}$ in the original order of the outcome.
Then, the conditional distribution of $(\zeta_{r_{i-1}+1}, \ldots, \zeta_{r_i - 1})$, $i \in I$, given $X_{r_1:n} = x_{r_1}, \ldots, X_{r_k:n} = x_{r_k}$ is the joint distribution of the independent random vectors $(\eta_{r_{i-1}+1}, \ldots, \eta_{r_i - 1})$, $i \in I$, where for every $i \in I$ the components of the vector are i.i.d. with common d.f. $F_{i,x}$.
(ii) Notice that
$$(\zeta_{r_{i-1}+1}, \ldots, \zeta_{r_i - 1}) = (\xi_{j(1)}, \ldots, \xi_{j(r_i - r_{i-1} - 1)})$$
whenever $1 \le j(1) < \cdots < j(r_i - r_{i-1} - 1) \le n$, and
$$X_{r_{i-1}:n} < \min(\xi_{j(1)}, \ldots, \xi_{j(r_i - r_{i-1} - 1)}) \le \max(\xi_{j(1)}, \ldots, \xi_{j(r_i - r_{i-1} - 1)}) < X_{r_i:n}.$$


34. (Conditional d.f. of exceedances)
Let $F_n$ be the sample d.f. of n i.i.d. r.v.'s with common uniform d.f. on (0, 1). Then $nF_n(t)$, $0 \le t \le 1$, is a Markov process such that $nF_n(t)$, $x_0 \le t \le 1$, conditioned on $nF_n(x_0) = k$, is distributed as
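The multinomial formula of P.1.10(i) can be checked against simulation. A minimal sketch, assuming F uniform on (0, 1) so that F(u) = u; helper names, parameters and seed are illustrative. As a further consistency check, letting F(v) = 1 reduces the double sum to the binomial tail $P\{X_{r:n} \le u\}$.

```python
import random
from math import comb, factorial

def joint_cdf_two_order_stats(r, s, n, Fu, Fv):
    """P{X_{r:n} <= u, X_{s:n} <= v} for u < v via the multinomial counts
    (#obs <= u, #obs in (u, v], #obs > v); Fu = F(u), Fv = F(v)."""
    total = 0.0
    for i in range(r, n + 1):
        for j in range(max(0, s - i), n - i + 1):
            coef = factorial(n) / (factorial(i) * factorial(j) * factorial(n - i - j))
            total += coef * Fu ** i * (Fv - Fu) ** j * (1 - Fv) ** (n - i - j)
    return total

def marginal_cdf(r, n, Fu):
    """P{X_{r:n} <= u} = P{Binomial(n, F(u)) >= r}."""
    return sum(comb(n, j) * Fu ** j * (1 - Fu) ** (n - j) for j in range(r, n + 1))

def simulated(r, s, n, u, v, trials, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sorted(rng.random() for _ in range(n))  # uniform F: F(u) = u
        hits += x[r - 1] <= u and x[s - 1] <= v
    return hits / trials

print(joint_cdf_two_order_stats(2, 5, 6, 0.3, 0.7), simulated(2, 5, 6, 0.3, 0.7, 40000, seed=11))
print(joint_cdf_two_order_stats(2, 5, 6, 0.3, 1.0), marginal_cdf(2, 6, 0.3))
```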

Bibliographical Notes
Ordering of observations according to their magnitude and identifying central or extreme events is among the simplest of human activities. Thus, one can give early reference to the subject of order statistics by quotations from any number of ancient books. For example, J. Tiago de Oliveira gives reference to the age of Methuselah (Genesis, The Bible) in the preface of Statistical Extremes and Applications (1984). By the way, Methuselah is reported to have lived 969 years. This should not merely be regarded as a curiosity but also as a comment indicating the difficulties for the proper choice of a model; here in connection with the question (compare with E.J. Gumbel (1933), Das Alter des Methusalem): Does the distribution of mortality have a bounded support?
An exhaustive chronological bibliography on order statistics of pre-1950
and 1950-1959 publications with summaries, references and citations has
been compiled by L. Harter. The first relevant result is that of Nicolas Bernoulli (1709) which may be interpreted as the expectation of the maximum of uniform random variables.
In the early period, the sample median was of some importance because of
its property of minimizing the sum of absolute deviations. It is noteworthy
that Laplace (1818) proved the asymptotic normality of the sample median.
This result showed that the sample median, as an estimator of the center of
the normal distribution, is asymptotically inefficient w.r.t. the sample mean.
From our point of view, the statistical theory in the 19th century may be
characterized by (a) the widely accepted role of the normal distribution as a
"universal" law and (b) the beginning of a critical phase which arose from the
fact that extremes often do not fit that assumption. Extremes were regarded
as doubtful, outlying observations (outliers) which had to be rejected. The
attitude toward extremes at that time may be interpreted as an attempt to
"immunize" the normality assumption against experience.
Modern statistical theory is connected with the name of R.A. Fisher who
in 1921 discussed the problem of outliers: " ... , the rejection of observations
is too crude to be defended; and unless there are other reasons for rejection
than mere divergences from the majority, it would be more philosophical to
accept these extreme values, not as gross errors, but as indications that the
distribution of errors is not normal."
A paper by L. von Bortkiewicz in 1922 aroused the interest of some of his
contemporaries (E.L. Dodd (1923), R. von Mises (1923), L.H.C. Tippett (1925)). Von Bortkiewicz studied the sample range of normal random variables. An important step toward the asymptotic theory of extremes was made by E.L. Dodd and R. von Mises. Both authors studied the asymptotic behavior of the sample maximum of normal and non-normal random variables. The article of von Mises is written in a very attractive, modern style. Under weak regularity conditions, e.g. satisfied by the normal d.f., von Mises proved that the expectation of the sample maximum is asymptotically equal to $F^{-1}(1 - 1/n)$; moreover, he proved that
$$P\{|X_{n:n} - F^{-1}(1 - 1/n)| \le \varepsilon\} \to 1, \qquad n \to \infty,$$
for every ε > 0.

A similar result was also deduced by Dodd for various classes of distributions.
This development culminated in the article of R.A. Fisher and L.H.C. Tippett (1928), who derived the three types of extreme value distributions and discussed the stability problem. The limiting d.f. $G_{1,\alpha}$ was independently discovered by M. Frechet (1927). As mentioned by Wilks (1948), Frechet's result and that of Fisher and Tippett actually appeared almost simultaneously in 1928.
We mention some of the early results obtained for central order statistics:
In 1902, K. Pearson derived the expectation of a spacing under a continuous
d.f. (Galton difference problem) and, in 1920, investigated the performance of
"systematic statistics" as estimators of the median by computing asymptotic
expectations and covariances of sample quantiles. Craig (1932) established
densities of sample quantiles in special cases. Thompson (1936) treated
confidence intervals for the q-quantile. Compared to the development in
extreme value theory the results concerning central order statistics were
obtained more sporadically than systematically.
It is clear that the considerations in this book concerning exact distributions of order statistics are not exhaustive. For example, it is worthwhile
studying distributions of order statistics in the discrete case as it was done
by Nagaraja (1982, 1986), Arnold et al. (1984), and Rüschendorf (1985a).
B.C. Arnold and his co-authors showed that order statistics of a sample of size
$n \ge 3$ possess the Markov property if, and only if, there does not exist an atom
x of the underlying d.f. F such that $0 < F(x-)$ and $F(x) < 1$. In that paper one
may also find expressions for the density of order statistics in the discrete case.
We also note that densities of order statistics in case of a random sample size
are given in an explicit form by Consul (1984); see also Smith (1984, pages 631,
632). Further results concerning exact distributions of order statistics may be
found in the books mentioned below.
Apart from the books of E.J. Gumbel (1958), L. de Haan (1970), H.A. David
(1981), J. Galambos (1987), M.R. Leadbetter et al. (1983), and S.I. Resnick
(1987), mentioned in the various sections, we refer to the books of Johnson
and Kotz (1970, 1972) (order statistics for special distributions), Barnett and
Lewis (1978) (outliers), and R.R. Kinnison (1985) (applied aspects of extreme
value theory). The reading of survey articles about order statistics written by
S.S. Wilks (1948), A. Renyi (1953), and J. Galambos (1984) can be highly
recommended. For an elementary, enjoyable introduction to classical results
of extreme value theory we refer to de Haan (1976).

CHAPTER 2

Multivariate Order Statistics

This chapter is primarily concerned with the marginal ordering of the
observations. Thus, the restriction to one component again leads to the order
statistics dealt with in Chapter 1. Our treatment of multivariate order statistics
will not be as exhaustive as that in the univariate case because of the technical
difficulties and the complicated formulae for d.f.'s and densities.
There is one exception, namely, the case of multivariate maxima of i.i.d.
random vectors with d.f. $F$. This case is comparatively easy to deal with since
the d.f. of the multivariate maximum is again given by $F^n$, and the density is
consequently of a simple form.

2.1. Introduction
Multivariate order statistics (including extremes) will be defined by taking
order statistics componentwise (in other words, we consider marginal ordering).
It is by no means self-evident to define order statistics and extremes in this
particular way, and we do not deny that other definitions of multivariate order
statistics may be of equal importance. Some other possibilities will be
indicated at the end of this section. One reason why we emphasize this
particular definition is that it fits our present program and purposes well.
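In the sequel, the relations and arithmetic operations are always taken
componentwise. Given $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_d)$ we write
$$x \le y \quad \text{if} \quad x_i \le y_i, \quad i = 1, \dots, d, \tag{2.1.1}$$
and
$$x < y \quad \text{if} \quad x_i < y_i, \quad i = 1, \dots, d. \tag{2.1.2}$$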


The Definition of Multivariate Order Statistics


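Let $\xi_1, \dots, \xi_n$ be $n$ random vectors of dimension $d$ where $\xi_i = (\xi_{i,1}, \xi_{i,2}, \dots, \xi_{i,d})$.
The ordered values of the $j$th components $\xi_{1,j}, \xi_{2,j}, \dots, \xi_{n,j}$ are denoted by
$$X^{(j)}_{1:n} \le X^{(j)}_{2:n} \le \dots \le X^{(j)}_{n:n}. \tag{2.1.3}$$
Using the map $Z_{r:n}$ as defined in (1.1.4) we have
$$X^{(j)}_{r:n} = Z_{r:n}(\xi_{1,j}, \xi_{2,j}, \dots, \xi_{n,j}). \tag{2.1.4}$$
We also write
$$X_{r:n} = (X^{(1)}_{r:n}, X^{(2)}_{r:n}, \dots, X^{(d)}_{r:n}). \tag{2.1.5}$$
Using the order relation as defined in (2.1.1) we obtain
$$X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}. \tag{2.1.6}$$
Notice that
$$X_{1:n} = (X^{(1)}_{1:n}, X^{(2)}_{1:n}, \dots, X^{(d)}_{1:n}) \tag{2.1.7}$$
is the $d$-variate sample minimum, and
$$X_{n:n} = (X^{(1)}_{n:n}, X^{(2)}_{n:n}, \dots, X^{(d)}_{n:n}) \tag{2.1.8}$$
is the $d$-variate sample maximum.
Observe that realizations of $X_{r:n}$ are, in general, not realizations of $\xi_1, \dots, \xi_n$.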

The Relation to Frequencies


For certain problems the results of the previous sections can easily be extended
to the multivariate set-up. As an example we mention that (1.1.7) implies that
$$P\{X_{r:n} \le t\} = P\Big\{\sum_{i=1}^n \big(1_{(-\infty,t_1]}(\xi_{i,1}), \dots, 1_{(-\infty,t_d]}(\xi_{i,d})\big) \ge \mathbf{r}\Big\} \tag{2.1.9}$$
where $t = (t_1, t_2, \dots, t_d)$ and $\mathbf{r} = (r, r, \dots, r)$. Notice that in (2.1.9) we obtain a
sum of independent random vectors if the random vectors $\xi_1, \xi_2, \dots, \xi_n$ are
independent. It requires no effort to extend (2.1.9) to any subclass of the r.v.'s
$X^{(j)}_{r:n}$. For $I \subset \{(j, r): j = 1, \dots, d \text{ and } r = 1, \dots, n\}$ we have
$$P\{X^{(j)}_{r:n} \le t_{j,r},\ (j, r) \in I\} = P\Big\{\sum_{i=1}^n 1_{(-\infty,t_{j,r}]}(\xi_{i,j}) \ge r,\ (j, r) \in I\Big\}. \tag{2.1.10}$$
Thus, again the joint distribution of the r.v.'s $X^{(j)}_{r:n}$, $(j, r) \in I$, can be
represented by means of the distribution of a sum of independent random
vectors if the random vectors $\xi_1, \dots, \xi_n$ are independent. Note that a similar
result holds if maxima
$$X^{(1)}_{n(1):n(1)}, \dots, X^{(d)}_{n(d):n(d)}$$
are treated with different sample sizes for each component.
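The identity (2.1.9) holds realization by realization, so it can be checked mechanically. The following Python sketch (the helper names are our own and purely illustrative) computes $X_{r:n}$ under marginal ordering, compares the two events of (2.1.9) on random samples, and shows that $X_{1:n}$ need not be one of the sample points.

```python
import random

def componentwise_order_stat(sample, r):
    """X_{r:n} under marginal ordering: the r-th smallest value of each
    coordinate, taken separately (cf. (2.1.4) and (2.1.5))."""
    d = len(sample[0])
    return tuple(sorted(x[j] for x in sample)[r - 1] for j in range(d))

def order_stat_event(sample, t, r):
    """Left-hand event of (2.1.9): X_{r:n} <= t componentwise."""
    return all(a <= b for a, b in zip(componentwise_order_stat(sample, r), t))

def count_event(sample, t, r):
    """Right-hand event of (2.1.9): for each coordinate j, at least r of the
    indicators 1{xi_{i,j} <= t_j} are equal to one."""
    return all(sum(x[j] <= t[j] for x in sample) >= r for j in range(len(t)))

def check_identity(trials=2000, seed=1):
    """Compare both events on random samples of random size and dimension."""
    rng = random.Random(seed)
    for _ in range(trials):
        n, d = rng.randint(1, 8), rng.randint(1, 3)
        sample = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
        t = tuple(rng.random() for _ in range(d))
        r = rng.randint(1, n)
        if order_stat_event(sample, t, r) != count_event(sample, t, r):
            return False
    return True

sample = [(3, 1), (1, 2), (2, 3)]
print(componentwise_order_stat(sample, 1))   # (1, 1): not one of the sample points
print(check_identity())                      # True
```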


Further Concepts of Multivariate Ordering


A particular characteristic of univariate order statistics was that the ordered
values no longer contain any information about the order of their outcome.
Recall that this information is carried by the rank statistic $R_n$ (see P.1.30).
The corresponding general formulation of this aspect in the Euclidean $d$-space
is given by the definition of the order statistic via sets of observations. Thus,
given r.v.'s or random vectors $\xi_1, \dots, \xi_n$ we may also call the set $\{\xi_1, \dots, \xi_n\}$
the order statistic. It is well known that for i.i.d. random vectors these random
sets form a minimal sufficient statistic.
Other concepts are more closely related to ordering the observations by magnitude, as in the univariate case. Our enthusiasm for
this topic is rather limited because no successful theory exists (apart from the
particular case of sample maxima and sample minima as defined in (2.1.7) and
(2.1.8)). However, this topic has attracted increasing interest since Barnett's
brilliant 1976 paper, which is full of ideas, suggestions and applications.
Some brief comments about the different concepts of multivariate ordering:
(a) The convex hull of the data points and the subsequent "peeling" of
the multi-dimensional sample yield one possible multivariate
ordering. This concept is appealing from a geometric point of view. The convex
hull can, e.g., be used as an estimator of the support of the distribution.
(b) The concomitants are obtained (in the bivariate case) by arranging the
data in the second component according to the ordering in the first
component.
(c) The multivariate sample median is a solution $x$ of the minimization problem
$$\sum_{i=1}^n \|x_i - x\|_2 = \min_x! \tag{2.1.11}$$
where $\|\cdot\|_2$ denotes the Euclidean norm. The median of a multivariate
probability measure $Q$ is defined by
$$\int \|y - x\|_2 \, dQ(y) = \min_x!. \tag{2.1.12}$$
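Equation (2.1.11) has no closed-form solution in general. A common way to compute the spatial median numerically is Weiszfeld's fixed-point iteration; this is a standard technique swapped in here for illustration, not a method taken from the text. The sketch assumes points given as tuples of floats and uses a crude safeguard when an iterate coincides with a data point.

```python
import math
import random

def spatial_median(points, tol=1e-12, max_iter=10_000):
    """Approximate a minimizer x of sum_i ||x_i - x||_2, cf. (2.1.11), by
    Weiszfeld's iteration started at the coordinatewise mean.
    Simplification: a distance of exactly zero (iterate on a data point) is
    floored at 1e-12 instead of applying the usual Weiszfeld correction."""
    d = len(points[0])
    x = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for _ in range(max_iter):
        weights = [1.0 / max(math.dist(p, x), 1e-12) for p in points]
        total = sum(weights)
        new_x = [sum(w * p[j] for w, p in zip(weights, points)) / total
                 for j in range(d)]
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x

def objective(points, x):
    """Sum of Euclidean distances from x to the data points."""
    return sum(math.dist(p, x) for p in points)

rng = random.Random(0)
pts = [(rng.gauss(0, 1), rng.expovariate(1.0)) for _ in range(30)]
m = spatial_median(pts)
print(m, objective(pts, m))
```

Unlike the coordinatewise median, this point does not depend on the choice of coordinate axes up to rotations.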

Total ψ-Ordering
Last but not least, we mention the ordering of multivariate data according to
the ranking method everyone is familiar with in daily life. The importance
of this concept is apparent.
Following Plackett (1976) we introduce a total order of the points $x_1, \dots, x_n$
by means of a real-valued function $\psi$. Define
$$x \le_\psi y \tag{2.1.13}$$
if
$$\psi(x) \le \psi(y). \tag{2.1.14}$$
Usually one is interested not only in the ranking of the data $x_1, \dots, x_n$
expressed in numbers $1, \dots, n$ but also in the total information contained in
$x_1, \dots, x_n$, thus getting the representation of the original data by
$$x_{1:n} \le_\psi x_{2:n} \le_\psi \dots \le_\psi x_{n:n}. \tag{2.1.15}$$
One advantage of this type of ordering compared to the marginal ordering
is that $x_{i:n}$ is a point of the original sample. Clearly, the ordering
(2.1.15) depends heavily on the selection procedure represented by the function $\psi$.
As an example, consider the function $\psi(x) = \|x - x_0\|_2$. Other reasonable
functions $\psi$ may be found in Barnett (1976) and Plackett (1976). Given the
random vectors $\xi_1, \dots, \xi_n$ let
$$X_{1:n} \le_\psi X_{2:n} \le_\psi \dots \le_\psi X_{n:n} \tag{2.1.16}$$
denote the $\psi$-order statistics defined according to (2.1.15) with $\psi(x) = \|x - x_0\|_2$. Define
$$R_{k:n} = \|X_{k:n} - x_0\|_2, \tag{2.1.17}$$
which is the distance of the $k$th $\psi$-order statistic from the center $x_0$.
Obviously,
$$R_{k:n} = Z_{k:n}(\|\xi_1 - x_0\|_2, \dots, \|\xi_n - x_0\|_2) \tag{2.1.18}$$
is the $k$th order statistic of the $n$ i.i.d. univariate r.v.'s
$\|\xi_1 - x_0\|_2, \dots, \|\xi_n - x_0\|_2$ with common d.f.
$$F(x_0, r) = P\{\xi_1 \in B(x_0, r)\}, \qquad r \ge 0. \tag{2.1.19}$$
Here
$$B(x_0, r) = \{x: \|x - x_0\|_2 \le r\}$$
is the ball with center $x_0$ and radius $r$.
Notice that the probability
$$P\{X_{k:n} \in B(x_0, r)\} \tag{2.1.20}$$
may easily be computed since this quantity is equal to $P\{R_{k:n} \le r\}$.
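Since $P\{X_{k:n} \in B(x_0, r)\} = P\{R_{k:n} \le r\}$ and $R_{k:n}$ is an ordinary univariate order statistic, the probability is a binomial tail in $p = F(x_0, r)$. As an illustration, assuming standard bivariate normal vectors and $x_0 = 0$ (so that $\|\xi_1\|_2^2$ is chi-square with 2 degrees of freedom and $F(0, r) = 1 - e^{-r^2/2}$), the sketch below compares this tail with a simulation of $\psi$-order statistics; the parameter values are arbitrary.

```python
import math
import random

def psi_order_statistics(points, x0):
    """psi-order statistics (2.1.16) for psi(x) = ||x - x0||_2: the sample
    sorted by distance from the center x0 (ties keep input order)."""
    return sorted(points, key=lambda p: math.dist(p, x0))

def prob_kth_in_ball(k, n, p):
    """P{X_{k:n} in B(x0, r)} = P{R_{k:n} <= r}: at least k of the n i.i.d.
    distances are <= r, where p = F(x0, r)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def simulate(k, n, r, reps=20_000, seed=0):
    """Monte Carlo estimate for standard bivariate normal vectors and x0 = 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        xk = psi_order_statistics(pts, (0.0, 0.0))[k - 1]
        hits += math.dist(xk, (0.0, 0.0)) <= r
    return hits / reps

k, n, r = 3, 7, 1.2
p = 1 - math.exp(-r**2 / 2)        # F(0, r) for the standard bivariate normal
exact = prob_kth_in_ball(k, n, p)
estimate = simulate(k, n, r)
print(exact, estimate)
```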


We also mention a result related to that of Corollary 1.8.4 in the univariate
case.
By rearranging $X_{n-k+1:n}, \dots, X_{n:n}$ in the original order of their outcome
we obtain the $k$ exceedances, say, $\zeta_1, \dots, \zeta_k$ of the random vectors $\xi_1, \dots, \xi_n$.
It is well known that the conditional distribution of the exceedances $\zeta_1, \dots, \zeta_k$
given $R_{n-k:n} = r$ is the joint distribution of $k$ i.i.d. random vectors $\eta_1, \dots, \eta_k$
with common distribution equal to the original distribution of $\xi_1$ truncated
outside of
$$C(x_0, r) = \{x: \|x - x_0\|_2 > r\}. \tag{2.1.21}$$


The author is grateful to Peter Hall for communicating a 3-line sketch of the
proof of this result. An extension can be found in P.2.1.
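If $F(x_0, \cdot)$ is continuous then we deduce from Theorem 1.5.1 that for the
$\psi$-maximum $X_{n:n}$ the following identities hold:
$$\begin{aligned} P\{X_{n:n} \in B\} &= \int P(X_{n:n} \in B \mid R_{n-1:n}) \, dP \\ &= n(n-1) \int P\{\xi_1 \in B \cap C(x_0, \cdot)\} \, F(x_0, \cdot)^{n-2} \, dF(x_0, \cdot). \end{aligned} \tag{2.1.22}$$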

The construction in (2.1.16) can be generalized to the case where $x_0$ is
replaced by a random vector $\xi_0$, leading to the $k$th ordered distance r.v. $R_{k:n}$
as studied in Dziubdziela (1976) and Reiss (1985b). Now the ranking is carried
out according to the random function $\psi(x) = \|x - \xi_0\|_2$. A possible application of such a concept is the definition of an $\alpha$-trimmed mean
(2.1.23)
centered at the random vector $\xi_0$.

2.2. Distribution Functions and Densities


From (2.1.9) and (2.1.10) it is obvious that the joint d.f. of order statistics $X^{(j)}_{r:n}$
can be established by means of multinomial probabilities of appropriate "cell
frequency vectors" $(N_1, \dots, N_k)$ where $N_j = \sum_{i=1}^n 1_{R_j}(\xi_i)$ and the $R_1, \dots, R_k$ form
a partition of the Euclidean $d$-space. Note that

The D.F. of Multivariate Extremes


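Let $\xi, \xi_1, \xi_2, \dots, \xi_n$ be i.i.d. random vectors with common d.f. $F$. We start with a simple result
concerning the d.f. of multivariate order statistics. For the sample maximum
$X_{n:n}$ based on $\xi_1, \xi_2, \dots, \xi_n$ we obtain as an extension of (1.3.2) that
$$P\{X_{n:n} \le t\} = F^n(t). \tag{2.2.1}$$
This becomes obvious by writing
$$\begin{aligned} P\{X_{n:n} \le t\} &= P\{X^{(1)}_{n:n} \le t_1, \dots, X^{(d)}_{n:n} \le t_d\} \\ &= P\{\max(\xi_{1,1}, \dots, \xi_{n,1}) \le t_1, \dots, \max(\xi_{1,d}, \dots, \xi_{n,d}) \le t_d\} \\ &= P\{\xi_1 \le t, \dots, \xi_n \le t\} = F^n(t). \end{aligned}$$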
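A quick Monte Carlo check of (2.2.1) for a vector with dependent components. The example $\xi = (U, \max(U, V))$ with independent uniform $U, V$ on (0, 1) is our own choice; its d.f. is $F(t_1, t_2) = P\{U \le \min(t_1, t_2), V \le t_2\} = \min(t_1, t_2)\, t_2$ on $[0,1]^2$.

```python
import random

def F(t1, t2):
    """d.f. of xi = (U, max(U, V)), U and V independent uniform on (0, 1),
    valid for 0 <= t1, t2 <= 1."""
    return min(t1, t2) * t2

def empirical_max_cdf(n, t, reps=50_000, seed=0):
    """Monte Carlo estimate of P{X_{n:n} <= t} for the componentwise maximum."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        m1 = m2 = float("-inf")
        for _ in range(n):
            u, v = rng.random(), rng.random()
            m1, m2 = max(m1, u), max(m2, max(u, v))
        hits += (m1 <= t[0]) and (m2 <= t[1])
    return hits / reps

n, t = 4, (0.8, 0.9)
exact = F(*t) ** n          # (2.2.1)
estimate = empirical_max_cdf(n, t)
print(exact, estimate)
```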


The extension of (2.2.1) to the case of independent, not necessarily identically distributed (i.n.n.i.d.) random vectors is straightforward.
Moreover, in analogy to (2.2.1) one gets for the sample minimum $X_{1:n}$ the
formula
$$P\{X_{1:n} > t\} = L(t)^n \tag{2.2.2}$$
where $L(t) = P\{\xi > t\}$ is the survivor function.
For $d = 2$, the following representation of the bivariate survivor function
holds:
$$L(x, y) = P\{\xi > (x, y)\} = 1 - F_1(x) - F_2(y) + F(x, y)$$
with $F_i$ denoting the marginal d.f.'s of $F$. Hence,
$$F(x, y) = 1 - (1 - F_1(x)) - (1 - F_2(y)) + L(x, y).$$
An extension of this representation to the $d$-variate d.f. may be found in
P.2.5.
Formula (2.2.2) in conjunction with (1.3.3) yields
$$P\{X_{1:n} \le (x, y)\} = 1 - (1 - F_1(x))^n - (1 - F_2(y))^n + L(x, y)^n. \tag{2.2.3}$$
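Formula (2.2.3) can be checked in the same way. The sketch below assumes a hypothetical mixture: with probability 1/2, $\xi = (U, V)$ has independent uniform components, and with probability 1/2, $\xi = (U, U)$. Both marginals are then uniform and $L(x, y) = \tfrac12(1-x)(1-y) + \tfrac12(1 - \max(x, y))$ on $[0,1]^2$.

```python
import random

def survivor(x, y):
    """L(x, y) = P{xi > (x, y)} for the hypothetical mixture described above."""
    return 0.5 * (1 - x) * (1 - y) + 0.5 * (1 - max(x, y))

def min_cdf_formula(x, y, n):
    """Right-hand side of (2.2.3); here both marginals are uniform, F_i(z) = z."""
    return 1 - (1 - x) ** n - (1 - y) ** n + survivor(x, y) ** n

def draw(rng):
    """One observation of the mixture."""
    u = rng.random()
    if rng.random() < 0.5:
        return (u, rng.random())   # independent components
    return (u, u)                  # identical components

def empirical_min_cdf(x, y, n, reps=50_000, seed=0):
    """Monte Carlo estimate of P{X_{1:n} <= (x, y)}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        hits += (min(s[0] for s in sample) <= x) and (min(s[1] for s in sample) <= y)
    return hits / reps

exact = min_cdf_formula(0.2, 0.3, 3)
estimate = empirical_min_cdf(0.2, 0.3, 3)
print(exact, estimate)
```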

If a d.f. $F$ on the Euclidean $d$-space has $d$ continuous partial derivatives then
we know (see e.g. Bhattacharya and Rao (1976), Theorem A.2.2) that the $d$th
partial derivative $\partial^d F/(\partial t_1 \cdots \partial t_d)$ is a density of $F$. Thus, if $f$ is a density of $F$
then, if $d = 2$,
$$f_{(n,n):n} = nF^{n-1} f + n(n-1) F^{n-2} \frac{\partial F}{\partial x} \frac{\partial F}{\partial y} \tag{2.2.4}$$
is the density of the sample maximum $X_{n:n} = (X^{(1)}_{n:n}, X^{(2)}_{n:n})$ for $n \ge 2$.
The density of the sample minimum $X_{1:n} = (X^{(1)}_{1:n}, X^{(2)}_{1:n})$ is given by
$$nL^{n-1} f + n(n-1) L^{n-2} \frac{\partial L}{\partial x} \frac{\partial L}{\partial y}. \tag{2.2.5}$$

For an extension and a reformulation of (2.2.4) we refer to (2.2.7) and (2.2.8).
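To see (2.2.4) at work, one can compare it with a numerical mixed derivative of $F^n$. The sketch assumes the Farlie-Gumbel-Morgenstern d.f. $F(x, y) = xy(1 + \theta(1-x)(1-y))$ on $[0,1]^2$ with an arbitrarily chosen $\theta = 0.7$; its partial derivatives and density are coded by hand.

```python
THETA = 0.7  # illustrative FGM parameter; any value in [-1, 1] gives a d.f.

def F(x, y):
    return x * y * (1 + THETA * (1 - x) * (1 - y))

def F_x(x, y):  # dF/dx
    return y * (1 + THETA * (1 - y) * (1 - 2 * x))

def F_y(x, y):  # dF/dy
    return x * (1 + THETA * (1 - x) * (1 - 2 * y))

def f(x, y):    # density d^2F/(dx dy)
    return 1 + THETA * (1 - 2 * x) * (1 - 2 * y)

def max_density(x, y, n):
    """Formula (2.2.4) for the density of X_{n:n}, n >= 2."""
    return (n * F(x, y) ** (n - 1) * f(x, y)
            + n * (n - 1) * F(x, y) ** (n - 2) * F_x(x, y) * F_y(x, y))

def mixed_difference(x, y, n, h=1e-4):
    """Central-difference approximation of d^2/(dx dy) of F(x, y)**n."""
    G = lambda a, b: F(a, b) ** n
    return (G(x + h, y + h) - G(x + h, y - h)
            - G(x - h, y + h) + G(x - h, y - h)) / (4 * h * h)

for n in (2, 3, 6):
    for x, y in ((0.3, 0.6), (0.8, 0.5), (0.55, 0.9)):
        print(n, x, y, max_density(x, y, n), mixed_difference(x, y, n))
```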

The D.F. of Bivariate Order Statistics


The exact joint d.f. and joint density of order statistics $X^{(j)}_{r:n}$ can be established
via multinomial random vectors. The joint distribution of $X^{(1)}_{r:n}$ and $X^{(2)}_{s:n}$ will
be examined in detail.
Let again $\xi_i = (\xi_{i,1}, \xi_{i,2})$, $i = 1, \dots, n$, be independent copies of the random
vector $\xi = (\xi_1, \xi_2)$ with common d.f. $F$ and marginals $F_i$. Thus,
$F(x, y) = P\{\xi \le (x, y)\}$, $F_1(x) = P\{\xi_1 \le x\}$ and $F_2(y) = P\{\xi_2 \le y\}$.
A partition of the plane into the four quadrants
$$R_1 = (-\infty, x] \times (-\infty, y], \qquad R_2 = (-\infty, x] \times (y, \infty),$$
$$R_3 = (x, \infty) \times (-\infty, y], \qquad R_4 = (x, \infty) \times (y, \infty)$$


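(where the dependence of $R_i$ on $(x, y)$ will be suppressed) leads to the
configuration
[Figure: the quadrants around the point $(x, y)$; $R_2$ upper left, $R_4$ upper right, $R_1$ lower left, $R_3$ lower right.]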
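Put $L_j(x, y) = P\{\xi \in R_j\}$, $j = 1, \dots, 4$.
Notice that $L_4$ is the bivariate survivor function as mentioned above. We
have $L_1(x, y) = F(x, y)$ and hence
$$L_2(x, y) = F_1(x) - F(x, y), \qquad L_3(x, y) = F_2(y) - F(x, y),$$
and as noted above
$$L_4(x, y) = 1 - F_1(x) - F_2(y) + F(x, y).$$
Denote by $N_j$ the frequency of the $\xi_i$ in $R_j$; thus,
$$N_j = \sum_{i=1}^n 1_{R_j}(\xi_i).$$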

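From (1.1.7) it is immediate that
$$\begin{aligned} F_{(r,s):n}(x, y) &= P\{X^{(1)}_{r:n} \le x, X^{(2)}_{s:n} \le y\} \\ &= P\Big\{\sum_{i=1}^n 1_{(-\infty,x]}(\xi_{i,1}) \ge r, \sum_{i=1}^n 1_{(-\infty,y]}(\xi_{i,2}) \ge s\Big\} \\ &= P\{N_1 + N_2 \ge r, N_1 + N_3 \ge s\} \\ &= \sum_{k=r}^n \sum_{l=s}^n \sum_m P\{N_1 + N_2 = k, N_1 + N_3 = l, N_1 = m\} \\ &= \sum_{k=r}^n \sum_{l=s}^n \sum_m P\{N_1 = m, N_2 = k - m, N_3 = l - m\}. \end{aligned}$$
Inserting the probabilities of the multinomial random vector $(N_1, N_2, N_3, N_4)$ we get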

Lemma 2.2.1. The d.f. $F_{(r,s):n}$ of $(X^{(1)}_{r:n}, X^{(2)}_{s:n})$ is given by
$$F_{(r,s):n} = \sum_{k=r}^n \sum_{l=s}^n \sum_{m=\max(k+l-n,0)}^{\min(k,l)} \frac{n!\, L_1^m L_2^{k-m} L_3^{l-m} L_4^{n-k-l+m}}{m!\,(k-m)!\,(l-m)!\,(n-k-l+m)!}.$$
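Lemma 2.2.1 translates directly into a triple sum. A minimal sketch, assuming the cell probabilities $L_1, \dots, L_4$ at the point $(x, y)$ are supplied by the caller (the function names are illustrative): for independent components the result must factor into two binomial tails, and with $L_2 = L_4 = 0$ (that is, $F_2(y) = 1$) it must reduce to the univariate formula.

```python
from math import comb, factorial

def bivariate_os_cdf(r, s, n, L1, L2, L3, L4):
    """Lemma 2.2.1: F_{(r,s):n}(x, y) from the quadrant probabilities
    L_j = P{xi in R_j} at the point (x, y)."""
    total = 0.0
    for k in range(r, n + 1):          # k = N_1 + N_2 = #{i: xi_{i,1} <= x}
        for l in range(s, n + 1):      # l = N_1 + N_3 = #{i: xi_{i,2} <= y}
            for m in range(max(k + l - n, 0), min(k, l) + 1):   # m = N_1
                coef = factorial(n) / (factorial(m) * factorial(k - m)
                                       * factorial(l - m) * factorial(n - k - l + m))
                total += coef * L1**m * L2**(k - m) * L3**(l - m) * L4**(n - k - l + m)
    return total

def binomial_tail(n, r, p):
    """P{Bin(n, p) >= r}, i.e. P{X_{r:n} <= x} when p = F_1(x)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(r, n + 1))

x, y = 0.3, 0.6
F_xy = x * y                      # independent uniform components
L1, L2, L3, L4 = F_xy, x - F_xy, y - F_xy, 1 - x - y + F_xy
print(bivariate_os_cdf(2, 4, 6, L1, L2, L3, L4),
      binomial_tail(6, 2, x) * binomial_tail(6, 4, y))
```

Python evaluates `0.0 ** 0` as `1.0`, which matches the convention $0^0 = 1$ needed for empty cells.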

The Density of Bivariate Order Statistics


If $F_{(r,s):n}$ possesses two partial derivatives, one may use the representation
$(\partial^2/\partial x \partial y) F_{(r,s):n}$ of the density of $F_{(r,s):n}$; however, it is difficult to arrange the
terms in an appropriate way.
A different method will allow us to compute the density of $(X^{(1)}_{r:n}, X^{(2)}_{s:n})$
under the condition that $F$ has a density, say, $f$. To make the proof rigorous
one has to use the Radon-Nikodym theorem and Lebesgue's differentiation
theorem for integrals.
In a first step we shall prove that a density of $F_{(r,s):n}$ exists if $F$ has a density.
Notice that for every Borel set $B$ we have
$$P\{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) \in B\} \le \sum_{i,j=1}^n P\{(\xi_{i,1}, \xi_{j,2}) \in B\} = \sum_{i \ne j} \int_B f_1(x) f_2(y)\, dx\, dy + \sum_{i=1}^n \int_B f(x, y)\, dx\, dy$$
where $f_1 = \int f(\cdot, v)\, dv$ and $f_2 = \int f(u, \cdot)\, du$ are the densities of $F_1$ and $F_2$. Thus,
if $B$ has Lebesgue measure zero then $P\{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) \in B\} = 0$, and hence the
Radon-Nikodym theorem implies that $F_{(r,s):n}$ has a (Lebesgue) density.
The proof of Lemma 2.2.2 below will be based on the fact that for every
integrable function $g$ on the Euclidean $k$-space almost all $x = (x_1, \dots, x_k)$ are
Lebesgue points of $g$, that is,
$$\lim_{h \to 0} (2h)^{-k} \int_{x_1-h}^{x_1+h} \cdots \int_{x_k-h}^{x_k+h} g(z)\, dz = g(x) \tag{2.2.6}$$
for (Lebesgue) almost all $x$ (see e.g. Floret (1981), page 276).
The following lemma was established in cooperation with W. Kohne.
Lemma 2.2.2. If the bivariate i.i.d. random vectors $\xi_1, \xi_2, \dots, \xi_n$ have the
common density $f$ then the random vector $(X^{(1)}_{r:n}, X^{(2)}_{s:n})$ has the density
$$\begin{aligned} f_{(r,s):n} = n! \sum_{m=0}^{n} \frac{L_1^m}{m!} \Bigg[ & \frac{L_2^{r-1-m} L_3^{s-1-m} L_4^{n-r-s+m+1} f}{(r-1-m)!\,(s-1-m)!\,(n-r-s+m+1)!} \\ &+ \frac{L_2^{r-2-m} L_3^{s-1-m} L_4^{n-r-s+m+1} L_5 L_6}{(r-2-m)!\,(s-1-m)!\,(n-r-s+m+1)!} \\ &+ \frac{L_2^{r-2-m} L_3^{s-2-m} L_4^{n-r-s+m+2} L_5 L_8}{(r-2-m)!\,(s-2-m)!\,(n-r-s+m+2)!} \\ &+ \frac{L_2^{r-1-m} L_3^{s-1-m} L_4^{n-r-s+m} L_6 L_7}{(r-1-m)!\,(s-1-m)!\,(n-r-s+m)!} \\ &+ \frac{L_2^{r-1-m} L_3^{s-2-m} L_4^{n-r-s+m+1} L_7 L_8}{(r-1-m)!\,(s-2-m)!\,(n-r-s+m+1)!} \Bigg] \end{aligned}$$
with the convention that the terms involving negative factorials are replaced by
zeros. The functions $L_1, \dots, L_4$ are defined as above. Moreover,
$$L_5(x, y) = \int_{-\infty}^x f(u, y)\, du, \qquad L_6(x, y) = \int_y^\infty f(x, v)\, dv,$$
$$L_7(x, y) = \int_x^\infty f(u, y)\, du, \qquad L_8(x, y) = \int_{-\infty}^y f(x, v)\, dv.$$
Notice that $\sum_{m=0}^n$ can be replaced by $\sum_{m=0}^{\min(r,s)-1}$. Moreover,

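PROOF. Put $S_0 = S_{0,h}(x, y) = (x - h, x + h] \times (y - h, y + h]$ where the indices $h$, $x$,
$y$ will be suppressed as far as no confusion can arise. According to (2.2.6) it
suffices to show that
$$(2h)^{-2} P\{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) \in S_0\} \to f_{(r,s):n}(x, y) \tag{1}$$
as $h \downarrow 0$ for almost all $(x, y)$. To compute $P\{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) \in S_0\}$ we shall make
use of the following configuration:
[Figure: the square $S_0$ of side $2h$ around $(x, y)$; $S_5$ and $S_7$ are the horizontal strips to the left and right of $S_0$, $S_6$ and $S_8$ the vertical strips above and below $S_0$, and $S_1, \dots, S_4$ the remaining parts of the quadrants $R_1, \dots, R_4$.]
Put $N_j = \sum_{i=1}^n 1_{S_j}(\xi_i)$ and $q_j = P\{\xi \in S_j\} = \int_{S_j} f(u, v)\, du\, dv$ for
$0 \le j \le 8$. Obviously, $q_j \to L_j$ as $h \to 0$ for $j = 1, \dots, 4$. Moreover, by applying (2.2.6) it is
straightforward to prove that almost everywhere
$$(2h)^{-1} q_j \to L_j \tag{2}$$
for $j = 5, \dots, 8$. First, observe that for all $(x, y)$ such that (2) holds we have
$$h^{-2} P\{N_0 \ge 2\} \to 0, \qquad h^{-2} P\Big\{N_0 = 1, \sum_{j=5}^8 N_j \ge 1\Big\} \to 0,$$
and
$$h^{-2} P\Big\{N_0 = 0, \sum_{j=5}^8 N_j \ge 3\Big\} \to 0$$
as $h \to 0$, and hence it remains to prove that
$$(2h)^{-2} \Big[ P\{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) \in S_0, N_0 = 1, N_5 = N_6 = N_7 = N_8 = 0\} + P\Big\{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) \in S_0, N_0 = 0, \sum_{j=5}^8 N_j \le 2\Big\} \Big] \to f_{(r,s):n} \tag{3}$$
as $h \to 0$ almost everywhere.
Applying (1.1.7) we conclude that
$$\begin{aligned} \{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) \in S_0\} &= \{x - h < X^{(1)}_{r:n} \le x + h,\ y - h < X^{(2)}_{s:n} \le y + h\} \\ &= \Big\{\sum_{i=1}^n 1_{(-\infty,x-h]}(\xi_{i,1}) < r \le \sum_{i=1}^n 1_{(-\infty,x+h]}(\xi_{i,1}),\ \sum_{i=1}^n 1_{(-\infty,y-h]}(\xi_{i,2}) < s \le \sum_{i=1}^n 1_{(-\infty,y+h]}(\xi_{i,2})\Big\} \\ &= \{N_1 + N_2 + N_5 < r \le N_1 + N_2 + N_5 + N_0 + N_6 + N_8,\ N_1 + N_3 + N_8 < s \le N_1 + N_3 + N_8 + N_0 + N_5 + N_7\}. \end{aligned} \tag{4}$$
Thus, for $m = 0, \dots, n$,
$$\begin{aligned} &\{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) \in S_0, N_0 = 1, N_5 = N_6 = N_7 = N_8 = 0, N_1 = m\} \\ &= \{N_1 + N_2 < r \le N_1 + N_2 + 1,\ N_1 + N_3 < s \le N_1 + N_3 + 1,\ N_0 = 1, N_5 = N_6 = N_7 = N_8 = 0, N_1 = m\} \\ &= \{N_0 = 1, N_1 = m, N_2 = r - 1 - m, N_3 = s - 1 - m, N_5 = N_6 = N_7 = N_8 = 0\}. \end{aligned} \tag{5}$$
By (4) we also get for $m = 0, \dots, n$,
$$\begin{aligned} &\Big\{(X^{(1)}_{r:n}, X^{(2)}_{s:n}) \in S_0, N_0 = 0, \sum_{j=5}^8 N_j \le 2, N_1 = m\Big\} \\ &= \{N_0 = 0, N_1 + N_2 + N_5 = r - 1, N_1 + N_3 + N_8 = s - 1, N_6 + N_8 = 1, N_5 + N_7 = 1, N_1 = m\} \\ &= \{N_0 = 0, N_1 = m, N_2 = r - 2 - m, N_3 = s - 1 - m, N_5 = 1, N_6 = 1, N_7 = 0, N_8 = 0\} \\ &\quad \cup \{N_0 = 0, N_1 = m, N_2 = r - 2 - m, N_3 = s - 2 - m, N_5 = 1, N_6 = 0, N_7 = 0, N_8 = 1\} \\ &\quad \cup \{N_0 = 0, N_1 = m, N_2 = r - 1 - m, N_3 = s - 1 - m, N_5 = 0, N_6 = 1, N_7 = 1, N_8 = 0\} \\ &\quad \cup \{N_0 = 0, N_1 = m, N_2 = r - 1 - m, N_3 = s - 2 - m, N_5 = 0, N_6 = 0, N_7 = 1, N_8 = 1\}. \end{aligned} \tag{6}$$
Now (3) is immediate from (2), (5), and (6). The proof is complete.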
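The five-term density of Lemma 2.2.2 is easy to mistype, so a numerical sanity check is useful. The sketch below implements it with the convention that terms with negative factorials vanish, and compares it, for two independent uniform components on (0, 1), with the product of the beta densities of $X^{(1)}_{r:n}$ and $X^{(2)}_{s:n}$. In that case $L_5 = x$, $L_6 = 1 - y$, $L_7 = 1 - x$ and $L_8 = y$; the helper names are our own.

```python
from math import factorial

def inv_fact(k):
    """1/k!, with 1/k! := 0 for k < 0 (terms with negative factorials vanish)."""
    return 0.0 if k < 0 else 1.0 / factorial(k)

def pw(base, k):
    """base**k; negative exponents only occur in vanishing terms, so return 0."""
    return 0.0 if k < 0 else base ** k

def term(L, i, j, k, extra):
    """L_2^i L_3^j L_4^k * extra / (i! j! k!)."""
    return (pw(L[2], i) * pw(L[3], j) * pw(L[4], k) * extra
            * inv_fact(i) * inv_fact(j) * inv_fact(k))

def bivariate_os_density(r, s, n, L, f):
    """Density of (X^{(1)}_{r:n}, X^{(2)}_{s:n}) from Lemma 2.2.2.
    L[j] holds L_j(x, y) for j = 1, ..., 8 and f is f(x, y)."""
    total = 0.0
    for m in range(min(r, s)):              # m = 0, ..., min(r, s) - 1
        a = n - r - s + m
        total += pw(L[1], m) * inv_fact(m) * (
            term(L, r - 1 - m, s - 1 - m, a + 1, f)
            + term(L, r - 2 - m, s - 1 - m, a + 1, L[5] * L[6])
            + term(L, r - 2 - m, s - 2 - m, a + 2, L[5] * L[8])
            + term(L, r - 1 - m, s - 1 - m, a, L[6] * L[7])
            + term(L, r - 1 - m, s - 2 - m, a + 1, L[7] * L[8]))
    return factorial(n) * total

def independent_uniform_L(x, y):
    """L_1, ..., L_8 for two independent uniform components on (0, 1)."""
    return {1: x * y, 2: x * (1 - y), 3: (1 - x) * y, 4: (1 - x) * (1 - y),
            5: x, 6: 1 - y, 7: 1 - x, 8: y}

def beta_density(r, n, z):
    """Density of the r-th order statistic of n uniforms on (0, 1)."""
    return (factorial(n) / (factorial(r - 1) * factorial(n - r))
            * z ** (r - 1) * (1 - z) ** (n - r))

x, y = 0.3, 0.6
print(bivariate_os_density(2, 4, 6, independent_uniform_L(x, y), 1.0),
      beta_density(2, 6, x) * beta_density(4, 6, y))
```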

In the special case of the sample maximum (that is, $r = n$ and $s = n$) we have
$$f_{(n,n):n} = nF^{n-1} f + n(n-1) F^{n-2} L_5 L_8, \tag{2.2.7}$$
which is a generalization of (2.2.4) in the bivariate case. If the partial derivatives
exist then $f = \partial^2 F/\partial x \partial y$,
$$L_5(x, y) = \int_{-\infty}^x f(u, y)\, du = (\partial F/\partial y)(x, y),$$
and
$$L_8(x, y) = \int_{-\infty}^y f(x, v)\, dv = (\partial F/\partial x)(x, y).$$

Let $\xi = (\xi_1, \xi_2)$ again be a random vector with d.f. $F$ and density $f$. Let
$f_1(x) = \int f(x, v)\, dv$ and $f_2(y) = \int f(u, y)\, du$ be the marginal densities, and let
$$F_1(x \mid y) = P(\xi_1 \le x \mid \xi_2 = y) = L_5(x, y)/f_2(y)$$
and
$$F_2(y \mid x) = P(\xi_2 \le y \mid \xi_1 = x) = L_8(x, y)/f_1(x)$$
be the conditional d.f.'s. Now, (2.2.7) may be written
$$f_{(n,n):n}(x, y) = nF^{n-1}(x, y) f(x, y) + n(n-1) F^{n-2}(x, y) F_1(x \mid y) F_2(y \mid x) f_1(x) f_2(y). \tag{2.2.8}$$
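For the density $f(x, y) = x + y$ on $[0,1]^2$ (the example of P.2.7) everything in (2.2.8) is explicit: $f_1(x) = x + 1/2$, $f_2(y) = y + 1/2$, $L_5 = x^2/2 + xy$ and $L_8 = y^2/2 + xy$. The sketch below evaluates (2.2.8) through the conditional d.f.'s, compares it with the closed form stated in P.2.7, and checks with a midpoint rule that it integrates to approximately one; the grid size is an arbitrary choice.

```python
def F(x, y):
    return x * y * (x + y) / 2

def f(x, y):
    return x + y

def f1(x):                      # marginal density of xi_1
    return x + 0.5

def f2(y):                      # marginal density of xi_2
    return y + 0.5

def F1_given(x, y):             # F_1(x | y) = L_5(x, y) / f_2(y)
    return (x * x / 2 + x * y) / f2(y)

def F2_given(y, x):             # F_2(y | x) = L_8(x, y) / f_1(x)
    return (y * y / 2 + x * y) / f1(x)

def density_228(x, y, n):
    """(2.2.8) for the density f(x, y) = x + y on [0, 1]^2."""
    return (n * F(x, y) ** (n - 1) * f(x, y)
            + n * (n - 1) * F(x, y) ** (n - 2)
            * F1_given(x, y) * F2_given(y, x) * f1(x) * f2(y))

def explicit_p27(x, y, n):
    """The closed form for the same density (cf. P.2.7)."""
    return (n * F(x, y) ** (n - 1) * f(x, y)
            + n * (n - 1) * F(x, y) ** (n - 2)
            * (x * y + x * x / 2) * (x * y + y * y / 2))

def midpoint_integral(g, m=400):
    """Midpoint rule on an m x m grid over the unit square."""
    h = 1.0 / m
    return h * h * sum(g((i + 0.5) * h, (j + 0.5) * h)
                       for i in range(m) for j in range(m))

print(midpoint_integral(lambda x, y: density_228(x, y, 3)))   # close to 1
```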

The Partial Maxima Process


A vector of extremes with different sample sizes in the different components
has to be treated in connection with the partial maxima process $X_n$ defined by
(2.2.9)
for $t > 0$, where the reals $b_n$ and $a_n > 0$ are appropriate normalizing constants.
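In order to calculate the finite dimensional marginal d.f.'s of $X_n$ one needs the
following.
Lemma 2.2.3. Let $1 \le s_1 < s_2 < \dots < s_k$ be integers and $\xi_1, \dots, \xi_{s_k}$ i.i.d. random
variables with common d.f. $F$. Then,
$$P\{X_{s_1:s_1} \le x_1, \dots, X_{s_k:s_k} \le x_k\} = F^{s_1}(y_1) F^{s_2 - s_1}(y_2) \cdots F^{s_k - s_{k-1}}(y_k)$$
where $y_j = \min(x_j, x_{j+1}, \dots, x_k)$.
PROOF. Obvious by noting that
$$\begin{aligned} \{X_{s_1:s_1} \le x_1, \dots, X_{s_k:s_k} \le x_k\} &= \{X_{s_1:s_1} \le y_1, \dots, X_{s_k:s_k} \le y_k\} \\ &= \{\max(\xi_1, \dots, \xi_{s_1}) \le y_1, \dots, \max(\xi_{s_{k-1}+1}, \dots, \xi_{s_k}) \le y_k\}. \end{aligned} \tag{2.2.10}$$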
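Lemma 2.2.3 in code, with a Monte Carlo check. The sketch assumes uniform variables on (0, 1), so that $F(z) = z$ there, and arbitrary illustrative values of $s_j$ and $x_j$; note that each factor uses $y_j$, the minimum over the remaining thresholds, rather than $x_j$ itself.

```python
import random

def partial_maxima_cdf(s, x, F):
    """Lemma 2.2.3: P{X_{s_1:s_1} <= x_1, ..., X_{s_k:s_k} <= x_k}."""
    prob, prev = 1.0, 0
    for j in range(len(s)):
        y_j = min(x[j:])                  # y_j = min(x_j, ..., x_k)
        prob *= F(y_j) ** (s[j] - prev)   # block xi_{s_{j-1}+1}, ..., xi_{s_j}
        prev = s[j]
    return prob

def empirical(s, x, reps=50_000, seed=0):
    """Monte Carlo estimate for i.i.d. uniform variables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xi = [rng.random() for _ in range(s[-1])]
        hits += all(max(xi[:s[j]]) <= x[j] for j in range(len(s)))
    return hits / reps

uniform_cdf = lambda z: min(max(z, 0.0), 1.0)
s, x = [2, 5, 6], [0.9, 0.7, 0.95]
exact = partial_maxima_cdf(s, x, uniform_cdf)
print(exact, empirical(s, x))
```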


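We remark that a corresponding formula for sample minima can be
established via the equality
$$\{X_{1:s_1} > x_1, \dots, X_{1:s_k} > x_k\} = \{\min(\xi_1, \dots, \xi_{s_1}) > y_1, \dots, \min(\xi_{s_{k-1}+1}, \dots, \xi_{s_k}) > y_k\} \tag{2.2.11}$$
where now $y_j = \max(x_j, x_{j+1}, \dots, x_k)$.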

Multivariate Extreme Value Distributions


In Section 1.3 we mentioned that the limiting (thus, also stable) d.f.'s of the
univariate maximum $X_{n:n}$ are the Fréchet, Weibull, and Gumbel d.f.'s $G_{i,\alpha}$.
The situation in the multivariate case is much more complex. First, we
mention two trivial examples of limiting multivariate d.f.'s.
EXAMPLES 2.2.4. Let $X_{n:n} = (X^{(1)}_{n:n}, \dots, X^{(d)}_{n:n})$ be the sample maximum based
on i.i.d. random vectors $\xi_1, \dots, \xi_n$ which are distributed like $\xi = (\eta_1, \dots, \eta_d)$.
(i) (Complete dependence)
Our first example concerns the case that the components $\eta_1, \dots, \eta_d$ of
$\xi$ are identical; i.e., we have
$$\eta_1 = \eta_2 = \dots = \eta_d.$$
Let $F_1$ denote the d.f. of $\eta_1$. Then, the d.f. $F$ of $\xi$ is given by
$$F(t) = F_1(\min(t_1, \dots, t_d))$$
and hence
$$P\{X_{n:n} \le t\} = F^n(t) = F_1^n(\min(t_1, \dots, t_d)). \tag{2.2.12}$$
If $F_1 = G_{i,\alpha}$ then, with $c_n$ and $d_n$ as in (1.3.13),
$$F^n(c_n t_1 + d_n, \dots, c_n t_d + d_n) = G_{i,\alpha}^n(c_n \min(t_1, \dots, t_d) + d_n) = G_{i,\alpha}(\min(t_1, \dots, t_d)).$$
(ii) (Independence)
Secondly, assume that the components $\eta_1, \dots, \eta_d$ of $\xi$ are independent.
Then it is clear that $X^{(1)}_{n:n}, \dots, X^{(d)}_{n:n}$ are independent. If $G_{i(j),\alpha(j)}$ is the d.f.
of $\eta_j$ then, with $c_{n,j}$ and $d_{n,j}$ as in (1.3.13),
$$F^n(c_{n,1} t_1 + d_{n,1}, \dots, c_{n,d} t_d + d_{n,d}) = F(t) = \prod_{j=1}^d G_{i(j),\alpha(j)}(t_j). \tag{2.2.13}$$
(iii) (Asymptotic independence)
Given $\xi = (-\eta, \eta)$, we have
$$X_{n:n} = (X^{(1)}_{n:n}, X^{(2)}_{n:n}) = (-X_{1:n}, X_{n:n})$$
where $X_{1:n}$ and $X_{n:n}$ are the sample minimum and sample maximum
based on the independent copies $\eta_1, \dots, \eta_n$ of $\eta$. In Section 4.2 we shall
see that $X_{1:n}$ and $X_{n:n}$ (and, thus, $X^{(1)}_{n:n}$ and $X^{(2)}_{n:n}$) are asymptotically
independent. Thus, again we are getting independent r.v.'s in the limit.
Contrary to the univariate case, the multivariate extreme value d.f.'s form
a nonparametric family of distributions. There is a simple device which enables
us to check whether a given d.f. is a multivariate extreme value d.f.
We say that a $d$-variate d.f. $G$ is nondegenerate if the univariate marginals
are nondegenerate. A nondegenerate $d$-variate d.f. $G$ is a limiting d.f. of sample
maxima if, and only if, $G$ is max-stable, that is,
$$G^n(a_{n,1} x_1 + b_{n,1}, \dots, a_{n,d} x_d + b_{n,d}) = G(x), \qquad n = 1, 2, \dots, \tag{2.2.14}$$
for some normalizing constants $a_{n,j} > 0$ and $b_{n,j}$ (compare e.g. with Galambos
(1987), page 295, or Resnick (1987), Proposition 5.9).
If a $d$-variate d.f. is max-stable then it is easy to show that the univariate
marginals are max-stable and, hence, these d.f.'s have to be of the type $G_{1,\alpha}$,
$G_{2,\alpha}$ or $G_3$ with $\alpha > 0$.
On the other hand, if the $j$th univariate marginal d.f. is $G_{i(j),\alpha(j)}$ for
$j = 1, \dots, d$, one can take the normalizing constants as given in (1.3.13) to
verify the max-stability.
Again the transformation technique works: Let $G$ be a stable d.f. with
univariate marginals $G_{i(j),\alpha(j)}$ for $j = 1, \dots, d$. Writing again $T_{i,\alpha} = G_{i,\alpha}^{-1} \circ G_{2,1}$
we obtain that
$$G(T_{i(1),\alpha(1)}(x_1), \dots, T_{i(d),\alpha(d)}(x_d)), \qquad x_1 < 0, \dots, x_d < 0, \tag{2.2.15}$$
defines a stable d.f. with univariate marginal d.f.'s $G_{2,1}$ (the standard exponential d.f. on the negative half-line).
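EXAMPLE 2.2.5. (i) Check that $G$ defined by
$$G(x, y) = G_{2,1}(x) G_{2,1}(y) \exp\Big(-\frac{xy}{x + y}\Big), \qquad x, y < 0,$$
is an extreme value d.f. with "negative" exponential marginals $G_{2,1}$, and
(ii)
$$\tilde G(x, y) = G(-e^{-x}, -e^{-y}) = \exp\Big(-e^{-x} - e^{-y} + \frac{1}{e^x + e^y}\Big), \qquad x, y \in \mathbb{R},$$
is the corresponding extreme value d.f. with Gumbel marginals.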
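Both claims of Example 2.2.5 can be probed numerically: max-stability with $a_n = 1/n$, $b_n = 0$ for $G_{2,1}$ marginals (and $a_n = 1$, $b_n = \log n$ for Gumbel marginals), the marginal limit, and nonnegativity of rectangle probabilities on a grid. This is a spot check under the stated normalizations, not a proof; the grid and test points are arbitrary.

```python
import math

def G(x, y):
    """Example 2.2.5(i), for x, y < 0: G_{2,1}(x) G_{2,1}(y) exp(-xy/(x+y))."""
    return math.exp(x + y - x * y / (x + y))

def G_gumbel(u, v):
    """Version with Gumbel marginals, via the transformation x = -exp(-u)."""
    return G(-math.exp(-u), -math.exp(-v))

def min_rectangle_mass(H, grid):
    """Smallest Delta-value of H over all grid cells (cf. (2.2.19), d = 2)."""
    worst = float("inf")
    for a1, b1 in zip(grid, grid[1:]):
        for a2, b2 in zip(grid, grid[1:]):
            worst = min(worst, H(b1, b2) - H(a1, b2) - H(b1, a2) + H(a1, a2))
    return worst

n, x, y = 7, -0.8, -1.5
print(G(x / n, y / n) ** n, G(x, y))                 # max-stability, a_n = 1/n
grid = [-4 + 3.95 * i / 40 for i in range(41)]       # from -4 to -0.05
print(min_rectangle_mass(G, grid) >= 0)
```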
A bivariate d.f. with marginals $G_{2,1}$ is max-stable if and only if the Pickands
(1981) representation holds; that is,
$$G(x, y) = \exp\Big(\int_{[0,1]} \min(ux, (1-u)y)\, d\nu(u)\Big), \qquad x, y < 0, \tag{2.2.16}$$
where $\nu$ is any finite measure having the property
$$\int_{[0,1]} u\, d\nu(u) = \int_{[0,1]} (1-u)\, d\nu(u) = 1. \tag{2.2.17}$$
Recall that the marginals are given by $G_1(x) = \lim_{y \to \infty} G(x, y)$ and $G_2(y) = \lim_{x \to \infty} G(x, y)$,
and hence (2.2.17) immediately implies that, in fact, the marginals in (2.2.16) are equal to $G_{2,1}$.
If $\nu$ is the Dirac measure putting mass 2 on the point $1/2$ then $G(x, y) = \exp(\min(x, y))$.
If $\nu$ is concentrated on $\{0, 1\}$ and puts masses 1 on the points
0 and 1 then $G(x, y) = G_{2,1}(x) G_{2,1}(y)$.
The transformation technique immediately leads to the corresponding
representations for marginals different from $G_{2,1}$. Check that, e.g.,
$$G(x, y) = \exp\Big(-\int_{[0,1]} \max(ue^{-x}, (1-u)e^{-y})\, d\nu(u)\Big) \tag{2.2.18}$$
is the representation in case of standard Gumbel marginals if again (2.2.17)
holds.
For the extension of (2.2.16) to higher dimensions we refer to P.2.10.
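For a discrete measure $\nu$ the integral in (2.2.16) is a finite sum, which makes the representation easy to experiment with. The sketch below assumes $\nu$ is given as a list of (atom, mass) pairs; it covers the two special cases from the text and one additional, purely illustrative measure with masses 1 at 1/4 and 3/4. Evaluating at $y = 0$ plays the role of the marginal limit.

```python
import math

def pickands_cdf(x, y, atoms):
    """(2.2.16) for a discrete measure nu = sum_k w_k * delta_{u_k};
    atoms is a list of (u_k, w_k) pairs, x, y <= 0."""
    return math.exp(sum(w * min(u * x, (1 - u) * y) for u, w in atoms))

def satisfies_2_2_17(atoms, tol=1e-12):
    """Check the moment conditions (2.2.17)."""
    return (abs(sum(w * u for u, w in atoms) - 1) < tol
            and abs(sum(w * (1 - u) for u, w in atoms) - 1) < tol)

complete_dependence = [(0.5, 2.0)]          # mass 2 at 1/2
independence = [(0.0, 1.0), (1.0, 1.0)]     # masses 1 at 0 and at 1
mixed = [(0.25, 1.0), (0.75, 1.0)]          # illustrative: masses 1 at 1/4 and 3/4

x, y = -0.7, -1.9
print(pickands_cdf(x, y, complete_dependence), math.exp(min(x, y)))
print(pickands_cdf(x, y, independence), math.exp(x + y))
print(pickands_cdf(-1.3, 0.0, mixed), math.exp(-1.3))   # marginal G_{2,1}
```

Because the exponent is positively homogeneous in $(x, y)$, every such $G$ satisfies $G^n(x/n, y/n) = G(x, y)$, which is max-stability with $a_n = 1/n$, $b_n = 0$.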

Multivariate D.F.'s
This section will be concluded with some general remarks about multivariate
d.f.'s.
First recall that multivariate d.f.'s are characterized by the following three
properties:
(a) $F$ is right continuous;
that is, if $x_n \downarrow x_0$ then $F(x_n) \downarrow F(x_0)$.
(b) $F$ is normed;
that is, if $x_n = (x_{n,1}, \dots, x_{n,d})$ are such that $x_{n,i} \uparrow \infty$ for every $i = 1, \dots, d$
then $F(x_n) \uparrow 1$; moreover, if $x_n \ge x_{n+1}$ and $x_{n,i} \downarrow -\infty$ for some $i \in \{1, \dots, d\}$
then $F(x_n) \to 0$, $n \to \infty$.
(c) $F$ is $\Delta$-monotone;
that is, for all $a = (a_1, \dots, a_d)$ and $b = (b_1, \dots, b_d)$ with $a \le b$,
$$\Delta_a^b F := \sum_{m \in \{0,1\}^d} (-1)^{d - \sum_{i=1}^d m_i} F\big(b_1^{m_1} a_1^{1-m_1}, \dots, b_d^{m_d} a_d^{1-m_d}\big) \ge 0, \tag{2.2.19}$$
where $b_i^{m_i} a_i^{1-m_i}$ stands for $b_i$ if $m_i = 1$ and for $a_i$ if $m_i = 0$.
Recall that if $Q$ is the probability measure corresponding to $F$ then
$$Q(a, b] = \Delta_a^b F.$$
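The alternating sum in (2.2.19) can be evaluated by running over all $2^d$ vertices of the box $(a, b]$. In the sketch below, the uniform d.f. on the unit cube gives back the volume of $(a, b]$, while the function $H(x, y) = 1\{x + y \ge 1\}$, which is nondecreasing in each argument, fails $\Delta$-monotonicity; $H$ is our own illustrative counterexample.

```python
from itertools import product

def delta(F, a, b):
    """Delta_a^b F from (2.2.19): alternating sum of F over the 2^d vertices
    of (a, b]; vertex coordinate i is b_i if m_i = 1 and a_i if m_i = 0."""
    d = len(a)
    total = 0.0
    for m in product((0, 1), repeat=d):
        vertex = [b[i] if m[i] else a[i] for i in range(d)]
        total += (-1) ** (d - sum(m)) * F(vertex)
    return total

def uniform_cube_cdf(t):
    """d.f. of the uniform distribution on [0, 1]^d."""
    p = 1.0
    for ti in t:
        p *= min(max(ti, 0.0), 1.0)
    return p

H = lambda t: 1.0 if t[0] + t[1] >= 1 else 0.0   # monotone in each argument

print(delta(uniform_cube_cdf, (0.1, 0.2, 0.3), (0.4, 0.9, 0.5)))   # volume 0.3*0.7*0.2
print(delta(H, (0.0, 0.0), (1.0, 1.0)))                             # -1.0
```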
From the representations (2.2.16) and (2.2.17) we already know that
multivariate extreme value d.f.'s are continuous. However, notice that the
continuity is a simple consequence of the fact that the univariate marginal
d.f.'s are continuous. This is immediate from inequality (2.2.20).

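Lemma 2.2.6. Let $F$ be a $d$-variate d.f. with univariate marginal d.f.'s $F_i$,
$i = 1, \dots, d$. Then, for every $x, y$,
$$|F(x) - F(y)| \le \sum_{i=1}^d |F_i(x_i) - F_i(y_i)|. \tag{2.2.20}$$
PROOF. Let $Q$ be the probability measure pertaining to $F$. Given $x, y$ we write
$$B_i = \begin{cases} (x_i, y_i] & \text{if } x_i \le y_i, \\ (y_i, x_i] & \text{if } x_i > y_i. \end{cases}$$
We get
$$\begin{aligned} |F(x) - F(y)| &= \Big|\sum_{i=1}^d [F(y_1, \dots, y_{i-1}, x_i, \dots, x_d) - F(y_1, \dots, y_i, x_{i+1}, \dots, x_d)]\Big| \\ &\le \sum_{i=1}^d Q\Big(\Big(\mathop{\times}_{j=1}^{i-1} (-\infty, y_j]\Big) \times B_i \times \Big(\mathop{\times}_{j=i+1}^{d} (-\infty, x_j]\Big)\Big) \\ &\le \sum_{i=1}^d |F_i(x_i) - F_i(y_i)|. \end{aligned}$$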

P.2. Problems and Supplements


1. Let $\xi_1, \dots, \xi_n$ be i.i.d. random vectors with common continuous d.f. $F$. For
$i \in I := \{j: 1 \le j \le k + 1,\ r_j - r_{j-1} > 1\}$ define the random vectors $\zeta_{r_{i-1}+1}, \dots, \zeta_{r_i - 1}$
by the original random vectors $\xi_j$ (in the order of their outcome) which have the
property
$$R_{r_{i-1}:n} < \|\xi_j - x_0\|_2 < R_{r_i:n}$$
(with the convention that $R_{r_0:n} = 0$ and $R_{r_{k+1}:n} = \infty$). Then the conditional
distribution of $(\zeta_{r_{i-1}+1}, \dots, \zeta_{r_i - 1})$, $i \in I$, given $R_{r_1:n} = z_1, \dots, R_{r_k:n} = z_k$ is the joint
distribution of the independent random vectors $(\eta_{r_{i-1}+1}, \dots, \eta_{r_i - 1})$, $i \in I$, where for
every $i \in I$ the components of the vector are i.i.d. random vectors with common
distribution equal to the distribution of $\xi_1$ truncated to $\{x: z_{i-1} < \|x - x_0\|_2 < z_i\}$
with $z_0 = 0$ and $z_{k+1} = \infty$.
2. (Distribution of $\psi$-order statistics)
(i) Prove the analogue of (2.1.22) for the $k$th $\psi$-order statistic $X_{k:n}$.
(ii) (Problem) Derive the asymptotic distributions of central and extreme $\psi$-order
statistics $X_{k:n}$.
(iii) (Problem) Derive the asymptotic distribution of the trimmed mean in (2.1.23)
for different centering random vectors $\xi_0$.

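3. Let $A_i \in \mathcal{A}$, $i = 1, \dots, n$, and $m \in \{0, \dots, n\}$. With $S_0 = 1$ and
$$S_j = \sum_{1 \le i_1 < \dots < i_j \le n} P(A_{i_1} \cap \dots \cap A_{i_j}), \qquad j = 1, \dots, n,$$
one gets
(i)
and
(ii)
(iii)
$$P\Big\{\sum_{i=1}^n 1_{A_i} = m\Big\} \le \sum_{j=m}^k \binom{j}{m} (-1)^{j-m} S_j$$
if $k - m$ is even, and "$\ge$" holds if $k - m$ is odd.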

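4. (i)
$$P\Big(\bigcup_{i=1}^n A_i\Big) = \sum_{j=1}^n (-1)^{j-1} S_j.$$
(ii) (Bonferroni inequality)
$$P\Big(\bigcup_{i=1}^n A_i\Big) \le \sum_{j=1}^k (-1)^{j-1} S_j$$
if $k$ is odd, and "$\ge$" holds if $k$ is even.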

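5. Let $\xi = (\xi_1, \dots, \xi_d)$ be a random vector with d.f. $F$.
(i) Prove that
$$1 - F(t) = \sum_{j=1}^d (-1)^{j+1} h_j(t), \qquad h_j(t) = \sum_{1 \le i_1 < \dots < i_j \le d} P\{\xi_{i_1} > t_{i_1}, \dots, \xi_{i_j} > t_{i_j}\}.$$
(ii) Moreover,
$$1 - F(t) \le \sum_{j=1}^k (-1)^{j+1} h_j(t)$$
if $k$ is odd, and "$\ge$" holds if $k$ is even, $k = 1, \dots, d$.
(iii) Find $C > 0$ such that for every positive integer $n$ and $x \in [0, 1]$,
$$\exp(-nx) - Cn^{-1} \le (1 - x)^n \le \exp(-nx).$$
(iv) Check that
$$F(t)^n \le \exp\Big(\sum_{j=1}^k (-1)^j n h_j(t)\Big) \quad \text{if } k \text{ is even or } k = d.$$
Moreover, for some universal constant $C > 0$,
$$F(t)^n \ge \exp\Big(\sum_{j=1}^k (-1)^j n h_j(t)\Big) - Cn^{-1} \quad \text{if } k \text{ is odd or } k = d.$$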

6. (Uniform Distribution on $A = \{(x, y): x, y \ge 0,\ x + y \le 1\}$)
The density $f_{(n,n):n}$ of $(X^{(1)}_{n:n}, X^{(2)}_{n:n})$ under the uniform distribution on $A$ is given
by
$$f_{(n,n):n}(x, y) = 2^n n (xy)^{n-1} 1_A(x, y) + 4n(n-1) F^{n-2}(x, y) \min(x, 1 - y) \min(1 - x, y)$$
for $0 \le x, y \le 1$, where $F$ is the underlying d.f. given by
$$F(x, y) = \begin{cases} 2xy & \text{if } x + y \le 1, \\ 2xy - (x + y - 1)^2 & \text{if } x + y \ge 1, \end{cases}$$
for $0 \le x, y \le 1$.


7. Let the underlying density be given by $f(x, y) = x + y$ for $0 \le x, y \le 1$ and
$f(x, y) = 0$ otherwise. Then, the d.f. $F$ is given by
$$F(x, y) = \frac{xy(x + y)}{2}, \qquad 0 \le x, y \le 1.$$
The density $f_{(n,n):n}$ of $(X^{(1)}_{n:n}, X^{(2)}_{n:n})$ is given by
$$f_{(n,n):n}(x, y) = nF^{n-1}(x, y) f(x, y) + n(n-1) F^{n-2}(x, y) \Big(xy + \frac{x^2}{2}\Big)\Big(xy + \frac{y^2}{2}\Big)$$
for $0 \le x, y \le 1$.


8. (Problem) Let $(\xi_1, \xi_2)$ be a random vector with continuous d.f. $F$. Denote by $F_1$
and $F_2$ the d.f.'s of $\xi_1$ and $\xi_2$. Extend (2.2.8) to
$$P\{(X^{(1)}_{n:n}, X^{(2)}_{n:n}) \in B\} = \int_B nF^{n-1}(x, y)\, dF(x, y) + \int_B n(n-1) F^{n-2}(x, y) F_1(x \mid y) F_2(y \mid x)\, d(F_1 \times F_2)(x, y).$$
9.

(i) Prove that a bivariate extreme value d.f. $G$ with standard "negative" exponential marginals (see (2.2.16)) can be written
$$G(x, y) = \exp\Big[(x + y)\, d\Big(\frac{y}{x + y}\Big)\Big], \qquad x, y < 0,$$
where the "dependence" function $d$ is given by
$$d(w) = \int_{[0,1]} \max(u(1 - w), (1 - u)w)\, d\nu(u)$$
and $\nu$ is a finite measure on $[0, 1]$ satisfying condition (2.2.17).
(ii) Check that $d(0) = d(1) = 1$. Moreover, $d \equiv 1$ under independence and
$d(w) = \max(1 - w, w)$ under complete dependence.
(iii) Check that $d(w) = 1 - w + w^2$ in Example 2.2.5(ii).

10. A $d$-variate d.f. with marginals $G_{2,1}$ is max-stable if, and only if,
$$G(x) = \exp\Big(\int_S \min(u_1 x_1, \dots, u_d x_d)\, d\mu(u)\Big)$$
where $\mu$ is a finite measure on the $d$-variate unit simplex
$$S := \Big\{u: \sum_{i=1}^d u_i = 1,\ u_i \ge 0\Big\}$$
having the property
$$\int_S u_i\, d\mu(u) = 1 \qquad \text{for } i = 1, \dots, d.$$
(Pickands, 1981; for the proof see Galambos, 1987)


11. (Pickands estimator of dependence function)
(i) Let $(\eta_1, \eta_2)$ have the d.f. $G$ as given in P.2.9(i). Prove that for every $t < 0$ and
$w \in (0, 1)$,
$$P\Big\{\max\Big(\frac{\eta_1}{1 - w}, \frac{\eta_2}{w}\Big) \le t\Big\} = \exp[t\, d(w)].$$
(ii) Let $(\eta_{1,i}, \eta_{2,i})$, $i = 1, \dots, n$, be i.i.d. random vectors with common d.f. $G$ as given
in P.2.9(i). Define
$$\hat d_n(w) = \Big[n^{-1} \sum_{i=1}^n \min\Big(\frac{-\eta_{1,i}}{1 - w}, \frac{-\eta_{2,i}}{w}\Big)\Big]^{-1}$$
as an estimator of the dependence function $d$. Prove that
$$E(1/\hat d_n(w)) = 1/d(w)$$
and
$$\mathrm{Variance}(1/\hat d_n(w)) = 1/(n\, d(w)^2).$$
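The estimator in P.2.11(ii) is easy to simulate. Assuming "negative" exponential marginals generated as $\eta = -E$ with $E$ standard exponential (so that $P\{\eta \le x\} = e^x$ for $x < 0$), the sketch below evaluates $\hat d_n(w)$ under independence ($d \equiv 1$) and under complete dependence ($d(w) = \max(1-w, w)$); the sample size, seed and $w$ are arbitrary.

```python
import random

def dependence_estimate(pairs, w):
    """P.2.11(ii): d_n(w) = [n^{-1} sum_i min(-eta_{1,i}/(1-w), -eta_{2,i}/w)]^{-1}."""
    mean = sum(min(-a / (1 - w), -b / w) for a, b in pairs) / len(pairs)
    return 1.0 / mean

rng = random.Random(7)
n, w = 50_000, 0.3
independent = [(-rng.expovariate(1.0), -rng.expovariate(1.0)) for _ in range(n)]
dependent = []
for _ in range(n):
    e = rng.expovariate(1.0)
    dependent.append((-e, -e))          # complete dependence: eta_1 = eta_2

print(dependence_estimate(independent, w))   # near d(w) = 1
print(dependence_estimate(dependent, w))     # near d(w) = max(1 - w, w) = 0.7
```

Each summand is exponential with mean $1/d(w)$, which is why the reciprocal of the sample mean is used.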
12. (Multivariate transformation technique)
Let $\xi = (\xi_1, \dots, \xi_d)$ be a random vector with continuous d.f. $F$. We use the notation
$$F_i(\cdot \mid x_{i-1}, \dots, x_1) = P(\xi_i \le \cdot \mid \xi_{i-1} = x_{i-1}, \dots, \xi_1 = x_1)$$
for the conditional d.f. of $\xi_i$ given $\xi_{i-1} = x_{i-1}, \dots, \xi_1 = x_1$.
(i) Put
$$T(x) = (T_1(x), \dots, T_d(x)) = (F_1(x_1), F_2(x_2 \mid x_1), \dots, F_d(x_d \mid x_{d-1}, \dots, x_1)).$$
Prove that $T_1(\xi), \dots, T_d(\xi)$ are i.i.d. $(0, 1)$-uniformly distributed r.v.'s.
(ii) Define $T^{-1}(q) = (S_1(q), \dots, S_d(q))$ by
$$S_1(q) = F_1^{-1}(q_1), \qquad S_i(q) = F_i^{-1}(q_i \mid S_{i-1}(q), \dots, S_1(q)) \quad \text{for } i = 2, \dots, d.$$
Prove that $P\{T^{-1}(T(\xi)) = \xi\} = 1$. Moreover, if $\eta_1, \dots, \eta_d$ are i.i.d. $(0, 1)$-uniformly distributed r.v.'s then
$$T^{-1}(\eta_1, \dots, \eta_d)$$
has the d.f. $F$.
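As an example of P.2.12 with $d = 2$, take the density $f(x, y) = x + y$ on $[0,1]^2$ from P.2.7. Then $F_1(x) = x^2/2 + x/2$ and $F_2(y \mid x) = (xy + y^2/2)/(x + 1/2)$, and both inverses come from quadratic equations. The sketch checks $T^{-1}(T(x)) = x$ and that $T^{-1}$ applied to uniform pairs reproduces $F(1/2, 1/2) = 1/8$ up to Monte Carlo error.

```python
import math
import random

def T(x, y):
    """Rosenblatt transform (P.2.12(i)) for f(x, y) = x + y on [0, 1]^2."""
    q1 = x * x / 2 + x / 2                   # F_1(x)
    q2 = (x * y + y * y / 2) / (x + 0.5)     # F_2(y | x)
    return q1, q2

def T_inv(q1, q2):
    """Inverse map of P.2.12(ii), solving the two quadratic equations."""
    x = (-1 + math.sqrt(1 + 8 * q1)) / 2             # F_1^{-1}(q1)
    y = -x + math.sqrt(x * x + q2 * (2 * x + 1))     # F_2^{-1}(q2 | x)
    return x, y

rng = random.Random(5)
draws = [T_inv(rng.random(), rng.random()) for _ in range(50_000)]
estimate = sum(a <= 0.5 and b <= 0.5 for a, b in draws) / len(draws)
print(estimate)    # should be near F(1/2, 1/2) = 1/8
```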

13. Compute the probability
$$P\{X_{r:n} = \xi_j \text{ for some } j \in \{1, \dots, n\}\}.$$

Bibliographical Notes
It is likely that Gini and Galvani (1929) were the first to consider
the bivariate median defined by the property of minimizing the sum of the
deviations w.r.t. the Euclidean norm (see (2.1.11)). This is the "spatial" median
as dealt with by Oja and Niinimaa (1985). In that paper the asymptotic
performance of a "generalized sample median" as an estimator of the symmetry
center of a multivariate normal distribution is investigated. Another notable
article related to this is Isogai (1985).
The result concerning the conditional distribution of exceedances (see
(2.1.21)) and its extension in P.2.1 was e.g. applied by Moore and Yackel (1977)
and Hall (1983) in connection with nearest neighbor density estimators;
however, a detailed proof does not seem to exist.
New insight into the asymptotic, stochastic behavior of the convex hull of
data points is obtained by the recent work of Eddy and Gale (1981) and
Brozius and de Haan (1987). This approach connects the asymptotic treatment
of convex hulls with that of multivariate extremes (w.r.t. the marginal
ordering).
For a different representation of the density of multivariate order statistics
we refer to Galambos (1975).
In the multivariate set-up we only made use of the transformation
technique to transform a multivariate extreme value d.f. to a d.f. with
predetermined margins. P.2.12 describes the multivariate transformation
technique as developed by Rosenblatt (1952), O'Reilly and Quesenberry
(1973), Raoult et al. (1983), and Rüschendorf (1985b). It does not seem to be
possible to make this technique applicable to multivariate order statistics
(with the exception of concomitants).
Further references concerning multivariate order statistics will be given in
Chapter 7.

CHAPTER 3

Inequalities and the Concept of Expansions

In order to obtain rough estimates of probabilities of certain events which
involve order statistics, we shall apply exponential bound theorems. These
bounds correspond to those for sums of independent r.v.'s. In Section 3.1 such
bounds are established in the particular case of order statistics of i.i.d. random
variables with common uniform d.f. on (0, 1). This section also contains two
applications to moments of order statistics.
Apart from the basic notion of expansions of finite length, Section 3.2 will
provide some useful auxiliary results for the treatment of expansions.
In Parts II and III of this volume we shall make extensive use of inequalities
for the distance between probability measures. As pointed out before, the
variational distance will be central to our investigations. However, we shall
also need the Hellinger distance, a weighted $L_2$-distance (in other words,
$\chi^2$-distance), and the Kullback-Leibler distance.
In Section 3.3 our main interest will be focused on bounds for the distance
between product measures via the distance between single components. We
shall start with some results connected to the Scheffé lemma.

3.1. Inequalities for Distributions of Order Statistics


In this section we deduce exponential bounds for the distributions of order
statistics from the corresponding result for binomial r.v.'s. By applying this
result we shall also obtain bounds for moments of order statistics.
Let us start with the following well-known exponential bound (see Loève (1963), page 255) for the distribution of sums of i.i.d. random variables $\xi_1,\dots,\xi_n$ with $E\xi_i = 0$ and $|\xi_i| \le 1$: We have

$$P\Big\{\sum_{i=1}^n \xi_i \ge \varepsilon\tau_n\Big\} \le \exp\big(-\varepsilon t + \tfrac{3}{4}t^2\big) \tag{3.1.1}$$

for every $\varepsilon \ge 0$ and $0 \le t \le \tau_n$ where $\tau_n^2 = \sum_{i=1}^n E\xi_i^2$.
Because of relation (1.1.8) between distributions of order statistics and binomial probabilities one can expect that a result similar to (3.1.1) also holds for order statistics in place of sums.

Exponential Bounds for Order Statistics of Uniform R.V.'s

First, our result will be formulated for order statistics $U_{1:n} \le \dots \le U_{n:n}$ of i.i.d. random variables $\eta_i$ which are uniformly distributed on (0,1). The transformation technique leads to the general case of i.i.d. random variables with common d.f. $F$.

Lemma 3.1.1. For every $\varepsilon \ge 0$ and $r \in \{1,\dots,n\}$ we have

$$\left.\begin{array}{l} P\{(n^{1/2}/\sigma)(U_{r:n}-\mu) \le -\varepsilon\} \\ P\{(n^{1/2}/\sigma)(U_{r:n}-\mu) \ge \varepsilon\} \end{array}\right\} \le \exp\Big(-\frac{\varepsilon^2}{3\,(1+\varepsilon/(\sigma n^{1/2}))}\Big) \tag{3.1.2}$$

where $\mu = r/(n+1)$ and $\sigma^2 = \mu(1-\mu)$.

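PROOF. (I) First, we prove the upper bound of $P\{(n^{1/2}/\sigma)(U_{r:n}-\mu) \le -\varepsilon\}$. W.l.g. assume that $\alpha := \mu - \varepsilon\sigma/n^{1/2} > 0$. Otherwise, the upper bound in (3.1.2) is trivial. In particular, $\alpha \in (0,1)$. By (1.1.8), putting $\varepsilon_0 = (r - n\alpha)/(n\alpha(1-\alpha))^{1/2}$ and $\xi_i = 1_{(-\infty,\alpha]}(\eta_i) - \alpha$, we get

$$P\{(n^{1/2}/\sigma)(U_{r:n}-\mu) \le -\varepsilon\} = P\Big\{\sum_{i=1}^n 1_{(-\infty,\alpha]}(\eta_i) \ge r\Big\} = P\Big\{\sum_{i=1}^n \xi_i \ge r - n\alpha\Big\} \le \exp\big(-\varepsilon_0 t + \tfrac{3}{4}t^2\big)$$

if $0 \le t \le (n\alpha(1-\alpha))^{1/2}$, where the last step is an application of (3.1.1) to $\xi_i$ and $\varepsilon = \varepsilon_0$. It is easy to see that $t = 2\varepsilon(\alpha(1-\alpha))^{1/2}/(3\sigma(1+\varepsilon/(\sigma n^{1/2})))$ fulfills the condition $0 \le t \le (n\alpha(1-\alpha))^{1/2}$. Moreover, $-\varepsilon_0 t + (3/4)t^2 \le -\varepsilon^2/(3(1+\varepsilon/(\sigma n^{1/2})))$ since $\varepsilon_0 \ge \varepsilon\sigma/(\alpha(1-\alpha))^{1/2}$ and $\alpha(1-\alpha)/\sigma^2 \le 1 + \varepsilon/(\sigma n^{1/2})$. This proves the first inequality.

(II) Secondly, recall that $U_{r:n} \stackrel{d}{=} 1 - U_{n-r+1:n}$ (see Example 1.2.2); hence we obtain from part (I) that

$$\begin{aligned} P\{(n^{1/2}/\sigma)(U_{r:n}-\mu) \ge \varepsilon\} &= P\{(n^{1/2}/\sigma)(1 - U_{n-r+1:n} - \mu) \ge \varepsilon\} \\ &= P\{(n^{1/2}/\sigma)(U_{n-r+1:n} - (n-r+1)/(n+1)) \le -\varepsilon\} \\ &\le \exp\big(-\varepsilon^2/(3(1+\varepsilon/(\sigma n^{1/2})))\big). \end{aligned}$$ □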
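The bound (3.1.2) can be checked numerically. The following Python sketch is an illustration only (not part of the original argument): it computes the exact tails of $U_{r:n}$ through the binomial representation (1.1.8), $P\{U_{r:n} \le x\} = P\{\text{at least } r \text{ of the } \eta_i \text{ are} \le x\}$, and compares them with the right-hand side of (3.1.2). The helper names `uniform_os_cdf` and `lemma_3_1_1` are hypothetical.

```python
import math

def uniform_os_cdf(n, r, x):
    """P{U_{r:n} <= x}: at least r of n i.i.d. uniforms fall into [0, x]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return sum(math.comb(n, k) * x**k * (1.0 - x)**(n - k) for k in range(r, n + 1))

def lemma_3_1_1(n, r, eps):
    """Return (lower tail, upper tail, bound of (3.1.2)) for the standardized U_{r:n}."""
    mu = r / (n + 1)
    sigma = math.sqrt(mu * (1.0 - mu))
    shift = eps * sigma / math.sqrt(n)
    lower = uniform_os_cdf(n, r, mu - shift)        # P{U_{r:n} <= mu - shift}
    upper = 1.0 - uniform_os_cdf(n, r, mu + shift)  # P{U_{r:n} >= mu + shift}
    bound = math.exp(-eps**2 / (3.0 * (1.0 + eps / (sigma * math.sqrt(n)))))
    return lower, upper, bound

for eps in (0.5, 1.0, 2.0, 3.0):
    lo, up, b = lemma_3_1_1(50, 10, eps)
    print(f"eps={eps}: lower={lo:.4f} upper={up:.4f} bound={b:.4f}")
```

The exact tails are typically much smaller than the bound; (3.1.2) is meant as an estimate of the right exponential order rather than as an approximation.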


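The right-hand side of (3.1.2) can be written in a simpler form for a special choice of $\varepsilon$. For every $s > 0$ we have

$$P\Big\{\frac{n^{1/2}}{\max\{\sigma, (6s(\log n)/n)^{1/2}\}}\,|U_{r:n}-\mu| \ge (6s\log n)^{1/2}\Big\} \le 2n^{-s}. \tag{3.1.3}$$

Moreover, a crude estimate is obtained by

$$P\{(n^{1/2}/\sigma)\,|U_{r:n}-\mu| \ge \varepsilon\} \le 2\exp(-\varepsilon/5), \qquad \varepsilon \ge 0. \tag{3.1.4}$$

Notice that $2\exp(-\varepsilon/5) \ge 1$ whenever $\varepsilon \le 1$. It is apparent that (3.1.4) is weaker than (3.1.3) for small and moderate $\varepsilon$.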
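A short numerical sketch of (3.1.3), again not from the original text: for a given $n$ and $s$ it evaluates the exact two-sided probability on the left-hand side for every rank $r$ and compares it with $2n^{-s}$. Natural logarithms are assumed; the function names are hypothetical.

```python
import math

def uniform_os_cdf(n, r, x):
    """P{U_{r:n} <= x} via the binomial representation (1.1.8)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return sum(math.comb(n, k) * x**k * (1.0 - x)**(n - k) for k in range(r, n + 1))

def tail_3_1_3(n, r, s):
    """Exact left-hand side of (3.1.3) together with the bound 2 n^{-s}."""
    mu = r / (n + 1)
    sigma = math.sqrt(mu * (1.0 - mu))
    scale = max(sigma, math.sqrt(6.0 * s * math.log(n) / n))
    # threshold for |U_{r:n} - mu| implied by the event in (3.1.3)
    d = math.sqrt(6.0 * s * math.log(n)) * scale / math.sqrt(n)
    prob = uniform_os_cdf(n, r, mu - d) + 1.0 - uniform_os_cdf(n, r, mu + d)
    return prob, 2.0 * n ** (-s)

worst = max(tail_3_1_3(40, r, 1.0)[0] for r in range(1, 41))
print(worst, 2.0 / 40)
```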
As a supplement to Lemma 3.1.1 we shall prove another bound of $P\{U_{r:n} \le \delta\}$ that is sharp for small $\delta > 0$. Note that $P\{U_{r:n} \le \delta\} \downarrow 0$ as $\delta \downarrow 0$; however, this cannot be deduced from Lemma 3.1.1.

Lemma 3.1.2. If $U_{r:n}$ and $\mu$ are as above then for every $\varepsilon \ge 0$:

$$P\{U_{r:n} \le \mu\varepsilon\} \le e^{1/r}(e\varepsilon)^r/(2\pi r)^{1/2}.$$

PROOF. From Theorem 1.3.2 and Stirling's formula we get

$$P\{U_{r:n} \le \mu\varepsilon\} = \frac{n!}{(r-1)!\,(n-r)!}\int_0^{\mu\varepsilon} x^{r-1}(1-x)^{n-r}\,dx \le \frac{n^r}{(r-1)!}\int_0^{r\varepsilon/n} x^{r-1}\,dx = \frac{r^r}{r!}\,\varepsilon^r = \frac{\exp(r + \theta(r)/r)}{(2\pi r)^{1/2}}\,\varepsilon^r$$

where $|\theta(r)| < 1$. Now the proof can easily be completed. □
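Lemma 3.1.2 can be illustrated in the same way. The sketch below (illustrative only; names are hypothetical) compares the exact probability $P\{U_{r:n} \le \mu\varepsilon\}$ with the bound $e^{1/r}(e\varepsilon)^r/(2\pi r)^{1/2}$, which decays like $\varepsilon^r$ as $\varepsilon \downarrow 0$.

```python
import math

def uniform_os_cdf(n, r, x):
    """P{U_{r:n} <= x} via the binomial representation (1.1.8)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return sum(math.comb(n, k) * x**k * (1.0 - x)**(n - k) for k in range(r, n + 1))

def lemma_3_1_2_bound(r, eps):
    """Right-hand side e^{1/r} (e*eps)^r / (2*pi*r)^{1/2} of Lemma 3.1.2."""
    return math.exp(1.0 / r) * (math.e * eps) ** r / math.sqrt(2.0 * math.pi * r)

n, r = 30, 3
mu = r / (n + 1)
for eps in (0.01, 0.1, 0.5):
    print(eps, uniform_os_cdf(n, r, mu * eps), lemma_3_1_2_bound(r, eps))
```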

Extension to the General Case

The investigation of exponential bounds for distributions of order statistics will be continued in Section 4.7 where local limit results are established. To prove these results we need, however, the inequalities above. The extension of inequality (3.1.2) to arbitrary d.f.'s is accomplished by means of Corollary 1.2.7. For order statistics $X_{1:n},\dots,X_{n:n}$ of $n$ i.i.d. random variables with common d.f. $F$ we have

$$P\Big\{[n^{1/2}g(\mu)/\sigma]\,(X_{r:n} - F^{-1}(\mu)) \;\begin{array}{l}\le -\varepsilon\\ \ge \varepsilon\end{array}\Big\} \le P\Big\{(n^{1/2}/\sigma)(U_{r:n}-\mu) \;\begin{array}{l}\le h(-\varepsilon)\\ \ge h(\varepsilon)\end{array}\Big\} \tag{3.1.5}$$

where $g(\mu)$ is a positive constant and $h(x) = (n^{1/2}/\sigma)\,[F(F^{-1}(\mu) + x\sigma/(g(\mu)n^{1/2})) - \mu]$. Thus, upper bounds for the left-hand side of (3.1.5) can be deduced from (3.1.2) by using bounds for $h(-\varepsilon)$ and $h(\varepsilon)$. Notice that if $F$ has a bounded second derivative on a neighborhood of $F^{-1}(\mu)$ then, by taking $g(\mu) = F'(F^{-1}(\mu))$, we get

$$h(x) = x + O\big(x^2\sigma/(g^2(\mu)n^{1/2})\big). \tag{3.1.6}$$

If one needs an upper bound of the left-hand side of (3.1.5) for a fixed sample size $n$ then one has to formulate the smoothness condition for $F$ in a more explicit way so that the capital $O$ in (3.1.6) can be replaced by a constant. This should always be done for the given specific problem.
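As a worked instance of (3.1.5) and (3.1.6), take (purely as an assumption for this sketch) the standard exponential d.f. $F(x) = 1 - e^{-x}$, $x \ge 0$, with $g(\mu) = F'(F^{-1}(\mu)) = 1 - \mu$. Then $h$ has the closed form $h(x) = (1 - e^{-cx})/c$ with $c = \sigma/((1-\mu)n^{1/2})$, so that $0 \le x - h(x) \le cx^2/2$ for $x \ge 0$. This is an explicit constant replacing the $O$-term in (3.1.6) in this special case. The code evaluates $h$ from its definition and checks that bound; the names are hypothetical.

```python
import math

def F(x):
    """Standard exponential d.f. (the assumed example)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def h(x, n, r):
    """h(x) from (3.1.5) for exponential F and g(mu) = F'(F^{-1}(mu))."""
    mu = r / (n + 1)
    sigma = math.sqrt(mu * (1.0 - mu))
    q = -math.log(1.0 - mu)  # F^{-1}(mu)
    g = 1.0 - mu             # F'(F^{-1}(mu)) = exp(-q)
    return math.sqrt(n) / sigma * (F(q + x * sigma / (g * math.sqrt(n))) - mu)

def remainder_constant(n, r):
    """c = sigma / ((1 - mu) n^{1/2}); then 0 <= x - h(x) <= c x^2 / 2 for x >= 0."""
    mu = r / (n + 1)
    return math.sqrt(mu * (1.0 - mu)) / ((1.0 - mu) * math.sqrt(n))

for n in (10, 100, 1000):
    r = n // 2
    print(n, 2.0 - h(2.0, n, r), remainder_constant(n, r) * 2.0)
```

For central ranks $r \approx n/2$ the constant $c$ is of order $n^{-1/2}$, in agreement with (3.1.6).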

Inequalities for Moments of Order Statistics

Let $U_{r:n}$, $\mu$ and $\sigma$ be given as in Lemma 3.1.1. From (1.7.5) we know that $E((U_{r:n}-\mu)^2) = \sigma^2/(n+2)$. The following lemma due to Wellner (1977) gives upper bounds for absolute central moments of $U_{r:n}$.

Lemma 3.1.3. For every positive integer $j$ and $r \in \{1,\dots,n\}$:

$$E|U_{r:n}-\mu|^j \le 2\,j!\,5^j\sigma^j n^{-j/2}.$$

PROOF. By partial integration (or Fubini's theorem) we obtain for every d.f. $G$ with bounded support in $[0,\infty)$ that

$$\int_0^\infty x^j\,dG(x) = j\int_0^\infty x^{j-1}(1-G(x))\,dx$$

so that, by writing $G(x) = P\{(n^{1/2}/\sigma)|U_{r:n}-\mu| \le x\}$, the exponential bound in (3.1.4) applied to $1-G(x)$ yields

$$E|(n^{1/2}/\sigma)(U_{r:n}-\mu)|^j = \int_0^\infty x^j\,dG(x) = j\int_0^\infty x^{j-1}(1-G(x))\,dx \le 2j\int_0^\infty x^{j-1}\exp(-x/5)\,dx = 2\,j!\,5^j.$$ □
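For even $j$ the absolute central moments of $U_{r:n}$ are polynomial in the raw moments, and $U_{r:n}$ has the beta distribution with parameters $r$ and $n-r+1$. Its raw moments are then $EU_{r:n}^k = \prod_{i=0}^{k-1}(r+i)/(n+1+i)$. The sketch below (an illustration, not from the original text; function names are hypothetical) uses exact rational arithmetic to confirm $E(U_{r:n}-\mu)^2 = \sigma^2/(n+2)$ and Lemma 3.1.3 for $j = 2, 4, 6$.

```python
from fractions import Fraction
from math import comb, factorial

def raw_moment(n, r, k):
    """E U_{r:n}^k for U_{r:n} ~ Beta(r, n - r + 1), computed exactly."""
    m = Fraction(1)
    for i in range(k):
        m *= Fraction(r + i, n + 1 + i)
    return m

def central_moment(n, r, j):
    """E (U_{r:n} - mu)^j with mu = r/(n+1), by binomial expansion."""
    mu = Fraction(r, n + 1)
    return sum(comb(j, k) * raw_moment(n, r, k) * (-mu) ** (j - k) for k in range(j + 1))

def wellner_bound(n, r, j):
    """2 j! 5^j sigma^j n^{-j/2}; exact rational value for even j only."""
    assert j % 2 == 0
    mu = Fraction(r, n + 1)
    sigma2 = mu * (1 - mu)
    return 2 * factorial(j) * 5**j * sigma2 ** (j // 2) / Fraction(n) ** (j // 2)

n, r = 20, 7
for j in (2, 4, 6):
    print(j, float(central_moment(n, r, j)), float(wellner_bound(n, r, j)))
```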

To prove an expansion of the $k$th absolute moment $E|X_{r:n}|^k$ (see Section 6.1) we shall use an expansion of $E(|X_{r:n}|^k 1_{\{|X_{r:n}|\le u\}})$ and, furthermore, an upper bound of $E(|X_{r:n}|^k 1_{\{|X_{r:n}|>u\}})$ for appropriately chosen numbers $u$. Such a bound can again be derived from the exponential bound (3.1.2).

Lemma 3.1.4. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. random variables with common d.f. $F$. Assume that $E|X_{s:j}| < \infty$ for some positive integers $j$ and $s \in \{1,\dots,j\}$.
Then there exists a constant $C > 0$ such that for every real $u$ and integers $n$, $k$ and $r \in \{1,\dots,n\}$ with $1 \le i := r - ks \le m := n - (j+1)k$ the following two inequalities hold:

$$E(|X_{r:n}|^k 1_{\{X_{r:n}\le u\}}) \le C^k\,\frac{b(i,m-i+1)}{b(r,n-r+1)}\,P\{X_{i:m} \le u\},$$
$$E(|X_{r:n}|^k 1_{\{X_{r:n}>u\}}) \le C^k\,\frac{b(i,m-i+1)}{b(r,n-r+1)}\,P\{X_{i:m} > u\}.$$

PROOF. We shall only verify the upper bound of $E(|X_{r:n}|^k 1_{\{X_{r:n}>u\}})$. The other inequality may be established in a similar way.
Since $X_{r:n} \stackrel{d}{=} F^{-1}(U_{r:n})$ and $F^{-1}(q) > u$ iff $q > F(u)$ we get

$$\begin{aligned} E(|X_{r:n}|^k 1_{\{X_{r:n}>u\}}) &= E(|F^{-1}(U_{r:n})|^k 1_{\{F^{-1}(U_{r:n})>u\}}) \\ &= b(r,n-r+1)^{-1}\int_{F(u)}^1 |F^{-1}(x)|^k x^{r-1}(1-x)^{n-r}\,dx \\ &= b(r,n-r+1)^{-1}\int_{F(u)}^1 \big(|F^{-1}(x)|\,x^s(1-x)^{j-s+1}\big)^k x^{i-1}(1-x)^{m-i}\,dx \\ &\le \frac{b(i,m-i+1)}{b(r,n-r+1)}\,C^k\,P\{U_{i:m} > F(u)\} \end{aligned}$$

where $C$ is the constant of (1.7.11). Since $P\{U_{i:m} > F(u)\} = P\{X_{i:m} > u\}$ the proof is complete. □

Bounds for the Maximum Deviation of Sample Q.F.'s

This section will be concluded with some simple applications of inequality (3.1.3) to the sample q.f. Let $G_n^{-1}$ be the sample q.f. based on $n$ i.i.d. (0,1)-uniformly distributed r.v.'s. The first result concerns the maximum deviation of $G_n^{-1}$ from the underlying q.f. $G^{-1}(q) = q$.

Lemma 3.1.5. For every $s > 0$ there exists a constant $B(s) > 0$ such that

$$P\{|G_n^{-1}(q) - q| > ((\log n)/n)^{1/2}K(q,s,n) \text{ for some } q \in (0,1)\} \le B(s)n^{-s}$$

where $K(q,s,n) = (7(s+1)\max\{q(1-q),\, 7(s+1)(\log n)/n\})^{1/2}$.

PROOF. By (3.1.3), applied with $s+1$ in place of $s$ to each of the $n$ order statistics $U_{r:n}$,

$$P\{|G_n^{-1}(q) - q| > ((\log n)/n)^{1/2}\tilde K(q,s,n) \text{ for some } q \in (0,1)\} \le 2n^{-s}$$

where $\tilde K(q,s,n) = (6(s+1))^{1/2}\max\{\sigma_n(q), (6(s+1)(\log n)/n)^{1/2}\} + (n\log n)^{-1/2}$, with $\sigma_n^2(q) = (r(q)/(n+1))(1 - r(q)/(n+1))$ and $r(q) = nq$ if $nq$ is an integer and $r(q) = [nq]+1$, otherwise. Now check that $\tilde K(q,s,n) \le K(q,s,n)$ for sufficiently large $n$. □
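The sample q.f. used above can be written down directly: $G_n^{-1}(q) = U_{r(q):n}$ with $r(q)$ as in the proof, i.e. $r(q) = \lceil nq \rceil$. Since $G_n^{-1}$ is constant on each interval $((r-1)/n, r/n]$, the supremum $\sup_{0<q<1}|G_n^{-1}(q) - q|$ is a maximum over $2n$ endpoint comparisons. A minimal sketch, with hypothetical function names; $q$ is passed as a `Fraction` so that the test "$nq$ is an integer" is exact.

```python
import math
import random
from fractions import Fraction

def rank_of(n, q):
    """r(q) = nq if nq is an integer and [nq] + 1 otherwise, i.e. the ceiling of nq."""
    nq = n * q
    return -((-nq.numerator) // nq.denominator)  # exact ceiling of a Fraction

def sample_qf(sample, q):
    """G_n^{-1}(q) = U_{r(q):n} for q in (0, 1]."""
    xs = sorted(sample)
    return xs[rank_of(len(xs), q) - 1]

def max_deviation(sample):
    """sup_{0<q<1} |G_n^{-1}(q) - q|, using G_n^{-1} = U_{r:n} on ((r-1)/n, r/n]."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(abs(u - (r - 1) / n), abs(u - r / n)) for r, u in enumerate(xs, start=1))

# Usage: the maximum deviation for n = 1000 simulated uniforms, scaled by n^{1/2}.
rng = random.Random(1)
u = [rng.random() for _ in range(1000)]
print(math.sqrt(1000) * max_deviation(u))
```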

From Lemma 3.1.5 it is immediate that

$$P\Big\{\frac{n^{1/2}|G_n^{-1}(q) - q|}{\max\{(q(1-q))^{1/2}, ((\log n)/n)^{1/2}\}} > C(s)(\log n)^{1/2} \text{ for some } q \in (0,1)\Big\} \le B(s)n^{-s} \tag{3.1.7}$$

for some constant $C(s) > 0$.

Oscillation of Sample Q.F.

From Theorem 1.6.7 we know that the spacing $U_{s:n} - U_{r:n}$ of $n$ i.i.d. (0,1)-uniformly distributed r.v.'s has the same distribution as $U_{s-r:n}$. This relation makes (3.1.3) applicable to spacings, too. The details of the proof can be left to the reader.

Lemma 3.1.6. For every $s > 0$ there exist constants $B(s) > 0$ and $C(s) > 0$ such that

$$P\Big\{\sup_{0<p_1<p_2<1}\frac{n^{1/2}|G_n^{-1}(p_2) - G_n^{-1}(p_1) - (p_2-p_1)|}{\max\{(p_2-p_1)^{1/2}, ((\log n)/n)^{1/2}\}} > C(s)(\log n)^{1/2}\Big\} < B(s)n^{-s}.$$

Extensions of (3.1.7) and Lemma 3.1.6 will be proved under appropriate smoothness conditions on the underlying q.f. $F^{-1}$.

Lemma 3.1.7. Assume that the q.f. $F^{-1}$ has a derivative on the interval $(q_1-\delta, q_2+\delta)$ for some $\delta > 0$. Put

$$D_1 = \sup_{q_1-\delta<p<q_2+\delta}|(F^{-1})'(p)|.$$

Then, for every $s > 0$ there exist constants $B(s,\delta) > 0$ and $C(s,\delta) > 0$ (only depending on $s$ and $\delta$) such that

(i) $$P\Big\{\sup_{q_1\le p\le q_2}\frac{n^{1/2}|F_n^{-1}(p) - F^{-1}(p)|}{\max\{(p(1-p))^{1/2}, ((\log n)/n)^{1/2}\}} > C(s,\delta)D_1(\log n)^{1/2}\Big\} < B(s,\delta)n^{-s},$$

and if, in addition, the derivative $(F^{-1})'$ satisfies a Lipschitz condition of order $\beta \in [1/2,1]$, that is,

$$|(F^{-1})'(p_2) - (F^{-1})'(p_1)| \le D_2|p_2-p_1|^\beta \qquad\text{for } q_1-\delta < p_1, p_2 < q_2+\delta$$

for some $D_2 > 0$, then

(ii) $$P\Big\{\sup_{q_1\le p_1\le p_2\le q_2}\frac{n^{1/2}|F_n^{-1}(p_2) - F_n^{-1}(p_1) - (F^{-1}(p_2) - F^{-1}(p_1))|}{\max\{(p_2-p_1)^{1/2}, ((\log n)/n)^{1/2}\}} \cdots\Big\}$$

PROOF. In view of the quantile transformation we may take the version $F^{-1}(G_n^{-1})$ of the sample q.f. $F_n^{-1}$ where $G_n^{-1}$ is defined as in Lemma 3.1.5. Now, applying (3.1.7) and the inequality

$$|F^{-1}(G_n^{-1}(p)) - F^{-1}(p)| \le D_1|G_n^{-1}(p) - p|$$

we obtain (i).
Using the auxiliary function

$$\psi(y) = F^{-1}(p_2 + y(G_n^{-1}(p_2) - p_2)) - F^{-1}(p_1 + y(G_n^{-1}(p_1) - p_1))$$

we obtain the representation

$$\begin{aligned} &F^{-1}(G_n^{-1}(p_2)) - F^{-1}(G_n^{-1}(p_1)) - [F^{-1}(p_2) - F^{-1}(p_1)] = \psi(1) - \psi(0) \\ &\quad= (F^{-1})'(p_2 + \theta(G_n^{-1}(p_2)-p_2))\,[G_n^{-1}(p_2) - p_2] - (F^{-1})'(p_1 + \theta(G_n^{-1}(p_1)-p_1))\,[G_n^{-1}(p_1) - p_1] \end{aligned}$$

with $0 < \theta < 1$. Now, standard calculations and Lemma 3.1.6 lead to (ii). □

From the proof of Lemma 3.1.7 it is obvious that (i) still holds if $F^{-1}$ satisfies a Lipschitz condition of order 1.

3.2. Expansions of Finite Length


When analyzing higher order approximations one realizes that in many cases these approximations have a similar structure. As an example, we mention the Edgeworth expansions which occur in connection with the central limit theorem. In this case, a normal distribution is necessarily the leading term of the expansion. The concept of Edgeworth expansions is not general enough to cover the higher order approximations as studied in the present context. Apart from the fact that our attention is not restricted to sequences of distributions, one also has to consider non-normal limiting distributions in the field of extreme order statistics. Thus, an extension of the notion of Edgeworth expansions to the more general notion of expansions of finite length is necessary.
It is not the purpose of this section to develop a theory for expansions of finite length, and it is by no means necessary to have this notion in mind to understand our results concerning order statistics. However, at least in this section, we want to make clear what is meant by speaking of expansions. Moreover, this notion can serve as a guide for finding higher order approximations.

A Definition of Expansions of Finite Length

Let $g_\gamma$ and $g_{0,\gamma},\dots,g_{m-1,\gamma}$ be real-valued functions with domain $A$ for every index $\gamma \in \Gamma$ so that $\sum_{i=0}^{m-1} g_{i,\gamma}$ can be regarded as an approximation to $g_\gamma$. We say that $g_\gamma$, $\gamma \in \Gamma$, admits the expansion $\sum_{i=0}^{m-1} g_{i,\gamma}$ of length $m$ arranged in powers of $h(\gamma) > 0$ if for every $x \in A$ there exists a constant $C(x) > 0$ such that

$$\Big|g_\gamma(x) - \sum_{i=0}^{j} g_{i,\gamma}(x)\Big| \le C(x)h(\gamma)^{j+1}, \qquad \gamma \in \Gamma, \tag{3.2.1}$$

for every $j = 0,\dots,m-1$.
The expansion is said to hold uniformly over $A_0 \subset A$ if $\sup\{C(x): x \in A_0\} < \infty$.
If $\sup\{h(\gamma): \gamma \in \Gamma\} < \infty$, we may assume w.l.g. that

$$|g_{i,\gamma}| \le Ch(\gamma)^i$$

by putting $C\sup\{1 + h(\gamma): \gamma \in \Gamma\}$ in place of $C$.


In our context the functions $g_\gamma$ etc. will mainly be d.f.'s or probability measures.
A significant feature of an expansion of length $m+1$ is that the first $m$ terms coincide with the expansion of length $m$. Thus, one always has the choice between the simplicity and the accuracy of an approximation. The first term of an expansion (giving the simplest approximation) is usually known from a limit theorem; an error bound for the limit theorem leads to an expansion of length one. One purpose of asymptotic expansions is to give a better insight into the remainder term of the limit theorem.
EXAMPLE 3.2.1. If a real-valued function $f$ defined on the real line has $m$ bounded derivatives then

$$\Big|f(y) - \sum_{i=0}^{m-1}\frac{f^{(i)}(y_0)}{i!}(y-y_0)^i\Big| \le C|y-y_0|^m$$

and hence the Taylor expansion $\sum_{i=0}^{m-1} f^{(i)}(y_0)(y-y_0)^i/i!$ of $f$ about $y_0$ is an expansion arranged in powers of $h(y) = |y-y_0|$.
EXAMPLE 3.2.2. Let $\Phi$ denote the standard normal d.f. and $\varphi = \Phi'$. By noting that $y\varphi(y) = -\varphi'(y)$ one easily obtains by partial integration and by using the induction scheme that

$$1 - \Phi(y) = \frac{\varphi(y)}{y}\Big(1 + \sum_{i=1}^{m-1}(-1)^i\frac{1\cdot3\cdot5\cdots(2i-1)}{y^{2i}}\Big) + (-1)^m\,1\cdot3\cdot5\cdots(2m-1)\int_y^\infty\frac{\varphi(x)}{x^{2m}}\,dx \tag{3.2.2}$$

for every positive integer $m$ and $y > 0$ (where $\sum_{i=1}^{0}$ equals zero by convention). An application of (3.2.2) in the cases $m = 1$ and $m = 2$ leads to

$$\varphi(y)(1/y - 1/y^3) \le 1 - \Phi(y) \le \varphi(y)/y \tag{3.2.3}$$

for $y > 0$.
By means of (3.2.2) we get an expansion of $(1-\Phi(y))y/\varphi(y)$ in powers of $h(y) = y^{-2}$. We have

$$\Big|\frac{(1-\Phi(y))\,y}{\varphi(y)} - \Big(1 + \sum_{i=1}^{m-1}(-1)^i\frac{1\cdot3\cdot5\cdots(2i-1)}{y^{2i}}\Big)\Big| \le C_m y^{-2m}. \tag{3.2.4}$$

Notice that $(1-\Phi(y))y/\varphi(y)$ cannot be represented by means of the formal series $1 + \sum_{i=1}^\infty(-1)^i\,1\cdot3\cdot5\cdots(2i-1)/y^{2i}$ since, for every fixed $y$, $1\cdot3\cdot5\cdots(2i-1)/y^{2i} \to \infty$ as $i \to \infty$. However, (3.2.4) provides a useful inequality if $y$ is large. Moreover, the approximation for $m+1$ is more accurate than that for $m$ if $y$ is sufficiently large.
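The inequalities (3.2.3) and (3.2.4) are easy to check numerically. One admissible constant in (3.2.4) is $C_m = 1\cdot3\cdots(2m-1)$. This follows from the remainder term in (3.2.2), because $x^{-2m} \le x\,y^{-2m-1}$ for $x \ge y$ and $\int_y^\infty x\varphi(x)\,dx = \varphi(y)$. The Python sketch below is illustrative and uses hypothetical names. It computes $1 - \Phi$ with `math.erfc` to avoid cancellation for large $y$.

```python
import math

def phi(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def normal_tail(y):
    """1 - Phi(y), computed via erfc to avoid cancellation for large y."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def odd_product(i):
    """1*3*5*...*(2i-1), equal to 1 for i = 0."""
    p = 1
    for k in range(1, i + 1):
        p *= 2 * k - 1
    return p

def partial_sum(y, m):
    """1 + sum_{i=1}^{m-1} (-1)^i 1*3*...*(2i-1) / y^{2i}, the expansion in (3.2.4)."""
    return 1.0 + sum((-1) ** i * odd_product(i) / y ** (2 * i) for i in range(1, m))

def scaled_tail(y):
    """(1 - Phi(y)) y / phi(y)."""
    return normal_tail(y) * y / phi(y)

for m in range(1, 5):
    print(m, abs(scaled_tail(6.0) - partial_sum(6.0, m)))
```

For fixed $y$ the printed errors first decrease in $m$ and eventually grow again, which is the asymptotic (non-convergent) character described above.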
EXAMPLE 3.2.3. A sequence of d.f.'s $H_n$ admits an Edgeworth expansion of length $m$ if

$$\sup_t\Big|H_n(t) - \Big(\Phi(t) + \varphi(t)\sum_{i=1}^{m-1}n^{-i/2}L_i(t)\Big)\Big| \le Cn^{-m/2} \tag{3.2.5}$$

where $L_i$ are polynomials. This is a special case of (3.2.1) with $g_n = H_n$, $g_{0,n} = \Phi$, $g_{i,n} = n^{-i/2}\varphi L_i$ for $i = 1,\dots,m-1$ and $h(n) = n^{-1/2}$.
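To see (3.2.5) at work in a concrete case, assume (for this sketch only) that $H_n$ is the d.f. of $(S_n - n)/n^{1/2}$ where $S_n$ is a sum of $n$ i.i.d. standard exponential r.v.'s. The classical one-term Edgeworth correction is $L_1(t) = -(\kappa_3/6)(t^2-1)$ with skewness $\kappa_3 = 2$ for the exponential law. The code compares the sup-errors (over a grid) of $\Phi$ and of $\Phi + n^{-1/2}\varphi L_1$; the exact $H_n$ is the Erlang d.f. Function names are hypothetical.

```python
import math

def erlang_cdf(n, x):
    """P{E_1 + ... + E_n <= x} for i.i.d. standard exponential E_i."""
    if x <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total

def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def H(n, t):
    """d.f. of the standardized sum (S_n - n) / n^{1/2}."""
    return erlang_cdf(n, n + t * math.sqrt(n))

def edgeworth_1(n, t, skew=2.0):
    """Phi(t) + phi(t) n^{-1/2} L_1(t) with L_1(t) = -(skew/6)(t^2 - 1)."""
    return Phi(t) + phi(t) * (-(skew / 6.0) * (t * t - 1.0)) / math.sqrt(n)

def sup_errors(n):
    """Grid approximations of sup_t |H_n - Phi| and sup_t |H_n - edgeworth_1|."""
    grid = [k / 20.0 for k in range(-60, 61)]
    normal = max(abs(H(n, t) - Phi(t)) for t in grid)
    edge = max(abs(H(n, t) - edgeworth_1(n, t)) for t in grid)
    return normal, edge

for n in (10, 40, 160):
    print(n, sup_errors(n))
```

The normal error decreases roughly like $n^{-1/2}$, the corrected one roughly like $n^{-1}$, as predicted by an expansion of length two in powers of $h(n) = n^{-1/2}$.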

Expansions of Probability Measures

We will primarily be interested in expansions

$$P_{0,\gamma} + \sum_{i=1}^{m-1}\nu_{i,\gamma}$$

of probability measures $P_\gamma$ which hold uniformly over all measurable sets. If the probability measure $P_{0,\gamma}$ is the first order approximation to $P_\gamma$ then the approximation can be improved by adding to $P_{0,\gamma}$ an approximation, say $\nu_{1,\gamma}$, to $P_\gamma - P_{0,\gamma}$. Since $P_\gamma - P_{0,\gamma}$ is a signed measure with total mass equal to zero it is clear that the set function $\nu_{1,\gamma}$ will typically also have this property.

Lemma 3.2.4. Let $P_\gamma$ and $P_{0,\gamma}$ be probability measures and let $\nu_{i,\gamma}$ be finite signed measures on a measurable space $(S,\mathscr{B})$.
If $P_{0,\gamma} + \sum_{i=1}^{m-1}\nu_{i,\gamma}$ is an expansion of $P_\gamma$, $\gamma \in \Gamma$, uniformly over $\mathscr{B}$ arranged in powers of $h(\gamma)$, $\gamma \in \Gamma$, then there exists an expansion $P_{0,\gamma} + \sum_{i=1}^{m-1}\mu_{i,\gamma}$ such that $\mu_{i,\gamma}$ are finite signed measures with $\mu_{i,\gamma}(S) = 0$. Moreover, one may take
i

= 1, ... , m - 1

or
i

(3.2.6)

= 1, ... , m - 1.

(3.2.7)

PROOF. Straightforward by using the fact that $|\sum_{i=1}^j\nu_{i,\gamma}(S)| \le Ch(\gamma)^{j+1}$ for some constant $C > 0$. □

According to Lemma 3.2.4 we can assume w.l.g. that the term $\nu_{i,\gamma}$ of an expansion has the property $\nu_{i,\gamma}(S) = 0$. Another useful tool in this context is the following.
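Lemma 3.2.5. Let $P_\gamma$, $P_{0,\gamma}$ and $\nu_{i,\gamma}$ be as in Lemma 3.2.4. Suppose that there exists $C > 0$ such that

$$\sup_{B\in\mathscr{B}}\Big|P_\gamma(B) - \frac{P_{0,\gamma}(B) + \sum_{i=1}^j\nu_{i,\gamma}(B)}{1 + \sum_{i=1}^j\nu_{i,\gamma}(S)}\Big| \le Ch(\gamma)^{j+1} \tag{3.2.8}$$

and $|\nu_{i,\gamma}(S)| \le Ch(\gamma)^i$ for every $j = 0,\dots,m-1$ and $\gamma \in \Gamma$ [where (3.2.8) has to hold whenever $1 + \sum_{i=1}^j\nu_{i,\gamma}(S) > 0$].
Then, $P_{0,\gamma} + \sum_{i=1}^{m-1}\mu_{i,\gamma}$, $\gamma \in \Gamma$, is an expansion of $P_\gamma$, $\gamma \in \Gamma$, uniformly over $\mathscr{B}$ arranged in powers of $h(\gamma)$ where $\mu_{i,\gamma}$ is inductively defined by

$$\mu_{i,\gamma} = \nu_{i,\gamma} - \nu_{i,\gamma}(S)P_{0,\gamma} - \sum_{k=1}^{i-1}\nu_{k,\gamma}(S)\mu_{i-k,\gamma}, \qquad i = 1,\dots,m-1. \tag{3.2.9}$$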

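PROOF. First notice that from the inequality $|\nu_{i,\gamma}(S)| \le Ch(\gamma)^i$ it is immediate by induction over $i = 1,\dots,m-1$ that

$$\|\mu_{i,\gamma}\| \le Ch(\gamma)^i \tag{3.2.10}$$

where $C$ will be used as a generic constant which only depends on $m$.
The triangle inequality and (3.2.10) yield

$$\begin{aligned}\Big\|P_\gamma - \Big(P_{0,\gamma} + \sum_{i=1}^j\mu_{i,\gamma}\Big)\Big\| &\le Ch(\gamma)^{j+1} + \Big\|\frac{P_{0,\gamma} + \sum_{i=1}^j\nu_{i,\gamma}}{1 + \sum_{i=1}^j\nu_{i,\gamma}(S)} - \Big(P_{0,\gamma} + \sum_{i=1}^j\mu_{i,\gamma}\Big)\Big\| \\ &\le Ch(\gamma)^{j+1}\Big(1 + \Big(1 + \sum_{i=1}^j\nu_{i,\gamma}(S)\Big)^{-1}\Big)\end{aligned}$$

since

$$P_{0,\gamma} + \sum_{i=1}^j\nu_{i,\gamma} - \Big(1 + \sum_{i=1}^j\nu_{i,\gamma}(S)\Big)\Big(P_{0,\gamma} + \sum_{i=1}^j\mu_{i,\gamma}\Big) = -\sum_{i=j+1}^{2j}\ \sum_{k=i-j}^{j}\nu_{k,\gamma}(S)\mu_{i-k,\gamma}.$$

Thus, the assertion is proved for those $\gamma$ for which $h(\gamma)$ is sufficiently small. By (3.2.10) again it can easily be seen that, otherwise, the assertion trivially holds by choosing the constant $C$ sufficiently large. □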
By induction over $i = 1,\dots,m-1$ it is easy to see that the signed measures $\mu_{i,\gamma}$ in Lemma 3.2.5 already fulfill the condition $\mu_{i,\gamma}(S) = 0$.
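The recursion (3.2.9) is easy to carry out for signed measures on a finite set $S$, represented as lists of point masses. The sketch below is an illustration with hypothetical names. It checks two facts: that every $\mu_{i,\gamma}$ has total mass zero, and the algebraic identity used in the proof of Lemma 3.2.5.

```python
import random

def mass(m):
    """Total mass m(S) of a signed measure on a finite set S (stored as a list)."""
    return sum(m)

def corrected_terms(p0, nus):
    """mu_1, ..., mu_{m-1} computed from nu_1, ..., nu_{m-1} by the recursion (3.2.9)."""
    mus = []
    for i in range(1, len(nus) + 1):
        nu_i = nus[i - 1]
        mu_i = [a - mass(nu_i) * b for a, b in zip(nu_i, p0)]
        for k in range(1, i):
            c = mass(nus[k - 1])
            mu_i = [a - c * b for a, b in zip(mu_i, mus[i - k - 1])]  # mus[i-k-1] is mu_{i-k}
        mus.append(mu_i)
    return mus

rng = random.Random(0)
size = 5
w = [rng.random() + 0.1 for _ in range(size)]
p0 = [x / sum(w) for x in w]
# nu_i of order 0.1^i with nonzero total mass
nus = [[rng.uniform(-1.0, 1.0) * 0.1**i for _ in range(size)] for i in range(1, 5)]
mus = corrected_terms(p0, nus)
print([mass(mu) for mu in mus])
```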

Expansions of D.F.'s

An expansion of probability measures which holds uniformly over all measurable sets on the real line yields an expansion

$$P_{0,\gamma}(-\infty,t] + \sum_{i=1}^{m-1}\nu_{i,\gamma}(-\infty,t]$$

of d.f.'s.
Assume that $P_{0,\gamma} = N(0,1)$, $\nu_{i,\gamma}$ has a density $\varphi R_{i,\gamma}$ where $R_{i,\gamma}$ is a polynomial, and the mass of $\nu_{i,\gamma}$ is equal to zero. Then, the expansion of the d.f.'s can always be written in the form

$$\Phi(t) + \varphi(t)\sum_{i=1}^{m-1}L_{\gamma,i}(t)$$

where $L_{\gamma,i}$ are polynomials. This is immediate from the following lemma which yields that one can find polynomials $L_{\gamma,i}$ such that $(\varphi L_{\gamma,i})' = \varphi R_{i,\gamma}$.
Lemma 3.2.6. For every positive integer $k$,

$$\varphi(x)\Big(x^{2k} - \int x^{2k}\varphi(x)\,dx\Big) = \Big[-\varphi(x)\sum_{i=1}^k a_ix^{2i-1}\Big]' \tag{3.2.11}$$

where $a_k = 1$ and $a_i = (2i+1)a_{i+1}$, $i = 1,\dots,k-1$. Secondly,

$$\varphi(x)x^{2k-1} = \Big[-\varphi(x)\sum_{i=1}^k a_ix^{2(i-1)}\Big]' \tag{3.2.12}$$

where $a_k = 1$ and $a_i = 2ia_{i+1}$, $i = 1,\dots,k-1$.

PROOF. (3.2.11) and (3.2.12) can be proved in a straightforward way. Observe that $a_1$ in (3.2.11) is given by

$$a_1 = 1\cdot3\cdot5\cdots(2k-1) = \int x^{2k}\varphi(x)\,dx. \tag{3.2.13}$$ □
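Since $(-\varphi p)' = \varphi\,(xp - p')$ for every polynomial $p$, the identities (3.2.11) and (3.2.12) reduce to coefficient identities for $xp(x) - p'(x)$. The following sketch (illustrative; names are hypothetical) builds the coefficients $a_i$ by the stated recursions and checks both identities exactly, together with $a_1 = 1\cdot3\cdots(2k-1)$ from (3.2.13). Polynomials are stored as coefficient lists indexed by the power of $x$.

```python
def x_p_minus_dp(p):
    """Coefficients of x*p(x) - p'(x); note that (-phi*p)' = phi*(x*p - p')."""
    out = [0] * (len(p) + 1)
    for d, c in enumerate(p):
        out[d + 1] += c
        if d > 0:
            out[d - 1] -= d * c
    return out

def coeffs_3_2_11(k):
    """a_1 and p(x) = sum_{i=1}^k a_i x^{2i-1} with a_k = 1, a_i = (2i+1) a_{i+1}."""
    a = [0] * (k + 2)
    a[k] = 1
    for i in range(k - 1, 0, -1):
        a[i] = (2 * i + 1) * a[i + 1]
    p = [0] * (2 * k)
    for i in range(1, k + 1):
        p[2 * i - 1] = a[i]
    return a[1], p

def coeffs_3_2_12(k):
    """p(x) = sum_{i=1}^k a_i x^{2(i-1)} with a_k = 1, a_i = 2i a_{i+1}."""
    a = [0] * (k + 2)
    a[k] = 1
    for i in range(k - 1, 0, -1):
        a[i] = 2 * i * a[i + 1]
    p = [0] * (2 * k - 1)
    for i in range(1, k + 1):
        p[2 * (i - 1)] = a[i]
    return p

a1, p = coeffs_3_2_11(3)
print(a1, p, x_p_minus_dp(p))
```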

Two further technical lemmas that provide the basic tools for proving
expansions for extreme and central order statistics will be given in Appendix 2.


3.3. Distances of Measures: Convergence and Inequalities

Given the r.v.'s $\xi$ and $\eta$ with values in a measurable space $(S,\mathscr{B})$ [in our context, $S$ will be the real line or the Euclidean $k$-space] the variational distance is defined by

$$\sup_{B\in\mathscr{B}}|P\{\xi\in B\} - P\{\eta\in B\}|. \tag{3.3.1}$$

In the sequel, we shall write $\sup_B$ in place of $\sup_{B\in\mathscr{B}}$. Let $Q_0$ and $Q_1$ denote the distributions of $\xi$ and $\eta$. Then, we write

$$\|Q_0 - Q_1\| = \sup_B|Q_0(B) - Q_1(B)|. \tag{3.3.2}$$

Since the variational distance is difficult to deal with, we shall also introduce related distances such as the $L_1$-distance, the Hellinger distance, a weighted $L_2$-distance and the Kullback-Leibler distance. These distances will enable us to establish important estimates of the variational distance.

The Variational Distance and the $L_1$-Distance

Representing the probability measures by their $\mu$-densities $f_i$ [in our context, the $f_i$ are usually Lebesgue densities] one obtains the following well-known relation between the variational and the $L_1$-distance.

Lemma 3.3.1.

$$\|Q_0 - Q_1\| = 2^{-1}\int|f_0 - f_1|\,d\mu. \tag{3.3.3}$$

PROOF. Check that

$$\int|f_0 - f_1|\,d\mu = 2\int(f_0 - f_1)^+\,d\mu$$

where $f^+$ denotes the positive part of a function $f$. This implies for $B \in \mathscr{B}$:

$$\int|f_0 - f_1|\,d\mu = 2\int_{\{f_0>f_1\}}(f_0 - f_1)\,d\mu \ge 2(Q_0(B) - Q_1(B))$$

with "=" for $B = \{f_0 > f_1\}$. Hence

$$\sup_B(Q_0(B) - Q_1(B)) = 2^{-1}\int|f_0 - f_1|\,d\mu.$$

This yields the assertion. □
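On a finite space (with counting measure as $\mu$) Lemma 3.3.1 can be verified by brute force, since the supremum in (3.3.2) runs over finitely many sets. The sketch below (illustrative; names are hypothetical) uses exact fractions and also confirms that the supremum is attained at $B = \{f_0 > f_1\}$.

```python
from fractions import Fraction
from itertools import combinations

def variational_brute(q0, q1):
    """sup_B |Q0(B) - Q1(B)| over all subsets B of a finite space."""
    points = range(len(q0))
    best = Fraction(0)
    for size in range(len(q0) + 1):
        for B in combinations(points, size):
            best = max(best, abs(sum((q0[i] - q1[i] for i in B), Fraction(0))))
    return best

def half_l1(q0, q1):
    """2^{-1} sum |f0 - f1|, the right-hand side of (3.3.3) for counting measure."""
    return sum((abs(a - b) for a, b in zip(q0, q1)), Fraction(0)) / 2

q0 = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
q1 = [Fraction(1, 4)] * 4
print(variational_brute(q0, q1), half_l1(q0, q1))
```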


The Scheffé Lemma and Related Results

We continue our calculations with some simple results concerning the pointwise convergence of densities and the convergence w.r.t. the $L_1$-distance.

Lemma 3.3.2. For every nonnegative integer $n$, let $f_n$ be the $\mu$-density of the probability measure $Q_n$. Then,

$$f_n \to f_0 \ \ \mu\text{-a.e.} \quad\text{implies}\quad \int|f_n - f_0|\,d\mu \to 0.$$

PROOF. We know (compare with the proof above) that

$$\int|f_n - f_0|\,d\mu = 2\int(f_0 - f_n)^+\,d\mu. \tag{1}$$

Moreover, $f_0 \ge (f_0 - f_n)^+ \ge 0$ and $(f_0 - f_n)^+ \to 0$ $\mu$-a.e. Therefore, the dominated convergence theorem implies that

$$\int(f_0 - f_n)^+\,d\mu \to 0.$$

This together with (1) yields the assertion. □

A short look at the proof above reveals also that the following extension holds.

Lemma 3.3.3. Let $f_n$, $n = 0,1,2,\dots$, be nonnegative, $\mu$-integrable functions. If

$$\limsup_n\int f_n\,d\mu \le \int f_0\,d\mu \quad\text{and}\quad \lim_n f_n = f_0 \ \ \mu\text{-a.e.}$$

then

$$\lim_n\int|f_n - f_0|\,d\mu = 0.$$

It is well known (and easy to show by examples) that the conditions of Lemma 3.3.3 are not necessary for the $L_1$-convergence of $f_n$ to $f_0$. We also prove the following stronger version of the Scheffé lemma.

Lemma 3.3.4. With $f_n$ as in Lemma 3.3.3 the following conditions (i)-(iii) are equivalent:

(i) $\lim_n\int|f_n - f_0|\,d\mu = 0$.

(ii) $\lim_n\int f_n\,d\mu = \int f_0\,d\mu$, and for every subsequence $i(n)$ there exists a subsequence $k(n) = i(j(n))$ such that $\lim_n f_{k(n)} = f_0$ $\mu$-a.e.

(iii) For every subsequence $i(n)$ there exists a subsequence $k(n) = i(j(n))$ such that

$$\limsup_n\int f_{k(n)}\,d\mu \le \int f_0\,d\mu \quad\text{and}\quad \liminf_n f_{k(n)} \ge f_0 \ \ \mu\text{-a.e.}$$

PROOF. We prove (i) ⇒ (ii) ⇒ (iii) ⇒ (i).
(i) ⇒ (ii): It is immediate that $\lim_n\int f_n\,d\mu = \int f_0\,d\mu$. Moreover, for every subsequence $i(n)$ there exists a subsequence $k(n) = i(j(n))$ such that

$$\int\sum_{n=1}^\infty|f_{k(n)} - f_0|\,d\mu = \sum_{n=1}^\infty\int|f_{k(n)} - f_0|\,d\mu < \infty.$$

This implies $\sum_{n=1}^\infty|f_{k(n)} - f_0| < \infty$ $\mu$-a.e. and hence $\lim_n f_{k(n)} = f_0$ $\mu$-a.e.
(ii) ⇒ (iii): Obvious.
(iii) ⇒ (i): It suffices to prove that for every subsequence $i(n)$ there exists a subsequence $k(n) = i(j(n))$ such that

$$\lim_n\int|f_{k(n)} - f_0|\,d\mu = 0.$$

Condition (iii) implies that there exists $k(n) = i(j(n))$ such that

$$\lim_n(f_0 - f_{k(n)})^+ = 0 \qquad \mu\text{-a.e.}$$

Thus, by repeating the arguments of the proof of Lemma 3.3.2 we obtain the desired conclusion. □
The following version of the Scheffé lemma will be particularly useful in cases where the measurable space varies with $n$.

Lemma 3.3.5. Let $g_n$ and $f_n$ be nonnegative, measurable functions. Assume that $\int g_n\,d\mu_n$, $n = 1,2,3,\dots$, is a bounded sequence, and that $\lim_n\int(g_n - f_n)\,d\mu_n = 0$. Then the following three conditions are equivalent:

(i) $\lim_n\int|g_n - f_n|\,d\mu_n = 0$,

(ii) $\lim_n\int|f_n/g_n - 1|\,g_n\,d\mu_n = 0$,

(iii) $\lim_n\int_{\{|f_n/g_n-1|\ge\varepsilon\}}g_n\,d\mu_n = 0$ for every $\varepsilon > 0$.

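PROOF. (i) ⇒ (ii) ⇒ (iii): Obvious from

$$\int_{\{|f_n/g_n-1|\ge\varepsilon\}}g_n\,d\mu_n \le \varepsilon^{-1}\int|f_n/g_n - 1|\,g_n\,d\mu_n \le \varepsilon^{-1}\int|g_n - f_n|\,d\mu_n.$$

(iii) ⇒ (i): For $\varepsilon > 0$ put $B = B(n,\varepsilon) = \{g_n > 0,\ |f_n/g_n - 1| < \varepsilon\}$. If (iii) holds then

$$\int_B g_n\,d\mu_n = \int g_n\,d\mu_n - \int_{\{|f_n/g_n-1|\ge\varepsilon\}}g_n\,d\mu_n \ge \int g_n\,d\mu_n - \varepsilon \tag{1}$$

for sufficiently large $n$. Moreover,

$$\Big|\int_B f_n\,d\mu_n - \int_B g_n\,d\mu_n\Big| \le \int_B|f_n - g_n|\,d\mu_n = \int_B|f_n/g_n - 1|\,g_n\,d\mu_n \le \varepsilon\int_B g_n\,d\mu_n. \tag{2}$$

Combining (1) and (2),

$$\int_B f_n\,d\mu_n \ge \int g_n\,d\mu_n - \varepsilon - \varepsilon\int_B g_n\,d\mu_n \ge \int f_n\,d\mu_n - 2\varepsilon - \varepsilon\int_B g_n\,d\mu_n \tag{3}$$

if $n$ is sufficiently large. By (1)-(3),

$$\int|f_n - g_n|\,d\mu_n \le \int_B|f_n - g_n|\,d\mu_n + \int f_n\,d\mu_n - \int_B f_n\,d\mu_n + \int g_n\,d\mu_n - \int_B g_n\,d\mu_n \le 2\varepsilon\int g_n\,d\mu_n + 3\varepsilon$$

if $n$ is sufficiently large. Since $\varepsilon$ is arbitrary this implies (i). □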

Finally, Lemma 3.3.5 will be formulated for the particular case of probability measures.

Corollary 3.3.6. For probability measures $Q_n$ and $P_n$ with $\mu_n$-densities $f_n$ and $g_n$ the following two assertions are equivalent:

(i) $\lim_n\int|g_n - f_n|\,d\mu_n = 0$,

(ii) $\lim_n P_n\{|f_n/g_n - 1| \ge \varepsilon\} = 0$ for every $\varepsilon > 0$.

The Variational Distance between Product Measures

The aim of the following is to prove estimates of the variational distance between products of probability measures in terms of distances between the single components. Our starting point is an upper bound in terms of the variational distances of the components. The technical details and a generalization of the present result to signed measures can be found in Appendix 3.

Lemma 3.3.7. For probability measures $Q_i$ and $P_i$, $i = 1,\dots,k$,

$$\Big\|\mathop{\times}_{i=1}^k Q_i - \mathop{\times}_{i=1}^k P_i\Big\| \le \sum_{i=1}^k\|Q_i - P_i\|. \tag{3.3.4}$$

The following example shows that the inequality is sharp as far as the order of the upper bound is concerned. However, we will realize later that this is not the typical situation.

EXAMPLE 3.3.8. Let $Q_t$ be the uniform distribution on the interval $[0,t]$. We show that for $0 \le s < k$:

$$s + O(s^2) = 1 - \exp(-s) \le \|Q_1^k - Q_{1/(1-s/k)}^k\| \le k\|Q_1 - Q_{1/(1-s/k)}\| = s.$$

The two upper bounds are immediate from (3.3.4) and the identity

$$\|Q_1^k - Q_t^k\| = 1 - t^{-k}, \qquad t \ge 1.$$

This also implies

$$\|Q_1^k - Q_{1/(1-s/k)}^k\| = 1 - (1 - s/k)^k \ge 1 - \exp(-s).$$

The Hellinger Distance and Other Distances

To obtain sharp estimates of the variational distance of product measures we introduce further distances and show their relation to the variational distance. Let again $Q_i$ be a probability measure with $\mu$-density $f_i$. Put

$$H(Q_0,Q_1) = \Big[\int(f_0^{1/2} - f_1^{1/2})^2\,d\mu\Big]^{1/2} \qquad\text{(Hellinger distance)}$$

$$D(Q_0,Q_1) = \Big[\int(f_1/f_0 - 1)^2\,dQ_0\Big]^{1/2} \qquad\text{(}\chi^2\text{-distance)}$$

$$K(Q_0,Q_1) = \int(-\log(f_1/f_0))\,dQ_0 \qquad\text{(Kullback-Leibler distance)}$$

It can be shown that these distances are independent of the particular choice of the dominating measure $\mu$ and of the densities $f_0$ and $f_1$. Keep in mind that the distances $\|\cdot\|$ and $H$ are symmetric whereas this does not hold for the distances $D$ and $K$.
Notice that $\|Q_0 - Q_1\| \le 1$ and $H(Q_0,Q_1) \le 2^{1/2}$. Moreover, $\|Q_0 - Q_1\| = 1$ and $H(Q_0,Q_1) = 2^{1/2}$ if the densities $f_0$ and $f_1$ have disjoint supports. We remark that, in the literature, $2^{-1/2}H$ is also used as the definition of the Hellinger distance.


The definition of the $\chi^2$-distance will be extended to finite signed measures in Appendix 3.
Check that $H(Q_0,Q_1) \le (2\|Q_0 - Q_1\|)^{1/2}$ and

$$H(Q_0,Q_1) = \Big[2\Big(1 - \int(f_0f_1)^{1/2}\,d\mu\Big)\Big]^{1/2}. \tag{3.3.5}$$

Lemma 3.3.9. (i)

$$\|Q_0 - Q_1\| \le H(Q_0,Q_1). \tag{3.3.6}$$

(ii) If $Q_1$ is dominated by $Q_0$ then

$$H(Q_0,Q_1) \le D(Q_0,Q_1). \tag{3.3.7}$$

PROOF. Ad (i): (3.3.3) and the Schwarz inequality yield

$$\begin{aligned}\|Q_0 - Q_1\| &= 2^{-1}\int|f_0 - f_1|\,d\mu = 2^{-1}\int|f_0^{1/2} - f_1^{1/2}|\,|f_0^{1/2} + f_1^{1/2}|\,d\mu \\ &\le 2^{-1}\Big[\int(f_0^{1/2} - f_1^{1/2})^2\,d\mu\Big]^{1/2}\Big[\int(f_0^{1/2} + f_1^{1/2})^2\,d\mu\Big]^{1/2} \\ &= H(Q_0,Q_1)\Big[2\Big(1 + \int(f_0f_1)^{1/2}\,d\mu\Big)\Big]^{1/2}\Big/2 \le H(Q_0,Q_1).\end{aligned}$$

Ad (ii): Let $f_1$ be a $Q_0$-density of $Q_1$. We have

$$H(Q_0,Q_1)^2 = \int(1 - f_1^{1/2})^2\,dQ_0 \le \int[(1 - f_1^{1/2})(1 + f_1^{1/2})]^2\,dQ_0 = D(Q_0,Q_1)^2.$$ □

Note that (3.3.7) does not hold if the condition that $Q_1$ is dominated by $Q_0$ is omitted. Without this condition one can easily prove (use (3.3.5)) that $H(Q_0,Q_1) \le [2D(Q_0,Q_1)]^{1/2}$.
Under the condition of Lemma 3.3.9 it is clear that $\|Q_0 - Q_1\| \le D(Q_0,Q_1)$. This inequality can slightly be improved by applying the Schwarz inequality to $\int|1 - f_1|\,dQ_0$. We have

$$\|Q_0 - Q_1\| \le 2^{-1}D(Q_0,Q_1). \tag{3.3.8}$$

Another bound for the Hellinger distance (and thus for the variational distance) can be constructed by using the Kullback-Leibler distance. This bound is nontrivial if $Q_0$ is dominated by $Q_1$. We have

$$H(Q_0,Q_1) \le K(Q_0,Q_1)^{1/2}. \tag{3.3.9}$$

A modification and the proof of this result can be found in Appendix 3.
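For distributions on a finite set the distances $\|\cdot\|$, $H$, $D$ and $K$ can be computed directly. The sketch below is illustrative, with hypothetical names. It draws random pairs of strictly positive densities, so that each measure dominates the other. It then checks (3.3.5), (3.3.6), (3.3.7), (3.3.8), the bound $H \le (2\|Q_0 - Q_1\|)^{1/2}$, and the inequality $H^2 \le K$. The last one holds for this normalization of $H$, by Jensen's inequality and $-\log x \ge 1 - x$.

```python
import math
import random

def variational(f0, f1):
    return 0.5 * sum(abs(a - b) for a, b in zip(f0, f1))

def hellinger(f0, f1):
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(f0, f1)))

def chi2_distance(f0, f1):
    """D(Q0, Q1); assumes f1 = 0 wherever f0 = 0 (Q1 dominated by Q0)."""
    return math.sqrt(sum((b / a - 1.0) ** 2 * a for a, b in zip(f0, f1) if a > 0))

def kullback_leibler(f0, f1):
    """K(Q0, Q1) = int -log(f1/f0) dQ0; assumes f1 > 0 wherever f0 > 0."""
    return sum(-math.log(b / a) * a for a, b in zip(f0, f1) if a > 0)

def random_density(rng, size):
    w = [rng.random() + 0.01 for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(42)
f0, f1 = random_density(rng, 6), random_density(rng, 6)
print(variational(f0, f1), hellinger(f0, f1), chi2_distance(f0, f1), kullback_leibler(f0, f1))
```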
The use of the Kullback-Leibler distance has the following advantages: If $f_1/f_0$ is the product of several terms, say, $g_i$, then we get an upper bound of $\log(f_1/f_0)$ by summing up estimates of $\log(g_i)$. Moreover, it will be extremely useful in applications that only integrals of bounds of $\log(g_i)$ have to be treated.

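Further Inequalities for Distances of Product Measures

In the sequel, it is understood that for every $i = 1,\dots,k$ the probability measures $Q_i$ and $P_i$ are defined on the same measurable space.

Lemma 3.3.10. (i)

$$H\Big(\mathop{\times}_{i=1}^k Q_i, \mathop{\times}_{i=1}^k P_i\Big) \le \Big(\sum_{i=1}^k H(Q_i,P_i)^2\Big)^{1/2}.$$

(ii)
(iii) If, in addition, $P_i$ is dominated by $Q_i$ for $i = 1,\dots,k$, then

$$D\Big(\mathop{\times}_{i=1}^k Q_i, \mathop{\times}_{i=1}^k P_i\Big) \le \exp\Big[2^{-1}\sum_{i=1}^k D(Q_i,P_i)^2\Big]\Big(\sum_{i=1}^k D(Q_i,P_i)^2\Big)^{1/2}.$$

PROOF. Ad (i): Suppose that $Q_i$ and $P_i$ have the $\mu_i$-densities $f_i$ and $g_i$. By (3.3.5),

$$\begin{aligned}H\Big(\mathop{\times}_{i=1}^k Q_i, \mathop{\times}_{i=1}^k P_i\Big)^2 &= 2\Big[1 - \int\prod_{i=1}^k(f_ig_i)^{1/2}(x_i)\,d\Big(\mathop{\times}_{i=1}^k\mu_i\Big)(x_1,\dots,x_k)\Big] \\ &= 2\Big[1 - \prod_{i=1}^k\int(f_ig_i)^{1/2}\,d\mu_i\Big] \\ &= 2\Big[1 - \prod_{i=1}^k\big(1 - 2^{-1}H(Q_i,P_i)^2\big)\Big] \le \sum_{i=1}^k H(Q_i,P_i)^2\end{aligned}$$

where the final inequality is immediate from

$$\prod_{i=1}^k(1 - u_i) \ge 1 - \sum_{i=1}^k u_i \qquad\text{for } 0 \le u_i \le 1.$$

Ad (ii): Obvious.
Ad (iii): Since $D(Q_i,P_i)^2 = \int f_i^2\,dQ_i - 1$ where $f_i$ is the $Q_i$-density of $P_i$ we obtain by straightforward calculations that

$$D\Big(\mathop{\times}_{i=1}^k Q_i, \mathop{\times}_{i=1}^k P_i\Big)^2 = \prod_{i=1}^k\big[1 + D(Q_i,P_i)^2\big] - 1 \le \exp\Big[\sum_{i=1}^k D(Q_i,P_i)^2\Big] - 1 \le \exp\Big[\sum_{i=1}^k D(Q_i,P_i)^2\Big]\Big(\sum_{i=1}^k D(Q_i,P_i)^2\Big).$$ □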


Combining the results above we get

Corollary 3.3.11.

$$\Big\|\mathop{\times}_{i=1}^k Q_i - \mathop{\times}_{i=1}^k P_i\Big\| \le \Big(\sum_{i=1}^k H(Q_i,P_i)^2\Big)^{1/2} \le \Big(\sum_{i=1}^k D(Q_i,P_i)^2\Big)^{1/2}. \tag{3.3.10}$$

Recall that the second inequality in (3.3.10) only holds if $P_i$ is dominated by $Q_i$.
If $Q_i = Q$ and $P_i = P$ for $i = 1,\dots,k$ then by (3.3.4),

$$\|Q^k - P^k\| \le k\|Q - P\|, \tag{3.3.11}$$

and by (3.3.10),

$$\|Q^k - P^k\| \le k^{1/2}H(Q,P). \tag{3.3.12}$$

Thus, if $\|Q - P\|$ and $H(Q,P)$ are of the same order (Example 3.3.8 treats an exceptional case where this is not true) then (3.3.12) provides a more accurate inequality than (3.3.11). From (3.3.10) it is obvious that also $\|Q^k - P^k\| \le k^{1/2}D(Q,P)$. A refinement of this inequality will be studied in Appendix 3.
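A typical situation where $\|Q - P\|$ and $H(Q,P)$ are of the same order is a location shift of the normal law. Assume, for this sketch only, $Q = N(0,1)$ and $P = N(\delta,1)$. Then $\|Q - P\| = 2\Phi(\delta/2) - 1$, and $\int(f_0f_1)^{1/2}\,dx = \exp(-\delta^2/8)$, so (3.3.5) gives $H$ in closed form. Moreover $Q^k$ and $P^k$ are normal with identity covariance and mean distance $\delta k^{1/2}$, so that $\|Q^k - P^k\| = 2\Phi(\delta k^{1/2}/2) - 1$. The code (hypothetical names) compares this exact value with the bounds (3.3.11) and (3.3.12).

```python
import math

def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def tv_normal_shift(delta):
    """||N(0,1) - N(delta,1)|| = 2 Phi(|delta|/2) - 1."""
    return 2.0 * Phi(abs(delta) / 2.0) - 1.0

def hellinger_normal_shift(delta):
    """H(N(0,1), N(delta,1)) via (3.3.5), using int (f0 f1)^{1/2} dx = exp(-delta^2/8)."""
    return math.sqrt(2.0 * (1.0 - math.exp(-delta * delta / 8.0)))

def tv_product(delta, k):
    """||Q^k - P^k||: k-fold products are normal with identity covariance and
    mean distance delta * k^{1/2}."""
    return tv_normal_shift(delta * math.sqrt(k))

delta, k = 0.05, 100
print(tv_product(delta, k),                       # exact value
      math.sqrt(k) * hellinger_normal_shift(delta),  # bound (3.3.12)
      k * tv_normal_shift(delta))                 # bound (3.3.11)
```

For small $\delta$ and large $k$ the bound (3.3.12) is of the correct order $k^{1/2}\delta$, whereas (3.3.11) grows like $k\delta$.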

Distances of Induced Probability Measures

Let $Q$ and $P$ be probability measures on the same measurable space and $T$ a measurable map into another measurable space. Denote by $TQ$ the probability measure induced by $Q$ and $T$; we have

$$TQ(B) = Q\{T \in B\}.$$

Thus, in this context, the symbol $T$ also denotes a map from one family of probability measures into another family.
The following result is obvious.

Lemma 3.3.12.

$$\|TQ - TP\| \le \|Q - P\|.$$

To highlight the relevance of this inequality let us consider the statistic $T(X_{r:n},\dots,X_{s:n})$ based on the order statistics $X_{r:n},\dots,X_{s:n}$. If $Q$ is an approximation to the distribution $P$ of $(X_{r:n},\dots,X_{s:n})$ then $TQ$ is an approximation to the distribution $TP$ of $T(X_{r:n},\dots,X_{s:n})$. An upper bound for the error $\|TQ - TP\|$ of this approximation is given by $\|Q - P\|$.
In view of the results above it is also desirable to obtain corresponding results for the distances $H$ and $D$.

Lemma 3.3.13.

$$H(TQ,TP) \le H(Q,P).$$

PROOF. We repeat in short the arguments in Pitman [1979, (2.2)]. Let $g_0$ and $f_0$ be $\mu$-densities of $Q$ and $P$ where w.l.g. $\mu$ is a probability measure. If $g_1\circ T$ and $f_1\circ T$ are conditional expectations of $g_0$ and $f_0$ given $T$ (relative to $\mu$) then $g_1$ and $f_1$ are densities of $TQ$ and $TP$ w.r.t. $T\mu$.
Thus, by applying the Schwarz inequality for conditional expectations [see e.g. Chow and Teicher (1978), page 215] to the conditional expectation of $(g_0f_0)^{1/2}$ given $T$ we obtain in a straightforward way that

$$\int(g_0f_0)^{1/2}\,d\mu \le \int(g_1f_1)^{1/2}\,d(T\mu)$$

which implies the assertion according to (3.3.5). □

Lemma 3.3.14. Under the condition that $P$ is dominated by $Q$,

$$D(TQ,TP) \le D(Q,P).$$

PROOF. Check that $\int f_1^2\,dTQ \le \int f_0^2\,dQ$ where $f_0$ is a $Q$-density of $P$ and $f_1$ is a $TQ$-density of $TP$. Moreover, use arguments similar to those in the proof of Lemma 3.3.13. □

P.3. Problems and Supplements

1. (i) For every $x > k/n$,

$$P\{U_{k:n} > x\} \le \exp[-n(x - k/n)^2/3].$$

(ii) Let $x > 0$ be fixed. Then, for every positive integer $m$ we find a constant $C(m,x)$ such that for every $n$ and $k \le n$,

$$P\{U_{k:n} > x\} \le C(m,x)(k/n)^m.$$

2. Let $X_{n:n}$ be the maximum of the r.v.'s $\xi_1,\dots,\xi_n$. For $k = 1,\dots,n$:

$$P\{X_{n:n} \le x\} - 1 - \sum_{j=1}^k(-1)^jS_j(x)\ \begin{cases}\ge 0 & \text{if } k \text{ odd}\\ \le 0 & \text{if } k \text{ even}\end{cases}$$

with

$$S_j(x) = \sum_{1\le i_1<\dots<i_j\le n}P\{\xi_{i_1} > x,\dots,\xi_{i_j} > x\}, \qquad j = 1,\dots,n.$$

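3. Prove that

$$N_{(\mu_1,\sigma_1^2)}(B) - N_{(\mu_0,\sigma_0^2)}(B) = (\sigma_0/\sigma_1 - 1)\int_B\big(1 - ((x-\mu_1)/\sigma_0)^2\big)\,dN_{(\mu_1,\sigma_0^2)}(x) + ((\mu_1-\mu_0)/\sigma_0)\int_B((x - \mu_0)/\sigma_0)\,dN_{(\mu_0,\sigma_0^2)}(x) + O\big[((\mu_1-\mu_0)/\sigma_0)^2 + (\sigma_0/\sigma_1 - 1)^2\big].$$

(see Falk and Reiss, 1988)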


4. For $n = 0,1,2,\dots$ let $P_n$ be unimodal probability measures which are dominated by the Lebesgue measure. Then,

$$\|P_n - P_0\| \to 0,\quad n \to \infty, \qquad\text{iff}\qquad P_n \to P_0 \text{ weakly}.$$

(see Ibragimov, 1956, and Reiss, 1973)

5. Let $\nu_1$ and $\nu_2$ be finite signed measures on a measurable space $(S,\mathscr{B})$. Let $\mathscr{M}$ be a system of $[0,1]$-valued, $\mathscr{B}$-measurable functions defined on $S$.
(i) Define

$$\mathscr{F} = \{\psi^{-1}(t,1]: t \in [0,1],\ \psi \in \mathscr{M}\}.$$

Then,
(ii) As a special case we obtain for the system $\mathscr{M}$ of all $\mathscr{B}$-measurable, $[0,1]$-valued functions that
(iii) If $\mathscr{M}$ is the system of all $[0,1]$-valued, unimodal functions on the real line then

$$\sup_{\psi\in\mathscr{M}}\Big|\int\psi\,d\nu_1 - \int\psi\,d\nu_2\Big| = \sup_{I\in\mathscr{I}}|\nu_1(I) - \nu_2(I)|$$

where $\mathscr{I}$ is the system of all intervals on the real line.


6. Let $F_0$ be a d.f. Then for every positive integer $m$ there exists a finite set $A_m$ such that for every d.f. $F_1$ the following inequality holds:

$$\sup_t |F_0(t) - F_1(t)| \le m^{-1} + \max_{t \in A_m} \max(0,\ F_0(t) - F_1(t),\ F_1 \ldots$$

7. Prove that

$$\sup_B |P\{\xi_1 \in B\} - P\{\xi_2 \in B\}| \le P\{\xi_1 \ne \xi_2\}.$$

8. Let $Q_{0,n}$ and $Q_{1,n}$ be probability measures such that $Q_{1,n}$ is dominated by $Q_{0,n}$. Find conditions under which
(i)
(Reiss, 1980)
(ii) the most powerful test of level $\alpha$ for testing $Q_{0,n}^n$ against $Q_{1,n}^n$ has the power

$$\Phi\bigl(\Phi^{-1}(\alpha) + n^{1/2} D(Q_{0,n}, Q_{1,n})\bigr) + O(n^{-1/2}).$$

(Weiss, 1974; Reiss, 1980)

9. (Jensen inequality)
Let $h$ be a convex function on an open interval $I$ and $\xi$ a r.v. with range $I$ such that $\xi$ and $h(\xi)$ are finitely integrable. Then,

$$h(E\xi) \le E h(\xi).$$

(see e.g. Ferguson, 1967, Lemma 1, page 76)


3. Inequalities and the Concept of Expansions

10. (Dvoretzky, Kiefer, Wolfowitz inequality)
Let $G_n^{-1}$ be the sample q.f. in Lemma 3.1.5. Then for every $\varepsilon > 0$,

$$P\Bigl\{\sup_{q \in (0,1)} n^{1/2}|G_n^{-1}(q) - q| > \varepsilon\Bigr\} = P\Bigl\{\sup_{q \in (0,1)} n^{1/2}|G_n(q) - q| > \varepsilon\Bigr\} \le C\exp[-2\varepsilon^2]$$

for some $C > 0$.
(see e.g. Serfling, 1980, page 59)
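For i.i.d. r.v.'s the bounds in Problem 2 become explicit. Suppose, only for this illustration, that the $\xi_i$ are (0,1)-uniform. Then $P\{X_{n:n} \le x\} = x^n$ and $S_j(x) = \binom{n}{j}(1 - x)^j$. The following minimal Python sketch evaluates the truncated sums (the function name is ours). The sums lie below the exact value for odd $k$ and above it for even $k$, and they coincide with it for $k = n$, which is the inclusion-exclusion formula.

```python
from math import comb

def bonferroni_sum(n, x, k):
    """1 + sum_{j=1}^k (-1)^j S_j(x) with S_j(x) = C(n, j) (1 - x)^j,
    i.e. the i.i.d. (0,1)-uniform special case of Problem 2."""
    return 1.0 + sum((-1) ** j * comb(n, j) * (1.0 - x) ** j for j in range(1, k + 1))

n, x = 6, 0.8
exact = x ** n  # P{X_{n:n} <= x} for (0,1)-uniform r.v.'s
for k in range(1, n + 1):
    side = "lower bound" if k % 2 == 1 else "upper bound"
    print(k, side, bonferroni_sum(n, x, k), exact)
```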
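The expansion in Problem 3 can be checked numerically for an interval $B = (\text{lo}, \text{hi})$, where both integrals have closed forms: $\int_{z_1}^{z_2} (1 - z^2)\varphi(z)\,dz = z_2\varphi(z_2) - z_1\varphi(z_1)$ and $\int_{w_1}^{w_2} w\varphi(w)\,dw = \varphi(w_1) - \varphi(w_2)$. The Python sketch below is our own illustration. The choices $B = (-1/2, 2)$, $\mu_0 = 0$, $\sigma_0 = 1$, $\mu_1 = h$ and $\sigma_1 = 1 + h$ are arbitrary. If the remainder is of the stated second order, it should shrink by roughly a factor of 4 when $h$ is halved.

```python
from math import erf, exp, pi, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def normal_prob(lo, hi, mu, sigma):
    """N_(mu, sigma^2)((lo, hi))."""
    return Phi((hi - mu) / sigma) - Phi((lo - mu) / sigma)

def first_order_terms(lo, hi, mu0, s0, mu1, s1):
    """The two integral terms of Problem 3 for B = (lo, hi), in closed form."""
    z1, z2 = (lo - mu1) / s0, (hi - mu1) / s0
    w1, w2 = (lo - mu0) / s0, (hi - mu0) / s0
    scale_term = (s0 / s1 - 1.0) * (z2 * phi(z2) - z1 * phi(z1))
    location_term = (mu1 - mu0) / s0 * (phi(w1) - phi(w2))
    return scale_term + location_term

def remainder(h, lo=-0.5, hi=2.0):
    mu0, s0, mu1, s1 = 0.0, 1.0, h, 1.0 + h
    diff = normal_prob(lo, hi, mu1, s1) - normal_prob(lo, hi, mu0, s0)
    return diff - first_order_terms(lo, hi, mu0, s0, mu1, s1)

for h in (0.04, 0.02, 0.01):
    print(h, remainder(h))
```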
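Part (ii) of Problem 8 can be made concrete in the normal location model $Q_{0,n} = N_{(0,1)}$, $Q_{1,n} = N_{(\delta_n, 1)}$ with $\delta_n = c/n^{1/2}$. There the most powerful test rejects for large sample means, and its power is exactly $\Phi(\Phi^{-1}(\alpha) + n^{1/2}\delta_n)$. The Python sketch below rests on an assumption: that $D$ is the $\chi^2$-distance $(\int f^2\,dQ_0 - 1)^{1/2}$, with $f$ a $Q_0$-density of $Q_1$. This is the kind of quantity handled in the proof of Lemma 3.3.14. For this model $\int f^2\,dQ_0 = \exp(\delta^2)$. The model, the parameter values and the function names are our choices.

```python
from math import exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def chi2_distance(delta):
    """Assumed D(N(0,1), N(delta,1)) = (exp(delta^2) - 1)^{1/2}."""
    return sqrt(exp(delta * delta) - 1.0)

def exact_power(n, delta, alpha):
    """Power of the level-alpha most powerful test of N(0,1)^n against N(delta,1)^n."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(alpha) + sqrt(n) * delta)

def approximate_power(n, delta, alpha):
    """Phi(Phi^{-1}(alpha) + n^{1/2} D), the leading term in Problem 8 (ii)."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(alpha) + sqrt(n) * chi2_distance(delta))

alpha, c = 0.05, 2.0
for n in (10, 100, 1000):
    delta = c / sqrt(n)
    print(n, exact_power(n, delta, alpha), approximate_power(n, delta, alpha))
```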
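The DKW inequality of Problem 10 can be illustrated by simulation for (0,1)-uniform samples, where $\sup_q |G_n(q) - q|$ is the Kolmogorov-Smirnov statistic. The Python sketch below is seeded and uses only the standard library; the sample size, $\varepsilon$, the number of replications and the function names are our choices. It compares the observed exceedance frequency with $2\exp(-2\varepsilon^2)$. The constant $C = 2$ is the one Massart (1990) showed to be admissible when $\exp(-2\varepsilon^2) \le 1/2$.

```python
import random
from math import exp, sqrt

def ks_statistic(sample):
    """sup_t |G_n(t) - t| for a sample from the (0,1)-uniform distribution."""
    u = sorted(sample)
    n = len(u)
    # just before u[i] the sample d.f. equals i/n; at u[i] it jumps to (i+1)/n
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))

def exceedance_frequency(n, eps, reps, seed=12345):
    """Monte Carlo estimate of P{ n^{1/2} sup_t |G_n(t) - t| > eps }."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if sqrt(n) * ks_statistic([rng.random() for _ in range(n)]) > eps:
            hits += 1
    return hits / reps

n, eps, reps = 50, 1.5, 4000
print(exceedance_frequency(n, eps, reps), 2 * exp(-2 * eps ** 2))
```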

Bibliographical Notes
This chapter is not central to our considerations and so it suffices to make only some short remarks.
Exponential bounds for order statistics related to (3.1.2) have been discovered and successfully applied by different authors (e.g. Reiss (1974a, 1975a), Wellner (1977)).
The upper bound for the variational distance using the Kullback-Leibler distance was established by Hoeffding and Wolfowitz (1958). In this context we also refer to Ikeda (1963, 1975) and Csiszár (1975). The upper bound for the variational distance between products of probability measures in terms of the variational distances between the single components has been proved repeatedly in various articles; nevertheless, this inequality does not seem to be well known. It was established by Hoeffding and Wolfowitz (1958) and generalized by Blum and Pathak (1972) and Sendler (1975). The extension to signed measures (see Lemma A.3.3) was given in Reiss (1981b). Investigations along these lines allowing a deviation from the independence condition are carried out by Hillion (1983).

PART II

ASYMPTOTIC THEORY

CHAPTER 4

Approximations to Distributions of
Central Order Statistics

Under weak conditions on the underlying d.f. it can be proved that central (as well as intermediate) order statistics are asymptotically normally distributed. This result easily extends to the joint distribution of a fixed number of central order statistics. In Section 4.1 we shall discuss some conditions which yield the weak and strong asymptotic normality of central order statistics.
Expansions of distributions of single central order statistics will be established in Section 4.2. The leading term in such an expansion is the normal distribution, whereas the higher order terms are given by integrals of polynomials w.r.t. the normal distribution. These expansions differ from the well-known Edgeworth expansions for distributions of sums of independent r.v.'s in that the higher order terms depend not only on the sample size $n$ but also on the index $r$ of the order statistic. In the particular case of sample quantiles the accuracy of the normal approximation is shown to be of order $O(n^{-1/2})$.
In Section 4.3 it is proved that the usual normalization of joint distributions of order statistics makes these distributions asymptotically independent of the underlying d.f. This result still holds under conditions where the asymptotic normality is not valid.
In Section 4.4 we give a detailed description of the multivariate normal distribution which will serve as an approximation to the joint distribution of central order statistics.
Combining the results of Sections 4.3 and 4.4, the asymptotic normality and expansions of the joint distribution of order statistics $X_{r_1:n}, \ldots, X_{r_k:n}$ (with $0 = r_0 < r_1 < \cdots < r_k < r_{k+1} = n + 1$) are proven in Section 4.5. It is shown that the accuracy of this approximation is of order



$$O\Bigl(\Bigl(\sum_{i=1}^{k+1} (r_i - r_{i-1})^{-1}\Bigr)^{1/2}\Bigr)$$

under weak regularity conditions. These approximations again hold w.r.t. the variational distance.
Some supplementary results concerning the d.f.'s of order statistics and moderate deviations are collected in Sections 4.6 and 4.7.

4.1. Asymptotic Normality of Central Sequences


Convergence in Distribution of a Single Order Statistic
To begin with, let us consider the special case of order statistics $U_{1:n} \le U_{2:n} \le \cdots \le U_{n:n}$ of $n$ i.i.d. (0,1)-uniformly distributed r.v.'s $\eta_1, \ldots, \eta_n$. If $r(n) \to \infty$ and $n - r(n) \to \infty$ as $n \to \infty$ then one can easily show that the order statistics $U_{r(n):n}$ (if appropriately normalized) converge in distribution to a standard normal r.v. as $n \to \infty$. Thus, with $\Phi$ denoting the standard normal d.f., we have

$$P\{a_{r(n),n}^{-1}(U_{r(n):n} - b_{r(n),n}) \le t\} \to \Phi(t), \qquad n \to \infty, \tag{4.1.1}$$

for every $t$, where $a_{r,n} = (r(n - r + 1))^{1/2}/(n + 1)^{3/2}$ and $b_{r,n} = r/(n + 1)$. Since $\Phi$ is continuous we also know that the convergence in (4.1.1) holds uniformly in $t$. In the sequel, we prefer to write $a(n)$ and $b(n)$ instead of $a_{r(n),n}$ and $b_{r(n),n}$, thus suppressing the dependence on $r(n)$.
If $r(n)/n - q = o(n^{-1/2})$ for some $q \in (0, 1)$ (a condition which is e.g. satisfied in the case of sample $q$-quantiles), another natural choice of the constants $a(n)$ and $b(n)$ is $a(n) = (q(1 - q))^{1/2}/n^{1/2}$ and $b(n) = q$. Applying (1.1.8) we obtain

$$P\{a(n)^{-1}(U_{r(n):n} - b(n)) \le t\} = P\Bigl\{-\sum_{i=1}^{n} \bigl[1_{(-\infty, p(n,t)]}(\eta_i) - p(n,t)\bigr] \le -r(n) + n p(n,t)\Bigr\} \tag{4.1.2}$$

where $p(n,t) = b(n) + t a(n)$. Since $(-r(n) + n p(n,t))/[n p(n,t)(1 - p(n,t))]^{1/2} \to t$ as $n \to \infty$, the convergence to $\Phi(t)$ is immediate from the central limit theorem for a triangular array of i.i.d. random variables (or some other appropriate limit theorem for binomial r.v.'s). It is easy to see that this method also applies to other r.v.'s. However, to extend (4.1.1) to other cases we shall follow another standard device, namely, the transformation technique.
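The identity behind (4.1.2) says that $U_{r:n} \le x$ holds iff at least $r$ of $\eta_1, \ldots, \eta_n$ are $\le x$. It makes the left-hand side of (4.1.1) computable exactly as a binomial tail. The following minimal Python sketch uses only the standard library; the helper names `binom_upper_tail` and `order_stat_cdf` are ours. It evaluates the tail for a sample median and compares it with $\Phi(t)$.

```python
from math import erf, exp, lgamma, log, sqrt

def Phi(t):
    """Standard normal d.f."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def binom_upper_tail(n, r, p):
    """P{Bin(n, p) >= r} for 1 <= r <= n, summed on the log scale for stability."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return sum(exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log(1.0 - p))
               for j in range(r, n + 1))

def order_stat_cdf(n, r, x):
    """P{U_{r:n} <= x}: at least r of n i.i.d. (0,1)-uniform r.v.'s fall into [0, x]."""
    return binom_upper_tail(n, r, x)

n, r = 400, 200
a = sqrt(r * (n - r + 1)) / (n + 1) ** 1.5   # a_{r,n}
b = r / (n + 1)                              # b_{r,n}
for t in (-1.0, 0.0, 1.0):
    print(t, order_stat_cdf(n, r, b + t * a), Phi(t))
```

Increasing $n$ with $r/n$ fixed makes the two columns agree more closely, as (4.1.1) predicts.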
If $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ are the order statistics of $n$ i.i.d. random variables with d.f. $F$ then, according to Corollary 1.2.7, $P\{X_{r(n):n} \le t\} = P\{U_{r(n):n} \le F(t)\}$ and hence, by (4.1.1),


$$\begin{aligned} P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \le t\} &= P\{U_{r(n):n} \le F(b'(n) + t a'(n))\} \\ &= \Phi\bigl[a(n)^{-1}[F(b'(n) + t a'(n)) - b(n)]\bigr] + o(1) = \Phi(t) + o(1) \end{aligned} \tag{4.1.3}$$

if $a'(n)$ and $b'(n)$ are chosen so that

$$a(n)^{-1}[F(b'(n) + t a'(n)) - b(n)] \to t, \qquad n \to \infty.$$

Our first example concerns central order statistics.


EXAMPLE 4.1.1. Let $q \in (0, 1)$ be fixed. Assume that $F$ is differentiable at $F^{-1}(q)$ and $F'(F^{-1}(q)) > 0$. If $n^{1/2}(r(n)/n - q) \to 0$, $n \to \infty$, then

$$P\Bigl\{\frac{n^{1/2} F'(F^{-1}(q))}{(q(1 - q))^{1/2}}\bigl(X_{r(n):n} - F^{-1}(q)\bigr) \le t\Bigr\} \to \Phi(t), \qquad n \to \infty, \tag{4.1.4}$$

for every $t$. This is immediate from (4.1.3) by taking $a(n) = (q(1 - q))^{1/2}/n^{1/2}$, $b(n) = q$, $a'(n) = a(n)/F'(F^{-1}(q))$, and $b'(n) = F^{-1}(q)$. As a special case we have

$$P\Bigl\{\frac{n^{1/2} F'(F^{-1}(q))}{(q(1 - q))^{1/2}}\bigl(F_n^{-1}(q) - F^{-1}(q)\bigr) \le t\Bigr\} \to \Phi(t), \qquad n \to \infty. \tag{4.1.5}$$
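Via Corollary 1.2.7, the left-hand side of (4.1.4) can again be computed exactly as a binomial tail. The sketch below makes assumptions chosen only for illustration: $F$ is the standard exponential d.f. (so $F^{-1}(q) = -\log(1 - q)$ and $F'(F^{-1}(q)) = 1 - q$), $q = 1/2$ and $r(n) = n/2$. All helper names are ours. The printed maximal deviation from $\Phi$ over a few values of $t$ should decrease with $n$.

```python
from math import erf, exp, lgamma, log, sqrt

def Phi(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def binom_upper_tail(n, r, p):
    """P{Bin(n, p) >= r} = P{U_{r:n} <= p} for 1 <= r <= n."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return sum(exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log(1.0 - p))
               for j in range(r, n + 1))

def normalized_quantile_cdf(n, r, q, t):
    """Exact P{ n^{1/2} f(F^{-1}(q)) (q(1-q))^{-1/2} (X_{r:n} - F^{-1}(q)) <= t }
    for the standard exponential d.f. F, via P{X_{r:n} <= x} = P{U_{r:n} <= F(x)}."""
    quantile = -log(1.0 - q)   # F^{-1}(q)
    density = 1.0 - q          # f(F^{-1}(q))
    x = quantile + t * sqrt(q * (1.0 - q)) / (sqrt(n) * density)
    Fx = 1.0 - exp(-x) if x > 0.0 else 0.0
    return binom_upper_tail(n, r, Fx)

def max_error(n, q=0.5, ts=(-1.5, -0.5, 0.0, 0.5, 1.5)):
    r = int(n * q)
    return max(abs(normalized_quantile_cdf(n, r, q, t) - Phi(t)) for t in ts)

for n in (100, 1000):
    print(n, max_error(n))
```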

The next example deals with upper intermediate order statistics.


EXAMPLE 4.1.2. Assume that $n - r(n) \to \infty$ and $r(n)/n \to 1$ as $n \to \infty$. Moreover, assume that $\omega(F) < \infty$ and that $F$ has a derivative, say, $f$ on the interval $(\omega(F) - \varepsilon, \omega(F))$ for some $\varepsilon > 0$, where $f$ is uniformly continuous and bounded away from zero. These conditions are e.g. fulfilled for uniform r.v.'s. Then,

$$P\Bigl\{\frac{(n + 1)^{3/2} f\bigl(F^{-1}\bigl(\frac{r(n)}{n+1}\bigr)\bigr)}{(r(n)(n - r(n) + 1))^{1/2}}\Bigl(X_{r(n):n} - F^{-1}\Bigl(\frac{r(n)}{n+1}\Bigr)\Bigr) \le t\Bigr\} \to \Phi(t), \qquad n \to \infty, \tag{4.1.6}$$

for every $t$. The proof is straightforward and can be left to the reader.
When treating intermediate order statistics the underlying d.f. $F$ has to satisfy certain regularity conditions on a neighborhood of $\alpha(F)$ or $\omega(F)$. From this point of view intermediate order statistics are connected with extreme order statistics. The extreme value theory will provide conditions better tailored to this situation than those stated in Example 4.1.2 (see Theorem 5.1.7).

The Joint Asymptotic Normality


In a second step, consider the joint distribution of $k$ order statistics where $k \ge 1$ is fixed. Our arguments above can easily be extended to the case of joint


distributions. Here we shall restrict our attention to an extension of Example 4.1.1.
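Theorem 4.1.3. Let $0 < q_1 < q_2 < \cdots < q_k < 1$ be fixed. Assume that $F$ is differentiable at $F^{-1}(q_i)$ and that $f(F^{-1}(q_i)) > 0$ for $i = 1, \ldots, k$, where $f = F'$. Then, if $r(n,i)/n - q_i = o(n^{-1/2})$ for every $i = 1, \ldots, k$,

$$P\Bigl\{\bigl(n^{1/2} f(F^{-1}(q_i))(X_{r(n,i):n} - F^{-1}(q_i))\bigr)_{i=1}^{k} \le t\Bigr\} \to \Phi_\Sigma(t), \qquad n \to \infty, \tag{4.1.7}$$

for every $t = (t_1, \ldots, t_k)$, where $\Phi_\Sigma$ is the d.f. of the $k$-variate normal distribution with mean vector zero and covariances $q_i(1 - q_j)$ for $1 \le i \le j \le k$. As a special case we have

$$P\Bigl\{\bigl(n^{1/2} f(F^{-1}(q_i))(F_n^{-1}(q_i) - F^{-1}(q_i))\bigr)_{i=1}^{k} \le t\Bigr\} \to \Phi_\Sigma(t), \qquad n \to \infty. \tag{4.1.8}$$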

Convergence w.r.t. the Variational Distance


One of the advantages of the representation (4.1.2) is that one can treat the asymptotic behavior of the distribution of order statistics whenever a limit theorem for the r.v.'s $\sum_{i=1}^{n} 1_{(-\infty, p(n,t)]}(\eta_i)$ is at hand. The disadvantage of this approach is that the convergence cannot be proved in a stronger sense, since we have to deal with discrete r.v.'s although the order statistics have a continuous d.f.
Another well-known method tackles this problem successfully. Let us return to the distribution of a single order statistic $U_{r(n):n}$. In the i.i.d. case we know the explicit form of the density. By showing that the density of $a(n)^{-1}(U_{r(n):n} - b(n))$ converges pointwise to the standard normal density (compare with (1.3.9)) we know from the Scheffé lemma that the convergence of the distributions holds w.r.t. the variational distance; that is

$$\sup_B \bigl|P\{a(n)^{-1}(U_{r(n):n} - b(n)) \in B\} - N_{(0,1)}(B)\bigr| \to 0, \qquad n \to \infty, \tag{4.1.9}$$

where $N_{(0,1)}$ denotes the standard normal distribution.
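The density of $U_{r:n}$ is the beta density with parameters $r$ and $n - r + 1$. Hence the supremum in (4.1.9) can be evaluated numerically through the identity $\sup_B |P(B) - Q(B)| = \frac12 \int |p - q|$ for probability densities $p$ and $q$. The Python sketch below does this with a plain Riemann sum. The function names, the truncation of the integral to $[-10, 10]$ and the grid width are our assumptions. It uses sample medians of odd-sized samples.

```python
from math import exp, lgamma, log, pi, sqrt

def log_beta_density(u, p, q):
    """Log-density of the beta distribution with parameters p, q at u in (0, 1)."""
    return lgamma(p + q) - lgamma(p) - lgamma(q) + (p - 1) * log(u) + (q - 1) * log(1.0 - u)

def variational_distance(n, r, half_width=10.0, step=1e-3):
    """Approximate sup_B |P{a^{-1}(U_{r:n} - b) in B} - N_(0,1)(B)| = (1/2) int |p - phi|,
    where U_{r:n} has the beta density with parameters r and n - r + 1."""
    a = sqrt(r * (n - r + 1)) / (n + 1) ** 1.5
    b = r / (n + 1)
    steps = int(round(2 * half_width / step))
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * step
        u = b + a * z
        p = a * exp(log_beta_density(u, r, n - r + 1)) if 0.0 < u < 1.0 else 0.0
        total += abs(p - exp(-0.5 * z * z) / sqrt(2.0 * pi)) * step
    return 0.5 * total

for n, r in ((41, 21), (401, 201)):
    print(n, r, variational_distance(n, r))
```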


Notice that (4.1.9) is in fact stronger than (4.1.1), since (4.1.1) can be written

$$\sup_t \bigl|P\{a(n)^{-1}(U_{r(n):n} - b(n)) \in (-\infty, t]\} - N_{(0,1)}(-\infty, t]\bigr| \to 0, \qquad n \to \infty.$$

Next, the problem arises to extend (4.1.9) to a certain class of d.f.'s $F$. This is again possible by using the transformation technique.
Theorem 4.1.4. (i) Let $q \in (0, 1)$ be fixed. Assume that $F$ has a derivative, say, $f$ on the interval $(F^{-1}(q) - \varepsilon, F^{-1}(q) + \varepsilon)$ for some $\varepsilon > 0$. Moreover, assume that $f$ is continuous at $F^{-1}(q)$ and that $f(F^{-1}(q)) > 0$. Then, if $r(n)/n \to q$ as $n \to \infty$,


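$$\sup_B \Bigl|P\Bigl\{\frac{(n + 1)^{3/2} f(F^{-1}(b(n)))}{(r(n)(n - r(n) + 1))^{1/2}}\bigl(X_{r(n):n} - F^{-1}(b(n))\bigr) \in B\Bigr\} - N_{(0,1)}(B)\Bigr| \to 0, \qquad n \to \infty, \tag{4.1.10}$$

where $b(n) = r(n)/(n + 1)$.
(ii) Moreover, if $r(n)/n - q = o(n^{-1/2})$ then

$$\sup_B \Bigl|P\Bigl\{\frac{n^{1/2} f(F^{-1}(q))}{(q(1 - q))^{1/2}}\bigl(X_{r(n):n} - F^{-1}(q)\bigr) \in B\Bigr\} - N_{(0,1)}(B)\Bigr| \to 0, \qquad n \to \infty. \tag{4.1.11}$$

(iii) (4.1.10) also holds under the conditions of Example 4.1.2.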


Before sketching the proof of Theorem 4.1.4 let us examine an example which shows that we have to impose stronger regularity conditions on the underlying d.f. $F$ than those in Example 4.1.1 to guarantee the convergence w.r.t. the variational distance.
EXAMPLE 4.1.5. Let $F$ have the density

$$f = 1_{[-1/2, 0]} + \sum_{i} \frac{2i + 1}{i + 1}\, 1_{[1/(2i+1),\, 1/(2i))}$$

where the summation runs over all positive integers $i$. By verifying the conditions of Example 4.1.1 we shall obtain that the d.f.'s of the standardized sample medians converge weakly to the standard normal d.f. $\Phi$. Since

$$\sum_{i=1}^{n} \frac{2i + 1}{i + 1}\Bigl(\frac{1}{2i} - \frac{1}{2i + 1}\Bigr) = \frac{n}{2(n + 1)}, \tag{4.1.12}$$

it is easily seen that $\int f(x)\,dx = 1$. By (4.1.12),

$$F\Bigl(\frac{1}{2n + 1}\Bigr) = F\Bigl(\frac{1}{2(n + 1)}\Bigr) = \frac12 + \frac{1}{2(n + 1)},$$

and hence, for every positive integer $n$,

$$F(x) - \frac12 = \begin{cases} \dfrac{1}{2(n + 1)} & \text{if } x \in \Bigl[\dfrac{1}{2(n + 1)}, \dfrac{1}{2n + 1}\Bigr], \\[2ex] \dfrac{1}{2(n + 1)} + \dfrac{2n + 1}{n + 1}\Bigl(x - \dfrac{1}{2n + 1}\Bigr) & \text{if } x \in \Bigl[\dfrac{1}{2n + 1}, \dfrac{1}{2n}\Bigr]. \end{cases}$$

This implies that $x - x^2 \le F(x) - 1/2 \le x$ for $|x| \le 1/2$, showing that $F$ is differentiable at $F^{-1}(1/2) = 0$ and $F^{(1)}(0) = 1$. Thus, by Example 4.1.1,

$$P\{2n^{1/2} X_{[n/2]:n} \le t\} \to \Phi(t), \qquad n \to \infty,$$

for every $t$,


which proves the weak convergence. On the other hand,

$$P\{2n^{1/2} X_{[n/2]:n} \in B_n\} = 0 < \liminf_{k \to \infty} N_{(0,1)}(B_k) \tag{4.1.13}$$

for every $n$, where $B_n = \bigcup_i \bigl(2n^{1/2}/(2(i + 1)),\ 2n^{1/2}/(2i + 1)\bigr)$ with $i$ taken over all positive integers.
To prove (4.1.13) verify that the Lebesgue measure of $B_n \cap (0, 1)$ is bounded away from zero and that $f(x/2n^{1/2}) = 0$ for $x \in B_n$.
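The bookkeeping in Example 4.1.5 can be confirmed with exact rational arithmetic. The following Python sketch is our own and uses only `fractions.Fraction` and `math.floor`. It checks the mass of each interval and the identity (4.1.12). It also spot-checks the sandwich $x - x^2 \le F(x) - 1/2 \le x$ at the breakpoints of the piecewise formula for $F$.

```python
from fractions import Fraction
from math import floor

def interval_mass(i):
    """Mass of f on [1/(2i+1), 1/(2i)): height (2i+1)/(i+1) times the length."""
    return Fraction(2 * i + 1, i + 1) * (Fraction(1, 2 * i) - Fraction(1, 2 * i + 1))

def partial_mass(n):
    """Left-hand side of (4.1.12)."""
    return sum((interval_mass(i) for i in range(1, n + 1)), Fraction(0))

def F_minus_half(x):
    """F(x) - 1/2 for 0 < x <= 1/2, from the piecewise formula of Example 4.1.5."""
    n = floor(1 / (2 * x))  # then x lies in [1/(2(n+1)), 1/(2n)]
    left = Fraction(1, 2 * (n + 1))
    knot = Fraction(1, 2 * n + 1)
    if x <= knot:
        return left
    return left + Fraction(2 * n + 1, n + 1) * (x - knot)

print(partial_mass(10))  # prints 5/11, i.e. n/(2(n+1)) with n = 10
```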
The proof of Theorem 4.1.4 starts with the representation

$$a'(n)^{-1}(X_{r(n):n} - b'(n)) \stackrel{d}{=} T_n[a(n)^{-1}(U_{r(n):n} - b(n))]$$

where $T_n(x) = a'(n)^{-1}[F^{-1}(b(n) + x a(n)) - b'(n)]$. According to (4.1.9),

$$\sup_B \bigl|P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \in B\} - P\{T_n(\eta) \in B\}\bigr| \to 0 \tag{4.1.14}$$

as $n \to \infty$, where $\eta$ is a standard normal r.v.
To complete the proof of Theorem 4.1.4 it suffices to examine functions of standard normal r.v.'s. Denote by $S_n$ the inverse of $T_n$. Under appropriate regularity conditions, $S_n'(\varphi \circ S_n)$ is the density of $T_n(\eta)$. If $S_n(x) \to x$ and $S_n'(x) \to 1$ as $n \to \infty$ for every $x$ then $S_n'(\varphi \circ S_n) \to \varphi$, $n \to \infty$. Therefore, the Scheffé lemma implies the convergence to the standard normal distribution w.r.t. the variational distance.
This idea will be made rigorous within a general framework. The following lemma should be regarded as a useful technicality.
Lemma 4.1.6. Let $Y_{i:n}$ be the order statistics of $n$ i.i.d. random variables with common continuous d.f. $F_0$ and $X_{i:n}$ be the order statistics of $n$ i.i.d. random variables with d.f. $F_1$. Let $h$ and $g(h \circ G)$ be probability densities, where $h$ is assumed to be continuous at $x$ for almost all $x$. Then, if

$$\sup_B \Bigl|P\{a(n)^{-1}(Y_{r(n):n} - b(n)) \in B\} - \int_B h(x)\,dx\Bigr| \to 0, \qquad n \to \infty, \tag{4.1.15}$$

we have

$$\sup_B \Bigl|P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \in B\} - \int_B g(x) h(G(x))\,dx\Bigr| \to 0, \qquad n \to \infty, \tag{4.1.16}$$

provided the functions $S_n$ defined by

$$S_n(x) = a(n)^{-1}\bigl[F_0^{-1}(F_1(b'(n) + x a'(n))) - b(n)\bigr]$$

are
(a) strictly increasing and absolutely continuous on intervals $(\alpha(n), \beta(n))$ where $\alpha(n) \to -\infty$ and $\beta(n) \to \infty$, and
(b) $S_n(x) \to G(x)$ and $S_n'(x) \to g(x)$ as $n \to \infty$ for almost all $x$.


PROOF. Write $T_n(x) = a'(n)^{-1}[F_1^{-1}(F_0(b(n) + x a(n))) - b'(n)]$. Since $F_0$ is continuous we obtain from Corollary 1.2.6 that

$$P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \in B\} = P\{T_n[a(n)^{-1}(Y_{r(n):n} - b(n))] \in B\}$$

and hence condition (4.1.15) yields

$$\sup_B \Bigl|P\{a'(n)^{-1}(X_{r(n):n} - b'(n)) \in B\} - \int_B g(x)h(G(x))\,dx\Bigr| \le \sup_B \Bigl|\int_{\{T_n \in B\}} h(x)\,dx - \int_B g(x)h(G(x))\,dx\Bigr| + o(n^0). \tag{4.1.17}$$

The image of $(\alpha(n), \beta(n))$ under $S_n$, say, $J_n$, is an open interval, and $T_n|J_n$ is the inverse of $S_n|(\alpha(n), \beta(n))$. By P.1.11,

$$\int_{\{T_n \in B\}} h(x)\,dx = \int_B h_n(x)\,dx \tag{4.1.18}$$

for every Borel set $B \subset (\alpha(n), \beta(n))$, where $h_n = S_n'(h \circ S_n) 1_{(\alpha(n), \beta(n))}$. Notice that w.l.g. $S_n'$ can be assumed to be measurable. Since $\int h_n(x)\,dx \le 1$ and $h_n \to g(h \circ G)$ almost everywhere, the Scheffé lemma 3.3.2 yields

$$\sup_B \Bigl|\int_B h_n(x)\,dx - \int_B g(x)h(G(x))\,dx\Bigr| \to 0, \qquad n \to \infty.$$

This together with (4.1.18) yields

$$\sup_B \Bigl|\int_{\{T_n \in B\}} h(x)\,dx - \int_B g(x)h(G(x))\,dx\Bigr| \to 0, \qquad n \to \infty. \tag{4.1.19}$$

Combining (4.1.17) and (4.1.19) the proof is completed. □

Whereas the constants $a(n)$ and $b(n)$ are usually predetermined, the constants $a'(n)$ and $b'(n)$ should be chosen so that $S_n$ fulfills the required conditions. If $G(x) = x$ and $g(x) = 1$ (that is, the limiting expressions in (4.1.15) and (4.1.16) are equal) then a natural choice of the constants $a'(n)$ and $b'(n)$ is

$$b'(n) = F_1^{-1}(F_0(b(n))) \qquad\text{and}\qquad a'(n) = a(n)/(F_0^{-1} \circ F_1)'(b'(n)). \tag{4.1.20}$$

Then $S_n(0) = 0$ and $S_n'(0) = 1$, so that a Taylor expansion of $S_n$ about zero yields that $S_n(x)$ is approximately equal to $x$ in a neighborhood of zero.
Now the proof of Theorem 4.1.4 will be a triviality.

PROOF OF THEOREM 4.1.4. We shall only prove (4.1.10) since (4.1.11) and (iii) follow in an analogous way.
Lemma 4.1.6 will be applied to $F_0$ being the uniform d.f. on (0, 1), $F_1 = F$, $a(n) = (r(n)(n - r(n) + 1))^{1/2}/(n + 1)^{3/2}$, $b(n) = r(n)/(n + 1)$, $h = \varphi$, $g = 1$ and $G(x) = x$. (4.1.15) holds according to (4.1.9). Moreover, choose $b'(n) = F^{-1}(b(n))$ and $a'(n) = a(n)/f(b'(n))$. Since $f$ is continuous at $F^{-1}(q)$ and $f(F^{-1}(q)) > 0$, we know that $f$ is strictly positive on an interval $(F^{-1}(q) - \kappa, F^{-1}(q) + \kappa)$ for some $\kappa > 0$. This implies that $S_n(x) = a(n)^{-1}[F(b'(n) + x a'(n)) - b(n)]$ is strictly increasing and absolutely continuous on the interval $(-\kappa/(2a'(n)), \kappa/(2a'(n)))$, eventually, and hence condition (a) in Lemma 4.1.6 is satisfied. It is straightforward to verify condition (b). The proof is complete. □
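The choice (4.1.20) can be seen at work in a simple case. Take $F_0$ uniform and, as an assumption made only for this illustration, $F_1$ the standard exponential d.f. Then $b'(n) = -\log(1 - b(n))$ and $a'(n) = a(n)/(1 - b(n))$. Moreover, $S_n(x) = (1 - e^{-cx})/c$ with $c = a(n)/(1 - b(n))$, so that $S_n(0) = 0$, $S_n'(0) = 1$ and $S_n(x) \to x$ as $c \to 0$. The Python sketch below evaluates $S_n$ directly from its definition and measures its distance from the identity on $[-3, 3]$ (the names are ours).

```python
from math import exp, log, sqrt

def S_n(x, n, r):
    """S_n(x) = a(n)^{-1}[F(b'(n) + x a'(n)) - b(n)] for the standard exponential F,
    with b'(n) = F^{-1}(b(n)) and a'(n) = a(n)/f(b'(n)) as in (4.1.20)."""
    a = sqrt(r * (n - r + 1)) / (n + 1) ** 1.5
    b = r / (n + 1)
    b_prime = -log(1.0 - b)          # F^{-1}(b(n))
    a_prime = a / exp(-b_prime)      # f(b'(n)) = exp(-b'(n)) = 1 - b(n)
    y = b_prime + x * a_prime
    F_y = 1.0 - exp(-y) if y > 0.0 else 0.0
    return (F_y - b) / a

def max_deviation(n, grid=601):
    r = n // 2
    xs = [-3.0 + 6.0 * k / (grid - 1) for k in range(grid)]
    return max(abs(S_n(x, n, r) - x) for x in xs)

for n in (100, 1000, 10000):
    print(n, max_deviation(n))
```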

4.2. Expansions: A Single Central Order Statistic


The starting point for our study of expansions of distributions of central order statistics will be an expansion of the distribution of an order statistic $U_{r:n}$ of i.i.d. (0,1)-uniformly distributed r.v.'s. The leading term in the expansion will be the standard normal distribution $N_{(0,1)}$. The expansion will be ordered in powers of $(n/(r(n - r)))^{1/2}$. This shows that the accuracy of the approximation by $N_{(0,1)}$ is poor if $r$ or $n - r$ is small. The quantile transformation will lead to expansions in the case of order statistics of other r.v.'s.

Order Statistics of Uniform R.V.'s


For positive integers $n$ and $r \in \{1, \ldots, n\}$ put $a_{r,n}^2 = r(n - r + 1)/(n + 1)^3$ and $b_{r,n} = r/(n + 1)$. Recall from Section 1.7 that $b_{r,n}$ and $a_{r,n}$ are the expectation and, approximately, the standard deviation of $U_{r:n}$.
Theorem 4.2.1. For every positive integer $m$ there exists a constant $C_m > 0$ such that for every $n$ and $r \in \{1, \ldots, n\}$,

$$\sup_B \Bigl|P\{a_{r,n}^{-1}(U_{r:n} - b_{r,n}) \in B\} - \int_B \Bigl(1 + \sum_{i=1}^{m-1} L_{i,r,n}\Bigr) dN_{(0,1)}\Bigr| \le C_m \bigl(n/(r(n - r))\bigr)^{m/2} \tag{4.2.1}$$

where $L_{i,r,n}$ is a polynomial of degree $\le 3i$.


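PROOF. Throughout this proof, the indices $r$ and $n$ will be suppressed. Moreover, $C$ will be used as a generic constant which only depends on $m$. Put $\alpha = r$ and $\beta = n - r + 1$. From Theorem 1.3.2 it is immediate that the density of

$$a^{-1}(U_{r:n} - b) = \frac{(\alpha + \beta)^{3/2}}{(\alpha\beta)^{1/2}}\Bigl(U_{r:n} - \frac{\alpha}{\alpha + \beta}\Bigr)$$

is of the form $pg$, where $p$ is a normalizing constant and

$$g(x) = \Bigl[1 + \Bigl(\frac{\beta}{(\alpha + \beta)\alpha}\Bigr)^{1/2} x\Bigr]^{\alpha - 1}\Bigl[1 - \Bigl(\frac{\alpha}{(\alpha + \beta)\beta}\Bigr)^{1/2} x\Bigr]^{\beta - 1}$$

if $-((\alpha + \beta)\alpha/\beta)^{1/2} < x < ((\alpha + \beta)\beta/\alpha)^{1/2}$. Notice that $\min[(\alpha + \beta)\alpha/\beta, (\alpha + \beta)\beta/\alpha] \ge \alpha\beta/(\alpha + \beta)$. Corollary A.2.3 yields

$$\Bigl|\exp(x^2/2) g(x) - \Bigl(1 + \sum_{i=1}^{m-1} h_i\Bigr)\Bigr| \le C\bigl[(\alpha + \beta)/(\alpha\beta)\bigr]^{m/2}(|x|^m + |x|^{3m}) \tag{1}$$

for $|x| \le [\alpha\beta/(\alpha + \beta)]^{1/6}$, where $h_i$ are the polynomials as described in Corollary A.2.3. Define the signed measure $\nu$ by

$$\nu(A) = \int_A \Bigl(1 + \sum_{i=1}^{m-1} h_i\Bigr) dN_{(0,1)} \Big/ \int \Bigl(1 + \sum_{i=1}^{m-1} h_i\Bigr) dN_{(0,1)}.$$

W.l.g., by choosing the constant $C$ sufficiently large, we may assume that the term $\int (1 + \sum_{i=1}^{m-1} h_i)\, dN_{(0,1)}$ is bounded away from zero. By (1), the exponential bound (3.1.2) and Lemma A.3.2 applied to the functions $g$ and $f = \exp(-x^2/2)(1 + \sum_{i=1}^{m-1} h_i)$ and to the set $B = \{x : |x| \le [\alpha\beta/(\alpha + \beta)]^{1/6}\}$ we obtain

$$\begin{aligned} &\sup_A \Bigl|P\Bigl\{\frac{(\alpha + \beta)^{3/2}}{(\alpha\beta)^{1/2}}\Bigl(U_{r:n} - \frac{\alpha}{\alpha + \beta}\Bigr) \in A\Bigr\} - \nu(A)\Bigr| \\ &\quad\le C\Bigl(\frac{\alpha + \beta}{\alpha\beta}\Bigr)^{m/2} \frac{\int (|x|^m + |x|^{3m})\, dN_{(0,1)}}{\int (1 + \sum_{i=1}^{m-1} h_i)\, dN_{(0,1)}} + P\Bigl\{\frac{(\alpha + \beta)^{3/2}}{(\alpha\beta)^{1/2}}\Bigl(U_{r:n} - \frac{\alpha}{\alpha + \beta}\Bigr) \notin B\Bigr\} + |\nu|(B^c) \\ &\quad\le C\Bigl(\frac{\alpha + \beta}{\alpha\beta}\Bigr)^{m/2}. \end{aligned}$$

Now the assertion is immediate from Lemma 3.2.5.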

Addendum 4.2.2. The application of Lemma 3.2.5 in the proof of Theorem 4.2.1 gives more precise information about the polynomials $L_{i,r,n}$.
(i) The polynomials $L_{i,r,n}$ are recursively defined by

$$L_{i,r,n} = h_{i,r,n} - \int h_{i,r,n}\, dN_{(0,1)} - \sum_{k=1}^{i-1}\Bigl(\int h_{k,r,n}\, dN_{(0,1)}\Bigr) L_{i-k,r,n}$$

where $h_{i,r,n} \equiv h_i$.
(ii) $\int L_{i,r,n}\, dN_{(0,1)} = 0$, $i = 1, \ldots, m - 1$.
(iii) The coefficients of $L_{i,r,n}$ are of order $O((n/(r(n - r)))^{i/2})$.
(iv) For $i = 1, 2$ we have

$$L_{1,r,n}(x) = \frac{n - 2r + 1}{(r(n - r + 1)(n + 1))^{1/2}}\Bigl[\frac{x^3}{3} - x\Bigr] \tag{4.2.2}$$

and

$$L_{2,r,n}(x) = \frac{1}{r(n - r + 1)(n + 1)}\Bigl[(n - 2r + 1)^2\,\frac{x^6 - 15}{18} - \bigl[7(n - 2r + 1)^2 + 3r(n - r + 1)\bigr]\frac{x^4 - 3}{12} + \bigl[r^2 - r(n - r + 1) + (n - r + 1)^2\bigr](x^2 - 1)\Bigr].$$
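The expansions of length 2 and 3 can be compared with the exact d.f. of $a_{r,n}^{-1}(U_{r:n} - b_{r,n})$, which is again a binomial tail. Integrating (4.2.2) and the formula for $L_{2,r,n}$ against $N_{(0,1)}$ uses four closed forms:

- $\int_{-\infty}^{t} (x^3/3 - x)\varphi(x)\,dx = \varphi(t)(1 - t^2)/3$,
- $\int_{-\infty}^{t} (x^2 - 1)\varphi(x)\,dx = -t\varphi(t)$,
- $\int_{-\infty}^{t} (x^4 - 3)\varphi(x)\,dx = -(t^3 + 3t)\varphi(t)$,
- $\int_{-\infty}^{t} (x^6 - 15)\varphi(x)\,dx = -(t^5 + 5t^3 + 15t)\varphi(t)$.

The Python sketch below reports the largest error of the normal approximation and of the two expansions. The helper names are ours, and the sketch only checks half-lines $B = (-\infty, t]$ on a grid, not all Borel sets.

```python
from math import erf, exp, lgamma, log, pi, sqrt

def Phi(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def phi(t):
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)

def binom_upper_tail(n, r, p):
    """P{Bin(n, p) >= r} = P{U_{r:n} <= p} for 1 <= r <= n."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return sum(exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log(1.0 - p))
               for j in range(r, n + 1))

def expansion_cdf(t, n, r, terms):
    """Phi(t) + int_{-inf}^t (L_1 + ...) dN_(0,1), using `terms` polynomials (0, 1 or 2)."""
    value = Phi(t)
    d = n - 2 * r + 1
    ab = r * (n - r + 1)
    if terms >= 1:
        value += d / sqrt(ab * (n + 1)) * phi(t) * (1.0 - t * t) / 3.0
    if terms >= 2:
        i6 = -(t ** 5 + 5 * t ** 3 + 15 * t) * phi(t)   # int (x^6 - 15) phi
        i4 = -(t ** 3 + 3 * t) * phi(t)                  # int (x^4 - 3) phi
        i2 = -t * phi(t)                                 # int (x^2 - 1) phi
        bracket = (d * d * i6 / 18.0 - (7 * d * d + 3 * ab) * i4 / 12.0
                   + (r * r - ab + (n - r + 1) ** 2) * i2)
        value += bracket / (ab * (n + 1))
    return value

def sup_errors(n, r, grid=401):
    """Largest deviation from the exact d.f. over t in [-4, 4] for 0, 1, 2 polynomials."""
    a = sqrt(r * (n - r + 1)) / (n + 1) ** 1.5
    b = r / (n + 1)
    errors = [0.0, 0.0, 0.0]
    for k in range(grid):
        t = -4.0 + 8.0 * k / (grid - 1)
        exact = binom_upper_tail(n, r, b + t * a)
        for terms in range(3):
            errors[terms] = max(errors[terms], abs(exact - expansion_cdf(t, n, r, terms)))
    return errors

print(sup_errors(200, 20))
```

For the sample median of an odd-sized sample $L_{1,r,n}$ vanishes, which is why the normal approximation is more accurate there (see comment (b) below).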

Before turning to the extension of Theorem 4.2.1 to a certain class of d.f.'s we make some comments:


(a) Perhaps the most important consequence of Theorem 4.2.1 is that we get a normal approximation with an error term of order $O((n/(r(n - r)))^{1/2})$. Thus, if $r = r(n) = [nq]$ where $0 < q < 1$, then the error bound is of order $O(n^{-1/2})$. In the intermediate case the approximation is less accurate and, moreover, if $r$ or $n - r$ is fixed (that is, the case of extreme order statistics) we have no approximation at all.
(b) When taking the expansion of length 2 (that is, when we include the polynomial $L_{1,r,n}$ in our considerations), the accuracy of the approximation improves considerably. We also get a better insight into the accuracy of the normal approximation.
For example, given the sample median $U_{n+1:2n+1}$ we see that the corresponding polynomial $L_{1,n+1,2n+1}$ is equal to zero and, thus, the accuracy of the normal approximation is of order $O(n^{-1})$. A similar conclusion can be drawn for order statistics which are close (as far as the indices are concerned) to the sample median. For sample quantiles different from the sample median the accuracy of the normal approximation cannot be better than $O(n^{-1/2})$.
Finally, we mention that for symmetric Borel sets $B$ (that is, $B$ has the property that $x \in B$ implies $-x \in B$) we have

$$\int_B L_{1,r,n}\, dN_{(0,1)} = 0,$$

so that for symmetric sets the normal approximation is of order $O(n/(r(n - r)))$.
(c) Numerical calculations show that for $n = 1, 2, \ldots, 250$ we can take $C_1 = .14$ and $C_2 = .12$ in Theorem 4.2.1.

The General Case


The extension of Theorem 4.2.1 to more general r.v.'s will be achieved by means of the transformation technique. If $X_{r:n}$ is the $r$th order statistic of $n$ i.i.d. random variables with common d.f. $F$ then $X_{r:n} \stackrel{d}{=} F^{-1}(U_{r:n})$. Notice that $F^{-1}$ is monotone. Apart from this special case one is also interested in other monotone transformations of $U_{r:n}$.
As a refinement of the idea which led to Lemma 4.1.6 we get the following highly technical result.
Lemma 4.2.3. Let $m$ be a positive integer and $\varepsilon > 0$. Suppose that $S$ is a function with the properties $S(0) = 0$, $S$ is continuously differentiable on the interval $(-\varepsilon, \varepsilon)$, and

$$\Bigl|S'(x) - \Bigl[1 + \sum_{i=1}^{m-1} \alpha_i x^i / i!\Bigr]\Bigr| \le \alpha_m |x|^m / m!, \qquad |x| < \varepsilon, \tag{4.2.3}$$

with $|\alpha_i| \le \exp(-i\varepsilon)$ for $i = 1, \ldots, m$.
Moreover, let $R_i$ be polynomials of degree $\le 3i$ such that the absolute values of the coefficients are $\le \exp(-i\varepsilon)$ for $i = 1, \ldots, m - 1$.
Then there exist constants $C > 0$ and $d \in (0, 1)$ [which only depend on $m$] such that
(i) $S$ is strictly increasing on the interval $I = (-d\varepsilon, d\varepsilon)$.
(ii) For every monotone, real-valued function $T$ such that the restriction of $T$ to the set $S(I)$ is the inverse of the restriction $S|I$ we have

$$\sup_B \Bigl|\int_{\{T \in B\}} \Bigl(1 + \sum_{i=1}^{m-1} R_i\Bigr) dN_{(0,1)} - \int_B \Bigl(1 + \sum_{i=1}^{m-1} L_i\Bigr) dN_{(0,1)}\Bigr| \le C\exp(-m\varepsilon)$$

where $L_i$ is a polynomial of degree $\le 3i$ and the absolute values of the coefficients are $\le C\exp(-i\varepsilon)$ for $i = 1, \ldots, m - 1$.
(iii) We have

$$L_1(x) = R_1(x) + \alpha_1(x - x^3/2) \tag{4.2.4}$$

and

$$L_2(x) = R_2(x) + \alpha_1\bigl[x^2 R_1'(x)/2 + (x - x^3/2)R_1(x)\bigr] + \alpha_1^2\bigl[x^6/8 - 5x^4/8\bigr] + \alpha_2\bigl[x^2/2 - x^4/6\bigr].$$

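PROOF. Since $\varepsilon^p \exp(-\varepsilon)$ is uniformly bounded on $[0, \infty)$ for every $p \ge 1$, there exists $d \in (0, 1)$ such that

$$S'(x) \ge 1 - \sum_{i=1}^{m} [d\varepsilon \exp(-\varepsilon)]^i / i! \ge 1/2, \qquad |x| \le d\varepsilon. \tag{1}$$

The assertion (i) is immediate from (1). Moreover, (1) implies that

$$S(-d\varepsilon) \le -d\varepsilon/2 \qquad\text{and}\qquad S(d\varepsilon) \ge d\varepsilon/2. \tag{2}$$

From the condition $S(0) = 0$ and from (4.2.3) we deduce by integration that

$$\Bigl|S(x) - \Bigl(x + \sum_{i=1}^{m-1} \frac{x^{i+1}}{(i + 1)!}\alpha_i\Bigr)\Bigr| \le \frac{|x|^{m+1}}{(m + 1)!}\alpha_m, \qquad |x| < \varepsilon. \tag{3}$$

Using (3) we get in analogy to (1) that

$$(1 + |x|)^m |S(x) - x| \text{ is uniformly bounded over } |x| \le d\varepsilon. \tag{4}$$

Applying the transformation theorem for densities (1.4.4) we obtain for every Borel set $B \subset (-d\varepsilon, d\varepsilon)$ that

$$\int_{\{T \in B\}} \Bigl(1 + \sum_{i=1}^{m-1} R_i\Bigr) dN_{(0,1)} = \int_B h(x)\,dx \tag{5}$$

where

$$h(x) = S'(x)\,\varphi(S(x))\Bigl(1 + \sum_{i=1}^{m-1} R_i(S(x))\Bigr). \tag{6}$$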


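Expanding $\varphi$ about $x$ we obtain from (4)

$$\Bigl|\varphi(S(x)) - \varphi(x)\Bigl(1 + \sum_{i=1}^{m-1} w_i(x)(S(x) - x)^i\Bigr)\Bigr| \le C\varphi(x)\bigl|w_m\bigl(x + \theta(S(x) - x)\bigr)\bigr|\,|S(x) - x|^m \tag{7}$$

for $|x| \le d\varepsilon$ and some $\theta \in (0, 1)$. Moreover, $w_i = \varphi^{(i)}/(i!\,\varphi)$ is a polynomial of degree $\le i$, and $C$ denotes a generic constant which only depends on $m$. For $i = 1, 2$ we get $w_1(x) = -x$ and $w_2(x) = (x^2 - 1)/2$.
Writing

$$\psi(x) = \sum_{i=1}^{m-1} \frac{x^{i+1}}{(i + 1)!}\alpha_i,$$

we obtain from (7) that

$$\Bigl|h(x) - \varphi(x)\bigl[1 + \psi^{(1)}(x)\bigr]\Bigl[1 + \sum_{i=1}^{m-1} w_i(x)\psi(x)^i\Bigr]\Bigl[1 + \sum_{i=1}^{m-1} R_i(x + \psi(x))\Bigr]\Bigr| \le C\varphi(x)\exp(-m\varepsilon)\bigl(1 + |x|^{6(m+1)^2}\bigr) \tag{8}$$

for $|x| < d\varepsilon$. From (8) we conclude that

$$\Bigl|h(x) - \varphi(x)\Bigl[1 + \sum_{i=1}^{m-1} L_i(x)\Bigr]\Bigr| \le C\varphi(x)\exp(-m\varepsilon)\bigl(1 + |x|^{6(m+1)^2}\bigr) \tag{9}$$

for $|x| < d\varepsilon$, where $L_i$ are polynomials which have the asserted property. From (5) and (9) we deduce by integration that

$$\Bigl|\int_{\{T \in B\}} \Bigl(1 + \sum_{i=1}^{m-1} R_i\Bigr) dN_{(0,1)} - \int_B \Bigl(1 + \sum_{i=1}^{m-1} L_i\Bigr) dN_{(0,1)}\Bigr| \le \int_{-d\varepsilon}^{d\varepsilon} \Bigl|h(x) - \Bigl(1 + \sum_{i=1}^{m-1} L_i(x)\Bigr)\varphi(x)\Bigr|\,dx \le C\exp(-m\varepsilon) \tag{10}$$

for Borel sets $B \subset (-d\varepsilon, d\varepsilon)$. Moreover, for Borel sets $B \subset (-d\varepsilon, d\varepsilon)^c$ we get by (2)

$$\Bigl|\int_{\{T \in B\}} \Bigl(1 + \sum_{i=1}^{m-1} R_i\Bigr) dN_{(0,1)} - \int_B \Bigl(1 + \sum_{i=1}^{m-1} L_i\Bigr) dN_{(0,1)}\Bigr| \le \int_A \Bigl|1 + \sum_{i=1}^{m-1} R_i\Bigr| dN_{(0,1)} + \int_A \Bigl|1 + \sum_{i=1}^{m-1} L_i\Bigr| dN_{(0,1)} \le C\exp(-m\varepsilon) \tag{11}$$

where $A$ is the complement of $(-d\varepsilon/2, d\varepsilon/2)$.
Combining (10) and (11) the proof is complete. □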

Note that Lemma 4.2.3 still holds if the condition that S has a continuous
derivative is replaced by the weaker condition that S is absolutely continuous.
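Part (iii) of Lemma 4.2.3 can be checked numerically in the simplest case $R_1 = R_2 = 0$ and $S(x) = x + \alpha_1 x^2/2 + \alpha_2 x^3/6$. In this case $h(x) = S'(x)\varphi(S(x))$ from (6) is the density of $T(\eta)$ near zero, for $\eta$ standard normal. Taking $\alpha_1 = t$ and $\alpha_2 = t^2$ mimics the scaling $|\alpha_i| \le \exp(-i\varepsilon)$. The Python sketch below is our own illustration; the grid and the value $t = 0.02$ are arbitrary. It compares $h$ with $\varphi(1 + L_1)$ and with $\varphi(1 + L_1 + L_2)$ on $[-2.5, 2.5]$. Adding $L_2$ should reduce the error by roughly one order of magnitude.

```python
from math import exp, pi, sqrt

def phi(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def h(x, a1, a2):
    """Density S'(x) phi(S(x)) of T(eta) for S(x) = x + a1 x^2/2 + a2 x^3/6 (R_i = 0)."""
    s = x + a1 * x ** 2 / 2 + a2 * x ** 3 / 6
    ds = 1 + a1 * x + a2 * x ** 2 / 2
    return ds * phi(s)

def expansion(x, a1, a2, terms):
    """phi(x) (1 + L_1(x) + L_2(x)) from Lemma 4.2.3 (iii) with R_1 = R_2 = 0."""
    value = 1.0
    if terms >= 1:
        value += a1 * (x - x ** 3 / 2)
    if terms >= 2:
        value += a1 ** 2 * (x ** 6 / 8 - 5 * x ** 4 / 8) + a2 * (x ** 2 / 2 - x ** 4 / 6)
    return phi(x) * value

def max_error(t, terms, grid=501):
    xs = [-2.5 + 5.0 * k / (grid - 1) for k in range(grid)]
    return max(abs(h(x, t, t * t) - expansion(x, t, t * t, terms)) for x in xs)

t = 0.02
print(max_error(t, 1), max_error(t, 2))
```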


Next, an expansion of length $m$ will be established under the condition that the underlying d.f. $F$ has $m + 1$ derivatives on some appropriate interval. Let again $a_{r,n}^2 = r(n - r + 1)/(n + 1)^3$ and $b_{r,n} = r/(n + 1)$. Based on Theorem 4.2.1 and Lemma 4.2.3 the proof of Theorem 4.2.4 will be a triviality.
Theorem 4.2.4. For some $r \in \{1, \ldots, n\}$ let $X_{r:n}$ be the $r$th order statistic of $n$ i.i.d. random variables with common d.f. $F$ and density $f$. Assume that $f(F^{-1}(b_{r,n})) > 0$ and that the function $S_{r,n}$ defined by

$$S_{r,n}(x) = a_{r,n}^{-1}\bigl(F\bigl[F^{-1}(b_{r,n}) + x a_{r,n}/f(F^{-1}(b_{r,n}))\bigr] - b_{r,n}\bigr)$$

has $m + 1$ derivatives on the interval

$$I_{r,n} := \{x : |x| < \log[(r(n - r + 1)/(n + 1))^{1/2}]\}.$$

Then there exists a constant $C_m > 0$ (only depending on $m$) such that

$$\sup_B \Bigl|P\{a_{r,n}^{-1} f(F^{-1}(b_{r,n}))[X_{r:n} - F^{-1}(b_{r,n})] \in B\} - \int_B \Bigl(1 + \sum_{i=1}^{m-1} L_{i,r,n}\Bigr) dN_{(0,1)}\Bigr| \le C_m\Bigl[\bigl(n/(r(n - r))\bigr)^{m/2} + \max_{j=1,\ldots,m} |\alpha_{j,r,n}|^{m/j}\Bigr] \tag{4.2.5}$$

where $L_{i,r,n}$ is a polynomial of degree $\le 3i$. Moreover, $\alpha_{j,r,n} = S_{r,n}^{(j+1)}(0)$ for $j = 1, \ldots, m - 1$ and $\alpha_{m,r,n} = \sup\{|S_{r,n}^{(m+1)}(x)| : x \in I_{r,n}\}$.

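PROOF. Throughout the proof, the indices $r$ and $n$ will be suppressed. Writing

$$T(x) = a^{-1} f(F^{-1}(b))\bigl[F^{-1}(b + x a) - F^{-1}(b)\bigr] \tag{1}$$

and denoting by $R_i$ the polynomials of Theorem 4.2.1, we obtain from Theorem 1.2.5 and Theorem 4.2.1 that for every Borel set $B$,

$$\Bigl|P\{a^{-1} f(F^{-1}(b))[X_{r:n} - F^{-1}(b)] \in B\} - \int_{\{T \in B\}} \Bigl(1 + \sum_{i=1}^{m-1} R_i\Bigr) dN_{(0,1)}\Bigr| \le C\bigl(n/(r(n - r))\bigr)^{m/2}.$$

It remains to prove that

$$\Bigl|\int_{\{T \in B\}} \Bigl(1 + \sum_{i=1}^{m-1} R_i\Bigr) dN_{(0,1)} - \int_B \Bigl(1 + \sum_{i=1}^{m-1} L_i\Bigr) dN_{(0,1)}\Bigr| \le C\Bigl[\bigl(n/(r(n - r))\bigr)^{m/2} + \max_{j=1,\ldots,m} |\alpha_{j,r,n}|^{m/j}\Bigr]. \tag{2}$$

Put $\varepsilon = -\log[(n/(r(n - r)))^{1/2} + \max_{j=1,\ldots,m} |\alpha_{j,r,n}|^{1/j}]$, and assume w.l.g. that $r(n - r)$ is sufficiently large so that $\varepsilon > 0$. A Taylor expansion of $S'$ about zero yields that condition (4.2.3) is satisfied for $\varepsilon$ and $\alpha_i$. Moreover, $T|S(I)$ is the inverse of $S|I$. Thus, Lemma 4.2.3 implies (2). □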


Addendum 4.2.5. From the proof of Theorem 4.2.4 we see that
(i) $\int L_{i,r,n}\, dN_{(0,1)} = 0$, $i = 1, \ldots, m - 1$.
(ii) The coefficients of $L_{i,r,n}$ are of order

$$O\Bigl[(n/(r(n - r)))^{i/2} + \max_{j=1,\ldots,m} |\alpha_{j,r,n}|^{i/j}\Bigr].$$

(iii) For $i = 1, 2$, we have (with $R_{i,r,n}$ denoting the polynomials of (4.2.2))

$$L_{1,r,n}(x) = R_{1,r,n}(x) + \alpha_{1,r,n}(x - x^3/2)$$

and

$$L_{2,r,n}(x) = R_{2,r,n}(x) + \alpha_{1,r,n}\bigl[x^2 R_{1,r,n}'(x)/2 + (x - x^3/2)R_{1,r,n}(x)\bigr] + \alpha_{1,r,n}^2(x^6/8 - 5x^4/8) + \alpha_{2,r,n}(x^2/2 - x^4/6).$$

Notice that Theorem 4.2.1 is immediate from Theorem 4.2.4 applied to $S_{r,n}(x) = x$. In this case we have $\alpha_{j,r,n} = 0$, $j = 1, \ldots, m$.
EXAMPLE 4.2.6. In many cases one can omit the term $\max_{j=1,\ldots,m} |\alpha_{j,r,n}|^{m/j}$ on the right-hand side of (4.2.5).
Let $0 < q_1 < q_2 < 1$ and suppose that the density is bounded away from zero on the interval $J = (F^{-1}(q_1) - \varepsilon, F^{-1}(q_2) + \varepsilon)$ for some $\varepsilon > 0$. If $f$ has $m$ bounded derivatives on $J$ then $\max_{j=1,\ldots,m} |\alpha_{j,r,n}|^{m/j} = O(n^{-m/2})$ uniformly over $r \in \{[nq_1], \ldots, [nq_2] + 1\}$.

Order Statistics of Exponential R.V.'s


Careful calculations will show that in the case of exponential r.v.'s the right-hand side of (4.2.5) is again of order $O((n/(r(n - r)))^{m/2})$.
Corollary 4.2.7. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. standard exponential r.v.'s (having the d.f. $G(x) = 1 - e^{-x}$ and density $g(x) = e^{-x}$, $x \ge 0$). Let again $a_{r,n}^2 = r(n - r + 1)/(n + 1)^3$ and $b_{r,n} = r/(n + 1)$. Then there exists a constant $C_m > 0$ (only depending on $m$) such that

$$\sup_B \Bigl|P\{a_{r,n}^{-1} g(G^{-1}(b_{r,n}))[X_{r:n} - G^{-1}(b_{r,n})] \in B\} - \int_B \Bigl(1 + \sum_{i=1}^{m-1} L_{i,r,n}\Bigr) dN_{(0,1)}\Bigr| \le C_m\bigl(n/(r(n - r))\bigr)^{m/2} \tag{4.2.6}$$

where the polynomials $L_{i,r,n}$ are defined as in Theorem 4.2.4 with

$$\alpha_{i,r,n} = (-1)^i\bigl(r/((n + 1)(n - r + 1))\bigr)^{i/2}.$$

In particular, for $i = 1, 2$,

$$L_{1,r,n}(x) = (r(n - r + 1)(n + 1))^{-1/2}\bigl[(2n - r + 2)x^3/6 - (n - r + 1)x\bigr],$$

and

$$L_{2,r,n}(x) = R_{2,r,n}(x) + ((n - r + 1)(n + 1))^{-1}\bigl[r(-5x^6/24 + 15x^4/8 - 5x^2/2) - (n + 1)(-x^6/6 + 4x^4/3 - 3x^2/2)\bigr]$$

where $R_{2,r,n}$ is the corresponding polynomial in Theorem 4.2.1.


PROOF. Since $g^{(i)}(G^{-1}(q)) = (-1)^i(1 - q)$ it is immediate that $\alpha_{i,r,n}$ is of the desired form. Moreover, $|\alpha_{i,r,n}|^{1/i} \le (n/(r(n - r + 1)))^{1/2}$.
Let $S_{r,n}$ and $I_{r,n}$ be defined as in Theorem 4.2.4. Since $\log(1 + x) \le x$ for $x > -1$, and hence also $\log x < x$ for $x > 0$, we obtain

$$G^{-1}(b_{r,n}) - \log\bigl[(r(n - r + 1)/(n + 1))^{1/2}\bigr]\, a_{r,n}/g(G^{-1}(b_{r,n})) \ge b_{r,n} - \log\bigl[(r(n - r + 1)/(n + 1))^{1/2}\bigr]\, a_{r,n}/(1 - b_{r,n}) > 0.$$

Using this inequality we see that $S_{r,n}$ has $m + 1$ derivatives on the interval $I_{r,n}$. Moreover, by straightforward calculations we obtain $\alpha_{m,r,n} \le C(n/(r(n - r + 1)))^{m/2}$, where $C$ is a universal constant. Thus, Theorem 4.2.4 is applicable and yields the assertion. □
Numerical computations show that one can take $C_1 = .15$ and $C_2 = .12$ in Corollary 4.2.7 for $n = 1, \ldots, 250$. From the expansion of length 2 in Corollary 4.2.7 we obtain the following upper bound of the remainder term of the normal approximation:

Moreover,

$$\int L_{1,r,n}^2\, dN_{(0,1)} = \frac{8(n - r + 1)^2 + 8r(n - r + 1) + 5r^2}{12r(n - r + 1)(n + 1)} \le \frac{2(n + 1)}{3r(n - r + 1)}. \tag{4.2.7}$$
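The closed form in (4.2.7) follows from the Gaussian moments $\int x^2\,dN_{(0,1)} = 1$, $\int x^4\,dN_{(0,1)} = 3$ and $\int x^6\,dN_{(0,1)} = 15$. The Python sketch below is our own check, not part of the original argument. It builds $L_{1,r,n}$ as in Addendum 4.2.5 (iii), that is $R_{1,r,n} + \alpha_{1,r,n}(x - x^3/2)$ with $R_{1,r,n}$ from (4.2.2). It then compares the coefficients with the explicit formula of Corollary 4.2.7 and checks (4.2.7) for all $1 \le r \le n \le 60$.

```python
from math import isclose, sqrt

def l1_coefficients(n, r):
    """(c3, c1) with L_{1,r,n}(x) = c3 x^3 + c1 x, built as R_1 + alpha_1 (x - x^3/2)."""
    k = sqrt(r * (n - r + 1) * (n + 1))
    d = n - 2 * r + 1                              # R_1 = d/k (x^3/3 - x), see (4.2.2)
    alpha1 = -sqrt(r / ((n + 1) * (n - r + 1)))    # alpha_{1,r,n} of Corollary 4.2.7
    return d / (3 * k) - alpha1 / 2, -d / k + alpha1

def second_moment(c3, c1):
    """int (c3 x^3 + c1 x)^2 dN_(0,1) = 15 c3^2 + 6 c3 c1 + c1^2."""
    return 15 * c3 * c3 + 6 * c3 * c1 + c1 * c1

def closed_form(n, r):
    """Middle expression of (4.2.7)."""
    s = n - r + 1
    return (8 * s * s + 8 * r * s + 5 * r * r) / (12 * r * s * (n + 1))

c3, c1 = l1_coefficients(10, 3)
print(second_moment(c3, c1), closed_form(10, 3))
```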

Stochastic Independence of Certain Groups of Order Statistics


This section will be concluded with an application of the expansion of length 2 of distributions of order statistics $U_{i:n}$. In the proof below we shall only indicate the decisive step which is based on the expansion of length 2.
Hereafter, let $1 \le s < n - m + 1$. Let $V_{s:n}$ and $V_{n-m+1:n}$ be independent r.v.'s such that $V_{s:n} \stackrel{d}{=} U_{s:n}$ and $V_{n-m+1:n} \stackrel{d}{=} U_{n-m+1:n}$. The basic inequality is given by

$$\sup_B \bigl|P\{(U_{s:n}, U_{n-m+1:n}) \in B\} - P\{(V_{s:n}, V_{n-m+1:n}) \in B\}\bigr| \le C\Bigl[\frac{sm}{n(n - s - m)}\Bigr]^{1/2} \tag{4.2.8}$$


where $C > 0$ is a universal constant. Thus, if $s$ and $m$ are fixed then the upper bound is of order $O(n^{-1})$. If $s$ is fixed and $(n - m)/n$ is bounded away from 0 and 1 then the bound is of order $O(n^{-1/2})$. Finally, if $s$ is fixed and $n - m = o(n)$ then the bound is of order $O((n - m)^{-1/2})$. This shows that extremes and intermediate order statistics are asymptotically independent.
The proof of (4.2.8) is based on Theorem 1.8.1 and Theorem 4.2.1. Conditioning on $U_{n-m+1:n}$ one obtains

$$P\{(U_{s:n}, U_{n-m+1:n}) \in B\} - P\{(V_{s:n}, V_{n-m+1:n}) \in B\} = E\,T(U_{n-m+1:n}) \tag{4.2.9}$$

where

$$T(x) = P\{x U_{s:n-m} \in B_x\} - P\{U_{s:n} \in B_x\}$$

with $B_x$ denoting the $x$-section of the set $B$.
The function $T$ is of a rather complicated structure and has to be replaced by a simpler one. This can be achieved by expansions of length 2. The approximate representation of $T$ as the difference of two expansions of length 2 simplifies further computations. We remark that a normal approximation instead of an expansion of length 2 leads to an inaccurate upper bound in (4.2.8). For details of the proof we refer to Falk and Reiss (1988), where the following two extensions of (4.2.8) can also be found.

Theorem 4.2.8. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. random variables with common d.f. $F$. Given $1 \le s < n - m + 1 \le n$ we consider two vectors of order statistics, namely,

$$X_l = (X_{1:n}, \ldots, X_{s:n}) \qquad\text{and}\qquad X_u = (X_{n-m+1:n}, \ldots, X_{n:n}).$$

Now let $Y_l$ and $Y_u$ be independent random vectors so that $Y_l \stackrel{d}{=} X_l$ and $Y_u \stackrel{d}{=} X_u$. Then,

$$\sup_B \bigl|P\{(X_l, X_u) \in B\} - P\{(Y_l, Y_u) \in B\}\bigr| \le C\Bigl[\frac{sm}{n(n - s - m)}\Bigr]^{1/2} \tag{4.2.10}$$

where $C$ is the constant in (4.2.8).


A further extension is obtained when treating three groups of order
statistics.

Theorem 4.2.9. Let $X_{i:n}$ be as above. Given $1 \le k < r < s < n - m + 1 \le n$ we obtain three vectors of order statistics, namely,

$$X_l = (X_{1:n}, \ldots, X_{k:n}), \qquad X_c = (X_{r:n}, \ldots, X_{s:n}), \qquad X_u = (X_{n-m+1:n}, \ldots, X_{n:n}).$$

Now let $Y_l$, $Y_c$ and $Y_u$ be independent random vectors so that $Y_l \stackrel{d}{=} X_l$, $Y_c \stackrel{d}{=} X_c$ and $Y_u \stackrel{d}{=} X_u$. Then there exists a universal constant $C > 0$ such that

$$\sup_B \bigl|P\{(X_l, X_c, X_u) \in B\} - P\{(Y_l, Y_c, Y_u) \in B\}\bigr| \le C\Bigl[\frac{k(n - r)}{n(r - k)} + \frac{sm}{n(n - s - m)}\Bigr]^{1/2}. \tag{4.2.11}$$

Both theorems are deduced from (4.2.8) by means of the quantile transformation and by conditioning on order statistics.

4.3. Asymptotic Independence from the


Underlying Distribution Function
From the preceding section we know that the normalized central order
statistic f(F- 1 (b r,n)}(Xr:n - F- 1 (br.n)) is asymptotically normal-with expectation f.1 = 0 and variance a;,n = r(n - r + 1)/(n + 1)3- up to a remainder
term of order O(n-l/2) if, roughly speaking, the underlying density fis bounded
away from zero. In the present section we shall primarily be interested in the
property that the approximating normal distribution is independent from the
underlying dJ. F. Consequently,
sup 1P{J(F-l(br,n))(Xr:n B

(br,n}} E B} - P{(Ur:n - br,n)

B}I

(4.3.1)
where Ur:n is the rth order statistic of n i.i.d. (0, I)-uniformly distributed r.v.'s.
Notice that the error bound above is sharp since the second term of the
expansion of length two depends on the density f.

The Main Result


In analogy to (4.3.1) it will be shown in Theorem 4.3.1 that the variational
distance between standardized joint distributions of k order statistics is of
order o ((k/n) 1/2). That means, after a linear transformation which depends
on the underlying dJ. F the joint distribution of order statistics becomes
independent from F within an error bound of order O((k/n)1/2).
When treating the normal approximation, the situation is completely
different. It is clear that the joint asymptotic normality of order statistics Xr,n
and Xs,n implies that the spacings X"n - Xr:n also have this property.
However, if s - r is fixed then spacings behave like extreme order statistics,
and hence, the limiting distribution is different from the normal distribution.
Theorem 4.3.1. Let Xi,n be the ith order statistic of n i.i.d. random variables
with common df F and density f.

4. Approximations to Distributions of Central Order Statistics

124

= n + 1 with ri - ri- 1 ::::-: 4 for i = 1,2, ... ,


bi(1 - b;)for i = 1, ... , k.
Assume that f > 0 and f has three derivatives on the interval I where
I = (F- 1 (b 1 ) - e1 , F- 1(bk ) + ed with ei = 5n-l/2(log n)a;/f(F- 1(b;)) for i = 1, k.
Then, there exists a universal constant C > 0 such that

Let 0

ro < r 1 < ... < rk < rk+l

+ 1. Put bi = r;/(n + 1) and aF

sup IP{[f(F-1(b;))(Xri : n

F-l(b;))]~=l E

B} - P{[(Uri : n

b;)]~=l E

B}I

::::;; C(k/n) 1/2 [c(f)1/2


where c(f)

+ C(f)2 + n- 1/2J

maxJ=l [supYEllf(j)(y)l!infYErfi+1(y)].

At the end of this section we shall give an example showing that Theorem
4.3.1 does not hold for ri - ri - 1 = 1. It is difficult to make a conjecture whether
the result holds for ri - ri- 1 = 2 or ri - ri- 1 = 3. As we will see in the proof of
Theorem 4.3.1 one reason for the restriction ri - ri - 1 ::::-: 4 is that the supports
of the two joint distributions are unequal.
Theorem 4.3.1 is a slight improvement of Theorem 2.1 in Reiss (1981b)
which was proved under the stronger condition that r i - r i - 1 ::::-: 5. Therefore,
the proof is given in its full length. Another reason for running through all the
technical details is to facilitate and to encourage further research work.
Theorem 4.3.1 may be of interest as a challenging problem that can only be
solved when having a profound knowledge of the distributional properties of
order statistics.
Theorem 4.3.1 also serves as a powerful tool to prove various results for
order statistics. As an example we mention a result of Section 4.5 stating that
several order statistics of i.i.d. exponential r.v.'s are jointly asymptotically
normal. By making use of Theorem 4.3.1, this may easily be extended to other
r.v.'s. However, one should notice that a stronger result may be achieved by
using a method adjusted to the particular problem. Thus, applications of
Theorem 4.3.1 will lead to results of a preliminary character which may
stimulate further research work. Another application of Theorem 4.3.1 will
concern linear combinations of order statistics (see Section 6.2).
PROOF OF THEOREM 4.3.1. Part I. We write Ili = F-1(b;), /; = f(ll;) and, more
generally, /;U) = f U)(IlJ Denote by Qo and Ql the distributions of
(Uri : n

b;)~=l

and, respectively, (/;(Xri : n

and by go and gl the corresponding densities.


From Lemma 3.3.9(i) and Lemma A.3.5 we obtain

s~p IQo(B) -

Ql(B)1 ::::;;

[2 Qo(A

t(

Il;))~=l'

J/

-IOg:Jd Qo

(1)

for some Borel set A to be fixed later. The main difficulty of the proof is to
obtain a sharp lower bound of JAloggl/godQo'

4.3. Asymptotic Independence from the Underlying Distribution Function

125

We have

and

where

+ xk/h)}.
Moreover, K is a normalizing constant, hi(x) = !(Jli + xi//;)//;, '/lAx) =
Xi - Xi-l + (hi - hi-d, <5i(x) = F(Jli + xi//;} - F(Jli-l + xi-d/;-I) - 'Mx) for
i = 1, ... , k + 1 [with the convention that Xo = Xk+l = 0, F(Jlo + xo/!o) = 0
and F(Jlk+l + Xk+l/h+1) = 1]. Thus, for A c Al we have
Al = {x: F(Jll

F(Jlk

.f f (loghJdQo

f(
A

+ Xd!l) < ... <

IOggl)dQo =
go

,=1

+ k+l
.2: (ri ,=1

To obtain an expansion of log(l

ri- 1

1)

f (
A

(2)

<5.) dQo
log 1 + ---.:
t/Ji

+ <5i/t/Ji), we introduce the sets


i = 1, ... , k + 1.

Notice that
(3)

on A 2 ,i where, throughout the proof, C denotes a universal constant that is


not necessarily the same at each appearance. Moreover, we write
A 3 ,i

= {x: Ixd

::;; 5n- 1/2 (logn)oJ

and
(4)

We shall verify that the following three inequalities hold:

Ii~

I ~f (ri ,=1

(log hJ dQo
ri-l - 1)

I: ; C[c(f)QO(Ac)2/3 k/n

log(1

+ <5i/t/Ji) dQ o l

1/ 2

+ (c(f) + c(f)2)k/n],

(5)

4. Approximations to Distributions of Central Order Statistics

126

Qo(AC) :;:;

c [n~ + C(f)4(log n)I/2 n~

J.

(7)

The assertion of the theorem is immediate from (1), (2), and (5)-(7).
A Taylor expansion of log(fID about f.1i yields
Iloghi(x) - (f,oW/)x;I :;:; C(c(f)
for x

+ c(f)Z)x?

A 3.i and i = 1, ... , k. Since SXi dQo(X) = 0 we obtain

and hence, (5) is immediate from (1.7.4).


Next, we shall prove a lower bound of L~~t (ri - ri- 1 - 1) SA log(1
bdt/lJ dQo It is obvious from (3) that

(ri - ri- 1 - 1)

with
k+1

PI = i~ (ri - ri- 1 - 1)
Pz

P3 =

~1

( . _.

1)

.L... r,

r,-1

i~ (ri -

ri- 1 - 1)

,~1

+ bdt/lJdQo :;:; c(lpll + IPzl + P3)

log(1

f
f

t
A

aix? - ai- 1X?-1


t/li(X)

+
(9)

dQo(X),

bi(x) - (aix? - ai- 1xf-d dQ ( )


./, ( )
0 x ,
'I'i X

(bdt/lJ z dQo,

where the constants a i are given by a i = 1;(1)121? for i = 1, ... , k, and ao =


ak+l = O. From P.1.25 it is easily seen that

Some straightforward calculations yield

for every x and i = 2, ... , k. Moreover, L~~t (aix? - a i- 1X?-I) = 0 and


ri - ri-l - (n + 1)t/li = -en + l)(xi - Xi-I)' Combining these relations and
applying the Holder inequality we obtain

4.3. Asymptotic Independence from the Underlying Distribution Function

127

Since ri - ri- 1 ~ 4 we know that P.1.23 is applicable to I/Ii- 3 dQo and hence
the Holder inequality, Lemma 3.1.3 and Corollary 1.6.8 yield
(11)

To obtain a sharp upper bound of Ip21 one has to utilize some tedious
estimates of lc5i (x) - (aiX[ - ai-lX[-dl. A Taylor expansion of G(y) =
F(/li + yxdf;) - F(/li-l + YX i- I //;-I) about y = 0 yields

Iui~ ()
X-

(2

ai x i - ai- 1Xi-l

)1_
11 (2) ( /li + ()Xi)
X~ - (; 1
/; /;3

1(2) (

/li-l

i - 1) X~-1 1
+ ()X/;-1
/;~1

for every i = 2, ... , k and x E A 3 ,i n A 3 ,i-1 where () E (0, 1). Thus, by further
Taylor expansions of F- 1 and of derivatives of F we get

lc5i (x) - (aiX[ - ai-lX[-dl


~ C(c(f)lx~ - x~-11

+ x~-I[c(f)lxi - Xi-II + (c(f) + c(f)2)(bi - bi - 1)])

=: '1i(X),

(12)

For i = 1 and x E A 3 ,1 and, respectively, i = k

+ 1 and x E A 3 ,k+l we get

lc5i(x) - (aiX[ - ai-lX[-dl ~ CC(f)IXi - Xi_11 3 =: '1i(X),

(13)

Since L~,;t [c5i (x) - (aiX[ - ai-lX[-I)] = 0 we obtain-using again the


HOlder inequality and applying (12) and (13)-that
k+l

Ip21 ~ i~

k+l

~ i~

[1'1i(x)I(1

+ (n + 1)l xi -

(f ['1i(x)(1 + (n + 1) IXi -

Xi-ll)/I/Ii(X)]dQo(x)
Xi-l 1)]2 dQo(X)

)1/2 (f I/Ii- 2dQo )1/2

Proceeding as in the proof of (11) we obtain


Ip21 ~ C(c(f)

+ c(f)2)k/n.

(14)

Moreover, the arguments used to prove (11) and (14) also lead to
P3

~:~ (ri -

ri-l

-1)(f ['1i(X) +

+ (c(f) + c(f)2)(bi ~ C(c(f)

c(f)lx[ - x[-11

bi _dx[_1]6 dQ O(X)Y/3

(f I/Ii- 3dQo y/3

(15)

+ c(f)2)k/n.

Combining (9), (11), (14), and (15) we obtain (6).


Finally, we prove (7). Applying Lemma 3.1.1 we get
Qo{x: Ix;! ~ (50/11)ui(logn)/n 1/2 } ~ Cn- 3

(16)

4. Approximations to Distributions of Central Order Statistics

128

for i = 1, ... , k. Hence


Qo(A~.;) :5: Cn- 3

(17)

for i = 1, ... , k, and in view of Corollary 1.6.8,


Qo{x:

IXi -

x i - 1 1 ~ 5(bi - bi_1)1/2(logn)/n 1/2 } :5: Cn- 3

(18)

for i = 2, ... , k. From (10), (11), (13), (17), and (18) we infer that
(19)

Qo{Ji ~ -En} ~ 1 - Cn- 3

for i = 1, ... , k + 1 where En = c(f)(bi - bi_d 1/2(log n)3/n1/2. Since ri - ri- 1 ~ 4


we deduce from Lemma 3.1.2 that
QO{ljJi

~ 3En} ~ 1 -

(20)

Cc(f)4(logn)1/2/n2

+ 1. Combining (19) and (20) we get


Qo(A~):5: C[n- 3 + c(f)4(logn)1/2/n2]
for i = 1, ... , k + 1. It is immediate that Qo(A 1) ~ QO(n~=l A 3,J
for i = 1, ... , k

(21)

This together

with (17) and (20) yields

Qo(AC) :5: C[k/n 3 + c(f)4(log n)1/2 k/n2].

(22)

Thus, (7) holds and the proof is complete.

Counterexample
Theorem 4.3.1 was proved under the condition ri - r i - 1 ~ 4. A counterexample in Reiss (1981 b) shows that this result does not hold if ri - ri - 1 = 1
for i = 1,2,00', k.
EXAMPLE 4.3.2. Let Xi: n be the ith order statistic of n i.i.d. standard exponential
r.v.'s (with common dJ. G and density g).
Then, if n 1/2 = o(k(n)) and [nq] + k(n) :5: n where q E (0, 1) is fixed, we
obviously have

P{Ui:n

and, with bi = i/(n

Ui- 1:n > 0 for i = [nq],oo., [nq]

+ 1) and J1.i =

lim sup P{g(J1.i)(Xi : n


n

+ k(n)}

= 1

G- 1(b;) it can be verified that

+ (bi - bi-d
[nq],oo., [nq] + k(n)} <

J1.i) - g(J1.i-d(Xi- 1:n

> 0 for i =

J1.i-1)

1.

Thus, the remainder term in Theorem 4.3.1 is not of order Ok/n)1/2) for the
sets

4.4. The Approximate Multivariate Normal Distribution

129

4.4. The Approximate Multivariate


Normal Distribution
From Section 4.3 we already know that normalized joint distributions of
central order statistics are asymptotically independent of the underlying dJ.
F. In Section 4.5 we shall prove that, under appropriate regularity conditions,
the joint distributions are approximately normal. In the present section we
introduce and study some properties of such normal distributions.
To find these approximate normal distributions it suffices to consider order
statistics Vr,:n::;; Vr2 :n ::;; ... ::;; V rk :n of n i.i.d. random variables uniformly
distributed on (0, 1). Put bi = rd(n + 1). Then the normalized order statistics
(n

+ 1)1/2(Vr,:n -

bi),

i = 1, ... , k,

have expectation equal to zero and co variances approximately equal to


bj ) for i ::;;j. Thus, adequate candidates of approximate joint normal
distribution of central order statistics are the k-variate normal distributions
N(o,l:.) with mean vector zero and covariance matrix ~ = (O"i,j) where O"i,j =
M1 - b) for 1 ::;; i ::;; j ::;; k. Below the bi are replaced by arbitrary Ai'

Ml -

Representations
Our first aim is to represent N(o,l:.) as a distribution induced by the kvariate standard normal distribution N(O,I) where I denotes the unit matrix.
Obviously, N(O,I) = N/'o, 1)' Given = Ao < A1 < ... < Ak < 1 define the linear
map Tby

(4.4.1)
TN(o,I) = N(o,l:.)

Lemma 4.4.1.

(that is, N(o, I) {T E B} = N(o,l:.)(B) for every Borel set B).

PROOF. Let T also denote the matrix which corresponds to the linear map.
The standard formula for normal distributions yields that T~O,I) has the
covariance matrix H = ('1i) = TTl where yt is the transposed of T. Thus,
~
Am - Am-1
'1i,j = (1 - Ai)(1 .- Aj) m~l (1 - Am- 1 )(1 - Am)

for i ~j.

By induction over j = 1, ... , k we get

Am - Am-l
m=l (1 - Am-d(l - Am)

and hence '1i,j = (1 - Ai)AJor i

~ j.

Aj
(1 - A)

Since '1i,j = '1j,i the proof is complete.

4. Approximations to Distributions of Central Order Statistics

130

From standard calculus for normal distributions we know that the density
of N(o,l:) is given by

((J(O,l:)

({J(O,l:)(x)

= [det1:-1/(2n)kr/2exp[-ht1:-1x]

(4.4.2)

where x = (x 1 , ... ,xS and 1:-1 is the inverse matrix of 1:. By elementary
calculations and by formula (4.4.4) below we get an alternative representation
of ({J(O,l:)' namely,
/

({J(O,l:)(x)

k+!
J-1 2
= [ (2n)k 1] (Ai - Ai-d
exp

where Ao = 0, Ak+1 = 1 and

and lXi.i-1

(lX i )

)2J

;:_11

(4.4.3)

is given by

Ai+! - A (Ai+1 - A;)(Ai - Ai-d'

i 1
= -,--:-------,---,--,--:-------:------:-

1,1

k+1 (x. - x
i~ ~i _

= Xk+1 = 0.

Xo

Lemma 4.4.2. (i) The matrix 1:- 1 =


IX .

[1-2
i

= 1, ... , k,

= lX i-l,i = -(Ai - Ai_d- 1, i = 2, ... , k, and lXi,i = 0, otherwise.

(ii) det 1:- 1 =

n (Ai -

k+1
i=l

Ai_d- 1.

(4.4.4)

PROOF. (i) Let T be defined as in (4.4.1). The inverse of T is represented by the


matrix B = (f3i) given by

1 - Ai - 1
J1 /2
[
f3i,i = (1 - Ai)(Ai - Ai-d
'

= 1, ... , k,

and
f3i,i-1 =

-[(1 _Ai~l~A~i- )J
Ai_1

/2

i=2, ... ,k,

and f3i,i = 0, otherwise. Notice that 1:- = BtB = n=~=l f3m,if3m,i]i,i and, thus,
lXi,i = f3ti + f3[+l,i, lXi, i-I = lXi-1,i = f3i,if3i,i-1 and lXi,i = 0, otherwise. The proof
of (i) is complete.
(ii) Moreover,
k 2
_12
k
-1 k 1 - Ai- 1
det 1: = (det B) =
f3i,i =
(Ai - Ai-d
i=l
i=l
i=l 1 - Ai
k+1
=
(Ai - Ai_1f 1.
o
i=l

n- - -:- - -': - - "-

Moments
Recall that the absolute moments of the standard normal distribution
are given by

N(O,l)

4.5. Asymptotic Normality and Expansions of Joint Distributions

I.
Xl

I I

X =

(O,1)()

1 . 3 . 5 ..... (j - 1)
(2j/n)1/2((j _ 1)/2)!

131

'f j even
j odd

(4.4.5)

for j = 1,2, ....


Since N(O,CICt) is the normal distribution induced by N(O,I) and the map
x --+ Cx where C is a m,k-matrix with rank m we know that the distribution
induced by N(o, I) and the map x --+ Xi - Xi- 1 is the univariate normal
distribution N(O,(A'--<'_I)(1-P.,- -<'-I)'
This together with (4.4.5) implies that

IXi - xi-1l j dN(o,I)(X)


1 35 ... (j - 1) [(Ai - )oi-d(1 - (Ai - Ai_1))]j/2
(2j/n)1/2((j - 1)/2)![(Ai - Ai- 1)(1 - ()oi - Ai_1]i/2

Further, by applying Lemma 4.4.1, we obtain for i

Ixl

Xi- 1 dN(o,I/X) =

if j even (4.4.6)
j odd.

= 2, ... ,

xixl- 1 dN(O,I)(X) =

k - 1,

o.

(4.4.7)

4.5. Asymptotic Normality and Expansions


of Joint Distributions
In the particular case of exponential r.v.'s we know that spacings are
independent so that it will be easy to deduce the asymptotic normality and
an expansion of the joint distribution of several central order statistics from
the corresponding expansion for a single order statistic.
In a second step the result will be extended to a larger class of order statistics
by using the transformation technique.
We will use the abbreviations of Section 4.4: Given positive integers n, k,
and ri with 1 :::::; r 1 < r2 < ... < rk :::::; n, put bi = rd(n + 1) and ai,j = bi(1 - bj)
for 1 :::::; i :::::; j :::::; k. Moreover, denote by N(o, I) the k-variate normal distribution
with mean vector zero and covariance matrix L = (ai ) . Again, the unit matrix
is denoted by I.

Normal Approximation: Exponential R.V.'s


First let us consider the case of order statistics from exponential r.v.'s. Before
treating the expansion of length two we shall discuss the result and the proof
in connection with the simpler normal approximation.
Let Xi:n be the ith order statistic of n i.i.d. standard exponential r.v.'s.
Denote by Pn the joint distribution of
i = 1, ... , k,

(4.5.1)

4. Approximations to Distributions of Central Order Statistics

132

where G is the standard exponential dJ. with density g. Moreover,


again the variational distance.

I II denotes

Theorem 4.5.1. For all positive integers k and ri with 0 = ro < r1 < r2 <
... < rk < rk+1 = n + 1 the following inequality holds:

(4.5.2)

where C = max(l, 2C2), C2 is the constant in Theorem 4.2.4 for m = 2, and Pn


is defined by
k+1
(4.5.3)
Pn = 2 L (ri - ri_1f 1.

i=l

Since L~~l (ri - ri- 1)/(n


P.3.9) that

+ 1) =

1 we infer from Jensen's inequality (see

Pn ~ 2k2/n
which shows that N(o."F.) will provide an accurate approximation to Pn only
if the number of order statistics under consideration is bounded away from
n 1/2 From the expansion of length 2 we shall learn that the bound in (4.5.2)
is sharp.
Next we make some comments about the proof of Theorem 4.5.1. Notice
that the asymptotic normality of several order statistics holds if the corresponding spacings have this property. Let Qn denote the joint distribution of
the normalized spacings

en

l)(~i-=-b~~~:(l -

bi)Y/2(X'i: n - X'H:n - (G- 1(bi ) - G- 1(bi_1)))


(4.5.4)

for i = 1, ... , k (with the convention that bo = 0 and G- 1 (bo ) = 0).


Denote again by T the map in (4.4.1) which transforms ~O,I) to N(o,"F.) [that
is, TN(o,I) = N(o, "F.)]' Since G- 1(bi ) = -log(l - bi) and hence g(G- 1(bi)) =
1 - bi it is easy to see that
Therefore,
(4.5.5)
On the right-hand side of (4.5.5) one has to calculate the variational distance
of the two product measures Qn :=
Qn,i and N(O,I) = N(~, 1) where Qn,i is
the distribution of the ith spacing as given in (4.5.4).
From Lemma 1.4.3 we know that spacings of exponential r.v.'s are distributed like order statistics of exponential r.v.'s. Since G- 1(b i) - G- 1(bi_1) =
G- 1ri - ri-d/(n - ri- 1 + 1)) we obtain that Qn,i is the distribution of the
normalized order statistic

Xt=l

4.5. Asymptotic Normality and Expansions of Joint Distributions

133

(mi + 1)3/2g(G- 1(s;/(mi + 1)))(X


_ G- 1 ( ./( . + 1)
(si(m i - Si + 11/2
.,:m,
S, m,
where mi = n - ri- 1 and Si = ri - ri- 1.
Section 3.3 provides the inequalities I Qn - N(o.l) I
as well as

IIQ. - N(o.l)II:::;;

Ct

(4.5.6)

:::;; L~=1 II Qn. i - N(O.I) II

/
H(Q"i,N(o.l)fY 2

where H denotes the Hellinger distance. The first inequality and upper bounds
of Wn.i - N(o.l)ll, i = 1, ... , k (compare with Corollary 4.2.7) lead to an
inaccurate upper bound of IIQn - N(o.l)II. The second inequality is not
applicable since a bound of the Hellinger distance between Qn. i and N(O.I) is
not at our disposal. The way out ofthis dilemma will be the use of an expansion
of length two.

Expansion of Length Two: Exponential R.V.'s


To simplify our notation we shall only establish an expansion of length two.
Expansions of length m can be proved by the same method.
Theorem 4.5.2. Let C, Xi:., ri, p. and Pn be as in Theorem 4.5.1. Then, the
following inequality holds:

s~p IPn(B) -

+ Lr.n)dN(o.l:)I:::;; Cexp(CPn)Pn

(1

(4.5.7)

where L r.n is the polynomial defined by


k

L r,n (x) = "~ Ll "i-'i-t,n-'i-l (xI"


x1- II"f l - 1 ,I.)
I II.'. i=1
with Ll,r.n defined as in Corollary 4.2.7, Xo = 0 and
l'i.j = (1 - bi)[(bj - bj- 1}/(l - bj-d(l - b)] 1/2.
PROOF.

From (4.5.6) and Corollary 4.2.7 it is immediate that

sup
Qn,'.(B) B
:::;;

C2

Jr (1 + L
B

n - ri-

l,ri _'i-t,n _'j-

(ri - ri-d(n - ri + 1)

) dN.(0,1)

I
(1)

=:

>:
C2 Ui'

The bound for the variational distance between product measures via the
variational distance between the single components (compare with Corollary

4. Approximations to Distributions of Central Order Statistics

134

A.3.4)) yields
sup
B

I(x
,=1

S;;

Qn.i)(B) -

C z exp [2C z

f TI
B,=l

(1

it it
bi ]

+ L1.ri-ri~l.n-ri-l(XJ)dN(~.l)(X)1
(2)

bi

Next we verify that the integral in (2) can be replaced by that in (4.5.7).
Lemma A.3.6, applied to gi = L1.ri-ri~l.n-ri~l' yields
sup

f TI [1 + L1.ri-ri~,.n-ri~l (xJ] dN(~.l)(X)


B i=l

-L[1 it L1.ri-ri~1.n-ri~1(Xi)]dN(~.1)(X)1
+

S;;

8- 1/Z exp [r1

S;;

I8- 1/Z ex p

(4.5.8)

.f fLi.ri-ri~l.n-ri~l dN(o.l)] .f fLi.ri-ri~l.n-ri~l dN(o.l)

,=1

[r1

,=1

it bi] it bi

where the last step is immediate from (4.2.7).


Check that L7=1 bi S;; Pn' Combining (2) and (4.5.8) we obtain
supl(x Qn.i)(B)-

,=1

S;;

r [1 +.f,=1 L1.ri-ri~1.n-ri~1(XJ]dNto.1)(X)1

JB

C z exp[2CZPn]Pn

I8- 1/Z exp[r 1Pn]Pn

S;;

(4.5.9)

Cexp(CPn)Pn-

Now, the transformation, as explained in (4.5.5), yields the desired inequality (4.5.7). For this purpose apply the transformation theorem for
densities. Note that the inverse S of T is given by

D
From (4.5.9) we also deduce for the normalized, joint distribution Pn of
order statistics that

S;;

1 p~/Z

+ O(Pn)

where the last inequality follows by means of the Schwarz inequality.


Notice that (4.5.10) is equivalent to (4.5.2) as far as the order of the normal
approximation is concerned. However, to prove (4.5.2) with the constant as
stated there one has to utilize a slight modification of the proof of Theorem
4.5.2.

4.5. Asymptotic Normality and Expansions of Joint Distributions

135

PROOF OF THEOREM 4.5.1. Applying Lemma A.3.6 again we obtain


sup
B

Ir

J ,=1 [1 + L 1,r,-r,_l,n-r,_1(x;)]dN/b,1)(x) - N(~'1)(B)1


B

(4.5.8')

::;; exp[3 -1 Pn] (Pn/6)1/2


showing that (4.5.2) can be proved in the same way as (4.5.7) by applying
(4.5.8') in place of (4.5.8).
0

Normal Approximation: General Case


Hereafter, let p. denote the joint distribution of the normalized order statistics
i

= 1, ... , k,

(4.5.11)

where Xi," is the ith order statistics of n i.i.d. random variables with common
dJ. F and density f, and bi = rj(n + 1). Recall that the covariance matrix L is
defined by (Ji,j = b;(1 - bJ for 1 ::;; i ::;; j ::;; k.
From Theorem 4.3.1 and 4.5.1 it is easily seen that under certain regularity
conditions,
(4.5.12)
with P. as in (4.5.3). The crucial point is that the underlying density is assumed
to possess three bounded derivatives. The aim of the following considerations
is to show that (4.5.12) holds if f has two bounded derivatives. The bound
O(p;/2) is sharp as far as the normal approximation is concerned, however,
p;/2 is of a larger order than the upper bound in Theorem 4.3.1.
Theorem 4.5.3. Denote by p. the joint distribution of the normalized order

statistics in (4.5.11). Assume that the underlying density f has two derivatives on the intervals Ii = (F- 1(b;) - 8 i, F- 1(b;) - 8 i), i = 1, ... , k, where 8 i =
5[(Ji,ilog(n)/(n + 1)] 1/2/f(F- 1(b;)). Moreover, assume that min(b 1, 1 - bk ) ~
10 log(n)/(n + 1).
Then there is a universal constant C > such that

lIP. - N(o,I:)II ::;; C(1 + d(f))p;/2


where d(f) = maxf=1 max~=1 (SUPYEI, If(j)(y)l/infyE1 ,fi+ 1(y)).
PROOF. In the first part of the proof we deal with the special case of order
statistics U"n of n i.i.d. random variables with uniform distribution on (0,1).
In this case, an application of Theorem 4.3.1 would yield a result which is only
slightly weaker than that stated above. The present method has the advantage
of being simpler than that of Theorem 4.3.1 and, moreover, it will also be
applicable in the second part.

4. Approximations to Distributions of Central Order Statistics

136

I. Let Qn denote the joint distribution of normalized order statistics


X" ,n' ... , X'k,n of standard exponential r.v.'s with common dJ. C and
density g. Write gi = g(C-l(b;)). Denote by Q~ the joint distribution of
(n + 1)1/2(U"n - b;), i = 1, ... , k. From Corollary 1.2.6 it is easily seen
that
J"
(1)

Q~ = TQn

where T(x)

= (Tl (X 1 ), . .. ,

T;(x;) = (n

1k(xd) and

1)1/2 ( C ( C- 1(b;)

for every x such that C- 1(b;) + x;/((n


Theorem 4.5.1 and (1) yield

+ (n + ~i)1/2g) -

+ 1)1/2g;) > 0, i

IIQ~ - N(o.dl ~ IITQn - TN(o.I:) II

~ Cp~/2

(2)

bi)

1, ... , k.

IITNro.I:) - N(o.I:)II

II TN(o.I:) - N(o.d

where, throughout, C denotes a universal constant that will not be the same
at each appearance. Thus, it remains to prove that
(3)

The inverse S of T is given by S(x)


Si(X i ) = (n

for x with

= (S1 (Xl)"", Sk(X k))

+ 1)1/2gi(C- 1(b i + x;/(n + 1)1/2) - C- 1(b;)),


< bi + x;/(n + 1)1/2 < 1. Inequality (3) holds if

where
i = 1, ... , k,

(4)

(5)

We prefer to prove (5) instead of (3) since this is the inequality that also has
to be verified in the second part of the proof with C replaced by F.
Denote by NT and Ns the restrictions of N(o.I:) to the domains DT of T and
Ds of S. Check that
IITN(o.I:) - N(o.dl ~ II(To S 0 T)N(o.I:) - (To S)N(o.dl

::;; IIN(o.I:) - SN(o.d

+ IINs -

+ N(o.I:)(D~) + N(o.I:)(Ds)

N(o.dl

which shows that (5) implies (3) since


(6)

(6) in conjunction with (A.3.5) yields


IIN(o.I:) - SN(o.I:)II

~ CPn + [2N(o.I:)(B") +

(-IOg(fdlo))dN(o.I:)J2

(7)

for sets B in the domain of T, and 10' 11 being the densities of N(o.I:) and
SNro.I:)' Applying the transformation theorem for densities (1.4.4) we
obtain

4.5. Asymptotic Normality and Expansions of Joint Distributions

X E

137

B, where

(with the convention that 1k+1 (Xk+1) = To(xo) = Xk +1 =


bo = 0).
Check that
-log 'Ii'(xJ = x;/(n
and, for Xi;;::: -(n

+ 1)1/2(1

Xo

= 0 and bk+1 = 1,

(9)

- bi)

+ 1)1/20"i,i'
(10)

Define
B = {x: Xi > -(10(logn)0";,;)1/2, i = 1, ... , k}.

Applying the inequality 1 -

~(x) ~

qJ(x)/x we obtain

~o,};)(BC) ~ n- 4.

The condition min(b 1, 1 - bd;;::: 10 log (n)/(n


holds for x E B for i = 1, ... , k. Since

(11)

+ 1)

yields Be DT and (10)

i = 1, ... , k,

Xi dN(o,};)(X) = 0,

(12)

we obtain, by applying (9) and the Schwarz inequality, that


IXil
Jr (kift log 'Ii,(Xi)) d~o,};)(X) ~ iftk Jr (n + 1)1/2(1
_
B

1JC

b;) dN(o,};)(X)

Cn- 1

(13)

Notice that according to (4.4.7),

i~

(xt - Xt-1)(Xi -

xi-dd~o,};)(x) = 0,

(14)

and hence, applying (4.4.5) and (4.4.6), we obtain by means of some straightforward calculations that

i(
B

k~ c5i(X)(Xi - xi-d + c5f (X)/2) dN.


.L...

1=1

i -

i-1

() < C

(o,};) X

Pn

Combining (11), (13), and (15) we see that the assertion of Part I holds.

(15)

4. Approximations to Distributions of Central Order Statistics

138

II. Notice that Pn = SQ: where S is defined as in (4) with G and gi replaced
by F and f(F-l(bJ). Using Taylor expansions oflog I;'(xJ and I;(xJ the proof
of this part runs along the lines of Part I.
0

Final Remarks
In Reiss (1981a) one can also find expansions of length m > 2 for the joint
distribution of central order statistics of exponential r.v.'s. Starting with this
special case, one may derive expansions in case of r.v.'s with sufficiently
smooth dJ. by using the method as adopted in Reiss (1975a); that is, one has
to expand the densities and to integrate the densities over Borel sets in a more
direct way.

4.6. Expansions of Distribution Functions


of Order Statistics
In Sections 4.2 and 4.5, expansions of distributions of central order statistics
were established which hold w.r.t. the variational distance. These expansions
can be represented by means of polynomials that are densities W.r.t. the
standard normal distribution.
Expansions for dJ.'s can be written in a way which is more adjusted to dJ.'s
The results for dJ.'s of order statistics hold under conditions which are weaker
than those required for approximations in the strong sense. Along with the
reformulation of the results of Section 4.2 we shall study expansions of d.f.'s
of order statistics under conditions that hold for order statistics of discrete
r.v.'s.
Write again
a;,n = r(n - r + 1)/(n

+ 1)3

and

br,n = r/(n

+ 1).

Continuous D.F.'s
First, the results of Section 4.2 will be rewritten in terms of d.f.'s.
Corollary 4.6.1. Under the conditions of Theorem 4.2.4 there exist polynomials
Si,r,n of degree ~ 3i - 1 such that

s~p IP{ a'::~f(F-l (br,n))(Xr:n -

~ em [(n/r(n -

p-l

r))m/2

(br,n))

~ t} -

( <l>(t)

+ rr;!X laj,r,nl m/j ]

where aj,r,n are the terms in Theorem 4.2.4.

+ <pet) ~~l

Si,r,n(t))

(4.6.1)

4.6. Expansions of Distribution Functions of Order Statistics


PROOF.

139

Apply Lemma 3.2.6.

Let us note the explicit form of Sl.r,n and S2,r,n' We have

(qJSi,r,n)'

= qJLi,r,n

with Li,r,n as in Addendum 4.2.5. Moreover,

n-2r+1
2
2
Sl,r,n(t) = 3 [r(n _ r + 1)(n + 1)] 1/2 (1 - t ) + (Xl,r,n t /2

(4.6.2)

and
_

(
1)(
.. n(t) - rn-r+
n+ 1) [-(n - 2r

S2 r

+ [7(n - 2r
t

(Xj,r,n

+ 1)2 + 3r(n - r + 1)](3t + t 3 )/12 + (n - r + 1)2t]

+ (X l,r,n -2 L l,r,n (t)


with

+ 1) (1St + St + t )/18

t5
(X2l,r,n -8

t3

+ (X 2,r,n -6

(4.6.3)

as in Theorem 4.2.4 and L 1 ,r,n as in (4.2.2).

EXAMPLE 4.6.2. We have

<em ( r(n - r)

)m12

where Si,r,n are the polynomials of Corollary 4.6.1 with

(Xi,r,n

= O.

Discrete D.F.'s
The conditions of Theorem 4.2.4 exclude discrete d.f.'s F. The key idea of the
following is to approximate the d.f. F (which may be discrete) by some function
G which fulfills an appropriate Taylor expansion.
As an example we shall treat the case of d.f.'s F that permit an Edgeworth
expansion (like binomial d.f.'s).
We start with a technical lemma.

Lemma 4.6.3. Let Xi:n be the order statistics of n i.i.d. random variables with
common dl. F.
Let G be a function and u a fixed real number such that for all reals y,

IG(u +

y) - G(u) -

.f ~: yi I::; (m m++11)IIYlm+1.
.

,=1 L

(4.6.5)

4. Approximations to Distributions of Central Order Statistics

140

Then, if C 1 > 0 there exists a universal constant Cm > 0 and polynomials Si.r,n
of degree ~ 3i - 1 such that for all reals t the following inequality holds:

Ip {a;::~c1 (Xr:n <


Cm [(
-

PROOF.

u)

r (n

~ t} -

n
- r

+ 1)

( <I>(t)

+ cp(t) :~1

)m/2 + am

Si,r,n(t)) I

(c. /c j +1 )mfj
r,n max
j=l )+1 1

(4.6.6)

Writing x = u + tar,n/C1 we get

P{a;::~c1(Xr:n - U) ~ t}

P{a;::~(Ur:n - br,n) ~ a;::~(F(x) - br,nn.

Denote by Si,r.n the polynomials of Example 4.6.2. Since


a;::~(F(x) - br,n) = a;::~(F(x) - G(x))

+ V(t) + a;::~(G(u) -

br,n),

with V(t) = a;::~(G(x) - G(u)), it is immediate from Example 4.6.2 that

Ip{a;::~C1(Xr:n -

~C

m[

u)

~ t} -

C(n _: +

[ <I> (V(t))

l)r

/2

+ cp(V(t)) :~1 Si,r,n(V(t))JI

+ a;::!(IF(x) -

G(x)1

IG(u) - br,nD].

Using condition (4.6.5) we obtain an expansion of V(t) oflength m, namely,


V(t) = t

+L
m

i=2

a i- 1

~ r'in t i + em(t)

dC 1

(m

m+1

am
r':+l Itl m+1

+ 1)!c 1

where lem(t)1 ~ 1. Now arguments analogous to those of the proof to Theorem


D
4.2.4 lead to (4.6.6).
The polynomials in Lemma 4.6.3 are of the same form as those in Corollary
4.6.1 with aj,r,n replaced by a!,nCj+1/c{+l.
Next, Lemma 4.6.3 will be specialized to dJ.'s F == FN permitting an
Edgeworth expansion G == GM,N of the form
M-1
GM,N(t) = <I>(t) + cp(t) L N- i/2 Qi(t)
i=l

where M and N are positive integers, and Qi is a polynomial for i =


1, ... , M - 1. Let us assume that
(4.6.7)
uniformly over tEl where I will be specified below.
If FN stems from a N-fold convolution, typically one has the following two
cases:

4.6. Expansions of Distribution Functions of Order Statistics

141

(i) I is the real line if the Cramer-von Mises condition holds,


(ii) 1= {y + kh: k integer} where y and h > 0 are fixed.
Moreover, define an "inverse" GZt.N of GM,N by
M-1
GZt,N = <1>-1 + I N- i/2 Qt(<I>-1)
i~l

where the Qt are the polynomials as described in Pfanzagl (1973c), Lemma 7.


We note that
and

(4.6.8)

Since GM,N is an approximation to FN we know that GZt,N is an approximation to Fli 1. As an application of Lemma 4.6.3 to F == FN, G == GM,N' and
u = GZt,N(br,n) we obtain the following

Corollary 4.6.4. Under condition (4.6.7) there exists em, M > 0 such that for every
positive integer n, r E {1, ... , n} and tEl:
IP{X"n :s; t} - ( <I>

+ ({J ~~1 Si,r,n}SM(t I

(4.6.9)

where
SM(t) = a;'~ G~,N[GZt,N(br,n)] (t - GZt,N(br,n))
and the Si,r,n are the polynomials of Lemma 4.6.3 with Ci = GX},N(GZt,N(br,n))'
PROOF. To make Lemma 4.6.3 applicable one has to verify that

$$G_{M,N}(G^*_{M,N}(b_{r,n})) = b_{r,n} + O(N^{-m/2}). \qquad (1)$$

It suffices to prove that (1) holds uniformly over all r and n such that $|\Phi^{-1}(b_{r,n})| = O(\log N)$. A standard technique [see Pfanzagl (1973c), page 1016] yields

(2)

uniformly over $|t| = O(\log N)$, where $\bar G_{M,N}(t) = t + \sum_{i=1}^{M-1} N^{-i/2} Q^*_i(t)$. Thus, (1) is immediate from (2) applied to $t = \Phi^{-1}(b_{r,n})$. □

To exemplify the usefulness of Corollary 4.6.4 we study the d.f. of an order statistic $X_{r:n}$ of n i.i.d. binomial r.v.'s with parameters N and $p \in (0, 1)$. It is clear that


4. Approximations to Distributions of Central Order Statistics

$$P\{X_{r:n} \le t\} = \sum_{i=r}^{n} \binom{n}{i} F_N(t)^i \bigl(1 - F_N(t)\bigr)^{n-i}$$

where

$$F_N(t) = \sum_{k=0}^{[t]} \binom{N}{k} p^k (1-p)^{N-k}$$

with [ ] denoting the integer function. Moreover, $P\{X_{r:n} \le t\} = P\{X_{r:n} \le [t]\}$, so that $P\{X_{r:n} \le t\}$ has to be evaluated at $t \in \{0, \dots, N\}$ only.
As an approximation to the normalized version of $F_N$ we use the standard normal d.f. $\Phi$ and the Edgeworth expansion $\Phi + N^{-1/2}\varphi Q_1$ of length 2 where (see Bhattacharya and Rao (1976), Theorem 23.1)

$$Q_1(t) = \frac{(2p-1)t^2 + (4-2p)}{6(p(1-p))^{1/2}}.$$

Table 4.6.1. Maximum Absolute Deviation of Exact Values and Expansions

            p = .5, N = n    p = .2, N = n    p = .2, N = [n^{4/3}]   p = .5, N = [n^{4/3}]
            r = [n/2]        r = [n/4]        r = [n/4]               r = [n/2]
(m, M)      (1,1)   (2,2)    (1,1)   (2,2)    (1,1)   (2,2)           (1,1)   (2,2)
n = 20      .33     .01      .35     .01      .29     .007            .27     .006
n = 80      .38     .002     .32     .003     .22     .002            .20     .0028
n = 200     .42     .0001    .31     .0001    .20     .0007           .16     .0005

Table 4.6.1 presents a numerical comparison of the approximations in Corollary 4.6.4 in the special cases $(m, M) = (1, 1)$ and $(m, M) = (2, 2)$. Thus, if $(m, M) = (1, 1)$ we compute the maximum value of

$$\Bigl| P\{X_{r:n} \le k\} - \Phi\Bigl[ a_{r,n}^{-1}\, \varphi(\Phi^{-1}(b_{r,n})) \Bigl( \frac{k - Np}{(Np(1-p))^{1/2}} - \Phi^{-1}(b_{r,n}) \Bigr) \Bigr] \Bigr|$$

over $k = 0, \dots, N$.
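The comparison behind the (m, M) = (1, 1) columns of Table 4.6.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the computation used for the table: the names `binom_pmf`, `order_stat_cdf` and `max_deviation_11` are hypothetical, the exact d.f. is evaluated through the identity $P\{X_{r:n} \le k\} = P\{\mathrm{Bin}(n, F_N(k)) \ge r\}$, and binomial probabilities are computed in log space so that $N = [n^{4/3}]$ does not overflow.

```python
from math import exp, lgamma, log, log1p, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf, pdf and inv_cdf


def binom_pmf(j, m, q):
    """P{Bin(m, q) = j}, computed in log space so that large m does not overflow."""
    if q <= 0.0:
        return 1.0 if j == 0 else 0.0
    if q >= 1.0:
        return 1.0 if j == m else 0.0
    log_coef = lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1)
    return exp(log_coef + j * log(q) + (m - j) * log1p(-q))


def order_stat_cdf(F_t, r, n):
    """P{X_{r:n} <= t} = P{Bin(n, F(t)) >= r}, given the value F_t = F(t)."""
    return sum(binom_pmf(i, n, F_t) for i in range(r, n + 1))


def max_deviation_11(n, N, p, r):
    """Max over k = 0, ..., N of |P{X_{r:n} <= k} - (1, 1) approximation|
    for n i.i.d. Bin(N, p) variables."""
    a = sqrt(r * (n - r + 1) / (n + 1) ** 3)   # a_{r,n}
    z = STD_NORMAL.inv_cdf(r / (n + 1))        # Phi^{-1}(b_{r,n})
    slope = STD_NORMAL.pdf(z) / a              # a_{r,n}^{-1} phi(Phi^{-1}(b_{r,n}))
    sd = sqrt(N * p * (1 - p))
    worst, F_k = 0.0, 0.0
    for k in range(N + 1):
        F_k = min(1.0, F_k + binom_pmf(k, N, p))   # F_N(k), accumulated
        exact = order_stat_cdf(F_k, r, n)
        approx = STD_NORMAL.cdf(slope * ((k - N * p) / sd - z))
        worst = max(worst, abs(exact - approx))
    return worst
```

For instance, `max_deviation_11(20, 20, 0.5, 10)` corresponds to the setting p = .5, N = n = 20, r = [n/2]; no claim is made here that the returned value reproduces the tabulated .33 to the last digit.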

4.7. Local Limit Theorems and Moderate Deviations


In Section 4.2 we proved expansions of distributions of single order statistics uniformly over the Borel sets. The main technical tool was an expansion of one factor of the density (compare with the proof of (4.7.2)). The expansion of the density was not given explicitly, so as to concentrate our attention on the result of statistical relevance, namely, the expansion of distributions.
The final section of this chapter is the proper place to give some explicit formulas for expansions of densities with an error bound that is nonuniform in x. By integration we shall also get inequalities which are relevant for probabilities of moderate deviations.


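Let again

$$a_{r,n}^2 = r(n - r + 1)/(n + 1)^3 \quad\text{and}\quad b_{r,n} = r/(n + 1).$$

Denote again by $U_{r:n}$ the rth order statistic of n i.i.d. (0, 1)-uniformly distributed r.v.'s. From Lemma 3.1.1 we obtain

$$P\{a_{r,n}^{-1}|U_{r:n} - b_{r,n}| \ge \varepsilon\} \le 2\exp\Bigl(-\frac{\varepsilon^2}{3[1 + n^{-1} + \varepsilon/(a_{r,n} n)]}\Bigr), \qquad \varepsilon > 0. \qquad (4.7.1)$$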

A refinement of this result will be obtained in the second part of this section.
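Inequality (4.7.1) can be compared numerically with the exact two-sided tail, because $P\{U_{r:n} \le u\} = P\{\mathrm{Bin}(n, u) \ge r\}$. The Python sketch below uses hypothetical helper names and merely illustrates the bound for a few choices of r, n and $\varepsilon$; it proves nothing.

```python
from math import exp, lgamma, log, log1p, sqrt


def binom_pmf(j, m, q):
    """P{Bin(m, q) = j} in log space (q strictly inside (0, 1))."""
    return exp(lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1)
               + j * log(q) + (m - j) * log1p(-q))


def lower_tail(u, r, n):
    """P{U_{r:n} <= u} = P{Bin(n, u) >= r}."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return sum(binom_pmf(i, n, u) for i in range(r, n + 1))


def upper_tail(u, r, n):
    """P{U_{r:n} >= u} = P{Bin(n, u) <= r - 1}."""
    if u <= 0.0:
        return 1.0
    if u >= 1.0:
        return 0.0
    return sum(binom_pmf(i, n, u) for i in range(r))


def exact_tail(eps, r, n):
    """P{a_{r,n}^{-1} |U_{r:n} - b_{r,n}| >= eps}."""
    a = sqrt(r * (n - r + 1) / (n + 1) ** 3)
    b = r / (n + 1)
    return lower_tail(b - eps * a, r, n) + upper_tail(b + eps * a, r, n)


def bound_471(eps, r, n):
    """Right-hand side of (4.7.1)."""
    a = sqrt(r * (n - r + 1) / (n + 1) ** 3)
    return 2.0 * exp(-eps ** 2 / (3.0 * (1.0 + 1.0 / n + eps / (a * n))))
```

The exact tail decays roughly like $\exp(-\varepsilon^2/2)$, whereas the bound only decays like $\exp(-\varepsilon^2/3)$ for moderate $\varepsilon$, which is the gap addressed by the refinement mentioned above.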

Local Limit Theorems


Denote by $g_{r,n}$ the density of $a_{r,n}^{-1}(U_{r:n} - b_{r,n})$ and by $\Phi$ and $\varphi$ the standard normal d.f. and density. The simplest "local limit theorem" is given by the inequality

$$|g_{r,n}(x) - \varphi(x)| \le C\varphi(x) \Bigl(\frac{n}{r(n - r + 1)}\Bigr)^{1/2} (1 + |x|^3) \qquad (4.7.2)$$

which holds for

$$x \in A(r,n) := \{x: |x| \le (r(n - r)/n)^{1/6}\} \qquad (4.7.3)$$

where the constant $C > 0$ is independent of x.
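A numerical look at (4.7.2): $U_{r:n}$ has the Beta(r, n - r + 1) density, so $g_{r,n}$ is available in closed form. The Python sketch below (hypothetical function names, arbitrary grid resolution) computes the largest value of $|g_{r,n}(x) - \varphi(x)|/(\varphi(x)(1 + |x|^3))$ over a grid in $A(r,n)$ and returns it together with the rate $(n/(r(n-r+1)))^{1/2}$; it illustrates the order of the error but does not determine the constant C.

```python
from math import exp, lgamma, log, log1p, pi, sqrt


def g_rn(x, r, n):
    """Density of a_{r,n}^{-1}(U_{r:n} - b_{r,n}); U_{r:n} ~ Beta(r, n - r + 1)."""
    a = sqrt(r * (n - r + 1) / (n + 1) ** 3)
    b = r / (n + 1)
    u = b + a * x
    if not 0.0 < u < 1.0:
        return 0.0
    log_beta = (lgamma(n + 1) - lgamma(r) - lgamma(n - r + 1)
                + (r - 1) * log(u) + (n - r) * log1p(-u))
    return a * exp(log_beta)


def std_normal_pdf(x):
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)


def scaled_error(r, n, points=401):
    """Max over a grid in A(r, n) of |g - phi| / (phi(x)(1 + |x|^3)),
    returned together with the rate (n / (r(n - r + 1)))^{1/2} of (4.7.2)."""
    half_width = (r * (n - r) / n) ** (1.0 / 6.0)
    worst = 0.0
    for k in range(points):
        x = -half_width + 2.0 * half_width * k / (points - 1)
        phi_x = std_normal_pdf(x)
        worst = max(worst, abs(g_rn(x, r, n) - phi_x) / (phi_x * (1.0 + abs(x) ** 3)))
    return worst, sqrt(n / (r * (n - r + 1)))
```

Comparing, say, `scaled_error(5, 20)` with `scaled_error(500, 2000)` shows the scaled error shrinking roughly in proportion to the rate.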


To prove (4.7.2) let us follow the lines of the proof of Theorem 4.2.1. The density $g_{r,n}$ of $a_{r,n}^{-1}(U_{r:n} - b_{r,n})$ is written as $p_{r,n}h_{r,n}$, where $p_{r,n}$ is a normalizing constant. From the proof of Theorem 4.2.1 (1) we know that

(4.7.4)

for $x \in A(r, n)$.
We also need an expansion of the factor $p_{r,n}$. By integration over an interval B we get, uniformly in r and n, that

$$p_{r,n} = \frac{\int_B g_{r,n}(x)\,dx}{\int_B h_{r,n}(x)\,dx} = \frac{P\{a_{r,n}^{-1}(U_{r:n} - b_{r,n}) \in B\}}{(2\pi)^{1/2} N_{(0,1)}(B) + O((n/(r(n-r)))^{1/2})} = (2\pi)^{-1/2} + O((n/(r(n-r)))^{1/2}) \qquad (4.7.5)$$

where the final step is immediate by specifying $B = \{x: |x| \le \log(r(n-r)/n)\}$ and applying (4.7.1) to $\varepsilon = \log(r(n-r)/n)$.
An expansion of length m can be established in the same way. For some constant $C_m > 0$ we get

$$\Bigl|g_{r,n}(x) - \varphi(x)\Bigl(1 + \sum_{i=1}^{m-1} L_{i,r,n}(x)\Bigr)\Bigr| \le C_m \varphi(x) \Bigl(\frac{n}{r(n-r+1)}\Bigr)^{m/2} (1 + |x|^{3m}) \qquad (4.7.6)$$

for $x \in A(r, n)$ with polynomials $L_{i,r,n}$ as given in Theorem 4.2.1.


In analogy to Theorem 4.2.4 we also establish an expansion of the density of the normalized rth order statistic under the condition that the underlying d.f. has m + 1 derivatives.
Theorem 4.7.1. For some $r \in \{1, \dots, n\}$ let $X_{r:n}$ be the rth order statistic of n i.i.d. random variables with common d.f. F and density f. Assume that $f(F^{-1}(b_{r,n})) > 0$ and that the function $S_{r,n}$ defined by

$$S_{r,n}(x) = a_{r,n}^{-1}\bigl(F[F^{-1}(b_{r,n}) + x a_{r,n}/f(F^{-1}(b_{r,n}))] - b_{r,n}\bigr)$$

has m + 1 derivatives on the interval $I_{r,n} := \{x: |x| \le c_{r,n}\}$ where $\log(r(n-r)/n) \le c_{r,n} \le (r(n-r)/n)^{1/6}/2$. Denote by $f_{r,n}$ the density of $a_{r,n}^{-1} f(F^{-1}(b_{r,n}))(X_{r:n} - F^{-1}(b_{r,n}))$.
Then there exists a constant $C_m > 0$ (only depending on m) such that

$$\Bigl|f_{r,n}(x) - \varphi(x)\Bigl(1 + \sum_{i=1}^{m-1} L_{i,r,n}(x)\Bigr)\Bigr| \le C_m \varphi(x)(1 + |x|^{3m}) \Bigl[\Bigl(\frac{n}{r(n-r)}\Bigr)^{m/2} + \sum_{j=1}^{m} |a_{j,r,n}|^{m/j}\Bigr] \qquad (4.7.7)$$

for $x \in I_{r,n}$ with polynomials $L_{i,r,n}$ as given in Theorem 4.2.4. Moreover,

$$a_{j,r,n} = S_{r,n}^{(j+1)}(0), \quad j = 1, \dots, m-1, \qquad a_{m,r,n} = \sup\{|S_{r,n}^{(m+1)}(x)|: x \in I_{r,n}\}.$$
PROOF. We give a short sketch of the proof. Check that

$$f_{r,n} = S'_{r,n}\, g_{r,n}(S_{r,n})$$

with $g_{r,n}$ as above. Applying (4.7.6) we obtain

$$\Bigl|f_{r,n} - S'_{r,n}\varphi(S_{r,n})\Bigl(1 + \sum_{i=1}^{m-1} L_{i,r,n}(S_{r,n})\Bigr)\Bigr| \le C_m |S'_{r,n}|\,\varphi(S_{r,n})\bigl(n/(r(n-r))\bigr)^{m/2}\bigl(1 + |S_{r,n}|^{3m}\bigr)$$

with polynomials $L_{i,r,n}$ as given in (4.7.6). Now, using Taylor expansions of $S'_{r,n}$ and $S_{r,n}$ about zero and of $\varphi$ about x, we obtain the desired result by arranging the terms in the appropriate order. □

Moderate Deviations
We shall only study a simple application of (4.7.1). It will be shown that the right-hand side of (4.7.1) can be replaced by a term $C\exp(-\varepsilon^2/2)/\varepsilon$ for certain $\varepsilon$.


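Lemma 4.7.2. For some constant $C > 0$,

(i)
$$\bigl|P\{a_{r,n}^{-1}(U_{r:n} - b_{r,n}) \in B\} - N_{(0,1)}(B)\bigr| \le C\Bigl(\frac{n}{r(n-r+1)}\Bigr)^{1/2} \int_B (1 + |x|^3)\varphi(x)\,dx$$
for every Borel set $B \subset A(r, n)$ [defined in (4.7.3)].

(ii) Moreover,
$$P\{a_{r,n}^{-1}|U_{r:n} - b_{r,n}| \ge \varepsilon\} \le C\exp(-\varepsilon^2/2)/\varepsilon$$
if $\varepsilon \le (r(n - r + 1)/n)^{1/6}/2$.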

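PROOF. (i) is immediate from (4.7.2) by integrating over B.
(ii) follows from (4.7.1) and (i). Put $d = (r(n - r + 1)/n)^{1/6}$. We get

$$\begin{aligned} P\{a_{r,n}^{-1}|U_{r:n} - b_{r,n}| \ge \varepsilon\} &= P\{a_{r,n}^{-1}|U_{r:n} - b_{r,n}| \ge d\} + P\{\varepsilon \le a_{r,n}^{-1}|U_{r:n} - b_{r,n}| < d\} \\ &\le 2\exp\Bigl(-\frac{d^2}{3[1 + n^{-1} + d/(a_{r,n}n)]}\Bigr) + C\Bigl(1 - \Phi(\varepsilon) + \Bigl(\frac{n}{r(n-r+1)}\Bigr)^{1/2}\int_\varepsilon^\infty |x|^3\varphi(x)\,dx\Bigr) \\ &\le C\exp(-\varepsilon^2/2)/\varepsilon \end{aligned}$$

where the final step is immediate from (3.2.3) and (3.2.12).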

P.4. Problems and Supplements


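1. (Asymptotic d.f.'s of central order statistics)
(i) Let $r(n) \in \{1, \dots, n\}$ be such that $n^{1/2}(r(n)/n - q) \to 0$, $n \to \infty$, for some $q \in (0, 1)$. The possible nondegenerate limiting d.f.'s of the sequence of order statistics $X_{r(n):n}$ of i.i.d. r.v.'s are of the following type:

$$H_{1,\alpha}(x) = \begin{cases} 0 & \text{if } x < 0,\\ \Phi(x^\alpha) & \text{if } x \ge 0,\end{cases} \qquad H_{2,\alpha}(x) = \begin{cases} \Phi(-(-x)^\alpha) & \text{if } x < 0,\\ 1 & \text{if } x \ge 0,\end{cases}$$

$$H_{3,\alpha,\sigma}(x) = H_{1,\alpha}(x/\sigma)1_{[0,\infty)}(x) + H_{2,\alpha}(x)1_{(-\infty,0)}(x),$$

$$H_4 = \bigl(1_{[-1,\infty)} + 1_{[1,\infty)}\bigr)/2,$$

where $\alpha, \sigma > 0$.
(Smirnov, 1949)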
(ii) There exists an absolutely continuous d.f. F such that for every $q \in [0, 1]$ and every d.f. H there exists r(n) with $r(n)/n \to q$ and $\min(r(n), n - r(n)) \to \infty$ as $n \to \infty$ having the following property: Let $X_{r(n):n}$ denote the r(n)th order statistic of n i.i.d. random variables with common d.f. F. Then, the d.f. of $a_n^{-1}(X_{r(n):n} - b_n)$ converges weakly to H for certain $a_n > 0$ and $b_n$.
(Balkema and de Haan, 1978b)
(iii) The set of all d.f.'s F such that (ii) holds is dense in the set of d.f.'s w.r.t. the topology of weak convergence.
(Balkema and de Haan, 1978b)
(iv) Let $X_1, X_2, X_3, \dots$ be a stationary, standard normal sequence with covariances $\rho(n) = EX_1X_{n+1}$ satisfying the condition $\sum_{n=1}^\infty |\rho(n)| < \infty$. Let $r(n) \in \{1, \dots, n\}$ be such that $r(n)/n \to \lambda$, $n \to \infty$, where $0 < \lambda < 1$. Denote by $X_{r(n):n}$ the r(n)th order statistic of $X_1, \dots, X_n$. Then, for every x,

$n \to \infty$,

where

(Rootzén, 1985)
2. Let again $N_{(\mu,\Sigma)}$ be a k-variate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma = (\sigma_{ij})$. Moreover, let I denote the unit matrix.
(i) Prove that
$$\|N_{(0,\Sigma)} - N_{(0,I)}\| \le \frac{1}{2}\Bigl[\sum_{i=1}^k (\sigma_{ii} - 1) - \log(\det \Sigma)\Bigr]^{1/2}.$$
[Hint: Apply (4.4.2) and an inequality involving the Kullback-Leibler distance.]
(ii) If $\Sigma$ is a diagonal matrix then (i) yields

(iii) Alternatively,
$$\|N_{(0,\Sigma)} - N_{(0,I)}\| \le k2^{k+1}\|\Sigma - I\|_2,$$
where $\|\cdot\|_2$ denotes the Euclidean norm.
(Pfanzagl, 1973b, Lemma 12)
(iv) Denote again by K the Kullback-Leibler distance. Prove that
$$K(N_{(\mu,I)}, N_{(0,I)}) = 2^{-1}\|\mu\|_2^2.$$
(v) Prove that
$$\|N_{(\mu_1,I)} - N_{(\mu_2,I)}\| \le 2^{-1}\|\mu_1 - \mu_2\|_2.$$
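Parts (iv) and (v) admit a quick numerical cross-check, because the variational distance between two normal distributions with the same unit covariance matrix reduces to a one-dimensional problem: the supremum is attained at a half-space orthogonal to $\mu_1 - \mu_2$, which gives $2\Phi(\|\mu_1 - \mu_2\|_2/2) - 1$. The Python sketch below (hypothetical names) evaluates this closed form next to the Kullback-Leibler distance of (iv); it is an illustration, not a solution of the problem.

```python
from math import sqrt
from statistics import NormalDist


def kl_shifted_normals(mu1, mu2):
    """K(N(mu1, I), N(mu2, I)) = |mu1 - mu2|_2^2 / 2, cf. (iv)."""
    return sum((x - y) ** 2 for x, y in zip(mu1, mu2)) / 2.0


def variational_distance_shifted_normals(mu1, mu2):
    """sup_B |N(mu1, I)(B) - N(mu2, I)(B)| = 2 Phi(delta / 2) - 1,
    where delta is the Euclidean distance between the mean vectors."""
    delta = sqrt(sum((x - y) ** 2 for x, y in zip(mu1, mu2)))
    return 2.0 * NormalDist().cdf(delta / 2.0) - 1.0
```

Since $2\Phi(\delta/2) - 1 \le 2\varphi(0)\,\delta/2 < \delta/2$, the closed form is consistent with the bound in (v).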

3. Let $N_{(0,\Sigma)}$ be the k-variate normal distribution given in Lemma 4.4.1. Define the
linear map S by


Then, with I denoting the unit matrix, we have

(Reiss, 1975a)
4. (Spacings)
Given $1 \le r_1 < \dots < r_k \le n$ put again $\lambda_i = r_i/(n+1)$, $\sigma_{i,j} = \lambda_i(1 - \lambda_j)$ for $1 \le i \le j \le k$, and $f_i = F'(F^{-1}(\lambda_i))$. Moreover, we introduce

$$a_i^2 = \sigma_{i-1,i-1}/f_{i-1}^2 - 2\sigma_{i-1,i}/(f_{i-1}f_i) + \sigma_{i,i}/f_i^2$$

for $i = 1, \dots, k$ (with the convention that $a_1^2 = \sigma_{1,1}/f_1^2$).
Let $X_{i:n}$ be the order statistics of n i.i.d. random variables with common d.f. F. Denote by $Q_n$ the joint distribution of the normalized spacings

i = 1, ..., k,

and by $P_n$ the joint distribution of the normalized order statistics

i = 1, ..., k.

After this long introduction we can offer some simple problems.
(i) Show that
$$\|Q_n - N_{(0,I)}\| \le \|P_n - N_{(0,\Sigma)}\| + \Delta^{1/2}$$
where I is the unit matrix, $\Sigma = (\sigma_{i,j})$ and
$$\Delta = 1 - (1 - \lambda_k)^{1/2}\prod_{i=1}^{k} (\lambda_i - \lambda_{i-1})^{1/2}/(a_i f_i).$$
(ii) $\Delta = 0$ if $k = 1$.
(iii) If F is the uniform d.f. on (0, 1) then

and as one could expect

5. (Asymptotic expansions centered at $F^{-1}(q)$)
Let $q \in (0, 1)$ be fixed. Assume that the d.f. F has m + 1 bounded derivatives on a neighborhood of $F^{-1}(q)$, and that $f(F^{-1}(q)) > 0$ where $f = F'$. Moreover, assume that $r(n)/n - q = O(n^{-1})$. Put $\sigma^2 = q(1 - q)$. Then there exist polynomials $S_{i,n}$ of degree $\le 3i - 1$ (having coefficients uniformly bounded over n) such that

(i)
$$\sup_B \Bigl| P\Bigl\{\frac{n^{1/2} f(F^{-1}(q))}{\sigma}\bigl(X_{r(n):n} - F^{-1}(q)\bigr) \in B\Bigr\} - \int_B dG_{r(n),n} \Bigr| = O(n^{-m/2})$$
where
$$G_{r(n),n} = \Phi + \varphi\sum_{i=1}^{m-1} n^{-i/2} S_{i,n}.$$
In particular,
$$S_{1,n}(t) = \Bigl[\frac{2q-1}{3\sigma} + \frac{\sigma f'(F^{-1}(q))}{2f(F^{-1}(q))^2}\Bigr]t^2 + \frac{-q + nq - r(n) + 1}{\sigma} + \frac{2(2q-1)}{3\sigma}.$$

(ii) If the condition $r(n)/n - q = O(n^{-1})$ is replaced by $r(n)/n - q = o(n^{-1/2})$ then (i) holds for m = 2 with $O(n^{-1})$ replaced by $o(n^{-1/2})$.
(iii) Formulate weaker conditions under which (i) holds uniformly over intervals.
(iv) Denote by $f_{r(n),n}$ the density of the normalized distribution of $X_{r(n):n}$ in (i), and put $g_{r(n),n} = G'_{r(n),n}$. Show that
$$|f_{r(n),n}(x) - g_{r(n),n}(x)| = O\bigl(n^{-m/2}\varphi(x)(1 + |x|^{3m})\bigr)$$
uniformly over $x \in [-\log n, \log n]$.
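For the uniform d.f. on (0, 1) one has f = 1 and f' = 0, so the f'-term of $S_{1,n}$ in (i) drops out, and the exact d.f. of the normalized order statistic is available through $P\{U_{r:n} \le u\} = P\{\mathrm{Bin}(n, u) \ge r\}$. The Python sketch below (hypothetical names, arbitrary grid) compares the maximal error of $\Phi$ with that of $\Phi + n^{-1/2}\varphi S_{1,n}$ for one choice of q, n and r(n); it illustrates the gain from the first correction term under these assumptions only.

```python
from math import exp, lgamma, log, log1p, sqrt
from statistics import NormalDist

STD = NormalDist()


def binom_pmf(j, m, q):
    """P{Bin(m, q) = j} in log space (q strictly inside (0, 1))."""
    return exp(lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1)
               + j * log(q) + (m - j) * log1p(-q))


def exact_cdf(t, r, n, q):
    """P{n^{1/2}(U_{r:n} - q)/sigma <= t} = P{Bin(n, u) >= r}, u = q + sigma t / n^{1/2}."""
    sigma = sqrt(q * (1 - q))
    u = q + sigma * t / sqrt(n)
    return sum(binom_pmf(i, n, u) for i in range(r, n + 1))


def s1(t, r, n, q):
    """S_{1,n}(t) of P.4.5(i) for the uniform d.f. (f = 1, f' = 0)."""
    sigma = sqrt(q * (1 - q))
    return ((2 * q - 1) / (3 * sigma) * t ** 2
            + (-q + n * q - r + 1) / sigma
            + 2 * (2 * q - 1) / (3 * sigma))


def max_errors(r, n, q, grid):
    """Max errors of the normal approximation and of the length-two expansion."""
    normal_err = expansion_err = 0.0
    for t in grid:
        exact = exact_cdf(t, r, n, q)
        normal = STD.cdf(t)
        expansion = normal + STD.pdf(t) * s1(t, r, n, q) / sqrt(n)
        normal_err = max(normal_err, abs(exact - normal))
        expansion_err = max(expansion_err, abs(exact - expansion))
    return normal_err, expansion_err
```

With q = 0.3, n = 400 and r(n) = 120 the grid t = -3, -2.95, ..., 3 is well inside the range where $u \in (0, 1)$.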

6. (Asymptotic independence)
Given n i.n.n.i.d. random variables with d.f.'s $F_1, \dots, F_n$ we have, for $x \le y$,

$$P\{X_{1:n} \le x, X_{n:n} \le y\} - P\{X_{1:n} \le x\}P\{X_{n:n} \le y\} = \prod_{i=1}^n F_i(y)(1 - F_i(x)) - \prod_{i=1}^n \bigl(F_i(y) - F_i(x)\bigr).$$

(Walsh, 1969)

Bibliographical Notes
Laplace (1818) derived the asymptotic normality of sample medians. He
computed the density of the sample median (within a more general framework)
and proved a limit theorem for the pointwise convergence of the densities. For
a discussion of this result and applications we refer to Stigler (1973). This
method was also used by Smirnov (1935) to obtain the asymptotic normality
of central order statistics in greater generality. Other approaches reduce the
problem to an application of the central limit theorem (that includes as a
special case the asymptotic normality of binomial r.v.'s). The reduction is
achieved either by means of the representations given in Section 1.6 (Cramér, 1946, and Rényi, 1953), the equality in (1.1.8) (Smirnov, 1949, van der Vaart, 1961, and Iglehart, 1976), or the Bahadur approximation (Sen, 1968).
The problem of characterizing the possible limiting d.f.'s of central order statistics was dealt with by Smirnov (1949) (see P.4.1(i)) and Balkema and de Haan (1978a, b). If no regularity conditions are supposed, every d.f. is a limiting d.f. of central order statistics (see P.4.1(ii)).
An interesting problem, not treated in the book, occurs if the value of the
underlying density at the q-quantile is equal to zero or if the q-quantile is not
unique; in this context we refer to the articles of Feldman and Tucker (1966),
Kiefer (1969b), Umbach (1981), and Landers and Rogge (1985) for important
contributions.


A bound for the accuracy of the normal approximation to the d.f. of a single order statistic was established by Reiss (1974a) (where the terms of the error bound are given explicitly), Egorov and Nevzorov (1976), and Englund (1980).
Expansions of distributions of sample quantiles were established in Reiss (1976). There it was merely assumed that the underlying d.f. F has derivatives on $(F^{-1}(q) - s, F^{-1}(q)]$ and $(F^{-1}(q), F^{-1}(q) + s)$ for some $s > 0$. If the left and right derivatives of F at $F^{-1}(q)$ are unequal, then the leading term of the expansion is a certain mixture of normal distributions (compare this with P.4.1(i)). In this context, we also refer to Weiss (1969c) who proved a limit theorem under such conditions.
Puri and Ralescu (1986) studied order statistics of a non-random sample size n and a random index which converges to $q \in (0, 1)$ in probability. Among others, the asymptotic normality and a Berry-Esseen type theorem are proved. A result concerning sample quantiles with random sample sizes related to that for maxima (see P.5.11(i)) does not seem to exist in the literature.
The problem of asymptotic independence between different groups of order
statistics provides an excellent example where a joint treatment of extreme
and central order statistics is preferable. The asymptotic independence of
lower and upper extremes was first observed by Gumbel (1946). A precise
characterization of the conditions that guarantee the asymptotic independence is due to Rossberg (1965, 1967). The corresponding result in the strong
sense (that is, approximation w.r.t. the variational distance) was proved by
Ikeda (1963) and Ikeda and Matsunawa (1970). In the i.n.n.i.d. case, Walsh (1969) proved the asymptotic independence of sample minimum and sample maximum under the condition that one or several d.f.'s do not dominate the other d.f.'s.
First investigations concerning the accuracy of the asymptotic results were
made by Walsh (1970). Sharp bounds of the variational distance in case of
extremes were established by Falk and Kohne (1986). Tiago de Oliveira (1961),
Rosengard (1962), Rossberg (1965), and Ikeda and Matsunawa (1970) proved
independence results that include central order statistics and sample means.
The sharp inequalities in Section 4.2 concerning extreme and central order
statistics are taken from Falk and Reiss (1988).
The asymptotic independence of ratios of consecutive order statistics was
proved by Lamperti (1964) and Dwass (1966); a corresponding result holds
for spacings. Smid and Stam (1975) showed that the condition, sufficient for
this result, is also necessary.
In Lemma 4.4.3 an upper bound of the distance between the normal distribution $N_{(0,I)}$ and a distribution induced by $N_{(0,I)}$ and a function close to the identity is computed. For related results we refer to Pfanzagl [1973a, Lemma 1] and Bhattacharya and Ghosh [1978, Theorem 1]. These results are formulated in terms of sequences of arbitrary normal distributions of a fixed dimension and are therefore not applicable for our purposes. The normal comparison lemma (see e.g. Leadbetter et al. (1983), Theorem 4.2.1) is related to this.


For $r(i) = r(i, n)$, $i = 1, \dots, k$, satisfying the condition $r(i, n)/n \to q_i$, $n \to \infty$, where $0 < q_1 < \dots < q_k < 1$, the weak convergence of the standardized joint distributions of order statistics $X_{r(i):n}$ to the normal distribution $N_{(0,\Sigma)}$ was proved by Smirnov (1935, 1944), Kendall (1940), and Mosteller (1946).
The normal distributions $N_{(0,\Sigma)}$ are the finite dimensional marginals of the "Brownian bridge" $W^0$, which is a special Gaussian process with mean function zero and covariance function $EW^0(q)W^0(p) = q(1 - p)$ for $0 \le q \le p \le 1$. The sample quantile process

[0, 1],

here given for (0, 1)-uniformly distributed r.v.'s, converges to $W^0$ in distribution. Thus, the result for order statistics describes the weak convergence of the
finite dimensional marginals of the quantile process. For a short discussion
of this subject we refer to Serfling (1980). In view of the technique which is
needed to rigorously investigate the weak convergence of the quantile process,
a detailed study has to be done in conjunction with empirical processes in
general (see e.g. M. Csörgő and P. Révész (1981) and G.R. Shorack and J.A. Wellner (1986)). The invariance principle for the sample quantile process provides a powerful tool to establish limit theorems (in the weak sense) for functionals of the sample quantile process; however, one cannot indicate the rate at which the limit theorems are valid. For statistical applications of the quantile process we refer to M. Csörgő (1983) and Shorack and Wellner (1986).
Weiss (1969b) studied the normal approximation of joint distributions of
central order statistics w.r.t. the variational distance under the condition that
k = k(n) is of order $O(n^{1/4})$. Ikeda and Matsunawa (1972) and Weiss (1973) obtained corresponding results under the weaker condition that k(n) is of order $O(n^{1/3})$. Reiss (1975a) established the asymptotic normality with a bound of order $O\bigl((\sum_i (r_i - r_{i-1})^{-1})^{1/2}\bigr)$ for the remainder term. We also refer to Reiss (1975a) for an expansion of the joint distribution of central order statistics (see Section 4.5 for an expansion of length two in the special case of exponential r.v.'s). Other notable articles pertaining to this are those of Matsunawa (1975),
Weiss (1979a), and Ikeda and Nonaka (1983).
An approximation to the multinomial distribution, with an increasing
number of cells as the sample size tends to infinity, by means of the distribution
of certain rounded-off normal r.v.'s may be found in Weiss (1976); this method
seems to be superior to a more direct approximation by means of a normal
distribution as pointed out by Weiss (1978).
The expansions of d.f.'s of order statistics in Section 4.6, taken from Nowak
and Reiss (1983), are refinements of those given by Ivchenko (1971, 1974).
Ivchenko also considers the multivariate case. In conjunction with this, we
mention the article of Kolchin (1980), who established corresponding results
for extremes.

CHAPTER 5

Approximations to Distributions
of Extremes

The nondegenerate limiting d.f.'s of sample maxima $X_{n:n}$ are the Fréchet d.f.'s $G_{1,\alpha}$, Weibull d.f.'s $G_{2,\alpha}$, and the Gumbel d.f. $G_3$. Thus, with regard to the variety of limiting d.f.'s, the situation of the present chapter turns out to be more complex than that of the preceding chapter, where weak regularity conditions guarantee the asymptotic normality of the order statistics.
As stated in (1.3.11) the limiting d.f.'s are max-stable, that is, for $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3: \alpha > 0\}$ we find $c_n > 0$ and reals $d_n$ such that

$$G^n(d_n + xc_n) = G(x).$$

Another interesting class of d.f.'s is that of the generalized Pareto d.f.'s $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3: \alpha > 0\}$ as introduced in (1.6.11). These d.f.'s can also be used as a starting point when investigating distributional properties of sample maxima.
Given $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3: \alpha > 0\}$ we obtain the associated generalized Pareto d.f. W by restricting the function $\Psi = 1 + \log G$ to certain intervals. The generalized Pareto d.f. W has the property

$$W^n(d_n + xc_n) = G(x) + O(n^{-1})$$

where $c_n$ and $d_n$ are the constants for which $G^n(d_n + xc_n) = G(x)$ holds. The class of generalized Pareto d.f.'s includes as special cases Pareto d.f.'s, uniform d.f.'s, and exponential d.f.'s.
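The rate $W^n(d_n + xc_n) = G(x) + O(n^{-1})$ is easy to observe numerically. In the Python sketch below (hypothetical names) the constants $c_n$, $d_n$ are those listed in Theorem 5.1.1; the grids and the shape parameter α = 2 are arbitrary choices made for illustration.

```python
from math import exp, log


def gpd_cdf(family, z, alpha):
    """Generalized Pareto d.f.'s W_{1,alpha}, W_{2,alpha}, W_3."""
    if family == 1:                                  # Pareto: 1 - z^{-alpha}, z >= 1
        return 1.0 - z ** (-alpha) if z >= 1.0 else 0.0
    if family == 2:                                  # type II: 1 - (-z)^alpha on [-1, 0]
        if z < -1.0:
            return 0.0
        return 1.0 - (-z) ** alpha if z <= 0.0 else 1.0
    return 1.0 - exp(-z) if z >= 0.0 else 0.0       # exponential


def ev_cdf(family, x, alpha):
    """Extreme value d.f.'s G_{1,alpha}, G_{2,alpha}, G_3."""
    if family == 1:
        return exp(-x ** (-alpha)) if x > 0.0 else 0.0
    if family == 2:
        return exp(-(-x) ** alpha) if x < 0.0 else 1.0
    return exp(-exp(-x))


def constants(family, n, alpha):
    """(d_n, c_n) as listed in Theorem 5.1.1."""
    if family == 1:
        return 0.0, n ** (1.0 / alpha)
    if family == 2:
        return 0.0, n ** (-1.0 / alpha)
    return log(n), 1.0


def sup_error(family, n, grid, alpha=2.0):
    """max over the grid of |W^n(d_n + x c_n) - G(x)|."""
    d, c = constants(family, n, alpha)
    return max(abs(gpd_cdf(family, d + x * c, alpha) ** n - ev_cdf(family, x, alpha))
               for x in grid)
```

Writing $y = -\log G(x)$, all three cases reduce to comparing $(1 - y/n)^n$ with $e^{-y}$, so a first-order expansion suggests that n times the error should settle near $\max_y y^2e^{-y}/2 = 2e^{-2} \approx 0.27$.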
An introduction to our particular point of view for the treatment of
extremes will be given in Section 5.1. This section also includes results for the
kth largest order statistic.
In Section 5.2 we shall establish bounds for the remainder terms in the limit
theorems for sample maxima. In view of statistical applications the distance


between the exact and limiting distributions will be measured w.r.t. the
Hellinger distance.
In Section 5.3 some preparations are made for the study of the joint
distribution of the k largest order statistics; it is shown that there is a close
connection between the limiting distributions of the kth largest order statistic
$X_{n-k+1:n}$ and the k largest order statistics.
Higher order approximations in case of extremes of generalized Pareto r.v.'s are studied in Section 5.4. The accuracy of the approximations to the distribution of the kth largest order statistic and the joint distribution of extreme order statistics is dealt with in Section 5.5.
Finally, in Section 5.6, we shall make some remarks about the connection
between extreme order statistics, empirical point processes, and certain
Poisson processes.

5.1. Asymptotic Distributions of Extreme Sequences


In this section we shall examine the weak convergence of distributions of extreme order statistics. Moreover, it will be indicated that the strong convergence (that is, the convergence w.r.t. the variational distance) holds under the well-known von Mises conditions.
Let $X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$ be the order statistics of n i.i.d. random variables with common d.f. F. As already pointed out in Section 1.3, a nondegenerate limiting d.f. of the sample maximum $X_{n:n}$ has to be one of the Fréchet, Weibull, or Gumbel d.f.'s; that is, if there exist constants $a_n > 0$ and reals $b_n$ such that

$$F^n(b_n + xa_n) \to G(x), \qquad n \to \infty, \qquad (5.1.1)$$

for every continuity point of the nondegenerate limiting d.f. G, then G has to be of the type $G_{1,\alpha}$, $G_{2,\alpha}$, or $G_3$ for some $\alpha > 0$.
Recall that $G_{1,\alpha}(x) = \exp(-x^{-\alpha})$ for $x > 0$, $G_{2,\alpha}(x) = \exp(-(-x)^\alpha)$ for $x < 0$, and $G_3(x) = \exp(-e^{-x})$ for every x.

Graphical Representation of Extreme Value Densities


The densities $g_{i,\alpha}$ of $G_{i,\alpha}$ are given by

$$g_{1,\alpha}(x) = \alpha x^{-(1+\alpha)}\exp(-x^{-\alpha}), \qquad 0 < x,$$
$$g_{2,\alpha}(x) = \alpha(-x)^{\alpha-1}\exp(-(-x)^\alpha), \qquad x < 0,$$
$$g_3(x) = e^{-x}\exp(-e^{-x}).$$

Figure 5.1.1. Fréchet densities $g_{1,\alpha}$ with parameters $\alpha$ = 0.33, 0.5, 1, 3, 5; the mode increases as $\alpha$ increases.
Fréchet Densities
Figure 5.1.1 is misleading insofar as one density seems to have a pole at zero. A closer look shows that this is not the case. Moreover, from the definition of $g_{1,\alpha}$ it is evident that every Fréchet density is infinitely often differentiable. For $\alpha = 5$ the density already looks like a Gumbel density (compare with Figure 1.3.1).
The density $g_{1,\alpha}$ is unimodal with mode

$$m(1,\alpha) = (\alpha/(1+\alpha))^{1/\alpha}.$$

It is easy to verify that

$$m(1,\alpha) \to 0 \qquad \text{as } \alpha \to 0,$$

and

$$m(1,\alpha) \to 1, \qquad g_{1,\alpha}(m(1,\alpha)) \to \infty, \qquad \text{as } \alpha \to \infty.$$

Weibull Densities
The "negative" standard exponential density $g_{2,1}$ possesses a central position within the family of Weibull densities. The Weibull densities are again unimodal. From the visual as well as the statistical point of view the most significant characteristic of a Weibull density $g_{2,\alpha}$ is its behavior at zero (Figure 5.1.2). Notice that

$$g_{2,\alpha}(x) \sim \alpha(-x)^{\alpha-1}, \qquad x \uparrow 0.$$

One may distinguish between five different classes of Weibull densities as far as the behavior at zero is concerned:


$\alpha \in (0, 1)$: pole
$\alpha = 1$: jump
$\alpha \in (1, 2)$: continuous, not differentiable from the left at zero
$\alpha = 2$: differentiable from the left at zero
$\alpha > 2$: differentiable at zero.

Figure 5.1.2. Weibull densities $g_{2,\alpha}$ with parameters $\alpha$ = 0.5, 1, 1.5, 2, 4; the mode decreases as $\alpha$ increases.

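If $\alpha > 1$ then the mode of $g_{2,\alpha}$ is equal to

$$m(2,\alpha) = -((\alpha - 1)/\alpha)^{1/\alpha} < 0.$$

Moreover,

$$m(2,\alpha) \to 0, \qquad g_{2,\alpha}(m(2,\alpha)) \to 1, \qquad \text{as } \alpha \downarrow 1,$$

and

$$m(2,\alpha) \to -1, \qquad g_{2,\alpha}(m(2,\alpha)) \to \infty, \qquad \text{as } \alpha \to \infty.$$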

Gumbel Density
The Gumbel density $g_3(x) = e^{-x}\exp(-e^{-x})$ approximately behaves like the standard exponential density $e^{-x}$ as $x \to \infty$. The mode of $g_3$ is equal to zero. For the graph of $g_3$ we refer to Figure 1.3.1.
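The mode formulas for $g_{1,\alpha}$ and $g_{2,\alpha}$, the mode of $g_3$ at zero, and the limiting behavior stated above can be confirmed by a crude grid search. The Python sketch below uses hypothetical names; `numerical_mode` is a plain grid maximizer chosen for transparency rather than efficiency.

```python
from math import exp


def frechet_pdf(x, a):
    """g_{1,a}, supported on x > 0."""
    return a * x ** (-(1 + a)) * exp(-x ** (-a)) if x > 0 else 0.0


def weibull_pdf(x, a):
    """g_{2,a}, supported on x < 0."""
    return a * (-x) ** (a - 1) * exp(-(-x) ** a) if x < 0 else 0.0


def gumbel_pdf(x):
    """g_3."""
    return exp(-x) * exp(-exp(-x))


def frechet_mode(a):
    return (a / (1 + a)) ** (1 / a)


def weibull_mode(a):
    """Mode of g_{2,a}; only meaningful for a > 1."""
    return -((a - 1) / a) ** (1 / a)


def numerical_mode(pdf, lo, hi, steps=100_000):
    """Grid search for the maximizer of pdf on [lo, hi]."""
    best_x, best_val = lo, pdf(lo)
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        val = pdf(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x
```

For example, `numerical_mode(lambda x: frechet_pdf(x, 3.0), 1e-3, 5.0)` should agree with `frechet_mode(3.0)` up to the grid resolution.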

Weak Domains of Attraction


If (5.1.1) holds then F is said to belong to the weak domain of attraction of G.
We shall discuss some conditions imposed on F which guarantee the weak
convergence of upper extremes.


As mentioned above, $c_n^{-1}(X_{n:n} - d_n)$ has the d.f. $G_{i,\alpha}$ if $F = G_{i,\alpha}$ and if the constants are appropriately chosen. Thus, e.g., the sample maximum $X_{n:n}$ of the negative exponential d.f. $G_{2,1}$ may serve as a starting point for the study of asymptotic distributions of sample maxima. However, to extend such a result one has to use the transformation technique (or some equivalent, more direct method), so that it can be preferable to work with the sample maximum $U_{n:n}$ or $V_{n:n}$ of n i.i.d. random variables uniformly distributed on (0, 1) or, respectively, (-1, 0). In this case the limiting d.f. will again be $G_{2,1}$. Recall that the uniform distribution on (-1, 0) is the generalized Pareto distribution $W_{2,1}$. As pointed out in (1.3.14) we have

$$P\{n(U_{n:n} - 1) \le x\} = P\{nV_{n:n} \le x\} \to G_{2,1}(x), \qquad n \to \infty, \qquad (5.1.2)$$

for every x. (5.1.2) and Corollary 1.2.7 imply that

$$F^n(b_n + xa_n) = G_{2,1}\bigl(n(F(b_n + xa_n) - 1)\bigr) + o(1), \qquad n \to \infty, \qquad (5.1.3)$$

for every x. Moreover, for $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3: \alpha > 0\}$ we may write

$$G = G_{2,1}(\log G) \qquad \text{on } (\alpha(G), \omega(G)).$$

This yields

$$F^n(b_n + xa_n) \to G(x), \qquad n \to \infty, \quad \text{for every } x,$$

if, and only if,

$$n(1 - F(b_n + xa_n)) \to -\log G(x) =: 1 - \Psi(x), \qquad n \to \infty, \quad \text{for every } x \in (\alpha(G), \omega(G)). \qquad (5.1.4)$$


This well-known equivalence is one of the basic tools to establish necessary and sufficient conditions for the weak convergence of extremes. These conditions [due to Gnedenko (1943) and de Haan (1970)], in their elegance and completeness, can be regarded as a cornerstone of classical extreme value theory.
A d.f. F belongs to the weak domain of attraction of an extreme value d.f. $G_{i,\alpha}$ if, and only if, the corresponding one of the following conditions holds:

(1, α): $\omega(F) = \infty$, $\lim_{t\to\infty} [1 - F(tx)]/[1 - F(t)] = x^{-\alpha}$, $x > 0$; (5.1.5)

(2, α): $\omega(F) < \infty$, $\lim_{t\downarrow 0} [1 - F(\omega(F) + xt)]/[1 - F(\omega(F) - t)] = (-x)^{\alpha}$, $x < 0$; (5.1.6)

(3): $\lim_{t\uparrow\omega(F)} [1 - F(t + xg(t))]/[1 - F(t)] = e^{-x}$, $-\infty < x < \infty$, (5.1.7)

where $g(t) = \int_t^{\omega(F)} (1 - F(y))\,dy/(1 - F(t))$.


Moreover the constants an and bn can be chosen in the following way:


(1, α): $b_n^* = 0$, $a_n^* = F^{-1}(1 - 1/n)$; (5.1.8)

(2, α): $b_n^* = \omega(F)$, $a_n^* = \omega(F) - F^{-1}(1 - 1/n)$; (5.1.9)

(3): $b_n^* = F^{-1}(1 - 1/n)$, $a_n^* = g(b_n^*)$ (5.1.10)

where g is defined in (5.1.7).
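The choices (5.1.8)-(5.1.10) can be tried out on a few textbook d.f.'s. The Python sketch below, with hypothetical names, uses the Pareto d.f. $1 - z^{-2}$ (case (1, 2)), the uniform d.f. on (0, 1) (case (2, 1)), the standard exponential d.f. and the standard normal d.f. (both case (3)); for the normal d.f. it uses $g(t) = (\varphi(t) - t(1 - \Phi(t)))/(1 - \Phi(t))$, which follows from $\int_t^\infty (1 - \Phi(u))\,du = \varphi(t) - t(1 - \Phi(t))$. It reports the largest deviation $|F^n(b_n^* + xa_n^*) - G(x)|$ over a grid of x.

```python
from math import erfc, exp, log, log1p, pi, sqrt
from statistics import NormalDist


def normal_sf(t):
    """1 - Phi(t), computed via erfc to avoid cancellation in the upper tail."""
    return 0.5 * erfc(t / sqrt(2.0))


def normal_constants(n):
    """(b_n*, a_n*) of (5.1.10) for F = Phi: b = Phi^{-1}(1 - 1/n), a = g(b)."""
    b = -NormalDist().inv_cdf(1.0 / n)          # = Phi^{-1}(1 - 1/n) by symmetry
    pdf = exp(-b * b / 2.0) / sqrt(2.0 * pi)
    sf = normal_sf(b)
    return b, (pdf - b * sf) / sf


# name: (survival function 1 - F, n -> (b_n*, a_n*), limiting d.f. G)
CASES = {
    "pareto": (lambda z: z ** -2.0 if z >= 1.0 else 1.0,
               lambda n: (0.0, sqrt(n)),
               lambda x: exp(-x ** -2.0) if x > 0.0 else 0.0),
    "uniform": (lambda z: min(1.0, max(0.0, 1.0 - z)),
                lambda n: (1.0, 1.0 / n),
                lambda x: exp(x) if x < 0.0 else 1.0),
    "exponential": (lambda z: exp(-z) if z >= 0.0 else 1.0,
                    lambda n: (log(n), 1.0),
                    lambda x: exp(-exp(-x))),
    "normal": (normal_sf, normal_constants, lambda x: exp(-exp(-x))),
}


def sup_error(name, n, grid):
    """max over the grid of |F^n(b_n* + x a_n*) - G(x)|."""
    sf, constants, G = CASES[name]
    b, a = constants(n)
    worst = 0.0
    for x in grid:
        s = sf(b + x * a)
        Fn = 0.0 if s >= 1.0 else exp(n * log1p(-s))   # F^n computed in log space
        worst = max(worst, abs(Fn - G(x)))
    return worst
```

For the first three d.f.'s the deviation decreases roughly like 1/n, whereas for the normal d.f. it decreases much more slowly, in line with the well-known slow convergence of normal maxima.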


It is well known that the weak convergence to the limiting d.f. G holds for other choices of constants $a_n$ and $b_n$ if, and only if,

$$a_n/a_n^* \to 1 \quad\text{and}\quad a_n^{-1}(b_n - b_n^*) \to 0 \qquad \text{as } n \to \infty. \qquad (5.1.11)$$

For a well-known extension of this result we refer to P.5.3.

Tail Equivalence of D.F.'s
Further insight into the property that a d.f. belongs to the weak domain of attraction of $G = G_{i,\alpha}$ may be gained by conditions that are more closely related to (5.1.4). Observe that the two statements in (5.1.4) are equivalent to

$$n(1 - F(b_n + xa_n))/[1 - \Psi(x)] \to 1, \qquad n \to \infty, \qquad (5.1.12)$$

for every $x \in (\alpha(G), \omega(G))$, where $\Psi = 1 + \log G$.
Recall that the restriction of $\Psi$ to an appropriate interval is a generalized Pareto d.f. $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3: \alpha > 0\}$.
Theorem 5.1.1. Let $G = G_{i,\alpha}$ and $W = W_{i,\alpha}$ for some $i \in \{1, 2, 3\}$ and $\alpha > 0$. Then the following three statements are equivalent:

(i) $F^n(b_n + xa_n) \to G(x)$, $n \to \infty$, for every x, (5.1.13)

(ii) $(1 - F(b_n + xa_n))/(1 - W(d_n + xc_n)) \to 1$, $n \to \infty$, (5.1.14)

(iii) $(1 - F(b_n + xa_n))/(1 - G(d_n + xc_n)) \to 1$, $n \to \infty$, (5.1.15)

where (5.1.14) and (5.1.15) have to hold for every $x \in (\alpha(G), \omega(G))$.
Moreover, $d_n = 0$ if i = 1, 2, $d_n = \log n$ if i = 3, $c_n = n^{1/\alpha}$ if i = 1, $c_n = n^{-1/\alpha}$ if i = 2, and $c_n = 1$ if i = 3.

PROOF. The equivalence of (5.1.13) and (5.1.14) is immediate from (5.1.12) by writing $(1 - \Psi(x))/n = 1 - W(d_n + xc_n)$. Moreover, from (1.3.11) and the first equivalence we conclude that $[1 - G(d_n + xc_n)]/[1 - W(d_n + xc_n)] \to 1$, $n \to \infty$, and hence, obviously, the second equivalence is also valid. □
Notice that, necessarily, $b_n + xa_n \to \omega(F)$, $n \to \infty$, if $F^n(b_n + xa_n) \to G(x)$, $n \to \infty$, and $\alpha(G) < x < \omega(G)$. Thus, Theorem 5.1.1 reveals that F belongs to the weak domain of attraction of G if, and only if, the upper tail of F can asymptotically be made equivalent to the upper tail of $G(d_n + xc_n)$. Below we shall prefer to work with the generalized Pareto d.f.'s W instead of the extreme value d.f.'s G because of technical advantages and other reasons which will become apparent when treating joint distributions of extremes.


Strong Domain of Attraction
Recall that the symbol G is used for the d.f. as well as for the corresponding probability measure. In analogy to the notion of the weak domain of attraction, F is said to belong to the strong domain of attraction of G if

$$\sup_B |P\{a_n^{-1}(X_{n:n} - b_n) \in B\} - G(B)| \to 0, \qquad n \to \infty, \qquad (5.1.16)$$

where the sup is taken over all Borel sets B.
Notice that condition (5.1.16) implies that F belongs to the weak domain of attraction of G. Thus, necessarily, the normalizing constants are again those of the weak convergence. Moreover, it can easily be verified that (5.1.11) carries over to the strong convergence.
The following result was already indicated in (1.3.14).
Lemma 5.1.2.

$$\sup_B |P\{n(U_{n:n} - 1) \in B\} - G_{2,1}(B)| \to 0, \qquad n \to \infty. \qquad (5.1.17)$$

PROOF. From Theorem 1.3.2 we deduce that $n(U_{n:n} - 1)$ has the density $f_n$ given by $f_n(x) = (1 + x/n)^{n-1}$, $-n < x < 0$, and $= 0$, otherwise. Thus, $f_n(x) \to e^x = g_{2,1}(x)$, $n \to \infty$, $x < 0$, and hence the Scheffé lemma implies the assertion. □
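The Scheffé argument in the proof of Lemma 5.1.2 can be made quantitative numerically. The Python sketch below (hypothetical names) approximates $\sup_B|P\{n(U_{n:n} - 1) \in B\} - G_{2,1}(B)| = \frac{1}{2}\int |f_n(x) - e^x|\,dx$ by the trapezoidal rule; the truncation point -50 and the number of steps are arbitrary choices.

```python
from math import exp


def f_n(x, n):
    """Density of n(U_{n:n} - 1): (1 + x/n)^{n-1} on (-n, 0]."""
    return (1.0 + x / n) ** (n - 1) if -n < x <= 0.0 else 0.0


def variational_distance(n, lower=-50.0, steps=200_000):
    """(1/2) * integral of |f_n - g_{2,1}| over [lower, 0], trapezoidal rule."""
    h = -lower / steps
    total = 0.0
    for k in range(steps + 1):
        x = lower + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * abs(f_n(x, n) - exp(x))
    return 0.5 * total * h
```

Expanding $\log f_n(x) = x - (x + x^2/2)/n + O(n^{-2})$ suggests that n times this distance approaches $2e^{-2} \approx 0.27$, i.e. that the convergence in (5.1.17) is of order $O(n^{-1})$.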
Next, we study conditions under which F belongs to the strong domain of attraction of an extreme value d.f.

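Tail Equivalence of Densities
In the sequel let us assume that F has a density f.
Denote by w the density of the generalized Pareto d.f. W. Notice that

$$w = g/G$$

on appropriate intervals, where G is the corresponding extreme value d.f. and $g = G'$. Explicitly, we have

$$w_{1,\alpha}(x) = \begin{cases} 0 & \text{if } x < 1,\\ \alpha x^{-(1+\alpha)} & \text{if } x \ge 1,\end{cases} \qquad \text{"Pareto"} \qquad (5.1.18)$$

$$w_{2,\alpha}(x) = \begin{cases} 0 & \text{if } x < -1,\\ \alpha(-x)^{\alpha-1} & \text{if } -1 \le x \le 0,\\ 0 & \text{if } x > 0,\end{cases} \qquad \text{"Type II"} \qquad (5.1.19)$$

$$w_3(x) = \begin{cases} 0 & \text{if } x < 0,\\ e^{-x} & \text{if } x \ge 0.\end{cases} \qquad \text{"Exponential"} \qquad (5.1.20)$$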


The generalized Pareto densities as well as the extreme value densities are unimodal. The particular feature of the generalized Pareto densities is the tail equivalence to the corresponding extreme value densities at the right endpoint of the support.
The counterpart to Theorem 5.1.1 with respect to the strong convergence is the following.
Lemma 5.1.3. Assume that the constants $a_n > 0$ and $b_n$ are chosen so that the weak convergence holds, that is, $F^n(b_n + xa_n) \to G(x)$, $n \to \infty$, for every x, where $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3: \alpha > 0\}$. Then,

$$\sup_B |P\{a_n^{-1}(X_{n:n} - b_n) \in B\} - G(B)| \to 0, \qquad n \to \infty, \qquad (5.1.21)$$

if, and only if, for every subsequence i(n) there exists a subsequence $k(n) = i(j(n))$ such that

$$\frac{a_{k(n)} f(b_{k(n)} + xa_{k(n)})}{c_{k(n)} w(d_{k(n)} + xc_{k(n)})} \to 1, \qquad n \to \infty, \qquad (5.1.22)$$

for Lebesgue almost all $x \in (\alpha(G), \omega(G))$, where w is the corresponding generalized Pareto density and $c_n$ and $d_n$ are the constants of Theorem 5.1.1.

Condition (5.1.22) is equivalent to the condition that for every subsequence i(n) there exists a subsequence $k(n) = i(j(n))$ such that

$$k(n)a_{k(n)} f(b_{k(n)} + xa_{k(n)}) \to \psi(x) := (g/G)(x), \qquad n \to \infty, \qquad (5.1.22')$$

for almost all $x \in (\alpha(G), \omega(G))$. The equivalence of (5.1.22) and (5.1.22') becomes obvious by noting that $d_n + xc_n \in (\alpha(W), \omega(W))$ for every $x \in (\alpha(G), \omega(G))$, and $\psi(x)/n = c_n w(d_n + xc_n)$, eventually.
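The relation $w = g/G$ behind (5.1.18)-(5.1.20), the identity $\psi(x)/n = c_n w(d_n + xc_n)$, and the tail equivalence of W and G at the right endpoint can all be checked pointwise. In the Python sketch below (hypothetical names) the evaluation points, n = 10^4 and α = 0.5 are arbitrary; `math.expm1` is used so that $1 - G$ is computed without cancellation.

```python
from math import exp, expm1, log


def w_density(family, x, alpha):
    """Generalized Pareto densities (5.1.18)-(5.1.20)."""
    if family == 1:
        return alpha * x ** (-(1.0 + alpha)) if x >= 1.0 else 0.0
    if family == 2:
        return alpha * (-x) ** (alpha - 1.0) if -1.0 <= x <= 0.0 else 0.0
    return exp(-x) if x >= 0.0 else 0.0


def ev_cdf(family, x, alpha):
    """G_{1,alpha} (x > 0), G_{2,alpha} (x < 0) and G_3; used inside the supports only."""
    if family == 1:
        return exp(-x ** (-alpha))
    if family == 2:
        return exp(-(-x) ** alpha)
    return exp(-exp(-x))


def psi(family, x, alpha):
    """psi = g / G, where g = G' is the extreme value density."""
    if family == 1:
        g = alpha * x ** (-(1.0 + alpha)) * ev_cdf(1, x, alpha)
    elif family == 2:
        g = alpha * (-x) ** (alpha - 1.0) * ev_cdf(2, x, alpha)
    else:
        g = exp(-x) * ev_cdf(3, x, alpha)
    return g / ev_cdf(family, x, alpha)


def constants(family, n, alpha):
    """(d_n, c_n) of Theorem 5.1.1."""
    if family == 1:
        return 0.0, n ** (1.0 / alpha)
    if family == 2:
        return 0.0, n ** (-1.0 / alpha)
    return log(n), 1.0


def upper_tail_ratio(family, x, alpha):
    """(1 - W(x)) / (1 - G(x)); here 1 - W(x) = -log G(x) =: t and 1 - G(x) = -expm1(-t)."""
    if family == 1:
        t = x ** (-alpha)
    elif family == 2:
        t = (-x) ** alpha
    else:
        t = exp(-x)
    return t / -expm1(-t)
```

The ratio in `upper_tail_ratio` tends to 1 as x approaches the right endpoint of the support, which is the tail equivalence referred to above.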
Without the condition $F^n(b_n + xa_n) \to G(x)$, $n \to \infty$, (5.1.22) does not necessarily imply (5.1.21), as can be shown by examples. If the weak convergence holds then a sufficient condition for the convergence w.r.t. the variational distance is

$$\frac{a_n f(b_n + xa_n)}{c_n w(d_n + xc_n)} \to 1, \qquad n \to \infty, \quad x \in (\alpha(G), \omega(G)). \qquad (5.1.23)$$

Note that the rate of convergence in (5.1.23) will also determine the rate at which the strong convergence of the distributions holds. We remark that the generalized Pareto density w can be replaced by the density g of G in condition (5.1.23).
Notice that (5.1.23) is equivalent to

$$na_n f(b_n + xa_n) \to \psi(x), \qquad n \to \infty, \quad x \in (\alpha(G), \omega(G)). \qquad (5.1.23')$$

PROOF OF LEMMA 5.1.3. Since

$$x \mapsto na_n f(b_n + xa_n)F^{n-1}(b_n + xa_n)$$

is the density of $a_n^{-1}(X_{n:n} - b_n)$, it is immediate from the Scheffé lemma 3.3.4 that (5.1.21) is equivalent to (5.1.22'). □
Lemma 5.1.3 will be the decisive tool to prove the following equivalence:
F belongs to the strong domain of attraction of an extreme value distribution
if, and only if, the corresponding result holds for the joint distribution of
the k largest extremes for every positive integer k. For details we refer to
Section 5.3.
From the mathematical point of view, condition (5.1.22) is more satisfactory than the sufficient condition (5.1.23). However, for practical purposes condition (5.1.23) can be useful, e.g. to verify that a given d.f. belongs to the strong domain of attraction of a particular extreme value distribution $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$. It was proved by Falk (1985a) that the von Mises conditions (5.1.24) imply (5.1.23) and that (5.1.23) implies convergence in the strong sense. Sweeting (1985) was able to show that the von Mises conditions (5.1.24) are equivalent to the uniform convergence of the densities in (5.1.23') on finite intervals if the density $f$ is positive on a left neighborhood of $\omega(F)$.

Von Mises-Type Conditions


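Hereafter, we assume that $F$ has a positive derivative $f$ on $(x_0, \omega(F))$ where $x_0 < \omega(F)$. The following conditions $(1, \alpha)$, $(2, \alpha)$, and $(3)$ are sufficient for $F$ to belong to the strong domain of attraction of $G_{1,\alpha}$, $G_{2,\alpha}$, and $G_3$, respectively.

$$\begin{aligned}
(1,\alpha):&\quad \omega(F) = \infty \ \text{ and } \ \lim_{t \to \infty} t f(t)/[1 - F(t)] = \alpha;\\
(2,\alpha):&\quad \omega(F) < \infty \ \text{ and } \ \lim_{t \uparrow \omega(F)} [\omega(F) - t] f(t)/[1 - F(t)] = \alpha;\\
(3):&\quad \int_t^{\omega(F)} (1 - F(u))\,du < \infty,\ x_0 < t < \omega(F), \ \text{ and } \ \lim_{t \uparrow \omega(F)} f(t) \int_t^{\omega(F)} (1 - F(u))\,du \Big/ [1 - F(t)]^2 = 1.
\end{aligned} \tag{5.1.24}$$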
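A quick numerical check of the von Mises conditions (5.1.24) may help. The following Python sketch (our own illustration; the function names are hypothetical) evaluates the ratio in $(1, \alpha)$ for the standard Cauchy d.f., where the limit should be $\alpha = 1$, and the ratio in $(3)$ for the standard normal d.f. It relies on the elementary identity $\int_t^\infty (1 - \Phi(u))\,du = \varphi(t) - t(1 - \Phi(t))$, which follows by differentiating both sides.

```python
import math

def cauchy_ratio(t: float) -> float:
    """t f(t) / [1 - F(t)] for the standard Cauchy d.f.; (1, alpha) predicts the limit alpha = 1."""
    f = 1.0 / (math.pi * (1.0 + t * t))
    tail = math.atan(1.0 / t) / math.pi  # 1 - F(t) for t > 0, avoiding cancellation in 1/2 - atan(t)/pi
    return t * f / tail

def normal_ratio(t: float) -> float:
    """f(t) * int_t^inf (1 - Phi(u)) du / [1 - Phi(t)]^2 for the standard normal; (3) predicts 1."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))  # 1 - Phi(t)
    integral = phi - t * tail                   # closed form of int_t^inf (1 - Phi(u)) du
    return phi * integral / tail ** 2

for t in (2.0, 10.0, 100.0):
    print(f"Cauchy t = {t:6.1f}: ratio = {cauchy_ratio(t):.6f}")
for t in (2.0, 5.0, 10.0):
    print(f"normal t = {t:6.1f}: ratio = {normal_ratio(t):.6f}")
```

The normal ratio approaches 1 only slowly (roughly like $1 - t^{-2}$), which foreshadows the slow rates for normal maxima discussed below.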

Another set of sufficient conditions can be formulated if, in addition, $F$ has a second derivative on $(x_0, \omega(F))$ where $x_0 < \omega(F)$:

$$\lim_{t \uparrow \omega(F)} \left[\frac{1 - F}{f}\right]'(t) = \begin{cases} 1/\alpha & \text{if } i = 1,\\ -1/\alpha & \text{if } i = 2,\\ 0 & \text{if } i = 3. \end{cases} \tag{5.1.25}$$

If $i = 3$ then the normalizing constant $a_n^* = g(b_n^*)$ as given in (5.1.10) can be replaced by

$$a_n = 1/(n f(b_n^*)) \tag{5.1.26}$$

where again $b_n^* = F^{-1}(1 - 1/n)$.


Notice that

$$\left[\frac{1 - F}{f}\right]' = -\frac{(1 - F) f'}{f^2} - 1. \tag{5.1.27}$$

Thus, (5.1.25), $i = 3$, is equivalent to $\lim_{t \uparrow \omega(F)} (1 - F(t)) f'(t)/f^2(t) = -1$.


Condition (5.1.25) can also be formulated in the following way: if the limit in (5.1.25) exists, then $F$ belongs to the strong domain of attraction of the von Mises d.f. $H_\beta$ with parameter

$$\beta = \lim_{t \uparrow \omega(F)} \left[\frac{1 - F}{f}\right]'(t).$$

Since the conditions (5.1.5)-(5.1.7) and (5.1.24)-(5.1.26) are deduced from (5.1.4), it is not surprising that these conditions are trivially fulfilled for the generalized Pareto d.f.'s $W$, that is, the equalities $t f(t)/[1 - F(t)] = \alpha$ etc. hold for every $t$ in the support of $W$.
The von Mises-type conditions are sufficient for a d.f. to belong to the strong domain of attraction of an extreme value d.f. However, as examples show, these conditions are not necessary. This is intuitively clear: for every density $f$ which fulfills a von Mises-type condition we can find, by slightly varying $f$ in the tail of the distribution, a density $g$ which violates the von Mises-type condition, whereas the asymptotic stochastic properties of the sample maximum remain unchanged.
The main purpose of the following example is to clarify the connection between the different normalizing constants used in the literature for the maximum of normal r.v.'s.
EXAMPLE 5.1.4. Let $X_{n:n}$ be the maximum of $n$ standard normal r.v.'s. Write again $\varphi = \Phi'$. Since $\varphi'(x) = -x\varphi(x)$ we get

$$\left(\frac{1 - \Phi}{\varphi}\right)'(x) = \frac{(1 - \Phi(x))\, x}{\varphi(x)} - 1.$$

It is immediate from (3.2.3) that this expression tends to zero as $x \to \infty$. Thus, condition (5.1.25), $i = 3$, implies that $\Phi$ belongs to the domain of attraction of the Gumbel d.f. $G_3$. Hence, according to (5.1.26), with $b_n = \Phi^{-1}(1 - 1/n)$,

$$\sup_B \left| P\{n\varphi(b_n)(X_{n:n} - b_n) \in B\} - G_3(B) \right| \to 0, \qquad n \to \infty. \tag{1}$$

Direct calculations or an application of Example 5.2.4 show that (1) holds with a remainder term of order $O(1/\log n)$.
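The limit statements in Example 5.1.4 can be observed numerically. The sketch below (our own; it assumes Python 3.8+ for `statistics.NormalDist`, and the helper names are hypothetical) evaluates $((1 - \Phi)/\varphi)'(x)$ and the largest deviation of $\Phi^n(b_n + a_n x)$ from $G_3(x)$ over a grid in $[-3, 8]$, with $b_n = \Phi^{-1}(1 - 1/n)$ and $a_n = 1/(n\varphi(b_n))$. A grid maximum only approximates the supremum, and a distance between d.f.'s is weaker than the variational distance in (1).

```python
import math
from statistics import NormalDist

def upper_tail(x: float) -> float:
    """1 - Phi(x), computed via erfc to keep precision in the upper tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def phi(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mills_derivative(x: float) -> float:
    """((1 - Phi)/phi)'(x) = (1 - Phi(x)) x / phi(x) - 1; it tends to 0, so (5.1.25) with i = 3 applies."""
    return upper_tail(x) * x / phi(x) - 1.0

def gumbel_error(n: int, grid=None) -> float:
    """max over the grid of |Phi^n(b_n + a_n x) - G_3(x)|, b_n = Phi^{-1}(1 - 1/n), a_n = 1/(n phi(b_n))."""
    b = NormalDist().inv_cdf(1.0 - 1.0 / n)
    a = 1.0 / (n * phi(b))
    grid = grid if grid is not None else [-3.0 + 0.01 * j for j in range(1101)]
    err = 0.0
    for x in grid:
        exact = math.exp(n * math.log1p(-upper_tail(b + a * x)))  # Phi^n computed on the log scale
        err = max(err, abs(exact - math.exp(-math.exp(-x))))
    return err

for x in (2.0, 5.0, 20.0):
    print(x, mills_derivative(x))
for n in (100, 10_000):
    print(n, gumbel_error(n))
```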
Next, $a_n = 1/(n\varphi(b_n))$ and $b_n$ will be replaced by other normalizing constants that satisfy (5.1.11). Obviously, $b_n$ is the solution of the equation

$$1 - \Phi(b) = 1/n. \tag{2}$$

Since $1 - \Phi(x) \sim \varphi(x)/x$ as $x \to \infty$, it is immediate that $(2\log n)^{1/2}$ may be taken as a first approximate solution of (2). Moreover, (2) may be written

$$\frac{1 - \Phi(b)}{\varphi(b)} = \frac{1}{n\varphi(b)}$$

and hence a solution of the equation

$$n\varphi(b) = b, \tag{3}$$

say $\tilde b_n$, will be an approximate solution of (2). It can be shown (compare also with Example 5.2.4) that (1) still holds with a remainder term of order $O(1/\log n)$ if $a_n$ and $b_n$ are replaced by $\tilde a_n$ and $\tilde b_n$, where $\tilde a_n = \tilde b_n^{-1}$.
Equation (3) is equivalent to

$$b = (2\log n - \log 2\pi - 2\log b)^{1/2}. \tag{4}$$

A Taylor expansion of length two about $2\log n$ leads to the equation

$$b = (2\log n)^{1/2} - \frac{\log 2\pi + 2\log b}{2(2\log n)^{1/2}}.$$

Replacing $b$ on the right-hand side by $(2\log n)^{1/2}$ we get

$$b_n^* = (2\log n)^{1/2} - \frac{\log 4\pi + \log\log n}{2(2\log n)^{1/2}}. \tag{5}$$

Use P.5.7 to prove that

$$\sup_B \left| P\{(2\log n)^{1/2}(X_{n:n} - b_n^*) \in B\} - G_3(B) \right| = O\!\left(\frac{(\log\log n)^2}{\log n}\right). \tag{6}$$

We remark that the bound in (6) is sharp. Moreover, the same rates are obtained if d.f.'s are considered.
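The three centering constants of Example 5.1.4 can be compared numerically. In the sketch below (names hypothetical; Python 3.8+ assumed), $b_n$ is computed with `NormalDist().inv_cdf`, $\tilde b_n$ by iterating the right-hand side of (4) as a fixed-point map (a solver of our own choosing; near the root the map has derivative of about $-1/b^2$, so it contracts), and $b_n^*$ from the explicit formula (5).

```python
import math
from statistics import NormalDist

def b_quantile(n: int) -> float:
    """b_n = Phi^{-1}(1 - 1/n), the exact solution of (2)."""
    return NormalDist().inv_cdf(1.0 - 1.0 / n)

def b_tilde(n: int, iterations: int = 50) -> float:
    """Solution of n*phi(b) = b, i.e. (3), obtained by iterating the right-hand side of (4)."""
    b = math.sqrt(2.0 * math.log(n))  # first approximate solution mentioned in the text
    for _ in range(iterations):
        b = math.sqrt(2.0 * math.log(n) - math.log(2.0 * math.pi) - 2.0 * math.log(b))
    return b

def b_star(n: int) -> float:
    """Explicit approximation (5)."""
    r = math.sqrt(2.0 * math.log(n))
    return r - (math.log(4.0 * math.pi) + math.log(math.log(n))) / (2.0 * r)

for n in (10**3, 10**6):
    print(n, b_quantile(n), b_tilde(n), b_star(n))
```

Since $(1 - \Phi(b))\,b/\varphi(b) < 1$ for $b > 0$, the root $\tilde b_n$ of (3) always lies slightly above $b_n$.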

The kth Largest Order Statistic


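The results given above can easily be extended to the case of the $k$th largest order statistic. It is well known that

$$P\{a(n)^{-1}(X_{n:n} - b(n)) \le x\} \to G_{i,\alpha}(x), \qquad n \to \infty,$$

implies for every fixed $k$ that

$$P\{a(n)^{-1}(X_{n-k+1:n} - b(n)) \le x\} \to G_{i,\alpha,k}(x), \qquad n \to \infty, \tag{5.1.28}$$

where the d.f.'s $G_{i,\alpha,k}$ are given by

$$\begin{aligned}
G_{1,\alpha,k}(x) &= \exp(-x^{-\alpha}) \sum_{j=0}^{k-1} \frac{x^{-j\alpha}}{j!}, & x &> 0,\\
G_{2,\alpha,k}(x) &= \exp(-(-x)^{\alpha}) \sum_{j=0}^{k-1} \frac{(-x)^{j\alpha}}{j!}, & x &< 0,\\
G_{3,k}(x) &= \exp(-e^{-x}) \sum_{j=0}^{k-1} \frac{e^{-jx}}{j!}, & -\infty &< x < \infty.
\end{aligned} \tag{5.1.29}$$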


With the convention $G_{3,\alpha,k} \equiv G_{3,k}$ we have

$$G_{i,\alpha,k} = G_{i,\alpha} \sum_{j=0}^{k-1} \frac{(-\log G_{i,\alpha})^j}{j!} \tag{5.1.30}$$

on the support of $G_{i,\alpha}$.


To prove (5.1.28), recall that, necessarily, for $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$ and $x \in (\alpha(G), \omega(G))$,

$$n(1 - F(u_n)) \to -\log G(x), \qquad n \to \infty,$$

with $u_n = b_n + a_n x$. According to (1.1.8), as $n \to \infty$,

$$P\{X_{n-k+1:n} \le u_n\} = P\Big\{\sum_{i=1}^{n} 1_{(u_n,\infty)}(\xi_i) \le k - 1\Big\} = B_{(n,\,1-F(u_n))}(\{0, 1, \ldots, k-1\}) \to P_{-\log G(x)}(\{0, 1, \ldots, k-1\}) \tag{5.1.31}$$

where $P_t$ denotes the Poisson distribution with parameter $t > 0$. Thus, (5.1.28) holds.
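Relation (5.1.31) can be made concrete for standard exponential r.v.'s, where $b_n = \log n$, $a_n = 1$ and $n(1 - F(u_n)) = e^{-x} = -\log G_3(x)$ holds exactly. The following sketch (our own; function names hypothetical) compares the exact binomial probability with its Poisson limit $G_{3,k}(x)$ on a grid of $x$ values.

```python
import math

def poisson_cdf(m: int, t: float) -> float:
    """P_t({0, 1, ..., m}) for the Poisson distribution with parameter t."""
    return math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(m + 1))

def binomial_cdf(m: int, n: int, q: float) -> float:
    """B_(n, q)({0, 1, ..., m})."""
    return sum(math.comb(n, j) * q ** j * (1.0 - q) ** (n - j) for j in range(m + 1))

def G3k(x: float, k: int) -> float:
    """G_{3,k}(x) = exp(-e^{-x}) sum_{j<k} e^{-jx}/j!, the Poisson limit in (5.1.31) with t = e^{-x}."""
    return poisson_cdf(k - 1, math.exp(-x))

def kth_largest_cdf(x: float, n: int, k: int) -> float:
    """Exact P{X_{n-k+1:n} <= log n + x} for n i.i.d. standard exponential r.v.'s, via (5.1.31).

    With u_n = log n + x we have 1 - F(u_n) = e^{-x}/n.
    """
    q = math.exp(-x) / n
    return binomial_cdf(k - 1, n, q)

def max_error(n: int, k: int) -> float:
    grid = [-2.0 + 0.05 * j for j in range(161)]  # x in [-2, 6]
    return max(abs(kth_largest_cdf(x, n, k) - G3k(x, k)) for x in grid)

for n in (100, 10_000):
    print(n, max_error(n, k=3))
```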
Moreover, it is well known that every nondegenerate limiting d.f. of the $k$th largest order statistic $X_{n-k+1:n}$ has to be one of the d.f.'s in (5.1.29) (see e.g. Galambos (1987), Theorem 2.8.1), where it is always understood that a location and scale parameter has to be included if the d.f. of $X_{n-k+1:n}$ is not properly standardized.
Note that, in analogy to (1.3.15), the nondegenerate limiting d.f.'s $F_{i,\alpha,k}$ of the $k$th smallest order statistics $X_{k:n}$ are given by

$$\begin{aligned}
F_{1,\alpha,k}(x) &= 1 - G_{1,\alpha,k}(-x), & x &< 0,\\
F_{2,\alpha,k}(x) &= 1 - G_{2,\alpha,k}(-x), & x &> 0,\\
F_{3,k}(x) &= 1 - G_{3,k}(-x),
\end{aligned} \tag{5.1.32}$$

where again $\alpha > 0$.
Obviously, $G_{2,1,k}$ is the "negative" gamma d.f. with parameter $k$; thus, the density $g_{2,1,k}$ of $G_{2,1,k}$ is given by

$$g_{2,1,k}(x) = e^{x} \frac{(-x)^{k-1}}{(k-1)!}, \qquad x < 0,$$

and $= 0$, otherwise.
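We also note the explicit form of the densities $g_{i,\alpha,k}$ of $G_{i,\alpha,k}$. Since

$$G_{i,\alpha,k} = G_{2,1,k}(\log G_{i,\alpha}) \quad \text{on } (\alpha(G_{i,\alpha}), \omega(G_{i,\alpha})),$$

we know that

$$g_{i,\alpha,k}(x) = g_{2,1,k}(\log G_{i,\alpha}(x)) \frac{g_{i,\alpha}(x)}{G_{i,\alpha}(x)}, \qquad x \in (\alpha(G_{i,\alpha}), \omega(G_{i,\alpha})), \tag{5.1.33}$$

and $= 0$, otherwise. Explicitly, we have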


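$$\begin{aligned}
g_{1,\alpha,k}(x) &= \alpha \exp(-x^{-\alpha}) \frac{x^{-(\alpha k+1)}}{(k-1)!}, & x &> 0,\\
g_{2,\alpha,k}(x) &= \alpha \exp(-(-x)^{\alpha}) \frac{(-x)^{\alpha k-1}}{(k-1)!}, & x &< 0,\\
g_{3,k}(x) &= \exp(-e^{-x}) \frac{e^{-kx}}{(k-1)!}, & -\infty &< x < \infty.
\end{aligned} \tag{5.1.34}$$

Notice that $g_{i,\alpha,k} = g_{i,\alpha}(-\log G_{i,\alpha})^{k-1}/(k-1)!$.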
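As a sanity check of (5.1.30) and (5.1.34) for $i = 3$, the sketch below (our own, with hypothetical names) compares a central-difference derivative of $G_{3,k}$ with the closed-form density $g_{3,k}$, and also with the representation $g_{3,k} = g_3(-\log G_3)^{k-1}/(k-1)!$.

```python
import math

def G3k(x: float, k: int) -> float:
    """(5.1.30) with i = 3: G_{3,k} = G_3 * sum_{j<k} (-log G_3)^j / j!, where -log G_3(x) = e^{-x}."""
    t = math.exp(-x)
    return math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(k))

def g3k(x: float, k: int) -> float:
    """(5.1.34): g_{3,k}(x) = exp(-e^{-x}) e^{-kx} / (k-1)!."""
    return math.exp(-math.exp(-x) - k * x) / math.factorial(k - 1)

def g3k_via_notice(x: float, k: int) -> float:
    """Form from the remark after (5.1.34): g_{3,k} = g_3 (-log G_3)^{k-1} / (k-1)!."""
    g3 = math.exp(-x - math.exp(-x))
    return g3 * math.exp(-x) ** (k - 1) / math.factorial(k - 1)

def numerical_derivative(F, x: float, h: float = 1e-5) -> float:
    return (F(x + h) - F(x - h)) / (2.0 * h)

for k in (1, 2, 3, 4):
    for x in (-1.0, 0.0, 0.5, 2.0):
        d = numerical_derivative(lambda y: G3k(y, k), x)
        print(k, x, d, g3k(x, k), g3k_via_notice(x, k))
```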

Lemma 1.6.6 yields that $G_{2,1,k}$ is the d.f. of the partial sum $S_k = \sum_{i=1}^{k} \xi_i$, where $\xi_1, \ldots, \xi_k$ are i.i.d. random variables with common d.f. $F(x) = e^x$, $x < 0$.
Next it will be proved that $n(U_{n-k+1:n} - 1)$ is asymptotically distributed according to $G_{2,1,k}$ (in other words, can asymptotically be represented by $S_k$). As an extension of Lemma 5.1.2 we obtain

Lemma 5.1.5. For every positive integer $k$,

$$\sup_B \left| P\{n(U_{n-k+1:n} - 1) \in B\} - G_{2,1,k}(B) \right| \to 0, \qquad n \to \infty. \tag{5.1.35}$$

PROOF. Obvious by noting that $n(U_{n-k+1:n} - 1)$ has the density $f_n$ given by

$$f_n(x) = \left[\prod_{i=1}^{k-1}\left(1 - \frac{i}{n}\right)\right] \left(1 + \frac{x}{n}\right)^{n-k} \frac{(-x)^{k-1}}{(k-1)!}, \qquad -n < x < 0,$$

and $= 0$, otherwise.
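The density $f_n$ in the proof of Lemma 5.1.5 also allows a direct numerical look at the variational distance in (5.1.35). The following sketch (our own; it uses $\sup_B |P(B) - Q(B)| = \frac12 \int |p - q|$ for densities $p, q$, together with a truncated range and a plain midpoint rule, all of which are our choices) shows the distance decreasing roughly like $1/n$.

```python
import math

def f_n(x: float, n: int, k: int) -> float:
    """Density of n(U_{n-k+1:n} - 1) from the proof of Lemma 5.1.5; zero outside (-n, 0)."""
    if not -n < x < 0:
        return 0.0
    prod = 1.0
    for i in range(1, k):
        prod *= 1.0 - i / n
    return prod * math.exp((n - k) * math.log1p(x / n)) * (-x) ** (k - 1) / math.factorial(k - 1)

def g_21k(x: float, k: int) -> float:
    """'Negative' gamma density g_{2,1,k}(x) = e^x (-x)^{k-1}/(k-1)!, x < 0."""
    return math.exp(x) * (-x) ** (k - 1) / math.factorial(k - 1) if x < 0 else 0.0

def variational_distance(n: int, k: int, lower: float = -60.0, steps: int = 100_000) -> float:
    """sup_B |P(B) - Q(B)| = (1/2) int |f_n - g_{2,1,k}|, midpoint rule on (lower, 0).
    Both densities are negligible below -60 for small k and n >= 100."""
    h = -lower / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += abs(f_n(x, n, k) - g_21k(x, k))
    return 0.5 * total * h

results = {n: variational_distance(n, k=2) for n in (100, 1000)}
for n, d in results.items():
    print(n, d, n * d)  # n * d stays bounded, in line with the O(k/n) rate announced for Section 5.4
```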
Obviously, Lemma 5.1.5 can be written

$$\sup_B \left| P\{n V_{n-k+1:n} \in B\} - G_{2,1,k}(B) \right| \to 0, \qquad n \to \infty, \tag{5.1.36}$$

where $V_{n-k+1:n}$ is the $k$th largest order statistic of $n$ i.i.d. random variables that are uniformly distributed on $(-1, 0)$.
Recall that the uniform distribution on $(-1, 0)$ is the generalized Pareto distribution $W_{2,1}$. (5.1.36) can easily be extended to the other generalized Pareto distributions $W_{i,\alpha}$ by using the transformation technique.
Let again $T_{i,\alpha}$ be defined as in (1.6.10). For $x < 0$, we have $T_{1,\alpha}(x) = (-x)^{-1/\alpha}$, $T_{2,\alpha}(x) = -(-x)^{1/\alpha}$ and $T_3(x) = -\log(-x)$.
Since $T_{i,\alpha}(nV_{r:n}) = c_n^{-1}(X_{r:n} - d_n)$, where $c_n, d_n$ are the constants of Theorem 5.1.1, and since $G_{i,\alpha,k}$ is induced by $G_{2,1,k}$ and $T_{i,\alpha}$ [recall that $T_{i,\alpha}^{-1} = G_{2,1}^{-1} \circ G_{i,\alpha} = \log G_{i,\alpha}$], the following result is immediate from Lemma 5.1.5.

Corollary 5.1.6. Let $X_{n-k+1:n}$ be the $k$th largest order statistic of $n$ i.i.d. random variables with common generalized Pareto d.f. $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3 : \alpha > 0\}$.


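Then, for every fixed $k$, as $n \to \infty$ (with $W = W_{1,\alpha}$, $W_{2,\alpha}$ and $W_3$ in (5.1.37), (5.1.38) and (5.1.39), respectively),

$$\sup_B \left| P\{n^{-1/\alpha} X_{n-k+1:n} \in B\} - G_{1,\alpha,k}(B) \right| \to 0, \tag{5.1.37}$$

$$\sup_B \left| P\{n^{1/\alpha} X_{n-k+1:n} \in B\} - G_{2,\alpha,k}(B) \right| \to 0, \tag{5.1.38}$$

$$\sup_B \left| P\{X_{n-k+1:n} - \log n \in B\} - G_{3,k}(B) \right| \to 0. \tag{5.1.39}$$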
In Section 5.4 it will be shown that Lemma 5.1.5 (and thus also Corollary
5.1.6) is valid with a remainder term of order O(k/n).

Intermediate Order Statistics


From Chapter 4 we already know that intermediate order statistics are asymptotically normal under weak regularity conditions. For example, according to Theorem 4.2.1,

$$\sup_B \left| P\{a_{r,n}^{-1}(U_{r:n} - b_{r,n}) \in B\} - N(0,1)(B) \right| \le C\left(\frac{n}{r(n - r)}\right)^{1/2} \tag{5.1.40}$$

where $C > 0$ is a universal constant, and $a_{r,n} > 0$ and $b_{r,n}$ are normalizing constants. In Section 5.4 it will be proved that

$$\sup_B \left| P\{n(U_{n-k+1:n} - 1) \in B\} - G_{2,1,k}(B) \right| \le Ck/n, \tag{5.1.41}$$

where $G_{2,1,k}$ is the "negative" gamma distribution. We also refer to P.5.18, where a rate of order $O(k^{1/2}/n)$ is achieved in (5.1.41) by using other normalizing constants. Approximations of joint distributions of intermediate order statistics are established in Sections 4.5, 5.4, and 5.5.
The following theorem is taken from Falk (1989b).
Theorem 5.1.7. Assume that one of the von Mises conditions (5.1.24) holds. Let $k(n) \in \{1, \ldots, n\}$ be such that $k(n) \to \infty$ and $k(n)/n \to 0$ as $n \to \infty$. Then, with $b_n = F^{-1}(1 - k(n)/n)$, we have

$$\ldots, \qquad n \to \infty. \tag{5.1.42}$$

The proof of (5.1.42) is based on (5.1.40) and the transformation technique.

5.2. Hellinger Distance between Exact and Approximate Distributions of Sample Maxima
Given $n$ i.i.d. random variables $\xi_1, \ldots, \xi_n$ with common d.f. $F$, we know that the d.f. of the sample maximum $M_n = X_{n:n}$ is given by $F^n$. In Section 5.1 we gave a short outline of classical results concerning the weak convergence of $F^n$ (if appropriately normalized) to a limiting d.f. $G$. Moreover, we know that under von Mises-type conditions the weak convergence is equivalent to convergence w.r.t. the variational distance. In the present section we study the accuracy of such approximations. Again, we use the same symbol for a d.f. and the pertaining probability measure to simplify the notation.

The Hellinger Distance


In statistical applications it is desirable to use the Hellinger distance instead of the variational distance. To highlight this point, consider the sample maxima

$$M_{n,i} = \max(\xi_{i,1}, \ldots, \xi_{i,n}), \qquad i = 1, \ldots, N,$$

where the random variables $\xi_{1,1}, \ldots, \xi_{1,n}, \xi_{2,1}, \ldots, \xi_{2,n}, \ldots, \xi_{N,1}, \ldots, \xi_{N,n}$ are i.i.d. with common d.f. $F$. Thus, $M_{n,1}, \ldots, M_{n,N}$ are i.i.d. random variables with common d.f. $F^n$. If $\eta_1, \ldots, \eta_N$ are i.i.d. random variables with d.f. $G$, then we know from Corollary 3.3.11 that for every Borel set $B$,

$$\left| P\{(M_{n,1}, \ldots, M_{n,N}) \in B\} - P\{(\eta_1, \ldots, \eta_N) \in B\} \right| \le N^{1/2} H(F^n, G) \tag{5.2.1}$$

where $H(F^n, G)$ is the Hellinger distance between $F^n$ and $G$.


Given d.f.'s $F$ and $G$ with Lebesgue densities $f$ and $g$, the Hellinger distance of $F$ and $G$ is defined by

$$H(F, G) = \left[\int \left(f^{1/2}(x) - g^{1/2}(x)\right)^2 dx\right]^{1/2}. \tag{5.2.2}$$

In general, if $F$ and $G$ have densities $f$ and $g$ with respect to some $\sigma$-finite measure $\mu$, then

$$H(F, G) = \left[\int \left(f^{1/2} - g^{1/2}\right)^2 d\mu\right]^{1/2} \tag{5.2.3}$$

and the distance is independent of the particular representation. Thus (5.2.2) and (5.2.3) lead to the same distance (if Lebesgue densities exist). We refer to Section 3.3 for further details.
(5.2.1) also holds with $N^{1/2} H(F^n, G)$ replaced by $N\|F^n - G\|$, where $\|F^n - G\|$ is the variational distance between $F^n$ and $G$. However, the use of $N\|F^n - G\|$ yields an inaccurate inequality in those cases where $\|F^n - G\|$ and $H(F^n, G)$ are of the same magnitude.
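The definition (5.2.2) is easy to evaluate numerically. As a check, the sketch below (our own; names hypothetical) compares a midpoint-rule value with the closed form for two normal densities with unit variance, $H^2 = 2 - 2\exp(-\mu^2/8)$, valid for the convention in (5.2.2) without a factor $1/2$. It then compares the two bounds $N^{1/2}H$ and $N\|\cdot\|$ from (5.2.1) for a pair where the Hellinger and variational distances are of the same magnitude; the normal pair is only a stand-in for $F^n$ and $G$.

```python
import math

def hellinger(f, g, a: float, b: float, steps: int = 100_000) -> float:
    """H(F, G) = [int (f^{1/2} - g^{1/2})^2 dx]^{1/2} as in (5.2.2), midpoint rule on (a, b)."""
    h = (b - a) / steps
    s = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        s += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
    return math.sqrt(s * h)

def normal_pdf(mu: float):
    return lambda x: math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def normal_shift_distances(mu: float):
    """Closed forms for N(0,1) versus N(mu,1): Hellinger distance as in (5.2.2),
    H^2 = 2 - 2 exp(-mu^2/8), and variational distance sup_B |P(B) - Q(B)| = 2 Phi(|mu|/2) - 1."""
    H = math.sqrt(2.0 - 2.0 * math.exp(-mu * mu / 8.0))
    tv = math.erf(abs(mu) / (2.0 * math.sqrt(2.0)))
    return H, tv

mu = 0.5
numeric = hellinger(normal_pdf(0.0), normal_pdf(mu), -12.0, 12.5)
exact, _ = normal_shift_distances(mu)
print(numeric, exact)

# Both bounds in (5.2.1) for N = 100 independent maxima when H and the variational distance are comparable.
N = 100
H, tv = normal_shift_distances(0.05)
print(N ** 0.5 * H, N * tv)  # the bound N * tv exceeds 1 here and is therefore useless
```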

An Auxiliary Approximation
According to (5.1.4),

$$F^n(b_n + x a_n) \to G(x) =: \exp(-h(x)), \qquad n \to \infty,$$

if, and only if,

$$n(1 - F(b_n + x a_n)) \to h(x), \qquad n \to \infty. \tag{5.2.4}$$


Since $F$ is a d.f., it is obvious that $D_n$ defined by

$$D_n = \left[\exp[-n(1 - F_n)] - e^{-n}\right]/(1 - e^{-n})$$

is also a d.f., where $F_n(x) = F(b_n + x a_n)$. Now (5.2.4) may be written

$$F_n^n \to G \quad \text{if, and only if,} \quad D_n \to G, \qquad n \to \infty, \tag{5.2.4'}$$

w.r.t. pointwise convergence.
According to Lemma 5.2.1, $D_n \to G$ implies $F_n^n \to G$, where the convergence is taken w.r.t. the Hellinger distance $H$. Notice that the d.f. $G$ in Lemma 5.2.1 is not necessarily an extreme value d.f. In particular, $G$ may also depend on $n$.

Lemma 5.2.1. There exists a universal constant $C > 0$ such that for every $n$ and all d.f.'s $F$ and $G$ the following inequality holds:

$$H(F^n, G) \le H(D_n, G) + C/n$$

where $D_n = [\exp[-n(1 - F)] - e^{-n}]/(1 - e^{-n})$.


PROOF. Since $H(F^n, G) \le H(F^n, D_n) + H(D_n, G)$, we know that the assertion holds if

$$H(F^n, D_n) \le C/n. \tag{1}$$

First, (1) will be verified in the special case of $F_0(x) = 1 + x/n$, $-n < x < 0$. Notice that $F_0^n$ is the d.f. of $n(U_{n:n} - 1)$.
In this case we have $D_n(x) \equiv D_{0,n}(x) = (e^x - e^{-n})/(1 - e^{-n})$, $-n < x < 0$, and, therefore, $D_{0,n}$ is the normalized restriction of the extreme value d.f. $G_{2,1}$ to the interval $(-n, 0)$.
Denote by $f_0$ and $d_{0,n}$ the densities of $F_0$ and $D_{0,n}$. Since

$$H(F_0^n, D_{0,n}) \le \left[\int \left(\frac{n f_0 F_0^{n-1}}{d_{0,n}} - 1\right)^2 dD_{0,n}\right]^{1/2}$$

(see Lemma 3.3.9(ii)), it is immediate that (1) holds for $F_0$ and $D_{0,n}$ if

$$\int_{-n}^{0} \left[e^{-x}\left(1 + \frac{x}{n}\right)^{n-1}(1 - e^{-n}) - 1\right]^2 \frac{e^x}{1 - e^{-n}}\, dx \le (C/n)^2. \tag{2}$$

This inequality can be verified by means of some straightforward calculations.
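Inequality (2) can also be checked numerically. The sketch below (our own; the truncation at $-80$ and the midpoint rule are our choices, and names are hypothetical) evaluates the left-hand side of (2) and multiplies it by $n^2$. A heuristic expansion of $\log[e^{-x}(1 + x/n)^{n-1}]$ in powers of $1/n$ (our own computation, not a statement from the text) suggests that $n^2$ times the integral tends to $2$.

```python
import math

def lhs_of_2(n: int, steps: int = 100_000) -> float:
    """Midpoint-rule value of the integral in (2):
    int_{-n}^0 [e^{-x}(1 + x/n)^{n-1}(1 - e^{-n}) - 1]^2 e^x/(1 - e^{-n}) dx.
    The range is cut at -80 when n > 80; the weight e^x makes the remainder negligible."""
    c = 1.0 - math.exp(-n)
    a = max(-float(n), -80.0)
    h = -a / steps
    s = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        ratio = math.exp(-x + (n - 1) * math.log1p(x / n)) * c
        s += (ratio - 1.0) ** 2 * math.exp(x) / c
    return s * h

values = {n: lhs_of_2(n) for n in (100, 200, 400)}
for n, v in values.items():
    print(n, v, n * n * v)  # n^2 times the integral stays bounded, i.e. the integral is O(n^{-2})
```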


The extension to arbitrary d.f.'s is obtained by means of the transformation technique. If $\xi$ and $\eta$ are r.v.'s with d.f.'s $F_0^n$ and $D_{0,n}$, then $F^{-1}(1 + \xi/n)$ and $F^{-1}(1 + \eta/n)$ are r.v.'s with d.f.'s $F^n$ and $D_n = [\exp[-n(1 - F)] - e^{-n}]/(1 - e^{-n})$. Now Lemma 3.3.13, which concerns the Hellinger distance between induced probability measures, implies (1) in the general case. □
Let $F_0$ be defined as in the proof of Lemma 5.2.1, that is, $F_0^n$ is the d.f. of $n(U_{n:n} - 1)$. It is easy to see that also

$$H(F_0^n, G_{2,1}) \le C/n. \tag{5.2.5}$$
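In the same spirit, (5.2.5) can be checked numerically. The sketch below (our own; names hypothetical, integration range and rule chosen by us) computes $H(F_0^n, G_{2,1})$ from (5.2.2) and prints $nH$, which should stay bounded. The same heuristic expansion in powers of $1/n$ suggests $nH \to 2^{-1/2}$; this limit is our own computation, not a claim of the text.

```python
import math

def hellinger_uniform_max(n: int, steps: int = 100_000) -> float:
    """H(F_0^n, G_{2,1}): F_0^n is the d.f. of n(U_{n:n} - 1), with density (1 + x/n)^{n-1} on (-n, 0),
    and G_{2,1}(x) = e^x, x < 0. Midpoint rule on (-80, 0); assumes n >= 100, so that both
    densities contribute less than 1e-30 to the integral left of -80."""
    a = -80.0
    h = -a / steps
    s = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        f = math.exp((n - 1) * math.log1p(x / n))
        g = math.exp(x)
        s += (math.sqrt(f) - math.sqrt(g)) ** 2
    return math.sqrt(s * h)

results = {n: hellinger_uniform_max(n) for n in (100, 400)}
for n, H in results.items():
    print(n, H, n * H)  # n * H stays bounded, in line with (5.2.5)
```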

According to our considerations in Section 5.1, this inequality can easily


be extended to sample maxima under arbitrary generalized Pareto d.f.'s.

The Main Results


Notice that Lemma 5.2.1 holds for arbitrary d.f.'s $F$ and $G$. Hereafter, we shall assume that $F$ and $G$ possess densities $f$ and $g$.
In the next step we establish an upper bound for $H(D_n, G)$, and thus for $H(F^n, G)$, which depends on $F$ through the density $f$ only.
Lemma 5.2.2. Let $F$ and $G$ be d.f.'s with densities $f$ and $g$. Define $\psi = g/G$ on the support of $G$. Then, for every $x_0 \ge -\infty$,

$$H(F^n, G) \le \left[2G(B^c) + \int_B \left(\frac{nf}{\psi} - 1 - \log\frac{nf}{\psi}\right) dG + \int_B (1 + \log G)\, dG + \int_{\{g=0\}} nG\, dF\right]^{1/2} + C/n \tag{5.2.6}$$

where $B = \{x : x > x_0, f(x) > 0\}$ and $C > 0$ is a universal constant.


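PROOF. Let the d.f. $D_n$ be defined as in Lemma 5.2.1. Notice that $D_n$ has the density $x \mapsto nf \exp[-n(1 - F)]/(1 - e^{-n})$. To prove this, apply e.g. Remark 1.5.3. Now, by Lemma 5.2.1 and Lemma A.3.5, applied to $H(D_n, G)$, we obtain

$$H(F^n, G) \le \left[2G(B^c) + \int_B \left[n(1 - F) - \log(nf) + \log(G\psi)\right] dG\right]^{1/2} + \frac{C}{n}. \tag{1}$$

Recall that $G = g/\psi$ on the set $\{g > 0\}$. Hence, by Fubini's theorem,

$$\begin{aligned}
\int_B (1 - F)\, dG &\le \int_{x_0}^{\infty} \left(\int_x^{\infty} f(y)\, dy\right) dG(x)\\
&= \iint 1_{[x_0,\infty)}(x)\, 1_{(-\infty,y)}(x)\, f(y) g(x)\, dx\, dy\\
&= \int f(y)\, 1_{[x_0,\infty)}(y) \left(\int_{x_0}^{y} g(x)\, dx\right) dy\\
&\le \int_{x_0}^{\infty} f(y) G(y)\, dy\\
&\le \int_B (f/\psi)\, dG + \int_{\{g=0\}} G(y)\, dF(y).
\end{aligned} \tag{2}$$

Combining (1) and (2), we obtain inequality (5.2.6).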


In special cases the term on the right-hand side of (5.2.6) simplifies


considerably.
Corollary 5.2.3. Assume, in addition to the conditions of Lemma 5.2.2, that $F$ and $G$ are mutually absolutely continuous (that is, $G\{f > 0\} = F\{g > 0\} = 1$). Then,

$$H(F^n, G) \le \left[\int \left(\frac{nf}{\psi} - 1 - \log\frac{nf}{\psi}\right) dG\right]^{1/2} + C/n. \tag{5.2.7}$$

PROOF. Lemma 5.2.2 will be applied to $x_0 = -\infty$. It suffices to prove that

$$\int \log G\, dG = -1. \tag{1}$$

Notice that, according to Lemma 1.2.4,

$$\int (1 + \log G)\, dG = \int_0^1 \left(1 + \log\left(G(G^{-1}(x))\right)\right) dx = \int_0^1 (1 + \log x)\, dx = x\log x\,\Big|_0^1 = 0$$

since $x\log x \to 0$ as $x \to 0$. □

The proof of (1) shows that $\int \log G\, dG = -1$ for continuous d.f.'s $G$. If $G$ has a density $g$, then $\int g(x)(-\log G(x))\, dx = 1$, so that $g(x)(-\log G(x))$ is a probability density. In Section 5.1 we already obtained a special case, namely, that $g_{i,\alpha,2} = g_{i,\alpha}(-\log G_{i,\alpha})$, where $g_{i,\alpha,2}$ is the limiting density of the second largest order statistic.
Thus, if $g$ is an approximation to the density of the standardized sample maximum, then $g(-\log G)$ will be the proper candidate as an approximate density of the second largest order statistic. The extension of this argument to $k > 2$ is straightforward and is left to the reader.
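The identity $\int \log G\, dG = -1$ (equivalently, $\int g(-\log G)\, dx = 1$) is easy to confirm numerically for the Gumbel d.f. $G_3$ and the Weibull d.f. $G_{2,1}$. The sketch below is our own illustration, with hypothetical names and truncated integration ranges.

```python
import math

def integral_log_G_dG(g, G, a: float, b: float, steps: int = 200_000) -> float:
    """Midpoint approximation of int_a^b log G(x) g(x) dx."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += math.log(G(x)) * g(x)
    return total * h

# Gumbel G_3: G(x) = exp(-e^{-x}), g(x) = e^{-x} exp(-e^{-x}); the range (-5, 40) carries all the mass.
gumbel = integral_log_G_dG(lambda x: math.exp(-x - math.exp(-x)),
                           lambda x: math.exp(-math.exp(-x)), -5.0, 40.0)
# Weibull G_{2,1}: G(x) = g(x) = e^x for x < 0.
weibull = integral_log_G_dG(math.exp, math.exp, -50.0, 0.0)
print(gumbel, weibull)  # both close to -1
```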
Since $x - 1 - \log x \le x - 1 + 1/x - 1 = (x - 1)^2/x$, we obtain from Corollary 5.2.3 that

$$H(F^n, G) \le \left[\int \left(\frac{nf}{\psi} - 1\right)^2 \frac{\psi}{nf}\, dG\right]^{1/2} + C/n \tag{5.2.8}$$

where again $\psi = g/G$. This inequality shows once more (see also Section 5.1) that the approximating d.f. $G$ should be chosen in such a way that $nf/\psi$ is close to one.
EXAMPLE 5.2.4. Let $F(x) = \Phi(b_n + b_n^{-1}x)$, where $\Phi$ is the standard normal d.f. and $b_n$ is the solution of $b_n = n\varphi(b_n)$ with $\varphi = \Phi'$. Then,

$$H(F^n, G_3) = O(1/\log n). \tag{5.2.9}$$


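To prove this, we apply (5.2.7) with $\psi(x) = e^{-x}$. We have

$$\begin{aligned}
H(F^n, G_3) &\le \left[\int \left(n\varphi(b_n + b_n^{-1}x)\, b_n^{-1} e^x - 1 - \log\left(n\varphi(b_n + b_n^{-1}x)\, b_n^{-1} e^x\right)\right) dG_3(x)\right]^{1/2} + C/n\\
&= \left[\int \left(\exp(-x^2/2b_n^2) - 1 + x^2/2b_n^2\right) dG_3(x)\right]^{1/2} + C/n\\
&\le \left[\int (x^4/8b_n^4)\, dG_3(x)\right]^{1/2} + C/n \le C(b_n^{-2} + n^{-1}).
\end{aligned}$$

Thus, (5.2.9) holds since $b_n^{-2} = O(1/\log n)$.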
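The chain of inequalities in Example 5.2.4 reduces to the elementary bound $e^{-u} - 1 + u \le u^2/2$ for $u \ge 0$, with $u = x^2/2b_n^2$. The sketch below (our own; names hypothetical, range truncated to $(-5, 60)$) computes the middle integral $J$ and its bound $B$ from the last line for a few values of $b$, illustrating that $b^2 J^{1/2}$ stays bounded, i.e. $J^{1/2} = O(b^{-2})$.

```python
import math

def gumbel_density(x: float) -> float:
    return math.exp(-x - math.exp(-x))

def midpoint(func, a: float, b: float, steps: int = 50_000) -> float:
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def example_524_terms(b: float):
    """Return (J, B): J = int (exp(-u) - 1 + u) dG_3 with u = x^2/(2 b^2), the middle line of the
    proof of (5.2.9), and its upper bound B = int x^4/(8 b^4) dG_3 from the last line."""
    def j_integrand(x):
        u = x * x / (2.0 * b * b)
        return (math.expm1(-u) + u) * gumbel_density(x)  # expm1 avoids cancellation for small u
    def b_integrand(x):
        return x ** 4 / (8.0 * b ** 4) * gumbel_density(x)
    return midpoint(j_integrand, -5.0, 60.0), midpoint(b_integrand, -5.0, 60.0)

for b in (3.0, 5.0, 10.0):
    J, B = example_524_terms(b)
    print(b, J, B, b * b * math.sqrt(J))  # b^2 * sqrt(J) stays bounded
```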


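Next, Lemma 5.2.2 will be applied to extreme value d.f.'s $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$. Note that the function $\psi = g/G$ is given by

$$\begin{aligned}
\psi_{1,\alpha}(x) &= \alpha x^{-(1+\alpha)}, & x &> 0,\\
\psi_{2,\alpha}(x) &= \alpha(-x)^{-(1-\alpha)}, & x &< 0,\\
\psi_3(x) &= e^{-x}, & -\infty &< x < \infty.
\end{aligned} \tag{5.2.10}$$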

Recall that $\omega(F) < \infty$ if $F$ belongs to the domain of attraction of $G_{2,\alpha}$. In this case, the usual choice of the constant $b_n$ is $\omega(F)$, so that we may assume w.l.g. that $\omega(F) = \omega(G_{2,\alpha}) = 0$. If $F$ belongs to the domain of attraction of $G_{1,\alpha}$, then $\omega(F) = \omega(G_{1,\alpha}) = \infty$. Let us also assume that $\omega(F) = \omega(G_3) = \infty$ if $i = 3$, to make the inequality in Theorem 5.2.5 as simple as possible without losing too much generality.

Theorem 5.2.5. Let $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$. Let $F$ be a d.f. with density $f$ such that $f(x) > 0$ for $x_0 < x < \omega(F)$. Assume that $\omega(F) = \omega(G)$. Then,

$$H(F^n, G) \le \left[\int_{x_0}^{\omega(G)} \left(\frac{nf}{\psi} - 1 - \log\frac{nf}{\psi}\right) dG + 2G(x_0) - G(x_0)\log G(x_0)\right]^{1/2} + C/n$$

where $C > 0$ is a universal constant.

PROOF. Immediate from Lemma 5.2.2, since $\int_{\{g=0\}} nG\, dF = 0$ and

$$2G(B^c) + \int_B (1 + \log G)\, dG = G_{i,\alpha}(B^c) + G_{i,\alpha,2}(B^c) = G_{i,\alpha}(x_0) + G_{i,\alpha,2}(x_0) = 2G(x_0) - G(x_0)\log(G(x_0)).$$


Limit Distributions
The results above provide us with useful auxiliary inequalities which, in a next step, have to be applied to special examples or certain classes of underlying d.f.'s to obtain a more explicit form of the error bound.
Our first example again reveals the exceptional role of the generalized Pareto d.f.'s $W_{i,\alpha}$ (at least from a technical point of view).
EXAMPLE 5.2.6. (i) Let $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3 : \alpha > 0\}$ and let $c_n, d_n$ be the constants of Theorem 5.1.1. Put

$$F_n(x) = W(d_n + x c_n).$$

The density $f_n$ of $F_n$ is given by

$$f_n(x) = c_n w(d_n + x c_n) = \psi(x)/n$$

for every $x$ with $f_n(x) > 0$. Thus, we have

$$\int_{\{f_n > 0\}} \left(\frac{nf_n}{\psi} - 1 - \log\frac{nf_n}{\psi}\right) dG = 0.$$

Applying Theorem 5.2.5 to $x_0 = (\alpha(W) - d_n)/c_n$, we obtain again

$$H(F_n^n, G) \le C/n.$$

(ii) Let the generalized Pareto d.f. $W$ in (i) be replaced by a d.f. $F$ which has the same tail as $W$. More precisely,

$$f(x) = w(x), \qquad T(x_0) < x < \omega(G),$$

where $-1 < x_0 < 0$ and $T$ is the corresponding transformation as defined in (1.6.10). Then,

$$H(F_n^n, G) \le C_0/n$$

where $C_0$ is a constant which only depends on $x_0$.

Notice that the condition $T(x_0) < x$ in Example 5.2.6(ii) makes the accuracy of the approximation independent of the particular underlying d.f. $F$.
Example 5.2.6 will be generalized to classes of d.f.'s which include the generalized Pareto d.f.'s as well as the extreme value d.f.'s. Since our calculations are always carried out within an error bound of order $O(n^{-1})$, it is clear that the estimates will be inaccurate for extreme value d.f.'s.
Assume that the underlying density $f$ is of the form

$$f = \psi e^h$$

where $h(x) \to 0$ as $x \to \omega(G)$. Equivalently, one may use the representation $f = \psi(1 + h)$ by writing $f = \psi e^h = \psi(1 + (e^h - 1))$.
Corollary 5.2.7. Assume that $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$ and $\psi$, $T$ are the corresponding auxiliary functions with $\psi = g/G$ and $T = G^{-1} \circ G_{2,1}$.


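Assume that the density $f$ of the d.f. $F$ has the representation

$$f(x) = \psi(x) e^{h(x)}, \qquad T(x_0) < x < \omega(G), \tag{5.2.11}$$

and $= 0$ if $x > \omega(G)$, where $x_0 < 0$ and $h$ satisfies the condition

$$|h(x)| \le \begin{cases} L x^{-\alpha\delta} & \text{if } i = 1,\\ L(-x)^{\alpha\delta} & \text{if } i = 2,\\ L e^{-\delta x} & \text{if } i = 3, \end{cases} \tag{5.2.12}$$

and $L$, $\delta$ are positive constants. Write

$$F_n(x) = F(d_n + x c_n)$$

where $c_n, d_n$ are the constants of Theorem 5.1.1. We have $d_n = 0$ if $i = 1, 2$, and $d_n = \log n$ if $i = 3$; moreover, $c_n = n^{1/\alpha}$ if $i = 1$, $c_n = n^{-1/\alpha}$ if $i = 2$, and $c_n = 1$ if $i = 3$.
Then, the following inequality holds:

$$H(F_n^n, G) \le \begin{cases} D n^{-\delta} & \text{if } 0 < \delta \le 1,\\ D n^{-1} & \text{if } \delta > 1, \end{cases} \tag{5.2.13}$$

where $D$ is a constant which only depends on $x_0$, $L$, and $\delta$.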


PROOF. W.l.g. we may assume that $G = G_{2,1}$. The other cases can easily be deduced by using the transformations $T \equiv T_{i,\alpha}$.
Theorem 5.2.5 will be applied to $x_{0,n} = n x_0$. It is straightforward that the term $2G_{2,1}(nx_0) - G_{2,1}(nx_0)\log G_{2,1}(nx_0)$ can be neglected. Put $f_n(x) = f(x/n)/n$. Since $h$ is bounded on $(x_0, 0)$, we have

$$\begin{aligned}
\int_{nx_0}^{0} \left(\frac{nf_n}{\psi_{2,1}} - 1 - \log\frac{nf_n}{\psi_{2,1}}\right) dG_{2,1} &= \int_{nx_0}^{0} \left(e^{h(x/n)} - 1 - h(x/n)\right) dG_{2,1}(x)\\
&\le D \int_{nx_0}^{0} (h(x/n))^2\, dG_{2,1}(x) \le D L^2 n^{-2\delta} \int_{-\infty}^{0} |x|^{2\delta}\, dG_{2,1}(x)
\end{aligned}$$

where $D$ only depends on $x_0$, $L$ and $\delta$. Now the assertion is immediate from Theorem 5.2.5. □
Extreme value d.f.'s have representations as given in (5.2.11) with $\delta = 1$ and $h(x) = -x^{-\alpha}$ if $i = 1$, $h(x) = -(-x)^{\alpha}$ if $i = 2$, and $h(x) = -e^{-x}$ if $i = 3$. Moreover, the special case $h = 0$ concerns the generalized Pareto densities.

Remark 5.2.8. Corollary 5.2.7 can as well be formulated for densities having the representation

$$f(x) = \psi(x)(1 + h(x)), \qquad T(x_0) < x < \omega(G), \tag{5.2.14}$$

and $= 0$ if $x > \omega(G)$, where $h$ satisfies condition (5.2.12).


Maximum of Normal R.V.'s: Penultimate Distributions


Inequality (5.2.6) is also applicable to problems where approximate distributions different from the limiting ones are used. The first example will show that Weibull distributions $G_{2,\alpha(n)}$ with $\alpha(n) \to \infty$ as $n \to \infty$ provide more accurate approximations to distributions of sample maxima of normal r.v.'s than the limiting distribution $G_3$.
The use of a "penultimate" distribution was already suggested by Tippett
in 1925. For a numerical comparison of the "ultimate" and "penultimate"
approximation we also refer to Fisher and Tippett (1928).
EXAMPLE 5.2.9. Let $F(x) = \Phi(b - b^{-1} + b^{-1}x)$, where $b$ is the solution of the equation

$$n\varphi(b - b^{-1}) = b.$$

Notice that $b$, and thus also $F$, depends on $n$; we have $b^{-2} = O(1/\log n)$.
Below we shall use the von Mises parametrization (see (1.3.17)) of Weibull d.f.'s, namely,

$$H_{-b^{-2}}(x) = G_{2,b^2}(-1 + x/b^2).$$

Applying Lemma 5.2.2 to $G = H_{-b^{-2}}$, we obtain after some straightforward but tedious calculations that

$$H(F^n, H_{-b^{-2}}) = O((\log n)^{-2}). \tag{5.2.15}$$

We indicate some details of the proof of (5.2.15). Check that

$$\frac{nf(x)}{\psi(x)} = \frac{\exp(-x + x/b^2 - x^2/2b^2)}{(1 - x/b^2)^{b^2 - 1}}.$$

To establish a sharp estimate of $\int (nf/\psi - 1 - \log(nf/\psi))\, dG$, proceed in the following way: (a) Apply Lemma A.2.1 to the integrand evaluated over the interval $[-cb, cb]$ with $c$ sufficiently small. (b) Use the crude inequalities $e^{\beta x/\alpha} \ge (1 + x/\alpha)^{\beta}$ for $x > -\alpha$, and $(1 + x/\alpha)^{\alpha} \ge 1 + x + (\alpha - 1)x^2/2\alpha$ for $x > 0$ and $\alpha \ge 2$, to obtain estimates of the integral over $(-\infty, -cb)$ and $(cb, b^2)$.
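The advantage of the penultimate approximation in Example 5.2.9 can be seen numerically. The sketch below (our own; helper names hypothetical, root found by bisection on a log scale, supremum approximated on a grid over $[-3, 8]$) compares $\Phi^n(b - b^{-1} + b^{-1}x)$ with the Gumbel d.f. $G_3$ and with $H_{-b^{-2}}$. It measures the maximal distance between d.f.'s, not the Hellinger distance in (5.2.15).

```python
import math

def upper_tail(y: float) -> float:
    return 0.5 * math.erfc(y / math.sqrt(2.0))  # 1 - Phi(y)

def solve_b(n: int) -> float:
    """Bisection for the root of n*phi(b - 1/b) = b (Example 5.2.9), written on the log scale."""
    def h(b):
        c = b - 1.0 / b
        return math.log(n) - 0.5 * c * c - 0.5 * math.log(2.0 * math.pi) - math.log(b)
    lo, hi = 1.5, math.sqrt(2.0 * math.log(n)) + 2.0  # h(lo) > 0 > h(hi) for n >= 10
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_cdf(x: float, n: int, b: float) -> float:
    """F^n(x) = Phi^n(b - 1/b + x/b), computed on the log scale."""
    return math.exp(n * math.log1p(-upper_tail(b - 1.0 / b + x / b)))

def gumbel(x: float) -> float:
    return math.exp(-math.exp(-x))

def penultimate(x: float, b: float) -> float:
    """H_{-b^{-2}}(x) = G_{2,b^2}(-1 + x/b^2) = exp(-(1 - x/b^2)^{b^2}) for x < b^2, and 1 otherwise."""
    beta2 = b * b
    return 1.0 if x >= beta2 else math.exp(-(1.0 - x / beta2) ** beta2)

def sup_errors(n: int):
    b = solve_b(n)
    grid = [-3.0 + 0.01 * j for j in range(1101)]  # x in [-3, 8]
    err_g = max(abs(exact_cdf(x, n, b) - gumbel(x)) for x in grid)
    err_p = max(abs(exact_cdf(x, n, b) - penultimate(x, b)) for x in grid)
    return b, err_g, err_p

b, err_gumbel, err_penultimate = sup_errors(10_000)
print(b, err_gumbel, err_penultimate)
```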

Maximum of Normal R.V.'s: Expansions of Length Two


From Lemma 5.2.10 it will become obvious that

$$\ldots \tag{5.2.16}$$

provides an expansion of length two of $\Phi^n(b - b^{-1} + b^{-1}x)$.


However, since this expansion is not monotone increasing, (5.2.15) cannot be formulated with $H_{-b^{-2}}$ replaced by this expansion, because the Hellinger distance is only defined for d.f.'s. One might overcome this problem by extending the definition of the Hellinger distance to signed measures. Another possibility is to redefine the expansion in such a way that one obtains a probability measure; this was achieved e.g. in Example 5.2.9. To reformulate (5.2.15) we need the following lemma, which concerns an expansion of length two of von Mises d.f.'s $H_\beta$.
Lemma 5.2.10. For every real $\beta$ denote by $\mu_\beta$ the signed measure which corresponds to the measure generating function

$$\ldots$$

Let again $H_\beta$ denote the von Mises distribution with parameter $\beta$. Then,

$$\sup_B |H_\beta(B) - \mu_\beta(B)| = O(\beta^2).$$

PROOF. Apply Lemma A.2.1 and Lemma A.3.2.

Thus, as an analogue to (5.2.15), we get

$$\sup_B \left| P\{b_n(X_{n:n} - (b_n - b_n^{-1})) \in B\} - \mu_{-b_n^{-2}}(B) \right| = O((\log n)^{-2}) \tag{5.2.17}$$

where $X_{n:n}$ is the maximum of $n$ i.i.d. standard normal r.v.'s, and $b_n$ is the solution of the equation $n\varphi(b - b^{-1}) = b$.
Figures 5.2.1-5.2.3 concern the density $f_n$ of $\Phi^n(b_n + a_n\,\cdot)$, with $b_n = \Phi^{-1}(1 - 1/n)$ and $a_n = 1/(n\varphi(b_n))$ (compare with P.5.8), the Gumbel density $g_3$, and the derivative $g_3(1 + h_n)$ of the expansion in (5.2.16).
Observe that $f_n$ and $g_3(1 + h_n)$ have modes larger than zero; moreover, $g_3(1 + h_n)$ provides a better approximation to $f_n$ than $g_3$.

Figure 5.2.1. Normalized density $f_n$ (dotted line) of the maximum of normal r.v.'s, Gumbel density $g_3$, and expansion $g_3(1 + h_n)$ for $n = 40$.


In order to get a better insight into the approximation indicated by Figure 5.2.1, we also give illustrations concerning the error of the approximation.

Figure 5.2.2. $f_n - g_3$ and $f_n - g_3(1 + h_n)$ for $n = 40$.

Figure 5.2.3. $f_n - g_3$ and $f_n - g_3(1 + h_n)$ for $n = 400$.

We are well aware that some statisticians take the slow convergence rate of order $O(1/\log n)$ as an argument against the asymptotic theory of extremes, perhaps believing that a rate of order $O(n^{-1/2})$ ensures a much better accuracy of an approximation for small sample sizes. However, one may argue that from the historical and mathematical point of view it is always challenging to tackle this and related problems. Moreover, one should know that typical statistical problems in extreme value theory do not concern normal r.v.'s.
The illustrations above and further numerical computations show that the Gumbel approximation to the normalized d.f. and density of the maximum of normal r.v.'s is of reasonable accuracy for small sample sizes. This may serve as an example that the applicability of an approximation depends not only on the rate of convergence but also on the constant involved in the error bound.
If a more accurate approximation is needed, then, instead of increasing the sample size, it is advisable to use an expansion of length two or a penultimate distribution. Comparing Figures 5.2.2 and 5.2.3, we see that the expansion of length two for $n = 40$ is of higher accuracy than the Gumbel approximation for $n = 400$.
The limit theorem and the expansion give some insight into the asymptotic behavior of the sample maximum. Keep in mind that the d.f. $\Phi^n$ of the sample maximum itself may serve as an approximate d.f. in certain applications (see Reiss, 1978a).

Expansions of Length Two


Another example of an expansion of length two is obtained by treating a refinement of Corollary 5.2.7 and Remark 5.2.8. In Remark 5.2.8 we studied distributions of sample maxima under densities of the form $f = \psi(1 + h)$, where $h$ varies over a certain class of functions. Next, we consider densities of the form

$$f = \psi(1 + p + h)$$

with $p$ being fixed. Moreover, $\psi$ is given as in (5.2.10).


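Below, an expansion of length two of distributions of sample maxima is established where the leading term of the expansion is an extreme value distribution $G$ and the second term depends on $G$ and $p$. Let

$$p(x) = \begin{cases} -K x^{-\alpha\rho} & \text{if } i = 1,\\ -K(-x)^{\alpha\rho} & \text{if } i = 2,\\ -K e^{-\rho x} & \text{if } i = 3, \end{cases} \tag{5.2.18}$$

for some fixed $K \ge 0$ and $\rho > 0$, and

$$|h(x)| \le \begin{cases} L x^{-\alpha\delta} & \text{if } i = 1,\\ L(-x)^{\alpha\delta} & \text{if } i = 2,\\ L e^{-\delta x} & \text{if } i = 3, \end{cases} \tag{5.2.19}$$

where $L > 0$ and $0 < \rho \le \delta \le 1$. The expansion of length two is given by

$$G_{\rho,n}(x) = G(x)\left[1 - n^{-\rho}\int_x^{\omega(G)} p(y)\psi(y)\, dy\right] \tag{5.2.20}$$

for $\alpha(G) < x < \omega(G)$. This may be written

$$G_{\rho,n}(x) = G(x)\left[1 + n^{-\rho}\frac{K}{1 + \rho} \begin{cases} x^{-(1+\rho)\alpha} & \text{if } i = 1\\ (-x)^{(1+\rho)\alpha} & \text{if } i = 2\\ e^{-(1+\rho)x} & \text{if } i = 3 \end{cases}\right]. \tag{5.2.21}$$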
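For $i = 3$, the step from (5.2.20) to (5.2.21) amounts to $\int_x^\infty -Ke^{-\rho y}e^{-y}\,dy = -Ke^{-(1+\rho)x}/(1+\rho)$. The following sketch (our own; parameter values arbitrary, names hypothetical) confirms this numerically by evaluating (5.2.20) with a truncated midpoint rule and comparing it with the closed form (5.2.21).

```python
import math

def G3(x: float) -> float:
    return math.exp(-math.exp(-x))

def expansion_from_integral(x: float, n: float, K: float, rho: float,
                            upper: float = 60.0, steps: int = 100_000) -> float:
    """(5.2.20) for i = 3, with p(y) = -K e^{-rho y} as in (5.2.18) and psi(y) = e^{-y} from (5.2.10);
    the integral over (x, infinity) is truncated at `upper` and evaluated by the midpoint rule."""
    h = (upper - x) / steps
    integral = h * sum(-K * math.exp(-(1.0 + rho) * (x + (j + 0.5) * h)) for j in range(steps))
    return G3(x) * (1.0 - n ** (-rho) * integral)

def expansion_closed_form(x: float, n: float, K: float, rho: float) -> float:
    """(5.2.21) for i = 3: G(x)[1 + n^{-rho} K/(1 + rho) e^{-(1 + rho) x}]."""
    return G3(x) * (1.0 + n ** (-rho) * K / (1.0 + rho) * math.exp(-(1.0 + rho) * x))

for x in (-1.0, 0.0, 1.5):
    print(x, expansion_from_integral(x, 100, 1.0, 0.5), expansion_closed_form(x, 100, 1.0, 0.5))
```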


Notice that $f = \psi(1 + p + h)$ and $G_{\rho,n}$ arise from the special case with $i = 2$ and $\alpha = 1$ via the transformation $T_{i,\alpha} = G_{i,\alpha}^{-1} \circ G_{2,1}$.
It is easy to check that $G_{\rho,n}$ is a d.f. if $n$ is sufficiently large; more precisely, this holds if, and only if,

$$\ldots \tag{5.2.22}$$
Theorem 5.2.11. Let $G$, $\psi$ and $T$ be as in Corollary 5.2.7. Assume that the underlying density $f$ has the representation

$$f(x) = \psi(x)(1 + p(x) + h(x)), \qquad T(x_0) < x < \omega(G), \tag{5.2.23}$$

and $= 0$ if $x > \omega(G)$, where $x_0 < 0$ and $p$, $h$ satisfy (5.2.18) and (5.2.19). Put $F_n(x) = F(d_n + x c_n)$, where $c_n, d_n$ are the constants of Theorem 5.1.1. Then,

$$H(F_n^n, G_{\rho,n}) = O\big(n^{-\min(\delta, 2\rho)}\big).$$

PROOF. Apply Lemma 5.2.2.

It was observed by Radtke (1988) (compare with P.5.16) that, in a special case, the expansion $G_{\rho,n}(x)$ can be replaced by $G(b_n + a_n x)$, where $G$ is the leading term of the expansion and $b_n \to 0$ and $a_n \to 1$ as $n \to \infty$. Notice that $G(b_n + a_n x)$ can be written, up to terms of higher order, as

$$G(x)\left[1 + \psi(x)(b_n + (a_n - 1)x)\right]$$

where again $\psi = G'/G$. One can easily check that such a representation holds in (5.2.21) if, and only if, $i = 1$ and $\rho = 1/\alpha$.

5.3. The Structure of Asymptotic Joint Distributions of Extremes
Let us reconsider the stochastic model which was studied in Section 5.2. The sample maxima $M_{n,i} := \max(\xi_{n(i-1)+1}, \ldots, \xi_{ni})$ are the observed r.v.'s, and it is assumed that
(a) $M_{n,1}, \ldots, M_{n,N}$ are i.i.d. random variables,
(b) the (possibly non-observable) r.v.'s $\xi_{n(i-1)+1}, \ldots, \xi_{ni}$ are i.i.d. for every $i = 1, \ldots, N$.

The r.v.'s $\xi_{n(i-1)+1}, \ldots, \xi_{ni}$ may correspond to data which are collected within the $i$th period (e.g. the amount of daily rainfall within a year). Then the sample $M_{n,1}, \ldots, M_{n,N}$ of the annual maxima can be used to estimate the unknown distribution of the maximum daily rainfall within a year. Condition (a) seems to be justified in this example; the second condition, however, is severely violated. It would be desirable to get some insight (within a mathematical model) into the influence of a deviation from condition (b); however, this problem is beyond the scope of this book. With the present state of the art, one can take some comfort from experience and from statements such as the one made in Pickands (1975, page 120) that "the method has been shown to be very robust against dependence" of the r.v.'s $\xi_{n(i-1)+1}, \ldots, \xi_{ni}$.
It may happen that a certain amount of information is lost if the statistical inference is only based on maxima. Thus, a different method was proposed by Pickands (1975), namely, to consider the $k$ largest observations of the original data. This method is only applicable if these data can be observed. For the mathematical treatment of this problem it is assumed (by combining the conditions (a) and (b)) that $\xi_1, \dots, \xi_{nN}$ are i.i.d. random variables. The statistical inference will be based on the $k$ largest order statistics $X_{nN-k+1:nN} \le \dots \le X_{nN:nN}$ of $\xi_1, \dots, \xi_{nN}$. In the sequel, the sample size will again be denoted by $n$ instead of $nN$.
In special cases, a comparison of the two different methods will be made
in Section 9.6. The information which is lost or gained by one or the
other method can be indicated by the relative efficiency between statistical
procedures which are constructed according to the respective methods.
One should keep in mind that such a comparison heavily depends on the
conditions stated above. For example one can argue that the dependence of
the rainfall on consecutive days has less influence on the stochastic properties
of the annual maxima compared to the influence on the k largest observations
within the whole period. Thus, the second method may be less robust against
the departure from the condition of independence.
The main purpose of this section is to introduce the asymptotic distributions of the $k$ largest order statistics. Moreover, it will be of great importance to find appropriate representations for these distributions. For the aims of this section it suffices to consider order statistics from generalized Pareto r.v.'s as introduced in (1.6.11). Notice again that the same symbol will be used for the d.f. and the pertaining probability measure.

Upper Extremes of Uniform R.V.'s


Let $V_{n-k+1:n}$ be the $k$th largest order statistic of $n$ i.i.d. random variables with common d.f. $W_{2,1}$ (the uniform distribution on $(-1, 0)$). In Section 5.1 it was proved that $nV_{n-k+1:n}$ is asymptotically equal (in distribution) to a "negative" gamma r.v.
$$S_k = \xi_1 + \dots + \xi_k$$
where $\xi_1, \dots, \xi_k$ are i.i.d. random variables with common "negative" exponential d.f. $F(x) = e^x$ for $x < 0$. An extension of the result for a single order statistic to joint distributions of upper extremes can easily be established by utilizing the following lemma.
Lemma 5.3.1. For every $k = 1, \dots, n$ we have
$$\sup_B \bigl|P\{(nV_{n:n}, nV_{n-1:n}, \dots, nV_{n-k+1:n}) \in B\} - P\{(S_1, S_2, \dots, S_k) \in B\}\bigr| = \sup_B \bigl|P\{nV_{n-k+1:n} \in B\} - P\{S_k \in B\}\bigr|.$$

It is obvious that "$\ge$" holds. At first sight the equality looks surprising; however, the miracle will have a simple explanation when the distributions are represented in an appropriate way.
From Corollary 1.6.11 it is immediate that
$$\Bigl(\frac{V_{n:n}}{V_{n-1:n}}, \dots, \frac{V_{n-k+2:n}}{V_{n-k+1:n}}, V_{n-k+1:n}\Bigr) \stackrel{d}{=} \Bigl(\frac{S_1}{S_2}, \dots, \frac{S_{k-1}}{S_k}, -\frac{S_k}{S_{n+1}}\Bigr). \tag{5.3.1}$$
Thus we easily get
$$\sup_B \bigl|P\{(nV_{n:n}, \dots, nV_{n-k+1:n}) \in B\} - P\{(S_1, \dots, S_k) \in B\}\bigr|$$
$$= \sup_B \Bigl|P\Bigl\{\Bigl(\frac{S_1}{S_2}, \dots, \frac{S_{k-1}}{S_k}, -\frac{S_k}{S_{n+1}/n}\Bigr) \in B\Bigr\} - P\Bigl\{\Bigl(\frac{S_1}{S_2}, \dots, \frac{S_{k-1}}{S_k}, S_k\Bigr) \in B\Bigr\}\Bigr| =: A.$$
Notice that the first $k - 1$ components in the random vectors above are equal. Moreover, it is straightforward to verify that the components in each vector are independent since, according to Corollary 1.6.11(iii), the r.v.'s $S_1/S_2, \dots, S_n/S_{n+1}, S_{n+1}$ are independent. An application of inequality (3.3.4) (which concerns an upper bound for the variational distance of product measures via the variational distances of the single components) yields
$$A \le \sup_B \Bigl|P\Bigl\{-\frac{S_k}{S_{n+1}/n} \in B\Bigr\} - P\{S_k \in B\}\Bigr| = \sup_B \bigl|P\{nV_{n-k+1:n} \in B\} - P\{S_k \in B\}\bigr|.$$
Thus, Lemma 5.3.1 is proved. □
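A minimal Monte Carlo sketch of representation (5.3.1), under illustrative choices of $n$, $k$, sample size and seed (none of which come from the text): since $S_j/S_{j+1}$ has a Beta$(j,1)$ law, the ratio $V_{n-j+1:n}/V_{n-j:n}$ of consecutive upper uniform order statistics should have mean $j/(j+1)$. The helper `mean_spacing_ratios` is hypothetical.

```python
import random

def mean_spacing_ratios(n, k, reps, seed=0):
    """Monte Carlo means of V_{n-j+1:n} / V_{n-j:n}, j = 1, ..., k-1, where the
    V's are the order statistics of n i.i.d. uniform(-1, 0) variables."""
    rng = random.Random(seed)
    sums = [0.0] * (k - 1)
    for _ in range(reps):
        # v[0] = V_{n:n} (closest to 0), v[1] = V_{n-1:n}, ...
        v = sorted((rng.uniform(-1.0, 0.0) for _ in range(n)), reverse=True)
        for j in range(1, k):
            sums[j - 1] += v[j - 1] / v[j]
    return [s / reps for s in sums]

if __name__ == "__main__":
    for j, m in enumerate(mean_spacing_ratios(n=50, k=4, reps=10000), start=1):
        # by (5.3.1) the j-th ratio is distributed as S_j / S_{j+1}, a Beta(j, 1) law
        print(j, round(m, 3), "expected", round(j / (j + 1), 3))
```

Because (5.3.1) is an exact distributional identity, the agreement does not depend on $n$ being large; only the Monte Carlo error matters.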


Combining Lemma 5.1.5 and Lemma 5.3.1 we get

Lemma 5.3.2. For every fixed $k \ge 1$, as $n \to \infty$,
$$\sup_B \bigl|P\{(nV_{n:n}, nV_{n-1:n}, \dots, nV_{n-k+1:n}) \in B\} - P\{(S_1, S_2, \dots, S_k) \in B\}\bigr| \to 0.$$
The limiting distribution in Lemma 5.3.2 will be denoted by $G_{2,1,k}$. It is apparent that $G_{2,1,j}$, the limiting distribution of the $j$th largest order statistic, is the $j$th marginal distribution of $G_{2,1,k}$ for $j \le k$. From Lemma 1.6.6(iii) we know that the density, say, $g_{2,1,k}$ of $G_{2,1,k}$ is given by
$$g_{2,1,k}(x) = \exp(x_k), \qquad x_k < x_{k-1} < \dots < x_1 < 0, \tag{5.3.2}$$
and $= 0$, otherwise.

Upper Extremes of Generalized Pareto R.V.'s


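The extension of Lemma 5.3.2 to other generalized Pareto d.f.'s $W_{i,\alpha}$ is straightforward. Let again $T_{i,\alpha}$ denote the transformation in (1.6.10). We have $T_{1,\alpha}(x) = (-x)^{-1/\alpha}$, $T_{2,\alpha}(x) = -(-x)^{1/\alpha}$, and $T_3(x) = -\log(-x)$ for $-\infty < x < 0$. Denote by $G_{i,\alpha,k}$ the distribution of the random vector
$$\bigl(T_{i,\alpha}(S_1), \dots, T_{i,\alpha}(S_k)\bigr). \tag{5.3.3}$$
The transformation theorem for densities (see (1.4.4)) enables us to compute the density, say, $g_{i,\alpha,k}$ of $G_{i,\alpha,k}$. We have
$$g_{1,\alpha,k}(x) = \alpha^k \exp(-x_k^{-\alpha}) \prod_{j=1}^k x_j^{-(\alpha+1)}, \qquad 0 < x_k < \dots < x_1,$$
$$g_{2,\alpha,k}(x) = \alpha^k \exp\bigl(-(-x_k)^{\alpha}\bigr) \prod_{j=1}^k (-x_j)^{\alpha-1}, \qquad x_k < \dots < x_1 < 0,$$
$$g_{3,k}(x) = \exp\Bigl(-e^{-x_k} - \sum_{j=1}^k x_j\Bigr), \qquad x_k < \dots < x_1, \tag{5.3.4}$$
and the densities are zero, otherwise.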


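Notice that the following representation of the density $g_{i,\alpha,k}$ holds, with $\psi_{i,\alpha} = g_{i,\alpha}/G_{i,\alpha}$:
$$g_{i,\alpha,k}(x) = G_{i,\alpha}(x_k) \prod_{j=1}^{k} \psi_{i,\alpha}(x_j) = g_{i,\alpha}(x_k) \prod_{j=1}^{k-1} \psi_{i,\alpha}(x_j). \tag{5.3.5}$$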

Corollary 5.3.3. Let $X_{r:n}$ be the $r$th order statistic of $n$ i.i.d. random variables with common generalized Pareto d.f. $W_{i,\alpha}$. Then,
$$\sup_B \bigl|P\{(c_n^{-1}(X_{n-j+1:n} - d_n))_{j=1}^k \in B\} - G_{i,\alpha,k}(B)\bigr| \to 0, \qquad n \to \infty,$$
where $c_n$ and $d_n$ are the constants of Theorem 5.1.1.

PROOF. Straightforward from Lemma 5.3.2, the definition of $G_{i,\alpha,k}$ and the fact that
$$\bigl(c_n^{-1}(X_{n-j+1:n} - d_n)\bigr)_{j=1}^k \stackrel{d}{=} \bigl(T_{i,\alpha}(nV_{n-j+1:n})\bigr)_{j=1}^k.$$
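The transformation technique can be checked numerically. The sketch below assumes the standard forms $W_{1,\alpha}(x)=1-x^{-\alpha}$ ($x\ge1$), $W_{2,\alpha}(x)=1-(-x)^\alpha$ ($-1\le x\le0$), $W_3(x)=1-e^{-x}$ ($x\ge0$), the maps $T_{i,\alpha}$ listed before (5.3.3), and the constants $c_n$, $d_n$ stated after Corollary 5.5.5. It verifies that $c_n^{-1}(W_{i,\alpha}^{-1}(1+v)-d_n)=T_{i,\alpha}(nv)$ for $-1<v<0$, i.e. that quantile-transformed uniform extremes, once normalized, are $T_{i,\alpha}$ applied to the scaled uniform extremes. All function names are hypothetical.

```python
import math

def gp_quantile(i, alpha, p):
    """W^{-1}(p) for the assumed standard generalized Pareto d.f.s
    W_1(x) = 1 - x**(-alpha), W_2(x) = 1 - (-x)**alpha, W_3(x) = 1 - exp(-x)
    (alpha is unused for i = 3)."""
    q = 1.0 - p
    if i == 1:
        return q ** (-1.0 / alpha)
    if i == 2:
        return -(q ** (1.0 / alpha))
    return -math.log(q)

def T(i, alpha, x):
    """T_{i,alpha}(x) for x < 0: (-x)^(-1/alpha), -(-x)^(1/alpha), -log(-x)."""
    if i == 1:
        return (-x) ** (-1.0 / alpha)
    if i == 2:
        return -((-x) ** (1.0 / alpha))
    return -math.log(-x)

def norming(i, alpha, n):
    """(c_n, d_n): (n^(1/alpha), 0), (n^(-1/alpha), 0), (1, log n)."""
    if i == 1:
        return n ** (1.0 / alpha), 0.0
    if i == 2:
        return n ** (-1.0 / alpha), 0.0
    return 1.0, math.log(n)

if __name__ == "__main__":
    n, alpha = 1000, 2.5
    for i in (1, 2, 3):
        c, d = norming(i, alpha, n)
        for v in (-0.9, -0.1, -0.002):
            lhs = (gp_quantile(i, alpha, 1.0 + v) - d) / c
            print(i, v, lhs, T(i, alpha, n * v))
```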

Domains of Attraction
This section concludes with a characterization of the domains of attraction of joint distributions of a fixed number of upper extremes by means of the corresponding result for sample maxima.
First, we refer to the well-known result (see e.g. Galambos (1987), Theorem 2.8.2) that a d.f. belongs to the weak domain of attraction of an extreme value d.f. $G_{i,\alpha}$ if, and only if, the corresponding result holds for the $k$th largest order statistic with $G_{i,\alpha,k}$ as the limiting d.f.
Our interest is focused on the convergence w.r.t. the variational distance.

Theorem 5.3.4. Let $F$ be a d.f. with density $f$. Then the following two statements are equivalent:
(i) $F$ belongs to the strong domain of attraction of an extreme value distribution $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$.
(ii) There exist constants $a_n > 0$ and $b_n$ such that for every positive integer $k$ there is a nondegenerate distribution $G^{(k)}$ such that
$$\sup_B \bigl|P\{(a_n^{-1}(X_{n-j+1:n} - b_n))_{j=1}^k \in B\} - G^{(k)}(B)\bigr| \to 0, \qquad n \to \infty.$$
In addition, if (i) holds for $G = G_{i,\alpha}$ then (ii) is valid for $G^{(k)} = G_{i,\alpha,k}$.

PROOF. (ii) ⇒ (i): Obvious.
(i) ⇒ (ii): Let $a_n > 0$ and $b_n$ be such that for every $x$
$$F^n(b_n + a_n x) \to G(x), \qquad n \to \infty, \tag{1}$$
where $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$. According to Lemma 5.1.3, (i) is equivalent to the condition that for every subsequence $i(n)$ there exists a subsequence $m(n) := i(j(n))$ such that
$$m(n)\, a_{m(n)}\, f(b_{m(n)} + x a_{m(n)}) \to \psi(x), \qquad n \to \infty,$$
for Lebesgue almost all $x \in (\alpha(G), \omega(G))$, where again $\psi = G'/G$. Thus, also
$$\prod_{j=1}^k m(n)\, a_{m(n)}\, f(b_{m(n)} + x_j a_{m(n)}) \to \prod_{j=1}^k \psi(x_j), \qquad n \to \infty, \tag{2}$$
for Lebesgue almost all $x = (x_1, \dots, x_k) \in (\alpha(G), \omega(G))^k$. Furthermore, deduce with the help of (1.4.4) that the density of $(a_n^{-1}(X_{n-j+1:n} - b_n))_{j=1}^k$, say, $f_{n,k}$, is given by
$$f_{n,k}(x) = F^{n-k}(b_n + x_k a_n) \prod_{j=1}^k \bigl[(n - j + 1)\, a_n f(b_n + x_j a_n)\bigr], \qquad x_k < \dots < x_1, \tag{3}$$
and $= 0$, otherwise. Combining (1)-(3) with (5.3.5) we obtain for $G = G_{i,\alpha}$ that
$$f_{m(n),k}(x) \to g_{i,\alpha,k}(x), \qquad n \to \infty,$$
for Lebesgue almost all $x$ with $\alpha(G) < x_k < \dots < x_1 < \omega(G)$. Thus the Scheffé lemma (Lemma 3.3.2) implies (ii) with $G^{(k)} = G_{i,\alpha,k}$. □

5.4. Expansions of Distributions of Extremes of Generalized Pareto Random Variables

In this section we establish higher order approximations to the distribution of upper extremes of generalized Pareto r.v.'s. First, we prove an expansion of the distribution of the $k$th largest order statistic of uniform r.v.'s. The leading term of the expansion is a "negative" gamma distribution $G_{2,1,k}$. By using the transformation technique the result is extended to generalized Pareto r.v.'s. Finally, the results of Section 5.3 enable us to examine joint distributions of upper extremes.
Let $V_{n-k+1:n}$ again be the $k$th largest order statistic of $n$ i.i.d. $(-1, 0)$-uniformly distributed r.v.'s. From (5.1.35) we already know that
$$\sup_B \bigl|P\{nV_{n-k+1:n} \in B\} - G_{2,1,k}(B)\bigr| \to 0, \qquad n \to \infty.$$

We shall prove that the remainder term is bounded by $Ck/n$ where $C$ is a universal constant. The expansion of length 2 will show that this bound is sharp. The extension from $W_{2,1}$ to a generalized Pareto d.f. $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3 : \alpha > 0\}$ is straightforward. We have
$$\sup_B \bigl|P\{c_n^{-1}(X_{n-k+1:n} - d_n) \in B\} - G_{i,\alpha,k}(B)\bigr| \le Ck/n \tag{5.4.1}$$
where $c_n$ and $d_n$ are the usual normalizing constants.


In Section 5.5 we shall see that if the generalized Pareto d.f. $W$ is replaced by an extreme value d.f. $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$ then the bound in (5.4.1) is of order $O(k^{3/2}/n)$.
Moreover, as will be indicated at the end of this section, $F$ has the tail of a generalized Pareto d.f. if an inequality of the form (5.4.1) holds. Therefore, in a certain sense, the generalized Pareto d.f.'s occupy the place of the max-stable extreme value d.f.'s as far as joint distributions of extremes are concerned.

Extremes of Uniform R.V.'s


Let us begin with a simple result concerning central moments of the gamma distribution $G_{2,1,k}$.

Lemma 5.4.1. The $i$th central moment
$$u(i, k) = \int (x + k)^i \, dG_{2,1,k}(x)$$
of $G_{2,1,k}$ fulfills the recurrence relation
$$u(i + 2, k) = (i + 1)\bigl[k\,u(i, k) - u(i + 1, k)\bigr]. \tag{5.4.2}$$
Moreover,
$$\int |x + k|^i \, dG_{2,1,k}(x) \le i!\, k^{i/2}. \tag{5.4.3}$$
As special cases we note $u(1, k) = 0$, $u(2, k) = k$, $u(3, k) = -2k$, $u(4, k) = 6k + 3k^2$.

PROOF. Recall that the density of $G_{2,1,k}$ is given by $g_{2,1,k}(x) = e^x(-x)^{k-1}/(k-1)!$, $x < 0$. By partial integration we get
$$-\int (i + 1)(x + k)^i x \, dG_{2,1,k}(x) = \int (x + k)^{i+1} x \, dG_{2,1,k}(x) + k\,u(i + 1, k).$$
Now, (5.4.2) is straightforward since
$$u(i + 2, k) = \int (x + k)^{i+1} x \, dG_{2,1,k}(x) + k\,u(i + 1, k) = -\int (i + 1)(x + k)^i x \, dG_{2,1,k}(x) = (i + 1)\bigl[k\,u(i, k) - u(i + 1, k)\bigr].$$
Moreover, because of $(i + 1)[(i + 1)! + i!] = (i + 2)!$ we obtain by induction over $i$ that $|u(i, k)| \le i!\, k^{i/2}$. This implies (5.4.3) for every even $i$. Finally, the Schwarz inequality yields
$$\int |x + k|^{2i+1} \, dG_{2,1,k}(x) \le (2i + 1)!\, k^{(2i+1)/2}.$$
The proof is complete. □
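The recurrence (5.4.2) is easy to run in exact arithmetic. This sketch (function name hypothetical) starts from $u(0,k)=1$, $u(1,k)=0$, reproduces the special cases listed in Lemma 5.4.1, and checks the bound $|u(i,k)|\le i!\,k^{i/2}$ obtained by induction in the proof, in squared form so that everything stays rational. For $k=1$ the values are the central moments of a standard exponential law.

```python
from fractions import Fraction
from math import factorial

def central_moments(k, imax):
    """Exact u(i, k), i = 0, ..., imax, of the 'negative' gamma law G_{2,1,k},
    generated from u(0, k) = 1, u(1, k) = 0 and the recurrence (5.4.2)."""
    u = [Fraction(1), Fraction(0)]
    for i in range(imax - 1):
        # (5.4.2): u(i+2, k) = (i+1) * [k*u(i, k) - u(i+1, k)]
        u.append((i + 1) * (k * u[i] - u[i + 1]))
    return u[: imax + 1]

if __name__ == "__main__":
    for k in (1, 2, 5):
        u = central_moments(k, 6)
        print(k, [int(x) for x in u])
        # |u(i,k)| <= i! k^(i/2), compared in squared form
        print(all(x * x <= factorial(i) ** 2 * k ** i for i, x in enumerate(u)))
```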


A preliminary higher order approximation is obtained in Lemma 5.4.2.

Lemma 5.4.2. For every positive integer $m$ there exists a constant $C_m > 0$ such that for $n$ and $k \in \{1, \dots, n\}$ with $k/n$ sufficiently small (so that the denominators below are bounded away from zero) the following inequality holds:
$$\sup_B \left| P\{nV_{n-k+1:n} \in B\} - \frac{G_{2,1,k}(B) + \sum_{i=2}^{2(m-1)} \beta(i, n-k) \int_B (x + k)^i \, dG_{2,1,k}(x)}{1 + \sum_{i=2}^{2(m-1)} \beta(i, n-k)\, u(i, k)} \right| \le C_m (k/n)^m.$$
Moreover,
$$\beta(i, n) = \sum_{j=0}^{i} (-1)^j \binom{n}{i-j} \frac{n^{-(i-j)}}{j!}$$
and $u(i, k)$ is the $i$th central moment of $G_{2,1,k}$.
As special cases we note $\beta(2, n) = -1/(2n)$, $\beta(3, n) = 1/(3n^2)$, $\beta(4, n) = 1/(8n^2) - 1/(4n^3)$. Moreover, $|\beta(2i - 1, n)|, |\beta(2i, n)| \le C_m n^{-i}$, $i = 1, \dots, m - 1$.
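A sketch of the coefficients $\beta(i,n)$ of Lemma 5.4.2 in exact rational arithmetic; the function name is hypothetical. The defining sum is the Cauchy product of the series of $(1+y/n)^n$ and $e^{-y}$, so $\beta(i,n)$ is the coefficient of $y^i$ in $(1+y/n)^n e^{-y}$. This is the factor $(1+(x+k)/(n-k))^{n-k}e^{-(x+k)}$ relating the density of $nV_{n-k+1:n}$ to $g_{2,1,k}$ in the proof below.

```python
from fractions import Fraction
from math import comb, factorial

def beta(i, n):
    """beta(i, n) = sum_{j=0}^{i} (-1)^j C(n, i-j) n^{-(i-j)} / j!, the exact
    coefficient of y^i in (1 + y/n)^n * exp(-y)."""
    return sum(
        Fraction((-1) ** j * comb(n, i - j), n ** (i - j) * factorial(j))
        for j in range(i + 1)
    )

if __name__ == "__main__":
    n = 50
    print([beta(i, n) for i in range(1, 7)])
    # n^ceil(i/2) * beta(i, n) is of order one in n, as stated above
    print([float(n ** ((i + 1) // 2) * beta(i, n)) for i in range(2, 9)])
```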

PROOF. Put
$$g_n(x) = e^{-k}\Bigl(1 + \frac{x + k}{n - k}\Bigr)^{n-k} \frac{(-x)^{k-1}}{(k-1)!}\, 1_{(-n,0)}(x).$$
From Theorem 1.3.2 we conclude that $g_n / \int g_n(x)\,dx$ is the density of $nV_{n-k+1:n}$. Moreover, we write
$$f_n(x) = \Bigl[1 + \sum_{i=2}^{2(m-1)} \beta(i, n-k)(x + k)^i\Bigr] g_{2,1,k}(x).$$
Lemma A.2.1 yields
$$|g_n(x) - f_n(x)| \le C(n - k)^{-m}\bigl[|x + k|^{2m-1} + (x + k)^{2m}\bigr] g_{2,1,k}(x) \tag{1}$$
for every $x \in A_n := \{x < 0 : |x + k| \le (n - k)^{1/2}\}$ where, throughout, $C$ will be used as a generic constant that only depends on $m$.
From (5.4.3) and from the upper bound of $\beta(i, n-k)$ as given in Lemma A.2.1 we conclude that $\int f_n(x)\,dx \ge 1/2$ if $k/n$ is sufficiently small. Thus, by (1), Lemma A.3.2, and (5.4.3) we obtain
$$\sup_B \Bigl| P\{nV_{n-k+1:n} \in B\} - \int_B f_n(x)\,dx \Big/ \int f_n(x)\,dx \Bigr| \le C \Bigl[\int_{A_n} |g_n(x) - f_n(x)|\,dx + \int_{A_n^c} |g_n(x) - f_n(x)|\,dx\Bigr]$$
$$\le C(k/n)^m + C\int_{A_n^c} |g_n(x) - f_n(x)|\,dx. \tag{2}$$
Moreover, because of $(1 + x/n)^n \le \exp(x)$ we have $g_n \le g_{2,1,k}$. Thus, the Schwarz inequality yields
$$\int_{A_n^c} |g_n(x) - f_n(x)|\,dx \le 2G_{2,1,k}(A_n^c) + \sum_{i=2}^{2(m-1)} |\beta(i, n-k)| \int_{A_n^c} |x + k|^i \, dG_{2,1,k}(x) \le C(k/n)^m.$$
Combining this and (2) the proof is completed.

The following theorem is an immediate consequence of Lemma 5.4.2 and Lemma 3.2.5. Moreover, we remark that the polynomials $P_{j,k,n}$ can easily be constructed by means of formula (3.2.9).

Theorem 5.4.3. For every positive integer $m$ there exists a constant $C_m > 0$ such that for every $n$ and $k \in \{1, \dots, n\}$ the following inequality holds:
$$\sup_B \Bigl| P\{nV_{n-k+1:n} \in B\} - \Bigl[G_{2,1,k}(B) + \sum_{j=1}^{m-1} \int_B P_{j,k,n}\, dG_{2,1,k}\Bigr] \Bigr| \le C_m (k/n)^m$$
where $P_{j,k,n}$ are polynomials of degree $2j$.

We note the explicit form of $P_{1,k,n}$ and $P_{2,k,n}$. We have
$$P_{1,k,n}(x) = -\bigl[(x + k)^2 - k\bigr] \big/ \bigl(2(n - k)\bigr) \tag{5.4.4}$$
and
$$P_{2,k,n}(x) = \beta(4, n-k)\bigl[(x + k)^4 - u(4, k)\bigr] + \beta(3, n-k)\bigl[(x + k)^3 - u(3, k)\bigr] - \beta(2, n-k)^2\, u(2, k)\bigl[(x + k)^2 - u(2, k)\bigr].$$
Lemma 5.4.2 as well as Theorem 5.4.3, applied to $m = 1$, yield (5.4.1) in the particular case of $W = W_{2,1}$.
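A numerical sketch of Theorem 5.4.3 with $m = 2$, using only the standard library; all helper names are hypothetical. It uses $P\{nV_{n-k+1:n}\le t\}=P\{\mathrm{Bin}(n,-t/n)\le k-1\}$ (since $-V_{n-k+1:n}$ is the $k$th smallest of $n$ uniforms on $(0,1)$), $G_{2,1,k}(t)=P\{\Gamma(k,1)\ge -t\}$, and evaluates $\int_{-\infty}^t P_{1,k,n}\,dG_{2,1,k}$ from (5.4.4) through incomplete gamma integrals for integer $k$. On a grid of $t$ values the sup-error of $G_{2,1,k}$ alone is of order $k/n$, and adding the first-order term should reduce it to order $(k/n)^2$.

```python
import math

def upper_reg_gamma(a, s):
    """Q(a, s) = P{Gamma(a, 1) >= s} = e^{-s} sum_{l<a} s^l / l!, integer a >= 1."""
    term = total = 1.0
    for l in range(1, a):
        term *= s / l
        total += term
    return math.exp(-s) * total

def exact_cdf(t, n, k):
    """P{n V_{n-k+1:n} <= t}, -n < t < 0, via P{Bin(n, -t/n) <= k-1}."""
    u = -t / n
    return sum(math.comb(n, j) * u ** j * (1.0 - u) ** (n - j) for j in range(k))

def gamma_cdf(t, k):
    """G_{2,1,k}(t) = P{S_k <= t} = P{Gamma(k, 1) >= -t}."""
    return upper_reg_gamma(k, -t)

def first_order_term(t, n, k):
    """Integral of P_{1,k,n}(x) = -[(x+k)^2 - k] / (2(n-k)) over (-inf, t] w.r.t. G_{2,1,k}."""
    s = -t
    m0 = upper_reg_gamma(k, s)                    # integral of 1
    m1 = -k * upper_reg_gamma(k + 1, s)           # integral of x
    m2 = k * (k + 1) * upper_reg_gamma(k + 2, s)  # integral of x^2
    return -(m2 + 2 * k * m1 + (k * k - k) * m0) / (2.0 * (n - k))

def sup_errors(n, k, grid):
    """Grid sup-errors of the leading term alone and with the first-order term."""
    e0 = max(abs(exact_cdf(t, n, k) - gamma_cdf(t, k)) for t in grid)
    e1 = max(abs(exact_cdf(t, n, k) - gamma_cdf(t, k) - first_order_term(t, n, k))
             for t in grid)
    return e0, e1

GRID = [-0.05 * i for i in range(1, 601)]  # t in [-30, -0.05]

if __name__ == "__main__":
    for n, k in ((200, 1), (1000, 3)):
        print(n, k, sup_errors(n, k, GRID))
```

For $k = 1$ the correction integrates to $-t^2e^t/(2(n-1))$, matching $(1+t/n)^n \approx e^t\bigl(1 - t^2/(2n)\bigr)$.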

Extremes of Generalized Pareto R.V.'s


The extension of the results above to the $k$th largest order statistics $X_{n-k+1:n}$ under a generalized Pareto d.f. $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3 : \alpha > 0\}$ is immediate. By using the transformation technique we easily obtain (5.4.1) and the following expansion
$$\sup_B \Bigl| P\{c_n^{-1}(X_{n-k+1:n} - d_n) \in B\} - \Bigl[G_{i,\alpha,k}(B) + \sum_{j=1}^{m-1} \int_B P_{j,k,n}(\log G_{i,\alpha})\, dG_{i,\alpha,k}\Bigr] \Bigr| \le C_m (k/n)^m \tag{5.4.5}$$
where $c_n$ and $d_n$ are the constants of Theorem 5.1.1 and $P_{j,k,n}$ are the polynomials of Theorem 5.4.3.
Next, we prove the corresponding result for joint distributions of upper extremes.

Theorem 5.4.4. Let $X_{n:n}, \dots, X_{n-k+1:n}$ be the $k$ largest order statistics under the generalized Pareto d.f. $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3 : \alpha > 0\}$. Let $c_n$, $d_n$, $C_m$, and $P_{j,k,n}$ be as above. Then,
$$\sup_B \Bigl| P\{(c_n^{-1}(X_{n:n} - d_n), \dots, c_n^{-1}(X_{n-k+1:n} - d_n)) \in B\} - \Bigl[G_{i,\alpha,k}(B) + \sum_{j=1}^{m-1} \int_B P_{j,k,n}(\log G_{i,\alpha}(x_k))\, dG_{i,\alpha,k}(x)\Bigr] \Bigr| \le C_m (k/n)^m. \tag{5.4.6}$$

PROOF. It suffices to prove the assertion in the special case of $i = 2$ and $\alpha = 1$. The general case can easily be deduced by means of the transformation technique. Thus, we have to prove that
$$\sup_B \Bigl| P\{(nV_{n:n}, \dots, nV_{n-k+1:n}) \in B\} - \Bigl[G_{2,1,k}(B) + \sum_{j=1}^{m-1} \int_B P_{j,k,n}(x_k)\, dG_{2,1,k}(x)\Bigr] \Bigr| \le C_m (k/n)^m. \tag{5.4.7}$$
If $m = 1$ then the proof of Lemma 5.3.2 carries over if Lemma 5.1.5 is replaced by Theorem 5.4.3.
If $m > 1$ then one has to deal with signed measures; however, the method of the proof of Lemma 5.3.2 is still applicable. Notice that the approximating signed measure in (5.4.7) has the density
$$x \mapsto \Bigl(1 + \sum_{j=1}^{m-1} P_{j,k,n}(x_k)\Bigr) g_{2,1,k}(x).$$
By inducing with $x \mapsto (x_1/x_2, \dots, x_{k-1}/x_k, x_k)$ one obtains a product measure where the $k$th component has the density
$$\Bigl(1 + \sum_{j=1}^{m-1} P_{j,k,n}\Bigr) g_{2,1,k}.$$
Now inequality (A.3.3), which holds for signed measures, and Theorem 5.4.3 imply the assertion. □
Next, Theorem 5.4.4 will be stated once more in the particular case of $m = 1$. In an earlier version of this book we conjectured that a d.f. $F$ has the tail of a generalized Pareto d.f. if an inequality of the form (5.4.1) (formulated for d.f.'s) holds. This was confirmed in Falk (1989a).

Theorem 5.4.5. (i) If $X_{n:n}, \dots, X_{n-k+1:n}$ are the $k$ largest order statistics under a generalized Pareto d.f. $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3 : \alpha > 0\}$ then there exists a constant $C > 0$ such that for every $k \in \{1, \dots, n\}$,
$$\sup_B \bigl| P\{(c_n^{-1}(X_{n:n} - d_n), \dots, c_n^{-1}(X_{n-k+1:n} - d_n)) \in B\} - G_{i,\alpha,k}(B) \bigr| \le Ck/n \tag{5.4.8}$$
with $c_n$ and $d_n$ as in Theorem 5.1.1.
(ii) Let $F$ be a d.f. which is strictly increasing and continuous on a left neighborhood of $\omega(F)$. If (5.4.8) holds with $W$, $c_n$, and $d_n$ replaced by $F$ and any normalizing constants $a_n > 0$ and $b_n$, then there exist $c > 0$ and $d$ such that
$$F((x - d)/c) = W_{i,\alpha}(x)$$
for $x$ in a neighborhood of $\omega(W_{i,\alpha})$.

For a slightly stronger formulation of (ii) and for the proof we refer to Falk (1989a).

5.5. Variational Distance between Exact and Approximate Joint Distributions of Extremes

In this section we prove a version of Theorem 5.2.5 valid for the joint distribution of the upper extremes. In view of our applications and to avoid technical complications the results will be proved w.r.t. the variational distance.

The Main Results


In a preparatory step we prove the following technical lemma. Notice that the upper bound in (5.5.1) still depends on the underlying distribution through the d.f. $F$. The main purpose of the subsequent considerations will be to cancel the d.f. $F$ in the upper bound to facilitate further computations. We remark that the results below are useful modifications of results of Falk (1986a).

Lemma 5.5.1. Given $\mathbf{G}_k \in \{G_{1,\alpha,k}, G_{2,\alpha,k}, G_{3,k} : \alpha > 0\}$ let $G$ denote the first marginal d.f. Let $X_{n:n} \ge \dots \ge X_{n-k+1:n}$ be the $k$ largest order statistics of $n$ i.i.d. random variables with d.f. $F$ and density $f$. Define again $\psi = g/G$ on the support of $G$ where $g$ is the density of $G$. Moreover, fix $x_0 \ge -\infty$. Then,
$$\sup_B \bigl| P\{(X_{n:n}, \dots, X_{n-k+1:n}) \in B\} - \mathbf{G}_k(B) \bigr| \le \Bigl[ 2\mathbf{G}_k(M^c) + \int_M \Bigl( n(1 - F(x_k)) + \log G(x_k) - \sum_{j=1}^k \log(nf/\psi)(x_j) \Bigr) d\mathbf{G}_k(x) \Bigr]^{1/2} + Ck/n \tag{5.5.1}$$
where $M = \{x : x_j > x_0,\ f(x_j) > 0,\ j = 1, \dots, k\}$ and $C$ is a universal constant.

PROOF. The quantile transformation and inequality (5.4.8) yield
$$P\{(X_{n:n}, \dots, X_{n-k+1:n}) \in B\} = P\bigl\{[F^{-1}(1 + (nV_{n-j+1:n})/n)]_{j=1}^k \in B\bigr\} = \mu_n(B) + O(k/n) \tag{1}$$
uniformly over $n$, $k$, and Borel sets $B$, where the measure $\mu_n$ is defined by
$$\mu_n(B) = G_{2,1,k}\bigl\{x : -n < x_k < \dots < x_1,\ [F^{-1}(1 + x_j/n)]_{j=1}^k \in B\bigr\}.$$
In analogy to the proof of Theorem 1.4.5, part III (see also Remark 1.5.3) deduce that $\mu_n$ has the density $h_n$ defined by
$$h_n(x) = \exp[-n(1 - F(x_k))] \prod_{j=1}^k \bigl(nf(x_j)\bigr),$$
and $= 0$, otherwise. In (1), the measure $\mu_n$ can be replaced by the probability measure $Q_n = \mu_n/b_n$ where
$$b_n = 1 - \exp(-n) \sum_{j=0}^{k-1} n^j/j! = 1 + O(k/n).$$
Denote by $g_k$ the density of $\mathbf{G}_k$. Recall that $g_k(x) = G(x_k) \prod_{j=1}^k \psi(x_j)$ for $\alpha(G) < x_k < \dots < x_1 < \omega(G)$. Now, Lemma A.3.5, applied to $Q_n$ and $\mathbf{G}_k$, implies the asserted inequality (5.5.1). □
Next we formulate a simple version of Theorem 5.5.4 as an analogue to Corollary 5.2.3. The proof can be left to the reader.

Corollary 5.5.2. Denote by $G_j$ the $j$th marginal d.f. of $\mathbf{G}_k \in \{G_{1,\alpha,k}, G_{2,\alpha,k}, G_{3,k} : \alpha > 0\}$, and write $G = G_1$. If, in addition to the conditions of Lemma 5.5.1, $G\{f > 0\} = 1$ and $\omega(F) = 0$ for $i = 2$, then
$$\sup_B \bigl| P\{(X_{n:n}, \dots, X_{n-k+1:n}) \in B\} - \mathbf{G}_k(B) \bigr| \le \Bigl[ \sum_{j=1}^k \int \bigl[ nf/\psi - 1 - \log(nf/\psi) \bigr] dG_j \Bigr]^{1/2} + Ck/n$$
$$\le \Bigl[ \sum_{j=1}^k \int \bigl[ (nf/\psi - 1)^2 / (nf/\psi) \bigr] dG_j \Bigr]^{1/2} + Ck/n.$$
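The second inequality in Corollary 5.5.2 rests on the elementary bound $y-1-\log y\le (y-1)^2/y$ for $y>0$, applied to $y=nf/\psi$; the difference of the two sides equals $1/y-1+\log y$, which is nonnegative because $\log y\ge 1-1/y$. A quick grid check (function names hypothetical):

```python
import math

def kl_type(y):
    """y - 1 - log y: integrand of the first bound in Corollary 5.5.2 (y = nf/psi)."""
    return y - 1.0 - math.log(y)

def chi2_type(y):
    """(y - 1)^2 / y: integrand of the second bound."""
    return (y - 1.0) ** 2 / y

if __name__ == "__main__":
    ys = [10.0 ** (e / 100.0) for e in range(-300, 301)]  # y from 1e-3 to 1e3
    gaps = [chi2_type(y) - kl_type(y) for y in ys]
    # chi2_type - kl_type = 1/y - 1 + log y, zero only at y = 1
    print(min(gaps), max(gaps))
```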

As a consequence of Corollary 5.5.2 one gets the following example taken from Falk (1986a).

EXAMPLE 5.5.3. Let $\varphi$ denote the standard normal density. Define $b_n$ by the equation $b_n = n\varphi(b_n)$. Let $X_{n:n} \ge \dots \ge X_{n-k+1:n}$ be the $k$ largest order statistics of $n$ i.i.d. standard normal r.v.'s. Then,
$$\sup_B \bigl| P\{[b_n(X_{n-j+1:n} - b_n)]_{j=1}^k \in B\} - G_{3,k}(B) \bigr| \le C k^{1/2} (\log(k + 1))^2 / \log n.$$
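A sketch for computing the constant $b_n$ of Example 5.5.3, the root of $b=n\varphi(b)$ on $(0,\infty)$. Bisection is a choice made here, not taken from the text; it works because $b-n\varphi(b)$ is negative at $0$ and strictly increasing for $b>0$. Rewriting the equation as $b^2 = 2\log n - \log(2\pi) - 2\log b$ also shows $b_n<(2\log n)^{1/2}$ as soon as $b_n>(2\pi)^{-1/2}$. Function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_bn(n, rel_tol=1e-13):
    """Root of b = n * phi(b) on (0, inf) by bisection."""
    lo, hi = 0.0, 1.0
    while hi - n * phi(hi) < 0.0:  # enlarge the bracket until the sign changes
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if mid - n * phi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        print(n, normal_bn(n), math.sqrt(2.0 * math.log(n)))
```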

The following theorem can be regarded as the main result of this section. Notice that the integrals in the upper bound have only to be computed on $(x_0, \omega(F))$. Moreover, the condition $G\{f > 0\} = 1$ as used in Corollary 5.5.2 is omitted.

Theorem 5.5.4. Denote by $G_j$ the $j$th marginal d.f. of $\mathbf{G}_k \in \{G_{1,\alpha,k}, G_{2,\alpha,k}, G_{3,k} : \alpha > 0\}$, and put $G = G_1$. Let $F$ be a d.f. with density $f$ such that $f(x) > 0$ for $x_0 < x < \omega(F)$. Assume that $\omega(F) = \omega(G)$. Define again $\psi = g/G$ on the support of $G$ where $g$ is the density of $G$. Then,
$$\sup_B \bigl| P\{(X_{n:n}, \dots, X_{n-k+1:n}) \in B\} - \mathbf{G}_k(B) \bigr| \le \cdots \tag{5.5.2}$$

PROOF. To prove (5.5.2) one has to establish an upper bound of the right-hand side of (5.5.1). Note that under the present conditions
$$M = \{x : x_j > x_0,\ f(x_j) > 0,\ j = 1, \dots, k\} = \{x : x_0 < x_j < \omega(G),\ j = 1, \dots, k\}.$$
Moreover, recall that $x_1 \ge \dots \ge x_k$ for every $x$ in the support of $\mathbf{G}_k$. Obviously,
$$\mathbf{G}_k(M^c) = G_k(x_0). \tag{1}$$
Denote by $g_k$ the density of $G_k$. Recall that $G_k = G \sum_{j=0}^{k-1} (-\log G)^j/j!$ and $g_k = g(-\log G)^{k-1}/(k-1)!$. In analogy to inequality (2) in the proof of Lemma 5.5.2 we obtain
$$\int_M [1 - F(x_k)]\, d\mathbf{G}_k(x) = \int_{x_0}^{\omega(G)} (1 - F)\, dG_k \le \int_{x_0}^{\omega(G)} f(y) G_k(y)\, dy$$
$$= \int_{x_0}^{\omega(G)} (f/\psi) \Bigl[\sum_{j=0}^{k-1} (-\log G)^j/j!\Bigr] dG = \sum_{j=1}^k \int_{x_0}^{\omega(G)} (f/\psi)\, dG_j. \tag{2}$$
Moreover,
$$\int_M \log G(x_k)\, d\mathbf{G}_k(x) = \int_{x_0}^{\omega(G)} (\log G)\, dG_k = -k \int_{x_0}^{\omega(G)} g(x)\bigl(-\log G(x)\bigr)^k/k!\, dx = -k\bigl(1 - G_{k+1}(x_0)\bigr). \tag{3}$$
Now the proof can easily be completed by combining (5.5.1) with (1)-(3). □
Notice that Theorem 5.2.5 is a special case of Theorem 5.5.4.
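Identity (3) in the proof can be checked numerically in the case $G=G_{2,1}$, where $\log G(x)=x$, $G_k(x)=e^x\sum_{j<k}(-x)^j/j!$ and $g_k(x)=e^x(-x)^{k-1}/(k-1)!$ for $x<0$. The sketch compares a composite Simpson approximation of $\int_{x_0}^0 x\,dG_k(x)$ with $-k(1-G_{k+1}(x_0))$; Simpson's rule, the value $x_0=-7.5$ and the function names are illustrative choices.

```python
import math

def g(x, k):
    """Density of the k-th marginal G_k of G_{2,1,k}: e^x (-x)^(k-1) / (k-1)!, x < 0."""
    return math.exp(x) * (-x) ** (k - 1) / math.factorial(k - 1)

def G(x, k):
    """G_k(x) = e^x * sum_{j<k} (-x)^j / j! for x <= 0."""
    return math.exp(x) * sum((-x) ** j / math.factorial(j) for j in range(k))

def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

if __name__ == "__main__":
    x0 = -7.5
    for k in (1, 2, 5):
        # left side of (3) with log G(x) = x, right side -k (1 - G_{k+1}(x0))
        lhs = simpson(lambda x: x * g(x, k), x0, 0.0)
        rhs = -k * (1.0 - G(x0, k + 1))
        print(k, lhs, rhs)
```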


Special Classes of Densities


Finally, Theorem 5.5.4 will be applied to the particular densities as dealt with
in Corollary 5.2.7.
Corollary 5.5.5. Assume that $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$ and $\psi$, $T$ are the corresponding auxiliary functions with $\psi = g/G$ and $T = G^{-1} \circ G_{2,1}$. Assume that the density $f$ of the d.f. $F$ has the representation
$$f(x) = \psi(x) e^{h(x)}, \qquad T(x_0) < x < \omega(G), \tag{5.5.3}$$
and $= 0$ if $x > \omega(G)$, where $x_0 < 0$ and $h$ satisfies the condition
$$|h(x)| \le \begin{cases} L x^{-\alpha\delta} & \text{if } i = 1, \\ L(-x)^{\alpha\delta} & \text{if } i = 2, \\ L e^{-\delta x} & \text{if } i = 3, \end{cases} \tag{5.5.4}$$
and $L$, $\delta$ are positive constants. Then,
$$\sup_B \bigl| P\{[c_n^{-1}(X_{n-j+1:n} - d_n)]_{j=1}^k \in B\} - G_{i,\alpha,k}(B) \bigr| \le D\bigl[(k/n)^{\delta} k^{1/2} + k/n\bigr]$$
where $c_n$, $d_n$ are the constants of Theorem 5.1.1 and $D > 0$ is a constant which only depends on $x_0$, $\delta$, and $L$.
We have $d_n = 0$ if $i = 1, 2$, and $d_n = \log n$ if $i = 3$; moreover, $c_n = n^{1/\alpha}$ if $i = 1$, $c_n = n^{-1/\alpha}$ if $i = 2$, and $c_n = 1$ if $i = 3$.
PROOF. Again it suffices to prove the result for the particular case $G = G_{2,1}$. Theorem 5.5.4 will be applied to $x_{0,n} = nx_0$ and $f_n(x) = f(x/n)/n$. We obtain
$$\sup_B \bigl| P\{(nX_{n:n}, \dots, nX_{n-k+1:n}) \in B\} - G_{2,1,k}(B) \bigr|$$
$$\le \Bigl[ \sum_{j=1}^k \int_{nx_0}^0 \bigl[e^{h(x/n)} - 1 - h(x/n)\bigr] dG_j(x) + \bigl(1 + (k - 1)(-x_0)\bigr) G_k(nx_0) + kG_{k+1}(nx_0) \Bigr]^{1/2} + Ck/n. \tag{1}$$
Check that $G_k(x) = O((k/|x|)^m)$ uniformly in $k$ and $x < 0$ for every positive integer $m$. Moreover, since $h$ is bounded on $(x_0, 0)$ we have
$$\sum_{j=1}^k \int_{nx_0}^0 \bigl[e^{h(x/n)} - 1 - h(x/n)\bigr] dG_j(x) \le Dn^{-2\delta} \sum_{j=1}^k \int_{-\infty}^0 |x|^{2\delta + j - 1} \exp(x)/(j - 1)!\, dx = Dn^{-2\delta} \sum_{j=1}^k \Gamma(2\delta + j)/\Gamma(j) \tag{2}$$
where $\Gamma(t) = \int_0^\infty x^{t-1} \exp(-x)\, dx$ denotes the $\Gamma$-function.
Finally, observe that (compare with Erdélyi et al. (1953), formula (5), page 47)
$$\sum_{j=1}^k \Gamma(2\delta + j)/\Gamma(j) \le D \sum_{j=1}^k j^{2\delta}. \tag{3}$$
Now by choosing $m \ge 2\delta$ the asserted inequality is immediate from (1)-(3). □
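Inequality (3) in the proof of Corollary 5.5.5 compares $\sum_{j\le k}\Gamma(2\delta+j)/\Gamma(j)$ with $\sum_{j\le k} j^{2\delta}$. The sketch below evaluates both sums via `math.lgamma` (to avoid overflow) for a few illustrative values of $\delta$; for $\delta=1/2$ the two sums coincide, and for $\delta=1$ the ratio lies in $[1,2]$ because $\Gamma(j+2)/\Gamma(j)=j(j+1)$. Function names are hypothetical.

```python
import math

def gamma_ratio_sum(delta, k):
    """sum_{j=1}^k Gamma(2 delta + j) / Gamma(j), computed through log-gamma."""
    return sum(math.exp(math.lgamma(2.0 * delta + j) - math.lgamma(j)) for j in range(1, k + 1))

def power_sum(delta, k):
    """sum_{j=1}^k j^(2 delta)."""
    return sum(j ** (2.0 * delta) for j in range(1, k + 1))

if __name__ == "__main__":
    for delta in (0.25, 0.5, 1.0, 2.0):
        ratios = [gamma_ratio_sum(delta, k) / power_sum(delta, k) for k in (1, 10, 100, 1000)]
        print(delta, [round(r, 4) for r in ratios])  # bounded in k, as (3) asserts
```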

EXAMPLE 5.5.6. If $f \in \{g_{1,\alpha}, g_{2,\alpha}, g_3 : \alpha > 0\}$, that is the case of extreme value densities, then Corollary 5.5.5 is applicable with $\delta = 1$. Thus, the error bound is of order $O(k^{3/2}/n)$, which is a rate worse than that in the case of generalized Pareto densities. Direct calculations show that the bound $O(k^{3/2}/n)$ is sharp for $k > 1$.

5.6. Variational Distance between Empirical and Poisson Processes

In this section we shall study the asymptotic behavior of extremes according to their multitude in Borel sets. This topic does not directly concern order statistics. It is the purpose of this section to show that the results for order statistics can be applied to obtain approximations for empirical point processes.

Preliminaries
Let $\xi_1, \dots, \xi_n$ be i.i.d. random variables with common d.f. $F$ which belongs to the weak domain of attraction of $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$. Hence, according to (5.1.4), there exist $a_n > 0$ and $b_n$ such that
$$n(1 - F(b_n + a_n x)) \to -\log G(x), \qquad n \to \infty, \tag{5.6.1}$$
for $x \in (\alpha(G), \omega(G))$. According to the Poisson approximation to binomial r.v.'s we know that
$$\sum_{j=1}^n 1_{(x,\infty)}\bigl(a_n^{-1}(\xi_j - b_n)\bigr) \tag{5.6.2}$$
is asymptotically a Poisson r.v. with parameter $\lambda = -\log G(x)$.
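The Poisson approximation behind (5.6.2) can be quantified: for $n$ Bernoulli trials with success probability $\lambda/n$, Le Cam's inequality bounds the total variation distance to Poisson$(\lambda)$ by $\lambda^2/n$. This sketch (hypothetical function name, illustrative $\lambda$) computes the distance with recursively updated probability masses, which avoids overflow in large factorials.

```python
import math

def tv_binomial_poisson(n, lam):
    """Total variation distance between Bin(n, lam/n) and Poisson(lam), lam < n."""
    p = lam / n
    b = (1.0 - p) ** n          # Bin(n, p) mass at 0
    q = math.exp(-lam)          # Poisson(lam) mass at 0
    diff, pois_mass = abs(b - q), q
    for j in range(1, n + 1):
        b *= (n - j + 1) / j * p / (1.0 - p)
        q *= lam / j
        diff += abs(b - q)
        pois_mass += q
    # the Poisson mass beyond n has no binomial counterpart
    return 0.5 * (diff + max(0.0, 1.0 - pois_mass))

if __name__ == "__main__":
    lam = 2.0  # plays the role of -log G(x) for a fixed x
    for n in (10, 100, 1000):
        print(n, tv_binomial_poisson(n, lam), lam ** 2 / n)
```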


Our investigations will be carried out within the framework of point processes, and in this context the expression in (5.6.2) is usually written in the form
$$\sum_{j=1}^n \varepsilon_{(\xi_j - b_n)/a_n}(B) \tag{5.6.3}$$
where $\varepsilon_z(B) = 1_B(z)$ and $B = (x, \infty)$. With $B$ varying over all Borel sets we obtain the empirical (point) process
$$N_n = \sum_{j=1}^n \varepsilon_{(\xi_j - b_n)/a_n} \tag{5.6.4}$$
of a sample of size $n$ with values in the set of point measures.


Recall that $\mu$ is a point measure if there exists a denumerable set of points $x_j$, $j \in J$, such that
$$\mu = \sum_{j \in J} \varepsilon_{x_j}$$
and $\mu(K) < \infty$ for every relatively compact set $K$. The set of all point measures $M$ is endowed with the smallest $\sigma$-field $\mathcal{M}$ such that the "projections" $\mu \to \mu(B)$ are measurable. It is apparent that $N : \Omega \to M$ is measurable if $N(B) : \Omega \to [0, \infty]$ is measurable for every Borel set $B$. If $N$ is measurable then $N$ is called a point process. Hence, the empirical process is a point process. Certain Poisson processes will be the limiting processes of empirical processes.

Homogeneous Poisson Process

Let $\xi_1, \dots, \xi_n$ be i.i.d. random variables with common d.f. $W_{2,1}$, the uniform d.f. on $(-1, 0)$. In this case, the empirical process is given by
$$N_n = \sum_{j=1}^n \varepsilon_{n\xi_j}. \tag{5.6.5}$$
In the limit this point process will be the homogeneous Poisson process $N_0$ with unit rate. The Poisson process $N_0$ is defined by
$$N_0 = \sum_{j=1}^\infty \varepsilon_{S_j} \tag{5.6.6}$$
where $S_j$ is the sum of $j$ i.i.d. standard "negative" exponential r.v.'s. Moreover, $M$ is the set of all point measures on the Borel sets in $(-\infty, 0)$.
For every $s > 0$ and $n = 0, 1, 2, \dots$ define the truncation $N_n^{(s)}$ by
$$N_n^{(s)} = N_n(\cdot \cap [-s, 0)). \tag{5.6.7}$$
Theorem 5.6.1. There exists a universal constant $C > 0$ such that for every positive integer $n$ and $s \ge \log(n)$ the following inequality holds:
$$\sup_{M \in \mathcal{M}} \bigl| P\{N_n^{(s)} \in M\} - P\{N_0^{(s)} \in M\} \bigr| \le Cs/n. \tag{5.6.8}$$

PROOF. Let $V_{n:n} \ge \dots \ge V_{1:n}$ be the order statistics of $n$ i.i.d. random variables with uniform distribution on $(-1, 0)$. Let $k \equiv k(n)$ be the smallest integer such that
$$P\{nV_{n-k+1:n} \ge -s\} \le n^{-1}. \tag{1}$$
In the sequel, $C$ will denote a constant which is independent of $n$ and $s \ge \log(n)$. It follows from the exponential bound theorem for order statistics (see Lemma 3.1.1) that $k \le Cs$. Write
$$N_{0,k}^{(s)} = \sum_{i=1}^k \varepsilon_{S_i}(\cdot \cap [-s, 0)) \quad\text{and}\quad N_{n,k}^{(s)} = \sum_{i=1}^k \varepsilon_{nV_{n-i+1:n}}(\cdot \cap [-s, 0)). \tag{2}$$
It is immediate from (1) that for $n \ge 1$,
$$\sup_{M \in \mathcal{M}} \bigl| P\{N_n^{(s)} \in M\} - P\{N_{n,k}^{(s)} \in M\} \bigr| \le n^{-1}. \tag{3}$$
From Theorem 5.4.4 we know that
$$\sup_B \bigl| P\{(nV_{n:n}, \dots, nV_{n-k+1:n}) \in B\} - P\{(S_1, \dots, S_k) \in B\} \bigr| \le Ck/n. \tag{4}$$
Note that $N_{n,k}^{(s)}$, $n \ge 1$, and $N_{0,k}^{(s)}$ may be written as the composition of the random vectors $(nV_{n:n}, \dots, nV_{n-k+1:n})$, $n \ge 1$, and $(S_1, \dots, S_k)$, respectively, and the measurable map
$$(x_1, \dots, x_k) \to \sum_{i=1}^k \varepsilon_{x_i}(\cdot \cap [-s, 0))$$
having its values in the set of point measures. Therefore, (4) yields
$$\sup_{M \in \mathcal{M}} \bigl| P\{N_{n,k}^{(s)} \in M\} - P\{N_{0,k}^{(s)} \in M\} \bigr| \le Ck/n. \tag{5}$$
Moreover, (1) and (4) yield
$$P\{S_k \ge -s\} \le Ck/n \tag{6}$$
and hence, in analogy to (3),
$$\sup_{M \in \mathcal{M}} \bigl| P\{N_{0,k}^{(s)} \in M\} - P\{N_0^{(s)} \in M\} \bigr| \le Ck/n. \tag{7}$$
Now (3), (5), (7), and the triangle inequality imply the asserted inequality. □

The bound in Theorem 5.6.1 is sharp. Notice that for every $k \in \{1, \dots, n\}$
$$\sup_{-s \le -t} \bigl| P\{N_n(-t, 0) \le k - 1\} - P\{N_0(-t, 0) \le k - 1\} \bigr| = \sup_{-s \le -t} \bigl| P\{nV_{n-k+1:n} \le -t\} - G_{2,1,k}(-t) \bigr|. \tag{5.6.9}$$
Hence a remainder term of a smaller order than that in (5.6.8) would yield a result for order statistics which does not hold according to the expansion of length 2 in Theorem 5.4.3.
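Identity (5.6.9) rests on the pathwise equivalence $\{N_n(-t,0)\le k-1\}=\{nV_{n-k+1:n}\le -t\}$: fewer than $k$ points lie in $(-t,0)$ exactly when the $k$th largest point is at most $-t$. A small randomized check (hypothetical helper names, illustrative sample sizes):

```python
import random

def count_in(points, t):
    """N(-t, 0): number of points in the open interval (-t, 0)."""
    return sum(1 for x in points if -t < x < 0.0)

def check_event_identity(n, trials, seed=0):
    """Check {N_n(-t,0) <= k-1} == {n V_{n-k+1:n} <= -t} on simulated samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        pts = sorted((n * rng.uniform(-1.0, 0.0) for _ in range(n)), reverse=True)
        k = rng.randint(1, n)
        t = rng.uniform(0.0, 10.0)
        kth_largest = pts[k - 1]  # realization of n V_{n-k+1:n}
        if (count_in(pts, t) <= k - 1) != (kth_largest <= -t):
            return False
    return True

if __name__ == "__main__":
    print(check_event_identity(50, 2000))
```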

Extensions
Denote by $\nu_0$ the Lebesgue measure restricted to $(-\infty, 0)$. Recall that $\nu_0$ is the intensity measure of the homogeneous Poisson process $N_0$. We have
$$\cdots \tag{5.6.10}$$
Write again $T_{i,\alpha} = G_{i,\alpha}^{-1} \circ G_{2,1}$ (see (1.6.10)). Denote by $M_i$ the set of point measures on $(\alpha(G_{i,\alpha}), \omega(G_{i,\alpha}))$ and by $\mathcal{M}_i$ the pertaining $\sigma$-field. Denote by $T_{i,\alpha}$ also the map from $M$ to $M_i$ where $T_{i,\alpha}\mu$ is the measure induced by $\mu$ and $T_{i,\alpha}$. Notice that if $\mu = \sum_{j \in J} \varepsilon_{x_j}$ then
$$T_{i,\alpha}\mu = \sum_{j \in J} \varepsilon_{T_{i,\alpha}(x_j)}.$$
Define
$$N_{i,\alpha,n} = T_{i,\alpha}(N_n) \tag{5.6.11}$$
for $N_n$ as in (5.6.5) and (5.6.6). It is obvious that for $n = 1, 2, \dots$
$$N_{i,\alpha,n} \stackrel{d}{=} \sum_{k=1}^n \varepsilon_{(\xi_k - d_n)/c_n} \tag{5.6.12}$$
where $\xi_1, \dots, \xi_n$ are i.i.d. random variables with common generalized Pareto d.f. $W_{i,\alpha}$; moreover, $c_n > 0$ and $d_n$ are the usual normalizing constants as defined in (1.3.13).
It is well known that $N_{i,\alpha} \equiv N_{i,\alpha,0}$ is a Poisson process with intensity measure $\nu_{i,\alpha} = T_{i,\alpha}\nu_0$ (having the mean value function $\log(G_{i,\alpha})$). Recall that the distribution of $N_{i,\alpha}$ is uniquely characterized by the following two properties:
(a) $N_{i,\alpha}(B)$ is a Poisson r.v. with parameter $\nu_{i,\alpha}(B)$ if $\nu_{i,\alpha}(B) < \infty$, and
(b) $N_{i,\alpha}(B_1), \dots, N_{i,\alpha}(B_m)$ are independent r.v.'s for mutually disjoint Borel sets $B_1, \dots, B_m$.
Define the truncated point processes $N_{i,\alpha,n}^{(s)}$ by
$$N_{i,\alpha,n}^{(s)} = T_{i,\alpha}\bigl(N_n^{(s)}\bigr). \tag{5.6.13}$$
From Theorem 5.6.1 and (5.6.11) it is obvious that the following result holds.

Corollary 5.6.2. There exists a universal constant $C > 0$ such that for every positive integer $n$ and $s \ge \log(n)$ the following inequality holds:
$$\sup_{M \in \mathcal{M}_i} \bigl| P\{N_{i,\alpha,n}^{(s)} \in M\} - P\{N_{i,\alpha,0}^{(s)} \in M\} \bigr| \le Cs/n. \tag{5.6.14}$$
Notice that Corollary 5.6.2 specialized to $i = 2$ and $\alpha = 1$ yields Theorem 5.6.1.

Final Remarks
Theorem 5.6.1 and Corollary 5.6.2 can easily be extended to a large class of d.f.'s $F$ belonging to a neighborhood of a generalized Pareto d.f. $W_{i,\alpha}$, with $N_{i,\alpha,0}^{(s)}$ again being the approximating Poisson process. This can be proved just by replacing Theorem 5.4.3 in the proof of Theorem 5.6.1 (for appropriate inequalities we refer to Section 5.5). Moreover, in view of (5.6.9) and Theorem 5.4.5(ii) it is apparent that a bound of order $O(s/n)$ can only be achieved if $F$ has the upper tail of a generalized Pareto d.f. The details will be omitted since this topic will not be pursued further in this book.
In statistical applications one gets, in the simplest case, a model of independent Poisson r.v.'s by choosing mutually disjoint sets. The value of $s$ has to be large to gain efficiency; on the other hand, the Poisson model provides an accurate approximation only if $s$ is sufficiently small compared to $n$. The limiting model is represented by the unrestricted Poisson processes $N_{i,\alpha}$. One has to consider Poisson processes with intensity measures depending on location and scale parameters if the original model includes such parameters. This family of Poisson processes can again be studied within a 3-parameter representation.

P.5. Problems and Supplements

1. Check that the max-stability $G^n(d_n + xc_n) = G(x)$ of extreme value d.f.'s has its counterpart in the equation
$$n(1 - W(d_n + xc_n)) = 1 - W(x)$$
for the generalized Pareto d.f.'s $W \in \{W_{1,\alpha}, W_{2,\alpha}, W_3 : \alpha > 0\}$.
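Problem 1 can be checked numerically. The sketch assumes the standard forms of $W_{i,\alpha}$ and $G_{i,\alpha}$ (so that, e.g., $G_{2,1}(x)=e^x$ for $x<0$) and the constants $c_n$, $d_n$ stated after Corollary 5.5.5; the identity for $W$ is only claimed for $x$ in the support of $W$. Function names and the sample values of $n$, $\alpha$, $x$ are illustrative.

```python
import math

def W(i, alpha, x):
    """Assumed generalized Pareto d.f.s: 1 - x^-alpha (x >= 1), 1 - (-x)^alpha (-1 <= x <= 0),
    1 - e^-x (x >= 0)."""
    if i == 1:
        return 1.0 - x ** (-alpha) if x >= 1.0 else 0.0
    if i == 2:
        if x <= -1.0:
            return 0.0
        return 1.0 - (-x) ** alpha if x < 0.0 else 1.0
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0

def G(i, alpha, x):
    """Assumed extreme value d.f.s: exp(-x^-alpha) (x > 0), exp(-(-x)^alpha) (x < 0), exp(-e^-x)."""
    if i == 1:
        return math.exp(-x ** (-alpha)) if x > 0.0 else 0.0
    if i == 2:
        return math.exp(-(-x) ** alpha) if x < 0.0 else 1.0
    return math.exp(-math.exp(-x))

def norming(i, alpha, n):
    """(c_n, d_n): (n^(1/alpha), 0), (n^(-1/alpha), 0), (1, log n)."""
    if i == 1:
        return n ** (1.0 / alpha), 0.0
    if i == 2:
        return n ** (-1.0 / alpha), 0.0
    return 1.0, math.log(n)

if __name__ == "__main__":
    n, alpha = 500, 1.7
    for i, xs in {1: (1.5, 3.0), 2: (-0.9, -0.3), 3: (0.5, 2.0)}.items():
        c, d = norming(i, alpha, n)
        for x in xs:
            print(i, x, n * (1.0 - W(i, alpha, d + x * c)), 1.0 - W(i, alpha, x),
                  G(i, alpha, d + x * c) ** n, G(i, alpha, x))
```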

2. Check that the necessary and sufficient conditions (5.1.5)-(5.1.7) are trivially satisfied by the generalized Pareto d.f.'s in the following sense:
(i) For $x > 0$ and $t$ such that $tx > 1$:
$$(1 - W_{1,\alpha}(tx))/(1 - W_{1,\alpha}(t)) = x^{-\alpha}.$$
(ii) For $x < 0$ and $t > 0$ such that $tx > -1$:
$$(1 - W_{2,\alpha}(tx))/(1 - W_{2,\alpha}(-t)) = (-x)^{\alpha}.$$
(iii) For $t, x > 0$:
$$g(t) = \int_t^\infty (1 - W_3(y))\,dy \Big/ (1 - W_3(t)) = 1$$
and
$$(1 - W_3(t + x))/(1 - W_3(t)) = e^{-x}.$$

3. Let $F_1, F_2, F_3, \dots$ be d.f.'s. Define $G_n^*(x) = F_n(b_n^* + a_n^* x)$ and $G_n(x) = F_n(b_n + a_n x)$ where $a_n^*, a_n > 0$. Assume that for some nondegenerate d.f. $G^*$,
$$G_n^* \to G^* \quad \text{weakly}.$$
(i) The following two assertions are equivalent:
(a) For some nondegenerate d.f. $G$,
$$G_n \to G \quad \text{weakly}.$$
(b) For some constants $a > 0$ and $b$,
$$a_n/a_n^* \to a \quad\text{and}\quad (b_n - b_n^*)/a_n^* \to b \qquad \text{as } n \to \infty.$$
(ii) Moreover, if (a) or (b) holds then
$$G(x) = G^*(b + ax) \quad \text{for all real } x.$$
[Hint: Use Lemma 1.2.9; see also de Haan, 1976.]


4. (i) Let $c$ be the unique solution of the equation
$$x^2 \sin(1/x) + 4x + 1 = 0$$
on the interval $(-1, 0)$. Define the d.f. $F$ by
$$F(x) = x^2 \sin(1/x) + 4x + 1, \qquad x \in (c, 0).$$
Then, for every $x$,
$$F^n\bigl(x/(4n)\bigr) \to G_{2,1}(x) \qquad \text{as } n \to \infty.$$
However, $F$ does not belong to the strong domain of attraction of $G_{2,1}$.
(Falk, 1985b)
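A numerical sketch of Problem 4(i): bisection for the left endpoint $c$ (unique because $F'(x) = 2x\sin(1/x)-\cos(1/x)+4\ge 1$ on $(-1,0)$), the weak convergence $F^n(x/(4n))\to e^x=G_{2,1}(x)$, and the behavior of the rescaled density $f(x/(4n))/4 = 1-\cos(4n/x)/4+O(1/n)$, which keeps oscillating in $n$ instead of settling at $\psi = 1$; this is consistent with $F$ failing to lie in the strong domain of attraction. The normalization $a_n=1/(4n)$, $b_n=0$ and the function names are assumptions made for illustration.

```python
import math

def F(x):
    """F(x) = x^2 sin(1/x) + 4x + 1 for x in (c, 0)."""
    return x * x * math.sin(1.0 / x) + 4.0 * x + 1.0

def density(x):
    """f(x) = F'(x) = 2x sin(1/x) - cos(1/x) + 4 for x < 0."""
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x) + 4.0

def left_endpoint(tol=1e-14):
    """Bisection for the root c of F on (-1, 0); F(-1) < 0 < F(-0.01)."""
    lo, hi = -1.0, -0.01
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print("c =", left_endpoint())
    x = -1.0
    for n in (10**3, 10**5):
        print(n, F(x / (4 * n)) ** n, math.exp(x))  # weak convergence to e^x
    # n a_n f(b_n + a_n x) = f(x/(4n))/4 does not converge:
    print([round(density(x / (4 * n)) / 4.0, 3) for n in range(1000, 1010)])
```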
(ii) The Cauchy d.f. $F$ and density $f$ are given by
$$F(x) = 1/2 + (1/\pi) \arctan x \quad\text{and}\quad f(x) = 1/\bigl(\pi(1 + x^2)\bigr).$$
Verify the von Mises condition (5.1.24) with $i = 1$ and $\alpha = 1$.
[Hint: Use the de l'Hospital rule.]
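For Problem 4(ii), assuming that (5.1.24) with $i=1$ is the familiar von Mises condition $xf(x)/(1-F(x))\to\alpha$ as $x\to\infty$ (an assumption; the condition is not restated in this section), here is a quick numerical look at the Cauchy ratio. Writing $1-F(x)=\arctan(1/x)/\pi$ for $x>0$ avoids the cancellation in $1/2-\arctan(x)/\pi$; the ratio behaves like $1-2/(3x^2)$. The function name is hypothetical.

```python
import math

def cauchy_hazard_ratio(x):
    """x f(x) / (1 - F(x)) for the Cauchy d.f., x > 0, using 1 - F(x) = arctan(1/x)/pi."""
    f = 1.0 / (math.pi * (1.0 + x * x))
    tail = math.atan(1.0 / x) / math.pi
    return x * f / tail

if __name__ == "__main__":
    for x in (10.0, 1e3, 1e6):
        print(x, cauchy_hazard_ratio(x))  # approaches alpha = 1
```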
5. (Asymptotic d.f.'s of intermediate order statistics)
Let $k(n) \in \{1, \ldots, n\}$ be such that $k(n) \uparrow \infty$ and $k(n)/n \to 0$ as $n \to \infty$.
(i) The nondegenerate limiting d.f.'s of the $k(n)$th order statistic are given by
$$\Phi(G_3^{-1}(G)) \qquad\text{on } (\alpha(G), \omega(G))$$
where $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$.
(Chibisov, 1964; Wu, 1966)
(ii) The weak convergence of the distribution of $a_n^{-1}(X_{k(n):n} - b_n)$ to the limiting d.f. defined by $G$ holds if, and only if,
$$[nF(b_n + a_n x) - k(n)]/k(n)^{1/2} \to G_3^{-1}(G(x)), \qquad n \to \infty, \quad x \in (\alpha(G), \omega(G)).$$
(Chibisov, 1964)
6. Let $\xi_1, \xi_2, \xi_3, \ldots$ be i.i.d. symmetric random variables (that is, $\xi_i \stackrel{d}{=} -\xi_i$). Prove that
$$\sup_B|P\{\max(|\xi_1|, \ldots, |\xi_n|) \in B\} - P\{\max(\xi_1, \ldots, \xi_{2n}) \in B\}| = O(n^{-1}).$$
[Hint: Apply (4.2.10).]
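For continuous $F$, both probabilities in P.5.6 depend on $x$ only through $p = 1 - F(x)$:

- For $x \ge 0$ they equal $(1 - 2p)^n$ and $(1 - p)^{2n}$.
- For $x < 0$ the first is zero and the second is at most $2^{-2n}$.

The sketch below (hypothetical helper name) computes the resulting Kolmogorov distance, which is only a lower bound for the variational distance in the problem. Expanding with $p = s/n$ suggests that $n$ times this distance approaches $\max_s s^2e^{-2s} = e^{-2} \approx 0.135$. So the order $n^{-1}$ cannot be improved.

```python
import math


def kolmogorov_gap(n: int, grid_size: int = 20_000) -> float:
    """sup over p = 1 - F(x) in (0, 1/2) of |(1 - 2p)^n - (1 - p)^(2n)|.

    Only p = s/n with s in (0, 10] is scanned; for larger s both terms are negligible.
    """
    best = 0.0
    for k in range(1, grid_size + 1):
        p = 10.0 * k / (grid_size * n)
        if 2.0 * p >= 1.0:
            break
        abs_max = math.exp(n * math.log1p(-2.0 * p))    # P{max(|xi_1|,...,|xi_n|) <= x}
        plain_max = math.exp(2 * n * math.log1p(-p))    # P{max(xi_1,...,xi_2n) <= x}
        best = max(best, abs(abs_max - plain_max))
    return best


for n in (10, 100, 1000):
    print(n, n * kolmogorov_gap(n))
```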


7. Let $b_n^*$ be defined as in Example 5.1.4(5) and $b_n = \Phi^{-1}(1 - 1/n)$. Show that
$$|b_n - b_n^*| = O((\log\log n)^2/(\log n)^{3/2}).$$
Let $a_n = 1/(n\varphi(b_n))$ and $a_n^* = (2\log n)^{-1/2}$. Show that
$$|a_n - a_n^*| = O((\log\log n)/(\log n)^{3/2}).$$
(Reiss, 1977a, Lemma 15.11)
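The rates in P.5.7 can be observed numerically. This sketch computes $b_n = \Phi^{-1}(1 - 1/n)$ as $-\Phi^{-1}(1/n)$ with Python's `statistics.NormalDist`, which avoids rounding $1 - 1/n$.

Example 5.1.4(5) is not reproduced here. We therefore assume that $b_n^*$ is the classical constant $b_n^* = (2\log n)^{1/2} - (\log\log n + \log 4\pi)/(2(2\log n)^{1/2})$. If the example uses a different form, the comparison must be adjusted.

The printed quantities are the errors divided by the claimed rates; they should stay bounded.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_constants(n: float) -> tuple[float, float, float, float]:
    """Return (a_n, b_n, a_n^*, b_n^*); b_n^* uses an assumed classical form."""
    log_n = math.log(n)
    b_n = -STD_NORMAL.inv_cdf(1.0 / n)          # Phi^{-1}(1 - 1/n) by symmetry
    a_n = 1.0 / (n * STD_NORMAL.pdf(b_n))       # 1 / (n phi(b_n))
    a_star = (2.0 * log_n) ** -0.5
    b_star = math.sqrt(2.0 * log_n) - (math.log(log_n) + math.log(4.0 * math.pi)) / (
        2.0 * math.sqrt(2.0 * log_n)
    )
    return a_n, b_n, a_star, b_star


def scaled_errors(n: float) -> tuple[float, float]:
    """|a_n - a_n^*| and |b_n - b_n^*| divided by the rates claimed in P.5.7."""
    a_n, b_n, a_star, b_star = normal_constants(n)
    log_n, loglog_n = math.log(n), math.log(math.log(n))
    rate_a = loglog_n / log_n ** 1.5
    rate_b = loglog_n ** 2 / log_n ** 1.5
    return abs(a_n - a_star) / rate_a, abs(b_n - b_star) / rate_b


for exponent in (3, 6, 9, 12, 15):
    print(exponent, scaled_errors(10.0 ** exponent))
```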


8. For $\gamma > 0$ and a real number $x_0$ let $F_\gamma$ be a d.f. with
$$F_\gamma(x) = \gamma\Phi(x) + (1 - \gamma) \qquad\text{for } x \ge x_0.$$
Put $b_n = \Phi^{-1}(1 - 1/(n\gamma))$ and $a_n = 1/(n\gamma\varphi(b_n))$. Show that
$$\sup_x|F_\gamma^n(b_n + a_n x) - G_3(x)(1 + x^2e^{-x}/(4\log n))| = O((\log n)^{-2})$$
and, thus,
$$\sup_x|F_\gamma^n(b_n + x/b_n) - G_3(x)| = O((\log n)^{-1}).$$
(Reiss, 1977a, Theorem 15.17 and Remark 15.18)


9. (Graphical representation of generalized Pareto densities)
Recall that for Pareto densities $w_{1,\alpha}(x) = \alpha x^{-(1+\alpha)}$, $x \ge 1$, we have $w_{1,\alpha}(1) = \alpha$ (Fig. P.5.1). For the generalized Pareto type II densities $w_{2,\alpha}$ we have $w_{2,\alpha}(x) = \alpha(-x)^{\alpha-1}$ for $-1 \le x < 0$ (Fig. P.5.2).
Figure P.5.1. Pareto densities $w_{1,\alpha}$ with $\alpha = 0.1, 0.5, 1.5$.
Figure P.5.2. Generalized Pareto densities $w_{2,\alpha}$ with $\alpha = 0.5, 1, 1.5, 2, 3$.

10. (Von Mises parametrization of generalized Pareto d.f.'s)
For $\beta > 0$, define
$$V_\beta(x) = 1 - (1 + \beta x)^{-1/\beta} \qquad\text{if } 0 < x.$$
For $\beta < 0$, define
$$V_\beta(x) = \begin{cases} 1 - (1 + \beta x)^{-1/\beta} & \text{if } 0 < x < -1/\beta, \\ 1 & \text{if } x \ge -1/\beta. \end{cases}$$
For $\beta = 0$, define
$$V_0(x) = 1 - e^{-x} \qquad\text{for } x > 0.$$
Show that
$$W_{1,1/\beta}(x) = V_\beta\Big(\frac{x - 1}{\beta}\Big) \quad\text{if } \beta > 0, \qquad W_{2,1/|\beta|}(x) = V_\beta\Big(-\frac{x + 1}{\beta}\Big) \quad\text{if } \beta < 0, \qquad W_3(x) = V_0(x).$$
The density $v_\beta$ of $V_\beta$ is equal to zero for $x < 0$. Moreover, if $\beta > 0$ then
$$v_\beta(x) = (1 + \beta x)^{-(1 + 1/\beta)}, \qquad x > 0.$$
If $\beta < 0$ then
$$v_\beta(x) = \begin{cases} (1 + \beta x)^{-(1 + 1/\beta)} & 0 < x < -1/\beta, \\ 0 & x \ge -1/\beta. \end{cases}$$
Figure P.5.3. Standard exponential and Pareto densities $v_\beta$ with $\beta = 0, 0.6, 2$.
Figure P.5.4. Standard exponential and generalized Pareto type II densities $v_\beta$ with $\beta = -1, -0.75, -0.5, -0.4, 0$.
The Pareto densities $v_\beta$ with $\beta \downarrow 0$ (Figure P.5.3) and the generalized Pareto type II densities $v_\beta$ with $\beta \uparrow 0$ (Figure P.5.4) approach the standard exponential density $v_0$ (dotted curve).
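The identities and the limit behaviour in P.5.10 are easy to check numerically. The sketch below implements $V_\beta$ and $v_\beta$ exactly as defined above. It also implements $W_{1,\alpha}$ and $W_{2,\alpha}$, using the standard forms we assume throughout these problems. All function names are our own.

```python
import math


def V(beta: float, x: float) -> float:
    """Von Mises form V_beta of the generalized Pareto d.f."""
    if x <= 0.0:
        return 0.0
    if beta == 0.0:
        return 1.0 - math.exp(-x)
    if beta < 0.0 and x >= -1.0 / beta:
        return 1.0
    return 1.0 - (1.0 + beta * x) ** (-1.0 / beta)


def v(beta: float, x: float) -> float:
    """Density v_beta of V_beta."""
    if x <= 0.0:
        return 0.0
    if beta == 0.0:
        return math.exp(-x)
    if beta < 0.0 and x >= -1.0 / beta:
        return 0.0
    return (1.0 + beta * x) ** (-(1.0 + 1.0 / beta))


def W1(alpha: float, x: float) -> float:
    """Pareto d.f. W_{1,alpha}(x) = 1 - x^(-alpha), x >= 1 (assumed form)."""
    return 1.0 - x ** (-alpha) if x >= 1.0 else 0.0


def W2(alpha: float, x: float) -> float:
    """Type II d.f. W_{2,alpha}(x) = 1 - (-x)^alpha, -1 <= x <= 0 (assumed form)."""
    if x < -1.0:
        return 0.0
    return 1.0 - (-x) ** alpha if x <= 0.0 else 1.0


beta_pos, beta_neg = 0.5, -0.5
pareto_ok = all(
    math.isclose(W1(1.0 / beta_pos, x), V(beta_pos, (x - 1.0) / beta_pos))
    for x in (1.5, 3.0, 10.0)
)
type2_ok = all(
    math.isclose(W2(1.0 / abs(beta_neg), x), V(beta_neg, -(x + 1.0) / beta_neg))
    for x in (-0.9, -0.5, -0.1)
)
grid = [0.01 * k for k in range(1, 2001)]      # x in (0, 20]
gap = {b: max(abs(v(b, x) - math.exp(-x)) for x in grid) for b in (0.2, 0.01, -0.01, -0.2)}
print(pareto_ok, type2_ok, gap)
```

The sup-distance between $v_\beta$ and $v_0$ shrinks roughly in proportion to $|\beta|$, matching the behaviour shown in Figures P.5.3 and P.5.4.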
11. (Maxima with random indices)
Let $\xi_i$, $i = 1, 2, \ldots$ be i.i.d. random variables and let $N(i)$, $i = 0, 1, \ldots$ be positive integer-valued r.v.'s.
(i) If $G$ is an extreme value d.f., and
(a) $P\{a_n^{-1}(X_{n:n} - b_n) \le x\} \to G(x)$, $n \to \infty$,
(b) $N(i)/i \to N(0)$ in probability, $i \to \infty$,
then
$$P\{a_i^{-1}(X_{N(i):N(i)} - b_i) \le x\} \to E\,G^{N(0)}(x), \qquad i \to \infty.$$
(Barndorff-Nielsen, 1964)
(ii) If the sequence $(\xi_i)$ and $N(j)$ are independent for every $j$ then the condition (i)(b) can be replaced by
(b') $N(i)/i \to N(0)$ in distribution, $i \to \infty$.
(iii) Show that the independence condition in (ii) cannot be omitted without compensation. [Hint: Define $N(i) = \min\{j : \xi_j > \log i\}$ for standard exponential r.v.'s.]
(M. Falk)
12. Show that the Cauchy d.f. with scale parameter $\sigma = \pi$ satisfies condition (5.2.14) for $i = 1$ and $\alpha = 1$ with $\delta = 1$. As a consequence one gets for the maximum $X_{n:n}$ of standard Cauchy r.v.'s that
$$\sup_B|P\{(\pi/n)X_{n:n} \in B\} - G_{1,1}(B)| = O(n^{-1}).$$
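The rate $O(n^{-1})$ in P.5.12 can be seen numerically in the Kolmogorov distance, which is a lower bound for the variational distance. For $x > 0$ the sketch uses $1 - F(z) = \arctan(1/z)/\pi$ with $z = nx/\pi$ and compares $F^n(nx/\pi)$ with $G_{1,1}(x) = \exp(-1/x)$. For $x \le 0$ the difference is at most $2^{-n}$ and is ignored. A short expansion suggests that $n$ times the distance tends to $2e^{-2} \approx 0.271$. The function name is our own.

```python
import math


def cauchy_max_gap(n: int, steps: int = 40_000) -> float:
    """Grid approximation of sup_{x>0} |P{(pi/n) X_{n:n} <= x} - exp(-1/x)| for standard Cauchy r.v.'s."""
    best = 0.0
    for k in range(1, steps + 1):
        y = k * 0.001                                    # y = 1/x runs over (0, 40]
        x = 1.0 / y
        tail = math.atan(math.pi / (n * x)) / math.pi    # 1 - F(n x / pi)
        df_max = math.exp(n * math.log1p(-tail))         # F^n(n x / pi)
        best = max(best, abs(df_max - math.exp(-y)))
    return best


for n in (100, 1000):
    print(n, n * cauchy_max_gap(n))
```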

13. Under the Weibull d.f. $F_\alpha$ on the positive half-line defined by
$$F_\alpha(x) = 1 - \exp(-x^\alpha), \qquad x > 0,$$
one gets for every $\alpha \ne 1$,
$$\sup_B|P\{\alpha(\log n)^{1-1/\alpha}(X_{n:n} - (\log n)^{1/\alpha}) \in B\} - G_3(B)| = O(1/\log n).$$

14. (Bounds for remainder terms involving von Mises conditions)
Assume that the d.f. $F$ has three continuous derivatives and that $f = F' > 0$ on the interval $(x_0, \omega(F))$. Put
$$H = (1 - F)/f.$$
Assume that the von Mises condition (5.1.25) holds for some $i \in \{1, 2, 3\}$ and $\alpha > 0$ (with $\alpha = 1$ if $i = 3$). Thus, we have
$$h_{i,\alpha}(x) := \alpha H'(x) - T_{i,\alpha}(-1) \to 0 \qquad\text{as } x \uparrow \omega(F)$$
where again $T_{i,\alpha} = G_{i,\alpha}^{-1} \circ G_{2,1}$. Notice that $T_1(-1) = 1$, $T_2(-1) = -1$ and $T_3(-1) = 0$. Then for $\alpha(G_{i,\alpha}) < x < \omega(G_{i,\alpha})$,
$$|F^n(b_n + a_n x) - G_{i,\alpha}(x)| = O(|h_{i,\alpha}(x_n)| + n^{-1})$$
with $x_n = F^{-1}(1 - 1/n)$, and the normalizing constants are given by
$$a_n = \alpha/(nf(x_n)) \quad\text{and}\quad b_n = x_n - T_{i,\alpha}(-1)a_n.$$
(Radtke, 1988)

15. (Expansions involving von Mises conditions)
Assume, in addition to the conditions of P.5.14, that $f'' > 0$ on the interval $(x_0, \omega(F))$. Then for $\alpha(G_{i,\alpha}) < x < \omega(G_{i,\alpha})$,
$$|F^n(b_n + a_n x) - G_{i,\alpha}(x)(1 - h_{i,\alpha}(x_n)\psi_{i,\alpha}(x)[x - T_{i,\alpha}(-1)]^2/2)| = O(h_{i,\alpha}(x_n)^2 + |h_{i,\alpha}(x_n)||g_{i,\alpha}(x_n)| + n^{-1})$$
where $g_{i,\alpha}$ is another auxiliary function. We have
$$g_{i,\alpha} = h_{i,\alpha}'H/h_{i,\alpha} + T_{i,\alpha}(-1)/\alpha,$$
implicitly assuming that $h_{i,\alpha} \ne 0$. Moreover, assume that $\lim_{x \uparrow \omega(F)}g_{i,\alpha}(x)$ exists in $(-\infty, \infty)$, and that there exist real numbers $K_t$ such that
$$g_{1,\alpha}(tx) = g_{1,\alpha}(x)(K_t + o(x^0)) \quad\text{as } x \to \omega(F) \text{ for all } t > 0 \text{ if } i = 1,$$
$$g_{2,\alpha}(\omega(F) - tx) = g_{2,\alpha}(\omega(F) - x)(K_t + o(x^0)) \quad\text{as } x \downarrow 0 \text{ for all } t > 0 \text{ if } i = 2,$$
$$g_3(x + tH(x)) = g_3(x)(K_t + o(x^0)) \quad\text{as } x \uparrow \omega(F) \text{ for all reals } t \text{ if } i = 3.$$
(Radtke, 1988)

16. (Special cases)
(i) Let $F(x) = \ldots$, $x \ge 1$, for some $\alpha > 0$ and $0 < p \le 1$. Then $\ldots$ with $a_n$ and $b_n$ as above. Moreover, $g_{i,\alpha}(x_n)$ does not converge to zero as $n \to \infty$ (compare with P.5.15).
(ii) Let $F(x) = 1 - x^{-\alpha}\ldots$, $x \ge 1$, for $\alpha > 0$. Then
$$|F^n(b_n + a_n x) - G_{1,\alpha}(x)(1 - h_{1,\alpha}(x_n)\psi_{1,\alpha}(x)[x - 1]^2/2)| = O((\log n)^2/n^{2\alpha} + n^{-1})$$
and
$$h_{1,\alpha}(x_n) = O(n^{-\alpha}).$$

17. (i) Prove that for a d.f. $F$ and a positive integer $k$ the following two statements are equivalent:
(a) $F$ belongs to the weak domain of attraction of an extreme value d.f. $G \in \{G_{1,\alpha}, G_{2,\alpha}, G_3 : \alpha > 0\}$.
(b) There are constants $a_n > 0$ and $b_n$ such that the d.f.'s $F_{n,k}$ defined by
$$F_{n,k}(x) = P\{a_n^{-1}(X_{n:n} - b_n) \le x_1, \ldots, a_n^{-1}(X_{n-k+1:n} - b_n) \le x_k\}$$
converge weakly to a nondegenerate d.f. $G_{(k)}$.
(ii) In addition, if (a) holds for $G = G_i$ then (b) is valid for $G_{(k)} = G_{i,k}$.
18. (i) There exists a constant $C > 0$ such that for every positive integer $n$ and $k \in \{1, 2, \ldots, [n/2]\}$ the following inequality holds:
$$\sup_B\Big|P\Big\{\Big(\frac{n^3}{n - k}\Big)^{1/2}\Big(U_{n-k+1:n} - \frac{n - k}{n}\Big) - k \in B\Big\} - G_{2,1,k}(B)\Big| \le Ck^{1/2}/n.$$
(Kohne and Reiss, 1983)
(ii) It is unknown whether the standardized distribution of $U_{n-k+1:n}$ admits an expansion of length $m$ arranged in powers of $k^{1/2}/n$ where again $G_{2,1,k}$ is the leading term of the expansion.
(iii) Reformulate (i) by using $N(0,1)$ in place of $G_{2,1,k}$.

19. (Asymptotic independence of spacings)
There exists a constant $C > 0$ such that for every positive integer $n$ and $k \in \{1, 2, \ldots, n\}$ the following inequality holds:
$$\sup_B|P\{(nU_{1:n}, n(U_{2:n} - U_{1:n}), \ldots, n(U_{k:n} - U_{k-1:n})) \in B\} - P\{(\xi_1, \ldots, \xi_k) \in B\}| \le Ck/n$$
where $\xi_1, \ldots, \xi_k$ are i.i.d. random variables with standard exponential d.f.

20. Show that under the triangular density
$$f(x) = 1 - |x|, \qquad |x| \le 1,$$
one gets
$$\sup_B|P\{((n/2)^{1/2}(X_{n-i+1:n} - 1))_{i=1}^{k} \in B\} - G_{2,2,k}(B)| \le Ck/n$$
where $C > 0$ is a universal constant.


21. (Problem) Prove inequalities w.r.t. the Hellinger distance corresponding to those in Lemma 5.5.1 and Theorem 5.5.5.


22. For the $k$ largest order statistics of standard Cauchy r.v.'s one gets
$$\sup_B|P\{((\pi/n)X_{n:n}, \ldots, (\pi/n)X_{n-k+1:n}) \in B\} - G_{1,1,k}(B)| \le Ck^{3/2}/n$$
where $C > 0$ is a universal constant.


23. Extend Corollary 5.6.2 to d.f.'s that satisfy condition (5.2.11).

Bibliographical Notes
An excellent survey of the literature concerning classical extreme value theory can be found in the book of Galambos (1987). Therefore it suffices here to repeat only some of the basic facts of the classical part and, in addition, to give a more detailed account of the recent developments concerning approximations w.r.t. the variational distance etc. and higher order approximations.
Out of the long history of the meanwhile classical part of extreme value theory, we have already mentioned the pioneering work of Fisher and Tippett (1928), who provided a complete list of all possible limiting d.f.'s of sample maxima. Gnedenko (1943) found necessary and sufficient conditions for a d.f. to belong to the weak domain of attraction of an extreme value d.f. De Haan (1970) achieved a specification of the auxiliary function in Gnedenko's characterization of F to belong to the domain of attraction of the Gumbel d.f. $G_3$.
The conditions $(1, \alpha)$ and $(2, \alpha)$ in (5.1.24), which are sufficient for a d.f. to belong to the weak domain of attraction of the extreme value d.f.'s $G_{1,\alpha}$ and $G_{2,\alpha}$, are due to von Mises (1936). The corresponding condition (5.1.24)(3) for the Gumbel d.f. $G_3$ was found by de Haan (1970). Another set of "von Mises conditions" is given in (5.1.25) for d.f.'s having two derivatives. For $i = 3$ this condition is due to von Mises (1936). Its extension to the cases $i = 1, 2$ appeared in Pickands (1986).
In conjunction with the strong domain of attraction, the von Mises conditions have gained new interest. The pointwise convergence of the densities of sample maxima under the von Mises condition (5.1.25), $i = 3$, was proved in Pickands (1967) and independently in Reiss (1977a, 1981d). A thorough study of this subject was carried out by de Haan and Resnick (1982), Falk (1985b), and Sweeting (1985).
Sweeting, in his brilliant work, was able to show that the von Mises conditions (5.1.24) are equivalent to the uniform convergence of densities of normalized maxima on finite intervals. We also mention the article of Pickands (1986), where a result closely related to that of Sweeting is proved under certain differentiability conditions imposed on F.
In (5.1.31) the number of exceedances of n i.i.d. random variables over a threshold $u_n$ was studied to establish the limit law of the kth largest order statistic. The key argument was that the number of exceedances is asymptotically a Poisson r.v. This result also holds under weaker conditions. We mention Leadbetter's conditions $D(u_n)$ and $D'(u_n)$ for a stationary sequence (for details see Leadbetter et al. (1983)).
A necessary and sufficient condition (see P.5.5(ii)) for the weak convergence of normalized distributions of intermediate order statistics is due to Chibisov (1964). The possible limiting d.f.'s were characterized by Chibisov (1964) and Wu (1966) (see P.5.5(i)). Theorem 5.1.7, formulated for $G_{3,k}$ instead of $N(0,1)$, is given in Reiss (1981d) under the stronger condition that the von Mises condition (5.1.25), $i = 3$, holds; incidentally, this result was proved via the normal approximation. The weak convergence of intermediate order statistics was extensively dealt with by Cooil (1985, 1988). Cooil proved the asymptotic joint normality of a fixed number of suitably normalized intermediate order statistics under conditions that correspond to those in Theorem 5.1.7. For the treatment of intermediate order statistics under dependence conditions we refer to Watts et al. (1982).
Bounds for the remainder terms of limit laws concerning maxima were
established by various authors. We refer to W.J. Hall and J.A. Wellner (1979),
P. Hall (1979), R.A. Davis (1982), and the book of Galambos (1987) for bounds
with explicit constants.
As pointed out by Fisher and Tippett (1928), extreme value d.f.'s different from the limiting ones (penultimate d.f.'s) may provide a more accurate approximation to d.f.'s of sample maxima. This line of research was taken up by Gomes (1978, 1984) and Cohen (1982a, b). Cohen (1982b), Smith (1982), and Anderson (1984) found conditions that allow the computation of the rate of convergence w.r.t. the Kolmogorov-Smirnov distance. Another notable article pertaining to this is Zolotarev and Rachev (1985), who applied the method of metric distances.


It can easily be deduced from a result of Matsunawa and Ikeda (1976) that the variational distance between the normalized distribution of the k(n)th largest order statistic of n independent, identically (0, 1)-uniformly distributed r.v.'s and the gamma distribution with parameter k(n) tends to zero as $n \to \infty$ if $k(n)/n$ tends to zero as $n \to \infty$. In Reiss (1981d) it was proved that the accuracy of this approximation is $\le Ck/n$ for some universal constant C. This result was taken up by Falk (1986a) to prove an inequality related to (5.2.6) w.r.t. the variational distance. A further improvement was achieved in Reiss (1984): by proving the result in Reiss (1981d) w.r.t. the Hellinger distance and by using an inequality for induced probability measures (compare with Lemma 3.3.13), it was shown that Falk's result still holds if the variational distance is replaced by the Hellinger distance. The present result is a further improvement since the upper bound only depends on the upper tail of the underlying distribution.
The investigation of extremes under densities of the form (5.2.14) was initiated by L. Weiss (1971), who studied the particular case of a neighborhood of Weibull densities. The class of densities defined by (5.2.18) and (5.2.19) corresponds to the class of d.f.'s introduced by Hall (1982a).
It is evident that if the underlying d.f. only slightly deviates from an extreme value d.f. then the rate of convergence of the d.f. of the normalized maximum to the limit d.f. can be of order $o(n^{-1})$. The rate is of exponential order if F has the same upper tail as an extreme value d.f. It was shown by Rootzen (1984) that this is the best order achievable under a d.f. unequal to an extreme value d.f. It would be of interest to explore, in detail, the rates for the second largest order statistic.
For historical reasons we note the explicit form of the interesting expansion in Uzgoren (1954), which could have served as a guide to the mathematical research on expansions in extreme value theory:
$$\log(-\log F^n(b_n + xg(b_n))) = -x + \frac{x^2}{2!}g'(b_n) + \frac{x^3}{3!}[g(b_n)g''(b_n) - 2g'^2(b_n)] + \cdots + \frac{e^{-x+\cdots}}{2n} + \frac{5e^{-2x+\cdots}}{24n^2} + \frac{e^{-3x+\cdots}}{8n^3} + \cdots$$
where $b_n = F^{-1}(1 - 1/n)$ and $g = (1 - F)/f$. The first two terms of the expansion formally agree with those in (5.2.16) in the Gumbel case. However, as reported by T.J. Sweeting (talk at the Oberwolfach meeting on "Extreme Value Theory," 1987), the expansion is not valid as far as the third term is concerned.
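The $n$-dependent terms of the expansion come from an elementary series. Write $p = 1 - F(b_n + xg(b_n))$. Then $\log(-\log F^n) = \log(np) + \log(-\log(1 - p)/p)$, and
$$\log(-\log(1 - p)/p) = p/2 + 5p^2/24 + p^3/8 + O(p^4), \qquad p \to 0,$$
while $np = e^{-x + \cdots}$ by the first part of the expansion.

The sketch below (hypothetical helper name) checks these three coefficients numerically. The remainder divided by $p^4$ settles near $251/2880 \approx 0.087$, which is the next coefficient of the series as we computed it by hand.

```python
import math


def series_remainder(p: float) -> float:
    """log(-log(1 - p)/p) minus the three terms p/2 + 5p^2/24 + p^3/8."""
    exact = math.log(-math.log1p(-p) / p)
    return exact - (p / 2.0 + 5.0 * p ** 2 / 24.0 + p ** 3 / 8.0)


for p in (0.1, 0.05, 0.01):
    r = series_remainder(p)
    print(p, r, r / p ** 4)
```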
Other references pertaining to this are Dronskers (1958), who established an approximate density of the k(n)th largest order statistic, and Haldane and Jayakar (1963), who studied the particular case of extremes of normal r.v.'s. Expansions of length 2 related to that in (5.2.16) are well known in the literature (e.g. Anderson (1971) and Smith (1982)). These expansions were established in a particularly appealing form by Radtke (1988) (see P.5.15). From P.5.15 we see that the rate at which the von Mises condition holds also determines the rate at which the convergence to the limiting extreme value d.f. holds. The available results do not fit into our present program since only expansions of d.f.'s are treated. In spite of the importance of these results, details are given in the Supplements. It is an open problem under which conditions the expansions in P.5.15 lead to higher order approximations that are valid w.r.t. the variational or the Hellinger distance; (5.2.15) and (5.2.16) only provide a particular example. A certain characterization of possible types of expansions of distributions of maxima was given by Goldie and Smith (1987).
Weinstein (1973) and Pantcheva (1985) adopted a nonlinear normalization in order to derive a more accurate approximation of the d.f. of sample maxima by means of the limiting extreme value d.f. From our point of view, a systematic treatment of this approach would be the following: first, find an expansion of finite length; second, construct a nonlinear normalization by using the "inverse" of the expansion as was done in Section 4.6 (see also Theorem 6.1.2).
The method to base the statistical inference on the k largest order statistics
may be regarded as Type II censoring. Censoring plays an important role in
applications like reliability and life-testing. This subject is extensively studied
in books by N.R. Mann et al. (1974), A.J. Gross (1975), L.J. Bain (1978),
W. Nelson (1982), and J.F. Lawless (1982).
Upper bounds for the variational distance between the counting processes $N_n[-t, 0)$, $0 \le t \le s$, and $N_{2,1}[-t, 0)$, $0 \le t \le s$, may also be found in Kabanov and Liptser (1983) and Jacod and Shiryaev (1987). The bounds given there are of order $s^2/n$ and $s/n^{1/2}$ and therefore not sharp. Another reference is Karr (1986), who proved an upper bound of order $n^{-1}$ for fixed s.
In Chapter 4 of the book by Resnick (1987), the weak convergence of certain point processes connected to extreme value theory is studied. For this purpose one has to verify that the $\sigma$-field $\mathcal{M}$ on the set of point measures is the Borel $\sigma$-field generated by the topology of vague convergence. The weak convergence of empirical processes can be formulated in such a way that it is equivalent to the condition that the underlying d.f. belongs to the domain of attraction of an extreme value d.f. Note that the "empirical point processes" studied by Resnick (1987, Corollary 4.19) are of the form
$$\sum_{k=1}^{\infty}\varepsilon_{(k/n,\,(\xi_k - d_n)/c_n)},$$
thus allowing a simultaneous treatment of the time scale and the sequence of observations.
From the statistical point of view the weak convergence is not satisfactory. The condition that F belongs to the domain of attraction of an extreme value d.f. is not strong enough to yield, e.g., the existence of a consistent estimator of the tail index (that is, the index $\alpha$ of the domain of attraction). Thus, the weak convergence cannot be of any help either if F satisfies stronger regularity conditions.
We briefly mention a recent article by Deheuvels and Pfeifer (1988), who independently proved a result related to Theorem 5.6.1 by using the coupling method. We do not know whether their method is also applicable to prove the extension, as indicated at the end of Section 5.6, where F belongs to a neighborhood of a generalized Pareto d.f.

CHAPTER 6

Other Important Approximations

In Chapters 4 and 5 we studied approximations to distributions of central and extreme order statistics uniformly over Borel sets.
The approximation over Borel sets is equivalent to the approximation of integrals of bounded measurable functions. In Section 6.1 we shall indicate the extension of such approximations to unbounded functions, thus getting approximations to moments of order statistics.
From approximations of joint distributions of order statistics one can easily deduce limit theorems for certain functions of order statistics. Results of this type will be studied in Section 6.2. We also mention other important results concerning linear combinations of order statistics which, however, have to be proved by means of a different approach.
Sections 6.3 and 6.4 deal with approximations of a completely different type. In Section 6.3 we give an outline of the well-known stochastic approximation of the sample d.f. to the sample q.f., connected with the name of R.R. Bahadur.
Section 6.4 deals with the bootstrap, a resampling method introduced by B. Efron in 1979. We indicate the stochastic behavior of the bootstrap d.f. of the sample q-quantile.

6.1. Approximations of Moments and Quantiles


This section provides approximations to functional parameters of distributions of order statistics by means of the corresponding functional parameters
of the limiting distributions and of finite expansions. We shall only consider
central and intermediate order statistics.


Moments of Central and Intermediate Order Statistics


We shall utilize the result of Section 4.7, which concerns Edgeworth type expansions of densities of central and intermediate order statistics.
Theorem 6.1.1. Let $q \in (0, 1)$ be fixed. Assume that the d.f. $F$ has $m + 1$ bounded derivatives on a neighborhood of $F^{-1}(q)$, and that $f(F^{-1}(q)) > 0$ where $f = F'$. Assume that $(r(n)/n - q) = O(n^{-1})$. Put $\sigma^2 = q(1 - q)$. Moreover, assume that
$$E|X_{s:j}| < \infty$$
for some positive integer $j$ and $s \in \{1, \ldots, j\}$.
Then, for every measurable function $h$ with $|h(x)| \le |x|^k$ the following relation holds:
$$\Big|E\,h\Big(\frac{n^{1/2}f(F^{-1}(q))}{\sigma}(X_{r(n):n} - F^{-1}(q))\Big) - \int h\,dG_{r(n),n}\Big| = O(n^{-m/2}) \tag{6.1.1}$$
where
$$G_{r(n),n} = \Phi + \varphi\sum_{i=1}^{m-1}n^{-i/2}S_{i,n} \tag{6.1.2}$$
and $S_{i,n}$ is a polynomial of degree $\le 3i - 1$ with coefficients uniformly bounded over $n$. In particular, we have
$$S_{1,n}(t) = \Big[\frac{2q - 1}{3\sigma} + \frac{\sigma f'(F^{-1}(q))}{2f(F^{-1}(q))^2}\Big]t^2 + \Big[\frac{-q + nq - r(n) + 1}{\sigma} + \frac{2(2q - 1)}{3\sigma}\Big]. \tag{6.1.3}$$

PROOF. Denote by $f_{r(n),n}$ the density of the normalized distribution of $X_{r(n):n}$ and by $g_{r(n),n}$ the density (that is, the derivative) of $G_{r(n),n}$. Put $B_n = [-\log n, \log n]$. By P.4.5,
$$\Big|\int h(x)f_{r(n),n}(x)\,dx - \int h(x)g_{r(n),n}(x)\,dx\Big| \le \int_{B_n}|h(x)||f_{r(n),n}(x) - g_{r(n),n}(x)|\,dx + \int_{B_n^c}|h(x)|(f_{r(n),n}(x) + |g_{r(n),n}(x)|)\,dx$$
$$= O\Big(n^{-m/2}\int_{B_n}|x|^k\varphi(x)(1 + |x|^{3m})\,dx + \int_{B_n^c}|x|^k(f_{r(n),n}(x) + |g_{r(n),n}(x)|)\,dx\Big). \tag{1}$$
It remains to prove an upper bound for the second term on the right-hand side of (1). Straightforward calculations yield
$$\int_{B_n^c}|x|^k|g_{r(n),n}(x)|\,dx = O(n^{-m/2}).$$
The decisive step is to prove that
$$\alpha_n := \int_{B_n^c}|x|^kf_{r(n),n}(x)\,dx = O(n^{-m/2}).$$
Apparently,
$$\alpha_n = \int_{\{X_{r(n):n} < F^{-1}(q) - t_n\}}|X_{r(n):n} - F^{-1}(q)|^k\,dP + \int_{\{X_{r(n):n} > F^{-1}(q) + t_n\}}|X_{r(n):n} - F^{-1}(q)|^k\,dP =: \alpha_{n,1} + \alpha_{n,2}$$
where $t_n = (\log n)\sigma/[n^{1/2}f(F^{-1}(q))]$. Applying Lemma 3.1.4 and Corollary 1.2.7 we get
$$\alpha_{n,1} = O\Big(P\{X_{r(n):n} < F^{-1}(q) - t_n\} + \int_{\{X_{r(n):n} < F^{-1}(q) - t_n\}}|X_{r(n):n}|^k\,dP\Big)$$
$$= O\Big(P\{U_{r(n):n} \le F(F^{-1}(q) - t_n)\} + \frac{b(r(n) - ks,\; n - (j + 1)k - r(n) + ks + 1)}{b(r(n),\; n - r(n) + 1)}\,P\{U_{r(n)-ks:n-(j+1)k} \le F(F^{-1}(q) - t_n)\}\Big)$$
where $b$ denotes the beta function. Applying Lemma 3.1.1 one obtains that $\alpha_{n,1} = O(n^{-m/2})$. We may also prove $\alpha_{n,2} = O(n^{-m/2})$, which completes the proof. □

As a special case of (6.1.1) we obtain
$$E|F_n^{-1}(q) - F^{-1}(q)|^k = \frac{(q(1 - q))^{k/2}}{n^{k/2}f(F^{-1}(q))^k}\int|x|^k\,d\Phi(x) + O(n^{-(k+1)/2}). \tag{6.1.4}$$
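For uniform order statistics, the case $k = 2$ of (6.1.4) can be checked with exact formulas. $U_{r:n}$ has a beta distribution with mean $r/(n+1)$ and variance $r(n - r + 1)/((n+1)^2(n+2))$. Here $F^{-1}(q) = q$ and $f \equiv 1$.

We assume the convention $F_n^{-1}(q) = U_{r:n}$ with $r = \lceil nq \rceil$, so that $r/n - q = O(n^{-1})$. The helper also evaluates $\int|x|^k\,d\Phi(x) = 2^{k/2}\Gamma((k+1)/2)/\pi^{1/2}$, which equals 1 for $k = 2$. In this example the error is in fact of order $n^{-2}$, smaller than the bound $O(n^{-3/2})$ in (6.1.4).

```python
import math


def exact_second_moment(n: int, q: float) -> float:
    """E(U_{r:n} - q)^2 for uniform order statistics, r = ceil(nq) (assumed quantile convention)."""
    r = math.ceil(n * q)
    mean = r / (n + 1)
    var = r * (n - r + 1) / ((n + 1) ** 2 * (n + 2))
    return var + (mean - q) ** 2


def abs_normal_moment(k: float) -> float:
    """int |x|^k dPhi(x) = 2^(k/2) Gamma((k+1)/2) / sqrt(pi)."""
    return 2.0 ** (k / 2.0) * math.gamma((k + 1.0) / 2.0) / math.sqrt(math.pi)


q = 0.25
for n in (100, 1000, 10_000):
    leading = q * (1.0 - q) / n * abs_normal_moment(2)   # leading term of (6.1.4), k = 2, f = 1
    error = exact_second_moment(n, q) - leading
    print(n, exact_second_moment(n, q), leading, n ** 2 * error)
```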

Expansions of Quantiles of Distributions of Order Statistics


Recall the result of Section 4.6, where we obtained an expansion concerning the "inverse" of an Edgeworth expansion. A corresponding result holds for expansions of d.f.'s of order statistics.
Theorem 6.1.2. Let $q \in (0, 1)$ be fixed. Suppose that the d.f. $F$ has $m + 1$ derivatives on a neighborhood of $F^{-1}(q)$, and that $f(F^{-1}(q)) > 0$ where $f = F'$. Suppose that $(r(n)/n - q) = O(n^{-1})$. Then there exist polynomials $R_{i,n}$, $i = 1, \ldots, m - 1$, such that uniformly over $|x| \le \log n$,
$$P\Big\{X_{r(n):n} \le F^{-1}(q) + \Big(x + \sum_{i=1}^{m-1}n^{-i/2}R_{i,n}(x)\Big)\frac{(q(1 - q))^{1/2}}{n^{1/2}f(F^{-1}(q))}\Big\} = \Phi(x) + O(n^{-m/2}). \tag{6.1.5}$$
With $S_{i,n}$ denoting the polynomials in (6.1.2) we have
$$R_{1,n} = -S_{1,n} \tag{6.1.6}$$
and
$$R_{2,n}(x) = S_{1,n}(x)S_{1,n}'(x) - S_{2,n}(x) - \frac{x}{2}S_{1,n}^2(x).$$
PROOF. Apply P.4.5 and use the arguments of Section 4.6.
(6.1.5), applied to $x = \Phi^{-1}(\alpha)$, yields
$$P\Big\{X_{r(n):n} \le F^{-1}(q) + \Big(\Phi^{-1}(\alpha) + \sum_{i=1}^{m-1}n^{-i/2}R_{i,n}(\Phi^{-1}(\alpha))\Big)\frac{(q(1 - q))^{1/2}}{n^{1/2}f(F^{-1}(q))}\Big\} = \alpha + O(n^{-m/2}). \tag{6.1.7}$$
This result may be adopted to justify a formal expansion given by F.N. David and N.L. Johnson (1954) in the case of sample medians (see P.6.2).

6.2. Functions of Order Statistics


With regard to functions of order statistics a predominant role is played by
linear combinations of order statistics. A comprehensive presentation of this
subject is given in the book by Helmers (1982) so that it suffices to make some
introductory remarks. In special cases we are able to prove supplementary
results by using the tools developed in this book.
Chapters 4 and 5 provide approximations of joint distributions of central
and extreme order statistics by means of normal and extreme value distributions. Thus, asymptotic distributions of certain functions of order statistics
can easily be established. In this context, we also refer to Sections 9.5 and 10.4
where we shall study Hill's estimator and a certain $\chi^2$-test.

Asymptotic Normality of a Linear Combination of Uniform R.V.'s
From a certain technical point of view the existing results for linear combinations of order statistics are very satisfactory. However, the question is still open whether one can find a condition which guarantees the asymptotic normality of a linear combination of order statistics and which is related to the Lindeberg condition for sums of independent r.v.'s or martingales.
Such a condition (see (6.2.4)) was found by Hecker (1976) in the special case of order statistics of i.i.d. uniform r.v.'s. This theorem is a simple application of the central limit theorem.
Theorem 6.2.1. Given a triangular array of constants $a_{i,n}$, $i = 1, \ldots, n$, define
$$b_{j,n} = \sum_{i=j}^{n}a_{i,n} - \sum_{i=1}^{n}\frac{i}{n + 1}a_{i,n}, \qquad j = 1, \ldots, n + 1, \tag{6.2.1}$$
and
$$\tau_n^2 = \sum_{j=1}^{n+1}b_{j,n}^2. \tag{6.2.2}$$
Then,
$$P\Big\{\tau_n^{-1}\sum_{i=1}^{n}(n + 1)a_{i,n}\Big(U_{i:n} - \frac{i}{n + 1}\Big) \le t\Big\} \to \Phi(t), \qquad n \to \infty, \tag{6.2.3}$$
for every $t$ if, and only if,
$$\tau_n^{-1}\max_{j=1,\ldots,n+1}|b_{j,n}| \to 0, \qquad n \to \infty. \tag{6.2.4}$$
PROOF. Let $\eta_1, \eta_2, \eta_3, \ldots$ be i.i.d. standard exponential r.v.'s. Put $S_i = \sum_{j=1}^{i}\eta_j$. From Corollary 1.6.9 it is immediate that
$$\sum_{i=1}^{n}(n + 1)a_{i,n}\Big(U_{i:n} - \frac{i}{n + 1}\Big) \stackrel{d}{=} \sum_{i=1}^{n}a_{i,n}[S_i - iS_{n+1}/(n + 1)]\big/[S_{n+1}/(n + 1)]. \tag{1}$$
Check that
$$\sum_{i=1}^{n}a_{i,n}[S_i - iS_{n+1}/(n + 1)] = \sum_{j=1}^{n+1}b_{j,n}\eta_j. \tag{2}$$
From (2) and the fact that $E\eta_j = 1$ it is clear that
$$E\sum_{j=1}^{n+1}b_{j,n}\eta_j = \sum_{j=1}^{n+1}b_{j,n} = 0. \tag{3}$$
Consequently, since $\operatorname{Var}\eta_j = 1$, $\tau_n^2$ is the variance of $\sum_{j=1}^{n+1}b_{j,n}\eta_j$. Moreover, since $S_{n+1}/(n + 1) \to 1$ in probability as $n \to \infty$, we deduce from (1)-(3) that (6.2.3) holds if, and only if,
$$P\Big\{\tau_n^{-1}\sum_{j=1}^{n+1}b_{j,n}\eta_j \le t\Big\} \to \Phi(t), \qquad n \to \infty. \tag{4}$$
The equivalence of (4) and (6.2.4) is a particular case of the Lindeberg-Lévy-Feller theorem as proved in Chernoff et al. (1967), Lemma 1. □


EXAMPLE 6.2.2. If $a_{r,n} = 1$ and $a_{i,n} = 0$ for $i \ne r$ (that is, we consider the order statistic $U_{r:n}$), then $\tau_n^2 = r(n - r + 1)/(n + 1)$ in (6.2.2). Furthermore, with $r(n)$ in place of $r$, (6.2.4) is equivalent to $r(n) \to \infty$ and $n - r(n) \to \infty$ as $n \to \infty$.
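The quantities in (6.2.1), (6.2.2) and (6.2.4) are straightforward to compute. The sketch below (our own helper names) builds $b_{j,n}$ with a running suffix sum and confirms $\tau_n^2 = r(n - r + 1)/(n + 1)$ for a single order statistic. It then evaluates the ratio $\tau_n^{-1}\max_j|b_{j,n}|$ in three cases:

- a central order statistic,
- an extreme one ($r = 1$),
- the weights $a_{i,n} = 1/n$ of the sample mean.

```python
import math


def b_coefficients(a: list[float]) -> list[float]:
    """b_{j,n}, j = 1, ..., n+1, as in (6.2.1)."""
    n = len(a)
    centre = sum((i + 1) * a_i for i, a_i in enumerate(a)) / (n + 1)
    b, suffix = [], sum(a)            # suffix holds sum_{i >= j} a_{i,n}
    for j in range(n + 1):
        b.append(suffix - centre)
        if j < n:
            suffix -= a[j]
    return b


def lindeberg_ratio(a: list[float]) -> tuple[float, float]:
    """Return (tau_n^{-1} max_j |b_{j,n}|, tau_n^2), cf. (6.2.2) and (6.2.4)."""
    b = b_coefficients(a)
    tau2 = sum(x * x for x in b)
    return max(abs(x) for x in b) / math.sqrt(tau2), tau2


def single_order_statistic(n: int, r: int) -> list[float]:
    """Weights a_{r,n} = 1 and a_{i,n} = 0 otherwise (Example 6.2.2)."""
    a = [0.0] * n
    a[r - 1] = 1.0
    return a


n = 10_000
central, _ = lindeberg_ratio(single_order_statistic(n, n // 2))
extreme, _ = lindeberg_ratio(single_order_statistic(n, 1))
mean_weights, _ = lindeberg_ratio([1.0 / n] * n)
_, tau2 = lindeberg_ratio(single_order_statistic(50, 20))
print(central, extreme, mean_weights, tau2, 20 * 31 / 51)
```

The ratio is small in the central and sample-mean cases but stays near 1 for $r = 1$, in line with the requirement that both $r(n)$ and $n - r(n)$ tend to infinity.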

As an immediate consequence of Theorem 6.2.1 and Theorem 4.3.1 we obtain the following result of preliminary character. Assume that the density $f$ is strictly larger than zero and has three derivatives on the interval $(F^{-1}(q) - a, F^{-1}(q) + a)$ for some $q \in (0, 1)$. Define $I_n = \{r(n) + 4i : i = 1, \ldots, k(n)\}$ where $r(n)/n \to q$ and $k(n)/n \to 0$ as $n \to \infty$. Assume that $a_{i,n} = 0$ for $i \notin I_n$. Then, with $\tau_n$ as in (6.2.2), the convergence to $\Phi(t)$ in (6.2.5) holds, as $n \to \infty$, for every $t$ if, and only if, (6.2.4) holds.
Of course, this result is very artificial. It would be interesting to know whether the index set $I_n$ can be replaced by $\{r(n) + i : i = 1, \ldots, k(n)\}$ etc. It is left to the reader to formulate other theorems of this type by using Theorem 4.3.1 or Theorem 4.5.3.

The Trimmed Mean


The trimmed mean $\sum_{i=r}^{s}X_{i:n}$ is another exceptional case of a linear combination of order statistics, which can easily be treated by conditioning on the order statistics $X_{r:n}$ and $X_{s:n}$.
Denote by $Y_{i:s-r-1}$ the $i$th order statistic of $s - r - 1$ r.v.'s with common d.f. $F_{x,y}$ [the truncation of $F$ on the left of $x$ and on the right of $y$]. Moreover, denote by $Q_{r,s,n}$ the joint distribution of $X_{r:n}$ and $X_{s:n}$. Then, according to Example 1.8.3(iii),
$$P\Big\{\sum_{i=r}^{s}X_{i:n} \le t\Big\} = \int P\Big\{x + \sum_{i=1}^{s-r-1}Y_{i:s-r-1} + y \le t\Big\}\,dQ_{r,s,n}(x, y) = \int P\Big\{\sum_{i=1}^{s-r-1}Y_i \le t - (x + y)\Big\}\,dQ_{r,s,n}(x, y)$$
where $Y_1, \ldots, Y_{s-r-1}$ are i.i.d. random variables with common d.f. $F_{x,y}$. Now we are able to apply the classical results for sums of i.i.d. random variables to the integrand. Moreover, Section 4.5 provides a normal approximation to $Q_{r,s,n}$. For more details we refer again to Helmers (1982).

Systematic Statistics
The notion of systematic statistics goes back to Mosteller (1946). We mention this expression for historical reasons only, because nowadays one would speak of a linear combination of order statistics when treating this type of statistics.
Based on the asymptotic normality of a fixed number of sample q-quantiles one can easily verify the asymptotic normality of a linear combination of these order statistics. Given a location and scale parameter family of distributions one can, e.g., try to find the optimum estimator based on k order statistics. Below we shall only touch on the simplest case, namely, that of k = 2.
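Lemma 6.2.3. (i) Let $0 < q_0 < 1$. Assume that the d.f. $F$ has two bounded derivatives on a neighborhood of $F^{-1}(q_0)$ and that $f_0 := F'(F^{-1}(q_0)) > 0$. Then,
$$\sup_t\Big|P\Big\{\frac{n^{1/2}}{\sigma_0}(X_{[nq_0]:n} - F^{-1}(q_0)) \le t\Big\} - \Phi(t)\Big| = O(n^{-1/2})$$
where $\sigma_0^2 = q_0(1 - q_0)/f_0^2$.
(ii) Let $0 < q_1 < q_2 < 1$. Assume that the d.f. $F$ has two bounded derivatives on a neighborhood of $F^{-1}(q_i)$ and that $f_i := F'(F^{-1}(q_i)) > 0$ for $i = 1, 2$. Then,
$$\sup_t\Big|P\Big\{\frac{n^{1/2}}{\sigma_2}(X_{[nq_2]:n} - X_{[nq_1]:n} - (F^{-1}(q_2) - F^{-1}(q_1))) \le t\Big\} - \Phi(t)\Big| = O(n^{-1/2})$$
where
$$\sigma_2^2 = \frac{q_1(1 - q_1)}{f_1^2} - \frac{2q_1(1 - q_2)}{f_1f_2} + \frac{q_2(1 - q_2)}{f_2^2}.$$
PROOF. Immediate from Theorem 4.5.3 by routine calculations.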

Sample quantiles and spacings (that is, differences of order statistics) provide quick estimators of the location and scale parameter. Recall that a d.f.
$$F_{\mu,\sigma}(x) := F((x - \mu)/\sigma)$$
has the q.f.
$$F_{\mu,\sigma}^{-1}(q) = \mu + \sigma F^{-1}(q).$$
Under the conditions of Lemma 6.2.3 we obtain, for the sample quantiles $X_{[nq_i]:n}$ of $n$ i.i.d. random variables with common d.f. $F_{\mu,\sigma}$ and with $\sigma_i = \sigma_i(F)$ as in Lemma 6.2.3,
$$\sup_t\Big|P\Big\{\frac{n^{1/2}}{\sigma\sigma_0}(X_{[nq_0]:n} - \mu) \le t\Big\} - \Phi(t)\Big| = O(n^{-1/2}) \tag{6.2.6}$$
if w.l.g. $F^{-1}(q_0) = 0$; moreover,
$$\sup_t\Big|P\Big\{\frac{n^{1/2}(F^{-1}(q_2) - F^{-1}(q_1))}{\sigma\sigma_2}(\sigma_n - \sigma) \le t\Big\} - \Phi(t)\Big| = O(n^{-1/2}) \tag{6.2.7}$$
where the estimator $\sigma_n$ is given by
$$\sigma_n = \frac{X_{[nq_2]:n} - X_{[nq_1]:n}}{F^{-1}(q_2) - F^{-1}(q_1)}.$$
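The following is a small simulation sketch of the quick estimators for a normal location and scale family, with $F = \Phi$, $q_0 = 1/2$ (so that $F^{-1}(q_0) = 0$), $q_1 = 1/4$ and $q_2 = 3/4$. It also evaluates $\sigma_0$ and $\sigma_2$ from Lemma 6.2.3 with $f_i = \varphi(\Phi^{-1}(q_i))$; for the median, $\sigma_0 = (\pi/2)^{1/2}$. The seed, sample size and parameter values are arbitrary choices for illustration, and $X_{[nq]:n}$ is taken to be the $[nq]$th smallest observation.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()


def order_stat(sorted_xs: list[float], q: float) -> float:
    """X_{[nq]:n}: the [nq]-th smallest value (1-based), [.] = integer part."""
    r = max(1, math.floor(len(sorted_xs) * q))
    return sorted_xs[r - 1]


def sigma0(q0: float) -> float:
    """sigma_0 from Lemma 6.2.3(i) for F = Phi."""
    f0 = PHI.pdf(PHI.inv_cdf(q0))
    return math.sqrt(q0 * (1.0 - q0)) / f0


def sigma2(q1: float, q2: float) -> float:
    """sigma_2 from Lemma 6.2.3(ii) for F = Phi."""
    f1, f2 = PHI.pdf(PHI.inv_cdf(q1)), PHI.pdf(PHI.inv_cdf(q2))
    var = q1 * (1 - q1) / f1 ** 2 - 2 * q1 * (1 - q2) / (f1 * f2) + q2 * (1 - q2) / f2 ** 2
    return math.sqrt(var)


Q1, Q2 = 0.25, 0.75
SPREAD = PHI.inv_cdf(Q2) - PHI.inv_cdf(Q1)          # F^{-1}(q2) - F^{-1}(q1)


def quick_estimates(xs: list[float]) -> tuple[float, float]:
    s = sorted(xs)
    mu_hat = order_stat(s, 0.5)                                   # uses F^{-1}(1/2) = 0
    sigma_hat = (order_stat(s, Q2) - order_stat(s, Q1)) / SPREAD  # spacing-based scale estimator
    return mu_hat, sigma_hat


rng = random.Random(2024)
mu, sigma, n = 1.0, 2.0, 20_000
sample = [rng.gauss(mu, sigma) for _ in range(n)]
mu_hat, sigma_hat = quick_estimates(sample)
se_mu = sigma * sigma0(0.5) / math.sqrt(n)                  # approximate standard errors
se_sigma = sigma * sigma2(Q1, Q2) / (math.sqrt(n) * SPREAD)  # from (6.2.6) and (6.2.7)
print(mu_hat, sigma_hat, se_mu, se_sigma)
```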

An Expansion of Length Two for the Convex Combination of Consecutive Order Statistics
Let $X_{r:n}$ be the $r$th order statistic of $n$ i.i.d. random variables with common continuous d.f. $F$. We shall study statistics of the form
$$(1 - \gamma)X_{r:n} + \gamma X_{r+1:n}, \qquad \gamma \in [0, 1],$$
which may be used as estimators of the q-quantile. The most important case is the sample median for even sample sizes.
It is apparent that this statistic has the same asymptotic behavior for every $\gamma \in [0, 1]$ as far as the first order performance is concerned. The different performance of the statistics for varying $\gamma$ can be detected if the second order term is studied. For this purpose we shall establish an expansion of length 2.
Denote by $F_{r,n}$ the d.f. of $c^{-1}(X_{r:n} - d)$. From Corollary 1.8.5 it is immediate that for $\gamma$ and $t$,
$$P\{(1 - \gamma)X_{r:n} + \gamma X_{r+1:n} \le d + ct\} = F_{r,n}(t) - \int_{-\infty}^{t}P\{\gamma[Y_{1:n-r} - (d + cx)] > c(t - x)\}\,dF_{r,n}(x) \tag{6.2.8}$$
where $Y_{1:n-r}$ is the sample minimum of $n - r$ i.i.d. random variables with common d.f. $F_{d+cx}$ [the truncation of $F$ on the left of $d + cx$].
Let $G_{r,n}$ be an approximation to $F_{r,n}$ such that
$$\sup_B\Big|P\{c^{-1}(X_{r:n} - d) \in B\} - \int_B dG_{r,n}\Big| = O(n^{-1}). \tag{6.2.9}$$
From Corollary 1.2.7 and Theorem 5.4.3 we get, uniformly in $t$ and $x$,
$$P\{\gamma[Y_{1:n-r} - (d + cx)] > c(t - x)\} = P\{(n - r)U_{1:n-r} > (n - r)F_{d+cx}[d + cx + c(t - x)/\gamma]\}$$
$$= \exp[-(n - r)F_{d+cx}[d + cx + c(t - x)/\gamma]] + O(n^{-1}). \tag{6.2.10}$$
Combining (6.2.8)-(6.2.10) and applying P.3.5 we get
$$\sup_t\Big|P\{(1 - \gamma)X_{r:n} + \gamma X_{r+1:n} \le d + ct\} - \Big[G_{r,n}(t) - \int_{-\infty}^{t}\exp[-(n - r)F_{d+cx}[d + cx + c(t - x)/\gamma]]\,dG_{r,n}(x)\Big]\Big| = O(n^{-1}). \tag{6.2.11}$$
Notice that if $\gamma = 0$ then, in view of (6.2.10), the integral in (6.2.11) can be replaced by zero.
Specifying normalizing constants and an expansion $G_{r,n}$ of $F_{r,n}$, we obtain the following theorem.
Theorem 6.2.4. Let $q \in (0, 1)$ be fixed. Assume that $F$ has three bounded derivatives on a neighborhood of $F^{-1}(q)$ and that $f(F^{-1}(q)) > 0$ where $f = F'$. Moreover, assume that $(r(n)/n - q) = o(n^{-1/2})$. Put $\sigma^2 = q(1 - q)$. Then, uniformly in $\gamma \in [0, 1]$,
$$\sup_t\Big|P\Big\{\frac{n^{1/2}f(F^{-1}(q))}{\sigma}[(1 - \gamma)X_{r(n):n} + \gamma X_{r(n)+1:n} - F^{-1}(q)] \le t\Big\} - (\Phi(t) + n^{-1/2}\varphi(t)R_n(t))\Big| = o(n^{-1/2})$$
where
$$R_n(t) = -\Big[\frac{1 - 2q}{3\sigma} - \frac{\sigma f'(F^{-1}(q))}{2f(F^{-1}(q))^2}\Big]t^2 - \Big[\frac{q - nq + r(n) + \gamma - 1}{\sigma} + \frac{2(1 - 2q)}{3\sigma}\Big].$$

PROOF. The basic formula (6.2.11) will be applied to d = F-1(q) and e =


(J/(n 1/zf(d)). In view of P.4.5 which supplies us with an expansion Gr.n =
<I> + n1/Z<pSr,n of Fr,n it suffices to prove that

roo exp[ -(n -

r)Fd+cx[d

+ ex + e(t -

x)/yJJ <p(x) dx

(1)

and

roo exp[ -(n -

r)Fd+cx[d

+ ex + e(t -

x)/yJJ I(<pSr,n)'(x) I dx = o(nO)

(2)

uniformly in y and t. The proof of (1) will be carried out in detail. Similar
arguments lead to (2).
Since $\Phi(-(\log n)/2) = O(n^{-1})$, it is obvious that $\int_{-\infty}^{t}$ can be replaced by $\int_{-\log n}^{t}$ where $t \ge -(\log n)/2$. Then, the integrand is of order $O(n^{-1})$ for those $x$ with $c(t - x)/\gamma > s(\log n)/n$ for some sufficiently large $s > 0$. Thus, $\int_{-\log n}^{t}$ can be replaced by $\int_{u(n)}^{t}$ where $u(n) = \max(-\log n,\ t - \gamma s(\log n)/(cn))$.
Under the condition that $F$ has three bounded derivatives it is not difficult to check that for $u(n) \le x \le t$,
$$F_{d+cx}[d + cx + c(t-x)/\gamma] = \frac{f(d)c(t-x)}{(1-q)\gamma} + O\big[c|x|\,(c(t-x)/\gamma) + (c(t-x)/\gamma)^2\big]. \tag{3}$$

6.2. Functions of Order Statistics

215

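Thus, (1) has to be verified with the left-hand side replaced by the term
$$\int_{u(n)}^{t} \exp\Big[-(n-r)n^{-1/2}\frac{\sigma(t-x)}{(1-q)\gamma}\Big]\varphi(x)\,dx \tag{4}$$
which, by substituting $y = n^{1/2}\sigma(t - x)/\gamma$, can easily be verified to be equal to
$$n^{-1/2}(\gamma/\sigma)\int_0^{v(n)} \exp\Big[-\frac{1 - r/n}{1-q}\,y\Big]\varphi(t - n^{-1/2}\gamma y/\sigma)\,dy \tag{5}$$
where $v(n) = n^{1/2}\sigma(t - u(n))/\gamma \le s f(d)\log n$. Since
$$\exp\Big[-\frac{1 - r/n}{1-q}\,y\Big] = \exp(-y)[1 + o(n^0)] \quad\text{and}\quad \varphi(t - n^{-1/2}\gamma y/\sigma) = \varphi(t)[1 + o(n^0)] \tag{6}$$
we obtain that the term in (5) is equal to
$$n^{-1/2}(\gamma/\sigma)\varphi(t)\int_0^{v(n)} \exp(-y)\,dy\,(1 + o(n^0)). \tag{7}$$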

Notice that the relations above hold uniformly in $\gamma$ and $t$. Now (1) is immediate. □
Notice that for $\gamma = 0$ we again get the expansion of length two of the normalized d.f. of $X_{r(n):n}$ as given in P.4.5. Moreover, for $\gamma = 0$ and for $r(n)$ replaced by $r(n) + 1$ we get the same expansion as for $\gamma = 1$ and $r(n)$.
If $q = 1/2$, $f'(F^{-1}(1/2)) = 0$, $n = 2m$, and $r = m$ then
$$P\Big\{2(2m)^{1/2}f(F^{-1}(1/2))\Big[\frac{X_{m:2m} + X_{m+1:2m}}{2} - F^{-1}(1/2)\Big] \le t\Big\} = \Phi(t) + o(n^{-1/2}). \tag{6.2.12}$$
Thus, the sample median for even sample sizes is asymptotically normal with a remainder term of order $o(n^{-1/2})$. For odd sample sizes the corresponding result was proved in Section 4.2.
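The expansion of Theorem 6.2.4 is easy to evaluate numerically. The sketch below codes $R_n(t)$ as stated in the theorem and, for $(0,1)$-uniform data (where $F^{-1}(q) = q$, $f \equiv 1$ and $f' \equiv 0$), compares $\Phi(t) + n^{-1/2}\varphi(t)R_n(t)$ with a Monte Carlo estimate of the probability on the left-hand side. The function names, the simulation size and the seed are illustrative choices, not part of the text.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def r_n(t, n, r, q, gamma, f0=1.0, f1=0.0):
    """R_n(t) of Theorem 6.2.4; f0 = f(F^{-1}(q)) > 0 and f1 = f'(F^{-1}(q))."""
    sigma = math.sqrt(q * (1.0 - q))
    quad = (1.0 - 2.0 * q) / (3.0 * sigma) - sigma * f1 / (2.0 * f0 ** 2)
    const = (q - n * q + r + gamma - 1.0) / sigma + 2.0 * (1.0 - 2.0 * q) / (3.0 * sigma)
    return -quad * t ** 2 - const


def expansion(t, n, r, q, gamma, f0=1.0, f1=0.0):
    """Expansion of length two: Phi(t) + n^{-1/2} phi(t) R_n(t)."""
    return STD_NORMAL.cdf(t) + STD_NORMAL.pdf(t) * r_n(t, n, r, q, gamma, f0, f1) / math.sqrt(n)


def simulate_uniform(t, n, r, q, gamma, reps=20000, seed=1):
    """Monte Carlo estimate of P{n^{1/2}[(1-g)U_{r:n} + g U_{r+1:n} - q]/sigma <= t} for U(0,1) data."""
    rng = random.Random(seed)
    sigma = math.sqrt(q * (1.0 - q))
    hits = 0
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))
        stat = (1.0 - gamma) * u[r - 1] + gamma * u[r]  # X_{r:n} is u[r-1] for 1-based r
        if math.sqrt(n) * (stat - q) / sigma <= t:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    n, q, gamma, r = 50, 0.3, 0.5, 15  # here r(n) = nq
    for t in (-1.0, 0.0, 1.0):
        print(t, round(expansion(t, n, r, q, gamma), 4), simulate_uniform(t, n, r, q, gamma))
```

For the even-sample median ($q = 1/2$, $n = 2m$, $r = m$, $\gamma = 1/2$, $f' = 0$) the function `r_n` returns zero, in line with (6.2.12).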
Remark 6.2.5. Let $q_0 \in (0, 1)$. Assume that $F$ has three bounded derivatives on a neighborhood of $F^{-1}(q_0)$ and that $f(F^{-1}(q_0)) > 0$. Then a short examination of the proof of Theorem 6.2.4 reveals that the assertion holds uniformly over all $q$ in a sufficiently small neighborhood of $q_0$ and $r(n) \equiv r(q, n)$ such that $\sup_q |r(q, n)/n - q| = o(n^{-1/2})$. This yields the version of Theorem 6.2.4 as cited in Pfanzagl (1985).

The Now Classical Theory of Linear Combinations of Order Statistics
The central idea of the classical approach is to use weight functions to represent a linear function of order statistics in an elegant way.

Linear combinations of order statistics of the form
$$T_n = n^{-1}\sum_{i=1}^{n} J\Big(\frac{i}{n+1}\Big)X_{i:n} \tag{6.2.13}$$
are estimators of the functional
$$\mu(F) = \int_0^1 J(s)F^{-1}(s)\,ds. \tag{6.2.14}$$
Notice that, according to (1.2.13) and (1.2.14),
$$\mu(F) = \int x\,J(F(x))\,dF(x) \tag{6.2.15}$$
for continuous d.f.'s $F$.
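As a small illustration of (6.2.13) and (6.2.14), the following sketch computes $T_n$ for a given weight function $J$ and approximates $\mu(F)$ by a midpoint rule. The function names and the example weight functions (the constant weight, which gives the sample mean, and a trimmed-mean weight) are illustrative choices.

```python
from typing import Callable, Sequence


def l_statistic(sample: Sequence[float], J: Callable[[float], float]) -> float:
    """T_n = n^{-1} * sum_{i=1}^n J(i/(n+1)) X_{i:n}, cf. (6.2.13)."""
    xs = sorted(sample)
    n = len(xs)
    return sum(J(i / (n + 1)) * x for i, x in enumerate(xs, start=1)) / n


def mu_functional(J: Callable[[float], float], F_inv: Callable[[float], float], k: int = 10000) -> float:
    """Midpoint-rule approximation of mu(F) = int_0^1 J(s) F^{-1}(s) ds, cf. (6.2.14)."""
    h = 1.0 / k
    return h * sum(J((j + 0.5) * h) * F_inv((j + 0.5) * h) for j in range(k))


def trimmed_weight(a: float) -> Callable[[float], float]:
    """J(s) = 1/(1-2a) on (a, 1-a) and 0 elsewhere (alpha-trimmed-mean weight)."""
    return lambda s: 1.0 / (1.0 - 2.0 * a) if a < s < 1.0 - a else 0.0


if __name__ == "__main__":
    data = [2.3, -0.4, 1.1, 0.7, 5.9, 0.2, -1.3, 0.9]
    print(l_statistic(data, lambda s: 1.0))           # the sample mean
    print(l_statistic(data, trimmed_weight(0.2)))     # L-statistic with trimmed-mean weights
    print(mu_functional(lambda s: 1.0, lambda s: s))  # mean of the uniform distribution on (0, 1)
```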


The following theorem is due to Helmers (1981). The proof of Theorem 6.2.6 (see also Helmers (1982, Theorem 3.1.2)) is based on the calculus of characteristic functions.
Theorem 6.2.6. Suppose that $E|\xi_1|^3 < \infty$ and
$$\sigma^2(F) := \iint J(F(x))J(F(y))\big(\min(F(x), F(y)) - F(x)F(y)\big)\,dx\,dy > 0.$$
Moreover, let the weight function $J$ satisfy a Lipschitz condition of order 1 on $(0, 1)$. Then,
$$\sup_t \big| P\{n^{1/2}(T_n - \mu(F))/\sigma(F) \le t\} - \Phi(t) \big| = O(n^{-1/2}).$$
The smoothness condition imposed on $J$ can be weakened by imposing appropriate smoothness conditions on $F$.
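The variance functional $\sigma^2(F)$ in Theorem 6.2.6 can be checked numerically. The sketch below approximates the double integral by a midpoint rule on a square $[a,b]^2$ that is assumed to carry (essentially) all the mass of $F$; for $J \equiv 1$, $\sigma^2(F)$ reduces to the variance of $F$, which gives a simple check ($1/12$ for the uniform distribution on $(0,1)$). Grid size and truncation are illustrative assumptions.

```python
from typing import Callable


def sigma2(J: Callable[[float], float], F: Callable[[float], float],
           a: float, b: float, k: int = 400) -> float:
    """Midpoint-rule approximation of the double integral defining sigma^2(F) over [a, b]^2."""
    h = (b - a) / k
    pts = [a + (j + 0.5) * h for j in range(k)]
    Fv = [F(x) for x in pts]
    Jv = [J(u) for u in Fv]
    total = 0.0
    for i in range(k):
        for j in range(k):
            total += Jv[i] * Jv[j] * (min(Fv[i], Fv[j]) - Fv[i] * Fv[j])
    return total * h * h


def uniform_df(x: float) -> float:
    """D.f. of the uniform distribution on (0, 1)."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    print(sigma2(lambda u: 1.0, uniform_df, 0.0, 1.0))  # close to 1/12
```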

6.3. Bahadur Approximation


In Section 1.1 we have seen that the d.f. of an order statistic, and thus that of the sample q.f., can be represented by means of the sample d.f.
It was observed by R.R. Bahadur (1966) that the sample q.f. admits an amazingly accurate stochastic approximation by means of the sample d.f.

Motivation
To get some insight into the nature of this approximation let us consider the special case of i.i.d. $(0, 1)$-uniformly distributed r.v.'s $\eta_1, \eta_2, \dots, \eta_n$. Denote by $G_n$ and $U_{i:n}$ the pertaining sample d.f. and the $i$th order statistics. We already know that the distributions of
$$G_n(r/n) = n^{-1}\sum_{i=1}^{n} 1_{(-\infty, r/n]}(\eta_i)$$
and of $U_{r:n}$ are concentrated about $r/n$. Moreover, relation (1.1.6) shows that pointwise
$$U_{r:n} - \frac{r}{n} \le \varepsilon \quad\text{iff}\quad G_n\Big(\frac{r}{n} + \varepsilon\Big) - \frac{r}{n} \ge 0. \tag{6.3.1}$$

Thus, it is plausible that the distribution of
$$(U_{r:n} - r/n) + (G_n(r/n) - r/n)$$
is more closely concentrated about zero than each of the distributions of $U_{r:n} - r/n$ and $G_n(r/n) - r/n$. Instead of $(U_{r:n} - r/n) + (G_n(r/n) - r/n)$, the so-called Bahadur statistic
$$(G_n^{-1}(q) - q) + (G_n(q) - q), \qquad q \in (0, 1), \tag{6.3.2}$$
may be studied just as well.
Recall that $G_n^{-1}(q) = U_{r(q):n}$ where $r(q) = nq$ if $nq$ is an integer and $r(q) = [nq] + 1$ otherwise.
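The following sketch evaluates the Bahadur statistic (6.3.2) for simulated uniform data, with $G_n^{-1}(q) = U_{r(q):n}$ as just recalled, and compares its maximum over a grid of $q$ values with the maximum of $|G_n^{-1}(q) - q|$ alone. The grid, the sample size and the seed are illustrative assumptions; a grid only gives a lower bound for the supremum over $(0,1)$.

```python
import math
import random
from bisect import bisect_right
from typing import List


def r_index(n: int, q: float) -> int:
    """r(q) = nq if nq is an integer, [nq] + 1 otherwise (tolerant to rounding in n*q)."""
    nq = n * q
    k = round(nq)
    if abs(nq - k) < 1e-9:
        return int(k)
    return math.floor(nq) + 1


def bahadur_statistic(u_sorted: List[float], q: float) -> float:
    """(G_n^{-1}(q) - q) + (G_n(q) - q) for sorted (0,1)-uniform data."""
    n = len(u_sorted)
    quantile = u_sorted[r_index(n, q) - 1]    # G_n^{-1}(q) = U_{r(q):n}
    df_value = bisect_right(u_sorted, q) / n  # G_n(q) = #{i: U_i <= q} / n
    return (quantile - q) + (df_value - q)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10000
    u = sorted(rng.random() for _ in range(n))
    grid = [j / 1000 for j in range(1, 1000)]
    b_max = max(abs(bahadur_statistic(u, q)) for q in grid)
    q_max = max(abs(u[r_index(n, q) - 1] - q) for q in grid)
    print(b_max, q_max, (math.log(n) / n) ** 0.75)
```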
In the general case of order statistics $X_{i:n}$ from $n$ i.i.d. random variables $\xi_i$ with common d.f. $F$ and derivative $f(F^{-1}(q))$, the Bahadur statistic is given by
$$f(F^{-1}(q))(F_n^{-1}(q) - F^{-1}(q)) + (F_n(F^{-1}(q)) - q), \qquad q \in (0, 1), \tag{6.3.3}$$
where $F_n$ and $F_n^{-1}$ are the sample d.f. and sample q.f. based on the r.v.'s $\xi_i$. The connection between (6.3.2) and (6.3.3) becomes obvious by noting that the transformation technique yields
$$f(F^{-1}(q))(F_n^{-1}(q) - F^{-1}(q)) + (F_n(F^{-1}(q)) - q) \stackrel{d}{=} f(F^{-1}(q))\big(F^{-1}(G_n^{-1}(q)) - F^{-1}(q)\big) + \big(G_n(F(F^{-1}(q))) - q\big). \tag{6.3.4}$$
If $F^{-1}(q)$ is a continuity point of $F$ then $F(F^{-1}(q))$ can be replaced by $q$ and, moreover, if $F^{-1}$ has a bounded second derivative then
$$f(F^{-1}(q))\big(F^{-1}(G_n^{-1}(q)) - F^{-1}(q)\big) = (G_n^{-1}(q) - q) + O\big((G_n^{-1}(q) - q)^2\big),$$
and hence results for the Bahadur statistic in the uniform case can easily be extended to continuous d.f.'s $F$.
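For a continuous $F$ the statistic (6.3.3) can be computed in the same way. The sketch assumes, for illustration only, that $F$ is the standard normal d.f., so that $F^{-1}$ and $f$ are available through Python's `statistics.NormalDist`; any other $F$ with known quantile function and density could be substituted.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist
from typing import Sequence


def r_index(n: int, q: float) -> int:
    """r(q) = nq if nq is an integer, [nq] + 1 otherwise."""
    nq = n * q
    k = round(nq)
    return int(k) if abs(nq - k) < 1e-9 else math.floor(nq) + 1


def bahadur_general(sample: Sequence[float], q: float, dist: NormalDist = NormalDist()) -> float:
    """f(F^{-1}(q)) (F_n^{-1}(q) - F^{-1}(q)) + (F_n(F^{-1}(q)) - q), cf. (6.3.3)."""
    xs = sorted(sample)
    n = len(xs)
    xi = dist.inv_cdf(q)                     # F^{-1}(q)
    density = dist.pdf(xi)                   # f(F^{-1}(q))
    sample_quantile = xs[r_index(n, q) - 1]  # F_n^{-1}(q) = X_{r(q):n}
    sample_df = bisect_right(xs, xi) / n     # F_n(F^{-1}(q))
    return density * (sample_quantile - xi) + (sample_df - q)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    print([round(bahadur_general(data, q), 5) for q in (0.1, 0.25, 0.5, 0.75, 0.9)])
```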

Probabilities of Moderate Deviation


Since we are interested in the Bahadur statistic as a technical tool we shall
confine our attention to a result concerning moderate deviations. The upper
bound for the accuracy of the stochastic approximation will be non-uniform
in q.

Theorem 6.3.1. For every $s > 0$ there exists a constant $C(s)$ such that
$$P\big\{|(G_n^{-1}(q) - q) + (G_n(q) - q)| > ((\log n)/n)^{3/4}\,\delta(q, s, n) \text{ for some } q \in (0, 1)\big\} \le C(s)n^{-s}$$
where $\delta(q, s, n) = 7(s + 3)\max\{(q(1 - q))^{1/4}, (7(s + 3)(\log n)/n)^{1/2}\}$.

Before proving Theorem 6.3.1 we make some comments and preparations.
Theorem 6.3.1 is sufficient as a technical tool in statistical applications; however, one should know that sharp results concerning the stochastic behavior of the Bahadur statistic exist in the literature. The following limit theorem is due to Kiefer (1969a): For every $t > 0$,
$$P\Big\{\sup_{q\in(0,1)} |(G_n^{-1}(q) - q) + (G_n(q) - q)| > \frac{(\log n)^{1/2}t}{n^{3/4}}\Big\} \to 2\sum_{m}(-1)^{m+1}e^{-2m^2t^4}$$
as $n \to \infty$ where the summation runs over all positive integers $m$.
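The limit in Kiefer's theorem is straightforward to evaluate. The sketch below sums the alternating series (truncated at a fixed number of terms, an illustrative choice) and also returns the level $(\log n)^{1/2}t/n^{3/4}$ appearing in the event. For orientation only (this remark is added here, not taken from the text): the series is the tail $P\{\sup_s |B_0(s)| > t^2\}$ of the Kolmogorov distribution evaluated at $t^2$, where $B_0$ denotes a Brownian bridge.

```python
import math


def kiefer_limit(t: float, terms: int = 200) -> float:
    """2 * sum_{m>=1} (-1)^{m+1} exp(-2 m^2 t^4), the limit in Kiefer's theorem (t > 0)."""
    if t <= 0:
        raise ValueError("t must be positive")
    return 2.0 * sum((-1) ** (m + 1) * math.exp(-2.0 * m * m * t ** 4) for m in range(1, terms + 1))


def kiefer_threshold(n: int, t: float) -> float:
    """(log n)^{1/2} t / n^{3/4}, the level exceeded by the supremum in Kiefer's theorem."""
    return math.sqrt(math.log(n)) * t / n ** 0.75


if __name__ == "__main__":
    for t in (0.6, 0.8, 1.0, 1.2):
        print(t, round(kiefer_limit(t), 4), kiefer_threshold(10_000, t))
```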


Kiefer's result indicates that Theorem 6.3.1 is sharp insofar as the factor $((\log n)/n)^{3/4}$ cannot be replaced by some term of order $o[((\log n)/n)^{3/4}]$.
To prove Theorem 6.3.1 we shall use a simple result concerning the oscillation of the sample d.f. For this purpose define the sample probability measure $Q_n$ by
$$Q_n(A) = n^{-1}\sum_{i=1}^{n} 1_A(\eta_i)$$
where the $\eta_i$ are i.i.d. random variables with common uniform distribution $Q_0$ on $(0, 1)$. Recall that the Glivenko-Cantelli theorem yields
$$\sup_{I\in\mathcal{I}} |Q_n(I) - Q_0(I)| \to 0, \qquad n \to \infty, \quad \text{w.p. } 1, \tag{6.3.5}$$
where $\mathcal{I}$ is the system of all intervals in $(0, 1)$.
Lemma 6.3.2 will indicate the rate of convergence in (6.3.5); moreover, this result will show that the rate is better for those intervals $I$ for which $\sigma^2(I) = Q_0(I)(1 - Q_0(I))$ is small.

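Lemma 6.3.2. For every $s > 0$ there exists a constant $A(s)$ such that for every $n$:
$$P\Big\{\sup_{I\in\mathcal{I}} \frac{n^{1/2}|Q_n(I) - Q_0(I)|}{\max\{\sigma(I), ((\log n)/n)^{1/2}\}} \ge (s + 3)(\log n)^{1/2}\Big\} \le A(s)n^{-s}.$$
PROOF. Given $\varepsilon, \rho > 0$ we shall prove that
$$K(\varepsilon, \rho) := P\Big\{\sup_{I\in\mathcal{I}} \frac{n^{1/2}|Q_n(I) - Q_0(I)|}{\max\{\sigma(I), \rho n^{-1/2}\}} \ge \varepsilon\Big\} \le 2n^2\exp\Big[-\varepsilon\rho + \frac{3}{4}\rho^2 + \frac{7}{2} + \frac{3}{n}\Big]. \tag{6.3.6}$$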

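Then, an application of (6.3.6) to $\rho = (\log n)^{1/2}$ and $\varepsilon = (s + 3)(\log n)^{1/2}$ yields the assertion.
Put $\mathcal{I}_0 = \{(i/n, j/n]: 0 \le i < j \le n\}$. Straightforward calculations yield
$$K(\varepsilon, \rho) \le P\Big\{\sup_{I\in\mathcal{I}_0} \frac{n^{1/2}|Q_n(I) - Q_0(I)| + 2n^{-1/2}}{\max\{\sigma^2(I) - 2/n - 4/n^2,\ \rho^2/n\}^{1/2}} \ge \varepsilon\Big\} \le \sum_{I\in\mathcal{I}_0} P\{n^{1/2}|Q_n(I) - Q_0(I)| \ge \varepsilon(I)\} \tag{1}$$
where $\varepsilon(I) = \varepsilon\max\{\sigma^2(I) - 2/n - 4/n^2,\ \rho^2/n\}^{1/2} - 2n^{-1/2}$. Let $I \in \mathcal{I}_0$ be fixed. Assume w.l.g. that $\sigma(I) > 0$ and $\varepsilon\rho \ge 7/2$ so that $\varepsilon(I) > 0$. Using the exponential bound (3.1.1) with $t = \sigma(I)/\max\{(\sigma^2(I) - 2/n - 4/n^2)/\rho^2,\ 1/n\}^{1/2}$ we obtain
$$P\{n^{1/2}|Q_n(I) - Q_0(I)| \ge \varepsilon(I)\} \le 2\exp\Big[-\frac{\varepsilon(I)}{\sigma(I)}t + \frac{3}{4}t^2\Big] \le 2\exp\Big[-\varepsilon\rho + \frac{3}{4}\rho^2 + \frac{7}{2} + \frac{3}{n}\Big]. \tag{2}$$
Now, (1) and (2) yield (6.3.6). The proof is complete.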
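For orientation, here is the arithmetic behind the final step, written out under two assumptions read off the proof: the per-interval bound $2\exp[-\varepsilon\rho + \frac34\rho^2 + \frac72 + \frac3n]$ obtained in step (2), and $|\mathcal{I}_0| = n(n+1)/2 \le n^2$. With $\rho = (\log n)^{1/2}$ and $\varepsilon = (s+3)(\log n)^{1/2}$:

$$\begin{aligned} -\varepsilon\rho + \tfrac34\rho^2 &= -(s+3)\log n + \tfrac34\log n = -\big(s + \tfrac94\big)\log n,\\ K(\varepsilon,\rho) &\le n^2\cdot 2\exp\Big[-\big(s + \tfrac94\big)\log n + \tfrac72 + \tfrac3n\Big] \le 2e^{13/2}\,n^{-s-1/4} \le 2e^{13/2}\,n^{-s}. \end{aligned}$$

The side condition $\varepsilon\rho = (s+3)\log n \ge 7/2$ holds for every $s > 0$ as soon as $n \ge 4$, since $3\log 4 \approx 4.16$; the finitely many smaller $n$ can be absorbed into the constant $A(s)$.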

Remark 6.3.3. Lemma 6.3.2 holds for any i.i.d. random variables (with arbitrary common distribution $Q$ in place of $Q_0$). The general case can be reduced to the special case of Lemma 6.3.2 by means of the quantile transformation.
Lemma 6.3.2 together with the Borel-Cantelli lemma yields
$$\limsup_n \sup_{I\in\mathcal{I}_n} \frac{n^{1/2}|Q_n(I) - Q_0(I)|}{\sigma(I)(\log n)^{1/2}} \le 5 \quad \text{w.p. } 1 \tag{6.3.7}$$
where $\mathcal{I}_n = \{I \in \mathcal{I}: \sigma^2(I) = Q_0(I)(1 - Q_0(I)) \ge (\log n)/n\}$. In this context, we mention a result of Stute (1982) who proved a sharp result concerning the almost sure behavior of the oscillation of the sample d.f.:
$$\limsup_n \sup_{I\in\mathcal{I}_n^*} \frac{n^{1/2}|Q_n(I) - Q_0(I)|}{(2Q_0(I)\log a_n^{-1})^{1/2}} = 1 \quad \text{w.p. } 1 \tag{6.3.8}$$
where $\mathcal{I}_n^* = \{I \in \mathcal{I}: I = (a, b],\ \alpha a_n \le Q_0(I) \le \beta a_n\}$ with $0 < \alpha < \beta < \infty$, and $a_n$ has the properties $a_n \downarrow 0$, $na_n \uparrow \infty$, $\log a_n^{-1} = o(na_n)$ and $(\log a_n^{-1})/(\log\log n) \to \infty$ as $n \to \infty$. Note that (6.3.8) shows that the rate in (6.3.7) is sharp.
Theorem 6.3.1 will be an immediate consequence of Lemma 6.3.2 and Lemma 3.1.5, which concerns the maximum deviation of the sample q.f. $G_n^{-1}$ from the $(0, 1)$-uniform q.f.
PROOF OF THEOREM 6.3.1. Since $|G_n(G_n^{-1}(q)) - q| \le 1/n$ we obtain
$$\begin{aligned} |G_n^{-1}(q) - q + (G_n(q) - q)| &\le |G_n^{-1}(q) - G_n(G_n^{-1}(q)) + (G_n(q) - q)| + 1/n \\ &\le \sup_{|x - q| \le K} |x - G_n(x) + G_n(q) - q| + 1/n \\ &= \sup_{I(q)} |Q_n(I(q)) - Q_0(I(q))| + 1/n \end{aligned}$$
whenever $|G_n^{-1}(q) - q| \le K$, where $I(q)$ runs over all intervals $(x, q]$ and $(q, x]$ with $|x - q| \le K$. Thus, by Lemma 6.3.2 and Lemma 3.1.5 applied to $K = K(q, s, n)$, we get
$$\begin{aligned} &P\{|G_n^{-1}(q) - q + (G_n(q) - q)| \le ((\log n)/n)^{3/4}\delta(q, s, n),\ q \in (0, 1)\} \\ &\quad\ge P\Big\{\sup_{I(q)} |Q_n(I(q)) - Q_0(I(q))| \le (s + 3)((\log n)/n)^{1/2}K(q, s, n)^{1/2},\ q \in (0, 1)\Big\} - B(s)n^{-s} \\ &\quad\ge 1 - [A(s) + B(s)]n^{-s} \end{aligned}$$
where $A(s)$ and $B(s)$ are the constants of Lemma 6.3.2 and Lemma 3.1.5. The proof is complete. □

6.4. Bootstrap Distribution Function of a Quantile


In this section we give a short introduction to Efron's bootstrap technique
and indicate its applicability to problems concerning order statistics.

Introduction
Since the sample d.f. $F_n$ is a natural nonparametric estimator of the unknown underlying d.f. $F$, it is plausible that the statistical functional $T(F_n)$ is an appropriate estimator of $T(F)$ for a large class of functionals $T$.
In connection with covering probabilities and confidence intervals one is interested in the d.f.
$$T_n(F, t) = P_F\{T(F_n) - T(F) \le t\}$$
of the centered statistic $T(F_n) - T(F)$.
The basic idea of the bootstrap approach is to estimate the d.f. $T_n(F, \cdot)$ by means of the bootstrap d.f. $T_n(F_n, \cdot)$. Thus, the underlying d.f. $F$ is simply replaced by the sample d.f. $F_n$.
Let us touch on the following aspects:
(a) the calculation of the bootstrap d.f. by enumeration or, alternatively, by Monte Carlo resampling,
(b) the validity of the bootstrap approach,
(c) the construction of confidence intervals for $T(F)$ via the bootstrap approach.

Evaluation of Bootstrap D.F.: Enumeration and Monte Carlo
Hereafter, let the observations $x_1, \dots, x_n$ be generated according to $n$ i.i.d. random variables with common d.f. $F$. Denote by $F_n^x$ the corresponding realization of the sample d.f. $F_n$; thus, we have
$$F_n^x(t) = n^{-1}\sum_{i=1}^{n} 1_{(-\infty, t]}(x_i).$$
Since $F_n^x$ is a discrete d.f., it is clear that the realization $T_n(F_n^x, \cdot)$ of the bootstrap d.f. $T_n(F_n, \cdot)$ can be calculated by enumeration: If $x_i \ne x_j$ for $i \ne j$ then $T_n(F_n^x, t)$ is the relative frequency of vectors $z \in \{x_1, \dots, x_n\}^n$ which satisfy the condition
$$T(F_n^z) - T(F_n^x) \le t, \tag{6.4.1}$$
where $F_n^z$ denotes the sample d.f. based on $z$.
Notice that inequality (6.4.1) has to be checked for $n^n$ vectors $z$.
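The enumeration just described can be written down directly. The sketch below takes the statistic as a Python function of the (re)sample, so that $T(F_n^z)$ is computed as `stat(z)`; it is only feasible for very small $n$, since $n^n$ vectors are visited. The helper names and the choice of the sample median as $T$ are illustrative assumptions.

```python
import math
from itertools import product
from typing import Callable, Sequence


def sample_quantile(values: Sequence[float], q: float) -> float:
    """F_n^{-1}(q) = X_{r(q):n} with r(q) = nq if nq is an integer, [nq] + 1 otherwise."""
    xs = sorted(values)
    n = len(xs)
    nq = n * q
    k = round(nq)
    r = int(k) if abs(nq - k) < 1e-9 else math.floor(nq) + 1
    return xs[r - 1]


def bootstrap_df_exact(x: Sequence[float], stat: Callable[[Sequence[float]], float], t: float) -> float:
    """T_n(F_n^x, t): relative frequency of z in {x_1,...,x_n}^n with stat(z) - stat(x) <= t."""
    n = len(x)
    base = stat(x)
    hits = sum(1 for z in product(x, repeat=n) if stat(z) - base <= t)
    return hits / n ** n


if __name__ == "__main__":
    median = lambda v: sample_quantile(v, 0.5)
    print(bootstrap_df_exact([1.0, 2.0, 3.0], median, 0.0))  # equals 20/27
```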
A Monte Carlo approximation to $T_n(F_n^x, t)$ is given by the relative frequency of pseudo-random vectors $z_1, \dots, z_m$ satisfying (6.4.1), where $z_i = (z_{i,1}, \dots, z_{i,n})$. The values $z_{1,1}, \dots, z_{1,n}, z_{2,1}, \dots, z_{m,n}$ are pseudo-random numbers generated according to the d.f. $F_n^x$. The sample size $m$ should be large enough so that the deviation of the Monte Carlo approximation from $T_n(F_n^x, t)$ is negligible.
The $3\sigma$-rule leads to a crude estimate of the necessary sample size. It says that the absolute deviation of the Monte Carlo approximation from $T_n(F_n^x, t)$ is smaller than $3/(2m^{1/2})$ with a probability $\ge .99$. Thus, if, e.g., a deviation of 0.005 is negligible then one should take $m = 90000$.
These considerations show that the Monte Carlo procedure is preferable to the exact calculation of the bootstrap estimate by enumeration if $m$ is small compared to $n^n$ (which will be the case if $n \ge 10$). In special cases it is possible to represent the bootstrap estimate by some analytical expression (see (6.4.2)).
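A Monte Carlo version replaces the $n^n$ vectors by $m$ resamples drawn from $F_n^x$, and the $3\sigma$-rule gives the size $m$. The sketch below uses the bound $1/(2m^{1/2})$ for the standard deviation of a relative frequency (the Bernoulli variance is at most $1/4$), which is where the factor $3/(2m^{1/2})$ comes from. Names and the seed are illustrative.

```python
import math
import random
from typing import Callable, Sequence


def mc_sample_size(delta: float) -> int:
    """Smallest m with 3 / (2 m^{1/2}) <= delta (3-sigma rule, Bernoulli variance <= 1/4)."""
    return math.ceil((3.0 / (2.0 * delta)) ** 2 - 1e-9)


def bootstrap_df_mc(x: Sequence[float], stat: Callable[[Sequence[float]], float],
                    t: float, m: int, seed: int = 0) -> float:
    """Relative frequency of m resamples z (i.i.d. draws from F_n^x) with stat(z) - stat(x) <= t."""
    rng = random.Random(seed)
    n = len(x)
    base = stat(x)
    hits = 0
    for _ in range(m):
        z = rng.choices(x, k=n)  # n pseudo-random numbers generated according to F_n^x
        if stat(z) - base <= t:
            hits += 1
    return hits / m


def odd_median(v: Sequence[float]) -> float:
    """Sample median for odd sample sizes (illustrative statistic)."""
    return sorted(v)[len(v) // 2]


if __name__ == "__main__":
    print(mc_sample_size(0.005))  # 90000, as in the text
    print(bootstrap_df_mc([1.0, 2.0, 3.0], odd_median, 0.0, m=mc_sample_size(0.005)))
```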

A Counterexample: Sample Minima
Next, we examine the statistical performance of bootstrap estimates in the particular case of sample minima. This problem will serve as an example where the bootstrap approach is not valid.
Let again $\alpha(F) = \inf\{x: F(x) > 0\}$ denote the left endpoint of the d.f. $F$. The corresponding statistical functional $\alpha(F_n)$ is the sample minimum $X_{1:n}$. If $\alpha(F) > -\infty$ then, according to (1.3.3),
$$T_n(F, t) = P\{X_{1:n} - \alpha(F) \le t\} = 1 - [1 - F(\alpha(F) + t)]^n$$
and
$$T_n(F_n, t) = 1 - [1 - F_n(X_{1:n} + t)]^n.$$
If $F$ is continuous then, w.p. 1,
$$T_n(F_n, 0) - T_n(F, 0) = 1 - \Big(1 - \frac{1}{n}\Big)^n \to 1 - \exp(-1), \qquad n \to \infty.$$
Hence the bootstrap method leads to an inconsistent sequence of estimators.
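The inconsistency is easy to see in a simulation: for distinct observations, $T_n(F_n, 0)$ is the probability that a bootstrap resample contains the observed minimum, i.e. $1 - (1 - 1/n)^n$, whatever $F$ is, whereas $T_n(F, 0) = 0$ for continuous $F$. The sketch compares the formula with a resampling estimate; the exponential sample, the seed and the number of resamples are illustrative.

```python
import math
import random
from typing import Sequence


def bootstrap_min_prob(n: int) -> float:
    """T_n(F_n, 0) = 1 - (1 - 1/n)^n for n distinct observations."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def resampled_min_prob(x: Sequence[float], reps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P*{X*_{1:n} - X_{1:n} <= 0}."""
    rng = random.Random(seed)
    x_min = min(x)
    hits = sum(1 for _ in range(reps) if min(rng.choices(x, k=len(x))) - x_min <= 0.0)
    return hits / reps


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.expovariate(1.0) for _ in range(20)]  # continuous F with alpha(F) = 0
    print(bootstrap_min_prob(20), resampled_min_prob(data, 20000))
    print(1.0 - math.exp(-1.0))                       # the limit as n -> infinity
```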

Sample Quantiles: Exact Evaluation of Bootstrap D.F.
Monte Carlo simulations provide some knowledge about the accuracy of the bootstrap procedure for a fixed sample size. Further insight into the validity of the bootstrap method is obtained by asymptotic considerations.
The consistency of $T_n(F_n, \cdot)$ holds if, e.g., the normalized d.f.'s $T_n(F_n, \cdot)$ and $T_n(F, \cdot)$ have the same limit as $n$ goes to infinity. Then the accuracy of the bootstrap approximation will be determined by the rates of convergence of the two sequences of d.f.'s to the limiting d.f. As an example we study the bootstrap approximation to the d.f. of the sample $q$-quantile.
If $T(F) = F^{-1}(q)$ then $T(F_n) = F_n^{-1}(q) = X_{m(n):n}$ where $m(n) = nq$ if $nq$ is an integer, and $m(n) = [nq] + 1$ otherwise. By Lemma 1.3.1,
$$T_n(F, t) = \sum_{i=m(n)}^{n} \binom{n}{i} F(F^{-1}(q) + t)^i\big(1 - F(F^{-1}(q) + t)\big)^{n-i} \tag{6.4.2}$$
and the same representation holds for $T_n(F_n, t)$ with $F$ and $F^{-1}$ replaced by $F_n$ and $F_n^{-1}$.
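Representation (6.4.2) gives the exact bootstrap d.f. of the sample $q$-quantile without any resampling, because $F_n(F_n^{-1}(q) + t)$ is just a relative frequency. The sketch below evaluates the binomial sum in log-space via `math.lgamma`, one simple way around the numerical difficulties with large binomial coefficients mentioned in the text; the function names are illustrative.

```python
import math
from bisect import bisect_right
from typing import Callable, Sequence


def r_index(n: int, q: float) -> int:
    """m(n) = nq if nq is an integer, [nq] + 1 otherwise."""
    nq = n * q
    k = round(nq)
    return int(k) if abs(nq - k) < 1e-9 else math.floor(nq) + 1


def binom_upper_tail(n: int, m: int, p: float) -> float:
    """P{Bin(n, p) >= m} = sum_{i=m}^n C(n,i) p^i (1-p)^{n-i}, summed in log-space (m >= 1)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(m, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def quantile_df(F: Callable[[float], float], F_inv: Callable[[float], float],
                n: int, q: float, t: float) -> float:
    """T_n(F, t) = P{X_{m(n):n} - F^{-1}(q) <= t}, cf. (6.4.2)."""
    return binom_upper_tail(n, r_index(n, q), F(F_inv(q) + t))


def bootstrap_quantile_df(x: Sequence[float], q: float, t: float) -> float:
    """T_n(F_n, t): (6.4.2) with F and F^{-1} replaced by F_n and F_n^{-1}."""
    xs = sorted(x)
    n = len(xs)
    m = r_index(n, q)

    def Fn(y: float) -> float:
        return bisect_right(xs, y) / n

    return binom_upper_tail(n, m, Fn(xs[m - 1] + t))
```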
From Theorem 4.1.4 we know that $T_n(F, t)$, suitably normalized, approaches the standard normal d.f. $\Phi$ as $n \to \infty$. The normalized version of $T_n(F, t)$ is given by
$$T_n^*(F, t) = T_n\Big(F, \frac{(q(1-q))^{1/2}}{n^{1/2}\varphi(\Phi^{-1}(q))}\,t\Big) \tag{6.4.3}$$
if $F = \Phi$.
To prove that the bootstrap d.f. $T_n(F_n, \cdot)$ is a consistent estimator of $T_n(\Phi, \cdot)$ one has to show that $T_n^*(F_n, t) \to \Phi(t)$, $n \to \infty$, for every $t$, w.p. 1.

Figure 6.4.1. Normalized d.f. $T_n^*(\Phi, \cdot)$ of sample $q$-quantile and bootstrap d.f. $T_n^*(F_n, \cdot)$ for $q = 0.4$ and $n = 20, 200$.

The numerical calculations above were carried out by using the normal approximation to the d.f. of the sample quantile of i.i.d. $(0, 1)$-uniformly distributed r.v.'s. Otherwise, the computation of the binomial coefficients would cause numerical difficulties. Computations for the sample size $n = 20$ showed that the error of this approximation is negligible.
From Figure 6.4.1 we see that $T_{20}^*(\Phi, \cdot)$ and $T_{200}^*(\Phi, \cdot)$ are close together (and, by the way, close to $\Phi$), indicating quick convergence. The bootstrap d.f. $T_n^*(F_n, \cdot)$ is a step function which slowly approaches $\Phi$. Next, we indicate this rate of convergence.

Asymptotic Investigations
The further analysis will be simplified by using the normal approximation to the d.f. of the sample $q$-quantile of $n$ i.i.d. $(0, 1)$-uniformly distributed r.v.'s. From Corollary 1.2.7 and (4.2.1), applied to $m = 1$, we deduce
$$\begin{aligned} T_n(F_n, t/n^{1/2}) - T_n(F, t/n^{1/2}) &= \Phi\Big[\frac{n^{1/2}(F_n(F_n^{-1}(q) + t/n^{1/2}) - q)}{(q(1-q))^{1/2}}\Big] - \Phi\Big[\frac{n^{1/2}(F(F^{-1}(q) + t/n^{1/2}) - q)}{(q(1-q))^{1/2}}\Big] + O(n^{-1/2}) \\ &= \Phi\big[t g_{n,t}/(q(1-q))^{1/2}\big] - \Phi\big[t f(F^{-1}(q))/(q(1-q))^{1/2}\big] + o(1) \end{aligned} \tag{6.4.4}$$
uniformly over $t$ [where the second relation holds if $F$ has a derivative, say, $f(F^{-1}(q))$ at $F^{-1}(q)$]. Moreover, the function $g_{n,t}$ is defined by
$$g_{n,t} = \frac{F_n(F_n^{-1}(q) + t/n^{1/2}) - F_n(F_n^{-1}(q))}{t/n^{1/2}}, \qquad t \ne 0, \tag{6.4.5}$$
and $g_{n,t} = 0$ if $t = 0$.
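The naive density estimator $g_{n,t}$ in (6.4.5) is a difference quotient of the sample d.f. at the sample $q$-quantile. A minimal sketch follows; at $t = 0$ it returns 0, a value that plays no role in (6.4.4) because $g_{n,t}$ is multiplied by $t$ there. The standard normal example and the seed are illustrative assumptions.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist
from typing import Sequence


def r_index(n: int, q: float) -> int:
    """r(q) = nq if nq is an integer, [nq] + 1 otherwise."""
    nq = n * q
    k = round(nq)
    return int(k) if abs(nq - k) < 1e-9 else math.floor(nq) + 1


def g_nt(x: Sequence[float], q: float, t: float) -> float:
    """[F_n(F_n^{-1}(q) + t/n^{1/2}) - F_n(F_n^{-1}(q))] / (t/n^{1/2}) for t != 0, cf. (6.4.5)."""
    if t == 0:
        return 0.0
    xs = sorted(x)
    n = len(xs)
    xi_n = xs[r_index(n, q) - 1]  # sample q-quantile F_n^{-1}(q)
    h = t / math.sqrt(n)

    def Fn(y: float) -> float:
        return bisect_right(xs, y) / n

    return (Fn(xi_n + h) - Fn(xi_n)) / h


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    print(g_nt(data, 0.5, 2.0), g_nt(data, 0.5, -2.0), NormalDist().pdf(0.0))
```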
The auxiliary function $g_{n,t}$ is a "naive" estimator of the density at the random point $F_n^{-1}(q)$. Thus, the stochastic behavior of the bootstrap error $T_n(F_n, t/n^{1/2}) - T_n(F, t/n^{1/2})$ is closely related to that of a density estimator. We have
$$\sup_t |T_n(F_n, t) - T_n(F, t)| \to 0, \qquad n \to \infty, \quad \text{w.p. } 1, \tag{6.4.6}$$
that is, the bootstrap estimator is strongly consistent, if w.p. 1 for every $t \ne 0$,
$$g_{n,t} \to f(F^{-1}(q)), \qquad n \to \infty. \tag{6.4.7}$$
Let us assume that $F$ has a derivative, say, $f$ near $F^{-1}(q)$ and $f$ is continuous at $F^{-1}(q)$. From Lemma 3.1.7(ii) and the Borel-Cantelli lemma it follows that, w.p. 1, for every $t \ne 0$,

$$\begin{aligned} g_{n,t} &= \frac{F(F_n^{-1}(q) + t/n^{1/2}) - F(F_n^{-1}(q)) + O\big[(F(F_n^{-1}(q) + t/n^{1/2}) - F(F_n^{-1}(q)))^{1/2}(\log n)^{1/2}/n^{1/2} + (\log n)/n\big]}{t/n^{1/2}} \\ &= f(F_n^{-1}(q) + \theta_{n,t}t/n^{1/2}) + O\Big[\big(f(F_n^{-1}(q) + \theta_{n,t}t/n^{1/2})\big)^{1/2}\frac{(\log n)^{1/2}}{n^{1/4}} + \frac{\log n}{n^{1/2}}\Big] \end{aligned}$$
eventually, for some $\theta_{n,t} \in (0, 1)$.
Thus, (6.4.7) holds because $F_n^{-1}(q)$ is a strongly consistent estimator of $F^{-1}(q)$ under the present conditions (compare with Lemma 1.2.9).
It is easy to see that the proof developed above also leads to a bound on the rate of convergence which is, roughly speaking, of order $O(n^{-1/4})$ under slightly stronger conditions imposed on $F$.
An exact answer to the question concerning the accuracy of the bootstrap approximation can, e.g., be obtained by a law of the iterated logarithm as proved by Singh (1981):
If $F$ has a bounded second derivative near $F^{-1}(q)$ and $f(F^{-1}(q)) > 0$ then
$$\limsup_n \frac{n^{1/4}}{(\log\log n)^{1/2}}\sup_t |T_n(F_n, t) - T_n(F, t)| = K_{q,F} > 0 \quad \text{w.p. } 1$$
where $K_{q,F}$ is a constant depending on $q$ and $F$ only.


The accuracy of the bootstrap approach is also described in a theorem due to Falk and Reiss (1989) which concerns the weak convergence of the process $Z_n$ defined by
where
(6.4.8)
and $\varphi = \Phi'$.
Theorem 6.4.1. Assume that $F$ is a continuous d.f. having a derivative $f$ near $F^{-1}(q)$ which satisfies a local Lipschitz condition of order $\delta > 1/2$ and that $f(F^{-1}(q)) > 0$. Then, $Z_n$ weakly converges to a process $Z$ defined by
$$Z(t) = \begin{cases} B_1(-t) & \text{if } t \le 0,\\ B_2(t) & \text{if } t > 0, \end{cases}$$
where $B_1$ and $B_2$ are independent standard Brownian motions on $[0, \infty)$.
We refer to Falk and Reiss (1989) for a detailed proof of Theorem 6.4.1 and for a definition of weak convergence on the set of all right-continuous functions on the real line having left-hand limits.
The basic idea of the proof is to examine the expressions in (6.4.3) and (6.4.5) conditioned on the sample $q$-quantile $F_n^{-1}(q)$. Notice that the r.v.'s $g_{n,t}$ only depend on order statistics smaller (larger) than $F_n^{-1}(q)$ if $t \le 0$ (if $t > 0$). Thus, it follows from Theorem 1.8.1 that, conditioned on $F_n^{-1}(q)$, the processes $(g_{n,t})_{t \le 0}$ and $(g_{n,t})_{t > 0}$ are conditionally independent. Theorem 6.4.1 reveals that we get unconditional independence in the limit.

The Maximum Deviation
Let $T_n^*(F_n, \cdot)$ be the normalized bootstrap d.f. as defined in (6.4.3). Denote by $H_n$ the normalized d.f. of the maximum deviation of the bootstrap d.f. $T_n^*(F_n, t)$ from $T_n^*(\Phi, t)$ over $|t| \le 3$. More precisely, we have
$$H_n(s) = P\Big\{n^{1/4}\max_{|t| \le 3} |T_n^*(F_n, t) - T_n^*(\Phi, t)| \le s\Big\}. \tag{6.4.9}$$
We present a Monte Carlo result based on a sample of size $N = 5000$.
Figure 6.4.2 shows that the asymptotic result in Theorem 6.4.1 is of relevance for small and moderate sample sizes.
Figure 6.4.2. Normalized d.f. $H_n$ of maximum bootstrap error for $q = 0.5$ and $n = 200, 2000$ with $H_{200} \sim H_{2000}$.

Confidence Bounds
Next, we consider the problem of setting two-sided confidence bounds for the unknown parameter $T(F)$. First, let us look at the problem from the point of view of a practitioner. One has to find a random variable $c_n(\alpha)$ such that
$$P\{T(F_n) - c_n(\alpha) \le T(F) \le T(F_n) + c_n(\alpha)\} \to 1 - \alpha, \qquad n \to \infty. \tag{6.4.10}$$
The bootstrap solution is to take $c_n(\alpha)$ such that the bootstrap d.f. satisfies
$$T_n(F_n, c_n(\alpha)) - T_n(F_n, -c_n(\alpha)) = 1 - \alpha.$$
The validity of (6.4.10) can be made plausible by the argument that, uniformly over all $t$, $T_n(F_n, t) \approx T_n(F, t)$. This idea will be made rigorous in the particular case of the $q$-quantile via asymptotic considerations.
If $F$ has a derivative $f$ near $F^{-1}(q)$ and $f$ is continuous at $F^{-1}(q)$ then we know that
$$P\{|F_n^{-1}(q) - F^{-1}(q)| \le u_n(\alpha)\} \to 1 - \alpha, \qquad n \to \infty,$$
where $u_n(\alpha) = \Phi^{-1}(1 - \alpha/2)(q(1 - q)/n)^{1/2}/f(F^{-1}(q))$. Moreover, by using the fact that
$$\sup_t |T_n(F_n, t) - T_n(F, t)| \to 0, \qquad n \to \infty, \quad \text{w.p. } 1,$$
we obtain $c_n(\alpha)/u_n(\alpha) \to 1$, $n \to \infty$, w.p. 1. Hence, Slutsky's lemma yields (6.4.10).
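For the sample $q$-quantile the bootstrap bound $c_n(\alpha)$ can be computed exactly from (6.4.2). Because the bootstrap d.f. is a step function, the equation defining $c_n(\alpha)$ generally has no exact solution; the sketch below takes, as one common convention (an assumption of this sketch, not of the text), the smallest $c$ with $P^*\{|X^*_{m(n):n} - F_n^{-1}(q)| \le c\} \ge 1 - \alpha$, and compares it with $u_n(\alpha)$ for simulated standard normal data. Sample size, level and seed are illustrative.

```python
import math
import random
from bisect import bisect_left, bisect_right
from statistics import NormalDist
from typing import Sequence


def r_index(n: int, q: float) -> int:
    """m(n) = nq if nq is an integer, [nq] + 1 otherwise."""
    nq = n * q
    k = round(nq)
    return int(k) if abs(nq - k) < 1e-9 else math.floor(nq) + 1


def binom_upper_tail(n: int, m: int, p: float) -> float:
    """P{Bin(n, p) >= m}, summed in log-space (m >= 1)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    lp, lq, c = math.log(p), math.log1p(-p), math.lgamma(n + 1)
    s = sum(math.exp(c - math.lgamma(i + 1) - math.lgamma(n - i + 1) + i * lp + (n - i) * lq)
            for i in range(m, n + 1))
    return min(s, 1.0)


def bootstrap_bound(x: Sequence[float], q: float, alpha: float) -> float:
    """Smallest c with P*{|X*_{m:n} - X_{m:n}| <= c} >= 1 - alpha, computed exactly via (6.4.2)."""
    xs = sorted(x)
    n = len(xs)
    m = r_index(n, q)
    xi_n = xs[m - 1]

    def coverage(c: float) -> float:
        tol = 1e-12 * (1.0 + abs(xi_n) + c)  # guards against rounding in xi_n +/- c
        upper = binom_upper_tail(n, m, bisect_right(xs, xi_n + c + tol) / n)  # P*{X* <= xi_n + c}
        lower = binom_upper_tail(n, m, bisect_left(xs, xi_n - c - tol) / n)   # P*{X* <  xi_n - c}
        return upper - lower

    for c in sorted({abs(v - xi_n) for v in xs}):  # the coverage only jumps at these values
        if coverage(c) >= 1.0 - alpha:
            return c
    return max(abs(v - xi_n) for v in xs)


def normal_bound(n: int, q: float, alpha: float, density_at_quantile: float) -> float:
    """u_n(alpha) = Phi^{-1}(1 - alpha/2) (q(1-q)/n)^{1/2} / f(F^{-1}(q))."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(q * (1.0 - q) / n) / density_at_quantile


if __name__ == "__main__":
    rng = random.Random(11)
    n, q, alpha = 400, 0.5, 0.1
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    c_n = bootstrap_bound(data, q, alpha)
    u_n = normal_bound(n, q, alpha, NormalDist().pdf(NormalDist().inv_cdf(q)))
    print(c_n, u_n, c_n / u_n)
```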
For a continuation of this topic we refer to Section 8.4 where the smooth
bootstrap is examined.

P.6. Problems and Supplements


1. (Gram-Charlier series of type A)
Let $\varphi$ denote the density of the standard normal d.f. $\Phi$. The Chebyshev-Hermite polynomials $H_i = (-1)^i\varphi^{(i)}/\varphi$ are orthogonal w.r.t. the inner product $(h, g) = \int h(x)g(x)\varphi(x)\,dx$ (see Kendall and Stuart (1958), page 155). Write $H_i = \sum_{j=0}^{i} c_{j,i}x^j$. Denote by $P_n$ the distribution and by $\mu_{n,j}$ the $j$th moment of
$$\frac{n^{1/2}f(F^{-1}(q))}{(q(1-q))^{1/2}}\big(X_{r(n):n} - F^{-1}(q)\big).$$
Prove, under the conditions of Theorem 6.1.1, that
$$\sup_B \Big| P_n(B) - \int_B \varphi(x)\Big[1 + \sum_{i=1}^{3(m-1)} \frac{1}{i!}\Big(\sum_{j=0}^{i} c_{j,i}\mu_{n,j}\Big)H_i(x)\Big]dx \Big| = O(n^{-m/2}).$$
(Reiss, 1974b)
2. Under the conditions of Theorem 6.1.2, with q = 1/2, m = 3 and odd sample sizes n,
P { X[n/21+bn > F

-I

(1/2)

+ 2fon l / 2 A. -

1 fl 2
4n l /2 f02 A.

-~(A
+
A3(1- 2ff/0
4n

+ ~)))}
=
6f 3
0

IX

+ O(n-

3/
2)

where $\lambda = \Phi^{-1}(1 - \alpha)$ and $f_i = f^{(i)}(F^{-1}(1/2))$.


(F.N. David and N. L. Johnson, 1954)


3. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. exponential r.v.'s

where bj

= (n

- j

~ I ' ... , ~ .

Show that

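$+\,|t|\sum_{i=j}^{n} a_i$. Moreover, with $\tau_n^2 = \sum_{i=1}^{n} b_{i,n}^2$, we have
$$P\Big\{\tau_n^{-1}\sum_{i=1}^{n} a_i(X_{i:n} - EX_{i:n}) \le t\Big\} \to \Phi(t), \qquad n \to \infty,$$
if, and only if,
$$\max_{1 \le j \le n} \tau_n^{-1}|b_{j,n}| \to 0, \qquad n \to \infty.$$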

4. Prove an expansion of length 2 in Lemma 6.2.3(ii).


5. Show that the accuracy of the bootstrap approximation can be improved by treating
the standardized version

[Hint: Use (6.4.4).]
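Regarding Problem 1: the Chebyshev-Hermite polynomials can be generated by the standard three-term recurrence $H_{i+1}(x) = xH_i(x) - iH_{i-1}(x)$ with $H_0 = 1$ and $H_1(x) = x$. The sketch below uses it and checks numerically (composite Simpson rule on $[-12, 12]$, an illustrative truncation) that $(H_i, H_j) = i!\,\delta_{ij}$, i.e. the polynomials are orthogonal with squared norm $i!$.

```python
import math


def hermite(i: int, x: float) -> float:
    """Chebyshev-Hermite polynomial H_i(x) = (-1)^i phi^{(i)}(x) / phi(x), via the three-term recurrence."""
    if i == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, i):
        h_prev, h = h, x * h - k * h_prev
    return h


def inner(i: int, j: int, a: float = -12.0, b: float = 12.0, steps: int = 4000) -> float:
    """(H_i, H_j) = int H_i(x) H_j(x) phi(x) dx, approximated by the composite Simpson rule."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps + 1):
        x = a + k * h
        weight = 1.0 if k in (0, steps) else (4.0 if k % 2 else 2.0)
        total += weight * hermite(i, x) * hermite(j, x) * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return total * h / 3.0


if __name__ == "__main__":
    for i in range(5):
        print(i, [round(inner(i, j), 6) for j in range(5)])
```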

Bibliographical Notes
An approach related to that in Theorem 6.1.1 was adopted by Hodges and
Lehmann (1967) for expanding the variance of the sample median (without
rigorous proof). These investigations led to the famous paper by Hodges and
Lehmann (1970) concerning the second order efficiency (deficiency).
Concerning limit theorems for moments of extremes we refer to Pickands
(1968), Polfeldt (1970), Ramachandran (1984), and Resnick (1987).
Concerning linear combinations of order statistics we already mentioned
the book of Helmers (1982). A survey of other approaches for deriving limit
theorems for linear combinations of order statistics is given in the book of
Serfling (1980). A more recent result concerning linear combinations of order statistics is due to van Zwet (1984): A representation as a symmetric statistic leads to a Berry-Esseen type theorem that is essentially equivalent to Theorem 6.2.6.
Limit laws for sums of extremes and intermediate order statistics have attracted considerable attention in recent years. This problem is related to that of weak convergence of sums of i.i.d. random variables to a stable law (see Feller (1972)). Concerning weak laws we refer to the articles of M. Csörgő et al. (1986), S. Csörgő and D.M. Mason (1986), and S. Csörgő et al. (1986). A. Janssen (1988) proved a corresponding limit law w.r.t. the variational distance. An earlier notable article pertaining to this is that of Teugels (1981), among others.
Spacings and functions of spacings (understood in the greater generality of m-step spacings) are dealt with in several parts of the book, e.g., in the context of estimating the quantile density function. We did not make any attempt to cover this field to its full extent. For a comprehensive treatment of spacings see Pyke (1965, 1972). Several test statistics in nonparametric statistics are based on spacings. In the present context, the most interesting ones are perhaps those based on m-step spacings. For a survey of recent results we refer to the article of Jammalamadaka S. Rao and M. Kuo (1984). Interesting results concerning "systematic" statistics (including the $\chi^2$-test) are given by Miyamoto (1976).
A first improvement of Bahadur's original result in 1966 was achieved by Kiefer (1967), namely a law of the iterated logarithm analogue for the Bahadur approximation evaluated at a single point. Limit theorems like that stated in
Section 6.3 are contained in the article of Kiefer (1969a). Further extensions
concern (a) the weakening of conditions imposed on the underlying r.v.'s (see
e.g. Sen, 1972) and (b) non-uniform bounds for the remainder term of the
Bahadur approximation (e.g. Singh, 1979).
It was observed by Bickel and Freedman (1981) that bootstrapping leads
to inconsistent estimators in case of extremes. An interesting recent survey of
various techniques related to bootstrap was given by Beran (1985). We refer
to Klenk and Stute (1987) for an application of the bootstrap method to linear
combinations of order statistics.

CHAPTER 7

Approximations in the Multivariate Case

The title of this chapter should be regarded more as a program than as a description of the content (in view of the declared aims of this book).
In Section 7.1 we shall give an outline of the present state of the art of the asymptotic treatment of multivariate central order statistics.
In contrast to the field of central order statistics, a huge amount of literature exists concerning the asymptotic behavior of multivariate extremes. For an excellent treatment of this subject we refer to Galambos (1987) and Resnick (1987). In Section 7.2 we shall present some elementary results concerning the rate of convergence in the weak sense. Our interest will be focused on maxima where the marginals are asymptotically independent. As an example we shall compute the rate at which the marginal maxima of normal random vectors become independent.

7.1. Asymptotic Normality of Central Order Statistics


Throughout this section, we assume that $\xi_1, \xi_2, \xi_3, \dots$ is a sequence of i.i.d. random vectors of dimension $d$ with common d.f. $F$. Let $X^{(j)}_{r:n}$ be the $r$th order statistic in the $j$th component as defined in (2.1.4).
For $j = 1, \dots, d$, let $I(j) \subset \{1, \dots, n\}$. If $F$ satisfies some mild regularity conditions then it is plausible that a collection of order statistics
$$X^{(j)}_{r(j):n}, \qquad j = 1, \dots, d,\ r(j) \in I(j), \tag{7.1.1}$$
is jointly asymptotically normal if for each $j = 1, \dots, d$ the order statistics
$$X^{(j)}_{r(j):n}, \qquad r(j) \in I(j), \tag{7.1.2}$$
have this property. We do not know whether this idea can be made rigorous, though.
The asymptotic normality of order statistics can be proved via the device of Section 2.1, namely, to represent the d.f. of order statistics as the d.f. of a sum of i.i.d. random vectors. To simplify the writing let us study the 2-dimensional case. According to Section 2.1 we have
$$P\{X^{(1)}_{r(1,n):n} \le t_{1,n},\ X^{(2)}_{r(2,n):n} \le t_{2,n}\} = P\Big\{\sum_{i=1}^{n}\big(1_{(-\infty, t_{1,n}]}(\xi_{i,1}),\ 1_{(-\infty, t_{2,n}]}(\xi_{i,2})\big) \ge r_n\Big\} \tag{7.1.3}$$
where $\xi_i = (\xi_{i,1}, \xi_{i,2})$ and $r_n = (r(1, n), r(2, n))$, the inequality between vectors being understood componentwise. On the right-hand side we are given the distribution of a sum of i.i.d. random vectors, whence the multidimensional central limit theorem is applicable.
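The identity (7.1.3) holds pathwise: for each component, $X^{(j)}_{r:n} \le t$ holds exactly when at least $r$ of the $j$th coordinates are $\le t$. The sketch below checks this on simulated data for any dimension $d$; the Gaussian vectors with a shared component and the seed are illustrative assumptions.

```python
import random
from typing import List, Sequence


def event_order_statistics(xi: Sequence[Sequence[float]], r: Sequence[int], t: Sequence[float]) -> bool:
    """X^{(j)}_{r_j:n} <= t_j for every component j (r_j is 1-based)."""
    return all(sorted(v[j] for v in xi)[r[j] - 1] <= t[j] for j in range(len(r)))


def event_counts(xi: Sequence[Sequence[float]], r: Sequence[int], t: Sequence[float]) -> bool:
    """sum_i (1{xi_{i,1} <= t_1}, ..., 1{xi_{i,d} <= t_d}) >= r componentwise."""
    return all(sum(1 for v in xi if v[j] <= t[j]) >= r[j] for j in range(len(r)))


def random_vectors(n: int, rng: random.Random) -> List[List[float]]:
    """Bivariate vectors (Z + E1, Z + E2) with a shared standard normal component Z."""
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        out.append([z + rng.gauss(0.0, 1.0), z + rng.gauss(0.0, 1.0)])
    return out


if __name__ == "__main__":
    rng = random.Random(3)
    agree = all(
        event_order_statistics(xi, (5, 12), (-0.5, 0.4)) == event_counts(xi, (5, 12), (-0.5, 0.4))
        for xi in (random_vectors(20, rng) for _ in range(1000))
    )
    print(agree)  # True
```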
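Let $0 < q_1, q_2 < 1$ be fixed and assume that
$$n^{1/2}(r(i, n)/n - q_i) \to 0, \qquad n \to \infty,\ i = 1, 2. \tag{7.1.4}$$
According to the univariate case the appropriate choice of constants $(t_{1,n}, t_{2,n})$ is
$$t_{i,n} = F_i^{-1}(q_i) + x_i/(n^{1/2}f_i), \qquad i = 1, 2, \tag{7.1.5}$$
where $F_i$ is the $i$th marginal d.f. of $F$ and $f_i = F_i'(F_i^{-1}(q_i))$. Let us rewrite the right-hand side of (7.1.3) as
$$P\Big\{n^{-1/2}\sum_{i=1}^{n} \eta_{i,n} \le x_n\Big\} \tag{7.1.6}$$
where the random vectors $\eta_{i,n}$ are given by
$$\eta_{i,n} = -\big[\big(1_{(-\infty, t_{1,n}]}(\xi_{i,1}),\ 1_{(-\infty, t_{2,n}]}(\xi_{i,2})\big) - \big(F_1(t_{1,n}), F_2(t_{2,n})\big)\big], \tag{7.1.7}$$
and
$$x_n = (x_{1,n}, x_{2,n}), \qquad x_{i,n} = n^{1/2}\big(F_i(t_{i,n}) - r(i, n)/n\big). \tag{7.1.8}$$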
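Obviously, $\eta_{i,n}$, $i = 1, 2, \dots, n$, are bounded i.i.d. random vectors with mean vector zero and covariance matrix $\Sigma_n = (\sigma_{i,j,n})$ given by
$$\sigma_{i,i,n} = F_i(t_{i,n})(1 - F_i(t_{i,n})),\ i = 1, 2, \qquad \sigma_{1,2,n} = \sigma_{2,1,n} = F(t_{1,n}, t_{2,n}) - F_1(t_{1,n})F_2(t_{2,n}). \tag{7.1.9}$$
Theorem 7.1.1. Assume that $F$ is continuous at the point $(F_1^{-1}(q_1), F_2^{-1}(q_2))$. Moreover, for $i = 1, 2$, let $F_i$ be differentiable at $F_i^{-1}(q_i)$ with $f_i = F_i'(F_i^{-1}(q_i)) > 0$. Define $\Sigma = (\sigma_{i,j})$ by
$$\sigma_{i,i} = q_i(1 - q_i),\ i = 1, 2, \qquad\text{and}\qquad \sigma_{1,2} = \sigma_{2,1} = F(F_1^{-1}(q_1), F_2^{-1}(q_2)) - q_1q_2. \tag{7.1.10}$$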

If $\det(\Sigma) \ne 0$ and condition (7.1.4) holds then for every $(x_1, x_2)$:
$$P\{n^{1/2}f_i(X^{(i)}_{r(i,n):n} - F_i^{-1}(q_i)) \le x_i,\ i = 1, 2\} \to \Phi_\Sigma(x_1, x_2), \qquad n \to \infty, \tag{7.1.11}$$
where $\Phi_\Sigma$ is the bivariate normal d.f. with mean vector zero and covariance matrix $\Sigma$.
PROOF. Let $\Sigma_n$ and $\eta_{i,n}$ be as in (7.1.9) and (7.1.7). Since $\Sigma_n \to \Sigma$, $n \to \infty$, we may assume w.l.g. that $\det(\Sigma_n) \ne 0$. Let $T_n$ be a matrix such that $T_n^2 = \Sigma_n^{-1}$ [compare with Bhattacharya and Rao (1976), (16.3) and (16.4)]. Then, according to a Berry-Esseen type theorem (see Bhattacharya and Rao (1976), Corollary 18.3) we get
$$\sup_z \Big| P\Big\{n^{-1/2}\sum_{i=1}^{n} \eta_{i,n} \le z\Big\} - \Phi_{\Sigma_n}(z) \Big| \le cn^{-1/2}E\|T_n\eta_{1,n}\|_2^3 = O(n^{-1/2}) \tag{7.1.12}$$
for some constant $c > 0$. Here $\|\cdot\|_2$ denotes the Euclidean norm.
The differentiability of $F_i$ at $F_i^{-1}(q_i)$ and condition (7.1.4) yield that $x_{i,n} \to x_i$, $n \to \infty$, and hence
$$\Phi_{\Sigma_n}(x_n) \to \Phi_\Sigma(x_1, x_2), \qquad n \to \infty. \tag{7.1.13}$$
Combining (7.1.3), (7.1.6), (7.1.12), and (7.1.13) we obtain (7.1.11). □

The error rates in (7.1.11) can easily be computed under slightly stronger regularity conditions imposed on $F$.
The condition $\det(\Sigma) \ne 0$ is rather a mild one. If $\xi_i = (\zeta_i, \zeta_i)$ are random vectors having the same r.v. in both components then $\det(\Sigma) = 0$ if $q_1 = q_2$ and $\det(\Sigma) \ne 0$ if $q_1 \ne q_2$. It is clear that the two procedures of taking two order statistics $X_{r:n}$, $X_{s:n}$ according to $\zeta_1, \dots, \zeta_n$ or order statistics $X^{(1)}_{r:n}$, $X^{(2)}_{s:n}$ according to $\xi_1, \dots, \xi_n$ are identical. Thus, the situation of Section 4.5 can be regarded as a special case of the multivariate one.
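The covariance matrix of Theorem 7.1.1 only requires the value $F(F_1^{-1}(q_1), F_2^{-1}(q_2))$ of the joint d.f. The sketch below builds $\Sigma$ from that value and evaluates $\det(\Sigma)$ for the two extreme examples: identical components, where $F(F_1^{-1}(q_1), F_2^{-1}(q_2)) = \min(q_1, q_2)$, and independent components, where it equals $q_1q_2$. For identical components and $q_1 < q_2$ one finds $\det(\Sigma) = q_1(1 - q_2)(q_2 - q_1)$, which vanishes exactly when $q_1 = q_2$. Function names are illustrative.

```python
from typing import List


def sigma_matrix(q1: float, q2: float, joint: float) -> List[List[float]]:
    """Sigma of (7.1.10); joint = F(F_1^{-1}(q1), F_2^{-1}(q2))."""
    s12 = joint - q1 * q2
    return [[q1 * (1.0 - q1), s12], [s12, q2 * (1.0 - q2)]]


def det2(m: List[List[float]]) -> float:
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


if __name__ == "__main__":
    for q1, q2 in [(0.3, 0.3), (0.3, 0.6)]:
        same = sigma_matrix(q1, q2, min(q1, q2))  # xi_i = (zeta_i, zeta_i)
        indep = sigma_matrix(q1, q2, q1 * q2)     # independent components
        print(q1, q2, det2(same), det2(indep))
```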
Next we give a straightforward generalization of Theorem 7.1.1 to the case $d \ge 3$. We take one order statistic $X^{(i)}_{r(i,n):n}$ out of each of the $d$ components.
Theorem 7.1.2. Let $\xi_1, \xi_2, \dots$ be a sequence of $d$-variate i.i.d. random vectors with common d.f. $F$. Denote by $F_i$ and $F_{i,j}$ the univariate and bivariate marginal d.f.'s of $F$. Let $0 < q_i < 1$ for $i = 1, \dots, d$. Assume that $F_{i,j}$ is continuous at the point $(F_i^{-1}(q_i), F_j^{-1}(q_j))$ for $i, j = 1, \dots, d$. Moreover, for $i = 1, \dots, d$, let $F_i$ be differentiable at $F_i^{-1}(q_i)$ with $f_i = F_i'(F_i^{-1}(q_i)) > 0$. Assume that
$$n^{1/2}(r(i, n)/n - q_i) \to 0, \qquad n \to \infty,\ i = 1, \dots, d. \tag{7.1.14}$$
Define $\Sigma = (\sigma_{i,j})$ by
$$\sigma_{i,i} = q_i(1 - q_i),\ i = 1, \dots, d, \qquad\text{and}\qquad \sigma_{i,j} = F_{i,j}(F_i^{-1}(q_i), F_j^{-1}(q_j)) - q_iq_j,\ i \ne j. \tag{7.1.15}$$
If $\det(\Sigma) \ne 0$, then for every $x = (x_1, \dots, x_d)$,
$$P\{n^{1/2}f_i[X^{(i)}_{r(i,n):n} - F_i^{-1}(q_i)] \le x_i,\ i = 1, \dots, d\} \to \Phi_\Sigma(x), \qquad n \to \infty, \tag{7.1.16}$$
where $\Phi_\Sigma$ is the $d$-variate normal d.f. with mean vector zero and covariance matrix $\Sigma$.

7.2. Multivariate Extremes


In this section, we shall deal exclusively with maxima of $d$-variate i.i.d. random vectors $\xi_{1,n}, \dots, \xi_{n,n}$ with common d.f. $F_n$. It is assumed that $F_n$ has identical univariate marginals $F_{n,i}$. Thus,
$$F_{n,1} = \dots = F_{n,d}.$$
It will be convenient to denote the $d$-variate maximum by
$$M_n = (M_{n,1}, \dots, M_{n,d})$$
where $M_{n,1}, \dots, M_{n,d}$ are the identically distributed univariate marginal maxima (compare with (2.1.8)) with common d.f. $F_{n,1}^n$. Recall that $F_n^n$ is the d.f. of $M_n$.

Weak Convergence
The weak convergence is again the pointwise convergence of d.f.'s if the limiting d.f. is continuous, which will always be assumed in this section. The weak convergence of $d$-variate d.f.'s implies the weak convergence of the univariate marginal d.f.'s (since the projections are continuous). In particular, if $F_n^n$ weakly converges to $G_0$ then the univariate marginal d.f.'s $F_{n,1}^n$ also converge weakly to the univariate marginal $G_{0,1}$ of $G_0$. Notice that $G_0$ also has identical univariate marginals. If $G_{0,1}$ is nondegenerate then the results of Chapter 5 already give some insight into the present problem.
Recall from Section 2.2 that the $d$-variate d.f.'s $x \to \prod_{i=1}^{d} G_{0,1}(x_i)$ and $x \to G_{0,1}(\min(x_1, \dots, x_d))$ represent the case of independence and complete dependence.
Lemma 7.2.1. Assume that the univariate marginals $F_{n,1}^n$ converge pointwise to the d.f. $G_{0,1}$.
(i) Then, for every $x$,
$$\prod_{i=1}^{d} G_{0,1}(x_i) \le \liminf_n F_n^n(x) \le \limsup_n F_n^n(x) \le G_{0,1}(\min(x_1, \dots, x_d)).$$
(ii) If $F_n^n$ converges pointwise to some right-continuous function $G$ then $G$ is a d.f.
PROOF. Ad (i): Check that $F_n^n(x) \le F_{n,1}^n(\min(x_1, \dots, x_d))$. Now, the upper bound is obvious. Secondly, Bonferroni's inequality (see P.2.5(iv)) yields
$$F_n^n(x) \ge \exp\Big[-\sum_{j=1}^{d} n(1 - F_{n,1}(x_j))\Big] + o(1) = \prod_{j=1}^{d} \exp[-n(1 - F_{n,1}(x_j))] + o(1) = \prod_{j=1}^{d} G_{0,1}(x_j) + o(1).$$
Therefore, the lower bound also holds.
Ad (ii): Use (i) to prove that $G$ is a normed function. Moreover, the pointwise convergence of $F_n^n$ to $G$ implies that $G$ is $\Delta$-monotone (see (2.2.19)). □

It is immediate from Lemma 7.2.1 that max-stable d.f.'s $G_0$ have the property

$$\prod_{i=1}^d G_{0,1}(x_i) \le G_0(x) \le G_{0,1}(\min(x_1, \dots, x_d)). \tag{7.2.1}$$

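Let $\xi = (\xi_1, \dots, \xi_d)$ be a random vector with d.f. F. Recall from P.2.5 that for some universal constant C > 0,

$$\sup_t \Big| F^n(t) - \exp\Big(\sum_{j=1}^d (-1)^j n h_j(t)\Big)\Big| \le C n^{-1} \tag{7.2.2}$$

where

$$h_j(t) = \sum_{1 \le i_1 < \dots < i_j \le d} P\{\xi_{i_1} > t_{i_1}, \dots, \xi_{i_j} > t_{i_j}\}, \qquad j = 1, \dots, d. \tag{7.2.3}$$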
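To see the inclusion-exclusion structure of (7.2.2) and (7.2.3) at work, here is a minimal Python sketch. It assumes, purely for illustration, a trivariate vector with independent components, so that every joint survivor probability is a product; the helper names `h`, `exp_approx` and `independent_model` are hypothetical. Since $\sum_{j=1}^d (-1)^{j+1} h_j(t) = 1 - F(t)$ for every distribution, the exponent equals $-n(1 - F(t))$ and the error stays of order $n^{-1}$.

```python
import math
from itertools import combinations


def h(j, d, surv):
    """h_j(t) from (7.2.3): sum over all index sets i_1 < ... < i_j of
    P{xi_{i_1} > t_{i_1}, ..., xi_{i_j} > t_{i_j}}.

    `surv(S)` returns that joint survivor probability for a tuple S of indices
    (a caller-supplied function; the point t is fixed inside it).
    """
    return sum(surv(S) for S in combinations(range(d), j))


def exp_approx(n, d, surv):
    """The approximation exp(sum_j (-1)^j n h_j(t)) to F^n(t) in (7.2.2)."""
    return math.exp(sum((-1) ** j * n * h(j, d, surv) for j in range(1, d + 1)))


def independent_model(p):
    """Illustrative assumption: independent components with P{xi_i > t_i} = p[i].

    Returns (d, surv, F(t))."""
    surv = lambda S: math.prod(p[i] for i in S)
    F = math.prod(1.0 - pi for pi in p)
    return len(p), surv, F


d, surv, F = independent_model([0.01, 0.02, 0.005])
for n in (10, 100, 1000):
    # n * |F^n - approximation| should stay bounded as n grows
    print(n, F ** n, exp_approx(n, d, surv), n * abs(F ** n - exp_approx(n, d, surv)))
```

For dependent components only `surv` changes; the inclusion-exclusion sums stay the same.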

Combining Lemma 7.2.1 and (7.2.2) we obtain

Corollary 7.2.2. Let $\xi_n$ be a d-variate random vector with d.f. $F_n$. Define $h_{n,j}$ in analogy to $h_j$ in (7.2.3) with $\xi$ replaced by $\xi_n$. Suppose that the univariate marginals $F_{n,1}^n$ converge pointwise to a d.f. $G_{0,1}$. Moreover, for every $j = 1, \dots, d$,

$$n h_{n,j} \to h_{0,j}, \qquad n \to \infty, \text{ pointwise},$$

where $h_{0,j}$, $j = 1, \dots, d$, are right continuous functions. Then,

(i) $G_0 = \exp\big(\sum_{j=1}^d (-1)^j h_{0,j}\big)$ is a d.f.,

and

(ii) $F_n^n(x) \to G_0(x)$, $n \to \infty$, for every x.

The formulation of Lemma 7.2.1 and Corollary 7.2.2 is influenced by a recent result due to Hüsler and Reiss (1989) where maxima under multivariate normal vectors, with correlation coefficients $\rho(n)$ tending to 1 as $n \to \infty$, are studied. In the bivariate case the following result holds: If

$$(1 - \rho(n)) \log n \to \lambda^2, \qquad n \to \infty,$$

then the normalized distributions of maxima weakly converge to a d.f. $H_\lambda$ defined by

$$H_\lambda(x,y) = \exp\Big[-\Phi\Big(\lambda + \frac{x-y}{2\lambda}\Big)e^{-y} - \Phi\Big(\lambda + \frac{y-x}{2\lambda}\Big)e^{-x}\Big] \tag{7.2.4}$$

with

$$H_0 = \lim_{\lambda \downarrow 0} H_\lambda \qquad\text{and}\qquad H_\infty = \lim_{\lambda \uparrow \infty} H_\lambda.$$

If $\lambda = 0$, the marginal maxima are asymptotically completely dependent; if $\lambda = \infty$, we have asymptotic independence. Notice that $H_\lambda$ is max-stable and thus belongs to the usual class of multivariate extreme value d.f.'s.
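A small numerical sketch of (7.2.4) may help to see the two limiting regimes. It assumes standard Gumbel margins $\exp(-e^{-x})$, which is what (7.2.4) produces when one argument tends to infinity; the function names are illustrative. The printed values compare $H_\lambda$ for small and large λ with $G_{0,1}(\min(x, y))$ and $G_{0,1}(x)G_{0,1}(y)$, the bounds in (7.2.1).

```python
import math


def Phi(t):
    """Standard normal d.f., written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def gumbel(x):
    """Standard Gumbel d.f. exp(-e^{-x})."""
    return math.exp(-math.exp(-x))


def husler_reiss(x, y, lam):
    """H_lambda(x, y) from (7.2.4) for 0 < lam < infinity."""
    a = Phi(lam + (x - y) / (2.0 * lam)) * math.exp(-y)
    b = Phi(lam + (y - x) / (2.0 * lam)) * math.exp(-x)
    return math.exp(-a - b)


x, y = 0.5, -0.2
print("lambda -> 0  :", husler_reiss(x, y, 1e-3), "vs", gumbel(min(x, y)))
print("lambda = 1   :", husler_reiss(x, y, 1.0))
print("lambda large :", husler_reiss(x, y, 50.0), "vs", gumbel(x) * gumbel(y))
```

Max-stability can be checked in the same way: with Gumbel margins it reads $H_\lambda^t(x + \log t, y + \log t) = H_\lambda(x, y)$ for t > 0.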
Next (7.2.2) will be specialized to the bivariate case. Let $(\xi_n, \eta_n)$ be a random vector with d.f. $F_n$. The identical marginal d.f.'s are again denoted by $F_{n,1}$ and $F_{n,2}$. According to (7.2.2),

$$\sup_{(x,y)} \Big| F_n^n(x,y) - \exp\big(-n(1 - F_{n,1}(x)) - n(1 - F_{n,1}(y)) + nL_n(x,y)\big)\Big| \le C n^{-1} \tag{7.2.5}$$

where

$$L_n(x,y) = P\{\xi_n > x, \eta_n > y\}$$

is the bivariate survivor function. Assume that

$$F_{n,1}^n(x) \to G_{0,1}(x), \qquad n \to \infty, \tag{7.2.6}$$

for every x, where $G_{0,1}$ is a d.f. Then,

$$F_n^n(x,y) = \exp\big[-n(1 - F_{n,1}(x)) - n(1 - F_{n,1}(y)) + nL_n(x,y)\big] + O(n^{-1}) = G_{0,1}(x)G_{0,1}(y)\exp[nL_n(x,y)] + o(1). \tag{7.2.7}$$

Therefore, the asymptotic behavior of the bivariate survivor function is decisive for the asymptotic behavior of the bivariate maximum. The convergence rate in the univariate case and the convergence rate of the survivor functions determine the convergence rate for the bivariate maxima.

Asymptotic (Quadrant-) Independence


We discuss the particular situation where the term $nL_n(x,y)$ in (7.2.7) goes to zero as $n \to \infty$. The following result is a trivial consequence of (7.2.7).
Lemma 7.2.3. Assume that (7.2.6) holds. For every (x, y) with $G_{0,1}(x)G_{0,1}(y) > 0$ the following equivalence holds:

$$F_n^n(x,y) \to G_{0,1}(x)G_{0,1}(y), \qquad n \to \infty,$$

if, and only if,

$$nL_n(x,y) \to 0, \qquad n \to \infty. \tag{7.2.8}$$

Thus under condition (7.2.8) the marginal maxima $M_{n,1}$ and $M_{n,2}$ are asymptotically independent in the sense that $(M_{n,1}, M_{n,2})$ converges in distribution to a random vector with independent marginals.
Corollary 7.2.4. Let ξ and η be r.v.'s with common d.f. F such that $F^n(b_n + a_n\,\cdot) \to G$ weakly. Then, the pertaining normalized maxima $a_n^{-1}(M_{n,i} - b_n)$, i = 1, 2, are asymptotically independent if

$$\lim_{x \uparrow \omega(F)} P(\xi > x \mid \eta > x) = 0. \tag{7.2.9}$$

PROOF. Notice that $b_n + a_n x \uparrow \omega(F)$ and $n(1 - F(b_n + a_n x)) \to -\log G(x)$, $n \to \infty$, for $\alpha(G) < x < \omega(G)$, and hence the assertion is immediate from Lemma 7.2.3 applied to $\xi_n = a_n^{-1}(\xi - b_n)$ and $\eta_n = a_n^{-1}(\eta - b_n)$. □
It is well known that (7.2.9) is also necessary for the asymptotic independence. Moreover, Corollary 7.2.4 can easily be extended to the d-variate case
(see Galambos (1987), page 301, and Resnick (1987), Proposition 5.27).
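Condition (7.2.9) can be checked numerically for a standard bivariate normal vector, which is the case treated by Sibuya (1960). The sketch below is an illustration only: it computes $L(x,y) = P\{\xi > x, \eta > y\}$ by integrating the conditional normal law of ξ given η = z (the representation also used in Example 7.2.7) with Simpson's rule, and then evaluates $P(\xi > x \mid \eta > x)$ for increasing x. The truncation length and the number of grid points are ad hoc choices.

```python
import math


def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi_bar(t):
    """1 - Phi(t), via erfc to keep accuracy in the upper tail."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def joint_survivor(x, y, rho, span=12.0, steps=4000):
    """L(x, y) = P{xi > x, eta > y} for standard normal (xi, eta) with correlation rho.

    Given eta = z, xi is N(rho z, 1 - rho^2), so
        L(x, y) = int_y^inf (1 - Phi((x - rho z) / sqrt(1 - rho^2))) phi(z) dz,
    approximated by composite Simpson's rule on [y, y + span] (steps must be even).
    """
    s = math.sqrt(1.0 - rho * rho)
    g = lambda z: Phi_bar((x - rho * z) / s) * phi(z)
    h = span / steps
    total = g(y) + g(y + span)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(y + k * h)
    return total * h / 3.0


def tail_conditional(x, rho):
    """P(xi > x | eta > x), the quantity appearing in (7.2.9)."""
    return joint_survivor(x, x, rho) / Phi_bar(x)


for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(x, tail_conditional(x, 0.5))
```

The printed ratios decrease towards zero for ρ = 0.5; for ρ = 0 the ratio reduces to $1 - \Phi(x)$, and at x = y = 0 the integral can be compared with the closed form $1/4 + \arcsin(\rho)/(2\pi)$.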
Next, Lemma 7.2.3 will be applied to prove that, for multivariate extremes,
the asymptotic pairwise independence of the marginal maxima implies asymptotic independence.

Theorem 7.2.5. Assume that $(M_{n,1}, \dots, M_{n,d})$ converge in distribution to a d-variate random vector with d.f. $G_0$. Then, the asymptotic pairwise independence of the marginal maxima implies the asymptotic independence.
PROOF. The Bonferroni inequality (see P.2.4 and P.2.5) implies that

$$P\{M_n \le x\} \le \exp\Big(-\sum_{i=1}^d n\big(1 - F_{n,1}(x_i)\big) + \sum_{1 \le i < j \le d} nL_{n,i,j}(x_i, x_j)\Big) + o(1) = \Big(\prod_{i=1}^d G_{0,1}(x_i)\Big)\exp\Big(\sum_{1 \le i < j \le d} nL_{n,i,j}(x_i, x_j)\Big) + o(1)$$

where

$$L_{n,i,j}(x_i, x_j) = P\{\xi_i > x_i, \xi_j > x_j\}.$$

It remains to prove that

$$\exp\Big(\sum_{1 \le i < j \le d} nL_{n,i,j}(x_i, x_j)\Big) \to 1, \qquad n \to \infty,$$

for every x with $\prod_{i=1}^d G_{0,1}(x_i) > 0$. This, however, is obvious from the fact that according to Lemma 7.2.3 the pairwise independence implies

$$nL_{n,i,j}(x_i, x_j) \to 0, \qquad n \to \infty,$$

for every $1 \le i < j \le d$. □

As an immediate consequence of Theorem 7.2.5 one gets

Theorem 7.2.6. Let $\xi = (\xi_1, \dots, \xi_d)$ have a max-stable d.f. Then, the pairwise independence of $\xi_1, \dots, \xi_d$ implies the independence.

In fact a much stronger result holds, as pointed out to me by J. Hüsler. If the r.v.'s $\xi_1, \dots, \xi_d$ are uncorrelated and jointly have a max-stable d.f. then they are mutually independent (see P.7.2).

Rates for the Distance from Independence


For notational convenience we shall only study the bivariate case. From (7.2.5) and by noting that

$$\sup_x \big| F_{n,1}^n(x) - \exp\big(-n(1 - F_{n,1}(x))\big)\big| \le C n^{-1} \tag{7.2.10}$$

(compare with P.2.5(ii)) we get

$$F_n^n(x,y) = F_{n,1}^n(x)F_{n,1}^n(y)\exp[nL_n(x,y)] + O(n^{-1}). \tag{7.2.11}$$

From (7.2.11) we see that the term $nL_n(x,y)$ determines the rate at which the independence of the marginal maxima is attained.
It is apparent from the proof of Theorem 7.2.5 that (7.2.11) can easily be extended to the case d > 2.
Next (7.2.11) will be specialized to bivariate normal vectors. It was observed by Sibuya (1960) that the marginal maxima of i.i.d. normal random vectors are asymptotically independent. In the following example we shall calculate the rate at which the marginal maxima become quadrant-independent.

EXAMPLE 7.2.7. Let F be the d.f. of a normal vector (ξ, η) where ξ and η are standard normal r.v.'s. Let ρ denote the covariance of ξ and η where $-1 < \rho < 1$. Put $u_n(x) = b_n + b_n^{-1}x$ where again $b_n = n\varphi(b_n)$. Then, for every x, y,

$$F^n(u_n(x), u_n(y)) = \Phi^n(u_n(x))\,\Phi^n(u_n(y))\Big[1 + O\big(n^{-(1-\rho)/(1+\rho)}(\log n)^{-\rho/(1+\rho)}\big)\Big] + O(n^{-1}). \tag{7.2.12}$$

According to (7.2.11) we have to prove that

$$nL_n(x,y) = nL(u_n(x), u_n(y)) = O\big(n^{-(1-\rho)/(1+\rho)}(\log n)^{-\rho/(1+\rho)}\big).$$

It is well known that the normal distribution $N(\rho z, 1-\rho^2)$ is the conditional distribution of ξ given η = z. Thus,

$$nL(u_n(x), u_n(y)) = n\int_{u_n(y)}^{\infty}\Big(1 - N(\rho z, 1-\rho^2)\big(-\infty, u_n(x)\big]\Big)\varphi(z)\,dz$$

$$= \int_y^{\infty}\Big(1 - \Phi\big[(u_n(x) - \rho u_n(z))/(1-\rho^2)^{1/2}\big]\Big)\exp\big[-\big(z + z^2/(2b_n^2)\big)\big]\,dz$$

$$= O\big(b_n^{-1}\varphi(b_n)^{(1-\rho)/(1+\rho)}\big)$$

where the final step is carried out by using the inequality $1 - \Phi(x) \le \varphi(x)/x$, x > 0. We remark that for ρ > 0 the integration over z with $y \le (u_n(x) - \rho u_n(z))/(1-\rho^2)^{1/2} \le b_n$ has to be dealt with separately. Since $b_n = O((\log n)^{1/2})$ the proof can easily be completed.
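The order of magnitude in Example 7.2.7 can be explored numerically. The sketch below is an illustration with ad hoc truncation and grid choices, not part of the proof. It solves $b_n = n\varphi(b_n)$ by bisection and evaluates $nL(u_n(x), u_n(y))$ through the substituted integral in the example. For ρ = 0 that integral reduces to $n(1 - \Phi(u_n(x)))(1 - \Phi(u_n(y)))$, which serves as a check.

```python
import math


def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi_bar(t):
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def b_n(n):
    """Solve b = n * phi(b) by bisection; b - n * phi(b) is increasing for b > 0."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - n * phi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def n_survivor(n, x, y, rho, span=40.0, steps=8000):
    """n L(u_n(x), u_n(y)) with u_n(t) = b_n + t / b_n, computed as
    int_y^inf (1 - Phi[(u_n(x) - rho u_n(z)) / sqrt(1 - rho^2)]) exp[-(z + z^2 / (2 b_n^2))] dz
    by Simpson's rule on [y, y + span]."""
    b = b_n(n)
    s = math.sqrt(1.0 - rho * rho)
    u = lambda t: b + t / b
    g = lambda z: Phi_bar((u(x) - rho * u(z)) / s) * math.exp(-(z + z * z / (2.0 * b * b)))
    h = span / steps
    total = g(y) + g(y + span)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(y + k * h)
    return total * h / 3.0


rho = 0.5
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    rate = n ** (-(1 - rho) / (1 + rho)) * math.log(n) ** (-rho / (1 + rho))
    print(n, n_survivor(n, 0.0, 0.0, rho), rate)
```

The substitution $z \mapsto u_n(z)$ is exact only because $n\varphi(b_n) = b_n$, which is why $b_n$ is computed to machine precision here rather than replaced by an asymptotic expansion.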

Final Remarks
If one confines the attention to asymptotically independent r.v.'s then it is natural to replace, in a first step, the original marginal r.v.'s by some independent versions. The calculation of an upper bound of the Hellinger distance between the distribution of a multivariate maximum and the joint distribution of the independent versions of the marginals is an open problem. In a second step one could apply Lemma 3.3.10 and the results of Section 5.2 to obtain an upper bound of the Hellinger distance between the original distribution and a limit distribution.
If we analyze the density of the normalized bivariate maximum, with normalizing constants $a_n > 0$ and $b_n$, in the form as given in (2.2.8), we find that the decisive condition for the asymptotic independence, in the strong sense, is that the conditional d.f.'s $F_1(b_n + a_n x \mid b_n + a_n y)$ and $F_2(b_n + a_n y \mid b_n + a_n x)$ converge to 1 as $n \to \infty$. Recall that the related condition (7.2.9) yields the asymptotic independence in the weak sense.
In case of asymptotic independence the statistical results in the univariate case carry over to the multivariate case. If the marginals are asymptotically dependent then new statistical problems have to be solved (see e.g. P.2.11 and Bibliographical Notes).

P.7. Problems and Supplements

1. Denote by $M_{r(1,n),r(2,n)}$ the number of random vectors $\xi_i$ in the random quadrant $(-\infty, X^{(1)}_{r(1,n):n}) \times (-\infty, X^{(2)}_{r(2,n):n})$. Under the conditions of Theorem 7.1.1 the random vectors $(M_{r(1,n),r(2,n)}, X^{(1)}_{r(1,n):n}, X^{(2)}_{r(2,n):n})$ are asymptotically normal.

(Siddiqui, 1960)

2. (i) Let $\xi = (\xi_1, \dots, \xi_d)$ have a max-stable d.f. Then, $\xi_1, \dots, \xi_d$ are associated, that is, $\mathrm{cov}(g(\xi), f(\xi)) \ge 0$ for all componentwise nondecreasing, real-valued functions f, g, whenever the relevant expectations exist.
(Marshall and Olkin, 1983; for an extension see Resnick, 1987)
(ii) If $\xi_1, \dots, \xi_d$ are associated and uncorrelated then they are mutually independent.
(Joag-Dev, 1983)

Bibliographical Notes
Under slightly stronger conditions than those stated in Theorems 7.1.1 and 7.1.2, Weiss (1964) proved the asymptotic normality of the d.f.'s of multivariate central order statistics. The proof is based on the normal approximation of the multinomial distribution.
The asymptotic normality of multivariate central order statistics was already proved by Mood (1941), in the special case of sample medians, and by Siddiqui (1960). In both articles the exact densities of the order statistics are computed. By using the normal approximation of the multinomial distribution it is then shown that the densities converge pointwise to the normal density. Thus, according to the Scheffé lemma, one also gets the convergence in the variational distance. Kuan and Ali (1960) verified the joint asymptotic normality of multivariate order statistics, including the case where several order statistics are taken from each component. It is evident that such ordered values define a grid in the Euclidean d-space. The frequencies of sample points in the cells of the grid define further r.v.'s. Weiss (1982) proved the joint asymptotic normality of multivariate order statistics and such associated cell frequencies.
The research work on multivariate maxima of i.i.d. random vectors started with the articles of J. Tiago de Oliveira (1958), J. Geffroy (1958/59), and M. Sibuya (1960). In the literature, further reference is given to Finkelstein (1953).
From the beginning much attention was focused on the case where the marginal maxima are asymptotically independent. It was observed by S.M. Berman (1961) that for the components of an extreme value vector the independence is equivalent to the pairwise independence. In this context one also has to note that the marginal maxima are asymptotically, mutually (quadrant-) independent when, and only when, this is true for each pair of marginal maxima [see e.g. Galambos (1987, Corollary 5.3.1) or Resnick (1987, Proposition 5.27)].
If measurements of a certain phenomenon are made at places close together then there will be a certain dependence between the observations which, in the present context, are supposed to be maxima. From the results of Section 2.2 it is apparent that the family of max-stable distributions is large enough to serve as a model for this situation. One may argue that this model is even
so large that the problem has to be tackled of finding a smaller nonparametric or a parametric model. If one has some knowledge of the mechanism underlying the maxima, then, speaking in mathematical terms, a limit theorem for maxima will single out certain max-stable distributions.
However, one has to face the difficulty of finding "attractive" multivariate distributions under which the asymptotic distribution of maxima reflects the dependence of the observed marginal maxima. In this context, the result obtained by Hüsler and Reiss (1989) (see (7.2.4)) looks promising: Distributions of bivariate maxima are studied under normal distributions where the correlation coefficient ρ(n) varies as the sample size increases. In the limit one obtains a family of max-stable distributions describing situations between independence and complete dependence.
We refer to Tiago de Oliveira (1984) for a review of parametric submodels of bivariate max-stable d.f.'s. The nonparametric approach of Pickands for estimating the dependence function (see P.2.11) has been pursued further by Smith (1985b) by introducing the smoothing technique to multivariate extreme value theory. This work has been continued in Smith et al. (1987) where the kernel method is applied to the estimation of max-stable d.f.'s.

PART III

STATISTICAL MODELS AND PROCEDURES

CHAPTER 8

Evaluating the Quantile and Density Quantile Function

In this chapter

(a) we start with the "pure" nonparametric, statistical model,
(b) introduce smoothness conditions.

In Chapter 9 this discussion will be continued by studying

(c) semi-parametric models,
(d) parametric extreme value models.
As pointed out in the Introduction the sample q.f. $F_n^{-1}$ is the natural estimator of the underlying q.f. $F^{-1}$. In Section 8.1 some results will be collected which concern the statistical performance of sample quantiles. It will be shown, in particular, that statistical procedures built on sample quantiles are optimal if the model is large enough.
Given the information that the unknown q.f. $F^{-1}$ is a smooth function one should not use step functions generated by the sample q.f. as estimates of the q.f. Consequently, two different classes of kernel type estimators will be introduced in Section 8.2.
The first class of estimators is obtained by smoothing the sample q.f. by means of a kernel. The second class of estimators is established in analogy to the construction of the sample q.f. as the "inverse" of the sample d.f.: Take the "inverse" of the kernel type estimator of the d.f. Derivatives of the kernel type estimates of the q.f. will be appropriate estimates of the density quantile function $(F^{-1})' = 1/f(F^{-1})$ where f is the density of F.

8.1. Sample Quantiles


In this section we shall primarily study results for a fixed sample size. The statistical procedures for evaluating the unknown q-quantile will be optimal if the underlying model is large enough. The test, estimation, and confidence procedures have to be randomized to satisfy the usual requirements in an exact way (e.g. attainment of a level or median unbiasedness).

One-Sided Test of Quantiles

Let $X_{i:n}$ be the ith order statistic of n i.i.d. random variables $\xi_1, \xi_2, \dots, \xi_n$ with common continuous d.f. F. A basic problem is to test the null-hypothesis $F^{-1}(q) \le u$ against $F^{-1}(q) > u$.

We shall briefly summarize some well-known facts concerning tests based on sample quantiles.
Given α, q ∈ (0, 1) and a positive integer n, let $r(\alpha) \equiv r(\alpha, q, n)$ be the largest integer $r \in \{0, \dots, n\}$ such that

$$\sum_{i=0}^{r-1} \binom{n}{i} q^i (1-q)^{n-i} \le \alpha. \tag{8.1.1}$$

Notice that the left-hand side of (8.1.1) is equal to $P\{X_{r:n} > F^{-1}(q)\}$. Keep in mind that r(α) also depends on q and n. Put $X_{0:n} = -\infty$ and $X_{n+1:n} = \infty$.

It is clear that $\{X_{r(\alpha):n} > u\}$ is a critical region of level α for testing $F^{-1}(q) \le u$ against $F^{-1}(q) > u$; however, the level α will not be attained on the null-hypothesis except in those cases where equality holds in (8.1.1).
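To define a test which is similar on $\{F: F^{-1}(q) = u\}$ we introduce a randomized test procedure based on two order statistics. Define the critical function φ by

$$\varphi = \begin{cases} 1 & \text{if } X_{r(\alpha):n} > u,\\ \gamma(\alpha) & \text{if } X_{r(\alpha):n} \le u,\ X_{r(\alpha)+1:n} > u,\\ 0 & \text{if } X_{r(\alpha)+1:n} \le u, \end{cases} \tag{8.1.2}$$

where $\gamma(\alpha) \equiv \gamma(\alpha, q, n)$ is the unique solution of the equation

$$\sum_{i=0}^{r(\alpha)-1} \binom{n}{i} q^i (1-q)^{n-i} + \gamma \binom{n}{r(\alpha)} q^{r(\alpha)} (1-q)^{n-r(\alpha)} = \alpha \tag{8.1.3}$$

with $0 \le \gamma < 1$.
Simple calculations show that, if $F^{-1}(q) = u$, the left-hand side of (8.1.3) is equal to

$$E_F\varphi = P\{X_{r(\alpha):n} > u\} + \gamma P\{X_{r(\alpha):n} \le u,\ X_{r(\alpha)+1:n} > u\}.$$

We have

$$E_F\varphi \begin{cases} \le \alpha & \text{if } F^{-1}(q) \le u,\\ = \alpha & \text{if } F^{-1}(q) = u. \end{cases} \tag{8.1.4}$$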

Moreover,

$$E_F\varphi = \alpha \qquad \text{if } F(u) = q.$$
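The constants $r(\alpha)$ and $\gamma(\alpha)$ in (8.1.1) and (8.1.3) are elementary binomial computations. The following Python sketch (function names are illustrative) computes them, evaluates the critical function (8.1.2) on a sample with the conventions $X_{0:n} = -\infty$ and $X_{n+1:n} = \infty$, and checks that the size equals α when F(u) = q. In that case the number $S_n$ of observations ≤ u is binomial(n, q).

```python
import math


def binom_pmf(n, i, q):
    return math.comb(n, i) * q ** i * (1.0 - q) ** (n - i)


def r_gamma(alpha, q, n):
    """r(alpha) = largest r in {0, ..., n} with sum_{i<r} b(n, q; i) <= alpha, see (8.1.1),
    and gamma(alpha) in [0, 1) solving (8.1.3)."""
    r, cum = 0, 0.0  # cum = sum_{i=0}^{r-1} b(n, q; i)
    while r < n and cum + binom_pmf(n, r, q) <= alpha:
        cum += binom_pmf(n, r, q)
        r += 1
    gamma = (alpha - cum) / binom_pmf(n, r, q)
    return r, gamma


def critical_function(sample, u, r, gamma):
    """phi from (8.1.2); x[i] is X_{i:n} with X_{0:n} = -inf and X_{n+1:n} = +inf."""
    x = [-math.inf] + sorted(sample) + [math.inf]
    if x[r] > u:
        return 1.0
    if x[r + 1] > u:
        return gamma
    return 0.0


def size_at_boundary(alpha, q, n):
    """E_F phi when F(u) = q: P{S_n < r} + gamma P{S_n = r}, with S_n ~ Bin(n, q)."""
    r, gamma = r_gamma(alpha, q, n)
    return sum(binom_pmf(n, i, q) for i in range(r)) + gamma * binom_pmf(n, r, q)


print(r_gamma(0.05, 0.5, 20), size_at_boundary(0.05, 0.5, 20))
```

Note that $r(\alpha) = 0$ is possible for small n; the padding with $\pm\infty$ then makes φ reject only through the randomized branch.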

The critical function φ as defined in (8.1.2) is uniformly most powerful for testing $F^{-1}(q) \le u$ against $F^{-1}(q) > u$. To prove this consider the simple testing problem $F_0$ against $F_1$ where $F_1$ is a d.f. with $0 < q_1 := F_1(u) < q$; notice that $F_1(u) < q$ is equivalent to $F_1^{-1}(q) > u$. Define the d.f. $F_0$ via $f_0$ by $F_0(t) = \int_{-\infty}^t f_0\,dF_1$ where

$$f_0 = \frac{q}{q_1}1_{(-\infty,u]} + \frac{1-q}{1-q_1}1_{(u,\infty)}. \tag{8.1.5}$$

Denote by $Q_i$ the probability measures belonging to $F_i$. Then, $f_0$ is the $Q_1$-density of $Q_0$. Easy calculations show that $F_0(u) = q$ and hence $F_0^{-1}(q) \le u$. It will turn out that $F_0$ is a "least-favorable" null-hypothesis.
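Lemma 8.1.1. φ as defined in (8.1.2) is a most powerful critical function of level α for testing $F_0$ against $F_1$.
PROOF. In view of the Fundamental Lemma of Neyman and Pearson it suffices to prove that

$$\varphi = \begin{cases} 1 & \text{if } 1 > c\prod_{i=1}^n f_0(\xi_i),\\ 0 & \text{if } 1 < c\prod_{i=1}^n f_0(\xi_i), \end{cases}$$

for some c > 0. Put $S_n = \sum_{i=1}^n 1_{(-\infty,u]}(\xi_i)$. We have

$$\prod_{i=1}^n f_0(\xi_i) = \left(\frac{q(1-q_1)}{q_1(1-q)}\right)^{S_n}\left(\frac{1-q}{1-q_1}\right)^n$$

and hence

$$1 \gtrless c\prod_{i=1}^n f_0(\xi_i) \quad\text{iff}\quad S_n \lessgtr r(\alpha)$$

where c > 0 is defined by the equation

$$r(\alpha) = \Big[-\log c - n\log\frac{1-q}{1-q_1}\Big]\Big/\log\frac{q(1-q_1)}{q_1(1-q)}.$$

From (1.1.8) we know that

$$S_n < r(\alpha) \text{ iff } X_{r(\alpha):n} > u \qquad\text{and}\qquad S_n > r(\alpha) \text{ iff } X_{r(\alpha)+1:n} \le u,$$

and hence φ is of the desired form. □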

Corollary 8.1.2. The critical function φ defined in (8.1.2) is uniformly most powerful of level α for testing $F^{-1}(q) \le u$ against $F^{-1}(q) > u$.

PROOF. Obvious from (8.1.4) and Lemma 8.1.1 since the d.f. $F_0$ defined in Lemma 8.1.1 is continuous. □

For k = 1, 2, 3, ... or k = ∞ we define $\mathcal{F}_k$ as the family of all d.f.'s F which possess the following properties:

(i) F has a (Lebesgue) density f,
(ii) f > 0 on $(\alpha(F), \omega(F))$,
(iii) f has k bounded derivatives on $(\alpha(F), \omega(F))$. (8.1.6)

The crucial point of the conditions above is that the derivatives need not be uniformly bounded over the given model.
Lemma 8.1.3. Let k = 1, 2, ... or k = ∞ be fixed. Then, φ as defined in (8.1.2) is a uniformly most powerful critical function of level α for testing $F^{-1}(q) \le u$ against $F^{-1}(q) > u$ with $F \in \mathcal{F}_k$.
PROOF. Notice that $F_0$ (see the line before (8.1.5)) does not belong to $\mathcal{F}_k$. If $f_1$ is the density of $F_1 \in \mathcal{F}_k$ then $F_0$ has the density

$$f_0 = f_1\left(\frac{q}{q_1}1_{(-\infty,u]} + \frac{1-q}{1-q_1}1_{(u,\infty)}\right). \tag{8.1.7}$$

Since $q_1 < q$ it is clear that $f_0$ has a jump at u, thus $F_0 \notin \mathcal{F}_k$. To make Lemma 8.1.1 applicable to the case k ≥ 1 one can choose d.f.'s $G_m \in \mathcal{F}_k$ with $G_m^{-1}(q) = u$ having densities $g_m$ such that $g_m(x) \to f_0(x)$ as $m \to \infty$ for every $x \neq u$. Then, applying Fatou's lemma, one can prove that every critical function ψ of level α on $\{F \in \mathcal{F}_k: F^{-1}(q) \le u\}$ has the property $E_{F_0}\psi \le \alpha$. Thus, Lemma 8.1.1 yields $E_{F_1}\psi \le E_{F_1}\varphi$ and hence, φ is uniformly most powerful. □

Randomized Estimators of Quantiles


Whereas randomized test procedures expressed in the form of critical functions are widely accepted in statistics this cannot be said of randomized
estimators. Therefore, we keep our explanations here as short as possible.
Nevertheless, we hope that the following lines and some further details in the
Supplements will create some interest.
Recall that the randomized sample median was defined in (1.7.19) as the Markov kernel

$$M_n(\cdot\,|\,\cdot) = \big(\varepsilon_{X_{[(n+1)/2]:n}} + \varepsilon_{X_{[n/2]+1:n}}\big)/2 \tag{8.1.8}$$

where $\varepsilon_x$ again denotes the Dirac measure with mass 1 at x.
In Lemma 1.7.10 it was proved that $M_n$ is median unbiased; that is, the median of the underlying distribution is a median of the distribution of the Markov kernel $M_n$. In analogy to (8.1.8) one can also construct a randomized

sample q-quantile which is a median unbiased estimator of the unknown q-quantile.
Given q ∈ (0, 1) and the sample size n let

$$r \equiv r(1/2, q, n) \qquad\text{and}\qquad \gamma \equiv \gamma(1/2, q, n)$$

be defined as in (8.1.1) and (8.1.3). Define the randomized estimator $Q_n$ by

$$Q_n(\cdot\,|\,\cdot) = (1 - \gamma)\,\varepsilon_{X_{r:n}} + \gamma\,\varepsilon_{X_{r+1:n}} \tag{8.1.9}$$

where $X_{r:n}$ is the rth order statistic of n i.i.d. random variables with common continuous d.f. F.
From the results concerning test procedures one can deduce by routine
calculations that the randomized sample q-quantile is an optimal estimator
of the q-quantile in the class of all randomized, median unbiased estimators
which are equivariant under translations. Non-randomized estimators will be
studied at the end of this section.
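A sketch of the randomized sample q-quantile (8.1.9) follows, with r and γ computed from (8.1.1) and (8.1.3) at α = 1/2. The helper names are hypothetical, and the conventions $X_{0:n} = -\infty$, $X_{n+1:n} = \infty$ are kept, so for very small samples the estimator may indeed return an infinite value. The exact probability $P\{Q_n > F^{-1}(q)\}$ equals 1/2 because the number of observations ≤ $F^{-1}(q)$ is binomial(n, q) for continuous F.

```python
import math
import random


def binom_pmf(n, i, q):
    return math.comb(n, i) * q ** i * (1.0 - q) ** (n - i)


def r_gamma(alpha, q, n):
    """r(alpha, q, n) and gamma(alpha, q, n) as in (8.1.1) and (8.1.3)."""
    r, cum = 0, 0.0
    while r < n and cum + binom_pmf(n, r, q) <= alpha:
        cum += binom_pmf(n, r, q)
        r += 1
    return r, (alpha - cum) / binom_pmf(n, r, q)


def randomized_quantile(sample, q, rng):
    """One realization of Q_n in (8.1.9): X_{r:n} w.p. 1 - gamma, X_{r+1:n} w.p. gamma."""
    r, gamma = r_gamma(0.5, q, len(sample))
    x = [-math.inf] + sorted(sample) + [math.inf]  # x[i] = X_{i:n}
    return x[r + 1] if rng.random() < gamma else x[r]


def prob_exceeds(q, n):
    """Exact P{Q_n > F^{-1}(q)} for continuous F: P{S < r} + gamma P{S = r}, S ~ Bin(n, q)."""
    r, gamma = r_gamma(0.5, q, n)
    return sum(binom_pmf(n, i, q) for i in range(r)) + gamma * binom_pmf(n, r, q)


rng = random.Random(1)
print([randomized_quantile([2.3, 0.7, 1.9, 3.4], 0.5, rng) for _ in range(5)])
```

For q = 1/2 and odd n one gets γ = 0, i.e. the ordinary sample median; for even n the two middle order statistics are chosen with probability 1/2 each, in agreement with (8.1.8).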

Randomized One-Sided Confidence Procedures


Another relevant source is Chapter 12 in Pfanzagl (1985). There the quantiles serve as an example of an irregular functional in the sense that the standard theory of 2nd order efficiency is not applicable. This is due to the fact that for this particular functional a certain 2nd derivative does not exist. Hence, a direct approach is necessary to establish upper bounds for the 2nd order efficiency of the relevant statistical procedures.
Randomized statistics of the form (8.1.9) with $r \equiv r(1-p, q, n)$ and $\gamma \equiv \gamma(1-p, q, n)$ also define randomized, one-sided confidence procedures where the lower confidence bound is $X_{r:n}$ with probability $1 - \gamma$ and $X_{r+1:n}$ with probability γ. These confidence procedures are optimal among all procedures that exactly attain the confidence level p. Pfanzagl proves that the asymptotic efficiency still holds within an error bound of order $o(n^{-1/2})$ in the class of all confidence procedures attaining the confidence level $p + o(n^{-1/2})$ uniformly in a local sense (compare with Pfanzagl (1985), Proposition 12.3.3). A corresponding result can be proved for test and estimation procedures.

Estimator Based on a Convex Combination of Two Consecutive Order Statistics

For some fixed q ∈ (0, 1) define

$$\hat q_n = (1 - \gamma(n))X_{r(n):n} + \gamma(n)X_{r(n)+1:n} \tag{8.1.10}$$

where $r(n) \equiv r(q, n) \in \{1, \dots, n\}$ and $\gamma(n) \in [0, 1)$ satisfy the equation

$$nq - r - \gamma + (1 + q)/3 = 0. \tag{8.1.11}$$

Put $\sigma^2 = q(1-q)$. Under the conditions of Theorem 6.2.4 we get

$$\sup_t \left| P\left\{\frac{n^{1/2} f(F^{-1}(q))}{\sigma}\big(\hat q_n - F^{-1}(q)\big) \le t\right\} - \left(\Phi(t) - n^{-1/2}\varphi(t)\left[\frac{1-2q}{3\sigma} - \frac{\sigma f'(F^{-1}(q))}{2f(F^{-1}(q))^2}\right]t^2\right)\right| = o(n^{-1/2}). \tag{8.1.12}$$

It is immediate from (8.1.12), since the correction term vanishes at t = 0, that $\hat q_n$ is median unbiased of order $o(n^{-1/2})$.
Moreover, notice that $\hat q_n$ is equivariant under translations; that is, shifting the observations amounts to the same as shifting the distribution of $\hat q_n$. One can prove that $\hat q_n$ is optimal in the class of all estimators that are equivariant under translations and median unbiased of order $o(n^{-1/2})$. The related result for confidence intervals is proved in Pfanzagl (1985), Proposition 12.3.9.
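Solving (8.1.11) for r + γ gives the position $nq + (1+q)/3$, so $\hat q_n$ interpolates linearly between the two order statistics adjacent to that position. The sketch below implements (8.1.10) in Python. The clamping at the ends of the sample is our own convention for the cases where (8.1.11) has no solution with 1 ≤ r ≤ n. As far as we know, the same position rule is listed as "type 8" in the sample-quantile taxonomy of Hyndman and Fan, but that identification is ours, not the text's.

```python
import math


def convex_quantile(sample, q):
    """q_hat_n from (8.1.10): (1 - gamma) X_{r:n} + gamma X_{r+1:n},
    where r + gamma = n q + (1 + q) / 3 by (8.1.11), r integer, 0 <= gamma < 1."""
    x = sorted(sample)
    n = len(x)
    pos = n * q + (1.0 + q) / 3.0
    r = math.floor(pos)
    gamma = pos - r
    # Clamping (our assumption): outside 1 <= r <= n - 1 fall back to the extremes.
    if r < 1:
        return x[0]
    if r >= n:
        return x[-1]
    return (1.0 - gamma) * x[r - 1] + gamma * x[r]  # x[r - 1] is X_{r:n}


print(convex_quantile([2.3, 0.7, 1.9, 3.4, 2.8], 0.5))
```

Translation equivariance is visible directly: adding a constant to every observation adds the same constant to the returned value.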
In the present section the statistical procedures are, roughly speaking, based on the sample q-quantile. These procedures possess an optimality property because the class of competitors was restricted by strong conditions like exact median unbiasedness or median unbiasedness of order $o(n^{-1/2})$. If these conditions are weakened then one can find better procedures. We refer to Section 8.3 for a continuation of this discussion.

8.2. Kernel Type Estimators of Quantiles


Recall that the sample q-quantile $F_n^{-1}(q)$ is given by

$$F_n^{-1}(q) = X_{i:n} \quad\text{if } (i-1)/n < q \le i/n \text{ and } q \in (0,1), \qquad i = 1, \dots, n.$$

Thus, $F_n^{-1}$ generates increasing step functions which have jumps at the points i/n for i = 1, ..., n - 1. Throughout we define $F_n^{-1}(0) = F_n^{-1}(0+) = X_{1:n}$ and $F_n^{-1}(1) = F_n^{-1}(1-) = X_{n:n}$.
If the underlying q.f. $F^{-1}$ is continuous or differentiable then it is desirable to construct functions as estimates which share this property. Moreover, the information that $F^{-1}$ is a smooth curve should be utilized to obtain estimators of a better statistical performance than that of the sample q.f. $F_n^{-1}$. The key idea will be to average over the order statistics close to the sample q-quantile for every q ∈ (0, 1).

The Polygon
In a first step we construct a piecewise linear version of the sample q.f. $F_n^{-1}$ by means of linear interpolation. Thus, given a predetermined partition $0 = q_0 < q_1 < \dots < q_k < q_{k+1} = 1$ we get an estimator of the form

$$F_n^{-1}(q_{j-1}) + \frac{q - q_{j-1}}{q_j - q_{j-1}}\big[F_n^{-1}(q_j) - F_n^{-1}(q_{j-1})\big], \qquad q_{j-1} \le q \le q_j. \tag{8.2.1}$$

For j = 2, ..., k we may take values $q_j$ such that $q_j - q_{j-1} = \beta$ for some appropriate "bandwidth" β > 0. This estimator evaluated at q is equal to the sample q-quantile if $q = q_j$ and equal to $[F_n^{-1}(q - \beta/2) + F_n^{-1}(q + \beta/2)]/2$ if $q = (q_{j-1} + q_j)/2$ for j = 2, ..., k. Notice that the derivative of the polygon is equal to $[F_n^{-1}(q_j) - F_n^{-1}(q_{j-1})]/\beta$ on $(q_{j-1}, q_j)$ for j = 2, ..., k.
Moving Scheme
This gives reason to construct another estimator of $F^{-1}$ by using a "moving scheme." For every q ∈ (0, 1) define the estimator of $F^{-1}(q)$ by

$$\big[F_n^{-1}(q - \beta(q)) + F_n^{-1}(q + \beta(q))\big]/2 \tag{8.2.2}$$

where the "bandwidth function" β(q) has to be defined in such a way that $0 \le q - \beta(q) < q + \beta(q) \le 1$. Given a predetermined value β ∈ (0, 1/2) the bandwidth function β(q) can e.g. be defined by

$$\beta(q) = \begin{cases} q & \text{if } 0 < q < \beta,\\ \beta & \text{if } \beta \le q \le 1 - \beta,\\ 1 - q & \text{if } 1 - \beta < q < 1. \end{cases} \tag{8.2.3}$$

Another reasonable choice of a bandwidth function is

$$\beta(q) = \begin{cases} q - q^2/(4\beta) & \text{if } 0 < q < 2\beta,\\ \beta & \text{if } 2\beta \le q \le 1 - 2\beta,\\ (1 - q) - (1 - q)^2/(4\beta) & \text{if } 1 - 2\beta < q < 1, \end{cases} \tag{8.2.4}$$

where it is assumed that β ≤ 1/4. Notice that the bandwidth function in (8.2.4) is differentiable.
The use of bandwidths depending on q can be justified by the following arguments:
Since $F_n^{-1}(q)$ is the natural, nonparametric estimator of $F^{-1}(q)$ it is clear that (8.2.2) defines an estimator of $[F^{-1}(q - \beta(q)) + F^{-1}(q + \beta(q))]/2$ which in turn is approximately equal to $F^{-1}(q)$ if $F^{-1}$ is a smooth function near q and if β(q) is not too large. However, if q is close to one of the endpoints of the domain of $F^{-1}$, then one has to be cautious. If q or 1 - q is small then the usual q.f.'s (e.g. normal or exponential) do not fulfill the required smoothness condition. Thus, without further information about the form of the q.f. at the endpoints of (0, 1) a statistician should again adopt the sample q.f. or any estimator close to the sample q.f. This aim is achieved by using bandwidths as defined above.
The use of variable bandwidths also enters the scene when a pointwise optimal bandwidth (depending on the underlying d.f.) is estimated from the data. In this case the bandwidth is random and depends on the given argument q.
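The sample q.f., the polygon (8.2.1), the bandwidth functions (8.2.3) and (8.2.4), and the moving scheme (8.2.2) are easy to code. The following Python sketch is one possible implementation; the names are illustrative. The small tolerance in `sample_qf` is an implementation detail of ours that protects the rule $(i-1)/n < q \le i/n$ against floating-point rounding such as 10 · 0.3 = 3.0000000000000004.

```python
import math


def sample_qf(x_sorted, q):
    """F_n^{-1}(q) = X_{i:n} for (i - 1)/n < q <= i/n, with F_n^{-1}(0) = X_{1:n}, F_n^{-1}(1) = X_{n:n}."""
    n = len(x_sorted)
    i = math.ceil(n * q - 1e-12)  # tolerance against rounding of n * q
    return x_sorted[min(max(i, 1), n) - 1]


def beta_83(q, beta):
    """Bandwidth function (8.2.3)."""
    if q < beta:
        return q
    if q <= 1.0 - beta:
        return beta
    return 1.0 - q


def beta_84(q, beta):
    """Differentiable bandwidth function (8.2.4); requires beta <= 1/4."""
    if q < 2.0 * beta:
        return q - q * q / (4.0 * beta)
    if q <= 1.0 - 2.0 * beta:
        return beta
    return (1.0 - q) - (1.0 - q) ** 2 / (4.0 * beta)


def polygon(x_sorted, q, knots):
    """(8.2.1): linear interpolation of F_n^{-1} between the knots 0 = q_0 < ... < q_{k+1} = 1."""
    for a, b in zip(knots, knots[1:]):
        if a <= q <= b:
            fa, fb = sample_qf(x_sorted, a), sample_qf(x_sorted, b)
            return fa + (q - a) / (b - a) * (fb - fa)
    raise ValueError("q must lie in [0, 1]")


def moving_scheme(x_sorted, q, bandwidth):
    """(8.2.2): [F_n^{-1}(q - beta(q)) + F_n^{-1}(q + beta(q))] / 2."""
    b = bandwidth(q)
    return 0.5 * (sample_qf(x_sorted, q - b) + sample_qf(x_sorted, q + b))


xs = sorted([0.12, 1.4, 0.33, 2.7, 0.05, 0.9, 0.61, 1.8])
print(moving_scheme(xs, 0.5, lambda q: beta_84(q, 0.1)), polygon(xs, 0.5, [0.0, 0.3, 0.6, 1.0]))
```

Both bandwidth functions satisfy β(q) ≤ min(q, 1 - q), so the moving scheme never evaluates $F_n^{-1}$ outside [0, 1] and collapses to the sample q.f. at the endpoints.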

The polygon (Figure 8.2.1) and the moving scheme (Figure 8.2.2) are based on n = 50 pseudo standard exponential random numbers. $F^{-1}$ is the standard exponential q.f.

Figure 8.2.1. $F^{-1}$, $F_n^{-1}$, polygon with n = 50, β = 0.1.

Figure 8.2.2. $F^{-1}$, moving scheme with n = 50, β = 0.1.

Quasi-Quantiles and Trimmed Means


The estimator in (8.2.2) can be written

$$(X_{r(q):n} + X_{s(q):n})/2. \tag{8.2.5}$$

If $n(q - \beta(q))$ and $n(q + \beta(q))$ are not integers then we have $r(q) = \max(1, [n(q - \beta(q))] + 1)$ and $s(q) = \min(n, [n(q + \beta(q))] + 1)$.
Another ad hoc estimator of the q-quantile is a certain "trimmed mean" defined by

$$(s(q) - r(q) + 1)^{-1}\sum_{i=r(q)}^{s(q)} X_{i:n}. \tag{8.2.6}$$

To extend the class of estimators of the q-quantile we introduce estimators of the form

$$\sum_{i=1}^n a_{i,n}(q)X_{i:n} \tag{8.2.7}$$

where the scores $a_{i,n}(q)$ satisfy the condition

$$\sum_{i=1}^n a_{i,n}(q) = 1. \tag{8.2.8}$$

Within this class of estimators we shall study those where the scores are defined by a kernel. The "trimmed mean" will be closely related to a kernel estimator which is based on a uniform kernel.

The Kernel Method


Since we shall also need a kernel estimator $F_{n,0}$ of the d.f. F we discuss the method of smoothing a function via a kernel within a general framework. Notice that the q.f. of $F_{n,0}$ will be another competitor of the sample q.f. as an estimator of the underlying q.f. $F^{-1}$.
Hereafter, let H be a real-valued function with domain (a, b). Particular cases are q.f.'s and d.f.'s with domain (0, 1) and, respectively, the real line. We say that a real-valued function k with domain (a, b) × (a, b) is a kernel if for every x ∈ (a, b),

$$\int_a^b k(x,y)\,dy = 1. \tag{8.2.9}$$

Given an initial estimator $H_n$ define

$$H_{n,0}(x) = \int_a^b k(x,y)H_n(y)\,dy. \tag{8.2.10}$$

By partial integration we get the representation

$$H_{n,0}(x) = \int_a^b K(x,y)\,dH_n(y) + H_n(a+) \tag{8.2.11}$$

if $H_n(a+)$ and $H_n(b-)$ exist and are finite, where the function K is defined by

$$K(x,z) = \int_z^b k(x,y)\,dy. \tag{8.2.12}$$

We shall study special kernels of the form

$$k(x,y) = \frac{1}{\beta(x)}\,u\!\left(\frac{x-y}{\beta(x)}\right). \tag{8.2.13}$$

The function u is again called a kernel.

Kernel Estimators of Q.F.

Applying (8.2.10) to the sample q.f. we obtain the kernel q.f.

$$F_{n,0}^{-1}(q) = \int_0^1 k(q,y)F_n^{-1}(y)\,dy = \sum_{i=1}^n a_{i,n}(q)X_{i:n} \tag{8.2.14}$$

where the score functions $a_{i,n}$ are given by

$$a_{i,n}(q) = \int_{(i-1)/n}^{i/n} k(q,y)\,dy.$$

Obviously, condition (8.2.9) implies that the scores $a_{i,n}(q)$ satisfy condition (8.2.8).
Let u have the properties $\int u(y)\,dy = 1$ and u(x) = 0 for |x| > 1. Moreover, assume that the bandwidth function β satisfies the condition $\beta(q) \le \min(q, 1-q)$; e.g. the bandwidth functions in (8.2.3) and (8.2.4) satisfy this condition. Then the kernel k defined in (8.2.13) satisfies (8.2.9). Now, $F_{n,0}^{-1}$ can be written in the form

$$F_{n,0}^{-1}(q) = \int_{-1}^{1} u(y)\,F_n^{-1}\big(q - \beta(q)y\big)\,dy. \tag{8.2.15}$$

For β(q), defined in (8.2.3) and (8.2.4), the function $q \to q - \beta(q)y$ is nondecreasing for every |y| ≤ 1, showing that $F_{n,0}^{-1}$ is nondecreasing if u ≥ 0. Thus, $F_{n,0}^{-1}$ is in fact a q.f. Moreover, this construction has the favorable property that the range of $F_{n,0}^{-1}$ is a subset of the support of the underlying d.f. F.
Writing $U(z) = \int_{-1}^{z} u(y)\,dy$ we have

$$F_{n,0}^{-1}(q) = \sum_{i=1}^n \left[U\!\left(\frac{q - (i-1)/n}{\beta(q)}\right) - U\!\left(\frac{q - i/n}{\beta(q)}\right)\right]X_{i:n}. \tag{8.2.16}$$

It is easy to verify that the coefficients are equal to zero if $i \le n(q - \beta(q))$ or $i \ge n(q + \beta(q)) + 1$.
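Formula (8.2.16) translates directly into code. The Python sketch below assumes the Epanechnikov kernel $u(y) = (3/4)(1 - y^2)1_{[-1,1]}(y)$, for which $U(z) = 1/2 + 3z/4 - z^3/4$ on [-1, 1] (this is the kernel used for the illustrations in this section), together with the bandwidth function (8.2.3); any other kernel vanishing outside [-1, 1] could be substituted. The check on the scores reflects (8.2.8): their sum telescopes to $U(q/\beta(q)) - U((q-1)/\beta(q)) = 1$ whenever $\beta(q) \le \min(q, 1-q)$.

```python
def U_epan(z):
    """U(z) = int_{-1}^{z} (3/4)(1 - y^2) dy for the Epanechnikov kernel."""
    if z <= -1.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return 0.5 + 0.75 * z - 0.25 * z ** 3


def beta_83(q, beta):
    """Bandwidth function (8.2.3)."""
    if q < beta:
        return q
    if q <= 1.0 - beta:
        return beta
    return 1.0 - q


def kernel_scores(n, q, b):
    """a_{i,n}(q) = U((q - (i-1)/n)/b) - U((q - i/n)/b), i = 1, ..., n, cf. (8.2.16)."""
    return [U_epan((q - (i - 1) / n) / b) - U_epan((q - i / n) / b) for i in range(1, n + 1)]


def kernel_qf(x_sorted, q, bandwidth):
    """Kernel q.f. F_{n,0}^{-1}(q) = sum_i a_{i,n}(q) X_{i:n} for 0 < q < 1."""
    b = bandwidth(q)
    return sum(a * x for a, x in zip(kernel_scores(len(x_sorted), q, b), x_sorted))


xs = sorted([0.12, 1.4, 0.33, 2.7, 0.05, 0.9, 0.61, 1.8, 0.27, 1.1])
print([round(kernel_qf(xs, q, lambda t: beta_83(t, 0.1)), 4) for q in (0.1, 0.25, 0.5, 0.75, 0.9)])
```

Because the scores are nonnegative and sum to one, every value is a convex combination of order statistics and therefore lies between $X_{1:n}$ and $X_{n:n}$.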

Kernel Estimators of D.F.


The kernel estimators of the d.f. are of the form

$$F_{n,0}(x) = \int_{-\infty}^{\infty} \beta^{-1}u\big((x - y)/\beta\big)F_n(y)\,dy \tag{8.2.17}$$

or, alternatively,

$$F_{n,0}(x) = n^{-1}\sum_{i=1}^n U\big((x - \xi_i)/\beta\big) \tag{8.2.18}$$

where $U(z) = \int_{-\infty}^{z} u(y)\,dy$ and u is a function such that $\int u(y)\,dy = 1$. If u ≥ 0 then $F_{n,0}$ generates d.f.'s; hence by constructing the corresponding q.f.'s we obtain a further estimator $(F_{n,0})^{-1}$ of the q.f. $F^{-1}$.
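The kernel d.f. (8.2.18) and its q.f. $(F_{n,0})^{-1}$ can be sketched as follows, again assuming the Epanechnikov kernel. Since $F_{n,0}$ is continuous and nondecreasing, its q.f. $\inf\{x: F_{n,0}(x) \ge q\}$ can be computed by bisection; the bracket $[\min_i \xi_i - \beta, \max_i \xi_i + \beta]$ works because u vanishes outside [-1, 1]. The data and parameter values are arbitrary. With an observation close to 0, the resulting q.f. is negative for small q, a value outside the support of any d.f. concentrated on the positive half-line such as the exponential d.f.

```python
def U_epan(z):
    """Integrated Epanechnikov kernel U(z) = int_{-1}^{z} (3/4)(1 - y^2) dy."""
    if z <= -1.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return 0.5 + 0.75 * z - 0.25 * z ** 3


def kernel_df(data, x, beta):
    """F_{n,0}(x) = n^{-1} sum_i U((x - xi_i) / beta), cf. (8.2.18)."""
    return sum(U_epan((x - xi) / beta) for xi in data) / len(data)


def kernel_df_inverse(data, q, beta, iterations=200):
    """(F_{n,0})^{-1}(q) = inf{x : F_{n,0}(x) >= q} for 0 < q < 1, by bisection."""
    lo, hi = min(data) - beta, max(data) + beta  # F_{n,0}(lo) = 0, F_{n,0}(hi) = 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kernel_df(data, mid, beta) >= q:
            hi = mid
        else:
            lo = mid
    return hi


data = [0.01, 0.3, 0.8, 1.5, 2.9]
print([round(kernel_df_inverse(data, q, 0.3), 4) for q in (0.01, 0.25, 0.5, 0.75, 0.99)])
```

A fixed number of bisection steps is used instead of a tolerance on the interval length, so the loop also terminates for data on a large scale.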

Density Estimation
The kernel method enables us to construct differentiable functions as estimates of the d.f. F and the q.f. $F^{-1}$, although the initial estimates are step functions. Thus, we get estimators of the density f = F' and the density quantile function $(F^{-1})' = 1/f(F^{-1})$ as well.
From (8.2.18) we obtain

$$F_{n,1}(x) = F_{n,0}'(x) = (n\beta)^{-1}\sum_{i=1}^n u\big((x - \xi_i)/\beta\big). \tag{8.2.19}$$
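A minimal sketch of the kernel density estimate (8.2.19) follows, once more assuming the Epanechnikov kernel and an arbitrary bandwidth. A crude midpoint rule confirms that the estimate integrates to one, as it must since u is a probability density.

```python
def u_epan(z):
    """Epanechnikov kernel u(z) = (3/4)(1 - z^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - z * z) if -1.0 <= z <= 1.0 else 0.0


def kernel_density(data, x, beta):
    """F_{n,1}(x) = (n beta)^{-1} sum_i u((x - xi_i) / beta), cf. (8.2.19)."""
    return sum(u_epan((x - xi) / beta) for xi in data) / (len(data) * beta)


def midpoint_integral(f, a, b, steps=20000):
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


data = [0.01, 0.3, 0.8, 1.5, 2.9]
beta = 0.3
total = midpoint_integral(lambda x: kernel_density(data, x, beta), min(data) - beta, max(data) + beta)
print(total)
```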

A corresponding formula holds for $F_{n,1}^{-1} = (F_{n,0}^{-1})'$. If the bandwidth function is defined as in (8.2.3) then $F_{n,0}^{-1}$ is differentiable on the interval (β, 1 - β). We get

$$F_{n,1}^{-1}(q) = \beta^{-1}\sum_{i=1}^n \left[u\!\left(\frac{q - (i-1)/n}{\beta}\right) - u\!\left(\frac{q - i/n}{\beta}\right)\right]X_{i:n} \tag{8.2.20}$$

for β ≤ q ≤ 1 - β. With β(q) as in (8.2.4) the same representation of $F_{n,1}^{-1}$ holds for 2β ≤ q ≤ 1 - 2β. However, now $F_{n,1}^{-1}$ also exists on (0, 1) and can easily be computed.
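As a plausibility check of (8.2.20), the sketch below compares the closed-form derivative of the kernel q.f. with a central difference quotient. It assumes the Epanechnikov kernel and a constant bandwidth β on [β, 1 - β], and uses illustrative names. It also applies the formula to the equally spaced points i/(n + 1), which mimic a uniform sample, whose density quantile function is identically 1.

```python
def u_epan(z):
    """Epanechnikov kernel u(z) = (3/4)(1 - z^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - z * z) if -1.0 <= z <= 1.0 else 0.0


def U_epan(z):
    """Integrated Epanechnikov kernel."""
    if z <= -1.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return 0.5 + 0.75 * z - 0.25 * z ** 3


def kernel_qf(x_sorted, q, beta):
    """(8.2.16) with constant bandwidth beta (valid for beta <= q <= 1 - beta)."""
    n = len(x_sorted)
    return sum((U_epan((q - (i - 1) / n) / beta) - U_epan((q - i / n) / beta)) * x
               for i, x in enumerate(x_sorted, start=1))


def kernel_qf_derivative(x_sorted, q, beta):
    """(8.2.20): beta^{-1} sum_i [u((q - (i-1)/n)/beta) - u((q - i/n)/beta)] X_{i:n}."""
    n = len(x_sorted)
    return sum((u_epan((q - (i - 1) / n) / beta) - u_epan((q - i / n) / beta)) * x
               for i, x in enumerate(x_sorted, start=1)) / beta


xs = sorted([0.12, 1.4, 0.33, 2.7, 0.05, 0.9, 0.61, 1.8, 0.27, 1.1])
q, beta, h = 0.45, 0.1, 1e-6
numeric = (kernel_qf(xs, q + h, beta) - kernel_qf(xs, q - h, beta)) / (2 * h)
print(kernel_qf_derivative(xs, q, beta), numeric)
```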

Some Illustrations
In the sequel, we shall apply the Epanechnikov kernel defined by

$$u(x) = (3/4)(1 - x^2)\,1_{[-1,1]}(x). \tag{8.2.21}$$

Notice that

$$U(x) = \begin{cases} 0 & \text{if } x < -1,\\ 1/2 + 3x/4 - x^3/4 & \text{if } -1 \le x \le 1,\\ 1 & \text{if } x > 1. \end{cases}$$

In Figures 8.2.3 and 8.2.4 the kernel q.f. $F_{n,0}^{-1}$ and the q.f. $(F_{n,0})^{-1}$ of the kernel d.f. $F_{n,0}$ are based on n = 100 pseudo standard exponential random numbers. For q bounded away from 0 and 1 one realizes that $F_{n,0}^{-1}$ and $(F_{n,0})^{-1}$ have about the same performance. Near to 0 and 1 the estimate taken from $(F_{n,0})^{-1}$

has the unpleasant property that (a) it is inaccurate and (b) it attains values which do not belong to the support of the exponential d.f. The second property is of course not very surprising. To avoid this unpleasant behavior of $(F_{n,0})^{-1}$ one should modify $F_{n,0}(x)$ in such a way that the bandwidth depends on x.

Figure 8.2.3. $F^{-1}$, $F_n^{-1}$, and $F_{n,0}^{-1}$ with $n = 100$, $\beta = 0.08$.

Figure 8.2.4. $F^{-1}$, $F_n^{-1}$, and $(F_{n,0})^{-1}$ with $n = 100$, $\beta = 0.08$.

Figures 8.2.3 and 8.2.4 show clearly that the kernel estimates reduce the random fluctuation of the "natural" estimates. This also reduces the maximum deviation from the underlying q.f.
Next, $F_{n,0}^{-1}$ and $(F_{n,0})^{-1}$ will be evaluated at the right end of the domain. Again, $F_{n,0}^{-1}$ is defined with the bandwidth function in (8.2.4).
Figure 8.2.5. $F^{-1}$, $F_n^{-1}$, and $F_{n,0}^{-1}$ with $n = 100$, $\beta = 0.08$.

Figure 8.2.6. $F^{-1}$, $F_n^{-1}$, and $(F_{n,0})^{-1}$ with $n = 100$, $\beta = 0.08$.

When the graph in Figure 8.2.6 appeared on the screen, I first thought there was an error in the computer program: the graph of $(F_{n,0})^{-1}$ can hardly be distinguished from the sample q.f. The explanation is that the largest order statistics are not close to each other. Consequently, the kernel d.f. with the bandwidth $\beta = 0.08$ does not smooth the sample d.f. there. Each summand $U((x-\xi_i)/\beta)$ varies only on $[\xi_i - \beta, \xi_i + \beta]$. If consecutive large order statistics are more than $2\beta$ apart, $F_{n,0}$ is constant between them, and $(F_{n,0})^{-1}$ jumps just like the sample q.f.

Parametric versus Nonparametric Estimation


Finally, we examine the estimation of the standard normal q.f. This situation is related to estimating the exponential q.f. near the right endpoint of the domain.
In addition to the smoothed sample q.f. $F_{n,0}^{-1}$, we shall use the estimator $\mu_n + \sigma_n\Phi^{-1}$, where $(\mu_n, \sigma_n)$ is the maximum likelihood (m.l.) estimator of the location and scale parameters of the normal d.f. The kernel q.f. is again defined by means of the Epanechnikov kernel.
In Figures 8.2.7 and 8.2.8 the observations are sampled according to the standard normal d.f. The m.l. estimate $(\mu_n, \sigma_n)$ of $(\mu, \sigma)$ has the value $(0.028, 1.032)$.
Both estimators perform badly near the endpoints 0 and 1 of the domain. This is not surprising, since the parametric estimate $\mu_n + \sigma_n\Phi^{-1}$ does not converge to $\Phi^{-1}$ uniformly over $(0,1)$. Notice that
$$\mu_n + \sigma_n\Phi^{-1} - \Phi^{-1} = \mu_n + (\sigma_n - 1)\Phi^{-1}$$
is an unbounded function whenever $\sigma_n \ne 1$. Thus, Figure 8.2.8 is misleading to some extent.
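To see numerically why uniform convergence fails, the following sketch evaluates the difference $\mu_n + (\sigma_n - 1)\Phi^{-1}(q)$ with the m.l. values $(\mu_n, \sigma_n) = (0.028, 1.032)$ reported above. It uses Python's standard `statistics.NormalDist` for $\Phi^{-1}$. The difference is small in the centre but grows without bound in absolute value as $q \to 0$ or $q \to 1$.

```python
from statistics import NormalDist

MU_N, SIGMA_N = 0.028, 1.032  # m.l. estimate (mu_n, sigma_n) reported in the text
STD_NORMAL = NormalDist()     # standard normal; inv_cdf is Phi^{-1}


def parametric_error(q: float) -> float:
    """(mu_n + sigma_n * Phi^{-1}(q)) - Phi^{-1}(q) = mu_n + (sigma_n - 1) * Phi^{-1}(q)."""
    z = STD_NORMAL.inv_cdf(q)
    return (MU_N + SIGMA_N * z) - z


for q in (1e-9, 0.01, 0.5, 0.99, 1 - 1e-9):
    print(f"q = {q:<12g} error = {parametric_error(q):+.4f}")
```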

Figure 8.2.7. $F_n^{-1}$, $F_{n,0}^{-1}$ with $n = 100$, $\beta = 0.08$.

Figure 8.2.8. $\Phi^{-1}$ (dotted curve), $\mu_n + \sigma_n\Phi^{-1}$, $F_{n,0}^{-1}$ with $n = 100$, $\beta = 0.08$.

Fitting a Density to Data


For the visual comparison of two different d.f.'s, probability paper plays a dominant role (see e.g. Gumbel's book or Barnett (1975)). The "theoretical" d.f. is transformed into a straight line. When the same transformation is applied to the sample d.f., any deviation of the transformed sample d.f. from the straight line is easy to detect.
It can also be advisable to compare distributions by their densities. One advantage is that the original shape of the distribution is visible. The data will be represented visually by the kernel density $f_n = F_{n,1}$ introduced in (8.2.19). In a second step, an extreme value density is fitted to the kernel density. We suppose that the graphs in Sections 1.3 and 5.1 have already made the reader familiar with extreme value densities.
We shall examine the monthly and annual maxima of the temperature at De Bilt (Netherlands). Data for 133 years (1849-1981) are available and were first studied by M.A.J. van Montfort (1982). A plot of the annual maxima on normal probability paper shows an excellent fit of a normal distribution. Van Montfort points out the resemblance between normal distributions and certain "symmetric" Weibull distributions (compare also Figure 1.3.4). The author is grateful to van Montfort for a translation of his paragraph 8.3 (written in Dutch) and for providing the data. Despite van Montfort's remark, Sneyers (1984) considers this an "... example of an extreme value distribution following not a Fisher-Tippett asymptote ...".
Below, Weibull densities with location, scale, and shape parameters $\mu$, $\sigma$, and $\alpha$ are fitted to kernel densities based on monthly maxima of the temperature. The kernel density is defined with the Epanechnikov kernel and the bandwidth $\beta = 2.0$. A better fit can be achieved by more smoothing, that is, by a larger bandwidth.

Figure 8.2.9. July: Kernel density and Weibull density with parameters $\mu = 37.3$, $\sigma = 8.5$, $\alpha = 2.7$.

Figure 8.2.10. September: Kernel density and Weibull density with parameters $\mu = 44.0$, $\sigma = 19.8$, $\alpha = 7.0$.
The densities of the maxima of temperature in July (Fig. 8.2.9) are skewed to the left, and those for September (Fig. 8.2.10) are skewed to the right. Below we also include the corresponding Weibull densities for June and August, which are close together. The density for June is nearly symmetric, and that for August is slightly skewed to the right. The kernel density for annual maxima is nearly symmetric.
The largest observed monthly maxima within the 133 years are (a) 36.8 in June 1947, (b) 35.6 in July 1911, (c) 35.8 in August 1857, and (d) 34.2 in September 1949.

Figure 8.2.11. Kernel density for annual maxima; Weibull densities: June: $\mu = 38.2$, $\sigma = 10.6$, $\alpha = 3.9$; July: $\mu = 37.3$, $\sigma = 8.5$, $\alpha = 2.7$; August: $\mu = 39.8$, $\sigma = 12.2$, $\alpha = 4.1$; September: $\mu = 44.0$, $\sigma = 19.8$, $\alpha = 7.0$.
We suggest regarding the annual maximum as the maximum of independent, not identically distributed Weibull r.v.'s, corresponding to the maxima in June, July, August, and September. According to (1.3.4), computing the d.f. and the density of the maximum of not identically distributed r.v.'s creates no difficulties. The resulting density shows an excellent fit to the kernel density of the annual maxima given in Figure 8.2.11.
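The computation referred to above can be sketched as follows. For independent r.v.'s with d.f.'s $G_1, \ldots, G_4$, the maximum has d.f. $\prod_j G_j$ and density $\sum_j g_j\prod_{k\ne j}G_k$. We assume the Weibull parametrization $G(x) = \exp(-((\mu - x)/\sigma)^\alpha)$ for $x < \mu$ and $G(x) = 1$ for $x \ge \mu$, so that $\mu$ is the right endpoint. This is our assumption and is not quoted from the text. The parameters are those of Figure 8.2.11, and the names are hypothetical.

```python
import math

# (mu, sigma, alpha) for June, July, August, September, from Figure 8.2.11
MONTHS = [(38.2, 10.6, 3.9), (37.3, 8.5, 2.7), (39.8, 12.2, 4.1), (44.0, 19.8, 7.0)]


def weibull_cdf(x: float, mu: float, sigma: float, alpha: float) -> float:
    """Assumed parametrization G(x) = exp(-((mu - x)/sigma)^alpha) for x < mu, else 1."""
    if x >= mu:
        return 1.0
    return math.exp(-(((mu - x) / sigma) ** alpha))


def weibull_pdf(x: float, mu: float, sigma: float, alpha: float) -> float:
    """Derivative of weibull_cdf."""
    if x >= mu:
        return 0.0
    z = (mu - x) / sigma
    return (alpha / sigma) * z ** (alpha - 1.0) * math.exp(-(z ** alpha))


def max_cdf(x: float, params) -> float:
    """d.f. of the maximum of independent r.v.'s: product of the individual d.f.'s."""
    return math.prod(weibull_cdf(x, *p) for p in params)


def max_pdf(x: float, params) -> float:
    """Density of the maximum: sum_j g_j(x) * prod_{k != j} G_k(x)."""
    total = 0.0
    for j, pj in enumerate(params):
        rest = math.prod(weibull_cdf(x, *pk) for k, pk in enumerate(params) if k != j)
        total += weibull_pdf(x, *pj) * rest
    return total
```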
The Weibull densities were chosen by visual, subjective judgment. To obtain an automatic procedure, one should fix a distance between densities, such as the maximum deviation, the $\chi^2$-distance, or the Hellinger distance. One then takes the parameter $(\mu, \sigma, \alpha)$ that minimizes the distance between the kernel density and the Weibull density. The foregoing remarks show that our estimates are produced by a kind of minimum distance method. This method yields larger estimates of the unknown right endpoint than the sample maximum does. Recall that the "minimum distance" estimates are 38.2, 37.3, 39.8, and 44.0, compared to the sample maxima 36.8, 35.6, 35.8, and 34.2. The difference is particularly pronounced where the density is skewed to the right.
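A minimal version of the automatic procedure suggested above might look like this. It uses the maximum deviation as the distance and a crude grid search over $(\mu, \sigma, \alpha)$. The grid search, the evaluation grid, the Weibull parametrization $g(x) = (\alpha/\sigma)z^{\alpha-1}e^{-z^\alpha}$ with $z = (\mu - x)/\sigma$ for $x < \mu$, and all names are our assumptions. In an application, `target` would be the kernel density (8.2.19) of the monthly maxima, which are not reproduced here.

```python
import itertools

import numpy as np


def kernel_density(x, data, beta):
    """f_n(x) = (n beta)^{-1} sum_i u((x - xi_i)/beta), Epanechnikov u, cf. (8.2.19)."""
    z = (np.asarray(x, dtype=float)[:, None] - np.asarray(data, dtype=float)[None, :]) / beta
    u = np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)
    return u.sum(axis=1) / (len(data) * beta)


def weibull_pdf(x, mu, sigma, alpha):
    """Assumed Weibull density with right endpoint mu (alpha > 1 assumed)."""
    z = np.clip((mu - np.asarray(x, dtype=float)) / sigma, 0.0, None)
    return np.where(z > 0.0, (alpha / sigma) * z ** (alpha - 1.0) * np.exp(-z ** alpha), 0.0)


def min_distance_fit(grid, target, mus, sigmas, alphas):
    """Return the (mu, sigma, alpha) minimizing max_x |target(x) - g(x)| over the grid."""
    best, best_dist = None, np.inf
    for mu, sigma, alpha in itertools.product(mus, sigmas, alphas):
        dist = float(np.max(np.abs(target - weibull_pdf(grid, mu, sigma, alpha))))
        if dist < best_dist:
            best, best_dist = (mu, sigma, alpha), dist
    return best, best_dist
```

A typical call would be `min_distance_fit(grid, kernel_density(grid, maxima, 2.0), mus, sigmas, alphas)`. Replacing the maximum deviation by a $\chi^2$ or Hellinger distance only changes the line that computes `dist`.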
Hosking (1985) developed a modified Newton-Raphson iteration algorithm for solving the maximum likelihood equation in the 3-parameter extreme value model (given by the von Mises parametrization). This algorithm seems to work if $|\beta| < 0.5$, where $\beta$ here denotes the shape parameter of the von Mises parametrization, not the bandwidth. Using the "minimum distance" estimates of Figure 8.2.11 as initial values, one obtains the following estimates:

June: $\mu = 39.1$, $\sigma = 11.4$, $\alpha = 4.3$;
July: $\mu = 36.2$, $\sigma = 7.4$, $\alpha = 2.4$;
August: $\mu = 38.7$, $\sigma = 11.1$, $\alpha = 4.0$;
September: $\mu = 38.8$, $\sigma = 14.4$, $\alpha = 5.2$.

The densities corresponding to the maximum likelihood estimates again show an excellent fit to the kernel (sample) densities.

8.3. Asymptotic Performance of Quantile Estimators


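The kernel estimator of the q.f. is given by
$$F_{n,0}^{-1}(q) = \beta^{-1}\int_0^1 u\Big(\frac{q-y}{\beta}\Big)F_n^{-1}(y)\,dy \qquad (8.3.1)$$
if $2\beta < q < 1-2\beta$. Under appropriate regularity conditions, the $i$th derivative $F_{n,i}^{-1}$ of $F_{n,0}^{-1}$ is given by
$$F_{n,i}^{-1}(q) = \beta^{-(i+1)}\int_0^1 u^{(i)}\Big(\frac{q-y}{\beta}\Big)F_n^{-1}(y)\,dy. \qquad (8.3.2)$$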

Moderate Deviations
Our first aim is to derive rough bounds for the rate of convergence of kernel estimators of the q.f. and its derivatives. For this purpose we again study the oscillation property of the sample q.f.
The basic tool for the following considerations is Lemma 3.1.7(ii), which describes the stochastic behavior of
$$\big|F_n^{-1}(q_2) - F_n^{-1}(q_1) - \big(F^{-1}(q_2) - F^{-1}(q_1)\big)\big| \qquad (8.3.3)$$
uniformly over $q_1, q_2$ with $0 < p_1 \le q_1 < q_2 \le p_2 < 1$.
In the sequel we assume that the kernel $u$ satisfies the following regularity conditions:

Condition 8.3.1. Let $m$ be a positive integer. Assume that
(i) $u$ has the support $[-1,1]$.
(ii) $u$ has $m+1$ derivatives.
(iii) $\int u(y)\,dy = 1$.

Integration by parts yields
$$\int u^{(i)}(y)y^i\,dy = (-1)^i\,i!, \qquad i = 0, \ldots, m+1,$$
and
$$\int u^{(i)}(y)y^j\,dy = 0, \qquad 0 \le j < i \le m+1. \qquad (8.3.4)$$
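The identities (8.3.4) can be checked exactly for a concrete polynomial kernel. As an illustration (our choice, not a kernel used in the text), take the triweight kernel $u(y) = (35/32)(1-y^2)^3$ on $[-1,1]$. Here $u$, $u'$ and $u''$ vanish at $\pm 1$, so the integrations by parts behind (8.3.4) go through with $m = 1$. The sketch integrates exactly with NumPy's `Polynomial` class.

```python
from math import factorial

from numpy.polynomial import Polynomial

Y = Polynomial([0.0, 1.0])                     # the monomial y
U_TRI = (35.0 / 32.0) * (1.0 - Y ** 2) ** 3    # triweight kernel on [-1, 1]
M = 1                                          # u has m + 1 = 2 derivatives


def integral(p: Polynomial) -> float:
    """Exact integral of a polynomial over [-1, 1] (the kernel vanishes outside)."""
    anti = p.integ()
    return float(anti(1.0) - anti(-1.0))


def moment(i: int, j: int) -> float:
    """int u^{(i)}(y) y^j dy."""
    ui = U_TRI.deriv(i) if i else U_TRI
    return integral(ui * Y ** j)


for i in range(M + 2):
    print(i, moment(i, i), (-1) ** i * factorial(i), [moment(i, j) for j in range(i)])
```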

Condition 8.3.2. Let $k$ be a positive integer. Assume that
$$\int u(y)y^j\,dy = 0, \qquad j = 1, \ldots, k.$$
Under Conditions 8.3.1 and 8.3.2 we get, by means of integration by parts, that
$$\int u^{(i)}(y)y^{i+j}\,dy = 0, \qquad i = 0, \ldots, m+1 \text{ and } j = 1, \ldots, k. \qquad (8.3.5)$$

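The following representation of $F_{n,i}^{-1}$ will be useful:
$$F_{n,i}^{-1}(q) = \begin{cases} F_n^{-1}(q) + R_{0,n}(q) & \text{if } i = 0,\\ (F^{-1})^{(i)}(q) + R_{i,n}(q) & \text{if } i \ge 1,\end{cases} \qquad (8.3.6)$$
where the remainder term is given by
$$\begin{aligned} R_{i,n}(q) ={}& \beta^{-i}\int u^{(i)}(y)\big[F_n^{-1}(q-\beta y) - F_n^{-1}(q) - \big(F^{-1}(q-\beta y) - F^{-1}(q)\big)\big]\,dy\\ &+ \beta^{-i}\int u^{(i)}(y)\Big[F^{-1}(q-\beta y) - F^{-1}(q) - \sum_{j=1}^{i+k}\frac{(-\beta y)^j}{j!}(F^{-1})^{(j)}(q)\Big]\,dy \end{aligned} \qquad (8.3.7)$$
if again $2\beta < q < 1-2\beta$ and the derivatives of $F^{-1}$ at $q$ exist. We remark that (8.3.7) always holds for $k = 0$.
The representation shows that $R_{i,n}(q)$ splits into two parts: (a) a random part, governed by the oscillation behavior of the sample q.f., and (b) a non-random part, which depends on the remainder term of a Taylor expansion of $F^{-1}$ about $q$.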
Clearly, a similar representation holds for the sample d.f. $F_n$ in place of the sample q.f. $F_n^{-1}$. Recall that the oscillation behavior of $F_n$ was studied in Remark 6.3.3.
Histograms with random or non-random cells are based on terms of the form
$$\big(F_n^{-1}(q_2) - F_n^{-1}(q_1)\big)/(q_2 - q_1) \quad\text{or}\quad \big(F_n(t_2) - F_n(t_1)\big)/(t_2 - t_1).$$
Thus, the oscillation behavior of the sample q.f. and the sample d.f. can be regarded as a property that summarizes the properties of histograms.
The representation (8.3.6) shows that the stochastic behavior of kernel estimators of the q.f. is completely determined by the oscillation behavior of the sample q.f.

Next, we give a technical result on the moderate deviation of the kernel q.f. from the underlying q.f.

Lemma 8.3.3. Suppose that Conditions 8.3.1 and 8.3.2 hold for some $m \ge 1$ and $k = m-1$. Moreover, assume that the q.f. $F^{-1}$ has $m+1$ bounded derivatives on a neighborhood of the interval $(p_1, p_2)$.
Then, for every $s > 0$ and every sufficiently small $\beta \ge (\log n)/n$, there exist constants $B, C > 0$ (independent of $\beta$ and $n$) such that
(i) $$P\Big\{\sup_{p_1\le q\le p_2}\big|F_{n,0}^{-1}(q) - F_n^{-1}(q)\big| > C\Big[\Big(\frac{\beta\log n}{n}\Big)^{1/2} + \beta^m\Big]\Big\} < Bn^{-s},$$
(ii) $$P\Big\{\sup_{p_1\le q\le p_2}\big|F_{n,i}^{-1}(q) - (F^{-1})^{(i)}(q)\big| > C\Big[\Big(\frac{\log n}{n\beta^{2i-1}}\Big)^{1/2} + \beta^{m-i+1}\Big]\Big\} < Bn^{-s}$$
for $i = 1, \ldots, m$, and
(iii) $$P\Big\{\sup_{p_1\le q\le p_2}\big|F_{n,m+1}^{-1}(q)\big| > C\Big[\Big(\frac{\log n}{n\beta^{2m+1}}\Big)^{1/2} + 1\Big]\Big\} < Bn^{-s}.$$

PROOF. Immediate from Lemma 3.1.7(ii) and (8.3.6). □

It is easy to see that Lemma 8.3.3(i) holds with $(\log n)/n$ in place of $(\beta(\log n)/n)^{1/2}$ if $0 < \beta \le (\log n)/n$. Hence, for every $\varepsilon > 0$:
$$P\Big\{\sup_{p_1\le q\le p_2}\big|n^{1/2}\big(F_{n,0}^{-1}(q) - F^{-1}(q)\big) - n^{1/2}\big(F_n^{-1}(q) - F^{-1}(q)\big)\big| \ge \varepsilon\Big\} \to 0 \qquad (8.3.8)$$
as $n \to \infty$, for every sequence of bandwidths $\beta \equiv \beta_n$ with $n\beta_n^{2m} \to 0$ as $n \to \infty$.
This means that the quantile process $n^{1/2}(F_n^{-1} - F^{-1})$ and the smooth quantile process $n^{1/2}(F_{n,0}^{-1} - F^{-1})$ have the same asymptotic behavior on the interval $(p_1, p_2)$. Lemma 8.3.3 shows that, with high probability, the kernel estimates of the q.f. are remarkably smooth. This fact is basic for the considerations of Section 8.4.

Kernel Estimators Evaluated at a Fixed Point


The results above do not allow us to distinguish between the asymptotic performance of the sample q.f. and that of the kernel estimator of the q.f. This becomes possible once a limit theorem with a bound for the remainder term is established. The first theorem, taken from Reiss (1981c), concerns the estimation of the d.f.

Theorem 8.3.4. Let $F_{n,0}$ be the kernel estimator of the d.f. as given in (8.2.18). Suppose that the kernel $u$ satisfies Conditions 8.3.1(i), (ii), and 8.3.2 for some $k \ge 1$. Moreover, let $F$ have $k+1$ derivatives on a neighborhood of the fixed point $t$ such that $|F^{(k+1)}| \le A$.
Then, uniformly over the bandwidths $\beta \in (0,1)$,
$$\Big|E\big(F_{n,0}(t) - F(t)\big)^2 - E\big(F_n(t) - F(t)\big)^2 + 2(\beta/n)F'(t)\int xu(x)U(x)\,dx\Big| \le \Big(\beta^{k+1}A\int|u(x)x^{k+1}|\,dx/(k+1)!\Big)^2 + O(\beta^2/n). \qquad (8.3.9)$$

This result allows us to compare the mean square error $E(F_{n,0}(t) - F(t))^2$ of $F_{n,0}(t)$ with the variance $E(F_n(t) - F(t))^2 = F(t)(1-F(t))/n$ of the sample d.f. $F_n(t)$ evaluated at $t$. Suppose $F'(t) > 0$ and the bandwidth $\beta$ is chosen so that the right-hand side of (8.3.9) is of smaller order than $\beta/n$. Then the term $\int xu(x)U(x)\,dx$ can be taken as a measure of the performance of $F_{n,0}(t)$. If
$$\int xu(x)U(x)\,dx > 0 \qquad (8.3.10)$$
then, obviously, $F_{n,0}(t)$ performs better than $F_n(t)$.
If $u$ is a non-negative, symmetric kernel then
$$\int xu(x)U(x)\,dx = \int_0^\infty xu(x)\big[2U(x) - 1\big]\,dx > 0$$
since the integrand on the right-hand side is non-negative. Notice that a non-negative kernel $u$ satisfies Condition 8.3.2 only if $k = 1$.
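For the Epanechnikov kernel (8.2.21) the performance measure in (8.3.10) can be computed in closed form. The odd part of the integrand drops out, which leaves
$$\int xu(x)U(x)\,dx = \tfrac{3}{16}\int_{-1}^{1}(3x^2 - 4x^4 + x^6)\,dx = 9/70 \approx 0.129.$$
The short NumPy check below is a sketch that confirms this value and the symmetric form $\int_0^\infty xu(x)[2U(x) - 1]\,dx$.

```python
from numpy.polynomial import Polynomial

X = Polynomial([0.0, 1.0])
u = 0.75 * (1.0 - X ** 2)              # Epanechnikov kernel on [-1, 1]
U = 0.5 + 0.75 * X - 0.25 * X ** 3     # U(x) on [-1, 1]; U = 0 left of -1, 1 right of 1


def integrate(p: Polynomial, a: float, b: float) -> float:
    """Exact integral of a polynomial over [a, b]."""
    anti = p.integ()
    return float(anti(b) - anti(a))


# u vanishes outside [-1, 1], so both integrals reduce to polynomial integrals.
full = integrate(X * u * U, -1.0, 1.0)
half = integrate(X * u * (2.0 * U - 1.0), 0.0, 1.0)
print(full, half, 9 / 70)
```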
From (8.3.9) we see that $F_{n,0}(t)$ and $F_n(t)$ have the same asymptotic efficiency; however, $F_n(t)$ is asymptotically deficient w.r.t. $F_{n,0}(t)$. The concept of deficiency was introduced by Hodges and Lehmann (1970). Define
$$i(n) = \min\big\{m: E\big(F_m(t) - F(t)\big)^2 \le E\big(F_{n,0}(t) - F(t)\big)^2\big\}. \qquad (8.3.11)$$
Thus, $i(n)$ is the smallest integer $m$ such that $F_m(t)$ performs as well as or better than $F_{n,0}(t)$. Since $i(n)/n \to 1$ as $n \to \infty$, $F_{n,0}(t)$ and $F_n(t)$ have the same asymptotic efficiency. However, the relative deficiency $i(n) - n$ of $F_n(t)$ w.r.t. $F_{n,0}(t)$ quickly tends to infinity as $n \to \infty$. In short, the relative deficiency $i(n) - n$ is the number of observations wasted by using the sample d.f. instead of the kernel estimator.
The comparison of $F_n(t)$ and $F_{n,0}(t)$ may also be based on covering probabilities. The Berry-Esseen theorem yields
$$P\{(n^{1/2}/\sigma)|F_n(t) - F(t)| \le y\} = 2\Phi(y) - 1 + O(n^{-1/2}) \qquad (8.3.12)$$
where $\sigma^2 = F(t)(1-F(t))$. The Berry-Esseen theorem, Theorem 8.3.4, and P.8.6 lead to the following theorem.

Theorem 8.3.5. Under the conditions of Theorem 8.3.4 we get, uniformly over $\beta > 0$,
$$P\{(n^{1/2}/\sigma)|F_{n,0}(t) - F(t)| \le y\} = 2\Phi\Big[y\Big(\frac{3}{2} - \frac{E(F_{n,0}(t) - F(t))^2}{2E(F_n(t) - F(t))^2}\Big)\Big] - 1 + O\big(n^{-1/2} + (\beta + n\beta^{2(k+1)})^{3/2}\big). \qquad (8.3.13)$$

Thus, the performance of $F_{n,0}(t)$ again depends on the mean square error. A modified definition of the relative deficiency, given w.r.t. covering probabilities, leads to the same conclusion as for the mean square error.
Analogously, one may compare the performance of the sample $q$-quantile $F_n^{-1}(q)$ with that of a kernel estimator $F_{n,0}^{-1}(q)$. A comparison based on the mean square error requires appropriate moment conditions. To avoid these, we restrict our attention to covering probabilities. Recall from Section 4.2 that under weak regularity conditions,
$$P\{(n^{1/2}/\sigma_0)|F_n^{-1}(q) - F^{-1}(q)| \le y\} = 2\Phi(y) - 1 + O(n^{-1/2}) \qquad (8.3.14)$$
with $\sigma_0^2 = q(1-q)/[f(F^{-1}(q))]^2$, where $f$ denotes the derivative of $F$.
The following lemma is taken from Falk (1985a, Proposition 1.5).

Lemma 8.3.6. Let $F_{n,0}^{-1}$ be the kernel estimator of the q.f. as given in (8.3.1). Suppose that the kernel $u$ satisfies Conditions 8.3.1(i), (ii). Suppose that the q.f. $F^{-1}$ has a bounded second derivative on a neighborhood of the fixed point $q \in (0,1)$, and that $f(F^{-1}(q)) > 0$.
Then, if $\beta \equiv \beta(n) \to 0$ as $n \to \infty$, we have
$$P\{(n^{1/2}/\sigma_n)(F_{n,0}^{-1}(q) - \mu_n) \le y\} = \Phi(y) + O(\log(n)n^{-1/4}) \qquad (8.3.15)$$
where
$$\mu_n = \int u(x)F^{-1}(q - \beta x)\,dx \qquad (8.3.16)$$
and
$$\sigma_n^2 = \int_0^1\Big(\int u(x)\big[q - \beta x - 1_{(0,\,q-\beta x)}(y)\big](F^{-1})'(q - \beta x)\,dx\Big)^2dy. \qquad (8.3.17)$$
Moreover,
$$\sigma_n \to \sigma_0, \qquad n \to \infty. \qquad (8.3.18)$$

Thus, by Lemma 8.3.6, $F_{n,0}^{-1}(q)$ is asymptotically normal with mean $\mu_n$ and variance $\sigma_n^2/n$. The proof of Lemma 8.3.6 is based on a Bahadur approximation argument. By (8.3.18), $F_{n,0}^{-1}(q)$ and $F_n^{-1}(q)$ have the same asymptotic efficiency. It would be of interest to know whether the remainder term in (8.3.15) is of order $O(n^{-1/2})$. Applying P.8.6 we obtain the following counterpart of Theorem 8.3.5.
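Under the conditions of Lemma 8.3.6,
$$P\{(n^{1/2}/\sigma_0)|F_{n,0}^{-1}(q) - F^{-1}(q)| \le y\} = 2\Phi\Big[y\Big(\frac{3}{2} - \frac{\sigma_n^2/n + (\mu_n - F^{-1}(q))^2}{2\sigma_0^2/n}\Big)\Big] - 1 + O\Big(\log(n)n^{-1/4} + \Big(\max\Big(\Big|\frac{\sigma_n}{\sigma_0} - 1\Big|, \frac{n(\mu_n - F^{-1}(q))^2}{\sigma_0^2}\Big)\Big)^{3/2}\Big). \qquad (8.3.19)$$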

This shows that the performance of $F_{n,0}^{-1}(q)$ depends on the "mean square error" $\sigma_n^2/n + (\mu_n - F^{-1}(q))^2$. As in Falk (1985a, proof of Theorem 2.3), one can prove that
$$\sigma_n^2 = \sigma_0^2 - 2\beta(n)\big[(F^{-1})'(q)\big]^2\int xu(x)U(x)\,dx + O(\beta(n)^2) \qquad (8.3.20)$$
and
$$|\mu_n - F^{-1}(q)| = O(\beta(n)^{k+1}) \qquad (8.3.21)$$
if $F^{-1}$ has $k+1$ derivatives on a neighborhood of $q$ and the kernel $u$ satisfies Condition 8.3.2 for $k$. Thus, the results for the $q$-quantile are analogous to those for the d.f.

8.4. Bootstrap via Smooth Sample Quantile Function


In Section 6.4 we introduced the bootstrap d.f. $T_n(F_n, \cdot)$ as an estimator of the d.f.
$$T_n(F, \cdot) = P_F\{T(F_n) - T(F) \le \cdot\}.$$
Thus, $T_n(F, \cdot)$ is the centered d.f. of the statistical functional $T(F_n)$. The bootstrap d.f. $T_n(F_n, \cdot)$ is obtained by plugging the sample d.f. $F_n$ into $T_n(F, \cdot)$ in place of $F$. For the $q$-quantile (the functional $T(F) = F^{-1}(q)$) it was indicated that the bootstrap error $T_n(F_n, t) - T_n(F, t)$ is of order $O(n^{-1/4})$.
Thus, the bootstrap estimator converges very slowly. We also refer to the illustrations in Section 6.4, which reveal its poor performance for small sample sizes. Another unpleasant feature of the bootstrap estimate is that it is a step function.
In the present section we indicate that, under appropriate regularity conditions, the bootstrap estimator based on a smooth version of the sample d.f. performs better.

The Smooth Bootstrap D.F.


Let again $F_{n,0}^{-1}$ denote the kernel q.f. as defined in Section 8.2. We have
$$F_{n,0}^{-1}(q) = \int_0^1 \beta(q)^{-1}u\Big(\frac{q-y}{\beta(q)}\Big)F_n^{-1}(y)\,dy \qquad (8.4.1)$$
where the kernel $u$ satisfies the conditions $u \ge 0$, $u(x) = 0$ for $|x| > 1$, and $\int u(x)\,dx = 1$. Moreover, the bandwidth function $\beta(q)$ is defined as in (8.2.3) or (8.2.4). Denote by $F_{n,0}$ the smooth sample d.f., defined as the inverse of the kernel q.f. $F_{n,0}^{-1}$.
By plugging $F_{n,0}$ into $T_n(\cdot, t)$ (instead of $F_n$) we get the smooth bootstrap d.f. $T_n(F_{n,0}, \cdot)$.

One may also use the kernel estimator of the d.f. introduced in Section 8.2.
Since $F_{n,0}$ is absolutely continuous, the smooth bootstrap d.f. $T_n(F_{n,0}, \cdot)$ can be expected to be absolutely continuous as well. This will be illustrated in the particular case of the $q$-quantile.
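In practice the smooth bootstrap d.f. is usually evaluated by simulation. The following Monte Carlo sketch follows the remark above and uses the kernel d.f. $F_{n,0}$ of (8.2.18) with the Epanechnikov kernel. It does not use the inverse of the kernel q.f., because the bandwidth functions (8.2.3) and (8.2.4) are not reproduced here. A draw from $F_{n,0}$ is $\xi_I + \beta V$, with $I$ uniform on $\{1, \ldots, n\}$ and $V$ having density $u$, because $P\{\xi_I + \beta V \le x\} = n^{-1}\sum_i U((x-\xi_i)/\beta)$. All names, the number of replications, and the convention $F_n^{-1}(q) = X_{\lceil nq\rceil:n}$ are assumptions of the sketch.

```python
import math
import random


def epan_cdf(z: float) -> float:
    """U(z) for the Epanechnikov kernel."""
    if z < -1.0:
        return 0.0
    if z > 1.0:
        return 1.0
    return 0.5 + 0.75 * z - 0.25 * z ** 3


def epan_inv(p: float) -> float:
    """Inverse of U: maps a uniform p to V with density u."""
    return 2.0 * math.sin(math.asin(2.0 * p - 1.0) / 3.0)


def kernel_df(x, data, beta):
    return sum(epan_cdf((x - xi) / beta) for xi in data) / len(data)


def kernel_df_quantile(q, data, beta, tol=1e-10):
    """T(F_{n,0}) = (F_{n,0})^{-1}(q), computed by bisection."""
    lo, hi = min(data) - beta, max(data) + beta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if kernel_df(mid, data, beta) < q else (lo, mid)
    return hi


def sample_quantile(xs, q):
    """F_n^{-1}(q) = X_{ceil(nq):n}."""
    s = sorted(xs)
    return s[max(math.ceil(len(s) * q), 1) - 1]


def smooth_bootstrap_df(data, q, beta, ts, reps=2000, seed=0):
    """Monte Carlo estimate of T_n(F_{n,0}, t) = P{G_n^{-1}(q) - (F_{n,0})^{-1}(q) <= t},
    where G_n^{-1} is the sample q.f. of n i.i.d. draws from F_{n,0}."""
    rng = random.Random(seed)
    n = len(data)
    center = kernel_df_quantile(q, data, beta)
    diffs = []
    for _ in range(reps):
        draw = [rng.choice(data) + beta * epan_inv(rng.random()) for _ in range(n)]
        diffs.append(sample_quantile(draw, q) - center)
    return [sum(d <= t for d in diffs) / reps for t in ts]
```

For example, with `data` a list of 20 standard normal observations, `smooth_bootstrap_df(data, 0.4, 0.5, ts)` returns estimates of the smooth bootstrap d.f. at the points `ts`. The bandwidth 0.5 is purely illustrative.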

Illustration
Given $n$ i.i.d. random variables with standard normal d.f. $\Phi$, define again, as in Section 6.4, the normalized d.f. of the sample $q$-quantile by
$$T_n^*(F, t) = T_n\big(F, (q(1-q))^{1/2}t/(n^{1/2}\varphi(\Phi^{-1}(q)))\big),$$
where $\varphi$ is the standard normal density.

For a sample of size $n = 20$ (Figure 8.4.1) and $n = 200$ (Figure 8.4.2) we compare the normalized d.f. $T_n^*(F, \cdot)$ of the sample $q$-quantile, the normalized bootstrap d.f. $T_n^*(F_n, \cdot)$, and the normalized smooth bootstrap d.f. $T_n^*(F_{n,0}, \cdot)$. The kernel q.f. $F_{n,0}^{-1}$ is defined with the bandwidth function in (8.2.4) with $\beta = 0.07$. Moreover, $u$ is the Epanechnikov kernel.

Figure 8.4.1. $T_n^*(\Phi, \cdot)$, $T_n^*(F_n, \cdot)$, $T_n^*(F_{n,0}, \cdot)$ for $q = .4$, $n = 20$.

Figure 8.4.2. $T_n^*(\Phi, \cdot)$, $T_n^*(F_n, \cdot)$, $T_n^*(F_{n,0}, \cdot)$ for $q = .4$, $n = 200$.

Smooth Bootstrap Error Process


In the sequel we again use the same symbol for a d.f. and the corresponding probability measure. Write
$$T_n(F, B) = P_F\{(T(F_n) - T(F)) \in B\} \qquad (8.4.2)$$
for Borel sets $B$.
Define the bootstrap error process $\mu_n(F, \cdot)$ by
$$\mu_n(F, B) = T_n(F_{n,0}, B) - T_n(F, B). \qquad (8.4.3)$$
Notice that $\mu_n(F, \cdot)$ is the difference of a random and a non-random probability measure and is thus a random signed measure. Let $\mathcal{S}$ be a system of Borel sets. Below we study the asymptotic behavior of
$$\sup_{B\in\mathcal{S}}|\mu_n(F, B)|$$
as $n \to \infty$ in the particular case of the functional $T(F) = F^{-1}(q)$ for some fixed $q \in (0,1)$.
Put $\sigma_0^2 = q(1-q)\big((F^{-1})'(q)\big)^2$ and
$$v_n(F, B) = \int_{n^{1/2}B}\big[1 - (x/\sigma_0)^2\big]\,dN_{(0,\sigma_0^2)}(x). \qquad (8.4.4)$$

Straightforward calculations show that
$$\sup_t|v_n(F, (-\infty, t])| = (2\pi e)^{-1/2}$$
and
$$\sup_{t>0}|v_n(F, [-t, t])| = \sup_B|v_n(F, B)| = (2/(\pi e))^{1/2}. \qquad (8.4.5)$$
Notice that these expressions do not depend on the underlying d.f. $F$.
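The constants in (8.4.5) follow from $(z\varphi(z))' = (1 - z^2)\varphi(z)$. After the substitution $z = x/\sigma_0$, the signed mass of $(-\infty, t]$ is $(t/\sigma_0)\varphi(t/\sigma_0)$. Its absolute value is maximal at $t = \pm\sigma_0$, where it equals $\varphi(1) = (2\pi e)^{-1/2}$. For $[-t, t]$ one gets $2(t/\sigma_0)\varphi(t/\sigma_0)$, whose maximum is $2\varphi(1) = (2/(\pi e))^{1/2}$. The check below is a sketch using only the standard library. It confirms this numerically for two values of $\sigma_0$ and so illustrates that the constants do not depend on $F$.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def signed_mass_halfline(t: float, sigma0: float) -> float:
    """int_{-inf}^t [1 - (x/sigma0)^2] dN(0, sigma0^2)(x) = (t/sigma0) * phi(t/sigma0)."""
    z = t / sigma0
    return z * PHI.pdf(z)


def signed_mass_numeric(a: float, b: float, sigma0: float, steps: int = 20000) -> float:
    """Midpoint rule for the same integral over (a, b]."""
    nd = NormalDist(0.0, sigma0)
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += (1.0 - (x / sigma0) ** 2) * nd.pdf(x)
    return total * h


for sigma0 in (0.5, 3.0):
    grid = [k * 1e-3 * sigma0 for k in range(-5000, 5001)]
    sup_half = max(abs(signed_mass_halfline(t, sigma0)) for t in grid)
    sup_sym = max(2.0 * abs(signed_mass_halfline(t, sigma0)) for t in grid if t > 0)
    print(sigma0, sup_half, (2 * math.pi * math.e) ** -0.5,
          sup_sym, (2 / (math.pi * math.e)) ** 0.5)
```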


Theorem 8.4.1. Assume that
(a) $F^{-1}$ has a bounded second derivative near $q$ and $(F^{-1})'(q) > 0$ for some fixed $q \in (0,1)$,
(b) the bandwidth $\beta_n$ satisfies the conditions $n\beta_n^3 \to 0$ and $n\beta_n^2 \to \infty$ as $n \to \infty$,
(c) the kernel $u$ has a bounded second derivative.
Then,
$$P_F\Big\{\Big(n\beta_n\Big/\int u^2(y)\,dy\Big)^{1/2}\sup_{B\in\mathcal{S}}|\mu_n(F, B)|\Big/\sup_{B\in\mathcal{S}}|v_n(F, B)| \le t\Big\} \to 2\Phi(t) - 1 \qquad (8.4.6)$$
as $n \to \infty$ for every $t \ge 0$, whenever $\sup_{B\in\mathcal{S}}|v_n(F, B)| > 0$.

The key idea of the proof is to establish the asymptotic normality of the sample $q$-quantile of i.i.d. random variables with common q.f. $F_{n,0}^{-1}$. According to Lemma 8.3.3, such q.f.'s satisfy the required smoothness conditions with high probability.
A version of Theorem 8.4.1, with $\mathcal{S} = \{(-\infty, t]: t \in \mathbb{R}\}$ and $F_{n,0}$ being the smooth sample d.f., is proved in Falk and Reiss (1989). A detailed proof of the present result will be given elsewhere.
By (8.4.6), the bootstrap error is of order $(n\beta_n)^{-1/2}$. Hence, if $\beta_n = n^{-1/3}$, the accuracy of the bootstrap approximation is, roughly speaking, of order $O(n^{-1/3})$. The choice $\beta_n = n^{-1/2}$ leads to a bootstrap estimator whose rate of convergence is similar to that of Section 6.4.
Under stronger regularity conditions it is possible to construct bootstrap estimates of higher accuracy. Assume that $F^{-1}$ has three bounded derivatives near $q$ and that the kernel $u$ has three bounded derivatives. Moreover, assume that $\int u(x)x\,dx = 0$; non-negative, symmetric kernels $u$ satisfy this condition. Then the condition $n\beta_n^3 \to 0$ in Theorem 8.4.1 can be weakened to $n\beta_n^5 \to 0$ as $n \to \infty$. Consequently, for an appropriate choice of $\beta_n$, the smooth bootstrap d.f. converges, roughly speaking, at the rate $O(n^{-2/5})$.

P.8. Problems and Supplements


1. (Randomized sample quantiles)
(i) Define a class of median unbiased estimators of the $q$-quantile by choosing $X_{r:n}$ with probability $p(r)$, where $\sum_{r=0}^{n}p(r) = 1$ and
$$\sum_{r=0}^{n}\sum_{k=r}^{n}\binom{n}{k}q^k(1-q)^{n-k}p(r) = 1/2.$$
(Pfanzagl, 1985, page 435)
(ii) Establish a representation corresponding to that in P.1.28 for the randomized sample median.
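2. (Testing the $q$-quantile)
(i) Let $f_0$ and $f_1$ be the densities in (8.1.7). Construct d.f.'s $G_m \in \mathcal{F}$ with densities $g_m$ for $m = 1, 2, 3, \ldots$ such that $G_m^{-1}(q) = u$ and $g_m(x) \to f_0(x)$ as $m \to \infty$ for every $x \ne u$.
(ii) Let $\varphi$ and $\mathcal{F}$ be as in Lemma 8.1.3. Prove that for every critical function $\psi$ of level $\alpha$ such that $E_F\psi = \alpha$ if $F \in \mathcal{F}$ and $F^{-1}(q) = u$, the following relations hold:
$$E_F\psi \ge E_F\varphi \ \text{ if } F \in \mathcal{F} \text{ and } F^{-1}(q) < u, \qquad E_F\psi \le E_F\varphi \ \text{ if } F \in \mathcal{F} \text{ and } F^{-1}(q) > u.$$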

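3. Let $\varphi$ and $\mathcal{F}$ be as in Lemma 8.1.3 and let $\mathcal{G}$ be a sub-family of $\mathcal{F}$. For $\varepsilon > 0$ define an "$\varepsilon$-neighborhood" $\mathcal{G}_\varepsilon$ of $\mathcal{G}$ by
$$\mathcal{G}_\varepsilon = \{F \in \mathcal{F}: |f - g| \le \varepsilon g \text{ for some } G \in \mathcal{G}\}$$
where $f$ and $g$ denote the differentiable densities of $F$ and $G$.
Then for every critical function $\psi$ which has the property
$$E_F\psi \le \alpha \quad\text{if } F \in \mathcal{G}_\varepsilon \text{ and } F^{-1}(q) \le u$$
we have
$$E_F\psi \le E_F\varphi \quad\text{if } F \in \mathcal{G}_{\varepsilon/2} \text{ and } q - \frac{\varepsilon q(1-q)}{4(1+\varepsilon)} \le F(u) < q.$$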

4. (Stochastic properties of kernel density estimator)
Find conditions under which the density estimator $f_n \equiv F_{n,1}$ [see (8.2.19)] has the following properties:
(i) $\int f_n(y)\,dy = 1$.
(ii) $Ef_n(x) = \int u(y)f(x - \beta y)\,dy$.
(iii) $Ef_n(x) \to f(x)$ as $\beta \to 0$.
(iv) $\big|Ef_n(x) - f(x) - \beta^2 f^{(2)}(x)\int u(y)y^2\,dy/2\big| = o(\beta^2)$.
(v) $\big|E[f_n(x) - Ef_n(x)]^2 - (n\beta)^{-1}f(x)\int u^2(y)\,dy\big| = O(n^{-1})$.
(vi) Let $\int u(y)y^2\,dy > 0$ and $f^{(2)}(x) \ne 0$. Show that
$$\beta = n^{-1/5}\Big[f(x)\int u^2(y)\,dy\Big]^{1/5}\Big/\Big|f^{(2)}(x)\int u(y)y^2\,dy\Big|^{2/5}$$
minimizes the term
$$(n\beta)^{-1}f(x)\int u^2(y)\,dy + \Big[\beta^2 f^{(2)}(x)\int u(y)y^2\,dy/2\Big]^2.$$
For this choice of $\beta$, the mean square error of $f_n(x)$ satisfies the relation
$$E[f_n(x) - f(x)]^2 = \frac{5}{4}n^{-4/5}\Big[f(x)\int u^2(y)\,dy\Big]^{4/5}\Big|f^{(2)}(x)\int u(y)y^2\,dy\Big|^{2/5} + o(n^{-4/5}).$$

5. (Orthogonal series estimator)
For $x \in [0,1]$ define $e_0(x) = 1$ and
$$e_{2j-1}(x) = 2^{1/2}\cos(2\pi jx), \qquad e_{2j}(x) = 2^{1/2}\sin(2\pi jx), \qquad j = 1, 2, 3, \ldots.$$
(i) (a) $e_0, e_1, e_2, \ldots$ are orthonormal [w.r.t. the inner product $(f, g) = \int_0^1 f(x)g(x)\,dx$].
(b) Let
$$f = 1 + \sum_{i=1}^{s}a_ie_i$$
be a probability density and $\xi_1, \ldots, \xi_n$ i.i.d. random variables with common density $f$. Then, for every $x \in [0,1]$,
$$f_n(x) = 1 + \sum_{i=1}^{s}\Big(n^{-1}\sum_{j=1}^{n}e_i(\xi_j)\Big)e_i(x)$$
is an expectation unbiased estimator of $f(x)$ having the integrated variance
$$\int_0^1\mathrm{Var}(f_n(x))\,dx = n^{-1}\sum_{i=0}^{s}\int_0^1 e_i^2(x)f(x)\,dx - n^{-1}\Big(1 + \sum_{i=1}^{s}a_i^2\Big) = O(s/n)$$
(see Prakasa Rao, 1983, Example 2.2.1).
(ii) (Problem) Investigate the asymptotic performance of
$$\int_0^1(f_n(x) - 1)^2\,dx$$
as a test statistic for testing the uniform distribution on (0,1) against alternatives as given in (i)(b) with $s \equiv s(n) \to \infty$ as $n \to \infty$. (Compare with Example 10.4.1.)
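6. There exists a constant $C(\rho) > 0$, depending only on $\rho > 0$, such that
$$\Big|N_{(\mu_n, v_n^2)}\big((\mu - \sigma yn^{-1/2},\ \mu + \sigma yn^{-1/2})\big) - \Big(2\Phi\Big(y\Big[1 + \frac{1 - (n/\sigma^2)\big(v_n^2 + (\mu_n - \mu)^2\big)}{2}\Big]\Big) - 1\Big)\Big| \le C(\rho)\Big(\max\Big(\Big|\frac{n^{1/2}v_n}{\sigma} - 1\Big|, \frac{n(\mu_n - \mu)^2}{\sigma^2}\Big)\Big)^{3/2}$$
for every $y \ge 0$, $v_n > 0$, $\sigma > 0$ with $n(\mu_n - \mu)^2/\sigma^2 \le \rho$, $-\infty < \mu_n, \mu < \infty$, and positive integers $n$.
(Reiss, 1981c)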

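7. Denote by $G_n^{-1}$ the sample q.f. when $F_{n,0}$ is the underlying d.f. Prove that
$$P_{F_{n,0}}\big\{\big(G_n^{-1}(q) - F_{n,0}^{-1}(q)\big)/F_{n,1}^{-1}(q) \in B\big\}$$
is a more accurate approximation to
$$P_F\big\{\big(F_n^{-1}(q) - F^{-1}(q)\big)/(F^{-1})'(q) \in B\big\}$$
than the bootstrap distribution $T_n(F_{n,0}, \cdot)$ is to $T_n(F, \cdot)$.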


8. (Generating pseudo-random variables)
Generate pseudo-random numbers according to the kernel q.f. $F_{n,0}^{-1}$ and the kernel d.f. $F_{n,0}$.
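For Problem 8, one possible approach (a sketch, not the intended solution) is inverse transform sampling for the kernel q.f. and the mixture representation $\xi_I + \beta V$, $V \sim u$, for the kernel d.f. The bandwidth functions (8.2.3) and (8.2.4) are not reproduced here, so the kernel q.f. uses the hypothetical bandwidth $b(q) = \min(\beta, q, 1-q)$, which keeps the window inside $[0,1]$. With the Epanechnikov kernel the weights of the order statistics are then non-negative and sum to one. Since $q - b(q)z$ is non-decreasing in $q$ for every $|z| \le 1$, the resulting $F_{n,0}^{-1}$ is non-decreasing and hence a genuine q.f. All names are hypothetical.

```python
import math
import random


def epan_cdf(z: float) -> float:
    """U(z) for the Epanechnikov kernel."""
    if z < -1.0:
        return 0.0
    if z > 1.0:
        return 1.0
    return 0.5 + 0.75 * z - 0.25 * z ** 3


def epan_inv(p: float) -> float:
    """Inverse of U: maps a uniform p to V with density u."""
    return 2.0 * math.sin(math.asin(2.0 * p - 1.0) / 3.0)


def kernel_qf(q: float, xs_sorted: list[float], beta: float) -> float:
    """Kernel q.f. with the hypothetical bandwidth b(q) = min(beta, q, 1 - q)."""
    n = len(xs_sorted)
    b = min(beta, q, 1.0 - q)
    if b <= 0.0:  # q == 0 (or 1): fall back to the extreme order statistic
        return xs_sorted[0] if q <= 0.5 else xs_sorted[-1]
    return sum((epan_cdf((q - (i - 1) / n) / b) - epan_cdf((q - i / n) / b)) * x
               for i, x in enumerate(xs_sorted, start=1))


def draw_kernel_qf(data, beta, size, rng):
    """Inverse transform: F_{n,0}^{-1}(W) with W uniform on [0, 1)."""
    xs = sorted(data)
    return [kernel_qf(rng.random(), xs, beta) for _ in range(size)]


def draw_kernel_df(data, beta, size, rng):
    """xi_I + beta * V with I uniform on the sample and V ~ u: d.f. F_{n,0} of (8.2.18)."""
    return [rng.choice(data) + beta * epan_inv(rng.random()) for _ in range(size)]
```

Draws from the kernel q.f. stay within $[X_{1:n}, X_{n:n}]$, whereas draws from the kernel d.f. can leave that range by up to $\beta$. This is the behavior near the endpoints discussed for Figures 8.2.3 and 8.2.4.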

Bibliographical Notes
It was proved by Pfanzagl (1975) that the sample $q$-quantile (including the sample median) is an asymptotically efficient estimator of the $q$-quantile (the median) in the class of asymptotically median unbiased estimators. It is well known that for symmetric densities one can find nonparametric estimators of the symmetry point which are as efficient as parametric estimators. According to Pfanzagl's result, a corresponding procedure is not possible if there is even the slightest violation of the symmetry condition.

In Section 8.2 we studied special topics from nonparametric density estimation or, in other words, nonparametric curve estimation. We refer to the book of Prakasa Rao (1983) for a comprehensive account of this field.
In data analysis, histograms, which are closely related to kernel estimators, have been used extensively for a long time. As early as 1944, Smirnov established an interesting mathematical result concerning the maximum deviation of the histogram from the underlying density. Since the celebrated articles of Rosenblatt (1956) and Parzen (1962), much research has been done in this field.
The kernel estimator of the d.f. was studied by Nadaraya (1964), Yamato (1973), Winter (1973), and Reiss (1981c). Falk (1983) proved that there exist kernels $u$ satisfying condition (8.3.10) as well as Condition 8.3.2 for $k > 1$. Falk (1983) and Mammitzsch (1984) solved the question of the optimal choice of kernels for estimating d.f.'s and q.f.'s.
The basic idea behind the kernel estimator of the $q$-quantile is to average over order statistics close to the sample $q$-quantile. The simplest case is given by quasi-quantiles, which are built from two or, more generally, from a fixed number $k$ of order statistics. In the nonparametric context, quasi-quantiles were used by Hodges and Lehmann (1967) to estimate the center of a symmetric distribution and by Reiss (1980, 1982) to estimate and test $q$-quantiles. The kernel estimator of the q.f. was introduced by Parzen (1979) and, independently, by Reiss (1982). The asymptotic performance of the kernel estimator of the q.f. was investigated by Falk (1984a, 1985a). Other notable articles on this subject are Brown (1981), Harrell and Davis (1982), and Yang (1985), among others.
The derivative of the q.f. (the quantile density function) can easily be estimated by means of the difference of two order statistics. An estimator of the quantile density function may, e.g., be applied to construct confidence bounds for the $q$-quantile. The estimation of the quantile density function is closely related to the estimation of the density by histograms with random cell boundaries. Such histograms were dealt with by Siddiqui (1960), Bloch and Gastwirth (1968), van Ryzin (1973), Tusnády (1974), and Reiss (1975a, 1978). A confidence band based on the moving scheme (see (8.2.2)) was established in Reiss (1977b). It applies a result for kernel density estimators due to Bickel and Rosenblatt (1973) and a Bahadur approximation result like Theorem 6.3.1.
Another example of the kernel method is provided by smoothing the log survivor function and taking the derivative, which leads to a kernel estimator of the hazard function (see Rice and Rosenblatt, 1976). A related estimator of the hazard function was investigated earlier by Watson and Leadbetter (1964a, 1964b).
Sharp results for the almost sure behavior of kernel density estimators were proved by Stute (1982) by applying the result concerning the oscillation of the sample d.f. A notable article on this subject is Reiss (1975b).

CHAPTER 9

Extreme Value Models

This chapter is devoted to parametric and nonparametric extreme value models. The parametric models result from the limiting distributions of sample extremes, whereas the nonparametric models contain the actual distributions of sample extremes. Statistical inference within the nonparametric framework will be carried out by applying the parametric results.
The importance of parametric statistical procedures for the nonparametric set-up (see also Section 10.4) may revive interest in parametric problems. However, we do not intend to give a detailed, exhaustive survey of the various statistical procedures concerning extreme values.
The central idea of our approach will be explained through a simple but important problem: estimating a parameter $\alpha$ that describes the shape of the distribution in the parametric model and the domain of attraction in the nonparametric model.
In Section 9.1 we outline some statistical ideas that are basic for our considerations. In particular, we explain in detail the straightforward and widely adopted device of transforming a given model in order to simplify the statistical inference. This discussion is continued in Section 10.1, where the concept of "sufficiency" is added to our considerations.
Sections 9.2 and 9.3 deal with the sampling of independent maxima. Section 9.4 introduces the parametric model that describes the sampling of the $k$ largest order statistics. It is shown that in important cases this model can be transformed into a model defined by independent observations. The nonparametric counterpart is treated in Section 9.5.
Section 9.6 compares the results of Sections 9.3 and 9.5. The 3-parameter extreme value family contains regular and non-regular subfamilies, and hence the statistical inference can be intricate. However, the classical model has a rather limited range; it can be enlarged by adding further parameters, as will be indicated in Section 9.6.
In Section 9.7 we continue our study of the evaluation of the unknown q.f. The information that the underlying d.f. belongs to the domain of attraction of an extreme value d.f. is used to construct a competitor of the sample q.f. near the endpoints.

9.1. Some Basic Concepts of Statistical Theory


In the present section we recall some simple facts from statistical theory. The first part mainly concerns the estimation of an unknown parameter, such as the shape parameter of an extreme value distribution. The second part deals with the comparison of statistical models.

Remarks about Estimation Theory


Consider the fairly general estimation problem where a sequence $\xi_1, \xi_2, \ldots$ of
r.v.'s (with common distribution $P_\theta$, $\theta \in \Theta$) is given which enables us to
construct a consistent estimator of a real-valued parameter $\theta$ as the sample
size $k$ tends to infinity.
In applications the sample size k will be predetermined or chosen by the
statistician so that the estimation procedure attains a certain accuracy. Then,
one faces two problems, namely that of measuring the accuracy of estimators
and in a second step that of finding an optimal estimator in order not to waste
observations (although in some cases it may be preferable to use quick
estimators in order not to waste time).
For an estimator $\theta_k^* \equiv \theta_k^*(\xi_1, \ldots, \xi_k)$ of the parameter $\theta$ a widely accepted
measure of accuracy is the mean square error

$$E(\theta_k^* - \theta)^2. \tag{9.1.1}$$

If necessary the expectation is denoted by $E_\theta$ etc. instead of $E$ in order to
indicate that the r.v.'s $\xi_1, \ldots, \xi_k$ and, thus, the expectation as well depend on
the parameter $\theta$. Since

$$E_\theta(\theta_k^* - \theta)^2 = E_\theta(\theta_k^* - E_\theta\theta_k^*)^2 + (E_\theta\theta_k^* - \theta)^2 \tag{9.1.2}$$

we know that the mean square error is the variance if $\theta_k^*$ is expectation
unbiased.
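The decomposition (9.1.2) can be checked numerically. The following Python sketch is a hypothetical illustration, not taken from the text: it simulates a deliberately biased estimator (a shrunken sample mean, chosen only for the example) and compares its empirical mean square error with variance plus squared bias. With moments taken w.r.t. the empirical distribution of the estimates, the identity holds exactly up to rounding.

```python
import random
import statistics


def mse_decomposition(estimates, theta):
    """Return (mse, variance, squared_bias) of a list of estimates of theta.

    Moments are taken w.r.t. the empirical distribution of the estimates
    (division by the number of replications), so that the identity
    (9.1.2) holds exactly up to floating-point rounding.
    """
    mean_est = statistics.fmean(estimates)
    mse = statistics.fmean([(e - theta) ** 2 for e in estimates])
    variance = statistics.fmean([(e - mean_est) ** 2 for e in estimates])
    squared_bias = (mean_est - theta) ** 2
    return mse, variance, squared_bias


# Hypothetical example: 0.8 * (mean of k normal observations) as an
# estimator of theta; its bias is -0.2 * theta.
rng = random.Random(1)
theta, k, reps = 2.0, 10, 20000
estimates = [0.8 * statistics.fmean([rng.gauss(theta, 1.0) for _ in range(k)])
             for _ in range(reps)]
mse, var, bias2 = mse_decomposition(estimates, theta)
print(mse, var + bias2)  # equal up to rounding; theory: 0.064 + 0.16 = 0.224
```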
In general, the accuracy of the estimator can be measured by the expected
loss

$$E_\theta L(\theta_k^* \mid \theta) \tag{9.1.3}$$

where $L$ is an appropriate loss function. Note that A. Wald in his supreme
wisdom decided to call $E_\theta L(\theta_k^* \mid \theta)$ risk instead of expected loss. For a detailed

9. Extreme Value Models


discussion of the problem of comparing estimators and of the definitions of
optimal estimators we refer to Pfanzagl (1982), pages 151-154. We indicate
some basic facts.
There does not exist a canonical criterion for the selection of an optimal
estimator. However, one basic idea for any definition of optimality is to
exclude degenerate estimators such as an estimator which is constant.
An estimator $\theta_k^*$ is optimal w.r.t. the global minimax criterion if

$$\sup_{\theta} E_\theta L(\theta_k^* \mid \theta) = \inf \sup_{\theta} E_\theta L(\theta_k^{**} \mid \theta) \tag{9.1.4}$$

where the inf is taken over the given class of estimators $\theta_k^{**}$. Notice that (9.1.4)
can be modified to a local minimax criterion by taking the sup over a
neighborhood of $\theta_0$ for each $\theta_0 \in \Theta$.
The Bayes risk of an estimator $\theta_k^{**}$ w.r.t. a "prior distribution $\Lambda$" is given
by the weighted risk

$$\int E_\theta L(\theta_k^{**} \mid \theta)\, d\Lambda(\theta)$$

where $\Lambda$ is a probability measure on the parameter space $\Theta$ equipped with a
$\sigma$-field. The optimum estimator is now the Bayes estimator $\theta_k^*$ which minimizes the Bayes risk; that is,

$$\int E_\theta L(\theta_k^* \mid \theta)\, d\Lambda(\theta) = \inf \int E_\theta L(\theta_k^{**} \mid \theta)\, d\Lambda(\theta) \tag{9.1.5}$$

where the inf is taken over the given class of estimators $\theta_k^{**}$. In certain
applications one also considers generalized Bayes estimators where $\Lambda$ is a
measure (not necessarily a probability measure); this generalization leads, e.g., to Pitman estimators (compare with
(10.1.23)). For a detailed treatment of Bayes and minimax procedures we refer
to Ibragimov and Has'minskii (1981) and Witting (1985).
Alternatively, one can try to find an optimal estimator within a class of
estimators which satisfy an additional regularity condition. Recall that if the
estimators are assumed to be expectation unbiased then the use of (9.1.1) leads
to the famous Cramér-Rao bound as a lower bound for the variance. In the
nonparametric context (e.g. when estimating a density) one has to admit a
certain amount of bias of the estimator to gain a smaller mean square error.
The extension of the concept above to randomized estimators (Markov
kernels having their distributions on the parameter space $\Theta$) is straightforward. Notice that $E_\theta L(\theta_k^* \mid \theta) = \int L(\cdot \mid \theta)\, dQ_\theta$ where $Q_\theta$ is the distribution of
$\theta_k^*$. The extension is easily obtained by putting the distribution of the randomized estimator in place of $Q_\theta$.
A different restriction is obtained by the requirement that the estimator
$\theta_k^*$ is median unbiased or asymptotically median unbiased (compare with
Section 8.1).
Moreover, we shall base our calculations on covering probabilities of the
form

$$P\{-t' \le \theta_k^* - \theta \le t''\} \tag{9.1.6}$$

which measure the concentration of the estimator $\theta_k^*$ about $\theta$.
Let $L(\theta_1 \mid \theta_2)$ be of the form $L(\theta_1 - \theta_2)$. An estimator $\theta_k^*$ which is maximally concentrated about the true parameter $\theta$ will also minimize the risk
$E_\theta L(\theta_k^* - \theta)$ for bounded, negative unimodal loss functions $L$ having the mode
at zero [that is, $L$ is nonincreasing on $(-\infty, 0]$ and nondecreasing on $[0, \infty)$].
This can easily be deduced from P.3.5.

Comparison of Statistical Models


Next we describe the simplest version of the fundamental operation of replacing a given statistical model by another one which might be more accessible
to the statistician. The model

$$\mathcal{P} = \{P_\theta : \theta \in \Theta\} \tag{9.1.7}$$

will be replaced by

$$\mathcal{Q} = \{Q_\theta : \theta \in \Theta\}. \tag{9.1.8}$$

The two models can be compared by means of a map $T$ or, in general, by
a Markov kernel (the latter case will be dealt with in Chapter 10). The crucial
point is that the map $T$ is independent of the parameter $\theta$.
Given $\theta \in \Theta$ and a r.v. $\xi$ with distribution $P_\theta$, let $\eta = T(\xi)$ be distributed
according to $Q_\theta$.
Then, obviously, for any estimator $\hat\theta(\eta)$ operating on $\mathcal{Q}$ [or, in greater
generality, any statistical procedure] we find an estimator operating on $\mathcal{P}$,
namely,

$$\hat\theta^*(\xi) = \hat\theta(T(\xi)),$$

having the same distribution as $\hat\theta(\eta)$.
In terms of risks this yields that for every loss function $L$

$$E_\theta L(\hat\theta^*(\xi) \mid \theta) = E_\theta L(\hat\theta(\eta) \mid \theta), \qquad \theta \in \Theta. \tag{9.1.9}$$

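An extension of the framework above is needed in Section 9.3 where $\mathcal{Q}$ and
$\mathcal{P}$ have different parameter sets. Let $\mathcal{Q}$ be as in (9.1.8) and

$$\mathcal{P} = \{P_{\theta,h} : \theta \in \Theta,\ h \in H(\theta)\}. \tag{9.1.10}$$

Let $T$ be a map such that for every r.v. $\xi$ with distribution $P_{\theta,h}$ and r.v. $\eta$
with distribution $Q_\theta$,

$$\sup_B \left|P\{T(\xi) \in B\} - P\{\eta \in B\}\right| \le \varepsilon(\theta, h). \tag{9.1.11}$$

This implies (compare with P.3.5) that with $\hat\theta^*(\xi) = \hat\theta(T(\xi))$,

$$\left|E_{\theta,h} L(\hat\theta^*(\xi) \mid \theta) - E_\theta L(\hat\theta(\eta) \mid \theta)\right| \le \varepsilon(\theta, h) \sup_t L(t \mid \theta) \tag{9.1.12}$$

for every loss function $L(\cdot \mid \theta)$.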


For every procedure acting on $\mathcal{Q}$ we have found a procedure on $\mathcal{P}$ with the same
performance (within a certain error bound). Until now we have not excluded
the possibility that there exists a procedure on $\mathcal{P}$ which is superior to those
carried over from $\mathcal{Q}$ to $\mathcal{P}$. However, if $T$ is a one-to-one map (as e.g. in Example
9.1.1), one may interchange the roles of $\mathcal{Q}$ and $\mathcal{P}$ by taking the inverse $T^{-1}$
instead of $T$. Thus, the optimal procedure on $\mathcal{P}$ can be regained from the
corresponding one on $\mathcal{Q}$.
In connection with loss functions the parameter $\theta$ is not necessarily
real-valued. The extension of the concept to functional parameters is obvious.
EXAMPLE 9.1.1. Section 9.2 will provide a simple example for the comparison
of two models. Here, with $\theta = (\sigma, \alpha)$, $P_\theta$ is the Fréchet distribution with scale
parameter $\sigma$ and shape parameter $1/\alpha$, and $Q_\theta$ is the Gumbel distribution with
location parameter $\log\sigma$ and scale parameter $\alpha$. The transformation $T$ is given
by $T = \log$.
Moreover, given a sample of size $k$ one has to take the transformation

$$T(x_1, \ldots, x_k) = (\log x_1, \ldots, \log x_k).$$

A continuation of this discussion can be found in Section 10.1.

9.2. Efficient Estimation in Extreme Value Models


Given a d.f. $G$ denote by $G^{(\mu,\sigma)}$ the corresponding d.f. with location parameter
$\mu$ and scale parameter $\sigma$; thus, we have

$$G^{(\mu,\sigma)}(x) = G((x - \mu)/\sigma).$$

Frechet and Gumbel Model


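The starting point is the scale and shape parameter family of Fréchet d.f.'s
$G_{1,1/\alpha}^{(0,\sigma)}$. We have

$$G_{1,1/\alpha}^{(0,\sigma)}(x) = \exp\left[-(x/\sigma)^{-1/\alpha}\right], \qquad x > 0. \tag{9.2.1}$$

The usual procedure of treating the estimation problem is to transform the
given model to the location and scale parameter family of Gumbel d.f.'s $G_3^{(\theta,\alpha)}$
where $\theta = \log\sigma$. Notice that if $\xi$ is a r.v. with d.f. $G_{1,1/\alpha}^{(0,\sigma)}$ then $\eta = \log\xi$ is a r.v.
with d.f. $G_3^{(\theta,\alpha)}$.
The density of $G_3^{(\theta,\alpha)}$ will be denoted by $g_3^{(\theta,\alpha)}$.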

Gumbel Model: Fisher Information Matrix


For the calculation of the Fisher information matrix within the location and
scale parameter family of Gumbel d.f.'s we need the first two moments of the
distributions. The following two formulas are well known (see e.g. Johnson
and Kotz (1970)):

$$\int x\, dG_3(x) = \int_0^\infty (-\log x)\, e^{-x}\, dx = \gamma \tag{9.2.2}$$

where $\gamma = 0.5772\ldots$ is Euler's constant. Moreover,

$$\int x^2\, dG_3(x) = \gamma^2 + \pi^2/6. \tag{9.2.3}$$

From (9.2.2) and (9.2.3) it is obvious that a r.v. $\eta$ with d.f. $G_3^{(\theta,\alpha)}$ has the
expectation

$$E\eta = \theta + \alpha\gamma \tag{9.2.4}$$

and variance

$$\operatorname{Var}\eta = \alpha^2\pi^2/6. \tag{9.2.5}$$
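The transformation of Example 9.1.1 and the moment formulas (9.2.4) and (9.2.5) can be checked by simulation. The Python sketch below is only an illustration: it generates Fréchet variates by inversion (if $E$ is standard exponential, then $\sigma E^{-\alpha}$ has the d.f. (9.2.1)), takes logarithms, and compares the sample mean and variance with $\theta + \alpha\gamma$ and $\alpha^2\pi^2/6$. The seed, sample size and parameter values are arbitrary choices, and `frechet_sample` is a hypothetical helper name.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def frechet_sample(n, sigma, alpha, rng):
    """n draws from the Frechet d.f. exp[-(x/sigma)^(-1/alpha)] of (9.2.1).

    If E is standard exponential, P{sigma * E^(-alpha) <= x}
    = P{E >= (x/sigma)^(-1/alpha)} = exp[-(x/sigma)^(-1/alpha)].
    """
    return [sigma * rng.expovariate(1.0) ** (-alpha) for _ in range(n)]


rng = random.Random(7)
sigma, alpha, n = 2.0, 0.5, 200_000
eta = [math.log(x) for x in frechet_sample(n, sigma, alpha, rng)]  # T = log

theta = math.log(sigma)
mean_eta = sum(eta) / n
var_eta = sum((y - mean_eta) ** 2 for y in eta) / (n - 1)
print(mean_eta, theta + alpha * EULER_GAMMA)       # compare with (9.2.4)
print(var_eta, alpha ** 2 * math.pi ** 2 / 6)      # compare with (9.2.5)
```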
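The Fisher information matrix can be written as

$$I(\theta_1, \theta_2) = \left[\int \left(\frac{\partial}{\partial\theta_i} \log g_3^{(\theta_1,\theta_2)}(x)\right)\left(\frac{\partial}{\partial\theta_j} \log g_3^{(\theta_1,\theta_2)}(x)\right) dG_3^{(\theta_1,\theta_2)}(x)\right]_{i,j=1,2}.$$

By partial integration one can easily deduce from (9.2.4) and (9.2.5) that

$$I(\theta, \alpha) = \alpha^{-2}\begin{bmatrix} 1 & \gamma - 1 \\ \gamma - 1 & \pi^2/6 + (1 - \gamma)^2 \end{bmatrix}. \tag{9.2.6}$$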

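Check that the inverse matrix $I(\theta, \alpha)^{-1}$ of $I(\theta, \alpha)$ is given by

$$I(\theta, \alpha)^{-1} = \frac{6\alpha^2}{\pi^2}\begin{bmatrix} \pi^2/6 + (1 - \gamma)^2 & 1 - \gamma \\ 1 - \gamma & 1 \end{bmatrix}. \tag{9.2.7}$$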
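A quick numerical check of (9.2.6) and (9.2.7), and of the fact that the lower-right entry of $I(\theta,\alpha)^{-1}$ is the asymptotic variance $V(\alpha) = 6\alpha^2/\pi^2$ used in (9.2.8). This Python sketch hard-codes Euler's constant and an arbitrary value of $\alpha$; the helper names are hypothetical. It multiplies the two matrices, which should give the identity up to rounding. Note that neither matrix depends on $\theta$.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_fisher_information(alpha):
    """Fisher information matrix (9.2.6) of the Gumbel location/scale family."""
    g, c = EULER_GAMMA, alpha ** -2
    return [[c * 1.0, c * (g - 1.0)],
            [c * (g - 1.0), c * (math.pi ** 2 / 6 + (1.0 - g) ** 2)]]


def claimed_inverse(alpha):
    """Right-hand side of (9.2.7)."""
    g, c = EULER_GAMMA, 6 * alpha ** 2 / math.pi ** 2
    return [[c * (math.pi ** 2 / 6 + (1.0 - g) ** 2), c * (1.0 - g)],
            [c * (1.0 - g), c * 1.0]]


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][r] * b[r][j] for r in range(2)) for j in range(2)]
            for i in range(2)]


alpha = 1.7
product = matmul2(gumbel_fisher_information(alpha), claimed_inverse(alpha))
print(product)                                                       # identity matrix
print(claimed_inverse(alpha)[1][1], 6 * alpha ** 2 / math.pi ** 2)   # V(alpha)
```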

Gumbel Model: The Maximum Likelihood Estimator


The maximum likelihood (m.l.) estimator $(\hat\theta_k, \hat\alpha_k)$ of the location and scale
parameters in the Gumbel model is asymptotically normal with mean vector
$(\theta, \alpha)$ and covariance matrix $k^{-1} I(\theta, \alpha)^{-1}$. The rate of convergence to the
limiting normal distribution is of order $O(k^{-1/2})$ (proof!).
In the sequel, the estimators will be written in a factorized form: If the m.l.
estimator is based on $k$ i.i.d. random variables $\eta_1, \ldots, \eta_k$ we shall write
$\hat\alpha_k(\eta_1, \ldots, \eta_k)$ instead of $\hat\alpha_k$.
If the r.v.'s $\eta_1, \ldots, \eta_k$ have the common d.f. $G_3^{(\theta,\alpha)}$ then we obtain according
to (9.2.7) that

$$P\left\{(k/V(\alpha))^{1/2}\left(\hat\alpha_k(\eta_1, \ldots, \eta_k) - \alpha\right) \le t\right\} \to \Phi(t), \qquad k \to \infty, \tag{9.2.8}$$

where $V(\alpha) = 6\alpha^2/\pi^2$.


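Given the observations $x_1, \ldots, x_k$ the m.l. estimate $\hat\alpha_k(x_1, \ldots, x_k)$ is the
solution of the two log-likelihood equations

$$\sum_{i=1}^{k} e^{-(x_i - \theta)/\alpha} = k \tag{9.2.9}$$

and

$$\sum_{i=1}^{k} (x_i - \theta)\left[1 - e^{-(x_i - \theta)/\alpha}\right] = k\alpha. \tag{9.2.10}$$

Notice that (9.2.9) is equivalent to the equation

$$\theta = -\alpha \log\left[k^{-1}\sum_{i=1}^{k} e^{-x_i/\alpha}\right] \tag{9.2.11}$$

so that by inserting the expression for $\theta$ in (9.2.10) we get the equation

$$g(\alpha) = 0 \tag{9.2.12}$$

with $g$ defined by

$$g(\alpha) = k^{-1}\sum_{i=1}^{k} x_i - \frac{\sum_{i=1}^{k} x_i e^{-x_i/\alpha}}{\sum_{i=1}^{k} e^{-x_i/\alpha}} - \alpha. \tag{9.2.13}$$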
Observe that the solution $\hat\alpha_k(x_1, \ldots, x_k)$ of the equation (9.2.12) has the following property: For reals $\theta$ and $\alpha > 0$ we have

$$\hat\alpha_k(\theta + \alpha x_1, \ldots, \theta + \alpha x_k) = \alpha\,\hat\alpha_k(x_1, \ldots, x_k). \tag{9.2.14}$$

This property yields that there exist correction terms which make the m.l.
estimator of $\alpha$ median unbiased. The corresponding result also holds w.r.t.
expectation unbiasedness.
Equation (9.2.12) has to be solved numerically; however, this can hardly be
regarded as a serious drawback in the computer era. Approximate solutions
can be obtained by the Newton-Raphson iteration procedure. Notice that

$$(6^{1/2}/\pi)\, s_k(\eta_1, \ldots, \eta_k)$$

may serve as an initial estimator of $\alpha$, where

$$s_k^2(x_1, \ldots, x_k) = (k-1)^{-1}\sum_{i=1}^{k}\left[x_i - k^{-1}\sum_{j=1}^{k} x_j\right]^2$$

is the sample variance. The asymptotic performance of $(6^{1/2}/\pi)s_k$ is indicated
in P.9.2. We remark that the first iteration leads to

$$\alpha_k^* = \alpha_0 - \frac{g(\alpha_0)}{g'(\alpha_0)}, \qquad \alpha_0 = (6^{1/2}/\pi)\, s_k, \tag{9.2.15}$$

with $g$ as in (9.2.13) evaluated at $\eta_1, \ldots, \eta_k$. The estimator $\alpha_k^*(\eta_1, \ldots, \eta_k)$ has the same asymptotic performance as the m.l.
estimator. Further iterations may improve the finite sample properties of the
estimator.
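From (9.2.11) we know that the m.l. estimator of the location parameter is
given by

$$\hat\theta_k(\eta_1, \ldots, \eta_k) = -\hat\alpha_k(\eta_1, \ldots, \eta_k)\log\left[k^{-1}\sum_{i=1}^{k} \exp\left(-\eta_i/\hat\alpha_k(\eta_1, \ldots, \eta_k)\right)\right]. \tag{9.2.16}$$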


Efficient Estimation of $\alpha$
Let us concentrate on estimating the parameter $\alpha$.
(9.2.14) yields that (9.2.8) holds uniformly over the location and scale
parameters $\theta$ and $\alpha$. A further consequence is that the m.l. estimator is
asymptotically efficient in the class of all estimators $\alpha_k^*(\eta_1, \ldots, \eta_k)$ which are
asymptotically median unbiased in a locally uniform way. For such estimators
we get for every $t', t'' > 0$,

$$P\{-t'k^{-1/2} \le \alpha_k^*(\eta_1, \ldots, \eta_k) - \alpha \le t''k^{-1/2}\} \le P\{-t'k^{-1/2} \le \hat\alpha_k(\eta_1, \ldots, \eta_k) - \alpha \le t''k^{-1/2}\} + o(k^0). \tag{9.2.17}$$

We return to the Fréchet model of d.f.'s $G_{1,1/\alpha}^{(0,\sigma)}$ with scale parameter $\sigma$ and
shape parameter $1/\alpha$. The results above can easily be made applicable to the
Fréchet model.
If $\xi_1, \ldots, \xi_k$ are i.i.d. random variables with common d.f. $G_{1,1/\alpha}^{(0,\sigma)}$ then it
follows from (9.2.8) and the discussion in Section 9.1 that

$$P\left\{(k/V(\alpha))^{1/2}\left(\hat\alpha_k(\log\xi_1, \ldots, \log\xi_k) - \alpha\right) \le t\right\} \to \Phi(t), \qquad k \to \infty. \tag{9.2.18}$$

The rate of convergence in (9.2.18) is again of order $O(k^{-1/2})$. Moreover, the
efficiency of $\hat\alpha_k(\eta_1, \ldots, \eta_k)$ as an estimator of the scale parameter of the Gumbel
distribution carries over to $\hat\alpha_k(\log\xi_1, \ldots, \log\xi_k)$ as an estimator of the shape
parameter of the Fréchet distribution.

9.3. Semiparametric Models for Sample Maxima


The parametric models as studied in Section 9.2 reflect the ideal world where
we are allowed to replace the actual distributions of sample maxima by the
limiting ones. By stating that the parametric model is an approximation to
the real world one acknowledges that the parametric model is incorrect,
although in many cases the error of the approximation can be neglected.
In the present section we shall study a nonparametric approach, give some
bounds for the error of the parametric approximation and discuss the meaning
of a statistical decision within the parametric model for the nonparametric
model.

Frechet Type Model


We observe the sample maximum $X_{m:m}$ of $m$ i.i.d. random variables with
common d.f. $F$ belonging to the domain of attraction of a Fréchet d.f. $G_{1,1/\alpha}$.
Our aim is to find an estimator of the shape parameter $\alpha$.
More precisely, we assume that $F$ is close to a Pareto d.f. $W_{1,1/\alpha}^{(0,\sigma)}$ (with
unknown scale parameter $\sigma$) in the following sense: $F$ has a density $f$ satisfying

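the condition

$$f(x) = (\sigma\alpha)^{-1}(x/\sigma)^{-(1+1/\alpha)} e^{h(x/\sigma)} \qquad \text{for } x \ge (x_0\sigma)^{-\alpha} \tag{9.3.1}$$

where $x_0 > 0$ is fixed and $h$ is a (measurable) function such that

$$|h(x)| \le L|x|^{-\delta/\alpha}$$

for some constants $L > 0$ and $\delta > 0$.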

Condition (9.3.1) is formulated in such a way that the results will hold
uniformly over $\sigma$ and $\alpha$. It is apparent that the Pareto and Fréchet densities
satisfy this condition with $h = 0$ and, respectively, $h(x) = -x^{-1/\alpha}$.
The present model can be classified as a semiparametric (in other words,
semi-nonparametric) model where the shape parameter $\alpha$ and (or) the scale
parameter $\sigma$ have to be evaluated and the function $h$ is a nonparametric
nuisance parameter which satisfies certain side conditions.
Let $X_{m:m}^{(1)}, \ldots, X_{m:m}^{(k)}$ be independent repetitions of $X_{m:m}$. The joint distribution of $X_{m:m}^{(1)}, \ldots, X_{m:m}^{(k)}$ will heavily depend on the parameters $\sigma$ and $\alpha$ whereas
the dependence on $h$, $x_0$, and $L$ can be neglected if $m$ is sufficiently large and
$k$ is small compared to $m$.
Let $\xi_1, \ldots, \xi_k$ be i.i.d. random variables with common d.f. $G_{1,1/\alpha}$. From
(3.3.12) and Corollary 5.2.7 it follows that

$$\sup_B \left|P\{(X_{m:m}^{(1)}, \ldots, X_{m:m}^{(k)}) \in B\} - P\{(\sigma m^\alpha \xi_1, \ldots, \sigma m^\alpha \xi_k) \in B\}\right| = O\left(k^{1/2}(m^{-\delta} + m^{-1})\right) \tag{9.3.2}$$

uniformly over $k$, $m$ and densities $f$ which satisfy (9.3.1) for some fixed values
$x_0$, $L$ and $\delta$. Notice that $\sigma m^\alpha \xi_i$ has the d.f. $G_{1,1/\alpha}^{(0,\sigma m^\alpha)}$.
Let again $\hat\alpha_k$ be the solution of the m.l. equation (9.2.12). Combining (9.2.18)
and (9.3.2) we get

Theorem 9.3.1.

$$P\left\{(k/V(\alpha))^{1/2}\left[\hat\alpha_k(\log X_{m:m}^{(1)}, \ldots, \log X_{m:m}^{(k)}) - \alpha\right] \le t\right\} = \Phi(t) + O\left(k^{1/2}(m^{-\delta} + m^{-1}) + k^{-1/2}\right) \tag{9.3.3}$$

uniformly over $t$, $k$, $m$ and densities $f$ which satisfy condition (9.3.1) for some
fixed constants $x_0$, $L$, and $\delta$. Moreover, $V(\alpha) = 6\alpha^2/\pi^2$.

The properties of the m.l. estimator carry over from the parametric to the
nonparametric framework.

Sample Maxima within a Fixed Period


If the practitioner insists on observing the data within a fixed period, then it
is necessary to modify the results above since now the sample size is random.
This situation e.g. occurs in insurance mathematics. So let us speak for a
while in terms of claims and claim sizes.


Assume that the claims come in according to a Poisson process $N(s)$, $s \ge 0$,
and that independently the claim sizes $\eta_1, \eta_2, \ldots$ have the common density $f$
which satisfies condition (9.3.1). Thus, the number of claims within a period
of length $s$ will be $N(s)$. The claims will be arranged in $k$ groups. Write

$$M \equiv M(s, k) = [N(s)/k]. \tag{9.3.4}$$

Denote by $X_{M:M}^{(i)}$ the maximum claim size of the r.v.'s $\eta_{(i-1)M+1}, \ldots, \eta_{iM}$. Thus,
using the notation of (1.1.4) we get the representation

$$X_{M:M}^{(i)} = Z_{M:M}(\eta_{(i-1)M+1}, \ldots, \eta_{iM}). \tag{9.3.5}$$

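In analogy to Theorem 9.3.1 we get

Theorem 9.3.2.

$$P\left\{(k/V(\alpha))^{1/2}\left[\hat\alpha_k(\log X_{M:M}^{(1)}, \ldots, \log X_{M:M}^{(k)}) - \alpha\right] \le t\right\} = \Phi(t) + O\left[k^{1/2}\sum_{m=0}^{\infty}(m^{-\delta} + m^{-1})P\{M = m\} + k^{-1/2}\right] \tag{9.3.6}$$

uniformly over $t$, $k$, $m$ and densities $f$ which satisfy condition (9.3.1) for some
fixed constants $x_0$, $L$, and $\delta$. Moreover, $V(\alpha) = 6\alpha^2/\pi^2$.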

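PROOF. Writing $\alpha_k^* = \hat\alpha_k(\log X_{M:M}^{(1)}, \ldots, \log X_{M:M}^{(k)})$ and conditioning on $M$ (the claim sizes being independent of the Poisson process) we
get

$$P\{\alpha_k^* \le t\} = \sum_{m=0}^{\infty} P(\alpha_k^* \le t \mid M = m)\,P\{M = m\} = \sum_{m=0}^{\infty} P\{\hat\alpha_k(\log X_{m:m}^{(1)}, \ldots, \log X_{m:m}^{(k)}) \le t\}\,P\{M = m\}$$

with $X_{m:m}^{(i)}$ as in Theorem 9.3.1. Now the assertion is immediate since Theorem
9.3.1 holds uniformly over $m$. □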
If the distribution of M is highly concentrated about a fixed value, say m,
then it is apparent that the right-hand side of (9.3.6) is again that of (9.3.3).
Another interesting problem arises if $k$ periods of length $t_i - t_{i-1}$ are fixed (with $t_0 = 0$).
Notice that the claim numbers $N(t_1), N(t_2) - N(t_1), \ldots, N(t_k) - N(t_{k-1})$ of
the $k$ periods are independent. Again the statistical inference can be based on
the maximum claim sizes of each period. After conditioning on the claim
numbers the maximum claim sizes can again approximately be represented
by independent Gumbel r.v.'s which, however, are not identically distributed.
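The grouping (9.3.4) and (9.3.5) is easy to simulate. In the following Python sketch, which is an illustration only, claims arrive during one period according to a Poisson process with an arbitrarily chosen intensity, claim sizes are drawn from a Pareto d.f. (so that (9.3.1) holds with $h = 0$), and the $k$ group maxima are formed from consecutive blocks of size $M = [N(s)/k]$. The function names are hypothetical. As a rough check, the moment estimator $(6^{1/2}/\pi)s_k$ is applied to the log-maxima instead of the m.l. estimator.

```python
import math
import random


def poisson_count(intensity, rng):
    """N(s) for a Poisson process on [0, s] with s = 1, so that `intensity`
    is the expected number of claims; simulated by summing exponential
    inter-arrival times."""
    count, t = 0, rng.expovariate(intensity)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(intensity)
    return count


def grouped_log_maxima(claims, k):
    """log X^{(i)}_{M:M}, i = 1..k, with M = [N(s)/k] as in (9.3.4) and (9.3.5).
    Claims with index larger than k*M are not used."""
    m = len(claims) // k
    if m == 0:
        raise ValueError("fewer claims than groups")
    return [math.log(max(claims[i * m:(i + 1) * m])) for i in range(k)]


rng = random.Random(11)
intensity, sigma, alpha, k = 20000.0, 1.0, 0.5, 100
n_claims = poisson_count(intensity, rng)
# Pareto claim sizes W_{1,1/alpha}^{(0,sigma)} by inversion: sigma * V^(-alpha), V in (0, 1].
claims = [sigma * (1.0 - rng.random()) ** (-alpha) for _ in range(n_claims)]
logs = grouped_log_maxima(claims, k)
mean = sum(logs) / k
sd = math.sqrt(sum((y - mean) ** 2 for y in logs) / (k - 1))
alpha_moment = math.sqrt(6.0) / math.pi * sd   # rough estimate of alpha
print(n_claims, n_claims // k, alpha_moment)
```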

9.4. Parametric Models Belonging to Upper Extremes


In Section 9.2 we studied the classical problem of evaluating the unknown
parameter in the extreme value model by means of estimators based on i.i.d.
random variables. A model of a different kind arises in connection with the
limiting joint distributions $G_{i,\alpha,k}$ of the $k$ largest extremes of a sample of size
$n$ as introduced in Section 5.3. More precisely, one has to speak of approximate
distributions when the number $k = k(n)$ of extremes goes to infinity as $n$ goes
to infinity.
Now the statistical procedures will be based on $k$ r.v.'s which are dependent.
However, we shall only study certain submodels which can be transformed
to models involving i.i.d. random variables.

Frechet Type Model


First we examine a model that corresponds to that in (9.2.1), namely,

$$\{G_{1,1/\alpha,k}^{(0,\sigma)} : \sigma > 0,\ \alpha > 0\} \tag{9.4.1}$$

with location parameter 0 and scale parameter $\sigma$. This model arises out of the
Fréchet distributions $G_{1,1/\alpha}$. The model in (9.4.1) can be transformed to the
model

$$\{Q_\alpha^{k-1} \times G_{1,1/\alpha,(k)}^{(0,\sigma)} : \sigma > 0,\ \alpha > 0\} \tag{9.4.2}$$

where $Q_\alpha$ is the exponential distribution with scale parameter $\alpha$ and $G_{1,1/\alpha,(k)}^{(0,\sigma)}$ is
the $k$th marginal distribution of $G_{1,1/\alpha,k}^{(0,\sigma)}$.
More precisely, if $(\xi_1, \ldots, \xi_k)$ is a random vector with distribution $G_{1,1/\alpha,k}^{(0,\sigma)}$
then according to (5.3.3), (1.6.14), and Corollary 1.6.11(iii) the random vector

$$(\eta_1, \ldots, \eta_k) := \left(\log(\xi_1/\xi_2),\ 2\log(\xi_2/\xi_3),\ \ldots,\ (k-1)\log(\xi_{k-1}/\xi_k),\ \xi_k\right) \tag{9.4.3}$$

has the distribution $Q_\alpha^{k-1} \times G_{1,1/\alpha,(k)}^{(0,\sigma)}$.

Exponential Model
The statistical inference is particularly simple in the exponential model

$$\{Q_\alpha^{k-1} : \alpha > 0\}. \tag{9.4.4}$$

Asymptotically, one does not lose information by restricting model (9.4.2)
to model (9.4.4) as far as the evaluation of the parameter $\alpha$ is concerned
(proof!).
The m.l. estimator

$$\hat\alpha_{k-1}(\eta_1, \ldots, \eta_{k-1}) = (k-1)^{-1}\sum_{i=1}^{k-1} \eta_i \tag{9.4.5}$$

is an (asymptotically) efficient estimator of $\alpha$. This estimator is expectation
unbiased and has the variance

$$\operatorname{Var}(\hat\alpha_k(\eta_1, \ldots, \eta_k)) = \alpha^2/k. \tag{9.4.6}$$

Moreover, the Fisher information $J(\alpha)$ is given by

$$J(\alpha) = \int \left[\frac{\partial}{\partial\alpha}\log\left(\exp(-x/\alpha)/\alpha\right)\right]^2 dQ_\alpha(x) = \alpha^{-2}; \tag{9.4.7}$$

thus, $\hat\alpha_k(\eta_1, \ldots, \eta_k)$ attains the Cramér-Rao bound $(kJ(\alpha))^{-1}$. The central limit
theorem yields the asymptotic normality of $\hat\alpha_k(\eta_1, \ldots, \eta_k)$. We have

$$P\left\{k^{1/2}\alpha^{-1}\left(\hat\alpha_k(\eta_1, \ldots, \eta_k) - \alpha\right) \le t\right\} = \Phi(t) + o(1). \tag{9.4.8}$$

Moreover, (9.4.8) holds with $O(k^{-1/2})$ in place of $o(1)$ according to the
Berry-Esseen theorem.
Corresponding to the results of Section 9.2, the m.l. estimator is asymptotically efficient within the class of all locally uniformly asymptotically
median unbiased estimators $\alpha_k^*(\eta_1, \ldots, \eta_k)$. For $t', t'' > 0$ we get

$$P\{-t'k^{-1/2} \le \alpha_k^*(\eta_1, \ldots, \eta_k) - \alpha \le t''k^{-1/2}\} \le P\{-t'k^{-1/2} \le \hat\alpha_k(\eta_1, \ldots, \eta_k) - \alpha \le t''k^{-1/2}\} + o(1). \tag{9.4.9}$$
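The transformation (9.4.3) and the estimator (9.4.5) can be illustrated by simulation. The Python sketch below assumes the standard representation of the limiting Fréchet-type upper extremes as $\xi_i = \sigma(E_1 + \cdots + E_i)^{-\alpha}$ with i.i.d. standard exponential $E_j$ (this representation is not spelled out in the text above); the function names are hypothetical. The empirical mean and variance of the estimator should be close to $\alpha$ and to the Cramér-Rao bound $\alpha^2/(k-1)$ for $k-1$ exponential observations.

```python
import math
import random


def frechet_upper_extremes(k, sigma, alpha, rng):
    """One draw of (xi_1 > ... > xi_k) from the limiting distribution of the
    k largest extremes in the Frechet case, via the (assumed) representation
    xi_i = sigma * (E_1 + ... + E_i)^(-alpha), E_j i.i.d. standard exponential."""
    partial_sum, out = 0.0, []
    for _ in range(k):
        partial_sum += rng.expovariate(1.0)
        out.append(sigma * partial_sum ** (-alpha))
    return out


def spacing_transform(xis):
    """The map (9.4.3): (i * log(xi_i / xi_{i+1}))_{i=1..k-1} followed by xi_k."""
    k = len(xis)
    return [i * math.log(xis[i - 1] / xis[i]) for i in range(1, k)] + [xis[-1]]


def exponential_mle(etas):
    """The m.l. estimator (9.4.5): arithmetic mean of the exponential variables."""
    return sum(etas) / len(etas)


rng = random.Random(5)
sigma, alpha, k, reps = 1.0, 0.7, 50, 4000
estimates = []
for _ in range(reps):
    etas = spacing_transform(frechet_upper_extremes(k, sigma, alpha, rng))
    estimates.append(exponential_mle(etas[:-1]))   # eta_1, ..., eta_{k-1} only
mean_est = sum(estimates) / reps
var_est = sum((a - mean_est) ** 2 for a in estimates) / (reps - 1)
print(mean_est, alpha)                  # unbiasedness
print(var_est, alpha ** 2 / (k - 1))    # Cramer-Rao bound for k-1 observations
```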

9.5. Inference Based on Upper Extremes


In analogy to the investigations in Section 9.3 we are going to examine the
relation between the actual model of distributions of upper extremes and the
model built by the limiting distributions

$$\{G_{1,1/\alpha,k}^{(0,\sigma)} : \sigma > 0,\ \alpha > 0\}$$

as introduced in Section 9.4. Let $f$ be a density which satisfies condition (9.3.1),
that is,
$$f(x) = (\sigma\alpha)^{-1}(x/\sigma)^{-(1+1/\alpha)} e^{h(x/\sigma)} \qquad \text{for } x \ge (x_0\sigma)^{-\alpha} \tag{9.5.1}$$

where $x_0 > 0$ is fixed and $h$ is a (measurable) function such that

$$|h(x)| \le L|x|^{-\delta/\alpha}$$

for some constants $L > 0$ and $\delta > 0$.


Contrary to Section 9.3, the statistical inference will now be based on the
$k$ upper extremes $(X_{n:n}, \ldots, X_{n-k+1:n})$ of a sample of $n$ i.i.d. random variables
with common density $f$.
The distribution of $(X_{n:n}, \ldots, X_{n-k+1:n})$ will heavily depend on the parameters $\alpha$ and $\sigma$ whereas the dependence on $h$, $x_0$, and $L$ can be neglected if $n$ is
sufficiently large and $k$ is small compared to $n$.
It is immediate from Corollary 5.5.5 that

$$\sup_B \left|P\{(X_{n:n}, \ldots, X_{n-k+1:n}) \in B\} - G_{1,1/\alpha,k}^{(0,\sigma n^\alpha)}(B)\right| = O\left((k/n)^\delta k^{1/2} + k/n\right) \tag{9.5.2}$$

uniformly over $n$, $k \in \{1, \ldots, n\}$ and densities $f$ which satisfy (9.5.1) for some
fixed constants $\delta$, $L$, and $x_0$.


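Thus, the transformation as introduced in (9.4.3) yields

$$\sup_B \left|P\left\{\left(\log\frac{X_{n:n}}{X_{n-1:n}}, \ldots, (k-1)\log\frac{X_{n-k+2:n}}{X_{n-k+1:n}},\ X_{n-k+1:n}\right) \in B\right\} - \left(Q_\alpha^{k-1} \times G_{1,1/\alpha,(k)}^{(0,\sigma n^\alpha)}\right)(B)\right| = O\left((k/n)^\delta k^{1/2} + k/n\right). \tag{9.5.3}$$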

The optimal estimator in the exponential model $\{Q_\alpha^{k-1} : \alpha > 0\}$ with unknown scale parameter $\alpha$ (compare with Section 9.4) is the m.l. estimator
$\hat\alpha_k(\eta_1, \ldots, \eta_k) = (k-1)^{-1}\sum_{i=1}^{k-1} \eta_i$ where $\eta_1, \ldots, \eta_{k-1}$ are i.i.d. random variables
with common distribution $Q_\alpha$. Thus, within the error bound given in (9.5.3)
the estimator

$$\alpha_{k,n}^* = (k-1)^{-1}\sum_{i=1}^{k-1} i\log(X_{n-i+1:n}/X_{n-i:n}) = \left[(k-1)^{-1}\sum_{i=1}^{k-1}\log X_{n-i+1:n}\right] - \log X_{n-k+1:n} \tag{9.5.4}$$

has the same performance as the m.l. estimator $\hat\alpha_k(\eta_1, \ldots, \eta_k)$ as far as covering
probabilities are concerned. We remark that $\alpha_{k,n}^*$ is Hill's (1975) estimator. The
optimality property carries over from $\hat\alpha_k(\eta_1, \ldots, \eta_k)$ to $\alpha_{k,n}^*$. From (9.5.3) we get
for $t', t'' > 0$,

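$$\begin{aligned}
&P\left\{-t'k^{-1/2} \le \frac{k-1}{k}\,\alpha^{-1}\alpha_{k,n}^* - 1 \le t''k^{-1/2}\right\} \\
&\quad = P\{k(1 - t'k^{-1/2}) \le Y_{k-1} \le k(1 + t''k^{-1/2})\} + O\left((k/n)^\delta k^{1/2} + k/n\right) \\
&\quad = \Phi(t'') - \Phi(-t') + O\left[(k/n)^\delta k^{1/2} + k/n + k^{-1/2}\right]
\end{aligned} \tag{9.5.5}$$

where $Y_{k-1}$ is a gamma r.v. with parameter $k-1$.
From (9.5.5) we see that the gamma approximation is preferable to the
normal approximation if $k$ is small. From an Edgeworth expansion of length
2 one obtains that the term $k^{-1/2}$ in the 3rd line of (9.5.5) can be replaced by
$k^{-1}$ if $t' = t''$.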
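Hill's estimator (9.5.4) and the gamma covering probability in (9.5.5) are straightforward to compute. The Python sketch below is illustrative; `hill_estimator`, `erlang_cdf`, `gamma_covering_prob` and `normal_covering_prob` are hypothetical names. For exactly Pareto data the spacings in (9.5.4) are exactly exponential, so the gamma term is then exact; the normal probability $\Phi(t'') - \Phi(-t')$ is printed alongside for comparison, which illustrates why the gamma approximation is preferable for small $k$.

```python
import math
import random
from statistics import NormalDist


def hill_estimator(sample, k):
    """Hill's estimator alpha*_{k,n} of (9.5.4), based on the k largest
    observations X_{n:n} >= ... >= X_{n-k+1:n} of a positive sample."""
    if not 2 <= k <= len(sample):
        raise ValueError("need 2 <= k <= n")
    logs = [math.log(x) for x in sorted(sample, reverse=True)[:k]]
    return sum(logs[:-1]) / (k - 1) - logs[-1]


def erlang_cdf(y, shape):
    """d.f. of a gamma r.v. with integer shape parameter and unit scale,
    computed in log space to avoid overflow for large shape."""
    if y <= 0:
        return 0.0
    log_y = math.log(y)
    tail = sum(math.exp(-y + j * log_y - math.lgamma(j + 1)) for j in range(shape))
    return 1.0 - tail


def gamma_covering_prob(k, t1, t2):
    """P{k(1 - t1 k^{-1/2}) <= Y_{k-1} <= k(1 + t2 k^{-1/2})}, the gamma term in (9.5.5)."""
    lo, hi = k * (1 - t1 / math.sqrt(k)), k * (1 + t2 / math.sqrt(k))
    return erlang_cdf(hi, k - 1) - erlang_cdf(lo, k - 1)


def normal_covering_prob(t1, t2):
    """Phi(t2) - Phi(-t1), the normal term in (9.5.5)."""
    phi = NormalDist().cdf
    return phi(t2) - phi(-t1)


rng = random.Random(2024)
n, k, alpha = 2000, 100, 1.0
# Pareto sample with d.f. 1 - x^(-1/alpha), x >= 1, by inversion.
sample = [(1.0 - rng.random()) ** (-alpha) for _ in range(n)]
print(hill_estimator(sample, k))
for kk in (5, 20, 100):
    print(kk, gamma_covering_prob(kk, 1.0, 1.0), normal_covering_prob(1.0, 1.0))
```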

9.6. Comparison of Different Approaches


In Sections 9.3 and 9.5 we studied the nonparametric model given by
densities $f$ of the form

$$f(x) = (\sigma\alpha)^{-1}(x/\sigma)^{-(1+1/\alpha)} e^{h(x/\sigma)} \qquad \text{for } x \ge (x_0\sigma)^{-\alpha} \tag{9.6.1}$$

where $h$ satisfies the condition

$$|h(x)| \le L|x|^{-\delta/\alpha}.$$

Let $n = mk$. Given the i.i.d. random variables $\xi_1, \ldots, \xi_n$ with common density $f$
let $X_{m:m}^{(j)}$ be the maximum based on the $j$th subsample of r.v.'s $\xi_{(j-1)m+1}, \ldots, \xi_{jm}$
for $j = 1, \ldots, k$. Moreover, $X_{n-k+1:n}, \ldots, X_{n:n}$ are the $k$ largest order
statistics of $\xi_1, \ldots, \xi_n$.
We write

$$\hat\alpha_{k,n} = \hat\alpha_k(\log X_{m:m}^{(1)}, \ldots, \log X_{m:m}^{(k)}) \tag{9.6.2}$$

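where $\hat\alpha_k$ is the solution of (9.2.12). From (9.3.3) we know that for every $t$,

$$P\left\{(k\pi^2/6)^{1/2}\alpha^{-1}(\hat\alpha_{k,n} - \alpha) \le t\right\} = \Phi(t) + O\left[k^{1/2}\left((k/n)^\delta + k/n\right) + k^{-1/2}\right]. \tag{9.6.3}$$

Recall from (9.5.5) that Hill's estimator $\alpha_{k,n}^*$, which is based on the $k$ largest
order statistics, has the following property:

$$P\left\{k^{1/2}\alpha^{-1}(\alpha_{k,n}^* - \alpha) \le t\right\} = \Phi(t) + O\left[(k/n)^\delta k^{1/2} + k/n + k^{-1/2}\right] \tag{9.6.4}$$

for every $t$.
A comparison of (9.6.3) and (9.6.4) shows that the asymptotic relative
efficiency of Hill's estimator $\alpha_{k,n}^*$ w.r.t. the estimator $\hat\alpha_{k,n}$, based on the sample
maxima of subsamples, is given by

$$\operatorname{ARE}(\alpha_{k,n}^*, \hat\alpha_{k,n}) = 6/\pi^2 = 0.6079\ldots. \tag{9.6.5}$$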
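The value in (9.6.5) is simply the ratio of the two asymptotic variances that can be read off from the normings in (9.6.3) and (9.6.4), both estimators being based on the same number $k$ of observations (taking the ARE as a ratio of asymptotic variances is the usual convention and is assumed here):

$$\operatorname{ARE}(\alpha_{k,n}^*, \hat\alpha_{k,n}) = \frac{6\alpha^2/(\pi^2 k)}{\alpha^2/k} = \frac{6}{\pi^2} = 0.60792\ldots$$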

Thus, Hill's estimator is asymptotically inefficient if both estimators are based
on the same number $k \equiv k(n)$ of observations (where, of course, the error
bounds in (9.6.3) and (9.6.4) have to go to zero as $n \to \infty$). Notice that the error
bounds in (9.6.3) and (9.6.4) are of the same order if $\delta \le 1$, which is perhaps
the most interesting case. A numerical comparison of both estimators for small
sample sizes showed an excellent agreement with the asymptotic results.
The crucial point is the choice of the number $k$. This problem is similar to
that of choosing the bandwidth in the context of kernel density estimators as
discussed in Section 8.2.
The above results are applicable if $(k/n)^\delta k^{1/2}$ is sufficiently small where for
the sake of simplicity it is assumed that $\delta \le 1$. On the other hand, the relations
(9.6.3) and (9.6.4) show that $k$ should be large to obtain estimators of a good
performance. Balancing both requirements, that is, keeping $(k/n)^\delta k^{1/2}$ of order one, leads to the proposal to take

$$k = c\, n^{2\delta/(2\delta+1)} \tag{9.6.6}$$

for some appropriate choice of the constant $c$. If $\delta$ is known to satisfy a
condition $0 < \delta_0 \le \delta \le 1$, where $\delta_0$ is known, then one may take $k$ as in (9.6.6)
with $\delta$ replaced by $\delta_0$.
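The rule (9.6.6) can be written as a small helper, given here only as a sketch: the constant $c$ must still be chosen by the statistician, and rounding to an integer and clipping to $2 \le k \le n$ are practical assumptions not contained in the text. At this choice the leading error term $(k/n)^\delta k^{1/2}$ stays of order $c^{\delta + 1/2}$, independently of $n$.

```python
def proposed_k(n, delta, c=1.0):
    """k = c * n^(2 delta / (2 delta + 1)) as in (9.6.6), rounded to an
    integer and clipped to 2 <= k <= n (both are illustrative choices)."""
    if not 0 < delta <= 1:
        raise ValueError("the text assumes 0 < delta <= 1")
    k = round(c * n ** (2 * delta / (2 * delta + 1)))
    return max(2, min(n, k))


for n in (1000, 10_000, 1_000_000):
    k = proposed_k(n, delta=1.0)
    print(n, k, (k / n) * k ** 0.5)   # leading error term stays near c^(3/2) = 1
```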
Within a smaller model, that is, if the densities $f$ satisfy a stronger regularity
condition, it was proved by Hall and Welsh (1985) that $\delta$ can be consistently
estimated from the data, obtaining in this way an adaptive version of Hill's
estimator. S. Csörgő et al. (1985) were able to show that the bias term of Hill's
estimator (and of related estimators) restricts the choice of the number $k$; the
balance between the variance and the bias determines the performance of the
estimator and the optimal choice of $k$. These results are proved under conditions weaker than that given in (5.2.18). By using (5.2.18), thus strengthening
(9.6.1), we may suppose that the density $f$ satisfies the condition

$$f(x) = (\sigma\alpha)^{-1}(x/\sigma)^{-(1+1/\alpha)}\left(1 - K(x/\sigma)^{-\rho/\alpha} + h(x/\sigma)\right), \tag{9.6.7}$$

where $|h(x)| \le L|x|^{-\delta/\alpha}$ and $0 < \rho \le \delta \le 1$. According to the results of Section 5.2, the expansion of
length 2 of the form

$$G_{1,1/\alpha}(x/\sigma)\left(1 + m^{-\rho}\,\frac{K}{1+\rho}\,(x/\sigma)^{-(1+\rho)/\alpha}\right) \tag{9.6.8}$$

provides a better approximation to the normalized d.f. of the maximum $X_{m:m}^{(1)}$
than the Fréchet d.f. $G_{1,1/\alpha}$.
The d.f.'s in (9.6.8) define an extended extreme value model that contains
the classical one for $K = 0$. Notice that the restricted original model of
distributions of sample maxima is approximated by the extended extreme
value model with a higher accuracy.
The approach developed in this chapter is again applicable. By constructing an estimator of $\alpha$ in the extended model one is able to find an estimator
of $\alpha$ in the original model of densities satisfying condition (9.6.7). The details
are carried out in Reiss (1989).
Needless to say, our approach also helps to solve various other
problems. We mention two-sample problems or, more generally, $m$-sample
problems. If every sample consists of the $k$ largest order statistics with $k \ge 2$
and $m$ tends to infinity then one needs modified versions of the results of
Section 5.5, namely, a formulation w.r.t. the Hellinger distance instead of the
variational distance, to obtain sharp bounds for the remainder terms of the
approximations. Such situations are discussed in articles by R.L. Smith (1986),
testing the trend of the Venice sea level, and I. Gomes (1981).

9.7. Estimating the Quantile Function Near the Endpoints
Let us recall the basic idea behind the method adopted in Section
8.2 to estimate the underlying q.f. $F^{-1}$. Under the condition that $F^{-1}$ has
bounded derivatives it is plausible to use an estimator which also has bounded
derivatives. Thus, the sample q.f. $F_n^{-1}$ has been smoothed by means of an
appropriate kernel. One has to choose a bandwidth which controls to some
extent the degree of smoothness of the resulting kernel estimator $F_{n,\beta}^{-1}$.
For $q$ close to 0 or 1 the required smoothness condition imposed on
$F^{-1}$ will only hold in exceptional cases. So if no further information about
$F^{-1}$ is available it is advisable to reduce the degree of smoothing when $q$
approaches 0 or 1 (as was done in Section 8.2).


However, for $q$ close to 0 or 1 we are in the realm of extreme value theory.
In many situations the statistician will accept the condition that the underlying d.f. $F$ belongs to the domain of attraction of an extreme value distribution. As pointed out in Section 5.1 this condition can be interpreted in the
way that the tail of $F$ lies in a neighborhood of a generalized Pareto distribution $W_{i,\alpha}$ with shape parameter $\alpha$.
This suggests estimating the unknown q.f. $F^{-1}$ near the endpoints by
means of the q.f. of a generalized Pareto d.f. where the unknown parameters
are replaced by estimates.
When treating the full extreme value model it is advisable to make
use of the von Mises parametrization of generalized Pareto distributions as
given in Section 5.1. Then, in a first step, one has to estimate the unknown
parameters. As already pointed out, the full 3-parameter model contains
regular as well as non-regular submodels so that a satisfactory treatment of
this problem seems to be quite challenging from the mathematical point of
view.
In practice the statistician will often be able to specify a certain submodel.
We shall confine ourselves to the treatment of the upper tails of d.f.'s $F$ which
belong to a neighborhood of a Pareto d.f. $W_{1,1/\alpha}^{(0,\sigma)}$ with scale parameter $\sigma$.
Thus,

$$W_{1,1/\alpha}^{(0,\sigma)}(x) = 1 - (x/\sigma)^{-1/\alpha}, \qquad x > \sigma, \tag{9.7.1}$$

and the q.f. is given by

$$\left(W_{1,1/\alpha}^{(0,\sigma)}\right)^{-1}(q) = \sigma(1-q)^{-\alpha}, \qquad 0 < q < 1. \tag{9.7.2}$$

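The estimator $G_n^{-1}$ is defined by

$$G_n^{-1}(q) = \begin{cases} F_{n,\beta}^{-1}(q) & \text{if } q \le x_0, \\[4pt] F_{n,\beta}^{-1}(x_0)\left(\dfrac{1-q}{1-x_0}\right)^{-\alpha_n^*} & \text{if } x_0 < q, \end{cases} \tag{9.7.3}$$

where $F_{n,\beta}^{-1}$ is the kernel q.f. as defined in Section 8.2 and $\alpha_n^*$ is the Hill estimator
defined in (9.5.4).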
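The estimator (9.7.3) can be sketched in Python as follows. This is a simplified illustration, not the procedure used for the figures: the kernel q.f. of Section 8.2 is replaced by the plain sample q.f. $F_n^{-1}$, and the number $k$ of upper extremes entering Hill's estimator has to be supplied by the user (below, as an assumption, the observations above the $x_0$-quantile). All function names are hypothetical. As in Figures 9.7.1 and 9.7.2 the data come from the standard Fréchet d.f. $G_{1,1}$ (with a different random sample), so the estimates can be compared with the true q.f. $-1/\log q$.

```python
import math
import random


def sample_quantile(sorted_xs, q):
    """Sample q.f. F_n^{-1}(q) = X_{i:n} with i = ceil(n q), 0 < q <= 1."""
    n = len(sorted_xs)
    return sorted_xs[max(1, math.ceil(n * q)) - 1]


def hill(sorted_xs, k):
    """Hill's estimator (9.5.4) computed from an ascending sorted sample."""
    logs = [math.log(x) for x in sorted_xs[-k:]]   # log X_{n-k+1:n}, ..., log X_{n:n}
    return sum(logs[1:]) / (k - 1) - logs[0]


def pareto_tail_quantile(xs, q, x0, k):
    """Sketch of (9.7.3): sample q.f. up to x0 (in place of the kernel q.f.),
    Pareto tail with Hill-estimated alpha beyond x0."""
    s = sorted(xs)
    if q <= x0:
        return sample_quantile(s, q)
    alpha_hat = hill(s, k)
    return sample_quantile(s, x0) * ((1 - q) / (1 - x0)) ** (-alpha_hat)


rng = random.Random(9)
n, x0 = 100, 0.9
# Standard Frechet G_{1,1}: (-log U)^(-1) has d.f. exp(-1/x), so its q.f. is -1/log q.
xs = [1.0 / -math.log(1.0 - rng.random()) for _ in range(n)]
k = round(n * (1 - x0))   # assumption: the observations above the x0-quantile
for q in (0.5, 0.9, 0.95, 0.99):
    print(q, pareto_tail_quantile(xs, q, x0, k), -1.0 / math.log(q))
```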
In Figures 9.7.1 and 9.7.2, $n = 100$ pseudo-random numbers were drawn
according to the standard Fréchet d.f. $G_{1,1}$. The point $x_0$ was chosen to be
equal to 0.9; the estimate of $\alpha$ is equal to 1.012.
In Figure 9.7.1 the inverse $(F_{n,\beta})^{-1}$ of the kernel estimator of the d.f. cannot
visually be distinguished from the sample q.f. $F_n^{-1}$ (compare this with the
remarks to Figure 8.2.6).
As indicated above, the philosophy behind this procedure is the following:
Up to some point $x_0$ we only have information that the underlying q.f. is
smooth; thus, the kernel method is applicable. Beyond the point $x_0$ we are in
the realm of extreme value theory, and hence, the use of a Pareto tail with
estimated parameters may be appropriate. The choice of the point $x_0$ is crucial.
There seems to be some relationship to the well-known problem of estimating

9. Extreme Value Models

288

0.92

Figure 9.7.1. G1.!!, F.-!,

0.96
F.~b, (F. of!

1.00

with f3 = 0.08.

Figure 9.7.2. G1.!!, F.-!, and estimated Pareto tail G;;! .

289

P.9. Problems and Supplements

a change point of a sequence of r.v.'s where the underlying distributions


changes the parameter after an unknown time point.

P.9. Problems and Supplements


1. Prove that there exists a unique solution of the log-likelihood equations (9.2.9) and
(9.2.10) provided the values $x_1, \ldots, x_k$ are not all identical.
2. (Estimators based on sample mean and sample deviation)
Let $\eta_1, \ldots, \eta_k$ be i.i.d. random variables with mean $\mu$ and variance $\sigma^2$. Denote by
$\mu_i$ the $i$th central moment, by $m_k$ the sample mean and by $s_k^2$ the sample variance.
(i) Prove that $k^{1/2}(m_k - \mu, s_k^2 - \sigma^2)$ is asymptotically normal with mean vector
zero and covariance matrix given by $\sigma_{1,1} = \sigma^2$, $\sigma_{1,2} = \sigma_{2,1} = \mu_3$, $\sigma_{2,2} = \mu_4 - \sigma^4$
(see Serfling, 1980, page 114).
(ii) Prove the corresponding result for the sample mean $m_k$ and the sample standard
deviation $s_k$.
[Hint: Apply (i) and Theorem A, Serfling, 1980, page 122.]
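(iii) Let $\eta_1, \ldots, \eta_k$ be i.i.d. random variables with common Gumbel d.f. $G_3^{(\theta,\alpha)}$ where
$\theta$ and $\alpha$ denote the location and scale parameters. Define

$$\theta_k^* = m_k - \gamma\alpha_k^* \qquad\text{and}\qquad \alpha_k^* = (6^{1/2}/\pi)\, s_k$$

where $\gamma$ is Euler's constant. Prove that $k^{1/2}(\theta_k^* - \theta, \alpha_k^* - \alpha)$ is asymptotically
normal with mean vector zero and covariance matrix given by

$$\sigma_{1,1} = \left[\pi^2/6 + \gamma^2(\beta_2 - 1)/4 - \gamma\pi\beta_1/6^{1/2}\right]\alpha^2,$$

$$\sigma_{1,2} = \sigma_{2,1} = \left[\beta_1 - 6^{1/2}\gamma(\beta_2 - 1)/(2\pi)\right]6^{1/2}\pi\alpha^2/12,$$

$$\sigma_{2,2} = (\beta_2 - 1)\alpha^2/4,$$

where $\beta_1 = \mu_3/\mu_2^{3/2}$ and $\beta_2 = \mu_4/\mu_2^2$
(see Tiago de Oliveira, 1963, and Johnson and Kotz, 1970)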
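For the Gumbel distribution the constants in P.9.2(iii) are known explicitly: $\beta_1 = 12\cdot 6^{1/2}\zeta(3)/\pi^3 \approx 1.1395$ and $\beta_2 = 5.4$ (standard facts, not taken from the text). The Python sketch below evaluates the covariance matrix and compares $\sigma_{2,2}$ with $V(\alpha) = 6\alpha^2/\pi^2$ from (9.2.8), which indicates how much is lost by using $(6^{1/2}/\pi)s_k$ instead of the m.l. estimator; a small simulation checks $\sigma_{2,2}$. Function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329
ZETA3 = 1.2020569031595942  # Apery's constant zeta(3)

# Skewness and kurtosis of the Gumbel distribution (free of theta and alpha).
BETA1 = 12 * math.sqrt(6) * ZETA3 / math.pi ** 3   # approx. 1.1395
BETA2 = 5.4                                          # = 3 + 12/5


def moment_estimator_covariance(alpha, beta1=BETA1, beta2=BETA2):
    """Asymptotic covariance of k^{1/2}(theta* - theta, alpha* - alpha), P.9.2(iii)."""
    g, r6 = EULER_GAMMA, math.sqrt(6)
    s11 = (math.pi ** 2 / 6 + g ** 2 * (beta2 - 1) / 4 - g * math.pi * beta1 / r6) * alpha ** 2
    s12 = (beta1 - r6 * g * (beta2 - 1) / (2 * math.pi)) * r6 * math.pi * alpha ** 2 / 12
    s22 = (beta2 - 1) * alpha ** 2 / 4
    return [[s11, s12], [s12, s22]]


def simulated_var_alpha_star(k, reps, alpha=1.0, seed=0):
    """k times the Monte Carlo variance of alpha* = (6^(1/2)/pi) s_k for Gumbel samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [-alpha * math.log(rng.expovariate(1.0)) for _ in range(k)]  # theta = 0
        m = sum(xs) / k
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (k - 1))
        values.append(math.sqrt(6) / math.pi * s)
    mean = sum(values) / reps
    return k * sum((v - mean) ** 2 for v in values) / (reps - 1)


cov = moment_estimator_covariance(1.0)
v_mle = 6 / math.pi ** 2                     # V(alpha) in (9.2.8) for alpha = 1
print(cov[1][1], v_mle, v_mle / cov[1][1])   # efficiency of alpha* is about 0.55
print(simulated_var_alpha_star(k=200, reps=2000))  # should be close to cov[1][1] = 1.1
```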
3. (Estimators based on order statistics)
Prove a result corresponding to that in P.9.2(iii) by using estimators as given in
(6.2.6) and (6.2.7).
4. In Figure 9.7.2 we see that the second largest and largest observations are about 60 and 180.
(i) Let $X_{r:n}$ be the $r$th order statistic of $n$ i.i.d. random variables with common Pareto d.f. $W_{1,\alpha}$. Then, for $u \ge 1$,
$$P\{X_{n:n} > u X_{n-1:n}\} = u^{-\alpha}.$$
[Hint: Apply Corollary 1.6.12(ii).]
(ii) Let $\alpha = 1$ as in Figure 9.7.2. Notice that
$$P\{X_{n:n} > 3 X_{n-1:n}\} = 1/3.$$
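The estimators of P.9.2(iii) and the probability in P.9.4 are easy to check by simulation. The following Python sketch is only an illustration. The helper names (`gumbel_sample`, `gumbel_moment_estimators`, `pareto_ratio_frequency`), the seed and the sample sizes are hypothetical choices. Gumbel variables are generated as $\theta - \alpha \log E$ with $E$ standard exponential, and `random.paretovariate(alpha)` draws from the Pareto d.f. $1 - x^{-\alpha}$, $x \ge 1$.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma


def gumbel_sample(k, theta, alpha, rng):
    """k i.i.d. Gumbel variables: theta - alpha * log(E) has d.f.
    exp(-exp(-(x - theta) / alpha)) when E is standard exponential."""
    return [theta - alpha * math.log(rng.expovariate(1.0)) for _ in range(k)]


def gumbel_moment_estimators(sample):
    """alpha* = (6**0.5 / pi) * s_k and theta* = m_k - gamma * alpha* as in P.9.2(iii).
    statistics.stdev uses the divisor k - 1, which does not affect the asymptotics."""
    m_k = statistics.fmean(sample)
    s_k = statistics.stdev(sample)
    alpha_star = math.sqrt(6.0) / math.pi * s_k
    theta_star = m_k - EULER_GAMMA * alpha_star
    return theta_star, alpha_star


def pareto_ratio_frequency(n, alpha, u, reps, rng):
    """Relative frequency of {X_{n:n} > u * X_{n-1:n}} for n i.i.d. Pareto W_{1,alpha}
    variables; by P.9.4(i) the probability equals u ** (-alpha)."""
    hits = 0
    for _ in range(reps):
        xs = sorted(rng.paretovariate(alpha) for _ in range(n))
        if xs[-1] > u * xs[-2]:
            hits += 1
    return hits / reps


rng = random.Random(12345)
theta_hat, alpha_hat = gumbel_moment_estimators(gumbel_sample(20000, 1.0, 2.0, rng))
freq = pareto_ratio_frequency(n=10, alpha=1.0, u=3.0, reps=50000, rng=rng)
print(theta_hat, alpha_hat, freq)  # expected to be near 1.0, 2.0 and 1/3
```

The relative frequency does not depend on the choice of $n \ge 2$, which agrees with P.9.4(i).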


Bibliographical Notes
In this book we primarily explore the distributional properties of order
statistics and relations between models of actual distributions of order statistics and approximate, parametric models. Statistical procedures are studied
as examples to show in which way parametric statistical procedures become
relevant within the nonparametric context.
A proper place for an exhaustive list of the many parametric statistical procedures in extreme value models is a book like that of Johnson and Kotz (1970): Chapters 18, 20, and 21 deal with exponential, Weibull, and Gumbel models. We will only give a summary by using keywords from these chapters: maximum likelihood (m.l.), minimum variance unbiased, Bayesian, censoring, quick estimators, method of moments, best linear unbiased.
One might add (compare with Herbach (1984) and Mann (1984)) the additional keywords: best linear invariant, unbiased nearly best linear, simplified linear.
Further references may be found in the following articles. Smith (1985a) studied the asymptotic behavior of m.l. estimators in nonregular models like the Weibull model; see also Polfeldt (1970). Moreover, see Reiss (1973, 1978b) and Pitman (1979) for consistency results concerning m.l. estimators in models of unimodal d.f.'s and, respectively, in models with location and scale parameters. New quick estimators of location and scale parameters in the Gumbel model have been proposed by Hüsler and Schüpbach (1986). Quick tests and a locally most powerful test have been studied by van Montfort and Gomes (1985) for testing Gumbel d.f.'s against Fréchet and Weibull alternatives.
In Section 9.2 we mentioned the asymptotic normality of the m.l. estimator of the location and scale parameters in the Gumbel model. Higher order approximations of the distribution of the m.l. estimator can be obtained by means of expansions. These expansions may, e.g., be applied to establish asymptotic median unbiasedness of a higher order. We refer to R. Michel (1975) for expansions in the case of vector parameters and to Miebach (1977) for a specialization of these results to families with location and scale parameters.
Next, we make some further comments about the estimation of the tail index and related problems. The statistical extreme value theory is based on the idea that the parametric extreme value model is an approximation of the model of actual distributions of maxima. This idea was made rigorous by Weiss (1971) in a particular case by treating a model of densities in a neighborhood of Weibull densities. Weiss constructed quick estimators of the location, scale, and shape (= tail index) parameters based on extreme and intermediate order statistics. The estimator of the tail index is based on two intermediate order statistics. This is of interest because an alternative approach, namely, the use of the k largest order statistics with k fixed, fails to yield consistent estimators. The article of Hill (1975) attracted more attention than that of Weiss. Presumably, the reason for this is that Hill's estimator is efficient and, moreover, is related to the m.l. estimator of the scale parameter in the exponential model. Notice that the estimation of the tail index based on the k largest order statistics, with k fixed and n → ∞, is equivalent to estimating the scale parameter in the exponential model for the fixed sample size k. Hill's estimator and related estimators were extensively studied in the literature [e.g. de Haan and Resnick (1980), Hall (1982b), Hall and Welsh (1984), Häusler and Teugels (1985), and Smith (1987)]. The estimation of the endpoint of d.f.'s in the Weibull case was treated by Hall (1982a).
Falk (1985b) took up Weiss' approach of approximating models and derived the properties of Hill's estimator, as essentially known in the literature (Hall (1982b), Häusler and Teugels (1985)), by using the properties of the m.l. estimator in the exponential model (compare with Sections 9.4 and 9.5).
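The following Python sketch makes the link between Hill's estimator and the exponential model concrete. The function name `hill_estimator`, the seed and the sample sizes are hypothetical. The sketch uses the common form $1/\hat\alpha = k^{-1}\sum_{i=1}^{k} \log(X_{n-i+1:n}/X_{n-k:n})$. For exact Pareto data with d.f. $1 - x^{-\alpha}$, the log-excesses over $X_{n-k:n}$ are distributed like an exponential sample of size $k$ with scale parameter $1/\alpha$. Their mean, $1/\hat\alpha$, is then the m.l. estimator of that scale.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill's estimator of the tail index alpha based on the k largest order statistics."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]  # X_{n-k:n}, since xs[0] is X_{1:n}
    # mean of the log-excesses log(X_{n-i+1:n} / X_{n-k:n}), i = 1, ..., k
    mean_log_excess = sum(math.log(x / threshold) for x in xs[n - k:]) / k
    return 1.0 / mean_log_excess


rng = random.Random(2024)
data = [rng.paretovariate(2.0) for _ in range(20000)]  # Pareto d.f. 1 - x**(-2), x >= 1
alpha_hat = hill_estimator(data, k=4000)
```

For data that are only Pareto-like in the tail, the choice of k is the delicate point discussed in the last paragraph of these notes.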
The method of taking maxima of subsamples is due to Gumbel; a typical example is to take annual maxima. The results of Sections 9.5 and 9.6 are partly taken from Reiss (1987). A comparison of the two different methods, namely, to base the inference on the k largest order statistics and, respectively, to use the subsample method, was also carried out in the paper by Hüsler and Tiago de Oliveira (1988) within a parametric framework.
The estimation of the parameters of extreme value d.f.'s is related to the estimation of the q.f. near the ends of the support (see Section 9.7). This subject was dealt with in the articles by Weissman (1978), Boos (1984), Joe (1987), Smith (1987), and Smith and Weissman (1987), among others. In this context another interesting paper is that of Heidelberger and Lewis (1984), who suggested applying the subsample method to reduce the possible correlation of the r.v.'s and to reduce the problem of estimating extreme quantiles to that of estimating the median; moreover, reducing the sample size by the subsample method may have computational advantages in certain simulations.
The statistical procedures in Sections 9.2-9.6 are based either on the k largest order statistics or on k subsamples. The choice of the number k is crucial for the performance of the statistical procedures. As pointed out in this chapter, the optimal choice heavily depends on the given model. Some work has been done concerning the selection of the model; we refer to Pickands (1975), Hall and Welsh (1985), and to Section 9.5 for some results. The advice of DuMouchel (1983) to take the upper 10 per cent of the sample might be valuable for practitioners. The visual comparison between sample and extreme value d.f.'s gives further insight into the problem.

CHAPTER 10

Approximate Sufficiency of
Sparse Order Statistics

This chapter starts with an introduction to the "comparison of statistical models" where, in addition to Section 9.1, we also make use of Markov kernels.
In Section 10.2 it is shown that sparse order statistics $X_{r_1:n}, X_{r_2:n}, \dots, X_{r_k:n}$ are approximately sufficient over a nonparametric neighborhood of a fixed d.f. $F_0$. This result will be proved under particularly weak conditions.
In Section 10.3 the fixed d.f. $F_0$ will be replaced by a parametric family of d.f.'s. In the case of the location and scale parameter family of uniform distributions, the extended result follows immediately from Section 10.2. In other cases, one has to include an auxiliary estimator of the unknown parameter into the considerations.
Since sparse order statistics are asymptotically jointly normal, one obtains a normal approximation of the nonparametric model of distributions of $(X_{r:n}, X_{r+1:n}, \dots, X_{s:n})$ or $(X_{1:n}, \dots, X_{n:n})$. The usefulness of this approach will be demonstrated in Section 10.4 by considering a nonparametric testing problem.

10.1. Comparison of Statistical Models via Markov Kernels
The statistical models $\mathcal{P}$ and $\mathcal{Q}$ we are primarily concerned with are built by the joint distributions of the order statistics $X_{1:n}, \dots, X_{n:n}$ and, respectively, $X_{r_1:n}, X_{r_2:n}, \dots, X_{r_k:n}$ where $1 \le r_1 < r_2 < \dots < r_k \le n$. The order statistics come from $n$ i.i.d. random variables with common d.f. $F$ which belongs to a certain nonparametric family of d.f.'s.

It is obvious that the projection $T$, defined by
$$T(x_1, \dots, x_n) = (x_{r_1}, \dots, x_{r_k}), \qquad (10.1.1)$$
carries the model $\mathcal{P}$ to the model $\mathcal{Q}$. Notice that the map $T$ is not one-to-one, and hence to return from $\mathcal{Q}$ to $\mathcal{P}$ one has to make use of a Markov kernel.

Markov Kernels
A Markov kernel $K$ carrying mass from the probability space $(S_1, \mathcal{B}_1, Q)$ to $(S_0, \mathcal{B}_0)$ has the following two properties:
(a) $K(\cdot \mid y)$ is a probability measure on $\mathcal{B}_0$ for every $y \in S_1$, and
(b) $K(B \mid \cdot)$ is measurable for every $B \in \mathcal{B}_0$.
Recall that
$$KQ(B) := \int K(B \mid \cdot)\, dQ \qquad (10.1.2)$$
defines a probability measure on $\mathcal{B}_0$. $KQ$ is the distribution of the Markov kernel $K$ (under $Q$). Thus, the symbol $K$ also denotes a map from the family of probability measures on $\mathcal{B}_1$ into that on $\mathcal{B}_0$.
The reader is reminded of the following interpretation of $KQ$. First observe $y$, which is the outcome of an experiment governed by $Q$. Secondly, carry out an experiment governed by $K(\cdot \mid y)$ and observe $x$. Then, the two-step experiment with the final outcome $x$ is governed by $KQ$.
Note that the distribution of a map $T$ (under $Q$) can be written as $KQ$ where $K$ is the Markov kernel defined by
$$K(B \mid y) = 1_B(T(y)) = \varepsilon_{T(y)}(B)$$
with $\varepsilon_x$ denoting the Dirac measure with mass 1 at $x$. In this case, given $y$, the value $T(y)$ is chosen "with probability one."
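On finite sample spaces a Markov kernel is simply a table of conditional probabilities, and (10.1.2) becomes a finite sum. The following Python sketch computes $KQ$ as well as the deterministic kernel induced by a map $T$. The function names and the toy distributions are hypothetical. Arithmetic is exact via `fractions.Fraction`.

```python
from fractions import Fraction


def apply_kernel(kernel, q):
    """KQ({x}) = sum_y K({x} | y) Q({y}); kernel[y] maps x to K({x} | y), q maps y to Q({y})."""
    result = {}
    for y, qy in q.items():
        for x, kxy in kernel[y].items():
            result[x] = result.get(x, 0) + kxy * qy
    return result


def kernel_of_map(T, points):
    """Deterministic kernel K(B | y) = 1_B(T(y)), i.e. the Dirac measure at T(y)."""
    return {y: {T(y): Fraction(1)} for y in points}


# Distribution of the map T(y) = y mod 2 under a fair die
die = {y: Fraction(1, 6) for y in range(1, 7)}
parity = apply_kernel(kernel_of_map(lambda y: y % 2, die), die)

# Two-step experiment: observe y ~ Q, then draw x uniformly from {0, ..., y}
q = {y: Fraction(1, 3) for y in (1, 2, 3)}
uniform_below = {y: {x: Fraction(1, y + 1) for x in range(y + 1)} for y in q}
two_step = apply_kernel(uniform_below, q)
```

In this example `parity` assigns mass 1/2 to each of 0 and 1, and `two_step` is the distribution of the final outcome of the two-step experiment.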

More Informative and Blackwell-Sufficiency
In the sequel, we are given two models $\mathcal{P} = \{P_\theta: \theta \in \Theta\}$ and $\mathcal{Q} = \{Q_\theta: \theta \in \Theta\}$ such that $TP_\theta = Q_\theta$, $\theta \in \Theta$ (in other words, if $\xi$ is a r.v. with distribution $P_\theta$ then $\eta = T(\xi)$ is distributed according to $Q_\theta$).
Notice that the models $\mathcal{P}$ and $\mathcal{Q}$ may be defined on different measurable spaces $(S_0, \mathcal{B}_0)$ and $(S_1, \mathcal{B}_1)$, like Euclidean spaces of different dimensions.
It is desirable to find a Markov kernel $K$ (independent of the parameter $\theta$) such that
$$KQ_\theta = P_\theta, \qquad \theta \in \Theta, \qquad (10.1.3)$$
which means that $P_\theta$ can be reconstructed from $Q_\theta$ by means of the Markov kernel $K$.

If (10.1.3) holds then $\mathcal{Q}$ is said to be more informative than $\mathcal{P}$. If also $TP_\theta = Q_\theta$, $\theta \in \Theta$, then both models are equivalent, and $T$ is said to be Blackwell-sufficient.
Recall from Section 9.1 that $TP_\theta = Q_\theta$, $\theta \in \Theta$, implies that for every statistical procedure on $\mathcal{Q}$ one finds a procedure on $\mathcal{P}$ of equal performance. Under (10.1.3) the converse conclusion holds as well. Let us exemplify this idea in the context of the testing problem.
Let $C \in \mathcal{B}_0$ be a critical region (acting on $\mathcal{P}$). Then the critical function $K(C \mid \cdot): S_1 \to [0, 1]$ is of equal performance if, as usual, the comparison is based on power functions. This becomes obvious by noting that, according to (10.1.2) and (10.1.3),
$$P_\theta(C) = \int K(C \mid \cdot)\, dQ_\theta, \qquad \theta \in \Theta. \qquad (10.1.4)$$
The same conclusion holds if one starts with a critical function $\psi$ defined on $S_0$. The Fubini theorem for Markov kernels implies that
$$\int \psi\, dP_\theta = \int \tilde\psi\, dQ_\theta, \qquad \theta \in \Theta, \qquad (10.1.5)$$
where $\tilde\psi = \int \psi(x)\, K(dx \mid \cdot)$, and hence the critical functions $\psi$ and $\tilde\psi$ are of equal performance.

Blackwell-Sufficiency and Sufficiency
We continue our discussion of basic statistical concepts, being aware that there is a good chance of boring some readers. Such readers may omit the next lines and continue with Example 10.1.2 and the definition of the ε-deficiency for unequal parameter sets.
The classical concept of sufficiency is closely related to that of Blackwell-sufficiency. In fact, under mild regularity conditions, which are always satisfied in our context, Blackwell-sufficiency and sufficiency are equivalent [see e.g. Heyer (1982, Theorem 22.12)].
Recall that $T: S_0 \to S_1$ is sufficient if for every critical function $\psi$ defined on $S_0$ there exists a version $E(\psi \mid T)$ of the conditional expectation w.r.t. $T$ which does not depend on the parameter $\theta$. Then, Blackwell-sufficiency holds with a Markov kernel defined by
$$K(B \mid y) = Q(B \mid T = y)$$
where the $Q(B \mid T = y)$ are appropriate versions of the conditional probability of $B$ given $T = y$ (in other words, $K$ is the factorization of the conditional distribution of the identity on $S_0$ given $T$). Check that
$$E(\psi \mid T) = \int \psi(x)\, K(dx \mid T) \quad \text{w.p. 1}.$$
Recall that the Neyman criterion provides a powerful tool for verifying the sufficiency of $T$. Sufficiency holds if the density $p_\theta$ of $P_\theta$ (w.r.t. some dominating measure) can be factorized in the form $p_\theta = r\,(h_\theta \circ T)$.
EXAMPLES 10.1.1.
(i) Let $\mathcal{P}$ be a family of uniform distributions with unknown location parameter. Then, $(X_{1:n}, X_{n:n})$ is sufficient.
(ii) Let $\mathcal{P}$ be a family of exponential distributions with unknown location parameter. Then, $X_{1:n}$ is sufficient.

The concept of Blackwell-sufficiency will be extended in two steps. First we consider the situation where (10.1.3) holds with a remainder term. The second extension also includes the case where the parameter sets of $\mathcal{P}$ and $\mathcal{Q}$ are unequal.

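Approximate Sufficiency and ε-Deficiency
If (10.1.3) does not hold for any Markov kernel then one may try to find a Markov kernel $K$ such that the variational distances $\sup_B |P_\theta(B) - KQ_\theta(B)|$, $\theta \in \Theta$, are small. We say that $\mathcal{Q}$ is ε-deficient w.r.t. $\mathcal{P}$ if
$$\sup_B |P_\theta(B) - KQ_\theta(B)| \le \varepsilon(\theta), \qquad \theta \in \Theta,$$
for some Markov kernel $K$.
In this context, the map $T$ may be called approximately sufficient if $\varepsilon(\theta)$ is small. Define the one-sided deficiency $\delta(\mathcal{Q}, \mathcal{P})$ of $\mathcal{Q}$ w.r.t. $\mathcal{P}$ by
$$\delta(\mathcal{Q}, \mathcal{P}) := \inf_K \sup_{\theta \in \Theta} \sup_B |P_\theta(B) - KQ_\theta(B)| \qquad (10.1.6)$$
where $K$ ranges over all Markov kernels from $(S_1, \mathcal{B}_1)$ to $(S_0, \mathcal{B}_0)$. The deficiency $\delta(\mathcal{Q}, \mathcal{P})$ of $\mathcal{Q}$ w.r.t. $\mathcal{P}$ measures the amount of information which is needed so that $\mathcal{Q}$ is more informative than $\mathcal{P}$. If $TP_\theta = Q_\theta$, $\theta \in \Theta$, then $\delta(\mathcal{P}, \mathcal{Q}) = 0$.
Notice that $\delta(\mathcal{Q}, \mathcal{P})$ is not symmetric. To obtain a symmetric distance between $\mathcal{Q}$ and $\mathcal{P}$ define the symmetric deficiency
$$\Delta(\mathcal{Q}, \mathcal{P}) = \max(\delta(\mathcal{Q}, \mathcal{P}), \delta(\mathcal{P}, \mathcal{Q})). \qquad (10.1.7)$$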
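For finite models the inner suprema in (10.1.6) can be evaluated directly, because $\sup_B |P(B) - Q(B)| = \frac12 \sum_x |P(\{x\}) - Q(\{x\})|$. The following Python sketch uses a toy model chosen only for illustration, with hypothetical names. $P_\theta$ is the distribution of two independent Bernoulli($\theta$) trials, $Q_\theta$ is the distribution of their sum $T$, and $\sup_\theta \sup_B |P_\theta(B) - KQ_\theta(B)|$ is computed for two fixed kernels $K$. The infimum over all kernels in (10.1.6) is not computed.

```python
from fractions import Fraction


def variational_distance(p, q):
    """sup_B |P(B) - Q(B)| for distributions on a finite set."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


def apply_kernel(kernel, q):
    """KQ({x}) = sum_y K({x} | y) Q({y})."""
    result = {}
    for y, qy in q.items():
        for x, kxy in kernel[y].items():
            result[x] = result.get(x, 0) + kxy * qy
    return result


def kernel_error(kernel, P, Q):
    """sup_theta sup_B |P_theta(B) - K Q_theta(B)| for one fixed kernel K."""
    return max(variational_distance(P[t], apply_kernel(kernel, Q[t])) for t in P)


thetas = (Fraction(1, 4), Fraction(1, 2))
P = {t: {(a, b): t ** (a + b) * (1 - t) ** (2 - a - b) for a in (0, 1) for b in (0, 1)}
     for t in thetas}
Q = {t: {0: (1 - t) ** 2, 1: 2 * t * (1 - t), 2: t ** 2} for t in thetas}

# Conditional distribution of the sample given T = s; it does not depend on theta
cond = {0: {(0, 0): 1}, 1: {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)}, 2: {(1, 1): 1}}
# A cruder kernel that always puts the success first when T = 1
crude = {0: {(0, 0): 1}, 1: {(1, 0): 1}, 2: {(1, 1): 1}}

print(kernel_error(cond, P, Q), kernel_error(crude, P, Q))  # prints: 0 1/4
```

The first kernel is the one given by sufficiency of $T$, so it rebuilds $P_\theta$ exactly. For the cruder kernel the error is $\theta(1-\theta)$, which is largest at $\theta = 1/2$.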

The arguments in (10.1.4) and (10.1.5) carry over to the present situation; now, we have to include a remainder term in our considerations. Let again $K$ be a Markov kernel carrying mass from $(S_1, \mathcal{B}_1, \mathcal{Q})$ to $(S_0, \mathcal{B}_0)$. If $\psi^*$ is an optimal critical function acting on $\mathcal{Q}$ then $\psi^{**} = \psi^*(T)$ is optimal on $\mathcal{P}$ within the error bound $\delta(\mathcal{Q}, \mathcal{P})$. To prove this, consider a critical function $\psi$ on $S_0$. We have
$$\int \psi^{**}\, dP_\theta = \int \psi^*\, dQ_\theta \ge \int \Big[\int \psi(x)\, K(dx \mid y)\Big]\, dQ_\theta(y) = \int \psi\, dKQ_\theta \ge \int \psi\, dP_\theta - \sup_B |P_\theta(B) - KQ_\theta(B)| \qquad (10.1.8)$$
for every Markov kernel $K$, and hence
$$\int \psi^{**}\, dP_\theta \ge \int \psi\, dP_\theta - \delta(\mathcal{Q}, \mathcal{P}), \qquad (10.1.9)$$
showing the desired conclusion.


Next we give a simple and, as we hope, illuminating example in order not to remain too theoretical. The technical details are omitted so as not to disturb the flow of the main ideas.
EXAMPLE 10.1.2. Consider the location parameter model $\mathcal{P}_{0,n} = \{P_{0,n,\theta}\}$ of a sample of size $n$ arising out of the densities
$$x \mapsto f(x - \theta) \qquad (10.1.10)$$
with $f$ being fixed. Assume that $f(x) > 0$ for $a \le x \le b$, and $f(x) = 0$ otherwise. A typical example is given by the uniform density
$$f = 2^{-1} 1_{[-1, 1]}. \qquad (10.1.11)$$
Denote by $\mathcal{P}^*_{0,n}$ the special model under condition (10.1.11). Recall from Example 10.1.1 that $(X_{1:n}, X_{n:n})$ is a sufficient statistic in this case.
Step 1 (Approximate Sufficiency of $(X_{1:n}, X_{n:n})$). Under weak regularity conditions it can be shown that $(X_{1:n}, X_{n:n})$ is still approximately sufficient for the location model $\mathcal{P}_{0,n}$. We refer to Weiss (1979b) for a global treatment and to Janssen and Reiss (1988) for a local "one-sided" treatment of this problem. The technique for proving such a result will be developed in the next section. The regularity conditions have to guarantee that no further jumps of the density occur besides those at the points $a$ and $b$.
Let $\mathcal{P}_{1,n} = \{P_{1,n,\theta}\}$ denote the model of distributions of $(X_{1:n}, X_{n:n})$ under the parameter $\theta$.
Approximate sufficiency means that there exists a Markov kernel $K_1$ such that $P_{0,n,\theta}$ can approximately be rebuilt by $K_1 P_{1,n,\theta}$. In terms of ε-deficiency we have
$$\Delta(\mathcal{P}_{0,n}, \mathcal{P}_{1,n}) \le \varepsilon(n) \qquad (10.1.12)$$
where $\varepsilon(n) \to 0$ as $n \to \infty$. We remark that $\varepsilon(n) = O(n^{-1})$ under certain regularity conditions.
In the special case of (10.1.11), obviously,
$$\Delta(\mathcal{P}^*_{0,n}, \mathcal{P}_{1,n}) = 0. \qquad (10.1.13)$$
Notice that $\mathcal{P}_{1,n}$ is again a location parameter model.

Step 2 (Asymptotic Independence of $X_{1:n}$ and $X_{n:n}$). Next $X_{1:n}$ and $X_{n:n}$ will be replaced by independent versions $Y_{1:n}$ and $Y_{n:n}$, that is,
$$Y_{i:n} \stackrel{d}{=} X_{i:n}, \qquad i = 1, n, \qquad (10.1.14)$$
and $Y_{1:n}$, $Y_{n:n}$ are independent.
From (4.2.10) we know that the variational distance between the distributions of $(X_{1:n}, X_{n:n})$ and $(Y_{1:n}, Y_{n:n})$ is of order $O(n^{-1})$. In this case, the Markov kernel which carries one model to the other is simply represented by the identity. Denote by $\mathcal{P}_{2,n} = \{P_{2,n,\theta}\}$ the location parameter model which consists of the distributions of $(Y_{1:n}, Y_{n:n})$. Then,
$$\Delta(\mathcal{P}_{1,n}, \mathcal{P}_{2,n}) = O(n^{-1}). \qquad (10.1.15)$$
Step 3 (Limiting Distributions of Extremes). Our journey through several models is not yet finished. Under mild conditions (see Section 5.2), the extremes $Y_{1:n}$ and $Y_{n:n}$ have an exponential distribution with a remainder term of order $O(n^{-1})$. More precisely, if the extremes $Y_{i:n}$, $i = 1, n$, are generated under the parameter $\theta$ then
$$\sup_B |P_\theta\{(Y_{1:n} - a) \in B\} - Q_{1,n,\theta}(B)| = O(n^{-1}) \qquad (10.1.16)$$
and
$$\sup_B |P_\theta\{(Y_{n:n} - b) \in B\} - Q_{2,n,\theta}(B)| = O(n^{-1}),$$
where the $Q_{i,n,\theta}$ have the densities $q_{i,n}(\cdot - \theta)$ defined by
$$q_{1,n}(x) = \begin{cases} nf(a) \exp[-nf(a)x], & x \ge 0, \\ 0, & x < 0, \end{cases} \qquad (10.1.17)$$
and
$$q_{2,n}(y) = \begin{cases} nf(b) \exp[nf(b)y], & y \le 0, \\ 0, & y > 0. \end{cases}$$
We introduce the ultimate model $\mathcal{P}_{3,n} = \{P_{3,n,\theta}\}$ where
$$P_{3,n,\theta} = Q_{1,n,\theta} \times Q_{2,n,\theta}.$$
Note that $\mathcal{P}_{3,n}$ is again a location parameter model.
Summarizing Steps 1-3 we get
$$\Delta(\mathcal{P}_{0,n}, \mathcal{P}_{3,n}) = O(\varepsilon(n) + n^{-1}). \qquad (10.1.18)$$
One may obtain a fixed asymptotic model by starting with the model of distributions of $n(Y_{1:n} - a)$ and $n(Y_{n:n} - b)$ under local parameters $n\theta$ in place of $\theta$.
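Step 3 can be illustrated by simulation in the uniform case (10.1.11), where $a = -1$, $b = 1$ and $f(a) = f(b) = 1/2$. The following Python sketch uses a hypothetical function name, seed and sample sizes. It draws samples from the density $x \mapsto f(x - \theta)$ and rescales the extremes. By (10.1.16) and (10.1.17), $n(Y_{1:n} - a - \theta)$ and $n(b + \theta - Y_{n:n})$ should be approximately exponential with rates $f(a)$ and $f(b)$, that is, with mean 2 and tail probability $P\{\cdot > 2\} \approx e^{-1}$. Only the marginal distributions are inspected here, so the dependence between the two extremes of one sample (the subject of Step 2) plays no role.

```python
import math
import random
import statistics


def scaled_extremes(n, theta, rng):
    """Sample of size n from f(x - theta) with f = (1/2) 1_[-1, 1] (so a = -1, b = 1);
    returns n * (X_{1:n} - a - theta) and n * (b + theta - X_{n:n})."""
    xs = [theta + rng.uniform(-1.0, 1.0) for _ in range(n)]
    return n * (min(xs) + 1.0 - theta), n * (1.0 + theta - max(xs))


rng = random.Random(7)
pairs = [scaled_extremes(100, 0.3, rng) for _ in range(5000)]
lower = [p[0] for p in pairs]
upper = [p[1] for p in pairs]

# Exponential with rate 1/2: mean 2 and P{Z > 2} = exp(-1), about 0.368
mean_lower, mean_upper = statistics.fmean(lower), statistics.fmean(upper)
tail_lower = sum(z > 2.0 for z in lower) / len(lower)
print(mean_lower, mean_upper, tail_lower, math.exp(-1.0))
```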
Step 4 (Estimation of the Location Parameter). In a location parameter model it makes sense to choose an optimal estimator out of the class of estimators that are equivariant under translations; that is, given the model $\mathcal{P}_{3,n}$, the estimator has the property
$$\hat\theta_n(x + t, y + t) = \hat\theta_n(x, y) + t \quad \text{for all real } t. \qquad (10.1.19)$$
If $\hat\theta_n$ is an optimal equivariant estimator on $\mathcal{P}_{3,n}$ then
$$\hat\theta_n(X_{1:n} - a, X_{n:n} - b) \qquad (10.1.20)$$
is an equivariant estimator operating on $\mathcal{P}_{0,n}$ having the same performance as $\hat\theta_n$ up to a remainder term of order $O(\varepsilon(n) + n^{-1})$.
We remark that in order to show that $\hat\theta_n(X_{1:n} - a, X_{n:n} - b)$ is the optimal estimator on $\mathcal{P}_{0,n}$ one has to verify that $\hat\theta_n$ is optimal within the class of all randomized equivariant estimators operating on $\mathcal{P}_{3,n}$.
Let us examine the special case of uniform densities as given in (10.1.11). A moment's reflection shows that necessarily
$$X_{n:n} - 1 \le \theta \le X_{1:n} + 1, \qquad (10.1.21)$$
so that any reasonable estimator has to lie between $X_{n:n} - 1$ and $X_{1:n} + 1$.
One could try to adopt the maximum likelihood (m.l.) principle for finding an optimal estimator. However, the likelihood function
$$\theta \mapsto 2^{-n} \prod_{i=1}^{n} 1_{[\theta - 1, \theta + 1]}(X_{i:n})$$
attains its maximum at any $\theta$ between $X_{n:n} - 1$ and $X_{1:n} + 1$. Hence, the m.l. principle does not lead to a reasonable solution of the problem.
For location parameter models it is well known that Pitman estimators are optimal within the class of equivariant estimators (see e.g. Ibragimov and Has'minskii (1981), page 22, lines 1-9).
It is a simple exercise to verify that
$$(X_{1:n} + X_{n:n})/2 \qquad (10.1.22)$$
is a Pitman estimator w.r.t. any sub-convex loss function $L(\cdot - \cdot)$. Note that $L(\cdot - \cdot)$ is sub-convex if $L$ is symmetric about zero and $L|[0, \infty)$ is nondecreasing.
If $L$ is strictly increasing on $[0, \infty)$ then the Pitman estimator is uniquely determined.
Let us return to the ultimate model $\mathcal{P}_{3,n}$. A Pitman estimate $\hat\theta_n(x, y)$ w.r.t. the loss function $L(\cdot - \cdot)$ minimizes
$$\int L(\theta - u)\, q_{1,n}(x - u)\, q_{2,n}(y - u)\, du \qquad (10.1.23)$$
in $\theta$. [Recall that the Pitman estimator is a generalized Bayes estimator with the Lebesgue measure being the prior "distribution."]
Check that (10.1.23) is equivalent to solving the problem
$$\int_y^x L(\theta - u) \exp[n(f(a) - f(b))u]\, du = \min_\theta! \qquad (10.1.24)$$

If $f(a) = f(b)$ then, for sub-convex loss functions,
$$\hat\theta_n(x, y) = (x + y)/2$$
is a solution of (10.1.24). Moreover, this is the unique solution if $L$ is strictly increasing on $[0, \infty)$.
Thus,
$$[X_{1:n} + X_{n:n} - (a + b)]/2 \qquad (10.1.25)$$
is an "approximate" Pitman estimator in the original model $\mathcal{P}_{0,n}$.
Finding explicit solutions of (10.1.24) for $f(a) \ne f(b)$ is an open problem.
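For a given loss function, problem (10.1.24) can still be solved numerically. The following Python sketch is only an illustration. The function name, the midpoint-rule quadrature, the default squared-error loss and the ternary search are hypothetical choices. The ternary search assumes that the objective is unimodal in $\theta$, which holds for convex $L$. The search can be restricted to $[y, x]$ because, for sub-convex $L$, the objective is nonincreasing to the left of $y$ and nondecreasing to the right of $x$. When $f(a) = f(b)$ the sketch returns $(x + y)/2$, in agreement with the solution stated above.

```python
import math


def pitman_estimate(x, y, n, fa, fb, loss=lambda t: t * t, grid=1000, steps=60):
    """Approximate minimizer in theta of int_y^x L(theta - u) exp[n (f(a) - f(b)) u] du,
    cf. (10.1.24); assumes y < x and a convex loss L."""
    if not y < x:
        raise ValueError("expects y < x")
    c = n * (fa - fb)
    h = (x - y) / grid
    nodes = [y + (i + 0.5) * h for i in range(grid)]  # midpoint rule
    top = max(c * u for u in nodes)  # rescaling the weights does not move the minimizer
    weights = [math.exp(c * u - top) for u in nodes]

    def objective(theta):
        return h * sum(w * loss(theta - u) for w, u in zip(weights, nodes))

    lo, hi = y, x  # a minimizer lies in [y, x] for sub-convex L
    for _ in range(steps):  # ternary search on a unimodal objective
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


equal = pitman_estimate(0.7, 0.1, n=50, fa=0.5, fb=0.5)   # f(a) = f(b): close to (x + y)/2
tilted = pitman_estimate(0.7, 0.1, n=50, fa=0.6, fb=0.4)  # f(a) > f(b): pulled toward x
```

A different loss, for example `loss=abs`, can be passed in the same way, as long as it is convex.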

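Unequal Parameter Sets
Corresponding to (9.1.7) and (9.1.8) we introduce models
$$\mathcal{P} = \{P_{\theta,g}: \theta \in \Theta,\ g \in G(\theta)\}$$
and
$$\mathcal{Q} = \{Q_{\theta,h}: \theta \in \Theta,\ h \in H(\theta)\}$$
where $g$ and $h$ may be regarded as nuisance parameters. The notions and results above carry over to the present framework.
$\mathcal{Q}$ is said to be ε-deficient w.r.t. $\mathcal{P}$ if
$$\sup_B |P_{\theta,g}(B) - KQ_{\theta,h}(B)| \le \varepsilon(\theta, g, h), \qquad \theta \in \Theta,\ g \in G(\theta),\ h \in H(\theta), \qquad (10.1.26)$$
for some Markov kernel $K$.
Define the "one-sided" deficiency $\delta(\mathcal{Q}, \mathcal{P})$ of $\mathcal{Q}$ w.r.t. $\mathcal{P}$ by
$$\delta(\mathcal{Q}, \mathcal{P}) := \inf_K \sup_{\theta, g, h} \sup_B |P_{\theta,g}(B) - KQ_{\theta,h}(B)| \qquad (10.1.27)$$
where $K$ ranges over all Markov kernels from $(S_1, \mathcal{B}_1)$ to $(S_0, \mathcal{B}_0)$. Moreover, the symmetric deficiency of $\mathcal{Q}$ and $\mathcal{P}$ is again defined by
$$\Delta(\mathcal{Q}, \mathcal{P}) = \max(\delta(\mathcal{Q}, \mathcal{P}), \delta(\mathcal{P}, \mathcal{Q})). \qquad (10.1.28)$$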

10.2. Approximate Sufficiency over a Neighborhood of a Fixed Distribution
In this section we compute an upper bound for the deficiency (in the sense of (10.1.7)) between two models. The first is defined by the distributions of the order statistic. The second is defined by the joint distribution, say, $P_n$, of the sparse order statistics $X_{r_1:n} \le X_{r_2:n} \le \dots \le X_{r_k:n}$ [suppressing the dependence on $r_1, \dots, r_k$]. To prove such a result one has to construct an appropriate Markov kernel which carries the second model back to the original model.
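Let $X_{1:n} \le \dots \le X_{n:n}$ be the order statistics of $n$ i.i.d. random variables with common d.f. $F$ which is assumed to be continuous. Theorem 1.8.1 provides the conditional distribution
$$K_n(\cdot \mid x) = P\big((X_{1:n}, \dots, X_{n:n}) \in \cdot \mid (X_{r_1:n}, X_{r_2:n}, \dots, X_{r_k:n}) = x\big) \qquad (10.2.1)$$
of the order statistic $(X_{1:n}, \dots, X_{n:n})$ conditioned on $(X_{r_1:n}, X_{r_2:n}, \dots, X_{r_k:n}) = x$. Recall that $K_n$ is a Markov kernel having the "reproducing" property
$$K_n P_n(B) = \int K_n(B \mid \cdot)\, dP_n = P\{(X_{1:n}, X_{2:n}, \dots, X_{n:n}) \in B\} \qquad (10.2.2)$$
for every Borel set $B$.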


Let $K_n^*$ denote the special Markov kernel which is obtained if $F$ is the uniform d.f. on $(0, 1)$, say, $F_0$. Thus, we have
$$K_n^*(\cdot \mid x) = P\big((U_{1:n}, \dots, U_{n:n}) \in \cdot \mid (U_{r_1:n}, U_{r_2:n}, \dots, U_{r_k:n}) = x\big).$$
If $F$ is close to $F_0$ (in a sense to be described later) then one can hope that (10.2.2) approximately holds when $K_n$ is replaced by $K_n^*$. The decisive point is that $K_n^*$ does not depend on the d.f. $F$.
In light of the foregoing remark the $k$ order statistics $X_{r_1:n}, \dots, X_{r_k:n}$ carry approximately as much information about $F$ as the full order statistic.
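The product structure of the densities $q_{j,x}$ in the proof of Theorem 10.2.1 describes $K_n^*(\cdot \mid x)$ explicitly. Given $U_{r_i:n} = x_i$, the $r_i - r_{i-1} - 1$ order statistics between consecutive sparse ranks behave like the order statistics of i.i.d. uniform variables on $(x_{i-1}, x_i)$, independently across these blocks, with $x_0 = 0$ and $x_{k+1} = 1$. The following Python sketch draws from $K_n^*(\cdot \mid x)$ and checks the reproducing property (10.2.2) for $F = F_0$ by Monte Carlo. The names, ranks, seed and sample sizes are hypothetical. The check feeds the sparse uniform order statistics into the kernel, which should reproduce the law of the full uniform order statistic, for instance $E\,U_{5:10} = 5/11$.

```python
import random
import statistics


def sample_uniform_kernel(x, ranks, n, rng):
    """One draw from K*_n(. | x), the conditional law of (U_{1:n}, ..., U_{n:n})
    given U_{r_i:n} = x_i, i = 1, ..., k (ranks strictly increasing in 1..n)."""
    bounds = [0.0] + list(x) + [1.0]   # x_0 = 0, x_{k+1} = 1
    rs = [0] + list(ranks) + [n + 1]   # r_0 = 0, r_{k+1} = n + 1
    out = []
    for i in range(1, len(rs)):
        lo, hi = bounds[i - 1], bounds[i]
        gap = rs[i] - rs[i - 1] - 1    # order statistics strictly between the two ranks
        out.extend(sorted(rng.uniform(lo, hi) for _ in range(gap)))
        if i < len(rs) - 1:
            out.append(bounds[i])      # the conditioning value x_i sits at rank r_i
    return out


rng = random.Random(99)
n, ranks = 10, [3, 7]
draw = sample_uniform_kernel([0.2, 0.75], ranks, n, rng)

# Reproducing property for F = F_0: sparse statistics -> kernel -> full order statistic
fifth = []
for _ in range(20000):
    u = sorted(rng.random() for _ in range(n))
    sparse = [u[r - 1] for r in ranks]
    fifth.append(sample_uniform_kernel(sparse, ranks, n, rng)[4])  # U_{5:10}
mean_fifth = statistics.fmean(fifth)  # should be close to 5/11
```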

The Main Results
We shall prove a bound for the accuracy of the approximation introduced above under particularly weak conditions on the underlying d.f. $F$.
Theorem 10.2.1. Let $1 \le k \le n$ and $0 = r_0 < r_1 < \dots < r_k < r_{k+1} = n + 1$. Denote again by $P_n$ the joint distribution of the order statistics $X_{r_1:n}, X_{r_2:n}, \dots, X_{r_k:n}$ [of $n$ i.i.d. random variables with common d.f. $F$ and density $f$]. Assume that $\alpha(F) = 0$ and $\omega(F) = 1$, and that $f$ has a derivative on $(0, 1)$. Then,
$$\sup_B |P\{(X_{1:n}, X_{2:n}, \dots, X_{n:n}) \in B\} - K_n^* P_n(B)| \le \delta(F) \Big[\sum_{j=1}^{k+1} (r_j - r_{j-1} - 1) \Big(\frac{r_j - r_{j-1} + 1}{n + 1}\Big)^2\Big]^{1/2} \qquad (10.2.3)$$
where
$$\delta(F) = \sup_{y \in (0,1)} |f'(y)| \Big/ \inf_{y \in (0,1)} f^2(y). \qquad (10.2.4)$$

PROOF. Let $K_n$ denote the Markov kernel in (10.2.1) given the d.f. $F$. Applying Theorem 1.8.1 we obtain
$$\sup_B |P\{(X_{1:n}, \dots, X_{n:n}) \in B\} - K_n^* P_n(B)| \le \int \sup_B |K_n(B \mid \cdot) - K_n^*(B \mid \cdot)|\, dP_n \le \int \sup_B \Big|\Big(\bigotimes_{j=1}^{n} P_{j,x}\Big)(B) - \Big(\bigotimes_{j=1}^{n} Q_{j,x}\Big)(B)\Big|\, dP_n(x) \qquad (1)$$
where $P_{r_i,x} = Q_{r_i,x}$ are the Dirac measures at $x_i$ for $i = 1, \dots, k$; moreover, for $i = 1, \dots, k + 1$ and $j = r_{i-1} + 1, \dots, r_i - 1$ the probability measures $P_{j,x}$ and $Q_{j,x}$ are defined by the densities
$$p_{j,x} = \frac{f\, 1_{(x_{i-1}, x_i)}}{F(x_i) - F(x_{i-1})} \quad \text{and} \quad q_{j,x} = \frac{1_{(x_{i-1}, x_i)}}{x_i - x_{i-1}}$$
[with the convention that $x_0 = 0$ and $x_{k+1} = 1$].
Writing
$$g \equiv g_{j,x} = (p_{j,x}/q_{j,x}) - 1,$$
we obtain from (3.3.10) that for every $x$ with $x_j - x_{j-1} > 0$, $j = 1, \dots, k + 1$,
$$\sup_B \Big|\Big(\bigotimes_{j=1}^{n} P_{j,x}\Big)(B) - \Big(\bigotimes_{j=1}^{n} Q_{j,x}\Big)(B)\Big| \le \Big(\sum_{j=1}^{n} \int g_{j,x}^2\, dQ_{j,x}\Big)^{1/2} \le \Big[\rho(F) \sum_{i=1}^{k+1} (r_i - r_{i-1} - 1)(x_i - x_{i-1})^2\Big]^{1/2} \qquad (2)$$
where
$$\rho(F) = \Big[\sup_{y \in (0,1)} |f'(y)| \Big/ \inf_{y \in (0,1)} f(y)\Big]^2.$$
The second inequality in (2) can easily be verified by using the representation
$$g_{j,x}(y) = \frac{f'(v)}{f(u)}(y - u), \qquad (3)$$
with $u$ and $v$ strictly between $x_{i-1}$ and $x_i$ and $u$ not depending on $y$.
Combining (1) and (2) and applying the Schwarz inequality we obtain
$$\sup_B |P\{(X_{1:n}, X_{2:n}, \dots, X_{n:n}) \in B\} - K_n^* P_n(B)| \le \Big[\rho(F) \sum_{j=1}^{k+1} (r_j - r_{j-1} - 1)\, E(X_{r_j:n} - X_{r_{j-1}:n})^2\Big]^{1/2} \qquad (4)$$
where $X_{0:n} = 0$ and $X_{n+1:n} = 1$. Finally, Theorem 1.2.5(i) and (1.7.4) yield
$$E(X_{r_j:n} - X_{r_{j-1}:n})^2 = E[F^{-1}(U_{r_j:n}) - F^{-1}(U_{r_{j-1}:n})]^2 \le E\big[U^2_{r_j - r_{j-1}:n}\big] \Big/ \inf_{y \in (0,1)} f^2(y) \le \Big(\frac{r_j - r_{j-1} + 1}{n + 1}\Big)^2 \Big/ \inf_{y \in (0,1)} f^2(y); \qquad (5)$$
thus, (10.2.3) is immediate from (3)-(5). □
Notice that $\delta(F)$ can be regarded as a distance between $F$ and the uniform d.f. $F_0$.
EXAMPLE 10.2.2. Suppose the differences $r_i - r_{i-1}$ are of order $O(m(n))$ and $k \equiv k(n)$ is of order $O(n/m(n))$, which means that, roughly speaking, the indices $r_i$ are equidistant. Then the right-hand side of (10.2.3) is of order
$$O(\delta(F)\, m(n)/n^{1/2}).$$
Thus, if $m(n) = o(n^{1/2})$ (entailing that the number $k$ of order statistics has to be larger than $n^{1/2}$) then the right-hand side of (10.2.3) goes to zero as $n$ goes to infinity even if $\delta(F)$ is bounded away from zero.
If $n^{1/2} = O(m(n))$ then $F$ should also depend on $n$. In the statistical context this means that our model has to shrink towards the uniform d.f. as $n$ goes to infinity.
A typical situation for such a dependence on the sample size $n$ occurs in the context of a goodness-of-fit test when one is testing the uniform d.f. $F_0$ against an alternative $F_n$ having a density $f_n$ given by
$$f_n(x) = 1 + \rho(n) n^{-1/2} h(x).$$
Note that $\rho(n)$ is a fixed constant in classical test problems. In Example 10.4.1 we shall study the situation where the dimension of the alternative increases as the sample size increases. Then, $\rho(n)$ has to go to infinity as $n \to \infty$ in order to attain rejection probabilities bounded away from the level $\alpha$ under alternatives of the above form.
If $h$ and $h'$ are bounded then $\delta(F_n) = O(\rho(n) n^{-1/2})$ and, therefore, the right-hand side of (10.2.3) is of order $O[\rho(n)/k(n)]$.
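The order of magnitude in Example 10.2.2 can be checked numerically. The following Python sketch evaluates the right-hand side of (10.2.3) for given ranks and a given value of $\delta(F)$, with $r_0 = 0$ and $r_{k+1} = n + 1$. It then compares the result with $\delta(F)\, m(n)/n^{1/2}$ for equidistant ranks. The function names and the pairs $(n, m)$ are hypothetical.

```python
import math


def sparse_order_bound(ranks, n, delta_f):
    """delta(F) * [sum_{j=1}^{k+1} (r_j - r_{j-1} - 1) ((r_j - r_{j-1} + 1) / (n + 1))**2]**(1/2)."""
    rs = [0] + list(ranks) + [n + 1]
    total = 0.0
    for j in range(1, len(rs)):
        d = rs[j] - rs[j - 1]
        total += (d - 1) * ((d + 1) / (n + 1)) ** 2
    return delta_f * math.sqrt(total)


def equidistant_ranks(n, m):
    """Ranks m, 2m, 3m, ... not exceeding n (spacing m = m(n))."""
    return list(range(m, n + 1, m))


for n, m in [(1000, 10), (8000, 20), (64000, 40)]:
    bound = sparse_order_bound(equidistant_ranks(n, m), n, delta_f=1.0)
    print(n, m, bound, bound / (m / math.sqrt(n)))  # the last column stays close to 1
```

Taking all ranks $r_i = i$ gives the value zero, since then $K_n^*$ merely reproduces the full order statistic.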

Local Formulation
Theorem 10.2.1 may be extended in various directions. In cases where one is only interested in local properties of $F$, our considerations will be based on a statistic depending only on certain extreme or central order statistics, say, $X_{r:n} \le X_{r+1:n} \le \dots \le X_{s:n}$ where $1 \le r \le s \le n$. Again the number of order statistics may be reduced. If $r_1 = r$ and $r_k = s$ then, in contrast to the conditions of Theorem 10.2.1, it suffices to assume that $0 \le \alpha(F) < \omega(F) \le 1$.
For the formulation of Addendum 10.2.3 we introduce the projection $\tau \equiv \tau(r, s)$ defined by
$$\tau(x_1, \dots, x_n) = (x_r, x_{r+1}, \dots, x_s).$$
Note that in the following context, Markov kernels will rebuild the joint distribution of $X_{r:n}, X_{r+1:n}, \dots, X_{s:n}$. Define a Markov kernel adjusted to the present problem, namely,
$$K_{n,\tau}^*(\cdot \mid x) = \tau K_n^*(\cdot \mid x).$$
Note that $K_{n,\tau}^*(\cdot \mid x)$ is a marginal distribution of $K_n^*(\cdot \mid x)$. Check that $K_{n,\tau}^*(\cdot \mid x)$ is the conditional distribution of $(U_{r:n}, U_{r+1:n}, \dots, U_{s:n})$ given $(U_{r_1:n}, U_{r_2:n}, \dots, U_{r_k:n}) = x$.

Addendum 10.2.3. Assume that $1 \le r = r_1 < r_2 < \dots < r_k = s \le n$. Denote again by $P_n$ the joint distribution of the order statistics $X_{r_1:n}, X_{r_2:n}, \dots, X_{r_k:n}$. Assume that $0 \le \alpha(F) < \omega(F) \le 1$, and that $f$ has a derivative on $(\alpha(F), \omega(F))$. Then,
$$\sup_B |P\{(X_{r:n}, X_{r+1:n}, \dots, X_{s:n}) \in B\} - K_{n,\tau}^* P_n(B)| \le \ldots \qquad (10.2.5)$$
where
$$\delta(F) = \sup_{y \in (\alpha(F), \omega(F))} |f'(y)| \Big/ \inf_{y \in (\alpha(F), \omega(F))} f^2(y).$$
The proof of (10.2.5) is an almost verbatim repetition of that of (10.2.3) and can be left to the reader. We remark that Addendum 10.2.3 is an immediate consequence of Theorem 10.2.1 if again $\alpha(F) = 0$ and $\omega(F) = 1$.

Transformed Models
The results until now are concerned with d.f.'s $F$ close to the uniform d.f. $F_0$ on $(0, 1)$. If we fix some other continuous d.f., say $G_0$, in place of $F_0$ then the probability integral transformation may be applied to reduce the problem again to the former case.
The d.f.'s $G$ close to $G_0$ have to be of the form $G = F \circ G_0$ where $F$ (being equal to $G \circ G_0^{-1}$) has to fulfill the conditions of Theorem 10.2.1. If the $Y_{i:n}$ are the order statistics of r.v.'s with common d.f. $G$ then the $X_{i:n} = G_0(Y_{i:n})$ are the order statistics of r.v.'s with common d.f. $F$. Thus, Theorem 10.2.1 applies to the $X_{i:n}$.
In order to formulate the problem for the original order statistics $Y_{i:n}$ we introduce the Markov kernel $M_n^*$ where $M_n^*(\cdot \mid y)$ is the conditional distribution of $(Y_{1:n}, Y_{2:n}, \dots, Y_{n:n})$ given $(Y_{r_1:n}, Y_{r_2:n}, \dots, Y_{r_k:n}) = y$ in the special case of $G = G_0$ (where again the dependence of $M_n^*$ on $r_1, \dots, r_k$ will be suppressed).

Theorem 10.2.4. Let $1 \le k \le n$ and $0 = r_0 < r_1 < \dots < r_k < r_{k+1} = n + 1$. Let $F$ be a continuous d.f. with $\alpha(F) = 0$ and $\omega(F) = 1$. Assume that $F$ has two derivatives on $(0, 1)$. Put $f = F'$.
Denote by $Q_n$ the joint distribution of the order statistics $Y_{r_1:n}, \dots, Y_{r_k:n}$ where the $Y_{i:n}$ are the order statistics of $n$ i.i.d. random variables with common d.f. $G_1 = F \circ G_0$. Then,
$$\sup_B |P\{(Y_{1:n}, Y_{2:n}, \dots, Y_{n:n}) \in B\} - M_n^* Q_n(B)| \le \delta(F) \Big[\sum_{j=1}^{k+1} (r_j - r_{j-1} - 1) \Big(\frac{r_j - r_{j-1} + 1}{n + 1}\Big)^2\Big]^{1/2} \qquad (10.2.6)$$
where again
$$\delta(F) = \sup_{y \in (0,1)} |f'(y)| \Big/ \inf_{y \in (0,1)} f^2(y).$$

Theorem 10.2.4 was stated in such a way that it can easily be deduced from Theorem 10.2.1; however, this formulation looks rather artificial. Further insight into the nature of the term $\delta(F)$ can be obtained by means of a different representation of the density $f$.
From P.1.5 and Remark 1.5.3 we conclude that $G_1$ has the $G_0$-density $g = f \circ G_0$. Hence, $f = g \circ G_0^{-1}$, according to Criterion 1.2.3. Thus, the conditions of Theorem 10.2.4 can be reformulated in the following way: assume that $G_1$ has the $G_0$-density $g$ such that $g \circ G_0^{-1}$ is differentiable. Moreover, the term $\delta(F)$ can be replaced by
$$\sup_{y \in (0,1)} |(g \circ G_0^{-1})'(y)| \Big/ \inf_{y \in (0,1)} (g \circ G_0^{-1})^2(y).$$

Theorem 10.2.4 is an immediate consequence of Theorem 10.2.1 and the following lemma, which may also be applied to prove extensions of Addendum 10.2.3.
Lemma 10.2.5. Let $1 \le r_1 < r_2 < \dots < r_k \le n$. Let $X_{i:n}$ and $Y_{i:n}$ be the order statistics of $n$ i.i.d. random variables with common d.f. $F$ and, respectively, d.f. $G_1 = F \circ G_0$. The d.f.'s $F$ and $G_0$ are assumed to be continuous, and $0 \le \alpha(F) < \omega(F) \le 1$.
Denote by $P_n$ and $Q_n$ the joint distributions of $X_{r_1:n}, \dots, X_{r_k:n}$ and $Y_{r_1:n}, \dots, Y_{r_k:n}$. Then, with $M_n^*$ and $K_n^*$ as defined above, we have
$$\sup_B |P\{(Y_{1:n}, Y_{2:n}, \dots, Y_{n:n}) \in B\} - M_n^* Q_n(B)| \le \sup_B |P\{(X_{1:n}, X_{2:n}, \dots, X_{n:n}) \in B\} - K_n^* P_n(B)|. \qquad (10.2.7)$$

PROOF. From P.1.5 we know that $G_0^{-1}(\eta)$ is a r.v. with d.f. $F \circ G_0$ if $\eta$ is a r.v. with d.f. $F$. This implies that
$$(Y_{1:n}, Y_{2:n}, \dots, Y_{n:n}) \stackrel{d}{=} (G_0^{-1}(X_{1:n}), G_0^{-1}(X_{2:n}), \dots, G_0^{-1}(X_{n:n})). \qquad (1)$$
From (1) we know that
$$P\{(Y_{1:n}, Y_{2:n}, \dots, Y_{n:n}) \in B\} = P\{(X_{1:n}, X_{2:n}, \dots, X_{n:n}) \in \tilde B\} \qquad (2)$$
where $\tilde B = \{x: (G_0^{-1}(x_1), G_0^{-1}(x_2), \dots, G_0^{-1}(x_n)) \in B\}$. If, moreover,
$$M_n^* Q_n(B) = K_n^* P_n(\tilde B) \qquad (3)$$
then it is apparent that (10.2.7) holds. From (1) we also know that
$$M_n^* Q_n(B) = E\, M_n^*\big(B \mid G_0^{-1}(X_{r_1:n}), G_0^{-1}(X_{r_2:n}), \dots, G_0^{-1}(X_{r_k:n})\big).$$
Thus, in view of (3) it remains to prove that
$$K_n^*(\tilde B \mid x_1, \dots, x_k) = M_n^*(B \mid y_1, y_2, \dots, y_k) \qquad (4)$$
whenever $\alpha(F) < x_1 < x_2 < \dots < x_k < \omega(F)$, with $y_i$ denoting $G_0^{-1}(x_i)$.
Since $G_0$ is continuous we know that $G_0^{-1}$ is strictly increasing; thus,
$$\alpha(G_0) =: y_0 < y_1 < \dots < y_k < y_{k+1} := \omega(G_0).$$
Put $x_0 = 0$ and $x_{k+1} = 1$. Let $\varepsilon_x$ denote the Dirac measure at $x$ (with mass 1). Moreover, $Q$ denotes the probability measure corresponding to $G_0$, and $\tilde Q$ is the uniform distribution on $(0, 1)$.
It is obvious that $\varepsilon_{y_i}$ is induced by $\varepsilon_{x_i}$ and $G_0^{-1}$. Moreover, from P.1.6 we know that the truncation of $Q$ to the interval $(y_{i-1}, y_i)$, say, $Q_{y_{i-1}, y_i}$, is induced by $\tilde Q_{x_{i-1}, x_i}$ and $G_0^{-1}$ for $i = 1, \dots, k + 1$. Thus, Theorem 1.2.5(i) yields that $M_n^*(\cdot \mid y_1, \dots, y_k)$ is induced by $K_n^*(\cdot \mid x_1, \dots, x_k)$ and the map
$$(u_1, u_2, \dots, u_n) \mapsto (G_0^{-1}(u_1), G_0^{-1}(u_2), \dots, G_0^{-1}(u_n)).$$
This implies (4). The proof is complete. □

10.3. Approximate Sufficiency over a Neighborhood of a Family of Distributions

In the preceding section it was proved that a small number of order statistics carries nearly all the information about the underlying d.f. $G$ if this d.f. is close to a fixed d.f. $G_0$. Now we start with a family of d.f.'s $G(\cdot,\vartheta)$, $\vartheta\in\Theta$, and build a model containing joint distributions of order statistics under a d.f. $G$ close to one of the d.f.'s $G(\cdot,\vartheta)$, $\vartheta\in\Theta$.

Near to Uniform Distributions

The location and scale parameter family of uniform distributions provides an exceptional case. Here the approximate sufficiency of sparse order statistics can be proved directly by means of the results in Section 10.2. The uniform d.f.'s $G(\cdot,\vartheta)\equiv G(\cdot,(\mu,\sigma))$ are given by $G(x,(\mu,\sigma))=(x-\mu)/\sigma$ for $\mu<x<\mu+\sigma$.

Corollary 10.3.1. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. random variables with d.f. $G$ given by $G(x)=F((x-\mu)/\sigma)$ for $\mu<x<\mu+\sigma$, with $-\infty<\mu<\infty$ and $\sigma>0$. It is assumed that $F$ is continuous and has two derivatives on $(0,1)=(\alpha(F),\omega(F))$. Put $f=F'$.
Let $1\le r=r_1<r_2<\dots<r_k=s\le n$. Let $K_r^*$ denote the Markov kernel defined in Addendum 10.2.3, and let again $P_n$ denote the joint distribution of the sparse order statistics $X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n}$. Then,

$$\sup_B\bigl|P\{(X_{r:n},X_{r+1:n},\dots,X_{s:n})\in B\}-K_r^*P_n(B)\bigr|\le\delta(F)\Bigl[\sum_{j=2}^k(r_j-r_{j-1}-1)\Bigl(\frac{r_j-r_{j-1}+1}{n+1}\Bigr)^2\Bigr]^{1/2}$$

where again

$$\delta(F)=\sup_{y\in(0,1)}|f'(y)|\Big/\inf_{y\in(0,1)}f^2(y).$$

PROOF. Immediate by applying Theorem 10.2.4 to $G_0=G(\cdot,(\mu,\sigma))$ and noting that $M_r^*=K_r^*$. □

In Corollary 10.3.1 it may as well be assumed that $F$ has two derivatives on $(\alpha(F),\omega(F))$ with $0\le\alpha(F)<\omega(F)\le1$. However, this would not yield an extension of Corollary 10.3.1 since the d.f. $G$ can be represented in the former way by choosing different parameters $\mu$ and $\sigma$. It is also important to take $r_1=r$ and $r_k=s$ since, otherwise, the identity $M_r^*=K_r^*$ does not hold.
Corollary 10.3.1 was immediate from the results of Section 10.2 since the conditional distribution of $(Y_{r:n},Y_{r+1:n},\dots,Y_{s:n})$ given $(Y_{r_1:n},Y_{r_2:n},\dots,Y_{r_k:n})$ does not depend on the parameter $(\mu,\sigma)$, where the $Y_{i:n}$ are the order statistics under the uniform d.f. $G(\cdot,(\mu,\sigma))$. This property is not shared by other parametric families of d.f.'s (proof!), and so we need a modification of the concept.

The Main Results

In the sequel, we shall assume that $r=r_1<\dots<r_k=s$, and that the parameter space $\Theta$ is a subset of the Euclidean $d$-space equipped with the Euclidean norm $\|\cdot\|_2$.
For every vector $\vartheta=(\vartheta_1,\vartheta_2,\dots,\vartheta_d)\in\Theta$ let $M_{r,\vartheta}^*(\cdot\mid x)$ be the conditional distribution of $(Y_{r:n},Y_{r+1:n},\dots,Y_{s:n})$ given $(Y_{r_1:n},Y_{r_2:n},\dots,Y_{r_k:n})=x$, where the $Y_{i:n}$ are the order statistics under the d.f. $G(\cdot,\vartheta)$. Notice that the Markov kernel $M_{r,\vartheta}^*$ also depends on $r=(r_1,r_2,\dots,r_k)$.
In the next step the unknown parameter $\vartheta$ will be replaced by an estimator $\hat\vartheta_n$ based on the order statistics $X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n}$ under a d.f. $G$ which does not necessarily belong to the parametric family $\{G(\cdot,\vartheta):\vartheta\in\Theta\}$. Thus, a new problem arises, namely, one has to estimate the parameter under a model which is incorrect.
The conditions in Theorem 10.3.2 will guarantee that $M_r^*$ defined by

$$M_r^*(\cdot\mid x):=M^*_{r,\hat\vartheta_n(x)}(\cdot\mid x) \tag{10.3.1}$$

is a Markov kernel.
Let again $P_n$ denote the joint distribution of the sparse order statistics $X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n}$. We shall use $M_r^*P_n$ as an approximation to the joint distribution of $X_{r:n},X_{r+1:n},\dots,X_{s:n}$. The accuracy of this approximation will depend on the performance of the estimator $\hat\vartheta_n$ and on the distance of $G$ from the parametric family $\{G(\cdot,\vartheta):\vartheta\in\Theta\}$. We assume that the d.f.'s $G(\cdot,\vartheta)$ have densities, say, $g(\cdot,\vartheta)$.
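Theorem 10.3.2 will be proved under a local Lipschitz condition. Given a fixed parameter $\vartheta_0\in\Theta$ assume that

$$\left|\frac{(\partial/\partial y)\,G(G^{-1}(y_1,\vartheta_0),\vartheta)}{(\partial/\partial y)\,G(G^{-1}(y_2,\vartheta_0),\vartheta)}-1\right|\le C\|\vartheta-\vartheta_0\|_2\,|y_1-y_2| \tag{10.3.2}$$

for every $\vartheta\in\Theta$ with $\|\vartheta-\vartheta_0\|_2\le\varepsilon$, some $C\ge0$, and all $y_i$ with $0<q_1<y_i<q_2<1$ for $i=1,2$.
In (10.3.2) it is implicitly assumed that $g(x,\vartheta)>0$ for every $x$ with $G^{-1}(q_1,\vartheta_0)<x<G^{-1}(q_2,\vartheta_0)$ and every $\vartheta$ with $\|\vartheta-\vartheta_0\|_2\le\varepsilon$.
If $\Theta=\{\vartheta_0\}$ (that is the problem of Section 10.2) then (10.3.2) holds with $C=0$.
Another set of conditions involving the partial derivatives $(\partial^2/\partial\vartheta_i\,\partial y)\log g$ will be examined in Criterion 10.3.3.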
Theorem 10.3.2. Let $1\le k\le n$ and $1\le r=r_1<r_2<\dots<r_k=s\le n$. Let $X_{i:n}$ be the $i$th order statistic of $n$ i.i.d. random variables with common d.f. $G=F\circ G(\cdot,\vartheta_0)$ where $\vartheta_0\in\Theta$ and $F$ is a d.f. with $\alpha(F)=0$ and $\omega(F)=1$. Moreover, suppose that $F$ has two derivatives on $(0,1)$. Put again $f=F'$. Suppose that the d.f.'s $G(\cdot,\vartheta)$ fulfill condition (10.3.2) for some constants $\varepsilon,C>0$ and $0<q_1<q_2<1$.
Then for every measurable and $\Theta$-valued estimator $\hat\vartheta_n$ we have, with $M_r^*$ defined as in (10.3.1),

$$\begin{aligned}
\sup_B\bigl|P\{(X_{r:n},X_{r+1:n},\dots,X_{s:n})\in B\}-M_r^*P_n(B)\bigr|
&\le\bigl[\delta(F)+\rho(F,\hat\vartheta_n,C,\varepsilon)\bigr]\Bigl[\sum_{j=2}^k(r_j-r_{j-1}-1)\Bigl(\frac{r_j-r_{j-1}+3}{n+1}\Bigr)^2\Bigr]^{1/2}\\
&\quad+P\{\|\hat\vartheta_n(X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n})-\vartheta_0\|_2>\varepsilon\}\\
&\quad+P\{X_{r:n}\le G^{-1}(q_1,\vartheta_0)\}+P\{X_{s:n}\ge G^{-1}(q_2,\vartheta_0)\}
\end{aligned}$$

with $\delta(F)$ as in (10.2.4) and

$$\rho(F,\hat\vartheta_n,C,\varepsilon)=\Bigl(C\Big/\inf_{y\in(0,1)}f(y)\Bigr)\min\Bigl(\varepsilon,\bigl[E\|\hat\vartheta_n(X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n})-\vartheta_0\|_2^4\bigr]^{1/4}\Bigr).$$

PROOF. Our first aim is to prove that


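$$\begin{aligned}
\sup_B\bigl|P\{(X_{r:n},X_{r+1:n},\dots,X_{s:n})\in B\}-M_r^*P_n(B)\bigr|
&\le\delta(F)\Bigl[\sum_{j=2}^k(r_j-r_{j-1}-1)\Bigl(\frac{r_j-r_{j-1}+1}{n+1}\Bigr)^2\Bigr]^{1/2}+\Bigl[\sum_{j=2}^k(r_j-r_{j-1}-1)\int_A\psi_j(x)\,dP_n(x)\Bigr]^{1/2}\\
&\quad+P\{\|\hat\vartheta_n(X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n})-\vartheta_0\|_2>\varepsilon\}\\
&\quad+P\{X_{r:n}\le G^{-1}(q_1,\vartheta_0)\}+P\{X_{s:n}\ge G^{-1}(q_2,\vartheta_0)\}
\end{aligned}\tag{1}$$

where $A$ is the set of all $x=(x_1,\dots,x_k)$ with $\|\hat\vartheta_n(x)-\vartheta_0\|_2\le\varepsilon$ and $G^{-1}(q_1,\vartheta_0)<x_1<x_k<G^{-1}(q_2,\vartheta_0)$, and where, with $\hat\vartheta\equiv\hat\vartheta_n$,

$$\psi_j(x)=\int_{x_{j-1}}^{x_j}\Bigl(\frac{h_{j,x}(y,\hat\vartheta(x))}{h_{j,x}(y,\vartheta_0)}-1\Bigr)^2h_{j,x}(y,\vartheta_0)\,dy,$$

$$h_{j,x}(y,\vartheta)=g(y,\vartheta)\,1_{(x_{j-1},x_j)}(y)\big/\bigl[G(x_j,\vartheta)-G(x_{j-1},\vartheta)\bigr].$$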

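Applying the triangle inequality and Theorem 10.2.4 one obtains

$$\begin{aligned}
\sup_B\bigl|P\{(X_{r:n},\dots,X_{s:n})\in B\}-M_r^*P_n(B)\bigr|
&\le\sup_B\bigl|P\{(X_{r:n},\dots,X_{s:n})\in B\}-M_{r,\vartheta_0}^*P_n(B)\bigr|+\sup_B\bigl|M_{r,\vartheta_0}^*P_n(B)-M_r^*P_n(B)\bigr|\\
&\le\delta(F)\Bigl[\sum_{j=2}^k(r_j-r_{j-1}-1)\Bigl(\frac{r_j-r_{j-1}+1}{n+1}\Bigr)^2\Bigr]^{1/2}+\sup_B\bigl|M_{r,\vartheta_0}^*P_n(B)-M_r^*P_n(B)\bigr|.
\end{aligned}$$

To bound the last term, let $P_{r_i,x}$ and $Q_{r_i,x}$ denote the Dirac measures at $x_i$ and, for $i=2,\dots,k$ and $j=r_{i-1}+1,\dots,r_i-1$, let the probability measures $P_{j,x}$ and $Q_{j,x}$ be defined by the densities $h_{i,x}(\cdot,\hat\vartheta(x))$ and $h_{i,x}(\cdot,\vartheta_0)$; for fixed $x$ these measures describe the conditional distributions $M_r^*(\cdot\mid x)$ and $M_{r,\vartheta_0}^*(\cdot\mid x)$. Now (1) is immediate from inequality (3.3.10) and the Schwarz inequality.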
For every $x\in A$ we obtain, with $z_j=G(x_j,\vartheta_0)$, that

$$\psi_j(x)\le C^2\|\hat\vartheta(x)-\vartheta_0\|_2^2\,|z_j-z_{j-1}|^2. \tag{2}$$

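From the mean value theorem and substituting $y$ by $G^{-1}(z,\vartheta_0)$ we obtain for some $u_j$ between $z_{j-1}$ and $z_j$ that

$$\begin{aligned}
\psi_j(x)&=\frac{1}{z_j-z_{j-1}}\int_{G^{-1}(z_{j-1},\vartheta_0)}^{G^{-1}(z_j,\vartheta_0)}\left[\frac{g(y,\hat\vartheta(x))}{g(y,\vartheta_0)}\,\frac{z_j-z_{j-1}}{G(G^{-1}(z_j,\vartheta_0),\hat\vartheta(x))-G(G^{-1}(z_{j-1},\vartheta_0),\hat\vartheta(x))}-1\right]^2g(y,\vartheta_0)\,dy\\
&=\frac{1}{z_j-z_{j-1}}\int_{z_{j-1}}^{z_j}\left(\frac{(\partial/\partial z)\,G(G^{-1}(z,\vartheta_0),\hat\vartheta(x))}{(\partial/\partial z)\,G(G^{-1}(u_j,\vartheta_0),\hat\vartheta(x))}-1\right)^2dz
\end{aligned}$$

and hence (2) follows at once from condition (10.3.2) by noting that $q_1<z_1<z_2<\dots<z_k<q_2$.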
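It is immediate from (2) and the Schwarz inequality that

$$\int_A\psi_j(x)\,dP_n(x)\le C^2\min\Bigl\{\varepsilon^2,\bigl(E\|\hat\vartheta(X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n})-\vartheta_0\|_2^4\bigr)^{1/2}\Bigr\}\times\Bigl(E\bigl(G(X_{r_j:n},\vartheta_0)-G(X_{r_{j-1}:n},\vartheta_0)\bigr)^4\Bigr)^{1/2}. \tag{3}$$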

Applying (1.7.4) we obtain (as in the proof of Theorem 10.2.1) that

$$\begin{aligned}
E\bigl(G(X_{r_j:n},\vartheta_0)-G(X_{r_{j-1}:n},\vartheta_0)\bigr)^4
&=E\bigl(F^{-1}(U_{r_j:n})-F^{-1}(U_{r_{j-1}:n})\bigr)^4\\
&\le\Bigl(\inf_{y\in(0,1)}f(y)\Bigr)^{-4}EU_{r_j-r_{j-1}:n}^4\\
&\le\Bigl(\inf_{y\in(0,1)}f(y)\Bigr)^{-4}\Bigl(\frac{r_j-r_{j-1}+3}{n+1}\Bigr)^4.
\end{aligned}\tag{4}$$

Combining (1), (3), and (4) the proof is complete.
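The last step of (4) uses the bound $EU_{m:n}^4\le((m+3)/(n+1))^4$ for uniform order statistics. The following Python sketch checks it with exact rational arithmetic, assuming the standard representation $U_{m:n}\sim\mathrm{Beta}(m,n-m+1)$, so that $EU_{m:n}^4=\prod_{t=0}^3(m+t)/(n+1+t)$; the function names are ours.

```python
from fractions import Fraction
from math import prod


def fourth_moment(m: int, n: int) -> Fraction:
    """Exact E U_{m:n}^4, using U_{m:n} ~ Beta(m, n - m + 1)."""
    return prod((Fraction(m + t, n + 1 + t) for t in range(4)), start=Fraction(1))


def moment_bound(m: int, n: int) -> Fraction:
    """The bound ((m + 3)/(n + 1))^4 used in the last step of (4)."""
    return Fraction(m + 3, n + 1) ** 4


# Each factor (m + t)/(n + 1 + t) is at most (m + 3)/(n + 1), so the bound holds factorwise.
for n in (5, 50, 500):
    assert all(fourth_moment(m, n) <= moment_bound(m, n) for m in range(1, n + 1))

print(fourth_moment(2, 3))  # 1/7
```

The bound is crude for small $m$ (for $m=1$, $n=1$ it gives 16 against the exact value $1/5$), but only its order $((m+3)/(n+1))^4$ matters in the theorem.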

Condition (10.3.2) holds, as already mentioned, in the degenerate case where $\Theta=\{\vartheta_0\}$. Another special case will be studied in the following.

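Criterion 10.3.3. Assume that $\Theta$ is an open and convex subset of the Euclidean $d$-space. Assume that the partial derivatives $(\partial^2/\partial\vartheta_i\,\partial y)\log g$ exist. Then condition (10.3.2) holds with

$$C=\exp\bigl[\varepsilon|q_2-q_1|K(g)\bigr]K(g)$$

where

$$K(g)=\sup\Bigl\|\Bigl(\frac{\partial^2}{\partial\vartheta_i\,\partial y}\log g\bigl(G^{-1}(y,\vartheta_0),\vartheta\bigr)\Bigr)_{i=1}^d\Bigr\|_2$$

with the supremum ranging over all $(y,\vartheta)$ with $q_1<y<q_2$ and $\|\vartheta-\vartheta_0\|_2\le\varepsilon$.

PROOF. Applying the mean value theorem we get

$$\begin{aligned}
&\Bigl|\log\frac{\partial}{\partial y}G\bigl(G^{-1}(y_1,\vartheta_0),\vartheta\bigr)-\log\frac{\partial}{\partial y}G\bigl(G^{-1}(y_2,\vartheta_0),\vartheta\bigr)\Bigr|\\
&\quad=\Bigl|\frac{\partial}{\partial y}\log\frac{\partial}{\partial y}G\bigl(G^{-1}(y,\vartheta_0),\vartheta\bigr)\Bigr|\,|y_1-y_2|\\
&\quad=\Bigl|\frac{\partial}{\partial y}\log g\bigl(G^{-1}(y,\vartheta_0),\vartheta\bigr)-\frac{\partial}{\partial y}\log g\bigl(G^{-1}(y,\vartheta_0),\vartheta_0\bigr)\Bigr|\,|y_1-y_2|\\
&\quad\le K(g)\|\vartheta-\vartheta_0\|_2\,|y_1-y_2|
\end{aligned}$$

with $y$ between $y_1$ and $y_2$. Since $z_1/z_2=\exp(\log z_1-\log z_2)$ for $z_1,z_2>0$ and $|\exp(z)-1|\le\exp(|z|)\,|z|$ for real $z$, the proof can easily be completed. □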
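As an illustration of Criterion 10.3.3 (our choice of example, not taken from the text), consider the normal location family $g(x,\vartheta)=\varphi(x-\vartheta)$ with $d=1$. A short computation gives $(\partial^2/\partial\vartheta\,\partial y)\log g(G^{-1}(y,\vartheta_0),\vartheta)=1/\varphi(\Phi^{-1}(y))$, so $K(g)$ is the larger of $1/\varphi(\Phi^{-1}(q_1))$ and $1/\varphi(\Phi^{-1}(q_2))$. The Python sketch below computes $C$ from the criterion and checks (10.3.2) on a finite grid, using only `statistics.NormalDist` from the standard library; a grid check is an illustration, not a proof.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def dG_dy(y: float, theta: float, theta0: float) -> float:
    """(d/dy) G(G^{-1}(y, theta0), theta) for the normal location family."""
    z = STD.inv_cdf(y)
    return STD.pdf(z + theta0 - theta) / STD.pdf(z)


def criterion_constant(q1: float, q2: float, eps: float) -> float:
    """C = exp(eps * |q2 - q1| * K) * K with K = sup_{q1<y<q2} 1/phi(Phi^{-1}(y))."""
    # phi(Phi^{-1}(y)) is unimodal in y, so the supremum is attained at an endpoint.
    k = max(1.0 / STD.pdf(STD.inv_cdf(q)) for q in (q1, q2))
    return math.exp(eps * abs(q2 - q1) * k) * k


def worst_ratio(q1, q2, eps, theta0, thetas, grid_size=40):
    """Largest LHS/RHS of (10.3.2) over a grid; a value <= 1 means the condition holds there."""
    c = criterion_constant(q1, q2, eps)
    ys = [q1 + (q2 - q1) * (i + 0.5) / grid_size for i in range(grid_size)]
    worst = 0.0
    for theta in thetas:
        assert 0 < abs(theta - theta0) <= eps
        for y1 in ys:
            for y2 in ys:
                if y1 == y2:
                    continue
                lhs = abs(dG_dy(y1, theta, theta0) / dG_dy(y2, theta, theta0) - 1)
                rhs = c * abs(theta - theta0) * abs(y1 - y2)
                worst = max(worst, lhs / rhs)
    return worst


print(worst_ratio(0.1, 0.9, 0.5, 0.0, (-0.5, -0.2, 0.1, 0.5)))
```

The constant from the criterion is far from sharp here; the exponential factor is the price paid for the crude inequality $|\exp(z)-1|\le\exp(|z|)|z|$.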

Final Remarks
Let us examine the problem of testing the parametric null-hypothesis $\{G(\cdot,\vartheta):\vartheta\in\Theta\}$ against certain nonparametric alternatives $G_n$.
It is easy to see that $G_n$ is of the form $F_n\circ G(\cdot,\vartheta_0)$, where $F_n$ has the density $f_n(y)=1+h(G^{-1}(y,\vartheta_0))\alpha(n)$, if, and only if, $G_n$ has the density

$$g_n(x)=g(x,\vartheta_0)\bigl(1+h(x)\alpha(n)\bigr)$$

where $\int h(x)g(x,\vartheta_0)\,dx=0$. In this case, if $h$ and $h'(G^{-1}(\cdot,\vartheta_0))/g(G^{-1}(\cdot,\vartheta_0),\vartheta_0)$ are bounded, we have $\delta(F_n)=O(\alpha(n))$ and $\inf_{y\in(0,1)}f_n(y)\ge1-O(\alpha(n))$.
Within the present framework one has to find an appropriate estimator of $\vartheta$. The problem of constructing estimators which are optimal in the sense of minimizing the upper bound in Theorem 10.3.2 is also connected to the problem of finding an "optimal" parameter $\vartheta_0$ which makes $\delta(F)=\delta(G\circ G^{-1}(\cdot,\vartheta_0))$ small.
Given a functional $T$ on the family of all q.f.'s such that $T(G^{-1}(\cdot,\vartheta))=\vartheta$, the statistical functional $T(F_n^{-1})$ is an appropriate estimator of $T(G^{-1})$, and thus of $\vartheta_0$ if $G^{-1}$ is close to $G^{-1}(\cdot,\vartheta_0)$. Since the estimator $\hat\vartheta_n$ is only allowed to depend on the sparse order statistics $X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n}$, one has to take a statistical functional w.r.t. a version of the sample q.f. which is based on these sparse order statistics.
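The text leaves the functional $T$ and the sparse version of the sample q.f. open. The Python sketch below shows one hypothetical choice for the uniform location and scale family of Corollary 10.3.1, where $G^{-1}(u,(\mu,\sigma))=\mu+\sigma u$: the sparse sample q.f. is represented by the points $(r_i/(n+1),X_{r_i:n})$, and $T$ is the least-squares fit of $u\mapsto\mu+\sigma u$ to these points. Applied to exact quantiles this $T$ returns $(\mu,\sigma)$, so $T(G^{-1}(\cdot,\vartheta))=\vartheta$ holds on the grid. The names are ours, and no optimality is claimed.

```python
import random


def ls_location_scale(lams, xs):
    """Least-squares fit of x = mu + sigma * lam; returns (mu, sigma)."""
    k = len(lams)
    lbar = sum(lams) / k
    xbar = sum(xs) / k
    sll = sum((l - lbar) ** 2 for l in lams)
    slx = sum((l - lbar) * (x - xbar) for l, x in zip(lams, xs))
    sigma = slx / sll
    return xbar - sigma * lbar, sigma


def sparse_estimator(sample, ranks):
    """Estimate (mu, sigma) from the sparse order statistics X_{r_1:n}, ..., X_{r_k:n} only."""
    n = len(sample)
    ordered = sorted(sample)
    lams = [r / (n + 1) for r in ranks]
    xs = [ordered[r - 1] for r in ranks]  # ranks are 1-based
    return ls_location_scale(lams, xs)


random.seed(1)
n = 2000
sample = [random.uniform(2.0, 5.0) for _ in range(n)]  # mu = 2, sigma = 3
ranks = [100 * i for i in range(1, 20)]                # k = 19 sparse ranks
mu_hat, sigma_hat = sparse_estimator(sample, ranks)
print(mu_hat, sigma_hat)
```

Only the 19 selected order statistics enter the estimate, as required of $\hat\vartheta_n$ in Theorem 10.3.2.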

10.4. Local Comparison of a Nonparametric Model and a Normal Model

Let us summarize the results of Sections 10.2 and 10.3 without going into the technical details. The nucleus of our model is a parametric family $G(\cdot,\vartheta)$, $\vartheta\in\Theta$, of d.f.'s. In Section 10.2 we studied the particular case where $\Theta$ consists of one parameter. In Section 10.3 the model is built by d.f.'s $G$ close to $G(\cdot,\vartheta)$ for some $\vartheta\in\Theta$. Under appropriate conditions on $r=(r_1,\dots,r_k)$ and $G$ we find a Markov kernel $M_r^*$ such that

$$\sup_B\bigl|P\{(X_{r:n},X_{r+1:n},\dots,X_{s:n})\in B\}-M_r^*P_n(B)\bigr|\le\varepsilon_0(G,r,n) \tag{10.4.1}$$

where $X_{1:n}\le\dots\le X_{n:n}$ are the order statistics of $n$ i.i.d. random variables with common d.f. $G$, and $P_n$ is the joint distribution of $X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n}$. The decisive point in (10.4.1) is that the Markov kernel $M_r^*$ is independent of $G$.

Let us also apply the result of Section 4.5, namely, that central order statistics $X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n}$ are approximately normally distributed.


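Denote by $g$ the density of $G$. We have

$$\sup_B\bigl|P\{(X_{r_1:n},X_{r_2:n},\dots,X_{r_k:n})\in B\}-P\{(Y_1',Y_2',\dots,Y_k')\in B\}\bigr|\le\varepsilon_1(G,r,n) \tag{10.4.2}$$

where the explicit form of $\varepsilon_1(G,r,n)$ is given in Theorem 4.5.3, and $(Y_1',Y_2',\dots,Y_k')$ is a normal random vector with mean vector

$$\mu(G)=\Bigl(G^{-1}\Bigl(\frac{r_1}{n+1}\Bigr),\dots,G^{-1}\Bigl(\frac{r_k}{n+1}\Bigr)\Bigr) \tag{10.4.3}$$

and covariance matrix $\Sigma(G)=(\sigma_{i,j})$ given by

$$\sigma_{i,j}=\frac{\dfrac{r_i}{n+1}\Bigl(1-\dfrac{r_j}{n+1}\Bigr)}{(n+1)\,g\Bigl(G^{-1}\Bigl(\dfrac{r_i}{n+1}\Bigr)\Bigr)\,g\Bigl(G^{-1}\Bigl(\dfrac{r_j}{n+1}\Bigr)\Bigr)} \tag{10.4.4}$$

for $1\le i\le j\le k$.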
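Formulas (10.4.3) and (10.4.4) are straightforward to evaluate once $G$ is given. A Python sketch for a d.f. $G$ represented by a `statistics.NormalDist` object (our choice for illustration; any object with `inv_cdf` and `pdf` methods would serve):

```python
from statistics import NormalDist


def normal_approx_parameters(ranks, n, dist):
    """Mean vector (10.4.3) and covariance matrix (10.4.4) for X_{r_1:n}, ..., X_{r_k:n}."""
    lam = [r / (n + 1) for r in ranks]
    mean = [dist.inv_cdf(l) for l in lam]          # G^{-1}(r_i/(n+1))
    dens = [dist.pdf(q) for q in mean]             # g(G^{-1}(r_i/(n+1)))
    k = len(ranks)
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):                      # (10.4.4) is stated for i <= j
            cov[i][j] = lam[i] * (1 - lam[j]) / ((n + 1) * dens[i] * dens[j])
            cov[j][i] = cov[i][j]                  # symmetric completion
    return mean, cov


mean, cov = normal_approx_parameters([25, 50, 75], 99, NormalDist())
print(mean)
print(cov)
```

For the sample median of $n=99$ standard normal r.v.'s the diagonal entry is $0.25/(100\,\varphi(0)^2)=\pi/200$, the familiar asymptotic variance of the median.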
Since (10.4.2) can be extended to $[0,1]$-valued measurable functions (see P.3.5) we obtain

$$\sup_B\bigl|M_r^*P_n(B)-M_r^*N_{(\mu(G),\Sigma(G))}(B)\bigr|\le\varepsilon_1(G,r,n). \tag{10.4.5}$$

Combining (10.4.1) and (10.4.5) we have

$$\sup_B\bigl|P\{(X_{r:n},X_{r+1:n},\dots,X_{s:n})\in B\}-M_r^*N_{(\mu(G),\Sigma(G))}(B)\bigr|\le\varepsilon(G,r,n):=\varepsilon_0(G,r,n)+\varepsilon_1(G,r,n). \tag{10.4.6}$$

(10.4.6) connects the following two models. The first one is given by joint distributions of order statistics $X_{r:n},\dots,X_{s:n}$ with "parameter" $G$; the second one is a family of $k$-dimensional normal distributions with parameters $(\mu(G),\Sigma(G))$. In the sense of (10.1.26), the model given by the normal distributions $N_{(\mu(G),\Sigma(G))}$ is $\varepsilon(G,r,n)$-deficient w.r.t. the model determined by the order statistics $X_{r:n},X_{r+1:n},\dots,X_{s:n}$.
If (10.4.6) holds for $r=1$ and $s=n$ then the following result also holds: Let $\xi_1,\xi_2,\dots,\xi_n$ be the original i.i.d. random variables. Since the order statistic is sufficient we find a Markov kernel $M^{**}$ (see also P.1.29) such that

$$\sup_B\bigl|P\{(\xi_1,\xi_2,\dots,\xi_n)\in B\}-M^{**}N_{(\mu(G),\Sigma(G))}(B)\bigr|\le\varepsilon(G,r,n). \tag{10.4.7}$$

Next we present the main ideas of an example due to Weiss (1974, 1977) where the approximating normal distribution depends on the original d.f. $F$ only through the mean vector. Moreover, we indicate the possibility of calculating a bound of the remainder term of the approximation.
EXAMPLE 10.4.1. As a continuation of Example 10.2.2, the uniform d.f. $F_0$ on $(0,1)$ will be tested against a composite alternative of d.f.'s $F_n$ having densities $f_n$ given by

$$f_n(x)=1+\beta(n)n^{-1/2}h(x),\qquad 0\le x\le1,$$

and $f_n(x)=0$ otherwise, where $\int_0^1h(x)\,dx=0$. The term $\beta(n)$ will be specified later.
Part 1 (Asymptotic Sufficiency). Recall from Example 10.2.2 that sparse order statistics are asymptotically sufficient under weak conditions.


Part 2 (Asymptotic Normality). Put again

$$\lambda_i=r_i/(n+1).$$

Let $\beta_{i,i}$ and $\beta_{i,i-1}$ be given as in the proof of Lemma 4.4.2. Recall that the $\beta_{i,j}$ define a map $S$ such that $SN_{(0,\Sigma)}=N_{(0,I)}$, where $\Sigma=(\sigma_{i,j})$ and $\sigma_{i,j}=\lambda_i(1-\lambda_j)$, $i\le j$. The decisive point is that these values do not depend on $F$. Define

$$Z_i=(n+1)^{1/2}\bigl(\beta_{i,i}(X_{r_i:n}-\lambda_i)+\beta_{i,i-1}(X_{r_{i-1}:n}-\lambda_{i-1})\bigr) \tag{10.4.8}$$

for $i=1,\dots,k$, where $\beta_{1,0}=0$. Notice that $Z_1,\dots,Z_k$ are known to the statistician, and hence tests may be based on these r.v.'s. The $Z_i$ are closely related to spacings; however, the use of spacings would not lead to asymptotically independent r.v.'s (compare with P.4.4).
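The coefficients $\beta_{i,i}$, $\beta_{i,i-1}$ come from the proof of Lemma 4.4.2, which is not part of this section. Because $\sigma_{i,j}=\lambda_i(1-\lambda_j)$ has a Markov structure, one explicit choice (derived here under that assumption; the book's sign convention may differ) is $\beta_{i,i}=[(1-\lambda_{i-1})/((1-\lambda_i)(\lambda_i-\lambda_{i-1}))]^{1/2}$ and $\beta_{i,i-1}=-[(1-\lambda_i)/((1-\lambda_{i-1})(\lambda_i-\lambda_{i-1}))]^{1/2}$ with $\lambda_0=0$. The pure-Python sketch builds the lower bidiagonal matrix $S$ and checks numerically that $S\Sigma S^t=I$.

```python
import math


def whitening_coefficients(lams):
    """beta_{i,i} and beta_{i,i-1} (beta_{1,0} = 0) for sigma_{i,j} = lam_i (1 - lam_j), i <= j."""
    diag, sub = [], []
    prev = 0.0  # lambda_0 = 0
    for i, lam in enumerate(lams):
        gap = lam - prev
        diag.append(math.sqrt((1 - prev) / ((1 - lam) * gap)))
        sub.append(0.0 if i == 0 else -math.sqrt((1 - lam) / ((1 - prev) * gap)))
        prev = lam
    return diag, sub


def transformed_covariance(lams):
    """Return S Sigma S^t, which should be the k x k identity matrix."""
    k = len(lams)
    sigma = [[lams[min(i, j)] * (1 - lams[max(i, j)]) for j in range(k)] for i in range(k)]
    diag, sub = whitening_coefficients(lams)
    s = [[0.0] * k for _ in range(k)]
    for i in range(k):
        s[i][i] = diag[i]
        if i > 0:
            s[i][i - 1] = sub[i]
    s_sigma = [[sum(s[i][a] * sigma[a][b] for a in range(k)) for b in range(k)] for i in range(k)]
    return [[sum(s_sigma[i][b] * s[j][b] for b in range(k)) for j in range(k)] for i in range(k)]


lams = [r / 101 for r in (10, 30, 55, 90)]  # lambda_i = r_i/(n+1) with n = 100
result = transformed_covariance(lams)
for row in result:
    print([round(v, 12) for v in row])
```

Since $S$ is bidiagonal, each $Z_i$ uses only two neighbouring sparse order statistics, which is what makes the construction cheap.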
Applying (10.4.2) we obtain that $Z_1,\dots,Z_k$ can be replaced by independent normal r.v.'s $Y_1,\dots,Y_k$ with unit variances and expectations equal to
i

= 1, ... , k.

(10.4.9)

Thus, we have

$$\sup_B\bigl|P\{(Z_1,\dots,Z_k)\in B\}-P\{(Y_1,\dots,Y_k)\in B\}\bigr|=o(1). \tag{10.4.10}$$

A bound for the remainder term in (10.4.10) may be proved by means of P.4.2(i) and P.4.2(v) [see also P.10.7].
Thus, the original testing problem has become a problem of testing, within a model of normal distributions $N_{(\mu,I)}$, the null-hypothesis

$$\mu=(\mu_1,\dots,\mu_k)=0 \tag{10.4.11}$$

against
i

= 1, ... , k,

where the alternative has to be specified more precisely.


Part 3 (Discussion). The above considerations enable us to apply the nonasymptotic theory of linear models to the original problem of testing the
uniform distribution against a parametric or nonparametric alternative. By
finding an optimum procedure within the linear model one gets an approximately optimum procedure for the original model.


Recall from P.3.8 that the most powerful level-$\alpha$-test based on a sample of size $n$, for testing the uniform density against the density $1+\beta(n)n^{-1/2}h$, rejects the null-hypothesis with probability
(10.4.12)
under appropriate regularity conditions. However, in general, this power cannot be attained uniformly over a composite alternative. It is well known that test procedures with high efficiency w.r.t. one "direction" $h$ have a poor efficiency w.r.t. other directions. The Kolmogorov-Smirnov test provides a typical example of a test having such a behavior.
In view of (10.4.12) a plausible requirement is that a test in the original model should be of equal performance under every alternative $1+\beta(n)n^{-1/2}h$ satisfying the condition

$$\Bigl(\int_0^1h^2(x)\,dx\Bigr)^{1/2}=\delta \tag{10.4.13}$$

for fixed $\delta>0$.


Let again $Y_1,\dots,Y_k$ be independent normal r.v.'s with unit variance and mean vector $\mu=(\mu_1,\dots,\mu_k)$ as given in (10.4.11). Denote again by $\|\cdot\|_2$ the Euclidean norm. Notice that $\sum_{i=1}^k(\lambda_i-\lambda_{i-1})h(\lambda_i)^2$ is an approximation to $\delta^2$ and hence $\|\mu\|_2$ is an approximation to $\beta(n)\delta$.
Thus, within the normal model, one has to test the null-hypothesis $\mathcal H_0=\{0\}=\{\mu:\|\mu\|_2=0\}$ against an alternative

$$\mathcal H_1\subset\{\mu:\|\mu\|_2>0\} \tag{10.4.14}$$

under the additional requirement that the performance of the test procedure depends on the underlying parameter $\mu$ through $\|\mu\|_2$ only; thus, the test is invariant under orthogonal transformations. In Parts 4 and 5 we shall recall some basic facts from classical, parametric statistics.
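The approximation of $\delta^2$ by $\sum_{i=1}^k(\lambda_i-\lambda_{i-1})h(\lambda_i)^2$ is a Riemann-sum statement. A small Python check with the hypothetical direction $h(x)=\sqrt2\cos(2\pi x)$, for which $\int_0^1h=0$ and $\delta=1$, and equally spaced $\lambda_i=i/(k+1)$ with $\lambda_0=0$; for this $h$ the sum equals $1-2/(k+1)$ exactly.

```python
import math


def riemann_delta_squared(h, lams):
    """sum_{i=1}^k (lambda_i - lambda_{i-1}) h(lambda_i)^2 with lambda_0 = 0."""
    total, prev = 0.0, 0.0
    for lam in lams:
        total += (lam - prev) * h(lam) ** 2
        prev = lam
    return total


def h(x):
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * x)


for k in (9, 99, 999):
    lams = [i / (k + 1) for i in range(1, k + 1)]
    print(k, riemann_delta_squared(h, lams))  # approaches delta^2 = 1 as k grows
```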
Part 4 (A $\chi^2$-Test). Let us first consider the case where $\mathcal H_1=\{\mu:\|\mu\|_2>0\}$ without taking into account that $h$ has to satisfy a certain smoothness condition that also restricts the choice of the parameters $\mu$. The uniformly most powerful invariant test of level $\alpha$ is given by the critical region

$$C_k=\{T_k>\chi^2_{k,\alpha}\} \tag{10.4.15}$$

where

$$T_k=\sum_{i=1}^kY_i^2 \tag{10.4.16}$$

and $\chi^2_{k,\alpha}$ is the $(1-\alpha)$-quantile of the central $\chi^2$-distribution with $k$ degrees of freedom. According to Weiss (1977) the critical region $C_k$ is also a Bayes test for testing $\|\mu\|_2=0$ against $\|\mu\|_2=\delta$ with prior probability uniformly distributed over the sphere $\{\mu:\|\mu\|_2=\delta\}$ (proof!). Moreover, $C_k$ is minimax for this testing problem.
Since $Y_k=(Y_1,\dots,Y_k)$ is a vector of independent normal r.v.'s with unit variance and mean vector $\mu$, we know that $T_k$ is distributed according to a noncentral $\chi^2$-distribution with $k$ degrees of freedom and noncentrality parameter $\|\mu\|_2^2$. If $k\equiv k(n)$ tends to infinity as $n\to\infty$, the central limit theorem implies that

$$\bigl(2k+4\|\mu\|_2^2\bigr)^{-1/2}\Bigl(\sum_{i=1}^k(Y_i^2-1)-\|\mu\|_2^2\Bigr) \tag{10.4.17}$$

is asymptotically standard normal. Consequently, $C_k$ has the asymptotic power function
(10.4.18)
This yields that asymptotically the rejection probability is strictly larger than $\alpha$ if $\|\mu\|_2^2/k^{1/2}$ is bounded away from zero.
In the original model, the critical region

$$\tilde C_k=\Bigl\{\sum_{i=1}^kZ_i^2>\chi^2_{k,\alpha}\Bigr\} \tag{10.4.19}$$

with $Z_i$ defined in (10.4.8), attains the rejection probability

$$\Phi\Bigl(\Phi^{-1}(\alpha)+\int_0^1h^2(x)\,dx\Bigr)+o(k^0) \tag{10.4.20}$$

under alternatives $1+\bigl[(2k)^{1/2}/n\bigr]^{1/2}h$.

The critical region $\tilde C_k$ is closely related to a $\chi^2$-test based on a random partition of the interval $[0,1]$.
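The normal limit (10.4.17) can be turned directly into an approximate power function for $C_k$. The Python sketch below also replaces the exact quantile $\chi^2_{k,\alpha}$ by its normal approximation $k+(2k)^{1/2}\Phi^{-1}(1-\alpha)$ (an assumption made to stay within the standard library), so it is only meaningful for large $k$. With noncentrality $\|\mu\|_2^2=(2k)^{1/2}\int_0^1h^2$, as under the alternatives in (10.4.20), it reproduces the leading term $\Phi(\Phi^{-1}(\alpha)+\int_0^1h^2)$.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def approx_power(k: int, noncentrality: float, alpha: float = 0.05) -> float:
    """Normal approximation to P(T_k > chi^2_{k,alpha}) based on (10.4.17)."""
    crit = k + math.sqrt(2 * k) * STD.inv_cdf(1 - alpha)  # approximate chi^2 quantile
    z = (crit - k - noncentrality) / math.sqrt(2 * k + 4 * noncentrality)
    return 1 - STD.cdf(z)


k, delta_sq = 10**6, 1.0                               # delta_sq plays the role of int h^2
power = approx_power(k, math.sqrt(2 * k) * delta_sq)
leading_term = STD.cdf(STD.inv_cdf(0.05) + delta_sq)  # Phi(Phi^{-1}(alpha) + int h^2)
print(power, leading_term)
```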
Part 5 (Linear Regression). We indicate a natural generalization of Part 4 that also takes into account the required smoothness condition imposed on $h$. Assume that
(10.4.21)
where $v_j=(v_j^{(1)},\dots,v_j^{(k)})$, $j=1,\dots,s$, are orthonormal vectors w.r.t. the inner product $\langle x,y\rangle=\sum_{i=1}^kx_iy_i$. The well-known solution of the problem is to take the critical region
(10.4.22)
where
(10.4.23)
Notice that $T_s=\|\hat Y_k\|_2^2$ where $\hat Y_k=\sum_{j=1}^s\langle v_j,Y_k\rangle v_j$ is the orthogonal projection of $Y_k$ onto the $s$-dimensional linear subspace spanned by $v_1,\dots,v_s$. The statistic $T_s$ is again distributed according to a noncentral $\chi^2$-distribution with $s$ degrees of freedom and noncentrality parameter $\|\mu\|_2^2$. We refer to Witting and Nölle (1970) or Lehmann (1986) for the details. Now the remarks made above concerning the asymptotic performance of the critical regions $C_k$ and $\tilde C_k$ carry over with $k$ replaced by $s$.
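The statistic $T_s=\|\hat Y_k\|_2^2=\sum_{j=1}^s\langle v_j,Y_k\rangle^2$ is simple to compute once orthonormal $v_1,\dots,v_s$ are available. A pure-Python sketch; the raw vectors (cosine and sine evaluated at $\lambda_i$) are a hypothetical choice in the spirit of Part 6, orthonormalized by Gram-Schmidt.

```python
import math


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors w.r.t. <x, y> = sum_i x_i y_i."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis


def projection_statistic(y, basis):
    """T_s = squared norm of the orthogonal projection of y onto span(basis)."""
    return sum(sum(a * b for a, b in zip(v, y)) ** 2 for v in basis)


k = 20
lams = [i / (k + 1) for i in range(1, k + 1)]
raw = [[math.cos(2 * math.pi * l) for l in lams],
       [math.sin(2 * math.pi * l) for l in lams]]
basis = gram_schmidt(raw)  # s = 2
```

For a vector already lying in the subspace, $T_s$ equals its squared norm; in general $T_s\le\|Y_k\|_2^2=T_k$, which is why $T_s$ concentrates the noncentrality on fewer degrees of freedom.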
Part 6 (Parametric and Nonparametric Statistics). If $s$ is fixed as $n\to\infty$ then, obviously, our asymptotic considerations belong to parametric statistics. If $s\equiv s(n)\to\infty$ as $n\to\infty$ then, e.g. in view of the Fourier expansion of square integrable functions, the sequence of original models approaches the space of square integrable densities close to the uniform density, showing that the testing problem is of a nonparametric nature.
The foregoing remarks seem to be of some importance for nonparametric density testing (and estimation). Note that the functions $h$ may belong to the linear space spanned by the trigonometric functions $e_1,\dots,e_s$ (see P.8.5(i)). So there is some relationship to the orthogonal series method adopted in nonparametric density estimation. The crucial problem in nonparametric density estimation is to find a certain balance between the variance and the bias of estimation procedures. Our present point of view differs from that taken up in the literature. First, we deduce the asymptotically optimum procedure w.r.t. the $s(n)$-dimensional model. These considerations belong to classical statistics. In a second step, we may examine the performance of the test procedure if the $s(n)$-dimensional model is incorrect.

P.10. Problems and Supplements


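1. Let $\xi_1,\dots,\xi_n$ and, respectively, $\eta_1,\dots,\eta_n$ be i.i.d. random variables and denote by $X_{1:n}\le\dots\le X_{n:n}$ and $Y_{1:n}\le\dots\le Y_{n:n}$ the corresponding order statistics. Prove that

$$\sup_B\bigl|P\{(\xi_1,\dots,\xi_n)\in B\}-P\{(\eta_1,\dots,\eta_n)\in B\}\bigr|=\sup_B\bigl|P\{(X_{1:n},\dots,X_{n:n})\in B\}-P\{(Y_{1:n},\dots,Y_{n:n})\in B\}\bigr|.$$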

2. Prove that Theorem 10.2.1 holds with

$$\delta(F)=\exp\Bigl(\sup_{y\in(0,1)}|f'(y)/f(y)|\Bigr)\sup_{y\in(0,1)}|f'(y)/f(y)|\Big/\inf_{y\in(0,1)}f(y).$$

[Hint: Use the fact that $f(y)/f(x)=\exp[(f'(z)/f(z))(y-x)]$ with $z$ between $x$ and $y$.]
3. Theorem 10.2.1 holds with the upper bound replaced by

$$\Bigl(c\Big/\inf_{y\in(0,1)}f^2(y)\Bigr)\Bigl[\sum_{j=2}^k(r_j-r_{j-1}-1)\Bigl(\frac{r_j-r_{j-1}+1}{n+1}\Bigr)^{2\alpha}\Bigr]^{1/2}$$

if the density $f$ satisfies a Lipschitz condition of order $\alpha\in(0,1]$ on $(0,1)$ (with Lipschitz constant $c$).

4. (i) If $0=r_0<r_1<r_2<\dots<r_k=s$, $r=1$ and $\alpha(F)=0$, then (10.2.5) holds with $\sum_{j=2}^k$ replaced by $\sum_{j=1}^k$.
(ii) If $r=r_1<r_2<\dots<r_k<r_{k+1}=n+1$, $s=n$ and $\omega(F)=1$, then (10.2.5) holds with $\sum_{j=2}^k$ replaced by $\sum_{j=2}^{k+1}$.
5. Let $r(x_1,\dots,x_n)=(x_{n-k+1},\dots,x_n)$. Under the conditions of Addendum 10.2.3, if $\alpha(F)\ge0$ and $\omega(F)=1$,

$$\sup_B\bigl|P\{(X_{n-k+1:n},\dots,X_{n:n})\in B\}-K_r^*P_n(B)\bigr|\le\Bigl[\sup_{y\in(\alpha(F),1)}|f'(y)|\Big/\inf_{y\in(\alpha(F),1)}f^2(y)\Bigr]k^{3/2}/n.$$

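6. (i) Verify condition (10.3.2) with $C=\rho(g)K(g)$, where $K(g)$ is given as in Criterion 10.3.3 and

$$\rho(g)=\sup_{\|\vartheta-\vartheta_0\|_2\le\varepsilon}\Bigl(\sup_{q_1<y<q_2}\frac{\partial}{\partial y}G\bigl(G^{-1}(y,\vartheta_0),\vartheta\bigr)\Big/\inf_{q_1<y<q_2}\frac{\partial}{\partial y}G\bigl(G^{-1}(y,\vartheta_0),\vartheta\bigr)\Bigr).$$

(ii) Prove a modified version of Theorem 10.3.2 under the condition

$$\left|\frac{(\partial/\partial y)\,G(G^{-1}(y_1,\vartheta_0),\vartheta)}{(\partial/\partial y)\,G(G^{-1}(y_2,\vartheta_0),\vartheta)}-1\right|\le C_1\|\vartheta-\vartheta_0\|_2\,|y_1-y_2|+C_2\bigl(\|\vartheta-\vartheta_0\|_2\,|y_1-y_2|\bigr)^2$$

for every $\vartheta\in\Theta$ with $\|\vartheta-\vartheta_0\|_2\le\varepsilon$ and $q_1<y_1,y_2<q_2$. Here, $C_1=K(g)$ and $C_2=\rho(g)K^2(g)$.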

7. In analogy to (10.4.8) define

$$Z_i'=(n+1)^{1/2}\bigl(\beta_{i,i}(X_{r_i:n}-G^{-1}(\lambda_i))+\beta_{i,i-1}(X_{r_{i-1}:n}-G^{-1}(\lambda_{i-1}))\bigr).$$

Denote by $P_n'$ the joint distribution of

$$(n+1)^{1/2}g_i\bigl(X_{r_i:n}-G^{-1}(\lambda_i)\bigr),\qquad i=1,\dots,k,$$

where $g_i=g(G^{-1}(\lambda_i))$. $N_{(0,\Sigma)}$ again denotes the $k$-variate normal distribution with mean vector zero and covariances $\sigma_{i,j}=\lambda_i(1-\lambda_j)$, $1\le i\le j\le k$. Prove that

$$\sup_B\bigl|P\{(Z_1',\dots,Z_k')\in B\}-N_{(0,I)}(B)\bigr|\le\|P_n'-N_{(0,\Sigma)}\|+2^{-1/2}\Bigl[\sum_{i=1}^k\bigl(\sigma_{i,i}'-1+2\log g_i\bigr)\Bigr]^{1/2}$$

where $\sigma_{1,1}'=g_1^{-2}$ and

$$\sigma_{i,i}'=g_i^{-2}\Bigl[\frac{\lambda_{i-1}(1-\lambda_i)}{\lambda_i-\lambda_{i-1}}\Bigl(\frac{g_i}{g_{i-1}}-1\Bigr)^2+1\Bigr],\qquad i=2,\dots,k.$$

[Hint: Let $H$ be the diagonal matrix with diagonal elements $h_{i,i}=1/g_i$. Let $\Sigma'=B\circ H\circ\Sigma\circ H^t\circ B^t$ where $B$ is defined as in the proof of Lemma 4.4.2. Notice that $\det(\Sigma')=(\det(H))^2$.]
8. Specialize Example 10.4.1, Part 5, to trigonometric functions (see P.8.5).
9. Extend Example 10.4.1 to the composite null-hypothesis of uniform distributions.


Bibliographical Notes
The reader who is interested in the theoretical background concerning the comparison of experiments is referred to Torgersen (1976), Strasser (1985), and Le Cam (1986). The article of Torgersen gives a short, illuminating introduction to this subject.
The magnificent idea to study a construction like that in Theorem 10.2.1 is due to L. Weiss (1974), who also gave some asymptotic results. The extension of the problem from a single d.f. to a parametric family of d.f.'s was suggested by Weiss (1980). Weiss carried out a detailed study in the location and scale parameter case. Further insight into the problem of comparing models based on order statistics was obtained by Reiss et al. (1984), where a sharp bound of the remainder term of the approximation was also established. The present approach is taken from Reiss (1986). Some results concerning the sufficiency of extremes within a parametric framework can be found in the articles by Weiss (1979b) and Janssen and Reiss (1988). In the second article the location model of a Weibull sample is locally compared with location models defined by

$$\bigl(S_m^{1/\alpha}+\vartheta\bigr)_{m\le k}\quad\text{and, respectively,}\quad\bigl(S_m^{1/\alpha}+\vartheta\bigr)_{m=1,2,3,\dots}$$

where $\vartheta$ is the location parameter, and $S_m$ is the sum of $m$ i.i.d. standard exponential r.v.'s.
The optimum test procedure described in Example 10.4.1, (10.4.22), depends on the special choice of the set of alternatives. Weiss (1977) also describes a Bayes test that has the properties of an "all purpose" test. Moreover, it is apparent that, by using the approach of Section 10.4, larger parts of the theory of linear models can be made applicable to nonparametric statistics. A similar procedure based on spacings is dealt with by Weiss (1965).

APPENDIX 1

The Generalized Inverse

Extending the definition of a q.f. (see (1.1.10)) we define the inverse $\psi^*$ of a real-valued, nondecreasing and right continuous function $\psi$ with domain $(\alpha,\omega)$ by setting

$$\psi^*(y)=\inf\{t\in(\alpha,\omega):\psi(t)\ge y\}\qquad\text{for }-\infty<y<\infty \tag{A.1.1}$$

(with the convention that $\inf\emptyset=\omega$). Moreover, we define

$$\psi^{-1}=\psi^*\big|_{(\inf\psi(s),\,\sup\psi(s))}; \tag{A.1.2}$$

that is, $\psi^{-1}$ is the restriction of $\psi^*$ to the interval $(\inf\psi(s),\sup\psi(s))$. Thus, in the particular case of the q.f. we have $\psi=F$, $(\alpha,\omega)=$ real line, $(\inf\psi(s),\sup\psi(s))=(0,1)$, and $\psi^{-1}=F^{-1}$. From the definitions of $\psi^*$ and $\psi^{-1}$ one can easily conclude that $\psi^*$ is $[\alpha,\omega]$-valued and $\psi^{-1}$ is $(\alpha,\omega)$-valued.

Lemma A.1.1. For $\psi$ as above, if $\alpha<x<\omega$ then for every real $y$,

$$y\le\psi(x)\quad\text{iff}\quad\psi^*(y)\le x. \tag{A.1.3}$$

PROOF. Since $\psi^*(y)$ is the infimum of all $t\in(\alpha,\omega)$ such that $\psi(t)\ge y$, it is clear that $\psi(x)\ge y$ implies $x\ge\psi^*(y)$. Conversely, for every $z>x\ge\psi^*(y)$ we have $\psi(z)\ge y$, and thus, $y\le\lim_{z\downarrow x}\psi(z)=\psi(x)$ since $\psi$ is right continuous. □

It is clear that (A.1.3) also holds for $\psi^{-1}$ and $\inf\psi(s)<y<\sup\psi(s)$ in place of $\psi^*$ and $-\infty<y<\infty$. Thus (1.2.9) is a special case of (A.1.3).
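Definition (A.1.1) and the equivalence (A.1.3) can be checked mechanically for a step d.f. A Python sketch for the hypothetical discrete d.f. with atoms 0, 1, 2 and masses 0.2, 0.5, 0.3; here $(\alpha,\omega)$ is the real line, so $\inf\emptyset=\omega=+\infty$ and $\psi^*(y)=-\infty$ for $y\le0$.

```python
import bisect
import math

ATOMS = [0.0, 1.0, 2.0]
CUM = [0.2, 0.7, 1.0]  # F evaluated at the atoms


def F(t: float) -> float:
    """Right continuous step d.f."""
    i = bisect.bisect_right(ATOMS, t)
    return 0.0 if i == 0 else CUM[i - 1]


def F_star(y: float) -> float:
    """psi^*(y) = inf{t : F(t) >= y} as in (A.1.1)."""
    if y <= 0:
        return -math.inf  # every t qualifies
    if y > 1:
        return math.inf   # no t qualifies: inf of the empty set is omega
    return ATOMS[bisect.bisect_left(CUM, y)]  # first atom with F(atom) >= y


xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
ys = [-0.5, 0.0, 0.1, 0.2, 0.3, 0.7, 0.71, 1.0, 1.2]
equivalence_holds = all((y <= F(x)) == (F_star(y) <= x) for x in xs for y in ys)
print(equivalence_holds)
```

Note that $F(F^*(0.5))=F(1)=0.7\ne0.5$: the inverse is one-sided, in line with Criterion 1.2.3.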
We already know that $\psi^{-1}$ is an $(\alpha,\omega)$-valued function with domain $(\inf\psi(s),\sup\psi(s))$. More precisely, one can easily check that $\psi^{-1}$ is an $(\alpha(\psi),\omega(\psi))$-valued function where

$$\alpha(\psi)=\inf\{t\in(\alpha,\omega):\psi(t)>\inf\psi(s)\} \tag{A.1.4}$$

and

$$\omega(\psi)=\sup\{t\in(\alpha,\omega):\psi(t)<\sup\psi(s)\}. \tag{A.1.5}$$

It is clear that $\alpha\le\alpha(\psi)\le\omega(\psi)\le\omega$. Notice that in the particular case of a d.f. $F$ we have

$$\alpha(F)=\inf\{t:F(t)>0\} \tag{A.1.6}$$

and

$$\omega(F)=\sup\{t:F(t)<1\}. \tag{A.1.7}$$

For the proof of Theorem 1.2.8 we also need the following auxiliary result.
Lemma A.1.2. If $\psi$ is as above then $\psi^*$ is nondecreasing and left continuous. Moreover,

$$\lim_{y\to-\infty}\psi^*(y)=\alpha,\qquad\lim_{y\to\infty}\psi^*(y)=\omega,$$

and

$$\lim_{y\downarrow\inf\psi(s)}\psi^{-1}(y)=\alpha(\psi),\qquad\lim_{y\uparrow\sup\psi(s)}\psi^{-1}(y)=\omega(\psi).$$

PROOF. From the definition of $\psi^*$ it is obvious that $\psi^*$ is nondecreasing. Moreover, $\psi^*$ is left continuous if $y_n\uparrow y$ implies $\psi^*(y_n)>t$, eventually, whenever $\alpha<t<\psi^*(y)$. Lemma A.1.1 implies that $\psi(t)<y$. Consequently, $\psi(t)<y_n$, eventually, and thus, by Lemma A.1.1 again, $t<\psi^*(y_n)$, eventually. By similar arguments one can verify the other assertions. □
In analogy to the inverse of a nondecreasing, right continuous function $\psi$ one can define the inverse of a nondecreasing, left continuous function, say, $\varphi$ with domain $(\alpha,\omega)$. Put

$$\varphi^{**}(y)=\sup\{t\in(\alpha,\omega):\varphi(t)\le y\} \tag{A.1.8}$$

for $-\infty<y<\infty$ (with the convention that $\sup\emptyset=\alpha$). An application of Lemma A.1.1 to the nondecreasing, right continuous function defined by $\psi(x)=-\varphi(-x)$ leads to

Lemma A.1.3. For $\varphi$ as above, if $\alpha<x<\omega$, then for every $y$,

$$\varphi(x)\le y\quad\text{iff}\quad\varphi^{**}(y)\ge x.$$

PROOF. Verify that $\varphi^{**}(y)=-\psi^*(-y)$.

From Lemma A.1.2 we conclude

Lemma A.1.4. (i) $\varphi^{**}$ is nondecreasing and right continuous.
(ii) Moreover,

$$\lim_{y\to-\infty}\varphi^{**}(y)=\alpha\quad\text{and}\quad\lim_{y\to\infty}\varphi^{**}(y)=\omega.$$

Now we are in the proper position to carry out the

PROOF OF THEOREM 1.2.8. (i) is immediate from Lemma A.1.2.
(ii) Put $F=G^{**}$. From Lemma A.1.4 it is clear that $F$ is a d.f. To prove that $G=F^{-1}$ we apply Lemma A.1.1 and Lemma A.1.3. For $q\in(0,1)$ and $-\infty<x<\infty$ we have $G(q)\le x$ iff $q\le F(x)$, and this holds iff $F^{-1}(q)\le x$. This equivalence implies that $G=F^{-1}$.
Finally we show that for d.f.'s $F_1$ and $F_2$ with $F_1^{-1}=F_2^{-1}=G$ we have $F_1=F_2$. Suppose that $F_1^{-1}=F_2^{-1}$ and $F_1(x)\ne F_2(x)$ for some $x$. W.l.g. we can assume that $F_1(x)<q<F_2(x)$ for some $q\in(0,1)$. Lemma A.1.1 implies that $F_2^{-1}(q)\le x<F_1^{-1}(q)$, which is a contradiction to $F_1^{-1}=F_2^{-1}$. Thus, $F_1=F_2$. □
From the proof of Theorem 1.2.8 we also know that $(F^{-1})^{**}=F$. Thus $F$ is the "generalized inverse" of $F^{-1}$, which does not, however, imply that $F\circ F^{-1}$ is the identity function, as we already know from Criterion 1.2.3.
In analogy to Criterion 1.2.3 we obtain
Criterion A.1.5. The q.f. $F^{-1}$ is continuous if, and only if, $F^{-1}(F(x))=x$ for every $x$ with $0<F(x)<1$.
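The identity $(F^{-1})^{**}=F$ from the proof of Theorem 1.2.8, and the failure of $F\circ F^{-1}=\mathrm{id}$, can be seen numerically for a hypothetical step d.f. with atoms 0, 1, 2 and masses 0.2, 0.5, 0.3. In this Python sketch the supremum in (A.1.8), taken over $(0,1)$ with $\sup\emptyset=0$, is approximated on a finite grid, so agreement is only up to the grid spacing.

```python
import bisect

ATOMS = [0.0, 1.0, 2.0]
CUM = [0.2, 0.7, 1.0]


def F(t):
    """Right continuous step d.f."""
    i = bisect.bisect_right(ATOMS, t)
    return 0.0 if i == 0 else CUM[i - 1]


def F_inv(q):
    """Quantile function F^{-1}(q) = inf{t : F(t) >= q} for 0 < q < 1 (left continuous)."""
    return ATOMS[bisect.bisect_left(CUM, q)]


def double_star(phi, x, grid_size=10_000):
    """Grid approximation of phi**(x) = sup{q in (0, 1) : phi(q) <= x}, with sup(empty) = 0."""
    best = 0.0
    for j in range(1, grid_size):
        q = j / grid_size
        if phi(q) <= x:  # phi is nondecreasing, so the admissible q form an interval
            best = q
    return best


xs = [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
max_error = max(abs(double_star(F_inv, x) - F(x)) for x in xs)
print(max_error)
```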

APPENDIX 2

Two Technical Lemmas on Expansions

The results below will provide us with the basic tools for proving asymptotic expansions for distributions of extreme and central order statistics.

Expansion of $(1+x/n)^n$

When studying extreme order statistics we are interested in an expansion of finite length of $e^{-x}(1+x/n)^n$ where $n$ is a positive integer and $x>0$. We remark that $e^{-x}(1+x/n)^n$ can easily be written as an infinite series by multiplying the absolutely convergent series

$$\sum_{i=0}^n\binom ni(x/n)^i\quad\text{and}\quad\sum_{i=0}^\infty(-x)^i/i!.$$

We have

$$e^{-x}(1+x/n)^n=\sum_{i=0}^\infty\beta(i,n)x^i \tag{A.2.1}$$

where

$$\beta(i,n)=\sum_{j=0}^i\frac{(-1)^j}{j!}\binom{n}{i-j}n^{-(i-j)}. \tag{A.2.2}$$

We will prove that also an expansion of finite length arranged in powers of $n^{-1}$ holds. This result will be proved for real numbers $\alpha\ge1$ instead of positive integers $n$.
If $k=1,3,5,\dots$ and $\alpha\ge1$ then by writing $(1+x/\alpha)^\alpha$ as $\exp[\alpha\log(1+x/\alpha)]$ and by using a Taylor expansion of $\log$ about 1 it is immediate that

$$\exp\Bigl[-\frac{(2x)^{k+1}}{(k+1)\alpha^k}\Bigr]\le e^{-x}(1+x/\alpha)^\alpha\exp\Bigl[-\sum_{i=2}^k\frac{(-1)^{i+1}x^i}{i\alpha^{i-1}}\Bigr]\le1 \tag{A.2.3}$$

for $x\ge-\alpha/2$. Moreover, the upper bound still holds for $x\ge-\alpha$. The inequalities are strict for $x\ne0$. Since $\exp(x)\ge1+x$ we obtain from (A.2.3), applied to $k=1$, that

(A.2.4)
For $k=3,5,7,\dots$ the term $\exp\bigl[\sum_{i=2}^k(-1)^{i+1}x^i/(i\alpha^{i-1})\bigr]$ is a higher order approximation to $e^{-x}(1+x/\alpha)^\alpha$, but this approximation is not an expansion of the type discussed in Section 3.2. However, a Taylor expansion of $\exp$ about zero yields the following result.
Lemma A.2.1. For every positive integer $m$ there exists a constant $C_m>0$ such that for every $\alpha\ge1$ and $x$ with $-\alpha/2\le x\le\alpha^{2/3}$ the following inequality holds:

$$\Bigl|e^{-x}(1+x/\alpha)^\alpha-\Bigl[1+\sum_{i=1}^{2(m-1)}\beta(i,\alpha)x^i\Bigr]\Bigr|\le C_m\alpha^{-m}\bigl(|x|^{2m-1}+|x|^{2m}\bigr) \tag{A.2.5}$$

where the $\beta(i,\alpha)$ are real numbers which have the property $\max\{|\beta(2k-1,\alpha)|,|\beta(2k,\alpha)|\}\le C_m\alpha^{-k}$ for $k=1,\dots,m-1$.
Moreover, we have

$$\beta(2,\alpha)=-1/(2\alpha),$$
PROOF. We have

$$\Bigl|\exp\Bigl[\sum_{i=2}^{2m-1}\frac{(-1)^{i+1}x^i}{i\alpha^{i-1}}\Bigr]-\sum_{j=0}^{m-1}\frac1{j!}\Bigl[\sum_{i=1}^{2(m-1)}\frac{(-1)^ix^{i+1}}{(i+1)\alpha^i}\Bigr]^j\Bigr|\le C\alpha^{-m}x^{2m} \tag{A.2.6}$$

where $C$ will be used as a generic constant which only depends on $m$. By some tedious (however straightforward) computations one can prove that

$$\Bigl|\sum_{j=0}^{m-1}\frac1{j!}\Bigl[\sum_{i=1}^{2(m-1)}\frac{(-1)^ix^{i+1}}{(i+1)\alpha^i}\Bigr]^j-\Bigl[1+\sum_{i=1}^{2(m-1)}\beta(i,\alpha)x^i\Bigr]\Bigr|\le C\alpha^{-m}\bigl(|x|^{2m-1}+|x|^{2m}\bigr)$$

where the values $\beta(i,\alpha)$ have the desired property. This together with (A.2.3) and (A.2.6) implies (A.2.5). □
By writing down the proof of Lemma A.2.1 in detail one realizes that the upper bound in (A.2.5) still holds for values $x$ with $-\alpha\le x\le\alpha^{2/3}$.
For every positive integer $n$ the terms $\beta(i,n)$ in (A.2.5) are identical to the corresponding values, say, $\beta^*(i,n)$ in (A.2.2). This becomes obvious by noting that there exist $A>0$ and $B>0$ such that

for every $|x|\le A$. Now a comparison of this inequality to (A.2.5) leads to the desired identification.
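The coefficients in (A.2.2) can be computed exactly, which makes the value $\beta(2,n)=-1/(2n)$ and the vanishing of $\beta(1,n)$ visible. The Python sketch below uses exact rational arithmetic for the coefficients and floats for a numerical look at (A.2.5) with $m=2$; the reported ratio only illustrates the size of the constant $C_2$ on a few points and is not a proof.

```python
from fractions import Fraction
from math import comb, exp, factorial


def beta_coeff(i: int, n: int) -> Fraction:
    """Coefficient of x**i in e^{-x}(1 + x/n)^n, from the Cauchy product in (A.2.2)."""
    return sum(
        (Fraction((-1) ** j, factorial(j)) * Fraction(comb(n, i - j), n ** (i - j))
         for j in range(i + 1)),
        Fraction(0),
    )


def remainder_ratio(x: float, n: int) -> float:
    """|e^{-x}(1+x/n)^n - (1 - x^2/(2n))| divided by n^{-2}(|x|^3 + x^4): (A.2.5) with m = 2."""
    exact = exp(-x) * (1 + x / n) ** n
    approx = 1 - x * x / (2 * n)  # 1 + beta(1,n) x + beta(2,n) x^2 with beta(1,n) = 0
    return abs(exact - approx) / (n ** -2 * (abs(x) ** 3 + x ** 4))


ratios = [remainder_ratio(x, n) for n in (10, 100, 1000) for x in (-1.0, 0.5, 1.0, 2.0)]
print([beta_coeff(i, 10) for i in range(5)])
print(max(ratios))
```

The exact values also show $\beta(3,n)=1/(3n^2)$, in agreement with the term $x^3/(3\alpha^2)$ of the logarithmic expansion used in (A.2.3).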

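The Second Lemma

The next lemma will provide us with an expansion of the function

$$g_{\alpha,\beta}(x)=e^{-x^2/2}\Bigl[1+\Bigl(\frac{\beta}{(\alpha+\beta)\alpha}\Bigr)^{1/2}x\Bigr]^\alpha\Bigl[1-\Bigl(\frac{\alpha}{(\alpha+\beta)\beta}\Bigr)^{1/2}x\Bigr]^\beta \tag{A.2.7}$$

where $\alpha\ge1$ and $\beta\ge1$. This expansion will be arranged in powers of the terms $((\alpha+\beta)/\alpha\beta)^{1/2}$. As an application of this result one obtains expansions of densities and, in a second step, of distributions of central order statistics (see Section 4.2).
Lemma A.2.2. For every positive integer $m$ there exists a constant $C_m>0$ such that for every $\alpha\ge1$, $\beta\ge1$ and $|x|\le(\alpha\beta/(\alpha+\beta))^{1/6}$ the following inequality holds:

$$\Bigl|g_{\alpha,\beta}(x)-\Bigl[1+\sum_{i=1}^{m-1}g_{i,\alpha,\beta}(x)\Bigr]\Bigr|\le C_m\Bigl(\frac{\alpha+\beta}{\alpha\beta}\Bigr)^{m/2}\bigl(|x|^{m+2}+|x|^{3m}\bigr)$$

where $g_{i,\alpha,\beta}$ is a polynomial of degree $\le3i$ and the coefficients of $g_{i,\alpha,\beta}$ are smaller than $C_m((\alpha+\beta)/\alpha\beta)^{i/2}$ for $i=1,\dots,m-1$.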
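In the proof of Lemma A.2.2 one has to choose the polynomials $g_{i,\alpha,\beta}$ in such a way that
$$\left|\sum_{j=1}^{m-1}\frac{1}{j!}\Big(\sum_{i=1}^{m-1}a_{i,\alpha,\beta}x^{i+2}\Big)^j - \sum_{i=1}^{m-1}g_{i,\alpha,\beta}(x)\right| \le C_m\left(\Big(\frac{\alpha+\beta}{\alpha\beta}\Big)^{1/2}|x|^3\right)^m, \tag{A.2.8}$$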

where for $i = 1, \dots, m-1$
$$a_{i,\alpha,\beta} = \frac{1}{i+2}\left[(-1)^{i+1}\alpha\Big(\frac{\beta}{(\alpha+\beta)\alpha}\Big)^{(i+2)/2} - \beta\Big(\frac{\alpha}{(\alpha+\beta)\beta}\Big)^{(i+2)/2}\right].$$

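In particular, for $i = 1, 2, 3$
$$\begin{aligned}
g_{1,\alpha,\beta}(x) &= a_{1,\alpha,\beta}x^3,\\
g_{2,\alpha,\beta}(x) &= a_{2,\alpha,\beta}x^4 + a_{1,\alpha,\beta}^2x^6/2,\\
g_{3,\alpha,\beta}(x) &= a_{3,\alpha,\beta}x^5 + a_{1,\alpha,\beta}a_{2,\alpha,\beta}x^7 + a_{1,\alpha,\beta}^3x^9/6.
\end{aligned} \tag{A.2.9}$$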
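The polynomials in (A.2.9) can be generated for any $m$ by expanding $\sum_{j\ge1}(1/j!)(\sum_{i=1}^{m-1}a_{i,\alpha,\beta}x^{i+2})^j$ and sorting each product $a_{i_1}\cdots a_{i_j}$ by its order $i_1+\dots+i_j$, since $|a_{i,\alpha,\beta}|$ is bounded by $((\alpha+\beta)/(\alpha\beta))^{i/2}$. This grouping is our reading of how (A.2.8) is to be satisfied; the Python sketch below (helper names are ours) implements it and reproduces (A.2.9).

```python
from math import factorial, isclose


def a_coef(i, al, be):
    """a_{i,alpha,beta} from the display following (A.2.8)."""
    s = al + be
    term1 = (-1) ** (i + 1) * al * (be / (s * al)) ** ((i + 2) / 2)
    term2 = be * (al / (s * be)) ** ((i + 2) / 2)
    return (term1 - term2) / (i + 2)


def g_polys(m, al, be):
    """Return {n: {power: coeff}} for g_1, ..., g_{m-1}.

    Expands sum_{j>=1} (1/j!) (sum_{i=1}^{m-1} a_i x^(i+2))^j and files each product
    a_{i_1} ... a_{i_j} under n = i_1 + ... + i_j; products of order >= m are dropped.
    """
    inner = {(i, i + 2): a_coef(i, al, be) for i in range(1, m)}  # (order, power) -> coeff
    acc = {}
    power = {(0, 0): 1.0}
    for j in range(1, m):  # every factor has order >= 1, so j <= m - 1 suffices
        nxt = {}
        for (o1, p1), v1 in power.items():
            for (o2, p2), v2 in inner.items():
                if o1 + o2 < m:
                    key = (o1 + o2, p1 + p2)
                    nxt[key] = nxt.get(key, 0.0) + v1 * v2
        power = nxt
        for key, v in power.items():
            acc[key] = acc.get(key, 0.0) + v / factorial(j)
    return {n: {p: v for (o, p), v in sorted(acc.items()) if o == n} for n in range(1, m)}


al, be = 3.0, 5.0
g = g_polys(4, al, be)
a1, a2, a3 = (a_coef(i, al, be) for i in (1, 2, 3))
print(g[3])  # powers 5, 7, 9 with coefficients a3, a1*a2, a1**3/6
```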


PROOF OF LEMMA A.2.2. Starting as in the proof of Lemma A.2.1 we obtain, by using a Taylor expansion of $\log(1 + x)$ of length $m + 1$, that for every $|x| \le (\alpha\beta/(\alpha+\beta))^{1/6}$:
$$\Big|g_{\alpha,\beta}(x) - \exp\Big[\sum_{i=3}^{m+1}a_{i-2,\alpha,\beta}x^i\Big]\Big| \le C\Big(\frac{\alpha+\beta}{\alpha\beta}\Big)^{m/2}|x|^{m+2}$$
and, consequently,
$$\Big|g_{\alpha,\beta}(x) - \sum_{j=0}^{m-1}\frac{1}{j!}\Big(\sum_{i=1}^{m-1}a_{i,\alpha,\beta}x^{i+2}\Big)^j\Big| \le C_m\Big(\frac{\alpha+\beta}{\alpha\beta}\Big)^{m/2}\big(|x|^{m+2} + |x|^{3m}\big).$$
Now the proof can easily be completed by choosing the polynomials $g_{i,\alpha,\beta}$ as indicated in (A.2.8).
□
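To see the order of the remainder in Lemma A.2.2 numerically, the following sketch compares $g_{\alpha,\beta}(x) = e^{x^2/2}[1 + (\beta/((\alpha+\beta)\alpha))^{1/2}x]^{\alpha}[1 - (\alpha/((\alpha+\beta)\beta))^{1/2}x]^{\beta}$ with $1 + g_{1,\alpha,\beta} + g_{2,\alpha,\beta}$ (the case $m = 3$ of (A.2.9)) at the fixed point $x = 0.5$, for $\alpha = n$ and $\beta = 2n$. Then $(\alpha+\beta)/(\alpha\beta) = 1.5/n$, so the lemma bounds the error by a constant times $n^{-3/2}$, and one expects it to drop by a factor of about 8 when $n$ is multiplied by 4. The parameter choices are ours and purely illustrative.

```python
from math import exp, log1p, sqrt


def a_coef(i, al, be):
    """a_{i,alpha,beta} from the display following (A.2.8)."""
    s = al + be
    return ((-1) ** (i + 1) * al * (be / (s * al)) ** ((i + 2) / 2)
            - be * (al / (s * be)) ** ((i + 2) / 2)) / (i + 2)


def g_exact(x, al, be):
    """g_{alpha,beta}(x), evaluated on the log scale for numerical stability."""
    s = al + be
    c, d = sqrt(be / (s * al)), sqrt(al / (s * be))
    return exp(x * x / 2 + al * log1p(c * x) + be * log1p(-d * x))


def g_approx_m3(x, al, be):
    """1 + g_1(x) + g_2(x) with g_1, g_2 as in (A.2.9)."""
    a1, a2 = a_coef(1, al, be), a_coef(2, al, be)
    return 1 + a1 * x ** 3 + a2 * x ** 4 + a1 ** 2 * x ** 6 / 2


x = 0.5
errors = {}
for n in (100, 400, 1600):
    al, be = float(n), 2.0 * n
    errors[n] = abs(g_exact(x, al, be) - g_approx_m3(x, al, be))
print(errors)
```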
If $\alpha = \beta$, then Lemma A.2.1 and Lemma A.2.2 roughly coincide for nonnegative $x$.
We believe that an expansion of the function $g_{\alpha,\beta}$ is of some interest in its own right; however, this function is not properly adjusted to the particular problem of computing an expansion of the distribution of a central order statistic. For this purpose one has to deal with functions $h_{\alpha,\beta}$ defined by
$$h_{\alpha,\beta}(x) = e^{x^2/2}\Big[1 + \Big(\frac{\beta}{(\alpha+\beta)\alpha}\Big)^{1/2}x\Big]^{\alpha-1}\Big[1 - \Big(\frac{\alpha}{(\alpha+\beta)\beta}\Big)^{1/2}x\Big]^{\beta-1} \tag{A.2.10}$$

or with some other variation of the function $g_{\alpha,\beta}$ according to the standardization of the distribution of the order statistic. By using Taylor expansions of
$$\Big[1 + \Big(\frac{\beta}{(\alpha+\beta)\alpha}\Big)^{1/2}x\Big]^{-1} \quad\text{and}\quad \Big[1 - \Big(\frac{\alpha}{(\alpha+\beta)\beta}\Big)^{1/2}x\Big]^{-1}$$
about 1, one can easily deduce from Lemma A.2.2 the following:

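Corollary A.2.3. For $\alpha, \beta \ge 1$ let $h_{\alpha,\beta}$ be defined as in (A.2.10). Then Lemma A.2.2 holds true for $h_{\alpha,\beta}$, $h_{i,\alpha,\beta}$ and the term $(|x|^m + |x|^{3m})$ in place of $g_{\alpha,\beta}$, $g_{i,\alpha,\beta}$ and $(|x|^{m+2} + |x|^{3m})$, where $h_{i,\alpha,\beta}$ is a polynomial which has the same properties as $g_{i,\alpha,\beta}$ for $i = 1, \dots, m-1$.
The polynomials $h_{1,\alpha,\beta}$ and $h_{2,\alpha,\beta}$ are given by
$$\begin{aligned}
h_{1,\alpha,\beta}(x) &= g_{1,\alpha,\beta}(x) - \Big[\Big(\frac{\beta}{(\alpha+\beta)\alpha}\Big)^{1/2} - \Big(\frac{\alpha}{(\alpha+\beta)\beta}\Big)^{1/2}\Big]x,\\
h_{2,\alpha,\beta}(x) &= g_{2,\alpha,\beta}(x) - \Big[\Big(\frac{\beta}{(\alpha+\beta)\alpha}\Big)^{1/2} - \Big(\frac{\alpha}{(\alpha+\beta)\beta}\Big)^{1/2}\Big]x\,g_{1,\alpha,\beta}(x)\\
&\quad + \Big[\frac{\beta}{(\alpha+\beta)\alpha} - \frac{1}{\alpha+\beta} + \frac{\alpha}{(\alpha+\beta)\beta}\Big]x^2.
\end{aligned}$$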
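Since the exponents in (A.2.10) are $\alpha - 1$ and $\beta - 1$, $h_{\alpha,\beta}$ equals $g_{\alpha,\beta}$ times $[1 + cx]^{-1}[1 - dx]^{-1}$, where we abbreviate $c = (\beta/((\alpha+\beta)\alpha))^{1/2}$ and $d = (\alpha/((\alpha+\beta)\beta))^{1/2}$ (our shorthand). Multiplying the expansion of Lemma A.2.2 by the Taylor series $1 + e_1x + e_2x^2 + \dots$ of this factor gives $h_1 = g_1 + e_1x$ and $h_2 = g_2 + e_1x\,g_1 + e_2x^2$. The Python sketch below computes $e_1, e_2$ by multiplying the two geometric series and compares them with the brackets in the formulas for $h_1, h_2$; it is only a sanity check.

```python
from math import isclose, sqrt


def series_mul(p, q, n):
    """First n coefficients of the product of two power series."""
    return [sum(p[k] * q[j - k] for k in range(j + 1)) for j in range(n)]


def correction_factor(al, be, n=3):
    """Taylor coefficients in x of [1 + c x]^(-1) [1 - d x]^(-1)."""
    s = al + be
    c, d = sqrt(be / (s * al)), sqrt(al / (s * be))
    inv_plus = [(-c) ** k for k in range(n)]   # geometric series of (1 + c x)^(-1)
    inv_minus = [d ** k for k in range(n)]     # geometric series of (1 - d x)^(-1)
    return c, d, series_mul(inv_plus, inv_minus, n)


al, be = 4.0, 9.0
c, d, e = correction_factor(al, be)
s = al + be
print(e)  # [1, e_1, e_2]
```

The key identity behind the middle term of the $x^2$ bracket is $c\,d = 1/(\alpha+\beta)$.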

APPENDIX 3

Further Results on Distances of Measures

The aim of the following lines is to extend some of the results of Section 3.3 to finite signed measures. Moreover, we prove some highly technical inequalities which are not necessary prerequisites for understanding the main ideas of this volume; however, these inequalities are useful for certain computations.
In the sequel, let $\nu_i$ be a finite signed measure (on a measurable space $(S,\mathcal{B})$) represented by the density $f_i$ w.r.t. a dominating measure $\mu$.

A Further Remark about the Scheffe Lemma


An extension of Lemma 3.3.4 to finite signed measures is easily obtained by splitting the measure $\nu_i$ into the positive and negative parts $\nu_i^+$ and $\nu_i^-$ with the respective densities $f_i^+$ and $f_i^-$ (the positive and negative parts of $f_i$). Check that
$$|f_0 - f_n| \le |f_0^+ - f_n^+| + |f_0^- - f_n^-|.$$
Now, if the conditions of Lemma 3.3.4 are satisfied by the $f_i^+$ and by the $f_i^-$, then again
$$\lim_{n\to\infty}\int|f_0 - f_n|\,d\mu = 0. \tag{A.3.1}$$
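The splitting argument rests on the pointwise inequality $|u - v| \le |u^+ - v^+| + |u^- - v^-|$, which follows from $u - v = (u^+ - v^+) - (u^- - v^-)$ and the triangle inequality. A tiny Python check on random reals (purely illustrative; the helper names are ours):

```python
import random


def pos(v):
    """Positive part v^+ = max(v, 0)."""
    return max(v, 0.0)


def neg(v):
    """Negative part v^- = max(-v, 0), so that v = v^+ - v^-."""
    return max(-v, 0.0)


def split_bound_holds(u, v):
    """Pointwise check of |u - v| <= |u^+ - v^+| + |u^- - v^-| (with rounding slack)."""
    return abs(u - v) <= abs(pos(u) - pos(v)) + abs(neg(u) - neg(v)) + 1e-12


rng = random.Random(0)
samples = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(10_000)]
all_ok = all(split_bound_holds(u, v) for u, v in samples)
print(all_ok)
```

Integrating the inequality w.r.t. $\mu$ shows that $L_1$-convergence of the positive parts and of the negative parts is enough for (A.3.1).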

The Variational Distance and the L1-Distance


Define again
$$\|\nu_0 - \nu_1\| = \sup_B|\nu_0(B) - \nu_1(B)| \tag{A.3.2}$$
as the variational distance between $\nu_0$ and $\nu_1$. As an extension of Lemma 3.3.1 (the proof can be left to the reader) we get
Lemma A.3.1. (i) $\|\nu_0 - \nu_1\| \le \int|f_0 - f_1|\,d\mu \le 2\|\nu_0 - \nu_1\|$.
(ii) If $\nu_0(S) = \nu_1(S)$, then
$$\|\nu_0 - \nu_1\| = \frac{1}{2}\int|f_0 - f_1|\,d\mu.$$

We note that under the condition $\nu_0(S) = \nu_1(S)$ we have again
$$\|\nu_0 - \nu_1\| = \nu_0(\{f_0 > f_1\}) - \nu_1(\{f_0 > f_1\}).$$
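On a finite space with counting measure as dominating measure, Lemma A.3.1 can be checked by brute force over all subsets $B$. The Python sketch below (example numbers are ours) does this for one pair of signed measures with different total masses, illustrating (i), and one pair with equal total mass, illustrating (ii) and the remark above.

```python
from itertools import combinations
from math import isclose


def var_distance(f0, f1):
    """sup_B |nu0(B) - nu1(B)| by brute force over all subsets B of {0, ..., n-1}."""
    n = len(f0)
    best = 0.0
    for r in range(n + 1):
        for B in combinations(range(n), r):
            best = max(best, abs(sum(f0[i] - f1[i] for i in B)))
    return best


def l1(f0, f1):
    """L1-distance of the densities w.r.t. counting measure."""
    return sum(abs(a - b) for a, b in zip(f0, f1))


# (i): finite signed measures with different total masses (1.3 and 0.8)
f0 = [0.5, -0.2, 0.9, 0.1]
f1 = [0.3, 0.4, 0.2, -0.1]
d, L = var_distance(f0, f1), l1(f0, f1)
print(d, L)  # expect d <= L <= 2 * d

# (ii): equal total masses nu0(S) = nu1(S) = 1.3
g1 = [0.6, 0.1, 0.3, 0.3]
d2, L2 = var_distance(f0, g1), l1(f0, g1)
on_set = sum(a - b for a, b in zip(f0, g1) if a > b)  # nu0({f0 > f1}) - nu1({f0 > f1})
```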


The following modification of Lemma A.3.1 (i) will be useful when the error term of an approximation has to be computed. In our applications no estimate of the term $\int g\,d\mu$ has to be computed.
Lemma A.3.2. Let $f$ and $g$ be $\mu$-integrable functions with $g \ge 0$, $\int g\,d\mu > 0$, and $\int f\,d\mu > 0$. Denote by $Q$ the probability measure with $\mu$-density $g_0 = g/\int g\,d\mu$, and by $\nu$ the signed measure with $\mu$-density $f_0 = f/\int f\,d\mu$. Then, for every $B \in \mathcal{B}$,
$$\|Q - \nu\| \le \Big(\int f\,d\mu\Big)^{-1}\int_B|g - f|\,d\mu + \int_{B^c}|g_0 - f_0|\,d\mu,$$
where $B^c$ denotes the complement of $B$.


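PROOF. From Lemma A.3.1 (ii) and the triangle inequality we get
$$\|Q - \nu\| \le \frac{1}{2}\left[\int_B\Big|g_0 - \frac{g}{\int f\,d\mu}\Big|\,d\mu + \Big(\int f\,d\mu\Big)^{-1}\int_B|g - f|\,d\mu + \int_{B^c}|g_0 - f_0|\,d\mu\right]. \tag{1}$$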

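Moreover, since $g \ge 0$ and $\int g_0\,d\mu = \int f_0\,d\mu = 1$ we have
$$\begin{aligned}
\int_B\Big|g_0 - \frac{g}{\int f\,d\mu}\Big|\,d\mu &= \Big|\int_B\Big(g_0 - \frac{g}{\int f\,d\mu}\Big)\,d\mu\Big| = \Big|1 - \int_B\frac{g}{\int f\,d\mu}\,d\mu - \int_{B^c}g_0\,d\mu\Big|\\
&\le \Big(\int f\,d\mu\Big)^{-1}\int_B|g - f|\,d\mu + \int_{B^c}|g_0 - f_0|\,d\mu.
\end{aligned}$$
This together with (1) implies the asserted inequality.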

In the applications the set $B$ in Lemma A.3.2 is chosen with $Q(B)$ and $\nu(B)$ close to one, so that $B^c$ is an exceptional set, and such that the integrand $|g - f|$ can easily be computed on $B$.
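A quick numerical sanity check of Lemma A.3.2 on a finite space with counting measure: $g \ge 0$ is random, $f$ is a perturbation of $g$ that may become negative (so $\nu$ is genuinely signed), and $B$ is a random subset. The Python sketch below (all choices are ours) records the smallest observed gap between the right-hand and the left-hand side.

```python
import random


def lemma_a32(f, g, B):
    """Return (||Q - nu||, bound of Lemma A.3.2) on a finite space with counting measure."""
    sf, sg = sum(f), sum(g)
    f0 = [v / sf for v in f]
    g0 = [v / sg for v in g]
    # Q(S) = nu(S) = 1, so by Lemma A.3.1 (ii) the distance is half the L1-distance
    lhs = 0.5 * sum(abs(a - b) for a, b in zip(g0, f0))
    inside = sum(abs(g[i] - f[i]) for i in B) / sf
    outside = sum(abs(g0[i] - f0[i]) for i in range(len(f)) if i not in B)
    return lhs, inside + outside


rng = random.Random(1)
worst_gap = float("inf")
for _ in range(2000):
    n = 8
    g = [rng.random() for _ in range(n)]
    f = [v + rng.uniform(-0.3, 0.3) for v in g]  # close to g, possibly negative
    if sum(f) <= 0:                              # the lemma requires int f dmu > 0
        continue
    B = {i for i in range(n) if rng.random() < 0.7}
    lhs, rhs = lemma_a32(f, g, B)
    worst_gap = min(worst_gap, rhs - lhs)
print(worst_gap)  # should never be negative
```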


The Variational Distance between Product Measures


Given finite signed measures $\nu_i$ with $\mu_i$-density $f_i$, put $|\nu_i| \equiv |\nu_i|(S_i) = \int|f_i|\,d\mu_i$, where $S_i$ denotes the underlying space. Fubini's theorem implies that $|\nu_1\times\nu_2| = |\nu_1|\,|\nu_2|$, where the product measure $\nu_1\times\nu_2$ is defined by
$$(\nu_1\times\nu_2)(B) = \int_B f_1(x_1)f_2(x_2)\,d(\mu_1\times\mu_2)(x_1,x_2).$$

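Lemma A.3.3. Let $\nu_i$ and $\lambda_i$ be finite signed measures for $i = 1, \dots, k$. Then,
$$\Big\|\mathop{\times}_{i=1}^k\nu_i - \mathop{\times}_{i=1}^k\lambda_i\Big\| \le \sum_{i=1}^k\Big(\prod_{j=1}^{i-1}|\nu_j|\Big)\Big(\prod_{j=i+1}^k|\lambda_j|\Big)\|\nu_i - \lambda_i\|.$$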

PROOF. For notational simplicity we will prove the assertion for $k = 2$ only. The general case can easily be proved by induction over $k$.
For measurable sets $A$ in the product space we get by Fubini's theorem
$$|(\nu_1\times\lambda_2)(A) - (\lambda_1\times\lambda_2)(A)| = \Big|\int\big(\nu_1(A_{x_2}) - \lambda_1(A_{x_2})\big)\,d\lambda_2(x_2)\Big| \le |\lambda_2|\sup_{x_2}|\nu_1(A_{x_2}) - \lambda_1(A_{x_2})|, \tag{1}$$
where $A_{x_2}$ is the $x_2$-section of $A$. In analogy to (1) we get
$$|(\nu_1\times\nu_2)(A) - (\nu_1\times\lambda_2)(A)| \le |\nu_1|\sup_{x_1}|\nu_2(A_{x_1}) - \lambda_2(A_{x_1})|. \tag{2}$$
Thus, combining (1) and (2) we get the desired inequality in the case $k = 2$. The proof is complete.
□
Notice that Lemma 3.3.7 is an immediate consequence of Lemma A.3.3. Moreover, Lemma A.3.3 is an extension of the following well-known formula
$$\Big|\prod_{i=1}^k a_i - \prod_{i=1}^k b_i\Big| \le \sum_{i=1}^k\Big(\prod_{j=1}^{i-1}|a_j|\Big)\Big(\prod_{j=i+1}^k|b_j|\Big)|a_i - b_i|,$$
which holds for all real (as well as complex) numbers $a_i$ and $b_i$.
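For finite spaces, Lemma A.3.3 can be tested directly: the variational distance of two finite signed measures is the larger of the total positive and the total negative part of their difference, and $|\nu|$ is the sum of the absolute point masses. The Python sketch below (random examples and helper names are ours) compares both sides of the lemma for products of up to three signed measures on three-point spaces.

```python
import random
from itertools import product


def sup_abs(d):
    """sup_B |sum_{x in B} d(x)| on a finite space: max of total positive and negative part."""
    return max(sum(v for v in d if v > 0), -sum(v for v in d if v < 0))


def mass(d):
    """|nu| = nu^+(S) + nu^-(S) for a signed measure with point masses d."""
    return sum(abs(v) for v in d)


def product_masses(factors):
    """Point masses of the product measure, in itertools.product order."""
    out = []
    for combo in product(*factors):
        p = 1.0
        for v in combo:
            p *= v
        out.append(p)
    return out


def lemma_a33(nus, lams):
    """(left-hand side, right-hand side) of Lemma A.3.3."""
    lhs = sup_abs([a - b for a, b in zip(product_masses(nus), product_masses(lams))])
    rhs = 0.0
    for i in range(len(nus)):
        w = 1.0
        for j in range(i):
            w *= mass(nus[j])
        for j in range(i + 1, len(nus)):
            w *= mass(lams[j])
        rhs += w * sup_abs([a - b for a, b in zip(nus[i], lams[i])])
    return lhs, rhs


rng = random.Random(2)
checks = []
for _ in range(500):
    k = rng.randint(1, 3)
    nus = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(k)]
    lams = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(k)]
    checks.append(lemma_a33(nus, lams))
```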
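Corollary A.3.4. For probability measures $Q_i$ and finite signed measures $\lambda_i$ with $\lambda_i(S_i) = 1$ we have
$$\Big\|\mathop{\times}_{i=1}^kQ_i - \mathop{\times}_{i=1}^k\lambda_i\Big\| \le \exp\Big[2\sum_{i=1}^k\|Q_i - \lambda_i\|\Big]\sum_{i=1}^k\|Q_i - \lambda_i\|. \tag{A.3.3}$$
PROOF. Check that
$$\prod_{j\ne i}|\lambda_j| \le \prod_{j=1}^k|\lambda_j| \le \prod_{j=1}^k\big(1 + 2\|Q_j - \lambda_j\|\big) \le \exp\Big[2\sum_{i=1}^k\|Q_i - \lambda_i\|\Big].$$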
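The only new ingredient in the proof of Corollary A.3.4 is $|\lambda_j| \le 1 + 2\|Q_j - \lambda_j\|$ for $\lambda_j(S_j) = 1$. The Python sketch below checks this step and (A.3.3) itself on random finite examples; each $\lambda_j$ is built (our choice) as a probability vector plus zero-sum noise, so it has total mass 1 but may take negative values.

```python
import random
from itertools import product
from math import exp


def sup_abs(d):
    """sup_B |sum_{x in B} d(x)|: the larger of total positive and total negative part."""
    return max(sum(v for v in d if v > 0), -sum(v for v in d if v < 0))


def mass(d):
    """Total variation |lambda| = sum of the absolute point masses."""
    return sum(abs(v) for v in d)


def product_masses(factors):
    """Point masses of the product measure, in itertools.product order."""
    out = []
    for combo in product(*factors):
        p = 1.0
        for v in combo:
            p *= v
        out.append(p)
    return out


def random_pair(rng, n=3, eps=0.3):
    """Probability vector q and signed vector lam = q + zero-sum noise, so lam(S) = 1."""
    w = [rng.random() + 0.05 for _ in range(n)]
    total = sum(w)
    q = [v / total for v in w]
    noise = [rng.uniform(-eps, eps) for _ in range(n)]
    shift = sum(noise) / n
    return q, [a + b - shift for a, b in zip(q, noise)]


rng = random.Random(5)
mass_checks, corollary_checks = [], []
for _ in range(300):
    pairs = [random_pair(rng) for _ in range(rng.randint(1, 3))]
    dists = [sup_abs([a - b for a, b in zip(q, lam)]) for q, lam in pairs]
    # the step |lambda_j| <= 1 + 2 ||Q_j - lambda_j|| used in the proof
    mass_checks += [mass(lam) <= 1 + 2 * dj + 1e-12 for (q, lam), dj in zip(pairs, dists)]
    lhs = sup_abs([a - b for a, b in zip(product_masses([q for q, _ in pairs]),
                                          product_masses([lam for _, lam in pairs]))])
    corollary_checks.append(lhs <= exp(2 * sum(dists)) * sum(dists) + 1e-12)
```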


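The proof of Lemma A.3.3 gives a little bit more than stated there. For every measurable set $A$ we obtain
$$\Big|\Big(\mathop{\times}_{i=1}^k\nu_i\Big)(A) - \Big(\mathop{\times}_{i=1}^k\lambda_i\Big)(A)\Big| \le \sum_{i=1}^k\Big(\prod_{j=1}^{i-1}|\nu_j|\Big)\Big(\prod_{j=i+1}^k|\lambda_j|\Big)\sup_{\bar x_i}\big|\nu_i(A_{\bar x_i}) - \lambda_i(A_{\bar x_i})\big|,$$
where $\bar x_i = (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_k)$ and $A_{\bar x_i}$ is the $\bar x_i$-section of $A$; that is, $A_{\bar x_i} = \{x_i : (x_1, \dots, x_k) \in A\}$. Thus, if, e.g., $A$ is a convex set, then $A_{\bar x_i}$ is an interval.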

The Hellinger Distance and the Kullback-Leibler Distance


Next we give the proof of the inequality (3.3.9), where it was stated that
$$H(Q_0,Q_1) \le K(Q_0,Q_1)^{1/2}.$$
This is immediate from inequality (A.3.4) applied to $B = S$.

Lemma A.3.5. Let $Q_0$ and $Q_1$ be probability measures with $\mu$-densities $f_0$ and $f_1$. Then, for every measurable set $B$,
$$H(Q_0,Q_1) \le \Big[2Q_0(B^c) + \int_B\big(-\log(f_1/f_0)\big)\,dQ_0\Big]^{1/2}. \tag{A.3.4}$$

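PROOF. According to (3.3.5) we have to establish a lower bound of $\int(f_0f_1)^{1/2}\,d\mu$. W.l.o.g. let $Q_0(B) > 0$. Since $\exp(x) \ge 1 + x$, we obtain from the Jensen inequality that
$$\begin{aligned}
\int(f_0f_1)^{1/2}\,d\mu &\ge Q_0(B)\int_B(f_1/f_0)^{1/2}\,d\big(Q_0/Q_0(B)\big)\\
&\ge Q_0(B)\exp\Big[(2Q_0(B))^{-1}\int_B\log(f_1/f_0)\,dQ_0\Big]\\
&\ge Q_0(B) + \frac{1}{2}\int_B\log(f_1/f_0)\,dQ_0.
\end{aligned}$$
Now the assertion is immediate from (3.3.5).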
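A numerical check of (A.3.4) on a five-point space, over all subsets $B$. We assume the normalization $H(Q_0,Q_1)^2 = \int(f_0^{1/2} - f_1^{1/2})^2\,d\mu = 2(1 - \int(f_0f_1)^{1/2}\,d\mu)$, which is the one used in the proof above; with the other common normalization (an extra factor 1/2) the inequality would only be easier to satisfy. The choice $B = S$ reproduces $H^2 \le K$. Densities are kept strictly positive so that all logarithms are finite.

```python
import random
from itertools import combinations
from math import log, sqrt


def hellinger_sq(f0, f1):
    """H(Q0, Q1)^2 = sum (sqrt f0 - sqrt f1)^2  (assumed normalization)."""
    return sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(f0, f1))


def bound_sq(f0, f1, B):
    """Square of the right-hand side of (A.3.4): 2 Q0(B^c) + int_B -log(f1/f0) dQ0."""
    q0_Bc = sum(f0[i] for i in range(len(f0)) if i not in B)
    return 2 * q0_Bc + sum(f0[i] * -log(f1[i] / f0[i]) for i in B)


def random_prob(rng, n):
    w = [rng.random() + 0.01 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]


rng = random.Random(3)
gaps = []
for _ in range(200):
    f0, f1 = random_prob(rng, 5), random_prob(rng, 5)
    h2 = hellinger_sq(f0, f1)
    for r in range(6):
        for B in combinations(range(5), r):
            gaps.append(bound_sq(f0, f1, set(B)) - h2)
print(min(gaps))  # should never be negative
```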

Further Bounds for the Variational Distance of Product Measures
Finally, we establish upper bounds for the variational distance of product measures via the $\chi^2$-distance $D$. One special case was already proved in (3.3.10), namely, that

for probability measures $Q_i$ and $P_i$, where $P_i$ has to be dominated by $Q_i$.
Next $P_i$ will be replaced by a signed measure $\nu_i$ with $\nu_i(S) = 1$. Again one has to assume that $\nu_i$ is dominated by $Q_i$. Lemma A.3.6, applied to $m = 0$, yields

At the end of this section we will discuss in detail the special case of $m = 1$.
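Lemma A.3.6. Assume that $Q_i$ and $\nu_i$ satisfy the conditions above. Let $1 + g_i$ be a $Q_i$-density of $\nu_i$. Then, for every $m \in \{0, \dots, k\}$,
$$\sup_B\Big|\Big(\mathop{\times}_{i=1}^k\nu_i\Big)(B) - \int_B h_m\,d\Big(\mathop{\times}_{i=1}^kQ_i\Big)\Big| \le \Big[((m+1)!)^{-1}\exp\Big(\sum_{i=1}^kD(Q_i,\nu_i)^2\Big)\Big]^{1/2}\Big[\sum_{i=1}^kD(Q_i,\nu_i)^2\Big]^{(m+1)/2},$$
where
$$h_m(x_1, \dots, x_k) = 1 + \sum_{j=1}^m\ \sum_{1\le i_1<\dots<i_j\le k}\ \prod_{r=1}^j g_{i_r}(x_{i_r}).$$
PROOF. Notice that
$$\prod_{i=1}^k(1 + a_i) = 1 + \sum_{j=1}^k\ \sum_{1\le i_1<\dots<i_j\le k}\ \prod_{r=1}^j a_{i_r}$$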

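and, therefore, $h_k$ is the $\mathop{\times}_{i=1}^kQ_i$-density of $\mathop{\times}_{i=1}^k\nu_i$. From the Schwarz inequality and the fact that the functions $(x_1, \dots, x_k) \mapsto \prod_{r=1}^j g_{i_r}(x_{i_r})$ for $1 \le i_1 < \dots < i_j \le k$ and $j = 1, \dots, k$ form a multiplicative system w.r.t. $\mathop{\times}_{i=1}^kQ_i$ we obtain
$$\sup_B\Big|\Big(\mathop{\times}_{i=1}^k\nu_i\Big)(B) - \int_B h_m\,d\Big(\mathop{\times}_{i=1}^kQ_i\Big)\Big| \le \Big(\int(h_k - h_m)^2\,d\mathop{\times}_{i=1}^kQ_i\Big)^{1/2} = \Big[\sum_{j=m+1}^k\ \sum_{1\le i_1<\dots<i_j\le k}\ \prod_{r=1}^jD(Q_{i_r},\nu_{i_r})^2\Big]^{1/2}.$$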

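This implies the asserted inequality since
$$\sum_{1\le i_1<\dots<i_j\le k}\ \prod_{r=1}^jD(Q_{i_r},\nu_{i_r})^2 \le \frac{1}{j!}\Big(\sum_{i=1}^kD(Q_i,\nu_i)^2\Big)^j$$
and
$$\sum_{i=m}^{\infty}\frac{z^i}{i!} \le \exp(z)\,\frac{z^m}{m!}\qquad\text{for } z \ge 0.$$
In the special case of $m = 1$ we get
$$\sup_B\Big|\Big(\mathop{\times}_{i=1}^k\nu_i\Big)(B) - \int_B\Big(1 + \sum_{i=1}^kg_i(x_i)\Big)\,d\Big(\mathop{\times}_{i=1}^kQ_i\Big)(x_1, \dots, x_k)\Big| \le 2^{-1/2}\exp\Big[2^{-1}\sum_{i=1}^kD(Q_i,\nu_i)^2\Big]\sum_{i=1}^kD(Q_i,\nu_i)^2 =: R_k \tag{A.3.6}$$
and hence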

This shows that for $k \to \infty$ further insight into the variational distance of product measures may be gained by means of the central limit theorem.
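As a sanity check of Lemma A.3.6, the sketch below works on a product of $k = 3$ three-point spaces. We assume, as the proof indicates, that $D$ is the $\chi^2$-distance with $D(Q_i,\nu_i)^2 = \int g_i^2\,dQ_i$ for the $Q_i$-density $1 + g_i$ of $\nu_i$; the random choices and helper names are ours. The code uses that $h_m(x)$ is the sum of the elementary symmetric polynomials $e_0, \dots, e_m$ of the numbers $g_1(x_1), \dots, g_k(x_k)$, which restates the definition of $h_m$, and it also confirms that for $m = 1$ the bound reduces to the constant in (A.3.6).

```python
import random
from itertools import product
from math import exp, factorial, isclose, sqrt


def elementary_symmetric(vals):
    """[e_0, ..., e_k] with e_j = sum over i_1 < ... < i_j of vals[i_1] * ... * vals[i_j]."""
    e = [1.0] + [0.0] * len(vals)
    for v in vals:
        for j in range(len(vals), 0, -1):
            e[j] += e[j - 1] * v
    return e


def lemma_a36(qs, gs, m):
    """(lhs, rhs, z) of Lemma A.3.6; qs[i] is Q_i, gs[i] holds g_i with sum(q * g) = 0."""
    diff = []
    for idx in product(*(range(len(q)) for q in qs)):
        weight = 1.0
        for i, x in enumerate(idx):
            weight *= qs[i][x]
        e = elementary_symmetric([gs[i][x] for i, x in enumerate(idx)])
        # h_k = sum(e) is the density of the product of the nu_i; h_m keeps e_0, ..., e_m
        diff.append((sum(e) - sum(e[: m + 1])) * weight)
    lhs = max(sum(v for v in diff if v > 0), -sum(v for v in diff if v < 0))
    z = sum(sum(qx * gx * gx for qx, gx in zip(q, g)) for q, g in zip(qs, gs))  # sum of D^2
    rhs = sqrt(exp(z) / factorial(m + 1)) * z ** ((m + 1) / 2)
    return lhs, rhs, z


rng = random.Random(4)
results = []
for _ in range(100):
    k = 3
    qs, gs = [], []
    for _ in range(k):
        w = [rng.random() + 0.05 for _ in range(3)]
        q = [v / sum(w) for v in w]
        raw = [rng.uniform(-0.6, 0.6) for _ in range(3)]
        mean = sum(a * b for a, b in zip(q, raw))
        qs.append(q)
        gs.append([r - mean for r in raw])  # centre so that int g_i dQ_i = 0, i.e. nu_i(S) = 1
    for m in range(k + 1):
        results.append((m,) + lemma_a36(qs, gs, m))
```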

Bibliography

Alam, K. (1972). Unimodality of the distribution of an order statistic. Ann. Math.
Statist. 43, 2041-2044.
Albers, W., Bickel, P. J. and van Zwet, W.R. (1976). Asymptotic expansions for the
power of distribution-free tests in one-sample problem. Ann. Statist. 4, 108-156.
Ali, M.M. and Kuan, K.S. (1977). On the joint asymptotic normality of quantiles.
Nanta Math. 10, 161-165.
Anderson, C.W. (1971). Contributions to the Asymptotic Theory of Extreme Values.
Ph.D. Thesis, University of London.
Anderson, C.W. (1984). Large deviations of extremes. In: Statistical Extremes and
Applications, Ed. J. Tiago de Oliveira, pp. 325-340. Dordrecht: Reidel.
Arnold, B.C., Becker, A., Gather, U. and Zahedi, H. (1984). On the Markov property
of order statistics. J. Statist. Plann. Inference 9, 147-154.
Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37,
577-580.
Bain, L.J. (1978). Statistical Analysis of Reliability and Life-Testing Models. New York:
Marcel Dekker.
Balkema, A.A. and Haan, L. de (1978a). Limit distributions for order statistics I. Theory
Probab. Appl. 23, 77-92.
Balkema, A.A. and Haan, L. de (1978b). Limit distributions for order statistics II.
Theory Probab. Appl. 23, 341-358.
Barndorff-Nielsen, O. (1964). On the limit distribution of the maximum of a random
number of independent random variables. Acta Math. Acad. Sci. Hungar. 15, 399-403.
Barnett, V. (1975). Probability plotting methods and order statistics. Appl. Statist. 24,
95-108.
Barnett, V. (1976). The ordering of multivariate data. J. Roy. Statist. Soc., Ser. A, 139,
318-344.
Barnett, V. and Lewis, T. (1978). Outliers in Statistical Data. Chichester: Wiley.
Beirlant, J. and Teugels, J.L. (1987). Asymptotics of Hill's estimator. Theory Probab.
Appl. 31, 463-469.
Beran, J. (1985). Stochastic procedures: Bootstrap and random search methods in
statistics: Proceedings of 45th Session of the ISI, Vol. 4 (Amsterdam), 25.1.

Berman, S.M. (1961). Convergence to bivariate limiting extreme value distributions.
Ann. Inst. Statist. Math. 13, 217-223.
Bhattacharya, R.N. and Rao, R.R. (1976). Normal Approximation and Asymptotic
Expansion. New York: Wiley.
Bhattacharya, R.N. and Ghosh, J.K. (1978). On the validity of the formal Edgeworth
expansions. Ann. Statist. 6, 434-451.
Bickel, P.J. (1967). Some contributions to the theory of order statistics. In: Proc. 5th
Berkeley Symp. Math. Statistics and Prob., Vol. I., pp. 575-591. Berkeley: Univ.
California Press.
Bickel, P.J. and Freedman D.A. (1981). Some asymptotic theory for the bootstrap.
Ann. Statist. 9, 1196-1217.
Bickel, P.J. and Rosenblatt, M. (1973). On some global measures of the deviation of
density function estimates. Ann. Statist. 1, 1071-1095.
Bickel, P.J. and Rosenblatt, M. (1975). Correction to "On some global measures of the
deviation of density function estimates". Ann. Statist. 3, 1370.
Bloch, D.A. and Gastwirth, J.L. (1968). On a simple estimate of the reciprocal of the
density function. Ann. Math. Statist. 39, 1083-1085.
Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables. New York:
Wiley.
Blum, J.R. and Pathak, P.K. (1972). A note on the zero-one law. Ann. Math. Statist.
43, 1008-1009.
Boos, D.D. (1984). Using extreme value theory to estimate large percentiles. Technometrics 26, 33-39.
Bortkiewicz, L. von (1922). Variationsbreite und mittlere Fehler. Sitzungsberichte
Berliner Math. Ges. 21, 3-11.
Brown, B.M. (1981). Symmetric quantile averages and related estimators. Biometrika
68, 235-242.
Brozius, H. and Haan, L. de (1987). On limit laws for the convex hull of a sample. J.
Appl. Probab. 24, 863-874.
Chernoff, H., Gastwirth, J.L. and John, M.V. (1967). Asymptotic distribution of linear
combinations of functions of order statistics with applications to estimation. Ann.
Math. Statist. 38, 52-72.
Chibisov, D.M. (1964). On limit distributions for order statistics. Theory Probab. Appl.
9, 150-165.
Chow, Y.S. and Teicher, H. (1978). Probability Theory. New York: Springer.
Cohen, J.P. (1982a). The penultimate form of approximation to normal extremes. Adv.
Appl. Probab. 14, 324-339.
Cohen, J.P. (1982b). Convergence rates for the ultimate and penultimate approximation in extreme-value theory. Adv. Appl. Probab. 14, 833-854.
Cohen, J.P. (1984). The asymptotic behaviour of the maximum likelihood estimates
for univariate extremes. In: Statistical Extremes and Applications, Ed. J. Tiago de
Oliveira, pp. 435-442. Dordrecht: Reidel.
Cooil, B. (1985). Limiting multivariate distributions of intermediate order statistics.
Ann. Probab. 13, 469-477.
Cooil, B. (1988). When are intermediate processes of the same stochastic order? Statist.
Probab. Letters 6, 159-162.
Consul, P.C. (1984). On the distributions of order statistics for a random sample size.
Statist. Neerlandica 38, 249-256.
Craig, A.T. (1932). On the distribution of certain statistics. Amer. J. Math. 54, 353-366.
Cramer, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton Univ.
Press.
Csiszar, I. (1975). I-Divergence geometry of probability distributions and minimization
problems. Ann. Probab. 3, 146-158.
Csorgo, M. (1983). Quantile Processes with Statistical Applications. Philadelphia:
SIAM.
Csorgo, M., Csorgo, S., Horvath, L. and Mason, D.M. (1986). Normal and stable
convergence of integral functions of the empirical distribution function. Ann.
Probab. 14, 86-118.
Csorgo, M. and Revesz, P. (1981). Strong Approximations in Probability and Statistics.
New York: Academic Press.
Csorgo, S., Deheuvels, P. and Mason, D.M. (1985). Kernel estimates of the tail index
of a distribution. Ann. Statist. 13, 1050-1078.
Csorgo, S., Horvath, L. and Mason, D.M. (1986). What portion of the sample makes
a partial sum asymptotically stable or normal? Probab. Th. Rel. Fields 72, 1-16.
Csorgo, S., and Mason, D.M. (1986). The asymptotic distributions of sums of extreme
values from a regularly varying distribution. Ann. Probab. 14, 974-983.
David, F.N. and Johnson, N.L. (1954). Statistical treatment of censored data, Part I,
Fundamental formulae. Biometrika 44, 228-240.
David, H.A. (1981). Order Statistics. 2nd ed. New York: Wiley.
Davis, R.A. (1982). The rate of convergence in distribution of the maxima. Statist.
Neerlandica 36, 31-35.
Deheuvels, P. and Pfeifer, D. (1988). Poisson approximations of multinomial distributions and point processes. J. Multivariate Anal. 25, 65-89.
Dodd, E.L. (1923). The greatest and the least variate under general laws of error. Trans.
Amer. Math. Soc. 25, 525-539.
Dronskers, J.J. (1958). Approximate formulae for the statistical distributions of extreme
values. Biometrika 45, 447-470.
Du Mouchel, W. (1983). Estimating the stable index α in order to measure tail thickness.
Ann. Statist. 11, 1019-1036.
Dwass, M. (1966). Extremal processes, II. Illinois J. Math. 10, 381-391.
Dziubdziela, W. (1976). A note on the k-th distance random variables. Zastosowania
Matematyki 15, 289-291.
Eddy, W.F. and Gale, J.D. (1981). The convex hull of a spherically symmetric sample.
Adv. Appl. Probab. 13, 751-763.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist. 7,
1-26.
Egorov, V.A. and Nevzorov, V.B. (1976). Limit theorems for linear combinations of
order statistics. In: Proc. 3rd Japan-USSR Symp. Probab. Theory, Eds. G. Maruyama
and J.V. Prokhorov, pp. 63-79. Lecture Notes in Mathematics 550. New York:
Springer.
Englund, G. (1980). Remainder term estimates for the asymptotic normality of order
statistics. Scand. J. Statist. 7, 197-202.
Erdelyi, A., Magnus, W., Oberhettinger, F. and Tricomi, F.G. (1953). Higher Transcendental Functions, Vol. I. New York: McGraw-Hill.
Falk, M. (1983). Relative efficiency and deficiency of kernel type estimators of smooth
distribution functions. Statist. Neerlandica 37, 73-83.
Falk, M. (1984a). Relative deficiency of kernel type estimators of quantiles. Ann. Statist.
12, 261-268.
Falk, M. (1984b). Berry-Esseen theorems for a global measure of performance of kernel
density estimators. South African Statist. J. 19, 1-19.
Falk, M. (1985a). Asymptotic normality of the kernel quantile estimator. Ann. Statist.
13, 428-433.
Falk, M. (1985b). Uniform convergence of extreme order statistics. Habilitationsschrift,
University of Siegen.
Falk, M. (1986a). Rates of uniform convergence of extreme order statistics. Ann. Inst.
Statist. Math., Ser. A, 38, 245-262.
Falk, M. (1986b). On the estimation of the quantile density function. Statist. Probab.
Letters 4, 69-73.
Falk, M. (1989a). Best attainable rate of joint convergence of extremes. In: Extreme
Value Theory, Eds. J. Husler and R.-D. Reiss, pp. 1-9. Lecture Notes in Statistics
51. New York: Springer.
Falk, M. (1989b). A note on uniform asymptotic normality of intermediate order
statistics. Ann. Inst. Statist. Math., Ser. A.
Falk, M. and Kohne, W. (1986). On the rate at which the sample extremes become
independent. Ann. Probab. 14, 1339-1346.
Falk, M. and Reiss, R.-D. (1988). Independence of order statistics. Ann. Probab. 16,
854-862.
Falk, M. and Reiss, R.-D. (1989). Weak convergence of smoothed and nonsmoothed
bootstrap quantile estimates. Ann. Probab. 17.
Feldman, D. and Tucker, H.G. (1966). Estimation of non-unique quantiles. Ann. Math.
Statist. 37, 451-457.
Feller, W. (1972). An Introduction to Probability Theory and its Applications. Vol. 2,
2nd ed. New York: Wiley.
Ferguson, T.S. (1967). Mathematical Statistics. New York: Academic Press.
Finkelstein, B.V. (1953). Limiting distribution of extreme terms of a variational series
of a two-dimensional random variable. Dokl. Ak. Nauk. S.S.S.R. 91, 209-211 (in
Russian).
Fisher, R.A. (1922). On the mathematical foundation of theoretical statistics. Phil.
Trans. Roy. Soc. A 222, 309-368. Reprint in: Collected Papers of R.A. Fisher, Vol.
I, Ed. J.H. Bennett, pp. 276-335. University of Adelaide.
Fisher, R.A. and Tippett, L.H.C. (1928). Limiting forms of the frequence distribution
of the largest or smallest member of a sample. Proc. Camb. Phil. Soc. 24, 180-190.
Floret, K. (1981). Mass- und Integrationstheorie. Stuttgart: Teubner.
Fréchet, M. (1927). Sur la loi de probabilite de l'ecart maximum. Ann. de la Soc.
Polonaise de Math. 6, 93-116.
Galambos, J. (1975). Order statistics of samples from multivariate distributions. J.
Amer. Statist. Assoc. 70, 674-680.
Galambos, J. (1984). Order statistics. In: Handbook of Statistics. Vol. 4, Eds. P.R.
Krishnaiah and P.K. Sen, pp. 359-382. Amsterdam: North-Holland.
Galambos, J. (1987). The Asymptotic Theory of Extreme Order Statistics. 2nd ed.
Malabar, Florida: Krieger.
Geffroy, J. (1958/59). Contributions a la theorie des valeurs extremes. Publ. Inst. Statist.
Univ. Paris 7/8, 37-185.
Gini, C. and Galvani, L. (1929). Di talune estensioni dei concetti di media ai caratteri
qualitativi. Metron 8. Partial English translation in: J. Amer. Statist. Assoc. 25,
448-450.
Gnedenko, B. (1943). Sur la distribution limit du terme maximum d'une serie aleatoire.
Ann. Math. 44, 423-453.
Goldie, C.M. and Smith, R.L. (1987). Slow variation with remainder: Theory and
applications. Quart. J. Math. Oxford 38, 45-71.
Gomes, M.I. (1978). Some probabilistic and statistical problems in extreme value
theory. Ph.D. Thesis, University of Sheffield.
Gomes, M.I. (1981). An i-dimensional limiting distribution function of largest values
and its relevance to the statistical theory of extremes. In: Statistical Distribution in
Scientific Work, Eds. C. Taillie et al., Vol. 6., pp. 389-410. Dordrecht: Reidel.
Gomes, M.I. (1984). Penultimate limiting forms in extreme value theory. Ann. Inst.
Statist. Math., Ser. A, 36, 71-85.
Gross, A.J. (1975). Survival Distributions: Reliability Applications in the Biomedical
Sciences. New York: Wiley.

Guilbaud, O. (1982). Functions of non-iid random vectors expressed as functions of
iid random vectors. Scand. J. Statist. 9, 229-233.
Gumbel, E.J. (1933). Das Alter des Methusalem. Z. Schweizerische Statistik und
Volkswirtschaft 69, 516-530.
Gumbel, E.J. (1946). On the independence of the extremes in a sample. Ann. Math.
Statist. 17, 78-81.
Gumbel, E.J. (1958). Statistics of Extremes. New York: Columbia Univ. Press.
Haan, L. de (1970). On Regular Variation and its Application to the Weak Convergence of Sample Extremes. Amsterdam, Math. Centre Tracts 32.
Haan, L. de (1976). Sample extremes: an elementary introduction. Statist. Neerlandica
30, 161-172.
Haan, L. de and Resnick, S.I. (1980). A simple asymptotic estimate for the index of
a stable distribution. J. Roy. Statist. Soc., Ser. B., 42, 83-87.
Haan, L. de and Resnick, S.I. (1982). Local limit theorems for sample extremes. Ann.
Probab. 10, 396-413.
Hausler, E. and Teugels, J.L. (1985). On asymptotic normality of Hill's estimator for
the exponent of regular variation. Ann. Statist. 13, 743-756.
Haldane, J.B.S. and Jayakar, S.G. (1963). The distribution of extremal and nearly
extremal values in samples from a normal distribution. Biometrika 50, 89-94.
Hall, P. (1978). Some asymptotic expansions of moments of order statistics. Stoch.
Proc. Appl. 7, 265-275.
Hall, P. (1979). On the rate of convergence of normal extremes. J. Appl. Probab. 16,
433-439.
Hall, P. (1982a). On estimating the endpoint of a distribution. Ann. Statist. 10, 556-568.
Hall, P. (1982b). On some simple estimates of an exponent of regular variation. J. Roy.
Statist. Soc., Ser. B., 44, 37-42.
Hall, P. (1983). On near neighbour estimates of a multivariate density. J. Multivariate
Anal. 13, 24-39.
Hall, P. and Welsh, A.H. (1984). Best attainable rates of convergence for estimates of
parameters of regular variation. Ann. Statist. 12, 1079-1084.
Hall, P. and Welsh, A.H. (1985). Adaptive estimates of parameters of regular variation.
Ann. Statist. 13, 331-341.
Hall, W.J. and Wellner, J.A. (1979). The rate of convergence in law of the maximum
of an exponential sample. Statist. Neerlandica 33, 151-154.
Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests. New York: Academic Press.
Harrell, F.E. and Davis, C.E. (1982). A new distribution-free quantile estimator. Biometrika 69, 635-640.
Harter, H.L. (1983). The chronological annotated bibliography of order statistics. Vol.
I: pre-1950. Vol. II: 1950-1959. Columbus, Ohio: American Sciences Press.
Hecker, H. (1976). A characterization of the asymptotic normality of linear combinations of order statistics from the uniform distribution. Ann. Statist. 4, 1244-1246.
Heidelberger, P. and Lewis, P.A.W. (1984). Quantile estimation in dependent sequences. Opns. Res. 32, 185-209.
Helmers, R. (1981). A Berry-Esseen theorem for linear combinations of order statistics.
Ann. Probab. 9, 342-347.
Helmers, R. (1982). Edgeworth Expansions for Linear Combinations of Order Statistics. Amsterdam, Math. Centre Tracts 105.
Herbach, L. (1984). Introduction, Gumbel model. In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 49-80. Dordrecht: Reidel.
Hewitt, E. and Stromberg, K. (1975). Real and Abstract Analysis. 3rd ed. New York:
Springer.
Heyer, H. (1982). Theory of Statistical Experiments. Springer Series in Statistics. New
York: Springer.


Hill, B.M. (1975). A simple approach to inference about the tail of a distribution. Ann.
Statist. 3, 1163-1174.
Hillion, A. (1983). On the use of some variation distance inequalities to estimate the
difference between sample and perturbed sample. In: Specifying Statistical Models,
Eds. J.P. Florens et al., pp. 163-175. Lecture Notes in Statistics 16. New York:
Springer.
Hodges, J.L. Jr. and Lehmann, E.L. (1967). On medians and quasi medians. J. Amer.
Statist. Assoc. 62, 926-931.
Hodges, J.L. Jr. and Lehmann, E.L. (1970). Deficiency. Ann. Math. Statist. 41, 783-801.
Hoeffding, W. and Wolfowitz, J. (1958). Distinguishability of sets of distributions. Ann.
Math. Statist. 29, 700-718.
Hosking, J.R.M. (1985). Maximum-likelihood estimation of the parameter of the
generalized extreme-value distribution. Applied Statistics 34, 301-310.
Huang, J.S. and Ghosh, M. (1982). A note on the strong unimodality of order statistics.
J. Amer. Statist. Assoc. 77, 929-930.
Husler, J. and Reiss, R.-D. (1989). Maxima of normal random vectors: Between
independence and complete dependence. Statist. Probab. Letters 7.
Husler, J. and Schupbach, M. (1988). On simple block estimators for the parameters
of the extreme-value distribution. Commun. Statist.-Simula. 15, 61-76.
Husler, J. and Tiago de Oliveira, J. (1986). The usage of the largest observations
for parameter and quantile estimation for the Gumbel distribution; an efficiency
analysis. Publ. Inst. Stat. Univ. 33, 41-56.
Ibragimov, J.A. (1956). On the composition of unimodal distributions. Theory Probab.
Appl. 1, 225-260.
Ibragimov, J.A. and Has'minskii, R.Z. (1981). Statistical Estimation. Springer-Verlag,
Berlin.
Iglehart, D.L. (1976). Simulating stable stochastic systems; VI. Quantile estimation.
J. Assoc. Comput. Mach. 23, 347-360.
Ikeda, S. (1963). Asymptotic equivalence of probability distributions with applications
to some problems of asymptotic independence. Ann. Inst. Statist. Math. 15, 87-116.
Ikeda, S. (1975). Some criteria for uniform asymptotic equivalence of real probability
distributions. Ann. Inst. Statist. Math. 27, 421-428.
Ikeda, S. and Matsunawa, T. (1970). On asymptotic independence of order statistics.
Ann. Inst. Statist. Math. 22, 435-449.
Ikeda, S. and Matsunawa, T. (1972). On the uniform asymptotic joint normality of
sample quantiles. Ann. Inst. Statist. Math. 24, 33-52.
Ikeda, S. and Nonaka, Y. (1983). Uniform asymptotic joint normality of a set of
increasing number of sample quantiles. Ann. Inst. Statist. Math. 35, Ser. A, 329-341.
Isogai, T. (1985). Some extensions of Haldane's multivariate median and its applications. Ann. Inst. Statist. Math. 37, Ser. A, 289-301.
Ivchenko, G.I. (1971). On limit distributions for the order statistics of the multinomial
distribution. Theory Probab. Appl. 16, 102-115.
Ivchenko, G.I. (1974). On limit distributions for middle order statistics for double
sequence. Theory Probab. Appl. 19, 267-277.
Jacod, J. and Shiryaev, A.N. (1987). Limit Theorems for Stochastic Processes. Berlin:
Springer.
Janssen, A. (1988). Uniform convergence of sums of order statistics to stable laws.
Probab. Th. Rel. Fields 78, 261-272.
Janssen, A. and Reiss, R.-D. (1988). Comparison of location models of Weibull type
samples and extreme value processes. Probab. Th. Rel. Fields 78, 273-292.
Joag-Dev, K. (1983). Independence via uncorrelatedness under certain dependence
structures. Ann. Probab. 11, 1037-1041.
Joe, H. (1987). Estimation of quantiles of the maximum of N observations. Biometrika
74, 347-354.
Johnson, N.L. and Kotz, S. (1970). Distributions in Statistics: Continuous Univariate
Distributions-1. New York: Wiley.
Johnson, N.L. and Kotz, S. (1972). Distributions in Statistics: Continuous Multivariate
Distributions. New York: Wiley.
Kabanov, Yu. and Liptser, R.S. (1983). On convergence in variation of the distributions
of multivariate point processes. Z. Wahrsch. verw. Geb. 63, 475-485.
Karr, A.F. (1986). Point Processes and their Statistical Inference. New York: Marcel
Dekker.
Kendall, M.G. (1940). Note on the distributions of quantiles for large samples. J. Roy.
Statist. Soc., Suppl. 7, 83-85.
Kendall, M.G. and Stuart, A. (1958). The Advanced Theory of Statistics. Vol. 1.
London: Griffin.
Kiefer, J. (1967). On Bahadur's representation of sample quantiles. Ann. Math. Statist.
38, 1323-1342.
Kiefer, J. (1969a). Deviations between the sample quantile process and the sample df.
In: Nonparametric Techniques in Statistical Inference, Ed. M.L. Puri, pp. 299-319.
Cambridge: Cambridge Univ. Press.
Kiefer, J. (1969b). Old and new methods for studying order statistics and sample
quantiles. In: Nonparametric Techniques in Statistical Inference, Ed. M.L. Puri,
pp. 349-357. Cambridge: Cambridge Univ. Press.
Kinnison, R.R. (1985). Applied Extreme Value Statistics. Columbus: Battelle Press.
Klenk, A. and Stute, W. (1987). Bootstrapping of L-estimates. Statist. Decisions 5, 77-87.
Kohne, W. and Reiss, R.-D. (1983). A note on uniform approximation to distributions
of extreme order statistics. Ann. Inst. Statist. Math., Ser. A, 35, 343-345.
Kolchin, V.F. (1980). On the limiting behaviour of extreme order statistics in a
polynomial scheme. Theory Probab. Appl. 14, 458-469.
Koziol, J.A. (1980). A note on limiting distributions for spacings statistics. Z. Wahrsch.
verw. Gebiete 51, 55-62.
Kuan, K.S. and Ali, M.M. (1960). Asymptotic distribution of quantiles from a multivariate distribution. In: Mult. Statist. Analysis, Ed. R.P. Gupta, pp. 109-120.
Amsterdam: North-Holland.
Lamperti, J. (1964). On extreme order statistics. Ann. Math. Statist. 35, 1726-1736.
Landers, D. and Rogge, L. (1985). Asymptotic normality of the estimators of the natural
median. Statist. Decisions 3, 77-90.
Laplace, P.S. de (1818). Deuxieme supplement a la theorie analytique de probabilities.
Paris: Courcier, Reprint (1886) in: Ouevres completes de Laplace 7, pp. 531-580.
Paris: Gauthier-Villars.
Lawless, J.F. (1982). Statistical Models and Methods for Lifetime Data. New York:
Wiley.
Leadbetter, M.R., Lindgren, G. and Rootzen, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer Series in Statistics. New York:
Springer.
Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer Series
in Statistics. New York: Springer.
Lehmann, E.L. (1986). Testing Statistical Hypotheses. 2nd ed. New York: Wiley.
Loeve, M. (1963). Probability Theory. 3rd ed. New York: Van Nostrand.
Mack, Y.P. (1984). Remarks on some smoothed empirical distribution functions and
processes. Bull. Informatics Cybernetics 21, 29-35.
Malmquist, S. (1950). On a property of order statistics from a rectangular distribution.
Skand. Aktuar. 33, 214-222.
Mammitzsch, V. (1984). On the asymptotically optimal solution within a certain class
of kernel type estimators. Statist. Decisions 2, 247-255.
Mann, N.R., Schafer, R.E. and Singpurwalla, N.D. (1974). Methods for Statistical
Analysis of Reliability and Life Data. New York: Wiley.


Mann, N.R. (1984). Statistical estimation of the Weibull and Frechet distributions.
In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 81-89.
Dordrecht: Reidel.
Marshall, A.W. and Olkin, I. (1983). Domains of attraction of multivariate extreme
value distributions. Ann. Probab. 11, 168-177.
Matsunawa, T. (1975). On the error evaluation of the joint normal approximation for
sample quantiles. Ann. Inst. Statist. Math. 27, 189-199.
Matsunawa, T. and Ikeda, S. (1976). Uniform asymptotic distribution of extremes. In:
Essays in Probab. Statist., Eds. S. Ikeda et al., pp. 419-432. Tokyo: Shinko Tsusho.
Michel, R. (1975). An asymptotic expansion for the distribution of asymptotic maximum likelihood estimators of vector parameters. J. Multivariate Anal. 5, 67-85.
Miebach, B. (1977). Asymptotische Theorie für Familien von Maßen mit Lokalisations- und Dispersionsparameter. Diploma Thesis, University of Cologne.
Mises von, R. (1923). Über die Variationsbreite einer Beobachtungsreihe. Sitzungsberichte Berliner Math. Ges. 22, 3-8.
Mises von, R. (1936). La distribution de la plus grande de n valeurs. Rev. Math. Union
Interbalcanique 1, 141-160. Reproduced in Selected Papers of Richard von Mises,
Amer. Math. Soc. 2 (1964), 271-294.
Miyamoto, Y. (1976). Optimum spacings for goodness of fit tests based on sample
quantiles. In: Essays in Probab. Statist., Eds. S. Ikeda et al., pp. 475-483. Tokyo:
Shinko Tsusho.
Montfort, M.A.J. van (1982). Modellen voor maximum en minima, schattingen en
betrouwbaarheidsintervallen, keuze tussen modellen, Agricultural University
Wageningen, Netherlands, Dept. Math., Statist. Division, Technical Note 82-02.
Montfort, M.A.J. van and Gomes, I.M. (1985). Statistical choice of extremal models
for complete and censored data. J. Hydrology 77, 77-87.
Mood, A. (1941). On the joint distribution of the median in sample from a multivariate
population. Ann. Math. Statist. 12, 268-278.
Moore, D.S. and Yackel, J.W. (1977). Large sample properties of nearest neighbour
density function estimates. In: Statistical Decision Theory and Related Topics, Eds.
S.S. Gupta and D.S. Moore, pp. 269-279. New York: Academic Press.
Mosteller, F. (1946). On some useful inefficient statistics. Ann. Math. Statist. 17,
377-408.
Nadaraya, E.A. (1964). Some new estimates for distribution functions. Theory Probab.
Appl. 10, 186-190.
Nagaraja, H.N. (1982). On the non-Markovian structure of discrete order statistics. J.
Statist. Plann. Inference 7, 29-33.
Nagaraja, H.N. (1986). Structure of discrete order statistics. J. Statist. Plann. Inference
13, 165-177.
Nelson, W. (1982). Applied Life Data Analysis. New York: Wiley.
Nowak, W. and Reiss, R.-D. (1983). Asymptotic expansions of distributions of central
order statistics under discrete distributions. Technical Report 101, University of
Siegen.
Oja, H. and Niinimaa, A. (1985). Asymptotic properties of the generalized median in
the case of multivariate normality. J. Roy. Statist. Soc., Ser. B, 47, 372-377.
O'Reilley, F.J. and Quesenberry, C.P. (1973). The conditional probability integral
transformation and applications to obtain composite chi-square goodness-of-fit
tests. Ann. Statist. 1, 74-83.
Pantcheva, E.I. (1985). Limit theorems for extreme order statistics under nonlinear
normalization. In: Stability Problems for Stochastic Models, Eds. V.V. Kalashnikov
and V.M. Zolotarev, pp. 284-309. Lecture Notes in Mathematics 1155, Berlin:
Springer.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann.
Math. Statist. 33, 1065-1076.
Parzen, E. (1979). Nonparametric statistical data modeling. J. Amer. Statist. Assoc. 74,
105-121.
Pearson, K. (1902). Note on Francis Galton's problem. Biometrika 1, 390-399.
Pearson, K. (1920). On the probable errors of frequency constants. Biometrika 13,
113-132.
Pfanzagl, J. (1973a). Asymptotically optimum estimation and test procedures. In: Proc.
Prague Symp. Asymptotic Statistics, Vol. 1, Ed. J. Hajek, pp. 201-272. Prague:
Charles University.
Pfanzagl, J. (1973b). The accuracy of the normal approximation for estimates of vector
parameters. Z. Wahrsch. verw. Gebiete 25, 171-198.
Pfanzagl, J. (1973c). Asymptotic expansions related to minimum contrast estimators.
Ann. Statist. 1, 993-1026.
Pfanzagl, J. (1975). Investigating the quantile of an unknown distribution. In: Statistical
Methods in Biometry, Ed. W.J. Ziegler, pp. 111-126. Basel: Birkhäuser.
Pfanzagl, J. (1982). Contributions to a General Asymptotic Statistical Theory. (With
the assistance of W. Wefelmeyer). Lecture Notes in Statistics 13. New York: Springer.
Pfanzagl, J. (1985). Asymptotic Expansions for General Statistical Models. (With the
assistance of W. Wefelmeyer). Lecture Notes in Statistics 31. New York: Springer.
Pickands, J. (1967). Sample sequences of maxima. Ann. Math. Statist. 38, 1570-1574.
Pickands, J. (1968). Moment convergence of sample extremes. Ann. Math. Statist. 39,
881-889.
Pickands, J. (1975). Statistical inference using extreme order statistics. Ann. Statist. 3,
119-131.
Pickands, J. (1981). Multivariate extreme value distributions. Proc. 43rd Session of the
ISI (Buenos Aires), 859-878.
Pickands, J. (1986). The continuous and differentiable domains of attractions of the
extreme value distributions. Ann. Probab. 14, 996-1004.
Pitman, E.J.G. (1979). Some Basic Theory for Statistical Inference. London: Chapman
and Hall.
Plackett, R.L. (1976). In: Discussion of Professor Barnett's Paper. J.R. Statist. Soc., Ser.
A, 139, 344-346.
Polfeldt, T. (1970). Asymptotic results in non-regular estimation. Skand. Aktuar.,
Suppl. 1-2, 2-78.
Prakasa Rao, B.L.S. (1983). Nonparametric Functional Estimation. Orlando: Academic Press.
Puri, M.L. and Ralescu, S.S. (1986). Limit theorems for random central order statistics. In: Adaptive Statistical Procedures and Related Topics, Ed. J. van Ryzin,
pp. 447-475. IMS Lecture Notes 8.
Pyke, R. (1965). Spacings. J. Roy. Statist. Soc., Ser. B. 27, 395-436. Discussion: 437-449.
Pyke, R. (1972). Spacings revisited. In: Proc. 6th Berkeley Symp., Math. Statist.
Probability, Vol. 1, Eds. L.M. Le Cam et al., pp. 417-427. Berkeley: Univ. California
Press.
Radtke, M. (1988). Konvergenzraten und Entwicklungen unter von Mises Bedingungen der Extremwerttheorie. Ph.D. Thesis, University of Siegen.
Ramachandran, G. (1984). Approximate values for the moments of extreme order
statistics in large samples. In: Statistical Extremes and Applications, Ed. J. Tiago de
Oliveira, pp. 563-578. Dordrecht: Reidel.
Rao, J.S. and Kuo, M. (1984). Asymptotic results on the Greenwood statistic and some
of its generalizations. J. Roy. Statist. Soc., Ser. B, 46, 228-237.
Raoult, J.P., Criticou, D. and Terzakis, D. (1983). The probability integral transformation for not necessarily absolutely continuous distribution functions, and its application to goodness-of-fit tests. In: Specifying Statistical Models, Ed. J.P. Florens et al.,
pp. 36-49. New York: Springer.
Reiss, R.-D. (1973). On the measurability and consistence of maximum likelihood
estimates for unimodal densities. Ann. Statist. 1, 888-901.
Reiss, R.-D. (1974a). On the accuracy of the normal approximation for quantiles. Ann.
Probab. 2, 741-744.
Reiss, R.-D. (1974b). Asymptotic expansions for sample quantiles. Technical Report 6,
University of Cologne.
Reiss, R.-D. (1975a). The asymptotic normality and asymptotic expansions for the joint
distribution of several order statistics. In: Limit Theorems of Prob. Theory, Ed.
P. Revesz, pp. 297-340. Amsterdam: North-Holland.
Reiss, R.-D. (1975b). Consistency of a certain class of empirical density functions.
Metrika 22, 189-203.
Reiss, R.-D. (1976). Asymptotic expansions for sample quantiles. Ann. Probab. 4,
249-258.
Reiss, R.-D. (1977a). Asymptotic Theory of Order Statistics. Lecture Notes, University
of Freiburg.
Reiss, R.-D. (1977b). Optimum confidence bands for density functions. Studia Sci.
Math. Hungar. 12, 207-214.
Reiss, R.-D. (1978a). Approximate distribution of the maximum deviation of histograms. Metrika 25, 9-26.
Reiss, R.-D. (1978b). Consistency of minimum contrast estimators in nonstandard
cases. Metrika 25, 129-142.
Reiss, R.-D. (1980). Estimation of quantiles in certain non-parametric models. Ann.
Statist. 8, 87-105.
Reiss, R.-D. (1981a). Approximation of product measures with an application to order
statistics. Ann. Probab. 9, 335-341.
Reiss, R.-D. (1981b). Asymptotic independence of distributions of normalized order
statistics of the underlying probability measure. J. Multivariate Anal. 11, 386-399.
Reiss, R.-D. (1981c). Nonparametric estimation of smooth distribution functions.
Scand. J. Statist. 8, 116-119.
Reiss, R.-D. (1981d). Uniform approximation to distributions of extreme order statistics. Adv. Appl. Probab. 13, 533-547.
Reiss, R.-D. (1982). One sided test for quantiles in certain non-parametric models. In:
Nonparametric Statistical Inference, Colloq. Math. Soc. János Bolyai 32, Eds. B.V.
Gnedenko et al., pp. 759-772. Amsterdam: North-Holland.
Reiss, R.-D. (1984). Statistical inference using approximate extreme value models.
Technical Report 124, University of Siegen.
Reiss, R.-D. (1985a). Asymptotic expansions of moments of central order statistics. In:
Probability and Statistical Decision Theory, Vol. A., Proc. 4th Pann. Symp., Eds.
Mogyordi et al., pp. 293-300. Dordrecht: Reidel.
Reiss, R.-D. (1985b). Approximations to the distributions of ordered distance random
variables. Ann. Inst. Statist. Math., Ser. A, 37, 529-533.
Reiss, R.-D. (1986). A new proof of the approximate sufficiency of sparse order statistics.
Statist. Probab. Letters 4, 233-235.
Reiss, R.-D. (1987). Estimating the tail index of the claim size distribution. Blätter
DGVM 18, 21-25.
Reiss, R.-D. (1989). Extended extreme value models and adaptive estimation of the tail
index. In: Extreme Value Theory, Eds. J. Hüsler and R.-D. Reiss, pp. 156-165.
Lecture Notes in Statistics 51. New York: Springer.
Reiss, R.-D., Falk, M. and Weller, M. (1984). Inequalities for the relative sufficiency
between sets of order statistics. In: Statistical Extremes and Applications, Ed. J.
Tiago de Oliveira, pp. 597-610. Dordrecht: Reidel.
Renyi, A. (1953). On the theory of order statistics. Acta Math. Acad. Sci. Hungar. 4,
191-231.
Resnick, S.I. (1987). Extreme Values, Regular Variation, and Point Processes. Applied
Probability. Vol. 4. New York: Springer.
Rice, J. and Rosenblatt, M. (1976). Estimation of the log survivor function and hazard
function. Sankhya, Ser. A, 38, 60-78.
Rootzen, H. (1984). Attainable rates of convergence of maxima. Statist. Probab. Letters
2, 219-221.
Rootzen, H. (1985). Asymptotic distributions of order statistics from stationary normal
sequences. In: Contributions to Probability and Statistics in Honour of Gunnar
Blom, Eds. J. Lanke and G. Lindgren, pp. 291-302. University of Lund.
Rosenblatt, M. (1952). Remarks on a multivariate transformation. Ann. Math. Statist. 23,
470-472.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27, 832-837.
Rosengard, A. (1962). Étude des lois-limites jointes et marginales de la moyenne et des
valeurs extrêmes d'un échantillon. Publ. Inst. Statist. Univ. Paris 11, 3-53.
Rossberg, H.J. (1965). Die asymptotische Unabhängigkeit der kleinsten und größten
Werte einer Stichprobe vom Stichprobenmittel. Math. Nachr. 28, 305-318.
Rossberg, H.J. (1967). Über das asymptotische Verhalten der Rand- und Zentralglieder
einer Variationsreihe (II). Publ. Math. Debrecen 14, 83-90.
Rossberg, H.J. (1972). Characterization of the exponential and the Pareto distribution
by means of some properties of the distributions which the differences and quotients
of order statistics are subject to. Math. Operationsforsch. Statist. 3, 207-316.
Rüschendorf, L. (1985a). Two remarks on order statistics. J. Statist. Plann. Inference
11, 71-74.
Rüschendorf, L. (1985b). The Wasserstein distance and approximation theorems. Z.
Wahrsch. verw. Geb. 66, 117-129.
Ryzin, J. van (1973). A histogram method of density estimation. Commun. Statist. 2,
493-506.
Sen, P.K. (1968). Asymptotic normality of sample quantiles for m-dependent processes.
Ann. Math. Statist. 39, 1724-1730.
Sen, P.K. (1972). On the Bahadur representation of sample quantiles for sequences of
φ-mixing random variables. J. Multivariate Anal. 2, 77-95.
Sendler, W. (1975). A note on the proof of the zero-one law of Blum and Pathak. Ann.
Probab. 3, 1055-1058.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York:
Wiley.
Shaked, M. and Tong, Y.L. (1984). Stochastic ordering of spacings from dependent
random variables. In: Inequalities in Statistics and Probability. IMS Lecture Notes
5, 141-149.
Shorack, G.R. and Wellner, J.A. (1986). Empirical Processes with Applications to
Statistics. New York: Wiley.
Sibuya, M. (1960). Bivariate extreme statistics. Ann. Inst. Stat. Math. 19, 195-210.
Siddiqui, M.M. (1960). Distribution of quantiles in samples from a bivariate population. J. Res. Nat. Bureau Standards 64, Ser. B, 124-150.
Singh, K. (1979). Representation of quantile processes with non-uniform bounds.
Sankhya, Ser. A, 41, 271-277.
Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9,
1187-1195.
Smid, B. and Stam, A.J. (1975). Convergence in distribution of quotients of order
statistics. Stoch. Proc. Appl. 3, 287-292.
Smirnov, N.V. (1935). Über die Verteilung des allgemeinen Gliedes in der Variationsreihe. Metron 12, 59-81.
Smirnov, N.V. (1944). Approximation of distribution laws of random variables by
empirical data. Uspechi Mat. Nauk 10, 179-206 (in Russian).
Smirnov, N.V. (1949). Limit distributions for the term of a variational series. Trudy
Mat. Inst. Steklov 25, 1-60. (In Russian). English translation in Amer. Math. Soc.
Transl. (1), 11 (1952), 82-143.
Smirnov, N.V. (1967). Some remarks on limit laws for order statistics. Theory Probab.
Appl. 12, 337-339.
Smith, R.L. (1982). Uniform rates of convergence in extreme value theory. Adv. Appl.
Probab. 14, 600-622.
Smith, R.L. (1984). Threshold methods for sample extremes. In: Statistical Extremes
and Applications, Ed. J. Tiago de Oliveira, pp. 621-638. Dordrecht: Reidel.
Smith, R.L. (1985a). Maximum likelihood estimation in a class of non-regular cases.
Biometrika 72, 67-92.
Smith, R.L. (1985b). Statistics of extreme values. Proc. 45th Session of the ISI, Vol. 4
(Amsterdam), 26.1.
Smith, R.L. (1986). Extreme value theory based on the r largest annual events. J.
Hydrology 86, 27-43.
Smith, R.L. (1987). Estimating tails of probability distributions. Ann. Statist. 15,
1174-1207.
Smith, R.L. and Weissman, I. (1987). Large deviations of tail estimators based on the
Pareto approximation. J. Appl. Probab. 24, 619-630.
Smith, R.L., Tawn, J.A. and Yuen, H.K. (1987). Statistics of multivariate extremes.
Preprint, University of Surrey.
Sneyers, R. (1984). Extremes in meteorology. In: Statistical Extremes and Applications,
Ed. J. Tiago de Oliveira, pp. 235-252. Dordrecht: Reidel.
Stigler, S.M. (1973). Studies in the history of probability and statistics. XXXII. Biometrika 60, 439-445.
Strasser, H. (1985). Mathematical Theory of Statistics. De Gruyter Studies in Math.
7, Berlin: De Gruyter.
Stute, W. (1982). The oscillation behaviour of empirical processes. Ann. Probab. 10,
86-107.
Sukhatme, P.V. (1937). Tests of significance for samples of the χ²-population with two
degrees of freedom. Ann. Eugenics 8, 52-56.
Sweeting, T.J. (1985). On domains of uniform local attraction in extreme value theory.
Ann. Probab. 13, 196-205.
Teugels, J.L. (1981). Limit theorems on order statistics. Ann. Probab. 9, 868-880.
Thompson, W.R. (1936). On confidence ranges for the median and other expectation
distributions for populations of unknown distribution form. Ann. Math. Statist. 7,
122-128.
Tiago de Oliveira, J. (1958). Extremal distributions. Rev. Fac. Cienc. Univ. Lisboa A
7, 215-227.
Tiago de Oliveira, J. (1961). The asymptotic independence of the sample means and
the extremes. Rev. Fac. Cienc. Univ. Lisboa A 8, 299-310.
Tiago de Oliveira, J. (1963). Decision results for the parameters of the extreme value
(Gumbel) distribution based on the mean and standard deviation. Trabajos de
Estadistica 14, 61-81.
Tiago de Oliveira, J. (1984). Bivariate models for extremes; statistical decisions.
In: Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 131-153.
Dordrecht: Reidel.
Tippett, L.H.C. (1925). On the extreme individuals and the range of samples taken
from a normal population. Biometrika 17, 364-387.
Torgersen, E.N. (1976). Comparison of statistical experiments. Scand. J. Statist. 3,
186-208.
Tusnády, G. (1974). On testing density functions. Period. Math. Hungar. 5, 161-169.
Umbach, D. (1981). A note on the median of a distribution. Ann. Inst. Statist. Math.
33, Ser. A, 135-140.
Uzgoren, N.T. (1954). The asymptotic development of the distribution of the extreme
values of a sample. In: Studies in Mathematics and Mechanics. Presented to Richard
von Mises, pp. 346-353. New York: Academic Press.
Vaart, H.P. van der (1961). A simple derivation of the limiting distribution function of
a sample quantile with increasing sample size. Statist. Neerlandica 15, 239-242.
Walsh, J.E. (1969). Asymptotic independence between largest and smallest of a set of
independent observations. Ann. Inst. Statist. Math. 21, 287-289.
Walsh, J.E. (1970). Sample sizes for approximate independence of largest and smallest
order statistic. J. Amer. Statist. Assoc. 65, 860-863.
Watson, G. and Leadbetter, M. (1964a). Hazard analysis I. Biometrika 51, 175-184.
Watson, G. and Leadbetter, M. (1964b). Hazard analysis II. Sankhya, Ser. A, 26,
101-116.
Watts, V., Rootzen, H. and Leadbetter, M.R. (1982). On limiting distributions of
intermediate order statistics from stationary sequences. Ann. Probab. 10, 653-662.
Weinstein, S.B. (1973). Theory and applications of some classical and generalized
asymptotic distributions of extreme values. IEEE Trans. Inf. Theory 19, 148-154.
Weiss, L. (1959). The limiting joint distribution of the largest and smallest sample
spacings. Ann. Math. Statist. 30, 590-593.
Weiss, L. (1964). On the asymptotic joint normality of quantiles from a multivariate
distribution. J. Res. Nat. Bureau Standards 68, Ser. B, 65-66.
Weiss, L. (1965). On asymptotic sampling theory for distributions approaching the
uniform distribution. Z. Wahrsch. verw. Gebiete 4, 217-221.
Weiss, L. (1969a). The joint asymptotic distribution of the k-smallest sample spacings.
J. Appl. Probab. 6, 442-448.
Weiss, L. (1969b). The asymptotic joint distribution of an increasing number of sample
quantiles. Ann. Inst. Statist. Math. 21, 257-263.
Weiss, L. (1969c). Asymptotic distributions of quantiles in some nonstandard cases.
In: Nonparametric Techniques in Statistical Inference, Ed. M.L. Puri, pp. 343-348.
Cambridge: Cambridge Univ. Press.
Weiss, L. (1971). Asymptotic inference about a density function at an end of its range.
Nav. Res. Logist. Quart. 18, 111-114.
Weiss, L. (1973). Statistical procedures based on a gradually increasing number of order
statistics. Commun. Statist. 2, 95-114.
Weiss, L. (1974). The asymptotic sufficiency of a relatively small number of order
statistics in test of fit. Ann. Statist. 2, 795-802.
Weiss, L. (1976). The normal approximations to the multinomial with an increasing
number of classes. Nav. Res. Logist. Quart. 23, 139-149.
Weiss, L. (1977). Asymptotic properties of Bayes tests of nonparametric hypothesis.
In: Statistical Decision Theory and Related Topics, II, Eds. D.S. Moore and S.S.
Gupta, pp. 439-450. New York: Academic Press.
Weiss, L. (1978). The error in the normal approximation to the multinomial with an
increasing number of classes. Nav. Res. Logist. Quart. 25, 257-261.
Weiss, L. (1979a). The asymptotic distribution of order statistics. Nav. Res. Logist.
Quart. 26, 437-445.
Weiss, L. (1979b). Asymptotic sufficiency in a class of nonregular cases. Selecta Statistica Canadiana V, 141-150.
Weiss, L. (1980). The asymptotic sufficiency of sparse order statistics in test of fit with
nuisance parameters. Nav. Res. Logist. Quart. 27, 397-406.
Weiss, L. (1982). Asymptotic joint normality of an increasing number of multivariate order statistics and associated cell frequencies. Nav. Res. Logist. Quart. 29,
75-96.
Weissman, I. (1975). Multivariate extremal processes generated by independent nonidentically distributed random variables. J. Appl. Probab. 12, 477-487.
Weissman, I. (1978). Estimation of parameters and large quantiles based on the k
largest observations. J. Amer. Statist. Assoc. 73, 812-815.
Wellner, J.A. (1977). A law of the iterated logarithm for functions of order statistics.
Ann. Statist. 5, 481-494.
Wilks, S.S. (1948). Order Statistics. Bull. Amer. Math. Soc. 54, 6-50.
Wilks, S.S. (1962). Mathematical Statistics. New York: Wiley.
Winter, B.B. (1973). Strong uniform consistency of integrals of density estimators.
Canad. J. Statist. 1, 247-253.
Witting, H. (1985). Mathematische Statistik I (Parametrische Verfahren bei festem
Stichprobenumfang). Stuttgart: Teubner.
Witting, H. and Nolle, G. (1970). Angewandte Mathematische Statistik. Stuttgart:
Teubner.
Wu, C.Y. (1966). The types of limit distributions for some terms of variational series.
Sci. Sinica 15, 749-762.
Yamato, H. (1973). Uniform convergence of an estimator of a distribution function.
Bull. Math. Statist. 15, 69-78.
Yang, S.-S. (1985). A smooth nonparametric estimator of a quantile function. J. Amer.
Statist. Assoc. 80, 1004-1011.
Zolotarev, V.M. and Rachev, S.T. (1985). Rate of convergence in limit theorems for
the max scheme. In: Stability Problems for Stochastic Models, Eds. V.V. Kalashnikov and V.M. Zolotarev, pp. 415-442. Lecture Notes in Mathematics 1155. Berlin:
Springer.
Zwet, W.R. van (1964). Convex Transformations of Random Variables. Amsterdam.
Math. Centre Tracts 7.
Zwet, W.R. van (1984). A Berry-Esseen bound for symmetric statistics. Z. Wahrsch.
verw. Gebiete 66, 425-440.

Author Index

A
Alam, K., 48
Ali, M.M., 238
Anderson, C.W., 202, 203
Arnold, B.C., 63

B
Bahadur, R.R., 216, 228
Bain, L.J., 204
Balkema, A.A., 146, 148
Barndorff-Nielsen, O., 198
Barnett, V., 63, 66, 67, 257
Becker, A., 63
Beran, 1., 228
Berman, S.M., 238
Bernoulli, N., 62
Bhattacharya, R.N., 69, 149, 231
Bickel, P.J., 61, 228, 271
Bloch, D.A., 271
Blum, J.R., 104
Boos, D.D., 291
Bortkiewicz, L. von, 62
Brown, B.M., 271
Brozius, H., 82

C
Chernoff, H., 210
Chibisov, D.M., 195, 202
Chow, Y.S., 102
Cohen, J.P., 202
Consul, P.C., 63
Cooil, B., 202
Craig, A.T., 63
Cramer, H., 2, 148
Criticou, D., 82
Csiszar, I., 104
Csorgo, M., 150, 227
Csorgo, S., 227, 285

D
David, F.N., 205, 226
David, H.A., 63
Davis, C.E., 271
Davis, R.A., 202
Deheuvels, P., 205, 285
Dodd, E.L., 62
Dronskers, J.J., 203
Du Mouchel, W., 291
Dwass, M., 149
Dziubdziela, W., 68

E
Eddy, W.F., 82
Efron, B., 220
Egorov, V.A., 149
Englund, G., 149
Erdelyi, A., 190

F
Falk, M., 102, 122, 149, 159, 164, 185,
186, 187, 195, 199, 202, 203, 224,
264, 265, 271, 291, 317
Feldman, D., 148
Feller, W., 227
Ferguson, T.S., 103
Finkelstein, B. V., 238
Fisher, R.A., 62, 172, 201, 202
Floret, K., 71
Frechet, M., 63
Freedman, D.A., 228

G
Galambos, J., 23, 37, 43, 63, 76, 80,
82, 162, 180, 201, 202, 228, 235,
238
Gale, J.D., 82
Galvani, L., 81
Gastwirth, J.L., 210, 271
Gather, U., 63
Geffroy, J., 238
Gini, C., 81
Gnedenko, B., 155, 201
Goldie, C.M., 202, 204
Gomes, M.I., 286, 290
Ghosh, J.K., 149
Ghosh, M., 48
Gross, A.J., 204
Guilbaud, O., 36
Gumbel, E.J., 62, 63, 149, 257

H
Haan, L. de, 63, 82, 146, 148, 155, 195,
201, 202, 291
Hajek, J., 58, 61
Haldane, J.B.S., 203
Hall, P., 68, 81, 202, 203, 285, 291
Hall, W.J., 202
Harrell, F.E., 271
Harter, H.L., 62
Has'minskii, R.Z., 274, 298
Häusler, E., 291
Hecker, H., 210
Heidelberger, P., 291
Helmers, R., 209, 211, 216, 227
Herbach, L., 290
Hewitt, E., 21, 57
Heyer, H., 294
Hill, B.M., 284, 290
Hillion, A., 104
Hodges, J.L. Jr., 227, 263, 271
Hoeffding, W., 104
Horvath, L., 227
Hosking, J.R.M., 259
Huang, J.S., 48
Hüsler, J., 233, 236, 239, 290, 291

I
Ibragimov, I.A., 103, 274, 298
Iglehart, D.L., 148
Ikeda, S., 2, 104, 149, 150, 203
Isogai, T., 81
Ivchenko, G.I., 150

J
Jacod, J., 204
Jammalamadaka, S.R., 228
Janssen, A., 227, 296, 317
Jayakar, S.G., 203
Joag-Dev, K., 238
John, M.V., 210
Johnson, N.L., 63, 209, 226, 277, 289,
290

K
Kabanov, Yu., 204
Karr, A.F., 204
Kendall, M.G., 150, 226
Kiefer, J., 148, 218, 228
Kinnison, R.R., 63
Klenk, A., 228
Kohne, W., 71, 149, 200
Kolchin, V.F., 150
Kotz, S., 63, 277, 289, 290
Kuan, K.S., 238
Kuo, M., 228

L
Lamperti, J., 149
Landers, D., 148
Laplace, P.S. de, 62, 148
Lawless, J.F., 204
Leadbetter, M.R., 23, 63, 149, 202,
271
Le Cam, L., 317
Lehmann, E.L., 227, 263, 271, 315
Lewis, P.A.W., 291
Lewis, T., 63
Lindgren, G., 23, 63, 149
Liptser, R.S., 204
Loeve, M., 83

M
Magnus, W., 190
Malmquist, S., 37
Mammitzsch, V., 271
Mann, N.R., 204, 290
Marshall, A.W., 238
Mason, D.M., 227, 285
Matsunawa, T., 149, 150, 203
Michel, R., 290
Miebach, B., 290
Mises, R. von, 62, 201, 202
Miyamoto, Y., 228
Montfort, M.A.J. van, 257, 290
Mood, A., 238
Moore, D.S., 81
Mosteller, F., 150, 211

N
Nadaraya, E.A., 271
Nagaraja, H.N., 63
Nelson, W., 204
Nevzorov, V.B., 149
Niinimaa, A., 81
Nolle, G., 315
Nonaka, Y., 150
Nowak, W., 150

O
Oberhettinger, F., 190
Oja, H., 81
Olkin, I., 238
O'Reilley, F.J., 82

P
Pantcheva, E.I., 204
Parzen, E., 271
Pathak, P.K., 104
Pearson, K., 63
Pfanzagl, J., 141, 146, 149, 215, 247,
248, 268, 270, 274
Pfeifer, D., 205
Pickands, J., 43, 76, 80, 177, 202, 227,
239, 291
Pitman, E.J.G., 102, 290
Plackett, R.L., 66, 67
Polfeldt, T., 227, 290
Prakasa Rao, B.L.S., 270, 271
Puri, M.L., 149
Pyke, R., 228

Q
Quesenberry, C.P., 82

R
Rachev, S.T., 202
Radtke, M., 176, 199, 200, 204
Ralescu, S.S., 149
Ramachandran, G., 227
Rao, R.R., 69, 142, 231
Raoult, J.P., 82
Reiss, R.-D., 68, 102, 103, 104, 122,
124, 128, 138, 147, 149, 150, 175,
196, 200, 202, 203, 224, 226, 233,
239, 262, 268, 270, 271, 286, 290,
291, 296, 317
Renyi, A., 36, 63, 148
Resnick, S.I., 63, 76, 202, 204, 227,
228, 235, 238, 291
Revesz, P., 150
Rice, J., 271
Rogge, L., 148
Rootzen, H., 23, 63, 146, 149, 202, 203
Rosenblatt, M., 82, 271
Rosengard, A., 149
Rossberg, H.J., 43, 149

Rüschendorf, L., 63, 82
Ryzin, J. van, 271

S
Schafer, R.E., 204
Schüpbach, M., 290
Sen, P.K., 148, 228
Sendler, W., 104
Serfling, R.J., 104, 150, 227, 289
Shiryaev, A.N., 204
Shorack, G.R., 150
Sibuya, M., 236, 238
Šidák, Z., 58, 61
Siddiqui, M.M., 237, 238, 271
Singh, K., 224, 228
Singpurwalla, N.D., 204
Smid, B., 149
Smirnov, N.V., 145, 148, 150, 271
Smith, R.L., 63, 202, 203, 204, 239,
286, 290, 291
Sneyers, R., 257
Stam, A.J., 149
Stigler, S.M., 148
Strasser, H., 317
Stromberg, K., 21, 57
Stuart, A., 226
Stute, W., 219, 228, 271
Sukhatme, P.V., 36
Sweeting, T.J., 159, 202, 203

T
Tawn, J.A., 239
Teicher, H., 102
Terzakis, D., 82
Teugels, J.L., 227, 291
Thompson, W.R., 63
Tiago de Oliveira, J., 61, 149, 238, 239,
289, 291
Tippett, L.H.C., 62, 172, 201, 202
Torgersen, E.N., 317
Tricomi, F.G., 150
Tucker, H.G., 148
Tusnády, G., 271

U
Umbach, D., 148
Uzgoren, N.T., 203

V
Vaart, H.P. van der, 148

W
Wald, A., 273
Walsh, J.E., 148, 149
Watson, G., 271
Watts, V., 202
Weinstein, S.B., 204
Weiss, L., 2, 39, 103, 149, 150, 203,
238, 290, 296, 311, 313, 317
Weissman, I., 291
Weller, M., 317
Wellner, J.A., 86, 104, 150, 202
Welsh, A.H., 285, 291
Wilks, S.S., 57, 59, 63
Winter, B.B., 271
Witting, H., 274, 315
Wolfowitz, J., 104
Wu, C.Y., 195, 202

Y
Yackel, J.W., 81
Yamato, H., 271
Yang, S.-S., 271
Yuen, H.K., 239

Z
Zahedi, H., 63
Zolotarev, V.M., 202
Zwet, W.R. van, 227

Subject Index

[Abbr.: o.s. = order statistic]

A
ADO (software package), 7
Annual maxima method, see Subsample
method
Associated r.v.'s, 238
Asymptotic distribution of
central o.s.'s, 145-146; see also
Asymptotic normality
extreme o.s.'s
k largest o.s.'s, 177-179
kth largest o.s.'s, 161-163
maxima, see (univariate, multivariate) Extreme value d.f.
minima, 24, 162
intermediate o.s.'s, 164, 195; see also
Asymptotic normality
Asymptotic independence of
groups of o.s.'s, 75, 121-123, 297
marginal maxima, 234-237
ratios of o.s.'s, 149
spacings, 201
Asymptotic normality of
central o.s.'s, multivariate, 229-232
central o.s.'s, univariate
strong, 22, 110-114, 131-142
weak, 108-110, 129

intermediate o.s.'s
strong, 164
weak, 109
kernel estimator, 263-264
linear combination of o.s.'s, 209-211,
215-216, 227
multinomial distributions, 150

B
Bahadur approximation, 216-220
Bandwidth, 249
Beta
function, 22
r.v., 22
Bonferroni inequality, 79, 102,
233
Bootstrap
distribution
of linear combination of o.s.'s,
228
of sample quantile, 222-226
smooth, of sample quantile, 265-268
error process, 224, 267
Borel set, 8
Brownian
bridge, 150
motion, 224

C
Cauchy distribution, 49, 199
Central limit theorem
Lindeberg-Lévy-Feller, 210
multi-dimensional, 231
Central o.s., see (central) Sequence
χ² distance, see Distance
distribution
central, 313
noncentral, 315
Comparison of models, 275-276, 292-299, 317
Componentwise ordering, see (multivariate) O.s.'s
Concomitant, 66
Conditional
density, 52
distribution, 51
of exceedances, 54-55, 61, 78
of i.i.d. random variables given the
o.s., 60
of o.s., see (univariate) O.s.'s
of rank statistic given the o.s., 60-61
independence under Markov property,
53, 61
Confidence procedure
bootstrap, 225-226
for quantile, 247
Convex hull of data, 66, 81-82

D
Data
temperature (De Bilt), 257-260
Venice sea-level, 286
Deficiency
ε-, of models, 295, 299
of estimators, 263
Δ-monotone, 77
Density quantile function, 243, 253
Dependence function, 80
Pickands estimator of, 80; see also
Kernel estimator
D.f., see Distribution function
Dirichlet distribution, 59
Distance
χ²-, 98-102, 328-330
between induced probability measures, 102
between product measures, 100
Hellinger, 98-102, 328
between induced probability measures, 101
between product measures, 100
Kolmogorov-Smirnov, 2
Kullback-Leibler, 98-100, 328
between product measures, 100
L1-, 94, 326
variational, 94, 326
between induced probability measures, 101
between product measures, 97-98,
327, 328-330
Distribution function (d.f.)
continuity criterion, 16
degenerate, 14, 76
endpoints of, 8
multivariate, 77-78
weak convergence, 2, 194-195
Domain of attraction, see (univariate) Extreme value d.f.
Dvoretzky, Kiefer, Wolfowitz inequality,
104

E
Edgeworth expansion, 91, 140
inverse of, 141
Efficiency, 273-275, 279, 283, 284
second order, see Deficiency, of estimators
Estimator
Bayes, 274
equivariant under translations, 284
kernel, see Kernel estimator
maximum likelihood, 259-260, 277-279, 298
minimum distance, 259
nearest neighbor density, 81
orthogonal series, 269-270, 315
Pitman, 274, 298
quick, of location- and scale parameters, 212-213, 289
randomized, 274; see also Sample,
median; Sample, q-quantile
of shape parameter, 277-279, 281-283, 284-286
of tail index, 279-281, 283-284, 284-286
Exceedances, see also (truncated, empirical) Point process
multivariate, 67-68, 81
univariate, 54, 190-193
Expansion of finite length, 90
of d.f.'s, 93
of distributions of
central o.s.'s, several, 131-135
central o.s.'s, single, 114-121, 138-140, 147-148; see also Gram-Charlier series
convex combination of o.s.'s, 213-215
k largest o.s.'s, 182, 184
kth largest o.s. 's, 184
maxima, 172-176
of moments of o.s.'s, 207-208
of normal distributions, 90-91, 102
of probability measures, 91-93
of quantiles of o.s.'s, 208-209, 226
Expected loss, see Risk
Exponential
d.f., 13, 42; see also Generalized Pareto d.f.
model, 282-283
Exponential bound theorem for
i.i.d. random variables, 83-84
kernel estimator, 262
o.s.'s, 84-86, 144-145
sample d.f., 218-219
sample q.f., 87-89
Extreme o.s.'s, see (extreme) Sequence,
maximum, and minimum
Extreme value d.f., multivariate, 75-77
max-stability of, 77
Pickands representation of, 76-77,
80
Extreme value d.f., univariate, 23, 24;
see also Frechet, Gumbel, and
Weibull d.f.
density of, 152
domain of attraction of, 24, 154-156,
157, 180, 194
max-stability of, 23
Extreme value model, see also Frechet,
Weibull, and Gumbel model
extended, 286
of Poisson processes, 194
3-parameter, see von Mises parametrization

F
Finite expansion, see Expansion of finite
length
Fisher information, 282
matrix, 276-277
Fisher-Tippett asymptote, see (univariate)
Extreme value d.f.
type I, see Gumbel d.f.
type II, see Frechet d.f.
type III, see Weibull d.f.
Fourier expansion, 269, 315
Frechet
d.f., 23; see also Extreme value d.f.
illustrations, 26, 153
mode of, 153
model, 276, 279
multivariate, model, 282
semiparametric, type model, 279-280,
283-284

G
Galton difference problem, 63
Gamma
function, 22
r.v., 39-40, 59
moments of, 181-182
Generalized Pareto
density, 157
illustrations, 196-198
d.f., 42
characterization of, 37, 43, 185
type I, see Pareto d.f.
type II, 42, 196
type III, see Exponential d.f.
Gram-Charlier series, 226
Gumbel
d.f., 23; see also Extreme value d.f.
illustrations, 25, 26
method, see Subsample method
model, 276-279

H
Hellinger distance, see Distance
Hill estimator, 284-285
Homogeneous Poisson process, see Point
process

I
Independent not necessarily identically
distributed (i.n.n.i.d.) r.v.'s
distribution of the o.s. of, 36
maximum of, 21
Informative, more, 294
Intensity measure, see Point process
Inverse, generalized, 318-320; see also
Q.f.

J
Jenkinson parametrization, see von Mises
parametrization
Jensen inequality, 103

K
Kernel
Epanechnikov, 253
method, 251-252
Kernel estimator of
density, 253, 269
illustrations, 258-259
density quantile function, 253, 260-262
dependence function, 239
d.f., 252-253, 262-264
inverse of: illustrations, 254-255
hazard function, 271
q.f., 252, 260-262, 264-265, 286-289
illustrations, 254-255, 288
Kolmogorov-Smirnov
distance, see Distance
test, see Test

L
Leadbetter's conditions, 202
Lebesgue's differentiation theorem, 71
L-statistic, see (linear combination of)
O.s.'s

M
Malmquist's result, 37-38
Marginal ordering, see Multivariate o.s.'s
Markov
kernel, 34, 293
distribution of, 34, 50, 293
property
conditional independence under, 61
of o.s.'s, 54
Maximum (also: sample maximum)
multivariate, 65
density of, 69
d.f. of, 68
univariate, 12, 21
density of, 22
dependence of, and minimum, see
Asymptotic independence
d.f. of, 20
with random index, 198, 280-281
Maximum likelihood, see Estimator
Max-stability, see Extreme value d.f.
Mean value function, see Point process
Median
multivariate, 66
univariate, 49
Minimax criterion, 274
Minimum (also: sample minimum)
multivariate, 65
density of, 69
d.f. of, 69
univariate, 12
density of, 22
d.f. of, 20
Mises, von
parametrization
of extreme value d.f.'s, 24-26
of generalized Pareto d.f.'s, 197-198
of Poisson processes, 194
-type conditions, 159-160, 199-200
Moderate deviation, see Exponential
bound theorem
Moving scheme, 249-250
illustration, 250
N
Newton-Raphson iteration, 259, 278
Normal
approximation, see Asymptotic
normality
comparison lemma, 149
distributions
expansion of, see Expansion of finite
length
moments of, 130-131
multivariate, 129-130, 146
univariate, 13
model, multivariate, 310-315
Normalization of maxima, 23, 156, 161,
200
nonlinear, 204
of normal r.v.'s, 160-161

O
Ordered distance r.v., 68
Ordering, total-I\J, 66-68
Order statistics (o.s.'s), multivariate, 65
density of, 71, 73-74
d.f. of, 69-70, 229-232, 232-237
I\J-, see Ordering
Order statistics (o.s.'s), univariate, 12
of binomial r.v.'s, 141-142
central, see (central) Sequence
conditional distribution of, given
o.s.'s, 52-54
convex combination of, 55-56
density of single, 21, 33
d.f. of, 20, 57
mode of, 49
unimodality of, 48-49
of discrete r.v.'s, 35-36, 139-142
extreme, see (extreme) Sequence
independence of, from underlying d.f.,
123-128
intermediate, see (intermediate) Sequence
joint density of
absolutely continuous case, 27-28,
30-32
continuous case, 33
discontinuous case, 35-36, 58
linear combination of, 56, 209-216,
227
local limit theorem for, 142-144
Markov property of, 54
moments of
exact, 44-45, 59-60
inequalities for, 45-47, 86-87
positive dependence of, 61
with random sample size, 149
ratios of
of generalized Pareto r.v.'s, 43
of uniform r.v.'s, 37-38, 58
representation of
of exponential r.v.'s, 37
of uniform r.v.'s, 38-42, 59
sparse, 28
of stationary, normal sequence,
146
Outlier, 62, 63

P
Pareto d.f., 42, 196, 289; see also Generalized Pareto d.f.
illustrations, 196, 198
Partial maxima process, 74-75
Penultimate distribution, 172
Pickands estimator, see Dependence
function
Point process
empirical, 190-194
intensity measure of, 193
mean value function of, 193
Poisson, 190-194
homogeneous, 191-192
truncated, 190-194
Poisson
approximation of
binomial r.v., 162, 190
empirical point process, 190-194,
204-205
process, 281; see also Point process
Polygon, 248-249
illustration, 250
Probability integral transformation
multivariate, 81
of o.s. 's, 18
univariate, 14, 17, 34
Probability paper, 6, 257

Q
Q.f., see Quantile function
Q-quantile, 13
Quantile
function (q.f.), 13, 19
continuity criterion, 320
estimation of, 286-289
parametric estimation of, 256; see
also Kernel estimator
weak convergence, 19
process, 150, 264
smooth, 264
transformation
multivariate, 81
of o.s.'s, 15, 17-18, 76
univariate, 14, 17
Quasi-quantile, 250-251

R
Ranking, see Ordering
Rank statistic, 55, 60
Regression, linear, 314
Risk, 273
Bayes, 274

S
Sample
d.f., 13, 59
oscillation of, 218-219
maximum, see Maximum
median, multivariate, 66
median, univariate, 14
randomized, 50, 60, 246
minimum, see Minimum
q.f., 13
illustrations, 250, 254-255,
288
maximum deviation of, 87-88
oscillation of, 88-89, 261
smooth, see Kernel estimator
q-quantile, 14, 247-248
randomized, 247, 268
Scheffe lemma, 95-97, 325
Sequence
of lower or upper extremes,
12
of o.s.'s
central, extreme or intermediate,
12
Skewness of extreme value density, 25-26
Smoothing technique, see Kernel method
Spacings, 29, 36-37, 147, 201, 212,
227-228
Strong convergence of unimodal probability measures, 103
Subsample method, 165, 176, 185
Sufficiency, 294-295; see also Deficiency
approximate, 295
Blackwell-, 293-295
Sukhatme's result, 36
Sum of extremes, 227
Survivor function, 69, 79, 234
Sweeting's result, 159
Systematic statistic, 211-213

T
Tail
equivalence of
densities, 157-159
d.f. 's, 156
index, 204, 280-286
Test
χ²-, 313
Kolmogorov-Smirnov, 313
of quantiles, 244-246, 268-269
Threshold
non-random, 191, 193
random, 55
Transformation
of models, see Comparison of models
technique, see Quantile, transformation; Probability integral transformation
theorem for densities, 29, 57
Trimmed mean, 68, 211, 251
Truncation
of d.f., 52, 57, 194
of point process, 191, 193

U
Unbiased estimation
expectation, 273
median, 50-51, 247, 248, 268, 274
Unimodal density, 48
mode of, 48
strongly, 48

V
Variational distance, see Distance

W
Weibull
d.f., 23, 199; see also Extreme value
d.f.
illustrations, 26, 27, 154, 258-259
mode of, 154
model, 317