
Generalizations of Tchebycheff's Inequalities

Author(s): C. L. Mallows
Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 18, No. 2
(1956), pp. 139-176
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2983702
Accessed: 19-08-2017 03:25 UTC

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
http://about.jstor.org/terms

Royal Statistical Society, Wiley are collaborating with JSTOR to digitize, preserve and
extend access to Journal of the Royal Statistical Society. Series B (Methodological)

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
Journal of the Royal Statistical Society
SERIES B (METHODOLOGICAL)

Vol. XVIII, No. 2, 1956

GENERALIZATIONS OF TCHEBYCHEFF'S INEQUALITIES

By C. L. MALLOWS

University College, London

[Read before the RESEARCH SECTION of the ROYAL STATISTICAL SOCIETY,


March 14, 1956, Dr. H. E. DANIELS in the Chair]

PART I

SUMMARY

WE may often obtain the lower moments of a sampling distribution when the dis-
tribution itself is intractable. Tchebycheff's celebrated inequalities for the cumula-
tive distribution function are of very little use in practice, being "much too wide".
The present work obtains generalizations of these inequalities by restricting the
distribution function to be "smooth" in two senses: firstly, by restricting the number
of changes of sign of some derivative, and secondly by also restricting the size of this
derivative. These conditions lead to certain "extremal" distributions, and hence to
the required inequalities. These have been proved correct only in the simpler
cases, but are conjectured to hold in general. In Part I some of the simpler results
are given explicitly.

1. Introduction
1.1. The technique of "fitting a curve" to the initial moments of a theoretical sampling distribution, whose form is unknown, suffers from a serious defect, namely that we have no idea how close to the true distribution the approximation is. An important technique for finding various properties of a distribution function is that of Tchebycheff's inequalities, and the numerous extensions of them (see 2.2). Here, we assume the knowledge of the initial moments of the distribution and we construct upper and lower limits for the (cumulative) distribution function. These limits are precise, in that they are attained for some special forms of distribution with the given moments; but they have not found favour with statisticians because they are "much too wide". It is found, for example, that the Pearson system and Johnson's (1949) systems of curves agree remarkably closely over a large area of the (β_1, β_2) plane (Dennis, unpublished), while for these values of β_1 and β_2 the Tchebycheff limits are extremely wide.
The Tchebycheff limits are attained only by distributions consisting of discrete point-masses,
which exhibit an extreme form of "un-smoothness"; while the Pearson and Johnson curves are,
in some sense, "smooth". This suggests that it should be possible to obtain narrower limits than
those of Tchebycheff by placing some restriction on the shape of the distributions.
1.2. The problem is, then, to define this rather loose idea of "smoothness", and to obtain inequalities analogous to those of Tchebycheff under this additional restriction. In the present work, the definition used (Definition I) leads to inequalities which are direct generalizations of those of Tchebycheff, and are precise in the same sense. The proof that these inequalities are in fact correct depends on three main theorems (Theorems I, II, and III), of which the first two are proved in all generality; the third theorem has been found correct in all the (simpler) cases so far tackled directly.
In this paper (Part I), the method is applied to the simplest two-moment cases. Part II is concerned with algebraic properties of the limits, as well as some computational methods. Also included are some numerical results.

140 MALLOWS-Generalizations of Tchebycheff's Inequalities [No. 2,

2. Review of Previous Results


2.1. Tchebycheff stated his inequalities without proof (Tchebycheff, 1874); they were proved
almost simultaneously and by the same method by Markoff (1884) and Stieltjes (1884). Stieltjes
later (1894) gave another proof, which is the one leading most naturally to the generalizations
below. Shohat & Tamarkin (1943) give an extensive mathematical treatment of the problem
of moments, and refer to a work in Russian by Akhieser & Krein (1938) which is unfortunately
unobtainable in England. Shohat & Tamarkin quote (op. cit., p. 82) a "typical problem following
the treatment of Akhieser & Krein" which suggested the form of the solution under the smoothness condition (0, λ) (see Definition I).
Verblunsky (1936) has given necessary and sufficient conditions for the existence of a solution of the moment-problem (n, 0, λ) (see 4.1) with terminals 0 and 1. Akhieser & Krein (1940), in a note in English, point out that Verblunsky's results are contained in the more general theorems of their series of three papers in a Russian journal (1934 and 1935); and a footnote (1940, p. 131) seems to imply that their book (1938) contains such investigations as these (e.g. relating to the moment-problems (n, 0, λ) in the interval −1 < x < 1) "and some others similar". They do not seem to have treated the moment-problems (n, k, λ) for any k other than zero, or to have noticed the generalization of Tchebycheff's inequalities which follows from their work on the (n, 0, λ) case.
A recent paper by H. L. Royden (1953) treats the problem of giving limits for a distribution
function with one terminal known. Hoeffding (1955) has improved Tchebycheff's inequalities
by assuming the random variable involved to be a sum of independent random variables.
2.2. Numerous generalizations of the result commonly known as Tchebycheff's inequality, or as Bienaymé's inequality, namely

P{ |x − μ| ≥ tσ } ≤ 1/t²,   t ≥ 1   .   .   .   (1)

where x is any random variable with mean μ and variance σ², have appeared in the statistical literature. See Fréchet (1937) for an exposition of the main variants, and van Dantzig (1951) for a more recent investigation. These results are summarized in Savage (1952). Bienaymé gave the inequality (1) in 1853; but a result of Gauss of 1821 gives a stronger inequality for continuous unimodal distributions, namely

P{ |x − m| ≥ ts } ≤ 4/(9t²)   .   .   .   (2)

where x is any random variable with a continuous distribution function having a unique mode m, and

s² = σ² + (μ − m)²

(see Cramér, 1946, 15.7 and Ex. 4, 5, 6, p. 256). Occupying a rather unique position among these generalizations is the inequality of Bernstein (see Uspensky, 1937; Fréchet, 1937). Bernstein requires an upper limit for each moment of the random variable, and obtains a limit to the probability of a deviation from the mean exceeding, say, t times the standard deviation; the limit obtained decreases exponentially with t².
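As a numerical check, inequalities (1) and (2) can be compared with an exact tail probability. The sketch below is mine, not part of the paper; it takes x to be a unit Normal variable, for which m = μ and s = σ, so that both bounds apply directly.

```python
import math

def normal_two_sided_tail(t):
    # P(|X| >= t) for X ~ N(0,1), via the complementary error function
    return math.erfc(t / math.sqrt(2))

for t in [1.5, 2.0, 3.0]:
    exact = normal_two_sided_tail(t)
    tcheby = 1.0 / t**2            # inequality (1)
    gauss = 4.0 / (9.0 * t**2)     # inequality (2), with m = mu and s = sigma
    assert exact <= gauss <= tcheby
    print(f"t={t}: exact={exact:.4f}  Gauss={gauss:.4f}  Tchebycheff={tcheby:.4f}")
```

The Gauss bound is always 4/9 of the Tchebycheff bound, and for a Normal variable both are far above the true tail probability, illustrating why the unrestricted limits are "much too wide".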

3. Smoothness
3.1. One thing the Pearson and Johnson systems of curves have in common is their differenti-
ability. By itself, however, this will not make possible any improvements on Tchebycheff's limits;
for we can always obtain a function which is differentiable as far as required and which will approach
as closely as desired the "staircase" of a point-mass distribution, and hence the Tchebycheff limits
themselves.
If the (k + 1)th derivative of a distribution function exists and is everywhere continuous, it must have at least k zeros between the terminals; the following definition restricts this number to exactly k, and also imposes a restriction on the value of the derivative.

Definition I
A (cumulative) distribution function F(x) is said to satisfy the smoothness condition of order k with bound λ if its (k + 1)th derivative exists and is continuous everywhere, and if there exist k + 2 numbers β_0, β_1, ..., β_{k+1} such that

0 < (−1)^j F^(k+1)(x) < λ,   β_j < x < β_{j+1},   j = 0, 1, ..., k

where β_0 and β_{k+1} are the terminals of a bounded distribution, and are conventionally

β_0 = −∞,   β_{k+1} = +∞

for unbounded distributions.
Distributions satisfying this condition will be said to satisfy the smoothness condition (k, λ). Notice the strict inequalities in the definition. Thus F^(k+1)(x) must have zeros at the k points β_1, β_2, ..., β_k, and no others between β_0 and β_{k+1}. A distribution function F(x) will be said to satisfy the "wide sense" smoothness condition (k, λ) if it is the limit of a sequence of distribution functions satisfying the smoothness condition (k, λ).
The smoothness condition (0, ∞) will lead to Tchebycheff's inequalities. The condition (1, ∞) is that a unique mode (not an antimode) exists. The condition (0, λ) is that the derivative of the distribution function (i.e. the elementary probability law) is bounded by λ. This condition is due to Markoff (see Shohat & Tamarkin, 1943, p. x) and the construction of the corresponding "extremal distributions" (see 5.1) is due to Akhieser and Krein (Shohat & Tamarkin, 1943, p. 82).
3.2. Some research is needed on the problem of determining what smoothness conditions will be satisfied in any particular case. Consider the behaviour of the mean of a sample of n from a rectangular population; here the distribution function satisfies the smoothness conditions (k, ∞) for k = 0, 1, ..., n − 2. As n → ∞, the function tends to the Normal form, which satisfies the conditions (k, ∞) of all orders (k = 0, 1, ...).
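For n = 3 the rectangular example can be checked directly. Up to a location-scale change (which does not affect the smoothness conditions), the mean has the piecewise-quadratic Irwin-Hall density of the sum of three rectangular variables; the standard piecewise formulae used below are not taken from this paper. The derivative of the density changes sign exactly once, so condition (1, ∞) holds, with (0, ∞) holding trivially, i.e. (k, ∞) for k = 0, 1 = n − 2.

```python
import numpy as np

def irwin_hall3_density_deriv(x):
    # Derivative of the density of the sum of three independent U(0,1) variables;
    # the density is piecewise quadratic, so its derivative is piecewise linear.
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 3.0 - 2.0 * x
    if 2 < x <= 3:
        return x - 3.0
    return 0.0

xs = np.linspace(0.001, 2.999, 10001)
signs = np.sign([irwin_hall3_density_deriv(x) for x in xs])
signs = signs[signs != 0]
changes = int(np.count_nonzero(signs[1:] != signs[:-1]))
print("sign changes of f'(x):", changes)  # exactly one: the unique mode at x = 1.5
```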
We would suppose that in all such cases, where the distribution is that of the sum of n inde-
pendent random variables and the limiting form is Normal, the order of the smoothness condition
satisfied for any n will increase with n. In cases where the limiting form is not Normal, it is not
obvious what sort of result to expect. However, in cases where the limiting form is known exactly,
and also the distribution can be obtained exactly for some low values of n, we may expect to be
able to deduce some smoothness properties of the distributions for intermediate n.
The problem of what value to take for λ, having decided on k, is again one for research. A paper by Birnbaum (1948), while not immediately applicable, perhaps indicates one line of attack. A numerical study of the values of the maxima of the first and second derivatives of the standardized t and χ² distributions for various degrees of freedom shows that the maximum of the second derivative tends to its limiting value rather more slowly than the maximum of the first derivative. This is possibly a typical result.

4. The Problem
4.1. We may now state the problem more precisely. It is desired to find functions L(x), U(x) such that any distribution function F(x) which
(i) satisfies the n + 1 equations

∫_{−∞}^{∞} x^s dF(x) = μ_s,   s = 0, 1, ..., n

for some given numbers μ_0 = 1, μ_1, ..., μ_n; and
(ii) satisfies the smoothness condition (k, λ) of 3.1 (Definition I);
satisfies also the inequalities

L(x) < F(x) < U(x),   −∞ < x < ∞

where the limits L(x), U(x) are the best possible.


Notice that the inequalities are strict; this follows from the strictness of the inequalities in the smoothness condition. We shall speak of a distribution function satisfying conditions (i) and (ii) as being a "solution of the moment-problem (n, k, λ)", with the moments μ_s implied.
If the limits L(x), U(x) are to be the best possible, they must be monotonic increasing; so we may obtain limits for a percentage point X_ε, i.e. the solution of F(X_ε) = ε, in the form


X_1 < X_ε < X_2

where X_1 and X_2 are given by

U(X_1) = ε,   L(X_2) = ε.
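As an illustration of this inversion, take the moment-problem (2, 0, ∞) for a standardized variable (mean 0, variance 1). The sharp two-moment limits are of the familiar one-sided (Cantelli-type) form; section 7 treats this case, but the explicit expressions below are assumptions of this sketch rather than quotations from the paper.

```python
import math

# Assumed sharp two-moment limits for a variable with mean 0, variance 1:
def L(x):  # lower limit for F(x)
    return x * x / (1.0 + x * x) if x > 0 else 0.0

def U(x):  # upper limit for F(x)
    return 1.0 / (1.0 + x * x) if x < 0 else 1.0

def percentile_bounds(eps):
    # Invert the limits: X1 solves U(X1) = eps, X2 solves L(X2) = eps.
    x1 = -math.sqrt(1.0 / eps - 1.0)   # from 1/(1+x^2) = eps with x < 0
    x2 = math.sqrt(eps / (1.0 - eps))  # from x^2/(1+x^2) = eps with x > 0
    return x1, x2

x1, x2 = percentile_bounds(0.95)
print(f"95th percentile lies in [{x1:.3f}, {x2:.3f}]")  # roughly [-0.229, 4.359]
```

Both end-points are attained by two-point distributions with the given mean and variance, which is exactly the "un-smooth" extremal form the later sections set out to exclude.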

5. Extremal Distributions

5.1. Consider the moment-problem (2m, k, λ). It is shown below (6.5) that the knowledge of μ_{2m+1} will not make possible any improvement of the limits; so we consider only an even number of moments. We shall assume that the given moments are general; if, for example, the odd moments vanish, the following arguments may have to be modified.
The central result of this paper may be summarized as follows:
In all the cases considered in detail (and, it is conjectured, in all cases), if the moment-problem (2m, k, λ) has a solution F(x), then the limits L(x), U(x) of 4.1 can be obtained in the following way. We construct a simple infinity of distribution functions of certain forms, to be called "extremal" distributions, which will cross any solution F(x) either 2m or 2m + 1 times; this will be found to constrain any such solution F(x) to lie entirely within the area of the plane occupied by the extremal distributions. Thus the edges of this area will be the required limits. The area may be regarded as being swept out by a variable extremal distribution.
The next sections (up to 5.9) will show how the extremal distributions are to be constructed.
5.2. A "simple intersection" of two distribution functions A(x), B(x) is to be taken as meaning a value x_0 of x such that

A(x_0 − 0) ≤ B(x_0) ≤ A(x_0 + 0)

while A ≠ B for x just less than x_0 and for x greater than some x_1 ≥ x_0 (if x_1 > x_0 we must have A = B for x_0 < x < x_1); A − B changing sign from just below x_0 to just above x_1.

Definition II

An "extremal distribution" E(x) corresponding to a moment-problem (2m, k, λ) is a distribution function satisfying the wide sense smoothness condition (k, λ) and having the correct moments (up to μ_{2m}), such that no solution of the moment-problem has more than 2m + 1 simple intersections with E(x).
5.3. We construct what will be called a "general extremal distribution" E(x) corresponding to a moment-problem (2m, k, ∞) as follows. The proof that this distribution is in fact an extremal distribution according to the definition is given in Theorem II below (6.3).
(i) Let α_0, α_1, ..., α_m be m + 1 distinct ordered values of x.
(ii) We define

E(x) = 0,   x < α_0;   E(x) = 1,   x > α_m.

(iii) In each interval (α_j, α_{j+1}), E(x) is an arc of a polynomial of at most the kth degree.
(iv) To each point α_j (j = 0, 1, ..., m) corresponds a "characteristic number" n_j such that the (k − n_j)th derivative of E(x) has a simple discontinuity at α_j (i.e. the (k − n_j − 1)th derivative is continuous at α_j). We call the ordered set of integers (n_0, n_1, ..., n_m) the "character" of E(x).
(v) n_j ≥ 0,   Σ n_j = k.
(vi) ∫_{−∞}^{∞} x^s dE(x) = μ_s,   s = 0, 1, ..., 2m.
(vii) E(x) satisfies the wide sense smoothness condition (k, ∞).
5.4. Thus for k = 0, the only possible character is (0, 0, ..., 0) and E(x) is constant in each of the intervals (α_{j−1}, α_j) (j = 1, 2, ..., m). This is the point-mass or "staircase" distribution.
Suppose E(x) has a jump of magnitude p_j at α_j (j = 0, 1, ..., m). Then the p_j are the weights associated with the points α_j. We may represent the distribution diagrammatically as in Fig. 1(a). This is a diagrammatic representation of a singular function we shall call E′(x); this function is zero everywhere except at the m + 1 points α_j, and it satisfies

∫_{α_j−0}^{α_j+0} E′(x) dx = p_j,   j = 0, 1, ..., m.

E(x) itself is as shown in Fig. 1(b).

FIG. 1.

Similarly for k = 1, we represent the extremal distribution with character (0, ..., 0, 1, 0, ..., 0) (with the unit in the (r + 1)th position) as in Fig. 2. This is a diagrammatic representation of a function we shall call E″(x), which is zero everywhere except at the m + 1 points α_j, and has

∫_{α_j−0}^{α_j+0} E″(x) dx ≠ 0,   j = 0, 1, ..., r − 1, r + 1, ..., m,

0 < ∫_{α_r−0}^{α_r+0} dy ∫_{α_r−0}^{y} E″(x) dx < ∞.

E′(x) and E(x) are as shown in Fig. 2(b) and (c).

For general k, we may represent the extremal distribution E(x) with character (n_0, n_1, ..., n_m) by its (k + 1)th derivative, a singular function E^(k+1)(x), which is zero everywhere except at the m + 1 points α_j, and has (n_j + 1)-ple singularities at these points. The total number of singularities (counting n-ple ones n times) is thus m + k + 1, and the number of changes of sign (which occur only at multiple singularities) is Σ n_j = k.
The above indicates how to find the form of the extremal distributions for the moment-problems (2m, k, ∞). It will be found that, after satisfying the 2m + 1 moment conditions, there is just one degree of freedom in the extremal distribution.
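For the simplest case m = 1, k = 0 this single degree of freedom is easy to exhibit: a standardized two-point staircase distribution is fixed once the position a of its lower point is chosen. The parametrization below is mine, introduced only as a sketch.

```python
def two_point_extremal(a):
    # Two-point (staircase) distribution with mean 0 and variance 1,
    # lower point at a < 0.  Weights p at a and 1 - p at b must satisfy
    #   p*a + (1-p)*b = 0  and  p*a^2 + (1-p)*b^2 = 1,
    # which give b = -1/a and p = 1/(1 + a^2), leaving a itself free.
    b = -1.0 / a
    p = 1.0 / (1.0 + a * a)
    return (a, p), (b, 1.0 - p)

(a, p), (b, q) = two_point_extremal(-2.0)
mean = p * a + q * b
var = p * a * a + q * b * b
print(f"points {a}, {b}; weights {p}, {q}; mean {mean}, variance {var}")
```

Sweeping a over (−∞, 0) sweeps out the whole family of extremal staircases for this moment-problem, which is exactly the "simple infinity" of extremal distributions described in 5.1.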
5.5. Suppose a general extremal distribution (of the above type) exists for some value of α_0. By continuity, extremal distributions with the same character will exist for all α_0 in some (open) interval (a, b) say. If we try to fit an extremal distribution by taking α_0 to be either a or b, we shall obtain a "special extremal distribution" of one of two types. These two types have characters which are a modification of those described in 5.3.
For the first type, the character consists of only m integers n_j (j = 0, 1, ..., m − 1), summing to k. This is a general character corresponding to the moment-problem (2m − 2, k, ∞). The corresponding special extremal distribution is a general extremal distribution corresponding to


this moment-problem. It is the limit of the extremal distribution with character (0, n_0, ..., n_{m−1}) as α_0 → −∞, and of the extremal distribution with character (n_0, ..., n_{m−1}, 0) as α_m → +∞. It does not satisfy the last (2m)th moment condition, and so is not an extremal distribution corresponding to the moment-problem (2m, k, ∞).
The second type has a character consisting of m + 1 integers n_j, summing to k − 1, with an oblique stroke between two of them, say n_i and n_{i+1}; thus (n_0, ..., n_i/n_{i+1}, ..., n_m). It is the limit of the extremal distribution with character (n_0, ..., n_i + 1, n_{i+1}, ..., n_m) as α_0 tends to the lower extreme of its possible interval, and the limit of the extremal distribution with character (n_0, ..., n_i, n_{i+1} + 1, ..., n_m) as α_0 tends to its upper extreme value. The diagrammatic E^(k+1)(x) has (n_j + 1)-ple singularities at the points α_j as before, but now has an extra change of sign between two of these points, as indicated by the oblique stroke in the character. This is an extremal distribution according to the definition.
5.6. As an example, consider the moment-problem (4, 2, ∞). (Thus m = 2, four moments.)
There are six different characters, namely

(2, 0, 0), (1, 1, 0), (1, 0, 1), (0, 2, 0), (0, 1, 1), (0, 0, 2).

FIG. 2.

There are three special extremal distributions of the first type, with characters

(2, 0), (1, 1), (0, 2).

There are six special extremal distributions of the second type, with characters

(1/0, 0), (0/1, 0), (0/0, 1), (1, 0/0), (0, 1/0), (0, 0/1).

5.7. We now define the concept of a "complete cyclic set" of extremal distributions, corresponding to a moment-problem (2m, k, ∞).

Definition III
A complete cyclic set of extremal distributions is a set of extremal distributions which may be made to depend on a single parameter θ in such a way that:
(i) to any value of θ, 0 ≤ θ ≤ 1, corresponds an extremal distribution E_θ(x), which is special of the first type for θ = 0, special of the second type for k other values of θ in (0, 1), and general for the remaining values of θ;
(ii) as θ varies continuously from 0 to 1, the extremal distributions E_θ(x) vary continuously through all the extremal distributions of the set.
θ could be a monotonic function of α_0; then the special extremal distribution of the first kind would be the limit as α_0 → −∞ of a general extremal distribution with character (0, n_1, ..., n_m)


say. The upper limit of α_0 for a general extremal distribution with this character would give a special extremal distribution of the second kind. Increasing α_0 still further, we have general extremal distributions with a different character; then another special extremal distribution; and so on until the cycle is completed by α_0 → +∞. Running α_0 from −∞ to +∞ would mean going through the whole cycle m + 1 times.
A complete cyclic set of extremal distributions must thus contain:

just one special extremal distribution of the first type (E_0) with character (n_0, n_1, ..., n_{m−1}) say,
general extremal distributions with k + 1 different characters, two of which are (0, n_0, ..., n_{m−1}) and (n_0, ..., n_{m−1}, 0), and
k special extremal distributions of the second type.
5.8. We may approach the idea of a complete cyclic set from a slightly different angle, using
the concept of a cyclic set of characters, as defined below.

Definition IV
A character B is "adjacent" to a character A (written A → B) if it can be obtained from A by:
(i) moving all the n_j (j = 0, 1, ..., m − 1) one place to the right, and placing n_m, which must be zero, in the initial position; or
(ii) adding unity to some n_j (not n_m), and subtracting unity from n_{j+1} (which must not be zero).
Notice that A → B does not imply B → A.

Definition V

A set of k + 1 characters A_0, A_1, ..., A_k form a cyclic set if

A_0 → A_1 → ... → A_k → A_0

where the first k adjacencies are of type (ii), and the last one is of type (i).
Thus for example in the (4, 2, ∞) case of 5.6, there are just four cyclic sets, namely:

   (a)         (b)         (c)         (d)
(0, 2, 0)   (0, 1, 1)   (0, 1, 1)   (0, 0, 2)
(1, 1, 0)   (1, 0, 1)   (0, 2, 0)   (0, 1, 1)
(2, 0, 0)   (1, 1, 0)   (1, 1, 0)   (0, 2, 0)
A complete cyclic set of extremal distributions can be shown to contain general extremal distributions with k + 1 characters, which form a cyclic set, together with the k + 1 transitional special extremal distributions. In the (4, 2, ∞) case, these have characters

   (a)         (b)         (c)         (d)
(0/1, 0)    (0/0, 1)    (0, 1/0)    (0, 0/1)
(1/0, 0)    (1, 0/0)    (0/1, 0)    (0, 1/0)
 (2, 0)      (1, 1)      (1, 1)      (0, 2)

The cyclic set (a) is shown in Fig. 3.

For the moment-problem (2m, k, ∞), there are (m + k)!/(m! k!) possible characters, and mk cyclic sets of characters.
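The character count can be checked by brute force, since a character is just an ordering of non-negative integers n_0, ..., n_m with Σ n_j = k. A sketch:

```python
from itertools import product
from math import comb

def characters(m, k):
    # All characters: tuples (n_0, ..., n_m) of non-negative integers summing to k.
    return [c for c in product(range(k + 1), repeat=m + 1) if sum(c) == k]

chars = characters(2, 2)   # the (4, 2, inf) case of 5.6: m = 2, k = 2
print(len(chars), sorted(chars))
assert len(chars) == comb(2 + 2, 2)   # (m + k)!/(m! k!) = 6
```

For m = 2, k = 2 this reproduces the six characters listed in 5.6.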
5.9. We now construct general extremal distributions for the moment-problems (2m, k, λ) with λ < ∞ (and with general moments). An extremal distribution E(x) has a character of the same form as in the (2m, k, ∞) case. Its (k + 1)th derivative exists everywhere except at 2m + k + 2 points; where it exists, its value is zero or ±λ.
If a function G(x) takes the value +λ in (a, b), and is zero just below a and just above b, we say that G(x) has a positive block on (a, b). If G(x) is +λ in (a, b), −λ in (b, c), and is zero just


below a and just above c, we say that G(x) has a double block on (a, c). Similarly we may have triple, ..., multiple blocks.
If E(x) has character (n_0, n_1, ..., n_m), then E^(k+1)(x) consists of m + 1 ordered blocks, of multiplicities n_j + 1 (j = 0, 1, ..., m). The number of blocks (counting n-ple ones n times) is thus m + k + 1, and the number of changes of sign (which occur only in multiple blocks) is k. Notice that the form of E^(k+1)(x) for a given character may be found from the diagram for E^(k+1)(x) for the moment-problem (2m, k, ∞) (with the same character) by a "widening" process. A simple singularity becomes a single block, a multiple singularity a multiple block.

FIG. 3.

For the (2m, 0, λ) and (2m, 1, λ) moment-problems, the extremal distributions are as shown in Fig. 4(a) and (b).
We may define a complete cyclic set of extremal distributions as before (Definition III). There are special extremal distributions of two types, which may be obtained from those for the λ = ∞ case by the "widening" process above.

6. The Basic Theorems


6.1. We shall now prove two of the three basic theorems.

Theorem I
Any two distribution functions with the same initial moments μ_0, μ_1, ..., μ_{2m} have at least 2m simple intersections.


This theorem is essentially that used by Stieltjes in his second proof of Tchebycheff's inequalities
(see e.g. Uspensky, 1937).

Proof

Suppose F_1(x) and F_2(x) are any two distribution functions continuous on the left for all x, having the same moments μ_0, μ_1, ..., μ_{2m}. Then the integrals

∫_{−∞}^{∞} x^r {F_1(x) − F_2(x)} dx

FIG. 4.

exist for r = 0, 1, ..., 2m − 1, and are all zero. Suppose F_1 and F_2 have s (< 2m) simple intersections ξ_1, ξ_2, ..., ξ_s. Consider

J = ∫_{−∞}^{∞} ∏_{i=1}^{s} (x − ξ_i) · {F_1(x) − F_2(x)} dx.

Since s < 2m, the product is a polynomial of degree at most 2m − 1, so J is a linear combination of the integrals above and vanishes. But the integrand never changes sign, and is non-zero over a finite interval in each of the intervals (ξ_i, ξ_{i+1}) (i = 1, 2, ..., s − 1). This is a contradiction; so F_1 and F_2 must have at least 2m simple intersections.
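Theorem I can be illustrated numerically. The example below is mine: a unit Normal and the two-point distribution with masses 1/2 at ±1 share μ_0, μ_1, μ_2 (so 2m = 2), and counting the sign changes of F_1 − F_2 on a grid exhibits the guaranteed intersections.

```python
import math

def F1(x):  # standard Normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def F2(x):  # point masses 1/2 at -1 and 1/2 at +1: same mean (0) and variance (1)
    if x < -1:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0

xs = [i / 100.0 for i in range(-400, 401)]
diffs = [F1(x) - F2(x) for x in xs]
signs = [1 if d > 0 else -1 for d in diffs if abs(d) > 1e-12]
crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
print("simple intersections found:", crossings)  # 3 >= 2m = 2
```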
6.2. Before proceeding to Theorem II, we shall prove a lemma.


Lemma

A sufficient condition for the existence of a solution of the moment-problem (2m, k, λ) is the existence of a general extremal distribution as constructed in 5.3 et seq.

Proof
A general extremal distribution is not itself a solution of the moment-problem, because of the strictness of the inequalities in Definition I. However, we can find a solution differing arbitrarily little from it by the following construction.
We may define the "distance" d(1, 2) between two μ-points μ^(i) = (μ_{1,i}, μ_{2,i}, ..., μ_{2m,i}) (i = 1, 2) in the 2m-dimensional moment-space by

d²(1, 2) = Σ_{j=1}^{2m} (μ_{j,1} − μ_{j,2})².

If the moment-problem (2m, k, λ) with μ-point μ^(1) admits of a general extremal distribution E^(1)(x), then so will all the moment-problems (2m, k, λ) with μ-points μ^(2) satisfying

d(1, 2) < d_0

for some positive d_0. Thus for any such μ^(2) we may find a general extremal distribution E^(2)(x), and a random variable e^(2) having E^(2) as distribution function.
Let u be a unit Normal variable independent of e^(2), and consider the random variable

f = e^(2) + εu   .   .   .   (3)

for some positive ε. f has a μ-point μ(f), and the distribution function of f satisfies the (k, λ) smoothness condition; we wish to choose μ^(2) so that μ(f) = μ^(1).
We have from (3) the vector equation

{1, μ(f)} = {1, μ^(2)} (I + M)

where I is the unit (2m + 1) × (2m + 1) matrix, and M is a triangular matrix with zeros in and below the leading diagonal. (I + M)^{−1} exists, so if we take

{1, μ^(2)} = {1, μ^(1)} (I + M)^{−1}

we shall have μ(f) = μ^(1). For sufficiently small ε we have d(1, 2) < d_0 and so e^(2) exists. We may so choose e^(2) that the distribution function F(x) of f tends uniformly to E^(1)(x) for all x as ε → 0 (except for one point if E^(1)(x) has a character of the type (0, ..., 0, k, 0, ..., 0) and λ = ∞, when E^(1)(x) has a discontinuity at one point).
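The triangular relation {1, μ(f)} = {1, μ^(2)}(I + M) can be written out explicitly, since by independence E[f^s] expands binomially in the moments of e^(2) and the moments of the unit Normal u. A sketch with illustrative moment values (the numbers are mine, not from the paper):

```python
from math import comb

def normal_moment(r):
    # E[u^r] for u ~ N(0,1): zero for odd r, (r-1)!! for even r.
    if r % 2 == 1:
        return 0
    m = 1
    for i in range(r - 1, 0, -2):
        m *= i
    return m

def perturbed_moments(mu, eps):
    # Moments of f = e + eps*u from moments mu[0..n] of e (mu[0] = 1):
    #   E[f^s] = sum_j C(s, j) * mu[j] * eps^(s-j) * E[u^(s-j)].
    # The j = s term is mu[s] itself, so the map is I plus a strictly
    # upper-triangular correction, as in the vector equation above.
    n = len(mu) - 1
    return [sum(comb(s, j) * mu[j] * eps**(s - j) * normal_moment(s - j)
                for j in range(s + 1)) for s in range(n + 1)]

mu_e = [1.0, 0.0, 1.0, 0.0, 3.5]   # illustrative moments of some e (mean 0, variance 1)
out = perturbed_moments(mu_e, 0.1)
print(out)
```

Each output moment is the corresponding input moment plus lower-order terms of order ε², which is why (I + M)^{−1} exists and the perturbation can be undone by adjusting μ^(2).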
6.3. The second basic theorem is obvious in the Tchebycheff case, i.e. for the smoothness condition (0, ∞).

Theorem II
No distribution function which is a solution of the moment-problem (2m, k, λ) can have more than 2m + 1 simple intersections with any general extremal distribution corresponding to this moment-problem (i.e. the "general extremal distributions" constructed in 5.3 et seq. are in fact extremal distributions according to Definition II).

Proof
Suppose the moment-problem (2m, k, λ) admits of a general extremal distribution E(x) and a solution F(x), having just n simple intersections. For any δ, ε > 0, we may construct as in the Lemma a solution G(x) of the moment-problem such that

|G(x) − E(x)| < ε


for all x except possibly those for which |x − ξ| < δ, where ξ is the point of discontinuity of E(x) (this happens only if E(x) has a character of the type (0, ..., 0, k, 0, ..., 0) and λ = ∞).
For sufficiently small δ, ε, we have that:
(i) the function
H(x) = F(x) − G(x)
has just n simple zeros;
(ii) H^(k+1)(x) exists for all x;
(iii) by the construction of E(x),

H^(k+1)(x) = F^(k+1)(x) − G^(k+1)(x)

has not more than 2m + k + 2 simple zeros. This last property is the essential feature of the "general" extremal distributions as defined in 5.3 et seq.
Now H(x) → 0 as x → +∞ or −∞; so H′(x) has at least n + 1 simple zeros; similarly H″(x) has at least n + 2 simple zeros; and H^(k+1)(x) has at least n + k + 1 simple zeros. Hence from (iii) we have

n ≤ 2m + 1.

6.4. We now state the third basic theorem, which has not been proved in general.

Theorem III
A necessary and sufficient condition for the existence of a solution of the moment-problem (2m, k, λ) is the existence of a complete cyclic set of extremal distributions.
This theorem has been found to hold in all the cases so far tackled by direct algebra. The sufficiency follows from the Lemma above (6.2); for a complete cyclic set of extremal distributions must contain an infinity of general extremal distributions.
If Theorem III is assumed to be correct, it follows that the required limits are

L(x) = min_{0 ≤ θ ≤ 1} E_θ(x),   U(x) = max_{0 ≤ θ ≤ 1} E_θ(x),   −∞ < x < ∞.   .   .   (4)

This may be shown as follows. Suppose that F(x) is any solution of the moment-problem (2m, k, λ) (with general moments), having terminals β_0, β_{k+1}; and E_θ(x) (0 ≤ θ ≤ 1) is a complete cyclic set of extremal distributions, θ being a parameter as in Definition III. Then for any θ in (0, 1), F(x) and E_θ(x) have either 2m or 2m + 1 simple intersections; and of these exactly m will be upward-crossing (i.e. F(x) − E_θ(x) passing from negative to positive). Since E_0(x) is a general extremal distribution corresponding to the moment-problem (2m − 2, k, λ), F(x) has exactly m − 1 upward-crossing simple intersections with E_0(x).
We may thus define m functions A_i(θ) with A_1 < A_2 < ... < A_m for 0 < θ < 1, each A_i(θ) being an upward-crossing intersection of F(x) with E_θ(x). Let A_i(0) (i = 1, ..., m − 1) be the ordered upward-crossing simple intersections of F(x) with E_0(x).
Since F(x) is continuous and monotonic increasing, and since the E_θ(x) vary continuously with θ, these functions A_i(θ) are continuous, and satisfy

A_1(θ) → β_0 as θ → 0,
A_j(0) = A_{j−1}(1),   j = 2, 3, ..., m,
A_j(θ) < A_{j+1}(θ),   j = 1, 2, ..., m − 1,
A_m(θ) → β_{k+1} as θ → 1.

Now define
B(φ) = A_{[φ]+1}(φ − [φ]),   0 < φ < m,
where [φ] is the integral part of φ. Thus for φ in (0, 1), B(φ) = A_1(φ); for φ in (1, 2), say φ = 1 + θ, B(φ) = A_2(θ). B(φ) is uniquely defined for 0 < φ < m, and is continuous in this range.

B(φ) → β_0 as φ → 0,   B(φ) → β_{k+1} as φ → m,
B(j) = A_{j+1}(0) for j = 1, 2, ..., m − 1.


Hence for aniy ,, P0 < X < 3k+1, there is a p in (0, m) such that B
(f, F(3)) is an upward-crossing simple intersection of F(x) with some extremal distribution (e.g.
that E0(x) with 0 - [cp]). This means that F(x) must lie completely within the area swept
out by the extremal distributions E0(x) as 0 varies from 0 to 1; i.e. the limits L(x), U(x) are as
given in equations (4) above.
6.5. We now prove the assertion above (5. 1) that the limits for an even number of moments
2m say, cannot be improved by the knowledge of the next odd moment L2m + 1. Suppose we
wish to prove this for a point x = X. Since the limit L(x) is the best possible, we can construct
a distribution function Fx(x) which (cf. the construction in the lemma of 6.2)
(i) is a solution of the moment-problem (2m, k, A)
(ii) approaches the limit L(x) as closely as desired at x X; in fact suppose that for
some given ? > 0 we have
Fx(X) - L(X) < E/2
(iii) has 2m + 1 th moment ,Ux say (not under our control).

Suppose that for - Xo < 4 < oo, F(x) is a family of distribution functions which*
(i) are solutions of the moment-problem (2m, K, X)
(ii) have 2m + Ith moment
X 2m+? dFc(x) =
Consider the function
V_2____ XX__ 112m?1 - L F~()
F(x) _ - v. Fx(x) +).
For sufficiently large C (positive or negative according as 12m+1>
have 0 < Fc < 1, F(x) is a distribution function which
(i) is a solution of the moment-problem (2m, k, k)
(ii) has 2m + I th moment V2m+1
(iii) satisfies
| F(x) - Fx(x) I < 4/2
and thus satisfies
F(x) - L(x) < s.

Thus we have constructed a solution of the moment-problem (2m + 1, k, X) wit


2m + 1 th moment t.2m+1, approaching the limit L(x) to within ? at x = X. Sin
L(x) cannot be improved at x = X.
This argument can be repeated for all X, and can obviously be adapted to prove the same
result for U(x).

7. Examples
7.1. We shall obtain the generalized Tchebycheff inequalities for the moment-problems
(2, 0, oc), (2, 1, oc), and (2, 0, X). Theorem III holds in all these cases. We may assume the
given moments to be
(UO, [1, V2) (1, 0, 1).

For the (2, 0, oc) moment-problem, the extremal distributions have character (0, 0), and so
consist of two point-masses, po and Pi, say at points ao < oc, (see Fig. 5(a)). On fitting this distri-
butioni to the given moments, we find that

= _1 1 p P=
= O P? 1 +0C02 1 + CXo2

There is one special extremal distribution (of the first type), which has character (0). This is a
unit mass at the origin. The required limits are thus
* Seep. 176.

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
1956] MALLOWs-Generalizations of Tchebycheff's Inequalities 151

L(x)- { X2 X>? U(x) = 21 + X2


+ x 2 ~~1 x> 0
We may therefore write

p{u > x} < + 2 X > 0


1 + x2

where u is any standardized random variable. This is the simplest of Tchebycheff's inequalities.
7.2. For the (2, 1, oo) moment-problem, there is just one (complete) cyclic set, namely

(0, 1)
(1, 0)

monerttl
t.c>lrl Ciaracter E) L:

O e~ o |

0 0-

(b) (20,i CO) (1,0)

O~~~~~~~~~~~X ...

FIG. 5.

The extremal distributions with character (1, 0) consist of a point-mass p at oco with a unifor
distribution of total mass (1 - p) on the interval (oc0, oc,). (See Fig. 5(b).) On fitting this distr
bution to the given moments, we find that

3 = 3(1 + o2c2

We must therefore have - V3 < oo < 0. For this range of values of oco,

0 x < or.0

E(x) x -49(1 (3 + 2) + o)2 X oco < x < oc (5)


{ oC, < X

There are two special extremal distributions (one of each type), with characters (1) and (0/0).
The first of these is a unit mass at the origin (oc0 -?0 in (5)), the second a uniform distribution on
(-\/3, + /3) (oc -? - V3 in (5)).
The character (0, 1) gives symmetrical results. The limit L(x) for positive x thus consists of:
(i) the locus of the point (x, 1 - p) (x > 0)
(ii) the envelope of the line in (5) for oco < x < ocr.
If this line is y mx + c, the envelope to the system obtained by varying oco is given paramet
by

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
152 MALLOws-Generalizations of Tchebycheff 's Inequalities [No. 2,

c' mc'
X= - / y =C--t

In this case, we find


I 4oc02 4 1
o,o 1 = 9(1-+ loU2) 9 1 + X2

The curves (i) and (ii) intersect at the point

/5 5
x= '\3 Y= 6

Hence we have for positive x


J 4X2 O5?V
4 91 3 ( (6)

U(x)= 4 1 0 < x
We may therefore write
p{u > x} < 1-L(x) x > 0
where L(x) is given by (6), and u is any standardized unimodal random variable. We notice
that the improvement over the Tchebycheff limit is considerable, being by a factor

4 for i x i > 5.

This factor has been obtained by several writers (see e.g. Gauss' inequality (2)), but it is believed
that the form of the limits in (6) is new.
7.3. For the (2, 0, k) moment-problem, the form of the extremal distributions is as shown in
Fig. 5(c). It follows from a result in Shohat & Tamarkin (1943) that for a solution to the moment-
problem to exist, we must have

>1
>2 / 3'
On fitting the required form of extremal distribution to the given moments, we find tha
L(x) is one root of-

(x + ) L2 - (x+ ) +1 - L + 0(x+ )=O

for x > - I

We notice the following properties of these limits (U(x) is symmetrical).


(i) As X, ->- oo, the limits approach those found in 7. l1for the (2, 0, oo) case.
(ii) As X I - the limits tend to the uniform distribution function
2x\/3'
0 1 x <- v3

E(x) == + x -VA3< x <?V\3


2 2V/3
1 + /3 < x.

(ii) As x co, L(x) is given asymptotically by

L(x) = I -a + 1 +~ + (7)
where

62= - 1
12X2.

This may be compared with the corresponding result in the Tchebycheff case,

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
19561 MALLOWs-Generalizations of Tchebycheff's Inequalities 153

L(X) =- 1 -X2 + I-1 +

8. General Procedutre
8. 1. To obtain the generalized Tchebycheff inequalities in any given case, we proceed as follows:
(i) Choose n, k, X. We shall obviously try to use as many moments as possible, to obtain
the closest possible limits. Unfortunately, the results become very complex algebraically, though
the difficulties are not insuperable if numerical methods are used. The choice of the smoothness
condition (k, k) is much more critical; unfortunately, we have as yet very little guide as to the
correct choice to make in any given case.
(ii) Find the correct cyclic set for the given moments. This may be done by finding which
special extremal distributions may be fitted. If none can be fitted, the moment-problem has no
solution.
(iii) Compute the general and special extremal distributions (either algebraically or numerically);
the limits are then given by (4) (6.4). They will be in part the loci of "corners" of the extremal
distributions, and in part envelopes.
8.2. The limits can be found explicitly for the (2, k, oo) moment-problems, for all k. The
extremal distributions in the cases (4, k, o) for k = 0, 1 (and possibly for k > 2) and (2, 0, ?)
require the solution of quadratic equations. The cases (6, k, cc) for k = 0, 1 and (2, 1, ?) require
the solution of cubics. Theorem III holds in all these cases.
The process of obtaining the limits as loci and enivelopes rapidly becomes enormously compli-
cated; the simplest equation of the form
f(L, x) 0
giving the lower limit L(x) in the (4, 1, oo) case appears to be of degree at least several hund
in L and x. We may however use numerical and/or graphical methods; and we may be able
to obtain asymptotic expansions giving approximations to the limits for large x, as in (7) for the
(2, 0, k) case.

PART II

SUMMARY

THE technique of "fitting a curve" to the initial moments of a theoretical sampling


distribution suffers from the defect that so far there has been no way of estimating
how accurate is the approximation. The present paper investigates the accuracy
of approximation to be expected when curves are fitted by moments and satisfy
certain conditions of unimodality, smoothness, etc., as specified in Part I of this
paper, using the methods there developed.

9. Preliminary
9. 1. In Part I, sections 5.3 et seq., it was shown how to construct "general" extremal distribu-
tions for the moment-problems (2m, k, X). To calculate the generalized Tchebycheff inequalities
in any given case, we must fit the extremal distributions of the correct forms to the given moments.
There will be one degree of freedom in the extremal distributions; if we allow an extremal dis-
tribution to vary continuously through the members of a complete cyclic set, its cumulative
distribution function will sweep out the area between the required limits L(x) and U(x).
In the present Part some of the algebraic properties of the extremal distributions are examined,
and several cases are treated at length. The algebra is simplified by the use of the I-function
defined below.
9.2. Following Shohat & Tamarkin (1943) (which will be referred to as S & T), we define
0I

I (Z)= dF(x_ I (7)

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
154 MALLOwS-Generalizations of Tchebychegff's Inequalities [No. 2,

for complex z, where F(x) is a (cumulative) distribution function. We exp

IF(Z) = VOZ-' + V-l1Z-2 + VL2Z-3 + . . . . (8)


This expansion is purely formal; we are not concerned with the con
and will use it only as a convenient algebraic shorthand.
This I-function is a sort of characteristic function of the distribution F(x); it satisfies an in-
version formula. It is very apt to the present problem, as the I-functions corresponding to the
extremal distributions are simple in form. However, it will not be found to be very useftul in
statistics generally, as it does not assume a simple form in many important cases; for example,
when F(x) is the Normal distribution function. We shall use the notation

A -./B

where A, B are functions of z, to denote that the expansions of A and B in inverse powers of z
agree up to the term in z-n.
We define I(z) (without a suffix) for the moment-problem (2m, k, k) as

I(z) _ VoZ-' + V-lZ-2 + . . . + Z-2m-1


Thus any solution F(x) of the moment-problem must satisfy

2m+1I
IF (Z) -I(Z)

together with the smoothness condition (k, ?).

10. The Tchebycheff Inequalities


10.1. In this (2m, 0, oo) case, which is very fully treated in S & T, the general extremal distri-
butions consist of m + 1 point-masses pi say at points cxi (i 0, 1, . . . m). We thus have
from the definition (7)
00

IE(Z) XdE(x) Po + Pi Pm
IEW == - -h +. F - 9
-aco
coZ - X Z - o Z - C1 Z - OCm

Thus we may write IE(Z) = P(z)/Q(z) where Q(z) is a polynomial of the m + 1 th degree, having
zeros Oci, and P(z) is another polynomial. The following is an identity:

I4- o V- 2m m Q0) v ___ . (10)v.


Q(Z)
kz(z+ ***+
z 1 P(Z)
IZ Z+ z + 2 + * + 0)

where P(z) is the p


be replaced by V.
m

S piOT== =L r =0, 1, . . 2m
i=O
which are contained in
2m+1
IEWZ I(Z) . ....(1)
The condition (11) thus requires

v. " Q(v.) = O s = , 1, .. m-1 I (12)

The system of m equations (12) are linear in the m + 2 coefficients of Q(z), and so allow one
degree of freedom in the m + 1 ratios between them. Following S & T, we call these equations
the conditions of quasi-orthogonality (of Q(z)), and Q(z) a quasi-orthogonal polynomial of order
m + 1, associated with the momients o, v-, . . . 2m
The condition that the moment-problem has a solution is well-known; it is that

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
1956] MALLows-Generalizations of Techebycheff's Inequalities 155

U-o -i . . . -s
U-1 U-2 >0 s= 0, 1, . . . m . . . (13)
U-s U-2s

10.2. The following properties of the polynomials Q(z) are taken from S & T. Given any c,
there exists a quasi-orthogonal polynomial with a zero at a, which is
1 Z z2 ... Zmrn1 0 .1 z . .. zm
1 C C2 . . M+l 1 -o -i U -rn
Q(Z) = -o 0-1 U-2 U . . -M + (Z - a) a U1 =(z - x) Q*(z) . (14)

Uns--i U-2m n a r m U - U -2m

say. This polynomial is unique within a constant factor. The coefficient of zmrl in Q(z) is a
polynomial in cc, QO(a) say, which m has distinct real zeros 81, a, . . am. If ac takes any of
these values, Q(z) becomes essentially QO(z), and so has zeros ai. For such a value of ac has the
effect of making one zero of Q(z) infinite. Then the polynomial Q(z), looked on as a poly-
nomial in a, has m zeros 8i and one infinite zero; i.e. the coefficient of amr+ vanishes. But this
coefficient is - QO(z).
QO(z) is the orthogonal polynomial associated with the moments U-o, U-i, . . . U-2rn-1, and
will give the special extremal distribution. If Qj(z) is any other quasi-orthogonal polynomial,
then any third polynomial Q(z) is a linear function of QO(z) and Q&(z). In fact, if Po(z), Pj(z),
are the numerators corresponding to QO(z) and Qj(z), then any general extremal distribution
E(x) has
P(Z) PO(z) + tPl(z)
IE( = QQ(Z) QO(Z) + tQ1(z) (15)
for some parameter t. The parameter 0 of Definition III (Part I) can easily be found in terms
of t.
For cx real but not equal to any ai, the zeros (xi of Q(z) satisfy

XO < 81 < a, < . . . < am < am

where one of the ac is equal to a.


We may write
Q(z) = QO(C) (z - cc) H (z - cci)
i*j
(where c = cj). We have from (9)

Pi = P(cca)/Q'(cc.) i =, 1, . . . m
with
QO(M) = QO(a) H (ai - acj)
j*i
Also from (10),
P(z) = {Q(Z) - Q(0)}I(z - U-)
so

P(ai) = QO(c) H (U1 - ai)


J*i
We thus obtain the simple result

Ai = 11 (- (Xi i 0, 19 .............. m . . . . (16)


i* i ati - aX
where the oc's are the zeros of the polynomial Q(z) of (14), and U- is symbolic as before.
10.3. We may thus compute the extremal distribution for any given a by
(i) finding the other oc's as the zeros of the polynomial Q*(z) of (14),
(ii) finding the pi corresponding to each cci from (16). Then the Tchebycheff limits
L(x), U(x) for any x are obtained by taking cc = x, when

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
156 MALLowS-Generalizations of Tchebycheff's Inequalities [No. 2,

L(x) = Y_ Pi, U(x) = E pi


ai<a ai<a

Any distribution function having the same initial moments will satisfy

L(x) < F(x) < U(x) -oo < x < oo

The above in fact holds whether or not x is a zero of Qo(z); in thi


10. 4. (4, 0, oo). The Tchebycheff limits for the two-moment case were found in (7. 1), Part 1.
In the four-moment case, we may take

( P-0 [ j-1 P p2,P 3, P-4) I (, 0, I , P-3, P-4)

We assume for the present that P-3 and L4 are general. The conditions (13) become
A= P4- P32- 1 > 0
From (14),
QO(oc)- -2 + P3 +
and so
al, 2 = -3T ? v(P32 + 4))
Also
Q*(z) = Qo(oc) Z2 (P3 QO(oc) + CA) Z - Qo(a) A
It will be found that if oc = oco < 81, we have for the zeros of Q*(z)

81 < C1 < 82 < C2

while if 81 <oc = oc < a2, then oco < 81, 2 < X2 where now the zeros of Q*(z)
for 8, < oc = OC2. Suppose oc = o < a,; then from (15),

Po__m_)(P- 1 + C1o2
(OC -OCl)(OC - X2) - OC(C1 + ?C2) + OC10C2

A/{(Qo((X))2 + A(1 + oC2)}


which can be obtained in one operation on a calculating machine. p(x) is U(x) for x < 81: so
we have
P{V < X} <P(X) x < al

where v is any random variable having the given first four moments. Similarly

P{V >X} Ap(X) X > 2

To obtain the limits for 81 < x < a2, we take oc = x, and find the zer

1 + (X(X2 P =1 + OCGtO
Po (xC -oc)(co - oC2) P (2 -oc )(?C2 -OC)
The limits are then
L(x) = po, U(x) = 1-p2 al < x < 82

If in the above we put VL3 = 0, we have 81, 2 = F 1, and so

p(X) = 04- 1)/(oC4 + (-4 - 3) X2 + V-4)

The value V4 = 3 gives the simple result


P{v > x} < 2/(X4 + 3) x > 1

10.5. The major part of the computation for higher m is the formation of the polynomial
Q*(z); however, we do not need this if we only require the limits for x < 81, or x > 8rn. For
oc < 8, we find

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
1956] MALLOWs-Generalizations of Tchebycheff 's Inequalities 157

0 1 1 . . . Ocm

Po = p(0) - I / a - i
U-m . . . Ur2m OCm "*m U 2m
so

P{W < X} < p(X) x < a1

where w is any random variable having the given initial moments. Similarly

P{w > X} < p(X) X > am

Here Al, am are the smallest and the largest of the zeros of QO(z).

11. I-functions in the Generalized Case.

11. 1. It will be found that the I-functions of the general extremal distributions constructe
in 5.3 et seq. have simple forms. We shall derive first those for the moment-problems (2m, k, X
with X < oo, and shall obtain those for the (2m, k, oo) cases by letting X -> oo.
11.2. Suppose a functionf(x) = F'(x) has a simple block on (a, b); i.e.,f(x) is zero for x < a
and x > b, and is + X on (a, b). Then for any real z not in (a, b),

IF(Z) X log (z - a) - X log (z - b)


while for complex z we have
exp A-1 IF(Z) = (z - a)/(z - b)

Iff(x) has two blocks, on (a, b) and on (c, d), we have

exp A-1 IF(Z) = (z - a)(z - c)/(z - b)(z - d)

and so on. Thus for the moment-problem (2m, 0, X), a general extremal distribution E(x)
m + 1 blocks (yi, oci) (i = 0, 1, . . . m) has
exp A-1 IE(Z) = S(Z)I Q(Z) . . . . .(1 7)
where S(z) has zeros yi, Q(z) has zeros oci, and

Yo < oco < Y1 < . . . < Ym < OCm. * (18)

11.3. When k > 1, we obtained E(x) most simply by considering its (k + I)th derivative.
Since (integrating by parts)

(d-) IE (Z) = X dE(z


we shall here obtain the simplest form by considering

(d )k

Thus in the (2m, 1, X) case, where E'(x) has say r + 1 positive blocks (yi, cci) (i = 0, 1, . . . r
and m + 1 - r negative blocks (oci, Yi+?) (i r, . . . m) (so that there is a double block on
(yr, yr+D) we have

exp IIEWz Z -Yo Z - Yl Z-Yr Z -Yr+]. Z-Ym+l


Idz) z-yo z-X z-y -yr 7Z- z-yM. Z c
=S(z)/Q(z) . . . . . . . . . .(19)

where now S(z) and Q(z) are of degree nm

YO < ocO < Y1 < . . . < Yr < oCr < Yr+l < . . . < ocm < Ym+ I * (20)

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
158 MALLoWs-Generalizations of Tchebycheff 's Inequalities [No. 2,

It will be found tlhat if this system of labelling the blocks is continued to the general case with
k > 2, then Q(z) (of degree m + k + 1) will have [(k + 1)/2] double zeros, and S(z) [(k/2],
and these double zeros will themselves alternate. Thus for example in the (6, 4, X) case, the gener
extremal distribution E(x) with character (1, 2, 0, 1) has

1 d4IE(z) (z - yO)(z - Y) (Z - Y2)2(Z -Y3) (Z- Y4) (z - Y5)2


eXp~ dz4 - (z-oc0)2 (Z- O1)(Z-oC2)2 (Z-oc3 * (Z-OC4)(Z-oc5)
with
YO < XO < YI < . . . < Y5 < ?C5
and so also
ocO < Y2 < M2 < Y5

In the general case, there are 2m + k + 2 distinct zeros, of which k are double.
11.4. We now obtain the I-functions for the extremal distributions with X infinite. Suppose
a moment-problem (2m, k, oo) has a general extremal distribution E(x) with character
(no, ni, . . . nm), and lower terminal 40, E(k+l)(x) having singularities at , >i, . M.
For sufficiently large A, the moment-problem (2m, k, X) with thne same moments has a general
extremal distribution E)(x) with the same character and with the same lower terminal yo = 40
and we may write
exp X 1(d/dz)k IEA(Z) = SA(Z)/1QA(z)

where SA and QA are polynomials of degree m + k + 1 as above. Define Pa(z) by

PA(Z) - (- 1), (SA(Z) - QA(Z))

and let X -* oo. Then EI(x) -? E(x) uniformly except at points of discontinuity of E(x). The
zeros of QA(z) fall into m + 1 groups, corresponding to the distinct nj + 1-ple blocks of EA(x),
all zeros of a group tending to one of the points of singularity of E(k+1)(x). Thus for example in
the (6, 4, X) case above, if X ?c we shall have

oco 0, (1, X2 _> 1, X3 _> 42, X4, X5 __ 3


In general,
QA(Z)_ Q(Z) (Z - io)no+l (Z - iJ)n1+l . . . (z - im)nm+?

uniformly in any finite region of z. Also

x exp Id 1EA(Z)- I d IE(


which must be rational, and so PA(z) tends to some polynomial P(z). Thus we have finally

(_ J )k (d/dz) k IE(Z) == P(Z)I Q(Z)

where Q(z) is a polynomial of degree m + k + 1, having only m + 1 distinct zeros it, these zeros
having multiplicities ni + 1, and P(z) is another polynomial, which must have degree m, since
the expansion of the L.H.S. starts with a term in z-k-1. We shall usually call the zeros of Q(z)
C0, . . . m.

11 . 5. As a generalization of (9) we now find that P(z)/Q(z) can be expressed as a sum of m + 1


partial fractions of the form constant/-r(z), where r(z) is a polynomial of the k + 1 th degree in z,
having for zeros a set of k + 1 adjacent zeros of Q(z); those zeros with characteristic number nj
counting nj + 1 times (as in Q(z) itself). Thus for example in the case k = 1 we have

dIE(z) = P(z) r __ + + (21)


dz - Q(Z) ii= (Z -_ OC_)(z -_ C) (z - cc)2 i=r+1 (z - ci-1)(Z - c ()21

(where n, = 1). In this case, the numerators of the partial fractions are simply
the total mass associated with the several intervals (oc-1, oci). There is a point-mass p at ccr. For
higher k however, the numerators are not simply these "probabilities".

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
1956] MALLoWS-Generalizations of Tchebycheff's Inequalities 159

11 . 6. We may generalize the analysis of 10. 1 as follows. We write

Vr = r(r -1) . . . (r -k +0 1)r-k? r = k9 k + 13, . . . k + 2m


v,r= 0 r= 0,1, . . . k-i.

Thus we have
(_ 1)k(d/dz)k I(z) = VOZ-1 + V1Z-2 + * * * + Vk+2m Z

If Q(z) is any polynomial of the in + k + 1 th degree, we have

z++ z
k+2+1
. , P ~~~~z
~ + + Q2 + + z
vm1+Q(v
z . (22)

where P(z) is the polynomial part ofithe product, and v is used symbolically in the same way as
v in 10.1; i.e. we are to replace vs by v. after expansion. Thus vO = vo = 0 for k > 1. If now
we restrict Q(z) to satisfy the conditions of "generalized quasi-orthogonality", namely

vsQ(V)=0 s=O, 1, . . . m-i.


we have from (22)
k+ 2m+ 1
( J1)7k (d/dz)7kIE(Z) = P(z)/Q(z) _ (_ J)k (d/tz)7k I(z).
Q(z) is further restricted to be of the required form; i.e.

Q(z) = const. (z - OCO)nl?1 (z - CX)ni_+1 .. (z - (Xm)nm+


for some (ordered) constants cxi. We have thus m conditions on the m + 1 values (xi.

12. Some Special Cases with X Infinite

12. 1. (2m, 1, oo). Let us consider the case k = 1. The form of the I-function is given by
(21). The extremal distribution itself is linear in the intervals (oi-c, cxi), and is continuous every-
where except at a,; it is concave upwards for x < a. and concave downwards for x > jr. We
shall sometimes write ( for dxr. Q(z) is given by

1 z z2 zm?2 0 0 1 z ...zm
1 332 M3m+ 2 0 1 v 0 V1 Vm
0 1 2(3 (m+2)(3m+l 1 m 'V
Q(Z) 'V | v V I2 Vm+2 -(Z- )2 2( t2 V2 (z _)2 Q* (z)

Vmi- V2m+1 (m + 1) PM PM+1 Vm?+ V2m+1


say. We have from (21)

Pi P(Co0) P(OCr)_; Pm P(OCm)


( - -o Q'(cxo)' 2-Q"(Ccr)' ?Cm - ?m-1 Q'(@Lm)
Pi p i - i= 1,2,... r-1, r+1, .. .m.. . (23)
oci - oci-I (xi+, - (xi Q (ocr)

We may obtain in the same way as before for the Tchebycheff case

P(OO) v-jcr r ( v- ioc* r


Q'(cci) ai - jr j * i )ci iCr

I(00) - (v - OcL) rI (V - . . . (24)


I Q".( 0C r) j =r Or-()
where the cxi are the zeros of Q(z), and v is symbolic as before.
The condition that the given momenits allow of a solution to the moment-problem
some (

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
160 MALLows-Generalizations of Tchebycheff's Inequalities [No. 2,

1 ( ... s+1

Rs(8)- w sl vsl > 0 s =


Vs V2s+I

This condition is identical with that found by Johnson & Rogers (1951). If this is satisfied
for some (, then it is satisfied for all ( in an interval (Pro -1, Pro) between two adjace
Rm(F3). The mode of any solution of the moment-problem must lie in this interval. The coefficient
of zm in Q*(z) vanishes for just one ( in this interval, say ( = 30. We may fit extremal distributions
with r= ro and 30 < ( < P,0 and with r = ro1 and Pro-, < ( < 30 the value ( = 30 gives
rise to an extremal distribution which is special of the first type (se, 5.5, Part I), one of the points
oci having gone to infinity. The values ( = Pro-, and ( = Pro each give rise to the same distribu
tion, which is special of the second type; the point-mass at ( having vanished.
Thus in this case, Theorem III of 6.4 (part I) is true.
The generalized Tchebycheff inequalities may be obtained as the boundaries of the region
swept out by the extremal distributions as ( varies over the interval (Pr0-l, Pr). They wil
in part of the loci of the points
((, E Pt), (3,p + YE P)
i<r i<r

and in part of the envelopes of the m lines wth equations

(oci- oi-i) y = pix + EiY pj - c-i E- pj (oc-, < x < oc, i=1, 25 . . . m)
j<i j6i

where in the summations on the right we include a term p if i > r. The algebraic process of
obtaining these envelopes is complex for m > 1; in such cases, we must use numerical or graphical
methods. We shall not usually be interested in extreme accuracy; an adequate knowledge of the
limits can be obtained from comparatively few extremal distributions.
12.2 (4, 1, oo). The two-moment case (2, 1 oo) was treated in 7.2, Part I. In the four-
moment unimodal case, we standardize the moments so that,

(v0 . * * V5) (0, 1, 0, 3, 4V3, 5V4)


R2(() (54 - 9 - 4[(3)(3 - (2) -1632
R2(- /3) = R2(+ V/3) = -1632
Thus if there is a Po in (- /3, V/3) such that R2((0) > 0, then R2((3 has two zer
The third zero is > V3 or < - V3 according as V3 > 0 or < 0. Let us suppose V3 > 0, so
that the allowed range of variation of ( is (po, pl), and ro = 1.

Q*(z) = D(z) Z2 + {121l3((2 - 1) - 2((54 - 9)} z


+ {8V3(P3 - (5V4 - 9) 2-(153- -(163 2- 27)} - 3D(()

D(p) Z2 + H(F) z + K(() - 3D(()


say, where
D(5) = (3-(32)2 + 8V3

D(O) has just one zero in (po, pl). The zeros of Q*(z) are

ocz= D-D'{- H ?(4 H2 + 3D2-DK)I}


(i, j) are (1, 2) if po < ( < 30, and are (2, 0) if 30 < ( < PI. From (23), (24), if Po < < 3 A, we
have
3 + P(xl + X2) + ocXlc2 K-(pH
p (2 (O(I + 0(2) + X1l2 K + (H + ((2 - 3) D
Pi P2 3 + 2 + 2(oC2
n~~~= (OC -X9 P)(C,-

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
1956] MALLows-Generalizations of Tchebycheff's Inequalities 161

P2 3 + P2 + 2Pxi
MC2 - MC - (X2 - 03)2(2 - C1)

Thus to compute the extremal distribution for a given 3, we need to write down in
_H(f), K(f), (4H2 + 3D2 - DK)I. Then if po < 3< <0 we find successively a,c, 2, (M2-
(OS2 - O, (aX1 - 0 1, P2(X2 -a-1, P2, Pl, P-
Similarly if 80 < p < p1, we find successively oo, aC2, (C2 -co>1, (C2 - )-1, (3 _ CO)-,
3 + 2 + 2(22 3 + 2 + 2pmo
P ( - )(XO2 -O)' P2 (c2 - X3)(a2- )' O

-0t;.5S o 0o5 1-0 0.5 2 o

/'3

/ 0)1) >0?? (o,)

(d) | (C)6 (G)

FIG. 6.-Moment-problems (4, k, co) with k 0, 1, 2. Boundaries of impossible regions.

Explanation.
(4, 0, co). The parabola T is the well-known boundary given by A - 4 - t 32 1 = 0; for a
solution of the Tchebycheff moment-problem to exist, we must have A > 0. Each point
on this boundary corresponds to a special extremal distribution with character (0, 0), i.e. a
distribution consisting of two point-masses.
(4, 1, oo) The curve J and R is the boundary derived by Johnson & Rogers (1951) for unimodal
distributions; points on this curve are obtained from special extremal distributions with
characters (1, 0) and (0, 1). (The symmetrical case being (0/0).)
(4, 2, cc) The remaining system of boundaries give the regions in which the respective cyclic
sets (a), (b), (c), (d) of 5 8 (Part I) apply. Points on these boundaries correspond to
special extremal distributions with characters (2, 0), (1, 1), (0, 2), and (0/0/0) (with the
transitional (1/0) and (0/1).)

The first derivative of such a general extremal distribution consists of a point-mass p at ,


and rectangles of masses Pl, P2 on the intervals ((, c1), (cc1, c2) if po < ( < 30 and on (c0, O), ((Xc2)
if 80 < (3 < Pi.
12. 3. In this (4, 1, oo) case, there are two cyclic sets of extremal distributions; we may find the
boundaries of the regions in the (V3, V4) plane in which they are applicable by finding the V, and
Vc4 of the extremal distributions which are special of the first type; i.e. of the extrenial distributions
corresponding to the moment-problem (2, 1, oo). These have characters (1, 0), (0, 1), and (0/0).

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
162 MALLOWs-Generalizations of Tchebyrheff's Inequalities [No. 2,

These boundaries were found by Johnson & Rogers (1951). The character (0/0) gives the
rectangular distribution, with 3- =0. See Fig. 6.
12.4. Taking the given moments to be VL3 = 0, V4 = 3, we cannot immediately ap
above theory. We find that one zero of R2(r) is infinite, the other two being ? /3. The general
extremal distributions all have character (0, 1, 0); there is one special extremal distribution, (0/0),
i.e. rectangular. We may use the method of 12.2 to compute these distributions, where now

D(f) = (3 P2)2, H(f) - 12f, K(5) = - 6(3 + @2)


We may obtain the asymptotic form (as x oo) of the limits for this case as follows. Putting
3= -3 e we find that
-2+1 1V\3 1. 7V\3
M2=V3?+ -12- 2 48 ?
and hence that
2
P2 = 38+

If the line joining the points (@, 1 - P2) and (@2, 1) has equation y mx + c, we have

2 2 +
m- 3,\/3 e0 + . . . , C= 1 - 8+
Hence for a point on the envelope

x = - c'/m' - / e-2 +

y =mx + c =1- 2 8 + .2 .--2 +


15 . =1-.55 X

The numerical factor is *49152 exactly. Thus the. improvement over the Tchebycheff limit
(end of 10.4) is asymptotically by a factor -24576.
12.5 (2m, 2, oo). For the moment-problems (2m, 2, so), a general extremal distribution
E(x) with character (n0, nl, . . . n,) consists of m arcs of parabolae so joined that E'(x) is
continuous except at two points ocT, a, if nr= n == 1, ni = 0 i * r, s; and except at one point
ar if n, = 2, ni = 0 i 4= r; in the second case E(x) is discontinuous at a,. The I-functions are
given by
(d/dZ)2 IE(Z) = P(Z)/Q(Z)
where Q(z) is of degree m + 3, and has either two double zeros, or one triple zero. In the first
case, we may write
1 z z2 . . .zm+3
I1 @1 1 12 p1m+3
0 1 2@i (m -+ 3) fm1+2
1 @2 @ 2 3 2m+3
Vo V1 V2 Vm+3

VI V2

Vm-1 V2m+2

where @1 and @2 must satisfy Q'(@2) = 0; while in the second case we have

Z z 2 z3 . zm+3
1 @ @2 @33 m+3
0 1 2@ 3@2 (m + 3) PM+2
Q(Z) = 0 0 2 6@ (m + 3)(m + 2) Pm+
Vo V1 V2 V' Vm+3
VI

Vm-i V2m+2

This content downloaded from 192.80.65.116 on Sat, 19 Aug 2017 03:25:13 UTC
All use subject to http://about.jstor.org/terms
1956] MALLows-Generalizations of Tchebycheff's Inequalities 163

There are m2 cyclic sets of extremal distributions; the boundaries of the regions in the space
of the moment-point (v, ,uj, . . . V2m) in which they are applicable will become increasingly
complex with increasing m; thus for m = 2, i.e. four moments, we consider the extremal distribu-
tions with characters (0, 2), (1, 1), (2, 0), (0/1), (1/0), (0/0/0). The V, and V.4 of these distributions
trace out the boundaries shown in Fig. 6. The four cyclic sets given in 5.8 (Part I) are applicable
in the regions shown.
We may obtain a necessary and sufficient condition for the moment-problem (2m, 2, oo) to
have a solution by an argument analogous to that used by Johnson & Rogers (1951) for the
(2m, 1, oo) case. Suppose F(x) is a solution of the moment-problem with given moments
0O, [L1, * * t2m, and the zeros of F"(x) are @1 and @2. Then
x

G(x) = (u - (u - 2) dF"(x)

is a solution of the moment-problem (2mn, 0, oo) (using the "wide sense" smoothness condition,
since G'(x) = 0 for x = @1 and @2, and so is not strictly positive), with moments

l.s(G) = 2v(s+2 - (f3I + @2) Vs+l + f31I32 Vs) s = 0 1 . . . 2m.

The condition for such a solution to be possible is given in (13) 10. 1, and here reduces to

0 1 @i . . . s+
1 VO V1 V3+1
@2 V1 <0 s=0,t, . . . m.
2s+1 V1s+2

12.6 (2, k, \infty). We take the given moments to be the standardized values \nu_1 = 0, \nu_2 = 1. The general extremal
distribution with character (r, s) (where r + s = k; r, s \ge 1) is given by
\[
E'(x) = \{\gamma(x - \alpha_0) + \delta(\alpha_1 - x)\}(x - \alpha_0)^{s-1}(\alpha_1 - x)^{r-1}, \qquad \alpha_0 \le x \le \alpha_1,
\]
where \gamma, \delta are positive. We shall write (\alpha, \beta) for (\alpha_0, \alpha_1). The I-function corresponding to this

\[
(-1)^k \left(\frac{d}{dz}\right)^k I_E(z) = k!\,(r-1)!\,(s-1)!\left\{\frac{\gamma}{(z-\alpha)^r (z-\beta)^{s+1}} + \frac{\delta}{(z-\alpha)^{r+1}(z-\beta)^s}\right\}.
\]
On eliminating \gamma and \delta from the moment conditions, we find that \alpha and \beta must satisfy
\[
r(r+1)\alpha^2 + 2(r+1)(s+1)\alpha\beta + s(s+1)\beta^2 + (k+1)(k+2) = 0.
\]

Hence we have E(x) as the weighted sum of two incomplete Beta-functions; on differentiation
with respect to \alpha, we find that E(x) touches its envelope where
\[
x = -\frac{(r+1)\alpha + (s+1)\beta}{k+2}.
\]
At such a point x,
\[
E(x) = \frac{s+1}{k+2}\, B_p(s+1,\, r) + \frac{r+1}{k+2}\, B_p(s,\, r+1) - \frac{1-x^2}{1+x^2}\,\frac{\sqrt{(r+1)(s+1)(k+1)}}{k+2}\,\frac{k!}{r!\,s!}\,p^s q^r \qquad (25)
\]
where
\[
p = \frac{s+1}{k+2} + \frac{x}{k+2}\sqrt{\frac{(r+1)(s+1)}{k+1}}, \qquad q = 1 - p,
\]
and B_p is the incomplete Beta-function ratio.


Since \gamma and \delta must be positive, x must lie in one of the intervals
\[
\left(-\sqrt{\frac{(r+1)(k+1)}{r(k+2)}},\; -\sqrt{\frac{s+1}{s(k+2)}}\right), \qquad
\left(\sqrt{\frac{r+1}{r(k+2)}},\; \sqrt{\frac{(s+1)(k+1)}{s(k+2)}}\right). \qquad (26)
\]
The case where one of r, s is zero may be treated similarly; taking r = 0, we find that the
extremal distribution touches its envelope where x = -1, and at this point
\[
E(x) = \frac{k+1}{2(k+2)}\left(1 + \frac{2x}{1+x^2}\right). \qquad (27)
\]
This is the equation of the envelope, valid for -\infty < x < 0.

The generalized Tchebycheff inequalities consist of portions of the 2k envelopes given by
(25), (26), and (27) (with the symmetrical envelope for s = 0), together with part of the loci
\[
y = \frac{2(k+1)}{k+2}\,\frac{x^2}{1+x^2}, \qquad -\sqrt{\frac{k+2}{k}} < x < 0,
\]
\[
y = 1 - \frac{2(k+1)}{k+2}\,\frac{x^2}{1+x^2}, \qquad 0 < x < \sqrt{\frac{k+2}{k}}, \qquad (28)
\]
which are obtained from the cases r = 0, s = 0.
Letting s \to \infty in (25), we find the envelopes become
\[
y = 1 - I_v(r+1) - \sqrt{r+1}\;\frac{v^r e^{-v}}{r!}\;\frac{1-x^2}{1+x^2}, \qquad r = 1, 2, 3, \ldots \qquad (29)
\]
for \sqrt{r/(r+2)} < x < 1, where v = r + 1 - \sqrt{r+1} and I_v(r) is the incomplete Gamma-
function ratio. Letting r \to \infty, we find
\[
y = I_w(s+1) + \sqrt{s+1}\;\frac{w^s e^{-w}}{s!}\;\frac{x^2-1}{x^2+1}, \qquad s = 1, 2, 3, \ldots \qquad (30)
\]
for 1 < x < \sqrt{(s+2)/s}, where w = s + 1 + \sqrt{s+1}.

The boundary of the region for x > 0, which may be called the generalized Tchebycheff limit
L(x) for the moment-problem (2, \infty, \infty), consists of arcs of the curves (29), (30), together with a
portion of the locus (from (28))
\[
y = 2x^2/(1 + x^2), \qquad 0 \le x < 1.
\]
The portion of the boundary near x = 1 consists of infinitely many arcs of these envelopes; as
r and s increase, the corresponding extremal distributions approach the normal distribution
function, being ultimately tangent to it.

13. Some Special Cases with \lambda Finite.

13.1 (2m, 0, \lambda). In the (2m, 0, \lambda) case, the I-function of a general extremal distribution is
given by (17) 11.2. The moment conditions are thus
\[
S(z)/Q(z) \overset{2m+1}{\equiv} \exp \lambda^{-1} I(z),
\]
which, as before, leave one degree of freedom in the parameters. It is shown in S & T that a
necessary and sufficient condition for our moment-problem (2m, 0, \lambda) to have a solution is that
the sequence of moments (1, \mu_1^*, \mu_2^*, \ldots, \mu_{2m}^*) defined by
\[
\exp \lambda^{-1} I(z) \overset{2m+1}{\equiv} 1 + \lambda^{-1} I^*(z) = 1 + \lambda^{-1}(z^{-1} + \mu_1^* z^{-2} + \cdots + \mu_{2m}^* z^{-2m-1})
\]

admit of a solution to the moment-problem (2m, 0, \infty). For if we can find polynomials P(z),
Q(z) satisfying
\[
\frac{P(z)}{Q(z)} = \sum_i \frac{p_i}{z - \alpha_i} \overset{2m+1}{\equiv} I^*(z)
\]
with all the p_i positive, then the polynomial S(z) = Q(z) + \lambda^{-1} P(z) has zeros satisfying
the inequalities (18) of 11.2.
Thus for the moment-problem (2, 0, \lambda) with standardized moments, we have
\[
\exp \lambda^{-1}(z^{-1} + z^{-3}) \overset{3}{\equiv} 1 + \lambda^{-1}\left\{z^{-1} + \frac{1}{2\lambda}\, z^{-2} + \left(1 + \frac{1}{6\lambda^2}\right) z^{-3}\right\}
\]
and so for a solution to exist, we must have
\[
1 + \frac{1}{6\lambda^2} - \left(\frac{1}{2\lambda}\right)^2 > 0, \qquad \text{i.e.} \quad \lambda > \frac{1}{2\sqrt{3}}.
\]
The generalized Tchebycheff inequalities for this case were given in 7.3, Part I.
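The expansion just given is easy to check numerically. The following sketch (an illustration added here, not part of the paper's machinery) recovers \mu_1^* = 1/2\lambda and \mu_2^* = 1 + 1/6\lambda^2 by Taylor-expanding the exponential, and tests the existence condition for a value of \lambda above and a value below the threshold 1/(2\sqrt{3}) \approx 0.2887:

```python
import math

def star_moments(lam):
    """Expand exp(lam^-1 (z^-1 + z^-3)) in powers of z^-1 up to order 3
    and read off the 'starred' moments mu1*, mu2* defined in the text."""
    # coefficients of z^-1, z^-2, z^-3 in the exponent lam^-1 (z^-1 + z^-3)
    u = [1.0 / lam, 0.0, 1.0 / lam]
    # exp series: contributions from u, u^2/2 and u^3/6 suffice at order 3
    c1 = u[0]
    c2 = u[1] + u[0] ** 2 / 2.0
    c3 = u[2] + u[0] * u[1] + u[0] ** 3 / 6.0
    # exp(...) = 1 + lam^-1 (z^-1 + mu1* z^-2 + mu2* z^-3) + ...
    return lam * c2, lam * c3

mu1, mu2 = star_moments(0.5)
# solvability of the (2m, 0, oo) problem needs the 'variance' mu2* - mu1*^2 > 0
print(mu1, mu2, mu2 - mu1 ** 2 > 0)
m1, m2 = star_moments(0.25)        # 0.25 < 1/(2 sqrt 3): condition fails
print(m2 - m1 ** 2 < 0)
```

For \lambda = 0.5 (the value used in the numerical work of section 14) the condition holds comfortably, while for \lambda = 0.25 it fails, in agreement with the threshold above.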
13.2 (2m, 1, \lambda). In the case of the moment-problem (2m, 1, \lambda), we found the I-function for a
general extremal distribution in (19) 11.2. Here again the problem of existence can be reduced
to the corresponding problem for a (2m, 1, \infty) moment-problem.
The condition that there exists a pair of polynomials S(z), Q(z) satisfying the moment condi-
tions (19) and the inequalities (20) is that the sequence of moments defined by
\[
\exp \lambda^{-1}(d/dz)\, I(z) \overset{2m+2}{\equiv} 1 + \lambda^{-1}(d/dz)\, I^*(z)
= 1 - \lambda^{-1}(z^{-2} + 2\mu_1^* z^{-3} + \cdots + (2m+1)\mu_{2m}^* z^{-2m-2})
\]
admit of a solution of the moment-problem (2m, 1, \infty). For if such a solution exists, then a
complete cyclic set of extremal distributions exists; to any general distribution E*(x) of this set
with
\[
(d/dz)\, I_{E^*}(z) \equiv P^*(z)/Q^*(z) \overset{2m+2}{\equiv} (d/dz)\, I^*(z)
\]
there corresponds a general extremal distribution E(x) for the moment-problem (2m, 1, \lambda), with
\[
\exp \lambda^{-1}(d/dz)\, I_E(z) = S(z)/Q(z) = 1 + \lambda^{-1} P^*(z)/Q^*(z) \overset{2m+2}{\equiv} \exp \lambda^{-1}(d/dz)\, I(z),
\]
and the zeros of S(z) and Q(z) will alternate correctly.

Unfortunately this method does not immediately generalize any further; any polynomial
Q(z) with a zero of multiplicity three or more will not do for the case with \lambda finite.
13.3 (2, 1, \lambda). Consider the two-moment case. Putting 1 - 1/6\lambda = \varphi^2, \zeta = z/\varphi (assuming
\varphi \ne 0; we shall see below that \varphi^2 > 0, so that we may take \varphi > 0), we have from the results of
the (2, 1, \infty) case
\[
\exp \lambda^{-1}\frac{1}{\varphi}\frac{dI}{d\zeta} = 1 - \lambda^{-1}\,\frac{2\theta\zeta + 3(1-\theta^2)}{(\zeta - \theta)^2 (2\theta\zeta + 3 + \theta^2)}
\]
for arbitrary \theta. Hence, replacing \theta by \beta/\varphi, we find
\[
\exp \lambda^{-1}\frac{dI}{dz} = 1 - \lambda^{-1}\,\frac{2\beta z + 3(\varphi^2 - \beta^2)}{(z - \beta)^2 (2\beta z + 3\varphi^2 + \beta^2)} = \frac{S(z)}{Q(z)}.
\]
Since \lambda > 0, the conditions for the zeros of S(z), Q(z) to alternate correctly reduce to
\[
\varphi^2 > 0, \qquad -\varphi\sqrt{3} < \beta < +\varphi\sqrt{3}.
\]
The mode of any solution of the (2, 1, \lambda) moment-problem must lie in the interval \pm\varphi\sqrt{3}.
Taking 0 < \beta < \varphi\sqrt{3}, and letting \gamma, \delta, \varepsilon represent the ordered zeros of S(z), and \alpha the zero
\alpha = -(3\varphi^2 + \beta^2)/2\beta of Q(z), the extremal distribution is as follows:


\[
E(x) = \begin{cases}
0, & x \le \gamma\\
\tfrac{1}{2}\lambda(x-\gamma)^2, & \gamma \le x \le \alpha\\
\tfrac{1}{2}\lambda(\alpha-\gamma)(2x-\alpha-\gamma), & \alpha \le x \le \delta\\
1 - \lambda(\varepsilon-\beta)^2 + \tfrac{1}{2}\lambda(x+\varepsilon-2\beta)^2, & \delta \le x \le \beta\\
1 - \tfrac{1}{2}\lambda(x-\varepsilon)^2, & \beta \le x \le \varepsilon\\
1, & \varepsilon \le x.
\end{cases}
\]
The only parts of this distribution which give an envelope are the line for \alpha \le x \le \delta and the
parabola for \delta \le x \le \beta. The point (x_1, y_1) is on the envelope of the line, where
\[
x_1 = \frac{\alpha\alpha' - \gamma\gamma'}{\alpha' - \gamma'}, \qquad y_1 = \tfrac{1}{2}\lambda(\alpha-\gamma)(2x_1 - \alpha - \gamma)
\]
(primes denoting differentiation with respect to \beta). The point (x_2, y_2) is on the envelope of the
parabola, where
\[
x_2 = \frac{2\beta - \varepsilon\varepsilon'}{2 - \varepsilon'}, \qquad y_2 = 1 - \lambda(\varepsilon-\beta)^2 + \tfrac{1}{2}\lambda(x_2 + \varepsilon - 2\beta)^2.
\]
There are symmetrical envelopes for -\varphi\sqrt{3} < \beta < 0.


We may obtain an asymptotic expansion for the limits as x \to \infty by a method similar to
that of 12.4. The result is
\[
L(x) = 1 - \frac{4\varphi^2}{9x^2} + \frac{4\varphi^2(8 - 9\varphi^2)}{27x^4} - \frac{4\varphi^2(256 - 600\varphi^2 + 327\varphi^4)}{243x^6} + \cdots
\]
Thus the improvement over the (2, 1, \infty) limit is asymptotically by a factor \varphi^2 = 1 - 1/6\lambda.

14. Numerical Results


14.1. The generalized Tchebycheff inequalities have been computed for the following moment-
problems: (2, 0, \infty), (2, 1, \infty), (2, 2, \infty), (2, 0, \lambda) with \lambda = 0.5, (2, 1, \lambda) with \lambda = 0.3, (4, 0, \infty),
(4, 1, \infty). In the four-moment cases, the values \mu_3 = 0, \mu_4 = 3 were used. The value for \lambda in
the (2, 0, \lambda) case was chosen by a comparison of the limiting (rectangular) value 1/2\sqrt{3} = 0.2887, the
Normal value 1/\sqrt{(2\pi)} = 0.3989, and the values obtained for the t and \chi^2 distributions with
various degrees of freedom. The value for \lambda in the (2, 1, \lambda) case was chosen by a comparison of
the limiting (triangular) value 1/6, the Normal value 1/\sqrt{(2\pi e)} = 0.2420, and various values
for the t and \chi^2 distributions. It is thought that these values may prove to be typical in applica-
tions. The following results were obtained.

Limits for 1 - F(x) for Various Moment-problems.

  x                    0.0    0.5    1.0    1.5    2.0    2.5    3.0    4.0

Moment-problem:
  (2, 0, \infty)        1.0   .800   .500   .308   .200   .138   .100   .059
  (2, 1, \infty)        1.0   .733   .333   .137   .089   .061   .044   .026
  (2, 2, \infty)        1.0   .700   .250   .115   .075   .052   .037   .022
  (2, 0, \lambda), \lambda = 0.5   .667   .485   .333   .221   .147   .100   .072   .041
  (2, 1, \lambda), \lambda = 0.3   .58    .38    .21    .11    .06    .04    .02    .01
  (4, 0, \infty)        .833   .689   .500   .248   .105   .047   .024   .008
  (4, 1, \infty)        .700   .498   .284   .101   .044   .018   .008   .003

  Normal value          .500   .309   .159   .067   .023   .006   .0014  .00003
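The (2, 0, \infty) row of this table is the one-sided (Cantelli) form of Tchebycheff's inequality, 1 - F(x) \le 1/(1 + x^2) for a distribution standardized to mean 0 and variance 1. As a check (mine, not the paper's), the following sketch regenerates that row and the Normal row:

```python
import math

def cantelli(x):
    """One-sided Tchebycheff bound on 1 - F(x), for mean 0, variance 1."""
    return 1.0 / (1.0 + x * x)

def normal_tail(x):
    """1 - Phi(x) for the standard Normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0):
    print(f"{x:3.1f}  {cantelli(x):.3f}  {normal_tail(x):.5f}")
```

At x = 1.5, for instance, this gives 1/3.25 = .308 against the Normal .067, illustrating the width of the two-moment limits.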

14.2. It is seen from these figures that the assumption of the various smoothness conditions
improves the inequalities for the distribution functions considerably. In comparing these limits
with the values for the Normal curve, it should be remembered that the Normal curve is extremely
smooth, satisfying the smoothness conditions (k, \infty) for all k. Also only two or four moments
are being used. We conjecture the following properties of the limits for different m, k, \lambda.
(i) Increasing m alone will only improve the inequalities considerably if the \mu-point is close
to the boundary of the impossible region. The limits coincide, leaving no room for variation in
F(x), when the \mu-point is on this boundary. For each m, as x \to \infty, the upper limit for 1 - F(x) de-
creases as x^{-2m}. Thus the improvement over the limits for the next lower m is by a factor in
x^{-2}.
(ii) Increasing k alone, with \lambda = \infty, will improve the inequalities mainly for |x| > 1. As
k \to \infty, the limits will tend to certain limiting forms, but U(x) - L(x) will remain finite for all x.
For each k, as x \to \infty, the improvement over the Tchebycheff limit is asymptotically by a certain
factor (i.e. is independent of x).
(iii) Decreasing \lambda, with m and k fixed, will improve the inequalities mainly for |x| < 1.
As \lambda approaches its limiting value, U(x) - L(x) \to 0 for all x. For each \lambda, as x \to \infty, the improve-
ment over the (2m, k, \infty) limit is asymptotically by a certain factor (i.e. is independent of x).

15. Suggestions for Further Work.

15.1. The main immediate problem is the proof of Theorem III in the general case. The
difficulty seems to be that of finding a representation of a complete cyclic set which can be manipu-
lated as a unit, analogous to that of (15), 10.2.
15.2. A simple analysis would give the possible regions for the \mu-point under various smooth-
ness conditions; we could also find the regions corresponding to the various cyclic sets of extremal
distributions. The form of the limits is complicated in all but the simpler cases, but it may be
possible to obtain useful approximations. We may use asymptotic expansions (as in 7.3 and
12.4) for large x.

15.3. Research is required on the validity of the assumption of the various smoothness condi-
tions in any practical problem. A possible approach may be to perform a sampling experiment,
and to estimate the smoothness conditions (and possibly some moments at the same time) from
that. This would bring some probability considerations into the final statement of the inequalities.

Acknowledgments.
The above work is based on part of my thesis submitted for the degree of Ph.D. in the University
of London, which was prepared while I was in receipt of a D.S.I.R. grant.
My grateful thanks are due to my supervisors, particularly Dr. F. N. David, for their helpful
advice and encouragement; also to the referees, for their suggestions as to the presentation of
the work.

References.
AKHIESER (ACHIESER, ACHYESER), N. I., & KREIN, M. (1934-35), Three papers (in German). Zap. kharkiv.
mat. Tov. (Comm. Math. Soc. Kharkoff), Ser. 4, 9 (1934), 9-28; 10 (1934), 3-32; 12 (1935), 13-35.
(1938) On Certain Problems in the Theory of Moments. (In Russian.) Kharkoff.
(1940), "Some remarks about three papers of M. S. Verblunsky", Zap. nauch.-issled. Inst. Mat.
Mekh. Kharkov mat. Obshch. (Comm. Inst. Math. Mech. Sci. Univ. Kharkoff and Math. Soc. Kharkoff)
Ser. 4, 16, 129-134.
BIRNBAUM, Z. W. (1948), "On random variables with comparable peakedness". Ann. Math. Statist.,
19, 76-81.
CRAMER, H. (1946), Mathematical Methods of Statistics. Princeton University Press.
DENNIS, K. E. (Unpublished).
FRÉCHET, M. (1937), "Généralités sur les Probabilités. Variables Aléatoires". Traité du Calcul des
Probabilités et de ses Applications. Tome I, Fasc. 3, 1. Paris: Gauthier-Villars.
HOEFFDING, W. (1955), "The extrema of the expected value of a function of independent random variables",
Ann. Math. Statist., 26, 268-75. See also
& SHRIKHANDE, S. S. (1955), "Bounds for the distribution function of a sum of independent, identically
distributed random variables", Ann. Math. Statist., 26, 439-49.
JOHNSON, N. L. (1949), "Systems of frequency curves generated by methods of translation", Biometrika,
36, 149-76.
& ROGERS, C. A. (1951), "The moment problem for unimodal distributions", Ann. Math. Statist.,
22, 433-9.
MARKOFF, A. (1884), On Certain Applications of Algebraic Continued Fractions. (Thesis in Russian.)
St. Petersburg.
ROYDEN, H. L. (1953), "Bounds on a distribution function when its first n moments are given", Ann.
Math. Statist., 24, 361-76.
SAVAGE, I. R. (1952), "Probability inequalities of the Tchebycheff type", Rep. Nat. Bur. Stand., No. 1744.
U.S. Department of Commerce.


SHOHAT, J. A. & TAMARKIN, J. D. (1943), "The Problem of Moments". Math. Surv. No. 1. New York:
American Mathematical Society.
STIELTJES, T. J. (1884), "Quelques recherches sur les quadratures dites mécaniques", Ann. sci. Éc.
norm. sup., Paris, (3), 1, 409.
(1894), "Sur certaines inégalités dues à M. P. Tchebycheff. Article rédigé d'après un manuscrit
inédit". Oeuvres Complètes, 2, 586. Cited in Uspensky (1937).
TCHEBYCHEFF, P. (1874), "Sur les valeurs limites des intégrales", J. Math. Pures Appl., (2), 19, 157.
(1907), Collected Papers, 10, 181-5.
ULIN, B. (1953), "An extremal problem in Mathematical Statistics", Skand. AktuarTidskr., 158-167.
USPENSKY, J. V. (1937), Introduction to Mathematical Probability. New York: McGraw-Hill.
VAN DANTZIG, D. (1951), "Une nouvelle généralisation de l'inégalité de Bienaymé", Ann. Inst. H. Poincaré,
12, 30-43.
VERBLUNSKY, S. (1936), "Solution of a moment problem for bounded functions", Proc. Camb. Phil.
Soc., 32, 30-39.

DISCUSSION ON DR. MALLOWS'S PAPER


Dr. C. A. B. SMITH: I have the greatest pleasure in proposing the vote of thanks to Dr. Mallows,
who has explained with great clarity his substantial and intricate paper. The problem of finding
bounds within which a distribution must lie is important, especially in tests of significance, and the
attack on the problem through the moments seems the most hopeful line, since it is often much
easier to find the first few moments of a distribution, at least approximately, than to get an explicit
expression for the distribution function. Furthermore, if convenient bounds can be found, it
may be possible to avoid the production of extensive new tables. Statisticians have been very
lucky in the past to have been able to solve many important problems by the use of the normal
distribution, t, \chi^2 and F, and much ingenuity has gone into the reduction of other distributions
to approximate normal, \chi^2 or F tests. A similar approach would be the determination of bounds
from a knowledge of the characteristic function, which includes all the moments, since sometimes
the characteristic function can be found when the distribution function is still unknown, or too
complicated for immediate use.
Before discussing the statistical aspect, however, I should like to mention one linguistic point.
There is a tradition of careful pronunciation of French and German names: we don't speak of
Yooler or Gauce or Dezkarteez. Even an expert linguist could hardly be expected to cope with
all other nationalities, but in these days of peaceful coexistence, perhaps the same courtesy could
be extended to the Russians, whose names are usually much easier to deal with than the French
or Germans. Nevertheless, a distinguished musician is regularly insulted by being called
Borrow-din, instead of, roughly speaking, Ba-ra-DYEEN. The name spelt Tschebyschew or
Tchebycheff also has the last syllable accented, and is more or less Chi-bi-SHAWFF, I believe.
Incidentally, why should we borrow the spellings from the Germans or French? English is not
the least important of the minor languages, and there does not seem any reason why we should
not adopt the more logical spelling Chebyshof in English publications.
The author has gone very far indeed into the main problem, but has fortunately not resolved
it so completely as to leave no interesting lines for further research and discussion. In particular,
Theorem 3 calls aloud for proof or refutation. But even though Dr. Mallows's bounds are a
great improvement on Chebyshof's, they are still rather wide for practical applications, as he
himself remarks.
I have been wondering how much can be done if one has only the first two moments of a
distribution. In principle this only imposes the restriction of Chebyshof's simplest inequality
on the distribution, but in practice it rarely leads to serious error in a reasonably large sample
to take a normal approximation, especially if the range of the distribution is infinite (or it has no
well-defined termini) and the two-tailed test is used to counteract skewness. This gives at least
an idea of the order of magnitude of the tail areas, and since it may well happen in a theoretical
distribution used for a practical problem that the central part of the distribution agrees well with
observation but that the tails fit less well, such a rough idea may be all that can be reasonably
expected. Some distributions are restricted to positive values of -the variable; x2 and F are well-
known examples. With such distributions the cut-off at zero may make the normal approximation
rather inaccurate. It would be possible to include this cut-off as an extra condition in Chebyshof-
type inequalities (Dr. Mallows mentions that some work on these lines has already been done)
but I suspect that the improvement in the bounds would not be very great. What can be done
as a rough practical measure? An obvious device is to transform the distribution logarithmically;
if x is restricted merely to positive values, the range of X = ln x is unrestricted. If the distribution


of X is normal, with mean M and standard deviation S, the first two moments of x about the origin
are

    &(x) = exp (M + ½S²);    &(x²) = exp (2M + 2S²).

If, inversely, the values of &(x) and &(x²) are known, a lognormal approximation to the distribution
of x can be found by taking X = ln x to be normal with mean M and s.d. S, where

    M = 2 ln &(x) - ½ ln &(x²),

    S = [ln &(x²) - 2 ln &(x)]^½.

Some results are given in the following table, where exact significance points of \chi^2 and F are
taken, and the tail areas calculated by the lognormal approximation.
[For \chi^2 with d degrees of freedom,

    &(x) = d,    &(x²) = d² + 2d;

for x = F, with (d₁, d₂) degrees of freedom,

    &(x) = d₂/(d₂ - 2),    &(x²) = d₂²(d₁ + 2)/{d₁(d₂ - 2)(d₂ - 4)}.]
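Dr. Smith's lognormal recipe is easy to mechanize. The following sketch (mine, not part of the discussion) fits M and S from the first two moments and reproduces the first entry of the table below, taking the exact 5 per cent. point of \chi^2 with 1 degree of freedom, 3.841, as given:

```python
import math

def lognormal_tail(mean, mean_sq, x0):
    """Fit X = ln x ~ N(M, S^2) from E(x), E(x^2); return approx P(x > x0)."""
    M = 2.0 * math.log(mean) - 0.5 * math.log(mean_sq)
    S = math.sqrt(math.log(mean_sq) - 2.0 * math.log(mean))
    z = (math.log(x0) - M) / S
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z)

# chi-square with d degrees of freedom: E(x) = d, E(x^2) = d^2 + 2d
d = 1
p = lognormal_tail(d, d * d + 2 * d, 3.841)      # 3.841 = .05 point of chi^2_1
print(round(p, 3))                               # ~0.035, as in the table
```

The same function with d = 2 and the 5 per cent. point 5.991 recovers the .042 of the second line.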

Tail Areas Calculated from Lognormal Approximation

                      True Significance Level
  Variable               .05      .01      .001

  \chi^2    1 d.f.      .035     .010     .003
            2 d.f.      .042     .012     .003
            5 d.f.      .048     .014     .003
           20 d.f.      .053     .014     .003

  F      1, 6 d.f.      .036     .007     .0007
        1, 12 d.f.      .038     .009     .0015
         3, 6 d.f.      .055     .012     .0011

Alternatively one can proceed on the basis that the square of a variable is constrained to be
positive. The correct procedure would not seem to be to take \sqrt{x} as normally distributed, as
is so often done, for \sqrt{x} is still cut off at zero. Instead we assume that the distribution of x
agrees with that of Y², where Y is a normal variable with mean M and standard deviation S:
the formulas for M and S are then

    M = {½[3(&(x))² - &(x²)]}^¼,    S = \sqrt{(&(x) - M²)}.

The probability of getting x > x₀ is then approximately the sum of the probabilities of Y < -\sqrt{x_0}
and Y > \sqrt{x_0}. The results of applying this approximation to the values of \chi^2 used above are
given in the following table.
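A sketch of this second recipe (again mine, using the formulas for M and S as reconstructed above): for \chi^2 with 1 degree of freedom the fit is exact, since that variable is precisely Y² with Y standard normal, which explains the first line of the table below.

```python
import math

def y_squared_tail(mean, mean_sq, x0):
    """Approximate x by Y^2 with Y ~ N(M, S^2); return P(x > x0)."""
    m4 = 0.5 * (3.0 * mean ** 2 - mean_sq)      # M^4
    if m4 < 0:
        raise ValueError("M would be complex (as for F with low d.f.)")
    M = m4 ** 0.25
    S = math.sqrt(mean - M * M)
    r = math.sqrt(x0)
    q = lambda z: 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z)
    return q((r - M) / S) + q((r + M) / S)              # P(Y>r) + P(Y<-r)

# chi^2, 1 d.f.: E(x) = 1, E(x^2) = 3, so M = 0, S = 1 and the fit is exact
print(round(y_squared_tail(1, 3, 3.841), 3))            # ~0.050
```

For F with (1, 6) degrees of freedom the quantity under the fourth root is negative, reproducing the remark below that M and S are then complex.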

Tail Areas Calculated from x = Y² Approximation

                      True Significance Level
  Variable               .05      .01      .001

  \chi^2    1 d.f.      .050     .010     .0010
            2 d.f.      .050     .0080    .0005
            5 d.f.      .048     .0074    .0004
           20 d.f.      .047     .0080    .0006

This second approximation cannot be applied to F with low degrees of freedom, since the values
found for M and S are then complex.
It therefore appears that either the lognormal or the "x = Y2" approximations give reasonable
results, where applicable, without taking into account skewness and kurtosis. This strongly
suggests that naturally occurring distributions are restricted to some narrow band of possible forms.
But it is not very clear exactly how this idea is to be put into a more exact form, and how nearly
it corresponds to the "smoothness conditions" used by Dr. Mallows. Nevertheless it is clear
that his investigations have made a substantial advance, on which he is to be warmly congratulated.


Dr. JOHNSON: I should like to follow Dr. Smith in complimenting the author on his courage
in setting himself such a difficult problem and on his thoroughness in working through to its
solution. The conditions of smoothness which he has employed seem to be very natural and
simple although the analysis to which they lead, particularly the analysis to which the \lambda conditions
lead, is very far from simple. It is unfortunate that his results though very interesting from a
mathematical point of view are not quite what he wanted or what he hoped for in a number of
statistical applications.
Dr. Smith has already referred to the wideness of the gap between the upper and lower limits
for the probability integral and one would in a number of applications have hoped to get rather
narrower limits. This is not Dr. Mallows's fault; if it is not possible to get probability integrals
differing by a sufficiently small amount it is unfortunate-all he has done is to find out that it is so.
For example, it would be useful to say that if the curve you happen to choose gave a probability
integral of .05 up to X then you would be certain that no other curve which might have been
chosen would give a value bigger than .06 or less than .04 for the same integral. This, of course,
is not the position which Dr. Mallows has left us in, but it would be very desirable because it
would mean that it would not be necessary to try out a number of curves, any curve satisfying the
conditions would do.
It would be desirable to have some explanation of the table in para. 14 to give some lower
limits for 1 - F(x) and I should be interested to see some values worked out for the use of infinite
smoothness; I understand that Dr. Mallows has got some way with this, and that would form a
useful addition to the paper.
The root of the trouble, the trouble being the wideness of the limits, seems to lie in the inade-
quacy of the author's smoothness conditions, sufficient as they might appear to be. One might
consider the possible increase in restrictiveness-I have a number of suggestions here, but I do not
know whether any of them would commend themselves to the author. The first is restricting
the range of the variation of the variable. That is to say, we would consider only the fit within,
say, 3 standard deviations of the mean; using the moments of the corresponding distribution.
The second possible increase in restrictiveness would be to introduce two parameters \lambda_1, \lambda_2
instead of having the parameter \lambda which limits only the highest derivative included in the k
conditions, so that instead of a problem, say (2, 1, \lambda) with a maximum value of the first derivative
less than \lambda, the problem would be (2, 1; \lambda_1, \lambda_2), the modal value of the distribution being \lambda_1
and the maximum value of the first derivative being \lambda_2; there would also be some necessary relationship
between \lambda_1 and \lambda_2, and it is possible that the full restriction might not result in any noticeable
increase in the narrowness of the limits which were obtained.
The third possible increase in the restrictive conditions would be to retain only restrictions
on the values of the highest derivative concerned in this condition, but instead of having a constant
\lambda it would be a \lambda which was a function of x. It seems rather a waste to have a condition for a
maximum \lambda which is probably only going to be used, so to speak, in a small part of the range of
variation of x. While there would be no point in having a function \lambda(x) which was too restrictive, one
could have some such condition as, say:

    dF/dx < K / {(1 + x)e^x},

so that the whole range of values of the derivative was being controlled in value, and not simply
at one point. It should be possible to work out suitable conditions for frequency curves of the
type which would be likely to fit.
The fourth point is that one might get closer agreement by fitting functions other than moments.
This is going outside the terms of reference of the problem, but it appears that the smoothness
conditions rather than the use of moments is the factor which ties down the curve effectively.
It may be that the use of some criterion other than the equation of moments might be more effective
in tying down the curve. Any function which falls off in its value as x increases might give a better
tying down of the curve than the use of the higher moments, which will depend on the exact shape
of the tail of the distribution.
It is worth noting in this connexion that if a simple function g(x) of the original variable can
be found to which some of Dr. Mallows's conditions apply, then frequency curves fitted to g(x)
will be under the limitations which Mallows has obtained and this means, of course, the corres-
ponding probability distribution of x itself will be under the same limitations.
This may be partly associated with the agreement which has been found between the Pearson
and Johnson curves. The latter are simple curves, obtained by supposing that there is a fairly
simple function of x which is normally distributed.


I have one point of criticism and one query. In connexion with the proof in para. 6.5, it is
assumed that a distribution function can be found which satisfies the (2m, k, \lambda) problem with the
(2m + 1)st moment taking any value, as large or as small as desired. This would be true in a
(2m, 0, \infty) problem but it does not seem to me to be quite obvious that a distribution can be
obtained which satisfies the (2m, k, \lambda) problem and which will take any value for the (2m + 1)st
moment which might be desired.
There is one further point in para. 14.2, item (iii), which says that "As \lambda approaches its limiting
value, U(x) - L(x) \to 0 for all x". I should like to ask Dr. Mallows what this limiting value of \lambda
is, whether it is a constant value, the value for the normal curve, or is it a different value for a differ-
ent value of x?
I am very grateful for the paper which sheds new light upon a problem of long standing and
also indicates the lines of progress being made. I am extremely pleased to have the opportunity
of seconding the vote of thanks to Dr. Mallows.

The vote of thanks was put to the meeting and carried unanimously.

Dr. MULHOLLAND: One of the features that attracted my attention most in Dr. Mallows's
very interesting paper was his choice of method, in which he has preferred to follow Stieltjes
rather than Markoff. Markoff's method depends on inequalities of the form
    F(\xi) \le &[p(x)],
where p(x) may be any polynomial of degree at most n such that p(x) \ge 1 for x \le \xi and p(x) \ge 0
for x > \xi. An advantage of this line of attack is that even if one misses success in finding a best
possible polynomial for the purpose one may still, by choosing a polynomial judiciously, get
some upper bound for F(\xi). Some useful results were obtained in this way recently by Longuet-
Higgins (Proc. Camb. Phil. Soc., 51, 1955, 590-603) for the reduced trigonometric moment problem.
However, I am not suggesting that all the advantages lie with Markoff's method, only that it is
interesting to compare the two methods. I might remark that prima facie there is a reasonable
prospect of adapting Markoff's method so as to deal with moment problems of the type (2m, k, oo)
with k > 1. I have verified that this can be done for k = 1 in the case of distributions confined
to a finite interval. (Such a result may have been published somewhere by one of the Russian
writers, for all I know.)
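In the simplest case the method can be made concrete (my illustration, not Dr. Mulholland's): for a distribution with mean 0 and variance 1, the polynomial p(x) = {(x - a)/(\xi - a)}² with a > \xi satisfies his conditions, so F(\xi) \le &[p(x)] = (1 + a²)/(\xi - a)² for every such a; minimizing over a recovers the one-sided Tchebycheff (Cantelli) bound 1/(1 + \xi²).

```python
def markoff_bound(xi, a):
    """E[p(x)] for p(x) = ((x - a)/(xi - a))^2, with mean 0 and variance 1."""
    return (1.0 + a * a) / (xi - a) ** 2

xi = -2.0
crude = markoff_bound(xi, 1.0)                      # any a > xi gives *a* bound
best = min(markoff_bound(xi, xi + 0.001 * k) for k in range(1, 20000))
print(crude, best, 1.0 / (1.0 + xi * xi))           # best ~ 1/5
```

This is exactly the advantage Dr. Mulholland mentions: even a crude choice of a yields a valid bound, and the judicious choice (here a = -1/\xi) yields the best one.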
I should like to make a query on a point of detail: that is whether Dr. Mallows's treatment
of the problem (2m, 1, oo) can be made to yield a convenient formula for the difference U(x) - L(x)
between the upper and lower bounds for the distribution function F(x). In the classical case of
Tchebycheff's inequalities there is a fairly simple formula: the difference equals the maximal
probability that can be concentrated at x and it is given by a quotient of determinants. The for-
mula appears in Dr. Mallows's § 10.5, and the beauty of it is that the elements in the deter-
minants are all known: there is no need to solve an algebraic equation first. I could not see
whether Dr. Mallows had already looked into this question, but possibly an answer to it is within
his reach.

Mr. GODWIN: Dr. Mallows's paper raises some interesting problems. The most immediate
need from a practical point of view is to find bounds not merely for F(x), that is F(x) - F(- oo),
but for F(xl) - F(x2) for given finite xl and x2. It has been shown by Hoeffding that this problem
in the (m, 0, oo) case involves the consideration of point distributions with at most m + 1 values,
and the case where x₂ is minus infinity is rather special in the sense that only about ½m is required
instead of the full m + 1. There is a result due to H. L. Selberg which shows that when x₂ is finite and
m = 2, three points are required. If one takes m + 1 points with probabilities pi at
di(i = 1, 2, ... , m + 1) then one can determine the pi in terms of the di from the moment
conditions and then the inequalities pᵢ \ge 0 determine a region in the space of the dᵢ. In this
region the function F(x₁) varies monotonically, but is discontinuous along a certain hyperplane;
its least value is taken on a boundary of the region where about half the pᵢ are zero. But if we
deal with F(x₁) - F(x₂) then this is discontinuous on more hyperplanes and takes its minimum
in the interior of the region, where all the pᵢ are non-zero. Dr. Mallows's result that we need
only consider sets of about ½m "blocks" in the extremal distributions prompts the question whether
this number increases to about m, if we consider F(x1) - F(x2). Hoeffding's proof, which requires
the scaling up and down of the individual blocks no longer applies since the smoothness or bounded-
ness conditions are violated. It would be of interest to have a solution, even for the simplest
cases of (2, 1, \infty) or (2, 0, \lambda).
Dr. DAVID: This problem of moments is one of respectable antiquity as Dr. Mallows has briefly
indicated in his paper. The original impetus towards the solution of the problem came from


the Russian school of probabilists who were concerned with various modifications of the law
of large numbers; this with the idea of "proving" the objective basis of probability theory. Many
of the present day modifications are applicable however to the problem as set out by Dr. Mallows
and it would be interesting to have comparison between some of them and the paper now under
discussion. Dr. Mallows knows about these inequalities but since he has not mentioned them in
his paper it is perhaps appropriate to put them on record. Besides the Tchebycheff and Markoff
inequalities we have Khintchine's theorem, Kolmogoroff's Strong Law of Large Numbers, the
Law of the Repeated Logarithm ascribed to both Khintchine and Kolmogoroff, and S. Bernstein's
inequality which is attractive in its simplicity, although not directly applicable to this present case.
Dr. Mallows mentions this inequality, which is important enough to set out at length. We assume
a set of uniformly bounded independent random variables {xi}, with

        E(xi) = 0,    E(xi²) = σi²,    σ² = σ1² + σ2² + ... + σn².

Then if

        P = Pr{ |x1 + x2 + ... + xn| < w },

w being a positive number at choice,

        P ≥ 1 − 2 exp{ −w²/(2σ² + 2cw) },

where c = M/3, M being the upper bound of the values of the {xi}.
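As a numerical illustration (my own, with hypothetical choices of n, M and w, not part of Dr. David's remarks), the Bernstein lower bound may be compared with a Monte-Carlo frequency for uniform variables:

```python
import math
import random

random.seed(1)
n, M = 50, 1.0                      # n variables, each bounded in modulus by M = 1
sigma2 = n * (1.0 / 3.0)            # x_i uniform on [-1, 1]: variance 1/3 each
c = M / 3
w = 8.0                             # a positive number at choice
bound = 1 - 2 * math.exp(-w ** 2 / (2 * sigma2 + 2 * c * w))

trials = 5000
hits = sum(abs(sum(random.uniform(-1, 1) for _ in range(n))) < w
           for _ in range(trials))
freq = hits / trials                # observed frequency of |sum| < w
```

Here the bound is about 0.62 while the observed frequency is near 0.95, illustrating Dr. David's point that the classical inequalities tend to be conservative.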
J. V. Uspensky in his book Introduction to Mathematical Probability gives a useful summary
of the proofs of these inequalities, but this was published in 1938 and further work now exists, as
will be seen in R. Savage's monograph. A modification of Bernstein's inequality so that a com-
parison with the present problem is possible would be very interesting.
It is certain that these inequalities mentioned above will be less useful than those proposed
by Dr. Mallows, but a comparison between them and Dr. Mallows's would be illuminating in that
it would show, I believe, how far Dr. Mallows has progressed in adapting and strengthening the
conventional classical approach. I think that Dr. Mallows, by means of some stimulating new
ideas and some interesting if rather sketchy mathematics, has put fresh life into this problem.
I must confess, however, to a feeling of disappointment, not with him or his work, but with the
intractable nature of the problem which never seems to lead into pleasant places. If we imagine
the Pearson (β1, β2) plane, the region into which our theoretical moments most often lead us is
what Pearson called the Type IV area; and unfortunately it is in this area that the classical inequali-
ties and Dr. Mallows's adaptation of them work least well. Dr. Mallows's inequalities work
better and better as we progress upwards to the impossible line (β2 − β1 − 1 = 0), and it is here
that we rarely find ourselves. This prompts the possibly unworthy thought that perhaps the
classical approach is not based on sufficiently firm practical foundations and that what is needed
is to recast our thinking afresh. I certainly think that no fresh approach along classical lines is
likely to lead to anything better than that of Dr. Mallows.
It is just worth while saying that it is unfortunate that Dr. Mallows has chosen the word
"smooth" to describe the restrictions which he puts on his distribution functions, since it partly
clashes with the idea of "smoothness" and "smooth" alternatives as put forward by Professor
Neyman. Dr. Mallows's smoothness conditions restrict the change of sign of the derivative of
the distribution function and also the size of this derivative. Professor Neyman was also concerned
that his smooth alternatives should not have derivatives which change sign often but only relative
to the hypothesis tested. There is small likelihood of the two different aspects of smoothness
being confused but one feels a little like Humpty Dumpty in Looking Glass Land. "That's
a great deal to make one word mean," Alice said in a thoughtful tone. "When I make a word do
a lot of extra work like that," said Humpty Dumpty, "I always pay it extra."

Dr. D. E. BARTON: At first sight it looks as if Dr. Mallows's inequalities are rather too wide
to be useful, and his assumptions are rather too restrictive to be made with confidence. It is
therefore useful to point out a situation where neither of these objections carries very much weight.
This is when we wish to evaluate the power function of a statistical test and, whilst we know the
null hypothesis (H0) distribution of the test function more or less completely, we only know the
first few moments of the test function under the alternative hypothesis (H1). The usual procedure
in this situation is to "fit by moments", it being felt that since the power only needs to be known
approximately, it is not a serious defect that we do not know the error of approximation. Now
we are generally concerned with "smooth alternatives" (in Neyman's sense) and, at least for small

departures from H0, it is reasonable to suppose that the distribution of the test function under H1
is not much less "smooth" (in Dr. Mallows's sense) than it is under H0. For instance, Neyman's
H1 system has (1, λ) smoothness always and (2, λ) smoothness for small departures from normality.
Dr. David has remarked on the connection between the two sorts of smoothness and has noted
Dr. Dodgson's contribution and so I will only add to this topic by saying that because both concepts
of smoothness essentially restrict the number of intersections of the two p.d.f's considered it is
my belief that they have a great deal in common. This of course is a point which is susceptible of
numerical verification.
If, then, we accept the assumption that under the alternative a slightly relaxed smoothness
condition holds we may use Dr. Mallows's inequalities based on the first two (or four) moments
to obtain bounds for the power function. This confirms the frequently observed fact that to know
the mean and variance of a test function under an alternative hypothesis is to know a great deal
about its power. At very least, these bounds may be used to provide an idea of the error involved
in "fitting by moments".
I have taken an example to illustrate the process which is an atypical one in that the H1 p.d.f.
is completely known, but this has been done to enable an exact comparison to be made. H0 is
that (x1, ..., xn) is a sample randomly and independently drawn from a unit normal population;
H1 is that the mean is Δ, Δ > 0, and the test function is x̄. We use only the mean and variance
of x̄ under H1 and the assumption of the (2, 1, 0·30) smoothness condition [λ = 0·30 being a
relaxation of the normal value λ = 0·24]. The bounds (λ(δ), Λ(δ)) for the power function
P(δ) (expressed as a function of δ = Δ√n) are tabled, together with values of P.

        δ       0      0·5    1·0    1·5    2·0    2·5    3·0
        Λ(δ)   ·09    ·17    ·32    ·51    ·71    ·87    ·97
        P      ·050   ·126   ·258   ·442   ·639   ·803   ·912
        λ(δ)   ·00    ·07    ·18    ·37    ·57    ·75    ·87

They are derived from an unpublished graph of the functions tabled in 14.1 of Dr. Mallows's
paper; the fact that he has not yet completed the (4, 1, λ) graph being responsible for the small
number of moments used here. However, the power is seen to be very well contained in the
bounds even for such small n and k. In more operational terms, for Δ = 0·2 samples of sizes 68
and 214 are needed for 50 per cent. and 90 per cent. power; the inequalities give the bounds
(61, 74) and (172, 260). These are fully sufficient for most practical purposes. Higher n and k
will plainly give all that would ever be required.
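The exact powers P and the quoted sample sizes can be reproduced, assuming (as the entry at δ = 0 suggests) a one-sided test of size 0·05, so that P(δ) = Φ(δ − 1·6449). A sketch of this check, which is mine rather than Dr. Barton's:

```python
import math

def Phi(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z05 = 1.6449                   # upper 5 per cent. point of the unit normal
P = {d: Phi(d - z05) for d in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0)}

# sample sizes for 50 and 90 per cent. power at Delta = 0.2:
# delta = Delta * sqrt(n), so n = (delta / Delta)^2
z10 = 1.2816                   # upper 10 per cent. point
n50 = round((z05 / 0.2) ** 2)
n90 = round(((z05 + z10) / 0.2) ** 2)
```

The computed powers agree with the tabled P row to within rounding, and n50, n90 recover the sample sizes 68 and 214.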

Dr. P. G. MOORE: I have found Dr. Mallows's paper of great benefit to me since it focuses
attention on what one is trying to do when fitting a frequency curve to a given set of data. Often
when using four moments, say, and assuming various functional forms for the unknown distri-
bution, such as the Pearson system, the Johnson system or the Gram-Charlier curves, very similar
results are obtained for the percentage points of the distribution. This paper suggests that by
fitting all these curves we are putting some restriction upon the height of the mode of the resulting
distribution, and it is this restriction that is producing very similar results when using the various
curves concerned. For example, four moments by themselves seem nearly always to reproduce
a 5 per cent. point with fair accuracy but it may well be that the conditions of smoothness which
are put forward here are in fact restricting the distribution at further points and producing accurate
results for other values of x.
In looking at the effect of the mode (or λ) on these inequalities consider the following results,
in the upper half of the table, for the upper limits of 1 − F(x) for the case (2, 0, λ) with varying
values of λ.

        λ          x = 1·0    1·5      2·0
        ∞          0·500     0·308    0·200    (Simple Tchebycheff's inequality)
        0·6        0·356     0·237    0·159
        0·5        0·333     0·221    0·147
        0·4        0·299     0·192    0·122
        1/2√3      0·211     0·067    0·000

        0·5303     0·115     0·051    0·024    t with 4 d.f., β2 infinite
        0·7071     0·121     0·060    0·031    Double exponential, β2 = 6
        0·4481     0·151     0·082    0·042    χ² with 8 d.f., β2 = 4·5
        0·4280     0·148     0·063    0·025    t with 12 d.f., β2 = 3·75
        0·4135     0·156     0·078    0·036    χ² with 24 d.f., β2 = 3·5
        0·3989     0·159     0·067    0·023    Normal distribution
        0·2887     0·211     0·067    0·000    Rectangular, β2 = 1·8
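Several entries in this table can be verified directly: the "Simple Tchebycheff's inequality" row is the one-sided (Cantelli) bound 1/(1 + x²), and the λ column for three of the distributions is the maximum ordinate of the density standardized to unit variance. A sketch of this check (mine, not Dr. Moore's):

```python
import math

# one-sided Tchebycheff (Cantelli) bound on 1 - F(x) for unit variance
cantelli = {x: 1 / (1 + x * x) for x in (1.0, 1.5, 2.0)}

# maximum ordinates (lambda) of three unit-variance densities
lam_rect = 1 / (2 * math.sqrt(3))      # rectangular on [-sqrt(3), sqrt(3)]
lam_norm = 1 / math.sqrt(2 * math.pi)  # normal, ordinate at the mean
lam_dexp = 1 / math.sqrt(2)            # double exponential with scale 1/sqrt(2)
```

These reproduce 0·500, 0·308, 0·200 and the tabled λ values 0·2887, 0·3989, 0·7071.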


From these results it is clear that a knowledge of λ makes considerable differences to the appro-
priate bounds for the various probabilities. The lower part of the table gives the actual values
of λ and 1 − F(x) for seven distributions in decreasing order of kurtosis and it will be noticed
that although considerable variations exist the bounds are still quite wide and would require a
fairly accurate prior knowledge of λ to make them satisfactory. In his paper Dr. Mallows illus-
trates with λ = 0·5 which is a good first guess for most cases in practice as it would seem to include
most of the common forms of distribution. λ = 0·5 happens also to be the cross-over point for
the estimation of the mean of a distribution in that for distributions with λ < ½ the sample mean
is the best estimator of the population mean whilst for λ > ½ the sample median comes to be better
in terms of its sampling variance. Perhaps this seems to show that instead of specifying a large
number of moments from the sample we should endeavour to use fewer moments but an accurate
assessment of the height of the mode in order to be accurate in our description of a population
from a sample.
These inequalities can of course be used inversely in several ways. For example we could
use them to obtain inequalities for the mode of a distribution given n moments, or we could use
them to obtain bounds for one terminal of a distribution when the other is pegged at some point.
Again these problems revolve around the question of λ and the values that may be given to it in
practical situations.
Presumably other sets of inequalities could be obtained, perhaps after some difficult mathe-
matics, by making different sets of assumptions concerning λ. At present λ applies only to the
highest order of differential of the cumulative distribution function that is considered. Some
device might be found to incorporate λ1, λ2, ..., the maximum values of the various derivatives.
Again an improvement could presumably be effected by allowing λ to have a value varying according
to the value of x so that for the first derivative, say, the probability density function is tied down
more closely than by just stating a maximum applicable over the whole range.

Professor C. A. ROGERS: I have no direct contribution to make to the discussion on Dr.
Mallows's paper, except that I wish to congratulate him on it. But there are two generalizations
of the problem, which have been raised in the course of the discussion, which I should like to say
something about. Mr. Godwin suggested that instead of estimating the value of the distribution
function F(x) one should estimate ∫_X^Y dF(x) for fixed X and Y. More generally one could estimate
some generalized moment

        ∫_{−∞}^{∞} Q(x) dF(x).    .    .    .    .    .    (1)

Here Mallows's case is when Q(x) has the value 1 to the left of a fixed point and the value 0 to the
right of the point; and Godwin's case is when Q(x) has the value 1 in a fixed interval and the
value 0 outside the interval. Dr. Johnson suggested that it might be more efficient to estimate
F(x) in terms of generalized moments of the form

        μi = ∫_{−∞}^{∞} pi(x) dF(x),    i = 1, 2, ..., n.    .    .    .    (2)

This is a much more general problem and one cannot hope to get any explicit results until one
makes up one's mind as to what functions one is going to consider. Dr. Mulholland and I have
considered these problems from a general point of view and I shall describe the type of result
we are able to obtain. We consider the generalized moment problem of type (n, k, ∞), supposing
that F(x) satisfies the moment conditions (2) and smoothness conditions (k, ∞), and seeking the
extreme values of the integral (1). One can prove that although the extreme values of this integral
may not be attained they will be approximated as closely as one likes, even if one restricts the
function F(x) to be a distribution function of a type similar to those considered by Dr. Mallows.
One restricts attention to distribution functions of the form

        F(x) = Σ_{ξi ≤ x} αi (x − ξi)^k,

satisfying the (k, ∞) smoothness condition in the wide sense. Here ξ1, ξ2, ..., ξN are numbers
with −∞ < ξ1 < ξ2 < ... < ξN < +∞, and the summation is over those values of i for
which ξi ≤ x. In order to be certain that the integral takes values arbitrarily near its extreme
values, under the moment conditions (n, k, ∞), we have to allow F(x) to be a function of this
special form where N < n + k + 1. In the special cases he considers, Dr. Mallows has n = 2m,
and he shows essentially that the same result holds for functions F(x) of this form satisfying the
much more restrictive condition N < ½n + k + 1.
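Functions of the form just described are easy to experiment with numerically. A minimal sketch (with hypothetical knots ξi and coefficients αi, k = 1) checking that one particular choice is a genuine distribution function:

```python
def F(x, xi, alpha, k=1):
    # F(x) = sum over knots xi_i <= x of alpha_i * (x - xi_i)^k
    return sum(a * (x - t) ** k for t, a in zip(xi, alpha) if t <= x)

xi = [0.0, 1.0, 1.5]        # hypothetical knots xi_1 < xi_2 < xi_3
alpha = [0.5, 0.5, -1.0]    # chosen so F rises to 1 and stays there
```

With these values F is piecewise linear with slopes 0·5, 1·0 and 0, i.e. non-decreasing with F(−∞) = 0 and F(+∞) = 1; for a poor choice of the αi the same evaluation immediately exposes the failure of monotonicity.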

Mr. F. J. ANSCOMBE: In this very clearly written paper, Dr. Mallows makes a contribution
to an old problem in the mathematical theory of probability. I am not closely acquainted with the
relevant literature, but it appears to me that Dr. Mallows has made a striking advance on previous
work, and this fact provides a complete justification for his endeavour. It is nevertheless legitimate
to ask whether his results are likely to be of any practical use in statistics. If the answer is No,
his investigation was still worth while, but if the answer is Yes, then so much the better.
It is difficult at this stage to form a clear idea of the possible usefulness of this work, since
that may well be affected by further research on when the (k, A) smoothness conditions are satisfied.
But I suggest that in any particular case, if the specification of a probability distribution is suffi-
ciently tractable for some moments to be found and for a (k, λ) condition to be shown to be satisfied,
then it will be more profitable to seek a direct approximation to the distribution function than to
apply the rather difficult results of this paper.
Dr. Mallows has confined his attention to inequalities for the cumulative distribution function.
Distribution functions are of course required when we calculate significance levels of tests of the
orthodox type. But many statistical problems require likelihood functions rather than distribution
functions. I should like to ask Dr. Mallows to indicate, in his reply to the discussion, whether
any interesting inequalities can be derived from the present work for the probability density function.
In any statistical work where an unknown distribution is being discussed, I should be very
hesitant of asserting that a smoothness condition of order k greater than 1 was satisfied, since
naturally occurring populations could so easily violate such a condition. For example, a popu-
lation consisting of a mixture (in any proportion) of two normal populations having different
means may well fail to satisfy a smoothness condition of order 2, even if it is unimodal.

Dr. MALLOWS (in reply): In reply to Dr. Johnson, I apologize for the omission in the proof
of section 6.5, where I assume that the family of functions Fζ(x) exists. I think this is reasonable,
but I cannot prove it at the moment. As for the limiting value of λ, I have the formula for that
in the two-moment case (taking μ2 = 1):

        λ ≥ [(k + 1)!/2] [2/((k + 2)(k + 3))]^{(k+1)/2}    (k ≥ 1),

or it can be written the other way round,

        μ2 ≥ [2/((k + 2)(k + 3))] [(k + 1)!/(2λ)]^{2/(k+1)}    (k ≥ 1),

which gives the lower limit to the second moment. Dr. Mulholland mentioned the Markoff
method of obtaining the Tchebycheff inequalities. The method does not seem to be of much use
if λ < ∞; I have thought about it a great deal, but could see no way of constructing the polynomials.
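The limiting value of λ in the two-moment case can be checked for the lowest orders against explicit densities: for k = 0 the rectangular density of height 1/2√3 (the entry in Dr. Moore's table), and for k = 1 a triangular density with |F″| = 1/6, each standardized to unit variance. A numerical sketch, on my assumption (not stated in the discussion) that these are the extremal shapes:

```python
import math

def var(pdf, a, n=200000):
    # midpoint-rule variance of a symmetric density on [-a, a] (mean 0)
    h = 2 * a / n
    return sum(pdf(-a + (i + 0.5) * h) * (-a + (i + 0.5) * h) ** 2
               for i in range(n)) * h

# k = 0: rectangular density of height lam0 on [-sqrt(3), sqrt(3)]
lam0 = 1 / (2 * math.sqrt(3))
v0 = var(lambda x: lam0, math.sqrt(3))

# k = 1: triangular density with |slope| = lam1 on [-sqrt(6), sqrt(6)]
lam1 = 1 / 6
v1 = var(lambda x: lam1 * (math.sqrt(6) - abs(x)), math.sqrt(6))
```

Both densities integrate to one with unit variance, so λ = 1/2√3 ≈ 0·2887 (k = 0) and λ = 1/6 (k = 1) are attainable, fixing the limiting values for these two orders.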
With respect to the value of U(x) - L(x); the limits come out as envelopes of extremal distri-
butions, and for any one x the trouble is that the upper and lower limits will be attained by different
extremal distributions.
I agree with Dr. David that some extra assumptions such as those made by Bernstein might
usefully be employed. I have wondered whether there is any tie-up with Hoeffding's recent
work. He considers sums of r.v.'s and shows that the extremal distributions (in the (2m, 0, ∞)
case) are again point-mass distributions, but having 2m + 1 point-masses instead of Tchebycheff's
m + 1, as in the problem mentioned by Mr. Godwin. It seems to me that the "smoothness"
assumptions are more general than these, but will to some extent have the same effect.
I must apologize for not having done more computations; one indication of the effect of
assuming (1, λ) smoothness is as follows: Consider the (4, 1, λ) moment-problem, using Normal
moments and the Normal value 0·24197 for λ. Then the symmetrical extremal distribution differs
from the (cumulative) Normal distribution by at most 0·013 (this is at x = ±1·935); there is
another relative maximum difference of 0·011 at x = ±0·449. It seems that with this or higher
order smoothness conditions we shall get really close inequalities.

Dr. MALLOWS subsequently replied in writing, as follows:


The following extension of the table in 14.1 gives lower limits for 1 − F(x):

        Moment-problem    x = 0    0·5     1·0     1·5
        (2, 0, 0·5)       ·333    ·141    0       0
        (2, 1, 0·3)       ·42     ·24     ·09     ·02     (0 at 1·826)
        (4, 0, ∞)         ·167    ·036    0       0
        (4, 1, ∞)         ·300    ·113    ·014    ·000    (0 at 1·732)

In the remaining two-moment cases no limit other than zero is possible.


The boundary in the (β1, β2) plane of the region in which infinitely smooth distributions may
exist passes through the points (4/n, 3 + 6/n) for all integral n (the corresponding distributions
being Type III with integer exponent); but I believe the intermediate points on the Type III line
lie wholly within the region. I conjecture that the Normal is the only infinitely smooth distribution
having (β1, β2) = (0, 3).
Dr. Johnson's suggestion that the inequalities might be improved by considering only truncated
distributions would require the knowledge of the truncated moments; these would presumably
be difficult to find. I find his second suggestion less attractive than the third, where we suppose
λ to be a function of x; under suitable conditions on this function, we can construct extremal
distributions as specified in 5.9 (λ = λ(x) throughout) and Theorem II will still hold. For
k > 1, we may extend Definition I to involve k + 1 functions λi(x). The algebraic complexities
are formidable, but a numerical approach may still be possible. Dr. Johnson's first suggestion is
here equivalent to requiring each λi(x) to be zero outside some finite interval.

Completion of the proof of 6.5


Suppose the moment-problem (2m, k, λ) admits as a solution F_X(x) with μ2m+1 = μX; it is
desired to construct a solution having any assigned (2m + 1)th moment μ2m+1 = ζ.
Consider the function F_X^(k+1)(x) and its associated sequence ν(X) = {νr(X)} (notation of 11.6).
Defining "distance" in (2m + k + 1)-dimensional ν-space in a fashion analogous to that of 6.2,
there exists a d0 > 0 such that for all ν-points ν(Y) with d(ν(Y), ν(X)) < d0, there will exist a function
G_Y(x) having this ν-point and satisfying the (k, λ) smoothness condition. (Note that G_Y cannot
be a distribution function unless ν0, ν1, ..., νk−1 are all zero; we are here using Definition I
to apply to a more general class of functions.) Choose α large and positive or negative according
as ζ > μX or ζ < μX; define

        G_Z^(k+1)(x) = γ exp{−(x − α)²},

where γ is of order |ζ − μX|/α^{2m+k+1}, with its sign chosen according as ζ > μX or ζ < μX,
and consider the effect of adding G_Z^(k+1) to G_Y^(k+1). The moments ν0, ν1, ..., ν2m+k will be
altered by O(α^{−1}), and ν2m+k+1 will become

        [(2m + 1)!/(2m + k + 1)!] ζ + O(α^{−1}).

For α sufficiently large, G_Z can be so chosen that

(i) νr(Y) + νr(Z) = νr(X) for r = 0, 1, ..., 2m + k, and = [(2m + 1)!/(2m + k + 1)!] ζ for
r = 2m + k + 1;
(ii) G_Y(x) + G_Z(x) satisfies the (k, λ) smoothness condition. Then G_Y + G_Z is the required
solution of the moment-problem.
Mr. Anscombe's request for inequalities for the p.d.f. F'(x) can be met; if E(x) is a general
extremal distribution and F(x) a solution of the m.p. (2m, k, λ), (k ≥ 1), then from the proof
of Theorem II H'(x) = F'(x) − E'(x) cannot have more than 2m + 2 simple zeros; and from a
simple extension of Theorem I it must have at least 2m + 1. It follows (cf. 6.4) that as E(x)
varies through a complete cyclic set, E'(x) will sweep out a region in the (x, F'(x)) plane within
which F'(x) must wholly lie. Similarly we may obtain limits for F''(x), ..., F^(k)(x). Thus
any solution of the m.p. (2m, k, λ) is a solution also of a series of m.p.'s of the type suggested by
Dr. Johnson. The reverse is not true.
