
Maximum Likelihood Estimation in a Class of Nonregular Cases. Author: Richard L. Smith. Source: Biometrika, Vol. 72, No. 1 (Apr., 1985), pp. 67-90. Published by: Oxford University Press on behalf of Biometrika Trust. Stable URL: http://www.jstor.org/stable/2336336. Accessed: 03-04-2017 11:21 UTC.


Biometrika (1985), 72, 1, pp. 67-90

Printed in Great Britain

Maximum likelihood estimation in a class of nonregular cases

BY RICHARD L. SMITH

Department of Mathematics, Imperial College, London SW7 2BZ, U.K.

SUMMARY

We consider maximum likelihood estimation of the parameters of a probability density which is zero for x < θ and asymptotically cα(x − θ)^{α−1} as x ↓ θ. Here the parameters, which may or may not include α and c, are unknown. The classical regularity conditions for the asymptotic properties of maximum likelihood estimators are not satisfied, but it is shown that, when α > 2, the information matrix is finite and the classical asymptotic properties continue to hold. For α = 2 the maximum likelihood estimators are asymptotically efficient and normally distributed, but with a different rate of convergence. For 1 < α < 2, the maximum likelihood estimators exist in general, but are not asymptotically normal, while the question of asymptotic efficiency is still unsolved. For α < 1, the maximum likelihood estimators may not exist at all, but alternatives are proposed. All these results are already known for the case of a single unknown location parameter θ, but are here extended to the case in which there are additional unknown parameters. The paper concludes with a discussion of the applications in extreme value theory.

Some key words: Extreme value theory; Maximum likelihood; Nonregular estimation; Stable distribution; Weibull distribution.

1. INTRODUCTION

It is well known that, if the support of a probability density depends on an unknown parameter, then the classical regularity conditions for maximum likelihood estimation are not satisfied. In some cases the maximum likelihood estimators exist and have the

same asymptotic properties as in regular cases. In other cases they may exist but not be

asymptotically efficient or normally distributed, and in still other cases they may not

exist at all, at least not as solutions of the likelihood equations.

The case of a single unknown location parameter θ,

f(x; θ) = f₀(x − θ)  (θ < x < ∞);  f₀(x) ~ αcx^{α−1} as x ↓ 0  (α > 0, c > 0),  (1.1)

subject to certain smoothness conditions, is well documented. For α > 2 the Fisher information is finite and the maximum likelihood estimators have the same asymptotic properties as in regular cases. This is stated by Woodroofe (1972, Proposition 1.1), based on results of Le Cam (1970); the result is also implicit in Dawid's (1970) proof of the asymptotic normality of posterior distributions in this case. For α = 2 the maximum likelihood estimators are asymptotically normal (Woodroofe, 1972; Akahira, 1975a) and efficient (Weiss & Wolfowitz, 1973), but the order of convergence is (n log n)^{1/2} instead of the usual n^{1/2}. For 1 < α < 2 the maximum likelihood estimators have a nonnormal limiting distribution with order of convergence n^{1/α} (Woodroofe, 1974). Asymptotic efficiency is an open question, though Akahira (1975a, b) showed that no estimator has a larger order of convergence. When α < 1 the likelihood function has no local maximum but is globally maximized at the sample minimum. The sample minimum itself is a consistent estimator with order of convergence n^{1/α}, and Akahira's results show that, for 0 < α < 2, this order of convergence cannot be improved. When α = 1 a result of Weiss (1979) shows that the sample minimum has a property of asymptotic sufficiency, but I have not succeeded in applying Weiss's argument for any other value of α.

In the present paper these results are extended to distributions in which there are other unknown parameters as well as θ. Some examples which fall within our framework are as follows.

(i) Three-parameter Weibull,

f(x; θ, α, β) = αβ(x − θ)^{α−1} exp{−β(x − θ)^α}  (θ < x < ∞, α > 0, β > 0).  (1.2)

(ii) Three-parameter gamma,

f(x; θ, α, β) = β^α (x − θ)^{α−1} exp{−β(x − θ)}/Γ(α)  (θ < x < ∞, α > 0, β > 0).  (1.3)

(iii) Three-parameter beta, which is a beta distribution with an unknown scale parameter, which can be recast in our framework by defining X to be a random variable on θ < x < ∞ with 1 − exp(−X + θ) beta distributed; thus

f(x; θ, α, β) = {B(α, β)}^{−1} {1 − e^{−(x−θ)}}^{α−1} e^{−β(x−θ)}  (θ < x < ∞, α > 0, β > 0).  (1.4)

(iv) Three-parameter log gamma, which arises when log(X − θ + 1) has a gamma distribution; thus

f(x; θ, α, β) = β^α {log(x − θ + 1)}^{α−1} (x − θ + 1)^{−β−1}/Γ(α)  (θ < x < ∞, α > 0, β > 0).  (1.5)
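Since the displayed densities were recovered from a damaged scan, a quick numerical sanity check is worthwhile: each of (1.2)-(1.5), with θ = 0 and arbitrary test values α = 2, β = 3 (assumptions of this sketch, not values from the paper), should integrate to 1 over (0, ∞).

```python
import math

ALPHA, BETA = 2.0, 3.0  # arbitrary test values; theta = 0 throughout

def weibull(u):               # density (1.2)
    return ALPHA * BETA * u ** (ALPHA - 1) * math.exp(-BETA * u ** ALPHA)

def gamma_density(u):         # density (1.3)
    return BETA ** ALPHA * u ** (ALPHA - 1) * math.exp(-BETA * u) / math.gamma(ALPHA)

def beta_density(u):          # density (1.4); B(a,b) = Gamma(a)Gamma(b)/Gamma(a+b)
    B = math.gamma(ALPHA) * math.gamma(BETA) / math.gamma(ALPHA + BETA)
    return (1 - math.exp(-u)) ** (ALPHA - 1) * math.exp(-BETA * u) / B

def log_gamma_density(u):     # density (1.5)
    return (BETA ** ALPHA * math.log(u + 1) ** (ALPHA - 1)
            * (u + 1) ** (-BETA - 1) / math.gamma(ALPHA))

def integrate(f, hi=60.0, n=120000):
    h = hi / n                # composite midpoint rule on (0, hi)
    return sum(f((k + 0.5) * h) for k in range(n)) * h

totals = {f.__name__: integrate(f)
          for f in (weibull, gamma_density, beta_density, log_gamma_density)}
```

All four totals come out equal to 1 to within the quadrature error, which supports the reconstructed formulas.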

Our results also cover certain instances of the Box-Cox transformation family but we

shall not consider these explicitly.

Of these four examples, the Weibull has been studied the most intensively. For all four, when α < 1 the density tends to infinity as x ↓ θ so that, unless the range of α is restricted, the likelihood function always tends to infinity along some path in the parameter space as θ tends to the sample minimum. Therefore it is necessary to distinguish between global and local maxima of the likelihood. In this paper, by the maximum likelihood estimator we shall always mean a local maximum, thus satisfying the likelihood equations. A second point, which applies to all four examples but may not be true in general, is that for α < 1 the density is J-shaped, so there cannot exist a maximum likelihood estimator for which the estimate of α is less than 1. In particular, if the true value of α is less than 1, maximum likelihood estimators either do not exist at all or are inconsistent.

Harter & Moore (1965) described an iterative procedure for finding maximum

likelihood estimators for the Weibull and gamma distributions with possibly censored

data. In cases where maximum likelihood estimators do not exist, they proposed an ad

hoc modification based on treating the smallest observation as if it were censored.

Rockette, Antle & Klimko (1974) showed for the Weibull distribution that, if a local

maximum of the likelihood function exists, then there is a second solution of the likelihood equations which is a saddlepoint. Their result shows that, in finding maximum


likelihood estimators one must take care to ensure that a solution of the likelihood equations really is a local maximum. Other relevant references are Cohen (1965) and Lemon (1975). The Weibull distribution may be reparameterized as the generalized extreme value distribution (Jenkinson, 1955) for which modern algorithms are available (NERC, 1975; Prescott & Walden, 1980, 1983). Thus there is a fair-sized literature on finding maximum likelihood estimators of the

Weibull distribution, but their asymptotic properties are largely unexplored. It is easily checked that, when α > 2, the Fisher information matrix is finite, and it is widely assumed that the classical properties hold in this case. For α ≤ 2 the Fisher information for θ is infinite, so the classical results are certainly not valid.
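To make the preceding discussion concrete, here is a minimal coordinate-ascent search for a local maximum of the three-parameter Weibull likelihood, in the spirit of Harter & Moore's iterative procedure (an illustrative sketch, not their algorithm; the sample, seed, search ranges and starting values are arbitrary assumptions). Restricting the search to α ≥ 1 avoids the unbounded ridge at θ ↑ X_{n,1} noted above.

```python
import math
import random

random.seed(1)
ALPHA0, BETA0, THETA0 = 2.0, 1.0, 5.0    # true values (assumed for the demo)
xs = [THETA0 + (-math.log(1.0 - random.random()) / BETA0) ** (1.0 / ALPHA0)
      for _ in range(500)]
xmin = min(xs)

def loglik(theta, alpha, beta):          # log likelihood under (1.2)
    if theta >= xmin or alpha <= 0 or beta <= 0:
        return -1e18
    return sum(math.log(alpha) + math.log(beta)
               + (alpha - 1.0) * math.log(x - theta)
               - beta * (x - theta) ** alpha for x in xs)

def line_max(f, lo, hi, iters=60):
    # golden-section search for a maximum of f on [lo, hi]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
        else:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
    return (a + b) / 2.0

theta, alpha, beta = xmin - 1.0, 1.5, 0.5     # crude starting values
start = loglik(theta, alpha, beta)
for _ in range(20):                           # coordinate ascent
    theta = line_max(lambda t: loglik(t, alpha, beta), xmin - 5.0, xmin - 1e-9)
    alpha = line_max(lambda a: loglik(theta, a, beta), 1.0, 6.0)
    beta = line_max(lambda b: loglik(theta, alpha, b), 1e-3, 10.0)
end = loglik(theta, alpha, beta)
```

As the result of Rockette, Antle & Klimko warns, a stationary point located this way should still be checked to be a genuine local maximum, for instance by verifying that the Hessian of the log likelihood is negative-definite there.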

In this paper it is confirmed that the classical results hold when α > 2, and the case α ≤ 2 is studied in detail. The surprising result is that, in this case, the maximum likelihood estimators of θ and of the other parameters, denoted by φ, in general a vector, are asymptotically independent: each has the same asymptotic distribution when the other is unknown as when the other is known, and we are also able to show that these asymptotic distributions are independent. For 1 < α < 2 we prove the existence of a consistent sequence of maximum likelihood estimators as the sample size tends to infinity, while for α < 1 no consistent maximum likelihood estimators exist. We also propose efficient alternatives to maximum likelihood which, in particular, cover the cases where maximum likelihood estimators do not exist.

Cheng & Amin (1983) have also studied the asymptotic properties of maximum likelihood estimators for the Weibull and gamma cases, though their Theorem 2 is less extensive than our results in §§ 3 and 4 and their proofs are unpublished. On the other hand they introduce a new estimator, the 'maximum product of spacings' estimator, which deserves to be studied further. Johnson & Haskell (1983) prove consistency of the maximum likelihood estimator for the three-parameter Weibull with α > 1, and present Monte Carlo results which indicate that, even in the 'regular' case α > 2, the asymptotic normality of the estimators is approached only slowly.

Our main results require a long list of assumptions, which are stated in § 2. The remainder of the paper is organized so that statements of the main results appear at the beginning of each section, and may be understood without reference to the proofs. Results of a purely technical nature are stated as lemmas and contained within the proofs of the theorems, though Lemmas 6 and 7 may be of independent interest.

2. ASSUMPTIONS

We consider probability densities of the form

f(x; θ, φ) = (x − θ)^{α−1} g(x − θ; φ)  (θ < x < ∞),  (2.1)

where θ and φ, the latter a vector, are unknown parameters and g(x; φ) → α(φ)c(φ) as x ↓ 0. We assume α ≡ α(φ) is a twice continuously differentiable function of φ with 0 < α < ∞. This formulation allows both the case when α is a component of φ and that in which it is a known constant. Similarly c ≡ c(φ). For our four examples the function g is given by:

(i) Weibull, g(x; α, β) = αβ exp(−βx^α);

(ii) gamma, g(x; α, β) = β^α exp(−βx)/Γ(α);

(iii) beta, g(x; α, β) = {B(α, β)}^{−1} {(1 − e^{−x})/x}^{α−1} e^{−βx};

(iv) log gamma, g(x; α, β) = β^α {(log(1 + x))/x}^{α−1} (1 + x)^{−β−1}/Γ(α).
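As a numerical check on the reconstructed g's (test values α = 2, β = 1.5 are arbitrary assumptions of this sketch): each g(x; α, β) should tend to αc(φ) as x ↓ 0, with c = β, β^α/{αΓ(α)}, 1/{αB(α, β)} and β^α/{αΓ(α)} for (i)-(iv) respectively.

```python
import math

A, B = 2.0, 1.5   # arbitrary test values of alpha, beta

def g_weibull(x):
    return A * B * math.exp(-B * x ** A)

def g_gamma(x):
    return B ** A * math.exp(-B * x) / math.gamma(A)

def g_beta(x):
    Bab = math.gamma(A) * math.gamma(B) / math.gamma(A + B)  # B(alpha, beta)
    return ((1 - math.exp(-x)) / x) ** (A - 1) * math.exp(-B * x) / Bab

def g_log_gamma(x):
    return (B ** A * (math.log(1 + x) / x) ** (A - 1)
            * (1 + x) ** (-B - 1) / math.gamma(A))

alpha_c = {   # the limits alpha * c(phi) claimed above
    'weibull': A * B,
    'gamma': B ** A / math.gamma(A),
    'beta': math.gamma(A + B) / (math.gamma(A) * math.gamma(B)),
    'log gamma': B ** A / math.gamma(A),
}
gaps = {
    'weibull': abs(g_weibull(1e-9) - alpha_c['weibull']),
    'gamma': abs(g_gamma(1e-9) - alpha_c['gamma']),
    'beta': abs(g_beta(1e-9) - alpha_c['beta']),
    'log gamma': abs(g_log_gamma(1e-9) - alpha_c['log gamma']),
}
```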

This content downloaded from 200.131.225.130 on Mon, 03 Apr 2017 11:21:25 UTC All use subject to http://about.jstor.org/terms

70 RICHARD L. SMITH

In general, we assume θ is real-valued and φ takes values in an open set Φ ⊂ R^p. We adopt the convention that subscripts, φ₁, φ₂, etc., are used to denote elements of Φ, whilst superscripts, φ¹, φ², etc., denote components of a particular φ. Thus φ_j^i (i = 1, …, p) is the ith component of φ_j. The symbol |φ| denotes the Euclidean norm of φ.

The detailed assumptions are as follows.

Assumption 1. All second-order partial derivatives of g(x; φ) exist and are continuous in 0 < x < ∞, φ ∈ Φ. Moreover c(φ) = α^{−1} lim_{x↓0} g(x; φ) exists, is positive and finite for each φ, and is twice continuously differentiable as a function of φ.

Assumption 2. If φ₁ and φ₂ are distinct elements of Φ then the set

{x: f(x; 0, φ₁) ≠ f(x; 0, φ₂)}

has positive Lebesgue measure.

Assumption 3. For some fixed ξ* ≥ 0 and for each η > 0, φ₁ ∈ Φ, we have

log g(x − ξ; φ) − log g(x − ξ₁; φ₁) ≤ H_η(x; φ₁),

whenever |ξ| ≤ η, |ξ₁| ≤ η, |φ − φ₁| ≤ η, x − ξ > ξ*, x − ξ₁ > ξ*, and the function H_η satisfies

lim_{η→0} ∫₀^∞ H_η(x + y; φ₁) f(x; 0, φ₀) dx = 0

for each y > 0, φ₀ ∈ Φ.

Assumption 4. There exists a fixed increasing sequence of compact subsets {K_m, m ≥ 1} of R^{p+1} and a fixed constant δ' such that ∪_m K_m = R × Φ and, for θ, θ₀, φ, φ₀ satisfying 0 ≤ θ₀ − θ ≤ δ', φ, φ₀ ∈ Φ, (θ, φ) ∉ K_m, we have

log f(x; θ, φ) − log f(x; θ₀, φ₀) ≤ H_m(x; θ₀, φ₀)  (x > θ₀),

where

−∞ < lim sup_{m→∞} ∫_{θ₀}^∞ H_m(x; θ₀, φ₀) f(x; θ₀, φ₀) dx < 0.

Assumption 5. There exists a fixed increasing sequence of compact subsets {K′_m, m ≥ 1} of Φ with ∪_m K′_m = Φ, and δ' > 0 a fixed constant, such that for φ ∈ Φ − K′_m, φ₀ ∈ Φ, |ξ| ≤ δ', max(0, ξ) < x < ∞ and some η' > 0, we have

η'α(φ) + {α(φ) − α(φ₀)} log x + log g(x − ξ; φ) − log g(x; φ₀) ≤ H′_m(x; φ₀),

where

−∞ < lim sup_{m→∞} ∫₀^∞ H′_m(x; φ₀) f(x; 0, φ₀) dx < 0.

Assumption 6. If E_φ denotes expectation with respect to f(·; 0, φ) then for each φ ∈ Φ (i, j = 1, …, p):

(a) E_φ{(∂/∂φ^i) log f(X; 0, φ)} = 0,

E_φ{(∂/∂φ^i) log f(X; 0, φ) (∂/∂φ^j) log f(X; 0, φ)} = −E_φ{(∂²/∂φ^i ∂φ^j) log f(X; 0, φ)} = m_{ij}(φ);

(b) if α > 1, then

E_φ{(∂/∂x) log f(X; 0, φ)} = 0,

−E_φ{(∂/∂x) log f(X; 0, φ) (∂/∂φ^i) log f(X; 0, φ)} = E_φ{(∂²/∂x ∂φ^i) log f(X; 0, φ)} = m_{i0}(φ) = m_{0i}(φ);

(c) if α > 2, then

E_φ[{(∂/∂x) log f(X; 0, φ)}²] = −E_φ{(∂²/∂x²) log f(X; 0, φ)} = m₀₀(φ) > 0.

It is part of the assumption that all these expectations are finite.

Assumption 7. If h(x; φ) is any of (∂²/∂x ∂φ^i) log g(x; φ) or (∂²/∂φ^i ∂φ^j) log g(x; φ), then, as θ → θ₀, φ → φ₀,

E₀|h(X − θ; φ) − h(X − θ₀; φ₀)| → 0,

where E₀ is with respect to f(·; θ₀, φ₀). If α(φ₀) > 2, we require the same of h(x; φ) = (∂²/∂x²) log g(x; φ).

Assumption 8. For each ε > 0, δ > 0, there exists a function h_{ε,δ} such that

|(∂²/∂x²) log g(x; φ)| ≤ ε/x² + h_{ε,δ}(y; φ₀)

whenever |φ − φ₀| ≤ δ, |x − y| ≤ δ, and h_{ε,δ} satisfies

∫₀^∞ h_{ε,δ}(x; φ₀) f(x; 0, φ₀) dx < ∞.

Assumption 9. For each ξ > 0, φ ∈ Φ,

∫_ξ^∞ {(∂/∂x) log g(x; φ)}² f(x; 0, φ) dx < ∞.

We now make some remarks on the assumptions. The key assumptions are Assumptions 1 and 6; the rest are there for technical reasons. Assumptions 2-5 are similar to, but necessarily more complicated than, the classical assumptions of Wald (1949). In Assumption 3 we may have ξ* = 0 or ξ* > 0; in the Weibull case, Assumption 3 is true with ξ* > 0 for all α but with ξ* = 0 only for α > 1. The distinction is reflected in the statement of Theorem 2 below. Assumptions 7 and 8 are needed for Lemma 4 in § 3, and Assumption 9 is taken from Woodroofe (1972, 1974). Assumption 1 could be weakened to g(x; φ) slowly varying at x = 0 for each φ, but at the cost of an increase in technical detail.

For all our examples, Assumptions 1-6 and 9 are straightforward, if somewhat tedious, to check. For examples (ii)-(iv), log g and its partial derivatives are bounded in x for each φ, so Assumptions 7 and 8 are easy to check as well. For the Weibull distribution we have

log g(x; α, β) = log α + log β − βx^α,

whose mixed second derivatives (∂²/∂x ∂α) log g(x; α, β) = −βx^{α−1}(1 + α log x) and (∂²/∂x ∂β) log g(x; α, β) = −αx^{α−1} are continuous and behave like powers of x as x ↓ 0, so that Assumption 7 follows easily. To satisfy Assumption 8, given 0 < α₁ < α₀ < α₂ and δ > 0, there exists K such that

|(∂²/∂x²) log g(x; α, β)| ≤ Kx^{α₁−2}  (x ≤ 1)

for all (α, β) satisfying α₁ ≤ α ≤ α₂, |β − β₀| ≤ δ. Let x_ε = (ε/K)^{1/α₁}, which we assume to be ≤ 1. Then

|(∂²/∂x²) log g(x; α, β)| ≤ ε/x² + h_{ε,δ}(x; α₀, β₀),

where

h_{ε,δ}(x; α₀, β₀) = 0 (x < x_ε),  = Kx^{α₁−2} (x_ε ≤ x < 1),  = Kx^{α₂−2} (x ≥ 1).

This function is bounded as x → 0. If we define h̄_{ε,δ}(y; α₀, β₀) to be

sup{h_{ε,δ}(x; α₀, β₀): |x − y| ≤ δ},

Assumption 8 is satisfied.
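The piecewise envelope just constructed can be checked numerically. In this sketch α₁ = 1.5, α₂ = 2.5, β₀ = 1, δ = 0.5 and ε = 0.01 are arbitrary test values, and for the Weibull example (∂²/∂x²) log g(x; α, β) = −α(α − 1)βx^{α−2}:

```python
alpha1, alpha2 = 1.5, 2.5
beta0, delta, eps = 1.0, 0.5, 0.01
K = alpha2 * (alpha2 - 1.0) * (beta0 + delta)   # dominates alpha*(alpha-1)*beta here
x_eps = (eps / K) ** (1.0 / alpha1)             # below x_eps, eps/x^2 alone suffices

def h_envelope(x):
    if x < x_eps:
        return 0.0
    return K * x ** (alpha1 - 2.0) if x < 1.0 else K * x ** (alpha2 - 2.0)

violations = 0
for i in range(1, 5001):
    x = i / 1000.0                               # grid over (0, 5]
    for alpha in (1.5, 1.8, 2.2, 2.5):           # alpha1 <= alpha <= alpha2
        for beta in (0.5, 1.0, 1.5):             # |beta - beta0| <= delta
            d2 = alpha * (alpha - 1.0) * beta * x ** (alpha - 2.0)
            if d2 > eps / x ** 2 + h_envelope(x) + 1e-12:
                violations += 1
```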

3. EXISTENCE AND UNIQUENESS OF CONSISTENT ESTIMATORS

From now on we shall let θ₀, φ₀ denote the true values of θ and φ. The letters α and c, unless indicated otherwise, will always denote α(φ₀) and c(φ₀). Let X₁, …, X_n denote a sample of independent observations from the common density f(·; θ₀, φ₀). Let L_n denote the log likelihood divided by n, that is

L_n(θ, φ) = n^{−1} Σ_{i=1}^n log f(X_i; θ, φ).

The order statistics will be denoted X_{n,1} ≤ … ≤ X_{n,n}; note that L_n(θ, φ) is defined for θ < X_{n,1}.

The maximum likelihood estimator, when it exists, will be denoted by (θ̂_n, φ̂_n) and satisfies

∂L_n(θ̂_n, φ̂_n)/∂θ = 0,  ∂L_n(θ̂_n, φ̂_n)/∂φ^i = 0  (i = 1, …, p).

For the special case when θ = θ₀ is known, let φ̃_n ≡ φ̃_n(θ₀) denote the maximum likelihood estimator for φ, satisfying (∂L_n/∂φ^i)(θ₀, φ̃_n) = 0. The existence and consistency of φ̃_n follows from the classical results for regular estimation problems. Similarly, let θ̃_n ≡ θ̃_n(φ₀) denote the maximum likelihood estimator for θ when φ = φ₀ is known. The asymptotic properties of θ̃_n are given by the results in § 1. In particular, θ̃_n exists and is consistent when α > 1.

Define d_{n,β} (n ≥ 1) to be 1 if β > 1, log n if β = 1 and n^{1/β − 1} if 0 < β < 1, and write Y_n ≺_p r_n, for random variables {Y_n} and positive constants {r_n}, if

lim_{a→∞} lim sup_{n→∞} pr(|Y_n| > a r_n) = 0.  (3.1)

Our main result in this section is the following.

THEOREM 1. Assume Assumptions 1 and 6-8 are satisfied, and that α > 1. Suppose that M is strictly positive-definite, where:

(i) for α > 2, M is the (p + 1) × (p + 1) matrix with entries m_{ij}(φ₀) (i, j = 0, …, p);

(ii) for 1 < α ≤ 2, M is the p × p matrix with entries m_{ij}(φ₀) (i, j = 1, …, p),

the functions m_{ij} being defined in Assumption 6. Then there exists a sequence (θ̂_n, φ̂_n) of solutions to the likelihood equations such that

θ̂_n − θ₀ ≺_p (n d_{n,α/2})^{−1/2},  φ̂_n − φ₀ ≺_p n^{−1/2}.

Moreover, if α = 2 we have

θ̂_n − θ̃_n ≺_p n^{−1/2}(log n)^{−1},  φ̂_n − φ̃_n ≺_p (n log n)^{−1/2},

while if 1 < α < 2 we have

θ̂_n − θ̃_n ≺_p n^{−2/α + 1/2},  φ̂_n − φ̃_n ≺_p n^{−1/α}.

Theorem 1 is a result about the local behaviour of the likelihood function near the true parameter values. There is no guarantee that the local maximum, whose existence is guaranteed by Theorem 1 when α > 1, is unique, and we know already that the local maximum is not a global maximum. The following result goes part of the way to settling the uniqueness of the estimator. It is an analogue for this problem of Wald's (1949) result on the consistency of the maximum likelihood estimator in regular estimation problems.

THEOREM 2. Suppose that, for each n, we have a sample of n independent observations from the density f(·; θ₀, φ₀), ordered X_{n,1} ≤ … ≤ X_{n,n}. For fixed δ > 0, ε > 0, γ < ∞, define the following regions of the parameter space R × Φ: U is the set of (θ, φ) for which θ ≤ θ₀ − δ, φ ∈ Φ, and V_n is the set of (θ, φ) for which |θ − θ₀| < δ, |φ − φ₀| > δ and either α(φ) > 1 or θ < X_{n,1} − n^{−γ}.

(i) Suppose Assumptions 1-6 hold with ξ* = δ in Assumption 3, δ' = δ in Assumptions 4 and 5. Then

lim_{n→∞} pr{sup_U L_n(θ, φ) ≥ L_n(θ₀, φ₀)} = 0.

(ii) Suppose Assumptions 1-6 hold with ξ* = 0 in Assumption 3, δ' = δ in Assumptions 4 and 5, and α > 1. Then

lim_{n→∞} pr{sup_{V_n} L_n(θ, φ) ≥ L_n(θ₀, φ₀)} = 0.

We remark that the main interest in this result lies in the case α > 1, when (i) and (ii) both hold. The theorem then shows that the region on which X_{n,1} − θ is exponentially small and α(φ) < 1 is asymptotically the only region where the likelihood function is badly behaved. When α < 1, for all our four examples the log likelihood is J-shaped, which shows at once that there can be no consistent maximum likelihood estimator.

Note that we have not settled the question of whether there is a unique local maximum of the likelihood function. The proof of Theorem 1 makes it clear that the Hessian of the log likelihood is negative-definite on a small neighbourhood, depending on n, around (θ₀, φ₀), and results of Mäkeläinen, Schmidt & Styan (1981) then show that there is a unique local maximum on this neighbourhood. The question of global uniqueness is much harder to resolve.

We start the proofs with several technical lemmas. In Lemmas 1-3, the only assumptions made are Assumption 1 and (2.1). Convergence of random variables is always convergence in distribution unless stated; →_p means convergence in probability.


LEMMA 1. Let X₁, …, X_n be independent from f(·; θ₀, φ₀) and let

S_{n,m} = Σ_{k=1}^n (X_k − θ₀)^{−m}  (m > 0).

Then a_n^{−1}(S_{n,m} − b_n) → W for some a_n, b_n and W having a stable distribution with index min(2, α/m). We may take a_n = (n d_{n,α/2m})^{1/2}. If m = α then also

S_{n,m}/(cn log n) →_p 1.

Proof. The random variable (X_i − θ₀)^{−m} has a distribution in a stable domain of attraction. The conclusion follows from standard results (Feller, 1971, §§ XVII.5, VII.7). □
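The stable domain of attraction claim can be made concrete for the Weibull example (the numerical values below are assumptions of this sketch): with θ₀ = 0, pr{X^{−m} > t} = F(t^{−1/m}) = 1 − exp(−βt^{−α/m}), so t^{α/m} pr{X^{−m} > t} → β = c, a regularly varying tail of index −α/m, which is exactly what Feller's results require.

```python
import math

alpha, beta, m = 2.0, 1.5, 4.0      # arbitrary test values; alpha/m = 1/2 here

def tail(t):
    # pr{ X^{-m} > t } for X ~ Weibull(alpha, beta), theta0 = 0
    return 1.0 - math.exp(-beta * t ** (-alpha / m))

# t^{alpha/m} * tail(t) should approach beta as t grows
ratios = [t ** (alpha / m) * tail(t) for t in (1e2, 1e5, 1e8)]
```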

LEMMA 2. Let X₁, …, X_n be as in Lemma 1, ordered X_{n,1} ≤ … ≤ X_{n,n}. Define

S*_{n,m} = Σ_{k=2}^n (X_{n,k} − X_{n,1})^{−m}  (m > 0).

(i) If α > m, then S*_{n,m}/S_{n,m} →_p 1 as n → ∞.

(ii) If α < m, then S*_{n,m} ≺_p n^{m/α} as n → ∞.

Proof. Define S~_{n,m} = S_{n,m} − (X_{n,1} − θ₀)^{−m}. It is easily seen that S~_{n,m}/S_{n,m} →_p 1, so we concentrate on showing that S*_{n,m}/S~_{n,m} →_p 1. Now

(X_{n,k} − X_{n,1})^{−m} − (X_{n,k} − θ₀)^{−m} ≤ (X_{n,k} − θ₀)^{−m} h{(X_{n,1} − θ₀)/(X_{n,k} − θ₀)},

where h(t) = mt(1 − t)^{−m−1} for 0 < t < 1. Note that h(t) is increasing in t and tends to 0 as t → 0. Therefore

S*_{n,m} − S~_{n,m} ≤ Σ_{k=2}^n (X_{n,k} − θ₀)^{−m} h{(X_{n,1} − θ₀)/(X_{n,k} − θ₀)}.  (3.2)

Part (i) follows by splitting the sum in (3.2) into two parts, using the monotonicity of h and (X_{n,k} − θ₀)^{−m}/S~_{n,m} →_p 0 for any fixed k. Part (ii) is similar but easier. □
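The elementary bound behind (3.2) — by the mean value theorem, (1 − t)^{−m} − 1 ≤ mt(1 − t)^{−m−1} for 0 < t < 1, which is h(t) after multiplying through by (X_{n,k} − θ₀)^{−m} — can be verified on a grid (the m values are arbitrary choices for this check):

```python
# count any grid points violating (1 - t)^{-m} - 1 <= m*t*(1 - t)^{-m-1}
violations = sum(
    1
    for m in (0.5, 1.0, 2.0, 3.7)
    for k in range(1, 1000)
    if (1 - k / 1000) ** (-m) - 1 > m * (k / 1000) * (1 - k / 1000) ** (-m - 1) + 1e-9
)
```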

For the results which follow, we need some new notation. Suppose {Y_n(λ_n), n ≥ 1} is a random sequence indexed by λ_n ∈ Λ_n, and {r_n, n ≥ 1} is a sequence of positive constants. We shall say that Y_n(λ_n) ≺_p r_n uniformly in Λ_n if the relation (3.1) holds uniformly over λ_n ∈ Λ_n. Similarly, we say that Y_n(λ_n) →_p c uniformly in Λ_n if convergence in probability holds uniformly over Λ_n.

Define

Y_n^{(i)}(θ) = n^{−1} Σ_{k=1}^n log(X_{n,k} − θ)  (i = 1);  Y_n^{(i)}(θ) = n^{−1} Σ_{k=1}^n (X_{n,k} − θ)^{−i+1}  (i = 2, 3).

LEMMA 3. Given positive sequences {δ_n}, {δ′_n}, the following relations hold uniformly over θ_n satisfying θ_n ≤ X_{n,1} − δ′_n, |θ_n − θ₀| ≤ δ_n:

(i) |Y_n^{(1)}(θ_n) − Y_n^{(1)}(θ₀)| ≺_p max{n^{−1} log(1/δ′_n), n^{−1} log n, δ_n d_{n,α}};

(ii) |Y_n^{(2)}(θ_n) − Y_n^{(2)}(θ₀)| ≺_p max{(nδ′_n)^{−1}, n^{1/α − 1}, δ_n d_{n,α/2}};

(iii) |Y_n^{(3)}(θ_n) − Y_n^{(3)}(θ₀)| ≺_p max{(nδ′_n²)^{−1}, n^{2/α − 1}, δ_n d_{n,α/3}}.

Proof. We prove only (ii); (i) and (iii) are similar. The difference Y_n^{(2)}(θ_n) − Y_n^{(2)}(θ₀) may be written as the sum of three terms: n^{−1}(X_{n,1} − θ_n)^{−1}, −n^{−1}(X_{n,1} − θ₀)^{−1}, and

n^{−1} Σ_{k=2}^n {(X_{n,k} − θ_n)^{−1} − (X_{n,k} − θ₀)^{−1}}.

The first term is ≺_p (nδ′_n)^{−1} and the second is ≺_p n^{1/α − 1}. For the third, we use

|(X_{n,k} − θ_n)^{−1} − (X_{n,k} − θ₀)^{−1}| ≤ |θ_n − θ₀|/(X_{n,k} − X_{n,1})².

But it follows from Lemmas 1 and 2 that

n^{−1} Σ_{k=2}^n (X_{n,k} − X_{n,1})^{−2} ≺_p d_{n,α/2}.

Hence this third term is ≺_p δ_n d_{n,α/2}. The result follows by putting together these three bounds. □

LEMMA 4. Let δ_n, δ*_n, n = 1, 2, …, be any sequences of positive numbers with δ_n → 0, δ*_n → 0, and let b, b' be any positive finite numbers. Let Assumptions 1 and 6-8 hold.

I(i) Suppose α > 2. Then −(∂²/∂θ²)L_n(θ, φ) →_p m₀₀(φ₀) uniformly over

|θ − θ₀| ≤ bn^{−1/2},  |φ − φ₀| ≤ bn^{−1/2},  θ ≤ X_{n,1} − δ*_n n^{−1/2}.

I(ii) Suppose α = 2. Then −(log n)^{−1}(∂²/∂θ²)L_n(θ, φ) →_p c uniformly over

|θ − θ₀| ≤ b(n log n)^{−1/2},  |φ − φ₀| ≤ bn^{−1/2},  θ ≤ X_{n,1} − δ*_n(n log n)^{−1/2}.

I(iii) Suppose α < 2. Then −(∂²/∂θ²)L_n(θ, φ) ≺_p n^{2/α − 1} uniformly over

|θ − θ₀| ≤ bn^{−1/α},  |φ − φ₀| ≤ bn^{−1/2},  θ ≤ X_{n,1} − b'n^{−1/α}.

Moreover −n^{−2/α + 1}(∂²/∂θ²)L_n(θ, φ) ≥ Z_n on this range, where {Z_n} is a sequence of asymptotically positive random variables in the sense that

lim_{a↓0} lim_{n→∞} pr(Z_n > a) = 1.

II(i) Suppose α > 1. Then −(∂²/∂θ ∂φ^i)L_n(θ, φ) →_p m₀ᵢ(φ₀) for i = 1, …, p, uniformly over

|θ − θ₀| ≤ bn^{−1/α},  |φ − φ₀| ≤ bn^{−1/2},  θ ≤ X_{n,1} − δ*_n n^{−1/α}.

II(ii) Suppose α = 1. Then (∂²/∂θ ∂φ^i)L_n(θ, φ) ≺_p log n, for i = 1, …, p, uniformly over

|θ − θ₀| ≤ b(log n)/n,  |φ − φ₀| ≤ bn^{−1/2},  θ ≤ X_{n,1} − b'/(n log n).

II(iii) Suppose α < 1. Then (∂²/∂θ ∂φ^i)L_n(θ, φ) ≺_p n^{1/α − 1} uniformly over

|θ − θ₀| ≤ bn^{−1/α},  |φ − φ₀| ≤ bn^{−1/2},  θ ≤ X_{n,1} − b'n^{−1/α}.

III. For α > 0, −(∂²/∂φ^i ∂φ^j)L_n(θ, φ) →_p m_{ij}(φ₀) for i, j = 1, …, p, uniformly over

|θ − θ₀| ≤ δ_n,  |φ − φ₀| ≤ bn^{−1/2},  θ ≤ X_{n,1} − b'n^{−1/α}.

If the upper bound on |φ − φ₀| is relaxed to δ*_n, we still have (∂²/∂φ^i ∂φ^j)L_n(θ, φ) ≺_p 1 uniformly.

Proof. I. We write

−(∂²/∂θ²)L_n(θ, φ) = {α(φ) − 1} Y_n^{(3)}(θ) − n^{−1} Σ_{k=1}^n (∂²/∂x²) log g(X_{n,k} − θ; φ).  (3.3)


(i) The result is true for θ = θ₀, φ = φ₀ (by Assumption 6 and the law of large numbers); it therefore suffices to prove that

(∂²/∂θ²){L_n(θ, φ) − L_n(θ₀, φ₀)} →_p 0

uniformly over the indicated range. This difference consists of the term {α(φ₀) − 1}Y_n^{(3)}(θ₀) − {α(φ) − 1}Y_n^{(3)}(θ) together with

n^{−1} Σ_{k=1}^n {(∂²/∂x²) log g(X_{n,k} − θ; φ) − (∂²/∂x²) log g(X_{n,k} − θ₀; φ₀)},

and the second term tends to zero in probability by Assumption 7. By Lemma 3(iii) and the continuity of α(·), the first term also tends to zero in probability, uniformly as required.

(ii) For arbitrarily small ε, δ, the second term in (3.3) is bounded by

ε Y_n^{(3)}(θ) + n^{−1} Σ_{k=1}^n h̄_{ε,δ}(X_{n,k} − θ₀; φ₀)

uniformly over |θ − θ₀| ≤ δ, |φ − φ₀| ≤ δ, where the latter term is O_p(1), by Assumption 8. The result then follows from Lemmas 1 and 3(iii).

(iii) The first part again follows by a combination of Assumption 8 and Lemma 3(iii), together with Lemma 1. For the second part, note that −(∂²L_n/∂θ²) is bounded below, up to an error of at most O_p(1), by

(α − 1) n^{−1} (X_{n,1} − θ)^{−2} ≥ (α − 1) n^{−1} (X_{n,1} − θ₀ + bn^{−1/α})^{−2}.

With W_n = n^{1/α}(X_{n,1} − θ₀), this becomes (α − 1) n^{2/α − 1} (W_n + b)^{−2}. But W_n converges to a nondegenerate Weibull limit by extreme value theory. Hence the stated conditions are satisfied by Z_n = (α − 1)(W_n + b)^{−2}. □

We shall not give the proofs of the remaining parts of Lemma 4. They follow by arguments similar to those already given, using Assumptions 1, 6 and 7 and the relevant parts of Lemmas 1 and 3.

LEMMA 5. Let h be a continuously differentiable real-valued function of p + 1 real variables and let H denote the gradient vector of h. Suppose that the scalar product of x and H(x) is negative whenever |x| = 1. Then h has a local maximum, at which H = 0, for some x with |x| < 1.

The proof is omitted.
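Lemma 5 is easy to visualize with p + 1 = 2 variables. Taking h(x) = −|x|²/2 + 0.3x₁ (an arbitrary choice for this sketch), the gradient is H(x) = −x + (0.3, 0), so x·H(x) = 0.3x₁ − 1 ≤ −0.7 < 0 on |x| = 1, and h indeed attains its maximum at the interior point (0.3, 0), where H = 0:

```python
import math

def h(x1, x2):
    return -(x1 * x1 + x2 * x2) / 2.0 + 0.3 * x1

# hypothesis of Lemma 5: x . H(x) < 0 everywhere on the unit circle
worst = max(
    math.cos(t) * (0.3 - math.cos(t)) - math.sin(t) ** 2
    for t in (2.0 * math.pi * k / 1000.0 for k in range(1000))
)

# grid search over the closed unit disk for the maximiser of h
best_val, best_pt = -1e18, None
N = 400
for i in range(-N, N + 1):
    for j in range(-N, N + 1):
        x1, x2 = i / N, j / N
        if x1 * x1 + x2 * x2 <= 1.0:
            v = h(x1, x2)
            if v > best_val:
                best_val, best_pt = v, (x1, x2)
```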

Proof of Theorem 1. The case α > 2 is straightforward, so we do this first. Let {δ_n} be any sequence such that n^{1/α} δ_n → 0, n^{1/2} δ_n → ∞, and define for t ∈ R, y ∈ R^p,

f_n(t, y) = δ_n^{−2} L_n(θ₀ + δ_n t, φ₀ + δ_n y).

By expanding ∂f_n/∂t and ∂f_n/∂y^j as far as the second term and using the results of Lemma 4, we deduce that

∂f_n(t, y)/∂t = −t m₀₀(φ₀) − Σᵢ yⁱ m₀ᵢ(φ₀) + ε_{n,0}(t, y),

∂f_n(t, y)/∂y^j = −t m₀ⱼ(φ₀) − Σᵢ yⁱ m_{ij}(φ₀) + ε_{n,j}(t, y),

where ε_{n,j}(t, y) →_p 0 uniformly over t² + |y|² ≤ 1 say, for j = 0, …, p.

Let t² + |y|² = 1. We have

t ∂f_n/∂t + Σⱼ yʲ ∂f_n/∂yʲ = −t² m₀₀(φ₀) − 2t Σᵢ yⁱ m₀ᵢ(φ₀) − Σᵢ Σⱼ yⁱ yʲ m_{ij}(φ₀) + o_p(1),

which is, as n → ∞, eventually strictly negative by the assumed positive-definiteness of M. Hence Lemma 5 shows that f_n has a local maximum in the range t² + |y|² < 1, with probability tending to 1 as n → ∞. Since the sequence n^{1/2} δ_n may be made to increase arbitrarily slowly, this proves the result for the case α > 2.

Now let α ≤ 2. The argument proceeds by showing that θ̂_n, φ̂_n are respectively close to θ̃_n, φ̃_n. For 1 ≤ i ≤ p,

∂L_n(θ, φ)/∂φⁱ = ∂L_n(θ, φ)/∂φⁱ − ∂L_n(θ₀, φ̃_n)/∂φⁱ
= (θ − θ₀) ∂²L_n(θ*, φ*)/∂θ ∂φⁱ + Σⱼ (φʲ − φ̃ʲ_n) ∂²L_n(θ*, φ*)/∂φⁱ ∂φʲ,

where (θ*, φ*) = λ(θ, φ) + (1 − λ)(θ₀, φ̃_n) for some λ between 0 and 1. Thus we may write

∂L_n(θ, φ)/∂φⁱ = −(θ − θ₀) m₀ᵢ − Σⱼ (φʲ − φ̃ʲ_n) m_{ij} + e_n(θ, φ).  (3.4)

Similarly,

∂L_n(θ, φ)/∂θ = {∂L_n(θ, φ)/∂θ − ∂L_n(θ, φ̃_n)/∂θ} + {∂L_n(θ, φ̃_n)/∂θ − ∂L_n(θ̃_n, φ̃_n)/∂θ} + {∂L_n(θ̃_n, φ̃_n)/∂θ − ∂L_n(θ̃_n, φ₀)/∂θ}.  (3.5)

Let θ*_n satisfy

∂L_n(θ*_n, φ̃_n)/∂θ − ∂L_n(θ̃_n, φ̃_n)/∂θ = Σᵢ (φ̃ⁱ_n − φ₀ⁱ) m₀ᵢ.  (3.6)

For the moment, we assume θ*_n exists. Comparing (3.6) with (3.5), we see

∂L_n(θ, φ)/∂θ = (θ − θ*_n) ∂²L_n(θ**_n, φ̃_n)/∂θ² + e*_n(θ, φ)  (3.7)

for some θ**_n between θ and θ*_n. If we also define φ*_n to satisfy

0 = (θ̃_n − θ₀) m₀ᵢ + Σⱼ (φ*ʲ_n − φ̃ʲ_n) m_{ij}  (i = 1, …, p),

(3.4) becomes

∂L_n(θ, φ)/∂φⁱ = −Σⱼ (φʲ − φ*ʲ_n) m_{ij} + e_n(θ, φ).  (3.8)

Note that φ*_n must exist, since M is invertible.

To show that θ*_n exists, with probability tending to 1, we may assume α(φ̃_n) > 1 for large n, so that (∂L_n/∂θ)(θ, φ̃_n) → −∞ as θ ↑ X_{n,1}. Since ∂L_n/∂θ is continuous in θ, it suffices to show that there exists a θ with

∂L_n(θ, φ̃_n)/∂θ − ∂L_n(θ̃_n, φ̃_n)/∂θ > Σᵢ (φ̃ⁱ_n − φ₀ⁱ) m₀ᵢ.

The right-hand side is O_p(n^{−1/2}) while the left is of the form (θ − θ̃_n)(∂²L_n/∂θ²)(θ₁, φ̃_n) for


some θ₁ between θ and θ̃_n. For α = 2, −(∂²L_n/∂θ²)(θ₁, φ̃_n) is of exact order log n on |θ₁ − θ₀| ≤ δ_n by Lemma 4 I(ii), so that θ*_n exists and θ*_n − θ̃_n ≺_p n^{−1/2}(log n)^{−1} as n → ∞. Similarly, for 1 < α < 2, Lemma 4 I(iii) shows that θ*_n exists and that θ*_n − θ̃_n ≺_p n^{−2/α + 1/2}.

Now consider e_n and e*_n. For α = 2, we have θ̃_n − θ₀ ≺_p (n log n)^{−1/2}, and it follows that (n log n)^{1/2} e_n(θ_n, φ_n) →_p 0 along any sequence (θ_n, φ_n) such that

(n log n)^{1/2}(θ_n − θ̃_n) →_p 0,  φ_n − φ̃_n ≺_p (n log n)^{−1/2}.

Under the same conditions, we have also n^{1/2} e*_n(θ_n, φ_n) →_p 0. For 1 < α < 2, θ̃_n − θ₀ ≺_p n^{−1/α}, and it follows that n^{1/α} e_n(θ_n, φ_n) →_p 0 and n^{1/2} e*_n(θ_n, φ_n) →_p 0 along any sequence (θ_n, φ_n) such that n^{1/α}(θ_n − θ̃_n) →_p 0 and φ_n − φ̃_n ≺_p n^{−1/α}.

We are ready for the final step. Suppose α = 2. For t ∈ R, y ∈ R^p, define

f_n(t, y) = n log n L_n{θ*_n + tn^{−1/2}(log n)^{−1}, φ*_n + y(n log n)^{−1/2}}.

Then

∂f_n/∂t = n^{1/2} (∂L_n/∂θ){θ*_n + tn^{−1/2}(log n)^{−1}, φ*_n + y(n log n)^{−1/2}}
= t(log n)^{−1} (∂²L_n/∂θ²)(θ**_n, φ̃_n) + n^{1/2} e*_n{θ*_n + tn^{−1/2}(log n)^{−1}, φ*_n + y(n log n)^{−1/2}},

by (3.7). But the second term tends to zero in probability, and the first to −tc. Similarly, ∂f_n/∂yʲ →_p −Σᵢ yⁱ m_{ij} uniformly over t² + |y|² ≤ 1, say, where we use (3.8) and the previously stated result about e_n. Note that

θ*_n − θ̃_n ≺_p n^{−1/2}(log n)^{−1} = o{(n log n)^{−1/2}},  φ*_n − φ̃_n ≺_p (n log n)^{−1/2},

so the required conditions are satisfied. Then

t ∂f_n/∂t + Σᵢ yⁱ ∂f_n/∂yⁱ →_p −t²c − Σᵢ Σⱼ yⁱ yʲ m_{ij} < 0,

so that, applying Lemma 5, the probability that f_n has a local maximum within the ball t² + |y|² < δ², for any fixed δ > 0, tends to 1 as n → ∞. Hence L_n has a local maximum at (θ̂_n, φ̂_n) satisfying

θ̂_n − θ̃_n ≺_p n^{−1/2}(log n)^{−1},  φ̂_n − φ̃_n ≺_p (n log n)^{−1/2}  (3.9)

with probability tending to 1.

The case 1 < α < 2 is similar to the case α = 2. Now define

f_n(t, y) = n^{2/α} L_n(θ*_n + tn^{−2/α + 1/2}, φ*_n + yn^{−1/α}).

Expanding as far as the second term, using (3.7), (3.8) and Lemma 4 I(iii), we may show that t ∂f_n/∂t + Σᵢ yⁱ ∂f_n/∂yⁱ is strictly negative over t² + |y|² = δ², with probability tending to 1 as n → ∞. Again Lemma 5 may be applied, and we conclude that f_n has a local maximum satisfying t² + |y|² < δ², for any δ > 0, with probability tending to 1. Hence L_n has a local maximum at (θ̂_n, φ̂_n) satisfying

θ̂_n − θ̃_n ≺_p n^{−2/α + 1/2},  φ̂_n − φ̃_n ≺_p n^{−1/α},

with probability tending to 1 as n → ∞.

The proof of Theorem 1 is complete. □


Proof of Theorem 2. (i) This follows Wald's classical proof very closely; see also Walker (1969), where similar results are needed as a preliminary to establishing asymptotic normality of the Bayes estimator in a regular case. There are three steps.

First, for fixed (θ₁, φ₁) ∈ U there exists b(θ₁, φ₁) > 0 such that

lim_{n→∞} pr{L_n(θ₁, φ₁) − L_n(θ₀, φ₀) < −b(θ₁, φ₁)} = 1.

This uses Assumption 2 and Jensen's inequality.

Secondly, for fixed (θ₁, φ₁) ∈ U and η between 0 and |θ₁ − θ₀| − δ, we have

lim_{n→∞} pr{sup L_n(θ, φ) < L_n(θ₀, φ₀) − b(θ₁, φ₁)/4} = 1,

where the supremum is over all (θ, φ) such that |θ − θ₁| ≤ η, |φ − φ₁| ≤ η. This is proved by bounding L_n(θ, φ) − L_n(θ₁, φ₁), using Assumptions 1 and 3, and the first step above.

Thirdly, let K_m be a compact subset of R × Φ, as in Assumption 4. Extending the result of the second step above, we have

lim_{n→∞} pr{sup_{K_m ∩ U} L_n(θ, φ) < L_n(θ₀, φ₀) − η_m} = 1

for some η_m > 0. This is because K_m can be covered by a finite number of open neighbourhoods of points (θ₁, φ₁). But now Assumption 4 allows us to drop the restriction to K_m, provided m is sufficiently large.

(ii) First we note that, if $\theta = \theta_0$ is assumed known then, given $\delta > 0$, there exists $\xi > 0$ such that

$$\lim_{n\to\infty} \mathrm{pr}\{\sup_{|\phi - \phi_0| \ge \delta} L_n(\theta_0, \phi) < L_n(\theta_0, \phi_0) - \xi\} = 1. \quad (3.10)$$

This may be proved by imitating the arguments used to prove (i), using Assumptions 2 and 3 with $\epsilon = \epsilon_1 = 0$, and Assumption 5.

Let $K_m'$ be a compact subset of $\Phi$, as in Assumption 5. For

$$(\theta, \phi) \in V_n, \quad \phi \in K_m', \quad \phi_1 \in K_m',$$

we have

$$L_n(\theta, \phi) - L_n(\theta_0, \phi_1) = n^{-1}\{\alpha(\phi) - 1\} \sum_k \{\log(X_{n,k} - \theta) - \log(X_{n,k} - \theta_0)\}$$
$$\qquad + n^{-1}\{\alpha(\phi) - \alpha(\phi_1)\} \sum_k \log(X_{n,k} - \theta_0)$$
$$\qquad + n^{-1} \sum_k \{\log g(X_{n,k} - \theta; \phi) - \log g(X_{n,k} - \theta_0; \phi_1)\}. \quad (3.11)$$

But if $|\phi - \phi_1| < \eta$ and $|\theta - \theta_0| < \delta$, then

$$n^{-1} \sum_k \{\log g(X_{n,k} - \theta_0; \phi) - \log g(X_{n,k} - \theta_0; \phi_1)\} \le n^{-1} \sum_k H_\eta(X_{n,k} - \theta_0; \phi_1),$$

and by Assumption 3 we may choose $\eta$ and hence $\delta$ sufficiently small that

$$\lim_{n\to\infty}\mathrm{pr}\{n^{-1} \sum_k H_\eta(X_{n,k} - \theta_0; \phi_1) < \xi/4\} = 1. \quad (3.12)$$

Note that it is essential that Assumption 3 holds with $\delta^* = 0$ for this step.

Define an event $\mathcal{E}_n(\theta, \phi)$ to hold if and only if

$$n^{-1}\{\alpha(\phi) - 1\} \sum_k \{\log(X_{n,k} - \theta) - \log(X_{n,k} - \theta_0)\} < \epsilon\{\alpha(\phi) + 1\}.$$

We claim that, for any $\epsilon > 0$, it is possible to choose $\delta$ sufficiently small so that

$$\lim_{n\to\infty} \mathrm{pr}\{\mathcal{E}_n(\theta, \phi) \text{ for all } (\theta, \phi) \in V_n\} = 1. \quad (3.13)$$


To prove this, fix $(\theta, \phi) \in V_n$ and consider two cases: (a) $\alpha(\phi) \ge 1$; (b) $\alpha(\phi) < 1$. We use the inequality

$$|\log(x - \theta_1) - \log(x - \theta_0)| \le |\theta_1 - \theta_0| / \{x - \max(\theta_1, \theta_0)\},$$

valid whenever $x > \max(\theta_1, \theta_0)$. In case (a),

$$n^{-1}\{\alpha(\phi) - 1\} \sum_k \{\log(X_{n,k} - \theta) - \log(X_{n,k} - \theta_0)\}$$

is negative when $\theta \ge \theta_0$, and is bounded by $n^{-1}\delta\{\alpha(\phi) - 1\} \sum_k (X_{n,k} - \theta_0)^{-1}$ when $\theta < \theta_0$. Since $E\{(X_1 - \theta_0)^{-1}\} < \infty$, we may choose $\delta$ sufficiently small, independently of $\phi$, so that

$$\lim_{n\to\infty} \mathrm{pr}[n^{-1}\delta\{\alpha(\phi) - 1\} \sum_k (X_{n,k} - \theta_0)^{-1} > \epsilon\{\alpha(\phi) + 1\}] = 0.$$

In case (b),

$$n^{-1} \sum_k |\log(X_{n,k} - \theta) - \log(X_{n,k} - \theta_0)| \le n^{-1}\gamma \log n + \delta n^{-1} \sum_{k=2}^n (X_{n,k} - X_{n,1})^{-1},$$

and the last term converges in probability by Lemma 2. Putting the results for (a) and (b) together, we have (3.13).

We also have $E\{|\log(X_1 - \theta_0)|\} < \infty$ and may make $|\alpha(\phi) - \alpha(\phi_1)|$ arbitrarily small by choosing $\eta$ sufficiently small. Combining this observation with (3.10), (3.12) and (3.13), choosing $\epsilon$ in (3.13) so that $\epsilon\{\alpha(\phi) + 1\} < \xi/4$ on the range $|\phi - \phi_1| < \eta$, we have

$$\lim_{n\to\infty} \mathrm{pr}\{\sup L_n(\theta, \phi) < L_n(\theta_0, \phi_0) - \xi/4\} = 1, \quad (3.14)$$

where the supremum is now taken over all $(\theta, \phi)$ such that

$$(\theta, \phi) \in V_n, \quad \phi \in K_m', \quad \phi_1 \in K_m', \quad |\phi - \phi_1| < \eta,$$

for fixed $\phi_1$. This result may immediately be extended to any finite set of values of $\phi_1$ and hence by compactness to the whole of $K_m'$. Thus (3.14) holds if the supremum is taken over all $(\theta, \phi)$ such that $(\theta, \phi) \in V_n$ and $\phi \in K_m'$.

Now consider the case $(\theta, \phi) \in V_n$, $\phi \notin K_m'$. Taking $\epsilon = 1$ in (3.13) and $\phi_1 = \phi_0$ in (3.11), we have with probability tending to one that

$$L_n(\theta, \phi) - L_n(\theta_0, \phi_0) \le n^{-1} \sum_k H(X_{n,k} - \theta_0; \phi),$$

and the result now follows from Assumption 5. $\square$

4. ASYMPTOTIC DISTRIBUTIONS

We are now in a position to state our main results about the asymptotic distributions of $\hat\theta_n$ and $\hat\phi_n$.

THEOREM 3. Under the assumptions of Theorem 1 let $(\hat\theta_n, \hat\phi_n)$ denote a sequence of maximum likelihood estimators satisfying the conclusions of Theorem 1.

(i) If $\alpha > 2$ then $n^{1/2}(\hat\theta_n - \theta_0, \hat\phi_n - \phi_0)$ converges in distribution to a normal random vector with mean 0 and covariance matrix $M^{-1}$, where $M$ is as in Theorem 1(i).

(ii) If $\alpha = 2$ then $\{(nc \log n)^{1/2}(\hat\theta_n - \theta_0), n^{1/2}(\hat\phi_n - \phi_0)\}$ converges in distribution to a normal random vector with mean 0 and covariance matrix of the form

$$\begin{pmatrix} 1 & 0 \\ 0 & M^{-1} \end{pmatrix},$$

where $M$ is as in Theorem 1(ii).


(iii) If $1 < \alpha < 2$ then $\{(nc)^{1/\alpha}(\hat\theta_n - \theta_0), n^{1/2}(\hat\phi_n - \phi_0)\}$ converges in distribution to $(Y, Z)$, where $Y \in \mathbb{R}$ and $Z \in \mathbb{R}^p$ are independent, $Z$ has a normal distribution with mean 0 and covariance matrix $M^{-1}$, where $M$ is as in (ii), and the distribution function of $Y$ is $H$,

where $H$ is defined in the statement of Theorem 2.4 of Woodroofe (1974).

Note that Woodroofe's definition of H is long-winded, but because we do not know any

way to simplify it, we refer to Woodroofe's paper for the definition.

We also have the following corollary.

COROLLARY. If $\alpha = 2$ then

$$(\hat\theta_n - \theta_0)\{-n\,\partial^2 L_n(\hat\theta_n, \phi_0)/\partial\theta^2\}^{1/2} \to Z, \quad (4.1)$$

$$(\hat\theta_n - \theta_0)\{-n\,\partial^2 L_n(\hat\theta_n, \hat\phi_n)/\partial\theta^2\}^{1/2} \to Z, \quad (4.2)$$

where convergence is in distribution and, in each case, $Z$ is standard normal.

The corollary shows that the variance of the estimators may be estimated asymptotically by means of the observed information, in a case where the expected information does not exist. In regular estimation problems, Efron & Hinkley (1978) argued that the observed information is superior to the expected information as an estimator of variance, but their argument depends on second-order approximations and conditional arguments. It may therefore be of some interest that we have an example in which the superiority of observed information is very easily demonstrated. Our argument, however, applies only to the specific case $\alpha = 2$ and therefore is of only slight practical significance.
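To make the observed-information recipe concrete, here is a toy sketch of our own (not an example from the paper): we take the three-parameter Weibull density $f(x; \theta) = \alpha(x-\theta)^{\alpha-1}\exp\{-(x-\theta)^\alpha\}$ with the shape $\alpha = 2$ treated as known. Here the expected information for $\theta$ is infinite, but the observed information $-n\,\partial^2 L_n/\partial\theta^2$ is finite for every sample and, in the spirit of (4.1), yields a usable standard error.

```python
import math

def observed_information(xs, theta, alpha=2.0):
    # -d^2/dtheta^2 of the log likelihood for the three-parameter Weibull
    # f(x; theta) = alpha*(x-theta)^(alpha-1)*exp{-(x-theta)^alpha}, x > theta,
    # with the shape alpha treated as known (an illustrative choice of ours).
    total = 0.0
    for x in xs:
        u = x - theta
        total += (alpha - 1.0) / u ** 2 + alpha * (alpha - 1.0) * u ** (alpha - 2.0)
    return total

def std_error(xs, theta_hat, alpha=2.0):
    # Standard error of theta_hat via observed information, as suggested by (4.1).
    return 1.0 / math.sqrt(observed_information(xs, theta_hat, alpha))
```

For $\alpha = 2$ the summand reduces to $(x-\theta)^{-2} + 2$, which is finite for any sample lying above $\theta$ even though its expectation diverges.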

Proofs. We require two preliminary lemmas and a remark. Suppose $\{(X_k, Y_k), k \ge 1\}$ is a sequence of independent identically distributed random vectors and let $(S_n, T_n)$ denote the sum of $(X_k, Y_k)$ $(k = 1, \ldots, n)$.

LEMMA 6. Suppose $E(X_1) = E(Y_1) = 0$, $E(X_1^2) = 1$, $E(Y_1^2) = +\infty$ and $S_n/n^{1/2} \to Z_1$, $T_n/b_n \to Z_2$ in distribution, where $Z_1$ and $Z_2$ are each standard normal. Then

$$(S_n/n^{1/2}, T_n/b_n) \to (Z_1, Z_2)$$

in distribution, with $Z_1$, $Z_2$ independent.

Proof. By Theorem 3.1 of an Erasmus University, Rotterdam, technical report by L. de Haan, E. Omey and S. I. Resnick, it suffices to show that the conditional expectations satisfy

$$E(X_1 Y_1; |Y_1| \le b_n) / \{E(Y_1^2; |Y_1| \le b_n)\}^{1/2} \to 0.$$

By splitting into $|X_1| \le M$ and $|X_1| > M$ and using the Cauchy–Schwarz inequality, this is less than

$$M E(|Y_1|)/\{E(Y_1^2; |Y_1| \le b_n)\}^{1/2} + \{E(X_1^2; |X_1| > M)\}^{1/2},$$

each term of which may be made arbitrarily small by taking first $M$ and then $n$ sufficiently large. $\square$
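The mechanism behind this bound can be seen in a small numerical sketch of our own: for a $Y$ with infinite variance, the truncated second moment $E(Y^2; |Y| \le b)$ diverges as $b \to \infty$, so the ratio in the proof is forced to zero.

```python
import math

def trunc_second_moment(b, steps=200_000):
    # E(Y^2; |Y| <= b) for the symmetric density f(y) = |y|^(-3), |y| >= 1,
    # which has mean 0 and infinite variance; midpoint rule on [1, b],
    # doubled by symmetry.  (Toy example of ours, not from the paper.)
    h = (b - 1.0) / steps
    s = 0.0
    for i in range(steps):
        y = 1.0 + (i + 0.5) * h
        s += y * y * y ** -3.0 * h
    return 2.0 * s

def lemma6_ratio(b, m=10.0):
    # The bound from the proof, M*E|Y| / {E(Y^2; |Y| <= b)}^(1/2);
    # for this density E|Y| = 2.
    return m * 2.0 / math.sqrt(trunc_second_moment(b))
```

Here $E(Y^2; |Y| \le b) = 2\log b$, so the bound decays like $(\log b)^{-1/2}$: slowly, but to zero.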

LEMMA 7. Suppose $X_1, X_2, \ldots$ are independent with common density $f$, which satisfies $f(x) \sim c\alpha x^{\alpha-1}$ as $x \downarrow 0$, and let $M_n = \min(X_1, \ldots, X_n)$, $a_n = (cn)^{-1/\alpha}$ and $W_n = \sum_{k=1}^n g(X_k)$, where the function $g$ satisfies $\int g(x) f(x)\,dx = 0$, $\int g^2(x) f(x)\,dx = 1$. Then $(M_n/a_n, W_n/n^{1/2})$ converges weakly to a pair of independent Weibull and standard normal random variables.

Proof. The case where $g$ is the identity or a linear function is dealt with by Chow & Teugels (1978), and our method closely follows theirs. It suffices to show that

$$E\{\exp(it W_n/n^{1/2}); M_n > a_n u\} \to \exp(-\tfrac{1}{2}t^2 - u^\alpha).$$

The left-hand side may be written

$$\Big[1 + n^{-1}\Big\{n\int_0^\infty [\exp\{it g(x)/n^{1/2}\} - 1] f(x)\,dx - n\int_0^{a_n u} [\exp\{it g(x)/n^{1/2}\} - 1] f(x)\,dx - n\int_0^{a_n u} f(x)\,dx\Big\}\Big]^n.$$

But it follows from the standard proof of the central limit theorem that

$$n\int_0^\infty [\exp\{it g(x)/n^{1/2}\} - 1] f(x)\,dx \to -\tfrac{1}{2}t^2,$$

and from the asymptotic form of $f$ that

$$n\int_0^{a_n u} f(x)\,dx \to u^\alpha.$$

The result therefore follows by calculating that

$$n\Big|\int_0^{a_n u} [\exp\{it g(x)/n^{1/2}\} - 1] f(x)\,dx\Big| \le |t|\Big\{n\int_0^{a_n u} f(x)\,dx\Big\}^{1/2}\Big\{\int_0^{a_n u} g^2(x) f(x)\,dx\Big\}^{1/2} \to 0$$

as $n \to \infty$. $\square$
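The Weibull limit for the first coordinate of Lemma 7 can be verified by direct calculation in the simplest special case (a check of our own, taking $c = 1$): if $f(x) = \alpha x^{\alpha-1}$ on $(0, 1)$ then $\mathrm{pr}(M_n > a_n u) = (1 - u^\alpha/n)^n \to \exp(-u^\alpha)$, with $a_n = n^{-1/\alpha}$.

```python
import math

def exact_min_tail(n, u, alpha):
    # pr(M_n > a_n*u) for i.i.d. X_i with density alpha*x^(alpha-1) on (0, 1):
    # pr(X_1 > x) = 1 - x^alpha and a_n = n^(-1/alpha), so the probability
    # is exactly (1 - u^alpha/n)^n.
    return (1.0 - u ** alpha / n) ** n

def weibull_limit_tail(u, alpha):
    # Limiting survivor function exp(-u^alpha) of the Weibull-type law.
    return math.exp(-(u ** alpha))
```

The error in the approximation is of order $u^{2\alpha}/n$, so convergence is fast.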

Remark 1. Lemma 6 extends Theorem 3 of Resnick & Greenwood (1979), which states that if $S_n/a_n \to Z_1$, $T_n/b_n \to Z_2$ with $Z_1$ normal and $Z_2$ stable with index less than 2, then $(S_n/a_n, T_n/b_n) \to (Z_1, Z_2)$, with $Z_1$, $Z_2$ necessarily independent. The key point in the proof is the observation that the limit $(Z_1, Z_2)$ must be infinitely divisible and therefore the sum of independent Gaussian and Lévy components. Our remark is that the same result holds if $(Z_1, Z_2)$ arises as the limit of renormalized row sums of a triangular array subject to the usual asymptotic negligibility condition. That is, if $Z_1$ is normal and $Z_2$ has an infinitely divisible distribution without a Gaussian component, then $Z_1$ and $Z_2$ are independent.

Proof of Theorem 3. (i) Theorem 1 shows the existence of $(\hat\theta_n, \hat\phi_n)$ with $|\hat\theta_n - \theta_0| \le_p n^{-1/2}$, $|\hat\phi_n - \phi_0| \le_p n^{-1/2}$, and the proof of Theorem 1 shows that the second derivatives of $L_n$ are asymptotically constant in this region. The result therefore follows by standard arguments.

(ii) Since $(n \log n)^{1/2}(\hat\theta_n - \theta_n) \to_p 0$ and $n^{1/2}(\hat\phi_n - \phi_n) \to_p 0$, it suffices to prove the result with $\theta_n$, $\phi_n$ in place of $\hat\theta_n$, $\hat\phi_n$. For $\theta_n$ alone, the asymptotic distribution is given by Woodroofe (1972). For $\phi_n$ alone, the asymptotic distribution is given by the classical results for


regular estimation problems. Therefore the only thing to show is the asymptotic independence of $\theta_n$ and $\phi_n$.

Now $(nc \log n)^{1/2}(\theta_n - \theta_0)$ may be written as

$$(nc \log n)^{1/2}\,\partial L_n(\theta_0, \phi_0)/\partial\theta\,\{-\partial^2 L_n(\theta_n^*, \phi_0)/\partial\theta^2\}^{-1},$$

where $\theta_n^*$ satisfies $|\theta_n^* - \theta_0| \le_p (n \log n)^{-1/2}$. By Lemma 4(ii),

$$-(c \log n)^{-1}\,\partial^2 L_n(\theta_n^*, \phi_0)/\partial\theta^2 \to_p 1,$$

so we have

$$(nc \log n)^{1/2}(\theta_n - \theta_0) = \{n/(c \log n)\}^{1/2}\,\partial L_n(\theta_0, \phi_0)/\partial\theta\;V_n,$$

where $V_n \to_p 1$ and hence plays no role in determining the limit. Now it follows from Lemma 1 and Assumption 9 (Woodroofe, 1972) that $\partial L_n(\theta_0, \phi_0)/\partial\theta$ has infinite variance but that $\{n/(c \log n)\}^{1/2}\,\partial L_n(\theta_0, \phi_0)/\partial\theta$ converges to a standard normal variable $Z_0$. Similar arguments applied to $(\phi_n - \phi_0)$, together with the Cramér–Wold device and Lemma 6, allow us to assert that $Z_0$ is independent of the asymptotic distribution of $n^{1/2}(\phi_n - \phi_0)$, as required.

(iii) Since $n^{1/\alpha}(\hat\theta_n - \theta_n) \to_p 0$ and $n^{1/2}(\hat\phi_n - \phi_n) \to_p 0$, it again suffices to prove the result with $\theta_n$, $\phi_n$ in place of $\hat\theta_n$, $\hat\phi_n$, and hence the only thing to show is the independence of the asymptotic distributions of $\theta_n$ and $\phi_n$. Our proof will make use of Remark 1 as well as Lemma 7.

Let $t > 0$, $y \in \mathbb{R}$ and consider

$$\mathrm{pr}\{(cn)^{1/\alpha}(\theta_n - \theta_0) \le -t,\; n^{1/2}\textstyle\sum_j a_j(\phi_{jn} - \phi_{j0}) \le y\} \quad (4.3)$$

for fixed $a_1, \ldots, a_p$. In the notation of Woodroofe (1974) this is the same as

$$\mathrm{pr}\{Z_{nt} \ge 0,\; n^{1/2}\textstyle\sum_j a_j(\phi_{jn} - \phi_{j0}) \le y\},$$

where $Z_{nt}$ $(n \ge 1)$ is a sequence of renormalized row sums of a triangular array, converging to an infinitely divisible limit. The limit is given in Woodroofe's Theorem 2.2, and does not contain a Gaussian component. Therefore the limiting probability in (4.3) is the same as that of

$$\mathrm{pr}\{(cn)^{1/\alpha}(\theta_n - \theta_0) \le -t\}\,\mathrm{pr}\{n^{1/2}\textstyle\sum_j a_j(\phi_{jn} - \phi_{j0}) \le y\}.$$

Similar arguments may be applied to the limits of

$$\mathrm{pr}\{(cn)^{1/\alpha}(\theta_n - \theta_0) \le 0,\; n^{1/2}\textstyle\sum_j a_j(\phi_{jn} - \phi_{j0}) \le y\},$$
$$\mathrm{pr}\{(cn)^{1/\alpha}(\theta_n - \theta_0) > t,\; n^{1/2}\textstyle\sum_j a_j(\phi_{jn} - \phi_{j0}) \le y \mid X_{n,1} > \theta_0 + t(cn)^{-1/\alpha}\}$$

for $t > 0$, using Theorems 2.1 and 2.3 of Woodroofe's paper. The last equation implies that

$$\lim \mathrm{pr}\{(cn)^{1/\alpha}(\theta_n - \theta_0) > t,\; n^{1/2}\textstyle\sum_j a_j(\phi_{jn} - \phi_{j0}) \le y\}$$
$$= \lim \mathrm{pr}\{(cn)^{1/\alpha}(\theta_n - \theta_0) > t,\; n^{1/2}\textstyle\sum_j a_j(\phi_{jn} - \phi_{j0}) \le y \mid X_{n,1} > \theta_0 + t(cn)^{-1/\alpha}\}\,\mathrm{pr}\{X_{n,1} > \theta_0 + t(cn)^{-1/\alpha}\}$$
$$= \lim \mathrm{pr}\{(cn)^{1/\alpha}(\theta_n - \theta_0) > t \mid X_{n,1} > \theta_0 + t(cn)^{-1/\alpha}\}$$
$$\qquad \times \mathrm{pr}\{n^{1/2}\textstyle\sum_j a_j(\phi_{jn} - \phi_{j0}) \le y \mid X_{n,1} > \theta_0 + t(cn)^{-1/\alpha}\}\,\mathrm{pr}\{X_{n,1} > \theta_0 + t(cn)^{-1/\alpha}\}.$$


Lemma 7 implies the asymptotic independence of $(cn)^{1/\alpha}(X_{n,1} - \theta_0)$ and the score statistic evaluated at $\phi_0$ for the parameter $\sum a_i \phi_i$. It easily follows that the second factor in this expression is independent of $t$, and hence that the whole expression equals

$$\lim \mathrm{pr}\{(cn)^{1/\alpha}(\theta_n - \theta_0) > t\}\,\mathrm{pr}\{n^{1/2}\textstyle\sum_j a_j(\phi_{jn} - \phi_{j0}) \le y\}.$$

This proves the independence of the asymptotic distributions of $\theta_n$ and $\sum a_j \phi_{jn}$ for all $a_1, \ldots, a_p$, and hence gives the required result. $\square$

Proof of Corollary. Rewrite (4.1) as

$$(nc \log n)^{1/2}(\hat\theta_n - \theta_0)\,\{-(c \log n)^{-1}\,\partial^2 L_n(\hat\theta_n, \phi_0)/\partial\theta^2\}^{1/2},$$

the product of two factors, the first converging to standard normal and the second to 1. This gives (4.1), and (4.2) is similar. Note that (4.1) also holds if $\alpha > 2$.

5. AN ALTERNATIVE TO MAXIMUM LIKELIHOOD

The complicated nature of the preceding results when 1 < oc < 2, and the nonexistence

of a consistent estimator when cx < 1, make it desirable to seek some alternative

estimator. An obvious candidate for a point estimator of 0 is Xn 1, the sample minimum.

The asymptotic distribution of n'/a(Xn,1 - 0S) is Weibull, and Akahira's results show that

no point estimator of 0 converges at a faster rate when oc < 2. Thus it seems reasonable to

use Xn, 1 as an estimator of 0 when x < 2. The difficulties are, first, that it is generally n

known a priori whether Xc < 2 and, secondly, that we still need an estimator of 4.

In this section we propose a new estimator $\tilde\phi_n$ of $\phi$. It is consistent, so that it may be used to discriminate between the cases $\alpha > 2$ and $\alpha < 2$, and when $\alpha < 2$ it is asymptotically efficient, and may therefore be used in place of the maximum likelihood estimator $\hat\phi_n$.

The new estimator is defined as the local maximum of the function

$$\sum_{k=2}^n \log f(X_{n,k}; X_{n,1}, \phi).$$

It is therefore equivalent to estimating $\theta$ by the smallest observation, and dropping that observation for the estimation of $\phi$ by maximum likelihood.

Define a modified likelihood function

$$\tilde L_n(\theta, \phi) = n^{-1}\sum_{k=2}^n \log f(X_{n,k}; \theta, \phi).$$

Thus our new estimator satisfies

$$\partial \tilde L_n(X_{n,1}, \tilde\phi_n)/\partial\phi_i = 0 \quad (i = 1, \ldots, p). \quad (5.1)$$
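In the boundary case $\alpha = 1$ the construction has a closed form, which makes a convenient sketch (our illustrative model, not an example from the paper): for the shifted exponential $f(x; \theta, \phi) = \phi^{-1}\exp\{-(x-\theta)/\phi\}$ $(x > \theta)$, solving (5.1) with $\theta$ replaced by $X_{n,1}$ gives the mean excess of the remaining $n - 1$ order statistics over the minimum.

```python
def phi_tilde(xs):
    # New estimator of phi for the shifted exponential
    # f(x; theta, phi) = (1/phi)*exp{-(x - theta)/phi}, x > theta (alpha = 1):
    # set theta = X_{n,1}, drop that observation, and maximise
    # sum_{k=2}^n log f(X_{n,k}; X_{n,1}, phi) over phi, which gives
    # phi~ = sum_{k=2}^n (X_{n,k} - X_{n,1}) / (n - 1).
    xs = sorted(xs)
    x1, rest = xs[0], xs[1:]
    return sum(x - x1 for x in rest) / len(rest)
```

Setting the $\phi$-derivative of the dropped-minimum log likelihood to zero gives $-(n-1)/\phi + \sum_{k \ge 2}(X_{n,k} - X_{n,1})/\phi^2 = 0$, hence the closed form above.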

THEOREM 4. Under the conditions of Theorem 1, with probability tending to one there exists an estimator $\tilde\phi_n$ satisfying (5.1), and we have

$$|\tilde\phi_n - \phi_n| \le_p n^{-1/\alpha} \quad (\alpha > 1), \qquad |\tilde\phi_n - \phi_n| \le_p n^{-1}\log n \quad (\alpha \le 1).$$

COROLLARY. For all $\alpha$, $\tilde\phi_n$ is a consistent estimator of $\phi_0$. If $\alpha < 2$, then $\tilde\phi_n$ is also asymptotically efficient, and $n^{1/2}(\tilde\phi_n - \phi_0)$ converges to a normal distribution with mean zero and covariance matrix $M^{-1}$, where $M$ is the $p \times p$ matrix with entries $m_{ij}(\phi_0)$ $(i, j = 1, \ldots, p)$.

Remark. The results of this section continue to hold if $X_{n,1}$ is replaced by any $n^{1/\alpha}$-consistent estimator $\tilde\theta_n$ in (5.1), provided $\tilde\theta_n \le X_{n,1}$.

Proofs. We start with some elementary observations about $\tilde L_n$. Clearly,

$$L_n(\theta, \phi) - \tilde L_n(\theta, \phi) = n^{-1}\log f(X_{n,1}; \theta, \phi) = n^{-1}\{\alpha(\phi) - 1\}\log(X_{n,1} - \theta) + n^{-1}\log g(X_{n,1} - \theta; \phi).$$

When $\theta = \theta_0$ we have

$$\log(X_{n,1} - \theta_0) = O_p(\log n), \qquad \log g(X_{n,1} - \theta_0; \phi) \to_p \log c(\phi),$$

so that the whole expression is $O_p(n^{-1}\log n)$. The same applies to

$$(\partial/\partial\phi_i)\{L_n(\theta_0, \phi) - \tilde L_n(\theta_0, \phi)\},$$

since

$$(\partial/\partial\phi_i)\log g(X_{n,1} - \theta_0; \phi) \to_p (\partial/\partial\phi_i)\log c(\phi)$$

by Assumption 1. We also note that parts (ii) and (iii) of Lemma 4 apply with $\tilde L_n$ in place of $L_n$, and without the lower bound restriction on $X_{n,1} - \theta_0$.

Let

$$\chi_n^{(i)} = (\partial/\partial\phi_i)\tilde L_n(\theta_0, \phi_n) = -(\partial/\partial\phi_i)\{L_n(\theta_0, \phi_n) - \tilde L_n(\theta_0, \phi_n)\},$$

which is $\le_p n^{-1}\log n$ by the preceding remarks. Therefore

$$\partial\tilde L_n(X_{n,1}, \phi)/\partial\phi_i = \chi_n^{(i)} + (X_{n,1} - \theta_0)\,\partial^2\tilde L_n(\theta^*, \phi^*)/\partial\theta\,\partial\phi_i + \sum_j(\phi_j - \phi_{jn})\,\partial^2\tilde L_n(\theta^*, \phi^*)/\partial\phi_i\,\partial\phi_j,$$

where $(\theta^*, \phi^*) = \lambda(X_{n,1}, \phi) + (1 - \lambda)(\theta_0, \phi_n)$ for some $\lambda$ $(0 < \lambda < 1)$.

Suppose $\alpha \le 1$. Let $b_n$ $(n \ge 1)$ be any positive sequence such that $b_n \to 0$, $nb_n/\log n \to \infty$. Define, for $y \in \mathbb{R}^p$ such that $|y| = 1$,

$$f_n(y) = b_n^{-2}\,\tilde L_n(X_{n,1}, \phi_n + b_n y). \quad (5.2)$$

Then

$$\partial f_n/\partial y_i = b_n^{-1}\chi_n^{(i)} + b_n^{-1}(X_{n,1} - \theta_0)\,\partial^2\tilde L_n\{\theta_n^*(y), \phi_n^*(y)\}/\partial\theta\,\partial\phi_i + \sum_j y_j\,\partial^2\tilde L_n\{\theta_n^*(y), \phi_n^*(y)\}/\partial\phi_i\,\partial\phi_j,$$

where the dependence of $(\theta^*, \phi^*)$ on both $n$ and $y$ is indicated. Now, for all $y$ with $|y| = 1$, we have

$$|\theta_n^*(y) - \theta_0| \le |X_{n,1} - \theta_0| \le_p n^{-1/\alpha}, \qquad |\phi_n^*(y) - \phi_n| \le b_n \to 0,$$

so by Lemma 4, modified to apply to $\tilde L_n$ as indicated above,

$$\partial^2\tilde L_n\{\theta_n^*(y), \phi_n^*(y)\}/\partial\theta\,\partial\phi_i \le_p n^{1/\alpha - 1} \quad (\alpha < 1), \qquad \partial^2\tilde L_n\{\theta_n^*(y), \phi_n^*(y)\}/\partial\phi_i\,\partial\phi_j \to_p -m_{ij},$$

uniformly over $y$. Therefore

$$\partial f_n/\partial y_i = -\sum_j m_{ij} y_j + \epsilon_n(y), \quad (5.3)$$


where $\epsilon_n(y) \to_p 0$ uniformly over $|y| = 1$. It follows, arguing as previously, that with probability tending to 1 there exists $\tilde\phi_n$ satisfying

$$|\tilde\phi_n - \phi_n| < b_n, \qquad \partial\tilde L_n(X_{n,1}, \tilde\phi_n)/\partial\phi_i = 0 \quad (i = 1, \ldots, p).$$

Since $b_n$ may be chosen so that $nb_n/\log n$ increases arbitrarily slowly, the result of the theorem follows in the case $\alpha \le 1$.

The argument for $\alpha > 1$ is almost identical. Now choose $b_n$ $(n \ge 1)$ so that $b_n \to 0$, $n^{1/\alpha} b_n \to \infty$, and define $f_n$ by (5.2). We have $\partial^2\tilde L_n/\partial\theta\,\partial\phi_i \to_p -m_{0i}$ uniformly over such a region, so $b_n^{-1}(X_{n,1} - \theta_0)\,\partial^2\tilde L_n/\partial\theta\,\partial\phi_i \le_p b_n^{-1} n^{-1/\alpha} \to 0$, and so (5.3) again holds. The rest of the argument is the same. $\square$

Proof of Corollary. The results for $\alpha < 2$ follow by noting that

$$n^{1/2}(\tilde\phi_n - \phi_0) = n^{1/2}(\phi_n - \phi_0) + n^{1/2}(\tilde\phi_n - \phi_n)$$

and the last term tends in probability to zero. Therefore the asymptotic distribution of $n^{1/2}(\tilde\phi_n - \phi_0)$ is the same as that of $n^{1/2}(\phi_n - \phi_0)$, which is as claimed. $\square$

6. HYPOTHESIS TESTING

In this section we consider testing hypotheses about $\theta$, making first a few remarks about the case $\phi$ known before turning to the case $\phi$ unknown. The results given are of course also relevant to the construction of confidence intervals.

Consider first a simple versus simple test of $H_0$: $\theta = \theta_0$, $\phi = \phi_0$ against $H_1$: $\theta = \theta_n$, $\phi = \phi_0$, based on sample size $n$. The Neyman–Pearson test is to reject $H_0$ if

$$n\{L_n(\theta_n, \phi_0) - L_n(\theta_0, \phi_0)\} > c^* \quad (6.1)$$

for some critical value $c^*$. If $\alpha < 2$, the natural choice of $\theta_n$ is of the form

$$\theta_n = \theta_0 + n^{-1/\alpha} t \quad (6.2)$$

for fixed real $t$, since in this case a test of fixed size will have limiting power strictly between 0 and 1.

Now suppose $\phi_0$ is unknown but estimated by $\tilde\phi_n$, as defined in § 5. The obvious analogue of (6.1) is the test: reject $H_0$ if

$$n\{L_n(\theta_n, \tilde\phi_n) - L_n(\theta_0, \tilde\phi_n)\} > c^*. \quad (6.3)$$

To compare (6.1) and (6.3), note that

$$n\{L_n(\theta_n, \tilde\phi_n) - L_n(\theta_0, \tilde\phi_n) - L_n(\theta_n, \phi_0) + L_n(\theta_0, \phi_0)\} = n(\theta_n - \theta_0)\sum_j(\tilde\phi_{jn} - \phi_{j0})\,\partial^2 L_n(\theta_n^*, \phi_n^*)/\partial\theta\,\partial\phi_j$$

for some $\theta_n^*$, $\phi_n^*$. If $1 < \alpha < 2$ we have

$$\theta_n - \theta_0 = O(n^{-1/\alpha}), \qquad |\tilde\phi_{jn} - \phi_{j0}| \le_p n^{-1/2}, \qquad \partial^2 L_n/\partial\theta\,\partial\phi_j \le_p 1,$$

so the whole expression is $O_p(n^{1/2 - 1/\alpha})$ and thus tends to zero in probability. For $\alpha = 1$ and $\alpha < 1$ we have $\partial^2 L_n/\partial\theta\,\partial\phi_j \le_p \log n$ and $n^{1/\alpha - 1}$ respectively, leading to the same conclusion. Therefore the two tests (6.1) and (6.3) are asymptotically equivalent, in the sense that they make the same decision in large samples, provided $\alpha < 2$.

Note that this conclusion is false if $\alpha > 2$, even if $\hat\phi_n$ is used in place of $\tilde\phi_n$. In that case we have $\theta_n - \theta_0 = O(n^{-1/2})$ and hence that $n(\theta_n - \theta_0)\sum_j(\hat\phi_{jn} - \phi_{j0})\,\partial^2 L_n/\partial\theta\,\partial\phi_j$ is $O_p(1)$ rather than $o_p(1)$, so that the equivalence of (6.1) and (6.3) fails. When $\alpha = 2$, we have $|\tilde\phi_n - \phi_0| \le_p n^{-1/2}$ by Theorem 4, but in this case $\theta_n - \theta_0 = O\{(n \log n)^{-1/2}\}$, so that (6.1) and (6.3) are again asymptotically equivalent.

Our conclusion is that, for testing a simple hypothesis about $\theta$ against a simple alternative, ignorance about $\phi$ makes no difference to the power of the test if $\alpha \le 2$, but it decreases the power if $\alpha > 2$.

Now consider a test of $H_0$: $\theta = \theta_0$ against the composite alternative $H_1$: $\theta \ne \theta_0$. For the construction of two-sided confidence intervals, in particular, we wish to consider this case. In regular cases, there are three widely used tests, all of which are to first order asymptotically efficient, namely the Wald test, which is based on the asymptotic distribution of the maximum likelihood estimator, the score test, which is based on the first derivative of the log likelihood, and the likelihood ratio test; see, for example, Cox & Hinkley (1974, Ch. 9). In the nonregular case $\alpha < 2$, even when $\phi$ is known, these tests are not first-order equivalent and it is not known whether any of them has any optimality properties. The following discussion is therefore concerned purely with distributional properties and not with asymptotic optimality.

For the Wald test, the test statistic is the maximum likelihood estimator, assuming it exists. Woodroofe's (1974) results determine the asymptotic distribution of the maximum likelihood estimator in the case $1 < \alpha < 2$, but this is not easy to work with. For the likelihood ratio test, nothing appears to be known about the asymptotic distribution of the test statistic. For the score test, however, things are rather simpler, since the asymptotic distribution of the score statistic is known and the test does not require computation, or even existence, of the maximum likelihood estimator. Therefore we concentrate on the score test, believing that this provides at least a viable method of testing hypotheses about $\theta$ even though it may not have asymptotic optimality properties.

The score statistic for testing $H_0$: $\theta = \theta_0$ when $\phi = \phi_0$ is known is

$$\partial L_n(\theta_0, \phi_0)/\partial\theta = -(\alpha - 1)\,n^{-1}\sum_k (X_{n,k} - \theta_0)^{-1} + n^{-1}\sum_k \partial\{\log g(X_{n,k} - \theta_0; \phi_0)\}/\partial\theta.$$

When $\alpha < 2$ the second term is $O_p(1)$ and the first has infinite variance, but $E(\partial L_n/\partial\theta) = 0$ when $\alpha > 1$, by Assumption 6. Hence $\partial L_n/\partial\theta$ has the same asymptotic distribution as $n^{-1}\{S_{n,1} - E(S_{n,1})\}$ when $\alpha > 1$, and as $n^{-1}S_{n,1}$ when $\alpha \le 1$, and this asymptotic distribution is stable, by Lemma 1. In particular, when $1 < \alpha < 2$ or $\alpha \le 1$, $n^{1 - 1/\alpha}\,\partial L_n/\partial\theta$ has a nondegenerate stable limit law.

A two-sided test is therefore defined by the acceptance region

$$a < n a_n^{-1}\{\partial L_n(\theta_0, \phi_0)/\partial\theta - b_n\} < b, \quad (6.4)$$

where $a_n$, $b_n$ are as in Lemma 1 and the percentage points $a$, $b$ are those of the limiting stable law. For $\alpha = 2$, the acceptance region is

$$a < \{n/(c \log n)\}^{1/2}\,\partial L_n(\theta_0, \phi_0)/\partial\theta < b, \quad (6.5)$$

with $a$, $b$ the appropriate percentage points of the standard normal distribution.

Now let us consider the effect of $\phi_0$ being unknown, and estimated by $\tilde\phi_n$. When $1 < \alpha < 2$ we have

$$n^{1 - 1/\alpha}\{\partial L_n(\theta_0, \tilde\phi_n)/\partial\theta - \partial L_n(\theta_0, \phi_0)/\partial\theta\} = n^{1 - 1/\alpha}\sum_j(\tilde\phi_{jn} - \phi_{j0})\,\partial^2 L_n(\theta_0, \phi_n^*)/\partial\theta\,\partial\phi_j,$$

with $\phi_n^*$ between $\tilde\phi_n$ and $\phi_0$. But $\partial^2 L_n/\partial\theta\,\partial\phi_j \le_p 1$ uniformly on a region of the form $|\phi - \phi_0| = o(1)$, $|\theta - \theta_0| = O(n^{-1/\alpha})$, and $|\tilde\phi_n - \phi_0| = O_p(n^{-1/2})$, so the whole expression tends to zero in probability, uniformly on a region of the form $|\theta - \theta_0| = O(n^{-1/\alpha})$. Similar arguments show that the same result holds


for $0 < \alpha < 1$, $\alpha = 1$ and $\alpha = 2$. We therefore conclude that the tests (6.4)–(6.5) remain valid when $\phi_0$ is unknown, provided that $\phi_0$ is replaced by its estimated value $\tilde\phi_n$ or by some other $n^{1/2}$-consistent estimator.

As previously remarked, no claim of optimality is made for these procedures. In one-parameter problems, with $\theta$ unknown and $\phi$ known, the problem of constructing asymptotically optimal procedures, when $\alpha < 2$, is considered in detail in Chapters 5 and 6 of Ibragimov & Has'minskii (1981). These results are complicated and depend on the loss function.

7. APPLICATIONS IN EXTREME VALUE THEORY

The results of this paper may be applied to two particular distributions which are of importance in the analysis of extreme values. These are the generalized extreme value distribution, which includes the three-parameter Weibull as a special case, and the generalized Pareto distribution introduced by Pickands (1975).

The density function of the generalized extreme value distribution is

$$g(y; k, \sigma, \mu) = \sigma^{-1}\{1 - k(y - \mu)/\sigma\}^{1/k - 1}\exp[-\{1 - k(y - \mu)/\sigma\}^{1/k}], \quad (7.1)$$

defined on the set $\{y: k(y - \mu) < \sigma\}$. In the case $k > 0$, this is a reparameterization of the Weibull distribution, while the cases $k < 0$ and $k = 0$, the latter defined by taking the limit as $k \to 0$ in (7.1), correspond to the type II and type I extreme value laws, in Gumbel's (1958) characterization. For computational details, see Prescott & Walden (1980, 1983). Recent reviews on the fitting of extreme value distributions and discrimination among types have been given by Mann (1984) and Tiago de Oliveira (1984). For $k > 0$, the results of this paper are directly applicable, and show in particular that

the classical properties of maximum likelihood estimators hold for $0 < k < \tfrac{1}{2}$, but not for $k \ge \tfrac{1}{2}$. The information matrix is finite over the whole range $-\infty < k < \tfrac{1}{2}$. For $k < 0$, the range of the distribution again depends on the unknown parameters, the density being positive when $y > \mu + \sigma/k$. Writing $\beta = -1/k$, $\theta = \mu + \sigma/k$ gives the reparameterization

$$g(y; \beta, \theta, \sigma) = \sigma^{-1}\{(y - \theta)/(\beta\sigma)\}^{-\beta - 1}\exp[-\{(y - \theta)/(\beta\sigma)\}^{-\beta}] \quad (y > \theta),$$

which converges to zero faster than any power of $y - \theta$ as $y \downarrow \theta$. Although this does not fall within the scope of our Theorem 1, the arguments that were applied there to the case $\alpha > 2$ are applicable here also, and show the existence of a maximum likelihood estimator which is asymptotically normal and efficient. We conclude that the classical asymptotic properties of maximum likelihood estimators hold throughout the range $-\infty < k < \tfrac{1}{2}$, while for $k \ge \tfrac{1}{2}$ the results for the three-parameter Weibull are directly applicable.
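Both forms of the density are easy to transcribe, and the algebraic identity between (7.1) with $k < 0$ and its $(\beta, \theta, \sigma)$ reparameterization can be checked numerically. The following sketch is our own and also includes the $k \to 0$ (type I, Gumbel) limit; it is an illustration, not production code.

```python
import math

def gev_density(y, k, sigma, mu):
    # Generalized extreme value density (7.1); support k*(y - mu) < sigma.
    u = 1.0 - k * (y - mu) / sigma
    if u <= 0.0:
        return 0.0
    return u ** (1.0 / k - 1.0) * math.exp(-(u ** (1.0 / k))) / sigma

def gumbel_density(y, sigma, mu):
    # Type I (k = 0) limit of (7.1).
    z = (y - mu) / sigma
    return math.exp(-z - math.exp(-z)) / sigma

def gev_reparam_density(y, beta, theta, sigma):
    # Reparameterized form for k < 0: beta = -1/k, theta = mu + sigma/k,
    # g(y) = sigma^(-1) * v^(-beta-1) * exp(-v^(-beta)),
    # with v = (y - theta)/(beta*sigma), for y > theta.
    if y <= theta:
        return 0.0
    v = (y - theta) / (beta * sigma)
    return v ** (-beta - 1.0) * math.exp(-(v ** -beta)) / sigma
```

With, say, $k = -0.5$, $\sigma = 1.3$, $\mu = 0.2$ (so $\beta = 2$, $\theta = -2.4$) the two forms agree to machine precision, and for $|k|$ very small the GEV density is numerically indistinguishable from the Gumbel density.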

We remark that this argument applies also to the three-parameter log normal

distribution. In this case also, the likelihood function is unbounded but there is a local

maximum which is asymptotically normal and efficient. This observation provides a theoretical justification for the procedures advocated by Griffiths (1980). Another

distribution for which similar arguments hold is the inverse Gaussian distribution with

unknown location (Cheng & Amin, 1981).

The generalized Pareto distribution has density

$$g(y; k, \sigma) = \sigma^{-1}(1 - ky/\sigma)^{1/k - 1}, \quad (7.2)$$

defined on the set $y > 0$, $ky < \sigma$. Its importance in extreme value theory arises from the fact that it characterizes the limiting distributions of the excesses over a threshold, as the threshold tends to infinity (Pickands, 1975; Davison, 1984; Smith, 1984). When


$k < 0$, it is the same as the usual Pareto distribution, and when $k = 0$, defined by taking the limit as $k \to 0$, it is the exponential distribution with mean $\sigma$. For $k > 0$, the density is a transformation of the three-parameter beta distribution of § 1, with $\beta = 1$. Thus we get analogous results to those for the generalized extreme value distribution: for $k < \tfrac{1}{2}$ the information matrix is finite and the classical asymptotic theory of maximum likelihood estimators is applicable, while for $k \ge \tfrac{1}{2}$ the problem is nonregular and special procedures are needed.
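The $k \to 0$ limit of (7.2), the exponential distribution with mean $\sigma$, is likewise easy to confirm numerically; the following is our own sketch, using `log1p` to keep the computation stable for small $|k|$.

```python
import math

def gpd_density(y, k, sigma):
    # Generalized Pareto density (7.2); support y > 0 and k*y < sigma.
    if y <= 0.0 or k * y >= sigma:
        return 0.0
    # (1/sigma) * (1 - k*y/sigma)^(1/k - 1), via log1p for small |k|.
    return math.exp((1.0 / k - 1.0) * math.log1p(-k * y / sigma)) / sigma

def exponential_density(y, sigma):
    # k = 0 limit of (7.2): exponential with mean sigma.
    return math.exp(-y / sigma) / sigma if y > 0.0 else 0.0
```

For $k > 0$ the support is the bounded interval $(0, \sigma/k)$, the beta-like case; for $k < 0$ the support is all of $y > 0$, the Pareto case.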

An alternative approach was taken by Hall (1982), who considered the estimation of $\theta$ when the density is of the form (1.1), but without assuming a specific parametric form for $f_0$. Hall proposed a procedure based on a specified number of order statistics; his procedure is almost the same as restricting attention to observations beneath a threshold, and assuming the generalized Pareto distribution for the differences between these observations and the threshold. Hall proves asymptotic normality of his estimators in the case $\alpha > 2$. The strength of his approach is that it also takes into account the error in (1.1), but he essentially assumes that this error is known, which is not a realistic practical assumption.

In conclusion, there are many open questions. Even when $\alpha > 2$, not all the higher-order moments of the score statistic are finite, and it remains an open question whether arguments based on higher-order asymptotics, e.g. Efron & Hinkley (1978), Barndorff-Nielsen (1983), are applicable. Mann (1984) has reviewed a number of papers in which authors have reported practical difficulties in estimating the parameters of the three-parameter Weibull. This raises questions of the speed of convergence, and of the small-sample properties of the estimators, which we have not considered at all. The problem of determining asymptotically efficient tests and estimators of $\theta$ when $\alpha < 2$ is also unresolved. Finally, there is the question of the asymptotic performance of Bayes estimators; in the case $\alpha < 2$ we would not expect asymptotic normality (Dawid, 1970) to hold, and it would be of interest to establish under what conditions the asymptotic posterior distribution does not depend on the prior distribution.

ACKNOWLEDGEMENTS

One of the referees made many helpful suggestions, in particular a greatly shortened proof of Lemma 6. I thank N. H. Bingham and J. P. Cohen for references.

REFERENCES

AKAHIRA, M. (1975a). Asymptotic theory for estimation of location in non-regular cases, I: Order of

convergence of consistent estimators. Rep. Statist. Appl. Res. Union Jap. Sci. Eng. 22, 8-26.

AKAHIRA, M. (1975b). Asymptotic theory for estimation of location in non-regular cases, II: Bounds of

asymptotic distributions of consistent estimators. Rep. Statist. Appl. Res. Union Jap. Sci. Eng. 22, 99-

115.

BARNDORFF-NIELSEN, O. (1983). On a formula for the distribution of the maximum likelihood estimator.

Biometrika 70, 343-65.

CHENG, R. C. H. & AMIN, N. A. K. (1981). Maximum likelihood estimation of parameters in the inverse Gaussian distribution, with unknown origin. Technometrics 23, 257-63.

CHENG, R. C. H. & AMIN, N. A. K. (1983). Estimating parameters in continuous univariate distributions

with a shifted origin. J. R. Statist. Soc. B 45, 394-403.

CHOW, T. L. & TEUGELS, J. L. (1978). The sum and maximum of i.i.d. random variables. In Proc. 2nd

Symp. Asymp. Statist., Ed. P. Mandl and M. Huskova, pp. 81-92. Amsterdam: North Holland.

COHEN, A. C. (1965). Maximum likelihood estimation in the Weibull distribution based on complete and on

censored samples. Technometrics 7, 579-88.

COX, D. R. & HINKLEY, D. V. (1974). Theoretical Statistics. London: Chapman and Hall.

DAVISON, A. C. (1984). Modelling excesses over high thresholds, with an application. In Statistical Extremes

and Applications, Ed. J. Tiago de Oliveira, pp. 461-82. Dordrecht: Reidel.


DAWID, A. P. (1970). On the limiting normality of posterior distributions. Proc. Camb. Phil. Soc. 67, 625-33. EFRON, B. & HINKLEY, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed

versus expected Fisher information. Biometrika 65, 457-87. FELLER, W. (1971). An Introduction to Probability Theory and its Applications, 2, 2nd ed. New York: Wiley. GRIFFITHS, D. A. (1980). Interval estimation for the three-parameter lognormal distribution via the

likelihood function. Appl. Statist. 29, 58-68.

GUMBEL, E. J. (1958). Statistics of Extremes. New York: Columbia University Press. HALL, P. (1982). On estimating the endpoint of a distribution. Ann. Statist. 10, 556-68.

HARTER, H. L. & MOORE, A. H. (1965). Maximum-likelihood estimation of the parameters of gamma and

Weibull populations from complete and from censored samples. Technometrics 7, 639-43.

IBRAGIMOV, I. A. & HAS'MINSKII, R. Z. (1981). Statistical Estimation. Berlin: Springer.

JENKINSON, A. F. (1955). Frequency distribution of the annual maximum (or minimum) values of meteorological elements. Quart. J. R. Met. Soc. 81, 158-71.

JOHNSON, R. A. & HASKELL, J. H. (1983). Sampling properties of estimators of a Weibull distribution of use

in the lumber industry. Can. J. Statist. 11, 155-69.

LE CAM, L. (1970). On the assumptions used to prove asymptotic normality of maximum likelihood

estimates. Ann. Math. Statist. 41, 802-28.

LEMON, G. H. (1975). Maximum likelihood estimation for the three parameter Weibull distribution based on censored samples. Technometrics 17, 247-54.

MÄKELÄINEN, T., SCHMIDT, K. & STYAN, G. P. H. (1981). On the existence and uniqueness of the maximum likelihood estimate of a vector-valued parameter in fixed-size samples. Ann. Statist. 9, 758-67. MANN, N. R. (1984). Statistical estimation of parameters of the Weibull and Fréchet distributions. In

Statistical Extremes and Applications, Ed. J. Tiago de Oliveira, pp. 81-9. Dordrecht: Reidel.

NERC (1975). Flood Studies Report, 1. London: Natural Environment Research Council.

PICKANDS, J. (1975). Statistical inference using extreme order statistics. Ann. Statist. 3, 119-31. PRESCOTT, P. & WALDEN, A. T. (1980). Maximum likelihood estimation of the parameters of the generalized extreme-value distribution. Biometrika 67, 723-4.

PRESCOTT, P. & WALDEN, A. T. (1983). Maximum likelihood estimation of the parameters of the three- parameter generalized extreme-value distribution from censored samples. J. Statist. Comput. Simul. 16,

241-50.

RESNICK, S. & GREENWOOD, P. (1979). A bivariate stable characterization and domains of attraction. J.

Mult. Anal. 9, 206-21.

ROCKETTE, H., ANTLE, C. & KLIMKO, L. A. (1974). Maximum likelihood estimation with the Weibull model. J. Am. Statist. Assoc. 69, 246-9. SMITH, R. L. (1984). Threshold methods for sample extremes. In Statistical Extremes and Applications, Ed. J.

Tiago de Oliveira, pp. 621-38. Dordrecht: Reidel.

TIAGO DE OLIVEIRA, J. (1984). Univariate extremes: Statistical choice. In Statistical Extremes and

Applications, Ed. J. Tiago de Oliveira, pp. 91-107. Dordrecht: Reidel.

WALD, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20, 595-

601.

WALKER, A. M. (1969). On the asymptotic behaviour of posterior distributions. J. R. Statist. Soc. B 31, 80-8. WEISS, L. (1979). Asymptotic sufficiency in a class of non-regular cases. Selecta Statistica Canadiana 5, 143-

50.

WEISS, L. & WOLFOWITZ, J. (1973). Maximum likelihood estimation of a translation parameter of a truncated distribution. Ann. Statist. 1, 944-7.

WOODROOFE, M. (1972). Maximum likelihood estimation of a translation parameter of a truncated

distribution. Ann. Math. Statist. 43, 113-22.

WOODROOFE, M. (1974). Maximum likelihood estimation of translation parameter of truncated distribution

II. Ann. Statist. 2, 474-88.

[Received January 1984. Revised May 1984]
