
Journal of Econometrics 13 (1980) 27-56.

© North-Holland Publishing Company

MAXIMUM LIKELIHOOD ESTIMATION OF ECONOMETRIC FRONTIER FUNCTIONS*

William H. GREENE
Cornell University, Ithaca, NY 14853, USA

1. Introduction

The estimation of production functions has been one of the more popular
areas of applied econometrics. Recent work in duality theory which has
linked production and cost functions has made this topic even more
attractive. Typically, least squares (or some variant, such as two stage or
generalized least squares) is used to estimate the model of interest in
accordance with the assumption of a normally distributed disturbance in the
model. However, definitions of a production function are given in terms of
the maximum output attainable at given levels of the inputs. Similarly, a
dual cost function gives the minimum cost of producing a given level of
output at some set of input prices [See Christensen and Greene (1976).] It
has thus been argued that the disturbances specified in these models, and
techniques used to estimate them should account for that fact. These
considerations have motivated the recent literature on frontier functions.
Numerous studies have been devoted to the respecification of empirical
production and cost models to make them more compatible with the
underlying theory, and to the derivation of appropriate estimators. In some
cases, this has amounted to minor modifications of least squares results. The
remaining estimators are based on two distinct specifications. The very recent
work on composite disturbances has relaxed somewhat the orthodox
interpretation of the underlying function as a strict frontier with all
observations lying on one side of it, and has produced well behaved
maximum likelihood estimators with all of the usual desirable properties.
Other authors, following the more strict interpretation, have employed what
we shall call full frontier estimators which allow only one sided residuals. It

*This is a revised version of an earlier paper, Cornell Working Paper no. 162. The helpful
comments of Henry Wan, Peter Schmidt, Jack Kiefer and two anonymous referees are gratefully
acknowledged.
28 WH. Greene, MLE of econometric frontier functions

has been shown that the received full frontier estimators are also maximum
likelihood. However, estimation of them has been something less than wholly
successful, due in large measure to the fact that, in spite of their being
maximum likelihood, their statistical properties are unknown.
The composite disturbance models offer an attractive specification.
However, they leave unanswered questions of the properties of full frontier
estimators, which do have a theoretical appeal. The purpose of this paper is
to provide some results on maximum likelihood estimation of full frontier
models. First, the common problems of the received estimators will be
analyzed. In short, as Schmidt (1975) points out, one of the standard
regularity conditions usually assumed in maximum likelihood estimation is
violated. For the frontier model, this means that the results usually invoked
for maximum likelihood estimators do not necessarily apply. An alternative
frontier estimator is then proposed. A class of probability distributions which
can be used for the disturbance model and which allow maximum likelihood
estimation to proceed as a regular case is defined. The statistical properties
of the resulting estimators are easily established using the standard results in
spite of the fact that this is a non-regular case. The results for a specific
disturbance formulation with some particularly convenient properties will be
examined in detail. Finally, the technique will be applied to two well known
sets of data.

2. Previous full frontier estimators and their statistical bases

The estimation of econometric frontier functions begins with the study of
Aigner and Chu (A-C) (1968). Following the initiative of Farrell (1957) who
describes an industry envelope isoquant, they propose a method of
estimating a production function model which constrains all residuals from
the fitted function to be negative, i.e., a full frontier model. Their model is

$y = A x_1^{\alpha_1} x_2^{\alpha_2} u$, (2.1)

where y is output, $x_1$ and $x_2$ are inputs, and u is a random disturbance. The
systematic part of the right-hand side gives the maximum output attainable
using inputs $x_1$ and $x_2$. They suggest minimizing the sum of absolute
residuals from the log of the production function while constraining all
residuals to be negative, which is a linear programming problem. No
distributional assumption is made by A-C; but Schmidt (1975) shows that
their technique is equivalent to maximum likelihood estimation of the model,

$\log y = \log A + \alpha_1 \log x_1 + \alpha_2 \log x_2 - \varepsilon$, (2.2)

where $\varepsilon = -\log u$ and has an exponential distribution,

$f_\varepsilon(\varepsilon) = \lambda e^{-\lambda\varepsilon}, \quad \varepsilon \ge 0, \; \lambda > 0$. (2.3)

As an alternative, A-C propose minimizing the sum of squared residuals,
again constraining all residuals to be negative. In the setting of the
Cobb-Douglas model, this is a problem of quadratic programming. By similar
reasoning, Schmidt goes on to show that the quadratic programming
estimator is maximum likelihood if $\varepsilon$ has the half-normal distribution,

$f_\varepsilon(\varepsilon) = \frac{2}{\sqrt{2\pi}\,\theta}\, e^{-\varepsilon^2/(2\theta^2)}, \quad \varepsilon \ge 0, \; \theta > 0$. (2.4)

Recently, Forsund and Jansen (1977) have studied a production frontier
by estimating its dual cost function. The homothetic production function
which they estimate is

$y^{\beta} e^{\gamma y} = A \prod_{j=1}^{n} x_j^{\alpha_j} u$, (2.5)

where y and $x_j$ are as before, $\sum_{j=1}^{n} \alpha_j = 1$, and u has distribution

$f_u(u) = (1+\alpha) u^{\alpha}, \quad 0 < u \le 1, \; \alpha > -1$. (2.6)

The dual cost function is

$C = B y^{\beta} e^{\gamma y} \prod_{j=1}^{n} p_j^{\alpha_j} u^{-1}$, (2.7)

where $p_j$ is the unit price of $x_j$, and C is total cost. Forsund and Jansen show
that maximum likelihood estimates are obtained using linear programming,
by minimizing the sum of positive residuals from the log of (2.7). Making the
transformation from u to $\varepsilon = \log u^{-1}$, we find

$f_\varepsilon(\varepsilon) = (1+\alpha) e^{-(1+\alpha)\varepsilon}, \quad \varepsilon \ge 0$. (2.8)

Equating $\lambda$ in (2.3) to $(1+\alpha)$, we see that the stochastic framework here is the
same as that applied to A-C's model.
This completes the list of the received full frontier maximum likelihood
estimators. In each case, estimates are obtained by solving a constrained
programming problem. The frontier model has been used to study the
structure of production and efficiency in production by a number of authors,
nearly all of whom have used the linear programming technique. The full

frontier model has the theoretical appeal of forcing the fitted function to
correspond to the underlying theory in terms of the signs of the residuals.
Moreover, the estimators are maximum likelihood for certain stochastic specifications.
Unfortunately, certain problems have beset them. They can be highly
sensitive to outliers. To compensate for this problem, Timmer (1971)
proposed that the linear programming technique be modified to allow a
certain prespecified proportion of the residuals to have the wrong sign.
While this probably does solve the outlier problem, it must surely compound
the statistical problems. The more serious shortcoming of these full frontier
estimators is their lack of identifiable statistical properties. Although they are
maximum likelihood, the characteristics of the estimation problem prevent us
from making any use of this result, except, perhaps, to justify the choice of
the computational algorithm. The observation that the programming
estimators are ML is not sufficient to enable us to establish their statistical
properties. No standard errors have been derived for them, and no statistical
inference based on them has been possible.
Beyond some cursory observations, we will not attempt to establish the
specific asymptotic properties of the programming estimators. This remains
for future research. We will, however, examine in some detail how the
problems with these models arise. In the process, the analysis suggests how
the problem of inference can be circumvented through the use of an
alternative estimator.

3. Estimation of the full frontier function: General results

Let the production or cost function be specified as

$y_t = \alpha + \beta' x_t + \varepsilon_t, \quad t = 1, \ldots, T$, (3.1)

where $y_t$ is output or total cost, whichever is appropriate, $x_t$ is the
corresponding vector of exogenous variables, $\varepsilon_t$ is a random disturbance, $\alpha$
and $\beta$ are fixed (for all t), but unknown parameters, and T is the size of the
sample. The systematic part of the equation gives the optimum value of $y_t$
given $x_t$. The random part, $\varepsilon_t$, differs from 0 due to random shocks such as
weather, inefficiencies, etc. [See, e.g., Aigner and Chu (1968, p. 258).] The
disturbances will always be of one sign, negative for the production case or
positive for the cost case. Typically, the model would arise from the
logarithmic transformation of $\exp(y_t) = f(x_t) u_t$, where either $0 < u_t \le 1$ or $u_t \ge 1$
for the production and cost cases respectively. The following additional
assumptions will be maintained for the remainder of this study:

Assumption 1. The (K + 1) × 1 vector $(1, x_t')' = x_t^*$ is independent of $\varepsilon_s$ for all t
and s.1

Assumption 2. The random variables $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T$ are independent and
identically distributed with probability density function (p.d.f.) $f(\varepsilon_t)$,
cumulative distribution function (c.d.f.) $F(\varepsilon_t)$, finite mean, $\mu$, and finite,
positive variance, $\sigma^2$, for all t. We will also assume, purely for convenience,
that the range of $\varepsilon_t$ is $0 \le \varepsilon_t < \infty$, which implies $\mu > 0$. (The case of negative
range involves only a trivial modification of what follows.)

Assumption 3. The T × (K + 1) matrix $X_*$ whose tth row is $x_t^{*\prime}$ is observed
and has rank (K + 1) for all $T \ge K + 1$.

Assumption 4. Let $x_{tk}$ be the tth observation on the kth element of $x_t$ in a
sample of size T, k = 1, ..., K, t = 1, ..., T. The sample distribution function
$F_k^T(x_k)$ defined by $F_k^T(x_k) = j/T$, where j is the number of points
$x_{1k}, x_{2k}, \ldots, x_{Tk}$ less than or equal to $x_k$, converges to a distribution function
$F_k(x_k)$, and $x_k$ is bounded, k = 1, ..., K.

Assumption 5. The (K + 1) × (K + 1) matrix $(1/T)\sum_{t=1}^{T} x_t^* x_t^{*\prime}$ converges to a
finite positive definite matrix as T goes to $\infty$.

Without loss of generality, for the time being, $x_t$ will be assumed to have a
single element.

3.1. Least squares estimation


With the exception of the non-zero mean of the disturbances, $\varepsilon_t$, all of the
assumptions of the classical regression model apply to (3.1). Since the model
contains an intercept, it is simple to show that ordinary least squares
provides a best linear unbiased and (by virtue of the assumptions about
the regressor) consistent estimate of $\beta$. The conventionally computed standard
error for this estimate is appropriate, as is the assumption of asymptotic
normality. [See Theil (1971, pp. 380-381).] The only parameter not
consistently estimated by OLS is $\alpha$. The OLS intercept estimator is consistent
for $\alpha + \mu$. Since $\alpha$ is generally of no interest in any event, if consistent
estimation of the slope(s) is all that is desired, the analysis can stop at this
point. However, in some instances, the least squares residuals can provide a
consistent moment estimator of $\alpha$. For example, Richmond (1974) examines
a model in which the mean and variance of the one sided disturbance are
both equal to $\mu$. Hence, the least squares residual variance, $s^2$, which is
unbiased for $\sigma^2$ in any event, is also unbiased for $\mu$; and $\hat\alpha_{OLS} - s^2$ is unbiased
for $\alpha$. In general, we should expect that when $f(\varepsilon_t)$ is a one parameter family,
both $\mu$ and $\sigma^2$ will be functions of that parameter. Since $s^2$ will always be
unbiased for $\sigma^2$, a consistent estimate of $\mu$, and hence of $\alpha$, may be
obtainable. (See, for example, the moments of the half-normal and
exponential distributions below.) If the disturbance distribution involves
more than one parameter, it may be possible to estimate all of the
parameters of the model using additional moments of the OLS residuals.
[See, for example, Aigner, Lovell and Schmidt (1977, p. 28).]

1 See, e.g., Christensen and Greene (1976, pp. 658-659), Zellner, Kmenta, and Drèze (1966),
and Schmidt (1975, p. 238).
2 See Schmidt's equation (3), but note the sign reversal in our formulation. Richmond (1974)
derives the same result for a specific disturbance model.

Note, though, that even after the correction of $\hat\alpha$ for the non-zero
disturbance mean, some of the residuals will still have the theoretically
wrong sign. Of course, this is not inconsistent with the onesidedness of the
underlying disturbance; each residual is, after all, a function of all of the
disturbances and all of the data. [See Schmidt (1975).] But, the presence of
these wrong residuals may impede the computation of efficiency measures
which rely on sign uniformity. A biased but consistent estimate of $\alpha$ (albeit
of uncertain efficiency) which imposes the sign uniformity on the residuals is
easily obtained.
Consider, first, a simplified version of the model, with only an intercept,
i.e., $y_t = \alpha + \varepsilon_t$. For the purpose of the discussion which follows, define
$\tilde\varepsilon_t = \alpha + \varepsilon_t$, and note that for this simple model $\tilde\varepsilon_t = y_t$. The OLS estimator of $\alpha$
would be $\bar{y}$, which has plim $\bar{y} = \alpha + \mu \ne \alpha$. As an alternative, we propose the
minimum sampled value of $y_t$, which is conventionally denoted $y_{(1)}$. For the
simple model here, $y_{(1)} = \tilde\varepsilon_{(1)}$. Obviously, $\tilde\varepsilon_{(1)} = \alpha + \varepsilon_{(1)}$; so to prove that $y_{(1)}$ is
consistent for $\alpha$, it is sufficient to prove that plim $\varepsilon_{(1)} = 0$.

Proof. In any random sample of T observations on $\varepsilon$, the c.d.f. of the
sample minimum is $1 - [1 - F(\varepsilon)]^T$. Therefore, for any $\delta$ greater than zero,
$P(\varepsilon_{(1)} \ge \delta) = [1 - F(\delta)]^T$. Since $0 < F(\delta) < 1$ for all $\delta > 0$, $\lim_{T\to\infty} P(\varepsilon_{(1)} \ge \delta) = 0$.
Now, let $\delta$ become arbitrarily small. With the condition $\varepsilon_{(1)} \ge 0$, which follows
from $\varepsilon_t \ge 0$ for all t, this proves the assertion directly.3

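Assume, for example, that $\varepsilon$ has the density in (2.3). This is easily shown to yield
$f_{\varepsilon_{(1)}}(\varepsilon_{(1)}) = \lambda T e^{-\lambda T \varepsilon_{(1)}}$, which gives the exact results $E(\varepsilon_{(1)}) = 1/(\lambda T)$ and
$V(\varepsilon_{(1)}) = 1/(\lambda^2 T^2)$. Both of these vanish as T increases. Thus, for this case, $\varepsilon_{(1)}$
converges in mean square to 0, and $y_{(1)} = \alpha + \varepsilon_{(1)}$ is consistent for $\alpha$.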
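The exponential example can be checked by simulation. The following is a minimal sketch, not part of the original analysis: the function names, the seed, and the parameter values are illustrative assumptions. It draws samples from the intercept-only model and compares the average bias of the sample minimum with the exact value $1/(\lambda T)$.

```python
import random


def sample_minimum(alpha, lam, T, rng):
    """Return y_(1), the smallest of T draws y_t = alpha + eps_t, eps_t ~ Exp(lam)."""
    return min(alpha + rng.expovariate(lam) for _ in range(T))


def mean_bias(alpha, lam, T, reps, seed=0):
    """Average of y_(1) - alpha over `reps` simulated samples of size T."""
    rng = random.Random(seed)
    draws = (sample_minimum(alpha, lam, T, rng) for _ in range(reps))
    return sum(d - alpha for d in draws) / reps


if __name__ == "__main__":
    alpha, lam = 1.0, 2.0  # illustrative values only
    for T in (10, 100, 1000):
        # Simulated bias next to the exact expectation 1/(lam*T).
        print(T, mean_bias(alpha, lam, T, reps=2000), 1.0 / (lam * T))
```

The simulated bias is always positive, since the minimum can only lie above the frontier, and it shrinks at the rate 1/T.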
3Note that a similar argument will establish the consistency of any order statistic of specific
rank, i.e., smallest, second smallest, 50th smallest (but not of any quantile in the sample). For
this reason, the efficiency of this estimator seems uncertain. Obviously, the bias of the sample
minimum is the smallest among the order statistics.

For the more general case in which $\beta \ne 0$, the simple result above cannot
be invoked. When $\beta = 0$, although we cannot observe the disturbances, we can
observe their ranks. When there is a regressor in the model, so that we must
derive information about the disturbances from the residuals, even this
information is obscured. Let $\hat\varepsilon_t = y_t - b x_t$, where b is any consistent estimator
of $\beta$. (The intercept estimator has been discarded.) Note that $\hat\varepsilon_t$ is the sample
estimate of $\tilde\varepsilon_t = y_t - \beta x_t$, but that while $\hat\varepsilon_t$ is observed, $\tilde\varepsilon_t$ is not. Moreover,
$(\hat\varepsilon_1, \ldots, \hat\varepsilon_T)$ is not a random sample as the residuals are, in general, neither
independent nor identically distributed. Thus, the standard results for order
statistics do not apply here. Nonetheless, $\hat\varepsilon_{(1)}$ is a consistent estimate of $\alpha$
under the assumptions already made. (Our proof relies on the boundedness
of $x_t$, and does not necessarily apply for cases in which $x_t$ is not restricted to
be finite for all t.)

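Proof. By direct substitution, $\hat\varepsilon_t = \tilde\varepsilon_t + (\beta - b)x_t$. It is assumed that
plim $(b - \beta) = 0$ in spite of the non-zero mean of $\varepsilon_t$. This would be true, for example,
of the OLS estimator. It must then be true that

$\hat\varepsilon_t \le \tilde\varepsilon_t + \max_t (\beta - b)x_t \quad \forall t$.

Thus,

$\min_t \hat\varepsilon_t \le \min_t [\tilde\varepsilon_t + \max_t (\beta - b)x_t]$

$\Rightarrow \min_t \hat\varepsilon_t \le \min_t \tilde\varepsilon_t + \max_t (\beta - b)x_t$

$\Rightarrow$ plim $\min_t \hat\varepsilon_t \le$ plim $\min_t \tilde\varepsilon_t +$ plim $\max_t (\beta - b)x_t$.

But, plim $\min_t \tilde\varepsilon_t$ = plim $\tilde\varepsilon_{(1)} = \alpha$ was proved above. Since $x_t$ is uniformly
bounded by Assumption 4, plim $\max_t (\beta - b)x_t = 0$ follows from the
consistency of b.

(*) $\therefore$ plim $\hat\varepsilon_{(1)} \le \alpha$.

It must also be true that

$\hat\varepsilon_t \ge \tilde\varepsilon_t + \min_t (\beta - b)x_t \quad \forall t$.

Thus

$\min_t \hat\varepsilon_t \ge \min_t [\tilde\varepsilon_t + \min_t (\beta - b)x_t]$

$\Rightarrow \min_t \hat\varepsilon_t \ge \min_t \tilde\varepsilon_t + \min_t (\beta - b)x_t$

$\Rightarrow$ plim $\min_t \hat\varepsilon_t \ge$ plim $\min_t \tilde\varepsilon_t +$ plim $\min_t (\beta - b)x_t$.

Again, plim $\tilde\varepsilon_{(1)} = \alpha$; and plim $\min_t (\beta - b)x_t = 0$ follows from the consistency of
b and from the assumption that $x_t$ is uniformly bounded.

(**) $\therefore$ plim $\hat\varepsilon_{(1)} \ge \alpha$.

Since (*) and (**) must both be true, we have proved that plim $\hat\varepsilon_{(1)} = \alpha$.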
The conclusion is that regardless of the distribution of $\varepsilon_t$, if the sequence $x_t$
is well behaved (as defined by our assumptions) and if the distribution of $\varepsilon_t$
meets the requirements above, then the OLS residuals can be used to derive
a consistent estimate of $\alpha$. We need only shift the intercept of the estimated
function until all residuals (save for the one support point) have the correct
sign. As before, ancillary parameters of the disturbance distribution can now
be consistently estimated using the moments of the observed residuals.
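The intercept shift described above can be sketched in a few lines. This is an illustration only, assuming a single regressor and the positive-disturbance (cost) convention of Assumption 2; the helper names and the simulated design (uniform regressor, exponential disturbance) are hypothetical choices, not taken from the paper.

```python
import random


def ols_slope_intercept(x, y):
    """Ordinary least squares for y = a + b*x with one regressor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, ybar - b * xbar


def corrected_ols(x, y):
    """Keep the OLS slope and shift the intercept to the smallest y_t - b*x_t."""
    b, a_ols = ols_slope_intercept(x, y)
    # Residuals with the intercept discarded; their minimum estimates alpha.
    a_shifted = min(yi - b * xi for xi, yi in zip(x, y))
    return a_shifted, b, a_ols


def simulate(alpha, beta, lam, T, seed=0):
    """Hypothetical data: x uniform on [0, 1], eps exponential with rate lam."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 1.0) for _ in range(T)]
    y = [alpha + beta * xi + rng.expovariate(lam) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate(alpha=1.0, beta=0.5, lam=2.0, T=5000, seed=1)
    a_shifted, b, a_ols = corrected_ols(x, y)
    print("OLS intercept (estimates alpha + mu):", a_ols)
    print("shifted intercept (estimates alpha):", a_shifted)
```

After the shift every residual is non-negative and exactly one observation lies on the fitted frontier.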

3.2. Maximum likelihood estimation

Consistent estimates of all of the parameters of the frontier function can be
obtained using only a simple modification of the ordinary least squares
estimator. However, the distribution of $\varepsilon$ is necessarily asymmetric. A
maximum likelihood estimator which makes use of this information should
be more efficient, at least asymptotically. Maximum likelihood estimation
requires a particular assumption about the distribution of the disturbance.
The linear and quadratic programming estimators of Aigner and Chu are
maximum likelihood if the distribution of $\varepsilon_t$ is assumed to be exponential
[(2.3)], which has $\mu = 1/\lambda$ and $\sigma^2 = 1/\lambda^2$, in the first case and truncated
normal [(2.4)], which has $\mu = \theta\sqrt{2}/\sqrt{\pi}$ and $\sigma^2 = \theta^2(\pi - 2)/\pi$, in the second.4

4 In obtaining moments of the half-normal distribution, the gamma integral
$\int_0^\infty x^{P-1} e^{-a x^r}\,dx = (1/r)\,a^{-P/r}\,\Gamma(P/r)$, where $\Gamma(R) = \int_0^\infty x^{R-1} e^{-x}\,dx$ for $R > 0$,
$\Gamma(R) = (R-1)\Gamma(R-1)$ for $R > 1$, and $\Gamma(1/2) = \sqrt{\pi}$, has been used.

The asymptotic properties of these maximum likelihood estimators remain
to be established. The usual approach to the problem for econometric
models is to extend the analysis of Cramér (1946) to the regression case at
hand as, for example, in Barnett (1976) or Amemiya (1973).5 Unfortunately,
this approach requires a regularity condition which is absent here. In
particular, let $a_T$ be the vector of efficient scores $(1/T)\,\partial\log L/\partial\phi$, where L is
the likelihood function based on a sample of size T and $\phi$ is the vector of
unknown parameters being estimated. The standard approach requires either
$E_\phi(a_T) = 0$ or $\lim_{T\to\infty} a_T = 0$. Unfortunately, as shown below, for neither of
the aforementioned distributions can either result be established. On the
other hand, the consistency proof of Wald (1949) requires less stringent
regularity conditions, and it would seem that it could be suitably modified to
apply to the LP and QP frontier estimators. The argument in Kendall and
Stuart (1973, p. 42) for example, which is based on this proof, should
provide a suitable framework.

5 The extension is necessary because $(y_1, y_2, \ldots, y_T)$ do not constitute a random sample. While
independent (in most settings) the observations are not identically distributed; each has its own
mean. One approach is simply to consider repeated sampling of the multivariate observation
$(y_t, x_t)$, t = 1, ..., T, as in Theil (1971, sec. 8.1). Alternatively, one can establish the necessary
results directly as do Barnett (1976) and Amemiya (1973).

Of course, the (appropriately modified) OLS estimator is also consistent,
and more easily computed. Our interest here is rather in asymptotic efficiency.
Unfortunately, the observation that these estimators are maximum likelihood
is of little value in this regard, as it provides no guidance as to how to
formulate or estimate standard errors. Consider the following exact
expectations based on a sample of size T, $(y_t, x_t)$: For the exponential case,
with $\phi = (\lambda, \alpha, \beta)$,

$E[(1/T)\,\partial\log L/\partial\phi] = (0, \; \lambda, \; \lambda\bar{x})'$ (3.2a)

and

$-E[(1/T)\,\partial^2\log L/\partial\phi\,\partial\phi'] = \begin{bmatrix} 1/\lambda^2 & -1 & -\bar{x} \\ -1 & 0 & 0 \\ -\bar{x} & 0 & 0 \end{bmatrix}$, (3.2b)

where $\bar{x}$ is the sample mean of the observations on $x_t$. Note, $E(a_T) \ne 0$.
Moreover, both $E(a_T a_T')$ and $-E[(1/T)\,\partial^2\log L/\partial\phi\,\partial\phi']$ are singular in
every sample, so regardless of what is assumed about the data, the proof of
consistency, efficiency and asymptotic normality of Cramér will break down.
(The characteristic roots of $E(a_T a_T')$ are $1/\lambda^2$, $\lambda^2 + \lambda^2\bar{x}^2$, and 0.) For the half
normal case, with $\phi = (\theta, \alpha, \beta)$,

$E[(1/T)\,\partial\log L/\partial\phi] = \left(0, \; \frac{\sqrt{2}}{\sqrt{\pi}\,\theta}, \; \frac{\sqrt{2}\,\bar{x}}{\sqrt{\pi}\,\theta}\right)'$ (3.3a)

and

$-E[(1/T)\,\partial^2\log L/\partial\phi\,\partial\phi'] = \frac{1}{\theta^2}\begin{bmatrix} 2 & 2\sqrt{2}/\sqrt{\pi} & 2\sqrt{2}\,\bar{x}/\sqrt{\pi} \\ 2\sqrt{2}/\sqrt{\pi} & 1 & \bar{x} \\ 2\sqrt{2}\,\bar{x}/\sqrt{\pi} & \bar{x} & \overline{x^2} \end{bmatrix}$, (3.3b)

where $\overline{x^2}$ is the sample mean square of the observations on $x_t$. Again,
$E(a_T) \ne 0$, and clearly $\lim_{T\to\infty} a_T$ will not vanish either. It is easily verified
that the expectation in (3.3b) is not positive definite, so again the results of
Cramér cannot be applied.
None of the established results for MLEs of regression models apply here,
and it is not clear how asymptotic standard errors for these estimators can
be obtained or what asymptotic distribution is appropriate.
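The non-zero expected score in the exponential case can be traced directly from the likelihood. The following is a sketch, assuming the model (3.1) with a single regressor, the density (2.3), and $\bar{x}$ as shorthand for the sample mean of $x_t$:

```latex
\begin{align*}
\log L &= T\log\lambda - \lambda\sum_{t=1}^{T}\varepsilon_t,
  \qquad \varepsilon_t = y_t - \alpha - \beta x_t, \\
\frac{1}{T}\frac{\partial\log L}{\partial\lambda}
  &= \frac{1}{\lambda} - \frac{1}{T}\sum_{t=1}^{T}\varepsilon_t
  &&\Rightarrow\quad E[\,\cdot\,] = \frac{1}{\lambda} - \frac{1}{\lambda} = 0, \\
\frac{1}{T}\frac{\partial\log L}{\partial\alpha} &= \lambda
  &&\Rightarrow\quad E[\,\cdot\,] = \lambda \neq 0, \\
\frac{1}{T}\frac{\partial\log L}{\partial\beta} &= \lambda\bar{x}
  &&\Rightarrow\quad E[\,\cdot\,] = \lambda\bar{x}.
\end{align*}
```

The derivatives with respect to $\alpha$ and $\beta$ are constants that do not depend on the disturbances, so they cannot average to zero. The likelihood is maximized at the boundary of the feasible region, where the fitted frontier touches the data, rather than at a stationary point.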
The problem, as Schmidt has correctly diagnosed, arises because the range
of the observed random variable depends upon the parameters being
estimated. Consequently, this case is an irregular one as far as maximum
likelihood estimation is concerned. Some special considerations are required
to handle this violation of the usual regularity conditions. We note, though,
that aside from the range problem, if one is willing to make the necessary
assumptions about the exogenous variables, then the two distributions
proposed by Schmidt imply a likelihood function which is otherwise well
behaved. (That is, there are no discontinuities in the density function of the
observed random variable anywhere over its range, all required derivatives
exist and are finite, etc.) But, as he observes, the range problem will persist
regardless of what distribution is chosen for $\varepsilon$.
The range problem described above does not preclude us from establishing
the desirable large sample properties of all maximum likelihood estimates for
frontier functions. Nor, as shown below, does it necessarily invalidate the
standard analyses usually applied to better behaved problems (as, for
example, by Amemiya), including the use of the conventionally computed
information matrix to form standard errors.
Assume, for now, that no regression is involved, so that the results for
random sampling apply to the observations $y_t$, t = 1, 2, ..., T, and consider the
general problem of maximum likelihood estimation of a parameter vector $\phi$.
Let $f(y_t, \phi)$ be the common density function of $y_t$, t = 1, ..., T, and assume
that the following regularity conditions all apply:

(1) The parameter space $\Phi$, which may be restricted, for example to exclude
non-positive variances, is compact and contains an open neighborhood
of the true value of $\phi$, $\phi_0$.

(2) $f(y_t, \phi)$ is positive, continuous, and three times continuously
differentiable with respect to $\phi$ everywhere in the range of $y_t$ and for all
values of $\phi$ in an interval A which contains $\phi_0$ as an interior point.

(3) The range of $y_t$ is independent of $\phi$.

(4) $\partial\log f(y_t, \phi)/\partial\phi$ has a positive definite variance matrix with finite
elements.

(5) The absolute values of the third derivatives of $\log f(y_t, \phi)$ with respect to
$\phi$ are bounded by a bounded integrable function of $y_t$ which does not
depend on $\phi$.

(Note that the estimation problem defined here is a particularly well behaved
one.)

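Now, by definition

$\int f(y_t, \phi)\,dy_t = 1$,

so that

$\frac{\partial}{\partial\phi}\int f(y_t, \phi)\,dy_t = 0$.

Conditions 2 and 3 (by Leibnitz's rule) allow the interchange of the
order of integration and differentiation so that we also have (at least in A)

$\int \frac{\partial f(y_t, \phi)}{\partial\phi}\,dy_t = 0$.

It is now easily shown that

$E_{\phi_0}[\partial\log L/\partial\phi] = 0$, (3.4)

where L is the likelihood function, $\prod_{t=1}^{T} f(y_t, \phi)$, and $E_{\phi_0}(\cdot)$ indicates that
the expectation is taken at the true parameter value. Similarly,

$\frac{\partial}{\partial\phi'}\int f_\phi\,dy_t = 0$,

where $f_\phi = \partial f(y_t, \phi)/\partial\phi$; and, interchanging again,

$\int f_{\phi\phi'}\,dy_t = 0$,

where $f_{\phi\phi'} = \partial^2 f(y_t, \phi)/\partial\phi\,\partial\phi'$. This, in turn, yields

$E_{\phi_0}\!\left[\frac{\partial^2\log f}{\partial\phi\,\partial\phi'}\right] = -E_{\phi_0}\!\left[\frac{\partial\log f}{\partial\phi}\,\frac{\partial\log f}{\partial\phi'}\right]$

and

$-E_{\phi_0}\!\left[\frac{\partial^2\log L}{\partial\phi\,\partial\phi'}\right] = E_{\phi_0}\!\left[\frac{\partial\log L}{\partial\phi}\,\frac{\partial\log L}{\partial\phi'}\right]$. (3.5)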

With these results in hand, and with the other regularity conditions,
consistency, asymptotic efficiency, and asymptotic normality of the maximum
likelihood estimator can now be established [as, for example, in Kendall and
Stuart (1973, ch. 18)]. The negative of the inverse of $E(\partial^2\log L/\partial\phi\,\partial\phi')$
provides the appropriate asymptotic covariance matrix for the maximum
likelihood estimator.
Independence of the range of $y_t$ from the parameters in $\phi$ is normally
included among the regularity conditions in order to make the interchange of
integration and differentiation needed to establish (3.4) and (3.5) permissible.
In fact, given Condition 2, Condition 3 is not necessary but only sufficient.
The results in (3.4) and (3.5) can be established without Condition 3
provided other conditions are met.6 This has direct relevance for the
estimation of frontier functions, as this range problem is generally the only
one which prevents the likelihood function from being perfectly well behaved.
If we can establish (3.4) and (3.5) (or the necessary analog for the regression
case), then, in spite of this violation of the regularity conditions, the analysis
of Barnett or Amemiya for maximum likelihood estimation may be appealed
to directly to establish the properties of the MLE. (The simple regression
model generally considered in the frontier case is covered a fortiori by their
results for more involved cases.)
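Suppose, then, that $f(y_t, \phi)$ satisfies all of the regularity conditions except
that the range of $y_t$ depends upon $\phi$. In particular, assume $l(\phi) \le y_t \le u(\phi)$. As
always,

$\int_{l(\phi)}^{u(\phi)} f(y_t, \phi)\,dy_t = 1$.

In A, given Condition 2,

$\frac{\partial}{\partial\phi}\int_{l(\phi)}^{u(\phi)} f(y_t, \phi)\,dy_t = \int_{l(\phi)}^{u(\phi)} \frac{\partial f(y_t, \phi)}{\partial\phi}\,dy_t + f(u(\phi), \phi)\,\frac{\partial u(\phi)}{\partial\phi} - f(l(\phi), \phi)\,\frac{\partial l(\phi)}{\partial\phi} = 0$. (3.6)

6 The following argument is suggested by Kendall and Stuart (1973, p. 35).

[A proof of this form of Leibnitz's rule may be found in, e.g., Kaplan (1952,
p. 221).] The second and third terms after the first equality vanish when the range of
$y_t$ is independent of $\phi$, as $\partial u(\phi)/\partial\phi = \partial l(\phi)/\partial\phi = 0$. However, they also
vanish when $f(u(\phi), \phi) = f(l(\phi), \phi) = 0$, even if the range of $y_t$ depends upon $\phi$.
The requirement is simply that $f(y, \phi)$ be zero at its terminal points.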
Distributions for which this is the case are numerous. For example,

is one; the lognormal,

is another. [See Kendall and Stuart (1973, p. 168).]


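Assuming $f(y, \phi)$ vanishes at the terminals, we have, in A,

$\int_{l(\phi)}^{u(\phi)} \frac{\partial f(y_t, \phi)}{\partial\phi}\,dy_t = 0$,

from which (3.4) follows immediately.

Applying the previous result,

$\frac{\partial}{\partial\phi'}\int_{l(\phi)}^{u(\phi)} f_\phi\,dy_t = \int_{l(\phi)}^{u(\phi)} f_{\phi\phi'}\,dy_t + f_\phi(u(\phi), \phi)\,\frac{\partial u(\phi)}{\partial\phi'} - f_\phi(l(\phi), \phi)\,\frac{\partial l(\phi)}{\partial\phi'} = 0$, (3.7)

where $f_\phi = \partial f(y_t, \phi)/\partial\phi$ and $f_{\phi\phi'} = \partial^2 f(y_t, \phi)/\partial\phi\,\partial\phi'$.
If the derivatives of $f(y_t, \phi)$ with respect to $\phi$ are zero when $y_t$ is at the upper
and lower terminal points of the distribution, then the operations of
differentiation and integration can again be interchanged. From here it is
straightforward to obtain (3.5). All of the familiar asymptotic properties for
maximum likelihood estimators can now be invoked directly, as the
assumption about the range of $y_t$ plays no further role. [See, for example,
Kendall and Stuart (1973, ch. 18).]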
It remains to extend these results to the frontier function (regression) case.
Consider, then, a disturbance specification for the model (3.1) with
continuous p.d.f. $f_\varepsilon(\varepsilon)$. Two sets of parameters will be involved here. First are
the ancillary parameters of the disturbance distribution, such as $\lambda$ in (2.3);
second are the intercept and slopes of the frontier function. The range of $y_t$ is
free of the first set. Maximum likelihood estimation of them presents no
unusual problems so long as $f_\varepsilon(\varepsilon)$ is regular enough with respect to these
parameters, which we will assume. Hence, they will be ignored in what
follows with no loss of generality. Let $\phi = (\alpha, \beta)$. Then $l(\phi) = \alpha + \beta x_t$, while $u(\phi)$
is irrelevant. Given $f_\varepsilon(\varepsilon)$ and $y_t = \alpha + \beta x_t + \varepsilon_t$, so that the Jacobian of the
transformation from $\varepsilon_t$ to $y_t$ is unity, we have

$f_y(y_t, \phi) = f_\varepsilon(y_t - l(\phi))$.

To establish (3.4) we need, from (3.6),

$f_y(y_t, \phi)\big|_{y_t = l(\phi)} = f_\varepsilon(0) = 0$. (3.8)

This simply requires that the disturbance density be zero at the contact point.
Obviously, this does not hold for the exponential distribution, for which
$f_\varepsilon(0) = \lambda$, nor for the half-normal distribution, for which $f_\varepsilon(0) = \sqrt{2}/(\theta\sqrt{\pi})$. It
does hold, however, for the lognormal distribution. For (3.5), by (3.7), we
require

$\frac{\partial f_y(y_t, \phi)}{\partial\phi}\bigg|_{y_t = l(\phi)} = 0$, i.e., $f_\varepsilon'(0) = 0$. (3.9)

Now, assume that (3.1), Assumptions 1-5, Regularity Conditions 1, 2, 4,
and 5, and (3.8) and (3.9) all apply to the problem at hand. The problem of
maximum likelihood estimation of $\phi$ thus implied is an extremely well
behaved one, and the already established results for MLEs for regression
models which are (essentially) linear in their parameters can be invoked
directly. (While this set of conditions is quite stringent for the general
problem of maximum likelihood estimation, it places virtually no restriction
on the sorts of empirical models typically specified for production frontiers.)
Thus, there should be no obstacle to ascribing the usual maximum likelihood
properties to the MLE of $\phi$. For example, verification can follow the analysis
of Amemiya. We note, in passing, that the extension of the results to a
nonlinear $l(\phi)$ would be direct, and should not greatly complicate the
analysis. An additional assumption about boundedness of the derivatives of
$l(\phi)$ would be all that is required.

To summarize, then, the difficulties in maximum likelihood estimation of
frontier functions are not inherent in the problem. Given the set of regularity
conditions and assumptions listed above, the MLE of (3.1) will be consistent,
asymptotically efficient and asymptotically normally distributed. An
appropriate estimator of the asymptotic covariance matrix for the MLE will
be provided by the second derivatives matrix of the log likelihood function.
What is required is a careful choice of the disturbance model.

4. A disturbance specification

The exponential and truncated normal distributions do not satisfy (3.8)
and (3.9). There are, however, numerous distributions which do meet these
requirements. One which is particularly attractive for the frontier estimator is
the gamma density,

$f(\varepsilon) = G(\lambda, P) = \frac{\lambda^P}{\Gamma(P)}\,\varepsilon^{P-1} e^{-\lambda\varepsilon}, \quad \varepsilon \ge 0, \; \lambda > 0, \; P > 2$. (4.1)

The mean and variance of $\varepsilon$ are $\mu = P/\lambda$ and $\sigma^2 = P/\lambda^2$. The presence of two
free parameters in $f(\varepsilon)$ obviates the possibly unwarranted assumption of a
functional relationship between $\mu$ and $\sigma$ implicit in (2.3) and (2.4). For the
general case of $G(\lambda, P)$, P must be positive. The restriction P > 2 gives (3.8)
and (3.9).
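The role of the restriction P > 2 can be seen numerically. The following is a minimal sketch, with hypothetical function names and illustrative parameter values, that evaluates the gamma density (4.1) and its analytic slope at a point very close to zero for several values of P.

```python
import math


def gamma_pdf(eps, lam, P):
    """Density (4.1): lam**P / Gamma(P) * eps**(P-1) * exp(-lam*eps), for eps > 0."""
    if eps <= 0.0:
        raise ValueError("evaluate at eps > 0 and let eps shrink toward zero")
    log_f = P * math.log(lam) - math.lgamma(P) + (P - 1.0) * math.log(eps) - lam * eps
    return math.exp(log_f)


def gamma_pdf_slope(eps, lam, P):
    """Analytic derivative f'(eps) = f(eps) * ((P - 1)/eps - lam)."""
    return gamma_pdf(eps, lam, P) * ((P - 1.0) / eps - lam)


if __name__ == "__main__":
    # P = 1 is the exponential case; P = 1.5 and P = 3 bracket the P > 2 threshold.
    for P in (1.0, 1.5, 3.0):
        print(P, gamma_pdf(1e-8, 1.0, P), gamma_pdf_slope(1e-8, 1.0, P))
```

With P = 1 the density at the origin equals $\lambda$, so (3.8) fails. With 1 < P < 2 the density vanishes at the origin but its slope grows without bound, so (3.9) fails. Only for P > 2 do both the density and its slope go to zero at the contact point.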
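The log of the likelihood function for this disturbance model is

$\log L = TP\log\lambda - T\log\Gamma(P) + (P-1)\sum_{t}\log(y_t - \alpha - \beta'x_t) - \lambda\sum_{t}(y_t - \alpha - \beta'x_t)$. (4.2)

The first derivatives of the log likelihood are

$\partial\log L/\partial\lambda = TP/\lambda - \sum_t \varepsilon_t$,
$\partial\log L/\partial P = T\log\lambda - T\,\Gamma'(P)/\Gamma(P) + \sum_t \log\varepsilon_t$,
$\partial\log L/\partial\alpha = T\lambda - (P-1)\sum_t (1/\varepsilon_t)$,
$\partial\log L/\partial\beta = T\lambda\bar{x} - (P-1)\sum_t (1/\varepsilon_t)\,x_t$, (4.3)

where $\varepsilon_t = y_t - \alpha - \beta'x_t$ and $\bar{x}$ is the (K × 1) column vector of sample means of
the K variables in x. With the exception of $\partial\log L/\partial P$, it is simple to verify
$E(\partial\log L/\partial\phi) = 0$. $E(\varepsilon) = P/\lambda$ and $E(1/\varepsilon) = \lambda/(P-1)$ are found by direct
integration, from which $E(\partial\log L/\partial\lambda) = E(\partial\log L/\partial\alpha) = E(\partial\log L/\partial\beta_k) = 0$
follow directly. To find $E(\ln\varepsilon)$, let $v = \lambda\varepsilon$, so $E(\ln\varepsilon) = E(\ln v) - \ln\lambda$. The
distribution of v is easily shown to be G(1, P). While the distribution of $\ln v$
is messy, the cumulants of $\ln v$ are simply $\kappa_r(\ln v) = d^r\ln\Gamma(P)/dP^r$. [Kendall
and Stuart (1973, p. 177).] In general, $\kappa_1 = \mu$ and $\kappa_2 = \sigma^2$. Thus, $E(\ln v)
= \Gamma'(P)/\Gamma(P)$, and $E(\partial\log L/\partial P) = 0$ follows immediately. The second
derivatives of the log likelihood, with the parameters ordered as $\lambda$, P, and the
(K + 1) × 1 coefficient vector $(\alpha, \beta')'$, and with $\Psi'(P) = d^2\ln\Gamma(P)/dP^2$, are

$\frac{\partial^2\log L}{\partial\phi\,\partial\phi'} = -\begin{bmatrix} TP/\lambda^2 & -T/\lambda & -i'X_* \\ -T/\lambda & T\,\Psi'(P) & \sum_t (1/\varepsilon_t)\,x_t^{*\prime} \\ -X_*'i & \sum_t (1/\varepsilon_t)\,x_t^* & (P-1)\sum_t (1/\varepsilon_t^2)\,x_t^* x_t^{*\prime} \end{bmatrix}$, (4.4)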

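where i is a column vector of ones, and the intercept has been included in
the vector of slopes. The only new result required to derive the exact
expectations for (4.4) is $E(1/\varepsilon_t^2) = \lambda^2/((P-1)(P-2))$. This gives

$T\Sigma_T = -E\!\left[\frac{\partial^2\log L}{\partial\phi\,\partial\phi'}\right] = \begin{bmatrix} TP/\lambda^2 & -T/\lambda & -i'X_* \\ -T/\lambda & T\,\Psi'(P) & (\lambda/(P-1))\,i'X_* \\ -X_*'i & (\lambda/(P-1))\,X_*'i & (\lambda^2/(P-2))\,X_*'X_* \end{bmatrix}$, (4.5)

where $\Psi'(P) = d^2\ln\Gamma(P)/dP^2$ and $X_*$ is the T × (K + 1) matrix of Assumption 3.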

Straightforward, though tedious, algebra verifies that $\Sigma_T$ gives the exact
covariance matrix of $u_T = (1/\sqrt{T})\,\partial\log L/\partial\phi$ for all $T \ge (K+1)$. Assumptions 4
and 5 guarantee that $\Sigma_T$ converges to a positive definite matrix with finite
elements. All of the other regularity conditions listed above (Conditions 1, 2,
4, and 5) are easily verified. Thus, we will, at this point, invoke the results of
Cramer to assert the consistency, asymptotic efficiency, and asymptotic
normality of the vector of parameter estimates $\hat\phi$ which maximize (4.2). The
limiting distribution of $\sqrt{T}(\hat\phi - \phi_0)$ will be normal with mean 0 and
covariance matrix $\Sigma^{-1}$, where $\Sigma = \lim_{T\to\infty}\Sigma_T$; $(1/T)\Sigma^{-1}$, which we will
estimate by $(1/T)\hat\Sigma_T^{-1}$, gives the asymptotic covariance matrix for $\hat\phi$.
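The log likelihood and its derivatives with respect to $\lambda$, $\alpha$ and $\beta$ can be checked against each other numerically. The sketch below assumes a single regressor and made-up data; the helper names (`residuals`, `loglik`, `score`, `numeric_score`) are hypothetical, and the derivative with respect to P is omitted because the standard library has no digamma function.

```python
import math

def residuals(alpha, beta, y, x):
    """eps_t = y_t - alpha - beta * x_t (one regressor, for illustration)."""
    return [yt - alpha - beta * xt for yt, xt in zip(y, x)]

def loglik(lam, P, alpha, beta, y, x):
    """Gamma-disturbance log likelihood; every residual must be > 0."""
    eps = residuals(alpha, beta, y, x)
    if min(eps) <= 0.0:
        raise ValueError("every residual must be strictly positive")
    T = len(eps)
    return (T * P * math.log(lam) - T * math.lgamma(P)
            + (P - 1.0) * sum(math.log(e) for e in eps) - lam * sum(eps))

def score(lam, P, alpha, beta, y, x):
    """Analytic derivatives with respect to (lambda, alpha, beta)."""
    eps = residuals(alpha, beta, y, x)
    T = len(eps)
    inv = [1.0 / e for e in eps]
    d_lam = T * P / lam - sum(eps)
    d_alpha = T * lam - (P - 1.0) * sum(inv)
    d_beta = lam * sum(x) - (P - 1.0) * sum(xt * it for xt, it in zip(x, inv))
    return d_lam, d_alpha, d_beta

def numeric_score(lam, P, alpha, beta, y, x, h=1e-6):
    """Central finite differences of loglik in (lambda, alpha, beta)."""
    base = (lam, alpha, beta)
    grads = []
    for j in range(3):
        up, dn = list(base), list(base)
        up[j] += h
        dn[j] -= h
        f_up = loglik(up[0], P, up[1], up[2], y, x)
        f_dn = loglik(dn[0], P, dn[1], dn[2], y, x)
        grads.append((f_up - f_dn) / (2.0 * h))
    return tuple(grads)

# Made-up data: all residuals are positive at the chosen parameter values.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.4, 3.1, 4.3, 4.9]
analytic = score(3.0, 4.0, 0.5, 1.0, y, x)
numeric = numeric_score(3.0, 4.0, 0.5, 1.0, y, x)
print(analytic, numeric)
```

Agreement between the two tuples is only a consistency check on the algebra, not an estimate of anything.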
The Gamma distribution is obviously asymmetric, hence maximum
likelihood estimation of the parameters in (4.2) should be more efficient than
least squares which takes no account of that fact. It will also, unlike OLS,
give a consistent estimate of $\alpha$. Ignoring, for now, the inconsistency of the
OLS intercept term, we would expect the gain in efficiency obtained by ML
to be related to the degree of skewness of the distribution. The skewness
coefficient, $E(\varepsilon - E(\varepsilon))^3/\sigma^3$, is readily shown to be $2/\sqrt{P}$. The parameter P is
clearly crucial. Intuitively, we might expect the greatest efficiency gain from
ML when P is small (near 2).
In fact, a more direct efficiency comparison is available. The exact variance
of the OLS estimator is, as always, $\sigma^2(X_*'X_*)^{-1} = (P/\lambda^2)(X_*'X_*)^{-1}$. The
lower right block of the inverse of $T\Sigma_T$ provides the basis upon which we
will estimate the asymptotic variance matrix of the maximum likelihood
estimator. As an initial approximation, assume $T\Sigma_T$ is block-diagonal. Then
the appropriate variance matrix is derived from $((P-2)/\lambda^2)(X_*'X_*)^{-1} =
((P-2)/P)\sigma^2(X_*'X_*)^{-1}$. Again, a small value of P will suggest a large efficiency
gain of ML over OLS. As a first guess, the number P/(P-2) should be
indicative of the relative asymptotic efficiency of ML over OLS for this
model. The limiting case of P=2 (which is inadmissible) results in a singular
second derivatives matrix, while large values of P imply no gain. In
accordance with the earlier result, large P implies a symmetric distribution.
Unfortunately, $T\Sigma_T$ will never be block diagonal, even if all regressors are
in deviation form, so long as there is an intercept in the model. The first row
of $X_*'i\,\delta'$ is $T\delta'$ in every sample. Partitioning the inverse of $T\Sigma_T$, we get

$$\left[\frac{\lambda^2}{P-2}\,X_*'X_* - \frac{1}{T}\,X_*'i\,\delta'A^{-1}\delta\,i'X_*\right]^{-1}.$$

This can be simplified to

(4.6)

This uses $E(\varepsilon^r) = \Gamma(P+r)/(\lambda^r\,\Gamma(P))$. Note that $\lambda$ has disappeared. This follows from the fact
that $\varepsilon$ is just $(1/\lambda)v$ where $v \sim G(1,P)$, and a constant scale factor will not affect the shape of the
distribution.

where

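$$\rho = \frac{\lambda^2 (P-2)\,[(P-1)^2\gamma - (P-2)]}{P\,(P\gamma - 1)\,(P-1)^2}$$

and

$$\gamma = \frac{\Gamma''(P)\Gamma(P) - (\Gamma'(P))^2}{(\Gamma(P))^2}.$$

Consider the limiting cases $P \to 2$ and $P \to \infty$. As $P \to 2$, $\rho \to 0$ as does (P-2)/P,
and $S_T$ vanishes as before.8 As $P \to \infty$, $\rho \to 0$ again, while $(P-2)/P \to 1$.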
Maximum likelihood estimation provides no gain over OLS if the error
distribution in this model is symmetric. For the intermediate cases, we can
appeal to the usual results to assert the relative asymptotic efficiency of the
MLE.
For a final characterization of this distribution recall that $\varepsilon = v/\lambda$ where
$v \sim G(1,P)$. Now, let $z = (v - E(v))/\sigma_v = (v - P)/\sqrt{P}$. The rth cumulant of z is
$\kappa_r(z) = (r-1)!\,P^{1-r/2}$ $(r \ge 2)$. As $P \to \infty$, all cumulants except the second will go to
zero. ($\kappa_1 = 0$ if $\mu = 0$.) This sequence of cumulants characterizes the normal
distribution, so we may conclude that as $P \to \infty$, z tends to standard
normality. As $\varepsilon$ is a linear transformation of z, we see that as $P \to \infty$, the
distribution of $\varepsilon$ tends to normality. This implies that the maximum
likelihood estimator should approach the ordinary least squares estimator.
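The limiting argument can be illustrated numerically. The sketch below uses the standard result that the rth cumulant of a G(1, P) variate is $(r-1)!\,P$, so the standardized variate has cumulants $(r-1)!\,P^{1-r/2}$; the function name and the sample values of P are assumptions for illustration.

```python
import math

def standardized_cumulant(r, P):
    """r-th cumulant (r >= 2) of z = (v - P) / sqrt(P), where v ~ G(1, P)."""
    return math.factorial(r - 1) * P ** (1.0 - r / 2.0)

for P in (4.0, 100.0, 10000.0):
    skew = standardized_cumulant(3, P)     # equals 2 / sqrt(P)
    excess = standardized_cumulant(4, P)   # equals 6 / P
    print(P, skew, excess)
```

The third and fourth cumulants shrink toward zero as P grows, while the second stays at one, which is the pattern of a standard normal variate.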
This last property makes the gamma density extremely attractive for
estimating the production or cost frontier, as it implies that the model is
quite flexible in the shapes of error distributions it will accommodate.
Suppose the process generating the disturbances is such that the error
distribution is symmetric, or nearly so. Our consistent estimate of $(\alpha, \beta)$ will
allow us to discern this from the regression residuals. The maximum
likelihood estimate of P will tend to be large, while $\lambda$ will adjust to place the
mean in the appropriate location. Moreover, the slope estimators should
resemble the OLS estimators in value and efficiency, while the intercept term
will now be a consistent estimate of $\alpha$. Alternatively, if the observations tend
to be grouped close to the frontier, with only a relatively small number in
the extreme range, then P should be small, the error distribution will be
highly skewed, and we should expect the maximum likelihood estimator to
be highly efficient relative to OLS.
This has an implication for the average versus frontier estimators
discussed in a number of studies. The average estimator is generally

8 This requires that $(P\gamma - 1)$ not go to zero. But $(P\gamma - 1) = \lambda^2|A|$, which must be positive for
all P greater than 0. [Consider the asymptotic variance matrix of the MLE for $(\lambda, P)$ based on
an observed sample of $\varepsilon$'s. This would be $A^{-1}$, which must be positive definite.]
Kendall and Stuart (1968, pp. 47, 68, 94-101, 136, 166-167).
understood to be OLS (although what it estimates is somewhat ambiguous),
while the frontier estimator corresponds to any of the methods thus far
described. The Gamma specification allows a relationship to be established
between the average and frontier estimators. If the disturbances about the
frontier estimator tend to be symmetrically distributed, we should expect the
average estimator to be a displaced or simply scaled version of it with the
same shape. The more skewed the disturbances about the frontier are, the
less it should resemble the average estimator.
In summary, the Gamma density provides several useful results for the
specification and estimation of frontier functions. First, it provides a
maximum likelihood estimator with all of the usual desirable properties. The
asymptotic distribution of the estimator is easily derived, and the asymptotic
variance matrix is readily estimated. As shown below, the estimator and
these variances are relatively easily computed. The ancillary parameters $\lambda$
and P provide additional information on the shape of the distribution with
which we may characterize our observations on relative (cost or technical)
efficiency and offer some evidence on the relationship between the frontier
and average estimators. In the case in which the error distribution is
symmetric in the relevant range, so that the simple modification of the results
of OLS suggested in section 3.1 is reasonable and appropriate, this is what
the maximum likelihood estimator should provide. Hence, the added
efficiency of ML will not be simply artificially built into the problem. When
the error distribution is highly skewed away from the frontier, however, large
gains in efficiency over OLS can be obtained by accounting for this
asymmetry.
For the gamma density with P >2, maximum likelihood estimation of the
parameters is a regular case. The problem is complicated somewhat because
estimates of the likelihood function and its derivatives will involve the
logarithms and reciprocals of the residuals. Hence, log L must be maximized
in the open set $\varepsilon_t = (y_t - \alpha - \beta'x_t) > 0$. But, as any $\varepsilon_t$ approaches 0, log L goes
to negative infinity, while $\partial\log L/\partial\phi$ will be unbounded. Therefore, the
maximum must be at an interior point. It appears that this situation will
arise for any distribution for which (3.8) and (3.9) hold, as the conditions
seem to imply that f(a) must be of the form &(@)g(c, 4).
Since the expected values in (4.5) are known exactly, the method of scoring
can be used to maximize (4.2) with respect to $\phi = (\lambda, P, \alpha, \beta)$. For this
procedure, the gamma function and its derivatives must be approximated.
For the function $\Gamma(P)$,

For our applications, a sixteen point Gauss-Laguerre quadrature was used
to approximate $\Gamma^{(r)}(P)$,

$$\Gamma^{(r)}(P) \approx \sum_{i=1}^{16} v_i\,h(w_i, r, P),$$

where $h(w_i, r, P) = w_i^{P-1}(\log w_i)^r$, and $v_i$ and $w_i$ are the Gauss-Laguerre
weights and abscissas. [See IBM (1977, pp. 303-307).] The constraints P > 2
and $\lambda > 0$ were imposed by the method of squaring. The parameters P and $\lambda$
in (4.2) were replaced by $(P_*^2 + 2)$ and $\lambda_*^2$; then (4.2) was maximized with
respect to $P_*$ and $\lambda_*$ without constraints.
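The method of squaring amounts to a change of variables that maps any unconstrained pair into the admissible region. A minimal sketch follows; the function names are hypothetical and the round-trip values are used only as an example.

```python
import math

def to_constrained(p_star, lam_star):
    """Map unconstrained (P*, lambda*) to (P, lambda) = (P*^2 + 2, lambda*^2)."""
    return p_star ** 2 + 2.0, lam_star ** 2

def to_unconstrained(P, lam):
    """Inverse map (taking nonnegative roots); requires P > 2 and lambda > 0."""
    if P <= 2.0 or lam <= 0.0:
        raise ValueError("need P > 2 and lambda > 0")
    return math.sqrt(P - 2.0), math.sqrt(lam)

# Any nonzero unconstrained pair lands strictly inside the admissible region.
P, lam = to_constrained(-1.5, 0.3)
print(P, lam)

# Round trip for an example pair of parameter values.
p_star, lam_star = to_unconstrained(5.3804, 11.7937)
print(to_constrained(p_star, lam_star))
```

Note that the boundary values P = 2 and $\lambda$ = 0 are reached only at $P_* = 0$ or $\lambda_* = 0$, so an optimizer working in the starred variables must still avoid those two points.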
In the iterative procedure, a slight modification of the scoring algorithm
was necessary. The procedure would normally be $\phi^{(s+1)} = \phi^{(s)} + d^{(s)}$, where
$d^{(s)} = -H^{(s)}g^{(s)}$, $H^{(s)}$ is the inverse of (4.5), and $g^{(s)}$ is the current value of
(4.3). However, the direction vector, d, was quite large in the early iterations,
so the process became unstable. To compensate, the elements of
$(\lambda^2/(P-2))\sum_t x_t x_t'$ in (4.5) were multiplied by T prior to inversion. This
substantially reduced the size of d at each iteration, and resulted in extremely
slow convergence of the process. However, the computations at each
iteration are simple, and even excessive numbers of iterations are quite
inexpensive. The process was stopped when the relative change in the
likelihood function was less than 0.00001.10
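To make the scoring iteration and the stopping rule concrete, the sketch below applies them to a deliberately simplified problem: only $\lambda$ is estimated, P is treated as known, and the disturbances are observed directly. This is an assumption made for illustration and is not the full procedure of the paper; the `step` argument is a hypothetical stand-in for damping the direction vector.

```python
import math

def loglik_lambda(lam, P, eps):
    """Gamma log likelihood as a function of lambda, with P and eps given."""
    T = len(eps)
    return (T * P * math.log(lam) - T * math.lgamma(P)
            + (P - 1.0) * sum(math.log(e) for e in eps) - lam * sum(eps))

def scoring_lambda(eps, P, lam0, step=1.0, tol=1e-5, max_iter=500):
    """Method of scoring for lambda; stops on relative change in log L."""
    T = len(eps)
    lam = lam0
    ll = loglik_lambda(lam, P, eps)
    for it in range(1, max_iter + 1):
        g = T * P / lam - sum(eps)        # score for lambda
        info = T * P / lam ** 2           # expected information for lambda
        lam_new = lam + step * g / info   # scoring update (damped if step < 1)
        ll_new = loglik_lambda(lam_new, P, eps)
        converged = abs(ll_new - ll) <= tol * abs(ll)
        lam, ll = lam_new, ll_new
        if converged:
            return lam, it
    return lam, max_iter

eps = [0.2, 0.5, 0.3, 0.6, 0.4]           # made-up positive disturbances
lam_fast, it_fast = scoring_lambda(eps, P=4.0, lam0=5.0)
lam_slow, it_slow = scoring_lambda(eps, P=4.0, lam0=5.0, step=0.25)
print(lam_fast, it_fast, lam_slow, it_slow)
```

With P known, the maximizing value is P divided by the sample mean of the disturbances, so both runs can be compared with that target; the damped run takes more iterations and stops a little short of it, which mirrors the trade-off described in the text.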
Starting values for the intercept and slopes were the modified ordinary
least squares values of section 3.1. The intercept must be moved slightly
further than this procedure dictates in order to insure that no residual be
zero. The modified set of residuals now has mean $\bar{e}$ and (unchanged)
variance $s^2$, both positive. Since $E(\varepsilon) = P/\lambda$ and $V(\varepsilon) = P/\lambda^2$, appropriate
consistent starting values for P and $\lambda$ are $\bar{e}^2/s^2$ and $\bar{e}/s^2$, respectively. The
starting value for A is obviously positive, and the starting value for P was
well over 2 in every case attempted.
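The starting values follow from matching the first two moments of the shifted residuals. A minimal sketch, assuming residuals defined as observed minus fitted values, a hypothetical small `margin` that keeps the smallest residual strictly positive, and the population form of the variance (the text does not say which divisor was used):

```python
import statistics

def starting_values(ols_residuals, margin=1e-6):
    """Moment-based starting values (P0, lambda0) from shifted OLS residuals."""
    shift = -min(ols_residuals) + margin   # move the intercept past the smallest residual
    modified = [e + shift for e in ols_residuals]
    e_bar = statistics.mean(modified)
    s2 = statistics.pvariance(modified)    # unchanged by the shift
    return e_bar ** 2 / s2, e_bar / s2

ols_residuals = [-0.3, 0.1, 0.2, -0.1, 0.1]   # made-up residuals summing to zero
P0, lam0 = starting_values(ols_residuals)
print(P0, lam0)
```

By construction the ratio `P0 / lam0` equals the mean of the modified residuals, so the implied gamma mean matches the data at the starting point.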

10 For the function in the second application, with 5 parameters in addition to P and $\lambda$,
convergence required about 220 iterations, but less than eight seconds of CPU time on an IBM-
370. All computations were done in double precision.
This does not necessarily produce $\hat{P} > 2$. In all applications considered, however, $\hat{P}$ was
greater than 5. It is easy to see, though, that $\hat{P}$ will almost surely be greater than 2 in any
sample. The modified set of OLS residuals is obtained by shifting the intercept until every
residual is positive. Let $e_{\min}$ be the minimum of the original OLS residuals. Then $e_{\min}$ must be less
than 0. The mean of the modified residuals is simply $-e_{\min}$, while the variance remains $s^2$. Thus,
$\hat{P} = e_{\min}^2/s^2$. The requirement P > 2 is equivalent to $e_{\min}/s < -1.4142$. We require that the smallest
OLS residual in the sample be at least 1.4142 standard deviations below 0, an event which is
extremely likely in any sample, and has probability approaching 1 as $T \to \infty$.

5. Applications

Two data sets will be analyzed to illustrate the estimation of the frontier
function. To provide a comparison with some prior results, the production
frontier estimated by Aigner and Chu and by Aigner, Lovell, and Schmidt
will be reestimated using the technique of section 4. The second application
will involve a dual cost function. Finally, we will consider the question of
estimating technical efficiency using the frontier estimates.

5.1. Production function

The Aigner-Chu study uses statewide data on the U.S. primary metals
industry (SIC 33) to estimate the parameters of a Cobb-Douglas production
function,

$$\ln V = \alpha + \beta_L \ln L + \beta_K (R \ln K) - \varepsilon,$$ (5.1)

where V is value added, L is labor input, K is the gross book value of plant
and equipment, and R is the ratio of net to gross book value of plant and
equipment. Value added, labor, and capital are computed on a per
establishment basis, and there are 28 statewide observations in the sample.
[These data were first analyzed by Hildebrand and Liu (1965).]
Aigner and Chu estimated the parameters of (5.1) using linear
programming (LP) and quadratic programming (QP). These estimators are
maximum likelihood for the exponential distribution and half-normal
distribution, respectively. In their recent, innovative paper, Aigner, Lovell
and Schmidt (ALS) respecified the disturbance, $\varepsilon$, to be equal to $(v - u)$,
where v is assumed to be normally distributed with mean zero and variance
$\sigma_v^2$, while u has the half normal distribution in (2.3). Thus, $\varepsilon$ has an
asymmetric distribution. The symmetric disturbance, v, is assumed to be due
to uncontrollable factors such as weather, making the effective frontier,
$\alpha + \beta'x + v$, stochastic. The negative term, $-u$, is assumed to be due to
inefficiency.
Table 1 presents the parameter estimates obtained using five estimators,
OLS, LP, QP, maximum likelihood for the stochastic frontier, and
maximum likelihood for the full frontier using the Gamma density.13
Numbers in parentheses below certain of the estimators are asymptotic t-
ratios computed using the ratio of the estimate to the square root of the
appropriate diagonal element of the estimated asymptotic covariance matrix.

They also consider the case in which u has the exponential distribution. See also Meeusen
and van den Broeck (1977).
13 The first four of these are taken from Aigner, Lovell, and Schmidt (1977, p. 32).

While the stochastic frontier function is quite close to the OLS estimator,
the full (gamma) frontier is substantially different. ALS deduce the first of
these comparisons from the very small value of 0, which suggests that their E
is dominated by the symmetric normal error term. Thus, the resemblance to

Table 1
Estimates of eq. (5.1).

Estimator                        $\hat\alpha$     $\hat\beta_L$     $\hat\beta_K$

OLS                              0.9146     0.9168     0.04164
(Aigner-Lovell-Schmidt)          (2.04)     (7.31)     (2.19)
LP (Aigner-Chu)                             0.8730     0.0031
QP (Aigner-Chu)                             1.071      0.0269

Stochastic frontier              0.9600     0.9105     0.04208
(Aigner-Lovell-Schmidt)          (2.06)     (7.68)     (2.34)
  $\hat\sigma_u^2$ = 0.000686, $\hat\sigma_v^2$ = 0.0692

Full frontier                    1.4197     0.7496     0.0756
(gamma)                          (5.33)     (10.62)    (7.21)
  $\hat\lambda$ = 11.7937 (3.55), $\hat{P}$ = 5.3804 (3.76)
  $\hat\sigma^2$ = 0.0387, $\hat\mu$ = 0.4562

Estimated covariance matrix for the coefficient estimates

              $\hat\alpha$        $\hat\beta_L$      $\hat\beta_K$      $\hat\lambda$       $\hat{P}$
$\hat\alpha$         0.0707
$\hat\beta_L$      -0.0122     0.0048
$\hat\beta_K$       0.00001   -0.0011     0.00011
$\hat\lambda$        -0.0061     a           a          11.0301
$\hat{P}$        -0.0048     a           a           4.5273     2.045

a Less than $10^{-16}$.

OLS is to be expected. The results for the gamma function are in strong
disagreement. The value for P of 5.3804 is rather small, giving a skewness
coefficient of 0.8622. By comparison, the skewness for ALS's distribution is
only 0.000068.14 As expected, the gamma parameter estimates are quite
different from OLS and the ALS estimates. The large efficiency gains
predicted by the small value of P can be seen in the substantially larger t-
ratios. If anything, the first guess efficiency ratio of 1.59 understates the
difference.

This can be deduced from their results on pp. 26 and 29f.



5.2. Cost frontier

In his classic study of economies of scale in electric power generation,
Nerlove (1963) provided the first empirical application of the well known
duality between cost and production functions. His study was primarily
concerned with estimation of the cost function dual to a Cobb-Douglas
production function based on a sample of 145 firms producing electric power
in 1955. This study provides an excellent setting in which to apply the notion
of a cost frontier.
Assume that production is characterized by the production function y
=F(x), where y is output, x is a vector of inputs. F(x) gives the maximum
output producible given x; and F(x) is a smooth, neoclassical production
function with positive and decreasing marginal products and convex
isoquants for all pairs of inputs. Then F(x) has a one-to-one correspondence
with a cost function, q = C(y, p), where q is total cost, p is the vector of input
prices, and C(y, p) gives the minimum total cost of producing output y when
input prices are p, and y = F(x). C(y, p) will be similarly smooth, concave and
linearly homogeneous in p and have positive total and marginal cost [$C(\cdot)$
and $\partial C/\partial y$] and positive factor demands $x_i = \partial C(\cdot)/\partial p_i$.15
In an empirical setting, if F(x) is interpreted as a production frontier, then
C(y,p) should be interpreted as a cost frontier. Technical efficiency implies
cost efficiency and vice-versa. The negative residual on an estimated
production frontier will always have associated with it a positive residual
from a cost frontier. The choice of which function to estimate can be based
not on the information one expects to obtain, as it is the same in both cases
(by the one-to-one correspondence), but on statistical issues. Nerlove chose a
cost function. He argued that output could reasonably be considered
exogenously determined for a regulated firm, as could the input prices, while
the factor demands, and hence total cost, should be treated as endogenous.
To allow for neutral variations in the returns to scale parameter, Nerlove
generalized the three input Cobb-Douglas cost function to

$$\ln(q_t/P_{Ft}) = \beta_0 + \alpha\ln y_t + \beta(\ln y_t)^2/2 + \theta_K\ln(P_{Kt}/P_{Ft}) + \theta_L\ln(P_{Lt}/P_{Ft}) + \varepsilon_t,$$ (5.2)

where K, L, and F are inputs of capital, labor and fuel respectively. The
implied underlying production function is homothetic, but not homogeneous.
[See Christensen and Greene (1976, pp. 661, 665).] The linear homogeneity
in prices constraint has been imposed, and $\varepsilon_t$ is assumed to be greater than 0
in the current setting. The implied scale economies parameter is

$$\eta_t = 1/(\partial\ln C(y_t, p_t)/\partial\ln y_t) = 1/(\alpha + \beta\ln y_t),$$

which clearly varies with $y_t$ as desired. Note that $\eta_t$ is the ratio of average to
marginal cost.

See Diewert (1974) and Shephard (1953) for proofs of these results.

Several estimates of (5.2) are presented in table 2. First, the OLS and
modified OLS results (per section 3) are given. Second, the multivariate
regression results obtained by maximum likelihood estimation (assuming
multivariate normality, MLMN) of the system of equations (5.2) and the
factor share equations,

$$\partial\ln q_t/\partial\ln P_{Kt} = P_{Kt}K_t/q_t = \theta_K + \varepsilon_{Kt},$$

$$\partial\ln q_t/\partial\ln P_{Lt} = P_{Lt}L_t/q_t = \theta_L + \varepsilon_{Lt},$$

are given along with the intercept correction. Finally, the frontier estimator
using the gamma distributed error term (MLG) is presented.16
There is a surprising pattern in the parameter estimates. The estimates of
the price terms, $\theta_K$ and $\theta_L$, using the frontier estimator are very similar to the
OLS results. The large differences in the MLMN estimates are obviously due
to the information in the share equations used by this estimator. The mean
sample values of capital's and labor's share in total cost are 0.439 and 0.106,
respectively. As might be expected, the efficiency of the full information
estimator of these parameters far exceeds that of either single equation
estimator. The output terms behave quite differently. Both MLG estimates
are very close to midway between the OLS and MLMN estimates, an
outcome which is somewhat surprising in view of the relatively large value of
P obtained. As before, noticeable gains in efficiency over OLS are obtained.
The frontier estimator performs about as well as MLMN on the output
terms, but generally worse on the price terms. Estimates of the mean and
variance (both empirical and implied for the MLG case) of the disturbance
distribution are presented with each set of estimates. Again, the MLG results
more closely resemble OLS.
The relative similarity of the MLG results to OLS might have been
predicted on the basis of the large estimate of P. Table 3 gives a comparison
of the distributions of $\varepsilon$ estimated for the production and cost frontiers. The
efficiency ratio P/(P-2) is far less for the cost case. The skewness coefficient
is much smaller, indicating a more nearly symmetric error distribution. As a
final measure, the degree of excess, $E(\varepsilon - \mu)^4/\sigma^4 - 3$, is computed. For the

16 See Christensen and Greene (1976) for details on the MLMN estimator. The third factor
share is redundant due to the adding up condition, and is dropped. Also, one of Nerlove's
observations appears to be inappropriate for the sample. The sample is of the costs of steam
power generation for 145 firms, but his observation 6 has costs only 10% to 25% of that of
comparably sized firms. However, most of this company's capacity was hydraulic, which would
greatly reduce the costs of thermal generation. This observation was dropped in calculating the
frontier estimator. All results are based on the reduced sample of 144 observations.

Table 2
Estimates of eq. (5.2).

                               OLS         MLMN        MLG

$\hat\beta_0$ (original)              10.050      7.865       8.496
                               (14.30)     (47.90)     (14.24)
$\hat\beta_0$ (modified)              8.668       6.739
                               (14.30)a    (47.90)a
$\hat\alpha$                            0.152       0.300       0.238
                               (2.46)      (5.14)      (4.67)
$\hat\beta$                            0.101       0.083       0.090
                               (9.42)      (8.24)      (9.82)
$\hat\theta_K$                          0.074       0.426       0.092
                               (0.49)      (44.18)     (0.73)
$\hat\theta_L$                          0.481       0.106       0.453
                               (2.98)      (27.88)     (3.31)
Mean of disturbance            1.382       1.126       1.148
Variance of disturbance        0.095       0.033       0.106

For the estimated gamma density

$\hat\lambda$ = 14.860 (8.49), $\hat{P}$ = 17.072 (3.66)
$\hat{P}/\hat\lambda$ = 1.149 (58.32), $\hat{P}/\hat\lambda^2$ = 0.077 (5.88)

a The t-ratio for the modified intercept will not quite equal that of the original estimate, since
$V(\hat\beta_0) \ne V(\hat\beta_0 + e_{\min})$. It is not clear what standard error is appropriate, although the original
estimate should be a good approximation.

Table 3
Summary statistics for production and cost frontier disturbance distributions

                               Primary metals (production)   Electric power (cost)
                               (Hildebrand & Liu)            (Nerlove)

$\hat{P}$                              5.3804                        17.0716
Asymptotic efficiency ratio    1.5916                        1.1327
Skewness                       0.8622                        0.4841
Degree of excess               1.1152                        0.3083

Gamma distribution, this is simply 6/P. This measure is sometimes used as a
measure of non-normality, as the mesokurtic value of 0 would be obtained
if $\varepsilon$ were normally distributed. Again, the value for the cost function is
substantially less than that for the production function.
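The efficiency ratio and skewness rows of Table 3 depend only on the estimated P, so they can be recomputed directly from the formulas P/(P-2) and $2/\sqrt{P}$ given earlier. The sketch below is an illustrative check; the function name is hypothetical and the two values of P are those reported in the table.

```python
import math

def shape_summary(P):
    """First-guess efficiency ratio P/(P-2) and skewness 2/sqrt(P) for G(lambda, P)."""
    if P <= 2.0:
        raise ValueError("the efficiency ratio requires P > 2")
    return P / (P - 2.0), 2.0 / math.sqrt(P)

production = shape_summary(5.3804)    # primary metals production frontier
cost = shape_summary(17.0716)         # electric power cost frontier
print(production, cost)
```

Both summary measures fall as P rises, which is the sense in which the cost frontier disturbances are closer to symmetric than the production frontier disturbances.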

Table 4
Estimates of scale economies

Output
(million kWh)      OLS      MLMN     MLG

43                 1.88     1.63     1.75
338                1.35     1.28     1.32
1109               1.16     1.13     1.16
2226               1.07     1.06     1.07
5819               0.97     0.98     0.98

Implied estimates of minimum efficient scale (million kWh)

                   4429     4600     4753

The scale economies results for the three estimators are very similar in
spite of the differences in the parameter estimates. Table 4 presents the value
of $\eta_t$ for the firm with the median output in each of five groups. (The firms are
ranked by output and there are 29 firms in each group.) The frontier does
predict that scale economies are slightly more persistent than suggested by
the other two estimators. The predicted minimum efficient scale (MES), at
which average cost reaches its minimum, is larger for the frontier, although
the difference is certainly not economically meaningful.
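The scale economies measure $1/(\alpha + \beta\ln y)$ and the minimum efficient scale, the output at which that measure equals one, can be recomputed from the rounded coefficients in Table 2. The sketch below is illustrative; the function and dictionary names are assumptions, and because the inputs are rounded to three decimals the results agree with Table 4 only approximately.

```python
import math

def scale_economies(alpha, beta, y):
    """Ratio of average to marginal cost, 1 / (alpha + beta * ln y)."""
    return 1.0 / (alpha + beta * math.log(y))

def minimum_efficient_scale(alpha, beta):
    """Output at which alpha + beta * ln y = 1, i.e. scale economies are exhausted."""
    return math.exp((1.0 - alpha) / beta)

# (alpha, beta) estimates as printed in Table 2, rounded to three decimals.
estimates = {"OLS": (0.152, 0.101), "MLMN": (0.300, 0.083), "MLG": (0.238, 0.090)}

mes = {name: minimum_efficient_scale(a, b) for name, (a, b) in estimates.items()}
eta_43 = {name: scale_economies(a, b, 43.0) for name, (a, b) in estimates.items()}
print(mes)
print(eta_43)
```

Setting the measure equal to one and solving for y gives the closed form used in `minimum_efficient_scale`, so no search over output levels is needed.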

5.3. Measuring technical and cost efficiency

One of the primary motivations for estimating frontier functions is to
study technical and/or cost efficiency. The formulation of the production
model as most authors have considered it is $y = F(x)u$ where $0 < u \le 1$, so that
in log form, $\log y = \log F(x) + \log u = \log F(x) - \varepsilon$. In the case of full frontier
models, the sample residuals $e_t$ or $\hat\varepsilon_t$, t = 1, ..., T, provide observation specific
estimates of the efficiency factors associated with each sample point. The residuals from
the cost frontier provide analogous information on cost efficiency.
It is also useful, particularly if the sample contains a large number of
observations, to have a summary measure of efficiency for the sample as a
whole; and the approach typically taken has been to use the moments of the
estimated distribution of u or $\varepsilon$ to characterize the overall efficiency in the
sample. For example, Afriat (1972) suggested that u be chosen so that log u
has a Gamma density. Using the one parameter family with $\lambda = 1$ in (4.1),
Richmond (1974) analyzes this suggestion in some detail and finds it has
some potentially peculiar implications. In particular, if $\varepsilon$ has the Gamma
density with $\lambda = 1$, then $u = e^{-\varepsilon}$ has distribution

$$f_u(u) = (1/\Gamma(P))\,(\log(1/u))^{P-1}, \qquad 0 < u \le 1.$$

The mass of the distribution of u can be concentrated near 1 if P < 1, which
would imply most firms are relatively efficient, spread uniformly between 0
and 1 if P = 1, which would imply a uniform distribution of technical
efficiencies, or concentrated near 0 if P > 1, which would imply most firms are
relatively inefficient. In fact, a similar characterization applies to the A-C
(and Schmidt) and Forsund and Jansen (1977) formulation. The distribution
of $u = e^{-\varepsilon}$ for the variate in (2.3) or (2.8) is $f_u(u) = \lambda u^{\lambda-1}$. The median of this
distribution is $2^{-1/\lambda}$, which is greater than, equal to, or less than 1/2 as $\lambda$ is
greater than, equal to, or less than one. A similar analysis can be applied to
the half normal variate for which $e^{-\varepsilon}$ has a truncated lognormal distribution.
The unavoidable conclusion is that without some a priori constraint on the
parameters of $f_\varepsilon(\varepsilon)$ or $f_u(u)$, these distributional implications are simply
inherent in the model.
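The statement about the median can be verified from the distribution function implied by the density $\lambda u^{\lambda-1}$ on (0, 1], which is $u^{\lambda}$. The short sketch below does this for a few illustrative values of $\lambda$; the function names are hypothetical.

```python
def efficiency_cdf(u, lam):
    """CDF of u = exp(-eps) when eps is exponential with rate lam: F(u) = u**lam."""
    return u ** lam

def median_efficiency(lam):
    """Solve u**lam = 1/2 for u."""
    return 2.0 ** (-1.0 / lam)

for lam in (0.5, 1.0, 3.0):            # illustrative rates only
    m = median_efficiency(lam)
    print(lam, m, efficiency_cdf(m, lam))
```

The median lies above one half exactly when $\lambda$ exceeds one, so the single parameter fixes whether most firms appear efficient or inefficient.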
Richmond then shows that a summary efficiency measure for the one
parameter Gamma disturbance, $E(e^{-\varepsilon}) = 2^{-P}$, can be estimated consistently,
but with an upward bias, using the OLS residuals. For the more general
model used in this study, summary measures for the distribution of
efficiencies may be estimated using the following results for $\varepsilon \sim G(\lambda, P)$ (which
can be verified by direct integration):

$$E(e^{-r\varepsilon}) = [\lambda/(\lambda + r)]^{P},$$ (5.3)

and

$$E(e^{r\varepsilon}) = [\lambda/(\lambda - r)]^{P}, \qquad \lambda > r.$$ (5.4)

The first of these provides moments for the distribution of technical
(productive) efficiencies, the second for the distribution of cost efficiencies.
[Note that Richmond's measure is a particular case of (5.3).] The means and
standard deviations for the implied efficiency distributions for the models
estimated in this study are presented below in table 5. The moments of the
distribution of log u are reproduced for convenience.
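The moment formulas for $\varepsilon \sim G(\lambda, P)$, namely $E(e^{-r\varepsilon}) = [\lambda/(\lambda+r)]^P$ and $E(e^{r\varepsilon}) = [\lambda/(\lambda-r)]^P$ for $\lambda > r$, give the mean and standard deviation of the efficiency distributions directly. The sketch below evaluates them at the parameter estimates reported for the two applications; the function names are hypothetical.

```python
import math

def technical_efficiency_moment(r, lam, P):
    """E(exp(-r * eps)) for eps ~ G(lam, P)."""
    return (lam / (lam + r)) ** P

def cost_efficiency_moment(r, lam, P):
    """E(exp(r * eps)) for eps ~ G(lam, P); exists only when lam > r."""
    if lam <= r:
        raise ValueError("moment exists only for lam > r")
    return (lam / (lam - r)) ** P

def mean_and_sd(moment, lam, P):
    """Mean and standard deviation from the first two raw moments."""
    m1 = moment(1, lam, P)
    m2 = moment(2, lam, P)
    return m1, math.sqrt(m2 - m1 ** 2)

prod_mean, prod_sd = mean_and_sd(technical_efficiency_moment, 11.7937, 5.3804)
cost_mean, cost_sd = mean_and_sd(cost_efficiency_moment, 14.8600, 17.0721)
print(prod_mean, prod_sd, cost_mean, cost_sd)
```

The variance uses the r = 2 moment, so for the cost case it requires $\lambda > 2$, which holds comfortably at the reported estimate.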
Finally, it would be useful to establish a relationship between the values of
technical and cost efficiency for a particular firm. Let $u_t$ be the efficiency
factor on the production function and $u_t'$ be that on the dual cost function. If
the production function is homogeneous of degree a, then $u_t' = u_t^{-1/a}$, which is
particularly convenient in the constant returns to scale (a = 1) case. If the
production function is not homogeneous, then the relationship between $u_t$
and $u_t'$ need not be explicit, and will depend on the type of function specified.
In general, this relationship will depend on the degree of returns to scale,
which need not be constant, in a complicated manner. Some results for the
homothetic case may be found in Forsund and Jansen (1977); however, their
disturbance formulation is an unnatural one. This should be a useful area for
future research given the inherent limitations of the (homogeneous) Cobb-
Douglas model.

Table 5
Efficiency distributions

                                          Production     Cost

$\hat\lambda$                                       11.7937        14.8600
$\hat{P}$                                       5.3804         17.0721
Gamma density ($\varepsilon$)
  Mean                                    0.4562         1.149
  Standard deviation                      0.1967         0.2715
Efficiency distribution (u)
  Mean                                    0.6454         3.2849
  Standard deviation                      0.1182         1.0028
Implied dual efficiency distribution
  Mean                                    1.4827         a
  Standard deviation                      0.2146         a

a The cost function does not correspond to a homogeneous production function.
These parameters were not estimated.

6. Summary and conclusions

The frontier function has been something of an enigma for econometric
estimation. Beyond (apparent) consistency of some of them, the properties of
the received estimators have remained unknown. The ingenious new
specification of Aigner, Lovell, and Schmidt has provided a regular estimator
with more clearly defined properties and a better behaved likelihood
function than previous frontier estimators; however, it lacks the theoretical
appeal of the full frontier function. This paper has provided a simple
estimator for the full frontier function which has all of the familiar properties
of maximum likelihood estimators. Standard errors are computed in the
usual fashion, and all of the results for regular MLEs are obtained.
It is shown that the irregular nature of the likelihood function is not an
unavoidable problem in the frontier setting, but merely a consequence of the
choice of the disturbance distribution. There is a large class of disturbance
distributions which may be specified which make the maximum likelihood
frontier estimator regular and well behaved. Two simple conditions on the
error distribution are shown which are sufficient to make the problem of
estimation a regular one to which the standard analysis may be applied.
Heuristically, what is required is that the shape of the distribution of the
disturbance term be similar to those of the familiar lognormal or chi-squared
distributions (both of which are candidates).
Two estimators have been presented. First, a consistent estimator is
obtained by a minor modification of least squares. Then, a maximum
likelihood estimator for the gamma distributed error term is presented.
Depending on how asymmetric the disturbance distribution is, large
efficiency gains can be achieved over OLS by using maximum likelihood.
The gamma distribution shares an attractive property with ALS's
specification. If the errors are symmetrically distributed, the distribution
approaches normality, and the resulting estimator approaches least squares.
This implies that the large efficiency gains of maximum likelihood are
obtained only when the error distribution is asymmetric, a result which is
discerned from the regression residuals.
Two applications are then considered. In the first, earlier results of Aigner
and Chu are replicated with our new specification. Large efficiency gains
over least squares are found, and the results are substantially different from
those obtained earlier. The second application re-estimates a cost function of
Nerlove. It is found that the gains in efficiency are somewhat smaller than in
the first case, but still noticeable. Inferences about scale economies under our
specification turn out to be virtually identical. The specification allows a
detailed analysis of technical or cost inefficiency, both in terms of the overall
characteristics of the distributions of the random variables involved and on
an individual observation by observation basis.
In general, the specification given in this study is a substantial
modification of the received frontier estimators. To date, all of these have
been fit using some programming algorithm, and the result has been an
envelope function. For our specification, this is not the case. The estimator is
not an envelope in the usual sense, as no points are on its boundary. The
difference is that the statistical properties of the estimator are clearly known.

Consider, for example, a sample of 1 from the model $y = \alpha + \varepsilon$ and $\varepsilon \sim G(1,3)$. Any
programming estimator will choose $\hat\alpha = y$ and e = 0, but the MLE will be $\hat\alpha = y - 2$ and e = 2.

References

Afriat, N.S., 1972, Efficiency estimation of production functions, International Economic Review
13, Oct., 568-598.
Aigner, D.J. and D.S. Chu, 1968, On estimating the industry production function, American
Economic Review 58, 826-839.
Aigner, D., K. Lovell and P. Schmidt, 1977, Formulation and estimation of stochastic
frontier production function models, Journal of Econometrics 5, no. 1, 21-38.
Amemiya, T., 1973, Regression analysis when the dependent variable is truncated normal,
Econometrica 41, no. 6, Nov., 997-1016.
Barnett, W.A., 1976, Maximum likelihood and iterated Aitken estimation of nonlinear systems of
equations, Journal of the American Statistical Association 71, June, 354-360.
Christensen, L.R. and W.H. Greene, 1976, Economies of scale in U.S. electric power generation,
Journal of Political Economy 84, no. 1, 655-676.
Cramer, H., 1946, Mathematical methods of statistics (Princeton University Press, Princeton,
NJ).
Diewert, E., 1974, Applications of duality theory, in M.D. Intrilligator and D.A. Kendrick, eds.,
Frontiers of quantitative economics (North-Holland, Amsterdam).
Farrel, J.M., 1957, The measurement of productive efficiency, Journal of the Royal Statistical
Society A CXX, Part III, 253-290.
Forsund, F. and E.S. Jansen, 1977, On estimating average and best practice homothetic
production functions via cost functions, International Economic Review 18, no. 2, June,
463-476.
Hildebrand, G. and T.C. Liu, 1965, Manufacturing production functions in the United States,
1957 (Cornell University Press, Ithaca, NY).
IBM, 1977, Scientific subroutine package.
Jennrich, R.I., 1969, Asymptotic properties of nonlinear least squares estimators, Annals of
Mathematical Statistics 40, April, 633-643.
Kaplan, W., 1952, Advanced calculus (Addison Wesley, Reading, MA).
Kendall, M.G. and A.S. Stuart, 1969, The advanced theory of statistics, Vol. I (Griffin, London).
Kendall, M.G. and A.S. Stuart, 1973, The advanced theory of statistics, Vol. II (Griffin,
London).
Meeusen, W. and J. van den Broeck, 1977, Efficiency estimation from Cobb-Douglas production
functions with composed error, International Economic Review 18, no. 2, June, 435-444.
Nerlove, M., 1963, Returns to scale in electricity supply, in: Carl F. Christ, ed., Measurement in
economics, Studies in mathematical economics and econometrics in honor of Yehuda
Grunfeld (Stanford University Press, Stanford, CA).
Richmond, J., 1974, Estimating the efficiency of production, International Economic Review 15,
no. 2, June, 515-521.
Schmidt, P., 1975, On the statistical estimation of parametric frontier production functions,
Review of Economics and Statistics 58, 238-239.
Shephard, R.W., 1953, Cost and production functions (Princeton University Press, Princeton, NJ).
Theil, H., 1971, Principles of econometrics (Wiley, New York).
Timmer, C.P., 1971, Using a probabilistic frontier production function to measure technical
efficiency, Journal of Political Economy 79, 776-794.
Wald, A., 1949, Note on the consistency of the maximum likelihood estimator, Annals of
Mathematical Statistics 20, 595-601.
Zellner, A., J. Kmenta and J. Dreze, 1966, Specification and estimation of Cobb-Douglas
production functions, Econometrica 34, 784-795.
