

Estimation: Point and Interval


Roger L. Berger†, Arizona State University, Phoenix, AZ, USA
© 2015 Elsevier Ltd. All rights reserved.
This article is a revision of the previous edition article by G. Casella†, R.L. Berger, volume 10, pp. 4744–4749, © 2001, Elsevier Ltd.
†Deceased.

Abstract

In parametric statistical inference, knowledge about a population parameter yields knowledge about the entire population. Thus, methods of estimating population parameters are cornerstones of statistical analysis. Point estimators provide a single value as an estimate of a parameter. Set estimators provide a set of possible values, and they quantify uncertainty about the parameter through the size of the set and the probability that the set covers the parameter. Various methods for deriving point and set estimators, and for evaluating them, are discussed.

Introduction

When sampling from a population described by a density or mass function f(x|θ), knowledge of θ yields knowledge of the entire population. Hence, it is natural to seek a method of finding a good estimator of the point θ, that is, a good point estimator. However, a point estimator alone is not enough for a complete inference, as a measure of uncertainty is also needed. For that, we use a set estimator, in which the inference is the statement that θ ∈ C, where C ⊂ Θ, Θ is the parameter space (the set of all possible values of θ), and C = C(x) is a set determined by the value of the data, X = x, observed. If θ is real-valued, then we usually prefer the set estimate C to be an interval. Our uncertainty is quantified by the size of the interval and its probability of covering the parameter.


Point Estimators

In many cases there will be an obvious or natural candidate for a point estimator of a particular parameter. For example, the sample mean is a natural candidate for a point estimator of the population mean. However, when we leave a simple case like this, intuition may desert us, so it is useful to have some techniques that will give reasonable candidates for consideration. Those that have stood the test of time include these.


Method of Moments

The method of moments (MOM) is, perhaps, the oldest method of finding point estimators, dating back at least to Karl Pearson in the late 1800s. One of the strengths of MOM estimators is that they are usually simple to use and almost always yield some sort of estimate. In many cases, unfortunately, this method yields estimators that may be improved upon.

Let X = (X_1, ..., X_n) be a sample from a population with density or mass function f(x|θ_1, ..., θ_k). MOM estimators are found by equating the first k sample moments to the corresponding k population moments. That is, we define the sample moments by m_j = Σ_{i=1}^n X_i^j / n and the population moments by μ_j(θ_1, ..., θ_k) = E_θ X^j for j = 1, ..., k. Then we set m_j = μ_j(θ_1, ..., θ_k) and solve for θ_1, ..., θ_k. This solution is the MOM estimator of (θ_1, ..., θ_k).
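As an illustration not taken from the article, the sketch below applies the MOM recipe to a normal(θ, σ²) population, for which E X = θ and E X² = θ² + σ², so equating the first two sample moments gives θ̂ = m_1 and σ̂² = m_2 − m_1². Function and variable names are hypothetical.

```python
import numpy as np

def mom_normal(x):
    """Method-of-moments estimates for a normal(theta, sigma^2) sample.

    Equates the first two sample moments to E X = theta and
    E X^2 = theta^2 + sigma^2, giving theta_hat = m1 and
    sigma2_hat = m2 - m1^2.
    """
    x = np.asarray(x, dtype=float)
    m1 = np.mean(x)        # first sample moment
    m2 = np.mean(x**2)     # second sample moment
    return m1, m2 - m1**2

# Example usage with simulated data
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=3.0, size=500)
theta_hat, sigma2_hat = mom_normal(sample)
print(theta_hat, sigma2_hat)   # should be near 2.0 and 9.0
```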
Maximum Likelihood

For a sample X = (X_1, ..., X_n) from f(x|θ_1, ..., θ_k), the likelihood function is defined by

    L(θ|x) = ∏_{i=1}^n f(x_i|θ_1, ..., θ_k).

The values of θ_1, ..., θ_k that maximize this function are those parameter values for which the observed sample x is most likely, and they are called the maximum likelihood estimators (MLEs). If the likelihood function is differentiable (in θ_i), the MLEs can often be found by solving

    ∂/∂θ_i log L(θ|x) = 0,  i = 1, ..., k,

where the vector with coordinates ∂/∂θ_i log L(θ|x) is called the score function (see Schervish, 1995: Section 2.3).

Example: If X = (X_1, ..., X_n) is a sample from a Bernoulli(p) population, the likelihood function is

    L(p|x) = ∏_{i=1}^n p^{x_i} (1 − p)^{1−x_i},

and differentiating log L(p|x) and setting the result equal to zero gives the MLE p̂ = Σ_{i=1}^n x_i / n = x̄, the sample mean. This is also the MOM estimator.

If instead we have a sample X = (X_1, ..., X_n) from a binomial(k, p) population, where p is known and k is unknown, the likelihood function is

    L(k|x, p) = ∏_{i=1}^n C(k, x_i) p^{x_i} (1 − p)^{k − x_i},

and the MLE must be found by numerical maximization. The MOM gives the closed-form solution

    k̂ = x̄² / (x̄ − Σ_i (x_i − x̄)²/n),

which can take on negative values. This illustrates a shortcoming of the MOM, one not shared by the MLE. Another, perhaps more serious, shortcoming of the MOM estimator is that it may not be based on a sufficient statistic (see Statistical Sufficiency), which means it could be inefficient in not using all of the available information in a sample. In contrast, both MLEs and Bayes estimators (see Bayesian Statistics) are based on sufficient statistics.
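The numerical maximization in the binomial example can be carried out by scanning integer values of k. The sketch below, which is illustrative rather than part of the article, also evaluates the closed-form MOM expression quoted above so the two can be compared; the data, parameter values, and helper names are hypothetical.

```python
import numpy as np
from scipy.stats import binom

def binomial_k_mle(x, p, k_max=500):
    """MLE of k for a binomial(k, p) sample with p known.

    Evaluates the log-likelihood sum_i log C(k, x_i) p^{x_i} (1-p)^{k-x_i}
    on a grid of integer k and returns the maximizer.
    """
    x = np.asarray(x)
    ks = np.arange(x.max(), k_max + 1)
    loglik = [binom.logpmf(x, k, p).sum() for k in ks]
    return ks[int(np.argmax(loglik))]

def binomial_k_mom(x):
    """Closed-form MOM value k_hat = xbar^2 / (xbar - sum((x - xbar)^2)/n),
    as quoted in the text; it can be negative when the sample variance
    exceeds the sample mean."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    return xbar**2 / (xbar - np.sum((x - xbar)**2) / len(x))

rng = np.random.default_rng(1)
data = rng.binomial(n=20, p=0.3, size=50)   # true k = 20, p = 0.3
print(binomial_k_mle(data, p=0.3), binomial_k_mom(data))
```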

Bayes Estimators

In the Bayesian paradigm a random sample X_1, ..., X_n is drawn from a population indexed by θ, where uncertainty about θ can be described by a probability distribution (called the prior distribution). After the sample is taken, the prior distribution is updated with this sample information. The updated prior is called the posterior distribution.

If we denote the prior distribution by π(θ) and the sampling distribution by f(x|θ), then the posterior distribution, the conditional distribution of θ given the sample, x, is

    π(θ|x) = f(x|θ)π(θ) / m(x),

where m(x) = ∫ f(x|θ)π(θ) dθ is the marginal distribution of x.

Example: Let X = (X_1, ..., X_n) be a sample from a Bernoulli(p) population. Then Y = Σ_i X_i is binomial(n, p). If p has a beta(α, β) prior distribution, that is,

    π(p) = [Γ(α + β) / (Γ(α)Γ(β))] p^{α−1} (1 − p)^{β−1},

the posterior distribution of p given y is

    π(p|y) = f(y|p)π(p) / m(y) = [Γ(n + α + β) / (Γ(y + α)Γ(n − y + β))] p^{y+α−1} (1 − p)^{n−y+β−1},

which is a beta distribution with parameters y + α and n − y + β. The posterior mean, a common Bayes estimator of p, is

    p̂_B = (y + α) / (α + β + n).
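A small sketch of this conjugate update (not from the article; the prior values are hypothetical): the posterior is beta(y + α, n − y + β), and the Bayes point estimate under squared error loss is its mean.

```python
import numpy as np
from scipy.stats import beta

def beta_binomial_posterior(x, a, b):
    """Posterior for p from Bernoulli data x with a beta(a, b) prior.

    Returns the posterior beta parameters and the posterior mean
    (y + a) / (a + b + n), the usual Bayes point estimate of p.
    """
    x = np.asarray(x)
    n, y = len(x), int(x.sum())
    a_post, b_post = a + y, b + n - y
    return a_post, b_post, a_post / (a_post + b_post)

rng = np.random.default_rng(2)
data = rng.binomial(1, 0.35, size=40)
a_post, b_post, p_bayes = beta_binomial_posterior(data, a=2, b=2)
print(p_bayes)                               # posterior mean
print(beta.interval(0.95, a_post, b_post))   # an equal-tailed 95% credible interval
```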
Evaluating Point Estimators

There are many methods of deriving point estimators (robust methods, least squares, estimating equations, invariance), but the three in Section Point Estimators are among the most popular. No matter what method is used to derive a point estimator, it is important to evaluate the estimator using some performance criterion.

One way of evaluating the performance of a point estimator W of a real-valued parameter θ is through its mean squared error (MSE), defined by

    MSE_θ(W) = E_θ(W − θ)² = Var_θ(W) + (E_θ W − θ)².

Defining the bias of a point estimator by Bias_θ(W) = E_θ W − θ yields MSE_θ(W) = Var_θ(W) + (Bias_θ W)². An estimator whose bias is identically (in θ) equal to zero is called unbiased; if an estimator is unbiased, its MSE is equal to its variance, MSE_θ(W) = Var_θ(W). If X = (X_1, ..., X_n) is a sample from a population with mean θ and variance σ², the sample mean is an unbiased estimator of θ (E_θ X̄ = θ), and

    MSE_θ(X̄) = Var_θ(X̄) = σ²/n.

Controlling bias does not guarantee that MSE is minimized. In particular, it is sometimes the case that a trade-off occurs between variance and bias. For example, in sampling from a normal population with variance σ², the usual unbiased estimator of σ², S² = Σ_{i=1}^n (X_i − X̄)²/(n − 1), has MSE_σ(S²) = 2σ⁴/(n − 1). An alternative estimator of σ² is the MLE σ̂² = Σ_{i=1}^n (X_i − X̄)²/n = (n − 1)S²/n. This is a biased estimator of σ² with

    MSE_σ(σ̂²) = (2n − 1)σ⁴/n² < 2σ⁴/(n − 1) = MSE_σ(S²),

showing that σ̂² has smaller MSE than S². Thus, by trading off variance for bias, the MSE is improved.
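A quick Monte Carlo check of this comparison (illustrative only, with hypothetical sample size and σ): the simulated mean squared errors should track the exact values 2σ⁴/(n − 1) and (2n − 1)σ⁴/n².

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma, reps = 10, 2.0, 200_000

samples = rng.normal(0.0, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)           # unbiased estimator S^2
sigma2_mle = samples.var(axis=1, ddof=0)   # MLE (n-1)S^2/n

mse_s2 = np.mean((s2 - sigma**2) ** 2)
mse_mle = np.mean((sigma2_mle - sigma**2) ** 2)

print(mse_s2, 2 * sigma**4 / (n - 1))           # simulated vs exact
print(mse_mle, (2 * n - 1) * sigma**4 / n**2)   # simulated vs exact (smaller)
```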
Measuring performance by the squared difference between the estimator and a parameter is a special case of a function called a loss function. The study of the performance and optimality of estimators evaluated through loss functions is a branch of decision theory. In addition to MSE, based on squared error loss, another popular loss function is absolute error loss, L(θ, W) = |W − θ|. Both of these loss functions increase as the distance between θ and W increases, with minimum value L(θ, θ) = 0. That is, the loss is minimum if the estimator is correct.

In a decision theoretic analysis an estimator is evaluated through its risk function. For an estimator W of θ, the risk function is R(θ, W) = E_θ L(θ, W); the risk function is the average loss. If the loss is squared error, the risk function is the MSE. Using squared error loss, the risk function (MSE) of the binomial Bayes estimator of p is

    MSE_p(p̂_B) = Var_p(p̂_B) + (Bias_p p̂_B)² = np(1 − p)/(α + β + n)² + [(np + α)/(α + β + n) − p]².

In the absence of good prior information about p we might choose α and β to make the risk function of p̂_B constant (called an equalizer rule). The solution is α = β = √(n/4), yielding

    MSE_p(p̂_B) = n / [4(n + √n)²].
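As an aside not in the article, the sketch below compares this constant risk with the risk p(1 − p)/n of the sample mean X̄ over a grid of p values; for moderate n the equalizer Bayes rule has smaller risk except when p is near 0 or 1. The values of n and the grid are hypothetical.

```python
import numpy as np

n = 25
a = b = np.sqrt(n) / 2                  # equalizer choice: alpha = beta = sqrt(n/4)
p = np.linspace(0.01, 0.99, 99)

# Risk (MSE) of the Bayes estimator (Y + alpha)/(alpha + beta + n): constant in p
risk_bayes = n * p * (1 - p) / (a + b + n) ** 2 + ((n * p + a) / (a + b + n) - p) ** 2

# Risk of the MLE/MOM estimator X-bar
risk_mle = p * (1 - p) / n

print(np.allclose(risk_bayes, n / (4 * (n + np.sqrt(n)) ** 2)))  # True: constant risk
print(np.mean(risk_bayes < risk_mle))   # fraction of the grid where the Bayes rule wins
```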
We can also use a Bayesian approach to the problem of loss function optimality, where we would use the prior distribution to compute an average risk, ∫_Θ R(θ, W)π(θ) dθ, known as the Bayes risk. We then find the estimator that yields the smallest value of the Bayes risk. Such an estimator is called the Bayes rule with respect to a prior π.

To find the Bayes decision rule for a given prior π we write the Bayes risk as

    ∫_Θ R(θ, W)π(θ) dθ = ∫_x [ ∫_Θ L(θ, W(x))π(θ|x) dθ ] m(x) dx,

where the quantity in square brackets is the expected value of the loss function with respect to the posterior distribution, called the posterior expected loss. It is a function only of x, and not a function of θ. Thus, for each x, if we choose the estimate W(x) to minimize the posterior expected loss, we will minimize the Bayes risk, and W(x) is the Bayes rule.

For squared error loss, the posterior expected loss is minimized by the mean of the posterior distribution. For absolute error loss, the posterior expected loss is minimized by the median of the posterior distribution.

If X = (X_1, ..., X_n) is a sample from a normal population with mean θ and variance σ² (n(θ, σ²)), and the prior on θ is n(μ, τ²), the posterior mean is

    E(θ|x) = [τ² / (τ² + σ²/n)] x̄ + [(σ²/n) / (τ² + σ²/n)] μ.

Because the posterior distribution is normal, it is symmetric, and the posterior mean is the Bayes estimator for both squared error and absolute error loss. The posterior mean in our binomial/beta example, p̂_B = (y + α)/(α + β + n), is the Bayes estimator for squared error loss.
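The posterior mean is a precision-weighted average of the sample mean and the prior mean. A brief sketch follows, assuming the prior variance is denoted τ² as above; the numerical values for μ, τ², and σ² are hypothetical.

```python
import numpy as np

def normal_posterior_mean(x, sigma2, mu, tau2):
    """Posterior mean of theta for n(theta, sigma2) data and an n(mu, tau2) prior.

    The weights tau2/(tau2 + sigma2/n) and (sigma2/n)/(tau2 + sigma2/n)
    shrink the sample mean toward the prior mean.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    w = tau2 / (tau2 + sigma2 / n)
    return w * x.mean() + (1 - w) * mu

rng = np.random.default_rng(4)
data = rng.normal(5.0, 2.0, size=15)   # sigma = 2 assumed known
print(data.mean(), normal_posterior_mean(data, sigma2=4.0, mu=0.0, tau2=1.0))
```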
We can loosely group evaluation criteria into large sample (asymptotic) methods and small sample methods. Our calculations using MSE and risk functions illustrate small sample methods. In large samples, MLEs typically perform very well, being asymptotically normal and efficient, that is, attaining the smallest possible variance. Other types of estimators that are derived in a similar manner (for example, M-estimators; see Robustness in Statistics) also share good asymptotic properties. For a detailed discussion see Casella and Berger (2001), Lehmann (1999), Stuart et al. (1999), or Lehmann and Casella (1998: Chapter 6).


Interval Estimation

Reporting a point estimator of a parameter θ only provides part of the story. The story becomes more complete if an assessment of the error of estimation is also reported. Informally, this can be accomplished by giving an estimated standard error of the estimator; more formally, this becomes the reporting of an interval estimate. If X = x is observed, an interval estimate of a parameter θ is a pair of functions, L(x) and U(x), for which the inference θ ∈ [L(x), U(x)] is made. The coverage probability of the random interval (L(X), U(X)) is the probability that (L(X), U(X)) covers the true parameter, θ, and is denoted by P_θ(L(X) < θ < U(X)).

By definition the coverage probability depends on the unknown θ, so it cannot be reported. What is typically reported is the confidence coefficient, the infimum of the coverage probabilities, inf_θ P_θ(L(X) < θ < U(X)).

If X = (X_1, ..., X_n) is a sample from a population with mean θ and variance σ², a common interval estimator for θ is

    x̄ − 2 s/√n < θ < x̄ + 2 s/√n,

where x̄ is the sample mean and s is the sample standard deviation. The validity of this interval can be justified by the Central Limit Theorem, because

    (X̄ − θ) / (S/√n) → n(0, 1),

the standard normal distribution. We then see that the coverage probability (and confidence coefficient) of this interval is approximately 95%.
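A direct transcription of this interval (illustrative; the data are simulated, and the cutoff 2 is the article's round value for z_{0.025} ≈ 1.96):

```python
import numpy as np

def approx_95_interval(x):
    """Large-sample interval xbar +/- 2 s/sqrt(n) for a population mean."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    half_width = 2 * x.std(ddof=1) / np.sqrt(len(x))
    return xbar - half_width, xbar + half_width

rng = np.random.default_rng(5)
data = rng.exponential(scale=3.0, size=80)   # skewed population, mean = 3
print(approx_95_interval(data))              # covers 3 in roughly 95% of repetitions
```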
The above interval is a large sample interval, because its justification is based on an asymptotic property. There are many methods for constructing interval estimators that are valid in small samples, including these.


Inverting a Test Statistic

There is a correspondence between acceptance regions of hypothesis tests (see Hypothesis Testing in Statistics) and confidence sets, summarized in the following theorem.

Theorem 1: For each θ_0 ∈ Θ, let A(θ_0) be the acceptance region of a level α test of H_0: θ = θ_0. For each sample point x, define a set C(x) in the parameter space by

    C(x) = {θ_0 : x ∈ A(θ_0)}.

Then the random set C(X) is a confidence set with confidence coefficient 1 − α. Conversely, let C(X) be a 1 − α confidence set. For each θ_0 ∈ Θ, define

    A(θ_0) = {x : θ_0 ∈ C(x)}.

Then A(θ_0) is the acceptance region of a level α test of H_0: θ = θ_0.

Example: If X = (X_1, ..., X_n) is a sample from an n(θ, σ²) population, with σ² known, the level α test of H_0: θ = θ_0 versus H_1: θ ≠ θ_0 will accept the null hypothesis if |x̄ − θ_0| / (σ/√n) < z_{α/2}. This inequality can be equivalently written as

    x̄ − z_{α/2} σ/√n < θ_0 < x̄ + z_{α/2} σ/√n.

This interval of θ values, for which the null hypothesis is accepted at level α, is a 1 − α confidence interval for θ.


Pivotal Intervals

An elegant method of constructing set estimators is the use of pivotal quantities (Barnard, 1949). A random variable Q(X, θ) = Q(X_1, ..., X_n, θ) is a pivotal quantity (or pivot) if the distribution of Q(X, θ) is independent of all parameters. If we find a set C such that P(Q(X, θ) ∈ C) = 1 − α, then the set C(X) = {θ : Q(X, θ) ∈ C} has coverage probability 1 − α.

In location and scale cases, once we calculate the sample mean X̄ and the sample standard deviation S, examples of pivots are:

    Form of pdf            Type of pdf       Pivot
    f(x − θ)               Location          X̄ − θ
    (1/σ) f(x/σ)           Scale             S/σ
    (1/σ) f((x − θ)/σ)     Location–scale    (X̄ − θ)/S

In general, differences are pivotal for location problems, while ratios (or products) are pivotal for scale problems (see Fiducial and Structural Statistical Inference).

Example: Let X = (X_1, ..., X_n) be a sample from an exponential(λ) population, and let T = Σ_{i=1}^n X_i. T has a gamma(n, λ) distribution, with pdf

    f(t|λ) = [1/Γ(n)] (t/λ)^{n−1} (1/λ) e^{−t/λ}.

In this gamma pdf, t and λ appear together as t/λ, and, in fact, the gamma(n, λ) pdf is a scale family in λ.

If Q(T, λ) = 2T/λ, then Q(T, λ) has a gamma(n, 2) (also called χ²_{2n}) distribution, which does not depend on λ. Hence, the quantity Q(T, λ) is a pivot, and a 1 − α pivotal interval is

    2T/χ²_{2n,α/2} < λ < 2T/χ²_{2n,1−α/2},

where P(χ²_{2n} > χ²_{2n,α}) = α.
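This pivotal interval is easy to compute. The sketch below (illustrative values, simulated data) uses the convention above, in which χ²_{2n,α} is the upper-tail cutoff, so it corresponds to scipy's ppf evaluated at 1 − α.

```python
import numpy as np
from scipy.stats import chi2

def exponential_scale_interval(x, alpha=0.05):
    """Pivotal 1 - alpha interval for the exponential scale lambda.

    Uses 2T/lambda ~ chi-square with 2n degrees of freedom, giving
    2T / chi2_{2n, alpha/2} < lambda < 2T / chi2_{2n, 1-alpha/2},
    where chi2_{2n, a} is the upper-tail cutoff P(chi2_{2n} > cutoff) = a.
    """
    x = np.asarray(x, dtype=float)
    n, T = len(x), x.sum()
    upper_cut = chi2.ppf(1 - alpha / 2, 2 * n)   # chi2_{2n, alpha/2}
    lower_cut = chi2.ppf(alpha / 2, 2 * n)       # chi2_{2n, 1-alpha/2}
    return 2 * T / upper_cut, 2 * T / lower_cut

rng = np.random.default_rng(6)
data = rng.exponential(scale=3.0, size=20)       # true lambda = 3
print(exponential_scale_interval(data))
```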
Bayesian Intervals

If π(θ|x) is the posterior distribution of θ given X = x, then for any set A ⊂ Θ the posterior probability of A is

    P(θ ∈ A|x) = ∫_A π(θ|x) dθ.

If A = A(x) is chosen so that this posterior probability is 1 − α, then A is called a 1 − α credible set for θ.

The interpretation of the Bayes interval estimator is different from that of the classical intervals. In the classical approach, to assert 95% coverage is to assert that in 95% of repeated experiments the realized intervals will cover the true parameter. In the Bayesian approach, 95% coverage means that the probability is 95% that the parameter is in the realized interval. In the classical approach the randomness comes from the repetition of experiments, while in the Bayesian approach the randomness comes from uncertainty about the value of the parameter (summarized in the prior and posterior distributions).

Example: Let X = (X_1, ..., X_n) be a sample from a Poisson(λ) population, and assume that λ has a gamma(a, b) prior, where a is an integer. The posterior distribution of λ is

    π(λ|x) is gamma(a + Σ_{i=1}^n x_i, b/(nb + 1)).

Thus the posterior distribution of [2(nb + 1)/b] λ is χ² with 2(a + Σ_i x_i) degrees of freedom, and a 1 − α Bayes credible interval for λ is

    χ²_{2(a+Σ_i x_i),1−α/2} / [2(nb + 1)/b] < λ < χ²_{2(a+Σ_i x_i),α/2} / [2(nb + 1)/b].
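A sketch of this credible interval with made-up prior values, computed both through the χ² representation above and directly from the posterior gamma quantiles (the two agree); note that scipy's ppf is the lower-tail quantile, the complement of the upper-tail notation used in the text.

```python
import numpy as np
from scipy.stats import chi2, gamma

def poisson_gamma_interval(x, a, b, alpha=0.05):
    """Equal-tailed 1 - alpha credible interval for a Poisson rate lambda
    with a gamma(a, b) prior (b is the scale parameter)."""
    x = np.asarray(x)
    n, s = len(x), int(x.sum())
    shape, scale = a + s, b / (n * b + 1)

    # Via the chi-square representation: 2(nb+1)/b * lambda ~ chi2 with 2(a+s) df
    c = 2 * (n * b + 1) / b
    lo = chi2.ppf(alpha / 2, 2 * shape) / c
    hi = chi2.ppf(1 - alpha / 2, 2 * shape) / c

    # Equivalent: quantiles of the gamma posterior directly
    lo2, hi2 = gamma.ppf([alpha / 2, 1 - alpha / 2], shape, scale=scale)
    assert np.allclose([lo, hi], [lo2, hi2])
    return lo, hi

rng = np.random.default_rng(7)
data = rng.poisson(4.0, size=30)
print(poisson_gamma_interval(data, a=2, b=1.0))
```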
We can also form a Bayes credible set by using the highest posterior density (HPD) region of the parameter space. That is, choose c so that the set C(x) = {λ : π(λ|x) > c} satisfies

    1 − α = P(λ ∈ C(x)|x).

Such a construction is optimal in the sense of giving the shortest interval for a given 1 − α, although if the posterior is multimodal the set may not be an interval.


Other Intervals

We have presented two-sided parametric confidence intervals that are constructed to cover a parameter. Other types of intervals include one-sided intervals, distribution-free intervals, prediction intervals, and tolerance intervals.

One-sided intervals are those in which only one endpoint is estimated, such as θ ∈ (L(X), ∞). Distribution-free intervals are intervals whose probability guarantee holds with few (or no) assumptions on the underlying population. The other two interval definitions, together with the usual confidence interval, provide us with a hierarchy of inferences, each more stringent than the previous.

Let X = (X_1, ..., X_n) be a sample from a population with cdf F(x|θ). If C(x) = (L(x), U(x)) is an interval, then for a specified value 1 − α it is a:

1. confidence interval if, for all θ, P_θ(L(X) < θ < U(X)) ≥ 1 − α;
2. prediction interval if, for all θ, P_θ(L(X) < X_{n+1} < U(X)) ≥ 1 − α, where X_{n+1} is an independent, new observation;
3. tolerance interval if, for all θ and for a specified value p, P_θ(F(U(X)|θ) − F(L(X)|θ) ≥ p) ≥ 1 − α.

So a confidence interval covers a parameter, a prediction interval covers a new observation, and a tolerance interval covers a proportion of the population. Each gives a different inference, with the appropriate one being dictated by the problem at hand. Vardeman (1992) discusses the importance of each of these intervals.
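To make the distinction concrete, here is a normal-theory illustration not drawn from the article: with the same data, a 95% confidence interval for the mean and a 95% prediction interval for a new observation differ only in the √(1/n) versus √(1 + 1/n) factor, so the prediction interval is always wider.

```python
import numpy as np
from scipy.stats import t

def normal_confidence_and_prediction(x, alpha=0.05):
    """Confidence interval for the mean and prediction interval for one
    new observation, assuming a normal population."""
    x = np.asarray(x, dtype=float)
    n, xbar, s = len(x), x.mean(), x.std(ddof=1)
    tcut = t.ppf(1 - alpha / 2, n - 1)
    ci = (xbar - tcut * s * np.sqrt(1 / n), xbar + tcut * s * np.sqrt(1 / n))
    pi = (xbar - tcut * s * np.sqrt(1 + 1 / n), xbar + tcut * s * np.sqrt(1 + 1 / n))
    return ci, pi

rng = np.random.default_rng(8)
data = rng.normal(10.0, 2.0, size=25)
print(normal_confidence_and_prediction(data))
```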

Conclusions

Point estimation is one of the cornerstones of statistical analysis and the basic element on which many inferences are based. Inferences using point estimators gain statistical validity when they are accompanied by an interval estimate, providing an assessment of the uncertainty. We have mainly discussed parametric point and interval estimation, where we assume that the underlying model is correct. Such an assumption can be questioned, and considerations of nonparametric or robust alternatives can address this (see Robustness). For more on these subjects see, for example, Boos and Stefanski (2013), Hettmansperger and McKean (1998), or Staudte and Sheather (1990). Full treatments of parametric point and interval estimation can be found in Casella and Berger (2001), Stuart et al. (1999), or Schervish (1995).


Acknowledgment

This article is a revised version of an article on the same topic in the first edition of the International Encyclopedia of the Social and Behavioral Sciences, which was coauthored with the late George Casella, University of Florida.

See also: Bayesian Statistics; Likelihood in Statistics; Quantile Regression; Statistical Identification and Estimability; Statistical Sufficiency.


Bibliography

Barnard, G.A., 1949. Statistical inference (with discussion). Journal of the Royal Statistical Society, Series B 11, 115–139.
Boos, D.D., Stefanski, L.A., 2013. Essential Statistical Inference, Theory and Methods. Springer, New York.
Casella, G., Berger, R.L., 2001. Statistical Inference, second ed. Wadsworth/Brooks Cole, Pacific Grove, CA.
Hettmansperger, T.P., McKean, J.W., 1998. Robust Nonparametric Statistical Methods. Wiley, New York.

Lehmann, E.L., 1999. Elements of Large-Sample Theory. Springer-Verlag, New York.
Lehmann, E.L., Casella, G., 1998. Theory of Point Estimation, second ed. Springer-Verlag, New York.
Schervish, M.J., 1995. Theory of Statistics. Springer-Verlag, New York.
Staudte, R.G., Sheather, S.J., 1990. Robust Estimation and Testing. John Wiley, New York.
Stuart, A., Ord, J.K., Arnold, S., 1999. Kendall's Advanced Theory of Statistics, Vol. 2A, Classical Inference and the Linear Model, sixth ed. Wiley, London.
Vardeman, S.B., 1992. What about the other intervals? The American Statistician 46, 193–197.

