12.6.1 MAP estimates
Definition 12.7 The value of $\theta$ that maximizes the posterior density (that is, the mode of the posterior density) is called the maximum a posteriori probability (MAP) estimate of $\theta$.
If the posterior density of $\theta$ given $X$ is unimodal and symmetric, then it is easy to see that the MAP estimate and the minimum mean-square estimate coincide, for then the posterior density attains its maximum value at its expectation. Furthermore, under these circumstances, the median also coincides with the mode and the expectation. Thus, if we are lucky enough to be dealing with such distributions, the various estimates all tend toward the same value.
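To make this concrete, here is a minimal numerical sketch (not from the text) using a hypothetical normal posterior: since the normal density is unimodal and symmetric, its mean, median, and mode agree.

```python
import numpy as np
from scipy import stats

# Hypothetical unimodal, symmetric posterior: N(2.0, 0.5^2).
posterior = stats.norm(loc=2.0, scale=0.5)

mean = posterior.mean()       # minimum mean-square estimate
median = posterior.median()   # minimizes expected absolute error
# Mode: locate the maximum of the density on a grid.
grid = np.linspace(-3.0, 7.0, 100001)
mode = grid[np.argmax(posterior.pdf(grid))]

print(mean, median, mode)     # all three agree (up to grid resolution)
```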
Although in the development of maximum-likelihood estimation theory we eschewed the characterization of $\theta$ as random, we may gain some valuable understanding of the maximum-likelihood estimate by considering $\theta$ to be a random variable whose prior distribution is so dispersed (that is, has such a large variance) that the information provided by the prior is vanishingly small. If the theory is consistent, we would have a right to expect that the maximum-likelihood estimate would be the limiting case of such a Bayesian estimate.
Let $\theta$ be considered as a random variable distributed according to the a priori density $f_\theta(\theta)$. The a posteriori density for $\theta$, then, is given by
$$f_{\theta|X}(\theta \mid x) = \frac{f_{X|\theta}(x \mid \theta)\, f_\theta(\theta)}{f_X(x)}. \qquad (12.73)$$
If the logarithm of the a posteriori density is differentiable with respect to $\theta$, then the MAP estimate is given by the solution to
$$\left.\frac{\partial \log f_{\theta|X}(\theta \mid x)}{\partial \theta}\right|_{\theta = \hat{\theta}_{\mathrm{MAP}}} = 0. \qquad (12.74)$$
This equation is called the MAP equation.
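When the MAP equation is awkward to solve analytically, the mode of (12.73) can also be located numerically. The sketch below (an illustration, not from the text) tabulates the log of the numerator of (12.73) on a grid for an assumed i.i.d. $\mathcal{N}(\theta, 1)$ likelihood and a dispersed normal prior, then takes the argmax; since $f_X(x)$ does not depend on $\theta$, normalization is unnecessary for finding the mode.

```python
import numpy as np

# Illustrative sketch: locate the MAP estimate by gridding the numerator of
# (12.73). The likelihood (i.i.d. N(theta, 1)) and the dispersed N(0, 10^2)
# prior are assumed for illustration.

def log_likelihood(x, theta):
    # log f_{X|theta}(x | theta), up to an additive constant
    return -0.5 * np.sum((x[:, None] - theta) ** 2, axis=0)

def log_prior(theta, sigma_theta=10.0):
    # log f_theta(theta), up to an additive constant
    return -0.5 * (theta / sigma_theta) ** 2

x = np.array([1.8, 2.4, 2.1])
theta = np.linspace(-10.0, 10.0, 200001)
log_post = log_likelihood(x, theta) + log_prior(theta)  # log numerator of (12.73)
theta_map = theta[np.argmax(log_post)]                  # mode = MAP estimate
print(theta_map)   # close to x.mean(), since this prior is nearly flat
```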
Taking the logarithm of (12.73) yields
$$\log f_{\theta|X}(\theta \mid x) = \log f_{X|\theta}(x \mid \theta) + \log f_\theta(\theta) - \log f_X(x),$$
and since $f_X(x)$ is not a function of $\theta$, the MAP equation becomes
$$\frac{\partial \log f_{X|\theta}(x \mid \theta)}{\partial \theta} + \frac{\partial \log f_\theta(\theta)}{\partial \theta} = 0. \qquad (12.75)$$
Comparing (12.75) to the standard maximum-likelihood equation
$$\left.\frac{\partial \ell(\theta)}{\partial \theta}\right|_{\theta = \hat{\theta}_{\mathrm{ML}}} = 0,$$
where $\ell(\theta) = \log f_{X|\theta}(x \mid \theta)$ is the log-likelihood function, we see that the two expressions differ by the term $\partial \log f_\theta(\theta)/\partial \theta$. If $f_\theta(\theta)$ is sufficiently "flat" (that is, if the variance is very large), its logarithm will also be flat, so the gradient of the logarithm will be nearly zero, and the a posteriori density will be maximized, in the limiting case, at the maximum-likelihood estimate.
Example 12.6.2 Let $X_1, \ldots, X_m$ denote a random sample of size $m$ from the normal distribution $\mathcal{N}(\theta, \sigma^2)$. Suppose $\sigma^2$ is known, and we wish to find the MAP estimate for the mean, $\theta$. The joint density function for $X_1, \ldots, X_m$ is
$$f_{X|\theta}(x \mid \theta) = (2\pi\sigma^2)^{-m/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^m (x_i - \theta)^2\right).$$
Suppose $\theta$ is distributed $\mathcal{N}(0, \sigma_\theta^2)$, that is,
$$f_\theta(\theta) = \frac{1}{\sqrt{2\pi\sigma_\theta^2}} \exp\left(-\frac{\theta^2}{2\sigma_\theta^2}\right).$$
Straightforward manipulation yields, up to terms independent of $\theta$,
$$\log f_{\theta|X}(\theta \mid x) = -\frac{1}{2\sigma^2} \sum_{i=1}^m (x_i - \theta)^2 - \frac{\theta^2}{2\sigma_\theta^2} + \text{const},$$
and setting the derivative with respect to $\theta$ to zero gives
$$\hat{\theta}_{\mathrm{MAP}} = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma^2/m}\,\bar{x}, \qquad \bar{x} = \frac{1}{m}\sum_{i=1}^m x_i.$$
Now it is clear that as $\sigma_\theta^2 \to \infty$, the limiting expression is the maximum-likelihood estimate $\bar{x}$. It is also true that, as $m \to \infty$, the MAP estimate asymptotically approaches the ML estimate. Thus, as the knowledge about $\theta$ from the prior distribution tends to zero, or as the amount of data becomes overwhelming, the MAP estimate converges to the maximum-likelihood estimate.
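The following sketch (illustrative values, not from the text) evaluates the closed-form estimate from this example and shows the first limit numerically: as the prior variance $\sigma_\theta^2$ grows, the MAP estimate tends to the sample mean $\bar{x}$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                          # known noise standard deviation
m = 25                               # sample size (illustrative)
x = rng.normal(1.5, sigma, size=m)   # sample with hypothetical true mean 1.5
xbar = x.mean()                      # the ML estimate

def theta_map(sigma_theta):
    # MAP estimate from the example: xbar shrunk toward the prior mean 0
    return sigma_theta**2 / (sigma_theta**2 + sigma**2 / m) * xbar

# As sigma_theta grows (the prior flattens), the MAP estimate tends to xbar.
for s in [0.1, 1.0, 10.0, 1e3]:
    print(s, theta_map(s))
print("ML:", xbar)
```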
12.6.2 Summary
From the preceding results, we have seen that the Bayes estimate of $\theta$ based upon the measurement of a random variable $X$ depends upon the posterior density $f_{\theta|X}(\theta \mid x)$. The conversion of the prior information about $\theta$ represented by $f_\theta(\theta)$ to the posterior density is accomplished via the expression
$$f_{\theta|X}(\theta \mid x) = \frac{f_{X|\theta}(x \mid \theta)\, f_\theta(\theta)}{f_X(x)}. \qquad (12.76)$$
The posterior density $f_{\theta|X}(\theta \mid x)$ represents our state of knowledge after the measurement of $X$. It is on the posterior density that we base our estimate and, for Bayesian purposes, it contains all the information necessary for estimation. On the basis of the posterior, estimates can be obtained in several ways:
1. For a minimum variance estimate (quadratic loss function), set $\hat{\theta} = E[\theta \mid X = x]$.
2. To minimize $E|\theta - \hat{\theta}|$ (absolute loss), set $\hat{\theta}$ to the median of $f_{\theta|X}(\theta \mid x)$.
3. To maximize the probability that $\hat{\theta} = \theta$, set $\hat{\theta}$ to the mode (maximum value) of $f_{\theta|X}(\theta \mid x)$.
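As a concrete sketch (hypothetical numbers, not from the text), all three estimates can be read off a single posterior density; here a beta posterior, anticipating the example that follows:

```python
from scipy import stats

# Hypothetical posterior: a beta density with parameters (8, 4).
a, b = 8.0, 4.0
posterior = stats.beta(a, b)

theta_mmse   = posterior.mean()        # 1. quadratic loss: posterior mean
theta_median = posterior.median()      # 2. absolute loss: posterior median
theta_map    = (a - 1) / (a + b - 2)   # 3. posterior mode (valid for a, b > 1)
print(theta_mmse, theta_median, theta_map)
```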
As an illustration, let $X_1, \ldots, X_m$ be a random sample from the Bernoulli distribution with parameter $\theta$,
$$f_{X|\theta}(x \mid \theta) = \begin{cases} \theta^x (1-\theta)^{1-x}, & x \in \{0, 1\}, \\ 0 & \text{otherwise}, \end{cases}$$
and suppose $\theta$ has the beta prior density
$$f_\theta(\theta) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha-1} (1-\theta)^{\beta-1}, & 0 < \theta < 1, \\ 0 & \text{otherwise}. \end{cases}$$
Then the joint distribution of $\theta$ and $X = [X_1, \ldots, X_m]^T$ is
$$f_{\theta,X}(\theta, x) = f_{X|\theta}(x \mid \theta)\, f_\theta(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{y+\alpha-1} (1-\theta)^{m-y+\beta-1},$$
where $y = \sum_{i=1}^m x_i$. The posterior distribution of $\theta$ given $X$ is
$$f_{\theta|X}(\theta \mid x) = \frac{f_{\theta,X}(\theta, x)}{f_X(x)} = \frac{f_{\theta,X}(\theta, x)}{\int_0^1 f_{\theta,X}(\theta, x)\, d\theta}.$$
It can be shown (see exercise 12…) that
$$f_{\theta|X}(\theta \mid x) = \frac{\Gamma(\bar{\alpha}+\bar{\beta})}{\Gamma(\bar{\alpha})\Gamma(\bar{\beta})}\, \theta^{\bar{\alpha}-1} (1-\theta)^{\bar{\beta}-1}, \qquad (12.77)$$
where $\bar{\alpha} = \alpha + y$ and $\bar{\beta} = \beta + m - y$. Thus, both $f_\theta$ and $f_{\theta|X}$ have a beta distribution.
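A short sketch of this conjugate update (illustrative values, not from the text): draw a Bernoulli sample, form the posterior parameters of (12.77), and compare the resulting MAP and minimum mean-square estimates.

```python
import numpy as np
from scipy import stats

alpha, beta = 2.0, 2.0                   # prior Beta(alpha, beta), illustrative
rng = np.random.default_rng(1)
x = rng.binomial(1, 0.7, size=50)        # Bernoulli sample, hypothetical theta = 0.7
m, y = x.size, int(x.sum())

a_bar, b_bar = alpha + y, beta + m - y   # posterior parameters per (12.77)
posterior = stats.beta(a_bar, b_bar)

theta_map  = (a_bar - 1) / (a_bar + b_bar - 2)  # mode of the beta posterior
theta_mmse = posterior.mean()                   # posterior mean
print(theta_map, theta_mmse)
```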