
Expanded steps for Example 2.3
from Pattern Recognition/4e by Theodoridis and Koutroumbas

Heath Hunnicutt
23 October 2012
The following notes are my elaboration of the steps presented in Example 2.3 from Pattern Recognition/4e by Theodoridis and Koutroumbas. Text presented with a gray background is quoted from that example. Equation numbers such as (2.55) match the numbers used in the text. Equation numbers such as (H.3) do not appear in the text.
1 Example 2.3
Example 2.3 is contained in section 2.5.1, Maximum Likelihood Parameter Estimation:
Assume that N data points, $x_1, x_2, \ldots, x_N$, have been generated by a one-dimensional Gaussian pdf of known mean, $\mu$, but of unknown variance. Derive the ML estimate of the variance.
The technique of Maximum Likelihood Parameter Estimation (MLPE) requires the user to assume a probability density function (pdf) for the population from which the sample data points have been drawn.
Example 2.3 states that the assumed distribution of the population data is a Gaussian pdf. The mean, $\mu$, is assumed to be known. As stated in the example, the parameter being estimated, $\theta$, is the variance: $\theta = \sigma^2$.
The technique of MLPE is to define a likelihood function, $p(X; \theta)$, where $X = \{x_1, x_2, \ldots, x_N\}$, and to find the maximum of this function with respect to the varying parameter $\theta$, with the data points $X$ held fixed. The likelihood function is the probability that all of the measurements $x_k$ could occur given the unknown parameter $\theta$. If the samples $x_k$ are believed to be causally independent of each other, then their probabilities under the parameter $\theta$ are considered independent. In this case, the probability that all measurements occurred is the product of the probability of each measurement having occurred, giving equation (2.55) from the text:
p(X; \theta) = \prod_{k=1}^{N} p(x_k; \theta)    (2.55)
If it can be shown that the likelihood must be non-zero for any value of the parameter, the log-likelihood function can be used. Solving for the maximum may be easier when using the log-likelihood, depending on the pdf assumed. In the case of the Gaussian, each probability $p(x_k; \theta)$ must be non-zero for all $\theta$, because the Gaussian pdf never vanishes. Therefore use of the log-likelihood is mathematically valid for the Gaussian pdf. We will also see that the log-likelihood is convenient for the Gaussian pdf. The log-likelihood may be written $L(X; \theta)$, or $L(\theta)$ when the context allows for such brevity:
The log-likelihood function for this case is given by
L(\theta) = \ln \prod_{k=1}^{N} p(x_k; \sigma^2) = \ln \prod_{k=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)
The left equation is a reference to equation (2.58) from the text, rewritten in terms of $\theta = \sigma^2$:
L(\theta) = \ln \prod_{k=1}^{N} p(x_k; \theta)    (2.58)
Substituting $\theta = \sigma^2$, we derive the left equation from the example:
L(\theta) = L(\sigma^2) = \ln \prod_{k=1}^{N} p(x_k; \sigma^2)
To obtain the equation on the right, we expand $p(x_k; \sigma^2)$ as a Gaussian pdf. Any Gaussian pdf is defined by the density function:
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
so that:
p(x_k; \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)
and therefore:
L(\sigma^2) = \ln \prod_{k=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)
as given by the text in the equation shown above on the right.
Example 2.3 continues with:
or
L(\sigma^2) = -\frac{N}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{k=1}^{N}(x_k - \mu)^2
which indicates an application of the logarithmic identity:
\ln \prod_{k} a_k = \sum_{k} \ln a_k    (H.1)
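The identity (H.1) is easy to confirm numerically; a quick check of my own, with arbitrary positive factors:

import numpy as np

# ln of a product equals the sum of the lns, for positive factors.
a = np.array([0.5, 2.0, 3.0, 0.25])
assert np.isclose(np.log(np.prod(a)), np.sum(np.log(a)))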
Resuming our derivation of the likelihood function, we most recently obtained:
L(\sigma^2) = \ln \prod_{k=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)    (H.2)
We now apply the logarithmic identity (H.1), twice:
L(\sigma^2) = \sum_{k=1}^{N} \ln\left[\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)\right]    (H.3)

= \sum_{k=1}^{N} \left[\ln\frac{1}{\sigma\sqrt{2\pi}} + \ln \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)\right]    (H.4)
Considering that $\ln = \exp^{-1}$, so that $\ln \exp(y) = y$, we rewrite this as:
= \sum_{k=1}^{N} \left[\ln\frac{1}{\sigma\sqrt{2\pi}} - \frac{(x_k - \mu)^2}{2\sigma^2}\right]    (H.5)
We apply the identity $\ln a^y = y \ln a$, noting that $1/(\sigma\sqrt{2\pi}) = (2\pi\sigma^2)^{-1/2}$:
= \sum_{k=1}^{N} \left[-\frac{1}{2}\ln\left(2\pi\sigma^2\right) - \frac{(x_k - \mu)^2}{2\sigma^2}\right]
and distribute the summation notation:
= \sum_{k=1}^{N} \left[-\frac{1}{2}\ln\left(2\pi\sigma^2\right)\right] - \sum_{k=1}^{N} \frac{(x_k - \mu)^2}{2\sigma^2}
The left term does not depend on k and can be simplified as:
= -\frac{N}{2}\ln\left(2\pi\sigma^2\right) - \sum_{k=1}^{N} \frac{(x_k - \mu)^2}{2\sigma^2}
The right term contains a factor which does not depend on k:
L(\sigma^2) = -\frac{N}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{k=1}^{N}(x_k - \mu)^2
This restates the log-likelihood function in the form given by the text.
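As a sanity check of my own (not part of the text), the closed form just derived can be compared against a direct evaluation of $\ln \prod_k p(x_k; \sigma^2)$; the sample values below are hypothetical:

import numpy as np

x = np.array([0.5, -1.2, 0.3, 2.0, -0.7])  # hypothetical sample
mu = 0.0                                    # known mean
s2 = 1.5                                    # candidate variance

# Closed form: -(N/2) ln(2 pi s2) - (1/(2 s2)) sum (x_k - mu)^2
closed = (-0.5 * len(x) * np.log(2.0 * np.pi * s2)
          - np.sum((x - mu)**2) / (2.0 * s2))

# Direct form: the sum of the logs of the per-point Gaussian densities.
direct = np.sum(np.log(np.exp(-(x - mu)**2 / (2.0 * s2))
                       / np.sqrt(2.0 * np.pi * s2)))

assert np.isclose(closed, direct)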
Now that we have derived a log-likelihood function, our remaining task is to find a maximum of this function in terms of $\theta$. We will proceed as usual, by finding an expression for the derivative $dL(\theta)/d\theta$ and solving for the values of $\theta$ at which the derivative is zero. It will be helpful to rewrite our log-likelihood function as:
L(\sigma^2) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\left(\sigma^2\right) - \frac{1}{2}\left(\sigma^2\right)^{-1}\sum_{k=1}^{N}(x_k - \mu)^2
and to substitute $\theta = \sigma^2$:
L(\theta) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\theta) - \frac{1}{2}\theta^{-1}\sum_{k=1}^{N}(x_k - \mu)^2
Continuing with the example as presented in the text:
Taking the derivative of the above with respect to $\sigma^2$ and equating to zero, we obtain

-\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{k=1}^{N}(x_k - \mu)^2 = 0
We will equivalently find the derivative $dL/d\theta$, remembering that $d(\ln x)/dx = 1/x$:
\frac{dL(\theta)}{d\theta} = -\frac{N}{2\theta} + \frac{1}{2}\theta^{-2}\sum_{k=1}^{N}(x_k - \mu)^2
It is now convenient to reverse our substitution $\theta = \sigma^2$:
\frac{dL(\sigma^2)}{d\sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2}\left(\sigma^2\right)^{-2}\sum_{k=1}^{N}(x_k - \mu)^2
and rewrite the second term:
= -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{k=1}^{N}(x_k - \mu)^2
Now our expression for the derivative matches the text.
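The derivative expression can also be checked numerically against a central finite difference of the closed-form log-likelihood; this is my own sketch, with made-up sample values:

import numpy as np

x = np.array([0.5, -1.2, 0.3, 2.0, -0.7])  # hypothetical sample
mu = 0.0                                    # known mean

def L(s2):
    # Closed-form log-likelihood L(sigma^2) for the sample above.
    return (-0.5 * len(x) * np.log(2.0 * np.pi * s2)
            - np.sum((x - mu)**2) / (2.0 * s2))

def dL(s2):
    # Analytic derivative: -N/(2 s2) + sum (x_k - mu)^2 / (2 s2^2).
    return -len(x) / (2.0 * s2) + np.sum((x - mu)**2) / (2.0 * s2**2)

s2, h = 1.5, 1e-6
numeric = (L(s2 + h) - L(s2 - h)) / (2.0 * h)
assert np.isclose(dL(s2), numeric)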
To find an extremum of $L(\sigma^2)$, we will solve for an expression of $\sigma^2$ at which the derivative is zero:
-\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{k=1}^{N}(x_k - \mu)^2 = 0
Multiplying by $2\sigma^4$ gives:
-N\sigma^2 + \sum_{k=1}^{N}(x_k - \mu)^2 = 0
or the equivalent equation:
\sum_{k=1}^{N}(x_k - \mu)^2 = N\sigma^2
Dividing by N:
\frac{1}{N}\sum_{k=1}^{N}(x_k - \mu)^2 = \sigma^2
We have obtained the same result as the text, which makes a change of notation at this point, writing $\hat{\sigma}^2_{ML}$ instead of $\sigma^2$:
and finally the ML estimate of $\sigma^2$ results as the solution of the above,

\hat{\sigma}^2_{ML} = \frac{1}{N}\sum_{k=1}^{N}(x_k - \mu)^2    (2.63)
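To close the loop, here is a small simulation of my own (not from the text): draw N points from a Gaussian with known mean, and check that the estimator (2.63) roughly recovers the true variance and maximizes the log-likelihood over a grid of candidate values:

import numpy as np

rng = np.random.default_rng(0)
mu, true_sigma2, N = 2.0, 4.0, 10_000
x = rng.normal(mu, np.sqrt(true_sigma2), size=N)

# Equation (2.63): the ML estimate of the variance when the mean is known.
sigma2_ml = np.mean((x - mu)**2)

def L(s2):
    # Closed-form log-likelihood L(sigma^2).
    return -0.5 * N * np.log(2.0 * np.pi * s2) - np.sum((x - mu)**2) / (2.0 * s2)

# The closed-form estimate should dominate every point on a coarse grid.
grid = np.linspace(0.5, 8.0, 1000)
assert all(L(sigma2_ml) >= L(s2) for s2 in grid)
print(sigma2_ml)  # close to true_sigma2 = 4.0 for large N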
2 Commentary
Reviewing our derivation, we can consider the convenience of the log-likelihood $L(X; \theta)$ as compared to the likelihood $p(X; \theta)$.

In equation (H.2), the presence of the logarithm function allowed us to rewrite the product notation, $\prod$, of (H.2) as summation notation, $\sum$, in (H.3). This is helpful later when we find the derivative: taking the derivative of a summation is easier than taking the derivative of a product.
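A further practical point, mine rather than the text's: for large N the raw product in (2.55) underflows in double-precision floating point, while the log-likelihood sum stays well behaved. A minimal sketch:

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=5000)

# Per-point standard-normal densities, each less than 1.
dens = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

print(np.prod(dens))         # underflows to 0.0 at this sample size
print(np.sum(np.log(dens)))  # a finite, usable log-likelihood value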
In equation (H.4), the presence of the logarithm function conveniently cancelled the exponential factor introduced by the Gaussian pdf, yielding (H.5).
3 About Heath Hunnicutt
Heath Hunnicutt is a non-credentialed amateur mathematician and professional software developer. Via the Internet, Heath offers on-line videoconference training in a variety of amateur mathematics topics. These areas include: signal processing, pattern recognition, information theory, thermodynamics, statistics, abstract algebra, formal correctness of software, and software development.

Heath's email address is not written plainly, to avoid computer programs which find addresses and send advertising. To contact Heath, you may calculate his email address: starting with his full name, convert every letter to lower-case. Concatenate the first and last name by adding a dot between them, as in john.doe; this is the email address, at gmail.com.