
# Reasoning Under Uncertainty

Most tasks requiring intelligent behavior have some degree of uncertainty associated with them.

The type of uncertainty that can occur in knowledge-based systems may be caused by problems with the data. For example:

1. Data might be missing or unavailable.
2. Data might be present but unreliable or ambiguous due to measurement errors.
3. The representation of the data may be imprecise or inconsistent.
4. Data may just be the user's best guess.
5. Data may be based on defaults, and the defaults may have exceptions.

The uncertainty may also be caused by the represented knowledge, since it might:

1. Represent best guesses of the experts that are based on plausible or statistical associations they have observed.
2. Not be appropriate in all situations (e.g., may have indeterminate applicability).

Given these numerous sources of error, most knowledge-based systems require the incorporation of some form of uncertainty management.

When implementing an uncertainty scheme we must be concerned with three issues:

1. How to represent uncertain data
2. How to combine two or more pieces of uncertain data
3. How to draw inferences using uncertain data

We will introduce three ways of handling uncertainty:

1. Probabilistic reasoning
2. Certainty factors
3. Dempster-Shafer theory

## 1. Classical Probability

The oldest and best defined technique for managing uncertainty is based on classical probability theory. Let us start to review it by introducing some terms.

Sample space: Consider an experiment whose outcome is not predictable with certainty in advance. However, although the outcome of the experiment will not be known in advance, let us suppose that the set of all possible outcomes is known. This set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by $S$.

For example:

If the outcome of an experiment consists in the determination of the sex of a newborn child, then

$$S = \{g, b\}$$

where the outcome $g$ means that the child is a girl and $b$ that it is a boy.

If the experiment consists of flipping two coins, then the sample space consists of the following four points:

$$S = \{(H, H), (H, T), (T, H), (T, T)\}$$

Event: Any subset $E$ of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in $E$, then we say that $E$ has occurred.

For example, if $E = \{(H, H), (H, T)\}$, then $E$ is the event that a head appears on the first coin.

For any event $E$ we define the new event $E'$, referred to as the complement of $E$, to consist of all points in the sample space $S$ that are not in $E$.

Mutually exclusive events: A set of events $E_1, E_2, \ldots, E_n$ in a sample space $S$ are called mutually exclusive events if

$$E_i \cap E_j = \emptyset, \quad i \ne j, \quad 1 \le i, j \le n.$$

A formal theory of probability can be built on three axioms:

1. $0 \le P(E) \le 1$.

2. $\sum_i P(E_i) = 1$ (or $P(S) = 1$).

   This axiom states that the probabilities of all the mutually exclusive events that together make up the sample space sum to 1.

   As a corollary of this axiom:

   $$P(E_i) + P(E_i') = 1,$$

   where $E_i'$ is the complement of event $E_i$.

3. $P(E_1 \cup E_2) = P(E_1) + P(E_2)$, where $E_1$ and $E_2$ are mutually exclusive events. In general, the same holds for any number of mutually exclusive events.

### Compound probabilities

Events that do not affect each other in any way are called independent events. For two independent events $A$ and $B$,

$$P(A \cap B) = P(A)\,P(B)$$

Independent events: The events $E_1, E_2, \ldots, E_n$ in a sample space $S$ are independent if

$$P(E_{i_1} \cap \cdots \cap E_{i_k}) = P(E_{i_1}) \cdots P(E_{i_k})$$

for each subset $\{i_1, \ldots, i_k\} \subseteq \{1, \ldots, n\}$, $1 \le k \le n$, $n \ge 1$.

If events $A$ and $B$ are mutually exclusive, then

$$P(A \cup B) = P(A) + P(B)$$

If events $A$ and $B$ are not mutually exclusive, then

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

This is also called the Addition law.

### Conditional Probabilities

The probability of an event $A$, given that $B$ occurred, is called a conditional probability and is indicated by

$$P(A \mid B)$$

The conditional probability is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{for } P(B) \ne 0.$$

The Multiplicative Law of probability for two events is then defined as

$$P(A \cap B) = P(A \mid B)\,P(B)$$

which is equivalent to the following

$$P(A \cap B) = P(B \mid A)\,P(A)$$

Generalized Multiplicative Law:

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1 \mid A_2 \cap \cdots \cap A_n)\,P(A_2 \mid A_3 \cap \cdots \cap A_n) \cdots P(A_{n-1} \mid A_n)\,P(A_n)$$

### An example

As an example of probabilities, the table below shows hypothetical probabilities of a disk crash using a Brand X drive within one year.

| | Brand X | Not Brand X ($X'$) | Total of rows |
|---|---|---|---|
| Crash ($C$) | 0.6 | 0.1 | 0.7 |
| No crash ($C'$) | 0.2 | 0.1 | 0.3 |
| Total of columns | 0.8 | 0.2 | 1.0 |

Hypothetical probabilities of a disk crash

| | $X$ | $X'$ | Total of rows |
|---|---|---|---|
| $C$ | $P(C \cap X)$ | $P(C \cap X')$ | $P(C)$ |
| $C'$ | $P(C' \cap X)$ | $P(C' \cap X')$ | $P(C')$ |
| Total of columns | $P(X)$ | $P(X')$ | 1 |

Probability interpretation of two sets

Using the above tables, the probabilities of all events can be calculated. Some probabilities are

(1) $P(C) = 0.7$

(2) $P(C') = 0.3$

(3) $P(X) = 0.8$

(4) $P(X') = 0.2$

(5) $P(C \cap X) = 0.6$ (the probability of a crash and using Brand X)

(6) The probability of a crash, given that Brand X is used, is

$$P(C \mid X) = \frac{P(C \cap X)}{P(X)} = \frac{0.6}{0.8} = 0.75$$

(7) The probability of a crash, given that Brand X is not used, is

$$P(C \mid X') = \frac{P(C \cap X')}{P(X')} = \frac{0.1}{0.2} = 0.50$$

Probabilities (5) and (6) may appear to have similar meanings when you read their descriptions. However, (5) is simply the intersection of two events, while (6) is a conditional probability.

The meaning of (5) is the following:

IF a disk drive is picked randomly, then 0.6 of the time it will be Brand X and have crashed.

In other words, we are just picking samples from the population of disk drives. Some of those drives are Brand X and have crashed (0.6), some are not Brand X and have crashed (0.1), some are Brand X and have not crashed (0.2), and some are not Brand X and have not crashed (0.1).

In contrast, the meaning of the conditional probability (6) is very different:

IF a Brand X disk drive is picked, then 0.75 of the time it will have crashed.

Note also that if any of the following equations is true, then events $A$ and $B$ are independent.

$P(A \mid B) = P(A)$, or

$P(B \mid A) = P(B)$, or

$P(A \cap B) = P(A)\,P(B)$.
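
The disk-crash calculations can be checked with a short script. This is a minimal sketch: the dictionary layout and the helper names `marginal` and `conditional` are illustrative choices, not part of the original material; only the four joint probabilities come from the table.

```python
# Joint probabilities from the disk-crash table.
# Keys are (crash outcome, brand): "C" = crash, "C'" = no crash,
# "X" = Brand X, "X'" = not Brand X.
joint = {
    ("C", "X"): 0.6,
    ("C", "X'"): 0.1,
    ("C'", "X"): 0.2,
    ("C'", "X'"): 0.1,
}


def marginal(position, value):
    """Sum the joint entries whose key has `value` at `position` (0 = crash, 1 = brand)."""
    return sum(p for key, p in joint.items() if key[position] == value)


def conditional(crash, brand):
    """P(crash | brand) = P(crash and brand) / P(brand)."""
    return joint[(crash, brand)] / marginal(1, brand)


p_crash = marginal(0, "C")                     # probability (1)
p_brand_x = marginal(1, "X")                   # probability (3)
p_crash_given_x = conditional("C", "X")        # probability (6)
p_crash_given_not_x = conditional("C", "X'")   # probability (7)

# Independence test: is P(C and X) equal to P(C) P(X), up to rounding error?
crash_independent_of_brand = abs(joint[("C", "X")] - p_crash * p_brand_x) < 1e-9

print(round(p_crash_given_x, 2), round(p_crash_given_not_x, 2), crash_independent_of_brand)
```

Because $P(C \cap X) = 0.6$ differs from $P(C)\,P(X) = 0.56$, the crash event and the brand are not independent in this table.
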

### Bayes' Theorem

Note that conditional probability is defined as

$$P(H \mid E) = \frac{P(H \cap E)}{P(E)}, \quad \text{for } P(E) \ne 0,$$

i.e., the conditional probability of $H$ given $E$.

In real-life practice, the probability $P(H \mid E)$ cannot always be found in the literature or obtained from statistical analysis. The conditional probabilities $P(E \mid H)$, however, are often easier to come by. In medical textbooks, for example, a disease is described in terms of the signs likely to be found in a typical patient suffering from the disease.

The following theorem provides us with a method for computing the conditional probability $P(H \mid E)$ from the probabilities $P(E)$, $P(H)$ and $P(E \mid H)$.

From conditional probability:

$$P(H \mid E) = \frac{P(H \cap E)}{P(E)}$$

Furthermore, we have

$$P(E \mid H) = \frac{P(E \cap H)}{P(H)}$$

So,

$$P(E \mid H)\,P(H) = P(H \mid E)\,P(E) = P(H \cap E)$$

Thus

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

This is Bayes' Theorem. Its general form can be written in terms of events, $E$, and hypotheses (assumptions), $H$, in the following alternative forms.

$$P(H_i \mid E) = \frac{P(E \cap H_i)}{\sum_j P(E \cap H_j)} = \frac{P(E \mid H_i)\,P(H_i)}{\sum_j P(E \mid H_j)\,P(H_j)} = \frac{P(E \mid H_i)\,P(H_i)}{P(E)}$$

### Hypothetical reasoning and backward induction

Bayes' Theorem is commonly used for decision tree analysis in business and the social sciences. The method of Bayesian decision making is also used in the expert system PROSPECTOR.

We use oil exploration in PROSPECTOR as an example. Suppose the prospector believes that there is a better than 50-50 chance of finding oil, and assumes the following.

$P(O) = 0.6$ and $P(O') = 0.4$

Using the seismic survey technique, we obtain the following conditional probabilities, where $+$ means a positive outcome and $-$ is a negative outcome:

$P(+ \mid O) = 0.8$, $P(- \mid O) = 0.2$ (false $-$)

$P(+ \mid O') = 0.1$ (false $+$), $P(- \mid O') = 0.9$

Using the prior and conditional probabilities, we can construct the initial probability tree as shown below.

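| Probabilities | No oil, $-$ test | No oil, $+$ test | Oil, $-$ test | Oil, $+$ test |
|---|---|---|---|---|
| Prior (subjective opinion of site): $P(H_i)$ | $P(O') = 0.4$ | $P(O') = 0.4$ | $P(O) = 0.6$ | $P(O) = 0.6$ |
| Conditional (seismic test result): $P(E \mid H_i)$ | $P(- \mid O') = 0.9$ | $P(+ \mid O') = 0.1$ | $P(- \mid O) = 0.2$ | $P(+ \mid O) = 0.8$ |
| Joint: $P(E \cap H_i) = P(E \mid H_i)\,P(H_i)$ | $P(- \cap O') = 0.36$ | $P(+ \cap O') = 0.04$ | $P(- \cap O) = 0.12$ | $P(+ \cap O) = 0.48$ |

Initial probability tree for oil exploration
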
Using the Addition law to calculate the total probability of a $+$ and a $-$ test:

$$P(+) = P(+ \cap O) + P(+ \cap O') = 0.48 + 0.04 = 0.52$$

$$P(-) = P(- \cap O) + P(- \cap O') = 0.12 + 0.36 = 0.48$$

$P(+)$ and $P(-)$ are unconditional probabilities that can now be used to calculate the posterior probabilities at the site, as shown below.

| Probabilities | $-$ test, no oil | $-$ test, oil | $+$ test, no oil | $+$ test, oil |
|---|---|---|---|---|
| Unconditional: $P(E)$ | $P(-) = 0.48$ | $P(-) = 0.48$ | $P(+) = 0.52$ | $P(+) = 0.52$ |
| Posterior (of site): $P(H_i \mid E) = P(E \mid H_i)\,P(H_i)/P(E)$ | $P(O' \mid -) = 3/4$ | $P(O \mid -) = 1/4$ | $P(O' \mid +) = 1/13$ | $P(O \mid +) = 12/13$ |
| Joint: $P(E \cap H_i) = P(H_i \mid E)\,P(E)$ | $P(- \cap O') = 0.36$ | $P(- \cap O) = 0.12$ | $P(+ \cap O') = 0.04$ | $P(+ \cap O) = 0.48$ |

Revised probability tree for oil exploration

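The figure below shows the Bayesian decision tree using the data from the above figure. The payoffs are at the bottom of the tree.

Thus if oil is found, the payoff is

\$1,250,000 - \$200,000 - \$50,000 = \$1,000,000

while a decision to quit after the seismic test result gives a payoff of -\$50,000.

The assumed amounts are:

| Item | Amount |
|---|---|
| Oil lease, if successful | \$1,250,000 |
| Drilling expense | -\$200,000 |
| Seismic survey | -\$50,000 |

| Event: test result (event node A) | Act: quit or drill | Event: oil or no oil | Payoff |
|---|---|---|---|
| $-$ test, $P(-) = 0.48$ (act node D) | Quit | | -\$50,000 |
| $-$ test, $P(-) = 0.48$ (act node D) | Drill (event node B) | No oil, $P(O' \mid -) = 3/4$ | -\$1,000,000 |
| $-$ test, $P(-) = 0.48$ (act node D) | Drill (event node B) | Oil, $P(O \mid -) = 1/4$ | \$1,000,000 |
| $+$ test, $P(+) = 0.52$ (act node E) | Quit | | -\$50,000 |
| $+$ test, $P(+) = 0.52$ (act node E) | Drill (event node C) | No oil, $P(O' \mid +) = 1/13$ | -\$1,000,000 |
| $+$ test, $P(+) = 0.52$ (act node E) | Drill (event node C) | Oil, $P(O \mid +) = 12/13$ | \$1,000,000 |

Initial Bayesian decision tree for oil exploration
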
In order for the prospector to make the best decision, the expected payoff must be calculated at event node A.

To compute the expected payoff at A, we must work backward from the leaves. This process is called backward induction.

The expected payoff from an event node is the sum of the payoffs times the probabilities leading to the payoffs.

Expected payoff at node C:

\$846,153 = (\$1,000,000)(12/13) - (\$1,000,000)(1/13)

Expected payoff at node B:

-\$500,000 = (\$1,000,000)(1/4) - (\$1,000,000)(3/4)

Figure: Complete Bayesian decision tree for oil exploration using backward induction. The structure, probabilities, and leaf payoffs are those of the initial decision tree, with the expected payoffs filled in: \$416,000 at event node A, -\$50,000 at act node D, and \$846,153 at act node E.

The decision tree shows the optimal strategy for the prospector. If the seismic test is positive, the site should be drilled; otherwise, the site should be abandoned.
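
The whole oil-exploration analysis, from priors to the optimal strategy, can be sketched in a few lines. This is an illustrative sketch, not PROSPECTOR's implementation: the names `posterior` and `best_action` are hypothetical, and the leaf payoffs (plus or minus \$1,000,000 for drilling, -\$50,000 for quitting) are taken directly from the decision tree.

```python
# Priors and seismic-test likelihoods from the text.
prior = {"oil": 0.6, "no_oil": 0.4}
likelihood = {  # P(test result | state of the site)
    "+": {"oil": 0.8, "no_oil": 0.1},
    "-": {"oil": 0.2, "no_oil": 0.9},
}

# Leaf payoffs as they appear on the decision tree.
PAYOFF_QUIT = -50_000
PAYOFF_DRILL = {"oil": 1_000_000, "no_oil": -1_000_000}


def posterior(test):
    """Return (P(test), {state: P(state | test)}) using Bayes' Theorem."""
    joint = {state: likelihood[test][state] * prior[state] for state in prior}
    p_test = sum(joint.values())  # Addition law over the two states
    return p_test, {state: joint[state] / p_test for state in joint}


def best_action(test):
    """Backward induction at an act node: compare quitting with the expected drill payoff."""
    _, post = posterior(test)
    expected_drill = sum(post[state] * PAYOFF_DRILL[state] for state in post)
    if expected_drill > PAYOFF_QUIT:
        return "drill", expected_drill
    return "quit", PAYOFF_QUIT


# Expected payoff at the root (event node A): weight each branch by P(test).
expected_at_root = sum(posterior(test)[0] * best_action(test)[1] for test in ("+", "-"))

for test in ("+", "-"):
    print(test, best_action(test))
print(round(expected_at_root))
```

Each act node keeps only the better of its two options, which is the pruning step that backward induction relies on.
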

The decision tree is an example of hypothetical reasoning, or "what if" types of situations. By exploring alternate paths of action, we can prune paths that do not lead to optimal payoffs.

### Bayes' rule and knowledge-based systems

As we know, rule-based systems express knowledge in an IF-THEN format:

IF X is true
THEN Y can be concluded with probability p

If we observe that X is true, then we can conclude that Y exists with the specified probability. For example

IF the patient has a cold
THEN the patient will sneeze (0.75)

But what if we reason abductively and observe Y (i.e., the patient sneezes) while knowing nothing about X (i.e., the patient has a cold)? What can we conclude about it? Bayes' Theorem describes how we can derive a probability for X.

Within the rule given above, Y denotes some piece of evidence (typically referred to as $E$) and X denotes some hypothesis ($H$), giving

(1) $P(H \mid E) = \dfrac{P(E \mid H)\,P(H)}{P(E)}$

or

(2) $P(H \mid E) = \dfrac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid H')\,P(H')}$

To make this more concrete, consider whether Rob has a cold (the hypothesis) given that he sneezes (the evidence).

Equation (2) states that the probability that Rob has a cold given that he sneezes is the ratio of the probability that he both has a cold and sneezes, to the probability that he sneezes.

The probability of his sneezing is the sum of two terms: the conditional probability that he sneezes when he has a cold, weighted by the probability that he has a cold, and the conditional probability that he sneezes when he doesn't have a cold, weighted by the probability that he doesn't. In other words, it is the probability that he sneezes regardless of whether he has a cold or not. Suppose that we know in general

$P(H) = P(\text{Rob has a cold}) = 0.2$

$P(E \mid H) = P(\text{Rob was observed sneezing} \mid \text{Rob has a cold}) = 0.75$

$P(E \mid H') = P(\text{Rob was observed sneezing} \mid \text{Rob does not have a cold}) = 0.2$

Then

$$P(E) = P(\text{Rob was observed sneezing}) = (0.75)(0.2) + (0.2)(0.8) = 0.31$$

and

$$P(H \mid E) = P(\text{Rob has a cold} \mid \text{Rob was observed sneezing}) = \frac{(0.75)(0.2)}{0.31} = 0.48387$$

That is, Rob's probability of having a cold given that he sneezes is about 0.48.

We can also determine what his probability of having a cold would be if he was not sneezing:

$$P(H \mid E') = \frac{P(E' \mid H)\,P(H)}{P(E')} = \frac{(1 - 0.75)(0.2)}{1 - 0.31} = 0.07246$$
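
The two posteriors for Rob can be reproduced with Equation (2). This is a minimal sketch; the function name `posterior_given_evidence` is an illustrative choice, and only the three input probabilities come from the example.

```python
def posterior_given_evidence(p_h, p_e_given_h, p_e_given_not_h):
    """Equation (2): return (P(H | E), P(E)) for a hypothesis H and its negation H'."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e, p_e


p_cold = 0.2              # P(H)
p_sneeze_if_cold = 0.75   # P(E | H)
p_sneeze_if_well = 0.2    # P(E | H')

# Rob sneezes: update on E.
p_cold_if_sneeze, p_sneeze = posterior_given_evidence(
    p_cold, p_sneeze_if_cold, p_sneeze_if_well
)

# Rob does not sneeze: update on E', whose likelihoods are the complements.
p_cold_if_no_sneeze, p_no_sneeze = posterior_given_evidence(
    p_cold, 1 - p_sneeze_if_cold, 1 - p_sneeze_if_well
)

print(round(p_sneeze, 2), round(p_cold_if_sneeze, 5), round(p_cold_if_no_sneeze, 5))
```
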

So knowledge that he sneezes increases his probability of having a cold by a factor of approximately 2.5, while knowledge that he does not sneeze decreases his probability by a factor of almost 3.

### Propagation of Belief

Note that what we have just examined is very limited, since we have only considered the case in which each piece of evidence affects only one hypothesis.

This must be generalized to deal with $m$ hypotheses $H_1, H_2, \ldots, H_m$ and $n$ pieces of evidence $E_1, \ldots, E_n$, the situation normally encountered in real-world problems. When these factors are included, Equation (2) becomes

(3) $P(H_i \mid E_{j_1} E_{j_2} \cdots E_{j_k}) = \dfrac{P(E_{j_1} E_{j_2} \cdots E_{j_k} \mid H_i)\,P(H_i)}{P(E_{j_1} E_{j_2} \cdots E_{j_k})}$

$$= \frac{P(E_{j_1} \mid H_i)\,P(E_{j_2} \mid H_i) \cdots P(E_{j_k} \mid H_i)\,P(H_i)}{\sum_{l=1}^{m} P(E_{j_1} \mid H_l)\,P(E_{j_2} \mid H_l) \cdots P(E_{j_k} \mid H_l)\,P(H_l)}$$

where $\{j_1, \ldots, j_k\} \subseteq \{1, \ldots, n\}$.

This probability is called the posterior probability of hypothesis $H_i$ from observing evidence $E_{j_1}, E_{j_2}, \ldots, E_{j_k}$.

This equation is derived based on several assumptions:

1. The hypotheses $H_1, \ldots, H_m$, $m \ge 1$, are mutually exclusive.
2. Furthermore, the hypotheses $H_1, \ldots, H_m$ are collectively exhaustive.
3. The pieces of evidence $E_1, \ldots, E_n$, $n \ge 1$, are conditionally independent given any hypothesis $H_i$, $1 \le i \le m$.

Conditional independence: The events $E_1, E_2, \ldots, E_n$ are conditionally independent given an event $H$ if

$$P(E_{j_1} \cap \cdots \cap E_{j_k} \mid H) = P(E_{j_1} \mid H) \cdots P(E_{j_k} \mid H)$$

for each subset $\{j_1, \ldots, j_k\} \subseteq \{1, \ldots, n\}$.

This last assumption often causes great difficulties for probability-based methods.

For example, two symptoms, A and B, might each independently indicate that some disease is 50 percent likely. Together, however, it might be that these symptoms reinforce (or contradict) each other. Care must be taken to ensure that such a situation does not exist before using the Bayesian approach.

To illustrate how belief is propagated through a system using Bayes' rule, consider the values shown in the table below. These values represent (hypothetically) three mutually exclusive and exhaustive hypotheses

1. $H_1$, the patient, Rob, has a cold;
2. $H_2$, Rob has an allergy; and
3. $H_3$, Rob has a sensitivity to light

with their prior probabilities, the $P(H_i)$, and two conditionally independent pieces of evidence

1. $E_1$, Rob sneezes, and
2. $E_2$, Rob coughs,

which support these hypotheses to differing degrees.

| | $i = 1$ (cold) | $i = 2$ (allergy) | $i = 3$ (light sensitive) |
|---|---|---|---|
| $P(H_i)$ | 0.6 | 0.3 | 0.1 |
| $P(E_1 \mid H_i)$ | 0.3 | 0.8 | 0.3 |
| $P(E_2 \mid H_i)$ | 0.6 | 0.9 | 0.0 |

If we observe evidence $E_1$ (e.g., the patient sneezes), we can compute posterior probabilities for the hypotheses using Equation (3) (where $k = 1$) to be:

$$P(H_1 \mid E_1) = \frac{(0.3)(0.6)}{(0.3)(0.6) + (0.8)(0.3) + (0.3)(0.1)} = 0.4$$

$$P(H_2 \mid E_1) = \frac{(0.8)(0.3)}{(0.3)(0.6) + (0.8)(0.3) + (0.3)(0.1)} = 0.53$$

$$P(H_3 \mid E_1) = \frac{(0.3)(0.1)}{(0.3)(0.6) + (0.8)(0.3) + (0.3)(0.1)} = 0.07$$

Note that the belief in hypotheses $H_1$ and $H_3$ has decreased while the belief in hypothesis $H_2$ has increased after observing $E_1$. If $E_2$ (e.g., the patient coughs) is now observed, new posterior probabilities can be computed from Equation (3) (where $k = 2$):

$$P(H_1 \mid E_1 E_2) = \frac{(0.3)(0.6)(0.6)}{(0.3)(0.6)(0.6) + (0.8)(0.9)(0.3) + (0.3)(0.0)(0.1)} = 0.33$$

$$P(H_2 \mid E_1 E_2) = \frac{(0.8)(0.9)(0.3)}{(0.3)(0.6)(0.6) + (0.8)(0.9)(0.3) + (0.3)(0.0)(0.1)} = 0.67$$

$$P(H_3 \mid E_1 E_2) = \frac{(0.3)(0.0)(0.1)}{(0.3)(0.6)(0.6) + (0.8)(0.9)(0.3) + (0.3)(0.0)(0.1)} = 0.0$$
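
The posterior computations for Rob's three hypotheses follow directly from Equation (3). The sketch below assumes, as the text does, that the hypotheses are mutually exclusive and exhaustive and that the evidence is conditionally independent; the function name `posteriors` and the data layout are illustrative choices.

```python
# Priors P(H_i) and likelihoods P(E_j | H_i) from the table, in the order
# (cold, allergy, light sensitive).
priors = [0.6, 0.3, 0.1]
likelihoods = {
    "E1": [0.3, 0.8, 0.3],  # Rob sneezes
    "E2": [0.6, 0.9, 0.0],  # Rob coughs
}


def posteriors(observed):
    """Equation (3): P(H_i | observed evidence) for every hypothesis H_i."""
    scores = []
    for i, prior in enumerate(priors):
        score = prior
        for name in observed:
            score *= likelihoods[name][i]  # conditional independence given H_i
        scores.append(score)
    total = sum(scores)  # the denominator: sum over all m hypotheses
    return [score / total for score in scores]


after_sneeze = posteriors(["E1"])
after_sneeze_and_cough = posteriors(["E1", "E2"])

print([round(p, 2) for p in after_sneeze])
print([round(p, 2) for p in after_sneeze_and_cough])
```
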

Hypothesis $H_3$ (sensitivity to light) has now ceased to be a viable hypothesis, and $H_2$ (allergy) is considered much more likely than $H_1$ (cold), even though $H_1$ initially ranked higher.

### Bayesian methods

The Bayesian methods have a number of advantages that indicate their suitability for uncertainty management. Most significant is their sound theoretical foundation in probability theory. Thus, they are currently the most mature of all of the uncertainty reasoning methods.

While Bayesian methods are more developed than the other uncertainty methods, they are not without faults.

1. They require a significant amount of probability data to construct a knowledge base. Furthermore, human experts are normally uncertain and uncomfortable about the probabilities they are providing.
2. What are the relevant prior and conditional probabilities based on? If they are statistically based, the sample sizes must be sufficient so the probabilities obtained are accurate. If human experts have provided the values, are the values consistent and comprehensive?
3. Often the type of relationship between the hypothesis and evidence is important in determining how the uncertainty will be managed. Reducing these associations to simple numbers removes relevant information that might be needed for successful reasoning about the uncertainties. For example, Bayesian-based medical diagnostic systems have failed to gain acceptance because physicians distrust systems that cannot provide explanations describing how a conclusion was reached (a feature difficult to provide in a Bayesian-based system).
4. The reduction of the associations to numbers also prevents this knowledge from being used within other tasks. For example, the associations that would enable the system to explain its reasoning to a user are lost, as is the ability to browse through the hierarchy from evidence to hypotheses.

## 2. Certainty factors

Certainty factors are another method of dealing with uncertainty. This method was originally developed for the MYCIN system.

One of the difficulties with the Bayesian method is that too many probabilities are required, and most of them could be unknown. The problem gets very bad when there are many pieces of evidence.

Besides the problem of amassing all the conditional probabilities for the Bayesian method, another major problem that appeared with medical experts was the relationship of belief and disbelief.

At first sight, this may appear trivial since obviously disbelief is simply the opposite of belief. In fact, the theory of probability states that

$$P(H) + P(H') = 1$$

and so

$$P(H) = 1 - P(H')$$

For the case of a posterior hypothesis that relies on evidence $E$,

(1) $P(H \mid E) = 1 - P(H' \mid E)$

However, when the MYCIN knowledge engineers began interviewing medical experts, they found that physicians were extremely reluctant to state their knowledge in the form of equation (1).

For example, consider a MYCIN rule such as the following.

IF 1) The stain of the organism is gram positive, and
2) The morphology of the organism is coccus, and
3) The growth conformation of the organism is chains
THEN There is suggestive evidence (0.7) that the identity of the organism is streptococcus

This can be written in terms of posterior probability:

(2) $P(H \mid E_1 \cap E_2 \cap E_3) = 0.7$

where the $E_i$ correspond to the three patterns of the antecedent.

The MYCIN knowledge engineers found that while an expert would agree to equation (2), they became uneasy and refused to agree with the probability result

(3) $P(H' \mid E_1 \cap E_2 \cap E_3) = 1 - 0.7 = 0.3$

This illustrates that numbers such as 0.7 and 0.3 are likelihoods of belief, not probabilities.

Let us have another example. Suppose this is your last course required for a degree. Your grades have not been too good and you need an 'A' in this course to bring up your GPA. The following formula may express your belief in the likelihood of graduating:

(4) $P(\text{graduating} \mid \text{'A' in this course}) = 0.70$

Notice that this likelihood is not 100%. The reason it's not 100% is that a final audit of your courses and grades must still be made, and there could be a problem due to a number of reasons that would still prevent you from graduating.

Assuming that you agree with (4) (or perhaps your own value for the likelihood), then by equation (1)

(5) $P(\text{not graduating} \mid \text{'A' in this course}) = 0.30$

From a probabilistic point of view, (5) is correct. However, it seems intuitively wrong. It is just not right that if you really work hard and get an 'A' in this course, then there is a 30% chance that you won't graduate. (5) should make you uneasy.

The fundamental problem is that while $P(H \mid E)$ implies a cause and effect relationship between $E$ and $H$, there may be no cause and effect relationship between $E$ and $H'$.

These problems with the theory of probability led the researchers in MYCIN to investigate other ways of representing uncertainty. The method that they used with MYCIN was based on certainty factors.

### Measures of belief and disbelief

In MYCIN, the certainty factor (CF) was originally defined as the difference between belief and disbelief.

$$\mathrm{CF}(H, E) = \mathrm{MB}(H, E) - \mathrm{MD}(H, E)$$

where

CF is the certainty factor in the hypothesis $H$ due to evidence $E$

MB is the measure of increased belief in $H$ due to $E$

MD is the measure of increased disbelief in $H$ due to $E$

The certainty factor is a way of combining belief and disbelief into a single number. Combining the measures of belief and disbelief into a single number has some interesting uses.

The certainty factor can be used to rank hypotheses in order of importance. For example, if a patient has certain symptoms which suggest several possible diseases, then the disease with the highest CF would be the one that is first investigated by ordering tests.

The measures of belief and disbelief were defined in terms of probabilities by

$$\mathrm{MB}(H, E) = \begin{cases} 1 & \text{if } P(H) = 1 \\ \dfrac{\max[P(H \mid E), P(H)] - P(H)}{1 - P(H)} & \text{otherwise} \end{cases}$$

$$\mathrm{MD}(H, E) = \begin{cases} 1 & \text{if } P(H) = 0 \\ \dfrac{\min[P(H \mid E), P(H)] - P(H)}{-P(H)} & \text{otherwise} \end{cases}$$

According to these definitions, some characteristics are shown in Table 5-1.

| Characteristics | Values |
|---|---|
| Ranges | $0 \le \mathrm{MB} \le 1$; $0 \le \mathrm{MD} \le 1$; $-1 \le \mathrm{CF} \le 1$ |
| Certain true hypothesis, $P(H \mid E) = 1$ | $\mathrm{MB} = 1$; $\mathrm{MD} = 0$; $\mathrm{CF} = 1$ |
| Certain false hypothesis, $P(H' \mid E) = 1$ | $\mathrm{MB} = 0$; $\mathrm{MD} = 1$; $\mathrm{CF} = -1$ |
| Lack of evidence, $P(H \mid E) = P(H)$ | $\mathrm{MB} = 0$; $\mathrm{MD} = 0$; $\mathrm{CF} = 0$ |

Table 5-1: Some characteristics of MB, MD and CF
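
The probabilistic definitions of MB and MD, and the original definition CF = MB - MD, can be sketched as follows. The function names are illustrative, and the sketch assumes the standard boundary cases (MB = 1 when $P(H) = 1$, MD = 1 when $P(H) = 0$), which also avoid division by zero.

```python
def measure_of_belief(p_h, p_h_given_e):
    """MB(H, E): relative increase in belief in H due to E."""
    if p_h == 1:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1 - p_h)


def measure_of_disbelief(p_h, p_h_given_e):
    """MD(H, E): relative increase in disbelief in H due to E."""
    if p_h == 0:
        return 1.0
    return (min(p_h_given_e, p_h) - p_h) / (0 - p_h)


def certainty_factor(p_h, p_h_given_e):
    """Original MYCIN definition: CF = MB - MD."""
    return measure_of_belief(p_h, p_h_given_e) - measure_of_disbelief(p_h, p_h_given_e)


# The three characteristic cases, with an assumed prior P(H) = 0.5.
cf_certain_true = certainty_factor(0.5, 1.0)    # P(H | E) = 1
cf_certain_false = certainty_factor(0.5, 0.0)   # P(H' | E) = 1
cf_no_evidence = certainty_factor(0.5, 0.5)     # P(H | E) = P(H)

print(cf_certain_true, cf_certain_false, cf_no_evidence)
```
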

The certainty factor, CF, indicates the net belief in a hypothesis based on some evidence.

A positive CF means the evidence supports the hypothesis, since MB > MD.

A CF = 1 means that the evidence definitely proves the hypothesis.

A CF = 0 means one of two possibilities.

1. First, a CF = MB - MD = 0 could mean that both MB and MD are 0.
2. The second possibility is that MB = MD and both are nonzero. The result is that the belief is canceled out by the disbelief.

A negative CF means that the evidence favors the negation of the hypothesis, since MB < MD. Another way of stating this is that there is more reason to disbelieve a hypothesis than to believe it.

For example, a CF = -70% means that the disbelief is 70% greater than the belief. A CF = 70% means that the belief is 70% greater than the disbelief.

Certainty factors allow an expert to express a belief without committing a value to the disbelief.

The following equation is true.

$$\mathrm{CF}(H, E) + \mathrm{CF}(H', E) = 0$$

The equation means that evidence supporting a hypothesis reduces support for the negation of the hypothesis by an equal amount, so that the sum is always 0.

For the example of the student graduating if an 'A' is given in the course

$\mathrm{CF}(H, E) = 0.70$ and $\mathrm{CF}(H', E) = -0.70$

which means

(6) I am 70% certain that I will graduate if I get an 'A' in this course.

(7) I am -70% certain that I will not graduate if I get an 'A' in this course.

A value of 0 means no evidence. So certainty factors greater than 0 favor the hypothesis, and certainty factors less than 0 favor the negation of the hypothesis. Statements (6) and (7) are equivalent using certainty factors.

The above CF values might be elicited by asking

How much do you believe that getting an 'A' will help you graduate?

if the evidence is to confirm the hypothesis, or

How much do you disbelieve that getting an 'A' will help you graduate?

An answer of 70% to each question will set $\mathrm{CF}(H, E) = 0.70$ and $\mathrm{CF}(H', E) = -0.70$.

### Calculation with Certainty Factors

Although the original definition of CF was

$$\mathrm{CF} = \mathrm{MB} - \mathrm{MD}$$

there were difficulties with this definition because one piece of disconfirming evidence could control the confirmation of many other pieces of evidence.

For example, ten pieces of evidence might produce MB = 0.999, and one disconfirming piece with MD = 0.799 could then give

$$\mathrm{CF} = 0.999 - 0.799 = 0.200$$

The definition of CF was changed in MYCIN in 1977 to be

$$\mathrm{CF} = \frac{\mathrm{MB} - \mathrm{MD}}{1 - \min(\mathrm{MB}, \mathrm{MD})}$$

This softens the effects of a single piece of disconfirming evidence on many confirming pieces of evidence. Under this definition, with MB = 0.999 and MD = 0.799,

$$\mathrm{CF} = \frac{0.999 - 0.799}{1 - \min(0.999, 0.799)} = \frac{0.200}{1 - 0.799} = 0.995$$

The MYCIN methods for combining evidence in the antecedent of a rule are shown in Table 5-2.

| Evidence, $E$ | Antecedent certainty |
|---|---|
| $E_1$ AND $E_2$ | $\min[\mathrm{CF}(E_1, e), \mathrm{CF}(E_2, e)]$ |
| $E_1$ OR $E_2$ | $\max[\mathrm{CF}(E_1, e), \mathrm{CF}(E_2, e)]$ |
| NOT $E$ | $-\mathrm{CF}(E, e)$ |

Table 5-2

For example, given a logical expression for combining evidence such as

$$E = (E_1 \text{ AND } E_2 \text{ AND } E_3) \text{ OR } (E_4 \text{ AND NOT } E_5)$$

the evidence $E$ would be computed as

$$E = \max[\min(E_1, E_2, E_3), \min(E_4, -E_5)]$$

For the values

$E_1 = 0.9$, $E_2 = 0.8$, $E_3 = 0.3$, $E_4 = -0.5$, $E_5 = -0.4$

the result is

$$E = \max[\min(0.9, 0.8, 0.3), \min(-0.5, -(-0.4))] = \max[0.3, -0.5] = 0.3$$
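
The antecedent rules of Table 5-2 translate directly into `min`, `max`, and negation. The helper names `cf_and`, `cf_or`, and `cf_not` below are illustrative, not MYCIN's own identifiers.

```python
def cf_and(*cfs):
    """Certainty of a conjunction: the weakest conjunct."""
    return min(cfs)


def cf_or(*cfs):
    """Certainty of a disjunction: the strongest disjunct."""
    return max(cfs)


def cf_not(cf):
    """Certainty of a negation: the sign is flipped."""
    return -cf


# Values from the example in the text.
e1, e2, e3, e4, e5 = 0.9, 0.8, 0.3, -0.5, -0.4

# E = (E1 AND E2 AND E3) OR (E4 AND NOT E5)
e = cf_or(cf_and(e1, e2, e3), cf_and(e4, cf_not(e5)))
print(e)
```
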

The formula for the CF of a rule

IF $E$ THEN $H$

is given by

(8) $\mathrm{CF}(H, e) = \mathrm{CF}(E, e)\,\mathrm{CF}(H, E)$

where

$\mathrm{CF}(E, e)$ is the certainty factor of the evidence $E$ making up the antecedent of the rule, based on uncertain evidence $e$.

$\mathrm{CF}(H, E)$ is the certainty factor of the hypothesis assuming that the evidence is known with certainty, that is, when $\mathrm{CF}(E, e) = 1$.

$\mathrm{CF}(H, e)$ is the certainty factor of the hypothesis based on uncertain evidence $e$.

Thus, if all the evidence in the antecedent is known with certainty, the formula for the certainty factor of the hypothesis is

$$\mathrm{CF}(H, e) = \mathrm{CF}(H, E)$$

since $\mathrm{CF}(E, e) = 1$.

As an example, consider the CF for the streptococcus rule discussed before,

IF 1) The stain of the organism is gram positive, and
2) The morphology of the organism is coccus, and
3) The growth conformation of the organism is chains
THEN There is suggestive evidence (0.7) that the identity of the organism is streptococcus

where the certainty factor of the hypothesis under certain evidence is

$$\mathrm{CF}(H, E) = \mathrm{CF}(H, E_1 \cap E_2 \cap E_3) = 0.7$$

and is also called the attenuation factor.

The attenuation factor is based on the assumption that all the evidence ($E_1$, $E_2$ and $E_3$) is known with certainty. That is,

$$\mathrm{CF}(E_1, e) = \mathrm{CF}(E_2, e) = \mathrm{CF}(E_3, e) = 1$$

What happens when not all the evidence is known with certainty?

In the case of MYCIN, formula (8) must be used to determine the resulting CF value, since $\mathrm{CF}(H, E_1 \cap E_2 \cap E_3) = 0.7$ is no longer valid for uncertain evidence.

For example, assuming

$\mathrm{CF}(E_1, e) = 0.5$

$\mathrm{CF}(E_2, e) = 0.6$

$\mathrm{CF}(E_3, e) = 0.3$

then

$$\mathrm{CF}(E, e) = \mathrm{CF}(E_1 \cap E_2 \cap E_3, e) = \min[\mathrm{CF}(E_1, e), \mathrm{CF}(E_2, e), \mathrm{CF}(E_3, e)] = \min[0.5, 0.6, 0.3] = 0.3$$

The certainty factor of the conclusion is

$$\mathrm{CF}(H, e) = \mathrm{CF}(E, e)\,\mathrm{CF}(H, E) = 0.3 \times 0.7 = 0.21$$

What happens when another rule also concludes the same hypothesis, but with a different certainty factor? The certainty factors of rules concluding the same hypothesis are combined using the combining function for certainty factors, defined as

(9) $$\mathrm{CF}_{\mathrm{COMBINE}}(\mathrm{CF}_1, \mathrm{CF}_2) = \begin{cases} \mathrm{CF}_1 + \mathrm{CF}_2\,(1 - \mathrm{CF}_1) & \text{if both } \mathrm{CF}_1 \text{ and } \mathrm{CF}_2 > 0 \\[1ex] \dfrac{\mathrm{CF}_1 + \mathrm{CF}_2}{1 - \min(|\mathrm{CF}_1|, |\mathrm{CF}_2|)} & \text{if one of } \mathrm{CF}_1 \text{ and } \mathrm{CF}_2 < 0 \\[1ex] \mathrm{CF}_1 + \mathrm{CF}_2\,(1 + \mathrm{CF}_1) & \text{if both } \mathrm{CF}_1 \text{ and } \mathrm{CF}_2 < 0 \end{cases}$$

where $\mathrm{CF}_1$ is $\mathrm{CF}_1(H, e)$ and $\mathrm{CF}_2$ is $\mathrm{CF}_2(H, e)$.

The formula for $\mathrm{CF}_{\mathrm{COMBINE}}$ used depends on whether the individual certainty factors are positive or negative.

The combining function for more than two certainty factors is applied incrementally. That is, $\mathrm{CF}_{\mathrm{COMBINE}}$ is calculated for two CF values, and then the result is combined using formula (9) with the third CF value, and so forth.

The following figure summarizes the calculations with certainty factors for two rules based on uncertain evidence and concluding the same hypothesis.

Figure: CF of two rules with the same hypothesis based on uncertain evidence. Hypothesis $H$ receives $\mathrm{CF}_1(H, e) = \mathrm{CF}_1(E, e)\,\mathrm{CF}_1(H, E)$ from Rule 1 and $\mathrm{CF}_2(H, e) = \mathrm{CF}_2(E, e)\,\mathrm{CF}_2(H, E)$ from Rule 2. In each rule, the antecedent evidence is combined with AND (min), OR (max), and NOT ($-$).

In our above example, if another rule concludes streptococcus with certainty factor $\mathrm{CF}_2 = 0.5$, then the combined certainty using the first formula of (9) is

$$\mathrm{CF}_{\mathrm{COMBINE}}(0.21, 0.5) = 0.21 + 0.5\,(1 - 0.21) = 0.605$$

Suppose a third rule also has the same conclusion, but with $\mathrm{CF}_3 = -0.4$. Then the second formula of (9) is used to give

$$\mathrm{CF}_{\mathrm{COMBINE}}(0.605, -0.4) = \frac{0.605 - 0.4}{1 - \min(|0.605|, |-0.4|)} = \frac{0.205}{1 - 0.4} = 0.34$$
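
Formulas (8) and (9) and the worked streptococcus numbers can be sketched as below. The function names are illustrative. Two choices are assumptions of this sketch rather than statements from the text: a CF of exactly 0 is routed through the mixed-sign formula, and the combination of +1 with -1 (which would divide by zero) is left unhandled.

```python
from functools import reduce


def cf_rule(antecedent_cfs, attenuation):
    """Formula (8) for an AND antecedent: CF(H, e) = min of the evidence CFs times CF(H, E)."""
    return min(antecedent_cfs) * attenuation


def cf_combine(cf1, cf2):
    """Formula (9): combine two certainty factors for the same hypothesis."""
    if cf1 > 0 and cf2 > 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    # Mixed signs (or a zero, by assumption of this sketch).
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))


# Rule 1: the streptococcus rule with uncertain evidence 0.5, 0.6, 0.3.
cf1 = cf_rule([0.5, 0.6, 0.3], 0.7)

# Two further rules concluding the same hypothesis.
cf2, cf3 = 0.5, -0.4

after_two_rules = cf_combine(cf1, cf2)
# Incremental application over all three rules.
after_three_rules = reduce(cf_combine, [cf1, cf2, cf3])

print(round(cf1, 2), round(after_two_rules, 3), round(after_three_rules, 2))
```
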

The $\mathrm{CF}_{\mathrm{COMBINE}}$ formula preserves the commutativity of evidence. That is,

$$\mathrm{CF}_{\mathrm{COMBINE}}(X, Y) = \mathrm{CF}_{\mathrm{COMBINE}}(Y, X)$$

and so the order in which evidence is received does not affect the result.

### Certainty factors

The CF formalism has been quite popular with expert system developers since its creation because

1. It is a simple computational model that permits experts to estimate their confidence in the conclusions being drawn.
2. It permits the expression of belief and disbelief in each hypothesis, allowing the expression of the effect of multiple sources of evidence.
3. It allows knowledge to be captured in a rule representation while allowing the quantification of uncertainty.
4. The gathering of the CF values is significantly easier than the gathering of values for the other methods. No statistical base is required; you merely have to ask the expert for the values.

Many systems, including MYCIN, have utilized this formalism and have displayed a high degree of competence in their application areas. But is this competence due to these systems' ability to manipulate and reason with uncertainty, or is it due to other factors?

2ome studies have shown that changing the certainty
factors or even turn off the 1F reasoning portion of
ML14) does not seems to affect the correct diagnoses
much.
This revealed that the knowledge described within the
rule contributes much more to the final+ derived
results than the 1F values.
Other criticisms of this uncertainty reasoning method include, among others:
1. The CFs lack a theoretical foundation. Basically, the CFs were partly ad hoc. The formalism is an approximation of probability theory.
2. Non-independent evidence can be expressed and combined only by "chunking" it together within the same rule. When large quantities of non-independent evidence must be expressed, this proves to be unsatisfactory.
3. The CF values could be the opposite of conditional probabilities.
For example, if

$$P(H_1) = 0.8, \qquad P(H_2) = 0.2$$
$$P(H_1 \mid E) = 0.9, \qquad P(H_2 \mid E) = 0.8$$

then $CF(H_1, E) = 0.5$ and $CF(H_2, E) = 0.75$.
Since one purpose of CF is to rank hypotheses in terms of likely diagnosis, it is a contradiction for a disease to have a higher conditional probability $P(H \mid E)$ and yet have a lower certainty factor, $CF(H, E)$.
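The two CF values in this example can be reproduced with a minimal sketch. It assumes the usual MYCIN definition $CF = MB - MD$, with $MB = (P(H \mid E) - P(H)) / (1 - P(H))$ when the evidence raises the probability and $MD = (P(H) - P(H \mid E)) / P(H)$ when it lowers it. The function name is illustrative, and the sketch only covers priors strictly between 0 and 1.

```python
def certainty_factor(p_h: float, p_h_given_e: float) -> float:
    """CF(H, E) = MB - MD under the assumed MYCIN definitions."""
    if not 0.0 < p_h < 1.0:
        raise ValueError("this sketch assumes 0 < P(H) < 1")
    if p_h_given_e >= p_h:
        # Evidence supports H: MD = 0, CF = MB.
        return (p_h_given_e - p_h) / (1.0 - p_h)
    # Evidence opposes H: MB = 0, CF = -MD.
    return -(p_h - p_h_given_e) / p_h


# (prior, posterior) pairs from the example in the text.
hypotheses = {"H1": (0.8, 0.9), "H2": (0.2, 0.8)}
for name, (prior, posterior) in hypotheses.items():
    cf = certainty_factor(prior, posterior)
    print(name, "P(H|E) =", posterior, "CF =", round(cf, 2))
```

The output shows H1 with the higher posterior (0.9) but the lower CF (0.5), and H2 with the lower posterior (0.8) but the higher CF (0.75), which is the ranking reversal described above.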
3. Dempster-Shafer Theory
Here we discuss another method for handling uncertainty, called Dempster-Shafer theory. It evolved during the 1960s and 1970s through the efforts of Arthur Dempster and one of his students, Glenn Shafer.
This theory was designed as a mathematical theory of evidence.
The development of the theory has been motivated by the observation that probability theory is not able to distinguish between uncertainty and ignorance owing to incomplete information.
Frames of discernment
Given a set of possible elements, called the environment,

$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$$

that are mutually exclusive and exhaustive.
The environment is the set of objects that are of interest to us.
For example,

$$\Theta = \{\text{airliner}, \text{bomber}, \text{fighter}\}$$
$$\Theta = \{\text{red}, \text{green}, \text{blue}, \text{orange}, \text{yellow}\}$$

One way of thinking about $\Theta$ is in terms of questions. Suppose

$$\Theta = \{\text{airliner}, \text{bomber}, \text{fighter}\}$$

and the question is, "What are the military aircraft?" The answer is the subset of $\Theta$

$$\{\theta_2, \theta_3\} = \{\text{bomber}, \text{fighter}\}$$
Each subset of $\Theta$ can be interpreted as a possible answer to a question.
Since the elements are mutually exclusive and the environment is exhaustive, there can be only one correct answer subset to a question.
Of course, not all possible questions may be meaningful.
The subsets of the environment are all possible valid answers in this universe of discourse.
An environment is also called a frame of discernment.
The term discern means that it is possible to distinguish the one correct answer from all the other possible answers.
The power set of the environment (with $2^N$ subsets for a set of size $N$) has as its elements all answers to the possible questions of the frame of discernment.
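As an illustration, the power set of the three-element aircraft environment can be enumerated directly. This is a minimal sketch; the helper name `power_set` is illustrative, and subsets are represented as `frozenset` objects so they can later serve as dictionary keys.

```python
from itertools import chain, combinations


def power_set(frame):
    """Return all subsets of the frame, from the empty set up to the frame."""
    items = sorted(frame)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]


theta = {"airliner", "bomber", "fighter"}
answers = power_set(theta)

print(len(answers))  # 8, i.e. 2**3
for subset in answers:
    print(sorted(subset))
```

The subset `{bomber, fighter}` is the answer to "What are the military aircraft?", and the full frame and the empty set also appear in the list.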
Mass Functions and Ignorance
In Bayesian theory, the posterior probability changes as evidence is acquired. Likewise, in Dempster-Shafer theory, the belief in evidence may vary.
It is customary in Dempster-Shafer theory to think about the degree of belief in evidence as analogous to the mass of a physical object.
That is, the mass of evidence supports a belief.
The reason for the analogy with an object of mass is to consider belief as a quantity that can move around, be split up, and combined.
A fundamental difference between Dempster-Shafer theory and probability theory is the treatment of ignorance.
As discussed in Chapter 4, probability theory must distribute an equal amount of probability even in ignorance.
For example, if you have no prior knowledge, then you must assume the probability $P$ of each possibility is

$$P = \frac{1}{N}$$

where $N$ is the number of possibilities.
E.g., the formula $P(H) + P(H') = 1$ must be enforced.
The Dempster-Shafer theory does not force belief to be assigned to ignorance or refutation of a hypothesis.
The mass is assigned only to those subsets of the environment to which you wish to assign belief.
Any belief that is not assigned to a specific subset is considered no belief, or nonbelief, and is just associated with the environment $\Theta$.
Belief that refutes a hypothesis is disbelief, which is not nonbelief.
For example, we are trying to identify whether an aircraft is hostile. Suppose there is evidence of 0.7 indicating a belief that the target aircraft is hostile, where hostile aircraft are only considered to be bombers and fighters. Thus, the mass assignment is to the subset {bomber, fighter}, and

$$m_1(\{\text{bomber}, \text{fighter}\}) = 0.7$$

The rest of the belief is left with the environment, $\Theta$, as nonbelief:

$$m_1(\Theta) = 1 - 0.7 = 0.3$$
The Dempster-Shafer theory differs in a major way from probability theory, which would assume that

$$P(\text{hostile}) = 0.7$$
$$P(\text{non-hostile}) = 1 - 0.7 = 0.3$$

In Dempster-Shafer theory, the 0.3 is held as nonbelief in the environment by $m(\Theta)$. This means neither belief nor disbelief in the evidence to a degree of 0.3.
A mass has considerably more freedom than a probability, as shown in the table below.

| Dempster-Shafer theory | Probability theory |
|---|---|
| $m(\Theta)$ does not have to be 1 | $\sum_i P_i = 1$ |
| If $X \subseteq Y$, it is not necessary that $m(X) \le m(Y)$ | $P(X) \le P(Y)$ |
| No required relationship between $m(X)$ and $m(X')$ | $P(X) + P(X') = 1$ |
We now state things more formally.
Let $\Theta$ be a frame of discernment (environment). A mass assignment function assigns a number $m(x)$ to each $x \subseteq \Theta$ such that:

(1) $m(x) \ge 0$
(2) $m(\emptyset) = 0$
(3) $\sum_{x \subseteq \Theta} m(x) = 1$

Let $\Theta$ be a frame of discernment (environment), and let $m$ be a mass assignment function on $\Theta$. A set $x \subseteq \Theta$ is called a focal element in $m$ if $m(x) > 0$. The core of $m$, denoted by $k(m)$, is the set of all focal elements in $m$.
Let us consider a medical example. Suppose

$$\Theta = \{\text{heart-attack}, \text{pericarditis}, \text{pulmonary-embolism}, \text{aortic-dissection}\}$$

Note that each mass assignment on $\Theta$ assigns mass numbers to $2^4 = 16$ sets. If for a specific patient there is no evidence pointing at a certain diagnosis in particular, the mass of 1 is assigned to $\Theta$:

$$m_0(x) = \begin{cases} 1 & \text{if } x = \Theta \\ 0 & \text{otherwise} \end{cases}$$

Each proper subset of $\Theta$ gets assigned the number 0. The core of $m_0$ is equal to $\{\Theta\}$.
Now suppose that some evidence has become available that points to the composite hypothesis heart-attack or pericarditis with some certainty. Then the subset {heart-attack, pericarditis} will be assigned a mass, e.g., 0.4. Due to lack of further information, the remaining certainty 0.6 is assigned to $\Theta$:

$$m_1(x) = \begin{cases} 0.6 & \text{if } x = \Theta \\ 0.4 & \text{if } x = \{\text{heart-attack}, \text{pericarditis}\} \\ 0 & \text{otherwise} \end{cases}$$
Now suppose we have obtained some evidence against the hypothesis that our patient is suffering from pericarditis. This information can be considered as support for the hypothesis that the patient is not suffering from pericarditis. This is equivalent to the composite hypothesis heart-attack or pulmonary-embolism or aortic-dissection. We therefore assign a mass, for example 0.7, to the set {heart-attack, pulmonary-embolism, aortic-dissection}:

$$m_2(x) = \begin{cases} 0.3 & \text{if } x = \Theta \\ 0.7 & \text{if } x = \{\text{heart-attack}, \text{pulmonary-embolism}, \text{aortic-dissection}\} \\ 0 & \text{otherwise} \end{cases}$$
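The three mass assignments of the medical example can be written down as dictionaries that map focal elements to masses, with every subset not listed implicitly receiving mass 0. This representation, and the helper names `check_mass` and `core`, are assumptions made for illustration rather than part of the theory.

```python
THETA = frozenset({"heart-attack", "pericarditis",
                   "pulmonary-embolism", "aortic-dissection"})


def check_mass(m, frame=THETA, tol=1e-9):
    """Raise ValueError unless m satisfies conditions (1) to (3)."""
    for subset, value in m.items():
        if not subset <= frame:
            raise ValueError("mass assigned outside the frame")
        if value < 0:
            raise ValueError("condition (1): masses must be non-negative")
    if m.get(frozenset(), 0.0) != 0:
        raise ValueError("condition (2): the empty set must have mass 0")
    if abs(sum(m.values()) - 1.0) > tol:
        raise ValueError("condition (3): masses must sum to 1")


def core(m):
    """Return the set of focal elements (subsets with positive mass)."""
    return {subset for subset, value in m.items() if value > 0}


m0 = {THETA: 1.0}
m1 = {THETA: 0.6, frozenset({"heart-attack", "pericarditis"}): 0.4}
m2 = {THETA: 0.3, THETA - {"pericarditis"}: 0.7}

for m in (m0, m1, m2):
    check_mass(m)
    print([sorted(focal) for focal in core(m)])
```

Storing only the focal elements keeps the dictionaries small even though the power set of this frame has 16 members.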
Combining evidence
Dempster-Shafer theory provides a function for computing, from two pieces of evidence and their associated masses, a new mass describing the combined influence of these pieces of evidence.
This function is known as Dempster's rule of combination.
Let $m_1$ and $m_2$ be mass assignments on $\Theta$, the frame of discernment. The combined mass is computed using the formula (special form of Dempster's rule of combination)

$$m_1 \oplus m_2(Z) = \sum_{X \cap Y = Z} m_1(X)\, m_2(Y)$$
For instance, using our hostile aircraft example (with $B$ for bomber and $F$ for fighter), based on two pieces of evidence, we obtain

| | $m_2(\{B\}) = 0.9$ | $m_2(\Theta) = 0.1$ |
|---|---|---|
| $m_1(\{B, F\}) = 0.7$ | $\{B\}$ 0.63 | $\{B, F\}$ 0.07 |
| $m_1(\Theta) = 0.3$ | $\{B\}$ 0.27 | $\Theta$ 0.03 |

E.g., the entry $T_{11}$ is calculated as

$$T_{11}(\{B\}) = m_1(\{B, F\})\, m_2(\{B\}) = (0.7)(0.9) = 0.63$$
Once the individual mass products have been calculated as shown above, then according to Dempster's Rule the products over the common set of intersections are added:

$$m_3(\{B\}) = m_1 \oplus m_2(\{B\}) = 0.63 + 0.27 = 0.90 \quad \text{(bomber)}$$
$$m_3(\{B, F\}) = m_1 \oplus m_2(\{B, F\}) = 0.07 \quad \text{(bomber or fighter)}$$
$$m_3(\Theta) = m_1 \oplus m_2(\Theta) = 0.03 \quad \text{(nonbelief)}$$
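The cross-product table and the summation over common intersections can be sketched in a few lines. The dictionary representation (focal element to mass) and the name `combine_unnormalized` are illustrative assumptions; the function implements only the special form of the rule, so any mass that lands on the empty set is kept as is rather than normalized away.

```python
from collections import defaultdict


def combine_unnormalized(m1, m2):
    """Special form of Dempster's rule: sum products over equal intersections."""
    combined = defaultdict(float)
    for x, mass_x in m1.items():
        for y, mass_y in m2.items():
            combined[x & y] += mass_x * mass_y
    return dict(combined)


THETA = frozenset({"A", "B", "F"})   # airliner, bomber, fighter
B = frozenset({"B"})
BF = frozenset({"B", "F"})

m1 = {BF: 0.7, THETA: 0.3}
m2 = {B: 0.9, THETA: 0.1}

m12 = combine_unnormalized(m1, m2)
for subset, mass in m12.items():
    print(sorted(subset), round(mass, 2))
```

Here the two pieces of evidence never conflict, so no mass falls on the empty set and the three results sum to 1.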
The $m_3(\{B\})$ represents the belief that the target is a bomber and only a bomber.
$m_3(\{B, F\})$ and $m_3(\Theta)$ carry additional information: since their sets include a bomber, it is plausible that their sums may contribute to a belief in the bomber.
So 0.07 + 0.03 = 0.1 may be added to the belief of 0.9 in the bomber set to yield the maximum belief (= 1) that it could be a bomber. This is called the plausible belief.
We have two belief values for the bomber, 0.9 and 1. This pair represents a range of belief. It is called an evidential interval.
The lower bound is called the support (Spt) or Bel, and the upper bound is called the plausibility (Pls).
For instance, 0.9 is the lower bound in the above example, and 1 is the upper bound.
The support is the minimum belief based on the evidence, while the plausibility is the maximum belief we are willing to give.
Thus, $0 \le Bel \le Pls \le 1$. The table below shows some common evidential intervals.

| Evidential interval | Meaning |
|---|---|
| $[1, 1]$ | Completely true |
| $[0, 0]$ | Completely false |
| $[0, 1]$ | Completely ignorant |
| $[Bel, 1]$ where $0 < Bel < 1$ | Tends to support |
| $[0, Pls]$ where $0 < Pls < 1$ | Tends to refute |
| $[Bel, Pls]$ where $0 < Bel \le Pls < 1$ | Tends to both support and refute |
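The table of evidential intervals can be read as a small classification routine. The following sketch is only an illustration; the function name `interpret_interval` and the tolerance used to compare floating-point bounds against 0 and 1 are assumptions.

```python
def interpret_interval(bel: float, pls: float, tol: float = 1e-9) -> str:
    """Map an evidential interval [Bel, Pls] to its meaning in the table."""
    if not (-tol <= bel <= pls + tol and pls <= 1 + tol):
        raise ValueError("need 0 <= Bel <= Pls <= 1")
    bel_is_0 = abs(bel) <= tol
    bel_is_1 = abs(bel - 1) <= tol
    pls_is_0 = abs(pls) <= tol
    pls_is_1 = abs(pls - 1) <= tol
    if bel_is_1:                 # Bel = 1 forces Pls = 1
        return "completely true"
    if pls_is_0:                 # Pls = 0 forces Bel = 0
        return "completely false"
    if bel_is_0 and pls_is_1:
        return "completely ignorant"
    if pls_is_1:
        return "tends to support"
    if bel_is_0:
        return "tends to refute"
    return "tends to both support and refute"


print(interpret_interval(0.9, 1.0))   # tends to support
print(interpret_interval(0.0, 1.0))   # completely ignorant
```

The bomber interval [0.9, 1] from the example above falls in the "tends to support" row.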
The Bel (belief function, or support) is defined to be the total belief of a set and all its subsets:

$$Bel(X) = \sum_{Y \subseteq X} m(Y)$$

For example,

$$Bel_1(\{B, F\}) = m_1(\{B, F\}) + m_1(\{B\}) + m_1(\{F\}) = 0.7 + 0 + 0 = 0.7$$
The belief function is different from the mass, which is the belief in the evidence assigned to a single set.
Since belief functions are defined in terms of masses, the combination of two belief functions can also be expressed in terms of the masses of a set and all its subsets.
For example:

$$Bel_1 \oplus Bel_2(\{B\}) = m_1 \oplus m_2(\{B\}) + m_1 \oplus m_2(\emptyset) = 0.90 + 0 = 0.9$$

$$Bel_1 \oplus Bel_2(\{B, F\}) = m_1 \oplus m_2(\{B, F\}) + m_1 \oplus m_2(\{B\}) + m_1 \oplus m_2(\{F\}) = 0.07 + 0.9 + 0 = 0.97$$

$$Bel_1 \oplus Bel_2(\Theta) = m_1 \oplus m_2(\Theta) + m_1 \oplus m_2(\{B, F\}) + m_1 \oplus m_2(\{B\}) = 0.03 + 0.07 + 0.9 = 1$$

$Bel(\Theta) = 1$ in all cases since the sum of masses must always equal 1.
The evidential interval of a set $S$, $EI(S)$, may be defined in terms of the belief:

$$EI(S) = [Bel(S),\ 1 - Bel(S')]$$

For instance, if $S = \{B\}$, then $S' = \{A, F\}$ (with $A$ for airliner) and

$$Bel(\{A, F\}) = m_1 \oplus m_2(\{A, F\}) + m_1 \oplus m_2(\{A\}) + m_1 \oplus m_2(\{F\}) = 0 + 0 + 0 = 0$$

since these are not focal elements and the mass is 0 for nonfocal elements.
Thus, $EI(\{B\}) = [0.90, 1 - 0] = [0.9, 1]$.
Also, since $Bel(\{A\}) = 0$ and $Bel(\{B, F\}) = Bel_1 \oplus Bel_2(\{B, F\}) = 0.97$, we have

$$EI(\{B, F\}) = [0.97, 1 - 0] = [0.97, 1]$$
$$EI(\{A\}) = [0, 0.03]$$
The plausibility is defined as the degree to which the evidence fails to refute $X$:

$$Pls(X) = 1 - Bel(X')$$

Thus, $EI(X) = [Bel(X), Pls(X)]$.
The evidential interval [total belief, plausibility] can be expressed as

[evidence for support, evidence for support + ignorance]

The dubiety (Dbt), or doubt, represents the degree to which $X$ is disbelieved or refuted.
The ignorance (Igr) is the degree to which the mass supports $X$ and $X'$.
These are defined as follows:

$$Dbt(X) = Bel(X') = 1 - Pls(X)$$
$$Igr(X) = Pls(X) - Bel(X)$$
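The four quantities can be computed directly from a mass assignment. The sketch below assumes masses are stored as a dictionary from focal elements (as `frozenset` objects) to numbers, and it uses the combined aircraft masses 0.90, 0.07 and 0.03 from the earlier example; the function names are illustrative.

```python
THETA = frozenset({"A", "B", "F"})   # airliner, bomber, fighter


def bel(m, x):
    """Bel(X): total mass of X and all its subsets."""
    return sum(mass for subset, mass in m.items() if subset <= x)


def pls(m, x, frame=THETA):
    """Pls(X) = 1 - Bel(X')."""
    return 1.0 - bel(m, frame - x)


def dbt(m, x, frame=THETA):
    """Dbt(X) = Bel(X')."""
    return bel(m, frame - x)


def igr(m, x, frame=THETA):
    """Igr(X) = Pls(X) - Bel(X)."""
    return pls(m, x, frame) - bel(m, x)


m12 = {frozenset({"B"}): 0.90, frozenset({"B", "F"}): 0.07, THETA: 0.03}

for members in (["B"], ["B", "F"], ["A"]):
    x = frozenset(members)
    interval = (round(bel(m12, x), 2), round(pls(m12, x), 2))
    print(members, interval, "ignorance:", round(igr(m12, x), 2))
```

The printed intervals agree with the evidential intervals [0.9, 1], [0.97, 1] and [0, 0.03] derived in the text.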
The Normalization of Belief
Let us see an example. Suppose a third piece of evidence now reports conflicting evidence of an airliner:

$$m_3(\{A\}) = 0.95, \qquad m_3(\Theta) = 0.05$$

The table shows how the cross products are calculated.

| | $m_1 \oplus m_2(\{B\}) = 0.90$ | $m_1 \oplus m_2(\{B, F\}) = 0.07$ | $m_1 \oplus m_2(\Theta) = 0.03$ |
|---|---|---|---|
| $m_3(\{A\}) = 0.95$ | $\emptyset$ 0.855 | $\emptyset$ 0.0665 | $\{A\}$ 0.0285 |
| $m_3(\Theta) = 0.05$ | $\{B\}$ 0.045 | $\{B, F\}$ 0.0035 | $\Theta$ 0.0015 |
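Thus

$$m_1 \oplus m_2 \oplus m_3(\{A\}) = 0.0285$$
$$m_1 \oplus m_2 \oplus m_3(\{B\}) = 0.045$$
$$m_1 \oplus m_2 \oplus m_3(\{B, F\}) = 0.0035$$
$$m_1 \oplus m_2 \oplus m_3(\Theta) = 0.0015$$
$$m_1 \oplus m_2 \oplus m_3(\emptyset) = 0$$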
Note that for this example, the sum of all the masses is less than 1:

$$\sum_X m_1 \oplus m_2 \oplus m_3(X) = 0.0285 + 0.045 + 0.0035 + 0.0015 = 0.0785$$

However, a sum of 1 is required because the combined evidence $m_1 \oplus m_2 \oplus m_3$ is a valid mass and the sum over all focal elements must be 1.
This is a problem.
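The solution to this problem is a normalization of the focal elements by dividing each focal element by $1 - \kappa$, where $\kappa$ is defined for any sets $X$ and $Y$ as

$$\kappa = \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)$$

For our problem,

$$\kappa = 0.855 + 0.0665 = 0.9215$$

Dividing each $m_1 \oplus m_2 \oplus m_3$ focal element by $1 - \kappa = 0.0785$ gives

$$m_1 \oplus m_2 \oplus m_3(\{A\}) = 0.363$$
$$m_1 \oplus m_2 \oplus m_3(\{B\}) = 0.573$$
$$m_1 \oplus m_2 \oplus m_3(\{B, F\}) = 0.045$$
$$m_1 \oplus m_2 \oplus m_3(\Theta) = 0.019$$

The one piece of evidence for $\{A\}$ has considerably reduced the belief in $\{B\}$.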
The total normalized belief in $\{B\}$ is now

$$Bel(\{B\}) = m_1 \oplus m_2 \oplus m_3(\{B\}) = 0.573$$

$$Bel(\{B\}') = Bel(\{A, F\}) = m_1 \oplus m_2 \oplus m_3(\{A, F\}) + m_1 \oplus m_2 \oplus m_3(\{A\}) + m_1 \oplus m_2 \oplus m_3(\{F\}) = 0 + 0.363 + 0 = 0.363$$

and so the evidential interval is now

$$EI(\{B\}) = [Bel(\{B\}),\ 1 - Bel(\{B\}')] = [0.573,\ 1 - 0.363] = [0.573,\ 0.637]$$
The general form of Dempster's Rule of Combination is

$$m_1 \oplus m_2(Z) = \frac{\sum_{X \cap Y = Z} m_1(X)\, m_2(Y)}{1 - \kappa}$$

Note that the rule is undefined when $\kappa = 1$.
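The general rule can be sketched as follows. The dictionary representation of masses and the name `dempster_combine` are illustrative assumptions; the conflict term is computed as the total product mass that lands on the empty set, and the three aircraft mass assignments are folded together pairwise (here `m3` is the third piece of evidence, the airliner report).

```python
from collections import defaultdict
from functools import reduce


def dempster_combine(m1, m2):
    """General form of Dempster's rule, normalizing by 1 - kappa."""
    products = defaultdict(float)
    for x, mass_x in m1.items():
        for y, mass_y in m2.items():
            products[x & y] += mass_x * mass_y
    # kappa: total mass of products whose intersection is empty.
    kappa = products.pop(frozenset(), 0.0)
    if 1.0 - kappa < 1e-12:
        raise ValueError("total conflict (kappa = 1): rule is undefined")
    return {z: mass / (1.0 - kappa) for z, mass in products.items()}


THETA = frozenset({"A", "B", "F"})   # airliner, bomber, fighter
A, B, BF = frozenset({"A"}), frozenset({"B"}), frozenset({"B", "F"})

m1 = {BF: 0.7, THETA: 0.3}
m2 = {B: 0.9, THETA: 0.1}
m3 = {A: 0.95, THETA: 0.05}

m123 = reduce(dempster_combine, [m1, m2, m3])
for subset, mass in m123.items():
    print(sorted(subset), round(mass, 3))
```

Rounded to three decimals, the result reproduces the normalized masses 0.363, 0.573, 0.045 and 0.019 from the worked example. Combining pairwise gives the same answer here because the first two pieces of evidence do not conflict, so the first step needs no normalization.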
Difficulty with the Dempster-Shafer theory
One difficulty with the Dempster-Shafer theory occurs with normalization, which may lead to results that are contrary to our expectation.
The problem is that it ignores the belief that the object being considered does not exist.
For example, the beliefs of two doctors, A and B, in a patient's illness are as follows:

$$m_A(\text{meningitis}) = 0.99, \qquad m_A(\text{brain tumor}) = 0.01$$
$$m_B(\text{concussion}) = 0.99, \qquad m_B(\text{brain tumor}) = 0.01$$

Both doctors think there is a very low chance, 0.01, of a brain tumor but greatly disagree on the major problem.
The Dempster rule of combination gives a combined belief of 1 in the brain tumor. The result is very unexpected.
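The surprising result can be reproduced numerically. The sketch below assumes each diagnosis is a singleton subset of a three-element frame and stores masses as dictionaries; the function name `dempster_combine` is illustrative, and it returns the conflict term alongside the normalized masses so that the effect of normalization is visible.

```python
from collections import defaultdict


def dempster_combine(m1, m2):
    """Return (normalized combined masses, kappa) under Dempster's rule."""
    products = defaultdict(float)
    for x, mass_x in m1.items():
        for y, mass_y in m2.items():
            products[x & y] += mass_x * mass_y
    # kappa: total mass of products whose intersection is empty.
    kappa = products.pop(frozenset(), 0.0)
    if 1.0 - kappa < 1e-12:
        raise ValueError("total conflict (kappa = 1): rule is undefined")
    combined = {z: mass / (1.0 - kappa) for z, mass in products.items()}
    return combined, kappa


MENINGITIS = frozenset({"meningitis"})
CONCUSSION = frozenset({"concussion"})
TUMOR = frozenset({"brain tumor"})

m_a = {MENINGITIS: 0.99, TUMOR: 0.01}   # doctor A
m_b = {CONCUSSION: 0.99, TUMOR: 0.01}   # doctor B

combined, kappa = dempster_combine(m_a, m_b)
print(round(kappa, 4))             # 0.9999
print(round(combined[TUMOR], 6))   # 1.0
```

Almost all of the product mass (0.9999) is conflict and is discarded. The only surviving product is 0.01 × 0.01 = 0.0001 on the brain tumor, which normalization then scales up to 1.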