Abstract

The general problem of computing posterior probabilities in Bayesian networks is NP-hard (Cooper 1990). However, efficient algorithms are often possible for particular applications by exploiting problem structures. It is well understood that the key to the materialization of such a possibility is to make use of conditional independence and work with factorizations of joint probabilities rather than joint probabilities themselves. Different exact approaches can be characterized in terms of their choices of factorizations. We propose a new approach which adopts a straightforward way of factorizing joint probabilities. In comparison with the clique tree propagation approach, our approach is very simple. It allows the pruning of irrelevant variables, it accommodates changes to the knowledge base more easily, and it is easier to implement. More importantly, it can be adapted to utilize both intercausal independence and conditional independence in one uniform framework. On the other hand, clique tree propagation is better in terms of facilitating precomputations.

Keywords: reasoning under uncertainty, Bayesian networks, algorithm

1 Introduction

Several exact approaches to the computation of posterior probabilities in Bayesian networks have been proposed, studied, and some of them implemented. The one that is most well known is clique tree propagation, which has been developed over the past few years by Pearl (1988), Lauritzen and Spiegelhalter (1988), Shafer and Shenoy (1988), and Jensen et al (1990). Other approaches include Shachter's arc reversal node reduction approach (Shachter 1988), symbolic probabilistic inference first proposed by D'Ambrosio (Shachter et al 1990), recursive decomposition by Cooper (1992), and component tree propagation by Zhang and Poole (1992).

This paper grew out of an attempt to understand those approaches and the relationships among them. We asked ourselves: are there any common principles that underlie all those approaches? If yes, what are the choices that render them different from one another? What are the advantages and disadvantages of these choices? Are there any better choices and/or any better combinations of choices?

Shachter et al (1992) have demonstrated the similarities among the various approaches. In this paper, we are more interested in the differences among them.

Cooper (1990) has proved that the general problem of computing posterior probabilities in Bayesian networks is NP-hard. In particular applications, however, it is often possible to compute them efficiently by exploiting the problem structures. The key technique that enables the materialization of such a possibility, as pointed out by Shafer and Shenoy (1988), is to work with factorizations of joint probabilities rather than the joint probabilities themselves. What all the exact approaches have in common is that they all adopt this technique, while they differ in their own choices of factorizations.
These understandings lead to a new approach that chooses a straightforward factorization for joint probabilities. Though very simple, the new approach has several advantages over clique tree propagation in terms of pruning irrelevant variables, accommodating changes to the knowledge base, and ease of implementation. It also leads to a uniform framework for utilizing both conditional independence and intercausal independence.
is identified in section 6. Clique tree propagation is one solution to this subproblem; a new and very simple one is presented in section 7.

For example, the prior joint probability Pnet1 of net1 is given by

Pnet1(a, b, c, d, e, f, g) = P(a)P(b|a)P(f|a)P(c|b)P(d|b)P(e|c, d)P(g|e, f).

A leaf is a node without children. A node is barren w.r.t. a query PN(X|Y = Y0) if it is a leaf and it is not in X ∪ Y. In net1, g is barren w.r.t. Pnet1(e|b = b0).

Theorem 1 Suppose N is a Bayesian network, and v is a leaf node. Let N' be the Bayesian network obtained from N by removing v. If v is barren w.r.t. the query PN(X|Y = Y0), then

PN(X|Y = Y0) = PN'(X|Y = Y0). (3)

Consider computing Pnet1(e|b = b0). The node g is barren w.r.t. the query and hence irrelevant. According to Theorem 1, g can be harmlessly removed. This creates a new barren node f. After the removal of g and f, net1 becomes net2. Thus the query Pnet1(e|b = b0) is reduced to the query Pnet2(e|b = b0).

Let An(X ∪ Y) be the ancestral set of X ∪ Y, i.e., the set of nodes in X ∪ Y and of the ancestors of those nodes. By repeatedly applying Theorem 1, one can easily show that

Corollary 1 All the nodes outside An(X ∪ Y) are irrelevant to the query P(X|Y = Y0).

The term m-separation is new, but the concept itself was used in Lauritzen et al (1990).

Theorem 2 Suppose N = (V, A, P) is a Bayesian network. Given a query PN(X|Y = Y0), let N' be the Bayesian network obtained from N by removing all the nodes that are m-separated from X by Y. Then

PN(X|Y = Y0) = PN'(X|Y = Y0). (4)

In our example, since a is m-separated from e by b in net2, the query can be further reduced to Pnet3(e|b = b0). Note that a is not m-separated from e by b in net1.

It has been shown (Geiger et al 1990, Lauritzen et al 1990) that, given a query, all the irrelevant nodes that are graphically recognizable can be recognized by applying those two theorems.

From equation (2), we see that PN(X|Y = Y0) can be obtained from PN(X, Y = Y0) by multiplying a normalizing constant. From now on, we will assume that all the irrelevant variables have been removed unless otherwise indicated. Under this assumption, we can replace the query PN(X|Y = Y0) with the query PN(X, Y = Y0). We call the latter a standard query. The rest of the paper will only be dealing with standard queries.

² Nodes that do not have parents.

4 Potentials

A potential is a non-negative function which takes a positive value at at least one point. Here are some example potentials: the probability P(X) is a potential of X, the conditional probability P(X|Y) is a potential of X and Y, and P(X, Y = Y0) is a potential of X.

Let S be a set of potentials over a set of variables V. The marginal potential PS(X) is defined as follows: multiply all the potentials in S together to get the joint potential PS(V); then PS(X) is obtained from PS(V) by summing out all the variables outside X. It is obvious that marginal probability is a special case of marginal potentials. For technical convenience, we shall be talking about the marginal potential PS(X, Y = Y0) instead of the posterior probability itself. One could compute a marginal potential by explicitly constructing the joint potential and then summing out variables, but this method is not efficient.

5 Factorizations

Even though the general problem of computing posterior probabilities in Bayesian networks is NP-hard, efficient algorithms often exist for particular applications due to the underlying structures. The purpose of this section is to describe a key technique that allows us to make use of one aspect of problem structure, namely conditional independencies. The technique is to work with factorizations of joint potentials (probabilities) rather than joint potentials (probabilities) themselves. We say that a set S1 of potentials is a factorization of the joint potential PS(V) if PS(V) is the result of multiplying all the potentials in S1 together. The set S of prior and conditional probabilities of a Bayesian network is itself such a factorization, a straightforward one because it is what one has to begin with. We call S the primary factorization of PS(V).

We will see later that clique tree propagation does not directly adopt the primary factorization. Rather, it first performs some pre-organizations and precomputations on S and then proceeds with the resulting more organized factorization.
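The pruning licensed by Corollary 1 is straightforward to mechanize. Below is a minimal sketch in Python; the function names and the dictionary encoding of the network are ours, and the sketch covers only ancestral-set pruning, not the m-separation test of Theorem 2.

```python
def ancestral_set(parents, nodes):
    """Return the ancestral set An(nodes): the nodes themselves plus all ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(parents[v])
    return result

def prune_for_query(parents, query_vars, evidence_vars):
    """Corollary 1: all nodes outside An(X u Y) are irrelevant to P(X | Y = Y0)."""
    keep = ancestral_set(parents, set(query_vars) | set(evidence_vars))
    return {v: ps for v, ps in parents.items() if v in keep}

# net1 from Figure 1, encoded as child -> list of parents:
# a -> b, a -> f, b -> c, b -> d, (c, d) -> e, (e, f) -> g.
net1 = {'a': [], 'b': ['a'], 'f': ['a'], 'c': ['b'],
        'd': ['b'], 'e': ['c', 'd'], 'g': ['e', 'f']}

kept = prune_for_query(net1, ['e'], ['b'])
# For the query P_net1(e | b = b0), g (barren) and then f drop out;
# a, b, c, d, e remain.
```

Note that this recovers only the reduction from net1 to net2; in the example above, m-separation additionally removes a, giving net3.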
Exponential explosion can be in terms of both storage space and time. It is quite easy to see why factorization is able to help us save space. For the sake of illustration, consider the Bayesian network net1 in Figure 1. If all the variables are binary, to store the set of potentials of net1, i.e., the prior and conditional probabilities, one needs to store 2 + 4 × 4 + 2 × 8 = 34 numbers. On the other hand, to explicitly store the joint potential (probability) Pnet1(a, b, c, d, e, f, g) itself, one needs to store 2^7 = 128 numbers.

To see how factorizations of joint potentials enable us to save time, we assume that the summing-out-variables-one-by-one strategy is adopted for computing marginal potentials.³ We also assume that an ordering has been given for this purpose. This ordering will be referred to as the elimination ordering.

Since we choose to work with a factorization, which is a set of potentials, we need to define how to sum out one variable from a set of potentials. To sum out a variable v from a set of potentials S is to: (1) remove from S all the potentials that contain v, (2) multiply all those potentials together, (3) sum out v from the product, and (4) add the resulting potential to S.

For example, to sum a out of net1, we first remove P(a), P(b|a) and P(f|a) from net1, then compute

ψ(b, f) = Σ_a P(a)P(f|a)P(b|a), (5)

and finally add ψ(b, f) to the set of remaining potentials.

Usually, it takes many fewer arithmetic calculations to sum out one variable from a factorization of a joint potential than from the joint potential itself. For example, equation (5) denotes all the arithmetic calculations needed to sum out a from net1. It involves only three variables: a, b, and f. On the other hand, to sum out a explicitly from Pnet1(a, b, c, d, e, f, g) itself, one needs to perform the calculations

Σ_a Pnet1(a, b, c, d, e, f, g),

which involve all the seven variables in the network. This is exactly why working with factorizations of joint potentials enables us to reduce time complexity.

6 Three components

The first component finds an elimination ordering. We call it the ordering determination component. Roughly speaking, an elimination ordering is good if the arithmetic calculations needed to sum out each variable, from the primary factorization, involve only a small number of other variables. Even with a clear and crisp definition, the ordering determination problem proves to be a difficult one. See Kjærulff (1990) and Klein et al (1990) for research progress on the problem. In this paper, we shall not discuss it any further.

The third component is the arithmetic calculation component. It takes a set of potentials, multiplies them together, sums out a certain variable from the product, and returns the result. A major goal in designing Bayesian network inference algorithms is to minimize the total number of arithmetic calculations.

In between the first and the third components lies the data management component. It determines, from the elimination ordering produced by the first component, what arithmetic calculations the third component is going to perform, and in which order. The component also hides the design decisions as to how to store the potentials, how to retrieve a potential when it is needed, and how to update the set of potentials after a variable has been summed out. We call a design of the data management component a data management scheme.

Among the existing exact approaches to Bayesian networks, clique tree propagation and component tree propagation (Zhang and Poole 1992) are mixtures of data management schemes and ordering determination methods.

In the remainder of the paper, we shall first propose a very simple data management scheme (section 7), and we shall compare this scheme with clique tree propagation and the arc reversal and node reduction approach (sections 8 and 9).

7 A simple data management scheme

The following algorithm describes our design of the data management component. It takes, as input, a set of potentials S, a standard query (X, Y = Y0), and an elimination ordering Ordering. The output is PS(X, Y = Y0).
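The four-step summing-out operation and the scheme's overall loop can be sketched as follows. This is our own minimal rendering, not the paper's code: potentials are stored as explicit tables, every variable is assumed to have a small finite domain, and the function names are invented for illustration.

```python
from itertools import product

# A potential is a pair (vars, table): `vars` is a tuple of variable names and
# `table` maps each assignment (one value per variable) to a non-negative number.

def multiply(potentials, domains):
    """Multiply potentials into one potential over the union of their variables."""
    vars_ = tuple(sorted({v for vs, _ in potentials for v in vs}))
    table = {}
    for assignment in product(*(domains[v] for v in vars_)):
        env = dict(zip(vars_, assignment))
        value = 1.0
        for vs, t in potentials:
            value *= t[tuple(env[v] for v in vs)]
        table[assignment] = value
    return vars_, table

def sum_out(v, potentials, domains):
    """Sum variable v out of a set of potentials (steps (1)-(4) of section 5)."""
    containing = [p for p in potentials if v in p[0]]   # (1) remove potentials with v
    rest = [p for p in potentials if v not in p[0]]
    vs, table = multiply(containing, domains)           # (2) multiply them together
    i = vs.index(v)
    new_table = {}
    for assignment, value in table.items():             # (3) sum v out of the product
        key = assignment[:i] + assignment[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + value
    new_vars = tuple(u for u in vs if u != v)
    return rest + [(new_vars, new_table)]               # (4) put the result back into S

def marginal(potentials, keep, ordering, domains):
    """Sum out the variables in `ordering` one by one, then combine what is left."""
    for v in ordering:
        if v not in keep:
            potentials = sum_out(v, potentials, domains)
    return multiply(potentials, domains)

# Two binary variables: P(a) and P(b|a); summing out a yields the marginal P(b).
domains = {'a': (0, 1), 'b': (0, 1)}
p_a = (('a',), {(0,): 0.6, (1,): 0.4})
p_b_given_a = (('a', 'b'), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
vars_, table = marginal([p_a, p_b_given_a], {'b'}, ['a'], domains)
# table[(0,)] = 0.6*0.9 + 0.4*0.2 and table[(1,)] = 0.6*0.1 + 0.4*0.8, i.e. P(b).
```

With the evidence variables instantiated first, the same loop computes the marginal potential PS(X, Y = Y0) of a standard query.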
Consider, for example, the query Pnet1(g, a = a0). Fixing a to a0 leaves the potentials P(a0), P(b|a0), P(f|a0), P(c|b), P(d|b), P(e|c, d), and P(g|e, f). Summing out d from the current set of potentials yields, for instance, ψ(e) = Σ_d ψ(d, e); repeating this for each variable in the elimination ordering eventually leaves a potential of g alone, which is Pnet1(g, a = a0).

8 Comparison with clique tree propagation

In clique tree propagation, each prior or conditional probability is attached to a clique that contains its variables; thus the initial factorization for clique tree propagation consists of one and only one potential for each clique. Figure 2 (1) shows a clique tree for the Bayesian network net1 in Figure 1, together with a grouping of its prior and conditional probabilities (potentials). Messages are passed from node (abf) to (bfe), and then to (efg). Clique tree propagation in effect works with an elimination ordering for the empty query P(). There are linear time algorithms to obtain an elimination ordering from a clique tree and to get back the clique tree from the ordering (see, for example, Zhang 1991).

To compare our data management scheme with clique tree propagation, we notice first that our approach handles changes to the knowledge base more easily. The clique tree is a secondary structure. Any topology changes to the original network, like adding or deleting a variable, or adding or deleting an arc, require recomputing the clique tree and the potential associated with each clique. In the Jensen et al (1990) scheme, one has to recompute the marginal probabilities for all the cliques even when there are only numerical adjustments to the conditional probabilities.

Secondly, if in a query PN(X|Y = Y0), X is not contained in any clique of the clique tree, then the secondary structure has to be modified, at least temporarily. This is even more cumbersome in the Jensen et al (1990) scheme.

Two major issues in comparing our approach with clique tree propagation are pruning and precomputing, to which we devote the next subsection.

8.2 Pruning vs. precomputing

Pruning irrelevant variables and precomputing the prior probabilities of some variables are two techniques to cut down computations. In this section, we shall illustrate those two techniques through an example and discuss some related issues.

Consider the query about the posterior probability Pnet4(e|h = h0), where net4 is given in Figure 3 together with net5 and net6. One can first prune f because it is barren. Thereafter, g and r can also be pruned because g becomes barren after the removal of f and r becomes m-separated from e by h. Thus, pruning enables one to transform the original query into a query to the smaller network net5. If, in addition, an appropriate prior probability is precomputed, the query Pnet4(e|h = h0) can be immediately transformed into Pnet6(e|h = h0), a query to a Bayesian network with three variables less.

So, both pruning and precomputing enable us to cut down computations. However, they both have prices to pay as well.

Pruning irrelevant variables implies that we will be working with a potentially different sub-network for each query. This usually means that an elimination ordering is to be found for each particular query from scratch, instead of deriving it from an ordering for the empty query.

We argue that this is not a serious drawback for two reasons. First, if a query only involves a few variables, pruning may give us a sub-network which is only a small portion of the original network. After all, only those variables that are ancestors of variables in the query could be relevant to the query. Thus pruning makes finding a good elimination ordering much easier. Second, some existing heuristics for finding elimination orderings, like maximal cardinality search and lexicographic search (Rose 1970, Kjærulff 1990), are quite simple. It does not take much more time to find an ordering from scratch.

As far as precomputing is concerned, it is difficult to decide what to precompute. For example, precomputing P(g) is not very helpful in the previous example. One solution is to compute the prior probabilities for all the combinations of variables. But this implies an exponential number of precomputations.

Clique tree propagation works on the secondary structure of the clique tree, which is kept static. This makes precomputing possible (Jensen et al 1990). As we pointed out earlier, however, there is a price to pay. The approach does not prune irrelevant variables, and it is hard for it to accommodate changes to the knowledge base.
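For concreteness, maximal cardinality search, one of the simple ordering heuristics cited above (Rose 1970), can be sketched as follows; the graph encoding and function name are ours, and the tiny chain graph is a made-up example.

```python
def max_cardinality_search(neighbors):
    """Number the vertices of an undirected graph, always picking next an
    unnumbered vertex with the most already-numbered neighbors (ties arbitrary)."""
    numbered = []
    remaining = set(neighbors)
    while remaining:
        v = max(remaining,
                key=lambda u: sum(1 for w in neighbors[u] if w in numbered))
        numbered.append(v)
        remaining.remove(v)
    return numbered

# A chain a - b - c (a made-up moral graph).
chain = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
order = max_cardinality_search(chain)
# Eliminating variables in reverse MCS order is one common way to obtain
# an elimination ordering.
```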
8.3 Intercausal independence

A major reason for us to come up with a new data management scheme is that it leads to a uniform framework for utilizing conditional independence as well as intercausal independence.

In Zhang and Poole (1994), we give a constructive definition of intercausal independence. Noisy OR-gates and noisy adders satisfy our definition. A nice property of our definition is that it relates intercausal independence with factorization of conditional probabilities, in a way very similar to that in which conditional independence is related to factorization of joint probabilities. The only difference lies in the way the factors are combined. While conditional independence implies that a joint probability can be factorized as a multiplication of several factors, intercausal independence implies that a conditional probability can be factorized as a certain combination of several factors, where the combination is usually not multiplication.

The concept of heterogeneous factorization is proposed. A heterogeneous factorization is one where different factors can be combined in different ways. We have adapted the data management scheme proposed in this paper to handle heterogeneous factorizations.

9 Conclusions

A key technique to reduce time and space complexities in Bayesian networks is to work with factorizations of joint potentials rather than the joint potentials themselves. Different exact approaches can be characterized by their choices of factorizations. We have proposed a new approach which begins with the primary factorization. Our approach is simpler than clique tree propagation. Yet it is advantageous in terms of pruning irrelevant variables and accommodating changes to the knowledge base. More importantly, our approach leads to a uniform framework for dealing with both conditional and intercausal independence. However, our approach does not support precomputation as clique tree propagation does.

Acknowledgment:

We wish to thank Mike Horsch for his valuable comments on a draft of this paper and Runping Qi for useful discussions. Research is supported by NSERC Grant OGPOO44121 and travel grants from Hong Kong University of Science and Technology.

References:

M. Baker and T. E. Boult (1990), Pruning Bayesian networks for efficient computation, in Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, July, Cambridge, Mass., pp. 257-264.

G. F. Cooper (1990), The computational complexity of probabilistic inference using Bayesian belief networks, Artificial Intelligence, 42, pp. 393-405.

G. F. Cooper (1990), Bayesian belief-network inference using recursive decomposition, Report No. KSL 90-05, Knowledge Systems Laboratory, Medical Computer Science, Stanford University.

D. Geiger, T. Verma, and J. Pearl (1990), d-separation: From theorems to algorithms, in Uncertainty in Artificial Intelligence 5, pp. 139-148.

F. V. Jensen, K. G. Olesen, and K. Anderson (1990), An algebra of Bayesian belief universes for knowledge-based systems, Networks, 20, pp. 637-659.

U. Kjærulff (1990), Triangulation of graphs: Algorithms giving small total state space, R 90-09, Institute for Electronic Systems, Department of Mathematics and Computer Science, Strandvejen, DK 9000 Aalborg, Denmark.

P. Klein, A. Agrawal, A. Ravi, and S. Rao (1990), Approximation through multicommodity flow, in Proceedings of the 31st Symposium on Foundations of Computer Science, pp. 726-737.

S. L. Lauritzen and D. J. Spiegelhalter (1988), Local computations with probabilities on graphical structures and their applications to expert systems, Journal of the Royal Statistical Society B, 50: 2, pp. 157-224.

S. L. Lauritzen, A. P. Dawid, B. N. Larsen, and H. G. Leimer (1990), Independence properties of directed Markov fields, Networks, 20, pp. 491-506.

J. Pearl (1988), Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, Los Altos, CA.

D. Poole and E. Neufeld (1991), Sound probabilistic inference in Prolog: An executable specification of Bayesian networks, Department of Computer Science, University of British Columbia, Vancouver, B.C., V6T 1Z2, Canada.

D. Poole (1992), Search for computing posterior probabilities in Bayesian networks, Technical Report 92-24, Department of Computer Science, University of British Columbia, Vancouver, Canada.

D. J. Rose (1970), Triangulated graphs and the elimination process, Journal of Mathematical Analysis and Applications, 32, pp. 597-609.

R. Shachter (1986), Evaluating influence diagrams, Operations Research, 34, pp. 871-882.

R. Shachter (1988), Probabilistic inference and influence diagrams, Operations Research, 36, pp. 589-605.

R. D. Shachter, S. K. Andersen, and P. Szolovits (1992), The equivalence of exact methods for probabilistic inference in belief networks, Department of Engineering-Economic Systems, Stanford University.

R. D. Shachter, B. D'Ambrosio, and B. A. Del Favero (1990), Symbolic probabilistic inference in belief networks, in AAAI-90, pp. 126-131.

G. Shafer and P. Shenoy (1988), Local computation in hypertrees, Working Paper No. 201, Business School, University of Kansas.

L. Zhang (1991), Studies on hypergraphs (I): Hyperforests, accepted for publication in Discrete Applied Mathematics.

L. Zhang and D. Poole (1992), Sidestepping the triangulation problem in Bayesian net computations, in Proc. of the 8th Conference on Uncertainty in Artificial Intelligence, July 17-19, Stanford University, pp. 360-367.

L. Zhang and D. Poole (1994), Intercausal independence and heterogeneous factorizations, submitted to the Tenth Conference on Uncertainty in Artificial Intelligence.