• Directed Acyclic Graph (DAG)
  – Nodes are random variables
  – Edges indicate causal influences
• Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents (conditioning case).
  – Roots (sources) of the DAG that have no parents are given prior probabilities.
[Figure: the burglary network. Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.]

P(B) = .001        P(E) = .002

B  E  | P(A)
T  T  | .95
T  F  | .94
F  T  | .29
F  F  | .001

A | P(J)           A | P(M)
T | .90            T | .70
F | .05            F | .01
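For concreteness, here is a minimal sketch of how this network's CPTs could be written down in Python. The dict layout and variable names are illustrative, not from the slides; each table stores P(node = True) for every conditioning case, with the complementary probability implied.

```python
# Burglary network from the figure, encoded as plain dicts (illustrative layout).
# Each CPT maps a conditioning case to P(node = True).
priors = {
    'Burglary':   0.001,
    'Earthquake': 0.002,
}

cpt_alarm = {          # P(Alarm = T | Burglary, Earthquake)
    (True,  True):  0.95,
    (True,  False): 0.94,
    (False, True):  0.29,
    (False, False): 0.001,
}

cpt_john = {True: 0.90, False: 0.05}   # P(JohnCalls = T | Alarm)
cpt_mary = {True: 0.70, False: 0.01}   # P(MaryCalls = T | Alarm)
```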
Naïve Bayes as a Bayes Net

• Naïve Bayes is a simple Bayes Net

[Figure: Y is the single parent of X1, X2, …, Xn]

• Priors P(Y) and conditionals P(Xi|Y) for Naïve Bayes provide CPTs for the network.

Independencies in Bayes Nets

• If removing a subset of nodes S from the network renders nodes Xi and Xj disconnected, then Xi and Xj are independent given S, i.e. P(Xi | Xj, S) = P(Xi | S)
• However, this is too strict a criterion for conditional independence, since two nodes will never be judged independent whenever there simply exists some variable that depends on both.
  – For example, Burglary and Earthquake should be considered independent even though they both cause Alarm (and so stay connected through it).
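The Burglary/Earthquake point can be checked numerically from the CPTs given earlier. The sketch below verifies by brute-force enumeration that the two are independent a priori but become dependent once Alarm is observed (the standard "explaining away" effect); the helper names are my own.

```python
from itertools import product

# Numeric check: Burglary and Earthquake are independent a priori,
# but become dependent once Alarm is observed ("explaining away").
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def pr(p, v):                 # P(X = v) given p = P(X = True)
    return p if v else 1.0 - p

def joint(b, e, a):           # P(B = b, E = e, A = a)
    return pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)

def cond_b(evidence):         # P(B = T | evidence), evidence = dict of fixed vars
    num = den = 0.0
    for b, e, a in product((True, False), repeat=3):
        assignment = {'B': b, 'E': e, 'A': a}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, e, a)
        den += p
        if b:
            num += p
    return num / den

print(cond_b({'E': True}), P_B)         # equal: marginally independent
print(cond_b({'A': True, 'E': True}),   # ~0.003
      cond_b({'A': True}))              # ~0.37: dependent given Alarm
```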
[Figure: the burglary network with Burglary (prior .001) as the queried node and JohnCalls observed; P(J | A=T) = .90, P(J | A=F) = .05]

John also calls 5% of the time when there is no Alarm. So over 1,000 days we expect 1 Burglary, and John will probably call. However, he will also call with a false report 50 times on average. So the call is about 50 times more likely to be a false report: P(Burglary | JohnCalls) ≈ 0.02

[Figure: the same network and query]

The actual probability of Burglary is 0.016, since the alarm is not perfect (an Earthquake could have set it off, or it could have gone off on its own). On the other hand, even if there was no alarm and John called incorrectly, there could have been an undetected Burglary anyway, but this is unlikely.
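The 0.016 figure can be checked by brute-force enumeration over the joint distribution defined by the CPTs above. A sketch (MaryCalls marginalizes out and is omitted; the helper names are my own):

```python
from itertools import product

# Exact inference by enumeration: P(Burglary | JohnCalls = True),
# using the CPTs from the burglary-network figure.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}          # P(JohnCalls = T | Alarm)

def pr(p, v):                            # P(X = v) given p = P(X = True)
    return p if v else 1.0 - p

joint = {True: 0.0, False: 0.0}          # summed over E and A, with J = True
for b, e, a in product((True, False), repeat=3):
    joint[b] += pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a) * pr(P_J[a], True)

print(joint[True] / (joint[True] + joint[False]))   # ≈ 0.0163
```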
Types of Inference

[Figure]

Sample Inferences

[Figure]
Belief Propagation Details

• Each node B acts as a simple processor which maintains a vector λ(B) for the total evidential support for each value of its corresponding variable and an analogous vector π(B) for the total causal support.
• The belief vector BEL(B) for a node, which maintains the probability for each value, is calculated as the normalized (elementwise) product:

  BEL(B) = α λ(B) π(B)

• Computation at each node involves λ and π message vectors sent between nodes and consists of simple matrix calculations using the CPT to update belief (the λ and π node vectors) for each node based on new evidence.

Belief Propagation Details (cont.)

• Assumes the CPT for each node is a matrix (M) with a column for each value of the node's variable and a row for each conditioning case (all rows must sum to 1).

Matrix M for the Alarm node (rows are values of Burglary and Earthquake; columns are values of Alarm):

  B  E  |   T      F
  T  T  |  0.95   0.05
  T  F  |  0.94   0.06
  F  T  |  0.29   0.71
  F  F  |  0.001  0.999

• Propagation algorithm is simplest for trees in which each node has only one parent (i.e. one cause).
• To initialize, λ(B) for all leaf nodes is set to all 1's and π(B) of all root nodes is set to the priors given in the CPT. Belief based on the root priors is then propagated down the tree to all leaves to establish priors for all nodes.
• Evidence is then added incrementally and the effects propagated to other nodes.
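A minimal numeric illustration of this λ/π bookkeeping for the single link Alarm → JohnCalls. This is a sketch, not the full algorithm: the prior on Alarm (≈ .0025) is what the burglary network's CPTs marginalize to, and the message equations are the standard ones for a tree with one parent per node.

```python
import numpy as np

# lambda/pi updates for a one-parent chain: Alarm -> JohnCalls.
# M follows the slides' convention: a row per conditioning case,
# a column per value of the child, rows summing to 1.
M = np.array([[0.90, 0.10],    # Alarm = T: P(J = T), P(J = F)
              [0.05, 0.95]])   # Alarm = F

pi_A  = np.array([0.0025, 0.9975])  # causal support at the root (its prior)
lam_J = np.ones(2)                  # leaf initialized to all 1's

pi_J = M.T @ pi_A                   # downward pass: prior belief at the leaf

lam_J = np.array([1.0, 0.0])        # evidence arrives: JohnCalls = True
lam_A = M @ lam_J                   # upward pass: lambda message to the parent

bel_A = lam_A * pi_A                # BEL(A) = alpha * lambda(A) * pi(A)
bel_A /= bel_A.sum()
print(bel_A)                        # posterior over Alarm given the call
```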
Statistical Revolution

• Across AI there has been a movement from logic-based approaches to approaches based on probability and statistics.
  – Statistical natural language processing
  – Statistical computer vision
  – Statistical robot navigation
  – Statistical learning
• Most approaches are feature-based and “propositional” and do not handle complex relational descriptions with multiple entities like those typically requiring predicate logic.

Structured (Multi-Relational) Data

• In many domains, data consists of an unbounded number of entities with an arbitrary number of properties and relations between them.
  – Social networks
  – Biochemical compounds
  – Web sites
[Figures: a web-site example with Faculty, Grad Student, Research Project, and Other entities; predicting mutagenicity of biochemical compounds (Srinivasan et al., 1995)]
Probabilistic AI Paradigm

• Represents knowledge and data as a fixed set of random variables with a joint probability distribution.
  + Handles uncertain knowledge and probabilistic reasoning.
  − Unable to handle arbitrary sets of objects, with properties, relations, quantifiers, etc.

Statistical Relational Models

• Integrate methods from predicate logic (or relational databases) and probabilistic graphical models to handle structured, multi-relational data.
  – Probabilistic Relational Models (PRMs)
  – Stochastic Logic Programs (SLPs)
  – Bayesian Logic Programs (BLPs)
  – Relational Markov Networks (RMNs)
  – Markov Logic Networks (MLNs)
  – Other TLAs
Conclusions
• Bayesian learning methods are firmly based on
probability theory and exploit advanced methods
developed in statistics.
• Naïve Bayes is a simple generative model that works
fairly well in practice.
• A Bayesian network allows specifying a limited set
of dependencies using a directed graph.
• Inference algorithms allow determining the
probability of values for query variables given
values for evidence variables.