
Korb & Nicholson

Bayesian AI Tutorial

Kevin B. Korb and Ann E. Nicholson
School of Computer Science and Software Engineering
Monash University
Clayton, VIC 3168
AUSTRALIA

{korb,annn}@csse.monash.edu.au
http://www.csse.monash.edu.au/~korb


Overview

1. Introduction to Bayesian AI (20 min)
2. Bayesian networks (40 min)
   Lunch
3. Bayesian networks cont'd (10 min)
4. Applications (50 min)
   Break (10 min)
5. Learning Bayesian networks (50 min)
6. Current research issues (10 min)

Introduction to Bayesian AI

- Reasoning under uncertainty
- Probabilities
- Alternative formalisms
  – Fuzzy logic
  – MYCIN's certainty factors
  – Default logic
- Bayesian philosophy
  – Dutch book arguments
  – Bayes' Theorem
  – Conditionalization
  – Confirmation theory
- Bayesian decision theory
- Towards a Bayesian AI


Reasoning under uncertainty

Uncertainty: the quality or state of being not clearly known.
This encompasses most of what we understand about the world, and most of what we would like our AI systems to understand.
It distinguishes deductive knowledge (e.g., mathematics) from inductive belief (e.g., science).

Sources of uncertainty:
- Ignorance (which side of this coin is up?)
- Physical randomness (which side of this coin will land up?)
- Vagueness (which tribe am I closest to genetically? Picts? Angles? Saxons? Celts?)


Probabilities

The classic approach to reasoning under uncertainty (Blaise Pascal and Fermat).

Kolmogorov's axioms:
1. P(U) = 1
2. For all X ⊆ U, P(X) ≥ 0
3. For all X, Y ⊆ U, if X ∩ Y = ∅ then P(X ∨ Y) = P(X) + P(Y)

Conditional probability: P(X|Y) = P(X ∧ Y) / P(Y)

Independence: X ⊥ Y iff P(X|Y) = P(X)


Fuzzy Logic

Designed to cope with vagueness: is Fido a Labrador or a Shepherd?

Fuzzy set theory: m(Fido ∈ Labrador) = m(Fido ∈ Shepherd) = 0.5

Extended to fuzzy logic, which takes intermediate truth values: T(Labrador(Fido)) = 0.5.

Combination rules:
- T(p ∧ q) = min(T(p), T(q))
- T(p ∨ q) = max(T(p), T(q))
- T(¬p) = 1 − T(p)

Not suitable for coping with randomness or ignorance. Obviously not:
Uncertainty(inclement weather) = max(Uncertainty(rain), Uncertainty(hail), ...)
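For concreteness, the fuzzy combination rules can be sketched in a few lines; the 0.5 truth values are the slide's Fido example:

```python
# Fuzzy-logic combination rules from the slide, applied to the Fido example.
def f_and(tp, tq):          # T(p ^ q) = min(T(p), T(q))
    return min(tp, tq)

def f_or(tp, tq):           # T(p v q) = max(T(p), T(q))
    return max(tp, tq)

def f_not(tp):              # T(~p) = 1 - T(p)
    return 1 - tp

t_labrador = 0.5            # T(Labrador(Fido))
t_shepherd = 0.5            # T(Shepherd(Fido))

print(f_and(t_labrador, t_shepherd))  # 0.5
print(f_or(t_labrador, t_shepherd))   # 0.5
print(f_not(t_labrador))              # 0.5
```

Note how the max rule never lets a disjunction become more certain than its strongest disjunct, which is exactly the objection above for handling randomness.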

MYCIN's Certainty Factors

An uncertainty formalism developed for the early expert system MYCIN (Buchanan and Shortliffe, 1984).

Elicit for (h, e):
- measure of belief: MB(h, e) ∈ [0, 1]
- measure of disbelief: MD(h, e) ∈ [0, 1]

  CF(h, e) = MB(h, e) − MD(h, e) ∈ [−1, 1]

Special functions provided for combining evidence.

Problems:
- No semantics ever given for 'belief'/'disbelief'.
- Heckerman (1986) proved that the restrictions required for a probabilistic semantics imply absurd independence assumptions.


Default Logic

Intended to reflect "stereotypical" reasoning under uncertainty (Reiter 1980). Example:

  Bird(Tweety) : Bird(x) → Flies(x)
  ∴ Flies(Tweety)

Problems:
- The best semantics for default rules are probabilistic (Pearl 1988, Korb 1995).
- Mishandles combinations of low-probability events. E.g.,

  ApplyForJob(me) : ApplyForJob(x) → Reject(x)
  ∴ Reject(me)

  I.e., the dole always looks better than applying for a job!


Probability Theory

So, why not use probability theory to represent uncertainty?
That's what it was invented for... dealing with physical randomness and degrees of ignorance.
Furthermore, if you make bets which violate probability theory, you are subject to Dutch books:

  A Dutch book is a sequence of "fair" bets which collectively guarantee a loss.

Fair bets are bets based upon the standard odds-probability relation:

  O(h) = P(h) / (1 − P(h))
  P(h) = O(h) / (1 + O(h))


A Dutch Book

Payoff table on a bet for h (Odds = p/(1 − p); S = betting unit):

  h    Payoff
  T    $(1 − p)S
  F    −$pS

Given a fair bet, the expected value from such a payoff is always $0.

Now, let's violate the probability axioms.

Example: say P(A) = −0.1 (violating A2).
Payoff table against A (the inverse of: for A), with S = 1:

  ¬A   Payoff
  T    $pS = −$0.10
  F    −$(1 − p)S = −$1.10

Either way, the bettor is guaranteed to lose.
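The payoff arithmetic above is easy to check mechanically; this sketch (with the slide's S = 1) shows the fair bet breaking even in expectation and the incoherent P(A) = −0.1 losing on both outcomes:

```python
# A bet *for* h at probability p pays $(1-p)S if h is true and -$pS if not;
# betting *against* h flips the signs.
def payoff_for(p, h_true, S=1.0):
    return (1 - p) * S if h_true else -p * S

def payoff_against(p, h_true, S=1.0):
    return -payoff_for(p, h_true, S)

# A fair bet has expected value $0 when p is the true probability of h.
p = 0.4
ev = p * payoff_for(p, True) + (1 - p) * payoff_for(p, False)
print(abs(ev) < 1e-12)               # True

# Violating axiom A2 with P(A) = -0.1: betting against A loses either way.
print(payoff_against(-0.1, True))    # -1.1  (A turns out true)
print(payoff_against(-0.1, False))   # -0.1  (A turns out false)
```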

Bayes' Theorem; Conditionalization

Due to Reverend Thomas Bayes (1764):

  P(h|e) = P(e|h) P(h) / P(e)

Conditionalization: P′(h) = P(h|e)

Or, read Bayes' theorem as:

  Posterior = Likelihood × Prior / Prob of evidence

Assumptions:
1. Joint priors over {hᵢ} and e exist.
2. Total evidence: e, and only e, is learned.


Bayesian Decision Theory

— Frank Ramsey (1931)

Decision making under uncertainty: what action to take (plan to adopt) when the future state of the world is not known.

Bayesian answer: find the utility of each possible outcome (action-state pair) and take the action that maximizes expected utility.

Example:

  Action           Rain (p = .4)   Shine (1 − p = .6)
  Take umbrella         30               10
  Leave umbrella      −100               50

Expected utilities:
  E(Take umbrella) = (30)(.4) + (10)(.6) = 18
  E(Leave umbrella) = (−100)(.4) + (50)(.6) = −10
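The expected-utility calculation in the example can be reproduced directly; the action names are just shorthand for the slide's two options:

```python
# Expected utility of each action under P(Rain) = 0.4, from the table above.
p_rain = 0.4
utility = {
    ("take", "rain"): 30, ("take", "shine"): 10,
    ("leave", "rain"): -100, ("leave", "shine"): 50,
}

def expected_utility(action):
    return (utility[(action, "rain")] * p_rain
            + utility[(action, "shine")] * (1 - p_rain))

print(round(expected_utility("take"), 6))   # 18.0
print(round(expected_utility("leave"), 6))  # -10.0
print(max(["take", "leave"], key=expected_utility))  # take
```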


Bayesian AI

A Bayesian conception of an AI is: an autonomous agent which
- has a utility structure (preferences)
- can learn about its world and the relation between its actions and future states (probabilities)
- maximizes its expected utility

The techniques used in learning about the world are (primarily) statistical... hence Bayesian data mining.


Bayesian Networks: Overview

- Syntax
- Semantics
- Evaluation methods
- Influence diagrams (decision networks)
- Dynamic Bayesian networks

Bayesian Networks

- A data structure that represents the dependence between variables.
- Gives a concise specification of the joint probability distribution.
- A Bayesian network is a graph in which the following holds:
  1. A set of random variables makes up the nodes in the network.
  2. A set of directed links or arrows connects pairs of nodes.
  3. Each node has a conditional probability table that quantifies the effects the parents have on the node.
  4. The graph is directed and acyclic (a DAG), i.e. it has no directed cycles.


Example: Earthquake (Pearl)

- Pearl has a new burglar alarm installed.
- It is reliable at detecting burglary, but also responds to minor earthquakes.
- Two neighbours (John, Mary) promise to call you at work when they hear the alarm.
  – John always calls when he hears the alarm, but confuses the alarm with the phone ringing (and calls then also).
  – Mary likes loud music and sometimes misses the alarm!
- Given evidence about who has and hasn't called, estimate the probability of a burglary.


Earthquake Example: Network Structure

(Figure: Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls.)

  P(B) = 0.001        P(E) = 0.002

  B  E   P(A|B,E)
  T  T   0.95
  T  F   0.94
  F  T   0.29
  F  F   0.001

  A   P(J|A)          A   P(M|A)
  T   0.90            T   0.70
  F   0.05            F   0.01


Earthquake Example: Notes

- Assumptions: John and Mary don't perceive burglary directly, and they do not feel minor earthquakes.
- Note: there is no info about loud music, or about the telephone ringing and confusing John. This is summarised in the uncertainty in the links from Alarm to JohnCalls and MaryCalls.
- Once the topology is specified, we need to specify a conditional probability table (CPT) for each node.
  – Each row contains the conditional probability of each node value for a conditioning case.
  – Each row must sum to 1.
  – A table for a Boolean variable with n Boolean parents contains 2^(n+1) probabilities.
  – A node with no parents has one row (the prior probabilities).

Semantics of Bayesian Networks

- A (more compact) representation of the joint probability distribution.
  – helpful in understanding how to construct the network
- An encoding of a collection of conditional independence statements.
  – helpful in understanding how to design inference procedures


Representing the joint probability distribution

  P(X₁ = x₁, X₂ = x₂, ..., Xₙ = xₙ)
  = P(x₁, x₂, ..., xₙ)
  = P(x₁) × P(x₂|x₁) × ... × P(xₙ|x₁ ∧ ... ∧ xₙ₋₁)
  = ∏ᵢ P(xᵢ | x₁ ∧ ... ∧ xᵢ₋₁)
  = ∏ᵢ P(xᵢ | π(Xᵢ))

where π(Xᵢ) denotes the parents of Xᵢ.

Example:

  P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
  = P(J|A) P(M|A) P(A|¬B ∧ ¬E) P(¬B) P(¬E)
  = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00063
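As a sketch, the factorization can be applied to the earthquake network's CPTs (values as given in the tutorial) to reproduce the example's product:

```python
# Chain-rule factorization on Pearl's earthquake network.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}   # P(J=T | A)
P_M = {True: 0.70, False: 0.01}   # P(M=T | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) = prod_i P(x_i | parents(X_i))."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(J ^ M ^ A ^ ~B ^ ~E) from the slide:
print(round(joint(False, False, True, True, True), 5))  # 0.00063
```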


Network Construction

1. Choose the set of relevant variables Xᵢ that describe the domain.
2. Choose an ordering for the variables.
3. While there are variables left:
   (a) Pick a variable Xᵢ and add a node to the network for it.
   (b) Set π(Xᵢ) to some minimal set of nodes already in the net such that the conditional independence property is satisfied:
       P(Xᵢ | Xᵢ₋₁, ..., X₁) = P(Xᵢ | π(Xᵢ))
   (c) Define the CPT for Xᵢ.


Compactness and Node Ordering

- The compactness of a BN is an example of a locally structured (or sparse) system.
- The correct order in which to add nodes is to add the "root causes" first, then the variables they influence, and so on until the "leaves" are reached.
- Examples of wrong orderings (which still represent the same joint distribution):
  1. MaryCalls, JohnCalls, Alarm, Burglary, Earthquake.
     (Figure: the resulting network over MaryCalls, JohnCalls, Alarm, Burglary and Earthquake, denser than the one from the causal ordering.)

Compactness and Node Ordering (cont.)

  2. MaryCalls, JohnCalls, Earthquake, Burglary, Alarm.
     (Figure: the resulting network over MaryCalls, JohnCalls, Earthquake, Burglary and Alarm.)

     As many probabilities as the full joint distribution! See below for why.


Conditional Independence: Causal Chains

Causal chains give rise to conditional independence:

  A → B → C

  P(C | A ∧ B) = P(C | B)

Example:
- A = Jack's flu
- B = severe cough
- C = Jill's flu
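A quick way to see the chain independence is to enumerate a small A → B → C network; the CPT numbers here are made up for illustration:

```python
# Chain A -> B -> C with illustrative CPTs; verify P(C|A,B) = P(C|B).
P_A = 0.1                       # P(A): Jack's flu
P_B = {True: 0.8, False: 0.1}   # P(B=T | A): severe cough
P_C = {True: 0.4, False: 0.05}  # P(C=T | B): Jill's flu

def joint(a, b, c):
    p = P_A if a else 1 - P_A
    p *= P_B[a] if b else 1 - P_B[a]
    p *= P_C[b] if c else 1 - P_C[b]
    return p

def p_c_given_ab(a, b):         # condition on both A and B
    den = joint(a, b, True) + joint(a, b, False)
    return joint(a, b, True) / den

def p_c_given_b(b):             # condition on B only (A marginalised out)
    num = sum(joint(a, b, True) for a in (True, False))
    den = sum(joint(a, b, c) for a in (True, False) for c in (True, False))
    return num / den

print(round(p_c_given_ab(True, True), 6))   # 0.4
print(round(p_c_given_ab(False, True), 6))  # 0.4 -- A makes no difference
print(round(p_c_given_b(True), 6))          # 0.4
```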


Conditional Independence: Common Causes

Common causes (or ancestors) also give rise to conditional independence:

  A ← B → C

  P(C | A ∧ B) = P(C | B)

Example:
- A = Jack's food poisoning
- B = shared soup
- C = Jill's food poisoning


Conditional Dependence: Common Effects

Common effects (or their descendants) give rise to conditional dependence:

  A → B ← C

  P(A | C ∧ B) ≠ P(A | B)

Example:
- A = flu
- B = severe cough
- C = tuberculosis

Given a severe cough, flu "explains away" tuberculosis.

D-separation

- A graph-theoretic criterion of conditional independence.
- We can determine whether a set of nodes X is independent of another set Y given a set of evidence nodes E, i.e., X ⊥ Y | E.
- Earthquake example:
  (Figure: Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls.)


Causal Ordering

Why does variable order affect network density? Because:
- using the causal order allows direct representation of conditional independencies;
- violating the causal order requires new arcs to re-establish conditional independencies.


Causal Ordering (cont'd)

(Figure: Flu → Cough ← TB.)

Flu and TB are marginally independent.

Given the ordering Cough, Flu, TB:

(Figure: Cough → Flu, Cough → TB.)

The marginal independence of Flu and TB must be re-established by adding Flu → TB or Flu ← TB.


Inference in Bayesian Networks

- The basic task for any probabilistic inference system: compute the posterior probability distribution for a set of query variables, given values for some evidence variables.
- Also called belief updating.
- Types of inference: diagnostic, causal, intercausal (explaining away) and mixed.
  (Figure: schematic query/evidence configurations for each of the four types.)

Kinds of Inference

- Diagnostic inferences: from effects to causes.
  P(Burglary | JohnCalls)
- Causal inferences: from causes to effects.
  P(JohnCalls | Burglary), P(MaryCalls | Burglary)
- Intercausal inferences: between causes of a common effect.
  P(Burglary | Alarm), P(Burglary | Alarm ∧ Earthquake)
- Mixed inference: combining two or more of the above.
  P(Alarm | JohnCalls ∧ ¬Earthquake), P(Burglary | JohnCalls ∧ ¬Earthquake)


Inference Algorithms: Overview

- Exact inference
  – Trees and polytrees: message-passing algorithm
  – Multiply-connected networks: clustering
- Approximate inference
  – Large, complex networks: stochastic simulation and other approximation methods
- In the general case, both sorts of inference are computationally complex ("NP-hard").


Message Passing Example

(Figure: the earthquake network extended with PhoneRings → JohnCalls. Priors P(B) = 0.001, P(E) = 0.002, P(Ph) = 0.05; P(A|B,E) as before; P(J|Ph,A) = 0.95, 0.5, 0.90, 0.01; P(M|A) = 0.70, 0.01. The figure shows the π and λ messages, e.g. π(B) = (.001, .999), λ(B) = (1, 1), bel(B) = (.001, .999); π(E) = (.002, .998), bel(E) = (.002, .998); bel(Ph) = (.05, .95); λ(J) = (1, 1), λ(M) = (1, 0).)


Inference in multiply connected networks

Networks where two nodes are connected by more than one path:
– two or more possible causes which share a common ancestor, or
– one variable can influence another through more than one causal mechanism.

Example: the Cancer network.
(Figure: A = Metastatic Cancer, with A → B = Increased total serum calcium and A → C = Brain tumour; B → D = Coma ← C; C → E = Severe Headaches.)

Message passing doesn't work here: evidence gets "counted twice".

Clustering methods

- Transform the network into a probabilistically equivalent polytree by merging (clustering) the offending nodes.
- Cancer example: a new node Z combines B and C.
  (Figure: A → Z; Z → D; Z → E.)

  P(z|a) = P(b, c|a) = P(b|a) P(c|a)
  P(e|z) = P(e|b, c) = P(e|c)
  P(d|z) = P(d|b, c)


Clustering methods (cont.)

- The Jensen join-tree (Jensen, 1996) version is the current most efficient algorithm in this class (e.g. used in Hugin and Netica).
- Network evaluation is done in two stages:
  – Compile into a join-tree
    - May be slow
    - May require too much memory if the original network is highly connected
  – Do belief updating in the join-tree (usually fast)

Caveat: clustered nodes have increased complexity; updates may be computationally complex.
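As a sketch of the first merging equation, the compound node's CPT is just the product of the originals; the numbers are the cancer-network CPT values given elsewhere in the tutorial:

```python
# Merged-node CPT: P(Z=(b,c) | A) = P(b|A) * P(c|A), cancer-network numbers.
P_B = {True: 0.80, False: 0.20}  # P(B=T | A): increased serum calcium
P_C = {True: 0.20, False: 0.05}  # P(C=T | A): brain tumour

def p_z_given_a(b, c, a):
    pb = P_B[a] if b else 1 - P_B[a]
    pc = P_C[a] if c else 1 - P_C[a]
    return pb * pc

# Z's four states form a proper conditional distribution for each value of A:
for a in (True, False):
    total = sum(p_z_given_a(b, c, a)
                for b in (True, False) for c in (True, False))
    print(round(total, 10))  # 1.0 both times
```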


Approximate inference with stochastic simulation

- Use the network to generate a large number of cases that are consistent with the network distribution.
- Evaluation may not converge to exact values (in reasonable time).
- Usually converges to close to the exact solution quickly if the evidence is not too unlikely.
- Performs better when evidence is nearer to the root nodes; however, in real domains evidence tends to be near the leaves (Nicholson & Jitnah, 1998).


Making Decisions

- Bayesian networks can be extended to support decision making.
- Preferences between different outcomes of various plans:
  – utility theory
- Decision theory = utility theory + probability theory.
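The stochastic-simulation idea above can be sketched as forward ("logic") sampling on the earthquake network: generate each case root-first from the CPTs, then use sample frequencies as probability estimates:

```python
# Forward sampling on the earthquake network (CPT values from the tutorial).
import random

random.seed(0)  # reproducible sketch

P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm | B, E)

def sample_case():
    """Sample one full case root-first: B, E, then A, then J and M."""
    b = random.random() < 0.001
    e = random.random() < 0.002
    a = random.random() < P_A[(b, e)]
    j = random.random() < (0.90 if a else 0.05)
    m = random.random() < (0.70 if a else 0.01)
    return b, e, a, j, m

n = 100_000
estimate = sum(sample_case()[3] for _ in range(n)) / n
print(0.04 < estimate < 0.07)  # True: the exact P(JohnCalls) is about 0.052
```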

Decision Networks

A decision network represents information about
- the agent's current state
- its possible actions
- the state that will result from the agent's action
- the utility of that state

Also called influence diagrams (Howard & Matheson, 1981).


Types of Nodes

Chance nodes (ovals): represent random variables (as in Bayesian networks). Each has an associated CPT. Parents can be decision nodes and other chance nodes.

Decision nodes (rectangles): represent points where the decision maker has a choice of actions.

Utility nodes (diamonds): represent the agent's utility function (also called value nodes in the literature). Parents are the variables describing the outcome state that directly affect utility. Each has an associated table representing a multi-attribute utility function.


Example: Umbrella

(Figure: chance nodes Weather and Forecast, decision node Take Umbrella, utility node U.)

  P(Weather = Rain) = 0.3

  P(Forecast = Rainy | Weather = Rain) = 0.60
  P(Forecast = Cloudy | Weather = Rain) = 0.25
  P(Forecast = Sunny | Weather = Rain) = 0.15

  P(Forecast = Rainy | Weather = NoRain) = 0.1
  P(Forecast = Cloudy | Weather = NoRain) = 0.2
  P(Forecast = Sunny | Weather = NoRain) = 0.7

  U(NoRain, TakeUmbrella) = 20
  U(NoRain, LeaveAtHome) = 100
  U(Rain, TakeUmbrella) = 70
  U(Rain, LeaveAtHome) = 0


Evaluating Decision Networks: Algorithm

1. Set the evidence variables for the current state.
2. For each possible value of the decision node:
   (a) Set the decision node to that value.
   (b) Calculate the posterior probabilities for the parent nodes of the utility node (as for BNs).
   (c) Calculate the resulting (expected) utility for the action.
3. Return the action with the highest expected utility.

Simple for a single decision, less so when executing several actions in sequence (i.e. a plan).
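The algorithm above can be traced on the umbrella network; here the evidence is a "rainy" forecast:

```python
# Evaluate the umbrella decision network: condition on the forecast, compute
# the posterior over Weather, then pick the action with highest expected
# utility. Numbers are the slide's.
P_RAIN = 0.3
P_FORECAST = {  # P(forecast | weather)
    "rain":   {"rainy": 0.60, "cloudy": 0.25, "sunny": 0.15},
    "norain": {"rainy": 0.10, "cloudy": 0.20, "sunny": 0.70},
}
UTILITY = {("take", "rain"): 70, ("take", "norain"): 20,
           ("leave", "rain"): 0, ("leave", "norain"): 100}

def posterior_rain(forecast):
    num = P_RAIN * P_FORECAST["rain"][forecast]
    den = num + (1 - P_RAIN) * P_FORECAST["norain"][forecast]
    return num / den

def best_action(forecast):
    p = posterior_rain(forecast)
    eu = {a: p * UTILITY[(a, "rain")] + (1 - p) * UTILITY[(a, "norain")]
          for a in ("take", "leave")}
    return max(eu, key=eu.get), eu

action, eu = best_action("rainy")
print(round(posterior_rain("rainy"), 2))  # 0.72
print(action)                             # take
```

Given a sunny forecast instead, the posterior for rain drops to about 0.08 and "leave" wins.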

Dynamic Belief Networks

(Figure: a state evolution model State(t−2) → State(t−1) → State(t) → State(t+1) → State(t+2), with each State(t) → Obs(t) as the sensor model.)

- The values of the state variables at time t depend only on the values at t − 1.
- Distributions can be calculated for S(t+1) and further: probabilistic projection.
- This can be done using standard BN updating algorithms.
- This type of DBN gets very large, very quickly.
- Usually only two time slices of the network are kept.


Dynamic Decision Network

- Similarly, decision networks can be extended to include temporal aspects.
- A sequence of decisions taken = a plan.

(Figure: decisions D(t), D(t+1), D(t+2), D(t+3) over State(t) ... State(t+3) and Obs(t) ... Obs(t+3), with utility node U(t+3).)


Uses of Bayesian Networks

1. Calculating the belief in query variables given values for evidence variables (above).
2. Predicting values of dependent variables given values for independent variables.
3. Decision making based on probabilities in the network and on the agent's utilities (influence diagrams [Howard and Matheson 1981]).
4. Deciding which additional evidence should be observed in order to gain useful information.
5. Sensitivity analysis to test the impact of changes in probabilities or utilities on decisions.


Bayesian Networks: Summary

- Bayes' rule allows unknown probabilities to be computed from known ones.
- Conditional independence (due to causal relationships) allows efficient updating.
- Bayesian networks are a natural way to represent conditional independence information.
  – links between nodes: qualitative aspects;
  – conditional probability tables: quantitative aspects.
- Inference means computing the probability distribution for a set of query variables, given a set of evidence variables.
- Inference in Bayesian networks is very flexible: evidence can be entered about any node while beliefs in any other nodes are updated.
- The speed of inference in practice depends on the structure of the network: how many loops; the numbers of parents; the location of evidence and query nodes.

Bayesian Networks: Summary (cont'd)

Bayesian networks can be extended with decision nodes and utility nodes to support decision making: decision networks or influence diagrams.

Bayesian and decision networks can be extended to allow explicit reasoning about changes over time.


Applications: Overview

- (Simple) example networks
- Applications
  – Medical decision making: survey of applications
  – Planning and plan recognition
  – Natural language generation (NAG)
  – Bayesian poker
- Deployed Bayesian networks (see handout for details)
- BN software
- Web resources
Example: Cancer

Metastatic cancer is a possible cause of a brain tumour and is also an explanation for increased total serum calcium. In turn, either of these could explain a patient falling into a coma. Severe headache is also possibly associated with a brain tumour. (Example from (Pearl, 1988).)

(Figure: A = Metastatic Cancer, with A → B = Increased total serum calcium and A → C = Brain tumour; B → D = Coma ← C; C → E = Severe Headaches.)

  P(a) = 0.2
  P(b|a) = 0.80      P(b|¬a) = 0.20
  P(c|a) = 0.20      P(c|¬a) = 0.05
  P(d|b,c) = 0.80    P(d|¬b,c) = 0.80
  P(d|b,¬c) = 0.80   P(d|¬b,¬c) = 0.05
  P(e|c) = 0.80      P(e|¬c) = 0.60


Example: Asia

A patient presents to a doctor with shortness of breath. The doctor considers that the possible causes are tuberculosis, lung cancer and bronchitis. Additional relevant information includes whether the patient has recently visited Asia (where tuberculosis is more prevalent) and whether or not the patient is a smoker (which increases the chances of cancer and bronchitis). A positive X-ray would indicate either TB or lung cancer. (Example from (Lauritzen, 1988).)

(Figure: visit to Asia → tuberculosis; smoking → lung cancer and bronchitis; tuberculosis and lung cancer → "either tub or lung cancer" → positive X-ray and dyspnoea; bronchitis → dyspnoea.)
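A brute-force enumeration over the cancer network's 32 joint states reproduces queries such as P(coma) and the diagnostic P(cancer | coma); variable indices 0-4 stand for A, B, C, D, E:

```python
# Exact inference by enumeration on the Pearl cancer network above.
from itertools import product

def p(v, cond, t, f):
    """P(V=v) where P(V=T) is t if cond else f."""
    return (t if cond else f) if v else 1 - (t if cond else f)

def joint(a, b, c, d, e):
    pa = 0.2 if a else 0.8
    pb = p(b, a, 0.80, 0.20)
    pc = p(c, a, 0.20, 0.05)
    # P(d|b,c) is 0.80 unless both b and c are false (then 0.05):
    pd = (0.80 if (b or c) else 0.05) if d else (0.20 if (b or c) else 0.95)
    pe = p(e, c, 0.80, 0.60)
    return pa * pb * pc * pd * pe

def query(target, evidence):
    """P(target | evidence); both map a variable index (0..4) to a bool."""
    num = den = 0.0
    for world in product((True, False), repeat=5):
        if any(world[i] != v for i, v in evidence.items()):
            continue
        pr = joint(*world)
        den += pr
        if all(world[i] == v for i, v in target.items()):
            num += pr
    return num / den

print(round(query({3: True}, {}), 4))         # 0.32   P(coma)
print(round(query({0: True}, {3: True}), 4))  # 0.425  P(cancer | coma)
```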

Example: A Lecturer's Life

Dr. Ann Nicholson spends 60% of her work time in her office. The rest of her work time is spent elsewhere. When Ann is in her office, half the time her light is off (when she is trying to hide from students and get some real work done). When she is not in her office, she leaves her light on only 5% of the time. 80% of the time she is in her office, Ann is logged onto the computer. Because she sometimes logs onto the computer from home, 10% of the time she is not in her office, she is still logged onto the computer. Suppose a student checks Dr. Nicholson's login status and sees that she is logged on. What effect does this have on the student's belief that Dr. Nicholson's light is on? (Example from (Nicholson, 1999).)

(Figure: in-office → lights-on, in-office → logged-on.)

  P(in-office=T) = 0.6
  P(lights-on=T | in-office=T) = 0.5
  P(lights-on=T | in-office=F) = 0.05
  P(logged-on=T | in-office=T) = 0.8
  P(logged-on=T | in-office=F) = 0.1


BN Applications

- Most BN applications to date are hand-crafted using domain information provided by experts. (van der Gaag et al., 1999, give a case study on probability elicitation.)
- Tasks include:
  – prediction: (1) given evidence; (2) effect of intervention
  – diagnosis
  – planning
  – decision making
  – explanation
  – choice of observations (experimental design)
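The lecturer query can be answered by enumeration; the belief in lights-on rises from the prior 0.32 to about 0.465 after seeing that she is logged on:

```python
# Variables: O (in office), L (lights on), S (logged on); L and S are
# conditionally independent given O. CPT values from the slide.
P_O = 0.6
P_L = {True: 0.5, False: 0.05}   # P(lights-on | in-office)
P_S = {True: 0.8, False: 0.1}    # P(logged-on | in-office)

def joint(o, l, s):
    p = P_O if o else 1 - P_O
    p *= P_L[o] if l else 1 - P_L[o]
    p *= P_S[o] if s else 1 - P_S[o]
    return p

def p_lights(given_logged_on=None):
    worlds = [(o, l, s) for o in (True, False) for l in (True, False)
              for s in (True, False)
              if given_logged_on is None or s == given_logged_on]
    den = sum(joint(*w) for w in worlds)
    num = sum(joint(*w) for w in worlds if w[1])
    return num / den

print(round(p_lights(), 4))      # 0.32    prior P(lights-on)
print(round(p_lights(True), 4))  # 0.4654  after seeing she is logged on
```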


Probabilistic reasoning in medicine

- See Dean et al. (1993).
- The simplest tree-structured network for diagnostic reasoning:
  – H = disease hypothesis; F = findings (symptoms, test results)
  (Figure: H → F1, F2, F3.)


Probabilistic reasoning in medicine (cont'd)

Multiply-connected network (QMR structure):
- B = background information (e.g. age, sex of patient)

Medical Applications

- Pathfinder case study; see Russell & Norvig (1995, pp. 457-458).
- QMR (Quick Medical Reference): 600 diseases, 4,000 findings, 40,000 arcs (Dean & Wellman, 1991).
- MUNIN (Andreassen et al., 1989): neuromuscular disorders, about 1000 nodes; exact computation < 5 seconds.
- Glucose prediction and insulin dose adjustment (a DBN application) (Andreassen et al., 1991).
- CPSC project (Pradham et al., 1994):
  – 448 nodes, 906 links, 8254 conditional probability values
  – LW algorithm: answers in 35 mins (1994)
- Application of LW to medical diagnosis (Shwe & Cooper, 1990).
- Forecasting sleep apnea (Dagum et al., 1993).


Medical Applications (cont'd)

- ALARM (Beinlich et al., 1989): 37 nodes, 42 arcs. (See the Netica examples.)
  (Figure: the ALARM patient-monitoring network, including nodes such as MinVolSet, Ventmach, Disconnect, PulmEmbolus, Intubation, VentTube, KinkedTube, PAP, Shunt, Press, VentLung, FiO2, VentAlv, MinVol, PVSat, ArtCO2, ExpCO2, Anaphylaxis, InsuffAnesth, SaO2, TPR, Catechol, LVFailure, Hypovolemia, ErrCauter, HR, ErrLowOutput, History, StrokeVolume, LVEDVolume, HRSat, HREKG, HRBP, CO, CVP, PCWP and BP.)


Plan Recognition Applications

- Keyhole plan recognition in an Adventure game (Albrecht et al., 1998).
  (Figure: four DBN variants over actions A0..A3, locations L0..L3 and quest variables Q, Q′: (a) mainModel, (b) indepModel, (c) actionModel, (d) locationModel.)
- Traffic plan recognition (Pynadath & Wellman, 1995).


Natural Language Generation

NAG (McConachy et al., 1999), a Nice Argument Generator, uses two Bayesian networks to generate and assess natural language arguments:
- Normative model: represents our best understanding of the domain; proper (constrained) Bayesian updating, given premises.
- User model: represents our best understanding of the human; Bayesian updating modified to reflect human biases (e.g., overconfidence; Korb, McConachy, Zukerman, 1997).

The BNs are embedded in a semantic hierarchy, which
- supports attentional modelling
- constrains updating

(Figure: NAG's semantic hierarchy. Higher-level concepts like 'motivation' or 'ability' sit in the second semantic-network layer, lower-level concepts like 'Grade Point Average' in the first, and both are grounded in a Bayesian network over propositions, e.g., [publications authored by person X cited >5 times].)


Bayesian Poker

(Korb et al., 1999)

Poker is ideal for testing automated reasoning under uncertainty:
– physical randomisation
– incomplete hand information
– incomplete opponent information (strategies, bluffing, etc.)

Bayesian networks are a good representation for complex game playing.

Our Bayesian Poker Player (BPP) plays 5-card stud poker at the level of a good amateur human player. To play:

  telnet indy13.cs.monash.edu.au
  login: poker
  password: maverick


Bayesian Poker BN

- The Bayesian network provides an estimate of winning at any point in the hand.
- Betting curves based on pot odds are used to determine the action (bet/call, pass or fold).

(Figure: BPP Win at the root, over OPP Final and BPP Final; OPP Current and BPP Current connect upward via the matrices M(C|F); the observation nodes OPP Action and OPP Upcards connect via M(A|C) and M(U|C).)


Bayesian Poker BN (cont.)

- Different networks (matrices) are used for each round.
- OPP Current, BPP Current: (partial) hand types with the cards dealt so far.
- OPP Final, BPP Final: hand types after all 5 cards are dealt.
- Observation nodes:
  – OPP Upcards: all the opponent's cards except the first are visible to BPP.
  – OPP Action: BPP knows the opponent's action.

Bayesian Poker BN (cont.)

Hand types:
- The initial 9 hand types were too coarse.
- We use a finer granularity for the most common hands (busted and a pair):
  – low, medium, Q-high, K-high, A-high
  – this results in 17 hand types.

Conditional probability matrices:
- M(A|C): the probability of the opponent's action given the current hand type, learned from observed showdown data.
- M(U|C) and M(C|F): estimated by dealing out 10^7 poker hands.

Belief updating: since the network is a polytree, a simple fast propagation updating algorithm is used.


Current Status, Possible Extensions

- BPP outperforms automated opponents, is fairly even with average amateur humans, and loses to experienced humans.
- Learning the OPP Action CPTs does not (yet) appear to improve performance.
- BN improvements:
  – refine the action nodes
  – further refinement of hand types
  – improve the network structure
  – add bluffing to the opponent model
  – improve learning of the opponent model
- More complex poker: multi-opponent games, table-stakes games.
- A DBN model to represent changes over time.


Deployed BNs

From the Web Site database: see handout for details.

- TRACS: predicting the reliability of military vehicles.
- Andes: an intelligent tutoring system for physics.
- Distributed Virtual Agents advising online users on web sites.
- Information extraction from natural language text.
- DXPLAIN: decision support for medical diagnosis.
- Illiad: a teaching tool for medical students.
- Microsoft Health Product: "find by symptom" feature.
- Weapons scheduling.
- Monitoring power generation.
- Processor fault diagnosis.


Deployed BNs (cont'd)

- Knowledge Industries applications: (a) in medicine: sleep disorders, pathology, trauma care, hand and wrist evaluations, dermatology, and home-based health evaluations; (b) in capital equipment: locomotives, gas-turbine engines for aircraft and land-based power production, the space shuttle, and office equipment.
- Software debugging.
- Vista: a decision support system used at NASA Mission Control Center.
- MS: (a) Answer Wizard (Office 95), information retrieval; (b) Print Troubleshooter; (c) Aladdin, troubleshooting customer support.

BN Software: Issues

- Functionality
  – especially application vs API
- Price
  – many are free as demo versions or for educational use
  – commercial licence costs
- Availability (platforms)
- Quality
  – GUI
  – documentation and help
- Leading edge
- Robustness
  – of the software
  – of the company


BN Software

- Analytica: www.lumina.com
- Hugin: www.hugin.com
- Netica: www.norsys.com

The above three are available during the tutorial lab session.

- JavaBayes: http://www.cs.cmu.edu/~javabayes/Home/
- Many other packages (see next slide)


BN Web Resources

- Bayesian Belief Network site (Russell Greiner): www.cs.ualberta.ca/~greiner/bn.html
- Bayesian Network Repository (Nir Friedman): www-nt.cs.berkeley.edu/home/nir/public_html/Repository/index.htm
- Summary of BN software and links to software sites (Kevin Murphy): http.cs.berkeley.edu/~murphyk/Bayes/bnsoft.html (includes Murphy's Bayes net toolbox)
- Russell Almond's BN page: bayes.stat.washington.edu/almond/belief.html
- Association for Uncertainty in AI: www.auai.org


Applications: Summary

- Various BN structures are available to compactly and accurately represent certain types of domain features.
- Bayesian networks have been used for a wide range of AI applications.
- Robust and easy-to-use Bayesian network software is now readily available.

Learning Bayesian Networks

- Linear and discrete models
- Learning network parameters
  – linear coefficients
  – learning probability tables
- Learning causal structure
- Conditional independence learning
  – statistical equivalence
  – TETRAD II
- Bayesian learning of Bayesian networks
  – Cooper & Herskovits: K2
  – learning variable order
  – statistical equivalence learners
- Full causal learners
- Minimum encoding methods
  – Lam & Bacchus's MDL learner
  – MML metrics
  – MML search algorithms
  – MML sampling
- Empirical results


Linear and Discrete Models

Linear models: used in biology and the social sciences since Sewall Wright (1921).

Linear models represent causal relationships as sets of linear functions of "independent" variables.

(Figure: X1 → X3 ← X2.)

Equivalently:

  X3 = a13 X1 + a23 X2 + ε3

Discrete models: "Bayesian nets" replace the vectors of linear coefficients with CPTs.
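As a sketch, the linear model X3 = a13 X1 + a23 X2 + ε3 can be simulated and its coefficients recovered by ordinary least squares (the two-regressor normal equations, solved in closed form); the coefficient values here are made up:

```python
# Simulate the linear model and recover a13, a23 by OLS.
import random

random.seed(1)
a13, a23 = 2.0, -1.5
data = []
for _ in range(5000):
    x1 = random.gauss(0, 1)
    x2 = random.gauss(0, 1)
    x3 = a13 * x1 + a23 * x2 + random.gauss(0, 0.5)  # epsilon_3 ~ N(0, 0.5)
    data.append((x1, x2, x3))

# Normal equations for x3 ~ b1*x1 + b2*x2 (zero-mean regressors, no intercept):
s11 = sum(x1 * x1 for x1, _, _ in data)
s22 = sum(x2 * x2 for _, x2, _ in data)
s12 = sum(x1 * x2 for x1, x2, _ in data)
s1y = sum(x1 * x3 for x1, _, x3 in data)
s2y = sum(x2 * x3 for _, x2, x3 in data)
det = s11 * s22 - s12 * s12
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
print(round(b1, 1), round(b2, 1))  # close to 2.0 and -1.5
```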


Korb & Nicholson 73 Korb & Nicholson 74

Learning Linear Parameters

Maximum likelihood methods have been available since Wright's path model analysis (1921).

Equivalent methods:

• Simon-Blalock method (Simon, 1954; Blalock, 1964)
• Ordinary least squares multiple regression (OLS)

Learning Conditional Probability Tables

Spiegelhalter & Lauritzen (1990):

• assume parameter independence
• each CPT cell i = a parameter in a Dirichlet distribution over the K states of the child variable

  D[α1, ..., αi, ..., αK]

• prob of outcome i is αi / Σ_{k=1}^K αk
• observing outcome i updates D to

  D[α1, ..., αi + 1, ..., αK]

Others are looking at learning without parameter independence. E.g.,

• Decision trees to learn structure within CPTs (Boutillier et al. 1996).
• Dual log-linear and full CPT models (Neil, Wallace, Korb 1999).
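The Spiegelhalter-Lauritzen update rule can be sketched directly (class and parameter names are illustrative assumptions): one Dirichlet per CPT row, with the predictive probability αi / Σk αk and counts incremented on each observation.

```python
class DirichletCell:
    """Posterior over one CPT row: a Dirichlet D[a1,...,aK] over the K
    states of the child variable, updated by incrementing the count
    of each observed outcome."""

    def __init__(self, k, prior=1.0):
        self.alpha = [prior] * k   # uniform Dirichlet prior

    def prob(self, i):
        # predictive probability of outcome i: alpha_i / sum_k alpha_k
        return self.alpha[i] / sum(self.alpha)

    def observe(self, i):
        # D[a1,...,ai,...,aK]  ->  D[a1,...,ai+1,...,aK]
        self.alpha[i] += 1
```

Parameter independence means one such cell is maintained per parent configuration, each updated only by the cases matching that configuration.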


Learning Causal Structure

This is the real problem; parameterizing models is a relatively straightforward estimation problem.

There are two basic methods:

• Learning from conditional independencies (CI learning)
• Learning using a scoring metric (Metric learning)

CI learning (Verma and Pearl, 1991)

Suppose you have an Oracle who can answer yes or no to any question of the type:

  X ⊥ Y | S ?

Then you can learn the correct causal model, up to statistical equivalence.

Statistical Equivalence

Verma and Pearl's rules identify the set of causal models which are statistically equivalent —

  Two causal models H1 and H2 are statistically equivalent iff they contain the same variables and joint samples over them provide no statistical grounds for preferring one over the other.

Examples

• All fully connected models are equivalent.
• A → B → C and A ← B ← C.
• A → B → D ← C and A ← B → D ← C.


Statistical Equivalence

Chickering (1995):

• Any two causal models over the same variables which have the same skeleton (undirected arcs) and the same directed v-structures are statistically equivalent.

• If H1 and H2 are statistically equivalent, then they have the same maximum likelihoods relative to any joint samples:

  max_{θ1} P(e | H1, θ1) = max_{θ2} P(e | H2, θ2)

  where θi is a parameterization of Hi.

TETRAD II

— Spirtes, Glymour and Scheines (1993)

Replace the Oracle with statistical tests:

• for linear models, a significance test on partial correlation:

  X ⊥ Y | S  iff  ρ_{XY·S} = 0

• for discrete models, a χ² test on the difference between CPT counts expected with independence (Ei) and observed (Oi):

  X ⊥ Y | S  iff  Σi Oi ln(Oi / Ei) ≈ 0
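The discrete test can be sketched as follows (a minimal illustration, not the TETRAD implementation; here the statistic carries the conventional factor of 2, giving the standard G² form of the Oi ln(Oi/Ei) test, and the 3.84 critical value assumed is for 1 degree of freedom at the 5% level).

```python
import math

def g2_independent(table, crit=3.84):
    """G-squared test of independence for a 2x2 table of observed counts:
    compares 2 * sum O*ln(O/E) against a chi-square critical value."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / n   # expected count under independence
            if o > 0:
                g2 += 2 * o * math.log(o / e)
    return g2, g2 < crit   # True -> no grounds to reject X independent of Y
```

Conditioning on S amounts to running the same test within each stratum of S, which is why such tests degrade quickly on small samples: each stratum gets only a fraction of the data.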


TETRAD II

• Asymptotically finds causal structure to within the statistical equivalence class of the true model.

• Requires larger sample sizes than MML (Dai, Korb, Wallace & Wu, 1997): statistical tests are not robust given weak causal interactions and/or small samples.

• Cheap, and easy to use.

Bayesian LBN: Cooper & Herskovits

— Cooper & Herskovits (1991, 1992)

Compute P(hi | e) by brute force, under the assumptions:

1. All variables are discrete.
2. Samples are i.i.d.
3. No missing values.
4. All values of child variables are uniformly distributed.
5. Priors over hypotheses are uniform.

With these assumptions, Cooper & Herskovits reduce the computation of P_CH(h, e) to a polynomial time counting problem.


Cooper & Herskovits

But the hypothesis space is exponential; they go for a dramatic simplification:

6. Assume we know the temporal ordering of the variables.

In that case, for any pair of variables the only problem is

• deciding whether they are connected by an arc
  → arc direction is trivial
  → cycles are impossible.

The new hypothesis space has size only 2^{n(n−1)/2} (still exponential).

Algorithm "K2" does a greedy search through this reduced space.

Learning Variable Order

Reliance upon a given variable order is a major drawback to K2

  and many other algorithms (Buntine 1991, Bouckert 1994, Suzuki 1996, Madigan & Raftery 1994).

What's wrong with that?

• We want autonomous AI (data mining). If experts can order the variables they can likely supply models.

• Determining variable ordering is half the problem. If we know A comes before B, the only remaining issue is whether there is a link between the two.

• The number of orderings consistent with dags is exponential, and counting them is #P-complete (Brightwell & Winkler 1990). So iterating over all possible orderings will not scale up.
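A K2-style search can be sketched in a few lines (a minimal illustration under stated assumptions: function names and the max_parents cap are ours, and the counting-based score below is the standard Cooper-Herskovits formula Π_j (r−1)!/(N_ij+r−1)! Π_k N_ijk! in log form, computed with log-gamma for numerical stability).

```python
import math

def ch_score(child, parents, data, arities):
    """Log of the Cooper-Herskovits score for one node given a parent set.
    data: list of tuples of discrete values, indexed by variable number."""
    r = arities[child]
    counts = {}
    for row in data:                      # group rows by parent configuration
        key = tuple(row[p] for p in parents)
        cell = counts.setdefault(key, [0] * r)
        cell[row[child]] += 1
    score = 0.0
    for cell in counts.values():          # (r-1)! / (N_ij + r - 1)! * prod N_ijk!
        n_ij = sum(cell)
        score += math.lgamma(r) - math.lgamma(n_ij + r)
        for n_ijk in cell:
            score += math.lgamma(n_ijk + 1)
    return score

def k2(order, data, arities, max_parents=2):
    """Greedy K2: for each node, repeatedly add the predecessor (in the given
    order) that most improves the score, until no addition helps."""
    parents = {v: [] for v in order}
    for i, v in enumerate(order):
        current = ch_score(v, parents[v], data, arities)
        candidates = list(order[:i])
        while candidates and len(parents[v]) < max_parents:
            best, best_score = None, current
            for c in candidates:
                s = ch_score(v, parents[v] + [c], data, arities)
                if s > best_score:
                    best, best_score = c, s
            if best is None:
                break
            parents[v].append(best)
            candidates.remove(best)
            current = best_score
    return parents
```

Note how the given ordering does the heavy lifting: only predecessors are ever candidates, so no cycle check is needed and arc direction is never in question.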


Statistical Equivalence Learners

Heckerman & Geiger (1995) advocate learning only up to statistical equivalence classes (a la TETRAD II).

  Since observational data cannot distinguish btw equivalent models, there's no point trying to go further.

⇒ Madigan, Andersson, Perlman & Volinsky (1996) follow this advice, using a uniform prior over equivalence classes.

⇒ Geiger and Heckerman (1994) define Bayesian metrics for linear and discrete equivalence classes of models (BGe and BDe).

Wallace & Korb (1999): This is not right!

• These are causal models; they are distinguishable on experimental data.
  – Failure to collect some data is no reason to change prior probabilities. E.g., if your thermometer topped out at 35°, you wouldn't treat 35° and 34° as equally likely.

• Not all equivalence classes are created equal:

  {A → B → C, A ← B ← C, A ← B → C}
  {A → B ← C}

• Within classes some dags should have greater priors than others. E.g.,

  LightsOn → InOffice → LoggedOn  v.
  LightsOn ← InOffice → LoggedOn


Full Causal Learners

So... a full causal learner is an algorithm that:

1. Learns causal connectedness.
2. Learns v-structures.
   Hence, learns equivalence classes.
3. Learns full variable order.
   Hence, learns full causal structure (order + connectedness).

• TETRAD II: 1, 2.
• Madigan et al.: 1, 2.
• Cooper & Herskovits' K2: 1.
• Lam and Bacchus MDL: 1, 2 (partial), 3 (partial).
• Wallace, Neil, Korb MML: 1, 2, 3.

MDL

Minimum Description Length (MDL) inference —

• Invented by Rissanen (1978), based upon Minimum Message Length (MML), invented by Wallace (Wallace and Boulton, 1968).

• Plays off the trade-off btw
  – model simplicity
  – model fit to the data
  by minimizing the length of a joint description of the model and of the data given the model.


Lam & Bacchus (1993)

MDL encoding of causal models:

• Network:

  Σ_{i=1}^n [ ki log(n) + d(si − 1) Π_{j ∈ π(i)} sj ]

  – ki log(n) for specifying ki parents for the ith node
  – d(si − 1) Π_{j=1}^{ki} sj for specifying the CPT:
      d is the fixed bit-length per probability
      si is the number of states for node i

• Data given network:

  N Σ_{i=1}^n H(Xi) − N Σ_{i=1}^n M(Xi; π(i))

  – M(Xi; π(i)) is the mutual information btw Xi and its parent set
  – H(Xi) is the entropy of variable Xi

(NB: This code is not efficient. E.g., it treats every node as equally likely to be a parent; it assumes knowledge of all ki.)

Lam & Bacchus

Search algorithm:

• Initial constraints taken from domain expert: partial variable order, direct connections
• Greedy search: every possible arc addition is tested, best MDL measure used to add one (Note: no arcs are deleted)
• Local arcs checked for improved MDL via arc reversal
• Iterate until MDL fails to improve

⇒ Results similar to K2, but without full variable ordering


CaMML

Minimum Message Length (Wallace & Boulton 1968) uses Shannon's measure of information:

  I(m) = −log P(m)

Applied in reverse, we can compute P(h, e) from I(h, e).

Given an efficient joint encoding method for the hypothesis & evidence space (i.e., satisfying Shannon's law), MML:

  Searches {hi} for that hypothesis h that minimizes I(h) + I(e|h).

Equivalent to that h which maximizes P(h)P(e|h) — i.e., P(h|e).

The other significant difference from MDL: MML takes parameter estimation seriously.

MML Metric for Linear Models

• Network:

  log n! + n(n − 1)/2 − log E

  – log n! for the variable order
  – n(n − 1)/2 for connectivity
  – −log E restores efficiency by subtracting the cost of selecting a linear extension

• Parameters given dag h:

  Σ_{Xj} −log [ f(θj|h) / √F(θj) ]

  where θj are the parameters for Xj and F(θj) is the Fisher information. f(θj|h) is assumed to be N(0, σj).
  (Cf. MDL's fixed length for parameters.)
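The network cost can be sketched numerically, including the −log E correction; as the slides note elsewhere, brute-force counting of linear extensions is feasible only for small models, which is exactly what this illustration does (function names and the use of bits, log base 2, are our assumptions).

```python
import math
from itertools import permutations

def count_linear_extensions(n, arcs):
    """Brute-force count of total orderings consistent with a dag
    (feasible only for small n)."""
    return sum(all(order.index(a) < order.index(b) for a, b in arcs)
               for order in permutations(range(n)))

def structure_cost_bits(n, arcs):
    """Network part of the MML metric, in bits:
    log n! (order) + n(n-1)/2 (connectivity) - log E (linear extensions)."""
    e = count_linear_extensions(n, arcs)
    return math.log2(math.factorial(n)) + n * (n - 1) / 2 - math.log2(e)
```

For three variables, a chain 0 → 1 → 2 has a single linear extension (E = 1) while a v-structure 0 → 2 ← 1 has two, so the v-structure is a fraction of a bit cheaper to state: the less the dag constrains the order, the larger the rebate.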


MML Metric for Linear Models

• Sample for Xj given h and θj:

  −log P(e | h, θj) = Σ_{k=1}^K [ log(√(2π) σj) + ν²_jk / 2σ²_j ]

  where K is the number of sample values and ν_jk is the difference between the observed value of Xj and its linear prediction.

MML Metric for discrete models

We can use P_CH(hi, e) (from Cooper & Herskovits) to define an MML metric for discrete models.

Difference between MML and Bayesian metrics:

  MML partitions the parameter space and selects optimal parameters.

Equivalent to a penalty of (1/2) log(e/6) per parameter (Wallace & Freeman 1987); hence:

  I(e; hi) = (sj/2) log(e/6) − log P_CH(hi, e)    (1)

Applied in the MML Sampling algorithm.


MML search algorithms

MML metrics need to be combined with search. This has been done three ways:

1. Wallace, Korb, Dai (1996): greedy search (linear).
   – Brute force computation of linear extensions (small models only).

2. Neil and Korb (1999): genetic algorithms (linear).
   – Asymptotic estimator of linear extensions
   – GA chromosomes = causal models
   – Genetic operators manipulate them
   – Selection pressure is based on MML

3. Wallace and Korb (1999): MML sampling (linear, discrete).
   – Stochastic sampling through space of totally ordered causal models
   – No counting of linear extensions required

MML Sampling

Search space of totally ordered models (TOMs). Sampled via a Metropolis algorithm (Metropolis et al. 1953).

From the current model M, find the next model M′ by:

• Randomly select a variable; attempt to swap order with its predecessor.
• Or, randomly select a pair; attempt to add/delete an arc.

Attempts succeed whenever P(M′)/P(M) > U (per the MML metric), where U is uniformly random from [0, 1].

Bayesian AI Tutorial Bayesian AI Tutorial
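The acceptance rule can be sketched generically (a minimal illustration, not the CaMML implementation: the function names are ours, `score` stands in for the unnormalized posterior derived from the MML metric, and `propose` for the swap/add/delete moves over TOMs; the toy usage below runs it on a two-state space instead).

```python
import random

def metropolis(score, propose, init, steps, seed=0):
    """Metropolis sampling of the kind described above: from the current
    model, propose a neighbour and accept whenever
    score(M')/score(M) > U, with U uniform on [0, 1).
    Returns visit counts per state."""
    rng = random.Random(seed)
    current = init
    visits = {}
    for _ in range(steps):
        candidate = propose(current, rng)
        if score(candidate) / score(current) > rng.random():
            current = candidate   # accept; otherwise stay put
        visits[current] = visits.get(current, 0) + 1
    return visits
```

With scores {0: 1.0, 1: 3.0} and a proposal that flips the state, visit frequencies settle near the 3:1 ratio of the scores, illustrating the claim on the next slide that the procedure visits models in proportion to their posterior probability.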


MML Sampling

Metropolis: this procedure samples TOMs with a frequency proportional to their posterior probability.

To find the posterior of dag h: keep count of visits to all TOMs consistent with h

  Estimated by counting visits to all TOMs with identical max likelihoods to h

Output: Probabilities of

• Top dags
• Top statistical equivalence classes
• Top MML equivalence classes

Empirical Results

A weakness in this area — and AI generally.

• Paper publications based upon very small models, loose comparisons.

• ALARM net often used — every method recovers it to within 1 or 2 arcs.

Neil and Korb (1999) compared CaMML and BGe (Heckerman & Geiger's Bayesian metric over equivalence classes), using identical GA search over linear models:

• On KL distance and topological distance from the true model, CaMML and BGe performed nearly the same.

• On test prediction accuracy on strict effect nodes (those with no children), CaMML clearly outperformed BGe.

Korb & Nicholson 97 Korb & Nicholson 98

Current Research Issues

 size and complexity


LBN Web Resources
 difficulties with elicitation

 Info on TETRAD II; downloadable TETRAD III (approx  combinations of discrete and continuous (i.e.
equiv to II) mixing node types)
hss.cmu.edu/html/departments/
philosophy/TETRAD/tetrad.html  Learning issues

 Microsoft: MSBN and Webmine — downloadable trial – Missing data


versions – Latent variables
www.research.microsoft.com/research/dtg/
– Experimental data
 CaMML (web site under construction)
– Learning CPT structure
www.csse.monash.edu.au/˜ korb
– Multi-structure models
 continuous & discrete
 CPTs w/ & w/o parm independence


(Other) Limitations

• inappropriate problems (deterministic systems, legal rules)

References

Introduction to Bayesian AI

T. Bayes (1764) "An Essay Towards Solving a Problem in the Doctrine of Chances." Phil Trans of the Royal Soc of London. Reprinted in Biometrika, 45 (1958), 296-315.

B. Buchanan and E. Shortliffe (eds.) (1984) Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley.

B. de Finetti (1964) "Foresight: Its Logical Laws, Its Subjective Sources," in Kyburg and Smokler (eds.) Studies in Subjective Probability. NY: Wiley.

D. Heckerman (1986) "Probabilistic Interpretations for MYCIN's Certainty Factors," in L.N. Kanal and J.F. Lemmer (eds.) Uncertainty in Artificial Intelligence. North-Holland.

C. Howson and P. Urbach (1993) Scientific Reasoning: The Bayesian Approach. Open Court.
A MODERN REVIEW OF BAYESIAN THEORY.

K.B. Korb (1995) "Inductive learning and defeasible inference," Jrn for Experimental and Theoretical AI, 7, 291-324.

R. Neapolitan (1990) Probabilistic Reasoning in Expert Systems. Wiley.


CHAPTERS 1, 2 AND 4 COVER SOME OF THE RELEVANT HISTORY.

J. Pearl (1988) Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann.

F.P. Ramsey (1931) "Truth and Probability," in The Foundations of Mathematics and Other Essays. NY: Humanities Press.
THE ORIGIN OF MODERN BAYESIANISM. INCLUDES LOTTERY-BASED ELICITATION AND DUTCH-BOOK ARGUMENTS FOR THE USE OF PROBABILITIES.

R. Reiter (1980) "A logic for default reasoning," Artificial Intelligence, 13, 81-132.

J. von Neumann and O. Morgenstern (1947) Theory of Games and Economic Behavior, 2nd ed. Princeton Univ.
STANDARD REFERENCE ON ELICITING UTILITIES VIA LOTTERIES.

Bayesian Networks

E. Charniak (1991) "Bayesian Networks Without Tears", Artificial Intelligence Magazine, pp. 50-63, Vol 12.
AN ELEMENTARY INTRODUCTION.

G.F. Cooper (1990) The computational complexity of probabilistic inference using belief networks. Artificial Intelligence, 42, 393-405.

R.G. Cowell, A. Philip Dawid, S.L. Lauritzen and D.J. Spiegelhalter (1999) Probabilistic networks and expert systems. New York: Springer.
TECHNICAL SURVEY OF BAYESIAN NET TECHNOLOGY, INCLUDING LEARNING BAYESIAN NETS.

B. D'Ambrosio (1999) "Inference in Bayesian Networks". Artificial Intelligence Magazine, Vol 20, No. 2.

A.P. Dawid (1998) Conditional independence. In Encyclopedia of Statistical Sciences, Update Volume 2. New York: Wiley Interscience.

P. Haddaway (1999) "An Overview of Some Recent Developments in Bayesian Problem-Solving Techniques". Artificial Intelligence Magazine, Vol 20, No. 2.

R.A. Howard & J.E. Matheson (1981) Influence Diagrams. In Howard and Matheson (eds.) Readings in the Principles and Applications of Decision Analysis. Menlo Park, Calif: Strategic Decisions Group.

F.V. Jensen (1996) An Introduction to Bayesian Networks, Springer.

R. Neapolitan (1990) Probabilistic Reasoning in Expert Systems. Wiley.
SIMILAR COVERAGE TO THAT OF PEARL; MORE EMPHASIS ON PRACTICAL ALGORITHMS FOR NETWORK UPDATING.

J. Pearl (1988) Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann.
THIS IS THE CLASSIC TEXT INTRODUCING BAYESIAN NETWORKS TO THE AI COMMUNITY.

J. Pearl (2000) Causality. Cambridge University.

Poole, D., Mackworth, A., and Goebel, R. (1998) Computational Intelligence: a logical approach. Oxford University Press.

Russell & Norvig (1995) Artificial Intelligence: A Modern Approach, Prentice Hall.

J. Whittaker (1990) Graphical models in applied multivariate statistics. Wiley.

Applications

D.W. Albrecht, I. Zukerman and A.E. Nicholson (1998) Bayesian Models for Keyhole Plan Recognition in an Adventure Game. User Modeling and User-Adapted Interaction, 8(1-2), 5-47, Kluwer Academic Publishers.

S. Andreassen, F.V. Jensen, S.K. Andersen, B. Falck, U. Kjærulff, M. Woldbye, A.R. Sørensen, A. Rosenfalck and F. Jensen (1989) "MUNIN — An Expert EMG Assistant", Computer-Aided Electromyography and Expert Systems, Chapter 21, J.E. Desmedt (Ed.), Elsevier.

S.A. Andreassen, J.J. Benn, R. Hovorks, K.G. Olesen and R.E. Carson (1991) "A Probabilistic Approach to Glucose Prediction and Insulin Dose Adjustment: Description of Metabolic Model and Pilot Evaluation Study".

I. Beinlich, H. Suermondt, R. Chavez and G. Cooper (1992) "The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks", Proc. of the 2nd European Conf. on Artificial Intelligence in Medicine, pp. 689-693.

T.L. Dean and M.P. Wellman (1991) Planning and control, Morgan Kaufman.

T.L. Dean, J. Allen and J. Aloimonos (1994) Artificial Intelligence: Theory and Practice, Benjamin/Cummings.

P. Dagum, A. Galper and E. Horvitz (1992) "Dynamic Network Models for Forecasting", Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence, pp. 41-48.

J. Forbes, T. Huang, K. Kanazawa and S. Russell (1995) "The BATmobile: Towards a Bayesian Automated Taxi", Proceedings of the 14th Int. Joint Conf. on Artificial Intelligence (IJCAI'95), pp. 1878-1885.

M. Henrion, J.S. Breese and E.J. Horvitz (1991) Decision analysis and expert systems. AI Magazine, 12, 64-91.

K.B. Korb, I. Zukerman and R. McConachy (1997) A cognitive model of argumentation. In Proceedings of the Cognitive Science Society, Stanford University.

S.L. Lauritzen and D.J. Spiegelhalter (1988) "Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems", Journal of the Royal Statistical Society, 50(2), pp. 157-224.

M. Pradham, G. Provan, B. Middleton and M. Henrion (1994) "Knowledge engineering for large belief networks", Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence.

D. Pynadeth and M.P. Wellman (1995) "Accounting for Context in Plan Recognition, with Application to Traffic Monitoring", Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pp. 472-481.

M. Shwe and G. Cooper (1990) "An Empirical Analysis of Likelihood-Weighting Simulation on a Large, Multiply Connected Belief Network", Proceedings of the Sixth Workshop on Uncertainty in Artificial Intelligence, pp. 498-508.

L.C. van der Gaag, S. Renooij, C.L.M. Witteman, B.M.P. Aleman and B.G. Taal (1999) "How to Elicit Many Probabilities", Laskey & Prade (eds) UAI99, 647-654.

Zukerman, I., McConachy, R., Korb, K. and Pickett, D. (1999) "Exploratory Interaction with a Bayesian Argumentation System," in IJCAI-99 Proceedings – the Sixteenth International Joint Conference on Artificial Intelligence, pp. 1294-1299, Stockholm, Sweden, Morgan Kaufmann.

Learning Bayesian Networks

H. Blalock (1964) Causal Inference in Nonexperimental Research. University of North Carolina.

R. Bouckeart (1994) Probabilistic network construction using the minimum description length principle. Technical Report RUU-CS-94-27, Dept of Computer Science, Utrecht University.

C. Boutillier, N. Friedman, M. Goldszmidt and D. Koller (1996) "Context-specific independence in Bayesian networks," in Horvitz & Jensen (eds.) UAI 1996, 115-123.

G. Brightwell and P. Winkler (1990) Counting linear extensions is #P-complete. Technical Report DIMACS 90-49, Dept of Computer Science, Rutgers Univ.

W. Buntine (1991) "Theory refinement on Bayesian networks," in D'Ambrosio, Smets and Bonissone (eds.) UAI 1991, 52-69.

W. Buntine (1996) "A Guide to the Literature on Learning Probabilistic Networks from Data," IEEE Transactions on Knowledge and Data Engineering, 8, 195-210.

D.M. Chickering (1995) "A Transformational Characterization of Equivalent Bayesian Network Structures," in P. Besnard and S. Hanks (eds.) Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 87-98). San Francisco: Morgan Kaufmann.
STATISTICAL EQUIVALENCE.

G.F. Cooper and E. Herskovits (1991) "A Bayesian Method for Constructing Bayesian Belief Networks from Databases," in D'Ambrosio, Smets and Bonissone (eds.) UAI 1991, 86-94.

G.F. Cooper and E. Herskovits (1992) "A Bayesian Method for the Induction of Probabilistic Networks from Data," Machine Learning, 9, 309-347.
AN EARLY BAYESIAN CAUSAL DISCOVERY METHOD.

H. Dai, K.B. Korb, C.S. Wallace and X. Wu (1997) "A study of causal discovery with weak links and small samples." Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI), pp. 1304-1309. Morgan Kaufmann.

N. Friedman (1997) "The Bayesian Structural EM Algorithm," in D. Geiger and P.P. Shenoy (eds.) Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (pp. 129-138). San Francisco: Morgan Kaufmann.

D. Geiger and D. Heckerman (1994) "Learning Gaussian networks," in Lopes de Mantras and Poole (eds.) UAI 1994, 235-243.

D. Heckerman and D. Geiger (1995) "Learning Bayesian networks: A unification for discrete and Gaussian domains," in Besnard and Hanks (eds.) UAI 1995, 274-284.

D. Heckerman, D. Geiger and D.M. Chickering (1995) "Learning Bayesian Networks: The Combination of Knowledge and Statistical Data," Machine Learning, 20, 197-243.
BAYESIAN LEARNING OF STATISTICAL EQUIVALENCE CLASSES.

K. Korb (1999) "Probabilistic Causal Structure" in H. Sankey (ed.) Causation and Laws of Nature: Australasian Studies in History and Philosophy of Science 14. Kluwer Academic.
INTRODUCTION TO THE RELEVANT PHILOSOPHY OF CAUSATION FOR LEARNING BAYESIAN NETWORKS.

P. Krause (1998) Learning Probabilistic Networks. www.auai.org/bayesUSKrause.ps.gz
BASIC INTRODUCTION TO BNS, PARAMETERIZATION AND LEARNING CAUSAL STRUCTURE.

W. Lam and F. Bacchus (1993) "Learning Bayesian belief networks: An approach based on the MDL principle," Jrn Comp Intelligence, 10, 269-293.

D. Madigan, S.A. Andersson, M.D. Perlman and C.T. Volinsky (1996) "Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs," Comm in Statistics: Theory and Methods, 25, 2493-2519.

D. Madigan and A.E. Raftery (1994) "Model selection and accounting for model uncertainty in graphical models using Occam's window," Jrn Amer Stat Assoc, 89, 1535-1546.

N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller (1953) "Equations of state calculations by fast computing machines," Jrn Chemical Physics, 21, 1087-1091.

J.R. Neil and K.B. Korb (1999) "The Evolution of Causal Models: A Comparison of Bayesian Metrics and Structure Priors," in N. Zhong and L. Zhous (eds.) Methodologies for Knowledge Discovery and Data Mining: Third Pacific-Asia Conference (pp. 432-437). Springer Verlag.
GENETIC ALGORITHMS FOR CAUSAL DISCOVERY; STRUCTURE PRIORS.

J.R. Neil, C.S. Wallace and K.B. Korb (1999) "Learning Bayesian networks with restricted causal interactions," in Laskey and Prade (eds.) UAI 99, 486-493.

J. Rissanen (1978) "Modeling by shortest data description," Automatica, 14, 465-471.

H. Simon (1954) "Spurious Correlation: A Causal Interpretation," Jrn Amer Stat Assoc, 49, 467-479.

D. Spiegelhalter & S. Lauritzen (1990) "Sequential Updating of Conditional Probabilities on Directed Graphical Structures," Networks, 20, 579-605.

P. Spirtes, C. Glymour and R. Scheines (1990) "Causality from Probability," in J.E. Tiles, G.T. McKee and G.C. Dean (eds.) Evolving Knowledge in Natural Science and Artificial Intelligence. London: Pitman.
AN ELEMENTARY INTRODUCTION TO STRUCTURE LEARNING VIA CONDITIONAL INDEPENDENCE.

P. Spirtes, C. Glymour and R. Scheines (1993) Causation, Prediction and Search: Lecture Notes in Statistics 81. Springer Verlag.
A THOROUGH PRESENTATION OF THE ORTHODOX STATISTICAL APPROACH TO LEARNING CAUSAL STRUCTURE.

J. Suzuki (1996) "Learning Bayesian Belief Networks Based on the Minimum Description Length Principle," in L. Saitta (ed.) Proceedings of the Thirteenth International Conference on Machine Learning (pp. 462-470). San Francisco: Morgan Kaufmann.

T.S. Verma and J. Pearl (1991) "Equivalence and Synthesis of Causal Models," in P. Bonissone, M. Henrion, L. Kanal and J.F. Lemmer (eds) Uncertainty in Artificial Intelligence 6 (pp. 255-268). Elsevier.
THE GRAPHICAL CRITERION FOR STATISTICAL EQUIVALENCE.

C.S. Wallace and D. Boulton (1968) "An information measure for classification," Computer Jrn, 11, 185-194.

C.S. Wallace and P.R. Freeman (1987) "Estimation and inference by compact coding," Jrn Royal Stat Soc (Series B), 49, 240-252.

C.S. Wallace and K.B. Korb (1999) "Learning Linear Causal Models by MML Sampling," in A. Gammerman (ed.) Causal Models and Intelligent Data Management. Springer Verlag.
SAMPLING APPROACH TO LEARNING CAUSAL MODELS; DISCUSSION OF STRUCTURE PRIORS.

C.S. Wallace, K.B. Korb and H. Dai (1996) "Causal Discovery via MML," in L. Saitta (ed.) Proceedings of the Thirteenth International Conference on Machine Learning (pp. 516-524). San Francisco: Morgan Kaufmann.
INTRODUCES AN MML METRIC FOR CAUSAL MODELS.

S. Wright (1921) "Correlation and Causation," Jrn Agricultural Research, 20, 557-585.

S. Wright (1934) "The Method of Path Coefficients," Annals of Mathematical Statistics, 5, 161-215.
