Artificial Intelligence
December 5, 2013
Course Introduction
Python Brief Introduction
Agents and their Task-environments
Goal-based Problem-solving Agents using Searching
Non-classical Search Algorithms
Games Agents Play
Logical Agents: Propositional Logic
Probability Calculus
Beginning to Learn using Naïve Bayesian Classifiers
K. Pathak (Jacobs University Bremen)
Bayesian Networks
Course Introduction
Contents
Course Introduction
Course Logistics
What is Artificial Intelligence (AI)?
Foundations of AI
History of AI
State of the Art
Course Logistics
Grading
Break-down:
  Easy quizzes: 15% (Auditors: taking 75% of the quizzes is necessary.)
  Homeworks (5): 25%
  Mid-term exam: 30%, 23rd Oct. (Wed.), after Reading Days.
  Final exam: 30%
If you have an official excuse for a quiz/exam, a make-up will be provided. For homeworks, make-ups will be decided on a case-by-case basis: an official excuse covering at least three days immediately before the deadline is necessary.
Home-works: Python or C++.
Teaching Assistant: Vahid Azizi v.azizi@jacobs-university.de
Course Logistics
Teaching Philosophy
No question will be ridiculed.
Some questions may be taken offline or might be postponed.
Homeworks are where you really learn!
Not all material will be in the slides. Some material will be derived on
the board - you should take lecture-notes yourselves.
Material done on the board is especially likely to appear in
quizzes/exams.
Course Logistics
Coming Up...
Course Logistics
Textbooks
Main textbook:
Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd Edition, 2010, Pearson International Edition.
Other references:
Uwe Schöning, Logic for Computer Scientists, English 2001, German 2005, Birkhäuser.
Daphne Koller and Nir Friedman, Probabilistic Graphical Models:
Principles and Techniques, 2009, MIT Press.
Course Logistics
Syllabus
Introduction to AI; Intelligent agents:
Chapters 1,2.
Logical Agents
Chapter 7.
Introduction to Machine-Learning
Supervised Learning: Information Entropy, Decision Trees, ANNs:
Chapter 18.
Model Estimation: Priors, Maximum Likelihood, Kalman Filter, EKF,
RANSAC.
Learning Probabilistic Models:
Chapter 20.
Unsupervised Learning: Clustering (K-Means, Mean-Shift Algorithm).
Course Introduction
Defining AI
Human-centered vs. Rationalist Approaches
Thinking Humanly: "[The automation of] activities that we associate with human thinking, activities such as decision-making, problem-solving, learning..." (Bellman, 1978)

Acting Humanly: "The art of creating machines that perform functions that require intelligence when performed by people." (Kurzweil, 1990)

Thinking Rationally: "The study of the computations that make it possible to perceive, reason, and act." (Winston, 1992)
Acting Humanly
Thinking Humanly
Trying to discover how human minds
work. Three ways:
Introspection
Psychological experiments on
humans
Brain imaging: Functional
Magnetic Resonance Imaging
(fMRI), Positron Emission
Tomography (PET), EEG, etc.
Cognitive Science constructs testable
theories of mind:
Thinking Rationally
Acting Rationally
Acting Rationally
The Rational Agent Approach
Foundations of AI
Mathematics
Logic, computational tractability, probability theory.
Economics
Utility theory, decision theory (probability theory + utility theory), game
theory.
Foundations of AI
Neuroscience
The exact way the brain enables thought is still a scientific mystery. However, the mapping between areas of the brain and the parts of the body they control or receive sensory input from can be found, though it can change over the course of a few weeks.
Figure 4: The human cortex with the various lobes shown in different colors. The information from the visual cortex gets channeled into the dorsal (where/how) and the ventral (what) streams.
Foundations of AI
Psychology
Behaviorism (stimulus/response), Cognitive psychology.
Computer Engineering
Hardware and Software. Computer vision.
Linguistics
Natural language processing.
Foundations of AI
The basic idea of control theory is to use sensory feedback to alter system
inputs so as to minimize the error between desired and observed output.
Basic example: controlling the movement of an industrial robotic arm to a
desired orientation.
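The feedback idea above can be sketched as a minimal proportional controller. The one-dimensional joint model and the gain kp are illustrative assumptions, not from the course (Python 3 syntax, unlike the slides' Python 2):

```python
def p_control(setpoint, x0, kp=0.5, steps=50):
    """Drive the state x toward setpoint by feeding back the error."""
    x = x0
    for _ in range(steps):
        error = setpoint - x   # error between desired and observed output
        x += kp * error        # control input proportional to the error
    return x

# The (hypothetical) joint angle converges toward the desired orientation.
final = p_control(setpoint=1.0, x0=0.0)
print(abs(final - 1.0) < 1e-6)  # True
```

With 0 < kp < 2 the error shrinks geometrically each step; real controllers add integral and derivative terms to handle disturbances and dynamics.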
History of AI
History of AI I
Gestation Period (1943-1955)
McCulloch and Pitts (1943) proposed a model for the neuron. Hebbian
learning (1949) for updating inter-neuron connection strengths developed.
Alan Turing published Computing Machinery and Intelligence (1950),
proposing the Turing test, machine learning, genetic algorithms, and
reinforcement learning.
Birth of AI (1956)
The Dartmouth workshop organized by John McCarthy (then at Dartmouth, later of Stanford).
History of AI II
After the Sputnik launch (1957), automatic Russian to English translation
attempted. Failed miserably.
1. The spirit is willing, but the flesh is weak. Translated to:
2. The vodka is good but the meat is rotten.
Computational complexity scaling-up could not be handled. Single-layer perceptrons were found to have very limited representational power. Most government funding stopped.
AI in Industry (1980-present)
History of AI III
Companies like DEC, DuPont, etc. developed expert systems. The industry boomed, but extravagant promises went unfulfilled, leading to the "AI winter".
History of AI IV
Huge data-sets (2001-present)
Learning based on very large data-sets. Example: filling in holes in a photograph; Hays and Efros (2007). Performance went from poor for 10,000 samples to excellent for 2 million samples.
Successful Applications
Intelligent Software Wizards and Assistants
Example: Siri.
Logistics Planning
Dynamic Analysis and Replanning Tool (DART). Used during the Gulf War (1990s) for the scheduling of transportation. DARPA stated that this single application paid back DARPA's 30 years of investment in AI.
DART won DARPA's Outstanding Performance by a Contractor award, for modification and transportation feasibility analysis of the Time-Phased Force and Deployment Data used during Desert Storm.
http://www.bbn.com
Flow Machines
2013 Best AI Video Award: http://www.aaaivideos.org
Intelligent Textbook
2012 Best AI Video Award: http://www.aaaivideos.org
Registered Point-Clouds
(a) The Microsoft Kinect 3D camera (from Wikipedia). (b) A point-cloud obtained from it (from Willow Garage).
RGBD Segmentation
Unsupervised Clustering By Mean-Shift Algorithm
Figure 13: IEEE Int. Conf. on Robotics & Automation (ICRA) 2011: Perception Challenge. Our group won second place, between Berkeley (first) and Stanford (third). Video (2:39)
Contents
Data-types
Built-in Data-types

Type               Example                               Immutable
Numbers            12, 3.4, 7788990L, 6.1+4j, Decimal    yes
Strings            "abcd", 'abc', "abcs"                 yes
Boolean            True, False                           yes
Lists              [True, 1.2, "vcf"]                    no
Dictionaries       {"A": 25, "V": 70}                    no
Tuples             ("ABC", 1, 'Z')                       yes
Sets/FrozenSets    {90, 'a'}, frozenset({'a', 2})        no / yes
Files              f = open('spam.txt', 'r')             -
Single Instances   None, NotImplemented                  -
...
Data-types
Sequences I
str, list, tuple
Data-types
Sequences II
str, list, tuple
Immutability
a = "wxyz"; b = [2, 0, 1]  # (assumed setup; the original definitions were on the previous slide)
a[1]= 'q' # Fails: strings are immutable
b[1]= 's' # OK: lists are mutable
c= a
c is a, c==a
a= "xyz"; c is a # False: a is rebound to a new object
Help
dir(b)
help(b.sort)
b.sort()
b # In-place sorting
Sequences III
str, list, tuple
Slicing
a[1:2]
a[0:-1]
a[:-1], a[3:]
a[:]
a[0:len(a):2]
a[-1::-1]
Sequences IV
str, list, tuple
Nesting
A=[[1,2,3],[4,5,6],[7,8,9]]
A[0]
A[0][2]
A[0:-1][-1]
A[3] # Error
List Comprehension
q= [x.isdigit() for x in a]
print q
p=[(r[1]**2) for r in A if r[1]< 8] # Power
print p
Sequences V
str, list, tuple
Dictionaries
D= {0:'Rhine', 1:"Indus", 3:"Hudson"}
D[0]
D[6] # Error
D[6]="Volga"
dir(D)
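A missing key raises a KeyError, as the D[6] line above shows; dict.get and the in operator give safe alternatives (Python 3 syntax; the keys are the same river example):

```python
D = {0: 'Rhine', 1: "Indus", 3: "Hudson"}

print(D.get(6))              # None instead of a KeyError
print(D.get(6, "unknown"))   # unknown - a caller-supplied default
print(6 in D)                # False: membership tests look at keys

D[6] = "Volga"               # adding a new key-value pair
print(6 in D)                # True
print(sorted(D))             # [0, 1, 3, 6]
```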
Numbers I
Math Operations
Data-types
Numbers II
Booleans
s1= True
s2= 3 < 5 <=100
not(s1 and s2) or (not s2)
x= 2 if s2 else 3
b1= 0xE210 # Hex
print b1
b2= 023 # Oct
print b2
b1 & b2 # Bitwise and
b1 | b2 # Bitwise or
b1 ^ b2 # Bitwise xor
b1 << 3 # Shift b1 left by 3 bits
~b1 # Bitwise complement
Dynamic Typing I
Variables are names and have no types. They can refer to objects of
any type. Type is associated with objects.
a= "abcf"
b= "abcf"
a==b, a is b
a= 2.5
Objects are garbage-collected automatically.
Shared references
Dynamic Typing II
a= [4,1,5,10]
b=a
b is a
a.sort()
b is a
a.append('w')
a
b is a
a= a + ['w'] # Creates a new list and rebinds a
a
b is a
b
x= 42
y= 42
x is y, x==y
x= [1,2,3]; y=[1,2,3]
x is y, x==y
x=123; y= 123
x is y, x==y # Wassup?
# Assignments create references
L= [1,2,3]
M= ['x', L, 'c']
M
L[1]= 0
M
# To copy
L= [1,2,3]
M= ['x', L[:], 'c']
M
L[1]= 0
M
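The L[:] trick above makes a shallow copy: only the outer list is new. For nested lists, copy.deepcopy also copies the inner objects (a minimal Python 3 sketch):

```python
import copy

L = [1, [2, 3]]
shallow = L[:]              # new outer list, but the inner list is shared
deep = copy.deepcopy(L)     # fully independent copy

L[1][0] = 99
print(shallow[1][0])  # 99 - the shared inner list was mutated
print(deep[1][0])     # 2  - the deep copy is unaffected
```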
Control Statements
Control Statements I
Mind the indentation! One extra carriage return is needed to finish a block in interactive mode.
import sys, random
tmp= sys.stdout
sys.stdout = open('log.txt', 'a')
x= random.random()
if x < 0.25:
    [y, z]= [-1, 4]
elif 0.25 <= x < 0.75:
    print 'case 2'
    y= z= 0
else:
    z, y= 0, 2; y += 3
sys.stdout = tmp
Loops I
While
i= 0
while i< 5:
    s= raw_input("Enter an int: ")
    try:
        j= int(s)
    except:
        print 'invalid input'
        break
    else:
        print "Its square is %d" % j**2
        i += 1
else:
    print "exited normally without break"
Loops II
For
X= range(2,10,2) # [2, 4, 6, 8]
N= 7
for x in X:
    if x> N:
        print x, "is >", N
        break
else:
    print 'no number >', N, 'found'
Functions
Functions I
Arguments are passed by assignment
def change_q(q, p): # (definition lost in extraction; presumably it extends q and rebinds p)
    q += ['d', 'g']
    p = 'xyz'

x= ['a','b','c'] # Mutable
y= 'bdg' # Immutable
print x, y
change_q(q=x,p=y)
print x, y

Output
['a', 'b', 'c'] bdg
['a', 'b', 'c', 'd', 'g'] bdg
Functions II
Scoping rule: LEGB= Local-function, Enclosing-function(s), Global
(module), Built-ins.
v= 99
def local():
    def locallocal():
        v= u # u comes from the enclosing function's scope
        print "inside locallocal ", v
    u= 7; v= 2
    locallocal()
    print "outside locallocal ", v
def glob1():
    global v
    v += 1
Functions III
local()
print v
glob1()
print v
Output
inside locallocal 7
outside locallocal 2
99
100
Packages, Modules I
Python Standard Library http://docs.python.org/library/
Folder structure
root/
pack1/
__init__.py
mod1.py
pack2/
__init__.py
mod2.py
root should be in one of the following: 1) the program home folder, 2) PYTHONPATH, 3) the standard lib folder, or 4) a folder named in a .pth file on the path. The full search-path is in sys.path.
Importing
Packages, Modules II
Python Standard Library http://docs.python.org/library/
import pack1.mod1
import pack1.mod3 as m3
from pack1.pack2.mod2 import A,B,C
Classes I
Example
class Animal(object): # new style classes
    count= 0
    def __init__(self, _name):
        Animal.count += 1
        self.name= _name
    def __str__(self):
        return 'I am ' + self.name
    def make_noise(self):
        print (self.speak()+" ")*3
class Dog(Animal):
    def __init__(self, _name):
        Animal.__init__(self, _name)
        self.count= 1
Classes II
    def speak(self):
        return "woof"
Full examples in python examples.tgz
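A short usage sketch of the classes above, rewritten with Python 3 print-function syntax (the dog's name is an invented example):

```python
class Animal(object):
    count = 0                       # class attribute shared by all instances
    def __init__(self, _name):
        Animal.count += 1
        self.name = _name
    def __str__(self):
        return 'I am ' + self.name
    def make_noise(self):
        print((self.speak() + " ") * 3)

class Dog(Animal):
    def __init__(self, _name):
        Animal.__init__(self, _name)
        self.count = 1              # instance attribute shadowing Animal.count
    def speak(self):
        return "woof"

d = Dog("Rex")
print(d)             # I am Rex - __str__ is inherited from Animal
d.make_noise()       # woof woof woof
print(Animal.count)  # 1 - incremented by the base-class __init__
print(d.count)       # 1 - the instance attribute, not the class attribute
```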
Contents
A general agent
[Figure: a generic agent — Sensors deliver Percepts from the Environment to the agent's decision component ("?"), which selects Actions executed through Actuators.]
Agent Types
[Figure: the simple reflex agent — Sensors feed the percept to condition-action rules ("what action I should do now"), which drive the Actuators.]

Algorithm 1: Simple-Reflex-Agent
input     : percept
output    : action
persistent: rules, a set of condition-action rules
state ← Interpret-Input(percept) ;
rule ← Rule-Match(state, rules) ;
action ← rule.action ;
return action
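The pseudocode above can be rendered directly in Python; the two-square vacuum-world rule table is an illustrative assumption, not from the slides:

```python
def interpret_input(percept):
    # Here the state is simply the percept: (location, status).
    return percept

def rule_match(state, rules):
    return rules[state]

def simple_reflex_agent(percept, rules):
    state = interpret_input(percept)
    rule_action = rule_match(state, rules)
    return rule_action

# Condition-action rules for an assumed two-square vacuum world.
rules = {
    ("A", "Dirty"): "Suck",
    ("B", "Dirty"): "Suck",
    ("A", "Clean"): "Right",
    ("B", "Clean"): "Left",
}
print(simple_reflex_agent(("A", "Dirty"), rules))  # Suck
print(simple_reflex_agent(("B", "Clean"), rules))  # Left
```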
Agent Types
[Figure: the model-based reflex agent — internal State is updated from the percept using models of "how the world evolves" and "what my actions do", and condition-action rules then choose the action.]
Agent Types
Goal-based agents
[Figure: the goal-based agent — State and models of the world and of the actions' effects are combined with explicit Goals to choose "what action I should do now".]
Agent Types
Utility-based agents
[Figure: the utility-based agent — candidate outcomes predicted from State and the action model are scored by a Utility function, and the action leading to the highest utility is chosen.]
Contents
A search problem is specified by an initial state x0, valid actions u1, u2, . . . ∈ U(x) with transitions x′ = f(x, u),  (4.1)
and a goal state xg (more generally, a goal set Xg).
Examples I
[Figure: (a) an instance of the 8-puzzle with Start State and Goal State; (b) a search-tree Node stores the state plus bookkeeping — parent, action, depth = 6, path-cost g = 6 — with arrows linking children to parents.]
Examples II
Examples III
Figure 20: The map of Romania. An instance of the route planning problem
given a map.
Examples IV
Examples V
Graph-Search
Compare with Textbook Fig. 3.7
Algorithm 4: Graph-Search
input : x0 , Xg
D = ∅, the explored-set/dead-set/passive-set ;
F.Insert(⟨x0 , g(x0) = 0, φ(x0) = h(x0)⟩), the frontier/active-set ;
while F not empty do
    ⟨x, g(x), φ(x)⟩ ← F.Choose()  // remove the best x from F
    if x ∈ Xg then return SUCCESS ;
    D ← D ∪ {x} ;
    for u ∈ U(x) do
        x′ ← f(x, u),  g(x′) ← g(x) + k(x, u) ;
        if (x′ ∉ D) and (x′ ∉ F) then
            F.Insert(⟨x′ , g(x′), φ(x′) = g(x′) + h(x′, Xg)⟩) ;
        else if (x′ ∈ F) then
            F.Resolve-Duplicate(x′ , g(x′), φ(x′)) ;
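A compact Python rendering of Graph-Search with the priority φ = g + h. The 5×5 grid world, unit step costs, and Manhattan-distance heuristic are assumptions for illustration:

```python
import heapq

def graph_search(x0, goals, successors, h):
    """successors(x) yields (x_next, step_cost); h(x) estimates cost-to-go."""
    frontier = [(h(x0), 0, x0, [x0])]    # entries: (phi, g, state, path)
    best_g = {x0: 0}                     # duplicate resolution: keep lowest g
    explored = set()
    while frontier:
        phi, g, x, path = heapq.heappop(frontier)
        if x in explored:
            continue                     # a stale duplicate entry
        if x in goals:
            return path, g
        explored.add(x)
        for x2, k in successors(x):
            g2 = g + k
            if x2 not in explored and g2 < best_g.get(x2, float('inf')):
                best_g[x2] = g2
                heapq.heappush(frontier, (g2 + h(x2), g2, x2, path + [x2]))
    return None, float('inf')

# 4-connected grid with unit costs and a Manhattan-distance heuristic.
def successors(p):
    x, y = p
    for q in [(x+1, y), (x-1, y), (x, y+1), (x, y-1)]:
        if 0 <= q[0] < 5 and 0 <= q[1] < 5:
            yield q, 1

path, cost = graph_search((0, 0), {(4, 3)},
                          successors, lambda p: abs(p[0]-4) + abs(p[1]-3))
print(cost)  # 7
```

With an admissible h this behaves as A*; with h ≡ 0 it reduces to Dijkstra/uniform-cost search.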
BFS Example
Figure 23: The graph separation by the frontier. The node x in the frontier is
chosen for further expansion. The set D is a tree.
Algorithm 5: Depth-limited-Search
input : current-state x, depth d
if x ∈ Xg then
    return SUCCESS ;
else if d = 0 then
    return CUTOFF ;
else
    for u ∈ U(x) do
        x′ ← f(x, u) ;
        result ← Depth-limited-Search(x′ , d − 1) ;
        if result = SUCCESS then
            return SUCCESS ;
        else if result = CUTOFF then
            cutoff-occurred ← true ;
    if cutoff-occurred then return CUTOFF ;
    else return NOT-FOUND ;
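A Python sketch of depth-limited search, wrapped into iterative deepening; the small adjacency-dict tree at the bottom is an assumed example:

```python
CUTOFF, NOT_FOUND = "cutoff", "not-found"

def depth_limited_search(x, goals, successors, d):
    if x in goals:
        return [x]                        # SUCCESS: return the path
    if d == 0:
        return CUTOFF
    cutoff_occurred = False
    for x2 in successors(x):
        result = depth_limited_search(x2, goals, successors, d - 1)
        if result == CUTOFF:
            cutoff_occurred = True
        elif result != NOT_FOUND:
            return [x] + result
    return CUTOFF if cutoff_occurred else NOT_FOUND

def iterative_deepening(x0, goals, successors, max_depth=20):
    for d in range(max_depth + 1):        # re-run DLS with a growing limit
        result = depth_limited_search(x0, goals, successors, d)
        if result == NOT_FOUND:
            return NOT_FOUND              # whole finite tree searched
        if result != CUTOFF:
            return result                 # a path was found
    return NOT_FOUND

# A small tree given as an adjacency dict (assumed example).
tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G'], 'E': ['M']}
succ = lambda x: tree.get(x, [])
print(iterative_deepening('A', {'M'}, succ))  # ['A', 'B', 'E', 'M']
```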
DFS Example
[Figure: depth-first search on a binary tree with goal node M — the deepest unexpanded node is always expanded first, backtracking when a subtree is exhausted.]
Iterative deepening keeps the O(b^d) time complexity while combining the advantages of BFS and DFS.
[Figure: iterative-deepening search on a binary tree (nodes A, B, C, D, E, F, G, ...) for depth limits 0, 1, 2, and 3 — each iteration restarts depth-limited search from the root A with an increased limit, until goal node M is reached.]
A* Search
Est. path-cost from x0 to xg through x = g(x) + est. path-cost from x to xg.  (4.2)
With a heuristic h(x) estimating the path-cost from x to xg,  (4.3)
the evaluation function is φ(x) = g(x) + h(x).  (4.4)
A* Search
Resolve-Duplicate
Straight-line distances to Bucharest:
Arad 366       Bucharest 0      Craiova 160     Drobeta 242      Eforie 161
Fagaras 176    Giurgiu 77       Hirsova 151     Iasi 226         Lugoj 244
Mehadia 241    Neamt 234        Oradea 380      Pitesti 100      Rimnicu Vilcea 193
Sibiu 253      Timisoara 329    Urziceni 80     Vaslui 199       Zerind 374
Admissibility: 0 ≤ h(x) ≤ h*(x), the true optimal cost-to-go from x.  (4.5)
Consistency: h(x) ≤ k(x, u) + h(f(x, u)) for every valid action u ∈ U(x).  (4.6)
(4.7)
(4.8)
A* proof I
Lemma 4.5 (φ(x) is non-decreasing along any optimal path, if h(x) is consistent)
Figure 25: A dashed line between two nodes denotes that the nodes are
connected by a path, but are not necessarily directly connected.
To prove: Let xp be a node for which the optimum path with cost g*(xp) has been found (see figure). If nodes xm and xn lie on this optimum path such that xm precedes xn, then
φ(xm) ≤ φ(xn).  (4.9)
A* proof II
Proof.
First note that since xm and xn lie on the optimum path to xp, their paths are also optimum, with costs g*(xm) and g*(xn) respectively.
Let us first assume that xm is the parent of xn via action u; then
φ(xn) = g*(xn) + h(xn)  (4.10)
       = g*(xm) + k(xm, u) + h(xn)  (4.11)
       ≥ g*(xm) + h(xm) = φ(xm),  by consistency (4.6).  (4.12)
The general case follows by chaining this inequality along the path from xm to xn.
Recall Graph-Search
Algorithm 7: Graph-Search
input : x0 , Xg
D = ∅, the explored-set/dead-set/passive-set ;
F.Insert(⟨x0 , g(x0) = 0⟩), the frontier/active-set ;
while F not empty do
    ⟨x, g(x)⟩ ← F.Choose()  // remove x from the frontier
    if x ∈ Xg then return SUCCESS ;
    D ← D ∪ {x} ;
    for u ∈ U(x) do
        x′ ← f(x, u),  g(x′) ← g(x) + k(x, u) ;
        if (x′ ∉ D) and (x′ ∉ F) then
            F.Insert(⟨x′ , g(x′) + h(x′, Xg)⟩) ;
        else if (x′ ∈ F) then
            F.Resolve-Duplicate(x′) ;
A* proof I
Lemma 4.6 (At selection for expansion, a node's optimum path has been found)
To prove: In every iteration k of A*, the node x selected for expansion from the frontier Fk (x has the minimum value of φ(x) in Fk) is such that, at selection:
g(x) = g*(x).  (4.13)
Figure 26: The path shown in blue is the assumed optimal path, with xm ∈ D
Let the last node of the assumed optimal path lying in D be xm ∈ D. Thus, the entire assumed optimal path consists of the following segments:
(4.14)
k+1 (xp )
(4.15)
= h(xp ) + g (xp ).
(xp )
= g (xp )
(4.16)
(xp )
(xn )
(4.17a)
k+1 (xn ),
(4.17b)
(4.16)
k+1 (xp )
(xp ).
(4.18)
Combining results,
(4.17a)
(xp )
(xn ) =
(4.18)
(xn )
k+1 (xn ).
k+1 (xn )
(xp ),
(4.19)
(4.20)
As φ*(xn) = φk+1(xn),  (4.21)
φ(xj) ≤ φ(xj+1).  (4.22)
Why?
Proof.
Sketch: You have to consider two cases:
1. At iteration j + 1, xj+1 is a child of xj.
2. At iteration j + 1, xj+1 is not a child of xj.
Remark 4.8
This shows that at iteration N = j, if Fj selects xj, then all nodes x with φ(x) < φ(xj) have already been expanded (i.e. they have died), and some nodes with φ(x) = φ(xj) have also been expanded.
In particular, when the first goal xg is found, all nodes x with φ(x) < g*(xg) have already been expanded, and some nodes with φ(x) = g*(xg) have also been expanded.
Figure 27: Region searched before finding a solution: Dijkstra path search
Figure 28: Region searched before finding a solution: A* path search. The number of nodes expanded is the minimum possible.
Properties of A*
The paths are optimal w.r.t. the cost function, but do you notice any undesirable properties of the planned paths?
Why are there fewer colors in Fig. 18 than in Fig. 17?
In Fig. 18, why are the red shades lighter in the beginning?
Figure 29: Funnel-planning using wave-front expansion. The path stays away
from the obstacles.
Funnel path-planning
Figure 30: Funnel-planning using wave-front expansion in 3D. Source: Brock and Kavraki, Decomposition-based motion-planning: A framework for real-time motion-planning in high-dimensional configuration spaces, ICRA 2001.
Algorithm 8: Funnel-Planning
input: x0 , xg
B ← findFreeSphere(x0)
B.parent ← NIL
Q.insert(B, ‖B.center − xg‖ − B.r)
while Q not empty do
    B ← Q.getMin()
    D.insert(B)
    if xg ∈ B then
        return [D, B]
    for s ← 1 . . . Ns do
        x ← sampleOnSurface(B)
        if x ∉ D then
            C ← findFreeSphere(x)
            C.parent ← B
            Q.insert(C, ‖C.center − xg‖ − C.r)
Funnel path-planning
Figure 31: Motion planning using the funnel potentials. Source: LaValle,
Planning Algorithms, http://planning.cs.uiuc.edu/.
Contents
Local Search
[Figure 32: A 1-D state-space landscape — the objective function over the state space, marking the current state, a flat local maximum, a shoulder, a local maximum, and the global maximum. The aim is to find the global maximum.]
Hill-Climbing
Algorithm 9: Hill-Climbing
input : x0, objective (value) function v(x) to maximize
output: x*, the state where a local maximum is achieved
x ← x0 ;
while True do
    y ← the highest-valued child of x ;
    if v(y) ≤ v(x) then return x ;
    x ← y
To avoid getting stuck in plateaux:
Allow side-ways movements.
Problems?
Random-restart: perform search from
many randomly chosen x0 till an
optimal solution is found.
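The algorithm and the random-restart idea above can be sketched as follows; the integer state space and the toy objective v(x) = -(x - 7)^2 are assumed examples:

```python
import random

def hill_climb(x0, value, children):
    x = x0
    while True:
        best = max(children(x), key=value)   # highest-valued child
        if value(best) <= value(x):
            return x                         # local maximum reached
        x = best

def random_restart(value, children, sample, tries=20):
    # Run hill-climbing from many random starts and keep the best result.
    return max((hill_climb(sample(), value, children) for _ in range(tries)),
               key=value)

value = lambda x: -(x - 7) ** 2              # global maximum at x = 7
children = lambda x: [x - 1, x + 1]
random.seed(0)
x_star = random_restart(value, children, lambda: random.randint(-50, 50))
print(x_star)  # 7
```

On this single-peaked landscape every restart reaches the global maximum; restarts matter when the landscape has many local maxima.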
Hill-Climbing
Figure 34: (a) Starting state. h(x) is the number of pairs of queens attacking each other. Each node has 8 × 7 = 56 children. (b) A local minimum with h(x) = 1.
Problem 5.2
Sampling a PMF: A uniform random number generator on the unit interval has the probability density function pu[0,1](x) shown in the figure below. Python's random.random() returns a sample x ∈ [0.0, 1.0). How can you use it to sample a given discrete distribution (PMF)?
[Figure: pu[0,1](x) = 1 on [0, 1) and 0 elsewhere, so that]
P(x ∈ [a, b] ; 0 ≤ a ≤ b < 1) = b − a
For a discrete random variable A with values a1 < a2 < . . . < an, the cumulative distribution is
FA(aj) := P(A ≤ aj) = Σ_{i=1}^{n} P(A = ai) u(aj − ai),  where  (5.1)
u(x) = 0 for x < 0 and u(x) = 1 for x ≥ 0,  (5.2)
so that FA(aj) = Σ_{i ≤ j} P(A = ai).  (5.3)
Partition the unit interval into [0, FA(a1)), [FA(a1), FA(a2)), . . . and return ai when the sample s lands in [FA(ai−1), FA(ai)), where a0 is defined s.t. FA(a0) := 0.  (5.4)
Then P(return ai) = FA(ai) − FA(ai−1) = P(ai), by (5.3).
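The inverse-CDF scheme just described, sketched in Python (the three-valued example PMF is an assumption):

```python
import random

def sample_pmf(values, probs, u=None):
    """Return values[i] with probability probs[i], from one uniform sample."""
    if u is None:
        u = random.random()        # u ~ U[0, 1)
    cdf = 0.0
    for a, p in zip(values, probs):
        cdf += p                   # cdf = F_A(a_i) after this step
        if u < cdf:                # u fell in [F_A(a_{i-1}), F_A(a_i))
            return a
    return values[-1]              # guard against floating-point round-off

# Example PMF (assumed): P(A=1)=0.2, P(A=2)=0.5, P(A=3)=0.3.
print(sample_pmf([1, 2, 3], [0.2, 0.5, 0.3], u=0.65))  # 2
```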
Simulated Annealing
A move that worsens the objective by ΔE < 0 is accepted with probability e^{ΔE/T}.  (5.5)
An example of a cooling schedule is
Tk = T0 ln(k0) / ln(k).  (5.6)
Applications:
VLSI layouts,
Factory-scheduling.
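A generic annealing loop using the logarithmic schedule (5.6); the one-dimensional objective and neighbor move are assumed toy examples:

```python
import math, random

def simulated_annealing(x0, value, neighbor, T0=10.0, k0=2, iters=20000):
    x = x0
    for k in range(k0, k0 + iters):
        T = T0 * math.log(k0) / math.log(k)   # cooling schedule (5.6)
        x2 = neighbor(x)
        dE = value(x2) - value(x)
        # Always accept improvements; accept worse moves with prob e^(dE/T).
        if dE > 0 or random.random() < math.exp(dE / T):
            x = x2
    return x

random.seed(1)
value = lambda x: -(x - 3.0) ** 2             # maximum at x = 3
neighbor = lambda x: x + random.uniform(-0.5, 0.5)
x_star = simulated_annealing(0.0, value, neighbor)
print(x_star)  # typically close to 3
```

Because the temperature never reaches zero here, the final state still fluctuates around the optimum; a final greedy descent (or a faster schedule) sharpens the result.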
Simulated Annealing
Another Schedule
Simulated Annealing
[Figure: a simulated-annealing run — the objective value and the temperature T (decaying from about 1000) plotted against the iteration number, up to 250,000 iterations.]
Simulated Annealing
Figure 35: A suboptimal tour found by the algorithm in one of the runs.
Genetic Algorithms
Genetic Algorithms
(a) Initial Population   (b) Fitness   (c) Selection   (d) Crossover   (e) Mutation
24748552                 24 (31%)      32752411        32748552        32748152
32752411                 23 (29%)      24748552        24752411        24752411
24415124                 20 (26%)      32752411        32752124        32252124
32543213                 11 (14%)      24415124        24415411        24415417
Figure 36: The 8-Queens problem. The ith number in the string is the position of the queen in the ith column. The fitness function is the number of non-attacking pairs (maximum fitness 28).
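The GA operators of Figure 36 can be sketched directly on the digit strings — fitness counts non-attacking pairs (max 28), crossover splices two parents, mutation changes one digit (a minimal sketch, not a full GA loop):

```python
def fitness(s):
    """Non-attacking queen pairs; s[i] is the row of the queen in column i."""
    q = [int(c) for c in s]
    n = len(q)
    attacks = sum(1 for i in range(n) for j in range(i + 1, n)
                  if q[i] == q[j] or abs(q[i] - q[j]) == j - i)
    return n * (n - 1) // 2 - attacks      # 28 - attacks for n = 8

def crossover(a, b, point):
    return a[:point] + b[point:]

def mutate(s, pos, row):
    return s[:pos] + str(row) + s[pos + 1:]

# The Figure 36 example: crossing 32752411 and 24748552 after position 3.
print(crossover("32752411", "24748552", 3))  # 32748552
print(fitness("32752411"))                   # 23, as listed in the figure
```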
Genetic Algorithms
Processing times of operation Oij on machine Mk ("?": value unrecoverable):

Job   Operation   M1   M2   M3   M4   M5
J1    O11         2    6    5    3    4
J1    O12         3    8    6    4    5
J2    O21         4    6    5    5    8
J2    O22         -    7    11   ?    ?
J2    O23         ?    ?    ?    ?    ?
Genetic Algorithms
GA Example: Constraints
Genetic Algorithms
Genetic Algorithms
GA Example: Mutation
Genetic Algorithms
GA Example: Run
Contents
Minimax
Zero-Sum Games
A Partial Game-Tree for Tic-Tac-Toe
[Figure: a partial game tree for Tic-Tac-Toe — MAX (X) and MIN (O) alternate moves from the empty board down to TERMINAL states, whose Utility is −1, 0, or +1 from MAX's point of view.]
Minimax
Search-Tree vs Game-Tree
Minimax
Zero-Sum Games
Nomenclature
Minimax
Figure 44: Each node (state) labeled with its minimax value.
Minimax(s) =
  Utility(s)                                   if Terminal-Test(s)
  max_{a ∈ Actions(s)} Minimax(Result(s, a))   if Player(s) = MAX
  min_{a ∈ Actions(s)} Minimax(Result(s, a))   if Player(s) = MIN
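The recursion translates directly to Python when a game tree is encoded as nested lists whose leaves are utilities (an assumed representation; the values below match the two-ply tree of Figure 44):

```python
def minimax(s, player_max=True):
    if not isinstance(s, list):        # Terminal-Test: a leaf holds its utility
        return s
    values = [minimax(c, not player_max) for c in s]
    return max(values) if player_max else min(values)

# MAX root over three MIN nodes with leaf utilities, as in Figure 44.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # 3
```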
Minimax
Search-Tree Complexity
Alpha-Beta Pruning
[Figure: alpha-beta pruning on the game tree of Figure 44 — as soon as a MIN node's value drops below the best value MAX already has at the root, that node's remaining successors are pruned.]
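Alpha-beta keeps the best value found so far for MAX (α) and for MIN (β) and stops expanding a node as soon as its value can no longer affect the root (a minimal sketch over the same nested-list trees as the minimax example; the representation is an assumption):

```python
def alphabeta(s, alpha=float('-inf'), beta=float('inf'), player_max=True):
    if not isinstance(s, list):            # leaf: utility value
        return s
    if player_max:
        v = float('-inf')
        for c in s:
            v = max(v, alphabeta(c, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:              # beta cut-off: MIN avoids this node
                break
        return v
    else:
        v = float('inf')
        for c in s:
            v = min(v, alphabeta(c, alpha, beta, True))
            beta = min(beta, v)
            if alpha >= beta:              # alpha cut-off
                break
        return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # 3 - same value as minimax, fewer nodes visited
```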
Alpha-Beta Pruning
Reference
Alpha-Beta Pruning
Transposition Table
Contents
Knowledge-Base
A Knowledge-Base is a set of sentences expressed in a knowledge
representation language. New sentences can be added to the KB and it
can be queried about whether a given sentence can be inferred from what
is known.
A KB-Agent is an example of a Reflex-Agent explained previously.
Algorithm 20: Knowledge-Base (KB) Agent
input : KB, a knowledge-base,
t, time, initially 0.
Tell(KB, Make-Percept-Sentence(percept, t)) ;
action ← Ask(KB, Make-Action-Query(t)) ;
Tell(KB, Make-Action-Sentence(action, t)) ;
t ← t + 1 ;
return action
Propositional Logic
A simple knowledge representation language
Propositional Logic
Sentence → ¬ Sentence
         | Sentence ∧ Sentence
         | Sentence ∨ Sentence
         | Sentence ⇒ Sentence
         | Sentence ⇔ Sentence
Operator-Precedence : ¬, ∧, ∨, ⇒, ⇔  (7.1)
Axioms are sentences which are given and cannot be derived from other
sentences.
Propositional Logic
An assignment A maps each atomic formula Bi to a truth value A(Bi) ∈ {0, 1} and extends to compound formulas, e.g.
A((P ∨ Q)) = 1 if A(P) = 1 or A(Q) = 1, and 0 otherwise.
Propositional Logic
A(P)   A(Q)   A(P ⇒ Q)   A(P ⇔ Q)
1      1      1          1
1      0      0          0
0      1      1          0
0      0      1          1
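Such truth tables can be enumerated mechanically. A small model-checking sketch in which formulas are Python functions of an assignment (an assumed representation, not the course's implementation):

```python
from itertools import product

def entails(kb, query, symbols):
    """KB |= query iff query holds in every model of the KB."""
    for vals in product([True, False], repeat=len(symbols)):
        A = dict(zip(symbols, vals))       # one candidate assignment
        if kb(A) and not query(A):         # a model of KB falsifying query
            return False
    return True

implies = lambda p, q: (not p) or q

# KB: (P => Q) and P.  Query: Q.  By modus ponens, KB |= Q.
kb = lambda A: implies(A['P'], A['Q']) and A['P']
print(entails(kb, lambda A: A['Q'], ['P', 'Q']))      # True
print(entails(kb, lambda A: not A['Q'], ['P', 'Q']))  # False
```

This is exactly the truth-table enumeration behind the TT-Check procedure later in the chapter: exponential in the number of symbols, but sound and complete.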
Propositional Logic
Theorem 7.2
A formula F is valid if and only if ¬F is unsatisfiable.
Proof.
F is valid iff every suitable assignment of F is a model of F,
iff every suitable assignment of F (and hence, of ¬F) is not a model of ¬F,
iff ¬F has no model, and hence, is unsatisfiable.
Propositional Logic
Wumpus World
[Figure: the 4×4 Wumpus World — squares adjacent to a PIT are Breezy, squares adjacent to the Wumpus are Stenchy, the Gold glitters in its square, and the agent begins at START.]
Propositional Logic
Wumpus World KB
Px,y is true if there is a pit in [x, y].
Wx,y is true if the Wumpus is in [x, y].
Bx,y is true if there is a breeze in [x, y].
Sx,y is true if there is a stench in [x, y].
(7.2)
(7.3)
(7.4)
R5 : B2,1  (7.5)
Articial Intelligence
December 5, 2013
168 / 475
Entailment
F entails G, written F ⊨ G, iff every model of F is a model of G:
  M(F) ⊆ M(G).   (7.6)
Proof.
Assume F ⊨ G and G ⊨ F.
Equivalence
Example 7.7
In the following, ∧ and ∨ can be swapped to get new equivalences.
  ¬¬F ≡ F                                           (7.7)
  F ∧ F ≡ F                        Idempotency      (7.8)
  F ∧ G ≡ G ∧ F                    Commutativity    (7.9)
  (F ∧ G) ∧ H ≡ F ∧ (G ∧ H)        Associativity    (7.10)
  F ∧ (F ∨ G) ≡ F                  Absorption       (7.11)
  F ∧ (G ∨ H) ≡ (F ∧ G) ∨ (F ∧ H)  Distributivity   (7.12)
  ¬(F ∧ G) ≡ (¬F) ∨ (¬G)           de Morgan's Law  (7.13)
  P ⇒ Q ≡ ¬Q ⇒ ¬P                  Contraposition   (7.14)
F ⊨ G iff the sentence (F ⇒ G) is valid.
Proof.
First, note the equivalence ¬(F ⇒ G) ≡ (F ∧ ¬G).
F ⊨ G iff the sentence (F ∧ ¬G) is unsatisfiable, i.e. iff (F ⇒ G) is valid.
Logical Inference
Inference by Model-Checking
else
    P ← First(symbols); tail ← Rest(symbols);
    return TT-Check(KB, Q, tail, model ∪ {P = True}) And
           TT-Check(KB, Q, tail, model ∪ {P = False})
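The truth-table entailment check can be sketched compactly by enumerating all models instead of recursing. This is a minimal sketch, not the slides' TT-Check: formulas are assumed to be in CNF, with a literal encoded as a (symbol, positive?) pair.

```python
from itertools import product

def pl_true(clauses, model):
    """Evaluate a CNF formula (list of clauses; a literal is a
    (symbol, positive?) pair) under a truth assignment."""
    return all(any(model[s] == pos for (s, pos) in clause)
               for clause in clauses)

def tt_entails(kb, query, symbols):
    """KB |= Q iff Q holds in every model of the KB."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if pl_true(kb, model) and not pl_true(query, model):
            return False
    return True

# KB: (P => Q) and P, i.e. clauses {~P, Q} and {P}; query: Q.
kb = [[("P", False), ("Q", True)], [("P", True)]]
assert tt_entails(kb, [[("Q", True)]], ["P", "Q"])       # KB |= Q
assert not tt_entails(kb, [[("Q", False)]], ["P", "Q"])  # KB |/= ~Q
```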
Inference by Model-Checking
Normal Forms
Definition 7.9 (Literal)
If P is an atomic formula, then
  P is a positive literal,
  ¬P is a negative literal.
A formula F is in Conjunctive Normal Form (CNF) if it is a conjunction of disjunctions of literals:
  F = ⋀_{i=1}^{n} ( ⋁_{j=1}^{m_i} L_{i,j} ),   (7.15)
where ⋁_{j=1}^{m_i} L_{i,j} is Clause_i.   (7.16)
Inference by Model-Checking
(G H) by (G H),
(G H) by (G H),
Articial Intelligence
December 5, 2013
177 / 475
Inference by Model-Checking
Literals
For a literal L over atom A, its complement L̄ is
  L̄ = ¬A, if L = A;  A, if L = ¬A.   (7.17)
Inference by Model-Checking
Residual formula F|_L
Inference by Model-Checking
return Unsatisfiable
Inference by Model-Checking
WalkSAT
A local search algorithm for satisfiability
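WalkSAT's flip loop can be sketched as follows. This is a minimal illustration of the standard algorithm (random walk with probability p, otherwise a greedy flip), not the exact pseudocode of the slides; the clause encoding reuses (symbol, positive?) pairs.

```python
import random

def walksat(clauses, p=0.5, max_flips=10000, seed=0):
    """WalkSAT sketch. Returns a satisfying model dict, or None."""
    rng = random.Random(seed)
    symbols = {s for c in clauses for (s, _) in c}
    model = {s: rng.choice([True, False]) for s in symbols}

    def unsat(cs):
        return [c for c in cs
                if not any(model[s] == pos for (s, pos) in c)]

    for _ in range(max_flips):
        bad = unsat(clauses)
        if not bad:
            return model
        clause = rng.choice(bad)
        if rng.random() < p:
            var = rng.choice(clause)[0]        # random-walk step
        else:
            # greedy: flip the symbol minimizing unsatisfied clauses
            def cost(v):
                model[v] = not model[v]
                n = len(unsat(clauses))
                model[v] = not model[v]
                return n
            var = min((v for (v, _) in clause), key=cost)
        model[var] = not model[var]
    return None

# (A or B) and (~A or B) and (A or ~B): only model is A=B=True.
cls = [[("A", True), ("B", True)],
       [("A", False), ("B", True)],
       [("A", True), ("B", False)]]
m = walksat(cls)
assert m is not None and m["A"] and m["B"]
```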
Inference Rules
Modus Ponens:
  (P ⇒ Q), P ⊢ Q, i.e. ((P ⇒ Q) ∧ P) ⊨ Q.   (7.18)
And-Elimination:
  (P ∧ Q) ⊨ P.   (7.19)
Inference by Resolution
Resolution
Inference by Resolution
General Resolution
If L ∈ C1 and L̄ ∈ C2, the resolvent is
  R = (C1 \ {L}) ∪ (C2 \ {L̄}).   (7.20)
Example 7.14
C1 = {A, B, C, D} and C2 = {C, ¬B, D, E, F} resolve on B to give
  R = {A, C, D, E, F}.
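The set-format resolution rule (7.20) is a one-liner to implement. A minimal sketch, assuming literals are strings with `~` marking negation (a notational choice, not the slides'):

```python
def resolve(c1, c2):
    """All resolvents of two clauses in set-format.
    A literal is a string; '~' marks negation."""
    def neg(l):
        return l[1:] if l.startswith("~") else "~" + l
    resolvents = []
    for lit in c1:
        if neg(lit) in c2:
            # R = (C1 \ {L}) | (C2 \ {~L})
            resolvents.append((c1 - {lit}) | (c2 - {neg(lit)}))
    return resolvents

# Resolving on B, as in Example 7.14:
r = resolve({"~B", "A", "C", "D"}, {"B", "C", "D", "E", "F"})
assert r == [{"A", "C", "D", "E", "F"}]
```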
Inference by Resolution
General Resolution
Lemma 7.15 (Resolution Lemma)
Let F be a formula in CNF in set-format. Let R be the resolvent of two clauses C1 and C2 in F. Then F and F ∪ {R} are equivalent (Def. 7.5).
Proof.
Let A be an assignment suitable for F (and hence also for F ∪ {R}).
If A ⊨ F ∪ {R}, then A ⊨ F, since F ⊆ F ∪ {R}.
Conversely, if A ⊨ F, then A satisfies both C1 and C2. Let R be their resolvent on L ∈ C1, L̄ ∈ C2. If A falsifies L, then A satisfies C1 \ {L} ⊆ R; otherwise A falsifies L̄ and satisfies C2 \ {L̄} ⊆ R. Thus, A ⊨ F ∪ {R}.
Inference by Resolution
The resolution closure RC(S) of a clause set S is the set of all clauses derivable from S by repeatedly adding resolvents. The empty clause is denoted □; □ ∈ RC(S) iff a contradiction can be derived.
Inference by Resolution
A Worked-out Example
Let the sentence S consist of the following clauses
C1 = (X Y Z ), C2 = (Z R S), C3 = (S T ).
X = True .
Y = True .
R = True .
T = False , due to C6 under previous assignments.
S = True . Examine C3 and C4 carefully.
Z = False , due to C1 under previous assignments.
Inference by Resolution
Theorem 7.19
Algo. ModelConstruction produces a valid model for S, if □ ∉ RC(S).
Proof.
We prove this by contradiction. Assume at some iteration i = k of the for-loop in Algo. 27, the assignment to Pk causes a clause C of RC(S) to become False for the first time.
For this to occur, C = (False ∨ ... ∨ False ∨ Pk) or C = (False ∨ ... ∨ False ∨ ¬Pk). If only one of these two is present in RC(S), then the assignment rule chooses the appropriate value for Pk to make A(C) = True.
The problem occurs if both are in RC(S). But in this case, their resolvent (False ∨ ... ∨ False) also has to be in RC(S), which means that the resolvent is already False by the assignment to P1, ..., P_{k−1}. This contradicts our assumption that the first falsified clause appears at stage k.
Thus, the construction never falsifies a clause in RC(S). It produces a valid model for RC(S) and in particular, for S.
Inference by Resolution
Proof.
Proof by contraposition: we prove that if the closure RC(S) does not contain the empty-clause □, then S is satisfiable.
If □ ∉ RC(S), we already proved that one model A ⊨ S can be recursively constructed by using Algo. 27 ModelConstruction.
Therefore, this proves that if the closure RC(S) does not contain the empty-clause □, then S is satisfiable. It also proves its contraposition, namely, the Ground Resolution Theorem.
Inference by Resolution
Wumpus World
Figure 48: CNF of B1,1 ⇔ (P1,2 ∨ P2,1) along with the observation ¬B1,1. It entails the query Q: ¬P1,2.
Special kinds of KB
Remark 7.21
Resolution is the most powerful algorithm for showing entailment for a general KB.
The SAT problem is in general NP-complete.
However, for some special, less general cases, we can make the algorithm more efficient.
Two special algorithms are HornSAT and 2SAT, which are applicable to a KB consisting of a specific kind of clauses only.
HornSAT
Definition 7.22 (Definite Clause)
A clause with only one positive literal. Every definite clause can be written as an implication. Example: (A ∨ ¬B ∨ ¬C) ≡ (B ∧ C ⇒ A).
PL-FC-Entails is complete
Every entailed atomic sentence is derived. Consider the final state of A when the algorithm reaches a fixed-point, i.e. no further inferences are possible. In other words, the agenda g = ∅.
Claim: A can be viewed as a model of the KB: every definite clause in the KB is True in this A.
To prove, assume the opposite, i.e. some clause a1 ∧ ... ∧ an ⇒ b is False in the model. Then the premise must be True and the conclusion b False in the model, i.e. A(b) = False.
As the premise is True in A, b must have been added to g and hence at some point (when b was popped from g) assigned True by the algorithm. This is a contradiction. Therefore, A ⊨ KB.
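Forward chaining over definite clauses can be sketched as below. This is a minimal illustration of the PL-FC-Entails idea (premise counters plus an agenda of known facts), not the exact Algo. of the slides; the clause encoding as (premise-set, conclusion) pairs is an assumption.

```python
def pl_fc_entails(clauses, query):
    """Forward chaining for definite clauses.
    Each clause is (premises, conclusion); facts have empty premises."""
    count = {i: len(p) for i, (p, _) in enumerate(clauses)}
    inferred = set()
    agenda = [c for (p, c) in clauses if not p]   # known facts
    while agenda:
        p = agenda.pop()
        if p == query:
            return True
        if p not in inferred:
            inferred.add(p)
            for i, (premises, conclusion) in enumerate(clauses):
                if p in premises:
                    count[i] -= 1                 # one premise satisfied
                    if count[i] == 0:
                        agenda.append(conclusion)
    return False

# A Horn KB in the style of (7.21): L&M=>P, B&L=>M, A&P=>L, A&B=>L, A, B.
kb = [({"L", "M"}, "P"), ({"B", "L"}, "M"), ({"A", "P"}, "L"),
      ({"A", "B"}, "L"), (set(), "A"), (set(), "B")]
assert pl_fc_entails(kb, "P")
assert not pl_fc_entails(kb, "Q")
```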
A Horn KB example:
  L ∧ M ⇒ P,  B ∧ L ⇒ M,  A ∧ P ⇒ L,  A ∧ B ⇒ L,  A,  B.   (7.21)
2SAT
Applicable to KBs consisting of 2-CNFs
2SAT
For every edge u → v of G, f(u) ≤ f(v).   (7.22)
Figure 53: A directed graph G and its condensation. The subscripts i for each component Ci in the right-figure have been chosen to be the same as f(v), v ∈ Ci. Thus C1, C2, C3, C4 is a topological order.
2SAT
Theorem 7.25
If ∃ v ∈ V such that f(v) = f(¬v), then F is unsatisfiable.
Proof.
Since v and ¬v lie in the same strongly connected component, both
  v ⇝ ¬v and ¬v ⇝ v
hold in the implication graph, i.e. v ⇒ ¬v and ¬v ⇒ v are both entailed, which is a contradiction.
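The condensation-based test can be turned into a full decision procedure: compute SCCs (here with Kosaraju's algorithm), reject if some v shares a component with ¬v, otherwise set v True iff its component comes later in topological order. A compact sketch of this standard 2SAT procedure, not the exact algorithm of the slides; the integer literal encoding (+v / -v) is an assumption.

```python
def two_sat(n, clauses):
    """2SAT via SCCs. Variables 1..n; a literal is +v or -v.
    Clause (a or b) yields implications ~a->b and ~b->a.
    Returns a satisfying assignment {v: bool}, or None."""
    def idx(l):                       # literal -> graph node
        return 2 * (abs(l) - 1) + (1 if l < 0 else 0)
    N = 2 * n
    g = [[] for _ in range(N)]
    gr = [[] for _ in range(N)]       # reverse graph
    for a, b in clauses:
        g[idx(-a)].append(idx(b)); gr[idx(b)].append(idx(-a))
        g[idx(-b)].append(idx(a)); gr[idx(a)].append(idx(-b))
    seen, order = [False] * N, []
    for u in range(N):                # pass 1: finish-time order
        if not seen[u]:
            seen[u] = True
            stack = [(u, 0)]
            while stack:
                v, i = stack.pop()
                if i < len(g[v]):
                    stack.append((v, i + 1))
                    w = g[v][i]
                    if not seen[w]:
                        seen[w] = True
                        stack.append((w, 0))
                else:
                    order.append(v)
    comp, c = [-1] * N, 0
    for u in reversed(order):         # pass 2: label SCCs on gr
        if comp[u] == -1:
            comp[u] = c
            stack = [u]
            while stack:
                v = stack.pop()
                for w in gr[v]:
                    if comp[w] == -1:
                        comp[w] = c
                        stack.append(w)
            c += 1
    assign = {}
    for v in range(1, n + 1):
        if comp[idx(v)] == comp[idx(-v)]:
            return None               # f(v) = f(~v): unsatisfiable
        assign[v] = comp[idx(v)] > comp[idx(-v)]
    return assign

# (x1 or x2) and (~x1 or x2): satisfiable, forces x2 = True.
m = two_sat(2, [(1, 2), (-1, 2)])
assert m is not None and m[2] is True
# x1 and ~x1, encoded as (x1,x1) and (-x1,-x1): unsatisfiable.
assert two_sat(1, [(1, 1), (-1, -1)]) is None
```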
2SAT
Runtime of 2SAT
2SAT
Applications of 2SAT
http://en.wikipedia.org/wiki/2-satisfiability

Initial conditions
Fluents
Aspects of the world/agent's state which change with time should have a time-index associated with the name.
All percepts, e.g.: Stench^3, Stench^4, Breeze^5.
Effect Axioms
Transition model: to be written for all 16 squares, for all 4 orientations, and for all actions.
  L^0_{1,1} ∧ FacingEast^0 ∧ Forward^0 ⇒ (L^1_{2,1} ∧ ¬L^1_{1,1})
If the agent takes this action, then Ask(KB, L^1_{2,1}) returns True.
Frame problem: each effect-axiom has to state what remains unchanged as a result of the action, e.g.
  Forward^t ⇒ (HaveArrow^t ⇔ HaveArrow^{t+1}).   (7.24)
Figure: the 4×4 Wumpus world grid (pits, breezes, stenches, gold, START).
Formal languages
Ontological and Epistemological Commitments

Language            | Ontological Commitment              | Epistemological Commitment
Propositional Logic | Facts                               | True/False/Unknown
First-order Logic   | Facts, Objects, Relations           | True/False/Unknown
Probability Theory  | Facts                               | Degree of belief ∈ [0, 1]
Fuzzy Logic         | Facts with degree of truth ∈ [0, 1] | Known interval value
Probability Calculus
Contents
Limitations of Logic
Probability Calculus
Conditional Probabilities
Inference using a Joint Probability Distribution
Conditional Independence
Limitations of Logic
Probability Calculus
The probability of a sentence F is the sum of the probabilities of its models:
  P(F) = Σ_{A ∈ M(F)} P(A).   (8.2)
Probability Calculus
  P(F) = 0, if F is inconsistent;  P(F) = 1, if F is valid.   (8.3)
  P(¬F) = 1 − P(F).   (8.4)
Probability Calculus
Random variables, e.g. P(Weather = sunny).   (8.6)–(8.7)
Probability Calculus
The domain of RV A has |D(A)| = n values.   (8.8)–(8.10)
Note that partial joint probability distributions are also possible, e.g.
P(A = ai, B, ...) ≡ P(ai, B, ...) is of size |B| ×  ...   (8.11)
Probability Calculus
The full JPD P(A, B, ...) has size |A| × |B| × ...   (8.13)
For an atomic event x, P(x) ≡ P(ai, bj, ...);   (8.14)
partial distributions P(x, Y, ...) and probabilities P(x, y, ...) follow analogously.   (8.15)
Probability Calculus
Conditional Probabilities
Recall P(F) = Σ_{A ∈ M(F)} P(A).
Suppose the agent observes that G = True. The agent now needs to update its belief about F.
Conditional Probabilities
Since G is now known to be True, we set
  P(A) ← 0, if A ∉ M(G).   (8.16)
Earlier, P(G) = Σ_{A ∈ M(G)} P(A).   (8.17)
Renormalizing,
  P(A | G) = P(A) / P(G), for A ∈ M(G),   (8.18)
so that
  Σ_{A ∈ M(G)} P(A | G) = Σ_{A ∈ M(G)} P(A) / P(G) = P(G) / P(G) = 1.
Probability Calculus
Conditional Probabilities
Now, we can modify (8.2):
  P(F | G) = Σ_{A ∈ M(F) ∩ M(G)} P(A | G)   (8.19)
           = (1/P(G)) Σ_{A ∈ M(F) ∩ M(G)} P(A)   (8.20)
           = P(F ∧ G) / P(G).   (8.21)

Independence
Probability Calculus
Conditional Probabilities
Let G1, ..., Gn be mutually exclusive and exhaustive, i.e.
  M(Gi) ∩ M(Gj) = ∅ for i ≠ j.   (8.25)
Then
  M(F) = ⋃_{i=1}^{n} ( M(Gi) ∩ M(F) ),   (8.26)
so that
  P(F) = Σ_{A ∈ M(F)} P(A) = Σ_{i=1}^{n} P(F ∧ Gi).   (8.27)–(8.28)
Conditional Probabilities
Marginalization:
  P(A = aj) = Σ_{i=1}^{|B|} P(A = aj, B = bi)   (8.29)
            = Σ_{i=1}^{|B|} P(A = aj | B = bi) P(B = bi).   (8.30)
In distribution form,
  P(A) = Σ_{i=1}^{|B|} P(A, bi) = Σ_{i=1}^{|B|} P(A | bi) P(bi),   (8.31)
  P(X) = Σ_{i=1}^{|Y|} P(X, yi) = Σ_{i=1}^{|Y|} P(X | yi) P(yi).   (8.32)
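Marginalization (8.29) and conditioning (8.21) are easy to verify on a tiny joint table. A minimal sketch with hypothetical numbers (any table summing to 1 works):

```python
# A hypothetical joint distribution P(A, B) over two binary-ish RVs.
P_AB = {("a1", "b1"): 0.30, ("a1", "b2"): 0.20,
        ("a2", "b1"): 0.10, ("a2", "b2"): 0.40}

# Marginalization (8.29): P(A = aj) = sum_i P(A = aj, B = bi)
P_A = {}
for (a, b), p in P_AB.items():
    P_A[a] = P_A.get(a, 0.0) + p
assert abs(P_A["a1"] - 0.5) < 1e-12
assert abs(sum(P_A.values()) - 1.0) < 1e-12

# Conditioning (8.21): P(a1 | b1) = P(a1, b1) / P(b1)
P_b1 = P_AB[("a1", "b1")] + P_AB[("a2", "b1")]
P_a1_given_b1 = P_AB[("a1", "b1")] / P_b1
assert abs(P_a1_given_b1 - 0.75) < 1e-12
```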
Probability Calculus
Conditional Probabilities
Expectation of a DRV
Let X be a DRV with domain [x1, x2, ..., xn], where the xi are ordered and have numerical values. Then, the expectation of X is
  E[X] = Σ_{i=1}^{n} xi P(xi).   (8.33)–(8.34)
Conditional Probabilities
Linearity of expectation: E[aX + bY] = a E[X] + b E[Y].   (8.35)
Proof.
  E[aX + bY] = Σ_{i=1}^{|X|} Σ_{j=1}^{|Y|} (a xi + b yj) P(xi, yj)
             = a Σ_i xi P(xi) + b Σ_j yj P(yj) = a E[X] + b E[Y].
Conditional Probabilities
  P(C1 = 1, C2 = 1, C3 = 1 | T = 1)
    = Σ_{c4 = 1,2} Σ_{c5 = 1,2} P(C1 = 1, C2 = 1, C3 = 1, C4 = c4, C5 = c5 | T = 1)
    = (5 + 4 + 10 + 8) / 300 = 27/300.
Conditional Probabilities
Normalization:
  Σ_{i=1}^{|A|} P(A = ai | B = bj) = 1,   (8.36)–(8.37)
since, by (8.29),
  Σ_{i=1}^{|A|} P(A = ai | B = bj) = (1/P(B = bj)) Σ_{i=1}^{|A|} P(A = ai, B = bj)
                                   = P(B = bj) / P(B = bj) = 1.   (8.38)
Probability Calculus
Conditional Probabilities
General Form:
  P(X, E = e) = Σ_{i=1}^{|Y|} P(X, E = e, Y = yi).   (8.40)–(8.41)
Probability Calculus
Bayes' Rule
  P(G | F) = P(F | G) P(G) / P(F).   (8.42)–(8.43)
In terms of RVs, by (8.39),
  P(B = bj | A = ai) = P(A = ai | B = bj) P(B = bj) / P(A = ai).   (8.44)–(8.45)
If we divide the set of RVs in a given joint distribution into the RVs of interest A, B, and an observed (evidence) vector RV E, then we can generalize (8.45) to the General Form
  P(B | A = ai, E = e) = P(ai | B, e) P(B | e) / P(ai | e)
                       = α P(ai | B, e) P(B | e).   (8.46)–(8.47)
The manufacturer has given the prior pmf of the different bag-types:
  P(H) = [P(h1), P(h2), P(h3), P(h4), P(h5)].
Suppose e1 = cherry, e2 = lime. Bayes' theorem gives posteriors proportional to the likelihood-weighted priors
  1.0 · P(h1),  0.75 · P(h2),  (0.5)² · P(h3),  0.25 · P(h4),  0.0 · P(h5).
Probability Calculus
Prediction:
  P(E_{n+1} | e_{1:n}) = Σ_{i=1}^{5} P(E_{n+1}, H = hi | e_{1:n})
                       = Σ_{i=1}^{5} P(E_{n+1} | hi, e_{1:n}) P(hi | e_{1:n})
                       = Σ_{i=1}^{5} P(E_{n+1} | hi) P(hi | e_{1:n}).   (8.49)
Conditional Independence
Independence
Two events F and G are independent if P(F | G) = P(F), or P(G) = 0.   (8.50)
Using the general Bayes rule, one sees that this is a symmetric relationship and hence also holds with F and G swapped.
For independent events, (8.21) reduces to
  P(F ∧ G) = P(F) P(G).   (8.51)
Conditional Independence
Independence is a changeable property. Two events which were earlier independent can become dependent in light of some evidence. Two events F, G are called conditionally independent given H if
  P(F | G ∧ H) = P(F | H), or, P(G ∧ H) = 0.   (8.54)
Using the general Bayes rule, one sees that this is a symmetric relationship and hence also holds with F and G swapped. Using the product rule, another way to write conditional independence is
  P(F ∧ G | H) = P(F | H) P(G | H), or, P(H) = 0.   (8.55)
Probability Calculus
Conditional Independence
In terms of RVs, A = ai and B = bj are conditionally independent given C = ck if either P(B = bj ∧ C = ck) = 0, or
  P(A = ai | B = bj, C = ck) = P(A = ai | C = ck).   (8.56)
If the above holds for all ai, bj, ck, we say that A and B are conditionally independent given C, and all of the following are equivalent:
  P(A | B, C) = P(A | C),   (8.57a)
  P(B | A, C) = P(B | C),   (8.57b)
  P(A, B | C) = P(A | C) P(B | C).   (8.57c)
Conditional Independence
Figure 57: Priors P(Pij = 1) = P(pij) = 0.2; only P(P11 = 0) = P(¬p11) = 1. The Pij are independent binary RVs. The 4×4 grid is partitioned into KNOWN (visited, OK) squares, the FRINGE, the QUERY square, and OTHER squares; breezes (B) have been observed on the known squares.
Probability Calculus
Conditional Independence
Suppose (8.58) holds. Then,
  P(A, B) = Σ_{i=1}^{|C|} P(A, B, ci)
          = P(A) P(B) Σ_{i=1}^{|C|} P(ci | A, B)
          = P(A) P(B).
Hence, A and B are independent.
Conditional Independence
Suppose (8.59) holds. Then,
  P(C | A, B) = P(A, B, C) / P(A, B).
Probability Calculus
Conditional Independence
Suppose (8.60) holds. Then,
  P(B | A, C) = P(A, B, C) / P(A, C).
Conditional Independence
  P(B | C) = Σ_{i=1}^{|A|} P(ai, B, C) / P(C)
           = Σ_{i=1}^{|A|} P(B | ai) P(C | ai) P(ai) / P(C)
           = Σ_{i=1}^{|A|} P(B | ai) P(C | ai) P(ai) / Σ_{i=1}^{|A|} P(C | ai) P(ai).   (8.61)
Probability Calculus
Conditional Independence
Chain Rule
This is an application of the product-rule (stated in terms of RVs):
  P(X1, ..., Xn) = P(Xn | X1, ..., X_{n−1}) P(X1, ..., X_{n−1})   (8.62a)
                 = P(Xn | X1, ..., X_{n−1}) P(X_{n−1} | X1, ..., X_{n−2}) P(X1, ..., X_{n−2})   (8.62b)
                 = ...
so that
  P(X1, ..., Xn) = Π_{i=1}^{n} P(Xi | X1, ..., X_{i−1}).   (8.63)
Contents
Figure 58: The Naïve Bayesian Classifier assumes that the attributes (Attribute 1, Attribute 2, ..., Attribute n) are conditionally independent of each other given the Class. However, NBC often achieves surprisingly good performance even when this strong assumption is not strictly valid.
Answering queries
  P(C | a1, a2, ..., an) = P(a1, a2, ..., an | C) P(C) / P(a1, a2, ..., an),   (9.3)
using conditional independence of attributes given class,
  = P(a1 | C) P(a2 | C) ... P(an | C) P(C) / Σ_{i=1}^{k} P(a1, a2, ..., an, ci)   (9.4)
  = P(a1 | C) P(a2 | C) ... P(an | C) P(C) / Σ_{i=1}^{k} P(a1 | ci) ... P(an | ci) P(ci).   (9.5)
For a particular class cj,
  P(cj | a1, a2, ..., an) = P(a1 | cj) ... P(an | cj) P(cj) / Σ_{i=1}^{k} P(a1 | ci) ... P(an | ci) P(ci).
To avoid numerical underflow, rewrite
  P(cj | a1, a2, ..., an) = e^{bj} / Σ_{i=1}^{k} e^{bi},   (9.6)
so that
  ln P(cj | a1, a2, ..., an) = bj − ln Σ_{i=1}^{k} e^{bi}   (9.7)
                             = bj − b − ln Σ_{i=1}^{k} e^{bi − b},   (9.8)
where b ≜ max_i bi.   (9.9)
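The max-shift trick of (9.8)–(9.9) can be sketched directly. The scores below are hypothetical, chosen so negative that a naive `exp()` would underflow to 0.0:

```python
import math

def log_posterior(b):
    """Normalized log-posteriors from unnormalized log-scores b_j,
    using the max-shift (log-sum-exp) trick of (9.8)-(9.9)."""
    bmax = max(b)                                       # b = max_i b_i
    lse = bmax + math.log(sum(math.exp(bi - bmax) for bi in b))
    return [bj - lse for bj in b]

b = [-1000.0, -1001.0, -1002.0]      # naive exp() underflows here
lp = log_posterior(b)
p = [math.exp(x) for x in lp]
assert abs(sum(p) - 1.0) < 1e-9      # a proper posterior again
assert p[0] > p[1] > p[2]
```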
Learning Terminology
Learning 101
Definition of Tom Mitchell, CMU
Supervised Learning
Supervised Learning
Nr. of attributes: 22
A1  cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
A2  cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
A3  cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
A4  bruises?: bruises=t, no=f
A5  odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
A6  gill-attachment: attached=a, descending=d, free=f, notched=n
A7  gill-spacing: close=c, crowded=w, distant=d
A8  gill-size: broad=b, narrow=n
A9  gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
A10 stalk-shape: enlarging=e, tapering=t
A11 stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
A12 stalk-surface-above-ring: fibrous=f, scaly=y, silky=k, smooth=s
A13 stalk-surface-below-ring: fibrous=f, scaly=y, silky=k, smooth=s
A14 stalk-color-above-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
A15 stalk-color-below-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
A16 veil-type: partial=p, universal=u
A17 veil-color: brown=n, orange=o, white=w, yellow=y
A18 ring-number: none=n, one=o, two=t
A19 ring-type: cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z
A20 spore-print-color: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
A21 population: abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y
A22 habitat: grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d
Supervised Learning
x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
x,y,b,t,n,f,c,b,e,e,?,s,s,e,w,p,w,t,e,w,c,w
x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,k,n,g
b,s,w,t,a,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,n,m
b,y,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,s,m
x,y,w,t,p,f,c,n,p,e,e,s,s,w,w,p,w,o,p,k,v,g
...
Supervised Learning
The examples are assumed i.i.d.:
  Independence   (9.10a)
  Identical Distribution   (9.10b)
Supervised Learning
Error-Rate
It is the proportion of mistakes a given hypothesis makes, i.e. the proportion of times h(x) ≠ y.
Supervised Learning
Holdout Cross-Validation
Split the available examples e1, e2, ..., em randomly into a training-set, from which the learning algorithm produces a hypothesis h, and a test-set, on which the accuracy of h is evaluated. Disadvantage: we cannot use all examples for finding h.
Multinomial Distribution
Let our sample-space be divided into k classes, i.e. |C| = k.
We draw m samples. Each sample ej, j = 1, ..., m falls into only one class. Let the class of ej be denoted ej.c.
Let the known prior probability P(C = ci) = θi. Thus,
  Σ_{i=1}^{k} θi = Σ_{i=1}^{k} P(C = ci) = 1.   (9.11)
Let Ni denote the number of samples falling into class i; then
  Σ_{i=1}^{k} Ni = m.   (9.13)
Multinomial Distribution
The joint pmf of the Ni's is given by
  P(N = (n1, n2, ..., nk)ᵀ) = m! / (n1! n2! ... nk!) · θ1^{n1} θ2^{n2} ... θk^{nk},   (9.14)
where
  Σ_{i=1}^{k} ni = m,  Σ_{i=1}^{k} θi = 1.   (9.15)
The expectation of each count is
  E[Ni] = Σ_{j=1}^{m} E[I(ej.c = i)] = Σ_{j=1}^{m} P(C = i) = Σ_{j=1}^{m} θi = m θi.   (9.16)–(9.17)
The Gamma function:
  Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt.   (9.18)
Figure 60: Src: facebook.com. We are only interested in the positive half of the real axis.
Figure 61: Src: zazzle.com. The famous Gamma function value for a non-integer.
The Dirichlet distribution on the simplex S_k:
  p(θ; α) = (1/d(α)) Π_{i=1}^{k} θi^{αi − 1},   (9.19)
where
  d(α) = Π_{i=1}^{k} Γ(αi) / Γ(α1 + ... + αk).   (9.20)
The mean:
  E[θi] = ∫_{S_k} θi p(θ; α) dθ
        = (1/d(α)) ∫_{S_k} θ1^{α1 − 1} ... θi^{αi} ... θk^{αk − 1} dθ
        = ... (missing steps) ...
        = Γ(α1 + ... + αk) Γ(αi + 1) / ( Γ(α1 + ... + αk + 1) Γ(αi) )
        = αi / (α1 + ... + αk).   (9.21)
(a) α = (1, 1, 1)
Conjugate Priors
The Bayesian estimation update rule also holds for pdfs:
  p(θ | e_{1:n+1}) ∝ p(e_{n+1} | θ) · p(θ | e_{1:n}).
  Posterior ∝ Likelihood × Prior
For multinomial counts n = (n1, ..., nk), the Dirichlet prior is conjugate:
  p(θ | n) ∝ Π_{i=1}^{k} θi^{ni} · Π_{j=1}^{k} θj^{αj − 1}   (9.22)–(9.23)
           = Π_{i=1}^{k} θi^{ni + αi − 1},  i.e. θ | n ~ Dir(n + α),   (9.24)–(9.25)
with posterior mean (nj + αj) / Σ_{i=1}^{k} (ni + αi).
  E[θj] = (nj + αj) / Σ_{i=1}^{k} (ni + αi).
Training an NBC
  P(A = ai | Class = Y) = P(A = ai, Class = Y) / P(Class = Y) = pi / p.   (9.26)
Training an NBC
Step 1
Fill in the missing attribute-values using the heuristics of the last slide.
Step 2
Compute the prior P(C = ci) for i = 1, ..., k:
  P(C = ci) = (1/m) Σ_{j=1}^{m} I(ej.c = ci).   (9.27)
Training an NBC
Step 3
The CPT P(Ar | C) consists of |C| pmfs P(Ar | ci) ∈ S_{|Ar|}.
Assume a Dirichlet prior for all pmfs: P(Ar | ci) ~ Dir(αr).
Training an NBC
With the counts
  n_{r,ℓ} = Σ_{j=1}^{m} I(ej.Ar = a_{r,ℓ}, ej.c = ci),   (9.28)
the smoothed CPT entries are the posterior means
  P(Ar = a_{r,ℓ} | C = ci) = (n_{r,ℓ} + αr[ℓ]) / Σ_{p=1}^{|Ar|} (n_{r,p} + αr[p]).   (9.29)
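The posterior-mean estimate (9.29) can be sketched in a few lines. With a Laplace (add-one) prior α = (1, ..., 1), a value never seen in the training data still gets nonzero probability; the counts below are hypothetical:

```python
def smoothed_cpt(counts, alpha):
    """Posterior-mean CPT column (9.29):
    (n_l + alpha_l) / sum_p (n_p + alpha_p)."""
    total = sum(n + a for n, a in zip(counts, alpha))
    return [(n + a) / total for n, a in zip(counts, alpha)]

# Counts of a 3-valued attribute within one class, Laplace prior:
probs = smoothed_cpt([2, 0, 1], [1, 1, 1])
assert abs(sum(probs) - 1.0) < 1e-12
assert probs == [3/6, 1/6, 2/6]
assert probs[1] > 0     # zero-count value is still possible
```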
For a continuous RV X with pdf p,
  P(x1 ≤ X ≤ x2) = ∫_{x1}^{x2} p(x) dx.   (9.30)
Most rules like the product-rule, marginalization, and Bayes' rule have similar counterparts in the continuous domain:
  p(X = x, Y = y) δx δy = (p(x | y) δx) (p(y) δy)
  ⟹ p(x, y) = p(x | y) p(y),   (9.31)–(9.32)
for x ∈ D(X).
The multivariate Gaussian pdf:
  p(x) = 1 / ( (2π)^{n/2} |C|^{1/2} ) · exp( −(1/2) (x − μ)ᵀ C⁻¹ (x − μ) ).   (9.33)
Number of attributes: 5
3.6216,8.6661,-2.8073,-0.44699,0
4.5459,8.1674,-2.4586,-1.4621,0
3.866,-2.6383,1.9242,0.10645,0
3.4566,9.5228,-4.0112,-3.5944,0
0.32924,-4.4552,4.5718,-0.9888,0
-1.3887,-4.8773,6.4774,0.34179,1
-3.7503,-13.4586,17.5932,-2.7771,1
-3.5637,-8.3827,12.393,-1.2823,1
-2.5419,-0.65804,2.6842,1.1952,1
  p(A1 = a1, A2 = a2, ..., An = an | C = ci) = Π_{j=1}^{n} p(aj | ci),   (9.34)
with each p(aj | ci) modeled as a Gaussian N(aj; μji, σji²),   (9.35)
where μji, σji² are the mean and variance of the values of Aj among instances of class C = ci.
  P(cj | a1, a2, ..., an) = p(a1 | cj) ... p(an | cj) P(cj) / Σ_{i=1}^{k} p(a1 | ci) ... p(an | ci) P(ci).   (9.36)
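Equations (9.34)–(9.36) can be sketched as a tiny Gaussian NBC. This is a minimal illustration, not a full trainer: the class-conditional means/variances and the single-attribute setup are hypothetical.

```python
import math

def gauss(x, mu, var):
    """Univariate normal pdf N(x; mu, var), used for p(aj | ci) in (9.35)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gnb_posterior(x, params, priors):
    """params[c] = list of (mu, var) per attribute; priors[c] = P(c).
    Returns the normalized posterior P(c | x) of (9.36)."""
    scores = {}
    for c, ps in params.items():
        lik = priors[c]
        for xj, (mu, var) in zip(x, ps):
            lik *= gauss(xj, mu, var)      # product over attributes (9.34)
        scores[c] = lik
    z = sum(scores.values())               # denominator of (9.36)
    return {c: s / z for c, s in scores.items()}

# Hypothetical two-class, one-attribute model with equal priors:
params = {0: [(0.0, 1.0)], 1: [(4.0, 1.0)]}
priors = {0: 0.5, 1: 0.5}
post = gnb_posterior([3.5], params, priors)
assert post[1] > post[0]                   # 3.5 is closer to class 1's mean
assert abs(post[0] + post[1] - 1.0) < 1e-12
```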
Bayesian Networks
Contents
Bayesian Networks
Some Conditional Independence Results
Pruning a BN
Exact Inference in a BN
Approximate Inference in BN
Efficient Representation of CPTs
Applications of BN
Bayesian Networks
Figure 64: A typical Bayesian Network (BN) showing the topology and CPTs. Nodes: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.
  P(B) = .001,  P(E) = .002
CPT for Alarm:
  B E : P(A)
  t t : .95
  t f : .94
  f t : .29
  f f : .001
CPT for JohnCalls:  A = t: P(J) = .90;  A = f: P(J) = .05
CPT for MaryCalls:  A = t: P(M) = .70;  A = f: P(M) = .01
Bayesian Networks
Terminology
Bayesian Networks
Topological Ordering/Sorting
A topological ordering of a DAG is a non-unique total ordering which is compatible with the partial ordering induced by its edges.   (10.1)
In any topological ordering [X1, X2, ..., Xn], for all vertices Xi,
  Parents(Xi) ⊆ Predecessors(Xi).   (10.2)
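A topological ordering satisfying (10.2) can be computed with Kahn's algorithm (an alternative to the DFS-based CLRS algorithm mentioned below). A minimal sketch; the parent-map encoding of the DAG is an assumption:

```python
from collections import deque

def topological_order(parents):
    """Kahn's algorithm. `parents` maps each node to its parent set;
    returns an order where every node follows all its parents,
    or None if the graph has a cycle."""
    nodes = set(parents)
    indeg = {v: len(parents[v]) for v in nodes}
    children = {v: [] for v in nodes}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    q = deque(sorted(v for v in nodes if indeg[v] == 0))
    order = []
    while q:
        v = q.popleft()
        order.append(v)
        for w in children[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                q.append(w)
    return order if len(order) == len(nodes) else None

# The alarm network of Fig. 64 as a parent map:
bn = {"Burglary": set(), "Earthquake": set(),
      "Alarm": {"Burglary", "Earthquake"},
      "JohnCalls": {"Alarm"}, "MaryCalls": {"Alarm"}}
order = topological_order(bn)
pos = {v: i for i, v in enumerate(order)}
# Property (10.2): parents precede their children.
assert all(pos[p] < pos[v] for v in bn for p in bn[v])
```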
Bayesian Networks
How can you get the ITYCA topological order with the DFS-based topological ordering algorithm from the CLRS book?
Bayesian Networks
  P(X1, ..., Xn) = Π_{i=1}^{n} P(Xi | Parents(Xi)).   (10.3)–(10.4)

Defining Property of a BN
  P_B(X1, ..., Xn) = Π_{i=1}^{n} P(Xi | Parents(Xi)).   (10.5)
Note that some nodes can be without any parents, but they are also included in the above product.
Bayesian Networks
  P(X1, ..., Xn) = Π_{i=1}^{n} P(Xi | Parents(Xi)).   (10.6)
Marginalizing out the last variables of a topological order:
  P(X1, ..., Xi) = Σ_{X_{i+1}} ... Σ_{X_n} P(X1, ..., Xn)   (10.7)
                 = Σ_{X_{i+1}} ... Σ_{X_n} Π_{j=1}^{n} P(Xj | Parents(Xj))   (10.8)
                 = Π_{k=1}^{i} P(Xk | Parents(Xk)) · Σ_{X_{i+1}} ... Σ_{X_n} Π_{j=i+1}^{n} P(Xj | Parents(Xj))
                 = Π_{k=1}^{i} P(Xk | Parents(Xk)),
since each trailing sum evaluates to 1. Hence,
  P(Xi, X_{i−1}, ..., X1) = Π_{k=1}^{i} P(Xk | Parents(Xk)),   (10.13)
  P(X_{i−1}, X_{i−2}, ..., X1) = Π_{k=1}^{i−1} P(Xk | Parents(Xk)).   (10.14)
Bayesian Networks
Proof.
A node is conditionally independent, given its parents, of its predecessors in any topological ordering of the BN, in particular, of those in a topological-ordering made by ITYCAing the node: these predecessors are precisely the node's non-descendants.
Example
Figure 66: A BN over the nodes A, ..., H. Write down the expression for the JPD P_B(X1, ..., Xn) of the above BN.
Bayesian Networks
Pruning a BN
If L is a leaf node of B and B′ is B with L removed, then
  P_{B′}(x1, ..., xn) = Σ_{ℓ} P_B(x1, ..., xn, L = ℓ),   (10.15)
where the RHS is the full JPD of B as defined in (10.5), and the LHS is the marginal JPD found from the full JPD of B after marginalizing out L.
Proof.
Pruning a BN
  Σ_{i=1}^{|L|} P_B(x1, ..., xn, L = ℓi)
  = Σ_{i=1}^{|L|} Π_{j=1}^{n} P(xj | parents(Xj)) · P(ℓi | parents(L))
  = Π_{j=1}^{n} P(xj | parents(Xj)) · Σ_{i=1}^{|L|} P(ℓi | parents(L))
  = Π_{j=1}^{n} P(xj | parents(Xj))
  = P_{B′}(x1, ..., xn).
Bayesian Networks
Pruning a BN
More generally, the same holds for any ancestral sub-DAG A of B,   (10.16)
where the RHS is the full JPD of BN A as defined in (10.5), and the LHS is the marginal JPD found from the full JPD of B after marginalizing out nodes in B which do not exist in A.
Proof.
Bayesian Networks
Exact Inference in a BN
Definition 10.5 (Query on Posterior Distribution)
Given a query RV X, a vector of observed evidence RVs E = e, and a vector of irrelevant (unobserved/hidden) RVs Y, we'd like to find the posterior probability P(X | e). We have a BN B consisting of all the RVs {X, E, Y}:
  P(X | e) ∝ Σ_y P_B(X, e, y).   (10.17)
Exact Inference in a BN
A Useful Optimization
Bayesian Networks
Exact Inference in a BN
Recall the alarm network (Fig. 64) with P(B) = .001, P(E) = .002, and the CPTs for Alarm, JohnCalls, and MaryCalls.
  P(B | j, m) = α P(B, j, m) = α Σ_{i=1}^{|E|} Σ_{k=1}^{|A|} P(B, j, m, ei, ak)
              = α Σ_{i=1}^{|E|} Σ_{k=1}^{|A|} P(B) P(ei) P(ak | B, ei) P(j | ak) P(m | ak).   (10.18)
Exact Inference in a BN
Factors
  P(B | j, m) = α f1(B) × Σ_{i=1}^{|E|} f2(E) × Σ_{k=1}^{|A|} f3(A, B, E) × f4(A) × f5(A),
with f1(B) = P(B) and f2(E) = P(ei).
Each factor fi is a matrix indexed by the values of its argument RVs, e.g.
  f4(A) = [ P(j | a), P(j | ¬a) ]ᵀ,  f5(A) = [ P(m | a), P(m | ¬a) ]ᵀ,
and f3(A, B, E) is 2 × 2 × 2.
Bayesian Networks
Exact Inference in a BN
  P(B | j, m) = α f1(B) × f2(E) × f3(A, B, E) × f4(A) × f5(A).
The symbol × denotes the point-wise product:
  f(X1, ..., Xi, Y1, ..., Yj, Z1, ..., Zk) = f1(X1, ..., Xi, Y1, ..., Yj) × f2(Y1, ..., Yj, Z1, ..., Zk).   (10.19)
Exact Inference in a BN
Example of a point-wise product f1(A, B) × f2(B, C):

A B | f1(A,B)        B C | f2(B,C)
1 1 | p11            1 1 | p21
1 0 | p12            1 0 | p22
0 1 | p13            0 1 | p23
0 0 | p14            0 0 | p24

A B C | f1 × f2
1 1 1 | p11 p21
1 1 0 | p11 p22
1 0 1 | p12 p23
1 0 0 | p12 p24
0 1 1 | p13 p21
0 1 0 | p13 p22
0 0 1 | p14 p23
0 0 0 | p14 p24
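The point-wise product (10.19) can be sketched by joining factors on their shared variables. A minimal illustration restricted to binary variables; the dict-of-tuples factor representation and the concrete numbers are assumptions:

```python
from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    """Point-wise product (10.19) of two factors stored as dicts
    mapping value-tuples to numbers; variables here are binary."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for vals in product([1, 0], repeat=len(out_vars)):
        env = dict(zip(out_vars, vals))
        v1 = tuple(env[v] for v in vars1)   # restrict to f1's scope
        v2 = tuple(env[v] for v in vars2)   # restrict to f2's scope
        out[vals] = f1[v1] * f2[v2]
    return out_vars, out

# Concrete instance of the symbolic table: f1(A,B) x f2(B,C).
f1 = {(1, 1): 0.3, (1, 0): 0.7, (0, 1): 0.9, (0, 0): 0.1}
f2 = {(1, 1): 0.2, (1, 0): 0.8, (0, 1): 0.4, (0, 0): 0.6}
vars_, f = pointwise_product(f1, ["A", "B"], f2, ["B", "C"])
assert vars_ == ["A", "B", "C"]
assert abs(f[(1, 1, 1)] - 0.3 * 0.2) < 1e-12   # p11 * p21
assert abs(f[(1, 0, 0)] - 0.7 * 0.6) < 1e-12   # p12 * p24
```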
Bayesian Networks
Exact Inference in a BN
Figure: a worked numeric example of the point-wise product of two factors over a1, a2, a3, b1, b2, c1, c2.
Bayesian Networks
Exact Inference in a BN
EliminateVariableFromFactors: remove from F the factors S(Z) mentioning Z, form their point-wise product, and sum Z out:
  g = Σ_{z ∈ Z} ×_{f ∈ S(Z)} f.   (10.20)
Append factor g to F;
return F
Exact Inference in a BN
Computing P(Q | E = e)
Zhang and Poole (1994)
Exact Inference in a BN
P(B | j, m) = α f1(B) × Σ_E f2(E) × Σ_A f3(A, B, E) × f4(A) × f5(A), with the factor set

F = { f1(B) ≜ P(B),  f2(E) ≜ P(E),  f3(A, B, E) ≜ P(A | B, E),  f4(J, A) ≜ P(J | A),  f5(M, A) ≜ P(M | A) }.    (10.21)
Instantiating the evidence j, m reduces f4 and f5 to vectors over A:

f1(B) = [0.001, 0.999],  f2(E) = [0.002, 0.998],  f3(A, B, E) = P(A | B, E),
f4(A) = [0.9, 0.05],  f5(A) = [0.7, 0.01].    (10.22)

In the first iteration of the for-loop of Algo. 33, we have Y ← E. In the call EliminateVariableFromFactors(F, E), in (10.20), we have the factor h1 = f2(E) × f3(A, B, E). Verify that h1 has the table

B A | E = True     | E = False
T T | 0.95 × 0.002 | 0.94 × 0.998
T F | 0.05 × 0.002 | 0.06 × 0.998
F T | 0.29 × 0.002 | 0.001 × 0.998
F F | 0.71 × 0.002 | 0.999 × 0.998
Summing out E gives g1(A, B):

B A | g1(A, B)
T T | 0.95 × 0.002 + 0.94 × 0.998  = 0.94002
T F | 0.05 × 0.002 + 0.06 × 0.998  = 0.05998
F T | 0.29 × 0.002 + 0.001 × 0.998 = 0.001578
F F | 0.71 × 0.002 + 0.999 × 0.998 = 0.998422

Summing out A next uses h2(A, B) = g1(A, B) × f4(A) × f5(A):

B A | h2(A, B)
T T | 0.94002 × 0.9 × 0.7    = 0.5922126
T F | 0.05998 × 0.05 × 0.01  = 2.999 × 10⁻⁵
F T | 0.001578 × 0.9 × 0.7   = 0.00099414
F F | 0.998422 × 0.05 × 0.01 = 0.0004992
Summing out A and multiplying by f1(B) gives the unnormalized h(B):

B = T: 0.001 × 0.59224259 = 0.00059224259
B = F: 0.999 × 0.001493351 = 0.0014918576

Normalizing, P(B | j, m) = α ⟨0.00059224, 0.00149186⟩ ≈ ⟨0.284, 0.716⟩.
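The elimination result can be cross-checked by brute-force summation over the hidden variables E and A, using the CPT values from the Recall slide. A small sketch:

```python
# CPTs of the burglary network (from the Recall slide)
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(j | A)
P_M = {True: 0.70, False: 0.01}   # P(m | A)

def h(b):
    """Unnormalized P(B=b, j, m), summing out E and A as in (10.18)."""
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += P_B[b] * P_E[e] * p_a * P_J[a] * P_M[a]
    return total

hb = {b: h(b) for b in (True, False)}
p_b = hb[True] / (hb[True] + hb[False])
print(round(p_b, 3))  # → 0.284
```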
Exact Inference in a BN
For answering P(J = True | B = True), note that the sub-DAG formed by the nodes {B, A, J, E} is ancestral. Therefore, we can prune out the leaf node M before answering the query.
Approximate Inference in BN: Monte Carlo Algorithms
Prior-Sampling(B):
  Z ← TopologicalOrdering(B);
  Initialize sample-vector s ∈ R^|B| to 0;
  for i ← 1 … |B| do
    z_i ← a random sample from the pmf P(Z_i | parents(Z_i));
    s[i] ← z_i;
  return s
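A concrete sketch of Prior-Sampling for the burglary network (the network and its CPT values are from the earlier slides; the coding style is mine):

```python
import random

CPT_A = {(True, True): 0.95, (True, False): 0.94,
         (False, True): 0.29, (False, False): 0.001}

def prior_sample(rng):
    """Sample every variable in topological order from P(Z_i | parents(Z_i))."""
    b = rng() < 0.001
    e = rng() < 0.002
    a = rng() < CPT_A[(b, e)]
    j = rng() < (0.90 if a else 0.05)
    m = rng() < (0.70 if a else 0.01)
    return {'B': b, 'E': e, 'A': a, 'J': j, 'M': m}

rng = random.Random(0).random
samples = [prior_sample(rng) for _ in range(100_000)]
print(sum(s['B'] for s in samples) / len(samples))  # close to P(B) = 0.001
```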
Consistency: the probability that Prior-Sampling generates a particular event s is

S_PS(s) = Π_{i=1}^{|B|} P(z_i | parents(Z_i))  (by the chain rule (10.5))  = P_B(Z = s).
Rejection-Sampling(X, e, B, N):
  Initialize count-vector C to 0;
  for j ← 1 to N do
    s ← Prior-Sampling(B);
    if s is consistent with e then
      x ← the value of X in s;
      C[x] ← C[x] + 1;
  return Normalize(C)
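Rejection sampling builds directly on the prior sampler; since P(j, m) ≈ 0.002 here, the vast majority of samples are discarded, which is the method's main weakness. A sketch (network values from the slides, code my own):

```python
import random

CPT_A = {(True, True): 0.95, (True, False): 0.94,
         (False, True): 0.29, (False, False): 0.001}

def prior_sample(rng):
    b = rng() < 0.001
    e = rng() < 0.002
    a = rng() < CPT_A[(b, e)]
    j = rng() < (0.90 if a else 0.05)
    m = rng() < (0.70 if a else 0.01)
    return b, j, m

rng = random.Random(2).random
counts = {True: 0, False: 0}
for _ in range(500_000):
    b, j, m = prior_sample(rng)
    if j and m:                 # keep only samples consistent with the evidence
        counts[b] += 1
p_b = counts[True] / (counts[True] + counts[False])
print(p_b)  # roughly 0.28, near the exact 0.284
```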
Likelihood weighting fixes the evidence variables and samples only the rest. Weighted-Sample generates an event s with probability

P_WS(s) = Π_{i: Z_i ∉ E} P(z_i | parents(Z_i)),    (10.23)

and attaches to it the weight

w(s) = Π_{i: Z_i ∈ E} P(e_i | parents(Z_i)).    (10.24)
Likelihood-Weighting accumulates W[x] ← W[x] + w for each weighted sample and ends with:
  return Normalize(W)
Consistency of Likelihood-Weighting
Let N_x(y) be the number of samples of type s = {x} ∪ y ∪ e generated by Weighted-Sample. Then, before normalization of W,

W[x] = Σ_y N_x(y) · w(s = {x} ∪ y ∪ e)
     ∝ Σ_y P_WS(s) · w(s)                 (by (10.23), (10.24))
     = Σ_y P_B(s = {x} ∪ y ∪ e)
     = P_B(x, e).

Therefore, W = α P_B(X, e) = P_B(X | e).
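Likelihood weighting for the same query P(B | j, m), as a sketch: the non-evidence variables are sampled, and the evidence contributes the factors of (10.24) to the weight. Every sample is used, unlike in rejection sampling.

```python
import random

CPT_A = {(True, True): 0.95, (True, False): 0.94,
         (False, True): 0.29, (False, False): 0.001}

def weighted_sample(rng):
    """Sample B, E, A; evidence J = true, M = true only contributes weight."""
    b = rng() < 0.001
    e = rng() < 0.002
    a = rng() < CPT_A[(b, e)]
    w = (0.90 if a else 0.05) * (0.70 if a else 0.01)   # P(j | a) * P(m | a)
    return b, w

rng = random.Random(0).random
W = {True: 0.0, False: 0.0}
for _ in range(300_000):
    b, w = weighted_sample(rng)
    W[b] += w
p_b = W[True] / (W[True] + W[False])
print(p_b)  # roughly 0.28
```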
Noisy OR

Efficient Representation of CPTs

An effect (child node) can have several causes (parent nodes); e.g. the binary effect RV Fever can have the binary cause-RVs Cold, Flu, Malaria, etc. It is difficult for a domain expert to specify all the numbers for the whole CPT P(Fever | Cold, Flu, Malaria).

The number of probabilities to be specified in a CPT increases exponentially with the number of parents. Therefore, we use additional assumptions to keep this number bounded.
As a logical statement: Fever = Cold ∨ Flu ∨ Malaria.
This allows the CPT to be defined implicitly by inhibition probabilities:

P(¬fever | cold, ¬flu, ¬malaria) ≜ q_c ≤ 1,    (10.26)
P(¬fever | ¬cold, flu, ¬malaria) ≜ q_f ≤ 1,    (10.27)
P(¬fever | ¬cold, ¬flu, malaria) ≜ q_m ≤ 1.    (10.28)

The full CPT P(Fever | Cold, Flu, Malaria) can now be given in terms of these values by plugging them into (10.25).
Cold Flu Malaria | P(¬Fever)    | P(Fever)
T    T   T       | q_c q_f q_m  | 1 − q_c q_f q_m
T    T   F       | q_c q_f      | 1 − q_c q_f
T    F   T       | q_c q_m      | 1 − q_c q_m
T    F   F       | q_c          | 1 − q_c
F    T   T       | q_f q_m      | 1 − q_f q_m
F    T   F       | q_f          | 1 − q_f
F    F   T       | q_m          | 1 − q_m
F    F   F       | 1.0          | 0.0
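Building the full CPT from the three inhibition probabilities is one product per row; the q-values below are made-up illustrations, not from the slides.

```python
from itertools import product

q = {'Cold': 0.6, 'Flu': 0.2, 'Malaria': 0.1}   # assumed inhibition probabilities

def p_fever(present):
    """Noisy-OR: P(fever) = 1 - product of q over the causes that are present."""
    prod = 1.0
    for cause in present:
        prod *= q[cause]
    return 1.0 - prod

for cold, flu, mal in product([True, False], repeat=3):
    present = [c for c, on in zip(('Cold', 'Flu', 'Malaria'), (cold, flu, mal)) if on]
    print(cold, flu, mal, round(p_fever(present), 4))
# The last row (F, F, F) gives P(Fever) = 0.0, matching the table above.
```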
Noisy MAX

The Noisy MAX is a generalization of the noisy OR for non-binary RVs.

Let Y be an RV with values 0, 1, …, |Y| − 1, with |Y| > 2. The RV Y is semantically graded, meaning that the value 0 implies that the effect Y is absent, and increasing values denote that Y is present with increasing intensity/degree.

The direct causes of Y are Parents(Y) = X = {X1, …, Xn}. Each X_i is also a graded RV and can take values from 0 to |X_i| − 1.

We use z_{i,k} to denote an instantiation of X in which X_i = k and X_j = 0 for all j ≠ i. Note that z_{i,0} ≡ 0.
The cumulative distribution of the effect is

P(Y ≤ d | x) = Σ_{k=0}^{d} P(Y = k | x).    (10.30)

The noisy-MAX assumption factorizes it over the single-cause instantiations:

P(Y ≤ d | X1 = k1, …, Xn = kn) = Π_{i=1}^{n} P(Y ≤ d | z_{i,k_i}).    (10.31)

With c_{i,j,k} ≜ P(Y = j | z_{i,k}) for j = 0 … (|Y| − 1), define

C_{d,i,k} ≜ P(Y ≤ d | z_{i,k}) = Σ_{j=0}^{d} c_{i,j,k},    (10.33)

so that

P(Y ≤ d | X1 = k1, …, Xn = kn) = Π_{j=1}^{n} C_{d,j,k_j} ≜ Q(d, x).    (10.34)
Example for n = 2 causes with |Y| = |X1| = |X2| = 3; each row gives the distribution of Y under the single-cause instantiation z_{i,k}:    (10.35)

(a) c_{1,j,k}:
k \ Y | 0    1    2
0     | 1    0    0
1     | 0.5  0.5  0
2     | 0    1    0

(b) c_{2,j,k}:
k \ Y | 0    1    2
0     | 1    0    0
1     | 0.5  0.3  0.2
2     | 0    0    1
Applications of BN

Figure 69: The CPCS system of Pradhan et al. (1994) for internal medicine. It has 448 nodes and 906 edges, and uses Noisy-MAX distributions to reduce the number of specified CPT values from 133,931,430 to 8,254.
In Genetics
Gene-expression analysis
Functional annotation, protein-protein interaction, haplotype inference
Pedigree analysis
Survey: http://genomics10.bu.edu/bioinformatics/kasif/bayes-net.html
In Software
The Perceptron
Supervised Learning

The linear classifier should satisfy

w·x > θ   for x ∈ M₊,    (11.1)
w·x ≤ θ   for x ∈ M₋.    (11.2)

Redefine the augmented vectors

x ← (xᵀ, 1)ᵀ,   w ← (wᵀ, −θ)ᵀ,    (11.3)

so that the decision test becomes simply w·x > 0.
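A minimal sketch of the perceptron update in the augmented representation of (11.3): misclassified positives add x to w, misclassified negatives subtract it. The toy data set is my own.

```python
def train_perceptron(pos, neg, epochs=100):
    """Perceptron learning on augmented vectors x -> (x, 1); w starts at 0."""
    data = [(p + (1.0,), +1) for p in pos] + [(n + (1.0,), -1) for n in neg]
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        errors = 0
        for x, label in data:
            s = sum(wi * xi for wi, xi in zip(w, x))
            if (s > 0) != (label > 0):           # misclassified: move w
                w = [wi + label * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:                          # converged on separable data
            break
    return w

pos = [(2.0, 2.0), (3.0, 1.0)]                   # linearly separable toy set
neg = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
w = train_perceptron(pos, neg)
print(w)
```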
[Figures: the perceptron decision boundary on a 2-D training set after iterations 2 and 3; plots omitted.]

Entropy
Desired property 1
H(P1, …, Pn) is a continuous function of the P_i's.

Desired property 2
If all events X = x_i are equally likely, P_i = 1/n, i = 1 … n. Then H(1/n, …, 1/n) should be a monotonically increasing function of n: as the number of equally likely events increases, our choice, or uncertainty, increases.
Desired property 3 (decomposition)
A choice can be broken into successive sub-choices: e.g. probabilities (1/2, 1/3, 1/6) can be realized as a first choice between 1/2 and 1/2, followed by a choice between 2/3 and 1/3 inside the second branch.

Figure 70: Note that the net probabilities at the leaves are the same.
Entropy I

Theorem 11.1
The function satisfying the said three properties is

H(X) ≜ H(P1, …, Pn) = −k Σ_{i=1}^{n} P_i log P_i.    (11.4)
First consider the equally likely case and define

A(n) ≜ H(P1 = 1/n, P2 = 1/n, …, Pn = 1/n).    (11.5)
Choice Tree
Consider first levels 2 and 3, then also include level 1. Using property 3,

A(2³) = A(2²) + 2² · (1/2²) · A(2)
      = A(2) + 2¹ · (1/2¹) · A(2) + 2² · (1/2²) · A(2)
      = 3 A(2).
In general,

A(b^d) = A(b^{d−1}) + Σ_{i=1}^{b^{d−1}} (1/b^{d−1}) · A(b) = A(b^{d−1}) + A(b),    (11.6)

so A(b^d) = d · A(b), which gives A(n) = k log n.

Knowing that H(P1 = 1/n, P2 = 1/n, …, Pn = 1/n) = k log n is nice, but what we're really after is the case H(P1, P2, …, Pn), where the P_i's are arbitrary: this is proven in Part II.
Choose among M equally likely outcomes in two stages: first pick a group i with probability P_i = m_i/M (P1 = m1/M, P2 = m2/M, …, Pn = mn/M), then choose among the m_i equally likely children of that group.
Property 3 gives

A(M) = H(P1, …, Pn) + Σ_{i=1}^{n} P_i A(m_i).

Substituting A(M) = k log M, A(m_i) = k log m_i, and using Σ_{i=1}^{n} P_i = 1,

H(P1, …, Pn) = k log M − k Σ_{i=1}^{n} P_i log m_i
             = −k Σ_{i=1}^{n} P_i log (m_i/M)
             = −k Σ_{i=1}^{n} P_i log P_i.

Entropy

H(X) ≜ E[−log₂ P(X)]    (11.7a)
     = −Σ_{i=1}^{|X|} P(x_i) log₂ P(x_i)   for DRVs,    (11.7b)
     = −∫_{D(X)} p(x) log₂ p(x) dx   for CRVs.    (11.7c)
Computing 0 log 0

By L'Hôpital's rule,

lim_{x→0⁺} x log x = lim_{x→0⁺} (log x) / (1/x) = lim_{x→0⁺} (1/x) / (−1/x²) = lim_{x→0⁺} (−x) = 0.
Conditional Entropy

A choice-tree based derivation of conditional entropy was done in class. The derivation can also be done purely mathematically, as follows:

H(Y | X) ≜ Σ_{i=1}^{|X|} P(x_i) H(Y | X = x_i)
         = −Σ_{i=1}^{|X|} P(x_i) Σ_{j=1}^{|Y|} P(y_j | x_i) log₂ P(y_j | x_i)    (11.8)
         = −Σ_{i=1}^{|X|} Σ_{j=1}^{|Y|} P(x_i, y_j) log₂ [ P(x_i, y_j) / P(x_i) ]    (11.9)
         = H(X, Y) − H(X).    (11.10)

[Figure: plot of h(x) = −x log₂(x) for x ∈ [0, 1].]
Entropy
i xi )
i f (xi ),
if
i = 1, and i 0.
(11.11)
|Y | |X |
H(Y | X ) =
P(xi ) H(P(yj | xi ))
j=1
(11.12)
i=1
Articial Intelligence
December 5, 2013
383 / 475
Entropy
|Y |
H(Y | X )
|X |
H
j=1
|Y |
i=1
P(xi , yj )
i=1
|Y |
(11.13)
|X |
H
j=1
P(xi ) P(yj | xi )
H(P(yj ))
j=1
(11.7b)
H(Y ).
H(Y | X ) H(Y )
(11.14)
Articial Intelligence
December 5, 2013
384 / 475
Entropy
Mutual Information

The mutual information of two RVs X and Y is defined as

I(X, Y) ≜ Σ_{i=1}^{|X|} Σ_{j=1}^{|Y|} P(x_i, y_j) log₂ [ P(x_i, y_j) / (P(x_i) P(y_j)) ]    (11.15a)
        = H(Y) − H(Y | X)    (11.15b)
        = H(X) − H(X | Y)    (11.15c)
        = H(X) + H(Y) − H(X, Y)    (11.15d)
        = H(X, Y) − H(X | Y) − H(Y | X).    (11.15e)
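The identities (11.10), (11.14), and (11.15b) can be checked numerically on any small joint distribution; the table below is an arbitrary illustration, not from the slides.

```python
from math import log2

# A small joint distribution P(X, Y) with X in {a, b}, Y in {u, v}
P = {('a', 'u'): 0.30, ('a', 'v'): 0.20, ('b', 'u'): 0.10, ('b', 'v'): 0.40}

def H(dist):
    """Entropy per (11.7b), skipping zero-probability entries."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

Px, Py = {}, {}
for (x, y), p in P.items():            # marginals
    Px[x] = Px.get(x, 0) + p
    Py[y] = Py.get(y, 0) + p

H_xy, H_x, H_y = H(P), H(Px), H(Py)
H_y_given_x = H_xy - H_x               # (11.10)
I_xy = H_y - H_y_given_x               # (11.15b)
print(round(I_xy, 4))
```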
Venn Diagram

[Figure: Venn diagram of H(X) and H(Y) inside H(X, Y); the non-overlapping parts are H(X | Y) and H(Y | X), and the overlap is I(X, Y).]

The binary entropy function:

H_b(p) ≜ −p log₂ p − (1 − p) log₂ (1 − p).    (11.16)
Decision Trees

A decision tree for the restaurant wait problem:

Patrons?
  None → No
  Some → Yes
  Full → WaitEstimate?
    >60 → No
    30-60 → Alternate?
      No → Reservation?
        No → Bar?  (No → No, Yes → Yes)
        Yes → Yes
      Yes → Fri/Sat?  (No → No, Yes → Yes)
    10-30 → Hungry?
      No → Yes
      Yes → Alternate?
        No → Yes
        Yes → Raining?  (No → No, Yes → Yes)
    0-10 → Yes
The 12 training examples (input attributes and output Wait?):

Ex  | Alt Bar Fri Hun | Patr  Prc  Rain Rsrv | Type    Est   | Wait?
x1  | Y   N   N   Y   | Some  $$$  N    Y    | French  0-10  | y1  = Y
x2  | Y   N   N   Y   | Full  $    N    N    | Thai    30-60 | y2  = N
x3  | N   Y   N   N   | Some  $    N    N    | Burger  0-10  | y3  = Y
x4  | Y   N   Y   Y   | Full  $    Y    N    | Thai    10-30 | y4  = Y
x5  | Y   N   Y   N   | Full  $$$  N    Y    | French  >60   | y5  = N
x6  | N   Y   N   Y   | Some  $$   Y    Y    | Italian 0-10  | y6  = Y
x7  | N   Y   N   N   | None  $    Y    N    | Burger  0-10  | y7  = N
x8  | N   N   N   Y   | Some  $$   Y    Y    | Thai    0-10  | y8  = Y
x9  | N   Y   Y   N   | Full  $    Y    N    | Burger  >60   | y9  = N
x10 | Y   Y   Y   Y   | Full  $$$  N    Y    | Italian 10-30 | y10 = N
x11 | N   N   N   N   | None  $    N    N    | Thai    0-10  | y11 = N
x12 | Y   Y   Y   Y   | Full  $    N    N    | Burger  30-60 | y12 = Y
Aim
To build a decision tree from examples which allows us, on average, to reach a decision as fast as possible, i.e. with the least number of checks.

Strategy
We check the attributes X_i (i.e. split the tree) in decreasing order of their mutual information I(X_i, Y) = H(Y) − H(Y | X_i) with the decision (class Y).
With p denoting the fraction of positive examples, the decision entropy is H(p, 1 − p).    (11.17)

For the Type attribute, the remainder is

H(Wait | Type) = Σ_{t = f,t,b,i} P(Type = t) Σ_{w = Y,N} h(P(w | t))
               = Σ_{t = f,t,b,i} P(Type = t) H_b( P(Wait = Y | t) ).
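The split criterion can be checked on the 12 restaurant examples: Patrons gives a large information gain while Type gives none, which is why the learned tree of Figure 75 splits on Patrons first. A sketch encoding only the two relevant attributes:

```python
from math import log2

examples = [  # (Patrons, Type, Wait) from the 12-example table
    ('Some', 'French', 'Y'), ('Full', 'Thai', 'N'), ('Some', 'Burger', 'Y'),
    ('Full', 'Thai', 'Y'),   ('Full', 'French', 'N'), ('Some', 'Italian', 'Y'),
    ('None', 'Burger', 'N'), ('Some', 'Thai', 'Y'), ('Full', 'Burger', 'N'),
    ('Full', 'Italian', 'N'), ('None', 'Thai', 'N'), ('Full', 'Burger', 'Y'),
]

def Hb(p):
    """Binary entropy (11.16)."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def gain(attr_index):
    """I(attribute, Wait) = H(Wait) minus the remainder over attribute values."""
    n = len(examples)
    p_yes = sum(e[2] == 'Y' for e in examples) / n
    rem = 0.0
    for v in {e[attr_index] for e in examples}:
        subset = [e for e in examples if e[attr_index] == v]
        pv = sum(e[2] == 'Y' for e in subset) / len(subset)
        rem += len(subset) / n * Hb(pv)
    return Hb(p_yes) - rem

print(round(gain(0), 3), round(gain(1), 3))  # Patrons ≈ 0.541, Type = 0.0
```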
Notation

Let A be the vector RV of all attribute-DRVs. Each DRV A_i has a domain of values {a_{i,j} | j = 1 … |A_i|}.

Let the example-set be denoted as

X = {(A = a_k, Y = y_k) | k = 1 … |X|},   X_i ⊆ X.    (11.18)
[Figure: an intermediate decision tree during learning, over the same attributes (Patrons?, WaitEstimate?, Alternate?, Reservation?, Bar?, Hungry?, Fri/Sat?, Raining?); structure residue omitted.]
Patrons?
  None → No
  Some → Yes
  Full → Hungry?
    No → No
    Yes → Type?
      French → Yes
      Italian → No
      Thai → Fri/Sat?  (No → No, Yes → Yes)
      Burger → Yes

Figure 75: The decision tree deduced from the 12 examples of the training set. Some attributes are never checked to arrive at a decision.
The gain ratio normalizes the mutual information by the attribute's own entropy:

I(A, C) / H(A),    (11.19)

where

H(A) = −Σ_{i=1}^{|A|} [(p_i + n_i)/(p + n)] log₂ [(p_i + n_i)/(p + n)].    (11.20)
Cross-Validation

Model-Complexity Selection

[Figure: fits f(x) of increasing model complexity, panels (a)-(d).]

Ockham's Razor
Entities must not be multiplied beyond necessity. In other words, we should tend towards simpler theories until more complicated theories become necessary to explain new observations. The field of model selection studies these ideas, many of them based on the information entropy: MDL (minimum description length), AIC (Akaike information criterion), etc.
Error-Rate
It is the proportion of mistakes a given hypothesis makes, i.e. the proportion of times h(x) ≠ y.
k-fold Cross-Validation
Divide the available examples into k equal subsets. Perform k rounds of learning: in each round, (1 − 1/k) of the data is used as the training set and the remaining 1/k of the data is used as the test set (now called the validation set). Then the average test score of the k rounds is taken as the performance measure. Typical choices are k = 5, 10, m; k = m is called leave-one-out cross-validation (LOOCV).
Algorithm 40: k-Fold-Cross-Validation
input : Learner, a learning algorithm; s, a model-complexity parameter; k; examples
output: average training-set error-rate eT, average validation-set error-rate eV

eV ← 0, eT ← 0;
for i = 1 to k do
  St, Sv ← Partition(examples, i, k);
  h ← Learner(s, St);
  eT ← eT + Error-Rate(h, St);
  eV ← eV + Error-Rate(h, Sv);
return (eT/k, eV/k)
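A runnable sketch of Algo. 40. Partition, Learner, and Error-Rate here are stand-ins; the majority-class learner in particular is my own placeholder, not from the slides.

```python
def partition(examples, i, k):
    """Fold i is the validation set; the other k-1 folds form the training set."""
    folds = [examples[j::k] for j in range(k)]
    val = folds[i]
    train = [e for j, f in enumerate(folds) if j != i for e in f]
    return train, val

def learner(train):
    """Placeholder learner: always predict the majority class."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def error_rate(h, data):
    return sum(h(x) != y for x, y in data) / len(data)

def k_fold_cv(examples, k):
    eT = eV = 0.0
    for i in range(k):
        St, Sv = partition(examples, i, k)
        h = learner(St)
        eT += error_rate(h, St)
        eV += error_rate(h, Sv)
    return eT / k, eV / k

data = [(x, x % 2) for x in range(20)]   # balanced toy labels
print(k_fold_cv(data, 5))
```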
[Figure: error rate vs. tree size.]

Figure 79: From "Scaling up the Naive Bayesian Classifier: Using Decision Trees for Feature Selection," by Ratanamahatana et al.
Unsupervised Learning

Clustering
Clustering partitions the available data x_i (feature vectors) into clusters, such that samples belonging to a cluster are similar according to some criterion. It finds usage in
data analysis and compression,
pattern recognition,
image segmentation, and
bioinformatics.
The k-means objective is

E(m1, …, mk) = Σ_{j=1}^{N} min_p ‖x_j − m_p‖².    (11.21)

k-Means Clustering: with assignment indicators b_{j,i} ∈ {0, 1} (b_{j,i} = 1 iff x_j is assigned to cluster i), each mean is updated as

m_i = Σ_{j=1}^{N} b_{j,i} x_j / Σ_{j=1}^{N} b_{j,i}.
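A 1-D sketch of the two alternating steps (nearest-mean assignment per (11.21), then the mean update above); the toy data are my own.

```python
def kmeans(points, means, iters=10):
    """Alternate nearest-mean assignment and mean recomputation."""
    for _ in range(iters):
        clusters = [[] for _ in means]
        for x in points:
            i = min(range(len(means)), key=lambda i: (x - means[i]) ** 2)
            clusters[i].append(x)
        # keep a mean unchanged if its cluster is empty
        means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
    return means

pts = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers = sorted(kmeans(pts, [0.0, 5.0]))
print(centers)  # near [1.0, 9.0]
```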
Animation of k-means
The intra-cluster scatter of a cluster C_i can be measured as

Δ_i = max_{x,y ∈ C_i} ‖x − y‖,    (11.22)
Δ_i = [1 / (|C_i|(|C_i| − 1))] Σ_{x,y ∈ C_i, x ≠ y} ‖x − y‖,    (11.23)
Δ_i = (1/|C_i|) Σ_{x ∈ C_i} ‖x − μ_i‖.    (11.24)

Inter-cluster distances can be measured as

δ(C_i, C_j) = max_{x ∈ C_i, y ∈ C_j} ‖x − y‖,    (11.25)
δ(C_i, C_j) = min_{x ∈ C_i, y ∈ C_j} ‖x − y‖,    (11.26)
δ(C_i, C_j) = ‖μ_i − μ_j‖.    (11.27)
The Dunn index for m clusters is

DI_m = [ min_{1≤i≤m} min_{1≤j≤m, j≠i} δ(C_i, C_j) ] / [ max_{1≤k≤m} Δ_k ].    (11.28)
The EM Algorithm

Figure 81: A Gaussian mixture computed from 500 samples with weights (left to right) 0.2, 0.3, 0.5. [Panels (a) and (b) omitted.]
The multivariate normal pdf:

N(X = x; μ, Σ) ≜ [1 / ((2π)^{n/2} |Σ|^{1/2})] exp( −(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ) ).    (11.29)

GMM pdf:

p(x) = Σ_{i=1}^{K} P(C = i) N(x; μ_i, Σ_i),   where   Σ_{i=1}^{K} P(C = i) = 1.    (11.30), (11.31)
Expectations:

E[f] ≜ Σ_{i=1}^{|X|} P(x_i) f(X = x_i)   for DRVs,    (11.32)
E[f] ≜ Σ_{i=1}^{|X|} P(x_i) f(X = x_i)   for vector-valued f,    (11.33)
E[f] ≜ ∫_{D(X)} p(x) f(x) dx   for CRVs.    (11.34)

Covariance

Cov(X) ≜ E[ (X − E[X])(X − E[X])ᵀ ].    (11.35)
Mahalanobis Distance

If X ∈ Rⁿ is a normally distributed vector continuous RV (CRV), its normal/Gaussian pdf is defined as

N(X = x; μ, C) ≜ [1 / ((2π)^{n/2} |C|^{1/2})] exp( −(1/2) (x − μ)ᵀ C⁻¹ (x − μ) ),    (11.36)

and the squared Mahalanobis distance is

d²(x) ≜ (x − μ)ᵀ C⁻¹ (x − μ).    (11.37)
For the diagonal case

C = diag(σ_x², σ_y²),   μ = (μ_x, μ_y)ᵀ,

the squared Mahalanobis distance reduces to

(x − μ)ᵀ C⁻¹ (x − μ) = (x − μ_x)²/σ_x² + (y − μ_y)²/σ_y².    (11.38)
Marginalizing over the mixture component,

p(x) = Σ_{i=1}^{K} P(C = i, x) = Σ_{i=1}^{K} P(C = i) p(x | C = i).    (11.39)
The EM Algorithm
= p(xj | Ci ) P(Ci )
(11.40)
I(Zij = True )
(11.41)
P(Zij = True ).
ni =
(11.42)
j=1
n
ni
E [ni ] =
j=1
Articial Intelligence
December 5, 2013
425 / 475
The EM Algorithm
i
P(C = i)
1
ni
P(Zij = True ) xj
(11.43)
(11.44)
j=1
n
j=1
ni
.
n
(11.45)
Articial Intelligence
December 5, 2013
426 / 475
The EM Algorithm
L(x1:n ) =
j=1
log {p(xj )}
log
j=1
(11.46)
i=1
P(C = i) p(xj | C = i) .
(11.47)
Proof is outside of the scope of this course, but uses the Jensens
inequality.
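A compact 1-D, two-component EM sketch implementing (11.40)-(11.45); the data generation and the initial guess are my own. The log-likelihood (11.46) is tracked to confirm it never decreases.

```python
import math, random

def npdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(xs, w, mu, var):
    # E-step (11.40): responsibilities, normalized per sample
    resp = []
    for x in xs:
        p = [w[i] * npdf(x, mu[i], var[i]) for i in range(2)]
        s = sum(p)
        resp.append([pi / s for pi in p])
    # M-step (11.43)-(11.45)
    n = [sum(r[i] for r in resp) for i in range(2)]
    mu = [sum(r[i] * x for r, x in zip(resp, xs)) / n[i] for i in range(2)]
    var = [sum(r[i] * (x - mu[i]) ** 2 for r, x in zip(resp, xs)) / n[i]
           for i in range(2)]
    w = [n[i] / len(xs) for i in range(2)]
    return w, mu, var

def loglik(xs, w, mu, var):   # (11.46)
    return sum(math.log(sum(w[i] * npdf(x, mu[i], var[i]) for i in range(2)))
               for x in xs)

rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
w, mu, var = [0.5, 0.5], [-1.0, 6.0], [1.0, 1.0]
L = [loglik(xs, w, mu, var)]
for _ in range(30):
    w, mu, var = em_step(xs, w, mu, var)
    L.append(loglik(xs, w, mu, var))
print(sorted(round(m, 1) for m in mu))  # means near 0 and 5
```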
[Figure: log-likelihood L vs. iteration number.]

Practical issues:
K is unknown.
A Gaussian component may shrink to cover just one point: in this case, its covariance determinant becomes 0 and the likelihood blows up. Restart with a new, better initial guess.
Mean-Shift Clustering

Mean-Shift: Non-Parametric Clustering
Until now, the number of clusters was always given (e.g. k = 256 in the image-compression example).
What if we want the algorithm to figure out the number of clusters, if we give it some hints about the structure, namely the significance of distances in each of the dimensions of the vector x? These are called bandwidths.
The mean-shift algorithm makes clusters by finding the basins of attraction of the local peaks of the pdf p(x).
The pdf p(x) is estimated by kernel density estimation (KDE).
Detailed derivation done in class.
Figure 83: KDE using 200 samples from a standard Gaussian. The KDE uses a Gaussian kernel of different bandwidths.

[Figures: mean-shift results with bandwidths (a) hx = 30, hy = 30; (b) hx = 60, hy = 60; (c) hx = 60, hy = 30.]
Color Spaces

[Figure: (a) RGB; other panels not recovered.]
Home-work Assignments

Contents
Home-assignment 1
Home-assignment 2
Home-assignment 3
Home-assignment 4
Home-assignment 5

Home-assignment 1
Figure 89: An example UML design for homework 1; feel free to alter it. The base classes Node, Frontier, and GraphSearch are abstract classes which do not care about implementation details. Thus, the same structure can be used for all graph-search variants.
P(a2) = 0.15,  P(a3) = 0.05,  P(a4) = 0.4,  P(a5) = 0.1
Home-assignment 1
4.3 For deciding whether or not to select a worse child, you need to
compute the Boltzmann probability at the current T and sample the
boolean random-variable using the code of the sampler class from the
last part.
4.4 Select a schedule for the temperature based on the advice given in the
quote from Numerical Recipes in C. You have to experiment with it
to get the best results.
4.5 Make a plot of the iteration number vs. the current total traversal
cost. Also show iteration number vs. T (schedule).
4.6 Plot the nal path and print its cost.
Articial Intelligence
December 5, 2013
444 / 475
Home-work Assignments
Home-assignment 1
20
40
60
80
100
(a) Path
K. Pathak (Jacobs University Bremen)
Articial Intelligence
Home-work Assignments
December 5, 2013
445 / 475
Home-assignment 1
500000
1000000
1500000
nr. iteration
2000000
2500000
500000
1000000
1500000
nr. iteration
2000000
2500000
1000
800
T
600
400
200
00
Articial Intelligence
= 0.5
December 5, 2013
446 / 475
Home-work Assignments
Home-assignment 1
Articial Intelligence
Home-work Assignments
December 5, 2013
447 / 475
Home-assignment 2
HA 2 I
Groups of 2. Due date 21.10., 23:59
HA 2 II
Groups of 2. Due date 21.10., 23:59

[Figure: alpha-beta search trace with intermediate windows F(A, α, β), e.g. β = 3, and leaf-value tuples t = (3, 2, 2), t = (3, 12, 8), t = (14, 5, 2), t = (2); tree diagram omitted.]

HA 2 III
Groups of 2. Due date 21.10., 23:59

C1: A B C D E
C2: E F G D
C3: G E
C4:

HA 2 IV
Groups of 2. Due date 21.10., 23:59

C1: A B C D E
C2: D E F G
C3: E G
C4:
HA 2 V
Groups of 2. Due date 21.10., 23:59

C6 = C2 ⊗ C4: A B C D F G
C7 = C3 ⊗ C4: A B C D G
C4: D E F

In the next iteration, we get the following new clauses from resolution amongst C1, …, C7, ignoring the pairs already considered:

C8 = C2 ⊗ C7: A B C D F
C9 = C3 ⊗ C6: A B C D E F

HA 2 VI, VII
Groups of 2. Due date 21.10., 23:59
[Figures omitted.]
HA 2 VIII
Groups of 2. Due date 21.10., 23:59
3. Convert the following set of sentences to CNF and give a trace of the execution of DPLL (2nd version, Algo. 25) on the conjunction of these clauses.

S1: A ⇔ (C ∨ E)    (12.1a)
S2: E ⇒ D    (12.1b)
S3: B ∧ F ⇒ C    (12.1c)
S4: E ⇒ C    (12.1d)
S5: C ⇒ F    (12.1e)
S6: C ⇒ B    (12.1f)

(20%)

HA 2 IX
Groups of 2. Due date 21.10., 23:59

S1: C1 = (¬A ∨ C ∨ E)(¬C ∨ A)(¬E ∨ A)
S2: C2 = (¬E ∨ D)
S3: C3 = (¬B ∨ ¬F ∨ C)
S4: C4 = (¬E ∨ C)
S5: C5 = (¬C ∨ F)
S6: C6 = (¬C ∨ B)

DPLL-1(F = {C11, …, C6})
Home-work Assignments
Home-assignment 2
HA 2 X
Groups of 2. Due date 21.10., 23:59
Although a literal u can now be chosen arbitrarily, we decide to choose
the one with the maximum Jeroslow Wang (JW) metric w (F, ) as
discussed in the class. To review:
2k dk (F, ),
w (F, ) =
(12.3)
k1
C11 : {A} which is also unitary. The second iteration of the while loop
then sets A = 1. F F | A now is empty.
Since F is empty, a Satisable is returned. The current partial model
is D = 1, C = 0, E = 0, A = 0, which satises all clauses of the original
set.
Articial Intelligence
Home-work Assignments
December 5, 2013
457 / 475
HA 2 XI
Groups of 2. Due date 21.10., 23:59

Clauses: D B, E C, B C.

4.1 Solve the 2SAT problem by using an implication graph: either prove the clauses unsatisfiable, or, if they are satisfiable, find a model.
4.2 For finding the strongly-connected components, use the algorithm from the Cormen et al. (CLRS) book, given in Fig. 52. It uses the version of DFS shown in Figs. 50 and 51.
(25%)

Solution:

HA 2 XII
Groups of 2. Due date 21.10., 23:59

Figure 90: Prob. 4.1. The implication graph G(V, E) and the first DFS on G. [Discovery/finishing times (d/f) at the vertices omitted.]
HA 2 XIII
Groups of 2. Due date 21.10., 23:59

Decreasing order of u.f from the first DFS of G: E, B, D, C, E, A, A, C, D, B.

Figure 91: Prob. 4.1, 4.2. The second DFS on Gᵀ, taking the vertices in decreasing order of their finishing time in the first DFS. [Discovery/finishing times omitted.]
HA 2 XIV
Groups of 2. Due date 21.10., 23:59

The DFS forest of Gᵀ gives the strongly connected components
T1: E
T2: B, C, A, D
T3: E
T4: A, D, B, C
in the order T1, T2, T4, T3.

HA 2 XV
Groups of 2. Due date 21.10., 23:59

M(F ∧ G) = M(F) ∩ M(G).

The independence condition implies

Σ_{A ∈ M(F)} P(A) = [ Σ_{A ∈ M(F ∧ G)} P(A) ] / [ Σ_{A ∈ M(G)} P(A) ].    (12.5)
HA 2 XVI
Groups of 2. Due date 21.10., 23:59

[Derivation over the joint P(x_i, y_j), its marginals Σ_{j=1}^{|Y|} P(x_i, y_j), and the conditional P(x_i | y_j); the full steps are not recoverable from the extraction.]
Home-assignment 3
HA 3: NBC/BN I
Groups of 2. Due date 17.11., 23:59

1. Download http://www.aispace.org/bayes/version5.1.9/bayes.jar and run it using java -jar bayes.jar. Load File > Load Sample Problem > Car Starting Problem.
1.1 Given its parents, of which nodes is the node Spark Quality (SQ) conditionally independent? You can abbreviate the node names by their initials.
1.2 To answer the query P(Battery Voltage | Spark Adequate), which nodes are irrelevant and can be pruned out to get a smaller BN without affecting the query result? Verify this by making the query P(Battery Voltage | Spark Adequate = T) first (in the Solve tab), then pruning the BN (in the Create tab) and making the query again. If you get an error, you may have pruned too much.
HA 3: NBC/BN II
Groups of 2. Due date 17.11., 23:59

If you code in Python, use of the numpy library will save you time.
3.2 Use these posterior queries on the test data SPECT.test and compute the error-rate in percentage. The NBC-predicted classification is, of course, the c with the maximum posterior probability.
Home-assignment 4

HA 4: BN/Entropy I
Groups of 2. Due date 29.11., 23:59

P_i = e^{x_i} / Σ_{j=1}^{n} e^{x_j},   i = 1, …, n.    (12.6)

HA 4: BN/Entropy II
Groups of 2. Due date 29.11., 23:59

Σ_{i=1}^{n} x_i e^{x_i} / Σ_{i=1}^{n} e^{x_i}.    (12.7)
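A sketch of (12.6)-(12.7): the softmax distribution and its mean, E[x] = Σ_i x_i P_i. Subtracting the maximum before exponentiating is the usual trick to avoid overflow (an implementation choice, not required by the formula).

```python
import math

def softmax(xs):
    """P_i = exp(x_i) / sum_j exp(x_j), per (12.6)."""
    m = max(xs)                          # for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

xs = [1.0, 2.0, 3.0]
P = softmax(xs)
mean = sum(x * p for x, p in zip(xs, P))   # (12.7)
print(round(sum(P), 6), round(mean, 4))
```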
Home-assignment 5
HA 5: Decision Trees I
Groups of 2. Due date 14.12., 23:59
We will compute how accurate decision trees are for the dataset http://archive.ics.uci.edu/ml/datasets/SPECT+Heart, which we evaluated with NBC in HA 3.
1. Write a decision-tree class (C++ or Python) which has functionality for:
   loading a training dataset (SPECT.train for this DB);
   running Algo. 39 to learn a decision tree.
2. Use the decision tree on the test data SPECT.test and compute the error-rate in percentage.
Quizzes

Contents
Quiz 1
Quiz 2
Quiz 1 I
Sep. 23

[Solution fragment: the sets S_h = {x | …(x) … (x_g)} of (13.1)-(13.2); the full expressions are not recoverable.]

Quiz 1 II
Sep. 23

h1 is more efficient.

[Figure: start state and goal state of the puzzle for (13.3); 50%.]
Quiz 2 I
Oct. 7

2. The case KB ⊨ Q and KB ⊨ ¬Q
5. Monotonicity of PL knowledge-bases

Quiz 2 II

[Figures: model-set Venn diagrams relating M(KB) and M(Q) for cases (a)-(c), and (d) the effect of adding a sentence R: F ← F ∪ {R}, with M(F') = M(F) ∩ M(R).]
Quiz 3, 13.11.

[Figure: a graph over the nodes A, C, D, G, H; edges not recovered.]