
Unit 4: Knowledge Representation,

Inferential reasoning

Hridaya Kandel
hridayakandel@gmail.com
AI Lecturer
BScCSIT Vth

Hridaya Kandel 1
Objective
Students should understand the importance of knowledge
representation
Students should understand the use of formal logic as a knowledge
representation language
Students should be familiar with the concepts of logic such as
syntax, semantics, validity, models, and entailment
Students should be able to represent a natural language description as
statements in logic and deduce new sentences by applying inference
rules
Students should learn in detail about resolution techniques

Knowledge-based Agents
Intelligent agents should have capacity for:
Perceiving, that is, acquiring information from environment,
Knowledge Representation, that is, representing its understanding of
the world,
Reasoning, that is, inferring the implications of what it knows and of
the choices it has, and
Acting, that is, choosing what it wants to do and carrying it out.

Humans can know things and reason


Representation: How are the things stored?
Reasoning: How is the knowledge used?
To solve a problem
To generate more knowledge

Knowledge-based Agents
Representation of knowledge and the reasoning process are
central to the entire field of artificial intelligence.
Useful mostly in partially observable environments

A knowledge-based agent can combine general knowledge
with current percepts to infer hidden aspects of the current
state prior to selecting actions.
Example: a physician diagnoses a patient
that is, infers a disease state that is not directly observable
Some of the knowledge used is in the form of rules learned from
textbooks and teachers, and some is in the form of patterns of
association that the physician may not be able to consciously
describe.
If it is inside the physician's head, it counts as knowledge.

Knowledge-based Agents
Understanding natural language also requires inferring hidden
state, namely, the intention of the speaker.
Example:
In "John saw the diamond through the window and coveted it", we
know "it" refers to the diamond and not the window
When we hear "John threw the brick through the window and
broke it", we know "it" refers to the window
Reasoning allows us to cope with the virtually infinite
variety of utterances (vocal expressions) using a finite store of
commonsense knowledge.
Problem-solving agents have difficulty with this kind of
ambiguity because their representation of contingency
problems is inherently exponential.

Knowledge-based Agents
Knowledge-based Agents are flexible
They are able to accept new tasks in the form of explicitly described
goals,
they can achieve competence quickly by being told or learning new
knowledge about the environment, and
they can adapt to changes in the environment by updating the relevant
knowledge.

Knowledge-based Agents
The central component of a knowledge-based agent is its
knowledge base, or KB. Informally, a knowledge base is a set
of sentences (not English sentences).
Knowledge base = set of sentences in a formal language
Each sentence is expressed in a language called a knowledge
representation language. A language used to express
knowledge about the world
Declarative approach: the language is designed so that knowledge about
the world the language is being implemented for can be expressed easily
Procedural approach: encodes desired behaviors directly in program
code

Knowledge-based Agents
New sentences are added to the knowledge base, and queries are
made on the knowledge base. These two tasks are performed using
two generic functions.
TELL: add new sentences (facts) to the KB
Tell it what it needs to know
ASK: query what is known from the KB
Ask what to do next

Inference: the process of deriving new sentences from the
knowledge base
When the agent draws a conclusion from available information, it is
guaranteed to be correct if the available information is correct
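The TELL/ASK interface described above can be sketched as a tiny class. The names and the storage are illustrative assumptions, not a standard API, and ASK here is only a trivial membership test; a real agent would run an inference procedure instead:

```python
# A minimal sketch of the TELL/ASK interface (illustrative only).
class KnowledgeBase:
    def __init__(self):
        self.sentences = []          # the set of sentences

    def tell(self, sentence):
        """TELL: add a new sentence (fact) to the KB."""
        self.sentences.append(sentence)

    def ask(self, query):
        """ASK: is the query supported by the KB?  Here this is a
        trivial lookup; a real agent would run inference instead."""
        return query in self.sentences

kb = KnowledgeBase()
kb.tell("Breeze in [2,1]")
print(kb.ask("Breeze in [2,1]"))   # True
print(kb.ask("Pit in [2,2]"))      # False
```

The point of the sketch is the separation of concerns: the agent program only ever calls `tell` and `ask`; all representation and inference details stay inside the KB.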

Wumpus World
Performance measure
gold +1000, death -1000
-1 per step, -10 for using the arrow
Environment
Squares adjacent to wumpus are smelly
Squares adjacent to pit are breezy
Glitter iff gold is in the same square
Shooting kills wumpus if you are facing it
Shooting uses up the only arrow
Grabbing picks up gold if in same square
Releasing drops the gold in same square
Sensors: Stench, Breeze, Glitter, Bump, Scream
Actuators: Left turn, Right turn, Forward, Grab, Release, Shoot

The percepts will be given to the agent in the form of a list of five symbols; for
example, if there is a stench and a breeze, but no glitter, bump, or scream, the agent
will receive the percept [Stench, Breeze, None, None, None].
Wumpus World

Fully observable? No: only local perception
Deterministic? Yes: outcomes are exactly specified
Episodic? No: sequential at the level of actions
Static? Yes: the Wumpus and pits do not move
Discrete? Yes
Single-agent? Yes: the Wumpus is essentially a natural feature
Exploring Wumpus World
We will mark down what we know: initially the agent (A) is in [1,1], and
we know that that square is OK.
It then gets the first percept (remember, the order is [Stench, Breeze,
Glitter, Bump, Scream]), and this first percept is all null:
[None, None, None, None, None].
The agent will move only into a square it knows to be OK.

Exploring Wumpus World

In the initial state, there are no percepts (the sequence is [None, None, None, None,
None]), and therefore it can be inferred that the neighboring squares are safe (OK).

Exploring Wumpus World

The agent decides to move right and feels a breeze. What does this mean?

Exploring Wumpus World

That there is a pit in one of the two neighboring squares. Better not move there,
since it knows there is a safer move if it backs up.

Exploring Wumpus World

So it does and what can it infer from there?

Exploring Wumpus World

A pretty difficult inference! Note why it is difficult: the inference of where
the pit is depends on the lack of a percept (no breeze in [2,1]) and on percepts
gathered over time.

Exploring Wumpus World

In each case where the agent draws a conclusion from the available
information, that conclusion is guaranteed to be correct if the available
information is correct.

This is a fundamental property of logical reasoning.
Logic in general
Knowledge Bases consist of sentences
Logics are formal languages for representing information such that
conclusions can be drawn
Syntax defines how symbols can be put together to form the sentences in
the language
E.g., in ordinary arithmetic:
x + y = 4 is a well-formed sentence, whereas x2y+ = is not
Semantics define the "meaning" of sentences;
Semantics allows you to relate the symbols in the logic to the domain
you're trying to model.
The semantics of the language defines the truth of each sentence with
respect to each possible world.
x + y =4 is true in a world where x is 2 and y is 2, but false in a world
where x is 1 and y is 1.
When we need to be precise, we will use the term model in place of
possible world.
Logic in general
Entailment means that one thing follows logically from
another.
In mathematical notation, we write α ⊨ β to mean that
the sentence α entails the sentence β.

The formal definition of entailment is: α ⊨ β iff in every
model in which α is true, β is also true. Another way to say
this is: if α is true, then β must be true.
Informally, the truth of β is "contained in" the truth of α.

The sentence x + y = 4 entails the sentence 4 = x + y.
i.e. in any model where x + y = 4, such as the model in which x is 2 and y
is 2, it is the case that 4 = x + y.
We will see shortly that a knowledge base can be considered a
statement, and we often talk of a knowledge base entailing a sentence.

Logic in general
KB ⊨ α
Knowledge base KB entails sentence α if and only if α is true in all
worlds where KB is true
E.g., the KB containing "Nepal won" and "India won" entails
"Either Nepal won or India won"
E.g., x + y = 4 entails 4 = x + y

Logicians typically think in terms of models, which are
formally structured worlds with respect to which truth can be
evaluated
m is a model of a sentence α means that sentence α is true
in model m
M(α) is the set of all models of α
Then
KB ⊨ α iff M(KB) ⊆ M(α)
Logic in general
Entailment in Wumpus-World
Consider a situation:
The agent has detected nothing in [1,1] and a breeze in [2,1].
The agent is interested in whether the adjacent squares [1,2], [2,2],
and [3,1] contain pits.
Each of the three squares might or might not contain a pit, so (for the
purposes of this example) there are 2³ = 8 possible models

Logic in general
The KB is false in models that contradict what the agent
knows; for example, the KB is false in any model in which
[1,2] contains a pit, because there is no breeze in [1,1]. There
are in fact just three models in which the KB is true, and these
are shown as a subset of the models in the figure

KB = wumpus-world rules + observations


Logic in general
Now let us consider two possible conclusions
α1 = "There is no pit in [1,2]."
α2 = "There is no pit in [2,2]."
We have marked the models of α1 in the figure.
By inspection, we see the following: in every model in which
KB is true, α1 is also true.
Hence, KB ⊨ α1:
there is no pit in [1,2].

Logic in general
For conclusion
α2 = "There is no pit in [2,2]."

We have marked the models of α2 in the figure.

By inspection, we see the following: in some model in which KB
is true, α2 is false.
Hence, KB ⊭ α2:
the agent cannot conclude that there is no pit in [2,2]
The preceding example not only
illustrates entailment, but also shows
how the definition of entailment can be
applied to derive conclusions, that is,
to carry out logical inference.
Logic in general
Model checking: enumeration of all possible models to
ensure that a sentence is true in all models in which KB is
true
Inference is the process of deriving a specific sentence from a KB
(where the sentence must be entailed by the KB)
KB ⊢i α: sentence α can be derived from KB by procedure i
KBs are a haystack
Entailment = needle in haystack
Inference = finding it

Logic in general
An inference algorithm that derives only entailed sentences is called
sound or truth-preserving.
Soundness
i is sound if
whenever KB ⊢i α is true, KB ⊨ α is true

An inference algorithm is complete if it can derive any sentence that
is entailed.
Completeness
i is complete if
whenever KB ⊨ α is true, KB ⊢i α is true

If KB is true in the real world, then any sentence α derived from KB
by a sound inference procedure is also true in the real world
Propositional Logic
Propositional logic is the simplest logic.
Also Known As Boolean Logic
The syntax of propositional logic defines the allowable sentences.
Proposition symbols P1, P2, etc are sentences
Atomic sentence - consists of a single propositional symbol, which is True or
False
Complex sentence-sentence constructed from simpler sentences using
parentheses and logical connectives:
¬ (not): negation
∧ (and): conjunction
∨ (or): disjunction
⇒ (implies): implication (premise ⇒ conclusion)
⇔ (if and only if): biconditional
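The semantics of these five connectives can be checked mechanically by enumerating all truth assignments to P and Q. The helper names below (`implies`, `iff`) are illustrative choices for this sketch, not standard library functions:

```python
from itertools import product

# Evaluate the five connectives for every assignment to P and Q,
# reproducing the truth table for the connectives.
def implies(p, q):
    # P => Q is false only when P is true and Q is false
    return (not p) or q

def iff(p, q):
    # P <=> Q is true exactly when P and Q agree
    return p == q

rows = []
for p, q in product([False, True], repeat=2):
    rows.append((p, q, not p, p and q, p or q, implies(p, q), iff(p, q)))

print(" P  Q  ~P  P^Q  PvQ  P=>Q  P<=>Q")
for row in rows:
    print("  ".join(str(v)[0] for v in row))
```

Each printed row is one model of P and Q, matching the table on the slide row for row.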

Propositional Logic
Truth table for the connectives:

P     Q     | ¬P    | P∧Q   | P∨Q   | P⇒Q   | P⇔Q
False False | True  | False | False | True  | True
False True  | True  | False | True  | True  | False
True  False | False | False | True  | False | False
True  True  | False | True  | True  | True  | True
Propositional Logic
Formal grammar for propositional logic can be given as below

A BNF (Backus-Naur Form) grammar of sentences in propositional logic

Propositional Logic
A simple KB : Wumpus World
For simplicity: we deal only with pits.
Choose a vocabulary of proposition symbols. For each i, j:
Let Pi,j be true if there is a pit in [i,j]
Let Bi,j be true if there is a breeze in [i,j]
The KB contains the following (rules):
There is no pit in [1,1]:
R1: ¬P1,1
A square is breezy if and only if there is a pit in a neighboring square (for
simplicity, only the relevant squares):
R2: B1,1 ⇔ (P1,2 ∨ P2,1)
R3: B2,1 ⇔ (P1,1 ∨ P2,2 ∨ P3,1)
The breeze percepts for the first two squares visited in the specific world the agent
is in:
R4: ¬B1,1
R5: B2,1
Propositional Logic

Figure: A truth table constructed for the knowledge base as discussed. KB is true if
R1 through R5 are true, which occurs in just 3 of the 128 rows. In all 3 rows, P1,2 is
false, so there is no pit in [1,2]. On the other hand, there might (or might not) be a pit
in [2,2].
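The truth-table check the caption describes can be sketched by brute-force model enumeration over the seven symbols of R1-R5. Encoding the rules as a single Python function is an assumption of this sketch:

```python
from itertools import product

# Model checking for the pits-only wumpus KB (rules R1..R5):
# enumerate all 2^7 = 128 assignments and test in which models
# the KB holds and what that forces about P1,2 and P2,2.
symbols = ["P11", "P12", "P21", "P22", "P31", "B11", "B21"]

def kb_true(m):
    return (not m["P11"]                                        # R1
            and (m["B11"] == (m["P12"] or m["P21"]))            # R2
            and (m["B21"] == (m["P11"] or m["P22"] or m["P31"]))# R3
            and not m["B11"]                                    # R4
            and m["B21"])                                       # R5

models = [dict(zip(symbols, vals))
          for vals in product([False, True], repeat=len(symbols))]
kb_models = [m for m in models if kb_true(m)]

print(len(kb_models))                          # 3 of the 128 rows
print(all(not m["P12"] for m in kb_models))    # True: KB |= no pit in [1,2]
print(all(not m["P22"] for m in kb_models))    # False: [2,2] stays unknown
```

This is exactly the M(KB) ⊆ M(α) test from the entailment definition, done by exhaustive enumeration.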
KB example
Propositional Logic: Equivalence

Validity, Satisfiability, Unsatisfiability

A sentence is valid if it is true in all models

Valid sentences are also known as tautologies
e.g. True, A ∨ ¬A, A ⇒ A, (A ∧ (A ⇒ B)) ⇒ B
Validity is connected to inference via the Deduction Theorem:
KB ⊨ α if and only if (KB ⇒ α) is valid
A sentence is satisfiable if it is true in some model
e.g. A ∨ B, C
A sentence is unsatisfiable if it is true in no models
e.g. A ∧ ¬A
Satisfiability is connected to inference via the following:
KB ⊨ α iff (KB ∧ ¬α) is unsatisfiable
proof by contradiction
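All three notions can be decided by the same model enumeration; the `classify` helper below is a hypothetical name for this sketch, and sentences are assumed to be given as Python functions of their propositional symbols:

```python
from itertools import product

# Decide validity / satisfiability / unsatisfiability by enumerating
# every model of the sentence's n propositional symbols.
def classify(sentence, n_vars):
    vals = [sentence(*m) for m in product([False, True], repeat=n_vars)]
    if all(vals):
        return "valid"            # true in all models (tautology)
    if any(vals):
        return "satisfiable"      # true in at least one model
    return "unsatisfiable"        # true in no model

print(classify(lambda a: a or not a, 1))     # A v ~A  -> valid
print(classify(lambda a, b: a or b, 2))      # A v B   -> satisfiable
print(classify(lambda a: a and not a, 1))    # A ^ ~A  -> unsatisfiable
```

The deduction theorem then becomes a one-liner: KB ⊨ α exactly when `classify` of KB ∧ ¬α returns "unsatisfiable".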
Exercise

Q. Consider a vocabulary with only four propositions, A, B, C,
and D. How many models are there for the following sentences?
a) (A ∧ B) ∨ (B ∧ D)
b) A ∨ B
c) A B C

Reasoning Patterns
Inference Rules
Patterns of inference that can be applied to derive chains of
conclusions that lead to the desired goal.
Proof rules (or inference rules) show us, given true statements how to
generate further true statements.

Axioms describe universal truths of the logic.
Example: p ∨ ¬p is an axiom of propositional logic
We use the symbol ⊢ to denote "is provable" or "is true".
We write A1, ..., An ⊢ B to show that B is provable from A1, ..., An (given
some set of inference rules).
Stating that B follows (or is provable) from A1, ..., An can be written

Inference Rules
Modus Ponens
This well-known proof rule is called modus ponens; in general:

Whenever sentences of the form α ⇒ β and α are given, then sentence β
can be inferred
From (WumpusAhead ∧ WumpusAlive) ⇒ Shoot and (WumpusAhead ∧
WumpusAlive), Shoot can be inferred

Inference Rules
AND ()-elimination
From a conjunction, any of the conjuncts can be inferred

The first of these can be read: if A and B hold (or are provable, or true) then
A must also hold.
From (WumpusAhead ∧ WumpusAlive), WumpusAlive can be inferred

Inference Rules
OR (∨)-introduction
Another proof rule, known as ∨-introduction, is

The first of these can be read: if A holds (or is provable, or true) then A ∨ B
must also hold.
All of the logical equivalences can be used as inference rules
Note:
A sequence of applications of inference rules is called a proof.
Finding proofs is exactly like finding solutions to search problems.
Monotonicity
says that the set of entailed sentences can only increase as information is
added to the knowledge base.
If we have a proof, adding information to the KB will not invalidate the
proof
Inference Rules: Example
From r ⇒ s and s ⇒ p can we prove r ⇒ p, i.e. show r ⇒ s, s ⇒ p
⊢ r ⇒ p?

Resolution
We have argued that the inference rules covered so far are sound, but we have not
discussed the question of completeness for the inference algorithms that use them.
Resolution is a proof method for classical propositional and first-order logic.
Resolution allows a complete inference mechanism (search-based) using only one
rule of inference, i.e. resolution itself.
The (propositional) resolution rule is as follows:

A ∨ B is called the resolvent.
A ∨ p and B ∨ ¬p are called the parents of the resolvent.
p and ¬p are called complementary literals.
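The rule is easy to sketch in code when clauses are represented as sets of string literals. The `~` prefix for negation and the function names are conventions of this sketch only:

```python
# The propositional resolution rule over clauses represented as
# frozensets of literals; "~X" denotes the negation of "X".
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return every resolvent of the two parent clauses."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            # drop the complementary pair, union the rest
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

# (A v p) and (B v ~p) resolve on p to give (A v B)
print(resolve(frozenset({"A", "p"}), frozenset({"B", "~p"})))
```

Using sets makes duplicate literals collapse automatically, which is exactly the "flattening" the resolvent needs.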

Resolution
Unit Resolution
The unit resolution rule takes a clause (a disjunction of literals) and a literal, and
produces a new clause. A single literal is also called a unit clause.

where li and m are complementary literals

Generalized resolution rule
The generalized resolution rule takes two clauses of any length and produces a
new clause, as below.

where li and mj are complementary literals

Example:
Resolution
The resolution method involves:
translation to a normal form (CNF);
To prove a fact P, repeatedly apply resolution until either:
No new clauses can be added (KB does not entail P)
The empty clause is derived (KB does entail P)
This is proof by contradiction: if we prove that KB ∧ ¬P derives a
contradiction (the empty clause) and we know KB is true, then ¬P must be false,
so P must be true!
To apply resolution mechanically, facts need to be in Conjunctive
Normal Form (CNF)
A sentence is in Conjunctive Normal Form (CNF) if it is a conjunction of clauses,
each clause being a disjunction of literals
Example:

Resolution Method
Example
Show by resolution that the following set of clauses is unsatisfiable.
The set of clauses is already in CNF.

Resolution Method: example
First convert B1,1 ⇔ (P1,2 ∨ P2,1) to CNF:
1. Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α):
(B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1)

2. Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬(P1,2 ∨ P2,1) ∨ B1,1)

3. Move ¬ inwards using de Morgan's rules and eliminate
double negation:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ ((¬P1,2 ∧ ¬P2,1) ∨ B1,1)

4. Apply the distributivity law (∨ over ∧) and flatten:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬P1,2 ∨ B1,1) ∧ (¬P2,1 ∨ B1,1)

Resolution Method: Example
Conclusion: there is no pit in [1,2],
i.e. α = ¬P1,2
Proof by contradiction, i.e., show
KB ∧ ¬α unsatisfiable
We have
KB = (B1,1 ⇔ (P1,2 ∨ P2,1)) ∧ ¬B1,1
α = ¬P1,2
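The whole refutation can be sketched as a saturation loop over the CNF clauses derived above; the clause representation (frozensets of string literals, `~` for negation) is an assumption of this sketch:

```python
from itertools import combinations

# Resolution refutation: to prove alpha = ~P12 from the CNF of
# B11 <=> (P12 v P21) plus ~B11, add the negation P12 and search
# for the empty clause.
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

def pl_resolution(clauses):
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            res = resolvents(c1, c2)
            if frozenset() in res:
                return True          # empty clause derived: entailed
            new |= res
        if new <= clauses:
            return False             # saturated without contradiction
        clauses |= new

kb = [frozenset({"~B11", "P12", "P21"}),   # CNF of the biconditional
      frozenset({"~P12", "B11"}),
      frozenset({"~P21", "B11"}),
      frozenset({"~B11"})]                 # the percept ~B1,1
print(pl_resolution(kb + [frozenset({"P12"})]))   # True: KB |= ~P1,2
```

Resolving {¬P1,2, B1,1} with the added {P1,2} yields {B1,1}, which then clashes with {¬B1,1} to give the empty clause, exactly the contradiction the slide describes.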

Resolution Method: Exercise
Use the resolution algorithm to solve the following problem:
show that KB entails A

Evaluation : Resolution

Resolution is sound
Because the resolution rule is truth-preserving in all cases
Resolution is complete
Provided a complete search method is used to find the
proof, if a proof exists it will be found
Note: you must know what you're trying to prove in
order to prove it!
Resolution is exponential
The number of clauses that we must search grows
exponentially

Horn clause
Real-world knowledge bases often contain only clauses of a
restricted kind called Horn clauses.
A Horn clause is a disjunction of literals of which at most one is
positive.
The positive literal is called the head and the negative literals form the
body of the clause
For example, the clause (¬L1,1 ∨ ¬Breeze ∨ B1,1) is a Horn clause, whereas
(¬B1,1 ∨ P1,2 ∨ P2,1) is not.
Importance:
Horn clauses form the basis of forward
and backward chaining
Deciding entailment with Horn clauses
is linear in the size of the knowledge
base.
A Horn clause can be written as an implication
(example) Fig: Example of Horn clauses
Note: The Prolog language is based on Horn
clauses
AND-OR graphs
In AND-OR graphs,
multiple links joined by an arc indicate a conjunction: every
link must be proved
while multiple links without an arc indicate a disjunction:
any link can be proved.

Reasoning with Horn Clauses
Forward Chaining
For each new piece of data, generate all new facts, until
the desired fact is generated
Data-directed reasoning
Backward Chaining
To prove the goal, find a clause that contains the goal as
its head, and prove the body recursively
(Backtrack when you choose the wrong clause)
Goal-directed reasoning

Forward Chaining
Fire any rule whose premises are satisfied in the KB
Add its conclusion to the KB until the query is found

Prove that Q can be inferred from above KB
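The KB figure is not reproduced here, so the sketch below uses a standard textbook Horn KB as an assumed stand-in: P ⇒ Q, (L ∧ M) ⇒ P, (B ∧ L) ⇒ M, (A ∧ P) ⇒ L, (A ∧ B) ⇒ L, with facts A and B:

```python
# Forward chaining over Horn clauses (data-directed reasoning):
# rules are (premises, conclusion) pairs; keep firing rules whose
# premises are all known until the query appears or nothing changes.
def fc_entails(rules, facts, query):
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)        # fire the rule
                changed = True
                if conclusion == query:
                    return True
    return query in known

# Hypothetical Horn KB (a common textbook example, assumed here).
rules = [({"P"}, "Q"), ({"L", "M"}, "P"), ({"B", "L"}, "M"),
         ({"A", "P"}, "L"), ({"A", "B"}, "L")]
print(fc_entails(rules, {"A", "B"}, "Q"))   # True
```

The derivation order matches the step-by-step figures of the following slides: A and B give L, then M, then P, then Q.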

Backward Chaining
Idea: work backwards from the query q:
To prove q by BC,
Check if q is known already, or
Prove by BC all premises of some rule concluding q

Avoid loops
Check if new subgoal is already on the goal stack

Avoid repeated work: check if the new subgoal
Has already been proved true, or
Has already failed
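The same assumed Horn KB used in the forward-chaining sketch can be queried goal-directedly; the loop check via a goal stack follows the bullets above:

```python
# Backward chaining (goal-directed reasoning): prove the goal by
# recursively proving the premises of some rule that concludes it,
# avoiding loops via the stack of goals currently being attempted.
def bc_entails(rules, facts, goal, stack=frozenset()):
    if goal in facts:
        return True
    if goal in stack:                 # loop: this subgoal is already on the stack
        return False
    for premises, conclusion in rules:
        if conclusion == goal and all(
                bc_entails(rules, facts, p, stack | {goal})
                for p in premises):
            return True               # some rule concluding the goal succeeded
    return False

# Hypothetical Horn KB (same assumed textbook example).
rules = [({"P"}, "Q"), ({"L", "M"}, "P"), ({"B", "L"}, "M"),
         ({"A", "P"}, "L"), ({"A", "B"}, "L")]
print(bc_entails(rules, {"A", "B"}, "Q"))   # True
```

Note how the stack makes the rule (A ∧ P) ⇒ L fail harmlessly while proving P, forcing the backtrack to (A ∧ B) ⇒ L, which is exactly the backtracking behavior the slide describes.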

Backward Chaining

Prove that Q can be inferred from above KB

Translation Guide

Pros and Cons of PL
Propositional logic is declarative
Propositional logic allows partial/disjunctive/negated
information
(unlike most data structures and databases)
Propositional logic is compositional:
meaning of B1,1 ∧ P1,2 is derived from meaning of B1,1 and of P1,2
Meaning in propositional logic is context-independent
(unlike natural language, where meaning depends on context)
Propositional logic has very limited expressive power
(unlike natural language)
E.g., cannot say "pits cause breezes in adjacent squares"
except by writing one sentence for each square

Logics in General
Ontological Commitment:
What exists in the world: TRUTH
PL: facts hold or do not hold.
FOL: objects with relations between them that hold or do not hold

Epistemological Commitment:
What an agent believes about facts: BELIEF

First-order logic
Whereas propositional logic assumes the world contains facts,
first-order logic (like natural language) assumes the world
contains
Objects, which are things with individual identities
Properties of objects that distinguish them from other objects
Relations that hold among sets of objects
Functions, which are a subset of relations where there is only one value for
any given input
Examples:
Objects: Students, lectures, companies, cars ...
Relations: Brother-of, bigger-than, outside, part-of, has-color, occurs-after,
owns, visits, precedes, ...
Properties: blue, oval, even, large, ...
Functions: father-of, best-friend, second-half, one-more-than ...

Models for FOL: Graphical Example

Syntax of FOL: Basic elements
Constant Symbols: which represent individuals in the world
Stand for objects
e.g., KingJohn, 2, UCI,...
Predicate Symbols : which map individuals to truth values
Stand for relations
E.g., Brother(Richard, John), greater_than(3,2)...
Function Symbols : which map individuals to individuals
Stand for functions
E.g., Sqrt(4), LeftLegOf(John),...

Variables x, y, a, b, ...
Connectives ¬, ∧, ∨, ⇒, ⇔
Equality =
Quantifiers ∀, ∃

Syntax of FOL:BNF

FOPL: Sentences
A term (denoting a real-world individual) is a constant symbol, a variable
symbol, or an n-place function of n terms.
x and f(x1, ..., xn) are terms, where each xi is a term.
A term with no variables is a ground term

An atomic sentence (which has value true or false) is an n-place predicate of
n terms

A complex sentence is formed from atomic sentences connected by the
logical connectives:
¬P, P∧Q, P∨Q, P⇒Q, P⇔Q, where P and Q are sentences

A quantified sentence adds the quantifiers ∀ and ∃

A well-formed formula (wff) is a sentence containing no free variables.
That is, all variables are bound by universal or existential quantifiers.
(∀x)P(x,y) has x bound as a universally quantified variable, but y is free.
Atomic sentence
Atomic sentences state facts using terms and predicate symbols
P(x,y) interpreted as x is P of y

Examples:
LargerThan(2,3) is false.
Brother_of(Mary,Pete) is false.
Married(Father(Richard), Mother(John)) could be true or false

Note: Functions do not state facts and form no sentence:
Brother(Pete) refers to John (Pete's brother) and is neither true nor
false.

Brother_of(Pete, Brother(Pete)) is true.

Complex Sentence

We make complex sentences with connectives (just like in propositional
logic).

Brother(LeftLeg(Richard), John) ∧ ¬(Democrat(Bush))

(Here Brother is a binary relation, LeftLeg is a function, Democrat is a
property, Richard, John, and Bush are objects, and ∧, ¬ are connectives.)

Universal Quantification
Universal quantification
∀<variables> <sentence>
Allows us to make statements about all objects that have certain properties
(∀x)P(x) means that P holds for all values of x in the domain associated
with that variable
E.g.,
∀x dolphin(x) ⇒ mammal(x)
∀x King(x) ⇒ Person(x)
∀x Person(x) ⇒ HasHead(x)
∀i Integer(i) ⇒ Integer(plus(i,1))

a) Universal quantifiers are often used with "implies" to form rules:
(∀x) student(x) ⇒ smart(x) means "All students are smart"
b) Universal quantification is rarely used to make blanket statements about every
individual in the world:
(∀x) student(x) ∧ smart(x) means "Everyone in the world is a student and is smart"
∀x King(x) ∧ Person(x) is not correct!
Existential Quantification
Existential quantification
∃<variables> <sentence>
(∃x)P(x) means that P holds for some value of x in the domain associated
with that variable
Permits one to make a statement about some object without naming it
E.g.,
(∃x) mammal(x) ∧ lays-eggs(x)
∃x King(x)
∃x Lives_in(John, Castle(x))
∃i Integer(i) ∧ GreaterThan(i,0)
a) Existential quantifiers are usually used with "and" to specify a list of properties
about an individual:
(∃x) student(x) ∧ smart(x) means "There is a student who is smart"
b) A common mistake is to represent this English sentence as the FOL sentence:
(∃x) student(x) ⇒ smart(x)
But what happens when there is a person who is not a student?
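The pitfall can be seen by evaluating both readings over a tiny, made-up domain (quantifiers over a finite domain reduce to loops); the domain and predicate names below are illustrative assumptions:

```python
# Why "exists x: student(x) => smart(x)" is the wrong reading:
# evaluate both formulas over a hypothetical domain with no students.
domain = ["pat", "lee"]
student = set()                 # nobody is a student
smart = {"lee"}

# Intended reading: "there is a smart student"
exists_and = any(x in student and x in smart for x in domain)
# Mistaken reading: "exists x such that IF student(x) THEN smart(x)"
exists_implies = any((x not in student) or (x in smart) for x in domain)

print(exists_and)       # False: correct, there is no smart student
print(exists_implies)   # True: vacuously satisfied by any non-student
```

Because an implication with a false premise is true, any non-student makes the mistaken formula true, so it asserts almost nothing.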
FOPL: Example
Let's consider the following objects:
Richard the Lionheart, King of England from 1189 to 1199; his younger brother,
the evil King John, who ruled from 1199 to 1215; the left legs of Richard and John;
and a crown.
The domain of the model is the set of all objects (objects are also called domain
elements).
Symbols: symbols are the syntactic elements of FOPL.
Constant symbols: stand for objects. E.g. Richard and John.
Predicate symbols: stand for relations. E.g. Brother, OnHead, Person, King, Crown.
Function symbols: stand for functions. E.g. LeftLeg.
Atomic sentences and complex sentences (provide examples)
Quantified sentences (provide examples)

Semantics: relate sentences to models to determine truth.
Interpretation: specifies exactly which object, relation, and function are referred to
by the respective symbols.
Richard refers to Richard the Lionheart and John refers to the evil King John.
Brother refers to the brotherhood relation.
LeftLeg refers to the left-leg function.
Quantifier Scope
More complex sentences can be expressed with nested quantifiers.
Like nested variable scopes in a programming language
Like nested ANDs and ORs in a logical sentence

The order of like quantifiers does not matter.
Switching the order of universal quantifiers does not change the meaning:
(∀x)(∀y)P(x,y) ⇔ (∀y)(∀x)P(x,y)
Similarly, you can switch the order of existential quantifiers:
(∃x)(∃y)P(x,y) ⇔ (∃y)(∃x)P(x,y)

The order of unlike quantifiers is important.
Switching the order of universals and existentials does change the meaning:
Everyone loves someone: (∀x)(∃y) loves(x,y)
For everyone (all x) there is someone (exists y) whom they love.
There might be a different y for each x (y is inside the scope of x)
Someone is liked by everyone: (∃y)(∀x) loves(x,y)
There is someone (exists y) whom everyone loves (all x).
Every x loves the same y (x is inside the scope of y)
This is clearer with parentheses: ∃y (∀x Loves(x,y))
Connection of Quantifier
The two quantifiers are actually intimately connected with each other, through
negation.
Asserting that all x have property P is the same as asserting that
there does not exist any x that doesn't have the property P.
E.g. when one says that everyone dislikes carrots, one is also saying that there does
not exist someone who likes them, and vice versa:
∀x ¬Likes(x, Carrot) is equivalent to ¬∃x Likes(x, Carrot)
Note:
∀ is a conjunction over the universe of objects
∃ is a disjunction over the universe of objects
Thus, De Morgan's rules can be applied

Generalized De Morgan's rules:      De Morgan's rules:
∀x P ≡ ¬∃x (¬P)                     P ∧ Q ≡ ¬(¬P ∨ ¬Q)
∃x P ≡ ¬∀x (¬P)                     P ∨ Q ≡ ¬(¬P ∧ ¬Q)
¬∀x P ≡ ∃x (¬P)                     ¬(P ∧ Q) ≡ ¬P ∨ ¬Q
¬∃x P ≡ ∀x (¬P)                     ¬(P ∨ Q) ≡ ¬P ∧ ¬Q
Translating English to FOL
Every gardener likes the sun.
∀x gardener(x) ⇒ likes(x, Sun)
You can fool some of the people all of the time.
∃x ∀t (person(x) ∧ time(t)) ⇒ can-fool(x,t)
You can fool all of the people some of the time.
∀x ∃t (person(x) ∧ time(t) ∧ can-fool(x,t))
Equivalently:
∀x (person(x) ⇒ ∃t (time(t) ∧ can-fool(x,t)))
All purple mushrooms are poisonous.
∀x (mushroom(x) ∧ purple(x)) ⇒ poisonous(x)
No purple mushroom is poisonous.
¬∃x purple(x) ∧ mushroom(x) ∧ poisonous(x)
Equivalently:
∀x (mushroom(x) ∧ purple(x)) ⇒ ¬poisonous(x)
There are exactly two purple mushrooms.
∃x ∃y mushroom(x) ∧ purple(x) ∧ mushroom(y) ∧ purple(y) ∧ ¬(x=y) ∧ ∀z
(mushroom(z) ∧ purple(z)) ⇒ ((x=z) ∨ (y=z))
Clinton is not tall.
¬tall(Clinton)
X is above Y iff X is directly on top of Y or there is a pile of one or more other
objects directly on top of one another starting with X and ending with Y.
∀x ∀y above(x,y) ⇔ (on(x,y) ∨ ∃z (on(x,z) ∧ above(z,y)))
Inference in FOL
The inference rules for propositional logic (Modus Ponens, And-Elimination,
And-Introduction, Or-Introduction, and Resolution) also hold for first-order logic.
Some additional inference rules are required to handle first-order logic sentences
with quantifiers.
These are:
Universal Elimination / Universal instantiation (UI)
Existential Elimination /Existential instantiation(EI)
Existential Introduction/ Existential generalization
These rules are more complex, as the variables have to be substituted by particular
individuals.
SUBST(θ, α) denotes the result of applying the substitution (or binding list) θ to
the sentence α
SUBST({v/g}, α) means the result of substituting g for v in sentence α
θ = {v/g}
(θ is also called a unifier later, which is the result of unification).
E.g. SUBST({x/Sam, y/Pam}, Likes(x,y)) = Likes(Sam, Pam)
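A minimal sketch of SUBST, assuming sentences are represented as nested tuples (a representation chosen for this example, not prescribed by the text):

```python
# SUBST over sentences represented as nested tuples, e.g.
# ("Likes", "x", "y"); variables are plain lowercase strings.
def subst(theta, sentence):
    """Apply binding list theta (a dict {var: term}) to a sentence."""
    if isinstance(sentence, tuple):
        # recurse into compound terms and atomic sentences
        return tuple(subst(theta, part) for part in sentence)
    return theta.get(sentence, sentence)   # replace variable if bound

print(subst({"x": "Sam", "y": "Pam"}, ("Likes", "x", "y")))
# ('Likes', 'Sam', 'Pam')
```

The same function handles nested function terms, e.g. substituting inside `("OnHead", ("Crown", "c"), "John")`, because it recurses through every tuple.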
Universal instantiation
Universal Elimination / Universal instantiation (UI)
If (∀x) P(x) is true, then P(C) is true, where C is any constant in the domain of x
for any sentence α, variable v, and ground term g

Every instantiation of a universally quantified sentence is entailed by it
The variable symbol can be replaced by any ground term, i.e., any constant symbol or
function symbol applied to ground terms only.
For example, from ∀x Likes(x, IceCream), we can use the substitution {x/Ben} and infer
Likes(Ben, IceCream).
More e.g., ∀x King(x) ∧ Greedy(x) ⇒ Evil(x) yields:
King(John) ∧ Greedy(John) ⇒ Evil(John), {x/John}
King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard), {x/Richard}
King(Father(John)) ∧ Greedy(Father(John)) ⇒ Evil(Father(John)), {x/Father(John)}
Note:
Universal instantiation can be applied several times to add new sentences: the new KB is
logically equivalent to the old.
Existential instantiation
Existential Elimination / Existential instantiation (UI)
From (x) P(x) infer P(c)
For any sentence , variable v, and constant symbol k (that does not appear elsewhere in the
knowledge base):

E.g., x Crown(x) OnHead(x,John) yields: Crown(C1) OnHead(C1,John)


where C1 is a new constant symbol, called a Skolem constant
Note that the variable is replaced by a brand new constant that does not occur in
this or any other sentence in the Knowledge Base. In other words, we don't want to
accidentally draw other inferences about it by introducing the constant. All we
know is there must be some constant that makes this true, so we can introduce a
brand new one to stand in for that (unknown) constant.
Note:
Existential instantiation can be applied once to replace the existential sentence: the new KB
is not equivalent to the old, but is satisfiable iff the old KB was satisfiable.

Existential generalization
Existential Introduction/ Existential generalization
If P(c) is true, then (∃x) P(x) is inferred.
For any sentence α, variable v that does not occur in α, and ground term g that
does occur in α:
α
∃v SUBST({g/v}, α)
Example:
eats(Ziggy, IceCream) ⊢ (∃x) eats(Ziggy, x)
All instances of the given constant symbol are replaced by the new variable
symbol.
Note that the variable symbol cannot already exist anywhere in the expression

Reduction to propositional form
FOL → Propositional Logic (PL) (some call this reduction to propositional logic)
Existential and universal instantiation allow us to propositionalize any FOL
sentence or KB:
EI produces one instantiation per Existential Quantified (EQ) sentence
UI produces a whole set of instantiated sentences per Universal Quantified (UQ)
sentence
Example:
Suppose the KB contains the following:
∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
Father(x)
King(John)
Greedy(John)
Brother(Richard,John)
Instantiating the universal sentence in all possible ways, we have:
King(John) ∧ Greedy(John) ⇒ Evil(John)
King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
King(John)
Greedy(John)
Brother(Richard,John)
The new KB is propositionalized: the propositional symbols are
King(John), Greedy(John), Evil(John), King(Richard), etc.
Reduction to propositional form
Problems with propositionalization: it works if α is entailed, but loops if α is not entailed;
with function symbols, there are infinitely many ground terms.
Example:
∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
Father(x) (we assume this is a function)
King(John)
Greedy(Richard)
Brother(Richard,John)
Ground terms at the first depth: Father(John), Father(Richard)
Instantiating with these terms yields:
King(John) ∧ Greedy(John) ⇒ Evil(John)
King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
King(Father(John)) ∧ Greedy(Father(John)) ⇒ Evil(Father(John))
King(Father(Richard)) ∧ Greedy(Father(Richard)) ⇒ Evil(Father(Richard))
This continues ...
Propositionalization generates lots of irrelevant sentences, so inference may be very
inefficient.
If we take previous example
It seems obvious that Evil(John) is entailed, but propositionalization produces lots of facts such
as Greedy(Richard) that are irrelevant.
With p k-ary predicates and n constants, there are p·n^k instantiations.
Unification and Lifting
Solution to Propositionalization could be doing inference directly with FOL
sentences
A key component of all first-order inference algorithms is unification.
Unification is a "pattern matching" procedure that takes two atomic sentences,
called literals, as input, and returns "failure" if they do not match and a
substitution list, Theta, if they do match.
Unify algorithm: takes two sentences p and q and returns a unifier if one exists:
Unify(p,q) = θ where SUBST(θ, p) = SUBST(θ, q)
Example:
p = Knows(John,x)
q = Knows(John, Jane)
Unify(p,q) = θ = {x/Jane}
Most of the propositional inference rules are lifted to FOL inference rules with the
help of unification (lifted = transformed from their propositional versions).
Generalized Modus Ponens = lifted Modus Ponens
Backward chaining, forward chaining, and resolution algorithms also have lifted
forms, which we will see later.
Unification examples
simple example: query = Knows(John,x), i.e., who does John know?
Given the KB, unify the query p against each sentence q:
p                         q                              θ
Knows(John,x)             Knows(John,Jane)               {x/Jane}
Knows(John,x)             Knows(y,OJ)                    {x/OJ, y/John}
Knows(John,x)             Knows(y,Mother(y))             {y/John, x/Mother(John)}
Knows(John,x)             Knows(x,Elizabeth)             fail
The last unification fails only because x cannot take the values John and Elizabeth at
the same time.
The problem is due to the use of the same variable x in both sentences: x cannot equal
both John and Elizabeth.
The solution: change the variable x to y (or any other variable) in Knows(x, Elizabeth).
Knows(x, Elizabeth) is changed to Knows(y, Elizabeth), which still means the
same.
This is called standardizing apart.
Standardizing apart eliminates overlap of variables.
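The unification procedure just described can be sketched in Python (a simplified version of the textbook algorithm: variables are lowercase strings, constants are capitalized strings, compound terms are tuples, and the occurs-check is omitted for brevity):

```python
# A simplified unification algorithm returning a substitution dict or None.
# Example term: Knows(John, x) is written as ("Knows", "John", "x").

def is_variable(t):
    return isinstance(t, str) and t[:1].islower()

def unify(x, y, theta=None):
    """Return a unifier of x and y extending theta, or None on failure."""
    if theta is None:
        theta = {}
    if x == y:
        return theta
    if is_variable(x):
        return unify_var(x, y, theta)
    if is_variable(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            theta = unify(xi, yi, theta)
            if theta is None:
                return None
        return theta
    return None  # mismatched constants or structures

def unify_var(var, t, theta):
    if var in theta:
        return unify(theta[var], t, theta)
    if is_variable(t) and t in theta:
        return unify(var, theta[t], theta)
    return {**theta, var: t}

print(unify(("Knows", "John", "x"), ("Knows", "John", "Jane")))    # {'x': 'Jane'}
print(unify(("Knows", "John", "x"), ("Knows", "y", "OJ")))         # {'y': 'John', 'x': 'OJ'}
print(unify(("Knows", "John", "x"), ("Knows", "x", "Elizabeth")))  # None (fails)
```

The last call fails for exactly the reason above: both sentences use the same variable x. After standardizing apart (renaming x to y in the second sentence), the unification succeeds.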
Unification Complication
A problem may arise in unification when the variable take two values as seen in the
example before
Standardizing apart eliminates overlap of variables.
Process: rename all variables so that variables bound by different quantifiers have
unique names for each sentences in KB.
For example, if the KB has the following:      After standardizing apart, the KB becomes:
∀x Apple(x) => Fruit(x)                        ∀x Apple(x) => Fruit(x)
∀x Spider(x) => Arachnid(x)                    ∀y Spider(y) => Arachnid(y)
There is one more complication with Unification: we said that UNIFY should return
a substitution that makes the two arguments look the same. But there could be more
than one such unifier.
Example: To unify Knows(John,x) and Knows(y,z),
unification could return two possible substitutions:
{y/John, x/z}, which means Knows(John, z), OR {y/John, x/John, z/John}, which means
Knows(John, John).
So the unification algorithm is required to return the (unique) most general unifier (MGU);
for the above example, MGU = {y/John, x/z}.
There is a single most general unifier (MGU) that is unique up to renaming of variables
Generalized Modus Ponens (GMP)
This is a general inference rule for FOL(lifted) that does not require
instantiation
For atomic sentences pi, pi', and q, where there is a substitution θ such that
SUBST(θ, pi') = SUBST(θ, pi) for all i:
p1', p2', ..., pn', (p1 ∧ p2 ∧ ... ∧ pn ⇒ q)
SUBST(θ, q)
Example (for the KB described before):
King(John), Greedy(John), ∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
Evil(John)
p1' is King(John),  p1 is King(x)
p2' is Greedy(John),  p2 is Greedy(x)
θ is {x/John},  q is Evil(x)
SUBST(θ, q) is Evil(John)
GMP is sound: it only derives sentences that are logically entailed (proof in the book).
GMP is complete (derives all sentences that are entailed) for a first-order KB in
Horn clause format.
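Generalized Modus Ponens can be sketched directly (an illustrative sketch, not an efficient implementation: facts are ground atoms, a rule is a list of premise patterns plus a conclusion, and lowercase argument names are variables):

```python
# Sketch of Generalized Modus Ponens.  An atom is (predicate, args-tuple);
# lowercase argument names are variables, capitalized names are constants.

def match(pattern, fact, theta):
    """Try to extend theta so that pattern equals fact; None on failure."""
    (pred_p, args_p), (pred_f, args_f) = pattern, fact
    if pred_p != pred_f or len(args_p) != len(args_f):
        return None
    theta = dict(theta)
    for a, b in zip(args_p, args_f):
        if a[:1].islower():              # a is a variable
            if theta.get(a, b) != b:     # already bound to something else
                return None
            theta[a] = b
        elif a != b:                     # constant mismatch
            return None
    return theta

def gmp(premises, conclusion, facts):
    """Yield every instantiated conclusion derivable from the facts."""
    def backtrack(i, theta):
        if i == len(premises):
            pred, args = conclusion
            yield pred, tuple(theta.get(a, a) for a in args)
            return
        for fact in facts:
            t = match(premises[i], fact, theta)
            if t is not None:
                yield from backtrack(i + 1, t)
    yield from backtrack(0, {})

facts = [("King", ("John",)), ("Greedy", ("John",)), ("King", ("Richard",))]
premises = [("King", ("x",)), ("Greedy", ("x",))]
conclusion = ("Evil", ("x",))
print(list(gmp(premises, conclusion, facts)))  # [('Evil', ('John',))]
```

Evil(Richard) is not derived, because no Greedy(Richard) fact matches the second premise under θ = {x/Richard}.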
Inference in FOL
Forward-chaining
Uses GMP to add new atomic sentences
Useful for systems that make inferences as information streams in
Requires KB to be in form of first-order definite clauses
Backward-chaining
Works backwards from a query to try to construct a proof
Can suffer from repeated states and incompleteness
Useful for query-driven inference
Resolution-based inference (FOL)
Refutation-complete for general KB
Can be used to confirm or refute a sentence p (but not to
generate all entailed sentences)
Requires FOL KB to be reduced to CNF
Uses generalized version of propositional inference rule Resolution
Note that all of these methods are generalizations of their propositional
equivalents ( the rules are Lifted with unification)
Inference in FOL Example
The law says that it is a crime for an American to sell weapons to hostile nations.
The country Nono, an enemy of America, has some missiles, and all of its missiles
were sold to it by Colonel West, who is American.
... it is a crime for an American to sell weapons to hostile nations:
American(x) ∧ Weapon(y) ∧ Sells(x,y,z) ∧ Hostile(z) ⇒ Criminal(x)
Nono has some missiles, i.e., ∃x Owns(Nono,x) ∧ Missile(x):
Owns(Nono,M1) and Missile(M1)
all of its missiles were sold to it by Colonel West:
Missile(x) ∧ Owns(Nono,x) ⇒ Sells(West,x,Nono)
Missiles are weapons:
Missile(x) ⇒ Weapon(x)
An enemy of America counts as "hostile":
Enemy(x,America) ⇒ Hostile(x)
West, who is American:
American(West)
The country Nono, an enemy of America:
Enemy(Nono,America)
Forward Chaining
(forward-chaining proof-tree figures omitted)

Backward Chaining
(backward-chaining proof-tree figures omitted)
Resolution in FOL
Full first-order version:
l1 ∨ ··· ∨ lk,   m1 ∨ ··· ∨ mn
SUBST(θ, l1 ∨ ··· ∨ li−1 ∨ li+1 ∨ ··· ∨ lk ∨ m1 ∨ ··· ∨ mj−1 ∨ mj+1 ∨ ··· ∨ mn)
where Unify(li, ¬mj) = θ;
(li, mj) are complementary literals.
The two clauses are assumed to be standardized apart so that they share no
variables.
For example,
¬Rich(x) ∨ Unhappy(x),   Rich(Ken)
Unhappy(Ken)
with θ = {x/Ken}
Apply resolution steps to CNF(KB ∧ ¬α); it is complete for FOL.
Resolution refutation
The general technique is to add the negation of the sentence to be proven to
the KB and see if this leads to a contradiction.
Idea: if the KB becomes inconsistent with the addition of the negated
sentence, then the original sentence must be true.
This is called resolution refutation. Also called RRS
The procedure is complete for FOL.
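The refutation loop is easiest to see at the propositional level; the first-order version adds unification at the resolve step. A minimal sketch (clauses as frozensets of string literals, with `~` marking negation):

```python
from itertools import combinations

# Propositional resolution refutation: to show KB |= q, add ~q and try to
# derive the empty clause.  Clauses are frozensets of literals ("P" or "~P").

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all resolvents of two clauses."""
    return [(c1 - {l}) | (c2 - {negate(l)}) for l in c1 if negate(l) in c2]

def refute(kb, query):
    clauses = set(kb) | {frozenset({negate(query)})}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:            # empty clause: contradiction, so entailed
                    return True
                new.add(r)
        if new <= clauses:           # nothing new: query is not entailed
            return False
        clauses |= new

kb = [frozenset({"~P", "Q"}), frozenset({"P"})]   # P => Q, and P
print(refute(kb, "Q"))   # True
print(refute(kb, "R"))   # False
```

With KB = {¬P ∨ Q, P} and negated query ¬Q, resolving yields Q and ¬P, and then Q with ¬Q gives the empty clause.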
Algorithm: (the resolution refutation algorithm figure is omitted here)
Converting FOL sentences to CNF
1. Eliminate biconditionals and implications.
2. Reduce the scope of ¬: move ¬ inwards.
3. Standardize variables apart: each quantifier should use a different variable
name.
4. Skolemize: a more general form of existential instantiation.
5. Each existential variable is replaced by a Skolem function of the enclosing
universally quantified variables.
6. Drop all universal quantifiers: it's all right to do so now.
7. Distribute ∨ over ∧.
8. Make each conjunct a separate clause.
9. Standardize the variables apart again if required.

More on Skolemization
If an existentially quantified variable is in the scope of universally quantified
variables, then the existentially quantified variable must be a function of those
other variables. We introduce a new, unique function called Skolem function.
Converting FOL sentences to CNF: Example
Original sentence:
Anyone who likes all animals is loved by someone:
∀x [∀y Animal(y) ⇒ Likes(x,y)] ⇒ [∃y Loves(y,x)]

1. Eliminate biconditionals and implications:
∀x ¬[∀y ¬Animal(y) ∨ Likes(x,y)] ∨ [∃y Loves(y,x)]

2. Move ¬ inwards:
Recall: ¬∀x p ≡ ∃x ¬p,  ¬∃x p ≡ ∀x ¬p
∀x [∃y ¬(¬Animal(y) ∨ Likes(x,y))] ∨ [∃y Loves(y,x)]
∀x [∃y ¬¬Animal(y) ∧ ¬Likes(x,y)] ∨ [∃y Loves(y,x)]
∀x [∃y Animal(y) ∧ ¬Likes(x,y)] ∨ [∃y Loves(y,x)]

Either there is some animal that x doesn't like, or (if that is not the case) someone
loves x.
Converting FOL sentences to CNF
3. Standardize variables: each quantifier should use a different one
∀x [∃y Animal(y) ∧ ¬Likes(x,y)] ∨ [∃z Loves(z,x)]

4. Skolemize: a more general form of existential instantiation.
A naive attempt with Skolem constants gives:
∀x [Animal(A) ∧ ¬Likes(x,A)] ∨ Loves(B,x)
i.e., everybody fails to like a particular animal A or is loved by a particular
person B, e.g.:
Animal(cat)
Likes(marry, cat)
Loves(john, marry)
Likes(cathy, cat)
Loves(Tom, cathy)
Instead, each existential variable is replaced by a Skolem function of the enclosing
universally quantified variables:
∀x [Animal(F(x)) ∧ ¬Likes(x,F(x))] ∨ Loves(G(x),x)
(Reason: animal y could be a different animal for each x.)
Converting FOL sentences to CNF

5. Drop universal quantifiers:
[Animal(F(x)) ∧ ¬Likes(x,F(x))] ∨ Loves(G(x),x)
(all remaining variables are assumed to be universally quantified)

6. Distribute ∨ over ∧:
[Animal(F(x)) ∨ Loves(G(x),x)] ∧ [¬Likes(x,F(x)) ∨ Loves(G(x),x)]

The original sentence is now in CNF form; we can apply the same ideas to all
sentences in the KB to convert them into CNF.
We also need to include the negated query. Then use resolution to attempt to
derive the empty clause, which shows that the query is entailed by the KB.
FOL Resolution
(resolution proof figure omitted)
FOL Resolution : Example 2

KB:
a) Everyone who loves all animals is loved by someone.
b) Anyone who kills animals is loved by no-one.
c) Jack loves all animals.
d) Either Curiosity or Jack killed the cat, who is named Tuna.
Query: Did Curiosity kill the cat?

Inference Procedure:
1. Express sentences in FOL.
2. Eliminate existential quantifiers.
3. Convert to CNF form and negated query.

Refer to the book for the detailed solution.
FOL Resolution : Example 2
(resolution proof figures omitted; A1, A2, ... are derived from sentence A)
Reasoning
We have already covered reasoning in symbolic logic
Reasoning is the act of deriving a conclusion from certain premises using a given
methodology.
Reasoning is a process of thinking; reasoning is logically arguing; reasoning is
drawing inference.
When a system is required to do something that it has not been explicitly told how
to do, it must reason. It must figure out what it needs to know from what it already
knows.
Definitions :
Reasoning is the act of deriving a conclusion from certain premises using a
given methodology (inference is methodology).
Any knowledge system must reason if it is required to do something it has
not been told explicitly.
For reasoning, the system must find out what it needs to know from what it
already knows.
Example :
If we know : Robins are birds. All birds have wings
Then if we ask: Do robins have wings?
To answer this question, some reasoning must be done.
Reasoning Vs inference ??
Uncertainty
The world is an uncertain place; often the knowledge is imperfect, which
causes uncertainty. Therefore reasoning must be able to operate under
uncertainty.
AI systems must have ability to reason under conditions of uncertainty.
agents almost never have access to the whole truth about their
environment. Agents must, therefore, act under uncertainty.
for example,
An agent wants to drive someone to the airport to catch a plane .
He makes a plan say A90 = leaving home 90 minutes before the flight departs
and driving at a reasonable speed.
Even though the airport is only about 15 miles away, the agent cannot be
certain that plan A90 will get him on time.

Approaches to Reasoning
There are three different approaches to reasoning under uncertainties.
Symbolic reasoning (facts)
Statistical reasoning (degree of belief)
Fuzzy logic reasoning (degree of truth)

Uncertainty
Symbolic versus statistical reasoning

The (symbolic) methods basically represent uncertain belief as being:
True,
False, or
Neither True nor False.
Some methods also had problems with
Incomplete Knowledge
Contradictions in the knowledge.

Statistical reasoning
In the logic based approaches (symbolic), we have assumed that everything is either
believed false or believed true.
However, it is often useful to represent the fact that we believe that something
is probably true, or true with probability (say) 0.55.
This is useful for dealing with problems where there is randomness and
unpredictability (such as in games of chance) and also for dealing with problems
where we could, if we had sufficient information, work out exactly what is true. To
do all this in a principled way requires techniques for probabilistic reasoning.

Handling Uncertainty
Let us take an example of medical Diagnosis
Let us take a first-order logic rule for dental diagnosis:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
The problem is that this rule is wrong. Not all patients with toothaches have
cavities; some of them have gum disease, an abscess, or one of several other
problems. To make the rule true we may have to add an unlimited list of possible
causes, as below:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ ...
We could turn this into a causal rule, which is also not true:
∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
Trying to use first-order logic to cope with a domain like medical diagnosis thus
fails for three main reasons:
Laziness: It is too much work to list the complete set of antecedents or consequents
needed to ensure an exceptionless rule and too hard to use such rules.
Theoretical ignorance: Medical science has no complete theory for the domain.
Practical ignorance: Even if we know all the rules, we might be uncertain about a
particular patient because not all the necessary tests have been or can be run.

Handling Uncertainty
In domains such as medicine and other judgmental domains (law, business, design,
automobile repair, gardening, dating, and so on), the agent's knowledge can at best
provide only a degree of belief (plausibility) in the relevant sentences. Probability
theory is a tool for dealing with degrees of belief; it assigns a numerical degree of
belief between 0 and 1 to each sentence.
Decision under uncertainty
Ex , Take an example of : An agent wants to drive someone to the airport to catch a plane
Suppose Agent believe the following:
P(A25 gets me there on time | …) = 0.04
P(A90 gets me there on time | …) = 0.70
P(A120 gets me there on time | …) = 0.95
P(A1440 gets me there on time | …) = 0.9999
(remember if you reach early at airport you have to wait at airport)
In this case Which action to choose?
Choosing above actions Depends on preferences for missing flight vs. time spent waiting
in airport, etc.
Utility theory is used to represent and infer preference
so
Decision theory = probability theory + utility theory
Probability Theory : syntax
We need a formal language for representing and reasoning with uncertain knowledge
Degrees of belief are always applied to propositions (as in propositional logic and FOPL).
The basic element of the language is the random variable, which can be thought of
as referring to a "part" of the world whose "status" is initially unknown. For example,
Cavity might refer to whether my lower left wisdom tooth has a cavity.
Random variables play a role similar to that of CSP variables in constraint satisfaction
problems and that of proposition symbols in propositional logic.
Each random variable has a domain of values that it can take on, e.g., Cavity might take
values (true, false). (Mostly, random variable names are capitalized and value names are
lowercase.)
Random variables are typically divided into three kinds, depending on the type of the
domain:
a) Boolean random variables, such as Cavity, have the domain (true, false). We will often abbreviate a
proposition such as Cavity = true simply by the lowercase name cavity. Similarly, Cavity = false
would be abbreviated by cavity.
b) Discrete random variables, which include Boolean random variables as a special case, take on values
from a countable domain. For example, the domain of Weather might be (sunny, rainy, cloudy,
snow). The values in the domain must be mutually exclusive and exhaustive. Where no confusion
arises, we: will use, for example, snow as an abbreviation for Weather = snow.
c) Continuous random variables take on values from the: real numbers. The domain can be either the
entire real line or some subset such as the interval [0,1].
Probability Theory : syntax
Elementary propositions, such as Cavity = true and Toothache = false, can be
combined to form complex propositions using all the standard logical
connectives.
For example, (Cavity = true ∧ Toothache = false) is a proposition to which one may
ascribe a degree of belief. It can also be written as (cavity ∧ ¬toothache).

Atomic event: a complete specification of the state of the world about which the
agent is uncertain.
E.g., if the world consists of only two Boolean random variables, Cavity and
Toothache, then there are 4 distinct atomic events:
Cavity = false ∧ Toothache = false    (¬cavity ∧ ¬toothache)
Cavity = false ∧ Toothache = true     (¬cavity ∧ toothache)
Cavity = true ∧ Toothache = false     (cavity ∧ ¬toothache)
Cavity = true ∧ Toothache = true      (cavity ∧ toothache)
Atomic events are mutually exclusive and exhaustive.

Probability Theory : syntax
Unconditional or prior probability
The unconditional or prior probability associated with a proposition a is the
degree of belief accorded to it in the absence of any other information; it is
written as P(a). For example, if the prior probability that I have a cavity is 0.1,
then we would write P(Cavity = true) = 0.1 or P(cavity) = 0.1
If Weather is a discrete random variable, its probability for each state might be as below:
a) P( Weather = sunny) = 0.7
b) P( Weather = rain) = 0.2
c) P( Weather = cloudy) = 0.08
d) P( Weather = snow) = 0.02 .
We simply write as P(Weather) = (0.7, 0.2, 0.08, 0.02)
This statement defines a prior probability distribution for the random variable
Weather
We will also use expressions such as P( Weather, Cavity) to denote the
probabilities of all combinations of the values of a set of random variable. In that
case, P( Weather, Cavity) can be represented by a 4 x 2 table of probabilities.
This is called the joint probability distribution of Weather and Cavity.
Probability Theory : syntax
Conditional or posterior probability
Once the agent has obtained some evidence concerning the previously unknown
random variables making up the domain, prior probabilities are no longer
applicable. Instead, we use conditional or posterior probabilities.
The notation used is P(a|b), where a and b are any proposition. This is read as
"the probability of a, given that all we know is b."
For example
P(Cavity | Toothache) = 0.8 indicates that if a patient is observed to have a toothache and no
other information is yet available, then the probability of the patient's having a cavity will be
0.8.

Conditional probabilities can be defined in terms of unconditional probabilities.
The conditional probability of the occurrence of A if event B occurs:
P(A|B) = P(A ∧ B) / P(B)
This can also be written as:
P(A ∧ B) = P(A|B) * P(B)
which is called the product rule.
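The product rule can be checked on a toy example (a fair six-sided die; this example is ours, not from the slides):

```python
from fractions import Fraction

omega = set(range(1, 7))                 # sample space of a fair die
A = {n for n in omega if n % 2 == 0}     # event "even"
B = {n for n in omega if n > 3}          # event "greater than 3"

def P(event):
    return Fraction(len(event), len(omega))

p_a_given_b = P(A & B) / P(B)            # P(A|B) = P(A and B) / P(B)
assert P(A & B) == p_a_given_b * P(B)    # the product rule holds
print(p_a_given_b)                       # 2/3
```

Here A ∧ B = {4, 6}, so P(A ∧ B) = 1/3, P(B) = 1/2, and P(A|B) = 2/3.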
Axioms of probability: semantics
All probabilities are between 0 and 1. For any proposition A:
0 ≤ P(A) ≤ 1
Necessarily true (i.e., valid) propositions have probability 1, and necessarily false
(i.e., unsatisfiable) propositions have probability 0:
P(true) = 1 and P(false) = 0
The probability of a disjunction is given by:
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Inference: Joint probability Distribution
Example
Probability distribution for P(Cavity, Tooth)

              Toothache    ¬Toothache
Cavity          0.04          0.06
¬Cavity         0.01          0.89

P(Cavity) = 0.04 + 0.06 = 0.1
P(Cavity ∨ Tooth) = 0.04 + 0.01 + 0.06 = 0.11
P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) = 0.04 / 0.05 = 0.8
Note: P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Inference: Joint probability Distribution
Example: a domain consisting of just the three Boolean variables Toothache, Cavity, and Catch
(the dentist's nasty steel probe catches in my tooth). The full joint distribution is a 2 x 2 x 2 table, as
shown below.
Probability distribution for P(Cavity, Tooth, Catch):

                    Tooth              ¬Tooth
              Catch    ¬Catch     Catch    ¬Catch
Cavity        0.108    0.012      0.072    0.008
¬Cavity       0.016    0.064      0.144    0.576

P(Cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
P(Cavity ∨ Tooth) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth)
                  = [P(Cavity ∧ Tooth ∧ Catch) + P(Cavity ∧ Tooth ∧ ¬Catch)] / P(Tooth) = ??
Note: P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
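These calculations can be reproduced mechanically from the joint table (a small Python sketch using the slide's numbers):

```python
# Full joint distribution over (Cavity, Tooth, Catch), keyed by truth values.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def P(pred):
    """Marginalize: sum over all atomic events satisfying the predicate."""
    return sum(p for event, p in joint.items() if pred(*event))

p_cavity = P(lambda c, t, k: c)
p_cavity_or_tooth = P(lambda c, t, k: c or t)
p_cavity_given_tooth = P(lambda c, t, k: c and t) / P(lambda c, t, k: t)
print(round(p_cavity, 3))              # 0.2
print(round(p_cavity_or_tooth, 3))     # 0.28
print(round(p_cavity_given_tooth, 3))  # 0.6
```

So the conditional probability left as an exercise works out to 0.12 / 0.2 = 0.6.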
Bayes' Theorem
Bayesian view of probability is related to degree of belief
It is measure of plausibility of an event given incomplete knowledge.
Bayes theorem is also known as Bayes rule or Bayes law, or called Bayesian reasoning
The probability of an event A conditional on another event B, i.e., P(A|B), is generally
different from the probability of B conditional on A, i.e., P(B|A).
There is a relationship between the two, P(A|B) and P(B|A), and Bayes' theorem
is the statement of that relationship.
Bayes theorem is a way to calculate P(B|A) from a knowledge of P(A|B).

Bayes' rule is given as:
P(B|A) = P(A|B) P(B) / P(A)
Proof of Bayes' theorem:
Use the product rule both ways: P(A ∧ B) = P(A|B) * P(B) and P(A ∧ B) = P(B|A) * P(A)
(as discussed in class); equating the two and dividing by P(A) gives the rule.
Bayes' Theorem: Useful
Bayes rule is useful in practice because there are many cases where we have good
probability estimates for three of the four probabilities involved, and therefore can
compute the fourth one.
Often useful for diagnosis:
If X are (observed) effects and Y are (hidden) causes,
We may have a model for how causes lead to effects (P(X | Y))
We may also have prior beliefs (based on experience) about the frequency of occurrence of
effects (P(Y))
Which allows us to reason abductively from effects to causes (P(Y | X)).

Diagnostic knowledge is often more fragile than causal knowledge


Bayes rule Example
Suppose we know that
Stiff neck is a symptom in 50% of meningitis cases
Meningitis (m) occurs in 1/50,000 patients
Stiff neck (s) occurs in 1/20 patients
Then
Given P(s|m) = 0.5, P(m) = 1/50000, P(s) = 1/20
P(m|s) = (P(s|m) P(m))/P(s)
= (0.5 x 1/50000) / (1/20) = 0.0002
So we expect that one in 5000 patients with a stiff neck to have meningitis.
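The same computation in code (exact arithmetic with fractions):

```python
from fractions import Fraction

p_s_given_m = Fraction(1, 2)      # P(stiff neck | meningitis) = 0.5
p_m = Fraction(1, 50000)          # prior P(meningitis)
p_s = Fraction(1, 20)             # P(stiff neck)

# Bayes' rule: P(m|s) = P(s|m) P(m) / P(s)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s, float(p_m_given_s))  # 1/5000 0.0002
```

Using fractions avoids rounding and makes the "one in 5000" reading explicit.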
Bayes' Theorem: Useful
In doing an expert task, such as medical diagnosis, the goal is to determine
identifications (diseases) given observations (symptoms). Bayes' Theorem
provides such a relationship.
Bayesian reasoning has gained importance recently due to advances in efficiency:
more computational power available
better methods

Bayes' rule (Pros & Cons)
Advantages:
sound theoretical foundation
well-defined semantics for decision making
Problems:
requires large amounts of probability data
sufficient sample sizes
subjective evidence may not be reliable
independence-of-evidence assumption often not valid
relationship between hypothesis and evidence is reduced to a number
explanations for the user are difficult
high computational overhead
Issues with Probabilities
Often don't have the data
Just don't have enough observations
Data can't readily be reduced to numbers or frequencies.
Human estimates of probabilities are notoriously inaccurate. In
particular, often add up to >1.
Doesn't always match human reasoning well.
P(x) = 1 − P(¬x). Having a stiff neck is strong (0.9998!) evidence that you
don't have meningitis. True, but counterintuitive.

Several other approaches for uncertainty address some of these


problems.

Bayesian networks
Also called
Bayesian belief network (BBNs) or simply belief Network,
probabilistic network,
causal probabilistic network (CPNs) or simply causal Network, and
knowledge map.
In statistics, it is called a graphical model.

Represent dependencies among random variables


Give a short specification of conditional probability distribution
Many random variables are conditionally independent
Simplifies computations
Graphical representation
Directed Acyclic Graph (DAG) causal relationships among random variables
Allows inferences based on the network structure

Bayesian networks
The full specification of BN
1. A set of random variables makes up the nodes of the network. Variables may be
discrete or continuous.
2. A set of directed links or arrows connects pairs of nodes. If there is an arrow from
node X to node Y, X is said to be a parent of Y.
3. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that
quantifies the effect of the parents on the node.
4. The graph has no directed cycles (and hence is a directed, acyclic graph, or DAG).
Simple example
A simple Bayesian network in which Weather is independent of the other three variables and
Toothache and Catch are conditionally independent, given Cavity.

Bayesian networks : Example
Consider following domain
You have a new burglar alarm installed at home. It is fairly reliable at detecting a
burglary, but also responds on occasion to minor earthquakes. You also have two
neighbors, John and Mary, who have promised to call you at work when they
hear the alarm. John nearly always calls when he hears the alarm, but sometimes
confuses the telephone ringing with the alarm and calls then, too. Mary, on the
other hand, likes rather loud music and often misses the alarm altogether. Given
the evidence of who has or has not called, we would like to estimate the
probability of a burglary.
A Bayesian network for this domain is shown in figure next slides.
The network structure shows that burglary and earthquakes directly affect the
probability of the alarms going off, but whether John and Mary call depends
only on the alarm. The network thus represents our assumptions that they do not
perceive burglaries directly, they do not notice minor earthquakes, and they do
not confer before calling.
The conditional distributions are shown as a conditional probability table, or
CPT

Bayesian networks : Example
A typical Bayesian network, showing both the topology and the conditional probability
tables (CPTs). In the CPTs, the letters B, E, A, J, and M stand for Burglary, Earthquake,
Alarm, JohnCalls, and MaryCalls , respectively.

Example: What is the probability that the alarm has sounded, but neither burglary nor an
earthquake has occurred, and both John and Mary call?
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) * P(M|A) * P(A|¬B ∧ ¬E) * P(¬B) * P(¬E)
                       = 0.9 * 0.7 * 0.001 * 0.999 * 0.998 = 0.00062
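This computation can be reproduced from the CPTs. The CPT figure did not survive extraction, so the numbers below are the standard textbook values for this example (the ones that appear in the calculation above — 0.9, 0.7, 0.001, 0.999, 0.998 — match):

```python
# CPTs of the burglary network (standard textbook values, assumed here).
P_B = {True: 0.001, False: 0.999}                 # P(Burglary)
P_E = {True: 0.002, False: 0.998}                 # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}                   # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the network's chain-rule factorization."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

p = joint(b=False, e=False, a=True, j=True, m=True)
print(round(p, 6))   # 0.000628
```

The exact product is 0.00062811..., which the slide rounds to 0.00062; each factor is read off one node's CPT given its parents, which is exactly the independence structure the network encodes.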
Thank You!
