Problem Solving:
Strong AI: aims to build machines that can truly reason and solve problems. Such machines should be self-aware, and their overall intellectual ability should be indistinguishable from that of a human being. The excessive optimism of the 1950s and 1960s concerning strong AI has given way to an appreciation of the extreme difficulty of the problem. Strong AI maintains that suitably programmed machines are capable of cognitive mental states.
Weak AI: deals with the creation of some form of computer-based artificial intelligence that cannot truly reason and solve problems, but can act as if it were intelligent. Weak AI holds that suitably programmed machines can simulate human cognition.
Applied AI: aims to produce commercially viable "smart" systems, such as a security system that can recognize the faces of people who are permitted to enter a particular building. Applied AI has already enjoyed considerable success.
Cognitive AI: computers are used to test theories about how the human mind works: for example, theories about how we recognize faces and other objects, or about how we solve abstract problems.
Best-First Search: Best-first search is a way of combining the advantages of both depth-first and breadth-first search into a single method. One way of combining the two is to follow a single path at a time, but switch paths whenever some competing path looks more promising than the current one does.
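The idea above can be sketched as a greedy best-first search that always expands the node whose heuristic estimate looks most promising, regardless of which path it lies on. The graph and estimates below are purely illustrative assumptions:

```python
import heapq

def best_first_search(start, goal, neighbors, h):
    """Greedy best-first search: always expand the open node whose
    heuristic estimate h(n) is smallest, whichever path it lies on."""
    frontier = [(h(start), start)]           # priority queue ordered by h
    came_from = {start: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:                     # reconstruct the path taken
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in came_from:
                came_from[nxt] = node
                heapq.heappush(frontier, (h(nxt), nxt))
    return None

# Hypothetical graph and heuristic estimates toward goal 'G'.
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['G'], 'D': [], 'G': []}
est = {'A': 3, 'B': 2, 'C': 1, 'D': 2, 'G': 0}
print(best_first_search('A', 'G', graph.__getitem__, est.__getitem__))
```

From A, node C (estimate 1) looks more promising than B (estimate 2), so the search switches to C's path and reaches the goal.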
AO* algorithm:
Mini-max search: The mini-max search procedure is a depth-limited search procedure. The idea is to start at the current position and use the plausible-move generator to generate the set of possible successor positions. We can then apply the static evaluation function to those positions and simply choose the best one.
The starting position is exactly as good for us as the position generated by the best move we can make next. Here we assume that the static evaluation function returns large values to indicate good situations for us, so our goal is to maximize the value of the static evaluation function of the next board position.
An example of this operation is shown in Fig. 1. It assumes a static evaluation function that returns values ranging from -10 to 10, with 10 indicating a win for us, -10 a win for the opponent, and 0 an even match. Since our goal is to maximize the value of the heuristic function, we choose to move to B. Backing B's value up to A, we can conclude that A's value is 8, since we know we can move to a position with a value of 8.
Fig. 1. One-ply search and two-ply search. (In the one-ply tree, A's successors B, C, and D carry the static values 8, 3, and -2; in the two-ply tree, the leaf values 9, -6, 0, 0, -2, -4, and 3 are backed up through the opponent's ply.)
But since we know that the static evaluation function is not completely accurate, we would like to carry the search farther ahead than one ply. This could be very important, for example, in a chess game in which we are in the middle of a piece exchange. After our move, the situation would appear to be very good, but if we look one move ahead, we will see that one of our pieces also gets captured, and so the situation is not as good as it seemed.
Once the values from the second ply are backed up, it becomes clear that the correct move for us to make at the first level, given the information we have available, is C, since there is nothing the opponent can do from there to produce a value worse than -2. This process can be repeated for as many ply as time allows, and the more accurate evaluations that are produced can be used to choose the correct move at the top level. The alternation of maximizing and minimizing at alternate ply as evaluations are backed up is what gives the method its name: mini-max.
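The backing-up procedure can be sketched as a short recursive function. The tree and static values below are hypothetical, chosen in the spirit of the two-ply example (C ends up best because the opponent can force nothing worse than -2 there):

```python
def minimax(node, depth, maximizing, children, evaluate):
    """Depth-limited mini-max: back static values up the tree,
    maximizing at our plies and minimizing at the opponent's."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    values = [minimax(c, depth - 1, not maximizing, children, evaluate)
              for c in kids]
    return max(values) if maximizing else min(values)

# Hypothetical two-ply tree: our moves B, C, D, then opponent replies.
tree = {'A': ['B', 'C', 'D'],
        'B': ['B1', 'B2'], 'C': ['C1', 'C2'], 'D': ['D1', 'D2']}
leaf = {'B1': -6, 'B2': 9, 'C1': -2, 'C2': 0, 'D1': -4, 'D2': 3}
children = lambda n: tree.get(n, [])
evaluate = leaf.__getitem__

# Back each move's value up through the opponent's minimizing ply:
backed_up = {m: minimax(m, 1, False, children, evaluate) for m in tree['A']}
print(backed_up)           # {'B': -6, 'C': -2, 'D': -4} -> C is best
```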
Heuristic function
A heuristic function, or simply a heuristic, is a function that
ranks alternatives in various search algorithms at each branching
step based on the available information (heuristically) in order to
make a decision about which branch to follow during a search.
Shortest paths
For example, for shortest-path problems, a heuristic is a function h(n), defined on the nodes of a search tree, which serves as an estimate of the cost of the cheapest path from that node to the goal node. Heuristics are used by informed search algorithms such as greedy best-first search and A* to choose the most promising node to expand at each step.
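As a concrete instance of such an h(n): on a grid where every move costs 1, the Manhattan distance to the goal is a standard admissible estimate, since it never exceeds the true remaining cost. A minimal sketch (the coordinates are illustrative):

```python
def manhattan(node, goal):
    """Admissible heuristic for a unit-cost grid: the sum of the
    horizontal and vertical offsets never overestimates the true
    remaining distance to the goal."""
    (x1, y1), (x2, y2) = node, goal
    return abs(x1 - x2) + abs(y1 - y2)

print(manhattan((0, 0), (3, 2)))   # 5: at least five unit moves remain
```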
Finding heuristics
The problem of finding an admissible heuristic with a low
branching factor for common search tasks has been extensively
researched in the artificial intelligence community. Several
common techniques are used:
Pseudo code
function alphabeta(node, depth, α, β, Player)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if Player = MaxPlayer
        for each child of node
            α := max(α, alphabeta(child, depth-1, α, β, not(Player)))
            if β ≤ α
                break            (* beta cut-off *)
        return α
    else
        for each child of node
            β := min(β, alphabeta(child, depth-1, α, β, not(Player)))
            if β ≤ α
                break            (* alpha cut-off *)
        return β

(* Initial call *)
alphabeta(origin, depth, -infinity, +infinity, MaxPlayer)
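The pseudocode translates directly into a runnable sketch. The tree and leaf values here are hypothetical; note the result matches plain mini-max, but the D subtree is cut off early:

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Alpha-beta pruning: same result as plain mini-max, but branches
    that cannot influence the final decision are never explored."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        for child in kids:
            alpha = max(alpha, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            if beta <= alpha:
                break                    # beta cut-off
        return alpha
    else:
        for child in kids:
            beta = min(beta, alphabeta(child, depth - 1, alpha, beta,
                                       True, children, evaluate))
            if beta <= alpha:
                break                    # alpha cut-off
        return beta

# Hypothetical two-ply tree (same shape a plain mini-max would search).
tree = {'A': ['B', 'C', 'D'],
        'B': ['B1', 'B2'], 'C': ['C1', 'C2'], 'D': ['D1', 'D2']}
leaf = {'B1': -6, 'B2': 9, 'C1': -2, 'C2': 0, 'D1': -4, 'D2': 3}
value = alphabeta('A', 2, -math.inf, math.inf, True,
                  lambda n: tree.get(n, []), leaf.__getitem__)
print(value)    # -2: the backed-up value of the best move, C
```

After C establishes α = -2, the first leaf under D already forces β ≤ α, so D2 is pruned without being evaluated.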
Heuristic improvements
Alpha-beta search can be made even faster by considering only a narrow search window (generally determined by guesswork based on experience). This is known as aspiration search. In the extreme case, the search is performed with alpha and beta equal, a technique known as zero-window search, null-window search, or scout search. This is particularly useful for win/loss searches near the end of a game, where the extra depth gained from the narrow window and a simple win/loss evaluation function may lead to a conclusive result. If an aspiration search fails, the search must be repeated with a wider window.
Constraint programming
Constraint programming is the use of constraints as a
programming language to encode and solve problems. This is
often done by embedding constraints into a programming
language, which is called the host language. Constraint
programming originated from a formalization of equalities of
terms in Prolog II, leading to a general framework for
embedding constraints into a logic programming language. The
most common host languages are Prolog, C++, and Java, but
other languages have been used as well.
Constraint logic programming
A constraint logic program is a logic program that contains
constraints in the bodies of clauses. As an example, the clause
A(X):-X>0,B(X) is a clause containing the constraint X>0 in the
body. Constraints can also be present in the goal. The constraints
in the goal and in the clauses used to prove the goal are
accumulated into a set called constraint store. This set contains
the constraints the interpreter has assumed satisfiable in order to
proceed in the evaluation. As a result, if this set is detected to be unsatisfiable, the interpreter backtracks. Equations of terms, as used in logic programming, are considered a particular form of constraint, one that can be simplified using unification.
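The accumulate/check/backtrack cycle on the constraint store can be sketched over a toy one-variable problem. The finite integer domain and the lambda encoding of constraints are simplifying assumptions, not how a real CLP interpreter represents them:

```python
def satisfiable(store, domain=range(-10, 11)):
    """The store is satisfiable if some value meets every constraint."""
    return any(all(c(x) for c in store) for x in domain)

store = []                             # the constraint store
store.append(lambda x: x > 0)          # a clause body contributes X > 0
assert satisfiable(store)              # interpreter proceeds
store.append(lambda x: x < 0)          # a later goal contributes X < 0
if not satisfiable(store):             # detected unsatisfiable, so the
    store.pop()                        # interpreter backtracks
print(len(store), satisfiable(store))  # 1 True
```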
In practice, a game-playing program searches as many ply from the current position as it can in the time available. Except for the case of "pathological" game trees [1] (which seem to be quite rare in practice), increasing the search depth (i.e., the number of ply searched) generally improves the chance of picking the best move.
Two-person games can also be represented as and-or trees. For the first player to win a game, there must exist a winning move for every possible move of the second player. This is represented in the and-or tree by using disjunction to represent the first player's alternative moves and conjunction to represent all of the second player's moves.
Solving Game Trees
To solve a game tree is to find a strategy that either the first or the second player can follow that will guarantee either a win or tie.
The algorithm can be described recursively as follows.
1. Color the final ply of the game tree so that all wins for
player 1 are colored one way, all wins for player 2 are
colored another way, and all ties are colored a third
way.
2. Look at the next ply up. At a node where a given player is to move, if there exists an immediately lower node colored for that player, color this node for that player as well. If all immediately lower nodes are colored for the opposing player, color this node for the opposing player. Otherwise, color this node a tie.
3. Repeat for each ply, moving upwards, until all nodes
are colored. The color of the root node will determine
the nature of the game.
The diagram shows a game tree for an arbitrary game, colored
using the above algorithm.
It is usually possible to solve a game (in this technical sense of
"solve") using only a subset of the game tree, since in many
games a move need not be analyzed if there is another move that
is better for the same player (for example alpha-beta pruning can
be used in many deterministic games).
Any subtree that can be used to solve the game is known as a decision tree, and the sizes of decision trees of various shapes are used as measures of game complexity.
Unit-2
Knowledge Representation
Introduction to Knowledge Representation (KR)
We argue that the notion of knowledge representation can best be understood in terms of five distinct roles it plays, each crucial to the task at hand.
Acquisition Efficiency
- the ability to acquire new knowledge using automatic methods wherever possible, rather than relying on human intervention.
∀X: rose(X) → ∃Y: has(X, Y) ∧ thorn(Y)
(For all X, if X is a rose, then there exists a Y such that X has Y and Y is a thorn.)
Higher-Order Logic
- More expressive than first-order logic
- Functions and predicates are also objects
CD primitive actions and their explanations:
1. ATRANS - transfer of an abstract relationship such as possession (e.g. give)
2. PTRANS - transfer of the physical location of an object (e.g. go)
3. PROPEL - application of physical force to an object (e.g. throw)
4. MOVE - movement of a body part of an animal by that animal (e.g. kick)
5. GRASP - grasping of an object by an actor (e.g. hold)
6. INGEST - taking of an object by an animal into the inside of that animal (e.g. drink, eat)
7. EXPEL - expulsion of an object from inside the body of an animal into the world (e.g. spit)
8. MTRANS - transfer of mental information between animals or within an animal (e.g. tell)
9. MBUILD - construction of new information from old information (e.g. decide)
10. SPEAK - production of sounds (e.g. say)
3. LOCs: Locations
Every action takes place at some location, which serves as its source and destination.
4. Ts: Times
/ - negative
nil - present
delta - timeless
c - conditional
Semantic Nets: The main idea behind semantic nets is that the meaning of a concept comes from the ways in which it is connected to other concepts. In a semantic net, information is represented as a set of nodes connected to each other by a set of labeled arcs, which represent relationships among the nodes. A fragment of a typical semantic net is shown in the figure.
(Figure: a semantic net fragment with the arcs
Person -isa-> Mammal
Person -has-part-> Nose
Pee-Wee-Reese -instance-> Person
Pee-Wee-Reese -team-> Brooklyn-Dodgers
Pee-Wee-Reese -uniform-color-> Blue)
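One straightforward way to hold such a net in a program is as a set of labeled arcs, i.e. (node, relation, node) triples; a minimal sketch over the fragment above:

```python
# A semantic net as a set of labeled arcs (node, relation, node).
arcs = {
    ('Person', 'isa', 'Mammal'),
    ('Person', 'has-part', 'Nose'),
    ('Pee-Wee-Reese', 'instance', 'Person'),
    ('Pee-Wee-Reese', 'team', 'Brooklyn-Dodgers'),
    ('Pee-Wee-Reese', 'uniform-color', 'Blue'),
}

def related(node, relation):
    """Follow every arc with the given label leaving `node`."""
    return {tail for head, rel, tail in arcs if head == node and rel == relation}

print(related('Pee-Wee-Reese', 'team'))    # {'Brooklyn-Dodgers'}
print(related('Person', 'isa'))            # {'Mammal'}
```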
Or, in logic:
∃x ∃y: dog(x) ∧ mail-carrier(y) ∧ bite(x, y)
(Figure: the corresponding semantic net, in which instance nodes are linked by isa arcs to Dogs, Bite, and Mail-carrier, with assailant and victim arcs identifying the dog and the mail carrier.)
Person
    Isa: Mammal
    Cardinality: 6,000,000,000
    *handed: Right

Adult-Male
    Isa: Person
    Cardinality: 2,000,000,000
    *height: 5-10

ML-Baseball-Player
    Isa: Adult-Male
    Cardinality: 624
    *height: 6-1
    *bats: equal to handed
    *batting-average: .252
    *team:
    *uniform-color:

Fielder
    Isa: ML-Baseball-Player
    Cardinality: 376
    *batting-average: .262

Pee-Wee-Reese
    Instance: Fielder
    Height: 5-10
    Bats: Right
    Batting-average: .309
    Team: Brooklyn-Dodgers
    Uniform-color: Blue

ML-Baseball-Team
    Isa: Team
    Cardinality: 26
    *team-size: 24
    *manager:

Brooklyn-Dodgers
    Instance: ML-Baseball-Team
    Team-size: 24
    Manager: Leo-Durocher
    Players: (Pee-Wee-Reese)
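The essence of frames, slot lookup with inheritance along the instance/isa chain and starred slots acting as defaults, can be sketched with dictionaries. This is a simplified model of the listing above (slot names lowercased, only a few frames kept):

```python
frames = {
    'Person':             {'isa': 'Mammal', '*handed': 'Right'},
    'Adult-Male':         {'isa': 'Person', '*height': '5-10'},
    'ML-Baseball-Player': {'isa': 'Adult-Male', '*height': '6-1',
                           '*batting-average': 0.252},
    'Fielder':            {'isa': 'ML-Baseball-Player',
                           '*batting-average': 0.262},
    'Pee-Wee-Reese':      {'instance': 'Fielder', 'height': '5-10',
                           'batting-average': 0.309},
}

def get_slot(frame, slot):
    """Look for the slot locally; otherwise inherit along the
    instance/isa chain, where *slot entries supply default values."""
    while frame in frames:
        f = frames[frame]
        if slot in f:
            return f[slot]
        if '*' + slot in f:
            return f['*' + slot]
        frame = f.get('instance') or f.get('isa')
    return None

print(get_slot('Pee-Wee-Reese', 'batting-average'))  # 0.309, stored locally
print(get_slot('Pee-Wee-Reese', 'handed'))           # Right, default from Person
```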
Unit-3
Handling Uncertainty and learning
Fuzzy Logic: In the techniques discussed so far, we have not modified the mathematical underpinnings provided by set theory and logic. We have instead augmented those ideas with additional constructs provided by probability theory. Here we take a different approach and briefly consider what happens if we make fundamental changes to our idea of set membership and corresponding changes to our definitions of logical operations. The motivation for fuzzy sets is provided by the need to represent propositions such as:
John is very tall.
Mary is slightly ill.
Sue and Linda are close friends.
{A, B, C, D}
Three-element subsets: {A, B, C}, {A, B, D}, {A, C, D}, {B, C, D}
Two-element subsets: {A, B}, {A, C}, {A, D}, {B, C}, {B, D}, {C, D}
Singletons: {A}, {B}, {C}, {D}
∅
Fig. Lattice of subsets of the universe U.
Bayes' Theorem: An important goal for many problem-solving systems is to collect evidence as the system goes along and to modify its behavior on the basis of that evidence. To model this behavior, we need a statistical theory of evidence; Bayesian statistics provides one. Bayes' theorem states:

P(Hn/E) = P(E/Hn) · P(Hn) / Σk P(E/Hk) · P(Hk)
Specifically, when we say P (A/B), we are describing
the conditional probability of A given that the only evidence we
have is B. If there is also other relevant evidence, then it too
must be considered. Suppose, for example, that we are solving a
medical diagnosis problem. Consider the following assertions:
S: patient has spots
M: patient has measles
F: patient has high fever
Without any additional evidence, the presence of spots serves
as evidence in favor of measles. It also serves as evidence of
fever since measles would cause fever. But, since spots and
fever are not independent events, we cannot just sum their
effects; instead, we need to represent explicitly the conditional
probability that arises from their conjunction. In general, given a
prior body of evidence e and some new observation E, we need
to compute.
P(H/E, e) = P(H/E) · P(e/E, H) / P(e/E)
Unfortunately, in an arbitrarily complex world, the size of the set of joint probabilities that we require in order to compute this function grows as 2^n if there are n different propositions being considered.
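A small numeric illustration of Bayes' theorem for the measles example; all probabilities here are made up purely for illustration:

```python
# Hypothetical numbers, purely for illustration.
p_m = 0.01               # prior P(measles)
p_s_given_m = 0.9        # P(spots | measles)
p_s_given_not_m = 0.05   # P(spots | no measles)

# Bayes' theorem, with the denominator expanded over both hypotheses:
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)
p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 3))   # 0.154
```

Even with spots observed, the posterior stays modest because the prior for measles is low; this is the behavior-modification-by-evidence that the text describes.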
(Fig. Learning model: the environment or teacher supplies feedback to the learner component, which updates the knowledge base; the performance component carries out tasks, and its responses are judged by a critic/performance evaluator whose evaluation is fed back to the learner.)
(Figure: the resultant learning behavior is shaped by the training scenario, the representation scheme, the feedback, and the learning algorithms.)
Factors to consider
Factors to consider when choosing and applying a learning
algorithm include the following:
1. Heterogeneity of the data. If the feature vectors include
features of many different kinds (discrete, discrete ordered,
counts, continuous values), some algorithms are easier to
apply than others. Many algorithms, including Support
Vector Machines, linear regression, logistic regression,
neural networks, and nearest neighbor methods, require that
the input features be numerical and scaled to similar ranges
(e.g., to the [-1,1] interval). Methods that employ a distance
function, such as nearest neighbor methods and support
vector machines with Gaussian kernels, are particularly
sensitive to this. An advantage of decision trees is that they
easily handle heterogeneous data.
2. Redundancy in the data. If the input features contain redundant information (e.g., highly correlated features), some learning algorithms (e.g., linear regression, logistic regression, and distance-based methods) will perform poorly because of numerical instabilities. These problems can often be solved by imposing some form of regularization.
3. Presence of interactions and non-linearities. If each of the features makes an independent contribution to the output, then algorithms based on linear functions (e.g., linear regression, logistic regression, Support Vector Machines, naive Bayes) and distance functions (e.g., nearest neighbor methods, support vector machines with Gaussian kernels) generally perform well. However, if there are complex interactions among features, then algorithms such as decision trees and neural networks work better, because they are specifically designed to discover these interactions.
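The scaling of features to similar ranges mentioned under heterogeneity of the data, e.g. into [-1, 1], can be done with simple min-max scaling; a minimal sketch:

```python
def scale_to_unit_interval(xs, lo=-1.0, hi=1.0):
    """Min-max scale a feature column into [lo, hi] so that
    distance-based learners weight all features comparably."""
    mn, mx = min(xs), max(xs)
    return [lo + (hi - lo) * (x - mn) / (mx - mn) for x in xs]

print(scale_to_unit_interval([10, 20, 30]))   # [-1.0, 0.0, 1.0]
```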
(Table: animals - dog, cat, bat, whale, canary, robin, ostrich, snake, lizard, alligator - each described by the binary features has-scales?, has-feathers?, flies?, lives-in-water?, and lays-eggs?, with entries 1 or 0.)
One simple way to interpret the network's output is to find the output unit with the highest level of activation, and set that unit to 1 and all other output units to 0. In other words, the output unit with the highest activation is the only one we consider to be active. A more neural-like solution is to have the output units fight among themselves for control of an input vector.
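The simple winner-take-all decoding just described can be sketched in a few lines (the activation vector below is illustrative):

```python
def winner_take_all(activations):
    """Set the most active output unit to 1 and all others to 0."""
    winner = max(range(len(activations)), key=lambda i: activations[i])
    return [1 if i == winner else 0 for i in range(len(activations))]

print(winner_take_all([0.1, 0.7, 0.3]))   # [0, 1, 0]
```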
Disadvantages:
1. Dependency-directed backtracking incurs a significant time and space overhead, as it requires the maintenance of dependency records and an additional no-good database. Thus the effort required to maintain the dependencies may be more than the problem-solving effort saved.
2. If the problem solver is logically complete and finishes all work on a state before considering the next, the problem of backtracking to an inappropriate choice cannot occur.
3. In such cases much of the advantage of dependency-directed backtracking is irrelevant. However, most practical problem solvers are neither logically complete, nor do they finish all possible work on one state before considering another.
Fuzzy function:
The membership function is the fuzzy function used to define the degree to which a value belongs to a fuzzy set. Fuzzy logic depends on membership functions.
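A common concrete membership function is the triangular one; the "tall" fuzzy set below, with heights in centimetres, is a hypothetical example:

```python
def triangular(x, a, b, c):
    """Triangular membership function: the degree rises from 0 at `a`
    to 1 at `b`, then falls back to 0 at `c`."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy set "tall" over heights in centimetres.
print(triangular(180, 160, 190, 220))   # about 0.67: fairly tall
```

Propositions like "John is very tall" are then statements about such membership degrees rather than crisp set membership.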
Unit-4
Natural Language processing and planning
Backward chaining: Backward chaining (or backward reasoning) is an inference method used in automated theorem provers, proof assistants, and other artificial intelligence applications. It is one of the two most commonly used methods of reasoning with inference rules and logical implications; the other is forward chaining. Backward chaining is implemented in logic programming by SLD resolution. Both methods are based on the modus ponens inference rule.
Backward chaining starts with a list of goals (or a hypothesis)
and works backwards from the consequent to the antecedent to
see if there is data available that will support any of these
consequents. An inference engine using backward chaining
would search the inference rules until it finds one which has a
consequent (Then clause) that matches a desired goal. If the
antecedent (If clause) of that rule is not known to be true, then it
is added to the list of goals (in order for one's goal to be
confirmed one must also provide data that confirms this new
rule).
For example, suppose that the goal is to conclude the color of
my pet Fritz, given that he croaks and eats flies, and that the rule
base contains the following four rules:
1. If X croaks and eats flies - Then X is a frog
2. If X chirps and sings - Then X is a canary
3. If X is a frog - Then X is green
4. If X is a canary - Then X is yellow
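A sketch of backward chaining over this rule base (the full set of four rules is the standard form of the Fritz example and is assumed here):

```python
def backward_chain(goal, rules, facts):
    """Try to prove `goal`: either it is a known fact, or some rule
    concludes it and all of that rule's antecedents can be proved."""
    if goal in facts:
        return True
    return any(all(backward_chain(a, rules, facts) for a in antecedents)
               for antecedents, consequent in rules if consequent == goal)

rules = [({'croaks', 'eats flies'}, 'is a frog'),
         ({'chirps', 'sings'}, 'is a canary'),
         ({'is a frog'}, 'is green'),
         ({'is a canary'}, 'is yellow')]
facts = {'croaks', 'eats flies'}            # what we know about Fritz
print(backward_chain('is green', rules, facts))    # True
print(backward_chain('is yellow', rules, facts))   # False
```

Working backwards, "is green" leads to the subgoal "is a frog", whose antecedents are both confirmed by the data; "is yellow" fails because "chirps" can be neither found as a fact nor derived.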
A statistical translation model can be induced starting from some multitext and one or more monolingual treebanks.
The recipe follows:
T1. Induce a word-to-word translation model.
T2. Induce PCFGs from the relative frequencies of productions in the monolingual treebanks.
T3. Synchronize some multitext.
T4. Induce an initial PMTG from the relative frequencies of productions in the multitreebank.
T5. Re-estimate the PMTG parameters, using a synchronous parser with the expectation semiring.
A1. Use the PMTG to infer the most probable multitree covering new input text.
A2. Linearize the output dimensions of the multitree.
Steps T2, T4, and A2 are trivial. Steps T1, T3, T5, and A1 are instances of the generalized parsing algorithm.
Figure 2 is only an architecture. Computational complexity and generalization error stand in the way of its practical implementation. Nevertheless, it is satisfying to note that all the non-trivial algorithms in Figure 2 are special cases of Translator CT. It is therefore possible to implement an MTSMT system using just one inference algorithm, parameterized by a grammar, a semiring, and a search strategy. An advantage of building an MT system in this manner is that improvements invented for ordinary parsing algorithms can often be applied to all the main components of the system.
In the blocks world, since all blocks are the same size, each block can have at most one other block directly on top of it.
In order to specify both the conditions under which an operation
may be performed and the results of performing it, we need to
use the following predicates:
[¬∃x: HOLDING(x)] → ARMEMPTY
∀x: ONTABLE(x) → ¬∃y: ON(x, y)
ON(A, B, S0) ∧ ONTABLE(B, S0) ∧ ...

STACK(X, Y)
P: CLEAR(Y) ∧ HOLDING(X)
D: CLEAR(Y) ∧ HOLDING(X)
A: ARMEMPTY ∧ ON(X, Y)

UNSTACK(X, Y)
P: ON(X, Y) ∧ CLEAR(X) ∧ ARMEMPTY
D: ON(X, Y) ∧ ARMEMPTY
A: HOLDING(X) ∧ CLEAR(Y)

PICKUP(X)
P: CLEAR(X) ∧ ONTABLE(X) ∧ ARMEMPTY
D: ONTABLE(X) ∧ ARMEMPTY
A: HOLDING(X)

PUTDOWN(X)
P: HOLDING(X)
D: HOLDING(X)
A: ONTABLE(X) ∧ ARMEMPTY
Start: ON(B, A) ∧ ONTABLE(A) ∧ ONTABLE(C) ∧ ONTABLE(D) ∧ ARMEMPTY
Goal: ON(C, A) ∧ ON(B, D) ∧ ONTABLE(A) ∧ ONTABLE(D)

The goal decomposes into the subgoals ON(C, A) and ON(B, D), which may be attempted in either order.
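How a STRIPS-style operator transforms a state can be sketched with sets of predicate strings; CLEAR(B) is added to the start state here on the assumption that nothing sits on top of B:

```python
def apply_op(state, preconds, delete, add):
    """Apply a STRIPS-style operator: if the preconditions hold,
    remove the delete list and union in the add list."""
    if not preconds <= state:
        return None                       # preconditions not satisfied
    return (state - delete) | add

# UNSTACK(B, A) applied to the start state of the example.
start = {'ON(B,A)', 'ONTABLE(A)', 'ONTABLE(C)', 'ONTABLE(D)',
         'ARMEMPTY', 'CLEAR(B)'}
result = apply_op(start,
                  preconds={'ON(B,A)', 'CLEAR(B)', 'ARMEMPTY'},
                  delete={'ON(B,A)', 'ARMEMPTY'},
                  add={'HOLDING(B)', 'CLEAR(A)'})
print(sorted(result))
```

After the operator fires, the arm holds B and A is clear, exactly what the add and delete lists prescribe.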
(Figure: a partial-order plan over the steps S1 through S5.) The diagram
graphically represents the temporal constraints S1 < S2, S1 <
S3, S1 < S4, S2 < S5, S3 < S4, and S4 < S5. This partial-order
plan implicitly represents the following three total-order plans,
each of which is consistent with all of the given constraints:
[S1,S2,S3,S4,S5], [S1,S3,S2,S4,S5], and [S1,S3,S4,S2,S5].
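That these are exactly the consistent total orders can be checked mechanically with a brute-force sketch over all permutations of the steps:

```python
from itertools import permutations

steps = ['S1', 'S2', 'S3', 'S4', 'S5']
constraints = [('S1', 'S2'), ('S1', 'S3'), ('S1', 'S4'),
               ('S2', 'S5'), ('S3', 'S4'), ('S4', 'S5')]

def consistent(order):
    """A total order is consistent if every constraint a < b holds."""
    pos = {s: i for i, s in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in constraints)

total_orders = [list(p) for p in permutations(steps) if consistent(p)]
print(total_orders)   # the three linearizations of the partial order
```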
Partial-Order Planner (POP) Algorithm
function pop(initial-state, conjunctive-goal, operators)
// non-deterministic algorithm
plan = make-initial-plan(initial-state, conjunctive-goal);
loop:
begin
if solution?(plan) then return plan;
(S-need, c) = select-subgoal(plan) ; // choose an unsolved goal
choose-operator(plan, operators, S-need, c);
// select an operator to solve that goal and revise plan
resolve-threats(plan); // fix any threats created
end
end
function solution?(plan)
if causal-links-establishing-all-preconditions-of-all-steps(plan)
and all-threats-resolved(plan)
and all-temporal-ordering-constraints-consistent(plan)
and all-variable-bindings-consistent(plan)
then return true;
else return false;
end
function select-subgoal(plan)
pick a plan step S-need from steps(plan) with a precondition c
that has not been achieved;
return (S-need, c);
end
procedure choose-operator(plan, operators, S-need, c)
// solve "open precondition" of some step
choose a step S-add by either
Step Addition: adding a new step from operators that
has c in its Add-list
or Simple Establishment: picking an existing step in Steps(plan)
that has c in its Add-list;
if no such step then return fail;
add causal link "S-add --->c S-need" to Links(plan);
add temporal ordering constraint "S-add < S-need" to Orderings(plan);
if S-add is a newly added step then
begin
add S-add to Steps(plan);
add "Start < S-add" and "S-add < Finish" to Orderings(plan);
end
end
procedure resolve-threats(plan)
foreach S-threat that threatens link "Si --->c Sj" in Links(plan)
begin // "declobber" threat
choose either
Demotion: add "S-threat < Si" to Orderings(plan)
or Promotion: add "Sj < S-threat" to Orderings(plan);
if not(consistent(plan)) then return fail;
end
end
Unit-5
Expert Systems and AI Languages
Introduction: An expert system is a set of programs that manipulate encoded knowledge to solve problems in a specialized domain that normally requires human expertise. An expert system's knowledge is obtained from expert sources and coded in a form suitable for the system to use in its inference or reasoning processes. The expert knowledge must be obtained from specialists or other sources of expertise, such as texts, journal articles, and databases.
(Figure: expert-system architecture. A development engine is used to build the knowledge base from the problem domain; an inference engine applies the knowledge base, and a user interface connects the system to the user.)
(Figure: components of an expert system. The user communicates through an I/O interface with the inference engine, which draws on the knowledge base and working memory; an explanation module justifies conclusions, an editor maintains the knowledge base, a case-history file records sessions, and a learning module updates the system.)
ENCOURAGED(student) → MOTIVATED(student)
WORKHARD(student) → EXCEL(student)
EXCEL(student) → SUCCEED(student)
(Fig. Components of blackboard systems: a shared blackboard, a set of knowledge sources, and control information.)
Each unit computes an activation value, which is passed on to other nodes or is used to produce some output response. Neural networks were originally inspired by models of the human nervous system. They are, to be sure, greatly simplified models.
(Figure: knowledge acquisition. The knowledge engineer works with the domain expert and, through an editor, builds the system's knowledge base.)
Examples
Here follow some example programs written in Prolog.
Hello world
An example of a query:
?- write('Hello world!'), nl.
Hello world!
true.
?-

Compiler optimization
Any computation can be expressed declaratively as a sequence
of state transitions. As an example, an optimizing compiler with
Dynamic programming
The following Prolog program uses dynamic programming to find the longest common subsequence of two lists in polynomial time. The clause database is used for memoization:
:- dynamic(stored/1).
memo(Goal) :- ( stored(Goal) -> true ; Goal, assertz(stored(Goal)) ).
lcs([], _, []) :- !.
lcs(_, [], []) :- !.
lcs([X|Xs], [X|Ys], [X|Ls]) :- !, memo(lcs(Xs, Ys, Ls)).
lcs([X|Xs], [Y|Ys], Ls) :-
    memo(lcs([X|Xs], Ys, Ls1)), memo(lcs(Xs, [Y|Ys], Ls2)),
length(Ls1, L1), length(Ls2, L2),
( L1 >= L2 -> Ls = Ls1 ; Ls = Ls2 ).
Example query:
?- lcs([x,m,j,y,a,u,z], [m,z,j,a,w,x,u], Ls).
Ls = [m, j, a, u]
Design patterns
A design pattern is a general reusable solution to a commonly
occurring problem in software design. In Prolog, design patterns
go under various names: skeletons and techniques, clichés, program schemata, and logic description schemata. An alternative to design patterns is higher-order programming.
Higher-order programming
Main articles: Higher-order logic and Higher-order
programming
By definition, first-order logic does not allow quantification
over predicates. A higher-order predicate is a predicate that
takes one or more other predicates as arguments. Prolog already has some built-in higher-order predicates such as call/1, findall/3, setof/3, and bagof/3.[16] Furthermore, since arbitrary Prolog goals can be constructed and evaluated at run-time, it is easy to write higher-order predicates like maplist/2, which applies an arbitrary predicate to each member of a given list, and sublist/3, which filters elements that satisfy a given predicate, also allowing for currying.[15]
To convert solutions from temporal representation (answer
substitutions on backtracking) to spatial representation (terms),
Prolog has various all-solutions predicates that collect all answer
substitutions of a given query in a list. This can be used for list
comprehension. For example, perfect numbers equal the sum of
their proper divisors:
perfect(N) :-
    between(1, inf, N), U is N // 2,
    findall(D, (between(1, U, D), N mod D =:= 0), Ds),
    sumlist(Ds, N).

This can be used to enumerate perfect numbers, and also to check whether a number is perfect.
Practical use
MYCIN was never actually used in practice. This was not because of any weakness in its performance; as mentioned, in tests it outperformed members of the Stanford medical school faculty. Some observers raised ethical and legal issues related to the use of computers in medicine: if a program gives the wrong diagnosis or recommends the wrong therapy, who should be held responsible? However, the greatest problem, and the reason that MYCIN was not used in routine practice, was the state of technologies for system integration, especially at the time it was developed. MYCIN was a stand-alone system that required a user to enter all relevant information about a patient by typing in responses to questions that MYCIN would pose. The program ran on a large time-shared system, available over the early Internet (ARPANET), before personal computers were developed. In the modern era, such a system would be integrated with medical record systems, would extract answers to questions from patient databases, and would be much less dependent on physician entry of information. In the 1970s, a session with MYCIN could easily consume 30 minutes or more, an unrealistic time commitment for a busy clinician.
A difficulty that rose to prominence during the development of MYCIN and subsequent complex expert systems has been the extraction of the necessary knowledge from human experts in the relevant fields into the rule base for the inference engine to use (so-called knowledge engineering).