
AI in Games (IV)

Oct. 11, 2006

So far
Artificial Intelligence: A Modern Approach
Stuart Russell and Peter Norvig
Prentice Hall, 2nd ed.
Chapter 1: AI taxonomy
Chapter 2: agents
Chapter 3: uninformed search
Chapter 4: informed search

From now on
Artificial Intelligence: A Modern Approach
Chapter 4: online search (remainder)
Chapter 6: adversarial search

Network part
Learning (maybe from the same textbook)
Game AI techniques

Outline
Ch 4. informed search
Online search

Ch 6. adversarial search
Optimal decisions
α-β pruning
Imperfect, real-time decisions

Offline search vs. online search


Offline search agents
Compute a solution before setting foot in the real world

Online search agents


Interleave computation and action
E.g., take an action, observe the environment, then compute the next action

Necessary for an exploration problem


States and actions are unknown
E.g., a robot in a new building, or a labyrinth

Online search problems

The agent is assumed to know only:
Actions(s): returns a list of the actions allowed in state s
c(s, a, s′): this step cost cannot be used until the agent knows that s′ is the outcome
Goal-Test(s)

The agent cannot access the successors of a state except by actually trying all the actions in that state

Assumptions:
The agent can recognize a state that it has visited before
Actions are deterministic
Optionally, an admissible heuristic function is available
(a minimal interface sketch follows below)
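To make this interface concrete, here is a minimal Python sketch of what an online search agent may query; the class and method names (OnlineSearchProblem, actions, step_cost, goal_test, h) are illustrative, not from the textbook:

class OnlineSearchProblem:
    """What an online search agent is allowed to query."""
    def actions(self, s):
        """Return the list of actions allowed in state s."""
        raise NotImplementedError
    def step_cost(self, s, a, s1):
        """c(s, a, s'): usable only after s' has been observed as the outcome."""
        raise NotImplementedError
    def goal_test(self, s):
        """GOAL-TEST(s)."""
        raise NotImplementedError
    def h(self, s):
        """Optional admissible heuristic estimate of the cost to the goal."""
        raise NotImplementedError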

Online search problems

If some actions are irreversible, the agent may reach a dead end.
If some goal state is reachable from every reachable state, the state space is safely explorable.

Online search agents

An online algorithm can expand only a node that it physically occupies.
Offline algorithms can expand any node in the fringe.

Same principle as DFS.

Online DFS
function ONLINE-DFS-AGENT(s′) returns an action
input: s′, a percept identifying the current state
static: result, a table of the next state, indexed by action and state, initially empty
unexplored, a table that lists, for each visited state, the actions not yet tried
unbacktracked, a table that lists, for each visited state, the predecessor states to which the agent has not yet backtracked
s, a, the previous state and action, initially null

if GOAL-TEST(s′) then return stop
if s′ is a new state then unexplored[s′] ← ACTIONS(s′)
if s is not null then do
    result[a, s] ← s′
    add s to the front of unbacktracked[s′]
if unexplored[s′] is empty then
    if unbacktracked[s′] is empty then return stop
    else a ← an action b such that result[b, s′] = POP(unbacktracked[s′])
else a ← POP(unexplored[s′])
s ← s′
return a

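Below is a runnable Python sketch of this agent, following the pseudocode above; the illustrative OnlineSearchProblem interface from the earlier slide is assumed, and None stands in for the stop signal:

class OnlineDFSAgent:
    def __init__(self, problem):
        self.problem = problem
        self.result = {}         # (state, action) -> observed next state
        self.unexplored = {}     # state -> actions not yet tried there
        self.unbacktracked = {}  # state -> predecessors not yet backtracked to
        self.s = None            # previous state
        self.a = None            # previous action

    def __call__(self, s1):      # s1 plays the role of s' in the pseudocode
        if self.problem.goal_test(s1):
            return None          # stop
        if s1 not in self.unexplored:    # s' is a new state
            self.unexplored[s1] = list(self.problem.actions(s1))
        if self.s is not None:
            self.result[(self.s, self.a)] = s1
            self.unbacktracked.setdefault(s1, []).insert(0, self.s)
        if not self.unexplored[s1]:
            if not self.unbacktracked.get(s1):
                return None      # stop: nowhere left to explore or backtrack
            prev = self.unbacktracked[s1].pop(0)
            # the action b with result[b, s'] = prev; it exists because
            # actions are assumed reversible and prev was reached from here
            self.a = next(b for b in self.problem.actions(s1)
                          if self.result.get((s1, b)) == prev)
        else:
            self.a = self.unexplored[s1].pop()
        self.s = s1
        return self.a

Usage: build the agent once, then at every step call agent(current_state) and execute the returned action in the environment; None signals stop.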
Online DFS, example

Assume a maze problem on a 3x3 grid.
s′ = (1,1) is the initial state.
result, unexplored (UX), unbacktracked (UB) are empty.
s, a are also null.

Online DFS, example

s′ = (1,1)

GOAL-TEST((1,1))?
(1,1) ≠ G, thus false

(1,1) a new state?
True
UX[(1,1)] ← ACTIONS((1,1)) = {RIGHT, UP}

s is null?
True (initially)

UX[(1,1)] empty?
False

a ← POP(UX[(1,1)])
a = UP

s ← (1,1)
Return a

Online DFS, example

s′ = (1,2)

GOAL-TEST((1,2))?
(1,2) ≠ G, thus false

(1,2) a new state?
True
UX[(1,2)] ← ACTIONS((1,2)) = {DOWN}

s is null?
False (s = (1,1))
result[UP, (1,1)] ← (1,2)
UB[(1,2)] = {(1,1)}

UX[(1,2)] empty?
False

a = DOWN, s ← (1,2)
Return a

Online DFS, example

s′ = (1,1)

GOAL-TEST((1,1))?
(1,1) ≠ G, thus false

(1,1) a new state?
False

s is null?
False (s = (1,2))
result[DOWN, (1,2)] ← (1,1)
UB[(1,1)] = {(1,2)}

UX[(1,1)] empty?
False

a = RIGHT, s ← (1,1)
Return a

Online DFS, example

s′ = (2,1)

GOAL-TEST((2,1))?
(2,1) ≠ G, thus false

(2,1) a new state?
True
UX[(2,1)] ← ACTIONS((2,1)) = {RIGHT, UP, LEFT}

s is null?
False (s = (1,1))
result[RIGHT, (1,1)] ← (2,1)
UB[(2,1)] = {(1,1)}

UX[(2,1)] empty?
False

a = LEFT, s ← (2,1)
Return a

Online DFS, example

s′ = (1,1)

GOAL-TEST((1,1))?
(1,1) ≠ G, thus false

(1,1) a new state?
False

s is null?
False (s = (2,1))
result[LEFT, (2,1)] ← (1,1)
UB[(1,1)] = {(2,1), (1,2)}

UX[(1,1)] empty?
True
UB[(1,1)] empty? False

a ← an action b such that result[b, (1,1)] = POP(UB[(1,1)]) = (2,1)
b = RIGHT

a = RIGHT, s ← (1,1)
Return a
And so on...

Online DFS

In the worst case each node is visited twice.
An agent can go on a long walk even when it is close to the solution.
An online iterative deepening approach solves this problem.
Online DFS works only when actions are reversible.

Online local search

Hill-climbing is already online
Only one state is stored.
Bad performance due to local maxima
Random restarts are impossible.
Solution 1: a random walk introduces exploration
Selects one of the available actions at random, with preference for not-yet-tried actions (see the sketch below)
But can take exponentially many steps
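A minimal sketch of that selection rule (the function and variable names are illustrative):

import random

def random_walk_action(s, actions, tried):
    """Pick an untried action in state s if any remain, else any action.
    `tried` maps each state to the set of actions already taken there."""
    tried.setdefault(s, set())
    untried = [a for a in actions(s) if a not in tried[s]]
    a = random.choice(untried if untried else actions(s))
    tried[s].add(a)
    return a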

Online local search

Solution 2: add memory to the hill climber
Store the current best estimate H(s) of the cost to reach the goal
H(s) is initially the heuristic estimate h(s)
Afterward updated with experience (see below)

Learning real-time A* (LRTA*)

[Figure: LRTA* state-space example; the arrow marks the current position of the agent]

Learning real-time A* (LRTA*)

function LRTA*-COST(s, a, s′, H) returns a cost estimate
if s′ is undefined then return h(s)
else return c(s, a, s′) + H[s′]

function LRTA*-AGENT(s′) returns an action
input: s′, a percept identifying the current state
static: result, a table of the next state, indexed by action and state, initially empty
H, a table of cost estimates indexed by state, initially empty
s, a, the previous state and action, initially null

if GOAL-TEST(s′) then return stop
if s′ is a new state (not in H) then H[s′] ← h(s′)
unless s is null
    result[a, s] ← s′
    H[s] ← min over b in ACTIONS(s) of LRTA*-COST(s, b, result[b, s], H)
a ← an action b in ACTIONS(s′) that minimizes LRTA*-COST(s′, b, result[b, s′], H)
s ← s′
return a
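A runnable Python sketch of LRTA*, mirroring the pseudocode above and assuming the same illustrative OnlineSearchProblem interface (including an h method):

class LRTAStarAgent:
    def __init__(self, problem):
        self.problem = problem
        self.result = {}   # (state, action) -> observed next state
        self.H = {}        # state -> current best cost-to-goal estimate
        self.s = None      # previous state
        self.a = None      # previous action

    def cost(self, s, a, s1):
        """LRTA*-COST: optimistic h(s) if (s, a) has never been tried."""
        if s1 is None:
            return self.problem.h(s)
        return self.problem.step_cost(s, a, s1) + self.H[s1]

    def __call__(self, s1):
        if self.problem.goal_test(s1):
            return None    # stop
        if s1 not in self.H:
            self.H[s1] = self.problem.h(s1)
        if self.s is not None:
            self.result[(self.s, self.a)] = s1
            # update H[s] with what the last step taught us
            self.H[self.s] = min(
                self.cost(self.s, b, self.result.get((self.s, b)))
                for b in self.problem.actions(self.s))
        self.a = min(self.problem.actions(s1),
                     key=lambda b: self.cost(s1, b, self.result.get((s1, b))))
        self.s = s1
        return self.a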

Outline
Ch 4. informed search
Ch 6. adversarial search
Optimal decisions
α-β pruning
Imperfect, real-time decisions

Games vs. search problems

The problem-solving agent is no longer alone
Multiagent, conflict

Default: a deterministic, turn-taking, two-player, zero-sum game of perfect information
Perfect information vs. imperfect information, or games of chance

An "unpredictable" opponent means a solution must specify a move for every possible opponent reply
Time limits mean we are unlikely to find the goal, so we must approximate

* Environments with very many agents are best viewed as economies rather than games

Game formalization

Initial state
A successor function
Returns a list of (move, state) pairs

Terminal test
Identifies terminal states

Utility function (or objective function)
Gives a numeric value for the terminal states

Game tree
The state space

Tic-tac-toe: Game tree
(2-player, deterministic, turns)

Minimax
Perfect play for deterministic games: the optimal strategy
Idea: choose the move to the position with the highest minimax value
= best achievable payoff against best play
E.g., a 2-ply game: only two half-moves

Minimax algorithm
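The algorithm figure did not survive extraction, so here is a minimal recursive Python sketch of minimax; the game interface (actions, result, terminal_test, utility) is an assumption for illustration, not prescribed by the slides:

def minimax_decision(state, game):
    """Choose MAX's move with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return max(min_value(game.result(state, a), game)
               for a in game.actions(state))

def min_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return min(max_value(game.result(state, a), game)
               for a in game.actions(state))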

Problem of minimax search

The number of game states is exponential in the number of moves.
Solution: do not examine every node
==> Alpha-beta pruning
Remove branches that do not influence the final decision

Revisit example

Alpha-Beta Example
Do DF-search until the first leaf
Range of possible values

[-∞, +∞]

[-∞, +∞]

Alpha-Beta Example (continued)

[-∞, +∞]

[-∞, 3]

Alpha-Beta Example (continued)

[-∞, +∞]

[-∞, 3]

Alpha-Beta Example (continued)


[3, +∞]

[3, 3]

Alpha-Beta Example (continued)


[3, +∞]
This node is worse for MAX

[3, 3]

[-∞, 2]

Alpha-Beta Example (continued)

[3, 14]

[3, 3]

[-∞, 2]

[-∞, 14]

Alpha-Beta Example (continued)

[3, 5]

[3, 3]

[-∞, 2]

[-∞, 5]

Alpha-Beta Example (continued)

[3, 3]

[3, 3]

[-∞, 2]

[2, 2]

Alpha-Beta Example (continued)

[3, 3]

[3, 3]

[-∞, 2]

[2, 2]

Properties of α-β
Pruning does not affect the final result
Good move ordering improves the effectiveness of pruning
With "perfect ordering," time complexity = O(b^(m/2))
=> doubles the reachable depth of search

Why is it called α-β?
α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX
If v is worse than α, MAX will avoid it
=> prune that branch

β is defined similarly for MIN

The α-β pruning algorithm
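Since the algorithm figure was lost, here is a minimal Python sketch of α-β search under the same assumed game interface as before; for simplicity each root move is searched with fresh bounds, which is correct but prunes slightly less than threading α across root moves:

def alpha_beta_search(state, game):
    return max(game.actions(state),
               key=lambda a: min_val(game.result(state, a), game,
                                     float('-inf'), float('inf')))

def max_val(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('-inf')
    for a in game.actions(state):
        v = max(v, min_val(game.result(state, a), game, alpha, beta))
        if v >= beta:
            return v              # prune: MIN above never lets this happen
        alpha = max(alpha, v)
    return v

def min_val(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('inf')
    for a in game.actions(state):
        v = min(v, max_val(game.result(state, a), game, alpha, beta))
        if v <= alpha:
            return v              # prune: MAX above never lets this happen
        beta = min(beta, v)
    return v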

Resource limits
In reality, imperfect, real-time decisions are required
Suppose we have 100 secs and explore 10^4 nodes/sec
=> 10^6 nodes per move

Standard approach:
cutoff test:
e.g., depth limit

evaluation function
= estimated desirability of position

Evaluation functions
For chess, typically a linear weighted sum of features
Eval(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
e.g., w_1 = 9 for queen, w_2 = 5 for rook, w_n = 1 for pawn
f_1(s) = (number of white queens) - (number of black queens), etc.
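As a sketch, the weighted sum is one line of Python; the feature helpers (queen_diff, etc.) are hypothetical, not defined on the slide:

def linear_eval(s, weights, features):
    """Eval(s) = w_1*f_1(s) + ... + w_n*f_n(s)."""
    return sum(w * f(s) for w, f in zip(weights, features))

# e.g., material balance with the slide's weights:
#   linear_eval(s, [9, 5, 1], [queen_diff, rook_diff, pawn_diff])
# where queen_diff(s) = (number of white queens) - (number of black queens)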

Cutting off search


MinimaxCutoff is identical to MinimaxValue except:
1. Terminal-Test is replaced by Cutoff-Test
2. Utility is replaced by Eval
(a sketch follows below)
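A sketch of these two substitutions applied to minimax; cutoff_test and eval_fn are supplied by the caller, e.g. a depth limit plus the evaluation function above:

def h_minimax(state, game, depth, cutoff_test, eval_fn, maximizing=True):
    """MinimaxValue with Terminal-Test -> Cutoff-Test, Utility -> Eval."""
    if cutoff_test(state, depth):   # e.g. depth == 0 or a terminal state
        return eval_fn(state)
    values = [h_minimax(game.result(state, a), game, depth - 1,
                        cutoff_test, eval_fn, not maximizing)
              for a in game.actions(state)]
    return max(values) if maximizing else min(values)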

Does it work in practice?


b^m = 10^6, b = 35 => m ≈ 4
4-ply lookahead is a hopeless chess player!

4-ply ≈ human novice
8-ply ≈ typical PC, human master
12-ply ≈ Deep Blue, Kasparov

Games that include chance

[Figure: backgammon board and game tree with chance nodes]

Backgammon: move all of one's pieces off the board
Branches leading from each chance node denote the possible dice rolls
Each branch is labeled with the roll and its probability

Games that include chance

The doubles [1,1] through [6,6] each have probability 1/36; all other rolls have probability 1/18
Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16)

We cannot calculate a definite minimax value, only an expected value

Expected minimax value


EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n), if n is a terminal state
  max over s in successors(n) of EXPECTED-MINIMAX-VALUE(s), if n is a MAX node
  min over s in successors(n) of EXPECTED-MINIMAX-VALUE(s), if n is a MIN node
  sum over s in successors(n) of P(s) * EXPECTED-MINIMAX-VALUE(s), if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.
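A recursive Python sketch of these equations; the node-classification methods (is_max_node, is_min_node, probability, successors) are an assumed interface for illustration:

def expected_minimax(n, game):
    if game.terminal_test(n):
        return game.utility(n)
    succ = game.successors(n)
    if game.is_max_node(n):
        return max(expected_minimax(s, game) for s in succ)
    if game.is_min_node(n):
        return min(expected_minimax(s, game) for s in succ)
    # chance node: probability-weighted average of successor values
    return sum(game.probability(s) * expected_minimax(s, game) for s in succ)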

Position evaluation with chance nodes

[Figure: two game trees; on the left A1 is the best move, on the right A2 is best]
The outcome of the evaluation function (and hence the agent's behavior) may change when values are scaled differently (see the numeric check below).
Behavior is preserved only under a positive linear transformation of EVAL.
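A quick numeric illustration with hypothetical leaf values: an order-preserving but nonlinear rescaling can flip the preferred move at a chance node.

def expected(leaves):
    """Chance node with equally likely outcomes: plain average."""
    return sum(leaves) / len(leaves)

a1, a2 = [3, 3], [1, 4]                 # hypothetical leaf evaluations
print(expected(a1), expected(a2))       # 3.0 vs 2.5 -> A1 preferred

g = lambda x: 2 ** x                    # monotone, but not linear
print(expected([g(x) for x in a1]),     # 8.0
      expected([g(x) for x in a2]))     # 9.0 -> A2 now preferred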
