
AI in Games (IV)

Oct. 11, 2006

So far
Artificial Intelligence: A Modern Approach
Stuart Russell and Peter Norvig
Prentice Hall, 2nd ed.
Chapter 1: AI taxonomy
Chapter 2: agents
Chapter 3: uninformed search
Chapter 4: informed search

From now on
Artificial Intelligence: A Modern Approach
Chapter 4: online search (remainder)
Chapter 6: adversarial search

Network part
Learning (maybe from the same textbook)
Game AI techniques

Outline
Ch 4. informed search
Online search

Ch 6. adversarial search
Optimal decisions
α-β pruning
Imperfect, real-time decisions

Offline search vs. online search


Offline search agents
Compute a solution before setting foot in the real world

Online search agents


Interleave computation and action
E.g., take an action, observe the environment, then compute the next action

Necessary for an exploration problem


States and actions are unknown
E.g., a robot in a new building, or a labyrinth

Online search problems

The agent is assumed to know only:
Actions(s): returns a list of the actions allowed in state s
c(s, a, s′): this step cost cannot be used until the agent knows that s′ is the outcome
Goal-Test(s)

The agent cannot access the successors of a state except by actually trying all the actions in that state

Assumptions:
The agent can recognize a state that it has visited before
Actions are deterministic
Optionally, an admissible heuristic function is available
(a minimal interface sketch follows below)
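To make this interface concrete, here is a minimal Python sketch of what an online search agent may query; the class and method names (OnlineSearchProblem, actions, step_cost, goal_test, h) are illustrative, not from the textbook:

class OnlineSearchProblem:
    """What an online search agent is allowed to query."""
    def actions(self, s):
        """Return the list of actions allowed in state s."""
        raise NotImplementedError
    def step_cost(self, s, a, s1):
        """c(s, a, s'): usable only after s' has been observed as the outcome."""
        raise NotImplementedError
    def goal_test(self, s):
        """GOAL-TEST(s)."""
        raise NotImplementedError
    def h(self, s):
        """Optional admissible heuristic estimate of the cost to the goal."""
        raise NotImplementedError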

Online search problems

If some actions are irreversible, the agent may reach a dead end.
If some goal state is reachable from every reachable state, the state space is safely explorable.

Online search agents

An online algorithm can expand only a node that it physically occupies.
Offline algorithms can expand any node in the fringe.

Same principle as DFS.

Online DFS
function ONLINE-DFS-AGENT(s′) returns an action
input: s′, a percept identifying the current state
static: result, a table of the next state, indexed by action and state, initially empty
unexplored, a table that lists, for each visited state, the actions not yet tried
unbacktracked, a table that lists, for each visited state, the predecessor states to which the agent has not yet backtracked
s, a, the previous state and action, initially null

if GOAL-TEST(s′) then return stop
if s′ is a new state then unexplored[s′] ← ACTIONS(s′)
if s is not null then do
    result[a, s] ← s′
    add s to the front of unbacktracked[s′]
if unexplored[s′] is empty then
    if unbacktracked[s′] is empty then return stop
    else a ← an action b such that result[b, s′] = POP(unbacktracked[s′])
else a ← POP(unexplored[s′])
s ← s′
return a

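Below is a runnable Python sketch of this agent, following the pseudocode above; the illustrative OnlineSearchProblem interface from the earlier slide is assumed, and None stands in for the stop signal:

class OnlineDFSAgent:
    def __init__(self, problem):
        self.problem = problem
        self.result = {}         # (state, action) -> observed next state
        self.unexplored = {}     # state -> actions not yet tried there
        self.unbacktracked = {}  # state -> predecessors not yet backtracked to
        self.s = None            # previous state
        self.a = None            # previous action

    def __call__(self, s1):      # s1 plays the role of s' in the pseudocode
        if self.problem.goal_test(s1):
            return None          # stop
        if s1 not in self.unexplored:    # s' is a new state
            self.unexplored[s1] = list(self.problem.actions(s1))
        if self.s is not None:
            self.result[(self.s, self.a)] = s1
            self.unbacktracked.setdefault(s1, []).insert(0, self.s)
        if not self.unexplored[s1]:
            if not self.unbacktracked.get(s1):
                return None      # stop: nowhere left to explore or backtrack
            prev = self.unbacktracked[s1].pop(0)
            # the action b with result[b, s'] = prev; it exists because
            # actions are assumed reversible and prev was reached from here
            self.a = next(b for b in self.problem.actions(s1)
                          if self.result.get((s1, b)) == prev)
        else:
            self.a = self.unexplored[s1].pop()
        self.s = s1
        return self.a

Usage: build the agent once, then at every step call agent(current_state) and execute the returned action in the environment; None signals stop.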
Online DFS, example

Assume a maze problem on a 3x3 grid.
s′ = (1,1) is the initial state.
result, unexplored (UX), unbacktracked (UB) are empty.
s, a are also null.

Online DFS, example

s′ = (1,1)

GOAL-TEST((1,1))?
(1,1) ≠ G, thus false

(1,1) a new state?
True
UX[(1,1)] ← ACTIONS((1,1)) = {RIGHT, UP}

s is null?
True (initially)

UX[(1,1)] empty?
False

a ← POP(UX[(1,1)])
a = UP

s ← (1,1)
Return a

Online DFS, example

s′ = (1,2)

GOAL-TEST((1,2))?
(1,2) ≠ G, thus false

(1,2) a new state?
True
UX[(1,2)] ← ACTIONS((1,2)) = {DOWN}

s is null?
False (s = (1,1))
result[UP, (1,1)] ← (1,2)
UB[(1,2)] = {(1,1)}

UX[(1,2)] empty?
False

a = DOWN, s ← (1,2)
Return a

Online DFS, example

s′ = (1,1)

GOAL-TEST((1,1))?
(1,1) ≠ G, thus false

(1,1) a new state?
False

s is null?
False (s = (1,2))
result[DOWN, (1,2)] ← (1,1)
UB[(1,1)] = {(1,2)}

UX[(1,1)] empty?
False

a = RIGHT, s ← (1,1)
Return a

Online DFS, example

s′ = (2,1)

GOAL-TEST((2,1))?
(2,1) ≠ G, thus false

(2,1) a new state?
True
UX[(2,1)] ← ACTIONS((2,1)) = {RIGHT, UP, LEFT}

s is null?
False (s = (1,1))
result[RIGHT, (1,1)] ← (2,1)
UB[(2,1)] = {(1,1)}

UX[(2,1)] empty?
False

a = LEFT, s ← (2,1)
Return a

Online DFS, example

s′ = (1,1)

GOAL-TEST((1,1))?
(1,1) ≠ G, thus false

(1,1) a new state?
False

s is null?
False (s = (2,1))
result[LEFT, (2,1)] ← (1,1)
UB[(1,1)] = {(2,1), (1,2)}

UX[(1,1)] empty?
True
UB[(1,1)] empty? False

a ← an action b such that result[b, (1,1)] = POP(UB[(1,1)]) = (2,1)
b = RIGHT

a = RIGHT, s ← (1,1)
Return a
And so on...

Online DFS

In the worst case each node is visited twice.
An agent can go on a long walk even when it is close to the solution.
An online iterative deepening approach solves this problem.
Online DFS works only when actions are reversible.

Online local search

Hill-climbing is already online
Only one state is stored.
Bad performance due to local maxima
Random restarts are impossible.
Solution 1: a random walk introduces exploration
Selects one of the available actions at random, with preference for not-yet-tried actions (see the sketch below)
But can take exponentially many steps
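A minimal sketch of that selection rule (the function and variable names are illustrative):

import random

def random_walk_action(s, actions, tried):
    """Pick an untried action in state s if any remain, else any action.
    `tried` maps each state to the set of actions already taken there."""
    tried.setdefault(s, set())
    untried = [a for a in actions(s) if a not in tried[s]]
    a = random.choice(untried if untried else actions(s))
    tried[s].add(a)
    return a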

Online local search

Solution 2: add memory to the hill climber
Store the current best estimate H(s) of the cost to reach the goal
H(s) is initially the heuristic estimate h(s)
Afterward updated with experience (see below)

Learning real-time A* (LRTA*)

[Figure: LRTA* state-space example; the arrow marks the current position of the agent]

Learning real-time A* (LRTA*)

function LRTA*-COST(s, a, s′, H) returns a cost estimate
if s′ is undefined then return h(s)
else return c(s, a, s′) + H[s′]

function LRTA*-AGENT(s′) returns an action
input: s′, a percept identifying the current state
static: result, a table of the next state, indexed by action and state, initially empty
H, a table of cost estimates indexed by state, initially empty
s, a, the previous state and action, initially null

if GOAL-TEST(s′) then return stop
if s′ is a new state (not in H) then H[s′] ← h(s′)
unless s is null
    result[a, s] ← s′
    H[s] ← min over b in ACTIONS(s) of LRTA*-COST(s, b, result[b, s], H)
a ← an action b in ACTIONS(s′) that minimizes LRTA*-COST(s′, b, result[b, s′], H)
s ← s′
return a
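A runnable Python sketch of LRTA*, mirroring the pseudocode above and assuming the same illustrative OnlineSearchProblem interface (including an h method):

class LRTAStarAgent:
    def __init__(self, problem):
        self.problem = problem
        self.result = {}   # (state, action) -> observed next state
        self.H = {}        # state -> current best cost-to-goal estimate
        self.s = None      # previous state
        self.a = None      # previous action

    def cost(self, s, a, s1):
        """LRTA*-COST: optimistic h(s) if (s, a) has never been tried."""
        if s1 is None:
            return self.problem.h(s)
        return self.problem.step_cost(s, a, s1) + self.H[s1]

    def __call__(self, s1):
        if self.problem.goal_test(s1):
            return None    # stop
        if s1 not in self.H:
            self.H[s1] = self.problem.h(s1)
        if self.s is not None:
            self.result[(self.s, self.a)] = s1
            # update H[s] with what the last step taught us
            self.H[self.s] = min(
                self.cost(self.s, b, self.result.get((self.s, b)))
                for b in self.problem.actions(self.s))
        self.a = min(self.problem.actions(s1),
                     key=lambda b: self.cost(s1, b, self.result.get((s1, b))))
        self.s = s1
        return self.a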

Outline
Ch 4. informed search
Ch 6. adversarial search
Optimal decisions
α-β pruning
Imperfect, real-time decisions

Games vs. search problems

The problem-solving agent is no longer alone
Multiagent, conflict

Default: a deterministic, turn-taking, two-player, zero-sum game of perfect information
Perfect information vs. imperfect information, or games of chance

An "unpredictable" opponent means a solution must specify a move for every possible opponent reply
Time limits mean we are unlikely to find the goal, so we must approximate

* Environments with very many agents are best viewed as economies rather than games

Game formalization

Initial state
A successor function
Returns a list of (move, state) pairs

Terminal test
Identifies terminal states

Utility function (or objective function)
Gives a numeric value for the terminal states

Game tree
The state space

Tic-tac-toe: Game tree
(2-player, deterministic, turns)

Minimax
Perfect play for deterministic games: the optimal strategy
Idea: choose the move to the position with the highest minimax value
= best achievable payoff against best play
E.g., a 2-ply game: only two half-moves

Minimax algorithm
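The algorithm figure did not survive extraction, so here is a minimal recursive Python sketch of minimax; the game interface (actions, result, terminal_test, utility) is an assumption for illustration, not prescribed by the slides:

def minimax_decision(state, game):
    """Choose MAX's move with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return max(min_value(game.result(state, a), game)
               for a in game.actions(state))

def min_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return min(max_value(game.result(state, a), game)
               for a in game.actions(state))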

Problem of minimax search

The number of game states is exponential in the number of moves.
Solution: do not examine every node
==> Alpha-beta pruning
Remove branches that do not influence the final decision

Revisit example

Alpha-Beta Example
Do DF-search until the first leaf
Range of possible values

[-∞, +∞]

[-∞, +∞]

Alpha-Beta Example (continued)

[-∞, +∞]

[-∞, 3]

Alpha-Beta Example (continued)

[-∞, +∞]

[-∞, 3]

Alpha-Beta Example (continued)


[3, +∞]

[3, 3]

Alpha-Beta Example (continued)


[3, +∞]
This node is worse for MAX

[3, 3]

[-∞, 2]

Alpha-Beta Example (continued)

[3, 14]

[3, 3]

[-∞, 2]

[-∞, 14]

Alpha-Beta Example (continued)

[3, 5]

[3, 3]

[-∞, 2]

[-∞, 5]

Alpha-Beta Example (continued)

[3, 3]

[3, 3]

[-∞, 2]

[2, 2]

Alpha-Beta Example (continued)

[3, 3]

[3, 3]

[-∞, 2]

[2, 2]

Properties of α-β
Pruning does not affect the final result
Good move ordering improves the effectiveness of pruning
With "perfect ordering," time complexity = O(b^(m/2))
=> doubles the reachable depth of search

Why is it called α-β?
α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX
If v is worse than α, MAX will avoid it
=> prune that branch

β is defined similarly for MIN

The α-β pruning algorithm
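Since the algorithm figure was lost, here is a minimal Python sketch of α-β search under the same assumed game interface as before; for simplicity each root move is searched with fresh bounds, which is correct but prunes slightly less than threading α across root moves:

def alpha_beta_search(state, game):
    return max(game.actions(state),
               key=lambda a: min_val(game.result(state, a), game,
                                     float('-inf'), float('inf')))

def max_val(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('-inf')
    for a in game.actions(state):
        v = max(v, min_val(game.result(state, a), game, alpha, beta))
        if v >= beta:
            return v              # prune: MIN above never lets this happen
        alpha = max(alpha, v)
    return v

def min_val(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('inf')
    for a in game.actions(state):
        v = min(v, max_val(game.result(state, a), game, alpha, beta))
        if v <= alpha:
            return v              # prune: MAX above never lets this happen
        beta = min(beta, v)
    return v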

Resource limits
In reality, imperfect, real-time decisions are required
Suppose we have 100 secs and explore 10^4 nodes/sec
=> 10^6 nodes per move

Standard approach:
cutoff test:
e.g., depth limit

evaluation function
= estimated desirability of position

Evaluation functions
For chess, typically a linear weighted sum of features
Eval(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
e.g., w_1 = 9 for queen, w_2 = 5 for rook, w_n = 1 for pawn
f_1(s) = (number of white queens) - (number of black queens), etc.
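As a sketch, the weighted sum is one line of Python; the feature helpers (queen_diff, etc.) are hypothetical, not defined on the slide:

def linear_eval(s, weights, features):
    """Eval(s) = w_1*f_1(s) + ... + w_n*f_n(s)."""
    return sum(w * f(s) for w, f in zip(weights, features))

# e.g., material balance with the slide's weights:
#   linear_eval(s, [9, 5, 1], [queen_diff, rook_diff, pawn_diff])
# where queen_diff(s) = (number of white queens) - (number of black queens)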

Cutting off search


MinimaxCutoff is identical to MinimaxValue except:
1. Terminal-Test is replaced by Cutoff-Test
2. Utility is replaced by Eval
(a sketch follows below)
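A sketch of these two substitutions applied to minimax; cutoff_test and eval_fn are supplied by the caller, e.g. a depth limit plus the evaluation function above:

def h_minimax(state, game, depth, cutoff_test, eval_fn, maximizing=True):
    """MinimaxValue with Terminal-Test -> Cutoff-Test, Utility -> Eval."""
    if cutoff_test(state, depth):   # e.g. depth == 0 or a terminal state
        return eval_fn(state)
    values = [h_minimax(game.result(state, a), game, depth - 1,
                        cutoff_test, eval_fn, not maximizing)
              for a in game.actions(state)]
    return max(values) if maximizing else min(values)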

Does it work in practice?


b^m = 10^6, b = 35 => m ≈ 4
4-ply lookahead is a hopeless chess player!

4-ply ≈ human novice
8-ply ≈ typical PC, human master
12-ply ≈ Deep Blue, Kasparov

Games that include chance

[Figure: backgammon board and game tree with chance nodes]

Backgammon: move all of one's pieces off the board
Branches leading from each chance node denote the possible dice rolls
Each branch is labeled with the roll and its probability

Games that include chance

The doubles [1,1] through [6,6] each have probability 1/36; all other rolls have probability 1/18
Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16)

We cannot calculate a definite minimax value, only an expected value

Expected minimax value


EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n), if n is a terminal state
  max over s in successors(n) of EXPECTED-MINIMAX-VALUE(s), if n is a MAX node
  min over s in successors(n) of EXPECTED-MINIMAX-VALUE(s), if n is a MIN node
  sum over s in successors(n) of P(s) * EXPECTED-MINIMAX-VALUE(s), if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.
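A recursive Python sketch of these equations; the node-classification methods (is_max_node, is_min_node, probability, successors) are an assumed interface for illustration:

def expected_minimax(n, game):
    if game.terminal_test(n):
        return game.utility(n)
    succ = game.successors(n)
    if game.is_max_node(n):
        return max(expected_minimax(s, game) for s in succ)
    if game.is_min_node(n):
        return min(expected_minimax(s, game) for s in succ)
    # chance node: probability-weighted average of successor values
    return sum(game.probability(s) * expected_minimax(s, game) for s in succ)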

Position evaluation with chance nodes

[Figure: two game trees; on the left A1 is the best move, on the right A2 is best]
The outcome of the evaluation function (and hence the agent's behavior) may change when values are scaled differently (see the numeric check below).
Behavior is preserved only under a positive linear transformation of EVAL.
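A quick numeric illustration with hypothetical leaf values: an order-preserving but nonlinear rescaling can flip the preferred move at a chance node.

def expected(leaves):
    """Chance node with equally likely outcomes: plain average."""
    return sum(leaves) / len(leaves)

a1, a2 = [3, 3], [1, 4]                 # hypothetical leaf evaluations
print(expected(a1), expected(a2))       # 3.0 vs 2.5 -> A1 preferred

g = lambda x: 2 ** x                    # monotone, but not linear
print(expected([g(x) for x in a1]),     # 8.0
      expected([g(x) for x in a2]))     # 9.0 -> A2 now preferred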
