Content
- Game Overview
- Game Properties
- Two-Player Zero-Sum Game
- Definition
- Intuition
- MiniMax Strategy
- Real-Time/Online search
Game Overview
- Generalization of Search Problems
- Game tree: actions of >= 1 players/agents
- Agents acting to maximize their profits
- Others' profits might not have positive effect on yours
- Key Features of Game
- Each player has different goal
- Different paths/states assigned different costs
- Each player tries to alter the world to best benefit itself
Game Properties
- Two-player
- Discrete: game states or decision can be mapped on discrete values
- Finite: a finite # of states/decisions can be made
- Zero-sum (Fully Competitive): A gains = B loses
- Strategic/normal form game: one shot e.g. rock-paper-scissors
- Extensive form game: turn-taking, multiple moves
- Deterministic: no chance involved
- Perfect Information: all aspects of the state are fully observable
Two-Player Zero-Sum Game
Definition
- 2 players:
A
(Max),B
(Min) - States
S
- Initial state
I
- Terminal positions
T
- Successor functions
- Input: a state
- Return: a set of possible next states
- Utility/Payoff function
V: T -> R
- Mapping showing how good each terminal state is for A/how bad for B
Intuition
- Game ends when terminal
t ∊ T
reached - Game state: state-player pair
A
getsV(t)
,B
gets-V(t)
MiniMax Strategy
- Build full game tree
- Root: start state
- Edges: possible moves
- Leaves: terminals (utilities U(t) labeled)
Back values up the tree
U(n) = min{ U(c): c is a child of n if n is Min node }
U(n) = max{ U(c): c is a child of n if n is Max node }
O(b^d)
space
Depth-First Implementation
- To avoid expanding the tree exponentially in size
- Need finite depth
DFMiniMax(n, Player): //return Utility of state n given that Player is MIN or MAX
If n is TERMINAL:
Return V(n) //Return terminal states utility (V is specified as part of game)
//Apply Player s moves to get successor states.
ChildList = n.Successors(Player)
If Player == MIN:
return minimum of DFMiniMax(c, MAX) over c ∈ ChildList
Else: //Player is MAX
return maximum of DFMiniMax(c, MIN) over c ∈ ChildList
Pruning
- α-cuts(max node) & β-cuts(min node)
AlphaBeta(n,Player,alpha,beta): //return Utility of state
If n is TERMINAL:
return V(n) //Return terminal states utility
ChildList = n.Successors(Player)
If Player == MAX:
v = -infinity
for c in ChildList:
v = max(v, AlphaBeta(c,MIN,alpha,beta))
alpha = max(v, alpha)
If beta <= alpha:
break
return v
Else: //Player == MIN
v = infinity
for c in ChildList:
v = min(v, AlphaBeta(c,MAX,alpha,beta))
beta = min(v, beta)
If beta <= alpha:
break
return v
// Initial call: AlphaBeta(START_NODE, PLAYER, -infinity, infinity)
O(b^d)
space if no pruning,O(b^(d/2))
if optimal pruning- Branching factor of 1st layer: B; 2nd: 1; 3rd: B; ...
- In practice, must make heuristic estimates of the terminal nodes -> evaluation function
Real-Time/Online search
- Run A* until out of memory
- Use evaluation function to decide which path looks best
- Make the move
- Restart search at the node reached (can be cached)