# Extensive-form games

I recommend you first read the fine Wikipedia entry on extensive-form games. We will ignore the "infinite action space" case, and we will only later get to games of incomplete information.

An extensive-form game is a tree-like representation of a game. We'll start with games of complete information like chess or checkers, where there are no chance moves. Later, we'll get to games of incomplete information, like poker, etc. Formally, it consists of:

• A number of players $N \geq 1$
• A tree consisting of set of nodes $V$, the root is $v_0 \in V$, and the set of edges $E \subset V^2$. The tree can be partitioned into leaves (nodes with no children) and internal nodes (nodes with at least one child). Each edge $e \in E$ has a label $\ell_e$, and no two edges leaving the same internal node can have the same label.
• A payoff function (or utility function) $u:L \rightarrow \mathbb{R}^N$ assigning a payoff to each player, where $L$ is the set of leaves of the tree.
• A partition of the internal nodes into sets $S_1 \cup \ldots \cup S_N$, where in set $S_i$, player $i$ is the one that moves.
• A partition of each $S_i$ into information sets $S^j_i$. All nodes in the same information set must have the same set of labels on the edges to their children (and hence the same number of children).

The decision set of an information set is the set of labels on edges to its children. The idea is that, starting at the root, the player that decides at each node chooses one outgoing edge. From the player's perspective, however, she only knows which information set the node belongs to.

A player $i$ is said to have perfect information if each of her information sets has size 1. A game is one of perfect information if each player has perfect information.
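As a concrete illustration, here is one way a small game of this kind might be encoded in code. This is only a sketch: the nested-dict layout and field names are my own choices, not part of the formal definition. Since the game below has perfect information, every information set is a singleton, so information sets are identified with nodes and omitted.

```python
# A tiny two-player extensive-form game of perfect information, encoded
# as a nested dict. Internal nodes record which player moves and map
# edge labels to children; leaves record the payoff vector (one entry
# per player).
game = {
    "player": 1,                          # player 1 moves at the root
    "children": {
        "L": {
            "player": 2,                  # player 2 moves here
            "children": {
                "a": {"payoff": (3, -3)},  # leaves carry payoffs
                "b": {"payoff": (0, 0)},
            },
        },
        "R": {"payoff": (1, -1)},
    },
}

def leaves(node):
    """Collect the payoff vectors at all leaves of the tree."""
    if "payoff" in node:
        return [node["payoff"]]
    return [p for child in node["children"].values() for p in leaves(child)]

print(leaves(game))  # the three leaves of this toy game
```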

### Pure, mixed, and behavioral strategies

A pure strategy is a function from information sets to decisions, where the output decision must belong to the input information set's decision set. If $s=(s_1,s_2,\ldots,s_N)$ is a vector of pure strategies, there is a well-defined payoff vector, which we denote by $u(s)\in \mathbb{R}^N$, assigning a payoff to each player. It is defined as the payoff to each player when each player follows her strategy. It is an easy exercise to see that this vector is unique and well-defined.
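The evaluation of $u(s)$ can be sketched as a walk from the root, where at each internal node the moving player's strategy supplies the decision for that node's information set. The dict encoding and field names below are illustrative assumptions, not from the text.

```python
# A small game where each internal node carries a player, an information
# set id, and labeled edges to children; leaves carry payoff vectors.
game = {
    "player": 1, "infoset": "I1",
    "children": {
        "L": {
            "player": 2, "infoset": "I2",
            "children": {"a": {"payoff": (3, -3)}, "b": {"payoff": (0, 0)}},
        },
        "R": {"payoff": (1, -1)},
    },
}

def payoff(node, strategies):
    """Evaluate u(s): strategies[i] maps player i's information sets to
    decisions; follow the chosen edges from the root to a leaf."""
    while "payoff" not in node:
        decision = strategies[node["player"]][node["infoset"]]
        node = node["children"][decision]
    return node["payoff"]

s = {1: {"I1": "L"}, 2: {"I2": "b"}}   # a pure-strategy profile
print(payoff(game, s))
```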

The number of pure strategies for player $i$ may be huge: it is the product of the sizes of the decision sets of player $i$'s information sets, which may be exponential in the size of the tree. A mixed strategy is a probability distribution over pure strategies. The payoff of any pair of mixed strategies is defined to be the expected payoff of the pair of realized pure strategies, where the pure strategies are drawn independently from the corresponding distributions. This representation is a bit tedious to use, because it is a distribution over a possibly exponentially large set.
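For instance, counting pure strategies is just taking this product. The information-set names and decision-set sizes below are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import prod

# Sizes of the decision sets at three hypothetical information sets
# belonging to one player.
decision_set_sizes = {"I1": 2, "I2": 3, "I3": 2}

# One pure strategy = one decision per information set, so the count is
# the product of the decision-set sizes.
num_pure_strategies = prod(decision_set_sizes.values())
print(num_pure_strategies)  # 2 * 3 * 2 = 12
```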

An appealing alternative is called a behavioral strategy. A behavioral strategy for player $i$ is a vector of probability distributions, one distribution over the decision set of each of player $i$'s information sets. A behavioral strategy is a representation for the following mixed strategy:

• It is a probability distribution over pure strategies, where the choices at each information set are chosen independently according to the corresponding distributions.

This implies a definition for the payoff of any pair of behavioral or mixed strategies.
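The expected payoff under behavioral strategies can be computed directly by a recursion over the tree, weighting each child's value by the probability that its label is chosen. As before, the dict encoding is an assumed sketch, not a prescribed format.

```python
# Same toy tree as above: internal nodes carry a player, an information
# set id, and labeled children; leaves carry payoff vectors.
game = {
    "player": 1, "infoset": "I1",
    "children": {
        "L": {
            "player": 2, "infoset": "I2",
            "children": {"a": {"payoff": (3, -3)}, "b": {"payoff": (0, 0)}},
        },
        "R": {"payoff": (1, -1)},
    },
}

def expected_payoff(node, behavior):
    """behavior[i][infoset] is a distribution over that information
    set's decision set; recurse, weighting children by their
    probabilities."""
    if "payoff" in node:
        return node["payoff"]
    dist = behavior[node["player"]][node["infoset"]]
    total = (0.0,) * len(next(iter(node["children"].values())) and (0, 0))
    total = (0.0, 0.0)
    for label, child in node["children"].items():
        sub = expected_payoff(child, behavior)
        total = tuple(t + dist[label] * v for t, v in zip(total, sub))
    return total

b = {1: {"I1": {"L": 0.5, "R": 0.5}},
     2: {"I2": {"a": 0.25, "b": 0.75}}}
print(expected_payoff(game, b))
```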

### Perfect recall and Kuhn's theorem

Player $i$ has perfect recall if she can remember all of her previous decisions. To make this formal, consider every pair of nodes in the same information set. Consider the two sequences of nodes on the paths from these nodes to the root. Remove any nodes where player $i$ was not the one making the decision. Then the two sequences must have the same length, and corresponding nodes in the two sequences must belong to the same information sets. A player has imperfect recall if this is not the case, i.e., if there exists a pair of nodes for which the above fails. A game is one of perfect recall if every player has perfect recall.

In a game of perfect recall, Kuhn's theorem says that every mixed strategy $s_i$ for player $i$ is "equivalent" to a behavioral strategy $\sigma_i$ in the sense that the payoffs of $s_i,s_{-i}$ to all players are the same as those of $\sigma_i,s_{-i}$.

### "Reduction" to normal-form

It is not hard to see that any extensive-form game (even with incomplete information) can be converted to a normal-form game. The way to do this is simply to construct the normal-form game with $N$ players where each action set $A_i$ consists of the set of pure strategies of player $i$ in the extensive-form game, and the payoff to player $i$ for action profile $a=(s_1,\ldots,s_N)$ is the payoff in the extensive form when each player $i$ uses strategy $s_i$ (note that this is well-defined).

One key problem with this "reduction" is one of representation size. The number of actions $|A_i|$ may be exponential in the size of the extensive-form game.

Also note that any normal-form game can be represented as an extensive-form game, without a significant blowup in representation size.

### Zero-sum extensive-form games

An extensive-form game is zero-sum if the sum of payoffs is always zero. How can one solve for optimal play in a two-person zero-sum extensive-form game? How about one of perfect information?

One way to do it is to convert it to a normal-form game, which remains zero-sum, and then find the value and optimal mixed strategies there.

### Backward induction

For a game of perfect information, this is an iterative process where one figures out what will happen in the game by starting at the leaves, and working up, determining a (hopefully) unique payoff for each internal node.

Fact. The value of a two-person zero-sum extensive-form game of perfect information can be computed in polynomial time.

(Polynomial time means time polynomial in the size of the representation, i.e., size of the tree.)

Proof. The proof is easy. One simply does a depth-first traversal of the tree, where we assign a value to each node. Each leaf is assigned the value which is the payoff to player 1. Each internal node where player 1 decides is given the maximum of its children's values. Each internal node where player 2 decides is given the minimum of its children's values. Each node is visited a constant number of times.
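The proof's traversal can be sketched as follows. The dict encoding of the tree is an illustrative assumption; since the game is zero-sum, leaves store only the payoff to player 1.

```python
def value(node):
    """Backward induction for a two-person zero-sum game of perfect
    information: player 1 maximizes, player 2 minimizes, the value of
    player 1's payoff."""
    if "payoff" in node:
        return node["payoff"]            # payoff to player 1 at a leaf
    child_values = [value(c) for c in node["children"].values()]
    return max(child_values) if node["player"] == 1 else min(child_values)

# A toy game tree: player 1 moves at the root, player 2 after "L".
game = {
    "player": 1,
    "children": {
        "L": {
            "player": 2,
            "children": {"a": {"payoff": 3}, "b": {"payoff": 0}},
        },
        "R": {"payoff": 1},
    },
}
print(value(game))  # player 2 would pick "b" (value 0), so player 1 picks "R"
```

Each node is visited once, so this runs in time linear in the size of the tree.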

The production of this material was supported in part by NSF award SES-0734780.