A proof of the min-max theorem

From Theory

Jump to: navigation, search

Notes for CS 8803 - Game Theory and Computer Science. Spring 2008


In this part of the notes, we prove the Min-max theorem using the Weighted-majority algorithm (WM) and its guarantees.

We consider a finite two-person zero-sum game G with actions sets A_i=\{1,2,\ldots,n_i\} and payoff (utility) functions u_i:A_1 \times A_2 \rightarrow [-M,M], where M \geq 0 is an upper bound on the payoffs. (We can always assume that there is some upper-bound on the payoffs in a finite game, M=\max_{i,a_1,a_2} u_i(a_1,a_2).)

The value of the game is easy to write down mathematically if one of the players moves first (first announces her mixed strategy) and the other player goes second. The max-min value v_i = \max_{\sigma_i \in \Delta_i} \min_{\sigma_{-i} \in \Delta_{-i}} u_i(\sigma_i,\sigma_{-i}), is how much player i can guarantee if she has to go first, while the min-max value \bar{v}_i = \min_{\sigma_{-i} \in \Delta_{-i}} \max_{\sigma_i \in \Delta_i} u_i(\sigma_i,\sigma_{-i}), is how much player i can guarantee if she gets to go second. So, it is not hard to see that v_i \leq \bar{v}_i, since it is an advantage to go second. Also, \bar{v}_i=-v_{-i} because what player i can guarantee going second is the opposite of what her opponent can guarantee going first.

OK, now the proof

To prove the min-max theorem, i.e., that v_i=\bar{v}_i, or equivalently, v1 + v2 = 0, we consider playing a repeated game. Suppose we are given any game G with action sets Ai, mixed strategies Δi, and payoffs (utilities) u_i : \times_j A_j \rightarrow \reals (extended to u_i : \times_j \Delta_j \rightarrow \reals).

Consider the following mixed-strategy repeated game Δ(G)T. In this game, each player i chooses a mixed strategy \sigma^t_i\in\Delta_i each period t. For simplicity of the proof, we assume that this mixed strategy is announced to the opponent. (When we actually get into repeated games, we will consider the more natural setting in which a random action is chosen from the mixed strategy and the opponent only observes the action, not the mixed strategy.)

A strategy in this game is a function f_i:H \rightarrow \Delta_i, where H=\cup_{t=0}^{T-1}\Delta^t is the set of histories, saying what to do after each history of length < t. The payoff to player i in the repeated game is simply,

\frac{1}{T}\sum_{t=1}^T u_i(h^{t+1}), where h1 = (),ht + 1 = ht,(f1(ht),f2(ht)).
Lemma. Suppose that player i runs WM with parameter \epsilon = \frac{1}{M}\sqrt{\frac{\log n}{T}} to choose her mixed strategies in Δ(G)T. Then the average payoff to player i is at least \bar{v}_i-2M\sqrt{\frac{2 \log(n)}{T}}.

Proof. The average payoff of player i is,

\frac{1}{T}\sum_{t=1}^T u_i(\sigma_i^t,\sigma_{-i}^t).

The best she could have done had she used a fixed single strategy in hindsight would be,

\max_{\sigma_i \in \Delta_i}\frac{1}{T}\sum_{t=1}^T u_i(\sigma_i,\sigma_{-i}^t)  = \max_{\sigma_i \in \Delta_i} u_i(\sigma_i,\frac{1}{T}\sum_{t=1}^T \sigma_{-i}^t) \geq \bar{v}_i.

The reason the equality holds is because utility is linear in both of its arguments, i.e., randomly choosing between T mixed strategies is equivalent to using the average mixed strategy. The reason the inequality above holds is that she must get at least how much she can guarantee if going second, because she has the benefit of hindsight here. Since the Weighted majority theorem says that her regret is at most 2M\sqrt{\frac{2 \log(n)}{T}}, this means that her average payoff is at least \bar{v}_i-2M\sqrt{\frac{2 \log(n)}{T}}.

Q.E.D.



The production of this material was supported in part by NSF award SES-0734780.

Personal tools