A proof of the min-max theorem
From Theory
Notes for CS 8803 - Game Theory and Computer Science. Spring 2008
In this part of the notes, we prove the Min-max theorem using the Weighted-majority algorithm (WM) and its guarantees.
We consider a finite two-person zero-sum game G with actions sets
and payoff (utility) functions
, where
is an upper bound on the payoffs. (We can always assume that there is some upper-bound on the payoffs in a finite game,
)
The value of the game is easy to write down mathematically if one of the players moves first (first announces her mixed strategy) and the other player goes second. The max-min value
is how much player i can guarantee if she has to go first, while the min-max value
is how much player i can guarantee if she gets to go second. So, it is not hard to see that
, since it is an advantage to go second. Also,
because what player i can guarantee going second is the opposite of what her opponent can guarantee going first.
OK, now the proof
To prove the min-max theorem, i.e., that
, or equivalently, v1 + v2 = 0, we consider playing a repeated game. Suppose we are given any game G with action sets Ai, mixed strategies Δi, and payoffs (utilities)
(extended to
).
Consider the following mixed-strategy repeated game Δ(G)T. In this game, each player i chooses a mixed strategy
each period t. For simplicity of the proof, we assume that this mixed strategy is announced to the opponent. (When we actually get into repeated games, we will consider the more natural setting in which a random action is chosen from the mixed strategy and the opponent only observes the action, not the mixed strategy.)
A strategy in this game is a function
, where
is the set of histories, saying what to do after each history of length < t. The payoff to player i in the repeated game is simply,
, where h1 = (),ht + 1 = ht,(f1(ht),f2(ht)).
Lemma. Suppose that player i runs WM with parameter to choose her mixed strategies in Δ(G)T. Then the average payoff to player i is at least .
|
Proof. The average payoff of player i is,
.
The best she could have done had she used a fixed single strategy in hindsight would be,
.
The reason the equality holds is because utility is linear in both of its arguments, i.e., randomly choosing between T mixed strategies is equivalent to using the average mixed strategy. The reason the inequality above holds is that she must get at least how much she can guarantee if going second, because she has the benefit of hindsight here. Since the Weighted majority theorem says that her regret is at most
, this means that her average payoff is at least
.
Q.E.D.
The production of this material was supported in part by NSF award SES-0734780.
to choose her mixed strategies in 