Computing correlated equilibrium
From Theory
Notes for CS 8803 - Game Theory and Computer Science. Spring 2008
Given a mixed-strategy profile
, we define the correlated strategy πσ to be the prodcut distribution over actions,
for all
.
Given a finite N-player game G = (N,u,A), imagine playing the same game repeated T times. Each player can view this repeated game as a repeated n-decision game, where each of his actions is a decision. We then connect this with swap regret.
Theorem. Suppose the players play game G for T periods, using mixed strategy profiles . If each player has swap-regret , then is an ε correlated equilibrium of G.
|
The definition of ε-correlated equilibrium can be found here. See also the definition of swap regret.
Contents |
Swap weighted majority
The swap weighted majority algorithm is given below.
Input: number of actions n. Instantiate n copies of weighted majority,. On period
: 1. Let
be the output of
. 2. Let
be the unique solution to the following equations:
for
. 3. Receive payoff
. 4. Provide feedback (input)
to
for
.
Note, in step 2, there will always be a unique solution. To see this, consider a Markov chain with n states and transition probabilities (from i to j) of
. Since the weighted majority algorithm always has all positive weights, these probabilities are all positive and hence there is a unique stationary distribution to the walk. The solution is exactly what is specified by xt.
Theorem. For any , and for any , the swap weighted majority algorithm achieves,
|
Scribe: Daniel Dadush
Proof. For copy
of the weighted majority algorithm
receiving payoffs
, we know that the algorithm achieves regret
Combining these inequalities together, we get that
Since by construction
. And so finally we see that
Randomized algorithms that make pure decisions
Scribe: Jingu Kim
We consider adversarial repeated n-decision game which is a modified version of repeated n-decision game. The setting here is mostly close to that of repeated n-decision game except that the environment is allowed to adapt the payoff vector in possibly adversarial way. For
:
- The decision maker chooses
(based on
).
- Simultaneously, the environment picks
in possibly adversarial way.
- The decision maker receives
and finds out
.
We will consider randomized decision making algorithms for this problem where the algorithm gives a pure decision rather than a probability distribution. We say a decision making algorithm is pure if its decision
.
Randomized weighted majority
This is a randomized version weighted-majority algorithm. Each period the decision maker chooses one of her pure decisions according to the distribution from weighted majority algorithm.
Randomized swap weighted majority
Input: number of actions n and T Instantiate one copy of swap weighted majority; let it be SWM. On period: 1. Run SWM and let
be its decision. 2. Pick
randomly from probability distribution of
3. Receive payoff
. 4. Provide feedback pt to SWM.
Theorem. For any , and for any the randomized swap weighted majority algorithm guarantees that
|
Proof. Recall that
,
and note that
. Let
, then E[Δij] = 0
.
We use the fact that for any random variable
. To see this, one can either apply Jensen's inequality or observe that
Note that
because fixing
follows the distribution of
.
Now one can see that
and, therefore,
.
is in fact the swap regret of SWM in an environment simulated by RSWM. Using the previous theorem, we have
.
Putting it all together,
.
Q.E.D.
Correlated Equilibrium algorithm
Here we show a polynomial time algorithm that outputs ε-correlated equilibrium.
Input:Output: An ε-correlated equilibrium of G Let
where n = maxini. 1. Run N copies of randomized swap weighted majority algorithm
where i-th instance has ni actions. 2. For
: Let
be the output of RSWMi (
). Denote
Let
. 3. Provide feedback
to RSWM's. 4. If swap regret for each
is less than or equal to ε, then output a correlated strategy
. Else, repeat from step 1.
The vector
in this algorithm shows how much the player i would have gotten if she chose each action under
.
Theorem. For any game , Correlated Equilibrium algorithm outputs ε-correlated equilibrium in expected polynomial time in .
|
Proof.
The previous theorem for randomized swap weighted majority algorithm gives us
.
Using Markov's Inequality,
.
.
Therefore,
. Q.E.D.
The production of this material was supported in part by NSF award SES-0734780.
. If each player has swap-regret
, then
is an
.
On period
:
1. Let
be the output of
for
.
3. Receive payoff
to
.
, and for any
, the swap weighted majority algorithm achieves,
.
be its decision.
2. Pick
randomly from probability distribution of
3. Receive payoff
.
Output: An
where
where
be the output of
). Denote
Let
.
3. Provide feedback
is less than or equal to
.
Else, repeat from step 1.
, Correlated Equilibrium algorithm outputs
.
