Computing correlated equilibrium

From Theory

Jump to: navigation, search

Notes for CS 8803 - Game Theory and Computer Science. Spring 2008


Given a mixed-strategy profile \sigma \in \Delta, we define the correlated strategy πσ to be the prodcut distribution over actions,

\pi_\sigma(a)=\prod_{i=1}^N \sigma_i(a_i), for all a \in A.

Given a finite N-player game G = (N,u,A), imagine playing the same game repeated T times. Each player can view this repeated game as a repeated n-decision game, where each of his actions is a decision. We then connect this with swap regret.

Theorem. Suppose the players play game G for T periods, using mixed strategy profiles \sigma^1,\ldots,\sigma^T. If each player has swap-regret \leq \epsilon, then \frac{1}{T}\sum_{t=1}^T \pi_{\sigma^t} is an ε correlated equilibrium of G.

The definition of ε-correlated equilibrium can be found here. See also the definition of swap regret.

Contents

Swap weighted majority

The swap weighted majority algorithm is given below.

    Input: number of actions n.
    Instantiate n copies of weighted majority, \!^1\mbox{WM},\!^2\mbox{WM},\ldots,\!^n\mbox{WM}.   
    On period t=1,2,\ldots,T:
    1. Let \!^ix^t \in \Delta_n be the output of \!^i\mbox{WM}.
    2. Let x^t \in \Delta_n be the unique solution to the following equations:
       x^t_i = \sum_{j=1}^n x^t_j(^ix^t_j) for i=1,2,\ldots,n.
    3. Receive payoff p^t \in [-M,M]^n.
    4. Provide feedback (input) \!^ip^t=x^t_i p^t \in [-M,M]^n to \!^i\mbox{WM} for i=1,\ldots,n.

Note, in step 2, there will always be a unique solution. To see this, consider a Markov chain with n states and transition probabilities (from i to j) of \!^ix^t_j. Since the weighted majority algorithm always has all positive weights, these probabilities are all positive and hence there is a unique stationary distribution to the walk. The solution is exactly what is specified by xt.

Theorem. For any n,T\geq 1, M \geq 0, and for any p^1,p^2,\ldots,p^t \in [-M,M]^n, the swap weighted majority algorithm achieves,
\mbox{swap-regret} \leq 4Mn\sqrt{\frac{\log(n)}{T}}.

Scribe: Daniel Dadush

Proof. For copy \!^i\mbox{WM} of the weighted majority algorithm receiving payoffs \!^ip^t, we know that the algorithm achieves regret \max_{1 \leq j \leq n} \frac{1}{T}\sum_{t=1}^T \!^ip^t_j - \frac{1}{T}\sum_{t=1}^T (\!^ix^t)(\!^ip^t) \leq 4M \sqrt{\frac{\log(n)}{T}}

\max_{1 \leq j \leq n} \frac{1}{T}\sum_{t=1}^T x^t_i p^t_j - \frac{1}{T}\sum_{t=1}^T (\!^ix^t)(x^t_i p^t) \leq 4M\sqrt{\frac{\log(n)}{T}}

Combining these inequalities together, we get that

\sum_{i=1}^n \max_{1 \leq j \leq n} \frac{1}{T}\sum_{t=1}^T x^t_i p^t_j - \frac{1}{T}\sum_{t=1}^T (\!^ix^t)(x^t_i p^t) \leq 4Mn\sqrt{\frac{\log(n)}{T}}

\sum_{i=1}^n \max_{1 \leq j \leq n} \frac{1}{T}\sum_{t=1}^T x^t_i p^t_j - \frac{1}{T}\sum_{i=1}^n \sum_{t=1}^T \sum_{k=1}^n (\!^ix^t_k)(x^t_i p^t_k) \leq 4Mn\sqrt{\frac{\log(n)}{T}}

\sum_{i=1}^n \max_{1 \leq j \leq n} \frac{1}{T}\sum_{t=1}^T x^t_i p^t_j - \frac{1}{T}\sum_{k=1}^n \sum_{t=1}^T  \sum_{i=1}^n (x^t_i)(\!^ix^t_k) (p^t_k) \leq 4Mn\sqrt{\frac{\log(n)}{T}}

\sum_{i=1}^n \max_{1 \leq j \leq n} \frac{1}{T}\sum_{t=1}^T x^t_i p^t_j - \frac{1}{T}\sum_{k=1}^n \sum_{t=1}^T (x^t_k)p^t_k\leq 4Mn\sqrt{\frac{\log(n)}{T}}

Since by construction x^t_k = \sum_{i=1}^n (x^t_i)(\!^ix^t_k). And so finally we see that

\sum_{i=1}^n \max_{1 \leq j \leq n} \frac{1}{T}\sum_{t=1}^T x^t_i (p^t_j - p^t_i) = \mbox{swap-regret} \leq 4Mn\sqrt{\frac{\log(n)}{T}}

Randomized algorithms that make pure decisions

Scribe: Jingu Kim

We consider adversarial repeated n-decision game which is a modified version of repeated n-decision game. The setting here is mostly close to that of repeated n-decision game except that the environment is allowed to adapt the payoff vector in possibly adversarial way. For t=1,\cdots,T:

  • The decision maker chooses x^t\in \Delta_n (based on x^1,\cdots,x^{t-1},p^1,\cdots,p^{t-1}).
  • Simultaneously, the environment picks p^t\in [-M,M]^n in possibly adversarial way.
  • The decision maker receives p^t\cdot x^t and finds out p^t \in [-M,M]^n.

We will consider randomized decision making algorithms for this problem where the algorithm gives a pure decision rather than a probability distribution. We say a decision making algorithm is pure if its decision x^t \in \{e_1,\cdots,e_n\}\in \Delta_n.

Randomized weighted majority

This is a randomized version weighted-majority algorithm. Each period the decision maker chooses one of her pure decisions according to the distribution from weighted majority algorithm.

Randomized swap weighted majority

    Input: number of actions n and T
    Instantiate one copy of swap weighted majority; let it be SWM.
    On period t=1,2,\ldots,T:
    1. Run SWM and let \hat{x}^t \in \Delta_n be its decision.
    2. Pick x^t \in \{ e_1,\ldots,e_n \} randomly from probability distribution of \hat{x}^t
    3. Receive payoff p^t \in [-M,M]^n.
    4. Provide feedback pt to SWM.
Theorem. For any n,T\geq 1,M \geq 0, and for any p^1,p^2,\ldots,p^t \in [-M,M]^n the randomized swap weighted majority algorithm guarantees that

E[\mbox{swap-regret}] \leq \frac{4Mn^2}{\sqrt{T}}.

Proof. Recall that \mbox{swap-regret}= \sum_{i=1}^n \max_{j \leq n} \frac{1}{T}\sum_{t=1}^T (p^t_j-p^t_i)x_i^t, and note that \forall i,j,t, E[(p^t_j-p^t_i)x_i^t]= E[(p^t_j-p^t_i)\hat{x}_i^t]. Let \Delta_{ij}= \frac{1}{T} \sum_{t=1}^T (p^t_j-p^t_i)(x_i^t-\hat{x}_i^t), then Eij] = 0

E[\mbox{swap-regret}]= E[\sum_{i=1}^n \max_{j \leq n} ( \frac{1}{T} \sum_{t=1}^T (p^t_j-p^t_i)\hat{x}_i^t + \Delta_{ij} )] \leq E[\sum_{i=1}^n \max_{j \leq n} \frac{1}{T} \sum_{t=1}^T (p^t_j-p^t_i)\hat{x}_i^t]  +  E[\sum_{i=1}^n \sum_{j=1}^n \left|\Delta_{ij}\right|].

We use the fact that for any random variable X\in \mathbb{R}, E[X^2]\geq E[X]^2. To see this, one can either apply Jensen's inequality or observe that E[(X-\mu)^2]=E[X^2]+\mu^2-2E[\mu X] = E[X^2]-\mu^2 \geq 0

E[\left|\Delta_{ij}\right|]^2 \leq E[\left|\Delta_{ij}\right|^2]  = E[\frac{2}{T^2}\sum_{t_1<t_2} (p_j^{t_1}-p_i^{t_1})(x_i^{t_1}-\hat{x}_i^{t_1})(p_j^{t_2}-p_i^{t_2})(x_i^{t_2}-\hat{x}_i^{t_2})   + \frac{1}{T^2}\sum_{t=1}^T (p_j^t-p_i^t)^2(x_i^t-\hat{x}_i^t)^2] = E[\frac{1}{T^2}\sum_{t=1}^T (p_j^t-p_i^t)^2(x_i^t-\hat{x}_i^t)^2] \leq \frac{4M^2}{T}

Note that E[\frac{2}{T^2}\sum_{t_1<t_2} (p_j^{t_1}-p_i^{t_1})(x_i^{t_1}-\hat{x}_i^{t_1})(p_j^{t_2}-p_i^{t_2})(x_i^{t_2}-\hat{x}_i^{t_2})]=0 because fixing \hat{x}_i^{t_1}, x_i^{t_1} follows the distribution of \hat{x}_i^{t_1}.

Now one can see that E[\sum_{i=1}^n \sum_{j=1}^n \left|\Delta_{ij}\right|]^2 \leq \frac{4M^2n^4}{T} and, therefore, E[\sum_{i=1}^n \sum_{j=1}^n \left|\Delta_{ij}\right|] \leq \frac{2Mn^2}{\sqrt{T}}.

\sum_{i=1}^n \max_{j \leq n} \frac{1}{T} \sum_{t=1}^T (p^t_j-p^t_i)\hat{x}_i^t is in fact the swap regret of SWM in an environment simulated by RSWM. Using the previous theorem, we have E[\sum_{i=1}^n \max_{j \leq n} \frac{1}{T} \sum_{t=1}^T (p^t_j-p^t_i)\hat{x}_i^t] \leq 4Mn\sqrt{\frac{\log(n)}{T}}. Putting it all together,

E[\mbox{swap-regret}] \leq \frac{2Mn^2}{\sqrt{T}} + 4Mn\sqrt{\frac{\log(n)}{T}} \leq \frac{4Mn^2}{\sqrt{T}}. Q.E.D.

Correlated Equilibrium algorithm

Here we show a polynomial time algorithm that outputs ε-correlated equilibrium.

    Input: G=(N,A,u), u:A \rightarrow [-M,M]^N, A_i = \{1,2,\ldots,n_i \}, \epsilon \geq 0
    Output: An ε-correlated equilibrium of G
    Let T=\frac{16M^2n^4N^2}{\epsilon^2} where n = maxini.
    1. Run N copies of randomized swap weighted majority algorithm RSWM_{i=1}^N where i-th instance has ni actions.
    2. For t=1,2,\ldots,T:
       Let x_i^t be the output of RSWMi (x_i^t\in \Delta_{n_i} = \Delta_i). Denote x_t = (x_1^t,\ldots,x_N^t) \in \Delta
       Let p_i^t = (u_i(a_i=1,x_{-i}^t),\ldots,u_i(a_i=n_i,x_{-i}^t))\in [-M,M]^{n_i}. 
    3. Provide feedback p_i^t to RSWM's.
    4. If swap regret for each i\leq N is less than or equal to ε, then output a correlated strategy \Pi(a) = \frac{\left| \{t|x^t=a\} \right|}{T}. 
       Else, repeat from step 1.


The vector p_i^t in this algorithm shows how much the player i would have gotten if she chose each action under x_{-i}^t.

Theorem. For any game G=(N,A,u), u:A \rightarrow [-M,M]^N, \epsilon \geq 0, Correlated Equilibrium algorithm outputs ε-correlated equilibrium in expected polynomial time in (\frac{M}{\epsilon}, N, n_1+\cdots+n_N).

Proof. The previous theorem for randomized swap weighted majority algorithm gives us E[\mbox{swap-regret for } i] \leq \frac{4Mn^2}{\sqrt{T}} = \frac{\epsilon}{2N}. Using Markov's Inequality, \mathbb{P}[\mbox{swap regret for }i \geq \epsilon] \leq \frac{\epsilon}{2N}. \mathbb{P}[\mbox{any player has swap regret} \geq \epsilon] \leq \frac{1}{2}. Therefore, E[\# \mbox{ iterations}]=2. Q.E.D.


The production of this material was supported in part by NSF award SES-0734780.

Personal tools