Convex-concave games

From Theory

Jump to: navigation, search

Notes for CS 8803 - Game Theory and Computer Science. Spring 2008


We now consider a class of two-person zero-sum games with infinitely many actions. We will be working in n-dimensional Euclidean space, \reals^n. The action sets for both players will be convex sets, A_i \subseteq \reals^n. The game is convex-concave if the payoff functions u_i: A_1 \times A_2 \rightarrow \reals are assumed to satisfy the following condition:

ui(ai,ai) is a concave function in ai, for every a_{-i} \in A_{-i}.

While there is no explicit requirement on the dependence on the opponent's action, it follows, since it's a zero-sum game, that ui(ai,ai) = − ui(ai,ai) is convex function in ai, hence the term "convex-concave."

An example of such a game is two companies deciding on the parameters of competing products that they are going to produce. A common economic assumption is that a player's payoff function is a concave function in its own production. For example, if the company was Apple trying to decide how heavy to make a laptop, it may not be the case that their net profit is a strictly decreasing (or increasing) function of the weight, due to the fact that a lighter laptop would have inferior parts and would likely cost more. Instead, there would be some "sweet spot," presumably the peak of some concave function. And, a great many problems (such as linear programming) can cast as maximizing a concave function over a convex set.

We will again give an algorithm for solving such games (for differentiable payoff functions with bounded derivatives) by playing a repeated decision-making game. For simplicity in our lecture, we are going to assume that Ai = Δn, the n-dimensional simplex. The results in this lecture can be extended to arbitrary compact sets. We will all show that each such game has pure-strategy optimal strategy.

repeated n-dimensional concave decision-making game

As in the repeated n-decision game, the decision-maker may make any decision x^t \in \Delta_n. Each period, each decision pays off a real-valued payoff, say in ft(xt). The only assumptions we are going to make on ft are the following:

  • Each function f^t:\Delta_n \rightarrow \reals is a concave function.
  • Each ft is differentiable, and the derivative can be computed efficiently (e.g., in time polynomial in n).
  • Each partial derivative has magnitude at most M, for some parameter, M\geq 0, i.e.,
\frac{d}{dx_i}f^t(x) \in [-M,M] for all x \in \Delta_n.

Each period, the decision-maker chooses x \in \Delta_n based on the history x^1,f^1,\ldots,x^{t-1},f^{t-1}. The regret is defined it the same manner,

\mathrm{regret}=\max_{x \in \Delta_n}\frac{1}{T}\sum_{t=1}^T f^t(x)-\frac{1}{T}\sum_{t=1}^T f^t(x^t).\,

The algorithm we will use for making repeated decisions is simply the weighted majority algorithm, applied to the payoff sequence p^t = \nabla f^t(x^t):

  Parameter: ε > 0
  On period t=1,2,\ldots,T:
  * Let p^t= \nabla f^t(x^t)=\left(\frac{d}{dx_1}f^t(x^t),\ldots,\frac{d}{dx_n}f^t(x^t)\right).
  * Let P^t=p^1+p^2+\ldots+p^{t-1} be the total gradient vector.
  * Let x^t = \left(\frac{e^{\epsilon P^t_1}}{Z^t},\frac{e^{\epsilon P^t_2}}{Z^t},\ldots,\frac{e^{\epsilon P^t_n}}{Z^t}\right), where Z^t = \sum_{i=1}^n e^{\epsilon P^t_i}.
     // or simple update: x^t_i =x^{t-1}_i e^{\epsilon x^t_i}/\sum_j x^{t-1}_j e^{\epsilon x^t_j}.


The analysis of this algorithm is quite simple, given the analysis of the weighted majority algorithm.

Theorem. For any n,T\geq 1, M \geq 0, and for any concave differentiable f^1,f^2,\ldots,f^t:\Delta^n \rightarrow \reals such that \frac{d}{dx_i}f(x) \in [-M,M] for each t,i, the above algorithm run with \epsilon=\frac{1}{2M}\sqrt{\frac{\log(n)}{T}} achieves,
\mathrm{regret} \leq 4M\sqrt{\frac{\log(n)}{T}}.

Note that this setting and gaurantee are a generalization of the weighted majority. In the case of the weighted majority the function f^t(x) = p^t \cdot x.

Proof. Since a concave function lies below each of its tangent planes, we have,

f^t(x) \leq f^t(x^t) + p^t \cdot (x-x^t) for each x \in \Delta_n.

Hence,

\mathrm{regret}=\max_{x \in \Delta_n} \frac{1}{T} \sum_{t=1}^T \left(f^t(x) - f^t(x^t)\right) \leq \max_{x \in \Delta_n} \frac{1}{T} \sum_{t=1}^T p^t \cdot (x-x^t).

But the quantity on the right would be exactly the regret of the weighted majority algorithm in the repeated n-decision problem had it encountered the sequence pt, which happens to be in [ − M,M]n.

Q.E.D.

Bandit convex-concave games

The bandit setting encompasses the much more difficult setting in which only the payoff of the chosen point is discovered each period, not the entire function. A relevant paper is here.


The production of this material was supported in part by NSF award SES-0734780.

Personal tools