Convex-concave games
From Theory
Notes for CS 8803 - Game Theory and Computer Science. Spring 2008
We now consider a class of two-person zero-sum games with infinitely many actions. We will be working in n-dimensional Euclidean space,
. The action sets for both players will be convex sets,
. The game is convex-concave if the payoff functions
are assumed to satisfy the following condition:
- ui(ai,a − i) is a concave function in ai, for every
.
While there is no explicit requirement on the dependence on the opponent's action, it follows, since it's a zero-sum game, that ui(ai,a − i) = − u − i(ai,a − i) is convex function in a − i, hence the term "convex-concave."
An example of such a game is two companies deciding on the parameters of competing products that they are going to produce. A common economic assumption is that a player's payoff function is a concave function in its own production. For example, if the company was Apple trying to decide how heavy to make a laptop, it may not be the case that their net profit is a strictly decreasing (or increasing) function of the weight, due to the fact that a lighter laptop would have inferior parts and would likely cost more. Instead, there would be some "sweet spot," presumably the peak of some concave function. And, a great many problems (such as linear programming) can cast as maximizing a concave function over a convex set.
We will again give an algorithm for solving such games (for differentiable payoff functions with bounded derivatives) by playing a repeated decision-making game. For simplicity in our lecture, we are going to assume that Ai = Δn, the n-dimensional simplex. The results in this lecture can be extended to arbitrary compact sets. We will all show that each such game has pure-strategy optimal strategy.
repeated n-dimensional concave decision-making game
As in the repeated n-decision game, the decision-maker may make any decision
.
Each period, each decision pays off a real-valued payoff, say in ft(xt). The only assumptions we are going to make on ft are the following:
- Each function
is a concave function.
- Each ft is differentiable, and the derivative can be computed efficiently (e.g., in time polynomial in n).
- Each partial derivative has magnitude at most M, for some parameter,
, i.e.,
for all
.
Each period, the decision-maker chooses
based on the history
. The regret is defined it the same manner,
|
The algorithm we will use for making repeated decisions is simply the weighted majority algorithm, applied to the payoff sequence
:
Parameter: ε > 0 On period: * Let
. * Let
be the total gradient vector. * Let
, where
. // or simple update:
.
The analysis of this algorithm is quite simple, given the analysis of the weighted majority algorithm.
Theorem. For any , and for any concave differentiable such that for each t,i, the above algorithm run with achieves,
|
Note that this setting and gaurantee are a generalization of the weighted majority. In the case of the weighted majority the function
.
Proof. Since a concave function lies below each of its tangent planes, we have,
for each
.
Hence,
But the quantity on the right would be exactly the regret of the weighted majority algorithm in the repeated n-decision problem had it encountered the sequence pt, which happens to be in [ − M,M]n.
Q.E.D.
Bandit convex-concave games
The bandit setting encompasses the much more difficult setting in which only the payoff of the chosen point is discovered each period, not the entire function. A relevant paper is here.
The production of this material was supported in part by NSF award SES-0734780.
:
* Let
.
* Let
be the total gradient vector.
* Let
, where
.
// or simple update:
.
, and for any concave differentiable
such that
for each
achieves,
.
