Structured decision-making algorithm

From Theory

Jump to: navigation, search

Notes for CS 8803 - Game Theory and Computer Science. Spring 2008


Contents

Setup

The setup is similar to the repeated n-decision game, and also to the repeated n-dimensional concave decision-making game. However, there is now a (possibly infinite) set of decisions S \subset \reals^m. (This set need not be convex, though by using mixed strategies we can achieve any point in the convex hull.) Each period the decision-maker chooses a decision x^t \in S. The payoff is assumed to be a linear function p^t \cdot x^t, where p^t \in \reals^m. In the version we will cover, the full payoff vector is revealed to the decision-maker (though this assumption can be relaxed to just the payoff of the current decision). Hence, the decision-maker's choice on period t can be a function of x^1,p^1,\ldots,x^{t-1},p^{t-1}.

As usual,

\mbox{regret}=\frac{1}{T}\max_{x \in S}\sum_{t=1}^T p^t \cdot x - \frac{1}{T}\sum_{t=1}^T p^t \cdot x^t.

A simple example is the decisions being the set of paths from node s to node t in a graph with m edges. Each path can be represented as a vector in {0,1}m. Many functions can be represented as p^t \cdot x^t, including the function which is 1 if a certain edge is in the path and 0 otherwise, or the sum of edge delays on the path.

Algorithm

The algorithm is quite simple. We assume that we have a black-box function F, that given a payoff vector p, can efficiently find the best decision F(p)\in S, i.e., F(p) \cdot p = \max_{x \in S} x \cdot p, for all payoff vectors. In the path example, this is simply the shortest path (assuming all payoffs are \leq 0).

  Parameter: R\geq 0
  On period t=1,2,\ldots,T:
  * Let R^t \in [-R,0]^m be chosen uniformly at random.
  * Let x^t = F(R^t+p^1+p^2+\ldots+p^{t-1}).

Note: we chose R^t \in [-R,0] for convenience for the path example (so that all edge payoffs would be negative, and we could use a shortest path algorithm). We could equally well choose R^t \in [0,R] or R^t \in [-R/2,R/2] without significant change to the analysis.

Analysis

Lemma Star

Lemma Star. For any integer n\geq 1, T \geq 1, reals A,B\geq 0, and any compact decision set S \subseteq [-A,A]^n, and sequence p^1,p^2,\ldots,p^t \in \reals^n, such that |p^t|_1 \leq B, the structured decision-making algorithm with R = outputs a sequence x^1,x^2,\ldots,x^T with expected regret:
{\mathbb E}_{R^1,R^2,\ldots,R^T}[\mbox{regret}] \leq 4AB\sqrt\frac{n}{T}.

Proof. The proof is similar to the analysis of the weighted majority algorithm. Note first that all payoffs are in the range [ − AB,AB]. By linearity of expectation, we may assume WLOG R^1=R^2=\ldots=R^T to be the same random point in [ − R,0]n. By the stability-regularization lemma, we have that,

\max_{x \in S}\sum_{t=1}^T p^t \cdot x - p^t \cdot x^t \leq \sum_{t=1}^T p^t \cdot x^{t+1}-p^t \cdot x^t + \max_{x,x' \in S} |R^1 \cdot x-R^1 \cdot x'|.

Now, by our assumptions, it is straightforward to see that R^1 \cdot x \in [-RAn,RAn] for any x \in S. Hence,

\max_{x,x' \in S} |R^1 \cdot x-R^1 \cdot x'| \leq 2RAn.

Next, we claim that: Let z^t=R^1+p^1+p^2+\ldots+p^{t-1}, so that xt = F(zt).. Now, although for any particular R1, F(zt) may be quite different from F(zt + 1). However, we claim that {\mathbb E}[p^t \cdot F(z^{t+1})-p^t \cdot F(z^t)] is small. To see this, note that zt are both chosen uniformly random from cubes of side R. Moreover, they are quite similar. They are parallel, and one is a translation of the other by the vector pt, which has \sum |p^t_i|\leq B. Hence, the cubes overlap in at least 1-|p^t|_1/R\geq 1-\frac{B}{R} fraction. See the figure below. (A BETTER EXPLANATION NEEDS TO BE ADDED HERE.)

Image:overlap.gif

Hence, {\mathbb E}[p^t \cdot F(z^{t+1})-p^t \cdot F(z^t)]\leq \frac{B}{R}2AB, because the distributions between zt and zt + 1 are identical on a 1 − B / R fraction, and the rest of the time they can differ by at most 2AB since all payoffs are in [ − AB,AB]. Hence, we have, in expectation,

{\mathbb E}\left[\max_{x \in S}\sum_{t=1}^T p^t \cdot x - p^t \cdot x^t\right] \leq \frac{1}{R}2AB^2T + 2RAn.

For our choice of R=\frac{B}{\sqrt{nT}}, the above is 4AB\sqrt{nT}, which is what we need for the theorem.

Q.E.D.



The production of this material was supported in part by NSF award SES-0734780.

Personal tools