Structured decision-making algorithm
Notes for CS 8803: Game Theory and Computer Science, Spring 2008.
Setup
The setup is similar to the repeated n-decision game, and also to the repeated n-dimensional concave decision-making game. However, there is now a (possibly infinite) set of decisions <math>S \subset \reals^m</math>. (This set need not be convex, though by using mixed strategies we can achieve any point in the convex hull.) Each period the decision-maker chooses a decision <math>x^t \in S</math>. The payoff is assumed to be a linear function <math>p^t \cdot x^t</math>, where <math>p^t \in \reals^m</math>. In the version we will cover, the full payoff vector is revealed to the decision-maker (though this assumption can be relaxed to just the payoff of the current decision). Hence, the decision-maker's choice on period <math>t</math> can be a function of <math>x^1,p^1,\ldots,x^{t-1},p^{t-1}</math>.
As usual,
 <math>\mbox{regret}=\frac{1}{T}\max_{x \in S}\sum_{t=1}^T p^t \cdot x - \frac{1}{T}\sum_{t=1}^T p^t \cdot x^t.</math>
A simple example is the decisions being the set of paths from node <math>s</math> to node <math>t</math> in a graph with <math>m</math> edges. Each path can be represented as a vector in <math>\{0,1\}^m</math>. Many functions can be represented as <math>p^t \cdot x^t</math>, including the function which is 1 if a certain edge is in the path and 0 otherwise, or the sum of edge delays on the path.
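To make the path example concrete, here is a toy encoding in Python (the graph, edge values, and names are illustrative, not from the notes): each path becomes a 0/1 incidence vector over the <math>m</math> edges, and the payoff is the inner product <math>p \cdot x</math>.

```python
# Toy illustration: two s-t paths in a hypothetical 4-edge graph,
# each encoded as a 0/1 incidence vector over the edges.
paths = [
    [1, 1, 0, 0],  # path using edges 0 and 1
    [0, 0, 1, 1],  # path using edges 2 and 3
]

def payoff(p, x):
    """Linear payoff p . x -- e.g. negated edge delays summed over the path."""
    return sum(pi * xi for pi, xi in zip(p, x))

p = [-2.0, -1.0, -4.0, -0.5]  # negated delays: larger payoff = shorter path
best = max(paths, key=lambda x: payoff(p, x))
```

Maximizing <math>p \cdot x</math> over the paths with negated delays as payoffs picks out the shortest path, as used in the oracle discussion below.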
Algorithm
The algorithm is quite simple. We assume that we have a black-box function <math>F</math> that, given a payoff vector <math>p</math>, can efficiently find the best decision <math>F(p)\in S</math>, i.e., <math>F(p) \cdot p = \max_{x \in S} x \cdot p</math> for all payoff vectors <math>p</math>. In the path example, this is simply a shortest-path computation (assuming all payoffs are <math>\leq 0</math>).
Parameter: <math>R\geq 0</math>

On period <math>t=1,2,\ldots,T</math>:
* Let <math>R^t \in [-R,0]^m</math> be chosen uniformly at random.
* Let <math>x^t = F(R^t+p^1+p^2+\ldots+p^{t-1})</math>.
Note: we chose <math>R^t \in [-R,0]^m</math> for convenience in the path example (so that all edge payoffs would be negative, and we could use a shortest-path algorithm). We could equally well choose <math>R^t \in [0,R]^m</math> or <math>R^t \in [-R/2,R/2]^m</math> without significant change to the analysis.
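A minimal Python sketch of the algorithm, under the assumptions above; the oracle `F` and all names are placeholders for whatever structured problem is at hand:

```python
import random

def structured_decision_making(F, payoffs, R, m, rng=None):
    """Follow-the-perturbed-leader sketch.

    F       : black-box oracle mapping a payoff vector to a best decision in S
    payoffs : the payoff vectors p^1, ..., p^T (each revealed after its period)
    R       : perturbation scale (the parameter R >= 0)
    m       : dimension of the payoff/decision vectors
    """
    rng = rng or random.Random(0)
    cum = [0.0] * m                      # running sum p^1 + ... + p^{t-1}
    decisions = []
    for p in payoffs:
        # Draw R^t uniformly from [-R, 0]^m, fresh each period.
        Rt = [rng.uniform(-R, 0.0) for _ in range(m)]
        decisions.append(F([Rt[i] + cum[i] for i in range(m)]))
        cum = [cum[i] + p[i] for i in range(m)]  # p^t is revealed afterwards
    return decisions
```

For a finite decision set one can take `F = lambda q: max(S, key=lambda x: sum(qi * xi for qi, xi in zip(q, x)))`; in the path example, `F` would instead be a shortest-path routine.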
Analysis
Lemma Star
Assuming that <math>S \subseteq [-A,A]^m</math> and that all payoff vectors satisfy <math>\|p^t\|_1 \leq B</math>, the structured decision-making algorithm with <math>R=B\sqrt{T/m}</math> outputs a sequence <math>x^1,x^2,\ldots,x^T</math> with expected regret:

 <math>{\mathbb E}[\mbox{regret}] \leq \frac{4AB\sqrt{mT}}{T} = 4AB\sqrt{\frac{m}{T}}.</math>
Proof. The proof is similar to the analysis of the weighted majority algorithm. Note first that all payoffs are in the range <math>[-AB,AB]</math>, since <math>\|p^t\|_1 \leq B</math> and every coordinate of a decision is at most <math>A</math> in absolute value. By linearity of expectation, we may assume WLOG that <math>R^1=R^2=\ldots=R^T</math> is the same random point in <math>[-R,0]^m</math>. By the stability-regularization lemma, we have that,
 <math>\max_{x \in S}\sum_{t=1}^T \left(p^t \cdot x - p^t \cdot x^t\right) \leq \sum_{t=1}^T \left(p^t \cdot x^{t+1}-p^t \cdot x^t\right) + \max_{x,x' \in S} \left(R^1 \cdot x-R^1 \cdot x'\right).</math>
Now, by our assumptions, it is straightforward to see that <math>R^1 \cdot x \in [-RAm,RAm]</math> for any <math>x \in S</math>. Hence,
 <math>\max_{x,x' \in S} \left(R^1 \cdot x-R^1 \cdot x'\right) \leq 2RAm.</math>
Next, let <math>z^t=R^1+p^1+p^2+\ldots+p^{t-1}</math>, so that <math>x^t = F(z^t)</math>. Although for any particular <math>R^1</math>, <math>F(z^t)</math> may be quite different from <math>F(z^{t+1})</math>, we claim that <math>{\mathbb E}[p^t \cdot F(z^{t+1})-p^t \cdot F(z^t)]</math> is small. To see this, note that <math>z^t</math> and <math>z^{t+1}</math> are both chosen uniformly at random from cubes of side <math>R</math>. Moreover, the two cubes are quite similar: they are parallel, and one is a translation of the other by the vector <math>p^t</math>, which satisfies <math>\|p^t\|_1\leq B</math>. Hence, the cubes overlap in at least a <math>\prod_{i=1}^m \left(1-|p^t_i|/R\right) \geq 1-\|p^t\|_1/R\geq 1-\frac{B}{R}</math> fraction of their volume, since translating by <math>p^t_i</math> along coordinate <math>i</math> removes at most a <math>|p^t_i|/R</math> fraction. See the figure below.
Hence, <math>{\mathbb E}[p^t \cdot F(z^{t+1})-p^t \cdot F(z^t)]\leq \frac{B}{R}\cdot 2AB</math>, because the distributions of <math>z^t</math> and <math>z^{t+1}</math> are identical on a <math>1-B/R</math> fraction, and the rest of the time they can differ by at most <math>2AB</math> since all payoffs are in <math>[-AB,AB]</math>. Hence, we have, in expectation,
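The overlap claim can be sanity-checked numerically. The following snippet (with made-up values for the translation vector and <math>R</math>) computes the exact overlap fraction of two axis-parallel cubes of side <math>R</math>, one a translate of the other by <math>p</math>, and compares it to the <math>1-\|p\|_1/R</math> lower bound:

```python
def overlap_fraction(p, R):
    """Volume fraction shared by [-R,0]^m and its translate by p: the cubes
    overlap by max(0, R - |p_i|) along each coordinate axis."""
    frac = 1.0
    for pi in p:
        frac *= max(0.0, 1.0 - abs(pi) / R)
    return frac

p = [0.3, -0.2, 0.1]   # hypothetical translation vector with ||p||_1 = 0.6
R = 2.0
exact = overlap_fraction(p, R)              # (0.85)(0.9)(0.95)
lower = 1.0 - sum(abs(pi) for pi in p) / R  # the 1 - ||p||_1 / R bound = 0.7
```

The product form is exact; the <math>1-\|p\|_1/R</math> expression is the union-bound relaxation used in the proof.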
 <math>{\mathbb E}\left[\max_{x \in S}\sum_{t=1}^T \left(p^t \cdot x - p^t \cdot x^t\right)\right] \leq \frac{2AB^2T}{R} + 2RAm</math>.
For our choice of <math>R=B\sqrt{T/m}</math>, which equalizes the two terms, the above is <math>4AB\sqrt{mT}</math>; dividing by <math>T</math> gives the regret bound of the lemma.
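As a quick numeric sanity check (with arbitrary illustrative constants, not from the notes), one can verify that the choice of <math>R</math> equalizing the two terms of the final bound minimizes it, and that the minimum equals <math>4AB</math> times the square root of (dimension <math>\times\ T</math>):

```python
import math

# Arbitrary illustrative constants.
A, B, dim, T = 1.0, 2.0, 5, 1000

def bound(R):
    """The two-term bound from the proof: 2*A*B^2*T / R + 2*R*A*dim."""
    return 2 * A * B**2 * T / R + 2 * R * A * dim

R_star = B * math.sqrt(T / dim)         # choice equalizing the two terms
target = 4 * A * B * math.sqrt(dim * T) # claimed minimum value
```

At `R_star` both terms of the bound are equal, and perturbing `R_star` in either direction only increases the bound.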
Q.E.D.
The production of this material was supported in part by NSF award SES-0734780.