Structured decision-making algorithm
From Theory
Notes for CS 8803 - Game Theory and Computer Science. Spring 2008
Contents |
Setup
The setup is similar to the repeated n-decision game, and also to the repeated n-dimensional concave decision-making game. However, there is now a (possibly infinite) set of decisions
. (This set need not be convex, though by using mixed strategies we can achieve any point in the convex hull.) Each period the decision-maker chooses a decision
. The payoff is assumed to be a linear function
, where
. In the version we will cover, the full payoff vector is revealed to the decision-maker (though this assumption can be relaxed to just the payoff of the current decision). Hence, the decision-maker's choice on period t can be a function of
.
As usual,
A simple example is the decisions being the set of paths from node s to node t in a graph with m edges. Each path can be represented as a vector in {0,1}m. Many functions can be represented as
, including the function which is 1 if a certain edge is in the path and 0 otherwise, or the sum of edge delays on the path.
Algorithm
The algorithm is quite simple. We assume that we have a black-box function F, that given a payoff vector p, can efficiently find the best decision
, i.e.,
, for all payoff vectors. In the path example, this is simply the shortest path (assuming all payoffs are
).
Parameter:On period
: * Let
be chosen uniformly at random. * Let
.
Note: we chose
for convenience for the path example (so that all edge payoffs would be negative, and we could use a shortest path algorithm). We could equally well choose
or
without significant change to the analysis.
Analysis
Lemma Star
Lemma Star. For any integer , reals and any compact decision set , and sequence such that , the structured decision-making algorithm with R = outputs a sequence with expected regret:
|
Proof.
The proof is similar to the analysis of the weighted majority algorithm. Note first that all payoffs are in the range [ − AB,AB].
By linearity of expectation, we may assume WLOG
to be the same random point in [ − R,0]n. By the stability-regularization lemma, we have that,
Now, by our assumptions, it is straightforward to see that
for any
. Hence,
Next, we claim that:
Let
, so that xt = F(zt).. Now, although for any particular R1, F(zt) may be quite different from F(zt + 1). However, we claim that
is small. To see this, note that zt are both chosen uniformly random from cubes of side R. Moreover, they are quite similar. They are parallel, and one is a translation of the other by the vector pt, which has
. Hence, the cubes overlap in at least
fraction. See the figure below. (A BETTER EXPLANATION NEEDS TO BE ADDED HERE.)
Hence,
, because the distributions between zt and zt + 1 are identical on a 1 − B / R fraction, and the rest of the time they can differ by at most 2AB since all payoffs are in [ − AB,AB]. Hence, we have, in expectation,
.
For our choice of
, the above is
, which is what we need for the theorem.
Q.E.D.
The production of this material was supported in part by NSF award SES-0734780.
On period
:
* Let
be chosen uniformly at random.
* Let
.
, reals
and any compact decision set
, and sequence
such that
, the structured decision-making algorithm with
with expected regret:
.

