Proof of the weighted majority theorem
From Theory
Notes for CS 8803 - Game Theory and Computer Science. Spring 2008
Here we prove the Weighted majority theorem. First we give some intuition.
Intuition and useful lemmas
We are in the repeated n-decision game. Consider first the natural follow-the-leader strategy in which the decision maker uses, on each period t, the single decision that would have worked best on the previous periods. That is,
that maximizes
. This may be a very poor strategy in simple cases. For example, if n = 2, p1 = (1 / 2, − 1 / 2), and
, then this approach will have
, and the decision-maker would get a payoff of − 1 each period, while any single decision would achieve an average payoff of about 0.
The difficulty in the above is that the "leader" is changing each period. If it so happened that
in the above algorithm, then we would have 0 regret. And if xt was close to xt + 1 for most periods, we would similarly have little regret. This is implied by the following lemma.
Stability lemma. For any set S, , sequence of functions , and sequence , where maximizes ,
|
What the above lemma says is that the hypothetical "be the leader" algorithm that, on each period t uses the decision that works best on periods
, would have no regret. Of course it's impossible to implement such a strategy since we don't know pt when we choose xt. But it does imply that if xt is close to xt + 1 for each t (i.e., the leader is stable), then "following the leader", i.e., using xt on period t would yield low regret.
Proof. The proof follows by induction on T. The base case T = 1 is trivial. Suppose it holds for T − 1 and we want to show it for T. So, by induction hypothesis, we have,
.
Now,
.
Hence,
.
The above is equivalent to the lemma. Q.E.D.
In general the leader may change quite often, as in our example above. Hence, the key idea is to add a regularization term to the maximization to make the leader more stable. That is, rather than maximizing payoff so far, one maximizes payoff so far plus regularization. (Such regularization is common in machine learning.) The following lemma says that if the regularization makes the decisions stable, then we will have low regret.
Stability-regularization lemma
Stability-regularization lemma. For any set S, , functions , and sequence , where maximizes ,
|
Proof.
Let
maximize f1(x) + ... + fT(x). By previous lemma,
Rewriting
| ∑ | ft(xt) |
| t |
Q.E.D.
The main missing ingredient is now to show that the weighted majority algorithm in fact maximizes payoff plus regularization term. In particular, the regularization function is,
where H(x) is the "entropy" of the distribution x,
(where
is taken by definition)
Entropy is a beautiful notion capturing how much uncertainty a distribution has. For example, it is easy to check that the entropy of the uniform distribution over n items has entropy log(n), while the distribution which assigns probability 1 to any single decision has entropy 0. Don't fear, the properties of entropy we will use follow from its definition.
Lemma. For any
,
.
Proof. Since
,
. So we easily have
. For the other part, we will use Jensen's inequality for convex functions, since log(1 / x) = − log(x) is convex. This gives,
Q.E.D.
Analysis of weighted majority
Next, we argue that the Weighted Majority algorithm is a regularized maximizer for
Lemma Apple. For any ε > 0, the xt of the Weighted-majority algorithm WM(ε) maximizes over .
|
Proof. By definition, we have for any
,
By simple algebra, the above is equal to,
For the xi chosen by the algorithm, the above expression is
, because
.
Again using our friend Jensen's inequality, we have:
Hence, the Weighted Majority indeed maximizes the quantity in the lemma. Q.E.D.
Next, we argue that the weighted majority algorithm is stable.
Lemma Banana. For any and ,
|
Proof.
Note first that
and hence
.
The left-hand side above implies that,
, combined with the right-hand side, gives
for all
.
Finally, since
for
, we have that
.
Let λ = 2εM. First, if λ > 1, notice that the lemma is trivial because 4εM2 > 2M, and the difference in payoff between two decisions can never be greater than 2M. Hence, WLOG, we may assume that
. Let
be the unique vector such that,
- xt = (1 − λ)xt + 1 + λzt.
Then, we claim that
. The fact that
follows directly from the argument above. The fact that
follows from the fact that
and that xt is a convex combination of xt + 1 and xt.
Finally,
.
Since
for all
, the magnitude of the above quantity is at most 2λM = 4εM2, as required by the lemma.
Q.E.D.
Finally, we can apply the stability-regularization lemma to bound the regret of Weighted Majority.
Theorem. For any , and for any , the weighted-majority algorithm run with achieves,
|
Proof. The stability-regularization lemma, combined with the lemma before the previous lemma, implies that,
Since we have shown
, we have
The lemma above bounds the "stability" term.
Putting these together gives,
Our choice of
gives the theorem.
Q.E.D.
The production of this material was supported in part by NSF award SES-0734780.
, sequence of functions
, and sequence
, where
maximizes
,
, and sequence
, where
,
over
and
,
, and for any
achieves,
.
