Repeated games
Scribe: Luyi Gui
Finitely Repeated Game
Given a finite game <math>G=(N,A,u)</math>, define a finitely repeated game <math>G^T</math>, <math>T\geq 1</math> as follows. For every time period <math>t=1,2,...,T,</math> player <math>i</math> chooses an action <math>a_i^t\in A_i</math>. The vector <math>a^t</math> is defined as having <math>i</math>-th coordinate <math>a_i^t</math>. The history of the game up to round <math>t</math> is then <math>h^t=(a^1,a^2,...,a^{t-1})</math>. The set of all possible history vectors <math>h^t</math> on round <math>t</math> is <math>A^{t-1}</math>, the cross product of <math>A</math> with itself <math>t-1</math> times. The payoff to player <math>i</math> is obtained as an average of his payoffs on the individual rounds of the game, giving a total payoff of <math>\frac{1}{T}\sum_{t=1}^T u_i(a^t)</math>. A pure strategy for player <math>i</math> is defined as a function <math>s_i:H^T\rightarrow A_i</math>, where <math>H^T=\bigcup_{t=0}^{T-1}A^t</math> . Similarly, a mixed strategy for player <math>i</math> is defined as a function <math>\alpha_i:H^T\rightarrow \Delta_i</math>. The idea of this definition is that, given any sequence of moves, the strategy will say what to do next. Therefore, it is assumed that player <math>i</math> chooses his action <math>a_i^t</math> by following a strategy <math>s_i</math> and based on the history of the game. More precisely, <math>a_i^t=s_i(h^t)</math>.
Infinitely Repeated Game
Given the game <math>G</math> as above, one could attempt to define an infinitely repeated game <math>G^{\infty}</math> as follows. A pure strategy for player <math>i</math> is defined as a function <math>s_i:H^{*}\rightarrow A_i</math>, where <math>H^{*}=\bigcup_{t=0}^{\infty}A^t</math> . Similarly, a mixed strategy is defined as a function <math>\alpha_i:H^{*}\rightarrow \Delta_i</math>.
It might seem natural to define the payoff function in <math>G^{\infty}</math> as <math>\lim_{T\rightarrow \infty}\frac{1}{T}\sum_{t=1}^T u_i(a^t)</math>. However, this limit might not exist so the game <math>G^{\infty}</math>is not defined in this generality. Instead, introducing a discount parameter <math>\lambda\in (0,1)</math>, an infinitely repeated game <math>G^{\infty}(\lambda)</math> is defined as having payoff function for player <math>i</math> given by <math>(1-\lambda)\sum_{t=1}^{\infty}u_i(a^t)\lambda^{t-1}</math>. Since <math>\sum_{t=1}^{\infty}\lambda^{t-1}=\frac{1}{1-\lambda}</math>, the total payoff is the weighted sum of the payoff received in each individual game.
In order to understand the definition, one way is to consider the time value of money. An alternative way is to take <math>\lambda</math> as the probability that there is going to be a next game so that the total payoff evaluates the expected amount of money received. If <math>\lambda</math> is close to 1, we say that the player is patient. If <math>\lambda</math> is close to 0, we say that the player is myopic.
The following theorem shows that no matter what the value of <math>\lambda</math>, there always exists a Nash Equilibrium for <math>G^{\infty}(\lambda)</math>. In fact, the equilibria can be naturally inherited from the original game <math>G</math>. Given a mixed strategy profile <math>\sigma</math> for <math>G</math>, define a mixed strategy profile <math>\alpha^{\sigma}</math> for <math>G^{\infty}(\lambda)</math> by <math>\alpha_i^{\sigma}(h)=\sigma_i</math>for all players <math>i</math> and all history vectors <math>h</math>.
If <math>\sigma</math> is a Nash Equilibrium for G, then <math>\alpha^\sigma</math> is a Nash Equilibrium for <math>G^{\infty}(\lambda)</math>. |
Proof: Let <math>U</math> denote the payoff function for the game <math>G^{\infty}(\lambda)</math>. Given any player <math>i</math>, it must be shown that <math>U_i(\alpha^{\sigma})\geq U_i(\alpha_i^{\prime},\alpha^{\sigma}_{-i}) </math> for every mixed strategy <math>\alpha_i^{\prime}</math> for player <math>i</math>. It follows from the definition of <math>\alpha^{\sigma}</math> that when the game <math>G^{\infty}(\lambda)</math> is played following this strategy, the action vectors <math>a^t</math> are equal to <math>\sigma</math> for all <math>t</math>. Therefore, <math>U_i(\alpha^{\sigma})=(1-\lambda)\sum_{t=1}^\infty u_i(\sigma)\lambda^{t-1}=u_i(\sigma)</math>. Similarly, when the game <math>G^{\infty}(\lambda)</math> is played following the strategy <math>(\alpha_i^{\prime},\alpha^{\sigma}_{-i})</math>, the action vectors <math>a^t</math> are given by <math>a^t=(\alpha_i^{\prime}(h^{t-1}),\sigma_{-i})</math>. Hence <math>U_i(\alpha_i^{\prime},\alpha^{\sigma}_{-i})=(1-\lambda)\sum_{t=1}^\infty u_i(\alpha_i^{\prime}(h^{t-1}),\sigma_{-i})\lambda^{t-1}</math>. Since <math>\sigma</math> is a Nash equilibrium for <math>G</math>, then <math>u_i(\sigma)\geq u_i(\alpha_i^{\prime}(h^{t-1}),\sigma_{-i})</math>. Multiplying both sides of the inequality by <math>\lambda^{t-1}</math> and adding up these inequalities for all <math>t</math> it follows from the previous formulas that <math>U_i(\alpha^{\sigma})\geq U_i(\alpha_i^{\prime},\alpha^{\sigma}_{-i}) </math>, thus proving that <math>\alpha^{\sigma}</math> is a Nash equilibrium for <math>G^{\infty}(\lambda)</math>.
Repeated Prisoners' Dilemma
Cooperate | Defect | |
Cooperate | 1,1 | -1,2 |
Defect | 2,-1 | 0,0 |
Fig. 1: A payoff matrix for Prisoners' Dilemma |
Consider two strategies.
Tit for Tat: On the first round, both players play C. On round <math>t+1</math>, player <math>i</math> players <math>a^t_{-i}</math>.
Trigger: On the first round, both players play C. On round <math>t+1</math>, player <math>i</math> plays C if and only if <math>a_{-i}^1=a_{-i}^2=...=a_{-i}^t=C</math>.
Both of the above strategies are <math>\epsilon</math>-NE for a finitely repeated game <math>G^T</math> where <math>T\geq \frac{1}{\epsilon}</math>. The reason is that a player can only earn extra profit by defecting on the final round, which produces <math>\frac{1}{T}</math> more profit.
In the case of an infinitely repeated game <math>G^{\infty}(\lambda)</math>, it can be proved that <Trigger,Trigger> is a Nash Equilibrium as long as the players are all patient.
If <math>\lambda\geq \frac{1}{2}</math>, the strategy <Trigger,Trigger> is a Nash Equilibrium. |
Proof: If both players play according to Trigger, then their payoff will be 1.
Let <math>k</math> be the number of the first round that player <math>i</math> does not cooperate. Because the other player is playing Trigger, player <math>i</math> will defect for the rest of the game. Then his payoff will be
<math>(1-\lambda)[1+\lambda+\cdots+\lambda^{k-1}+2\lambda^{k}]=1+\lambda^k-2\lambda^{k+1}</math>
It's easy to see that when <math>\lambda\geq \frac{1}{2}</math>, neither player will earn positive profit by deviating to other strategies. Q.E.D.
<Trigger, Trigger> is also a subgame perfect Nash Equilibrium.