Rational learning

From Theory

Jump to: navigation, search

Scribed in part: ?? , Jingu Kim

Notes for CS 8803 - Game Theory and Computer Science. Spring 2008


Bayesian Repeated Games

We consider a repeated game with the following characteristics:

- N: Player set.

- A = A_1 \times \dotsb \times A_n: Action set.

- A^* = \bigcup_{ t = 1 }^\infty A^t: History set, whereA^t = A \times \dotsb \times A (t times.)

- u: A \to \mathbb{R}^N: Utility function.

- \lambda_i \in ( 0, 1 )^N: Vector of discounting factors.

- p_{ij}: A^* \to \Delta_j: Player i's prior distribution for player j, \forall~ j \neq i, \forall~ i \in N.

- s_i: A^* \to A_i: Pure strategy.

- \alpha_i: A^* \to \Delta_i: Mixed strategy.


The situation has two caveats:

i) Player i knows only A and ui.

ii) Perfect monitoring: For each period t, player i picks a_i^t \in A_i based (only) on ( a^1, \ldots, a^{ t - 1 } ) \in A^{ t - 1 }, A and ui. He then gets total utility

( 1 - \lambda_i ) \sum_{ t = 1 }^\infty u_i( a^t ) \lambda_i^{ t - 1 }.


Example: Repeated Prisoner's Dilemma In the repeated prisoner's dilemma, for some d \in \mathbb{N} we define the grim trigger strategy, given by

g_d( a^1, \ldots, a^t ) = \begin{cases} C, & t \leq d, a^1 = \dotsb = a^t = ( C, C ) \\ D, & otherwise. \end{cases}.

Then (gd,gd) is a Nash equilibrium for the repeated game, for any d \in \mathbb{N} and λi close enough to 1.

Theorem. \forall Bayesian equilibrium α,\exists \tau_0 such that after τ0 after τ0 it is ε-close to a ε-Nash equilibrium of repeated game with known types.

Intuition

Consider N=2, pure strategy Bayesian eq. s=(s1,s2). Ti finite.

Subjective equilibrium

Let sij be player i's belief about player j's strategy.

  • s_{ij}:A^* \to A_j
  • sii is a best response given si( − i)
  • sii(hτ) = sji(hτ) on play path hτ

h1 = φ

hτ + 1 = (hτ,(s11(hτ),s22(hτ)))

  • sij (i,j\in {1,2}) is a subjective equilibrium \to (s_{21},s_{12}) is a Nash equilibrium of repeated game.
Personal tools