
Game Theory & Learning

Informal Notes -- Not to be distributed

Bhubaneswar Mishra
Courant Institute of Mathematical Sciences

Preface
In the spring of 1998, a small group of computer science colleagues, students and I started writing notes that could be used
in the context of our research in computational economy, revolving around our work on CAFE. The group consisted of Rohit
Parikh, Ron Even, Amy Greenwald, Gideon Berger, Toto Paxia
and a few others. At present, these notes are intended for the
consumption of only this group.
January 1, 1998
251 Mercer Street, New York.

B. Mishra
mishra@nyu.edu

Contents

Preface

1 Introduction
  1.1 Stag Hunt Problem
  1.2 Why are these kinds of analysis important to us?
  1.3 Prisoners' Dilemma
  1.4 Second-Price Auction
  1.5 Two Person Zero-sum Games
  1.6 Obstacles
  1.7 Repeated Play (with learning)
  1.8 Learning Algorithm
  1.9 Analysis of Learning Algorithm
      1.9.1 Inequality 1
      1.9.2 Inequality 2
      1.9.3 Final Result

2 Strategic Form Games
  2.1 Games
  2.2 Strategic Form Games
  2.3 Domination & Nash Equilibrium
  2.4 Example
      2.4.1 Matching Pennies
  2.5 Key Ingredients for Nash Equilibrium
  2.6 Revisiting On-line Learning
      2.6.1 Convergence
      2.6.2 Irrationality
      2.6.3 A Meta-Theorem of Foster & Young

3 Nash Equilibrium
  3.1 Nash Equilibrium
      3.1.1 Fixed Point Theorems

4 Beyond Nash: Domination, Rationalization and Correlation
  4.1 Beyond Nash
      4.1.1 Correlated Equilibrium
  4.2 Iterated Strict Dominance and Rationalizability
      4.2.1 Some Properties of Undominated Sets
  4.3 Rationalizability
  4.4 Correlated Equilibrium
      4.4.1 Formal Definitions
      4.4.2 Pure Strategies for the Expanded Game
      4.4.3 Correlated Equilibrium and Universal Device

5 Adaptive and Sophisticated Learning
  5.1 Adaptive and Sophisticated Learning
  5.2 Set-up
  5.3 Formulation
  5.4 Looking Forward

6 Learning a la Milgrom and Roberts
  6.1 Adaptive Learning and Undominated Sets
  6.2 Convergence
  6.3 Stochastic Learning Processes

7 Information Theory and Learning
  7.1 Information Theory and Games
      7.1.1 Basic Concepts
  7.2 Joint & Conditional Entropy
      7.2.1 Chain Rule
  7.3 Relative Entropy & Mutual Information
  7.4 Chain Rules for Entropy, Relative Entropy and Mutual Information
  7.5 Information Inequality
  7.6 Stationary Markov Process
  7.7 Gambling and Entropy
  7.8 Side Information
  7.9 Learning

8 Universal Portfolio
  8.1 Universal Portfolio
      8.1.1 Portfolio
  8.2 Universal Portfolio Strategy
      8.2.1 Questions
  8.3 Properties & Analysis
      8.3.1 Doubling Ratio
  8.4 Competitiveness

9 Portfolios and Markets
  9.1 Portfolio Theory
      9.1.1 Ito Calculus
      9.1.2 Market Model
  9.2 Rebalanced Portfolio
      9.2.1 Optimal Portfolio
      9.2.2 Long Term Effects
  9.3 Universal Portfolio
      9.3.1 Competitiveness

Bibliography

Chapter 1
Introduction
1.1 Stag Hunt Problem
(With Two Players)

Stag Hunt Problem


            Stag    Hare
    Stag    2,2     0,1
    Hare    1,0     1,1

1. If both row-player and column-player hunt stag, since a
stag is worth 4 "utils", they each get 2 "utils."
2. If both row-player and column-player hunt hares, since a
hare is worth 1 "util", they each get 1 "util."
3. If row-player hunts hare, while column-player hunts stag
(and hence fails to catch anything), then the row-player
gets 1 "util" and the column-player gets 0 "utils."
4. The other case is symmetric.

Note that if the row-player is risk averse, he will choose to
hunt hare and thus guarantee that he gets 1 "util" independent
of the choice the column-player makes. Thus he will maximize the
minimum utility under the two possible pure strategies ("hunt
stag" with a minimum utility of 0 if the opponent hunts hare
vs. "hunt hare" with a minimum utility of 1 regardless of what
the opponent chooses to play) and choose to hunt hare. By
symmetry, it is seen that in fact both players will choose to hunt
hares.
Is this the truly optimal strategy?
Quoting Rousseau (Discourse on the Origin and Basis of
Inequality among Men):
\If a group of hunters set out to take a stag, they
are fully aware that they would all have to remain
faithfully at their posts in order to succeed; but if a
hare happens to pass near one of them, there can be
no doubt that if he pursued it without qualm, and
that once he had caught his prey, he cared very little
whether or not he had made his companions miss
theirs."
Changing the discussion slightly, suppose that the column-player
will play a mixed strategy by playing "hunt stag" with some
probability (say, y) and the other strategy ("hunt hare") with
probability (1 - y). His best choice of y must be such that the
row-player is now "indifferent" between his own two pure strategies. Thus, we have

    2y + 0(1 - y) = 1 \cdot y + 1 \cdot (1 - y),

and y = 1/2. Thus one expects both row-player and column-player to play the strategies "hunt stag" and "hunt hare" with
equal probabilities.

1.2 Why are these kinds of analysis important to us?


1. Economy
2. Evolutionary Biology
3. Large Scale Distributed Systems
4. Resource Allocation
5. Intelligent Agents

1.3 Prisoners' Dilemma

Prisoners' Dilemma

        C       D
C       0,0     -2,1
D       1,-2    -1,-1

There are two prisoners (row-player and column-player) arrested for a particular crime, but the prosecutor does not have
enough evidence to convict them both. He relies on one of them
testifying against the other in order to get a conviction and punish the second prisoner by sending him to jail. If both of them
testify against the other (defections: "D, D") then they both go
to jail for 1 year each, thus getting a "util" of -1 each. If, on the other
hand, both maintain silence (cooperations: "C, C") then they
go free with a "util" of 0 each. If, on the other hand, row-player
testifies (D) and column-player maintains silence (C), then row-player is rewarded with 1 util and column-player is punished
with -2 utils. The other case is symmetric.
The pay-offs can all be made non-negative by adding 2 utils
to each entry, thus getting the pay-off matrix:

Prisoners' Dilemma (Modified Pay-offs)

        C       D
C       2,2     0,3
D       3,0     1,1

1. For column-player the strategy C is dominated by the
strategy D independent of how row-player plays the game.
Thus column-player must defect.
2. Similarly, for row-player the strategy C is dominated by
the strategy D independent of how column-player plays
the game. Thus row-player must defect.

Hence the equilibrium strategy for the players is to defect,
even though they could each have gotten better pay-offs by cooperating.

1.4 Second-Price Auction

1. Seller has one indivisible unit of an object for sale.
2. There are I potential buyers (bidders) with valuations

    0 \le v_1 \le v_2 \le \cdots \le v_I.

(Consider the case when I = 2.)
3. The bidders simultaneously submit bids

    s_i \in [0, \infty).

4. The highest bidder wins the object.
5. But he only pays the second-highest bid (\max_{j \ne i} s_j).
6. His utility is

    v_i - \max_{j \ne i} s_j.

Consider the special case of just two players:

    v_1, v_2 = valuations,   s_1, s_2 = bids.

Pay-offs:

    u_1 = if s_1 > s_2 then v_1 - s_2 else 0,
    u_2 = if s_2 > s_1 then v_2 - s_1 else 0.

Let us look at player 1's choices.
1. Overbidding (s_1 > v_1)
(a) s_2 \ge s_1: The pay-off is zero and the strategy is weakly
dominated.
(b) s_2 \le v_1: The pay-off is v_1 - s_2 and the strategy is
weakly dominated with respect to bidding s_1 = v_1.
(c) v_1 < s_2 < s_1: The pay-off is v_1 - s_2 < 0, negative, and
the strategy is strictly dominated.
2. Underbidding (s_1 < v_1)
(a) s_2 \ge v_1: The pay-off is zero and the strategy is weakly
dominated.
(b) s_2 \le s_1: The pay-off is v_1 - s_2 and the strategy is
weakly dominated with respect to bidding s_1 = v_1.
(c) s_1 < s_2 < v_1: The pay-off is zero and the strategy is
weakly dominated.

So the best strategy for player 1 is to bid exactly his own
valuation (s_1 = v_1). And by a symmetric argument, the best
strategy for player 2 is also to bid exactly his own valuation
(s_2 = v_2).
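The case analysis can also be checked mechanically. The following is a minimal sketch (my own illustration, not part of the original notes): it uses the pay-off u_1 defined above, an arbitrary valuation v_1 = 1.7 and an arbitrary grid of bids, and confirms that bidding s_1 = v_1 weakly dominates every other bid on that grid.

def u1(s1, s2, v1):
    # Player 1's pay-off: he wins and pays s2 only if s1 > s2 (ties lose, as above).
    return v1 - s2 if s1 > s2 else 0.0

def weakly_dominates(v1, bid_a, bid_b, opponent_bids):
    # True if bid_a does at least as well as bid_b against every opposing bid s2.
    return all(u1(bid_a, s2, v1) >= u1(bid_b, s2, v1) for s2 in opponent_bids)

if __name__ == "__main__":
    grid = [i / 10 for i in range(0, 31)]   # hypothetical bid grid over [0, 3]
    v1 = 1.7                                # hypothetical valuation
    assert all(weakly_dominates(v1, v1, b, grid) for b in grid)
    print("Bidding one's valuation is weakly dominant on this grid.")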

1.5 Two Person Zero-sum Games

We define a loss matrix M as follows:
M(s_i, s_j) = M(i, j) = loss suffered by the row-player for the
strategy profile (s_i, s_j).

Rock, Paper & Scissors

        R       P       S
R       1/2     1       0
P       0       1/2     1
S       1       0       1/2

Row-player's goal is to minimize the loss. Assume (without
loss of generality) that all the losses are in the range [0, 1].
Row-player's expected loss is

    \sum_{i,j} \sigma_r(s_i) \sigma_c(s_j) M(s_i, s_j)
        = \sum_{i,j} \sigma_r(i) M(i, j) \sigma_c(j)
        = \sigma_r^T M \sigma_c = M(\sigma_r, \sigma_c),

where

    \sigma_r(s_i) = probability that the row player plays s_i,
    \sigma_c(s_j) = probability that the column player plays s_j.

Similarly,

    M(\sigma_r, j) = \sum_i \sigma_r(i) M(i, j)   and   M(i, \sigma_c) = \sum_j \sigma_c(j) M(i, j).

Row-player's strategy:

    \min_{\sigma_r} \max_{\sigma_c} M(\sigma_r, \sigma_c).

A mixed strategy \sigma_r realizing this minimum is called a minmax
strategy.

Theorem 1.5.1 (The MINMAX Theorem: von Neumann)

    \min_{\sigma_r} \max_{\sigma_c} M(\sigma_r, \sigma_c) = \max_{\sigma_c} \min_{\sigma_r} M(\sigma_r, \sigma_c).
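For a small loss matrix, a minmax strategy can be computed by linear programming: minimize v subject to \sum_i \sigma_r(i) M(i, j) \le v for every column j, with \sigma_r a probability vector. The sketch below (my own illustration, using scipy's linprog) applies this to the Rock-Paper-Scissors loss matrix above and recovers \sigma_r = (1/3, 1/3, 1/3) with value 1/2.

import numpy as np
from scipy.optimize import linprog

def minmax_strategy(M):
    # Variables (sigma_r(1..n), v); minimize v subject to M^T sigma_r <= v,
    # sigma_r >= 0 and sum_i sigma_r(i) = 1.
    n, m = M.shape
    c = np.zeros(n + 1); c[-1] = 1.0
    A_ub = np.hstack([M.T, -np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

M = np.array([[0.5, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.5]])
sigma_r, value = minmax_strategy(M)
print(sigma_r, value)   # approximately (1/3, 1/3, 1/3) and 0.5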

1.6 Obstacles

1. Imperfect Information
The loss matrix M (pay-off) may be unknown.
2. Computational complexity
M is so large that computing a minmax strategy using a
linear program is infeasible.
3. Irrationality
Opponent (column-player) may not be truly adversarial.

1.7 Repeated Play (with learning)

M unknown.
1. The game is played repeatedly in a sequence of rounds.
2. On round t = 1, ..., T:
(a) The learner (row-player) chooses mixed strategy \sigma_{r,t}.
(b) The opponent (column-player) chooses mixed strategy \sigma_{c,t}.
(c) Row-player observes all possible losses

    M(i, \sigma_{c,t}) = \sum_j \sigma_{c,t}(j) M(i, j),

for each row i.
(d) Row-player suffers loss M(\sigma_{r,t}, \sigma_{c,t}).

Row-player's cumulative expected loss:

    \sum_{t=1}^{T} M(\sigma_{r,t}, \sigma_{c,t}).

The expected cumulative loss of the best fixed strategy:

    \sum_{t=1}^{T} M(\sigma_r^*, \sigma_{c,t}) = \min_{\sigma_r} \sum_{t=1}^{T} M(\sigma_r, \sigma_{c,t}).

1.8 Learning Algorithm

Parameter \beta \in (0, 1), to be chosen. Initially,

    W_1(i) = 1,   for all i.

On each round,

    \sigma_{r,t}(i) = \frac{W_t(i)}{\sum_i W_t(i)},
    W_{t+1}(i) = W_t(i) \, \beta^{M(i, \sigma_{c,t})}.
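A minimal simulation of this update rule (my own sketch; the Rock-Paper-Scissors loss matrix, the fixed opponent mixture and the choice \beta = 0.9 are illustrative assumptions, not prescribed by the notes):

import numpy as np

M = np.array([[0.5, 1.0, 0.0],          # row-player's loss matrix (Section 1.5)
              [0.0, 0.5, 1.0],
              [1.0, 0.0, 0.5]])
beta, T = 0.9, 2000
W = np.ones(M.shape[0])                  # W_1(i) = 1 for all i
cum_loss, cum_row_losses = 0.0, np.zeros(M.shape[0])
sigma_c = np.array([0.5, 0.3, 0.2])      # hypothetical (biased) opponent mixture
for t in range(T):
    sigma_r = W / W.sum()                # sigma_{r,t}(i) = W_t(i) / sum_i W_t(i)
    losses = M @ sigma_c                 # M(i, sigma_{c,t}) for each row i
    cum_loss += float(sigma_r @ losses)  # M(sigma_{r,t}, sigma_{c,t})
    cum_row_losses += losses
    W = W * beta ** losses               # W_{t+1}(i) = W_t(i) * beta^{M(i, sigma_{c,t})}
print(cum_loss / T, cum_row_losses.min() / T)   # average loss vs. best fixed row

The learner's average loss approaches that of the best fixed row, in line with the bound derived in Section 1.9.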

1.9 Analysis of Learning Algorithm

1.9.1 Inequality 1

    \sum_i W_{t+1}(i) = \sum_i W_t(i) \, \beta^{M(i, \sigma_{c,t})}
                      = \left( \sum_i W_t(i) \right) \sum_i \sigma_{r,t}(i) \, \beta^{M(i, \sigma_{c,t})}

    \Rightarrow \frac{\sum_i W_{t+1}(i)}{\sum_i W_t(i)}
        = \sum_i \sigma_{r,t}(i) \, \beta^{M(i, \sigma_{c,t})}
        \le \sum_i \sigma_{r,t}(i) \left( 1 - (1 - \beta) M(i, \sigma_{c,t}) \right)
        = 1 - (1 - \beta) M(\sigma_{r,t}, \sigma_{c,t}),

using \beta^x \le 1 - (1 - \beta) x for x \in [0, 1]. After telescoping, we get

    \frac{\sum_i W_{T+1}(i)}{\sum_i W_1(i)} \le \prod_t \left( 1 - (1 - \beta) M(\sigma_{r,t}, \sigma_{c,t}) \right).

Hence, since \sum_i W_1(i) = n,

    \ln \left( \frac{\sum_i W_{T+1}(i)}{n} \right)
        \le \sum_t \ln \left( 1 - (1 - \beta) M(\sigma_{r,t}, \sigma_{c,t}) \right)
        \le -(1 - \beta) \sum_t M(\sigma_{r,t}, \sigma_{c,t}).

1.9.2 Inequality 2

Let j be the best pure strategy in hindsight, so that \sum_t M(j, \sigma_{c,t}) = \min_{\sigma_r} \sum_t M(\sigma_r, \sigma_{c,t}) = \sum_t M(\sigma_r^*, \sigma_{c,t}). Then

    \sum_i W_{T+1}(i) \ge W_{T+1}(j) = \beta^{\sum_t M(j, \sigma_{c,t})} = \beta^{\sum_t M(\sigma_r^*, \sigma_{c,t})}.

Hence

    \ln \left( \frac{\sum_i W_{T+1}(i)}{n} \right) \ge (\ln \beta) \sum_t M(\sigma_r^*, \sigma_{c,t}) - \ln n.

1.9.3 Final Result

Combining the two inequalities:

    (1 - \beta) \sum_t M(\sigma_{r,t}, \sigma_{c,t}) \le \ln n + (\ln 1/\beta) \sum_t M(\sigma_r^*, \sigma_{c,t}),

and

    \sum_t M(\sigma_{r,t}, \sigma_{c,t}) \le \frac{\ln 1/\beta}{1 - \beta} \sum_t M(\sigma_r^*, \sigma_{c,t}) + \frac{\ln n}{1 - \beta}.

Chapter 2
Strategic Form Games
2.1 Games
Games can be categorized into the following two forms.
We will start here with the first category and postpone the discussion of the second category for later.
1. Strategic Form Games (also called Normal Form Games)
2. Extensive Form Games

2.2 Strategic Form Games

1. Let I = {1, ..., I} be a finite set of players, where I \in N
is the number of players.
2. Let S_i (i \in I) be the (finite) set of pure strategies available
to player i \in I.
3.

    S = S_1 \times S_2 \times \cdots \times S_I

(Cartesian product of the pure strategies) = set of pure
strategy profiles.

Conventions

We write s_i \in S_i for a pure strategy of player i. We also
write s = (s_1, s_2, ..., s_I) \in S for a pure strategy profile.
"-i" denotes player i's "opponents" and refers to all
players other than some given player i. Thus, we can write

    S_{-i} = \prod_{j \in I, j \ne i} S_j.

Just as before, s_{-i} \in S_{-i} denotes a pure strategy profile for
the opponents of i. Hence,

    s = (s_i, s_{-i}) \in S

is a pure strategy profile.
u_i : S \to R = pay-off function (real-valued function on S)
for player i.
u_i(s) = von Neumann-Morgenstern utility of player i for
each profile s = (s_1, s_2, ..., s_I) of pure strategies.

Definition 2.2.1 A strategic form game is a tuple

    (I, \{S_1, S_2, ..., S_I\}, \{u_1, u_2, ..., u_I\})

consisting of a set of players, pure strategy spaces and pay-off
functions.

Definition 2.2.2 A two-player zero-sum game is a strategic
form game with I = {1, 2} such that

    \forall s \in S \quad \sum_{i=1}^{2} u_i(s) = 0.

Definition 2.2.3 A mixed strategy set for player i, \Sigma_i, is the
set of probability distributions over the pure strategy set S_i:

    \Sigma_i = \{ \sigma_i : S_i \to [0, 1] \mid \sum_{s_i} \sigma_i(s_i) = 1 \}.

The space of mixed strategy profiles is \Sigma = \prod_{i \in I} \Sigma_i.
As before, we write \sigma_i \in \Sigma_i, and \sigma = (\sigma_1, \sigma_2, ..., \sigma_I) \in \Sigma.

The support of a mixed strategy \sigma_i = the set of pure strategies to which \sigma_i assigns positive probability.
Player i's pay-off to profile \sigma is

    u_i(\sigma) = E_\sigma [u_i(s)],

which expands as

    u_i(\sigma) = u_i(\sigma_i, \sigma_{-i}) = \sum_{s_i \in S_i} \sigma_i(s_i) u_i(s_i, \sigma_{-i}),

    u_i(s_i, \sigma_{-i}) = \sum_{s_{-i} \in S_{-i}} \sigma_{-i}(s_{-i}) u_i(s_i, s_{-i})
                          = \sum_{s_{-i} \in S_{-i}} \left( \prod_{j \ne i} \sigma_j(s_j) \right) u_i(s_i, s_{-i}).

Hence,

    u_i(\sigma) = \sum_{s_i \in S_i} \sigma_i(s_i) \sum_{s_{-i} \in S_{-i}} \left( \prod_{j \ne i} \sigma_j(s_j) \right) u_i(s_i, s_{-i})
                = \sum_{s \in S} \left( \prod_j \sigma_j(s_j) \right) u_i(s).
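As a quick numerical illustration of this formula (a sketch of my own; the two matrices happen to be the modified Prisoners' Dilemma pay-offs of Section 1.3, and the mixed strategies are arbitrary):

import numpy as np

def expected_payoffs(U1, U2, sigma_r, sigma_c):
    # u_i(sigma) = sum_s (prod_j sigma_j(s_j)) u_i(s), written as a bilinear form.
    return float(sigma_r @ U1 @ sigma_c), float(sigma_r @ U2 @ sigma_c)

U1 = np.array([[2.0, 0.0], [3.0, 1.0]])   # row player's pay-offs over (C, D) x (C, D)
U2 = np.array([[2.0, 3.0], [0.0, 1.0]])   # column player's pay-offs
print(expected_payoffs(U1, U2, np.array([0.5, 0.5]), np.array([0.25, 0.75])))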

2.3 Domination & Nash Equilibrium

Definition 2.3.1 A pure strategy s_i is strictly dominated for
player i if

    \exists \sigma_i' \in \Sigma_i \ \forall s_{-i} \in S_{-i} \quad u_i(\sigma_i', s_{-i}) > u_i(s_i, s_{-i}).

A pure strategy s_i is weakly dominated for player i if

    \exists \sigma_i' \in \Sigma_i \ \left( \forall s_{-i} \in S_{-i} \ u_i(\sigma_i', s_{-i}) \ge u_i(s_i, s_{-i}) \right)
        \wedge \left( \exists s_{-i} \in S_{-i} \ u_i(\sigma_i', s_{-i}) > u_i(s_i, s_{-i}) \right).

Definition 2.3.2 Best Response: The set of best responses
for player i to a pure strategy profile s \in S is

    BR_i(s) = \{ s_i^* \in S_i \mid \forall s_i \in S_i \ u_i(s_i^*, s_{-i}) \ge u_i(s_i, s_{-i}) \}.

Let the joint best response set be BR(s) = \prod_i BR_i(s).

Definition 2.3.3 Nash Equilibrium: A pure strategy profile
s* is a Nash equilibrium if for all players i,

    \forall s_i \in S_i \quad u_i(s_i^*, s_{-i}^*) \ge u_i(s_i, s_{-i}^*).

Thus a Nash equilibrium is a strategy profile s* such that s* \in BR(s*).
A Nash equilibrium s* is strict if each player has a unique
best response to his rivals' strategies, BR(s*) = {s*}:

    \forall s_i \ne s_i^* \quad u_i(s_i^*, s_{-i}^*) > u_i(s_i, s_{-i}^*).

A mixed strategy profile \sigma^* is a Nash equilibrium if for all players i,

    \forall s_i \in S_i \quad u_i(\sigma_i^*, \sigma_{-i}^*) \ge u_i(s_i, \sigma_{-i}^*).

Remark: Since expected utilities are "linear in the probabilities," if a player uses a non-degenerate mixed strategy in a Nash
equilibrium (non-singleton support), he must be indifferent between all pure strategies to which he assigns positive probability.
(It suffices to check that no player has a profitable pure-strategy
deviation.)

2.4 Example

Example

        L       M       R
U       4,3     5,1     6,2
M       2,1     8,4     3,6
D       3,0     9,6     2,8

For column-player, M is dominated by R. Column-player can
eliminate M from his strategy space. The pay-off matrix reduces
to

New Pay-offs

        L       R
U       4,3     6,2
M       2,1     3,6
D       3,0     2,8

For row-player, M and D are dominated by U. Row-player
can eliminate M and D. The new pay-off matrix is

New Pay-offs

        L       R
U       4,3     6,2

Next, column-player eliminates R, as it is dominated by L,
and reduces the pay-off matrix to

New Pay-offs

        L
U       4,3

Note that

    BR_r(U, L) = U,   BR_c(U, L) = L,   and   BR(U, L) = (U, L).

(U, L) is a strict Nash equilibrium.
Remark: Mixed Strategy (not a Nash equilibrium):

    \sigma_r = (1/3, 1/3, 1/3),   \sigma_c = (0, 1/2, 1/2),   \sigma = (\sigma_r, \sigma_c).

Thus

    u_r(\sigma_r, \sigma_c) = \sum_s \left( \prod_j \sigma_j(s_j) \right) u_r(s)
        = (1/3 \cdot 0) 4 + (1/3 \cdot 1/2) 5 + (1/3 \cdot 1/2) 6
        + (1/3 \cdot 0) 2 + (1/3 \cdot 1/2) 8 + (1/3 \cdot 1/2) 3
        + (1/3 \cdot 0) 3 + (1/3 \cdot 1/2) 9 + (1/3 \cdot 1/2) 2
        = 5.5,

and

    u_c(\sigma_r, \sigma_c) = \sum_s \left( \prod_j \sigma_j(s_j) \right) u_c(s)
        = (1/3 \cdot 0) 3 + (1/3 \cdot 1/2) 1 + (1/3 \cdot 1/2) 2
        + (1/3 \cdot 0) 1 + (1/3 \cdot 1/2) 4 + (1/3 \cdot 1/2) 6
        + (1/3 \cdot 0) 0 + (1/3 \cdot 1/2) 6 + (1/3 \cdot 1/2) 8
        = 4.5.

Thus this mixed strategy leads to a much better pay-off in
comparison to the pure strategy Nash equilibrium.
A pure strategy may be strictly dominated by a mixed strategy, even if it is not strictly dominated by any pure strategy.

Example

        L       R
U       2,0     -1,0
M       0,0     0,0
D       -1,0    2,0

For row-player, M is not dominated by either U or D. But M
is dominated by the mixed strategy \sigma_r = (1/2, 0, 1/2), whose
pay-off is 1/2 against L and 1/2 against R, strictly greater than
M's pay-off of 0 in each case.
Going back to the "Prisoners' Dilemma" game, note that its
Nash equilibrium is in fact (D, D) [both players defect]:

    BR_r(C, C) = BR_r(C, D) = BR_r(D, C) = BR_r(D, D) = D,
    BR_c(C, C) = BR_c(C, D) = BR_c(D, C) = BR_c(D, D) = D,
    BR(C, C) = BR(C, D) = BR(D, C) = BR(D, D) = (D, D).

2.4.1 Matching Pennies

Matching Pennies
H T
H 1,-1 -1,1
T -1,1 1,-1

There are two players: "Matcher" (row-player) and "Mismatcher" (column-player). Matcher and Mismatcher both have
two strategies: "call head" (H) and "call tail" (T). Matcher
wins 1 util if both players call the same [(H,H) or (T,T)] and
Mismatcher wins 1 util if the players call differently [(H,T) or
(T,H)]. It is easy to see that this game has no pure-strategy
Nash equilibrium. However, it does have a mixed-strategy Nash
equilibrium:

    \sigma_r = (1/2, 1/2),   \sigma_c = (1/2, 1/2).

The pay-offs are

    u_r(\sigma) = (1/2 \cdot 1/2) 1 + (1/2 \cdot 1/2)(-1) + (1/2 \cdot 1/2)(-1) + (1/2 \cdot 1/2) 1 = 0,
    u_c(\sigma) = (1/2 \cdot 1/2)(-1) + (1/2 \cdot 1/2) 1 + (1/2 \cdot 1/2) 1 + (1/2 \cdot 1/2)(-1) = 0.

2.5 Key Ingredients for Nash Equilibrium

1. Introspection (Fictitious play)
2. Deduction/Rationality
3. Knowledge of Opponents' Pay-offs
4. Common Knowledge

2.6 Revisiting On-line Learning

2.6.1 Convergence

Note that in the earlier discussion of the on-line learning strategy, we noted that the on-line learning algorithm is competitive
[with a competitive factor of (\ln 1/\beta)/(1-\beta) \approx 1 + (1-\beta)/2 +
(1-\beta)^2/3 + \cdots, for small (1-\beta)] over any sufficiently large time
interval [0, T]. But it is also fairly easy to note that the probabilities that the row-player chooses do not necessarily converge
to the best mixed strategy. Namely,

    W_T(i) = \beta^{\sum_t M(i, \sigma_{c,t})}   and   \sigma_{r,T}(i) = \frac{W_T(i)}{\sum_i W_T(i)}.

We have not explicitly shown that \lim_{T \to \infty} \sigma_{r,T} converges in
distribution to \sigma_r^*. Does the computed distribution converge
to anything? In the absence of any convergence property, one
may justifiably question how the algorithm can be interpreted
as learning a strategy.

2.6.2 Irrationality

Let us look at the "Matching Pennies" problem again:

Matching Pennies

        H       T
H       1,-1    -1,1
T       -1,1    1,-1

Suppose the column-player chooses a mixed strategy at time
t such that \sigma_{c,t}(H) > 1/2 [and \sigma_{c,t}(T) = 1 - \sigma_{c,t}(H) < 1/2];
then for the row-player, the best response is BR_{r,t}(\sigma_t) = H
and is unique. By a similar reasoning, if \sigma_{c,t}(H) < 1/2 [and
\sigma_{c,t}(T) > 1/2], then for the row-player, the best response is
BR_{r,t}(\sigma_t) = T. Thus, if the rival deviates from his Nash equilibrium mixed strategy \sigma_{c,t} = (1/2, 1/2), then the row-player's (rational) best response is always a pure strategy, H or T. Thus,
if the row-player had a convergent (rational) mixed strategy, then
depending on \lim_{T \to \infty} \{\sigma_{c,t}\}_0^T, the row-player must converge to
one of the following three (conventional) strategies:
1. random(1/2, 1/2) (the Nash equilibrium mixed strategy),
2. H* (always H), or
3. T* (always T).
Anything else would make the row-player irrational. Thus,
a player playing the on-line learning algorithm must be almost
always irrational!

2.6.3 A Meta-Theorem of Foster & Young

Definition 2.6.1 An infinite sequence \sigma_{c,t} is almost constant
if there exists a \sigma_c such that \sigma_{c,t} = \sigma_c almost always (a.a.). That
is,

    \lim_{T \to \infty} \frac{|\{ t \le T : \sigma_{c,t} \ne \sigma_c \}|}{T} = 0.

If \sigma_{c,t} is not almost constant, then

    \forall \sigma_c = const \quad \sigma_{c,t} \ne \sigma_c  infinitely often (i.o.).

Consider an n-player game with a strategy space S_1 \times S_2 \times
\cdots \times S_n = S and with the utility functions u_i : S \to R. All
actions are publicly observed. Let \Delta_i = the set of probability
distributions over S_i. Let \Delta = \prod_i \Delta_i be the product set of mixtures. Before every round of the game, a state can be described
by a family of probability distributions

    \{ (\sigma_i, \sigma_{i,j}) \}_{i \ne j},

where

    \sigma_i \in \Delta_i = player i's mixed strategy,
    \sigma_{i,j} \in \Delta_j = player i's belief about player j's mixed strategy.

Definition 2.6.2 Rationality: Each player chooses only best
replies given his beliefs:

    \forall i \ne j \quad \sigma_i(s_i) > 0 \Rightarrow s_i \in BR_i(\sigma_{i,j}).

Definition 2.6.3 Learning: Player i has its own deterministic learning process \{f_i, f_{i,j}\} which it uses in determining its
strategy and its beliefs. In particular, let h_t = all publicly available information up to time t. Then, player i chooses its strategy
and beliefs as follows:

    f_i : h_{t-1} \mapsto \sigma_{i,t},
    f_{ij} : h_{t-1} \mapsto \sigma_{ij,t}.

The learning process is informationally independent if the \sigma_{ij,t} =
f_{ij}(h_{t-1}) do not depend on any extraneous information.

Definition 2.6.4 Convergence: The beliefs are said to converge along a learning path \{h_t, \sigma_{i,t}, \sigma_{ij,t}\}_0^\infty if

    \forall i \ne j \ \exists \sigma_{ij}^* \in \Delta_j \quad \lim_{t \to \infty} \sigma_{ij,t} = \sigma_{ij}^*.

The strategies are said to converge along a learning path
\{h_t, \sigma_{i,t}, \sigma_{ij,t}\}_0^\infty if

    \forall i \ \exists \sigma_i^* \in \Delta_i \quad \lim_{t \to \infty} \sigma_{i,t} = \sigma_i^*.

The beliefs are said to be predictive along a learning path if

    \forall i \ne j \quad \lim_{t \to \infty} (\sigma_{ij,t} - \sigma_{j,t}) = 0,

i.e., player i's belief about j asymptotically tracks j's actual strategy,
and they are strongly predictive if in addition both the beliefs
and strategies converge.

Theorem 2.6.1 Consider a finite 2-person game (players: row-player and column-player) with a strict (thus, unique) Nash
equilibrium \sigma^* = (\sigma_r^*, \sigma_c^*) which has full support on S_r \times S_c.
Let \{(f_r, f_{rc}), (f_c, f_{cr})\} be a DRIP learning process (D = Deterministic, R = Rational, I = Informationally independent and P
= Predictive).
On any learning path (h_t, (\sigma_{r,t}, \sigma_{rc,t}), (\sigma_{c,t}, \sigma_{cr,t})), if the beliefs
are not almost constant with value \sigma^*, then the beliefs do not
converge.
Proof:
Assume, to the contrary, that the beliefs converge. Since they are
not almost constant with value \sigma^*, \sigma_{rc,t} \ne \sigma_c^* i.o. Then, infinitely
often, the best-reply set to \sigma_{rc,t} does not have full support, and

    \exists s_{r,t} \in S_r \quad s_{r,t} \notin BR_r(\sigma_{rc,t});

by the finiteness of the strategy set S_r,

    \exists s_r \in S_r \quad s_r \notin BR_r(\sigma_{rc,t})  i.o.

By rationality of the row-player,

    \exists s_r \in S_r \quad \sigma_{r,t}(s_r) = 0  i.o.,   and   \exists s_r \in S_r \quad \lim_{t \to \infty} \sigma_{r,t}(s_r) = 0.

By a similar argument,

    \exists s_c \in S_c \quad \lim_{t \to \infty} \sigma_{c,t}(s_c) = 0.

Since the learning is assumed to be predictive, we get

    \lim_{t \to \infty} \sigma_{cr,t}(s_r) = 0   and   \lim_{t \to \infty} \sigma_{rc,t}(s_c) = 0.

Thus, if the beliefs converge (say, to (\sigma_r^\infty, \sigma_c^\infty)), then the beliefs
(and also, by predictivity, the strategies) converge to some strategies other than the unique Nash equilibrium (which is unique with
full support). Hence one of the following two holds at the limit:

    \exists t_r \in S_r \setminus \{s_r\} \quad \sigma_r^\infty(t_r) > 0  and  t_r \notin BR_r(\sigma_c^\infty),

or

    \exists t_c \in S_c \setminus \{s_c\} \quad \sigma_c^\infty(t_c) > 0  and  t_c \notin BR_c(\sigma_r^\infty).

But, depending on which of the two holds, we conclude
that either the row-player or the column-player (or both) must be irrational, a contradiction.

Chapter 3
Nash Equilibrium
3.1 Nash Equilibrium

3.1.1 Fixed Point Theorems

Definition 3.1.1 A point x \in K is a fixed point of a function
f : K \to K if x = f(x).

Definition 3.1.2 A point x \in K is a fixed point of a correspondence
\Phi : K \to 2^K if x \in \Phi(x).

Theorem 3.1.1 Brouwer's Fixed Point Theorem: If f :
K \to K is a continuous function from a nonempty, compact,
convex subset K of a finite dimensional TVS (topological vector
space) into itself, then f has a fixed point, i.e.,

    \exists x \in K \quad x = f(x).

Theorem 3.1.2 Kakutani's Fixed Point Theorem: If \Phi :
K \to 2^K is a convex-valued, uhc (upper hemi-continuous) map
from a nonempty, compact, convex subset K of a finite dimensional TVS to the nonempty subsets of K, then \Phi has a fixed
point, i.e.,

    \exists x \in K \quad x \in \Phi(x).

Definition 3.1.3 Topological Vector Space: L = vector space
with a T1 topology

    (\forall x \ne y \in L \ \exists G_x = open set: x \in G_x \wedge y \notin G_x)

which admits continuous vector space operations.
Example: R^n with the standard Euclidean topology. (Up to isomorphism, this is the only instance
of a finite dimensional TVS.)

Theorem 3.1.3 Existence of a Mixed Strategy Equilibrium (Nash 1950). Every finite strategic-form game has a mixed-strategy equilibrium.
Proof: Player i's reaction correspondence, r_i, maps each
strategy profile \sigma to the set of mixed strategies that maximize
player i's pay-offs when his rivals play \sigma_{-i}:

    r_i(\sigma) = \{ \sigma_i' \mid \forall s_i \in S_i \ u_i(\sigma_i', \sigma_{-i}) \ge u_i(s_i, \sigma_{-i}) \}.

Thus,

    r_i : \Sigma \to 2^{\Sigma_i}.

Define

    r : \Sigma \to 2^{\Sigma} : \sigma \mapsto \prod_i r_i(\sigma).

Thus this correspondence is the Cartesian product of the r_i's.
A fixed point of r (if it exists) is a \sigma such that \sigma \in r(\sigma).
Note that then

    \forall s_i \in S_i \quad u_i(\sigma_i, \sigma_{-i}) \ge u_i(s_i, \sigma_{-i}),

by definition. Thus a fixed point of r provides a mixed strategy
equilibrium \sigma.
Claims:
1. \Sigma = nonempty, compact and convex subset of a TVS.
\Sigma_i is the (|S_i| - 1)-dimensional simplex, since

    \Sigma_i = \{ (\sigma_{i,1}, ..., \sigma_{i,|S_i|}) \mid \sigma_{i,j} \ge 0, \ \sum_j \sigma_{i,j} = 1 \}.

The rest follows since \Sigma = \prod_i \Sigma_i.

2. u_i = linear function of player i's own mixed strategy:

    \forall 0 < \lambda < 1 \quad u_i(\lambda \sigma_i' + (1-\lambda) \sigma_i'', \sigma_{-i})
        = \lambda u_i(\sigma_i', \sigma_{-i}) + (1-\lambda) u_i(\sigma_i'', \sigma_{-i}).

Hence u_i is a continuous function of his own mixed strategy. Since \Sigma is compact, u_i attains its maximum in \Sigma:

    \forall \sigma \in \Sigma \quad r(\sigma) \ne \emptyset.

3.

    \forall \sigma \in \Sigma \quad r(\sigma) is convex.

Let \sigma_i', \sigma_i'' \in r_i(\sigma). By definition,

    \forall s_i \in S_i \quad (u_i(\sigma_i', \sigma_{-i}) \ge u_i(s_i, \sigma_{-i}))
        \wedge (u_i(\sigma_i'', \sigma_{-i}) \ge u_i(s_i, \sigma_{-i})).

Hence

    \forall 0 < \lambda < 1 \ \forall s_i \in S_i \quad u_i(\lambda \sigma_i' + (1-\lambda) \sigma_i'', \sigma_{-i}) \ge u_i(s_i, \sigma_{-i}),

and

    \forall 0 < \lambda < 1 \quad \lambda \sigma_i' + (1-\lambda) \sigma_i'' \in r_i(\sigma).
4. r = uhc. Consider a sequence

    \{ (\sigma^n, \hat\sigma^n) \mid \hat\sigma^n \in r(\sigma^n) \}_n.

We wish to show that

    if \lim_{n \to \infty} (\sigma^n, \hat\sigma^n) = (\sigma, \hat\sigma), then \hat\sigma \in r(\sigma).

Suppose not! Then

    \forall n \quad \hat\sigma^n \in r(\sigma^n),   but   \hat\sigma \notin r(\sigma) \Rightarrow \hat\sigma_i \notin r_i(\sigma)  for some i.

Thus,

    \exists \epsilon > 0 \ \exists \sigma_i' \in \Sigma_i \quad u_i(\sigma_i', \sigma_{-i}) > u_i(\hat\sigma_i, \sigma_{-i}) + 3\epsilon.

Since u_i is continuous, there is a sufficiently large N such
that

    u_i(\sigma_i', \sigma_{-i}^N) > u_i(\sigma_i', \sigma_{-i}) - \epsilon
                               > u_i(\hat\sigma_i, \sigma_{-i}) + 2\epsilon
                               > u_i(\hat\sigma_i^N, \sigma_{-i}^N) + \epsilon.

Thus, \hat\sigma_i^N \notin r_i(\sigma^N), a contradiction.
Thus we conclude that r : \Sigma \to 2^{\Sigma} is a convex-valued,
uhc map from a nonempty, compact, convex subset \Sigma of a finite
dimensional TVS to nonempty subsets of \Sigma. Thus by Kakutani's
fixed point theorem

    \exists \sigma \in \Sigma \quad \sigma \in r(\sigma),

and \sigma is a mixed strategy Nash equilibrium.

Chapter 4
Beyond Nash: Domination,
Rationalization and
Correlation
4.1 Beyond Nash
We have seen that it is impossible to \learn" a Nash equilibrium
if we insist on DRIP conditions. A resolution to this dilemma
can involve one or more of the following approaches:
1. Explore simpler requirements than Nash equilibria: e.g.,
undominated sets, rationalizable sets and correlated equilibria. (The first two correspond to minmax and maxmin
requirements. The last one requires some side information
and may make the system informationally dependent.)
2. Requirement of predictivity may need to be abandoned.
3. Requirement of rationality may need to be abandoned.

4.1.1 Correlated Equilibrium

This concept extends the Nash concept by supposing that the


players can build a \correlated device" that sends each of the
players a private signal before they choose their strategy.
26

Main Ingredients: Predictions using only the assumption
that the structure of the game (i.e., the strategy spaces and pay-offs, the S_i's and u_i's) and the rationality of the players are common
knowledge.

4.2 Iterated Strict Dominance and Rationalizability

Definition 4.2.1 Iterated Strict Dominance: Let

    S_i^0 = S_i   and   \Sigma_i^0 = \Sigma_i.

For all n > 0, let

    S_i^n = \{ s_i \in S_i^{n-1} \mid
        \forall \sigma_i' \in \Sigma_i^{n-1} \ \exists s_{-i} \in S_{-i}^{n-1} \quad
        u_i(s_i, s_{-i}) \ge u_i(\sigma_i', s_{-i}) \}

(thus s_i is not strictly dominated: against some strategy
profile of the rivals it does at least as well as any given mixed strategy), and define

    \Sigma_i^n = \{ \sigma_i \in \Sigma_i \mid \sigma_i(s_i) > 0 \Rightarrow s_i \in S_i^n \}.

Let

    S_i^\infty = \bigcap_{n=0}^{\infty} S_i^n

be the set of player i's pure strategies that survive iterated deletion of strictly dominated strategies.
Let

    \Sigma_i^\infty = \{ \sigma_i \in \Sigma_i \mid
        \forall \sigma_i' \in \Sigma_i \ \exists s_{-i} \in S_{-i}^\infty \quad
        u_i(\sigma_i, s_{-i}) \ge u_i(\sigma_i', s_{-i}) \}

be the set of player i's mixed strategies that survive iterated deletion of strictly dominated strategies.

Example:

        L       R
U       1,3     -2,0
M       -2,0    1,3
D       0,1     0,1

Note that

    S_r^0 = \{U, M, D\}   and   \Sigma_r^0 = all mixed strategies over S_r^0.

Similarly,

    S_c^0 = \{L, R\}   and   \Sigma_c^0 = all mixed strategies over S_c^0.

Also note that

    S_r^\infty = \cdots = S_r^2 = S_r^1 = S_r^0,   and   S_c^\infty = \cdots = S_c^2 = S_c^1 = S_c^0.

Note, however, that for all values p \in (1/3, 2/3) the mixed strategy \sigma_r = (p, 1-p, 0) is strictly dominated by D. Thus,

    \Sigma_r^1 \subsetneq \Sigma_r^0.

4.2.1 Some Properties of Undominated Sets

    S^\infty = S_1^\infty \times S_2^\infty \times \cdots \times S_I^\infty,   and   \Sigma^\infty = \Sigma_1^\infty \times \Sigma_2^\infty \times \cdots \times \Sigma_I^\infty.

1. The final surviving strategy spaces are independent of the
elimination order.
2. A strategy is strictly dominated against all pure strategies
of the rivals if and only if it is dominated against all of their
mixed strategies. Thus, the following is an equivalent definition
of the undominated sets:

    S_i^0 = S_i   and   \Sigma_i^0 = \Sigma_i,

    S_i^n = \{ s_i \in S_i^{n-1} \mid
        \forall \sigma_i' \in \Sigma_i^{n-1} \ \exists s_{-i} \in S_{-i}^{n-1} \quad
        u_i(s_i, s_{-i}) \ge u_i(\sigma_i', s_{-i}) \},

    \Sigma_i^n = \{ \sigma_i \in \Sigma_i^{n-1} \mid
        \forall \sigma_i' \in \Sigma_i^{n-1} \ \exists s_{-i} \in S_{-i}^{n-1} \quad
        u_i(\sigma_i, s_{-i}) \ge u_i(\sigma_i', s_{-i}) \},

and

    S_i^\infty = \bigcap_{n=0}^{\infty} S_i^n,   \Sigma_i^\infty = \bigcap_{n=0}^{\infty} \Sigma_i^n.

Definition 4.2.2 A game is solvable by iterated (strict) dominance if, for each player i, S_i^\infty is a singleton, i.e., S_i^\infty = \{s_i^*\}.
In this case, the strategy profile (s_1^*, s_2^*, ..., s_I^*) is a (unique)
Nash equilibrium.
Proof: Suppose that it is not a Nash equilibrium: that is, for
some i,

    s_i^* \notin BR_i(s_{-i}^*).

Thus

    \exists s_i \in S_i \quad u_i(s_i, s_{-i}^*) > u_i(s_i^*, s_{-i}^*).

But suppose s_i was eliminated in round n. Then

    \exists s_i' \in S_i^{n-1} \ \forall s_{-i} \in S_{-i}^{n-1} \quad u_i(s_i', s_{-i}) > u_i(s_i, s_{-i}).

Since s_{-i}^* \in S_{-i}^\infty \subseteq S_{-i}^{n-1}, we have u_i(s_i', s_{-i}^*) > u_i(s_i, s_{-i}^*). Repeating in
this fashion we get a sequence of inequalities:

    u_i(s_i^*, s_{-i}^*) > \cdots > u_i(s_i'', s_{-i}^*) > u_i(s_i', s_{-i}^*) > u_i(s_i, s_{-i}^*) > u_i(s_i^*, s_{-i}^*),

resulting in a contradiction.

4.3 Rationalizability

This notion is due to Bernheim (1984), Pearce (1984) and Aumann (1987) and provides a complementary approach to iterated
strict dominance. This approach tries to answer the following
question:

"What are all the strategies that a rational player
can play?"

A rational player will only play those strategies that are best
responses to some beliefs he has about his rivals' strategies.

Definition 4.3.1 (Rationalizable Strategies) Let

    \tilde\Sigma_i^0 = \Sigma_i.

For n > 0, let

    \tilde\Sigma_i^n = \{ \sigma_i \in \tilde\Sigma_i^{n-1} \mid
        \exists \sigma_{-i} \in \prod_{j \ne i} Conv(\tilde\Sigma_j^{n-1}) \
        \forall \sigma_i' \in \tilde\Sigma_i^{n-1} \quad
        u_i(\sigma_i, \sigma_{-i}) \ge u_i(\sigma_i', \sigma_{-i}) \}.

The rationalizable strategies for player i are

    R_i = \bigcap_{n=0}^{\infty} \tilde\Sigma_i^n.

A strategy profile \sigma is rationalizable if \sigma_i is rationalizable for
each player i. Let \sigma^* = (\sigma_1^*, \sigma_2^*, ..., \sigma_I^*) be a Nash equilibrium.
Note first that \sigma_i^* \in \tilde\Sigma_i^0 for all i. Next assume that \sigma^* \in \prod_i \tilde\Sigma_i^{n-1}.
Thus \sigma_i^* \in \tilde\Sigma_i^{n-1}, and \sigma_{-i}^* \in \prod_{j \ne i} \tilde\Sigma_j^{n-1}. Hence,

    \forall \sigma_i' \in \Sigma_i \quad u_i(\sigma_i^*, \sigma_{-i}^*) \ge u_i(\sigma_i', \sigma_{-i}^*)
        \ \Rightarrow \ \sigma_i^* \in \tilde\Sigma_i^n.

Thus, \sigma^* \in R = \prod_i R_i.
Hence,

Theorem 4.3.1 Every Nash equilibrium is rationalizable.

Section 4.4

31

Beyond Nash

Theorem 4.3.2 (Bernheim/Pearce (1984))

The set of rationalizable strategies is nonempty and contains


at least one pure strategy for each player. Further, each i 2 Ri
is (in i) a best response to an element of j6=i Conv(Rj ).

Comparing the constructions of undominated strategies with


rationalizable strategies, we note that
0i = i; and ~ 0i = i:
In the nth iteration, the undominated strategies are constructed
as
(
n
i = i 2 ni 1 j
)
0
8i02ni 1 9 i2j6=i Conv(nj 1 ) ui(i;  i)  ui(i;  i) ;
where as rationalizable strategies are constructed as
(
n
~ i = i 2 ~ ni 1 j

9

i 2j6=i Conv(~ nj 1 )

8i02~ ni 1 ui(i; 

0
i )  ui (i ;  i )

Finally,
1
1
\
n ; 1 =  1 ; and R = \ 
~ ni; R = iRi:

1
=
i i
i
i
i
n=0

n=0

A direct examination of these constructions reveals that ~ ni 


ni and hence, R  1 . Also, note that the undominated strategies are computing the minmax values where as the rationalizable
strategies compute maxmin values.

4.4 Correlated Equilibrium


Aumann's Example
c Mishra 1998

32

Beyond Nash

Chapter 4

L R
U 5,1 0,0
D 4,4 1,5
There are 3 Nash equilibria:

- A pure strategy: (U, L), with pay-offs (5, 1),
- A pure strategy: (D, R), with pay-offs (1, 5), and
- A mixed strategy: ((1/2, 1/2), (1/2, 1/2)), with pay-offs (2.5, 2.5).

Suppose that there is a publicly observable random variable


with Pr(H ) = Pr(T ) = 1=2. Let the players play (U, L) if the
outcome is H, and (D, R) if the outcome is T. Then the pay-o
is (3, 3).
By using publicly observable random variables, the players
can obtain any pay-o vector in the convex hull of the set of
Nash equilibria pay-o s.
Players can improve (without any prior contracts) if they can
build a device that sends different but correlated signals to each
of them.

4.4.1 Formal Definitions

- "Expanded games" with a correlating device.
- Nash equilibrium for the expanded game.

Definition 4.4.1 A correlating device is a triple

    (\Omega, \{H_i\}_{i \in I}, p).

- \Omega = a (finite) state space corresponding to the outcomes
of the device.
- p = probability measure on the state space \Omega.
- H_i = information partition for player i.
It assigns an h_i(\omega) to each \omega \in \Omega such that \omega \in h_i(\omega):

    h_i : \Omega \to H_i : \omega \mapsto h_i(\omega).

Player i's posterior beliefs about \Omega are given by Bayes' law:

    \forall \omega \in h_i \quad p(\omega \mid h_i) = \frac{p(\omega)}{p(h_i)}.

4.4.2 Pure Strategies for the Expanded Game

Given a correlating device (\Omega, \{H_i\}, p), we can define strategies
for the expanded game as follows: Consider a map

    \delta_i : \Omega \to S_i : \omega \mapsto \delta_i(\omega),

such that \delta_i(\omega) = \delta_i(\omega') if \omega' \in h_i(\omega).
The strategies are adapted to the information structure.

Definition 4.4.2 DEF(1) A correlated equilibrium \delta relative
to the information structure (\Omega, \{H_i\}, p) is a Nash equilibrium in
strategies that are adapted to the information structure. That is,
(\delta_1, \delta_2, ..., \delta_I) is a correlated equilibrium if

    \forall i \ \forall \tilde\delta_i \quad
    \sum_{\omega \in \Omega} p(\omega) u_i(\delta_i(\omega), \delta_{-i}(\omega))
        \ge \sum_{\omega \in \Omega} p(\omega) u_i(\tilde\delta_i(\omega), \delta_{-i}(\omega)).

Using Bayes' rule, an equivalent condition would be:

    \forall i \ \forall h_i \in H_i, p(h_i) > 0 \ \forall s_i \in S_i \quad
    \sum_{\omega : h_i(\omega) = h_i} p(\omega \mid h_i) u_i(\delta_i(\omega), \delta_{-i}(\omega))
        \ge \sum_{\omega : h_i(\omega) = h_i} p(\omega \mid h_i) u_i(s_i, \delta_{-i}(\omega)).

4.4.3 Correlated Equilibrium and Universal Device

A "universal device" signals each player how that player
should play.

Definition 4.4.3 DEF(2) A correlated equilibrium is any probability distribution p(.) over the pure strategies S_1 \times S_2 \times \cdots \times S_I
such that, for every player i and every function d_i : S_i \to S_i,

    \sum_{s \in S} p(s) u_i(s_i, s_{-i}) \ge \sum_{s \in S} p(s) u_i(d_i(s_i), s_{-i}).

Using Bayes' rule, an equivalent condition would be:

    \forall i \ \forall s_i \in S_i, p(s_i) > 0 \ \forall s_i' \in S_i \quad
    \sum_{s_{-i} \in S_{-i}} p(s_{-i} \mid s_i) u_i(s_i, s_{-i})
        \ge \sum_{s_{-i} \in S_{-i}} p(s_{-i} \mid s_i) u_i(s_i', s_{-i}).
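DEF(2) is easy to check numerically for a two-player game. The sketch below (my own illustration) verifies the conditional inequalities above for a given joint distribution p; applied to Aumann's example of Section 4.4 with the uniform distribution over {(U,L), (D,L), (D,R)}, it reports a correlated equilibrium.

import itertools
import numpy as np

def is_correlated_equilibrium(U1, U2, p, tol=1e-9):
    # For each player, each recommended strategy s and each deviation s',
    # check the inequality of DEF(2) (scaled by p(s_i), which does not
    # change its direction).
    n_rows, n_cols = p.shape
    for s, s_prime in itertools.product(range(n_rows), repeat=2):
        if (p[s, :] * U1[s_prime, :]).sum() > (p[s, :] * U1[s, :]).sum() + tol:
            return False
    for s, s_prime in itertools.product(range(n_cols), repeat=2):
        if (p[:, s] * U2[:, s_prime]).sum() > (p[:, s] * U2[:, s]).sum() + tol:
            return False
    return True

U1 = np.array([[5.0, 0.0], [4.0, 1.0]])   # row pay-offs of Aumann's example
U2 = np.array([[1.0, 0.0], [4.0, 5.0]])   # column pay-offs
p = np.array([[1/3, 0.0], [1/3, 1/3]])    # uniform over (U,L), (D,L), (D,R)
print(is_correlated_equilibrium(U1, U2, p))   # True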

Equivalence of correlated equilibria under Def(1) and Def(2):


Claim:
Def(1) ( Def(2):
Choose
= S . hi(s) = fs0js0i = sig. Leave p(s) unchanged.

Claim:
Def(1) ) Def(2):

Let  be an equilibrium w.r.t. (


; fHig; p~). De ne
X
p(s) = fp~(!)j1(!) = s1; : : : ; I (!) = sI ; ! 2
g:

Let
Thus

Ji(si) = f!ji(!) = si g:

p~(Ji(si)) = p(si ) = probability that player i is told to play si:


c Mishra 1998

Section 4.4

Adaptive Learning

35

p~(!)  (!):
i
!2Ji (si ) p~(Ji (si ))
It is the mixed strategy of the rivals that player i believes he
faces, conditional on being told to play si, and it is a convex
combination of the distributions conditional on each hi such that
i(hi) = si.

c Mishra 1998

Chapter 5
Adaptive and Sophisticated
Learning
5.1 Adaptive and Sophisticated Learning
The idea of best-reply dynamics goes back all the way to Cournot's
study of duopoly; it also forms the foundation of Walrasian equilibrium in economics, as computed by the classical tatonnement
learning process.
The underlying learning processes can be categorized into
successively stronger versions:

- Best-Reply Dynamics: However, it is also known that
these dynamics can lead to non-convergent, cyclic behavior. In
this model, an outsider with no information about the utilities (pay-offs) of the agents could eventually predict the
behavior of the agents more accurately than they themselves.

- Fictitious-Play Dynamics: The agents choose strategies that are best replies to a prediction of the competitors' play at the next round, where the predicted probability distribution is the empirical distribution of the past plays. Even
these dynamics can lead (if there is no zero-sum restriction)
to cycles of exponentially increasing length.

 Stationary Bayesian Learning Dynamics: The agents

choose strategies as functions from the information set


(empirical distribution of the past plays) without relying
on any intermediate prediction. The distribution over the
strategies changes as the empirical distribution changes.
(Reactive Learning: involves no model building.)
The dynamics may converge|but to a (mixed) strategy
pro le that is not necessarily the perfect (Nash) equilibrium.

5.2 Set-up
Player n plays a sequence of plays: {x_n(t)}. Each x_n(t) is a
pure strategy and is chosen by the rules of player n's learning
algorithm. We are interested in two properties that may be
satisfied by {x_n(t)}: if it is approximately best-reply dynamics,
then it is consistent with adaptive learning; if it is approximately
fictitious-play dynamics, then it is consistent with sophisticated
learning.

De nition 5.2.1 fxn(t)g is consistent with adaptive learning. Player n eventually chooses only strategies that are nearly

best replies to some probability distribution over his rivals joint


strategy pro les, where near zero probabilities are assigned to
strategies that have not been played for suciently long time.

De nition 5.2.2 fxn(t)g is consistent with sophisticated


learning. Player n eventually chooses only nearly best replies to

his probabilistic forecast of rivals' joint strategy pro les, where


the support of probability may include not only past plays but
also strategies that the rivals may choose if they themselves were
adaptive or sophisticated learners.
c Mishra 1998

38

Adaptive Learning

Chapter 5

We will look at the e ect of these algorithms on nite player


games, with compact strategies and continuous pay-o functions .
Note that these assumptions are consistent with the usual
model of exchange economy with in nitely divisible goods. Note
that in this model, serially undominated set is a singleton and
thus the Walrasian equilibrium. One of the main results that we
will see is that in any process, consistent with adaptive learning,
play tends towards the serially undominated set and hence, in an
exchange economy, adaptive learning would lead to equilibrium.

5.3 Formulation

Definition 5.3.1 Noncooperative game

    \Gamma = (N, (S_n, n \in N), \pi).

N = finite player set.
S_n = player n's strategy set, a compact subset of some normed space.
\pi = pay-off function, assumed continuous.

    S = \prod_{n \in N} S_n,   x \in S \Rightarrow x = (x_n, x_{-n}),

where x_{-n} is the strategy choice of n's rivals.

    \pi : S \to R^{|N|} = pay-off function, continuous;
    \pi_n : S \to R : (x_n, x_{-n}) \mapsto \pi_n(x).

Let T be a set. Then \Delta(T) = set of all probability distributions over T.
\Delta(S_n) = mixed strategies on S_n.  \Delta_{-n} = \prod_{j \ne n} \Delta(S_j) =
mixed strategies of n's rivals.

Definition 5.3.2 A strategy x_n \in S_n is \epsilon-dominated by another
strategy \tilde{x}_n \in \Delta(S_n) if

    \forall z_{-n} \in S_{-n} \quad \pi_n(x_n, z_{-n}) + \epsilon < \pi_n(\tilde{x}_n, z_{-n}).

If, for all \epsilon, x_n is \epsilon-dominated by \tilde{x}_n, then x_n is dominated by \tilde{x}_n
(in the classical sense).

Let T \subseteq S. Define T_n \equiv T|_{S_n} = the projection of T onto S_n, and
T_{-n} = \prod_{j \ne n} T_j.

Definition 5.3.3 Given T \subseteq S, let

    U_n^\epsilon(T) = \{ x_n \in S_n : \forall y_n \in \Delta(S_n) \ \exists z_{-n} \in T_{-n} \quad
                         \pi_n(x_n, z_{-n}) + \epsilon \ge \pi_n(y_n, z_{-n}) \},

    U^\epsilon(T) = \prod_{n \in N} U_n^\epsilon(T).

U_n^\epsilon(T) = the pure strategies in S_n that are not \epsilon-dominated when
n's rivals are limited to T_{-n}.

Fact 1

The operator U^\epsilon is monotonic. Let R and T be sets of strategy
profiles:

    R \subseteq T \Rightarrow U^\epsilon(R) \subseteq U^\epsilon(T).

Fact 2

    \exists T \quad U^\epsilon(T) \not\subseteq T.

In general, starting with some arbitrary set of strategy profiles T,
one may not be able to create a monotonically descending chain
of sets of strategy profiles:

    T \supseteq U^\epsilon(T) \supseteq U^{\epsilon,2}(T) \supseteq \cdots \supseteq U^{\epsilon,k}(T) \supseteq U^{\epsilon,k+1}(T) \supseteq \cdots

Fact 3

However, S \supseteq U^\epsilon(S): since S is the whole space, nothing new can be
introduced.
By the monotonicity of U^\epsilon, we see that if

    U^{\epsilon,k}(T) \supseteq U^{\epsilon,k+1}(T),

then

    U^\epsilon(U^{\epsilon,k}(T)) \supseteq U^\epsilon(U^{\epsilon,k+1}(T)),

and

    U^{\epsilon,k+1}(T) \supseteq U^{\epsilon,k+2}(T).

Putting it all together, we do have

    S \supseteq U^\epsilon(S) \supseteq U^{\epsilon,2}(S) \supseteq \cdots \supseteq U^{\epsilon,k}(S) \supseteq U^{\epsilon,k+1}(S) \supseteq \cdots

We then define

    U^{\epsilon,\infty}(S) = \bigcap_{k=0}^{\infty} U^{\epsilon,k}(S).

Hence, U^{0,\infty}(S) = \lim_{\epsilon \to 0} U^{\epsilon,\infty}(S) = the serially undominated strategy set. We say x is serially undominated if x \in U^{0,\infty}(S).
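For a finite game the iteration S \supseteq U^\epsilon(S) \supseteq U^{\epsilon,2}(S) \supseteq \cdots can be computed directly. The sketch below (my own illustration) simplifies Definition 5.3.3 in one respect: it tests \epsilon-domination only against pure strategies y_n rather than all mixed strategies in \Delta(S_n), which is enough for small examples such as the 3x3 game of Section 2.4.

import numpy as np

def U_eps(pi1, pi2, rows, cols, eps=0.0):
    # Keep a pure strategy if no (pure) rival-of-itself strategy eps-dominates it
    # against every profile of the opponents currently in (rows, cols).
    new_rows = [r for r in rows
                if all(any(pi1[r, c] + eps >= pi1[y, c] for c in cols)
                       for y in range(pi1.shape[0]))]
    new_cols = [c for c in cols
                if all(any(pi2[r, c] + eps >= pi2[r, y] for r in rows)
                       for y in range(pi2.shape[1]))]
    return new_rows, new_cols

def serially_undominated(pi1, pi2, eps=0.0):
    rows, cols = list(range(pi1.shape[0])), list(range(pi1.shape[1]))
    while True:
        new_rows, new_cols = U_eps(pi1, pi2, rows, cols, eps)
        if (new_rows, new_cols) == (rows, cols):
            return rows, cols
        rows, cols = new_rows, new_cols

pi1 = np.array([[4, 5, 6], [2, 8, 3], [3, 9, 2]], dtype=float)   # Section 2.4 game
pi2 = np.array([[3, 1, 2], [1, 4, 6], [0, 6, 8]], dtype=float)
print(serially_undominated(pi1, pi2))   # ([0], [0]), i.e., only (U, L) survives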

Definition 5.3.4 A sequence of strategies {x_n(t)} is consistent
with adaptive learning by player n if

    \forall \epsilon > 0 \ \forall \hat{t} \ \exists t^* \ \forall t \ge t^* \quad
    x_n(t) \in U_n^\epsilon\left( \{ x(s) : \hat{t} \le s < t \} \right).

A sequence of strategy profiles {x(t)} is consistent with adaptive learning if each {x_n(t)} has this property.

5.4 Looking Forward

    F^{\epsilon,0}(\hat{t}, t) = U^\epsilon\left( \{ x(s) : \hat{t} \le s < t \} \right),

and, for all k \ge 1,

    F^{\epsilon,k}(\hat{t}, t) = U^\epsilon\left( F^{\epsilon,k-1}(\hat{t}, t) \cup \{ x(s) : \hat{t} \le s < t \} \right).

Section 5.4

41

Learning: MR

Lemma 5.4.1
F ;0(t^; t)  F ;1(t^; t)      F ;k(t^; t)  F ;k+1(t^; t)    
Proof
By the monotonicity of U  ,
F ;0(t^; t)  F ;1(t^; t):
Assume by inductive hypothesis,
F ;k 1(t^; t)  F ;k (t^; t):
Then

F ;k 1(t^; t) [ fx(s) : ^t  s < tg


 F ;k(t^; t) [ fx(s) : t^  s < tg:
By the monotonicity of U  ,

U


Thus

F ;k 1(t^; t)
U

[ fx(s) : t^  s < tg

F ;k (t^; t)

!
^
[ fx(s) : t  s < tg :

F ;k (t^; t)  F ;k+1(t^; t):

De nition 5.4.1 A sequence of strategies fxn(t)g is consistent


with sophisticated learning by player n if

8>0 8^t 9t 8tt xn(t) 2 Un (F 1(t^; t)):


A sequence of strategy pro les fx(t)g is consistent with sophisticated learning if each fxn (t)g has this property.
8>0 8^t 9t 8tt x(t) 2 F 1(t^; t):
c Mishra 1998

Chapter 6
Learning a la Milgrom and
Roberts
6.1 Adaptive Learning and Undominated
Sets
Example: Battle of Sexes

W\M             Ballet (B)   Football (F)
Ballet (B)      2,1          0,0
Football (F)    0,0          1,2

Let {x(t)} be a sequence of strategy profiles. We show that
x(t) = (F, B) for all t is consistent with sophisticated learning.

    \forall \hat{t} \quad \{ x(s) \mid \hat{t} \le s < t \} = \{ (F, B) \}.

Thus, we have

    F_W^{\epsilon,0}(\hat{t}, t) = U_W^\epsilon( \{ (F, B) \} ) = B,
    F_M^{\epsilon,0}(\hat{t}, t) = U_M^\epsilon( \{ (F, B) \} ) = F.

Thus

    F^{\epsilon,0}(\hat{t}, t) = \{ (B, F) \}.

Similarly,

    F^{\epsilon,1}(\hat{t}, t) = U^\epsilon( \{ (B, F), (F, B) \} ) = \{B, F\} \times \{B, F\}.

Continuing in this fashion, we get

    F^{\epsilon,\infty}(\hat{t}, t) = \{B, F\} \times \{B, F\}.

Thus

    x(t+1) = (F, B) \in F^{\epsilon,\infty}(\hat{t}, t),

and the sequence is consistent with sophisticated learning.

6.2 Convergence

De nition 6.2.1 A sequence of strategy pro les fx(t)g con-

verges omitting correlation to a correlated strategy pro le

G 2 (S )
if (1) and (2) hold:
1. Gtn converges weakly to the marginal distribution Gn for
all n.
2.

8>0 9t 8tt 8n2N d[xn(t); supp(Gn )] < ;


De ne d[x; T ]  inf y2T kx yk.

The sequence converges to the correlated strategy G 2 (S )


if in addition
Gt converges weakly to G.

De nition 6.2.2 A sequence fx(t)g converges omitting correlation to a mixed strategy Nash equilibrium if

c Mishra 1998

44

Learning: MR

Chapter 6

1. It replicates the empirical frequency of the separate mixed


strategies and
2. It eventually plays only pure strategies that are in or near
the support of the equilibrium mixed strategies.

Theorem 6.2.1 If fx(t)g converges omitting correlation to a


correlated equilibrium in the game , then fx(t)g is consistent
with adaptive learning.
Proof Sketch:
Gt converges to a correlated equilibrium G.
) Gn consists of best responses to G n
) For suciently large t, xn(t) is within  of Gn
) Since Sn is compact and  is continuous

8yn2Gn 9z
)

n 2G n

9>0 n(xn(t); z n) +   n(yn; z n )

xn (t) 2 Un

!
fx(s) j t^  s < tg :

Theorem 6.2.2 Suppose that the sequence fx(t)g is consistent

with adaptive learning and that it converges to x . Then x is a


pure strategy Nash equilibrium.
Proof Sketch:
Assume that x is not a Nash equilibrium
) 9n2N 8>0 fxng =6 U (fxg):
) Player n must play x0n 6= xn i.o.
) xn(t) does not converge to xn
) Contradiction.

Theorem 6.2.3 Let fx(t)g be consistent with sophisticated learning. Then for each  > 0 and k 2 N there exists a time tk after
which (i.e., for t  tk )
x(t) 2 U k (S ):
Proof Sketch:
Fix  > 0. De ne tk  tk (Change in notation).
c Mishra 1998

Section 6.2

45

Learning: MR

Case k = 0: t0 = 0. x(t) 2 U  (S ).
Case k = j + 1: By the inductive hypothesis there exists a

tj such that

8ttj x(t) 2 U j (S ):

Hence

fx(s) j tj  s < tg  U j (S ):

Since fx(t) is consistent with sophisticated learning, we can


choose
t^ = tj ; tj+1 = max(t^; t):
Then
8ttj+1 x(t) 2 F 1(tj ; t):

Claim:

F 1(tj ; t)  U ;j+1 (S ):
Equivalently,

8i F i(tj ; t)  U ;j+1 (S ):

It then follows that

F 0(tj ; t)

F ;i+1(tj ; t)

U

fx(s) j tj  s < tg

U

U ;j (S )

= U ;j+1(S ):

[ fx(s) j tj  s < tg
 U  (U ;j+1(S ) [ U ;j (S ))
=

U

F ;i(tj ; t)

= U  (U ;j (S )) = U ;j+1 (S ):

\\
k >0

U k (S ) =

\ 0k
U (S ) = U 01(S ):
k

c Mishra 1998

46

Learning: MR

Chapter 6

Theorem 6.2.4 Let fx(t)g be consistent with sophisticated learn-

ing and Sn1 be the set of strategies that are played in nitely often
in fxn(t)g. Then
\ \ k
U (S ) = U 01(S )
S 1 = n2N Sn1 
k >0

Corollary 6.2.5 In particular, for any nite game , all play

lies eventually in the set of serially undominated strategies U 01 (S ).


Theorem 6.2.6 Suppose U 01(S ) = fxg.
kx(t) xk ! 0
i fx(t)g is consistent with adaptive learning.
Proof Sketch:
())

Since  is continuous,

kx(t) xk ! 0:

8>0 9t 8t>t 8n2N n(xn(t); x n(t)) maxfn (yn; x n(t))jyn 2 Sng
< [n(x) + =2] [maxfn(yn; x n)jyn 2 Sn g =2]
= :

xn(t) 2 Un (fx(t)g)

Un

!

fx(s)jt  s < tg :

) fx(t)g is consistent with adaptive learning.


(()
Let x = accumulation point of fx(t)g.
8k 9t 8t>t x(t) 2 U k (S ):

x 2

1
\\

U ;k (S )

>0 k=1
\
\ ;k
U (S ) = U 01(S ) = fxg
=

)
c Mishra 1998

k >0

kx(t) xk ! 0:

Section 6.3

47

Learning: MR

Theorem 6.2.7 Suppose U 01(S ) = fxg.


kx(t) xk ! 0
i x(t) is consistent with sophisticated learning.

6.3 Stochastic Learning Processes


We now allow the players to experiment, as we will now assume
that each player may not know his own pay-off function. See
Fudenberg & Kreps (1988).
The game consists of alternations among
- Exploration: every strategy is experimented with, with equal probability.
- Exploitation: good strategies, based on exploration, are played.
At each date t, player n conducts an experiment with probability \epsilon_{n,t} in an attempt to learn its best play.
1. Independence: the decision to experiment is independent of
other players' decisions.
2. Rare: \epsilon_{n,t} \to 0 as t \to \infty.
3. Infinitely Often: \sum_t \epsilon_{n,t} = \infty.

ft(k; !)g = Subsequence of dates at which player n conducts


no experiment.
! = Realization of the players' randomized choices.
Thus the interval [0; t] consists of experiment dates Pn (xn ; t)
and play dates Mn (xn; t). Write M (t) to denote the expected
total number of experiments.
 (xn; t) = Total Pay o received with Mn(xn; t)
 (yn; t) = Total Pay o received with Mn(yn ; t)
c Mishra 1998

48

Information Theory

Chapter 6

Claim:
Let T = set of strategy pro les.

8z2T n(xn; z n) < n(yn; z n ) 2


) For large t
(xn; t) < (yn; t) M (t)=jSnj:

E [(xn;  + 1) (yn;  + 1)jT ] (xn;  ) + (yn;  )


= n; +1=jSnj E [n(xn; x n( + 1)) n (yn; x n( + 1))]
< 2  n; +1=jSn j:
Taking expectations

E [(xn;  + 1) (yn;  + 1)jT ]


= E [(xn;  ) (yn;  )jT ] 2  n; +1=jSn j:
and then telescoping,

E [(xn; t) (yn; t)] <

2=jSnj

n;t = 2M (t)=jSnj:

Let  > 2jnj.


Var[(xn; t) (yn; t)] 

22M (t)=jSnj:

Thus (xn; t) (yn; t)] + M (t)=jSnj converges to 1 and


hence represents a super-martingale.
In other words, xn dominates yn then the player n will discover this fact eventually by repeated experiments.

Theorem 6.3.1 For any nite strategy game , the sequence


fxn(t(k; !))g constructed as described above is consistent with
adaptive learning a.s.(almost surely).

c Mishra 1998

Chapter 7
Information Theory and
Learning
7.1 Information Theory and Games
7.1.1 Basic Concepts

De nition 7.1.1 Entropy is a measure of uncertainty of a random variable. Let X be a discrete random variable with alphabet
X.
p(x) = Pr[X = x]; where x 2 X :
The entropy H (X ) of the discrete random variable X is de ned as
H (X ) = E p lg p(1X )
X
=
p(x) lg p(x):
x2X

Facts
1. H (X )  0. Entropy is always nonnegative. 0  p(x)  1;
lg p(x)  0. Hence, Ep lg(1=p(x))  0.)
2. H (X )  lg jXj. Consider the
uniform distribution u(x).
P
8x2X u(x) = 1=jXj. H (u) = x(1=jXj) lg jXj = lg jXj.
49

50

Information Theory

Chapter 7

3. H (X ) = Average number of bits required to encode the


discrete random variable X .

7.2 Joint & Conditional Entropy

(X, Y) = a pair of discrete random variables with joint distribution p(x, y).

Joint entropy:

    H(X, Y) = E_p\left[ \lg \frac{1}{p(X, Y)} \right] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \lg p(x, y).

Conditional entropy:

    H(Y \mid X) = E_p\left[ \lg \frac{1}{p(Y \mid X)} \right]
               = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \lg p(y \mid x)
               = -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y \mid x) \lg p(y \mid x)
               = \sum_{x \in \mathcal{X}} p(x) H(Y \mid X = x).

7.2.1 Chain Rule

    p(X, Y) = p(X) \, p(Y \mid X)   (Bayes' rule)
    \Rightarrow \lg \frac{1}{p(X, Y)} = \lg \frac{1}{p(X)} + \lg \frac{1}{p(Y \mid X)}
    \Rightarrow E_p \lg \frac{1}{p(X, Y)} = E_p \lg \frac{1}{p(X)} + E_p \lg \frac{1}{p(Y \mid X)}   (linearity of expectation)
    \Rightarrow H(X, Y) = H(X) + H(Y \mid X).

Corollary 7.2.1

1. H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z).
2. H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)
   \Rightarrow H(X) - H(X \mid Y) = H(Y) - H(Y \mid X).
3. Note that, in general, H(X \mid Y) \ne H(Y \mid X).

7.3 Relative Entropy & Mutual Information

Definition 7.3.1 Relative Entropy (also the Kullback-Leibler
distance) between two probability mass functions p(x) and q(x):

    D(p \| q) = E_p\left[ \lg \frac{p(x)}{q(x)} \right] = \sum_x p(x) \lg \frac{p(x)}{q(x)}.

Note that D(p \| p) = 0. If u(x) = 1/|\mathcal{X}| for all x, then

    D(p \| u) = \sum_x \left( p(x) \lg p(x) + p(x) \lg |\mathcal{X}| \right) = \lg |\mathcal{X}| - H(X).

Definition 7.3.2 Mutual Information
Let X and Y be two discrete random variables with joint probability mass function p(x, y) and marginal probability mass
functions

    p(x) = \sum_{y \in \mathcal{Y}} p(x, y)   and   p(y) = \sum_{x \in \mathcal{X}} p(x, y).

The mutual information is

    I(X; Y) = D\left( p(x, y) \,\|\, p(x) p(y) \right)
            = E_{p(x,y)}\left[ \lg \frac{p(x, y)}{p(x) p(y)} \right]
            = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \lg \frac{p(x, y)}{p(x) p(y)}
            = H(X) + H(Y) - H(X, Y)
            = H(X) + H(Y) - \left( H(Y) + H(X \mid Y) \right)
            = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = I(Y; X).

In summary,

    H(X) - H(X \mid Y) = I(X; Y) = H(Y) - H(Y \mid X) = I(Y; X),
    I(X; X) = H(X) - H(X \mid X) = H(X),
    I(X; Y) = I(Y; X) = H(X) + H(Y) - H(X, Y).
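These identities are straightforward to verify numerically. A minimal sketch (my own; the joint distribution is an arbitrary example) computing H(X), H(X|Y) and I(X;Y) with base-2 logarithms:

import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(pxy):
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

pxy = np.array([[0.30, 0.10],
                [0.10, 0.50]])                  # joint p(x, y)
px = pxy.sum(axis=1)
H_X, H_XY = entropy(px), entropy(pxy.ravel())
H_X_given_Y = H_XY - entropy(pxy.sum(axis=0))   # H(X|Y) = H(X,Y) - H(Y)
print(H_X, H_X_given_Y, mutual_information(pxy))
# I(X;Y) = H(X) - H(X|Y) holds up to floating point error.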

7.4 Chain Rules for Entropy, Relative


Entropy and Mutual Information
H (X1 ; X2; : : :; Xn )
= H (X1 ) + H (X2jX1) +   
+ H (Xn jX1; : : : ; Xn 1 )
n
X
H (Xi jX1; : : : ; Xi 1):
=
i=1

I (X1; X2; : : :; Xn ; Y )
= H (X1 ; : : :; Xn ) H (X1; : : :; Xn jY )
n
n
X
X
H (Xi jX1; : : :; Xi ; Y )
H (Xi jX1; : : : ; Xi)
=
i=1
i=1
n
X
H (Xi jX1; : : : ; Xi) H (Xi jX1; : : :; Xi ; Y )
=
i=1
n
X
I (Xi; Y jX1; : : : ; Xi 1):
=
i=1

c Mishra 1998

Section 7.5

53

Information Theory

!
D p(x; y) k q(x; y)
XX
y)
=
p(x; y) lg pq((x;
x; y)
x y
XX
=
p(x; y) lg pq((xx)) pq((yyjjxx))
x y
XX
XX
=
p(x; y) lg pq((xx)) +
p(x; y) lg pq((yyjjxx))
x y
x y
X
X
=
p(x) lg pq((xx)) + p(yjx) lg pq((yyjjxx))
x
! y
!
= D p(x) k q(x) + D p(yjx) k q(yjx) :

7.5 Information Inequality

    -D(p \| q) = \sum_x p(x) \lg \frac{q(x)}{p(x)}
              \le \lg \sum_x p(x) \frac{q(x)}{p(x)}        (lg is a concave function; Jensen)
              \le \lg \sum_x q(x) = \lg 1 = 0.

Theorem 7.5.1 D(p \| q) \ge 0, with equality iff p(x) = q(x) for
all x.

Corollary 7.5.2

    I(X; Y) = D\left( p(x, y) \,\|\, p(x) p(y) \right) \ge 0,

with equality iff X and Y are independent, i.e., p(x, y) = p(x) p(y)
for all x and y.

Let u(x) = 1/|\mathcal{X}|. Then

    D(p \| u) = \lg |\mathcal{X}| - H(X) \ge 0.

Hence,

    H(X) \le \lg |\mathcal{X}|,

with equality iff X has a uniform distribution over \mathcal{X}. Also,

    I(X; Y) = H(X) - H(X \mid Y) \ge 0.

Theorem 7.5.3

    H(X \mid Y) \le H(X).

Conditioning reduces entropy. Consequently,

    H(X_1, ..., X_n) = \sum_{i=1}^{n} H(X_i \mid X_1, ..., X_{i-1}) \le \sum_{i=1}^{n} H(X_i).

Corollary 7.5.4

    H(X_1, ..., X_n) \le \sum_{i=1}^{n} H(X_i),

with equality iff the X_i's are independent.

7.6 Stationary Markov Process

- Markovian:

    Pr[X_n \mid X_1, ..., X_i] = Pr[X_n \mid X_i],   i \le n.

- Stationary:

    Pr[X_n \mid X_1, ..., X_i] = Pr[X_{n+1} \mid X_2, ..., X_{i+1}].

Then

    H(X_n \mid X_1) \ge H(X_n \mid X_1, X_2)   (conditioning reduces entropy)
                    = H(X_n \mid X_2)          (Markov)
                    = H(X_{n-1} \mid X_1)      (stationary).

2nd Law of Thermodynamics

Theorem 7.6.1 Conditional entropy H(X_n \mid X_1) increases with
time n for a stationary Markov process.

Relative entropy D(\mu_n \| \mu_n') decreases with time n.

Let \mu_n and \mu_n' be two postulated probability distributions on
the state space of a Markov process. At time n + 1, the distributions change to \mu_{n+1} and \mu_{n+1}', governed by the transition
probabilities r(x_n, x_{n+1}). Writing p for \mu and q for \mu', we have

    p(x_n, x_{n+1}) = p(x_n) r(x_n, x_{n+1}) = p(x_n) p(x_{n+1} \mid x_n),

and similarly,

    q(x_n, x_{n+1}) = q(x_n) r(x_n, x_{n+1}) = q(x_n) q(x_{n+1} \mid x_n).

Thus, by the chain rule for relative entropy,

    D\left( p(x_n, x_{n+1}) \,\|\, q(x_n, x_{n+1}) \right)
        = D\left( p(x_n) \,\|\, q(x_n) \right) + D\left( p(x_{n+1} \mid x_n) \,\|\, q(x_{n+1} \mid x_n) \right)
        = D\left( p(x_n) \,\|\, q(x_n) \right),

since both conditionals equal r(x_n, x_{n+1}). Expanding the other way,

    D\left( p(x_n, x_{n+1}) \,\|\, q(x_n, x_{n+1}) \right)
        = D\left( p(x_{n+1}) \,\|\, q(x_{n+1}) \right) + D\left( p(x_n \mid x_{n+1}) \,\|\, q(x_n \mid x_{n+1}) \right)
        \ge D\left( p(x_{n+1}) \,\|\, q(x_{n+1}) \right).

We conclude that

    D\left( p(x_n) \,\|\, q(x_n) \right) \ge D\left( p(x_{n+1}) \,\|\, q(x_{n+1}) \right).

Thus the relative entropy for this system must decrease:

    D(\mu_1 \| \mu_1') \ge D(\mu_2 \| \mu_2') \ge \cdots \ge D(\mu_n \| \mu_n') \ge D(\mu_{n+1} \| \mu_{n+1}') \ge \cdots \to 0.
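The monotone decrease is easy to observe numerically. The sketch below (my own illustration; the 3-state transition matrix and the two initial distributions are arbitrary) evolves both distributions under the same chain and prints D(\mu_n \| \mu_n'):

import numpy as np

def kl(p, q):
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

R = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])      # row-stochastic transition matrix r(x, x')
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.1, 0.2, 0.7])
for n in range(6):
    print(n, round(kl(p, q), 4))
    p, q = p @ R, q @ R              # mu_{n+1} = mu_n R for both distributions
# The printed divergences are non-increasing, as the argument above shows.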

7.7 Gambling and Entropy

Horse Race

    # horses = m:  \{H_1, H_2, ..., H_m\},
    p_i = Pr[H_i wins],
    u_i = pay-off (odds) if H_i wins.

If b_i = bet on the i-th horse, then the pay-off is

    b_i u_i,  if H_i wins (with probability p_i);
    0,        if H_i loses (with probability 1 - p_i).

Assume that the gambler has 1 dollar. Let b_i = fraction of
his wealth invested in H_i. Thus

    0 \le b_i \le 1,   \sum_{i=1}^{m} b_i = 1.

Note that the gambler's pay-off is b_i u_i if H_i wins (with probability p_i):

    S(X) = b(X) u(X)
         = factor by which the gambler increases his wealth if X wins.

Repeated game with reinvestment:

    S_0 = 1,   S_n = S_{n-1} S(X_n)   if X_n wins the n-th game.

Thus

    S_n = \prod_{i=1}^{n} S(X_i) = 2^{\sum_i \lg S(X_i)}.

Let

    E_p[\lg S(X)] = \sum_k p_k \lg(b_k u_k) = W(b, p) = the doubling rate,

where b = the betting strategy. Then

    \frac{1}{n} \lg S_n \to E_p[\lg S(X)]   in probability,

by the "Law of Large Numbers." Hence

    S_n \approx 2^{n W(b, p)}.

Definition 7.7.1 Doubling Rate

    W(b, p) = \sum_{k=1}^{m} p_k \lg(b_k u_k).

Theorem 7.7.1 Let the race outcomes X_1, ..., X_n be i.i.d.
~ p(x). Then the wealth of the gambler using betting strategy b
grows exponentially at rate W(b, p), i.e.,

    S_n \approx 2^{n W(b, p)}.

    W(b, p) = \sum_k p_k \lg(b_k u_k)
            = \sum_k p_k \left[ \lg \frac{b_k}{p_k} - \lg \frac{1}{p_k} + \lg u_k \right]
            = \sum_k p_k \lg u_k - H(p) - D(p \| b)
            \le \sum_k p_k \lg u_k - H(p),

with equality iff p = b.
The optimal doubling rate is

    W^*(p) = \max_b W(b, p) = W(p, p) = \sum_k p_k \lg u_k - H(p).

Theorem 7.7.2 Proportional gambling is log-optimal.
The optimum doubling rate is given by

    W^*(p) = W(p, p) = \sum_k p_k \lg u_k - H(p),

and is achieved by the proportional gambling scheme, b = p.
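A small numerical check of log-optimality (my own sketch; the win probabilities and odds are arbitrary illustrative numbers): random alternative betting strategies never beat the proportional gambler's doubling rate.

import numpy as np

def doubling_rate(b, p, u):
    # W(b, p) = sum_k p_k lg(b_k u_k)
    return float((p * np.log2(b * u)).sum())

p = np.array([0.5, 0.3, 0.2])              # win probabilities
u = np.array([2.0, 4.0, 8.0])              # odds (u_k-for-1)
W_star = doubling_rate(p, p, u)            # proportional gambling, b = p
for _ in range(1000):
    b = np.random.dirichlet(np.ones(3))    # random alternative betting strategy
    assert doubling_rate(b, p, u) <= W_star + 1e-12
print("W*(p) =", W_star, "=", (p * np.log2(u)).sum() - (-(p * np.log2(p)).sum()))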

Define r_k = 1/u_k = the bookie's estimate of the win "probabilities." Assume

    \sum_k r_k = \sum_k \frac{1}{u_k} = 1:

the odds are fair and there is no track take. Then

    W(b, p) = \sum_k p_k \lg \frac{b_k}{r_k}
            = \sum_k p_k \left[ \lg \frac{b_k}{p_k} + \lg \frac{p_k}{r_k} \right]
            = D(p \| r) - D(p \| b).

Doubling rate = difference between the distance of the bookie's
estimate from the true distribution and the distance of the gambler's estimate from the true distribution.

Special Case: Odds are m-for-1 on each horse:

    \forall k \quad r_k = \frac{1}{m}.

Thus,

    W(b, p) = D(p \| u) - D(p \| b),   and   W^*(p) = D(p \| u) = \lg m - H(p).

Theorem 7.7.3 Conservation Theorem

    W^*(p) + H(p) = \lg m

for uniform odds.

Low-Entropy Races are Most Pro table.


Case of a not fully invested gambler.

b0 = wealth held out as cash


bi = proportional bet on Hi:
m
X
b i = 1:
b0  0; bi  0;
Thus

i=0

S (X ) = b0 + b(X )u(X ):
P
 Fair Odds: u1i = 1.
If there is a non-fully-invested strategy with b0, b1, : : :, bm,
then there is also a full investment as follows
b00 = 0
b0i = bi + bu0 ; 1  i  m
i
m
m
m
X
X
X
bi + b0 u1 = 1:
b0i =
i=1 i
i=1
i=0
Thus
S (X ) = b0(X )u(X ) = u(bX0 ) u(X ) + b(X )u(X )
= b0 + b(X )u(X ):
Thus in this case there is a risk-neutral investment.
c Mishra 1998

60

Information Theory

Chapter 7

• Super-Fair Odds: ∑_i 1/u_i < 1.
Here there is a "Dutch Book" betting strategy:
    b_0 = 1 − ∑_i 1/u_i,    b_i = 1/u_i,   1 ≤ i ≤ m.
Thus
    S(X) = 1 − ∑_i 1/u_i + (1/u(X)) u(X) = 2 − ∑_i 1/u_i > 1,
with no risk! This, however, implies a strong arbitrage opportunity.

• Sub-Fair Odds: ∑_i 1/u_i > 1.
In this case proportional gambling is no longer log-optimal, and the race represents a risky undertaking for the gambler.
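A tiny sketch of the Dutch-book computation (the odds are hypothetical, chosen so that ∑_i 1/u_i < 1):

    import numpy as np

    u = np.array([3.0, 4.0, 6.0])        # hypothetical super-fair odds: sum(1/u) = 0.75 < 1
    b0 = 1.0 - np.sum(1.0 / u)           # cash held out
    b  = 1.0 / u                         # Dutch-book bets

    # Pay-off if horse k wins: the cash plus the winning bet times its odds.
    payoffs = b0 + b * u
    print(payoffs)                       # every entry equals 2 - sum(1/u) = 1.25 > 1

Whichever horse wins, the gambler's wealth is multiplied by the same factor greater than 1.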
7.8 Side Information
Some external information about the performance of the horses may be available, for instance from previous games. Let X ∈ {1, 2, ..., m} represent the winning horse, and let Y be some other arbitrary discrete random variable (the Side Information), with
    p(x, y) = joint probability mass function of (X, Y),
    b(x|y) = conditional betting strategy depending on Y
           = proportion of wealth bet on horse x given that Y = y is observed,
    b(x)   = unconditional betting strategy,
    b(x) ≥ 0,  ∑_x b(x) = 1;      b(x|y) ≥ 0,  ∑_x b(x|y) = 1.
The optimal unconditional and conditional doubling rates are
    W*(X) = max_{b(x)} ∑_x p(x) lg(b(x) u(x)) = ∑_x p(x) lg u(x) − H(X),
    W*(X|Y) = max_{b(x|y)} ∑_{x,y} p(x, y) lg(b(x|y) u(x)) = ∑_x p(x) lg u(x) − H(X|Y).
Hence the increase in doubling rate is
    ΔW = W*(X|Y) − W*(X)
       = [ ∑_x p(x) lg u(x) − H(X|Y) ] − [ ∑_x p(x) lg u(x) − H(X) ]
       = H(X) − H(X|Y) = I(X; Y) ≥ 0.
Increase in Doubling Rate = mutual information between the horse race and the side information.
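To see the identity numerically, here is a sketch with a made-up joint distribution p(x, y) and fair 2-for-1 odds; it computes the increase in doubling rate both directly and as I(X; Y):

    import numpy as np

    # Hypothetical joint pmf p(x, y): 2 horses, binary side information.
    P = np.array([[0.40, 0.10],
                  [0.10, 0.40]])
    u = np.array([2.0, 2.0])            # fair 2-for-1 odds on both horses

    px = P.sum(axis=1)                  # marginal of X
    py = P.sum(axis=0)                  # marginal of Y

    # Optimal doubling rates: bet proportionally to p(x), resp. p(x|y).
    W_uncond = np.sum(px * np.log2(px * u))
    W_cond = sum(py[y] * np.sum((P[:, y] / py[y]) * np.log2((P[:, y] / py[y]) * u))
                 for y in range(2))

    I_xy = sum(P[x, y] * np.log2(P[x, y] / (px[x] * py[y]))
               for x in range(2) for y in range(2))

    print(round(W_cond - W_uncond, 4), round(I_xy, 4))   # the two numbers agree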
7.9 Learning
Let {X_k} be a sequence of horse race outcomes generated by a stochastic process, and assume uniform m-for-1 odds as in the special case above. The optimal conditional doubling rate is
    W*(X_k | X_{k−1}, X_{k−2}, ..., X_1)
        = max_{b(·|x_{k−1},...,x_1)} E[ lg S(X_k) | X_{k−1}, X_{k−2}, ..., X_1 ]
        = lg m − H(X_k | X_{k−1}, X_{k−2}, ..., X_1),
and is achieved by the conditionally proportional bet
    b(x_k | x_{k−1}, ..., x_1) = p(x_k | x_{k−1}, ..., x_1).
Note that since
    S_n = ∏_{i=1}^n S(X_i),
we have
    (1/n) E[lg S_n] = (1/n) ∑_i E[lg S(X_i)]
                    = (1/n) ∑_i ( lg m − H(X_i | X_1, ..., X_{i−1}) )
                    = lg m − H(X_1, ..., X_n)/n
                    → lg m − H(X),
where H(X) is simply the entropy rate of the process.
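A minimal sketch of such a learning gambler (assumptions, not from the text: uniform m-for-1 odds, winners generated by a hypothetical first-order Markov chain, and a simple add-one count as the estimator of p(x_k | x_{k−1})):

    import numpy as np

    rng = np.random.default_rng(0)
    m = 3
    # Hypothetical Markov chain generating the winning horses.
    P = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.7, 0.2],
                  [0.2, 0.1, 0.7]])

    counts = np.ones((m, m))                     # add-one counts for p(x_k | x_{k-1})
    log_wealth, prev, n = 0.0, 0, 20000
    for _ in range(n):
        x = rng.choice(m, p=P[prev])             # observe the next winner
        b = counts[prev] / counts[prev].sum()    # bet proportionally to the estimate
        log_wealth += np.log2(b[x] * m)          # uniform m-for-1 odds
        counts[prev, x] += 1
        prev = x

    pi = np.linalg.matrix_power(P, 200)[0]       # (approximate) stationary distribution
    H_rate = -np.sum(pi * (P * np.log2(P)).sum(axis=1))
    print(round(log_wealth / n, 3))              # empirical doubling rate
    print(round(np.log2(m) - H_rate, 3))         # theoretical limit lg m - H(X)

The empirical doubling rate approaches lg m − H(X) as the conditional estimates converge.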
Chapter 8
Universal Portfolio
8.1 Universal Portfolio
The scheme has three defining features:
1. It is a sequential portfolio selection procedure: an adapted process.
2. It makes no statistical assumption about the behavior of the market.
3. It is robust with respect to arbitrary market sequences occurring in the real world.
We shall consider growth of wealth for arbitrary market sequences. For example, our goal may be to outperform the best buy-and-hold strategy, i.e., to be competitive against a competing investor who can foresee the n future days. A different goal may be to outperform all constant rebalanced portfolio strategies.

Let
    m = # stocks traded in the market,
    x_i = price relative for the ith stock
        = (stock price at close) / (stock price at open) = P_i(c) / P_i(o)
        = 1 + ΔP_i / P_i,
    x = (x_1, x_2, ..., x_m)^T = stock market vector.
8.1.1 Portfolio

b1 1
(
b2 C
CC
bPi  0
= portfolio ;
... C
A
i b i = 1:
bm
Portfolio is simply the proportion of the current wealth invested in each of the stocks .
X
S = b  x = bT x = bixi;

0
B
B
b=B
B
@

= Factor by which the wealth increases in one period.

x(1); x(2); : : :; x(n)


= stock market vectors for n consecutive days.

b = Fixed (constant) portfolio


We shall follow a constant rebalanced portfolio strategy.
(
n
Y
S0(b) = 1
T
Sn(b) = b x(i);
Sn (b) = Sn 1(b) bT x(n):
i=1

Sn = max
S (b) = Sn (b):
b n
This is the maximum wealth achievable on the given stock sequence maximized over all constant rebalanced portfolios.
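A small sketch of these definitions (the two-stock price-relative sequence is invented, and the grid search over the simplex is only for illustration):

    import numpy as np

    # Hypothetical price relatives for 2 stocks over 4 days (rows = days).
    X = np.array([[1.10, 0.95],
                  [0.90, 1.10],
                  [1.05, 1.00],
                  [0.95, 1.08]])

    def wealth(b, X):
        """S_n(b) = prod_i b^T x(i) for a constant rebalanced portfolio b."""
        return float(np.prod(X @ b))

    # Buy-and-hold wealths S_n(e_j), and the best constant rebalanced portfolio.
    print([round(wealth(e, X), 4) for e in np.eye(2)])
    grid = [np.array([a, 1 - a]) for a in np.linspace(0, 1, 101)]
    best = max(grid, key=lambda b: wealth(b, X))
    print(best, round(wealth(best, X), 4))       # approximately b* and S_n*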
8.2 Universal Portfolio Strategy
The universal portfolio ^b(k) depends only on the past price relatives x(1), x(2), ..., x(k−1). It performs as well as the best constant rebalanced portfolio based on clairvoyant knowledge of the sequence of price relatives.
8.2.1 Questions
• Since we wish to compete against a clairvoyant investor (who knows the future), and universal portfolios depend only on the past (which need have no causal or correlated relation with the future), how is it possible for the universal portfolio to be competitive?
• A malicious/adversarial nature is free to structure the future so as to help the competing investor.
The universal portfolio starts from the uniform portfolio
    ^b(1) = (1/m, 1/m, ..., 1/m)^T.
With
    S_k(b) = ∏_{i=1}^k b^T x(i),    B = { b ∈ R_+^m | b_i ≥ 0, ∑_i b_i = 1 },
it is updated as
    ^b(k+1) = ∫_B b S_k(b) db / ∫_B S_k(b) db.
Note that
    ^b(k+1)^T x(k+1) = ∫_B b^T x(k+1) S_k(b) db / ∫_B S_k(b) db
                     = ∫_B S_{k+1}(b) db / ∫_B S_k(b) db.
The "learned" portfolio is the performance-weighted average of all portfolios b ∈ B. Thus
    ^S_n = ∏_{k=1}^n ^b(k)^T x(k) = ∫_B S_n(b) db / ∫_B db = (m−1)! ∫_B S_n(b) db.
We will show that
    ^S_n ≥ S_n* (m−1)! (2π/n)^{(m−1)/2} / |J_n|^{1/2},
where J_n is a positive semidefinite (m−1) × (m−1) sensitivity matrix (defined below in Section 8.4).
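A Monte Carlo sketch of the universal portfolio update: the integrals over B are approximated by sampling portfolios uniformly from the simplex (a flat Dirichlet), and the price relatives are the hypothetical two-stock sequence used above.

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.array([[1.10, 0.95],
                  [0.90, 1.10],
                  [1.05, 1.00],
                  [0.95, 1.08]])               # hypothetical price relatives (rows = days)
    m = X.shape[1]

    B = rng.dirichlet(np.ones(m), size=50000)  # uniform samples from the simplex

    wealth_hat, S = 1.0, np.ones(len(B))       # S[j] tracks S_k(b_j)
    for x in X:
        # ^b(k+1) = E[b S_k(b)] / E[S_k(b)]  (k = 0 gives the uniform portfolio).
        b_hat = (B * S[:, None]).sum(axis=0) / S.sum()
        wealth_hat *= float(b_hat @ x)
        S *= B @ x                             # update S_k(b) -> S_{k+1}(b)

    print(round(wealth_hat, 4))                # ^S_n
    print(round(S.mean(), 4))                  # average of S_n(b): same value (Lemma 8.3.2)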
8.3 Properties & Analysis
Let F be some arbitrary probability distribution for price relatives over R_+^m, and let F_n be the empirical distribution associated with x(1), x(2), ..., x(n):
    Pr[X = x(i)] = 1/n,    Pr[X ≠ x(i) for all i] = 0.
As n → ∞, F_n → F.
8.3.1 Doubling Ratio
For a distribution F of price relatives define
    W(b, F) = ∫ lg(b^T x) dF(x),
    W(b, F_n) = (1/n) ∑_{i=1}^n lg(b^T x(i)),
    W*(F) = max_b W(b, F),
    W*(F_n) = max_b W(b, F_n).
Thus
    S_n* = max_b S_n(b) = max_b ∏_{i=1}^n b^T x(i) = 2^{n W*(F_n)}.
Let e_j be the unit vector
    e_j = (0, ..., 0, 1, 0, ..., 0)^T,   with a 1 in the jth position only.
Then
    S_n(e_j) = ∏_{k=1}^n e_j^T x(k) = ∏_{k=1}^n x_j(k)
             = wealth due to the buy-and-hold strategy associated with the jth stock.
Since S_n* is a maximization of S_n(b) over the entire simplex,
    S_n* ≥ S_n(e_j)   for all j.
Corollary 8.3.1
1. Target exceeds best stock:
       S_n* ≥ max_j S_n(e_j).
2. Target exceeds Value Line (the geometric mean of the buy-and-hold wealths):
       S_n* ≥ ( ∏_j S_n(e_j) )^{1/m}.
3. Target exceeds arithmetic mean:
       S_n* ≥ ∑_j α_j S_n(e_j),   for any α_j ≥ 0 with ∑_j α_j = 1.
4. S_n*(x(1), x(2), ..., x(n)) is invariant under permutations of the sequence x(1), x(2), ..., x(n).
Lemma 8.3.2
    ^S_n = ∏_{k=1}^n ^b(k)^T x(k) = ∫_B S_n(b) db / ∫_B db,
where
    S_n(b) = ∏_{i=1}^n b^T x(i).
That is, ^S_n, the wealth from the universal portfolio, is the average of S_n(b) over the simplex.

Proof: Recall that
    ^b(k+1)^T x(k+1) = ∫_B S_{k+1}(b) db / ∫_B S_k(b) db.
Telescoping the products,
    ^S_n = ∏_{k=1}^n ^b(k)^T x(k)
         = [ ∫_B S_n(b) db / ∫_B S_{n−1}(b) db ] · ... · [ ∫_B S_1(b) db / ∫_B db ]
         = ∫_B S_n(b) db / ∫_B db
         = ∫_B ∏_{i=1}^n b^T x(i) db / ∫_B db
         = E_b S_n(b) = E_b 2^{n W(b, F_n)},
where E_b denotes expectation with respect to the uniform distribution on the simplex B.

Corollary 8.3.3 ^S_n(x(1), x(2), ..., x(n)) is invariant under permutations of the sequence x(1), x(2), ..., x(n).
Claim:
    E_b W(b, F_n) ≥ (1/m) ∑_j W(e_j, F_n).
Indeed,
    E_b W(b, F_n) = E_b ∫ lg(b^T x) dF_n(x)
                  = E_b ∫ lg( ∑_j b_j (e_j^T x) ) dF_n(x)
                  ≥ E_b ∑_j b_j ∫ lg(e_j^T x) dF_n(x)        (concavity of lg)
                  = (1/m) ∑_j ∫ lg(e_j^T x) dF_n(x)          (E_b b_j = 1/m)
                  = (1/m) ∑_j W(e_j, F_n).
By Jensen's inequality,
    E_b 2^{n W(b, F_n)} ≥ 2^{n E_b W(b, F_n)}
                        ≥ 2^{(n/m) ∑_j W(e_j, F_n)}
                        = ∏_j ( 2^{n W(e_j, F_n)} )^{1/m}.
Thus
    ^S_n = E_b S_n(b) = E_b 2^{n W(b, F_n)} ≥ ( ∏_{j=1}^m S_n(e_j) )^{1/m}.
Corollary 8.3.4 The universal portfolio exceeds the Value Line index:
    ^S_n ≥ ( ∏_{j=1}^m S_n(e_j) )^{1/m}.
8.4 Competitiveness
Let F_n(x) be the empirical probability mass function, placing mass 1/n on each x(i) ∈ R_+^m. Then
    S_n(b) = ∏_{i=1}^n b^T x(i) = 2^{n W(b, F_n)} = e^{n V(b, F_n)},
where V(b, F_n) = W(b, F_n) ln 2 is the doubling rate in nats, and
    b*(F_n) = b* = arg max_b S_n(b) = arg max_b V(b, F_n) ∈ R_+^m,
    S_n* = max_{b ∈ B} S_n(b) = e^{n V*(F_n)}.
Definition 8.4.1 All stocks are active at time n if
    there exists b* with S_n(b*) = S_n* such that (b*(F_n))_i > 0 for all i ∈ [1..m].
All stocks are strictly active at time n if
    for every b* with S_n(b*) = S_n*, (b*(F_n))_i > 0 for all i ∈ [1..m].
If
    Lin( x(1), x(2), ..., x(n) ) = R^m,
then we say that the price relatives x(1), x(2), ..., x(n) are of full rank.

Let J(b) be the (m−1) × (m−1) sensitivity matrix function of a market with respect to the distribution F(x), x ∈ R_+^m:
    J_ij(b) = ∫ (x_i − x_m)(x_j − x_m) / (b^T x)² dF(x),
and let J* = J(b*) be the sensitivity matrix. Equivalently, in the reduced coordinates (b_1, ..., b_{m−1}, 1 − ∑_{i=1}^{m−1} b_i),
    J_ij = − ∂²V((b_1, ..., b_{m−1}, 1 − ∑_{i=1}^{m−1} b_i), F) / ∂b_i ∂b_j,
which is a positive semidefinite matrix. It is positive definite if all stocks are strictly active.
Let
    C = { (c_1, c_2, ..., c_{m−1}) | c_i ≥ 0, ∑_i c_i ≤ 1 },
and define
    b(c) = ( c_1, ..., c_{m−1}, 1 − ∑_{i=1}^{m−1} c_i ).
Thus
    V_n(c) = (1/n) ∑_{i=1}^n ln( b(c)^T x(i) ) = ∫ ln(b(c)^T x) dF_n(x) = E_{F_n} ln(b(c)^T x).
Using a Taylor series expansion about the maximizer c*:
    V_n(c) = V_n(c*) + (c − c*)^T ∇V_n(c*)
             − (1/2) (c − c*)^T J_n (c − c*)
             + (1/6) ∑_{ijk} (c_i − c*_i)(c_j − c*_j)(c_k − c*_k)
                      E_{F_n}[ (x_i − x_m)(x_j − x_m)(x_k − x_m) / S³(c̃) ],
where
    c̃ = λ c + (1 − λ) c*,   0 ≤ λ ≤ 1,
    S(c̃) = b(c̃)^T x = ∑_i b(c̃)_i x_i.
Assume that all stocks are strictly active, so that
    J* = − [ ∂²V / ∂c_i ∂c_j ] is positive definite.
Hence its determinant is strictly positive:
    |J*| > 0.
Let u = √n (c − c*). Then, since the gradient term vanishes at the maximizer c*, the Taylor series gives
    n V_n(c) = n V*(F_n) − (1/2) u^T J_n u
               + (1/(6√n)) ∑_{ijk} u_i u_j u_k E_{F_n}[ (x_i − x_m)(x_j − x_m)(x_k − x_m) / S³(c̃) ].
Next assume that all price relatives are bounded: 0 < a ≤ x_i ≤ c < ∞. Then
    S(c̃) ≥ a,    |x_i − x_m| ≤ 2c.
Thus the last term in the preceding expression can be bounded in absolute value by
    (1/(6√n)) ‖u‖³ m^{3/2} (2c)³ / a³,
and hence
    n V_n(c) ≥ n V*(F_n) − (1/2) u^T J_n u − ( 4 m^{3/2} c³ / (3 √n a³) ) ‖u‖³.
We thus conclude that
    S_n(c) = 2^{n W_n(c)} ≥ e^{ n V_n* − (u^T J_n u)/2 − 4 m^{3/2} c³ ‖u‖³ / (3 √n a³) }
           = S_n* e^{ −(u^T J_n u)/2 − 4 m^{3/2} c³ ‖u‖³ / (3 √n a³) }.
Since ^S_n = ∫_B S_n(b) db / ∫_B db and ∫_B db = 1/(m−1)!, changing variables to u (so that dc = (1/√n)^{m−1} du) gives
    ^S_n ≥ S_n* (m−1)! ∫_U e^{ −(u^T J_n u)/2 − 4 m^{3/2} c³ ‖u‖³ / (3 √n a³) } (1/√n)^{m−1} du,
where U is the image of C under u = √n (c − c*). Thus, asymptotically,
    ^S_n ≳ S_n* (m−1)! (2π/n)^{(m−1)/2} / |J_n|^{1/2}.
In other words,
    (1/n) lg ( S_n* / ^S_n ) = (1/n) lg [ |J_n|^{1/2} / ( (m−1)! (2π/n)^{(m−1)/2} ) ] → 0   as n → ∞.
Summarizing, we have
    (1/n) lg S_n* ≈ (1/n) lg ^S_n,    V_n* ≈ ^V_n.
Chapter 9
Portfolios and Markets
9.1 Portfolio Theory
9.1.1 Itô Calculus
Let X be the asset price at time t. In a continuous-time model, one can study the return on the asset, dX/X, over a small period of time dt:
    dX/X = μ dt + σ dZ.
This is a so-called Itô process, with
    μ = average rate of growth (DRIFT),
    σ = volatility (DIFFUSION).
9.1.2 Market Model
Assume that there are m stocks, represented by m Itô processes X_1(t), X_2(t), ..., X_m(t). Furthermore,
    dX_i / X_i = μ_i dt + ∑_{j=1}^m σ_ij dZ_j.
Here the Z_j's are independent Brownian motions. Write
    μ = drift vector = (μ_1, μ_2, ..., μ_m)^T,
    σ = diffusion matrix = (σ_ij), an m × m matrix,
    Σ = instantaneous covariance matrix = σ σ^T.
In general, the term dZ corresponds to a Wiener process:
    dZ ~ N(0, dt),
i.e., the mean of dZ is zero and the variance of dZ is dt. Equivalently,
    dZ = ε √dt,    E[ε] = 0,   E[ε²] = 1.
This holds in continuous time in the limit as dt → 0.
Lemma 9.1.1 (Itô's Lemma) [Analogous to Taylor's theorem for functions of random variables. The key idea is the observation that, with probability 1, dZ² → dt as dt → 0.]
Suppose f(X) is a function of X (where X is possibly stochastic). Then
    df = (∂f/∂X) dX + (1/2) (∂²f/∂X²) dX² + smaller order terms,
    dX² = (μX dt + σX dZ)²
        = σ²X² dZ² + 2μσX² dZ dt + μ²X² dt²
        → σ²X² dt   as dt → 0,
so that
    df = (∂f/∂X)(μX dt + σX dZ) + (1/2) σ²X² (∂²f/∂X²) dt
       = ( μX ∂f/∂X + (1/2) σ²X² ∂²f/∂X² ) dt + σX (∂f/∂X) dZ.
Example. Consider
    dX/X = μ dt + σ dZ,
and let f(X) = ln X. Then
    ∂f/∂X = 1/X,    ∂²f/∂X² = −1/X²,
so
    df = (∂f/∂X) dX + (1/2) (∂²f/∂X²) dX²
       = dX/X − (1/2) σ²X² dt / X²
       = dX/X − (σ²/2) dt.
Hence
    d(ln X) = dX/X − (σ²/2) dt,
    dX/X = d(ln X) + (σ²/2) dt,
    ∫_0^t dX/X = ∫_0^t d(ln X) + (1/2) ∫_0^t σ² dt = ln X(t) − ln X(0) + (1/2) ∫_0^t σ² dt,
    exp{ ∫_0^t dX/X } = ( X(t) / X(0) ) exp{ (1/2) ∫_0^t σ² dt }.
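A small Euler-Maruyama simulation illustrating the σ²/2 correction (the drift and volatility values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, dt, n = 0.10, 0.30, 1e-3, 200000   # hypothetical drift and volatility

    X = np.empty(n + 1)
    X[0] = 1.0
    for k in range(n):
        dZ = np.sqrt(dt) * rng.standard_normal()
        X[k + 1] = X[k] * (1.0 + mu * dt + sigma * dZ)   # dX/X = mu dt + sigma dZ

    T = n * dt
    print(round(np.log(X[-1]) / T, 3))   # one sample of ln X(T) / T
    print(round(mu - sigma**2 / 2, 3))   # its expected value: the Ito-corrected growth rate

The realized log-growth rate fluctuates around μ − σ²/2, not around μ.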
9.2 Rebalanced Portfolio
Consider the market model with m stocks:
    dX_i(t) / X_i(t) = μ_i(t) dt + ∑_{j=1}^m σ_ij(t) dZ_j(t),
    Σ(t) = σ(t) σ(t)^T.
A portfolio of long stocks at time t is identified by its weight vector process b(t) ∈ B, where
    B = { b ∈ R^m | b_i ≥ 0, ∑_{i=1}^m b_i = 1 }.
Rebalanced Portfolio
(A self-financing portfolio without dividends.) Its wealth S(t) evolves as
    dS(t)/S(t) = ∑_{i=1}^m b_i(t) dX_i(t)/X_i(t)
               = ( ∑_i b_i(t) μ_i(t) ) dt + ∑_i ∑_j b_i(t) σ_ij(t) dZ_j.
Let g(S) = ln S and f(X) = ∑_i b_i ln X_i = ln ∏_i X_i^{b_i}. By Itô's Lemma,
    dg = dS/S − (1/2) (b^T Σ b) S² dt / S² = dS/S − (1/2) (b^T Σ b) dt,
so that
    dS/S = d(ln S) + (1/2) (b^T Σ b) dt,
and
    df = ∑_i b_i dX_i/X_i − ∑_i (1/(2X_i²)) (b_i Σ_ii) X_i² dt,
so that
    ∑_i b_i dX_i/X_i = d( ∑_i b_i ln X_i ) + (1/2) ∑_i b_i Σ_ii dt.
Hence
    d(ln S) = d( ∑_i b_i ln X_i ) − (1/2) b^T Σ b dt + (1/2) ∑_i b_i Σ_ii dt,
and, integrating from 0 to t,
    ln( S(t, b) / S(0) ) = ∑_i b_i ln( X_i(t) / X_i(0) ) − (1/2) b^T Σ̄ b + (1/2) ∑_i b_i Σ̄_ii,
where Σ̄ ≡ ∫_0^t Σ(s) ds. Equivalently,
    S(t, b) = S(0) ∏_{i=1}^m ( X_i(t) / X_i(0) )^{b_i} exp{ −(1/2) b^T Σ̄ b + (1/2) ∑_i Σ̄_ii b_i }.
Maximizing the above expression over b, we have
    S*(t) = max_{b ∈ B} S(t, b) = S(t, b*(t)).
Note that b*(t) is the optimal solution of the following quadratic programming problem:
    max_{b ∈ B}  −(1/2) b^T Σ̄ b + ∑_{i=1}^m ( ln( X_i(t) / X_i(0) ) + (1/2) Σ̄_ii ) b_i.
Define the matrix V, an (m−1) × (m−1) symmetric positive semidefinite matrix, by
    V = (V_ij),    V_ij = Σ̄_ij − Σ̄_im − Σ̄_jm + Σ̄_mm,    1 ≤ i, j ≤ m−1.

Lemma 9.2.1 If V is positive definite, then the portfolio problem has a unique optimal solution.
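As a sketch, the quadratic program above can be solved with a generic constrained optimizer; the growth and covariance numbers below are hypothetical:

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical inputs for 3 stocks: log price ratios and integrated covariance.
    g = np.array([0.12, 0.08, 0.05])            # ln(X_i(t)/X_i(0))
    Sigma_bar = np.array([[0.10, 0.02, 0.01],
                          [0.02, 0.08, 0.02],
                          [0.01, 0.02, 0.06]])

    def neg_objective(b):
        # Negative of  -(1/2) b^T Sigma_bar b + sum_i (g_i + Sigma_bar_ii / 2) b_i.
        return 0.5 * b @ Sigma_bar @ b - (g + 0.5 * np.diag(Sigma_bar)) @ b

    cons = [{"type": "eq", "fun": lambda b: np.sum(b) - 1.0}]
    res = minimize(neg_objective, x0=np.ones(3) / 3, method="SLSQP",
                   bounds=[(0.0, 1.0)] * 3, constraints=cons)
    print(np.round(res.x, 3))                   # b*(t), the optimal rebalanced weights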
Definition 9.2.1 A stochastic process X(t) is weakly regular if
    |E[X(t)]| < ∞ for all t,
    lim_{t→∞} E[X(t)] / t exists, and
    X(t)/t converges to that limit in probability as t → ∞.

The stock market model is weakly regular (easily satisfied if the market is stationary):
    |E[Σ̄(t)]| < ∞ and |E[ln X(t)]| < ∞ for all t;
    lim_{t→∞} E[Σ̄(t)]/t = Σ^∞ and lim_{t→∞} E[ln X(t)]/t = ν^∞ exist;
    Σ̄(t)/t → Σ^∞ and ln X(t)/t → ν^∞ in probability as t → ∞.
Note that
    dX_i / X_i = μ_i dt + ∑_j σ_ij dZ_j,
    d(ln X_i) = dX_i/X_i − dX_i²/(2X_i²) = ( μ_i − Σ_ii/2 ) dt + ∑_j σ_ij dZ_j.
Thus, writing μ^∞ for the analogous limit of the integrated drift divided by t,
    ν_i^∞ = μ_i^∞ − (1/2) Σ_ii^∞,    μ_i^∞ = ν_i^∞ + (1/2) Σ_ii^∞.
Similarly,
    dS/S = ∑_i b_i μ_i dt + ∑_i ∑_j b_i σ_ij dZ_j,
    d(ln S) = ( b^T μ − (1/2) b^T Σ b ) dt + ∑_i ∑_j b_i σ_ij dZ_j,
so the asymptotic growth rate of the constant-weight rebalanced portfolio is
    r(b) = lim_{t→∞} E[ ln S(t, b) ] / t = −(1/2) b^T Σ^∞ b + b^T μ^∞.
The asymptotically optimal constant weight b^∞ ∈ B satisfies
    r(b^∞) = max_{b ∈ B} r(b) = max_{b ∈ B} [ −(1/2) b^T Σ^∞ b + b^T μ^∞ ].
9.2.1 Optimal Portfolio
Recall that
    S(t, b) = S(0) ∏_{i=1}^m ( X_i(t) / X_i(0) )^{b_i} exp{ −(1/2) b^T Σ̄ b + (1/2) ∑_i Σ̄_ii b_i },
    V_ij(t) = Σ̄_ij − Σ̄_im − Σ̄_jm + Σ̄_mm.
Define
    α_i(t) = ln( X_m(t) / X_m(0) ) − ln( X_i(t) / X_i(0) ) − V_ii(t)/2.

Notation: write b = (b', b_m) with b'_1 + ... + b'_{m−1} + b_m = 1, b'_i ≥ 0, b_m > 0.
Rewriting the previous equation in these terms, we have
    S(t, b) = S(0) ( X_m(t) / X_m(0) ) exp{ −(1/2) b'^T V b' − α^T b' }.
The above value S(t, b) is maximized at b' = b'*(t), where
    V(t) b'*(t) = −α(t),    i.e.,    b'*(t) = −V^{−1}(t) α(t).
Hence
    S*(t) = S(0) ( X_m(t) / X_m(0) ) exp{ α^T V^{−1} α / 2 },
and
    S(t, b) = S*(t) exp{ −(1/2) (b' − b'*)^T V (b' − b'*) },
    S(t, b) / S*(t) = exp{ −(1/2) (b' − b'*)^T V (b' − b'*) }.
9.2.2 Long Term Effects
Recall
    V_ij = Σ̄_ij − Σ̄_im − Σ̄_jm + Σ̄_mm,
and define the limiting matrix and vector
    J_ij^∞ = Σ_ij^∞ − Σ_im^∞ − Σ_jm^∞ + Σ_mm^∞,
    α_i(t) = ln( X_m(t) / X_m(0) ) − ln( X_i(t) / X_i(0) ) − V_ii(t)/2,
    α_i^∞ = ν_m^∞ − ν_i^∞ − (1/2) J_ii^∞
          = μ_m^∞ − (1/2) Σ_mm^∞ − μ_i^∞ + (1/2) Σ_ii^∞ − (1/2) J_ii^∞
          = μ_m^∞ − μ_i^∞ − Σ_mm^∞ + Σ_im^∞.
By weak regularity,
    V(t)/t → J^∞   and   α_i(t)/t → α_i^∞.
Since
    r(b) = −(1/2) b^T Σ^∞ b + b^T μ^∞
         = −(1/2) b'^T J^∞ b' − b'^T α^∞ + const   (the constant not depending on b'),
it is maximized at
    b'^∞ = −(J^∞)^{−1} α^∞.
Note, however, that
    b'*(t) = −[ V(t)/t ]^{−1} [ α(t)/t ] → −(J^∞)^{−1} α^∞ = b'^∞   as t → ∞.
Problem: the construction of b^∞ requires the long-term averages of the future instantaneous expected returns and covariances. This, however, is impossible.
Remedy: the Universal Portfolio.
9.3 Universal Portfolio
The universal portfolio is the rebalanced portfolio with weights
    ^b_i(t) = ∫_B b_i S(t, b) db / ∫_B S(t, b) db.
Let
    S̄(t) = ∫_B S(t, b) db / ∫_B db.
Note that S̄(0) = ^S(0). Furthermore,
    dS̄ / S̄ = ∫_B dS(t, b) db / ∫_B S(t, b) db
            = ∫_B ∑_i S(t, b) b_i (dX_i/X_i) db / ∫_B S(t, b) db
            = ∑_i ^b_i(t) dX_i/X_i
            = d^S / ^S.
Hence
    ^S(t) = S̄(t)   for all t.

Lemma 9.3.1 The wealth accumulated by the universal portfolio is given by
    ^S(t) = ∫_B S(t, b) db / ∫_B db.
This is the average wealth accumulated by all possible (constant) rebalanced portfolios.
9.3.1 Competitiveness
Recall that
    S(t, b) = S*(t) exp{ −(1/2) (b' − b'*)^T V (b' − b'*) }.
Let x = V^{1/2}(t) (b' − b'*), and let
    Λ(t) = V^{1/2}(t) ( B' − b'* ),
where
    B' = { b' ∈ R^{m−1} | b'_i ≥ 0, ∑_i b'_i ≤ 1 }.
Note that
    Vol(B') = 1 / (m−1)!.
Changing variables, we have
    ^S(t) = S*(t) ∫_{Λ(t)} e^{−|x|²/2} dx / ( |V(t)|^{1/2} (1/(m−1)!) ),
so that, since V(t)/t → J^∞ and Λ(t) expands to all of R^{m−1} as t → ∞,
    ^S(t) / S*(t) = (m−1)! ∫_{Λ(t)} e^{−|x|²/2} dx / ( |V(t)/t|^{1/2} t^{(m−1)/2} )
                  ≈ (m−1)! (2π)^{(m−1)/2} / ( |J^∞|^{1/2} t^{(m−1)/2} )
                  = (m−1)! (2π/t)^{(m−1)/2} / |J^∞|^{1/2}.
Thus,
    (1/t) ln( ^S(t) / S*(t) ) = C(m)/t − C'(m) (ln t)/t → 0,
and
    (1/t) ln ^S(t) → (1/t) ln S*(t) → (1/t) ln S(t, b^∞),
i.e., asymptotically the universal portfolio attains the growth rate of the best rebalanced portfolio, and hence of the asymptotically optimal constant weight b^∞.
Bibliography
Text Books
[1] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory, John Wiley & Sons, 1991. ISBN 0-471-06259-6.

[2] Darrell Duffie. Dynamic Asset Pricing Theory, Princeton, 1997. ISBN 0-691-04302-7.

[3] Drew Fudenberg and Jean Tirole. Game Theory, MIT, 1995. ISBN 0-262-06141-4.

[4] Alan Kirman and Mark Salmon. Learning and Rationality in Economics, Basil Blackwell, 1995. ISBN 0-631-18488-0.

[5] H.M. Markowitz. Mean-Variance Analysis in Portfolio Choice and Capital Markets, Blackwell, 1991. ISBN 0-631-17854-6.

Popular Books

[6] William Poundstone. Prisoner's Dilemma, Doubleday, 1992.

[7] Anatol Rapoport. Two-Person Game Theory: The Essential Ideas, Ann Arbor Science Paperbacks, University of Michigan, 1966.

[8] Karl Sigmund. Games of Life: Explorations in Ecology, Evolution and Behaviour, Oxford University Press, 1993.