
Game Theory & Learning

Informal Notes -- Not to be distributed

Bhubaneswar Mishra
Courant Institute of Mathematical Sciences

Preface
In the spring of 1998, a small group of computer science colleagues, students and I started writing notes that could be used
in the context of our research in computational economy, revolving around our work on CAFE. The group consisted of Rohit
Parikh, Ron Even, Amy Greenwald, Gideon Berger, Toto Paxia
and a few others. At present, these notes are intended for the
consumption of only this group.
January 1, 1998
251 Mercer Street, New York.

B. Mishra
mishra@nyu.edu

Contents

Preface

1 Introduction
  1.1 Stag Hunt Problem
  1.2 Why are these kinds of analysis important to us?
  1.3 Prisoners' Dilemma
  1.4 Second-Price Auction
  1.5 Two Person Zero-sum Games
  1.6 Obstacles
  1.7 Repeated Play (with learning)
  1.8 Learning Algorithm
  1.9 Analysis of Learning Algorithm
      1.9.1 Inequality 1
      1.9.2 Inequality 2
      1.9.3 Final Result

2 Strategic Form Games
  2.1 Games
  2.2 Strategic Form Games
  2.3 Domination & Nash Equilibrium
  2.4 Example
      2.4.1 Matching Pennies
  2.5 Key Ingredients for Nash Equilibrium
  2.6 Revisiting On-line Learning
      2.6.1 Convergence
      2.6.2 Irrationality
      2.6.3 A Meta-Theorem of Foster & Young

3 Nash Equilibrium
  3.1 Nash Equilibrium
      3.1.1 Fixed Point Theorems

4 Beyond Nash: Domination, Rationalization and Correlation
  4.1 Beyond Nash
      4.1.1 Correlated Equilibrium
  4.2 Iterated Strict Dominance and Rationalizability
      4.2.1 Some Properties of Undominated Sets
  4.3 Rationalizability
  4.4 Correlated Equilibrium
      4.4.1 Formal Definitions
      4.4.2 Pure Strategies for the Expanded Game
      4.4.3 Correlated Equilibrium and Universal Device

5 Adaptive and Sophisticated Learning
  5.1 Adaptive and Sophisticated Learning
  5.2 Set-up
  5.3 Formulation
  5.4 Looking Forward

6 Learning a la Milgrom and Roberts
  6.1 Adaptive Learning and Undominated Sets
  6.2 Convergence
  6.3 Stochastic Learning Processes

7 Information Theory and Learning
  7.1 Information Theory and Games
      7.1.1 Basic Concepts
  7.2 Joint & Conditional Entropy
      7.2.1 Chain Rule
  7.3 Relative Entropy & Mutual Information
  7.4 Chain Rules for Entropy, Relative Entropy and Mutual Information
  7.5 Information Inequality
  7.6 Stationary Markov Process
  7.7 Gambling and Entropy
  7.8 Side Information
  7.9 Learning

8 Universal Portfolio
  8.1 Universal Portfolio
      8.1.1 Portfolio
  8.2 Universal Portfolio Strategy
      8.2.1 Questions
  8.3 Properties & Analysis
      8.3.1 Doubling Ratio
  8.4 Competitiveness

9 Portfolios and Markets
  9.1 Portfolio Theory
      9.1.1 Ito Calculus
      9.1.2 Market Model
  9.2 Rebalanced Portfolio
      9.2.1 Optimal Portfolio
      9.2.2 Long Term Effects
  9.3 Universal Portfolio
      9.3.1 Competitiveness

Bibliography

Chapter 1
Introduction
1.1 Stag Hunt Problem
(With Two Players)

Stag Hunt Problem


            Stag    Hare
    Stag    2,2     0,1
    Hare    1,0     1,1

1. If both row-player and column-player hunt stag, since a
stag is worth 4 "utils", they each get 2 "utils."
2. If both row-player and column-player hunt hares, since a
hare is worth 1 "util", they each get 1 "util."
3. If row-player hunts hare, while column-player hunts stag
(and hence fails to catch anything), then the row-player
gets 1 "util" and the column-player gets 0 "utils."
4. The other case is symmetric.

Note that if the row-player is risk averse, he will choose to
hunt hare and thus guarantee that he gets 1 "util" independent
of the choice the column-player makes. Thus he will maximize the
minimum utility under the two possible pure strategies ("hunt
stag" with a minimum utility of 0 if the opponent hunts hare
vs. "hunt hare" with a minimum utility of 1 regardless of what
the opponent chooses to play) and choose to hunt hare. By
symmetry, it is seen that in fact both players will choose to hunt
hares.
Is this the truly optimal strategy?
Quoting Rousseau (Discourse on the Origin and Basis of
Inequality among Men):
\If a group of hunters set out to take a stag, they
are fully aware that they would all have to remain
faithfully at their posts in order to succeed; but if a
hare happens to pass near one of them, there can be
no doubt that if he pursued it without qualm, and
that once he had caught his prey, he cared very little
whether or not he had made his companions miss
theirs."
Changing the discussion slightly, suppose that the column-player
will play a mixed strategy by playing "hunt stag" with some
probability (say, y) and the other strategy ("hunt hare") with
probability (1 - y). His best choice of y must be such that the
row-player is now "indifferent" between his own two pure strategies. Thus, we have

    2y + 0(1 - y) = 1 \cdot y + 1 \cdot (1 - y),

and y = 1/2. Thus one expects both row-player and column-player to play the strategies "hunt stag" and "hunt hare" with
equal probabilities.

1.2 Why are these kinds of analysis important to us?


1. Economy
2. Evolutionary Biology
3. Large Scale Distributed Systems
4. Resource Allocation
5. Intelligent Agents

1.3 Prisoners' Dilemma

Prisoners' Dilemma

        C       D
C       0,0     -2,1
D       1,-2    -1,-1

There are two prisoners (row-player and column-player) arrested for a particular crime, but the prosecutor does not have
enough evidence to convict them both. He relies on one of them
testifying against the other in order to get a conviction and punish the second prisoner by sending him to jail. If both of them
testify against the other (defections: "D, D") then they both go
to jail for 1 year each, thus getting a "util" of -1 each. If, on the other
hand, both maintain silence (cooperations: "C, C") then they
go free with a "util" of 0 each. If, on the other hand, row-player
testifies (D) and column-player maintains silence (C), then row-player is rewarded with 1 util and column-player is punished
with -2 utils. The other case is symmetric.
The pay-offs can all be made non-negative by adding 2 utils
to each entry, thus getting the pay-off matrix:

Prisoners' Dilemma (Modified Pay-offs)

        C       D
C       2,2     0,3
D       3,0     1,1

1. For column-player the strategy C is dominated by the
strategy D independent of how row-player plays the game.
Thus column-player must defect.
2. Similarly, for row-player the strategy C is dominated by
the strategy D independent of how column-player plays
the game. Thus row-player must defect.

Hence the equilibrium strategy for the players is to defect,
even though they could each have gotten better pay-offs by cooperating.

1.4 Second-Price Auction

1. Seller has one indivisible unit of an object for sale.
2. There are I potential buyers (bidders) with valuations

    0 \le v_1 \le v_2 \le \cdots \le v_I.

(Consider the case when I = 2.)
3. The bidders simultaneously submit bids

    s_i \in [0, \infty).

4. The highest bidder wins the object.
5. But he only pays the second-highest bid (\max_{j \ne i} s_j).
6. His utility is

    v_i - \max_{j \ne i} s_j.

Consider the special case of just two players:

    v_1, v_2 = valuations,   s_1, s_2 = bids.

Pay-offs:

    u_1 = if s_1 > s_2 then v_1 - s_2 else 0,
    u_2 = if s_2 > s_1 then v_2 - s_1 else 0.

Let us look at player 1's choices.
1. Overbidding (s_1 > v_1)
(a) s_2 \ge s_1: The pay-off is zero and the strategy is weakly
dominated.
(b) s_2 \le v_1: The pay-off is v_1 - s_2 and the strategy is
weakly dominated with respect to bidding s_1 = v_1.
(c) v_1 < s_2 < s_1: The pay-off is v_1 - s_2 < 0, negative, and
the strategy is strictly dominated.
2. Underbidding (s_1 < v_1)
(a) s_2 \ge v_1: The pay-off is zero and the strategy is weakly
dominated.
(b) s_2 \le s_1: The pay-off is v_1 - s_2 and the strategy is
weakly dominated with respect to bidding s_1 = v_1.
(c) s_1 < s_2 < v_1: The pay-off is zero and the strategy is
weakly dominated.

So the best strategy for player 1 is to bid exactly his own
valuation (s_1 = v_1). And by a symmetric argument, the best
strategy for player 2 is also to bid exactly his own valuation
(s_2 = v_2).
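The case analysis can also be checked mechanically. The following is a minimal sketch (my own illustration, not part of the original notes): it uses the pay-off u_1 defined above, an arbitrary valuation v_1 = 1.7 and an arbitrary grid of bids, and confirms that bidding s_1 = v_1 weakly dominates every other bid on that grid.

def u1(s1, s2, v1):
    # Player 1's pay-off: he wins and pays s2 only if s1 > s2 (ties lose, as above).
    return v1 - s2 if s1 > s2 else 0.0

def weakly_dominates(v1, bid_a, bid_b, opponent_bids):
    # True if bid_a does at least as well as bid_b against every opposing bid s2.
    return all(u1(bid_a, s2, v1) >= u1(bid_b, s2, v1) for s2 in opponent_bids)

if __name__ == "__main__":
    grid = [i / 10 for i in range(0, 31)]   # hypothetical bid grid over [0, 3]
    v1 = 1.7                                # hypothetical valuation
    assert all(weakly_dominates(v1, v1, b, grid) for b in grid)
    print("Bidding one's valuation is weakly dominant on this grid.")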

1.5 Two Person Zero-sum Games

We define a loss matrix M as follows:
M(s_i, s_j) = M(i, j) = loss suffered by the row-player for the
strategy profile (s_i, s_j).

Rock, Paper & Scissors

        R       P       S
R       1/2     1       0
P       0       1/2     1
S       1       0       1/2

Row-player's goal is to minimize the loss. Assume (without
loss of generality) that all the losses are in the range [0, 1].
Row-player's expected loss is

    \sum_{i,j} \sigma_r(s_i) \sigma_c(s_j) M(s_i, s_j)
        = \sum_{i,j} \sigma_r(i) M(i, j) \sigma_c(j)
        = \sigma_r^T M \sigma_c = M(\sigma_r, \sigma_c),

where

    \sigma_r(s_i) = probability that the row player plays s_i,
    \sigma_c(s_j) = probability that the column player plays s_j.

Similarly,

    M(\sigma_r, j) = \sum_i \sigma_r(i) M(i, j)   and   M(i, \sigma_c) = \sum_j \sigma_c(j) M(i, j).

Row-player's strategy:

    \min_{\sigma_r} \max_{\sigma_c} M(\sigma_r, \sigma_c).

A mixed strategy \sigma_r realizing this minimum is called a minmax
strategy.

Theorem 1.5.1 (The MINMAX Theorem: von Neumann)

    \min_{\sigma_r} \max_{\sigma_c} M(\sigma_r, \sigma_c) = \max_{\sigma_c} \min_{\sigma_r} M(\sigma_r, \sigma_c).
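For a small loss matrix, a minmax strategy can be computed by linear programming: minimize v subject to \sum_i \sigma_r(i) M(i, j) \le v for every column j, with \sigma_r a probability vector. The sketch below (my own illustration, using scipy's linprog) applies this to the Rock-Paper-Scissors loss matrix above and recovers \sigma_r = (1/3, 1/3, 1/3) with value 1/2.

import numpy as np
from scipy.optimize import linprog

def minmax_strategy(M):
    # Variables (sigma_r(1..n), v); minimize v subject to M^T sigma_r <= v,
    # sigma_r >= 0 and sum_i sigma_r(i) = 1.
    n, m = M.shape
    c = np.zeros(n + 1); c[-1] = 1.0
    A_ub = np.hstack([M.T, -np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

M = np.array([[0.5, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.5]])
sigma_r, value = minmax_strategy(M)
print(sigma_r, value)   # approximately (1/3, 1/3, 1/3) and 0.5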

1.6 Obstacles

1. Imperfect Information
The loss matrix M (pay-off) may be unknown.
2. Computational complexity
M is so large that computing a minmax strategy using a
linear program is infeasible.
3. Irrationality
Opponent (column-player) may not be truly adversarial.

1.7 Repeated Play (with learning)

M unknown.
1. The game is played repeatedly in a sequence of rounds.
2. On round t = 1, ..., T:
(a) The learner (row-player) chooses mixed strategy \sigma_{r,t}.
(b) The opponent (column-player) chooses mixed strategy \sigma_{c,t}.
(c) Row-player observes all possible losses

    M(i, \sigma_{c,t}) = \sum_j \sigma_{c,t}(j) M(i, j),

for each row i.
(d) Row-player suffers loss M(\sigma_{r,t}, \sigma_{c,t}).

Row-player's cumulative expected loss:

    \sum_{t=1}^{T} M(\sigma_{r,t}, \sigma_{c,t}).

The expected cumulative loss of the best fixed strategy:

    \sum_{t=1}^{T} M(\sigma_r^*, \sigma_{c,t}) = \min_{\sigma_r} \sum_{t=1}^{T} M(\sigma_r, \sigma_{c,t}).

1.8 Learning Algorithm

Parameter \beta \in (0, 1), to be chosen. Initially,

    W_1(i) = 1,   for all i.

On each round,

    \sigma_{r,t}(i) = \frac{W_t(i)}{\sum_i W_t(i)},
    W_{t+1}(i) = W_t(i) \, \beta^{M(i, \sigma_{c,t})}.
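A minimal simulation of this update rule (my own sketch; the Rock-Paper-Scissors loss matrix, the fixed opponent mixture and the choice \beta = 0.9 are illustrative assumptions, not prescribed by the notes):

import numpy as np

M = np.array([[0.5, 1.0, 0.0],          # row-player's loss matrix (Section 1.5)
              [0.0, 0.5, 1.0],
              [1.0, 0.0, 0.5]])
beta, T = 0.9, 2000
W = np.ones(M.shape[0])                  # W_1(i) = 1 for all i
cum_loss, cum_row_losses = 0.0, np.zeros(M.shape[0])
sigma_c = np.array([0.5, 0.3, 0.2])      # hypothetical (biased) opponent mixture
for t in range(T):
    sigma_r = W / W.sum()                # sigma_{r,t}(i) = W_t(i) / sum_i W_t(i)
    losses = M @ sigma_c                 # M(i, sigma_{c,t}) for each row i
    cum_loss += float(sigma_r @ losses)  # M(sigma_{r,t}, sigma_{c,t})
    cum_row_losses += losses
    W = W * beta ** losses               # W_{t+1}(i) = W_t(i) * beta^{M(i, sigma_{c,t})}
print(cum_loss / T, cum_row_losses.min() / T)   # average loss vs. best fixed row

The learner's average loss approaches that of the best fixed row, in line with the bound derived in Section 1.9.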

1.9 Analysis of Learning Algorithm

1.9.1 Inequality 1

    \sum_i W_{t+1}(i) = \sum_i W_t(i) \, \beta^{M(i, \sigma_{c,t})}
                      = \left( \sum_i W_t(i) \right) \sum_i \sigma_{r,t}(i) \, \beta^{M(i, \sigma_{c,t})}

    \Rightarrow \frac{\sum_i W_{t+1}(i)}{\sum_i W_t(i)}
        = \sum_i \sigma_{r,t}(i) \, \beta^{M(i, \sigma_{c,t})}
        \le \sum_i \sigma_{r,t}(i) \left( 1 - (1 - \beta) M(i, \sigma_{c,t}) \right)
        = 1 - (1 - \beta) M(\sigma_{r,t}, \sigma_{c,t}),

using \beta^x \le 1 - (1 - \beta) x for x \in [0, 1]. After telescoping, we get

    \frac{\sum_i W_{T+1}(i)}{\sum_i W_1(i)} \le \prod_t \left( 1 - (1 - \beta) M(\sigma_{r,t}, \sigma_{c,t}) \right).

Hence, since \sum_i W_1(i) = n,

    \ln \left( \frac{\sum_i W_{T+1}(i)}{n} \right)
        \le \sum_t \ln \left( 1 - (1 - \beta) M(\sigma_{r,t}, \sigma_{c,t}) \right)
        \le -(1 - \beta) \sum_t M(\sigma_{r,t}, \sigma_{c,t}).

1.9.2 Inequality 2

Let j be the best pure strategy in hindsight, so that \sum_t M(j, \sigma_{c,t}) = \min_{\sigma_r} \sum_t M(\sigma_r, \sigma_{c,t}) = \sum_t M(\sigma_r^*, \sigma_{c,t}). Then

    \sum_i W_{T+1}(i) \ge W_{T+1}(j) = \beta^{\sum_t M(j, \sigma_{c,t})} = \beta^{\sum_t M(\sigma_r^*, \sigma_{c,t})}.

Hence

    \ln \left( \frac{\sum_i W_{T+1}(i)}{n} \right) \ge (\ln \beta) \sum_t M(\sigma_r^*, \sigma_{c,t}) - \ln n.

1.9.3 Final Result

Combining the two inequalities:

    (1 - \beta) \sum_t M(\sigma_{r,t}, \sigma_{c,t}) \le \ln n + (\ln 1/\beta) \sum_t M(\sigma_r^*, \sigma_{c,t}),

and

    \sum_t M(\sigma_{r,t}, \sigma_{c,t}) \le \frac{\ln 1/\beta}{1 - \beta} \sum_t M(\sigma_r^*, \sigma_{c,t}) + \frac{\ln n}{1 - \beta}.

Chapter 2
Strategic Form Games
2.1 Games
Games can be categorized into the following two forms.
We will start here with the first category and postpone the discussion of the second category for later.
1. Strategic Form Games (also called Normal Form Games)
2. Extensive Form Games

2.2 Strategic Form Games

1. Let I = {1, ..., I} be a finite set of players, where I \in N
is the number of players.
2. Let S_i (i \in I) be the (finite) set of pure strategies available
to player i \in I.
3.

    S = S_1 \times S_2 \times \cdots \times S_I

(Cartesian product of the pure strategies) = set of pure
strategy profiles.

Conventions

We write s_i \in S_i for a pure strategy of player i. We also
write s = (s_1, s_2, ..., s_I) \in S for a pure strategy profile.
"-i" denotes player i's "opponents" and refers to all
players other than some given player i. Thus, we can write

    S_{-i} = \prod_{j \in I, j \ne i} S_j.

Just as before, s_{-i} \in S_{-i} denotes a pure strategy profile for
the opponents of i. Hence,

    s = (s_i, s_{-i}) \in S

is a pure strategy profile.
u_i : S \to R = pay-off function (real-valued function on S)
for player i.
u_i(s) = von Neumann-Morgenstern utility of player i for
each profile s = (s_1, s_2, ..., s_I) of pure strategies.

Definition 2.2.1 A strategic form game is a tuple

    (I, \{S_1, S_2, ..., S_I\}, \{u_1, u_2, ..., u_I\})

consisting of a set of players, pure strategy spaces and pay-off
functions.

Definition 2.2.2 A two-player zero-sum game is a strategic
form game with I = {1, 2} such that

    \forall s \in S \quad \sum_{i=1}^{2} u_i(s) = 0.

Definition 2.2.3 A mixed strategy set for player i, \Sigma_i, is the
set of probability distributions over the pure strategy set S_i:

    \Sigma_i = \{ \sigma_i : S_i \to [0, 1] \mid \sum_{s_i} \sigma_i(s_i) = 1 \}.

The space of mixed strategy profiles is \Sigma = \prod_{i \in I} \Sigma_i.
As before, we write \sigma_i \in \Sigma_i, and \sigma = (\sigma_1, \sigma_2, ..., \sigma_I) \in \Sigma.

The support of a mixed strategy \sigma_i = the set of pure strategies to which \sigma_i assigns positive probability.
Player i's pay-off to profile \sigma is

    u_i(\sigma) = E_\sigma [u_i(s)],

which expands as

    u_i(\sigma) = u_i(\sigma_i, \sigma_{-i}) = \sum_{s_i \in S_i} \sigma_i(s_i) u_i(s_i, \sigma_{-i}),

    u_i(s_i, \sigma_{-i}) = \sum_{s_{-i} \in S_{-i}} \sigma_{-i}(s_{-i}) u_i(s_i, s_{-i})
                          = \sum_{s_{-i} \in S_{-i}} \left( \prod_{j \ne i} \sigma_j(s_j) \right) u_i(s_i, s_{-i}).

Hence,

    u_i(\sigma) = \sum_{s_i \in S_i} \sigma_i(s_i) \sum_{s_{-i} \in S_{-i}} \left( \prod_{j \ne i} \sigma_j(s_j) \right) u_i(s_i, s_{-i})
                = \sum_{s \in S} \left( \prod_j \sigma_j(s_j) \right) u_i(s).
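As a quick numerical illustration of this formula (a sketch of my own; the two matrices happen to be the modified Prisoners' Dilemma pay-offs of Section 1.3, and the mixed strategies are arbitrary):

import numpy as np

def expected_payoffs(U1, U2, sigma_r, sigma_c):
    # u_i(sigma) = sum_s (prod_j sigma_j(s_j)) u_i(s), written as a bilinear form.
    return float(sigma_r @ U1 @ sigma_c), float(sigma_r @ U2 @ sigma_c)

U1 = np.array([[2.0, 0.0], [3.0, 1.0]])   # row player's pay-offs over (C, D) x (C, D)
U2 = np.array([[2.0, 3.0], [0.0, 1.0]])   # column player's pay-offs
print(expected_payoffs(U1, U2, np.array([0.5, 0.5]), np.array([0.25, 0.75])))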

2.3 Domination & Nash Equilibrium

Definition 2.3.1 A pure strategy s_i is strictly dominated for
player i if

    \exists \sigma_i' \in \Sigma_i \ \forall s_{-i} \in S_{-i} \quad u_i(\sigma_i', s_{-i}) > u_i(s_i, s_{-i}).

A pure strategy s_i is weakly dominated for player i if

    \exists \sigma_i' \in \Sigma_i \ \left( \forall s_{-i} \in S_{-i} \ u_i(\sigma_i', s_{-i}) \ge u_i(s_i, s_{-i}) \right)
        \wedge \left( \exists s_{-i} \in S_{-i} \ u_i(\sigma_i', s_{-i}) > u_i(s_i, s_{-i}) \right).

Definition 2.3.2 Best Response: The set of best responses
for player i to a pure strategy profile s \in S is

    BR_i(s) = \{ s_i^* \in S_i \mid \forall s_i \in S_i \ u_i(s_i^*, s_{-i}) \ge u_i(s_i, s_{-i}) \}.

Let the joint best response set be BR(s) = \prod_i BR_i(s).

Definition 2.3.3 Nash Equilibrium: A pure strategy profile
s* is a Nash equilibrium if for all players i,

    \forall s_i \in S_i \quad u_i(s_i^*, s_{-i}^*) \ge u_i(s_i, s_{-i}^*).

Thus a Nash equilibrium is a strategy profile s* such that s* \in BR(s*).
A Nash equilibrium s* is strict if each player has a unique
best response to his rivals' strategies, BR(s*) = {s*}:

    \forall s_i \ne s_i^* \quad u_i(s_i^*, s_{-i}^*) > u_i(s_i, s_{-i}^*).

A mixed strategy profile \sigma^* is a Nash equilibrium if for all players i,

    \forall s_i \in S_i \quad u_i(\sigma_i^*, \sigma_{-i}^*) \ge u_i(s_i, \sigma_{-i}^*).

Remark: Since expected utilities are "linear in the probabilities," if a player uses a non-degenerate mixed strategy in a Nash
equilibrium (non-singleton support), he must be indifferent between all pure strategies to which he assigns positive probability.
(It suffices to check that no player has a profitable pure-strategy
deviation.)

2.4 Example

Example

        L       M       R
U       4,3     5,1     6,2
M       2,1     8,4     3,6
D       3,0     9,6     2,8

For column-player, M is dominated by R. Column-player can
eliminate M from his strategy space. The pay-off matrix reduces
to

New Pay-offs

        L       R
U       4,3     6,2
M       2,1     3,6
D       3,0     2,8

For row-player, M and D are dominated by U. Row-player
can eliminate M and D. The new pay-off matrix is

New Pay-offs

        L       R
U       4,3     6,2

Next, column-player eliminates R, as it is dominated by L,
and reduces the pay-off matrix to

New Pay-offs

        L
U       4,3

Note that

    BR_r(U, L) = U,   BR_c(U, L) = L,   and   BR(U, L) = (U, L).

(U, L) is a strict Nash equilibrium.
Remark: Mixed Strategy (not a Nash equilibrium):

    \sigma_r = (1/3, 1/3, 1/3),   \sigma_c = (0, 1/2, 1/2),   \sigma = (\sigma_r, \sigma_c).

Thus

    u_r(\sigma_r, \sigma_c) = \sum_s \left( \prod_j \sigma_j(s_j) \right) u_r(s)
        = (1/3 \cdot 0) 4 + (1/3 \cdot 1/2) 5 + (1/3 \cdot 1/2) 6
        + (1/3 \cdot 0) 2 + (1/3 \cdot 1/2) 8 + (1/3 \cdot 1/2) 3
        + (1/3 \cdot 0) 3 + (1/3 \cdot 1/2) 9 + (1/3 \cdot 1/2) 2
        = 5.5,

and

    u_c(\sigma_r, \sigma_c) = \sum_s \left( \prod_j \sigma_j(s_j) \right) u_c(s)
        = (1/3 \cdot 0) 3 + (1/3 \cdot 1/2) 1 + (1/3 \cdot 1/2) 2
        + (1/3 \cdot 0) 1 + (1/3 \cdot 1/2) 4 + (1/3 \cdot 1/2) 6
        + (1/3 \cdot 0) 0 + (1/3 \cdot 1/2) 6 + (1/3 \cdot 1/2) 8
        = 4.5.

Thus this mixed strategy leads to a much better pay-off in
comparison to the pure strategy Nash equilibrium.
A pure strategy may be strictly dominated by a mixed strategy, even if it is not strictly dominated by any pure strategy.

Example

        L       R
U       2,0     -1,0
M       0,0     0,0
D       -1,0    2,0

For row-player, M is not dominated by either U or D. But M
is dominated by the mixed strategy \sigma_r = (1/2, 0, 1/2), whose
pay-off is 1/2 against L and 1/2 against R, strictly greater than
M's pay-off of 0 in each case.
Going back to the "Prisoners' Dilemma" game, note that its
Nash equilibrium is in fact (D, D) [both players defect]:

    BR_r(C, C) = BR_r(C, D) = BR_r(D, C) = BR_r(D, D) = D,
    BR_c(C, C) = BR_c(C, D) = BR_c(D, C) = BR_c(D, D) = D,
    BR(C, C) = BR(C, D) = BR(D, C) = BR(D, D) = (D, D).

2.4.1 Matching Pennies

Matching Pennies
H T
H 1,-1 -1,1
T -1,1 1,-1

There are two players: "Matcher" (row-player) and "Mismatcher" (column-player). Matcher and Mismatcher both have
two strategies: "call head" (H) and "call tail" (T). Matcher
wins 1 util if both players call the same [(H,H) or (T,T)] and
Mismatcher wins 1 util if the players call differently [(H,T) or
(T,H)]. It is easy to see that this game has no pure-strategy
Nash equilibrium. However, it does have a mixed-strategy Nash
equilibrium:

    \sigma_r = (1/2, 1/2),   \sigma_c = (1/2, 1/2).

The pay-offs are

    u_r(\sigma) = (1/2 \cdot 1/2) 1 + (1/2 \cdot 1/2)(-1) + (1/2 \cdot 1/2)(-1) + (1/2 \cdot 1/2) 1 = 0,
    u_c(\sigma) = (1/2 \cdot 1/2)(-1) + (1/2 \cdot 1/2) 1 + (1/2 \cdot 1/2) 1 + (1/2 \cdot 1/2)(-1) = 0.

2.5 Key Ingredients for Nash Equilibrium

1. Introspection (Fictitious play)
2. Deduction/Rationality
3. Knowledge of Opponents' Pay-offs
4. Common Knowledge

2.6 Revisiting On-line Learning

2.6.1 Convergence

Note that in the earlier discussion of the on-line learning strategy, we noted that the on-line learning algorithm is competitive
[with a competitive factor of (\ln 1/\beta)/(1-\beta) \approx 1 + (1-\beta)/2 +
(1-\beta)^2/3 + \cdots, for small (1-\beta)] over any sufficiently large time
interval [0, T]. But it is also fairly easy to note that the probabilities that the row-player chooses do not necessarily converge
to the best mixed strategy. Namely,

    W_T(i) = \beta^{\sum_t M(i, \sigma_{c,t})}   and   \sigma_{r,T}(i) = \frac{W_T(i)}{\sum_i W_T(i)}.

We have not explicitly shown that \lim_{T \to \infty} \sigma_{r,T} converges in
distribution to \sigma_r^*. Does the computed distribution converge
to anything? In the absence of any convergence property, one
may justifiably question how the algorithm can be interpreted
as learning a strategy.

2.6.2 Irrationality

Let us look at the "Matching Pennies" problem again:

Matching Pennies

        H       T
H       1,-1    -1,1
T       -1,1    1,-1

Suppose the column-player chooses a mixed strategy at time
t such that \sigma_{c,t}(H) > 1/2 [and \sigma_{c,t}(T) = 1 - \sigma_{c,t}(H) < 1/2];
then for the row-player, the best response is BR_{r,t}(\sigma_t) = H
and is unique. By a similar reasoning, if \sigma_{c,t}(H) < 1/2 [and
\sigma_{c,t}(T) > 1/2], then for the row-player, the best response is
BR_{r,t}(\sigma_t) = T. Thus, if the rival deviates from his Nash equilibrium mixed strategy \sigma_{c,t} = (1/2, 1/2), then the row-player's (rational) best response is always a pure strategy, H or T. Thus,
if the row-player had a convergent (rational) mixed strategy, then
depending on \lim_{T \to \infty} \{\sigma_{c,t}\}_0^T, the row-player must converge to
one of the following three (conventional) strategies:
1. random(1/2, 1/2) (the Nash equilibrium mixed strategy),
2. H* (always H), or
3. T* (always T).
Anything else would make the row-player irrational. Thus,
a player playing the on-line learning algorithm must be almost
always irrational!

2.6.3 A Meta-Theorem of Foster & Young

Definition 2.6.1 An infinite sequence \sigma_{c,t} is almost constant
if there exists a \sigma_c such that \sigma_{c,t} = \sigma_c almost always (a.a.). That
is,

    \lim_{T \to \infty} \frac{|\{ t \le T : \sigma_{c,t} \ne \sigma_c \}|}{T} = 0.

If \sigma_{c,t} is not almost constant, then

    \forall \sigma_c = const \quad \sigma_{c,t} \ne \sigma_c  infinitely often (i.o.).

Consider an n-player game with a strategy space S_1 \times S_2 \times
\cdots \times S_n = S and with the utility functions u_i : S \to R. All
actions are publicly observed. Let \Delta_i = the set of probability
distributions over S_i. Let \Delta = \prod_i \Delta_i be the product set of mixtures. Before every round of the game, a state can be described
by a family of probability distributions

    \{ (\sigma_i, \sigma_{i,j}) \}_{i \ne j},

where

    \sigma_i \in \Delta_i = player i's mixed strategy,
    \sigma_{i,j} \in \Delta_j = player i's belief about player j's mixed strategy.

Definition 2.6.2 Rationality: Each player chooses only best
replies given his beliefs:

    \forall i \ne j \quad \sigma_i(s_i) > 0 \Rightarrow s_i \in BR_i(\sigma_{i,j}).

Definition 2.6.3 Learning: Player i has its own deterministic learning process \{f_i, f_{i,j}\} which it uses in determining its
strategy and its beliefs. In particular, let h_t = all publicly available information up to time t. Then, player i chooses its strategy
and beliefs as follows:

    f_i : h_{t-1} \mapsto \sigma_{i,t},
    f_{ij} : h_{t-1} \mapsto \sigma_{ij,t}.

The learning process is informationally independent if the \sigma_{ij,t} =
f_{ij}(h_{t-1}) do not depend on any extraneous information.

Definition 2.6.4 Convergence: The beliefs are said to converge along a learning path \{h_t, \sigma_{i,t}, \sigma_{ij,t}\}_0^\infty if

    \forall i \ne j \ \exists \sigma_{ij}^* \in \Delta_j \quad \lim_{t \to \infty} \sigma_{ij,t} = \sigma_{ij}^*.

The strategies are said to converge along a learning path
\{h_t, \sigma_{i,t}, \sigma_{ij,t}\}_0^\infty if

    \forall i \ \exists \sigma_i^* \in \Delta_i \quad \lim_{t \to \infty} \sigma_{i,t} = \sigma_i^*.

The beliefs are said to be predictive along a learning path if

    \forall i \ne j \quad \lim_{t \to \infty} (\sigma_{ij,t} - \sigma_{j,t}) = 0,

i.e., player i's belief about j asymptotically tracks j's actual strategy,
and they are strongly predictive if in addition both the beliefs
and strategies converge.

Theorem 2.6.1 Consider a finite 2-person game (players: row-player and column-player) with a strict (thus, unique) Nash
equilibrium \sigma^* = (\sigma_r^*, \sigma_c^*) which has full support on S_r \times S_c.
Let \{(f_r, f_{rc}), (f_c, f_{cr})\} be a DRIP learning process (D = Deterministic, R = Rational, I = Informationally independent and P
= Predictive).
On any learning path (h_t, (\sigma_{r,t}, \sigma_{rc,t}), (\sigma_{c,t}, \sigma_{cr,t})), if the beliefs
are not almost constant with value \sigma^*, then the beliefs do not
converge.
Proof:
Assume, to the contrary, that the beliefs converge. Since they are
not almost constant with value \sigma^*, \sigma_{rc,t} \ne \sigma_c^* i.o. Then, infinitely
often, the best-reply set to \sigma_{rc,t} does not have full support, and

    \exists s_{r,t} \in S_r \quad s_{r,t} \notin BR_r(\sigma_{rc,t});

by the finiteness of the strategy set S_r,

    \exists s_r \in S_r \quad s_r \notin BR_r(\sigma_{rc,t})  i.o.

By rationality of the row-player,

    \exists s_r \in S_r \quad \sigma_{r,t}(s_r) = 0  i.o.,   and   \exists s_r \in S_r \quad \lim_{t \to \infty} \sigma_{r,t}(s_r) = 0.

By a similar argument,

    \exists s_c \in S_c \quad \lim_{t \to \infty} \sigma_{c,t}(s_c) = 0.

Since the learning is assumed to be predictive, we get

    \lim_{t \to \infty} \sigma_{cr,t}(s_r) = 0   and   \lim_{t \to \infty} \sigma_{rc,t}(s_c) = 0.

Thus, if the beliefs converge (say, to (\sigma_r^\infty, \sigma_c^\infty)), then the beliefs
(and also, by predictivity, the strategies) converge to some strategies other than the unique Nash equilibrium (which is unique with
full support). Hence one of the following two holds at the limit:

    \exists t_r \in S_r \setminus \{s_r\} \quad \sigma_r^\infty(t_r) > 0  and  t_r \notin BR_r(\sigma_c^\infty),

or

    \exists t_c \in S_c \setminus \{s_c\} \quad \sigma_c^\infty(t_c) > 0  and  t_c \notin BR_c(\sigma_r^\infty).

But, depending on which of the two holds, we conclude
that either the row-player or the column-player (or both) must be irrational, a contradiction.

Chapter 3
Nash Equilibrium
3.1 Nash Equilibrium

3.1.1 Fixed Point Theorems

Definition 3.1.1 A point x \in K is a fixed point of a function
f : K \to K if x = f(x).

Definition 3.1.2 A point x \in K is a fixed point of a correspondence
\Phi : K \to 2^K if x \in \Phi(x).

Theorem 3.1.1 Brouwer's Fixed Point Theorem: If f :
K \to K is a continuous function from a nonempty, compact,
convex subset K of a finite dimensional TVS (topological vector
space) into itself, then f has a fixed point, i.e.,

    \exists x \in K \quad x = f(x).

Theorem 3.1.2 Kakutani's Fixed Point Theorem: If \Phi :
K \to 2^K is a convex-valued, uhc (upper hemi-continuous) map
from a nonempty, compact, convex subset K of a finite dimensional TVS to the nonempty subsets of K, then \Phi has a fixed
point, i.e.,

    \exists x \in K \quad x \in \Phi(x).

Definition 3.1.3 Topological Vector Space: L = vector space
with a T1 topology

    (\forall x \ne y \in L \ \exists G_x = open set: x \in G_x \wedge y \notin G_x)

which admits continuous vector space operations.
Example: R^n with the standard Euclidean topology. (Up to isomorphism, this is the only instance
of a finite dimensional TVS.)

Theorem 3.1.3 Existence of a Mixed Strategy Equilibrium (Nash 1950). Every finite strategic-form game has a mixed-strategy equilibrium.
Proof: Player i's reaction correspondence, r_i, maps each
strategy profile \sigma to the set of mixed strategies that maximize
player i's pay-offs when his rivals play \sigma_{-i}:

    r_i(\sigma) = \{ \sigma_i' \mid \forall s_i \in S_i \ u_i(\sigma_i', \sigma_{-i}) \ge u_i(s_i, \sigma_{-i}) \}.

Thus,

    r_i : \Sigma \to 2^{\Sigma_i}.

Define

    r : \Sigma \to 2^{\Sigma} : \sigma \mapsto \prod_i r_i(\sigma).

Thus this correspondence is the Cartesian product of the r_i's.
A fixed point of r (if it exists) is a \sigma such that \sigma \in r(\sigma).
Note that then

    \forall s_i \in S_i \quad u_i(\sigma_i, \sigma_{-i}) \ge u_i(s_i, \sigma_{-i}),

by definition. Thus a fixed point of r provides a mixed strategy
equilibrium \sigma.
Claims:
1. \Sigma = nonempty, compact and convex subset of a TVS.
\Sigma_i is the (|S_i| - 1)-dimensional simplex, since

    \Sigma_i = \{ (\sigma_{i,1}, ..., \sigma_{i,|S_i|}) \mid \sigma_{i,j} \ge 0, \ \sum_j \sigma_{i,j} = 1 \}.

The rest follows since \Sigma = \prod_i \Sigma_i.

2. u_i = linear function of player i's own mixed strategy:

    \forall 0 < \lambda < 1 \quad u_i(\lambda \sigma_i' + (1-\lambda) \sigma_i'', \sigma_{-i})
        = \lambda u_i(\sigma_i', \sigma_{-i}) + (1-\lambda) u_i(\sigma_i'', \sigma_{-i}).

Hence u_i is a continuous function of his own mixed strategy. Since \Sigma is compact, u_i attains its maximum in \Sigma:

    \forall \sigma \in \Sigma \quad r(\sigma) \ne \emptyset.

3.

    \forall \sigma \in \Sigma \quad r(\sigma) is convex.

Let \sigma_i', \sigma_i'' \in r_i(\sigma). By definition,

    \forall s_i \in S_i \quad (u_i(\sigma_i', \sigma_{-i}) \ge u_i(s_i, \sigma_{-i}))
        \wedge (u_i(\sigma_i'', \sigma_{-i}) \ge u_i(s_i, \sigma_{-i})).

Hence

    \forall 0 < \lambda < 1 \ \forall s_i \in S_i \quad u_i(\lambda \sigma_i' + (1-\lambda) \sigma_i'', \sigma_{-i}) \ge u_i(s_i, \sigma_{-i}),

and

    \forall 0 < \lambda < 1 \quad \lambda \sigma_i' + (1-\lambda) \sigma_i'' \in r_i(\sigma).
4. r = uhc. Consider a sequence

    \{ (\sigma^n, \hat\sigma^n) \mid \hat\sigma^n \in r(\sigma^n) \}_n.

We wish to show that

    if \lim_{n \to \infty} (\sigma^n, \hat\sigma^n) = (\sigma, \hat\sigma), then \hat\sigma \in r(\sigma).

Suppose not! Then

    \forall n \quad \hat\sigma^n \in r(\sigma^n),   but   \hat\sigma \notin r(\sigma) \Rightarrow \hat\sigma_i \notin r_i(\sigma)  for some i.

Thus,

    \exists \epsilon > 0 \ \exists \sigma_i' \in \Sigma_i \quad u_i(\sigma_i', \sigma_{-i}) > u_i(\hat\sigma_i, \sigma_{-i}) + 3\epsilon.

Since u_i is continuous, there is a sufficiently large N such
that

    u_i(\sigma_i', \sigma_{-i}^N) > u_i(\sigma_i', \sigma_{-i}) - \epsilon
                               > u_i(\hat\sigma_i, \sigma_{-i}) + 2\epsilon
                               > u_i(\hat\sigma_i^N, \sigma_{-i}^N) + \epsilon.

Thus, \hat\sigma_i^N \notin r_i(\sigma^N), a contradiction.
Thus we conclude that r : \Sigma \to 2^{\Sigma} is a convex-valued,
uhc map from a nonempty, compact, convex subset \Sigma of a finite
dimensional TVS to nonempty subsets of \Sigma. Thus by Kakutani's
fixed point theorem

    \exists \sigma \in \Sigma \quad \sigma \in r(\sigma),

and \sigma is a mixed strategy Nash equilibrium.

Chapter 4
Beyond Nash: Domination,
Rationalization and
Correlation
4.1 Beyond Nash
We have seen that it is impossible to \learn" a Nash equilibrium
if we insist on DRIP conditions. A resolution to this dilemma
can involve one or more of the following approaches:
1. Explore simpler requirements than Nash equilibria: e.g.,
undominated sets, rationalizable sets and correlated equilibria. (The first two correspond to minmax and maxmin
requirements. The last one requires some side information
and may make the system informationally dependent.)
2. Requirement of predictivity may need to be abandoned.
3. Requirement of rationality may need to be abandoned.

4.1.1 Correlated Equilibrium

This concept extends the Nash concept by supposing that the


players can build a \correlated device" that sends each of the
players a private signal before they choose their strategy.
26

Main Ingredients: Predictions using only the assumption
that the structure of the game (i.e., the strategy spaces and pay-offs, the S_i's and u_i's) and the rationality of the players are common
knowledge.

4.2 Iterated Strict Dominance and Rationalizability

Definition 4.2.1 Iterated Strict Dominance: Let

    S_i^0 = S_i   and   \Sigma_i^0 = \Sigma_i.

For all n > 0, let

    S_i^n = \{ s_i \in S_i^{n-1} \mid
        \forall \sigma_i' \in \Sigma_i^{n-1} \ \exists s_{-i} \in S_{-i}^{n-1} \quad
        u_i(s_i, s_{-i}) \ge u_i(\sigma_i', s_{-i}) \}

(thus s_i is not strictly dominated: against some strategy
profile of the rivals it does at least as well as any given mixed strategy), and define

    \Sigma_i^n = \{ \sigma_i \in \Sigma_i \mid \sigma_i(s_i) > 0 \Rightarrow s_i \in S_i^n \}.

Let

    S_i^\infty = \bigcap_{n=0}^{\infty} S_i^n

be the set of player i's pure strategies that survive iterated deletion of strictly dominated strategies.
Let

    \Sigma_i^\infty = \{ \sigma_i \in \Sigma_i \mid
        \forall \sigma_i' \in \Sigma_i \ \exists s_{-i} \in S_{-i}^\infty \quad
        u_i(\sigma_i, s_{-i}) \ge u_i(\sigma_i', s_{-i}) \}

be the set of player i's mixed strategies that survive iterated deletion of strictly dominated strategies.

Example:

        L       R
U       1,3     -2,0
M       -2,0    1,3
D       0,1     0,1

Note that

    S_r^0 = \{U, M, D\}   and   \Sigma_r^0 = all mixed strategies over S_r^0.

Similarly,

    S_c^0 = \{L, R\}   and   \Sigma_c^0 = all mixed strategies over S_c^0.

Also note that

    S_r^\infty = \cdots = S_r^2 = S_r^1 = S_r^0,   and   S_c^\infty = \cdots = S_c^2 = S_c^1 = S_c^0.

Note, however, that for all values p \in (1/3, 2/3) the mixed strategy \sigma_r = (p, 1-p, 0) is strictly dominated by D. Thus,

    \Sigma_r^1 \subsetneq \Sigma_r^0.

4.2.1 Some Properties of Undominated Sets

    S^\infty = S_1^\infty \times S_2^\infty \times \cdots \times S_I^\infty,   and   \Sigma^\infty = \Sigma_1^\infty \times \Sigma_2^\infty \times \cdots \times \Sigma_I^\infty.

1. The final surviving strategy spaces are independent of the
elimination order.
2. A strategy is strictly dominated against all pure strategies
of the rivals if and only if it is dominated against all of their
mixed strategies. Thus, the following is an equivalent definition
of the undominated sets:

    S_i^0 = S_i   and   \Sigma_i^0 = \Sigma_i,

    S_i^n = \{ s_i \in S_i^{n-1} \mid
        \forall \sigma_i' \in \Sigma_i^{n-1} \ \exists s_{-i} \in S_{-i}^{n-1} \quad
        u_i(s_i, s_{-i}) \ge u_i(\sigma_i', s_{-i}) \},

    \Sigma_i^n = \{ \sigma_i \in \Sigma_i^{n-1} \mid
        \forall \sigma_i' \in \Sigma_i^{n-1} \ \exists s_{-i} \in S_{-i}^{n-1} \quad
        u_i(\sigma_i, s_{-i}) \ge u_i(\sigma_i', s_{-i}) \},

and

    S_i^\infty = \bigcap_{n=0}^{\infty} S_i^n,   \Sigma_i^\infty = \bigcap_{n=0}^{\infty} \Sigma_i^n.

Definition 4.2.2 A game is solvable by iterated (strict) dominance if, for each player i, S_i^\infty is a singleton, i.e., S_i^\infty = \{s_i^*\}.
In this case, the strategy profile (s_1^*, s_2^*, ..., s_I^*) is a (unique)
Nash equilibrium.
Proof: Suppose that it is not a Nash equilibrium: that is, for
some i,

    s_i^* \notin BR_i(s_{-i}^*).

Thus

    \exists s_i \in S_i \quad u_i(s_i, s_{-i}^*) > u_i(s_i^*, s_{-i}^*).

But suppose s_i was eliminated in round n. Then

    \exists s_i' \in S_i^{n-1} \ \forall s_{-i} \in S_{-i}^{n-1} \quad u_i(s_i', s_{-i}) > u_i(s_i, s_{-i}).

Since s_{-i}^* \in S_{-i}^\infty \subseteq S_{-i}^{n-1}, we have u_i(s_i', s_{-i}^*) > u_i(s_i, s_{-i}^*). Repeating in
this fashion we get a sequence of inequalities:

    u_i(s_i^*, s_{-i}^*) > \cdots > u_i(s_i'', s_{-i}^*) > u_i(s_i', s_{-i}^*) > u_i(s_i, s_{-i}^*) > u_i(s_i^*, s_{-i}^*),

resulting in a contradiction.

4.3 Rationalizability

This notion is due to Bernheim (1984), Pearce (1984) and Aumann (1987) and provides a complementary approach to iterated
strict dominance. This approach tries to answer the following
question:

"What are all the strategies that a rational player
can play?"

A rational player will only play those strategies that are best
responses to some beliefs he has about his rivals' strategies.

Definition 4.3.1 (Rationalizable Strategies) Let

    \tilde\Sigma_i^0 = \Sigma_i.

For n > 0, let

    \tilde\Sigma_i^n = \{ \sigma_i \in \tilde\Sigma_i^{n-1} \mid
        \exists \sigma_{-i} \in \prod_{j \ne i} Conv(\tilde\Sigma_j^{n-1}) \
        \forall \sigma_i' \in \tilde\Sigma_i^{n-1} \quad
        u_i(\sigma_i, \sigma_{-i}) \ge u_i(\sigma_i', \sigma_{-i}) \}.

The rationalizable strategies for player i are

    R_i = \bigcap_{n=0}^{\infty} \tilde\Sigma_i^n.

A strategy profile \sigma is rationalizable if \sigma_i is rationalizable for
each player i. Let \sigma^* = (\sigma_1^*, \sigma_2^*, ..., \sigma_I^*) be a Nash equilibrium.
Note first that \sigma_i^* \in \tilde\Sigma_i^0 for all i. Next assume that \sigma^* \in \prod_i \tilde\Sigma_i^{n-1}.
Thus \sigma_i^* \in \tilde\Sigma_i^{n-1}, and \sigma_{-i}^* \in \prod_{j \ne i} \tilde\Sigma_j^{n-1}. Hence,

    \forall \sigma_i' \in \Sigma_i \quad u_i(\sigma_i^*, \sigma_{-i}^*) \ge u_i(\sigma_i', \sigma_{-i}^*)
        \ \Rightarrow \ \sigma_i^* \in \tilde\Sigma_i^n.

Thus, \sigma^* \in R = \prod_i R_i.
Hence,

Theorem 4.3.1 Every Nash equilibrium is rationalizable.

Section 4.4

31

Beyond Nash

Theorem 4.3.2 (Bernheim/Pearce (1984))

The set of rationalizable strategies is nonempty and contains


at least one pure strategy for each player. Further, each i 2 Ri
is (in i) a best response to an element of j6=i Conv(Rj ).

Comparing the constructions of undominated strategies with


rationalizable strategies, we note that
0i = i; and ~ 0i = i:
In the nth iteration, the undominated strategies are constructed
as
(
n
i = i 2 ni 1 j
)
0
8i02ni 1 9 i2j6=i Conv(nj 1 ) ui(i;  i)  ui(i;  i) ;
where as rationalizable strategies are constructed as
(
n
~ i = i 2 ~ ni 1 j

9

i 2j6=i Conv(~ nj 1 )

8i02~ ni 1 ui(i; 

0
i )  ui (i ;  i )

Finally,
1
1
\
n ; 1 =  1 ; and R = \ 
~ ni; R = iRi:

1
=
i i
i
i
i
n=0

n=0

A direct examination of these constructions reveals that ~ ni 


ni and hence, R  1 . Also, note that the undominated strategies are computing the minmax values where as the rationalizable
strategies compute maxmin values.

4.4 Correlated Equilibrium


Aumann's Example
c Mishra 1998

32

Beyond Nash

Chapter 4

L R
U 5,1 0,0
D 4,4 1,5
There are 3 Nash equilibria:

- A pure strategy: (U, L), with pay-offs (5, 1),
- A pure strategy: (D, R), with pay-offs (1, 5), and
- A mixed strategy: ((1/2, 1/2), (1/2, 1/2)), with pay-offs (2.5, 2.5).

Suppose that there is a publicly observable random variable


with Pr(H ) = Pr(T ) = 1=2. Let the players play (U, L) if the
outcome is H, and (D, R) if the outcome is T. Then the pay-o
is (3, 3).
By using publicly observable random variables, the players
can obtain any pay-o vector in the convex hull of the set of
Nash equilibria pay-o s.
Players can improve (without any prior contracts) if they can
build a device that sends different but correlated signals to each
of them.

4.4.1 Formal Definitions

- "Expanded games" with a correlating device.
- Nash equilibrium for the expanded game.

Definition 4.4.1 A correlating device is a triple

    (\Omega, \{H_i\}_{i \in I}, p).

- \Omega = a (finite) state space corresponding to the outcomes
of the device.
- p = probability measure on the state space \Omega.
- H_i = information partition for player i.
It assigns an h_i(\omega) to each \omega \in \Omega such that \omega \in h_i(\omega):

    h_i : \Omega \to H_i : \omega \mapsto h_i(\omega).

Player i's posterior beliefs about \Omega are given by Bayes' law:

    \forall \omega \in h_i \quad p(\omega \mid h_i) = \frac{p(\omega)}{p(h_i)}.

4.4.2 Pure Strategies for the Expanded Game

Given a correlating device (\Omega, \{H_i\}, p), we can define strategies
for the expanded game as follows: Consider a map

    \delta_i : \Omega \to S_i : \omega \mapsto \delta_i(\omega),

such that \delta_i(\omega) = \delta_i(\omega') if \omega' \in h_i(\omega).
The strategies are adapted to the information structure.

Definition 4.4.2 DEF(1) A correlated equilibrium \delta relative
to the information structure (\Omega, \{H_i\}, p) is a Nash equilibrium in
strategies that are adapted to the information structure. That is,
(\delta_1, \delta_2, ..., \delta_I) is a correlated equilibrium if

    \forall i \ \forall \tilde\delta_i \quad
    \sum_{\omega \in \Omega} p(\omega) u_i(\delta_i(\omega), \delta_{-i}(\omega))
        \ge \sum_{\omega \in \Omega} p(\omega) u_i(\tilde\delta_i(\omega), \delta_{-i}(\omega)).

Using Bayes' rule, an equivalent condition would be:

    \forall i \ \forall h_i \in H_i, p(h_i) > 0 \ \forall s_i \in S_i \quad
    \sum_{\omega : h_i(\omega) = h_i} p(\omega \mid h_i) u_i(\delta_i(\omega), \delta_{-i}(\omega))
        \ge \sum_{\omega : h_i(\omega) = h_i} p(\omega \mid h_i) u_i(s_i, \delta_{-i}(\omega)).

4.4.3 Correlated Equilibrium and Universal Device

A "universal device" signals each player how that player
should play.

Definition 4.4.3 DEF(2) A correlated equilibrium is any probability distribution p(.) over the pure strategies S_1 \times S_2 \times \cdots \times S_I
such that, for every player i and every function d_i : S_i \to S_i,

    \sum_{s \in S} p(s) u_i(s_i, s_{-i}) \ge \sum_{s \in S} p(s) u_i(d_i(s_i), s_{-i}).

Using Bayes' rule, an equivalent condition would be:

    \forall i \ \forall s_i \in S_i, p(s_i) > 0 \ \forall s_i' \in S_i \quad
    \sum_{s_{-i} \in S_{-i}} p(s_{-i} \mid s_i) u_i(s_i, s_{-i})
        \ge \sum_{s_{-i} \in S_{-i}} p(s_{-i} \mid s_i) u_i(s_i', s_{-i}).
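DEF(2) is easy to check numerically for a two-player game. The sketch below (my own illustration) verifies the conditional inequalities above for a given joint distribution p; applied to Aumann's example of Section 4.4 with the uniform distribution over {(U,L), (D,L), (D,R)}, it reports a correlated equilibrium.

import itertools
import numpy as np

def is_correlated_equilibrium(U1, U2, p, tol=1e-9):
    # For each player, each recommended strategy s and each deviation s',
    # check the inequality of DEF(2) (scaled by p(s_i), which does not
    # change its direction).
    n_rows, n_cols = p.shape
    for s, s_prime in itertools.product(range(n_rows), repeat=2):
        if (p[s, :] * U1[s_prime, :]).sum() > (p[s, :] * U1[s, :]).sum() + tol:
            return False
    for s, s_prime in itertools.product(range(n_cols), repeat=2):
        if (p[:, s] * U2[:, s_prime]).sum() > (p[:, s] * U2[:, s]).sum() + tol:
            return False
    return True

U1 = np.array([[5.0, 0.0], [4.0, 1.0]])   # row pay-offs of Aumann's example
U2 = np.array([[1.0, 0.0], [4.0, 5.0]])   # column pay-offs
p = np.array([[1/3, 0.0], [1/3, 1/3]])    # uniform over (U,L), (D,L), (D,R)
print(is_correlated_equilibrium(U1, U2, p))   # True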

Equivalence of correlated equilibria under Def(1) and Def(2):


Claim:
Def(1) ( Def(2):
Choose
= S . hi(s) = fs0js0i = sig. Leave p(s) unchanged.

Claim:
Def(1) ) Def(2):

Let  be an equilibrium w.r.t. (


; fHig; p~). De ne
X
p(s) = fp~(!)j1(!) = s1; : : : ; I (!) = sI ; ! 2
g:

Let
Thus

Ji(si) = f!ji(!) = si g:

p~(Ji(si)) = p(si ) = probability that player i is told to play si:


c Mishra 1998

Section 4.4

Adaptive Learning

35

p~(!)  (!):
i
!2Ji (si ) p~(Ji (si ))
It is the mixed strategy of the rivals that player i believes he
faces, conditional on being told to play si, and it is a convex
combination of the distributions conditional on each hi such that
i(hi) = si.

c Mishra 1998

Chapter 5
Adaptive and Sophisticated
Learning
5.1 Adaptive and Sophisticated Learning
The idea of best-reply dynamics goes back all the way to Cournot's
study of duopoly; it also forms the foundation of Walrasian equilibrium in economics, as computed by the classical tatonnement
learning process.
The underlying learning processes can be categorized into
successively stronger versions:

- Best-Reply Dynamics: However, it is also known that
these dynamics can lead to non-convergent, cyclic behavior. In
this model, an outsider with no information about the utilities (pay-offs) of the agents could eventually predict the
behavior of the agents more accurately than they themselves.

- Fictitious-Play Dynamics: The agents choose strategies that are best replies to a prediction of the competitors' play at the next round, where the predicted probability distribution is the empirical distribution of the past plays. Even
these dynamics can lead (if there is no zero-sum restriction)
to cycles of exponentially increasing length.

 Stationary Bayesian Learning Dynamics: The agents

choose strategies as functions from the information set


(empirical distribution of the past plays) without relying
on any intermediate prediction. The distribution over the
strategies changes as the empirical distribution changes.
(Reactive Learning: involves no model building.)
The dynamics may converge|but to a (mixed) strategy
pro le that is not necessarily the perfect (Nash) equilibrium.

5.2 Set-up
Player n plays a sequence of plays: {x_n(t)}. Each x_n(t) is a
pure strategy and is chosen by the rules of player n's learning
algorithm. We are interested in two properties that may be
satisfied by {x_n(t)}: if it is approximately best-reply dynamics,
then it is consistent with adaptive learning; if it is approximately
fictitious-play dynamics, then it is consistent with sophisticated
learning.

De nition 5.2.1 fxn(t)g is consistent with adaptive learning. Player n eventually chooses only strategies that are nearly

best replies to some probability distribution over his rivals joint


strategy pro les, where near zero probabilities are assigned to
strategies that have not been played for suciently long time.

De nition 5.2.2 fxn(t)g is consistent with sophisticated


learning. Player n eventually chooses only nearly best replies to

his probabilistic forecast of rivals' joint strategy pro les, where


the support of probability may include not only past plays but
also strategies that the rivals may choose if they themselves were
adaptive or sophisticated learners.
c Mishra 1998

38

Adaptive Learning

Chapter 5

We will look at the e ect of these algorithms on nite player


games, with compact strategies and continuous pay-o functions .
Note that these assumptions are consistent with the usual
model of exchange economy with in nitely divisible goods. Note
that in this model, serially undominated set is a singleton and
thus the Walrasian equilibrium. One of the main results that we
will see is that in any process, consistent with adaptive learning,
play tends towards the serially undominated set and hence, in an
exchange economy, adaptive learning would lead to equilibrium.

5.3 Formulation

Definition 5.3.1 Noncooperative game

    \Gamma = (N, (S_n, n \in N), \pi).

N = finite player set.
S_n = player n's strategy set, a compact subset of some normed space.
\pi = pay-off function, assumed continuous.

    S = \prod_{n \in N} S_n,   x \in S \Rightarrow x = (x_n, x_{-n}),

where x_{-n} is the strategy choice of n's rivals.

    \pi : S \to R^{|N|} = pay-off function, continuous;
    \pi_n : S \to R : (x_n, x_{-n}) \mapsto \pi_n(x).

Let T be a set. Then \Delta(T) = set of all probability distributions over T.
\Delta(S_n) = mixed strategies on S_n.  \Delta_{-n} = \prod_{j \ne n} \Delta(S_j) =
mixed strategies of n's rivals.

Definition 5.3.2 A strategy x_n \in S_n is \epsilon-dominated by another
strategy \tilde{x}_n \in \Delta(S_n) if

    \forall z_{-n} \in S_{-n} \quad \pi_n(x_n, z_{-n}) + \epsilon < \pi_n(\tilde{x}_n, z_{-n}).

If, for all \epsilon, x_n is \epsilon-dominated by \tilde{x}_n, then x_n is dominated by \tilde{x}_n
(in the classical sense).

Let T \subseteq S. Define T_n \equiv T|_{S_n} = the projection of T onto S_n, and
T_{-n} = \prod_{j \ne n} T_j.

Definition 5.3.3 Given T \subseteq S, let

    U_n^\epsilon(T) = \{ x_n \in S_n : \forall y_n \in \Delta(S_n) \ \exists z_{-n} \in T_{-n} \quad
                         \pi_n(x_n, z_{-n}) + \epsilon \ge \pi_n(y_n, z_{-n}) \},

    U^\epsilon(T) = \prod_{n \in N} U_n^\epsilon(T).

U_n^\epsilon(T) = the pure strategies in S_n that are not \epsilon-dominated when
n's rivals are limited to T_{-n}.

Fact 1

The operator U^\epsilon is monotonic. Let R and T be sets of strategy
profiles:

    R \subseteq T \Rightarrow U^\epsilon(R) \subseteq U^\epsilon(T).

Fact 2

    \exists T \quad U^\epsilon(T) \not\subseteq T.

In general, starting with some arbitrary set of strategy profiles T,
one may not be able to create a monotonically descending chain
of sets of strategy profiles:

    T \supseteq U^\epsilon(T) \supseteq U^{\epsilon,2}(T) \supseteq \cdots \supseteq U^{\epsilon,k}(T) \supseteq U^{\epsilon,k+1}(T) \supseteq \cdots

Fact 3

However, S \supseteq U^\epsilon(S): since S is the whole space, nothing new can be
introduced.
By the monotonicity of U^\epsilon, we see that if

    U^{\epsilon,k}(T) \supseteq U^{\epsilon,k+1}(T),

then

    U^\epsilon(U^{\epsilon,k}(T)) \supseteq U^\epsilon(U^{\epsilon,k+1}(T)),

and

    U^{\epsilon,k+1}(T) \supseteq U^{\epsilon,k+2}(T).

Putting it all together, we do have

    S \supseteq U^\epsilon(S) \supseteq U^{\epsilon,2}(S) \supseteq \cdots \supseteq U^{\epsilon,k}(S) \supseteq U^{\epsilon,k+1}(S) \supseteq \cdots

We then define

    U^{\epsilon,\infty}(S) = \bigcap_{k=0}^{\infty} U^{\epsilon,k}(S).

Hence, U^{0,\infty}(S) = \lim_{\epsilon \to 0} U^{\epsilon,\infty}(S) = the serially undominated strategy set. We say x is serially undominated if x \in U^{0,\infty}(S).
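For a finite game the iteration S \supseteq U^\epsilon(S) \supseteq U^{\epsilon,2}(S) \supseteq \cdots can be computed directly. The sketch below (my own illustration) simplifies Definition 5.3.3 in one respect: it tests \epsilon-domination only against pure strategies y_n rather than all mixed strategies in \Delta(S_n), which is enough for small examples such as the 3x3 game of Section 2.4.

import numpy as np

def U_eps(pi1, pi2, rows, cols, eps=0.0):
    # Keep a pure strategy if no (pure) rival-of-itself strategy eps-dominates it
    # against every profile of the opponents currently in (rows, cols).
    new_rows = [r for r in rows
                if all(any(pi1[r, c] + eps >= pi1[y, c] for c in cols)
                       for y in range(pi1.shape[0]))]
    new_cols = [c for c in cols
                if all(any(pi2[r, c] + eps >= pi2[r, y] for r in rows)
                       for y in range(pi2.shape[1]))]
    return new_rows, new_cols

def serially_undominated(pi1, pi2, eps=0.0):
    rows, cols = list(range(pi1.shape[0])), list(range(pi1.shape[1]))
    while True:
        new_rows, new_cols = U_eps(pi1, pi2, rows, cols, eps)
        if (new_rows, new_cols) == (rows, cols):
            return rows, cols
        rows, cols = new_rows, new_cols

pi1 = np.array([[4, 5, 6], [2, 8, 3], [3, 9, 2]], dtype=float)   # Section 2.4 game
pi2 = np.array([[3, 1, 2], [1, 4, 6], [0, 6, 8]], dtype=float)
print(serially_undominated(pi1, pi2))   # ([0], [0]), i.e., only (U, L) survives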

Definition 5.3.4 A sequence of strategies {x_n(t)} is consistent
with adaptive learning by player n if

    \forall \epsilon > 0 \ \forall \hat{t} \ \exists t^* \ \forall t \ge t^* \quad
    x_n(t) \in U_n^\epsilon\left( \{ x(s) : \hat{t} \le s < t \} \right).

A sequence of strategy profiles {x(t)} is consistent with adaptive learning if each {x_n(t)} has this property.

5.4 Looking Forward

    F^{\epsilon,0}(\hat{t}, t) = U^\epsilon\left( \{ x(s) : \hat{t} \le s < t \} \right),

and, for all k \ge 1,

    F^{\epsilon,k}(\hat{t}, t) = U^\epsilon\left( F^{\epsilon,k-1}(\hat{t}, t) \cup \{ x(s) : \hat{t} \le s < t \} \right).

Section 5.4

41

Learning: MR

Lemma 5.4.1
F ;0(t^; t)  F ;1(t^; t)      F ;k(t^; t)  F ;k+1(t^; t)    
Proof
By the monotonicity of U  ,
F ;0(t^; t)  F ;1(t^; t):
Assume by inductive hypothesis,
F ;k 1(t^; t)  F ;k (t^; t):
Then

F ;k 1(t^; t) [ fx(s) : ^t  s < tg


 F ;k(t^; t) [ fx(s) : t^  s < tg:
By the monotonicity of U  ,

U


Thus

F ;k 1(t^; t)
U

[ fx(s) : t^  s < tg

F ;k (t^; t)

!
^
[ fx(s) : t  s < tg :

F ;k (t^; t)  F ;k+1(t^; t):

De nition 5.4.1 A sequence of strategies fxn(t)g is consistent


with sophisticated learning by player n if

8>0 8^t 9t 8tt xn(t) 2 Un (F 1(t^; t)):


A sequence of strategy pro les fx(t)g is consistent with sophisticated learning if each fxn (t)g has this property.
8>0 8^t 9t 8tt x(t) 2 F 1(t^; t):
c Mishra 1998

Chapter 6
Learning a la Milgrom and
Roberts
6.1 Adaptive Learning and Undominated
Sets
Example: Battle of Sexes

W\M             Ballet (B)   Football (F)
Ballet (B)      2,1          0,0
Football (F)    0,0          1,2

Let {x(t)} be a sequence of strategy profiles. We show that
x(t) = (F, B) for all t is consistent with sophisticated learning.

    \forall \hat{t} \quad \{ x(s) \mid \hat{t} \le s < t \} = \{ (F, B) \}.

Thus, we have

    F_W^{\epsilon,0}(\hat{t}, t) = U_W^\epsilon( \{ (F, B) \} ) = B,
    F_M^{\epsilon,0}(\hat{t}, t) = U_M^\epsilon( \{ (F, B) \} ) = F.

Thus

    F^{\epsilon,0}(\hat{t}, t) = \{ (B, F) \}.

Similarly,

    F^{\epsilon,1}(\hat{t}, t) = U^\epsilon( \{ (B, F), (F, B) \} ) = \{B, F\} \times \{B, F\}.

Continuing in this fashion, we get

    F^{\epsilon,\infty}(\hat{t}, t) = \{B, F\} \times \{B, F\}.

Thus

    x(t+1) = (F, B) \in F^{\epsilon,\infty}(\hat{t}, t),

and the sequence is consistent with sophisticated learning.

6.2 Convergence

De nition 6.2.1 A sequence of strategy pro les fx(t)g con-

verges omitting correlation to a correlated strategy pro le

G 2 (S )
if (1) and (2) hold:
1. Gtn converges weakly to the marginal distribution Gn for
all n.
2.

8>0 9t 8tt 8n2N d[xn(t); supp(Gn )] < ;


De ne d[x; T ]  inf y2T kx yk.

The sequence converges to the correlated strategy G 2 (S )


if in addition
Gt converges weakly to G.

De nition 6.2.2 A sequence fx(t)g converges omitting correlation to a mixed strategy Nash equilibrium if

c Mishra 1998

44

Learning: MR

Chapter 6

1. It replicates the empirical frequency of the separate mixed


strategies and
2. It eventually plays only pure strategies that are in or near
the support of the equilibrium mixed strategies.

Theorem 6.2.1 If fx(t)g converges omitting correlation to a


correlated equilibrium in the game , then fx(t)g is consistent
with adaptive learning.
Proof Sketch:
Gt converges to a correlated equilibrium G.
) Gn consists of best responses to G n
) For suciently large t, xn(t) is within  of Gn
) Since Sn is compact and  is continuous

8yn2Gn 9z
)

n 2G n

9>0 n(xn(t); z n) +   n(yn; z n )

xn (t) 2 Un

!
fx(s) j t^  s < tg :

Theorem 6.2.2 Suppose that the sequence fx(t)g is consistent

with adaptive learning and that it converges to x . Then x is a


pure strategy Nash equilibrium.
Proof Sketch:
Assume that x is not a Nash equilibrium
) 9n2N 8>0 fxng =6 U (fxg):
) Player n must play x0n 6= xn i.o.
) xn(t) does not converge to xn
) Contradiction.

Theorem 6.2.3 Let fx(t)g be consistent with sophisticated learning. Then for each  > 0 and k 2 N there exists a time tk after
which (i.e., for t  tk )
x(t) 2 U k (S ):
Proof Sketch:
Fix  > 0. De ne tk  tk (Change in notation).
c Mishra 1998

Section 6.2

45

Learning: MR

Case k = 0: t0 = 0. x(t) 2 U  (S ).
Case k = j + 1: By the inductive hypothesis there exists a

tj such that

8ttj x(t) 2 U j (S ):

Hence

fx(s) j tj  s < tg  U j (S ):

Since fx(t) is consistent with sophisticated learning, we can


choose
t^ = tj ; tj+1 = max(t^; t):
Then
8ttj+1 x(t) 2 F 1(tj ; t):

Claim:

F 1(tj ; t)  U ;j+1 (S ):
Equivalently,

8i F i(tj ; t)  U ;j+1 (S ):

It then follows that

F 0(tj ; t)

F ;i+1(tj ; t)

U

fx(s) j tj  s < tg

U

U ;j (S )

= U ;j+1(S ):

[ fx(s) j tj  s < tg
 U  (U ;j+1(S ) [ U ;j (S ))
=

U

F ;i(tj ; t)

= U  (U ;j (S )) = U ;j+1 (S ):

\\
k >0

U k (S ) =

\ 0k
U (S ) = U 01(S ):
k

c Mishra 1998

46

Learning: MR

Chapter 6

Theorem 6.2.4 Let fx(t)g be consistent with sophisticated learn-

ing and Sn1 be the set of strategies that are played in nitely often
in fxn(t)g. Then
\ \ k
U (S ) = U 01(S )
S 1 = n2N Sn1 
k >0

Corollary 6.2.5 In particular, for any nite game , all play

lies eventually in the set of serially undominated strategies U 01 (S ).


Theorem 6.2.6 Suppose U 01(S ) = fxg.
kx(t) xk ! 0
i fx(t)g is consistent with adaptive learning.
Proof Sketch:
())

Since  is continuous,

kx(t) xk ! 0:

8>0 9t 8t>t 8n2N n(xn(t); x n(t)) maxfn (yn; x n(t))jyn 2 Sng
< [n(x) + =2] [maxfn(yn; x n)jyn 2 Sn g =2]
= :

xn(t) 2 Un (fx(t)g)

Un

!

fx(s)jt  s < tg :

) fx(t)g is consistent with adaptive learning.


(()
Let x = accumulation point of fx(t)g.
8k 9t 8t>t x(t) 2 U k (S ):

x 2

1
\\

U ;k (S )

>0 k=1
\
\ ;k
U (S ) = U 01(S ) = fxg
=

)
c Mishra 1998

k >0

kx(t) xk ! 0:

Section 6.3

47

Learning: MR

Theorem 6.2.7 Suppose U 01(S ) = fxg.


kx(t) xk ! 0
i x(t) is consistent with sophisticated learning.

6.3 Stochastic Learning Processes


We now allow the players to experiment, as we will now assume
that each player may not know his own pay-off function. See
Fudenberg & Kreps (1988).
The game consists of alternations among
- Exploration: every strategy is experimented with, with equal probability.
- Exploitation: good strategies, based on exploration, are played.
At each date t, player n conducts an experiment with probability \epsilon_{n,t} in an attempt to learn its best play.
1. Independence: the decision to experiment is independent of
other players' decisions.
2. Rare: \epsilon_{n,t} \to 0 as t \to \infty.
3. Infinitely Often: \sum_t \epsilon_{n,t} = \infty.

ft(k; !)g = Subsequence of dates at which player n conducts


no experiment.
! = Realization of the players' randomized choices.
Thus the interval [0; t] consists of experiment dates Pn (xn ; t)
and play dates Mn (xn; t). Write M (t) to denote the expected
total number of experiments.
 (xn; t) = Total Pay o received with Mn(xn; t)
 (yn; t) = Total Pay o received with Mn(yn ; t)
c Mishra 1998

48

Information Theory

Chapter 6

Claim:
Let T = set of strategy pro les.

8z2T n(xn; z n) < n(yn; z n ) 2


) For large t
(xn; t) < (yn; t) M (t)=jSnj:

E [(xn;  + 1) (yn;  + 1)jT ] (xn;  ) + (yn;  )


= n; +1=jSnj E [n(xn; x n( + 1)) n (yn; x n( + 1))]
< 2  n; +1=jSn j:
Taking expectations

E [(xn;  + 1) (yn;  + 1)jT ]


= E [(xn;  ) (yn;  )jT ] 2  n; +1=jSn j:
and then telescoping,

E [(xn; t) (yn; t)] <

2=jSnj

n;t = 2M (t)=jSnj:

Let  > 2jnj.


Var[(xn; t) (yn; t)] 

22M (t)=jSnj:

Thus (xn; t) (yn; t)] + M (t)=jSnj converges to 1 and


hence represents a super-martingale.
In other words, xn dominates yn then the player n will discover this fact eventually by repeated experiments.

Theorem 6.3.1 For any nite strategy game , the sequence


fxn(t(k; !))g constructed as described above is consistent with
adaptive learning a.s.(almost surely).

c Mishra 1998

Chapter 7
Information Theory and
Learning
7.1 Information Theory and Games
7.1.1 Basic Concepts

De nition 7.1.1 Entropy is a measure of uncertainty of a random variable. Let X be a discrete random variable with alphabet
X.
p(x) = Pr[X = x]; where x 2 X :
The entropy H (X ) of the discrete random variable X is de ned as
H (X ) = E p lg p(1X )
X
=
p(x) lg p(x):
x2X

Facts
1. H (X )  0. Entropy is always nonnegative. 0  p(x)  1;
lg p(x)  0. Hence, Ep lg(1=p(x))  0.)
2. H (X )  lg jXj. Consider the
uniform distribution u(x).
P
8x2X u(x) = 1=jXj. H (u) = x(1=jXj) lg jXj = lg jXj.
49

50

Information Theory

Chapter 7

3. H (X ) = Average number of bits required to encode the


discrete random variable X .

7.2 Joint & Conditional Entropy

(X, Y) = a pair of discrete random variables with joint distribution p(x, y).

Joint entropy:

    H(X, Y) = E_p\left[ \lg \frac{1}{p(X, Y)} \right] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \lg p(x, y).

Conditional entropy:

    H(Y \mid X) = E_p\left[ \lg \frac{1}{p(Y \mid X)} \right]
               = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \lg p(y \mid x)
               = -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y \mid x) \lg p(y \mid x)
               = \sum_{x \in \mathcal{X}} p(x) H(Y \mid X = x).

7.2.1 Chain Rule

    p(X, Y) = p(X) \, p(Y \mid X)   (Bayes' rule)
    \Rightarrow \lg \frac{1}{p(X, Y)} = \lg \frac{1}{p(X)} + \lg \frac{1}{p(Y \mid X)}
    \Rightarrow E_p \lg \frac{1}{p(X, Y)} = E_p \lg \frac{1}{p(X)} + E_p \lg \frac{1}{p(Y \mid X)}   (linearity of expectation)
    \Rightarrow H(X, Y) = H(X) + H(Y \mid X).

Corollary 7.2.1

1. H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z).
2. H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)
   \Rightarrow H(X) - H(X \mid Y) = H(Y) - H(Y \mid X).
3. Note that, in general, H(X \mid Y) \ne H(Y \mid X).

7.3 Relative Entropy & Mutual Information

Definition 7.3.1 Relative Entropy (also the Kullback-Leibler
distance) between two probability mass functions p(x) and q(x):

    D(p \| q) = E_p\left[ \lg \frac{p(x)}{q(x)} \right] = \sum_x p(x) \lg \frac{p(x)}{q(x)}.

Note that D(p \| p) = 0. If u(x) = 1/|\mathcal{X}| for all x, then

    D(p \| u) = \sum_x \left( p(x) \lg p(x) + p(x) \lg |\mathcal{X}| \right) = \lg |\mathcal{X}| - H(X).

Definition 7.3.2 Mutual Information
Let X and Y be two discrete random variables with joint probability mass function p(x, y) and marginal probability mass
functions

    p(x) = \sum_{y \in \mathcal{Y}} p(x, y)   and   p(y) = \sum_{x \in \mathcal{X}} p(x, y).

The mutual information is

    I(X; Y) = D\left( p(x, y) \,\|\, p(x) p(y) \right)
            = E_{p(x,y)}\left[ \lg \frac{p(x, y)}{p(x) p(y)} \right]
            = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \lg \frac{p(x, y)}{p(x) p(y)}
            = H(X) + H(Y) - H(X, Y)
            = H(X) + H(Y) - \left( H(Y) + H(X \mid Y) \right)
            = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = I(Y; X).

In summary,

    H(X) - H(X \mid Y) = I(X; Y) = H(Y) - H(Y \mid X) = I(Y; X),
    I(X; X) = H(X) - H(X \mid X) = H(X),
    I(X; Y) = I(Y; X) = H(X) + H(Y) - H(X, Y).
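These identities are straightforward to verify numerically. A minimal sketch (my own; the joint distribution is an arbitrary example) computing H(X), H(X|Y) and I(X;Y) with base-2 logarithms:

import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(pxy):
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

pxy = np.array([[0.30, 0.10],
                [0.10, 0.50]])                  # joint p(x, y)
px = pxy.sum(axis=1)
H_X, H_XY = entropy(px), entropy(pxy.ravel())
H_X_given_Y = H_XY - entropy(pxy.sum(axis=0))   # H(X|Y) = H(X,Y) - H(Y)
print(H_X, H_X_given_Y, mutual_information(pxy))
# I(X;Y) = H(X) - H(X|Y) holds up to floating point error.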

7.4 Chain Rules for Entropy, Relative


Entropy and Mutual Information
H (X1 ; X2; : : :; Xn )
= H (X1 ) + H (X2jX1) +   
+ H (Xn jX1; : : : ; Xn 1 )
n
X
H (Xi jX1; : : : ; Xi 1):
=
i=1

I (X1; X2; : : :; Xn ; Y )
= H (X1 ; : : :; Xn ) H (X1; : : :; Xn jY )
n
n
X
X
H (Xi jX1; : : :; Xi ; Y )
H (Xi jX1; : : : ; Xi)
=
i=1
i=1
n
X
H (Xi jX1; : : : ; Xi) H (Xi jX1; : : :; Xi ; Y )
=
i=1
n
X
I (Xi; Y jX1; : : : ; Xi 1):
=
i=1

c Mishra 1998

Section 7.5

53

Information Theory

!
D p(x; y) k q(x; y)
XX
y)
=
p(x; y) lg pq((x;
x; y)
x y
XX
=
p(x; y) lg pq((xx)) pq((yyjjxx))
x y
XX
XX
=
p(x; y) lg pq((xx)) +
p(x; y) lg pq((yyjjxx))
x y
x y
X
X
=
p(x) lg pq((xx)) + p(yjx) lg pq((yyjjxx))
x
! y
!
= D p(x) k q(x) + D p(yjx) k q(yjx) :

7.5 Information Inequality

    -D(p \| q) = \sum_x p(x) \lg \frac{q(x)}{p(x)}
              \le \lg \sum_x p(x) \frac{q(x)}{p(x)}        (lg is a concave function; Jensen)
              \le \lg \sum_x q(x) = \lg 1 = 0.

Theorem 7.5.1 D(p \| q) \ge 0, with equality iff p(x) = q(x) for
all x.

Corollary 7.5.2

    I(X; Y) = D\left( p(x, y) \,\|\, p(x) p(y) \right) \ge 0,

with equality iff X and Y are independent, i.e., p(x, y) = p(x) p(y)
for all x and y.

Let u(x) = 1/|\mathcal{X}|. Then

    D(p \| u) = \lg |\mathcal{X}| - H(X) \ge 0.

Hence,

    H(X) \le \lg |\mathcal{X}|,

with equality iff X has a uniform distribution over \mathcal{X}. Also,

    I(X; Y) = H(X) - H(X \mid Y) \ge 0.

Theorem 7.5.3

    H(X \mid Y) \le H(X).

Conditioning reduces entropy. Consequently,

    H(X_1, ..., X_n) = \sum_{i=1}^{n} H(X_i \mid X_1, ..., X_{i-1}) \le \sum_{i=1}^{n} H(X_i).

Corollary 7.5.4

    H(X_1, ..., X_n) \le \sum_{i=1}^{n} H(X_i),

with equality iff the X_i's are independent.

7.6 Stationary Markov Process

- Markovian:

    Pr[X_n \mid X_1, ..., X_i] = Pr[X_n \mid X_i],   i \le n.

- Stationary:

    Pr[X_n \mid X_1, ..., X_i] = Pr[X_{n+1} \mid X_2, ..., X_{i+1}].

Then

    H(X_n \mid X_1) \ge H(X_n \mid X_1, X_2)   (conditioning reduces entropy)
                    = H(X_n \mid X_2)          (Markov)
                    = H(X_{n-1} \mid X_1)      (stationary).

2nd Law of Thermodynamics

Theorem 7.6.1 Conditional entropy H(X_n \mid X_1) increases with
time n for a stationary Markov process.

Relative entropy D(\mu_n \| \mu_n') decreases with time n.

Let \mu_n and \mu_n' be two postulated probability distributions on
the state space of a Markov process. At time n + 1, the distributions change to \mu_{n+1} and \mu_{n+1}', governed by the transition
probabilities r(x_n, x_{n+1}). Writing p for \mu and q for \mu', we have

    p(x_n, x_{n+1}) = p(x_n) r(x_n, x_{n+1}) = p(x_n) p(x_{n+1} \mid x_n),

and similarly,

    q(x_n, x_{n+1}) = q(x_n) r(x_n, x_{n+1}) = q(x_n) q(x_{n+1} \mid x_n).

Thus, by the chain rule for relative entropy,

    D\left( p(x_n, x_{n+1}) \,\|\, q(x_n, x_{n+1}) \right)
        = D\left( p(x_n) \,\|\, q(x_n) \right) + D\left( p(x_{n+1} \mid x_n) \,\|\, q(x_{n+1} \mid x_n) \right)
        = D\left( p(x_n) \,\|\, q(x_n) \right),

since both conditionals equal r(x_n, x_{n+1}). Expanding the other way,

    D\left( p(x_n, x_{n+1}) \,\|\, q(x_n, x_{n+1}) \right)
        = D\left( p(x_{n+1}) \,\|\, q(x_{n+1}) \right) + D\left( p(x_n \mid x_{n+1}) \,\|\, q(x_n \mid x_{n+1}) \right)
        \ge D\left( p(x_{n+1}) \,\|\, q(x_{n+1}) \right).

We conclude that

    D\left( p(x_n) \,\|\, q(x_n) \right) \ge D\left( p(x_{n+1}) \,\|\, q(x_{n+1}) \right).

Thus the relative entropy for this system must decrease:

    D(\mu_1 \| \mu_1') \ge D(\mu_2 \| \mu_2') \ge \cdots \ge D(\mu_n \| \mu_n') \ge D(\mu_{n+1} \| \mu_{n+1}') \ge \cdots \to 0.
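The monotone decrease is easy to observe numerically. The sketch below (my own illustration; the 3-state transition matrix and the two initial distributions are arbitrary) evolves both distributions under the same chain and prints D(\mu_n \| \mu_n'):

import numpy as np

def kl(p, q):
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

R = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])      # row-stochastic transition matrix r(x, x')
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.1, 0.2, 0.7])
for n in range(6):
    print(n, round(kl(p, q), 4))
    p, q = p @ R, q @ R              # mu_{n+1} = mu_n R for both distributions
# The printed divergences are non-increasing, as the argument above shows.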

7.7 Gambling and Entropy

Horse Race

    # horses = m:  \{H_1, H_2, ..., H_m\},
    p_i = Pr[H_i wins],
    u_i = pay-off (odds) if H_i wins.

If b_i = bet on the i-th horse, then the pay-off is

    b_i u_i,  if H_i wins (with probability p_i);
    0,        if H_i loses (with probability 1 - p_i).

Assume that the gambler has 1 dollar. Let b_i = fraction of
his wealth invested in H_i. Thus

    0 \le b_i \le 1,   \sum_{i=1}^{m} b_i = 1.

Note that the gambler's pay-off is b_i u_i if H_i wins (with probability p_i):

    S(X) = b(X) u(X)
         = factor by which the gambler increases his wealth if X wins.

Repeated game with reinvestment:

    S_0 = 1,   S_n = S_{n-1} S(X_n)   if X_n wins the n-th game.

Thus

    S_n = \prod_{i=1}^{n} S(X_i) = 2^{\sum_i \lg S(X_i)}.

Let

    E_p[\lg S(X)] = \sum_k p_k \lg(b_k u_k) = W(b, p) = the doubling rate,

where b = the betting strategy. Then

    \frac{1}{n} \lg S_n \to E_p[\lg S(X)]   in probability,

by the "Law of Large Numbers." Hence

    S_n \approx 2^{n W(b, p)}.

Definition 7.7.1 Doubling Rate

    W(b, p) = \sum_{k=1}^{m} p_k \lg(b_k u_k).

Theorem 7.7.1 Let the race outcomes X_1, ..., X_n be i.i.d.
~ p(x). Then the wealth of the gambler using betting strategy b
grows exponentially at rate W(b, p), i.e.,

    S_n \approx 2^{n W(b, p)}.

    W(b, p) = \sum_k p_k \lg(b_k u_k)
            = \sum_k p_k \left[ \lg \frac{b_k}{p_k} - \lg \frac{1}{p_k} + \lg u_k \right]
            = \sum_k p_k \lg u_k - H(p) - D(p \| b)
            \le \sum_k p_k \lg u_k - H(p),

with equality iff p = b.
The optimal doubling rate is

    W^*(p) = \max_b W(b, p) = W(p, p) = \sum_k p_k \lg u_k - H(p).

Theorem 7.7.2 Proportional gambling is log-optimal.
The optimum doubling rate is given by

    W^*(p) = W(p, p) = \sum_k p_k \lg u_k - H(p),

and is achieved by the proportional gambling scheme, b = p.
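A small numerical check of log-optimality (my own sketch; the win probabilities and odds are arbitrary illustrative numbers): random alternative betting strategies never beat the proportional gambler's doubling rate.

import numpy as np

def doubling_rate(b, p, u):
    # W(b, p) = sum_k p_k lg(b_k u_k)
    return float((p * np.log2(b * u)).sum())

p = np.array([0.5, 0.3, 0.2])              # win probabilities
u = np.array([2.0, 4.0, 8.0])              # odds (u_k-for-1)
W_star = doubling_rate(p, p, u)            # proportional gambling, b = p
for _ in range(1000):
    b = np.random.dirichlet(np.ones(3))    # random alternative betting strategy
    assert doubling_rate(b, p, u) <= W_star + 1e-12
print("W*(p) =", W_star, "=", (p * np.log2(u)).sum() - (-(p * np.log2(p)).sum()))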

Define r_k = 1/u_k = the bookie's estimate of the win "probabilities." Assume

    \sum_k r_k = \sum_k \frac{1}{u_k} = 1:

the odds are fair and there is no track take. Then

    W(b, p) = \sum_k p_k \lg \frac{b_k}{r_k}
            = \sum_k p_k \left[ \lg \frac{b_k}{p_k} + \lg \frac{p_k}{r_k} \right]
            = D(p \| r) - D(p \| b).

Doubling rate = difference between the distance of the bookie's
estimate from the true distribution and the distance of the gambler's estimate from the true distribution.

Special Case: Odds are m-for-1 on each horse:

    \forall k \quad r_k = \frac{1}{m}.

Thus,

    W(b, p) = D(p \| u) - D(p \| b),   and   W^*(p) = D(p \| u) = \lg m - H(p).

Theorem 7.7.3 Conservation Theorem

    W^*(p) + H(p) = \lg m

for uniform odds.

Low-Entropy Races are Most Pro table.


Case of a not fully invested gambler.

b0 = wealth held out as cash


bi = proportional bet on Hi:
m
X
b i = 1:
b0  0; bi  0;
Thus

i=0

S (X ) = b0 + b(X )u(X ):
P
 Fair Odds: u1i = 1.
If there is a non-fully-invested strategy with b0, b1, : : :, bm,
then there is also a full investment as follows
b00 = 0
b0i = bi + bu0 ; 1  i  m
i
m
m
m
X
X
X
bi + b0 u1 = 1:
b0i =
i=1 i
i=1
i=0
Thus
S (X ) = b0(X )u(X ) = u(bX0 ) u(X ) + b(X )u(X )
= b0 + b(X )u(X ):
Thus in this case there is a risk-neutral investment.
c Mishra 1998

60

Information Theory

Chapter 7

• Super-Fair Odds: ∑_i 1/u_i < 1.
Here there is a "Dutch Book" betting strategy:
    b_0 = 1 − ∑_i 1/u_i,    b_i = 1/u_i,   1 ≤ i ≤ m.
Thus
    S(X) = 1 − ∑_i 1/u_i + (1/u(X)) u(X) = 2 − ∑_i 1/u_i > 1,
with no risk! This, however, implies a strong arbitrage opportunity.

• Sub-Fair Odds: ∑_i 1/u_i > 1.
In this case proportional gambling is no longer log-optimal, and the race represents a risky undertaking for the gambler.
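A tiny sketch of the Dutch-book computation (the odds are hypothetical, chosen so that ∑_i 1/u_i < 1):

    import numpy as np

    u = np.array([3.0, 4.0, 6.0])        # hypothetical super-fair odds: sum(1/u) = 0.75 < 1
    b0 = 1.0 - np.sum(1.0 / u)           # cash held out
    b  = 1.0 / u                         # Dutch-book bets

    # Pay-off if horse k wins: the cash plus the winning bet times its odds.
    payoffs = b0 + b * u
    print(payoffs)                       # every entry equals 2 - sum(1/u) = 1.25 > 1

Whichever horse wins, the gambler's wealth is multiplied by the same factor greater than 1.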
7.8 Side Information
Some external information about the performance of the horses may be available, for instance from previous games. Let X ∈ {1, 2, ..., m} represent the winning horse, and let Y be some other arbitrary discrete random variable (the Side Information), with
    p(x, y) = joint probability mass function of (X, Y),
    b(x|y) = conditional betting strategy depending on Y
           = proportion of wealth bet on horse x given that Y = y is observed,
    b(x)   = unconditional betting strategy,
    b(x) ≥ 0,  ∑_x b(x) = 1;      b(x|y) ≥ 0,  ∑_x b(x|y) = 1.
The optimal unconditional and conditional doubling rates are
    W*(X) = max_{b(x)} ∑_x p(x) lg(b(x) u(x)) = ∑_x p(x) lg u(x) − H(X),
    W*(X|Y) = max_{b(x|y)} ∑_{x,y} p(x, y) lg(b(x|y) u(x)) = ∑_x p(x) lg u(x) − H(X|Y).
Hence the increase in doubling rate is
    ΔW = W*(X|Y) − W*(X)
       = [ ∑_x p(x) lg u(x) − H(X|Y) ] − [ ∑_x p(x) lg u(x) − H(X) ]
       = H(X) − H(X|Y) = I(X; Y) ≥ 0.
Increase in Doubling Rate = mutual information between the horse race and the side information.
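To see the identity numerically, here is a sketch with a made-up joint distribution p(x, y) and fair 2-for-1 odds; it computes the increase in doubling rate both directly and as I(X; Y):

    import numpy as np

    # Hypothetical joint pmf p(x, y): 2 horses, binary side information.
    P = np.array([[0.40, 0.10],
                  [0.10, 0.40]])
    u = np.array([2.0, 2.0])            # fair 2-for-1 odds on both horses

    px = P.sum(axis=1)                  # marginal of X
    py = P.sum(axis=0)                  # marginal of Y

    # Optimal doubling rates: bet proportionally to p(x), resp. p(x|y).
    W_uncond = np.sum(px * np.log2(px * u))
    W_cond = sum(py[y] * np.sum((P[:, y] / py[y]) * np.log2((P[:, y] / py[y]) * u))
                 for y in range(2))

    I_xy = sum(P[x, y] * np.log2(P[x, y] / (px[x] * py[y]))
               for x in range(2) for y in range(2))

    print(round(W_cond - W_uncond, 4), round(I_xy, 4))   # the two numbers agree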
7.9 Learning
Let {X_k} be a sequence of horse race outcomes generated by a stochastic process, and assume uniform m-for-1 odds as in the special case above. The optimal conditional doubling rate is
    W*(X_k | X_{k−1}, X_{k−2}, ..., X_1)
        = max_{b(·|x_{k−1},...,x_1)} E[ lg S(X_k) | X_{k−1}, X_{k−2}, ..., X_1 ]
        = lg m − H(X_k | X_{k−1}, X_{k−2}, ..., X_1),
and is achieved by the conditionally proportional bet
    b(x_k | x_{k−1}, ..., x_1) = p(x_k | x_{k−1}, ..., x_1).
Note that since
    S_n = ∏_{i=1}^n S(X_i),
we have
    (1/n) E[lg S_n] = (1/n) ∑_i E[lg S(X_i)]
                    = (1/n) ∑_i ( lg m − H(X_i | X_1, ..., X_{i−1}) )
                    = lg m − H(X_1, ..., X_n)/n
                    → lg m − H(X),
where H(X) is simply the entropy rate of the process.
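A minimal sketch of such a learning gambler (assumptions, not from the text: uniform m-for-1 odds, winners generated by a hypothetical first-order Markov chain, and a simple add-one count as the estimator of p(x_k | x_{k−1})):

    import numpy as np

    rng = np.random.default_rng(0)
    m = 3
    # Hypothetical Markov chain generating the winning horses.
    P = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.7, 0.2],
                  [0.2, 0.1, 0.7]])

    counts = np.ones((m, m))                     # add-one counts for p(x_k | x_{k-1})
    log_wealth, prev, n = 0.0, 0, 20000
    for _ in range(n):
        x = rng.choice(m, p=P[prev])             # observe the next winner
        b = counts[prev] / counts[prev].sum()    # bet proportionally to the estimate
        log_wealth += np.log2(b[x] * m)          # uniform m-for-1 odds
        counts[prev, x] += 1
        prev = x

    pi = np.linalg.matrix_power(P, 200)[0]       # (approximate) stationary distribution
    H_rate = -np.sum(pi * (P * np.log2(P)).sum(axis=1))
    print(round(log_wealth / n, 3))              # empirical doubling rate
    print(round(np.log2(m) - H_rate, 3))         # theoretical limit lg m - H(X)

The empirical doubling rate approaches lg m − H(X) as the conditional estimates converge.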
Chapter 8
Universal Portfolio
8.1 Universal Portfolio
The scheme has three defining features:
1. It is a sequential portfolio selection procedure: an adapted process.
2. It makes no statistical assumption about the behavior of the market.
3. It is robust with respect to arbitrary market sequences occurring in the real world.
We shall consider growth of wealth for arbitrary market sequences. For example, our goal may be to outperform the best buy-and-hold strategy, i.e., to be competitive against a competing investor who can foresee the n future days. A different goal may be to outperform all constant rebalanced portfolio strategies.

Let
    m = # stocks traded in the market,
    x_i = price relative for the ith stock
        = (stock price at close) / (stock price at open) = P_i(c) / P_i(o)
        = 1 + ΔP_i / P_i,
    x = (x_1, x_2, ..., x_m)^T = stock market vector.
8.1.1 Portfolio

b1 1
(
b2 C
CC
bPi  0
= portfolio ;
... C
A
i b i = 1:
bm
Portfolio is simply the proportion of the current wealth invested in each of the stocks .
X
S = b  x = bT x = bixi;

0
B
B
b=B
B
@

= Factor by which the wealth increases in one period.

x(1); x(2); : : :; x(n)


= stock market vectors for n consecutive days.

b = Fixed (constant) portfolio


We shall follow a constant rebalanced portfolio strategy.
(
n
Y
S0(b) = 1
T
Sn(b) = b x(i);
Sn (b) = Sn 1(b) bT x(n):
i=1

Sn = max
S (b) = Sn (b):
b n
This is the maximum wealth achievable on the given stock sequence maximized over all constant rebalanced portfolios.
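A small sketch of these definitions (the two-stock price-relative sequence is invented, and the grid search over the simplex is only for illustration):

    import numpy as np

    # Hypothetical price relatives for 2 stocks over 4 days (rows = days).
    X = np.array([[1.10, 0.95],
                  [0.90, 1.10],
                  [1.05, 1.00],
                  [0.95, 1.08]])

    def wealth(b, X):
        """S_n(b) = prod_i b^T x(i) for a constant rebalanced portfolio b."""
        return float(np.prod(X @ b))

    # Buy-and-hold wealths S_n(e_j), and the best constant rebalanced portfolio.
    print([round(wealth(e, X), 4) for e in np.eye(2)])
    grid = [np.array([a, 1 - a]) for a in np.linspace(0, 1, 101)]
    best = max(grid, key=lambda b: wealth(b, X))
    print(best, round(wealth(best, X), 4))       # approximately b* and S_n*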
8.2 Universal Portfolio Strategy
The universal portfolio ^b(k) depends only on the past price relatives x(1), x(2), ..., x(k−1). It performs as well as the best constant rebalanced portfolio based on clairvoyant knowledge of the sequence of price relatives.
8.2.1 Questions
• Since we wish to compete against a clairvoyant investor (who knows the future), and universal portfolios depend only on the past (which need have no causal or correlated relation with the future), how is it possible for the universal portfolio to be competitive?
• A malicious/adversarial nature is free to structure the future so as to help the competing investor.
The universal portfolio starts from the uniform portfolio
    ^b(1) = (1/m, 1/m, ..., 1/m)^T.
With
    S_k(b) = ∏_{i=1}^k b^T x(i),    B = { b ∈ R_+^m | b_i ≥ 0, ∑_i b_i = 1 },
it is updated as
    ^b(k+1) = ∫_B b S_k(b) db / ∫_B S_k(b) db.
Note that
    ^b(k+1)^T x(k+1) = ∫_B b^T x(k+1) S_k(b) db / ∫_B S_k(b) db
                     = ∫_B S_{k+1}(b) db / ∫_B S_k(b) db.
The "learned" portfolio is the performance-weighted average of all portfolios b ∈ B. Thus
    ^S_n = ∏_{k=1}^n ^b(k)^T x(k) = ∫_B S_n(b) db / ∫_B db = (m−1)! ∫_B S_n(b) db.
We will show that
    ^S_n ≥ S_n* (m−1)! (2π/n)^{(m−1)/2} / |J_n|^{1/2},
where J_n is a positive semidefinite (m−1) × (m−1) sensitivity matrix (defined below in Section 8.4).
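A Monte Carlo sketch of the universal portfolio update: the integrals over B are approximated by sampling portfolios uniformly from the simplex (a flat Dirichlet), and the price relatives are the hypothetical two-stock sequence used above.

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.array([[1.10, 0.95],
                  [0.90, 1.10],
                  [1.05, 1.00],
                  [0.95, 1.08]])               # hypothetical price relatives (rows = days)
    m = X.shape[1]

    B = rng.dirichlet(np.ones(m), size=50000)  # uniform samples from the simplex

    wealth_hat, S = 1.0, np.ones(len(B))       # S[j] tracks S_k(b_j)
    for x in X:
        # ^b(k+1) = E[b S_k(b)] / E[S_k(b)]  (k = 0 gives the uniform portfolio).
        b_hat = (B * S[:, None]).sum(axis=0) / S.sum()
        wealth_hat *= float(b_hat @ x)
        S *= B @ x                             # update S_k(b) -> S_{k+1}(b)

    print(round(wealth_hat, 4))                # ^S_n
    print(round(S.mean(), 4))                  # average of S_n(b): same value (Lemma 8.3.2)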
8.3 Properties & Analysis
Let F be some arbitrary probability distribution for price relatives over R_+^m, and let F_n be the empirical distribution associated with x(1), x(2), ..., x(n):
    Pr[X = x(i)] = 1/n,    Pr[X ≠ x(i) for all i] = 0.
As n → ∞, F_n → F.
8.3.1 Doubling Ratio
For a distribution F of price relatives define
    W(b, F) = ∫ lg(b^T x) dF(x),
    W(b, F_n) = (1/n) ∑_{i=1}^n lg(b^T x(i)),
    W*(F) = max_b W(b, F),
    W*(F_n) = max_b W(b, F_n).
Thus
    S_n* = max_b S_n(b) = max_b ∏_{i=1}^n b^T x(i) = 2^{n W*(F_n)}.
Let e_j be the unit vector
    e_j = (0, ..., 0, 1, 0, ..., 0)^T,   with a 1 in the jth position only.
Then
    S_n(e_j) = ∏_{k=1}^n e_j^T x(k) = ∏_{k=1}^n x_j(k)
             = wealth due to the buy-and-hold strategy associated with the jth stock.
Since S_n* is a maximization of S_n(b) over the entire simplex,
    S_n* ≥ S_n(e_j)   for all j.
Corollary 8.3.1
1. Target exceeds best stock:
       S_n* ≥ max_j S_n(e_j).
2. Target exceeds Value Line (the geometric mean of the buy-and-hold wealths):
       S_n* ≥ ( ∏_j S_n(e_j) )^{1/m}.
3. Target exceeds arithmetic mean:
       S_n* ≥ ∑_j α_j S_n(e_j),   for any α_j ≥ 0 with ∑_j α_j = 1.
4. S_n*(x(1), x(2), ..., x(n)) is invariant under permutations of the sequence x(1), x(2), ..., x(n).
Lemma 8.3.2
    ^S_n = ∏_{k=1}^n ^b(k)^T x(k) = ∫_B S_n(b) db / ∫_B db,
where
    S_n(b) = ∏_{i=1}^n b^T x(i).
That is, ^S_n, the wealth from the universal portfolio, is the average of S_n(b) over the simplex.

Proof: Recall that
    ^b(k+1)^T x(k+1) = ∫_B S_{k+1}(b) db / ∫_B S_k(b) db.
Telescoping the products,
    ^S_n = ∏_{k=1}^n ^b(k)^T x(k)
         = [ ∫_B S_n(b) db / ∫_B S_{n−1}(b) db ] · ... · [ ∫_B S_1(b) db / ∫_B db ]
         = ∫_B S_n(b) db / ∫_B db
         = ∫_B ∏_{i=1}^n b^T x(i) db / ∫_B db
         = E_b S_n(b) = E_b 2^{n W(b, F_n)},
where E_b denotes expectation with respect to the uniform distribution on the simplex B.

Corollary 8.3.3 ^S_n(x(1), x(2), ..., x(n)) is invariant under permutations of the sequence x(1), x(2), ..., x(n).
Claim:
    E_b W(b, F_n) ≥ (1/m) ∑_j W(e_j, F_n).
Indeed,
    E_b W(b, F_n) = E_b ∫ lg(b^T x) dF_n(x)
                  = E_b ∫ lg( ∑_j b_j (e_j^T x) ) dF_n(x)
                  ≥ E_b ∑_j b_j ∫ lg(e_j^T x) dF_n(x)        (concavity of lg)
                  = (1/m) ∑_j ∫ lg(e_j^T x) dF_n(x)          (E_b b_j = 1/m)
                  = (1/m) ∑_j W(e_j, F_n).
By Jensen's inequality,
    E_b 2^{n W(b, F_n)} ≥ 2^{n E_b W(b, F_n)}
                        ≥ 2^{(n/m) ∑_j W(e_j, F_n)}
                        = ∏_j ( 2^{n W(e_j, F_n)} )^{1/m}.
Thus
    ^S_n = E_b S_n(b) = E_b 2^{n W(b, F_n)} ≥ ( ∏_{j=1}^m S_n(e_j) )^{1/m}.
Corollary 8.3.4 The universal portfolio exceeds the Value Line index:
    ^S_n ≥ ( ∏_{j=1}^m S_n(e_j) )^{1/m}.
8.4 Competitiveness
Let F_n(x) be the empirical probability mass function, placing mass 1/n on each x(i) ∈ R_+^m. Then
    S_n(b) = ∏_{i=1}^n b^T x(i) = 2^{n W(b, F_n)} = e^{n V(b, F_n)},
where V(b, F_n) = W(b, F_n) ln 2 is the doubling rate in nats, and
    b*(F_n) = b* = arg max_b S_n(b) = arg max_b V(b, F_n) ∈ R_+^m,
    S_n* = max_{b ∈ B} S_n(b) = e^{n V*(F_n)}.
Definition 8.4.1 All stocks are active at time n if
    there exists b* with S_n(b*) = S_n* such that (b*(F_n))_i > 0 for all i ∈ [1..m].
All stocks are strictly active at time n if
    for every b* with S_n(b*) = S_n*, (b*(F_n))_i > 0 for all i ∈ [1..m].
If
    Lin( x(1), x(2), ..., x(n) ) = R^m,
then we say that the price relatives x(1), x(2), ..., x(n) are of full rank.

Let J(b) be the (m−1) × (m−1) sensitivity matrix function of a market with respect to the distribution F(x), x ∈ R_+^m:
    J_ij(b) = ∫ (x_i − x_m)(x_j − x_m) / (b^T x)² dF(x),
and let J* = J(b*) be the sensitivity matrix. Equivalently, in the reduced coordinates (b_1, ..., b_{m−1}, 1 − ∑_{i=1}^{m−1} b_i),
    J_ij = − ∂²V((b_1, ..., b_{m−1}, 1 − ∑_{i=1}^{m−1} b_i), F) / ∂b_i ∂b_j,
which is a positive semidefinite matrix. It is positive definite if all stocks are strictly active.
Let
    C = { (c_1, c_2, ..., c_{m−1}) | c_i ≥ 0, ∑_i c_i ≤ 1 },
and define
    b(c) = ( c_1, ..., c_{m−1}, 1 − ∑_{i=1}^{m−1} c_i ).
Thus
    V_n(c) = (1/n) ∑_{i=1}^n ln( b(c)^T x(i) ) = ∫ ln(b(c)^T x) dF_n(x) = E_{F_n} ln(b(c)^T x).
Using a Taylor series expansion about the maximizer c*:
    V_n(c) = V_n(c*) + (c − c*)^T ∇V_n(c*)
             − (1/2) (c − c*)^T J_n (c − c*)
             + (1/6) ∑_{ijk} (c_i − c*_i)(c_j − c*_j)(c_k − c*_k)
                      E_{F_n}[ (x_i − x_m)(x_j − x_m)(x_k − x_m) / S³(c̃) ],
where
    c̃ = λ c + (1 − λ) c*,   0 ≤ λ ≤ 1,
    S(c̃) = b(c̃)^T x = ∑_i b(c̃)_i x_i.
Assume that all stocks are strictly active, so that
    J* = − [ ∂²V / ∂c_i ∂c_j ] is positive definite.
Hence its determinant is strictly positive:
    |J*| > 0.
Let u = √n (c − c*). Then, since the gradient term vanishes at the maximizer c*, the Taylor series gives
    n V_n(c) = n V*(F_n) − (1/2) u^T J_n u
               + (1/(6√n)) ∑_{ijk} u_i u_j u_k E_{F_n}[ (x_i − x_m)(x_j − x_m)(x_k − x_m) / S³(c̃) ].
Next assume that all price relatives are bounded: 0 < a ≤ x_i ≤ c < ∞. Then
    S(c̃) ≥ a,    |x_i − x_m| ≤ 2c.
Thus the last term in the preceding expression can be bounded in absolute value by
    (1/(6√n)) ‖u‖³ m^{3/2} (2c)³ / a³,
and hence
    n V_n(c) ≥ n V*(F_n) − (1/2) u^T J_n u − ( 4 m^{3/2} c³ / (3 √n a³) ) ‖u‖³.
We thus conclude that
    S_n(c) = 2^{n W_n(c)} ≥ e^{ n V_n* − (u^T J_n u)/2 − 4 m^{3/2} c³ ‖u‖³ / (3 √n a³) }
           = S_n* e^{ −(u^T J_n u)/2 − 4 m^{3/2} c³ ‖u‖³ / (3 √n a³) }.
Since ^S_n = ∫_B S_n(b) db / ∫_B db and ∫_B db = 1/(m−1)!, changing variables to u (so that dc = (1/√n)^{m−1} du) gives
    ^S_n ≥ S_n* (m−1)! ∫_U e^{ −(u^T J_n u)/2 − 4 m^{3/2} c³ ‖u‖³ / (3 √n a³) } (1/√n)^{m−1} du,
where U is the image of C under u = √n (c − c*). Thus, asymptotically,
    ^S_n ≳ S_n* (m−1)! (2π/n)^{(m−1)/2} / |J_n|^{1/2}.
In other words,
    (1/n) lg ( S_n* / ^S_n ) = (1/n) lg [ |J_n|^{1/2} / ( (m−1)! (2π/n)^{(m−1)/2} ) ] → 0   as n → ∞.
Summarizing, we have
    (1/n) lg S_n* ≈ (1/n) lg ^S_n,    V_n* ≈ ^V_n.
Chapter 9
Portfolios and Markets
9.1 Portfolio Theory
9.1.1 Itô Calculus
Let X be the asset price at time t. In a continuous-time model, one can study the return on the asset, dX/X, over a small period of time dt:
    dX/X = μ dt + σ dZ.
This is a so-called Itô process, with
    μ = average rate of growth (DRIFT),
    σ = volatility (DIFFUSION).
9.1.2 Market Model
Assume that there are m stocks, represented by m Itô processes X_1(t), X_2(t), ..., X_m(t). Furthermore,
    dX_i / X_i = μ_i dt + ∑_{j=1}^m σ_ij dZ_j.
Here the Z_j's are independent Brownian motions. Write
    μ = drift vector = (μ_1, μ_2, ..., μ_m)^T,
    σ = diffusion matrix = (σ_ij), an m × m matrix,
    Σ = instantaneous covariance matrix = σ σ^T.
In general, the term dZ corresponds to a Wiener process:
    dZ ~ N(0, dt),
i.e., the mean of dZ is zero and the variance of dZ is dt. Equivalently,
    dZ = ε √dt,    E[ε] = 0,   E[ε²] = 1.
This holds in continuous time in the limit as dt → 0.
Lemma 9.1.1 (Itô's Lemma) [Analogous to Taylor's theorem for functions of random variables. The key idea is the observation that, with probability 1, dZ² → dt as dt → 0.]
Suppose f(X) is a function of X (where X is possibly stochastic). Then
    df = (∂f/∂X) dX + (1/2) (∂²f/∂X²) dX² + smaller order terms,
    dX² = (μX dt + σX dZ)²
        = σ²X² dZ² + 2μσX² dZ dt + μ²X² dt²
        → σ²X² dt   as dt → 0,
so that
    df = (∂f/∂X)(μX dt + σX dZ) + (1/2) σ²X² (∂²f/∂X²) dt
       = ( μX ∂f/∂X + (1/2) σ²X² ∂²f/∂X² ) dt + σX (∂f/∂X) dZ.
Example. Consider
    dX/X = μ dt + σ dZ,
and let f(X) = ln X. Then
    ∂f/∂X = 1/X,    ∂²f/∂X² = −1/X²,
so
    df = (∂f/∂X) dX + (1/2) (∂²f/∂X²) dX²
       = dX/X − (1/2) σ²X² dt / X²
       = dX/X − (σ²/2) dt.
Hence
    d(ln X) = dX/X − (σ²/2) dt,
    dX/X = d(ln X) + (σ²/2) dt,
    ∫_0^t dX/X = ∫_0^t d(ln X) + (1/2) ∫_0^t σ² dt = ln X(t) − ln X(0) + (1/2) ∫_0^t σ² dt,
    exp{ ∫_0^t dX/X } = ( X(t) / X(0) ) exp{ (1/2) ∫_0^t σ² dt }.
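A small Euler-Maruyama simulation illustrating the σ²/2 correction (the drift and volatility values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, dt, n = 0.10, 0.30, 1e-3, 200000   # hypothetical drift and volatility

    X = np.empty(n + 1)
    X[0] = 1.0
    for k in range(n):
        dZ = np.sqrt(dt) * rng.standard_normal()
        X[k + 1] = X[k] * (1.0 + mu * dt + sigma * dZ)   # dX/X = mu dt + sigma dZ

    T = n * dt
    print(round(np.log(X[-1]) / T, 3))   # one sample of ln X(T) / T
    print(round(mu - sigma**2 / 2, 3))   # its expected value: the Ito-corrected growth rate

The realized log-growth rate fluctuates around μ − σ²/2, not around μ.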
9.2 Rebalanced Portfolio
Consider the market model with m stocks:
    dX_i(t) / X_i(t) = μ_i(t) dt + ∑_{j=1}^m σ_ij(t) dZ_j(t),
    Σ(t) = σ(t) σ(t)^T.
A portfolio of long stocks at time t is identified by its weight vector process b(t) ∈ B, where
    B = { b ∈ R^m | b_i ≥ 0, ∑_{i=1}^m b_i = 1 }.
Rebalanced Portfolio
(A self-financing portfolio without dividends.) Its wealth S(t) evolves as
    dS(t)/S(t) = ∑_{i=1}^m b_i(t) dX_i(t)/X_i(t)
               = ( ∑_i b_i(t) μ_i(t) ) dt + ∑_i ∑_j b_i(t) σ_ij(t) dZ_j.
Let g(S) = ln S and f(X) = ∑_i b_i ln X_i = ln ∏_i X_i^{b_i}. By Itô's Lemma,
    dg = dS/S − (1/2) (b^T Σ b) S² dt / S² = dS/S − (1/2) (b^T Σ b) dt,
so that
    dS/S = d(ln S) + (1/2) (b^T Σ b) dt,
and
    df = ∑_i b_i dX_i/X_i − ∑_i (1/(2X_i²)) (b_i Σ_ii) X_i² dt,
so that
    ∑_i b_i dX_i/X_i = d( ∑_i b_i ln X_i ) + (1/2) ∑_i b_i Σ_ii dt.
Hence
    d(ln S) = d( ∑_i b_i ln X_i ) − (1/2) b^T Σ b dt + (1/2) ∑_i b_i Σ_ii dt,
and, integrating from 0 to t,
    ln( S(t, b) / S(0) ) = ∑_i b_i ln( X_i(t) / X_i(0) ) − (1/2) b^T Σ̄ b + (1/2) ∑_i b_i Σ̄_ii,
where Σ̄ ≡ ∫_0^t Σ(s) ds. Equivalently,
    S(t, b) = S(0) ∏_{i=1}^m ( X_i(t) / X_i(0) )^{b_i} exp{ −(1/2) b^T Σ̄ b + (1/2) ∑_i Σ̄_ii b_i }.
Maximizing the above expression over b, we have
    S*(t) = max_{b ∈ B} S(t, b) = S(t, b*(t)).
Note that b*(t) is the optimal solution of the following quadratic programming problem:
    max_{b ∈ B}  −(1/2) b^T Σ̄ b + ∑_{i=1}^m ( ln( X_i(t) / X_i(0) ) + (1/2) Σ̄_ii ) b_i.
Define the matrix V, an (m−1) × (m−1) symmetric positive semidefinite matrix, by
    V = (V_ij),    V_ij = Σ̄_ij − Σ̄_im − Σ̄_jm + Σ̄_mm,    1 ≤ i, j ≤ m−1.

Lemma 9.2.1 If V is positive definite, then the portfolio problem has a unique optimal solution.
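As a sketch, the quadratic program above can be solved with a generic constrained optimizer; the growth and covariance numbers below are hypothetical:

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical inputs for 3 stocks: log price ratios and integrated covariance.
    g = np.array([0.12, 0.08, 0.05])            # ln(X_i(t)/X_i(0))
    Sigma_bar = np.array([[0.10, 0.02, 0.01],
                          [0.02, 0.08, 0.02],
                          [0.01, 0.02, 0.06]])

    def neg_objective(b):
        # Negative of  -(1/2) b^T Sigma_bar b + sum_i (g_i + Sigma_bar_ii / 2) b_i.
        return 0.5 * b @ Sigma_bar @ b - (g + 0.5 * np.diag(Sigma_bar)) @ b

    cons = [{"type": "eq", "fun": lambda b: np.sum(b) - 1.0}]
    res = minimize(neg_objective, x0=np.ones(3) / 3, method="SLSQP",
                   bounds=[(0.0, 1.0)] * 3, constraints=cons)
    print(np.round(res.x, 3))                   # b*(t), the optimal rebalanced weights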
Definition 9.2.1 A stochastic process X(t) is weakly regular if
    |E[X(t)]| < ∞ for all t,
    lim_{t→∞} E[X(t)] / t exists, and
    X(t)/t converges to that limit in probability as t → ∞.

The stock market model is weakly regular (easily satisfied if the market is stationary):
    |E[Σ̄(t)]| < ∞ and |E[ln X(t)]| < ∞ for all t;
    lim_{t→∞} E[Σ̄(t)]/t = Σ^∞ and lim_{t→∞} E[ln X(t)]/t = ν^∞ exist;
    Σ̄(t)/t → Σ^∞ and ln X(t)/t → ν^∞ in probability as t → ∞.
Note that
    dX_i / X_i = μ_i dt + ∑_j σ_ij dZ_j,
    d(ln X_i) = dX_i/X_i − dX_i²/(2X_i²) = ( μ_i − Σ_ii/2 ) dt + ∑_j σ_ij dZ_j.
Thus, writing μ^∞ for the analogous limit of the integrated drift divided by t,
    ν_i^∞ = μ_i^∞ − (1/2) Σ_ii^∞,    μ_i^∞ = ν_i^∞ + (1/2) Σ_ii^∞.
Similarly,
    dS/S = ∑_i b_i μ_i dt + ∑_i ∑_j b_i σ_ij dZ_j,
    d(ln S) = ( b^T μ − (1/2) b^T Σ b ) dt + ∑_i ∑_j b_i σ_ij dZ_j,
so the asymptotic growth rate of the constant-weight rebalanced portfolio is
    r(b) = lim_{t→∞} E[ ln S(t, b) ] / t = −(1/2) b^T Σ^∞ b + b^T μ^∞.
The asymptotically optimal constant weight b^∞ ∈ B satisfies
    r(b^∞) = max_{b ∈ B} r(b) = max_{b ∈ B} [ −(1/2) b^T Σ^∞ b + b^T μ^∞ ].
9.2.1 Optimal Portfolio
Recall that
    S(t, b) = S(0) ∏_{i=1}^m ( X_i(t) / X_i(0) )^{b_i} exp{ −(1/2) b^T Σ̄ b + (1/2) ∑_i Σ̄_ii b_i },
    V_ij(t) = Σ̄_ij − Σ̄_im − Σ̄_jm + Σ̄_mm.
Define
    α_i(t) = ln( X_m(t) / X_m(0) ) − ln( X_i(t) / X_i(0) ) − V_ii(t)/2.

Notation: write b = (b', b_m) with b'_1 + ... + b'_{m−1} + b_m = 1, b'_i ≥ 0, b_m > 0.
Rewriting the previous equation in these terms, we have
    S(t, b) = S(0) ( X_m(t) / X_m(0) ) exp{ −(1/2) b'^T V b' − α^T b' }.
The above value S(t, b) is maximized at b' = b'*(t), where
    V(t) b'*(t) = −α(t),    i.e.,    b'*(t) = −V^{−1}(t) α(t).
Hence
    S*(t) = S(0) ( X_m(t) / X_m(0) ) exp{ α^T V^{−1} α / 2 },
and
    S(t, b) = S*(t) exp{ −(1/2) (b' − b'*)^T V (b' − b'*) },
    S(t, b) / S*(t) = exp{ −(1/2) (b' − b'*)^T V (b' − b'*) }.
9.2.2 Long Term Effects
Recall
    V_ij = Σ̄_ij − Σ̄_im − Σ̄_jm + Σ̄_mm,
and define the limiting matrix and vector
    J_ij^∞ = Σ_ij^∞ − Σ_im^∞ − Σ_jm^∞ + Σ_mm^∞,
    α_i(t) = ln( X_m(t) / X_m(0) ) − ln( X_i(t) / X_i(0) ) − V_ii(t)/2,
    α_i^∞ = ν_m^∞ − ν_i^∞ − (1/2) J_ii^∞
          = μ_m^∞ − (1/2) Σ_mm^∞ − μ_i^∞ + (1/2) Σ_ii^∞ − (1/2) J_ii^∞
          = μ_m^∞ − μ_i^∞ − Σ_mm^∞ + Σ_im^∞.
By weak regularity,
    V(t)/t → J^∞   and   α_i(t)/t → α_i^∞.
Since
    r(b) = −(1/2) b^T Σ^∞ b + b^T μ^∞
         = −(1/2) b'^T J^∞ b' − b'^T α^∞ + const   (the constant not depending on b'),
it is maximized at
    b'^∞ = −(J^∞)^{−1} α^∞.
Note, however, that
    b'*(t) = −[ V(t)/t ]^{−1} [ α(t)/t ] → −(J^∞)^{−1} α^∞ = b'^∞   as t → ∞.
Problem: the construction of b^∞ requires the long-term averages of the future instantaneous expected returns and covariances. This, however, is impossible.
Remedy: the Universal Portfolio.
9.3 Universal Portfolio
The universal portfolio is the rebalanced portfolio with weights
    ^b_i(t) = ∫_B b_i S(t, b) db / ∫_B S(t, b) db.
Let
    S̄(t) = ∫_B S(t, b) db / ∫_B db.
Note that S̄(0) = ^S(0). Furthermore,
    dS̄ / S̄ = ∫_B dS(t, b) db / ∫_B S(t, b) db
            = ∫_B ∑_i S(t, b) b_i (dX_i/X_i) db / ∫_B S(t, b) db
            = ∑_i ^b_i(t) dX_i/X_i
            = d^S / ^S.
Hence
    ^S(t) = S̄(t)   for all t.

Lemma 9.3.1 The wealth accumulated by the universal portfolio is given by
    ^S(t) = ∫_B S(t, b) db / ∫_B db.
This is the average wealth accumulated by all possible (constant) rebalanced portfolios.
9.3.1 Competitiveness
Recall that
    S(t, b) = S*(t) exp{ −(1/2) (b' − b'*)^T V (b' − b'*) }.
Let x = V^{1/2}(t) (b' − b'*), and let
    Λ(t) = V^{1/2}(t) ( B' − b'* ),
where
    B' = { b' ∈ R^{m−1} | b'_i ≥ 0, ∑_i b'_i ≤ 1 }.
Note that
    Vol(B') = 1 / (m−1)!.
Changing variables, we have
    ^S(t) = S*(t) ∫_{Λ(t)} e^{−|x|²/2} dx / ( |V(t)|^{1/2} (1/(m−1)!) ),
so that, since V(t)/t → J^∞ and Λ(t) expands to all of R^{m−1} as t → ∞,
    ^S(t) / S*(t) = (m−1)! ∫_{Λ(t)} e^{−|x|²/2} dx / ( |V(t)/t|^{1/2} t^{(m−1)/2} )
                  ≈ (m−1)! (2π)^{(m−1)/2} / ( |J^∞|^{1/2} t^{(m−1)/2} )
                  = (m−1)! (2π/t)^{(m−1)/2} / |J^∞|^{1/2}.
Thus,
    (1/t) ln( ^S(t) / S*(t) ) = C(m)/t − C'(m) (ln t)/t → 0,
and
    (1/t) ln ^S(t) → (1/t) ln S*(t) → (1/t) ln S(t, b^∞),
i.e., asymptotically the universal portfolio attains the growth rate of the best rebalanced portfolio, and hence of the asymptotically optimal constant weight b^∞.
Bibliography
Text Books
[1] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory, John Wiley & Sons, 1991. ISBN 0-471-06259-6.

[2] Darrell Duffie. Dynamic Asset Pricing Theory, Princeton, 1997. ISBN 0-691-04302-7.

[3] Drew Fudenberg and Jean Tirole. Game Theory, MIT, 1995. ISBN 0-262-06141-4.

[4] Alan Kirman and Mark Salmon. Learning and Rationality in Economics, Basil Blackwell, 1995. ISBN 0-631-18488-0.

[5] H.M. Markowitz. Mean-Variance Analysis in Portfolio Choice and Capital Markets, Blackwell, 1991. ISBN 0-631-17854-6.

Popular Books

[6] William Poundstone. Prisoner's Dilemma, Doubleday, 1992.

[7] Anatol Rapoport. Two-Person Game Theory: The Essential Ideas, Ann Arbor Science Paperbacks, University of Michigan, 1966.

[8] Karl Sigmund. Games of Life: Explorations in Ecology, Evolution and Behaviour, Oxford University Press, 1993.