Bhubaneswar Mishra
Courant Institute of Mathematical Sciences
Preface
In the spring of 1998, a small group of computer science colleagues, students, and I started writing notes that could be used in the context of our research in computational economics, revolving around our work on CAFE. The group consisted of Rohit Parikh, Ron Even, Amy Greenwald, Gideon Berger, Toto Paxia and a few others. At present, these notes are intended for the consumption of only this group.
January 1, 1998
251 Mercer Street, New York
B. Mishra
mishra@nyu.edu
Contents

Preface

1 Introduction

2 Strategic Form Games
  2.1 Games
  2.2 Strategic Form Games
  2.3 Domination & Nash Equilibrium
  2.4 Example
      2.4.1 Matching Pennies
  2.5 Key Ingredients for Nash Equilibrium
  2.6 Revisiting On-line Learning
      2.6.1 Convergence
      2.6.2 Irrationality
      2.6.3 A Meta-Theorem of Foster & Young
3 Nash Equilibrium

4 Beyond Nash: Domination, Rationalization and Correlation

5 Adaptive and Sophisticated Learning

6 Learning à la Milgrom and Roberts

7 Information Theory and Learning

8 Universal Portfolio

9 Portfolios and Markets

Bibliography

© Mishra 1998
Chapter 1
Introduction
1.1 Stag Hunt Problem (With Two Players)
2. Evolutionary Biology
3. Large Scale Distributed Systems
4. Resource Allocation
5. Intelligent Agents
        C       D
    C  0,0    -2,1
    D  1,-2   -1,-1
There are two prisoners (row-player and column-player) arrested for a particular crime, but the prosecutor does not have enough evidence to convict them both. He relies on one of them testifying against the other in order to get a conviction and punish the second prisoner by sending him to jail. If both of them testify against the other (defections: (D, D)), then they both go to jail for 1 year each, thus getting a "util" of −1 each. If, on the other hand, both maintain silence (cooperations: (C, C)), then they go free with a "util" of 0 each. If, on the other hand, the row-player testifies (D) and the column-player maintains silence (C), then the row-player is rewarded with 1 util and the column-player is punished with −2 utils. The other case is symmetric.
The pay-offs can be made all non-negative by adding 2 utils to each, thus getting the pay-off matrix:
    s_i ∈ [0, 1].

4. The highest bidder wins the object.
5. But he only pays the second bid (max_{j≠i} s_j).
6. His utility is

    v_i − max_{j≠i} s_j,

where

    v_1, v_2 = valuations,
    s_1, s_2 = bids.

Pay-offs
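As a quick sketch of the rules just listed (the helper names below are illustrative, not from the text), the second-price payment and the resulting utility can be coded directly:

```python
def second_price_auction(bids):
    """Second-price (Vickrey) auction: the highest bidder wins the
    object but pays only the second-highest bid.
    Returns (winner index, price paid)."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    price = max(b for i, b in enumerate(bids) if i != winner)
    return winner, price

def utility(valuation, bids, i):
    """Utility of bidder i: v_i - max_{j != i} s_j if i wins, else 0."""
    winner, price = second_price_auction(bids)
    return valuation - price if winner == i else 0.0

# With valuations v = (0.8, 0.5) and truthful bids, bidder 0 wins
# at price 0.5 and earns utility 0.3.
w, p = second_price_auction([0.8, 0.5])
u0 = utility(0.8, [0.8, 0.5], 0)
```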
Row-player's (minmax) strategy:

    max_r min_c M(r, c).
1.6 Obstacles

1. Imperfect Information: M (the pay-off matrix) may be unknown.
2. Computational Complexity: M is so large that computing a minmax strategy using a linear program is infeasible.
3. Irrationality: The opponent (column-player) may not be truly adversarial.
Initially,

    W_1(i) = 1,  ∀i.

At each step t, the row-player plays the normalized distribution and updates the weights:

    r_t(i) = W_t(i) / Σ_i W_t(i),
    W_{t+1}(i) = W_t(i) β^{M(i, c_t)}.

1.9.1 Inequality 1

Using β^x ≤ 1 − (1 − β)x for x ∈ [0, 1],

    Σ_i W_{t+1}(i) = Σ_i W_t(i) β^{M(i, c_t)}
                  ≤ Σ_i W_t(i) (1 − (1 − β) M(i, c_t))
                  = (Σ_i W_t(i)) (1 − (1 − β) M(r_t, c_t)).

After telescoping, we get

    Σ_i W_{T+1}(i) ≤ n Π_t (1 − (1 − β) M(r_t, c_t)).

Hence,

    ln( Σ_i W_{T+1}(i) / n ) ≤ Σ_t ln(1 − (1 − β) M(r_t, c_t))
                             ≤ −(1 − β) Σ_t M(r_t, c_t).

1.9.2 Inequality 2

For any fixed row j,

    Σ_i W_{T+1}(i) ≥ W_{T+1}(j) = β^{Σ_t M(j, c_t)}.

Hence

    ln( Σ_i W_{T+1}(i) / n ) ≥ (ln β) Σ_t M(j, c_t) − ln n,

and, combining the two inequalities,

    Σ_t M(r_t, c_t) ≤ ( ln(1/β) / (1 − β) ) Σ_t M(j, c_t) + ln n / (1 − β).
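The update rule and regret bound above can be sketched in a few lines; here M is a loss matrix with entries in [0, 1], and the column-player's sequence is supplied by the caller (both are assumptions of this illustration):

```python
import numpy as np

def multiplicative_weights(M, columns, beta=0.9):
    """On-line learning over the rows of loss matrix M (entries in [0, 1]).

    Maintains W_{t+1}(i) = W_t(i) * beta**M[i, c_t] and plays the
    normalized distribution r_t(i) = W_t(i) / sum_i W_t(i).
    Returns the total expected loss and the final distribution."""
    n = M.shape[0]
    W = np.ones(n)
    total_loss = 0.0
    for c in columns:
        r = W / W.sum()
        total_loss += r @ M[:, c]       # M(r_t, c_t)
        W = W * beta ** M[:, c]         # weight update
    return total_loss, W / W.sum()

# Matching-pennies losses for the row-player, scaled to [0, 1];
# the column-player simply alternates its two columns.
M = np.array([[0.0, 1.0], [1.0, 0.0]])
cols = [0, 1] * 50
loss, r_final = multiplicative_weights(M, cols)
```

Against this sequence any fixed row incurs loss 50, so the bound of Section 1.9 caps the algorithm's loss at (ln(1/β)/(1−β))·50 + ln 2/(1−β).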
Chapter 2
Strategic Form Games
2.1 Games
Games can be categorized into the following two forms. We will start here with the first category and postpone the discussion of the second category until later.

1. Strategic Form Games (also called Normal Form Games)
2. Extensive Form Games

    S = S₁ × S₂ × ⋯ × S_I

(the Cartesian product of the pure strategy sets) = the set of pure strategy profiles.
Conventions

    s = (s_i, s_{−i}) ∈ S

is a pure strategy profile.

    u_i : S → ℝ = the pay-off function (a real-valued function on S) for player i.

    u_i(s) = the von Neumann-Morgenstern utility of player i for each profile s = (s_1, s_2, …, s_I) of pure strategies.
For a two-person zero-sum game,

    Σ_{i=1}^{2} u_i(s) = 0.

The set of player i's mixed strategies is

    Σ_i = { σ_i : S_i → [0, 1] | Σ_{s_i ∈ S_i} σ_i(s_i) = 1 }.
The support of a mixed strategy σ_i = the set of pure strategies to which σ_i assigns positive probability.
Player i's pay-off to profile σ is

    u_i(σ) = E_σ[u_i(s)] = u_i(σ_i, σ_{−i}) = Σ_{s_i ∈ S_i} σ_i(s_i) u_i(s_i, σ_{−i}),

where

    u_i(s_i, σ_{−i}) = Σ_{s_{−i} ∈ S_{−i}} σ_{−i}(s_{−i}) u_i(s_i, s_{−i})
                     = Σ_{s_{−i} ∈ S_{−i}} ( Π_{j≠i} σ_j(s_j) ) u_i(s_i, s_{−i}).

Hence,

    u_i(σ) = Σ_{s_i ∈ S_i} σ_i(s_i) Σ_{s_{−i} ∈ S_{−i}} ( Π_{j≠i} σ_j(s_j) ) u_i(s_i, s_{−i})
           = Σ_{s ∈ S} ( Π_j σ_j(s_j) ) u_i(s).
A pure strategy s_i is strictly dominated for player i if

    ∃σ_i′ ∈ Σ_i  ∀s_{−i} ∈ S_{−i}:  u_i(σ_i′, s_{−i}) > u_i(s_i, s_{−i}),

and weakly dominated for player i if

    ∃σ_i′ ∈ Σ_i  ( ∀s_{−i} ∈ S_{−i}:  u_i(σ_i′, s_{−i}) ≥ u_i(s_i, s_{−i}) )
                 ∧ ( ∃s_{−i} ∈ S_{−i}:  u_i(σ_i′, s_{−i}) > u_i(s_i, s_{−i}) ).
2.4 Example

         L      M      R
    U   4,3    5,1    6,2
    M   2,1    8,4    3,6
    D   3,0    9,6    2,8
First, the column-player eliminates M, as it is strictly dominated by R, and reduces the pay-off matrix to

New Pay-offs

         L      R
    U   4,3    6,2
    M   2,1    3,6
    D   3,0    2,8

Next, the row-player eliminates M and D, as both are strictly dominated by U:

New Pay-offs

         L      R
    U   4,3    6,2

Next, the column-player eliminates R, as it is dominated by L, and reduces the pay-off matrix to

New Pay-offs

         L
    U   4,3
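The elimination sequence above can be sketched as a small routine; as a simplification, this sketch checks domination by pure strategies only (the text's definition also allows dominating mixed strategies):

```python
import numpy as np

def iterated_strict_dominance(A, B):
    """Iteratively delete rows/columns strictly dominated by another
    pure strategy. A, B: row-player's and column-player's pay-offs.
    Returns the surviving row and column indices."""
    rows, cols = list(range(A.shape[0])), list(range(A.shape[1]))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:       # a row r is removed if some r2 beats it everywhere
            if any(all(A[r2, c] > A[r, c] for c in cols) for r2 in rows if r2 != r):
                rows.remove(r); changed = True
        for c in cols[:]:       # a column c is removed if some c2 beats it everywhere
            if any(all(B[r, c2] > B[r, c] for r in rows) for c2 in cols if c2 != c):
                cols.remove(c); changed = True
    return rows, cols

# The example above: rows U, M, D; columns L, M, R.
A = np.array([[4, 5, 6], [2, 8, 3], [3, 9, 2]])   # row-player
B = np.array([[3, 1, 2], [1, 4, 6], [0, 6, 8]])   # column-player
rows, cols = iterated_strict_dominance(A, B)       # survivors: U and L
```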
Note that
    σ_r = (1/3, 1/3, 1/3)  &  σ_c = (0, 1/2, 1/2)  &  σ = (σ_r, σ_c).

Thus

    u_r(σ_r, σ_c) = Σ_s ( Π_j σ_j(s_j) ) u_r(s)

and

    u_c(σ_r, σ_c) = Σ_s ( Π_j σ_j(s_j) ) u_c(s).
Example

         L      R
    U   2,0   -1,0
    M   0,0    0,0
    D  -1,0    2,0
Matching Pennies
H T
H 1,-1 -1,1
T -1,1 1,-1
There are two players: "Matcher" (row-player) and "Mismatcher" (column-player). Matcher and Mismatcher both have two strategies: "call head" (H) and "call tail" (T). Matcher wins 1 util if both players call the same [(H,H) or (T,T)], and Mismatcher wins 1 util if the players call differently [(H,T) or (T,H)]. It is easy to see that this game has no Nash equilibrium in pure strategies. However, it does have a Nash equilibrium mixed strategy:
    σ_r = (1/2, 1/2)  &  σ_c = (1/2, 1/2).

The pay-offs are

    u_r(σ) = (1/2 · 1/2)·1 + (1/2 · 1/2)·(−1) + (1/2 · 1/2)·(−1) + (1/2 · 1/2)·1 = 0,
    u_c(σ) = (1/2 · 1/2)·(−1) + (1/2 · 1/2)·1 + (1/2 · 1/2)·1 + (1/2 · 1/2)·(−1) = 0.
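A quick numerical check of this equilibrium (using the row-player's pay-off matrix; the variable names are illustrative):

```python
import numpy as np

# Matching-pennies pay-offs for the row-player; the column-player gets -A.
A = np.array([[1, -1], [-1, 1]], dtype=float)
sigma_r = np.array([0.5, 0.5])
sigma_c = np.array([0.5, 0.5])

u_r = sigma_r @ A @ sigma_c          # row-player's expected pay-off
u_c = -u_r                            # zero-sum game

# Best-response check: against sigma_c, every pure row earns the same
# pay-off, so no unilateral deviation improves on the mixed strategy.
pure_row_payoffs = A @ sigma_c
```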
Note that in the earlier discussion of the on-line learning strategy, we noted that the on-line learning algorithm is competitive [with a competitive factor of (ln 1/β)/(1 − β) = 1 + (1 − β)/2 + (1 − β)²/3 + ⋯, for small (1 − β)] over any sufficiently large time interval [0, T]. But it is also fairly easy to see that the probabilities that the row-player chooses do not necessarily converge to the best mixed strategy. Namely,

    W_T(i) = β^{Σ_t M(i, c_t)}  &  σ_{r,T}(i) = W_T(i) / Σ_i W_T(i).

We have not explicitly shown that lim_{T→∞} σ_{r,T} converges in distribution to σ_r*. Does the computed distribution converge to anything? In the absence of any convergence property, one may justifiably question how the algorithm can be interpreted as learning a strategy.
2.6.2 Irrationality
Matching Pennies
H T
H 1,-1 -1,1
T -1,1 1,-1
Suppose the column-player chooses a mixed strategy at time t such that σ_{c,t}(H) > 1/2 [and σ_{c,t}(T) = 1 − σ_{c,t}(H) < 1/2]; then for the row-player, the best response is BR_{r,t}(t) = H, and it is unique. By similar reasoning, if σ_{c,t}(H) < 1/2 [and σ_{c,t}(T) > 1/2], then for the row-player, the best response is BR_{r,t}(t) = T. Thus, if the rival deviates from his Nash equilibrium mixed strategy σ_{c,t} = (1/2, 1/2), then the row-player's (rational) best response is always a pure strategy, H or T. Thus, if the row-player had a convergent (rational) mixed strategy, then, depending on lim_{T→∞} {σ_{c,t}}₀^T, the row-player must converge to one of the following three (conventional) strategies:

1. random(1/2, 1/2) (the Nash equilibrium mixed strategy),
2. H (always H), or
3. T (always T).

Anything else would make the row-player irrational. Thus, a player playing the on-line learning algorithm must be almost always irrational!
If σ_{c,t} is not almost constant, then

    ∀σ_c = const:  σ_{c,t} ≠ σ_c infinitely often (i.o.).
strategy and its beliefs. In particular, let h_t = all publicly available information up to time t. Then, player i chooses its strategy and beliefs as follows:

    f_i : h_{t−1} ↦ σ_{i,t},
    f_{ij} : h_{t−1} ↦ σ_{ij,t}.

The learning process is informationally independent if the σ_{ij,t} = f_{ij}(h_{t−1}) do not depend on any extraneous information.

Definition 2.6.4 Convergence: The beliefs are said to converge along a learning path {h_t, σ_{i,t}, σ_{ij,t}}₀^∞ if

    ∀i≠j  ∃σ_{ij} ∈ Σ_j:  lim_{t→∞} σ_{ij,t} = σ_{ij}.

The strategies are said to converge along a learning path {h_t, σ_{i,t}, σ_{ij,t}}₀^∞ if

    ∀i  ∃σ_i ∈ Σ_i:  lim_{t→∞} σ_{i,t} = σ_i.
The beliefs are predictive if

    ∀i≠j:  lim_{t→∞} ( σ_{ij,t} − σ_{j,t} ) = 0,

and they are strongly predictive if in addition both the beliefs and strategies converge.
Theorem 2.6.1 Consider a finite 2-person game (players: row-player and column-player) with a strict (thus, unique) Nash equilibrium σ* = (σ_r*, σ_c*) which has full support on S_r × S_c. Let {(f_r, f_rc), (f_c, f_cr)} be a DRIP learning process (D = deterministic, R = rational, I = informationally independent and P = predictive). On any learning path (h_t, (σ_{r,t}, σ_{rc,t}), (σ_{c,t}, σ_{cr,t})), if the beliefs are not almost constant with value σ* then the beliefs do not converge.
Proof:
Assume to the contrary: then σ_{rc,t} ≠ σ_c* i.o. Then, infinitely often, σ_{rc,t} does not have full support and

    ∃s_{r,t} ∈ S_r:  s_{r,t} ∉ BR_r(σ_{rc,t}),

and, by the finiteness of the strategy set S_r (arguing symmetrically for the column-player as well),

    ∃s_c ∈ S_c:  lim_{t→∞} σ_{c,t}(s_c) = 0.

Since the learning is assumed to be predictive, we get

    lim_{t→∞} σ_{cr,t}(s_r) = 0  &  lim_{t→∞} σ_{rc,t}(s_c) = 0.
Thus, if the beliefs converge (say, to (σ̄_r, σ̄_c)), then the beliefs (and also the strategies, by predictivity) converge to some strategies other than the unique Nash equilibrium (as it is unique with full support). Hence one of the following two holds at the limit:

    ∃t_r ∈ S_r∖{s_r}:  σ̄_r(t_r) > 0 and t_r ∉ BR_r(σ̄_c),

or

    ∃t_c ∈ S_c∖{s_c}:  σ̄_c(t_c) > 0 and t_c ∉ BR_c(σ̄_r).
Chapter 3
Nash Equilibrium
3.1 Nash Equilibrium
    ∃x ∈ K:  x = f(x).

Theorem 3.1.2 Kakutani's Fixed Point Theorem: If Φ : K → 2^K is a convex-valued, uhc (upper hemi-continuous) map from a nonempty, compact, convex subset K of a finite-dimensional TVS to the nonempty subsets of K, then Φ has a fixed point, i.e.,

    ∃x ∈ K:  x ∈ Φ(x).
A TVS (topological vector space) is a vector space L with a T1 topology

    ( ∀x≠y ∈ L  ∃G_x open:  x ∈ G_x ∧ y ∉ G_x )

which admits continuous vector space operations. Example: ℝⁿ with the standard Euclidean topology. (Up to isomorphism, this is the only instance of a finite-dimensional Hausdorff TVS.)
Theorem 3.1.3 Existence of a Mixed Strategy Equilibrium (Nash 1950). Every finite strategic-form game has a mixed-strategy equilibrium.

Proof: Player i's reaction correspondence, Φ_i, maps each strategy profile σ to the set of mixed strategies that maximize player i's pay-off when his rivals play σ_{−i}:

    Φ_i(σ) = { σ_i′ ∈ Σ_i | ∀s_i ∈ S_i:  u_i(σ_i′, σ_{−i}) ≥ u_i(s_i, σ_{−i}) }.

Thus,

    Φ_i : Σ → 2^{Σ_i}.

Define

    Φ : Σ → 2^Σ : σ ↦ Π_i Φ_i(σ);

this correspondence map is the Cartesian product of the Φ_i's. A fixed point of Φ (if one exists) is a σ such that

    σ ∈ Φ(σ).

Note that then

    ∀s_i ∈ S_i:  u_i(σ_i, σ_{−i}) ≥ u_i(s_i, σ_{−i}),

by definition. Thus a fixed point of Φ provides a mixed strategy equilibrium σ.

Claims:

1. Σ = a nonempty, compact and convex subset of a TVS:

    Σ_i = Δ^{|S_i|−1} = the (|S_i| − 1)-dimensional simplex, since

    Σ_i = { (σ_{i,1}, …, σ_{i,|S_i|}) | σ_{i,j} ≥ 0, Σ_j σ_{i,j} = 1 }.

The rest follows since Σ = Π_i Σ_i.
2. u_i = a linear function. Hence u_i is a continuous function in his own mixed strategy. Since Σ is compact, u_i attains its maximum in Σ. Thus

    ∀σ ∈ Σ:  Φ(σ) ≠ ∅.

3. Φ has a closed graph: if

    lim_{n→∞} (σ^n, σ̂^n) = (σ, σ̂)  with  ∀n:  σ̂^n ∈ Φ(σ^n),

then σ̂ ∈ Φ(σ).

Thus, by Kakutani's fixed point theorem,

    ∃σ ∈ Σ:  σ ∈ Φ(σ),

and σ is a mixed strategy Nash equilibrium.
Chapter 4
Beyond Nash: Domination, Rationalization and Correlation
4.1 Beyond Nash
We have seen that it is impossible to "learn" a Nash equilibrium if we insist on the DRIP conditions. A resolution to this dilemma can involve one or more of the following approaches:

1. Explore simpler requirements than Nash equilibria: e.g., undominated sets, rationalizable sets and correlated equilibria. (The first two correspond to minmax and maxmin requirements. The last one requires some side information and may make the system informationally dependent.)
2. The requirement of predictivity may need to be abandoned.
3. The requirement of rationality may need to be abandoned.
    S_i^n = { s_i ∈ S_i^{n−1} | ∀σ_i′ ∈ Σ_i^{n−1}  ∃s_{−i} ∈ S_{−i}^{n−1}:  u_i(s_i, s_{−i}) ≥ u_i(σ_i′, s_{−i}) }.

Let

    S_i^∞ = ∩_{n=0}^∞ S_i^n

be the set of player i's pure strategies that survive iterated deletion of strictly dominated strategies. Let

    Σ_i^∞ = { σ_i ∈ Σ_i | ∀σ_i′ ∈ Σ_i  ∃s_{−i} ∈ S_{−i}^∞:  u_i(σ_i, s_{−i}) ≥ u_i(σ_i′, s_{−i}) }

be the set of player i's mixed strategies that survive iterated deletion of strictly dominated strategies.
Example:

         L      R
    U   1,3   -2,0
    M  -2,0    1,3
    D   0,1    0,1
Note that
    S_i^n = { s_i ∈ S_i^{n−1} | ∀σ_i′ ∈ Σ_i^{n−1}  ∃s_{−i} ∈ S_{−i}^{n−1}:  u_i(s_i, s_{−i}) ≥ u_i(σ_i′, s_{−i}) },

    Σ_i^n = { σ_i ∈ Σ_i^{n−1} | ∀σ_i′ ∈ Σ_i^{n−1}  ∃s_{−i} ∈ S_{−i}^{n−1}:  u_i(σ_i, s_{−i}) ≥ u_i(σ_i′, s_{−i}) },

and

    S_i^∞ = ∩_{n=0}^∞ S_i^n  &  Σ_i^∞ = ∩_{n=0}^∞ Σ_i^n.
Definition 4.2.2 A game is solvable by iterated (strict) dominance if, for each player i, S_i^∞ is a singleton, i.e., S_i^∞ = {s_i*}.
4.3 Rationalizability

This notion is due to Bernheim (1984), Pearce (1984) and Aumann (1987), and it provides a complementary approach to iterated strict dominance. This approach tries to answer the following question:

"What are all the strategies that a rational player can play?"

A rational player will only play those strategies that are best responses to some beliefs he has about his rivals' strategies.
    Σ̃_i^n = { σ_i ∈ Σ̃_i^{n−1} | ∃μ_{−i} ∈ Π_{j≠i} Conv(Σ̃_j^{n−1})  ∀σ_i′ ∈ Σ̃_i^{n−1}:  u_i(σ_i, μ_{−i}) ≥ u_i(σ_i′, μ_{−i}) },

    R_i = ∩_{n=0}^∞ Σ̃_i^n.
Equivalently, σ_i ∈ Σ̃_i^n iff

    ∃μ_{−i} ∈ Π_{j≠i} Conv(Σ̃_j^{n−1})  ∀σ_i′ ∈ Σ̃_i^{n−1}:  u_i(σ_i, μ_{−i}) ≥ u_i(σ_i′, μ_{−i}).

Finally,

    Σ̃_i^∞ = ∩_{n=0}^∞ Σ̃_i^n = R_i,  and  R = Π_i R_i.
        L      R
    U  5,1   0,0
    D  4,4   1,5

There are 3 Nash equilibria:
    σ_i : Ω → S_i : ω ↦ σ_i(ω),

such that σ_i(ω) = σ_i(ω′) if ω′ ∈ h_i(ω). The strategies are adapted to the information structure.

    ∀i ∀σ̃_i:  Σ_{ω∈Ω} p(ω) u_i(σ_i(ω), σ_{−i}(ω)) ≥ Σ_{ω∈Ω} p(ω) u_i(σ̃_i(ω), σ_{−i}(ω)).

Equivalently,

    ∀i ∀h_i ∈ H_i with p(h_i) > 0, ∀s_i ∈ S_i:
      Σ_{ω | h_i(ω)=h_i} p(ω|h_i) u_i(σ_i(ω), σ_{−i}(ω)) ≥ Σ_{ω | h_i(ω)=h_i} p(ω|h_i) u_i(s_i, σ_{−i}(ω)).
Definition 4.4.3 DEF(2) A correlated equilibrium is any probability distribution p(·) over the pure strategies S₁ × S₂ × ⋯ × S_I such that, for every player i and every function d_i : S_i → S_i,

    Σ_{s∈S} p(s) u_i(s_i, s_{−i}) ≥ Σ_{s∈S} p(s) u_i(d_i(s_i), s_{−i}).

Equivalently,

    ∀i ∀s_i ∈ S_i with p(s_i) > 0, ∀s_i′ ∈ S_i:
      Σ_{s_{−i}∈S_{−i}} p(s_{−i}|s_i) u_i(s_i, s_{−i}) ≥ Σ_{s_{−i}∈S_{−i}} p(s_{−i}|s_i) u_i(s_i′, s_{−i}).
Claim:

    Def(1) ⇒ Def(2).

Let

    J_i(s_i) = { ω | σ_i(ω) = s_i }.

Thus

    p(s_{−i} | s_i) = Σ_{ω ∈ J_i(s_i)} ( p̃(ω) / p̃(J_i(s_i)) ) σ_{−i}(ω).

It is the mixed strategy of the rivals that player i believes he faces, conditional on being told to play s_i, and it is a convex combination of the distributions conditional on each h_i such that σ_i(h_i) = s_i.
Chapter 5
Adaptive and Sophisticated
Learning
5.1 Adaptive and Sophisticated Learning
The idea of best-reply dynamics goes back all the way to Cournot's study of duopoly; it forms the foundation of Walrasian equilibrium in economics and is captured by the classical tâtonnement learning process.
The underlying learning processes can be categorized into successively stronger versions:
5.2 Set-up

Player n plays a sequence of plays {x_n(t)}. Each x_n(t) is a pure strategy and is chosen by the rules of player n's learning algorithm. We are interested in two properties that may be satisfied by {x_n(t)}: if it approximately follows best-reply dynamics, then it is consistent with adaptive learning; if it approximately follows fictitious-play dynamics, then it is consistent with sophisticated learning.

Definition 5.2.1 {x_n(t)} is consistent with adaptive learning: player n eventually chooses only strategies that are nearly
5.3 Formulation
Fact 1
Fact 2
    ∃T:  U_ε(T) ⊆ T.

Fact 3

If

    U_ε,k(T) ⊇ U_ε,k+1(T),

then

    U_ε(U_ε,k(T)) ⊇ U_ε(U_ε,k+1(T))

and

    U_ε,k+1(T) ⊇ U_ε,k+2(T).

Putting it all together, we do have

    S ⊇ U_ε(S) ⊇ U_ε,2(S) ⊇ ⋯ ⊇ U_ε,k(S) ⊇ U_ε,k+1(S) ⊇ ⋯
We then define

    U_ε,∞(S) = ∩_{k=0}^∞ U_ε,k(S).

Hence, U_0,∞(S) = lim_{ε→0} U_ε,∞(S) = the serially undominated strategy set. We say x is serially undominated if x ∈ U_0,∞(S).
    x_n(t) ∈ U_n^ε({ x(s) : t̂ ≤ s < t }).

A sequence of strategy profiles {x(t)} is consistent with adaptive learning if each {x_n(t)} has this property.

    F_ε,0(t̂, t) = U_ε({ x(s) : t̂ ≤ s < t }),

and, for all k ≥ 1,

    F_ε,k(t̂, t) = U_ε( F_ε,k−1(t̂, t) ∪ { x(s) : t̂ ≤ s < t } ).
Lemma 5.4.1

    F_ε,0(t̂, t) ⊆ F_ε,1(t̂, t) ⊆ ⋯ ⊆ F_ε,k(t̂, t) ⊆ F_ε,k+1(t̂, t) ⊆ ⋯

Proof
By the monotonicity of U_ε,

    F_ε,0(t̂, t) ⊆ F_ε,1(t̂, t).

Assume, by the inductive hypothesis,

    F_ε,k−1(t̂, t) ⊆ F_ε,k(t̂, t).

Then

    U_ε( F_ε,k−1(t̂, t) ∪ { x(s) : t̂ ≤ s < t } ) ⊆ U_ε( F_ε,k(t̂, t) ∪ { x(s) : t̂ ≤ s < t } ).

Thus

    F_ε,k(t̂, t) ⊆ F_ε,k+1(t̂, t).
Chapter 6
Learning à la Milgrom and Roberts

6.1 Adaptive Learning and Undominated Sets
Example: Battle of the Sexes

    W∖M            Ballet (B)   Football (F)
    Ballet (B)        2,1           0,0
    Football (F)      0,0           1,2
Thus

    F_ε,0(t̂, t) = { (B, F) }.

Similarly,

    F_ε,1(t̂, t) = U_ε( F_ε,0(t̂, t) ∪ { x(s) : t̂ ≤ s < t } ).
6.2 Convergence

    G ∈ Δ(S),

if (1) and (2) hold:

1. G_n^t converges weakly to the marginal distribution G_n for all n.
2.

Definition 6.2.2 A sequence {x(t)} converges omitting correlation to a mixed strategy Nash equilibrium if
    ∀y_n ∈ G_n  ∃z_{−n} ∈ G_{−n}

    x_n(t) ∈ U_n^ε({ x(s) | t̂ ≤ s < t }).
Theorem 6.2.3 Let {x(t)} be consistent with sophisticated learning. Then for each ε > 0 and k ∈ ℕ there exists a time t_k after which (i.e., for t ≥ t_k)

    x(t) ∈ U_ε,k(S).

Proof Sketch:
Fix ε > 0. Define t_k ≡ t_k^ε (a change in notation).
Case k = 0: t₀ = 0, and x(t) ∈ U_ε(S).

Case k = j + 1: By the inductive hypothesis there exists a t_j such that

    ∀t ≥ t_j:  x(t) ∈ U_ε,j(S).

Hence

    { x(s) | t_j ≤ s < t } ⊆ U_ε,j(S).

Claim:

    F_ε,∞(t_j, t) ⊆ U_ε,j+1(S),

or, equivalently,

    ∀i:  F_ε,i(t_j, t) ⊆ U_ε,j+1(S).

For i = 0,

    F_ε,0(t_j, t) = U_ε({ x(s) | t_j ≤ s < t }) ⊆ U_ε(U_ε,j(S)) = U_ε,j+1(S),

and, inductively,

    F_ε,i+1(t_j, t) = U_ε( F_ε,i(t_j, t) ∪ { x(s) | t_j ≤ s < t } )
                    ⊆ U_ε( U_ε,j+1(S) ∪ U_ε,j(S) )
                    = U_ε( U_ε,j(S) ) = U_ε,j+1(S).
Finally,

    ∩_{ε>0} ∩_k U_ε,k(S) = ∩_k U_0,k(S) = U_0,∞(S).
learning, and let S_n^∞ be the set of strategies that are played infinitely often in {x_n(t)}. Then

    S^∞ = Π_{n∈N} S_n^∞ ⊆ ∩_{ε>0} ∩_k U_ε,k(S) = U_0,∞(S).
Since π is continuous,

    ‖x(t) − x*‖ → 0.

Indeed, ∀ε>0 ∃t_ε ∀t>t_ε ∀n∈N:

    π_n(x_n(t), x_{−n}(t)) − max{ π_n(y_n, x_{−n}(t)) | y_n ∈ S_n }
      > [π_n(x*) − ε/2] − [ max{ π_n(y_n, x*_{−n}) | y_n ∈ S_n } + ε/2 ] = −ε,

so that

    x_n(t) ∈ U_n^ε({ x(s) | t_ε ≤ s < t }).

Hence

    x* ∈ ∩_{ε>0} ∩_{k=1}^∞ U_ε,k(S) = U_0,∞(S) = {x*}
      ⇒ ‖x(t) − x*‖ → 0.
Claim:
Let T = the set of strategy profiles.

    2ε/|S_n|,
    2^{2εM(t)}/|S_n|.
Chapter 7
Information Theory and Learning

7.1 Information Theory and Games

7.1.1 Basic Concepts
Definition 7.1.1 Entropy is a measure of the uncertainty of a random variable. Let X be a discrete random variable with alphabet 𝒳 and

    p(x) = Pr[X = x], where x ∈ 𝒳.

The entropy H(X) of the discrete random variable X is defined as

    H(X) = E_p[ lg (1/p(X)) ] = −Σ_{x∈𝒳} p(x) lg p(x).
Facts

1. H(X) ≥ 0. Entropy is always nonnegative: 0 ≤ p(x) ≤ 1, so lg(1/p(x)) ≥ 0 and hence E_p[lg(1/p(x))] ≥ 0.
2. H(X) ≤ lg |𝒳|. Consider the uniform distribution u(x): ∀x∈𝒳, u(x) = 1/|𝒳|, and H(u) = Σ_x (1/|𝒳|) lg |𝒳| = lg |𝒳|.
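A minimal sketch of the definition and of Facts 1-2 (names are illustrative):

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum p(x) lg p(x), in bits.
    p: an iterable of probabilities summing to 1; zero entries
    contribute 0 by the usual convention 0 lg 0 = 0."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# A fair coin has one bit of entropy; a fair 8-sided die has lg 8 = 3
# bits, the maximum over all distributions on 8 symbols; any skewed
# distribution on 8 symbols has strictly less.
h_coin = entropy([0.5, 0.5])
h_die  = entropy([1/8] * 8)
h_skew = entropy([0.9] + [0.1 / 7] * 7)
```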
Conditional Entropy:

    H(Y|X) = E_p[ lg (1/p(Y|X)) ]
           = −Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) lg p(y|x)
           = −Σ_{x∈𝒳} p(x) Σ_{y∈𝒴} p(y|x) lg p(y|x)
           = Σ_{x∈𝒳} p(x) H(Y|x).
Corollary 7.2.1
1. H(X, Y|Z) = H(X|Z) + H(Y|X, Z).
2. H(X) + H(Y|X) = H(Y) + H(X|Y) ⇒ H(X) − H(X|Y) = H(Y) − H(Y|X).
3. Note that, in general, H(X|Y) ≠ H(Y|X).
The relative entropy between p and the uniform distribution u is

    D(p‖u) = Σ_x ( p(x) lg p(x) + p(x) lg |𝒳| ) = lg |𝒳| − H(X).
Let X and Y be two discrete random variables with joint probability mass function p(x, y) and marginal probability mass functions

    p(x) = Σ_{y∈𝒴} p(x, y)  &  p(y) = Σ_{x∈𝒳} p(x, y).
Mutual Information:

    I(X; Y) = D( p(x, y) ‖ p(x)p(y) )
            = E_{p(x,y)}[ lg ( p(x, y) / (p(x)p(y)) ) ]
            = Σ_{x∈𝒳} Σ_{y∈𝒴} p(x, y) lg ( p(x, y) / (p(x)p(y)) )
            = H(X) + H(Y) − H(X, Y)
            = H(X) + H(Y) − ( H(Y) + H(X|Y) )
            = H(X) − H(X|Y) = H(Y) − H(Y|X) = I(Y; X).

Thus

    H(X) − H(X|Y) = I(X; Y) = H(Y) − H(Y|X) = I(Y; X),
    I(X; X) = H(X) − H(X|X) = H(X),
    I(X; Y) = I(Y; X) = H(X) + H(Y) − H(X, Y).
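The identity I(X; Y) = H(X) + H(Y) − H(X, Y) and its relative-entropy form can be checked numerically on a small assumed joint distribution:

```python
import math

def H(p):
    """Entropy in bits of a distribution given as an iterable of probs."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# A toy joint distribution p(x, y) on {0,1} x {0,1} with correlation.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = [0.5, 0.5]                          # marginal of X
py = [0.5, 0.5]                          # marginal of Y

I = H(px) + H(py) - H(pxy.values())      # I(X;Y) = H(X)+H(Y)-H(X,Y)

# The same quantity as D( p(x,y) || p(x)p(y) ):
D = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())
```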
    I(X₁, X₂, …, X_n; Y)
      = H(X₁, …, X_n) − H(X₁, …, X_n | Y)
      = Σ_{i=1}^n H(X_i | X₁, …, X_{i−1}) − Σ_{i=1}^n H(X_i | X₁, …, X_{i−1}, Y)
      = Σ_{i=1}^n [ H(X_i | X₁, …, X_{i−1}) − H(X_i | X₁, …, X_{i−1}, Y) ]
      = Σ_{i=1}^n I(X_i; Y | X₁, …, X_{i−1}).
    D( p(x, y) ‖ q(x, y) )
      = Σ_x Σ_y p(x, y) lg ( p(x, y) / q(x, y) )
      = Σ_x Σ_y p(x, y) lg ( (p(x)/q(x)) · (p(y|x)/q(y|x)) )
      = Σ_x Σ_y p(x, y) lg ( p(x)/q(x) ) + Σ_x Σ_y p(x, y) lg ( p(y|x)/q(y|x) )
      = Σ_x p(x) lg ( p(x)/q(x) ) + Σ_x p(x) Σ_y p(y|x) lg ( p(y|x)/q(y|x) )
      = D( p(x) ‖ q(x) ) + D( p(y|x) ‖ q(y|x) ).
Corollary 7.5.2

    I(X; Y) = D( p(x, y) ‖ p(x)p(y) ) ≥ 0,
    D(p ‖ u) = lg |𝒳| − H(X) ≥ 0.
Hence,

    H(X) ≤ lg |𝒳|

(with equality iff X has a uniform distribution over 𝒳), and

    I(X; Y) = H(X) − H(X|Y) ≥ 0.

Theorem 7.5.3

    H(X|Y) ≤ H(X).

Conditioning reduces entropy.
    H(X₁, …, X_n) = Σ_{i=1}^n H(X_i | X₁, …, X_{i−1}) ≤ Σ_{i=1}^n H(X_i).

Corollary 7.5.4

    H(X₁, …, X_n) ≤ Σ_{i=1}^n H(X_i),
Stationary:

    Pr[X_n | X₁, …, X_{n−1}] = Pr[X_{n+1} | X₂, …, X_n].
    D( p(x_n) ‖ q(x_n) ) ≥ D( p(x_{n+1}) ‖ q(x_{n+1}) ).
    p_i = Pr[H_i wins],
    u_i = pay-off if H_i wins.

If b_i = the bet on the i-th horse, then the pay-off is

    b_i u_i,  if H_i wins (with probability p_i);
    0,        if H_i loses (with probability 1 − p_i).

Assume that the gambler has 1 dollar. Let b_i = the fraction of his wealth invested in H_i. Thus

    Σ_{i=1}^m b_i = 1,  0 ≤ b_i ≤ 1.
    S(X) = b(X) u(X) = the factor by which the gambler increases his wealth if X wins.

Repeated game with reinvestment:

    S₀ = 1,  S_n = S_{n−1} S(X_n).

Thus

    S_n = Π_{i=1}^n S(X_i) = 2^{Σ_i lg S(X_i)}.

Let

    W(b, p) = E_p[lg S(X)] = Σ_{k=1}^m p_k lg(b_k u_k)

be the doubling rate; then, by the law of large numbers,

    S_n → 2^{n W(b, p)}.
    W(b, p) = Σ_k p_k lg(b_k u_k)
            = Σ_k p_k [ lg(b_k/p_k) + lg p_k + lg u_k ]
            = Σ_k p_k lg u_k − H(p) − D(p‖b)
            ≤ Σ_k p_k lg u_k − H(p),

with equality iff p = b.
The optimal doubling rate

    W*(p) = max_b W(b, p) = W(p, p) = Σ_k p_k lg u_k − H(p)

is achieved by proportional gambling, b* = p.
Comparing the doubling rates of two portfolios b and r,

    W(b, p) − W(r, p) = Σ_k p_k lg( b_k / r_k )
                      = Σ_k p_k [ lg( b_k / p_k ) − lg( r_k / p_k ) ]
                      = D(p‖r) − D(p‖b).
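Since W(p, p) − W(b, p) = D(p‖b) ≥ 0, any b ≠ p loses doubling rate. A small sketch with an assumed fair-odds race:

```python
import math

def doubling_rate(b, p, u):
    """W(b, p) = sum_k p_k lg(b_k u_k): exponential growth rate of
    wealth when betting fractions b at odds u with win probabilities p."""
    return sum(pk * math.log2(bk * uk) for pk, bk, uk in zip(p, b, u))

# Fair odds u_k = 1/p_k for a 3-horse race (illustrative numbers).
p = [0.5, 0.25, 0.25]
u = [2.0, 4.0, 4.0]

w_kelly = doubling_rate(p, p, u)                # proportional gambling b = p
w_other = doubling_rate([1/3, 1/3, 1/3], p, u)  # any other b does worse
```

At fair odds, W*(p) = Σ p_k lg u_k − H(p) = 0, so proportional gambling exactly preserves wealth in the long run, while every other b shrinks it.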
For uniform fair odds (u_i = m),

    W*(p) + H(p) = lg m.

If the gambler is allowed to withhold a fraction b₀ of his wealth as cash,

    S(X) = b₀ + b(X) u(X).

Fair Odds: Σ_i 1/u_i = 1.
If there is a non-fully-invested strategy with b₀, b₁, …, b_m, then there is also an equivalent full investment, as follows:

    b₀′ = 0,
    b_i′ = b_i + b₀/u_i,  1 ≤ i ≤ m,

so that

    Σ_{i=1}^m b_i′ = Σ_{i=1}^m b_i + b₀ Σ_{i=1}^m 1/u_i = 1.

Thus

    S(X) = b′(X) u(X) = (b₀/u(X)) u(X) + b(X) u(X) = b₀ + b(X) u(X).

Thus, in this case, holding cash is a risk-neutral investment.
Super-Fair Odds: Σ_i 1/u_i < 1.

Betting b_i = (1/u_i) / Σ_j (1/u_j) yields, for every outcome X,

    S(X) = b(X) u(X) = 1 / Σ_j (1/u_j) > 1,

with no risk! This, however, implies a strong arbitrage opportunity.

Sub-Fair Odds: Σ_i 1/u_i > 1.

In this case, proportional gambling is no longer log-optimal, and this case represents a risky undertaking for the gambler.
With no side information, the optimal doubling rate is

    W*(X) = max_{b(x)} Σ_x p(x) lg( b(x) u(x) ) = Σ_x p(x) lg u(x) − H(X),

and with side information Y,

    W*(X|Y) = max_{b(x|y)} Σ_x p(x) lg( b(x|y) u(x) ) = Σ_x p(x) lg u(x) − H(X|Y).

Thus

    ΔW = W*(X|Y) − W*(X)
       = [ Σ_x p(x) lg u(x) − H(X|Y) ] − [ Σ_x p(x) lg u(x) − H(X) ]
       = H(X) − H(X|Y) = I(X; Y) ≥ 0.

Increase in Doubling Rate = the mutual information between the horse race and the side information.
7.9 Learning
we have

    (1/n) E[lg S_n*] = (1/n) Σ_i E[lg S*(X_i)]
                     = (1/n) Σ_i ( lg m − H(X_i | X₁, …, X_{i−1}) )
                     = lg m − H(X₁, …, X_n)/n
                     → lg m − H(X),

where H(X) here denotes the entropy rate of the process.
Chapter 8
Universal Portfolio
8.1 Universal Portfolio
1. Sequential Portfolio Selection Procedure. An adapted process.
2. No statistical assumption about the behavior of the market.
3. Robust procedure with respect to arbitrary market sequences occurring in the real world.
We shall consider growth of wealth for arbitrary market sequences. For example, our goal may be to outperform the best buy-and-hold strategy; i.e., we wish to be competitive against a competing investor who can predict the n future days. A different goal may be to outperform all constant rebalanced portfolio strategies.
    x = (x₁, x₂, …, x_m)ᵀ = the stock market vector.
8.1.1 Portfolio

    b = (b₁, b₂, …, b_m)ᵀ,  b_i ≥ 0,  Σ_i b_i = 1  (= portfolio).

A portfolio is simply the proportion of the current wealth invested in each of the stocks. The wealth factor for a single day is

    S = b · x = bᵀx = Σ_i b_i x_i.

Also define

    S_n* = max_b S_n(b) = S_n(b*).

This is the maximum wealth achievable on the given stock sequence, maximized over all constant rebalanced portfolios.
8.2.1 Questions

Since we wish to compete against a clairvoyant investor (who knows the future), and universal portfolios depend only on the past (the past has no causal or correlated relation with the future), how is it possible that the universal portfolio can be competitive?

    b̂(1) = (1/m, 1/m, …, 1/m)ᵀ.

    S_k(b) = Π_{i=1}^k bᵀx(i),  B = { b ∈ ℝ₊^m | b_i ≥ 0, Σ_i b_i = 1 }.

    b̂(k+1) = ∫_B b S_k(b) db / ∫_B S_k(b) db.

Note that

    b̂(k+1)ᵀ x(k+1) = ∫_B S_{k+1}(b) db / ∫_B S_k(b) db.
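Cover's update above can be sketched for m = 2 by discretizing the integral over B on a grid of portfolios b = (w, 1−w); the grid approximation and the toy market below are assumptions of this illustration:

```python
import numpy as np

def universal_portfolio(xs, grid=1000):
    """Universal portfolio for m = 2 stocks, with the integrals over B
    approximated on a uniform grid. xs: sequence of price-relative
    vectors x(k). Returns (wealth achieved, wealth of the best
    constant rebalanced portfolio on the grid)."""
    w = np.linspace(0.0, 1.0, grid + 1)
    B = np.stack([w, 1.0 - w], axis=1)     # grid over the simplex
    S = np.ones(grid + 1)                   # S_k(b) on the grid
    wealth = 1.0
    for x in xs:
        b_hat = (S[:, None] * B).sum(axis=0) / S.sum()  # b(k+1)
        wealth *= b_hat @ x                 # realized daily factor
        S *= B @ x                          # update S_{k+1}(b)
    return wealth, S.max()

# A toy two-stock market: stock 1 alternately doubles and halves,
# stock 2 is cash; buy-and-hold of either single stock only breaks even.
xs = [np.array([2.0, 1.0]), np.array([0.5, 1.0])] * 10
wealth, best = universal_portfolio(xs)
```

By the telescoping identity above, the achieved wealth equals the grid average of S_n(b), so it grows whenever some constant rebalanced portfolio grows.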
The bound to be established is, approximately,

    Ŝ_n ≈ S_n* · (m−1)! (2π/n)^{(m−1)/2} / √|J_n|,

where

    S_n(b) = Π_{i=1}^n bᵀx(i) = 2^{n W(b, F_n)}  and  S_n* = max_b S_n(b) = 2^{n W*(F_n)}.
For the j-th unit vector e_j (1 in the j-th position only),

    S_n(e_j) = Π_{k=1}^n e_jᵀ x(k) = Π_{k=1}^n x_j(k).
Corollary 8.3.1

1. Target Exceeds Best Stock:

    S_n* ≥ max_j S_n(e_j).

2. Target Exceeds Value Line:

    S_n* ≥ ( Π_j S_n(e_j) )^{1/m}.

More generally,

    S_n* ≥ Σ_j α_j S_n(e_j),  α_j ≥ 0,  Σ_j α_j = 1.
Lemma 8.3.2

    Ŝ_n = Π_{k=1}^n b̂(k)ᵀ x(k) = ∫_B S_n(b) db / ∫_B db,

where

    S_n(b) = Π_{i=1}^n bᵀ x(i).

Proof:

    b̂(k+1)ᵀ x(k+1) = ∫ S_{k+1}(b) db / ∫ S_k(b) db.

Telescoping the products,

    Ŝ_n = Π_{k=1}^n b̂(k)ᵀ x(k)
        = ( ∫ S_n(b) db / ∫ S_{n−1}(b) db ) ⋯ ( ∫ S_1(b) db / ∫ db )
        = ∫_B S_n(b) db / ∫_B db
        = ∫_B Π_{i=1}^n bᵀx(i) db / ∫_B db
        = E_b[S_n(b)] = E_b[2^{n W(b, F_n)}].
Claim

    E_b[W(b, F_n)] ≥ (1/m) Σ_j W(e_j, F_n).
Indeed, with b uniform on B (so that E_b[b_j] = 1/m),

    E_b[W(b, F_n)] = E_b ∫ lg( Σ_j b_j (e_jᵀx) ) dF_n(x)
                   ≥ E_b Σ_j ∫ b_j lg(e_jᵀx) dF_n(x)
                   = (1/m) Σ_j ∫ lg(e_jᵀx) dF_n(x)
                   = (1/m) Σ_j W(e_j, F_n),

by concavity of lg.
By Jensen's inequality,

    E_b[2^{n W(b, F_n)}] ≥ 2^{n E_b[W(b, F_n)]}
                         ≥ 2^{(n/m) Σ_j W(e_j, F_n)}
                         = Π_j 2^{n W(e_j, F_n)/m}.

Thus

    Ŝ_n = E_b[S_n(b)] = E_b[2^{n W(b, F_n)}] ≥ Π_{j=1}^m 2^{n W(e_j, F_n)/m} = ( Π_j S_n(e_j) )^{1/m}.
8.4 Competitiveness

F_n(x) = the empirical probability mass function, placing mass 1/n on each x(i) ∈ ℝ₊^m.

    S_n(b) = Π_{i=1}^n bᵀ x(i),  b*(F_n) = b*.
    C = { (c₁, c₂, …, c_{m−1}) | c_i ≥ 0, Σ_i c_i ≤ 1 },

    b(c) = ( c₁, …, c_{m−1}, 1 − Σ_{i=1}^{m−1} c_i ),

    V_n(c) = (1/n) Σ_{i=1}^n ln( b(c)ᵀ x(i) ) = ∫ ln(bᵀx) dF_n(x) = E_{F_n}[ ln(bᵀx) ].
    |J| > 0.
A Taylor expansion about the optimum gives

    n V_n(c) = n V_n(c*) − (1/2) uᵀ J_n u + O(‖u‖³),

where u = c − c*. We thus conclude that

    (1/n) lg( S_n* / Ŝ_n ) = (1/n) lg( |J_n|^{1/2} / ( (m−1)! (2π/n)^{(m−1)/2} ) ) → 0,  as n → ∞.

Summarizing, we have

    (1/n) lg S_n* − (1/n) lg Ŝ_n → 0,  i.e.,  V_n* − V̂_n → 0.
Chapter 9
Portfolios and Markets
9.1 Portfolio Theory
9.1.1 Itô Calculus
    dX_i / X_i = μ_i dt + Σ_{j=1}^m σ_ij dZ_j.
Example

    dX / X = μ dt + σ dZ.

Let f(X) = ln X. Then

    ∂f/∂X = 1/X  &  ∂²f/∂X² = −1/X².

By Itô's formula,

    df = (∂f/∂X) dX + (1/2)(∂²f/∂X²) (dX)²
       = dX/X − (1/2)(σ²X² dt)/X²
       = dX/X − (σ²/2) dt.

So

    d(ln X) = dX/X − (σ²/2) dt,
    dX/X = d(ln X) + (σ²/2) dt.

Integrating,

    ∫₀ᵗ dX/X = ∫₀ᵗ d(ln X) + (1/2) ∫₀ᵗ σ² dt
             = ln X(t) − ln X(0) + (1/2) ∫₀ᵗ σ² dt,

so that

    exp( ∫₀ᵗ dX/X ) = ( X(t) / X(0) ) exp( (1/2) ∫₀ᵗ σ² dt ).
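A Monte Carlo check of the Itô correction above, using an assumed parameter choice and a plain Euler scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, T, steps, paths = 0.10, 0.30, 1.0, 250, 50_000
dt = T / steps

# Euler scheme for dX/X = mu dt + sigma dZ, started at X(0) = 1.
X = np.ones(paths)
for _ in range(steps):
    X *= 1.0 + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(paths)

# By the Ito correction derived above, E[ln X(t)] = (mu - sigma^2/2) t,
# not mu * t: the log-price drifts more slowly than the price.
sample_drift = np.log(X).mean() / T
target = mu - 0.5 * sigma**2            # 0.055
```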
For a constant rebalanced portfolio b,

    dS / S = Σ_i b_i dX_i / X_i.

Hence, writing Σ = σσᵀ for the instantaneous covariance matrix,

    d(ln S) = Σ_i b_i d(ln X_i) − (1/2) bᵀΣb dt + (1/2) Σ_i b_i Σ_ii dt,

so

    ln( S(t, b) / S(0) ) = Σ_i b_i ln( X_i(t) / X_i(0) ) − (1/2) bᵀΣ̄b + (1/2) Σ_i Σ̄_ii b_i,

where ⋅̄ denotes ∫₀ᵗ ⋅(s) ds. Thus

    S(t, b) = S(0) Π_{i=1}^m ( X_i(t) / X_i(0) )^{b_i} exp( −(1/2) bᵀΣ̄b + (1/2) Σ_i Σ̄_ii b_i ).
    S*(t) = max_{b∈B} S(t, b) = S(t, b*(t)).

Note that b*(t) = the optimal solution of the following quadratic programming problem:

    max_{b∈B}  Σ_{i=1}^m ( ln( X_i(t) / X_i(0) ) + (1/2) Σ̄_ii ) b_i − (1/2) bᵀΣ̄b.

Lemma 9.2.1 If V = the covariance matrix is positive definite, then the portfolio problem has a unique optimal solution.
    ∀t:  |E[X(t)]| < ∞,
    lim_{t→∞} E[X(t)]/t exists,
    X(t)/t → (the limit) in probability as t → ∞.

The stock market model is weakly regular (easily satisfied if the market is stationary) if

    ∀t:  |E[Σ̄(t)]| < ∞  &  |E[ln X(t)]| < ∞,
    lim_{t→∞} E[Σ̄(t)]/t = Σ_∞  &  lim_{t→∞} E[ln X(t)]/t = μ_∞ exist,
    Σ̄(t)/t → Σ_∞  &  ln X(t)/t → μ_∞ in probability as t → ∞.
Note that

    dX_i / X_i = μ_i dt + Σ_j σ_ij dZ_j.

Thus

    d(ln X_i) = dX_i / X_i − (dX_i)² / (2X_i²)
              = ( μ_i − (1/2) Σ_ii ) dt + Σ_j σ_ij dZ_j,

and, in the limit,

    ν_∞,i = μ_∞,i − (1/2) Σ_∞,ii,  i.e.,  μ_∞,i = ν_∞,i + (1/2) Σ_∞,ii.

Similarly,

    dS / S = Σ_i b_i μ_i dt + Σ_i Σ_j b_i σ_ij dZ_j,
    d(ln S) = ( bᵀμ − (1/2) bᵀΣb ) dt + Σ_i Σ_j b_i σ_ij dZ_j,

and

    r(b) = lim_{t→∞} E[ln S(t, b)] / t = −(1/2) bᵀΣ_∞ b + bᵀμ_∞.

Recall

    S(t, b) = S(0) Π_{i=1}^m ( X_i(t) / X_i(0) )^{b_i} exp( −(1/2) bᵀΣ̄b + (1/2) Σ_i Σ̄_ii b_i ).

Define

    V_ij(t) = Σ̄_ij − Σ̄_im − Σ̄_jm + Σ̄_mm,
    θ_i = ln( X_i(t) / X_i(0) ) − ln( X_m(t) / X_m(0) ) + V_ii(t)/2.
Notation:

    β(t) = V^{−1}(t) θ(t),

    S*(t) = S(0) ( X_m(t) / X_m(0) ) e^{θᵀ V^{−1} θ / 2},

and

    S(t, b) = S*(t) e^{−(1/2) (b′ − β)ᵀ V (b′ − β)},

where b′ = (b₁, …, b_{m−1}). For a weakly regular market, the limits

    lim_{t→∞} V(t)/t = J_∞  and  lim_{t→∞} θ_i(t)/t = θ_∞,i

exist.
Since

    r(b) = −(1/2) bᵀΣ_∞ b + bᵀμ_∞ = −(1/2) b′ᵀ J_∞ b′ − b′ᵀ θ_∞ + (terms independent of b),

it is maximized at

    β_∞ = (J_∞)^{−1} θ_∞.
Hence, for all t,

    Ŝ(t) = ∫_B S(t, b) db / ∫_B db.
9.3.1 Competitiveness

    S(t, b) = S*(t) e^{−(1/2) (b′ − β)ᵀ V (b′ − β)}.

Let x = V^{1/2}(t) (b′ − β). Thus

    Ω(t) = V^{1/2}(t) (B′ − β),

where

    B′ = { b′ ∈ ℝ^{m−1} | b_i′ ≥ 0, Σ_i b_i′ < 1 }.

Note that

    Vol(B′) = 1 / (m−1)!.

We have

    Ŝ(t) / S*(t) = ( ∫_{Ω(t)} e^{−|x|²/2} dx / |V(t)|^{1/2} ) / ( 1 / (m−1)! )
                 = (m−1)! ∫_{Ω(t)} e^{−|x|²/2} dx / |V(t)|^{1/2}
                 ≈ (m−1)! (2π)^{(m−1)/2} / ( |J_∞|^{1/2} t^{(m−1)/2} ).
Thus,

    (1/t) ln( Ŝ(t) / S*(t) ) = ( C(m) − C′(m) ln t ) / t → 0,

and the universal portfolio again achieves, asymptotically, the growth rate of the best constant rebalanced portfolio.
Bibliography

Text Books

[1] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 1991. ISBN 0-471-06259-6.

[5] H. M. Markowitz. Mean-Variance Analysis in Portfolio Choice and Capital Markets. Blackwell, 1991. ISBN 0-631-17854-6.

Popular Books

[6] William Poundstone. Prisoner's Dilemma. Doubleday, 1992.

[7] Anatol Rapoport. Two-Person Game Theory: The Essential Ideas. Ann Arbor Science Paperbacks, University of Michigan Press, 1966.

[8] Karl Sigmund. Games of Life: Explorations in Ecology, Evolution and Behaviour. Oxford University Press, 1993.