
June 16, 2010

ALGORITHMIC TRADING WITH MARKOV CHAINS


HENRIK HULT AND JONAS KIESSLING
Abstract. An order book consists of a list of all buy and sell offers, represented by price and quantity, available to a market agent. The order book changes rapidly, within fractions of a second, due to new orders being entered into the book. The volume at a certain price level may increase due to limit orders, i.e. orders to buy or sell placed at the end of the queue, or decrease because of market orders or cancellations.
In this paper a high-dimensional Markov chain is used to represent the state and evolution of the entire order book. The design and evaluation of optimal algorithmic strategies for buying and selling is studied within the theory of Markov decision processes. General conditions are provided that guarantee the existence of optimal strategies. Moreover, a value-iteration algorithm is presented that enables finding optimal strategies numerically.
As an illustration a simple version of the Markov chain model is calibrated to high-frequency observations of the order book in a foreign exchange market. In this model, using an optimally designed strategy for buying one unit provides a significant improvement, in terms of the expected buy price, over a naive buy-one-unit strategy.
1. Introduction
The appearance of high frequency observations of the limit order book in order driven markets has radically changed the way traders interact with financial markets. With trading opportunities existing only for fractions of a second it has become essential to develop effective and robust algorithms that allow for instantaneous trading decisions.
In order driven markets there are no centralized market makers; rather, all participants have the option to provide liquidity through limit orders. An agent who wants to buy a security is therefore faced with an array of options. One option is to submit a market order, obtaining the security at the best available ask price. Another alternative is to submit a limit order at a price lower than the ask price, hoping that this order will eventually be matched against a market sell order. What is the best alternative? The answer will typically depend both on the agent's view of current market conditions and on the current state of the order book. With new orders being submitted at a very high frequency the optimal choice can change in a matter of seconds or even fractions of a second.
In this paper the limit order book is modelled as a high-dimensional Markov chain where each coordinate corresponds to a price level and the state of the Markov chain represents the volume of limit orders at every price level. For this model many tools from applied probability are available to design and evaluate the performance of different trading strategies. Throughout the paper the emphasis will be on what we call buy-one-unit strategies and making-the-spread strategies. In the first case an
agent wants to buy one unit of the underlying asset. Here one unit can be thought of as an order of one million EUR on the EUR/USD exchange. In the second case an agent is looking to earn the difference between the buy and sell price, the spread, by submitting a limit buy order and a limit sell order simultaneously, hoping that both orders will be matched against future market orders.
Consider strategies for buying one unit. A naive buy-one-unit strategy is executed as follows. The agent submits a limit buy order and waits until either the order is matched against a market sell order or the best ask level reaches a predefined stop-loss level. The probability of the agent's limit order being executed, as well as the expected buy price, can be computed using standard potential theory for Markov chains. It is not optimal to follow the naive buy-one-unit strategy. For instance, if the order book moves to a state where the limit order has a small probability of being executed, the agent would typically like to cancel and replace the limit order either by a market order or by a new limit order at a higher level. Similarly, if the market is moving in favor of the agent, it might be profitable to cancel the limit order and submit a new limit order at a lower price level. Such more elaborate strategies are naturally treated within the framework of Markov decision processes. We show that, under certain mild conditions, optimal strategies always exist and that the optimal expected buy price is unique. In addition a value-iteration algorithm is provided that is well suited to finding and evaluating optimal strategies numerically. Sell-one-unit strategies can of course be treated in precisely the same way as buy-one-unit strategies, so only the latter will be treated in this paper.
In the final part of the paper we apply the value-iteration algorithm to find close to optimal buy strategies in a foreign exchange market. This provides an example of the proposed methodology, which consists of the following steps:
(1) parametrize the generator matrix of the Markov chain representing the order book,
(2) calibrate the model to historical data,
(3) compute optimal strategies for each state of the order book,
(4) apply the model to make trading decisions.
The practical applicability of the method depends on the possibility to make sufficiently fast trading decisions. As the market conditions vary there is a need to recalibrate the model regularly. For this reason it is necessary to have fast calibration and computational algorithms. In the simple model presented in Sections 5 and 6 the calibration (step 2) is fast and the speed is largely determined by how fast the optimal strategy is computed. In this example the buy-one-unit strategy is studied and the computation of the optimal strategy (step 3) took roughly ten seconds on an ordinary notebook, using Matlab. Individual trading decisions can then be made in a few milliseconds (step 4).
Today there is an extensive literature on order book dynamics. In this paper the primary interest is in short-term predictions based on the current state and recent history of the order book. The content of this paper is therefore quite similar in spirit to [4] and [2]. This is somewhat different from studies of market impact and its relation to the construction of optimal execution strategies of large market orders through a series of smaller trades. See for instance [13] and [1].
Statistical properties of the order book are a popular topic in the econophysics literature. Several interesting studies have been written over the years, two of
which we mention here. In the enticingly titled paper "What really causes large price changes?", [7], the authors claim that large changes in share prices are not due to large market orders. They find that statistical properties of prices depend more on fluctuations in revealed supply and demand than on their mean behavior, highlighting the importance of models taking the whole order book into account. In [3], the authors study certain static properties of the order book. They find that limit order prices follow a power law around the current price, suggesting that market participants believe in the possibility of very large price variations within a rather short time horizon. It should be pointed out that the mentioned papers study limit order books for stock markets.
Although the theory presented in this paper is quite general, the applications provided here concern a particular foreign exchange market. There are many similarities between order books for stocks and exchange rates but there are also some important differences. For instance, orders of unit size (e.g. one million EUR) keep the volume at rather small levels in absolute terms compared to stock markets. In stock market applications of the techniques provided here one would have to bundle shares by selecting an appropriate unit size of orders. We are not aware of empirical studies, similar to those mentioned above, of order books in foreign exchange markets.
Another approach to studying the dynamical aspects of limit order books is by means of game theory. Each agent is thought to take into account the effect of the order placement strategy of other agents when deciding between limit or market orders. Some of the systematic properties of the order book may then be explained as properties of the resulting equilibrium, see e.g. [10] and [14] and the references therein. In contrast, our approach assumes that the transitions of the order book are given exogenously as transitions of a Markov chain.
The rest of this paper is organized as follows. Section 2 contains a detailed description of a general Markov chain representation of the limit order book. In Section 3 some discrete potential theory for Markov chains is reviewed and applied to evaluate a naive buy strategy and an elementary strategy for making the spread. The core of the paper is Section 4, where Markov decision theory is employed to study optimal trading strategies. A proof of existence of optimal strategies is presented together with an iteration scheme to find them. In Section 5, a simple parameterization of the Markov chain is presented together with a calibration technique. For this particular choice of model, limit order arrival rates depend only on the distance from the opposite best quote, and market order intensities are assumed independent of outstanding limit orders. The concluding Section 6 contains some numerical experiments on data from a foreign exchange (EUR/USD) market. The simple model from Section 5 is calibrated on high-frequency data and three different buy-one-unit strategies are compared. It turns out that there is a substantial amount to be gained from using more elaborate strategies than the naive buy strategy.
2. Markov chain representation of a limit order book
We begin with a brief description of order driven markets. An order driven market is a continuous double auction where agents can submit limit orders. A limit order, or quote, is a request to buy or sell a certain quantity together with a worst allowable price, the limit. A limit order is executed immediately if there are
outstanding quotes of opposite sign with the same (or better) limit. Limit orders that are not immediately executed are entered into the limit order book. An agent having an outstanding limit order in the order book can at any time cancel this order. Limit orders are executed using time priority at a given price and price priority across prices.
Following [7], orders are decomposed into two types: an order resulting in an immediate transaction is an effective market order and an order that is not executed, but stored in the limit order book, an effective limit order. For the rest of this paper effective market orders and effective limit orders will be referred to simply as market orders and limit orders, respectively. As a consequence, the limit of a limit buy (sell) order is always lower (higher) than the best available sell (buy) quote. For simplicity it is assumed that the limit of a market buy (sell) order is precisely equal to the best available sell (buy) quote. Note that it is not assumed that the entire market order will be executed immediately. If there are fewer quotes of opposite sign at the level where the market order is entered, then the remaining part of the order is stored in the limit order book.
2.1. Markov chain representation. A continuous time Markov chain $X = (X_t)$ is used to model the limit order book. It is assumed that there are $d \in \mathbb{N}$ possible price levels in the order book, denoted $\pi_1 < \cdots < \pi_d$. The Markov chain $X_t = (X^1_t, \ldots, X^d_t)$ represents the volume at time $t \geq 0$ of buy orders (negative values) and sell orders (positive values) at each price level. It is assumed that $X^j_t \in \mathbb{Z}$ for each $j = 1, \ldots, d$. That is, all volumes are integer valued. The state space of the Markov chain is denoted $S \subset \mathbb{Z}^d$. The generator matrix of $X$ is denoted $Q = (Q_{xy})$, where $Q_{xy}$ is the transition intensity from state $x = (x^1, \ldots, x^d)$ to state $y = (y^1, \ldots, y^d)$. The matrix $P = (P_{xy})$ is the transition matrix of the jump chain of $X$. Let us already here point out that for most of our results only the jump chain will be needed and it will also be denoted $X = (X_n)_{n=0}^{\infty}$, where $n$ is the number of transitions from time 0.
For each state $x \in S$ let
$$j_B = j_B(x) = \max\{j : x^j < 0\}, \qquad j_A = j_A(x) = \min\{j : x^j > 0\},$$
be the highest bid level and the lowest ask level, respectively. For convenience it will be assumed that $x^d > 0$ for all $x \in S$; i.e. there is always someone willing to sell at the highest possible price. Similarly $x^1 < 0$ for all $x \in S$; someone is always willing to buy at the lowest possible price. It is further assumed that the highest bid level is always below the lowest ask level, $j_B < j_A$. This will be implicit in the construction of the generator matrix $Q$ and transition matrix $P$. The bid price is defined to be $\pi_B = \pi_{j_B}$ and the ask price is $\pi_A = \pi_{j_A}$. Since there are no limit orders at levels between $j_B$ and $j_A$, it follows that $x^j = 0$ for $j_B < j < j_A$. The distance $j_A - j_B$ between the best ask level and the best bid level is called the spread. See Figure 1 for an illustration of the state of the order book.
The possible transitions of the Markov chain $X$ defining the order book are given as follows. Throughout the paper $e_j = (0, \ldots, 0, 1, 0, \ldots, 0)$ denotes a vector in $\mathbb{Z}^d$ with 1 in the $j$th position.
Figure 1. State of the order book. The negative volumes to the left indicate limit buy orders and the positive volumes indicate limit sell orders. In this state $j_A = 44$, $j_B = 42$, and the spread is equal to 2.

Limit buy order. A limit buy order of size $k$ at level $j$ is an order to buy $k$ units at price $\pi_j$. The order is placed last in the queue of orders at price $\pi_j$. It may be interpreted as $k$ orders of unit size arriving instantaneously. Mathematically it is a transition of the Markov chain from state $x$ to $x - k e_j$ where $j < j_A$ and $k \geq 1$. That is, a limit buy order can only be placed at a level lower than the best ask level $j_A$. See Figure 2.
Figure 2. Left: Limit buy order of size 1 arrives at level 42. Right: Limit sell order of size 2 arrives at level 45.
Limit sell order. A limit sell order of size $k$ at level $j$ is an order to sell $k$ units at price $\pi_j$. The order is placed last in the queue of orders at price $\pi_j$. It may be interpreted as $k$ orders of unit size arriving instantaneously. Mathematically it is a transition of the Markov chain from state $x$ to $x + k e_j$ where $j > j_B$ and $k \geq 1$. That is, a limit sell order can only be placed at a level higher than the best bid level $j_B$. See Figure 2.
Market buy order. A market buy order of size $k$ is an order to buy $k$ units at the best available price. It corresponds to a transition from state $x$ to $x - k e_{j_A}$. Note that if $k \geq x^{j_A}$ the market order will knock out all the sell quotes at $j_A$, resulting in a new lowest ask level. See Figure 3.
Market sell order. A market sell order of size $k$ is an order to sell $k$ units at the best available price. It corresponds to a transition from state $x$ to $x + k e_{j_B}$. Note that if $k \geq |x^{j_B}|$ the market order will knock out all the buy quotes at $j_B$, resulting in a new highest bid level. See Figure 3.
Figure 3. Left: Market buy order of size 2 arrives and knocks out level 44. Right: Market sell order of size 2 arrives.
Cancellation of a buy order. A cancellation of a buy order of size $k$ at level $j$ is an order to instantaneously withdraw $k$ limit buy orders at level $j$ from the order book. It corresponds to a transition from $x$ to $x + k e_j$ where $j \leq j_B$ and $1 \leq k \leq |x^j|$. See Figure 4.
Figure 4. Left: A cancellation of a buy order of size 1 arrives at level 40. Right: A cancellation of a sell order of size 2 arrives at level 47.
Cancellation of a sell order. A cancellation of a sell order of size $k$ at level $j$ is an order to instantaneously withdraw $k$ limit sell orders at level $j$ from the order book. It corresponds to a transition from $x$ to $x - k e_j$ where $j \geq j_A$ and $1 \leq k \leq x^j$. See Figure 4.
Summary. To summarize, the possible transitions are such that $Q_{xy}$ is non-zero if and only if $y$ is of the form
$$ y = \begin{cases} x + k e_j, & j \geq j_B(x),\ k \geq 1, \\ x - k e_j, & j \leq j_A(x),\ k \geq 1, \\ x - k e_j, & j \geq j_A(x),\ 1 \leq k \leq x^j, \\ x + k e_j, & j \leq j_B(x),\ 1 \leq k \leq |x^j|. \end{cases} \qquad (1)$$
To fully specify the model it remains to specify what the non-zero transition intensities are. The computational complexity of the model does not depend heavily on the specific choice of the non-zero transition intensities, but rather on the dimensionality of the transition matrix. In Section 5 a simple model is presented which is easy and fast to calibrate.
3. Potential theory for evaluation of simple strategies
Consider an agent who wants to buy one unit. There are two alternatives. The agent can place a market buy order at the best ask level $j_A$ or place a limit buy order at a level less than $j_A$. In the second alternative the buy price is lower but there is a risk that the order will not be executed, i.e. matched by a market sell order, before the price starts to move up. Then the agent may be forced to buy at a price higher than $\pi_{j_A}$. It is therefore of interest to compute the probability that a limit buy order is executed before the price moves up, as well as the expected buy price resulting from a limit buy order. These problems are naturally addressed within the framework of potential theory for Markov chains. First, a standard result on potential theory for Markov chains will be presented. A straightforward application of the result enables the computation of the expected price of a limit buy order and the expected payoff of a simple strategy for making the spread.
3.1. Potential theory. Consider a discrete time Markov chain $X = (X_n)$ on a countable state space $S$ with transition matrix $P$. For a subset $D \subset S$ the set $\partial D = S \setminus D$ is called the boundary of $D$ and is assumed to be non-empty. Let $\tau$ be the first hitting time of $\partial D$, that is $\tau = \inf\{n : X_n \in \partial D\}$. Suppose a running cost function $v_C = (v_C(s))_{s \in D}$ and a terminal cost function $v_T = (v_T(s))_{s \in \partial D}$ are given. The potential associated to $v_C$ and $v_T$ is defined by $\phi = (\phi(s))_{s \in S}$ where
$$\phi(s) = E\Big[ \sum_{n=0}^{\tau - 1} v_C(X_n) + v_T(X_\tau) I\{\tau < \infty\} \,\Big|\, X_0 = s \Big].$$
The potential is characterized as the solution to a linear system of equations.

Theorem 3.1 (e.g. [12], Theorem 4.2.3). Suppose $v_C$ and $v_T$ are non-negative. Then $\phi$ satisfies
$$\begin{cases} \phi = P\phi + v_C, & \text{in } D, \\ \phi = v_T, & \text{in } \partial D. \end{cases} \qquad (2)$$

Theorem 3.1 is all that is needed to compute the success probability of buying/selling a given order and the expected value of simple buy/sell strategies. Details are given in the following section.
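As a concrete illustration of Theorem 3.1, on a finite state space the system (2) reduces to a single linear solve. The following sketch is our own illustration, not code from the paper; the four-state transition matrix, the choice of $D$ and $\partial D$, and the cost vectors are hypothetical.

```python
import numpy as np

# Hypothetical 4-state chain: states 0,1 form D, states 2,3 form the boundary.
P = np.array([[0.2, 0.5, 0.3, 0.0],
              [0.4, 0.1, 0.2, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
D, boundary = [0, 1], [2, 3]
v_C = np.array([0.0, 0.0])          # running cost on D
v_T = np.array([1.0, 0.0])          # terminal cost on the boundary

# On D the potential satisfies phi = P phi + v_C, on the boundary phi = v_T.
# Substituting the boundary values gives (I - P_DD) phi_D = P_Db v_T + v_C.
P_DD = P[np.ix_(D, D)]
P_Db = P[np.ix_(D, boundary)]
phi_D = np.linalg.solve(np.eye(len(D)) - P_DD, P_Db @ v_T + v_C)

phi = np.empty(4)
phi[D], phi[boundary] = phi_D, v_T
print(phi)   # with v_T = (1, 0), phi on D is the probability of hitting state 2 first
```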
3.2. Probability that a limit order is executed. In this section $X = (X_n)$ denotes the jump chain of the order book described in Section 2 with possible transitions specified by (1). Suppose the initial state of the order book is $X_0$. The agent places a limit buy order at level $J_0$. The order is placed instantaneously at time 0 and after the order is placed the state of the order book is $X'_0 = X_0 - e_{J_0}$. Consider the probability that the order is executed before the best ask level is at least $J_1 > j_A(X_0)$.
As the order book evolves it is necessary to keep track of the position of the agent's buy order. For this purpose, an additional variable $Y_n$ is introduced representing the number of limit orders at level $J_0$ that are in front of the agent's order, including the agent's order, after $n$ transitions. Then $Y_0 = X^{J_0}_0 - 1$, and $Y_n$ can only move up towards 0, which it does whenever there is a market order at level $J_0$ or an order in front of the agent's order is cancelled.
The pair $(X_n, Y_n)$ is also a Markov chain, taking values in $\bar{S} \subset \mathbb{Z}^d \times \{0, -1, -2, \ldots\}$, with transition matrix denoted $\bar{P}$. The state space is partitioned into two disjoint sets: $\bar{S} = D \cup \partial D$ where
$$\partial D = \{(x, y) \in \bar{S} : y = 0, \text{ or } x^j \leq 0 \text{ for all } J_0 < j < J_1\}.$$
Define the terminal cost function $v_T : \partial D \to \mathbb{R}$ by
$$v_T(x, y) = \begin{cases} 1 & \text{if } y = 0, \\ 0 & \text{otherwise}, \end{cases}$$
and let $\tau$ denote the first time $(X_n, Y_n)$ hits $\partial D$. The potential $\phi = (\phi(s))_{s \in \bar{S}}$ given by
$$\phi(s) = E[v_T(X_\tau, Y_\tau) I\{\tau < \infty\} \mid (X_0, Y_0) = s]$$
is precisely the probability of the agent's limit order being executed before the best ask moves to or above $J_1$, conditional on the initial state. To compute the desired probability all that remains is to solve (2) with $v_C = 0$.
3.3. Expected price for a naive buy-one-unit strategy. The probability that a limit buy order is executed is all that is needed to compute the expected price of a naive buy-one-unit strategy. The strategy is implemented as follows:
(1) Place a unit size limit buy order at level $J_0$.
(2) If the best ask moves to level $J_1$, cancel the limit order and buy at level $J_1$.
This assumes that there will always be limit sell orders available at level $J_1$. If $p$ denotes the probability that the limit buy order is executed (from the previous subsection) then the expected buy price becomes
$$E[\text{buy price}] = p\,\pi_{J_0} + (1 - p)\,\pi_{J_1}.$$
Recall that, at the initial state, the agent may select to buy at the best ask price $\pi_{j_A(X_0)}$. This suggests that it is better to follow the naive buy-one-unit strategy than to place a market buy order whenever $E[\text{buy price}] < \pi_{j_A(X_0)}$. In Section 4 more elaborate buy strategies will be evaluated using the theory of Markov decision processes.
3.4. Making the spread. We now proceed to calculate the expected payoff of another simple trading strategy. The aim is to earn the difference between the bid and the ask price, the spread. Suppose the order book starts in state $X_0$. Initially an agent places a limit buy order at level $j^0$ and a limit sell order at level $j^1 > j^0$. In case both are executed the profit is the price difference between the two orders. The orders are placed instantaneously at $n = 0$ and after the orders are placed the state of the order book is $X_0 - e_{j^0} + e_{j^1}$. Let $J_0$ and $J_1$ be stop-loss levels such that $J_0 < j^0 < j^1 < J_1$. The simple making-the-spread strategy proceeds as follows.
(1) If the buy order is executed first and the best bid moves to $J_0$ before the sell order is executed, cancel the limit sell order and place a market sell order at $J_0$.
(2) If the sell order is executed first and the best ask moves to $J_1$ before the buy order is executed, cancel the limit buy order and place a market buy order at $J_1$.
This strategy assumes that there will always be limit buy orders available at $J_0$ and limit sell orders at $J_1$.
It will be necessary to keep track of the positions of the agent's limit orders. For this purpose two additional variables $Y^0_n$ and $Y^1_n$ are introduced that represent the number of limit orders at levels $j^0$ and $j^1$ that are in front of and including the agent's orders, respectively.
It follows that $Y^0_0 = X^{j^0}_0 - 1$ and $Y^1_0 = X^{j^1}_0 + 1$, that $Y^0_n$ is non-decreasing, and that $Y^1_n$ is non-increasing. The agent's buy (sell) order has been executed when $Y^0_n = 0$ ($Y^1_n = 0$).
The triplet $(X_n, Y^0_n, Y^1_n)$ is also a Markov chain with state space $\bar{S} \subset \mathbb{Z}^d \times \{0, -1, -2, \ldots\} \times \{0, 1, 2, \ldots\}$. Let $\bar{P}$ denote its transition matrix. The state space $\bar{S}$ is partitioned into two disjoint subsets: $\bar{S} = D \cup \partial D$, where
$$\partial D = \{(x, y^0, y^1) \in \bar{S} : y^0 = 0 \text{ or } y^1 = 0\}.$$
Let the function $p_B(x, y^0)$ denote the probability that a limit buy order placed at $j^0$ is executed before the best ask moves to $J_1$. This probability is computed in Section 3.2. If the sell order is executed first, so $y^1 = 0$, then there will be a positive income of $\pi_{j^1}$. The expected expense in state $(x, y^0, y^1)$ for buying one unit is $p_B(x, y^0)\pi_{j^0} + (1 - p_B(x, y^0))\pi_{J_1}$. Similarly, let the function $p_A(x, y^1)$ denote the probability that a limit sell order placed at $j^1$ is executed before the best bid moves to $J_0$. This can be computed in a similar manner. If the buy order is executed first, so $y^0 = 0$, then this will result in an expense of $\pi_{j^0}$. The expected income in state $(x, y^0, y^1)$ for selling one unit is $p_A(x, y^1)\pi_{j^1} + (1 - p_A(x, y^1))\pi_{J_0}$. The above argument leads us to define the terminal cost function $v_T : \partial D \to \mathbb{R}$ by
$$v_T(x, y^0, y^1) = \begin{cases} \pi_{j^1} - p_B(x, y^0)\pi_{j^0} - (1 - p_B(x, y^0))\pi_{J_1} & \text{for } y^1 = 0, \\ p_A(x, y^1)\pi_{j^1} + (1 - p_A(x, y^1))\pi_{J_0} - \pi_{j^0} & \text{for } y^0 = 0. \end{cases}$$
Let $\tau$ denote the first time $(X_n, Y^0_n, Y^1_n)$ hits $\partial D$. The potential $\phi = (\phi(s))_{s \in \bar{S}}$ defined by
$$\phi(s) = E[v_T(X_\tau, Y^0_\tau, Y^1_\tau) I\{\tau < \infty\} \mid (X_0, Y^0_0, Y^1_0) = s]$$
is precisely the expected payoff of this strategy. It is a solution to (2) with $v_C = 0$.
4. Optimal strategies and Markov decision processes
The framework laid out in Section 3 is too restrictive for many purposes, as it does not allow the agent to change the initial position. In this section it will be demonstrated how Markov decision theory can be used to design and analyze more flexible trading strategies.
The general results on Markov decision processes are given in Section 4.1 and the applications to buy-one-unit strategies and strategies for making the spread are explained in the following sections. The general results that are of greatest relevance to the applications are the last statement of Theorem 4.3 and Theorem 4.5, which lead to Algorithm 4.1.
4.1. Results for Markov decision processes. First the general setup will be described. We refer to [12] for a brief introduction to Markov decision processes and [6] or [9] for more details.
Let $(X_n)_{n=0}^{\infty}$ be a Markov chain in discrete time on a countable state space $S$ with transition matrix $P$. Let $A$ be a finite set of possible actions. Every action can be classified as either a continuation action or a termination action. The set of continuation actions is denoted $C$ and the set of termination actions $T$. Then $A = C \cup T$ where $C$ and $T$ are disjoint. When a termination action is selected the Markov chain is terminated.
Every action is not available in every state of the chain. Let $A : S \to 2^A$ be a function associating a non-empty set of actions $A(s)$ to each state $s \in S$. Here $2^A$ is the power set consisting of all subsets of $A$. The set of continuation actions available in state $s$ is denoted $C(s) = A(s) \cap C$ and the set of termination actions $T(s) = A(s) \cap T$. For each $s \in S$ and $a \in C(s)$ the transition probability from $s$ to $s'$ when selecting action $a$ is denoted $P_{ss'}(a)$.
For every action there are associated costs. The cost of continuation is denoted $v_C(s, a)$; it can be non-zero only when $a \in C(s)$. The cost of termination is denoted $v_T(s, a)$; it can be non-zero only when $a \in T(s)$. It is assumed that both $v_C$ and $v_T$ are non-negative and bounded.
A policy $\alpha = (\alpha_0, \alpha_1, \ldots)$ is a sequence of functions $\alpha_n : S^{n+1} \to A$ such that $\alpha_n(s_0, \ldots, s_n) \in A(s_n)$ for each $n \geq 0$ and $(s_0, \ldots, s_n) \in S^{n+1}$. If after $n$ transitions the Markov chain has visited $(X_0, \ldots, X_n)$, then $\alpha_n(X_0, \ldots, X_n)$ is the action to take when following policy $\alpha$. In the sequel we often encounter policies where the $n$th decision $\alpha_n$ is defined as a function $S^{k+1} \to A$ for some $0 \leq k \leq n$. In that case the corresponding function from $S^{n+1}$ to $A$ is understood as a function of the last $k+1$ coordinates; $(s_0, \ldots, s_n) \mapsto \alpha_n(s_{n-k}, \ldots, s_n)$.
The expected total cost starting in $X_0 = s$ and following a policy $\alpha$ until termination is denoted by $V(s, \alpha)$. In the applications to come it could be interpreted as the expected buy price. The purpose of Markov decision theory is to analyze optimal policies and optimal (minimal) expected costs. A policy $\alpha^*$ is called optimal if, for all states $s \in S$ and policies $\alpha$,
$$V(s, \alpha^*) \leq V(s, \alpha).$$
The optimal expected cost $V^*$ is defined by
$$V^*(s) = \inf_{\alpha} V(s, \alpha).$$
Clearly, if an optimal policy $\alpha^*$ exists, then $V^*(s) = V(s, \alpha^*)$. It is proved in Theorem 4.3 below that, if all policies terminate in finite time with probability 1, an optimal policy $\alpha^*$ exists and the optimal expected cost is the unique solution to a Bellman equation. Furthermore, the optimal policy $\alpha^*$ is stationary. A stationary policy is a policy that does not change with time. That is, $\alpha^* = (\alpha^*, \alpha^*, \ldots)$ with $\alpha^* : S \to A$, where $\alpha^*$ denotes both the policy as well as each individual decision function.
The termination time $\tau_\alpha$ of a policy $\alpha$ is the first time an action is taken from the termination set. That is, $\tau_\alpha = \inf\{n \geq 0 : \alpha_n(X_0, \ldots, X_n) \in T(X_n)\}$. The expected total cost $V(s, \alpha)$ is given by
$$V(s, \alpha) = E\Big[ \sum_{n=0}^{\tau_\alpha - 1} v_C(X_n, \alpha_n(X_0, \ldots, X_n)) + v_T(X_{\tau_\alpha}, \alpha_{\tau_\alpha}(X_0, \ldots, X_{\tau_\alpha})) \,\Big|\, X_0 = s \Big].$$
Given a policy $\alpha = (\alpha_0, \alpha_1, \ldots)$ and a state $s \in S$ let $\alpha^s$ be the shifted policy $\alpha^s = (\alpha^s_0, \alpha^s_1, \ldots)$, where $\alpha^s_n : S^{n+1} \to A$ with $\alpha^s_n(s_0, \ldots, s_n) = \alpha_{n+1}(s, s_0, \ldots, s_n)$.

Lemma 4.1. The expected total cost of a policy $\alpha$ satisfies
$$V(s, \alpha) = I\{\alpha_0(s) \in C(s)\} \Big( v_C(s, \alpha_0(s)) + \sum_{s' \in S} P_{ss'}(\alpha_0(s))\, V(s', \alpha^s) \Big) + I\{\alpha_0(s) \in T(s)\}\, v_T(s, \alpha_0(s)). \qquad (3)$$
Proof. The claim follows from a straightforward calculation:
$$\begin{aligned}
V(s, \alpha) &= \sum_{a \in C(s)} \Big( v_C(s, a) + E\Big[ \sum_{n=1}^{\tau_\alpha - 1} v_C(X_n, \alpha_n(X_0, \ldots, X_n)) + v_T(X_{\tau_\alpha}, \alpha_{\tau_\alpha}(X_0, \ldots, X_{\tau_\alpha})) \,\Big|\, X_0 = s \Big] \Big) I\{\alpha_0(s) = a\} \\
&\quad + \sum_{a \in T(s)} v_T(s, \alpha_0(s)) I\{\alpha_0(s) = a\} \\
&= \sum_{a \in C(s)} \Big( v_C(s, a) + E[V(X_1, \alpha^s) \mid X_0 = s] \Big) I\{\alpha_0(s) = a\} + \sum_{a \in T(s)} v_T(s, \alpha_0(s)) I\{\alpha_0(s) = a\} \\
&= \sum_{a \in C(s)} \Big( v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V(s', \alpha^s) \Big) I\{\alpha_0(s) = a\} + \sum_{a \in T(s)} v_T(s, a) I\{\alpha_0(s) = a\}.
\end{aligned}$$
A central role is played by the function $V_n$: the minimal incurred cost before time $n$ with termination at $n$. It is defined recursively as
$$V_0(s) = \min_{a \in T(s)} v_T(s, a),$$
$$V_{n+1}(s) = \min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_n(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\}, \qquad (4)$$
for $n \geq 0$. It follows by induction that $V_{n+1}(s) \leq V_n(s)$ for each $s \in S$. To see this, note first that $V_1(s) \leq V_0(s)$ for each $s \in S$. Suppose $V_n(s) \leq V_{n-1}(s)$ for each $s \in S$. Then
$$V_{n+1}(s) \leq \min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_{n-1}(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\} = V_n(s),$$
which proves the induction step.
For each $s \in S$ the sequence $(V_n(s))_{n \geq 0}$ is non-increasing and bounded below by 0, hence convergent. Let $V_\infty(s)$ denote its limit.
Lemma 4.2. $V_\infty$ satisfies the Bellman equation
$$V_\infty(s) = \min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_\infty(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\}. \qquad (5)$$
Proof. Follows by taking limits. Indeed,
$$\begin{aligned}
V_\infty(s) &= \lim_{n \to \infty} V_{n+1}(s) \\
&= \min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \lim_{n \to \infty} \sum_{s' \in S} P_{ss'}(a) V_n(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\} \\
&= \min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_\infty(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\},
\end{aligned}$$
where the last step follows by monotone convergence.
The following theorem states that there is a collection of policies $\Pi$ for which $V_\infty$ is optimal. Furthermore, if all policies belong to $\Pi$, which is quite natural in applications, then $V_\infty$ is in fact the expected cost of a stationary policy $\alpha^*$.

Theorem 4.3. Let $\Pi$ be the collection of policies that terminate in finite time, i.e. $P[\tau_\alpha < \infty \mid X_0 = s] = 1$ for each $s \in S$. Let $\alpha^* = (\alpha^*, \alpha^*, \ldots)$ be a stationary policy where $\alpha^*(s)$ is a minimizer of
$$\min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_\infty(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\}. \qquad (6)$$
The following statements hold.
(a) For each $\alpha \in \Pi$, $V(s, \alpha) \geq V_\infty(s)$.
(b) $V_\infty$ is the optimal expected cost for $\Pi$. That is, $V_\infty(s) = \inf_{\alpha \in \Pi} V(s, \alpha)$.
(c) If $\alpha^* \in \Pi$, then $V_\infty(s) = V(s, \alpha^*)$.
(d) Suppose that $W$ is a solution to the Bellman equation (5) and let $\alpha^w$ denote the minimizer of (6) with $V_\infty$ replaced by $W$. If $\alpha^w \in \Pi$, then $W = V_\infty$.
In particular, if all policies belong to $\Pi$, then $V_\infty$ is the unique solution to the Bellman equation (5). Moreover, $V_\infty$ is the optimal expected cost and is attained by the stationary policy $\alpha^*$.
Remark 4.4. It is quite natural that all policies belong to $\Pi$. For instance, suppose that $P_{ss'}(a) = P_{ss'}$ does not depend on $a \in A$ and that the set $\{s : C(s) = \emptyset\}$ is non-empty. Then the chain terminates as soon as it hits this set. It follows that all policies belong to $\Pi$ if $P[\tau < \infty \mid X_0 = s] = 1$ for each $s \in S$, where $\tau$ is the first hitting time of $\{s : C(s) = \emptyset\}$.
Proof. (a) Take $\alpha \in \Pi$. Let $\alpha T_n = (\alpha_0, \alpha_1, \ldots, \alpha_{n-1}, \alpha_T)$ be the policy $\alpha$ terminated at $n$. Here $\alpha_T(s)$ is a minimizer of $\min_{a \in T(s)} v_T(s, a)$, i.e. an optimal termination action. That is, the policy $\alpha T_n$ follows $\alpha$ until time $n - 1$ and then terminates. In particular, $P[\tau_{\alpha T_n} \leq n \mid X_0 = s] = 1$ for each $s \in S$.
We claim that
(i) $V(s, \alpha T_n) \geq V_n(s)$ for each policy $\alpha$ and each $s \in S$, and
(ii) $\lim_{n \to \infty} V(s, \alpha T_n) = V(s, \alpha)$.
Then (a) follows since
$$V(s, \alpha) = \lim_{n \to \infty} V(s, \alpha T_n) \geq \lim_{n \to \infty} V_n(s) = V_\infty(s).$$
Statement (i) follows by induction. First note that
$$V(s, \alpha T_0) = \min_{a \in T(s)} v_T(s, a) = V_0(s).$$
Suppose $V(s, \alpha T_n) \geq V_n(s)$ for each policy $\alpha$ and each $s \in S$. Then
$$V(s, \alpha T_{n+1}) = \sum_{a \in T(s)} v_T(s, a) I\{\alpha_0(s) = a\} + \sum_{a \in C(s)} \Big( v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V(s', (\alpha T_{n+1})^s) \Big) I\{\alpha_0(s) = a\}.$$
Since $(\alpha T_{n+1})^s = (\alpha^s) T_n$, it follows by the induction hypothesis that $V(s', (\alpha T_{n+1})^s) \geq V_n(s')$. The expression in the last display is then greater than or equal to
$$\sum_{a \in T(s)} v_T(s, a) I\{\alpha_0(s) = a\} + \sum_{a \in C(s)} \Big( v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_n(s') \Big) I\{\alpha_0(s) = a\} \geq \min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_n(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\} = V_{n+1}(s).$$
Proof of (ii). Note that one can write
$$V(s, \alpha) = E\Big[ \sum_{t=1}^{\tau_\alpha - 1} v_C(X_t, \alpha_t(X_0, \ldots, X_t)) \,\Big|\, X_0 = s \Big] + E\big[ v_T(X_{\tau_\alpha}, \alpha_{\tau_\alpha}(X_0, \ldots, X_{\tau_\alpha})) \,\big|\, X_0 = s \big],$$
and, since the termination time of $\alpha T_n$ is $\tau_\alpha \wedge n$,
$$V(s, \alpha T_n) = E\Big[ \sum_{t=1}^{\tau_\alpha \wedge n - 1} v_C(X_t, \alpha_t(X_0, \ldots, X_t)) \,\Big|\, X_0 = s \Big] + E\big[ v_T\big(X_{\tau_\alpha \wedge n}, (\alpha T_n)_{\tau_\alpha \wedge n}(X_0, \ldots, X_{\tau_\alpha \wedge n})\big) \,\big|\, X_0 = s \big].$$
From monotone convergence it follows that
$$E\Big[ \sum_{t=1}^{\tau_\alpha - 1} v_C(X_t, \alpha_t(X_0, \ldots, X_t)) \,\Big|\, X_0 = s \Big] = E\Big[ \lim_{n \to \infty} \sum_{t=1}^{\tau_\alpha \wedge n - 1} v_C(X_t, \alpha_t(X_0, \ldots, X_t)) \,\Big|\, X_0 = s \Big] = \lim_{n \to \infty} E\Big[ \sum_{t=1}^{\tau_\alpha \wedge n - 1} v_C(X_t, \alpha_t(X_0, \ldots, X_t)) \,\Big|\, X_0 = s \Big].$$
Let $C \in (0, \infty)$ be an upper bound of $v_T$. It follows that
$$\Big| E\big[ v_T(X_{\tau_\alpha}, \alpha_{\tau_\alpha}(\cdot)) - v_T(X_{\tau_\alpha \wedge n}, (\alpha T_n)_{\tau_\alpha \wedge n}(\cdot)) \,\big|\, X_0 = s \big] \Big| \leq 2C\, E[I\{\tau_\alpha > n\} \mid X_0 = s] = 2C\, P[\tau_\alpha > n \mid X_0 = s].$$
By assumption $\alpha \in \Pi$, so $P[\tau_\alpha > n \mid X_0 = s] \to 0$ as $n \to \infty$, for each $s \in S$. This shows that $\lim_{n \to \infty} V(s, \alpha T_n) = V(s, \alpha)$, as claimed.
(b) Note that by (a), $\inf_{\alpha \in \Pi} V(s, \alpha) \geq V_\infty(s)$. From the first part of Theorem 4.5 below it follows that there is a sequence of policies, denoted $\mu_{n:0}$, with $V(s, \mu_{n:0}) = V_n(s)$ (these policies terminate by time $n$ and hence belong to $\Pi$). Since $V_n(s) \to V_\infty(s)$ it follows that $\inf_{\alpha \in \Pi} V(s, \alpha) \leq V_\infty(s)$. This proves (b).
(c) Take $s \in S$. Suppose first that $\alpha^*(s) \in T(s)$. Then
$$V_\infty(s) = \min_{a \in T(s)} v_T(s, a) = V(s, \alpha^*).$$
It follows that $V_\infty(s) = V(s, \alpha^*)$ for each $s \in \{s : \alpha^*(s) \in T(s)\}$.
Take another $s \in S$ such that $\alpha^*(s) \in C(s)$. Then
$$V_\infty(s) = v_C(s, \alpha^*(s)) + \sum_{s' \in S} P_{ss'}(\alpha^*(s)) V_\infty(s'),$$
and
$$V(s, \alpha^*) = v_C(s, \alpha^*(s)) + \sum_{s' \in S} P_{ss'}(\alpha^*(s)) V(s', \alpha^*).$$
It follows that
$$\begin{aligned}
|V_\infty(s) - V(s, \alpha^*)| &\leq \sum_{s' \in S} P_{ss'}(\alpha^*(s)) |V_\infty(s') - V(s', \alpha^*)| \\
&= \sum_{s' : \alpha^*(s') \in C(s')} P_{ss'}(\alpha^*(s)) |V_\infty(s') - V(s', \alpha^*)| \\
&= E\big[ |V_\infty(X_1) - V(X_1, \alpha^*)| I\{\tau_{\alpha^*} > 1\} \,\big|\, X_0 = s \big] \\
&\leq \cdots \leq E\big[ |V_\infty(X_n) - V(X_n, \alpha^*)| I\{\tau_{\alpha^*} > n\} \,\big|\, X_0 = s \big] \\
&\leq 2C\, P[\tau_{\alpha^*} > n \mid X_0 = s],
\end{aligned}$$
where $n \geq 1$ is arbitrary. Since $P[\tau_{\alpha^*} < \infty \mid X_0 = s] = 1$ the last expression converges to 0 as $n \to \infty$. This completes the proof of (c).
Finally, to prove (d), let $W$ be a solution to (5). That is, $W$ satisfies
$$W(s) = \min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) W(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\}.$$
Proceeding as in the proof of (c) it follows directly that $W(s) = V(s, \alpha^w)$. By (a) it follows that $W(s) \geq V_\infty(s)$. Consider the termination regions $\{s : \alpha^w(s) \in T(s)\}$ and $\{s : \alpha^*(s) \in T(s)\}$ of $\alpha^w$ and $\alpha^*$. Since $W(s) \geq V_\infty(s)$, and both are solutions to (5), it follows that
$$\{s : \alpha^*(s) \in T(s)\} \subset \{s : \alpha^w(s) \in T(s)\},$$
and $V_\infty(s) = \min_{a \in T(s)} v_T(s, a) = W(s)$ on $\{s : \alpha^*(s) \in T(s)\}$. To show equality for all $s \in S$ it remains to consider the continuation region of $\alpha^*$. Take $s \in \{s : \alpha^*(s) \in C(s)\}$. As in the proof of (c) one writes
$$\begin{aligned}
W(s) &\leq \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) W(s') \Big] \\
&= \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a)\big(W(s') - V_\infty(s')\big) + \sum_{s' \in S} P_{ss'}(a) V_\infty(s') \Big] \\
&\leq V_\infty(s) + \sum_{s' \in S} P_{ss'}(\alpha^*(s))\big(W(s') - V_\infty(s')\big) \\
&= V_\infty(s) + \sum_{s' : \alpha^*(s') \in C(s')} P_{ss'}(\alpha^*(s))\big(W(s') - V_\infty(s')\big) \\
&= V_\infty(s) + E\big[ (W(X_1) - V_\infty(X_1)) I\{\tau_{\alpha^*} > 1\} \,\big|\, X_0 = s \big] \\
&\leq \cdots \leq V_\infty(s) + E\big[ (W(X_n) - V_\infty(X_n)) I\{\tau_{\alpha^*} > n\} \,\big|\, X_0 = s \big].
\end{aligned}$$
Since $E[(W(X_n) - V_\infty(X_n)) I\{\tau_{\alpha^*} > n\} \mid X_0 = s] \to 0$ as $n \to \infty$ it follows that $W(s) \leq V_\infty(s)$ on $\{s : \alpha^*(s) \in C(s)\}$. This implies $W(s) = V_\infty(s)$ for all $s \in S$ and the proof is complete.
In practice the optimal expected total cost $V_\infty$ may be difficult to find, and hence also the policy $\alpha^*$ that attains $V_\infty$. However, it is easy to come close. Since $V_n(s)$ converges to $V_\infty(s)$, a close to optimal policy is obtained by finding one whose expected cost is at most $V_n(s)$ for large $n$.
For $s \in S$, let $\mu_0(s)$ be a minimizer of $a \mapsto v_T(s, a)$ over $T(s)$ and, for $n \geq 1$, let $\mu_n(s)$ be a minimizer of
$$\min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_{n-1}(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\}. \qquad (7)$$
Theorem 4.5. The policy $\mu_{n:0} = (\mu_n, \mu_{n-1}, \ldots, \mu_0)$ has expected total cost given by $V(s, \mu_{n:0}) = V_n(s)$. Moreover, if the stationary policy $\mu^n = (\mu_n, \mu_n, \ldots)$ satisfies $\mu^n \in \Pi$, then the expected total cost of $\mu^n$ satisfies
$$V_n(s) \geq V(s, \mu^n) \geq V_\infty(s).$$
Proof. Note that $\mu_0$ is a termination action and $V(s, \mu_0) = V_0(s)$. The first claim then follows by induction. Suppose $V(s, \mu_{n:0}) = V_n(s)$. Then
$$\begin{aligned}
V(s, \mu_{n+1:0}) &= \sum_{a \in C(s)} \Big( v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V(s', \mu_{n:0}) \Big) I\{\mu_{n+1}(s) = a\} + \sum_{a \in T(s)} v_T(s, a) I\{\mu_{n+1}(s) = a\} \\
&= \sum_{a \in C(s)} \Big( v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_n(s') \Big) I\{\mu_{n+1}(s) = a\} + \sum_{a \in T(s)} v_T(s, a) I\{\mu_{n+1}(s) = a\} \\
&= \min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_n(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\} = V_{n+1}(s),
\end{aligned}$$
and the induction proceeds.
The proof of the second statement proceeds as follows. For $n \geq 0$ and $k \geq 0$ let
$$\mu^k_{n:0} = (\underbrace{\mu_n, \ldots, \mu_n}_{k \text{ times}}, \mu_{n-1}, \ldots, \mu_0).$$
Then $\mu^0_{n:0} = \mu_{n-1:0}$. By induction it follows that $V(s, \mu^k_{n:0}) \geq V(s, \mu^{k+1}_{n:0})$. Indeed, note first that
$$V(s, \mu^0_{n:0}) - V(s, \mu^1_{n:0}) = V_{n-1}(s) - V_n(s) \geq 0.$$
Suppose $V(s, \mu^{k-1}_{n:0}) - V(s, \mu^k_{n:0}) \geq 0$. If $s$ is such that $\mu_n(s) \in T(s)$, then
$$V(s, \mu^k_{n:0}) - V(s, \mu^{k+1}_{n:0}) = v_T(s, \mu_n(s)) - v_T(s, \mu_n(s)) = 0.$$
If $\mu_n(s) \in C(s)$, then
$$V(s, \mu^k_{n:0}) - V(s, \mu^{k+1}_{n:0}) = \sum_{s' \in S} P_{ss'}(\mu_n(s)) \Big( V(s', \mu^{k-1}_{n:0}) - V(s', \mu^k_{n:0}) \Big) \geq 0.$$
This completes the induction step and the induction proceeds. Since $\mu^n \in \Pi$ it follows that $V(s, \mu^n) = \lim_{k \to \infty} V(s, \mu^k_{n:0})$. Indeed,
$$|V(s, \mu^n) - V(s, \mu^k_{n:0})| \leq C\, P(\tau_{\mu^n} > k) \to 0,$$
as $k \to \infty$. Finally, by Theorem 4.3,
$$V_\infty(s) \leq V(s, \mu^n) = \lim_{k \to \infty} V(s, \mu^k_{n:0}) \leq V(s, \mu^1_{n:0}) = V_n(s),$$
and the proof is complete.
From the above discussion it is clear that the stationary policy $\mu^n$ converges to an optimal policy and that $V_n$ provides an upper bound for the final expected cost of following this strategy. In light of the above discussion it is also clear that Algorithm 4.1 in the limit determines the optimal cost and an optimal policy.
Algorithm 4.1 Optimal trading strategies

Input: Tolerance TOL, transition matrix $P$, state space $S$, continuation actions $C$, termination actions $T$, continuation cost $v_C$, termination cost $v_T$.
Output: Upper bound $V_n$ of the optimal cost and an almost optimal policy $\mu_n$.

Let
$$V_0(s) = \min_{a \in T(s)} v_T(s, a), \quad \text{for } s \in S.$$
Let $n = 1$ and $d > \mathrm{TOL}$.
while $d > \mathrm{TOL}$ do
Put
$$V_n(s) = \min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_{n-1}(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\}, \quad \text{for } s \in S,$$
and
$$d = \max_{s \in S} \big( V_{n-1}(s) - V_n(s) \big),$$
$n = n + 1$.
end while
Define $\mu_n : S \to C \cup T$ such that $\mu_n(s)$ is a minimizer of
$$\min\Big\{ \min_{a \in C(s)} \Big[ v_C(s, a) + \sum_{s' \in S} P_{ss'}(a) V_{n-1}(s') \Big],\ \min_{a \in T(s)} v_T(s, a) \Big\}.$$
Algorithm 4.1 is an example of a value-iteration algorithm. There are other methods that can be used to solve Markov decision problems, such as policy-iteration algorithms. See Chapter 3 in [16] for an interesting discussion on algorithms and Markov decision theory. Typically value-iteration algorithms are well suited to solve Markov decision problems when the state space of the Markov chain is large.
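The value iteration of Algorithm 4.1 is straightforward to implement. The sketch below is our own illustration in Python (the paper's experiments used Matlab); the state space, action sets, transition probabilities and costs are supplied as plain dictionaries and are placeholders for a concrete model.

```python
import numpy as np

def value_iteration(states, C_actions, T_actions, P, v_C, v_T, tol=1e-8):
    """Algorithm 4.1: return the value function V_n and a near-optimal policy mu_n.

    states    : list of (hashable) states
    C_actions : dict state -> list of continuation actions (possibly empty)
    T_actions : dict state -> non-empty list of termination actions
    P         : dict (state, action) -> dict next_state -> probability
    v_C, v_T  : dicts (state, action) -> cost
    """
    V = {s: min(v_T[(s, a)] for a in T_actions[s]) for s in states}    # V_0
    while True:
        V_new, policy = {}, {}
        for s in states:
            best_a, best_val = min(
                ((a, v_T[(s, a)]) for a in T_actions[s]), key=lambda t: t[1])
            for a in C_actions.get(s, []):
                val = v_C[(s, a)] + sum(p * V[s2] for s2, p in P[(s, a)].items())
                if val < best_val:
                    best_a, best_val = a, val
            V_new[s], policy[s] = best_val, best_a
        d = max(V[s] - V_new[s] for s in states)    # V_n is non-increasing in n
        V = V_new
        if d <= tol:
            return V, policy
```

In the buy-one-unit applications below the continuation cost is identically zero and the termination costs are buy prices, so by Theorem 4.5 the returned value function is an upper bound on the optimal expected buy price.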
4.2. The keep-or-cancel strategy for buying one unit. In this section a buy-one-unit strategy is considered. It is similar to the buy strategy outlined in Section 3.3 except that the agent has the additional optionality of early cancellation and submission of a market order. Only the jump chain of $(X_t)$ is considered. Recall that the jump chain, denoted $(X_n)_{n=0}^{\infty}$, is a discrete Markov chain where $X_n$ is the state of the order book after $n$ transitions.
Suppose the initial state is $X_0$. An agent wants to buy one unit and places a limit buy order at level $j_0 < j_A(X_0)$. After each market transition, the agent has two choices: either to keep the limit order or to cancel it and submit a market buy order at the best available ask level $j_A(X_n)$. It is assumed that the cancellation and submission of the market order are processed instantaneously. It will also be assumed that the agent has decided upon a maximum price level $J > j_A(X_0)$. If the agent's limit buy order has not been processed and $j_A(X_n) = J$, then the agent will immediately cancel the buy order and place a market buy order at level $J$. It will be implicitly assumed that there are always limit sell orders available at level $J$. Buying at level $J$ can be thought of as a stop-loss.
From a theoretical point of view assuming an upper bound $J$ for the price level is not a serious restriction, as it can be chosen very high. From a practical point of view, though, it is convenient not to take $J$ very high because it will significantly slow down the numerical computation of the solution. This deficiency may be compensated by defining $\pi_J$ appropriately large, say larger than $\pi_{J-1}$ plus one price tick.
Recall the Markov chain $(X_n, Y_n)$ defined in Section 3.2. Here $X_n$ represents the order book after $n$ transitions and $Y_n$ is negative with $|Y_n|$ being the number of quotes in front of and including the agent's order at level $j_0$. The state space in this case is $\bar{S} \subset \mathbb{Z}^d \times \{\ldots, -2, -1, 0\}$.
Let $s = (x, y) \in \bar{S}$. Suppose $y < 0$ and $j_A(x) < J$, so the agent's order has not been executed and the stop-loss has not been reached. Then there is one continuation action, $C(s) = \{0\}$, representing waiting for the next market transition. The continuation cost $v_C(s)$ is always 0. If $j_A(x) = J$ (stop-loss is hit) or $y = 0$ (limit order executed) then $C(s) = \emptyset$, so it is only possible to terminate.
There are two termination actions, $T = \{-2, -1\}$. If $y < 0$ the only termination action available is $-1 \in T$, representing cancellation of the limit order and submission of a market order at the ask price. If $y = 0$ the Markov chain always terminates since the limit order has been executed. This action is represented by $-2 \in T$. The termination costs are
$$v_T(s, -1) = \pi_{j_A(x)}, \qquad v_T(s, -2) = \pi_{j_0}.$$
The expected total cost may, in this case, be interpreted as the expected buy price. In a state $s = (x, y)$ with $j_A(x) < J$, following a stationary policy $\alpha = (\alpha, \alpha, \ldots)$, it is given by (see Lemma 4.1)
$$V(s, \alpha) = \begin{cases} \sum_{s'} \bar{P}_{ss'} V(s', \alpha), & \text{for } \alpha(s) = 0, \\ \pi_{j_A(x)}, & \text{for } \alpha(s) = -1, \\ \pi_{j_0}, & \text{for } \alpha(s) = -2. \end{cases} \qquad (8)$$
When $s = (x, y)$ is such that $j_A(x) = J$, then $V(s, \alpha) = \pi_J$. It follows immediately that
$$\pi_{j_0} \leq V(s, \alpha) \leq \pi_J,$$
for all $s \in \bar{S}$ and all policies $\alpha$. The motivation of the expression (8) for the expected buy price is as follows. If the limit order is not processed, so $y < 0$, there is no cost of waiting. This is the case $\alpha(s) = 0$. The cost of cancelling and placing the market buy order is $\pi_{j_A(x)}$, the current best ask price. When the limit order is processed, $y = 0$, the incurred cost is $\pi_{j_0}$, the price level of the limit order.
The collection $\Pi$ of policies $\alpha$ with $P[\tau_\alpha < \infty \mid X_0 = s] = 1$ for each $s \in \bar{S}$ contains the only reasonable policies. It does not seem desirable to risk having to wait an infinite amount of time to buy one unit.
By Theorem 4.3 an optimal keep-or-cancel strategy for buying one unit is the stationary policy $\alpha^*$, with expected buy price $V_\infty$ satisfying (see Lemma 4.2)
$$V_\infty(s) = \min\Big\{ \min_{a \in C(s)} \sum_{s' \in \bar{S}} \bar{P}_{ss'} V_\infty(s'),\ \min_{a \in T(s)} v_T(s, a) \Big\} = \begin{cases} \min\Big\{ \sum_{s' \in \bar{S}} \bar{P}_{ss'} V_\infty(s'),\ \pi_{j_A(x)} \Big\}, & \text{for } j_A(x) < J,\ y < 0, \\ \pi_J, & \text{for } j_A(x) = J,\ y < 0, \\ \pi_{j_0}, & \text{for } y = 0. \end{cases}$$
The stationary policy $\mu^n$ in Theorem 4.5 provides a useful numerical approximation of an optimal policy, and $V_n(s)$ in (4) provides an upper bound on the expected buy price. Both $V_n$ and $\mu^n$ can be computed by Algorithm 4.1.
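Using the value_iteration sketch given after Algorithm 4.1, the keep-or-cancel problem is specified entirely by its action sets and termination costs. The following fragment is our own illustration of how the costs of this subsection map onto that interface; the state encoding $(x, y)$, the price array pi, and the helper j_A are hypothetical inputs.

```python
# Sketch of the keep-or-cancel specification for the value_iteration function above.
# A state is s = (x, y): x the order book (tuple of ints), y <= 0 the queue position.
# The price array pi, the limit level j0 and the stop-loss level J are assumed given.

def actions(s, j0, J, j_A):
    x, y = s
    executed, stop_loss = (y == 0), (j_A(x) == J)
    C = [] if (executed or stop_loss) else [0]       # 0: wait for the next transition
    T = [-2] if executed else [-1]                   # -1: market order, -2: done
    return C, T

def v_T(s, a, pi, j0, j_A):
    x, y = s
    return pi[j0] if a == -2 else pi[j_A(x)]         # pay the limit price or the best ask

def v_C(s, a):
    return 0.0                                       # waiting is free
```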
4.3. The ultimate buy-one-unit strategy. In this section the keep-or-cancel strategy considered above is extended so that the agent may at any time cancel and replace the limit order.
Suppose the initial state of the order book is $X_0$. An agent wants to buy one unit. After $n$ transitions of the order book, if the agent's limit order is located at a level $j$, then $j_n = j$ represents the level of the limit order, and $Y_n$ represents the outstanding orders in front of and including the agent's order at level $j_n$. This defines the discrete Markov chain $(X_n, Y_n, j_n)$.
It will be assumed that the agent has decided upon a best price level $J_0$ and a worst price level $J_1$ where $J_0 < j_A(X_0) < J_1$. The agent is willing to buy at level $J_0$ and will not place limit orders at levels lower than $J_0$. The level $J_1$ is the worst case buy price or stop-loss. If $j_A(X_n) = J_1$ the agent is committed to cancel the limit buy order immediately and place a market order at level $J_1$. It will be assumed that it is always possible to buy at level $J_1$. The state space in this case is $\bar{S} \subset \mathbb{Z}^d \times \{\ldots, -2, -1, 0\} \times \{J_0, \ldots, J_1 - 1\}$.
The set of possible actions depends on the current state $(x, y, j)$. In each state where $y < 0$ the agent has three options:
(1) Do nothing and wait for a market transition.
(2) Cancel the limit order and place a market buy order at the best ask level $j_A(x)$.
(3) Cancel the existing limit buy order and place a new limit buy order at any level $j'$ with $J_0 \leq j' < j_A(x)$. This action results in the transition to $j_n = j'$, $X_n = x + e_j - e_{j'}$ and $Y_n = x^{j'} - 1$.
In a given state $s = (x, y, j)$ with $y < 0$ and $j_A(x) < J_1$ the set of continuation actions is
$$C(x, y, j) = \{0, J_0, \ldots, j_A(x) - 1\}.$$
Here $a = 0$ represents the agent being inactive and awaiting the next market transition, and the actions $j'$, where $J_0 \leq j' < j_A(x)$, correspond to cancelling the outstanding order and submitting a new limit buy order at level $j'$. The cost of continuation is always 0, $v_C(s, 0) = v_C(s, j') = 0$. If $y = 0$ or $j_A(x) = J_1$, then $C(s) = \emptyset$ and only termination is possible.
As in the keep-or-cancel strategy there are two termination actions, $T = \{-2, -1\}$. If $y < 0$ the only termination action available is $-1$, representing cancellation of the limit order and submission of a market order at the ask price. If $y = 0$ the Markov chain always terminates since the limit order has been executed. This action is represented by $-2$.
The expected buy price $V(s, \alpha)$ from a state $s = (x, y, j)$ with $j_A(x) < J_1$, following a stationary policy $\alpha = (\alpha, \alpha, \ldots)$, is
$$V(s, \alpha) = \begin{cases} \sum_{s'} \bar{P}_{ss'} V(s', \alpha), & \text{for } \alpha(s) = 0, \\ V(s^{j'}, \alpha), & \text{for } \alpha(s) = j',\ J_0 \leq j' < j_A(x), \\ \pi_{j_A(x)}, & \text{for } \alpha(s) = -1, \\ \pi_j, & \text{for } \alpha(s) = -2. \end{cases}$$
In the second line $s^{j'}$ refers to the state $(x', y', j')$ where $x' = x + e_j - e_{j'}$ and $y' = x^{j'} - 1$. If $s = (x, y, j)$ with $j_A(x) = J_1$, then $V(s, \alpha) = \pi_{J_1}$. Since the agent never places limit orders below $J_0$ and it is assumed that it is always possible to buy at level $J_1$, it follows immediately that
$$\pi_{J_0} \leq V(s, \alpha) \leq \pi_{J_1},$$
for all $s \in \bar{S}$ and all policies $\alpha$.
By Theorem 4.3 an optimal buy strategy is the stationary policy $\alpha^*$, with expected buy price $V_\infty$ satisfying (see Lemma 4.2)
$$V_\infty(s) = \min\Big\{ \min_{a \in C(s)} \sum_{s' \in \bar{S}} \bar{P}_{ss'}(a) V_\infty(s'),\ \min_{a \in T(s)} v_T(s, a) \Big\},$$
which implies that
$$V_\infty(s) = \min\Big\{ \sum_{s' \in \bar{S}} \bar{P}_{ss'} V_\infty(s'),\ V_\infty(s^{J_0}), \ldots, V_\infty(s^{j_A(x) - 1}),\ \pi_{j_A(x)} \Big\},$$
for $j_A(x) < J_1$, $y < 0$, and
$$V_\infty(s) = \begin{cases} \pi_{J_1}, & \text{for } j_A(x) = J_1,\ y < 0, \\ \pi_j, & \text{for } y = 0. \end{cases}$$
The stationary policy $\mu^n$ in Theorem 4.5 provides a useful numerical approximation of an optimal policy, and $V_n(s)$ in (4) provides an upper bound on its expected buy price. Both $\mu^n$ and $V_n$ can be computed by Algorithm 4.1.
4.4. Making the spread. In this section a strategy aimed at earning the difference between the bid and the ask price, the spread, is considered. An agent submits two limit orders, one buy and one sell. In case both are executed the profit is the price difference between the two orders. For simplicity it is assumed at first that, before one of the orders has been executed, the agent only has two options after each market transition: cancel both orders or wait until the next market transition. The extension which allows for cancellation and resubmission of both limit orders with new limits is presented at the end of this section.
Suppose $X_0$ is the initial state of the order book. The agent places the limit buy order at level $j^0$ and the limit sell order at level $j^1 > j^0$. The orders are placed instantaneously and after the orders are placed the state of the order book is $X_0 - e_{j^0} + e_{j^1}$.
Consider the extended Markov chain $(X_n, Y^0_n, Y^1_n, j^0_n, j^1_n)$. Here $X_n$ represents the order book after $n$ transitions and $Y^0_n$ and $Y^1_n$ represent the limit buy (negative) and sell (positive) orders at levels $j^0_n$ and $j^1_n$ that are in front of and including the agent's orders, respectively. It follows that $Y^0_0 = X^{j^0_0}_0 - 1$ and $Y^1_0 = X^{j^1_0}_0 + 1$, where $Y^0_n$ is non-decreasing and $Y^1_n$ is non-increasing. The agent's buy (sell) order has been processed when $Y^0_n = 0$ ($Y^1_n = 0$).
Suppose the agent has decided on a best buy level $J_{B0} < j_A(X_0)$ and a worst buy level $J_{B1} > j_A(X_0)$. The agent will never place a limit buy order at a level lower than $J_{B0}$ and will not buy at a level higher than $J_{B1}$, and it is assumed to always be possible to buy at level $J_{B1}$. Similarly, the agent has decided on a best sell price $J_{A1} > j_B(X_0)$ and a worst sell price $J_{A0} < j_B(X_0)$. The agent will never place a limit sell order at a level higher than $J_{A1}$ and will not sell at a level lower than $J_{A0}$, and it is assumed to always be possible to sell at level $J_{A0}$. The state space of this Markov chain is $\bar{S} \subset \mathbb{Z}^d \times \{\ldots, -2, -1, 0\} \times \{0, 1, 2, \ldots\} \times \{J_{B0}, \ldots, J_{B1} - 1\} \times \{J_{A0} + 1, \ldots, J_{A1}\}$.
The possible actions are:
(1) Before any of the orders has been processed the agent can wait for the next market transition or cancel both orders.
(2) When one of the orders has been processed, say the sell order, the agent has an outstanding limit buy order. Then the agent proceeds according to the ultimate buy-one-unit strategy presented in Section 4.3.
Given a state $s = (x, y^0, y^1, j^0, j^1)$ of the Markov chain the optimal value function $V_\infty$ is interpreted as the optimal expected payoff. Note that for making-the-spread strategies it is more natural to have $V_\infty$ as a payoff than as a cost, and this is how it will be interpreted. The general results in Section 4.1 still hold since the value functions are bounded from below and above. The optimal expected payoff can be computed as follows. Let $V^B_\infty(x, y, j)$ denote the optimal (minimal) expected buy price in state $(x, y, j)$ for buying one unit, with best buy level $J_{B0}$ and worst buy level $J_{B1}$. Similarly, $V^A_\infty(x, y, j)$ denotes the optimal (maximal) expected sell price in state $(x, y, j)$ for selling one unit, with best sell level $J_{A1}$ and worst sell level $J_{A0}$.
The optimal expected payoff is then given by
$$V_\infty(s) = \begin{cases} \max\Big\{ \sum_{s' \in \bar{S}} \bar{P}_{ss'} V_\infty(s'),\ 0 \Big\}, & \text{for } y^0 < 0,\ y^1 > 0, \\ \pi_{j^1} - V^B_\infty(x, y^0, j^0), & \text{for } y^1 = 0,\ y^0 < 0, \\ V^A_\infty(x, y^1, j^1) - \pi_{j^0}, & \text{for } y^0 = 0,\ y^1 > 0. \end{cases}$$
The term $\sum_{s' \in \bar{S}} \bar{P}_{ss'} V_\infty(s')$ is the value of waiting and 0 is the value of cancelling both orders.
In the extended version of the making-the-spread strategy it is also possible to replace the two limit orders before the first has been executed. Then the possible actions are as follows.
(1) Before any of the orders has been processed the agent can wait for the next market transition, cancel both orders, or cancel both orders and resubmit at new levels $k^0$ and $k^1$.
(2) When one of the orders has been processed, say the sell order, the agent has an outstanding limit buy order. Then the agent proceeds according to the ultimate buy-one-unit strategy presented in Section 4.3.
It is assumed that $J_{B0}$, $J_{B1}$, $J_{A0}$, and $J_{A1}$ are the upper and lower limits, as above. Then the optimal expected payoff is given by
$$V_\infty(s) = \begin{cases} \max\Big\{ \sum_{s' \in \bar{S}} \bar{P}_{ss'} V_\infty(s'),\ \max_{k^0, k^1} V_\infty(s^{k^0 k^1}),\ 0 \Big\}, & \text{for } y^0 < 0,\ y^1 > 0, \\ \pi_{j^1} - V^B_\infty(x, y^0, j^0), & \text{for } y^1 = 0,\ y^0 < 0, \\ V^A_\infty(x, y^1, j^1) - \pi_{j^0}, & \text{for } y^0 = 0,\ y^1 > 0. \end{cases}$$
In the first line the maximum $\max_{k^0, k^1} V_\infty(s^{k^0 k^1})$ is taken over all states $s^{k^0 k^1} = (\tilde{x}, \tilde{y}^0, \tilde{y}^1, k^0, k^1)$ where $J_{B0} \leq k^0 < J_{B1}$, $J_{A0} < k^1 \leq J_{A1}$, $\tilde{x} = x + e_{j^0} - e_{k^0} - e_{j^1} + e_{k^1}$, $\tilde{y}^0 = x^{k^0} - 1$, and $\tilde{y}^1 = x^{k^1} + 1$. Here $k^0$ and $k^1$ represent the levels of the new limit orders.
5. Implementation of a simple model
In this section a simple parameterization of the Markov chain for the order book is presented. The aim of the model presented here is not to be very sophisticated but rather to allow for simple calibration.
Recall that a Markov chain is specified by its initial state and generator matrix $Q$ (see for instance Norris [12], Chapter 2). Given two different states $x, y \in S$, $Q_{xy}$ denotes the transition intensity from $x$ to $y$. The waiting time until the next transition is exponentially distributed with parameter
$$Q_x = \sum_{y \neq x} Q_{xy}. \qquad (9)$$
The transition matrix of the jump chain, $P_{xy}$, gives the probability that a transition in state $x$ will take the Markov chain to state $y$. It is obtained from $Q$ via
$$P_{xy} = \frac{Q_{xy}}{\sum_{y \neq x} Q_{xy}}.$$
Recall from Section 2 the different order types (limit order, market order, cancellation) that dictate the possible transitions of the order book. The model is then completely determined by the initial state and the non-zero intensities for the transitions in (1).
In this section the limit, market and cancellation order intensities are specified as follows.
Limit buy (sell) orders arrive at a distance of $i$ levels from the best ask (bid) level with intensity $\lambda^B_L(i)$ ($\lambda^S_L(i)$).
Market buy (sell) orders arrive with intensity $\lambda^B_M$ ($\lambda^S_M$).
The sizes of limit and market orders follow discrete exponential distributions with parameters $\theta_L$ and $\theta_M$ respectively. That is, the distributions $(p_k)_{k \geq 1}$ and $(q_k)_{k \geq 1}$ of limit and market order sizes are given by
$$p_k = (e^{\theta_L} - 1) e^{-\theta_L k}, \qquad q_k = (e^{\theta_M} - 1) e^{-\theta_M k}.$$
The size of cancellation orders is assumed to be 1. Each individual unit size buy (sell) order located at a distance of $i$ levels from the best ask (bid) level is cancelled with rate $\lambda^B_C(i)$ ($\lambda^S_C(i)$). At the cumulative level the cancellations of buy (sell) orders at a distance of $i$ levels from the opposite best ask (bid) level arrive with a rate proportional to the volume at the level: $\lambda^B_C(i)\,|x^{j_A - i}|$ ($\lambda^S_C(i)\,|x^{j_B + i}|$).
In mathematical terms the transition rates are given as follows.
Limit orders:
$$x \to x + k e_j, \quad j > j_B(x),\ k \geq 1, \quad \text{with rate } p_k\, \lambda^S_L(j - j_B(x)),$$
$$x \to x - k e_j, \quad j < j_A(x),\ k \geq 1, \quad \text{with rate } p_k\, \lambda^B_L(j_A(x) - j).$$
Cancellation orders except at the best ask/bid level:
$$x \to x - e_j, \quad j > j_A(x), \quad \text{with rate } \lambda^S_C(j - j_B(x))\, |x^j|,$$
$$x \to x + e_j, \quad j < j_B(x), \quad \text{with rate } \lambda^B_C(j_A(x) - j)\, |x^j|.$$
Market orders of unit size and cancellations at the best ask/bid level:
$$x \to x + e_{j_B(x)}, \quad \text{with rate } q_1\, \lambda^S_M + \lambda^B_C(j_A(x) - j_B(x))\, |x^{j_B(x)}|,$$
$$x \to x - e_{j_A(x)}, \quad \text{with rate } q_1\, \lambda^B_M + \lambda^S_C(j_A(x) - j_B(x))\, |x^{j_A(x)}|.$$
Market orders of size at least 2:
$$x \to x + k e_{j_B}, \quad k \geq 2, \quad \text{with rate } q_k\, \lambda^S_M,$$
$$x \to x - k e_{j_A}, \quad k \geq 2, \quad \text{with rate } q_k\, \lambda^B_M.$$
The model described above is an example of a zero intelligence model: transition probabilities are state independent except for their dependence on the location of the best bid and ask. Zero intelligence models of the market's micro structure were considered already in 1993, in the work of Gode and Sunder [11]. Despite the simplicity of such models, they capture many important aspects of the order driven market. Based on a mean field theory analysis, the authors of [15] and [5] derive laws relating the mean spread and the short term price diffusion rate to the order arrival rates. In [8], the validity of these laws is tested on high frequency data on eleven stocks traded at the London Stock Exchange. The authors find that the model does a good job of predicting the average spread and a decent job of predicting the price diffusion rate.
It remains to estimate the model parameters from historical data. This is the next topic.
5.1. Model calibration. Calibration of the Markov chain for the order book amounts to determining the order intensities and the order size parameters $(\lambda_L, \lambda_C, \lambda^B_M, \lambda^S_M, \theta_L, \theta_M)$.
Suppose an historical sample of an order book containing all limit, market and cancellation orders during a period of time $T$ is given. If $N^B_L(i)$ denotes the number of limit buy orders that arrived at a distance of $i$ levels from the best ask, then
$$\hat{\lambda}^B_L(i) = \frac{N^B_L(i)}{T}$$
is an estimate of the arrival rate of limit buy orders at that level. The rates $\lambda^S_L$, $\lambda^B_M$, and $\lambda^S_M$ are estimated similarly.
The cancellation rate is proportional to the number of orders at each level. To
estimate
B
C
(i) (
S
C
(i)) one rst calculates the average number of outstanding buy
(sell) orders at a distance of i levels from the opposite best quote b
i

. If b
i
t
denotes
the number of buy orders i levels from best ask at time t, then
b
i

=
1
T

T
0
b
i
t
dt.
Then an estimate of the cancellation rate \lambda^B_C(i) is

    \hat\lambda^B_C(i) = \frac{N^B_C(i)}{\bar{b}_i T},

and similarly for \lambda^S_C.
The market and limit order size parameters are estimated by maximum likelihood. Given an independent sample (m_1, \ldots, m_N) from a discrete exponential distribution with parameter \alpha, the maximum likelihood estimate is

    \hat\alpha = \log \frac{\bar{m}}{\bar{m} - 1},

with \bar{m} = N^{-1} \sum_{i=1}^N m_i. As a consequence, if there are N_L limit orders of size l_1, \ldots, l_{N_L} and N_M market orders of size m_1, \ldots, m_{N_M} in the historical data, then the parameters \alpha_L and \alpha_M are estimated by

    \hat\alpha_L = \log \frac{\bar{l}}{\bar{l} - 1},    \hat\alpha_M = \log \frac{\bar{m}}{\bar{m} - 1}.
Remark 5.1. It is often the case that one does not have access to the complete order book. For instance, the data discussed below only contained time-stamped quotes of the first five nonzero levels on each side of the book. Empirical studies indicate that arrival rates should decay as a power law (see e.g. [3]), at least for stock market order books. As the quantities of interest in this paper do not depend on levels far away from the best bid/ask, this issue will be ignored. Only intensities at levels where data is available will be estimated.

A more serious problem is that the data is most often presented as a list of deals and quotes. In other words, one only observes the impact of limit and cancellation orders, not the orders themselves. As a consequence, high frequency data is needed to distinguish different orders from each other.
6. Numerical results
In this section the simple model introduced in Section 5 is calibrated to data from the foreign exchange market and the performance of the different buy-one-unit strategies is evaluated.

6.1. Description of the data. In this section the EUR/USD exchange rate traded on a major foreign exchange market is considered. The data consists of time-stamped sequences of prices and quantities of outstanding limit orders (quotes) and market orders (deals) for the first five non-zero levels on each side of the order book. The trades are in units of one million. That is, a limit buy order of unit size at 1.4342 is an order to buy one million EUR for 1.4342 million USD. Samples of the data are presented in Tables 1 and 2. Note that level k in Table 1 refers to the kth non-zero bid/ask quote; it does not refer to the distance to the best bid/ask. Quotes are updated every 100 milliseconds and it will be assumed that the updating frequency is sufficiently high to be able to distinguish different orders (cf. Remark 5.1).

Note that a deal and a market order are not the same thing. The size of a market order might be larger than the outstanding volume at the best price level. The deal list can, however, be used to distinguish deals from cancellations.
Table 1. Data sample: quotes

TIME          BID/ASK  LEVEL  PRICE   VOLUME
04:44:20.800  B        1      1.4342  4
04:44:20.800  B        2      1.4341  8
04:44:20.800  B        3      1.4340  8
04:44:20.800  B        4      1.4339  6
04:44:20.800  B        5      1.4338  6
...
04:44:21.500  A        1      1.4344  2
04:44:21.500  A        2      1.4345  5
04:44:21.500  A        3      1.4346  9
04:44:21.500  A        4      1.4347  6
04:44:21.500  A        5      1.4348  3
Table 2. Data sample: deals

TIME          BUY/SELL ORDER  PRICE   VOLUME
04:44:20.600  B               1.4343  1
04:44:29.700  B               1.4344  1
04:44:29.800  S               1.4344  9
6.2. Calibration result. Extracting all limit, market and cancellation orders from the data sample enabled calibration according to the procedure explained in Section 5.1. Table 3 shows the result of the calibration. These particular parameters were obtained using orders submitted during the 120 minute period on a single day from 02:44:20 am to 04:44:20 am. During this period a total of 14 294 observed units were entered into the order book via 10 890 limit orders. In all, there were 13 110 cancelled orders and 1 029 traded units, distributed over 660 market orders.
Table 3. Example of market parameters calibrated to a data sample of 120 minutes.

i:                     1       2       3       4       5
\hat\lambda^B_L(i)     0.1330  0.1811  0.2085  0.1477  0.0541
\hat\lambda^S_L(i)     0.1442  0.1734  0.2404  0.1391  0.0584
\hat\lambda^B_C(i)     0.1287  0.1057  0.0541  0.0493  0.0408
\hat\lambda^S_C(i)     0.1308  0.1154  0.0531  0.0492  0.0437

\hat\lambda^S_M   0.0467      \hat\alpha_L   0.5667
\hat\lambda^B_M   0.0467      \hat\alpha_M   0.4955
6.3. Optimal buy price. Following Section 4.3 the optimal expected buy price can be evaluated for the ultimate buy-one-unit strategy. Once the expected price has been determined, it is straightforward to determine an optimal strategy by Algorithm 4.1.

Recall the setup. An agent wants to buy 1 million EUR at the best possible price. For simplicity, only three price levels will be considered: 1.4342, 1.4343 and 1.4344. Suppose that, initially, 1.4342 is the best bid level, 1.4344 the best ask and there are no quotes at 1.4343. The agent has the option of either submitting a market order at 1.4344 or a limit order at 1.4342 or 1.4343. After each market transition, the agent has the possibility to change the position. A stop-loss is placed at 1.4345: if the best ask moves above 1.4344, a market buy order is immediately submitted. This order is assumed to be processed at 1.4345.

Assuming that the number of outstanding orders at each level is bounded, by 15 say, results in a problem with a finite state space. The optimal buy price and optimal strategy are obtained using Algorithm 4.1 with a tolerance of 10^{-6}.
The result of the computation is illustrated in Figures 5, 6 and 7. Initially the agent needs to decide on order type (limit or market) and price level. It turns out that it is optimal to submit a limit order at 1.4342, independent of the volumes at 1.4342 and 1.4344, see Figure 5.

As new orders arrive, changing the state of the order book, the agent will re-evaluate the situation. As an example, suppose that after some market transitions the order book is in a state where there is only one limit buy order of unit size at 1.4342 in front of the agent's order. The agent will now act differently depending on the number of outstanding limit orders at 1.4343 and 1.4344. If there are sell limit orders at 1.4343, then the agent will almost always keep the limit order in place. However, if there is more than one buy order at 1.4343 the agent will submit a cancellation followed by either a new limit order at 1.4343 or a market order, depending on the situation. The optimal decision along with the corresponding optimal expected buy price in this situation is illustrated in Figure 6.

As long as there are no limit buy orders at 1.4343, Figure 5 indicates that it is optimal to keep the limit order at 1.4342. However, if a limit buy order appears at 1.4343, then the situation changes dramatically. Depending on the number of limit orders ahead of the agent's order and on the number of sell orders at 1.4344, the agent will act differently. The optimal decision along with the optimal expected buy price is illustrated in Figure 7.
6.4. Comparison of strategies for buying one unit. Three different strategies for buying one unit have been described: the ultimate strategy described above and in Section 4.3, the keep-or-cancel strategy described in Section 4.2 and the naive buy strategy from Section 3.3. In the naive strategy the agent submits a limit order at 1.4342 and does nothing until either the order is matched against a market order or the best ask moves above 1.4344, in which case the limit order is cancelled and a market order is submitted. In the keep-or-cancel strategy the agent submits a limit order at 1.4342. This order can then be cancelled and replaced by a market order at any time. However, it cannot be replaced by a limit order at 1.4343.

There are extra computational costs when determining the ultimate strategy, compared with the naive and keep-or-cancel strategies. It is therefore reasonable to check whether this increase in complexity generates any relevant cost reduction. The difference in expected buy price between the ultimate strategy and the other two is presented in Figure 8. The situation is the same as the initial state considered above, that is, there are no limit orders at 1.4343.

It turned out that the extra possibility of replacing the limit order with a market order made a substantial difference to the expected price compared to the naive strategy. The additional possibility of replacing the order with a limit order at 1.4343 yielded an additional, though much smaller, reduction in expected buy price.

Remark 6.1. In this example only three levels were considered. Note that the state space, and hence the complexity, grows exponentially with the number of levels. As a consequence, Algorithm 4.1 is only applicable when the number of levels is relatively small.
Acknowledgements. The authors are grateful to Ebba Ankarcrona and her colleagues at SEB for interesting discussions as well as data access and analysis.
References
[1] Aurelien Alfonsi, Antje Fruth, and Alexander Schied. Optimal execution strategies in limit order books with general shape functions. Quant. Finance, 10(2):143–157, 2010.
[2] Marco Avellaneda and Sasha Stoikov. High-frequency trading in a limit order book. Quant. Finance, 8(3):217–224, 2008.
[3] Jean-Philippe Bouchaud, Marc Mezard, and Marc Potters. Statistical properties of stock order books: empirical results and models. Quant. Finance, 2(4):251–256, 2002.
[4] Rama Cont, Sasha Stoikov, and Rishi Talreja. A stochastic model for order book dynamics. SSRN eLibrary, 2008. http://ssrn.com/paper=1273160.
[5] Marcus G. Daniels, J. Doyne Farmer, Laszlo Gillemot, Giulia Iori, and Eric Smith. Quantitative model of price diffusion and market friction based on trading as a mechanistic random process. Phys. Rev. Lett., 90(10):108102, Mar 2003.
[6] Hans Michael Dietz and Volker Nollau. Markov decision problems with countable state spaces, volume 15 of Mathematical Research. Akademie-Verlag, Berlin, 1983.
[7] J. Doyne Farmer, Laszlo Gillemot, Fabrizio Lillo, Szabolcs Mike, and Anindya Sen. What really causes large price changes? Quant. Finance, 4(4):383–397, 2004.
[8] J. Doyne Farmer, Paolo Patelli, and Ilija Zovko. The predictive power of zero intelligence models in financial markets. Proceedings of the National Academy of Sciences of the United States of America, 102(6):2254–2259, 2005.
[9] Eugene A. Feinberg and Adam Shwartz, editors. Handbook of Markov decision processes. International Series in Operations Research & Management Science, 40. Kluwer Academic Publishers, Boston, MA, 2002.
[10] Thierry Foucault. Order flow composition and trading costs in a dynamic limit order market. Journal of Financial Markets, 2(2):99–134, 1999.
[11] Dhananjay K. Gode and Shyam Sunder. Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality. Journal of Political Economy, 101(1):119–137, 1993.
[12] James R. Norris. Markov chains, volume 2 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
[13] Anna A. Obizhaeva and Jiang Wang. Optimal Trading Strategy and Supply/Demand Dynamics. SSRN eLibrary, 2005. http://ssrn.com/paper=666541.
[14] Christine A. Parlour. Price dynamics in limit order markets. Rev. Financ. Stud., 11(4):789–816, 1998.
[15] Eric Smith, J. Doyne Farmer, Laszlo Gillemot, and Supriya Krishnamurthy. Statistical theory of the continuous double auction. Quantitative Finance, 3(6):481–514, 2003.
[16] Henk C. Tijms. Stochastic models. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons Ltd., Chichester, 1994.
[Figure 5: "Initial level choice and expected payoff". Two panels: the left panel is a choice matrix over the volume at 1.4344 (horizontal axis) and the volume at 1.4342 (vertical axis); the right panel is a surface of the expected price minus 1.43 (values around 4.3 x 10^{-3}) over the same volumes.]

Figure 5. Expected buy price and optimal initial choice for buy order placement in the ultimate buy-one-unit strategy. The right plot shows the expected buy price given different volumes at price levels 1.4342 and 1.4344. Level 1.4343 contains no quotes. The dark grey in the choice matrix to the left shows that it is always optimal to place a limit order at 1.4342.
[Figure 6: "Buy order at 1.4342, different volumes at 1.4343 and 1.4344". Two panels: the left panel is a choice matrix over the volume at 1.4344 (horizontal axis) and the volume at 1.4343 (vertical axis); the right panel is a surface of the expected price minus 1.43 (values around 4.3 x 10^{-3}).]

Figure 6. Expected buy price and optimal choice of buy order placement in the ultimate buy-one-unit strategy. The buy order currently has place 2 out of a total of 10 limit buy orders at price level 1.4342. The choice matrix to the left shows the optimal choice given different volumes at levels 1.4343 and 1.4344. Dark grey indicates that the buy order should be kept in place. Light grey indicates that the buy order should be cancelled and resubmitted at level 1.4343. In the white region, the buy order should be cancelled and replaced by a market order. The plot to the right shows the optimal expected buy price.
[Figure 7: "Buy order at different positions at 1.4342". Two panels: the left panel is a choice matrix over the volume at 1.4344 (horizontal axis) and the number of quotes ahead of the order (vertical axis); the right panel is a surface of the expected price minus 1.43 (values around 4.4 x 10^{-3}).]

Figure 7. Expected buy price and optimal choice of buy order placement in the ultimate buy-one-unit strategy. In these plots, the total volume at level 1.4342 is 13 and the volume at 1.4343 is 1. The plot to the left shows the optimal choice given different numbers of quotes ahead of the buy order at 1.4342. That is, if there are 4 quotes ahead of the buy order, then these 4 orders need to be either cancelled or matched against market orders before the buy order can be executed. In the dark grey region the buy order should be kept at its current position. In the lighter grey region, the order should be moved to level 1.4343, where it is behind only 1 order. In the white region, finally, the limit order should be cancelled and replaced by a market order. The plot to the right shows the corresponding expected buy price.
(H. Hult) Department of Mathematics, KTH, 100 44 Stockholm, Sweden
E-mail address: hult@kth.se
(J. Kiessling) Department of Mathematics, KTH, 100 44 Stockholm, Sweden
E-mail address: jonkie@kth.se
[Figure 8: "Three different buy strategies compared". Two surface plots of the expected price difference over the volumes at 1.4342 and 1.4344; the differences are of order 10^{-6} in the left panel and 10^{-5} in the right panel.]

Figure 8. Difference between expected buy price under different buy strategies. In both plots, level 1.4343 is empty. The plot to the left shows the difference between the expected buy price under the keep-or-cancel policy described in Section 4.2 and the ultimate policy, for different volumes at 1.4342 and 1.4344. The plot to the right shows the difference in expected buy price between the naive buy-one-unit strategy and the ultimate policy. It is clear that the option of early cancellation has a substantial impact on the final buy price.