
Learning to Trade with Q-Reinforcement Learning

(A TensorFlow and Python focus)

Ben Ball & David Samuel

www.prediction-machines.com
Special thanks to -
Algorithmic Trading (e.g., HFT) vs Human Systematic Trading

Algorithmic trading (e.g., HFT): often looking at opportunities existing in the microsecond time horizon. Typically using statistical microstructure models and techniques from machine learning. Automated, but usually with hand-crafted signals, exploits, and algorithms. Predicting the next tick.

Human systematic trading: systematic strategies learned from the experience of "reading market behavior", operating over tens of seconds or minutes. Predicting complex plays.


Take inspiration from DeepMind – Learning to play Atari video games
Could we do something similar for trading markets?

[Diagram: a simple feed-forward network - Input, two FC ReLU layers, Output, plus a functional pass-through connection.]

*Network images from http://www.asimovinstitute.org/neural-network-zoo/


Introduction to Reinforcement Learning
How does a child learn to ride a bike?

[Images: lots of trial and error, rather than formal instruction, eventually leading to success.]
Machine Learning vs Reinforcement Learning
Good textbook on this by Sutton and Barto (Reinforcement Learning: An Introduction).
- No supervisor
- Trial and error paradigm
- Feedback delayed
- Time sequenced
- Agent influences the environment

[Diagram: the agent-environment loop. The agent observes state S_t, takes action a_t, and receives reward r_t and next state S_t+1 from the environment.]

Trajectory: s_t, a_t, r_t, s_t+1, a_t+1, r_t+1, s_t+2, a_t+2, r_t+2, …
Key functions: the value function, the policy function and the reward function.

REINFORCEjs GridWorld: ---demo---
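The GridWorld demo is driven by exactly this kind of update. As a minimal sketch (not the REINFORCEjs code itself), here is the tabular Q-learning rule in Python; the grid size, learning rate, discount and exploration rate are illustrative assumptions:

import numpy as np

# Tabular Q-learning sketch for a small GridWorld-like task.
# Grid size, learning rate and exploration rate are illustrative assumptions.
n_states, n_actions = 25, 4            # e.g. a 5x5 grid with 4 moves
Q = np.zeros((n_states, n_actions))    # value function estimate Q(s, a)
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def choose_action(s):
    # Epsilon-greedy policy derived from the current Q estimates.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])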
Application to Trading
Typical dynamics of a mean-reverting asset or pairs-trading where the spread exhibits mean reversion

[Chart: price of the mean-reverting asset or spread evolving with time, oscillating around its mean between upper and lower soft range boundaries.]

Map the movement of the mean-reverting asset (or spread) into a discrete lattice where the price dynamics become transitions between lattice nodes.

We started with a simple 5-node lattice, but this can be increased quite easily.
State transitions of lattice simulation of mean reversion:

[Diagram: the spread price mapped onto lattice indices i = -2, -1, 0, 1, 2; positions are Short, Flat or Long, with sell and buy actions moving between them.]

These map into (State, Action, Reward) triplets used in the QRL algorithm.
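To make those triplets concrete, here is a minimal Python sketch of how a spread could be quantized onto the 5-node lattice and rewarded. The node spacing, reward convention and function names are assumptions for illustration, not the actual simulator code:

import numpy as np

# Illustrative only: quantize a mean-reverting spread onto the 5-node lattice
# (indices -2..+2 around the mean) and build (State, Action, Reward) triplets.

def to_lattice_index(spread, mean, node_spacing):
    # Nearest lattice node, clipped at the soft boundaries +/-2.
    i = int(round((spread - mean) / node_spacing))
    return max(-2, min(2, i))

def make_triplet(i_prev, i_next, position, action):
    # State = (lattice index, position); reward = P&L of holding `position`
    # across the transition between lattice nodes.
    pnl_per_node = {'long': 1.0, 'flat': 0.0, 'short': -1.0}[position]
    reward = pnl_per_node * (i_next - i_prev)
    return (i_prev, position), action, reward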
Mean Reversion Game Simulator
[Simulator screenshot (Level 3), marking an example sell transaction and an example buy transaction.]

As in the Atari games example, our QRL/DQN plays the trading game over and over.

See http://www.prediction-machines.com/blog/ for a demonstration.


Building a DQN and defining its topology

Using Keras and Trading-Gym


Double Dueling DQN (vanilla DQN does not converge well, but this method works much better)

[Diagram: two copies of the network, a training network and a target network. Each takes the lattice position and the (long, short, flat) position state as input, runs Input -> FC ReLU -> FC ReLU -> Output with a functional pass-through, and outputs the value of Buy, the value of Sell and the value of Do Nothing.]
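As a rough illustration of the topology above, a dueling network with a separate training and target copy could be sketched in Keras as follows. The layer widths, optimizer and loss are assumptions rather than the exact settings used:

from keras.layers import Input, Dense, Lambda, concatenate
from keras.models import Model
import keras.backend as K

def build_dueling_dqn(n_lattice=5, n_pos=3, n_actions=3):
    # Inputs: one-hot lattice position and one-hot (long, short, flat) state.
    lattice_in = Input(shape=(n_lattice,), name='lattice_position')
    position_in = Input(shape=(n_pos,), name='position_state')
    x = concatenate([lattice_in, position_in])
    x = Dense(32, activation='relu')(x)     # FC ReLU
    x = Dense(32, activation='relu')(x)     # FC ReLU
    value = Dense(1)(x)                     # state value V(s)
    advantage = Dense(n_actions)(x)         # advantages A(s, a) for Buy / Sell / Do Nothing
    # Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)
    q = Lambda(lambda va: va[0] + va[1] - K.mean(va[1], axis=1, keepdims=True))([value, advantage])
    return Model(inputs=[lattice_in, position_in], outputs=q)

training_net = build_dueling_dqn()              # updated every training step
target_net = build_dueling_dqn()                # frozen copy, synced periodically
target_net.set_weights(training_net.get_weights())
training_net.compile(optimizer='adam', loss='mse')

The target network is only refreshed from the training network every so often, which is what stabilizes learning.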
Trading-Gym Architecture
(Each component is an abstract class with concrete children classes.)

Runner: warmup(), train(), run()

Data Generator: next(), rewind()
  children: Random Walks, Single Asset Signals, Multi Asset, CSV Replay, Market Data Streamer

Environment: render(), step(), reset()
  children: Market Making

Agent: act(), observe(), end()
  children: Deterministic

Memory: add(), sample()

Brain: train(), predict()
  children: DQN, Double DQN, A3C
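Expressed as Python skeletons, the architecture looks roughly like the following. Only the class and method names come from the diagram; the comments and stub bodies are illustrative assumptions:

class DataGenerator:
    # Price source: random walks, single/multi asset signals, CSV replay, market data streamer.
    def next(self): raise NotImplementedError       # next tick / observation
    def rewind(self): raise NotImplementedError     # restart the series

class Environment:
    # Wraps a DataGenerator (e.g. a market-making game) behind a gym-like interface.
    def step(self, action): ...                     # returns (state, reward, done, info)
    def reset(self): ...
    def render(self): ...

class Memory:
    # Experience replay buffer.
    def add(self, experience): ...
    def sample(self, batch_size): ...

class Brain:
    # The learner: DQN, Double DQN, A3C, ...
    def train(self, batch): ...
    def predict(self, state): ...

class Agent:
    # Owns a Brain and a Memory; acts in the Environment and learns from observations.
    def act(self, state): ...
    def observe(self, experience): ...
    def end(self): ...

class Runner:
    # Drives the whole loop.
    def warmup(self): ...   # fill Memory before learning starts
    def train(self): ...    # training episodes
    def run(self): ...      # evaluation / live run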

Trading-Gym - OpenSourced
Prediction Machines' release of the Trading-Gym environment into open source

- - demo - -

TensorFlow Trading-Gym: available now, with Brain and DQN example.
TensorFlow Trading-Brain: released soon.
References:

Insights in Reinforcement Learning (PhD thesis). Hado van Hasselt.

Human-level control through deep reinforcement learning.
V Mnih, K Kavukcuoglu, D Silver, AA Rusu, J Veness, MG Bellemare, et al.
Nature 518 (7540), 529-533.

Deep Reinforcement Learning with Double Q-Learning.
H van Hasselt, A Guez, D Silver.
AAAI, 2094-2100.

Prioritized Experience Replay.
T Schaul, J Quan, I Antonoglou, D Silver.
arXiv preprint arXiv:1511.05952.

Dueling Network Architectures for Deep Reinforcement Learning.
Z Wang, T Schaul, M Hessel, H van Hasselt, M Lanctot, N de Freitas.
The 33rd International Conference on Machine Learning, 1995-2003.
DDDQN and TensorFlow

Overview

1. DQN - DeepMind, Feb 2015 (the “DeepMindNature” paper)
   http://www.davidqiu.com:8888/research/nature14236.pdf
   a. Experience Replay
   b. Separate Target Network

2. DDQN - Double Q-learning. DeepMind, Dec 2015
   https://arxiv.org/pdf/1509.06461.pdf

3. Prioritized Experience Replay - DeepMind, Feb 2016
   https://arxiv.org/pdf/1511.05952.pdf

4. DDDQN - Dueling Double Q-learning. DeepMind, Apr 2016
   https://arxiv.org/pdf/1511.06581.pdf
Enhancements

- Experience Replay: removes correlation in sequences and smooths over changes in the data distribution.
- Prioritized Experience Replay: speeds up learning by choosing experiences with a weighted distribution.
- Separate target network from the Q network: removes correlation with the target and improves stability.
- Double Q-learning: removes a lot of the non-uniform overestimation by separating the selection of an action from its evaluation.
- Dueling Q-learning: improves learning when many action values are similar, by separating the Q value into two parts: the state value and a state-dependent action advantage.

(The last two are sketched in code below.)
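A minimal NumPy sketch of those last two ideas, assuming a single transition with precomputed Q-vectors (the names and discount factor are illustrative):

import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    # Double Q-learning: the online network selects the best next action,
    # the separate target network evaluates it (reduces overestimation bias).
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]

def dueling_q(state_value, advantages):
    # Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)
    return state_value + advantages - advantages.mean()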
Keras vs TensorFlow

                             Keras    TensorFlow
High level                     ✔
Standardized API               ✔
Access to low level                       ✔
Tensorboard                    ✔*         ✔
Understand under the hood                 ✔
Can use multiple backends      ✔

Install Tensorflow

My installation was on CentOS in Docker with a GPU*, but I also installed locally on
Ubuntu 16 for this demo. *Built from source for maximum speed.

CentOS instructions were adapted from:

https://blog.abysm.org/2016/06/building-tensorflow-centos-6/

Ubuntu install was from:

https://www.tensorflow.org/install/install_sources
TensorFlow - what is it?

A computational graph solver
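A tiny example of that idea in the TensorFlow 1.x style used in this talk: the graph is defined first and only evaluated when a session runs it.

import tensorflow as tf   # TensorFlow 1.x style, as used in the rest of this talk

# Define the graph first; nothing is computed until a Session runs it.
a = tf.placeholder(tf.float32, name='a')
b = tf.placeholder(tf.float32, name='b')
c = a * b + 2.0            # just another node in the graph

with tf.Session() as sess:
    print(sess.run(c, {a: 3.0, b: 4.0}))   # 14.0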


Tensorflow key API
Namespaces for organizing the graph and showing in tensorboard

with tf.variable_scope('prediction'):

Sessions

with tf.Session() as sess:

Create variables and placeholders

var = tf.placeholder('int32', [None, 2, 3], name='varname')

self.global_step = tf.Variable(0, trainable=False)

Session.run or variable.eval to run parts of the graph and retrieve values

pred_action = self.q_action.eval({self.s_t['p']: s_t_plus_1})

q_t, loss = self.sess.run([self.q['p'], self.loss], {self.target_q_t: target_q_t, self.action: action})
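Putting those pieces together in one small, self-contained TF 1.x sketch (the reduce_sum is just a stand-in for a real prediction network):

import tensorflow as tf

with tf.variable_scope('prediction'):                        # namespace, shows up in tensorboard
    s_t = tf.placeholder('float32', [None, 2, 3], name='s_t')
    global_step = tf.Variable(0, trainable=False, name='global_step')
    q = tf.reduce_sum(s_t, axis=[1, 2], name='q')             # stand-in for a real network

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # session.run(...) and tensor.eval(...) both pull values out of the graph
    print(sess.run(q, {s_t: [[[1., 2., 3.], [4., 5., 6.]]]}))  # [21.]
    print(global_step.eval())                                   # 0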


Trading-Gym

https://github.com/Prediction-Machines/Trading-Gym

Open sourced

Modelled after OpenAI Gym and compatible with it (a minimal interaction loop is sketched below)

Contains an example of DQN with Keras

Contains a pair-trading example simulator and visualizer
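Because the environment follows the gym-style reset()/step() interface, an interaction loop looks roughly like this. The constructor, observation format and number of actions are assumptions; see the repository examples for the real API:

import random

def run_episode(env, n_actions=3):
    # Gym-style loop: reset, then step until the episode ends.
    # A random policy stands in here for agent.act(state).
    state = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = random.randrange(n_actions)
        state, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward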


Trading-Brain

https://github.com/Prediction-Machines/Trading-Brain

Two rich examples

Contains the Trading-Gym Keras example with suggested structuring


examples/keras_example.py

Contains an example of Dueling Double DQN for a single-stock trading game
examples/tf_example.py
References
Much of the Brain and config code in this example is adapted from the devsisters GitHub repository:

https://github.com/devsisters/DQN-tensorflow

Our github:

https://github.com/Prediction-Machines

Our blog:
http://prediction-machines.com/blog/

Our job openings:


http://prediction-machines.com/jobopenings/

Video of this presentation:


https://www.youtube.com/watch?v=xvm-M-R2fZY
