
Learning to Trade with Q-Reinforcement Learning

(A TensorFlow and Python focus)

Ben Ball & David Samuel

www.prediction-machines.com
Special thanks to -
Algorithmic Trading (e.g., HFT) vs Human Systematic Trading

Algorithmic trading (e.g., HFT): often looking at opportunities existing in the microsecond time horizon. Typically using statistical microstructure models and techniques from machine learning. Automated, but usually with hand-crafted signals, exploits, and algorithms. Predicting the next tick.

Human systematic trading: systematic strategies learned from the experience of "reading market behavior", operating over tens of seconds or minutes. Predicting complex plays.


Take inspiration from DeepMind – Learning to play Atari video games
Could we do something similar for trading markets?

[Diagram: a simple feed-forward network - Input, two FC ReLU layers, Output, plus a functional pass-through connection.]

*Network images from http://www.asimovinstitute.org/neural-network-zoo/


Introduction to Reinforcement Learning
How does a child learn to ride a bike?

[Images: lots of trial and error, rather than formal instruction, eventually leading to success.]
Machine Learning vs Reinforcement Learning
Good textbook on this by Sutton and Barto (Reinforcement Learning: An Introduction).
- No supervisor
- Trial and error paradigm
- Feedback delayed
- Time sequenced
- Agent influences the environment

[Diagram: the agent-environment loop. The agent observes state S_t, takes action a_t, and receives reward r_t and next state S_t+1 from the environment.]

Trajectory: s_t, a_t, r_t, s_t+1, a_t+1, r_t+1, s_t+2, a_t+2, r_t+2, …
Key functions: the value function, the policy function and the reward function.

REINFORCEjs GridWorld: ---demo---
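The GridWorld demo is driven by exactly this kind of update. As a minimal sketch (not the REINFORCEjs code itself), here is the tabular Q-learning rule in Python; the grid size, learning rate, discount and exploration rate are illustrative assumptions:

import numpy as np

# Tabular Q-learning sketch for a small GridWorld-like task.
# Grid size, learning rate and exploration rate are illustrative assumptions.
n_states, n_actions = 25, 4            # e.g. a 5x5 grid with 4 moves
Q = np.zeros((n_states, n_actions))    # value function estimate Q(s, a)
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def choose_action(s):
    # Epsilon-greedy policy derived from the current Q estimates.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])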
Application to Trading
Typical dynamics of a mean-reverting asset or pairs-trading where the spread exhibits mean reversion

[Chart: price of the mean-reverting asset or spread evolving with time, oscillating around its mean between upper and lower soft range boundaries.]

Map the movement of the mean-reverting asset (or spread) into a discrete lattice where the price dynamics become transitions between lattice nodes.

We started with a simple 5-node lattice, but this can be increased quite easily.
State transitions of lattice simulation of mean reversion:

[Diagram: the spread price mapped onto lattice indices i = -2, -1, 0, 1, 2; positions are Short, Flat or Long, with sell and buy actions moving between them.]

These map into (State, Action, Reward) triplets used in the QRL algorithm.
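To make those triplets concrete, here is a minimal Python sketch of how a spread could be quantized onto the 5-node lattice and rewarded. The node spacing, reward convention and function names are assumptions for illustration, not the actual simulator code:

import numpy as np

# Illustrative only: quantize a mean-reverting spread onto the 5-node lattice
# (indices -2..+2 around the mean) and build (State, Action, Reward) triplets.

def to_lattice_index(spread, mean, node_spacing):
    # Nearest lattice node, clipped at the soft boundaries +/-2.
    i = int(round((spread - mean) / node_spacing))
    return max(-2, min(2, i))

def make_triplet(i_prev, i_next, position, action):
    # State = (lattice index, position); reward = P&L of holding `position`
    # across the transition between lattice nodes.
    pnl_per_node = {'long': 1.0, 'flat': 0.0, 'short': -1.0}[position]
    reward = pnl_per_node * (i_next - i_prev)
    return (i_prev, position), action, reward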
Mean Reversion Game Simulator
[Simulator screenshot (Level 3), marking an example sell transaction and an example buy transaction.]

As in the Atari games example, our QRL/DQN plays the trading game over and over.

See http://www.prediction-machines.com/blog/ for a demonstration.


Building a DQN and defining its topology

Using Keras and Trading-Gym


Double Dueling DQN (vanilla DQN does not converge well, but this method works much better)

[Diagram: two copies of the network, a training network and a target network. Each takes the lattice position and the (long, short, flat) position state as input, runs Input -> FC ReLU -> FC ReLU -> Output with a functional pass-through, and outputs the value of Buy, the value of Sell and the value of Do Nothing.]
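As a rough illustration of the topology above, a dueling network with a separate training and target copy could be sketched in Keras as follows. The layer widths, optimizer and loss are assumptions rather than the exact settings used:

from keras.layers import Input, Dense, Lambda, concatenate
from keras.models import Model
import keras.backend as K

def build_dueling_dqn(n_lattice=5, n_pos=3, n_actions=3):
    # Inputs: one-hot lattice position and one-hot (long, short, flat) state.
    lattice_in = Input(shape=(n_lattice,), name='lattice_position')
    position_in = Input(shape=(n_pos,), name='position_state')
    x = concatenate([lattice_in, position_in])
    x = Dense(32, activation='relu')(x)     # FC ReLU
    x = Dense(32, activation='relu')(x)     # FC ReLU
    value = Dense(1)(x)                     # state value V(s)
    advantage = Dense(n_actions)(x)         # advantages A(s, a) for Buy / Sell / Do Nothing
    # Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)
    q = Lambda(lambda va: va[0] + va[1] - K.mean(va[1], axis=1, keepdims=True))([value, advantage])
    return Model(inputs=[lattice_in, position_in], outputs=q)

training_net = build_dueling_dqn()              # updated every training step
target_net = build_dueling_dqn()                # frozen copy, synced periodically
target_net.set_weights(training_net.get_weights())
training_net.compile(optimizer='adam', loss='mse')

The target network is only refreshed from the training network every so often, which is what stabilizes learning.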
Trading-Gym Architecture
(Each component is an abstract class with concrete children classes.)

Runner: warmup(), train(), run()

Data Generator: next(), rewind()
  children: Random Walks, Single Asset Signals, Multi Asset, CSV Replay, Market Data Streamer

Environment: render(), step(), reset()
  children: Market Making

Agent: act(), observe(), end()
  children: Deterministic

Memory: add(), sample()

Brain: train(), predict()
  children: DQN, Double DQN, A3C
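Expressed as Python skeletons, the architecture looks roughly like the following. Only the class and method names come from the diagram; the comments and stub bodies are illustrative assumptions:

class DataGenerator:
    # Price source: random walks, single/multi asset signals, CSV replay, market data streamer.
    def next(self): raise NotImplementedError       # next tick / observation
    def rewind(self): raise NotImplementedError     # restart the series

class Environment:
    # Wraps a DataGenerator (e.g. a market-making game) behind a gym-like interface.
    def step(self, action): ...                     # returns (state, reward, done, info)
    def reset(self): ...
    def render(self): ...

class Memory:
    # Experience replay buffer.
    def add(self, experience): ...
    def sample(self, batch_size): ...

class Brain:
    # The learner: DQN, Double DQN, A3C, ...
    def train(self, batch): ...
    def predict(self, state): ...

class Agent:
    # Owns a Brain and a Memory; acts in the Environment and learns from observations.
    def act(self, state): ...
    def observe(self, experience): ...
    def end(self): ...

class Runner:
    # Drives the whole loop.
    def warmup(self): ...   # fill Memory before learning starts
    def train(self): ...    # training episodes
    def run(self): ...      # evaluation / live run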

Trading-Gym - OpenSourced
Prediction Machines' release of the Trading-Gym environment into open source

- - demo - -

TensorFlow Trading-Gym: available now, with Brain and DQN example.
TensorFlow Trading-Brain: released soon.
References:

Insights in Reinforcement Learning (PhD thesis). Hado van Hasselt.

Human-level control through deep reinforcement learning.
V Mnih, K Kavukcuoglu, D Silver, AA Rusu, J Veness, MG Bellemare, et al.
Nature 518 (7540), 529-533.

Deep Reinforcement Learning with Double Q-Learning.
H van Hasselt, A Guez, D Silver.
AAAI, 2094-2100.

Prioritized Experience Replay.
T Schaul, J Quan, I Antonoglou, D Silver.
arXiv preprint arXiv:1511.05952.

Dueling Network Architectures for Deep Reinforcement Learning.
Z Wang, T Schaul, M Hessel, H van Hasselt, M Lanctot, N de Freitas.
The 33rd International Conference on Machine Learning, 1995-2003.
DDDQN and TensorFlow

Overview

1. DQN - DeepMind, Feb 2015 (the “DeepMindNature” paper)
   http://www.davidqiu.com:8888/research/nature14236.pdf
   a. Experience Replay
   b. Separate Target Network

2. DDQN - Double Q-learning. DeepMind, Dec 2015
   https://arxiv.org/pdf/1509.06461.pdf

3. Prioritized Experience Replay - DeepMind, Feb 2016
   https://arxiv.org/pdf/1511.05952.pdf

4. DDDQN - Dueling Double Q-learning. DeepMind, Apr 2016
   https://arxiv.org/pdf/1511.06581.pdf
Enhancements

- Experience Replay: removes correlation in sequences and smooths over changes in the data distribution.
- Prioritized Experience Replay: speeds up learning by choosing experiences with a weighted distribution.
- Separate target network from the Q network: removes correlation with the target and improves stability.
- Double Q-learning: removes a lot of the non-uniform overestimation by separating the selection of an action from its evaluation.
- Dueling Q-learning: improves learning when many action values are similar, by separating the Q value into two parts: the state value and a state-dependent action advantage.

(The last two are sketched in code below.)
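A minimal NumPy sketch of those last two ideas, assuming a single transition with precomputed Q-vectors (the names and discount factor are illustrative):

import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    # Double Q-learning: the online network selects the best next action,
    # the separate target network evaluates it (reduces overestimation bias).
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]

def dueling_q(state_value, advantages):
    # Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)
    return state_value + advantages - advantages.mean()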
Keras vs TensorFlow

                             Keras    TensorFlow
High level                     ✔
Standardized API               ✔
Access to low level                       ✔
Tensorboard                    ✔*         ✔
Understand under the hood                 ✔
Can use multiple backends      ✔

Install Tensorflow

My installation was on CentOS in Docker with a GPU*, but I also installed locally on
Ubuntu 16 for this demo. *Built from source for maximum speed.

CentOS instructions were adapted from:

https://blog.abysm.org/2016/06/building-tensorflow-centos-6/

Ubuntu install was from:

https://www.tensorflow.org/install/install_sources
TensorFlow - what is it?

A computational graph solver
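A tiny example of that idea in the TensorFlow 1.x style used in this talk: the graph is defined first and only evaluated when a session runs it.

import tensorflow as tf   # TensorFlow 1.x style, as used in the rest of this talk

# Define the graph first; nothing is computed until a Session runs it.
a = tf.placeholder(tf.float32, name='a')
b = tf.placeholder(tf.float32, name='b')
c = a * b + 2.0            # just another node in the graph

with tf.Session() as sess:
    print(sess.run(c, {a: 3.0, b: 4.0}))   # 14.0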


Tensorflow key API
Namespaces for organizing the graph and showing in tensorboard

with tf.variable_scope('prediction'):

Sessions

with tf.Session() as sess:

Create variables and placeholders

var = tf.placeholder('int32', [None, 2, 3], name='varname')

self.global_step = tf.Variable(0, trainable=False)

Session.run or variable.eval to run parts of the graph and retrieve values

pred_action = self.q_action.eval({self.s_t['p']: s_t_plus_1})

q_t, loss = self.sess.run([self.q['p'], self.loss], {self.target_q_t: target_q_t, self.action: action})
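Putting those pieces together in one small, self-contained TF 1.x sketch (the reduce_sum is just a stand-in for a real prediction network):

import tensorflow as tf

with tf.variable_scope('prediction'):                        # namespace, shows up in tensorboard
    s_t = tf.placeholder('float32', [None, 2, 3], name='s_t')
    global_step = tf.Variable(0, trainable=False, name='global_step')
    q = tf.reduce_sum(s_t, axis=[1, 2], name='q')             # stand-in for a real network

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # session.run(...) and tensor.eval(...) both pull values out of the graph
    print(sess.run(q, {s_t: [[[1., 2., 3.], [4., 5., 6.]]]}))  # [21.]
    print(global_step.eval())                                   # 0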


Trading-Gym

https://github.com/Prediction-Machines/Trading-Gym

Open sourced

Modelled after OpenAI Gym and compatible with it (a minimal interaction loop is sketched below)

Contains an example of DQN with Keras

Contains a pair-trading example simulator and visualizer
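Because the environment follows the gym-style reset()/step() interface, an interaction loop looks roughly like this. The constructor, observation format and number of actions are assumptions; see the repository examples for the real API:

import random

def run_episode(env, n_actions=3):
    # Gym-style loop: reset, then step until the episode ends.
    # A random policy stands in here for agent.act(state).
    state = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = random.randrange(n_actions)
        state, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward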


Trading-Brain

https://github.com/Prediction-Machines/Trading-Brain

Two rich examples

Contains the Trading-Gym Keras example with suggested structuring


examples/keras_example.py

Contains an example of Dueling Double DQN for a single-stock trading game
examples/tf_example.py
References
Much of the Brain and config code in this example is adapted from the devsisters GitHub repository:

https://github.com/devsisters/DQN-tensorflow

Our github:

https://github.com/Prediction-Machines

Our blog:
http://prediction-machines.com/blog/

Our job openings:


http://prediction-machines.com/jobopenings/

Video of this presentation:


https://www.youtube.com/watch?v=xvm-M-R2fZY
