
Downloading historical Forex tick data and importing them into Python

This is another post in the series: How to build your own algotrading platform.

Jon V
Before running any live algotrading system, it is good practice to backtest (that means run a simulation of) our algorithms. Have in mind that this doesn't mean that if your system is killing it for the last 5 years/months/days it will make a profit, but it is a good indicator that you might be on to something.

There are four things that we need to take into consideration when we do our backtesting:

1. The quality of the data
2. How to load them efficiently
3. How to build our backtesting system
4. Try to have our backtesting and our live system share as much code as we can

Today, we are going to focus on (1) and (2).

For Forex data, I am using GainCapital. Their data are in the form of ticks. For a free source it is good enough. I used to use Oanda's historical data service but it seems that they moved it to a premium product. Too bad. Make sure that you use GainCapital's data only for experimentation.

For any other kind of paid historical data (ETFs, stocks, options etc.), I am using eoddata.com (they also have some forex historical data but I haven't used them).

Let's download data for a week and experiment a little bit. The link to the data is
http://ratedata.gaincapital.com/2015/11%20November/EUR_USD_Week1.zip for the first week of November 2015.

First we need to unzip the file:

    :::bash
    unzip EUR_USD_Week1.zip

and you'll get a 25MB file named EUR_USD_Week1.csv. These are data for one week for one currency pair. You can imagine the amount of data you need to process for all currencies for the last five years (hint: a lot!). But don't worry, we are going to optimize this. For now, let's open the file and inspect it.

>head EUR_USD_Week1.csv
lTid cDealable CurrencyPair RateDateTime RateBid RateAsk
4464650058 D EUR/USD 2015-11-01 17:00:06.490000000 1.103380 1.103770
4464650061 D EUR/USD 2015-11-01 17:00:06.740000000 1.103400 1.103760
4464650072 D EUR/USD 2015-11-01 17:00:07.990000000 1.103390 1.103750
4464650083 D EUR/USD 2015-11-01 17:00:08.990000000 1.103400 1.103750

The fields that we care about are RateDateTime, RateBid and RateAsk. As you can understand, each line has a timestamp and the prices to buy or sell at. Formats downloaded from other services are pretty similar.
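To make the row layout concrete, here is a sketch that parses a single tick into a timestamp, bid and ask. It assumes the rows are comma-separated (the `head` output above may just be rendering the commas as spaces), and the `parse_tick` helper is my own:

```python
from datetime import datetime

def parse_tick(line):
    """Split one comma-separated tick row into (timestamp, bid, ask)."""
    tid, dealable, pair, stamp, bid, ask = line.strip().split(',')
    # GainCapital timestamps carry nanosecond padding; trim to microseconds
    ts = datetime.strptime(stamp[:26], '%Y-%m-%d %H:%M:%S.%f')
    return ts, float(bid), float(ask)

ts, bid, ask = parse_tick(
    '4464650058,D,EUR/USD,2015-11-01 17:00:06.490000000,1.103380,1.103770')
spread = ask - bid   # the bid/ask spread, here roughly 3.9 pips
```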

There are many ways to load these data into Python, but the most preferable when it comes to data slicing and manipulation is Pandas. We can always use the csv library to load data (and it might be faster) but we need to do some optimizations and processing first that, as you will see, are pretty easy with pandas.

Another great tool for loading TONS of GBs pretty efficiently and very fast is Bcolz, covered in a much later post (or you can read a preview if you have signed up for the newsletter).

http://jon.io/downloading-historical-forex-tick-data-and-importing-them-in-to-python-using-pandas.html 2/8/16 17:56


Manipulating data using Pandas

The data we downloaded are in ticks. Unless we are building a UHFT (ultra high frequency trading) algorithm, it is much more efficient (memory-, storage- and processing-wise) to "group" these ticks into seconds (or minutes or hours, depending on your strategy). This will make our download scale down from 25MB to just 35KB, which translates to HUGE performance and memory benefits.

Let's group all these data into 15-minute bars. How? Time to fall in love with resample.

import pandas

df = pandas.read_csv(filename, header=0, index_col='DateTime', parse_dates=['DateTime'],
                     names=['Tid', 'Dealable', 'Pair', 'DateTime', 'Buy', 'Sell'])

# let's erase the columns we don't need


del df['Tid']
del df['Dealable']
del df['Pair']

# group every 15 minutes and create an OHLC bar


grouped_data = df.resample('15Min').ohlc()
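As a quick sanity check, the same resample works on a handful of synthetic ticks (made-up values, just to show the shape of the output):

```python
import pandas as pd

# four synthetic ticks spanning two 15-minute windows
ticks = pd.DataFrame(
    {'Buy':  [1.1010, 1.1030, 1.1005, 1.1020],
     'Sell': [1.1014, 1.1034, 1.1009, 1.1024]},
    index=pd.to_datetime(['2015-11-01 17:00:05', '2015-11-01 17:07:30',
                          '2015-11-01 17:14:59', '2015-11-01 17:16:10']))

# one open/high/low/close column group per input column
bars = ticks.resample('15min').ohlc()
```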

The resampled dataset looks like this:

DateTime             Buy Open  Buy High  Buy Low  Buy Close  Sell Open  Sell High  Sell Low  Sell Close
2015-08-02 17:00:00  1.09706   1.09821   1.09706  1.09735    1.09752    1.09853    1.09740   1.09762
2015-08-02 17:15:00  1.09731   1.09825   1.09731  1.09769    1.09762    1.09843    1.09758   1.09808

This is called an OHLC (Open High Low Close) bar, one for every 15 minutes. You can see now that the ticks are grouped into 15-minute segments, and you have the highest and lowest point that the price reached during each 15 minutes, plus the open/close for buy and sell. Pure gold! Not only do you have all the information you need, but it is now extremely fast to load. You just need to save the data:

:: python
# save to file
grouped_data.to_pickle(filename+'-OHLC.pkl')

and then you can reuse this 35KB file.
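The save/load round trip looks like this; later runs can skip the 25MB CSV entirely and read the tiny pickle instead (file path and sample values are illustrative):

```python
import os
import tempfile
import pandas as pd

bars = pd.DataFrame({'open': [1.0971, 1.0973]},
                    index=pd.to_datetime(['2015-11-01 17:00', '2015-11-01 17:15']))

# persist the resampled bars once...
path = os.path.join(tempfile.gettempdir(), 'EUR_USD_Week1-OHLC.pkl')
bars.to_pickle(path)

# ...and reload them instantly on every subsequent run
restored = pd.read_pickle(path)
```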

We can write a simple momentum algorithm that checks if there was a huge movement in the last 15 minutes and, if that was the case, buys. We will dive into this in a later post.

You can see the code, as always, on GitHub.

Coming up next, building a backtesting system from scratch!

If you have more feedback, ping me at jonromero or sign up for the newsletter.

Legal outro. This is an engineering tutorial on how to build an algotrading platform for experimentation and FUN. Any suggestions here are not financial advice. If you lose any (or all of) your money because you followed any trading advice or deployed this system in production, you cannot blame this random blog (and/or me). Enjoy at your own risk.

Category: trading POSTED THU 03 DECEMBER 2015


Tags: algotrading

BACK TO BLOG
