
Contents
1. On Becoming A Quant
2. The Essential Algorithmic Trading Reading List
3. The Self-Study Guide to Becoming a Quantitative Trader
4. QuantStart Lessons

ON BECOMING A QUANT

MARK JOSHI

March 13, 2015

1. WHAT DOES A QUANT DO?

A quant designs and implements mathematical models for the pricing of derivatives, assessment of risk, or predicting market movements.

2. WHAT SORTS OF QUANT ARE THERE?


(1) Front office/desk quant
(2) Model validating quant
(3) Research quant
(4) Quant developer
(5) Statistical arbitrage quant
(6) Capital quant

A desk quant implements pricing models directly used by traders. Main plusses: close to the money, and opportunities to move into trading. Minuses: it can be stressful and, depending on the outfit, may not involve much research.

A model validation quant independently implements pricing models in order to check that front office models are correct. Plusses: more relaxed, less stressful. Minuses: model validation teams can be uninspired and far from the money.

A research quant tries to invent new pricing approaches and sometimes carries out blue-sky research. Plusses: it is interesting and you learn a lot more. Minuses: it is sometimes hard to justify your existence.

A quant developer is a glorified programmer, but well paid and it is easier to find a job. This sort of job can vary a lot. It could be coding scripts quickly all the time, or working on a large system debugging someone else's code.

A statistical arbitrage quant works on finding patterns in data to suggest automated trades. The techniques are quite different from those in derivatives pricing. This sort of job is most commonly found in hedge funds. The return on this type of position is highly volatile!
A capital quant works on modelling the bank's credit exposures and capital requirements. This is less sexy than derivatives pricing but is becoming more and more important with the advent of the Basel III banking accord. You can expect decent (but not great) pay, less stress and more sensible hours. There is currently a drive to mathematically model the chance of operational losses through fraud etc., with mixed degrees of success.
People do banking for the money, and you tend to get paid more the closer you are to where the money is being made. This translates into a sort of snobbery where those close to the money look down on those who aren't. As a general rule, moving away from the money is easy; moving towards it is hard.

3. AREAS OF DERIVATIVES
FX
Equities
Fixed income
Credit derivatives
Commodities
Hybrids
Power/energy

FX is short for foreign exchange. Contracts tend to be short-dated with high volume and simple specifications. The emphasis is therefore on speed and smile modelling.

Equities means options on stocks and indices. Techniques tend to be PDE based, with the local vol model being popular. A typical contract is a note paying some function of the stock price path. Not a particularly big market.

Fixed income means interest rate derivatives. This is probably the biggest area by value. The maths is more complex because the underlying is multi-dimensional. Martingale techniques are used a lot. It's well paid.

Credit derivatives are derivatives that pay off according to the defaults of corporate entities. This was a big growth area with lots of demand translating into very high pay. It displayed some bubble-like characteristics, however, and the bubble has now burst.

Commodities is also a big growth area, with the general rally in commodity prices in recent years. This is the area that seems to be holding up best in the current job market.

Hybrids are derivatives that pay off according to behaviours in more than one market; this is typically interest rates plus something else. The main advantage of working on such products is the ability to learn multiple areas.

Power and energy derivatives relate to the cost of buying and selling electricity. The wholesale market for electricity has some unique features.

4. SORTS OF EMPLOYERS

We can roughly divide employers into

Commercial banks, e.g., RBS, HSBC
Investment banks, e.g., Goldman Sachs; Lehman Brothers was one of these and died in spectacular fashion
Hedge funds, e.g., the Citadel Group
Accountancy firms
Software companies

Commercial banks ask less of you, and pay less. Better job security.
Investment banks tend to demand long hours but pay well. Not so
good job security.
Hedge funds tend to demand a lot of work. They are very volatile and
a big growth industry currently. There is the potential to make a huge
amount of money, but also the potential to be unemployed after a few
months.
In general, American banks pay better but demand longer hours than
European banks.
The big accountancy firms have quant teams for consulting. Some
places, particularly D-fine, send their employees on the Oxford Masters
course. The main disadvantage is that you are far from the action, and
high quality individuals tend to work in banks so it may be hard to find
someone to learn from. Related places are consultancies and insurance
companies.

There is an increasing tendency to outsource quant modelling and to buy in software models. One option is therefore to work for the software company instead. The issues are similar to those with working for accountancy firms.

5. STUDY

What should one learn? There is by now a huge number of books available. Standard books are

Hull, Options, Futures and Other Derivatives: this is sometimes called the bible. The main downside is that it is oriented towards MBAs rather than quantitative PhDs.
Baxter and Rennie: an accessible introduction to the martingale approach, but oriented towards theory rather than practicalities.
Wilmott (Derivatives): good on the PDE approach but not so good on other approaches.

Like any author, I recommend my own books:

The Concepts and Practice of Mathematical Finance, CUP 2003. My objective here was to cover what a good quant ought to know. It includes programming projects that I strongly advise you to do before applying for jobs. The second edition appeared in 2008.
C++ Design Patterns and Derivatives Pricing, CUP 2004. This is a second book on C++; the objective was to teach the reader how to use the language properly. The second edition appeared in 2008.
Quant Job Interview Questions and Answers. We self-published this one. I spent 8 years gathering questions from live quant job interviews; here they are with answers and possible follow-up questions. This is currently available only from amazon.com.
More Mathematical Finance, Sep 2011. This is the sequel to Concepts. It takes up where Concepts left off and contains much more discussion of the numerics and the models.

Stochastic calculus is useful, but not as important as it at first appears. It is hard to find the time to pick it up on the job, so it's worth learning in advance. It's also worth spending some time going over basic probability theory, e.g. Chung's books. Some books on stochastic calculus and martingales which I like are

Williams, Probability with Martingales: a remarkably easy-to-read rigorous account of discrete time martingale theory. (You need to know the discrete time stuff to learn the continuous case.)
Rogers and Williams, particularly Volume 1.
Chung and Williams: you need to know continuous time martingales first, but if you do it is a nice read.

I keep a much more detailed list at

http://www.markjoshi.com/RecommendedBooks.html

6. FORUM

I am now running a book and careers forum to discuss books and getting a first job as a quant, which you can access from

http://www.markjoshi.com

Please ask me questions via this forum rather than by e-mail. (Please only
e-mail me if there is some confidential aspect to your query.)
It also now has an experimental job-wanted section. Post your profile but not personal details and see if anyone's interested...

7. HOW MUCH DO I NEED TO KNOW?

The amount you must study before getting a job varies a lot from place to place. It goes up every year as it becomes more standard to do financial mathematics degrees. At the time of writing, I would advise knowing the contents of both my books well. A lot of candidates go wrong by reading books instead of studying them. Pick a couple of books and pretend that you have to do an exam on them (this is essentially what happens in an interview); if you aren't confident that you'd get an A in that sort of exam, don't apply for jobs.

Interviewers tend to care more about understanding the basics well than about knowing a lot. It's also important to demonstrate genuine interest in the field. Read the Economist and the FT or Wall Street Journal comprehensively. It's not unusual to ask basic calculus or analysis questions, e.g. what is the integral of log x. Asking for a derivation of the Black-Scholes equation is very common too. They always ask you to explain your thesis, so be prepared to be able to do this. Have a prepared 60-second speech on every phrase on your CV.
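For reference, both are completely standard results (quoted here only as a reminder of the level expected, not as a substitute for doing the derivations yourself). Integration by parts gives

\int \ln x \, dx = x \ln x - x + C,

and the Black-Scholes equation for the value V(S,t) of a derivative on an underlying S, with volatility \sigma and risk-free rate r, is

\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS \frac{\partial V}{\partial S} - rV = 0.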
The interview is also a chance for you to judge them. What are they like as people? (You will be spending most of your waking life with them so this is important.) What do they care about, as evidenced by what they ask you? If most of the questions are about the minutiae of C++ syntax then be wary, unless that's the sort of job you want.

Generally, a PhD (or almost a PhD) is a necessity to get a quant job. I would advise against starting before it's awarded, as it tends to be hard to get it done whilst doing a busy job.
Having a master's degree in financial mathematics but no PhD tends to lead into jobs in banking in risk or trading support, but not straight quant jobs. Banking is becoming progressively more mathematical, so the knowledge is useful in many areas in banks. Some people then manage to move into quant later on.

In the US, it seems to be becoming more and more common to do a master's after a PhD. This still seems to be less the case in the UK. There is a general move towards more routine work and less research in banks, making the job less interesting. This seems to be particularly the case in the US. One head quant told me that he regards research as something to be contracted out to universities.

8. THE CURRENT JOB MARKET

Post the global financial crisis, it has become much harder to find a job. The problems are particularly acute at entry level. Each year it seems to get worse, but some jobs do still exist. What does this translate into in terms of behaviour as a job seeker?

First, you must really know your stuff. The days of hiring on the basis of potential are gone; now make sure that you have done your preparation and can cope with any reasonable question. This means learning the books, being able to reproduce them and drilling interview questions at great length. It also means spending a lot of time implementing the models in C++ so you can demonstrate your ability to contribute from day one.

Second, you can't assume that you'll have a lot of chances. In the past, many candidates honed their skills by going to lots of interviews and having their gaps discovered for them. This is not an option when only a few places are hiring: they won't reinterview you just because you have done a bit more preparation. Doing your preparation also means finding out about the company and the area.

Third, don't be picky regarding area. You may well want to work with exotic interest rate derivatives, but if all the jobs are in commodities then accept that and plan a shift when the market does.

Fourth, don't get focussed on the salary. The money is down; the important thing is to get some useful experience for when the market turns around.

Fifth, do you really need to graduate this year? Spending a little longer at university is not a bad way to sit out the crisis. You can always spend the time broadening your knowledge and maybe even get a financial maths research project going.

9. FOR PURE MATHEMATICIANS

The main challenge for a pure mathematician is to get one's hands dirty and learn to be more focussed on getting numerical results than on fancy theories. The main way to do this is to implement pricing models for practice. If this doesn't appeal, you aren't suited to being a quant. There are quite a few ex-pure mathematicians working in the City so it can certainly be done, but there is some prejudice in favour of applied maths and physics people. Generally, people tend to hire people who are like them, so if you can find anyone with a similar background working in the City, apply to them.

I sometimes get asked by people whether they should do a pure maths PhD or a financial maths one. If you are absolutely sure you want to do derivatives pricing then you should do it in financial maths. (Yes, I am taking PhD students but have no capacity at the moment.) If you aren't sure then don't. A good compromise is to do stochastic calculus; this is a hard area which will give plenty of intellectual stimulation and leave you very well placed for working in derivatives if you ever want to make the switch.

10. CODING

All forms of quants spend a large amount (i.e. more than half) of their time programming. However, implementing new models can be interesting in itself. The standard programming approach is object-oriented C++. A wannabe quant must learn C++. (I have no opinion on whether this should be the correct language for implementing; it is merely the correct language for getting a job.) Some places use MatLab and that is also a useful skill, but less important. VBA is also used a lot, but there is a general attitude that you can pick it up on the job. If a job is very VBA focussed that's generally a bad sign.
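To give a concrete idea of the sort of practice implementation worth having behind you before interviewing, here is a minimal sketch in C++ of a Monte Carlo pricer for a European call under Black-Scholes dynamics. It is illustrative only (the parameter values in main are made up and there is no error handling), not code from any particular bank or book.

// Minimal illustrative sketch: Monte Carlo price of a European call under
// risk-neutral geometric Brownian motion. Not production code.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <random>

double monteCarloCall(double spot, double strike, double rate,
                      double vol, double expiry, unsigned long numPaths)
{
    std::mt19937_64 rng(42);                          // fixed seed so runs are reproducible
    std::normal_distribution<double> gauss(0.0, 1.0);
    double sum = 0.0;
    for (unsigned long i = 0; i < numPaths; ++i)
    {
        double z = gauss(rng);
        // S_T = S_0 exp((r - sigma^2/2) T + sigma sqrt(T) Z)
        double sT = spot * std::exp((rate - 0.5 * vol * vol) * expiry
                                    + vol * std::sqrt(expiry) * z);
        sum += std::max(sT - strike, 0.0);            // call payoff
    }
    // Discount the average payoff back to today.
    return std::exp(-rate * expiry) * sum / static_cast<double>(numPaths);
}

int main()
{
    // Illustrative inputs; check the output against the closed-form Black-Scholes price.
    std::cout << monteCarloCall(100.0, 100.0, 0.05, 0.2, 1.0, 1000000) << "\n";
}

Checking the answer against the closed-form price, and then extending the code (other payoffs, antithetic variates, a proper random number class), is exactly the kind of project suggested in Section 5.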

11. APPLYING FOR A JOB

All of the finance forums have their own jobs advertising boards. Another useful site containing a distilled version of this guide is

https://www.financejobs.co/
Some adverts are from recruitment consultants rather than from banks. It is important to realize that the job may not even exist; the consultant wants to get decent candidates whom he can then try to place in banks. The consultant gets a commission from the bank if he can place you. They tend to have short attention spans. If you do well at the first couple of interviews then they will work hard to get you a good job, but if you don't they will quickly lose interest. Also, be aware that their agenda is to get a good commission rather than to help you, so they will push you at jobs on that basis. (A typical cut is 25% of your first year's package, so whether you say yes to a job makes a difference of ten thousand pounds to them.) If you want to understand them, think of estate agents.
In fact, going via a recruitment consultant is the standard way to get a job. Quants are generally not hired as a part of the on-campus recruitment process but instead hired as they are needed by the team. That said, it is worthwhile to go to presentations and to meet the people, and get their contact details for later. Because of this it is not a great idea to start applying a long time before you want to start. Banks tend not to be into paying expenses for interviews. One therefore needs to go to London or New York and attempt to get as many interviews as possible as quickly as possible.
If you have personal contacts, you should use them. Employers prefer
not to use headhunters if they can avoid it. If you are finishing a maths or
physics PhD from a top university you will be a hot property. Employers
will be keen to get you before someone else grabs you, so make use of
this.
Recruitment agencies vary tremendously and are discussed at great length on all the online forums. One which seems to know what they are doing more than most, and which has its own much more extensive guides, is paulanddominic.
If you get offered a job that is not in your ideal area, do not be too worried. It is the first job that is hard to get. You can move on. The main thing is not to spend more than a couple of years in an area where you do not want to be. Quants are most employable with 18 months to 2 years' experience. With more than that they tend to be too well paid and get pigeon-holed.
From time to time, I hear of someone being offered a job and being told they must accept immediately or within 24 hours. This is unreasonable; you should question why they are doing this, and ask yourself whether you want to work with someone who treats you this way. Possible responses are

Why?
Does that mean the offer will go away if I don't accept immediately?
Oh, I get it, you are testing my naivety (and laugh).
Mark Joshi's guide says never to accept an offer made under such circumstances.

If you are interviewing with other places, call them first and tell them the circumstances; they will find this less annoying than you telling them you accepted a job under pressure.
I regularly get berated by recruitment consultants for my comments in
this section. Here is one rebuttal:
While it is true that the recruitment/consulting agency market has become overly saturated and commoditized, as people working in the industry realized they only really need a phone and a computer to start off on their own in this business, I believe that our current state of affairs has weeded out many of those non-reputable firms or one-man body shops looking for a quick placement to make a few dollars. There are definite advantages to working with a reputable recruitment firm, and I do agree with Mark's assessment that care must be taken in learning about who you are dealing with, prior to just turning over your CV blindly to them. A few quick tips: look at the firm's website; if they do not have one, this should be your first red flag. When were they established? Who are their clients? What types of firms are they representing? Chances are that if they have been in business for a number of years and have a good client list, they are more than likely reputable. Typically, large investment banks and organizations go to great lengths to establish their preferred vendor lists of allowable recruitment agencies. Please ensure that you are represented directly to the client, as some agencies will attempt to represent you via a third party; this is not recommended. If an accurate job description and the direct client's name are provided, you can probably be assured the person you are dealing with is reputable. The advantage is this: when you apply directly to a large investment bank, your name, CV and contact info are entered into a large, complex database containing thousands of potential candidates. These resumes are funneled through channels to an internal recruiter, who gets hundreds of resumes daily for a variety of different jobs, not just quant jobs, whatever happens to be a priority that day. He will focus on maybe a few of the resumes he gets every day and the rest will be filed for future reference, which never happens. If you do not hear anything and keep applying, your CV and profile are tagged as a serial applicant and you will no longer be considered for positions within the bank, out of the sheer fact that you seemed desperate, even though you had never heard anything and so just kept applying. If, on the other hand, I call you, or get back to you in regard to one of my posts, say on LinkedIn, I am contacting you for a specific job, for a specific manager, whom I speak with on nearly a daily basis. I have the advantage of submitting you directly to the hiring manager. I literally put your resume right into his hands, with a brief summary of your skills and why I think you are a match for the role. I can almost guarantee you an interview in most cases where I submit your profile, and I provide insights from others who have interviewed with the same managers in the past, possibly providing potential questions he may ask and the answers he is looking for. The rest is up to you. I do believe there is, in fact, an inherent value here: in a world of online applications, databases, vendor management offices, etc., it is still nice to know there are agencies out there who go above and beyond to see you placed. It makes no sense for me to place someone in a role in which he or she will not be happy, for a few reasons: (a) if the person leaves, a portion of my fee needs to be refunded, and (b) if the candidate is not happy and does not perform, I look bad to my client. In closing, I don't think it prudent to treat potential quants like lost children who need to be sheltered and shown the light; chances are that if they have not seen it already, they will be eaten alive in the world of high finance.
Dallin T. Swenson
Account Executive
dswenson@softinc.com
www.softinc.com

12. PAY

How much does a quant earn? A quant with no experience will generally get between 40 and 70k pounds. The lowest I have heard of is 25k and the highest is 70. If the pay is outside the standard range, you should ask yourself why. Pay will generally go up fairly rapidly. Bonuses are generally a large component of total salary, and should be taken into account when negotiating pay. For example, you may be able to get a guaranteed bonus if the base is lower.

Do not get too focussed on what the starting salary is. Instead examine what the job opportunities will be, and what the learning experience is likely to be. How much turnover is there in the team (some managers get touchy if asked about turnover, so it may be better to try and ascertain this indirectly), and where do the people go?

13. HOURS

How hard does a quant work? This varies a lot. At RBS we got in between 8.30 and 9 and went home around 6pm. The pressure varied. Some of the American banks expect much longer hours. Wall St tends to be more demanding than the City. In London 5 to 6 weeks' holiday is standard. In the US 2 to 3 is standard.

14. INTERVIEWING

Here are some dos and don'ts that will reduce your chance of messing up unnecessarily.

Don't be late.
Don't be early; this annoys the interviewer. Get there early, go to a cafe and have a lemonade, and turn up dead on time.
Do eat a good meal beforehand; sugar lows destroy thinking power.
Don't argue with the interviewer about why they've asked you something. They've asked you it because they want to know whether you can do it.
Do appear enthusiastic.
Do wear a suit.
Do be eager to please. They want someone who'll do what they want; you must give the appearance of being obliging rather than difficult.
Don't be too relaxed; they may well conclude that you aren't hungry enough for success to work hard.
Don't tell them they shouldn't use C++ because my niche language is better.
Do demonstrate an interest in financial news.
Do be able to talk about everything on your CV (resume in American). Have a prepared 2-minute response on every phrase on it.
Do bring copies of your CV.
Don't expect the interviewer to be familiar with your CV.
Don't say you've read a book unless you can discuss its contents; particularly if they've written it.
Do be polite.
Do ask for feedback, and don't argue about it. Even if it's wrong, try to understand what made the interviewer think that.
Don't say you want to work in banking for the money; of course you do, but it's bad form to say so.
Do say you want to work closely with other people rather than solo.
Don't say that you think that bankers are reasonable people; they aren't.
Do take a break from interviewing and do more prep if more than a couple of interviews go badly.
Don't use a mobile for a phone interview.
Do be able to explain your thesis; work out explanations for different sorts of people in advance.
Don't expect banks in the UK to pay for interview expenses. If they do agree to pay, make sure they are willing to pay what your ticket will cost, e.g. don't get an expensive ticket if they say they'll pay for a cheapo airline.
Do ask about the group you'll be working in, e.g. turnover, where people go when they leave, how many, when you can meet the rest of the group (only if an offer appears imminent), how old the group is, what the team's raison d'être is, whether it is expanding or contracting. What would a typical working day be?
Don't get on to the topic of money early in the process.

A general comment is that quant has the reputation of being a hard area to get into, but if you talk to any hiring manager they'll tell you that they interview lots of candidates and most are terrible. It's rare to be forced to choose between two good candidates; it's much more common to be relieved that you've finally found one who's good enough. The moral is that most candidates are failing to reach the required level. If you are good at maths and do your preparation, you can be at that level and get a job.

15. THE CQF

I get more e-mails on this topic than any other. I have little direct experience of it; however, here are my impressions from others.

First, the CQF stands for the Certificate in Quantitative Finance and is run by 7City training. The course was created by quant author Paul Wilmott of wilmott.com. Wilmott also created the diploma in Mathematical Finance at the University of Oxford before parting company with that organization.

The CQF is a six-month part-time course which is available by distance learning. Its aim is to teach the attendee how to be a quant.
Here are some comments from a recent satisfied customer who was already working in banking:

The CQF is an excellent course that is like a condensed, accelerated MSc in Mathematical Finance. The CQF covers the basics plus a lot of practical stuff like C++ and Excel VBA, and advanced topics like uncertain parameters and stochastic volatility. It has definitely opened a lot of doors for me that were previously closed, and it is becoming more and more recognised within the industry. The whole thing takes 6 months, with a module per month. Each module consists of 4 or 5 sections with homework set at the end of each one. There is an exam at the end of each module, where you need to score 60% or above to progress to the next module. If anyone fails a module, they are given a reading list and encouraged to join the course at the same point six months later - i.e. with enough will no-one fails. The final exam is a programming project where you're given a Monte Carlo and FDM scenario to code up. The content of the course is heavily mathematical with no holds barred - stochastic calculus, derivation of Black-Scholes, BS with dividends, BS with discrete hedging, stochastic vol, jump diffusion, calibration, interest rate models, credit models, etc. Foundational mathematics is given prior to the start of the course if required, and new entrants are required to sit a small exam to test their ability to do the course (basic calculus, linear algebra and probability type questions). All exams are done at home, except for a final one at the very end of the course, after the module exams, which is optional and determines if you get a distinction. A distinction is basically an asterisk by your name in the FT.

Another recent attendee says that it inevitably covers less than an MSc, since it is part-time over six months, versus one year full-time or two years part-time for an MSc. He also thought it was well suited to those already with day jobs, and valuable for career development for those wanting to move into more quantitative areas.

A general impression seems to be that it is easy to pass the course, but getting a distinction requires some real work and ability.

A head-hunter suggests that it is more useful as a way for those already working in banking to change areas than as a way to move into banking.
Some comments made by Urnash on nuclearphynance (where you can find further discussion):
I was looking for 1) a way to learn those parts of quantitative finance that every quant should know, but that I haven't learned so far (because they are not used at my current job) and 2) something that can be done in less than a year full-time (for personal reasons). Since this ruled out every program for a Master in Financial Engineering, I chose the CQF.

Of course, I could have simply bought a couple of books and worked through them myself. However, I learn much better if I know that I have to read something this weekend, since I will have to answer some questions about it before Monday, and that I'll have an exam on it in two weeks' time.

What do you learn? Certainly not everything which is mentioned in the many books that one receives (all published by Wiley, surprise, surprise). Not even everything which is written in the 2nd edition of Paul's book. The span of the course is much wider. So you do not only work with PDEs; you also get quite a bit about the martingale approach, quite a bit on credit derivatives, a lecture on portfolio optimisation, a lecture by Ayache (ITO33) on convertible bonds, Jaeckel on Monte Carlo, etc. However, note that the program changes continuously, so I do not know what the current program is.

Before giving a list of what I liked and disliked about the CQF, you should know the following: I took the distance course since I do not live in London. This means that I took it through the internet. You can see the presenter and what he writes, and you can ask questions either through IM or a microphone.

Furthermore, I already knew approximately 50-60% of what was taught in the CQF. So I do not know what happens if you start the CQF without any knowledge about quantitative finance. My idea is that it would be quite hard. My advice would be to browse for a short period in either Paul's book or in Hull (I have no experience with Neftci's books so I cannot comment on them, and as much as I like mj's book, I wouldn't recommend it as the first book that an aspiring quant should read (but you certainly should read it after a while)). And you should already be proficient in a programming language (either VBA or C(++)) before starting.

Oh, and do not plan anything on the weekend before the end of the module; the exams that you have to take after every module are not hard, but they are quite long. The answers that I sent were usually, per module, approximately 8-15 pages, not including the Excel sheets that were required. But I must admit that I usually used a large font.
But now, my verdict. Things I liked about the CQF:

Comprehensive, giving a nice overview of the derivatives market.
Sebastian Lleo, one of the presenters, answered my questions extremely quickly.
Good that most of the course was given using the same symbols. I have seen plenty of other resources where different symbols were used in different chapters to denote the same thing.
The CQF is not just what was written in the 3 volumes of Paul's book. There was also quite a lot about credit derivatives and a bit about martingales.
Paul can actually teach. He manages sometimes (not always) to explain things very clearly and succinctly.
When the connection worked, it was nice that one could ask questions as a distance delegate while the lecture was given.
I like the fact that the courses are kept online indefinitely, and that videos of new lectures are added to them.

Things I did not like:

I followed the course through the internet. On multiple occasions I had such problems with the live connection that I had to abort the lecture and watch it later on a different date. It was not clear who caused this (7city, Webex or my ISP). The contact with the Webex helpdesk did not solve the problem at all.
For some reason the optional exam for extra credit only covered Paul's book. As far as I remember there were no questions in it about CDOs or martingales.
Some of the test exams that were given every week were returned quite late. Thus it was not possible to use them while learning for the module exams.
While the explanation of the grading of most exams was succinct but clear enough, the explanation of the grading of Module 6 was, with all due respect, laughable. All I got was a single line of comment. And this as a reply to a bound 15-page booklet with a CD-ROM that I had to send to the CQF organisers by priority mail!
There is a helpdesk on the CQF website which one can use to ask the organisers questions. Since e-mails nowadays tend to get lost in a spam filter, I used it a lot. While some of my queries were answered quickly, others were not answered at all. After a long mail to the CQF organisers they told me that while the helpdesk is there for the delegates, it is still better to contact the presenters directly. If this is the case, why does the helpdesk exist?

While the list of things I did not like is longer than the list of what I did like, this does not mean that I did not like the course. However, the problems with the connection and the fact that some helpdesk questions were not answered at all made it a bit hard for those, like me, who take the distance learning course to receive the same amount of tutoring as those who took the course in the classroom. And tutoring through e-mail and the web is what should make an internet course different from a set of taped lectures on a DVD.

A general complaint is that it's expensive for what it is.

Paul Wilmott is someone who arouses strong emotions in the quantitative finance community, and certainly some people are against the qualification for that reason.

The bottom line seems to be: worth doing if you want to move areas within banking and your employer is willing to pay, but not the way to get your first quant job after university.

16. OTHER RESOURCES

There are by now a large number of online forums where these sorts of questions are discussed to death. I keep an up-to-date list on www.markjoshi.com. I also run a forum on www.markjoshi.com for discussing books and career issues.

17. EXAMS

There has been a shift towards the use of written exams to sift entry-level candidates. There is a certain degree of fairness in this approach. The main issue tends to be that the questions are fitted very much to the setter's prejudices, but this is true of all interviews in any case.

Along with this is a shift to associates programmes specifically for entry-level quants, instead of hiring them as needed. For example, Barcap has a quantitative associates program that only has intake at specific times.

http://www.barcap.com/campusrecruitment

18. THE VIEW FROM MILAN

Here are some comments made by Italian quants.

18.1. Sorts of employers in Italy. In Italy the employers can be divided into:

Investment Banks, e.g. Banca IMI
Commercial Banks, e.g. IntesaSanpaolo, Unicredito
Asset Management, Private Banking, Alternative Investments, e.g. Eurizon Capital
Consultancy firms, e.g. Accenture, Deloitte
Software companies, e.g. Statpro, FMR

Better job security can be found in Investment Banks, Commercial Banks and Investment Management. Typically, financial institutions don't fire you unless you try to commit fraud. Investment Banks, Commercial Banks and Investment Management also pay better than Consultancy firms and Software companies. Moreover, they offer a contract which includes several benefits (e.g. health insurance) which aren't usually included in other companies' standard contracts and which can make the difference, especially when at entry level and earning a low base salary. I think it's useful to consider that in periods strongly characterized by mergers between big Italian financial institutions, the demand for employees in Banks and Asset Management usually decreases for a couple of years while job opportunities in Consultancy firms increase.

18.2. Applying for a job in Italy. If you're looking for your first job, headhunters won't help you that much. Italian headhunters tend to pay attention to candidates who already have some years of professional experience. Nowadays you're on the right track to get your first job when you:

Activate your personal contacts, e.g. LinkedIn.
Send your CV to the human resources department of the companies you're interested in and propose yourself for an internship. It's arguably better to get a good internship than a poor permanent job.
Take part in the recruitment events organized by the main institutions (e.g. universities).

When you're looking for your first job, it's really important to be employable for an internship. I'd like to stress this point because starting as a stageur is a good way to become an employee in a few months. An internship can last from four months up to one year. During this period a project will be assigned to you and a tutor will train you. Detailed information on the rules governing internships in Italy can be found here:

http://www.sportellostage.it/aziende/normativa.htm.

As a stageur you have the opportunity to learn and to understand whether you like that job. Most importantly, if you reach the goals implied in your assignments, you will become very valuable to your tutor, who has invested time in training you and is therefore interested in hiring you at the end of the internship period.

When applying for an entry-level role, be prepared to face the following selection stages:

(1) written aptitude tests and quantitative tests
(2) interview with people from the human resources department
(3) technical interview(s) with one or more members of the team which you will join if you pass the selection process
(4) interview with a person from HR to discuss the terms of your contract

Meeting HR people demands a special piece of advice: frequently they do not even understand the position/role you are being selected for, but they try to understand what your personal motivations are, who you are both as an individual and as a worker, whether you will fit into the team easily, and whether you will create more returns than problems in the long run. Simply let them see that you are such a person, and they will be OK.

These steps are common to a great number of firms. It's a cliché but I think it's worthwhile: remember that there's no second chance to make a good first impression.

19. THE VIEW FROM JAPAN

Here are some comments from an Australian who did his PhD in pure
mathematics in Japan, and then went looking for a quant job.
I applied to banks in Japan through their standard new grad recruitment programme (undergrads and postgrads together; note this seems to be different to how a potential PhD applies in the UK). After many info seminars and early-stage interviews, I got a much better idea of the people and roles in a bank. In fact I decided to go for trader/structurer roles instead of quant.

The rates hybrids desk at an international bank said that if I really wanted to start immediately in trading then they'd let me (at this stage I had the leverage of another firm's structurer offer in my pocket), but they'd like me to work as a quant for two years first. They said the best traders know their models inside-out. I liked all the people there and I trusted what they said, so I accepted their offer.

20. MANAGEMENT CONSULTANCY

I have heard the following from a few quants with science PhDs. The following happens:

They decide to try management consultancy.
They get nowhere applying.
They conclude that consultancies are not interested in people like them.
They become quants.

Not knowing many management consultants, I can't judge how many people with science PhDs are successful at getting into management consultancy (some certainly are), but there is a definite theme that consultancies do not value science skills or personalities. (Any management consultant reading this who wishes to give a contrary view, please feel free to contact me.)

21. WAR STORIES

I would like to make this guide more dynamic by including the latest
gossip and stories of job applicants. So send me, mark@markjoshi.com,
your experiences, including info such as

how many different firms?
how many interviews per firm?
what you thought of headhunters.
how you got the job you took?
which books were helpful?
what you wish someone had told you before you started.

22. INTERVIEW QUESTIONS

We have published a book of quant job interview questions. We would like to keep it up to date, so please send me lists of questions: ones that are particularly tricky or interesting are especially good. Boring questions are good too, however, in that they give me a feel for what is happening in the marketplace. A fair number of questions are also posted on the forum on www.markjoshi.com. It's scary how many questions come up time and time again.

23. COURSES

Once you've got that job, the firm will generally be willing to send you on at least one training course. Please consider attending one of mine. My next course will be in Sydney in December 2011 and will cover the LIBOR market model and its kooderive (i.e. CUDA) implementation. I also keep a list on www.markjoshi.com.

24. ADVERTISING

Various recruitment agencies and courses have asked for plugs. If you would like to advertise in this guide, e-mail mark@markjoshi.com.

25. REPRODUCTION

Please don't copy this guide onto your website. I am happy for you to include extracts and a deep link to it, however, and I will not move the guide's web location. The reason for this is that I update the guide regularly and I do not want there to be lots of versions floating around which I then have to police.
The Essential Algorithmic Trading Reading List

Michael Halls-Moore, QuantStart.com

Thank you for signing up to the QuantStart mailing list and receiving the Algorithmic Trading Toolbox. As part of the toolbox I wanted to provide a comprehensive reading list to help you get up to speed with algorithmic trading. Algorithmic trading covers a broad range of topics and as such it can be extremely confusing for a beginner to know where to start. For this reason I have labelled each book as "Beginner" or "Advanced". If you have no prior background with algorithmic trading, then I suggest consulting the beginner texts and working your way through to the advanced books.

Everyone who reads this list will have taken a very different educational path. Some of you may be experienced discretionary traders who are interested in automating your strategies, but haven't coded in a programming language or delved into advanced mathematics before. Others of you may have a PhD in statistics or machine learning but have never applied your skills to the financial markets. I have tried to create a "one size fits all" list, but obviously it will need to be tailored to your particular skillset and interests. I hope the list will be of interest both to retail traders who want to "test the quantitative waters" and to seasoned hedge fund professionals who are looking for a new approach to their trading.

The approach I've taken is to introduce you to the necessary mathematics that will help you get up
to speed in creating your algorithmic trading strategies. You can of course skip these books if you
want to "dive in" or if you have an extensive mathematical background. However, if you haven't
taken a first year university level course in Probability, Calculus or Linear Algebra, you may find
the subsequent texts hard going.

I'm well aware that the length of the list can be off-putting to a beginner! Clearly it is unrealistic to
consider reading all of these books from cover-to-cover. There are only 24 hours in the day, after
all! In my own personal reading, I tend to concentrate on specific chapters of individual books. I re-
read those chapters multiple times when necessary. Knowing the basics extremely well is much
more important than having an encyclopaedic knowledge of all statistical machine learning and
time series models.

Necessary Mathematics
This is an optional section and is only suitable for those who have no university mathematics background. In order to tackle this section you should be familiar with mathematics to a UK A-Level or European International Baccalaureate (IB) level. I believe this is equivalent to senior high-school mathematics in the US. In order to tackle the following books you should be familiar with basic differentiation and integration techniques, trigonometry, and perhaps have some exposure to matrices and ordinary differential equations. If these topics are unfamiliar to you, it may be necessary to take some more elementary mathematics courses, perhaps from an online MOOC site such as Coursera or Khan Academy, prior to tackling the books below.

The mathematics of quantitative trading differs significantly from that of derivative pricing, which
is also known as "mathematical finance", "financial engineering" or "quantitative finance".
Unfortunately all of these phrases are vague and only serve to confuse beginners coming into
finance! Derivatives pricing makes extensive use of upper undergraduate mathematics such as
partial differential equations, stochastic calculus, advanced linear algebra and vector analysis. There
is not a great deal of stochastic calculus in general algorithmic trading, unless you are considering options or volatility trading, in which case you will need to be aware of stochastic calculus, the Black-Scholes model and its extensions.
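(For reference, and using standard notation, the closed-form Black-Scholes price of a European call, the starting point for that material, is

C = S_0 \Phi(d_1) - K e^{-rT} \Phi(d_2), \qquad d_{1,2} = \frac{\ln(S_0/K) + (r \pm \sigma^2/2)T}{\sigma\sqrt{T}},

where \Phi is the standard normal CDF, S_0 the spot price, K the strike, r the risk-free rate, \sigma the volatility and T the time to expiry. Nothing else in this list requires it.)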

Schaum's Outline of Probability and Statistics - John Schiller, R. Alu Srinivasan, Murray
Spiegel [BEGINNER]
If you have no probability or statistics background whatsoever, this is a great book with which to
gain familiarity. As I mention below, Schaum's Guides are great if you enjoy learning by working
through a lot of questions. This book begins with very elementary concepts in probability and
slowly leads up to basic intuition for frequentist statistical modelling via null hypothesis testing.

Probability and Random Processes - Geoffrey Grimmett, David Stirzaker [ADVANCED]


This is regarded as one of the definitive texts on probability. If you wish to build on the basic
knowledge acquired in the Schaum's book above, or learn from scratch at a much deeper level, then
this book will be highly appropriate. There is also a secondary book full of questions, if you enjoy
learning in this manner.

Linear Algebra And Its Applications - Gilbert Strang [BEGINNER]


This is probably one of the most famous books on Linear Algebra! Gilbert Strang has been teaching
a Linear Algebra course at MIT for some time, which is widely regarded as one of the best courses
out there. In fact, you can watch the course on MIT's Open Courseware page here. The book is a
good complement to the course and will rapidly get you up to scratch with the techniques you will
need for quantitative trading modelling. It is geared towards the practitioner (and thus of value to
quant traders) and not the mathematics student, as proofs are de-emphasised in favour of techniques.

Basic Linear Algebra - T. S. Blyth, E. F. Robertson [BEGINNER]


When I was studying mathematics as an undergraduate, I was always partial to the Springer Undergraduate Mathematics Series (SUMS) books. This book in particular is extremely useful for gaining an insight into Linear Algebra as a mathematician would look at it. This may be slightly beyond what a practising quant trader would need, but given that a lot of a quant researcher's time is given over to digging through research papers for new models, it is worth having a solid mathematical grounding in linear algebra as a mathematician would present it. There is also "Further Linear Algebra" in the same series, although this looks at areas which are probably of less interest to a practising quant researcher.

Calculus - Michael Spivak [BEGINNER]


This is probably one of the best books for learning Calculus. If you like to learn via the self-study
route, then working through Spivak will be an extremely rewarding experience. It discusses
differentiation, integration, trigonometric functions as well as sequences and series. It sets the stage
well for more advanced courses on Real Analysis. The latter is possibly more appropriate for a
student interested in financial engineering and derivatives pricing, but as a quant trader it is
essential to be at least marginally aware of this more rigorous aspect of mathematics.

Vector Calculus - Paul Matthews [ADVANCED]


Another SUMS book, this one is relatively short and covers the necessary techniques in vector
calculus. Such techniques appear frequently in optimisation problems as well as neural
network/deep learning models. While the book is really useful for getting to grips with basic vector
calculus concepts, the latter section is geared more towards physical applications, such as fluid
mechanics, electromagnetics and continuum mechanics, as opposed to quant trading models!

Vector Calculus - Jerrold Marsden, Anthony Tromba [ADVANCED]


This is the book that I learnt Vector Calculus from as an undergraduate. It is extremely
comprehensive, covering a wide range of techniques in vector calculus and some differential
geometry. Once again, it is pitched at the mathematician, rather than the practising quant and
probably provides more content than would be necessary for most algorithmic or quantitative
models. However, it is a great reference and as such it will always find a place on the shelf!

Statistics and Machine Learning


The main core of algorithmic trading research involves statistical machine learning and time series
analysis. The majority of quantitative models found in industry will generally make use of either of
these two, rather broad, areas. I've provided a gentle introduction to statistical machine learning and
then subsequently the Bayesian approach to statistics. Finally, I've provided some more advanced
machine learning books which discuss the near state of the art. The next step would be to read the
latest research pre-prints straight from a source such as the arXiv.

An Introduction to Statistical Learning: with Applications in R - Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani [BEGINNER]
This is the "smaller brother" to the book below, The Elements of Statistical Learning (ESL). It is
generally known as "ISL". I highly recommend this book if your mathematics and probability is a
little rusty and you are eager to begin getting involved in some machine learning. It covers the main
pitfalls in detail sufficient for the practitioner but provides references for further study. In essence, it
is basically a much less technical version of ESL. That being said, if you wish to really become an
expert in the world of algorithmic trading, you will eventually need to get to grips with the concepts
outlined in ESL. As with ESL, the ebook version can be found for free on the authors' website:
http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf

The Elements of Statistical Learning - Trevor Hastie, Robert Tibshirani, Jerome Friedman
[ADVANCED]
If I was forced to recommend only one book from the entire list presented here, this is the book I
would suggest. It is an absolutely exceptional book on how to create modern statistical machine
learning techniques. Note that this is not a beginner book! It requires a solid grounding in linear
algebra, calculus and probability. However, it presents and elucidates all of the necessary
issues and trade-offs that arise in creating machine learning models, as well as providing a solid
statistical basis for each model. Understanding this book will give you a "feel" for how to create
new models, as well as the limitations of machine learning. The best part is that the ebook version
can be found completely for free on the authors' website:
http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

Doing Bayesian Data Analysis - John Kruschke [BEGINNER]


Kruschke's book, also known as the "puppy book" due to its interesting dog-themed cover (!), is a fantastic introduction to Bayesian statistics. He assumes no familiarity with the Bayesian approach and so the book can be read in a cover-to-cover fashion. When I was first learning Bayesian stats, I found it to be an indispensable guide. It is a little verbose, but for beginners this is probably
appropriate. It also makes use of R and BUGS to carry out the Bayesian models discussed. Bayesian
statistics crops up in machine learning and financial modelling a lot, hence it is absolutely essential
to be aware of the basics before digging in to deeper models.

Bayesian Reasoning and Machine Learning - David Barber [ADVANCED]


Barber's book is chock-full of various machine learning models as well as "graphical models", such
as Bayesian networks. It is rather technical, so your probability, calculus and linear algebra need to be strong to follow this book. It is not really one to read cover-to-cover; rather it can be read on a
model-by-model basis as and when you wish to explore new approaches. The book makes use of
MatLab for its coding environment. I tend to use this book when I want to investigate a new area of
models and I wish to understand how the simpler models are carried out before diving into research
papers. Definitely a good book to have on the shelf!

Bayesian Data Analysis (3rd Ed) - Andrew Gelman et al. [ADVANCED]


This classic book by Gelman (who is also a popular blogger) provides an advanced look at Bayesian
methods. It is a great book to read subsequent to "Doing Bayesian Data Analysis" above, provided
you have the necessary probability and linear algebra background. If your particular form of trading
makes extensive use of Bayesian techniques, then this is an indispensable text to have on the shelf.

Machine Learning: A Probabilistic Perspective - Kevin Murphy [ADVANCED]


This is an extremely comprehensive book, which is well known for its breadth of coverage. Given
the vastness of the statistical machine learning landscape, this book successfully provides a
unification of the main areas of ML, whether one has a background in frequentist statistics,
machine learning or Bayesian analysis. As with Barber's book above, it considers graphical models
and also looks at Hidden Markov Models and State Space models (often associated with time series
analysis). Another benefit is that up to date academic references are provided in order to venture
further with particular models.

Time Series Analysis


Time series analysis is essential in allowing us to form models for particular financial time series. This allows us to (statistically) identify trends, mean-reversion, seasonality effects and changes in market behaviour. Once again we start with introductory texts and then progress to examining more sophisticated models.
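To make the idea concrete before diving into the books, here is a minimal illustrative sketch (not taken from any of the texts below, and written in C++ for brevity even though the books below use R) of the simplest possible diagnostic: the lag-1 autocorrelation of a return series, which gives a crude first hint of trending (positive) versus mean-reverting (negative) behaviour. The data in main are made up; the books then develop this intuition into proper models such as ARMA and GARCH.

// Minimal illustrative sketch: sample lag-1 autocorrelation of a return series.
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

double lag1Autocorrelation(const std::vector<double>& r)
{
    const double mean = std::accumulate(r.begin(), r.end(), 0.0) / r.size();
    double num = 0.0, den = 0.0;
    for (std::size_t t = 0; t < r.size(); ++t)
    {
        den += (r[t] - mean) * (r[t] - mean);          // variance numerator
        if (t + 1 < r.size())
            num += (r[t] - mean) * (r[t + 1] - mean);  // lag-1 covariance numerator
    }
    return num / den;
}

int main()
{
    // Made-up daily returns; in practice these would come from market data.
    std::vector<double> returns = {0.010, -0.008, 0.004, -0.006, 0.012, -0.010, 0.003};
    std::cout << "lag-1 autocorrelation: " << lag1Autocorrelation(returns) << "\n";
}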

Schaum's Outline of Statistics and Econometrics, 2nd Edition - Dominick Salvatore, Derrick
Reagle [BEGINNER]
For those of you who like the Q&A approach to self-study, the Schaum's Guides are fantastic. This
book in particular will take you from no statistical background whatsoever to a place where you can
carry out basic time series modelling. The format, as with all Schaum's Guides, is to learn by doing
a lot of questions, around half of which have model answers inline with the questions, while the rest
can be found at the back. I've read many of these books over the years and have always found them
to be a great way to learn.

Introductory Time Series with R - Paul Cowpertwait, Andrew Metcalfe [BEGINNER]


While the Schaum's Guide above is great for learning the theory, there is probably no better way to gain a mastery of implementing models than by doing just that: implementing them. This book uses R, the statistical programming language, to introduce time series modelling. You will learn
about stochastic processes, state space models, stationary and non-stationary processes as well as
how to become a good R programmer! I highly recommend this if you've never had any exposure to
time series modelling before.

Analysis of Financial Time Series - Ruey Tsay [ADVANCED]

Tsay's book is a classic for applying time series modelling to financial time series. It is not a cheap book (currently £90 on Amazon.co.uk!), but it contains a wealth of modelling insight for our particular domain of interest. It also provides references from which to follow up the state of the art in each particular area. In particular it discusses the usual linear models such as ARMA, as well as the GARCH family and non-linear models. However, the main benefit of the book is that it covers financial time series and so spends a lot of time considering high-frequency market data as well as factor models. Both of these topics are particularly relevant for quant traders.
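(As a pointer to what "the GARCH family" refers to: for a zero-mean return series r_t = \sigma_t \epsilon_t with \epsilon_t white noise, the canonical GARCH(1,1) model for the conditional variance is

\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2,

i.e. today's variance is a weighted combination of a long-run level, yesterday's squared return and yesterday's variance. Tsay covers this and its extensions in detail.)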

General Trading
As I mentioned above, some of you may have little experience with the financial markets or
discretionary (i.e. non-algorithmic) trading. Hence I've listed some of the more useful trading texts
that will help you get a feel for how professional trading is carried out.

Market Wizards: Updated Interviews With Top Traders - Jack Schwager [BEGINNER]
A few of my discretionary trader friends who work in institutional settings said that this was the text
that they were given to read when they first started trading their own book. While the period of
coverage is well in the past now, the mentality of the traders and the pearls of wisdom gleaned
make this a worthwhile addition to the bookshelf. If you enjoy the interview style of this book then
there are also two other books in the series: The New Market Wizards and Hedge Fund Market
Wizards.

Following the Trend: Diversified Managed Futures Trading - Andreas Clenow [BEGINNER]
This is a more casual read from a practising professional futures trader. It describes the nuances of
how futures are traded in practice, with some basic trend following algorithms, along with a healthy
dose of real-world risk management techniques. Perhaps the most interesting aspect of the book is
the diary, which allows one to see how such trend-following models work in practice over certain
periods of time. It also briefly discusses how to run a trading firm from an entrepreneurial point of
view, for those considering a career at a Commodity Trading Advisor (CTA), an asset management
firm or hedge fund.

Volatility Trading - Euan Sinclair [ADVANCED]


This is a great book on volatility trading. Sinclair clearly has a lot of experience with options
trading and the complexity that comes along with it. The first part of the book defines volatility,
how to measure it and subsequently how to forecast it. Implied volatility and hedging are covered
next, followed by money management and a discussion on trader psychology. The latter section of
the book discusses ETFs and volatility indices. This is one of those books that has a lot of insight in
nearly every sentence. It is definitely worth picking up if you are considering automated or
discretionary options trading.

Algorithmic/Quantitative Trading
Finally we come to the process of creating algorithmic/quantitative trading models and
implementing them against live markets. Having read the prior books on mathematics, statistical
and time series modelling, as well as some basic trading concepts, you will be in a good position to
tie it all together to create live automated trading strategies.

Depending upon your programming expertise and the required level of automation and redundancy,
you will either make use of external vendor software such as MT4 (for forex) or create your own
custom end-to-end backtesting and trading system against a brokerage such as Interactive Brokers,
OANDA or Dukascopy. There aren't many books that really go into the detail of how to implement
an end-to-end trading system, but the following go quite far into discussing what you'll need to
know:

Quantitative Trading: How to Build Your Own Algorithmic Trading Business - Ernest Chan
[BEGINNER]
This is probably the best book to read as a beginner entering quantitative trading. Ernest Chan does
a great job of outlining all of the issues that will affect a retail quantitative trader. The book is not
heavy on particular strategies, but rather discusses the other important issues in quant trading such
as risk management, position sizing, portfolio management and how to run an algorithmic trading
business. All strategies and techniques are coded in MatLab.

Algorithmic Trading: Winning Strategies and Their Rationale - Ernest Chan [ADVANCED]
This book is a great follow-on from Chan's previous book. It provides many more trading strategies
and definitely shows how Chan's own experience has developed since the previous book. The book
is definitely more technical and you will need to be aware of basic time series analysis methods (or
at least how to understand them in the context of this book!) in order to get the most out of it. The
book is particularly good in discussing strategies for futures and forex, which are areas not often
discussed in algorithmic trading books. Once again, all models and trading code are implemented in
MatLab.

Inside the Black Box: A Simple Guide to Quantitative and High Frequency Trading - Rishi
Narang [ADVANCED]
This was one of the first books about institutional quantitative trading that I read when I started at a
quant fund in my first quant role. It is written for investors who are considering investing in
quantitative strategies and has been designed to provide an insight into all aspects of the "black
box" so that these investors can make informed decisions as to whether to invest. However, it also
provides a fantastic non-technical overview into how an entire quantitative trading strategy is set up
and carried out in practice. The second edition discusses high-frequency trading (HFT) in detail.

Algorithmic Trading and DMA: An Introduction to Direct Access Trading Strategies - Barry
Johnson [ADVANCED]
The phrase 'algorithmic trading', in the financial industry, usually refers to the execution algorithms
used by banks and brokers to execute efficient trades. I am using the term to cover not only those
aspects of trading, but also quantitative or systematic trading. This book is mainly about the former,
being written by Barry Johnson, who is a quantitative software developer at an investment bank.
Does this mean it is of no use to the retail quant? Not at all. Possessing a deeper understanding of
how exchanges work and "market microstructure" can immensely aid the profitability of retail
strategies. Despite being a heavy tome, it is worth picking up.

Trading and Exchanges: Market Microstructure for Practitioners - Larry Harris [ADVANCED]
This book concentrates on market microstructure, which I personally feel is an essential area to
learn about, even at the beginning stages of quant trading. Market microstructure is the "science" of
how market participants interact and the dynamics that occur in the order book. It is closely related
to how exchanges function and what actually happens when a trade is placed. This book is less
about trading strategies as such, but more about things to be aware of when designing execution
systems. Many professionals in the quant finance space regard this as an excellent book and I also
highly recommend it.

The Science of Algorithmic Trading and Portfolio Management - Robert Kissell [ADVANCED]
As with Algorithmic Trading and DMA, this book is geared more towards execution algorithmic
trading. It discusses market microstructure in detail, as well as transaction cost analysis, and the
main execution algorithms including VWAP and IS. The book also discusses how such techniques
vary across asset classes and provides some advanced forecasting techniques for liquidity
estimation. The book also contains a detailed section on quantitative portfolio optimisation.

The Self-Study Guide to Becoming a
Quantitative Trader
Michael Halls-Moore QuantStart.com

Quantitative trader roles within large quant funds are often perceived to be one of the most
prestigious and lucrative positions in the quantitative finance employment landscape. Trading
careers in a "parent" fund are often seen as a springboard towards eventually allowing one to form
their own fund, with an initial capital allocation from the parent employer and a list of early
investors to bring on board.
Competition for quantitative trading positions is intense and thus a significant investment of time
and effort is necessary to obtain a career in quant trading. In this article I will outline the common
career paths, routes into the field, the required background and a self-study plan to help both retail
traders and would-be professionals gain skills in quantitative trading.

Setting Expectations
Before we delve into the lists of textbooks and other resources, I will attempt to set some
expectations about what the role involves. Quantitative trading research is much more closely
aligned with scientific hypothesis testing and academic rigour than the "usual" perception of
investment bank traders and the associated bravado. There is very little (or non-existent)
discretionary input when carrying out quantitative trading as the processes are almost universally
automated.
The scientific method and hypothesis testing are highly-valued processes within the quant finance
community and as such anybody wishing to enter the field will need to have been trained in
scientific methodology. This often, but not exclusively, means training to a doctoral research level
- usually via having taken a PhD or graduate level Masters in a quantitative field. Although one can
break into quantitative trading at a professional level via alternate means, it is not common.
The skills required by a sophisticated quantitative trading researcher are diverse. An extensive
background in mathematics, probability and statistical testing provide the quantitative base on
which to build. An understanding of the components of quantitative trading is essential, including
forecasting, signal generation, backtesting, data cleansing, portfolio management and execution
methods. More advanced knowledge is required for time series analysis, statistical/machine learning
(including non-linear methods), optimisation and exchange/market microstructure. Coupled with
this is a good knowledge of programming, including how to take academic models and implement
them rapidly.
This is a significant apprenticeship and should not be entered into lightly. It is often said that it
takes 5-10 years to learn sufficient material to be consistently profitable at quantitative trading in a
professional firm. However the rewards are significant. It is a highly intellectual environment
with a very smart peer group. It will provide continuous challenges at a fast pace. It is extremely
well remunerated and provides many career options, including the ability to become an
entrepreneur by starting your own fund after demonstrating a long-term track record.

Necessary Background
It is common to consider a career in quantitative finance (and ultimately quantitative trading
research) while studying for a numerate undergraduate degree or within a specialised technical
doctorate. However, the following advice is applicable to those who may wish to transition into a
quant trading career from another field, albeit with the caveat that it will take somewhat longer and will
involve extensive networking and a lot of self-study.
At the most basic level, professional quantitative trading research requires a solid understanding
of mathematics and statistical hypothesis testing. The usual suspects of multivariate calculus, linear
algebra and probability theory are all required. A good class-mark in an undergraduate course of
mathematics or physics from a well-regarded school will usually provide you with the necessary
background.
If you do not have a background in mathematics or physics then I would suggest that you should
pursue a degree course from a top school in one of those fields. You will be competing with
individuals who do have such knowledge and thus it will be highly challenging to gain a position at
a fund without some definitive academic credentials.
In addition to having a solid mathematical understanding it is necessary to be adept at
implementation of models, via computer programming. The common choices of modelling
languages these days include R, the open-source statistical language; Python, with its extensive data
analysis libraries; or MatLab. Gaining extensive familiarity with one of these packages is a
necessary prerequisite to becoming a quantitative trader. If you have an extensive background in
computer programming, you may wish to consider gaining entry into a fund via the Quantitative
Developer route.
The final major skill needed by quantitative trading researchers is that of being able to objectively
interpret new research and then implement it rapidly. This is a skill learned via doctoral training
and one of the reasons why PhD candidates from top schools are often the first to be picked for
quantitative trading positions. Gaining a PhD in one of the following areas (particularly machine
learning or optimisation) is a good way into a sophisticated quant fund.

Econometrics and Time Series Analysis


Fundamentally the majority of quantitative trading is about time series analysis. This predominantly
includes asset price series as a function of time, but might include derivative series in some form.
Thus time series analysis is an essential topic for the quantitative trading researcher. I've written
about how to get started in the article on Top 10 Essential Resources for Learning Financial
Econometrics. That article includes basic guides to probability and beginning programming in R.
Recently I came across a fantastic resource called OTexts, which provides open access textbooks.
The following book is especially useful for forecasting:
Forecasting: Principles and Practice by Hyndman and Athanasopoulos - This free book is an
excellent way to begin learning about statistical forecasting via the R programming
environment. It covers simple and multivariate regression, exponential smoothing and
ARIMA techniques as well as more advanced forecasting models. The book is originally
pitched at business/commerce degrees but is sufficiently technical to be of interest to
beginning quants.
With the basics of time series under your belt the next step is to begin studying statistical/machine
learning techniques, which are the current "state of the art" within quantitative finance.

Statistical Machine Learning


Modern quantitative trading research relies on extensive statistical learning techniques. Up until
relatively recently, the only place to learn such techniques as applied to quantitative finance was in
the literature. Thankfully well-established textbooks now exist which bridge the gap between theory
and practice. It is the next logical follow-on from econometrics and time series forecasting
techniques, although there is significant overlap between the two areas.

The main techniques of interest include Multivariate Linear Regression, Logistic Regression,
Resampling Techniques, Tree-Based Methods (including Random Forests), Support Vector
Machines (SVM), Principal Component Analysis (PCA), Clustering (K-Means, Hierarchical),
Kernel Methods and Neural Networks. Each of these topics is a significant learning exercise in
itself, although introductory texts such as ISL and ESL (see the Essential Algorithmic Trading
Reading List PDF) will cover the necessary material and provide further references for deeper study.
A particularly useful (and free!) set of web courses on Machine Learning/AI is provided by
Coursera:
Machine Learning by Andrew Ng - This course covers the basics of the methods I have
briefly mentioned above. It has received high praise from individuals who have participated.
It is probably best watched as a companion to reading ISL or ESL, which are two books
mentioned in the Essential Algorithmic Trading Reading List PDF.

Neural Networks for Machine Learning by Geoffrey Hinton - This course focuses primarily
on neural networks, which have a long history of association with quantitative finance. If
you wish to specifically concentrate on this area, then this course is worth taking a look at,
in conjunction with a solid textbook.

Statistical learning is extremely important in quant trading research. We can bring to bear the entire
weight of the scientific method and hypothesis testing in order to rigorously assess the quant
trading research process. For quantitative trading we are interested in testable, repeatable results
that are subject to constant scrutiny. This allows easy replacement of trading strategies as and when
performance degrades. Note that this is in stark contrast to the approach taken in "discretionary"
trading where performance and risk are not often assessed in this manner.
Why Should We Use The Scientific Method In Quantitative Trading?

The statistical approach to quant trading is designed to eliminate issues that surround discretionary
methods. A great deal of discretionary technical trading is rife with cognitive biases, including loss
aversion, confirmation bias and the bandwagon effect. Quant trading research uses alternative
mathematical methods to mitigate such behaviours and thus enhance trading performance.
In order to carry out such a methodical process quant trading researchers possess a continuously
skeptical mindset and any strategy ideas or hypotheses about market behaviour are subject to
continual scrutiny. A strategy idea will only be put into a "production" environment after extensive
statistical analysis, testing and refinement. This is necessary because the market has a rather low
signal-to-noise ratio. This creates difficulties in forecasting and thus leads to a challenging trading
environment.
What Modelling Problems Do We Encounter In Quantitative Finance?

The goal of quantitative trading research is to produce algorithms and technology that can satisfy a
certain investment mandate. In practice this translates into creating trading strategies (and related
infrastructure) that produce consistent returns above a certain pre-determined benchmark, net of
costs associated with the trading transactions, while minimising "risk". Hence there are a few levers
that can be pulled to enhance the financial objectives.
A great deal of attention is often given to the signal/alpha generator, i.e. "the strategy". The best
funds and retail quants will spend a significant amount of time modelling/reducing transaction
costs, effectively managing risk and determining the optimal portfolio. This PDF is primarily aimed
at the alpha generator component of the stack, but please be aware that the other components are of
equal importance if successful long-term strategies are to be carried out.
We will now investigate problems encountered in signal generation and how to solve them. The
following is a basic list of such methods (which clearly overlap) that are often encountered in signal
generation problems:
Forecasting/Prediction - The most common technique is direct forecasting of a financial
asset price/direction based on prior prices (or fundamental factors). This usually involves
detection of an underlying signal in the "noise" of the market that can be predicted and thus
traded upon. It might also involve regressing against other factors (including lags in the
original time series) in order to relate the future response to known predictors (a short
sketch of this approach follows the list).
Clustering/Classification - Clustering or classification techniques are methods designed to
group data into certain classes. These can be binary in nature, e.g. "up" or "down", or
multi-class, e.g. "weak volatility", "medium volatility", "strong volatility".
Sentiment Analysis - More recent innovations in natural language processing and
computational speed have led to sophisticated "sentiment analysis" techniques, which are
essentially a classification method designed to group data based on some underlying
sentiment factors. These could be directional in nature, e.g. "bullish", "bearish", "neutral", or
emotional, such as "happy", "sad", "positive" or "negative". Ultimately this will lead to a
trading signal of some form.
Big Data - Alternative sources of data, such as consumer social media activities, often lead
to terabytes (or greater) of data that requires more novel software/hardware in order to
interpret. New algorithm implementations have been created in order to handle such "big
data".

Modelling Methodology

I've provided some key Machine Learning textbooks in the accompanying PDF The Essential
Algorithmic Trading Reading List and they will discuss the following topics and models, which
are necessary for a beginning quant trader to know:
Statistical Modelling and Limitations - The books will outline what statistical learning is
and isn't capable of along with the tradeoffs that are necessary when carrying out such
research. The difference between prediction and inference is outlined as well as the
difference between supervised and unsupervised learning. The bias-variance tradeoff is also
explained in detail.
Linear Regression - Linear regression (LR) is one of the simplest supervised learning
techniques. It assumes a model where the predicted values are a linear function of the
predictor variable(s). While this may seem simplistic compared to the remaining methods in
this list, linear regression is still widely utilised in the financial industry. Being aware of LR
is important in order to grasp the later methods, some of which are generalisations of LR.
Supervised Classification: Logistic Regression, LDA, QDA, KNN - Supervised
classification techniques such as Logistic Regression, Linear/Quadratic Discriminant
Analysis and K-Nearest Neighbours are techniques for modelling qualitative classification
situations, such as prediction of whether a stock index will move up or down (i.e. a binary
value) in the next time period.
Resampling Techniques: Bootstrapping, Cross-Validation - Resampling techniques are
necessary in quantitative finance (and statistics in general) because of the danger of fitting a
model too closely to a single dataset. Such techniques are used to ascertain how a model
behaves over different training sets and how to minimise the problem of "overfitting" (a short
cross-validation sketch follows at the end of this list).
Decision Tree Methods: Bagging, Random Forests - Decision trees are a type of graph
that are often employed in classification settings. Bagging and Random Forest techniques
are ensemble methods making use of such trees to reduce overfitting and reduce variance in
individually fitted supervised learning methods.
Neural Networks - Artificial Neural Networks (ANN) are a machine learning technique
often employed in a supervised manner to find non-linear relationships between predictors
and responses. In the financial domain they are often used for time series prediction and
forecasting.
Support Vector Machines - SVMs are also classification or regression tools, which work
by constructing a hyperplane in a high- or infinite-dimensional space. The kernel trick allows
non-linear classification to occur via a mapping of the original space into an inner-product
space.
Unsupervised Methods: PCA, K-Means, Hierarchical Clustering, NNMF - Unsupervised
learning techniques are designed to find hidden structure in data, without the use of an
objective or reward function to "train" on. Additionally, unsupervised techniques are often
used to pre-process data.
Ensemble Methods - Ensemble methods make use of multiple separate statistical learning
models in order to achieve greater predictive capability than could be achieved from any of
the individual models.
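
The cross-validation sketch referred to in the Resampling Techniques entry above: a minimal example (Python, scikit-learn, synthetic data) of scoring a simple classifier with time-series-aware cross-validation, so that each training fold strictly precedes its test fold and no look-ahead bias enters the evaluation:

    # Score a logistic regression with walk-forward (time-series) cross-validation.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import TimeSeriesSplit, cross_val_score

    rng = np.random.default_rng(1)
    returns = rng.normal(0.0, 0.01, size=600)        # synthetic return series

    lags = 2
    X = np.column_stack([returns[i:len(returns) - lags + i] for i in range(lags)])
    y = (returns[lags:] > 0).astype(int)

    # Each fold trains on earlier data and tests on strictly later data.
    cv = TimeSeriesSplit(n_splits=5)
    scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
    print("Fold accuracies:", np.round(scores, 3))   # roughly 0.5 on pure noise
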
Lesson 1: Beginner's Guide to Quantitative Trading
In the first lesson in the quantitative trading email series I want to introduce you to some of the basic concepts
which accompany an end-to-end quantitative trading system.

This email will hopefully serve two audiences. The first will be individuals trying to obtain a job at a fund as a
quantitative trader. The second will be individuals who wish to try and set up their own "retail" algorithmic trading
business.

Quantitative trading is an extremely sophisticated area of quant finance. It can take a significant amount of time to
gain the necessary knowledge to pass an interview or construct your own trading strategies.

Not only that, but it requires programming expertise in a language such as MATLAB, R, Python or C#. As the
trading frequency of the strategy increases, the technological aspects become much more relevant, and being
familiar with C/C++ will be of paramount importance.

A quantitative trading system consists of four major components:

Strategy Identification - Finding a strategy, exploiting an edge and deciding on trading frequency

Strategy Backtesting - Obtaining data, analysing strategy performance and removing biases

Execution System - Linking to a brokerage, automating the trading and minimising transaction costs

Risk Management - Optimal capital allocation, "bet size"/Kelly criterion and trading psychology

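To make the division of responsibilities concrete, here is a deliberately bare-bones sketch of how these four components might be laid out in code. All class and method names are hypothetical placeholders for illustration, not any particular framework's API:

    # Hypothetical skeleton of an end-to-end quantitative trading system.
    class Strategy:
        """Strategy identification: turns market data into trading signals."""
        def generate_signals(self, data):
            raise NotImplementedError

    class Backtester:
        """Strategy backtesting: replays historical data through the strategy."""
        def __init__(self, strategy, historical_data):
            self.strategy = strategy
            self.data = historical_data
        def run(self):
            return self.strategy.generate_signals(self.data)

    class RiskManager:
        """Risk management: sizes positions and enforces limits."""
        def size_positions(self, signals, equity):
            pass  # e.g. Kelly-style or fixed-fraction position sizing

    class ExecutionHandler:
        """Execution system: translates sized orders into broker API calls."""
        def execute(self, orders):
            pass  # e.g. submit orders via a brokerage API
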
We'll begin by taking a look at how to identify a trading strategy.

Strategy Identification

All quantitative trading processes begin with an initial period of research.

This research process encompasses finding a strategy, seeing whether the strategy fits into a portfolio of other
strategies you may be running, obtaining any data necessary to test the strategy and trying to optimise the strategy for
higher returns and/or lower risk.

You will need to factor in your own capital requirements if running the strategy as a "retail" trader and how any
transaction costs will affect the strategy.

Contrary to popular belief it is actually quite straightforward to find profitable strategies through various public sources.
Academics regularly publish theoretical trading results (albeit mostly gross of transaction costs). Quantitative finance
blogs will discuss strategies in detail. Trade journals will outline some of the strategies employed by funds.

You might question why individuals and firms are keen to discuss their profitable strategies, especially when they
know that others "crowding the trade" may stop the strategy from working in the long term.

The reason lies in the fact that they will not often discuss the exact parameters and tuning methods that they have
carried out. These optimisations are the key to turning a relatively mediocre strategy into a highly profitable one.
In fact, one of the best ways to create your own unique strategies is to find similar methods and then carry out your
own optimisation procedure.

Here is a small list of places to begin looking for strategy ideas:

Social Science Research Network - www.ssrn.com

arXiv Quantitative Finance - arxiv.org/archive/q-fin

Seeking Alpha - www.seekingalpha.com

Elite Trader - www.elitetrader.com

Nuclear Phynance - www.nuclearphynance.com

Quantivity - quantivity.wordpress.com

Many of the strategies you will look at will fall into the categories of mean-reversion and trend-
following/momentum.

A mean-reverting strategy is one that attempts to exploit the fact that a long-term mean on a "price series", such as
the spread between two correlated assets, exists and that short term deviations from this mean will eventually revert.

A momentum strategy attempts to exploit both investor psychology and big fund structure by "hitching a ride" on a
market trend, which can gather momentum in one direction, and follow the trend until it reverses.
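
As a toy illustration of the mean-reverting idea, the sketch below (Python/pandas, synthetic data, arbitrary window and thresholds) computes a rolling z-score of a spread and signals when it deviates far from its recent mean:

    # Rolling z-score mean-reversion signal on a synthetic spread.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(7)
    spread = pd.Series(np.cumsum(rng.normal(0, 1, 500)) * 0.01)   # stand-in spread

    window = 20
    zscore = (spread - spread.rolling(window).mean()) / spread.rolling(window).std()

    signal = pd.Series(0, index=spread.index)
    signal[zscore > 2.0] = -1     # spread unusually high: short the spread
    signal[zscore < -2.0] = 1     # spread unusually low: long the spread
    print(signal.value_counts())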

Another hugely important aspect of quantitative trading is the frequency of the trading strategy. Low frequency
trading (LFT) generally refers to any strategy which holds assets longer than a trading day.

Correspondingly, high frequency trading (HFT) generally refers to a strategy which holds assets intraday.

Ultra-high frequency trading (UHFT) refers to strategies that hold assets on the order of seconds and milliseconds.

As a retail practitioner HFT and UHFT are certainly possible, but only with detailed knowledge of the trading
"technology stack" and order book dynamics.

Once a strategy, or set of strategies, has been identified it now needs to be tested for profitability on historical data.
That is the domain of backtesting.

Strategy Backtesting

The goal of backtesting is to provide evidence that the strategy identified via the above process is profitable when
applied to both historical and out-of-sample data. This sets the expectation of how the strategy will perform in the
"real world".

However, backtesting is NOT a guarantee of success, for various reasons. It is perhaps the most subtle area of
quantitative trading since it entails numerous biases, which must be carefully considered and eliminated as much as
possible.
We will discuss the common types of bias including look-ahead bias, survivorship bias and optimisation bias (also
known as "data-snooping" bias).

Other areas of importance within backtesting include availability and cleanliness of historical data, factoring in realistic
transaction costs and deciding upon a robust backtesting platform. We'll discuss transaction costs further in the
Execution Systems section below.

Once a strategy has been identified, it is necessary to obtain the historical data through which to carry out testing
and, perhaps, refinement.

There are a significant number of data vendors across all asset classes. Their costs generally scale with the quality,
depth and timeliness of the data.

The traditional starting point for beginning quant traders (at least at the retail level) is to use the free data set from
Yahoo Finance. I won't dwell on providers too much here, rather I would like to concentrate on the general issues
when dealing with historical data sets.

The main concerns with historical data include accuracy/cleanliness, survivorship bias and adjustment for corporate
actions such as dividends and stock splits:

Accuracy pertains to the overall quality of the data - whether it contains any errors. Errors can sometimes be easy to
identify, such as with a spike filter, which will pick out incorrect "spikes" in time series data and correct for them. At
other times they can be very difficult to spot. It is often necessary to have two or more providers and then check all of
their data against each other.
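
By way of example, here is one deliberately naive spike filter in Python: it flags points where the price jumps by more than a threshold in a single period and then immediately reverses, a typical signature of a bad tick. The threshold and data are purely illustrative:

    # Flag single-period price moves that exceed a threshold and reverse at once.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(3)
    prices = pd.Series(100 + np.cumsum(rng.normal(0, 0.1, 500)))
    prices.iloc[250] *= 1.05                  # inject an artificial 5% bad tick

    threshold = 0.02                          # 2% single-period move
    rets = prices.pct_change()

    is_spike = (
        (rets.abs() > threshold)
        & (rets.shift(-1).abs() > threshold)
        & (np.sign(rets) != np.sign(rets.shift(-1)))
    )
    print("Suspect data points:", list(prices[is_spike].index))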

Survivorship bias is often a "feature" of free or cheap datasets. A dataset with survivorship bias means that it does
not contain assets which are no longer trading. In the case of equities this means delisted/bankrupt stocks. This bias
means that any stock trading strategy tested on such a dataset will likely perform better than in the "real world" as the
historical "winners" have already been preselected.

Corporate actions include "logistical" activities carried out by the company that usually cause a step-function change
in the raw price, which should not be included in the calculation of returns. Adjustments for dividends and
stock splits are the common culprits. A process known as back adjustment must be carried out for each of
these actions. One must be very careful not to confuse a stock split with a true returns adjustment. Many a trader
has been caught out by a corporate action!
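
As a minimal illustration of back adjustment, the sketch below scales all prices prior to a hypothetical 2-for-1 split date so that the return across the split no longer shows a spurious drop (dates and prices are invented):

    # Back-adjust a price series for a 2-for-1 stock split.
    import pandas as pd

    prices = pd.Series(
        [100.0, 102.0, 104.0, 52.5, 53.0],
        index=pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"]
        ),
    )
    split_date = pd.Timestamp("2020-01-04")   # split takes effect on this date
    split_ratio = 2.0

    adjusted = prices.copy()
    adjusted.loc[adjusted.index < split_date] /= split_ratio   # scale pre-split prices
    print(adjusted.pct_change())               # no longer shows a spurious ~-50% "return"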

In order to carry out a backtest procedure it is necessary to use a software platform. You have the choice between
dedicated backtest software, such as Tradestation, a numerical platform such as Excel or MATLAB, or a full custom
implementation in a programming language such as Python or C++.

I won't dwell too much on Tradestation, Excel or MATLAB, as I believe in creating a full in-house technology stack for
reasons outlined below.

One of the benefits of doing so is that the backtest software and execution system can be tightly integrated, even with
extremely advanced statistical strategies. For HFT strategies in particular it is essential to use a custom
implementation.

When backtesting a system one must be able to quantify how well it is performing. The "industry standard" metrics for
quantitative strategies are the maximum drawdown and the Sharpe Ratio.
The maximum drawdown characterises the largest peak-to-trough drop in the account equity curve over a particular
time period, usually annual. This is most often quoted as a percentage.

LFT strategies will tend to have larger drawdowns than HFT strategies, due to a number of statistical factors. A
historical backtest will show the past maximum drawdown, which is a good guide for the future drawdown
performance of the strategy.

The second measurement is the Sharpe Ratio, which is heuristically defined as the average of the excess returns
divided by the standard deviation of those excess returns. Here, excess returns refers to the return of the strategy
above a pre-determined benchmark, such as the S&P500 or a 3-month Treasury Bill.
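
In code, an annualised Sharpe ratio computed from daily returns might look as follows; the 252-trading-day scaling and the flat risk-free benchmark are illustrative assumptions rather than fixed conventions:

    # Annualised Sharpe ratio: mean excess return over the standard deviation
    # of excess returns, scaled by the square root of 252 trading days.
    import numpy as np

    rng = np.random.default_rng(5)
    daily_returns = rng.normal(0.0005, 0.01, 252)    # stand-in strategy returns
    risk_free_daily = 0.02 / 252                     # e.g. a 2% annual T-bill yield

    excess = daily_returns - risk_free_daily
    sharpe = np.sqrt(252) * excess.mean() / excess.std()
    print(f"Annualised Sharpe ratio: {sharpe:.2f}")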

Once a strategy has been backtested and is deemed to be free of biases (in as much as that is possible!), with a good
Sharpe and minimised drawdowns, it is time to build an execution system.

Execution Systems

An execution system is the means by which the list of trades generated by the strategy is sent and executed by the
broker.

Despite the fact that the trade generation can be semi- or even fully-automated, the execution mechanism can be
manual, semi-manual (i.e. "one click") or fully automated.

For LFT strategies, manual and semi-manual techniques are common. For HFT strategies it is necessary to create a
fully automated execution mechanism, which will often be tightly coupled with the trade generator due to the
interdependence of strategy and technology.

The key considerations when creating an execution system are the interface to the brokerage, minimisation of
transaction costs (including commission, slippage and the spread) and divergence of performance of the live
system from backtested performance.

There are many ways to interface to a brokerage. They range from calling up your broker on the telephone right
through to a fully-automated high-performance Application Programming Interface (API).

Ideally you want to automate the execution of your trades as much as possible. This frees you up to concentrate on
further research, as well as allowing you to run multiple strategies or even strategies of higher frequency.

The common backtesting software outlined above, such as MATLAB, Excel and Tradestation, is good for lower
frequency, simpler strategies. However it will be necessary to construct an in-house execution system written in a
high performance language such as C++ in order to do any real HFT.

As an anecdote, in the fund I used to be employed at, we had a 10 minute "trading loop" where we would download
new market data every 10 minutes and then execute trades based on that information in the same time frame. This
was using an optimised Python stack. For anything approaching minute- or second-frequency data, I believe C/C++
would be more suitable.

In a larger fund it is often not the domain of the quant researcher to optimise execution. However in smaller shops or
HFT firms, the traders ARE the executors and so a much wider skillset is often desirable.
Bear that in mind if you wish to be employed by a fund. Your programming skills will be as important, if not more so,
than your statistics and time series talents!

Another major issue which falls under the banner of execution is that of transaction cost minimisation.

There are generally three components to transaction costs: Commissions (or tax), which are the fees charged by the
brokerage, the exchange and the SEC (or similar governmental regulatory body); slippage, which is the difference
between what you intended your order to be filled at versus what it was actually filled at; spread, which is the
difference between the bid/ask price of the security being traded.

Note that the spread is NOT constant and is dependent upon the current liquidity (i.e. availability of buy/sell orders) in
the market.
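
As a rough, purely hypothetical illustration of how these three components add up on a single trade, consider the following back-of-the-envelope calculation (all figures invented):

    # Estimate the round-trip cost of a single trade from commission, slippage
    # and half the bid/ask spread paid on each side.
    shares = 100
    price = 50.00                   # intended execution price
    commission = 1.00               # flat fee per order, per side
    spread = 0.02                   # bid/ask spread in price terms
    slippage_per_share = 0.01       # adverse fill versus the intended price

    one_way_cost = commission + shares * (spread / 2 + slippage_per_share)
    round_trip_cost = 2 * one_way_cost
    notional = shares * price
    print(f"Round-trip cost: {round_trip_cost:.2f} ({round_trip_cost / notional:.3%} of notional)")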

Transaction costs can make the difference between an extremely profitable strategy with a good Sharpe ratio and an
extremely unprofitable strategy with a terrible Sharpe ratio.

It can be a challenge to correctly predict transaction costs from a backtest. Depending upon the frequency of the
strategy, you will need access to historical exchange data, which will include tick data for bid/ask prices.

Entire teams of quants are dedicated to optimisation of execution in the larger funds for these reasons.

Consider the scenario where a fund needs to offload a substantial quantity of trades, for reasons that are
many and varied! By "dumping" so many shares onto the market, it will rapidly depress the price and may not
obtain optimal execution.

Hence algorithms which "drip feed" orders onto the market exist, although then the fund runs the risk of slippage.
Further to that, other strategies "prey" on these necessities and can exploit the inefficiencies. This is the domain of
fund structure arbitrage.

The final major issue for execution systems concerns divergence of strategy performance from backtested
performance.

This can happen for a number of reasons. We've already discussed look-ahead bias and optimisation bias in depth
when considering backtests.

However, some strategies do not make it easy to test for these biases prior to deployment. This occurs in HFT most
predominantly. There may be bugs in the execution system as well as the trading strategy itself that do not show up
on a backtest but DO show up in live trading.

The market may have been subject to a regime change subsequent to the deployment of your strategy. New
regulatory environments, changing investor sentiment and macroeconomic phenomena can all lead to divergences in
how the market behaves and thus the profitability of your strategy.

Risk Management

The final piece to the quantitative trading puzzle is the process of risk management.
"Risk" includes all of the previous biases we have discussed. It includes technology risk, such as servers co-located
at the exchange suddenly developing a hard disk malfunction. It includes brokerage risk, such as the broker
becoming bankrupt (not as crazy as it sounds, given the recent scare with MF Global!).

In short it covers nearly everything that could possibly interfere with the trading implementation, of which there are
many sources. Whole books are devoted to risk management for quantitative strategies, so I won't attempt to
elucidate all possible sources of risk here.

Risk management also encompasses what is known as optimal capital allocation, which is a branch of portfolio
theory. This is the means by which capital is allocated to a set of different strategies and to the trades within those
strategies. It is a complex area and relies on some non-trivial mathematics.

The industry standard by which optimal capital allocation and leverage of the strategies are related is called the Kelly
criterion. The Kelly criterion makes some assumptions about the statistical nature of returns, which do not often hold
true in financial markets, so traders are often conservative when it comes to the implementation.
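
For reference, the simple binary-bet form of the Kelly criterion is f* = p - (1 - p) / b, where p is the probability of a win and b is the win/loss payoff ratio. A minimal sketch with illustrative numbers, including the conservative "half-Kelly" sizing that many practitioners prefer:

    # Kelly fraction for a repeated binary bet: f* = p - (1 - p) / b.
    def kelly_fraction(p_win: float, payoff_ratio: float) -> float:
        """Suggested fraction of capital to stake on each bet."""
        return p_win - (1.0 - p_win) / payoff_ratio

    f_star = kelly_fraction(p_win=0.55, payoff_ratio=1.0)
    print(f"Full Kelly: {f_star:.2%}, half Kelly: {f_star / 2:.2%}")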

Another key component of risk management is dealing with one's own psychological profile. There are
many cognitive biases that can creep into trading, although this is admittedly less problematic with algorithmic
trading if the strategy is left alone!

A common bias is that of loss aversion where a losing position will not be closed out due to the pain of having to
realise a loss. Similarly, profits can be taken too early because the fear of losing an already gained profit can be too
great.

Another common bias is known as recency bias. This manifests itself when traders put too much emphasis on recent
events and not on the longer term.

Then of course there are the classic pair of emotional biases - fear and greed. These can often lead to under- or over-
leveraging, which can cause either reduced profits or a blow-up, when the account equity heads to zero (or worse!).

Summary

As can be seen, quantitative trading is an extremely complex, albeit very interesting, area of quantitative finance. I
have only scratched the surface of the topic in this email and it is already getting rather long!

Whole books and papers have been written about issues to which I have devoted only a sentence or two. For that
reason, before applying for quantitative fund trading jobs, it is necessary to carry out a significant amount of
groundwork study.

At the very least you will need a good background in statistics and time series analysis, with a lot of experience in
implementation, via a programming language such as MATLAB, Python or R.

For more sophisticated strategies at the higher frequency end, your skill set is likely to include Linux kernel
modification, C/C++, assembly programming and network latency optimisation.

If you are interested in trying to create your own algorithmic trading strategies, my first suggestion would be to get
good at programming.
My preference is to build as much of the data grabber, strategy backtester and execution system by yourself as
possible. If your own capital is on the line, wouldn't you sleep better at night knowing that you have fully tested your
system and are aware of its pitfalls and particular issues?

Outsourcing this to a vendor, while potentially saving time in the short term, could be extremely expensive in the long-
term.

In the next lesson we are going to look at the topic of How To Identify Algorithmic Trading Strategies.

Lesson 2: Profitable Algorithmic Trading Strategies


In the first lesson of the quantitative trading email course I presented the beginner's guide to quantitative trading. I
want to continue that discussion now, by introducing you to the methods by which I myself identify profitable
algorithmic trading strategies.

Our goal today is to understand in detail how to find, evaluate and select such systems.

I'll explain how identifying strategies is as much about personal preference as it is about strategy performance, how
to determine the type and quantity of historical data for testing, how to dispassionately evaluate a trading strategy
and finally how to proceed towards the backtesting phase and strategy implementation.

Identifying Your Own Personal Preferences for Trading

In order to be a successful trader - either discretionally or algorithmically - it is necessary to ask yourself some
honest questions. Trading provides you with the ability to lose money at an alarming rate, so it is necessary to "know
thyself" as much as it is necessary to understand your chosen strategy.

I would say the most important consideration in trading is being aware of your own personality. Trading, and algorithmic trading in particular, requires a significant degree of discipline, patience and emotional detachment. Since you are letting an algorithm perform your trading for you, it is necessary to be resolved not to interfere with the strategy when it is being executed.

This can be extremely difficult, especially in periods of extended drawdown. However, many strategies that have been shown to be highly profitable in a backtest can be ruined by simple interference. Understand that if you wish to enter the world of algorithmic trading you will be emotionally tested and that, in order to be successful, it is necessary to work through these difficulties!

The next consideration is one of time. Do you have a full-time job? Do you work part time? Do you work from home or have a long commute each day? These questions will help determine the frequency of the strategy that you should seek. For those of you in full-time employment, an intraday futures strategy may not be appropriate, at least until it is fully automated!

Your time constraints will also dictate the methodology of the strategy. If your strategy is frequently traded and reliant on expensive news feeds, such as a Bloomberg terminal, you will clearly have to be realistic about your ability to successfully run this while at the office! For those of you with a lot of time, or the skills to automate your strategy, you may wish to look into a more technical high-frequency trading (HFT) strategy.

My belief is that it is necessary to carry out continual research into your trading strategies to maintain a consistently profitable portfolio. Few strategies stay "under the radar" forever. Hence a significant portion of the time allocated to trading will be in carrying out ongoing research. Ask yourself whether you are prepared to do this, as it can be the difference between strong profitability and a slow decline towards losses.

You also need to consider your trading capital. The generally accepted ideal minimum amount for a quantitative strategy is 50,000 USD (approximately £35,000 for us in the UK). If I were starting again, I would begin with a larger amount, probably nearer 100,000 USD (approximately £70,000). This is because transaction costs can be extremely expensive for mid- to high-frequency strategies and it is necessary to have sufficient capital to absorb them in times of drawdown.

If you are considering beginning with less than 10,000 USD then you will need to restrict yourself to low-frequency strategies, trading in one or two assets, as transaction costs will rapidly eat into your returns. Interactive Brokers, which is one of the friendliest brokers to those with programming skills thanks to its API, has a retail account minimum of 10,000 USD.

Programming skill is an important factor in creating an automated algorithmic trading strategy. Being knowledgeable in a programming language such as C++, Java, C#, Python or R will enable you to create the end-to-end data storage, backtest engine and execution system yourself. This has a number of advantages, chief of which is the ability to be completely aware of all aspects of the trading infrastructure. It also allows you to explore the higher frequency strategies, as you will be in full control of your "technology stack".

While this means that you can test your own software and eliminate bugs, it also means more time spent coding up infrastructure and less on implementing strategies, at least in the earlier part of your algo trading career. You may find that you are comfortable trading in Excel or MATLAB and can outsource the development of other components. I would not recommend this, however, particularly for those trading at high frequency.

You also need to ask yourself what you hope to achieve by algorithmic trading. Are you interested in a regular income, whereby you hope to draw earnings from your trading account? Or are you interested in a long-term capital gain and can afford to trade without the need to draw down funds?

Income dependence will dictate the frequency of your strategy. More regular income withdrawals will require a higher frequency trading strategy with less volatility (i.e. a higher Sharpe ratio). Long-term traders can afford a more sedate trading frequency.

Finally, do not be deluded by the notion of becoming extremely wealthy in a short space of time! Algo trading is NOT a get-rich-quick scheme - if anything it can be a become-poor-quick scheme. It takes significant discipline, research, diligence and patience to be successful at algorithmic trading. It can take months, if not years, to generate consistent profitability.

Sourcing Algorithmic Trading Ideas


Despite common perceptions to the contrary, it is actually quite straightforward to locate profitable trading strategies in the public domain.

Never have trading ideas been more readily available than they are today. Academic finance journals, pre-print servers, trading blogs, trading forums, weekly trading magazines and specialist texts provide thousands of trading strategies upon which to base your ideas.

Our goal as quantitative trading researchers is to establish a strategy pipeline that will provide us with a stream of ongoing trading ideas. Ideally we want to create a methodical approach to sourcing, evaluating and implementing strategies that we come across.

The aims of the pipeline are to generate a consistent quantity of new ideas and to provide us with a framework for rejecting the majority of these ideas with the minimum of emotional consideration.

We must be extremely careful not to let cognitive biases influence our decision-making methodology. This could be as simple as having a preference for one asset class over another (gold and other precious metals come to mind) because they are perceived as more exotic.

Our goal should always be to find consistently profitable strategies, with positive expectation. The choice of asset class should be based on other considerations, such as trading capital constraints, brokerage fees and leverage capabilities.

If you are completely unfamiliar with the concept of a trading strategy then the first place to look is with established textbooks. Classic texts provide a wide range of simpler, more straightforward ideas with which to familiarise yourself with quantitative trading. Here is a selection I recommend for those new to quantitative trading; the books gradually become more sophisticated as you work through the list:

Quantitative Trading: How to Build Your Own Algorithmic Trading Business (Wiley Trading) - Ernest Chan

Algorithmic Trading and DMA: An introduction to direct access trading strategies - Barry Johnson

Option Volatility & Pricing: Advanced Trading Strategies and Techniques - Sheldon Natenberg

Volatility Trading - Euan Sinclair

Trading and Exchanges: Market Microstructure for Practitioners - Larry Harris

For a longer list of quantitative trading books, please visit the QuantStart reading list.

The next place to find more sophisticated strategies is with trading forums and trading blogs. However, a note of caution: many trading blogs rely on the concept of technical analysis. Technical analysis involves utilising basic indicators and behavioural psychology to determine trends or reversal patterns in asset prices.

Despite being extremely popular in the overall trading space, technical analysis is considered somewhat ineffective in the quantitative finance community. Some have suggested that it is no better than reading a horoscope or studying tea leaves in terms of its predictive power!

In reality there are successful individuals making use of technical analysis. However, as quants with a more sophisticated mathematical and statistical toolbox at our disposal, we can easily evaluate the effectiveness of such "TA-based" strategies and make data-driven decisions rather than basing them on emotional considerations or preconceptions.

Here is a list of well-respected algorithmic trading blogs and forums:

The Whole Street

Quantivity

Quantitative Trading (Ernest Chan)

Quantopian

Quantpedia

Elite Trader Forums

Wealth Lab

Nuclear Phynance

Wilmott Forums

Once you have had some experience at evaluating simpler strategies, it is time to look at the more sophisticated academic offerings.

Some academic journals will be difficult to access without expensive subscriptions or one-off costs. If you are a member or alumnus of a university, you should be able to obtain access to some of these financial journals. Otherwise, you can look at pre-print servers, which are internet repositories of late drafts of academic papers that are undergoing peer review. Since we are only interested in strategies that we can successfully replicate, backtest and obtain profitability for, peer review is of less importance to us.

The major downside of academic strategies is that they can often be out of date, require obscure and expensive historical data, trade in illiquid asset classes or fail to factor in fees, slippage or spread. It can also be unclear whether the trading strategy is to be carried out with market orders or limit orders, or whether it contains stop losses. Thus it is absolutely essential to replicate the strategy yourself as best you can, backtest it and add in realistic transaction costs that reflect the asset classes you wish to trade in.

Here is a list of the more popular pre-print servers and financial journals from which you can source ideas:

arXiv

SSRN

Journal of Investment Strategies

Journal of Computational Finance

Mathematical Finance

What about forming your own quantitative strategies? This generally requires (but is not limited to) expertise in one or more of the following categories:

Market microstructure - For higher frequency strategies in particular, one can make use of market microstructure,
i.e. understanding of the order book dynamics in order to generate profitability. Different markets will have various
technology limitations, regulations, market participants and constraints that are all open to exploitation via specific
strategies. This is a very sophisticated area and retail practitioners will find it hard to be competitive in this space,
particularly as the competition includes large, well-capitalised quantitative hedge funds with strong technological
capabilities.

Fund structure - Pooled investment funds, such as pension funds, private investment partnerships (hedge funds),
commodity trading advisors and mutual funds are constrained both by heavy regulation and their large capital
reserves. Thus certain consistent behaviours can be exploited by those who are more nimble. For instance, large
funds are subject to capacity constraints due to their size. Thus if they need to rapidly offload (sell) a quantity of
securities, they will have to stagger it in order to avoid "moving the market". Sophisticated algorithms can take
advantage of this, and other idiosyncrasies, in a general process known as fund structure arbitrage.
Machine learning/artificial intelligence - Machine learning algorithms have become more prevalent in recent years
in financial markets. Classifiers (such as Naive-Bayes, et al.), non-linear function matchers (neural networks) and
optimisation routines (genetic algorithms) have all been used to predict asset paths or optimise trading strategies. If
you have a background in this area you may have some insight into how particular algorithms might be applied to
certain markets.

There are, of course, many other areas for quants to investigate. We'll discuss how to come up with custom strategies in detail in later emails.

By continuing to monitor these sources on a weekly, or even daily, basis you are setting yourself up to receive a consistent list of strategies from a diverse range of sources.

The next step is to determine how to reject a large subset of these strategies in order to minimise wasting your time and backtesting resources on strategies that are likely to be unprofitable.

Evaluating Trading Strategies

The first, and arguably most obvious, consideration is whether you actually understand the strategy. Would you be able to explain the strategy concisely, or does it require a string of caveats and endless parameter lists?

In addition, does the strategy have a good, solid basis in reality? For instance, could you point to some behavioural rationale or fund structure constraint that might be causing the pattern(s) you are attempting to exploit? Would this constraint hold up to a regime change, such as a dramatic regulatory environment disruption?

Does the strategy rely on complex statistical or mathematical rules? Does it apply to any financial time series, or is it specific to the asset class on which it is claimed to be profitable? You should constantly be thinking about these factors when evaluating new trading methods, otherwise you may waste a significant amount of time attempting to backtest and optimise unprofitable strategies.

Once you have determined that you understand the basic principles of the strategy, you need to decide whether it fits with your aforementioned personality profile. This is not as vague a consideration as it sounds!

Strategies will differ substantially in their performance characteristics. There are certain personality types that can handle more significant periods of drawdown, or are willing to accept greater risk for a larger return.

Despite the fact that we, as quants, try to eliminate as much cognitive bias as possible and should be able to evaluate a strategy dispassionately, biases will always creep in. Thus we need a consistent, unemotional means through which to assess the performance of strategies. Here is the list of criteria that I judge a potential new strategy by:

Methodology - Is the strategy momentum based, mean-reverting, market-neutral, directional? Does the strategy rely
on sophisticated (or complex!) statistical or machine learning techniques that are hard to understand and require a
PhD in statistics to grasp? Do these techniques introduce a significant quantity of parameters, which might lead to
optimisation bias? Is the strategy likely to withstand a regime change (i.e. potential new regulation of financial
markets)?

Sharpe Ratio - The Sharpe ratio heuristically characterises the reward/risk ratio of the strategy. It quantifies how
much return you can achieve for the level of volatility endured by the equity curve. Naturally, we need to determine the
period and frequency that these returns and volatility are measured over. A higher frequency strategy will require a
greater sampling rate of standard deviation, but a shorter overall time period of measurement, for instance.

Leverage - Does the strategy require significant leverage in order to be profitable? Does the strategy necessitate the
use of leveraged derivatives contracts (futures, options, swaps) in order to make a return? These leveraged contracts
can have heavy volatility characteristics and thus can easily lead to margin calls. Do you have the trading capital and
the temperament for such volatility?

Frequency - The frequency of the strategy is intimately linked to your technology stack and thus technological
expertise, the Sharpe ratio and overall level of transaction costs. All other issues considered, higher frequency
strategies require more capital, are more sophisticated and harder to implement. However, assuming your backtesting
engine is sophisticated and bug-free, they will often have far higher Sharpe ratios.

Volatility - Volatility is related strongly to the "risk" of the strategy. The Sharpe ratio characterises this. Higher volatility
of the underlying asset classes, if unhedged, often leads to higher volatility in the equity curve and thus smaller
Sharpe ratios. I am of course assuming that the positive volatility is approximately equal to the negative volatility.
Some strategies may have greater downside volatility. You need to be aware of these attributes.

Win/Loss, Average Profit/Loss - Strategies will differ in their win/loss and average profit/loss characteristics. One
can have a very profitable strategy, even if the number of losing trades exceeds the number of winning trades.
Momentum strategies tend to have this pattern as they rely on a small number of "big hits" in order to be profitable.
Mean-reversion strategies tend to have opposing profiles where more of the trades are "winners", but the losing trades
can be quite severe.

Maximum Drawdown - The maximum drawdown is the largest overall peak-to-trough percentage drop on the equity
curve of the strategy. Momentum strategies are well known to suffer from periods of extended drawdowns (due to a
string of many incremental losing trades). Many traders will give up in periods of extended drawdown, even if historical
testing has suggested this is "business as usual" for the strategy. You will need to determine what percentage of
drawdown (and over what time period) you can accept before you cease trading your strategy. This is a highly
personal decision and thus must be considered carefully.

Capacity/Liquidity - At the retail level, unless you are trading in a highly illiquid instrument (like a small-cap stock),
you will not have to concern yourself greatly with strategy capacity. Capacity determines the scalability of the strategy
to further capital. Many of the larger hedge funds suffer from significant capacity problems as their strategies increase
in capital allocation.

Parameters - Certain strategies, especially those found in the machine learning community, require a large quantity of
parameters. Every extra parameter that a strategy requires leaves it more vulnerable to optimisation bias (also known
as "curve-fitting"). You should try and target strategies with as few parameters as possible or make sure you have
sufficient quantities of data on which to test your strategies.

Benchmark - Nearly all strategies, unless characterised as "absolute return", are measured against some
performance benchmark. The benchmark is usually an index that characterises a large sample of the underlying asset
class that the strategy trades in. If the strategy trades large-cap US equities, then the S&P500 would be a natural
benchmark to measure your strategy against. You will hear the terms "alpha" and "beta", applied to strategies of this
type.

Notice that we have not discussed the actual returns of the strategy. Why is this? In isolation, the returns actually

provide us with limited information as to the effectiveness of the strategy.

They don't give you an insight into leverage, volatility, benchmarks or capital requirements. Thus strategies are

rarely judged on their returns alone. Always consider the risk attributes of a strategy before looking at the returns.
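
To make these risk attributes concrete, here is a minimal Python sketch (using pandas and NumPy with hypothetical daily returns) that computes an annualised Sharpe ratio and the maximum drawdown of an equity curve. The 252-day annualisation factor and zero risk-free rate are simplifying assumptions, not a recommendation.

import numpy as np
import pandas as pd

def annualised_sharpe(returns: pd.Series, periods: int = 252) -> float:
    """Annualised Sharpe ratio of periodic returns (zero risk-free rate assumed)."""
    return np.sqrt(periods) * returns.mean() / returns.std()

def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough percentage drop of an equity curve."""
    running_max = equity.cummax()
    drawdown = equity / running_max - 1.0
    return drawdown.min()

if __name__ == "__main__":
    # Hypothetical daily strategy returns, for illustration only
    rng = np.random.default_rng(42)
    rets = pd.Series(rng.normal(0.0005, 0.01, 1000))
    equity = (1.0 + rets).cumprod()

    print(f"Annualised Sharpe: {annualised_sharpe(rets):.2f}")
    print(f"Maximum drawdown:  {max_drawdown(equity):.2%}")

Running the same two functions over every candidate strategy gives you a consistent, unemotional basis for the comparisons described above.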

At this stage many of the strategies found from your pipeline will be rejected out of hand, since they won't meet your

capital requirements, leverage constraints, maximum drawdown tolerance or volatility preferences.

The strategies that do remain can now be considered for backtesting. However, before this is possible, it is necessary

to consider one final rejection criterion: the availability of historical data on which to test these strategies.

Obtaining Historical Data

Nowadays, the breadth of the technical requirements across asset classes for historical data storage is substantial.

In order to remain competitive, both the buy-side (funds) and sell-side (investment banks) invest heavily in their

technical infrastructure. It is imperative to consider its importance.

In particular, we are interested in timeliness, accuracy and storage requirements. I will now outline the basics of

obtaining historical data and how to store it.


Unfortunately this is a very deep and technical topic, so I won't be able to say everything in this email. However, I will

be writing a lot more about this in the future, as my prior experience in the financial industry was chiefly

concerned with financial data acquisition, storage and access.

In the previous section we had set up a strategy pipeline that allowed us to reject certain strategies based on our own

personal rejection criteria. In this section we will filter more strategies based on our own preferences for obtaining

historical data.

The chief considerations (especially at retail practitioner level) are the costs of the data, the storage requirements

and your level of technical expertise. We also need to discuss the different types of available data and the different

considerations that each type of data will impose on us.

Let's begin by discussing the types of data available and the key issues we will need to think about:

Fundamental Data - This includes data about macroeconomic trends, such as interest rates, inflation figures,
corporate actions (dividends, stock-splits), SEC filings, corporate accounts, earnings figures, crop reports,
meteorological data etc. This data is often used to value companies or other assets on a fundamental basis, i.e. via
some means of expected future cash flows. It does not include stock price series. Some fundamental data is freely
available from government websites. Other long-term historical fundamental data can be extremely expensive.
Storage requirements are often not particularly large, unless thousands of companies are being studied at once.

News Data - News data is often qualitative in nature. It consists of articles, blog posts, microblog posts ("tweets") and
editorial. Machine learning techniques such as classifiers are often used to interpret sentiment. This data is also often
freely available or cheap, via subscription to media outlets. The newer "NoSQL" document storage databases are
designed to store this type of unstructured, qualitative data.

Asset Price Data - This is the traditional data domain of the quant. It consists of time series of asset prices. Equities
(stocks), fixed income products (bonds), commodities and foreign exchange prices all sit within this class. Daily
historical data is often straightforward to obtain for the simpler asset classes, such as equities. However, once
accuracy and cleanliness are included and statistical biases removed, the data can become expensive. In addition,
time series data often possesses significant storage requirements especially when intraday data is considered.

Financial Instruments - Equities, bonds, futures and the more exotic derivative options have very different
characteristics and parameters. Thus there is no "one size fits all" database structure that can accommodate them.
Significant care must be given to the design and implementation of database structures for various financial
instruments. We will discuss the situation at length when we come to build a securities master database in future
emails.

Frequency - The higher the frequency of the data, the greater the costs and storage requirements. For low-frequency
strategies, daily data is often sufficient. For high frequency strategies, it might be necessary to obtain tick-level data
and even historical copies of particular trading exchange order book data. Implementing a storage engine for this type
of data is very technologically intensive and only suitable for those with a strong programming/technical background.
Benchmarks - The strategies described above will often be compared to a benchmark. This usually manifests itself
as an additional financial time series. For equities, this is often a national stock benchmark, such as the S&P500 index
(US) or FTSE100 (UK). For a fixed income fund, it is useful to compare against a basket of bonds or fixed income
products. The "risk-free rate" (i.e. appropriate interest rate) is also another widely accepted benchmark. All asset class
categories possess a favoured benchmark, so it will be necessary to research this based on your particular strategy, if
you wish to gain interest in your strategy externally.

Technology - The technology stacks behind a financial data storage centre are complex. This email can only scratch
the surface about what is involved in building one. However, it does centre around a database engine, such as a
Relational Database Management System (RDBMS), such as PostgreSQL, MySQL, SQL Server, Oracle or a
Document Storage Engine (i.e. "NoSQL"). This is accessed via "business logic" application code that queries the
database and provides access to external tools, such as MATLAB, R or Excel. Often this business logic is written in
C++, C#, Java or Python. You will also need to host this data somewhere, either on your own personal computer, or
remotely via internet servers. Products such as Amazon Web Services have made this simpler and cheaper in recent
years, but it will still require significant technical expertise to achieve in a robust manner.
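
As a toy illustration of the "database engine plus business logic" pattern described above, the following Python sketch stores daily bars in a local SQLite file - a stand-in for a production RDBMS such as PostgreSQL or MySQL. The table layout, ticker and bar values are hypothetical; a real securities master would normalise exchanges, vendors and symbols into separate tables.

import sqlite3
import pandas as pd

SCHEMA = """
CREATE TABLE IF NOT EXISTS daily_bar (
    ticker     TEXT NOT NULL,
    price_date TEXT NOT NULL,
    open REAL, high REAL, low REAL, close REAL, volume INTEGER,
    PRIMARY KEY (ticker, price_date)
);
"""

def store_bars(conn: sqlite3.Connection, ticker: str, bars: pd.DataFrame) -> None:
    """Insert or replace OHLCV rows for a single ticker."""
    rows = [
        (ticker, idx.strftime("%Y-%m-%d"),
         r["open"], r["high"], r["low"], r["close"], int(r["volume"]))
        for idx, r in bars.iterrows()
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO daily_bar VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("securities_master.db")
    conn.executescript(SCHEMA)
    # Hypothetical bars for illustration
    bars = pd.DataFrame(
        {"open": [100.0, 101.0], "high": [102.0, 103.0],
         "low": [99.0, 100.5], "close": [101.5, 102.5],
         "volume": [1_000_000, 900_000]},
        index=pd.to_datetime(["2015-01-02", "2015-01-05"]),
    )
    store_bars(conn, "SPY", bars)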

As can be seen, once a strategy has been identified via the pipeline it will be necessary to evaluate the availability,

costs, complexity and implementation details of a particular set of historical data.

You may find it is necessary to reject a strategy based solely on historical data considerations. This is a big area and

teams of PhDs work at large funds making sure pricing is accurate and timely. Do not underestimate the difficulties

of creating a robust data centre for your backtesting purposes!

I do want to say, however, that many backtesting platforms can provide this data for you automatically - at a cost.

Thus it will take much of the implementation pain away from you, and you can concentrate purely on strategy

implementation and optimisation.

Tools like TradeStation possess this capability. However, my personal view is to implement as much as possible

internally and avoid outsourcing parts of the stack to software vendors.

I prefer higher frequency strategies due to their more attractive Sharpe ratios, but they are often tightly coupled to the

technology stack, where advanced optimisation is critical.

In the next email lesson we are going to look more closely at Strategy Backtesting in the first of two emails on
Successful Backtesting of Algorithmic Trading Strategies.

Lesson 3: Successful Backtesting of Algorithmic Trading Strategies


In the second lesson of our quantitative finance email course we discussed how to identify algorithmic trading
strategies and, in particular, create a strategy pipeline.
In today's lesson we're going to dig deeper into backtesting and consider aspects that are often ignored by much of
the algorithmic trading literature.

Algorithmic backtesting requires knowledge of many areas, including psychology, mathematics, statistics,

software development and market/exchange microstructure.

I couldn't hope to cover all of those topics in one email, so I'm going to split them into two or three smaller pieces.

What will we discuss in this section? I'll begin by defining backtesting and then I will describe the basics of how it is

carried out.

Then I will elucidate upon the biases we touched upon in the first email (Beginner's Guide to Quantitative Trading).

Next I will present a comparison of the various available backtesting software options.

In subsequent emails we will look at the details of strategy implementations that are often barely mentioned or

ignored.

We will also consider how to make the backtesting process more realistic by including the idiosyncrasies of a

trading exchange. Then we will discuss transaction costs and how to correctly model them in a backtest setting.

Let's begin by discussing what backtesting is and why we should carry it out in our algorithmic trading.

What is Backtesting?

Algorithmic trading stands apart from other types of investment classes because we can more reliably provide

expectations about future performance from past performance, as a consequence of abundant data availability. The

process by which this is carried out is known as backtesting.

In simple terms, backtesting is carried out by exposing your particular strategy algorithm to a stream of historical

financial data, which leads to a set of trading signals.

Each trade, which we will mean here to be a 'round-trip' of two signals, will have an associated profit or loss. The

accumulation of this profit/loss over the duration of your strategy backtest will lead to the total profit and loss (also

known as the 'P&L' or 'PnL'). That is the essence of the idea, although of course the "devil is always in the details"!
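
As a minimal illustration of this signal-to-P&L loop, here is a hedged Python/pandas sketch of a vectorised backtest for a simple moving average crossover. The price series and lookback windows are hypothetical, and transaction costs are ignored here (they are the subject of a later lesson).

import numpy as np
import pandas as pd

def crossover_backtest(prices: pd.Series, short: int = 40, long: int = 100) -> pd.Series:
    """Long when the short MA is above the long MA, flat otherwise.

    Returns the strategy's periodic returns (no costs, no leverage)."""
    short_ma = prices.rolling(short).mean()
    long_ma = prices.rolling(long).mean()
    # The signal is only known at the close, so it is shifted by one bar
    # before being applied to the next bar's return (avoids look-ahead bias).
    position = (short_ma > long_ma).astype(float).shift(1).fillna(0.0)
    asset_returns = prices.pct_change().fillna(0.0)
    return position * asset_returns

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prices = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, 1500))))
    strat_rets = crossover_backtest(prices)
    pnl = (1.0 + strat_rets).cumprod() - 1.0
    print(f"Total P&L over the sample: {pnl.iloc[-1]:.2%}")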

What are the key reasons for backtesting an algorithmic strategy?


Filtration - If you recall from the previous email Strategy Identification, our goal at the initial research stage was to
set up a strategy pipeline and then filter out any strategy that did not meet certain criteria. Backtesting provides us with
another filtration mechanism, as we can eliminate strategies that do not meet our performance needs.

Modelling - Backtesting allows us to (safely!) test new models of certain market phenomena, such as transaction
costs, order routing, latency, liquidity or other market microstructure issues.

Optimisation - Although strategy optimisation is fraught with biases, backtesting allows us to increase the
performance of a strategy by modifying the quantity or values of the parameters associated with that strategy and
recalculating its performance.

Verification - Our strategies are often sourced externally, via our strategy pipeline. Backtesting a strategy ensures
that it has not been incorrectly implemented. Although we will rarely have access to the signals generated by external
strategies, we will often have access to the performance metrics such as the Sharpe Ratio and Drawdown
characteristics. Thus we can compare them with our own implementation.

Backtesting provides a host of advantages for algorithmic trading. However, it is not always possible to

straightforwardly backtest a strategy.

In general, as the frequency of the strategy increases, it becomes harder to correctly model the microstructure effects

of the market and exchanges.

This leads to less reliable backtests and thus a trickier evaluation of a chosen strategy. This is a particular problem

where the execution system is the key to the strategy performance, as with ultra-high frequency algorithms.

Unfortunately, backtesting is fraught with biases of all types. We have touched upon some of these issues in previous

emails, but we will now discuss them in depth.

Biases Affecting Strategy Backtests

There are many biases that can affect the performance of a backtested strategy.

Unfortunately, these biases have a tendency to inflate the performance rather than detract from it. Thus you should

always consider a backtest to be an idealised upper bound on the actual performance of the strategy. It is almost

impossible to eliminate biases from algorithmic trading so it is our job to minimise them as best we can in order to

make informed decisions about our algorithmic strategies.


There are four major biases that I wish to discuss: Optimisation Bias, Look-Ahead Bias, Survivorship Bias and

Psychological Tolerance Bias.

Optimisation Bias

This is probably the most insidious of all backtest biases.

It involves adjusting or introducing additional trading parameters until the strategy performance on the backtest data
set is very attractive. However, once live, the performance of the strategy can be markedly different. Another name
for this bias is "curve fitting" or "data-snooping bias".

Optimisation bias is hard to eliminate as algorithmic strategies often involve many parameters.

"Parameters" in this instance might be the entry/exit criteria, look-back periods, averaging periods (i.e the moving

average smoothing parameter) or volatility measurement frequency.

Optimisation bias can be minimised by keeping the number of parameters to a minimum and increasing the quantity

of data points in the training set.

In fact, one must also be careful of the latter as older training points can be subject to a prior regime (such as a

regulatory environment) and thus may not be relevant to your current strategy.

One method to help mitigate this bias is to perform a sensitivity analysis. This means varying the parameters

incrementally and plotting a "surface" of performance.

Sound, fundamental reasoning for parameter choices should, with all other factors considered, lead to a smoother

parameter surface.

If you have a very jumpy performance surface, it often means that a parameter is not reflecting a real phenomenon and is instead an artefact of the test data.
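
A minimal sketch of such a sensitivity analysis, using a hypothetical moving average crossover as the strategy and arbitrary window ranges: sweep the two lookback parameters over a grid and tabulate the Sharpe ratio, so that the resulting "surface" can be plotted or simply inspected.

import numpy as np
import pandas as pd

def sharpe(rets: pd.Series, periods: int = 252) -> float:
    return np.sqrt(periods) * rets.mean() / rets.std()

def crossover_rets(prices: pd.Series, short: int, long: int) -> pd.Series:
    pos = (prices.rolling(short).mean() > prices.rolling(long).mean()).astype(float).shift(1)
    return (pos * prices.pct_change()).fillna(0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    prices = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, 2000))))

    # Sweep the two lookback parameters over a grid and record the Sharpe
    # ratio for each pair - a "surface" of performance.
    shorts, longs = range(10, 60, 10), range(80, 180, 20)
    surface = pd.DataFrame(
        [[sharpe(crossover_rets(prices, s, l)) for l in longs] for s in shorts],
        index=list(shorts), columns=list(longs),
    )
    print(surface.round(2))
    # A smooth surface suggests robust parameters; isolated spikes are often
    # artefacts of the test data (optimisation bias).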

There is a vast literature on multi-dimensional optimisation algorithms and it is a highly active area of research. I won't

dwell on it here, but keep it in the back of your mind when you find a strategy with a fantastic backtest!

Look-Ahead Bias

Look-ahead bias is introduced into a backtesting system when future data is accidentally included at a point in the
simulation where that data would not have actually been available.
If we are running the backtest chronologically and we reach time point N, then look-ahead bias occurs if data is
included for any point N+k, where k>0. Look-ahead bias errors can be incredibly subtle. Here are three examples of
how look-ahead bias can be introduced:
Technical Bugs - Arrays/vectors in code often have iterators or index variables. Incorrect offsets of these indices can
lead to a look-ahead bias by incorporating data at N+k for non-zero k.

Parameter Calculation - Another common example of look-ahead bias occurs when calculating optimal strategy
parameters, such as with linear regressions between two time series. If the whole data set (including future data) is
used to calculate the regression coefficients, and thus retroactively applied to a trading strategy for optimisation
purposes, then future data is being incorporated and a look-ahead bias exists.

Maxima/Minima - Certain trading strategies make use of extreme values in any time period, such as incorporating the
high or low prices in OHLC data. However, since these maximal/minimal values can only be calculated at the end of a
time period, a look-ahead bias is introduced if these values are used -during- the current period. It is always necessary
to lag high/low values by at least one period in any trading strategy making use of them.
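
For example, the following Python/pandas sketch (with hypothetical OHLC column names and lookback) lags the rolling high and low by one bar before they are used to generate a breakout signal, so that only information actually available at the time is included.

import pandas as pd

def breakout_signals(ohlc: pd.DataFrame, lookback: int = 20) -> pd.Series:
    """+1 when the close breaks the prior N-day high, -1 on the prior N-day low.

    The rolling high/low are shifted by one bar: today's high and low are only
    known at the end of today, so using them for today's signal would be
    look-ahead bias."""
    prior_high = ohlc["high"].rolling(lookback).max().shift(1)
    prior_low = ohlc["low"].rolling(lookback).min().shift(1)
    signal = pd.Series(0, index=ohlc.index)
    signal[ohlc["close"] > prior_high] = 1
    signal[ohlc["close"] < prior_low] = -1
    return signal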

As with optimisation bias, one must be extremely careful to avoid its introduction. It is often the main reason why

trading strategies underperform their backtests significantly in "live trading".

Survivorship Bias

Survivorship bias is a particularly dangerous phenomenon and can lead to significantly inflated performance for

certain strategy types.

It occurs when strategies are tested on datasets that do not include the full universe of prior assets that may have

been chosen at a particular point in time, but only consider those that have "survived" to the current time.

As an example, consider testing a strategy on a random selection of equities before and after the 2001 market crash.

Some technology stocks went bankrupt, while others managed to stay afloat and even prospered.

If we had restricted this strategy only to stocks which made it through the market drawdown period, we would be

introducing a survivorship bias because they have already demonstrated their success to us.

In fact, this is just another specific case of look-ahead bias, as future information is being incorporated into past

analysis.

There are two main ways to mitigate survivorship bias in your strategy backtests:

Survivorship Bias Free Datasets - In the case of equity data it is possible to purchase datasets that include delisted
entities, although they are not cheap and only tend to be utilised by institutional firms. In particular, Yahoo Finance
data is NOT survivorship bias free, and this is commonly used by many retail algo traders. One can also trade on
asset classes that are not prone to survivorship bias, such as certain commodities (and their future derivatives).

Use More Recent Data - In the case of equities, utilising a more recent data set mitigates the possibility that the stock
selection chosen is weighted to "survivors", simply as there is less likelihood of overall stock delisting in shorter time
periods. One can also start building a personal survivorship-bias free dataset by collecting data from the current point
onward. After 3-4 years, you will have a solid survivorship-bias free set of equities data with which to backtest further
strategies.
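
As a hedged illustration of why this matters, the sketch below builds a point-in-time universe from a hypothetical table of listing and delisting dates, so that a backtest at date t only "sees" stocks that were actually tradable at t, including those that later delisted. In practice this metadata would come from a survivorship-bias-free vendor dataset.

import pandas as pd

# Hypothetical listing metadata, for illustration only
listings = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB", "CCC"],
        "listed": pd.to_datetime(["1998-05-01", "1999-03-15", "2000-01-10"]),
        # NaT means the stock is still listed today
        "delisted": pd.to_datetime(["2001-11-30", None, None]),
    }
)

def universe_at(date: str) -> list:
    """Tickers tradable on the given date, including later casualties."""
    d = pd.Timestamp(date)
    live = (listings["listed"] <= d) & (
        listings["delisted"].isna() | (listings["delisted"] >= d)
    )
    return listings.loc[live, "ticker"].tolist()

if __name__ == "__main__":
    # AAA is included in 2000 even though it delisted in 2001 - excluding it
    # would introduce survivorship bias into the backtest.
    print(universe_at("2000-06-30"))   # ['AAA', 'BBB', 'CCC']
    print(universe_at("2002-06-30"))   # ['BBB', 'CCC']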

We will now consider certain psychological phenomena that can influence your trading performance.

Psychological Tolerance Bias

This particular phenomenon is not often discussed in the context of quantitative trading. However, it is discussed

extensively in regard to more discretionary trading methods.

It has various names, but I've decided to call it "psychological tolerance bias" because it captures the essence of the

problem.

When creating backtests over a period of 5 years or more, it is easy to look at an upwardly trending equity curve,

calculate the compounded annual return, Sharpe ratio and even drawdown characteristics and be satisfied with the

results.

As an example, the strategy might possess a maximum relative drawdown of 25% and a maximum drawdown

duration of 4 months. This would not be atypical for a momentum strategy.

It is straightforward to convince oneself that it is easy to tolerate such periods of losses because the overall picture

is rosy. However, in practice, it is far harder!

If historical drawdowns of 25% or more occur in the backtests, then in all likelihood you will see periods of similar

drawdown in live trading.

These periods of drawdown are psychologically difficult to endure. I have observed first hand what an extended

drawdown can be like, in an institutional setting, and it is not pleasant - even if the backtests suggest such periods

will occur.
The reason I have termed it a "bias" is that often a strategy which would otherwise be successful is stopped from

trading during times of extended drawdown and thus will lead to significant underperformance compared to a backtest.

Thus, even though the strategy is algorithmic in nature, psychological factors can still have a heavy influence on

profitability.

The takeaway is to ensure that if you see drawdowns of a certain percentage and duration in the backtests, then you

should expect them to occur in live trading environments, and will need to persevere in order to reach profitability once

more.

Software Packages for Backtesting

The software landscape for strategy backtesting is vast.

Solutions range from fully-integrated institutional grade sophisticated software through to programming languages

such as C++, Python and R where nearly everything must be written from scratch (or suitable 'plugins' obtained).

As quant traders, we are interested in the balance of being able to "own" our trading technology stack versus the

speed and reliability of our development methodology. Here are the key considerations for software choice:

Programming Skill - The choice of environment will in a large part come down to your ability to program software. I
would argue that being in control of the total stack will have a greater effect on your long term P&L than outsourcing
as much as possible to vendor software. This is due to the downside risk of having external bugs or idiosyncrasies
that you are unable to fix in vendor software, which would otherwise be easily remedied if you had more control over
your "tech stack". You also want an environment that strikes the right balance between productivity, library availability
and speed of execution. I make my own personal recommendation below.

Execution Capability/Broker Interaction - Certain backtesting software, such as TradeStation, ties in directly with a
brokerage. I am not a fan of this approach as reducing transaction costs is often a big component of getting a higher
Sharpe ratio. If you're tied into a particular broker (and TradeStation "forces" you to do this), then you will have a
harder time transitioning to new software (or a new broker) if the need arises. Interactive Brokers provide an API
which is robust, albeit with a slightly obtuse interface.

Customisation - An environment like MATLAB or Python gives you a great deal of flexibility when creating algo
strategies as they provide fantastic libraries for nearly any mathematical operation imaginable, but also allow
extensive customisation where necessary.
Strategy Complexity - Certain software just isn't cut out for heavy number crunching or mathematical complexity.
Excel is one such piece of software. While it is good for simpler strategies, it cannot really cope with numerous assets
or more complicated algorithms, at speed.

Bias Minimisation - Does a particular piece of software or data lend itself more to trading biases? You need to make
sure that if you want to create all the functionality yourself, that you don't introduce bugs which can lead to biases.

Speed of Development - One shouldn't have to spend months and months implementing a backtest engine.
Prototyping should only take a few weeks. Make sure that your software is not hindering your progress to any great
extent, just to grab a few extra percentage points of execution speed. C++ is the "elephant in the room" here!

Speed of Execution - If your strategy is completely dependent upon execution timeliness (as in HFT/UHFT) then a
language such as C or C++ will be necessary. However, you will be verging on Linux kernel optimisation and FPGA
usage for these domains, which is outside the scope of this email!

Cost - Many of the software environments that you can program algorithmic trading strategies with are completely free
and open source. In fact, many hedge funds make use of open source software for their entire algo trading stacks. In
addition, Excel and MATLAB are both relatively cheap and there are even free alternatives to each.

Now that we have listed the criteria with which we need to choose our software infrastructure, I want to run through

some of the more popular packages and how they compare:

Note: I am only going to include software that is available to most retail practitioners and software developers, as this

is the readership of the site and the email list. While other software is available, such as the more institutional grade

tools, I feel these are too expensive to be effectively used in a retail setting and I personally have no extensive

experience with them.

Backtesting Software Comparison


MS Excel
Description: WYSIWYG (what-you-see-is-what-you-get) spreadsheet software. Extremely widespread in the
financial industry. Data and algorithm are tightly coupled.
Execution: Yes, Excel can be tied into most brokerages.
Customisation: VBA macros allow more advanced functionality at the expense of hiding implementation.
Strategy Complexity: More advanced statistical tools are harder to implement, as are strategies with many hundreds
of assets.
Bias Minimisation: Look-ahead bias is easy to detect via cell-highlighting functionality (assuming no VBA).
Development Speed: Quick to implement basic strategies.
Execution Speed: Slow execution speed - suitable only for lower-frequency strategies.
Cost: Cheap or free (depending upon license).
Alternatives: OpenOffice

MATLAB
Description: Programming environment originally designed for computational mathematics, physics and
engineering. Very well suited to vectorised operations and those involving numerical linear algebra. Provides a wide
array of plugins for quant trading. In widespread use in quantitative hedge funds.
Execution: No native execution capability; MATLAB requires a separate execution system.
Customisation: Huge array of community plugins for nearly all areas of computational mathematics.
Strategy Complexity: Many advanced statistical methods already available and well-tested.
Bias Minimisation: Harder to detect look-ahead bias; requires extensive testing.
Development Speed: Short scripts can create sophisticated backtests easily.
Execution Speed: Assuming a vectorised/parallelised algorithm, MATLAB is highly optimised. Poor for traditional
iterated loops.
Cost: ~1,000 USD for a license.
Alternatives: Octave, SciLab

Python
Description: High-level language designed for speed of development. Wide array of libraries for nearly any
programmatic task imaginable. Gaining wider acceptance in the hedge fund and investment bank community. Not quite
as fast as C/C++ for execution speed.
Execution: Python plugins exist for larger brokers, such as Interactive Brokers. Hence backtest and execution system
can all be part of the same "tech stack".
Customisation: Python has a very healthy development community and is a mature language. NumPy/SciPy provide
fast scientific computing and statistical analysis tools relevant for quant trading.
Strategy Complexity: Many plugins exist for the main algorithms, but not quite as big a quant community as exists
for MATLAB.
Bias Minimisation: Same bias minimisation problems exist as for any high-level language. Need to be extremely
careful about testing.
Development Speed: Python's main advantage is development speed, with robust built-in testing capabilities.
Execution Speed: Not quite as fast as C++, but scientific computing components are optimised and Python can talk
to native C code with certain plugins.
Cost: Free/Open Source
Alternatives: Ruby, Erlang, Haskell

R
Description: Environment designed for advanced statistical methods and time series analysis. Wide array of specific
statistical, econometric and native graphing toolsets. Large developer community.
Execution: R possesses plugins to some brokers, in particular Interactive Brokers. Thus an end-to-end system can be
written entirely in R.
Customisation: R can be customised with any package, but its strengths lie in statistical/econometric domains.
Strategy Complexity: Mostly useful if performing econometric, statistical or machine-learning strategies due to
available plugins.
Bias Minimisation: Similar level of bias possibility as for any high-level language such as Python or C++. Thus testing
must be carried out.
Development Speed: R is rapid for writing strategies based on statistical methods.
Execution Speed: R is slower than C++, but remains relatively optimised for vectorised operations (as with
MATLAB).
Cost: Free/Open Source
Alternatives: Stata

C++
Description: Mature, high-level language designed for speed of execution. Wide array of quantitative finance and
numerical libraries. Harder to debug and often takes longer to implement than Python or MATLAB. Extremely
prevalent in both the buy- and sell-side.
Execution: Most brokerage APIs are written in C++ and Java. Thus many plugins exist.
Customisation: C/C++ allows direct access to underlying memory, hence ultra-high frequency strategies can be
implemented.
Strategy Complexity: The C++ STL provides a wide array of optimised algorithms. Nearly any specialised mathematical
algorithm possesses a free, open-source C/C++ implementation on the web.
Bias Minimisation: Look-ahead bias can be tricky to eliminate, but no harder than in other high-level languages. Good
debugging tools, but one must be careful when dealing with underlying memory.
Development Speed: C++ is quite verbose compared to Python or MATLAB for the same algorithm. More lines-of-
code (LOC) often leads to greater likelihood of bugs.
Execution Speed: C/C++ has extremely fast execution speed and can be well optimised for specific computational
architectures. This is the main reason to utilise it.
Cost: Various compilers: Linux/GCC is free, MS Visual Studio has differing licenses.
Alternatives: C#, Java, Scala

Different strategies will require different software packages. HFT and UHFT strategies will be written in C/C++

(these days they are often carried out on GPUs and FPGAs), whereas low-frequency directional equity strategies are

easy to implement in TradeStation, due to the "all in one" nature of the software/brokerage.

My personal preference is for Python as it provides the right degree of customisation, speed of development,

testing capability and execution speed for my needs and strategies.

If I need anything faster, I can "drop in" to C++ directly from my Python programs. One method favoured by many

quant traders is to prototype their strategies in Python and then convert the slower execution sections to C++ in an

iterative manner. Eventually the entire algo is written in C++ and can be "left alone to trade"!

In the next email we will take an extensive look at transaction cost modelling as well as strategy specific
implementation.

Lesson 4: Transaction Cost Modelling and Strategy Implementation Issues

In the third lesson of our quantitative finance email course we discussed the various cognitive biases that can
affect trading, as well as different "technology stacks" and software for writing backtesters.

In today's lesson we're going to build on the last email by discussing transaction cost modelling and strategy
implementation issues.
Transaction Costs

One of the most prevalent beginner mistakes when implementing trading models is to neglect (or grossly
underestimate) the effects of transaction costs on a strategy.

Though it is often assumed that transaction costs only reflect broker commissions, there are in fact many other ways
that costs can be accrued on a trading model. The three main types of costs that must be considered include:

Commissions/Fees

The most direct form of transaction costs incurred by an algorithmic trading strategy are commissions and fees.

All strategies require some form of access to an exchange, either directly or through a brokerage intermediary ("the
broker"). These services incur an incremental cost with each trade, known as commission.

Brokers generally provide many services, although quantitative algorithms only really make use of the exchange
infrastructure. Hence brokerage commissions are often small on a per-trade basis.

Brokers also charge fees, which are costs incurred to clear and settle trades. Further to this are taxes imposed by
regional or national governments.

For instance, in the UK there is a stamp duty to pay on equities transactions. Since commissions, fees and taxes are
generally fixed, they are relatively straightforward to implement in a backtest engine (see below).

Slippage/Latency

Slippage is the difference in price achieved between the time when a trading system decides to transact and the time
when a transaction is actually carried out at an exchange.

Slippage is a considerable component of transaction costs and can make the difference between a very profitable
strategy and one that performs poorly.

Slippage is a function of the underlying asset volatility, the latency between the trading system and the exchange
and the type of strategy being carried out.

An instrument with higher volatility is more likely to be moving and so prices between signal and execution can differ
substantially.

Latency is defined as the time difference between signal generation and point of execution.

Higher frequency strategies are more sensitive to latency issues and improvements of milliseconds on this latency can
make all the difference towards profitability.

The type of strategy is also important.

Momentum systems suffer more from slippage on average because they are trying to purchase instruments that are
already moving in the forecast direction. The opposite is true for mean-reverting strategies as these strategies are
moving in a direction opposing the trade.

Market Impact/Liquidity

Market impact is the cost incurred to traders due to the supply/demand dynamics of the exchange (and asset)
through which they are trying to trade.

A large order on a relatively illiquid asset is likely to move the market substantially as the trade will need to access a
large component of the current supply.

To counter this, large block trades are broken down into smaller chunks which are transacted periodically, as and
when new liquidity arrives at the exchange.

On the opposite end, for highly liquid instruments such as the S&P500 E-Mini index futures contract, low volume
trades are unlikely to adjust the "current price" in any great amount.

More illiquid assets are characterised by a larger spread, which is the difference between the current bid and ask
prices on the limit order book.

This spread is an additional transaction cost associated with any trade. Spread is a very important component of the
total transaction cost - as evidenced by the myriad of UK spread-betting firms whose advertising campaigns express
the "tightness" of their spreads for heavily traded instruments.

Transaction Cost Models

In order to successfully model the above costs in a backtesting system, various transaction cost models of differing
complexity have been introduced.

They range from simple flat modelling through to a non-linear quadratic approximation. Here we will outline the
advantages and disadvantages of each model:

Flat/Fixed Transaction Cost Models

Flat transaction costs are the simplest form of transaction cost modelling. They assume a fixed cost associated with
each trade. Thus they best represent the concept of brokerage commissions and fees.

They are not very accurate for modelling more complex behaviour such as slippage or market impact.

In fact, they do not consider asset volatility or liquidity at all. Their main benefit is that they are computationally
straightforward to implement.

However they are likely to significantly under or over estimate transaction costs depending upon the strategy
being employed. Thus they are rarely used in practice.

Linear/Piecewise Linear/Quadratic Transaction Cost Models

More advanced transaction cost models start with linear models, continue with piece-wise linear models and conclude
with quadratic models.

They lie on a spectrum of least to most accurate, albeit with least to greatest implementation effort.
Since slippage and market impact are inherently non-linear phenomena, quadratic functions are the most accurate at
modelling these dynamics.

Quadratic transaction cost models are much harder to implement and can take far longer to compute than for simpler
flat or linear models, but they are often used in practice.
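
To make the spectrum concrete, here is a hedged Python sketch of the three cost models as simple functions of traded quantity. The coefficients are illustrative placeholders only; in practice they would be calibrated to your broker's fee schedule and to your own execution data.

def flat_cost(quantity: float, fee_per_trade: float = 1.0) -> float:
    """Fixed commission per trade - ignores size, volatility and liquidity."""
    return fee_per_trade

def linear_cost(quantity: float, fee_per_trade: float = 1.0,
                cost_per_share: float = 0.005) -> float:
    """Commission plus a cost proportional to the number of shares traded."""
    return fee_per_trade + cost_per_share * abs(quantity)

def quadratic_cost(quantity: float, fee_per_trade: float = 1.0,
                   cost_per_share: float = 0.005,
                   impact_coeff: float = 1e-6) -> float:
    """Adds a quadratic term approximating non-linear market impact."""
    return fee_per_trade + cost_per_share * abs(quantity) + impact_coeff * quantity ** 2

if __name__ == "__main__":
    for q in (100, 10_000, 1_000_000):
        print(q, flat_cost(q), linear_cost(q), round(quadratic_cost(q), 2))

Note how the three models diverge sharply as the traded quantity grows: a flat model looks adequate for small retail orders but badly understates the cost of large trades.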

Algorithmic traders also attempt to make use of actual historical transaction costs for their strategies as inputs to their
current transaction models to make them more accurate.

This is tricky business and often verges on the complicated areas of modelling volatility, slippage and market
impact.

However, if the trading strategy is transacting large volumes over short time periods, then accurate estimates of the
incurred transaction costs can have a significant effect on the strategy bottom-line and so it is worth the effort to invest
in researching these models.

Strategy Backtest Implementation Issues

While transaction costs are a very important aspect of successful backtesting implementations, there are many other
issues that can affect strategy performance.

Trade Order Types

One choice that an algorithmic trader must make is how and when to make use of the different exchange orders
available.

This choice usually falls into the realm of the execution system, but we will consider it here as it can greatly affect
strategy backtest performance. There are two types of order that can be carried out: market orders and limit orders.

A market order executes a trade immediately, irrespective of available prices.

Thus large trades executed as market orders will often get a mixture of prices as each subsequent limit order on the
opposing side is filled. Market orders are considered aggressive orders since they will almost certainly be filled, albeit
with a potentially unknown cost.

Limit orders provide a mechanism for the strategy to determine the worst price at which the trade will get executed,
with the caveat that the trade may not get filled partially or fully.

Limit orders are considered passive orders since they are often unfilled, but when they are filled a price is guaranteed. An
individual exchange's collection of limit orders is known as the limit order book, which is essentially a queue of buy
and sell orders at certain sizes and prices.

When backtesting, it is essential to model the effects of using market or limit orders correctly.

For high-frequency strategies in particular, backtests can significantly outperform live trading if the effects of market
impact and the limit order book are not modelled accurately.
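
A minimal sketch of the difference, using hypothetical bar data: a market buy is assumed to fill at the next bar's open plus a fixed slippage allowance, whereas a limit buy only fills if the next bar's range actually touches the limit price. Real fill simulation is considerably more involved (queue position, partial fills, the full order book), so treat this purely as an illustration.

from typing import Optional

def market_fill(next_open: float, slippage: float = 0.01) -> float:
    """Market buy: assume execution at the next bar's open plus slippage."""
    return next_open + slippage

def limit_fill(limit_price: float, next_low: float) -> Optional[float]:
    """Limit buy: fills at the limit price only if the next bar trades
    down to (or through) it; otherwise the order remains unfilled."""
    return limit_price if next_low <= limit_price else None

if __name__ == "__main__":
    # Hypothetical next bar: open 100.20, low 100.05
    print(market_fill(100.20))          # 100.21 - certain fill, uncertain price
    print(limit_fill(100.00, 100.05))   # None   - certain price, no fill
    print(limit_fill(100.10, 100.05))   # 100.1  - filled at the limit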

OHLC Data Idiosyncrasies


There are particular issues related to backtesting strategies when making use of daily data in the form of Open-High-
Low-Close (OHLC) figures, especially for equities.

Note that this is precisely the form of data given out by Yahoo Finance, which is a very common source of data for
retail algorithmic traders!

Cheap or free datasets, while suffering from survivorship bias (which we have already discussed in the previous
email), are also often composite price feeds from multiple exchanges.

This means that the extreme points (i.e. the open, close, high and low) of the data are very susceptible to "outlying"
values due to small orders at regional exchanges.

Further, these values are also sometimes more likely to be tick-errors that have yet to be removed from the dataset.

This means that if your trading strategy makes extensive use of any of the OHLC points specifically, backtest
performance can differ from live performance as orders might be routed to different exchanges depending upon your
broker and your available access to liquidity.

The only way to resolve these problems is to make use of higher frequency data or obtain data directly from an
individual exchange itself, rather than a cheaper composite feed.

In the next email we will look at some of the recommended textbooks for quantitative trading.

Lesson 5: Recommended Textbooks for Quantitative Trading


In the fourth lesson of our quantitative finance email course we discussed transaction cost modelling and
strategy implementation issues.

In today's lesson we're going to look at some of the core textbooks for quantitative and algorithmic trading.

Algorithmic trading is usually perceived as a complex area for beginners to get to grips with.

It covers a wide range of disciplines, with certain aspects requiring a significant degree of mathematical and
statistical maturity.

Consequently it can be extremely off-putting for the uninitiated. In reality, the overall concepts are straightforward to
grasp, while the details can be learned in an iterative, ongoing manner.

The beauty of algorithmic trading is that there is no need to test out knowledge on real capital, as many brokerages

provide highly realistic market simulators.

While there are certain caveats associated with such systems, they provide an environment to foster a deep level of

understanding, with absolutely no capital risk.

The first task is to gain a solid overview of the subject. I have found it to be far easier to avoid heavy mathematical

discussions until the basics are covered and understood. The best books I have found for this purpose are as follows:
1) Quantitative Trading by Ernest Chan - This is one of my favourite finance books. Dr. Chan provides a great
overview of the process of setting up a "retail" quantitative trading system, using MATLAB or Excel. He makes the
subject highly approachable and gives the impression that "anyone can do it". Although there are plenty of details that
are skipped over (mainly for brevity), the book is a great introduction to how algorithmic trading works. He discusses
alpha generation ("the trading model"), risk management, automated execution systems and certain strategies
(particularly momentum and mean reversion). This book is the place to start.

2) Inside the Black Box by Rishi K. Narang - In this book Dr. Narang explains in detail how a professional
quantitative hedge fund operates. It is pitched at a savvy investor who is considering whether to invest in such a "black
box". Despite the seeming irrelevance to a retail trader, the book actually contains a wealth of information on how a
"proper" quant trading system should be carried out. For instance, the importance of transaction costs and risk
management are outlined, with ideas on where to look for further information. Many retail algo traders could do well to
pick this up and see how the 'professionals' carry out their trading.

3) Algorithmic Trading & DMA by Barry Johnson - The phrase 'algorithmic trading', in the financial industry, usually
refers to the execution algorithms used by banks and brokers to execute efficient trades. I am using the term to cover
not only those aspects of trading, but also quantitative or systematic trading. This book is mainly about the former,
being written by Barry Johnson, who is a quantitative software developer at an investment bank. Does this mean it is
of no use to the retail quant? Not at all. Possessing a deeper understanding of how exchanges work and "market
microstructure" can aid immensely the profitability of retail strategies. Despite it being a heavy tome, it is worth picking
up.

Once the basic concepts are grasped, it is necessary to begin developing a trading strategy. This is usually known

as the alpha model component of a trading system.

Strategies are straightforward to find these days (as I mentioned in previous emails), however the true value comes in

determining your own trading parameters via extensive research and backtesting.

The following books discuss certain types of trading and execution systems and how to go about implementing them:

4) Algorithmic Trading by Ernest Chan - This is the second book by Dr. Chan. In the first book he discussed
momentum, mean reversion and certain high frequency strategies. This book discusses such strategies in depth and
provides significant implementation details, albeit with more mathematical complexity than in the first (e.g. Kalman
Filters, Stationarity/Cointegration, CADF etc). The strategies, once again, make extensive use of MATLAB but the code
can be easily modified to C++, Python/pandas or R for those with programming experience. It also provides updates
on the latest market behaviour, as the first book was written a few years back.

5) Trading and Exchanges by Larry Harris - This book concentrates on market microstructure, which I personally feel
is an essential area to learn about, even at the beginning stages of quant trading. Market microstructure is the
"science" of how market participants interact and the dynamics that occur in the order book. It is closely related to how
exchanges function and what actually happens when a trade is placed. This book is less about trading strategies as
such, but more about things to be aware of when designing execution systems. Many professionals in the quant
finance space regard this as an excellent book and I also highly recommend it.
At this stage, as a retail trader, you will be in a good place to begin researching the other components of a trading

system such as the execution mechanism (and its deep relationship with transaction costs), as well as risk and

portfolio management.

Successful Algorithmic Trading

While the above five books are very good, I'd also like to take the opportunity to recommend my own book on the

subject - Successful Algorithmic Trading.

In the book I make extensive use of Python and associated libraries such as Scikit-Learn, Pandas, NumPy, SciPy
and Statsmodels to create an end-to-end algorithmic trading backtest simulator with Interactive Brokers as the
primary brokerage.

I present trading strategies at multiple frequencies and discuss how to optimise parameters of such strategies,
while outlining pitfalls to be aware of.

To find out more about the book, please visit the Successful Algorithmic Trading page.

In the next email we will ask whether quantitative traders can still succeed at the retail level.

Lesson 6: Whether Quantitative Traders Can Still Succeed at the Retail Level
In the fifth lesson of our quantitative finance email course we discussed some of the core textbooks for
quantitative and algorithmic trading.

In today's lesson we're going to look at some of the advantages enjoyed by retail quants over quantitative hedge
funds.

It is common, as a beginning algorithmic trader practising at retail level, to question whether it is still possible to

compete with the large institutional quant funds.

In this email lesson I would like to argue that due to the nature of the institutional regulatory environment, the

organisational structure and a need to maintain investor relations, funds suffer from certain disadvantages that

do not concern retail algorithmic traders.


The capital and regulatory constraints imposed on funds lead to certain predictable behaviours, which are able to be

exploited by a retail trader.

"Big money" moves the markets, and as such one can dream up many strategies to take advantage of such

movements.

We will discuss some of these strategies in future emails. At this stage I would like to highlight the comparative

advantages enjoyed by the algorithmic trader over many larger funds.

Trading Advantages

There are many ways in which a retail algo trader can compete with a fund on their trading process alone, but there

are also some disadvantages:

Capacity - A retail trader has greater freedom to play in smaller markets. They can generate significant returns in
these spaces, even while institutional funds can't.

Crowding the trade - Funds suffer from "technology transfer", as staff turnover can be high. Non-Disclosure
Agreements and Non-Compete Agreements mitigate the issue, but it still leads to many quant funds "chasing the
same trade". Whimsical investor sentiment and the "next hot thing" exacerbate the issue. Retail traders are not
constrained to follow the same strategies and so can remain uncorrelated to the larger funds.

Market impact - When playing in highly liquid, non-OTC markets, the low capital base of retail accounts reduces
market impact substantially.

Leverage - A retail trader, depending upon their legal setup, is constrained by margin/leverage regulations. Private
investment funds do not suffer from the same disadvantage, although they are equally constrained from a risk
management perspective.

Liquidity - Having access to a prime brokerage is out of reach of the average retail algo trader. They have to "make
do" with a retail brokerage such as Interactive Brokers. Hence there is reduced access to liquidity in certain
instruments. Trade order-routing is also less clear and is one way in which strategy performance can diverge from
backtests.

Client news flow - Potentially the most important disadvantage for the retail trader is lack of access to client news
flow from their prime brokerage or credit-providing institution. Retail traders have to make use of non-traditional
sources such as meet-up groups, blogs, forums and open-access financial journals.

Risk Management

Retail algo traders often take a different approach to risk management than the larger quant funds. It is often

advantageous to be "small and nimble" in the context of risk.


Crucially, there is no risk management budget imposed on the trader beyond that which they impose themselves,

nor is there a compliance or risk management department enforcing oversight.

This allows the retail trader to deploy custom or preferred risk modelling methodologies, without the need to follow

"industry standards" (an implicit investor requirement).

However, the alternative argument is that this flexibility can lead retail traders to become "sloppy" with risk management.

Risk concerns may be built-in to the backtest and execution process, without external consideration given to portfolio

risk as a whole.

Although "deep thought" might be applied to the alpha model (strategy), risk management might not achieve a similar

level of consideration.

Investor Relations

Outside investors are the key difference between retail shops and large funds. This drives all manner of incentives for

the larger fund - issues which the retail trader need not concern themselves with:

Compensation structure - In the retail environment the trader is concerned only with absolute return. There are no
high-water marks to be met and no capital deployment rules to follow. Retail traders are also able to suffer more
volatile equity curves since nobody is watching their performance who might be capable of redeeming capital from
their fund.

Regulations and reporting - Beyond taxation there is little in the way of regulatory reporting constraints for the retail
trader. Further, there is no need to provide monthly performance reports or "dress up" a portfolio prior to a client
newsletter being sent. This is a big time-saver.

Benchmark comparison - Funds are not only compared with their peers, but also "industry benchmarks". For a long-
only US equities fund, investors will want to see returns in excess of the S&P500, for example. Retail traders are not
compelled in the same way to compare their strategies to a benchmark.

Performance fees - The downside to running your own portfolio as a retail trader is the lack of management and
performance fees enjoyed by the successful quant funds. There is no "2 and 20" to be had at the retail level!

Technology
One area where the retail trader is at a significant advantage is in the choice of technology stack for the trading

system.

Not only can the trader pick the "best tools for the job" as they see fit, but there are no concerns about legacy

systems integration or firm-wide IT policies.

Newer languages such as Python or R now possess packages to construct an end-to-end backtesting, execution, risk

and portfolio management system with far fewer lines-of-code (LOC) than may be needed in a more verbose

language such as C++.

However, this flexibility comes at a price.

One either has to build the stack themselves or outsource all or part of it to vendors. This is expensive in terms of

time, capital or both.

Further, a trader must debug all aspects of the trading system - a long and potentially painstaking process.

All desktop research machines and any co-located servers must be paid for directly out of trading profits as there are

no management fees to cover expenses.

In conclusion, it can be seen that retail traders possess significant comparative advantages over the larger quant

funds. Potentially, there are many ways in which these advantages can be exploited.

In the next email we will consider the best programming languages for algorithmic trading systems.

Lesson 7: Best Programming Languages for Algorithmic Trading Systems

In the sixth lesson of our quantitative finance email course we discussed some of the issues around whether retail
traders can still succeed at algorithmic trading.

In today's lesson we're going to look at some of the programming languages that are useful for building algorithmic
trading systems.

One of the most frequent questions I receive in the QS mailbag is "What is the best programming language for

algorithmic trading?".

The short answer is that there is no "best" language.


Strategy parameters, performance, modularity, development, resiliency and cost must all be considered.

Today's lesson will outline the necessary components of an algorithmic trading system architecture and how

decisions regarding implementation affect the choice of programming language.

Firstly, the major components of an algorithmic trading system will be considered, such as the research tools, portfolio

optimiser, risk manager and execution engine.

Subsequently, different trading strategies will be examined and how they affect the design of the system. In particular

the frequency of trading and the likely trading volume will both be discussed.

Once the trading strategy has been selected, it is necessary to architect the entire system. This includes choice of

hardware, the operating system(s) and system resiliency against rare, potentially catastrophic events.

While the architecture is being considered, due regard must be paid to performance - both to the research tools as

well as the live execution environment.

What Is The Trading System Trying To Do?

Before deciding on the "best" language with which to write an automated trading system it is necessary to define the

requirements.

Is the system going to be purely execution based?


Will the system require a risk management or portfolio construction module?
Will the system require a high-performance backtester?
For most strategies the trading system can be partitioned into two categories: Research and signal generation.

Research is concerned with evaluation of a strategy performance over historical data. The process of evaluating a

trading strategy over prior market data is known as backtesting.

The data size and algorithmic complexity will have a big impact on the computational intensity of the backtester. CPU

speed and concurrency are often the limiting factors in optimising research execution speed.

Signal generation is concerned with generating a set of trading signals from an algorithm and sending such orders to

the market, usually via a brokerage.


For certain strategies a high level of performance is required. I/O issues such as network bandwidth and latency are

often the limiting factor in optimising execution systems. Thus the choice of languages for each component of your

entire system may be quite different.

Type, Frequency and Volume of Strategy

The type of algorithmic strategy employed will have a substantial impact on the design of the system.

It will be necessary to consider the markets being traded, the connectivity to external data vendors, the frequency and

volume of the strategy, the trade-off between ease of development and performance optimisation, as well as any

custom hardware, including co-located custom servers, GPUs or FPGAs that might be necessary.

The technology choices for a low-frequency US equities strategy will be vastly different from those of a high-frequency
statistical arbitrage strategy trading on the futures market. Prior to the choice of language, the many data vendors that
pertain to the strategy at hand must be evaluated.

It will be necessary to consider connectivity to the vendor, structure of any APIs, timeliness of the data, storage

requirements and resiliency in the face of a vendor going offline. It is also wise to possess rapid access to multiple

vendors!

Various instruments all have their own storage quirks, examples of which include multiple ticker symbols for equities

and expiration dates for futures (not to mention any specific OTC data). This needs to be factored in to the platform

design.

Frequency of strategy is likely to be one of the biggest drivers of how the technology stack will be defined. Strategies
employing data more frequently than minutely or secondly bars require significant consideration with regard to
performance.

A strategy exceeding secondly bars (i.e. tick data) leads to a performance-driven design as the primary requirement.

For high frequency strategies a substantial amount of market data will need to be stored and evaluated. Software such
as HDF5 or kdb+ is commonly used for these roles.


In order to process the large volumes of data needed for HFT applications, an extensively optimised backtester and
execution system must be used. C/C++ (possibly with some assembler) is likely to be the strongest language candidate.

Ultra-high frequency strategies will almost certainly require custom hardware such as FPGAs, exchange co-location
and kernel/network interface tuning.

Research Systems

Research systems typically involve a mixture of interactive development and automated scripting. The former
often takes place within an IDE such as Visual Studio, MatLab or R Studio. The latter involves extensive numerical
calculations over numerous parameters and data points.

This leads to a language choice that provides a straightforward environment in which to test code, but also provides
sufficient performance to evaluate strategies over multiple parameter dimensions.

Typical IDEs in this space include:


Microsoft Visual C++/C#, which contains extensive debugging utilities, code completion capabilities (via
"Intellisense") and straightforward overviews of the entire project stack (via the database ORM, LINQ)
MatLab, which is designed for extensive numerical linear algebra and vectorised operations, but in an interactive
console manner
R Studio, which wraps the R statistical language console in a fully-fledged IDE
Eclipse IDE (on Linux) for Java and C++
Semi-proprietary IDEs such as Enthought Canopy or Anaconda (both for Python), which include data analysis
libraries such as NumPy, SciPy, scikit-learn and pandas in a single interactive (console) environment.
For numerical backtesting, all of the above languages are suitable, although it is not necessary to utilise a GUI/IDE as
the code will be executed "in the background".

The prime consideration at this stage is that of execution speed. A compiled language (such as C++) is often useful if
the backtesting parameter dimensions are large, as an interpreted implementation may become prohibitively slow in
that case.

Interpreted languages such as Python often make use of high-performance libraries such as NumPy/pandas for the
backtesting step, in order to maintain a reasonable degree of competitiveness with compiled equivalents.
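As a minimal sketch of what a vectorised backtest looks like in practice, the snippet below runs a simple moving-average crossover over a hypothetical CSV of daily bars (the file name and column names are illustrative assumptions, not a prescribed format):

# Minimal vectorised moving-average crossover backtest (illustrative sketch).
# "spy_daily.csv" with 'date' and 'close' columns is a hypothetical data file.
import pandas as pd

bars = pd.read_csv("spy_daily.csv", parse_dates=["date"], index_col="date")

fast = bars["close"].rolling(window=50).mean()
slow = bars["close"].rolling(window=200).mean()

# Long when the fast average is above the slow average, flat otherwise.
# The signal is shifted by one bar to avoid look-ahead bias.
position = (fast > slow).astype(int).shift(1).fillna(0)

returns = bars["close"].pct_change().fillna(0.0)
strategy_returns = position * returns
equity_curve = (1.0 + strategy_returns).cumprod()

print("Total return: {:.2%}".format(equity_curve.iloc[-1] - 1.0))

The entire signal and return calculation is pushed into pandas/NumPy array operations, which is what keeps an interpreted backtester competitive with a compiled one for daily-bar work.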

Ultimately the language chosen for the backtesting will be determined by specific algorithmic needs as well as the
range of libraries available in the language (more on that below).

However, the language used for the backtester and research environments can be completely independent of those
used in the portfolio construction, risk management and execution components, as will be seen.

Portfolio Construction and Risk Management

The portfolio construction and risk management components are often overlooked by retail algorithmic
traders.
This is almost always a mistake. These tools provide the mechanism by which capital will be preserved. They not only
attempt to reduce the number of "risky" bets, but also minimise churn of the trades themselves, reducing transaction
costs.

Sophisticated versions of these components can have a significant effect on the quality and consistency of
profitability.

It is straightforward to create a stable of strategies as the portfolio construction mechanism and risk manager can

easily be modified to handle multiple systems. Thus they should be considered essential components at the outset of

the design of an algorithmic trading system.

The job of the portfolio construction system is to take a set of desired trades and produce the set of actual trades that

minimise churn, maintain exposures to various factors (such as sectors, asset classes, volatility etc) and optimise the

allocation of capital to various strategies in a portfolio.

Portfolio construction often reduces to a linear algebra problem (such as a matrix factorisation) and hence

performance is highly dependent upon the effectiveness of the numerical linear algebra implementation available.

Common libraries include uBLAS, LAPACK and NAG for C++. MatLab also possesses extensively optimised matrix

operations. Python utilises NumPy/SciPy for such computations.

A frequently rebalanced portfolio will require a compiled (and well optimised!) matrix library to carry this step out, so as

not to bottleneck the trading system.
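As a sketch of the kind of linear algebra involved, the snippet below computes unconstrained minimum-variance weights with a single NumPy solve; the simulated returns matrix is a stand-in for real historical data:

# Sketch: unconstrained minimum-variance weights via one linear solve (NumPy).
# The simulated (T x N) returns matrix is a placeholder for real data.
import numpy as np

np.random.seed(42)
asset_returns = np.random.normal(0.0005, 0.01, size=(1000, 4))

cov = np.cov(asset_returns, rowvar=False)   # N x N sample covariance matrix
ones = np.ones(cov.shape[0])

# Minimum-variance portfolio: w proportional to C^-1 * 1, rescaled to sum to one.
weights = np.linalg.solve(cov, ones)
weights /= weights.sum()

print("Minimum-variance weights:", np.round(weights, 4))

A real portfolio constructor would add constraints, transaction-cost penalties and factor exposures, but the core operation remains a matrix solve or factorisation of this kind.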

Risk management is another extremely important part of an algorithmic trading system.

Risk can come in many forms: Increased volatility (although this may be seen as desirable for certain strategies!),

increased correlations between asset classes, counter-party default, server outages, "black swan" events and

undetected bugs in the trading code, to name a few.

Risk management components try and anticipate the effects of excessive volatility and correlation between asset

classes and their subsequent effect(s) on trading capital.

Often this reduces to a set of statistical computations such as Monte Carlo "stress tests".
This is very similar to the computational needs of a derivatives pricing engine and as such will be CPU-bound. These

simulations are highly parallelisable (see below) and, to a certain degree, it is possible to "throw hardware at the

problem".

Execution Systems

The job of the execution system is to receive filtered trading signals from the portfolio construction and risk

management components and send them on to a brokerage or other means of market access.

For the majority of retail algorithmic trading strategies this involves an API or FIX connection to a brokerage such as

Interactive Brokers.

The primary considerations when deciding upon a language include quality of the API, language-wrapper availability

for an API, execution frequency and the anticipated slippage.

The "quality" of the API refers to how well documented it is, what sort of performance it provides, whether it needs

standalone software to be accessed or whether a gateway can be established in a headless fashion (i.e. no GUI).

In the case of Interactive Brokers, the Trader WorkStation tool needs to be running in a GUI environment in order to

access their API. Specifically, this means it cannot be run on a Linux console server environment.

Most APIs will provide a C++ and/or Java interface. It is usually up to the community to develop language-specific

wrappers for C#, Python, R, Excel and MatLab.

Note that with every additional plugin utilised (especially API wrappers) there is scope for bugs to creep into the

system. Always test plugins of this sort and ensure they are actively maintained. A worthwhile gauge is to see how

many new updates to a codebase have been made in recent months.

Execution frequency is of the utmost importance in the execution algorithm. Note that hundreds of orders may be

sent every minute and as such performance is critical.

Slippage will be incurred through a badly-performing execution system and this will have a dramatic impact on

profitability.
Statically-typed languages (see below) such as C++/Java are generally optimal for execution but there is a trade-off in

development time, testing and ease of maintenance.

Dynamically-typed languages, such as Python and Perl, are now generally "fast enough". Always make sure the
components are designed in a modular fashion (see below) so that they can be "swapped out" as the system
scales.

Architectural Planning and Development Process

The components of a trading system, its frequency and volume requirements have been discussed above, but system
infrastructure has yet to be covered.

Those acting as a retail trader or working in a small fund will likely be "wearing many hats". You will be covering the
alpha model, risk management and execution parameters, and also the final implementation of the system. Before
delving into specific languages the design of an optimal system architecture will be discussed.

Separation of Concerns

One of the most important decisions that must be made at the outset is how to "separate the concerns" of a
trading system.

In software development, this essentially means how to break up the different aspects of the trading system into
separate modular components.

By exposing interfaces at each of the components it is easy to swap out parts of the system for other versions that aid

performance, reliability or maintenance, without modifying any external dependency code.

This is the "best practice" for such systems. For strategies at lower frequencies such practices are advised.

For ultra high frequency trading the rulebook might have to be ignored at the expense of tweaking the system for

even more performance. A more tightly coupled system may be desirable.

Creating a component map of an algorithmic trading system is worth an email in itself. However, an optimal approach

is to make sure there are separate components for the historical and real-time market data inputs, data storage, data

access API, backtester, strategy parameters, portfolio construction, risk management and automated execution

systems.
For instance, if the data store being used is currently underperforming, even at significant levels of optimisation, it can
be swapped out with minimal rewrites to the data ingestion or data access API. As far as the backtester and
subsequent components are concerned, there is no difference.

Another benefit of separated components is that it allows a variety of programming languages to be used in the overall

system.

There is no need to be restricted to a single language if the communication method of the components is language

independent. This will be the case if they are communicating via TCP/IP, ZeroMQ or some other language-

independent protocol.

As a concrete example, consider the case of a backtesting system being written in C++ for "number crunching"

performance, while the portfolio manager and execution systems are written in Python using SciPy and IBPy (an open

source wrapper for the Interactive Brokers API in Python).
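A stripped-down sketch of that language-independent hand-off, using the pyzmq bindings, is shown below; the port number and JSON message layout are illustrative assumptions, and both sockets appear in one process purely for demonstration (in practice the C++ side would publish and the Python side consume in separate processes):

# Sketch: passing a trade signal between components over ZeroMQ (pyzmq).
# Port number and message fields are assumptions for illustration.
import json
import zmq

context = zmq.Context()

sender = context.socket(zmq.PUSH)        # signal generator side
sender.bind("tcp://127.0.0.1:5555")

receiver = context.socket(zmq.PULL)      # execution component side
receiver.connect("tcp://127.0.0.1:5555")

sender.send_json({"symbol": "SPY", "action": "BUY", "quantity": 100})
order = receiver.recv_json()
print("Execution component received:", json.dumps(order))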

Performance Considerations

Performance is a significant consideration for most trading strategies.

For higher frequency strategies it is the most important factor. "Performance" covers a wide range of issues, such as

algorithmic execution speed, network latency, bandwidth, data I/O, concurrency/parallelism and scaling.

Each of these areas is individually covered by large textbooks, so this email will only scratch the surface of each
topic. Architecture and language choice will now be discussed in terms of their effects on performance.

The prevailing wisdom as stated by Donald Knuth, one of the fathers of Computer Science, is that "premature

optimisation is the root of all evil".

This is almost always the case - except when building a high frequency trading algorithm! For those who are

interested in lower frequency strategies, a common approach is to build a system in the simplest way possible and

only optimise as bottlenecks begin to appear.


Profiling tools are used to determine where bottlenecks arise. Profiles can be made for all of the factors listed above,

either in a MS Windows or Linux environment. There are many operating system and language tools available to do

so, as well as third party utilities. Language choice will now be discussed in the context of performance.

C++, Java, Python, R and MatLab all contain high-performance libraries (either as part of their standard or externally)

for basic data structure and algorithmic work. C++ ships with the Standard Template Library, while Python contains

NumPy/SciPy. Common mathematical tasks are to be found in these libraries and it is rarely beneficial to write a new

implementation.

One exception is if highly customised hardware architecture is required and an algorithm is making extensive use of

proprietary extensions (such as custom caches).

However, often "reinvention of the wheel" wastes time that could be better spent developing and optimising other parts

of the trading infrastructure. Development time is extremely precious especially in the context of sole developers.

Latency is often an issue of the execution system, as the research tools are usually situated on the same machine.
For the former, latency can occur at multiple points along the execution path. Databases must be consulted
(disk/network latency), signals must be generated (operating system/kernel messaging latency), trade signals sent (NIC
latency) and orders processed (exchange systems' internal latency).

For higher frequency operations it is necessary to become intimately familiar with kernel optimisation as well as
optimisation of network transmission. This is a deep area and is significantly beyond the scope of this email, but if a
UHFT algorithm is desired then be aware of the depth of knowledge required!

Caching is very useful in the toolkit of a quantitative trading developer. Caching refers to the concept of storing

frequently accessed data in a manner which allows higher-performance access, at the expense of potential staleness

of the data. A common use case occurs in web development when taking data from a disk-backed relational database

and putting it into memory. Any subsequent requests for the data do not have to "hit the database" and so

performance gains can be significant.


For trading situations caching can be extremely beneficial. For instance, the current state of a strategy portfolio can be

stored in a cache until it is rebalanced, such that the list doesn't need to be regenerated upon each loop of the trading

algorithm. Such regeneration is likely to be a high CPU or disk I/O operation.
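A trivial sketch of that idea - recomputing the portfolio state only when something has invalidated it - might look as follows (the expensive loader function is hypothetical):

# Sketch: caching the portfolio state between rebalances.
# 'loader' stands in for an expensive call that hits a database or disk.
class PortfolioCache(object):
    def __init__(self, loader):
        self._loader = loader
        self._state = None
        self._dirty = True

    def invalidate(self):
        # Called after a fill or rebalance so the next read refreshes the state.
        self._dirty = True

    def state(self):
        if self._dirty:
            self._state = self._loader()   # the expensive regeneration happens here only
            self._dirty = False
        return self._state

# Usage: cache = PortfolioCache(load_positions_from_db); positions = cache.state()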

However, caching is not without its own issues. Regeneration of cache data all at once, due to the volatile nature of
cache storage, can place significant demand on infrastructure. Another issue is dog-piling, where multiple
generations of a new cache copy are carried out under extremely high load, which leads to cascading failure.

Dynamic memory allocation is an expensive operation in software execution. Thus it is imperative for higher
performance trading applications to be well aware of how memory is being allocated and deallocated during program
flow. Newer languages such as Java, C# and Python all perform automatic garbage collection, which refers
to deallocation of dynamically allocated memory when objects go out of scope.

Garbage collection is extremely useful during development as it reduces errors and aids readability. However, it is

often sub-optimal for certain high frequency trading strategies. Custom garbage collection is often desired for these

cases. In Java, for instance, by tuning the garbage collector and heap configuration, it is possible to obtain high

performance for HFT strategies.

C++ doesn't provide a native garbage collector and so it is necessary to handle all memory allocation/deallocation

as part of an object's implementation. While possibly error prone (potentially leading to dangling pointers) it is

extremely useful to have fine-grained control of how objects appear on the heap for certain applications. When

choosing a language make sure to study how the garbage collector works and whether it can be modified to optimise

for a particular use case.

Many operations in algorithmic trading systems are amenable to parallelisation. This refers to the concept of carrying
out multiple programmatic operations at the same time, i.e. in "parallel".

So-called "embarrassingly parallel" algorithms include steps that can be computed fully independently of other steps.
Certain statistical operations, such as Monte Carlo simulations, are a good example of embarrassingly parallel
algorithms, as each random draw and subsequent path operation can be computed without knowledge of other paths.
Other algorithms are only partially parallelisable. Fluid dynamics simulations are such an example, where the
domain of computation can be subdivided, but ultimately these domains must communicate with each other and thus
the operations are partially sequential. Parallelisable algorithms are subject to Amdahl's Law, which provides a
theoretical upper limit on the performance increase of a parallelised algorithm when run across N separate processes
(e.g. CPU cores or threads).
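Amdahl's Law itself takes only a few lines to state and evaluate. As a sketch, assuming 90% of a backtest's run-time is parallelisable (an illustrative figure):

# Amdahl's Law: speed-up with N processes when a fraction p of the work is
# parallelisable: S(N) = 1 / ((1 - p) + p / N)
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 4, 8, 64):
    print("N = {:>2}: speed-up = {:.2f}x".format(n, amdahl_speedup(0.9, n)))

# Even with 64 cores the speed-up is capped near 1 / (1 - 0.9) = 10x,
# because the sequential 10% of the work cannot be parallelised away.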

Parallelisation has become increasingly important as a means of optimisation since processor clock-speeds have

stagnated. Newer processors contain many cores with which to perform parallel calculations.

The rise of consumer graphics hardware (predominantly for video games) has led to the development of Graphical
Processing Units (GPUs), which contain hundreds of "cores" for highly concurrent operations. Such GPUs are now
very affordable. High-level frameworks, such as Nvidia's CUDA, have led to widespread adoption in academia and
finance.

Such GPU hardware is generally only suitable for the research aspect of quantitative finance, whereas other more

specialised hardware (including Field-Programmable Gate Arrays - FPGAs) are used for (U)HFT.

Nowadays, most modern languages support a degree of concurrency/multithreading. Thus it is straightforward to
optimise a backtester, since all calculations are generally independent of the others.
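For instance, a parameter sweep of independent backtests can be farmed out to a process pool with the standard library alone; the backtest function below is a placeholder for a real one:

# Sketch: running independent backtests over a parameter grid in parallel.
# 'run_backtest' is a stand-in returning a dummy performance figure.
from multiprocessing import Pool
import itertools

def run_backtest(params):
    fast, slow = params
    # ... load data, generate signals, compute strategy returns here ...
    return {"fast": fast, "slow": slow, "sharpe": 0.0}   # placeholder result

if __name__ == "__main__":
    grid = list(itertools.product([10, 20, 50], [100, 150, 200]))
    with Pool(processes=4) as pool:
        results = pool.map(run_backtest, grid)
    best = max(results, key=lambda r: r["sharpe"])
    print("Best parameters:", best)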

Scaling in software engineering and operations refers to the ability of the system to handle consistently increasing

loads in the form of greater requests, higher processor usage and more memory allocation.

In algorithmic trading a strategy is able to scale if it can accept larger quantities of capital and still produce consistent

returns. The trading technology stack scales if it can endure larger trade volumes and increased latency, without

bottlenecking.

While systems must be designed to scale, it is often hard to predict beforehand where a bottleneck will occur.

Rigorous logging, testing, profiling and monitoring will aid greatly in allowing a system to scale. Languages
themselves are often described as "unscalable". This is usually the result of misinformation, rather than hard fact.
It is the total technology stack that should be ascertained for scalability, not the language. Clearly certain languages

have greater performance than others in particular use cases, but one language is never "better" than another in every

sense.

One means of managing scale is to separate concerns, as stated above.

In order to further introduce the ability to handle "spikes" in the system (i.e. sudden volatility which triggers a raft of

trades), it is useful to create a "message queuing architecture". This simply means placing a message queue

system between components so that orders are "stacked up" if a certain component is unable to process many

requests.

Rather than requests being lost, they are simply kept in the queue until the message is handled. This is particularly useful
for sending trades to an execution engine.

If the engine is suffering under heavy latency then it will back up trades. A queue between the trade signal generator

and the execution API will alleviate this issue at the expense of potential trade slippage. A well-respected open source

message queue broker is RabbitMQ.
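As a sketch of what queuing an order looks like from Python, using the pika client for RabbitMQ (the broker location, queue name and message layout are all illustrative assumptions):

# Sketch: buffering an order through a RabbitMQ queue via the pika client.
# Assumes a broker on localhost; queue name and message fields are illustrative.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="orders", durable=True)

order = {"symbol": "SPY", "action": "SELL", "quantity": 100}
channel.basic_publish(
    exchange="",
    routing_key="orders",
    body=json.dumps(order),
    properties=pika.BasicProperties(delivery_mode=2),   # persist the message
)
connection.close()
# A separate execution process consumes from "orders" at its own pace.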

Hardware and Operating Systems

The hardware running your strategy can have a significant impact on the profitability of your algorithm.

This is not an issue restricted to high frequency traders either. A poor choice in hardware and operating system can
lead to a machine crash or reboot at the most inopportune moment. Thus it is necessary to consider where your
application will reside. The choice is generally between a personal desktop machine, a remote server, a "cloud"
provider or an exchange co-located server.

Desktop machines are simple to install and administer, especially with newer user friendly operating systems such as

Windows 7/8, Mac OS X and Ubuntu. Desktop systems do possess some significant drawbacks, however.

The foremost is that the versions of operating systems designed for desktop machines are likely to require
reboots/patching (and often at the worst of times!). They also use up more computational resources by virtue of
requiring a graphical user interface (GUI).


Utilising hardware in a home (or local office) environment can lead to internet connectivity and power uptime

problems. The main benefit of a desktop system is that significant computational horsepower can be purchased for

the fraction of the cost of a remote dedicated server (or cloud based system) of comparable speed.

A dedicated server or cloud-based machine, while often more expensive than a desktop option, allows for more

significant redundancy infrastructure, such as automated data backups, the ability to more straightforwardly ensure

uptime and remote monitoring. They are harder to administer since they require the ability to use remote login

capabilities of the operating system.

In Windows this is generally via the GUI Remote Desktop Protocol (RDP). In Unix-based systems the command-line
Secure SHell (SSH) is used. Unix-based server infrastructure is almost always command-line based, which
immediately renders GUI-based programming tools (such as MatLab or Excel) unusable.

A co-located server, as the phrase is used in the capital markets, is simply a dedicated server that resides within an

exchange in order to reduce latency of the trading algorithm. This is absolutely necessary for certain high frequency

trading strategies, which rely on low latency in order to generate alpha.

The final aspect to hardware choice and the choice of programming language is platform-independence. Is there a

need for the code to run across multiple different operating systems? Is the code designed to be run on a

particular type of processor architecture, such as the Intel x86/x64 or will it be possible to execute on RISC

processors such as those manufactured by ARM? These issues will be highly dependent upon the frequency and type

of strategy being implemented.

Resilience and Testing

One of the best ways to lose a lot of money on algorithmic trading is to create a system with no resiliency.

This refers to the durability of the system when subject to rare events, such as brokerage bankruptcies, sudden excess
volatility, region-wide downtime for a cloud server provider or the accidental deletion of an entire trading database.

Years of profits can be eliminated within seconds with a poorly-designed architecture. It is absolutely essential to
consider issues such as debugging, testing, logging, backups, high-availability and monitoring as core components of
your system.
It is likely that in any reasonably complicated custom quantitative trading application at least 50% of development time

will be spent on debugging, testing and maintenance.

Nearly all programming languages either ship with an associated debugger or possess well-respected third-party
alternatives. In essence, a debugger allows execution of a program with insertion of arbitrary break points in the code
path, which temporarily halt execution in order to investigate the state of the system. The main benefit of debugging is
that it is possible to investigate the behaviour of code prior to a known crash point.

Debuggers are an essential component in the toolbox for analysing programming errors. However, they are more
widely used in compiled languages such as C++ or Java, as interpreted languages such as Python are often easier to
debug due to fewer LOC and less verbose statements.

Despite this tendency Python does ship with the pdb, which is a sophisticated debugging tool. The Microsoft Visual

C++ IDE possesses extensive GUI debugging utilities, while for the command line Linux C++ programmer, the gdb

debugger exists.

Testing in software development refers to the process of applying known parameters and results to specific

functions, methods and objects within a codebase, in order to simulate behaviour and evaluate multiple code-paths,

helping to ensure that a system behaves as it should.

A more recent paradigm is known as Test Driven Development (TDD), where test code is developed against a

specified interface with no implementation. Prior to the completion of the actual codebase all tests will fail. As code is

written to "fill in the blanks", the tests will eventually all pass, at which point development should cease.

TDD requires extensive upfront specification design as well as a healthy degree of discipline in order to carry out

successfully. In C++, Boost provides a unit testing framework. In Java, the JUnit library exists to fulfill the same

purpose. Python also has the unittest module as part of the standard library. Many other languages possess unit

testing frameworks and often there are multiple options.
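As a small sketch of the idea, here is a unittest case for a hypothetical fixed-fractional position sizing function (both the function and the figures are illustrative):

# Sketch: unit-testing a (hypothetical) fixed-fractional position sizing function.
import unittest

def position_size(equity, risk_fraction, stop_distance):
    # Shares such that the loss at the stop equals equity * risk_fraction.
    if stop_distance <= 0:
        raise ValueError("stop_distance must be positive")
    return int((equity * risk_fraction) / stop_distance)

class TestPositionSize(unittest.TestCase):
    def test_basic_sizing(self):
        # Risking 1% of 100,000 with a 2.00 stop should give 500 shares.
        self.assertEqual(position_size(100000.0, 0.01, 2.0), 500)

    def test_invalid_stop(self):
        with self.assertRaises(ValueError):
            position_size(100000.0, 0.01, 0.0)

if __name__ == "__main__":
    unittest.main()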

In a production environment, sophisticated logging is absolutely essential. Logging refers to the process of

outputting messages, with various degrees of severity, regarding execution behaviour of a system to a flat file or

database.
Logs are a "first line of attack" when hunting for unexpected program runtime behaviour. Unfortunately the

shortcomings of a logging system tend only to be discovered after the fact! As with backups discussed below, a

logging system should be given due consideration BEFORE a system is designed.

Both Microsoft Windows and Linux come with extensive system logging capability and programming languages tend

to ship with standard logging libraries that cover most use cases. It is often wise to centralise logging information in

order to analyse it at a later date, since it can often lead to ideas about improving performance or error reduction,

which will almost certainly have a positive impact on your trading returns.
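In Python, for example, the standard library logging module covers the common cases; a minimal configuration (file name and format string are illustrative choices) might be:

# Sketch: standard-library logging to a flat file with severity levels.
import logging

logging.basicConfig(
    filename="trading_system.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("execution")

log.info("Order submitted: BUY 100 SPY @ MKT")
log.warning("Fill latency above threshold")
try:
    raise ConnectionError("broker gateway unreachable")
except ConnectionError:
    log.exception("Order routing failed")   # records the full traceback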

While logging of a system will provide information about what has transpired in the past, monitoring of an

application will provide insight into what is happening right now. All aspects of the system should be considered for

monitoring. System level metrics such as disk usage, available memory, network bandwidth and CPU usage provide

basic load information.

Trading metrics such as abnormal prices/volume, sudden rapid drawdowns and account exposure for different
sectors/markets should also be continuously monitored. Further, a threshold system should be instituted that provides
notification when certain metrics are breached, elevating the notification method (email, SMS, automated phone call)
depending upon the severity of the metric.

System monitoring is often the domain of the system administrator or operations manager. However, as a sole

trading developer, these metrics must be established as part of the larger design. Many solutions for monitoring exist:

proprietary, hosted and open source, which allow extensive customisation of metrics for a particular use case.

Backups and high availability should be prime concerns of a trading system. Consider the following two questions:

1) If an entire production database of market data and trading history was deleted (without backups) how would the

research and execution algorithm be affected? 2) If the trading system suffers an outage for an extended period (with

open positions) how would account equity and ongoing profitability be affected? The answers to both of these

questions are often sobering!

It is imperative to put in place a system for backing up data and also for testing the restoration of such data.
Many individuals do not test a restore strategy. If recovery from a crash has not been tested in a safe environment,

what guarantees exist that restoration will be available at the worst possible moment?

Similarly, high availability needs to be "baked in from the start". Redundant infrastructure (even at additional expense)

must always be considered, as the cost of downtime is likely to far outweigh the ongoing maintenance cost of such

systems. I won't delve too deeply into this topic as it is a large area, but make sure it is one of the first considerations

given to your trading system.

Choosing a Language

Considerable detail has now been provided on the various factors that arise when developing a custom high-
performance algorithmic trading system. The next stage is to discuss how programming languages are generally
categorised.

Type Systems

When choosing a language for a trading stack it is necessary to consider the type system. The languages which are
of interest for algorithmic trading are either statically- or dynamically-typed.

A statically-typed language performs checks of the types (e.g. integers, floats, custom classes etc) during the
compilation process. Such languages include C++ and Java. A dynamically-typed language performs the majority of
its type-checking at runtime. Such languages include Python, Perl and JavaScript.

For a highly numerical system such as an algorithmic trading engine, type-checking at compile time can be extremely

beneficial, as it can eliminate many bugs that would otherwise lead to numerical errors.

However, type-checking doesn't catch everything, and this is where exception handling comes in, since unexpected
operations must still be handled at run-time.

'Dynamic' languages (i.e. those that are dynamically-typed) can often lead to run-time errors that would otherwise be

caught with a compilation-time type-check. For this reason, the concept of TDD (see above) and unit testing arose

which, when carried out correctly, often provides more safety than compile-time checking alone.

Another benefit of statically-typed languages is that the compiler is able to make many optimisations that are
otherwise unavailable to the dynamically-typed language, simply because the type (and thus memory requirements)
are known at compile-time.


In fact, part of the inefficiency of many dynamically-typed languages stems from the fact that certain objects must be
type-inspected at run-time and this carries a performance hit. Libraries for dynamic languages, such as NumPy/SciPy,
alleviate this issue by enforcing a single type within arrays.
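The effect is easy to see in a toy comparison - the figures will vary by machine, but the typed, vectorised operation dispatches once over the whole buffer instead of type-checking every element:

# Sketch: run-time type inspection (Python loop) versus a typed NumPy array.
import timeit
import numpy as np

prices = list(range(1000000))                        # generic Python objects
prices_np = np.arange(1000000, dtype=np.float64)     # single enforced dtype

loop_time = timeit.timeit(lambda: sum(p * 1.01 for p in prices), number=5)
vec_time = timeit.timeit(lambda: prices_np * 1.01, number=5)

print("Python loop: {:.3f}s, NumPy vectorised: {:.3f}s".format(loop_time, vec_time))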

Open Source or Proprietary?

One of the biggest choices available to an algorithmic trading developer is whether to use proprietary (commercial) or

open source technologies.

There are advantages and disadvantages to both approaches. It is necessary to consider how well a language is

supported, the activity of the community surrounding a language, ease of installation and maintenance, quality of the

documentation and any licensing/maintenance costs.

The Microsoft .NET stack (including Visual C++, Visual C#) and MathWorks' MatLab are two of the larger

proprietary choices for developing custom algorithmic trading software. Both tools have had significant "battle testing"

in the financial space, with the former making up the predominant software stack for investment banking trading

infrastructure and the latter being heavily used for quantitative trading research within investment funds.

Microsoft and MathWorks both provide extensive high quality documentation for their products. Further, the

communities surrounding each tool are very large with active web forums for both. The .NET software allows cohesive

integration with multiple languages such as C++, C# and VB, as well as easy linkage to other Microsoft products such

as the SQL Server database via LINQ. MatLab also has many plugins/libraries (some free, some commercial) for

nearly any quantitative research domain.

There are also drawbacks. With either piece of software the costs are not insignificant for a lone trader (although
Microsoft does provide entry-level versions of Visual Studio for free). Microsoft tools "play well" with each other, but
integrate less well with external code. Visual Studio must also be executed on Microsoft Windows, which is arguably
far less performant than an equivalent Linux server which is optimally tuned.

MatLab also lacks a few key plugins such as a good wrapper around the Interactive Brokers API, one of the few

brokers amenable to high-performance algorithmic trading. The main issue with proprietary products is the lack of

availability of the source code. This means that if ultra performance is truly required, both of these tools will be far less

attractive.
Open source tools have been industry grade for some time. Much of the alternative asset space makes extensive
use of open-source Linux, MySQL/PostgreSQL, Python, R, C++ and Java in high-performance production roles.

However, they are far from restricted to this domain. Python and R, in particular, contain a wealth of extensive

numerical libraries for performing nearly any type of data analysis imaginable, often at execution speeds comparable

to compiled languages, with certain caveats.

The main benefit of using interpreted languages is the speed of development time. Python and R require far
fewer lines of code (LOC) to achieve similar functionality, principally due to the extensive libraries. Further, they often
allow interactive console based development, substantially shortening the iterative development cycle.

Given that time as a developer is extremely valuable, and execution speed often less so (unless in the HFT space), it

is worth giving extensive consideration to an open source technology stack. Python and R possess significant

development communities and are extremely well supported, due to their popularity. Documentation is excellent and

bugs (at least for core libraries) remain scarce.

Open source tools often suffer from a lack of a dedicated commercial support contract and run optimally on

systems with less-forgiving user interfaces. A typical Linux server (such as Ubuntu) will often be fully command-line

oriented. In addition, Python and R can be slow for certain execution tasks. There are mechanisms for integrating with

C++ in order to improve execution speeds, but it requires some experience in multi-language programming.

While proprietary software is not immune from dependency/versioning issues it is far less common to have to deal

with incorrect library versions in such environments. Open source operating systems such as Linux can be trickier to

administer.

I will venture my personal opinion here and state that I build all of my trading tools with open source
technologies. In particular I use: Ubuntu, MySQL, Python, C++ and R. The maturity, community size, ability to "dig
deep" if problems occur and lower total cost of ownership (TCO) far outweigh the simplicity of proprietary GUIs and
easier installations. Having said that, Microsoft Visual Studio (especially for C++) is a fantastic Integrated
Development Environment (IDE) which I would also highly recommend.

Batteries Included
The header of this section refers to the "out of the box" capabilities of the language - what libraries does it contain and

how good are they?

This is where mature languages have an advantage over newer variants. C++, Java and Python all now possess

extensive libraries for network programming, HTTP, operating system interaction, GUIs, regular expressions (regex),

iteration and basic algorithms.

C++ is famed for its Standard Template Library (STL) which contains a wealth of high performance data structures

and algorithms "for free". Python is known for being able to communicate with nearly any other type of system/protocol

(especially the web), mostly through its own standard library. R has a wealth of statistical and econometric tools built

in, while Matlab is extremely optimised for any numerical linear algebra code (which can be found in portfolio

optimisation and derivatives pricing, for instance).

Outside of the standard libraries, C++ makes use of the Boost library, which fills in the "missing parts" of the standard

library. In fact, many parts of Boost made it into the TR1 standard and subsequently are available in the C++11 spec,

including native support for lambda expressions and concurrency.

Python has the high performance NumPy/SciPy/Pandas data analysis library combination, which has gained

widespread acceptance for algorithmic trading research. Further, high-performance plugins exist for access to the

main relational databases, such as MySQL++ (MySQL/C++), JDBC (Java/MatLab), MySQLdb (MySQL/Python) and

psycopg2 (PostgreSQL/Python). Python can even communicate with R via the RPy plugin!
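As a brief sketch of that database connectivity, the snippet below pulls daily bars from a PostgreSQL securities master into pandas via psycopg2; the connection details, table and column names are all hypothetical:

# Sketch: reading daily bars from PostgreSQL into pandas via psycopg2.
# Connection details, table and column names are hypothetical.
import pandas as pd
import psycopg2

conn = psycopg2.connect(
    host="localhost", dbname="securities_master",
    user="quant", password="secret",
)
query = """
    SELECT price_date, close_price
    FROM daily_bars
    WHERE ticker = %s
    ORDER BY price_date
"""
bars = pd.read_sql_query(query, conn, params=("SPY",), index_col="price_date")
conn.close()

print(bars.tail())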

An often overlooked aspect of a trading system while in the initial research and design stage is the connectivity to a

broker API. Most APIs natively support C++ and Java, but some also support C# and Python, either directly or with

community-provided wrapper code to the C++ APIs. In particular, Interactive Brokers can be connected to via the

IBPy plugin. If high-performance is required, brokerages will support the FIX protocol.

Conclusion

As is now evident, the choice of programming language(s) for an algorithmic trading system is not straightforward and

requires deep thought.


The main considerations are performance, ease of development, resiliency and testing, separation of concerns,

familiarity, maintenance, source code availability, licensing costs and maturity of libraries.

The benefit of a separated architecture is that it allows languages to be "plugged in" for different aspects of a trading

stack, as and when requirements change.

Remember that a trading system is an evolving tool and it is likely that any language choices will evolve along with it.

In the next email lesson we will consider how to choose a platform for backtesting and automated execution.
