
Improve Your Betting

SMARTERSIG

SmarterSig August 2009
CONTENTS

4 Developing Systems Ricky Taylor

8 Developing Statistical Models Using R Part III Alun Owen

18 How Reliable are Breeding Stats Mark Foley

20 Multiple Mugs Mark Littlewood

22 Market Bias in 5f Handicaps Dave Renham

25 Educating Trevor Mark Littlewood



EDITORIAL

Hi and welcome to the August edition of SmarterSig.

By the time you read this, Goodwood will be fading from our memory and the York Ebor meeting will be
looming up. It is probably my favourite meeting of the year, along with Royal Ascot, and one which I usually
attend for all three days, or four as it is now. If any SmarterSig members are also attending the York
meeting and fancy meeting up for a drink later in the evening then contact me or post to the email list. One or
two of us have done this in the past and some fruitful evenings have been had, putting faces to
familiar names.

Just a word about the KISS tips on the web site, which relate to a previous article. They got off to a
storming start in the first month and then hit a lull. Things have however picked up and, within the most
profitable area identified in the article, reasonable profit has occurred. Overall they are performing below
the historical worst year, which gives hope that, should things balance out in the future, the profitable area
identified can become even more profitable than it is at the moment.

All the best, hoping you have a good August.

DEVELOPING SYSTEMS

By Ricky Taylor

It is great fun to develop your own betting systems but it can be very time-consuming if you don't have
the aid of a computer. In the past, before electronic formbooks became readily available, I used to
develop my systems from newspapers. I had a large wardrobe that was stuffed from top to bottom with
old, yellow, dust mite infested back issues of the Racing Post. When I had an idea for a system I would
then test the system by wading through all my back issues, recording all the results. This was painstaking
work. I'm glad those days are gone, and my wife is probably even more relieved!

With the advent of computer databases it is possible to develop and test systems in a matter of minutes,
and if you are really serious about betting systems then you really do need a computer and need to buy
one of the many excellent databases that are available. It is also ideal if you have some knowledge of
statistics and some computer programming skills. I'm fortunate in that I have studied statistical and
research methods and I've trained myself to use some fairly sophisticated statistical software programs.
However, you have probably worked out by now that I'm a bit of a systems anorak, and most normal
people wouldn't want to go to these lengths to develop betting systems. In this article I therefore want to
give you a few short cuts to developing robust and reliable systems. You will then avoid some of my early
and costly mistakes.

Data
The first thing you need to do when developing betting systems is to get hold of as much data as possible.
In the modern world this is fairly easy. There is a tonne of data on the Internet and you can purchase
electronic formbooks that allow you to download past racing results into spreadsheet packages for further
analysis. There are also menu-driven software packages that allow you to develop and test systems
without having to bother to learn how to program a computer. Some of this stuff isn't cheap, but data and
software are the fundamental tools of the trade. I look at systems development as a business and all
businesses have set-up costs. The purchase of a computer, data and software are your set-up costs. Once
you have developed your first profitable system you can soon recoup your investment!

I wouldn't try to skimp on paying for data. You need samples that run into tens of thousands in order to
generate genuinely valid and reliable systems. A computer with plenty of memory and processing power is
therefore essential. I once made the mistake of buying a cheap computer that was under-powered for
processing the size of database that I had put together. It took the machine an age to process and it wasn't
long before the useless thing burnt out!



System ideas
Once you have your data in place you can then start to develop some ideas for a system. Nick Mordin
has written a lot about this. The most relevant is probably his Winning Without Thinking. The key point
that he makes in this text, and one that I very much agree with, is that you have to be original when
developing systems. If you simply re-invent systems that have been used before then you are unlikely to
make a profit, because if it is a genuinely good system then the factors on which it is based will already
be incorporated into the betting odds. What you should be striving for is an angle that no one has really
researched before, and which is probably not reflected in the betting odds. In this instance you are much more
likely to be able to make a long-term profit. This doesn't necessarily mean that you need to develop very
complex systems. I would always advise that you should keep it simple and focus on horse race
fundamentals, namely the horse's ability, jockey, trainer, pedigree, fitness and consistency, and the
betting market.

Testing your ideas
Once you have your data and you have some ideas for a system then you can start testing them against
the available data. This all sounds very logical but in reality there is a certain amount of circularity in
developing systems. For instance, you may have an idea but cannot find the data on which to test it, and
so you can't validate the system. Your system ideas therefore need to be framed around the available
data. The richer your data and the more variables it contains, the more system ideas you will be able to
test.

Once you have an idea and the data to test it you can then work out whether it is profitable or not. In the
test phase you may find that your system isn't profitable, and so you may want to do further research on
the system's variables to find the best combination, or add other variables to improve the results, and then
test it again. However, one of the early mistakes that I made, when developing a system in this way,
was to develop a system on past results and then to implement it straight away.

The system I had developed looked great on the basis of past results. I was so excited by its past
performance that I implemented it immediately, and looked forward to buying that new car and treating
the family to a nice holiday. Unfortunately the system didn't work so well when I played it for real stakes.
This left me perplexed. Why didn't the system work in real time? After a lot of reflection I realised that
what I had actually done was to back-fit a system to past results.

In a back-fitted system, selection rules are manipulated to account for a sample of previous results. For
instance, if the system developer finds that his or her system picks a loser, the rules are then changed
slightly to eliminate this selection. Similarly, some rules are changed to accommodate a long-priced
winner. This process is repeated until the system produces a respectable number of winners and a decent
level of profitability on past results. This is what I did when I developed my early systems. I was using the
same sample of data to build my system and to refine my ideas, and using that same data to test its
validity. What I should have done was use what is called a split-half sample.

A split-half sample is exactly what it says on the tin. It is a sample that has been split into two halves. In
one half of the data you develop your system. You can then play around with the variables to your heart's
content until you have something that looks to be sensible and profitable. At this stage the system is a
pure back-fitted system. In order to test it properly you then run the system against the unseen data in
the other half of the sample. This is what is called the validation sample. If the system shows a profit over
both samples, and provided that the samples are large enough, then you can be fairly confident that you
have found a genuinely profitable system that will work when you play it for real.
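If you are comfortable with a little code, the split-half idea is easy to automate. Below is a minimal sketch in R (the free package used in Alun's modelling series elsewhere in this issue); the file name, the column names and the example rule are all hypothetical placeholders which you would replace with your own data and selection criteria.

# Minimal split-half sketch: develop rules on one half, validate on the other.
# Assumes a CSV of past runners with (hypothetical) columns: odds (decimal), won (1/0),
# plus whatever columns your candidate rules use.
results <- read.csv("myresults.csv")

set.seed(42)                                  # so the random split is repeatable
n <- nrow(results)
dev.rows <- sample(1:n, size = floor(n/2))
development <- results[dev.rows, ]            # tune your rules on this half
validation <- results[-dev.rows, ]            # touch this half only once, at the end

# A hypothetical system rule - replace with your own selection criteria
system.rule <- function(d) d$days.since.run <= 30 & d$last.position == 1

# Level-stakes profit of the rule on a set of results (1pt win bets at decimal odds)
profit <- function(d) {
  picks <- d[system.rule(d), ]
  sum(ifelse(picks$won == 1, picks$odds - 1, -1))
}
profit(development)   # the back-fitted figure - treat with suspicion
profit(validation)    # the figure that really matters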

When looking at commercially available systems, the sign of a back-fitted system is a long list of
complicated selection criteria which often appear illogical. They tend not to work when applied in real
time because they are not based on sensible, proven form factors.

Racing logic
This brings me to my next point about betting systems. Regardless of any other consideration, they must
conform to racing logic. It is vital that the variables used in a system are sensible. I once stood in a
betting shop in open-mouthed amazement when a fellow punter punched the air to celebrate a winner,
turned to me and said "First letter R system. Never fails." I think most right thinking people would agree
that horses with a name beginning with the letter R are no more likely to win than a horse with a name
beginning with any other letter of the alphabet. There is no logic to the system. It is better to stick to
variables that most racing professionals would agree are important to finding winners. You can then
combine these variables in unique ways.

If you are really imaginative you might be able to develop a variable that measures a form concept in an
original way. This can be really profitable because you will be the only person using the variable. For
example, the American racing guru Andy Beyer made a fortune in the 1970s and early 1980s by assessing
horses' ability using speed figures, because he had found a unique way of calculating them.

Objective variables
When developing systems you should only use quantitative variables. In other words only use factors that
can be measured in a consistent and objective manner. Avoid at all costs systems that rely on qualitative
information. I know of many systems that use qualitative data on horses looks. This is based on sound
racing logic in that good looking horses are usually the most able because they have a good physique, and
a high level of fitness. The problem is that what one-person judges too be a good looking horse another
person will disagree. As an example I remember reading in Pat Taaffees excellent autobiography My Life
and Arkles a story about Lord Bicester who always purchased what he considered to be very good looking
horses. When his horse Royal Approach won the Irish Grand National a former leading jockey patted the
Lord on the back to congratulate him and said "Good horse, but not much of a looker is he?", to which the
Lord replied "When was the last time you looked in the mirror!" It is therefore better to steer clear of
opinion and focus only on those variables that you can measure objectively. This means using only hard
numerical data.

Look for consistent profits
A number of so-called profitable systems do not record consistent levels of profit, year in, year out.
They often record a lucky year in which they make an extraordinary profit, which disguises the fact that in
a normal year the system makes a loss. When testing systems it is therefore important to break down the
results by year and note the level of profit recorded in each year. I am always much more confident of a
system that has shown a consistent profit over time. I'm particularly interested in whether the system
shows a profit across years in its validation sample. If you find a system that shows a consistent profit in
its validation sample, then get your betting boots on!

Conclusion
In this article I have tried to offer a few pointers to developing betting systems. For those of you that like
to try your hand at systems development I hope they prove useful and profitable.


Ricky Taylor is the author of Pace Wins the Race, published by Sportsworld Publishing in 2006, and Profitable
Betting System for Horseracing, published by High Stakes in 2008.


Developing Statistical Models of Horse Racing Outcomes Using R

Alun Owen
(email: OWEN.A3@sky.com)


Part III: Developing a Binary Logistic Regression Model

This article is the third and final part of a series of workshops in which we are looking at using the free
statistical software R to develop a model of horse racing outcomes. Parts I and II, which were published
in the June and July 2009 editions of the magazine, looked at how to obtain and install R, and introduced
R commands to examine and plot your data. We now move on to the stage where we will attempt to
develop a binary logistic regression model. No significant knowledge of statistics or mathematics is
assumed or required, apart from a good degree of numeracy. It is assumed, however, that you have read
Parts I and II in this series and are now familiar with the use of R.

The data we are using comes from the Flat Turf Handicaps in 2004 and is available as a comma separated
file named aiplus2004.csv, in the utilities section of the SmarterSig web site. This data set includes the
following data for each horse in approximately 1,780 races:
- position3 finishing position three races ago (1, 2, 3 or 4, 0 = anywhere else)
- position2 finishing position two races ago (1, 2, 3 or 4, 0 = anywhere else)
- position1 finishing position in the previous race (1, 2, 3 or 4, 0 = anywhere else)
- days days since last race
- sireSR - win percentage achieved by the horse's Sire with its offspring prior to this race
- position - finishing position in this race
Each row in the data set refers to one horse in one race, and in total the file contains data on over 22,000 runners!

If you have not already done so, go to http://www.r-project.org to download and install R, then download
the file named aiplus2004.csv from the SmarterSig website and have a look at the data by opening it
in Excel.

Start R and in the R console window type the following command to open our data set:
horse.data<-read.csv("C:/Directory Name/aiplus2004.csv")

Recall that C:/Directory Name/ needs replacing with the actual sub-directory path you want R to point
to in order to look for the file named aiplus2004.csv (note that R expects forward slashes in file paths,
even on Windows). If you have previously set the default directory that R points to and the data file is in
this directory, then all you need to type is:
horse.data<-read.csv("aiplus2004.csv")

Recall that this creates a data frame, which we have called horse.data.
Type the following command to make the columns of horse.data visible to R, so that we can refer to
them using the column name, such as position.

attach(horse.data)

In Part II we decided that a suitable model would be better if we restricted the data to sireSR values of at
most 18% and days values of at most 50 and greater than zero. Therefore we will create a reduced data set
called horse.data.reduced using the subset command in R as follows:
horse.data.reduced<-subset(horse.data, sireSR<=18 & days<=50 & days>0)
attach(horse.data.reduced)

Recall that the information on the outcome of each race is contained in the column vector named
position, which indicates the position the horse finished in each race. We are interested in modeling the
percentage chance of a horse winning a race, using the horse's finishing position in each of its previous
three races (position1, position2 and position3), its sire strike rate (sireSR) and the number of days
since the horse last ran (days), as predictor variables.
Type the following commands in R to create a new column vector which indicates whether the horse in
each row of our data won that particular race (with a 1) or not (with a 0):
n<-length(position)
win<-rep(0,n)
win[position==1]<-1
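As a quick optional sanity check at this point, you can confirm that the new win indicator looks sensible before going any further:
table(win)    # counts of losers (0) and winners (1) in the reduced data set
mean(win)     # overall proportion of winners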

We saw in Part II that the variables position1, position2, position3, sireSR and days all seemed to be
related to the percentage chances of a horse winning. This was illustrated through two-way tables of win
against position1, position2 and position3 and plots of win against sireSR and days.

Since we are now working with a reduced data set, the commands to obtain the two-way tables of win
against position1, position2 and position3, based on our reduced data set, are reproduced below,
together with the resulting output:
position1.win<-table(position1,win)
prop.table(position1.win,1)

win
position1 0 1
0 0.94199354 0.05800646
1 0.85190280 0.14809720
2 0.88188559 0.11811441
3 0.88692390 0.11307610
4 0.91239669 0.08760331



position2.win<-table(position2,win)
prop.table(position2.win,1)


win
position2 0 1
0 0.93278631 0.06721369
1 0.88107848 0.11892152
2 0.88638743 0.11361257
3 0.90398254 0.09601746
4 0.91243243 0.08756757


position3.win<-table(position3,win)
prop.table(position3.win,1)

win
position3 0 1
0 0.92912268 0.07087732
1 0.88776004 0.11223996
2 0.89628078 0.10371922
3 0.90329552 0.09670448
4 0.92140921 0.07859079


Recall that the above output shows how a better finishing position in any of the three previous races is
associated with an increase in the proportion of wins in the current race.

The commands to obtain the plots of win percentage against both sireSR and days for the reduced data
set are also shown below, along with the resulting plots (recall that the first line below calculates a
rounded version of the sireSR variable, rounded to the nearest whole number):
sireSR.round<-round(sireSR)
sireSR.win<-table(sireSR.round,win)
sireSR.win.table <-prop.table(sireSR.win,1)
sireSR.win.prop<-sireSR.win.table[,2]
sireSR.labels<-rownames(sireSR.win.table)
sireSR.values<- as.numeric(sireSR.labels)
plot(sireSR.values,sireSR.win.prop, ylab="Proportion of Wins",xlab="Sire SR")


[Plot: Proportion of Wins against Sire SR, for the reduced data set]

days.win<-table(days,win)
days.win.table <-prop.table(days.win,1)
days.win.prop<-days.win.table[,2]
days.labels<-rownames(days.win.table)
days.values<- as.numeric(days.labels)
plot(days.values,days.win.prop,ylab="Proportion of Wins",xlab="Days Since Last
Run")
[Plot: Proportion of Wins against Days Since Last Run, for the reduced data set]


The plots above indicate that a higher sireSR or lower days value is associated with an increase in the
proportion of wins in the current race.

A binary logistic regression model would appear to be a suitable model worth investigating here for a
number of reasons. Firstly our outcome (win) is a binary variable in the sense that there are only two
possible values, 1 (if the horse won) or 0 (if it didn't). In addition we require, as an output from our model,
the percentage chance of a horse winning a race, and this is a natural output from such a logistic model.

The standard binary logistic regression model can be specified algebraically as follows:

log( p/(1-p) ) = β0 + β1·x1 + β2·x2 + ... + βp·xp

where p is the proportion of outcomes that we are interested in, which here is the proportion of wins. The
term p/(1-p) is therefore the odds, and log(p/(1-p)) is referred to as the log-odds or log of the odds.

The terms x1, x2, ..., xp represent the predictor variables, which here are sireSR, days, position1,
position2 and position3. However, note that the variables sireSR and days are continuous variables,
whereas the variables position1, position2 and position3 are what are referred to as categorical
variables. This is because sireSR and days are measured on a continuous scale, whereas the position a
horse finished in a previous race falls into one of five categories (first, second, third, fourth or unplaced).
Because of this, we need to treat the variables position1, position2 and position3 differently to
sireSR and days - more on this later.

The terms β1, β2, ..., βp represent model parameters that we multiply the values of the predictors by in
the model. The term β0 represents a constant parameter that is added to everything else in the model.

Okay, so how does the above standard binary logistic model relate to what we have? Let's start by
looking at a simple model which involves just sireSR as the only predictor of win percentage, i.e. for the
moment we will ignore the data on days, position1, position2 and position3. We can state our model
as follows:

log( p/(1-p) ) = β0 + β1·sireSR


This simply has a constant β0, plus a term for sireSR multiplied by the parameter β1. The
parameters β0 and β1 will be estimated by the software (R) using a model fitting technique called maximum
likelihood estimation (no details of the method are discussed here).

We can then fit the model by typing the following command (recall that we are using the sireSR data
rounded to the nearest whole number which is sireSR.round):
summary(glm(win~sireSR.round,binomial))


Part of the output obtained is:

Call:
glm(formula = win ~ sireSR.round, family = binomial)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.721224 0.078488 -34.67 < 2e-16 ***
sireSR.round 0.039469 0.009266 4.26 2.05e-05 ***


The most important parts of the output are the estimates of β0, which is the intercept and is
-2.721224, and β1, the parameter for sireSR.round, which is 0.039469.

Therefore the fitted model is:

log( p/(1-p) ) = -2.7212 + 0.0395·sireSR


Therefore, if a horse running in a current race has a sireSR value of 10%, the log-odds are given by:

log of the odds = log( p/(1-p) ) = -2.7212 + 0.0395 × 10 = -2.326


How do we then find the odds from this? Well, we simply calculate the exponential of this, which is
e^-2.326 and reads as "e raised to the power of minus 2.326". This might look hard to calculate, but e is
simply a special number in mathematics, like the number π. The number e is approximately 2.718, but
most calculators, as well as R and Microsoft Excel, have the number e in their fixed memory. Hence they
have the capacity to raise the number e to whatever power you require. We can calculate e^-2.326 by
pressing the following typical keys on your calculator:

e^x  -2.326  =

Or alternatively you can use the following command in R:
exp(-2.326)

Either way you should get the answer to be 0.098.

This means that fair odds of the horse winning would be 1 to 0.098, or 10.2:1

We can convert this to the probability of the horse winning using the formula:

p = odds / (1 + odds) = 0.098 / (1 + 0.098) = 0.089

Hence, we have used the model to say that a horse with a sireSR value of 10%, would win its next race
with a probability of 0.089, and so fair odds would be 10.2:1.
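The whole chain of calculations in this example can also be reproduced directly in R, which is a handy check on the arithmetic:
log.odds.10 <- -2.7212 + 0.0395*10    # log-odds for a sireSR of 10%, approximately -2.326
odds.10 <- exp(log.odds.10)           # the odds, approximately 0.098
prob.10 <- odds.10/(1+odds.10)        # the win probability, approximately 0.089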

We can obtain historical model-predicted odds and probabilities for the horses in our data set by using the
following commands:
model<-glm(win~sireSR.round,binomial)
log.odds<-predict(model)
odds<-exp(predict(model))
probs<-predict(model,type="response")

The variables log.odds, odds and probs, then contain the relevant values. Recall that you can view, say
just the first 10 of these values, by typing commands such as:
probs[1:10]

Okay, back to our full data set, where we want to build a model that also includes days, position1,
position2 and position3. We can state our model as follows:

log( p/(1-p) ) = β0 + β1·sireSR + β2·days
                + β11·pos11 + β12·pos12 + β13·pos13 + β14·pos14
                + β21·pos21 + β22·pos22 + β23·pos23 + β24·pos24
                + β31·pos31 + β32·pos32 + β33·pos33 + β34·pos34

This may at first look a little overwhelming, so let's look at it line by line:
The first line is simply the constant (intercept) term β0, plus terms for the sireSR and days
variables multiplied by the estimated model parameters β1 and β2.
The second line in the model above all relates to the variable position1 as follows:
pos11 is a dummy variable that = 1 if the value of position1 is 1, otherwise pos11 is zero;
pos12 is a dummy variable that = 1 if the value of position1 is 2, otherwise pos12 is zero;
pos13 is a dummy variable that = 1 if the value of position1 is 3, otherwise pos13 is zero;
pos14 is a dummy variable that = 1 if the value of position1 is 4, otherwise pos14 is zero.
So, for example, if a horse finished second in its previous race, it would have pos11 = 0,
pos12 = 1, pos13 = 0, pos14 = 0.
The third line in the model above relates to the variable position2 in a similar way, as does the
fourth line in relation to the variable position3.


Before we fit the model, we need to define the dummy variables pos11, pos12, pos13, pos14 for position1,
pos21, pos22, pos23, pos24 for position2 and pos31, pos32, pos33, pos34 for position3. This is easily done
by defining what are referred to as factor variables using the following commands in R:
pos1<-factor(position1)
pos2<-factor(position2)
pos3<-factor(position3)
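As an optional aside, if you want to see exactly how R will turn these factor variables into the dummy variables pos11 to pos14 (and so on) described above, you can inspect the levels and the design matrix it builds:
levels(pos1)                 # the five categories: "0" "1" "2" "3" "4"
head(model.matrix(~pos1))    # dummy columns pos11 to pos14 (level 0 acts as the baseline)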

We can then fit the model by typing:
summary(glm(win~sireSR.round+days+pos1+pos2+pos3,binomial))

Part of the output obtained is:

Call:
glm(formula = win ~ sireSR.round + days + pos1 + pos2 + pos3,
family = binomial)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.978040 0.098822 -30.135 < 2e-16 ***
sireSR.round 0.025855 0.009575 2.700 0.00693 **
days -0.011901 0.002713 -4.387 1.15e-05 ***
pos11 0.868040 0.076181 11.394 < 2e-16 ***
pos12 0.656480 0.083018 7.908 2.62e-15 ***
pos13 0.651979 0.084162 7.747 9.43e-15 ***
pos14 0.392159 0.092873 4.223 2.42e-05 ***
pos21 0.440784 0.080350 5.486 4.12e-08 ***
pos22 0.375065 0.083851 4.473 7.71e-06 ***
pos23 0.235901 0.089683 2.630 0.00853 **
pos24 0.178733 0.091808 1.947 0.05156 .
pos31 0.372110 0.080467 4.624 3.76e-06 ***
pos32 0.253462 0.085402 2.968 0.00300 **
pos33 0.218599 0.088042 2.483 0.01303 *
pos34 0.016654 0.095016 0.175 0.86086

Before we look at how we put these values into the model and use it for prediction, first note the asterisk
annotation at the end of each row in the table of estimated parameters. These indicate which of our
predictor variables have an effect on the output of our model (win percentage). Variables annotated with
*** or ** indicate variables where the effects of the predictor variables on win percentage are more clear
cut (more statistically significant, to use statistics jargon!). The only variables that do NOT have an asterisk
are pos24 and pos34, which would suggest that whether a horse finished fourth two races ago or three
races ago has little effect on the win percentage. However, the other finishing positions in these previous
races are important and so these variables need to be retained in the model. Therefore all the variables
seem to be important and will be retained in the model.

Note that the estimated value for the constant has changed to -2.9780 (was -2.7212) and the estimated
value for the parameter for sireSR.round has changed to 0.0259 (was 0.0395). The output also shows
values for the other parameters in the model. Note that the estimate for the days variable is the only one
which is negative.

The model can therefore be specified as:

log( p/(1-p) ) = -2.978 + 0.026·sireSR - 0.012·days
                + 0.868·pos11 + 0.656·pos12 + 0.652·pos13 + 0.392·pos14
                + 0.441·pos21 + 0.375·pos22 + 0.236·pos23 + 0.179·pos24
                + 0.372·pos31 + 0.253·pos32 + 0.219·pos33 + 0.017·pos34

To use the model for prediction, suppose we have a horse which has a sireSR value of 10%, last ran 5
days ago, finished 1st in its previous race, finished 3rd two races ago and was unplaced three races ago.

Since the horse finished 1st in its previous race it has values of pos11=1, pos12=0, pos13=0 and pos14=0.
Similarly, since the horse finished 3rd two races ago, it has values of pos21=0, pos22=0, pos23=1 and
pos24=0. And since the horse was unplaced three races ago, it has values of pos31=0, pos32=0, pos33=0
and pos34=0.

Hence the model predicted log-odds is given by:

log of the odds = log( p/(1-p) )
                = -2.978 + 0.026×10 - 0.012×5
                  + 0.868×1 + 0.656×0 + 0.652×0 + 0.392×0
                  + 0.441×0 + 0.375×0 + 0.236×1 + 0.179×0
                  + 0.372×0 + 0.253×0 + 0.219×0 + 0.017×0
                = -2.978 + 0.26 - 0.06 + 0.868 + 0.236
                = -1.674

We can therefore calculate the predicted odds by typing:
exp(-1.674)


The answer is 0.187, which means that fair odds of the horse winning would be 1 to 0.187, or 5.3:1.

We can again convert this to the probability of the horse winning:

p = odds / (1 + odds) = 0.187 / (1 + 0.187) = 0.158

Hence, we have used the model to say that a horse which has a sireSR value of 10%, last ran 5 days ago,
finished 1st in its previous race, finished 3rd two races ago and was unplaced three races ago, would win its
next race with a probability of 0.158, and so fair odds would be 5.3:1.
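Rather than doing this arithmetic by hand, the same prediction can be obtained from R itself. The sketch below refits and stores the model in an object called model (the same fitting command appears again just below) and then builds a one-row data frame describing our example horse; note that the factor levels supplied must match those used when the model was fitted:
model<-glm(win~sireSR.round+days+pos1+pos2+pos3,binomial)
example.horse<-data.frame(sireSR.round=10, days=5,
                          pos1=factor(1,levels=0:4),
                          pos2=factor(3,levels=0:4),
                          pos3=factor(0,levels=0:4))
predict(model,newdata=example.horse)                    # log-odds, approximately -1.674
predict(model,newdata=example.horse,type="response")    # win probability, approximately 0.158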

Again, model-predicted odds and probabilities for all the horses in our data set can be obtained by the
following commands:
model<- glm(win~sireSR.round+days+pos1+pos2+pos3,binomial)
log.odds<-predict(model)
odds<-exp(predict(model))
probs<-predict(model,type="response")

We have really only scratched the surface of what you need to consider when developing such models, not
least of which is all the checks we should do to ensure that the model we have is sensible and is not
flawed in some way. This is a very complex and comprehensive topic and is really a job for a suitably
experienced statistician and so is well beyond the scope of this series.

In addition, you must bear in mind that the model has been developed based on sireSR values of up to
18% and only for horses that last ran within 50 days. As a result the model should not be used for
predictive purposes for horses with a sireSR value above 18% or that last ran more than 50 days ago,
since we cannot be sure of the suitability of the model in this case.

I hope you have managed to gain at least something from this series in relation to the development of a
binary logistic model for modeling horse race outcomes, and that it has shown you how these types of
models can be very useful for modeling sporting outcomes where win probability is the primary output of
interest.

HOW RELIABLE ARE BREEDING STATS
Mark Foley

This is the first in a series of articles whose purpose is to discover how reliable the methods that the everyday
punter and so-called racing experts use to find winners really are. I'll be looking at form, trends and Sire stats
among other things, trying to highlight the pitfalls and common mistakes that the average punter falls
prey to, and the rubbish often spouted by the so-called media experts.

With summer on the way and the guarantee of decent ground (is that a flying pig I've just seen?) it
seemed appropriate to start with progeny or Sire stats, as they are more commonly referred to.

About six or seven years ago I used to listen to William Hill radio during the day, and every time the
offspring of Zafonic was running the William Hill progeny expert used to say, without fail and with great
conviction, "Zafonic was a fast ground horse, this one will love good ground, just like his sire, but he won't
do a thing on the soft." Since then I've heard several so-called media experts expound this view, but if
it's not true then why do they believe it?
When looking at the problems we face obtaining reliable pointers from Sire statistics, Zafonic is a good
horse to study.

Prior to winning the 2000 Guineas on Good ground, Zafonic had won the 1992 Dewhurst by a highly
impressive 4 lengths on G/Firm ground and was being talked of as the next wonder horse. He was
consequently installed as a short-priced favourite for the following year's 2000 Guineas.
As a juvenile he had won all four of his races comfortably and made his 1993 seasonal debut at Maison
Lafitte in a Listed race. Despite the ground being Soft, defeat was talked of as out of the question; he had
already won on Soft ground prior to racing at Maison Lafitte and was expected to win the four-runner race
comfortably before taking the 2000 Guineas. Zafonic finished 2nd, despite going off at the prohibitive odds
of 1/10.
The bubble had been burst and people needed an explanation; most blamed the ground, despite the fact
that his first two career wins had come on G/Soft and Soft ground. But was it the ground that beat him?

Zafonic only raced twice after that defeat to Kingmambo, winning the 2000 Guineas with a blistering
turn of foot on Good ground and then finishing well down the field in the Sussex Stakes at Goodwood on
G/Soft ground. Two career defeats on rain-softened ground appeared to reinforce the view that he was a
Good ground horse, and it would appear that this is where people got the idea that he needed good
ground.

However, if you take a closer look at the two defeats this doesn't appear to be the case. At Maison Lafitte he
was just beaten by Kingmambo in a photo finish in a falsely run race, by a very high class horse who
subsequently proved good enough to win the French 2000 Guineas and the St James's at Royal Ascot a
month later. Zafonic's final run was at Goodwood, a course that several high class horses down the years
have failed to handle, with its severe dips, turns and undulations. He finished 7th of 10 that day, behind
several horses that he would have been expected to beat comfortably on a more conventional course.

Zafonic only raced 7 times during his short career and it is irresponsible and wrong to say he was a fast
ground horse, despite this being an almost universal perception among the experts. With the passing of
time the two races that people remember are the defeat on Soft ground at odds of 1/10 and the visually
impressive Guineas win on Good ground. True, his two defeats came on G/Soft and Soft ground, but there
were valid excuses for both of them and it must be remembered that he won a Group 1 race on Soft
ground as a juvenile. The reality is that Zafonic only once raced on ground faster than Good, when he
won the Dewhurst, and he never raced on Firm ground.


Sadler's Wells is often put up as a Sire whose progeny love the mud and won't go on Good ground,
another myth not backed up by the statistics.

Which brings us to a couple of very interesting points. The first is that media experts hold preconceived
views which they believe are true just because they are the widespread view, when in reality they are
misleading their audience because they cannot be bothered to check their facts.
Secondly, they often say "This one's sire hated such and such conditions, so I couldn't have this one
today." The reality is that it is very difficult to pinpoint favourable conditions for the vast majority of horses,
and even if you do find a strong preference, the chances are that it will not necessarily reflect its sire's
traits.

Getting back to Zafonic, the popularly held view is that the progeny love fast ground and won't go an
inch on Soft ground. The reality is that overall the progeny have a strike rate of around 14% on G/Firm
and Firm ground, compared to a strike rate of around 9.5% on ground that's G/Soft or softer. The
evidence certainly suggests a preference for firmer ground, but at the same time 33 winners on Soft and
Heavy dispels the argument that Zafonic's offspring only act on a firm surface. However, if we delve a bit deeper and
look at the statistics for the juveniles, the strike rates are 20.5% on firmer surfaces and 19.5% on softer
ground, almost identical and showing no surface preference.

So how do we explain the fact that there doesn't appear to be a ground preference with the 2yos, but the
older horses seem to go better on a firmer surface?
The major problem with progeny stats is that several other factors influence the statistics, and apart from
the distance stats I would argue that the figures become less reliable as a horse gets older. Look at the
number of outside factors that determine whether a runner wins or not. Does the trainer get them fit to
win first time out or does he target a win later on in the season? Is the horse running in handicaps and
has it been harshly treated by the Handicapper? Is the horse running over an inappropriate trip? Has it been
overfaced, or put in lower class races to improve its chances of winning? These and several other factors
influence the figures and make it very difficult to assess a horse's chances purely on its breeding.
Then of course you need to take into account the influence of the Dam.

So are Progeny stats reliable?

Overall I would have to say no, because so many outside factors distort the figures, but that does not mean
that they cannot be useful.
For example, the excellent sprinter Mind Games has been a prolific Sire and his progeny have won 77
juvenile races, all but 2 of them over 5 or 6 furlongs. The juvenile record in races over 7f and
further was 2 wins from 126 runs, and the obvious conclusion is that the progeny are bred for speed and
not endurance; given the stats it would take a brave man to back one of the 2yos over further than
6f. The progeny of certain sires such as Royal Applause and Green Desert win regularly on debut, but
what good is that if their trainer doesn't get them ready to win on debut? However, given the combination
of a good FTO sire with a good FTO trainer, then you've got a decent betting proposition. At the other
extreme you wouldn't want to mortgage the house on the progeny of a Hawk Wing (2 from 85) or a
Stravinsky (2 from 79) making their juvenile debuts.

To sum up, I believe the usefulness of progeny statistics is limited due to the large number of other
outside factors that distort the breeding traits, and the often neglected influence of the Dam. The main
benefit should be to see if the progeny have managed to win in similar circumstances, and then to
determine whether it was a freak result or one achieved by several of the offspring. The extreme examples
illustrated earlier show that it is possible to identify certain trends but, as in the case of Zafonic, statistics
can also be misleading if taken at face value. In the unlikely event that we have a wonderful summer it
could pay dividends to follow the progeny of Rainbow Quest, Singspiel, Nashwan and that supposed
mudlark Sadler's Wells. The progeny of all of the above have proven themselves on G/Firm and Firm
ground over a period of time, even Sadler's Wells, despite what the media experts might tell you.

Happy punting
Mark


MULTIPLE MUGS
Mark Littlewood

Multiple bets such as the lucky 15 are usually considered to be mug bets, advertised and promoted by the
bookmakers to seduce the uneducated when perhaps taking a break from the slot machines. They are also
the bet you probably originally cut your teeth on when you set out on your betting life. My first bet was
certainly 3 x 25p doubles and a 25p treble or the equivalent of 25p back in 1976. You can probably guess
what happened to those 3 horses given that I am still punting today.

A question often asked among SmarterSig members is whether an edge enjoyed by a punter betting to
singles would still exist if they backed their selections in multiples. Would the advantage perhaps even be
magnified over the course of, say, a year? I hadn't given much thought to this until recently, prompted by
two factors. The first is the difficulty of getting single bets on, and perhaps multiple bets spread around
bookmakers would have a greater chance of gliding under the radar. Secondly, my friend multiple Trevor
(see the Educating Trevor article) informed me that many bookmakers pay 3 x the odds on any single winner
in a lucky 15, should you only get one winner.

First of all, for those with responsible parents who perhaps only introduced their offspring to single win
bets, let me explain the lucky 15. The bet requires 4 selections and therefore consists of the following
combinations:-

4 win singles
6 doubles
4 trebles
1 fourfold accumulator

A £1 stake on each of the above would mean a total outlay of £15.
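To make the mechanics concrete, here is a small sketch in R (the same free package used in the modelling series) that works out the profit on a lucky 15, including the treble-the-odds adjustment that many bookmakers apply when only one selection wins. The odds in the example are purely hypothetical, and the function ignores each-way terms, Rule 4 deductions and bookmaker-specific bonus variations.

# Profit on a lucky 15: 4 singles, 6 doubles, 4 trebles, 1 fourfold.
# odds are decimal odds; won is 1 if that selection won, 0 otherwise.
lucky15 <- function(odds, won, stake = 1, bonus = 3) {
  ret <- 0
  for (size in 1:4) {                          # singles, doubles, trebles, fourfold
    for (combo in combn(4, size, simplify = FALSE)) {
      if (all(won[combo] == 1)) ret <- ret + stake * prod(odds[combo])
    }
  }
  if (sum(won) == 1) {                         # one-winner bonus: treble the odds on the single
    ret <- ret + stake * (bonus - 1) * (odds[won == 1] - 1)
  }
  ret - 15 * stake                             # profit after the 15-unit outlay
}

# Example: four hypothetical selections at 3/1, 5/2, 2/1 and 4/1, where only the first wins
lucky15(odds = c(4, 3.5, 3, 5), won = c(1, 0, 0, 0))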

Now the question is what kind of year would be produced by taking a winning year of single bets and
organising them into lucky 15s, and what the value would be of the one-winner bonus scheme offered by
many bookmakers.

I ran my bets through a program to check this question. The year was 2008/09, which, backing to singles,
produced the following results:-

Bets = 3271 PL = +446.32 pts ROI = +13.6%

Now running the selections through my program, placing the bets into lucky 15s based simply on
chronological order gave the following profit without the single bet bonus.

Stake = 12270 pts PL = +4240.9 pts ROI = 34.5%

Of course you would not be able to expose yourself to the same degree with a bank of x pounds with the
lucky 15 bets. Where previously a bank of, say, £100 might enable you to place £1 bets to a point, the
lucky 15 bet would require considerably less.

Now let's take a look at the effect of applying the single win only bonus.

Stake = 12270 pts PL = +6332.3 pts ROI = +51.6%

I am not sure, without further investigation, how to stake the above, but I would guess that 1/500th of a
bank would be sufficient. If that is the case then I would certainly have made more backing in lucky 15s
than I did with straight win bets. The other big advantage is that the extra gained could be used to offset the
loss from not obtaining the very best odds as one splits the bet up amongst numerous outlets.


The downside of course is that losing runs will certainly be longer, or should I say more severe in terms of
points lost. Hopefully the mathematicians in the group might be able to have some input on this idea.

Before we get carried away with the results, a look at a more modest year could have a more sobering
effect. In the following year, level stakes win only produced the following:-

Bets = 1577 PL = +73.7 pts ROI = +4.6%

Now utilising lucky 15s with the bonus for single wins in place we have the following set of results:-

Stakes = 5910 PL = +73.3 pts ROI = +1.2%

Virtually the same points profit, but of course with a smaller unit stake on the lucky 15 bets we have a
smaller overall profit compared to the straight win bet option. It would seem that how the winners group
together could be playing a big part in producing any advantage the lucky 15 bet might have.

I don't think I will be rushing for those special pre-printed slips just yet, but perhaps I won't be pre-judging
the multiple mugs as quickly as I might have done in the past.



MARKET BIAS IN 5f HANDICAPS
by David Renham

In this article I am looking to see whether market forces are the same at different courses and distances.
The focus is 5f turf handicaps (excluding 2yo nurseries) and I have concentrated on races with 7 or more
runners. I have decided to split the betting market into thirds, as I do when I analyse draw bias. Of
course there is not always an even split, but it should balance out, as the table below shows:
Number of runners   Top third of market   Middle third of market   Bottom third of market
10                  3                     4                        3
11                  4                     3                        4
12                  4                     4                        4

Hence, with 10 runners, the middle third gets the extra runner, while with 11 runners the top and bottom
thirds get the extra runner. This idea continues for all other groups of three (eg 13, 14 and 15 runners).
Hopefully therefore, we will get a fairly accurate reflection of market bias overall.
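For anyone wanting to reproduce this split on their own data, a rough sketch of the allocation rule in R is shown below; it simply follows the pattern in the table above, giving a single spare runner to the middle third and two spare runners to the top and bottom thirds.

# Number of runners allotted to each third of the market
market.thirds <- function(runners) {
  base <- runners %/% 3
  extra <- runners %% 3
  c(top    = base + (extra == 2),
    middle = base + (extra == 1),
    bottom = base + (extra == 2))
}
market.thirds(10)   # 3 4 3
market.thirds(11)   # 4 3 4
market.thirds(12)   # 4 4 4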

The data has been taken from 2002 to July 2009 and for all 5f courses the handicap market bias stats are
as follows:

Top third of market (%) Middle third of market (%) Bottom third of market (%)
58.8 28.4 12.8

No surprises that the top end of the market has produced the majority of winners; essentially the top
third of the market has produced the winner 4.6 times more often than the bottom third of the market.
However, let us see what happens when we look at the splits for different numbers of runners:

Number of runners   Top third of market (%)   Middle third of market (%)   Bottom third of market (%)
7 to 9              57.5                      28                           14.5
10 to 12            57.7                      30.6                         11.7
13 to 15            58.9                      28.5                         12.6
16 to 18            59.4                      26                           14.6
19 or more          65.5                      24.1                         10.3

As the number of runners increases, the better it seems to be for the top third of the market. It seems to back
up the old adage "the bigger the field, the bigger the certainty".
Let us break down the stats by course. There is quite a variance between courses; I have ordered them
initially alphabetically:
Course   Top third of market (%)   Middle third of market (%)   Bottom third of market (%)
Ascot 53.8 38.5 7.7
Ayr 51.6 37.5 10.9
Bath 73.5 22.4 4.1
Beverley 76.1 16.3 7.6
Brighton 55.6 30.9 13.6
Carlisle 60.0 25.0 15.0
Catterick 69.9 19.2 11.0
Chepstow 65.0 25.0 10.0
Chester 53.6 37.5 8.9
Doncaster 55.3 26.3 18.4
Epsom 47.8 34.8 17.4
Folkestone 63.3 20.0 16.7
Goodwood 52.8 26.4 20.8
Hamilton 64.3 21.4 14.3
Haydock 53.1 35.9 10.9
Leicester 53.8 38.5 7.7
Lingfield 75.0 18.8 6.3
Musselburgh 66.7 20.7 12.6
Newbury 64.3 32.1 3.6
Newcastle 50.0 34.0 16.0
Newmarket 45.3 37.7 17.0
Nottingham 47.3 34.5 18.2
Pontefract 66.0 27.7 6.4
Redcar 55.0 32.5 12.5
Ripon 51.7 24.1 24.1
Salisbury 31.8 45.5 22.7
Sandown 54.1 35.3 10.6
Thirsk 70.3 14.9 14.9
Warwick 42.9 50.0 7.1
Windsor 66.7 22.8 10.5
Yarmouth 50.0 33.3 16.7
York 34.9 41.9 23.3

Beverley tops the list for the top third of the market with 76.1%; Salisbury has the lowest on 31.8%. A
huge difference between them. Now let us put the courses in order of best performance for the top
third of the market:

Course   Top third of market (%)   Middle third of market (%)   Bottom third of market (%)
Beverley 76.1 16.3 7.6
Lingfield 75.0 18.8 6.3
Bath 73.5 22.4 4.1
Thirsk 70.3 14.9 14.9
Catterick 69.9 19.2 11.0
Musselburgh 66.7 20.7 12.6
Windsor 66.7 22.8 10.5
Pontefract 66.0 27.7 6.4
Chepstow 65.0 25.0 10.0
Hamilton 64.3 21.4 14.3
Newbury 64.3 32.1 3.6
Folkestone 63.3 20.0 16.7
Carlisle 60.0 25.0 15.0
Brighton 55.6 30.9 13.6
Doncaster 55.3 26.3 18.4
Redcar 55.0 32.5 12.5
Sandown 54.1 35.3 10.6
Ascot 53.8 38.5 7.7
Leicester 53.8 38.5 7.7
Chester 53.6 37.5 8.9
Haydock 53.1 35.9 10.9
Goodwood 52.8 26.4 20.8
Ripon 51.7 24.1 24.1
Ayr 51.6 37.5 10.9
Newcastle 50.0 34.0 16.0
Yarmouth 50.0 33.3 16.7
Epsom 47.8 34.8 17.4
Nottingham 47.3 34.5 18.2
Newmarket 45.3 37.7 17.0
Warwick 42.9 50.0 7.1
York 34.9 41.9 23.3
Salisbury 31.8 45.5 22.7

The question that needs to be addressed at this juncture is how valid are these course figures? In many
cases my hypothesis is that they are fairly accurate. The figures for each course cover a fair number of
races: Musselburgh for example has had 111 races, Beverley 92. Hence, in most cases we are dealing
with decent sample sizes. Also, as a punter who has tried to specialise in sprint handicaps, many of the
courses with low or lowish top third percentages are courses I have really struggled at; York, Salisbury
and Newmarket are three such examples. The percentages for such courses indicate that results have not
been as market biased as one would expect, so in other words these races have been far more open
contests; my losses at these courses can vouch for that!!

For me, the question now is: do I use these figures in the future when analysing 5f sprint handicaps? The
answer is a simple yes. I think the information that has been collated is going to prove useful. I will now
think twice about backing an outsider at certain courses such as Beverley, Bath, Lingfield, Newbury,
Pontefract and Warwick, whereas the reverse will be true at the outsider-biased C&Ds at Ripon, York,
Salisbury and Goodwood.

The beauty of this type of research is that it can be extended to all race types, all distances, and so on. My
next port of call will be 6f handicaps, to see if the course stats for 6f correlate with those for 5f. If they do
not, then maybe I will have to go back to the drawing board.



EDUCATING TREVOR
Mark Littlewood

Members of the email list will be well aware of my recent exploits and problems on the High Street. Suffice
it to say that about a year ago, in answer to the slamming of internet doors, I decided to hit the High
Street each morning. My laptop and I hit the streets around June of last year, as I rather naively thought
my face would be less of an identifiable feature than my login userid. My betting year, for no real reason,
begins on June 1st and the first two months were quiet as I pretty much broke even. Of course, breaking
even doesn't mean across all shops. Inevitably one shop or two gets hit whilst others are taking money off
me. In August, therefore, I began to hit problems with restrictions to SP only in some shops. When you
enter a shop and this news is broken to you at the counter you have two courses of action available. The
first is to politely accept the fact and quietly walk away. The second option is to kick up the proverbial and
make every customer aware that the decision is a crime against humanity resting somewhere between
Apartheid and the German extermination camps. For a brief moment I considered my choice before
eventually deciding that I owed it to Nelson Mandela.

The usual response to my outburst from fellow shop punters is one of bemusement. Those that do enquire
about the exchange usually suggest I simply bet at another shop. Sometimes I am tempted to embark on
an explanation of why this is not practical, but thankfully I usually resist the temptation. A couple of days
after one of these showdowns I was approached by a very pleasant chap called Trevor who enquired about
the nature of what had happened a few days before. I must admit at first I entertained one or two absurd
paranoid ideas. Was he a bookmaker employee vested with the job of befriending winning punters to find
out something about their background? Well no, actually he is a 59-year-old part-time teaching advisor for
a local council who spends Wednesday to Sunday grappling with the intricacies of dog and horse racing.
After gradually getting to know Trevor I found his company quite enjoyable. Trevor, being an educated
man, embarked on what could be considered a cultural exchange. He introduced me to literature, the arts
and good music, whilst I introduced him to Earl Grey tea.

The real exchange however was surely going to be on the betting front. Here was a man who could
understand what I was on about when I mentioned terms such as value, weak favourites and shopping
around for prices, and even if he couldn't, my daily list of possible bets was always next to the tea pot. If
he chose to, I had no objections to him simply backing what I was backing. I looked at the situation and
couldn't help thinking that if such an opportunity had come my way 25 years ago it would have saved me an
enormous amount of time and research.

I mentioned the month of August because at pretty much the precise time of our meeting things began to
take off for me, and the profits kicked in. I was totally open with Trevor when he asked me how I was
doing. I don't see any point in wishy-washy answers like "can't complain" or "I have had worse months". I
answered with straight profit and loss figures. My ambition with project Trevor was twofold. Firstly he was
helping me get bets on, and secondly I loved the idea of turning him around and making him into at the
very least a break-even punter, whilst being mindful of not trying to drag him somewhere he did not want
to go.

I tackled the latter by first pointing out the error of some of his ways. Going through the extortionate over-rounds
on dog racing compared to horse racing didn't have the desired effect of getting him off the dogs,
but I did have greater success in persuading him to shop around for odds instead of staying in the
comfort of one bookmaker because the coffee is free.

Trevor quickly took to the idea of checking his picks on my laptop and then, like me, scurrying off to the
best bookmaker odds. The difference however is that whilst I was backing single wins, Trevor couldn't
break the habit of combining his selections into multiples, usually with the final leg being the Archbishop
of Canterbury. I pointed out to Trevor a few months ago that if he had placed his average £10 stake on all the
selections that stared up from that coffee shop table each morning, he would now have enough for a 5-star
holiday with some change left over.


To be fair, the first 10 or so bets he placed on my behalf only produced one winner, which may have
skewed his view of my claims of profitability. Putting this aside, however, I was still struck by how reluctant
Trevor was to ditch his methods of selection. In fact my year in the shops has reinforced the idea that
most shop punters do not want to change. Maybe change requires an admission that one hasn't got a clue.
Not easy to do unless we are talking about something life threatening like open-heart surgery. I used to
think that profitability went hand in hand with selectivity and that the average shop punter wouldn't have
the discipline for one or two bets per day. This is of course untrue, with many successful punters operating
on turnover rates that would satisfy even the most compulsive of gamblers.

No, I have to concede that something else is going on here, but I can't quite put my finger on it. It's also
not simply down to intelligence. Trevor is an educated and intelligent man. Perhaps the compulsive gene
that drives the habitual bettor should not be underestimated. The thrill of the bet, for some, may far
outweigh the thrill of long-term profit.

Mindful that readers are at different stages of reaching consistent profits, here would be my key
alterations to Trevor's betting.

- Ditch multiples and stick to singles
- Read up on when each way is favourable, otherwise back win only
- Bet on the early markets, when the over-round is often down to less than 1% per runner
- Shop around for best odds and guaranteed odds
- Set aside a bank for betting purposes
- Stick to consistent level stake bets that are comfortable within your bank
- Don't follow horses simply because you have backed them before. They don't owe you anything
- Decide what you are going to bet before racing begins and stick with those picks at morning odds
- Do not bet on dogs, and don't even think about the slots and the virtuals

The above are a few basics. The trickier part is of course deciding on what to bet.



© 2009 www.SmarterSig.com
