The Mathematics of Geographic Profiling
Towson University
Applied Mathematics Laboratory
Dr. Mike O'Leary
Crime Hot Spots: Behavioral, Computational and Mathematical Models
Institute for Pure and Applied Mathematics
January 29 - February 2, 2007
Supported by the NIJ through grant 2005-IJ-CX-K036
Project Participants
Towson University Applied Mathematics Laboratory
Undergraduate research projects in applied mathematics.
Founded in 1980
National Institute of Justice
Special thanks to Stanley Erickson (NIJ) and Andrew Engel (SAS)
Students
2005-2006:
Paul Corbitt
Brooke Belcher
Brandie Biddy
Gregory Emerson
2006-2007:
Chris Castillo
Adam Fojtik
Laurel Mount
Ruozhen Yao
Melissa Zimmerman
Jonathan Vanderkolk
Grant Warble
Geographic Profiling
The Question:
Given a series of linked crimes committed by
the same offender, can we make predictions
about the anchor point of the offender?
The anchor point can be a place of
residence' a place of work' or some other
commonly visited location.
Geographic Profiling
Our question is operational.
This places limitations on available data.
Example
A series of 9 linked vehicle thefts in Baltimore County
Example
ADDRESS  DATE_FROM   TIME  DATE_TO     TIME  REMARKS
918 M    01/18/2003  0800  01/18/2003  0810  VEHICLE IS 01 TOYT CAMRY, LEFT VEH RUNNING
1518 L   01/22/2003  0700  01/22/2003  072   VEHICLE IS 99 HOND ACCORD STL/REC, """B/M PAIR, DRIVING MAROON ACCORD"
731 CC   01/22/2003  07    01/22/2003  07%   VEHICLE IS 02 CHEV MALIBU STL/REC
1527 K   01/27/2003  110   01/27/2003  110   VEHICLE IS 97 MERC COUGAR, LEFT VEH RUNNING
151 G    01/29/2003  0901  01/29/2003  0901  VEHICLE IS 99 MITS DIAMONTE, LEFT VEH RUNNING
115 K    01/29/2003  1155  01/29/2003  115%  VEHICLE IS 00 TOYT RUNNER STL/REC, &' ARREST NFI
593 R    12/31/2003  0%32  12/31/2003  0%32  VEHICLE IS 92 BMW 525, WARMING UP VEH
127 G    02/17/200   0820  02/17/200   0830  VEHICLE IS 00 HOND ACCORD, WARMING VEH
9 S      05/15/200   0210  05/15/200   0%00  VEHICLE IS 0 SU)I ENDORO
Existing Methods
Spatial distribution strategies
Probability distance strategies
Notation:
Anchor point: $z = (z^{(1)}, z^{(2)})$
Crime sites: $x_1, x_2, \dots, x_n$
Number of crimes: $n$
Distance
Euclidean: $d_2(x, y) = \sqrt{(x^{(1)} - y^{(1)})^2 + (x^{(2)} - y^{(2)})^2}$
Manhattan: $d_1(x, y) = |x^{(1)} - y^{(1)}| + |x^{(2)} - y^{(2)}|$
Street grid
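As a small illustration, the Manhattan and Euclidean metrics can be written as Python functions. This is only a sketch: the function names and the representation of points as `(x, y)` tuples are assumptions made here, and the street grid distance is omitted because it depends on road network data.

```python
import math

def manhattan(x, y):
    """d_1: sum of the absolute coordinate differences."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def euclidean(x, y):
    """d_2: straight-line distance."""
    return math.hypot(x[0] - y[0], x[1] - y[1])

print(manhattan((0, 0), (3, 4)))  # 7
print(euclidean((0, 0), (3, 4)))  # 5.0
```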
Spatial Distribution Strategies
Centroid:
\[ \hat{z}_{\mathrm{centroid}} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
[Figure: crime locations, the average of each coordinate, and the resulting anchor point]
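The centroid estimate is simply the coordinate-wise mean of the crime sites. A minimal sketch, with hypothetical coordinates and an assumed function name:

```python
def centroid(crimes):
    """Average the crime coordinates to get the centroid estimate."""
    n = len(crimes)
    return (sum(p[0] for p in crimes) / n, sum(p[1] for p in crimes) / n)

# Hypothetical crime sites, for illustration only.
crimes = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(centroid(crimes))  # (2.0, 1.0)
```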
Spatial Distribution Strategies
Center of minimum distance: $\hat{z}_{\mathrm{cmd}}$ is the value of $y$ that minimizes
\[ D(y) = \sum_{i=1}^{n} d(x_i, y) \]
[Figure: crime locations with two candidate anchor points, one with distance sum 10.63 and one with distance sum 9.94, the smallest possible sum]
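Unlike the centroid, the center of minimum distance has no closed form. One standard way to approximate it for the Euclidean distance is Weiszfeld's iteration, which is not mentioned in the slides and is used here only as an illustration. The function names, iteration count and starting point are assumptions.

```python
import math

def distance_sum(crimes, y):
    """D(y): total Euclidean distance from y to every crime site."""
    return sum(math.hypot(p[0] - y[0], p[1] - y[1]) for p in crimes)

def center_of_minimum_distance(crimes, iterations=200, eps=1e-12):
    """Approximate the minimizer of D(y) with Weiszfeld's iteration."""
    n = len(crimes)
    # Start from the centroid.
    y = (sum(p[0] for p in crimes) / n, sum(p[1] for p in crimes) / n)
    for _ in range(iterations):
        wsum = wx = wy = 0.0
        for p in crimes:
            r = math.hypot(p[0] - y[0], p[1] - y[1])
            if r < eps:
                # The update is undefined at a crime site; this sketch just stops there.
                return p
            w = 1.0 / r
            wsum += w
            wx += w * p[0]
            wy += w * p[1]
        y = (wx / wsum, wy / wsum)  # distance-weighted average of the sites
    return y

crimes = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (10.0, 10.0)]  # hypothetical
y = center_of_minimum_distance(crimes)
print(y, distance_sum(crimes, y))
```

Each iteration can only lower the distance sum, so the result is never worse than the centroid it starts from.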
Spatial Distribution Strategies
Circle Method:
The anchor point is contained in the circle whose diameter is the segment joining the two crimes that are farthest apart.
[Figure: crime locations, the circle, and the anchor point]
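The circle can be found by brute force over all pairs of crime sites. A minimal sketch, assuming Euclidean distance and hypothetical coordinates:

```python
import math
from itertools import combinations

def circle_method(crimes):
    """Return (center, radius) of the circle whose diameter joins the two
    crime sites that are farthest apart."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    a, b = max(combinations(crimes, 2), key=lambda pair: dist(*pair))
    center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return center, dist(a, b) / 2

crimes = [(0.0, 0.0), (1.0, 1.0), (6.0, 8.0), (2.0, 0.0)]  # hypothetical
print(circle_method(crimes))
```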
Probability Distribution Strategies
The anchor point is located in a region with a high "hit score".
The hit score $S(y)$ has the form
\[ S(y) = \sum_{i=1}^{n} f(d(y, x_i)) = f(d(y, x_1)) + f(d(y, x_2)) + \dots + f(d(y, x_n)) \]
where $x_i$ are the crime locations, $f$ is a decay function and $d$ is a distance.
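A hit score surface can be sketched by evaluating the sum on a grid. In the example below the negative exponential decay, the Euclidean distance, the crime coordinates and the grid are all illustrative assumptions; any decay function and distance could be passed in instead.

```python
import math

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def negative_exponential(d, A=1.0, beta=1.0):
    """Illustrative decay function: A * exp(-beta * d)."""
    return A * math.exp(-beta * d)

def hit_score(y, crimes, decay=negative_exponential, distance=euclidean):
    """S(y): sum of the decay function over the distances to each crime."""
    return sum(decay(distance(y, x)) for x in crimes)

crimes = [(1.0, 1.0), (1.5, 2.0), (3.0, 1.0)]  # hypothetical
grid = [(0.25 * i, 0.25 * j) for i in range(17) for j in range(17)]
best = max(grid, key=lambda y: hit_score(y, crimes))
print(best, round(hit_score(best, crimes), 3))
```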
Probability Distribution Strategies
Linear:
\[ f(d) = A - Bd \]
[Figure: hit score surface over the crime locations]
Rossmo
Manhattan distance metric.
Decay function
\[ f(d) = \begin{cases} \dfrac{k}{d^{h}} & \text{if } d > B \\[2ex] \dfrac{k B^{g-h}}{(2B - d)^{g}} & \text{if } d < B \end{cases} \]
The constants $k$, $g$, $h$ and $B$ are empirically defined.
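Rossmo's piecewise decay function translates directly into code. The sketch below takes its default values B = 1, h = 2, g = 3 from the slides' Rossmo figure; the default k = 1 and the function name are assumptions. The two branches agree at d = B, so the code assigns that boundary value to the second branch.

```python
def rossmo_decay(d, k=1.0, g=3.0, h=2.0, B=1.0):
    """Rossmo decay: power-law decay outside the buffer radius B,
    reduced values inside it."""
    if d > B:
        return k / d ** h
    return k * B ** (g - h) / (2 * B - d) ** g

for d in (0.0, 0.5, 1.0, 2.0):
    print(d, rossmo_decay(d))
```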
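Rossmo
[Figure: Rossmo hit score with $B = 1$, $h = 2$, $g = 3$]
Canter, Coffey, Huntley & Missen
Euclidean distance
Decay functions
\[ f(d) = A e^{-\beta d} \]
\[ f(d) = \begin{cases} 0 & \text{if } d < A, \\ B & \text{if } A \le d < B, \\ C e^{-\beta d} & \text{if } d \ge B. \end{cases} \]
Dragnet
[Figure: Dragnet hit score with $A = 1$, $\beta = 1$]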
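Levine
Euclidean distance
Decay functions
Linear: \[ f(d) = A + Bd \]
Negative exponential: \[ f(d) = A e^{-\beta d} \]
Normal: \[ f(d) = \frac{A}{\sqrt{2\pi S^2}} \exp\left[ -\frac{(d - \bar{d})^2}{2S^2} \right] \]
Lognormal: \[ f(d) = \frac{A}{d \sqrt{2\pi S^2}} \exp\left[ -\frac{(\ln d - \bar{d})^2}{2S^2} \right] \]
CrimeStat
[Figure from Levine (2004)]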
Shortcomings
These techniques are all ad hoc.
What is their theoretical justification?
What assumptions are being made about criminal behavior?
What mathematical assumptions are being made?
How do you choose one method over another?
Shortcomings
The convex hull effect:
The anchor point always occurs inside the convex hull of the crime locations.
[Figure: crime locations and their convex hull]
Shortcomings
How do you add in local information?
How could you incorporate socio-economic variables into the model?
Snook, Individual differences in distance travelled by serial burglars
Malczewski, Poetz & Iannuzzi, Spatial analysis of residential burglaries in London, Ontario
Bernasco & Nieuwbeerta, How do residential burglars select target areas?
Osborn & Tseloni, The distribution of household property crimes
A New Approach
In previous methods, the unknown quantity was:
The anchor point (spatial distribution strategies)
The hit score (probability distance strategies)
We use a different unknown quantity.
A New Approach
Let $P(x; z)$ be the density function for the probability that an offender with anchor point $z$ commits a crime at location $x$.
This distribution is our new unknown.
This has criminological significance.
In particular, assumptions about the form of $P(x; z)$ are equivalent to assumptions about the offender's behavior.
The Mathematics
Given crimes located at $x_1, x_2, \dots, x_n$, the maximum likelihood estimate for the anchor point $\hat{z}_{\mathrm{mle}}$ is the value of $y$ that maximizes
\[ L(y) = \prod_{i=1}^{n} P(x_i; y) = P(x_1; y) \, P(x_2; y) \cdots P(x_n; y) \]
or equivalently, the value that maximizes
\[ \lambda(y) = \sum_{i=1}^{n} \ln P(x_i; y) = \ln P(x_1; y) + \ln P(x_2; y) + \dots + \ln P(x_n; y) \]
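For any choice of density, the log-likelihood can be maximized numerically. The sketch below does a coarse grid search; the bivariate normal density with $\sigma = 1$, the crime coordinates, the grid and all function names are assumptions chosen only to make the example concrete.

```python
import math

def normal_density(x, z, sigma=1.0):
    """Illustrative P(x; z): bivariate normal centered on the anchor point z."""
    r2 = (x[0] - z[0]) ** 2 + (x[1] - z[1]) ** 2
    return math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def log_likelihood(y, crimes, density):
    """Sum of ln P(x_i; y) over the crime sites."""
    return sum(math.log(density(x, y)) for x in crimes)

def grid_mle(crimes, density, xs, ys):
    """Return the grid point with the largest log-likelihood."""
    candidates = [(a, b) for a in xs for b in ys]
    return max(candidates, key=lambda y: log_likelihood(y, crimes, density))

crimes = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]  # hypothetical
xs = [0.25 * i for i in range(-4, 13)]
ys = [0.25 * i for i in range(-4, 17)]
print(grid_mle(crimes, normal_density, xs, ys))  # (1.0, 1.0), the centroid
```

A grid search is slow and only as fine as the grid; it is shown here because it makes no assumptions about the shape of the likelihood.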
Relation to Spatial Distribution Strategies
If we make the assumption that offenders choose target locations based only on a distance decay function in normal form, then
\[ P(x; z) = \frac{1}{2\pi\sigma^2} \exp\left[ -\frac{|x - z|^2}{2\sigma^2} \right] \]
The maximum likelihood estimate for the anchor point is the centroid.
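A short derivation shows why the centroid appears. This is a sketch in which $\lambda$ denotes the log-likelihood, $\sigma$ is held fixed, and the crime sites are treated as independent:

```latex
\begin{align*}
\lambda(y) &= \sum_{i=1}^{n} \ln P(x_i; y)
            = -n \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} |x_i - y|^2, \\
\nabla \lambda(y) &= \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - y) = 0
  \quad\Longrightarrow\quad
  y = \frac{1}{n} \sum_{i=1}^{n} x_i .
\end{align*}
```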

Relation to Spatial Distribution Strategies
If we make the assumption that offenders choose target locations based only on a distance decay function in exponentially decaying form, then
\[ P(x; z) = \frac{1}{2\pi\sigma^2} \exp\left[ -\frac{|x - z|}{\sigma} \right] \]
The maximum likelihood estimate for the anchor point is the center of minimum distance.
Relation to Probability Distance Strategies
What is the log likelihood function?
\[ \lambda(y) = \sum_{i=1}^{n} \left[ -\ln(2\pi\sigma^2) - \frac{|x_i - y|}{\sigma} \right] \]
This is the hit score $S(y)$ provided we use Euclidean distance and the linear decay $f(d) = A + Bd$ for
\[ A = -\ln(2\pi\sigma^2), \qquad B = -1/\sigma \]
Parameters
The maximum likelihood technique does not require a priori estimates for parameters other than the anchor point.
\[ P(x; z, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left[ -\frac{|x - z|^2}{2\sigma^2} \right] \]
The same process that determines the best choice of $z$ also determines the best choice of $\sigma$.
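For the normal form, the joint maximization can be carried out by hand. The following sketch assumes independent crime sites and writes $\hat{z}$ for the centroid, which maximizes the log-likelihood in $z$ for every $\sigma$:

```latex
\begin{align*}
\lambda(z, \sigma) &= -n \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} |x_i - z|^2, \\
\frac{\partial \lambda}{\partial \sigma} &= -\frac{2n}{\sigma}
   + \frac{1}{\sigma^3} \sum_{i=1}^{n} |x_i - z|^2 = 0
  \quad\Longrightarrow\quad
  \hat{\sigma}^2 = \frac{1}{2n} \sum_{i=1}^{n} |x_i - \hat{z}|^2 .
\end{align*}
```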
Better Models
We have recaptured the results of existing techniques by choosing $P(x; z)$ appropriately.
These choices of $P(x; z)$ are not very realistic.
Space is homogeneous and crimes are equi-distributed.
Space is infinite.
Decay functions were chosen arbitrarily.
Better Models
Our framework allows for better choices of $P(x; z)$.
Consider
\[ P(x; z) = D(d(x, z)) \, G(x) \, N(z) \]
$D$: distance decay (dispersion kernel)
$G$: geographic factors
$N$: normalization
The Simplest Case
Suppose we have information about crimes committed by the offender only for a portion of the region.
[Figure: map divided into the regions $W$, $\Omega$ and $E$]
The Simplest Case
Regions
$\Omega$: Jurisdiction(s). Crimes and anchor points may be located here.
$E$: "elsewhere". Anchor points may lie here, but we have no data on crimes here.
$W$: "water". Neither anchor points nor crimes may be located here.
In all other respects, we assume the geography is homogeneous.
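The Simplest Case
We set
\[ G(x) = \begin{cases} 1 & x \in \Omega \\ 0 & x \notin \Omega \end{cases} \]
We choose an appropriate decay function
\[ D(|x - z|) = \exp\left[ -\frac{|x - z|^2}{2\sigma^2} \right] \]
The required normalization function is
\[ N(z) = \left[ \iint_{\Omega} \exp\left( -\frac{|y - z|^2}{2\sigma^2} \right) dy^{(1)} \, dy^{(2)} \right]^{-1} \]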
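The Simplest Case
Our estimate of the anchor point $\hat{z}_{\mathrm{mle}}$ is the choice of $y$ that maximizes
\[ L(y) = \frac{\exp\left( -\sum_{i=1}^{n} \dfrac{|x_i - y|^2}{2\sigma^2} \right)}{\left[ \iint_{\Omega} \exp\left( -\dfrac{|\xi - y|^2}{2\sigma^2} \right) d\xi^{(1)} \, d\xi^{(2)} \right]^{n}} \]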
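The Simplest Case
Our students wrote code to implement this method last year, and tested it on real crime data from Baltimore County.
We used Green's theorem to convert the double integral to a line integral.
\[ \iint_{\Omega} \exp\left( -\frac{|\xi - y|^2}{2\sigma^2} \right) d\xi^{(1)} \, d\xi^{(2)} = -\oint_{\partial\Omega} \frac{\sigma^2}{|\xi - y|} \exp\left( -\frac{|\xi - y|^2}{2\sigma^2} \right) (e_r \cdot n) \, ds + \begin{cases} 2\pi\sigma^2 & y \in \Omega \\ 0 & y \notin \Omega \end{cases} \]
Baltimore County was simply a polygon with 2908 vertices.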
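The boundary form suits polygonal jurisdictions, because on each straight edge the quantity $(\xi - y) \cdot n$ is constant. The sketch below assumes that $e_r$ is the unit vector from $y$ toward $\xi$, that $n$ is the outward unit normal of a counter-clockwise polygon, that the boundary integral enters with a minus sign, and that the correction term is $2\pi\sigma^2$ when $y$ lies inside the polygon. The function names, the midpoint rule and the square test region are illustrative choices, not the students' implementation.

```python
import math

def point_in_polygon(p, poly):
    """Ray-casting test (points exactly on the boundary are not handled)."""
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > p[1]) != (y2 > p[1]):
            if x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1) > p[0]:
                inside = not inside
    return inside

def gaussian_mass(poly, y, sigma, steps=2000):
    """Integral of exp(-|xi - y|^2 / (2 sigma^2)) over a counter-clockwise
    polygon, computed from its boundary with the midpoint rule."""
    total = 0.0
    for i in range(len(poly)):
        (ax, ay), (bx, by) = poly[i], poly[(i + 1) % len(poly)]
        length = math.hypot(bx - ax, by - ay)
        nx, ny = (by - ay) / length, (ax - bx) / length  # outward unit normal
        h = (ax - y[0]) * nx + (ay - y[1]) * ny          # (xi - y) . n on this edge
        for k in range(steps):
            t = (k + 0.5) / steps
            r2 = (ax + t * (bx - ax) - y[0]) ** 2 + (ay + t * (by - ay) - y[1]) ** 2
            # Integrand: -(sigma^2 / r) exp(-r^2 / (2 sigma^2)) (e_r . n), with e_r . n = h / r
            total -= sigma ** 2 * h / r2 * math.exp(-r2 / (2 * sigma ** 2)) * length / steps
    if point_in_polygon(y, poly):
        total += 2 * math.pi * sigma ** 2
    return total

square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
exact = 2 * math.pi * math.erf(1 / math.sqrt(2)) ** 2  # product of two 1-D integrals
print(gaussian_mass(square, (0.0, 0.0), 1.0), exact)
```

For the square the double integral factors into two one-dimensional Gaussian integrals, which gives an independent check on the boundary formula.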
The Simplest Case
To calculate the maximum, we used the BFGS method.
Search in the direction $-D_n \nabla f(y_n)$ where
\[ D_{n+1} = D_n + \left( 1 + \frac{g^T D_n g}{d^T g} \right) \frac{d \, d^T}{d^T g} - \frac{D_n g \, d^T + d \, g^T D_n}{d^T g} \]
\[ d = y_{n+1} - y_n, \qquad g = \nabla f(y_{n+1}) - \nabla f(y_n) \]
For the 1-D optimization we used the bisection method.
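The update can be sketched in a few lines for a function of two variables. The code below minimizes, so a likelihood would be maximized by passing the gradient of its negative. It searches along $-D_n \nabla f(y_n)$ and uses bisection on the directional derivative for the 1-D step; the bracket-doubling rule, the tolerances, the test function and all names are assumptions.

```python
def bfgs_minimize(grad, y, iterations=50, tol=1e-10):
    """Minimize a smooth function of two variables given its gradient."""
    D = [[1.0, 0.0], [0.0, 1.0]]  # initial inverse-Hessian approximation
    gy = grad(y)
    for _ in range(iterations):
        if gy[0] ** 2 + gy[1] ** 2 < tol ** 2:
            break
        # Search direction p = -D * gradient.
        p = [-(D[0][0] * gy[0] + D[0][1] * gy[1]),
             -(D[1][0] * gy[0] + D[1][1] * gy[1])]

        def slope(t):
            """Directional derivative of f along p at step length t."""
            gt = grad([y[0] + t * p[0], y[1] + t * p[1]])
            return gt[0] * p[0] + gt[1] * p[1]

        lo, hi = 0.0, 1.0
        while slope(hi) < 0 and hi < 1e6:  # expand until the slope changes sign
            hi *= 2
        for _step in range(60):            # bisection on the slope
            mid = (lo + hi) / 2
            if slope(mid) < 0:
                lo = mid
            else:
                hi = mid
        t = (lo + hi) / 2
        y_new = [y[0] + t * p[0], y[1] + t * p[1]]
        g_new = grad(y_new)
        d = [y_new[0] - y[0], y_new[1] - y[1]]
        g = [g_new[0] - gy[0], g_new[1] - gy[1]]
        dg = d[0] * g[0] + d[1] * g[1]
        if dg > 1e-16:  # skip the update when the curvature condition fails
            Dg = [D[0][0] * g[0] + D[0][1] * g[1], D[1][0] * g[0] + D[1][1] * g[1]]
            coef = (1 + (g[0] * Dg[0] + g[1] * Dg[1]) / dg) / dg
            for i in range(2):
                for j in range(2):
                    D[i][j] += coef * d[i] * d[j] - (Dg[i] * d[j] + d[i] * Dg[j]) / dg
        y, gy = y_new, g_new
    return y

# Test function f(y) = (y0 - 1)^2 + 10 (y1 + 2)^2, minimized at (1, -2).
quadratic_grad = lambda y: (2 * (y[0] - 1), 20 * (y[1] + 2))
print(bfgs_minimize(quadratic_grad, [0.0, 0.0]))
```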
Sample Results
[Figure: Baltimore County vehicle theft series, showing the predicted anchor point and the offender's home]
Better Models
This is just a modification of the centroid method that accounts for possibly missing crimes outside the jurisdiction.
Clearly, better models are needed.
Better Models
Recall our ansatz
\[ P(x; z) = D(d(x, z)) \, G(x) \, N(z) \]
What would be a better choice of $D$?
What would be a better choice of $G$?
Distance Decay
[Figure from Levine (2004)]
Distance Decay
Suppose that each offender has a decay function $f(d; \lambda)$ where $\lambda \in (0, \infty)$ varies among offenders according to the distribution $\mu(\lambda)$.
Then if we look at the decay function for all offenders, we obtain the aggregate distribution
\[ F(d) = \int_{0}^{\infty} f(d; \lambda) \, \mu(\lambda) \, d\lambda \]
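Distance Decay
\[ f(d) = \frac{A}{d \sqrt{2\pi S^2}} \exp\left[ -\frac{(\ln d - \bar{d})^2}{2S^2} \right] \]
Scaling parameters: $A = \sqrt{\pi}$, $\bar{d} = 0.1$
Shape parameters: [Figure: decay curves for shape parameter values 0.5, 1, 2, 3 and 4, where the shape parameter is $2S^2$]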
Distance Decay
Each offender has a lognormal decay function.
The offender's shape parameter has a lognormal distribution.
[Figure: the resulting aggregate distribution]
Distance Decay
Is this real, or an artifact?
How do we determine the "best" choice of decay function?
This needs to be determined in advance.
Will it vary depending on
crime type?
local geography?
Geography
Let $G(x)$ represent the local density of potential targets.
Rather than look for features (demographic, geographic) to predict it, we can use historical data to measure it.
$G(x)$ could then be calculated in the same fashion as hot spots; e.g. by kernel density parameter estimation.
Issues with boundary conditions
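A kernel density estimate of the target density from historical crime locations might look like the sketch below. The Gaussian kernel, the fixed bandwidth, the coordinates and the function name are assumptions, and no boundary correction is applied.

```python
import math

def target_density(x, historical_crimes, bandwidth=1.0):
    """Gaussian kernel density estimate of G at the point x."""
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(historical_crimes))
    return norm * sum(
        math.exp(-((x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2) / (2 * bandwidth ** 2))
        for c in historical_crimes
    )

# Hypothetical historical crime sites: a cluster near the origin and one outlier.
past = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (5.0, 5.0)]
print(target_density((0.0, 0.0), past), target_density((5.0, 5.0), past))
```

The estimate is larger near the cluster than near the isolated site, which is the behavior one would want from a target density.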
Geography
No calibration is required if $G(x)$ is calculated in this fashion.
An analyst can determine what historical data should be used to generate the geographic target density function.
Different crime types will necessarily generate different functions $G(x)$.
Strengths of this Framework
All of the assumptions on criminal behavior are made in the open.
They can be challenged, tested, discussed and compared.
Strengths
The framework is extensible.
Vastly different situations can be modelled by making different choices for the form and structure of $P(x; z)$.
e.g. angular dependence, barriers.
The framework is otherwise agnostic about the crime series; all of the relevant information must be encoded in $P(x; z)$.
Strengths
This framework is mathematically rigorous.
There are mathematical and criminological meanings to the maximum likelihood estimate $\hat{z}_{\mathrm{mle}}$.
Weaknesses of this Framework
GIGO
The method is only as accurate as the accuracy of the choice of $P(x; z)$.
It is unclear what the right choice is for $P(x; z)$.
Even with the simplifying assumption that
\[ P(x; z) = D(d(x, z)) \, G(x) \, N(z) \]
this is difficult.
Weaknesses
There is no simple closed mathematical form for $\hat{z}_{\mathrm{mle}}$.
Relatively complex techniques are required to estimate $\hat{z}_{\mathrm{mle}}$ even for simple choices of $P(x; z)$.
The error analysis for maximum likelihood estimators is delicate when the number of data points is small.
Weaknesses
The framework assumes that crime sites are independent, identically distributed random variables.
This is probably false in general!
This should be a solvable problem though...
Weaknesses
We only produce the point estimate of $\hat{z}_{\mathrm{mle}}$.
Law enforcement agencies do not want "X Marks the Spot".
A search area, rather than a point estimate, is far preferable.
This should be possible with some Bayesian analysis.
Questions?
Contact information:
Dr. Mike O'Leary
Director, Applied Mathematics Laboratory
Towson University
Towson, MD 21252
410-704-7457
moleary@towson.edu
