
2010 Mathematical Contest in Modeling (MCM) Control Sheet

Advisor Name: Deng Weijun
Department: School of Electrical and Mechanical
Institution: Central South University
Address: School of Electrical and Mechanical, Changsha, Hunan, P.R. China
Changsha, Hunan 410083
Phone: 8613517486903
Fax: 86-0731-88660172
Email: yeatszone@gmail.com
Home Phone: 86-0731-88660731

Your team's control number is: 6906
(Place this control number on all pages of your solution paper and on any support material.)

Problem Chosen: B

Team Member

Gender

Wang Jiaqi

Luo Hao

Yang Li

Each team member must sign the statement below:


(Failure to obtain signatures from each team member will result in disqualification of the
entire team.)
Each of us hereby testifies that our team abided by all of the contest's rules and did not
consult with anyone who was not on this team in developing the enclosed solution paper.

Mailing Address and Signature of Wang Jiaqi

Mailing Address and Signature of Luo Hao

Mailing Address and Signature of Yang Li

This signed original must be stapled to the top of one copy of your team's solution paper.

Team Control Number


For office use only
T1 ________________
T2 ________________
T3 ________________
T4 ________________

6906
Problem Chosen

For office use only


F1 ________________
F2 ________________
F3 ________________
F4 ________________

An Approach to Generating Geographic Profiles

Abstract
This paper determines the geographical profile of a suspected serial criminal from the locations of the crimes, and predicts the possible locations of the next crime from the times and locations of past crime scenes. To answer these questions we consider, among other factors, the distance from the offender's residence to the target and the features of the crime sites. We then establish the following mathematical models.
Scheme 1, Hit score model
First, we discuss alternative distance decay functions, which describe the relationship between the target preference of a serial criminal and the distance he or she must travel from the residence to the target, following existing studies such as Rossmo, Canter, and the linear function. After comparing the strengths and weaknesses of the existing functions, and taking practice into account, we choose the truncated negative exponential. We then introduce a hit score function (9). Combining these, we obtain the hit score function S(y) in (10). Finally, we illustrate the implementation of the scheme using collected data (Section 2.4).
Scheme 2, Probability model
From a different angle, we take the probability associated with each location as the starting point of the forecast. We regard the anchor point to be predicted as a random variable. After a rigorous mathematical derivation, each point is assigned a probability; the points with relatively high probability form the region of the geographical profile. We again illustrate the implementation of the scheme using collected data.

Team #6906

page 2 of

26

Combination
First, model 1 assigns a hit score to every point in the area of the recorded crimes; the points with high scores form a possible anchor region, denoted region 1. Likewise, model 2 assigns a probability to every such point; the points with high probability form a second possible anchor region, denoted region 2.
Second, the points that lie in both region 1 and region 2 are the most likely anchor points, and together they form the combined region.
Third, following the circumference theory of criminal psychology, we draw a circle around these most likely anchor points. The locations inside the circle are where the offender is most likely to commit the next crime. To improve the accuracy of the forecast, we also consider the characteristics of the sites: we introduce a scoring function for location-specific features, use it to select the features that are most likely to lead to crime, and then identify the locations inside the circle that have these features. These locations need special attention.

Key words: Hit score model; Probability model; Prediction; Bayes' Theorem; Circumference theory


The executive summary


To help reduce the cost and time of finding a serial criminal, we develop a method that generates a geographical profile and a useful prediction for law enforcement officers more accurately and quickly.

An overview of the potential issues

The conflict between the need for a large amount of location data to locate the criminal and the need to stop the criminal as early as possible.
The conflict between the need to search every possible place to find the criminal and the need to reduce cost.
The criminal may not have an anchor point.
How to account for the influence of different geographic features on the criminal.

An overview of the approach


We have developed a method that generates a geographical profile for both the anchor point and the possible locations of the next crime, using evidence about the features of the environment in addition to the crime locations. The approach uses two different schemes to generate a geographical profile.
Compared with other approaches, this approach has the following advantages:
The geographical profile we provide helps the police search for both the possible anchor point and the possible locations of the next crime at the same time. This effective way of searching helps reduce the cost and time of the search for the criminal.
We take the environment into consideration to make the prediction of the location of the next crime more realistic.
The models used in the approach are based on criminal behavior theory, which makes the approach more reliable.
The whole approach takes the following steps:
Step 1. Obtain the anchor points from the result of the first scheme.
Step 2. Obtain the anchor points from the result of the second scheme.
Step 3. Combine the two sets of anchor points using their intersection and union.
Step 4. Find the centroid of the zone obtained in Step 3.
Step 5. Predict the next crime sites using the distance the criminal is willing to travel to commit a crime.
Step 6. Take the environment into consideration to narrow the area from Step 5 and give a better prediction of the next crime site.
To make the approach effective, we need the police to provide the locations of the crime scenes and detailed environmental information.


In the following situations, the approach is appropriate
If the criminal has an anchor point, we can generate a useful geographical profile whether or not the choice of crime sites is random. The approach gives a useful prediction if the zone containing the crime scenes is not too large. However complicated the environment is, we can select the next possible crime sites effectively.
In the following situations, the approach is not appropriate
The criminal does not have an anchor point; our approach assumes that there is one.
The crime scenes are spread over too large a zone.
The choices of crime sites are too unusual, in which case the approach may not give a good prediction.


Content
Abstract
The executive summary
I. Introduction
1.1 Geographic Profiling
1.2 Spatial Event Prediction
1.3 Object
II. Scheme 1 Hit score model
2.1 Assumptions
2.2 Notations and Definitions
2.3 Geographic profiling methods
2.3.1 Decay functions
2.3.2 Hit score function
2.3.3 The foundation of the model
2.4 Example: John Duffy, the Railway Killer
2.5 Evaluation of the model
2.5.1 Shortcomings
2.5.2 Advantages
III. Scheme 2 Probability model
3.1 Model overview
3.2 Assumptions
3.3 The building of the model
3.4 Example: burglary in Los Angeles
3.5 Evaluation of the model
3.5.1 Strengths of this Framework
3.5.2 Weaknesses
IV. Combination and prediction
4.1 Overview
4.2 Combination
4.3 Prediction
4.4 The improvement of the prediction
4.4.1 Model Overview
4.4.2 Building the model
4.4.3 The realization of the model
V. References


I. Introduction:
Although tactical crime analysis has been continually improving investigation efforts,
serial crimes still pose a great challenge to police officers and investigators alike.
These cases often go unsolved because arduous investigations are required. The
primary reason for the investigation complexity is that the offender is often a stranger
to the victim. Devoid of any tie between the victim and the offender, detectives are
left with no substantial leads, thus forcing them to consider large populations of
potential suspects. Such large suspect populations strain police resources, lead to
resource allocation problems, and lower the likelihood of apprehending the
perpetrator.

1.1 Geographic Profiling


To limit large suspect populations for serial cases, crime analysts have turned to
geographic profiling. Geographic profiling is an investigative methodology that uses
the locations of a connected series of crimes to determine the most probable area of
offender residence. By using geographic profiling, crime analysts are able to focus
their search to specific areas where the probability of offender residence is high.

1.2 Spatial Event Prediction


In addition to using geographic profiling to optimize the allocation of police resources,
crime analysts also utilize crime forecasting. Crime forecasting, or specifically, the
methodology of crime event prediction, is an investigative methodology that uses the
locations and location features of a set of prior crimes to determine the probable areas
of future crimes. The definitive work on this subject was done by Brown and Liu. Given
a set of possible characteristics or features, their methodology attempts to identify a
subset of the features that are most strongly correlated with crime incidents in a
historical data set and discover the pattern of preferences for each of these features.
These inferences are then used to generate the likelihood of another incident occurring
within a geographic region and a specified time range.

1.3 Object
The primary goal of this paper is to define a geographic profiling methodology that
improves on existing methodologies by combining known journey-to-crime theories
and methodologies with crime forecasting theories and methodologies.


II. Scheme 1 Hit score model


2.1 Assumptions:
1) Crime sites: We presume that we are working with a series of n linked crimes,
and the crime sites under consideration are labeled $x_1, x_2, \dots, x_n$.
2) Anchor point: We presume that the offender's anchor point can be the offender's
home, place of work, or some other location of importance to the offender.
3) Distance: There are many reasonable choices for this metric, including the
Euclidean distance, the Manhattan distance, the total street distance following the
local road network, or the total time to make the trip while following the local
road network. In this paper we choose the Manhattan distance.

2.2 Notations and Definitions:


x: A point x has two components, $x = (x^{(1)}, x^{(2)})$. These can be latitude and
longitude, or distances from a fixed pair of perpendicular reference axes.
z: We use the symbol z to denote the offender's anchor point.
d: We use the symbol d to denote the distance between two points.
d(x, y): We let d(x, y) denote the distance between the points x and y under the chosen metric.

2.3 Geographic profiling methods


2.3.1 Decay functions
Studies have shown that the target preference of a serial criminal depends on
the distance he or she must travel from the residence to the target. Further research
has identified this relationship as the Journey to Crime Theory. This theory states that
a criminal's propensity to commit crime decreases exponentially with increasing
distance from his or her home or workplace.
1) Rossmo's method, as described in (Rossmo, 2000, Chapter 10), chooses the
Manhattan distance function for d and the decay function

$$f(d) = \begin{cases} \dfrac{k}{d^{h}} & \text{if } d \ge B, \\[2mm] \dfrac{k B^{g-h}}{(2B-d)^{g}} & \text{if } d < B. \end{cases} \tag{1}$$

Figure 1

We remark that Rossmo also considers the possibility of forming hit scores by
multiplication; see (Rossmo, 2000, p. 200).
2) The method described in Canter, Coffey, Huntley, and Missen (2000) is to use a
Euclidean distance, and to choose either a decay function of the form

$$f(d) = A e^{-\beta d} \tag{2}$$

or functions with a buffer and plateau, of the form

$$f(d) = \begin{cases} 0 & \text{if } d < A, \\ 1 & \text{if } A \le d < B, \\ C e^{-\beta d} & \text{if } d \ge B. \end{cases} \tag{3}$$

Figure 2

3) The CrimeStat program described in Levine (2009a) uses Euclidean or spherical
distance and gives the user a number of choices for the decay function, including

Linear:
$$f(d) = A + Bd, \tag{4}$$
Negative exponential:
$$f(d) = A e^{-\beta d}, \tag{5}$$
Normal:
$$f(d) = \frac{A}{\sqrt{2\pi S^{2}}} \exp\left[-\frac{(d - \bar{d})^{2}}{2S^{2}}\right], \tag{6}$$
Lognormal:
$$f(d) = \frac{A}{d\sqrt{2\pi S^{2}}} \exp\left[-\frac{(\ln d - \bar{d})^{2}}{2S^{2}}\right], \tag{7}$$
and
Truncated negative exponential:
$$f(d) = \begin{cases} Bd & \text{if } d < d_p, \\ A e^{-\beta d} & \text{if } d \ge d_p. \end{cases} \tag{8}$$

CrimeStat also allows the user to use empirical data to create a different decay
function matching a set of provided data, as well as the use of indirect distances.
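The decay functions (4)-(8) can be sketched in Python as follows. This is a minimal illustration, not CrimeStat's implementation; the function names and the parameter names (`beta`, `mean_d`, `mean_ln_d`, `d_p`) are assumptions chosen to mirror the symbols above.

```python
import math

def linear(d, A, B):
    # Equation (4): constant change B per unit of distance.
    return A + B * d

def negative_exponential(d, A, beta):
    # Equation (5): A is the value at d = 0, beta the decay exponent.
    return A * math.exp(-beta * d)

def normal(d, A, mean_d, S):
    # Equation (6): peak at the mean distance mean_d, spread S.
    return A / math.sqrt(2 * math.pi * S**2) * math.exp(-(d - mean_d)**2 / (2 * S**2))

def lognormal(d, A, mean_ln_d, S):
    # Equation (7): defined only for d > 0.
    return (A / (d * math.sqrt(2 * math.pi * S**2))
            * math.exp(-(math.log(d) - mean_ln_d)**2 / (2 * S**2)))

def truncated_negative_exponential(d, A, B, beta, d_p):
    # Equation (8): linear rise up to the peak distance d_p, exponential decay after it.
    return B * d if d < d_p else A * math.exp(-beta * d)
```

For example, `truncated_negative_exponential(1.0, A=2.0, B=0.5, beta=0.3, d_p=2.0)` evaluates the linear branch, because 1.0 is below the peak distance.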

2.3.2 Hit score function


Existing algorithms begin by first making a choice of distance metric d; they then
select a decay function f and construct a hit score function S(y) by computing

$$S(y) = \sum_{i=1}^{n} f(d(x_i, y)) = f(d(x_1, y)) + f(d(x_2, y)) + \cdots + f(d(x_n, y)) \tag{9}$$

where $x_1, x_2, \dots, x_n$ are the crime locations, f is a decay function and d is a distance metric.
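The hit score sum in (9) can be sketched as below for a single candidate point y. The crime sites and the decay function exp(-d) are made-up illustrative choices, and the helper names `manhattan` and `hit_score` are assumptions, not part of any existing package.

```python
import math

def manhattan(p, q):
    # Manhattan distance between two points given as (x, y) pairs.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def hit_score(y, crime_sites, decay, distance=manhattan):
    # Equation (9): sum the decay function over the distances to all crime sites.
    return sum(decay(distance(x, y)) for x in crime_sites)

# Illustrative data: three crime sites, each at Manhattan distance 2 from (1, 1).
sites = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
score = hit_score((1.0, 1.0), sites, decay=lambda d: math.exp(-d))
```

Since every site in this example is at distance 2 from the candidate point, the score equals 3 exp(-2).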

2.3.3 The foundation of the model

The linear function and the negative exponential are the simplest types of distance
model. They postulate that the likelihood of committing a crime at any particular
location declines with distance from the offender's home. The likelihood is highest
near the offender's home; for the linear function it drops off by a constant amount for
each unit of distance until it falls to zero, while for the negative exponential it drops
off at a constant rate.
The normal distribution and the lognormal function assume the peak likelihood is
at some optimal distance from the offender's home base. Thus, the function rises
to that distance and then declines. For the normal function, the rate of increase prior
to the optimal distance and the rate of decrease from that distance are symmetrical in
both directions; the lognormal function is skewed toward longer distances.
The truncated negative exponential is a joined function made up of two distinct
mathematical functions: the linear and the negative exponential. This function is the
closest approximation to the Rossmo model; however, it differs in several mathematical
properties. First, the near home base function is linear, rather than a non-linear
function. It assumes a simple increase in travel likelihood with distance from the home
base, up to the edge of the safety zone. Second, the distance decay part of the function
is a negative exponential, rather than an inverse distance function; consequently, it is
more stable when distances are very close to zero (e.g., for a crime where there is no
near home base offset).
In practice, the offender usually keeps a buffer zone when choosing a site at which to
commit a crime, and the offender's willingness to travel resembles the truncated negative
exponential. Hence, we use the truncated negative exponential.
Following the existing algorithms, we choose the truncated negative exponential (8) as the
decay function and use the hit score function (9).
Combining these, we then obtain the expression

$$S(y) = \sum_{i=1}^{n} f(d_i), \qquad f(d_i) = \begin{cases} B d_i & \text{if } d_i < d_p, \\ A e^{-\beta d_i} & \text{if } d_i \ge d_p, \end{cases} \qquad d_i = d(x_i, y), \tag{10}$$

where $d_i$ is the distance from the candidate home base y to the crime site $x_i$, B is the
slope of the linear function, and for the negative exponential function A is a coefficient
and $\beta$ is the exponent. Since the negative exponential only starts at a particular
distance, $d_p$, A is assumed to be the intercept if the Y-axis were transposed to that
distance. Similarly, the slope of the linear function is estimated from the peak distance,
$d_p$, by a peak likelihood function.
Regions with a high hit score are considered to be more likely to contain the
offender's anchor point than regions with a low hit score.
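As a sketch of how (10) yields a search region, the code below evaluates the hit score on a grid and keeps the highest-scoring cells. The crime sites, the parameter values, the 5% cutoff and the helper names are all illustrative assumptions; in particular, choosing A so that the two branches meet at the peak distance is a convenience of this sketch, not something stated above.

```python
import numpy as np

def truncated_neg_exp(d, A, B, beta, d_p):
    # Vectorised form of the decay function in (10).
    d = np.asarray(d, dtype=float)
    return np.where(d < d_p, B * d, A * np.exp(-beta * d))

def hit_score_grid(crime_sites, xs, ys, A, B, beta, d_p):
    # Hit score S(y) at every grid node, using the Manhattan distance.
    gx, gy = np.meshgrid(xs, ys)
    score = np.zeros_like(gx, dtype=float)
    for cx, cy in crime_sites:
        d = np.abs(gx - cx) + np.abs(gy - cy)
        score += truncated_neg_exp(d, A, B, beta, d_p)
    return score

def top_region(score, fraction=0.05):
    # Boolean mask of the cells whose score is in the top `fraction`.
    threshold = np.quantile(score, 1.0 - fraction)
    return score >= threshold

# Hypothetical crime sites on a 10 x 10 map (arbitrary units).
sites = [(2.0, 2.0), (6.0, 3.0), (4.0, 7.0)]
xs = np.linspace(0.0, 10.0, 51)
ys = np.linspace(0.0, 10.0, 51)
B, beta, d_p = 1.0, 0.5, 1.0
A = B * d_p * np.exp(beta * d_p)   # assumption: branches are continuous at d_p
S = hit_score_grid(sites, xs, ys, A, B, beta, d_p)
mask = top_region(S, fraction=0.05)
```

The mask marks the cells that a search would visit first; lowering `fraction` narrows the region at the risk of excluding the true anchor point.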

2.4 Example: John Duffy, the Railway Killer


Next we illustrate the implementation of our methodology using 18 crime site
locations connected to John Duffy, the Railway Killer.
A three-dimensional surface is produced when the probability for every point on the
map is calculated. This surface can be represented by an isopleth or fishnet map,
with different scores on the Z-axis representing probability density.

Figure 3. Isopleth map: Railway Killer

Alternatively, the probability surface can be viewed from a top-down perspective and
depicted through a two-dimensional choropleth map.

Figure 4. Choropleth map: Railway Killer

Figure 5. Geoprofile map


2.5 Evaluation of the model


2.5.1 Shortcomings:
1) These techniques are all ad hoc.
2) The distribution varies by type of crime.
For some crimes, it is very difficult to fit any single function. Figure 6
shows the frequency distribution of 137 homicides with three functions
fitted to the data: the truncated negative exponential, the lognormal, and the
normal. As can be seen, each function fits only some of the data, but not all of it.

Figure 6

Other types of crime are also very difficult to fit. The next figure
shows the distribution of bank robberies. Partly because there were a limited
number of cases (N=176) and partly because it is a complex pattern, the truncated
negative exponential gave the best fit, but not a particularly good one. As can be
seen, the linear (near home) function underestimates some of the near-distance
likelihoods while the negative exponential drops off too quickly; in fact, to make
this function even plausible, the regression was run only up to 21 miles
(otherwise, it underestimated even more).


Figure.6

3) The convex hull effect:


The predicted anchor point always lies inside the convex hull of the crime locations, as
shown in the following figure.

Figure.7


2.5.2 Advantages:
1) The decay function we chose is realistic.
In practice, the offender usually keeps a buffer zone when choosing a site at which to
commit a crime, and the offender's willingness to travel resembles the truncated negative
exponential. Hence, we use the truncated negative exponential.
2) The model provides a search strategy for law enforcement.
By examining what type of function best fits a certain type of crime, police can target
their search efforts more efficiently. The model is relatively easy to implement and is
practical.
3) The mathematical formulation is stable.
Unlike the inverse distance function in the Rossmo model, equation (10) does not
have problems associated with distances that are close to 0.
4) The model provides a search strategy for identifying an offender.
It is a useful tool for law enforcement officers, particularly as they frame a search for
a serial offender.

III. Scheme 2 Probability model

3.1 Model overview
A deeper understanding of the probability of crime locations would provide valuable
insight into the geographic profiling problem. By creating a framework that incorporates
the contributions of location and time, we can estimate the probability of a crime being
committed at each location. From these probabilities we can generate a geographical profile.

3.2 Assumptions
To streamline our model we make several key assumptions:
1) The events are independent and all crimes were committed by the same
person.
2) The criminals are uniformly distributed amongst all residences within the city, and
furthermore all houses are equally attractive.


3.3 The building of the model


We now take a probabilistic point of view and develop a kinetic model of
criminal behavior that will be used to derive geographic profiling estimates. The
model assumes a foraging behavior for the criminal [2, 17, 9], though in general the
type of model should depend on the crime type. As proposed in [17], we start by
introducing a (stationary) spatial attractiveness field A(y) >= 0, reflecting how
attractive the target positioned at y is to a criminal also positioned there. The
attractiveness field determines the rate at which criminals commit their crimes.
(1) First, letting y(t) denote the position of a criminal at time t, we model the movement
of the criminal by the stochastic differential equation

$$\frac{dy}{dt} = \mu(y) + \sqrt{2D}\, R_t \tag{11}$$

where $R_t$ is a white noise, i.e. $\langle R_t \rangle = 0$, $\mu$ is the drift and D is the
diffusion parameter. The drift term can be neglected in the case of unbiased motion or
could be used to describe more complex criminal behaviors.
For instance, it has been suggested that criminals may modify their movements
towards regions of higher attractiveness when selecting their targets. This type of
behavior could be incorporated into (11) through a gradient term of the form
$\mu \propto \nabla A$ or a non-local potential involving the attractiveness field.

Since the anchor point can be viewed as the location from which criminals begin their
search for a target, we take y(0) = z as the initial condition for (11). A crime is then
committed at y(t) according to the killing measure A(y(t)), the probability per unit
time that the Brownian trajectory given by (11) is terminated at the space-time point
y(t).
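The random walk with killing described above can be simulated with a simple Euler-Maruyama scheme. The sketch below is only an illustration under stated assumptions: the time step `dt`, the constant attractiveness field, the value of D and the function name `simulate_crime_site` are all hypothetical choices, and the stopping rule approximates the killing rate A(y) by a probability A(y) dt per step.

```python
import numpy as np

def simulate_crime_site(z, attractiveness, D, dt=0.01, max_steps=100_000,
                        drift=None, rng=None):
    """Walk from the anchor point z until a crime occurs; return the crime site."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(z, dtype=float).copy()
    for _ in range(max_steps):
        # A crime is committed during this step with probability about A(y) * dt.
        if rng.random() < attractiveness(y) * dt:
            return y
        mu = drift(y) if drift is not None else np.zeros(2)
        # Euler-Maruyama update of dy/dt = mu(y) + sqrt(2 D) R_t.
        y = y + mu * dt + np.sqrt(2.0 * D * dt) * rng.standard_normal(2)
    return None  # no crime within the simulated time horizon

# Illustrative run: homogeneous attractiveness A = 1, no drift, D = 0.5.
rng = np.random.default_rng(0)
sites = [simulate_crime_site((0.0, 0.0), lambda y: 1.0, D=0.5, rng=rng)
         for _ in range(400)]
```

With a homogeneous field the simulated crime sites scatter symmetrically around the anchor point; an inhomogeneous `attractiveness` function would concentrate them in attractive areas.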
(2) Second, given that a criminal starts the random walk governed by (11) from the
anchor point z, the transition (survival) probability density $\rho(x, t \mid z)$ of the position
of the criminal satisfies the Fokker-Planck equation

$$\frac{\partial \rho}{\partial t} = \nabla \cdot (D \nabla \rho) - \nabla \cdot (\mu(x)\rho) - A(x)\rho, \tag{12}$$

$$\rho(x, 0 \mid z) = \delta(x - z), \tag{13}$$

where $\nabla$ is with respect to the variable x.

Integrating (12)-(13) in time, the probability density of where the crime is committed
is then determined by

$$P(x \mid z) = A(x)\rho(x \mid z), \tag{14}$$

where

$$\rho(x \mid z) = \int_0^{\infty} \rho(x, t \mid z)\, dt$$

solves the elliptic partial differential equation

$$\nabla \cdot (D \nabla \rho) - \nabla \cdot (\mu(x)\rho) - A(x)\rho = -\delta(x - z). \tag{15}$$

We will refer to this equation as the "forward" equation. Given the prior distribution
of criminal anchor points, P(z), the geographic profiling distribution can then be
determined using Bayes' Theorem,

$$P(z \mid x) = \frac{P(x \mid z)P(z)}{\int_{\mathbb{R}^2} P(x \mid z)P(z)\,dz} = \frac{A(x)\rho(x \mid z)P(z)}{\int_{\mathbb{R}^2} A(x)\rho(x \mid z)P(z)\,dz} = \frac{\rho(x \mid z)P(z)}{\int_{\mathbb{R}^2} \rho(x \mid z)P(z)\,dz}. \tag{16}$$

Since $\rho(x \mid z)$ is the Green's function corresponding to the linear operator on the left
side of (15), for fixed x and varying z the function $f(z) = \rho(x \mid z)$ solves the backward
or adjoint equation

$$\nabla \cdot (D \nabla f) + \mu(z) \cdot \nabla f - A(z) f = -\delta(z - x), \tag{17}$$

where $\nabla$ is now with respect to the variable z. Thus the geographic profiling
density can be efficiently computed in practice by solving the backward equation
given by (17), where the point mass on the right hand side is located at the scene of
the crime, and then multiplying by the prior distribution of anchor points and
normalizing. We note that the drift $\mu$ changes sign going from the forward equation (15) to
the backward equation (17). This has practical implications, for if criminals move up
gradients of attractiveness then police investigations starting from the scene of the
crime should move down gradients of attractiveness.
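The procedure just described (solve the backward equation, multiply by the prior, normalize) can be sketched on a small grid. The code below is an illustration only: it neglects drift, uses a five-point finite-difference Laplacian with f = 0 outside the grid, and solves the linear system with a dense solver rather than a production method. The grid size, the parameter values, the housing density array `H` and the function names are hypothetical.

```python
import numpy as np

def solve_backward(A_field, D, h, crime_idx):
    """Solve D * Laplacian(f) - A * f = -delta on a grid, with f = 0 outside it."""
    n_rows, n_cols = A_field.shape
    N = n_rows * n_cols
    M = np.zeros((N, N))
    rhs = np.zeros(N)

    def k(i, j):
        return i * n_cols + j

    for i in range(n_rows):
        for j in range(n_cols):
            M[k(i, j), k(i, j)] = -4.0 * D / h**2 - A_field[i, j]
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n_rows and 0 <= jj < n_cols:
                    M[k(i, j), k(ii, jj)] = D / h**2
    # Discrete point mass at the crime scene.
    rhs[k(*crime_idx)] = -1.0 / h**2
    return np.linalg.solve(M, rhs).reshape(n_rows, n_cols)

def geographic_profile(f, prior):
    # Multiply by the prior of anchor points and normalise to sum to one.
    w = f * prior
    return w / w.sum()

# Hypothetical housing density with a block that contains no houses.
H = np.ones((21, 21))
H[3:8, 12:18] = 0.0
crime = (10, 10)
f = solve_backward(A_field=1.0 * H, D=0.5, h=0.5, crime_idx=crime)
profile = geographic_profile(f, prior=H)
```

Cells without housing receive zero probability through the prior, while the attractiveness field shapes how quickly f decays away from the crime scene.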
A similar procedure can be carried out for multiple crimes. Based on
assumption 1), the geographic profiling density for multiple events is given by

$$P(z \mid x_1, \dots, x_N) = \frac{P(x_1 \mid z) \cdots P(x_N \mid z)\, P(z)}{\int_{\mathbb{R}^2} P(x_1 \mid z) \cdots P(x_N \mid z)\, P(z)\,dz} = \frac{\prod_{i=1}^{N} f_i(z)\, P(z)}{\int_{\mathbb{R}^2} \prod_{i=1}^{N} f_i(z)\, P(z)\,dz}, \tag{18}$$

where $f_i(z)$ solves

$$\nabla \cdot (D \nabla f_i) + \mu(z) \cdot \nabla f_i - A(z) f_i = -\delta(z - x_i). \tag{19}$$
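Once the individual solutions are available on a common grid, the product in (18) is straightforward to evaluate. The sketch below works in log space to avoid underflow when many crimes are multiplied together; the function name `multi_crime_profile` and the three-cell arrays are illustrative assumptions.

```python
import numpy as np

def multi_crime_profile(f_list, prior):
    """Normalised product of the f_i and the prior, computed in log space."""
    with np.errstate(divide="ignore"):
        log_w = np.log(prior) + sum(np.log(f) for f in f_list)
    log_w = log_w - log_w.max()     # rescale before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Toy example on three cells with a flat prior.
f1 = np.array([1.0, 2.0, 1.0])
f2 = np.array([1.0, 2.0, 1.0])
prior = np.full(3, 1.0 / 3.0)
single = multi_crime_profile([f1], prior)      # [0.25, 0.5, 0.25]
double = multi_crime_profile([f1, f2], prior)  # [1/6, 2/3, 1/6]
```

The second profile is more concentrated on the middle cell than the first, which is the localizing effect of the product.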

Also, a buffer zone could be incorporated into the forward equation through

$$P(x \mid z) = 1_{\{|x - z| \ge r\}}\, A(x)\rho(x \mid z), \tag{20}$$

where $\rho(x \mid z)$ solves the modified forward equation

$$\nabla \cdot (D \nabla \rho) - \nabla \cdot (\mu(x)\rho) - 1_{\{|x - z| \ge r\}}\, A(x)\rho = -\delta(x - z). \tag{21}$$

Here the idea is that criminals may leave a buffer zone of radius r around their anchor
point, within which they do not commit any crimes. The backward equation is then
given by

$$\nabla \cdot (D \nabla f) + \mu(z) \cdot \nabla f - 1_{\{|x - z| \ge r\}}\, A(z) f = -\delta(z - x), \tag{22}$$

and the geographic profiling density is

$$P(z \mid x) = \frac{1_{\{|x - z| \ge r\}}\, f(z) P(z)}{\int_{\{|x - z| \ge r\}} f(z) P(z)\,dz}. \tag{23}$$

3.4 Example: burglary in Los Angeles


Next we illustrate the implementation of our methodology using burglary data
collected by the Los Angeles Police Department in 2004 within a 10 km by 10 km
region of the San Fernando Valley. The data consist of 45 locations, $x_i$, where the
burglaries occurred and the 45 corresponding residences, $z_i$, of the offenders
(presumed to be their anchor points).
We first fit the parameters of the competing models using maximum likelihood
estimation. Here we assume that the motion of each criminal in the city is governed by
the stochastic differential equation (11) with the same diffusion parameter D. In
general, given historical distance to crime data on multiple offenders (each of whom
committed multiple crimes), a prior distribution $\pi(D)$ for the diffusion parameter
could be estimated and incorporated into the modeling framework,

$$P(z \mid x_1, \dots, x_N) \propto \int P(x_1, \dots, x_N \mid z, D)\, P(z)\, \pi(D)\, dD, \tag{24}$$

as discussed in [14]. In the models we also assume that criminals diffuse without drift
($\mu = 0$). Based on assumption 2), both the attractiveness field $A(x) = A_0 H(x)$
and the prior distribution $P(z) = P_0 H(z)$ are taken to be proportional to housing
density H. In practice P(z) could be augmented with information from police
databases on the residences of past offenders, parolees, and suspects in other crimes.
This model has one effective parameter, which is found by maximizing the log
likelihood function

$$\max \sum_{i=1}^{45} \log\big(P(z_i \mid x_i)\big). \tag{25}$$

For each event $x_i$ and parameter value, the backward equation (19) is solved using
multigrid on an 18 km by 18 km domain, with a 128x128 resolution and Dirichlet
boundary conditions. Here a buffer of 4 km on each side of the data set is used to avoid
boundary effects. The log likelihood function (25) is then calculated for each parameter
value using

$$P(z_i \mid x_i) = \frac{f_i(z_i) P(z_i)}{\int f_i(z) P(z)\,dz}. \tag{27}$$
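To make the fitting step concrete, the sketch below carries out the same maximum likelihood idea in a deliberately simplified setting, not the 2D multigrid computation described above. It assumes one space dimension, no drift, a homogeneous attractiveness $A_0$ and a flat prior, in which case the backward equation $D f'' - A_0 f = -\delta(z - x)$ has the solution $f(z) \propto e^{-\kappa |z - x|}$ with $\kappa = \sqrt{A_0 / D}$, so that $P(z \mid x) = (\kappa/2)\, e^{-\kappa |z - x|}$. The distances and the search grid are made-up values.

```python
import numpy as np

def log_likelihood(kappa, distances):
    # Sum over events of log P(z_i | x_i) with P = (kappa / 2) * exp(-kappa * r_i).
    r = np.asarray(distances, dtype=float)
    return r.size * np.log(kappa / 2.0) - kappa * r.sum()

def fit_kappa(distances, grid):
    # Grid search for the parameter value that maximises the log likelihood.
    values = np.array([log_likelihood(kp, distances) for kp in grid])
    return grid[values.argmax()], values.max()

# Made-up distances |z_i - x_i| between residences and crime sites, in km.
distances = np.array([0.4, 1.1, 0.7, 2.3, 0.9])
grid = np.linspace(0.05, 5.0, 991)          # grid step 0.005
kappa_hat, best_value = fit_kappa(distances, grid)
```

In this simplified case the maximizer is also available in closed form, $\hat{\kappa} = N / \sum_i r_i$, which gives a check on the grid search.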

In Table 1 we list the maximized log likelihood values for the model. As expected,
taking A to be homogeneous severely limits the accuracy of the model since a great
deal of probability density is distributed in areas where there are no houses. Thus it is
important to accurately model criminal target selection through A(x), instead of using an ad
hoc approach that only incorporates geographic inhomogeneities through P(z).

Table 1: Maximized log likelihood values of competing models

Model                                    Maximized log likelihood
A(z) homogeneous, P(z) inhomogeneous     -231.996
In Figure 8 we plot the housing density in the region of the San Fernando
Valley considered in this study, and in Figures 9 and 10 we plot several geographic
profiles. Figure 8 illustrates the spatial inhomogeneity of
housing density: there are several regions with high density and
other regions, including commercial zones, parks, and mountains, void of housing. In
Figure 9 we plot the geographic profile using the best fit parameters of the
model.
In Figure 10, we provide examples of geographic profiles corresponding to the model in
the case of multiple crimes and when a buffer zone is included. The probability
density is much more localized in the case of multiple crimes due to the product in
Equation (18). We note that this is not the case when a summation of the form (9) is
used instead.


Figure 8: Housing density for the 18 km by 18 km region of the San Fernando Valley used in this
study. Center regions void of housing correspond to a commercial area and a park and lower
regions void of houses correspond to mountains.

Figure 9: Geographic profiles for each model with best fit parameters.

Figure 10: Geographic profiles (plotted on a logarithmic scale) for two crimes (top) and one crime
with a buffer zone (bottom) using model 3 with best fit parameters.

Figure 11: Histogram of distances to crime for 2004 data (top) and simulated data
using model with best fit parameters (bottom).

3.5 Evaluation of the model


3.5.1 Strengths of this Framework

All of the assumptions on criminal behavior are made in the open. They can be
challenged, tested, discussed and compared.

3.5.2 Weaknesses

The method is only as accurate as the choice of P.

It is unclear what the right choice for P is. Even with the simplifying assumption,
choosing it is difficult.

The framework assumes that crime sites are independent, identically distributed
random variables. This is probably false in general.

IV. Combination and prediction


4.1 Overview
The results of the two schemes are, to a certain degree, accurate and are useful for
determining the geographical profile of a suspected serial criminal. To
make our result more precise, we combine the two results using their intersection
and their union, and we consider the intersection before the union.

4.2 Combination
Step 1. Obtain the anchor points from the result of the first scheme.
According to the first scheme, we can produce a three-dimensional surface once
the hit score of every point on the map is calculated. This surface can be
represented by an isopleth or fishnet map with different scores on the Z-axis
representing probability density (Garson & Biggs, 1992, pp. 48-52). Such maps, a form
of virtual reality (in the term's original sense), may be generated through
computer-aided mathematical visualization techniques. We then fix a constant
threshold for the hit score, chosen from experience; the region it delimits is the
geographical profile we want.
Step 2. Obtain the anchor points from the result of the second scheme.
As in Step 1, we can produce a three-dimensional surface once the
probability of every point on the map is calculated. We then fix a constant
threshold for the probability, chosen from experience; the region it delimits is the
geographical profile we want.

Step 3. Combine the two results using the intersection and the union.
First, search the intersection of the regions given by the first scheme and the second one.
If the offender is not found there, extend the search to the union of the regions from the two
schemes.
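The three steps can be sketched in Python as follows. This is only an illustration under stated assumptions: both surfaces are taken to be sampled on the same grid, a cell belongs to a region when its value is at least the threshold, and the names `region_above` and `combine_profiles` as well as the sample numbers are hypothetical.

```python
def region_above(surface, threshold):
    """Return the set of (row, col) cells whose value is >= threshold.

    The ">=" rule is an assumption; the text only says that a constant
    threshold is chosen from experience.
    """
    return {
        (r, c)
        for r, row in enumerate(surface)
        for c, value in enumerate(row)
        if value >= threshold
    }


def combine_profiles(hit_scores, probabilities, score_threshold, prob_threshold):
    """Steps 1-3: threshold each surface, then form intersection and union."""
    region_1 = region_above(hit_scores, score_threshold)    # Step 1
    region_2 = region_above(probabilities, prob_threshold)  # Step 2
    first_priority = region_1 & region_2                    # searched first
    fallback = region_1 | region_2                          # searched if needed
    return first_priority, fallback


# Hypothetical 2x2 surfaces and thresholds.
hit_scores = [[0.9, 0.2], [0.7, 0.1]]
probabilities = [[0.05, 0.30], [0.40, 0.01]]
first_priority, fallback = combine_profiles(hit_scores, probabilities, 0.5, 0.25)
```

Representing each region as a set of grid cells keeps the intersection and union in Step 3 to one set operation each.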

4.3 Prediction

Find the centroid of the zone obtained in Step 3.
We can find the centroid of the zone by using the function

$$z_{centroid} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (28)$$

where $x_i$, $i = 1, \dots, n$, are the crime sites in the zone.

Predict the next crime sites.
We assume a constant d, the average distance that offenders are willing to
travel to commit a crime. Draw a circle of diameter d centred at $z_{centroid}$. This
circle can guide law enforcement officers in predicting the next
crime sites.
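A minimal Python sketch of this prediction step is given below, assuming crime sites are planar (x, y) coordinates and distances are Euclidean. The function names and the sample coordinates are hypothetical.

```python
import math


def centroid(sites):
    """Equation (28): arithmetic mean of the crime-site coordinates."""
    n = len(sites)
    if n == 0:
        raise ValueError("at least one crime site is required")
    return (sum(x for x, _ in sites) / n, sum(y for _, y in sites) / n)


def in_prediction_circle(point, center, d):
    """True if point lies inside the circle of diameter d centred at center."""
    return math.dist(point, center) <= d / 2


# Hypothetical crime sites inside the zone obtained from Step 3.
sites = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
center = centroid(sites)
inside = in_prediction_circle((4.0, 1.5), center, d=6.0)
```

Note that d is a diameter, so a location is inside the circle when its distance to the centroid is at most d/2.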

4.4 The improvement of the prediction


The results of the first model help us find the geographical profile of a
suspected serial criminal, but as guidance for law enforcement officers predicting
the next crime sites they consider only distances. Hence, to
make our prediction more precise, we introduce the Process Transition Density
Model for reference, which takes the characteristics of crime sites into account.

4.4.1 Model Overview
A deeper understanding of the characteristics of crime sites would provide valuable
insight into anchor points. By scoring the characteristics of crime sites, we can
identify the characteristics of the locations of anchor points.
Our objective is to find the smallest feature subset (of the initial feature set) that
accounts for the underlying pattern of criminal event occurrences (hot spots). This is a
model search problem we call feature selection. The selected feature subset is called
the key feature set and the feature subspace defined by the key feature set the key
feature space.

4.4.2 Building the model


To evaluate a given set of features, we need a measure of cohesiveness of a point
pattern observed in the independent variable or feature subspace that the set defines. In this
model we employ a class of cohesiveness measures that do not require any partitioning
of space in advance. These
measures are functions of inter-event distances (or similarities).
Let $d_{ij}$ be the distance between two events i and j in the feature subspace defined by
the feature subset to be evaluated. We transform the distance $d_{ij}$ into the similarity
$s_{ij}$ as follows:

$$s_{ij} = \frac{1}{1 + d_{ij}/\bar{d}} \qquad (29)$$

where $\bar{d}$ is the average inter-event distance, and distance refers to
differences in the value $g_k$ of an independent variable. Define the Gini index between
these two events as

$$g_{ij} = 4 s_{ij} (1 - s_{ij}) \qquad (30)$$

Notice that $g_{ij}$ attains its maximum, 1.0, when $s_{ij} = 0.5$ (or $d_{ij} = \bar{d}$), and its
minimum, 0.0, when $s_{ij} = 0.0$ or $s_{ij} = 1.0$ (or $d_{ij} = 0$). For a data set of n events,
the averaged Gini index below is a suitable measure of cohesiveness:
$$I_g = \frac{2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} g_{ij}}{n(n-1)} \qquad (31)$$

The smaller the value of the $I_g$ index is, the higher the level of point-pattern
cohesiveness, or the better the set of features that define the point pattern.
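To make Equations (29) to (31) concrete, the following Python sketch computes the averaged Gini index for a list of events in feature space. It assumes Euclidean distance between feature vectors, which the text does not specify, and the function name `averaged_gini` and the sample data are hypothetical.

```python
import math
from itertools import combinations


def averaged_gini(points):
    """Averaged Gini index I_g, Equation (31), for a list of feature vectors.

    Distances are Euclidean (an assumption). Since there are n(n-1)/2 pairs,
    2 * sum(g_ij) / (n(n-1)) equals the mean of g_ij over all pairs.
    """
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("at least two events are required")
    dists = [math.dist(p, q) for p, q in pairs]
    d_bar = sum(dists) / len(dists)  # average inter-event distance
    if d_bar == 0:
        return 0.0  # all events coincide: s_ij = 1 and g_ij = 0 for every pair
    total = 0.0
    for d in dists:
        s = 1.0 / (1.0 + d / d_bar)  # similarity, Equation (29)
        total += 4.0 * s * (1.0 - s)  # Gini index, Equation (30)
    return total / len(pairs)


# Hypothetical one-dimensional feature values.
clustered = averaged_gini([(0.0,), (0.0,), (0.0,), (10.0,)])
spread = averaged_gini([(0.0,), (1.0,), (2.0,), (3.0,)])
```

In this example the clustered pattern yields a smaller index than the evenly spread one, in line with the interpretation that smaller values indicate higher cohesiveness.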
In general, $I_g$ can be used in a subset selection algorithm (e.g., forward selection, backward
elimination) to yield an optimal or suboptimal subset of features. Alternatively, one
can also evaluate $I_g$ for each individual feature and select a subset of features based
on the $I_g$ scores.
Suppose that in addition to actual event data, we also have the feature values for a
large sample of locations that are chosen uniformly over the study region. We call the
set of the feature values at the sample locations the prior feature data set. As the first
feature selection step, we calculate the ratio of the observed range to the full range of
each feature dimension to see whether there are any dimensions that do not exhibit
enough variation in the event feature data set. This ratio for feature $f_k$ is defined by

$$r_k = \frac{\max_{x_{ik}, x_{jk} \in E_k} |x_{ik} - x_{jk}|}{\max_{x_{ik}, x_{jk} \in P_k} |x_{ik} - x_{jk}|} \qquad (33)$$

where $E_k$ and $P_k$ are the event and the prior feature data sets for feature $f_k$
(i.e., containing only the dimension $f_k$), respectively.
If the ratio $r_k$ is considered sufficiently small, we will not calculate the $I_g(E_k)$ score
for feature $f_k$. Otherwise, we calculate the adjusted $I_g$ for feature $f_k$, or the adjusted
$I_g^{(k)}$, defined as follows:

$$\text{Adjusted } I_g^{(k)} = \frac{I_g(E_k)}{I_g(P_k)} \qquad (34)$$

where $I_g(E_k)$ and $I_g(P_k)$ are the $I_g$ scores for feature $f_k$ over the event feature
data set $E_k$ and the prior feature data set $P_k$, respectively. The rationale for this
adjustment scheme is that $I_g(P_k)$ indicates how much the prior distribution of $f_k$
deviates from the uniform distribution. The smaller $I_g(P_k)$ is, or the further the prior
distribution is from the uniform distribution, the more $I_g(E_k)$ is adjusted.
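The screening by Equation (33) and the adjustment in Equation (34) can be sketched in Python for a single feature as shown below. The cut-off `min_ratio`, the function names, and the sample values are hypothetical; the text only says that a sufficiently small ratio causes the feature to be skipped.

```python
from itertools import combinations


def range_ratio(event_values, prior_values):
    """Equation (33): observed range over events divided by the full prior range.

    The largest pairwise difference of a set of numbers is max minus min.
    """
    full_range = max(prior_values) - min(prior_values)
    if full_range == 0:
        raise ValueError("prior feature values show no variation")
    return (max(event_values) - min(event_values)) / full_range


def averaged_gini_1d(values):
    """Averaged Gini index, Equation (31), for one feature dimension."""
    dists = [abs(a - b) for a, b in combinations(values, 2)]
    if not dists:
        raise ValueError("at least two values are required")
    d_bar = sum(dists) / len(dists)
    if d_bar == 0:
        return 0.0
    sims = [1.0 / (1.0 + d / d_bar) for d in dists]
    return sum(4.0 * s * (1.0 - s) for s in sims) / len(dists)


def adjusted_gini(event_values, prior_values, min_ratio=0.1):
    """Equation (34), or None when the feature is screened out by r_k.

    min_ratio is an assumed cut-off, not a value taken from the paper.
    """
    if range_ratio(event_values, prior_values) < min_ratio:
        return None
    return averaged_gini_1d(event_values) / averaged_gini_1d(prior_values)


# Hypothetical feature values at event sites and at uniformly sampled locations.
events = [4.0, 5.0, 6.0]
prior = [0.0, 2.5, 5.0, 7.5, 10.0]
score = adjusted_gini(events, prior)
```

Returning `None` for a screened-out feature keeps "not calculated" distinct from a genuine score of zero.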

4.4.3 The realization of the model


Following Section 4.3, we use the first geographical profile model to
determine the anchor point and then circle the prediction zone. We then apply the
Process Transition Density Model within the zone to make a more accurate prediction.

V. References
[1] Barton, G. (1989). Elements of Green's Functions, Waves, and Propagation: Potentials,
Diffusion, and Waves. Clarendon Press: Oxford.
[2] Brantingham, P. J. and Tita, G. (2008). Offender mobility and crime pattern formation from
first principles. Artificial Crime Analysis System, edited by Lin Liu and John Eck. IGI
Global: Hershey, PA.
[3] Briggs, W. L., Henson, V. E., and McCormick, S. F. (2000). A Multigrid Tutorial. SIAM.
[4] Estep, D. (2004). A short course on duality, adjoint operators, Green's functions, and a
posteriori error analysis. www.math.colostate.edu/estep/research/preprints/adjointcourse_final.pdf
[5] Holcman, D., Marchewka, A., and Schuss, Z. (2005). Survival probability of diffusion with
trapping in cellular neurobiology. Physical Review E, 72(3), 031910.
[6] Johnson, S. D., Summers, L., and Pease, K. (2009). Offender as Forager? A Direct Test of the Boost
Account of Victimization. Journal of Quantitative Criminology, in press.
[7] Keats, A., Yee, E., and Lien, F.-S. (2007). Bayesian inference for source
determination with applications to a complex urban environment. Atmospheric Environment, 41,
465-479.
[8] O'Leary, M. (2009). The mathematics of geographic profiling. Preprint.
[9] Schuss, Z. (1980). Theory and Applications of Stochastic Differential Equations. Wiley Series
in Probability and Statistics: New York.
[10] Short, M. B., D'Orsogna, M. R., Pasour, V. B., Tita, G. E., Brantingham, P. J., Bertozzi, A. L., and Chayes, L. (2008). A Statistical Model of
Criminal Behavior. M3AS, 18, 1249-1267.
