UNIVERSITY OF CALIFORNIA, SANTA CRUZ
MASTER OF SCIENCE
in
STATISTICS AND STOCHASTIC MODELLING
by
Luis Antonio Acevedo-Arreguín
June 2008
____________________________________
Professor Bruno Sansó
____________________________________
Professor Herbert Lee
Copyright © by
Luis Antonio Acevedo-Arreguín
2008
Spatial Temporal Statistical Modeling of Crime Data:
The Kernel Convolution Approach
Luis Antonio Acevedo-Arreguín
Professor Bruno Sansó
Faculty Advisor
Professor Herbert Lee
Academic Advisor
University of California, Santa Cruz
Department of Applied Mathematics and Statistics
June 2008
Abstract
Spatial Temporal Statistical Modeling of Crime Data:
The Kernel Convolution Approach
Møller and Waagepetersen provide the following definitions and remarks on the basic properties of Poisson point processes:
Heuristically, ρ(ξ)dξ is the probability for the occurrence of a
point in an infinitesimally small ball with centre ξ and volume dξ
[5].
Note that their notation is slightly different from the one used here.
Fortunately for our purposes, as they in turn cite Daley and Vere-Jones
(1988, 2003), "statistical inference for space-time processes is often simpler
than for spatial point processes," a point treated further in Section 9.2.5 of
Møller and Waagepetersen's book.
On the other hand, the more complicated issue of spatial clustering
modelling is addressed by Andrew B. Lawson and David G. T. Denison.
They claim that there are two main approaches to modelling clusters;
basically, the difference between those views is whether or not the locations
of the aggregations are parameterized [3]. Bayesian cluster modelling and,
especially, mixture distribution and nonparametric approaches are given more
emphasis because of the development of fast computational algorithms for
sampling from complex Bayesian models (most notably Markov chain Monte
Carlo algorithms).
Method: The kernel convolution approach has been used for several years
to model spatial and temporal processes [7] [4]. For example, Stroud, Müller
and Sansó (2001) applied it to model two large environmental datasets,
whereas Lee, Higdon, Calder, and Holloman (2004) convolved simple Markov
random fields with a smoothing kernel to model cases in hydrology and
aircraft prototype testing. In most applications of the kernel convolution
approach, Gaussian kernels were used. The emphasis on Gaussian spatial
and space-time models is because they are "quite flexible and can be
adapted to a wide variety of applications, even where the observed data are
markedly non-Gaussian" [2].
David Higdon explains the reason for using convolution models:
[Gaussian Markov random field] GMRF models work well for im-
age and lattice data; however, when data are irregularly spaced, a
continuous model for the spatial process z(s) is usually preferable.
In this section, convolution (or, equivalently, kernel) models are
introduced. These models construct a continuous spatial model
z(s) by smoothing out a simple, regularly spaced latent process.
In some cases, a GMRF model is used for this latent process.
The convolution process z(s) is determined by specifying a latent
process x(s) and a smoothing kernel k(s). We restrict the latent
process x(s) to be nonzero at the fixed spatial sites ω1, ..., ωm,
also in S, and define x = (x1, ..., xm)T where xj = x(ωj), j =
1, ..., m. For now, the xj's are modeled as independent draws
from a N(0, 1/λx) distribution. The resulting continuous Gaussian
process is then

$$z(s) = \int_S k(u - s)\, dx(u) = \sum_{j=1}^{m} k(\omega_j - s)\, x_j$$
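The discrete convolution above can be sketched numerically. The following is a Python illustration (not the report's R code): a handful of latent values xj at regular sites ωj, smoothed by a Gaussian kernel into a continuous function z(s); the kernel width σ is an arbitrary choice here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent sites omega_1, ..., omega_m on a regular grid; x_j ~ N(0, 1/lambda_x)
m, lam_x = 20, 4.0
omega = np.linspace(0.0, 1.0, m)
x = rng.normal(0.0, 1.0 / np.sqrt(lam_x), size=m)

def gaussian_kernel(d, sigma=0.08):
    """Smoothing kernel k(d) evaluated at a displacement d."""
    return np.exp(-0.5 * (d / sigma) ** 2)

def z(s):
    """z(s) = sum_j k(omega_j - s) x_j, the discrete kernel convolution."""
    return np.sum(gaussian_kernel(omega - s) * x)

# z(.) is a smooth function of s built from only m latent values
zs = np.array([z(s) for s in np.linspace(0.0, 1.0, 101)])
```

The same construction with a Gamma latent process Z(u) underlies the intensity model used later in the report.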
Data: The crime records were downloaded from

http://www.cincinnati-oh.gov/police/pages/-5192-/

which holds records of crimes committed in Hamilton County for several years.
The database reports the date, time, and location of crimes, as well as other
data that might be useful to characterize the magnitude of the reported event.
A UCR code, which appears to be related to the uniform crime reporting code
that the FBI uses nationwide, was assigned to each event. There were more
than 70 different UCR codes describing a variety of events such as telephone
harassment, vehicle theft, murder, and the like. Although very diverse, this
variable was used to reclassify the data for use in the test models.
Specifically, the data corresponding to 2006 were downloaded, address
geocoded, and imported into R, and a simple descriptive statistical analysis
was performed. Since the database only reported the street address of each
crime, the more than 43,000 records of that year were processed to obtain
their geographical coordinates. The geocoding was conducted by using
online services. The website

http://www.gpsvisualizer.com

was helpful because, by acquiring a Google API key, users can geocode
thousands of records a day. Thus, converting multiple addresses to GPS
coordinates only requires a minor modification of the HTML code of the
geocoder webpage to include the API key, reduce the google delay value to
0.5 seconds or less, and increase the number of records to be processed at once.
Once geocoding was performed, the data were imported into R and some
temporal variables were added, such as the day of the week and the day of
the year on which the crime was committed. Also, the UCR codes were
transformed into four main categories of events: crimes against people with
extreme violence, crimes against people with minor violence, crimes against
property, and crimes against the system. For example, the categorical
variable crime class, which was incorporated into the database

CRIME2006_plus3.dat,

was given values from 1 to 4 depending on the category of the crime. A crime
with a UCR of 105 (corresponding to murder) was given a crime class value
of 1, whereas a crime with a UCR of 1120 (passing bad checks) was given a
crime class value of 4, and so on. These new categorical variables allowed us
to perform a preliminary analysis, which was summarized in some box plots
and time series plots. A complete description of the UCR codes used by the
Cincinnati police is provided in the file UCR code description.txt.
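The recoding can be sketched as follows (a Python illustration; only the UCR ranges and examples actually stated in this report are encoded, the full table lives in UCR code description.txt, so the fall-through behavior here is hypothetical).

```python
def crime_class(ucr):
    """Map a UCR code to one of the four crime classes described above.

    Only the ranges given in the report are reproduced: codes 100-495 and
    800-864 form class 1 (crimes against people with extreme violence), and
    1120 (passing bad checks) illustrates class 4 (crimes against the system).
    """
    if 100 <= ucr <= 495 or 800 <= ucr <= 864:
        return 1
    if ucr == 1120:
        return 4
    return None  # classes 2 and 3 are not itemized in this report
```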
From a preliminary analysis of the entire dataset, a cyclical pattern was
observed in plots of the number of crimes both with respect to the day
of the week and with respect to the day of the year. The plots showed
that the highest incidence of crimes happened around the middle of the year,
whereas the values tended to decrease toward the end of the year. Similarly,
a higher number of crimes was reported on Mondays than on the rest of the
days of the week. A similar pattern emerged when a part of the dataset, the
one corresponding to crimes against people with extreme violence, was
plotted with respect to the temporal covariates. Thus, type 1 crimes were
selected for statistical modeling. This crime category includes the UCR
codes 100 to 495 and 800 to 864, which along with the rest of the data are
in the file crime2006 plus3.dat. The file CRIME2006 database description.txt
provides more details on the entire dataset.
As part of the data processing, importing maps into R was another task,
requiring a search for mapping resources both for obtaining satellite
photographs and for georeferencing the imported images. Google, especially

http://earth.google.com,

was again a good source of satellite images of the study area, in much the
same way the website

http://tiger.census.gov

was very helpful, not only for providing the spatial covariates later included
in the models, but also for generating maps of any part of the United States
by just specifying the GPS coordinates of the area of interest. For example,
to generate a map of Hamilton County, whose Tiger code is TGR39061, the
user only needs to type the following address into a web browser
http://tiger.census.gov/cgi-bin/mapgen?lat=39.166828&
lon=-84.538348&wid=0.290456&ht=0.290456&iwd=480&iht=480
in a single line and without spaces. The parameters included in the link were
computed by using the boundary coordinates of Hamilton County, also
provided by the Census website (i.e., −84.820305 < longitude < −84.256391
and 39.021600 < latitude < 39.312056). The GPS coordinates in the link
correspond to the center of the map; the wid and ht values represent the
width and the height of the image in GPS units, whereas the iwd and iht
values represent the same dimensions in pixels. Thus, the width and the
height of the image were chosen depending on the dimensions of the JPEG
image to be generated.
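The link parameters follow directly from the county's bounding box: the center is the midpoint and the extent is the latitude span. A small Python sketch reproducing the values in the URL above:

```python
# Bounding box of Hamilton County as given by the Census website
lon_min, lon_max = -84.820305, -84.256391
lat_min, lat_max = 39.021600, 39.312056

# Center of the map, and image extent in GPS units (square image here)
lat = (lat_min + lat_max) / 2
lon = (lon_min + lon_max) / 2
ht = lat_max - lat_min
wid = ht

# Assemble the mapgen request shown in the text
url = (f"http://tiger.census.gov/cgi-bin/mapgen?lat={lat:.6f}"
       f"&lon={lon:.6f}&wid={wid:.6f}&ht={ht:.6f}&iwd=480&iht=480")
```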
The JPEG file with the map of Cincinnati was later processed in R by
using the package rimage. This was required to generate a surface matrix
that could be used by the command image as many times as needed
without demanding a lot of computational time, and also to facilitate the
georeferencing of the JPEG map. Georeferencing was necessary to plot the
crime points on a map without transforming the GPS coordinates of each
point into another coordinate system. A satellite image of Cincinnati
obtained from Google Earth was processed in the same way, generating
the file Cincinnati map1.dat for the option "map1" in the computer
program for model 1, and the files Cincinnati map2.dat, long map2.dat, and
lat map2.dat for the option "map2" in the same program. The option
"map1" corresponds to the simple road map, whereas the option "map2"
corresponds to the satellite photograph. These files are required to generate
the background on the plots both for the figures included in this report and
for the backgrounds in the accompanying video clips.
Model Statement: Under the kernel convolution approach, the intensity
λ(s, t) of a point process is modeled as the convolution of a random process
Z(s) and a weighting kernel k(s − u) over a grid of u locations. Both
the spatial and the temporal covariates are included in the model through
multiplicative effects μs(s) and μt(t), so for a Poisson process on an
observation window R, the corresponding expressions for the intensity, the
expected number of points, and the likelihood for n points y ∈ d occurring
at times t = 1, ..., T are
$$\lambda(s, t) = \tau \sum_{u} k(s - u)\, Z(u)\, \mu_s(u)\, \mu_t(t) \qquad (2)$$

$$\Lambda_{R,T} = \int_T \int_R \lambda(s, t)\, ds\, dt \qquad (3)$$

$$L(\lambda \mid d) = \prod_{t \in T} \exp(-\Lambda_R(t)) \prod_{i=1}^{n} \lambda_t(y_i) \qquad (4)$$
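On the log scale, the single-period likelihood (4) is the familiar inhomogeneous Poisson process expression: minus the integrated intensity plus the sum of log intensities at the observed points. A minimal Python sketch (an illustration, not part of the report's R implementation):

```python
import numpy as np

def poisson_loglik(lam_at_points, Lambda_total):
    """Log-likelihood of an inhomogeneous Poisson process for one period:
    log L = -Lambda + sum_i log lambda(y_i), where Lambda is the intensity
    integrated over the observation window."""
    return -Lambda_total + np.sum(np.log(lam_at_points))

# Toy check on a homogeneous process: lambda = 2 on a unit window with
# 3 observed points gives log L = -2 + 3*log(2)
ll = poisson_loglik(np.array([2.0, 2.0, 2.0]), 2.0)
```

Summing this quantity over t = 1, ..., T gives the log of the full likelihood (4).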
where u indicates a grid location and s indicates a point location. The spatial
multiplicative effect µs is a function of two spatial covariates, X1 (s) as the
population density in year 2000 (number of individuals per square mile) and
X2 (s) as the number of vacant units,
whereas the temporal multiplicative effect is based on a linear combination
of sines and cosines of four temporal covariates,
$$\mu_t(t) = \exp\Big(\theta_{t1} \sin\big(\tfrac{2\pi t_4}{12}\big) + \theta_{t2} \cos\big(\tfrac{2\pi t_4}{12}\big) + \theta_{t3} \sin\big(\tfrac{2\pi t_3}{52}\big) + \theta_{t4} \cos\big(\tfrac{2\pi t_3}{52}\big) + \theta_{t5} \sin\big(\tfrac{2\pi t_2}{365}\big) + \theta_{t6} \cos\big(\tfrac{2\pi t_2}{365}\big) + \theta_{t7} \sin\big(\tfrac{2\pi t_1}{7}\big) + \theta_{t8} \cos\big(\tfrac{2\pi t_1}{7}\big)\Big) \qquad (6)$$
where t1 ∈ {1, ..., 7} is the day of the week (1 for Sunday), t2 ∈ {1, 2, ..., 365} is
the day of the year (1 for January 1st, 2006), t3 ∈ {1, 2, ..., 52} is the week
number, and t4 ∈ {1, 2, ..., 12} is the month number (1 for January).
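Eq. (6) can be evaluated directly as the exponential of a dot product between the eight coefficients and the sine/cosine basis. A Python sketch (the θ values used in the check below are placeholders, not the fitted posterior means):

```python
import numpy as np

def mu_t(theta, t1, t2, t3, t4):
    """Temporal multiplicative effect of Eq. (6).

    theta holds the eight coefficients theta_t1, ..., theta_t8; t1..t4 are
    the day-of-week, day-of-year, week, and month covariates."""
    basis = [np.sin(2 * np.pi * t4 / 12), np.cos(2 * np.pi * t4 / 12),
             np.sin(2 * np.pi * t3 / 52), np.cos(2 * np.pi * t3 / 52),
             np.sin(2 * np.pi * t2 / 365), np.cos(2 * np.pi * t2 / 365),
             np.sin(2 * np.pi * t1 / 7), np.cos(2 * np.pi * t1 / 7)]
    return np.exp(np.dot(theta, basis))
```

With all coefficients set to zero the effect is neutral (μt = 1), which is a convenient sanity check.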
The kernels over a 13 × 10 grid were chosen to be bivariate Gaussian with
fixed parameters σx² and σy², which were estimated so that, for the elliptical
contours of each bivariate Gaussian, one standard deviation from its center
(the u location) in both the x and y directions equals 52.5% of the distance
between two grid points in the same row or column. The correlation ρ was
set to zero. Associated with these kernels was a Gamma process Z(u) with
fixed hyperparameters α and β, and a multiplicative factor τ that played the
role of transforming Z(u) into the process τ·Z(u), with one of its
hyperparameters, α or β, acting as a random variable. The corresponding
prior distributions for all the parameters of the model were chosen to be
$$\pi(\tau \cdot Z(u)) \sim \tau \cdot \mathrm{Gamma}\!\left(\frac{7}{4} \cdot \frac{2\pi\sigma_x\sigma_y}{\int_R k(s-u)\, ds},\ \frac{7}{4}\right) \qquad (7)$$

$$\pi(\tau) \sim \mathrm{Gamma}\!\left(\frac{7}{4} \cdot \frac{0.0075}{2\pi\sigma_x\sigma_y},\ \frac{7}{4}\right) \qquad (8)$$

$$\pi(\theta_1) \sim N(0,\ 0.0001^2) \qquad (9)$$

$$\pi(\theta_2) \sim N(0,\ 0.0005^2) \qquad (10)$$

$$\pi(\theta_{tj}) \sim N(0,\ 0.5^2) \qquad (11)$$
Results: The model parameters were estimated by using Markov chain
Monte Carlo (MCMC). Specifically, for the posterior distribution of τ·Z(u),
a beta proposal was implemented to improve the acceptance rate of the
proposed value at a new iteration k of the M-H step. Thus, the proposal
Z′(u) for a new Z^k(u) was sampled from

$$Z'(u) \sim \frac{Z^{k-1}(u)}{\delta}\, \mathrm{Beta}\!\left(\frac{a\delta}{2},\ \frac{a(1-\delta)}{2}\right), \qquad (12)$$

where δ and a were set to 0.95 and 2.5, respectively. This multiplicative
random walk seemed to induce fast convergence of the MCMC. More details
on this approach can be found in Sansó (2007). The parameter τ was sampled
from its posterior Gamma distribution by a Gibbs step. The rest of the
parameters were sampled from their corresponding posterior distributions
by using M-H with normal proposals. Thus, the proposal distributions for
the spatial θs were

$$\theta_1' \sim N(\theta_1^{k-1},\ 0.000005^2) \qquad (13)$$

$$\theta_2' \sim N(\theta_2^{k-1},\ 0.00025^2) \qquad (14)$$

whereas the proposal distributions for the temporal θs were simply

$$\theta_{tj}' \sim N(\theta_{tj}^{k-1},\ 0.025^2), \qquad (15)$$
where j ∈ {1, 2, ..., 8}. For modeling the daily variation of the intensity λ,
5000 iterations were required for convergence, with a burn-in of 2500. Since
the entire computer program was coded in R, the simulation took over 20
hours per run. The code is included in the file ppm llnl ver7a.r.
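The multiplicative beta random walk of Eq. (12) is easy to sample: the beta factor has mean δ, so the proposal is centered at the previous value. A Python sketch with numpy (an illustration, not the report's R code; δ = 0.95 and a = 2.5 as in the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def beta_proposal(z_prev, delta=0.95, a=2.5, size=None):
    """Multiplicative random-walk proposal of Eq. (12):
    Z' = (z_prev / delta) * Beta(a*delta/2, a*(1-delta)/2).
    The Beta factor has mean a*delta/2 / (a/2) = delta, so E[Z'] = z_prev."""
    b = rng.beta(a * delta / 2, a * (1 - delta) / 2, size=size)
    return z_prev / delta * b

# Proposals stay positive (as a Gamma process requires) and are centered
# at the previous value
draws = beta_proposal(3.0, size=200_000)
```

Because the proposal is asymmetric, the M-H acceptance ratio must include the corresponding proposal-density correction.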
Once the posterior means of the spatial and temporal parameters θ were
computed, the corresponding multiplicative factors μs(u) and μt(t) were
estimated as

$$\mu_s(u) = \exp(0.000141\, X_1(u) + 0.004256\, X_2(u)) \qquad (16)$$

$$\mu_t(t) = \exp\Big(0.027926 \sin\big(\tfrac{2\pi t_4}{12}\big) - 0.091187 \cos\big(\tfrac{2\pi t_4}{12}\big) - 0.222996 \sin\big(\tfrac{2\pi t_3}{52}\big) + 0.226660 \cos\big(\tfrac{2\pi t_3}{52}\big) + 0.134605 \sin\big(\tfrac{2\pi t_2}{365}\big) - 0.229748 \cos\big(\tfrac{2\pi t_2}{365}\big) + 0.056335 \sin\big(\tfrac{2\pi t_1}{7}\big) + 0.037308 \cos\big(\tfrac{2\pi t_1}{7}\big)\Big) \qquad (17)$$
which, in conjunction with the baseline intensity λ(u), allow us to make
inferences on the expected number of crime events Λ over the region of
interest per day. Contour plots of the intensity λ for an area of Cincinnati
delimited by the longitudes 84.63°W and 84.38°W, as well as the latitudes
39.09°N and 39.22°N, were produced for each of the 365 days of 2006, and
can be observed in the 6-min movie Video-2.wmv.
Conclusions: The model allowed us to obtain a picture of the criminal hot
spots of the metropolitan area of Cincinnati, OH, by providing, on an actual
map, the locations of the various modes of the spatial distribution of the
intensity λ and their evolution over time. By incorporating the information
from other spatial variables that were considered constant with respect to
time, such as population density or the number of houses for rent, it was
possible to visually find the correlation between crime intensity and densely
populated areas of Cincinnati.
The model also served the objective of testing new ways to deal with
the massive computational resources required to process thousands of data
points on hundreds of grid points, by using new proposal distributions for the
Metropolis-Hastings steps to obtain fast convergence of the MCMC. The beta
proposal resulted in faster MCMC iterations than the traditional Gaussian
proposal. Faster simulations might be obtained by translating the R code to
Fortran or C++. We wrote the entire computer code in R because of its
advantages for educational settings with limited computational resources
(i.e., it is open source), and because of its graphical capabilities, which
allowed us to follow the MCMC iterations on the computer screen in real time.
This model might be improved by incorporating kernels with parameters
varying over space and time, to explore the correlation between crime
activity and city infrastructure such as roads or land use. Also, a preliminary
summary of criminal activity based on the spatial distribution of events
occurring during certain days of the week, month, or year might be
incorporated into the model to explore its forecasting potential.
Acknowledgements: This master's project would not have been possible
without the support of Dr. William Hanley and his team at the Lawrence
Livermore National Laboratory. Likewise, Professors Bruno Sansó and
Herbert Lee, as well as Dr. Matt Taddy, were especially important academic
advisors in bringing this project to a fruitful end.
[Figure 1 plot: "Data points and lattice for kernel convolution modeling"; crime locations (dots) and grid points (+s) over axes xu and yu, latitude ticks 39.08 to 39.22.]

Figure 1: Two different grids were used to estimate the parameters of model 1. The Gamma process Z(u) was modeled over a 13 × 10 grid (the +s in the figure), whereas the spatial covariate distributions were imported from a 20 × 20 grid (the empty squares in the figure) when needed to compute $\int_R \tau\, k(s - u)\, Z(u)\, \exp(\theta_1 X_1(u) + \theta_2 X_2(u))\, ds$. The figure also shows the type 1 crime locations over the year.
[Figure 2 plot: "CRIME EVENTS REPORTED: Daily variation of crimes against PEOPLE (Case 1: Extreme Violence)"; upper panel by day of week (1 to 7), lower panel by day of year.]

Figure 2: Type 1 crime, which includes events with a high level of violence, especially against people, showed a cyclical pattern like that shown by the entire dataset. There was a high number of crimes reported on Mondays (upper panel) and high rates of criminal activity around the middle of the year (lower panel).
[Figure 3 plot: "Cincinnati Crime Data: Mean Intensity Surface Baseline"; axes Longitude and Latitude, 39.10 to 39.22.]

Figure 3: The mean baseline λ(u), or the mean intensity surface when μs(u) = μt(t) = 1.
[Figure 4 plot: "Cincinnati Crime Data: Mean Intensity Surface, Jun/25/2006"; axis Latitude, 39.10 to 39.22.]

Figure 4: The mean intensity surface corresponding to June 25th, 2006. This picture is a frame taken from a movie generated in R and post-processed with Microsoft Windows Media Encoder 9.
[Figure 5 plot: trace plot and histogram of theta[1] (covariate = population density 2000); posterior mean ≈ 0.000141, variance ≈ 2.29e−11, acceptance rate ≈ 0.48.]

Figure 5: The trace plot for the parameter θ1 shows acceptable mixing (upper panel), whereas the estimated posterior mean of θ1, whose histogram is depicted in the lower panel, shows that, when used to compute μs(u), an increment of 1000 new residents might increase the intensity of crime by around 15%.
[Figure 6 plot: trace plot and histogram of theta[2] (covariate = vacant units); posterior mean ≈ 0.004256, variance ≈ 8.59e−08, acceptance rate ≈ 0.59.]

Figure 6: The trace plot for the parameter θ2 shows acceptable mixing (upper panel), whereas the estimated posterior mean of θ2, whose histogram is depicted in the lower panel, shows that, when used to compute μs(u), an increment of 10 vacant units might increase the intensity of crime by around 4.3%. If at some time 50 houses or apartments ended up with no occupants, the intensity of crime events might increase by 23.7%.
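The percentage effects quoted in the captions of Figures 5 and 6 follow from Eq. (16): a change ΔX in a covariate scales the intensity by exp(θ ΔX). A quick Python check using the posterior means reported in Eq. (16):

```python
import math

# Posterior means of the spatial coefficients from Eq. (16)
theta1 = 0.000141   # population density (residents per square mile)
theta2 = 0.004256   # vacant units

def pct_increase(theta, delta_x):
    """Percent change in intensity implied by exp(theta * delta_x)."""
    return (math.exp(theta * delta_x) - 1.0) * 100.0

rise_1000_residents = pct_increase(theta1, 1000)  # ~15%   (Figure 5)
rise_10_vacant = pct_increase(theta2, 10)         # ~4.3%  (Figure 6)
rise_50_vacant = pct_increase(theta2, 50)         # ~23.7% (Figure 6)
```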
[Figure 7 plot: "Observed Number of Crimes"; two panels of daily counts (Nobs, 0 to 20).]

Figure 7: The daily variation of the number of crime events, n and ΛR(t), plotted from the observed data and from the values estimated according to model 1.
[Figure 8 plot: "Population density 2006 in county 39061"; axes rmx and rmy, latitude ticks 39.10 to 39.22.]

Figure 8: Future research might explore kernels with temporally and spatially varying parameters.
R source code:
ITER = 5000
burn = 1/2
map = "map1" # "map1" = atlas map; "map2" = satellite map
P = 13
Q = 10
x1 = -84.820305
x2 = -84.256391
y1 = 39.021600
y2 = 39.312056
# WORKING DIRECTORY
set.seed(9132)
##############################################################################
# SUBROUTINES AND FUNCTIONS
"ezinterp" <-
function(x, y, z, method="loess", gridlen=40,
span=0.05, ...)
{
xyz.loess <-
suppressWarnings(loess(z ~ x + y, data.frame(x=x, y=y),
span=span, ...))
##############################################################################
# LUIS ARREGUIN'S SUBROUTINE TO COMPUTE BIVARIATE NORMAL PROBABILITY
sd1 = sqrt(xvar)
sd2 = sqrt(yvar)
deltax = (x2-x1)/slices
xi = x1 + deltax/2
volume = 0
for(i in 1:slices) {
m3 = xi*rho*sd2/sd1
sd3 = sd2*sqrt(1-rho*rho)
xi <- xi + deltax }
q0 <- sd1*sd2*sqrt(1-rho*rho)
volume <- 2*pi*q0*volume
volume }
# END pbivariate
##############################################################################
# DATA INPUT AND GRID SETTINGS
if(dataset == "complete") {
attach(crime2006_file3)
CC = 1
if(length(index)>0) {
x <- datax[-index]
y <- datay[-index]
t1 <- t1[-index]
t2 <- t2[-index]
t3 <- t3[-index]
t4 <- t4[-index]
t5 <- t5[-index]
} else {
x <- datax
y <- datay
}
if(dataset == "partial") {
data1 <- read.csv("juneData.csv")
x <- data1[,2]
y <- data1[,3]
}
if(covariates == "on") {
nc = length(civ)
n2 = length(xu2)
P2 = 20 # SECONDARY GRID WITH INFORMATION ON COVARIATES
Q2 = 20
} # END if covariates on
n = length(x)
delta_x
delta_y
n_star = P*Q
if(covariates == "on") {
xu = rep(x_grid, times=P)
yu = rep(y_grid, each=Q)
for(i in 1:Q) {
y1u[i] <- 0
y2u[(P-1)*Q + i] <- 0
}
for(j in 1:P) {
x1u[(j-1)*Q + 1] <- 0
x2u[(j-1)*Q + Q] <- 0
}
if(covariates == "on") {
for(i in 1:Q2) {
y1u2[i] <- 0 # y1u2[i] <- min(yu2)-min(yu)
y2u2[(P2-1)*Q2 + i] <- 0 # y2u2[(P2-1)*Q2 + i] <- max(yu)-max(yu2)
}
for(j in 1:P2) {
x1u2[(j-1)*Q2 + 1] <- 0 # x1u2[(j-1)*Q2 + 1] <- min(xu2)-min(xu)
x2u2[(j-1)*Q2 + Q2] <- 0 # x2u2[(j-1)*Q2 + Q2] <- max(xu)-max(xu2)
}
par(mfrow=c(1,1))
plot(xu, yu, pch="+", main="Data points and lattice for kernel convolution modeling")
points(x,y,pch=20)
if(covariates == "on") points(xu2,yu2,pch=22)
n
n_star
# xi <- 1+(Q-1)*(x-min(x_grid))/(max(x_grid)-min(x_grid))
# yj <- 1+(P-1)*(y-min(y_grid))/(max(y_grid)-min(y_grid))
# loc <- (round(yj)-1)*Q+round(xi)
if(covariates == "on") {
xi <- 1+(Q2-1)*(xu-min(x_grid2))/(max(x_grid2)-min(x_grid2))
yj <- 1+(P2-1)*(yu-min(y_grid2))/(max(y_grid2)-min(y_grid2))
loc_u <- (floor(yj)-1)*Q2+floor(xi)
w1 = (1+floor(xi)-xi)*(1+floor(yj)-yj)
w2 = (xi-floor(xi))*(1+floor(yj)-yj)
w3 = (1+floor(xi)-xi)*(yj-floor(yj))
w4 =(xi-floor(xi))*(yj-floor(yj))
efe_u <- w1*efe_u2[loc_u,]+w2*efe_u2[loc_u+1,]+w3*efe_u2[loc_u+Q2,]+w4*efe_u2[loc_u+Q2+1,]
xi <- 1+(Q2-1)*(x-min(x_grid2))/(max(x_grid2)-min(x_grid2))
yj <- 1+(P2-1)*(y-min(y_grid2))/(max(y_grid2)-min(y_grid2))
loc_s <- (floor(yj)-1)*Q2+floor(xi)
index <- which(loc_s<1|loc_s>n2)
if(length(index)>0) {
x <- x[-index]
y <- y[-index]
t1 <- t1[-index]
t2 <- t2[-index]
t3 <- t3[-index]
t4 <- t4[-index]
t5 <- t5[-index]
n <- length(x)
xi <- 1+(Q2-1)*(x-min(x_grid2))/(max(x_grid2)-min(x_grid2))
yj <- 1+(P2-1)*(y-min(y_grid2))/(max(y_grid2)-min(y_grid2))
loc_s <- (floor(yj)-1)*Q2+floor(xi)
}
w1 = (1+floor(xi)-xi)*(1+floor(yj)-yj)
w2 = (xi-floor(xi))*(1+floor(yj)-yj)
w3 = (1+floor(xi)-xi)*(yj-floor(yj))
w4 =(xi-floor(xi))*(yj-floor(yj))
efe_s <- w1*efe_u2[loc_s,]+w2*efe_u2[loc_s+1,]+w3*efe_u2[loc_s+Q2,]+w4*efe_u2[loc_s+Q2+1,]
if(temporal == "on") {
# nc = dim(efe_s)[2]
sum_Mt = rep(0, 8)
for(j in 1:n) {
sum_Mt[1] <- sum_Mt[1] + sin(2*pi*t4[j]/12)
sum_Mt[2] <- sum_Mt[2] + cos(2*pi*t4[j]/12)
sum_Mt[3] <- sum_Mt[3] + sin(2*pi*t3[j]/52)
sum_Mt[4] <- sum_Mt[4] + cos(2*pi*t3[j]/52)
sum_Mt[5] <- sum_Mt[5] + sin(2*pi*t2[j]/365)
sum_Mt[6] <- sum_Mt[6] + cos(2*pi*t2[j]/365)
sum_Mt[7] <- sum_Mt[7] + sin(2*pi*t1[j]/7)
sum_Mt[8] <- sum_Mt[8] + cos(2*pi*t1[j]/7)
}
# MONTHLY BASIS
# td <- 6
# xd <- x[t4==td]
# yd <- y[t4==td]
# DAILY BASIS
td <- 176
xd <- x[t2==td]
yd <- y[t2==td]
n
n2
n_star
##############################################################################
# MODELING PART ONE (KERNEL SETTINGS)
dy <- matrix(0, n_star, n_star)
for(i in 1:n_star) {
for(j in 1:n_star) {
dx[j,i] <- xu[i] - xu[j]
dy[j,i] <- yu[i] - yu[j]
}}
for(i in 1:n) {
for(j in 1:n_star) {
hx[j,i] <- x[i]-xu[j]
hy[j,i] <- y[i]-yu[j]
h11[j,i] <- hx[j,i]*hx[j,i]
h12[j,i] <- hx[j,i]*hy[j,i]
h22[j,i] <- hy[j,i]*hy[j,i]
}}
rx0 = kernel_size
ry0 = kernel_size
ru0 = 0
sxu <- matrix((rx0*(max(x_grid)-min(x_grid))/(Q-1))^2,n_star, ITER)
syu <- matrix((ry0*(max(y_grid)-min(y_grid))/(P-1))^2,n_star, ITER)
post_ru <- rep(ru0,n_star)
ru <- matrix(ru0, n_star, ITER)
##############################################################################
# KERNEL MATRIX COMPUTATIONS
for(i in 1:n_star) {
for(j in 1:n_star) {
q1 <- sxu[j,1]*syu[j,1]*(1-ru[j,1]*ru[j,1])
q2 <- sxu[j,1]*dy[j,i]*dy[j,i]+syu[j,1]*dx[j,i]*dx[j,i]-2*ru[j,1]*sqrt(sxu[j,1]*syu[j,1])*dx[j,i]*dy[j,i]
}}
for(i in 1:n) {
for(j in 1:n_star) {
q1 <- sxu[j,1]*syu[j,1]*(1-ru[j,1]*ru[j,1])
q2 <- sxu[j,1]*hy[j,i]*hy[j,i]+syu[j,1]*hx[j,i]*hx[j,i]-2*ru[j,1]*sqrt(sxu[j,1]*syu[j,1])*hx[j,i]*hy[j,i]
k1[j,i] <- exp(-q2/(2*q1))
} # END OF ITERATION j
} # END OF ITERATION i
Ku <- pbivariate(min(x_grid)-xu,max(x_grid)-xu,min(y_grid)-yu,max(y_grid)-yu,sxu[,1],syu[,1],ru[,1])
##############################################################################
# MODELING PART TWO (MARKOV CHAIN MONTE CARLO ALGORITHM)
par(mfrow=c(1,1))
ND = rep(0, ITER)
ND[1] <- n
if(covariates=="on") {
theta <- matrix(0, nc, ITER) # theta <- rep(0, ITER)
theta_star <- matrix(0, nc, ITER) # theta_star <- rep(0, ITER)
accepttheta <- rep(0, nc) # accepttheta <- 0
# THE HYPERPARAMETERS FOR GAMMA PRIORS ARE NAMED BY CONCATENATING THE LETTERS A OR B
# (CORRESPONDING TO ALPHA OR BETA) AND THE RANDOM VARIABLE INITIALS FOR WHICH
# THE MCMC IS RUN
Arx <- 4
Brx <- 4
Ary <- 4
Bry <- 4
RX <- rep(rx0, ITER)
RY <- rep(ry0, ITER)
RX_star <- rep(1, ITER)
RY_star <- rep(1, ITER)
acceptRX <- 0
acceptRY <- 0
RU <- rep(ru0, ITER)
RU_star <- rep(0, ITER)
acceptRU <- 0
if(temporal=="on") {
##############################################################################
# PRIOR SPECIFICATIONS
# HYPERPARAMETERS
}
# THE HYPERPARAMETERS FOR GAMMA PRIORS ARE NAMED BY CONCATENATING THE LETTERS A OR B
# (CORRESPONDING TO ALPHA OR BETA) AND THE RANDOM VARIABLE INITIALS FOR WHICH
# THE MCMC IS RUN. SOMETIMES UNDERSCORES ARE INTRODUCED FOR THE SAKE OF CLARITY
if(covariates == "off") {
q1 <- mean(sxu[,1]*syu[,1]*(1-ru[,1]*ru[,1]))
if(covariates == "on") {
q1 <- mean(sxu[,1]*syu[,1]*(1-ru[,1]*ru[,1]))
A_tauk
B_tauk
if(prior_xu == "fixed") {
if(covariates == "off") {
alpha[1] <- np*pv1*n*(B_tauk/A_tauk)/sum(Ku) # 4 # np*n_star/n # np*sum(Ku)/n # np
beta[1] <- np # # np # beta[1]*n/n_star # 1.0*n*beta[1]/sum(Ku) # from 0.01 to 0.99
}
if(covariates == "on") {
alpha[1] <- np*pv1*n*(B_tauk/A_tauk)/sum(Ku*exp(3e-5*efe_u)) # 4 # np*n_star/n
# beta[1] <- np*sum(Ku)/n # np
beta[1] <- np # alpha[1] <- np # beta[1]*n/n_star # 1.0*n*beta[1]/sum(Ku) # from 0.01 to 0.99
}
alpha[1]
beta[1]
##############################################################################
# MCMC IMPLEMENTATION
sum1prev = 0
k = 1
for(k in 1:(ITER-1)) {
if(temporal == "on") {
} else {
Ls[,k] <- Ms[,k]*tauk[k]*t(fxu[,k])%*%k1
Lu[,k] <- Mu[,k]*tauk[k]*t(fxu[,k])%*%k2
}
# CONTROL PANEL 1
if(display=="on"&(k-2-round(k*burn))>1) {
# post_xu <- apply(fxu[,2:k],1,mean)
post_Lu <- apply(Lu[,(1+round(k*burn)):(k-1)],1,mean)
post_tau <- mean(tauk[(1+round(k*burn)):(k-1)])
turn = 1
thetas = numeric(0)
if(covariates == "off") {
nc = 0
post_ND <- mean(NDtemp[(1+round(k*burn)):(k-1)]*ND[(1+round(k*burn)):(k-1)])
var_ND <- var(NDtemp[(1+round(k*burn)):(k-1)]*ND[(1+round(k*burn)):(k-1)])
}
if(covariates == "on") {
post_ND <- mean(NDtemp[(1+round(k*burn)):(k-1)]*NDtheta[(1+round(k*burn))
:(k-1)])
var_ND <- var(NDtemp[(1+round(k*burn)):(k-1)]*NDtheta[(1+round(k*burn))
:(k-1)])
thetas <- apply(theta[,(1+round(k*burn)):(k-1)],1,mean)
turn <- sample(1:nc,1)
}
if(temporal=="on") {
thetas <- c(thetas, apply(thetatemp[,(1+round(k*burn)):(k-1)],1,mean))
turn <- sample(1:(nc+8),1)
surface5a <- matrix(log(post_Lu),Q,P)
image(x_grid,y_grid,surface5a,xlab=" ",ylab=paste("Accpt theta:"
,min(c(1+accepttheta/k,1+acceptthetatemp/k))," to ", max(c(1+accepttheta/k
,1+acceptthetatemp/k))," rho:",min(1+acceptru/k)," to ",max(1+acceptru/k))
,main=paste("Log[post L(u)] after iter = ",k,"with burn of ",100*burn,"%
theta[",turn,"] = ",thetas[turn]," tau = ",post_tau,"
mean(ND) = ",post_ND," var(ND) = ",var_ND),
sub=paste("Acceptance x(u): ",min(1-noaccept/k)," to ",max(1-noaccept/k),"
Acceptance rx,ry:",min(c(1+acceptrx/k,1+acceptry/k))," to ",max(c(1+acceptrx/k
,1+acceptry/k))))
contour(x_grid,y_grid,surface5a,add=TRUE)
post_NDt <- mean(NDt[(1+round(k*burn)):(k-1)])
var_NDt <- var(NDt[(1+round(k*burn)):(k-1)])
text(-84.45, 39.09, label = paste("ND[June 25] = ",post_NDt))
text(-84.45, 39.085, label = paste("var(ND) = ",var_NDt))
points(xd,yd)
} else {
surface5a <- matrix(log(post_Lu),Q,P)
image(x_grid,y_grid,surface5a,xlab=" ",ylab=paste("Accpt theta:"
,min(1+accepttheta/k)," to ", max(1+accepttheta/k)," rho:"
,min(1+acceptru/k)," to ",max(1+acceptru/k)),main=paste("Log[post L(u)]
after iter = ",k,"with burn of ",100*burn,"%
theta[",turn,"] = ",thetas[turn]," tau = ",post_tau,"
mean(ND) = ",post_ND," var(ND) = ",var_ND),
sub=paste("Acceptance x(u): ",min(1-noaccept/k)," to ",max(1-noaccept/k),"
Acceptance rx,ry:",min(c(1+acceptrx/k,1+acceptry/k))," to ",max(c(1+acceptrx/k
,1+acceptry/k))))
contour(x_grid,y_grid,surface5a,add=TRUE)
points(x,y)
}
} # end DISPLAY
####################################################################################
# PROPOSAL DISTRIBUTIONS FOR GAMMA PROCESS x(u)
a_eta <- 2.5 # a_eta <- 125 # a_eta <- 8 # a_eta <- 10
delta_eta <- 0.95 # 0.85; delta_eta <- 0.5
# eta <- rbeta(n_star,a_eta*Ku*delta_eta/2,a_eta*Ku*(1-delta_eta)/2)
eta <- rbeta(n_star,a_eta*delta_eta/2,a_eta*(1-delta_eta)/2)
# index0 <- which(eta=="NaN")
# eta[index0] <- rbeta(length(index0),a_eta*0.001*delta_eta/2,a_eta*0.001*(1-delta_eta)/2)
fxu_star[,k] <- eta*fxu[,k]/delta_eta
# index1 <- which(fxu_star[,k]=="NaN")
# fxu_star[index1,k] <- fxu[index1,k]
index2 <- which(fxu_star[,k]<1e-50)
fxu_star[index2,k] <- 1e-50
# Ls_star[,k] <- Ms[,k]*tauk[k]*t(fxu_star[,k])%*%k1
####################################################################################
# METROPOLIS-HASTINGS SAMPLERS FOR GAMMA PROCESS x(u)
prod = 0
for(i in 1:n) {
prod <- prod + log(Ls[i,k])
}
Lprod[k] <- prod
for(j in 1:n_star) {
if(temporal == "on") {
logprod = 0
for(i in 1:n) logprod <- logprod + log(Ls[i,k]+Mts[i]*Ms[i,k]*tauk[k]
*(fxu_star[j,k]-fxu[j,k])*k1[j,i])
} else {
logprod = 0
for(i in 1:n) logprod <- logprod + log(Ls[i,k]+Ms[i,k]*tauk[k]
*(fxu_star[j,k]-fxu[j,k])*k1[j,i])
}
if(covariates == "off") {
p1 <- (alpha[k]-1)*logfxu_star[j]-fxu_star[j,k]*(beta[k]+NDtemp[k]*Mu[j]
*tauk[k]*Ku[j])+logprod
p2 <- (alpha[k]-1)*logfxu[j]-fxu[j,k]*(beta[k]+NDtemp[k]*Mu[j]*tauk[k]
*Ku[j])+Lprod[k]
mh1 <- exp(p1-p2+q12[j])
}
if(covariates == "on") {
p1 <- (alpha[k]-1)*logfxu_star[j]-fxu_star[j,k]*(beta[k]+NDtemp[k]*tauk[k]
*Kutheta[j])+logprod
p2 <- (alpha[k]-1)*logfxu[j]-fxu[j,k]*(beta[k]+NDtemp[k]*tauk[k]*Kutheta[j])
+Lprod[k]
mh1 <- exp(p1-p2+q12[j])
}
####################################################################################
# UPDATING HYPERPARAMETERS
if(prior_xu == "fixed") {
alpha[k+1] <- alpha[k]
beta[k+1] <- beta[k]
}
# GIBBS SAMPLER FOR tauk (AS IF ALPHA WERE FIXED AND BETA WERE RANDOM)
####################################################################################
# UPDATING THE KERNEL PARAMETERS rx, ry, and rho
if(kernel == "fixed") {
rx[,k+1] <- rx[,k]
ry[,k+1] <- ry[,k]
ru[,k+1] <- ru[,k]
}
####################################################################################
# METROPOLIS-HASTINGS STEP FOR COVARIATES
if(covariates == "on") {
Kutheta_star <- matrix(0, n_star, nc) # Kutheta_star <- rep(0, n_star)
Kuu <- rep(0, n2)
for(j in 1:n_star) {
for(h in 1:nc) {
# Kutheta_star[j,h] <- Kutheta[j] + Kuu%*%exp(efe_u2[,h]*(theta_star[h,k]-theta[h,k]))
Kutheta_star[j,h] <- Kuu%*%exp(as.matrix(efe_u2[,-h])%*%theta[-h,k]+efe_u2[,h]
*theta_star[h,k])
}
# Kutheta_star[j] <- Kuu%*%exp(efe_u2*theta_star[k])
}
q12t <- 0
for(h in 1:nc) {
if(mh1t[h]>1|is.infinite(mh1t[h])) mh1t[h] <- 2
pstart <- min(1,mh1t[h])
theta[h,k+1] <- sample(c(theta_star[h,k],theta[h,k]),1,prob=c(pstart,1-pstart))
if(theta[h,k+1]==theta[h,k]) accepttheta[h] <- accepttheta[h]-1 else {
NDtheta[k+1] <- NDtheta_star[h,k+1]
Kutheta <- Kutheta_star[,h]
} # end else
} # end h loop
} # END covariates M-H
if(temporal == "on") {
mt_prior_temp = rep(0, 8)
sdt_prior_temp <- rep(0.5, 8)
sdt_star_temp <- rep(2.5e-2, 8)
thetatemp_star[,k] <- rnorm(8, thetatemp[,k], sdt_star_temp)
q12tt <- 0
NDtemp_star = rep(0, 8)
# twd = 1
# twk = 1
for(t in 1:365){
indext <- which(t2==t)
tmo <- t4[indext[1]]
twk <- t3[indext[1]]
twd <- t1[indext[1]]
NDtemp_star[1] <- NDtemp_star[1] + exp(thetatemp_star[1,k]*sin(2*pi*tmo/12)
+ thetatemp[2,k]*cos(2*pi*tmo/12)+
thetatemp[3,k]*sin(2*pi*twk/52) + thetatemp[4,k]*cos(2*pi*twk/52)+
thetatemp[5,k]*sin(2*pi*t/365) + thetatemp[6,k]*cos(2*pi*t/365)+
thetatemp[7,k]*sin(2*pi*twd/7) + thetatemp[8,k]*cos(2*pi*twd/7))
NDtemp_star[2] <- NDtemp_star[2] + exp(thetatemp[1,k]*sin(2*pi*tmo/12)
+ thetatemp_star[2,k]*cos(2*pi*tmo/12)+
thetatemp[3,k]*sin(2*pi*twk/52) + thetatemp[4,k]*cos(2*pi*twk/52)+
thetatemp[5,k]*sin(2*pi*t/365) + thetatemp[6,k]*cos(2*pi*t/365)+
thetatemp[7,k]*sin(2*pi*twd/7) + thetatemp[8,k]*cos(2*pi*twd/7))
NDtemp_star[7] <- NDtemp_star[7] + exp(thetatemp[1,k]*sin(2*pi*tmo/12)
+ thetatemp[2,k]*cos(2*pi*tmo/12)+
thetatemp[3,k]*sin(2*pi*twk/52) + thetatemp[4,k]*cos(2*pi*twk/52)+
thetatemp[5,k]*sin(2*pi*t/365) + thetatemp[6,k]*cos(2*pi*t/365)+
thetatemp_star[7,k]*sin(2*pi*twd/7) + thetatemp[8,k]*cos(2*pi*twd/7))
NDtemp_star[8] <- NDtemp_star[8] + exp(thetatemp[1,k]*sin(2*pi*tmo/12)
+ thetatemp[2,k]*cos(2*pi*tmo/12)+
thetatemp[3,k]*sin(2*pi*twk/52) + thetatemp[4,k]*cos(2*pi*twk/52)+
thetatemp[5,k]*sin(2*pi*t/365) + thetatemp[6,k]*cos(2*pi*t/365)+
thetatemp[7,k]*sin(2*pi*twd/7) + thetatemp_star[8,k]*cos(2*pi*twd/7))
if(covariates == "on") {
p1tt <- -NDtemp_star*NDtheta[k+1] + sum_Mt*thetatemp_star[,k]
-(1/(2*sdt_prior_temp^2))*(mt_prior_temp-thetatemp_star[,k])^2
p2tt <- -NDtemp[k]*NDtheta[k+1] + sum_Mt*thetatemp[,k]
-(1/(2*sdt_prior_temp^2))*(mt_prior_temp-thetatemp[,k])^2
mh1tt <- exp(p1tt-p2tt+q12tt)
}
if(covariates == "off") {
p1tt <- -NDtemp_star*ND[k+1] + sum_Mt*thetatemp_star[,k]
-(1/(2*sdt_prior_temp^2))*(mt_prior_temp-thetatemp_star[,k])^2
p2tt <- -NDtemp[k]*ND[k+1] + sum_Mt*thetatemp[,k]
-(1/(2*sdt_prior_temp^2))*(mt_prior_temp-thetatemp[,k])^2
mh1tt <- exp(p1tt-p2tt+q12tt)
}
for(h in 1:8) {
if(mh1tt[h]>1|is.infinite(mh1tt[h])) mh1tt[h] <- 2
pstartt <- min(1,mh1tt[h])
thetatemp[h,k+1] <- sample(c(thetatemp_star[h,k],thetatemp[h,k]),1
,prob=c(pstartt,1-pstartt))
if(thetatemp[h,k+1]==thetatemp[h,k]) acceptthetatemp[h]
<- acceptthetatemp[h]-1
} # end h loop
NDtemp_new = 0
for(t in 1:365){
indext <- which(t2==t)
tmo <- t4[indext[1]]
twk <- t3[indext[1]]
twd <- t1[indext[1]]
for(j in 1:n) {
Mts[j] <- exp(thetatemp[1,k+1]*sin(2*pi*t4[j]/12)
+ thetatemp[2,k+1]*cos(2*pi*t4[j]/12)+
thetatemp[3,k+1]*sin(2*pi*t3[j]/52) + thetatemp[4,k+1]*cos(2*pi*t3[j]/52)+
thetatemp[5,k+1]*sin(2*pi*t2[j]/365) + thetatemp[6,k+1]*cos(2*pi*t2[j]/365)+
thetatemp[7,k+1]*sin(2*pi*t1[j]/7) + thetatemp[8,k+1]*cos(2*pi*t1[j]/7))
}
} # END OF ITERATION k
####################################################################################
# SPATIAL DISTRIBUTIONS, TRACE PLOTS, AND HISTOGRAMS
CC = 1
WK = "June"
burn = 1/2
NL = 8
colores = terrain.colors(128)
post_syu <- apply(syu[,(1+round(k*burn)):(k-1)],1,mean)
surface1b <- matrix(post_syu,Q,P)
image(x_grid,y_grid,surface1b,main=paste("Posterior mean of s2y(u) for
crimes type",CC," and Week ",WK," 2006"))
contour(x_grid,y_grid,surface1b,nlevels=NL,add=TRUE)
points(x,y)
Ku <- pbivariate(min(x_grid)-xu,max(x_grid)-xu,min(y_grid)-yu,max(y_grid)
-yu,post_sxu,post_syu,post_ru)
surface1d <- matrix(Ku,Q,P)
image(x_grid,y_grid,surface1d,main=paste("Posterior mean of K(u) for
crimes type",CC," and Week ",WK," 2006"))
contour(x_grid,y_grid,surface1d,nlevels=NL,add=TRUE)
points(x,y)
cov = 1
cov = 3
####################################################################################
contour(x_grid,y_grid,surface8b,nlevels=NL,add=TRUE)
points(x,y)
####################################################################################
# MAP PLOTS
par(mfrow=c(1,1))
# MAP 1
if(map == "map1") {
map1 <- read.table("cincinnati_map1.dat",header=TRUE)
dim(map1)
mx <- map1[,1]
my <- map1[,2]
mz <- map1[,-c(1,2)]
color1 <- terrain.colors(128)
color2 <- "black"
}
# MAP 2
if(map == "map2") {
long <- read.table("long_map2.dat")
lat <- read.table("lat_map2.dat")
mz <- read.table("cincinnati_map2.dat")
mx <- long[,1]
my <- lat[,1]
color1 <- gray(0:128/128)
color2 <- "yellow"
}
# SPATIAL SCATTERPLOT
# x1 = -84.820305
# x2 = -84.256391
# y1 = 39.021600
# y2 = 39.312056
if(map == "map1") {
stretch_x = 1.25
stretch_y = 1.00
offset_x = 0.00
offset_y = 0.00
}
if(map == "map2") {
stretch_x = 0.95
stretch_y = 1.15
offset_x = 0.01
offset_y = -0.005
}
NL = 15
mI <- post_Lu # mI <- apply(d$intensity, 2, mean)
meanintensity <- ezinterp(xu,yu,mI,method="akima",gridlen=200,span=0.02)
# meanintensity <- ezinterp(d$YY[,1],d$YY[,2],mI,method="akima",gridlen=200,span=0.02)
cov = 1
hist(exp(1000*theta[cov,(1+round(k*burn)):(k-1)]),main=paste("Histogram of
exp(1000*theta) for covariate",cov),sub=paste("mean = "
,mean(exp(1000*theta[cov,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(1000*theta[cov,(1+round(k*burn)):(k-1)]))),breaks=50)
par(mfrow = c(2,1))
hist(exp(1000*theta[1,(1+round(k*burn)):(k-1)]),main="Histogram of
exp(1000*theta) for covariate 1",sub=paste("mean = "
,mean(exp(1000*theta[1,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(1000*theta[1,(1+round(k*burn)):(k-1)]))),breaks=50)
hist(exp(1000*theta[2,(1+round(k*burn)):(k-1)]),main="Histogram of
exp(100*theta) for covariate 2",sub=paste("mean = "
,mean(exp(100*theta[2,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(100*theta[2,(1+round(k*burn)):(k-1)]))),breaks=50)
hist(exp(1000*theta[3,(1+round(k*burn)):(k-1)]),main="Histogram of
exp(1000*theta) for covariate 3",sub=paste("mean = "
,mean(exp(1000*theta[3,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(1000*theta[3,(1+round(k*burn)):(k-1)]))),breaks=50)
hist(exp(1000*theta[4,(1+round(k*burn)):(k-1)]),main="Histogram of
exp(1000*theta) for covariate 4",sub=paste("mean = "
,mean(exp(1000*theta[4,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(1000*theta[4,(1+round(k*burn)):(k-1)]))),breaks=50)
hist(exp(1000*theta[5,(1+round(k*burn)):(k-1)]),main="Histogram of
exp(1000*theta) for covariate 5",sub=paste("mean = "
,mean(exp(1000*theta[5,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(1000*theta[5,(1+round(k*burn)):(k-1)]))),breaks=50)
# DISPLAY PANEL 2
if(covariates == "off") {
post_ND <- mean(ND[(1+round(k*burn)):(k-1)])
var_ND <- var(ND[(1+round(k*burn)):(k-1)])
}
if(covariates == "on") {
post_ND <- mean(NDtheta[(1+round(k*burn)):(k-1)])
var_ND <- var(NDtheta[(1+round(k*burn)):(k-1)])
thetas <- apply(theta[,(1+round(k*burn)):(k-1)],1,mean)
turn <- sample(1:nc,1)
# June 25
t2_date = 17
index_t <- which(t2==t2_date)
t1_date = t1[index_t[1]]
t2_u = rep(t2_date, n_star)
efe_u <- cbind(efe_u[,1:(nc-3)], cos(2*pi*t1_u/7), cos(2*pi*t2_u/365)
, sin(2*pi*t2_u/365))
# efe_u2 <- cbind(efe_u2, cos(2*pi*t1_u2/7), cos(2*pi*t2_u2/365), sin(2*pi*t2_u2/365))
xt <- x[t2==t2_date]
yt <- y[t2==t2_date]
}
for(j in 1:n_star) {
xx1 <- xu2+x1u2-xu[j]
xx2 <- xu2+x2u2-xu[j]
yy1 <- yu2+y1u2-yu[j]
yy2 <- yu2+y2u2-yu[j]
sxx <- rep(post_sxu[j],n2)
syy <- rep(post_syu[j],n2)
ruu <- rep(post_ru[j],n2)
Kuu <- pbivariate(xx1,xx2,yy1,yy2,sxx,syy,ruu,3)
Kutheta[j] <- Kuu%*%exp(as.matrix(efe_u2)%*%thetas)
}
NDtheta_t <- post_tau*fxu[,k]%*%Kutheta
post_Mu <- exp(as.matrix(efe_u)%*%thetas)
post_Lu <- post_Mu*post_tau*k2%*%post_fxu
####################################################################################
par(mfrow=c(2,1))
plot((1+round(k*burn)):(k-1),fxu[index4,(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of x(",index4,") with min L(u)"))
plot((1+round(k*burn)):(k-1),fxu[index3,(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of x(",index3,") with max L(u)"))
hist(fxu[index4,(1+round(k*burn)):(k-1)],breaks=100,main=paste("Histogram of
x(",index4,") with min L(u)"))
hist(fxu[index3,(1+round(k*burn)):(k-1)],breaks=100,main=paste("Histogram of
x(",index3,") with max L(u)"))
plot((1+round(k*burn)):(k-1), tauk[(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of the kernel tau"))
hist(tauk[(1+round(k*burn)):(k-1)],main=paste("Histogram of the kernel tau")
,breaks=100)
####################################################################################
# THETA FOR SPATIAL AND TEMPORAL COVARIATES
par(mfrow=c(2,1))
cov = 5
plot((1+round(k*burn)):(k-1), theta[cov,(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of theta [",cov,"]
covariate =",cov_name[cov]),sub=paste("mean = ",mean(theta[cov
,(1+round(k*burn)):(k-1)])," var = ",var(theta[cov
,(1+round(k*burn)):(k-1)])))
hist(theta[cov,(1+round(k*burn)):(k-1)],main=paste("Histogram of
theta [",cov,"]
covariate =",cov_name[cov]),sub=paste("Acceptance:",1+accepttheta[cov]/k)
,breaks=100)
cov = 2
plot((1+round(k*burn)):(k-1), thetatemp[cov,(1+round(k*burn)):(k-1)]
,type="l",main=paste("Trace plot of thetatemp [",cov,"]"),
sub=paste("mean = ",mean(thetatemp[cov,(1+round(k*burn)):(k-1)])," var = "
,var(thetatemp[cov,(1+round(k*burn)):(k-1)])))
hist(thetatemp[cov,(1+round(k*burn)):(k-1)],main=paste("Histogram of
thetatemp [",cov,"]"),
sub=paste("Acceptance:",1+acceptthetatemp[cov]/k),breaks=100)
####################################################################################
# NUMBER OF EVENTS, N(D), OVER THE STUDY REGION
par(mfrow=c(2,1))
plot((1+round(k*burn)):(k-1), NDk[(1+round(k*burn)):(k-1)],type="l"
,main="Expected number of crime events for the study area"
,xlab="MCMC iteration",ylab="Expected N(D)",sub=paste("mean = "
,mean(NDk[(1+round(k*burn)):(k-1)])," var = "
,var(NDk[(1+round(k*burn)):(k-1)])))
hist(NDk[(1+round(k*burn)):(k-1)],main=paste("Observed number of crime events
= ",length(x)),xlab="Expected number of crime events"
,sub=paste("MCMC with burn-in of ",100*burn," % of ",k," iterations")
,breaks=100)
td = 12 # monthly basis: 1 to 12
monthlab = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"
, "Oct", "Nov", "Dec")
####################################################################################
# MOVIE FRAMES
library(akima)
gridlen = 100
par(mfrow=c(1,1))
framing = "monthly" # "daily", "weekly", "monthly"
monthlab = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"
, "Oct", "Nov", "Dec")
monthday = c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
post_fxu <- apply(fxu[,(1+round(k*burn)):(k-1)], 1, mean)
post_thetatemp <- apply(thetatemp[,(1+round(k*burn)):(k-1)], 1, mean)
post_Ms <- apply(Ms[,(1+round(k*burn)):(k-1)], 1, mean)
post_Mu <- apply(Mu[,(1+round(k*burn)):(k-1)], 1, mean)
# MONTHLY BASIS
# td <- 12
for(td in 1:12) {
xd <- x[t4==td]
yd <- y[t4==td]
NL = 12
} # end td loop
dev.off()
points(xd,yd)
# DAILY BASIS
par(mfrow = c(1,1))
# td <- 176
for(td in 1:365) {
xd <- x[t2==td]
yd <- y[t2==td]
Nobs[td] = length(xd)
Ncalc[td] = post_NDt
Nse[td] = sqrt(var_NDt)
yo=seq(min(yu), max(yu), length=gridlen))
LL = c(40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 240, 280, 320
, 360) # NL = 14
} # end td loop
dev.off()
####################################################################################
# ADDITIONAL GRAPHS (FOR THE FINAL DOCUMENT)
LL = c(40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 240, 280, 320
, 360) # NL = 14
dev.off()
xd <- x[t2==td]
yd <- y[t2==td]
dev.off()
par(mfrow = c(2,1))
lines(1:365, Ncalc+1.96*Nse, col="red")
lines(1:365, Ncalc-1.96*Nse, col="blue")
dev.off()
par(mfrow=c(2,1))
cov = 2
plot((1+round(k*burn)):(k-1), theta[cov,(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of theta [",cov,"]
covariate =",cov_name[cov]),xlab = paste("theta [",cov,"]"), ylab=" "
,sub=paste("mean = ",mean(theta[cov,(1+round(k*burn)):(k-1)])," var = "
,var(theta[cov,(1+round(k*burn)):(k-1)])))
hist(theta[cov,(1+round(k*burn)):(k-1)],xlab = paste("theta [",cov,"]")
, ylab=" ",main=paste("Histogram of theta [",cov,"]
covariate =",cov_name[cov]),sub=paste("Acceptance:",1+accepttheta[cov]/k)
,breaks=100)
dev.off()
par(mfrow=c(2,1))
for(cov in 1:8) {
plot((1+round(k*burn)):(k-1), thetatemp[cov,(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of thetatemp [",cov,"]"),
xlab = paste("theta [",cov,"]"), ylab=" "
,sub=paste("mean = ",mean(thetatemp[cov,(1+round(k*burn)):(k-1)])," var = "
,var(thetatemp[cov,(1+round(k*burn)):(k-1)])))
hist(thetatemp[cov,(1+round(k*burn)):(k-1)],xlab = paste("theta [",cov,"]")
, ylab=" ",main=paste("Histogram of thetatemp [",cov,"]"),
sub=paste("Acceptance:",1+acceptthetatemp[cov]/k),breaks=100)
}
dev.off()
# EXPLORATORY ANALYSIS
ITR = length(x)
i = 1
j = 1
crime_weekday <- numeric(0)
h = t2[1]
k = t1[1]
for (i in 2:ITR) {
ITR
par(mfrow=c(2,1))
dev.off()
summary(lm(crime_count~as.factor(crime_weekday)))
####################################################################################
# ANSCOMBE RESIDUAL
sector_matrix = c(4,4)
xi <- 1+(Q-1)*(xu-min(x_grid))/(max(x_grid)-min(x_grid))
yj <- 1+(P-1)*(yu-min(y_grid))/(max(y_grid)-min(y_grid))
h1 <- ceiling(yj*sector_matrix[1]/P)
h2 <- ceiling(xi*sector_matrix[2]/Q)
sector_loc <- (h1-1)*sector_matrix[2]+h2
for(j in 1:(sector_matrix[1]*sector_matrix[2])) {
index_loc <- which(sector_loc == j)
sector_index[[j]] <- index_loc
}
# MONTHLY BASIS
td <- 1
xd <- x[t4==td]
yd <- y[t4==td]
# DAILY BASIS
# td <- 176
# xd <- x[t2==td]
# yd <- y[t2==td]
for(j in 1:(sector_matrix[1]*sector_matrix[2])) {
index_loc <- which(sector_locs == j)
sector_index_s[[j]] <- index_loc
}
NDobs_sector <- rep(0,sector_matrix[1]*sector_matrix[2])
NDcalc_sector <- rep(0,sector_matrix[1]*sector_matrix[2])
for(j in 1:(sector_matrix[1]*sector_matrix[2])) {
index_events <- sector_index_s[[j]]
if(length(index_events)>0) NDobs_sector[j] <- length(index_events)
monthlab[td]
Table_1
apply(Table_1, 2, sum)
td = td+1
####################################################################################
# MODEL SUMMARY
CC;WK
P;Q;n;n_star
area;n/area
(max(x_grid)-min(x_grid))/(Q-1)
(max(y_grid)-min(y_grid))/(P-1)
post_ND;post_ND/area
summary(theta[(1+round(k*burn)):(k-1)])
summary(tauk[(1+round(k*burn)):(k-1)])
summary(post_rx)
summary(post_ry)
summary(post_sxu)
summary(post_syu)
summary(post_ru)
mean(alpha);var(alpha)
mean(beta);var(beta)
alpha[1];beta[1]
alpha[1]/beta[1];alpha[1]/beta[1]^2
A_tauk;B_tauk
A_tauk/B_tauk;A_tauk/B_tauk^2
sum(Lu_times_Au)
min(noaccept/k);max(noaccept/k)
min(1+acceptrx/k);max(1+acceptrx/k)
min(1+acceptry/k);max(1+acceptry/k)
min(1+acceptru/k);max(1+acceptru/k)
min(1+accepttheta/k);max(1+accepttheta/k)
np;a_eta;delta_eta
A_alpha;B_alpha;A_alpha/B_alpha;A_alpha/B_alpha^2
A_beta;B_beta;A_beta/B_beta;A_beta/B_beta^2
pv1; Nx; Ny
proposal; kernel; prior_xu; covariates; display
# END
####################################################################################
References
[1] Peter J. Diggle. Statistical Methods for Spatio-Temporal Systems, chapter
1 Spatio-Temporal Point Processes: Methods and Applications, pages 1–
45. Chapman & Hall / CRC, 2007.
[7] Jonathan R. Stroud, Peter Müller, and Bruno Sansó. Dynamic models
for spatiotemporal data. Journal of the Royal Statistical Society. Series
B (Statistical Methodology), 63(4):673–689, 2001.