UNIVERSITY OF CALIFORNIA, SANTA CRUZ
MASTER OF SCIENCE
in
STATISTICS AND STOCHASTIC MODELLING
by
Luis Antonio Acevedo-Arreguín
June 2008
____________________________________
Professor Bruno Sansó
____________________________________
Professor Herbert Lee
Copyright © by
Luis Antonio Acevedo-Arreguín
2008
Spatial Temporal Statistical Modeling of Crime Data:
The Kernel Convolution Approach
Luis Antonio Acevedo-Arreguín
Professor Bruno Sansó
Faculty Advisor
Professor Herbert Lee
Academic Advisor
University of California, Santa Cruz
Department of Applied Mathematics and Statistics
June 2008
Abstract
Spatial Temporal Statistical Modeling of Crime Data:
The Kernel Convolution Approach
Møller and Waagepetersen provide the following definitions and remarks on the basic properties of Poisson point processes:
Heuristically, ρ(ξ)dξ is the probability for the occurrence of a
point in an infinitesimally small ball with centre ξ and volume dξ
[5].
Note that their notation is slightly different from the one used here.
Fortunately for our purposes, as they in turn cite Daley and Vere-Jones
(1988, 2003), "statistical inference for space-time processes is often simpler
than for spatial point processes," a point treated further in Section 9.2.5 of
Møller and Waagepetersen's book.
On the other hand, the more complicated issue of spatial clustering
modelling is addressed by Andrew B. Lawson and David G. T. Denison.
They claim that there are two main approaches to modelling clusters;
basically, the difference between those views is whether or not the locations
of the aggregations are parameterized [3]. Bayesian cluster modelling and,
especially, mixture distribution and nonparametric approaches are given more
emphasis because of the development of fast computational algorithms for
sampling from complex Bayesian models (most notably Markov chain Monte
Carlo algorithms).
Method: The kernel convolution approach has been used for several years
to model spatial and temporal processes [7] [4]. For example, Stroud, Müller
and Sansó (2001) applied it to model two large environmental datasets,
whereas Lee, Higdon, Calder, and Holloman (2004) convolved simple Markov
random fields with a smoothing kernel to model cases in hydrology and
aircraft prototype testing. In most applications of the kernel convolution
approach, Gaussian kernels were used. The emphasis on Gaussian spatial
and space-time models is because they are "quite flexible and can be
adapted to a wide variety of applications, even where the observed data are
markedly non-Gaussian" [2].
David Higdon explains the reason for using convolution models:
[Gaussian Markov random field] GMRF models work well for im-
age and lattice data; however, when data are irregularly spaced, a
continuous model for the spatial process z(s) is usually preferable.
In this section, convolution (or, equivalently, kernel) models are
introduced. These models construct a continuous spatial model
z(s) by smoothing out a simple, regularly spaced latent process.
In some cases, a GMRF model is used for this latent process.
The convolution process z(s) is determined by specifying a latent
process x(s) and a smoothing kernel k(s). We restrict the latent
process x(s) to be nonzero at the fixed spatial sites ω1, ..., ωm,
also in S, and define x = (x1, ..., xm)T where xj = x(ωj), j =
1, ..., m. For now, the xj's are modeled as independent draws
from a N(0, 1/λx) distribution. The resulting continuous Gaussian
process is then

$$z(s) = \int_S k(u - s)\, dx(u) = \sum_{j=1}^{m} k(\omega_j - s)\, x_j$$
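The discrete convolution above can be sketched numerically. The following is a Python illustration (not the report's R code): a handful of latent values xj at regular sites ωj, smoothed by a Gaussian kernel into a continuous function z(s); the kernel width σ is an arbitrary choice here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent sites omega_1, ..., omega_m on a regular grid; x_j ~ N(0, 1/lambda_x)
m, lam_x = 20, 4.0
omega = np.linspace(0.0, 1.0, m)
x = rng.normal(0.0, 1.0 / np.sqrt(lam_x), size=m)

def gaussian_kernel(d, sigma=0.08):
    """Smoothing kernel k(d) evaluated at a displacement d."""
    return np.exp(-0.5 * (d / sigma) ** 2)

def z(s):
    """z(s) = sum_j k(omega_j - s) x_j, the discrete kernel convolution."""
    return np.sum(gaussian_kernel(omega - s) * x)

# z(.) is a smooth function of s built from only m latent values
zs = np.array([z(s) for s in np.linspace(0.0, 1.0, 101)])
```

The same construction with a Gamma latent process Z(u) underlies the intensity model used later in the report.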
Data: The crime records were downloaded from

http://www.cincinnati-oh.gov/police/pages/-5192-/

which holds records of crimes committed in Hamilton County for several years.
The database reports the date, time, and location of crimes, as well as other
data that might be useful to characterize the magnitude of the reported event.
A UCR code, which appears to be related to the uniform crime reporting code
that the FBI uses nationwide, was assigned to each event. There were more
than 70 different UCR codes describing a variety of events such as telephone
harassment, vehicle theft, murder, and the like. Although very diverse, this
variable was used to reclassify the data for use in the test models.
Specifically, the data corresponding to 2006 were downloaded, address
geocoded, and imported into R, and a simple descriptive statistical analysis
was performed. Since the database only reported the street address of each
crime, the more than 43,000 records of that year were processed to obtain
their geographical coordinates. The geocoding was conducted by using
online services. The website

http://www.gpsvisualizer.com

was helpful because, by acquiring a Google API key, users can geocode
thousands of records a day. Thus, converting multiple addresses to GPS
coordinates only requires a minor modification of the HTML code of the
geocoder webpage to include the API key, reduce the google delay value to
0.5 seconds or less, and increase the number of records to be processed at once.
Once geocoding was performed, the data were imported into R and some
temporal variables were added, such as the day of the week and the day of
the year on which the crime was committed. Also, the UCR codes were
transformed into four main categories of events: crimes against people with
extreme violence, crimes against people with minor violence, crimes against
property, and crimes against the system. For example, the categorical
variable crime class, which was incorporated into the database

CRIME2006_plus3.dat,

was given values from 1 to 4 depending on the category of the crime. A crime
with a UCR of 105 (corresponding to murder) was given a crime class value
of 1, whereas a crime with a UCR of 1120 (passing bad checks) was given a
crime class value of 4, and so on. These new categorical variables allowed us
to perform a preliminary analysis, which was summarized in some box plots
and time series plots. A complete description of the UCR codes used by the
Cincinnati police is provided in the file UCR code description.txt.
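The recoding can be sketched as follows (a Python illustration; only the UCR ranges and examples actually stated in this report are encoded, the full table lives in UCR code description.txt, so the fall-through behavior here is hypothetical).

```python
def crime_class(ucr):
    """Map a UCR code to one of the four crime classes described above.

    Only the ranges given in the report are reproduced: codes 100-495 and
    800-864 form class 1 (crimes against people with extreme violence), and
    1120 (passing bad checks) illustrates class 4 (crimes against the system).
    """
    if 100 <= ucr <= 495 or 800 <= ucr <= 864:
        return 1
    if ucr == 1120:
        return 4
    return None  # classes 2 and 3 are not itemized in this report
```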
From a preliminary analysis of the entire dataset, a cyclical pattern was
observed in plots of the number of crimes both with respect to the day
of the week and with respect to the day of the year. The plots showed
that the highest incidence of crimes happened around the middle of the year,
whereas the values tended to decrease toward the end of the year. Similarly,
a higher number of crimes was reported on Mondays than on the rest of the
days of the week. A similar pattern emerged when a part of the dataset, the
one corresponding to crimes against people with extreme violence, was
plotted with respect to the temporal covariates. Thus, type 1 crimes were
selected for statistical modeling. This crime category includes the UCR
codes 100 to 495 and 800 to 864, which along with the rest of the data are
in the file crime2006 plus3.dat. The file CRIME2006 database description.txt
provides more details on the entire dataset.
As part of the data processing, importing maps into R was another task,
requiring a search for mapping resources both for obtaining satellite
photographs and for georeferencing the imported images. Google, especially

http://earth.google.com,

was again a good source of satellite images of the study area, in much the
same way the website

http://tiger.census.gov

was very helpful, not only for providing the spatial covariates later included
in the models, but also for generating maps of any part of the United States
by just specifying the GPS coordinates of the area of interest. For example,
to generate a map of Hamilton County, whose Tiger code is TGR39061, the
user only needs to type the following address into a web browser
http://tiger.census.gov/cgi-bin/mapgen?lat=39.166828&
lon=-84.538348&wid=0.290456&ht=0.290456&iwd=480&iht=480
in a single line and without spaces. The parameters included in the link were
computed by using the boundary coordinates of Hamilton County, also
provided by the Census website (i.e., −84.820305 < longitude < −84.256391
and 39.021600 < latitude < 39.312056). The GPS coordinates in the link
correspond to the center of the map; the wid and ht values represent the
width and the height of the image in GPS units, whereas the iwd and iht
values represent the same dimensions in pixels. Thus, the width and the
height of the image were chosen depending on the dimensions of the JPEG
image to be generated.
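The link parameters follow directly from the county's bounding box: the center is the midpoint and the extent is the latitude span. A small Python sketch reproducing the values in the URL above:

```python
# Bounding box of Hamilton County as given by the Census website
lon_min, lon_max = -84.820305, -84.256391
lat_min, lat_max = 39.021600, 39.312056

# Center of the map, and image extent in GPS units (square image here)
lat = (lat_min + lat_max) / 2
lon = (lon_min + lon_max) / 2
ht = lat_max - lat_min
wid = ht

# Assemble the mapgen request shown in the text
url = (f"http://tiger.census.gov/cgi-bin/mapgen?lat={lat:.6f}"
       f"&lon={lon:.6f}&wid={wid:.6f}&ht={ht:.6f}&iwd=480&iht=480")
```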
The JPEG file with the map of Cincinnati was later processed in R by
using the package rimage. This was required to generate a surface matrix
that could be used by the command image as many times as needed
without demanding a lot of computational time, and also to facilitate the
georeferencing of the JPEG map. Georeferencing was necessary to plot the
crime points on a map without transforming the GPS coordinates of each
point into another coordinate system. A satellite image of Cincinnati
obtained from Google Earth was processed in the same way, generating
the file Cincinnati map1.dat for the option "map1" in the computer
program for model 1, and the files Cincinnati map2.dat, long map2.dat, and
lat map2.dat for the option "map2" in the same program. The option
"map1" corresponds to the simple road map, whereas the option "map2"
corresponds to the satellite photograph. These files are required to generate
the background on the plots both for the figures included in this report and
for the backgrounds in the accompanying video clips.
Model Statement: Under the kernel convolution approach, the intensity
λ(s, t) of a point process is modeled as the convolution of a random process
Z(s) and a weighting kernel k(s − u) over a grid of u locations. Both
the spatial and the temporal covariates are included in the model through
multiplicative effects μs(s) and μt(t), so for a Poisson process on an
observation window R, the corresponding expressions for the intensity, the
expected number of points, and the likelihood for n points y ∈ d occurring
at times t = 1, ..., T are
$$\lambda(s, t) = \tau \sum_{u} k(s - u)\, Z(u)\, \mu_s(u)\, \mu_t(t) \qquad (2)$$

$$\Lambda_{R,T} = \int_T \int_R \lambda(s, t)\, ds\, dt \qquad (3)$$

$$L(\lambda \mid d) = \prod_{t \in T} \exp(-\Lambda_R(t)) \prod_{i=1}^{n} \lambda_t(y_i) \qquad (4)$$
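On the log scale, the single-period likelihood (4) is the familiar inhomogeneous Poisson process expression: minus the integrated intensity plus the sum of log intensities at the observed points. A minimal Python sketch (an illustration, not part of the report's R implementation):

```python
import numpy as np

def poisson_loglik(lam_at_points, Lambda_total):
    """Log-likelihood of an inhomogeneous Poisson process for one period:
    log L = -Lambda + sum_i log lambda(y_i), where Lambda is the intensity
    integrated over the observation window."""
    return -Lambda_total + np.sum(np.log(lam_at_points))

# Toy check on a homogeneous process: lambda = 2 on a unit window with
# 3 observed points gives log L = -2 + 3*log(2)
ll = poisson_loglik(np.array([2.0, 2.0, 2.0]), 2.0)
```

Summing this quantity over t = 1, ..., T gives the log of the full likelihood (4).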
where u indicates a grid location and s indicates a point location. The spatial
multiplicative effect µs is a function of two spatial covariates, X1 (s) as the
population density in year 2000 (number of individuals per square mile) and
X2 (s) as the number of vacant units,
whereas the temporal multiplicative effect is based on a linear combination
of sines and cosines of four temporal covariates,
$$\mu_t(t) = \exp\Big(\theta_{t1} \sin\big(\tfrac{2\pi t_4}{12}\big) + \theta_{t2} \cos\big(\tfrac{2\pi t_4}{12}\big) + \theta_{t3} \sin\big(\tfrac{2\pi t_3}{52}\big) + \theta_{t4} \cos\big(\tfrac{2\pi t_3}{52}\big) + \theta_{t5} \sin\big(\tfrac{2\pi t_2}{365}\big) + \theta_{t6} \cos\big(\tfrac{2\pi t_2}{365}\big) + \theta_{t7} \sin\big(\tfrac{2\pi t_1}{7}\big) + \theta_{t8} \cos\big(\tfrac{2\pi t_1}{7}\big)\Big) \qquad (6)$$
where t1 ∈ {1, ..., 7} is the day of the week (1 for Sunday), t2 ∈ {1, 2, ..., 365} is
the day of the year (1 for January 1st, 2006), t3 ∈ {1, 2, ..., 52} is the week
number, and t4 ∈ {1, 2, ..., 12} is the month number (1 for January).
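Eq. (6) can be evaluated directly as the exponential of a dot product between the eight coefficients and the sine/cosine basis. A Python sketch (the θ values used in the check below are placeholders, not the fitted posterior means):

```python
import numpy as np

def mu_t(theta, t1, t2, t3, t4):
    """Temporal multiplicative effect of Eq. (6).

    theta holds the eight coefficients theta_t1, ..., theta_t8; t1..t4 are
    the day-of-week, day-of-year, week, and month covariates."""
    basis = [np.sin(2 * np.pi * t4 / 12), np.cos(2 * np.pi * t4 / 12),
             np.sin(2 * np.pi * t3 / 52), np.cos(2 * np.pi * t3 / 52),
             np.sin(2 * np.pi * t2 / 365), np.cos(2 * np.pi * t2 / 365),
             np.sin(2 * np.pi * t1 / 7), np.cos(2 * np.pi * t1 / 7)]
    return np.exp(np.dot(theta, basis))
```

With all coefficients set to zero the effect is neutral (μt = 1), which is a convenient sanity check.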
The kernels over a 13 × 10 grid were chosen to be bivariate Gaussian with
fixed parameters σx² and σy², which were estimated so that, for the elliptical
contours of each bivariate Gaussian, one standard deviation from its center
(the u location) in both the x and y directions equals 52.5% of the distance
between two grid points in the same row or column. The correlation ρ was
set to zero. Associated with these kernels was a Gamma process Z(u) with
fixed hyperparameters α and β, and a multiplicative factor τ that played the
role of transforming Z(u) into the process τ·Z(u), with one of its
hyperparameters, α or β, acting as a random variable. The corresponding
prior distributions for all the parameters of the model were chosen to be
$$\pi(\tau \cdot Z(u)) \sim \tau \cdot \mathrm{Gamma}\!\left(\frac{7}{4} \cdot \frac{2\pi\sigma_x\sigma_y}{\int_R k(s-u)\, ds},\ \frac{7}{4}\right) \qquad (7)$$

$$\pi(\tau) \sim \mathrm{Gamma}\!\left(\frac{7}{4} \cdot \frac{0.0075}{2\pi\sigma_x\sigma_y},\ \frac{7}{4}\right) \qquad (8)$$

$$\pi(\theta_1) \sim N(0,\ 0.0001^2) \qquad (9)$$

$$\pi(\theta_2) \sim N(0,\ 0.0005^2) \qquad (10)$$

$$\pi(\theta_{tj}) \sim N(0,\ 0.5^2) \qquad (11)$$
Results: The model parameters were estimated by using Markov chain
Monte Carlo (MCMC). Specifically, for the posterior distribution of τ·Z(u),
a beta proposal was implemented to improve the acceptance rate of the
proposed value at a new iteration k of the M-H step. Thus, the proposal
Z′(u) for a new Z^k(u) was sampled from

$$Z'(u) \sim \frac{Z^{k-1}(u)}{\delta}\, \mathrm{Beta}\!\left(\frac{a\delta}{2},\ \frac{a(1-\delta)}{2}\right), \qquad (12)$$

where δ and a were set to 0.95 and 2.5, respectively. This multiplicative
random walk seemed to induce fast convergence of the MCMC. More details
on this approach can be found in Sansó (2007). The parameter τ was sampled
from its posterior Gamma distribution by a Gibbs step. The rest of the
parameters were sampled from their corresponding posterior distributions
by using M-H with normal proposals. Thus, the proposal distributions for
the spatial θs were

$$\theta_1' \sim N(\theta_1^{k-1},\ 0.000005^2) \qquad (13)$$

$$\theta_2' \sim N(\theta_2^{k-1},\ 0.00025^2) \qquad (14)$$

whereas the proposal distributions for the temporal θs were simply

$$\theta_{tj}' \sim N(\theta_{tj}^{k-1},\ 0.025^2), \qquad (15)$$
where j ∈ {1, 2, ..., 8}. For modeling the daily variation of the intensity λ,
5000 iterations were required for convergence, with a burn-in of 2500. Since
the entire computer program was coded in R, the simulation took over 20
hours per run. The code is included in the file ppm llnl ver7a.r.
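The multiplicative beta random walk of Eq. (12) is easy to sample: the beta factor has mean δ, so the proposal is centered at the previous value. A Python sketch with numpy (an illustration, not the report's R code; δ = 0.95 and a = 2.5 as in the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def beta_proposal(z_prev, delta=0.95, a=2.5, size=None):
    """Multiplicative random-walk proposal of Eq. (12):
    Z' = (z_prev / delta) * Beta(a*delta/2, a*(1-delta)/2).
    The Beta factor has mean a*delta/2 / (a/2) = delta, so E[Z'] = z_prev."""
    b = rng.beta(a * delta / 2, a * (1 - delta) / 2, size=size)
    return z_prev / delta * b

# Proposals stay positive (as a Gamma process requires) and are centered
# at the previous value
draws = beta_proposal(3.0, size=200_000)
```

Because the proposal is asymmetric, the M-H acceptance ratio must include the corresponding proposal-density correction.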
Once the posterior means of the spatial and temporal parameters θ were
computed, the corresponding multiplicative factors μs(u) and μt(t) were
estimated as

$$\mu_s(u) = \exp(0.000141\, X_1(u) + 0.004256\, X_2(u)) \qquad (16)$$

$$\mu_t(t) = \exp\Big(0.027926 \sin\big(\tfrac{2\pi t_4}{12}\big) - 0.091187 \cos\big(\tfrac{2\pi t_4}{12}\big) - 0.222996 \sin\big(\tfrac{2\pi t_3}{52}\big) + 0.226660 \cos\big(\tfrac{2\pi t_3}{52}\big) + 0.134605 \sin\big(\tfrac{2\pi t_2}{365}\big) - 0.229748 \cos\big(\tfrac{2\pi t_2}{365}\big) + 0.056335 \sin\big(\tfrac{2\pi t_1}{7}\big) + 0.037308 \cos\big(\tfrac{2\pi t_1}{7}\big)\Big) \qquad (17)$$
which, in conjunction with the baseline intensity λ(u), allow us to make
inferences on the expected number of crime events Λ over the region of
interest per day. Contour plots of the intensity λ for an area of Cincinnati
delimited by the longitudes 84.63°W and 84.38°W, as well as the latitudes
39.09°N and 39.22°N, were produced for each of the 365 days of 2006, and
can be observed in the 6-min movie Video-2.wmv.
Conclusions: The model allowed us to obtain a picture of the criminal hot
spots of the metropolitan area of Cincinnati, OH, by providing, on an actual
map, the locations of the various modes of the spatial distribution of the
intensity λ and their evolution over time. By incorporating the information
from other spatial variables that were considered constant with respect to
time, such as population density or the number of houses for rent, it was
possible to visually find the correlation between crime intensity and densely
populated areas of Cincinnati.
The model also served the objective of testing new ways to deal with
the massive computational resources required to process thousands of data
points on hundreds of grid points, by using new proposal distributions for the
Metropolis-Hastings steps to obtain fast convergence of the MCMC. The beta
proposal resulted in faster MCMC iterations than the traditional Gaussian
proposal. Faster simulations might be obtained by translating the R code to
Fortran or C++. We wrote the entire computer code in R because of its
advantages for educational settings with limited computational resources
(i.e., it is open source), and because of its graphical capabilities, which
allowed us to follow the MCMC iterations on the computer screen in real time.
This model might be improved by incorporating kernels with parameters
varying over space and time, to explore the correlation between crime
activity and city infrastructure such as roads or land use. Also, a preliminary
summary of criminal activity based on the spatial distribution of events
occurring during certain days of the week, month, or year might be
incorporated into the model to explore its forecasting potential.
Acknowledgements: This master's project would not have been possible
without the support of Dr. William Hanley and his team at the Lawrence
Livermore National Laboratory. Likewise, Professors Bruno Sansó and
Herbert Lee, as well as Dr. Matt Taddy, were especially important academic
advisors in bringing this project to a fruitful end.
[Figure 1 plot: "Data points and lattice for kernel convolution modeling"; crime locations (dots) and grid points (+s) over axes xu and yu, latitude ticks 39.08 to 39.22.]

Figure 1: Two different grids were used to estimate the parameters of model 1. The Gamma process Z(u) was modeled over a 13 × 10 grid (the +s in the figure), whereas the spatial covariate distributions were imported from a 20 × 20 grid (the empty squares in the figure) when needed to compute $\int_R \tau\, k(s - u)\, Z(u)\, \exp(\theta_1 X_1(u) + \theta_2 X_2(u))\, ds$. The figure also shows the type 1 crime locations over the year.
[Figure 2 plot: "CRIME EVENTS REPORTED: Daily variation of crimes against PEOPLE (Case 1: Extreme Violence)"; upper panel by day of week (1 to 7), lower panel by day of year.]

Figure 2: Type 1 crime, which includes events with a high level of violence, especially against people, showed a cyclical pattern like that shown by the entire dataset. There was a high number of crimes reported on Mondays (upper panel) and high rates of criminal activity around the middle of the year (lower panel).
[Figure 3 plot: "Cincinnati Crime Data: Mean Intensity Surface Baseline"; axes Longitude and Latitude, 39.10 to 39.22.]

Figure 3: The mean baseline λ(u), or the mean intensity surface when μs(u) = μt(t) = 1.
[Figure 4 plot: "Cincinnati Crime Data: Mean Intensity Surface, Jun/25/2006"; axis Latitude, 39.10 to 39.22.]

Figure 4: The mean intensity surface corresponding to June 25th, 2006. This picture is a frame taken from a movie generated in R and post-processed with Microsoft Windows Media Encoder 9.
[Figure 5 plot: trace plot and histogram of theta[1] (covariate = population density 2000); posterior mean ≈ 0.000141, variance ≈ 2.29e−11, acceptance rate ≈ 0.48.]

Figure 5: The trace plot for the parameter θ1 shows acceptable mixing (upper panel), whereas the estimated posterior mean of θ1, whose histogram is depicted in the lower panel, shows that, when used to compute μs(u), an increment of 1000 new residents might increase the intensity of crime by around 15%.
[Figure 6 plot: trace plot and histogram of theta[2] (covariate = vacant units); posterior mean ≈ 0.004256, variance ≈ 8.59e−08, acceptance rate ≈ 0.59.]

Figure 6: The trace plot for the parameter θ2 shows acceptable mixing (upper panel), whereas the estimated posterior mean of θ2, whose histogram is depicted in the lower panel, shows that, when used to compute μs(u), an increment of 10 vacant units might increase the intensity of crime by around 4.3%. If at some time 50 houses or apartments ended up with no occupants, the intensity of crime events might increase by 23.7%.
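The percentage effects quoted in the captions of Figures 5 and 6 follow from Eq. (16): a change ΔX in a covariate scales the intensity by exp(θ ΔX). A quick Python check using the posterior means reported in Eq. (16):

```python
import math

# Posterior means of the spatial coefficients from Eq. (16)
theta1 = 0.000141   # population density (residents per square mile)
theta2 = 0.004256   # vacant units

def pct_increase(theta, delta_x):
    """Percent change in intensity implied by exp(theta * delta_x)."""
    return (math.exp(theta * delta_x) - 1.0) * 100.0

rise_1000_residents = pct_increase(theta1, 1000)  # ~15%   (Figure 5)
rise_10_vacant = pct_increase(theta2, 10)         # ~4.3%  (Figure 6)
rise_50_vacant = pct_increase(theta2, 50)         # ~23.7% (Figure 6)
```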
[Figure 7 plot: "Observed Number of Crimes"; two panels of daily counts (Nobs, 0 to 20).]

Figure 7: The daily variation of the number of crime events, n and ΛR(t), plotted from the observed data and from the values estimated according to model 1.
[Figure 8 plot: "Population density 2006 in county 39061"; axes rmx and rmy, latitude ticks 39.10 to 39.22.]

Figure 8: Future research might explore kernels with temporally and spatially varying parameters.
R source code:
ITER = 5000
burn = 1/2
map = "map1" # "map1" = atlas map; "map2" = satellite map
P = 13
Q = 10
x1 = -84.820305
x2 = -84.256391
y1 = 39.021600
y2 = 39.312056
# WORKING DIRECTORY
set.seed(9132)
##############################################################################
# SUBROUTINES AND FUNCTIONS
"ezinterp" <-
function(x, y, z, method="loess", gridlen=40,
span=0.05, ...)
{
xyz.loess <-
suppressWarnings(loess(z ~ x + y, data.frame(x=x, y=y),
span=span, ...))
##############################################################################
# LUIS ARREGUIN'S SUBROUTINE TO COMPUTE BIVARIATE NORMAL PROBABILITY
sd1 = sqrt(xvar)
sd2 = sqrt(yvar)
deltax = (x2-x1)/slices
xi = x1 + deltax/2
volume = 0
for(i in 1:slices) {
m3 = xi*rho*sd2/sd1
sd3 = sd2*sqrt(1-rho*rho)
xi <- xi + deltax }
q0 <- sd1*sd2*sqrt(1-rho*rho)
volume <- 2*pi*q0*volume
volume }
# END pbivariate
##############################################################################
# DATA INPUT AND GRID SETTINGS
if(dataset == "complete") {
attach(crime2006_file3)
CC = 1
if(length(index)>0) {
x <- datax[-index]
y <- datay[-index]
t1 <- t1[-index]
t2 <- t2[-index]
t3 <- t3[-index]
t4 <- t4[-index]
t5 <- t5[-index]
} else {
x <- datax
y <- datay
}
if(dataset == "partial") {
data1 <- read.csv("juneData.csv")
x <- data1[,2]
y <- data1[,3]
}
if(covariates == "on") {
nc = length(civ)
n2 = length(xu2)
P2 = 20 # SECONDARY GRID WITH INFORMATION ON COVARIATES
Q2 = 20
} # END if covariates on
n = length(x)
delta_x
delta_y
n_star = P*Q
if(covariates == "on") {
xu = rep(x_grid, times=P)
yu = rep(y_grid, each=Q)
for(i in 1:Q) {
y1u[i] <- 0
y2u[(P-1)*Q + i] <- 0
}
for(j in 1:P) {
x1u[(j-1)*Q + 1] <- 0
x2u[(j-1)*Q + Q] <- 0
}
if(covariates == "on") {
for(i in 1:Q2) {
y1u2[i] <- 0 # y1u2[i] <- min(yu2)-min(yu)
y2u2[(P2-1)*Q2 + i] <- 0 # y2u2[(P2-1)*Q2 + i] <- max(yu)-max(yu2)
}
for(j in 1:P2) {
x1u2[(j-1)*Q2 + 1] <- 0 # x1u2[(j-1)*Q2 + 1] <- min(xu2)-min(xu)
x2u2[(j-1)*Q2 + Q2] <- 0 # x2u2[(j-1)*Q2 + Q2] <- max(xu)-max(xu2)
}
par(mfrow=c(1,1))
plot(xu, yu, pch="+", main="Data points and lattice for kernel convolution modeling")
points(x,y,pch=20)
if(covariates == "on") points(xu2,yu2,pch=22)
n
n_star
# xi <- 1+(Q-1)*(x-min(x_grid))/(max(x_grid)-min(x_grid))
# yj <- 1+(P-1)*(y-min(y_grid))/(max(y_grid)-min(y_grid))
# loc <- (round(yj)-1)*Q+round(xi)
if(covariates == "on") {
xi <- 1+(Q2-1)*(xu-min(x_grid2))/(max(x_grid2)-min(x_grid2))
yj <- 1+(P2-1)*(yu-min(y_grid2))/(max(y_grid2)-min(y_grid2))
loc_u <- (floor(yj)-1)*Q2+floor(xi)
w1 = (1+floor(xi)-xi)*(1+floor(yj)-yj)
w2 = (xi-floor(xi))*(1+floor(yj)-yj)
w3 = (1+floor(xi)-xi)*(yj-floor(yj))
w4 =(xi-floor(xi))*(yj-floor(yj))
efe_u <- w1*efe_u2[loc_u,]+w2*efe_u2[loc_u+1,]+w3*efe_u2[loc_u+Q2,]+w4*efe_u2[loc_u+Q2+1,]
xi <- 1+(Q2-1)*(x-min(x_grid2))/(max(x_grid2)-min(x_grid2))
yj <- 1+(P2-1)*(y-min(y_grid2))/(max(y_grid2)-min(y_grid2))
loc_s <- (floor(yj)-1)*Q2+floor(xi)
index <- which(loc_s<1|loc_s>n2)
if(length(index)>0) {
x <- x[-index]
y <- y[-index]
t1 <- t1[-index]
t2 <- t2[-index]
t3 <- t3[-index]
t4 <- t4[-index]
t5 <- t5[-index]
n <- length(x)
xi <- 1+(Q2-1)*(x-min(x_grid2))/(max(x_grid2)-min(x_grid2))
yj <- 1+(P2-1)*(y-min(y_grid2))/(max(y_grid2)-min(y_grid2))
loc_s <- (floor(yj)-1)*Q2+floor(xi)
}
w1 = (1+floor(xi)-xi)*(1+floor(yj)-yj)
w2 = (xi-floor(xi))*(1+floor(yj)-yj)
w3 = (1+floor(xi)-xi)*(yj-floor(yj))
w4 =(xi-floor(xi))*(yj-floor(yj))
efe_s <- w1*efe_u2[loc_s,]+w2*efe_u2[loc_s+1,]+w3*efe_u2[loc_s+Q2,]+w4*efe_u2[loc_s+Q2+1,]
if(temporal == "on") {
# nc = dim(efe_s)[2]
sum_Mt = rep(0, 8)
for(j in 1:n) {
sum_Mt[1] <- sum_Mt[1] + sin(2*pi*t4[j]/12)
sum_Mt[2] <- sum_Mt[2] + cos(2*pi*t4[j]/12)
sum_Mt[3] <- sum_Mt[3] + sin(2*pi*t3[j]/52)
sum_Mt[4] <- sum_Mt[4] + cos(2*pi*t3[j]/52)
sum_Mt[5] <- sum_Mt[5] + sin(2*pi*t2[j]/365)
sum_Mt[6] <- sum_Mt[6] + cos(2*pi*t2[j]/365)
sum_Mt[7] <- sum_Mt[7] + sin(2*pi*t1[j]/7)
sum_Mt[8] <- sum_Mt[8] + cos(2*pi*t1[j]/7)
}
# MONTHLY BASIS
# td <- 6
# xd <- x[t4==td]
# yd <- y[t4==td]
# DAILY BASIS
td <- 176
xd <- x[t2==td]
yd <- y[t2==td]
n
n2
n_star
##############################################################################
# MODELING PART ONE (KERNEL SETTINGS)
dy <- matrix(0, n_star, n_star)
for(i in 1:n_star) {
for(j in 1:n_star) {
dx[j,i] <- xu[i] - xu[j]
dy[j,i] <- yu[i] - yu[j]
}}
for(i in 1:n) {
for(j in 1:n_star) {
hx[j,i] <- x[i]-xu[j]
hy[j,i] <- y[i]-yu[j]
h11[j,i] <- hx[j,i]*hx[j,i]
h12[j,i] <- hx[j,i]*hy[j,i]
h22[j,i] <- hy[j,i]*hy[j,i]
}}
rx0 = kernel_size
ry0 = kernel_size
ru0 = 0
sxu <- matrix((rx0*(max(x_grid)-min(x_grid))/(Q-1))^2,n_star, ITER)
syu <- matrix((ry0*(max(y_grid)-min(y_grid))/(P-1))^2,n_star, ITER)
post_ru <- rep(ru0,n_star)
ru <- matrix(ru0, n_star, ITER)
##############################################################################
# KERNEL MATRIX COMPUTATIONS
for(i in 1:n_star) {
for(j in 1:n_star) {
q1 <- sxu[j,1]*syu[j,1]*(1-ru[j,1]*ru[j,1])
q2 <- sxu[j,1]*dy[j,i]*dy[j,i]+syu[j,1]*dx[j,i]*dx[j,i]-2*ru[j,1]*sqrt(sxu[j,1]*syu[j,1])*dx[j,i]*dy[j,i]
}}
for(i in 1:n) {
for(j in 1:n_star) {
q1 <- sxu[j,1]*syu[j,1]*(1-ru[j,1]*ru[j,1])
q2 <- sxu[j,1]*hy[j,i]*hy[j,i]+syu[j,1]*hx[j,i]*hx[j,i]-2*ru[j,1]*sqrt(sxu[j,1]*syu[j,1])*hx[j,i]*hy[j,i]
k1[j,i] <- exp(-q2/(2*q1))
} # END OF ITERATION j
} # END OF ITERATION i
Ku <- pbivariate(min(x_grid)-xu,max(x_grid)-xu,min(y_grid)-yu,max(y_grid)-yu,sxu[,1],syu[,1],ru[,1])
##############################################################################
# MODELING PART TWO (MARKOV CHAIN MONTE CARLO ALGORITHM)
par(mfrow=c(1,1))
ND = rep(0, ITER)
ND[1] <- n
if(covariates=="on") {
theta <- matrix(0, nc, ITER) # theta <- rep(0, ITER)
theta_star <- matrix(0, nc, ITER) # theta_star <- rep(0, ITER)
accepttheta <- rep(0, nc) # accepttheta <- 0
# THE HYPERPARAMETERS FOR GAMMA PRIORS ARE NAMED BY CONCATENATING THE LETTERS A OR B
# (CORRESPONDING TO ALPHA OR BETA) AND THE RANDOM VARIABLE INITIALS FOR WHICH
# THE MCMC IS RUN
Arx <- 4
Brx <- 4
Ary <- 4
Bry <- 4
RX <- rep(rx0, ITER)
RY <- rep(ry0, ITER)
RX_star <- rep(1, ITER)
RY_star <- rep(1, ITER)
acceptRX <- 0
acceptRY <- 0
RU <- rep(ru0, ITER)
RU_star <- rep(0, ITER)
acceptRU <- 0
if(temporal=="on") {
##############################################################################
# PRIOR SPECIFICATIONS
# HYPERPARAMETERS
}
# THE HYPERPARAMETERS FOR GAMMA PRIORS ARE NAMED BY CONCATENATING THE LETTERS A OR B
# (CORRESPONDING TO ALPHA OR BETA) AND THE RANDOM VARIABLE INITIALS FOR WHICH
# THE MCMC IS RUN. SOMETIMES UNDERSCORES ARE INTRODUCED FOR THE SAKE OF CLARITY
if(covariates == "off") {
q1 <- mean(sxu[,1]*syu[,1]*(1-ru[,1]*ru[,1]))
if(covariates == "on") {
q1 <- mean(sxu[,1]*syu[,1]*(1-ru[,1]*ru[,1]))
A_tauk
B_tauk
if(prior_xu == "fixed") {
if(covariates == "off") {
alpha[1] <- np*pv1*n*(B_tauk/A_tauk)/sum(Ku) # 4 # np*n_star/n # np*sum(Ku)/n # np
beta[1] <- np # # np # beta[1]*n/n_star # 1.0*n*beta[1]/sum(Ku) # from 0.01 to 0.99
}
if(covariates == "on") {
alpha[1] <- np*pv1*n*(B_tauk/A_tauk)/sum(Ku*exp(3e-5*efe_u)) # 4 # np*n_star/n
# beta[1] <- np*sum(Ku)/n # np
beta[1] <- np # alpha[1] <- np # beta[1]*n/n_star # 1.0*n*beta[1]/sum(Ku) # from 0.01 to 0.99
}
alpha[1]
beta[1]
##############################################################################
# MCMC IMPLEMENTATION
sum1prev = 0
k = 1
for(k in 1:(ITER-1)) {
if(temporal == "on") {
} else {
Ls[,k] <- Ms[,k]*tauk[k]*t(fxu[,k])%*%k1
Lu[,k] <- Mu[,k]*tauk[k]*t(fxu[,k])%*%k2
}
# CONTROL PANEL 1
if(display=="on"&(k-2-round(k*burn))>1) {
# post_xu <- apply(fxu[,2:k],1,mean)
post_Lu <- apply(Lu[,(1+round(k*burn)):(k-1)],1,mean)
post_tau <- mean(tauk[(1+round(k*burn)):(k-1)])
turn = 1
thetas = numeric(0)
if(covariates == "off") {
nc = 0
post_ND <- mean(NDtemp[(1+round(k*burn)):(k-1)]*ND[(1+round(k*burn)):(k-1)])
var_ND <- var(NDtemp[(1+round(k*burn)):(k-1)]*ND[(1+round(k*burn)):(k-1)])
}
if(covariates == "on") {
post_ND <- mean(NDtemp[(1+round(k*burn)):(k-1)]*NDtheta[(1+round(k*burn))
:(k-1)])
var_ND <- var(NDtemp[(1+round(k*burn)):(k-1)]*NDtheta[(1+round(k*burn))
:(k-1)])
thetas <- apply(theta[,(1+round(k*burn)):(k-1)],1,mean)
turn <- sample(1:nc,1)
}
if(temporal=="on") {
thetas <- c(thetas, apply(thetatemp[,(1+round(k*burn)):(k-1)],1,mean))
turn <- sample(1:(nc+8),1)
surface5a <- matrix(log(post_Lu),Q,P)
image(x_grid,y_grid,surface5a,xlab=" ",ylab=paste("Accpt theta:"
,min(c(1+accepttheta/k,1+acceptthetatemp/k))," to ", max(c(1+accepttheta/k
,1+acceptthetatemp/k))," rho:",min(1+acceptru/k)," to ",max(1+acceptru/k))
,main=paste("Log[post L(u)] after iter = ",k,"with burn of ",100*burn,"%
theta[",turn,"] = ",thetas[turn]," tau = ",post_tau,"
mean(ND) = ",post_ND," var(ND) = ",var_ND),
sub=paste("Acceptance x(u): ",min(1-noaccept/k)," to ",max(1-noaccept/k),"
Acceptance rx,ry:",min(c(1+acceptrx/k,1+acceptry/k))," to ",max(c(1+acceptrx/k
,1+acceptry/k))))
contour(x_grid,y_grid,surface5a,add=TRUE)
post_NDt <- mean(NDt[(1+round(k*burn)):(k-1)])
var_NDt <- var(NDt[(1+round(k*burn)):(k-1)])
text(-84.45, 39.09, label = paste("ND[June 25] = ",post_NDt))
text(-84.45, 39.085, label = paste("var(ND) = ",var_NDt))
points(xd,yd)
} else {
surface5a <- matrix(log(post_Lu),Q,P)
image(x_grid,y_grid,surface5a,xlab=" ",ylab=paste("Accpt theta:"
,min(1+accepttheta/k)," to ", max(1+accepttheta/k)," rho:"
,min(1+acceptru/k)," to ",max(1+acceptru/k)),main=paste("Log[post L(u)]
after iter = ",k,"with burn of ",100*burn,"%
theta[",turn,"] = ",thetas[turn]," tau = ",post_tau,"
mean(ND) = ",post_ND," var(ND) = ",var_ND),
sub=paste("Acceptance x(u): ",min(1-noaccept/k)," to ",max(1-noaccept/k),"
Acceptance rx,ry:",min(c(1+acceptrx/k,1+acceptry/k))," to ",max(c(1+acceptrx/k
,1+acceptry/k))))
contour(x_grid,y_grid,surface5a,add=TRUE)
points(x,y)
}
} # end DISPLAY
####################################################################################
# PROPOSAL DISTRIBUTIONS FOR GAMMA PROCESS x(u)
a_eta <- 2.5 # a_eta <- 125 # a_eta <- 8 # a_eta <- 10
delta_eta <- 0.95 # 0.85; delta_eta <- 0.5
# eta <- rbeta(n_star,a_eta*Ku*delta_eta/2,a_eta*Ku*(1-delta_eta)/2)
eta <- rbeta(n_star,a_eta*delta_eta/2,a_eta*(1-delta_eta)/2)
# index0 <- which(eta=="NaN")
# eta[index0] <- rbeta(length(index0),a_eta*0.001*delta_eta/2,a_eta*0.001*(1-delta_eta)/2)
fxu_star[,k] <- eta*fxu[,k]/delta_eta
# index1 <- which(fxu_star[,k]=="NaN")
# fxu_star[index1,k] <- fxu[index1,k]
index2 <- which(fxu_star[,k]<1e-50)
fxu_star[index2,k] <- 1e-50
# Ls_star[,k] <- Ms[,k]*tauk[k]*t(fxu_star[,k])%*%k1
####################################################################################
# METROPOLIS-HASTINGS SAMPLERS FOR GAMMA PROCESS x(u)
prod = 0
for(i in 1:n) {
prod <- prod + log(Ls[i,k])
}
Lprod[k] <- prod
for(j in 1:n_star) {
if(temporal == "on") {
logprod = 0
for(i in 1:n) logprod <- logprod + log(Ls[i,k]+Mts[i]*Ms[i,k]*tauk[k]
*(fxu_star[j,k]-fxu[j,k])*k1[j,i])
} else {
logprod = 0
for(i in 1:n) logprod <- logprod + log(Ls[i,k]+Ms[i,k]*tauk[k]
*(fxu_star[j,k]-fxu[j,k])*k1[j,i])
}
if(covariates == "off") {
p1 <- (alpha[k]-1)*logfxu_star[j]-fxu_star[j,k]*(beta[k]+NDtemp[k]*Mu[j]
*tauk[k]*Ku[j])+logprod
p2 <- (alpha[k]-1)*logfxu[j]-fxu[j,k]*(beta[k]+NDtemp[k]*Mu[j]*tauk[k]
*Ku[j])+Lprod[k]
mh1 <- exp(p1-p2+q12[j])
}
if(covariates == "on") {
p1 <- (alpha[k]-1)*logfxu_star[j]-fxu_star[j,k]*(beta[k]+NDtemp[k]*tauk[k]
*Kutheta[j])+logprod
p2 <- (alpha[k]-1)*logfxu[j]-fxu[j,k]*(beta[k]+NDtemp[k]*tauk[k]*Kutheta[j])
+Lprod[k]
mh1 <- exp(p1-p2+q12[j])
}
####################################################################################
# UPDATING HYPERPARAMETERS
if(prior_xu == "fixed") {
alpha[k+1] <- alpha[k]
beta[k+1] <- beta[k]
}
# GIBBS SAMPLER FOR tauk (AS IF ALPHA WERE FIXED AND BETA WERE RANDOM)
####################################################################################
# UPDATING THE KERNEL PARAMETERS rx, ry, and rho
if(kernel == "fixed") {
rx[,k+1] <- rx[,k]
ry[,k+1] <- ry[,k]
ru[,k+1] <- ru[,k]
}
####################################################################################
# METROPOLIS-HASTINGS STEP FOR COVARIATES
if(covariates == "on") {
Kutheta_star <- matrix(0, n_star, nc) # Kutheta_star <- rep(0, n_star)
Kuu <- rep(0, n2)
for(j in 1:n_star) {
for(h in 1:nc) {
# Kutheta_star[j,h] <- Kutheta[j] + Kuu%*%exp(efe_u2[,h]*(theta_star[h,k]-theta[h,k]))
Kutheta_star[j,h] <- Kuu%*%exp(as.matrix(efe_u2[,-h])%*%theta[-h,k]+efe_u2[,h]
*theta_star[h,k])
}
# Kutheta_star[j] <- Kuu%*%exp(efe_u2*theta_star[k])
}
q12t <- 0
for(h in 1:nc) {
if(mh1t[h]>1|is.infinite(mh1t[h])) mh1t[h] <- 2
pstart <- min(1,mh1t[h])
theta[h,k+1] <- sample(c(theta_star[h,k],theta[h,k]),1,prob=c(pstart,1-pstart))
if(theta[h,k+1]==theta[h,k]) accepttheta[h] <- accepttheta[h]-1 else {
NDtheta[k+1] <- NDtheta_star[h,k+1]
Kutheta <- Kutheta_star[,h]
} # end else
} # end h loop
} # END covariates M-H
if(temporal == "on") {
mt_prior_temp = rep(0, 8)
sdt_prior_temp <- rep(0.5, 8)
sdt_star_temp <- rep(2.5e-2, 8)
thetatemp_star[,k] <- rnorm(8, thetatemp[,k], sdt_star_temp)
q12tt <- 0
NDtemp_star = rep(0, 8)
# twd = 1
# twk = 1
for(t in 1:365){
indext <- which(t2==t)
tmo <- t4[indext[1]]
twk <- t3[indext[1]]
twd <- t1[indext[1]]
NDtemp_star[1] <- NDtemp_star[1] + exp(thetatemp_star[1,k]*sin(2*pi*tmo/12)
+ thetatemp[2,k]*cos(2*pi*tmo/12)+
thetatemp[3,k]*sin(2*pi*twk/52) + thetatemp[4,k]*cos(2*pi*twk/52)+
thetatemp[5,k]*sin(2*pi*t/365) + thetatemp[6,k]*cos(2*pi*t/365)+
thetatemp[7,k]*sin(2*pi*twd/7) + thetatemp[8,k]*cos(2*pi*twd/7))
NDtemp_star[2] <- NDtemp_star[2] + exp(thetatemp[1,k]*sin(2*pi*tmo/12)
+ thetatemp_star[2,k]*cos(2*pi*tmo/12)+
thetatemp[3,k]*sin(2*pi*twk/52) + thetatemp[4,k]*cos(2*pi*twk/52)+
thetatemp[5,k]*sin(2*pi*t/365) + thetatemp[6,k]*cos(2*pi*t/365)+
thetatemp[7,k]*sin(2*pi*twd/7) + thetatemp[8,k]*cos(2*pi*twd/7))
NDtemp_star[7] <- NDtemp_star[7] + exp(thetatemp[1,k]*sin(2*pi*tmo/12)
+ thetatemp[2,k]*cos(2*pi*tmo/12)+
thetatemp[3,k]*sin(2*pi*twk/52) + thetatemp[4,k]*cos(2*pi*twk/52)+
thetatemp[5,k]*sin(2*pi*t/365) + thetatemp[6,k]*cos(2*pi*t/365)+
thetatemp_star[7,k]*sin(2*pi*twd/7) + thetatemp[8,k]*cos(2*pi*twd/7))
NDtemp_star[8] <- NDtemp_star[8] + exp(thetatemp[1,k]*sin(2*pi*tmo/12)
+ thetatemp[2,k]*cos(2*pi*tmo/12)+
thetatemp[3,k]*sin(2*pi*twk/52) + thetatemp[4,k]*cos(2*pi*twk/52)+
thetatemp[5,k]*sin(2*pi*t/365) + thetatemp[6,k]*cos(2*pi*t/365)+
thetatemp[7,k]*sin(2*pi*twd/7) + thetatemp_star[8,k]*cos(2*pi*twd/7))
if(covariates == "on") {
p1tt <- -NDtemp_star*NDtheta[k+1] + sum_Mt*thetatemp_star[,k]
-(1/(2*sdt_prior_temp^2))*(mt_prior_temp-thetatemp_star[,k])^2
p2tt <- -NDtemp[k]*NDtheta[k+1] + sum_Mt*thetatemp[,k]
-(1/(2*sdt_prior_temp^2))*(mt_prior_temp-thetatemp[,k])^2
mh1tt <- exp(p1tt-p2tt+q12tt)
}
if(covariates == "off") {
p1tt <- -NDtemp_star*ND[k+1] + sum_Mt*thetatemp_star[,k]
-(1/(2*sdt_prior_temp^2))*(mt_prior_temp-thetatemp_star[,k])^2
p2tt <- -NDtemp[k]*ND[k+1] + sum_Mt*thetatemp[,k]
-(1/(2*sdt_prior_temp^2))*(mt_prior_temp-thetatemp[,k])^2
mh1tt <- exp(p1tt-p2tt+q12tt)
}
for(h in 1:8) {
if(mh1tt[h]>1|is.infinite(mh1tt[h])) mh1tt[h] <- 2
pstartt <- min(1,mh1tt[h])
thetatemp[h,k+1] <- sample(c(thetatemp_star[h,k],thetatemp[h,k]),1
,prob=c(pstartt,1-pstartt))
if(thetatemp[h,k+1]==thetatemp[h,k]) acceptthetatemp[h]
<- acceptthetatemp[h]-1
} # end h loop
NDtemp_new = 0
for(t in 1:365){
indext <- which(t2==t)
tmo <- t4[indext[1]]
twk <- t3[indext[1]]
twd <- t1[indext[1]]
for(j in 1:n) {
Mts[j] <- exp(thetatemp[1,k+1]*sin(2*pi*t4[j]/12)
+ thetatemp[2,k+1]*cos(2*pi*t4[j]/12)+
thetatemp[3,k+1]*sin(2*pi*t3[j]/52) + thetatemp[4,k+1]*cos(2*pi*t3[j]/52)+
thetatemp[5,k+1]*sin(2*pi*t2[j]/365) + thetatemp[6,k+1]*cos(2*pi*t2[j]/365)+
thetatemp[7,k+1]*sin(2*pi*t1[j]/7) + thetatemp[8,k+1]*cos(2*pi*t1[j]/7))
}
} # END OF ITERATION k
####################################################################################
# SPATIAL DISTRIBUTIONS, TRACE PLOTS, AND HISTOGRAMS
CC = 1
WK = "June"
burn = 1/2
NL = 8
colores = terrain.colors(128)
post_syu <- apply(syu[,(1+round(k*burn)):(k-1)],1,mean)
surface1b <- matrix(post_syu,Q,P)
image(x_grid,y_grid,surface1b,main=paste("Posterior mean of s2y(u) for
crimes type",CC," and Week ",WK," 2006"))
contour(x_grid,y_grid,surface1b,nlevels=NL,add=TRUE)
points(x,y)
Ku <- pbivariate(min(x_grid)-xu,max(x_grid)-xu,min(y_grid)-yu,max(y_grid)
-yu,post_sxu,post_syu,post_ru)
surface1d <- matrix(Ku,Q,P)
image(x_grid,y_grid,surface1d,main=paste("Posterior mean of K(u) for
crimes type",CC," and Week ",WK," 2006"))
contour(x_grid,y_grid,surface1d,nlevels=NL,add=TRUE)
points(x,y)
cov = 1
cov = 3
####################################################################################
contour(x_grid,y_grid,surface8b,nlevels=NL,add=TRUE)
points(x,y)
####################################################################################
# MAP PLOTS
par(mfrow=c(1,1))
# MAP 1
if(map == "map1") {
map1 <- read.table("cincinnati_map1.dat",header=TRUE)
dim(map1)
mx <- map1[,1]
my <- map1[,2]
mz <- map1[,-c(1,2)]
color1 <- terrain.colors(128)
color2 <- "black"
}
# MAP 2
if(map == "map2") {
long <- read.table("long_map2.dat")
lat <- read.table("lat_map2.dat")
mz <- read.table("cincinnati_map2.dat")
mx <- long[,1]
my <- lat[,1]
color1 <- gray(0:128/128)
color2 <- "yellow"
}
# SPATIAL SCATTERPLOT
# x1 = -84.820305
# x2 = -84.256391
# y1 = 39.021600
# y2 = 39.312056
if(map == "map1") {
stretch_x = 1.25
stretch_y = 1.00
offset_x = 0.00
offset_y = 0.00
}
if(map == "map2") {
stretch_x = 0.95
stretch_y = 1.15
offset_x = 0.01
offset_y = -0.005
}
NL = 15
mI <- post_Lu # mI <- apply(d$intensity, 2, mean)
meanintensity <- ezinterp(xu,yu,mI,method="akima",gridlen=200,span=0.02)
# meanintensity <- ezinterp(d$YY[,1],d$YY[,2],mI,method="akima",gridlen=200,span=0.02)
cov = 1
hist(exp(1000*theta[cov,(1+round(k*burn)):(k-1)]),main=paste("Histogram of
exp(1000*theta) for covariate",cov),sub=paste("mean = "
,mean(exp(1000*theta[cov,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(1000*theta[cov,(1+round(k*burn)):(k-1)]))),breaks=50)
par(mfrow = c(2,1))
hist(exp(1000*theta[1,(1+round(k*burn)):(k-1)]),main="Histogram of
exp(1000*theta) for covariate 1",sub=paste("mean = "
,mean(exp(1000*theta[1,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(1000*theta[1,(1+round(k*burn)):(k-1)]))),breaks=50)
hist(exp(1000*theta[2,(1+round(k*burn)):(k-1)]),main="Histogram of
exp(100*theta) for covariate 2",sub=paste("mean = "
,mean(exp(100*theta[2,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(100*theta[2,(1+round(k*burn)):(k-1)]))),breaks=50)
hist(exp(1000*theta[3,(1+round(k*burn)):(k-1)]),main="Histogram of
exp(1000*theta) for covariate 3",sub=paste("mean = "
,mean(exp(1000*theta[3,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(1000*theta[3,(1+round(k*burn)):(k-1)]))),breaks=50)
hist(exp(1000*theta[4,(1+round(k*burn)):(k-1)]),main="Histogram of
exp(1000*theta) for covariate 4",sub=paste("mean = "
,mean(exp(1000*theta[4,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(1000*theta[4,(1+round(k*burn)):(k-1)]))),breaks=50)
hist(exp(1000*theta[5,(1+round(k*burn)):(k-1)]),main="Histogram of
exp(1000*theta) for covariate 5",sub=paste("mean = "
,mean(exp(1000*theta[5,(1+round(k*burn)):(k-1)]))," var = "
,var(exp(1000*theta[5,(1+round(k*burn)):(k-1)]))),breaks=50)
# DISPLAY PANEL 2
if(covariates == "off") {
post_ND <- mean(ND[(1+round(k*burn)):(k-1)])
var_ND <- var(ND[(1+round(k*burn)):(k-1)])
}
if(covariates == "on") {
post_ND <- mean(NDtheta[(1+round(k*burn)):(k-1)])
var_ND <- var(NDtheta[(1+round(k*burn)):(k-1)])
thetas <- apply(theta[,(1+round(k*burn)):(k-1)],1,mean)
turn <- sample(1:nc,1)
# June 25
t2_date = 17
index_t <- which(t2==t2_date)
t1_date = t1[index_t[1]]
t2_u = rep(t2_date, n_star)
efe_u <- cbind(efe_u[,1:(nc-3)], cos(2*pi*t1_u/7), cos(2*pi*t2_u/365)
, sin(2*pi*t2_u/365))
# efe_u2 <- cbind(efe_u2, cos(2*pi*t1_u2/7), cos(2*pi*t2_u2/365), sin(2*pi*t2_u2/365))
xt <- x[t2==t2_date]
yt <- y[t2==t2_date]
}
for(j in 1:n_star) {
xx1 <- xu2+x1u2-xu[j]
xx2 <- xu2+x2u2-xu[j]
yy1 <- yu2+y1u2-yu[j]
yy2 <- yu2+y2u2-yu[j]
sxx <- rep(post_sxu[j],n2)
syy <- rep(post_syu[j],n2)
ruu <- rep(post_ru[j],n2)
Kuu <- pbivariate(xx1,xx2,yy1,yy2,sxx,syy,ruu,3)
Kutheta[j] <- Kuu%*%exp(as.matrix(efe_u2)%*%thetas)
}
NDtheta_t <- post_tau*fxu[,k]%*%Kutheta
post_Mu <- exp(as.matrix(efe_u)%*%thetas)
post_Lu <- post_Mu*post_tau*k2%*%post_fxu
####################################################################################
par(mfrow=c(2,1))
plot((1+round(k*burn)):(k-1),fxu[index4,(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of x(",index4,") with min L(u)"))
plot((1+round(k*burn)):(k-1),fxu[index3,(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of x(",index3,") with max L(u)"))
hist(fxu[index4,(1+round(k*burn)):(k-1)],breaks=100,main=paste("Histogram of
x(",index4,") with min L(u)"))
hist(fxu[index3,(1+round(k*burn)):(k-1)],breaks=100,main=paste("Histogram of
x(",index3,") with max L(u)"))
plot((1+round(k*burn)):(k-1), tauk[(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of the kernel tau"))
hist(tauk[(1+round(k*burn)):(k-1)],main=paste("Histogram of the kernel tau")
,breaks=100)
####################################################################################
# THETA FOR SPATIAL AND TEMPORAL COVARIATES
par(mfrow=c(2,1))
cov = 5
plot((1+round(k*burn)):(k-1), theta[cov,(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of theta [",cov,"]
covariate =",cov_name[cov]),sub=paste("mean = ",mean(theta[cov
,(1+round(k*burn)):(k-1)])," var = ",var(theta[cov
,(1+round(k*burn)):(k-1)])))
hist(theta[cov,(1+round(k*burn)):(k-1)],main=paste("Histogram of
theta [",cov,"]
covariate =",cov_name[cov]),sub=paste("Acceptance:",1+accepttheta[cov]/k)
,breaks=100)
cov = 2
plot((1+round(k*burn)):(k-1), thetatemp[cov,(1+round(k*burn)):(k-1)]
,type="l",main=paste("Trace plot of thetatemp [",cov,"]"),
sub=paste("mean = ",mean(thetatemp[cov,(1+round(k*burn)):(k-1)])," var = "
,var(thetatemp[cov,(1+round(k*burn)):(k-1)])))
hist(thetatemp[cov,(1+round(k*burn)):(k-1)],main=paste("Histogram of
thetatemp [",cov,"]"),
sub=paste("Acceptance:",1+acceptthetatemp[cov]/k),breaks=100)
####################################################################################
# NUMBER OF EVENTS, N(D), OVER THE STUDY REGION
par(mfrow=c(2,1))
plot((1+round(k*burn)):(k-1), NDk[(1+round(k*burn)):(k-1)],type="l"
,main="Expected number of crime events for the study area"
,xlab="MCMC iteration",ylab="Expected N(D)",sub=paste("mean = "
,mean(NDk[(1+round(k*burn)):(k-1)])," var = "
,var(NDk[(1+round(k*burn)):(k-1)])))
hist(NDk[(1+round(k*burn)):(k-1)],main=paste("Observed number of crime events
= ",length(x)),xlab="Expected number of crime events"
,sub=paste("MCMC with burn-in of ",100*burn," % of ",k," iterations")
,breaks=100)
td = 12 # monthly basis: 1 to 12
monthlab = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"
, "Oct", "Nov", "Dec")
####################################################################################
# MOVIE FRAMES
library(akima)
gridlen = 100
par(mfrow=c(1,1))
framing = "monthly" # "daily", "weekly", "monthly"
monthlab = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"
, "Oct", "Nov", "Dec")
monthday = c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
post_fxu <- apply(fxu[,(1+round(k*burn)):(k-1)], 1, mean)
post_thetatemp <- apply(thetatemp[,(1+round(k*burn)):(k-1)], 1, mean)
post_Ms <- apply(Ms[,(1+round(k*burn)):(k-1)], 1, mean)
post_Mu <- apply(Mu[,(1+round(k*burn)):(k-1)], 1, mean)
# MONTHLY BASIS
# td <- 12
for(td in 1:12) {
xd <- x[t4==td]
yd <- y[t4==td]
NL = 12
} # end td loop
dev.off()
points(xd,yd)
# DAILY BASIS
par(mfrow = c(1,1))
# td <- 176
for(td in 1:365) {
xd <- x[t2==td]
yd <- y[t2==td]
Nobs[td] = length(xd)
Ncalc[td] = post_NDt
Nse[td] = sqrt(var_NDt)
yo=seq(min(yu), max(yu), length=gridlen))
LL = c(40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 240, 280, 320
, 360) # NL = 14
} # end td loop
dev.off()
####################################################################################
# ADDITIONAL GRAPHS (FOR THE FINAL DOCUMENT)
LL = c(40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 240, 280, 320
, 360) # NL = 14
dev.off()
xd <- x[t2==td]
yd <- y[t2==td]
dev.off()
par(mfrow = c(2,1))
lines(1:365, Ncalc+1.96*Nse, col="red")
lines(1:365, Ncalc-1.96*Nse, col="blue")
dev.off()
par(mfrow=c(2,1))
cov = 2
plot((1+round(k*burn)):(k-1), theta[cov,(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of theta [",cov,"]
covariate =",cov_name[cov]),xlab = paste("theta [",cov,"]"), ylab=" "
,sub=paste("mean = ",mean(theta[cov,(1+round(k*burn)):(k-1)])," var = "
,var(theta[cov,(1+round(k*burn)):(k-1)])))
hist(theta[cov,(1+round(k*burn)):(k-1)],xlab = paste("theta [",cov,"]")
, ylab=" ",main=paste("Histogram of theta [",cov,"]
covariate =",cov_name[cov]),sub=paste("Acceptance:",1+accepttheta[cov]/k)
,breaks=100)
dev.off()
par(mfrow=c(2,1))
for(cov in 1:8) {
plot((1+round(k*burn)):(k-1), thetatemp[cov,(1+round(k*burn)):(k-1)],type="l"
,main=paste("Trace plot of thetatemp [",cov,"]"),
xlab = paste("theta [",cov,"]"), ylab=" "
,sub=paste("mean = ",mean(thetatemp[cov,(1+round(k*burn)):(k-1)])," var = "
,var(thetatemp[cov,(1+round(k*burn)):(k-1)])))
hist(thetatemp[cov,(1+round(k*burn)):(k-1)],xlab = paste("theta [",cov,"]")
, ylab=" ",main=paste("Histogram of thetatemp [",cov,"]"),
sub=paste("Acceptance:",1+acceptthetatemp[cov]/k),breaks=100)
}
dev.off()
# EXPLORATORY ANALYSIS
ITR = length(x)
i = 1
j = 1
crime_weekday <- numeric(0)
h = t2[1]
k = t1[1]
for (i in 2:ITR) {
ITR
par(mfrow=c(2,1))
dev.off()
summary(lm(crime_count~as.factor(crime_weekday)))
####################################################################################
# ANSCOMBE RESIDUAL
sector_matrix = c(4,4)
xi <- 1+(Q-1)*(xu-min(x_grid))/(max(x_grid)-min(x_grid))
yj <- 1+(P-1)*(yu-min(y_grid))/(max(y_grid)-min(y_grid))
h1 <- ceiling(yj*sector_matrix[1]/P)
h2 <- ceiling(xi*sector_matrix[2]/Q)
sector_loc <- (h1-1)*sector_matrix[2]+h2
for(j in 1:(sector_matrix[1]*sector_matrix[2])) {
index_loc <- which(sector_loc == j)
sector_index[[j]] <- index_loc
}
# MONTHLY BASIS
td <- 1
xd <- x[t4==td]
yd <- y[t4==td]
# DAILY BASIS
# td <- 176
# xd <- x[t2==td]
# yd <- y[t2==td]
for(j in 1:(sector_matrix[1]*sector_matrix[2])) {
index_loc <- which(sector_locs == j)
sector_index_s[[j]] <- index_loc
}
NDobs_sector <- rep(0,sector_matrix[1]*sector_matrix[2])
NDcalc_sector <- rep(0,sector_matrix[1]*sector_matrix[2])
for(j in 1:(sector_matrix[1]*sector_matrix[2])) {
index_events <- sector_index_s[[j]]
if(length(index_events)>0) NDobs_sector[j] <- length(index_events)
monthlab[td]
Table_1
apply(Table_1, 2, sum)
td = td+1
####################################################################################
# MODEL SUMMARY
CC;WK
P;Q;n;n_star
area;n/area
(max(x_grid)-min(x_grid))/(Q-1)
(max(y_grid)-min(y_grid))/(P-1)
post_ND;post_ND/area
summary(theta[(1+round(k*burn)):(k-1)])
summary(tauk[(1+round(k*burn)):(k-1)])
summary(post_rx)
summary(post_ry)
summary(post_sxu)
summary(post_syu)
summary(post_ru)
mean(alpha);var(alpha)
mean(beta);var(beta)
alpha[1];beta[1]
alpha[1]/beta[1];alpha[1]/beta[1]^2
A_tauk;B_tauk
A_tauk/B_tauk;A_tauk/B_tauk^2
sum(Lu_times_Au)
min(noaccept/k);max(noaccept/k)
min(1+acceptrx/k);max(1+acceptrx/k)
min(1+acceptry/k);max(1+acceptry/k)
min(1+acceptru/k);max(1+acceptru/k)
min(1+accepttheta/k);max(1+accepttheta/k)
np;a_eta;delta_eta
A_alpha;B_alpha;A_alpha/B_alpha;A_alpha/B_alpha^2
A_beta;B_beta;A_beta/B_beta;A_beta/B_beta^2
pv1; Nx; Ny
proposal; kernel; prior_xu; covariates; display
# END
####################################################################################
References
[1] Peter J. Diggle. Statistical Methods for Spatio-Temporal Systems, chapter
1 Spatio-Temporal Point Processes: Methods and Applications, pages 1–
45. Chapman & Hall / CRC, 2007.
[7] Jonathan R. Stroud, Peter Müller, and Bruno Sansó. Dynamic models
for spatiotemporal data. Journal of the Royal Statistical Society. Series
B (Statistical Methodology), 63(4):673–689, 2001.