
CrimeStat III

Part III: Spatial Modeling

Chapter 8
Kernel Density Interpolation
In this chapter, we discuss tools aimed at interpolating incidents, using the kernel density approach. Interpolation is a technique for generalizing incident locations to an entire area. Whereas the spatial distribution and hot spot statistics provide statistical summaries for the data incidents themselves, interpolation techniques generalize those data incidents to the entire region. In particular, they provide density estimates for all parts of a region (i.e., at any location). The density estimate is an intensity variable, a Z-value, that is estimated at a particular location. Consequently, it can be displayed by either surface maps or contour maps that show the intensity at all locations.
There are many interpolation techniques, such as Kriging, trend surfaces, local regression models (e.g., Loess, splines), and Dirichlet tessellations (Anselin, 1992; Cleveland, Grosse and Shyu, 1993; Venables and Ripley, 1997). Most of these require a variable that is being estimated as a function of location. However, kernel density estimation is an interpolation technique that is appropriate for individual point locations (Silverman, 1986; Härdle, 1991; Bailey and Gatrell, 1995; Burt and Barber, 1996; Bowman and Azalini, 1997).

Kernel Density Estimation
Kernel density estimation involves placing a symmetrical surface over each point, evaluating the distance from the point to a reference location based on a mathematical function, and summing the value of all the surfaces for that reference location. This procedure is repeated for all reference locations. It is a technique that was developed in the late 1950s as an alternative method for estimating the density of a histogram (Rosenblatt, 1956; Whittle, 1958; Parzen, 1962). A histogram is a graphic representation of a frequency distribution. A continuous variable is divided into intervals of size s (the interval or bin width), and the number of cases in each interval (bin) is counted and displayed as a block diagram. The histogram is assumed to represent a smooth, underlying distribution (a density function). However, in order to estimate a smooth density function from the histogram, researchers have traditionally linked adjacent variable intervals by connecting the midpoints of the intervals with a series of lines (Figure 8.1).
Unfortunately, doing this causes three statistical problems (Bowman and Azalini, 1997):

1.  Information is discarded because all cases within an interval are assigned to the midpoint. The wider the interval, the greater the information loss.

2.  The technique of connecting the midpoints leads to a discontinuous and not smooth density function even though the underlying density function is assumed to be smooth. To compensate for this, researchers will reduce the width of the interval. Thus, the density function becomes smoother with smaller interval widths, although still not very smooth. Further, there are limits to this technique because the number of cases in each interval decreases as the bin width gets smaller, eventually becoming too small to produce reliable estimates.

3.  The technique is dependent on an arbitrarily defined interval size (bin width). By making the interval wider, the estimator becomes cruder and, conversely, by making the interval narrower, the estimator becomes finer. However, the underlying density distribution is assumed to be smooth and continuous and not dependent on the interval size of a histogram.

[Figure 8.1: Constructing a Density Estimate from a Histogram. Method of Connecting Midpoints. Horizontal axis: variable classification interval (bin); vertical axis: frequency.]

To handle this problem, Rosenblatt (1956), Whittle (1958) and Parzen (1962) developed the kernel density method in order to avoid the first two of these difficulties; the bin width issue still remains. What they did was to place a smooth kernel function, rather than a block, over each point and sum the functions for each location on the scale. Figure 8.2 illustrates the process with five point locations. As seen, over each location, a symmetrical kernel function is placed; by symmetrical is meant that it falls off with distance from each point at an equal rate in both directions around each point. In this case, it is a normal distribution, but other types of symmetrical distribution have been used. The underlying density distribution is estimated by summing the individual kernel functions at all locations to produce a smooth cumulative density function. Notice that the functions are summed at every point along the scale and not just at the point locations. The advantages of this are that, first, each point contributes equally to the density surface and, second, the resulting density function is continuous at all points along the scale.
The third problem mentioned above, interval size, still remains since the width of the kernel function can be varied. In the kernel density literature, this is called bandwidth and refers essentially to the width of the kernel. Figure 8.3 shows a kernel with a narrow bandwidth placed over the same five points while figure 8.4 shows a kernel with a wider bandwidth placed over the points. Clearly, the smoothness of the resulting density function is a consequence of the bandwidth size.
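The summing of kernels and the effect of bandwidth can be sketched in one dimension. The following is a minimal illustration, not CrimeStat's code: the five point locations, the 0-20 scale, the grid step and the two bandwidth values are made-up assumptions, and the helper names are hypothetical.

```python
import math

def normal_kernel_1d(x, center, h):
    """Normal density centred on `center` with standard deviation h (the bandwidth)."""
    z = (x - center) / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))

def kde_1d(x, points, h):
    """Sum the kernels placed over every point, evaluated at location x."""
    return sum(normal_kernel_1d(x, p, h) for p in points)

def local_maxima(values):
    """Count interior grid positions that are higher than both neighbours."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])

# Five illustrative point locations (assumed values, not taken from the figures).
points = [4.0, 7.0, 8.0, 11.0, 14.0]
# The kernels are summed at every grid location, not just at the points.
grid = [i * 0.5 for i in range(41)]

narrow = [kde_1d(x, points, h=0.5) for x in grid]   # small bandwidth: peaked
wide = [kde_1d(x, points, h=3.0) for x in grid]     # large bandwidth: smooth

print("peaks with narrow bandwidth:", local_maxima(narrow))
print("peaks with wide bandwidth:", local_maxima(wide))
```

With these assumed inputs the narrow bandwidth keeps several separate peaks while the wide bandwidth merges them, which is the behavior the two bandwidth figures describe.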
There are a number of different kernel functions that have been used, aside from the normal distribution, such as a triangular function (Burt and Barber, 1996) or a quartic function (Bailey and Gatrell, 1995). Figure 8.5 illustrates a quartic function. But the normal is the most commonly used (Kelsall and Diggle, 1995a).
The normal distribution function has the following functional form:

$$ g(x_j) = \sum_i \left\{ [W_i \cdot I_i] \cdot \frac{1}{h^2 \cdot 2\pi} \cdot e^{-\left[ \frac{d_{ij}^2}{2 \cdot h^2} \right]} \right\} \tag{8.1} $$

where d_ij is the distance between an incident location and any reference point in the region, h is the standard deviation of the normal distribution (the bandwidth), W_i is a weight at the point location and I_i is an intensity at the point location. This function extends to infinity in all directions and, thus, will be applied to any location in the region.
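Equation 8.1 can be written as a short function. This is a sketch under stated assumptions: the function name, the tuple layout for incidents and the use of planar (x, y) coordinates with Euclidean distance are illustrative choices, not CrimeStat's interface.

```python
import math

def normal_density(ref, incidents, h):
    """Evaluate equation 8.1 at one reference point.

    ref       -- (x, y) of the reference location
    incidents -- iterable of (x, y, weight, intensity) tuples (assumed layout)
    h         -- bandwidth, the standard deviation of the normal kernel,
                 in the same units as the coordinates
    """
    total = 0.0
    for x, y, weight, intensity in incidents:
        d2 = (ref[0] - x) ** 2 + (ref[1] - y) ** 2          # d_ij squared
        scale = 1.0 / (h * h * 2.0 * math.pi)               # 1 / (h^2 * 2*pi)
        total += (weight * intensity) * scale * math.exp(-d2 / (2.0 * h * h))
    return total

# Usage: two unweighted incidents, bandwidth of 1 unit (made-up values).
example = normal_density((0.0, 0.0), [(0.0, 0.0, 1, 1), (3.0, 4.0, 1, 1)], 1.0)
```

Because the normal kernel has no cut-off, every incident contributes something to every reference point, however small.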
[Figure 8.2: Kernel Density Estimate. Summing of Normal Kernel Functions for 5 Points. Horizontal axis: relative location; vertical axis: density.]
[Figure 8.3: Kernel Density Estimate. Smaller Bandwidth. Horizontal axis: relative location; vertical axis: density.]
[Figure 8.4: Kernel Density Estimate. Larger Bandwidth. Horizontal axis: relative location; vertical axis: density.]
[Figure 8.5: Kernel Density Estimate. Summing of Quartic Kernel Function. Horizontal axis: relative location; vertical axis: density.]

In CrimeStat, there are four alternative kernel functions that can be used, all of which have a circumscribed radius (unlike the normal distribution). The quartic function is applied to a limited area around each incident point defined by the radius, h. It falls off gradually with distance until the radius is reached. Its functional form is:

1.  Outside the specified radius, h:

$$ g(x_j) = 0 \tag{8.2} $$

2.  Within the specified radius, h:

$$ g(x_j) = \sum_i \left\{ [W_i \cdot I_i] \cdot \left[ \frac{3}{h^2 \cdot \pi} \right] \cdot \left[ 1 - \frac{d_{ij}^2}{h^2} \right]^2 \right\} \tag{8.3} $$

where d_ij is the distance between an incident location and any reference point in the region, h is the radius of the search area (the bandwidth), W_i is a weight at the point location and I_i is an intensity at the point location.
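The quartic form in equations 8.2 and 8.3 can be sketched in the same way. The function name and the incident tuple layout are assumptions made for illustration.

```python
import math

def quartic_density(ref, incidents, h):
    """Evaluate equations 8.2-8.3 at one reference point.

    incidents -- iterable of (x, y, weight, intensity) tuples (assumed layout)
    h         -- radius of the search area (the bandwidth)
    """
    total = 0.0
    for x, y, weight, intensity in incidents:
        d2 = (ref[0] - x) ** 2 + (ref[1] - y) ** 2   # d_ij squared
        if d2 > h * h:
            continue                                  # outside the radius: contributes 0 (8.2)
        total += (weight * intensity) * (3.0 / (h * h * math.pi)) * (1.0 - d2 / (h * h)) ** 2
    return total
```

Unlike the normal kernel, an incident farther away than h adds nothing to the estimate, so reference points with no incidents inside the radius receive a density of zero.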
The triangular (or conical) distribution falls off evenly with distance, in a linear relationship. Compared to the quartic function, it falls off more rapidly. It also has a circumscribed radius and is, therefore, applied to a limited area around each incident point defined by the radius, h. Its functional form is:

1.  Outside the specified radius, h:

$$ g(x_j) = 0 \tag{8.4} $$

2.  Within the specified radius, h:

$$ g(x_j) = \sum_i \left[ K - \frac{K}{h} \cdot d_{ij} \right] \tag{8.5} $$

where K is a constant. In CrimeStat, the constant K is initially set to 0.25 and then re-scaled to ensure that either the densities or probabilities sum to their appropriate values (i.e., N for densities and 1.00 for probabilities).
The negative exponential (or peaked) distribution falls off very rapidly with distance up to the circumscribed radius. Its functional form is:

1.  Outside the specified radius, h:

$$ g(x_j) = 0 \tag{8.6} $$

2.  Within the specified radius, h:

$$ g(x_j) = \sum_i A \cdot e^{-K \cdot d_{ij}} \tag{8.7} $$

where A is a constant and K is an exponent. In CrimeStat's implementation, K is set to 3 while A is initially set to 1 and then re-scaled to ensure that either the densities or probabilities sum to their appropriate values (i.e., N for densities and 1.00 for probabilities).
Finally, the uniform distribution weights all points within the circle equally. Its functional form is:

1.  Outside the specified radius, h:

$$ g(x_j) = 0 \tag{8.8} $$

2.  Within the specified radius, h:

$$ g(x_j) = \sum_i K \tag{8.9} $$

where K is a constant. Initially, K is set to 0.1 but then re-scaled to ensure that either the densities or probabilities sum to their appropriate values (i.e., N for densities and 1.00 for probabilities).
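The triangular, negative exponential and uniform kernels, together with the re-scaling step, can be sketched as follows. The initial constants (0.25, 1 and 3, 0.1) come from the text; the function names, the unweighted (x, y) incident layout and the way re-scaling is applied (one common factor over all reference cells) are assumptions, since the text does not spell out CrimeStat's procedure.

```python
import math

def triangular(d, h, K=0.25):
    """Linear fall-off from K at distance 0 to 0 at the radius h."""
    return K - (K / h) * d if d <= h else 0.0

def negative_exponential(d, h, A=1.0, K=3.0):
    """Rapid fall-off with distance, cut off at the radius h."""
    return A * math.exp(-K * d) if d <= h else 0.0

def uniform(d, h, K=0.1):
    """Equal weight for every point inside the radius h."""
    return K if d <= h else 0.0

def raw_surface(cells, incidents, h, kernel):
    """Sum one kernel over all incidents for every reference cell."""
    return [sum(kernel(math.hypot(cx - px, cy - py), h) for px, py in incidents)
            for cx, cy in cells]

def rescale(raw, target):
    """Scale the cell values so they sum to `target` (N or 1.00)."""
    total = sum(raw)
    if total == 0:
        return list(raw)          # no incident within any radius: nothing to scale
    return [value * target / total for value in raw]
```

For example, `rescale(raw_surface(cells, incidents, h, triangular), len(incidents))` would give densities that sum to N, while a target of 1.0 would give probabilities.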
Kernel Parameters
The user can select these five different kernel functions to interpolate the data to the grid cells. They produce subtle differences in the shape of the interpolated surface or contour. The normal distribution weighs all points in the study area, though near points are weighted more highly than distant points. The other four techniques use a circumscribed circle around the grid cell. The uniform distribution weighs all points within the circle equally. The quartic function weighs near points more than far points, but the fall off is gradual. The triangular function weighs near points more than far points within the circle, but the fall off is more rapid. Finally, the negative exponential weighs near points much more highly than far points within the circle.
The use of any one of these depends on how much the user wants to weigh near points relative to far points. Using a kernel function which has a big difference in the weights of near versus far points (e.g., the negative exponential or the triangular) tends to produce finer variations within the surface than functions which weight more evenly (e.g., the normal distribution, the quartic, or the uniform); these latter ones tend to smooth the distribution more.
Shape and size of the bandwidth
However, Silverman (1986) has argued that the choice of kernel function does not make that much difference as long as the kernel is symmetrical. There are also edge effects that can occur and there have been different proposed solutions to this problem (Venables and Ripley, 1997).
There have also been variations in the size of the bandwidth with various formulas and criteria (Silverman, 1986; Härdle, 1991; Venables and Ripley, 1997). Generally, bandwidth choices fall into either fixed or adaptive (variable) choices (Kelsall and Diggle, 1995a; Bailey and Gatrell, 1995). CrimeStat follows this distinction, which will be explained below.
Another suggestion is to use the Moran correlogram, which was discussed in chapter 4, to estimate the shape of the weighting function (Cliff and Haggett, 1988; Bailey and Gatrell, 1995). This would be appropriate for variables that have weights, such as population or employment. The Moran correlogram displays the degree of spatial autocorrelation as a function of distance. Whether the autocorrelation falls off quickly or more slowly can be used to select an approximate kernel function (e.g., a negative exponential function falls off quickly whereas a quartic function falls off very slowly). The bandwidth could also be selected by the distance at which the Moran correlogram levels off (i.e., approaches the global I value). This would lead to an estimate that minimizes spatial autocorrelation in the data set. It would be good for capturing major trends in the data, but would not be good for identifying local clusters (hot spots) since the bandwidth distance would incorporate most of a metropolitan area.
Three-dimensional kernels
The kernel function can be expanded to more than two dimensions (Härdle, 1991; Bailey and Gatrell, 1995; Burt and Barber, 1996; Bowman and Azalini, 1997). Figure 8.6 shows a three-dimensional normal distribution placed over each of five points with the resulting density surface being a sum of all five individual surfaces. Thus, the method is particularly appropriate for geographical data, such as crime incident locations. The method has also been developed to relate two or more variables together by applying a kernel estimate to each variable in turn and then dividing one by the other to produce a three-dimensional estimate of risk (Kelsall and Diggle, 1995a; Bowman and Azalini, 1997).
Significance testing of density estimates is more complicated. Current techniques tend to focus on simulating surfaces under spatially random assumptions (Bowman and Azalini, 1997; Kelsall and Diggle, 1995b). Because of the still experimental nature of the testing, CrimeStat does not include any testing of density estimates in this version.
CrimeStat Kernel Density Methods
CrimeStat has two interpolation techniques, both based on the kernel density technique. The first applies to a single variable, while the second applies to the relationship between two variables. Both routines have a number of options. Figure 8.7 shows the interpolation page in CrimeStat. Users indicate their choices by clicking on the tab and menu items. For either technique, it is necessary to have a reference file, which is usually a grid placed over the study region (see chapter 3). The reference file represents the region to which the kernel estimate will be generalized (figure 8.8).

[Figure 8.6: Kernel Density Surfaces. Summing of Normal Kernel Surfaces for 5 Points.]
[Figure 8.7: Interpolation Screen.]
[Figure 8.8: Grid Cell Structure for Baltimore Region. 108 Width x 100 Height Grid Cells. Map labels: Baltimore County, City of Baltimore, lower-left and upper-right grid coordinates; scale bar in miles.]

Single Density Estimates
The single kernel density routine in CrimeStat is applied to a distribution of point locations, such as crime incidents. It can be used with either a primary file or a secondary file; the primary file is the default. For example, the primary file can be the location of motor vehicle thefts. The points can also have a weighting or an associated intensity variable (or both). For example, the points could represent the location of police stations while the weights (or intensities) represent the number of calls for service. Again, the user must be careful in having both a weighting variable and an intensity variable as the routine will use both variables in calculating densities; this could lead to double weighting.
Having defined the file on the primary (or secondary) file tabs, the user indicates the routine by checking the Single box. Also, it is necessary to define a reference file, either an existing file or one generated by CrimeStat (see chapter 3). There are other parameters that must be defined.
File to be Interpolated
The user must indicate whether the primary file or the secondary file (if used) is to be interpolated.
Method of Interpolation
The user must indicate the method of interpolation. Five types of kernel density estimators are used:

1.  Normal distribution (bell; default)
2.  Uniform (flat) distribution
3.  Quartic (spherical) distribution
4.  Triangular (conical) distribution
5.  Negative exponential (peaked) distribution

In our experience, there are advantages to each. The normal distribution produces an estimate over the entire region whereas the other four produce estimates only for the circumscribed bandwidth radius. If the distribution of points is sparse towards the outer parts of the region, then the four circumscribed functions will not produce estimates for those areas, whereas the normal will. Conversely, the normal distribution can cause some edge effects to occur (e.g., spikes at the edge of the reference grid), particularly if there are many points near one of the boundaries of the study area. The four circumscribed functions will produce less of a problem at the edges, although they still can produce some spikes. Within the four circumscribed functions, the uniform and quartic tend to smooth the data more whereas the triangular and negative exponential tend to emphasize peaks and valleys. The differences between these different kernel functions are small, however. The user should probably start with the default normal function and adjust according to how the surface or contour looks.


Choice of Bandwidth
The user must indicate how bandwidths are to be defined. There are two types of bandwidth for the single kernel density routine, fixed interval or adaptive interval.
Fixed interval
With a fixed bandwidth, the user must specify the interval to be used and the units of measurement (squared miles, squared nautical miles, squared feet, squared kilometers, or squared meters). Depending on the type of kernel estimate used, this interval has a slightly different meaning. For the normal kernel function, the bandwidth is the standard deviation of the normal distribution. For the uniform, quartic, triangular, or negative exponential kernels, the bandwidth is the radius of the search area to be interpolated.
There are few guidelines for choosing a particular bandwidth other than by visual inspection (Venables and Ripley, 1997). Some have argued that the bandwidth be no larger than the finest resolution that is desired and others have argued for a variation on random nearest neighbor distances (see Spencer Chainey application later in this chapter). Others have argued for particular sizes (Silverman, 1986; Härdle, 1991; Kadafar, 1996; Farewell, 1999; Talbot, Kulldorff, Forand, and Haley, 2000).¹ There does not seem to be consensus on this issue. Consequently, CrimeStat leaves the definition up to the user.
Typically, a narrower bandwidth interval will lead to a finer mesh density estimate with all the little peaks and valleys. A larger bandwidth interval, on the other hand, will lead to a smoother distribution and, therefore, less variability between areas. While smaller bandwidths show greater differentiation among areas (e.g., between hot spot and low spot zones), one has to keep in mind the statistical precision of the estimate. If the sample size is not very large, then a smaller bandwidth will lead to more imprecision in the estimates; the peaks and valleys may be nothing more than random variation. On the other hand, if the sample size is large, then a finer density estimate can be produced. In general, it is a good idea to experiment with different fixed intervals to see which results make the most sense.
Adaptive interval
An adaptive bandwidth adjusts the bandwidth interval so that a minimum number of points are found. This has the advantage of providing constant precision of the estimate over the entire region. Thus, in areas that have a high concentration of points, the bandwidth is narrow whereas in areas where the concentration of points is more sparse, the bandwidth will be larger. This is the default bandwidth choice in CrimeStat since we believe that consistency in statistical precision is paramount. The degree of precision is generally dependent on the sample size of the bandwidth interval. The default is a minimum of 100 points within the bandwidth radius. The user can make the estimate more fine-grained by choosing a smaller number of points (e.g., 25) or more smooth by choosing a larger number of points (e.g., 200). Again, experimentation is necessary to see which results make the most sense.
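One plausible reading of the adaptive interval is that the bandwidth at each reference location is the distance to the k-th nearest incident, where k is the minimum number of points. The sketch below takes that reading as an assumption; CrimeStat's exact search procedure is not described in this section, and the function name is hypothetical.

```python
import math

def adaptive_bandwidth(ref, incidents, min_points):
    """Smallest radius around `ref` that contains at least `min_points` incidents.

    ref        -- (x, y) of the reference location
    incidents  -- list of (x, y) incident locations
    min_points -- minimum sample size required inside the bandwidth
    """
    if min_points < 1 or min_points > len(incidents):
        raise ValueError("min_points must be between 1 and the number of incidents")
    distances = sorted(math.hypot(ref[0] - x, ref[1] - y) for x, y in incidents)
    return distances[min_points - 1]      # distance to the k-th nearest incident

# Made-up data: a dense group near the origin and one distant point.
incidents = [(0.1, 0.0), (0.0, 0.2), (0.3, 0.0), (0.0, 0.4), (5.0, 5.0)]
dense_h = adaptive_bandwidth((0.0, 0.0), incidents, 3)    # reference inside the dense group
sparse_h = adaptive_bandwidth((5.0, 5.0), incidents, 3)   # reference in the sparse area
```

Under this reading the bandwidth comes out narrow where points are concentrated and wide where they are sparse, as the text describes.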
Output Units
The user must indicate the measurement units for the density estimate in points per squared miles, squared nautical miles, squared feet, squared kilometers, or squared meters. The default is points per square mile.
Intensity or Weighting Variables
If an intensity or weighting variable is used, these boxes must be checked. Be careful about using both an intensity and a weighting variable to avoid double weighting.
Density Calculations
The user must indicate the type of output for the density estimates. There are three types of calculation that can be conducted with the kernel density routine. The calculations are applied to each reference cell:

1.  The kernel estimates can be calculated as absolute density estimates using formulas 8.1-8.9, depending on what type of kernel function is used. The estimates at each reference cell are re-scaled so that the sum of the densities over all reference grid cells equals the total number of incidents; this is the default value.

2.  The kernel estimates can be calculated as relative density estimates. These divide the absolute densities by the area of the grid cell. This has the advantage of interpreting the density in terms that are familiar. Thus, instead of a density estimate represented by points per grid cell, the relative density will convert this to points per, say, square mile.

3.  The densities can be converted into probabilities by dividing the density at any one cell by the total number of incidents.
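The three calculations listed above are simple transformations of one another, which can be sketched as follows. The function names and the sample numbers are assumptions for illustration; only the arithmetic follows the text.

```python
def absolute_densities(raw, n_incidents):
    """Re-scale raw kernel sums so the cells add up to the number of incidents."""
    total = sum(raw)
    if total == 0:
        raise ValueError("raw surface is all zeros; nothing to re-scale")
    return [value * n_incidents / total for value in raw]

def relative_densities(absolute, cell_area):
    """Divide absolute densities by the grid cell area (e.g., in square miles)."""
    return [value / cell_area for value in absolute]

def probabilities(absolute, n_incidents):
    """Divide each cell's absolute density by the total number of incidents."""
    return [value / n_incidents for value in absolute]

# Made-up raw kernel sums for three cells, 16 incidents, cells of 0.25 square miles.
raw = [1.0, 3.0, 4.0]
absolute = absolute_densities(raw, 16)        # incidents per grid cell
relative = relative_densities(absolute, 0.25) # incidents per square mile
prob = probabilities(absolute, 16)            # share of all incidents in each cell
```

Each output is the raw surface multiplied by a single constant, so the shape of the surface is the same in all three cases.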

Since the three types of calculation are directly interrelated, the output surface will not differ in its variability. The choice would depend on whether the calculations are used to estimate absolute densities, relative densities, or probabilities. For comparisons between different types of crime or between the same type of crime and different time periods, usually absolute densities are the unit of choice (i.e., incidents per grid cell). However, to express the output as a probability, that is, the likelihood that an incident would occur at any one location, outputting the results as probabilities would make more sense. For display purposes, however, it makes no difference as all of them look the same.
Output Files
Finally, the results can be displayed in an output table or can be output into two formats: 1) Raster grid formats for display in a surface mapping program - Surfer for Windows .dat format (Golden Software, 1994) or ArcView Spatial Analyst asc format (ESRI, 1998); or 2) Polygon grids in ArcView .shp, MapInfo .mif or Atlas*GIS .bna formats.² However, all but Surfer for Windows require that the reference grid be created by CrimeStat.
Example 1: Kernel Density Estimate of Street Robberies
An example can illustrate the use of the single kernel density routine. Figure 8.9 shows a Surfer for Windows output of the 1180 street robberies for 1996 in Baltimore County. The reference grid was generated by CrimeStat and had 100 columns and 108 rows. Thus, the routine calculated the distance between each of the 10,800 reference cells and the 1180 robbery incident locations, evaluated the kernel function for each measured distance, and summed the results for each reference cell. The normal distribution kernel function was selected for the kernel estimator and an adaptive bandwidth with a minimum sample size of 100 was chosen as the parameters.
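The procedure in this example (build a reference grid, measure the distance from every cell to every incident, evaluate the kernel, sum per cell) can be sketched as below. With 10,800 cells and 1180 incidents it amounts to 10,800 x 1,180 = 12,744,000 kernel evaluations. The sketch assumes that densities are evaluated at cell centres, uses a fixed bandwidth with the normal kernel rather than the adaptive one chosen in the example, and uses hypothetical function names and made-up coordinates.

```python
import math

def make_grid(lower_left, upper_right, n_cols, n_rows):
    """Return the centre coordinates of every cell in a rectangular reference grid."""
    (x0, y0), (x1, y1) = lower_left, upper_right
    dx = (x1 - x0) / n_cols
    dy = (y1 - y0) / n_rows
    return [(x0 + (col + 0.5) * dx, y0 + (row + 0.5) * dy)
            for row in range(n_rows) for col in range(n_cols)]

def interpolate(cells, incidents, h):
    """Sum an unweighted normal kernel over all incidents for each reference cell."""
    scale = 1.0 / (h * h * 2.0 * math.pi)
    surface = []
    for cx, cy in cells:
        total = 0.0
        for px, py in incidents:
            d2 = (cx - px) ** 2 + (cy - py) ** 2     # squared cell-to-incident distance
            total += scale * math.exp(-d2 / (2.0 * h * h))
        surface.append(total)
    return surface

# Small made-up example: a 4 x 2 grid and three incidents.
cells = make_grid((0.0, 0.0), (4.0, 2.0), n_cols=4, n_rows=2)
incidents = [(0.5, 0.5), (0.6, 0.4), (3.5, 1.5)]
surface = interpolate(cells, incidents, h=0.5)
evaluations = len(cells) * len(incidents)            # one kernel evaluation per pair
```

The double loop makes the cost proportional to the number of cells times the number of incidents, which is why a finer grid or a larger incident file increases the running time of this kind of routine.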
There are three views in the figure: 1) a map view showing the location of the incidents; 2) a surface view showing a three-dimensional interpolation of robbery density; and 3) a contour view showing contours of high robbery density. The surface and contour views provide different perspectives. The surface shows the peaks very clearly and the relative density of the peaks. As can be seen, the peak for robberies on the eastern part of the County is much higher than the two peaks in the central and western parts of the County. The contour view can show where these peaks are located; it is difficult to identify location clearly from a three-dimensional surface map. Highways and streets could be overlaid on top of the contour view to identify more precisely where these peaks are located.
Figure 8.10 shows an ArcView Spatial Analyst map of the robbery density with the robbery incident locations overlaid on top of the density contours. Here, we can see quite clearly that there are three strong concentrations of incidents, one spreading over a distance of several miles on the west side, one on the northern border between Baltimore City and Baltimore County, and one on the east side; there is also one smaller peak in the southeast corner of the County.
From a statistical perspective, the kernel estimate is a better hot spot identifier than the cluster analysis routines discussed in chapter 6. Cluster routines group incidents into clusters and distinguish between incidents which belong to the cluster and those which do not belong. Depending on which mathematical algorithms are used, different clustering routines will return differing allocations of incidents to clusters. The kernel estimate, on the other hand, is a continuous surface; the densities are calculated at all locations; thus, the user can visually inspect the variability in density and decide what to call a hot spot without having to define arbitrarily where to cut off the hot spot zone.
Going back to the Surfer for Windows output, figure 8.11 shows the effects of varying the bandwidth parameters. There are three fixed bandwidth intervals (0.5, 1, and 2 miles respectively) and there are two adaptive bandwidth intervals (a minimum of 25 and 100 points respectively). As can be seen, the fineness of the interpolation is affected by

Figure 8.9: Baltimore County Robberies: 1996-97. Kernel Density Interpolation
(contour view, surface view, and ground level view)

Figure 8.10: Baltimore County Street Robberies: 1996. Kernel Density Estimate
(legend: Baltimore Beltway, major roads, City of Baltimore, Baltimore County; robbery density from low to high)

Figure 8.11: Interpolation of Baltimore County Auto Thefts: 1996. Different Smoothing Parameters
(panels: Fixed, h = 0.5 mi; Fixed, h = 1.0 mi; Fixed, h = 2.0 mi; Adaptive, n = 50; Adaptive, n = 100)

Kernel Density Interpolation to Estimate Sampling Bias in the Climatic
Response of Sphagnum Spores in North America
Mike Sawada
Laboratory for Applied Geomatics and GIS Science
University of Ottawa, Department of Geography, Canada
Sphagnum moss, the dominant species of bogs, thrives under certain ranges of
temperature and precipitation. Sphagnum releases spores for reproduction and these are
transported, often long distances, by wind and water. Thus, the presence of a spore in the
fossil record may not indicate nearby Sphagnum plants. However, spores should be most
numerous near Sphagnum plants. Over time, these spores and pollen from other plants
accumulate in lake and bog sediments and leave a fossil record of vegetation history.
We wanted to use the amount of fossil Sphagnum spores in different parts of North
America to infer past climates. To do so, we had to first show that Sphagnum spores are
most abundant in climates where Sphagnum plants thrive and, secondly, that this center of
abundance is not an artifact of biased sampling caused by undersampling in parts of climate space.
First, we developed a Sphagnum spore response surface showing the relative abundance of
spores along the axes of temperature and precipitation (Fig. A).
CrimeStat was used in the second stage to develop a kernel density surface using a
quartic kernel for 3007 sample sites within climate space (Fig. B). These were smoothed
and visualized in Surfer. The surface showed that the intensity of points is higher in
regions surrounding the response maximum. This gave us confidence that the Sphagnum
response was real since other parts of climate space are well sampled but unlikely to
produce high spore proportions. This fact allowed climate inferences to be made within the
fossil record for past time periods using the amount of Sphagnum spores present.

Figures modified from Gajewski, Viau, Sawada et al. 2001. Global Biogeochemical Cycles,

Describing Crime Spatial Patterns By Time of Day


Renato Assunção, Cláudio Beato, Bráulio Silva
CRISP, Universidade Federal de Minas Gerais, Brazil
We used the kernel density estimate to visualize time trends for crime
occurrences on a typical weekday. We found markedly different spatial distributions
depending on the time, with the amount of crime varying and the hot spots,
identified by the ellipses, appearing in different places.
The analysis used 1114 weekday robberies from 1995 to 2000 in downtown
Belo Horizonte. Breaking the data into hours, we used the normal kernel, a fixed
bandwidth of 450 meters, and the densities output option (points per square unit of
area). Note that the latter option could be useful if one is interested only in the hot
spot locations, and not in the distribution during the day. To make the ellipses, we
used the nearest neighbor hierarchical spatial clustering technique with a minimum
of 35 incidents. We output the results to MapInfo, keeping the same scale for all
maps. Four of them are shown below.

[Four maps: 9:00 AM, 1:00 PM, 7:00 PM, and 11:00 PM]

Using Kernel Density Smoothing and Linking to ArcView:
Examples from London, England
Spencer Chainey
Jill Dando Institute of Crime Science
University College
London, England
CrimeStat offers an effective method for creating kernel density surfaces. The
example below uses residential burglary incidents in the London Borough of
Croydon, England for the period June 1999 to May 2000 (N=3104). The single kernel
routine was used to produce a kernel density surface representing the distribution of
residential burglary.
The kernel function used was the quartic, which is favoured by most crime
mappers as it applies added weight to crimes closer to the centre of the bandwidth.
Rather than choosing an arbitrary interval, it is useful to use the mean nearest
neighbour distance for different orders of K, which can be calculated by CrimeStat as
part of a nearest neighbour analysis. For the Croydon data, an interval of 269
metres was chosen, which relates to a mean nearest neighbour distance at a K-order
of 13. The output units were densities per square kilometre, and the results were output to
ArcView.
Kernel density estimation is a particularly useful method as it helps to
precisely identify the location, spatial extent and intensity of crime hotspots. It is
also visually attractive, which helps to prompt further enquiry into the reasons
why crime and disorder are concentrated. The density surface that is created
can reflect the distribution of incidents against the natural geography of the area of
interest, including representing the natural boundaries, such as reservoirs and
lakes, or an alignment that follows a particular street in which there is a high
concentration of offending. The method is also less subjective if clear guidelines are
followed for the setting of parameters.

Infant Death Rate and Low Birth Weight
in the I-5 Corridor of Seattle and King County
Richard Hoskins
Washington State Department of Health
Olympia, Washington
Although the infant death rate (< 1 year old) has been steadily declining in
Washington, the incidence of low birth weight (< 2500 gms) is increasing. This is a
significant public health problem, resulting in suffering and high medical cost. If we
know where the rates are high at a neighborhood level we can develop more efficient
and effective programs. The goal is to determine regions where rates are clustered
and to characterize those regions with respect to SES variables from the US Census.
Birth and infant death data were geocoded to the street level. In order to
detect clusters of high infant death and low birth weight, several CrimeStat tools
were used. We find that using several tools at once helps detect regions where
something untoward is going on and also helps develop guesses about where other
problems might be expected to develop.

[Maps: I-5 corridor in King County; kernel density interpolation. Top: 3-D map of the empirical Bayes rate. Bottom: prism map of the SMR.]

The result of a kernel density interpolation using a normal estimator is
shown above along with an empirical Bayes rate and standardized mortality ratio
(SMR) calculated in SAS and mapped in Maptitude (www.caliper.com). Over 2,500
infant deaths and about 25,000 low weight births (out of over 500,000
live births) occurred in the Seattle I-5 corridor region in King County from 1989
to 2002. The kernel density method was used to detect high rate regions. A clearly
articulated region and ridge appears on the grid of the kernel density map and on the
3D and prism maps.

the bandwidth choice. For the three fixed intervals, an interval of 0.5 miles produces a
finer mesh interpolation than an interval of 2 miles, which tends to oversmooth the
distribution. Perhaps the intermediate interval of 1 mile gives the best balance between
fineness and generality. For the two adaptive intervals, the minimum sample size of 25
gives some very specific peak locations whereas the adaptive interval with a minimum
sample size of 100 gives a smoother distribution.
Which of these should be used as the best choice would depend on how much
confidence the analyst has in the results. A key question is whether the peaks are real or
merely byproducts of small sample sizes. The best choice would be to produce an
interpolation that fits the experience of the department and officers who travel an area.
Again, experimentation and discussions with beat officers will be necessary to establish
which bandwidth choice should be used in future interpolations.
Note that in all five of the interpolations, there is some bias at the edges with the City of
Baltimore (the three-sided area in the central southern part of the map). Although the
primary file only included incidents for the County, the interpolation has nevertheless
estimated some likelihood at the edges; these are edge biases and need to be ignored or
removed with an ASCII editor (see footnote 3). Further, the wider the interval chosen, the more bias is
produced at the edge.

Dual Kernel Estimates
The dual kernel density routine in CrimeStat is applied to two distributions of point
locations. For example, the primary file could be the location of auto thefts while the
secondary file could be the centroids of census tracts, with the population of the census
tract being an intensity variable. The dual routine must be used with both a primary file
and a secondary file. Also, it is necessary to define a reference file, either an existing file
or one generated by CrimeStat (see chapter 3). Several parameters need to be defined.
File to be Interpolated
The user must indicate the order of the interpolation. The routine uses the
language "first file" and "second file" in making the comparison (e.g., dividing the first file by
the second; adding the first file to the second). The user must indicate which is the first
file, the primary or the secondary. The default is that the primary file is the first file.
Method of Interpolation
The user must indicate the type of kernel estimator. As with the single kernel
density routine, five types of kernel density estimators are used:
1. Normal distribution (bell; default)
2. Uniform (flat) distribution
3. Quartic (spherical) distribution
4. Triangular (conical) distribution
5. Negative exponential (peaked) distribution

In our experience, there are advantages to each. The normal distribution produces
an estimate over the entire region whereas the other four produce estimates only for the
circumscribed bandwidth radius. If the distribution of points is sparse towards the outer
parts of the region, then the four circumscribed functions will not produce estimates for
those areas, whereas the normal will. Conversely, the normal distribution can cause some
edge effects to occur (e.g., spikes at the edge of the reference grid), particularly if there are
many points near one of the boundaries of the study area. The four circumscribed
functions will produce less of a problem at the edges, although they still can produce some
spikes. Within the four circumscribed functions, the uniform and quartic tend to smooth
the data more whereas the triangular and negative exponential tend to emphasize peaks
and valleys. The differences between these different kernel functions are small, however.
The user should probably start with the default normal function and adjust accordingly to
how the surface or contour looks.
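
The contrast between the five kernel shapes can be sketched in a few lines of Python. This is an illustration only: the function names, the scaling constants, and the `decay` parameter of the negative exponential are assumptions made for this sketch and are not taken from CrimeStat. Here `d` is the distance from an incident to a reference cell and `h` is the bandwidth (the standard deviation for the normal kernel, the search radius for the other four).

```python
import math

def normal_kernel(d, h):
    """Bell shape; h is the standard deviation. Positive at every distance."""
    return math.exp(-d * d / (2.0 * h * h)) / (2.0 * math.pi * h * h)

def uniform_kernel(d, h):
    """Flat shape: constant inside the radius h, zero outside it."""
    return 1.0 / (math.pi * h * h) if d <= h else 0.0

def quartic_kernel(d, h):
    """Smooth dome that falls gradually to zero at the radius h."""
    return 3.0 / (math.pi * h * h) * (1.0 - (d / h) ** 2) ** 2 if d <= h else 0.0

def triangular_kernel(d, h):
    """Cone that falls linearly to zero at the radius h."""
    return 3.0 / (math.pi * h * h) * (1.0 - d / h) if d <= h else 0.0

def negative_exponential_kernel(d, h, decay=3.0):
    """Sharp peak; `decay` is a hypothetical steepness constant (unnormalized)."""
    return math.exp(-decay * d / h) if d <= h else 0.0

def falloff(kernel, h=1.0):
    """Kernel value halfway to the radius, relative to its value at the center."""
    return kernel(0.5 * h, h) / kernel(0.0, h)

for kernel in (uniform_kernel, quartic_kernel, triangular_kernel,
               negative_exponential_kernel):
    print(kernel.__name__, round(falloff(kernel), 4), kernel(1.5, 1.0))
print(normal_kernel.__name__, normal_kernel(3.0, 1.0))
```

In this sketch the uniform and quartic kernels keep more of their central weight halfway to the radius than the triangular and negative exponential kernels do, which is one way to read the statement that the former smooth more and the latter emphasize peaks. Only the normal kernel returns a positive value beyond the bandwidth.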
Choice of Bandwidth
The user must define the bandwidth parameter. There are three types of
bandwidths for the dual kernel density routine: fixed interval, variable interval, or
adaptive interval.
Fixed interval
With a fixed bandwidth, the user must specify the interval to be used and the units
of measurement (squared miles, squared nautical miles, squared feet, squared kilometers,
or squared meters). Depending on the type of kernel estimate used, this interval has a
slightly different meaning. For the normal kernel function, the bandwidth is the standard
deviation of the normal distribution. For the uniform, quartic, triangular, or negative
exponential kernels, the bandwidth is the radius of the search area to be interpolated.
Since there are two files being compared, the fixed interval is applied both to the first file
and the second file.
Variable interval
With a variable interval, each file (the first and the second) has a different interval.
For both, the units of measurement must be specified (squared miles, squared nautical
miles, squared feet, squared kilometers, or squared meters). There is a good reason why a
user might want variable intervals. In comparing two kernel estimates, the most common
comparison is to divide one by the other. However, if the density estimate for a particular
cell in the denominator approaches zero, then the ratio will blow up and become a very
large number. Visually, this will be seen as spikes in the distribution, the result, usually,
of too few cases. In this case, the user might decide to smooth the denominator more than
the numerator in order to reduce these spikes. For example, the interval for the first file (the
numerator) could be 1 mile whereas the interval for the second file (the denominator) could
be 3 miles. Experimentation will be necessary to see whether this is warranted. But, in
our experience, spikes frequently occur when either there are too few cases or there is an
irregular boundary to the region with a number of incidents grouped at one of the edges.
Adaptive interval
An adaptive bandwidth adjusts the bandwidth interval so that a minimum number
of points (sample size) is found. This sample size is applied to both the first file and the
second file. It has the advantage of providing constant precision of the kernel estimate
over the entire region. Thus, in areas that have a high concentration of points, the
bandwidth is narrow whereas in areas where the concentration of points is more sparse,
the bandwidth will be larger. This is the default bandwidth choice in CrimeStat since
consistency in statistical precision is important. The degree of precision is generally
dependent on the sample size of the bandwidth interval. The default is a minimum of 100
points. The user can make the estimate finer by choosing a smaller number of points (e.g.,
25) or smoother by choosing a larger number of points (e.g., 200).
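
One simple way to picture an adaptive interval is to take, for each reference cell, the distance to its k-th nearest incident as the bandwidth. The sketch below assumes that rule; the manual only states that the interval is adjusted until a minimum number of points is found, so the exact procedure inside CrimeStat may differ. The function name `adaptive_bandwidth` and the sample coordinates are hypothetical.

```python
import math

def adaptive_bandwidth(cell, points, min_points):
    """Distance from `cell` to its `min_points`-th nearest incident.

    Assumed rule for illustration: the bandwidth grows until it
    contains at least `min_points` incidents.
    """
    if not 1 <= min_points <= len(points):
        raise ValueError("min_points must be between 1 and the number of points")
    distances = sorted(math.dist(cell, point) for point in points)
    return distances[min_points - 1]

# Four incidents clustered near the origin and one isolated incident.
incidents = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

dense_h = adaptive_bandwidth((0.0, 0.0), incidents, min_points=3)
sparse_h = adaptive_bandwidth((5.0, 5.0), incidents, min_points=3)
print(dense_h, sparse_h)  # narrow bandwidth in the cluster, wide one near the isolated point
```

Raising `min_points` widens every bandwidth and so smooths the surface, which matches the advice that 25 points gives a finer estimate and 200 points a smoother one.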
Use kernel bandwidths that produce stable estimates
Note: with a dual kernel calculation, particularly the ratio of one variable to
another, be careful about choosing a very small bandwidth. This could have the effect of
creating spikes at the edges of the study area or in low population density areas. For
example, in low population density areas, there will probably be fewer events than in more
built-up areas. For the denominator of a ratio estimate, an extremely low value could cause
the ratio to be exaggerated (a spike) relative to neighboring grid cells. Using a larger
bandwidth will produce a more stable average.
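
The effect of the denominator bandwidth on a ratio can be shown with a toy calculation. The sketch below is an assumption-laden illustration, not CrimeStat output: it uses a quartic kernel with a scaling constant chosen for this example, one event, two population centroids with made-up populations, and hypothetical function names. Distances can be read as miles.

```python
import math

def quartic_density(cell, points, h, weights=None):
    """Sum of quartic kernel values at `cell`; points at or beyond h add nothing."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for point, weight in zip(points, weights):
        d = math.dist(cell, point)
        if d < h:
            total += weight * 3.0 / (math.pi * h * h) * (1.0 - (d / h) ** 2) ** 2
    return total

def density_ratio(cell, events, centroids, population, h_events, h_population):
    """Event density divided by population density at one reference cell."""
    numerator = quartic_density(cell, events, h_events)
    denominator = quartic_density(cell, centroids, h_population, population)
    # An empty denominator is reported as infinity: the extreme case of a spike.
    return numerator / denominator if denominator > 0 else float("inf")

events = [(0.0, 0.0)]                  # a single event in a thinly populated area
centroids = [(0.9, 0.0), (3.0, 0.0)]   # hypothetical census centroids
population = [100.0, 100.0]            # hypothetical populations (intensity)

narrow = density_ratio((0.0, 0.0), events, centroids, population, 1.0, 1.0)
wide = density_ratio((0.0, 0.0), events, centroids, population, 1.0, 3.0)
print(narrow, wide)  # the 3-mile denominator gives a much smaller ratio

edge_narrow = density_ratio((-0.5, 0.0), events, centroids, population, 1.0, 1.0)
edge_wide = density_ratio((-0.5, 0.0), events, centroids, population, 1.0, 3.0)
print(edge_narrow, edge_wide)  # infinite with the 1-mile denominator, finite with 3 miles
```

With a 1-mile denominator the nearest centroid barely falls inside the search radius, so the population density is tiny and the ratio is inflated. Widening only the denominator to 3 miles, as in the variable interval example, pulls the ratio down and removes the undefined cell.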
Output Units
The user must indicate the measurement units for the density estimate in points
per squared miles, squared nautical miles, squared feet, squared kilometers, or squared
meters.
Intensity or Weighting Variables
If an intensity or weighting variable is used for either the first file or the second file,
these boxes must be checked. Be careful about using both an intensity and a weighting
variable to avoid double weighting.
Density Calculations
The user must indicate the type of density output. There are six types of density
calculations that can be conducted with the dual kernel density routine. The calculations
are applied to each reference cell:


1. There is the ratio of densities, that is the first file divided by the second file.
This is the default choice. For example, if the first file is the location of auto
theft incidents and the second file is the location of census tract centroids
with the population assigned as an intensity variable, then the ratio of densities
would divide the kernel estimate for auto thefts by the kernel estimate for
population and would be an estimate of auto theft risk.

2. There is also the log ratio of densities. This is the natural logarithm of the
density ratio, that is

Log ratio of densities = Ln [ g(x_j) / g(y_j) ]          (8.10)

where g(x_j) is the density estimate for the first file and g(y_j) is the density
estimate for the second file. For a variable that has a spatially skewed
distribution, such that most reference cells have very low density estimates,
but a few have very high density estimates, converting the ratio into a log
function will tend to mute the spikes that occur. This measure has been
used in studies of risk (Kelsall and Diggle, 1995b).
3. There is the absolute difference in densities, that is the first file minus the
second file. This can be a useful output for examining differential effects.
For example, by using the centroids of census block groups (see example 2
below) with the population of the census block group assigned as an intensity
or weighting variable, there is a slight bias produced by the spatial
arrangements of the block groups. The U. S. Census Bureau suggests that
census units (e.g., census tracts, census block groups) be drawn so that there
are approximately equal populations in each unit. Thus, block groups
towards the center of the metropolitan area tend to be smaller because there
is a higher population density at those locations. Thus, the spatial
arrangement of the block groups will tend to produce a kernel estimate
which has a higher value towards the center independent of the actual
population of the block group; the bias is very small, less than 0.1%, but it
does exist. A more precise estimate could be produced by subtracting the
kernel estimate for the block group centroids without using population as the
intensity variable from the kernel estimate for the block group centroids with
population as the intensity variable. The resulting output could then be read
back into CrimeStat and used as a more precise measure of population
distribution. There are other uses of the difference function, such as
subtracting the estimate for the population-at-risk from the incident
distribution rather than taking the ratio, or calculating the net change in
population between two censuses.

4. There is the relative difference in densities. Like the relative density in the
single-kernel routine (discussed above), the relative difference in densities
first standardizes the densities of each file by dividing by the grid cell area
and then subtracts the secondary file relative density from the primary file
relative density. This can be useful in calculating changes between two time
periods, for example in calculating a change in relative density between two
censuses or a change in the crime density between two time periods.
5. There is the sum of the densities, that is, the density estimate for the first file
plus the density estimate for the second file. Again, this is applied to each
reference cell at a time. A possible use of the sum operation is to combine
two different density surfaces, for example the density of robberies plus the
density of assaults.

6. Finally, there is the relative sum of densities between the primary file and
the secondary file. The relative sum of densities first standardizes the
densities of each file by dividing by the grid cell area and then adds the
secondary file relative density to the primary file relative density. This
can be useful for identifying the total effects of two distributions. For
example, the total impact of robberies and burglaries on an area can be
estimated by taking the relative density of robberies and adding it to the
relative density of burglaries. The result is the combined relative density of
robberies and burglaries per unit area (e.g., robberies and burglaries per
square mile).
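
The six calculations listed above can be summarized for a single reference cell as follows. This is a minimal sketch under stated assumptions: the function name `dual_density`, the method labels, and the choice to return `None` when a ratio is undefined are all made up for this illustration and do not describe how CrimeStat handles those cases. `first` and `second` are the kernel estimates for the first and second files at one cell, and `cell_area` is the area of a grid cell.

```python
import math

def dual_density(first, second, method, cell_area=1.0):
    """Combine two kernel estimates for one reference cell."""
    if method == "ratio":
        return first / second if second != 0 else None
    if method == "log_ratio":
        # Natural log of the ratio, as in equation 8.10; needs positive inputs.
        return math.log(first / second) if first > 0 and second > 0 else None
    if method == "absolute_difference":
        return first - second
    if method == "relative_difference":
        return first / cell_area - second / cell_area
    if method == "sum":
        return first + second
    if method == "relative_sum":
        return first / cell_area + second / cell_area
    raise ValueError(f"unknown method: {method}")

# Made-up estimates for one cell with an area of 0.5 square miles.
for method in ("ratio", "log_ratio", "absolute_difference",
               "relative_difference", "sum", "relative_sum"):
    print(method, dual_density(6.0, 3.0, method, cell_area=0.5))
```

Applying the function to every cell of the reference grid would give the full output surface. The log ratio is defined only where both estimates are positive, which is one reason the denominator bandwidth matters.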

Output Files
Finally, the user must specify the file formats for the output. The results can be
output in three forms. First, the results are displayed in an output table. Second, the
results can be output into two raster grid formats for display in a surface mapping
program: Surfer for Windows format as a .dat file (Golden Software, 1994) and ArcView
Spatial Analyst format as a .asc file (ESRI, 1998). Third, the results can be output as
polygon grids into ArcView .shp, MapInfo .mif and Atlas*GIS .bna format (see footnote
1). All but Surfer for Windows require that the reference grid be created by CrimeStat.
Example 2: Kernel Density Estimates of Vehicle Thefts
Relative to Population
As an example of the use of the dual kernel density routine, the dual routine is
applied in both the City of Baltimore and the County of Baltimore to 14,853 motor vehicle
theft locations for 1996 relative to the 1990 population of census block groups. Again, a
reference grid of 100 columns by 108 rows was generated by CrimeStat.
Figure 8.12 shows the resulting density estimate as a Surfer for Windows output;
again, there is a map view, a surface view, and a contour view. The normal kernel function
was used and an adaptive bandwidth of 100 points was selected. As seen, there is a very
high concentration of auto theft incidents within the central part of the metropolitan area.
The contour view suggests five or six peak areas that are close to each other.


Figure 8.12: Baltimore County Vehicle Thefts: 1996. Kernel Density Interpolation
(contour view, surface view, and ground level view)

Much of this concentration, however, is produced by high population density in the
metropolitan center. Figure 8.13, for example, shows the kernel estimate for 1349 census
block groups for both the City of Baltimore and the County of Baltimore with the 1990
population assigned as the intensity variable. Again, the normal kernel function was used
with an adaptive bandwidth of 100 points being selected. The map shows three views: 1) a
surface view; 2) a contour view; and 3) a ground level view looking directly north. The
distribution of population is, of course, also highly concentrated in the metropolitan center
with two peaks, quite close to each other, and several smaller peaks.
When these two kernel estimates are compared using the dual kernel density
routine, a more complicated picture emerges (figure 8.14). This routine has conducted
three operations: 1) it calculated the distance between each of the 10,800 reference cells
and the 14,853 auto theft locations, evaluated the kernel function for each measured
distance, and summed the results for each reference cell; 2) it calculated the distance
between each of the 10,800 reference cells and the 1349 census block groups with
population as an intensity variable, evaluated the kernel function for each
intensity-weighted distance, and summed the results for each reference cell; and 3) it divided the
kernel density estimate for auto thefts by the kernel density estimate for population for
each reference cell location.
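
The three operations just described can be sketched on a toy data set. Everything concrete in this example is an assumption made for illustration: the coordinates and populations are invented, the function names are hypothetical, and a fixed bandwidth is used for brevity where the Baltimore example used an adaptive one.

```python
import math

def normal_kernel(d, h):
    """Normal kernel value at distance d for bandwidth (standard deviation) h."""
    return math.exp(-d * d / (2.0 * h * h)) / (2.0 * math.pi * h * h)

def kernel_density(cell, points, h, intensities=None):
    """Operations 1 and 2: measure distances, evaluate the kernel, and sum."""
    if intensities is None:
        intensities = [1.0] * len(points)
    return sum(weight * normal_kernel(math.dist(cell, point), h)
               for point, weight in zip(points, intensities))

def dual_kernel_ratio(cells, thefts, centroids, population, h):
    """Operation 3: divide the theft density by the population density per cell."""
    risk = []
    for cell in cells:
        numerator = kernel_density(cell, thefts, h)
        denominator = kernel_density(cell, centroids, h, population)
        risk.append(numerator / denominator if denominator > 0 else float("nan"))
    return risk

cells = [(0.0, 0.0), (4.0, 0.0)]                # two reference cells
thefts = [(0.0, 0.0), (0.2, 0.0), (4.0, 0.0)]   # invented theft locations
centroids = [(0.0, 0.0), (4.0, 0.0)]            # invented block group centroids
population = [3000.0, 500.0]                    # invented populations (intensity)

counts = [kernel_density(cell, thefts, 1.0) for cell in cells]
risk = dual_kernel_ratio(cells, thefts, centroids, population, 1.0)
print(counts)  # theft density is higher at the first (populous) cell
print(risk)    # thefts relative to population are higher at the second cell
```

The toy result mirrors the pattern in the Baltimore example: the cell with the most thefts is not the cell with the highest risk once population is taken into account.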
While the concentration of motor vehicle thefts relative to population (motor vehicle
theft risk) is still high in the metropolitan center, there are bands of high risk that spread
outward, particularly along major arterials. There are now many hot spot areas which
have a high distribution of motor vehicle thefts relative to the residential population. We
could, of course, refine this analysis further by taking, for example, employment as a
baseline variable rather than population; employment is a better indicator for the daytime
population distribution whereas the residential population is a better indicator for
nighttime population distribution (Levine, Kim, and Nitz, 1995a; 1995b).
Example 3: Kernel Density Estimates and Risk-adjusted Clustering of
Robberies Relative to Population
The final example shows how the dual kernel interpolation compares with the
risk-adjusted nearest neighbor clustering, discussed in chapter 6. Figure 8.15 shows 7
first-order risk-adjusted clusters overlaid on the dual kernel estimate of 1996 robberies
relative to 1990 population (see footnote 4). As seen, there is a correspondence between the identified
risk-adjusted clusters and the dual kernel interpolation of the ratio of robberies to
population. For a broad regional perspective, the interpolation produces an adequate
model of where there is a high robbery risk. At the neighborhood level, however, the
risk-adjusted clusters are more specific and would be preferable for use by police in identifying
high-risk locations.
The advantage of a dual kernel density interpolation routine is that two variables
can be related together. By interpolating one variable to a reference grid and then
interpolating a second variable to the same reference grid, the two variables have been


Figure 8.13: Baltimore Metropolitan Population: 1990. Kernel Density Estimate of Block Group Population
(contour view, surface view, and ground level view)

Figure 8.14: Baltimore County Vehicle Theft Risk. Kernel Density Ratio of 1996 Vehicle Thefts to 1990 Population
(contour view, surface view, and ground level view)

Figure 8.15: Risk-adjusted Robbery Clusters and Interpolated Robbery Risk. 1996 Robberies Relative to 1990 Population
(legend: robbery locations, 1st-order robbery risk clusters, City of Baltimore, Baltimore County; robberies per 1000 population from low to high)

Using Small Area Estimation to Target Health Services


Thomas F. Reynolds, MS
University of Texas-Houston School of Public Health
In Texas, the City of Houston and Harris County organized a Public Health
Task force to make recommendations concerning the provision of health services for
those without health insurance. Task force members wanted to know approximately
how many area citizens did not have health insurance.
Data from the two most recent Current Population Survey Annual Social and
Economic Supplements (CPS-ASEC, 2003-04) were used to derive a synthetic
estimate using a stratified model. Estimates were calculated at census tract and
block group levels. Selected political divisions were clipped from base maps for
political officials and legislators.
Percentages are indicative of risk. On the other hand, numbers are essential
for targeting physical resources. There is seldom a perfect correspondence between
high percentages and large numbers. For example, an area with a concentration of
multi-family housing may have a relatively small percentage, but a large number, of
uninsured. Percentage maps of the uninsured (figure 1) are generally clustered and
informative; however, due to large variations in population numbers at both levels of
census geography, maps of the population densities of uninsured proved most
valuable to officials (figure 2).
CrimeStat was used to develop the density maps. The single kernel density
routine was used to estimate the density of block group values using the centroid to
represent the values and the number of uninsured as an intensity value. The Moran
Correlogram was used to select the type of kernel for the single-kernel interpolation
(a uniform distribution) and an optimal bandwidth.
Fig. 1: Percent Uninsured

Fig. 2: Population Density of Uninsured

interpolated to the same geographical units. The two interpolations can then be related, by
dividing, subtracting, or summing. As has been mentioned throughout this manual, one of
the problems with techniques that depend on the concentration of incidents is that they
ignore the underlying population-at-risk. With the dual routine, however, we can start to
examine the risk and not just the concentration.
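
As a rough illustration of relating two interpolations on the same grid, the following Python
sketch estimates two density surfaces and combines them cell by cell. It assumes a normal
kernel with a single fixed bandwidth; the function names (kernel_density, combine) are
hypothetical and this is not CrimeStat's own implementation.

```python
import math

def kernel_density(points, grid, h, weights=None):
    """Sum a 2-D normal kernel of bandwidth h over all points at each grid cell."""
    if weights is None:
        weights = [1.0] * len(points)
    norm = 1.0 / (2.0 * math.pi * h * h)
    surface = []
    for gx, gy in grid:
        total = 0.0
        for (px, py), w in zip(points, weights):
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            total += w * norm * math.exp(-d2 / (2.0 * h * h))
        surface.append(total)
    return surface

def combine(first, second, how="ratio"):
    """Relate two surfaces on the same grid by dividing, subtracting, or summing."""
    combined = []
    for a, b in zip(first, second):
        if how == "ratio":
            # Assumption: cells with no baseline density are reported as 0.
            combined.append(a / b if b > 0 else 0.0)
        elif how == "difference":
            combined.append(a - b)
        elif how == "sum":
            combined.append(a + b)
        else:
            raise ValueError("unknown method: " + how)
    return combined

# Hypothetical data: incident locations, and tract centroids weighted by population.
grid = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
incidents = [(0.1, 0.0), (0.2, 0.1), (1.9, 0.0)]
centroids = [(0.0, 0.0), (2.0, 0.0)]
population = [5000.0, 500.0]

incident_density = kernel_density(incidents, grid, h=0.5)
population_density = kernel_density(centroids, grid, h=0.5, weights=population)
risk = combine(incident_density, population_density, how="ratio")
```

In this sketch the ratio surface plays the role of the risk estimate: a cell with few
incidents can still rank high if the population density beneath it is low.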

Visually Presenting Kernel Estimates


Whether the single- or dual-kernel estimate is used, the result is a grid
interpretation of the data. By scaling these values by color in a GIS program, a
visualization of the data is obtained. Areas with higher densities can be shown in darker
tones and those with lower densities can be shown in lighter tones; some people do the
opposite with the high density areas being lighter.
To make the visualization even more realistic, one could use a GIS program to cut
out those grid cells that are outside the study area or are on water bodies. When doing
this, however, be sure to re-scale the estimated Z values so that they will sum to the total
of the original grid. For example, if the original sample size was 1000, then the grid cells
will sum to 1000 if the absolute density option is chosen. If, say, 20% of these cells are
then removed to improve the visualization, then the grid cell Z values have to be re-scaled
so that their sum will continue to be 1000. A simple way to do this is to, first, add up the Z
values for the remaining cells and, second, multiply each grid cell Z by the ratio of the
original sum to the reduced sum.
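
The two-step re-scaling just described can be sketched in Python as follows. The function
name and the keep flags are hypothetical; in practice the Z values would come from the
exported grid and the flags from a clipping operation in the GIS program.

```python
def rescale_after_clipping(z_values, keep):
    """Drop cells flagged False in keep and re-scale the rest to the original sum."""
    original_sum = sum(z_values)
    remaining = [z for z, kept in zip(z_values, keep) if kept]
    reduced_sum = sum(remaining)  # step 1: add up Z for the remaining cells
    if reduced_sum == 0:
        raise ValueError("remaining cells have no density to re-scale")
    ratio = original_sum / reduced_sum  # step 2: original sum over reduced sum
    return [z * ratio for z in remaining]

# Ten cells summing to 1000; two cells (20%) fall outside the study area.
z = [100.0] * 10
keep = [True] * 8 + [False] * 2
rescaled = rescale_after_clipping(z, keep)
print(sum(rescaled))
```

Because every remaining cell is multiplied by the same ratio, the relative ordering of the
cells is unchanged; only the total is restored.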
The visualization is useful for a broad, regional view. It is not particularly useful
for micro analysis. The use of one of the cluster routines discussed in chapters 6 and 7
would be more appropriate for small area analysis.

Conclusion
Kernel density estimation is one of the modern spatial statistical techniques.
There is currently research on the use of this technique both in statistical theory and in
developing applications. For crime analysis, the technique represents a powerful way of
conducting hot spot analysis and of linking the hot spots to an
underlying population-at-risk. It can be used both for police deployment, by targeting areas
of high concentration of incidents, and for prevention, by targeting areas with high
risk. It can also be used as a research tool for analyzing two or more distributions. More
development of this approach can be expected in the next few years.


The Risk of Violent Incidents Relative to Population Density in Cologne
Using the Dual Kernel Density Routine

Dietrich Oberwittler and Marc Wiesenhütter
Max Planck Institute for Foreign and International Criminal Law
Freiburg, Germany
When estimating the density of street crimes within a metropolitan area by interpolating
crime incidents, the result is usually a very high concentration in the city center.
However, there is also a very high concentration of people either living or pursuing their
daily routine activities in these areas. The question that emerges is how likely a criminal
event is when taking into account the number of people spending their time in these areas.
The CrimeStat dual kernel density routine is able to estimate a ratio density surface of crime
relative to the 'population at risk'.
In this example, data on calls to the police for assault and battery from April 1999 to
March 2000 (N=6363 calls) and population from Cologne were used. Exact information on
the number of people spending their time in the city does not exist. Therefore, 1997 counts of
passengers entering and leaving the public transport system at each of 550 stations and bus
stops in the city were used as a proxy variable. The number of persons at each station or bus
stop was assigned to adjacent census tracts and added to the resident population, resulting in
a crude measure of the 'population at risk'.
In the dual kernel routine, the density estimate of crime incidents is compared to the
density estimate of the population at risk, defined by the centroids of census tracts with the
number of persons as an intensity variable. We chose the normal method of interpolation
and adaptive intervals with a minimum of five points. The adaptive bandwidth adjusts for
the fact that there are fewer incidents and census tracts at the edges of the city, resulting in
a relatively smoother density surface for the ratio. The results were output to ArcView.
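
One way to read an adaptive interval with a minimum number of points is that, at each grid
cell, the bandwidth is widened until it captures that many points. The Python sketch below
makes that assumption explicit by taking the distance to the k-th nearest point as the
bandwidth; the function name is hypothetical and CrimeStat's internal procedure may differ.

```python
import math

def adaptive_bandwidths(points, grid, min_points=5):
    """For each grid cell, return the distance to its min_points-th nearest point."""
    if len(points) < min_points:
        raise ValueError("need at least min_points points")
    bandwidths = []
    for gx, gy in grid:
        distances = sorted(math.hypot(gx - px, gy - py) for px, py in points)
        # The circle of this radius is the smallest one holding min_points points.
        bandwidths.append(distances[min_points - 1])
    return bandwidths

# Hypothetical points along a line; the cell at the edge needs a wider bandwidth.
pts = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
cells = [(0.0, 0.0), (3.5, 0.0)]
print(adaptive_bandwidths(pts, cells, min_points=5))
```

The edge cell receives the larger bandwidth, which is the behavior the authors rely on to
smooth the ratio surface where incidents and census tracts are sparse.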
The effect of adjusting the crime distribution for the underlying 'population at risk'
becomes quite visible. Whereas the concentration of crime is highest in the city center (left
map), the crime risk (right map) is in fact much higher in several more distant areas that are
known for high concentrations of socially disadvantaged persons. Given the imperfect nature
of the population data, these results should be interpreted as a broad view of the distribution
of crime risk that, nevertheless, has important policy implications.
Single kernel density of crime incidences (assault & battery, Cologne 1999/2000)

Dual kernel density of crime incidences relative to population at risk

Kernel Density Interpolation of Police Confrontations in
Buenos Aires Province, Argentina: 1999

Gastón Pezzuchi
Crime Analyst
Buenos Aires Province Police Force
Buenos Aires, Argentina
One of our first tryouts with the CrimeStat software involved the calculation
of both single and dual kernel density interpolations using data on 1999
confrontations with the police within Buenos Aires Province, an area that covers 29
counties around the Federal Capital. The confrontations include mostly gun fights
with the police but also other attacks (e.g., knives, rocks, sticks). In the last three
years, there has been an increase in confrontations with the police. The single
interpolation shows a density surface that gives a good picture of the ongoing level of
violence while the dual interpolation shows a risk surface using the personnel
deployment data; the latter are confrontations relative to the number of police
deployed. Typically, police are allocated to areas according to crime rates.
[Figure: Example of kernel density estimation (CrimeStat). Two map panels: "Single
Interpolation - Density of Events" (legend: density low, medium, high, no data) and "Dual
Interpolation - Ratio of densities (Risk)" (legend: risk low, medium, high, no data).
Events = police shootings (approx. 800). Map labels: Buenos Aires City (no data), Frontera.]

Both images are quite different, suggesting varying policing strategies. For
example, though there are two well-defined hot spot areas in the Province (one in
the north, the other in the south), the high levels of risk detected in the southern
areas came as a complete surprise. The northern area has a higher crime rate than
the southern area, hence a high police deployment. However, the levels of
confrontation are approximately equal between the two areas.

Evolution of the Urbanization Process in the Brazilian Amazonia


Silvana Amaral, Antônio Miguel V. Monteiro, Gilberto Câmara, José A. Quintanilha
INPE, Instituto Nacional de Pesquisas Espaciais, Brazil
The Brazilian Amazon rain forest is the world's largest contiguous area of
tropical rain forest. During the last three decades, the region has
experienced the largest urban growth rates in Brazil, a process that has reorganized
the network of human settlements in the region. We used the CrimeStat single and
dual kernel density routines to visualize trends in urbanization from 1996 to 2000 in
Amazonia. Two variables were used to measure urbanization: 1) the concentration
of urban nuclei (city density); and 2) the ratio of urban to total population.
The concentration of cities was spatially associated with federal roads in the
eastern and southern portions, and along the Amazonas River in the middle of the
region. Additionally, the surfaces of urban population show that city density is not
always associated with large urban populations. From 1996 to 2000 city density
increased in the western Amazonia (Pará state) at a greater rate than the growth of
the urban population. In the southeastern part of the region (Rondônia state), there
were many urban centers, but the ratio of urban to total population was small,
indicating that they are predominantly agricultural regions.

City density - 1996

Urban Pop/Total Pop-1996

City density - 2000

Urban Pop/Total Pop-2000

Endnotes to Chapter 8
1. There are differences in opinion about how the width of a particular fixed bandwidth
should be determined. The smoothing is done for a distribution of values, Z. If there
are only unique points (and, hence, there is no Z value at a point), the distances
between points can be substituted for Z. Thus, MeanD is the mean distance, sd(D)
is the standard deviation of distance, and iqr(D) is the inter-quartile range of
distances between points. These would be substituted for MeanZ, sd(Z), and iqr(Z)
respectively.
Silverman (1986, 45-47; Härdle, 1991; Farewell, 1999) proposed a bandwidth, h, of:

$$h = 1.06 \cdot \min\left\{ \mathrm{sd}(Z),\ \frac{\mathrm{iqr}(Z)}{1.34} \right\} \cdot N^{-1/5}$$

where min is the minimum of the next two terms, sd(Z) is the standard deviation of
the variable, Z, being interpolated, iqr(Z) is the inter-quartile range of Z, and N is
the sample size.
Bowman and Azzalini (1997, 31) defined a slightly different optimal bandwidth for a
normal kernel:

$$h = \left\{ \frac{4}{3N} \right\}^{1/5} \cdot \mathrm{sd}(Z)$$

To avoid being influenced by outliers, they suggested using the median absolute
deviation estimator for sd(Z):

$$\mathrm{MAD}(Z) = \mathrm{median}\left\{ \frac{\left| Z(i) - \mathrm{Median}\,Z \right|}{0.6745} \right\}$$
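
A minimal Python sketch of the median absolute deviation estimator, and of the Bowman and
Azzalini bandwidth that uses it, is given below. The function names are hypothetical, and the
absolute value inside the median is taken from the usual definition of the estimator.

```python
import statistics

def mad_sd(values):
    """Robust estimate of sd(Z): median absolute deviation divided by 0.6745."""
    center = statistics.median(values)
    deviations = [abs(v - center) for v in values]
    return statistics.median(deviations) / 0.6745

def bowman_azzalini_bandwidth(values, robust=True):
    """h = (4 / (3N))^(1/5) * sd(Z), with sd(Z) optionally replaced by the MAD estimate."""
    n = len(values)
    spread = mad_sd(values) if robust else statistics.stdev(values)
    return (4.0 / (3.0 * n)) ** (1.0 / 5.0) * spread

# Hypothetical sample with one outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print(bowman_azzalini_bandwidth(sample, robust=True))
print(bowman_azzalini_bandwidth(sample, robust=False))
```

With the outlier present, the robust bandwidth is much smaller than the one based on the
ordinary standard deviation, which is the motivation for the substitution.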
Scott (1992) suggested an upper bound on the normal kernel of

$$h = 1.144 \cdot \mathrm{sd}(Z) \cdot N^{-1/5}$$

Bailey and Gatrell (1995, 85-87) offered a rough choice for the bandwidth of

$$h = 0.68 \cdot N^{-0.2}$$

but suggested that the user could experiment with different bandwidths to explore
the surface.
On the other hand, the concept of an adaptive bandwidth is based more on sampling
theory (Bailey and Gatrell, 1995). Increasing the bandwidth until a fixed
number of points is counted ensures that the level of precision is constant
throughout the region. As with all sampling, the standard error of the estimate is a
function of the sample size; a larger sample leads to smaller error. In general, if
there was independent sampling, the 95% confidence interval of a bandwidth for a
normal kernel could be approximated by

$$95\%\ \mathrm{C.I.} = \mathrm{Mean}(Z) \pm 1.96 \cdot \frac{0.5}{N(h)^{1/2}} \cdot \mathrm{sd}(Z)$$
where N(h) is the adaptive sample size (the number of points counted within the
bandwidth for the adaptive kernel). This assumes that a point has an equal
likelihood of falling within the bandwidth of one cell compared to an adjacent cell
(i.e., it sits on the boundary of the bandwidth circle). The adaptive bandwidth
criterion requires that the bandwidth be increased until it captures the specified
number of points. On average, if there are N points in a region of area, A, and if the
adaptive sample size is N(p), then the average area required to capture N(p) points
is

$$A(p) = \frac{N(p) \cdot A}{N}$$

and the average bandwidth, Mean(h), is

$$\mathrm{Mean}(h) = \sqrt{\frac{A(p)}{\pi}} = \sqrt{\frac{N(p) \cdot A}{N \cdot \pi}}$$
Each of these provides a different criterion for the bandwidth size, with the adaptive
being the most conservative. For example, for a standardized distribution with
1000 data points, a standardized mean of Z of 0 and a standardized standard
deviation of 1, the Silverman criterion would produce a bandwidth of 0.2663; the
Bowman and Azzalini criterion would produce a bandwidth of 0.2661; the Scott
criterion would produce a bandwidth of 0.2874; and the Bailey and Gatrell criterion
would produce a bandwidth of 0.1708. For the adaptive interval, if the required
adaptive sample size is 25, then the average bandwidth would be approximately
0.3162 (this assumes that the area is a circle with a radius of 2 standardized
standard deviations).
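
The figures in this example can be checked with a short Python sketch. The function name is
hypothetical, and the inter-quartile range of 1.349 is an assumption (the approximate value
for a standard normal distribution); with it, the minimum in the Silverman rule is simply
sd(Z) = 1.

```python
import math

def worked_bandwidths(n=1000, sd=1.0, iqr=1.349, n_adaptive=25, radius=2.0):
    """Evaluate each bandwidth criterion for a standardized distribution."""
    n_term = n ** (-1.0 / 5.0)
    area = math.pi * radius ** 2            # study area assumed to be a circle
    adaptive_area = n_adaptive * area / n   # average area capturing n_adaptive points
    return {
        "Silverman": 1.06 * min(sd, iqr / 1.34) * n_term,
        "Bowman and Azzalini": (4.0 / (3.0 * n)) ** (1.0 / 5.0) * sd,
        "Scott": 1.144 * sd * n_term,
        "Bailey and Gatrell": 0.68 * n_term,
        "Adaptive (average)": math.sqrt(adaptive_area / math.pi),
    }

for name, h in worked_bandwidths().items():
    print(f"{name}: {h:.4f}")
```

Rounded to four decimal places, the printed values should agree with the bandwidths quoted
in the example.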
2. CrimeStat will output the geographical boundaries of the reference grid (a polygon
grid) and will assign a third variable (called Z) as the density estimate. Of the
three polygon grid outputs, ArcView .shp files can be read directly into the
program. For MapInfo, on the other hand, the output is in MapInfo Interchange
Format (a .mif and a .mid file); the density estimate (also called Z) is assigned to
the .mid file. The files must be imported to convert them to a MapInfo .tab file. For
Atlas*GIS .bna format, however, there are two files that are output: a .bna file
which includes the boundaries of the polygon grid and a .dbf file which includes the
grid cell names (called gridcell) and the density estimate (also called Z). The .bna
file must be read in first and then the .dbf file must be read in and matched to the
value of gridcell. For all three output formats, the values of Z can be shown as a
thematic map but the ranges must be adjusted to illustrate the likely locations for
the offender's residence (i.e., the default values in the GIS programs will not display
the densities very well). On the other hand, the default interval values for Surfer
for Windows and ArcView Spatial Analyst provide a reasonably good visualization of
the densities.
3. All the CrimeStat outputs except for ArcView .shp files are in ASCII. There are
usually edge effects and values interpreted outside the actual geographical area.
These can be removed with an ASCII editor by substituting '0' for the values at the
edges or outside the study region. For .shp files, the values at the edges can be
edited within the ArcView program. Another alternative is to cut out the cells that
are beyond the study area. Care must be taken, however, not to edit an output file
too much; otherwise it will bear little relationship to the calculated kernel estimate.

4. The risk-adjusted hierarchical clustering (Rnnh) method used the largest search
radius, with a minimum of 25 points required for a cluster. The kernel
estimate for both the Rnnh and the dual-kernel routines used the normal
distribution function with an adaptive bandwidth of 25 points.
