


Geographical Information Systems in Archaeology

James Conolly and Mark Lake


The study of geographical information systems (GIS) has moved from the domain of the computer specialist into the wider archaeological community, providing it with a powerful tool for research and data management. This clearly written but rigorous book provides a comprehensive guide to the archaeological uses of GIS. Topics include: the theoretical context and the basics of GIS; data acquisition including database design; creation of elevation models; exploratory data analysis including spatial queries; statistical spatial analysis; map algebra; spatial operations including the calculation of slope and aspect, filtering and erosion modelling; methods for analysing regions; visibility analysis; network analysis including hydrological modelling; the production of high-quality output for paper and electronic publication; and the creation and use of metadata.

Offering an extensive range of archaeological examples, it is an invaluable source of practical information about GIS for all archaeologists, whether engaged in cultural resource management or academic research. This is an essential handbook for both the novice and the advanced user.

James Conolly is former Lecturer in Archaeology at University College London and now Canada Research Chair in Archaeology at Trent University, Canada. [...] the archaeological uses of GIS. His research interests include settlement and landscape archaeology, quantitative methods and population history, [...] the origins and spread of agriculture and Aegean prehistory.

Mark Lake is a lecturer at the Institute of Archaeology, University College London, where he coordinates the M.Sc. in GIS and Spatial Analysis in Archaeology. [...] spatial analysis and evolutionary archaeology [...] prehistory. He is a contributor to the Handbook of Archaeological Sciences and a member of the editorial board of World Archaeology.


General editor
Graeme Barker, University of Cambridge

Advisory editors
Elizabeth Slater, University of Liverpool
Peter Bogucki, Princeton University

Books in the series
Pottery in Archaeology, Clive Orton, Paul Tyers and Alan Vince
Vertebrate Taphonomy, R. Lee Lyman
Photography in Archaeology and Conservation, 2nd edn., Peter G. Dorrell
Alluvial Geoarchaeology, A. G. Brown
Shells, Cheryl Claassen
Zooarchaeology, Elizabeth J. Reitz and Elizabeth S. Wing
Sampling in Archaeology, Clive Orton
Excavation, Steve Roskams
Teeth, 2nd edn., Simon Hillson
Lithics, 2nd edn., William Andrefsky Jr.
Geographical Information Systems in Archaeology, James Conolly and Mark Lake

Cambridge Manuals in Archaeology is a series of reference handbooks designed for an international audience of upper-level undergraduate and graduate students, and professional archaeologists and archaeological scientists in universities, museums, research laboratories and field units. Each book includes a survey of current archaeological practice alongside essential reference material on contemporary techniques and methodology.




James Conolly
Department of Anthropology, Trent University & Institute of Archaeology, University College London

Mark Lake
Institute of Archaeology, University College London

CAMBRIDGE UNIVERSITY PRESS




Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 2RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521797443

© J. Conolly and M. Lake 2006

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2006

Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library

ISBN-13 978-0-521-79330-8 hardback
ISBN-10 0-521-79330-0 hardback
ISBN-13 978-0-521-79744-3 paperback
ISBN-10 0-521-79744-6 paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To Lucy and Ello, Paddy and Kat


Contents

List of figures
List of tables
List of boxes
Acknowledgements

1 INTRODUCTION AND THEORETICAL ISSUES IN ARCHAEOLOGICAL GIS
1.1 About this book
1.2 Theoretical issues
1.3 Conclusion

2 FIRST PRINCIPLES
2.1 Introduction
2.2 The basics
2.3 Cartographic principles
2.4 Data models and data structures: the digital representation of spatial phenomena
2.5 Conclusion

3 PUTTING GIS TO WORK IN ARCHAEOLOGY
3.1 Management of archaeological resources
3.2 GIS and excavation
3.3 Landscape archaeology
3.4 Spatial and simulation modelling
3.5 Conclusion

4 THE GEODATABASE
4.1 Introduction
4.2 Designing a relational database for attribute data
4.3 Spatial data storage and management

5 SPATIAL DATA ACQUISITION
5.1 Introduction
5.2 Primary geospatial data
5.3 Secondary data
5.4 Map rectification and georeferencing
5.5 A note on spatial error and map generalisation

6 BUILDING SURFACE MODELS
6.1 Introduction
6.2 Interpolation
6.3 Global methods
6.4 Local methods
6.5 Interpolation with geostatistics: kriging
6.6 Creating digital elevation models
6.7 Conclusion

7 EXPLORATORY DATA ANALYSIS
7.1 Introduction
7.2 The query
7.3 Statistical methods
7.4 Data classification
7.5 Conclusion

8 SPATIAL ANALYSIS
8.1 Introduction
8.2 Linear regression
8.3 Spatial autocorrelation
8.4 Cluster analysis
8.5 Identifying cluster membership
8.6 Density analysis
8.7 Local functions
8.8 Predictive modelling
8.9 Conclusion

9 MAP ALGEBRA, SURFACE DERIVATIVES AND SPATIAL PROCESSES
9.1 Introduction: point and spatial operations
9.2 Map algebra
9.3 Derivatives: terrain form
9.4 Continuity and discontinuity
9.5 Surface processes: erosion
9.6 Conclusion

10 REGIONS: TERRITORIES, CATCHMENTS AND VIEWSHEDS
10.1 Introduction: thinking about regions
10.2 Geometrical regions
10.3 Topographical regions
10.4 Conclusion

11 ROUTES: NETWORKS, COST PATHS AND HYDROLOGY
11.1 Introduction
11.2 Representing networks
11.3 Analysing networks
11.4 Networks on continuous surfaces
11.5 Conclusion

12 MAPS AND DIGITAL CARTOGRAPHY
12.1 Introduction
12.2 Designing an effective map
12.3 Map design
12.4 Thematic mapping techniques
12.5 Internet mapping
12.6 Conclusion

13 [...]
13.1 Introduction
13.2 Metadata standards
13.3 Creating metadata
13.4 Conclusion

Glossary
References
Index


List of figures

2.1 The main tasks performed by GIS.
2.2 Spatial and aspatial characteristics of archaeological data.
2.3 Polar coordinates.
2.4 A conical projection with two lines of secancy and one line of tangency.
2.5 Albers equal-area conical projection with one line of tangency and a meridian.
2.6 An azimuthal projection.
2.7 A cylindrical projection.
2.8 Cartesian coordinate system.
2.9 Pythagoras' theorem.
2.10 Vector 'geographic primitives'.
2.11 Vector objects linked to attribute data.
2.12 Topologically related polygons.
2.13 Raster representations of point, line and polygon.
2.14 Problems with raster representation of complex curves.
3.1 A simple data model for excavation records.
3.2 West Heslerton Web-CD.
3.3 KIP GIS.
3.4 KIP survey tract attribute database.
3.5 A computer-simulated artefact distribution.
4.1 An entity-relationship (E-R) diagram.
4.2 A simple arc-node storage structure.
4.3 A raster grid with cell values overlay.
4.4 A typical raster storage file.
4.5 Raster file compression using RLC.
5.1 A 'total station'.
5.2 A differential GPS.
5.3 The electromagnetic spectrum.
5.4 Pixel values recorded by a digital sensor.
5.5 Dot-density overlay on an aerial photograph.
5.6 Steps in digitising map data.
5.7 A polygon before and after the removal of redundant vertices.
5.8 Two common topological errors in digitised lines.
5.9 Three common topological errors in digitised polygons.
5.10 Survey tract polygon topology in the KIP GIS.
5.11 Translation, scaling and rotation.
6.1 Trend surface analysis.
6.2 Linear interpolation.
6.3 Interpolation using inverse distance weighting.
6.4 A variogram.
6.5 Isometric view of a hillshaded DEM draped with an artefact distribution.
6.6 Problems associated with the simple interpolation of contour data.
6.7 Typical problems of using contour data for interpolation.
6.8 Comparison of interpolations produced using IDW and a TIN.
6.9 A Delaunay triangulation.
6.10 Use of contour data for TIN building.
6.11 Isometric view of a hillshaded DEM generated using TOPOGRID in ArcInfo 7.2.
7.1 The Boolean operators.
7.2 The point-in-polygon problem.
7.3 The line-in-polygon problem.
7.4 Polygon overlay.
7.5 Buffering of a point, line and polygon feature.
7.6 Elevation, land-use capability and distribution of cairns in west Shetland.
7.7 Boxplot of artefacts recovered from transects.
7.8 Boxplot of repolaro sizes for the Takuvaine and Tupapa Tapere.
7.9 Frequency distributions of repolaro sizes for the Takuvaine and Tupapa Tapere.
7.10 Cumulative proportion distribution of repolaro sizes for Takuvaine and Tupapa Tapere.
7.11 Boxplot of artefact densities for coastal and inland survey areas.
7.12 Boxplot of logged artefact densities for coastal and inland survey areas.
7.13 Different qualitative classifications on the same stone-tool dataset.
7.14 Hypothetical reclassification of qualitative variables to rank-order variables.
7.15 Normal, rectangular, bimodal and skewed distributions.
7.16 Classification of normally distributed data.
7.17 Distribution of sherd density.
7.18 Six possible numerical classifications of the same dataset.
7.19 Classification of pixel values recorded by a digital sensor.
8.1 Idealised correlation patterns.
8.2 'Line of regression'.
8.3 A line of regression fitted to a heteroscedastic point distribution.
8.4 Transforming a variable to improve the correlation coefficient.
8.5 Plot of the residuals of predicted versus actual medieval pottery.
8.6 Nearly random, nearly clustered and nearly regular point distributions.
8.7 Smaller-scale point patterns with a near-random distribution.
8.8 Multiscalar point patterning.
8.9 Identification of multiscalar clustering in the Kytheran Early Bronze Age using Ripley's K.
8.10 A simple point distribution.
8.11 A single-link cluster analysis of the point distribution in Fig. 8.10.
8.12 A cluster analysis using Ward's method.
8.13 A k-means cluster analysis of medieval castles on Okinawa Island, Japan.
8.14 Distribution of stone artefacts from trench 4b at Boxgrove, England.
8.15 Rate of change of sum of squares for 1-20 cluster solutions.
8.16 The two-cluster solution of a k-means cluster analysis.
8.17 Three intensity surfaces of the stone artefact distribution at Boxgrove 4b.
8.18 Kernel density estimates of the stone artefact distribution at Boxgrove 4b.
8.19 Distribution of artefact densities in five fields.
8.20 Generalised flowchart of stages in the generation of a predictive model.
8.21 Cumulative per cent correct predictions for model sites and non-sites.
9.1 Calculation of a mean value as a spatial operation.
9.2 ESRI's ArcGIS 'raster calculator'.
9.3 The slope experienced while traversing the terrain depends on direction of travel.
9.4 A GRASS GIS rule file to reclassify aspect values to compass directions.
9.5 A histogram of slope values in Dentdale and Garsdale, UK.
9.6 A histogram of aspect values in Dentdale and Garsdale, UK.
9.7 Equal-interval and equal-area classifications of slope.
9.8 A shaded relief model.
9.9 Two points with the same slope and aspect but different plan and profile complexity.
9.10 A simple low-pass filter.
9.11 Mode, range and diversity calculated using filters.
9.12 High-pass filters applied to a synthetic DEM containing traces of a field system.
10.1 Multiple buffer zones around a point.
10.2 Multiple merged buffers.
10.3 Thiessen tessellations.
10.4 Linear barrier breached by diagonal moves.
10.5 Least-cost paths derived using relative and fixed costs.
10.6 Calculation of effective slope.
10.7 Sign of effective slope.
10.8 Energetic cost of traversing slopes according to Llobera (2000).
10.9 Energetic cost of traversing slopes according to Bell and Lock (2000).
10.10 Energetic cost of traversing slopes according to van Leusen (2002).
10.11 Iteration of a basic spreading function.
10.12 Algorithm artefacts in an accumulated cost-surface.
10.13 Cross-tabulation of land-use potential.
10.14 Multiple viewshed.
10.15 Cumulative viewshed.
10.16 Edge effect in visibility analysis.
10.17 Non-reciprocity of intervisibility.
10.18 Probabilistic viewshed.
11.1 Connected and disconnected simple graphs.
11.2 Paths and cycles in a simple graph.
11.3 Weighted digraph of a road network.
11.4 Transport network.
11.5 Planar graphs.
11.6 Turn table for a node in a transport network.
11.7 Sparsely and well-connected networks.
11.8 Serbian trade routes in the thirteenth and fourteenth centuries.
11.9 The C1 matrix for a trade network.
11.10 Distribution of accessibility in the medieval Serbian oecumene.
11.11 Justified graph.
11.12 Visibility graph.
11.13 Mackie's (2001) shell midden network.
11.14 Results of Mackie's (2001) location-allocation analysis.
11.15 The travelling salesman problem.
11.16 Errors in least-cost paths.
11.17 Collischonn and Pilar's (2000) least-cost path.
11.18 A globally suboptimal least-cost path.
11.19 LDD map and derivatives.
11.20 Stream order indices.
11.21 Watersheds and ridges.
12.1 Effect of polygon shading.
12.2 Choropleth mapping.
12.3 Mapping enumeration unit variability.
12.4 Continuous distribution mapping.
12.5 Use of proportional symbols.
12.6 A dot density map.
12.7 Isochronic map.
12.8 Isoplethic map.
12.9 Interactive SVG map.
13.1 XML input editor.
13.2 ESRI's ArcCatalogue metadata editor.
13.3 ESRI's ArcCatalogue metadata search tool.


List of tables

1.1 The main types of question that can be answered using GIS
3.1 KIP geospatial datasets
3.2 The four scales of measurement
3.3 Variables used in Woodman's (2000b) predictive model
4.1 A flat-file database
4.2 Three normalised database tables
5.1 Sources of digital satellite imagery
5.2 NASA Landsat-7 ETM+ bands
5.3 Principal applications of Landsat TM spectral bands
5.4 Common satellite and raster image file formats
5.5 Common vector file formats
5.6 A correspondence table
5.7 RMSE and error values
6.1 Attribute, distance values and weighted attributes
6.2 Surface models that can be computed from a DEM
7.1 SQL logical operators
7.2 Results of a grouping and aggregate SQL statement
7.3 Observed number of cairns and houses on each land class
7.4 Contribution of each land class to the study area
7.5 Summary statistics of elevation values for cairns and houses
7.6 Basic statistical terms and concepts
7.7 Observed number of cairns and houses on each land class
7.8 Expected number of cairns and houses on each land class
7.9 Critical values for the chi-squared test
7.10 Expected numbers of monuments based on land-class areas
7.11 Artefacts recovered from ten transects
7.12 Ranked transect values
7.13 Ranked measures
7.14 Critical values of D for the Kolmogorov-Smirnov test
7.15 Mesolithic artefact densities from coastal and inland survey areas on Islay
8.1 Example values used for calculating the correlation coefficient, r
8.2 Example data used for calculating the standard error in a linear regression analysis
8.3 Counts of prehistoric and medieval pottery recovered from ten surface collection areas
8.4 Some common problems, consequences and solutions with regression analysis
8.5 A distance matrix for hierarchical cluster analysis
8.6 A distance matrix for Getis's Gi statistic
9.1 The universal soil loss equation
11.1 Basic measures of network structure
11.2 Local measures of network structure
11.3 Global measures of network structure
12.1 Essential map items
13.1 ISO 19115 core elements
13.2 Standard UK metadata elements


List of boxes

2.1 GIS tasks and descriptions
5.1 Root-mean-square error
6.1 Interpolation using kriging in ArcGIS
7.1 Using R
7.2 Univariate statistics in R
8.1 Monte-Carlo simulation
8.2 Clark and Evans' nearest neighbour statistic
9.1 How to calculate slope and aspect
9.2 Parameters for the ANSWERS erosion model
10.1 Weighting Thiessen polygons
11.1 A method for locating confluences


Acknowledgements

Many people facilitated the writing of this book and we would like to take the opportunity to thank: Professors Peter Ucko and Stephen Shennan for their encouragement and advice; Dr Andrew Bevan for many discussions on landscape archaeology and GIS; Dr Cyprian Broodbank for his 'early-adopter' enthusiasm for spatial technologies; Dr Sue Colledge for her keen eye; Drs Andre Costopoulos and Andrew Gardner for their opinions on Chapter 1; Professor Yvonne Edwards for her helpful comments on a draft version of the manuscript; Sach Killam for his new user's perspective; Professor Clive Orton for guidance on all matters statistical; Dr Paddy Woodman for sharing her expertise; all of our students from the MSc programme in GIS and Spatial Analysis at the Institute of Archaeology (UCL), whose feedback on drafts of this book has been invaluable; Simon Whitmore at CUP for his patience; and the two anonymous reviewers whose constructive criticism greatly improved the text. Mark Lake's use of GIS has at various times been generously supported by the Natural Environment Research Council and the Leverhulme Trust. James Conolly would like to acknowledge the Department of Anthropology of the University of Auckland for welcoming him as a Visiting Research Fellow and, during the latter phases of writing, the support of the Social Sciences and Humanities Research Council of Canada (Canada Research Chairs Program).

1 Introduction and theoretical issues in archaeological GIS

1.1 About this book

The study of geographical information systems (GIS) has now matured to the point where non-specialists can take advantage of relatively user-friendly software to help them solve real archaeological problems. No longer is it the preserve of [...] who, in the eyes of cynics, chose their archaeological case studies solely to illustrate GIS solutions to problems. This is, of course, a good thing, because GIS has so much to offer archaeology. Nevertheless, the widespread adoption of GIS brings with it several attendant dangers. The most problematic is that modern GIS packages offer users a variety of powerful tools that are easily applied, without providing much guidance on their appropriateness for the data or questions at hand. For example, many current GIS software packages require just a few mouse clicks to create an elevation model from a set of contour lines, but none that we know of would warn that the application of this method to widely spaced contours is likely to produce highly unsatisfactory results that could lead to a host of interpretative errors down the line. Conversely, there is a risk that researchers who become over-dependent on the data management abilities of GIS may shy away from tackling more analytical questions simply because it is not immediately obvious which buttons to push. It is our ambition that no archaeologist who keeps this manual near his or her computer will make such mistakes, nor be hesitant about tackling the sorts of questions that can only be answered with some of the more advanced tools that GIS packages offer.

We have adopted an approach that is both practical, because we recognise that many readers will be looking to get a particular job done with a minimum of fuss, and rigorous, because we are equally well aware that poorly described short cuts usually turn out to be the most tortuous routes of all. Practical means that we have focused on the kinds of problems that are routinely faced by archaeological users of GIS, in both cultural resource management and research. It also means that we have tried to give the reader sufficient guidance to achieve all but the most complex tasks without having to consult a raft of supporting literature, apart perhaps from manuals or help files specific to the chosen GIS software. The latter may be required because we simply cannot provide instructions for every GIS software package, although we have provided some package-specific examples to provide concrete illustration of certain operations. Our approach is rigorous in that we always try to explain why as well as how. In our several years' experience of teaching GIS to archaeology students, this is the best way of ensuring the appropriate application of methods, while also empowering users to develop new applications as the need arises. Indeed, we hope above all else that this manual will inspire a problem-solving attitude to the archaeological use of GIS.

Table 1.1 The main types of question that can be answered using GIS

Type        Example question                                                                            Chapter
Location    What artefacts have been found along the proposed route of the new road?                    7 & 10
Condition   Where were Roman coins dating to the second century AD found?                               7
Trend       How does the density of primary debitage change as one moves away from the prehistoric     6 & 8
            hearth?
Routing     Does the medieval trackway follow the most energetically efficient route?                   [...]
Pattern     Are the burial cairns distributed uniformly across the landscape, or do they cluster on     7 & 9
            SE-facing slopes?
Modelling   Where would one expect to find more Mesolithic campsites?                                   8

Although we do not envisage many readers methodically working their way through this manual from start to finish, we have tried to maintain a logical progression such that topics are introduced in roughly the order that they might be encountered in the course of developing and using an archaeological GIS. This chapter considers some theoretical issues raised by the use of GIS. Readers who are new to GIS may find it helpful to return to it after reading Chapter 2, which introduces the basics, and Chapter 3, which illustrates the varied ways in which GIS can benefit archaeological projects. Chapters 4 and 5 are primarily concerned with the construction of a GIS and, in particular, the process of spatial data acquisition. Chapter 6 discusses a common next step in the construction of an archaeological GIS, which is the generation of continuous surfaces from point data: for example, an elevation model from spot heights. Chapters 7-11 describe the use of GIS for analysis, that is, answering the types of question listed in Table 1.1. Readers who are using GIS for cultural resource management (CRM) will probably find Chapters 7 and 10 most immediately relevant to their needs, although CRM applications can be found for many of the techniques in the other chapters. Research-orientated readers will probably want to read Chapter 7 and a selection of Chapters 8-11, depending on whether their interest is in spatial patterning (Chapter 8), derivatives of continuous surfaces such as slope and aspect (Chapter 9), the analysis of regions (Chapter 10) or the analysis of routes (Chapter 11). Chapter 12 describes methods for presenting the results of analyses for use by others, whether by traditional publication or delivery via the Internet. Chapter 13 provides advice for the maintenance of spatial data. We suspect that some readers will be tempted to skip this last chapter, but metadata are of vital importance, as the investment in creating a GIS will soon be wasted without them. Finally, we have provided an extensive glossary to aid readers who pick their own path through the book.

1.2 Theoretical issues

As just explained, our primary motivation in writing this manual is to promote the appropriate and creative use of GIS to tackle archaeological problems. In essence, we treat GIS as just one tool, albeit a very powerful one, among the many that can be deployed for archaeological purposes. However, there has been considerable debate about whether GIS is just a tool, or whether it is a 'science' in its own right (Wright et al. 1997). It is argued that this matters because if GIS is just a tool then its use may be construed as largely theory-neutral, but if it is a science then its use automatically brings with it a particular theoretical perspective, one that may or may not be welcome. The remainder of this chapter provides an introduction to some of the key issues in this debate.

1.2.1 The nature of space

Geographical information systems require two descriptors to describe the real world: attribute records what is present while location records where it is (Worboys 1995). As will become clear in Chapter 2, the location descriptor is what sets GIS apart from other database systems. More importantly here, it requires a concept of what space is and a means of describing it.

What is space?

Any kind of spatial analysis, whether formal or informal, is ultimately predicated on a concept of space. Western thought has been dominated by two main philosophical ideas about the nature of space, one of which views it as a container and the other as a relation between things.

The absolute concept views space as a container of all material objects, which exists independently of any objects that might fill it. The origin of the absolute concept of space may lie with the Greek atomist philosophers (Harvey 1969, p. 195), but in any case it assumed a particularly dominant position in Western thought during the Renaissance, as a result of the success of Newton's laws of motion, which require a fixed frame of reference for the measurement of movement. Kant subsequently developed the absolute concept of space as 'a kind of framework for things and events: something like a system of pigeonholes, or a filing system for observations' (Popper 1963, p. 179). He categorised geography as the study of all phenomena organised according to this 'filing system'; this view remained central to geography until at least the mid 1950s (Harvey 1969, Chapter 14).

In contrast, the relative concept views space as a positional quality of the world of material objects or events (Harvey 1969, p. 195), from which it follows that, unlike in the absolute concept, it is impossible to envisage space in the absence of things. Philosophers of science, reacting against Newton's identification of absolute space as God or one of God's attributes, came to favour the relative concept during the nineteenth century. Physicists, however, remained wedded to the absolute concept until the early twentieth century, when the General Theory of Relativity reduced their dependence on Newtonian mechanics. Theoretically inclined geographers


followed in the 1950s as they realised that many processes can only be understood if distance is measured in terms of cost, time or social interaction (Watson 1955), none of which can provide the invariant framework required for Kant's 'filing system'.

How can we describe space?

Just as spatial analysis is predicated on a concept of space, so it also requires a 'language' (Harvey 1969, p. 191) with which to describe the spatial distributions of objects and events in that space, and to discuss the processes responsible for such distributions. Formal spatial languages are known as geometries, and the two that are most immediately relevant to archaeological users of GIS are topology and Euclidean geometry. These and other geometries may be distinguished from one another because they are not equally capable of distinguishing the effects of particular transformations, such as stretching, enlarging or rotating.
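The idea that geometries differ in which transformations they can "see" can be made concrete with a few lines of code. The following is a minimal sketch, not taken from any GIS package: the helper names (`rotate`, `enlarge`, `stretch`, `distances_after`) and the three points are hypothetical, chosen only to show which measurements survive a rotation, a uniform enlargement and a one-axis stretch.

```python
import math

Point = tuple[float, float]


def rotate(p: Point, theta: float) -> Point:
    """Rotate a point about the origin by theta radians."""
    x, y = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def enlarge(p: Point, k: float) -> Point:
    """Enlarge (uniformly scale) a point about the origin by factor k."""
    return (p[0] * k, p[1] * k)


def stretch(p: Point, kx: float) -> Point:
    """Stretch along the x axis only, leaving y unchanged."""
    return (p[0] * kx, p[1])


# Three hypothetical points forming a right angle at A.
A, B, C = (0.0, 0.0), (3.0, 0.0), (0.0, 4.0)


def distances_after(transform) -> tuple[float, float]:
    """Return |AB| and |AC| after applying transform to every point."""
    a, b, c = (transform(p) for p in (A, B, C))
    return math.dist(a, b), math.dist(a, c)


results = {
    "original": distances_after(lambda p: p),
    "rotate": distances_after(lambda p: rotate(p, math.pi / 6)),
    "enlarge": distances_after(lambda p: enlarge(p, 2.0)),
    "stretch": distances_after(lambda p: stretch(p, 2.0)),
}

for name, (ab, ac) in results.items():
    print(f"{name:9s} |AB| = {ab:.2f}  |AC| = {ac:.2f}  ratio = {ab / ac:.2f}")
```

Rotation leaves both distances unchanged (3 and 4), enlargement doubles them (6 and 8) but keeps their ratio at 0.75, and the stretch gives 6 and 4, so even the ratio changes to 1.5. A geometry that only "knows" about distance ratios cannot tell the original from the enlargement, whereas one that measures distance can.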
Topology distinguishes spatial objects that should be considered different on account of the way in which they relate to their neighbours and, for that reason, it has a close affinity with the relative model of space. For example, suppose an excavation plan were drawn on a rubber sheet; topology is then concerned with those aspects of the recorded features that remain invariant when the sheet is stretched or knotted, but not cut or folded. These include stratigraphic relations such as 'contains' and 'abuts', but not the areas covered by different deposits. Indeed, one of the most notable features of topological geometries is that they do not allow one to measure distance or area. Nevertheless, the identification of explicit topological relations is often an important step in the construction of a GIS (Chapter 5), especially if it contains networks such as river or road systems (Chapter 11).

Euclidean geometry is the geometry that most of us are taught at school. Devised by Euclid around 300 BC, it is an example of a metric geometry, that is, one which includes the concept of distance between points such that the distance from point A to point B is the same as that from B to A. Euclidean geometry has long been associated with the absolute concept of space. Note also that the familiar Cartesian coordinate system (Chapter 2) is not actually an essential feature of Euclidean geometry (it is approximately 2000 years younger), but it is of course a very useful tool for analysing transformations in Euclidean space. Returning to the example of an excavation plan, Euclidean geometry allows one to measure the areas covered by different deposits as well as to state the stratigraphic relations between those deposits.
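The rubber-sheet example can be sketched in code: stretch a plan containing a layer and a pit cut into it, then ask a topological question (is the pit still inside the layer?) and a Euclidean one (what is the pit's area?). This is a minimal sketch under stated assumptions: the coordinates are hypothetical, the stretch is a simple linear one (a special case of rubber-sheeting), and the vertex-based containment test is only adequate because the layer is convex.

```python
Point = tuple[float, float]
Polygon = list[Point]


def edges(poly: Polygon):
    """Yield consecutive vertex pairs, closing the ring."""
    return zip(poly, poly[1:] + poly[:1])


def shoelace_area(poly: Polygon) -> float:
    """Polygon area by the shoelace formula: a metric (Euclidean) quantity."""
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges(poly))) / 2.0


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd ray-casting test for a point strictly inside a polygon."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in edges(poly):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contains(outer: Polygon, inner: Polygon) -> bool:
    """Containment via vertex tests; sufficient here only because outer is convex."""
    return all(point_in_polygon(p, outer) for p in inner)


def stretch(poly: Polygon, kx: float, ky: float) -> Polygon:
    """Stretch the 'rubber sheet' by kx along x and ky along y."""
    return [(x * kx, y * ky) for x, y in poly]


# Hypothetical plan: a rectangular layer with a pit cut into it.
layer = [(0.0, 0.0), (10.0, 0.0), (10.0, 6.0), (0.0, 6.0)]
pit = [(2.0, 2.0), (4.0, 2.0), (4.0, 4.0), (2.0, 4.0)]

results = {}
for label, (kx, ky) in (("as drawn", (1.0, 1.0)), ("stretched", (3.0, 0.5))):
    stretched_layer, stretched_pit = stretch(layer, kx, ky), stretch(pit, kx, ky)
    results[label] = (contains(stretched_layer, stretched_pit), shoelace_area(stretched_pit))
    print(f"{label:9s} pit inside layer: {results[label][0]}  pit area: {results[label][1]}")
```

The 'contains' relation holds in both versions, whereas the pit's area changes from 4 to 6 square units. Containment is therefore a topological property of the plan, while area is a Euclidean one that the stretch does not preserve.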

Since Euclidean geometry allows one to distinguish a larger number of transformations than topology, it may be considered more 'specific' (Klein 1939). In GIS terms, a more specific geometry supports a larger number of meaningful questions about the spatial relations in a database.

Space in GIS

As already noted, GIS describe the world in terms of attributes and locations. The two principal data models used in GIS to describe how these should be linked to some extent mirror the two philosophical concepts of space. The continuous field data model proposes a space over which some attribute varies, usually smoothly and continuously (Burrough and McDonnell 1998; Couclelis 1992). A concrete implementation, which provides a discrete


approximation of a continuous field, is a raster digital elevation model (DEM). As will be discussed further in Chapter 2, a raster DEM records height above sea level for a set of cells arranged in a regular grid, but the important point to note here is that one can query any cell in the grid and expect to retrieve an elevation value (or a NULL value in the case of missing data); in other words, every location has an attribute value (at least in principle). Since the continuous field model organises information by a set of predetermined locations in space, it can be considered to fit quite closely with the philosophical absolute concept of space.

The alternative entity data model proposes a set of entities which have a location and which are characterised by spatial and/or non-spatial attributes (Burrough and McDonnell 1998). A typical implementation of this model is a vector map of archaeological survey units, which records the extent of each unit as a closed polygon and associates with each a unique identifier and information such as the weight of potsherds recovered by surface collection (see Chapter 2). In contrast to the previous example, one would not expect to retrieve data about potsherds from locations other than those associated with survey units; in other words, some (possibly many) locations do not have attribute values. The entity model has some affinity to the philosophical relative concept of space, at least to the extent that it organises information by entity rather than by a set of predetermined locations in space. On the other hand, it is worth noting that in practice the locations of entities are given according to a fixed coordinate system that describes a space existing independently of those entities.

We have just suggested a close association between raster maps and the continuous field data model, and between vector maps and the entity data model.
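The practical difference between the two data models shows up in what a location query returns. The sketch below is illustrative only: the grid origin, cell size, heights, unit identifiers and sherd weights are all hypothetical, and for brevity the survey units are axis-aligned rectangles rather than general polygons. Inside the raster extent every location falls in some cell, so a value (or NULL, here `None`) is always defined; in the entity model, locations between survey units simply have no attributes.

```python
from typing import Optional

Point = tuple[float, float]

# Continuous field model: a raster DEM (hypothetical heights, metres above sea level).
ORIGIN = (0.0, 0.0)   # lower-left corner of the grid (assumed)
CELL = 10.0           # cell size in map units (assumed)
DEM: list[list[Optional[float]]] = [  # row 0 is the southernmost row
    [12.0, 13.5, 15.0],
    [11.0, None, 14.2],               # None stands in for a NULL (missing) cell
    [10.5, 11.8, 13.1],
]


def field_query(p: Point) -> Optional[float]:
    """Any location inside the grid falls in some cell, so a value (or NULL) is always defined."""
    col = int((p[0] - ORIGIN[0]) // CELL)
    row = int((p[1] - ORIGIN[1]) // CELL)
    if not (0 <= row < len(DEM) and 0 <= col < len(DEM[0])):
        raise ValueError("location lies outside the raster extent")
    return DEM[row][col]


# Entity model: survey units with an identifier and attributes. For brevity each
# unit is an axis-aligned rectangle (xmin, ymin, xmax, ymax), not a general polygon.
SURVEY_UNITS = {
    "SU1": ((0.0, 0.0, 10.0, 10.0), {"sherd_weight_g": 420.0}),
    "SU2": ((15.0, 5.0, 25.0, 20.0), {"sherd_weight_g": 95.0}),
}


def entity_query(p: Point) -> Optional[tuple[str, dict]]:
    """Return (id, attributes) of the unit containing p, or None where no entity exists."""
    x, y = p
    for uid, ((xmin, ymin, xmax, ymax), attrs) in SURVEY_UNITS.items():
        if xmin <= x < xmax and ymin <= y < ymax:
            return uid, attrs
    return None


print(field_query((5.0, 5.0)))     # a cell value
print(field_query((15.0, 15.0)))   # a NULL cell: the location exists but has no value
print(entity_query((5.0, 5.0)))    # falls inside SU1
print(entity_query((12.0, 2.0)))   # between units: no attribute values at all
```

The raster is organised by predetermined cells, so the query is pure arithmetic on coordinates. The vector version must search the entities to find one that owns the location, which mirrors the difference in how each model organises information.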
The association between raster maps and continuous fields, and between vector maps and entities, does indeed mirror standard practice in implementing the two data models. Nevertheless, it is important to be aware that raster and vector maps are themselves data structures rather than data models: it is in fact possible (although usually less convenient) to represent a continuous field as a vector map and a collection of entities as a raster map (e.g. see the discussion of triangulated irregular networks (TINs) in Chapter 6).

Introduction and theoretical issues

1.2.2 Space in archaeological GIS

Does it really matter much for archaeological purposes that there are different philosophies of space and that these are at least partially mirrored in the different models of space used in GIS? In our view it does, both for the use of GIS to record archaeological evidence and for its subsequent use to analyse that evidence in the hope of learning about the past. We consider each in turn.

Recording the evidence

Archaeologists routinely, although often implicitly, invoke particular concepts of space and particular geometries to record the spatial organisation of the evidence for past human activity. For instance, the single context recording system used on many deeply stratified urban excavations is essentially predicated on the relative concept of space and emphasises topological relations. Thus there may be few or no plans of a continuous surface showing the various stratigraphic units revealed at a particular stage in the excavation, since the primary concern is not to record what is present in each quadrat on the site grid, but rather the locations of and relationships between the individual stratigraphic units that provide evidence for past events. This, it may be argued, is better achieved by planning each unit separately and recording the relationships between units on a topological diagram: the Harris matrix (Harris 1979).

In contrast, a programme of field survey by surface collection is more likely to be predicated on the absolute concept of space and to emphasise relationships that require Euclidean geometry. For example, if the purpose is to locate settlements then the results might be recorded on a plan showing the number or weight of artefacts found in each quadrat of a survey grid, since the primary concern is to identify locations with particular attributes, in this case those with many artefacts. Furthermore, if for some reason it was not possible to lay down a regular survey grid then the results might be adjusted to take account of the area covered by each survey unit, something that clearly requires Euclidean geometry. The requirements of the two different archaeological problems just outlined will best be met using different models of space: the entity model in the first case and the continuous field model in the second, which in turn suggests the creation of vector and raster maps, respectively (but note the earlier caveat). This illustrates that concepts of space and geometries are important for the very practical business of recording the spatial organisation of archaeological evidence.

Learning about the past

As noted at the outset of this discussion, there has been much debate in geography about whether GIS constitutes a 'tool' or a 'science' (Wright et al. 1997) and what theoretical, and indeed ethical, baggage might accompany its use (e.g. Curry 1998; Sui 1994). Archaeologists have engaged in a similar debate (e.g. Wheatley 1993; Gaffney and van Leusen 1995; Gaffney et al. 1996; Thomas 1993, 2004; Witcher 1999), albeit with less explicit concern for ethics and for the feminist critique found in geography (see Kwan 2002). Roughly speaking, those who view GIS as a 'tool' take the view that it is potentially applicable to many kinds of learning, whether that is pursued through the inferential framework characteristic of the natural sciences or through other frameworks provided by, for example, humanist sociology. In contrast, those who view GIS as a 'science' tend to regard it as closely or even inextricably linked to the natural sciences model. Whatever the theoretical arguments about how GIS can and cannot be used, in practice it appears that the history of research-orientated¹ archaeological GIS recapitulates, over a greatly compressed timescale, the parent discipline's experimentation with different modes of learning and its changing emphasis on different facets of human behaviour (see Lake and Woodman 2003 for a more detailed treatment in the context of visibility studies).

¹ We use the term 'research-orientated' in this discussion to refer to studies whose purpose was to make sense of past human spatial organisation. The very earliest archaeological applications of recognisably modern GIS mostly involved the construction of predictive models (e.g. papers in Judge and Sebastian 1988), but the primary purpose of these was often to predict the presence of archaeological evidence without necessarily seeking to explain or understand it (see Chapter 8 in this book).

'Common-sense' narrative

Prior to the advent of the New Geography in the 1960s, most studies of the human use of space proceeded by descriptive synthesis, providing a narrative account of what happens where. The same reliance on description and narrative was broadly true of archaeology up until the late 1960s, especially in its treatment of space. Though both 'traditional' geography and 'traditional' archaeology had developed specific methodologies such as distribution mapping and, in the case of archaeology, seriation, neither was generally very explicit about its theoretical premises, nor about the inferential logic used to justify its claims. Curiously, the earliest research-orientated archaeological GIS studies generally mirrored the 'common-sense' approach of 'traditional' archaeology (Aldenderfer 1996), even though they were undertaken as recently as the late 1980s/early 1990s. For example, Gaffney and Stančič (1991, 1992) used GIS to establish that Roman towers on the Adriatic island of Hvar are intervisible and then suggested that the location of these towers may have been determined by the need for intervisibility. While it is quite possible that this suggestion is correct, the authors did not attempt to support it by, for example, demonstrating that intervisibility is unlikely to have occurred by chance alone and was not a byproduct of some other favourable attribute.

Scientific explanation

During the 1960s the New Geography (Holt-Jensen 1999), and subsequently the New Archaeology (Binford and Binford 1968; Clarke 1968; Binford 1989), adopted a positivist approach to their subject matter. It was hoped that the application of logical thought to observations of actual conditions could produce law-like statements about human behaviour.
Even though the initial enthusiasm for Hempel's hypothetico-deductive method (as championed by Fritz and Plog 1970) soon waned as archaeologists struggled to apply it in the context of a historical science, much archaeological research conducted since the 1970s has been conducted in processual vein, that is, broadly predicated on the assumption that the methods of the natural sciences can be used to explain the subject matter of the social sciences. This is manifest in a more rigorous approach to inference and a greater use of quantitative and especially statistical methods. A parallel development occurred in the early to mid 1990s as the use of GIS for archaeological research rapidly entered what one might term its post-pioneer phase. In 1993 Kvamme urged archaeologists to take an integrated approach to spatial statistics and GIS, having already noted how GIS might be combined with one-sample tests to examine association between site location and environmental parameters (Kvamme 1990c). In the same year van Leusen (1993, p. 120) performed a cluster analysis of the geomorphological properties of Palaeolithic/Mesolithic site viewsheds on the grounds that these would be expected to vary for sites that fulfilled different functions within the subsistence system. From then on there was a clear concern with increasing inferential rigour. Thus Wheatley (1995, 1996) used a one-sample Kolmogorov-Smirnov test to evaluate an explicit hypothesis about the intervisibility of sites. His work was subsequently further refined by Fisher et al. (1997), who emphasised that the mere existence of an association between human activity and one or more environmental variables does not in itself provide adequate evidence of a causal relationship. For example, they demonstrated how use of more-restricted control samples with coastal sites can help ascertain whether large viewsheds were deliberately located to have commanding views, or whether this was an unintended consequence of proximity to the sea.

Understanding, experience, symbolism and 'otherness'

The antipositivist, or humanist, critique of positivist social science found its way into geography in the 1970s (e.g. Tuan 1974) and was taken up in the development of post-processual archaeology during the 1980s (e.g. Hodder 1982, 1986; Shanks and Tilley 1987a). Since the mid 1990s European archaeological GIS practitioners have been particularly concerned that the use of GIS has, whether intentionally (Wheatley 1993, p. 133) or otherwise (Gaffney et al. 1996, p. 132), encouraged the continuation or even re-introduction of a positivist approach that had otherwise been rejected by post-processual archaeology. The following introduction to the use of GIS within a post-processual framework is organised according to three strands of post-processual thought, although we concede that in reality these are not so readily separable.

One strand concerns how we learn about the past. This constitutes a rejection of the notion that the methods of the natural sciences are appropriate for the study of social life, and with it the goal of scientific explanation. Instead, drawing on Idealist thought, post-processual archaeologists often propose that human action can only be understood by taking the perspective of those involved (Hodder 1986). This has been augmented with a phenomenological approach that emphasises the creation of experience through bodily engagement with the physical world (e.g. Tilley 1994; Thomas 1996). Thus Chris Tilley argues in his A Phenomenology of Landscape: Places, Paths and Monuments (1994, p. 10) that 'space cannot exist apart from the events and activities within which it is implicated'. From this perspective one of the major problems with traditional GIS analysis has been its association with the absolute model of space and the way in which, as a result, it is claimed to perpetuate Haraway's (1991, p. 189) 'God trick': by making everything visible it not only presents 'a picture of past landscapes which the inhabitant would hardly recognise' but also facilitates 'a kind of intellectual appropriation' (Thomas 1993, p. 25). It is increasingly argued that the way forward is to combine GIS with virtual reality so as to provide some kind of localised experience of past material conditions (see Gillings and Goodrick 1996; Pollard and Gillings 1998; Earl and Wheatley 2002; also Gillings 2005 for a critique). This approach represents a significant break with positivist models of inference, eschewing expert explanation based on the results of statistical tests in favour of multiple understandings, each potentially unique to a particular participant.

A second strand of post-processual thought concerns what aspects of the past we choose to study. The tendency of processual archaeology to focus most attention on the ecological and economic dimensions of human existence has been replaced by an emphasis on meaning and symbolism. Thus, for example, the spread of agriculture across Europe is treated in terms of the replacement of one system of meaning by another rather than the replacement of one mode of subsistence by another (Hodder 1990; Thomas 1991b). Most attempts to move beyond the alleged environmental determinism of earlier GIS applications have treated symbolic landscapes as primarily a product of intervisibility (e.g. Gaffney et al. 1996). This, however, risks replacing a determinism based on one suite of environmental variables with a determinism based on another. In response there have been three developments in archaeological GIS. One replaces dependence on the simple presence or absence of a line-of-sight with an attempt to model more complex aspects of visual perception (e.g. Wheatley 1993; Witcher 1999; Wheatley and Gillings 2000). A second development combines Gibsonian psychology with the calculation of many or even all possible views in an attempt to map landscape 'affordances' (Llobera 1996, 2001, 2003). Finally, there have also been a few attempts to model senses other than vision (e.g. Tschan et al. 2000; Mlekuž 2004).

The third strand of post-processual thought that we consider here is concerned with the 'otherness' of the past. This involves a recognition that the past might have been very different, in particular that past people might have had very different ways of thinking (Shanks and Tilley 1987b; Thomas 1991a) and, even more profoundly, that the very experience of being an individual might have been quite different from that with which we are familiar (Thomas 1996).
So far as the use of GIS is concerned, this perspective has contributed to the objection, already noted above, that GIS representations are built using models of space and spatial languages (such as the absolute model and Euclidean geometry) that are specific to Western thought. More fundamentally, Julian Thomas (2004, p. 201) argues that even if it is 'possible to develop a sensuous, experiential archaeology of place and landscape, which is sensitive to the relationality that renders things meaningful . . . it is questionable how far this process can be facilitated by a microprocessor'. At the root of his doubt is the well-known critique of the computational theory of mind (Dreyfus 1972; Searle 1992), which argues that traditional artificial intelligence and computational methods simply do not capture the real nature of thinking and knowledge. Archaeological users of GIS have made suggestions that may go some way to addressing the first of these critiques. For example, Zubrow (1994) 'warped' Euclidean space to investigate the fit between the observed and ideal distributions of Iroquois longhouses. In addition it has been argued (e.g. Wheatley 1993) that cost-surfaces (see Chapter 10) provide another way of representing non-Euclidean experience of distance. It may also be that object-orientated GIS (Tschan 1999) will help us model space as inextricably bound up in events and activities. In contrast, the second critique initially appears less tractable, as it questions the very use of computer methods. However, artificial intelligence researchers, including many specialists in sociological simulation, are actively moving beyond traditional computational theory of mind and tackling issues such as the social construction of emotions (Cañamero and de Velde 2000) and the idea that cognition is not somehow separate from engagement with the world (Maris and te Boekhorst 1996). We suspect that these developments will filter through to GIS, perhaps initially in conjunction with the use of agent-based simulation models (Lake 2004).

1.3 Conclusion

GIS has been described as 'the most powerful technological tool to be applied to archaeology since the invention of radiocarbon dating' (Westcott and Brandon 2000, back cover), but also as a technology without intellectual vigour, overly dependent on simple presuppositions about the importance of spatial patterns in a dehumanised artificial space (cf. Pickles 1999, pp. 50-52). Although there are elements of truth in both these perspectives, we believe that one of the greatest strengths of the use of GIS in archaeology is its diversity. In some cases simply organising our data more efficiently is enough to prompt new ideas about the past. In others, new insights require careful use of spatial statistics. In yet others it is necessary to construct new methods within the framework of conventional GIS. And, finally, we will surely learn even more as a result of the integration of GIS with virtual reality, agent-based simulation and ongoing developments in artificial intelligence. Ultimately, the key to success is to use GIS appropriately, which means remaining cognisant of the theoretical encumbrances inherent within it and having adequate technical command of the powerful and diverse possibilities it offers.

First principles

2.1 Introduction

The power of GIS, as with other computer programs, can be deceptive: visually impressive but ultimately meaningless results can appear unassailable because of the sophisticated technologies used to produce them (Eiteljorg 2000). The familiar adage 'garbage in, garbage out' is particularly applicable to GIS, and one of our primary aims throughout this book is to provide guidance on how to use this technology in ways that strengthen and extend our understanding of the human past, rather than obfuscate it. In this chapter we start by providing an overview of the 'first principles' of GIS: the software and hardware requirements, geodetic and cartographic principles, and GIS data models. These provide the conceptual building blocks that are essential for understanding what GIS is, how it works, and what its strengths and limitations are. Although some of these 'first principles' may be familiar to readers who are experienced in cartography and computer graphics, we nevertheless provide a thorough review of each, as they form the foundation on which we build in later chapters.

2.2 The basics

2.2.1 GIS functionality

What does a GIS do? Simply providing a definition of GIS and referring to its abilities to capture and manipulate spatial data doesn't provide much insight into its functionality. More informative is to break some of the basic tasks of a GIS into five groups: data acquisition, spatial data management, database management, data visualisation and spatial analysis. Some of the routine tasks performed under these headings are outlined in Fig. 2.1 and described in Box 2.1. While each of these tasks is important in itself, above all GIS should be considered as both an integrated and an integrating technology that provides a suite of tools that help people interact with and understand spatial information.
It is important to stress that although the origins of GIS are strongly rooted in digital cartography, GIS is not just about 'maps', nor is it necessarily only about the digital manipulation of the sorts of information and methods that are usually depicted on maps (cf. Longley et al. 1999). The use of GIS has a much broader contribution to make in terms of understanding spatial and even space-time relationships between natural and anthropogenic phenomena (Couclelis 1999). Indeed, it is increasingly common to make the distinction between the software tools used to process geospatial data (GIS), and a geographic information science ('GISc') that is concerned both with the more fundamental conceptual issues of spatial and space-time relationships and with the impact geospatial technologies are having within the humanities and social sciences (Marble 1990; Curry 1998; Forer and Unwin 1999; Johnston 1999; Longley et al. 2005).

Fig. 2.1 The five main groups of tasks performed by GIS (after Jones 199?, Fig. 1.2). Panels: Data acquisition; Spatial data management; Database management; Spatial data visualisation; Spatial data analysis.

2.2.2 Geographic information

There is considerable overlap between the aims of the disciplines of archaeology and geography, as both share an interest in exploring and interpreting the spatial structure and organisation of human societies at scales from the micro to the macro (e.g. Clarke 1977). We thus treat the term 'geographical' as an inclusive one that transcends the discipline of geography (Couclelis 1999). 'Geospatial information' (GI) can therefore be broadly defined as information about natural and anthropogenic phenomena and their relationships with each other. Geospatial information can also describe micro-scale phenomena, such as the patterning of erosion on the facades of historic buildings or the distribution of cut marks on bone (e.g. Marean et al. 2001; Abe et al. 2002). Most archaeological data, whether artefacts, ecofacts, features, buildings, sites or landscapes, have spatial and aspatial attributes that can be explored using GIS (Fig. 2.2). These attributes include:
A spatial location that tells us where the information is in either a local or global context. A location can be defined by a qualitative term such as 'in Texas' or 'next to the river', or quantitatively using map coordinates. A large component of social-science research uses qualitative positioning data, such as counties, cities, census districts or postcodes, as the unit of analysis. Qualitative locations are less commonly used in archaeology, although we may, at times, use locations such as parish, county or survey region. More frequently we use quantitative location data in the form of map coordinates. These include global geographic locational systems, with latitude and longitude being the most common, or national, regional or locally defined Cartesian metric coordinate systems.

A morphology that defines the shape and size of an object, such as 'straight' or '100 m²'. Qualitative or quantitative descriptors can be recorded as attribute data by, for example, recording the size of an archaeological site or the shape of a distribution. Alternatively, it is possible to record spatial morphology directly by mapping the size and shape of a phenomenon, such as an archaeological site, on a map. For certain analytical or visual purposes, morphology might be drawn directly on a map, such as the arrangement of a skeleton or the shape of a distribution of artefacts.

Box 2.1 GIS tasks and descriptions

The acquisition of spatial data: GIS is a software platform for the acquisition and integration of spatial datasets. Spatial data include, but are certainly not limited to, topographic maps, site locations and morphology, archaeological plans, artefact distributions, air photography, geophysical data and satellite imagery, all of which can be integrated into a common analytic environment.

Spatial data management: GIS uses sophisticated database management systems for the storage and retrieval of spatial data and their attributes. This might involve the transformation of map coordinate systems to enable data collected from different sources to be integrated, the building of vector topologies, the 'cleaning' of newly digitised spatial datasets, and the creation of geospatial metadata.

Database management: A major strength of GIS is that it provides an environment for linking and exploring relationships between spatial and non-spatial datasets. For example, given a database that contains information on the provenance of a sample of projectile points, and another database containing information on the morphology of the same points, they can be linked in such a way that it becomes possible to look for spatial patterns in the points' morphological variability. Database management, involving conceptual and logical data modelling, is thus an important part of GIS, as is database construction and maintenance to ensure that the spatial and aspatial components of a dataset are properly linked.

Spatial data analysis: GIS also provides the ability to undertake locational and spatial analysis of archaeological data, as well as tools for examining visibility (viewsheds) and movement (cost-surfaces) across landscapes. Much work in GIS involves the mathematical combination of spatial datasets in order to produce new data that may provide insight into natural and anthropogenic phenomena. These range from ecological models that provide predictions of soil suitability for agriculture or erosion potential, to predictive models of potential site location. Tools for geostatistical modelling of spatial data to create, for example, continuous surfaces from a set of discrete observations are also available. GIS can also be a route to the computer simulation of human behaviour and decision making in different types of environments.

Spatial data visualisation: GIS has powerful visualisation capabilities used for viewing spatial data in innovative ways (such as thematically or for 'fly-throughs' in three dimensions) that can suggest potential patterns and routes for further analysis. GIS also provides cartographic tools to help produce hard copy paper maps. Many GIS packages also facilitate the publication of interactive map data on the Internet.
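The linking of provenance and morphology data described under database management in Box 2.1 amounts, in relational terms, to a join on a shared identifier. The sketch below assumes two hypothetical tables, one holding find-spot coordinates for projectile points and one holding measurements of the same points, both keyed on an invented point_id; all field names and values are illustrative. It uses Python's built-in sqlite3 module rather than any particular GIS package, which would typically perform the same kind of join between a map layer's attribute table and an external database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical provenance table: where each projectile point was found.
cur.execute("CREATE TABLE provenance (point_id TEXT PRIMARY KEY, x REAL, y REAL)")
# Hypothetical morphology table: measurements taken on the same points.
cur.execute("CREATE TABLE morphology (point_id TEXT PRIMARY KEY, length_mm REAL, notched INTEGER)")

cur.executemany("INSERT INTO provenance VALUES (?, ?, ?)",
                [("P1", 512.0, 987.5), ("P2", 640.2, 1021.9), ("P3", 515.8, 990.1)])
cur.executemany("INSERT INTO morphology VALUES (?, ?, ?)",
                [("P1", 42.0, 1), ("P2", 58.5, 0), ("P3", 39.7, 1)])

# Join on the shared key so that a morphological trait can be examined by location:
# here, the find-spots and lengths of notched points only.
rows = cur.execute(
    """SELECT p.point_id, p.x, p.y, m.length_mm
       FROM provenance AS p
       JOIN morphology AS m ON p.point_id = m.point_id
       WHERE m.notched = 1
       ORDER BY p.point_id"""
).fetchall()

for row in rows:
    print(row)
conn.close()
```

The join only works if both tables use the same identifiers for the same objects, which is why the box stresses maintaining the link between spatial and aspatial components.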

2.2.3 Components of a GIS

A GIS is a computer-dependent technology. In addition to the computer itself, there are a number of other important components to a GIS. The most important ones are:
Software: In order to qualify as a GIS the software must have: (i) a spatial database that stores and manages spatial objects; (ii) some mechanism of linking attribute data to these spatial objects, either as an internal function of the GIS package, or by providing functions that enable access to external database systems; (iii) a 'geoprocessing engine', which permits the manipulation and analysis of the geospatial information stored in the spatial and attribute databases. None of the many GIS packages currently available performs all tasks equally well. The choice of software consequently needs to be made with respect to several factors, including the tasks it is needed for, what operating system it has to run under (e.g. UNIX or UNIX-like systems such as Mac OS X, Linux, IRIX or Solaris; or one of the versions of Microsoft Windows) and the size of the budget for software, hardware and training costs. A large number of packages are available (too many for us to attempt to review), each with their own strengths and weaknesses in terms of ease of use and the range of analytical tools they offer. An afternoon's research on the web will provide a reasonable grounding in the range of software options. If cost is a primary concern, it is worth knowing that one of the more powerful GIS packages, GRASS GIS,¹ is available free of charge under an open-source licence. Excellent comprehensive commercial GIS packages include Idrisi,² the ArcGIS suite of programs³ and MapInfo,⁴ and all may offer discounts for educational users.

Hardware: In addition to the computer that runs the software, which could range from a small palmtop computer to a large institutional mainframe, there are several other hardware components that are essential to making a GIS work. These can be divided into two groups.
The first consists of input devices, which might be limited to the keyboard and mouse supplied with the computer, but could extend to digitising tablets, flatbed and roll scanners, digital surveying equipment such as global positioning system (GPS) devices and Total Stations, or geophysical sensors. Chapter 5 discusses the various methods for acquiring digital data in some detail. The second group consists of the output devices needed for viewing and sharing information. A computer monitor is the basic piece of display hardware but, with the obvious exception of the WWW, it is not a very convenient device for distributing information to other people. Some type of printer, from standard letter devices to larger colour plotters, is needed for producing the maps, graphs and tables that GIS routinely produces. We review map production and spatial data communication in Chapter 12.

People: GIS operators are the most crucial part of the system, as they are responsible for the design and analysis of spatial datasets. A GIS is never a fully objective process (data and questions can rarely be simply 'fed' to a GIS and useful results returned), so it is essential that the specialists responsible for digitising, processing and analysing data are closely integrated with both project design and data collection. This is less of an issue when one researcher is conducting the project design, data collection and analysis, but in large research projects or commercial archaeological units it is important to ensure that GIS analysts are included at the earliest stages of the project design to prevent any disjuncture between the envisioned project aims and outcomes. A GIS will rarely contribute in any meaningful way if it is tacked on as an extra and handed to a 'GIS person' who has no real understanding of the original goals of the project in question.

¹ http://grass.itc.it
² www.clarklabs.org
³ www.esri.com
⁴ www.mapinfo.

2.3 Cartographic principles

2.3.1 Maps, digital cartography and GIS

A major element of GIS is the visualisation, management and analysis of spatial data presented in the form of digital maps. It is consequently important to emphasise that all maps, whether paper or digital, simplify the world and present an abstract model of spatial phenomena. Maps can be divided into two basic types:
Topographic maps provide general information about the physical surface of the Earth, including natural and human-made features like roads, rivers, settlements and elevation. These exist in a variety of different formats and scales, each suited to particular purposes. Navigational air charts, for instance, are compiled at a scale which is useful for pilots (1:500 000) and emphasise topography, settlements, restricted air-space and airports. In the UK, the Ordnance Survey produces a variety of different topographic maps (in both paper and digital formats) showing elevation, natural and cultural landscape features (including archaeological and historical sites and monuments), roads, towns and villages that are suitable for a range of different applications. The US Geological Survey produces equivalent maps for the USA and most countries have similar organisations (e.g. Canada's Centre for Topographic Information, and Geoscience Australia).

Thematic maps provide specific information about a single feature of the landscape or environment, or display information about a single subject. When the data values vary continuously through space it is common to display them on isarithmic maps, which use lines to connect points of constant numeric value, such as elevation (contours), temperature (isotherms), precipitation (isohyets) or even frequencies of hailstorms (isochalazes). Other themes are more likely to be displayed on choropleth maps, which use shading or symbols to display average values of information in different areas, such as vegetation, geology or numbers of artefacts collected in a survey unit.
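A choropleth map shades each area by a single summary value. As a minimal sketch for survey data, assuming hypothetical survey units of unequal size, one can compute each unit's planar area from its polygon with the shoelace formula, convert the artefact count into a density (which is what makes units of different sizes comparable), and assign each density to one of a fixed number of equal-interval classes that would then be drawn as shades. The unit names, coordinates and counts are invented, and equal intervals are only one possible classification scheme; quantiles or natural breaks are common alternatives.

```python
def polygon_area(poly):
    """Planar area of a simple polygon via the shoelace formula (map units squared)."""
    total = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def equal_interval_class(value, vmin, vmax, n_classes):
    """Return a class index 0..n_classes-1 for a value lying in [vmin, vmax]."""
    if vmax == vmin:
        return 0
    idx = int((value - vmin) / (vmax - vmin) * n_classes)
    return min(idx, n_classes - 1)   # the maximum value belongs to the top class

# Hypothetical survey units: polygon vertices in metres, artefact count from surface collection.
units = {
    "A": {"polygon": [(0, 0), (20, 0), (20, 20), (0, 20)], "count": 80},    # 400 m2
    "B": {"polygon": [(20, 0), (30, 0), (30, 20), (20, 20)], "count": 10},  # 200 m2
    "C": {"polygon": [(0, 20), (40, 20), (40, 30), (0, 30)], "count": 50},  # 400 m2
}

# Artefacts per square metre, then three equal-interval shading classes.
density = {uid: u["count"] / polygon_area(u["polygon"]) for uid, u in units.items()}
lo, hi = min(density.values()), max(density.values())
classes = {uid: equal_interval_class(d, lo, hi, 3) for uid, d in density.items()}
print(density)
print(classes)
```

Mapping raw counts instead of densities would tend to exaggerate units that are simply larger, which is why the area adjustment matters whenever survey units are irregular.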

To emphasise the differences between traditional paper maps and the dynamic interface that GIS offers, it is worth noting some of the constraints of the former (cf. Longley et al. 1999, p. 6). Paper maps differ from GIS because they are:
Static: The dynamic space-time interactions between objects cannot easily be depicted (e.g. changes in population and settlement patterns, or environmental change). A GIS offers the advantage of enabling exploration of the dynamics of temporal patterning. The University of Sydney Archaeological Computing Lab's TimeMap project⁵ is an excellent example of this form of dynamic mapping.

Two-dimensional: Multidimensionality cannot be easily depicted on paper. Multivariate spatial data and the three-dimensional representation of topography benefit from multidimensional forms of display available in GIS (e.g. Portugali and Sonis 1991; Couclelis 1999).

Flat: Representing a curved three-dimensional surface, such as the Earth, in two dimensions often introduces significant distortion in spatial measurements (see below), and GIS provides facilities for dealing with this.

Precise: The traditional methods of cartographic representation do not allow for the depiction of imprecise, 'fuzzy', boundaries (that occur between, for example, vegetational zones, cultural boundaries, etc.). While this remains a problem for some forms of spatial representation in GIS, there are more possibilities for working with less clearly defined boundaries than paper maps traditionally offer.

Difficult to update: Once committed to paper, a map is fixed and can only be updated by producing a new map, whereas a digital map may be updated continuously, even in real-time.

Difficult to relate to non-spatial data: The attributes of the objects on traditional maps have to be coded and further information can only be found by reference to a gazetteer. A GIS has several advantages over non-digital systems with regard to attribute data: in particular, a GIS offers more comprehensive data retrieval, ease of update and an ability to explore data patterns more quickly compared with its paper counterpart.

⁵ www.timemap.net

A further major advantage of a GIS over traditional mapping is that a GIS permits the organisation of different components of the same map into different thematic map layers (hence often referred to as thematic mapping), which is the basic way that spatial data are organised within a GIS environment. In practice this means that in one GIS digital display many different elements may be combined, each of which can be individually turned on or off, queried, modified, reclassified and edited. Many analytical functions, such as spatial queries, can operate across one or more layers depending on the needs of the GIS analyst. Map layers, or subsets of individual layers, can also be combined to produce new maps at will, providing potential insight into relationships between elements on different themes.

2.3.2 Map projection systems and geodetic datums

A basic property of a map is that it has a spatial context or, more properly, is georeferenced by implicitly or explicitly referring to positions on the Earth's surface. Obviously with many maps a precise and absolute spatial context is not important; a quick sketch of the route to a friend's house serves a purpose even though it may be inaccurate and relative. However, when precision and absolute spatial context are important, an explicit system of measurement is required. As the Earth is a complex shape this is not a trivial process, and the science of geodesy is concerned with the measurement of the morphology of the Earth's surface. The shape of the Earth is best approximated by a flattened sphere, referred to as either an ellipsoid or geoid, and positions on it can be defined using polar or geographical coordinates (Fig. 2.3). Geographical coordinate systems define degrees, minutes and seconds north or south of the equator as latitude, and degrees, minutes and seconds east or west from the Greenwich Prime Meridian as longitude.
This is an elegant and simple solution for locating positions on the planet. It is less suitable for representing the surface of the Earth on a two-dimensional plane, for example on a paper map or computer screen. The name given to a system used to display areas of the Earth's round surface on a flat map is map projection, which involves a mathematical transformation of the units of longitude and latitude (i.e. graticules) to a flat plane. Essentially, a flat map of a large area of the Earth's surface cannot be produced without some form of projection.

Fig. 2.3 Polar coordinates. The circle of the sphere in the x,y-plane is the equator, and in the x,z-plane it is the meridian. If p is an arbitrary point on the surface of the Earth, then the angle measured around the equatorial plane is longitude, and the angle measured up from the equatorial plane is latitude (after Worboys 1995, p. 1,13).

When mapping areas at the continental or international scale, the transformation from three to two dimensions causes profound distortion and spatial error in particular types of measurement. At national and regional scales or larger, the distortion arising from projection to a flat surface causes fewer problems, and national and state mapping agencies have established projections for minimising error within their own boundaries. For very small areas (i.e. at very large scales), what we might term subregional or local, the surface of the Earth can be regarded as flat, and grid systems can be established and used without reference to geodetic correction. Note here the use of the terms 'large scale' and 'small scale', as this can be a source of confusion. Large scale generally refers to scales of 1 : 50 000 or greater (e.g. 1 : 25 000, 1 : 5000, etc.), and small scale to maps with scales smaller than 1 : 50 000 (e.g. 1 : 100 000, 1 : 1 000 000, etc.; Thurston et al. 2003, p. 37).

Many forms of map projection have been developed for both global and national mapping purposes, and most GIS programs will support many or all of the common ones (GRASS, for example, supports some 123 different projections). Projection systems may be grouped into projection families, of which there are three main ones (conical, azimuthal and cylindrical), defined according to how the sphere is projected onto a flat surface (for a mathematical discussion see Iliffe 2000). Each projection family has either a line of tangency or two lines of secancy that define where the imagined projection surface comes into contact with the Earth, and where there is correspondingly the least distortion (Fig. 2.4). All projections will distort one or more of the parameters of distance, direction, scale, conformality (shape) and area, although each projection family attempts to minimise distortion in one or two parameters at the expense of increasing it in others.

Fig. 2.4 A conical projection with two lines of secancy (left) and one line of tangency (right). The point(s) of contact are also referred to as standard parallels.

In addition to the three projection families, there are four projection groups defined on the basis of how this distortion is managed. Conformal or orthomorphic projections preserve the 90° intersection of lines of latitude and longitude to ensure correct angle measurements between points, but in so doing distort area measurements. Equal-area projections preserve area calculations, so that the product of the two edges of rectilinear features represented on a map and on a globe will be identical (but the properties of shape, angle and scale are then distorted). Projections that maintain distances between one or more pairs of points are described as equidistant projections. Any given equidistant projection will only apply to measurements taken in a certain direction: sinusoidal equidistant projections, for example, enforce correct measurements parallel to the equator but distort measurements parallel to the meridian. True-direction projections maintain the correct angle from any line measured from the centre of the projection to any other point on the map.

A projection is defined by the combination of a family and then a projection type. For example, a conical projection can be conceptualised as fitting a cone over one of the polar regions, as depicted in Fig. 2.4, which is then cut along a meridian as in Fig. 2.5. The result is a map in which the lines of longitude are straight and convergent, and the lines of latitude are concentric arcs. The line of tangency on conic projections is referred to as the standard parallel, and distortion increases the further one moves away from this line.
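The large- versus small-scale convention described above is easy to reverse, so a small Python sketch may help. It assumes map measurements in centimetres and uses the 1 : 50 000 threshold quoted above; `ground_distance_m` and `scale_class` are illustrative names, not functions from any GIS package.

```python
LARGE_SCALE_LIMIT = 50_000  # 1 : 50 000 or greater counts as "large scale" (convention cited above)


def ground_distance_m(map_distance_cm, scale_denominator):
    """Real-world distance in metres for a distance measured on a 1 : denominator map."""
    return map_distance_cm * scale_denominator / 100.0  # 100 cm per metre


def scale_class(scale_denominator):
    """A smaller denominator means a larger scale (more detail, smaller area covered)."""
    return "large" if scale_denominator <= LARGE_SCALE_LIMIT else "small"


for denominator in (5_000, 25_000, 100_000, 1_000_000):
    print(f"1 : {denominator:,} -> 4 cm = {ground_distance_m(4, denominator):,.0f} m "
          f"({scale_class(denominator)} scale)")
```

With these assumptions, 4 cm on a 1 : 25 000 map corresponds to 1 km on the ground, while the same 4 cm on a 1 : 1 000 000 map spans 40 km.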
The amount of distortion can, however, be controlled by altering the spacing of the lines of latitude: if they are evenly spaced, the projection will be equidistant along the north-south axis (equidistant conic projection); if they are compressed at the northern and southern ends, the projection becomes equal-area (Albers equal-area conic projection).

Fig. 2.5 Albers equal-area conical projection with one line of tangency (left) and a meridian (dashed line). The resulting map is to the right, showing the lines of latitude as concentric arcs.

Azimuthal (or planar) projections represent the Earth's surface on a flat plane using a single point of contact rather than a line of tangency (Fig. 2.6). They are usually used to map the poles, although in theory they can be centred anywhere on the Earth's surface. If polar, then the projection is conformal, with concentric lines of latitude and radiating lines of longitude. Area distortion occurs as one moves away from the poles, but directions and linear distances from the centre point to any other point on the map are accurate.

Fig. 2.6 Azimuthal projection with a point of contact at the North Pole. The resulting map has radiating lines of longitude and concentric lines of latitude. Angle and distance measurements taken along the lines of longitude remain accurate.

Many cylindrical projections are conformal, and so 90° angles are maintained between the lines of latitude and longitude (Fig. 2.7). Measurements along the line of tangency are equidistant, but at further distances from this line area measurements become increasingly distorted. The most common cylindrical projection is the Mercator Projection, which uses the equator as its line of tangency and scales the y-dimension (latitude) to reduce the distortion at polar extremes. This projection gives a very misleading view of the world, as movement away from the equator causes areas towards the top and bottom of the map to become disproportionately large (Snyder and Voxland 1989, p. 10). The Transverse Mercator Projection (TM projection), invented by Johann Lambert (1728-1777), rotates the cylinder 90° so that a meridian becomes the line of tangency. This distorts measurements in the east-west axis but maintains north-south measurements better than the standard Mercator Projection. The TM Projection is one of the standard ways of mapping the globe.

Fig. 2.7 Cylindrical projection with a line of tangency corresponding to the equator and a meridian (dashed line). The resulting map is to the right, showing the lines of latitude as parallel lines.
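The areal distortion of the Mercator Projection can be made concrete with the standard spherical Mercator formulas. The sketch below assumes a spherical Earth of radius 6 371 km rather than an ellipsoid, and the function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: a spherical Earth with this mean radius


def mercator_xy(lon_deg, lat_deg, radius_m=EARTH_RADIUS_M):
    """Spherical Mercator: x is proportional to longitude, y is stretched towards the poles."""
    if abs(lat_deg) >= 90.0:
        raise ValueError("the poles lie at infinity on a Mercator map")
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    return radius_m * lam, radius_m * math.log(math.tan(math.pi / 4.0 + phi / 2.0))


def area_exaggeration(lat_deg):
    """Mercator's linear scale factor is sec(latitude) in every direction,
    so areas are inflated by its square."""
    k = 1.0 / math.cos(math.radians(lat_deg))
    return k * k


for lat in (0, 30, 60, 75):
    print(f"{lat:2d} deg latitude: areas drawn {area_exaggeration(lat):.1f} x too large")
```

Under these assumptions areas at 60° of latitude are drawn four times too large, and at 75° almost fifteen times, which is why high-latitude landmasses look so inflated on a Mercator world map.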

Finally, the Universal Transverse Mercator Projection (UTM) is a twentieth-century modification of the TM Projection that divides the world into 60 vertical zones, each of which is 6° of longitude wide. There is a central meridian in each of these 60 zones that minimises measurement distortion in the east-west direction to approximately 1 m in every 2500 m (Robinson et al. 1995; DeMers 1997, pp. 63-64). Each zone is divided into rows of 8° of latitude (12° in the northernmost row), and each zone is further subdivided into 100 000-m grid squares. The central meridian is given a false easting value of 500 000 m to eliminate the need for negative numbers when specifying east-west coordinates. For the same reason the equator is given a northing value of 0 m for measurements in the northern hemisphere, and 10 000 000 m for measurements in the southern hemisphere. Universal Transverse Mercator coordinates are given by first specifying the zone and then the easting (with 6 digits for 1 m precision) and northing (with 7 digits for 1 m precision). The UTM projection is very popular in GIS and related geospatial technologies such as remote sensing because of its global application, minimal distortion and metric coordinate system. Most GPS receivers are able to record locations in UTM coordinates, making it an ideal system for spatial data collection when a local grid system is not available.

In addition to the projection system used to make the map, it is also important to be aware of which mathematical approximation of the shape of the Earth was used for the construction of a map. The Earth is not an exact ellipsoid, since the surface is not smooth and the poles are not equidistant from the equator. Polar coordinates of latitude and longitude are therefore calculated using a mathematical approximation of the Earth's shape and its centre. Several different approximations have been calculated, often for a specific region of the planet. Clarke's 1866 calculations formed the basis for the 1927 datum of North America (North American Datum 27, or NAD27).
NAD27 is being replaced by NAD83, a datum based on satellite-derived measurements of the ellipsoid, but many organisations still use measurements and locations based on the earlier geodetic datum. The Geodetic Reference System
(GRS80), World Geodetic System 84 (WGS84) and European Terrestrial Reference System (ETRS89) are more recent recalculations of the ellipsoid used in Europe. National mapping agencies generally use whichever ellipsoid calculations most closely fit their needs. For example, most national mapping in Great Britain uses the OSGB36 Datum, based on the 1830 Airy ellipsoid, although the Ordnance Survey have adopted ETRS89 for more recent mapping derived from GPS receivers.

Coordinate transformation and reprojection

Maps that share a projection system (e.g. UTM) but are based on different ellipsoids (e.g. WGS84 versus NAD27) are not compatible, nor are maps that use different projection systems (e.g. Transverse Mercator versus State Plane) but share the same ellipsoid (e.g. NAD27). For example, two points that have identical geographical coordinates, one based on NAD27 and the other on NAD83, can be as much as 100 m apart in the USA. For data from multiple map sources to be combined, the maps must share a common projection and geodetic reference system. If this is not the case, the projections and/or reference systems must be altered through a process called secondary transformation or reprojection. To compute the transformation, fairly specific information is required about the existing and desired projection systems. Most GIS packages provide tools to transform maps from one geodetic datum to another, and dedicated software tools are also available to help convert between NAD27 and NAD83 (see, for example, the directory of software on the US National Geodetic Survey's website6 and their online conversion tool7). Details about the ellipsoid and projection system used in the construction of a map are typically printed in the corner of the map or are contained in an associated metadata record for digital data (see Chapter 13 for further discussion of metadata elements).
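Two of the ideas above lend themselves to a short sketch. The first part computes only the bookkeeping of the UTM grid described earlier (zone number, central meridian, 8° latitude row and southern false northing); turning a position into actual eastings and northings needs the full Transverse Mercator formulas, which are not shown. The second part shows why coordinates are meaningless without their ellipsoid: it converts the same latitude and longitude into Earth-centred x, y, z positions on two different ellipsoids. It assumes both ellipsoids share the same centre and orientation, so it is not a datum transformation (real NAD27-to-NAD83 or OSGB36-to-ETRS89 conversions also model origin shifts, rotations or grid-based corrections). The ellipsoid constants are commonly published values and should be checked against an authoritative registry before reuse; all function names are hypothetical.

```python
import math

# --- UTM zone bookkeeping (zone maths only, not the Transverse Mercator formulas) ---
FALSE_EASTING_M = 500_000.0            # assigned to every zone's central meridian
SOUTH_FALSE_NORTHING_M = 10_000_000.0  # assigned to the equator for southern northings
BAND_LETTERS = "CDEFGHJKLMNPQRSTUVWX"  # 8-degree rows from 80 S; I and O are skipped


def utm_zone(lon_deg):
    """Standard 6-degree zone number (1-60); ignores the Norway/Svalbard exceptions."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude out of range")
    return min(int((lon_deg + 180.0) // 6.0) + 1, 60)


def central_meridian(zone):
    return zone * 6 - 183


def latitude_band(lat_deg):
    """Row letter; the last row (X) is stretched to 12 degrees, 72-84 N."""
    if not -80.0 <= lat_deg <= 84.0:
        raise ValueError("UTM rows cover 80 S to 84 N")
    return BAND_LETTERS[min(int((lat_deg + 80.0) // 8.0), len(BAND_LETTERS) - 1)]


def false_northing(lat_deg):
    return 0.0 if lat_deg >= 0.0 else SOUTH_FALSE_NORTHING_M


# --- The same latitude/longitude on different ellipsoids ---
# Semi-major axis a (m) and inverse flattening 1/f.
ELLIPSOIDS = {
    "Airy 1830": (6377563.396, 299.3249646),
    "Clarke 1866": (6378206.4, 294.9786982),
    "GRS80": (6378137.0, 298.257222101),
}


def geodetic_to_ecef(lat_deg, lon_deg, ellipsoid, height_m=0.0):
    """Earth-centred x, y, z (m) of a latitude/longitude on the named ellipsoid."""
    a, inv_f = ELLIPSOIDS[ellipsoid]
    f = 1.0 / inv_f
    e2 = f * (2.0 - f)                                # first eccentricity squared
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    n = a / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)  # prime-vertical radius of curvature
    x = (n + height_m) * math.cos(phi) * math.cos(lam)
    y = (n + height_m) * math.cos(phi) * math.sin(lam)
    z = (n * (1.0 - e2) + height_m) * math.sin(phi)
    return x, y, z


def ellipsoid_gap_m(lat_deg, lon_deg, first, second):
    """Distance between the points the same coordinates denote on two ellipsoids whose
    centres are assumed to coincide (real datums also differ in origin and orientation)."""
    return math.dist(geodetic_to_ecef(lat_deg, lon_deg, first),
                     geodetic_to_ecef(lat_deg, lon_deg, second))


print(utm_zone(-0.13), latitude_band(51.5), central_meridian(30))  # 30 U -3
print(f"{ellipsoid_gap_m(51.5, -1.0, 'Airy 1830', 'GRS80'):.0f} m")
```

The second print reports a gap of several hundred metres. That is not a realistic datum offset, because real datums also differ in where their centres lie; the point is only that identical numbers denote different places once the ellipsoid changes.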
6 www.ngs.noaa.gov.
7 www.ngs.noaa.gov/cgi-bin/nadcon.prl.

2.3.3 National and regional grid systems

Many GIS programs use geographic coordinates of latitude and longitude as the basis for regional maps (most often as decimal degrees, where minutes and seconds are converted to decimal units so that, for example, 30 minutes 30 seconds is equal to 0.508 of a degree). While decimal degree systems can work well in GIS packages that are able to manage the corrections for spatial measurement, a Cartesian system based on metric units, such as UTM or a national (military) grid system, is often a better choice because of the advantages it offers for calculating distances and areas. When a two-dimensional Cartesian grid system is used for mapping, east-west measurements are located on the horizontal x-axis and called eastings, and north-south measurements are located on the vertical y-axis and called northings (Fig. 2.8). At larger scales (e.g. greater than 1 : 1 000 000), most national or regional mapping systems provide metric planar coordinates alongside, or in place of, latitude and longitude. Metric planar coordinates are used in the global UTM projection, the US State Plane system, the British National Grid and most other national grids. National grid systems, such as the US State Plane system, are often better choices for regional mapping projects because the ellipsoid is often selected to maximise spatial accuracy for the specific area covered by that particular system. In parts of the world where national or military grids are unavailable, UTM is an excellent choice. We must emphasise again our warning from the previous section regarding the inevitable and significant spatial errors that will result from combining data derived from maps with different projections and/or ellipsoids.

Fig. 2.8 A two-dimensional Cartesian coordinate system. Point p is located by reference to its distance from a 0,0 datum in the x- and y-planes (respectively referred to as the 'eastings' and 'northings'). [Figure: point p at (67, 31); x-axis: eastings; y-axis: northings.]

Fig. 2.9 To calculate the linear distance c between points p and q, Pythagoras' theorem is used: $c = \sqrt{a^2 + b^2}$. As a and b are known ($a = x_p - x_q = 33$, $b = y_p - y_q = 13$), the calculation is $c = \sqrt{33^2 + 13^2} = 35.5$. [Figure: points p (67, 31) and q (34, 18); x-axis: eastings.]

Metric planar systems have the crucial advantage of allowing the easy calculation of distance and area. For example, linear distance measurements can be calculated using Pythagoras' theorem (Fig. 2.9); polygon areas can be
calculated using a system that first breaks the shape into smaller trapezia and then sums their individual areas to derive the total area; and the geometric centre of a polygon (its centroid) can be found by taking the mean of the coordinates of all vertices that define the polygon (for alternatives, see Jones 1991, p. 66; Burrough and McDonnell 1998, p. 63).

2.4 Data models and data structures: the digital representation of spatial phenomena

How does a GIS represent spatial data? The roots of GIS lie in the development of automated mapping in the middle of the last century. In the late 1950s some of the basic computer algorithms for handling geographic information were developed, including the principles for digital cartography, at about the same time that technology had developed to incorporate computer graphics (e.g. Tobler 1959). The Canada Geographic Information System, developed in 1963 to manage natural resources, was a natural outcome of these developments and qualifies as the first GIS. It was followed soon after by the development of other systems that were capable of automated mapping (Foresman 1998; Tomlinson 1998). Automated mapping offered considerable time savings over traditional paper methods by providing faster and more accurate facilities for the management and updating of spatial data. These early systems relied on point, line and polygon 'geographic primitives', which still form the building blocks of modern vector-based GIS. A GIS works by manipulating the digital representations of real-world entities. However, a GIS only has a finite set of resources with which to replicate the infinitely complex world and, as a consequence, the digital representations used by GIS are necessarily schematic and generalised.
The representation of elements of reality in this way is referred to as a data model. In GIS, data models tend to be very simple representations of reality, although, as we shall see in later chapters, simple models may become the building blocks for more complex models that are designed to quantify relationships between different entities. As we saw in Chapter 1, GIS represent spatial data using one or both of the entity and continuous field data models. These are usually implemented as vector and raster data structures, respectively. Raster and vector data structures store, represent and manipulate spatial information in very different ways. Certain types of entities are more typically represented in one format or the other, although much of the data that archaeologists routinely encounter can ultimately be represented using either structure. Until recently, these two data structures were virtually mutually exclusive: GIS programs tended to rely on either one or the other, forcing users to decide which they would use. Today most GIS permit the mixing of both raster and vector data as separate thematic map layers, giving users the freedom to decide on the most appropriate structure without necessarily using a different program. The following sections outline the differences between the vector and raster data structures, and provide some examples of ways that different sorts of data are handled by each format.


Fig. 2.10 The three vector 'geographicprimitives' of points, lines and polygons.
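The planar calculations described at the end of Section 2.3.3 operate directly on these primitives, so they can be sketched in a few lines of Python. The sketch assumes projected (metric) coordinates, polygons stored as lists of (x, y) vertices without a repeated closing vertex, and illustrative function names; `area_centroid` is one common alternative to the vertex-mean centroid, not necessarily the method the cited authors describe.

```python
import math


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Degrees, minutes and seconds to decimal degrees; S and W become negative."""
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere.upper() in ("S", "W") else value


def distance(p, q):
    """Pythagorean distance between two (easting, northing) points (Fig. 2.9)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def _edges(ring):
    """Pair each vertex with the next one, wrapping from the last back to the first."""
    return zip(ring, ring[1:] + ring[:1])


def trapezium_area(ring):
    """Each edge and the x-axis bound a trapezium; summing their signed areas cancels
    the parts lying outside the polygon and leaves the polygon's own area."""
    return abs(sum((x2 - x1) * (y1 + y2) / 2.0 for (x1, y1), (x2, y2) in _edges(ring)))


def vertex_mean_centroid(ring):
    """Centroid as the mean of the vertex coordinates (the simple method in the text)."""
    n = len(ring)
    return sum(x for x, _ in ring) / n, sum(y for _, y in ring) / n


def area_centroid(ring):
    """Area-weighted centroid: unaffected by how densely the boundary was digitised."""
    twice_area = cx = cy = 0.0
    for (x1, y1), (x2, y2) in _edges(ring):
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


print(round(dms_to_decimal(0, 30, 30), 3))    # 0.508
print(round(distance((67, 31), (34, 18)), 1))  # 35.5, as in Fig. 2.9
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
dense = square + [(0, 9), (0, 8)]  # same square, two extra vertices digitised on its west edge
print(trapezium_area(dense), area_centroid(dense), vertex_mean_centroid(dense))
```

In the last line the trapezium area and the area-weighted centroid are unchanged by the two extra vertices, while the vertex mean is pulled west, which is one reason to consider the alternatives the text points to.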

2.4.1 The vector data structure

A vector is a mathematical term that refers to one or more coordinates used to define an object in Cartesian space. In the vector structure, real-world entities are represented using one of three geometrical primitives: points, lines or polygons. Each primitive is defined using one or more x,y-coordinate pairs called vertices, and primitives are thus described as discrete objects because of their precisely defined locations and boundaries (Fig. 2.10). Vertices that are located at the ends of discrete lines, or at their intersections, are called nodes. Points are zero-dimensional objects (for they have no length or breadth) defined by a single coordinate pair, and lines and polylines (often also referred to as arcs or edges) are one-dimensional vectors (having the property of length, but not breadth) defined by two or more coordinate pairs. Polygons, or areas, are two-dimensional objects defined by three or more coordinate pairs. Three-dimensional objects are referred to as volumes, but despite the fact that CAD systems routinely use three-dimensional vector objects, the three-dimensional vector structure has not yet been widely implemented in GIS.

The discrete nature of every vector object means that, in addition to possessing its own unique spatial location and morphology, it is a trivial process to provide each vector object with an identification (id) number. On the basis of this unique identifier, each and every object can be linked to a set of non-spatial attributes that describe additional properties of that object. These properties most often consist of real-world quantitative and/or qualitative variables that give the vector object meaning within the GIS (Fig. 2.11).

Fig. 2.11 Vector objects linked to attribute data. In this example, each polygon has a unique id number that links it directly to an attribute table that defines the soil type represented by that polygon. [Figure panels: 'Vector objects' and 'Attribute data'.]

Vector topology

An extremely important concept that underlies the vector structure is the set of geometrical relationships between vector objects, referred to as topology. The analysis of topological relationships is explored more fully in Chapter 11, so here it is sufficient to note a few basic concepts. Firstly, topological relationships define the connections and relationships between vector objects rather than their spatial location. For example, when two roads cross each other, two different topological relationships can potentially exist between those entities. If the lines simply cross without sharing a node, the lines are not topologically connected. This is equivalent to a road crossing another via an underpass, and it is not possible to get from one road to the other at the point of intersection. If the roads do share a node, they are topologically linked; this is equivalent to the two roads meeting at an intersection. Topological relationships are therefore defined by the presence of shared nodes between vector objects. In practice, many GIS require nodes at both crossing and meeting points, in which case additional methods must be used to provide adequate topological information (see Chapter 11). Topological relationships also define how polygons relate to each other. For example, two adjacent polygons, perhaps representing separate parcels of land or survey zones, are topologically related if they share one or more nodes or arcs in common (Fig. 2.12). Without common nodes this relationship does not exist, and the polygons must then either overlap and/or have a gap between them. It is entirely possible that they intentionally overlap or have a gap to reflect a real-world spatial relationship, but more usually adjacent polygons have an assumed, if not actual, topological relationship. The calculation of spatial relationships and properties of vector objects is not a trivial process, and is dependent both on the data structure and on the accuracy of the dataset. During the data collection phase, and particularly during the process of digitising vector objects, care should be taken to ensure that topological relationships are properly maintained and defined. Many vector GIS programs have 'clean-up' routines that can be used to create topologies between objects automatically (Chapter 5).
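The distinction between geometric crossing and topological connection, and the arc sharing shown in Fig. 2.12, can be sketched as follows. This is a minimal illustration under stated assumptions: any shared vertex is treated as a node, touching or collinear segments are not counted as crossings, and the coordinates given to nodes c, m, j and g are invented for the example rather than taken from the figure.

```python
def _orient(p, q, r):
    """Twice the signed area of triangle pqr: > 0 left turn, < 0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a1, a2, b1, b2):
    """True if segment a1-a2 properly crosses segment b1-b2."""
    return (_orient(b1, b2, a1) * _orient(b1, b2, a2) < 0
            and _orient(a1, a2, b1) * _orient(a1, a2, b2) < 0)


def line_relationship(line_a, line_b):
    """Classify two polylines as 'connected', 'crossing' (no shared node) or 'disjoint'."""
    if set(line_a) & set(line_b):
        return "connected"                 # a shared node: a road junction
    for a1, a2 in zip(line_a, line_a[1:]):
        for b1, b2 in zip(line_b, line_b[1:]):
            if segments_cross(a1, a2, b1, b2):
                return "crossing"          # geometry only: an underpass
    return "disjoint"


# Hypothetical arc-node layout loosely following Fig. 2.12: node m is shared by all polygons.
c, m, j, g = (5, 10), (5, 4), (10, 0), (0, 0)
ARCS = {
    "cm": [c, m], "mj": [m, j], "mg": [m, g],                     # shared internal arcs
    "gc": [g, (0, 10), c], "cj": [c, (10, 10), j], "jg": [j, g],  # outer boundary arcs
}
POLYGONS = {1: ["gc", "cm", "mg"], 2: ["cj", "mj", "cm"], 3: ["jg", "mg", "mj"]}


def adjacent(p, q):
    """Two polygons are topologically adjacent if they share at least one arc."""
    return bool(set(POLYGONS[p]) & set(POLYGONS[q]))


def stored_vertices(shared):
    """Vertices stored if each arc is kept once (shared=True) or copied into every polygon."""
    if shared:
        return sum(len(vertices) for vertices in ARCS.values())
    return sum(len(ARCS[arc]) for arcs in POLYGONS.values() for arc in arcs)


print(line_relationship([(0, 5), (10, 5)], [(5, 0), (5, 10)]))                  # crossing
print(line_relationship([(0, 5), (5, 5), (10, 5)], [(5, 0), (5, 5), (5, 10)]))  # connected
print(adjacent(1, 2), stored_vertices(True), stored_vertices(False))            # True 14 20
```

The first two calls reproduce the underpass and the intersection described above. The last line shows that polygons 1 and 2 are adjacent because they share arc cm, and that storing each arc once needs 14 vertices against 20 when every polygon keeps its own copy of its boundary.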
Some geodatabases, such as those used by ArcGIS, provide a set of topological rules to ensure that vector objects are always related in appropriate ways. For example, polygons that define survey areas might have a 'Must Not Overlap' rule, so that any instances where this occurs are identified and the appropriate action taken (e.g. the overlapping area is subtracted from one polygon, or a new polygon defined by the overlapping area is created). Topological accuracy also makes for more efficient storage of vector data, as vector objects can then share data. Some GIS take advantage of this when storing geometric definitions by recording an arc (and its vertices) only once, and then defining its relationship to polygons. In Fig. 2.12, for example, arcs cm, mj and mg need only be stored once, instead of twice (once for each of the polygons they bound). On large, complex polygonal maps, such as those routinely encountered with soil or geological series boundaries, this can result in a significant saving of storage space and computational time, an issue examined in further detail in Chapter 4.

Fig. 2.12 Three topologically related polygons. Polygons 1, 2 and 3 share arcs (edges) defined by nodes cm, mj and mg.

2.4.2 The raster data structure

Unlike vector graphics, which use coordinate geometry to define the spatial parameters of objects, raster graphics use a grid matrix of equally sized cells or pixels to represent spatial data (Fig. 2.13). Raster maps are therefore defined only by the number of rows and columns in the grid and the size of each pixel in terms of the actual area covered. Each cell also has a value associated with it that represents the attribute status of the object at that location. In a digital elevation model (DEM), for example, each cell has a quantitative value that signifies the mean elevation across the area defined by that pixel, whereas the pixels in a vegetation map may be coded to reflect modal vegetation type. The reliance on pixels and the use of a single attribute per pixel may appear to be very limiting in comparison to the vector structure, but within the simplicity of the raster structure lies its strength. Raster datasets are easily combined and mathematically manipulated, as computers can process and display raster data considerably more quickly than vector data because of the efficiency with which they can store and handle grid data. The simplicity of the structure does not reduce its functionality for, as we shall see in the next chapter, the raster data structure can be used to model some extremely complex spatial phenomena.

Fig. 2.13 Point, line and polygon primitives as represented on a raster grid.

A critical variable in the raster structure is the size of the cells, since they define the resolution of a map by providing a minimum unit of representation. Whereas a raster map depicting the density of archaeological sites across a large region may consist of a grid of pixels that each represent an area of a square kilometre or more, a raster map showing the density of artefacts across a site may use pixels that each cover a square metre or less. Although computer processing speed and storage space are continuously increasing, there are nevertheless some practical restrictions on how much information a typical desktop computer can efficiently process. This, of course, varies with the specification of the computer, but loss of performance may be noticed when the total number of pixels is several million or more (e.g. a raster map representing an area of 50 × 50 km with individual pixels of 10 × 10 m will require a grid of 5000 columns by 5000 rows, and therefore storage of information for 25 000 000 pixels). Decisions relating to resolution need to be made very early on in the model-building process. In the next chapter we show that such decisions can have important consequences for the interpretation of the results of analysis and for making sense of spatial patterns.
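The cell-count arithmetic above, the conversion of a vector primitive to cells (Fig. 2.13) and the ease of combining rasters can all be sketched with plain nested lists. The assumptions are: square cells; a cell counts as inside a polygon when its centre falls within it (one of several possible rules); and made-up scores and weights for the combination example. None of the function names come from a particular GIS.

```python
import math


def grid_shape(width_m, height_m, cell_m):
    """Rows, columns and total cells needed to cover an extent at a given cell size."""
    cols = math.ceil(width_m / cell_m)
    rows = math.ceil(height_m / cell_m)
    return rows, cols, rows * cols


def point_in_polygon(x, y, ring):
    """Even-odd rule: count how many polygon edges a ray running east from (x, y) crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(ring, xmin, ymin, rows, cols, cell_m):
    """1 where a cell's centre lies inside the polygon, else 0; row 0 is the top (north) row."""
    return [[1 if point_in_polygon(xmin + (c + 0.5) * cell_m,
                                   ymin + (rows - r - 0.5) * cell_m, ring) else 0
             for c in range(cols)]
            for r in range(rows)]


def local_operation(func, *rasters):
    """Cell-by-cell combination of equally sized rasters (a map-algebra style local operation)."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    if any(len(r) != rows or len(r[0]) != cols for r in rasters):
        raise ValueError("rasters must have the same number of rows and columns")
    return [[func(*(r[i][j] for r in rasters)) for j in range(cols)] for i in range(rows)]


print(grid_shape(50_000, 50_000, 10))  # (5000, 5000, 25000000), the 50 x 50 km example

# A hypothetical triangular polygon on a 5 x 5 grid of 1 m cells.
triangle = [(0.1, 0.1), (4.5, 0.1), (0.1, 4.5)]
for row in rasterise(triangle, 0.0, 0.0, rows=5, cols=5, cell_m=1.0):
    print(row)

# Hypothetical suitability scores combined with made-up weights of 3 and 2.
slope_score = [[2, 2, 1], [1, 0, 0]]
soil_score = [[1, 2, 2], [2, 1, 0]]
print(local_operation(lambda s, t: 3 * s + 2 * t, slope_score, soil_score))  # [[8, 10, 7], [7, 2, 0]]
```

The sloping edge of the triangle comes out as a staircase of cells; halving the cell size would smooth it at the cost of four times as many cells.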
2.4.3 Choosing a data structure

There are many instances where vector systems provide the only sensible means of answering a specific set of questions, or of handling the sorts of multiscalar and precisely defined data that one might be interested in exploring. On the other hand, raster data are suitable for powerful spatial modelling and can represent continuous datasets more smoothly, and raster GIS provide image analysis and classification routines suitable for aerial photography and satellite imagery. Many modern GIS systems
are to a large extent 'hybrid' and offer capabilities for manipulating both raster and vector datasets. As a result it is now common to find both data structures being used in a single program environment, thus reducing the choice between the raster and vector structures to that of appropriateness for the particular needs and questions at hand. From the philosophical perspective adopted in Chapter 1, the vector structure is generally most appropriate when the subject matter has been conceived using the entity model of space. Conversely, the raster structure is usually a better choice if the subject matter has been conceived as a continuous field.

Advantages and disadvantages of the vector structure

Advantages of the vector structure

A major advantage of the vector structure is its spatial precision. Real-world entities can be drawn and positioned with an accuracy restricted only by practical limitations such as the precision of the recording equipment. Artefacts, features, sites and other archaeological entities can be integrated in a single environment, each mapped with as much spatial detail as is required for analysis. While centimetre-scale precision would not be required for the analysis of the spatial distribution of sites across a large study region, this level of precision would, however, be essential for the study of the distribution of chert flakes on a knapping floor. Finding the balance between spatial precision and the minimum scale of analysis is crucial, most importantly to prevent spatial errors from influencing pattern recognition. In practice, it is important to recognise that increased resolution also means increased file sizes, with a corresponding burden on storage and processing time. Another significant advantage of the vector structure is that vector objects are maintained as distinct entities and can be easily linked to attribute data records in an internal or external database.
For this reason, vector-based GIS programs have traditionally led the way in terms of database integration, as complex attribute queries can be performed with relative ease. A vector map may therefore act as a window into a database, in which each object is described in great detail.

Disadvantages of the vector structure

Vector objects are computationally demanding. Every vertex and node of a vector map must be stored in computer memory, and drawing vector objects requires a considerable amount of processor time. For this reason, vector data are often much slower to draw on a computer screen than raster data. The manipulation of vector data is correspondingly intensive; spatial queries involving, for example, the calculation of areas of overlap within a large set of polygons need considerable central processing unit (CPU) time. Vector data also impose properties onto real-world objects that do not necessarily correspond with reality. The most important imposition is 'boundedness'. Although many real-world objects do indeed have precise and discrete boundaries, certain types of data are more 'fuzzy' and do not lend themselves to the hard, precise edges of vector objects. As vector data cannot readily deal with fuzziness or imprecision, this can result in artificial precision in some scenarios. An issue related to fuzzy
boundaries is the implied non-varying state of an attribute across a vector object. For example, a polygon used to represent a discrete survey area may possess an attribute value that represents the density of artefacts in that enumeration unit. This implies an even distribution of artefacts across that area, which in reality is rarely the case (e.g. Fig. 12.2). An additional attribute could be used to express the variability within a polygon, but there is no simple way to map continuous spatial change with vector objects. For the same reason, elevation is inherently difficult to represent using discrete vector objects such as points or lines: contour lines, for example, give an indication of topographic variation at set intervals, but it can be difficult to predict elevation values between the lines. There are special vector structures called triangulated irregular networks (TINs) that overcome these difficulties (Chapter 6), but the case remains that some types of data are less well suited to the vector structure.

Advantages and disadvantages of the raster structure

Advantages of the raster structure

The speed at which raster data can be processed offers advantages for some applications involving very large datasets, and there are several other key areas in which the raster structure can offer advantages over vector formats. Firstly, raster data are very good for mapping continuously varying phenomena, such as elevation, as the continuous cell-based structure is akin to a continuously varying surface. The raster structure is also very good at representing real-world entities that have fuzzy boundaries. For example, a distribution of artefacts collected in a ploughed field could be represented more realistically by using raster cells that show the changing density of material rather than a single polygon that arbitrarily defines the site's area with a single density value. When this type of information is crucial, the raster data structure offers a clear advantage.
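The contrast between a single density value for a survey polygon and a raster of densities can be sketched by counting finds into cells. The find coordinates, the 10 × 10 m collection unit and the 5 m cell size are hypothetical, and finds lying exactly on the northern or eastern edge are ignored because the sketch treats the extent as half-open.

```python
import math


def density_grid(points, xmin, ymin, xmax, ymax, cell_m):
    """Count points falling in each square cell; row 0 is the northern (top) row."""
    cols = math.ceil((xmax - xmin) / cell_m)
    rows = math.ceil((ymax - ymin) / cell_m)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if not (xmin <= x < xmax and ymin <= y < ymax):
            continue                                # skip finds outside the surveyed area
        col = int((x - xmin) // cell_m)
        row = rows - 1 - int((y - ymin) // cell_m)  # flip so that north is at the top
        grid[row][col] += 1
    return grid


# Hypothetical finds from a 10 x 10 m collection unit, clustered in its south-west corner.
finds = [(1, 1), (2, 2), (3, 1), (8, 8)]
polygon_density = len(finds) / (10 * 10)            # one value for the whole polygon
grid = density_grid(finds, 0, 0, 10, 10, cell_m=5)
cell_density = [[n / (5 * 5) for n in row] for row in grid]

print(polygon_density)  # 0.04
print(grid)             # [[0, 1], [3, 0]]
print(cell_density)
```

The polygon reports a uniform 0.04 finds per square metre, whereas the raster shows 0.12 in the south-west cell, 0.04 in the north-east cell and none elsewhere.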
Secondly, raster datasets can be mathematically manipulated and combined more easily than polygon maps, making the raster structure an exceedingly powerful tool for spatial modelling. A simple model of agricultural potential, for example, may be constructed by combining data from several different sources, such as raster maps of elevation, slope, aspect, soil drainage and soil type, in a process called map algebra. Thirdly, aerial photographs, satellite images and geophysical surveys produce data in raster formats, and the image processing that is often needed to enhance, classify and make sense of these sorts of data can only be performed in a raster environment.

Disadvantages of the raster structure
There are three major disadvantages of the raster structure: its fixed resolution, its difficulty in representing discrete entities and its limited ability to handle multiple attribute data. The first problem arises when data collected at different scales need to be integrated. Combining multiscalar datasets could be seen as introducing additional problems regardless of the data model, and might ideally be avoided, but in practice there are many instances when data collected at different scales must be combined. Field survey data, for example, often mix scales of representation, from the larger survey unit (such as a field) to site-based artefact collections where more detail is collected. The representation of multiscalar data is difficult in raster systems, and combining raster data collected at different scales often means defaulting to the smaller (coarser) scale and losing detail.

Secondly, problems can arise with representing complex boundaries using raster data because of the inherent limitations of grid data for representing tightly curved objects. Unless the cells are very small in relation to the object being represented, and the storage size correspondingly increased, curved lines will always be blocky in appearance (Fig. 2.14). For this reason, complex shapes such as contour lines are better modelled using vector objects.

Fig. 2.14 Representing complex curves with raster data can be problematic. The box on the far left shows five vector polylines. The centre box shows the same lines using a 10 × 10 raster grid (i.e. 100 cells). On the far right the resolution has been increased to a 20 × 20 grid (i.e. 400 cells). This improves the representation, but the raster map still suffers from being blocky and from lost detail.

Finally, raster data have always been difficult to connect to attribute tables. Although some GIS programs, notably Idrisi and GRASS, provide a facility for linking raster data to a database, in practice this is often more cumbersome than the embedded attribute tables that vector-based GIS programs provide. The raster data structure thus has limitations for the management and querying of multiscalar spatial datasets.

2.5 Conclusion
Geographical information systems (GIS) are a powerful technology that offers a host of analytical possibilities for investigating the spatial organisation of culture and human-environment relationships. These 'first principles' of GIS only define the starting point for exploring the complexity of the human use of space with GIS. In fact, many of these first principles are being constantly challenged by research that is pushing beyond the constraints of two-dimensional mapping, using GIS to model space-time relationships more adequately than the basic vector and raster building blocks presented in this chapter allow. Nevertheless, GIS is, for the time being at least, still reliant on cartographic principles and a reductionist tendency that restricts its range of possibilities for representing and interpreting the real world. Within these limitations, however, there is still a very broad range of ways that GIS can be used to develop an understanding of human culture in a spatial framework, and the next chapter provides some real-world examples of how GIS can and does work in archaeology.
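Two of the raster properties discussed in this chapter, map algebra and the need to default to the coarser resolution when combining rasters, can be illustrated together. The following minimal Python sketch assumes two hypothetical layers: slope in degrees on a 10 m grid and a soil drainage score on a 20 m grid. The slope layer is first resampled by block averaging, then the two are combined cell by cell. The thresholds (slope below 10°, drainage score of 2 or more) are invented for illustration; a real model of agricultural potential would use more layers and justified weights.

```python
def aggregate(grid, factor):
    """Resample a raster to a coarser resolution by averaging factor x factor
    blocks of cells (assumes both dimensions are exact multiples of factor)."""
    out = []
    for r in range(0, len(grid), factor):
        row = []
        for c in range(0, len(grid[0]), factor):
            block = [grid[i][j] for i in range(r, r + factor) for j in range(c, c + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def map_algebra(op, *grids):
    """Apply op cell by cell to rasters that share resolution and extent."""
    nrows, ncols = len(grids[0]), len(grids[0][0])
    if any(len(g) != nrows or len(g[0]) != ncols for g in grids):
        raise ValueError("rasters must share the same resolution and extent")
    return [[op(*(g[r][c] for g in grids)) for c in range(ncols)] for r in range(nrows)]

# Hypothetical inputs: slope (degrees) at 10 m, drainage score at 20 m
slope_10m = [[2, 4, 12, 14],
             [6, 8, 16, 18],
             [5, 5, 20, 2],
             [5, 5, 4, 6]]
drainage_20m = [[3, 3],
                [1, 2]]

slope_20m = aggregate(slope_10m, 2)  # detail is lost: we default to the coarser grid
suitable = map_algebra(lambda s, d: int(s < 10 and d >= 2), slope_20m, drainage_20m)
print(slope_20m)
print(suitable)
```

Block averaging is only one resampling choice; majority or nearest-neighbour rules would suit categorical layers such as soil type better.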

3 Putting GIS to work in archaeology

This chapter reviews four typical applications of GIS in archaeology: management of archaeological resources, excavation, landscape archaeology and the spatial modelling of past human behaviour. For each application we discuss some general issues concerning the use of GIS in that particular context, followed by a case study that illustrates the contribution that GIS has made. Although these examples are in no way exhaustive, they do provide a good overview of the capabilities, potential and contributions that GIS can make to archaeological management and research.

3.1 Management of archaeological resources
It is not our intention to discuss the objectives of cultural resource management (CRM), nor the appropriate structure of a spatial database for managing the archaeological record, as these decisions are most appropriately made by government bodies and the archaeologists charged with the tasks of recording and managing the archaeological resource. However, we note that archaeological and historic databases have increasingly been subject to government scrutiny. In the UK, this most recently occurred in a parliamentary review of archaeology that took place in 2003 (APPAG 2003; Gilman 2004). In particular, the UK archaeological databases termed 'Sites and Monuments Records' (SMRs) are under review in light of recent developments in information technology, especially GIS and the Internet (e.g. Newman 2002). The review makes it clear that SMRs should evolve into broader Historic Environment Records (HERs) that include information such as historic buildings, parks and gardens, historic aircraft crash sites, etc. Moreover, the role of HERs as essential vehicles both for the management of the archaeological record and for public education was emphasised. At the moment, SMRs are still fragmentary, due in part to their often ad hoc nature and the absence of clear guidelines as to what should be recorded. Some guidance and strategies for data collection, standards and maintenance have recently been provided in Fernie and Gilman (2000), ALGAO (2001) and ALGAO (2002), but the lack of statutory requirements for their development or maintenance by local authorities is hindering standardisation (APPAG 2003). Several additional recommendations made to the government to improve SMRs include the need to streamline systems and centralise access via the Internet in order to increase their availability to the public (APPAG 2003, p. 19).




The implications of this in terms of the role of GIS are significant. For example, although in 2004 over 90% of SMRs in England reported using GIS for recording and management of data (Bevan and Bell 2004), there is little standardisation of methods and formats of data storage (Newman 2002). There are ongoing studies to introduce national standards specifying how particular sorts of archaeological and historically important information should be recorded within spatial databases, alongside recommendations regarding the role of GIS and the accessibility of spatial data to members of the public. Terminological standards such as the Monument Inventory Data Standard (MIDAS) and English Heritage's National Monuments Record Thesauri are already in use. While we appear to be some years away from seeing GIS as the primary management system for archaeological and historic data, it is certainly on the horizon.

1 www.english-heritage.org.uk.

With this in mind, we can make some very general observations and a few brief recommendations for people who might be considering adopting a GIS for the first time, or reviewing and upgrading existing GIS facilities. Firstly, as should now be clear from the previous chapter, GIS offers many advantages over attribute database systems when managing spatial data. Although it is possible to have a perfectly adequate non-GIS database that records site location and other information in a spreadsheet or paper card index, such databases begin to run into significant problems when spatial information becomes a key part of the record. Sites of Special Scientific Interest (SSSI), national parks or large archaeological sites cannot be easily described without reference to their spatial parameters, and recording anything more than the location of a specific point on the landscape becomes very difficult in a standard text-based database. For this reason, spatial databases are far superior, as they are able to record morphology and topology in formats that can be queried in ways that attribute-only data cannot. From the perspective of resource management, the advantages of having an integrated system that permits the flexible interrogation of sites within their broader spatial context are enormous. Therefore GIS-based management systems are replacing standard CRM database systems worldwide, although the uptake of GIS in cultural heritage is certainly not uniform, even within Europe (Garcia-Sanjuan and Wheatley 1999).

Vector-based GIS environments are particularly suited to mapping precise boundaries and to linking attribute data to spatial objects. They have thus traditionally been preferred for resource management. In the UK, nearly all SMRs held by county councils use vector objects to record archaeological spatial data (Bevan and Bell 2004). However, to be truly integrative, an SMR should include several other data layers that may require raster capabilities. Data such as air photographs, digital elevation models, satellite imagery, historical maps, digital images and resistivity or other forms of survey data will provide a broader range of contextual information. Raster datasets also form the basis of predictive models, which were enthusiastically embraced in the early days of GIS and CRM. There are many reasons to be doubtful of their suitability for explaining human-environment relationships (cf. Wheatley 2004), but predictive modelling continues to be an important focus within the sphere of cultural resource management (see Chapter 8 for a discussion of the objectives and methods of predictive modelling, and Westcott and Brandon 2000 for several examples). An example of using a continuous surface to model the 'importance' of known or suspected areas of archaeological significance has also been discussed by Wheatley and Gillings (2002, pp. 219-221).

A GIS will not solve all the data management problems faced by a CRM organisation, and the use of GIS will itself result in a number of issues that have to be addressed, ideally prior to migrating from established systems. First, GIS implementation can be expensive. Software and hardware costs, particularly if using a commercial or specially written application, can be enormous. These costs may be relatively minor, however, compared to the costs of acquiring and maintaining spatial data and training personnel to operate the GIS software. Furthermore, while popular off-the-shelf systems such as MapInfo or ArcGIS are powerful and fully capable of dealing with the most demanding CRM and SMR data systems, professional customisation and programming of these systems to meet the specific needs of an SMR or CRM organisation is a worthwhile investment. The customisation of a GIS, while perhaps within the capability of a well-trained GIS operator, is something that is often best undertaken by a professional GIS programmer.
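The kind of question that an attribute-only database struggles with, such as which recorded sites fall inside an SSSI or national park boundary, is answered in a GIS by a point-in-polygon test. Below is a minimal sketch using the standard even-odd ray-casting rule; the boundary coordinates and site IDs are hypothetical, and production GIS software adds spatial indexing and careful handling of points lying exactly on a boundary.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test. polygon is a list of (x, y) vertices in
    order; the closing edge back to the first vertex is implied."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the ray towards +x crosses this edge
                inside = not inside
    return inside

def sites_within(sites, boundary):
    """IDs of sites whose point location falls inside the boundary polygon."""
    return [sid for sid, (x, y) in sites.items() if point_in_polygon(x, y, boundary)]

# Hypothetical designated-area boundary and SMR site points (metres)
boundary = [(0, 0), (100, 0), (100, 50), (50, 100), (0, 50)]
sites = {"SMR001": (10, 10), "SMR002": (90, 90), "SMR003": (50, 80), "SMR004": (150, 20)}
print(sites_within(sites, boundary))
```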
Second, the costs of transferring data from an attribute system to a spatial database can create difficulties because of the increased detail required by the latter. Even the most basic form of spatial data (points) might present difficulties if spatial precision has not been recorded or standardised to an established level. For example, if most site locations are recorded to the nearest 10 m, but a portion of the records only provide locations to the nearest 100 m, then this will immediately call into question the accuracy of any spatial queries that are subsequently undertaken. The only solution is to maintain data about the known or suspected spatial error, or to have spatial locations checked, confirmed or corrected by field personnel (which has cost implications). With more detailed spatial objects, for example very large sites or historical landscapes, descriptive location records will have to be translated into topologically correct polygons that show their morphology and their spatial relationships to other vector objects such as roads, property boundaries, pipelines and other landscape features. Third, costs for acquiring digital topographic maps of the area of responsibility also have to be taken into account, as national mapping agencies may not provide free versions of their costly data for commercial purposes. The costs of upgrading data so that it 'works' in a GIS environment should not be underestimated; indeed, after the investment in training or acquiring GIS personnel, the acquisition of spatial data is perhaps the single most important investment to be made when moving to a GIS.
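Maintaining data about spatial error pays off at query time. The sketch below assumes each site record carries a hypothetical `precision_m` value, treated here as the maximum positional error of its grid reference (e.g. 10 or 100). A radius search then separates sites that are certainly inside the search area from those that only might be.

```python
import math

def radius_query(sites, cx, cy, radius):
    """Classify sites against a radius search, respecting recorded precision.

    sites maps an ID to (x, y, precision_m); precision_m is assumed to be
    the maximum positional error of the recorded location.
    Returns (certainly_inside, possibly_inside), each sorted by ID."""
    certain, possible = [], []
    for sid, (x, y, precision) in sites.items():
        d = math.hypot(x - cx, y - cy)
        if d + precision <= radius:      # inside even in the worst case
            certain.append(sid)
        elif d - precision <= radius:    # inside only if the error favours it
            possible.append(sid)
    return sorted(certain), sorted(possible)

# Hypothetical records: C has the same grid reference as B but coarser precision
records = {"A": (100, 0, 10), "B": (180, 0, 10), "C": (180, 0, 100), "D": (500, 0, 100)}
print(radius_query(records, 0, 0, 200))
```

Sites in the second list are candidates for the field checking described above.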



3.1.1 Case study I: the Greater London Sites and Monuments Record
The Greater London SMR is a good working example of a CRM GIS. The software used to manage the SMR is called the Historic Buildings, Sites and Monuments Record (HBSMR),2 and was specially designed by Exegesis SDM in partnership with English Heritage's National Monuments Record (NMR) and the Association of Local Government Archaeological Officers (ALGAO). Exegesis SDM's HBSMR is built around Microsoft's Access database and is one of the more popular archaeological spatial management systems in use in Britain. Its relational structure, termed the 'Event-Source-Monument' model, permits the use of a sophisticated data model based around separate tables for the attributes of sites or monuments, associated archaeological investigations, and the data that constitute the archive for that site or monument (Bourn 1999; Dunning 2001). Many SMRs held by county councils in England now use this system to manage their historic and archaeological datasets. The attribute databases link directly to spatial objects and base maps managed in either MapInfo or ArcView GIS, and provide the essential spatial search and query functionality needed to manage the data effectively. The system is based on the 1:25 000 Ordnance Survey Meridian series, which provides a spatial framework and information on roads, property and administrative boundaries. The archaeological and historical places database consists of spatial objects that define the location and/or boundaries of a broad range of data, from prehistoric sites to larger Roman and medieval structures, and more recent buildings of historical importance. The associated attribute information for each of these locations is defined by English Heritage's National Monuments Record Thesauri.3 A thesaurus is an essential part of any database, as it provides a list of preferred terms for describing types of archaeological and historical sites, and establishes relationships between the hierarchical terms used to describe the spatial objects in the SMR.
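One practical benefit of a hierarchical thesaurus is that a search for a broad term can automatically include every narrower term beneath it. The sketch below uses a small, hypothetical fragment of a monument-type hierarchy (it is not copied from the NMR Thesauri) and hypothetical record IDs, and expands a query term before matching records.

```python
# Hypothetical thesaurus fragment: each term maps to its broader term (None = top level)
BROADER = {
    "RELIGIOUS RITUAL AND FUNERARY": None,
    "PLACE OF WORSHIP": "RELIGIOUS RITUAL AND FUNERARY",
    "CHURCH": "PLACE OF WORSHIP",
    "CHAPEL": "PLACE OF WORSHIP",
    "BURIAL SITE": "RELIGIOUS RITUAL AND FUNERARY",
    "BARROW": "BURIAL SITE",
    "DOMESTIC": None,
    "HOUSE": "DOMESTIC",
}

def narrower_terms(term, broader=BROADER):
    """Return term plus every term beneath it in the hierarchy."""
    result = {term}
    changed = True
    while changed:  # keep sweeping until no new descendants are found
        changed = False
        for t, parent in broader.items():
            if parent in result and t not in result:
                result.add(t)
                changed = True
    return result

def search(records, term):
    """IDs of records whose monument type falls under term."""
    wanted = narrower_terms(term)
    return sorted(rid for rid, mtype in records.items() if mtype in wanted)

records = {"MLO1": "CHURCH", "MLO2": "BARROW", "MLO3": "HOUSE", "MLO4": "CHAPEL"}
print(search(records, "PLACE OF WORSHIP"))
print(search(records, "RELIGIOUS RITUAL AND FUNERARY"))
```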
The SMR holds the records of nearly 80 000 archaeological sites, historic places and listed buildings across London's 33 boroughs, and provides the main source of information about the archaeological and historic places in London for archaeologists, researchers, planning consultants and developers. Further information about the Greater London SMR can be obtained from the National Monuments Record section of the English Heritage website.4

2 www.esdm.co.uk/HBSMR.asp.
3 http://thesaurus.english-heritage.org.uk.
4 www.english-heritage.org.uk/.

3.2 GIS and excavation
Archaeology has traditionally possessed a strong conceptual divide between data collection and data analysis, manifested most obviously between excavation and post-excavation activities. For many reasons this division is logical and necessary, not least because there are many tasks that are difficult to perform in the field: cataloguing photographic records; detailed analysis of artefacts, site plans and environmental samples; inking-up of drawings; and so on. However, the availability of laptop computers designed for field use means that it is now common for field projects to possess the capabilities for on-site computing. From a GIS perspective, therefore, the most important change in recent years relates to the massive increase in the use of digital recording methods for spatial and attribute data. Many of the routine tasks that were normally assigned to the post-excavation phase of a project can now take place in a field setting, and this has eroded the traditional separation between data collection and description on the one hand, and data analysis and interpretation on the other. Beck and Beck (2000) provide a good example of the impact of digital recording on the excavation process, and the implications of eroding the boundary between data collection and interpretation have also been explored by Hodder (1997, pp. 80-104, and 1999). A comprehensive discussion of excavation methodologies and theory, including the impact of information technology (IT) on excavation practices, can also be found in Roskams (2001).

Although it is certainly not a panacea, GIS can play a useful role in an archaeological excavation. As a spatial management tool it is unsurpassed, as it allows rapid visualisation of spatial data and can link plans and drawings of archaeological remains directly to database records. On any site with more than a few hundred plans and associated data records, GIS can offer major advantages for data management. GIS also enables the visualisation of data patterns at or soon after their collection, which can be invaluable during the course of an excavation, facilitating a 'reflexive' approach to data collection in the way described by Hodder (1999). Data collection therefore becomes a more iterative exercise, allowing ideas about possible patterns and relationships between data to be identified and explored more quickly and efficiently than can occur in traditional paper-based recording systems.
A very simple conceptual model of a basic excavation GIS is outlined in Fig. 3.1. The use of GIS on any excavation requires a considerable amount of forethought and planning. Digital recording technologies bring their own set of disadvantages, particularly the dependence on electricity and the associated costs of buying or renting expensive recording equipment. These costs should be weighed against the potential benefits, such as the time savings in the transfer of information from paper to digital formats, and the interpretative advantages of being able to visualise and explore complex data patterns during the process of excavation. It is increasingly common for excavation projects to record spatial data using a total station or differential GPS, with data transferred directly into a data-logger or computer to reduce both the time taken and the possibility of introducing errors when recording data (Powlesland 1998; Ziebart et al. 1998). More recent survey technologies, such as the ArcSurvey package of the ArcGIS family, extend GIS functionality to data collection so that GPS and total station data can be viewed and annotated in real time on laptop or tablet computers. Alternatively, it is possible to record spatial information by photogrammetry (the deriving of spatial data from photographs), which has a long history of archaeological application at both landscape and site levels (e.g. Sterud and Pratt 1975; Anderson 1982; Poulter and Kerslake 1997). Procedures for capturing digital data from paper plans are discussed in some detail in Chapter 5, which also considers some of the digital recording technologies that are available to archaeologists.

[Fig. 3.1 diagram: Unit Data (unit number, category, excavated volume, soil consistency, soil colour, soil inclusions) linked by unit number to Spatial Database I; Artefact Data (artefact number, unit number, data category, material, length, width, thickness, etc.) linked by artefact number to Spatial Database II.]
Fig. 3.1 A simple data model for linking excavated unit attribute data and artefact attribute data to a spatial database. The 'polygon' and 'point' fields in the spatial databases contain the x,y-coordinate data that describe the spatial object, as captured directly in the field or subsequently digitised from paper plans. Such a system allows the visualisation of units as polygons, with an overlay of artefacts as points. Each is directly linked to the attribute databases, allowing context-specific spatial or attribute queries to be performed (e.g. 'locate all the chert tools within 1 m of hearth features').

Three-dimensional stratigraphic modelling
In many situations archaeologists record data in three dimensions that cannot be properly interpreted in the two-dimensional plane offered by most GIS software. While it has been possible for some years to record objects in three dimensions using computer-aided drawing packages such as AutoCAD, and the spatial analysis of three-dimensional data has long been used by prehistorians to understand the patterning of objects (see Spikins et al. 2002; Nigro et al. 2003, for recent examples), truly analytical three-dimensional GIS remains an elusive target (Hudson-Smith and Evans 2003). Currently, many off-the-shelf GIS packages represent the third dimension by extruding vector objects by an attribute variable to create the impression of a three-dimensional surface or volume, referred to as '2.5D' or 'almost three-dimensional'. While this has some advantages for the three-dimensional visualisation of surfaces or regular three-dimensional phenomena such as buildings, spatial queries like 'retrieve all the objects within a spherical radius of 1 m from this point in three-dimensional space', or the modelling of irregular volumes, are still beyond basic desktop GIS functionality (although three-dimensional 'buffer zones' may be generated in ArcGIS's '3D Analyst'). There are, however, programs that have been developed for modelling geological data, and these may offer some potential for fully three-dimensional visualisation and analysis of archaeological data. The program 'Vulcan', for example, provides tools for building and visualising volumetric data in three dimensions. Software produced by C Tech Development Corporation (marketed with the title Environmental Visualisation Systems),6 offers similar capabilities, as well as three-dimensional geostatistical tools. The ability to build a detailed three-dimensional stratigraphic model of an excavation in order to understand the structure and formation processes of the archaeological record is an exciting possibility that has yet to be fully explored by archaeologists, and we are aware only of the previously cited example of Nigro et al. (2003) as an illustration of its potential.

There have also been a number of projects that have used virtual reality to provide innovative ways of interacting with three-dimensional objects. For example, Gillings and Goodrick (1996) and Exon et al. (2000) have explored the possibilities of virtual reality (VR) for the arts and humanities, and provide several examples of potential applications. Purely analytic uses that link GIS with VR remain under development, and archaeological examples that combine these two spatial technologies remain to be seen. Interested readers are referred to Fisher and Unwin's (2002) Virtual Reality in Geography for examples of the analytic potential of three-dimensional modelling, virtual reality and GIS.

3.2.1 Case study II: the West Heslerton Project
West Heslerton is a village in Yorkshire, England, that was the setting for one of the largest English Heritage rescue archaeology projects in recent history. Over 20 hectares of an Anglo-Saxon village and associated cemetery were examined through excavation, survey and remote sensing, providing a comprehensive picture of a crucial period in English history. The project was directed by Dominic Powlesland and is described in a number of publications, both online and in print (see Powlesland 1998, and Haughton and Powlesland 1999, for a full list of references).
Although the project is of major importance for the archaeology of both the Late Roman/Early Anglo-Saxon and the Early/Middle Anglo-Saxon transitions, of particular concern here is its pioneering use of digital recording and presentation of data. All artefacts, features and samples were recorded using a total station, and handheld computers were used in the field as the primary tools for the vector-based recording of over 300 000 archaeological contexts, artefacts, environmental, drawing and photographic records (Powlesland 1998). What marks West Heslerton as different from other projects that were pioneers in the use of digital recording techniques is that it also used GIS to manage, visualise and analyse archaeological spatial data. The software (G-Sys) was written by Powlesland to enable the three-dimensional plotting of point data and the integration of geophysical and remotely sensed data, and digital photographs, with line and polygon vector data.
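Two queries mentioned in this section, the Fig. 3.1 example ('locate all the chert tools within 1 m of hearth features') and the three-dimensional spherical search that basic desktop GIS lacks, can be sketched directly on point-plotted finds such as those recorded at West Heslerton. This is a minimal illustration, not G-Sys or any GIS package's method: the attribute names are hypothetical, and hearths are simplified to circles (centre and radius) rather than digitised polygons.

```python
import math

def chert_near_hearths(artefacts, hearths, max_dist=1.0):
    """IDs of chert artefacts lying within max_dist metres (in plan) of any
    hearth. Hearths are simplified to circles (x, y, radius); an artefact
    inside a hearth counts as distance 0."""
    hits = []
    for aid, a in artefacts.items():
        if a["material"] != "chert":
            continue
        if any(max(0.0, math.hypot(a["x"] - hx, a["y"] - hy) - hr) <= max_dist
               for hx, hy, hr in hearths):
            hits.append(aid)
    return sorted(hits)

def within_sphere(artefacts, centre, radius):
    """IDs of artefacts whose (x, y, z) position lies within radius of centre."""
    return sorted(aid for aid, a in artefacts.items()
                  if math.dist((a["x"], a["y"], a["z"]), centre) <= radius)

# Hypothetical point-plotted finds (metres, z = height above datum)
finds = {
    "A1": {"material": "chert", "x": 10.5, "y": 10.0, "z": 99.8},
    "A2": {"material": "chert", "x": 13.0, "y": 10.0, "z": 99.9},
    "A3": {"material": "bone", "x": 10.2, "y": 10.1, "z": 99.7},
    "A4": {"material": "chert", "x": 11.4, "y": 10.0, "z": 99.2},
}
print(chert_near_hearths(finds, [(10.0, 10.0, 0.5)]))
print(within_sphere(finds, (10.0, 10.0, 100.0), 1.0))
```

Find A4 passes the plan-view test but fails the spherical one because it lies 0.8 m lower, which is exactly the information a 2.5D representation discards. `math.dist` requires Python 3.8 or later.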

6 www.ctech.com/products/evs-pro.htm.




Fig. 3.2 The workings of the West Heslerton WEB CD, showing the dynamic links between the plans, sections and attribute tables. Arrows depict context-sensitive data records and images obtainable by clicking on the site plan. See Powlesland et al. (1998) for details on its construction. Images reproduced with permission.

As well as facilitating the management and analysis of the spatial data, the use of GIS also expedited the publication process. One of the most innovative aspects of the project was its use of digital media to publish the primary record. The scale of data involved, consisting of nearly 30 000 context records and plans, 90 000 object records and close to a million animal bone fragments, alongside the copious photographic, stratigraphic, geophysical and other datasets, meant that traditional paper publication would have been a costly and unwieldy medium for presenting the full dataset. On what was inexpensively published as the West Heslerton WEB-CD (Powlesland et al. 1998), data tables can be searched, and plans, text and section drawings can be interactively linked in ways impossible with traditional publication formats (Fig. 3.2; Powlesland 1997). Although it is now some years since the project finished its fieldwork, West Heslerton remains a milestone in the use of GIS in the field, and in the digital publication of archaeological data.

3.3 Landscape archaeology
Regional survey (or 'field-walking') projects present an obvious environment for using GIS, as the regional and spatial foci are readily aligned to the tools that GIS offers. Whether the survey is organised using systematic test pits or transects, or is more opportunistic and exploratory, using GIS offers enormous benefits over paper-based recording. A number of publications have considered the impact of spatial technologies on archaeological survey and landscape archaeology: for example, papers in Gillings et al. (1999) and Bintliff et al. (2001) discuss technical and interpretative issues arising from the use of GIS in specific environmental settings, and Lock (2003, pp. 14-77) provides a more general, but very comprehensive, overview. One major challenge facing landscape archaeologists is how to collate data from different projects when data have been collected under different recording systems and methods (for example, papers in Alcock and Cherry (2004) provide a Mediterranean perspective on this problem), and GIS certainly has a major role to play in this endeavour. However, we limit ourselves here to a more general consideration of the value and role of GIS in landscape archaeology.

The spatial resolution for archaeological landscape data collection varies enormously: it may range from the intensive point plotting of all artefacts across a study region (e.g. Fanning and Holdaway 2001; Ketz 2001) to extensive surveys that seek only to record the location of archaeological sites. In artefact-rich landscapes, such as the Mediterranean, it is often impractical to plot the location of every artefact or even transect, and it is not uncommon for archaeologists to use survey units in the region of 0.5 ha or greater in area, in which counts of artefacts (but not their locations) are tabulated from field walking (Barton et al. 1999, 2002; Lock et al. 1999; Gillings and Sbonias 1999; Bevan and Conolly 2004). In many cases, projects will use a range of resolutions for different tasks, although this always requires careful documentation to ensure the data recorded at the coarser level are not analysed assuming a higher level of precision. For example, whereas insight into the effects of erosion on artefact visibility could potentially be obtained by investigating the relationship between point-plotted artefact locations and slope values on a grid of 2 m² cells, the same investigation with artefact densities collected in units of 50 m² would be meaningless unless the slope data were resampled to this level of resolution.

Regardless of the spatial resolution, a common starting point for any landscape-scale project is the digital acquisition (most often as vector data) of the background topographic data, such as extant field systems and boundary walls, buildings, streams, roads, pathways and tracks. This level of detail is dependent on the availability of maps of a sufficiently high resolution, which for all practical purposes usually means a scale of 1:25 000 or larger. These background contextual data provide a useful spatial framework for archaeological investigation. For projects working in smaller areas it may be practical to collect these, and also elevation data, as primary data via recording equipment such as a differential GPS or a total station (Chapter 5). More often, however, it is more practical to capture elevation data from secondary sources, such as by digitising contours from existing paper maps, then converting these to a continuous elevation model (Chapter 6). In some situations, however, very little high-resolution mapping may be available, in which case it might be necessary to plot data on publicly available lower-resolution maps (such as Pennsylvania State University Library's Digital Chart of the World,7 the US Geological Survey's HYDRO1k series,8 or commercially available satellite imagery, as described in Chapter 5). The use of a navigation-grade GPS (i.e. with an accuracy of ±10 m) will allow the survey units (test squares, transects or tracts) or actual artefacts to be plotted reasonably accurately on base maps at scales of 1:25 000 or smaller. With a Wide Area Augmentation System (WAAS)-enabled handheld GPS (±3 m; see Chapter 5), such data can be plotted accurately at scales of 1:5000 or smaller, and if a differential or survey-grade GPS is used then the spatial accuracy will exceed that of all but the highest-resolution base maps (see Chapter 5 for more information on digital recording methods). As outlined in Chapter 2, the use of a metric coordinate system (such as UTM, UK Ordnance Survey, US State Plane or some form of National Grid) is preferable to latitude and longitude coordinates. Recording the spatial boundaries of a survey unit within an existing set of map data requires a reasonable degree of map-reading skill and is essential for the accurate placement of archaeological data within the landscape. The use of GPS receivers may improve the accuracy, although since handheld GPS recorders have an error of ±10 m, it will be necessary to check and possibly manually adjust plotted spatial locations at scales larger than 1:25 000.
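The relationship between GPS accuracy and usable map scale is simple arithmetic: a ground distance of e metres appears on a 1:S map as e × 1000 / S millimetres. A short sketch follows; the 0.5 mm default plotting tolerance is an assumption chosen for illustration, not a published standard, and the function names are hypothetical.

```python
def map_error_mm(ground_error_m, scale_denominator):
    """Size on the printed map (mm) of a ground distance in metres at 1:scale_denominator."""
    return ground_error_m * 1000 / scale_denominator

def min_scale_denominator(ground_error_m, tolerance_mm=0.5):
    """Smallest scale denominator (i.e. largest map scale) at which the ground
    error still plots within tolerance_mm; 0.5 mm is an assumed tolerance."""
    return ground_error_m * 1000 / tolerance_mm

for label, err in [("navigation-grade", 10), ("WAAS-enabled", 3)]:
    print(label,
          round(map_error_mm(err, 25000), 2), "mm at 1:25 000,",
          round(map_error_mm(err, 5000), 2), "mm at 1:5000,",
          "usable to about 1:%d" % min_scale_denominator(err))
```

A ±10 m error shrinks to 0.4 mm at 1:25 000 but grows to 2 mm at 1:5000, which is why locations plotted from a handheld receiver need checking at scales larger than 1:25 000.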
If surveyunits ale being digitisedfrom paper'fieldmaps.then a minimum spatialerror shouldbe estab(RMSE) value (see Chapter5) lished by referenceto the root-mean-square-error of the map registrationfor eachsessionof digitising. If the survey units are tracts (i.e.polygons), relationships thenit is alsoimperative thatthe topological between the units are properly established: adjacenttracts should sharenodesif they truly are adjacent, andmust not overlapifthey arenot. Signi{icanterrorscan be nude, for example,with aftefactdensity estimates because of topological effors in digitised polygon in further detarl data.Issuesrelatedto digitising and topology arediscussed in Chapter 5. point that Thomas(1993,p. 26) has It is worth highlightingherean irnportant raised concerningan erroneousbelief that can potentially arise in GIS-led landscapesuNeys: that 'data assembled are data understood'.The apparently'totalising knowledge' that emergesfrom the assemblyof structures,fields, hydrology, soils, elevation and extant archaeologicalevidenceir.ttoa GIS does not directly Meaningfuland lead to an understanding of the all-impotant social landscape.
7 www.maproom.psu.edu/dcw/.
8 http://edcdaac.usgs.gov/gtopo30/hydro/.

3.3 Landscapearchaeology


substantive interpretations of the complex and often unpredictable relationships humans have with their landscapes cannot be arrived at by assembling data alone, but must be carefully built and explored using a range of sources (e.g. ethnographic, historical, environmental, experiential, archaeological). Each provides insight and possibilities for developing an interpretative understanding of past human behaviour in a landscape context; GIS is but one of several potentially useful tools that can be used to reach this goal, and its results should be balanced against other forms of analysis and interpretation.

3.3.1 Case study III: the Kythera Island Project

The Kythera Island Project (KIP) serves as a good example of an extensive survey project in an artefact-rich landscape where GIS played an important role in meeting the aims of the project. Kythera, a Greek island that lies between Crete and the Peloponnese, was selected for an intensive study to explore insular dynamics (see Broodbank 1999 for an overview). Here we highlight only the landscape archaeological component of the research, which was built on a detailed study of over one-third of the island's total area of 280 km2. The geospatial framework used by the project is based on the Greek military grid system established for the island, on which the 1:5000 topographic map series is based. These maps were manually digitised prior to the survey and included a range of anthropogenic features, including roads and tracks, field systems and standing buildings (Fig. 3.3). A 10-m resolution DEM for the survey area was created from manually digitised contours to aid in the interpretation of land use, site and artefact taphonomy, and ecological modelling (Bevan and Conolly 2004). The field computing system consisted of three laptops running digital mapping systems, database and GIS.

Hard copies of sections of 1:5000 maps defining the day's field-walking objectives were printed out for archaeological surveyors, who then used a combination of local features such as field boundaries, roads and buildings, together with compasses and GPS receivers, to locate and plot their survey tracts. Individual transects and related artefact counts were also recorded on paper forms, and land-use data were also recorded for each tract. Artefact record sheets were entered into a linked GIS and relational database system by a dedicated onsite data-entry team (Fig. 3.4). Points defining the individual survey tracts were also digitised at the end of each day and then linked to the database records. Between 1998 and 2001, nearly 4000 tracts covering an area of 42.5 km2 were surveyed and digitised, and nearly 100 new sites were located and documented within the survey area. The use of GIS during the survey seasons enabled the qualitative assessment of new spatial patterning within a few hours of the data having been collected. Newly discovered archaeological sites, artefact distributions and the conditions influencing the patterning of artefacts were then used to inform ongoing project decisions. The archaeological datum itself could also be immediately revisited,
www.ucl.ac.uk/kip.




Fig. 3.3 A small window into the KIP GIS, showing major anthropogenic features that formed the framework for the archaeological data collection. Source: Kythera Island Project. Used with permission.

Fig. 3.4 A section of the KIP survey tract attribute database. Source: Kythera Island Project. Used with permission.

Table 3.1 KIP geospatial datasets

Entity: 10-m contours; spot heights; lJ-m contours; cultural topography; bedrock geology; aerial photographs; satellite imagery; digital photos; elevation; site location; site scatter; ceramic distribution (i); ceramic distribution (ii); geoarchaeology (i); geoarchaeology (ii)

Island; Island; Survey area; Island; Island; Island; Island; Local; Local; Survey area; Local/survey area; Local; Survey area; Subsurvey area; Local

Point; Vector; Vector; Vector; Raster; Raster; Raster; Vector; Vector; Vector; Vector; Vector; Vector; Various; Grid; Grid; n/a; Point; Point

Area: All; All

1:5000; 1:5000; 1:5000; 1:5000; 1:50 000; I I 15000; 20-m resolution; n/a; 1:2000; -1:15 000; -1:15 000; 25-.100-m2 units; <10 000-m2 units; 1:5000; 1:2000

Source: Bevan and Conolly 2004.

checked and confirmed, leading to a very high-quality dataset. Furthermore, the rapid feedback of results to the archaeologists responsible for the data collection had a significant positive effect on team morale. From a management perspective, the KIP GIS also helped with predictions of the time required to complete sections of the survey, and how best to allocate surveyors in the field. These factors were also of enormous value during the field season. The final KIP GIS dataset consists of a broad range of topographic, geological and archaeological data, collected and processed using a variety of techniques (Table 3.1). These data have formed the basis for an ongoing substantive study of the island's historic landscapes, including the study of land-use systems, the relationship between ecological features such as watersheds and prehistoric site placement, and Bronze Age demography and settlement patterns (see Bevan 2003; Bevan and Conolly 2004 for recent examples).

3.4 Spatial and simulation modelling

The term 'spatial modelling' refers to the use of geospatial data to simulate a process, understand a complex relationship, predict an outcome or analyse a problem. For example, regression analysis can be used to ascertain the relationship between two continuous variables, one of which may be spatial. A well-known example of this type of model is the exploration of the relationship between the proportion of a raw material found in an archaeological assemblage and the distance from the source of the raw material. This is often performed to provide insight into the possible processes governing how that material was transported across the landscape. This


Putting GIS to work

Table 3.2 The four scales of measurement

Data type   Description                          Example
Nominal     Descriptive categories               Colours: red, blue, black, etc.
Ordinal     Ranked data                          Relative sizes: small, medium, large, larger
Interval    Continuous data but arbitrary '0'    Calendar dates: 100 BC, AD 50, etc.
Ratio       Continuous data with fixed '0'       Lengths: 13.4, 16.2, 18.1, etc.

form of modelling was popular in the late 1960s through the 1970s, although the optimism expressed by early advocates concerning the correlation of the regression curves with particular forms of exchange systems (e.g. Renfrew et al. 1968; Renfrew and Dixon 1976) was subsequently muted by the simulation studies of Hodder and Orton (1976). Regression analysis nevertheless still plays an important role in understanding the quantitative relationship between variables (Shennan 1988, pp. 114-134), although in a subsequent chapter we will highlight some of the problems that have been identified with the technique when modelling spatial datasets. More complex forms of archaeological spatial modelling can involve establishing the relationship between known site locations and environmental variables (typically things like elevation, slope, aspect and distance to water) to help predict the occurrence of sites in unsampled areas (i.e. 'predictive modelling', as discussed in detail in Chapter 8). Other types of spatial modelling commonly employed by archaeologists include the use of elevation models to understand visibility (Wheatley 1995; Lake and Woodman 2003; Llobera 2003), elevation and terrain data to understand movement across landscapes (Bell et al. 2002), ecological modelling to understand factors related to site visibility and location (Bevan 2003), and quantitative analysis of artefact distributions to model living surfaces (Spikins et al. 2002). Spatial modelling can also include the use of simulation studies to understand human decision making in specific environmental situations (Lake 2000a, b). The range and types of data used in spatial modelling are clearly broad in scope, to the extent that it is difficult to provide general guidelines for how best to approach this type of GIS-based analysis. One common approach to modelling first extracts data from a spatial database in order to explore, clarify and define patterns and relationships (possibly with the aid of separate statistical software).
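The raw-material fall-off analysis described above can be sketched as an ordinary least-squares regression. One common form, which we assume here, regresses the natural logarithm of the percentage of the material on distance from the source (an exponential fall-off). The distances and percentages below are hypothetical values chosen for illustration, not data from Renfrew et al. or any other study, and the function name `least_squares` is our own.

```python
import math


def least_squares(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r


# Hypothetical assemblages: distance (km) from a raw-material source and the
# percentage of that material in each assemblage. Illustrative values only.
distance_km = [5, 12, 20, 35, 50, 80, 120]
percent = [62.0, 41.0, 30.0, 14.0, 9.0, 3.5, 1.2]

# Regress ln(percentage) on distance, i.e. fit an exponential fall-off curve.
a, b, r = least_squares(distance_km, [math.log(p) for p in percent])
print(f"ln(%) = {a:.3f} {b:+.4f} * distance_km   (r = {r:.3f})")
print(f"fitted % at 60 km: {math.exp(a + b * 60):.1f}")
```

As Hodder and Orton's simulations showed, a good fit to such a curve does not by itself identify the exchange mechanism that produced it.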
Examples include: the extraction of site locations as x, y-coordinates in order to analyse their relative aggregation at different scales; logistic regression analysis of grid-based environmental variables against site location; or the calculation of statistical indexes such as Moran's I to search for spatial dependency (autocorrelation) within a dataset. In these examples, the data model is not as important as the scale of measurement, that is nominal, ordinal, interval or ratio data (Table 3.2 and see also Chapter 8), which influences the type of quantitative analyses that can be performed. Interval data provide opportunities for a wide range of robust statistical tests, but informative modelling can still be performed with lower-order data. Shennan (1988, pp. 65-70), for example, provides a simple case of a non-parametric analysis of site location against a nominal variable (soil type) that provides some insight into the processes structuring site location. Chapter 8 reviews a selection of useful statistical methods for testing the presence and strength of relationships in spatial datasets.

A second approach to spatial modelling involves the mathematical manipulation of one or more datasets to produce new data. A number of different types of modelling, including network, visibility, erosion, movement, cost-surface and some forms of ecological modelling, fall into this category. With the exception of network modelling, which depends on vector topology, raster data are often the preferred data format for building spatial models. Elevation data play a particularly important role as a basic dataset on which visibility, movement and ecological models are built; these can then be used to help understand the human use of space. New geospatial datasets can also be created by combining raster datasets using algebraic expressions. Chapters 9 and 10 examine this form of spatial modelling in further detail, and the case studies described below together outline an applied example.

A third approach is dynamic modelling, which explores how phenomena change over time. Certain types of erosion models that use cellular automata fall into this category. Archaeologists are also using agent-based simulation models within GIS systems to predict and explain certain types of past human behaviour and decision making in an actual landscape setting (e.g. Kohler and Gumerman 2000; Lake 2000a). Agent-based models have also been combined with GIS for predicting visitor movement in national parks (Gimblett 2002), and this technique may well prove useful for CRM (Eve 2004).
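A common non-parametric way to test site location against a nominal variable such as soil type is the chi-squared goodness-of-fit test: observed site counts on each soil type are compared with the counts expected if sites were distributed in proportion to the area of each soil. The Python sketch below assumes that formulation. The soil names, area proportions and site counts are hypothetical, and we do not claim it reproduces Shennan's worked example. The critical value 5.991 is the tabulated chi-squared value for 2 degrees of freedom at the 0.05 level.

```python
def chi_square_gof(observed, expected_props):
    """Chi-squared goodness-of-fit statistic, degrees of freedom and expected counts."""
    if abs(sum(expected_props) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in expected_props]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected


# Hypothetical study area: share of the land surface on each soil type,
# and the number of known sites recorded on each soil type.
soil_types = ["soil A", "soil B", "soil C"]
area_proportions = [0.5, 0.3, 0.2]
site_counts = [12, 20, 18]

statistic, df, expected = chi_square_gof(site_counts, area_proportions)

# Tabulated critical value of chi-squared for df = 2 at the 0.05 level.
CRITICAL_DF2_P05 = 5.991

for soil, obs, exp in zip(soil_types, site_counts, expected):
    # The test is usually considered unreliable if expected counts are small (< 5).
    print(f"{soil}: observed {obs}, expected {exp:.1f}")
print(f"chi-squared = {statistic:.2f}, df = {df}")
if statistic > CRITICAL_DF2_P05:
    print("Reject the null hypothesis that site location ignores soil type (p < 0.05).")
else:
    print("No significant association detected at the 0.05 level.")
```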
Integration with simulation modelling is one of the most active, exciting and rapidly developing areas of archaeological GIS. However, building a dynamic simulation in GIS is an advanced topic that falls outside the scope of this book. Interested readers are advised to refer to the two cited volumes above and to review papers in the Journal of Artificial Societies and Social Simulation. Gilbert and Troitzsch (1999) provide a more practical introduction to the simulation of social phenomena, although they do not specifically address the use of GIS.

3.4.1 Case study IV: the modelling of Mesolithic site location and foraging behaviour

Examples of the application of GIS-based spatial modelling can be found in Woodman 2000b, Lake 2000a, b, and Lake and Woodman 2000. These three linked studies investigate prehistoric hunter-gatherer activity on the island of Islay, Scotland, and, respectively, make predictions about site location, explore the behavioural activities that produce sites, and establish the relationship between site location and the amount of the landscape that can be seen from that position (i.e. a site's 'viewshed').



Table 3.3 Variables used in Woodman's (2000b) predictive modelling study

Variable                                    Description
Elevation                                   Derived from 10-m contours
Local relief                                Maximum elevation range in 500-m radius
Shelter quality                             Ordinal-scale measurement of landform shelter based on Kvamme (1985)
Aspect                                      Derivative of elevation
Exposure                                    Derivation of aspect to create categories describing exposure
Angle of view                               Derivation of slope, to create categories describing horizontal angle of view
Distance to water source                    Linear distance to nearest water source
Order of nearest water source               Based on drainage classification using Strahler's method (Strahler 1952)
Distance to modern coast                    Linear distance to high-water mark of modern coastline
Distance to Mesolithic coast                Linear distance to reconstructed coastline
Exposure of modern coast                    Ordinal-scale measurement (exposed, semi-protected to protected)
Exposure of Mesolithic coast                As above
Topographic location of modern coast        Nominal-scale measurement describing setting of site (e.g. bay, headland, linear coastal or inland)
Topographic location of Mesolithic coast    As above

Woodman's study involved subjecting a selection of 14 environmental variables (Table 3.3) to a form of statistical modelling called linear logistic regression analysis in order to ascertain their relationship to site location. This type of analysis is multistaged, and is dependent on a number of presuppositions that first have to be demonstrated rather than assumed, but ultimately results in an equation of the form

$$\mathrm{logit}(p) = \ln\left(\frac{p}{1-p}\right) = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$

where logit(p) represents the log odds of site occurrence, $x_1, x_2, \dots, x_k$ are the values of variables such as elevation, distance to water, etc., and $a$ and $b_1, b_2, \dots, b_k$ are derived numeric values established by the statistical analysis (Menard 2001). This formula can then be used to construct probability maps of site location. In this particular case, by comparing the predictions for site presence against both random and known settlement locations for inland sites, Woodman was able to show that her model could distinguish site locations from random places in the landscape, and thus had value for predicting the likelihood of site occurrence elsewhere on the island.

Lake's (2000a, b) study combined GIS and computer simulation to construct a dynamic model of hunter-gatherer foraging behaviour on Islay. The aim of the exercise was to develop a better understanding of the causal factors underlying the spatial patterning of Mesolithic artefacts located during archaeological fieldwork (Mithen 2000) and to test whether their distribution and composition could be explained in terms of hunter-gatherer decision making that prioritised the


Fig. 3.5 A simulated artefact distribution resulting from one run of an agent-based model. The key shows the number of microliths as a percentage of the maximum number discarded in any map cell. Reproduced from Lake (2000a) with permission; coastline © Crown Copyright, all rights reserved, licence no. 100021184.

acquisition of hazelnuts. The process involved the construction of an agent-based simulation model that provided each agent with a set of goals, decision-making abilities and risk-taking parameters. The agents' environment included a model of hazelnut abundance on the island, which was constructed by combining GIS layers such as measures of land capability with estimated parameters for woodland species' tolerances derived from pollen data and contemporary topographic data such as slope and aspect. Agents were then 'set loose' on this digital landscape to search for and collect hazelnuts, and to share information about the location of hazelnuts. By building in 'tool-discard behaviour', Lake was able to compare the simulated runs of various scenarios (e.g. varying the location of landing and the willingness to take risks in terms of deciding where to move), and compare the results against the known archaeological record (Fig. 3.5). The model of hazelnut foraging behaviour, as predicted by the simulation, showed a poor correspondence with the archaeological record, suggesting that hazelnut foraging was not a major determinant of hunter-gatherer activity on the island. The lack of fit did not mean



that the simulation 'failed', only that this behaviour does not account for the archaeological evidence and, consequently, that some other explanation must be entertained.

Finally, Lake and Woodman (2000) responded to a well-documented behavioural trait of hunter-gatherers concerning the importance of information acquisition, and hypothesised that Mesolithic sites were situated in positions that provided more extensive views across the landscape than the 'average' view from elsewhere on the landscape. To test this idea, they formally developed a null hypothesis that stated that the amount of landscape visible (the viewshed) from known site locations should be statistically indistinguishable from the quantity of landscape visible from all non-site locations. As the calculation of the viewsheds for all non-site locations would involve deriving information from 90 000 individual locations, they used a technique called Monte-Carlo simulation (Chapter 8) to provide estimates of the viewshed parameters for non-site locations. This involved randomly sampling 24 locations from their study area 25 times, and calculating an 'envelope' that provided a statistical range of viewshed characteristics for non-site locations. These values were then compared to the viewshed characteristics of known sites, which were shown to be statistically different. This result was upheld even when sites were compared with non-site locations characterised by similar elevation, slope and aspect values to the sites. The null hypothesis was consequently rejected, allowing them to propose that Mesolithic people on Islay deliberately situated sites in areas that possessed large viewsheds, presumably as part of a strategy for maximising information about their local landscape.

3.5 Conclusion

It should now be clear that the range of tasks and the variety of questions to which GIS can contribute answers are vast. Whether your primary interests are using GIS to help facilitate the management of spatial data collected during an excavation, to predict the probability of site locations in a study area, or to model the movement routes of people across terrain, GIS can play an important role. If you are new to GIS and have been following the chapters sequentially, you should now have a much better idea of how GIS fits into the wider framework of archaeologists' broad interest in the human use of space and landscapes, how GIS works, and how it has been used in the past to help manage and make sense of the archaeological record. The next three chapters look more closely at how one builds a GIS. Firstly, in Chapter 4 we consider data storage and management systems. This is followed by a comprehensive chapter on how datasets can be digitally captured and structured for manipulation and analysis in a GIS environment. Chapter 6 then delves more deeply into the manipulation of spatial datasets, showing how to classify and interpolate data in order to create and structure spatial data for subsequent analysis.

4 The geodatabase

4.1 Introduction

This chapter describes the way that spatial and attribute data are structured and stored for use within a GIS. It provides the necessary information about data models and database design to enable archaeologists unfamiliar with computer databases to make appropriate decisions about how best to construct a system that will work well and efficiently.

A database is a collection of information that is structured and recorded in a consistent manner. A card catalogue that records information about archaeological sites, such as their location and date, is as much a database as a full-fledged web-searchable digital sites and monuments record. Digital databases differ from their paper counterparts mainly in that they are dependent on database software for searching and retrieving records. The complexity of the data structure will also be increased, as digital databases are often broken into several different related files. This reduces the amount of duplicated information in a database, improves access speed and also enables the retrieval of small subsets of data rather than complete records. Software that is used to store, manage and manipulate data is referred to as a Database Management System (DBMS). The objectives of a DBMS are to store and retrieve data records in the most efficient way possible, from the perspective of both the overall size of the database and the speed at which the data can be accessed.

The technology of DBMS is a major research focus in computer science. There is consequently a vast and growing literature on database method and theory, but unless one is interested in designing and writing database programs from scratch it is possible to remain largely ignorant of these details. More usefully, we can briefly note the four specific functions that a DBMS should provide (cf. Burrough and McDonnell 1998, p. 50), namely: (i) quick access to, and the ability to select subsets of, data, potentially by several users at the same time; (ii) a facility for inputting, editing and updating data; (iii) the ability to define and enforce rules to ensure data accuracy and consistency; (iv) the ability to protect data against unintentional or malicious destruction.

A GIS needs to store, manage and retrieve both geographical and attribute data (the combination of which is collectively referred to as a 'geodatabase'). Many GIS programs provide tools for the management of attribute as well as geographical data, although it is also common to use a separate DBMS program, such as Microsoft Access or MySQL, to manage attribute data. The first part of this chapter reviews



Table 4.1 A flat-file database

sherd-id  site id  sherd-class  temper        diameter (mm)  earliest date (BP)  latest date (BP)  area
1         1001     rim          fine sand     88             1200                1100              8045
2         1001     body         fine sand                    1200                1100              8045
3         1001     base         grog                         1200                1100              8045
4         1004     body         vegetable                    900                 700               410
5         1004     rim          vegetable     47             900                 700               410
6         1006     body         fine sand                    1000                800               8900
7         1007     base         coarse sand                  800                 400               7980
8         1007     rim          shell         140            800                 400               7980
9         1009     rim          grog          134            200                 100               1200

the ways that attribute data may be stored and managed in a DBMS, and the second part examines the storage methods for raster and vector datasets.

4.1.1 Data models: from flat file to relational

The simplest form of database consists of a single table of data, in which each column contains a field and each row a record. Tables such as these are referred to as flat-file databases because there is no depth to the data. All the data are stored in one location and there are no linkages between these records and those in other data tables. Flat-file systems do have some advantages over other logical models, as they are very easy to maintain, the data are readily available, and search and find operations are computationally simple. However, they have a major disadvantage which precludes their use for all but the simplest databases: they have no facility for dealing with data that may be redundant. For example, many records in the pottery sherd database shown in Table 4.1 contain duplicated information about the site from which the sherd comes. This introduces a high level of duplication in the data. Furthermore, some data cells are empty: for example, the field diameter is only filled in for rim sherds. The result is both an unnecessarily large and redundant data structure in which a large number of cells either contain repeated data or are null. Flat-file tables are thus only appropriate for the simplest databases, as they soon become unwieldy when information about more than one entity needs to be recorded, as in the above example. An elegant solution to this problem, first defined by E. F. Codd (1970) in what became known as the Relational Model, is to break the database into tables that each contain a coherent package of information. For example,
creating three new tables each containing information on sites, pottery and rims separately reduces the amount of repeated data and blank entries. The disadvantage is that the system must then manage three tables instead of one, so some mechanism is needed to extract information from all tables to answer questions such as 'what different types of temper are found on rim sherds from sites with dates before 1000 BP?'.
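The split described above can be sketched in a few lines of Python using the rows of Table 4.1. The sketch stores each site's attributes once, keeps one pottery record per sherd with the site id as a link, and keeps diameters only for the sherds that have them. We label the final column of four-digit values 'area', following the normalisation discussion below; the function and variable names are our own, hypothetical choices.

```python
# Rows of Table 4.1 as (sherd_id, site_id, sherd_class, temper, diameter_mm,
# earliest_bp, latest_bp, area); None marks an empty cell.
FLAT_FILE = [
    (1, 1001, "rim", "fine sand", 88, 1200, 1100, 8045),
    (2, 1001, "body", "fine sand", None, 1200, 1100, 8045),
    (3, 1001, "base", "grog", None, 1200, 1100, 8045),
    (4, 1004, "body", "vegetable", None, 900, 700, 410),
    (5, 1004, "rim", "vegetable", 47, 900, 700, 410),
    (6, 1006, "body", "fine sand", None, 1000, 800, 8900),
    (7, 1007, "base", "coarse sand", None, 800, 400, 7980),
    (8, 1007, "rim", "shell", 140, 800, 400, 7980),
    (9, 1009, "rim", "grog", 134, 200, 100, 1200),
]


def normalise(rows):
    """Split flat-file rows into 'sites', 'pottery' and 'rim measurements'."""
    sites, pottery, rims = {}, {}, {}
    for sherd_id, site_id, sherd_class, temper, diameter, early, late, area in rows:
        site_attrs = (early, late, area)
        # Site attributes repeat on every sherd row: store them once and check
        # that the repeated copies agree (a mismatch would be a data-entry error).
        if sites.setdefault(site_id, site_attrs) != site_attrs:
            raise ValueError(f"conflicting site data for site {site_id}")
        pottery[sherd_id] = (site_id, sherd_class, temper)  # site_id links to sites
        if diameter is not None:  # only rim sherds carry a diameter
            rims[sherd_id] = diameter
    return sites, pottery, rims


def count_cells(rows):
    """Total cells and empty (None) cells in a list of equal-length rows."""
    total = sum(len(row) for row in rows)
    empty = sum(value is None for row in rows for value in row)
    return total, empty


sites, pottery, rims = normalise(FLAT_FILE)
print("flat file (cells, empty cells):", count_cells(FLAT_FILE))
print("sites:", len(sites), "pottery:", len(pottery), "rim measurements:", len(rims))
```

The site-consistency check hints at a practical benefit of normalisation: once site attributes live in one place, they cannot silently disagree between sherd records.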



The advantages of the relational model are its simple tabular structure, uncomplicated relationships, and its associated powerful and versatile query language, called SQL (pronounced 'sequel'). Although the relational model may seem complex to the uninitiated, in practice it is a simple and versatile way of managing large datasets. The principal component of a relational database is a table of data, referred to as a relation, which consists of a set of records that contain a number of different fields shared by all records. Each record must be distinct from every other via a unique identifier (such as an id number), referred to as the primary key. If no single field contains a unique entry, then two or more fields can be used to define a primary key, in which case it is called a composite key. Tables must also meet certain conditions defined by the relational concept of normalisation (for examples, see Beynon-Davies 1992, pp. 30-42). This means that records with the same repeating values should be placed into separate tables and be defined by their own primary key. The process of breaking a set of data into separate tables and defining the links between them is referred to as normalisation. For example, in Table 4.1, values for site id, early date, late date and area always occur in groups and should therefore be recorded in a separate table (called 'sites'). The rules of normalisation also require that dependent values in the remaining fields are separated; and so, in our example, diameter must also be placed into a different table ('rim measurements') from the sherd-id, sherd class and temper fields, because it is dependent on a sherd class value of 'rim'. In this case, the rims table needs to inherit the sherd-id data so that it is possible to determine which sherd each rim measurement belongs to. The normalised tables and their primary keys are shown in Table 4.2. Relationships between tables are then created by defining links between a primary key and its equivalent field in another table, referred to as a foreign key.
In our example, we therefore need to create a new field in the pottery table called site_id that contains the id number of the site where that sherd was found. This field then acts as the foreign key and allows the primary key (site_id) in the site table to establish a relation with the pottery table. The variable sherd-id, which is present in both the pottery and rim tables, can be used to link those two tables together. The links between tables can take the form of either a one-to-one, one-to-many or many-to-many relationship, depending on the number of records referred to by the primary-key to foreign-key link. In this example, the relationship between 'sites' and 'pottery' is one-to-many (as one site may have many sherds) and the relationship between 'pottery' and 'rim measurements' is one-to-one (as no sherd will have more than one rim record; note that we are not considering the possibility of conjoining sherds or rims here!). For reasons of storage efficiency and relational coherence, many-to-many relationships need to be turned into one-to-many relationships via the construction of intermediary tables. Normalisation is covered in greater detail in a number of accessible publications, including the University of Texas at Austin's Information Technology Service Data Modelling Pages1 and Hernandez (2003).
1 www.utexas.edu/its/windows/database/datamodeling/rm/overview.html.
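To show how the normalised tables answer the question posed earlier, the Python sketch below loads the three tables of Table 4.2 into an in-memory SQLite database (via the standard `sqlite3` module) and joins them with SQL. Column names such as `early_bp` are our own, hypothetical choices. We also assume that 'sites with dates before 1000 BP' means sites whose earliest date is older than 1000 BP (`early_bp > 1000`); other readings would change the WHERE clause. Joining through `rim_measurements` selects rim sherds, because only rims have a row in that table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sites (
    site_id  INTEGER PRIMARY KEY,
    early_bp INTEGER NOT NULL,
    late_bp  INTEGER NOT NULL,
    area     INTEGER
);
CREATE TABLE pottery (
    sherd_id    INTEGER PRIMARY KEY,
    site_id     INTEGER NOT NULL REFERENCES sites (site_id),  -- foreign key
    sherd_class TEXT NOT NULL,
    temper      TEXT NOT NULL
);
CREATE TABLE rim_measurements (
    sherd_id    INTEGER PRIMARY KEY REFERENCES pottery (sherd_id),  -- one-to-one
    diameter_mm INTEGER NOT NULL
);
"""

# Contents of the three normalised tables in Table 4.2.
SITES = [(1001, 1200, 1100, 8045), (1004, 900, 700, 410), (1006, 1000, 800, 8900),
         (1007, 800, 400, 7980), (1009, 200, 100, 1200)]
POTTERY = [(1, 1001, "rim", "fine sand"), (2, 1001, "body", "fine sand"),
           (3, 1001, "base", "grog"), (4, 1004, "body", "vegetable"),
           (5, 1004, "rim", "vegetable"), (6, 1006, "body", "fine sand"),
           (7, 1007, "base", "coarse sand"), (8, 1007, "rim", "shell"),
           (9, 1009, "rim", "grog")]
RIMS = [(1, 88), (5, 47), (8, 140), (9, 134)]


def build_database():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sites VALUES (?, ?, ?, ?)", SITES)
    conn.executemany("INSERT INTO pottery VALUES (?, ?, ?, ?)", POTTERY)
    conn.executemany("INSERT INTO rim_measurements VALUES (?, ?)", RIMS)
    conn.commit()
    return conn


def tempers_on_rims_older_than(conn, threshold_bp):
    """Distinct tempers of rim sherds from sites whose earliest date exceeds threshold_bp."""
    query = """
        SELECT DISTINCT p.temper
        FROM pottery AS p
        JOIN rim_measurements AS r ON r.sherd_id = p.sherd_id
        JOIN sites AS s ON s.site_id = p.site_id
        WHERE s.early_bp > ?
        ORDER BY p.temper
    """
    return [temper for (temper,) in conn.execute(query, (threshold_bp,))]


conn = build_database()
print(tempers_on_rims_older_than(conn, 1000))  # ['fine sand']
```

With these data only site 1001 has an earliest date older than 1000 BP, and its single rim sherd is fine-sand tempered.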


Table 4.2 The three normalised tables: sites (upper), pottery (middle) and rim measurements (lower). Primary keys are defined by a †, the foreign key by a ‡.

site_id†  early date (BP)  late date (BP)  area
1001      1200             1100            8045
1004      900              700             410
1006      1000             800             8900
1007      800              400             7980
1009      200              100             1200

sherd-id†  site_id‡  sherd-class  temper
1          1001      rim          fine sand
2          1001      body         fine sand
3          1001      base         grog
4          1004      body         vegetable
5          1004      rim          vegetable
6          1006      body         fine sand
7          1007      base         coarse sand
8          1007      rim          shell
9          1009      rim          grog

sherd-id†  diameter (mm)
1          88
5          47
8          140
9          134

The retrieval of data from a relational system using SQL involves defining the relations between entities and the conditions under which certain attributes should be selected. This is discussed in further detail in Chapter 7, where spatial data query methods are also reviewed. Some GIS programs, such as ESRI's ArcInfo and ArcView, have embedded DBMSs that allow basic relationships between tables to be defined and queried. While embedded management systems offer good functionality for data retrieval from databases with simple structures, linking a GIS to a database such as PostgreSQL, MySQL, Oracle or Microsoft's SQL Server permits the full use of the relational model and SQL for managing and querying attribute data. Even if the GIS software does provide basic DBMS facilities for managing attribute data, there are three good reasons for using an external DBMS instead: (i) an external DBMS will be able to cope with more complex multi-table data relations; (ii) data retrieval will be faster with large volumes of data; (iii) it will permit more complex queries than the basic search and retrieval functions provided by the many integrated DBMS found in GIS programs. Nearly all GIS offer facilities for linking spatial data objects to attribute data held in an external relational database, often through SQL and database connectivity technologies such as Open DataBase Connectivity (ODBC).2 SQL is covered in more detail in Chapter 7, although it is worth noting that many database systems provide graphical query tools so that the user is not required to write raw SQL.

Beyond the relational model: object-orientated systems

Recent post-relational database theory has seen the development of object-orientated (or OO) databases. Object-orientated databases share some of the philosophy behind OO programming languages such as Java. This includes concepts such as packaging data and functionality (behaviour) together into modular units called objects; the encapsulation of objects so that they can only change their own state, not the state of others in the system; and inheritance, such that new objects may inherit the properties of other existing objects. Object-orientated databases thus differ significantly from relational databases by including information about the behavioural characteristics of entities, which are referred to as their methods. The combined table and attributes, including rules defining their behaviour, is referred to as an object, and similar objects are grouped together in object classes. For example, as applied to GIS, 'burial mounds' and 'causewayed enclosures' may be modelled as objects in an OO data model rather than simply as entities in a layer. In the OO model, information about the entities is encapsulated within the object, whereas in the relational model,
information is obtained via linked records in an attribute table. Embedding the relevant information within the object itself removes the need to search for properties in additional tables, and also provides the possibility of including rules that stipulate the behaviour of the object. The process of retrieving data is similar to the relational model, although in OO databases a derivative of SQL called Object Query Language (OQL) is used. A fully OO GIS that permits a more flexible set of object rules and behaviours is not yet available for the desktop, and has not made a significant impact in archaeology. Commercial OO GIS software includes Smallworld,3 which has been designed for organisations with significant asset-management requirements. Note, however, that the geodatabase structure used in ArcGIS is partially OO, because vector entities do have defined behaviours and rules to, for example, control the form of topological relationships between vector objects.

4.2 Designing a relational database for attribute data

The users of GIS most often become involved in database theory when they need to design a database to provide storage and access to a set of attribute data. For example, if you have been collecting data from an archaeological survey, you may
² http://en.wikipedia.org/wiki/ODBC.


The geodatabase
[Figure: E-R diagram; legible entities include sites (siteid*, county, earlydate, latedate) and sitelocation (siteid*, easting†, northing†).]

Fig. 4.1 An entity-relationship (E-R) diagram (boxes: entities and attributes; arrows: relationships) for a geodatabase. Asterisks indicate unique values (primary keys). † indicates a spatial attribute used for GIS implementation.

need to create a system to store information about the types of environmental and archaeological data that were collected, which can then be linked to a GIS that contains information about the spatial location of, for example, test pits, survey areas, transects and archaeological sites. The design of a database typically consists of four stages: designing a conceptual model, implementing the model, constructing an external interface (or 'front end'), and deciding on the system for the physical storage of data.
Conceptual model
Database design often begins at the level of the conceptual model. This is the most abstract and arguably the most important level of database design, as it concerns the entities (such as sites or artefacts), their attributes (such as size or type), and their relationships with other entities (such as between artefacts and sites). The term 'entity' in this sense refers to any phenomenon, physical or abstract, that can be described and distinguished from other phenomena (Jones 1997, p. 164). For example, 'artefacts' and 'archaeological sites' are two different types of entities. When designing an attribute database, a useful first step is to make a list of entities, their attributes and the nature of the relationships between them. This can then be modelled using an entity-relationship (E-R) diagram (Chen 1976). The example provided in Fig. 4.1 shows the relationships between the entities discussed previously: 'sites', 'pottery' and 'rim measurements'. Note that in this instance a fourth relation has been added that contains the spatial locations of the sites, which would usually be stored in a GIS.

Implementation
Following the conceptual design, the next stage is typically to populate the database with a sample of information to ensure that the relations function in the way that is intended, and that it is possible to extract information in the combinations required for subsequent analysis (e.g. using the same example as above, is it possible to extract the locations of sites that possess sand-tempered pottery?).

External interface
Assuming the database works, a common next step is to create the external interface, although with smaller single-user systems this may not be necessary. In larger multiuser systems a 'front end' is desirable, as it provides users with a set of tools to enter and search for data. In addition, as there may be several different categories of user, the external model can be customised to present only the necessary subsets of data to a particular user.
For example, the external level of a sites and monuments record database could be designed so that the data



seen by archaeologists and municipal planners are organised differently to the data made available to members of the public. It is at this level that an entire database, or sections of it, can be 'locked' so that data cannot be seen or modified by non-administrative users. Facilities for speeding up data entry and reducing input error can also be introduced at the external level, which might include drop-down menus or 'radio buttons' offering simple yes/no options for data entry. Some database packages provide tools for the design of front-end 'forms', and many provide tools that enable the use of web browsers to view, query and edit data.

Physical storage
Some decisions can be made by users regarding where and how information should be stored on hard-disks. GIS and database programmers expend considerable energy on increasing the efficiency of their programs by designing them so that the information they store can be quickly retrieved by the computer. For database end users, the specifics of the physical model (such as how often a database writes to a disk, how it allocates virtual memory, and how it keeps track of changes to a database) may remain hidden, but the speed of the database and its ability to process requests for information efficiently are ultimately linked to how the database deals with the physical storage of data. A small desktop GIS will probably store data on the same physical device (hard-disk) as the GIS software, but data for large, multiuser, networked systems will likely be spread over many disks and, to prevent data loss in the event of disk failure, duplicated in what is known as a redundant array of inexpensive disks (RAID). If you are designing a database for more than one user, then it is worth investigating network storage and RAID systems in more detail; an afternoon's research on the Internet will provide ample information about available options.
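The conceptual model, the implementation test and the external level described above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions, not the authors' schema: the `pottery` table and its `fabric` column are hypothetical, rim measurements are omitted, and every row of data is invented. The `public_sites` view only imitates the idea of an external level that presents a subset of the data (here, without coordinates); SQLite has no user accounts, so genuinely 'locking' data would rely on the permission system of a multiuser DBMS.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sites (
    siteid    INTEGER PRIMARY KEY,
    county    TEXT,
    earlydate INTEGER,
    latedate  INTEGER
);
-- Spatial attributes kept in their own relation, as in the E-R diagram
CREATE TABLE sitelocation (
    siteid   INTEGER PRIMARY KEY REFERENCES sites(siteid),
    easting  REAL,
    northing REAL
);
-- Hypothetical pottery relation: one row per sherd, many sherds per site
CREATE TABLE pottery (
    sherdid INTEGER PRIMARY KEY,
    siteid  INTEGER REFERENCES sites(siteid),
    fabric  TEXT
);
-- External level: a public view that leaves out site coordinates
CREATE VIEW public_sites AS
    SELECT siteid, county, earlydate, latedate FROM sites;
"""

def build_demo_db():
    """Create an in-memory database populated with invented example rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO sites VALUES (?, ?, ?, ?)",
                    [(1, "Kent", -3000, -2500), (2, "Essex", -2200, -1800),
                     (3, "Kent", -1500, -800)])
    con.executemany("INSERT INTO sitelocation VALUES (?, ?, ?)",
                    [(1, 612300.0, 155400.0), (2, 571200.0, 211800.0),
                     (3, 598700.0, 140100.0)])
    con.executemany("INSERT INTO pottery VALUES (?, ?, ?)",
                    [(1, 1, "sand-tempered"), (2, 1, "flint-tempered"),
                     (3, 2, "shell-tempered"), (4, 3, "sand-tempered"),
                     (5, 3, "sand-tempered")])
    con.commit()
    return con

def locations_with_fabric(con, fabric):
    """Implementation test: locations of sites that yielded pottery of a given fabric."""
    sql = """
        SELECT DISTINCT l.siteid, l.easting, l.northing
        FROM sitelocation AS l
        JOIN pottery AS p ON p.siteid = l.siteid
        WHERE p.fabric = ?
        ORDER BY l.siteid
    """
    return con.execute(sql, (fabric,)).fetchall()

if __name__ == "__main__":
    con = build_demo_db()
    print(locations_with_fabric(con, "sand-tempered"))
    print(con.execute("SELECT * FROM public_sites").fetchall())
```

If the join returns the expected sites, the relations work as intended; if it cannot be written at all, the conceptual model usually needs another look before any front end is built.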

4.3 Spatial data storage and management
As with attribute data, a GIS also requires a database for the storage and manipulation of spatial objects. However, as the data structures for spatial objects are embedded into the GIS program and cannot themselves be manipulated, the spatial DBMS remains largely 'behind the scenes'. Although it will do its job without complaint or need of user intervention, it is nevertheless worth briefly reviewing the different data structures used to store and manipulate vector and raster datasets in order to reinforce the primary importance of the DBMS in GIS.

4.3.1 Vector data storage
In the vector data structure, all objects, whether points, lines or polygons, are represented using one or more x,y-coordinate pairs. A set of points, for example, is recorded as a list of coordinate pairs, each of which is given an identification number. Polylines may be stored as a sequence of vertices terminated by nodes and are also given identification numbers. Polygons can be represented in the same manner as polylines, but their final node must be the same as their first node in order to close the loop. This method of data storage is used by CAD and some early GIS programs (e.g. ArcView 3.2), but it is not very efficient because it does not take into account the topological relationships between vector objects. When two polygons share a common boundary, for example, the vertices of the common boundary need to be recorded for each polygon, which results in a very data-hungry system. Furthermore,



[Figure: three adjacent polygons and their arc table, with columns for arc, start node, end node, vertices, left polygon and right polygon.]

Fig. 4.2 Three topologically related polygons and a simple arc-node storage structure.

there is no way to store relationships, such as adjacency or overlap, between vector objects, so they must be computed at need. A more efficient solution uses a relational model for data organisation and is referred to as an arc-node data structure (Fig. 4.2). This method structures, stores and references data in a relational DBMS so that points (nodes) construct polylines (arcs) and polylines construct polygons. The directionality of the arcs is recorded in this storage method by a field that defines the beginning and end nodes plus the vertices in between. Arcs that are used to define polygon boundaries also record which polygons lie on the left and right of the arc. This enables adjacency queries on polygons to be answered quickly with reference to the arc table rather than by a computationally more complex spatial query method. Any spatial queries performed on these objects can then be answered by reference to these tables. For example, questions such as 'what is the area of polygon III?' can be answered by using the vertices that define arcs 5, 2 and 6 to build triangles, for which areas can be calculated, the sum of which provides the polygon's area. Topological queries, such as 'which polygons are adjacent to polygon IV?', can be answered by finding the arcs that define polygon IV, then using the left and right polygon fields for those arcs to define adjacency to other polygons (i.e. polygons II and III). Although the arc-node structure is arguably more efficient, it can limit flexibility, so some recent GIS databases, such as that used in ArcGIS, have moved to a different topological structure managed by what is called a Spatial Database Engine. In this system, the arc-node structure for each object is stored separately in what is termed a Feature Class object. Topological relationships are thus computed at need rather than stored as an inherent part of a spatial database, although some forms of relationships can be predefined and stored in Feature Datasets so that, for example, changes in the boundary of one object will automatically result in



Fig. 4.3 A 10 × 10 raster grid with cell values overlaid.

changes to those others that share the same boundary. This permits a high degree of flexibility for defining relationships between spatial entities, but is also computationally more intensive and requires additional memory for recording the structures and relationships between spatial objects. In Chapter 7 we discuss the mechanics of a broader range of spatial queries that depend on topological relationships.

4.3.2 Raster data storage
Consider the raster map in Fig. 4.3. A simple method for storing such a grid of n rows by m columns is to record the x,y-coordinate pair of each cell together with its associated attribute value. This, for obvious reasons, is neither very practical nor efficient: there is no mechanism for recording the size of cells, and large matrices will have massive memory overheads because every pixel requires three pieces of data (i.e. its x,y-coordinate and its value). Advances in raster storage systems in the 1980s and early 1990s saw the development of much more efficient methods of storing raster data (Burrough and McDonnell 1998, pp. 51–57). All but the most basic raster storage systems now possess a header, which defines the number of rows and columns, the cell size, the geodetic datum, the projection and reference system, and the location of a corner pixel, followed by a sequence of cell values (Fig. 4.4). In some GIS programs, such as Idrisi, the header information is stored as a separate file that shares a name with the file that stores cell values. Using a header results in more efficient data storage because the cell coordinates need not be stored: each cell's position can be computed from its row and column, the cell size and the corner location. Large files can still occur, however, with data consisting of long real number strings, as is the case with elevation data (Burrough and McDonnell 1998, p. 51). One method to reduce file size that works well for data that have some degree of positive spatial autocorrelation (Chapter 6) is to record the differences between one cell and the


columns: 10
rows: 10
cell size: 20
minimum x: 320
minimum y: 153
reference system: metres

1 1 1 1 1 2 2 2 2 2
1 1 1 1 1 2 2 2 2 2
1 1 1 1 2 2 2 2 2 2
1 1 1 1 2 2 2 2 2 2
1 1 1 1 2 2 2 2 2 2
1 1 1 3 2 2 2 2 2 2
1 1 3 3 3 3 2 2 2 2
1 1 3 3 3 3 3 3 3 3
1 1 1 3 3 3 3 3 3 3
1 1 1 3 3 3 3 3 3 3

Fig. 4.4 A typical raster storage file.

[First array: the uncompressed cell values of Fig. 4.4.]

Second array (RLC; each value is followed by the length of its run):
1 5 2 5
1 5 2 5
1 4 2 6
1 4 2 6
1 4 2 6
1 3 3 1 2 6
1 2 3 4 2 4
1 2 3 8
1 3 3 7
1 3 3 7

Fig. 4.5 An example of raster file compression using the RLC method.

next. In this way an elevation matrix that contained the sequence {1021.34, 1022.01, 1023.00} could be recorded as {1021.34, 0.67, 0.99}, which reduces the number of stored digits in the raster file. Methods that rely on a form of data compression may also offer solutions for particular types of data. For example, an array of numbers can be compressed using a system of run length compression or RLC. This involves replacing each run of a repeating number with that number and an integer that defines how many times it occurs in sequence. RLC would reduce the data array in Fig. 4.4 to the second array shown in Fig. 4.5. Other forms of compression use regular shapes such as squares, rectangles, triangles or hexagons that tessellate in a regular pattern; these are then assigned to areas of uniform value, so that only the size and location of each shape and the shared cell value need to be recorded. When square blocks are used the process is often referred to as quadtree data encoding. Like the more common RLC method, each may offer some space savings at the expense of computational time. In practice, a user need not be overly concerned with the specifics of the raster compression system used in any given GIS, as most are able to both import and export non-compressed raster data. The specific format and headers used to read non-compressed data are often particular to an individual GIS, although these details will be included in the software's documentation. Readers interested in additional information on raster data storage and compression are referred to Burrough and McDonnell (1998, pp. 51–57).
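The two space-saving ideas in the preceding paragraph, storing differences between neighbouring cells and run length compression, can be sketched in a few lines of Python. This is an illustrative sketch, not the storage format of any particular GIS. Elevations are held as integer centimetres (an assumption made here so that the differences are exact; the text's example is in metres), and `GRID` holds the cell values of Fig. 4.4 read row by row from the top. Because every row of that grid begins with 1 and ends with 2 or 3, no run spans two rows, so encoding the whole array in row-major order gives the same sequence as Fig. 4.5.

```python
from itertools import accumulate

# Cell values of the 10 x 10 grid in Fig. 4.4, row by row from the top
GRID = [
    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    [1, 1, 1, 3, 2, 2, 2, 2, 2, 2],
    [1, 1, 3, 3, 3, 3, 2, 2, 2, 2],
    [1, 1, 3, 3, 3, 3, 3, 3, 3, 3],
    [1, 1, 1, 3, 3, 3, 3, 3, 3, 3],
    [1, 1, 1, 3, 3, 3, 3, 3, 3, 3],
]

def delta_encode(values):
    """Store the first value, then each value as its difference from the previous one."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))

def rlc_encode(cells):
    """Run length compression: a flat list of (value, run length) pairs."""
    out = []
    for v in cells:
        if out and out[-2] == v:
            out[-1] += 1          # extend the current run
        else:
            out.extend([v, 1])    # start a new run
    return out

def rlc_decode(pairs):
    """Expand (value, run length) pairs back into the original cell sequence."""
    cells = []
    for value, count in zip(pairs[0::2], pairs[1::2]):
        cells.extend([value] * count)
    return cells

if __name__ == "__main__":
    # 1021.34, 1022.01 and 1023.00 m expressed as centimetres
    print(delta_encode([102134, 102201, 102300]))   # [102134, 67, 99]
    flat = [v for row in GRID for v in row]         # row-major order
    packed = rlc_encode(flat)
    print(len(flat), "cells ->", len(packed), "numbers")  # 100 cells -> 44 numbers
```

RLC pays off here because the grid is strongly autocorrelated; a grid in which neighbouring values rarely repeat would roughly double in size, since every cell would become a (value, 1) pair.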

Spatial data acquisition

5.1 Introduction
This chapter examines the different ways in which spatial datasets are acquired and structured to take advantage of the visualisation and analytical abilities of GIS. It is conventional to distinguish between primary and secondary data sources, because acquisition methods, data formats and structuring processes differ considerably between the two. Primary data consist of measurements or information collected from field observations, survey, excavation and remote sensing. Secondary data refer to information that has already been processed and interpreted, available most often as paper or digital maps. Many users of GIS wish to integrate primary and secondary datasets (for example, to plot the location of primary survey data across an elevation model obtained from a data supplier). Both types of data have advantages and disadvantages, which this chapter examines in some detail. By the end of this chapter you will be familiar with the ways in which both primary and secondary data are obtained, and with the issues and procedures for assessing the quality of combined datasets.

5.2 Primary geospatial data
Primary, or 'raw', geospatial data have not been significantly processed or transformed since the information was first captured. Archaeologists generate vast quantities of primary data during excavation and survey, such as the location of settlements, features and artefacts, geoarchaeological and palaeo-environmental data, and the location of raw material sources within the landscape. Raw data may also be available from databases of information compiled by other agencies: the location of archaeological sites, for example, can be obtained from Sites and Monuments Records and published site 'gazetteers'. Primary data may also be extracted from remote sensing sources such as aerial photographs, satellite imagery and geophysical survey. This section reviews the two major sources of primary data, namely survey and remote sensing.

5.2.1 Surveying technologies
Digital recording is increasingly common on archaeological surveys and excavations, and has been greatly simplified in recent years by tablet and handheld computers connected (perhaps wirelessly) to digital recording equipment. This accelerates recording procedures and reduces the probability of errors being introduced in the transfer between paper and digital formats. Key spatial technologies in this regard




Fig. 5.1 A total station consisting of a targetless laser distance measurer and an electronic theodolite.

are global positioning system (GPS) receivers and 'self-registering tachometers' or total stations. Although both pieces of equipment are expensive, the time savings they offer help to offset their rental or purchase costs.

Total station survey
A total station is a piece of electronic survey equipment that is able to record horizontal and vertical angles and linear distances from itself to a target (using infrared or laser light) and then automatically convert these data into eastings, northings and elevation values via trigonometric formulae (Fig. 5.1). Total stations have largely replaced optical theodolites, but the latter are still in use in situations where a total station's cost or dependence on electricity causes difficulties. It is possible to establish the position of the total station and all newly measured points with reference to a known coordinate system, provided that two or more points with known coordinates (i.e. 'benchmarks') are locally available. Alternatively, a floating local grid independent of any national or global system may be established. The local grid may then later be 'tied in' to a national grid, for example by using a GPS to establish the reference. Coordinate data captured by a total station are either stored on an internal memory card or data-logger, which can then be downloaded to a computer, or the total station may be connected to a computer so that the results of the survey can be immediately visualised and annotated in a program such as ESRI's Survey Analyst or Leica's GIS DataPRO software packages. If the data are stored on a data-logger it will



export a text file that lists the name or code of each object alongside the easting (x), northing (y) and elevation (z) that define the object's relative or absolute location. These data can then be imported into a GIS program.

Global positioning system survey
The GPS is a satellite navigation system used to provide precise locations to receivers on the Earth's surface. It was initially developed, and is currently maintained, by the US military and first started operation in 1978. The current system consists of 29 satellites launched between 1990 and 2004, each of which circles the Earth twice in a 24-h period at an altitude of approximately 20,100 km. Each satellite continuously broadcasts a time signal from its internal atomic clock. The satellites broadcast two signals, called L1 and L2, the former of which contains a coarse acquisition code (C/A code) that is publicly accessible, whereas the latter contains a more accurate protected (P) code with access rights controlled by the US Department of Defense. Until May 2000 the C/A code was intentionally degraded; now that this is no longer the case, basic GPS accuracy has improved from approximately 50 m to under 20 m.

A GPS receiver unit, which can range from a small inexpensive handheld unit to an expensive permanent base station, has its own internal clock, from which it is able to ascertain the time it takes for each satellite's signal to travel to the receiver, and thus the distance from the receiver to each satellite. In theory, three satellites are sufficient to provide a three-dimensional location (e.g. latitude, longitude and elevation) on the Earth's surface by establishing the position of the intersection of the distances between the receiver and each satellite. In reality, the internal clocks in consumer GPS receivers are not accurate enough to establish the time differentials at the level of precision required for useful three-dimensional measurements, so an additional (fourth) satellite signal is needed.
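Why a fourth satellite is needed can be seen by treating the receiver's clock error as a fourth unknown alongside x, y and z: because signals travel at the speed of light, each microsecond of clock error adds about 300 m to every measured range. The Python sketch below solves for all four unknowns by iterative (Gauss-Newton) least squares, one standard textbook approach, not necessarily what any particular receiver does. Everything here is an assumption for illustration: satellite positions are synthetic Earth-centred Cartesian coordinates in metres, the Earth is a sphere of radius 6,371 km, and atmospheric and orbital errors are ignored.

```python
import numpy as np

C = 299_792_458.0  # speed of light in a vacuum (m/s)

def solve_position(sat_xyz, pseudoranges, max_iter=20, tol=1e-6):
    """Estimate receiver position (m) and clock bias (s) from four or more satellites.

    sat_xyz: (n, 3) satellite positions in Earth-centred Cartesian metres.
    pseudoranges: (n,) measured ranges in metres, each inflated by the same
    unknown receiver clock error (speed of light x clock bias).
    """
    sat_xyz = np.asarray(sat_xyz, dtype=float)
    pseudoranges = np.asarray(pseudoranges, dtype=float)
    if len(pseudoranges) < 4:
        raise ValueError("need at least four satellites: x, y, z and clock bias are unknown")
    est = np.zeros(4)  # start at the Earth's centre with zero clock bias (in metres)
    for _ in range(max_iter):
        vec = sat_xyz - est[:3]                    # receiver-to-satellite vectors
        geom = np.linalg.norm(vec, axis=1)         # geometric distances
        residual = pseudoranges - (geom + est[3])  # measured minus predicted
        # Partial derivatives of each predicted pseudorange w.r.t. (x, y, z, bias)
        jac = np.hstack([-vec / geom[:, None], np.ones((len(geom), 1))])
        step, *_ = np.linalg.lstsq(jac, residual, rcond=None)
        est += step
        if np.linalg.norm(step) < tol:
            break
    return est[:3], est[3] / C

if __name__ == "__main__":
    R = 6_371_000.0       # mean Earth radius (m), spherical approximation
    ALT = 20_100_000.0    # orbital altitude quoted in the text (m)
    dirs = np.array([[1, 0, 0], [0.8, 0.6, 0], [0.8, -0.6, 0],
                     [0.8, 0, 0.6], [0.8, 0, -0.6]], dtype=float)
    sats = (R + ALT) * dirs / np.linalg.norm(dirs, axis=1)[:, None]
    truth = np.array([R, 0.0, 0.0])           # synthetic receiver on the equator
    bias = 1e-4                               # a 0.1 ms clock error (about 30 km of range)
    pr = np.linalg.norm(sats - truth, axis=1) + C * bias
    xyz, b = solve_position(sats, pr)
    print(np.round(xyz, 3), b)
```

With only three satellites the system has more unknowns than equations, which is the practical reason the text gives for needing a fourth signal.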
In practice, a GPS receiver must often estimate the position of the intersection because of vagaries in its estimate of local time and atmospheric interference in signals from satellites lying at low angles in the sky. Correction mechanisms in the GPS receiver and additional data broadcast by the satellites are used to minimise these sources of error. Higher-end receivers are able to reduce error even further by analysing the signal code frequencies to estimate the atmospheric distortion. Important sources of correction are also obtained from additional satellite systems, such as the Wide Area Augmentation System (WAAS) and the European Geostationary Navigation Overlay Service (EGNOS). These separately maintained systems consist of networks of ground stations in, respectively, the USA and Europe, which transmit information on the reliability and accuracy of the GPS signals to satellites that WAAS- or EGNOS-enabled receivers can read.

Inexpensive navigation-grade handheld GPS receivers are suitable for collecting locational information about features in the landscape to within 10–20 m of their true location. It is increasingly common for inexpensive receivers to be WAAS/EGNOS enabled, providing locations within ±3 m of their true location in the



USA or Europe. This level of error is acceptable for larger phenomena such as archaeological sites or monuments, but is less suitable for recording local topography or smaller-scale phenomena like artefact locations. For applications that require greater precision and accuracy, a system that provides differential correction between the estimated and actual location of the receiver is needed (hence differential GPS or DGPS). Mapping-grade receivers, with accuracies between a fraction of a metre and 5 m, are suitable for archaeological mapping (e.g. site survey and larger-scale mapping up to 1:2500) and receive corrections from other satellites or from radio beacons. More expensive survey-grade systems accurate to the centimetre level consist of a roving unit and a base station (Fig. 5.2). The base station is set up at a location for which the precise coordinates are already known, or are first estimated by averaging thousands of readings taken over several hours. Once the base station's actual location is reasonably well established, it begins recording the constantly changing differences between its actual location and its fluctuating real-time location as derived from the satellites. These fluctuating differences are time-stamped. Data collected by the roving unit are similarly time-stamped, so that calculated coordinates can be adjusted by the appropriate correction factor determined by the base station, either in real time (real-time kinematic or RTK GPS) or in post-survey processing (static DGPS). Positional accuracy is thereby increased to the centimetre level or better, allowing GPS to be used for higher-precision survey work. Survey coordinates obtained by RTK GPS are stored in the roving unit and then downloaded to a computer as a set of coordinates and feature codes, or downloaded directly to a portable (often a weatherproofed tablet-style) computer. As with all GPS technology, however, if the receivers are unable to communicate with the satellites then they are unable to record locations.
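The time-stamped correction described above can be sketched in Python. This is a deliberately simplified, hypothetical illustration: real DGPS and RTK systems correct the range measurements to individual satellites (RTK also uses carrier-phase data) rather than finished coordinates, and the `Fix` record, the function names and all numbers below are invented for the example. It assumes that base and rover report positions in the same projected grid (metres) and share exact time stamps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fix:
    """One time-stamped position in a projected grid (metres); hypothetical record."""
    t: int
    easting: float
    northing: float
    elevation: float

def base_corrections(known, base_fixes):
    """Map each time stamp to (known base position - satellite-derived base position)."""
    ke, kn, kz = known
    return {f.t: (ke - f.easting, kn - f.northing, kz - f.elevation) for f in base_fixes}

def correct_rover(rover_fixes, corrections):
    """Shift each rover fix by the base-station correction with the same time stamp."""
    corrected = []
    for f in rover_fixes:
        de, dn, dz = corrections[f.t]  # raises KeyError if the base has no matching record
        corrected.append(Fix(f.t, f.easting + de, f.northing + dn, f.elevation + dz))
    return corrected

if __name__ == "__main__":
    base_known = (1000.0, 2000.0, 50.0)
    base = [Fix(0, 1001.5, 1998.0, 51.0), Fix(1, 999.25, 2001.0, 49.5)]
    rover = [Fix(0, 1501.5, 2498.0, 56.0), Fix(1, 1499.25, 2501.0, 54.5)]
    for fix in correct_rover(rover, base_corrections(base_known, base)):
        print(fix)
```

The idea relies on base and rover being close enough to suffer the same errors at the same moment, which is why matching time stamps matters: a correction from one second cannot safely be applied to a rover fix from another.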
This reduces their usefulness in situations where there is dense overhead vegetation. Global positioning system survey technology is therefore not suitable for all applications and should be viewed simply as one of several potential methods for spatial data acquisition. GPS equipment can appear fairly intimidating to the uninitiated, although in reality it is a fairly simple technology to use, and in appropriate conditions is often far faster than using a total station.

5.2.2 Remote sensing
The term remote sensing (RS) refers to the collection and interpretation of information about phenomena without physical contact. Our eyes and visual cortex, for example, constitute a type of remote-sensing device that is extremely good at detecting and making sense of differential absorption and reflection in electromagnetic wavelengths between approximately 700 nanometres (nm) and 400 nm (with a corresponding wave frequency between 4 and 7.5 × 10¹⁴ Hz), which we perceive as visible light (Fig. 5.3). More customarily, however, RS refers specifically to the art and science of acquiring and interpreting information about objects and phenomena by measuring their responses to electromagnetic radiation, collected using sensors mounted on anything from ground vehicles to balloons, aircraft and


Fig. 5.2 A differential GPS (DGPS), which typically provides three-dimensional accuracy to ±3 cm or better.


[Figure: the electromagnetic spectrum, with wavelength (µm) and frequency (Hz) scales.]

Fig. 5.3 Wave frequencies (in Hz) and lengths (in µm) of the electromagnetic spectrum.

spacecraft. Remote-sensing systems have long been used by archaeologists to collect information about the structure of landscapes, the natural environment, and the location and configuration of archaeological sites and features. The decreasing cost and increasing availability, resolution and versatility of satellite imagery have begun to make a significant impact in archaeology, although the full potential of this rich dataset for expanding our understanding of long-term landscape and environmental history has not yet been fully explored. Although dedicated RS software is needed for sophisticated forms of image correction, enhancement and analysis, many GIS programs provide a good range of tools for the analysis of image data. Moreover, by allowing images to be used as thematic layers with other geospatial information, GIS provides an interpretative framework for collating, analysing and interpreting RS data about the contemporary and past landscape.

Types of imagery
Remote-sensing image data can be divided into two types, photographic and digital, depending on whether the information is captured using sensors that record electromagnetic responses as visual or digital data. Digital images can be further subdivided into panchromatic or multispectral, depending on the breadth of electromagnetic radiation they are able to capture. Aerial photographs are frequently used in archaeological survey and, more recently, high-resolution images from photographic sensors on former spy satellites have been used by the archaeological community. Photographic sensors typically capture high-resolution images of the Earth's surface using colour, panchromatic or infrared film. In general terms, the value of any photograph is primarily a function of the area covered by the photograph and the resolution of the image. This in turn is a function of the altitude of the aircraft (or spacecraft), the focal length of the camera lens, the sensitivity of the lens and the properties of the film (Jones 1997, p. 99).
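How altitude and focal length combine can be made concrete with the standard photogrammetric relation for a vertical photograph over flat terrain: scale = f / (H − h), where f is the focal length, H the flying height above the datum and h the terrain elevation. The relation is general photogrammetry rather than something stated in this chapter, and the 152 mm focal length and 230 mm frame used below are typical aerial-camera values assumed for illustration. The sketch ignores tilt and relief displacement, so it does not apply to oblique photographs.

```python
def photo_scale_denominator(focal_length_mm, flying_height_m, terrain_height_m=0.0):
    """Return D in the representative fraction 1:D for a vertical photograph.

    Uses scale = f / (H - h); heights in metres, focal length in millimetres.
    """
    height_above_ground_m = flying_height_m - terrain_height_m
    if height_above_ground_m <= 0:
        raise ValueError("aircraft must be above the terrain")
    # (metres * 1000) / millimetres keeps both lengths in the same unit
    return height_above_ground_m * 1000.0 / focal_length_mm

def ground_coverage_m(frame_side_mm, scale_denominator):
    """Ground distance (m) covered by one side of a square photographic frame."""
    return frame_side_mm * scale_denominator / 1000.0

if __name__ == "__main__":
    # Aircraft at 1620 m above sea level over terrain at 100 m, 152 mm lens
    d = photo_scale_denominator(152, 1620, 100)
    print(f"1:{d:,.0f}")                          # 1:10,000
    print(ground_coverage_m(230, d), "m per side")  # 2300.0 m per side
```

Doubling the height above ground halves the scale (1:20,000 in this example) and quadruples the area covered, so each archaeological feature occupies fewer grains of film; this is the trade-off between coverage and resolution noted above.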
Digital sensors record the way that natural and anthropogenic phenomena such as water, soil, leafy vegetation, roads and buildings respond to different parts of

[Figure panels: (a) Landscape; (b) Digital 'image' (a grid of pixel values).]

Fig. 5.4 A hypothetical landscape consisting of three distinct ecological zones (a), and the pixel values as recorded by a digital sensor (b). Note that the pixel values express an average for the area they cover, and only one band of several potential bands is shown.

the electromagnetic spectrum, and record this information as a sequence of bytes,¹ where each byte corresponds to a pixel value (between 0 and 255; Fig. 5.4). The major advantage of digital sensors is that, unlike optical photographic devices, they record information across a broad range of visible and non-visible wavelengths, and they are thus referred to as multispectral scanners. Typical resolutions for multispectral satellite systems are in the range of 10 to 40 m. This obviously restricts the use of digital imagery to investigating phenomena that exist at much larger scales than can be detected with aerial photographs. However, recent digital imaging systems, such as Space Imaging's IKONOS² and DigitalGlobe's Quickbird,³ can achieve panchromatic resolutions of 1 m or less (Table 5.1), which is equivalent to optical sensors on former spy satellites.

Photographic imagery
Aerial photographs are normally classified as vertical or oblique. Vertical photographs are those taken with the camera axis as vertical as possible. These are often used for photogrammetry (the derivation of spatial measurements from photographs), which has long been profitably employed by archaeologists (e.g. Sterud and Pratt 1975; Poulter and Kerslake 1997; Bewley and Raczkowski 2002). When vertical aerial photographs are taken in stereoscopic pairs
¹ Satellites transmit data to ground stations in streams of binary data composed of bits, which have only two possible values, 0 or 1. As bits cannot individually represent anything more complex than on/off, true/false, 0/1, etc., they are often grouped together in groups of eight, called 'bytes'. Bytes offer considerably more flexibility than bits, having 256 (i.e. 2⁸) potential values, and images stored with one byte per pixel are referred to as 8-bit images.
² www.spaceimaging.com.
³ www.eurimage.com/products/quickbird.html.


Table 5.1 Major sources of digital satellite imagery
[Table: sensor type and resolution (m) for SPOT 5, Landsat-7 ETM+, IKONOS, Quickbird, Radarsat and SIR-C/X-SAR; types range from 4- and 7-band multispectral with panchromatic to radar.]
ᵃ Multispectral. ᵇ Panchromatic.

and viewed with a stereoscopic reader, or displayed using specialised photogrammetric software, they can be used to map three-dimensional topography. Vertical photographs are usually obtained using specialised cameras and dedicated aircraft, and they are available (often for a fee) from the agencies that acquired them. For GIS and/or mapping purposes, vertical or near-vertical photographs are preferable to oblique photographs, as they are subject to less spatial distortion. Oblique aerial photographs can still be used to extract useful information, but they will need to be rectified (see below) and their level of accuracy for mapping and analysis will be less than that obtainable from vertical images. However, as archaeologists are often obliged to use oblique photographs, specialised programs have been developed to help rectify them so that they can be used for mapping. The program AirPhoto, included with the Bonn Archaeological Statistics Package (BASP),⁴ is one example that has been designed specifically for archaeological purposes.

A combination of shadow, vegetation, soil patterns and texture influences the value of aerial photography in detecting or delineating archaeological remains. Visibility can, therefore, change dramatically depending on the time of day and the season of the year in which the photograph is taken. The ability to detect features can also be influenced by the type of film used. Black-and-white film has an established pedigree in archaeology, but it is being superseded by increasing use of colour aerial photography and colour infrared photography (CIR; e.g. McKee et al. 1994). As aircraft fly significantly closer to the Earth's surface than satellites, aerial photography will always be available at higher resolutions. However, recently declassified intelligence satellite imagery can approach the spatial resolution of aerial photography and is also suitable for distinguishing landscape features. Former spy satellites such as the Russian KVR-1000 and US CORONA systems,
⁴ www.uni-koeln.de/~al001/basp.html.

Table 5.2 NASA Landsat-7 ETM+ bands

Band  Wavelength (µm)  Spectral region     Resolution (m)
1     0.45–0.52        Visible blue        30
2     0.53–0.61        Visible green       30
3     0.63–0.69        Visible red         30
4     0.75–0.90        Near infrared       30
5     1.55–1.75        Mid infrared        30
6     10.40–12.50      Thermal infrared    60
7     2.08–2.35        Mid infrared        30
8     0.52–0.90        Panchromatic        15

Source: Jensen (2002).

for example, take panchromatic photographic images with a spatial resolution of between 1.5 and 3 m, which is sufficient to detect archaeologically relevant features (Fowler 1996; Philip et al. 2002; Ur 2003).

Digital sensors
Digital sensors are conventionally divided into two types, passive and active, depending on whether the instrumentation is responsible for producing the energy that is reflected off the phenomena being recorded (Table 5.1). Active systems record the reflected response to transmitted electromagnetic radiation; examples are microwave and radar imaging systems, which transmit microwave energy and receive the reflected energy back from the Earth's surface. The Spaceborne Imaging Radar (SIR-C) and X-Band Synthetic Aperture Radar (X-SAR) project missions (jointly run by NASA, the German Space Agency and the Italian Space Agency) have increased the range of information that can be captured using active microwave and radar imaging systems.⁵ Archaeologists have found that in certain conditions these scanners can sometimes detect information through canopy vegetation or buried under shallow soil.

However, many satellite recording systems, like aerial photography, are passive. The first three Landsat satellites, launched by NASA between 1972 and 1978, all carried passive multispectral scanners that capture reflected energy from the Earth's surface. These demonstrated the value of satellite imagery for providing data about the terrestrial environment, such as land-use patterns and ecological variability. Landsat-4 and Landsat-5 (launched in 1982 and 1984) carried improved multispectral scanners referred to as the Thematic Mapper, which included thermal infrared sensors for recording heat energy emitted from the Earth. Landsat-6 (launched in 1993) failed to reach orbit, but Landsat-7 (launched in 1999) uses an Enhanced Thematic Mapper Plus (ETM+) to record seven multispectral bands plus an eighth (panchromatic) sharpening band (Table 5.2).
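Relating a wavelength quoted in the text to the Landsat-7 bands that record it is a simple range lookup over Table 5.2. The Python sketch below encodes the table as given above; the dictionary layout and function names are illustrative choices, not part of any Landsat software. It makes one point of the table easy to see: the panchromatic band 8 overlaps the green, red and near-infrared bands, and some wavelengths (for example 0.62 µm, between the green and red bands) are recorded only by the panchromatic band.

```python
# Landsat-7 ETM+ bands as listed in Table 5.2:
# band -> (lower limit in µm, upper limit in µm, spectral region, resolution in m)
ETM_BANDS = {
    1: (0.45, 0.52, "Visible blue", 30),
    2: (0.53, 0.61, "Visible green", 30),
    3: (0.63, 0.69, "Visible red", 30),
    4: (0.75, 0.90, "Near infrared", 30),
    5: (1.55, 1.75, "Mid infrared", 30),
    6: (10.40, 12.50, "Thermal infrared", 60),
    7: (2.08, 2.35, "Mid infrared", 30),
    8: (0.52, 0.90, "Panchromatic", 15),
}

def nm_to_um(nm):
    """Convert nanometres to micrometres (1 µm = 1000 nm)."""
    return nm / 1000.0

def bands_covering(wavelength_um):
    """Return, in band order, the ETM+ bands whose range includes the wavelength."""
    return [band for band, (lo, hi, _, _) in ETM_BANDS.items() if lo <= wavelength_um <= hi]

if __name__ == "__main__":
    for nm in (660, 850):
        for band in bands_covering(nm_to_um(nm)):
            lo, hi, region, res = ETM_BANDS[band]
            print(f"{nm} nm -> band {band} ({region}, {res} m)")
```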
: 1t t p: / / s o u E h p o r t . j p f . n a sa . g o vl sir -: r, ^ r. jp l . n a s a . g o v l r a da r / sir cxsa r /. c /q e tr in g d atsa/mi ssi ons-generaf . hrmt.


Spatial data acquisition

Table 5.3 Principal applications of Landsat TM spectral bands

TM band | Spectral region | Principal applications
1 | Visible blue | Water penetration, useful for coastal mapping. Soil/vegetation discrimination, forest type mapping, identification of cultural features
2 | Visible green | Vegetation discrimination and vigour assessment. Identification of cultural features
3 | Visible red | Chlorophyll absorption for species differentiation. Cultural feature identification
4 | Near infrared | Vegetation types, vigour and biomass content. Identifying water bodies and soil moisture discrimination
5 | Mid infrared | Vegetation and soil moisture content. Differentiation of snow from clouds
6 | Thermal infrared | Vegetation stress analysis, soil moisture content and thermal mapping applications
7 | Mid infrared | Discrimination of mineral and rock types. Vegetation moisture content
8 | Panchromatic | Cultural and natural feature mapping
Source: Lillesand and Kiefer (2000: Table 6.3).
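Table 5.3 implies that each surface type has a characteristic response across the bands, which is the idea behind the spectral response signature discussed in the text. The sketch below assigns a pixel to whichever reference signature lies closest to it in band space (a simple minimum-distance rule). It is an illustration only: the four-band layout, the class names and every reflectance value are hypothetical, and real signatures are much harder to define.

```python
import math

# Hypothetical mean reflectances (fraction of incident energy) in four bands:
# (visible blue, visible green, visible red, near infrared).
# Illustrative values only, not measured signatures.
REFERENCE_SIGNATURES = {
    "water": (0.08, 0.06, 0.04, 0.02),
    "bare_soil": (0.12, 0.16, 0.20, 0.26),
    "vegetation": (0.04, 0.09, 0.05, 0.45),
}


def nearest_signature(pixel, signatures=REFERENCE_SIGNATURES):
    """Return the class whose mean response is closest to `pixel`.

    Closeness is the Euclidean distance in band space (minimum-distance rule).
    """
    if any(len(pixel) != len(s) for s in signatures.values()):
        raise ValueError("pixel and signatures must have the same number of bands")
    return min(signatures, key=lambda name: math.dist(pixel, signatures[name]))


if __name__ == "__main__":
    print(nearest_signature((0.05, 0.08, 0.05, 0.40)))  # vegetation
    print(nearest_signature((0.07, 0.05, 0.03, 0.03)))  # water
```

The high near-infrared value is what separates the vegetation pixel here, echoing the band 4 entry in Table 5.3.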

The latest SPOT system (SPOT-5, launched in May 2002) collects multispectral data in the green, red and near-infrared bands. DigitalGlobe's Quickbird satellite system, launched in 2001, provides multispectral data in the blue, green, red and near-infrared bands at 2.5-m resolution. Multispectral scanners can also be operated from specially equipped aircraft, thereby permitting much higher-resolution, often submetre, data capture.

The different ways in which different phenomena reflect the Sun's energy or respond to transmitted energy are described as a spectral response pattern (SRP). Analysis of SRPs is used to help identify and distinguish between phenomena that might look similar to the naked eye, and thus to define a response signature for different phenomena (Eastman 2001, p. 57). In practice, signatures are difficult to define. A major part of satellite imagery interpretation therefore involves the analysis of how different objects and phenomena respond, under different conditions, to either reflected solar radiation or radiation transmitted from the recording instrument. At a gross level, different types of vegetation may be distinguishable in the longer red and near to short-wave infrared bands (600-900 nm), and differences in soil moisture content may be differentiated by responses in the short-wave infrared bands (i.e. 1400, 1900 and 2700 nm) because wet soil surfaces absorb these electromagnetic wavelengths (Lillesand and Kiefer 2000, pp. 17-20). Table 5.3 provides a generalised example of how different phenomena respond across different spectral bands. In the same way that archaeological features may be detected in aerial photographs

5.2 Primary geospatial data


from vegetation patterns, often manifested as 'crop marks', subtle differences in soil moisture content or vegetation may also indicate the presence of past human activity. When attempting to identify features that have an unknown SRP, a 'training' process followed by 'ground truthing' is often needed to help with classification. Chapter 6 describes the processes involved in image classification and enhancement in further detail.

More recent passive systems used by archaeologists include airborne thermal video radiometers (TVR) that measure emitted radiation in the thermal-infrared (TIR) spectral region (3-14 µm). These have been shown to have application for detecting subsurface features: since buried objects cool more slowly than the surrounding soil matrix, differences in emitted thermal energy can be detected by TVR instruments at night (Ben-Dor et al. 1999).

Some other RS devices used by archaeologists are ground based and involve capturing information about subsurface remains; these include ground-penetrating radar (GPR), resistivity meters and fluxgate gradiometers (or magnetometers). There is a long and profitable history of geophysical survey in archaeology, the results of which can often be well integrated with other forms of spatial data (Barker 1998, pp. 60-67). Geophysical survey is a specialised archaeological subdiscipline that requires a considerable amount of training and expertise. A good introduction to basic principles can be found in Clark (1996).

Satellite and airborne-derived topographic data It is becoming increasingly common to use satellite or airborne digital data to construct digital elevation models (DEMs) of the Earth's surface. This can offer some advantages over more traditional forms of creating DEMs, such as from manually digitised contour lines (Chapter 6) or through stereoscopic analysis of aerial photographs, because of the faster processing time and larger area coverage that satellite systems offer. Major satellite suppliers offer global or near-global coverage for DEM products.
For example, SPOT7 and Space Imaging8 can supply elevation on a grid spacing of 30 m or less, derived from analysis of stereo pairs of images. The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER)9 also provides data on land surface temperature, reflectance and elevation at a 30-m horizontal resolution. Interferometry is a technique that uses simultaneous signal acquisitions by synthetic-aperture radar (SAR) to collect elevation differences on the Earth's surface, as used by the 2000 Space Shuttle Endeavour's Shuttle Radar Topography Mission (SRTM). Here, a radar system installed in the cargo bay and another at the end of a 60-m boom captured elevation data for 80 per cent of the globe at resolutions of up to 30 m, which have recently been made publicly available.10 These
--. . " . . p o t . . o * / n o . e / p r o se r / e te va L / d e m /we lco m e . h tn. o vervi ew . htn. 'r, -rw.s p a c e i m a g r i n g r . c o m . /p r o d u c ts / i tm / te ch n ica l '1'ssi or.h p l.r d sa .go . : . t p: aslerweb.'pl .r a sa .q o s-rr




sources of primary data are useful for regional analyses and may offer time and cost savings over survey or the manual digitising of paper maps to collect elevation data. Higher-resolution data can be obtained by mounting laser or radar transmitters on specially equipped aircraft, such as Intermap's STAR-3i Interferometric Synthetic Aperture Radar (IFSAR), which is typically mounted on a Learjet.11 Aircraft-mounted systems such as these can capture elevation data with a horizontal accuracy of under 2 m and a vertical accuracy of 50 cm or better. Even more accurate is aircraft-based laser altimetry, such as Light Detection And Ranging (LIDAR), which can record 2000-5000 height measurements per second to produce a DEM with a horizontal resolution of about 1 m and a claimed vertical accuracy of ±15 cm (Deloach and Leonard 2000; Brock et al. 2002). Although this is a very quick way to collect high-resolution elevation data over a large area, the cost of these data products often puts them beyond the budget of most archaeological users.

Archaeological applications Aerial photographs and satellite images are particularly useful tools for archaeologists as they can provide valuable information about a landscape that may not be available from conventional maps, or that would take a considerable amount of time to collect through field survey. Historically, much of the RS performed by archaeologists has been based on aerial photography at visible or near-infrared wavelengths, but since the launch of the first satellite digital sensors in the early 1970s and the more recent availability of images from former Cold War spy satellites, the use of satellite imagery has become increasingly common. Aerial photography and satellite imagery have been used to identify and map cultural phenomena such as ancient walls, settlements or other human modifications to the landscape that may only be visible under certain light and vegetation conditions.
Some satellite images approach air photographs in the level of detail they show and can also be used to map structures and features. For example, layers from multispectral systems, typically the red, green and blue (RGB) bands, can be combined to create what is known as a 'colour composite' that mimics the information in a normal colour image, i.e. green vegetation appears green, water appears blue, and so on. It is also possible to produce composite images by combining the infrared, red and green bands to enhance healthy vegetation, which in these composite images will be depicted as bright red. Some systems, such as SPOT and Quickbird, can have their composite images improved through a process of 'pan-sharpening', which uses the higher-resolution panchromatic data to enhance the lower-resolution colour data, providing a good alternative to colour aerial photography.

Geographical information systems have had a major role to play in the archaeological use of photographic and digital imagery. The integrative nature of GIS
'r*-. irrt"r..p . .u / in te r n a p ,ST AX 3 i L L p q r a d e .htn.



enables imagery to be related to other forms of spatial data, and many programs include sophisticated image enhancement and analytical tools to help classify and make sense of what can be extremely complex remotely sensed data. The following are three examples of the uses to which satellite imagery and aerial photographs are typically put within an archaeological GIS.

Prospection One of the values of remotely sensed imagery is its ability to provide a different perspective on what might otherwise be a familiar phenomenon, which can lead to the identification of new or unexpected features that may not have been recognisable from the ground. The history of using cameras in this way can be traced back to photographs taken of Stonehenge from a balloon in 1906 (Capper 1907). During the Second World War many aerial photographs were taken from reconnaissance aircraft, and archaeologists were used to help interpret their military significance. After the war, the technique became an established archaeological prospection tool in Western Europe and North America, and there are now copious examples of the use of aerial photographs for the discovery and mapping of previously unknown, or difficult to define, archaeological features. For example, standard black-and-white photographs have been used to identify previously unknown Neolithic encampments in Charente, France (Bouchet and Burnez 1990), to discover and interpret anthropogenic topographic hollows in Mesopotamia (Wilkinson 1993), and to identify new features at the Avebury henge, Wiltshire (Bewley et al. 1996). With the opening up of the post-communist skies since the end of the Cold War, the technique has more recently played a major role in the discovery and documentation of archaeological remains in Eastern Europe (Bewley et al. 1996, 1998). Specialised photography such as infrared (IR) imaging is sometimes necessary to identify archaeological features that would otherwise be difficult to see with the naked eye.
The value of colour IR film, for example, has been demonstrated by its role in identifying prehistoric footpaths in Costa Rica (Sheets and Sever 1991; McKee et al. 1994), but interpreting such imagery often requires some form of image manipulation and enhancement.

Declassified intelligence photographs from satellites that carried photographic sensors can be used in ways similar to aerial photographs. For example, Philip et al. (2002) and Ur (2003) have successfully used declassified intelligence photographs from the US CORONA satellite series to aid in the discovery and mapping of archaeological settlements in Syria. Russian KVR-1000 imagery has also been used to help identify archaeological remains at the Roman site of Zeugma, Turkey,12 as well as to clarify linear features in the Stonehenge region (Fowler 1996). Spaceborne radar imaging has also been important in some high-profile archaeological discoveries, such as the 1992 finding of the Lost City of Ubar, which has helped popularise radar data.13
* i"t rr. at u.j p l . rr.'f.r"fntml /d e p a r tm e n ! / ze u sm a /r e m o L e . hLm1. - n a s a . g o v / r ad a r / sir cxsa r /.



Air- and space-borne sensors produce imagery that can be of use for archaeological prospection, but GIS does not necessarily play a significant role unless these images need to be enhanced, classified or used as the basis for constructing maps. Many GIS programs, particularly those with developed raster analysis capabilities such as Idrisi and GRASS, have sophisticated tools for image enhancement and analysis of panchromatic, colour and multispectral data. Other packages, such as Imagine,14 have been designed specifically for the interpretation of image data. Archaeologists have processed multispectral imagery to help locate archaeological remains on the basis of their particular signatures within the image. Research in the Vale of Pickering, Yorkshire, for example, used airborne multispectral imagery to identify and map previously unidentified landscape features such as trackways, enclosure systems and barrow cemeteries (Powlesland et al. 1997). In the USA, multispectral imagery has been used with equal success in Chaco Canyon to detect and map prehistoric roads, walls, buildings and agricultural systems, some of which could not be identified on standard aerial photography or colour infrared photographs (Sever and Wagner 1991).
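Once bands have been co-registered, the colour composites and pan-sharpening described earlier reduce to simple array operations. The sketch below assumes NumPy and bands already resampled to a common grid. It stacks three bands into a display composite with a linear min-max stretch, then applies the Brovey transform. Brovey is only one common pan-sharpening method, chosen here for brevity; it is not necessarily the method any particular package or supplier uses.

```python
import numpy as np


def stretch(band):
    """Linearly rescale a band to the range 0..1 for display (min-max stretch)."""
    band = np.asarray(band, dtype=float)
    lo, hi = band.min(), band.max()
    if hi == lo:
        return np.zeros_like(band)
    return (band - lo) / (hi - lo)


def composite(r, g, b):
    """Stack three co-registered bands into a (rows, cols, 3) image.

    Passing (red, green, blue) gives a natural-colour composite; passing
    (near infrared, red, green) gives a false-colour composite in which
    healthy vegetation appears bright red.
    """
    return np.dstack([stretch(r), stretch(g), stretch(b)])


def brovey(r, g, b, pan, eps=1e-12):
    """Brovey pan-sharpening.

    Assumes r, g and b have already been resampled to the panchromatic grid.
    Each band is scaled by pan / (r + g + b), so the output carries the
    spatial detail of `pan` while keeping the bands' relative proportions.
    """
    r, g, b, pan = (np.asarray(a, dtype=float) for a in (r, g, b, pan))
    ratio = pan / (r + g + b + eps)  # eps guards against division by zero
    return np.dstack([r * ratio, g * ratio, b * ratio])
```

Because the three scaled bands always sum to the panchromatic value at each pixel, the sharpened image inherits the panchromatic brightness while preserving colour ratios from the coarser bands.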

Photogrammetry The rise of GIS has led to a resurgence in the use of aerial photography for the purposes of archaeological mapping. The use of aerial photographs to make maps of archaeological sites was pioneered by O. G. S. Crawford in the 1920s and 1930s, and became increasingly common after the Second World War. Although a major archaeological tool, photogrammetry is a discipline in its own right, and specialised optical or computer visualisation equipment (as well as a great deal of training) is needed to interpret stereoscopic pairs with the aim of recording three-dimensional data. The limitations of desktop GIS restrict photogrammetry to on-screen digitising of georeferenced aerial photographs to capture two-dimensional data. In the last few years there have been successful attempts to map cultural landscapes using high-definition panchromatic satellite images with resolutions of under 4 m (Fowler 1996). These images come in tiles covering tens or hundreds of square kilometres and have been shown to be a good alternative to aerial photography for regional landscape mapping, such as their use by Ur (2003) for mapping ancient Mesopotamian road systems. A typical photogrammetric application involves delineating features by tracing their paths with polylines, or defining their boundaries with polygons, but this type of mapping will rarely be as accurate as a field survey. Mapping features from aerial photographs may, however, accelerate map production for extensive features that would be difficult to survey on the ground. An excellent example of this is the mapping undertaken at the Roman town of Viroconium at Wroxeter, England. A mixture of vertical and oblique photographs (that had been rectified using points obtained from geophysical survey) was used to aid the mapping of archaeological features (Baker 1992). Aerial photographs have also been used successfully to
14 www.erdas.com.







Fig. 5.5 A dot-density overlay on an aerial photograph can provide an excellent visual framework for the qualitative assessment of distribution patterns. © Kythera Island Project. Used with permission.

distinguish and map environmental zones, such as the categorisation of wetlands undertaken for an investigation into the environmental and cultural history of NW England (Cox 1992).

Landscape visualisation Rectifying and georeferencing an aerial photograph or satellite image in order to use it as a background for visualising landscapes can facilitate the interpretation of other spatial data layers. A visual reference that enables contextual detail to be qualitatively assessed can provide profound insight into the spatial relationships between archaeological data and landscape features. Figure 5.5 provides a good example of this by using a georectified aerial photograph as a backdrop to an artefact distribution pattern, which adds a contextual dimension that would be very difficult to replicate in any other way. Aerial photography or satellite imagery can also be used to provide an element of terrain realism to a three-dimensional digital elevation model (DEM). We discuss the construction of DEMs in some detail in the next chapter, so here it is sufficient only to draw attention to the fact that aerial photographs can be used to enhance the three-dimensional visualisation of terrain when draped on a DEM. This technique can



play an important role in the effective communication of the qualitative aspects of landscape variability, in the same way that an aerial photograph can provide an effective backdrop to other data layers.

Acquiring and integrating remotely sensed data Satellite imagery can be obtained either directly from government or commercial agencies such as SPOT,15 NASA,16 Eurimage17 or IKONOS,18 or from academic or other third-party organisations. The US Geological Survey's (USGS) EarthExplorer database maintains archives of Landsat, CORONA and other declassified satellite imagery searchable via an online catalogue,19 and the Canada Centre for Remote Sensing (CCRS) hosts a similar service.20 In the UK, the Natural Environment Research Council (NERC)21 has a range of Landsat, SPOT and airborne multispectral imagery available for download. MIMAS22 also provides access to UK coverages of Landsat and SPOT data for the UK higher education community. Satellite data may be supplied as photographic prints or negatives, as digital images or, as is common for multispectral data, in a file archive. The exact format in which digital imagery is acquired depends on the supplier, so it is important to consult the documentation for the preferred GIS program to establish whether it is able to read the supplied data format. Geographical information system programs that have traditionally been stronger in image processing facilities (such as Idrisi, Imagine and GRASS) can cope with most imagery formats. Other more vector-orientated software, such as ArcView and ArcInfo, may have some limitations. Table 5.4 lists some of the more commonly encountered raster file formats. The digitisation of features from an image base map is a common way of creating a map.
However, the identification and accurate delineation of phenomena of interest is often dependent on image enhancement, analysis and interpretation, which is covered in the next chapter. There are also several preconditions that must be met before image data can be fully integrated with other spatial data. Air photographs and satellite images, unlike maps, lack an inherent reference system for scale and location. It is, however, increasingly common for image data to be supplied rectified and with reference data incorporated into the image. GeoTIFF and Spatial Data Transfer Standard (SDTS) formats, for example, are commonly used for distributing referenced raster files. Bear in mind that an embedded reference system may not be the same as that of the rest of your data. In order to incorporate unreferenced or incompatibly referenced images into a GIS and to relate them to other maps, a new coordinate system must be imposed on the image through a process referred to as georeferencing, as described in more detail below. This may necessitate
r5rv u w . 1 6 h ttp , //1 a n d sa t. g sfc.n a sa . qov. l ?m. euri nage. spot,. corn. r3m . s p a c e i m a g i n g . co r n . le h !!p :/,/e d csn sl? .cr .u sgs.qov/E arrhE xti l orer/ lor" . w l , t , c c r s . n r c a n . g c.ca . llwww.n e r c.a c.u k. l l w w w . nri mas , ac , uk. con.

5.3 Secondary data

Table 5.4 Common satellite and image file formats

Image format
Tagged Image File Format
Georeferenced Tagged Image File Format
SPOT GeoSpot files
Windows Bitmap file
Hierarchical Data Format
National Landsat Archive Production System
National Imagery Transmission Format Standard
USGS Spatial Data Transfer Standard
USGS ASCII DEM
Erdas Imagine files
ArcGIS grids
Military elevations (DTED)
Shuttle Radar Topography Mission files
Landsat (band-interleaved-by-pixel)
SPOT (band-interleaved-by-line)
Joint Photographic Experts Group image file



re-projecting or otherwise changing the shape of the image (i.e. 'rectifying', 'warping' or 'rubber sheeting') to make it fit the new reference system more accurately. A related, but more accurate, process of adjustment to correct for relief-based distortion is called orthorectification. This is necessary to correct the significant terrain displacement in satellite and airborne imagery in areas of significant vertical relief, and requires specialised software and skills to perform accurately.

5.3 Secondary data

The transfer of data from an existing storage medium (e.g. from paper maps) remains a common route of spatial data acquisition. Secondary data, whether digital or analogue, have by definition already undergone processing and interpretation, and their use should therefore come with an understanding of the potential sources of error in the data. To a greater or lesser extent, all maps are partially subjective, highly transformed and interpreted translations of raw data, and they inevitably contain the biases and intentions of the people or agencies that produced them (Wood 1992; King 1996). Secondary data sources may also contain substantial spatial errors that need to be taken into account when the data are used for spatial analysis or decision making. This includes maps that have been obtained from commercial map suppliers and from national mapping agencies such as the UK Ordnance Survey and the US Geological Survey. While these agencies publish information about acquisition and processing methods with anticipated error rates, the risk of incorporating erroneous or imprecise data in a spatial model remains very high when using secondary data from these or other sources. Furthermore, if these maps exist only in



paper form they will need to be digitised, which can itself be a source of potentially significant error. This is not to say that primary data sources are always preferable; the enormous investment in time and specialised skills required to produce a topographic map from ground survey or aerial photographs means that it is more efficient to use secondary data, albeit with the expectation that some errors will be encountered.

5.3.1 Integrating secondary map data

In order to integrate new map data into a GIS project, the following six characteristics need to be considered to ensure compatibility between new and existing spatial information (Burrough and McDonnell 1998, p. 81).
The georeferencing system We have already considered the topic of map coordinate systems in Chapter 2, and here we need only reiterate that maps may have been acquired using a coordinate system incompatible with existing datasets. In order to integrate data collected using a different referencing system, they must be re-projected or rectified, which invariably introduces a degree of spatial error. Rectification and the quantification of spatial errors using RMSE values are discussed in further detail below.

Scale and resolution Combining data collected at different scales can cause interpretative difficulties. In general terms, a map composite constructed from two or more source maps will be no more accurate than the least accurate map, and possibly somewhat worse than this if some maps have been re-projected. In addition, to combine raster datasets that have different resolutions (i.e. pixel sizes), some appropriate conversion of groups of pixels into a single value is necessary before the two layers can be combined.

Data collection techniques If new map data are to be combined with existing data, it is essential that they are compatible not only in resolution but also in terms of their collection methods. For example, attempting to combine the results of two separate archaeological surveys of the same area, one of which was walked using linear transects and the other using gridded surface collection, could lead to problems of interpretation when defining counts or densities of artefacts across the site.

Data quality A professionally produced and aesthetically pleasing map gives an impression that the same amount of care was taken with the collection of the primary data used in its construction. Unfortunately, it is not always possible to assess the quality of data merely by examining a map. The data sources, recording instruments, conditions of data collection and, if appropriate, spatial error estimates should be recorded. If this information is not available, the map data should be treated with suspicion.
Data classification methods Integrating secondary data also requires some understanding of the classification system used and whether it is compatible with the objectives of the project. To state the obvious, a map of archaeological sites classified into categories of 'prehistoric', 'Roman' and 'post-Roman' will not be useful for exploring spatial patterning of Iron Age settlements. Less obvious may be the classification of environmental data such as soil productivity maps, which may be derived from modern agricultural productivity and so not be directly relevant to past agricultural conditions.

Data processing methods When using a map of elevation or some other continuously changing variable, such as temperature, the method of collection, the density of sample points and the interpolation methods should be considered. For example,

Table 5.5 Common vector file formats

Format
Data exchange format
ArcInfo Generate files
ArcView shapefiles
National Geodetic Survey
Digital line graphs
UK National Transfer Format


if the choice between two DEMs is to be made on grounds of quality, then it will be important to ascertain which methods of interpolation and subsequent filtering (if any) were used in their creation. Digital elevation models produced from contour data are particularly prone to interpolation errors. Chapter 6 discusses which interpolation methods are most appropriate for particular circumstances.
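The scale and resolution point above notes that rasters with different pixel sizes need groups of pixels converted into single values before they can be combined. A minimal sketch of one such conversion, block aggregation of a fine grid by an integer factor, follows. It assumes NumPy, that the fine grid's dimensions are exact multiples of the factor, and that both grids are already aligned; the function name is hypothetical. Averaging suits continuous data; for categorical data a majority rule would usually be a better reducer.

```python
import numpy as np


def block_aggregate(grid, factor, reducer=np.mean):
    """Coarsen a raster by combining factor x factor blocks of cells.

    grid    : 2-D array whose dimensions are exact multiples of `factor`.
    factor  : integer ratio between the coarse and fine cell sizes.
    reducer : function applied to each block (np.mean for continuous data).
    """
    grid = np.asarray(grid)
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    # Reshape so that each block occupies axes 1 and 3, then reduce over them.
    blocks = grid.reshape(rows // factor, factor, cols // factor, factor)
    return reducer(blocks, axis=(1, 3))


if __name__ == "__main__":
    fine = np.arange(16).reshape(4, 4)   # e.g. a 4 x 4 grid of 10-m cells
    print(block_aggregate(fine, 2))      # 2 x 2 grid of 20-m cells
```

The coarsened result can then be added cell by cell to a raster that already has the coarser resolution.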

Acquiring digital map data Many maps are available for download from government data suppliers such as the US Geological Survey,23 the Canadian Centre for Topographic Information,24 Geoscience Australia,25 the UK Ordnance Survey26 and equivalent European and national mapping agencies. Publicly available global datasets at coarser resolutions are available from the Land Processes Distributed Active Archive Centre27 and Pennsylvania State University Library's Digital Chart of the World,28 and are also often provided by GIS companies as part of the software package. The delivery format of the map file may vary considerably. Common vector map file formats are outlined in Table 5.5. The exact process by which data acquired from these or other agencies are imported into a particular GIS program depends on how well that program supports the relevant file format, and is usually explained in the program's documentation. In cases where the preferred software offers no support for the relevant format, it is usually possible to find another program that can be used as an intermediary to convert the data to a file format that is supported.

Vector files will be appropriately structured to allow the extraction of only those data layers or spatial objects required for a particular application. Each entity, be it a point, line or polygon, will also have one or more descriptive attributes. For example, map data purchased from the Ordnance Survey of Great Britain contain numerous layers, each defining different types of spatial object identified by feature codes. These codes allow the dataset to be separated into different map layers within a GIS environment. Raster files contain less information because of the limitations of the file structure.
'a, : t p: / / g r e o g r a p h y . u s g r s . g o v. : -r, v rv .or d r a n c e s u r v e y . g o v. u k. - rinrr. maproom. psu. edu/dcw.
24www. 27http


r ncan,

gc . c a.

25w w w . go

. go..


: //edcdaac.usss.

s ov /qtopo3O/hy dr o.



At the minimum, a raster file will contain a grid of numeric values, but more typically there will be a header file that defines such things as the scale and resolution of the image (Chapter 4). Some GIS programs may require this information to be rewritten in an appropriate format before it can be interpreted by the program.

5.3.2 Digitising paper maps

For many parts of the world detailed topographic maps are not available in digital formats. Even when digital maps are available, digital thematic maps, such as geological or soil data, may only exist in paper formats. In some countries it may even be difficult to obtain paper maps for reasons of national security, and regionally based analyses may depend on publicly available maps or remotely sensed data obtained from the resources outlined above. In order to use paper maps in GIS it is necessary to 'digitise' them. The term 'digitising' refers to the process of transferring analogue information to a digital format, although in this context the term refers more specifically to the delineation and transfer of discrete units of information from paper maps, plans or aerial photographs to a spatial database. Digitising can be performed from within a GIS program if the program supports it (and many packages do) or in a specialised computer-aided drawing (CAD) environment such as AutoCAD. Digitising can be done in one of three ways. Often a specialised piece of hardware called a digitising tablet is used, on which a paper map is placed and individual map features are traced using an electronic 'puck'. Alternatively, the paper map is scanned, georeferenced and displayed on a computer screen so that features may be digitally traced using a mouse, in a process called heads-up or on-screen digitising. More recently it has become common to use programs that automatically find and extract vector data from scanned maps. Programs such as ESRI's 'ArcScan' integrate this process within a GIS environment, but a number of standalone packages are also available.
Regardless of which method is used, there are four stages to the digitising process: (i) capturing the spatial data; (ii) entering the attribute data; (iii) error checking and 'drawing clean-up'; and, when appropriate, (iv) linking the spatial data to the attribute data (Fig. 5.6).

Capturing the spatial data In order that the information being transferred from the paper map is placed in its correct geospatial position, the map needs to be georeferenced before any information is captured. When using a digitising tablet, georeferencing involves a mathematical process that transforms the coordinates of the paper map to real-world ones (this is sometimes called 'map calibration'). This requires the identification of a number of ground control points (GCPs) on the original map that correspond to locations on the target map. Ideally, more than three GCPs are needed, although more than nine points can soon become redundant. The placement of the points is important to ensure an accurate fit between the paper and digital data. Points should be spread



[Flowchart labels: spatial data; attribute data; registration to coordinate system; manual digitising of points, lines and polygon data; data input via text files; clean up lines and weed out excess nodes; build topology; create relational tables; create entity identifiers.]

Fig. 5.6 Steps in digitising map data (after Burrough and McDonnell 1998, Fig. 4.8).

evenly around the perimeter of the area to be digitised rather than clustered in the centre or lying along one or two straight lines. The mathematical basis of calibration, and the associated spatial error in the calibration, as expressed using the root-mean-square error (RMSE), is described in Box 5.1. Whether digitising in 'heads-up' mode or via a digitising tablet, capturing information from the map may be undertaken in either 'point' or 'stream' mode. The first refers to a process in which the user manually defines the locations of the vertices that define points, lines, polylines or polygons. The second, stream digitising, is the automatic placing of vertices while the puck or mouse is traced over the map object, with the distance between vertices established by a user-set increment value. The


Table 5.6 A correspondence table

GCP | Input x | Input y | Output x | Output y
1 | 19.682 | 112.957 | 115.542 | 532.924
2 | 50.304 | 112.632 | 408.210 | 534.677
3 | 60.892 | 112.469 | 476.557 | 534.677
4 | 36.785 | 94.552 | 294.297 | 336.644
5 | 19.520 | 85.919 | 119.047 | 242.009
6 | 48.675 | 89.177 | 383.675 | 271.802
7 | 40.531 | 82.661 | 296.050 | 217.474
8 | 61.217 | 98.461 | 478.310 | 426.022

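Table 5.7 RMSE and error values (the 'residuals'). RMSE: x = 2.55, y = 9.83

GCP   y error
1      -1.017
2       7.886
3      -9.659
4       1.606
5       0.555
6     -19.547
7       9.105
8      12.179

x errors (only partly legible in the source, so their assignment to individual GCPs is uncertain): 0.454, 2.977, 2.139, -3.273, 1.632, …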

Box 5.1 Root-mean-square error
When computing the rectification, a GIS program will attempt to use the polynomial function that best fits the points to their new location. However, an exact match is highly unlikely, so it is important to obtain a quantitative measure of the goodness-of-fit between the desired and actual locations of the GCPs. This is usually expressed as a root-mean-square error (RMSE), which can be interpreted as the average spatial error of the rectification. The RMSE may be accompanied by an individual error value for each GCP used for georectification, which gives the linear distance between the desired and actual locations of each GCP after rectification. These are also called the 'residuals'. The residuals for the control points in Table 5.6 following an affine transformation are given in Table 5.7. The RMSE value is calculated as:


$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (x_{o,i} - x_{t,i})^2}{n}} \qquad (5.1)$$

where $x_o$ and $x_t$ are, respectively, the original and transformed coordinate locations, and $n$ is the number of GCPs. In the example in Table 5.7, the mean of



the squares is 6.52 for x and 96.59 for y. The square roots of these two values, 2.55 for x and 9.83 for y, give the RMSE and thus the average spatial error in the rectified map or image. Whether or not this is an acceptable level of error depends on the scale of the maps and the purpose to which they are being put. As a rule of thumb, it is wise to aim for an error of less than 1:3000, so if the original image was 1:15 000 in scale, then an RMSE of approximately 5 m (15 000/3000 = 5) or less would be acceptable. In the example in Table 5.7 the y-value falls outside this range, and the combined RMSE value for both the x- and y-dimensions is 7.18. On this basis the rectification exceeds the acceptable error limit, so it would be necessary to choose a new set of GCPs and reconfirm that their new coordinates are accurately estimated. It can be helpful when reselecting GCPs to examine the residuals. For example, Table 5.7 shows that GCP 6 has a high error in both the x- and y-dimensions, suggesting that it might be contributing significantly to the overall error. Simply removing this GCP from the next attempt at rectification may therefore reduce the overall RMSE to within acceptable limits. Combining spatial data from two or more sources will result in a larger RMSE than any individual map. This may occur when, for example, a map that contains topographic data is combined with a geology map so that measurements between phenomena on both maps can be taken (e.g. distances from sites to a particular raw material source), or when two raster maps, each with cell values that represent two phenomena (e.g. quarterly rainfall amounts), are added together. When data from two maps are added together, the combined RMSE can be calculated by:

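$$\mathrm{RMSE}_{\mathrm{combined}} = \sqrt{\mathrm{RMSE}_{\mathrm{map}\,1}^{2} + \mathrm{RMSE}_{\mathrm{map}\,2}^{2}}$$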


The combined RMSE can introduce a large degree of uncertainty into spatial data. For example, combining a 1:25 000 topographic map with an RMSE of 8 m and a 1:100 000 soil map with an RMSE of 32 m means that any point on the combined dataset will only be accurate to ±33 m.
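The arithmetic in Box 5.1 can be checked with a short Python sketch. It assumes the y residuals as reconstructed in Table 5.7 and, because the x residuals are only partly legible, takes the mean squared x residual (6.52) directly from the text. Pooling the two axes by averaging their mean squares is an assumption that happens to reproduce the 7.18 quoted above; the function names are illustrative and do not belong to any GIS package.

```python
import math


def rmse(residuals):
    """Eq. 5.1: square root of the mean of the squared residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def combined_rmse(*map_rmses):
    """RMSE of data combined from several maps: root of the sum of squares."""
    return math.sqrt(sum(e * e for e in map_rmses))


def acceptable_rmse(scale_denominator, ratio=3000):
    """Rule-of-thumb tolerance of 1:3000, e.g. a 1:15 000 map gives 5 m."""
    return scale_denominator / ratio


# y residuals for GCPs 1-8 (Table 5.7)
y_residuals = [-1.017, 7.886, -9.659, 1.606, 0.555, -19.547, 9.105, 12.179]
mean_sq_x = 6.52  # mean squared x residual quoted in the text

rmse_y = rmse(y_residuals)
# Assumed pooling: average the two mean squares, then take the root
rmse_xy = math.sqrt((mean_sq_x + rmse_y ** 2) / 2)

print(f"RMSE y = {rmse_y:.2f}, combined x/y = {rmse_xy:.2f}")   # 9.83, 7.18
print("within 1:15 000 tolerance:", rmse_xy <= acceptable_rmse(15_000))  # False
print(f"topographic + soil map: +/-{combined_rmse(8, 32):.0f} m")  # +/-33 m
```

The same `rmse` function can be reused for any list of residuals, for example after dropping GCP 6 and refitting.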

advantage of point digitising is that operator control over the placement of vertices tends to result in a more 'intelligent' selection of vertices that better represents the shape of a polyline or polygon using the fewest points necessary. For some forms of interpolation of elevation data, particularly TINs, the manual capture of 'very important points' (VIPs) at contour vertices produces better results than stream digitising, which can result in long strings of redundant vertices. A major disadvantage of point digitising, however, is that it is a very time-consuming and tedious process. For that reason, stream digitising is often preferred for capturing complex information such as contour lines. Many GIS programs contain 'drawing cleanup' options, which consist of a set of tools that remove vertices along a polyline


Fig. 5.7 A polygon before (left) and after (right) the removal of redundant vertices. User-specified tolerances are used to determine when vertices are redundant. In this example, 'cleaning up' has resulted in the loss of some detail, but has reduced the space needed to store this polygon by nearly half.

[Figure panel label: 'Dangling lines'.]

Fig. 5.8 Two common topological errors when digitising line data.

if the resulting locational change falls within a user-specified spatial tolerance (Fig. 5.7). This, in effect, means that a drawing digitised in stream mode can be 'weeded' of redundant information, thus reducing its size close to that of a manually digitised drawing.

Error checking and building topologies
When digitising a drawing, careful attention must be paid to the topology of the vector objects. When digitising a topographic map, for example, it is very important that individual contour lines are connected without overhangs or undershoots (Fig. 5.8), as these topological errors may severely impair subsequent analysis and interpolation. Many digitising programs have functions for helping maintain topologies through a 'snap' feature that automatically finds and connects vertices within a user-defined limit when digitising. Similarly, polygons need to be properly defined, without unintentional 'slivers', 'overhangs' or unconnected vertices (Fig. 5.9). Even when using automated 'snap' functions it is essential to carry out error checking and post-digitising drawing 'cleanup' to ensure a topologically correct spatial database. Some GIS packages have post-digitising automated 'clean and build' functions that first 'weed' redundant vertices and then construct topologies in the manner




Fig. 5.9 Three common topological errors when digitising polygons.

described in Chapter 2. This ensures a clean, properly structured and efficient spatial database. Survey projects often record data about many small units across a landscape as polygons (e.g. Bevan and Conolly 2004; Barton et al. 2002). The location and morphology of each unit as a polygon, as well as the topological relationships between the polygons, may be captured directly in the field using total stations or GPS equipment, or during post-survey digitising from field maps. In either case it is of utmost importance for the subsequent archaeological interpretation of the polygon data that topology is recorded correctly, so that overhangs or slivers are present only when they are true features of the original survey data (Fig. 5.10). Major errors in artefact density calculations can occur with poorly recorded survey units.

Entering attribute data
Digitising does not necessarily involve only the capture of location and morphology. Attribute data, such as contour line values, site codes, etc., are an essential part of a GIS as they provide meaning to the spatial objects. In the case of single attribute data, such as contour line values, it is often easier to enter this information at the time of digitising. Alternatively, and particularly in situations where extensive attribute data need to be entered, it is easier to enter this information in a separate database, spreadsheet or raw text editor. Provided that an identifier is also entered with the attribute data, this information can be linked to the spatial database once identifiers have been added to each spatial entity.

Linking spatial and attribute data
Once the spatial data have been entered and topological relationships have been built and/or checked, entity identifiers need to be added or confirmed for each spatial object. These identifiers then serve as links to the information stored in the attribute database. The actual mechanics of linking attribute and spatial databases will vary with the GIS software, but typically this involves defining an identifier attribute held in each database and specifying that the two tables should then be linked via




Fig. 5.10 A small section of the KIP GIS, showing survey tract polygon topology. Adjacent polygons often share borders but do not overlap, and gaps between polygons are intentional and reflect unsurveyed areas. © Kythera Island Project. Used with permission.

this attribute (i.e. as a relational 'one-to-one' or 'one-to-many' join, as described in Chapter 4).

5.4 Map rectification and georeferencing
The integration of paper maps, aerial photographs or satellite images with other map data requires defining spatial coordinates for the new data so that it can be properly positioned and scaled in relation to the existing map data. This process is called georeferencing. In order to accomplish this, a number of locations with known coordinates, referred to as ground control points (GCPs), must be identifiable on the map or image. Map data will typically have a grid overlay from which coordinates can be obtained, but finding control points may be more difficult on low-resolution images such as satellite data. On images with a 20-m pixel resolution or less it



Fig. 5.11 Translation, scaling and rotation.

should be possible to locate road intersections, field boundaries, rock formations, buildings or other features for your control points. If GCPs cannot be defined by comparing the image data with a map, then it may be necessary to obtain GCPs by GPS survey (see Smith and Atkinson 2001). In its simplest form, georeferencing is a mathematical transformation of one set of coordinates to another. Most of the mathematical calculations required for simple transformations are relatively easy to perform as they consist of a combination of three operations: translation, scaling and rotation (Fig. 5.11). Translation is the horizontal or vertical shifting of a set of coordinates. For example, changing the following simple set of three coordinates {10, 5; 7, 8; 2, 12} to the new coordinates {20, 10; 17, 13; 12, 17} involves a simple translation of +10 units in the x-direction and +5 units in the y-direction. Scaling is similarly straightforward; scaling the new set of coordinates by a factor of 3 would give us {60, 30; 51, 39; 36, 51}. Rotation, while easy to visualise, is a more complicated procedure requiring the trigonometric manipulation of coordinate pairs. In practice, it is often necessary to combine these three operations, so GIS programs use polynomial functions to transform the original coordinate system. The program first defines a function that uses several unknown parameters to link the inputs, which consist of the GCP coordinates on the original image, to the outputs defined by their intended location. It then obtains these parameters by solving the corresponding polynomial equation. Finally, the completed function is applied to all cells or vertices on the original map, thus effecting the transformation. When translation, scaling and rotation are applied equally across the whole map (i.e. they do not alter the shape of the original image), the result is referred to as a first-order polynomial, or orthogonal transformation.
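The three basic operations can be sketched in a few lines of Python, reproducing the worked example above. This is a minimal illustration assuming scaling and rotation about the origin; the function names are hypothetical, and real GIS packages combine these steps inside a fitted polynomial rather than applying them one by one.

```python
import math


def translate(points, dx, dy):
    """Shift every coordinate pair by dx horizontally and dy vertically."""
    return [(x + dx, y + dy) for x, y in points]


def scale(points, factor):
    """Multiply both coordinates by the same factor (scaling about the origin)."""
    return [(x * factor, y * factor) for x, y in points]


def rotate(points, degrees):
    """Rotate anticlockwise about the origin using the standard rotation matrix."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def orthogonal(points, degrees, factor, dx, dy):
    """Rotate, then scale, then translate: the shape is preserved throughout."""
    return translate(scale(rotate(points, degrees), factor), dx, dy)


original = [(10, 5), (7, 8), (2, 12)]
moved = translate(original, 10, 5)
scaled = scale(moved, 3)
print(moved)   # [(20, 10), (17, 13), (12, 17)]
print(scaled)  # [(60, 30), (51, 39), (36, 51)]
```

Because every point receives the same rotation, scale factor and shift, distances between points change only by the common scale factor, which is why this combination leaves the shape of the map unchanged.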
In many situations a more radical approach may be needed. A second-order polynomial (or affine transformation) is able to adjust the shape of the original image by independently scaling the x- and y-axes of the original image, then translating, rotating and skewing the image as necessary to make it fit the desired coordinate system. Third- and higher-order polynomials are able to translate, scale and rotate in differing proportions over the entire map, in what is called a projective transformation. This is also known as image 'warping'



or 'rubber sheeting', which provides a good analogy for what the rectification is attempting to perform. The combined process of correcting distortion and placing in a coordinate system is often called georectification. The manner in which map or image rectification is performed is similar across most GIS packages. Typically, it involves creating a correspondence table or text file that contains a list of GCP locations (as pixel locations or x, y coordinates) on the unrectified map or image alongside their desired new coordinates (as obtained from the base map; Table 5.6). The selection of GCPs will be dictated by the availability of features on the base map and on the map or image to be rectified for which coordinates can be identified. These control points should also be widely spaced throughout the image, rather than clustered in a corner or along the edges. The number of control points to use depends on how radical the rectification needs to be: a simple orthogonal transformation needs only two or three points, but a badly distorted aerial photograph or satellite image requires many control points to compute an accurate warp. Nine well-spaced GCPs are usually sufficient for even a relatively radical rectification, although in some extreme cases 16 or more well-placed points may be needed (Campbell 1996, p. 304). Some relatively recent GIS packages, such as ArcGIS, have rectification tools that allow one to position the map or image to be georectified roughly and manually, and then select a series of common points to serve as GCPs. These are then used to perform the rectification and calculate an RMSE value in the manner defined above.
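The workflow just described (correspondence table, fitted polynomial, residuals, RMSE) can be sketched with NumPy's least-squares solver. This is a minimal illustration assuming a six-parameter linear polynomial in x and y, i.e. the transformation the text calls affine (many GIS packages list it as a first-order polynomial). The GCPs are hypothetical, generated from a known transformation so that the fit can be checked; real packages may condition or weight the solution differently.

```python
import numpy as np


def _design(points):
    """Rows of [1, x, y] for a polynomial of degree one in x and y."""
    pts = np.asarray(points, dtype=float)
    return np.column_stack([np.ones(len(pts)), pts[:, 0], pts[:, 1]])


def fit_affine(src, dst):
    """Least-squares coefficients for x' = a0 + a1*x + a2*y, y' = b0 + b1*x + b2*y.

    Returns a (3, 2) array: column 0 holds a0..a2, column 1 holds b0..b2.
    """
    target = np.asarray(dst, dtype=float)
    coeffs, *_ = np.linalg.lstsq(_design(src), target, rcond=None)
    return coeffs


def apply_affine(coeffs, points):
    """Transform points (e.g. cell centres or vertices) with fitted coefficients."""
    return _design(points) @ coeffs


def gcp_residuals(coeffs, src, dst):
    """Per-GCP residuals (transformed minus desired) and per-axis RMSE (Eq. 5.1)."""
    resid = apply_affine(coeffs, src) - np.asarray(dst, dtype=float)
    return resid, np.sqrt(np.mean(resid ** 2, axis=0))


# Hypothetical GCPs: positions on the unrectified image and their desired
# map coordinates, generated from x' = 500 + 2x - 0.5y, y' = 1000 + 0.5x + 2y.
src = [(0, 0), (100, 0), (0, 100), (100, 100), (50, 30)]
dst = [(500 + 2 * x - 0.5 * y, 1000 + 0.5 * x + 2 * y) for x, y in src]

coeffs = fit_affine(src, dst)
resid, rmse_xy = gcp_residuals(coeffs, src, dst)
print(np.round(coeffs, 3))  # recovers the six parameters
print(rmse_xy)              # essentially zero for error-free GCPs
```

With real GCPs the residuals will not be zero; inspecting `resid` row by row is the programmatic equivalent of scanning Table 5.7 for a poorly placed point such as GCP 6.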
5.5 A note on spatial error and map generalisation
An RMSE value provides a measure of how accurately spatial information has been translated to a new coordinate system. RMSE values will be low when calibrating paper maps that are in good condition and higher when using, for example, photocopies. In some situations, such as when georectifying aerial photographs, RMSE values may be much higher than the guideline values we suggest. In any given situation a judgement will have to be made as to whether the spatial error falls within a usable limit. It is essential to record the RMSE for all georectified maps, as it may become an important variable when collecting spatial data such as distances. Ideally, the RMSE should be included in the metadata accompanying the dataset (see Chapter 13). One important point to remember is that maps of different scales will have different resolutions and errors. By its very nature, a 1:50 000 map cannot accurately record the morphology of features with a spatial extent much less than 50 m, as at this scale such features would need to be drawn less than 1 mm wide. If this map were digitised, there is nothing to stop it being reproduced at much larger scales and used to collect spatial measurements well beyond the accuracy of the original data. The term 'map generalisation' refers to the errors arising from making inferences from map data that have been reproduced at scales that are incongruous with the limitations set by the original scale of the drawing. This practice can be



the source of considerable error and should be avoided. Furthermore, as we have explained, combining maps of two different scales can also result in significant spatial error. When combining maps of different scales it is therefore important to structure questions and condition inferences to the error of the smallest-scale (i.e. least detailed) map used in the analysis. Readers requiring further information on map generalisation and spatial error are referred to João (1998).
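The scale arithmetic behind map generalisation can be made explicit with a small sketch. It assumes, for illustration only, that the smallest mark that can be drawn on a printed map is about 1 mm wide; the function names are hypothetical.

```python
def ground_size_m(scale_denominator, map_size_mm=1.0):
    """Ground distance in metres represented by map_size_mm on the printed map."""
    # 1 mm on a 1:50 000 map covers 50 000 mm, i.e. 50 m, on the ground
    return scale_denominator * map_size_mm / 1000.0


def supports_precision(scale_denominator, required_precision_m, min_drawn_mm=1.0):
    """True if the map could plausibly resolve features at the required precision.

    The 1 mm minimum drawn width is an illustrative assumption, not a standard.
    """
    return required_precision_m >= ground_size_m(scale_denominator, min_drawn_mm)


print(ground_size_m(50_000))            # 50.0
print(supports_precision(50_000, 10))   # False: 10 m detail is finer than the map records
print(supports_precision(50_000, 100))  # True
```

Enlarging a digitised 1:50 000 map on screen does not change these values, which is the point of the warning about generalisation.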

Building surface models

6.1 Introduction
Surface modelling is an important analytical tool and, particularly in the case of elevation modelling, is often the final stage of GIS project development. Constructing a digital elevation model (DEM) from secondary sources, such as digitised contour lines and/or spot heights, or from primary data, such as LiDAR or DGPS survey, is a frequent objective of surface modelling (Atkinson 2002). Surface models can also be derived from a wide range of point-based environmental and anthropogenic data, such as artefact counts or soil chemistry (e.g. Robinson and Zubrow 1999; Lloyd and Atkinson 2004). The derivation of a continuous surface from a set of discrete observations involves a process called interpolation, and the selection of an appropriate interpolation technique depends on the structure of the sample data plus the desired outcome and characteristics of the surface model. This chapter begins by reviewing some of the more common interpolation methods, followed by a more detailed review of techniques for building DEMs from contour data.

6.2 Interpolation
Interpolation is a mathematical technique of 'filling in the gaps' between observations. More precisely, interpolation can be defined as predicting data using surrounding observations. It can be contrasted with extrapolation, which is the process of predicting values beyond the limits of a distribution of known points. To use a simple example, if n and m are unknown values within the set of numbers {2, 4, n, 8, 10, m}, then, using a simple model of linear change, n could be interpolated as being equal to 6. Using the same assumption of constant change, m would be extrapolated as equal to 12. It is important to highlight that interpolation may usefully be applied to any quantitative spatial phenomenon that has some degree of structure; in other words, when the values of observations are not random but have positive autocorrelation.
This term describes a form of spatial structure in which, over the entire distribution, it is to a greater or lesser extent the case that the closer two observations are together, the more similar their values. This is clearly the case with elevation data (e.g. the closer two places on a landscape are to each other, the more similar their elevations will be) and many other types of natural phenomena. Positive autocorrelation is an important prerequisite for interpolation, as it determines whether attributes can be predicted by looking at neighbouring data points. However, while positive autocorrelation can be assumed for many environmental phenomena such


6.3 Global methods


as elevation, rainfall, temperature, etc., it cannot necessarily be assumed for anthropogenic phenomena, such as the distribution of artefacts across a landscape. The degree to which a set of observations is autocorrelated should therefore be investigated before attempting to model it as a continuous distribution, as only positively autocorrelated distributions will be worth interpolating. In cases where there is weak positive autocorrelation, interpolation methods that use geostatistics to assess the character of spatial variation in a dataset (e.g. kriging, as described below) often perform better than simple methods. Both approaches are described in detail in this chapter.
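One common way of investigating autocorrelation before interpolating is global Moran's I. The text does not prescribe a particular statistic, so this is a swapped-in illustration: a minimal NumPy sketch assuming a user-supplied spatial weights matrix, here simple binary neighbours along a transect (a hypothetical choice). Values well above the no-autocorrelation expectation of -1/(n - 1) suggest positive autocorrelation.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for observations and a spatial weights matrix w[i, j]."""
    z = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    dev = z - z.mean()
    numerator = dev @ w @ dev          # sum over i, j of w_ij * dev_i * dev_j
    denominator = np.sum(dev ** 2)
    return (len(z) / w.sum()) * numerator / denominator


def transect_weights(n):
    """Binary weights linking each observation to its neighbours along a line."""
    w = np.zeros((n, n))
    i = np.arange(n - 1)
    w[i, i + 1] = 1.0
    w[i + 1, i] = 1.0
    return w


w = transect_weights(10)
print(morans_i(np.arange(1, 11), w))  # about 0.78: smooth trend, positive autocorrelation
print(morans_i([1, -1] * 5, w))       # about -1.0: neighbours always differ
```

For irregular archaeological point data the weights would usually be built from distance bands or nearest neighbours instead, and significance assessed by permutation or an analytical test.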
Edge effects
The reliability of a prediction of an attribute value at an unsampled location will improve as the number and proximity of surrounding known points increase. It therefore follows that points in areas towards the edge of a distribution will often have less accurate predictions because there are fewer points surrounding them. This 'edge effect' can significantly distort the predicted surface at the edge of the distribution and, as a rough rule of thumb, the outer 10 per cent of an interpolated surface can be considered suspect. For example, where the distribution of sampled points covers a rectangular area of 100 × 100 m, the interpolated surface within the central 80 × 80 m will have fewer edge effects than the outer 10-m border. When collecting data for an interpolation, such as contour lines to build a DEM, it is wise to include information from the surrounding area to reduce the influence of edge effects. The interpolated surface can then be clipped to the desired extent.

Types of interpolation
Interpolation algorithms can be characterised in a number of different ways. One characterisation is based on the number of points used: global operators use all points collectively to identify a global trend; local operators look only to known values in a defined neighbourhood of the unsampled location, so that the surface responds to local variability, which typically results in a rougher but locally more responsive surface. Interpolation algorithms can also be distinguished according to whether they are exact or inexact. The former describes models where the values of the original data points are maintained in the output, whereas inexact methods result in an entirely derived surface, replacing the original data values with those derived from the model. Finally, interpolations can be defined as constrained or unconstrained, depending on whether upper and lower limits to the model are fixed.

6.3 Global methods

6.3.1 Trend surface analysis
Global interpolation methods are extremely valuable for assessing general trends in data but provide poor predictions of local variation. The most commonly used global interpolation method in archaeology is trend surface analysis, in which a mathematical surface is fitted to a spatial distribution of quantitative attribute values. For example, Fig. 6.1(a) shows an artefact distribution where the size of each artefact has also been recorded. The distribution can be envisaged as a three-dimensional point cloud, where the size of the artefact defines its height above the surface. A trend surface model applied to this distribution is the mathematical equivalent of fitting a sheet of paper through the three-dimensional point cloud so as to minimise the sum of the distances between each point and the piece of paper (or, more precisely,




[Figure: three maps of artefact sizes. Legend classes for map (a): 2-19 mm; 19-38 mm; 38-65 mm; 65-110 mm; 110-204 mm.]

Fig. 6.1 Trend surface analysis of artefact sizes. The upper map (a) is a proportional symbol representation of artefact sizes. Map (b) models size change using a linear trend surface and map (c) a quadratic trend surface. Created using TREND in Idrisi.

the sum of the squared distances). A 'first-order' trend surface, as in Fig. 6.1(b), attempts to fit the imaginary sheet of paper without bending it at all, whereas second-order (Fig. 6.1c) and higher-order surfaces warp the sheet in progressively more complex ways to minimise the average distance between all the points. In effect, the higher the order of the surface, the more local the model becomes.

Mathematically, a trend surface is a polynomial model. The term 'polynomial' describes a simple function constructed by the addition of terms, typically of the form $bx$, where $b$ represents a number (a coefficient) derived from the model and $x$ is the x-coordinate of the spatial object. The complexity (or degree) of a polynomial is defined by the highest power to which its variables are raised. As applied to two-dimensional spatial objects, a first-degree, or linear, polynomial model is given by the equation:

$$z_{x,y} = b_0 + b_1 x + b_2 y \qquad (6.1)$$

where $z$ is the value of the variable to be established; $b_0$, $b_1$ and $b_2$ are coefficients obtained by regression analysis; and $x$ and $y$ are geographical coordinates (Hodder and Orton 1976, p. 160). Second-degree, or quadratic, models, as in Fig. 6.1(c), are given by the equation:

$$z_{x,y} = b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2 \qquad (6.2)$$
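Fitting Eq. 6.1 or 6.2 is an ordinary least-squares regression on the coordinates, which can be sketched with NumPy. This is a minimal illustration, not the TREND module used for Fig. 6.1: the points below are synthetic, lying exactly on the linear surface reported for Fig. 6.1(b), so the fit recovers those coefficients and gives R² close to 1 rather than the 0.095 obtained for the real artefact data.

```python
import numpy as np


def trend_design(x, y, order=1):
    """Columns of Eq. 6.1 (order 1) or Eq. 6.2 (order 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cols = [np.ones_like(x), x, y]
    if order == 2:
        cols += [x ** 2, x * y, y ** 2]
    elif order != 1:
        raise ValueError("this sketch handles first- and second-order surfaces only")
    return np.column_stack(cols)


def fit_trend_surface(x, y, z, order=1):
    """Ordinary least-squares coefficients b0, b1, ... for the trend surface."""
    design = trend_design(x, y, order)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(z, dtype=float), rcond=None)
    return coeffs


def trend_r_squared(x, y, z, coeffs, order=1):
    """Proportion of the variation in z explained by the fitted surface."""
    z = np.asarray(z, dtype=float)
    fitted = trend_design(x, y, order) @ coeffs
    return np.sum((fitted - z.mean()) ** 2) / np.sum((z - z.mean()) ** 2)


# Synthetic points on a 10 x 10 grid, lying on z = 11.45 + 0.21x + 0.46y
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
x, y = gx.ravel(), gy.ravel()
z = 11.45 + 0.21 * x + 0.46 * y

b = fit_trend_surface(x, y, z, order=1)   # approximately [11.45, 0.21, 0.46]
b2 = fit_trend_surface(x, y, z, order=2)  # quadratic terms come out near zero
print(np.round(b, 2), trend_r_squared(x, y, z, b))
```

With real, noisy data the same functions return the kind of low R² values discussed below, and raising `order` will always improve the fit to the sample points while increasing sensitivity to edge outliers.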


For example, in the linear model in Fig. 6.1(b), the variability in artefact size, $z$, is modelled as $z = 11.45 + 0.21x + 0.46y$. When applied to artefact sizes in the map in Fig. 6.1(a), both the linear and quadratic models suggest that, in general, artefact size is lowest in the upper left of the distribution, with the linear model showing size increasing towards the lower right. The quadratic model shows sizes increasing towards the upper right and lower left of the distribution. As with regression analysis, an $R^2$ value (often referred to as the 'goodness-of-fit') can be used to assess how well the calculated surface fits the original distribution of values, and can be taken as a measure of the proportion of the variation that is explained by the modelled surface (Shennan 1988, pp. 126-131). It is calculated as:

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{z}_i - \bar{z})^2}{\sum_{i=1}^{n} (z_i - \bar{z})^2} \qquad (6.3)$$

where $z_i$ is the observed value, $\hat{z}_i$ is the value predicted by the model and $\bar{z}$ is the mean of all observed values. $R^2$ values of 0.095 for the linear model in Fig. 6.1(b) and 0.105 for the quadratic model in Fig. 6.1(c) show poor correspondence between the modelled surface and the original data points. A higher-order polynomial would probably provide a better fit between the observed and predicted data values, but such models are often overly sensitive to outliers at the edges of distributions, which can significantly distort reality (Burrough and McDonnell 1998, p. 109). Trend surface analysis has value for identifying very general data trends but by itself rarely provides an adequate understanding of spatial processes and should be used cautiously. This does not entirely nullify the value of trend surface analysis, but it does suggest that using polynomial surface models to understand archaeological distribution patterns can often result in a gross approximation of more complex patterns. Examples of the use of trend surface analysis to make sense of




Fig. 6.2 Linear interpolation. Using Eq. 6.4, the value at x is predicted as $(48 - 12) \times d_{lx}/d_{ul} + 12 = 18$.

archaeological data can be found in Hodder and Orton (1976, pp. 155-174), who discuss the technique in considerable detail. Sydoriak (1985) and D'Andrea et al. (2002) show its value for assessing intra-site artefact distributions, and Bove (1981) and Neiman (1997) have used trend surfaces to help understand the collapse of Classic Maya civilisation.

6.4 Local methods
Local interpolators are used to model surfaces that provide information about local variability as accurately as possible. They will provide a close or exact fit between the original data observations and the model, depending on whether they are exact or inexact interpolators. Some local interpolation methods take into account trends across the whole study area to establish more precisely the local versus distant influence on attribute values, while others look only to a very small neighbourhood to predict a value. There are many different types of local interpolator, but here we review the two most commonly used techniques: inverse distance weighting and splines. Both produce a continuous surface grid from an initial distribution of discrete observations, although exactly how they do this, and their resulting surfaces, vary enormously. In general terms, they are both appropriate for modelling quantitative surfaces of environmental or anthropogenic phenomena, including elevation. A further local interpolator that may be used for the interpretation of qualitative data, Thiessen tessellations or polygons, is described in Chapter 10.

6.4.1 Linear interpolation with inverse distance weighting
Linear interpolation is conceptually equivalent to drawing a straight line between two points with known attribute values. Any point along this line can then have an attribute value predicted by establishing its relative distance between the points, as given by:

$$z_x = \frac{(z_u - z_l)\, d_{lx}}{d_{ul}} + z_l \qquad (6.4)$$

where $z_x$ is the interpolated value at point $x$; $z_u$ and $z_l$ are the upper and lower values; $d_{ul}$ is the linear distance between them; and $d_{lx}$ is the distance between $x$ and the lower value. This is demonstrated in Fig. 6.2.



[Figure legend: known value; cell being calculated; unknown value.]

Fig. 6.3 Interpolation is used to calculate values at points within a distribution of points with known values. The distance-weighted average of up to 12 nearest neighbours is often used. Distances not lying along the x- or y-axes are calculated using Pythagoras' theorem (see Fig. 2.9).

However, the use of only two points along a straight line does not always result in accurate predictions of values at unsampled locations. Methods that look to the values and distances of a larger sample of surrounding known points give more reliable results. One common approach is called an inverse distance weighted (IDW) predictor, in which the weight given to each of the surrounding points is inversely proportional to its linear distance raised to a specific power, $r$. This is given by:

$$\hat{z}(x_j) = \frac{\sum_{i=1}^{n} z(x_i)\, d_{ij}^{-r}}{\sum_{i=1}^{n} d_{ij}^{-r}} \qquad (6.5)$$

where $\hat{z}(x_j)$ is the unknown value, $z(x_i)$ are the known values and $d_{ij}$ is the distance between the unknown and known points. The attribute values of the $n$ nearest neighbours are used to establish the unknown value (Fig. 6.3). The exact number of neighbours selected will have an effect on the final surface. Some programs select the nearest 12 by default (hence 'IDW12'), whereas others use a search radius to select between 4 and 8 points. The distance from each data point to the point being interpolated is then transformed to its reciprocal (i.e. $d_{ij}^{-1}$), or more often to the inverse of the distance raised to a power of two (i.e. $d_{ij}^{-2}$). The transformed distance is then multiplied by the attribute


Table 6.1 Attribute, distance values and weighted attributes for Fig. 6.3

i       z(x_i)   d_ij   1/d    z/d    1/d²   z/d²
1       46       2.2    0.45   20.9   0.21   9.5
2       48       3.6    0.28   13.3   0.08   3.7
3       36       4.1    0.24    8.8   0.06   2.1
4       30       4.1    0.24    7.3   0.06   1.8
5       48       5.0    0.20    9.6   0.04   1.9
6       32       5.0    0.20    6.4   0.04   1.3
7       22       6.0    0.17    3.7   0.03   0.6
8       30       6.1    0.16    4.9   0.03   0.8
9       50       6.1    0.16    8.2   0.03   1.3
10      50       6.4    0.16    7.8   0.02   1.2
11      24       6.7    0.15    3.6   0.02   0.5
12      32       6.7    0.15    4.8   0.02   0.7
Total                   2.56   99.3   0.64   25.4

values of the known point to 'distance weight' the attribute value, so that points located further away contribute less to the prediction. These values are then summed and divided by the sum of the transformed distances. Inverse distance weighted methods are by necessity exact interpolators, for when $d$ approaches 0 the weight increases to infinity, and thus the original data points are maintained in the derived surface.¹ The influence of distance can be modified by changing its weighting: for example, Burrough and McDonnell (1998, pp. 117-118) show how an interpolated surface is significantly altered by modifying the distance function to $d_{ij}^{-2}$, $d_{ij}^{-3}$ and $d_{ij}^{-4}$. Table 6.1 provides a worked example and lists the attribute values ($z(x_i)$) and distances ($d_{ij}$) for the 12 nearest identified neighbours in Fig. 6.3. Both the reciprocal and the reciprocal of the square are listed, together with the corresponding weighted values. Depending on which of these two weights is chosen, the predicted value for the grey square in Fig. 6.3 would be either 99.3/2.56 = 38.8 or 25.4/0.64 = 39.7. Inverse distance weighting is a good all-purpose interpolation algorithm for well-spaced data points, but determining an optimum set of parameters is difficult; varying the number of neighbours and the distance weighting will significantly alter the surface. Withholding some data points from the model and then comparing these known values with the values predicted at those points is one way of comparing the accuracy of different sets of parameters. These differences can be quantified as an RMSE, following Eq. 5.1, and interpolations produced using different weighting values can then be objectively compared. However, short of actually visiting the area
¹ For example, if d = 0.5, its reciprocal is 2; if d = 0.001, its reciprocal is 1000; if d = 0, its reciprocal is ∞.

6.5 Kriging


being interpolated to check the correspondence between the model and the actual phenomenon, there is no way to ascertain whether the model is correctly predicting values at unsampled locations. If this level of information is required, then kriging may be a good alternative, as it provides a measure of the accuracy of the modelled surface.

6.4.2 Splines
'Splines' refer to a good general-purpose interpolation method, best applied to smoothly varying surfaces such as elevation. Like the trend surface model, fitting splines produces a polynomial surface, conceptually equivalent to bending a sheet of rubber to pass through the three-dimensional point cloud (the method in fact takes its name from the flexible rulers, called 'splines', used by cartographers to draw smooth curves). More precisely, spline functions are piece-wise polynomial functions that are joined together at break points to form a bicubic spline that passes exactly through the data points (Burrough and McDonnell 1998, p. 119). As with all exact interpolators, excessively high or low values can cause unnatural 'pits or peaks' in the modelled surface. Thin-plate splines, the usual implementation in GIS packages, solve this by replacing the exact surface with a weighted average to produce a minimum-curvature surface (Franke 1982; Burrough and McDonnell 1998, p. 120). The thin-plate spline interpolation method usually offers the option of setting either a tensioning or a regularising weight. Increasing the tension weight adjusts the tension of the total curvature of the surface to produce a rougher surface that improves the fit with the original data, whereas increasing the regularising weight adjusts the third derivative in the curvature minimisation expression to produce a smoother surface.
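The two local interpolators described in this section can be sketched in Python with NumPy. The `idw` function applies Eq. 6.5 to the values and distances in Table 6.1; computed without intermediate rounding it gives about 38.6 (r = 1) and 40.4 (r = 2), rather than the 38.8 and 39.7 obtained from the two-decimal weights in the table, which illustrates how sensitive the prediction is to the weights. The thin-plate spline is a minimal textbook formulation (radial basis r² log r plus a linear trend, with an optional smoothing term added to the diagonal); it is an assumption-laden sketch, not the tension or regularised spline of any particular GIS package, and the sample points are hypothetical.

```python
import numpy as np


def idw(values, distances, power=2.0):
    """Eq. 6.5: inverse-distance-weighted average of neighbouring values."""
    z = np.asarray(values, dtype=float)
    d = np.asarray(distances, dtype=float)
    if np.any(d == 0):                  # exact interpolator: return the datum itself
        return float(z[d == 0][0])
    w = d ** -power
    return float(np.sum(w * z) / np.sum(w))


def _tps_kernel(r):
    """Thin-plate radial basis r^2 log r, taking its limit of 0 at r = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        k = r ** 2 * np.log(r)
    return np.where(r > 0, k, 0.0)


def tps_fit(xy, z, smoothing=0.0):
    """Solve for thin-plate spline weights plus a trend a0 + a1*x + a2*y.

    smoothing = 0 gives an exact interpolator; smoothing > 0 relaxes the fit.
    """
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    r = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    p = np.column_stack([np.ones(n), xy])
    a = np.zeros((n + 3, n + 3))
    a[:n, :n] = _tps_kernel(r) + smoothing * np.eye(n)
    a[:n, n:] = p
    a[n:, :n] = p.T
    rhs = np.concatenate([np.asarray(z, dtype=float), np.zeros(3)])
    sol = np.linalg.solve(a, rhs)
    return xy, sol[:n], sol[n:]


def tps_eval(model, points):
    """Evaluate a fitted thin-plate spline at new (x, y) locations."""
    nodes, w, trend = model
    pts = np.asarray(points, dtype=float)
    r = np.linalg.norm(pts[:, None, :] - nodes[None, :, :], axis=-1)
    return _tps_kernel(r) @ w + np.column_stack([np.ones(len(pts)), pts]) @ trend


# Table 6.1: attribute values and distances of the 12 nearest neighbours
z_tab = [46, 48, 36, 30, 48, 32, 22, 30, 50, 50, 24, 32]
d_tab = [2.2, 3.6, 4.1, 4.1, 5.0, 5.0, 6.0, 6.1, 6.1, 6.4, 6.7, 6.7]
print(round(idw(z_tab, d_tab, power=1), 1))  # 38.6
print(round(idw(z_tab, d_tab, power=2), 1))  # 40.4

# Hypothetical spot heights for the spline
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.2, 0.8)]
heights = [10.0, 12.0, 11.0, 15.0, 14.0, 11.5]
exact = tps_fit(pts, heights)                  # passes through every data point
smooth = tps_fit(pts, heights, smoothing=1.0)  # trades fit for a smoother surface
print(np.allclose(tps_eval(exact, pts), heights))  # True
```

Comparing `exact` and `smooth` at withheld points, using the RMSE of Eq. 5.1, is one way to choose the smoothing weight, in the same spirit as the cross-checking suggested above for IDW parameters.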
Like IDW methods, splines are a good all-purpose interpolation method, and they have the major advantage of retaining small-scale variability in the data. Because of their inherently smooth surface, DEMs generated from splines can produce aesthetically pleasing sloped surfaces. The major disadvantages are that, as with the inverse distance weighting methods, there is no easy way to assess how accurately the model reflects actuality, and the smooth surfaces may be inappropriate for some sorts of data.

6.5 Interpolation with geostatistics: kriging

The general interpolators described above perform well on most datasets, but will not necessarily be the best approach for situations where there is weak positive spatial autocorrelation and the density and distribution of points is irregular. In these cases an interpolation method that uses geostatistical methods to better estimate the degree of spatial variation and autocorrelation in a dataset will help ensure that the predictions are optimal and will also provide information on error rates associated with each prediction (Haining 2003, p. 327). Interpolation using geostatistics is known as kriging and is a more complex process than the previously discussed techniques. Kriging is conceptually similar to inverse distance weighting


Building surface models

in that it uses a variable to weight the contribution of surrounding known values based on their distance from the unsampled location to predict a new value. The major difference is that the weighting value is dependent on the spatial structure and degree of spatial autocorrelation of the sample distribution. The calculation of weighting values first involves the construction of an experimental variogram (a mathematical model that predicts what influence distance has on the relationship between known values) and then fitting this to a theoretical model defined using an established function.

The use of variograms to derive localised predictions of attributes for unsampled locations is dependent on regionalised variable theory (Matheron 1971). This theory states that spatial variation of a variable is a product of a deterministic component and a stochastic (random) component. The deterministic component is a global mean or trend. The stochastic component has two parts: a localised random component correlated with the global pattern, and highly localised random noise caused by measurement error or small-scale spatial processes (Burrough and McDonnell 1998, pp. 133–134; Lloyd and Atkinson 2004). This theory can be expressed as the value of a variable $z$ at point $x$ being defined by:

$$z(x) = m(x) + \varepsilon'(x) + \varepsilon'' \qquad (6.6)$$

where $m(x)$ is a function describing the global trend of $z$ at $x$, $\varepsilon'(x)$ is the random but spatially dependent local variation at $x$, and $\varepsilon''$ is random noise (DeMers 1997). For example, if $z$ is taken to represent the number of lithic artefacts recovered in a 2 × 2 m excavation square, then regionalised variable theory might be used to model a situation in which the number of artefacts actually recovered (or predicted to be recovered) is a product of: (a) a general trend of artefacts decreasing in number the closer one is towards the periphery of the distribution; plus (b) a localised trend that has the number of artefacts at the periphery varying from high to low depending on whether the edge of the distribution is slightly upslope or downslope; plus (c) random influence at a highly localised level, such as the presence of a midden at the edge of the distribution, which could cause an area to have an unusually high number.

Kriging, by first establishing the relationship between distance and attribute variability, uses regionalised variable theory to establish locally responsive distance weightings. The weighting function is defined as $\lambda_i$, and the sum of the $\lambda_i$ is equal to 1. When applied to each of the values $z(x_i)$ from the $n$ neighbours within the search radius, the sum of the modified values provides the estimated attribute value $\hat{z}$ at location $x_0$ (where the $x_i$ are the $n$ neighbours):

$$\hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i \, z(x_i)$$
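Why the weights must sum to 1 can be seen from a short derivation. It assumes, as ordinary kriging does, that the deterministic component $m(x)$ of Eq. 6.6 is a constant mean $m$ within the search radius, and that $\varepsilon'$ and $\varepsilon''$ have zero expectation:

$$
\begin{aligned}
E\big[\hat{z}(x_0)\big] &= \sum_{i=1}^{n} \lambda_i \, E\big[z(x_i)\big]
  = \sum_{i=1}^{n} \lambda_i \big(m + E[\varepsilon'(x_i)] + E[\varepsilon'']\big) \\
  &= m \sum_{i=1}^{n} \lambda_i ,
\end{aligned}
$$

so the estimate is unbiased, $E[\hat{z}(x_0)] = E[z(x_0)] = m$, only when $\sum_{i=1}^{n} \lambda_i = 1$.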


In order to derive $\lambda_i$, an experimental semi-variogram must be created, which is a plot of the semi-variance, denoted by $\gamma$, against the distance between each pair of



Fig. 6.4 A plot of lag ($h$) against semi-variance ($\gamma$) is a variogram. The observations are shown as data points, to which an exponential theoretical model (solid line) has been fitted.

points in the sample (referred to as the lag, and denoted by $h$). The semi-variance can be estimated by the formula:

$$\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} \big[z(x_i) - z(x_i + h)\big]^2$$



where $n$ is the number of pairs of points separated by distance $h$. The plot of $\gamma$ against $h$ is the experimental variogram (Fig. 6.4). The most important stage in kriging is the fitting of a theoretical model to the experimental variogram, of which the most common are the spherical, exponential, linear and Gaussian models (Burrough and McDonnell 1998, p. 136). Close correspondence between the experimental and theoretical semi-variogram is essential for the accurate derivation of $\lambda_i$, and some measure of the success of the fit is needed to ensure an accurate match. Good kriging software should provide a mechanism for quantifying the fit between the two (e.g. via an $R^2$ value). A poor fit between the two models may indicate that there are two or more trends in the data that should be modelled separately. When a good fit is identified, the coefficients of the particular theoretical model fitted to the semi-variogram (based on the variable parameters of the nugget, sill and range, and the type of model used, as shown in Fig. 6.4) are then used to derive the values of $\lambda_i$ for each distance between the unknown and known points. As well as an interpolated surface, kriging provides a standard error surface, which can be used to identify areas where the interpolation is less accurate.

There are several different kriging techniques suitable for different forms of data. For samples without high peaks or troughs (i.e. atypically high or low values compared to their neighbours) and without spatial structure (e.g. a trend from high to low values), ordinary kriging can provide a more accurate surface model than methods derived without geostatistics. Where there are unusually high or low values, block kriging can provide a smoother surface by averaging values. Universal kriging is a technique that uses trend surface information but still responds



to local values. This method works well with data that have a definite trend, as it can utilise that trend to make more accurate predictions. The technique of co-kriging uses information from more than one variable in the distribution (e.g., returning to the example above, perhaps lithic artefacts and slope values). Co-kriging uses the autocorrelation of the variable in question as well as the correlation between it and another quantitative variable to make better predictions of the weighting value used in the interpolation.

Despite its complexity, kriging is a valuable alternative to more traditional interpolation algorithms, especially when some measure of the accuracy of the modelled surface is desirable. However, some understanding of the spatial structure of the sample data is essential to ensure that it is being implemented and interpreted correctly. Haining (2003, pp. 325–333) provides a good introduction to models for representing spatial variation. A detailed and informative discussion of kriging, with worked examples of calculating kriging weights, can be found in Burrough and McDonnell (1998, pp. 132–151) and Lloyd and Atkinson (2004). The latter work is notable as it is one of the few recently published examples of the applicability of kriging to archaeological data. The authors provide two case studies, one of which uses soil phosphate data obtained during the Laconia Survey in Greece (Buck et al. 1988), while the other examines assemblages of Roman pottery from the south of Britain reported by Allen and Fulford (1996). The other notable recent work is Ebert (2002), who uses kriging to predict the quantity of lithic artefacts expected to be recovered during field walking.

Geostatistical tools for the derivation of variograms and the fitting of theoretical models are available for many GIS packages, including ArcView (using one of the third-party kriging extensions listed on the ArcScripts page of the ESRI website²), ArcInfo, ArcGIS (as a component of the Geostatistical Analyst), GRASS (using Gstat) and Idrisi.
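Outside these packages, the steps described in this section (estimating the experimental semi-variogram, fitting a theoretical model and solving for the weights $\lambda_i$) can be sketched directly. This is a minimal, assumption-laden illustration rather than a substitute for dedicated software: it assumes an isotropic exponential model, a simple least-squares fit over a hypothetical list of candidate ranges, ordinary kriging with a Lagrange multiplier enforcing $\sum \lambda_i = 1$, and a hand-written Gaussian elimination solver. All function names, parameters and sample values are hypothetical.

```python
import math


def experimental_semivariogram(points, lag_width, n_lags):
    """gamma(h) = 1/(2n) * sum of (z_i - z_j)**2 over the n pairs in each lag bin.

    points is a list of (x, y, z); returns a list of (mean_lag, gamma, n_pairs).
    """
    sums, lags, counts = [0.0] * n_lags, [0.0] * n_lags, [0] * n_lags
    for i, (xi, yi, zi) in enumerate(points):
        for xj, yj, zj in points[i + 1:]:
            d = math.hypot(xi - xj, yi - yj)
            b = int(d // lag_width)
            if b < n_lags:
                sums[b] += (zi - zj) ** 2
                lags[b] += d
                counts[b] += 1
    return [(lags[b] / counts[b], sums[b] / (2 * counts[b]), counts[b])
            for b in range(n_lags) if counts[b]]


def exponential_model(h, nugget, partial_sill, a):
    """Exponential variogram; gamma(0) = 0, so the nugget only applies for h > 0."""
    return 0.0 if h == 0 else nugget + partial_sill * (1.0 - math.exp(-h / a))


def fit_exponential(variogram, candidate_ranges):
    """Least-squares nugget and partial sill for each candidate range; keep the best R^2."""
    hs = [h for h, _, _ in variogram]
    gs = [g for _, g, _ in variogram]
    g_mean = sum(gs) / len(gs)
    sst = sum((g - g_mean) ** 2 for g in gs)
    best = None
    for a in candidate_ranges:
        fs = [1.0 - math.exp(-h / a) for h in hs]
        f_mean = sum(fs) / len(fs)
        sxx = sum((f - f_mean) ** 2 for f in fs)
        if sxx == 0:
            continue
        c = sum((f - f_mean) * (g - g_mean) for f, g in zip(fs, gs)) / sxx
        c0 = g_mean - c * f_mean
        if c0 < 0:  # a negative nugget is meaningless, so refit through the origin
            c0, c = 0.0, sum(f * g for f, g in zip(fs, gs)) / sum(f * f for f in fs)
        if c <= 0:
            continue
        sse = sum((g - (c0 + c * f)) ** 2 for f, g in zip(fs, gs))
        r2 = 1.0 - sse / sst if sst > 0 else 1.0
        if best is None or r2 > best[3]:
            best = (c0, c, a, r2)
    return best  # (nugget, partial_sill, range_parameter, r_squared), or None


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting; modifies its arguments in place."""
    n = len(rhs)
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(matrix[r][col]))
        matrix[col], matrix[p] = matrix[p], matrix[col]
        rhs[col], rhs[p] = rhs[p], rhs[col]
        for r in range(col + 1, n):
            f = matrix[r][col] / matrix[col][col]
            for c in range(col, n):
                matrix[r][c] -= f * matrix[col][c]
            rhs[r] -= f * rhs[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (rhs[r] - sum(matrix[r][c] * x[c] for c in range(r + 1, n))) / matrix[r][r]
    return x


def ordinary_kriging(x0, y0, points, model):
    """Return (estimate, kriging variance, weights) at (x0, y0); model(h) returns gamma(h)."""
    n = len(points)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for i, (xi, yi, _) in enumerate(points):
        for j, (xj, yj, _) in enumerate(points):
            a[i][j] = model(math.hypot(xi - xj, yi - yj))
        a[i][n] = a[n][i] = 1.0  # Lagrange row/column enforcing sum(weights) == 1
        b[i] = model(math.hypot(xi - x0, yi - y0))
    b[n] = 1.0
    gamma0 = b[:n]  # copy, because solve() overwrites b
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * z for w, (_, _, z) in zip(weights, points))
    variance = sum(w * g for w, g in zip(weights, gamma0)) + mu
    return estimate, variance, weights


# Usage with hypothetical points and a hand-chosen model (nugget 0.5, partial sill 10, a = 20)
samples = [(0, 0, 10.0), (10, 0, 14.0), (0, 10, 12.0), (10, 10, 18.0), (5, 5, 13.0)]


def model(h):
    return exponential_model(h, 0.5, 10.0, 20.0)


estimate, variance, weights = ordinary_kriging(3, 7, samples, model)
print(f"estimate={estimate:.2f}, kriging variance={variance:.2f}, sum of weights={sum(weights):.3f}")
```

In practice the hand-chosen model would be replaced by the parameters returned from `fit_exponential` applied to the output of `experimental_semivariogram`, and its $R^2$ inspected before kriging. Evaluating `ordinary_kriging` over every cell of a grid gives both the interpolated surface and the error (variance) surface mentioned above.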
There are also a number of freely available non-GIS packages that can be used for geostatistical analysis and kriging, such as R³ and Variowin⁴ (Pannatier 1996). Box 6.1 explains how to produce an interpolated surface using kriging in ArcGIS.

6.6 Creating digital elevation models

The creation of digital elevation models (DEMs) is one of the more common uses of interpolation algorithms in GIS, although with the increasing availability of high-resolution digital elevation data from mapping agencies this need is declining. A DEM is stored, manipulated, viewed and analysed in a GIS either as a raster grid, when it is properly referred to as an altitude matrix, or as a triangulated irregular network (TIN). Raster-based DEMs are the most popular method for working with elevation data, although TINs offer advantages in storage overheads and for some
² http://arcscripts.esri.com/
³ www.r-project.org
⁴ www-sst.unil.ch/research/variowin

6.6 Digital elevation models

Box 6.1 Interpolation using kriging in ArcGIS

Kriging in ArcGIS is performed via the Geostatistical Analyst extension. Initiating the Geostatistical Wizard leads to the user being asked to select from a number of different kriging methods, the choice of which depends on the structure of the data as explained above. Once a technique is selected, a range of visualisation tools is provided, including the experimental variogram and directional variogram to assist with the identification of anisotropy. Several different theoretical models can be selected, and it is possible to assess visually how well each of these fits the experimental variogram. Unfortunately there is no way of quantitatively comparing the fit between the experimental and theoretical models. One solution is therefore to use a separate program such as 'Variowin' (Pannatier 1996) to establish the best-fitting theoretical model and then enter its parameters (e.g. model type, nugget, range and sill) into the Geostatistical Wizard. Once these have been entered, the wizard passes the information to the Spatial Analyst and computes the interpolation, resulting in a modelled surface. The wizard allows for the generation of prediction errors, and also for randomly selecting subsets of data for empirical testing of the accuracy of the modelled distribution.

forms of analysis. Both model formats may be derived from spot heights (points), vector hypsography (contour lines) or a combination of the two.

The quality of a DEM can also be evaluated on the basis of its suitability for particular tasks. Is it, for example, of a suitable resolution for the scale of analysis and/or the secondary datasets with which it is to be integrated? Is it a statistically accurate surface? Is it smooth and aesthetically pleasing? Prioritising one or more of these qualities depends very much on the application to which the DEM will be put (Yang and Holder 2000). For example, the aesthetic qualities of a DEM, particularly when viewed isometrically, are important for landscape visualisation and can provide a different perspective on the character of topography and terrain. They can be especially powerful when 'draped' with other information, such as shaded relief, aerial photographs or other datasets (Fig. 6.5). On the other hand, smoothly varying surfaces may not necessarily be the most accurate. In some circumstances it may be more important to ensure that the DEM contains as few errors as possible, even if this means sacrificing some other aspect such as resolution or visual impact.

The accuracy of the DEM is essential if it is to be used for spatial analysis. Digital elevation models form the base data for a host of derived surfaces and ecological and environmental models (discussed in further detail in Chapters 9 and 10). The DEM-derived surfaces that are commonly used in archaeological GIS include: the first-order derivatives of slope and aspect (e.g. Kvamme 1985; Woodman 2000b; Barton et al. 2002); the second-order derivative of terrain curvature (e.g. Bevan and


Fig. 6.5 An isometrically viewed DEM, hillshaded and draped with artefact distribution patterns (legend: sherds per 100 m walked). Source: Kythera Island Project. Used with permission.

Conolly 2004); visibility (e.g. Wheatley and Gillings 2002; Lake and Woodman 2003; Llobera 2003); movement (e.g. Llobera 2000) and cost-surfaces (e.g. Bell and Lock 2000; Bell et al. 2002); and hydrological models and watershed extents (Hill 1998; Bevan 2003; Bevan and Conolly 2004; Table 6.2). Digital elevation models are also needed for the construction of taphonomic models, such as soil loss and sediment movement (Burrough and McDonnell 1998, pp. 193–198; Barton et al. 2002; see also Chapter 9), although the full potential of quantitative modelling of erosion processes has not yet been exploited by archaeological users of GIS.

The quality of DEMs varies enormously depending on the accuracy and structure of the original data and the interpolation methods used. The number and distribution of sample points ('spot heights') is crucial. Generally speaking, the greater their number in areas of highly variable terrain, the more accurate the interpolation is likely to be. Whenever possible, therefore, it is better to use primary data collected manually using the techniques described in Chapter 5, so that the distribution of points can be properly controlled. Obviously, this is not a viable option for large-scale regional landscape projects or in situations where survey is impractical or impossible. In many regions, DEMs with a grid spacing of 15–30 m are available from satellite sources, but these may not be of sufficiently high resolution for some forms of landscape archaeology. In these situations it is worth investigating the availability of DEMs from government or commercial data suppliers. Note, however, that some commercially available DEMs are actually interpolations of existing contour data and are therefore likely to share the same errors as other interpolated data.

Table 6.2 Surface models that can be computed from a DEM

| Attribute | Definition | Practical application |
|---|---|---|
| Slope/gradient | Maximum rate of change in elevation | Steepness of terrain; difficulty of movement; land capability classification; erosion and artefact movement; predictive modelling |
| Aspect/exposure | Compass bearing of steepest downhill slope | Solar irradiance; vegetation modelling; movement; site predictive modelling |
| Profile curvature | Rate of change of slope | Erosion modelling; soil and land evaluation |
| Shaded relief | Representation of terrain relief via a shadow effect ('hillshade') | Visual assessment of terrain variability |
| Irradiance | Amount of solar energy falling on a landform | Vegetation modelling; land capability classification; predictive modelling |
| Viewshed | Location of visible landscape from a given point | Site and settlement location analysis; predictive modelling |
| Watershed | The region draining to a defined point in the landscape | Settlement location analysis |

Source: Adapted from Burrough and McDonnell (1998, pp. 190–192, 204).
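The first two attributes in Table 6.2 can be derived from an altitude matrix with finite differences. The sketch below is a minimal illustration assuming a square-celled grid stored as a list of rows, with row 0 at the north edge; it uses simple central differences on the four direct neighbours, whereas many GIS packages use a weighted 3 × 3 window (e.g. Horn's method), so its values will differ slightly from theirs. Edge cells are left undefined and flat cells have no aspect. The function name and test grid are hypothetical.

```python
import math


def slope_aspect(dem, cell_size):
    """Slope (degrees) and aspect (bearing of steepest descent, degrees clockwise from north).

    dem is a list of rows of elevations, row 0 at the north edge and column 0 at the west edge.
    Edge cells are returned as None; flat cells have aspect None.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    aspect = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_east = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_north = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell_size)
            # Slope: the maximum rate of change in elevation, expressed as an angle
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_east, dz_north)))
            if dz_east != 0 or dz_north != 0:
                # Downhill is minus the gradient; atan2(east, north) turns it into a bearing
                aspect[r][c] = math.degrees(math.atan2(-dz_east, -dz_north)) % 360
    return slope, aspect


# A hypothetical 3 x 3 DEM (10 m cells) rising 10 m per cell towards the east
dem = [[0, 10, 20],
       [0, 10, 20],
       [0, 10, 20]]
slope, aspect = slope_aspect(dem, cell_size=10)
print(slope[1][1], aspect[1][1])
```

For this grid the centre cell should report a slope of about 45° and an aspect of 270°, i.e. a west-facing slope, because the ground falls away towards the west.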

6.6.1 Building a DEM from contour lines

The creation of a DEM using manually digitised contour lines from paper maps must often be resorted to because of the lack of any alternative sources of information. However, contours do not provide an ideal set of data for creating a DEM. First, it is often difficult to assess how accurately the contour data represent the actual morphology of the landscape in question. If the contours were originally derived from photogrammetric analysis of aerial photographs then they will be more accurate than contours created from interpolated spot heights, but it is often unclear how the contour map was created. Second, as described in Chapter 5, manually digitising contour data will contribute further spatial error to the dataset. Third, as described below, contour lines are less than ideal datasets for the interpolation routines described in the previous section.

A common but often unsatisfactory approach to interpolating DEMs from contours is to rasterise the contour lines and then use the corresponding grid cells in an IDW, spline or kriging interpolation. Alternatively, the vertices of the polylines may be extracted as points to be used as nodes for one of these routines. In these situations there are good reasons to be cautious about using interpolators such as the IDW or spline methods described previously. Simple interpolators like IDW work best with evenly distributed data, whereas contour lines present a string of points of the same value with potentially very wide spaces before the next string of points. Under these conditions, simple distance weighted algorithms based on a nearest neighbour or fixed radius search are likely to produce significant errors, or 'artefacts', in the



Fig. 6.6 Problems associated with the simple interpolation of contour data. Using a fixed search radius can result either in no points being available for interpolation (point A), which results in areas with an erroneous predicted value of '0', or in a set of points sharing a single value (point B), leading to 'plateaus' forming around contour lines. Distance weighted interpolation of nearest neighbours may provide a smoother surface (point C), although points located between widely spaced intervals, such as at D, will still result in a plateau. Source: Kythera Island Project.

modelled surface (Burrough and McDonnell 1998, Fig. 5.16; Carrara et al. 1997, p. 471; Fig. 6.6). Such problems are often clearly visible as alternating plateaus and steep slopes on slope maps derived from the DEM (Fig. 6.7). Despite the importance of the DEM and the frequent need to use contour lines to create one, there is no established best practice for converting contour data to a continuous surface, and the subject remains the focus of considerable debate (see, for example, Carrara et al. 1997; Burrough and McDonnell 1998, pp. 121–131; Franklin 2000; Yang and Holder 2000; Merwin et al. 2002, for recent technical discussions). One approach is to use interpolation algorithms specially designed to handle contour data, as these are able to manage the irregularity of the point distribution, as demonstrated by examples in Hageman and Bennett (2000). However, a DEM created from contour data cannot be assumed to be accurate and must be checked for errors following interpolation. Some accepted methods for assessing the quality of a DEM include checking that:

(i) predicted heights falling on or near the original contour lines have values close or equal to the contour height;
(ii) predicted elevations positioned between two contour lines have a value between those of the two contour lines;
(iii) elevations vary linearly between the elevations of the two bounding contour lines;
(iv) areas bounded by a single contour interval, for example valley bottoms and hill tops, have a realistic morphology;
(v) any artefacts of interpolation (i.e. areas which have unrealistic predictions) are limited to less than 0.1–0.2 per cent of the modelled surface (Carrara et al. 1997, p. 453).

The quality of a DEM can also be assessed by deriving two additional datasets from the modelled surface. Firstly, a contour map with an interval one-half that of



Fig. 6.7 Typical problems of using contour data for interpolation. Although the DEM (a) appears to be an accurate model of surface variability, the derived slope map (b) shows extensive 'tiger-striping' (i.e. alternating light and dark areas that correspond to the location of the original contours). Created by IDW12 interpolation of contour vertices in ArcInfo 7.2. Source: Kythera Island Project.



Fig. 6.8 Comparison of original contour lines (thick) with contour lines generated from surfaces (thin) using (a) IDW interpolation and (b) a TIN. Significantly more discrepancies between the original and modelled surface are evident in (a) than in (b), because of the problems of using point-based interpolation of contour data. The TIN was transformed to a grid using TINLATTICE, and contour lines were generated for both surface grids using CONTOUR at one-half of the original interval using ArcInfo 7.2. Source: Kythera Island Project.

the original contour data can be compared to the predicted contours to identify the overall accuracy of the interpolation and areas where there may be errors (Fig. 6.8). Secondly, a slope map derived from a poorly interpolated DEM may show artificially steep slopes interspersed with plateaus in the form of 'tiger stripes' located along the original contour lines, as in Fig. 6.7. Finally, in critical applications, the accuracy of a DEM can also be verified by the statistical comparison of the DEM and the original dataset, using methods described in Kvamme (1990b), Wood (1996, Chapter 3), Carrara et al. (1997), Hageman and Bennett (2000) or Yang and Holder (2000). Common tests include the calculation of RMSE values and the examination of a frequency histogram of elevation values. The latter method, referred to as hypsometric analysis, is particularly useful when interpolating from contour data, as poorly interpolated data often result in 'spikes' corresponding to the original contour intervals (Wood 1996, Section 3.1.3).

Filtering (smoothing) a DEM (as described in Chapter 9) may help 'iron out' problems such as those just described. However, as a rule of thumb, if considerable filtering is required to make a DEM visually plausible, it is likely that there are problems with either the source data, or the method of interpolation, or both. Because of the problems associated with building DEMs from contours, it is preferable to resample the original data to reduce the effects of contour line data on the interpolation and/or to select a method that is better able to deal with this type of data. In this section we consider three common approaches used to build DEMs from contour data: TINs, linear interpolation based on steepest slope, and ArcInfo's proprietary TOPOGRID algorithm. No one method will be suitable for all types of terrain and, as accurate DEMs are critical for a wide range of other



Fig. 6.9 A Delaunay triangulation. No circle that passes through the three vertices of a triangle may contain any other node.

models, it is worth experimenting with and comparing different approaches to ensure that the best model is used.

Triangulated irregular networks

Triangulated irregular networks (TINs) are constructed from a set of mass points, which can be derived from spot heights, the vertices of contours or a combination of both. A TIN is constructed by building a tessellation of triangular polygons from the mass points, configured such that if a circle were passed through the three nodes of each triangle, no other nodes would be included (Fig. 6.9). This is called Delaunay triangulation, and it ensures that the closest node from any point in triangle n is one of the three nodes used to construct triangle n. However, 'raw' contour data present several problems for TIN building. Firstly, digitised contour lines often contain more data than are needed to represent a surface, and sample points along one contour line may lie closer together than points on the adjacent contour. When constructing a TIN, this results in unrealistically flat elevations in parts of the surface model, most often on lobed spurs, or where widely spaced contours occur on hill tops and valley floors (Fig. 6.10a, b). To help avoid redundant points being incorporated in the model, TIN-building programs therefore usually offer the capability of setting a proximal tolerance and a weed tolerance. The former establishes the minimum distance between points on the horizontal plane, so that if two or more points lie within the tolerance range only one is used as a node in the TIN. More critically, the 'weed tolerance' establishes the minimum distance along a line before a vertex is used in the TIN (ESRI 2002, p. 31). The accuracy of a TIN will be greatly improved by digitising critical points along contour lines (i.e. capturing only major, rather than all, inflection points) and 'very



Fig. 6.10 Using all vertices from a contour map (a) results in many flat triangles (shaded grey in b). Resampling the contours and adding spot heights between lobed contours (c) reduces this problem (d). Source: Kythera Island Project.

important points' (VIPs) that define hill tops, pits and saddles. Carrara et al. (1997, pp. 470–471) also show that, without hard breaklines defined to help guide the construction of triangles, valley bottoms and hill tops are modelled as flat triangles, which results in an unrealistic surface model. One suggested way to capture critical points and VIPs is to pre-mark them on the map using a coloured pen so that they can then be manually digitised (with their elevation values) to create the mass points for the TIN (ESRI 2002, p. 57). This will reduce the number of flat triangles and result in a more realistic and accurate surface (Fig. 6.10c, d). To improve the accuracy of the TIN further, additional landscape features such as ridges and streams can also be integrated and defined as hard 'break lines'. These lines, which do not necessarily require elevation values, are then maintained as linear features in the TIN and interrupt surface smoothness, which is particularly important for the accurate modelling of hydrological processes.
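The Delaunay criterion illustrated in Fig. 6.9 (no node inside the circle through a triangle's three vertices) can be tested directly, and once a TIN exists, elevations inside each triangle are normally obtained by planar (linear) interpolation from its three nodes. The sketch below shows both. It is a hypothetical illustration using plain floating-point arithmetic; production TIN software relies on more robust geometric predicates to cope with nearly co-circular points, and the function names and coordinates are assumptions.

```python
def orientation(a, b, c):
    """Positive if a, b, c (x, y) are in counter-clockwise order, negative if clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circle passing through a, b and c."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    det = ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
           + (bdx * bdx + bdy * bdy) * (cdx * ady - adx * cdy)
           + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))
    # The determinant is positive for "inside" only when a, b, c are counter-clockwise
    return det > 0 if orientation(a, b, c) > 0 else det < 0


def is_delaunay(points, triangles):
    """Check the empty-circumcircle rule for every triangle (given as index triples)."""
    for i, j, k in triangles:
        for m, p in enumerate(points):
            if m not in (i, j, k) and in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True


def interpolate_in_triangle(a, b, c, x, y):
    """Planar interpolation of z at (x, y) from the (x, y, z) nodes a, b, c of one TIN facet."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (x - c[0]) + (c[0] - b[0]) * (y - c[1])) / det
    wb = ((c[1] - a[1]) * (x - c[0]) + (a[0] - c[0]) * (y - c[1])) / det
    return wa * a[2] + wb * b[2] + (1.0 - wa - wb) * c[2]


# Four hypothetical mass points, triangulated with each of the two possible diagonals
pts = [(0, 0), (2, -1), (4, 0), (2, 3)]
print(is_delaunay(pts, [(0, 1, 2), (0, 2, 3)]))  # diagonal 0-2: prints False
print(is_delaunay(pts, [(0, 1, 3), (1, 2, 3)]))  # diagonal 1-3: prints True
```

The first triangulation fails because node 3 falls inside the circle through nodes 0, 1 and 2; swapping the diagonal (an 'edge flip') gives the Delaunay configuration. The same planar interpolation is one way the TIN-to-grid conversion described below can assign values to cells.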



The resulting TIN is a very versatile model and can be used 'as is' to derive attributes defining elevation, slope, aspect, shaded relief and viewsheds without requiring additional layers. Spikins et al. (2002) demonstrate an innovative use of TINs to reconstruct buried land surfaces using the three-dimensional coordinates of artefacts. Triangulated irregular networks also offer advantages over raster grids with regard to storage and processing time for smaller study areas, because the density of triangles can be adjusted to the complexity of the terrain and the needs of the model, with extra triangles used for areas of highly variable relief. However, with large study areas this advantage disappears because of the computationally demanding task of generating vector polygons. The major disadvantage of TINs is that the elevation surface often retains a triangular imprint, imparting an unrealistic, although not necessarily inaccurate, feel to a surface model.

Finally, in order to integrate TIN elevation data with other grid datasets, or for other modelling purposes, it may be necessary to convert the TIN into a raster grid. This is a straightforward process that involves laying a grid of the desired resolution over the TIN, with individual cell values being calculated by linear interpolation or a bicubic average to the nearest vertex. However, slope maps derived from this grid are also susceptible to the retention of the original triangular surface.

Linear interpolation between contours

One of the most basic algorithms that can be used on contour data is a linear interpolation along a line drawn between two contours. These methods first involve rasterising the digitised contours at the resolution of the final grid.
The rasterising must be done at a sufficiently fine resolution to ensure that no pixel has two contour lines passing through it, as this can cause significant errors. Linear interpolators then look to these contour lines to derive values for pixels that lie between them. For example, Idrisi's INTERCON linear interpolator works by projecting four straight lines (top–bottom, left–right, and two intersecting diagonals) through the pixel to be interpolated until each line intersects with a contour line. The slope of each of the four lines is established by calculating the change in attribute value divided by the length of the line. The line with the maximum slope is chosen and the interpolated value is estimated from the pixel's position on this line, as defined by Eq. 6.1. Without going into detail, this interpolator can produce some significant errors or 'artefacts' in the surface model, depending on the configuration of the original contour lines. Filtering the DEM will smooth it, but will not completely remove the errors (Eastman 2001, pp. 120–121). For this reason, simple linear interpolation of contour data is not advisable; in most circumstances the alternatives discussed next will produce better results.

A generally reliable linear interpolator is the 'flood-fill' method, as included in the GRASS GIS package as 'r.surf.contour'. In this method, for each cell in the input map that is not a contour line cell, a 'flood fill' (an even outward expansion in all directions) is generated from that spot until the fill comes to two unique values. The flood fill is not allowed to cross over the rasterised contour lines, thus ensuring


Fig. 6.11 Isometric view of a hillshaded DEM generated using TOPOGRID in ArcInfo 7.2. Hillshade generated with HILLSHADE using an azimuth of 315° and an altitude of 60°. Tiger striping is still present on south-facing slopes, although it is less pronounced than in the DEM produced using IDW in Fig. 6.7. Original contours have been overlaid for reference. Source: Kythera Island Project.

that an uphill and a downhill contour value will be the two values chosen. Unlike INTERCON, which uses a simple linear interpolation, r.surf.contour uses an inverse distance weighted average to derive the new attribute:

$$z = \frac{d_d \, z_u + d_u \, z_d}{d_u + d_d}$$

where $z$ is the interpolated value, $d_u$ is the distance to the nearest upslope contour, $d_d$ is the distance to the nearest downslope contour, and $z_u$ and $z_d$ are the two bounding contour values (Wood 1996, Section 3.2.1). This results in a more accurate surface, and r.surf.contour is a justifiably popular approach to building altitude matrices from contour data.

Topogrid

A well-established and robust method of creating DEMs from contour data is ArcInfo's TOPOGRID algorithm, which was designed to produce hydrologically correct elevation surfaces (Hutchinson 1989; Hutchinson and Dowling 1991). It will build a surface from contour data alone, but to work optimally it requires additional datasets such as spot heights, the locations of rivers, lake polygons, stream channels and ridge lines. Elevation data are initially used to construct a stream network that is

6.7 Conclusion


then used to ensure the hydrogeomorphic properties of the DEM. As it is a proprietary algorithm, the interpolation method is not fully documented, although it uses a form of thin-plate splines, with a roughness weighting to allow for abrupt changes in topography (ESRI 1999). Using it with contour data alone significantly reduces the accuracy of the model for hydrological applications, but usually produces a DEM that is nevertheless acceptable for other purposes. The derived slope map does exhibit some plateaus (Fig. 6.11), although it is often smoother than that produced by simple point-based interpolation (e.g. Fig. 6.7).

6.7 Conclusion

Surface models are an important analytical tool, although the process of transforming a set of discrete observations into a continuous distribution is rarely straightforward. Some of the methods reviewed in this chapter, such as trend surface analysis, may provide a useful picture of general patterning, but rarely provide a complete picture of spatial processes and need to be carefully interpreted. Although IDW and splines are good all-purpose interpolators for evenly spread data points, the results need careful examination to establish that they accurately represent the phenomenon in question. Other interpolators, such as kriging, require proper consideration of geostatistics before the model is created. Digital elevation models, in particular, form a crucial dataset for a variety of spatial models and analyses, so it is essential that the interpolation is as appropriate and error-free as possible. When attempting to create a DEM from contours, special attention is needed to deal with some of the problems inherent in this form of representation. Many GIS packages offer several types of interpolation, including kriging, with push-button simplicity. However, time spent verifying the accuracy of the original data, selecting a method appropriate for those data and the desired output, and then assessing the interpolated surface is time wisely spent.
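One of the surface checks recommended in Section 6.6.1, hypsometric analysis, is easy to automate: poorly interpolated contour data produce 'spikes' in the frequency histogram of elevation values at the original contour values. The sketch below is a hypothetical illustration; the 1 m bin width and the rule that flags a bin holding more than twice the average count of its two neighbours are arbitrary assumptions, not an established test.

```python
import math
from collections import Counter


def hypsometric_spikes(elevations, bin_width=1.0, factor=2.0):
    """Return the lower edges of histogram bins whose counts greatly exceed their neighbours'.

    elevations -- iterable of DEM cell values
    factor     -- a bin is flagged if its count exceeds factor x the mean of the two adjacent bins
    """
    counts = Counter(math.floor(e / bin_width) for e in elevations)
    spikes = []
    for b, n in counts.items():
        neighbours = (counts.get(b - 1, 0) + counts.get(b + 1, 0)) / 2
        if neighbours > 0 and n > factor * neighbours:
            spikes.append(b * bin_width)
    return sorted(spikes)


# Hypothetical DEM values: an even spread from 0 to 99.9 m, plus extra cells piled up
# at the 10 m contour values, as a poor contour interpolation might produce
cells = [i / 10 for i in range(1000)] + [c for c in range(10, 100, 10) for _ in range(30)]
print(hypsometric_spikes(cells))  # flags the bins starting at 10, 20, ..., 90 m
```

Spikes recurring at a regular spacing equal to the contour interval point to the source contours rather than to the landscape; real terrain rarely produces such a regular pattern.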

7 Exploratory data analysis

7.1 Introduction

Selecting and classifying geospatial data on the basis of their location and attributes starts the process of data exploration, pattern recognition and the interpretation of spatial data. The first part of this chapter examines queries as part of the analytical process in GIS. A query is a formal request for a subset of data based on one or more selection criteria, and forms a core function of GIS. The second part of this chapter then considers the subject of classification, which refers to the grouping or placing of qualitative or quantitative data into categories on the basis of shared characteristics. This chapter also discusses methods for the classification of multispectral satellite image data, which is an important process in the interpretation of remotely sensed imagery.

7.2 The query

There are three types of query performed in GIS: (i) phenomenal or attribute queries, which question the related non-spatial data tables of spatial objects (e.g. 'select all sites that have obsidian artefacts'); (ii) topological queries, which question the geometric configuration of an object or the relationship between objects (e.g. 'select all sites within Smith County'); and (iii) distance queries, which ask something about the spatial location of objects (e.g. 'select all sites within 100 km of an obsidian source').

In Chapter 4 the concept of the relational model was introduced for managing both attribute and spatial datasets. This is the most commonly encountered data structure in GIS. Its associated retrieval language, SQL, can be used as the 'engine' to find subsets of data on the basis of defined selection criteria. As powerful as SQL is, however, it was not designed to handle spatial datasets. Consequently, most GIS programs separate the querying of spatial (i.e. map) data from the querying of attribute data. SQL is more frequently used for the latter, which is termed a phenomenal query. Spatial queries are often invoked using special tools and query engines provided by the GIS software. This is not exclusively the case, however, and a version of SQL called Spatial SQL has been defined to enable map data to be queried using a procedure similar to that used for attribute data (Egenhofer 1991, 1994; Open GIS Consortium 1999). Database developers such as Oracle and PostgreSQL have also added basic spatial query extensions to their relational systems (respectively called Oracle Spatial and PostGIS).



7.2.1 SQL
SQL is a basic tool for querying relational databases. Every major commercial DBMS that uses the relational model understands and accepts SQL statements, although many desktop products (such as Microsoft Access) also use graphical interfaces for query construction. Most GIS systems, such as Idrisi, GRASS and ArcGIS, are able to process SQL statements (or pass them to an external relational database), and many also offer a graphical interface (e.g. ArcGIS's Query Builder) to help construct attribute queries. The SQL language is relatively easy to understand and the basics are straightforward to learn. It is, however, a large and complex topic and we are only able to provide examples of the more common types of query.

Clauses and operators

The 'select' clause
SQL uses a simple structure for selecting a set of records from one or more tables of data. The most powerful SQL component is the select clause, which retrieves one or more fields (variables) from a table of data specified using a from clause. For example, the SQL statement to retrieve a list of 'site id' numbers from the sites table defined in Table 4.2 is:

SELECT sites.site_id FROM sites;

Note that in this and the following examples, SQL keywords are always given in capital letters and table and field names in lower case. The from clause must name the table and, where a field is specified, it is separated from the table name by a full stop (e.g. sites.site_id).

The 'where' clause
Introducing conditions to specify what data we are interested in requires use of a where clause to restrict the selection set to records that meet the specified criteria. A where clause needs to invoke a conditional operator to define the conditions for selection in a query statement. The four mathematical operators, GREATER THAN (>), LESS THAN (<), EQUAL TO (=) and NOT EQUAL TO (<>), are for queries of quantitative attribute data. For example, the SQL statement designed for selecting all sites less than a hectare in size is given by:

SELECT sites.site_id FROM sites WHERE sites.area < 10000;

This statement would return a list of 'site id' numbers only for those sites that were smaller than 10 000 m². It is also possible to use the equality and inequality operators to extract qualitative data, for example, 'date EQUAL TO (=) Bronze Age'. The LESS THAN and GREATER THAN operators are not meaningful for qualitative data: depending on the system, applying them to text fields either returns an error or compares the values alphabetically, which is rarely what is intended. Both quantitative and qualitative data may also be selected using logical operators that also follow a where clause (Table 7.1). For example, the following statement could be used to retrieve all sites between 1 and 2 hectares (inclusive); because BETWEEN includes both of its end values, the limits are simply 10 000 and 20 000 m²:

SELECT sites.site_id FROM sites WHERE sites.area BETWEEN 10000 AND 20000;
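The select and where clauses above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the table layout and all values are hypothetical stand-ins for the sites relation of Table 4.2, and SQLite stands in for whatever DBMS or GIS a project actually uses.

```python
import sqlite3

# Hypothetical 'sites' relation (site_id, period, area in m2). The values
# are invented for illustration and are not the contents of Table 4.2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sites (site_id INTEGER PRIMARY KEY, period TEXT, area REAL)")
conn.executemany(
    "INSERT INTO sites VALUES (?, ?, ?)",
    [(1001, "Neolithic", 8500), (1002, "Bronze Age", 10000),
     (1003, "Mesolithic", 15500), (1004, "Bronze Age", 20000),
     (1005, "Palaeolithic", 42000)],
)


def site_ids(sql, params=()):
    """Run a query and return the first column of the result, sorted."""
    return sorted(row[0] for row in conn.execute(sql, params))


# Quantitative condition: sites smaller than one hectare (10 000 m2).
small = site_ids("SELECT sites.site_id FROM sites WHERE sites.area < 10000")

# Qualitative equality test; the ? placeholder avoids quoting problems.
bronze = site_ids("SELECT site_id FROM sites WHERE period = ?", ("Bronze Age",))

# BETWEEN is inclusive, so 1-2 ha inclusive is 10 000-20 000 m2.
one_to_two_ha = site_ids(
    "SELECT site_id FROM sites WHERE area BETWEEN 10000 AND 20000")

# LIKE with the multi-character wildcard '%'.
lithic = site_ids("SELECT site_id FROM sites WHERE period LIKE '%lithic'")

print(small, bronze, one_to_two_ha, lithic)
```

Note that the sites of exactly 10 000 and 20 000 m² are both returned by the BETWEEN query, which is why no "padding" values such as 9999 are needed.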


Table 7.1 SQL logical operators

Logical operator     Description
BETWEEN n AND m      Finds all values between values n and m
IN                   Finds the matching values of a list
LIKE {%, _}          Finds values using the wildcard '%' (multiple characters) or '_' (single character)
EXISTS               Searches a table for a row that meets a specified criterion
UNIQUE               Searches rows to find duplicate values
ALL                  Compares a value against all values in another set
ANY                  Compares a value against any value in another set
IS NULL              Finds missing values (not '0')

[Fig. 7.1 panels: neolithic AND mesolithic; neolithic OR mesolithic; neolithic NOT mesolithic; neolithic XOR mesolithic]

Fig. 7.1 The Boolean operators. The shaded component represents the selection set.

The LIKE operator may also have a '%' added to act as a multicharacter 'wild card' (e.g. LIKE '%lithic' would return 'palaeolithic', 'neolithic' and 'mesolithic').

Boolean operators
Another class of operators uses Boolean logic to define a selection set through the actions of union, intersection, difference and exclusion. Queries use Boolean algebra, which consists of four logical operators: AND (intersection), OR (union), NOT (difference) and XOR (exclusion), to define sets of data. The function of the operators can be explained by reference to a sample dataset of archaeological sites, some of which are Mesolithic, some of which are Neolithic, and some of which contain both periods. It is possible to define new sets of data on the basis of attribute states defined by the Boolean operators. Figure 7.1 shows the results of the four algebraic expressions, where the shaded components of the circles represent the selection set. Boolean logic can also be used in spatial queries in a similar fashion: 'sites < 1 km from a river OR < 500 m from a lake'.

Relational queries
The greatest advantage of the relational model is that sets of data can be retrieved from across two or more tables (relations) by specifying the primary and foreign keys. We can, for example, query the relations in Table 4.2 to obtain a list of site_id numbers that contain rim sherds (as recorded in the pottery table) by using what is called an inner-join expression to define the link between the relations 'sites' and 'pottery'. Inner joins follow a where clause and are specified with an equals sign, as in the following example:

SELECT sites.site_id FROM sites, pottery WHERE sites.site_id = pottery.site_id AND pottery.type = 'rim';

This would list the site id of every record that meets the selection criteria, in the order they occur in the tables, so that a site that contained six rim sherds would be listed six times. This isn't the most practical way of viewing the results of the query, so SQL has clauses for ordering and grouping results (ORDER BY, GROUP BY) and five aggregate functions (COUNT, SUM, AVG, MIN and MAX) for summarising data. We could improve the former query by using the GROUP BY clause and the COUNT function to provide output with a new field giving the number of times that each site occurs (which effectively gives us the number of rims discovered at each site):

SELECT sites.site_id, COUNT(pottery.site_id) AS "number of rim sherds"
FROM sites, pottery
WHERE sites.site_id = pottery.site_id AND pottery.sherd_class = 'rim'
GROUP BY sites.site_id;

The output from this query is shown in Table 7.2.

Table 7.2 Example of the results of a grouping and aggregate SQL statement
site_id: 1001, 1002, 1004, 1006, 1007, 1009
Number of rim sherds: 2, 1, 2, 1, …
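As a sketch of the relational query above, the following uses Python's sqlite3 module with two small, hypothetical 'sites' and 'pottery' tables (invented values; the column name type is an assumption). It shows how the ungrouped join repeats a site once per rim sherd, while GROUP BY with COUNT returns one row per site.

```python
import sqlite3

# Hypothetical relations: pottery.site_id is a foreign key referencing
# sites.site_id. All values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (site_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pottery (sherd_id INTEGER PRIMARY KEY, site_id INTEGER,
                      type TEXT);
INSERT INTO sites VALUES (1001, 'A'), (1002, 'B'), (1003, 'C');
INSERT INTO pottery (site_id, type) VALUES
  (1001, 'rim'), (1001, 'rim'), (1001, 'body'),
  (1002, 'rim'), (1003, 'base');
""")

# Ungrouped inner join: one output row per matching rim sherd, so a site
# with two rims appears twice.
ungrouped = [row[0] for row in conn.execute("""
    SELECT sites.site_id FROM sites, pottery
    WHERE sites.site_id = pottery.site_id AND pottery.type = 'rim'
    ORDER BY sites.site_id""")]

# GROUP BY with COUNT collapses the repeats into one row per site.
grouped = conn.execute("""
    SELECT sites.site_id, COUNT(pottery.site_id) AS rim_sherds
    FROM sites, pottery
    WHERE sites.site_id = pottery.site_id AND pottery.type = 'rim'
    GROUP BY sites.site_id
    ORDER BY sites.site_id""").fetchall()

print(ungrouped)  # one entry per rim sherd
print(grouped)    # (site_id, count) pairs
```

Site 1003 has no rim sherds, so the inner join drops it from both results; an outer join would be needed to list it with a count of zero.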



Fig. 7.2 The point-in-polygon problem. A spatial query asking whether any point is inside or outside a polygon is solved by extending a line in a single direction from the point.

7.2.2 Spatial queries
Spatial queries are queries that require examination of the spatial properties of the objects to provide the selection set. They can be divided into two types: topological and distance (buffering) queries.

Topological queries
Topological queries pose questions about the geometrical relationship between two or more objects. Imagine, for example, two map layers where points represent 'archaeological sites' and 'states' are represented by polygons. The question 'select all archaeological sites in Maine' is thus a topological query, as it depends on establishing whether a point lies within the boundaries of a polygon. The process of retrieving this data is what is called a point-in-polygon operation (Haines 1994; Worboys 1995, pp. 215-218). Other related types of topological question are termed line-in-polygon operations and polygon-overlay queries. It is worth examining these in more detail, as they provide considerable insight into both the abilities and the mechanics of spatial queries.

Point-in-polygon
The point-in-polygon test is the most straightforward of the topological queries and is sometimes referred to as a containment query. The most common strategy to determine whether a point is inside a polygon uses an algorithm that counts the number of times a hypothetical line, extended infinitely in a single direction, intersects the polygon boundary (Haines 1994). If it intersects an odd number of times, the point is inside the polygon; if an even number of times, the point is outside the polygon (Fig. 7.2). Manber (1989) provides a point-in-polygon algorithm based on this theorem. To minimise the time needed to conduct point-in-polygon searches, a bounding rectangle is usually defined first to reduce the set of points that need to be tested (Jones 1997, pp. 189-190).
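The crossing-count rule can be sketched in Python as follows. This is one common formulation of the even-odd (ray-casting) test with a bounding-rectangle pre-check, not necessarily the exact algorithm of Manber (1989) or of any GIS package; the polygon and point coordinates are hypothetical.

```python
def bounding_box(polygon):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(pt, polygon):
    """Even-odd test: cast a ray from pt towards +x and count crossings."""
    x, y = pt
    xmin, ymin, xmax, ymax = bounding_box(polygon)
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return False                      # cheap rejection, no edges tested
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed; the
        # half-open comparison stops a vertex on the ray being counted twice.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:               # crossing lies on the +x side
                inside = not inside       # each crossing flips in/out
    return inside


# Hypothetical L-shaped (concave) polygon and test points.
parcel = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
results = {p: point_in_polygon(p, parcel)
           for p in [(0.5, 3), (2, 2), (3, 0.5), (5, 5)]}
print(results)
```

The point (2, 2) sits in the notch of the L, so its ray crosses the boundary an even number of times (zero to its right) and it is correctly reported as outside. In a real system the bounding rectangle would normally be computed once per polygon rather than on every call.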



Fig. 7.3 The line-in-polygon problem.

Containment queries on raster datasets are simpler. Assuming that polygon features are represented by a pixel value unique to that polygon, any pixel sharing that value lies within the polygon (Foley et al. 1990). Point-in-polygon operations are used to answer queries similar to:
• What proportion of the Bronze Age barrows in Wiltshire is located on chalkland?
• Select all artefacts that come from building N in settlement S.
• How many sites were found in the last survey zone?
• Are there any Late Iroquoian villages in Simcoe County?

Line-in-polygon
A more computationally demanding test is needed to determine the relationship between a line and a polygon. The first stage of a line-in-polygon test is to define the minimum rectangular boundary of the polygon and then determine whether this intersects or contains the line in question (Fig. 7.3). Lines that potentially intersect the polygon will be partially contained by the polygon, as with lines a, b and d. This does not, however, take into account those lines that are entirely contained by the polygon, such as c. This possibility is accounted for by performing a point-in-polygon test for the end nodes of the line. It is possible, however, for both nodes to be outside the polygon, yet the line still passes through it, as with a. Furthermore, in the case of the concave sides of polygons, both nodes may be within the polygon but the line can still have an external segment, as with line d. Various solutions have been devised to solve this problem, some of which are outlined in Foley et al. (1990). In a topologically aware GIS environment, nodes placed at the intersections of lines and polygons enable calculation of exactly how much of each line falls within the polygon. Questions that require a line-in-polygon test will be similar to:
• Which counties does Offa's Dyke pass through?
• How many kilometres of Roman road are found in Surrey?
• How many public footpaths are there in the Avebury World Heritage Site area?
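The idea of splitting a line at its crossings with the polygon boundary (the 'nodes') and then testing each piece can be sketched as below. It is a minimal illustration, assuming a simple polygon given as a vertex list and planar coordinates; edges that run exactly along the line are ignored, and all coordinates are hypothetical. The second example is a case like line d, where both end nodes are inside yet part of the line lies outside.

```python
import math


def point_in_polygon(pt, polygon):
    """Even-odd (ray-casting) point-in-polygon test."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x1 + (y - y1) * (x2 - x1) / (y2 - y1) > x:
                inside = not inside
    return inside


def cross(u, v):
    """z-component of the 2-D cross product u x v."""
    return u[0] * v[1] - u[1] * v[0]


def crossing_params(p, q, polygon):
    """Sorted parameters t in [0, 1] where segment p->q meets polygon edges,
    always including the end nodes t = 0 and t = 1."""
    d = (q[0] - p[0], q[1] - p[1])
    ts = {0.0, 1.0}
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        e = (b[0] - a[0], b[1] - a[1])
        denom = cross(d, e)
        if denom == 0:
            continue                      # parallel edge: ignored here
        ap = (a[0] - p[0], a[1] - p[1])
        t = cross(ap, e) / denom          # position along the line
        s = cross(ap, d) / denom          # position along the edge
        if 0 <= t <= 1 and 0 <= s <= 1:
            ts.add(t)
    return sorted(ts)


def length_inside(line, polygon):
    """Length of a polyline lying inside a polygon: split each segment at
    its crossings, then keep the pieces whose midpoint is inside."""
    total = 0.0
    for p, q in zip(line, line[1:]):
        seg_len = math.dist(p, q)
        ts = crossing_params(p, q, polygon)
        for t0, t1 in zip(ts, ts[1:]):
            tm = (t0 + t1) / 2
            mid = (p[0] + tm * (q[0] - p[0]), p[1] + tm * (q[1] - p[1]))
            if point_in_polygon(mid, polygon):
                total += (t1 - t0) * seg_len
    return total


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
through = length_inside([(-5, 5), (15, 5)], square)      # both nodes outside
parcel = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
concave = length_inside([(0.5, 3.5), (3.5, 0.5)], parcel)  # both nodes inside
print(through, concave)
```

Because every piece between consecutive crossings is either wholly inside or wholly outside, a single midpoint test per piece is enough; this handles lines a, c and d of Fig. 7.3 with the same code.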



Fig. 7.4 Polygon overlay.

Polygon overlay
A similarly complex problem occurs when the spatial properties of two overlapping polygons are investigated. Imagine, for example, that it is necessary to determine the area of overlap of polygon A, which defines an archaeologically sensitive area, by polygon R, which represents a proposed highway extension (Fig. 7.4).

The computational process is similar to the line-in-polygon problem in that it involves using nodes at each point of intersection (nodes e, f, k, j in Fig. 7.4). There are several ways that this can be managed by the GIS software, but typically it depends on examining topological relationships. For example, the area of the polygon R2, defined by the vertices e, f, k, j, can be established by standard geometric principles, as defined in Chapter 2. More sophisticated algorithms are needed when polygons have internal 'islands', or have two or more areas of overlap because of a concavity in the underlying polygon. For that reason, topological databases are not solely limited to arc-node structures but also record relationships such as 'contains' or 'borders left', to facilitate all possible configurations and morphologies of polygons. Polygon overlays are used to answer queries similar to:
• What proportion of the survey area is under cultivation and has been fieldwalked?
• How many hectares of the proposed new urban zoning fall in archaeologically sensitive areas?
• What is the amount of arable land that lies within a 5-km radius of the site?
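For two polygons, the overlap area can be sketched with the Sutherland-Hodgman clipping algorithm followed by the shoelace formula for polygon area. This is a swapped-in textbook technique, not a description of how any particular GIS implements overlay: it assumes the clipping polygon (here a hypothetical highway corridor R) is convex with its vertices listed counter-clockwise, and that coordinates are planar (e.g. metres).

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); >= 0 when b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def intersect(s, e, a, b):
    """Point where segment s->e crosses the infinite line through a and b."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    fx, fy = b[0] - a[0], b[1] - a[1]
    t = ((a[0] - s[0]) * fy - (a[1] - s[1]) * fx) / (dx * fy - dy * fx)
    return (s[0] + t * dx, s[1] + t * dy)


def clip(subject, clipper):
    """Sutherland-Hodgman: clip 'subject' by a convex, counter-clockwise
    'clipper', one clipper edge at a time."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        vertices, output = output, []
        if not vertices:
            break                         # nothing left to clip
        prev = vertices[-1]
        for cur in vertices:
            cur_in = cross(a, b, cur) >= 0
            prev_in = cross(a, b, prev) >= 0
            if cur_in:
                if not prev_in:           # entering: add the crossing node
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif prev_in:                 # leaving: add the crossing node
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def area(polygon):
    """Shoelace formula for the area of a simple polygon."""
    n = len(polygon)
    s = sum(polygon[i][0] * polygon[(i + 1) % n][1]
            - polygon[(i + 1) % n][0] * polygon[i][1] for i in range(n))
    return abs(s) / 2


# Hypothetical polygons (metres): A is the sensitive area, R the corridor.
polygon_a = [(0, 0), (100, 0), (100, 100), (0, 100)]
polygon_r = [(80, -20), (120, -20), (120, 200), (80, 200)]  # convex, CCW
overlap = clip(polygon_a, polygon_r)
overlap_area = area(overlap)
print(overlap, overlap_area, overlap_area / area(polygon_a))
```

The clipped polygon's vertices include the new intersection nodes at (80, 0) and (80, 100), playing the role of nodes such as e, f, k, j in Fig. 7.4. Overlays where both polygons are concave, or have islands, need more general algorithms than this sketch.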

Worboys (1995, pp. 218-222) provides further information on the implementation of topological operations in a GIS.

Distance (buffering) queries
The second type of spatial query, a buffering query, involves the selection of a subset of a dataset based on its distance to a defined point, line or polygon feature. For example, the result of a buffer zone of distance n around a point will be a circle of radius n. If a line feature is buffered, the result will be a linear polygon with rounded ends. A buffered polygon will produce an enlarged version of itself (Fig. 7.5).

Fig. 7.5 The effects of buffering a point, line and polygon feature by the equivalent number of units.

Determining whether any given point, line or polygon falls within the buffer zone is then a matter of determining its topological relationship with the polygon defined by the buffer, in the manner described for topological queries. Buffering operations are necessary to answer questions like:
• What proportion of sites fall within 1 km of the coast?
• What is the change in density of sherds moving away from the centre of a site in 100-m intervals?
• What is the difference in the average amount of high-grade arable land falling within 5 km of sites of type A versus sites of type B?
• What proportion of all scrapers are found within 2 m of hearth features?
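A buffer query can be reduced to a distance test. The sketch below assumes planar coordinates in metres and uses hypothetical coastline and site coordinates: a site lies in the buffer of a polyline if its distance to the nearest segment is no more than the buffer radius. Clamping the projection onto each segment at its end points is what produces the rounded ends of a line buffer.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:                    # degenerate segment: a point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, clamping to the end points.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p, line):
    """Distance from p to the nearest segment of a polyline."""
    return min(point_segment_distance(p, line[i], line[i + 1])
               for i in range(len(line) - 1))


def within_buffer(points, line, radius):
    """Ids of points lying inside the buffer of the given radius."""
    return [pid for pid, xy in points.items()
            if distance_to_line(xy, line) <= radius]


# Hypothetical coastline and site coordinates (metres).
coast = [(0, 0), (1000, 0), (2000, 500)]
sites = {"s1": (500, 800), "s2": (500, 1200),
         "s3": (-600, 0), "s4": (2000, 1400)}
near_coast = within_buffer(sites, coast, 1000)
print(near_coast)
```

Site s3 lies beyond the western end of the coastline and is selected only because of the rounded end of the buffer. A point buffer is the special case of a single distance, math.hypot(dx, dy) <= radius.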

7.2.3 Using attribute and spatial queries: an example from Shetland
Now that we have introduced basic query logic and the sorts of attribute and spatial queries that can be answered by a GIS, we provide some examples that demonstrate their application to archaeological questions. In doing so, we will demonstrate how the combination of attribute and spatial queries often constitutes the first steps of exploratory data analysis (EDA). We will use a simple dataset from Neolithic west Shetland in Scotland that consists of the locations of archaeological sites (coded as either chambered cairns or stone houses), elevation and land capability (Fig. 7.6). These data were obtained from Müller (1988).

Attribute queries
We can begin to investigate the relationship between the types of site and their distribution patterns by using SQL with a combination of mathematical and logical operators. We could, for example, begin by dividing the data into different sets by defining a query to find sites defined as 'chambered cairns'. The SQL statement for this is:

SELECT site, site_type FROM sites WHERE site_type = 'chambered cairn';





Fig. 7.6 West Shetland study area elevation (as shaded relief), land-use capability and the distribution of Neolithic chambered burial cairns and stone houses (Müller 1988). Land capability represents major divisions only (4 = limited value for crop production; 5 = grassland, grazing, possible crop production; 6 = land suitable only for rough grazing; 7 = unsuited to agriculture; Bibby et al. 1991). Total size of study area is 2587.5 hectares. Elevation and coastline © Crown Copyright, all rights reserved. Licence no. 100021184. Land capability maps © The Macaulay Land Use Research Institute 2004, all rights reserved.

This simple query begins the process of data exploration and allows us to find structure in the data. In this particular example, we can start to see evidence of spatial clustering and different distribution patterns for chambered cairns and stone houses (Fig. 7.6c, d). Using this information we can begin to unpack the locational differences between the two monument classes and explore the factors that may have conditioned their placement in the landscape.

Spatial queries
The limited set of data that we are using in the previous and following examples is not sufficient to build a comprehensive understanding of the reasons underlying the placement of these monuments in west Shetland. However, simply to illustrate further the initial stages of an exploratory analysis, a useful next step is to define the spatial relationships of the different classes of site with the land-use map. This is a containment query requiring a point-in-polygon operation that will establish which points lie within each polygon class. The precise mechanics of how this is performed depend on the GIS program (in ArcGIS it is carried out using the 'select by' query function), but the outcome of the query is shown in Table 7.3.

Table 7.3 Observed number of cairns and houses in each land-use category
Land class   Cairns (n)   Cairns (%)   Houses (n)   Houses (%)
4            0            0.0          0            0.0
5            13           38.2         28           58.3
6            21           61.8         20           41.7
7            0            0.0          0            0.0
Total        34                        48

This shows us that most of the burial cairns are situated in land capability class 6 (rough grazing), whereas most of the stone houses are in land capability class 5 (grassland with some agricultural potential). However, there is a danger in inferring that each type of site is preferentially located in a particular zone. This can only be established by determining what proportion each land class contributes to the total study area and testing to see whether the distribution is significantly different from that expected by chance alone. Determining the specific proportion each of the four land-use categories contributes to the study area is another spatial query. In this instance, all the polygons of a particular class are selected and the total area of the selection set is calculated (Table 7.4).

Table 7.4 Contribution of each land class to the study area
Land class   Area       %
4            459.6      1.8
5            6 766.2    26.1
6            17 741.9   68.6
7            914.5      3.5
Total        25 882.2   100

Tables 7.3 and 7.4 provide us with all the information we need to determine whether burial cairns and stone houses are differentially distributed according to different classes of land in west Shetland. The next section describes using a χ²-test to substantiate these observations.


Table 7.5 Summary statistics of elevation values (m) for cairns and houses
                     Cairns(a)   Houses(b)
Average elevation    48.3        21.4
Standard deviation   24.3        12.6
Median elevation     45          20
Minimum elevation    5           2
Maximum elevation    106         49
(a) n = 33. (b) n = 18.
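The elevation lookup behind Table 7.5 (matching each site's x, y location to a raster cell and reading its value) can be sketched as below. It assumes a north-up raster stored as a list of rows, with a known top-left corner and square cells; the grid, corner coordinates and site locations are all hypothetical, and real GIS rasters carry this georeferencing in their own metadata.

```python
import math
import statistics

# Hypothetical 3 x 3 elevation raster (m); row 0 is the northern edge.
elevation = [[12, 18, 25],
             [20, 31, 44],
             [27, 39, 52]]
X0, Y0 = 1000.0, 2000.0   # assumed x, y of the grid's top-left corner
CELL = 50.0               # assumed cell size in map units (m)


def raster_value(grid, x, y, x0=X0, y0=Y0, cell=CELL):
    """Return the value of the cell containing (x, y), or None if outside.

    Columns count eastwards from x0; rows count southwards from y0, which
    is why the row index uses (y0 - y).
    """
    col = math.floor((x - x0) / cell)
    row = math.floor((y0 - y) / cell)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical site locations in the same coordinate system.
sites = [(1010, 1990), (1120, 1880), (1075, 1925)]
elevations = [raster_value(elevation, x, y) for x, y in sites]
summary = {
    "mean": statistics.mean(elevations),
    "sd": statistics.stdev(elevations),      # sample standard deviation
    "median": statistics.median(elevations),
    "min": min(elevations),
    "max": max(elevations),
}
print(elevations, summary)
```

The same summary statistics as in Table 7.5 can then be computed per site class by running the lookup separately for cairns and houses.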

As a further example of a spatial query, we have compared the elevation of the two classes of archaeological site. The precise way this is performed in a GIS again depends on the software, but the algorithm will be relatively trivial: as elevation data are stored in a raster grid and the sites are stored as points, finding the elevation for each site is simply a matter of correlating a site's x, y location with its corresponding pixel on the raster map and extracting the pixel's value. Table 7.5 shows the summary statistics for the elevation values of burial cairns and stone houses in our study area. The results suggest that cairns are generally situated at higher elevations than houses. However, a further statistical test (for example, a Kolmogorov-Smirnov test; see below) would need to be undertaken to substantiate this observation.

7.3 Statistical methods
Statistics play an important role in GIS-led research because they are used to explore, clarify and ascertain the significance of relationships in spatial and attribute datasets. There is a large variety of statistical techniques, and five of the most common are described below. The first three (the chi-squared, Wilcoxon and Kolmogorov-Smirnov tests) are referred to as non-parametric tests because they do not rely on an estimate of parameters of the distribution of the variable of interest in the population. They are therefore appropriate tests for small samples, especially when little is known about the parameters of the parent population, or when the parent population cannot be safely assumed to be normally distributed (as in Fig. 7.9, see p. 131). As the name suggests, parametric statistics rely on estimates of population parameters and so require an understanding of the shape of the population. The most common parametric test, Student's t-test, depends on the data being drawn from a normally distributed population. Statistical testing is dependent upon a number of concepts and terms, the most basic of which are defined in Table 7.6.
Shennan (1988) and Baxter (1994) are good starting points for readers requiring more detail than we are able to provide here.

Table 7.6 Basic statistical terms and concepts

Population: In statistical terms, a population is any set of phenomena with one or more shared attributes. 'Iroquoian longhouses', 'palaeolithic handaxes' and 'Folsom sites' are all populations.
Sample: A sample is a subset of a population. 'Seneca longhouses', 'British palaeolithic handaxes' and 'Folsom sites in New Mexico' are all samples of the populations defined above.
Null hypothesis: The null hypothesis usually defines a state of non-significance; in other words, that there is no difference between the two samples being tested and that they are therefore most probably drawn from the same population.
Alternate hypothesis: The alternate hypothesis usually states that the samples are most probably drawn from different populations.
Significance level: The level of probability, or 'critical value', at which the null hypothesis should be rejected. Convention is that the probability (p) is set to 0.05 or less, meaning that there is only a 5 per cent chance of making a Type I error.
Type I and II errors: A Type I error occurs when a null hypothesis has been incorrectly rejected. A Type II error occurs when a null hypothesis has been incorrectly accepted.
Scale of measurement: Data can be measured at nominal, ordinal, interval or ratio scales of measurement. These categories are defined in Table 3.2.


7.3.1 Non-parametric tests of significance

The chi-squared test
The chi-squared test is an excellent starting point for situations where it is necessary to test the significance of observations made at nominal scales of measurement. It is an extremely versatile test that has a wide range of applications, both in GIS and more broadly in archaeology and the social sciences. Examples of situations where a chi-squared test might be appropriate would be when comparing the number of sites found on different types of soil or geology, testing the correlation between different types of artefact found in different areas of an archaeological site, or assessing the relationship between numbers of surface artefacts collected in different types of setting.

The chi-squared test begins with the construction of a contingency table, where the counts (not percentages) of observations against data categories are recorded. For example, earlier in this chapter we established the number of different types of Neolithic site found on different types of land in west Shetland, reproduced in Table 7.7. These constitute our observed values. Our initial hypothesis is simply that there is a significant difference between the numbers of the two different types of monument found in the different land classes. The null hypothesis can therefore be stated as 'there is no difference between the observed numbers of the different monument types on the different land classes'. The level of significance at which we will reject the null hypothesis is set at 0.05.

Table 7.7 Observed number of cairns and houses on each land class
Land class   Cairns   Houses   Total
4            0        0        0
5            13       28       41
6            21       20       41
7            0        0        0
Total        34       48       82

In order to determine whether there is any relationship between the numbers of different sites found on the different land categories, a table of expected values for each category is needed against which the observed values can be compared. For a two-sample test such as this, expected values are usually calculated by multiplying the row total by the column total and dividing by the total number of observations, as in Table 7.8. For example, the expected number of cairns on land class 5 is 41 × 34 / 82 = 17.

Table 7.8 Expected number of cairns and houses on each land class
Land class   Cairns   Houses   Total
4            0        0        0
5            17       24       41
6            17       24       41
7            0        0        0
Total        34       48       82

The chi-squared formula is given by:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \qquad (7.1)$$

where χ² is the chi-squared value, k is the number of categories, and O_i and E_i are the observed and expected number of cases in each category. For every category, the expected value is subtracted from the observed value, and this number is multiplied by itself and then divided by the expected value. Once this has been done for all categories, the sum of these values is the χ². Using the data from Tables 7.7 and 7.8, we therefore have the following (Table 7.9).

Table 7.9 Contributions (O_i - E_i)²/E_i to χ² for each land class
Land class   Cairns   Houses   Total
4            0.00     0.00     0.00
5            0.94     0.67     1.61
6            0.94     0.67     1.61
7            0.00     0.00     0.00
Total        1.88     1.34     3.22

It is now left to establish whether the χ²-value of 3.22 is large enough for us to reject the null hypothesis of no correlation. However, as large tables will naturally have larger χ²-values, a measure called the degrees of freedom (ν) is used to 'weight' the significance of the χ² based on the size of the table. For a two-sample test it is given by:

$$\nu = (r - 1)(c - 1) \qquad (7.2)$$


where ν defines the degrees of freedom, r is the number of rows and c is the number of columns. In our example, with four land classes and two monument types, ν = (4 - 1)(2 - 1) = 3. We can now compare the calculated χ²-value with the critical value of χ² for 3 degrees of freedom at the predetermined significance level of 0.05. More formally, if for ν degrees of freedom χ²_calc > χ²_α, we will reject the null hypothesis, but if χ²_calc < χ²_α, then we accept the null hypothesis. In this scenario, ν = 3 and α = 0.05. A statistical table must be consulted that gives the percentage points of the χ²-distribution, which is provided in Table 7.10. In this case, it can be seen that for 3 degrees of freedom χ²_0.05 = 7.81. As 3.22 < 7.81, we cannot reject the null hypothesis, which means that we must conclude that in this sample there is no significant difference between the observed numbers of cairns and houses in relation to land capability class. (Because land classes 4 and 7 contain no monuments, they can also be dropped, leaving a 2 × 2 table with ν = 1 and χ²_0.05 = 3.84; as 3.22 < 3.84, the conclusion is the same.)

This result may at first seem counter-intuitive, for the data in Table 7.7 appear to show that there are more houses found in land class 5 and more cairns found in land class 6. What the chi-squared test has told us is that, while our observations suggest a relationship, the difference in patterning between houses and cairns is not sufficiently pronounced, given the sample size, for us to be reasonably confident that a difference exists. What constitutes 'reasonably confident', however, is a matter of some debate. As a rule of thumb, p < 0.05 (a 95 per cent confidence level) is acceptable for most situations, although a more conservative or relaxed level of significance may sometimes be appropriate. Relaxing the significance level to p < 0.1 (i.e. 90 per cent confident) would correspondingly increase the chance of incorrectly rejecting the null hypothesis, referred to as a Type I error.
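The contingency-table calculation for Tables 7.7 to 7.9 can be reproduced in a few lines of Python. In this sketch, land classes with no monuments are dropped before computing expected values (their expected counts would be zero, making (O - E)²/E undefined), so the degrees of freedom come out as 1 rather than 3. The critical values are copied from Table 7.10; 3.22 falls below both the 1- and 3-degree values.

```python
# Observed counts from Table 7.7: land class -> (cairns, houses).
observed = {4: (0, 0), 5: (13, 28), 6: (21, 20), 7: (0, 0)}

# Critical values of chi-squared at alpha = 0.05, copied from Table 7.10.
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81}


def chi_squared_contingency(table):
    """Pearson chi-squared for a contingency table given as {row: counts}.

    Rows whose total is zero are dropped, because their expected counts
    would be zero and (O - E)^2 / E would be undefined.
    """
    rows = [counts for counts in table.values() if sum(counts) > 0]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    chi2 = 0.0
    for counts in rows:
        row_total = sum(counts)
        for obs, col_total in zip(counts, col_totals):
            expected = row_total * col_total / n  # row x column total / N
            chi2 += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = chi_squared_contingency(observed)
reject = chi2 > CRITICAL_05[df]
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject}")
# prints: chi2 = 3.22, df = 1, reject H0: False
```

Dropping empty categories is a common convention rather than the only option; the point of the sketch is that the test statistic, 3.22, is the same either way.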
We could further investigate the relationship between land capability and monument type by comparing our observed distribution with an expected distribution based on the percentage of each land class in the study area. For example, as land class 7 constitutes 3.5 per cent of our study area, we should expect to find 3.5 per cent of all cairns and 3.5 per cent of all houses located in that region (Table 7.11).
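The area-weighted comparison can be sketched the same way, using the observed counts of Table 7.7 and the land-class percentages of Table 7.4. This reproduces, to one decimal place, the expected values of Table 7.11, the contributions of Table 7.12 and a total of about 30.7. The comparison against the α = 0.001 critical value for 3 degrees of freedom (16.27, Table 7.10) follows the degrees of freedom used in the text.

```python
# Observed counts (Table 7.7) for land classes 4, 5, 6 and 7.
observed = {"cairns": [0, 13, 21, 0], "houses": [0, 28, 20, 0]}
# Percentage of the study area in each land class (Table 7.4).
percent_area = [1.8, 26.1, 68.6, 3.5]


def area_weighted_chi2(observed, percent_area):
    """Chi-squared where each monument type is expected to be spread
    across land classes in proportion to their share of the area."""
    expected, contributions = {}, {}
    for monument, counts in observed.items():
        total = sum(counts)
        exp = [total * p / 100 for p in percent_area]
        expected[monument] = exp
        contributions[monument] = [(o - e) ** 2 / e
                                   for o, e in zip(counts, exp)]
    chi2 = sum(sum(c) for c in contributions.values())
    return chi2, expected, contributions


chi2, expected, contributions = area_weighted_chi2(observed, percent_area)
print([round(e, 1) for e in expected["houses"]])       # compare Table 7.11
print([round(c, 1) for c in contributions["houses"]])  # compare Table 7.12
print(f"chi2 = {chi2:.1f}; exceeds 16.27: {chi2 > 16.27}")
```

The houses on land class 5 contribute about 19.1 of the total, which is the observation drawn from Table 7.12 in the text.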


Table 7.10 Critical values of χ² for ν degrees of freedom and significance levels (α) of 0.10 to 0.001. Calculated using built-in functions of the statistical program R.

ν     0.10    0.05    0.025   0.01    0.005   0.001
1     2.71    3.84    5.02    6.63    7.88    10.83
2     4.61    5.99    7.38    9.21    10.60   13.82
3     6.25    7.81    9.35    11.34   12.84   16.27
4     7.78    9.49    11.14   13.28   14.86   18.47
5     9.24    11.07   12.83   15.09   16.75   20.51
6     10.64   12.59   14.45   16.81   18.55   22.46
7     12.02   14.07   16.01   18.48   20.28   24.32
8     13.36   15.51   17.53   20.09   21.95   26.12
9     14.68   16.92   19.02   21.67   23.59   27.88
10    15.99   18.31   20.48   23.21   25.19   29.59
11    17.28   19.68   21.92   24.73   26.76   31.26
12    18.55   21.03   23.34   26.22   28.30   32.91
13    19.81   22.36   24.74   27.69   29.82   34.53
14    21.06   23.68   26.12   29.14   31.32   36.12
15    22.31   25.00   27.49   30.58   32.80   37.70
16    23.54   26.30   28.85   32.00   34.27   39.25
17    24.77   27.59   30.19   33.41   35.72   40.79
18    25.99   28.87   31.53   34.81   37.16   42.31
19    27.20   30.14   32.85   36.19   38.58   43.82
20    28.41   31.41   34.17   37.57   40.00   45.31
21    29.62   32.67   35.48   38.93   41.40   46.80
22    30.81   33.92   36.78   40.29   42.80   48.27
23    32.01   35.17   38.08   41.64   44.18   49.73
24    33.20   36.42   39.36   42.98   45.56   51.18
25    34.38   37.65   40.65   44.31   46.93   52.62
26    35.56   38.89   41.92   45.64   48.29   54.05
27    36.74   40.11   43.19   46.96   49.65   55.48
28    37.92   41.34   44.46   48.28   50.99   56.89
29    39.09   42.56   45.72   49.59   52.34   58.30
30    40.26   43.77   46.98   50.89   53.67   59.70

This, however, is investigating a subtly different hypothesis from that suggested previously: not that there is a difference in the observed numbers of cairns or houses in relation to how the other monument type is distributed across the four different land classes, but that there is a difference in the numbers of cairns and houses given the sizes of the different land classes. In this case, a chi-squared test returns a value of 30.7 for 3 degrees of freedom, which comfortably exceeds the critical value of 16.27 at α = 0.001 (Table 7.10) and so is significant at the 99.9 per cent level of confidence. This allows us to state that there is a statistically significant relationship between land capability class and monument distribution. Examination of the contributions to the chi-squared statistic shows that this is mainly caused by the many more houses observed on land class 5 than expected (Table 7.12).

Table 7.11 Expected numbers of monuments based on land-class areas. See Table 7.7 for observed values
Land class(a)   Cairns   Houses   Total
4 (1.8)         0.6      0.9      1.5
5 (26.1)        8.9      12.5     21.4
6 (68.5)        23.3     32.9     56.2
7 (3.5)         1.2      1.7      2.9
Total           34.0     48.0     82.0
(a) Values in parentheses are percentages of the total land-class area.

Table 7.12 χ² contributions adjusted for area
Land class   Cairns   Houses   Total
4            0.6      0.9      1.5
5            1.9      19.1     21.0
6            0.2      5.1      5.3
7            1.2      1.7      2.9
Total        3.9      26.8     30.7

The Wilcoxon (Mann-Whitney) test
The Wilcoxon or Mann-Whitney test is another good and relatively straightforward non-parametric statistical test. Unlike the χ², however, it can be used to test for differences between two samples of ordinal or continuous data. The only prerequisites for this test are that the two samples are randomly and independently drawn, that the variable is potentially continuous (i.e. decimal places to the nth place are possible, if not necessarily logical), and that the measures within the two samples have the properties of at least an ordinal scale of measurement, so that it is meaningful to speak of 'greater than', 'less than' and 'equal to' (Lowry 2003, Chapter 11a). Examples where the Mann-Whitney test may be appropriate include situations where it is necessary to compare the sizes of sites from two study areas, sizes of artefacts from two excavation trenches, or numbers of artefacts recovered in field-walked transects taken across two different sites.

For the purposes of illustration, we use a hypothetical example of artefacts recovered from ten transects walked across two sites, referred to as site A and site B. The data are given in Table 7.13, with the transect results ranked from highest to lowest for each site.

Table 7.13 Artefacts recovered from ten transects over sites A and B
         Site A   Site B
         23       44
         22       32
         18       20
         15       12
         9        10
         4        10
         2        4
         1        2
         1        1
         1        1
Total    96       136
Mean     9.6      13.6

The question we wish to answer is whether the data support an argument that the densities of material recovered from sites A and B are equivalent, assuming that the transects are of equal length for both sites and that they are a random sample drawn from the total population of transects that could have been made across each site. The data in Table 7.13 show that on average more artefacts were recovered from the transects in site B, and a boxplot of these numbers shows that the range in the number of artefacts collected in the transects from site B is also greater than that from site A (Fig. 7.7). To claim that the observed differences are sufficiently pronounced, and that the transects are thus samples from two different populations (and therefore that sites A and B do have different surface densities of artefacts), requires a statistical test. The null hypothesis is that the samples are drawn from the same population and consequently that there is no significant difference in artefact density between these two sites.

As with all statistical tests, a level of significance for rejecting the null hypothesis must be established. Here, we will use 0.05, so that if p < 0.05, we will reject the null hypothesis and conclude that the samples are drawn from different populations. The test begins by re-sorting the sample values into a single ranked group of size N = n_a + n_b, where n_a and n_b are the number of observations for sites A and B (Table 7.14). Each value is then ranked according to where it sits in the total list. When there is more than one instance of a value (as in this case with values 1, 2, 4 and 10), each receives the mean of the rankings for that value. For example, value 1 is ranked 1, 2, 3, 4 and 5, so all instances of value 1 receive the ranking of 15/5 = 3. These ranked values are then returned to each of the samples, as in Table 7.15 (note that the order is reversed).

7.3 Statistical methods

Table 7.14 Ranked transect values for sites A and B

Value   Site   Rank
1       A      3
1       A      3
1       A      3
1       B      3
1       B      3
2       A      6.5
2       B      6.5
4       A      8.5
4       B      8.5
9       A      10
10      B      11.5
10      B      11.5
12      B      13
15      A      14
18      A      15
20      B      16
22      A      17
23      A      18
32      B      19
44      B      20
Total          210.0
Mean           10.5

Table 7.15 Ranked measures for sites A and B

Site A        Site B
3             3
3             3
3             6.5
6.5           8.5
8.5           11.5
10            11.5
14            13
15            16
17            19
18            20
Total 98.0    Total 112.0
Mean 9.8      Mean 11.2

Fig. 7.7 Boxplot of artefacts recovered from transects taken across sites A and B. The box is bounded by the first and third quartiles and so contains 50 per cent of the observations. The transverse line within the box marks the median value, while the transverse lines at the ends of the 'whiskers' mark the minimum and maximum values.



The Mann-Whitney test calculates a measure U for each sample, which is given by:

$$U_a = n_a n_b + \frac{n_a(n_a + 1)}{2} - W_a \qquad (7.3)$$

$$U_b = n_a n_b + \frac{n_b(n_b + 1)}{2} - W_b \qquad (7.4)$$
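Given the rank sums W_a and W_b, Eqs. (7.3) and (7.4) are simple arithmetic. The sketch below uses a hypothetical function name and evaluates them for the transect example. It then compares the smaller U with the two-tailed critical value of 23 quoted in the text. We assume the usual printed-table convention that the null hypothesis is rejected only when the smaller U is at or below the critical value.

```python
def mann_whitney_u(n_a, n_b, w_a, w_b):
    """Return (U_a, U_b) from sample sizes and rank sums, as in Eqs. (7.3) and (7.4)."""
    u_a = n_a * n_b + n_a * (n_a + 1) / 2 - w_a
    u_b = n_a * n_b + n_b * (n_b + 1) / 2 - w_b
    return u_a, u_b


u_a, u_b = mann_whitney_u(10, 10, 98.0, 112.0)  # rank sums from Table 7.15
print(u_a, u_b)  # 57.0 43.0

# Two-tailed critical value at p < 0.05 for n_a = n_b = 10, as quoted in the text.
U_CRIT_TWO_TAILED = 23
reject = min(u_a, u_b) <= U_CRIT_TWO_TAILED
print(reject)  # False: the null hypothesis cannot be rejected
```

A useful check on the arithmetic is that U_a + U_b always equals n_a × n_b.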


where n_a and n_b refer to the number of observations (transects) for sites A and B, and W_a and W_b denote the sums of the ranks for sites A and B (as given in Table 7.15). Substituting the values into Eqs. (7.3) and (7.4) gives U_a = 10 × 10 + (10(10 + 1)/2) - 98 = 57 and U_b = 10 × 10 + (10(10 + 1)/2) - 112 = 43. If the null hypothesis is true (that site B does not show a different concentration of artefacts from site A), we can expect U_a ≈ U_b. That is not exactly the case here, so we need to consult a statistical table that provides critical values of U for given levels of p to determine whether or not the difference is significant.¹ In this example, n_a and n_b are both equal to 10. At p < 0.05 the lower and upper limits of U are 27 and 73 for a one-tailed test, and 23 and 77 for a two-tailed test. As our alternative hypothesis proposes that sites A and B are different, we use a two-tailed test (a one-tailed test should be used if a hypothesis of difference in one direction is proposed). Our lower and upper values of U, namely 43 and 57, fall within the wider boundaries of the two-tailed test. We therefore cannot reject the null hypothesis: there is no statistically significant difference in artefact density between the two sites.

The Kolmogorov-Smirnov test
Another useful non-parametric method is the Kolmogorov-Smirnov test (or K-S test). Like the Mann-Whitney test, it can be used on two independent samples measured at the ordinal scale or above, and it does not depend on the observations being normally distributed. It is useful for a broad range of problems, particularly when the restrictive conditions of a parametric test cannot be met. To provide an example, we refer to Campbell's (2000) study of archaeological landscapes on the Pacific island of Rarotonga, where he has recorded the size of agricultural terraces (termed repotaro). Terraces are used by members of corporate groups called matakeinanga, each of which resides in a territory termed a tapere (Campbell 2000, pp. 63-64).
To demonstrate the value of a K-S test, differences in repotaro sizes between two corporate-group territories are investigated. The data are taken from Campbell (2000, Appendix 1). A graphical comparison of repotaro sizes (in square metres) for two tapere, Takuvaine and Tupapa, is given in Fig. 7.8. It shows that the 14 repotaro from Takuvaine are, on average, larger than the 17 recorded for Tupapa. Visual assessment of the shape of the distribution of the samples (Fig. 7.9) shows that they are not normally distributed, so a non-parametric method is an appropriate way to investigate differences. The null hypothesis is that the two samples of terraces are drawn from the same population and that there is no significant difference in size between the two groups. To calculate the K-S statistic, the data need to be converted to cumulative distributions. The K-S test measures the maximum difference between the cumulative distributions of the two categories (referred to as D). It then compares this difference against the difference predicted if the samples were drawn from the same distribution:

$$D = \max_x \left| S_1(x) - S_2(x) \right|$$

where D is the K-S statistic, and S_1(x) and S_2(x) are the two cumulative distributions. In this example, a plot of the cumulative distribution curves shows a maximum distance of 0.64, at approximately 10,500 m² (Fig. 7.10). To ascertain whether a value of D = 0.64 is significant, a table of critical values must be consulted. Table 7.16, for n_1 = 14 and n_2 = 17 at p < 0.05, gives a minimum value of 1.36 × √((14 + 17)/(14 × 17)) ≈ 0.49. Our D value of 0.64 exceeds this, so we can reject the null hypothesis and conclude that the repotaro sizes are not drawn from the same population. In other words, Takuvaine repotaro are significantly bigger than those in Tupapa Tapere. This test is easy to perform by hand, and critical values of D are provided in statistical tables (e.g. Lindley and Scott 1984). Many statistical packages also include K-S testing and will calculate D and return the corresponding p-value.

¹ An online calculator for critical values of U can be accessed at http://faculty.vassar.edu/lowry/ch11a.html. See Lowry (2003) for details.

Fig. 7.8 Boxplot of repotaro sizes for the Takuvaine (n = 14) and Tupapa (n = 17) Tapere. [Horizontal axis: size (m²).]

Fig. 7.9 Frequency distributions of repotaro sizes for the Takuvaine and Tupapa Tapere. [Horizontal axis: size (m²).]

Table 7.16 Critical values of D, Kolmogorov-Smirnov test for two populations (Arsham 2003), where n1 and n2 are the two sample sizes

α*       Critical value
0.10     1.22 × √((n1 + n2)/(n1 × n2))
0.05     1.36 × √((n1 + n2)/(n1 × n2))
0.025    1.48 × √((n1 + n2)/(n1 × n2))
0.01     1.63 × √((n1 + n2)/(n1 × n2))

* α is the significance level.

Fig. 7.10 Cumulative proportion distributions of repotaro sizes for Takuvaine and Tupapa Tapere. The location of the maximum difference is marked by a dashed line. [Horizontal axis: size (m²).]

Table 7.17 Mesolithic artefact densities (artefacts per hectare) from coastal and inland survey areas on Islay

[The 36 coastal and 37 inland density values are not legible in this copy.]

Source: Woodman (2000b, p. 457).
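To return to the K-S test, here is a minimal sketch of the calculation. The helper names are hypothetical. The code computes D as the largest gap between two empirical cumulative distributions, and the critical value from the α = 0.05 row of Table 7.16. Campbell's terrace sizes are not reproduced here, so only the critical value for n1 = 14 and n2 = 17 uses figures from the text. We assume the Table 7.16 expressions are large-sample approximations, so exact tables are preferable for very small samples.

```python
import math


def ecdf(sample, x):
    """Cumulative proportion of `sample` values less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def ks_statistic(s1, s2):
    """D = max |S1(x) - S2(x)| over all observed values.

    Both cumulative distributions are step functions that change only at
    observed values, so checking those points is enough to find the maximum."""
    points = sorted(set(s1) | set(s2))
    return max(abs(ecdf(s1, x) - ecdf(s2, x)) for x in points)


def ks_critical(n1, n2, coefficient=1.36):
    """Critical D from Table 7.16; the default 1.36 is the alpha = 0.05 row."""
    return coefficient * math.sqrt((n1 + n2) / (n1 * n2))


# Critical value for the repotaro samples (n1 = 14, n2 = 17) at alpha = 0.05.
print(round(ks_critical(14, 17), 2))  # 0.49

# Toy samples that do not overlap at all give the largest possible D.
print(ks_statistic([1, 2, 3], [4, 5, 6]))  # 1.0
```

Evaluating the empirical distributions only at the observed values keeps the sketch exact without any binning.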

7.3.2 Parametric tests of significance: Student's t
Parametric tests depend on knowledge of the characteristics of the distribution of data. The t-test is the most common parametric method used to assess whether two samples are likely to have been drawn from populations with different means. In theory, the samples can be as small as ten observations. However, an essential prerequisite for performing a t-test is that the sample data are normally distributed, since the results are meaningless if this condition is not met. Larger samples (e.g. 30+) are more likely to meet the condition of normality (if their parent population is also normal). Normality can be verified either by visual assessment of a histogram or boxplot or, more robustly, by a statistical test such as a Kolmogorov-Smirnov test for normality, the Shapiro-Wilk W test or the Lilliefors test. In addition, it is necessary to ascertain whether the variances of the samples are roughly equivalent. There are two forms of the t-test (the pooled-variance Student's t-test and Welch's t-test), depending on whether or not this is the case. Equivalence of variance can be established by an F-test. Tests for normality, the F-test and the t-test itself are included in major statistical packages, including R.

We have used data from Woodman's study of Mesolithic site location on Islay for a worked example (Woodman 2000b, p. 457). The dataset consists of 36 coastal and 37 inland survey areas, each of which has a value that describes the density of Mesolithic artefacts. These samples can be used to test the hypothesis that there is a difference in artefact density between coastal and inland survey areas, as part of a wider study into Mesolithic site location (Table 7.17).
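The workflow just described can be sketched as test statistics only: log-transform, compare variances with an F ratio, then compute either the pooled (Student's) t or Welch's t. The function names are hypothetical, and the densities below are invented, not Woodman's values. The sketch stops short of p-values. Those must be read from t and F tables (df = n1 + n2 - 2 for the pooled test) or obtained from a package, for example R's t.test() and var.test().

```python
import math
from statistics import mean, variance  # variance() is the sample (n - 1) variance


def log_transform(values):
    """Natural log of each value; every value must be greater than zero."""
    return [math.log(v) for v in values]


def f_ratio(x, y):
    """F statistic for equality of variances: larger sample variance over the smaller."""
    vx, vy = variance(x), variance(y)
    return max(vx, vy) / min(vx, vy)


def pooled_t(x, y):
    """Student's t statistic assuming equal variances (pooled variance estimate)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


def welch_t(x, y):
    """Welch's t statistic, for samples whose variances are not equivalent."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))


# Invented, right-skewed densities (artefacts per hectare), NOT Woodman's data.
coastal = [0.2, 0.5, 1.1, 2.7, 6.4]
inland = [0.3, 0.4, 0.9, 1.8, 5.0]

log_c, log_i = log_transform(coastal), log_transform(inland)
print("F =", round(f_ratio(log_c, log_i), 3))
print("t (pooled) =", round(pooled_t(log_c, log_i), 3))
print("t (Welch) =", round(welch_t(log_c, log_i), 3))
```

When the two samples are the same size, the pooled and Welch statistics coincide. They differ only in their degrees of freedom.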


Fig. 7.11 Boxplot of artefact densities for coastal and inland survey areas.

The mean artefact density is 1.39 artefacts per hectare for coastal areas and 1.48 for inland areas, suggesting that the latter zone on average possesses a higher density of artefacts than the former. Boxplots of the two samples, however, show considerable overlap in the sample distributions, and both are skewed (Fig. 7.11). A skewed sample distribution is a common occurrence with archaeological datasets and contravenes the requirements of the Student's t-test. Rather than use one of the non-parametric tests, however, it is often possible to transform samples and 'normalise' the shape of their distributions, and in so doing make them suitable for a parametric test. In this example, taking the natural log of each of the density values makes the distributions resemble a normal distribution more closely (Fig. 7.12). The Shapiro-Wilk test is commonly used to establish that each sample is not statistically different from a normal distribution. In this case it returns p = 0.18 and p = 0.74 for the logged values from the coastal and inland samples, showing that neither differs significantly from a normally distributed sample. An F-test may then be used to establish that the two samples have equivalent variances. In this case the result, p = 0.55, indicates that the t-test can be carried out assuming this condition. The t-test itself returns a p-value of 0.91, much higher than the significance level of 0.05 (or less) needed for rejecting the null hypothesis. There is thus no statistically significant difference between the two samples. Although their mean densities differ, the difference is not large enough to conclude that the density of Mesolithic artefacts in coastal areas differs from that in inland areas of Islay.

Fig. 7.12 Boxplot of logged artefact densities for coastal and inland survey areas.

Comprehensive statistical packages such as S-Plus, SPSS and R are able to perform the non-parametric and parametric tests described in this section (as well as a range of other more specialised techniques described in the next chapter). At the time of writing, these statistical procedures were also available online from VassarStats: Web Site for Statistical Computation² (Lowry 2003). Boxes 7.1 and 7.2 provide guidance on the use of R, an Open Source package that integrates with GRASS GIS.

7.4 Data classification
This final section introduces a number of statistical methods of classification useful for visualising patterns within attribute data. As defined in the introduction to this chapter, classification involves the placement of data into groups. Members of each category should be more similar to each other than they are to non-members on the basis of qualitative or quantitative characteristics. The reasons for classifying spatial data are to simplify an otherwise complex matrix, to discover structure and patterns, and to facilitate comparative analysis. There is a wide variety of classification systems that, for our purposes, are usefully divided into qualitative
² http://faculty.vassar.edu/lowry/VassarStats.html

Box 7.1 Using R
R is a versatile and powerful object-orientated statistical programming language that can be used for everything from simple univariate statistical tests to more complex statistical and geostatistical modelling. It can be linked to GRASS GIS so that data and spatial parameters can be read directly into R without the need for separate data entry (see sections on using R within GRASS in Neteler and Mitasova 2002). As with GRASS, R is provided free of charge under the GNU General Public Licence. It can be obtained for a variety of platforms from http://www.r-project.org.

As R is an object-orientated language, actions are performed on objects, referred to as named data structures, which might be a numeric vector, array, list or matrix. One of the simplest and most useful objects is a numeric vector: an ordered set of data values of the form x1, x2, x3, ..., xn. To set up a vector called site_1 consisting of 10 numbers, the following R command would be used:

> site_1 <- c(64, 23, 42, 2, 23, 9, 42, 2, 11, 6)

Alternatively, if the data were in a single column of a delimited ASCII file called site1.txt, the command

> site_1 <- read.table("site1.txt")[[1]]

performs a similar function. Here read.table() returns a data frame, and [[1]] extracts its first column as a vector. Mathematical operations can then easily be performed on this object. For example, it is possible to calculate the natural logarithm of the values in site_1 and store them in a new vector called site_1log using the command

> site_1log <- log(site_1)

Several operations can be performed on numeric vectors, including operations for basic descriptive statistics and graphical display of the distribution pattern. For example, these three separate commands:

> summary(site_1)
> boxplot(site_1)
> hist(site_1)

respectively return basic descriptive statistics (i.e. the minimum, first quartile, median, mean, third quartile and maximum of the distribution), a boxplot of the distribution and a histogram of the distribution.

Box 7.2 Univariate statistics in R
Given two vectors of numbers, say site_1 and site_2, the statistical tests described in the first part of this chapter are quickly and easily performed. For example, the Wilcoxon rank-sum test (equivalent to the Mann-Whitney test) is performed by the command

> wilcox.test(site_1, site_2)

which in this example returns a result in a format similar to the following. The output shows no evidence that the two samples are drawn from different populations:

data: site_1 and site_2
W = 51, p-value = 0.9697
alternative hypothesis: true mu is not equal to 0

Other tests are equally easily performed. The Kolmogorov-Smirnov test is called with the command

> ks.test(site_1, site_2)

and a t-test assuming equal variances is called by

> t.test(site_1, site_2, var.equal = TRUE)

The prerequisites for this t-test can be checked first. Whether each sample is normally distributed can be determined with the Shapiro-Wilk test:

> shapiro.test(site_1)
> shapiro.test(site_2)

and equivalence of variance can be established using the F-test with the following command:

> var.test(site_1, site_2)

In all these cases the output takes a form similar to that shown above for the Wilcoxon test. It consists of some summary information followed by a p-value, which is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. For more information about R, see Venables and Smith (2003).

and numerical. Qualitative classifications are those that use a descriptive attribute as the basis for dividing objects into different categories. For example, an artefact typology based on the shape of the pot (e.g. 'open mouth' vs. 'closed mouth') or on the type of tool (e.g. 'scraper' vs. 'burin' vs. 'projectile point') is qualitative. Spatial data can also be classified using descriptive terms. Archaeological sites might be divided into different periods, or landscape data classified on the basis of environmental and formation processes (e.g. 'loess terrace' vs. 'Holocene alluvial plain' vs. 'palaeochannel' vs. 'gravel island').


Fig. 7.13 Different qualitative classifications of the same stone-tool dataset: (a) unclassified distribution, (b) scraper vs. core locations, (c) raw material distribution and (d) reduction sequence classification. Some patterning in the location of cores and scrapers is evident in (b), but the spatial and attribute patterns would require statistical validation using techniques described in Chapter 8. [Legend classes: (c) Local, Exotic, Unclassed; (d) Exhausted, Exploited, Fresh, Unclassed.]

7.4.1 Qualitative classifications
One consequence of an object possessing several attributes is that several different maps may derive from the same base data, depending on which attribute is used to construct the groupings. Figure 7.13 shows a hypothetical distribution of stone tools across a living surface. Each artefact is represented by a point and is associated with an attribute table. The table describes what sort of tool it is (e.g. 'scraper', 'core', etc.), its stage in the reduction sequence (e.g. 'early', 'exhausted') and the raw material from which it has been made (e.g. 'local chert', 'exotic chert', etc.). By differentially classifying
objects according to these different qualitative variables, it is possible to generate several different maps that permit the visual assessment of the spatial organisation of the artefacts. For example, Dibble et al. (1997) use this method to highlight spatial patterning of stone tools to assess behavioural versus post-depositional effects at the French Acheulian site of Cagny-l'Epinette. Qualitative classes may also form the basis for qualitative modelling following their reclassification into rank-order categories (e.g. Fig. 7.14). Burrough and McDonnell (1998, pp. 171-172) describe a simple example where land quality, LQ, for small-holder crop production may be modelled as a function of nutrient supply, oxygen supply, water supply and erosion susceptibility. In this example, rank-order classifications are derived from polygon maps of soil depth (shallow, moderate, deep), soil series (five classes) and slope (flat, moderate, steep). Erosion susceptibility is constructed by combining soil and slope data (e.g. 'if soil series is S2 and slope class is flat, then erosion susceptibility = 1'). Assuming that high values mean less suitable, the overall suitability of land for crop production can then be determined by the highest value, in the manner suggested by Burrough and McDonnell (1998, p. 172):

$$\text{Suitability} = \max(LQ_{\text{nutrient}}, LQ_{\text{oxygen}}, LQ_{\text{water}}, LQ_{\text{erosion}})$$

This is obviously a great simplification of the potential factors involved in the suitability of land for crop production. It nevertheless serves as a useful illustration of how qualitative data classes may be converted into rank-order numeric variables and subsequently combined to produce new data from the constituent elements of several original datasets.

7.4.2 Numerical classifications
The subject of numerical classification has received considerable attention in archaeology, particularly in terms of artefact analysis (e.g.
Baxter 1994; Shennan 1988). Our concern here is more specifically with numerical classification as applied to spatial datasets. Numerical classifications can be divided into two types: univariate and multivariate. Univariate methods are those that deal with only a single variable (for example, the classification of slope values into discrete categories). Multivariate methods, by contrast, deal with the classification of two or more variables (as is commonly encountered with multispectral satellite imagery, for example) into clusters or categories that reflect real structure in the data.

Univariate classifications and statistical generalisation
Univariate classifications may be applied to geospatial data in order to simplify or generalise a dataset, as is the case with choropleth mapping (see Chapter 12). For instance, a sample of several hundred test pits, each of which has a number of artefacts ranging from 0 to 36, might be grouped into four new categories called 'no evidence', 'low intensity', 'medium intensity' and 'high intensity'. The strategy for both defining the groups and deciding which test pits should be placed in which group can be arbitrary (e.g. perhaps 0 artefacts = 'no evidence', 1-5 = 'low', 6-18 = 'medium', 19-36 = 'high'). There may then be little correspondence between the data and the categories. If only one test pit contains between 6 and 18 artefacts, for example, the classification would not represent 'natural' breaks in the dataset. It is therefore essential to have some understanding of the shape of the data distribution in order to classify it appropriately. Figure 7.15 shows four idealised examples of distributions, each of which is best classified using one of the following classification methods.

Fig. 7.14 Hypothetical reclassification of qualitative variables to rank-order variables, for the purpose of qualitative prediction of land quality for horticulture. Soil and slope maps have been ranked into numerical categories (0 = most suitable to 3 = least suitable). These ranked attributes are combined using polygon overlay (if vectors) or with map algebra (if rasters). The result is a new map with five land categories showing modelled land-quality variability. See FAO (1974, 1976) and Rossiter (1996) for guidelines on qualitative predictions of this kind, and Wilson (1999) and Hoobler et al. (2003) for applied examples. [Panels: Soil; Ranked soil; Ranked slope; Ranked soil + ranked slope.]

Fig. 7.15 Four idealised distributions: (a) normal, (b) rectangular, (c) bimodal, (d) skewed (to the right). [Horizontal axis of each panel: number of artefacts.]
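Applying an arbitrary scheme like the test-pit one above is easy. The difficulty is that the breaks ignore the data. In this sketch the counts are invented and the class limits are those given in the text; the function name is hypothetical. The tally shows a 'medium' class that holds a single test pit.

```python
from collections import Counter

# Arbitrary class scheme from the text, as (upper limit, label) pairs.
ARBITRARY_CLASSES = [(0, "no evidence"), (5, "low"), (18, "medium"), (36, "high")]


def classify(count, classes=ARBITRARY_CLASSES):
    """Return the label of the first class whose upper limit is >= count."""
    for upper, label in classes:
        if count <= upper:
            return label
    raise ValueError(f"count {count} exceeds the highest class limit")


# Invented test-pit counts, skewed towards low values.
test_pits = [0, 0, 0, 1, 1, 2, 2, 3, 4, 4, 5, 7, 22, 30, 36]
tally = Counter(classify(c) for c in test_pits)
print({label: tally[label] for _, label in ARBITRARY_CLASSES})
# {'no evidence': 3, 'low': 8, 'medium': 1, 'high': 3}
```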
Standard deviation and quantile
In situations where the data distribution can be shown to be normal, or close to normal (i.e. similar to the histogram in Fig. 7.15a), classifications that use standard deviations or quantiles to define data categories will be most useful. Standard deviation methods use the statistical deviation from the mean number of objects per observation, typically in steps of ±0.5 or ±1.0 standard deviations, to construct classes (Fig. 7.16). When applied to choropleth mapping, the effect often serves to illustrate highs and lows, with different colours used for the upper and lower ends of the data range (e.g. blues for standard deviations to the left of the mean and reds for standard deviations to the right). Quantile classifications divide the observations into equally sized groups. Examples are quartiles, which each contain 25 per cent of the observations, and quintiles, which each contain 20 per cent of the observations.
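Both schemes can be sketched in a few lines; the function names are hypothetical. `sd_class` labels each value by the standard-deviation band it falls in, using steps of 0.5 or 1.0 SD. `quantile_classes` splits the ranked observations into k groups of equal size, as nearly as the number of observations allows. For the quantile case this rank-based splitting is our own choice of implementation.

```python
import math
from statistics import mean, stdev


def sd_class(x, m, s, step=1.0):
    """Index of the standard-deviation band holding x.

    0 is [mean, mean + step*sd), -1 is [mean - step*sd, mean), and so on."""
    return math.floor((x - m) / (step * s))


def quantile_classes(values, k=4):
    """Assign each value to one of k groups of (near-)equal size by rank, 0 = lowest.

    Tied values may be split across neighbouring groups to keep the sizes equal."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes


# Hypothetical artefact counts per observation unit.
counts = [3, 7, 8, 9, 10, 10, 11, 12, 13, 17]
m, s = mean(counts), stdev(counts)
print([sd_class(c, m, s, step=0.5) for c in counts])
print(quantile_classes(counts, k=4))
```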

Fig. 7.16 Classification of normally distributed data: (a) by standard deviation, (b) by quartiles (n = number of observations). [Horizontal axis: number of artefacts; panel (b) shows four classes of n = 25.]

Equal interval or equal step
Rectangular, or near-rectangular, distributions (i.e. where each part of the value range contains about the same number of observations) are rare in archaeology. Where such a pattern does exist, categories can be defined with equal intervals, so that each category contains the same range of values (e.g. 1-9, 10-18, 19-27, 28-36). This is the simplest form of classification. It is not, however, a good way to depict data variability if the distribution departs far from rectangular. For example, suppose there are more instances of lower-valued data (e.g. 75 per cent of trenches contained fewer than 8 artefacts). In the resulting map, 75 per cent of the data will then reside in a single class, and any spatial patterning in the 1-8 artefact range will be masked.

Natural break
This is perhaps the most common form of general classification for quantitative data. The classification method attempts to find the most suitable class ranges ('breaks') by testing them against the distribution of the entire dataset, so that the resulting class ranges reflect the structure of the distribution. The statistical method most often used for this is called Jenks Optimal Method. It attempts to find natural clusters in the data so as to create classes that are internally coherent but distinct from other classes (Jenks and Caspall 1971; Slocum 1998). The process involves an iterative comparison of class means against a measure of the mean value for the entire dataset, in order to maximise what is referred to as the goodness of variance fit (GVF). The optimal partition for each subset is the one with the smallest total error. The error is the sum of absolute deviations about the class median or, alternatively, the sum of squared deviations about the class mean. The calculation is straightforward (see Dent 1999, p. 148) and can be carried out by hand.
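Equal-interval limits need only a single division. For natural breaks, the sketch below substitutes a brute-force search for Jenks' iterative procedure. It tries every way of cutting the sorted values into k contiguous classes and keeps the one with the smallest sum of squared deviations about the class means (SDCM). It then reports GVF = 1 - SDCM/SDAM, where SDAM is the sum of squared deviations about the mean of the whole dataset. The function names and data are hypothetical, the code is our illustration rather than any GIS package's implementation, and it is feasible only for small datasets.

```python
from itertools import combinations


def equal_interval_limits(lo, hi, k):
    """Upper limits of k classes of equal width spanning lo..hi."""
    width = (hi - lo) / k
    return [lo + width * (i + 1) for i in range(k)]


def sdcm(groups):
    """Sum of squared deviations about each group's own mean."""
    total = 0.0
    for g in groups:
        m = sum(g) / len(g)
        total += sum((v - m) ** 2 for v in g)
    return total


def natural_breaks(values, k):
    """Exhaustively find the k contiguous classes of sorted values with the smallest SDCM."""
    data = sorted(values)
    best_err, best_groups = None, None
    # Each combination of k - 1 cut positions defines one partition into non-empty groups.
    for cuts in combinations(range(1, len(data)), k - 1):
        bounds = (0,) + cuts + (len(data),)
        groups = [data[a:b] for a, b in zip(bounds, bounds[1:])]
        err = sdcm(groups)
        if best_err is None or err < best_err:
            best_err, best_groups = err, groups
    return best_groups


def gvf(values, groups):
    """Goodness of variance fit: 1 - SDCM / SDAM, where 1.0 is a perfect fit."""
    return 1 - sdcm(groups) / sdcm([list(values)])


values = [1, 2, 3, 10, 11, 12, 30, 31]  # hypothetical data
print(equal_interval_limits(0, 36, 4))   # [9.0, 18.0, 27.0, 36.0]
groups = natural_breaks(values, 3)
print(groups)                            # [[1, 2, 3], [10, 11, 12], [30, 31]]
print(round(gvf(values, groups), 3))     # 0.995
```

Exhaustive search finds the global optimum that the iterative method aims at, but the number of candidate partitions grows rapidly with dataset size.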
However, the process is iterative: it starts with an arbitrary classification, then redefines classes and recalculates the GVF in order to find the most 'natural' classification. It is therefore best carried out on a computer. In an idealised situation, such as in Fig. 7.15(c), if the maximum number of artefacts was 36, the natural-break method would create two classes of 1-21 and 22-36. In other cases, such as the skewed distribution in Fig. 7.15(d), natural-break methods will often provide suitable categories, although a classification based on an arithmetic or geometric progression may be better (see below). Natural-break methods have three major disadvantages. First, they are difficult to replicate, as small changes in data values can change the classification scheme. Second, the bin ranges are difficult to read, as they do not lie on intuitive breaks. Third, outliers can receive undue visual importance, as they are given their own category.

Equal area
Instead of using the data values as the basis for determining the individual class ranges, divisions in the data are established so that each class shares an equal proportion of the map area. In certain applications this may provide a useful alternative to classifications that are based on the data distribution curve itself, because equal-area class breaks can be used to examine how the data are distributed in terms of their spatial area. The major disadvantage is that large polygons may end up in a class by themselves.

Geometric intervals
Where distributions are skewed, as in Fig. 7.15(d), it may be difficult to obtain a representative classification using the previous techniques. One alternative is to use a geometric progression to define intervals. The method is suitable for classifying distributions that show very pronounced rates of change, as class ranges increase exponentially in the manner of X^1, X^2, X^3, etc. The formula

$$X = \sqrt[n]{H / L} \qquad (7.6)$$

can be used to determine the multiplier X, where n is the number of desired classes, H is the highest value and L is the lowest value (Dent 1999, p. 406). Class upper limits are defined as

$$L \times X^1,\ L \times X^2,\ L \times X^3,\ \ldots,\ L \times X^n \qquad (7.7)$$

Fig. 7.17 Distribution of sherd density. Nearly 90 per cent of the 456 units have fewer than 100 sherds per hectare. As the maximum value is nearly 10,000 sherds per hectare, only the far left of the distribution curve is shown. Colour coding shows the geometric progression classes used for Fig. 7.18(f). [Horizontal axis: ceramic density (sherds per hectare).]


For example, Fig. 7.17 shows the skewed distribution curve produced from a dataset of ceramic densities from 256 survey units, where the lowest recorded value is 1 and the highest is 9902. Five classes defined using a geometric progression can therefore be constructed by substituting the appropriate values in (7.6):

$$X = \sqrt[5]{9902 / 1} \approx 6.3$$

From (7.7), class upper limits are therefore established as:

$$1 \times 6.3^1,\ 1 \times 6.3^2,\ 1 \times 6.3^3,\ 1 \times 6.3^4,\ 1 \times 6.3^5 = 6.3,\ 39.7,\ 250.0,\ 1575.3,\ 9924.4$$
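Eqs. (7.6) and (7.7) translate directly into code, and the sketch below reproduces the sherd-density example; the function names are hypothetical. With the multiplier rounded to 6.3, as in the text, the fifth limit overshoots to 9924.4. The unrounded multiplier puts it at exactly the maximum of 9902.

```python
def geometric_multiplier(low, high, n_classes):
    """Eq. (7.6): X is the n-th root of H / L."""
    return (high / low) ** (1 / n_classes)


def geometric_upper_limits(low, x, n_classes):
    """Eq. (7.7): class upper limits L*X, L*X**2, ..., L*X**n."""
    return [low * x ** i for i in range(1, n_classes + 1)]


x = geometric_multiplier(1, 9902, 5)
print(round(x, 1))  # 6.3
print([round(v, 1) for v in geometric_upper_limits(1, 6.3, 5)])
# [6.3, 39.7, 250.0, 1575.3, 9924.4]
print(round(geometric_upper_limits(1, x, 5)[-1], 6))  # 9902.0
```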

In the same way that reclassifying and regrouping data using qualitative attributes can result in significantly different maps, different numerical methods applied to the same dataset can produce very different impressions of the data. Figure 7.18 shows how the six methods described create six very different maps, even though the base data are identical. As the data show significant skew (Fig. 7.17), classifications based on standard deviation (a) and equal interval (c) obscure spatial patterning. Quintile (b) and equal-area (e) classification methods produce similar results. Both lack detail in the upper ranges because of the large class range of 25-9901. This is a common problem with skewed distributions, as the few large numbers get lumped together in a single class. The natural-break classification (d) is an improvement. While the category 1-109 is 'natural', however, Fig. 7.17 shows that about 90 per cent of the data fall within this group, so spatial patterning within this range might be masked. The geometric-interval classification (f), based on Eqs. (7.6) and (7.7), corrects this by mimicking the actual distribution curve. The lower values are subdivided, which permits a better understanding of the spatial properties of the lower (majority) data values.

Fig. 7.18 Six possible numerical classifications of the same dataset: (a) standard deviation, (b) quintile, (c) equal interval, (d) natural break, (e) equal area, (f) geometric progression. Source: Kythera Island Project. Used with permission.

Multivariate classifications
Classifications that aim to discover and define grouping in a set of data using two or more variables are referred to as multivariate. These methods are best known in archaeology as applied to artefact analysis. Several textbooks explain the various techniques that can be used to discover, describe and statistically test patterning within multivariate datasets (e.g. Shennan 1988; Baxter 1994; see also Aldenderfer 1998 for a recent review of quantitative approaches in archaeology). Multivariate classification is an important component of spatial data analysis in at least two instances. The first is finding clusters within a set of spatial objects using two or more of their attributes; the second is determining classes in multispectral image data. The former can be useful when attempting to discover grouping in objects on the basis of attributes derived from GIS analysis. For example, several quantitative variables (e.g.
similar to those described in Table 3.3: elevation, slope, aspect, distance to water, visibility, etc.) may have been collected for a distribution of archaeological sites across a study region. An exploratory multivariate analysis could then determine whether sites are clustered in groups defined by their measured characteristics. Multivariate analyses such as factor analysis or principal components analysis permit the visual assessment of clustering, which can then be confirmed and refined using k-means or discriminant function analysis. For datasets that consist of both quantitative and qualitative data, a coefficient matrix, such as one based on the Jaccard coefficient, may be used as the basis for techniques such as principal components analysis or hierarchical cluster analysis. Analysis of this sort might, for example, reveal two distinct groups of sites. One might be situated at higher elevations with south-facing slopes and good visibility, while a second is found at lower elevations on gentle slopes and closer to water sources. Readers are referred to the previously cited texts for descriptions of how these statistical techniques are used. Methods related to predictive modelling may also provide insight into a suspected pattern; predictive modelling is a technique that seeks to predict the probability of encountering a phenomenon in unsampled areas based on knowledge gained from sampled areas. While
data patterns might be apparent from visual inspection of a spatial dataset, statistical analysis is useful for confirming patterning and determining the relative contribution of each of the variables. Chapter 8 introduces several statistical techniques, including predictive modelling, for investigating patterns and relationships in spatial datasets.

7.4.3 Classification of remotely sensed imagery
In Chapter 5 we reviewed how image data can be used in archaeology. In many cases an image may be useful 'as is' and need no further modification or analysis before it can be used as a visual reference or for the collection of spatial data. In other cases, particularly with multivariate imagery from satellite sensors, the data require processing and analysis to extract meaningful information. This often involves two separate processes: image enhancement followed by image classification. The manipulation, classification and interpretation of remotely sensed imagery is a discipline in its own right, and we cannot possibly do it justice in the context of this book. The following only outlines the basic philosophy underlying remote sensing. Readers requiring more comprehensive information are referred to Lillesand and Kiefer (2000), Campbell (2002) or Lillesand et al. (2003) for excellent introductions to these techniques. As we have previously described, a digital image usually consists of one or more separate bands (i.e. raster grids) of data, where every pixel within each band is assigned a value between 0 and 255. Raster-based GIS packages such as Idrisi and GRASS include a range of tools designed specifically for enhancing and classifying image data. Image enhancement involves the manipulation of pixel values to make the dataset easier to classify and interpret. Idrisi, for example, contains a range of analytical modules specifically designed to enhance image data, such as the tool 'STRETCH'. This performs a contrast stretch on an image,
which is useful if the image's pixel values are clustered in a narrow band within the 0-255 range. It re-scales the range of values to provide a broader range of data categories for cluster analysis, using techniques such as histogram equalisation to help improve the contrast between high and low data values. Other forms of image enhancement include the construction of colour composites to produce an image similar in appearance to a colour photograph, or false colour composites to make certain features, such as leafy vegetation, stand out. For example, by combining bands 3 (visible red), 2 (visible green) and 1 (visible blue) from a Landsat ETM image, a colour composite is created. A false colour composite designed to emphasise near-infrared (NIR) elements of an image would use bands 4 (NIR), 3 and 2. Vegetation indexes, such as the Normalised Difference Vegetation Index (NDVI), which is a ratio of reflectivities measured in the red and near-infrared portions of the electromagnetic spectrum, can be used to provide a measure of the relative amounts of green vegetation within an image. Eastman (2001, pp. 27-31) provides a good overview of these and other digital image processing techniques and describes their implementation using Idrisi.
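The two enhancement operations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of Idrisi's STRETCH module: each band is a plain list of 0-255 pixel values, the stretch is a simple linear (minimum-maximum) rescaling rather than histogram equalisation, and the function names `linear_stretch` and `ndvi` are hypothetical. NDVI is computed with the usual formula (NIR - red) / (NIR + red).

```python
def linear_stretch(band, out_min=0, out_max=255):
    """Linearly rescale pixel values so that the band's own minimum and
    maximum map onto out_min..out_max (a simple min-max contrast stretch)."""
    lo, hi = min(band), max(band)
    if hi == lo:  # a flat band has no contrast to stretch
        return [out_min for _ in band]
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (v - lo) * scale) for v in band]


def ndvi(red, nir):
    """Normalised difference vegetation index per pixel: (NIR - red) / (NIR + red).
    Values near +1 suggest dense green vegetation; pixels where both bands
    are zero are returned as 0.0 to avoid division by zero."""
    out = []
    for r, n in zip(red, nir):
        total = n + r
        out.append((n - r) / total if total else 0.0)
    return out


# Hypothetical strip of pixels whose values are bunched between 90 and 120
band = [90, 96, 102, 108, 114, 120]
print(linear_stretch(band))          # [0, 51, 102, 153, 204, 255]
print(ndvi([30, 50], [90, 50]))      # [0.5, 0.0]
```

A linear stretch spreads the values evenly across the output range; histogram equalisation, by contrast, redistributes values according to how often they occur and would need a different routine.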

[Fig. 7.19 panels: Landscape; Digital image; Classified map. Legend: Trees, Grass, Water]

Fig. 7.19 A hypothetical landscape consisting of three distinct ecological zones (left), the pixel values as recorded by a digital sensor (centre) and a possible classification (right). Note that the pixel values express an average for the area they cover and that the classification has focused on distinguishing between the three ecological zones. Alternative classifications might, for example, attempt to find classes within the forest zone reflecting the predominance of different tree types or species.

Following any enhancements and corrections for atmospheric distortion, image classification can then be undertaken in order to discover and define the relationship between pixel values (i.e. recorded electromagnetic radiation) and features on the Earth's surface (Fig. 7.19). Image classification can proceed using either unsupervised or supervised methods.

Unsupervised classification

This involves grouping pixels of similar spectral value into classes without prior knowledge of either class composition or what the classes may represent. This process is also known as spectral clustering. In single-band images, clusters can be identified by the analysis of a histogram of pixel values. If two bands are being used, their pixel values can be plotted on x, y-axes and clusters defined on the resulting graph. For multispectral images, the process of identifying clustering patterns involves multivariate statistics, such as principal components analysis, as described in the previous section. Most image analysis programs have facilities in which it is possible to specify the number of desired clusters. In other cases the program may automatically define and then group pixels to maximise the variation in the image with little subsequent user input. Alternatively, groups can be manually defined, and pixels allocated using a statistical function. Most simply, each pixel is assigned to its nearest cluster (e.g. Idrisi's 'MINDIST' function). Pixels may also be assigned to classes on the basis of some other statistical relationship with the cluster groups (e.g. whether it falls within a set distance). Assessing whether the classified image makes sense in terms of the land cover it represents involves comparing the resulting classification map with known ground features (Eastman 2001, pp. 30-32). Classes that either do not make sense, or are not useful from an interpretative perspective, can become the focus of additional data collection to create a training sample for a supervised classification. This exercise is referred to as ground truthing and is an essential part of unsupervised classification.

Supervised classification

This method often produces better results than unsupervised classification, but it requires some background understanding of the landscape being studied. The process begins with the user defining groups of pixels to act as training areas, each of which consists of a sample of pixels that define a known phenomenon.

By defining known groups of pixels in this way it is possible to derive the central tendency and dispersion (such as mean and standard deviation) for each cluster defined by the training areas. A classification algorithm is then used to classify the pixels outside of the training areas into one of these clusters. These algorithms range from simple techniques that place a pixel into the class that has the nearest mean pixel value to its own value, through methods such as 'maximum likelihood classification', to more complex Bayesian methods that depend on specifying the likely proportion of each class in the image, which can help assign 'difficult' pixels. Some programs support quite complex classification facilities, notably Idrisi and GRASS. However, image classification and interpretation can be a complex process and it is advisable to consult an introductory book on remote sensing and image interpretation for further details on the philosophy underlying this technique. The Idrisi manual (Eastman 2001) offers a very good overview of remote-sensing principles and how they can be implemented in Idrisi. The GRASS online tutorial3 also discusses image processing and classification procedures in further detail.

7.5 Conclusion

The three types of exploratory data analysis described in this chapter (queries, parametric and non-parametric statistical tests, and data classification) are by themselves informative and useful devices for finding, querying and visualising data patterns. Spatial patterns, however, require a separate set of statistical tools in order to ascertain their structure and significance. The next chapter discusses this in further detail and introduces a range of methods for identifying spatial relationships in archaeological datasets.

3 http://mpa.itc.it/markus/osg05/neteler grass6 nurshet12005.pdf
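The difference between the unsupervised and supervised classification procedures outlined in Section 7.4.3 can be illustrated with a small Python sketch. It assumes two-band pixels stored as (band 1, band 2) tuples with invented values, uses plain k-means for the unsupervised spectral clustering (seeded, as an arbitrary choice, from pixels spread across the range of total brightness) and a minimum-distance-to-means rule for the supervised step. These are simple stand-ins chosen for clarity, not the algorithms of Idrisi, GRASS or any other package, and all function names are hypothetical; the class names echo Fig. 7.19 but the numbers are illustrative only. The same k-means routine could equally be applied to a table of site attributes such as elevation, slope and distance to water.

```python
import math


def nearest(pixel, centres):
    """Index of the centre closest to pixel (Euclidean distance in band space)."""
    return min(range(len(centres)), key=lambda k: math.dist(pixel, centres[k]))


def kmeans(pixels, k, iterations=20):
    """Unsupervised spectral clustering (plain k-means).

    Seeding is an arbitrary illustrative choice: k pixels spaced evenly
    through the pixels ranked by total brightness (sum of band values)."""
    ordered = sorted(pixels, key=sum)
    centres = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        labels = [nearest(p, centres) for p in pixels]
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centres[c] = tuple(sum(band) / len(members) for band in zip(*members))
    return labels, centres


def train_means(training):
    """Supervised step 1: mean band values of each class's training pixels."""
    return {name: tuple(sum(band) / len(px) for band in zip(*px))
            for name, px in training.items()}


def classify(pixels, class_means):
    """Supervised step 2: give each pixel the class whose mean is nearest."""
    names = list(class_means)
    centres = [class_means[name] for name in names]
    return [names[nearest(p, centres)] for p in pixels]


# Hypothetical two-band pixel values (band 1, band 2)
pixels = [(20, 200), (25, 190), (120, 80), (115, 90), (240, 30), (230, 35)]
labels, centres = kmeans(pixels, 3)
print(labels)  # cluster numbers only: the analyst must still interpret them

# Hypothetical training areas for three known land-cover classes
training = {"water": [(20, 200), (25, 190)],
            "grass": [(120, 80), (115, 90)],
            "trees": [(240, 30), (230, 35)]}
means = train_means(training)
print(classify([(30, 180), (110, 95), (250, 20)], means))
```

The unsupervised routine returns anonymous cluster numbers that still have to be matched to land-cover types by ground truthing, whereas the supervised routine returns named classes because the analyst supplied them through the training areas.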

8 Spatial analysis

8.1 Introduction

Spatial analysis lies at the core of GIS and builds on a long history of quantitative methods in archaeology. Many of the foundations of spatial analysis were established by quantitative geographers in the 1950s and 1960s, and adopted and modified by archaeologists in the 1970s and 1980s. For a variety of reasons, spatial analysis fell out of fashion both in archaeology and in the other social sciences. In part this was because of the perceived overgeneralisation of certain types of mathematical models, but also because of a shift towards more contextually orientated and relativist studies of human behaviour. Recently, however, there has been a renewed interest in techniques of spatial analysis that take on board these criticisms for understanding the spatial organisation of human behaviour. In the last decade there have been several advances within the social sciences, particularly geography and economics, in their ability to reveal and interpret complex patterns of human behaviour at a variety of scales, from the local to the general, using spatial statistics. Archaeology has participated somewhat less in these recent developments, although there is a growing literature that demonstrates a renewed interest in the application of these techniques to the study of past human behaviour. In this chapter we review some historically important methods (e.g. linear regression, spatial autocorrelation, cluster analysis) and also highlight more recent advances in the application of spatial analysis to archaeology (e.g. Ripley's K, kernel density estimates, linear logistic regression). Readers requiring more in-depth discussion of methods of spatial analysis are advised to consult the sources that we have made use of for this review, particularly Bailey and Gatrell (1995), Fotheringham et al. (2000b), Rogerson (2001) and Haining (2003).
8.2 Linear regression

Linear regression has long been a staple of quantitative analysis. It is used to model the relationship between two continuous variables, and is one of the more important methods in spatial statistics. Consequently we have described the technique, and some associated potential pitfalls, in some detail. Linear relationships between two quantitative variables may be expressed in terms of the degree of correlation, of which there are three basic possibilities: positive correlation, zero correlation or negative correlation. Two variables are said to be positively correlated when there is a simultaneous increase in value between two numerical variables (Fig. 8.1a) and negatively correlated when one variable increases while the other decreases (Fig. 8.1b). Zero correlation occurs when there is no relationship between the two variables (Fig. 8.1c). Figure 8.1(d) shows a more complex relationship between two variables where there is a degree of positive correlation, although there is also a strong clustering pattern and heteroscedasticity (unevenness) in the data distribution that reduces the predictive value of the model. We will examine the first two examples initially, then return to cases of zero or spurious correlation later. When examining the type and strength of correlation between two variables, one variable is considered to be dependent and the other independent. When plotted on an x, y-graph, the independent variable is plotted on the x-axis and the dependent variable on the y-axis. The difference between the dependent and independent variables is important and can be thought of as close to that of cause and effect.

Fig. 8.1 Idealised correlation patterns: (a) positive, (b) negative, (c) zero, (d) spurious positive correlation in an uneven, clustered dataset.



To use a well-known archaeological example, the proportion of a particular type of raw material can often be shown to decline with the distance from the source of the raw material (i.e. distance and proportion are negatively correlated), as predicted by Renfrew's 'law of monotonic decrement' (Renfrew and Dixon 1976). In this case, it is the proportion of material that acts as the dependent variable, as its value is determined by its distance from the source. In situations where the suspected causal relationship is more ambiguous, for example between the number of artefacts and the size of an archaeological site, it is possible to speak of interdependence. In these cases, which variable is x and which is y is only significant in terms of what the analysis is specifically attempting to model. While it is acceptable simply to describe the relationship between two quantitative variables as either positively or negatively correlated, it is often more useful to express this in terms of the strength of the relationship. The standard measure of the linear correlation between two variables is the Pearson correlation coefficient, symbolised by r and given by

$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}} \qquad (8.1) $$
where x̄ and ȳ are the mean values of the independent and dependent variables. Values of r range from +1.0 for a perfect positive correlation to -1.0 for a perfect negative correlation. The midpoint, r = 0.0, denotes a complete absence of correlation between two variables. For example, the two variables listed in columns one and two of Table 8.1 possess a correlation coefficient of +0.96, meaning that they are highly positively correlated: as the x-value increases, so too does y in a highly predictable manner. The correlation coefficient is usefully visualised as a 'line of regression' placed to minimise the sum of the vertical distances (actually the sum of the squared distances) from each point (the 'residuals'), as illustrated in Fig. 8.2. In contrast to r, which simply describes the strength of the correlation and whether it is positive or negative, the r²-value (the coefficient of determination) gives a better indication of the predictive power of the independent variable and can be interpreted as the proportion of the variation in the values of y that is determined by x. To convert this to a more tangible example, imagine that x and y are taken to refer, respectively, to site size and artefact count (so that artefact count is acting as the dependent variable). A correlation coefficient of 0.96 converts to a coefficient of determination, r², of 0.92 (0.96² ≈ 0.92). This indicates that about 92 per cent of the variation in artefact count can be explained simply by site size. Note that it would be acceptable to turn this around and recompute the regression using site size as the dependent variable (i.e. as y), if the purpose of the analysis was to predict site size on the basis of artefact count. Two quantities of the line of regression, its slope (b), which defines the rate of change, and the point at which the line crosses the y-axis (a, called the intercept),

Table 8.1 Sample x- and y-values and the calculations for deriving r in Eq. (8.1) and Fig. 8.2

   x       y     x - x̄     y - ȳ    (x - x̄)(y - ȳ)   (x - x̄)²     (y - ȳ)²
  0.17    365    -0.32    -563.15        180.21        0.102     317137.92
  0.21    401    -0.28    -527.15        147.60        0.078     277887.12
  0.22    243    -0.27    -685.15        184.99        0.073     469430.52
  0.25    502    -0.24    -426.15        102.28        0.058     181603.82
  0.30    580    -0.19    -348.15         66.15        0.036     121208.42
  0.32    780    -0.17    -148.15         25.19        0.029      21948.42
  0.33    602    -0.16    -326.15         52.18        0.026     106373.82
  0.40    702    -0.09    -226.15         20.35        0.008      51143.82
  0.41    440    -0.08    -488.15         39.05        0.006     238290.42
  0.48    900    -0.01     -28.15          0.28        0.000        792.42
  0.50    832     0.01     -96.15         -0.96        0.000       9244.82
  0.56   1023     0.07      94.85          6.64        0.005       8996.52
  0.58   1100     0.09     171.85         15.47        0.008      29532.42
  0.62    890     0.13     -38.15         -4.96        0.017       1455.42
  0.65   1400     0.16     471.85         75.50        0.026     222642.42
  0.69   1480     0.20     551.85        110.37        0.040     304538.42
  0.72   1435     0.23     506.85        116.58        0.053     256896.92
  0.76   1542     0.27     613.85        165.74        0.073     376811.82
  0.81   1703     0.32     774.85        247.95        0.102     600392.52
  0.82   1643     0.33     714.85        235.90        0.109     511010.52
Mean  0.49  928.15
Sum   9.80   18563                      1786.51        0.849    4107338.50

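are collectively referred to as the regression constants and are given by:

$$ b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \qquad (8.2) $$

and:

$$ a = \bar{y} - b\bar{x} \qquad (8.3) $$

The slope and intercept values allow for predictions to be made for y for any given value of x, as given by:

$$ \hat{y} = a + bx \qquad (8.4) $$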
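The calculations behind Table 8.1, Fig. 8.2 and the prediction example that follows can be checked with a short Python sketch. It is a plain, standard-library implementation of the least-squares formulae in this section (Pearson's r of Eq. 8.1, the slope b and intercept a, the prediction of Eq. 8.4 and the standard error of prediction introduced below); the function name `fit_line` is our own. Because it works with unrounded deviations, the slope comes out at about 2103.7 rather than the 2104.2 obtained from the rounded column totals in Table 8.1, while the prediction for a 0.9 ha site (about 1791 artefacts) and its standard error (about 139) agree with the worked example to within rounding.

```python
import math

# x = site size (ha), y = artefact count: the 20 pairs listed in Table 8.1
x = [0.17, 0.21, 0.22, 0.25, 0.30, 0.32, 0.33, 0.40, 0.41, 0.48,
     0.50, 0.56, 0.58, 0.62, 0.65, 0.69, 0.72, 0.76, 0.81, 0.82]
y = [365, 401, 243, 502, 580, 780, 602, 702, 440, 900,
     832, 1023, 1100, 890, 1400, 1480, 1435, 1542, 1703, 1643]


def fit_line(x, y):
    """Least-squares fit returning r, slope b, intercept a and the standard
    error of prediction sqrt(sum of squared residuals / (n - 2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ss_res / (n - 2))
    return r, b, a, se


r, b, a, se = fit_line(x, y)
y_hat = a + b * 0.9  # Eq. 8.4 for a 0.9 ha site
print(f"r = {r:.2f}, r^2 = {r * r:.2f}, b = {b:.1f}, a = {a:.1f}")
print(f"prediction for 0.9 ha: {y_hat:.0f} +/- {se:.0f}")
```

In practice the same figures come from any statistical package; writing the formulae out makes it easier to see which sums in Tables 8.1 and 8.2 feed into each quantity.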



[Fig. 8.2: regression line y = 2104.2x - 102.93, r² = 0.92]

Fig. 8.2 A line of regression fitted to the x- and y-values in Table 8.1, shown with the coefficient of determination and the regression equation.

For example, if Fig. 8.2 showed the correlation between the size of a site in hectares (x) and the number of artefacts recovered from surface collection (y), with a = -102.93 and b = 2104.2, then it would be possible to predict the number of artefacts from a site of 0.9 ha by substituting these values into Eq. 8.4: ŷ = (-102.93) + 2104.2 × 0.9 = 1790.9. Predictions also have an associated standard error, calculated as
$$ s = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}} \qquad (8.5) $$


where ŷ is the predicted value of the dependent variable. Using the data from Table 8.2, the standard error of the prediction is the square root of 348 999.1 divided by 18 (i.e. n - 2), which equals 139.2. One standard error is roughly equivalent to 68 per cent of observations if the residuals are normally distributed. In this example, assuming that this is the case, the prediction for the number of artefacts on a site 0.9 ha in size is 1790.9 ± 139.2, which has a 68 per cent probability of being correct. If greater accuracy is needed, then doubling the standard error to ±278.4 provides a 95 per cent probability of being correct.

Table 8.2 Data for the calculation of standard error for the regression analysis of the variables given in Table 8.1

    y        ŷ       y - ŷ     (y - ŷ)²
   365     254.8     110.2      12147.6
   401     339.0      62.0       3850.0
   243     360.0    -117.0      13687.6
   502     423.1      78.9       6222.1
   580     528.3      51.7       2669.8
   780     570.4     209.6      43926.3
   602     591.5      10.5        111.2
   702     738.8     -36.8       1350.6
   440     759.8    -319.8     102266.9
   900     907.1      -7.1         50.2
   832     949.2    -117.2      13728.8
  1023    1075.4     -52.4       2748.1
  1100    1117.5     -17.5        306.5
   890    1201.7    -311.7      97140.7
  1400    1264.8     135.2      18279.0
  1480    1349.0     131.0      17169.4
  1435    1412.1      22.9        524.7
  1542    1496.3      45.7       2092.0
  1703    1601.5     101.5      10307.9
  1643    1622.5      20.5        419.7
  Sum                          348999.1

The Pearson correlation coefficient (r) and the coefficient of determination (r²) are included in nearly all computer statistical packages, including Microsoft Excel, SPSS, S-Plus and R. However, the use of Pearson's r for describing relationships can be problematic and it is worth reviewing the major pitfalls into which inexperienced users often stumble. Firstly, it is unwise to assume a causal relationship solely on the basis of an observed correlation. For example, while it may be generally observed that there is a strong positive correlation between the size of an archaeological site and the age of the director of the excavation, there is no causal relationship in either direction between these two variables. In cases where a causal relationship is suspected, it is always worth investigating the possibility that there are intermediary variables (e.g. in the sequence: age → seniority → size of funding grant → size of archaeological site). Secondly, regression analysis depends on a number of assumptions that must be shown to be true before any measured correlation can be shown to be meaningful (cf. Shennan 1988, pp. 139-142). For example, the predictive value of the statistic is dependent on the variation around the line of regression being homoscedastic, or evenly distributed. If this is not the case the variation is described as heteroscedastic.

[Fig. 8.3: regression line fitted to two clusters of points, r² = 0.835]

Fig. 8.3 A line of regression fitted to a heteroscedastic point distribution. There is a strong positive correlation (r = 0.91), but the coefficient of determination (r² = 0.84) is meaningless.

Although a heteroscedastic distribution can be subject to regression and a formula obtained, the results are largely meaningless. Figure 8.3, for example, shows two random clusters of points that individually have an r² of 0 but collectively have an r² of 0.835. The heteroscedastic nature of the distribution, however, means that x has very little predictive value for y. Similarly meaningless results can occur when one or two outliers from a random distribution result in a line of regression with a strong positive or negative value. In these cases, visual inspection of the scatterplot is essential to ensure that the distribution of points is evenly spread along the x- and y-values. If this is not the case and clustering or outliers are evident, then the latter should be removed (and separately accounted for) and clusters investigated separately. Even if obvious outliers are not present, analysis of residuals in the ways described by Shennan (1988, pp. 139-144) can provide considerable insight into the structure of a linear relationship. Thirdly, linear regression attempts to model linear relationships between variables, but a non-linear trend might be apparent in the data, as in Fig. 8.4. When visual inspection of an x, y-plot suggests a non-linear pattern, it is appropriate to transform one or both variables prior to performing a linear regression, especially as this will not reduce the predictive nature of the model. Common transformations include the square, the square root, the natural log or log10 of one or both variables. Experimentation with different transformations is often needed to obtain the optimum correlation coefficient with data that exhibit curvilinear tendencies. Shennan (1988, pp. 135-165) provides a comprehensive discussion of transformations of variables to improve linear regression. Fourthly, one assumption of regression analysis as applied to spatial data is that each of the two observations on the x- and y-variables should be independent and


[Fig. 8.4: left panel, y plotted against x; right panel, y² plotted against x²]

Fig. 8.4 Transforming a variable to improve the correlation coefficient. The scatterplot on the left has a slight curvilinear trend, which can be transformed to a linear trend by squaring the x- and y-variables, as shown on the right. The regression equation for the transformed data therefore becomes y² = a + b × x², or y = √(a + b × x²).
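The trial-and-error search for a suitable transformation can be partly automated. The Python sketch below is illustrative only: it applies the transformations mentioned in the text (square, square root and natural log) to the dependent variable alone, uses invented data that grow exponentially with x, and simply reports which version gives the largest absolute r. The names `TRANSFORMS` and `best_transform` are hypothetical, and all transformations assume positive y-values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient (Eq. 8.1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Candidate transformations of the dependent variable (all assume y > 0)
TRANSFORMS = {
    "none": lambda v: v,
    "square": lambda v: v ** 2,
    "square root": math.sqrt,
    "natural log": math.log,
}


def best_transform(x, y):
    """Try each transformation of y; return the best name, its r and all scores."""
    scores = {name: pearson_r(x, [f(v) for v in y]) for name, f in TRANSFORMS.items()}
    name = max(scores, key=lambda k: abs(scores[k]))
    return name, scores[name], scores


# Hypothetical curvilinear data: y grows exponentially with x
x = list(range(1, 11))
y = [3 * math.exp(0.5 * v) for v in x]
name, r, scores = best_transform(x, y)
for k, v in scores.items():
    print(f"{k:12s} r = {v:.3f}")
print("best:", name)
```

Choosing the transformation with the highest r is only a first screen; the residuals of the transformed regression should still be inspected for heteroscedasticity and outliers as described above.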

not spatially autocorrelated (see Section 8.3), nor should the residuals be spatially autocorrelated (Rogerson 2001, p. 154). If both x and y have high spatial autocorrelation then the variance and standard error of r, which is a function of the number of observations as defined in Eq. (8.5), will be underestimated because Pearson's r assumes that each pair of observations is independent of every other (Haining 2003, p. 279). The correlation coefficient will as a consequence be overestimated. Although an assessment of autocorrelation using methods such as Moran's I can be made on each of the two variables prior to the regression, spatial dependence can also be assessed by mapping the residuals from a regression analysis and visually searching for evidence of spatial autocorrelation (Fotheringham et al. 2000b).

For example, imagine that the relationship between the amounts of prehistoric and medieval pottery from a sample of surface collection areas was being investigated (Table 8.3). There is a null hypothesis of no correlation (i.e. higher or lower amounts of prehistoric pottery have no bearing on how much medieval pottery is recovered, and vice versa). A Pearson's r analysis (Eq. 8.1) returns a value of 0.4 with medieval pottery as the dependent variable, suggesting a slight but definite positive correlation between the two pottery types, perhaps indicating that the prehistoric and medieval sites overlap to a certain degree. A plot of the residuals of y, however, shows considerable positive spatial autocorrelation (Fig. 8.5). High positive deviations between predicted and observed medieval pottery cluster in the centre and upper right, and high negative deviations only appear in the upper left and the bottom of the survey area. This indicates that the observations are not independent. This in turn warns that the results of the

Table 8.3 Counts of prehistoric and medieval pottery recovered from ten surface collection areas

Area   Prehistoric   Medieval   Predicted   Residual
  1         30           2         21.1       -19.1
  2          6           3          8.4        -5.4
  3         56          56         34.9        21.1
  4         42          23         27.5        -4.5
  5         21          45         16.4        28.6
  6         59          12         36.6       -24.6
  7         56          65         34.9        30.1
  8         21          30         16.4        13.6
  9         43           9         28.1       -19.1
 10         33           2         22.7       -20.7
Sum        367         247        247.0         0.0

Fig. 8.5 A plot of the residuals of predicted versus actual medieval pottery. Numbers in the upper left of squares refer to individual surface collection areas as defined in Table 8.3.

correlation analysis are not necessarily valid and that r will be overestimated; thus any judgement of the significance of the correlation becomes problematic. When a plot of the residuals exhibits positive spatial autocorrelation then the regression analysis may be improved either by investigating additional explanatory variables by producing added-variable plots (Haining 1990), by using the technique of spatial regression (Rogerson 2001, pp. 187-188), modelling the residuals as a function of the surrounding residuals (Bailey and Gatrell 1995), or by employing a geographically weighted regression (GWR) technique as developed by Fotheringham and colleagues (Fotheringham et al. 1998). Extensive discussion and many worked examples of GWR are provided in Fotheringham et al. (2002a).
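The Table 8.3 example can be reproduced with a few lines of Python. This sketch fits medieval counts on prehistoric counts by ordinary least squares and recovers an r of about 0.40, with predicted values and residuals that match the table to within 0.1. The grid positions of the collection areas shown in Fig. 8.5 are not listed in the text, so the sketch stops short of mapping the residuals or testing them for autocorrelation; the function name `regression_residuals` is hypothetical.

```python
import math

# Counts from Table 8.3 (surface collection areas 1-10)
prehistoric = [30, 6, 56, 42, 21, 59, 56, 21, 43, 33]
medieval = [2, 3, 56, 23, 45, 12, 65, 30, 9, 2]


def regression_residuals(x, y):
    """Fit y = a + b*x by least squares; return r, predictions and residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    predicted = [a + b * xi for xi in x]
    residuals = [yi - p for yi, p in zip(y, predicted)]  # observed minus predicted
    return sxy / math.sqrt(sxx * syy), predicted, residuals


r, predicted, residuals = regression_residuals(prehistoric, medieval)
print(f"r = {r:.2f}")
for area, (p, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"area {area:2d}: predicted {p:5.1f}, residual {e:6.1f}")
```

Given the area coordinates, the residual list could be passed directly to a Moran's I calculation to quantify the autocorrelation that Fig. 8.5 shows visually.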


Spatial analysis

Alternatively, it is possible to control for the influence of autocorrelated variables on the sample by first establishing the number of spatially independent pairs of observations (n′) from all observations n and using only the former to establish the significance of the correlation (Clifford and Richardson 1985; Haining 2003, pp. 278-279). These and other common problems with regression analysis are summarised in Table 8.4.

8.3 Spatial autocorrelation

The term 'spatial autocorrelation' refers to the degree of correlation between pairs of observed values and the distance between those observations in spatial distributions (Cliff and Ord 1981). Positive spatial autocorrelation describes a state where attribute values exhibit a tendency to be more similar the closer they are together (e.g. elevation, where the closer two sample points are together, the more likely they are to share a similar elevation). If there is no apparent relationship between spatial proximity and attribute value, then the distribution exhibits zero spatial autocorrelation. Negative spatial autocorrelation occurs when similar attribute values are located away from each other (Worboys 1995, pp. 157-158). Having an understanding of the spatial autocorrelation of a data distribution provides important supporting information for certain types of modelling procedures. In particular, the linear regression of autocorrelated data is problematic for reasons described in the previous section. On the other hand, interpolation is only a valid exercise for data with some degree of positive autocorrelation. While this can be assumed for many environmental phenomena, such as elevation, rainfall, temperature, etc., it cannot be assumed for anthropogenic data. Creating a continuous surface of artefact densities from, for example, a sample of test pits is only useful if the sample data show some degree of positive autocorrelation (which is assessed within the technique of kriging as described in Chapter 6). Thus interpolation methods that incorporate measures of autocorrelation into their procedures, such as the technique of kriging described in Chapter 6, typically produce continuous surfaces that are more accurate than other methods. More generally, there has been some optimism that measures of spatial autocorrelation may have wider application in archaeology (Williams 1993), but thus far the most successful applications have been constrained to the analysis of Maya terminal monument dates (Premo 2004). The most common method of measuring autocorrelation is using Moran's I statistic (Moran 1950):






$$ I = \frac{n \sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left( \sum_i \sum_j w_{ij} \right) \sum_i (x_i - \bar{x})^2} \qquad (8.6) $$

Table 8.4 Some common problems, consequences and solutions with regression analysis

Problem                      Consequences                                               Diagnostic                      Corrective action
Residuals non-normal         Inferential test is likely to be invalid                   Shapiro-Wilk test (Chapter 7)   Transform y-values
Heteroscedastic              Biased estimation of error variance and invalid inference  Plot of residuals against y     Transform y-values
Non-independent variables    Underestimation of variance and invalid inference          Moran's I                       GWR, added-variable plots, spatial regression
Non-linear relationship      Poor fit and non-independent residuals                     Scatterplot                     Transform y- and/or x-variable
Outliers                     Can severely affect model estimates and fit                Scatterplot                     Delete outliers
Non-interval or ratio data   Linear regression not valid                                -                               Logistic regression

Source: Adapted from Haining (1990, pp. 332-333); Rogerson (2001, p. 146).

where subscripts i and j refer to the spatial objects, of which there are n, x̄ is the mean of all attributes and w_ij is a weighting function to reduce the impact of distant points. If the variable of interest x is first transformed to a z-score, z = (x - x̄)/s, where s is the sample standard deviation, then the formula can be simplified to (Rogerson 2001, p. 167):


$$ I = \frac{n \sum_i \sum_j w_{ij} z_i z_j}{(n - 1) \sum_i \sum_j w_{ij}} \qquad (8.7) $$



The weighting function w_ij is most often an inverse distance measure (1/d_ij, where d_ij is the distance between objects i and j). For area data a measure of binary connectivity, where w_ij = 1 if i and j are adjacent and w_ij = 0 if not, is frequently used instead (Rogerson 2001, p. 167). The expected value of Moran's I, if there is no spatial autocorrelation, is defined by:

$$ E(I) = -\frac{1}{n - 1} \qquad (8.8) $$

Values of I larger than E(I) indicate positive autocorrelation and values lower than E(I) indicate negative autocorrelation (Fotheringham et al. 2000a). The statistical significance of any departure from this expected value can be tested using an assumption of normality (i.e. that the values of x_i are drawn from a normal population). In cases where n is 'large' the standardised statistic

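$$ Z(I) = \frac{I - E(I)}{\sqrt{\operatorname{Var}(I)}} \qquad (8.9) $$

can be used, where the variance of I under an assumption of normality is

$$ \operatorname{Var}_N(I) = \frac{n^2 (n-1) S_1 - n(n-1) S_2 + 2(n-2) S_0^2}{(n+1)(n-1)^2 S_0^2} \qquad (8.10) $$

with

$$ S_0 = \sum_i \sum_j w_{ij}, \qquad S_1 = \frac{1}{2} \sum_i \sum_j (w_{ij} + w_{ji})^2, \qquad S_2 = \sum_i \Big( \sum_j w_{ij} + \sum_j w_{ji} \Big)^2 \qquad (8.11) $$

Less restrictively, an assumption of randomisation can be used (i.e. where the observed I is compared to the I expected if x_i were randomly distributed) (Hodder and Orton 1976, p. 178). In this case, the variance of I is given by

$$ \operatorname{Var}_R(I) = \frac{n S_4 - S_3 S_5}{(n-1)(n-2)(n-3) S_0^2} - E(I)^2 $$

where

$$ S_3 = \frac{n^{-1} \sum_i (x_i - \bar{x})^4}{\left( n^{-1} \sum_i (x_i - \bar{x})^2 \right)^2}, \qquad S_4 = (n^2 - 3n + 3) S_1 - n S_2 + 3 S_0^2, \qquad S_5 = (n^2 - n) S_1 - 2n S_2 + 6 S_0^2 $$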


The variance can then be used in (8.9) to calculate a Z-value, which can then be compared to a normal distribution for significance (Fotheringham et al. 2000b, p. 204). In cases where n is 'small' it may be necessary to simulate the parameters of I using Monte-Carlo methods (see Box 8.1), for which Fotheringham et al. (2000b, pp. 204-209) provide a worked example.
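A direct implementation of Moran's I, its expected value and the Z-value under the normality assumption is sketched below in Python. It assumes binary 'rook' contiguity weights on a regular grid (w_ij = 1 for cells sharing an edge, 0 otherwise), uses two invented 4 x 4 patterns as test data, and does not implement the randomisation variance or a Monte-Carlo test; the helper names `rook_weights` and `morans_i` are our own. For real datasets, the dedicated modules mentioned below are preferable.

```python
import math


def rook_weights(rows, cols):
    """Binary contiguity weights for a rows x cols grid: w[i][j] = 1 when
    cells i and j share an edge, otherwise 0 (cells numbered row by row)."""
    n = rows * cols
    w = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[i][rr * cols + cc] = 1
    return w


def morans_i(x, w):
    """Return Moran's I, its expected value E(I) and Z under normality."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    i_stat = (n / s0) * num / sum(v * v for v in d)
    e_i = -1.0 / (n - 1)
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    var_n = ((n * n * (n - 1) * s1 - n * (n - 1) * s2 + 2 * (n - 2) * s0 ** 2)
             / ((n + 1) * (n - 1) ** 2 * s0 ** 2))
    z = (i_stat - e_i) / math.sqrt(var_n)
    return i_stat, e_i, z


w = rook_weights(4, 4)
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
halves = [0 if c < 2 else 1 for r in range(4) for c in range(4)]
print(morans_i(checkerboard, w))  # negative I: every neighbouring pair differs
print(morans_i(halves, w))        # positive I: like values adjoin
```

The checkerboard gives I = -1 and a large negative Z; the two-halves pattern gives I of about 0.67 and Z of about 4.1, well beyond 1.96. With only 16 cells the normal approximation is itself questionable, which is where a Monte-Carlo approach becomes useful.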

Box 8.1 Monte-Carlo simulation


Monte-Carlo simulation predates the rise of computing, but it only really became an important form of statistical sampling in the second half of the last century, particularly in physics (Robert and Casella 2004). More recently it has emerged as an increasingly important method of statistical sampling in the social sciences, as it provides a way of estimating the parameters of complex populations. Monte-Carlo simulation thus has an important role to play in GIS. The basis of the technique is common to that of statistical sampling: that a random sample of individuals from a population will show some correspondence to the population parameters, and thus the latter can be estimated from the sample. In many cases where populations are large and potentially diverse it is unclear how a random sample should be generated, how many samples should be taken or whether any given random sample is at all representative of the population. Monte-Carlo simulation reduces this uncertainty by taking repeated random samples (often 1000 or more). It is then possible to examine the distribution of values of some statistic (usually the mean) across the samples. For example, a common archaeological GIS problem is whether the average viewshed size of a sample of archaeological sites is different from the average viewshed size of the background landscape. Viewshed calculations are computationally intensive, and thus even for a modestly sized study area it is unrealistic to attempt to establish the viewshed size for every cell. Taking a random sample of points in the landscape and comparing this with the archaeological sample is the only realistic option, but then it may not be clear whether the random sample is representative of the background population. A Monte-Carlo simulation approach to this problem will take several random samples and so provide a better estimate of the population parameters.
A common starting point for Monte-Carlo simulation is to take 1000 simple random samples, each consisting of 1000 individuals, and then to average the results. Note that this is computationally equivalent to drawing 1 000 000 individuals, which may exceed the population size! In cases like this it is acceptable to reduce the size and number of the samples; for example, Lake and Woodman (2000) used 100 random samples of 30 locations to estimate the parameters of the viewshed characteristics of their study area (see Chapter 3). A result is significant if the statistic for the sites falls on the edge of, or outside, the range of values of the statistics for the non-site samples. Obviously this is not a task that can be performed manually; ideally it requires a simple program that selects the random samples, calculates their viewshed size and stores the results in a log file. Repetitive tasks like these are relatively easy to program in, for example, Visual Basic for ArcGIS. Alternatively, sets of random x, y-locations can be generated in a spreadsheet or statistical package such as R, saved as text files, and then imported into the GIS to be used as the basis for the calculations.
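The procedure in Box 8.1 can be outlined in Python. In this sketch the 'landscape' and 'site' viewshed sizes are randomly generated stand-ins (a real analysis would compute the viewsheds in the GIS), the sample design mirrors the 100 samples of 30 locations used by Lake and Woodman (2000), and significance follows the rule given in the box: the site mean must fall on the edge of, or outside, the range of the random-sample means. The function name `monte_carlo_test` is hypothetical.

```python
import random
import statistics


def monte_carlo_test(site_values, background, sample_size, n_samples, seed=None):
    """Compare the mean of a statistic (e.g. viewshed size) at sites with the
    means of repeated simple random samples of the background landscape.

    Returns (site_mean, lowest_sample_mean, highest_sample_mean, significant);
    'significant' applies the rule in Box 8.1: the site mean falls on the edge
    of, or outside, the range of the random-sample means."""
    rng = random.Random(seed)
    sample_means = [statistics.mean(rng.sample(background, sample_size))
                    for _ in range(n_samples)]
    site_mean = statistics.mean(site_values)
    lo, hi = min(sample_means), max(sample_means)
    significant = site_mean <= lo or site_mean >= hi
    return site_mean, lo, hi, significant


# Stand-in data: a real analysis would compute these viewsheds in the GIS.
gen = random.Random(1)
landscape = [gen.randint(0, 999) for _ in range(5000)]  # one value per cell
sites = [gen.randint(700, 999) for _ in range(30)]      # 30 site locations

# 100 random samples of 30 locations, as in Lake and Woodman (2000)
result = monte_carlo_test(sites, landscape, sample_size=30, n_samples=100, seed=42)
print(result)
```

A common alternative to the range rule is to report the proportion of random-sample means that equal or exceed the site mean, which gives an empirical p-value rather than a yes/no answer.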



In other applications, such as spatial analysis, Monte-Carlo techniques are used to establish what a random distribution actually looks like. This is important in Ripley's K (this chapter), where a known distribution of points must be compared to a random distribution to establish whether or not it is distinctive (i.e. clustered or regular). Although still computationally intensive, 1000 samples of 1000 points is a good starting point as it increases the confidence that the characteristics of a random sample are accurately estimated. Robert and Casella (2004) provide a good introduction to Monte-Carlo methods.

Although Moran's I can be calculated by hand, it soon becomes unwieldy when there are more than ten or so objects. It is much easier to use a statistical computer package, particularly when computing the significance of the statistic (as the previous formulae might suggest!). Modules for calculating Moran's I are readily available for many popular GIS packages (e.g. Spatial Statistics for ArcView (Monk 2001), r.moran for GRASS and AUTOCORR for Idrisi) and are included in dedicated geostatistical packages such as GS+.1 It is also included in the freely available CrimeStat2 spatial statistics program (Levine 2002). The degree of positive spatial autocorrelation in a spatial dataset may assist with the interpretation of certain anthropogenic phenomena. For example, the spatial structure of 'event horizons' such as the spread of agriculture (Sokal et al. 1999; Gkiasta et al. 2003) or the collapse of the Classical Mayan state (Kvamme 1990d; Williams 1993; Neiman 1997) depends to a large extent on establishing positive autocorrelation between the dates and locations at which the event is first observed. A recent use of measures of spatial autocorrelation in archaeology can be found in Premo (2004), who applies autocorrelation statistics to contextual terminal dates of Mayan sites within local neighbourhoods to provide further insight into the Mayan 'collapse'. While examples such as these do depend on adequately sampled data,
th^ere are ceftainly many more potential archaeological applicationsof the analysi. ot autocorrelation than haveyet beenrealised. 8.4 Cluster analysis Archaeologists frequently usepoints to represent the location of afiefacts,feature. andsites.The analysisofpoint distributionpattemsis thereforean importanttool tor describing,interpretingand explaining the spatialcharacteristics of theseohenom_ ena. Point distribution patternsare often describedin terms of their configurarior vis-a-visthreeidealisedstates - namely random,clusteredor regular(Fig. g.6a_c, In reality, spatialarrangements, whetherartefactuar or settlement, can rarelv be s.. simply described.Analysis of distributionpattemsneedsto be sensitiveto r.he frc:
¹ www.gammadesign.com/
² www.icpsr.umich.edu/NACJD/crimes




Fig. 8.6 Idealised point distributions: (a) nearly random, (b) nearly clustered, (c) nearly regular.

that several different smaller-scale patterns may exist within a study area and that different types of patterning often exist at different spatio-temporal scales. In the case of settlement pattern analysis, regular spacing of sites has been taken to reflect either a form of competition between settlements, the existence of site catchments, or a combination of both as a result of demographic growth from an initial random distribution (Hodder and Orton 1976, pp. 54–85; Perlès 2001, pp. 132–147). Clustering of sites may result from a number of factors, but the localised distribution of resources and the emergence of polities or regional centres have often been highlighted (Roberts 1996, pp. 15–37; Ladefoged and Pearson 2000). In contrast, random distributions have usually been treated as the statistical null hypothesis, though several commentators provide good examples of how apparently random distributions can be conditioned by less-obvious environmental, biological and social variables (Maschner and Stein 1995; Woodman 2000b; Daniel 2001). In general terms, the interpretation of archaeological settlement distributions is in need of new theory building coupled with renewed empirical and experimental investigation. Recent work by Premo (2004) on Mayan site distribution provides a good example of such an approach.

One major issue in the analysis of point distributions is the effect that the size of the study area has on the detection and characterisation of patterning. Figure 8.7 shows how adjusting the scale of analysis has a major influence on the homogeneity, intensity and clustering tendencies of point distributions. In the entire study area, A1, the pattern is homogeneous with a clustered structure (i.e.
clustering occurs relatively evenly) such that a frequency distribution of the distances to each point's nearest neighbour would be normally distributed. At smaller scales, for example in area A2, the pattern is heterogeneous with a strong left-to-right gradient. A neighbourhood density function would be positively skewed with a bimodal tendency. Area A3 is similarly heterogeneous, although its density value is significantly lower than that of A2. Area A4 has a high-intensity and homogeneous distribution, although here it is far more regular than seen elsewhere.

The dichotomy of dispersion vs. nucleation created by many point-based analyses provides only a coarse characterisation of human settlement patterns. In particular,



Fig. 8.7 Several smaller-scale point patterns are apparent in the near-random distribution of points in area A1. For example, areas A2 and A3 can be described as containing clustered distributions, while A4 is better described as regular.

analyses that are sensitive at only one scale (like nearest neighbour analysis) may overlook more complex, multiscalar, spatial patterns.

8.4.1 Nearest neighbour analysis

A favourite, if now old-fashioned, technique used by archaeologists to analyse point distributions is nearest neighbour analysis. Clark and Evans (1954) first explored the utility of nearest neighbour analysis in ecology (Box 8.2), and the New Geographers were soon applying it to human settlement patterns (Dacey 1960; Haggett 1965). Its use for archaeological settlement pattern analysis followed some time later in the early 1970s (Hodder and Hassell 1971a; Clarke 1972; Hodder 1972; Whallon 1974; Washburn 1974; Hodder and Orton 1976) and continued through the 1980s and 1990s. The technique retains its prominence in archaeology both in general textbooks (e.g. Wheatley and Gillings 2002) and in culturally specific studies (e.g. Perlès 1999; Ladefoged and Pearson 2000). Its popularity is a product of two factors: it is straightforward to calculate (see Box 8.2) and it provides an easily interpreted coefficient.

There are, however, several significant limitations to nearest neighbour analysis. It was initially designed to detect spatial patterning between first nearest neighbours, and thus is not suited to identifying multiscalar effects. For example, Fig. 8.8 shows a hypothetical distribution of a number of sites represented by points. A single-order nearest neighbour analysis applied to the point distribution in the left panel would detect the presence of clusters, and a k-means statistic (described below) could be employed to show that the optimal number of clusters was probably eight. However, neither of these analyses would be able to identify the fact that there is also a higher-order scale producing three clusters. Furthermore, if we include the finer artefact-scale resolution represented in the right panel (rather than just an approximation of the centre of the artefact distribution), then clustering can be shown to exist at three different spatial scales:

Box 8.2 Clark and Evans' nearest neighbour statistic


This useful but problematic statistic is calculated by dividing the mean of the observed distance between each point and its nearest neighbour (denoted by $R_o$) by the expected value of this distance if the distribution were random ($R_e$). The latter is estimated using the equation:

$$R_e = \frac{1}{2\sqrt{\lambda}}$$



where λ is the density of points in the study area (i.e. the mean intensity of points, as given by Eq. 8.22). The ratio of $R_o$ to $R_e$ (denoted by R) provides the statistic:

$$R = \frac{R_o}{R_e}$$




If $R_o$ and $R_e$ are equal (in other words, the observed mean nearest neighbour distance is equivalent to that predicted if the distribution were random), then their ratio (R) will be equal to 1. In a clustered distribution, the mean distance between points will be less than when they are randomly distributed. Thus an R-value less than 1 indicates a clustered distribution. If R is greater than 1 (up to its theoretical maximum of 2.15), this indicates that the points are more regularly spaced. The significance of R is dependent on the sample size and density of the point distribution. It is known that the variance of the mean distance between neighbours in a random distribution is

$$\operatorname{Var}(R_o) = \frac{4 - \pi}{4\pi\lambda n}$$


where n is the number of points and λ is the mean intensity of points (Rogerson 2001, p. 162). As we can estimate the variance, a z-test can be used to test the null hypothesis of random distribution:

$$z = \frac{R_o - R_e}{\sqrt{\operatorname{Var}(R_o)}}$$
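The calculation in this box can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the routine of any particular GIS package: the function name `clark_evans` is hypothetical, nearest neighbours are found by brute force (adequate for hundreds of points, slow for many thousands), and no edge correction is applied, so the study-area effects discussed in the main text apply in full.

```python
import math

# Hypothetical helper, not a library API: Clark and Evans' R and its z-score.
def clark_evans(points, area):
    """Return (R, z) for a sequence of (x, y) points in a region of the given area."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    lam = n / area                                   # mean intensity, lambda
    # Observed mean nearest-neighbour distance R_o (brute-force search, no edge correction)
    r_o = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    r_e = 1.0 / (2.0 * math.sqrt(lam))               # expected distance under randomness
    var_r_o = (4.0 - math.pi) / (4.0 * math.pi * lam * n)
    z = (r_o - r_e) / math.sqrt(var_r_o)
    return r_o / r_e, z


# A perfectly regular 10 x 10 grid of points, 1 unit apart, in a 10 x 10 area
grid = [(x, y) for x in range(10) for y in range(10)]
R, z = clark_evans(grid, 100.0)
print(f"R = {R:.2f}, z = {z:.1f}")   # prints: R = 2.00, z = 19.1
```

The grid gives R = 2 and a large positive z, i.e. significant uniformity; a tightly clustered set of points gives R well below 1 and a z-value below −1.96. `math.dist` requires Python 3.8 or later.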


Tables of the standard normal distribution can be used to assess significance: z-values of 1.96 or greater indicate significant uniformity, and values of −1.96 or lower indicate a significant tendency towards clustering.

(i) artefacts forming sites (clusters i–x); (ii) sites forming primary clusters (clusters 1–8); and (iii) primary clusters forming secondary clusters (clusters A–C). Increasing the nearest neighbour measurement to the second, third, ..., nth neighbour may detect clustering at different scales, but the statistical validation of patterning then becomes difficult (Hodder and Orton 1976, p. 41). Nearest neighbour analysis is also significantly influenced by the size of the area to be analysed,








Fig. 8.8 Multiscalar patterning: three large clusters (A, B, C) are each composed of smaller clusters (1–8), which themselves consist of smaller clusters (e.g. i–x for cluster 3).

with regular, random or clustered distributions partially dependent on the shape of the study area. The size of the surrounding area included in the analysis can also significantly influence the identification of clustering: the greater the amount of empty space surrounding a central distribution of random points, the more likely it is that the pattern will be identified as clustered. There are 'workarounds' for these problems, but the technique nevertheless remains a somewhat blunt instrument with which to describe point distribution patterns.

8.4.2 Ripley's K

As GIS-led approaches to the collection and management of archaeological survey data are able to store data at several different scales within the same environment (e.g. artefacts, sites and regions), more sophisticated and spatially sensitive techniques are required to identify and characterise distribution patterns. One technique that addresses some of the inherent problems of nearest neighbour analysis is Ripley's K-function (Ripley 1976, 1981). The technique was designed to identify the relative aggregation and segregation of point data at different spatial scales, and the shape of the study area has little effect on the assessment of patterning. The statistic is defined for a point process of intensity λ, where λK(r) is the expected number of neighbours in a circle of radius r around an arbitrary point in the distribution (Pélissier and Goreaud 2001, p. 101). The K-distribution is a cumulative frequency distribution of average point intensity at set intervals of r. Significance intervals are generated by Monte-Carlo simulation of random distributions of the points, and a 95 per cent confidence interval can usually be obtained within 1000–5000 iterations (Manly 1991). These estimates can be compared with the observed values of K to provide a statistically robust measure of cluster size and cluster distance in the dataset.

Fig. 8.9 Identification of multiscalar clustering in the Kytheran Early Bronze Age using Ripley's K. The presence of clusters of settlements between 500 and 700 m is attested by the peak at that position on the x-axis, with significant but less obvious clustering occurring between 700 and 1250 m (for details see Bevan and Conolly in press).

For clarity of presentation the cumulative K-distribution is usually transformed to $L(r) = \sqrt{K(r)/\pi} - r$, where the expectation under randomness (L(r) = 0) is a horizontal line (Fig. 8.9). L(r) < 0 means that there are fewer neighbours than expected at distance r, suggesting a regular pattern, and L(r) > 0 means that there are more neighbours than expected at distance r, indicating a clustered pattern (Pélissier and Goreaud 2001, p. 102). Ripley's K is available within comprehensive statistical packages such as R, as a module in the freely available spatial statistics package ADE-4³ (Thioulouse et al. 1997) and within the third-party Spatial Statistics extension for ArcView⁴ (Monk 2001). Although it is more complex than the Clark and Evans nearest neighbour statistic, and it may take longer to calculate because of the need to simulate the parameters of a random distribution, Ripley's K offers a much better route to investigating the spatial structure of point patterns. Bevan and Conolly (in press), for example, have used the technique to investigate the changing nature of settlement
³ http://pbil.univ-lyon1.fr/ADE-4/
⁴ http://arcscripts.esri.com/



patterns on the island of Kythera. They were able to show not only the presence of clustering during different phases of settlement on the island, but also how both the structure of the distribution and the size of clusters of settlements changed over time, reflecting different settlement strategies.

8.5 Identifying cluster membership

If clustering is identified in a distribution of point data, the second phase of analysis usually consists of defining the number and location of those clusters. There are a number of techniques that may be used to achieve this objective, which can be divided into three groups depending on whether they use hierarchical (Sokal and Sneath 1963; Sneath and Sokal 1973), partitioning (Ball and Hall 1970) or density (Silverman 1986) methods for cluster definition. Hierarchical methods start with individual objects and progressively group them into fewer higher-order clusters, so that eventually all objects assume membership of one group. Partitioning methods begin with the complete distribution and break it into a number of smaller units, while density approaches identify dense concentrations of objects. In this section we describe three approaches: hierarchical cluster analysis, k-means partitioning and density analysis.

8.5.1 Hierarchical cluster analysis

This approach to clustering has a very long history of application in archaeology. It works by creating a 'distance matrix' between objects based on their attribute states, and as applied to spatial data the linear distances between all points in the dataset form the basis of the matrix. For example, the relationships between the points in Fig. 8.10 can be converted to a distance matrix (Table 8.5), which can be used to construct a dendrogram defining the hierarchical grouping of individual points (Fig. 8.11).

The construction of the dendrogram begins in an agglutinative manner by defining each separate point as a group unto itself, then locating the pair that possesses the smallest distance.
In this example, this occurs between points 1 and 5, which are 0.7 units apart. The pair of points with the second smallest distance (7 and 9) are then grouped together. Individual points and grouped pairs may link to existing groups on the basis of a specified rule, which defines the cluster method and ultimately the shape of the resulting dendrogram. For example, one of the simplest methods, called Single-Link Cluster Analysis (SLCA), joins points to groups or groups to groups on the basis of a shared level of similarity between any member of the two groups. In this scenario, point 10 would therefore link to the pair (7, 9), and point 4 would link to the pair (1, 5). Additional points are then progressively joined, leading to the dendrogram depicted in Fig. 8.11.

Examination of Fig. 8.11 provides some indication of the spatial structure of the point distribution in Fig. 8.10. Two major clusters can be distinguished, one consisting of points 8, 10, 7 and 9, and a second consisting of 6, 3, 2, 4, 1 and 5. Within the first group, point 8 stands out from the main distribution, while 6 sits as an outlier in the second group. With such a small distribution the additional insight

Table 8.5 A distance matrix for hierarchical cluster analysis


7.8 2 7.3 3 4 4.1 5 0.7 6 7.3 7 13.6 8 12.7 9 14.4 10 16.6

4.3 3.7 4.5 7.0 6.1 3.4 12.1 13.7 9.3 19.7 20.7 16.6 19.8 20.0 16.3 20.7 21.6 17.5 22.8 23.8 19.7

1.5 14.0 13.3 14.9 17.1

7.7 8.8 3.8 8.7 1.0 3.7 10.8 3.1 5.2 2.2



Fig. 8.10 A simple point distribution.

that this brings over visual assessment of the distribution is marginal, although hierarchical analysis applied to larger datasets may help make sense of more complex arrangements. However, a major problem with SLCA is that links between groups are created very easily, so long chains of small clusters often result, and outliers are connected to other clusters on the basis of a connection with only one member. Better methods, such as Average-Link Cluster Analysis, use the average similarity scores of groups to define the level at which additional members cluster. Even more sophisticated approaches to group definition offer further advantages, such as Ward's Method, which seeks to maximise the homogeneity of clusters by merging, at each step, the pair of clusters whose fusion gives the smallest increase in the error sum of squares (ESS), that is, the sum of the squared distances of all points from the means of the clusters to which they belong.
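The agglomerative procedure and the three linkage rules described above can be sketched as follows. This is a deliberately naive, hypothetical implementation (`agglomerate` is not a library function) that recomputes every between-group cost at each step, so it is only practical for small examples such as Fig. 8.10. The single-link cost is the smallest between-group distance, the average-link cost is the mean between-group distance, and the Ward cost is the increase in ESS caused by a merge, n_a n_b / (n_a + n_b) multiplied by the squared distance between the two group centroids. For real datasets an established implementation, such as `scipy.cluster.hierarchy.linkage` in SciPy, is preferable.

```python
import math
from itertools import combinations

# Hypothetical sketch of agglomerative clustering; not a library API.
def agglomerate(points, method="single"):
    """Return the merge history as (group_a, group_b, level) tuples.

    Groups are sorted lists of point indices; level is the linkage cost.
    """
    def centroid(group):
        return tuple(sum(points[i][k] for i in group) / len(group)
                     for k in range(len(points[0])))

    def cost(a, b):
        if method == "single":    # closest pair of members
            return min(math.dist(points[i], points[j]) for i in a for j in b)
        if method == "average":   # mean of all between-group distances
            return sum(math.dist(points[i], points[j])
                       for i in a for j in b) / (len(a) * len(b))
        if method == "ward":      # increase in the error sum of squares
            na, nb = len(a), len(b)
            return na * nb / (na + nb) * math.dist(centroid(a), centroid(b)) ** 2
        raise ValueError(f"unknown method: {method}")

    groups = [[i] for i in range(len(points))]   # every point starts as its own group
    merges = []
    while len(groups) > 1:
        # Find the pair of groups with the lowest linkage cost and fuse them
        ia, ib = min(combinations(range(len(groups)), 2),
                     key=lambda pair: cost(groups[pair[0]], groups[pair[1]]))
        a, b = groups[ia], groups[ib]
        merges.append((sorted(a), sorted(b), cost(a, b)))
        groups = [g for k, g in enumerate(groups) if k not in (ia, ib)] + [a + b]
    return merges


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
for a, b, level in agglomerate(pts, "single"):
    print(a, b, round(level, 2))
```

The merge levels are the heights at which branches join in a dendrogram; cutting the tree at a chosen level defines the clusters, which is exactly the arbitrary choice discussed in the following paragraph.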



Fig. 8.11 A single-link cluster analysis of the point distribution in Fig. 8.10.

Clusters created using Ward's Method are typically easier to interpret than those from single- or average-link methods because long chains of small groups occur less frequently, resulting in more homogeneous clusters and a more easily interpreted dendrogram. For example, Ward's (1990) method was used to construct the dendrogram in Fig. 8.12. The results are arguably better than the results from SLCA, in that 3, 2 and 4 form a cluster distinct from 1 and 5, with 6 as an outlier from the latter pair. Shennan (1988, pp. 212–232) provides a comprehensive review of several hierarchical clustering methods. In general terms, the advantage of hierarchical methods is that it is possible to view clustering at a range of different scales, beginning with small groups of two or more objects and building eventually into larger clusters that include many smaller groups. This advantage, however, can then present difficulties when deciding exactly how many clusters may exist in a dataset, as the number of clusters is defined by an arbitrary level of similarity (or error sum of squares in the case of Ward's method) chosen by the analyst. For this reason it is often necessary to avoid dependency on a hierarchical method and instead to use complementary statistics, such as k-means analysis, to help ascertain the optimum number of clusters in a dataset. Most popular statistical packages, such as SPSS, S+ and R, offer a range of hierarchical clustering methods.

8.5.2 k-Means analysis

When the number of objects is large (e.g. > 100), the dendrogram produced from a hierarchical cluster analysis is not easily interpreted and it becomes difficult to ascertain the level of similarity at which cluster groups should be defined. Better are methods that allow the desired number of clusters to be specified beforehand, so that a range of solutions can be compared and an optimal solution chosen. k-Means analysis is one such method. The major difference between k-means and hierarchical


clustering is that k-means is a partitioning clustering technique: instead of grouping similar objects together, it divides up a group of objects into a specified number of clusters. Cluster centres are initially defined by the selection of random points from the distribution, which act as 'seeds', and the remaining objects are then added to the cluster to which they are nearest. As new objects are added to a cluster, the centre of the cluster is recalculated, and if a previously assigned object now lies closer to another cluster centre it is reassigned. This iterative reallocation is a major strength of the k-means technique, although because the seeds that define the clusters are random, different optimum solutions may result and solutions may not be replicable. Once all objects have been allocated, each cluster's sum of squared Euclidean distances (i.e. the squared distance between each object and the centre of its cluster) is calculated to provide an assessment of the clustering solution.

One way to determine the optimum number of clusters is to examine the rate of decrease in the total sum of squared distances over an increasing number of cluster solutions. As the number of clusters increases towards the number of points in the distribution, the sum of squares reduces towards 0. The rate of decline, although generally exponential, slows at points where increasing the number of clusters does not drastically alter the total sum of squares. This is usually measured by plotting the natural log of the percentage of the total sum of squares for an increasing number of clusters (k). In situations where the distributions are reasonably highly clustered, this can be a useful technique for identifying the optimum number and membership of groups.

Fig. 8.12 A cluster analysis of the distribution shown in Fig. 8.10 using Ward's method.
One good example of this is the k-means analysis of the distribution of medieval castles on Okinawa Island by Ladefoged and Pearson (2000), which suggested that the optimum clustering solution was three (Fig. 8.13).
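A minimal sketch of k-means and of the log per cent sum-of-squares curve used to look for an 'elbow' is given below. It is an illustration under stated assumptions, not Ladefoged and Pearson's procedure. It uses the common batch (Lloyd-style) variant, in which all points are reallocated before the centres are recomputed, rather than recalculating a centre each time a single object is added; it keeps the best of several random starts to reduce the replicability problem noted above; and the function names `kmeans` and `log_percent_ss` are hypothetical.

```python
import math
import random

# Hypothetical sketch: batch k-means with random restarts; not a library API.
def kmeans(points, k, restarts=10, iterations=100, seed=0):
    """Return (centres, groups, sse) for the best of several random starts."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        centres = rng.sample(points, k)               # random points act as seeds
        for _ in range(iterations):
            groups = [[] for _ in range(k)]
            for p in points:                           # allocate to the nearest centre
                nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
                groups[nearest].append(p)
            new = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centres[c]
                   for c, g in enumerate(groups)]      # recompute the centroids
            if new == centres:                         # no change: converged
                break
            centres = new
        sse = sum(math.dist(p, centres[c]) ** 2
                  for c, g in enumerate(groups) for p in g)
        if best is None or sse < best[2]:
            best = (centres, groups, sse)
    return best

def log_percent_ss(points, max_k, **kwargs):
    """Natural log of each k-solution's SSE as a percentage of the k = 1 SSE."""
    total = kmeans(points, 1, **kwargs)[2]
    curve = []
    for k in range(1, max_k + 1):
        sse = kmeans(points, k, **kwargs)[2]
        curve.append(math.log(100.0 * max(sse, 1e-12) / total))  # guard against log(0)
    return curve


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centres, groups, sse = kmeans(pts, 2)
print(sorted(centres), round(sse, 6))
print([round(v, 2) for v in log_percent_ss(pts, 4)])
```

For these two tight groups the curve drops sharply from k = 1 to k = 2 and flattens thereafter; reading off the elbow is still a judgement made by the analyst, as in Fig. 8.13.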



Fig. 8.13 A k-means cluster analysis of medieval castles on Okinawa Island, Japan, with the 'elbow' at solution 3 in the associated graph of the decreasing per cent sum of squares (redrawn from Ladefoged and Pearson 2000, Figs. 2–4).

More complex distributions prove difficult to cluster in such a straightforward manner. For example, the distribution of 1795 points in Fig. 8.14 represents the spatial arrangement of stone artefacts from Trench 4b at the Lower Palaeolithic site of Boxgrove, England (Roberts and Parfitt 1999, Fig. 279). The Clark and Evans R-statistic is 0.69, with a z-value of −24.8, which allows a null hypothesis of randomness to be safely rejected. The question remains as to the number and location of cluster groups in this distribution, which presents significant challenges because of the diffuse nature of the distribution pattern and the lack of clear lines of division between higher and lower density areas. The only clear 'elbow' is at the two-cluster solution, as shown in Fig. 8.15. The result is a distribution divided in two, with the dividing line roughly corresponding to an area of reduced density running through the middle (Fig. 8.16). There are, however, difficulties with this solution that highlight some of the problems with k-means analysis. In particular, the dividing line between the left and right clusters appears to be less than optimally positioned. This has arisen because of the relatively simple manner in which k-means calculates centroids as the mean of the x- and y-coordinates.


Fig. 8.14 Distribution of stone artefacts from horse-butchery Trench 4b at the Lower Palaeolithic site of Boxgrove, England (Roberts and Parfitt 1999, Fig. 279). Source: The Boxgrove Project. Used with permission.

In some solutions this can result in points that lie towards the peripheries of cluster groups being placed into different clusters even though they form a coherent group. These difficulties are to a large extent created by the nature of this point distribution, which is characterised by a number of small high-density clusters interspersed with a lower density 'carpet' that forms diffuse clusters that bleed into each other. Identifying the optimum number of clusters, and their membership, can be very difficult in these situations, although subdividing the distribution and applying k-means to each subunit may help. In general, however, complex spatial distributions such as this example are difficult to cluster using k-means, and the resulting solutions may not usefully contribute to the interpretation and understanding of the behaviour that created the dataset. More sophisticated partitioning algorithms like Partitioning Around Medoids (PAM; Kaufman and Rousseeuw 1990; van der Laan et al. 2002) may provide additional insight. Alternatively, examining cluster patterning through the generation of density measurements can be informative.

8.6 Density analysis

There are many distributions where clustering is evident, but membership is very difficult to define because of the quantity of points, or because of the 'fuzzy' boundaries of concentrations. In these situations, the problem of how best to define cluster location and size may benefit from approaches that describe the changing density (or, more properly, the intensity) of material. These approaches fall under a category of spatial modelling called intensity analysis, and allow archaeologists to describe and visualise the changing frequency of observations that occur




within a given area, often to compare different phenomena within the same area or the same phenomenon in different areas. If N(A) denotes the number of observations of a phenomenon occurring in A, then the mean intensity of the phenomenon, λ, is given by:

$$\lambda = \frac{N(A)}{|A|} \tag{8.22}$$

where |A| is the area of A.

Fig. 8.15 Rate of change of sum of squares (expressed as log percentage of total sum of squares) for 1–20 cluster solutions. The deviation ('elbow') in the rate of change at k = 2 (arrow) suggests that two clusters is an optimum solution.




For example, the mean surface intensity for a distribution of 1795 artefacts in a study area of 62 m² is calculated as 1795/62 = 28.95 artefacts per square meter. The same principle could be applied to subregions of the study area, for example individual 1-m squares, and the first-order intensity function calculated for each square. In this case, the intensity is given by λ(x), as we are calculating it for a subset of A. More usefully, however, the size of the observation area can be reduced to a 'moving window' to derive measures of local density, λ(x_i), on a regular lattice across the study area. This can be formally expressed by the equation:

$$\lambda(x_i) = \frac{N[C(x_i, r)]}{\pi r^2}$$


where x_i is the location at which the intensity is being calculated, and N is the number of artefacts in C(x_i, r), a circle of radius r around point x_i. This calculation is referred to as the naive estimator (Fotheringham et al. 2000a, p. 147).
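Equation 8.22 and the naive estimator can be sketched as follows. The function names and the `lattice` helper are illustrative assumptions; each lattice node is evaluated by a brute-force count, and no edge correction is applied, so nodes near the boundary of the study area will be underestimated.

```python
import math

# Hypothetical helpers illustrating Eq. 8.22 and the naive estimator.
def mean_intensity(n_points, area):
    """Global mean intensity: number of observations per unit area."""
    return n_points / area

def lattice(xmin, ymin, xmax, ymax, step):
    """Regular grid of evaluation locations covering the study area."""
    nx = int(round((xmax - xmin) / step)) + 1
    ny = int(round((ymax - ymin) / step)) + 1
    return [(xmin + i * step, ymin + j * step) for j in range(ny) for i in range(nx)]

def naive_intensity(points, locations, r):
    """Count of points within radius r of each location, divided by pi r^2."""
    window = math.pi * r * r
    return [sum(1 for p in points if math.dist(p, x) <= r) / window
            for x in locations]


print(round(mean_intensity(1795, 62.0), 2))         # the Boxgrove example: 28.95
five = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05), (-0.05, 0.0), (0.0, -0.05)]
print([round(v, 1) for v in naive_intensity(five, [(0.0, 0.0)], 0.25)])  # [25.5]
print([round(v, 1) for v in naive_intensity(five, [(0.0, 0.0)], 1.0)])   # [1.6]
```

Passing the output of `lattice` as `locations` gives a moving-window surface; the last two lines reproduce the effect of search radius on the same five artefacts discussed below.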


Fig. 8.16 The two-cluster solution of a k-means cluster analysis.

Obviously the area of observation (r) and the grid cell size will have a pronounced influence on the resulting density surface. Some insight might be gained by comparing localised surfaces (i.e. smaller search radii) against generalised surfaces (i.e. larger search radii) to investigate small-scale versus large-scale influences on distribution patterns (Fig. 8.17). Note that as the search area decreases, local densities will increase for smaller clusters: for example, 5 artefacts falling within a search radius of 0.25 m will result in a local density of 5/(π × 0.25²) = 25.5, whereas 5 artefacts falling within a search radius of 1 m would produce a local density of 5/(π × 1²) = 1.6.

8.6.1 Kernel density estimates

A more sophisticated density measure called kernel density estimation (KDE) produces smoother and more readily interpreted results than simple density techniques (Silverman 1986). Kernel density estimation is a non-parametric technique in which a two-dimensional probability density function (the 'kernel') is placed over each of the observed data points, spreading its weight from the centre of the point outwards, to create a smooth approximation of the distribution. The two parameters that can be manipulated are the shape of the kernel placed over each data point (although in many GIS packages this is set to a quadratic function and cannot be changed) and the variance (or radius) of the kernel, referred to as the bandwidth and denoted by h. The density value for each cell is then established by adding together the values of the density distributions (each of which will be a fraction of 1, unless the data points represent populations) that overlie that grid cell. Experimentation with different values of h is advised, and more detailed guidelines can be found in Wand and Jones (1995); archaeological



Fig. 8.17 Three intensity surfaces of the artefact distribution in Fig. 8.14: 0.25-m radius (top), 0.5-m radius (middle), 1-m radius (bottom). All calculated on a 10-cm grid. Intensity values are expressed in artefacts per square meter.



applications are described in Beardah and Baxter (1996) and Beardah (1999). In general, using too wide a radius will result in an overly smoothed distribution, whereas too narrow a radius will produce peaks around data clusters that may not reflect the actual distribution. Figure 8.18 shows the result of KDE using the same point data as for the simple density calculations shown in Fig. 8.17. The result is a smoother and more easily interpreted continuous surface for cluster identification.

8.7 Local functions

Finally, simple density measures can be replaced by other neighbourhood functions to produce continuous surfaces that show the changing nature of an attribute. For example, if the count of artefacts is replaced with an attribute variable, y, then a local estimate of y at point x_i is given by:

$$\hat{y}(x_i) = \frac{\sum_{x_j \in C(x_i, r)} y_j}{N[C(x_i, r)]}$$
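Both the kernel density estimate of Section 8.6.1 and the local mean just defined are moving-window calculations, and can be sketched together. The kernel is an assumption: we use the quartic (biweight) kernel given by Silverman (1986), K(u) = (3/π)(1 − u²)² for u < 1 and 0 otherwise, where u is the distance scaled by the bandwidth h; individual GIS packages may use a different kernel or scale h differently. The function names are hypothetical and no edge correction is applied.

```python
import math

# Hypothetical sketch; the quartic kernel is an assumed choice, see lead-in.
def quartic_kde(points, location, h):
    """Kernel density estimate at one location with a quartic kernel of bandwidth h.

    Each point contributes a bump that integrates to 1, so summing the
    surface over a fine grid approximately recovers the number of points.
    """
    density = 0.0
    for p in points:
        u2 = (math.dist(p, location) / h) ** 2      # squared scaled distance
        if u2 < 1.0:
            density += 3.0 / math.pi * (1.0 - u2) ** 2
    return density / (h * h)

def local_mean(points, values, location, r):
    """Mean attribute value of the points within radius r (None if there are none)."""
    inside = [v for p, v in zip(points, values) if math.dist(p, location) <= r]
    return sum(inside) / len(inside) if inside else None


# Numerical check: the estimate for a single point integrates to about 1
step = 0.02
xs = [-1.0 + step * (i + 0.5) for i in range(100)]
mass = sum(quartic_kde([(0.0, 0.0)], (x, y), 1.0) for x in xs for y in xs) * step ** 2
print(round(mass, 3))

sizes = local_mean([(0, 0), (0.1, 0), (5, 5)], [2.0, 4.0, 100.0], (0, 0), 1.0)
print(sizes)   # 3.0: the distant point lies outside the window
```

Evaluating either function at every node of a regular lattice produces a raster surface; the bandwidth h (or radius r) plays the same smoothing role discussed for Figs. 8.17 and 8.18.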


This returnsa continuoussurfaceshowingthe meanofthe variable.This interpolaeachcell and tion is local because valuesareestimated usingthe valuessurrounding it canbe contrasted to the global trendsurfaceof artefactsizesin Fig. 6.1. However, in common with the other density functions, this method is highly influencedby edgeeffectsand shouldbe interpretedwith caution. In certain situationsit might be important to identify whether a local region deviatesfrom the global trend. This might occur, for example, with survey data if one was interestedin exploring whether some enumerationunit (e.g. fields) had neighbourswith higher artefactdensitiesthan the global pattem and could be defined as a'hot spot'. An appropriatemethod to answerthis question is Getis's Gl statistic(Ord and Getis 1995)

$$G_i^*(d) = \frac{\sum_j w_{ij}(d)\,x_j - W_i^*\,\bar{x}}{s\left[\left(n S_{1i}^* - W_i^{*2}\right)/(n-1)\right]^{1/2}} \tag{8.25}$$


where s is the sample standard deviation of the observation values x_j, x̄ is their mean, and w_ij(d) = 1 if region j is within a distance d of region i (with w_ii = 1, so that region i itself is included) and 0 otherwise (Rogerson 2001, p. 174). Finally:




$$W_i^* = \sum_j w_{ij}(d), \qquad S_{1i}^* = \sum_j w_{ij}(d)^2$$


Fig. 8.18 Kernel density estimates of the artefact distribution in Fig. 8.14 using a 0.25-m radius (top), 0.5-m radius (middle) and 1-m radius (bottom). Densities expressed as artefacts per square meter. Compare to Fig. 8.17.

Fig. 8.19 Distribution of artefact densities (d) in five fields. Do field 5 and its two neighbours represent a local 'hot spot'?

For example, consider the distribution of artefacts recovered from the five fields in Fig. 8.19. The question is whether the high values around field 5 represent a statistically significant local cluster distinct from the global pattern. From these observations the values in Table 8.6 can be established to complete Eq. 8.25 so that:

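$$G_5^* = \frac{56 - 3 \times 13.8}{7.1\left[(5 \times 3 - 3^2)/(5 - 1)\right]^{1/2}} = \frac{14.6}{8.70} \approx 1.68$$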
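The worked example can be checked with a short sketch of Eq. 8.25. It assumes binary weights (1 for field 5 and its two neighbours within distance d, 0 otherwise) and uses the sample standard deviation with an n − 1 denominator, which reproduces the 7.1 given in Table 8.6. The name `getis_gi_star` is hypothetical; the three weighted values are 22, 16 and 18.

```python
import math
from statistics import mean, stdev

# Hypothetical sketch of Eq. 8.25 for a single focal region.
def getis_gi_star(x, w):
    """Getis-Ord Gi* for one focal region.

    x -- attribute values for all n regions
    w -- weights w_ij(d) linking the focal region i to every region j,
         with w_ii = 1 so that the focal region itself is included
    """
    n = len(x)
    xbar, s = mean(x), stdev(x)            # stdev uses the n - 1 denominator
    W = sum(w)                             # W_i*
    S1 = sum(wj * wj for wj in w)          # S_1i*
    numerator = sum(wj * xj for wj, xj in zip(w, x)) - W * xbar
    denominator = s * math.sqrt((n * S1 - W * W) / (n - 1))
    return numerator / denominator


# Table 8.6: field 5 and its two neighbours are weighted 1, the other fields 0
g = getis_gi_star([22, 16, 18, 5, 8], [1, 1, 1, 0, 0])
print(round(g, 2), g > 1.645)    # prints: 1.68 True (one-sided test, 0.05 level)
```

Using the unrounded standard deviation (about 7.085) rather than 7.1 gives Gi* of about 1.683, which does not change the conclusion.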

The statistic Gi* can be taken as a standard normal random variable with a mean of 0 and a variance of 1 (Rogerson 2001, p. 174). We can therefore use the normal (z) distribution to establish whether the calculated value of Gi* is sufficiently large that it falls within the critical region for a specified significance level, thus allowing the null hypothesis to be rejected. For a one-sided test at the 0.05 significance level the critical value is z = 1.645, which is less than our calculated Gi*. We can therefore reject the null hypothesis and conclude that field 5 is located in an area of locally high values.

8.8 Predictive modelling

The term 'predictive modelling' refers to the method of predicting the value (or probability of occurrence) of a dependent variable in an unsampled location using one or more independent variables. As applied to archaeology, it is most closely associated with attempts to predict the probability of archaeological settlements occurring in unsampled landscapes on the basis of a quantitative assessment of the locational characteristics of settlements in a surveyed area (Kvamme 1983; Judge and Sebastian 1988; Kvamme 1990a; Westcott and Brandon 2000). In this narrow sense, predictive modelling is subject to the charge of environmental determinism


Table 8.6 A distance matrix for Getis's Gi* statistic

Fields (i, j)   w_ij(d)   x_j    w_ij(d) x_j
5, 5            1         22     22
5, 4            1         16     16
5, 3            1         18     18
5, 2            0          5      0
5, 1            0          8      0
Sum                       69.0
Mean                      13.8
St. dev.                   7.1

for its reliance on a limited range of environmental variables (Gaffney and van Leusen 1995). While it is certainly true that environmental variables influence the choice of settlement location, it is also true that these are not the only factors that people consider when choosing where and how to settle a landscape. Several writers have shown how cultural factors ranging from the influence of 'supernatural' phenomena to the location of pre-existing settlements can play an important role in influencing the human use of space (Ingold 1993; Tilley 1994, 1996; Bradley 1998, 2000; Barrett 1999; Tilley and Bennett 2001). The integration of experiential variables in GIS to improve understanding of the multiple factors that influence human settlement location remains a major challenge. To date most work in this vein has concerned visibility, although, as we discuss in Chapter 10, concern with visibility does not automatically overcome the charge of environmental determinism. In the meantime, active research in developing predictive models remains focused on environmental and ecological variables, no doubt in part because of the ease with which these can be measured.

Despite the above critique, we suggest that predictive modelling can be genuinely informative in two situations. The first is cultural resource management (CRM), where it is often necessary to predict the presence of archaeological material in order to prevent or mitigate damage from construction or agricultural practices. From a CRM perspective what normally matters is whether the prediction is correct, not whether it contributes to an explanation of site location. Suffice it to say that numerous examples of predictive modelling have shown that environmental variables such as relief, soil type, drainage and permeability, slope, aspect and distance to water do, at times only moderately but occasionally significantly, improve archaeologists' ability to predict the occurrence of settlements (Kvamme 1983, 1985, 1990a; Duncan and Beckman 2000; Warren and Asch 2000; Woodman 2000b).
The second situation where predictive modelling is useful is in understanding the extent to which site location may have been influenced by a complex interplay of environmental factors (cf. Wheatley 2004). In this case the goal is ultimately

8.8 Predictive modelling


explanation, but providing one remains alert to the fact that correlation does not necessarily imply a causal relationship, predictive modelling provides a valuable tool for identifying multivariate patterning. The construction of a predictive model is usually undertaken in four stages: data collection, statistical analysis, application of the model and finally model validation (cf. Duncan and Beckman 2000, p. 36; Warren and Asch 2000, p. 13). Note that validation may lead to further refinement. We consider each stage in turn.

8.8.1 Data collection

Predictive modelling works on the assumption that it is possible to differentiate between areas of the landscape that have evidence of past occupation (i.e. 'sites') and areas of the landscape that do not ('non-sites') on the basis of one or more landscape attributes. It follows that the construction of a predictive model requires information about the location of sites and about the distribution of the relevant landscape attribute values. Ideally, site and non-site locations are established by a programme of random or possibly cluster sampling of the landscape, as is common practice in North America (e.g. Kohler and Parker 1986; Kvamme 1992a). This approach has the virtue of allowing estimation of the actual frequency of sites, which in turn allows one to make absolute predictions about the presence or absence of sites. The alternative approach, known as case-control, typically makes use of existing data about the presence or absence of sites, and is more common in Europe, where extensive sampling of the landscape is often not practical (Woodman 2000b). Case-control data only support relative rather than absolute predictions of site presence, that is, statements of the form that it is 2.5 times more likely that there is a site at location A than there is a site at location B.
Note that it is best to avoid the practice of identifying sites and then randomly picking other locations to serve as non-sites, since if the latter have not been examined they may in fact contain sites, which will then inevitably undermine the ability of the model to identify landscape attribute values that discriminate between site and non-site locations. However the site and non-site locations are identified, it is common practice to split them into a training sample used to build the model and a testing sample withheld for the purpose of testing its accuracy. This procedure is known as split sampling and it is usual to place 50 per cent of locations in the training sample and 50 per cent in the testing sample. When the locations have been obtained by cluster sampling then they should be split by cluster rather than by individual location (Kvamme 1988,
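The split-sampling rule above, including the requirement that cluster samples be divided by cluster, can be sketched in a few lines of Python. This is a minimal illustration rather than a published procedure: it assumes each surveyed location is stored as a `(cluster_id, record)` pair, where `cluster_id` is a hypothetical label for the sampling unit (for a simple random sample every location can be given its own identifier), and it uses a fixed random seed so that the split can be reproduced.

```python
import random


def split_by_cluster(locations, train_fraction=0.5, seed=0):
    """Divide surveyed locations into a training and a testing sample.

    `locations` is a list of (cluster_id, record) pairs. Whole clusters are
    allocated to one sample or the other, so locations from the same cluster
    never appear in both. By default half the clusters go to each sample.
    """
    clusters = sorted({cid for cid, _ in locations})
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(clusters)
    n_train = round(len(clusters) * train_fraction)
    train_ids = set(clusters[:n_train])
    training = [rec for cid, rec in locations if cid in train_ids]
    testing = [rec for cid, rec in locations if cid not in train_ids]
    return training, testing


# Six hypothetical survey clusters of three locations each.
locations = [(c, f"loc-{c}-{i}") for c in range(6) for i in range(3)]
training, testing = split_by_cluster(locations)
```

Because whole clusters are allocated, the 50:50 split applies to clusters; when clusters differ in size the two samples may contain different numbers of locations.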


A wide variety of landscape attributes have been used for predictive modelling. The primary datasets often consist of a combination of elevation, soil, hydrology, geology and vegetation maps. Once input into the GIS these can then be used to derive further secondary datasets such as relief for a set of different catchment sizes (i.e. the range of elevation values in a circumscribed area), slope and aspect, distances to annual and permanent streams, soil productivity, erodibility, permeability


Spatial analysis

[Figure 8.20: flowchart running from primary coverages (including archaeological sites and survey areas) to secondary coverages (relief in 500-m catchment, slope, distance to nearest stream, soil surface runoff, soil landform, distance to nearest permanent stream), then spatial analysis, then a test-validity stage (statistical validation using a sample of known site locations withheld during the model development stage; ground truthing; application to a new study area and ground truthing)]

Fig. 8.20 Generalised flowchart of stages in the generation of a predictive model. Adapted from Warren and Asch's flowchart for their predictive model of site location in Montgomery County, Illinois (2000, Fig. 2.4). The six independent variables shown here are only a sample of a much wider range of potentially useful variables. See Warren and Asch (2000) and Woodman (2000b) for examples.

and drainage, vegetation type, exposure, shelter quality, viewshed size, etc. (see Fig. 8.20). Of course, not all the data collected will necessarily turn out to have predictive power.

8.8.2 Statistical analysis

Once the data have been assembled, the first task is to identify the landscape attributes that significantly discriminate between the site and non-site locations. This is normally achieved by univariate analysis of each attribute in turn. Depending on the type of data (e.g. nominal in the case of soil types or ratio in the case of distances, elevation and slope) the most appropriate statistical test will probably be one of those described in Chapter 7, such as the chi-squared test, the Kolmogorov-Smirnov test or, if the data are normally distributed, Student's t-test. The potentially useful discriminators are those attributes for which it is possible to reject the null hypothesis that their values at site and non-site locations are drawn from the same population.

Before these attributes are finally selected for inclusion in the predictive model it is wise to investigate whether any of them either confound, or interact with, others. Confounding occurs when one attribute can substitute for another in predicting the presence or absence of a site (Woodman 2000b, p. 452), from which it follows that the confounder need not be included in the model. Interaction occurs when one attribute modifies the relationship between another and the presence or absence of a site (Woodman 2000b). For example, in her predictive model of Mesolithic settlement on Islay (see Chapter 3), Woodman found that there is a low chance of finding sites at locations where water sources are small and elevation is high, but a much greater chance of finding them at locations where water sources are small and elevation is low. When attributes interact in this way they should be replaced by hybrid variables that specifically represent the nature of the interaction. Woodman (2000b, pp. 452-453) describes methods that may be used to identify confounding and interaction.

Once the appropriate attributes have been identified, the next step is to build the predictive model itself, for which the preferred technique is logistic regression analysis (Stopher and Meyburg 1979; Hosmer and Lemeshow 1989; Menard 2001). Logistic regression differs from linear regression in two ways that make it particularly well suited to predictive modelling. The first is that logistic regression is able to use a combination of variables of different scales (i.e. a mix of nominal, ordinal, interval and/or ratio data). The second is that logistic regression seeks to fit an 'S'-shaped probability curve (hence 'logistic'). The 'S'-shaped curve allows the predicted probabilities of site presence to switch fairly rapidly from low to high, thus avoiding the long sequence of intermediate values (representing uncertainty) that would be produced by a normal linear function. In the case of archaeological predictive modelling, the probability curve is fitted along an axis of discrimination determined by differentially weighting the contribution of the chosen attributes in such a way as to maximise the difference between site and non-site locations (Warren and Asch 2000, pp. 6-9).
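In practice the fitting is done in a statistical package, but what such packages compute can be sketched from first principles. The following is a minimal, unoptimised Newton-Raphson fit (equivalently, iteratively reweighted least squares) in plain Python, offered only as an illustration. It assumes numeric attributes held in lists (nominal attributes would first have to be recoded as numeric design variables) and assumes that site and non-site values overlap; with perfectly separated data the coefficients grow without limit and this sketch fails. The function names are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(attributes, presence, iterations=25):
    """Fit a logistic regression by Newton-Raphson.

    attributes: one list of attribute values per location.
    presence:   1 for a site, 0 for a non-site.
    Returns [a, b1, ..., bn]: the intercept, then one coefficient per attribute.
    """
    rows = [[1.0] + list(r) for r in attributes]  # leading 1 carries the intercept
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k                      # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for row, y in zip(rows, presence):
            score = sum(b * x for b, x in zip(beta, row))  # log odds V
            p = 1.0 / (1.0 + math.exp(-score))             # probability of a site
            for i in range(k):
                grad[i] += (y - p) * row[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * row[i] * row[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta
```

At convergence the gradient is zero, so the fitted probabilities sum to the number of observed sites; this is a quick check that the fit has worked.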
A number of statistical packages offer the ability to perform this task, including SPSS, R and S-Plus. What all of them output is an intercept, $a$, and a series of regression coefficients $b_1, b_2, \dots, b_n$ that determine the weighting applied to each of the $n$ attributes $x_1, x_2, \dots, x_n$. The predictive model is an equation that takes the form

$$V = a + x_1 b_1 + x_2 b_2 + \dots + x_n b_n \qquad (8.28)$$

where $V$ (often referred to as a 'score') is the log odds of site presence.


8.8.3 Application

Once logistic regression has been used to build the model it must be applied on a cell-by-cell basis to the study area. The first task is to calculate the score, $V$, for every map cell by implementing Eq. 8.28 in map algebra (see Chapter 9). In cases where the variables are ratio scale (e.g. elevation) the coefficients are applied directly to the variable (e.g. if $x_1$ refers to elevation and the regression coefficient $b_1$ is 0.345, then the raster map containing elevation data would be multiplied by 0.345). In cases where nominal scale data are used, the results of the analysis will define a set of numeric design variables for each nominal category that can be inserted into the map algebraic formula (see Warren and Asch 2000, p. 19, for a concrete example). Once the score, $V_i$, has been calculated for each map cell, $i$, it must then be converted to a probability of site presence, $p_i$. This is achieved by implementing the following equation (Haining 2003, p. 262) in map algebra:

$$p_i = \frac{\exp(V_i)}{1 + \exp(V_i)}$$
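Applying the fitted model amounts to two passes of map algebra, which can be sketched in plain Python with each raster held as a list of rows. The intercept and coefficients below are hypothetical (only the 0.345 elevation coefficient echoes the example above), and all attribute rasters are assumed to share the same extent and resolution. The conversion uses 1 / (1 + exp(-V)), which is algebraically identical to exp(V) / (1 + exp(V)).

```python
import math


def score_map(intercept, coefficients, layers):
    """Cell-by-cell score V = a + x1*b1 + ... + xn*bn (Eq. 8.28).

    `layers` holds one raster (a list of rows) per attribute, in the same
    order as `coefficients`; all rasters must share the same dimensions.
    """
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    return [[intercept + sum(b * layer[r][c] for b, layer in zip(coefficients, layers))
             for c in range(n_cols)]
            for r in range(n_rows)]


def probability_map(scores):
    """Convert each score V_i to a probability p_i = exp(V_i) / (1 + exp(V_i))."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in scores]


# Hypothetical 2 x 2 study area with two attributes.
elevation = [[100.0, 120.0], [140.0, 160.0]]
near_water = [[1.0, 0.0], [1.0, 0.0]]  # e.g. a 0/1 design variable
scores = score_map(-40.0, [0.345, 1.5], [elevation, near_water])
probabilities = probability_map(scores)
```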


Depending on how the training sample was collected, the resulting raster map will provide either an absolute or relative probability of a site existing in each map cell.

8.8.4 Validation

The fact that it has been possible to construct a predictive model does not in itself guarantee the accuracy of its predictions. This can be assessed using the testing sample that was withheld from the model-building process. The basic idea is to establish how many of the observed sites from the testing sample fall within the area where sites are predicted to be found. For example, if 16 out of 25 observed sites fall in the area where sites are predicted, then the model could be expressed as correctly predicting site location 64 per cent of the time. In reality, however, matters are not quite so simple, for two main reasons:
Prediction is probabilistic: Very few, if any, models predict site occurrence with absolute certainty of presence or absence. Consequently it usually only makes sense to talk about the model correctly predicting site presence at some specified probability, p, between 0.0 and 1.0. Models tend to be more accurate at low probabilities and less accurate at high probabilities.

Non-sites matter: Often it is possible to specify a probability for the occurrence of sites that is so low that all observed sites do actually fall within the area where sites are predicted; in other words, so that the model is 100 per cent accurate. However, the corollary is usually that a large number of non-sites also fall in the area where sites are predicted, so the model is very inaccurate at predicting the lack of archaeological sites. This would clearly be very undesirable if the purpose of the model was, for example, to identify a route for a new road that minimised the damage to archaeological sites.
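Both points can be made concrete by computing the per cent correct for the withheld sites and non-sites at a series of probability thresholds, which is how cumulative curves of the kind discussed next are built. The sketch below assumes the model's predicted probabilities at the testing-sample locations are already available as two lists (the values here are invented). Treating a location as 'site predicted' when its probability is at or above the threshold is a choice made for this illustration.

```python
def percent_correct(site_probs, nonsite_probs, threshold):
    """Return (% of sites correctly predicted, % of non-sites correctly predicted).

    A location counts as 'site predicted' when its probability is >= threshold.
    """
    sites_ok = sum(1 for p in site_probs if p >= threshold)
    nonsites_ok = sum(1 for p in nonsite_probs if p < threshold)
    return (100.0 * sites_ok / len(site_probs),
            100.0 * nonsites_ok / len(nonsite_probs))


# Hypothetical testing sample: 16 of 25 sites fall where p >= 0.5.
site_probs = [0.8] * 16 + [0.3] * 9
nonsite_probs = [0.1, 0.2, 0.4, 0.6, 0.7]

for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    sites_pct, nonsites_pct = percent_correct(site_probs, nonsite_probs, t)
    print(f"p >= {t:.1f}: sites {sites_pct:.0f}%, non-sites {nonsites_pct:.0f}%")
```

As the threshold rises, the percentage of sites correctly predicted falls while the percentage of non-sites correctly predicted rises, which is the trade-off at the heart of model validation.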

Clearly then, it is important to consider the accuracy of a model with reference to the problem at hand. One method that facilitates this is the production of cumulative per-cent-correct prediction curves for both sites and non-sites (Kvamme 1988). Figure 8.21 shows just such a graph, in which the number of sites falling in areas where they are predicted decreases as the probability of site occurrence increases, while the number of correct non-sites increases as the probability of site occurrence increases. In this case, if it was important to avoid damaging sites then one might choose to avoid areas with even a relatively low probability of site occurrence. However, a further complication which arises at this point is that the relevant area is likely to be so large (since these are cumulative probabilities, the area in question includes all locations with a low probability or greater) as to render the prediction virtually worthless. There are at least two solutions to this dilemma.

[Figure 8.21: cumulative per cent correct curves for sites and non-sites; x-axis: predicted site probability (0.1 to 0.9); y-axis: per cent correct]

Fig. 8.21 Cumulative per cent correct predictions for model sites and non-sites for all probabilities of occurrence. Reproduced with permission from Kvamme 1988, Fig. 8.11.

One is to pay attention to the trade-off between correctly predicting site and non-site locations, while another is to examine the predictive gain offered by the model. Kvamme (1988, p. 329) defines the gain, G, as:

$$G = 1 - \frac{\%\ \text{of total area where sites are predicted}}{\%\ \text{of observed sites within area where they are predicted}}$$
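The gain is simple arithmetic, sketched here as a small Python function (the function name is ours, not Kvamme's) and checked against the two models discussed in the following paragraph.

```python
def kvamme_gain(pct_area_predicted, pct_sites_in_predicted_area):
    """Kvamme's gain: 1 - (% of area where sites are predicted)
    / (% of observed sites falling within that area)."""
    if pct_sites_in_predicted_area <= 0:
        raise ValueError("no observed sites fall in the predicted area")
    return 1.0 - pct_area_predicted / pct_sites_in_predicted_area


broad = kvamme_gain(70, 80)   # 80% of sites in 70% of the landscape: 0.125
focused = kvamme_gain(5, 70)  # 70% of sites in 5% of the landscape: about 0.93
```

The first value is exactly 0.125, which the text rounds to 0.13.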

G, which is calculated for a specified probability of site occurrence, ranges from 1 (high predictive utility) through 0 (no predictive utility) to -1 (the model predicts the reverse of what it is supposed to). The most important property of this measure is that it can distinguish a correct but relatively worthless model from an ostensibly less correct but more useful one. For example, a model that correctly predicts 80 per cent of sites and predicts site occurrence over 70 per cent of the landscape is probably not very useful, which is reflected in the low gain of 0.13 (1 - 70/80 = 0.125). On the other hand, a model that correctly predicts 70 per cent of sites and predicts site occurrence over a mere 5 per cent of the landscape would provide a better basis for many decisions, which is reflected in the gain of 0.93 (1 - 5/70). Further suggestions for testing predictive models can be found in Kvamme (1988).

8.9 Conclusion

This chapter has introduced some techniques useful for investigating the patterns and relationships in spatial datasets. It is impossible to do full justice to the very large literature on techniques of spatial analysis, so we have chosen to highlight approaches that are within the grasp of the majority of archaeologists, whether numerically inclined or not. Although this chapter has reviewed a number of 'traditional' techniques - nearest neighbour, hierarchical and k-means cluster analysis, for example - we have also highlighted recent approaches to the explication of spatial processes, such as Ripley's K. We encourage further investigation of modern spatial analytical techniques as described by Haining (2003) and Fotheringham et al. (2000a).

9 Map algebra, surface derivatives and spatial processes

9.1 Introduction: point and spatial operations

In this chapter we introduce a number of point and spatial operations that can be performed on continuous field data. We begin with the use of map algebra, before moving on to the calculation of derivatives (e.g. slope and aspect) and spatial filtering (e.g. smoothing and edge detection), all of which are widely used by archaeologists. In the final section we introduce more specialised techniques that have archaeological potential.

Map algebra is a point operation, whereas the other techniques discussed in this chapter are spatial operations. Point operations compute the new attribute value of a location with coordinates (x, y) from the attribute values in other maps at the same location (x, y) (Fig. 9.1b). In contrast, spatial operations compute the new attribute value of a location from the attribute values in the same map, but at other locations: those in the neighbourhood (Fig. 9.1a). The neighbourhood used in a spatial operation may or may not be spatially contiguous. For example, slope is usually calculated using the elevation values in a neighbourhood comprising the four or eight map cells immediately adjacent to the location in question (see below), but we saw in Chapter 6 how inverse distance weighting interpolates elevation values from some number of nearest spot heights, irrespective of how far away those spot heights actually are. The use of point and spatial operations requires an appropriate data structure as well as careful consideration of whether the results are actually meaningful:
1 Data structure: Point operations require that attribute values in different maps can be accessed using the same spatial index, which is most easily achieved by 'stacking' raster maps of the same resolution, as depicted in Fig. 9.1(b). Spatial operations can, in principle, be applied to a variety of data structures. For instance, slope can be calculated using elevation data stored as a raster elevation model or as a TIN (Chapter 6). In practice, however, many spatial operations are most commonly, or even only, performed on raster data because the regular array of values greatly simplifies the necessary algorithms.

2 Interpretation: So far as the process of computation is concerned, most point and spatial operations can be applied to any information that can be stored using the requisite data structure. Whether the results will be meaningful is an entirely different matter! For example, the map 'calculators' included in many GIS packages will allow the user to multiply a nominal soil map by elevation, even though '134 m × clay' will rarely be meaningful. This does not necessarily mean, however, that operations should only ever be used for the purposes for which they were originally envisaged. For instance, many spatial operations were developed for application to models of terrain, where they have a 'natural' or common-sense interpretation. Thus few users will have much difficulty attaching a meaning to the rate of change of elevation, which is of course slope, and similarly, pathways traced along routes which maximise the rate of change of elevation in a downhill direction are easily interpreted as the expected flow-lines for water runoff. Archaeologists have, however, found good reasons to calculate both the rate of change of values and 'flow lines' for non-elevation data. For example, Llobera (2003, p. 40) calculated the 'slope' of a cumulative viewshed map (see Chapter 10) since this allowed him to investigate how the extent of visibility changes as a result of movement through the landscape. As so often with GIS, it is ultimately up to the user to work out which methods will produce meaningful results in a given context.

Map algebra, derivatives and processes

[Figure 9.1: (a) a spatial operation, in which the output cell is the mean of the four cells adjacent to (x, y) in a single map; (b) a point operation on stacked raster maps of the same resolution]

Fig. 9.1 Calculation of a mean value as a spatial operation (a) and as a point operation (b).
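The distinction between point and spatial operations drawn above can be sketched in plain Python, with rasters held as lists of rows indexed [y][x]. The four-cell neighbourhood and the decision to leave edge cells empty (None) are illustrative assumptions; GIS packages handle map edges in various ways.

```python
def point_mean(map_a, map_b):
    """Point operation: each output cell uses only the values at the same
    (x, y) in the two input maps."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]


def neighbour_mean(grid):
    """Spatial operation: each output cell is the mean of the four cells
    adjacent to it (left, right, above and below) in the same map.
    Edge cells lack a full neighbourhood and are left as None."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[None] * n_cols for _ in range(n_rows)]
    for y in range(1, n_rows - 1):
        for x in range(1, n_cols - 1):
            out[y][x] = (grid[y][x - 1] + grid[y][x + 1]
                         + grid[y - 1][x] + grid[y + 1][x]) / 4
    return out
```

Note that `point_mean` needs two maps of identical dimensions, whereas `neighbour_mean` reads a single map at locations other than the one being computed.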

9.2 Map algebra

The term 'map algebra' refers to the mathematical manipulation and combination of raster grids on a cell-by-cell basis (Tomlin 1990). This is an important capability of GIS and is essential for many forms of modelling and analysis. Conceptually it is straightforward: imagine two raster grids, A and B, covering the same location at the same resolution. Each cell in grid A contains the value 2, and each cell in grid B contains 3. The result of adding grid A to grid B will be a new map where each cell contains the value 5. Similarly, multiplying grid A by grid B will result in a new map where each cell contains the value 6. Obviously, when cells within a raster map have different values that reflect some form of spatial variability (e.g. vegetation, elevation, etc.), then the results of combination with other raster maps will be a new set of continuously varying data. This will occur, for example, when combining data in ways similar to the example shown in Fig. 7.14.
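The worked example above translates directly into code. This plain-Python sketch applies any two-argument operation cell by cell; the `combine` helper is hypothetical, not a GIS function, and it simply refuses rasters whose dimensions differ, a crude stand-in for the same-resolution requirement discussed below.

```python
import operator


def combine(a, b, op):
    """Apply a two-argument operation cell by cell to two rasters (lists of rows)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("rasters must have the same dimensions")
    return [[op(va, vb) for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]


grid_a = [[2] * 3 for _ in range(3)]  # every cell holds 2
grid_b = [[3] * 3 for _ in range(3)]  # every cell holds 3
total = combine(grid_a, grid_b, operator.add)    # every cell holds 5
product = combine(grid_a, grid_b, operator.mul)  # every cell holds 6
```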

9.3 Derivatives


More complex algebraic combinations are, of course, possible. For example, the well-known Normalised Difference Vegetation Index (NDVI) shows variation in green vegetation across a land surface, with 'greener' areas having higher values than non-green areas. It is calculated from the near-infrared (NIR) and visible-red (VIS) image bands from a remote sensing satellite such as the SPOT or Landsat systems:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{VIS}}{\mathrm{NIR} + \mathrm{VIS}}$$
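As a point operation, NDVI needs only the two band values in each cell. A minimal sketch, assuming the NIR and VIS bands are already co-registered rasters of the same resolution held as lists of rows; the reflectance values are invented, and cells where both bands are zero are returned as None because the ratio is undefined there.

```python
def ndvi(nir, vis):
    """Normalised Difference Vegetation Index, (NIR - VIS) / (NIR + VIS),
    computed cell by cell. Cells where NIR + VIS is zero are returned as None."""
    return [[(n - v) / (n + v) if (n + v) != 0 else None
             for n, v in zip(row_nir, row_vis)]
            for row_nir, row_vis in zip(nir, vis)]


# Hypothetical reflectance values for a 2 x 2 scene.
nir = [[0.5, 0.3], [0.0, 0.2]]
vis = [[0.1, 0.3], [0.0, 0.4]]
index = ndvi(nir, vis)
```

Values range from -1 to 1, with vegetated cells (strong NIR, weak red reflectance) at the positive end.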

Similarly, creating a map that shows the probability of site location as derived from a logistic regression formula (Chapter 8) also requires some long, but not necessarily complex, map algebra. This often takes the form of a polynomial:

predictive_map = (0.345 × elevation_map) + (0.2194 × slope_map) + ... + (1.34 × distance_to_water_map)

Note that it is necessary for grids to be the same resolution in order to be combined via map algebra. If this is not the case, one or more grids will have to be resampled to the desired resolution. Map algebra is also used to transform a single raster map into a new map via a mathematical expression. For example, some remote-sensing and predictive modelling applications require that cell values be replaced with their logarithms in order to interpret the results. In many GIS packages map algebra is handled via a calculator-like interface similar to the one in Fig. 9.2, which provides push-button capabilities for combining raster maps. Some packages, such as GRASS, also provide a command-line interface, which is often more convenient for repeating calculations. GRASS is also interesting for another reason: it is unique in allowing the new value of a cell with coordinates (x, y) to be computed from the values of cells with different coordinates, say (x + 1, y + 1). This is a very useful facility since it allows complex spatial operations (as defined above) to be implemented in map algebra (see Box 11.1 for an example).

9.3 Derivatives: terrain form

The vast majority of archaeological GIS include a model of terrain. Often this is simply used to provide a visual backdrop for the display of, for example, site locations. In other cases, however, the model of terrain is intended to support some kind of analysis, perhaps because terrain is thought to have influenced some aspect of past human activity, or alternatively the recovery of evidence of past activities.
Take, for example, the traditional settlement pattern of the western Yorkshire Dales in England: farmsteads are typically located neither in the valley bottoms nor on the hill tops, but instead at elevations between 190 m and 260 m. This is probably for a variety of reasons including proximity to springs, ease of passage between dwellings when the valleys were more wooded and/or marshy, and not least the fact that animal manure need only be moved downhill to fertilise the hay meadows. The simplest possible GIS model of the location of such farmsteads would be a reclassified elevation map delineating the areas located between 190 and 260 m above sea level. Of course, a model as crude as this would have little to say about the specific locations of individual farmsteads. For that we might need to include additional information such as the distribution of springs and watercourses, but also other aspects of terrain, such as the availability of flat(ish) land for building, and shelter from prevailing winds. The point here is that there is more to terrain than simply elevation. In this section we discuss measures of terrain form that are themselves continuous fields: slope, aspect, plan convexity and profile convexity. Later, in Chapter 10, we discuss the use of GIS to identify discontinuous regions of terrain.

[Figure 9.2: the raster calculator dialog, with an NDVI expression combining near-infrared and visible-red layers]

Fig. 9.2 ESRI's ArcGIS 'raster calculator'. Source: ESRI. ArcGIS software and graphical user interface are the intellectual property of ESRI and used herein with permission. Copyright © ESRI. All rights reserved.

9.3.1 First-order derivatives: slope and aspect

Definitions: The slope calculated by GIS packages is the maximum rate of change of the elevation at a given location and the aspect is the azimuth (compass direction) of this rate of change in the downhill direction. The notion of a maximum rate of change is important since, as Fig. 9.3 illustrates, the actual slope experienced when moving over terrain may vary according to the direction of travel, a point which we consider further when discussing cost-surface analysis in Chapter 10. Technically, slope and aspect are first-order derivatives of the surface in just the same way that acceleration is a first-order derivative of velocity; the only difference is that slope is a rate of change across space whereas acceleration is a rate of change in time.

Fig. 9.3 The slope experienced while traversing terrain depends on the direction of travel. Moving due west over an east-facing 40° slope means a rate of ascent of 40°, whereas north or south movement is at 0°. Moving at a NW or SW heading to a 40° slope changes the ascent angle to 27°. Hence our propensity to 'zig-zag' up steep slopes to minimise the angle of ascent.

Calculating slope and aspect: Almost all GIS software packages provide push-button functionality for calculating slope and aspect from elevation data. Nevertheless, the results may vary from package to package because several different algorithms have been devised to calculate derivatives of continuous field data. Fortunately it appears that the well-known packages mostly use the two algorithms that have been found to perform best in tests: Zevenbergen and Thorne's method and Horn's method (Burrough and McDonnell 1998, p. 192). Box 9.1 presents a method for calculating slope and aspect in the absence of push-button functionality. The units of measurement used for slope and aspect are not dependent on the algorithm used to calculate them; some GIS packages offer the user a choice:


Box 9.1 How to calculate slope and aspect from first principles

Horn's method: This generally produces good results, but is not practicable unless the GIS package in question provides spatial map algebra (e.g. GRASS) or allows the construction of quite elaborate spatial filters. According to this method, the tangent of the maximum rate of change (steepest slope), S, is given by

$$\tan S = \sqrt{\left(\frac{\delta z}{\delta x}\right)^2 + \left(\frac{\delta z}{\delta y}\right)^2}$$

and the tangent of the aspect, A, by

$$\tan A = -\frac{\delta z / \delta y}{\delta z / \delta x}$$

where the partial derivatives δz/δx and δz/δy give the slope in the east-west and south-north directions respectively. They can be calculated as follows:

$$\frac{\delta z}{\delta x} = \frac{(z_{x+1,y+1} + 2z_{x+1,y} + z_{x+1,y-1}) - (z_{x-1,y+1} + 2z_{x-1,y} + z_{x-1,y-1})}{8\Delta x}$$

$$\frac{\delta z}{\delta y} = \frac{(z_{x+1,y+1} + 2z_{x,y+1} + z_{x-1,y+1}) - (z_{x+1,y-1} + 2z_{x,y-1} + z_{x-1,y-1})}{8\Delta y}$$

Note that $z_{x,y}$ is the elevation value at grid position (x, y), with x increasing eastwards and y northwards, and Δx and Δy are the east-west and south-north map resolutions (i.e. the grid spacing).
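Box 9.1's formulas can be written out for a single 3 x 3 window in plain Python. In this sketch `window[j][i]` is assumed to hold the elevation at (x + i - 1, y + j - 1), so its rows run from south to north; many raster formats store the northern row first, in which case the window must be flipped. Box 9.1 gives aspect only as a tangent; here it is instead reported as the compass bearing of the downhill direction, measured clockwise from north, a convention chosen for this example that differs from GRASS's (see below).

```python
import math


def horn_slope_aspect(window, dx, dy):
    """Slope and aspect at the centre of a 3 x 3 window using Horn's method.

    window[j][i] holds the elevation at (x + i - 1, y + j - 1): i increases
    eastwards and j increases northwards. dx and dy are the east-west and
    south-north cell sizes. Returns (slope in degrees, aspect), where aspect
    is the compass bearing (degrees clockwise from north) of the downhill
    direction, or None where the surface is flat.
    """
    def z(i, j):  # offsets from the centre cell: z(1, 0) is east, z(0, 1) is north
        return window[j + 1][i + 1]

    dz_dx = ((z(1, 1) + 2 * z(1, 0) + z(1, -1))
             - (z(-1, 1) + 2 * z(-1, 0) + z(-1, -1))) / (8 * dx)
    dz_dy = ((z(1, 1) + 2 * z(0, 1) + z(-1, 1))
             - (z(1, -1) + 2 * z(0, -1) + z(-1, -1))) / (8 * dy)

    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))  # tan S = sqrt(...)
    if dz_dx == 0 and dz_dy == 0:
        return slope, None  # aspect is undefined on flat ground
    # The downhill direction has east and north components (-dz_dx, -dz_dy).
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


# A plane rising 2 m per metre towards the east: it faces (and drains) west.
east_rising = [[-2.0, 0.0, 2.0]] * 3
result = horn_slope_aspect(east_rising, 1.0, 1.0)
```

Applied to a whole raster, the function would be evaluated once per interior cell, moving the window across the grid.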

Slope may be expressed as degrees of inclination from the horizontal, or as a percentage such that 0° = 0% and 45° = 100%. The relationship between these units is simply:

$$\text{slope}_{\text{per cent}} = 100 \times \tan(\text{slope}_{\text{degrees}})$$

$$\text{slope}_{\text{degrees}} = \arctan(\text{slope}_{\text{per cent}} / 100)$$
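The two conversions are one-liners; a Python sketch follows (the function names are ours). Per cent slope grows without limit as the angle approaches 90°, so vertical faces have no meaningful percentage.

```python
import math


def degrees_to_percent(slope_degrees):
    """Per cent slope = 100 * tan(slope in degrees)."""
    return 100.0 * math.tan(math.radians(slope_degrees))


def percent_to_degrees(slope_percent):
    """Slope in degrees = arctan(per cent slope / 100)."""
    return math.degrees(math.atan(slope_percent / 100.0))
```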

Aspect is usually expressed as degrees of rotation from some origin, with a separate value for flat areas where aspect is undefined. It is important to establish exactly what scheme is used by a GIS package, as the meaning of the values may not be self-evident. For example, GRASS returns 0 for flat areas and then numerical values which increase counter-clockwise from an azimuth infinitesimally north of east, so that north is 90°, west is 180°, south is 270° and east is 360°. It is common practice to reclassify aspect values to the compass directions N, NE, E, ..., W and NW. In the case of GRASS, this could be achieved with the rules shown in Fig. 9.4.

Using slope and aspect: Slope and aspect are commonly included in archaeological predictive models of site location (see Chapter 8). They do, however, have properties which demand that they be treated with caution.



0 = 0 Flat
0.0001 thru 22.4999 = 3 E
22.5 thru 67.4999 = 2 NE
67.5 thru 112.4999 = 1 N
112.5 thru 157.4999 = 8 NW
157.5 thru 202.4999 = 7 W
202.5 thru 247.4999 = 6 SW
247.5 thru 292.4999 = 5 S
292.5 thru 337.4999 = 4 SE
337.5 thru 360 = 3 E

Fig. 9.4 A GRASS GIS rule file to reclassify aspect values to the compass directions N, NE, E, ..., W and NW.
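Where the GIS in use has no rule-file reclassification, the same mapping can be scripted. This Python sketch reproduces the classes of Fig. 9.4 for GRASS-style aspect values (0 for flat, otherwise degrees counter-clockwise from east). It uses half-open intervals such as 22.5 <= a < 67.5 in place of the rule file's 22.4999-style upper limits, so that no fractional value falls between classes; the names `COMPASS_CLASSES` and `compass_class` are hypothetical.

```python
# Upper limits (exclusive) of the classes in Fig. 9.4, for aspect measured
# counter-clockwise from east: 90 = N, 180 = W, 270 = S, 360 = E.
COMPASS_CLASSES = [
    (22.5, 3, "E"), (67.5, 2, "NE"), (112.5, 1, "N"), (157.5, 8, "NW"),
    (202.5, 7, "W"), (247.5, 6, "SW"), (292.5, 5, "S"), (337.5, 4, "SE"),
]


def compass_class(aspect):
    """Return (class code, label) for a GRASS-style aspect value."""
    if aspect == 0:
        return 0, "Flat"
    for upper, code, label in COMPASS_CLASSES:
        if aspect < upper:
            return code, label
    return 3, "E"  # 337.5 up to 360 wraps round to east
```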

[Figure 9.5: histogram of slope values; x-axis: slope (°)]

Fig. 9.5 A histogram of slope values in Dentdale and Garsdale, UK, calculated from Ordnance Survey elevation data. Despite being an area of highly variable relief, the distribution of slope values is strongly skewed such that, although the maximum slope is 43.1°, 75 per cent of cells have slopes < 13° and 50 per cent have slopes < 8.7°.

Slope: Values calculated from elevation maps are seldom uniformly or normally distributed (Fig. 9.5). Rather, it is often the case that relatively slight slopes are most frequent, with steeper slopes less frequent and slopes > 15° comparatively rare. The non-normal distribution of slope values rules out the use of parametric statistical tests for investigating whether archaeologically significant locations are associated with particular magnitudes of slope. As described in Chapter 7, one solution is to use a nonparametric test, such as the Kolmogorov-Smirnov test, to compare the slope values at archaeologically significant locations with those at a number of randomly chosen locations. The parameters of the distribution of slope values should be established empirically as they are likely to differ from one study region to another. Note that the same is also true of many other products of elevation data, including aspect and viewshed size.

[Figure 9.6: histogram of aspect values; x-axis: aspect (°)]

Fig. 9.6 A histogram of aspect values in Dentdale and Garsdale, UK, calculated from Ordnance Survey elevation data.

Aspect: Values calculated from elevation maps are usually more uniformly distributed than slope values, as can be seen from the histogram in Fig. 9.6. However, it is not uncommon for the distribution of aspect values calculated in a raster system to contain artefacts of the method of calculation. For example, Fig. 9.6 shows a series of low but wide peaks at 90° intervals and high but narrow peaks at 45° intervals. Mystics will be disappointed to learn that the hills of northern England do not resemble a series of octangular pyramids orientated exactly on the cardinal directions! Rather, these peaks of higher frequency are the result of calculating aspect using the eight cells in the immediate neighbourhood of each location. As we have stressed elsewhere, GIS maps are models, and not always very good ones.
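The Kolmogorov-Smirnov comparison recommended above for slope can be sketched without a statistics library. The function below computes the two-sample statistic D, the largest gap between the two empirical cumulative distributions, and compares it with the usual large-sample critical value c(α)·sqrt((n + m)/(nm)), where c(0.05) ≈ 1.358. The slope values are invented for illustration, and the large-sample approximation is rough for samples as small as these; for exact p-values a library routine such as SciPy's `scipy.stats.ks_2samp` would normally be used.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the empirical cumulative distribution functions of the samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for value in a + b:  # the gap can only change at an observed value
        cdf_a = bisect.bisect_right(a, value) / len(a)
        cdf_b = bisect.bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value of D; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical slope values (degrees) at sites and at random locations.
site_slopes = [2.1, 3.4, 1.8, 4.0, 2.7, 3.1, 1.2, 2.9]
random_slopes = [8.7, 1.5, 12.9, 6.2, 3.3, 15.8, 9.4, 4.8, 7.1, 20.3]
d = ks_statistic(site_slopes, random_slopes)
significant = d > ks_critical_value(len(site_slopes), len(random_slopes))
```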

Those who wish to use slope and aspect for statistical analysis should note that they are not wholly independent variables, because slopes of 0° are necessarily correlated with undefined aspect. That said, it is generally the case that elevation, slope and aspect are not closely correlated in areas of varying elevation (Kvamme 1990a), although it may be worth checking this on a case-by-case basis (e.g. Lake and Woodman 2000).

Displaying slope and aspect: When displaying a slope map it may be necessary to increase the number of colours allocated to low slope values in order to reveal variation in all areas of the map. This is a direct result of the skewed distribution of values mentioned above. Figure 9.7(a) shows a map of slope values drawn with shades of grey allocated to equal-size classes of slope. Figure 9.7(b) shows the same information, but displayed with shades of grey allocated to variable-size classes of slope chosen to equalise the area they represent: note how it provides better differentiation of the low slope values. Burrough and McDonnell (1998, p. 193) and Mitasova et al. (1995) provide alternative suggestions for suitable class sizes.

Fig. 9.7 Slope classes in (a) are shaded using an equal-interval method, obscuring variability at the low end (darker colours) of the range. This is corrected in (b) by defining classes using an equal-area classification.

One of the most common applications of slope and aspect is the production of shaded relief maps. Shading makes it possible to display elevation data with a three-dimensional appearance, as if illuminated by low sunlight. The effect is particularly useful with high-resolution topographic survey data to help visualise, for example, partially ploughed-out earthworks. It is also mercilessly revealing of algorithm artefacts, particularly of the 'terraces' sometimes produced during interpolation from contour data (Chapter 6). Once the slope and aspect have been calculated from elevation data, shading can then be applied to other data, such as land-cover classes or even aerial photographs, giving an impression of relief (Fig. 9.8).
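The simplest kind of shading can be sketched as a Lambertian illumination model, a standard textbook formulation rather than any particular package's implementation. The hypothetical function below returns a relative brightness between 0 and 1 from a cell's slope and aspect, with the light source defaulting to the conventional NW position 45° above the horizon. It assumes aspect is given as the compass bearing of the downhill direction, clockwise from north; GRASS-style values (counter-clockwise from east) can be converted with (90 - aspect) % 360.

```python
import math


def hillshade(slope_deg, aspect_deg, sun_azimuth=315.0, sun_altitude=45.0):
    """Relative illumination (0 to 1) of a cell from its slope and aspect.

    aspect_deg: compass bearing (clockwise from north) of the downhill direction.
    The default light source sits in the NW, 45 degrees above the horizon.
    """
    zenith = math.radians(90.0 - sun_altitude)   # angle of the sun from vertical
    slope = math.radians(slope_deg)
    relative = math.radians(sun_azimuth - aspect_deg)
    value = (math.cos(zenith) * math.cos(slope)
             + math.sin(zenith) * math.sin(slope) * math.cos(relative))
    return max(0.0, value)  # cells turned away from the light are fully dark


flat = hillshade(0.0, 0.0)              # about 0.71: lit from 45 degrees above
facing_light = hillshade(45.0, 315.0)   # about 1.0: the slope faces the NW light
facing_away = hillshade(45.0, 135.0)    # about 0.0: the slope faces SE
```

Multiplying these brightness values into the colours of a land-cover map or aerial photograph gives the shaded effect described above.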

Most GIS packages provide push-button functionality to produce shaded relief maps. The simplest implementations shade the surface so that cells whose aspect faces away from a hypothetical light source (conventionally placed at 45° above the horizon in the NW) are shaded more darkly than those facing the light source. More sophisticated implementations not only allow the user to vary the position of the light source, but also estimate the amount of light reflected from each cell as a function of aspect, slope and the reflectance of the land cover. Horn (1981) provides a detailed review of the various methods that are used.

The principles used to calculate shaded relief are taken to their logical conclusion in the calculation of solar gain, also known as irradiance. Solar gain is not a presentational device, but a genuinely analytical model of the amount of solar energy falling on each location in a landscape. The calculation of solar gain is relatively complicated (see Burrough and McDonnell 1998, pp. 202-203 for a brief introduction), but it has been usefully incorporated into predictive models of site location (Duncan and Beckman 2000).

Map algebra, derivatives and processes

Fig. 9.8 A hillshade model integrated with a DEM to create a shaded relief model.

Fig. 9.9 Point A on the pond barrow has the same slope and aspect as point B on the bell barrow, but they differ in plan and profile convexity, as described in the text.

9.3.2 Second-order derivatives: profile and plan convexity

While the first-order derivatives of elevation, slope and aspect, provide useful and easily interpreted information about terrain, they do not directly tell us about its shape. For that it is necessary to calculate the second-order derivatives, profile and plan convexity, which measure change in the first-order derivatives. By way of illustration, suppose a colleague had conducted a high-resolution topographic survey of a Bronze Age barrow cemetery and provided you with the slope and aspect values for an unspecified location near the centre of a perfectly formed round barrow of unspecified type (Fig. 9.9). With just this information it would be impossible to distinguish between a bell barrow and a pond barrow. If, however, your colleague had instead provided you with the profile and plan convexity, then the type of barrow would have been immediately apparent: any location near the centre of a bell barrow would be located on a slope that is convex in both profile and plan, giving it positive values, whereas any location near the centre of a pond barrow would be located on a slope that is concave in both profile and plan, giving it negative values.

In practice, the main applications of profile and plan convexity are in geomorphology, since quantitative measures of terrain form can provide useful information about the formation processes that gave rise to particular landscapes (Clowes and Comfort 1982). Such information may in turn be of value to archaeologists, especially those studying earlier prehistory. There are, however, a variety of ways in which profile and plan convexity could be of direct use to archaeologists (we do not seriously anticipate their use to classify round barrows!). Returning to the example of the placement of Yorkshire Dales farmsteads, both profile and plan convexity might provide proxies for shelter, such that locations with negative values are more sheltered than those with positive values. Alternatively, and in a less processual vein, the ability to map qualitative aspects of terrain such as 'roundedness' opens up the possibility of enhanced dialogue with those who espouse a phenomenological approach to landscape. Indeed, we suspect that such dialogue might fare rather better if GIS-using archaeologists showed a greater willingness to experiment with the more abstract higher-order derivatives of continuous field data. Those interested in such an endeavour should consult the work of Marcos Llobera (2001) on, for example, prominence, as well as that by Lake and Woodman (2003) on the settings of stone circles.

9.4 Continuity and discontinuity

It is possible to emphasise change in continuous field data, often in the hope of identifying boundaries that may mark the edges of archaeological features such as walls, buildings or even entire settlements, depending on the scale of analysis. Conversely, it is also possible to de-emphasise change to, for example, smooth an otherwise 'problematic' DEM. Both are most commonly achieved using spatial filtering applied to raster maps.
9.4.1 Spatial filtering

Spatial filtering is in some respects similar to the inverse distance weighting method of interpolation discussed in Section 6.4.1. Like that method, it computes the value of a given cell as a function of the values of cells in a neighbourhood, and the result is often, although not necessarily, some kind of weighted average. Unlike interpolation, however, the aim is not to estimate the value of unsampled cells, but to modify existing data in such a way as to increase or decrease the spatial autocorrelation between neighbouring cells. As we saw in Chapter 8, an increase in correlation means that the values of neighbouring cells will become more similar, while a decrease means that they will become less similar.

The neighbourhood used in spatial filtering is usually a square window, often called the kernel. The new (filtered) value of the central cell in the window is computed as a function of the cell values covered by the window. The window is moved over the entire map, ensuring that all cells eventually receive a new filtered value. The process is said to be sequential if the new value of a cell is used in calculating the value of a neighbouring cell, or parallel if all calculations use the original (unfiltered) cell values. Most archaeological applications require parallel filtering. Spatial filtering is described mathematically by the equation:

$$C_{ij} = f\left(\sum_{k=i-m}^{i+m}\;\sum_{l=j-m}^{j+m} \lambda_{kl}\, c_{kl}\right)$$


What this says is that the value of the central cell ($C_{ij}$) is some function ($f$) of each of the surrounding cells ($c_{kl}$) in a window of radius $m$, where the value of each cell $c_{kl}$ is multiplied by a weight $\lambda_{kl}$ and these values are then summed to determine the new value of the cell at the centre of the kernel. Note that each cell may be multiplied by a different weight depending on the type of function and the distance of the cell. For example, a type of spatial filter called Laplacian, described in further detail below, weights the values of pixels in such a way that locations where rapid changes in values occur are highlighted. To achieve this effect the weights may be negative in value depending on the size of kernel, as in the following example for a 7 × 7 kernel:
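$$\begin{bmatrix}
 0 &  0 & -1 & -1 & -1 &  0 &  0\\
 0 & -1 & -3 & -3 & -3 & -1 &  0\\
-1 & -3 &  0 &  7 &  0 & -3 & -1\\
-1 & -3 &  7 & 24 &  7 & -3 & -1\\
-1 & -3 &  0 &  7 &  0 & -3 & -1\\
 0 & -1 & -3 & -3 & -3 & -1 &  0\\
 0 &  0 & -1 & -1 & -1 &  0 &  0
\end{bmatrix}$$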
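The filtering equation above can be made concrete with a short Python/NumPy sketch in which $f$ defaults to the identity, so each new value is simply the weighted sum over the window. The function name `apply_kernel` is hypothetical, and two choices are ours rather than the text's: edge cells, which lack a full window, are copied unchanged, and a `sequential` flag is included purely to show how sequential filtering differs from the parallel filtering that most archaeological applications require.

```python
import numpy as np

def apply_kernel(grid, weights, f=lambda s: s, sequential=False):
    """Filter `grid` with a square (2m+1) x (2m+1) kernel of `weights`.

    Each interior cell becomes f(sum of weight * cell value over the window).
    Edge cells, which lack a full window, are copied unchanged.
    """
    src = np.asarray(grid, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = w.shape[0] // 2                      # window radius
    out = src.copy()
    # Parallel filtering reads only the original map; sequential filtering
    # reads the map being written, so earlier results feed later windows.
    read = out if sequential else src
    rows, cols = src.shape
    for i in range(m, rows - m):
        for j in range(m, cols - m):
            window = read[i - m:i + m + 1, j - m:j + m + 1]
            out[i, j] = f(np.sum(w * window))
    return out

# Equal weights of 1/9 on a radius-1 window give a 3 x 3 mean.
mean3 = np.full((3, 3), 1.0 / 9.0)
```

With a single high value in an otherwise empty grid, the parallel version spreads it evenly over the surrounding cells, whereas the sequential version lets each updated cell contaminate the next window in the scan, so the two results differ.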

GIS packages that offer spatial filtering may require the user to specify the weights and probably also the radius of the window. The more analytically orientated packages also provide a choice of functions; for example, the Idrisi 'FILTER' module provides nine predefined filters as well as the option for user-defined functions (Eastman 2001, p. 87).

9.4.2 Low-pass filters: smoothing

Low-pass filters increase the correlation between neighbouring cells: they have the effect of smoothing away local variability (often referred to as noise). Figure 9.10 shows the application of a low-pass filter to a 5 × 5 matrix.

Using low-pass filters

Low-pass filters have been used to smooth (or 'blur') data in an archaeological GIS for a variety of reasons, some more appropriate than others:


Fig. 9.10 A simple low-pass filter. In this example the 25 cell values in (a) have been replaced by the mean of themselves plus their 8 nearest neighbours. This new value is stored in a separate map (b). The process transforms a 'rough' surface (c) to a 'smooth' surface (d). Note that new values are not calculated for the cells at the edge of the map, as erroneous values would be calculated for these owing to their having fewer than 8 neighbours.

Presentation
Smoothed surfaces often look more visually attractive than the raw data. For example, it may be preferable to overlay labels and other graphics on the continuous tone representing smoothed quadrat counts of, say, microliths, rather than overlay them on the 'blocky' representation of the raw counts. However, we suggest that it is generally better to avoid such uses of smoothing unless: (a) the original measurements are also made available; and (b) the smoothed surface is not being used to support an argument that is untenable given the variability in the original data.

Enhancement of terrain models
As we saw in Chapter 6, terrain models often contain interpolation artefacts, in which case there may be a temptation to smooth them away. Smoothing for purely aesthetic reasons has the disadvantage that it misleads others about the quality of the terrain model. One common analytical reason for smoothing is the removal of 'pits' that would otherwise disrupt the simulated flow of water in hydrological modelling (see Chapter 11). Another is to remove interpolation 'terraces' that would otherwise erroneously disrupt the line of sight in viewshed analysis, although this has the significant disadvantage of lowering peaks and ridges, which are often important determinants of viewshed extent.




Fig. 9.11 The application of filtering to calculate the (b) mode, (c) range and (d) diversity of map (a).

Identification of trend
By reducing local variability it may be easier to identify trends in data. The use of smoothing for this purpose can be appropriate when the process is of an informal exploratory nature, but for more analytical purposes it is better to model a trend surface (Chapter 6), as this provides a measure of the fit.

Computing low-pass filters
Low-pass filters typically compute the central cell as some kind of average of the cells in the window (the kernel):
• The mean is used on ratio and interval data (Fig. 9.10), for example elevation or pottery counts. The mean can be calculated quite simply by applying equal fractional weights and then summing the results. For example, if the window is of radius $m = 1$ then all nine weights $\lambda_{i-1,j-1} \ldots \lambda_{i+1,j+1}$ should be set to 1/9, since summing one-ninth of each of nine cells gives their average value.
• The mode (most common value) provides a more appropriate average of nominal and ordinal data such as, for example, a map showing the most frequent artefact type in each cell (Fig. 9.11b).
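Both averages can be sketched with a sliding 3 × 3 window in Python/NumPy. This is a minimal illustration and the function names are hypothetical. Following the convention of Fig. 9.10, no new values are computed for edge cells, which simply keep their original values. The mode uses `collections.Counter`, whose `most_common` breaks ties in favour of the value encountered first in the window; that tie rule is an arbitrary choice of this sketch, and real packages may resolve ties differently.

```python
from collections import Counter

import numpy as np

def _windows(grid, m=1):
    """Yield (i, j, window) for every interior cell that has a full window."""
    rows, cols = grid.shape
    for i in range(m, rows - m):
        for j in range(m, cols - m):
            yield i, j, grid[i - m:i + m + 1, j - m:j + m + 1]

def mean_filter(grid, m=1):
    """Low-pass filter for ratio/interval data: mean of the window."""
    src = np.asarray(grid, dtype=float)
    out = src.copy()                  # edge cells keep their original values
    for i, j, win in _windows(src, m):
        out[i, j] = win.mean()        # equivalent to nine weights of 1/9 when m = 1
    return out

def mode_filter(grid, m=1):
    """Low-pass filter for nominal/ordinal data: most common value in the window."""
    src = np.asarray(grid)
    out = src.copy()
    for i, j, win in _windows(src, m):
        out[i, j] = Counter(win.ravel().tolist()).most_common(1)[0][0]
    return out
```

Both functions read only the original map and write to a copy, so they perform parallel rather than sequential filtering.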

An alternative use of low-pass filtering is not to smooth existing data, but to create a new map that provides a measure of its local variability. The following functions measure (but do not increase) local variability:



• The range is used on ratio and interval data to describe the maximum difference between cell values in the window (Fig. 9.11c).
• The diversity (the number of different values in the window) provides a more appropriate measure of variability in nominal and ordinal data (Fig. 9.11d).
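The range and diversity can be computed with the same sliding window. In the minimal Python/NumPy sketch below, the function names are hypothetical. Because these functions produce a new map of variability rather than a smoothed version of the old one, edge cells, which lack a full window, are marked as undefined (NaN for the range and -1 for the diversity). This marking convention is an assumption of the sketch.

```python
import numpy as np

def range_filter(grid, m=1):
    """Local variability for ratio/interval data: max minus min in the window."""
    src = np.asarray(grid, dtype=float)
    out = np.full(src.shape, np.nan)          # edge cells left undefined
    rows, cols = src.shape
    for i in range(m, rows - m):
        for j in range(m, cols - m):
            win = src[i - m:i + m + 1, j - m:j + m + 1]
            out[i, j] = win.max() - win.min()
    return out

def diversity_filter(grid, m=1):
    """Local variability for nominal/ordinal data: number of distinct values."""
    src = np.asarray(grid)
    out = np.full(src.shape, -1, dtype=int)   # -1 marks edge cells
    rows, cols = src.shape
    for i in range(m, rows - m):
        for j in range(m, cols - m):
            win = src[i - m:i + m + 1, j - m:j + m + 1]
            out[i, j] = len(np.unique(win))
    return out
```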

9.4.3 High-pass filters: emphasising change

High-pass filters have the opposite effect of low-pass filters: they enhance (or 'sharpen') local variability (noise).

Using high-pass filters
The most common archaeological application of high-pass filtering is edge detection. For example, given a high-precision topographic survey it may be possible to identify the walls of a buried building by locating areas with a rapid rate of change of elevation (Fig. 9.12). The same technique can also be applied to crop marks in aerial photographs, where the change is not in elevation but in the tones recorded in the (digitised) image. In such cases the purpose of edge detection might be:
• Detection of features barely discernible to the human eye;
• Extraction of features for automated mapping (by thinning and vectorising the results);
• Quantification of the fragility of the traces of features, perhaps for conservation purposes.

Edge detection can also be used visually to examine terrain models for interpolation artefacts, although a slope map is often just as effective for this purpose.

Computing high-pass filters
High-pass filters are defined in map algebraic terms as:

High-pass map = Original map - Low-pass map

High-pass filters can be created by implementing the above formula in map algebra. Figure 9.12(b) shows a high-pass filter calculated in this way using a kernel radius of 1. It is more common, however, to use specially designed sets of kernel weights. For example, the previously described Laplacian edge filter is calculated by summing the products of each pixel value and its kernel weight. In a 3 × 3 kernel the appropriate individual pixel weights are:
$$\begin{bmatrix} -1 & -1 & -1\\ -1 & +8 & -1\\ -1 & -1 & -1 \end{bmatrix}$$

This result emphasises the edges of polygonal areas with higher or lower values than the surrounding area. The above Laplacian filter applied to the elevation data in Fig. 9.12(a) produces the image in Fig. 9.12(c).
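Both routes to a high-pass map can be sketched in Python/NumPy. This is a minimal illustration on a hypothetical synthetic DEM (a bank 3 units high running north-south through a flat field). The helper `window_sum` and the choice to set edge cells to zero are assumptions of the sketch. The map-algebraic version subtracts a 3 × 3 mean from the original map. The Laplacian version applies the 3 × 3 weights given above. The last line mimics the extraction of values greater than 1 used for Fig. 9.12(d), producing a Boolean map.

```python
import numpy as np

def window_sum(grid, weights):
    """Weighted sum over a 3 x 3 window for interior cells; edges set to 0."""
    z = np.asarray(grid, dtype=float)
    w = np.asarray(weights, dtype=float)
    out = np.zeros_like(z)
    rows, cols = z.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i, j] = np.sum(w * z[i - 1:i + 2, j - 1:j + 2])
    return out

MEAN_3x3 = np.full((3, 3), 1.0 / 9.0)
LAPLACIAN_3x3 = np.array([[-1, -1, -1],
                          [-1,  8, -1],
                          [-1, -1, -1]], dtype=float)

def high_pass(grid):
    """Map-algebraic high-pass: original map minus low-pass (mean) map."""
    z = np.asarray(grid, dtype=float)
    hp = z - window_sum(z, MEAN_3x3)
    hp[[0, -1], :] = 0.0               # no low-pass value exists at the edges
    hp[:, [0, -1]] = 0.0
    return hp

def laplacian(grid):
    """Laplacian edge filter using the 3 x 3 weights above."""
    return window_sum(grid, LAPLACIAN_3x3)

# Hypothetical synthetic DEM: a bank 3 units high in a flat field.
dem = np.zeros((5, 7))
dem[:, 3] = 3.0
edges = high_pass(dem) > 1.0           # Boolean map: extract values > 1
```

With these particular weights the two results differ only by a constant factor. Since 8c minus the sum of the eight neighbours equals 9c minus the window sum, the Laplacian value is exactly nine times the map-algebraic high-pass value at every interior cell.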


Fig. 9.12 Map algebraic (b) and Laplacian (c) high-pass filters applied to a synthetic DEM containing traces of a field system (a). Extraction of values > 1 from the map algebraic result produces a map of the field system (d).

The more raster-orientated GIS packages such as GRASS and Idrisi also offer other specialised edge-detection filters, such as the zero-crossing and Sobel edge detector filters.

9.5 Surface processes: erosion

This section discusses soil erosion modelling as an example of a spatial operation that models a real-world process operating continuously across space. We have chosen soil erosion modelling for two reasons: first, it has several archaeological applications and, second, it provides a good illustration of the kinds of issues that are likely to confront those using the more specialised GIS functions. Archaeologists may gain insight from the modelling of soil erosion in a variety of applications, including:
• Conservation: as archaeological sites may benefit from the identification of those at risk from natural erosion, especially in upland areas;
• Prediction of site location: as this may be enhanced by the identification of areas where soil has been deposited in sufficient quantity to ensure the preservation of archaeological features;
• Landscape history: which may require an appreciation of the effects of human land-use practices on soil erosion and thus, ultimately, landform.

Table 9.1 The universal soil loss equation: A = R × K × L × S × C × P

Variable  Description
A         Annual soil loss (ton ha⁻¹)
R         Erosivity of the rainfall, where R = 0.11abc + 66 and a is average annual precipitation (cm), b is the maximum one-day precipitation occurring once in 2 years (cm), and c is the maximum total precipitation of a shower occurring once in 2 years (cm)
K         Erodibility of the soil
L         Slope length factor: L = (l/22.1)^0.5, where l is slope length (m)
S         Slope factor: S = 0.0062x² + 0.065, where x is the slope (%)
C         Cultivation parameter
P         Production parameter

Source: Burrough and McDonnell 1998, Box 7.2.
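Because every term in Table 9.1 is defined cell by cell, the USLE is a straightforward exercise in map algebra. The Python/NumPy sketch below is a minimal illustration and its function names are hypothetical. It computes the R and L factors from the formulas in Table 9.1 and multiplies them with K, S, C and P, which are assumed to have been prepared already as rasters or constants. The S factor is therefore taken as an input rather than recomputed here. NumPy broadcasting lets any factor be either a single value or a raster of matching shape. The caveats about the USLE's applicability outside the eastern USA discussed in Section 9.5.1 apply.

```python
import numpy as np

def rainfall_erosivity(a, b, c):
    """R = 0.11abc + 66 (Table 9.1); a, b, c in cm, scalars or rasters."""
    return 0.11 * np.asarray(a) * np.asarray(b) * np.asarray(c) + 66.0

def slope_length_factor(l):
    """L = (l / 22.1) ** 0.5, with slope length l in metres."""
    return (np.asarray(l, dtype=float) / 22.1) ** 0.5

def usle(R, K, L, S, C, P):
    """A = R x K x L x S x C x P, evaluated cell by cell."""
    return R * K * L * S * C * P
```

In a real project K, S, C and P would be raster layers derived from soil, slope and land-use maps. Local values for R, K, C and P can come from published tables such as those mentioned in Section 9.5.1.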

9.5.1 Computing soil erosion

Prior to the widespread availability of GIS, quantitative soil erosion modelling essentially involved constructing an empirical predictive model of the likely soil loss at a given location. This approach, epitomised by the well-known universal soil loss equation (USLE), links observed rates of soil loss in the eastern USA with the erosivity of the rainfall and other attributes of the land (Wischmeier and Smith 1978). The latter usually include the erodibility of the soil, the slope, the slope length, the type of cultivation and any measures taken to impede erosion (Table 9.1). A number of organisations provide data to simplify the calculation of the USLE for agricultural and soil management purposes. For example, the US National Soil Erosion Research Laboratory (NSERL)¹ and the Ontario Ministry of Agriculture and Food² publish tables of data giving values of R, K, C and P for local areas and conditions.

The USLE can be implemented in a GIS by applying map algebra to raster maps of the various attributes. Note, however, that it is not a model per se but a set of empirically derived relationships between environmental variables and land use for the eastern USA. Its applicability outside this region is questionable, and in areas such as the Mediterranean the USLE has been shown to have little predictive value for estimating rates of soil erosion (Grove and Rackham 2003, p. 255).

The USLE provides a basic generalised model of erosion, but it fails to harness the ability of GIS to implement operations that are truly spatial. More recent GIS-based approaches attempt to predict soil loss by modelling the actual process of erosion (Nearing et al. 1994), namely, the transport of sediment over a land surface. Examples of this approach include the AGricultural Non-Point Source model (AGNPS) (Young et al. 1989) and the Areal Non-point Source Watershed Environment Response Simulation model (ANSWERS; Beasley and Huggins 1991).

Process-orientated erosion models typically work by allowing sediment to 'flow' downhill from each map cell to its lowest neighbour. Since cells may also receive sediment from their uphill neighbours, the end result at any particular location can be either erosion (a net loss of sediment) or deposition (a net gain of sediment). Depending upon how the model is implemented, the results may be viewed as one or more of: a revised elevation model; a map of revised sediment thickness; or a map of the changes in elevation or sediment thickness (see figures in Mitas et al. 1996 for examples).

¹ http://topsoil.nserl.purdue.edu/usle/
² www.gov.on.ca/OMAFRA/index.html
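The 'flow to the lowest neighbour' rule can be sketched as follows. This minimal Python/NumPy illustration is not the routing scheme used by AGNPS or ANSWERS. The function `lowest_neighbour` is hypothetical. It examines all eight neighbours and picks the one with the lowest elevation, provided that neighbour is lower than the cell itself. Many real implementations instead weight diagonal drops by distance so that the steepest descent wins, which can give a different answer. Cells with no lower neighbour (pits or flats) are flagged with (-1, -1), and sediment arriving there would be deposited rather than passed on.

```python
import numpy as np

# Row and column offsets of the eight neighbours of a cell.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]

def lowest_neighbour(dem):
    """For each cell, return the (row, col) of its lowest lower neighbour.

    Cells with no lower neighbour (pits, flats) get (-1, -1).
    """
    z = np.asarray(dem, dtype=float)
    rows, cols = z.shape
    target = np.full((rows, cols, 2), -1, dtype=int)
    for i in range(rows):
        for j in range(cols):
            best, best_z = None, z[i, j]
            for di, dj in OFFSETS:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols and z[ni, nj] < best_z:
                    best, best_z = (ni, nj), z[ni, nj]
            if best is not None:
                target[i, j] = best
    return target
```

A process model would then move some quantity of sediment, set by detachment and transport factors, from each cell to its target. A cell's net change is then what it receives from uphill minus what it passes on, which is how erosion and deposition maps arise.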
Theoretically, the amount of sediment that moves between any two given cells depends upon factors that affect three processes: the detachment of soil particles from the parent material; the transport of those particles by some agent, such as water or wind; and the deposition of transported particles. The factors most commonly included in process-orientated GIS erosion models are those that measure the contribution of landform, soil type, land cover and rainfall intensity to detachment, transport and deposition (Box 9.2).

9.5.2 Issues in erosion modelling

Three issues stand out as particularly relevant to archaeologists, and we discuss them in some detail as they also apply to many kinds of advanced modelling in GIS.

Choosing a model
The simplest way of choosing a model is to find one that has been implemented in the GIS package with which you are most familiar and/or which the relevant project uses. However, since considerably more time is likely to be spent gathering input data than learning which buttons to press, it may be quicker in the long run, and more scientifically defensible, to choose according to the properties of the model rather than the software. Most models draw on the same basic understanding of the physical processes that cause erosion, but all differ in how exactly they model those processes. This is true in terms of both the spatial resolution and the specific parameters used to characterise the processes. For example, while AGNPS assumes that rainfall is constant over the entire region, ANSWERS allows several 'rain gauges' to be specified, each recording temporal variation in rainfall in a given subregion. Similarly, AGNPS requires that soil properties be specified in terms of percentages of sand, silt and clay, while ANSWERS requires information about the porosity, saturation point and infiltration rate. Consequently, there are at least three criteria that could be used to choose a model:
• Spatial resolution: the match between the spatial resolution and the study region;
• Parameters: the match between the parameters required and those that are available or can be most easily obtained;
• Theoretical preference: which is likely to require expert advice.

Box 9.2 Parameters required for the ANSWERS erosion model

Soil
• Total porosity (per cent pore space volume of soil)
• Field capacity (per cent saturation)
• Steady-state infiltration rate (mm h⁻¹)
• Difference between steady-state and maximum infiltration rate (mm h⁻¹)
• Infiltration exponent (accounts for the rate of decrease in infiltration capacity with increasing moisture content)
• Antecedent soil moisture (per cent saturation)
• Soil erodibility (K in the USLE)

Land use
• Potential rainfall interception by land cover (mm)
• Surface covered by specified land use (%)
• Roughness coefficient of the surface (a shape factor)
• Maximum roughness height of the surface profile (m)
• Manning's n (a measure of flow retardance of the surface)
• Relative erosiveness (function of time and USLE C and P)

Watershed
• Tile drain coefficient
• Groundwater release fraction (contribution of groundwater flow to runoff)

Rainfall
• Intensity at specified times during event (mm h⁻¹)

Channels
• Width
• Roughness
• Slope (if not the same as the overland slope)

Management practices
• One of a choice of four best management practices (BMPs)

Assembling input data
As Table 9.1 and Box 9.2 show, erosion models require large numbers of parameters. Some of these, such as elevation and the distribution of soil types, may be available from published sources in digital form (and almost certainly for a fee). It is important to check that the resolution is adequate for the problem at hand. The distribution of land cover can also be derived from satellite remote-sensing data, although this may require fieldwork for ground truthing (see Chapter 7). The properties of soil types and the effect of land-cover types on, for example, flow rates may require extensive investigation in the field and laboratory. The key point here is that with erosion modelling in particular, and advanced modelling in general, the use of GIS is often the easiest and cheapest part of the whole endeavour. High-quality input data are more likely to have a greater influence on the usefulness of the results than expenditure on more sophisticated software.

Palaeoenvironmental reconstruction
By running an erosion model on the palaeoenvironment it may be possible to predict where sites are buried or to enhance understanding of landscape history. A key issue, of course, is the quality of the available palaeoenvironmental reconstruction. All archaeologists will be familiar with issues of temporal resolution and other sources of uncertainty, such as the differential preservation of pollen from different tree species. The additional problem facing those undertaking GIS-based process-driven modelling of erosion is that the requirement for spatially continuous input is not immediately met by the palaeoenvironmental evidence, which almost always constitutes point data, such as that from boreholes. This problem, which is common to most GIS-based modelling of past landscapes, including viewshed analysis (Chapter 10), must be overcome by some form of interpolation. One response to the additional uncertainty introduced by the need for interpolation is simply an outright rejection of GIS-based modelling of past landscapes, but this seems to us overly pessimistic. Instead, we suggest two criteria that can be used in determining whether to proceed:
1. If the effect of uncertainty in the input data can be measured in such a way that the results can be qualified with a probability or a confidence interval. This might be achieved by running the model with a range of different input values reflecting the possible states of the palaeoenvironment.
2. If the uncertainty is commensurable with the inferential logic of the project. For example, a project that seeks to infer unknown human land-use practices from their contribution to the evolution of the modern environment requires reasonable certainty about the palaeoenvironment, because its logic is to infer an unknown process from a known start and end point. In contrast, a project where the land-use practices are known and the aim is instead to reconstruct the palaeoenvironment requires less certainty, because the logic is that the end point and process will be used to infer the unknown start point by a process of 'reverse engineering'.
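The first criterion can be illustrated with a simple Monte Carlo exercise. The idea is to draw many plausible values for an uncertain palaeoenvironmental input, run the model once for each, and report an interval from the spread of the outputs. The Python sketch below rests on loudly hypothetical assumptions. It uses a toy single-cell, USLE-style product as the model. All inputs except the cover factor C are fixed at made-up values, and C is drawn uniformly between 0.05 and 0.4 to stand for uncertainty about past vegetation. None of these numbers come from real data, and the function names are our own.

```python
import random
import statistics

def toy_soil_loss(R, K, L, S, C, P):
    """Single-cell USLE-style product (Table 9.1); stands in for any model."""
    return R * K * L * S * C * P

def monte_carlo_interval(n_runs=5000, c_low=0.05, c_high=0.4, seed=1):
    """Run the toy model with C drawn uniformly from [c_low, c_high].

    Returns the 2.5th and 97.5th percentiles of the outputs: a 95% interval
    that is conditional on the assumed input distribution.
    """
    rng = random.Random(seed)                      # fixed seed: repeatable runs
    runs = [
        toy_soil_loss(R=176.0, K=0.3, L=1.0, S=1.5,   # hypothetical fixed inputs
                      C=rng.uniform(c_low, c_high), P=1.0)
        for _ in range(n_runs)
    ]
    cuts = statistics.quantiles(runs, n=40)        # 39 cut points at 2.5% steps
    return cuts[0], cuts[-1]

low, high = monte_carlo_interval()
```

The interval is only as good as the assumed distribution of C. Its value lies in making that dependence explicit, so that the results can be qualified rather than presented as a single figure.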

9.6 Conclusion

In this chapter we have discussed the application of spatial operations to continuous field data such as elevation. The calculation of slope and aspect is almost routine in research-orientated archaeological GIS, although, as we explained, there are various pitfalls to be avoided. The calculation of plan and profile convexity is less common, but has its uses for both geomorphological and possibly also more phenomenologically orientated research. Low-pass filters have various applications, some more desirable than others, while high-pass filters can be used to assist in the interpretation of aerial and satellite imagery and even, in some cases, automated mapping. The final part of this chapter discussed soil erosion modelling as an example of a spatial operation which attempts to model a real-world process of interest to archaeologists. In the next chapter we extend our discussion of modelling and introduce the derivation and analysis of regions.

10 Regions: territories, catchments and viewsheds

10.1 Introduction: thinking about regions

A GIS can be used to create, represent and analyse many kinds of region. Some regions have an objective reality, at least to the extent that they are widely recognised and have a readily detectable influence on aspects of human behaviour. The most obvious examples of this kind are sociopolitical regions such as the territories of modern nation states. Other regions have an objective reality in another sense: they are defined by some natural process. A good example of a natural region is the watershed; that is, the area within which all rainfall drains to some specified point in a drainage network. A third kind of region is essentially just analytical, in the sense that it is created for a specific short-lived purpose and may never be recognised by anyone other than the analyst. For example, an archaeologist might determine the region containing all land within 100 m of a proposed high-speed railway line in order to identify at-risk archaeological sites, but it is the list of sites and their locations, not the region, that is fed back into the planning process.

Regions are readily represented as polygons in a vector map, or less efficiently as cells coded in such a way as to distinguish between inside and outside a region in a raster map. Where the extents of regions are known in advance of GIS-based analysis, as is often the case with sociopolitical regions, their generation and manipulation within a GIS is mostly an issue of data capture and map query.¹ Consequently, the reader whose primary concern is to map and manipulate known regions should consult Chapters 5 and 7, where we discussed these operations in detail. Our principal concern in this chapter is with the use of GIS to build models of regions whose extents are not known in advance of GIS-based analysis.

The use of GIS to model regions resonates so closely with certain theoretical developments in archaeology and geography that it is important to be aware of them, since they often provide the implicit, if not explicit, justification for many GIS-based analyses. As noted in Chapter 1, there is a long history of intellectual traffic between geography and archaeology (Goudie 1987), stretching from the local antiquaries and travelled dilettantes of the sixteenth to nineteenth centuries, who engaged in various combinations of survey, map making, exploration and antiquarianism, to the influence of humanistic geographers such as Ted Relph (1976) and Yi-Fu Tuan (1974,

¹ The reality of supposedly objective regions is sometimes contested, in which case a more sophisticated approach may be required, such as representation using a raster map with cells coded according to some measure of the degree to which there is consensus about whether or not they fall in the region.




1978) on the development of phenomenological approaches to landscape archaeology. Along the way, the New Archaeology was strongly influenced, particularly in its European guise, by the New Geography, whose classic texts Locational Analysis in Human Geography (Haggett 1965) and Models in Geography (Chorley and Haggett 1967) provided the stimulus for Analytical Archaeology (Clarke 1968) and Models in Archaeology (Clarke 1972). Like the New Archaeology which it partially inspired, the New Geography was to a considerable extent born of a growing interest in (and availability of) techniques of measurement and comparison, and a dissatisfaction with the tradition of descriptive synthesis and narrative (Wagstaff 1987). Of particular significance for the treatment of regions, however, was the move to replace chorology (the study of the character and interrelations between places and regions, which formed the backbone of traditional geography) with spatial analysis (the study of the geometric arrangement and patterns of phenomena; see e.g. Bunge 1966). In other words, according to the New Geography, place and region should be treated as the result of processes of interaction, not objective entities awaiting study. This view inspired a distinctive spatial archaeology, as defined by the contributions to Spatial Archaeology (Clarke 1977) and Spatial Analysis in Archaeology (Hodder and Orton 1976), and the development of site catchment analysis (Roper 1979).

The notion that regions should be treated as the outcomes of processes both inspired, and continues to provide theoretical justification for, many of the common GIS techniques for modelling and creating regions. For example, the creation of regions using Thiessen polygons (Section 10.2.2) betrays certain assumptions about processes of spatial differentiation, while the use of cost surfaces (Section 10.3.1) typically acknowledges factors such as the cost of transport. Consequently, the uptake of GIS in archaeology has led to a tension between the availability of techniques ideally suited to the agenda of 1970s spatial archaeology and the fact that much of this agenda has been discarded by post-processualists more concerned with situated contextual meaning than with the identification of general processes. We will not attempt to adjudicate on this matter, but instead simply lay out the assumptions behind the various techniques and encourage users to think creatively about their application.

10.2 Geometrical regions

Geometrical regions are defined in abstract spatial terms and do not take account of the content of space. As a result they are mostly analytical, since very few real-world spatial processes are unaffected by what is 'on the ground'. Geometrical regions are most commonly produced by buffering (also called proximity analysis) and tessellation.

10.2.1 Buffering

The simplest kind of buffer is a region containing all locations within a certain proximity to a particular geographically referenced entity (the origin). The proximity is usually specified in terms of a maximum distance, or both a minimum and



Fig. 10.1 Multiple buffer zones around a point.

maximum distance. In Fig. 7.5 we illustrated the result of a distance or buffer query performed on point, linear and areal entities. A single buffer generated from a linear entity such as a river or road is often referred to as a corridor. It is also possible to create multiple buffers, in which case minimum and maximum values are usually chosen so as to produce a set of contiguous buffers (zones) of ever-increasing distance from the origin (Fig. 10.1). The presence or absence of a specified location within a particular zone provides categorical information about the distance from that location to the origin. If the zones are made suitably narrow with reference to the map resolution, then the effect is analogous to the process of creating a continuous proximity surface by spreading (see Section 10.3.1). In this case, the presence of a specified location in a particular zone does essentially record the actual distance from that location to the origin. Single and multiple buffers can also be generated from multiple origins, in which case the buffers are computed according to the distance from the nearest origin (Fig. 10.2).

Since buffers are discrete entities they are most efficiently represented as polygons in a vector map. Consequently, almost all vector-based GIS software provides push-button functionality for the computation of single and multiple buffers from one or more origins. A particular virtue of the vector representation of buffers is that they may then be included in spatial queries implemented using polygon overlay (Chapter 7). In this way it would be easy, for example, to identify all archaeological sites that fall within the jurisdiction of a particular local authority and which are also located within 50 m of a proposed new road. It is, however, also possible to represent buffers in raster databases. While single buffers can be represented by Boolean maps, with cells coded 1 if they are inside the buffer and 0 if they are outside, multiple buffers must be represented by coding cells according to the category value of the buffer in which they fall. Traditionally, raster buffers are created by reclassifying a continuous proximity surface, but many raster-based GIS packages now offer push-button functionality for this task.

Fig. 10.2 Multiple merged buffers. Each merged zone reports the distance to the nearest point without indicating which point.

We have included buffers as examples of geometrical regions because they are almost always calculated solely on the basis of Euclidean distance. However, Euclidean distance is not necessarily the most appropriate measure of proximity: other measures that have potential application in archaeology include travel time, the concentration of chemicals diffusing from a source, intervisibility and measures of the possibility of flooding. Map algebra and reclassification can be used to create buffers representing distance conceived in terms of categorical attributes, such as intervisibility and location on the floodplain. Buffers representing distance conceived in terms of continuous attributes, such as travel time, must usually be derived from the results of spreading (see Section 10.3.1 below).

10.2.2 Tessellation

Tessellation is simply the process of dividing an area into a set of smaller tiles such that there are no gaps between the tiles. Raster maps are one example of a tessellation, but in this case the tiles (i.e. map cells) are not normally interpreted as meaningful regions. Here we are more interested in tessellations constructed for analytical purposes. By far the most common of these is a method referred to variously as Dirichlet tessellations, Voronoi diagrams or, more frequently, Thiessen polygons. Given an initial distribution of points, the tessellation divides an area so that each point is enclosed by exactly one polygon, which also contains all the space that is closer to that point than to any other. The polygon may then inherit an attribute of its defining point, so that it becomes in fact a simple form of interpolation. This method can be used for interpolation of categorical point observations, such as to create continuous soil maps from borehole data (Fig. 10.3).
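On a raster, merged multiple buffers and Thiessen polygons can both be derived from a single computation. For every cell, one finds the Euclidean distance to the nearest origin and the identity of that origin. The minimal Python/NumPy sketch below uses hypothetical function names and measures distances in cell units. It uses brute force over all origins, which is adequate for a handful of sites but not for large datasets, where GIS packages use spreading or dedicated distance-transform and Voronoi algorithms. Zones are coded 1, 2, 3 and so on using `np.digitize`, with 0 for cells beyond the last distance break.

```python
import numpy as np

def nearest_origin(shape, origins):
    """Return (distance, index) rasters: Euclidean distance from each cell
    centre to the nearest origin, and the index of that origin in `origins`.
    Ties go to the origin listed first."""
    rows, cols = np.indices(shape)
    pts = np.asarray(origins, dtype=float)                  # (n, 2) of (row, col)
    d = np.sqrt((rows[None] - pts[:, 0, None, None]) ** 2 +
                (cols[None] - pts[:, 1, None, None]) ** 2)  # (n, rows, cols)
    return d.min(axis=0), d.argmin(axis=0)

def buffer_zones(distance, breaks):
    """Code cells 1..len(breaks) by the first break they fall within; 0 = outside."""
    zone = np.digitize(distance, breaks, right=True) + 1
    zone[distance > breaks[-1]] = 0
    return zone
```

The distance raster reclassified by `buffer_zones` gives merged multiple buffers like those in Fig. 10.2, and `distance <= 50` would give a single Boolean buffer. The index raster is a raster Thiessen tessellation. Looking up an attribute of each origin through it (for a hypothetical per-origin array `soil_type`, `soil_type[owner]`) performs the categorical interpolation described above.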



Fig. 10.3 Thiessen polygons. Differential shading represents attribute state.

The New Geography pioneered the use of Thiessen polygons to define the spheres of influence of urban centres, an idea which found its way into Clarke's (1968) manifesto for the New Archaeology. Thiessen polygons were subsequently used to define 'territories' for archaeological site types, including Neolithic long barrows (Renfrew 1976), hillforts (Cunliffe 1971) and Romano-British settlements (Hodder and Hassall 1971b). In these cases they implemented a clear rationale for the allocation of territories: that control should be ascribed to the nearest site. To the extent that this rationale is made explicit, the use of Thiessen polygons represented an improvement on ad hoc methods such as the 'squeezing' of circular buffers (Dennell and Webley 1975; see Steinberg 1996; Perlès 2001, pp. 139-143 for more recent uses). However, few archaeologists would today accept the underlying assumptions that influence is independent of the cost of transport, the size of the centre and a myriad of other social and cultural factors (see, for example, Milner's 1996 review of Peregrine's 1995 application).

There are no 'textbook' methods for adjusting tessellations to reflect how the content of individual tiles influences their extent through factors such as the cost of transport. This is not surprising, since geometrical methods for constructing regions ignore the content of space; spreading provides a more appropriate method when the latter is important. It is, however, possible to adjust the extent of individual tiles according to the relative importance of each of the points from which the tessellation was generated. In this case the 'importance' may be a measure of anything from population size to the presence or absence of a religious centre. The basic premise of gravity modelling is that the intensity of interaction between locations is directly proportional to some quantity at those locations and inversely proportional to the intervening distance (Hodder and Orton 1976, pp. 187-195). For example, and very crudely, it predicts more interaction between large settlements

Box 10.1 Weighting Thiessen polygons


One possible equation to weight Thiessen polygons, and thus to model the boundary between two settlements, $i$ and $j$, is:

$$D'_{j} = \frac{D_{ij}}{1 + \sqrt{P_i / P_j}}$$

where $P_i$ and $P_j$ are the two population sizes, $D_{ij}$ is the distance between them and $D'_{j}$ is the distance from settlement $j$ at which the boundary may be drawn (Hodder and Orton 1976, p. 188). Constructing weighted Thiessen polygons based on this gravity model in a GIS involves first establishing the smallest convex hull around each point using its neighbouring points as vertices. Weighted polygons may then be constructed by first connecting a series of point locations with line segments and erecting perpendiculars to those line segments at the point defined by $D'_{j}$, and then extending those perpendiculars until they intersect. An alternative is provided by Aurenhammer and Edelsbrunner (1984). Weighting Thiessen polygons is not provided as a built-in function of desktop GIS, so implementing this algorithm in mainstream commercial GIS packages will likely require the writing of a script. Alternatively, freely available standalone programs such as Gambini (Tiefelsdorf and Boots n.d.) or VPPlants (Gavrilova n.d.) perform this function, and the latter permits the export of the results so they can be placed back into a GIS.
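The Box 10.1 equation is simple to evaluate directly. The Python sketch below computes the breaking point $D'_j$ and, as a raster alternative to the perpendicular construction described in the box (our own substitution, not the box's algorithm), allocates each cell to the settlement with the smallest distance divided by the square root of its population. Along the line joining any two settlements this reproduces the breaking point given by the equation exactly; away from that line the boundaries are circular arcs rather than straight perpendiculars, so the result is a multiplicatively weighted Voronoi diagram and will not match the straight-edged construction everywhere. Settlement coordinates and populations are hypothetical inputs.

```python
import math

def breaking_point(p_i, p_j, d_ij):
    """Box 10.1: distance from settlement j at which the boundary is drawn."""
    return d_ij / (1.0 + math.sqrt(p_i / p_j))

def weighted_allocation(settlements, nrows, ncols, cell_size=1.0):
    """Allocate raster cells to settlements by population-weighted distance.

    settlements: list of (x, y, population) tuples (hypothetical input).
    A cell goes to the settlement minimising distance / sqrt(population),
    which puts the boundary between any two settlements at the Box 10.1
    breaking point on the line joining them.
    """
    grid = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            x, y = (c + 0.5) * cell_size, (r + 0.5) * cell_size
            weighted = [math.hypot(x - sx, y - sy) / math.sqrt(pop)
                        for sx, sy, pop in settlements]
            row.append(min(range(len(settlements)), key=weighted.__getitem__))
        grid.append(row)
    return grid

print(breaking_point(400, 100, 9))   # boundary 3 units from the smaller settlement
print(weighted_allocation([(0.5, 0.5, 400), (9.5, 0.5, 100)], 1, 10))
```

With populations of 400 and 100 settlements nine units apart, the boundary falls 9 / (1 + √4) = 3 units from the smaller one, so the larger settlement claims most of the strip between them.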

than between small settlements, and that the amount of interaction is less if the settlements are far apart rather than nearby. For example, Jochim (1976, pp. 55-62) has used gravity modelling to investigate the 'differential "pulls" of various resources on [site location]' in Mesolithic Germany. One very simple method for adjusting the size of Thiessen polygons, by weighting them according to the relative size or importance of the individual points, is described in Box 10.1. An alternative approach that employs weighted boundaries to allow for dominance of one territory over another was devised by Renfrew and Level (1979) and is called the XTENT model. There are, however, very few applications beyond their trial formulation, largely because of the acknowledged subjectivity in determining the value of the constant that determines whether territories are likely to be more or less autonomous.

10.3 Topographical regions

It is often appropriate to define regions according to the outcome of processes. Unlike the geometric regions discussed above, regions defined by processes must necessarily take account of the content of space, since the outcome of the relevant processes will be at least partly determined by what is 'on the ground'. We have chosen to label such regions 'topographic' for the simple reason that the processes most frequently chosen to define regions, whether natural or anthropogenic, happen to be those significantly affected by elevation and its products. For example, watersheds are defined by the flow of rainfall over a land surface (see Chapter 11), which is in large part a function of slope. Similarly, regions defined by the maximum distance that can be travelled without exceeding a certain cost are also likely to be strongly influenced by the steepness of slopes. We must stress, however, that the techniques described in this section are not limited in their application to topography. Indeed, there is scope for applying them to the cultural content of space, for example, to model movement through sacred landscapes in which particular places are vested with meanings that make them attractive or perhaps forbidden destinations. This is an area where archaeologists could perhaps show greater creativity in co-opting GIS functionality to their own ends (but see Wheatley 1993; Llobera 2000).

10.3.1 Accessible regions

The explanation of human land-use patterns often, if not always, requires an understanding of the potential for, and constraints on, movement through the landscape. In other words, it requires that an attempt be made to establish the relative accessibility of different locations. The classic example of explicit interest in accessibility is site catchment analysis, which is an investigation of the resources available within a region (catchment) accessible from a site (Vita-Finzi and Higgs 1972; Higgs and Vita-Finzi 1972). Such analyses have variously modelled accessibility as a function of distance (Higgs et al. 1967), time (Higgs et al. 1967), energetic expenditure (Foley 1977) or territorial packing (Cassels 1972; Dennell and Webley 1975), all of which can, at least in principle, be modelled using GIS. The influence of distance and territorial packing can be modelled as geometrical regions using techniques described above, such as buffering and the fitting of Thiessen polygons respectively. The influence of time and energetic expenditure can be modelled using topographic regions derived from cost-surfaces, which is our concern here.

A cost-surface, or more properly accumulated cost-surface, models the cost of moving from a specified origin to one or more destinations. The cost of moving to each destination is accumulated by a spreading function that 'spreads' out over a cost-of-passage map. Accumulated cost-surfaces and cost-of-passage maps are usually created as raster maps, since this greatly simplifies the necessary algorithms. Most GIS software requires a minimum of three input maps to create an accumulated cost-surface: a map specifying the origin, a map specifying a destination at which the calculation may cease and a cost-of-passage map. The first two requirements are trivial, but the amount of work that may be required to generate an appropriate cost-of-passage map should not be underestimated.

Before discussing how to create a cost-of-passage map in more detail, it is important to note that the costs used to determine accessibility need not necessarily be measured using a functional currency such as energetic expenditure or elapsed time. In principle, it is possible to employ costs that represent cultural influences such as attraction to or repulsion from burial mounds, or other significant places. Indeed, Llobera (2000) has described in detail how the cultural influence of monuments can be combined with the energetic cost of traversing terrain. Although advanced GIS skills are needed to tackle such a model, it nevertheless suggests that the obstacles to progress in this area lie as much with the difficulty of calibrating models of cultural influence as with the limitations of GIS technology.

Cost-of-passage maps

The cost-of-passage map models the cost of traversing each individual map cell. The cost is often referred to as a friction and the cost-of-passage map as a friction map. The cost of moving over a cell depends on the mode of transport as well as the attributes of the cell. For example, a map cell that falls in a river may be costly to traverse on foot, but cheap to traverse by boat. The combination of mode of transport and cell attributes determines whether the cost is isotropic, partially anisotropic or fully anisotropic (Collischonn and Pilar 2000).
Isotropic costs are independent of the direction of travel. For example, the cost due to land cover is usually the same irrespective of the direction in which one is travelling: scrub offers the same resistance whether one is travelling north or south, as does a metalled (i.e. gravel or pebble) surface. Partially anisotropic costs are dependent on the direction of travel, but the direction of maximum cost is the same for all cells in the map. For example, a kayaker incurs a greater energetic cost paddling into a headwind than with a tail wind, but the wind direction itself may be constant across the map. Anisotropic costs are dependent on both the direction of travel and the attributes of individual map cells. For example, the maximum cost of walking across a cell is likely to be incurred when walking uphill in the direction of steepest slope, but this direction is potentially different for every cell in the map.

In most published archaeological cost-surface analyses the relevant mode of transport is walking (e.g. Bell and Lock 2000; Bell et al. 2002; van Leusen 2002), which incurs both isotropic and anisotropic costs.

Isotropic cost-of-passage

The most important determinants of isotropic cost when walking are surface roughness and land cover. So-called 'terrain coefficients' have been developed to model the effect of surface roughness on energetic expenditure (Soule and Goldman 1969), although the additional costs seldom exceed 10 per cent of the cost on a smooth surface (Passmore and Durnin 1955). The energetic expenditure incurred in moving through different types of land cover has not been studied so systematically, partly because there are so many possibilities, ranging from paved surfaces, through various kinds of farmed and natural vegetation, to rock and water. Fieldcraft manuals (e.g. Langmuir 1997) provide a starting point for ascertaining the relative costs of some land-cover types, but ultimately it may be necessary to undertake field trials. Van Leusen's (2002, Chapter 16) attempt to model Iron Age and Roman trade networks provides a useful example of the kinds of decisions that must be made when considering the effect of land cover on the cost of movement.

Thus far we have assumed that all map cells can be traversed, albeit with varying costs, but sometimes it is more appropriate to mark some cells as absolute barriers to movement. For example, one may wish to model the impact of defensive earthworks, territorial limits, sacred areas or dangerously fast-flowing rivers. Some GIS software packages provide for the inclusion of barriers by allowing one to place special (often negative) values in the cost-of-passage map. Where this is not the case, a similar effect can be obtained by the simple expedient of inserting exceptionally high values in the cost-of-passage map: the values chosen should be orders of magnitude greater than the maximum value ascribed to the traversable cells. In both cases, a potential problem may arise if the map resolution is such that a linear barrier is represented by a string of single map cells. As van Leusen (2002, Chapter 16) points out, if the chosen spreading algorithm allows movement along the diagonals as well as the cardinal directions, then in some circumstances the barrier can be breached (Fig. 10.4). One solution to this problem is to add additional cells to the barrier, which can normally be done using a function that 'grows' regions.

Fig. 10.4 Linear barrier breached by diagonal moves.

The converse of modelling barriers is modelling features that can be expected to canalise movement. In many cases these will be real physical phenomena, such as roads or rivers, in which case the relevant land-cover type should be coded with a suitably low value in the cost-of-passage map. In other cases it may be that movement is canalised by habit rather than any specific land-cover type. Where this is known to have been the case, it can be modelled by reducing the relevant cell values in the cost-of-passage map. Although modifying the cost-of-passage map will negate the value of any attempt to calculate accumulated energy costs (if, in fact, that is desired), this effectively encourages movement along routes that are known or are suspected to have been preferred. A large reduction to a fixed value will accord greater influence to habitual movement than a smaller reduction made relative to the costs in the surrounding cells (Fig. 10.5).

Fig. 10.5 Least-cost paths (black lines) derived from isotropic cost-of-passage maps in which roads (grey lines) are assigned (a) a relative cost and (b) a fixed cost.

Anisotropic cost-of-passage

Walking is a good example of a mode of transport that incurs anisotropic costs, in this case due to the increased energetic requirements of traversing slopes. There are two aspects to modelling the cost of walking on slopes. The first is to determine the magnitude of the slope that is actually experienced, the effective slope, and the second is to ascribe a cost to traversing a slope of that magnitude.
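Returning for a moment to the isotropic cost-of-passage map, the adjustments described above (land-cover friction, barriers coded with exceptionally high values, 'growing' a thin barrier so diagonal moves cannot slip through it, and lowering costs along roads by a relative or a fixed amount) can be sketched in Python. This is a minimal illustration under stated assumptions: the barrier value `BARRIER`, the friction table and all function names are hypothetical, and growing each barrier cell into its four cardinal neighbours is only one simple way of thickening a barrier.

```python
BARRIER = 1e6  # hypothetical barrier value: orders of magnitude above any friction

def cost_of_passage(landcover, friction):
    """Translate a land-cover raster into an isotropic cost-of-passage map."""
    return [[friction[code] for code in row] for row in landcover]

def add_barrier(cost, barrier_cells, grow=True):
    """Return a copy of `cost` with barrier cells set to BARRIER.

    With grow=True each barrier cell is also extended into its four cardinal
    neighbours, so a one-cell-wide diagonal line becomes a connected wall
    that a diagonal move cannot cross (cf. Fig. 10.4).
    """
    nrows, ncols = len(cost), len(cost[0])
    cells = set(barrier_cells)
    if grow:
        for r, c in list(cells):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    cells.add((rr, cc))
    out = [row[:] for row in cost]
    for r, c in cells:
        out[r][c] = BARRIER
    return out

def canalise(cost, route_cells, mode, value):
    """Lower costs along known or suspected routes (cf. Fig. 10.5)."""
    out = [row[:] for row in cost]
    for r, c in route_cells:
        if mode == "relative":
            out[r][c] = cost[r][c] * value   # e.g. value=0.5 halves the local cost
        elif mode == "fixed":
            out[r][c] = value                # same low cost wherever the route runs
        else:
            raise ValueError("mode must be 'relative' or 'fixed'")
    return out

friction = {"grass": 1.0, "scrub": 2.0}      # hypothetical relative costs
cost = cost_of_passage([["grass", "scrub", "scrub"]] * 3, friction)
walled = add_barrier(cost, [(0, 0), (1, 1), (2, 2)])
roads = canalise(cost, [(0, 1), (1, 1)], "fixed", 0.1)
```

A relative reduction keeps a road through rough country more expensive than one across open grass, whereas a fixed value makes every road cell equally attractive, which is why the fixed option pulls least-cost paths more strongly towards habitual routes.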
Effective slope differs from the slope recorded in a slope map in two ways. First, slope maps record the maximum rate of change of elevation across a map cell, but one may not necessarily be travelling in the direction of the maximum rate of change. Since aspect maps typically record the azimuth of the downward direction of the maximum rate of change, it follows that the slope that is experienced when traversing a map cell varies from zero to the slope, according to whether the direction of travel is perpendicular to, or parallel with, the aspect, respectively. Second, a given slope may be either downhill or uphill, depending on whether travel is in the same or opposite direction to the aspect. Taking these two considerations together, it is possible to calculate the effective slope using Eq. (10.3) below, where $\alpha$ is the difference between the aspect and the direction of travel, $\theta$ is the slope, and $a$ and $b$ are as defined in Fig. 10.6.

Fig. 10.6 Calculation of effective slope.

Assuming $a = 1$, (10.3) may be derived thus:

$$\cos\alpha = \frac{b}{a} = b \qquad (10.1)$$

and since:

$$\tan\theta_e = b\,\tan\theta \qquad (10.2)$$

it follows from (10.1) and (10.2) that the effective slope, $\theta_e$, is:

$$\theta_e = \arctan(\cos\alpha\,\tan\theta) \qquad (10.3)$$

It is important to note that the effective slope calculated using (10.3) is correct only if the difference angle was calculated using a reversed aspect map, i.e. an aspect map that records the azimuth of the upward direction of the maximum rate of change. Provided this is the case, the cosine term ensures that the effective slope varies from the slope ($+\theta$) uphill to minus the slope ($-\theta$) downhill, as shown in Fig. 10.7.
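Eq. (10.3) translates directly into map algebra once the direction of travel is known. The Python sketch below applies it to a single cell; the function names are hypothetical, and azimuths are assumed to be in degrees measured the same way for aspect and travel. The helper converts a conventional (downslope) aspect into the reversed (upslope) aspect that the equation requires.

```python
import math

def reversed_aspect(aspect_deg):
    """Turn a conventional (downslope) aspect into the upslope azimuth."""
    return (aspect_deg + 180.0) % 360.0

def effective_slope(slope_deg, upslope_aspect_deg, travel_deg):
    """Eq. (10.3): effective slope in degrees for one cell and one direction.

    slope_deg          -- maximum slope from a slope map
    upslope_aspect_deg -- azimuth of steepest ascent (reversed aspect map)
    travel_deg         -- azimuth of travel, measured the same way
    Positive results are uphill, negative results downhill.
    """
    alpha = math.radians(travel_deg - upslope_aspect_deg)  # difference angle
    return math.degrees(math.atan(math.cos(alpha) * math.tan(math.radians(slope_deg))))

# A west-facing cell (downslope aspect 270) with a 10-degree slope:
up = reversed_aspect(270.0)               # 90.0: the ground rises towards the east
print(effective_slope(10.0, up, 90.0))    # walking east: about +10 (uphill)
print(effective_slope(10.0, up, 270.0))   # walking west: about -10 (downhill)
print(effective_slope(10.0, up, 0.0))     # walking north: about 0 (across the slope)
```

Forgetting the reversal simply flips the sign of every result, turning uphill travel into downhill travel, which is why the note about reversed aspect maps matters.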

Once the effective slope values have been calculated they must be ascribed appropriate costs. Laboratory experiments have demonstrated that the energy expended during walking is least on a slight downhill slope of 4-6°, increases slightly on the flat and then increases rather more rapidly and non-linearly on steeper uphill slopes (Minetti et al. 1993; Rose et al. 1994, p. 62). The energy expended also increases on steeper downhill slopes, but at a slower rate than on the equivalent uphill slopes. Llobera (2000, p. 71) has put together data from a number of sources to suggest the relationship depicted in Fig. 10.8.

Fig. 10.7 The sign of the effective slope indicates whether movement is up-slope or down-slope.

Fig. 10.8 Energetic cost of traversing different slopes according to Llobera. Redrawn from Llobera (2000, Fig. 2).

Van Leusen (1999, pp. 216-217) describes some of the functions that have been used in archaeological studies to model the cost of walking on slopes of differing magnitudes. These have included treating cost in terms of speed, and relative and absolute energetic expenditures.
Speed

Gorenflo and Gale (1990) recommend use of the following equation to model the effect of slope on the speed of walking:

$$v = 6\,e^{-3.5\,|s + 0.05|}$$

where $v$ is the walking speed in km h⁻¹ and $s$ is the slope expressed as a gradient, i.e. rise over run (note that $|s + 0.05|$ is the absolute value of $s + 0.05$, i.e. it is guaranteed to be non-negative). Verhagen et al. (1999) used this equation for calculating the accessibility of settlements, but note that since they appear to use slope rather than effective slope their analysis is not fully anisotropic.

Fig. 10.9 The energetic cost of traversing different slopes according to Bell and Lock.

Relative energetic expenditure

Bell and Lock (2000) propose a function that is based on changes in potential energy rather than laboratory-derived estimates of actual metabolic energy expenditure. They demonstrate that the relative cost of ascending slopes can be modelled as the ratio of the tangents of the slope angles, $s$. Consequently, a map of relative cost, $C$, can be obtained using the equation:

$$C = \frac{\tan s}{\tan 1^\circ}$$


where a slope of 1° is used as the reference point. This equation has the virtue of simplicity while nevertheless producing a non-linear relationship between slope and cost (Fig. 10.9). On the other hand, it has the disadvantage that the relationship is symmetrical² about a slope of 0°. As Bell and Lock themselves acknowledge, this is not compatible with the results of laboratory studies of actual metabolic energy expenditure during walking.

Absolute energetic expenditure

Van Leusen (2002, Chapter 6) proposes the following equation to calculate the metabolic energy expenditure (in watts) of walking:

$$M = 1.5W + 2.0(W + L)\left(\frac{L}{W}\right)^2 + N(W + L)\left(1.5V^2 + 0.35V\,|G + 6|\right)$$

where $M$ is watts expended, $W$ is the weight of the walker's body (kg), $L$ the weight of any load (kg), $V$ the speed of walking (km h⁻¹), $N$ is a 'terrain factor' and $G$ the slope (%). The terrain factor makes it possible to incorporate the effect of land cover on the ease of movement and assumes a paved surface as the reference, such that $N = 1$ for a road, $N = 1.5$ for a surface that is one and a half times as costly to traverse as a road and so on. Note that the term $|G + 6|$ ensures that the minimum cost is associated with downhill slopes of 6 per cent rather than flat land (Fig. 10.10). Van Leusen has used this equation (albeit without the adjustment to minimum cost) to model Iron Age and Roman trade networks in the hinterland of Wroxeter Roman city (van Leusen 2002, Chapter 16).

² Albeit positive in the uphill direction and negative in the downhill direction.
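The three slope-cost functions above can be compared side by side. The Python sketch below codes them as written in the text; the function names are hypothetical, and it assumes slopes are supplied as effective slopes in degrees and converted to a gradient (for the speed equation, which is commonly known as Tobler's hiking function) or to per cent (for van Leusen's equation) as each requires. Note that the speed function returns a speed, not a cost: a time cost for crossing a cell would be the distance travelled divided by that speed.

```python
import math

def gradient(slope_deg):
    """Slope in degrees -> gradient (rise over run)."""
    return math.tan(math.radians(slope_deg))

def tobler_speed(grad):
    """Walking speed in km/h, v = 6 exp(-3.5 |s + 0.05|), s as a gradient."""
    return 6.0 * math.exp(-3.5 * abs(grad + 0.05))

def bell_lock_cost(slope_deg):
    """Relative cost tan(s) / tan(1 degree) (Bell and Lock 2000)."""
    return math.tan(math.radians(slope_deg)) / math.tan(math.radians(1.0))

def van_leusen_watts(W, L, V, N, G):
    """Metabolic rate M in watts from the equation above.

    W: body weight (kg); L: load (kg); V: walking speed (units as in the text);
    N: terrain factor (1 for a road); G: slope in per cent (negative = downhill).
    """
    return (1.5 * W
            + 2.0 * (W + L) * (L / W) ** 2
            + N * (W + L) * (1.5 * V ** 2 + 0.35 * V * abs(G + 6)))

for s in (-10.0, -5.0, 0.0, 5.0, 10.0):   # effective slopes in degrees
    g = gradient(s)
    print(s, round(tobler_speed(g), 2), round(bell_lock_cost(s), 1),
          round(van_leusen_watts(70.0, 10.0, 5.0, 1.0, 100.0 * g)))
```

The printed table makes the contrast in the text concrete: speed and metabolic cost are both asymmetric, with their optimum on a gentle downhill slope, whereas the Bell and Lock cost is simply the tangent ratio, equal in magnitude and opposite in sign for matching uphill and downhill slopes.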

Fig. 10.10 The energetic cost of traversing different slopes according to van Leusen.


It is important to recognise that none of the above methods for calculating the cost of traversing a slope of a given magnitude are anisotropic unless they are applied to the effective slope. The difference angle and thus effective slope are easily calculated using map algebra, provided that the direction of travel is known in advance. This may be true when modelling the cost of travelling along known pathways (see, for example, Krist and Brown's 1994 study of historic Native American trails in Michigan), but more often the purpose of cost-surface analysis is to predict an unknown direction of travel. Under these circumstances the difference angle and thus effective slope can only be calculated by the spreading function during the process of calculating the accumulated cost-of-passage surface, because it is only then that the direction of travel is resolved (see below). Consequently, when undertaking anisotropic cost-surface analysis it is vitally important to choose GIS software that not only calculates the effective slope, but also allows the user to specify how that information is transformed into a cost. At present the only widely used software with this functionality is Idrisi (Eastman 1997, Chapter 15).

The accumulated cost-surface

An accumulated cost-surface is calculated by applying a spreading function to a cost-of-passage map. The spreading functions used for this purpose are designed to minimise the accumulated cost at the destination(s), from which it follows that an accumulated cost-of-passage map models the minimum cost of moving from a specified starting point to the specified destinations (often all cells in the map). One of the best-known algorithms is that devised by Tomlin (1990) and shown in simplified form in Fig. 10.11. Figure 10.11(a) shows an origin, O, and destination, D, on an isotropic cost-of-passage surface. Inspection of the cost-of-passage surface suggests that the black map cells constitute the least costly route from the origin to the destination.

Fig. 10.11 Stages in the iteration of a basic spreading function.

Figures 10.11(b-f) show how this would be calculated. Note that the cells depicted in grey are those whose values remained unchanged in that step. At the start, all cells are initialised with an accumulated cost of zero. Then, at each step, a window comprising the eight immediate neighbours is moved over each cell that has at least one non-zero neighbour, starting with the origin. The accumulated cost is calculated as the cost of traversing the cell at the centre of the window, plus the lowest accumulated cost among its neighbours. Calculation of a complete accumulated cost-surface for the map in Fig. 10.11(a) takes five steps:

1. (Fig. 10.11b) The first step considers only the cells in the eight-cell neighbourhood of the origin.
2. (Fig. 10.11c) The eight cells considered in step one are reconsidered and their accumulated costs found not to change, but the accumulated costs for the cells neighbouring them can now be calculated.
3. (Fig. 10.11d) The 24 cells considered in step two are reconsidered and their accumulated costs found not to change, but the accumulated costs for the cells neighbouring them can now be calculated.
4. (Fig. 10.11e) The 36 cells considered in step three are reconsidered and two cells are now found to have lower accumulated costs. As it happens, one of those is the destination.
5. (Fig. 10.11f) The 36 cells considered in step four are reconsidered yet again and a further three cells are found to have lower accumulated costs. No additional changes are possible, so this is the final result.

Fig. 10.12 An accumulated cost surface showing algorithm artefacts.
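The stepwise procedure just listed can be sketched in Python. This is a simplified illustration in the spirit of Fig. 10.11, not Tomlin's production algorithm. Two assumptions should be noted: cells not yet reached are held at infinity rather than zero, so that 'the lowest accumulated cost among its neighbours' only considers cells already reached, and the origin is given an accumulated cost of zero. As in the simplified figure, there is no correction for the longer journey across a cell diagonally. Each pass updates every cell from the previous pass's values, so the number of passes that change something corresponds to the numbered steps.

```python
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def spread(cost, origin):
    """Simplified iterative spreading over an isotropic cost-of-passage map.

    cost: list of lists of per-cell costs; origin: (row, col).
    Returns (accumulated, steps), where steps counts passes that changed a value.
    """
    nrows, ncols = len(cost), len(cost[0])
    acc = [[math.inf] * ncols for _ in range(nrows)]
    acc[origin[0]][origin[1]] = 0.0          # assumption: the origin costs nothing
    steps, changed = 0, True
    while changed:
        changed = False
        new = [row[:] for row in acc]        # update from the previous pass only
        for r in range(nrows):
            for c in range(ncols):
                if (r, c) == origin:
                    continue
                # Lowest accumulated cost among the eight-cell neighbourhood.
                best = min((acc[r + dr][c + dc] for dr, dc in NEIGHBOURS
                            if 0 <= r + dr < nrows and 0 <= c + dc < ncols),
                           default=math.inf)
                if best < math.inf and cost[r][c] + best < new[r][c]:
                    new[r][c] = cost[r][c] + best
                    changed = True
        acc = new
        if changed:
            steps += 1
    return acc, steps

acc, steps = spread([[1, 9, 1], [1, 9, 1], [1, 1, 1]], (0, 0))
print(acc)    # the top-right cell is reached round the expensive column
print(steps)
```

Because the loop only stops when a complete pass changes nothing, a cheap detour discovered late (as in steps four and five above) still lowers values found earlier. Real implementations usually replace the repeated full passes with a priority queue or a sweeping scheme such as Eastman's push-broom procedure.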

Production versions of Tomlin's algorithm usually include modifications to take account of the increased journey length when travelling diagonally across a cell rather than in the cardinal directions. They may also include modifications to improve efficiency, such as Eastman's (1989) push-broom procedure, which has been implemented in Idrisi. The spreading functions found in GIS software are often poor. Douglas (1999) has pointed out that a uniform isotropic cost-of-passage map, i.e. one in which the cost of passage is the same for all cells in all directions, should yield an accumulated cost-surface on which the accumulated cost increases as a series of concentric circular rings of increasing radius. As it turns out, most GIS software packages actually produce a series of concentric polygons, as can be seen in Fig. 10.12. This is simply wrong and will undermine the validity of any further analyses based on the accumulated cost-surface. The reason is often nothing more complicated than the use of an eight-cell neighbourhood, which only allows costs to be accumulated in the directions N, NE, E, SE, S, SW, W and NW. Douglas has proposed a solution to this problem based on an analogy with the processes of refraction and diffraction in the transmission of light, but unfortunately it has yet to be implemented in any widely available software.

Using cost-surfaces

Accumulated cost-surfaces are usually calculated as the precursor to further analysis, often the derivation of least-cost paths, which we consider in Chapter 11. Here we are more concerned with their use to delimit discrete regions such as territories or site catchments. Indeed, the availability of GIS has fuelled renewed interest in site-catchment analysis (Gaffney and Stančič 1991; Hunt 1992; Saile 1997; Stančič et al. 1997), which had fallen from favour in the late 1970s. In this context it is important to recognise that technical advances do not necessarily overcome fundamental shortcomings in particular forms of analysis. Consequently, anyone contemplating undertaking a GIS-based site-catchment analysis would do well to acquaint themselves with Roper's (1979) review of a large number of earlier studies. In many cases the problems she identifies are not insurmountable, but nor will they be automatically overcome by the use of GIS.

Delimiting site catchments

The initial creation of an appropriate cost-of-passage map is the first step in using a cost-surface to delimit a site catchment. This must reflect the relevant factors, such as terrain, land cover, etc., that are believed to have influenced the size of the area which could be exploited from the site in question. The influence of those factors must be measured in a suitable currency, such as travel time or energetic expenditure. Once the accumulated cost-surface has been calculated, the boundary of the catchment can be extracted as follows:
1. Decide on the maximum value of the chosen currency beyond which resources could not have been exploited, for example the maximum travel time that would allow travel to, extraction of, and return travel from a resource within a day. This decision will, of course, require numerous assumptions about the organisation of resource acquisition. Indeed, it could be argued that it is precisely when there is uncertainty that the application of GIS becomes most useful, since it readily facilitates the comparison of catchments derived using different assumptions.
2. Differentially code each map cell according to whether or not its accumulated cost exceeds the chosen maximum. In many cases this is easily achieved using map algebra similar to the following:

   catchment = (accumulated_cost_surface <= maximum)

   which would produce a map with cells coded 1 if they fall within the catchment and 0 otherwise.
3. If desired (see below), convert the raster representation of the catchment to a polygon in a vector map.
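Steps 1 and 2 can be sketched end to end in Python. Here the accumulated cost-surface is computed with Dijkstra's shortest-path algorithm, a swapped-in technique rather than Tomlin's, using an eight-cell neighbourhood in which a move costs the mean of the two cells' costs times the step length (one cell width, or √2 diagonally). That cost convention is a common choice but an assumption here. The catchment step is then the one-line map algebra given in step 2.

```python
import heapq
import math

MOVES = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def accumulated_cost(cost, origin, cell_size=1.0):
    """Dijkstra spreading over an eight-cell neighbourhood.

    A move between neighbouring cells costs the mean of their two costs of
    passage multiplied by the step length (sqrt(2) times longer diagonally).
    """
    nrows, ncols = len(cost), len(cost[0])
    acc = [[math.inf] * ncols for _ in range(nrows)]
    acc[origin[0]][origin[1]] = 0.0
    heap = [(0.0, origin[0], origin[1])]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > acc[r][c]:
            continue                      # stale queue entry
        for dr, dc in MOVES:
            rr, cc = r + dr, c + dc
            if 0 <= rr < nrows and 0 <= cc < ncols:
                step = cell_size * (math.sqrt(2) if dr and dc else 1.0)
                nd = d + step * (cost[r][c] + cost[rr][cc]) / 2.0
                if nd < acc[rr][cc]:
                    acc[rr][cc] = nd
                    heapq.heappush(heap, (nd, rr, cc))
    return acc

def catchment(acc, maximum):
    """Map algebra: catchment = (accumulated_cost_surface <= maximum)."""
    return [[1 if v <= maximum else 0 for v in row] for row in acc]

uniform = [[1.0] * 7 for _ in range(7)]
acc = accumulated_cost(uniform, (3, 3))
print(acc[4][6], math.hypot(3, 1))   # eight-direction cost vs straight-line distance
```

The final line illustrates Douglas's criticism on a uniform surface: the cell three columns across and one row down is assigned 2 + √2 ≈ 3.41 rather than its straight-line distance of about 3.16, because the path can only bend in 45° increments. A catchment drawn with a threshold of 3 therefore has octagonal rather than circular edges.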

Analysing site catchments

Most studies of site catchments proceed by analysing the composition of resource types (or other landscape attributes) present within the catchment(s). The ultimate aim may be to:

• Determine whether a community of a given size was likely to have been self-sufficient;
• Estimate the maximum population at a site;
• Determine the mode of subsistence practised by the site's inhabitants, e.g. farming versus foraging;
• Determine whether the specific range of resources available was likely to have been a factor influencing site location, perhaps by comparing the catchments of sites with those of randomly chosen locations (e.g. Zarky 1976).

Whatever the motivation, it is a simple matter to analyse the composition of a site catchment using GIS, provided the relevant resources have been mapped. This is, of course, a significant caveat, since such mapping is likely to require a substantial effort in palaeoenvironmental reconstruction. While the difficulties that may be encountered in producing an adequate palaeoenvironmental reconstruction have little to do with GIS per se, it can nevertheless be argued that the use of GIS does tend to highlight the disjuncture between the desire for a spatially continuous reconstruction and the point nature of most palaeoenvironmental data. This is largely because GIS software does not readily lend itself to the 'eyeballing' of, say, vegetation extents, but rather to the construction of explicit models from which such extents may be deduced, which is in turn often more revealing of the sparseness of data. Assuming that the relevant resource maps are available, the composition of a site catchment may be analysed using one or both of two methods:
• Polygon overlay is appropriate for use with vector maps of discontinuous resources. As described in Chapter 7, it can be used to answer queries such as establishing the proportion of a site catchment that is cultivable land.
• Cross-tabulation allows similar queries to be made of raster maps. In this case, the result of cross-tabulation would be a table showing how the values in the raster map are distributed between those cells within the site catchment and those outside it (Fig. 10.13). Cross-tabulation can be applied to raster maps representing continuously varying resources as well as those representing discontinuous (e.g. types of) resources. Since the former produces a table containing one row (or column) for every unique resource value, many of which will represent very few map cells, it is usually more helpful to reclassify such a map into a set of classes. For example, one might reclassify a map of solar gain into the classes low, medium and high.
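Raster cross-tabulation and the reclassification that usually precedes it can be sketched in a few lines of Python. The function names, class limits and land-use labels below are hypothetical; the catchment raster is assumed to be coded 1 inside and 0 outside, as produced by the map-algebra step described earlier, and each cell is assumed to represent a known, constant area.

```python
from bisect import bisect_right
from collections import Counter

def reclassify(values, breaks, labels):
    """Reclassify a continuous raster into ordered classes.

    breaks: ascending class limits; a value equal to a limit goes to the
    higher class. labels: one more label than there are breaks.
    """
    return [[labels[bisect_right(breaks, v)] for v in row] for row in values]

def cross_tabulate(resource, catchment, cell_area=1.0):
    """Total area of each resource class inside (1) and outside (0) a catchment."""
    table = Counter()
    for res_row, cat_row in zip(resource, catchment):
        for value, inside in zip(res_row, cat_row):
            table[(value, inside)] += cell_area
    return dict(table)

land = [["arable", "rough grazing"], ["good grazing", "arable"]]
inside = [[1, 0], [1, 1]]
print(cross_tabulate(land, inside, cell_area=0.25))   # areas in km2 if cells are 500 m
print(reclassify([[100, 450, 900]], [300, 600], ["low", "medium", "high"]))
```

Dividing each 'inside' area by the catchment's total area gives the proportions reported in tables such as Fig. 10.13.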

10.3.2 Visible regions

The calculation of viewsheds (regions of intervisibility) has become a very popular use of GIS in academic research. Much of this work has concerned the placement of monuments in the landscape, for example: Wheatley's (1995) study of the intervisibility of earthen burial mounds (long barrows) in central southern England; Woodman's (2000a) study of the reciprocity of view between stone burial chambers (chambered cairns) in the Orkney Islands of Scotland; Fisher et al.'s (1997) investigation of whether stone cairns on the Scottish island of Mull were deliberately placed so as to overlook the sea; and Llobera's studies of the prominence of monuments (Llobera 2001) and the manner in which they become visible as one approaches them.


[Fig. 10.13 table: land-use potential (arable, rough grazing, good grazing) by area (km²) and percentage, within the two-hour range and for the total study area.]


Fig. 10.13 Cross-tabulation of land-use potential within a two-hour territorial limit of a site (after Vita-Finzi 1978, Fig. 87).

The calculation of viewsheds also has applications in cultural resource management and planning more generally. For example, Batchelor (1999) calculated the area visible from Stonehenge in order to investigate which of several proposals for re-routing roads near the World Heritage Site would have least visual impact.

Definitions

Intervisibility

Visibility analysis in GIS is founded on the automatic determination of whether any given pair of points are intervisible. This process is normally carried out on a raster digital elevation model (DEM) and works by projecting a straight line-of-sight from the viewpoint to the target. If the elevations of all intervening map cells fall below the line-of-sight, then the two points are held to be intervisible. If, on the other hand, the elevation of one or more intervening cells falls above the line-of-sight, then the line-of-sight is interrupted and so the two points are held not to be intervisible.

Viewshed

The viewshed of a viewpoint is the set of target cells that can be seen from the viewpoint (see reciprocity, below).
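The line-of-sight test just defined can be sketched in Python. This is a deliberately simple version, not the algorithm of any particular package: it samples the straight line once per cell step along its longer axis, compares the height of the line-of-sight at each sample with the elevation of the nearest cell, and adds optional observer and target offsets (such as eye height) to the end points. The function names and the sampling scheme are assumptions; production viewshed code typically interpolates elevations and handles Earth curvature.

```python
def intervisible(dem, viewpoint, target, observer_offset=0.0, target_offset=0.0):
    """Simple line-of-sight test between two cells of a raster DEM.

    The line is sampled once per cell step along its longer axis; at each
    intervening sample the nearest cell's elevation is compared with the
    height of the line-of-sight. A viewpoint counts as visible from itself.
    """
    (r0, c0), (r1, c1) = viewpoint, target
    z0 = dem[r0][c0] + observer_offset
    z1 = dem[r1][c1] + target_offset
    n = max(abs(r1 - r0), abs(c1 - c0))
    for i in range(1, n):                 # intervening samples only
        t = i / n
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        sight_line = z0 + t * (z1 - z0)   # height of the line-of-sight here
        if dem[r][c] > sight_line:        # terrain rises above it: blocked
            return False
    return True

def viewshed(dem, viewpoint, observer_offset=0.0, target_offset=0.0):
    """Binary single viewshed: 1 where a target cell is visible from viewpoint."""
    return [[1 if intervisible(dem, viewpoint, (r, c), observer_offset, target_offset) else 0
             for c in range(len(dem[0]))]
            for r in range(len(dem))]

ridge = [[0, 0, 5, 0, 0]]                 # a 5 m ridge on a flat 1 x 5 profile
print(viewshed(ridge, (0, 0), observer_offset=1.6))
```

With an eye height of 1.6 m the observer at the left end sees up to and including the ridge but not the cells behind it. Because the target offset is separate from the observer offset, swapping them is one way to explore the reciprocity issue mentioned below.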
Single viewshed

The simplest result from a viewshed analysis is a binary map marking target cells as visible or not visible from a specified viewpoint. Some software packages provide additional information. For example, GRASS creates a map in which each target cell is marked with the declination (angle in the vertical plane) at which it is visible from the specified viewpoint. Other useful information, such as the azimuth (angle in the horizontal plane), can be obtained by post-processing, as described by Wheatley and Gillings (2000, p. 22).

Fig. 10.14 A multiple viewshed created by the merging of the viewsheds of each of the three locations.

Fig. 10.15 A cumulative viewshed created by the addition (overlay) of the viewsheds of each of the three locations.

A multiple viewshed map is the logical union of two or more viewshed maps (Fig. 10.14). The values in a multiple viewshed map are either 1 (visible from one or more viewpoints) or 0 (not visible from any viewpoint). Each map cell in a multiple viewshed map records whether it is visible from at least one viewpoint.

Cumulative viewshed is the term introduced by Wheatley (1995, p. 173) for the map algebraic sum of two or more binary single viewshed maps (Fig. 10.15). The cell


values in a cumulative viewshed are integers ranging from zero to a theoretical maximum of the number of viewpoints, although this will only occur if at least one cell is visible from all viewpoints. Each map cell in a cumulative viewshed map records the number of viewpoints from which it is visible.

A total viewshed map (Llobera 2003) is usually created using one of two methods. The first is to calculate the viewshed from every single map cell in turn and then sum all these viewsheds. Such a map is effectively the cumulative viewshed of every possible viewpoint, from which it follows that each map cell records the number of other map cells from which it is visible. The second method is to use a dedicated program (such as the r.cva module available for GRASS) which calculates the viewshed from every single map cell, counts the number of cells in the viewshed and then records this at the viewpoint. Unlike the first method, this produces a map in which each cell records the number of other map cells which are visible from it. It is possible to obtain the second type of map using the first method by careful choice of observer and target offsets (see reciprocity, below). The cell values in a total viewshed are integers ranging from zero (or one if the line-of-sight algorithm being used treats a viewpoint as visible from itself) to a theoretical maximum of the number of cells in the map, although this maximum is unlikely to be obtained with models of natural terrain.

Isovist field is the term used in planning and architecture to refer to a total viewshed. An isovist is 'the set of all points visible from a given vantage point in space' (Benedikt 1979, p. 47) and an isovist field is created by summing isovists for every possible vantage point in an environment (Batty 2001). An isovist field is thus directly equivalent to a total viewshed created using the first method described above.
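The differences between multiple, cumulative and total viewsheds are easiest to see side by side. The sketch below assumes binary viewsheds held as lists of rows. The tiny, deliberately non-reciprocal example data and all function names are hypothetical. Both total-viewshed methods count a viewpoint as visible from itself.

```python
def multiple_viewshed(viewsheds):
    """Logical union: 1 if a cell is visible from at least one viewpoint, else 0."""
    rows, cols = len(viewsheds[0]), len(viewsheds[0][0])
    return [[int(any(v[r][c] for v in viewsheds)) for c in range(cols)]
            for r in range(rows)]


def cumulative_viewshed(viewsheds):
    """Map-algebraic sum: the number of viewpoints from which each cell is visible."""
    rows, cols = len(viewsheds[0]), len(viewsheds[0][0])
    return [[sum(v[r][c] for v in viewsheds) for c in range(cols)]
            for r in range(rows)]


def total_viewshed_summed(viewshed_of):
    """Method 1: the cumulative viewshed of every cell.
    Each cell records the number of cells from which it is visible."""
    return cumulative_viewshed(list(viewshed_of.values()))


def total_viewshed_counted(viewshed_of):
    """Method 2: count the cells in each viewpoint's viewshed and record it at the viewpoint.
    Each cell records the number of cells visible from it."""
    any_map = next(iter(viewshed_of.values()))
    result = [[0] * len(any_map[0]) for _ in range(len(any_map))]
    for (r, c), v in viewshed_of.items():
        result[r][c] = sum(sum(row) for row in v)
    return result


# Hypothetical viewsheds for a 1 x 3 strip of cells, keyed by viewpoint (row, col).
# They are deliberately non-reciprocal: (0, 0) sees (0, 2), but (0, 2) cannot see (0, 0).
VIEWSHEDS = {
    (0, 0): [[1, 1, 1]],
    (0, 1): [[1, 1, 1]],
    (0, 2): [[0, 1, 1]],
}
print(multiple_viewshed(list(VIEWSHEDS.values())))   # [[1, 1, 1]]
print(total_viewshed_summed(VIEWSHEDS))               # [[2, 3, 3]]
print(total_viewshed_counted(VIEWSHEDS))              # [[3, 3, 2]]
```

Because intervisibility here is not reciprocal, the two total-viewshed methods give different maps. This is the point the text makes about choosing observer and target offsets carefully.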

Issues in visibility analysis
The potential pitfalls in GIS-based visibility analysis have been well rehearsed in the archaeological literature (Wheatley and Gillings 2000). Here we distinguish between computational, experimental, substantive and theoretical concerns.

Computational issues: These stem from the way in which visibility analysis is programmed in individual GIS software packages. In the case of commercial software the end user generally has no control over this, beyond the initial choice of software, and even then informed choice is often hampered by lack of adequate documentation. In the case of open-source software the end user can change the way in which visibility analysis is programmed, but only if they or a collaborator have the necessary programming skills. The two most important computational issues have to do with the ability of the GIS software to calculate theoretical intervisibility accurately, given perfect data:
Algorithm: Different GIS software packages use different algorithms for calculating intervisibility. In particular, they vary in how they calculate the elevation of a map cell on the line-of-sight, whether they treat the viewpoint and targets as points or cells, and how they compare the elevation of a map cell with the elevation of the line-of-sight (Fisher 1993). Consequently, different GIS software packages produce different results, even from the same data (Fisher 1993; Izraelevitz 2003).

Curvature of the Earth: Since the surface of the Earth is not flat, but curved, there is a maximum distance beyond which a target of a given height cannot be seen, even at sea with perfect visibility. The effect of the curvature of the Earth's surface is to reduce the elevation of a target by approximately 7.86 m at 10 km from the viewpoint. The reduction grows with the square of the distance, so it is roughly four times as large at 20 km. This is potentially significant for many archaeological visibility analyses. Unfortunately, many GIS software packages do not take account of the curvature of the Earth when calculating intervisibility. Where necessary, this problem can be overcome by modifying the digital elevation model to curve it gently downwards away from the viewpoint, as described by Ruggles and Medyckyj-Scott (1996, p. 133). Note, however, that when calculating multiple, cumulative or total viewsheds this modification must be made individually for each viewpoint in turn.

Fig. 10.16 The edge effect in visibility analysis: viewshed size with and without the edge effect.
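The curvature correction can be sketched as follows. This is a minimal illustration that assumes the standard approximation d²/(2R) for the drop of the Earth's surface below a flat plane, with a mean radius of 6,371 km. With that radius the drop at 10 km is about 7.85 m, close to the figure quoted above; the exact value depends on the radius assumed. Atmospheric refraction, which partly offsets curvature, is ignored. The function names are hypothetical. The viewpoint is a parameter because the DEM must be re-curved for every viewpoint.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius of the Earth in metres (an assumed value)


def curvature_drop(distance_m, radius_m=EARTH_RADIUS_M):
    """Approximate drop of the Earth's surface below a flat plane at a given distance.

    Uses d**2 / (2R); atmospheric refraction is ignored in this sketch.
    """
    return distance_m ** 2 / (2 * radius_m)


def curve_dem(dem, viewpoint, cell_size_m, radius_m=EARTH_RADIUS_M):
    """Return a copy of `dem` lowered by the curvature drop relative to one viewpoint.

    Must be recomputed for every viewpoint when building multiple, cumulative or
    total viewsheds, because the correction depends on distance from the viewpoint.
    """
    r0, c0 = viewpoint
    return [[z - curvature_drop(cell_size_m * math.hypot(r - r0, c - c0), radius_m)
             for c, z in enumerate(row)]
            for r, row in enumerate(dem)]


print(round(curvature_drop(10_000), 2))   # 7.85
print(round(curvature_drop(20_000), 2))   # 31.39
```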

Experimental issues: These arise as a result of the way in which a visibility analysis is conducted once all substantive decisions (parameters, data, purpose) have been made. They are under the control of the end user, although there may be constraints such as the speed of the available computer(s). The most important experimental issues include the following:
The edge effect is particularly influential if the aim of a visibility analysis is to compare the size or shape of two or more viewsheds. Since intervisibility decays with distance (see substantive issues), it is usually only calculated for target cells that fall within some specified radius of the viewpoint (the maximum viewing distance). When the distance between a given viewpoint and the edge of the map region is less than that radius, the viewshed may be artificially truncated. This invalidates comparison with the viewsheds of other viewpoints that were further from the edge of the map (Fig. 10.16). The simplest solution is to perform the visibility analysis on a map region which is surrounded by a buffer zone of the same width as the maximum viewing distance; this is then discarded during subsequent analysis. For an example see the study by Lake et al. (1998), which clearly demonstrates the edge effect.

Reciprocity: If, as is usually the case, one takes account of the heights of the observer and target above ground level (the observer and target offsets), then it is possible for the target to be visible to the observer while the observer remains invisible from the target (Fig. 10.17). The greater the disparity between the observer and target offsets, the greater the probability that intervisibility will not be reciprocal. Lack of reciprocity may be interesting, as has been demonstrated for the placement of Orcadian chambered tombs (Woodman 2000a) and Hellenistic city defences (Loots 1997), for example. It only becomes a problem in cases of poor experimental design, for example when one attempts to produce a total viewshed in which each cell records the number of other map cells which are visible from it by summing the viewsheds of every possible viewpoint. In this case it would be important to reverse the observer and target offsets to achieve the desired result.

Fig. 10.17 Differing observer and target offsets prevent exact reciprocity of intervisibility (X = target).

DEM quality: Accurate determination of the theoretical intervisibility between two points crucially depends on the digital elevation model providing an accurate representation of reality. Given that the calculation of intervisibility is particularly sensitive to the elevation of peaks and crests, it is best to create the DEM using an exact interpolation method in conjunction with a comprehensive network of spot heights on local elevation maxima and (less importantly) minima. When there are real doubts about the accuracy of the DEM it may be advisable to create a probabilistic viewshed, as described by Fisher (1991).

Sensitivity: When intervisibility must be calculated over large distances there is an increased likelihood that very small changes in observer height will result in large differences in viewshed size or shape. This is further compounded by any inaccuracy in the DEM, as well as other factors such as uncertainty about the height of intervening vegetation (see below). For this reason it is good practice to repeat visibility analyses for a range of parameter values in order to understand whether the results are, hopefully, robust or, instead, highly sensitive to small changes (see Lock and Harris 1996 for a discussion).
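The loss of reciprocity caused by unequal offsets can be demonstrated on a single transect. The sketch below is a minimal one-dimensional line-of-sight test. The profile, the 1.7 m eye height and the function name are hypothetical. A 1 m bank just beside point A lets an observer at A see the ground at B, but hides A's ground from an observer at B. Swapping the offsets while keeping A as the nominal viewpoint answers B's question instead. This is the reversal the text recommends when summing viewsheds to obtain a "visible from here" total viewshed.

```python
def visible_along_profile(profile, i_obs, i_tgt, observer_offset=1.7, target_offset=0.0):
    """Line-of-sight test along a 1-D elevation profile (one value per cell, metres)."""
    z_obs = profile[i_obs] + observer_offset     # observer's eye level
    z_tgt = profile[i_tgt] + target_offset       # top of the target
    n = abs(i_tgt - i_obs)
    step = 1 if i_tgt > i_obs else -1
    for k in range(1, n):                        # intervening cells only
        line_z = z_obs + (k / n) * (z_tgt - z_obs)
        if profile[i_obs + k * step] > line_z:   # ground rises above the line-of-sight
            return False
    return True


# Hypothetical transect: A at index 0, B at index 4, a 1 m bank just beside A.
profile = [0.0, 1.0, 0.0, 0.0, 0.0]
print(visible_along_profile(profile, 0, 4))   # observer at A, target at B: True
print(visible_along_profile(profile, 4, 0))   # observer at B, target at A: False
# Reversed offsets with A as the nominal viewpoint reproduce B's view of A:
print(visible_along_profile(profile, 0, 4, observer_offset=0.0, target_offset=1.7))  # False
```

With equal offsets at both ends the test becomes symmetric again. Re-running it for a range of `observer_offset` values is a simple way to follow the sensitivity advice above.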

Substantive issues: These determine the choice of parameter values and data for visibility analysis. They include:

Palaeoenvironment: Most, if not all, published visibility analyses use a modern DEM. This may provide an adequate model of the later prehistoric topography in a rural area; it is much less likely to be adequate for earlier prehistory or in a built-up area.

Palaeovegetation: The so-called 'tree factor' (Wheatley and Gillings 2000; Chapman and Gearey 2000) is invariably raised at every conference presentation of a visibility analysis. It is obvious that the presence of tall vegetation can have a significant effect on intervisibility; it is not quite so obvious what to do about it.

Fig. 10.18 A probabilistic viewshed.

The simplest solution, where palaeovegetation cannot be discounted, is to add the average height of vegetation to the relevant map cells in the DEM. In practice it is usually difficult to obtain palaeoenvironmental reconstructions of sufficient spatial resolution. In any case, the blocking effect of vegetation depends on both the time of year (whether it is in leaf) and its density within a given map cell. For this reason a more promising solution might be to provide a probabilistic measure of blocking analogous to that described by Fisher (1991) for modelling DEM quality, although this is likely to require the development of new algorithms.

Contrast: A target that contrasts with its background may, for practical purposes, be visible over greater distances than one that blends in with its background. Contrast is a function of the innate properties of the target, atmospheric conditions and lighting (Felleman 1986). For example, a newly built chalk burial mound gleaming white in oblique sunlight may be visible from further afield than a grassed-over burial mound viewed through light rain on a generally overcast day. The visibility tools available in current GIS software packages do not model the effect of contrast on intervisibility. The simplest solution is to set the maximum viewing distance on the basis of real-world experiments viewing analogous targets in analogous conditions. However, such use of a single maximum viewing distance fails to account for any gradual reduction in the probability of intervisibility with increasing distance. Likewise, it fails to account for daily and seasonal variations in atmospheric conditions and lighting. Consequently, a more nuanced approach adds a probabilistic element. In the case of effects related to distance (for example the size of the target) it is possible to produce a map in which the probability of actual intervisibility declines with distance. This can be achieved by post-processing the calculated viewshed using a combination of buffering and map algebra, as described by Wheatley and Gillings (2000). A similar solution could be developed for effects related to the horizontal angle of viewing. In the case of effects related to specific conditions, such as weather, it is possible to produce a map in which the probability of actual intervisibility reflects the extent to which a given map cell is always visible, or only visible under a limited set of circumstances (Fig. 10.18). One way of producing such a map is as follows:


1. Calculate a binary viewshed map for each condition using the appropriate maximum viewing distance.
2. Use map algebra to multiply each binary viewshed by the number of units of time (e.g. hours or days) for which the relevant condition holds.
3. Sum the weighted viewsheds.
4. Use map algebra to divide the sum of all the viewsheds by the total number of units of time (e.g. 24 hours, 365 days).

More sophisticated methods for modelling the factors that contribute to contrast have been developed to assess the visual impact of proposed wind turbines. The work of Bishop (2002), in particular, is potentially of great relevance to archaeologists conducting visibility analysis.

Height of observer: The choice of observer height is not straightforward even when there is good skeletal evidence for the population in question. GIS software only accepts one height value, yet the target may have been of interest to many people of different heights. This is not a trivial issue: Lock and Harris (1996) have demonstrated in their analysis of visibility from Danebury Iron Age hillfort that observer height can significantly alter the calculated viewshed. One solution is to calculate the viewsheds for a range of heights. Depending on the research question, it may then be appropriate to create a probabilistic map showing the likelihood of a person of unknown height being able to see each map cell.

Acuity of vision: GIS software does not take account of acuity of vision. To do so it would need information about the observer's eyesight as well as the contrast and size of the target (and even that presupposes one observer and a fixed target). A rather crude solution treats the maximum viewing distance as a proxy for acuity and then repeats the analysis for several different distances representing a range of acuity.
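Steps 2 to 4 of the procedure for a condition-weighted probabilistic viewshed can be sketched as follows. Step 1 is assumed to have been done already, for instance in a GIS, producing one binary viewshed per condition. The weather conditions, durations, transect and function name are all hypothetical.

```python
def probabilistic_viewshed(viewshed_by_condition, duration_by_condition, total_duration):
    """Weight each binary viewshed by how long its condition holds, sum, then normalise.

    viewshed_by_condition: {condition: binary viewshed (list of rows of 0/1)}
    duration_by_condition: {condition: units of time for which that condition holds}
    total_duration:        total units of time, e.g. 24 hours or 365 days
    """
    first = next(iter(viewshed_by_condition.values()))
    rows, cols = len(first), len(first[0])
    total = [[0.0] * cols for _ in range(rows)]
    for condition, vs in viewshed_by_condition.items():
        weight = duration_by_condition[condition]         # step 2: multiply by duration
        for r in range(rows):
            for c in range(cols):
                total[r][c] += vs[r][c] * weight           # step 3: sum
    return [[v / total_duration for v in row] for row in total]   # step 4: divide


# Hypothetical step-1 output: viewsheds for three weather conditions along a 1 x 4 transect,
# each computed with its own maximum viewing distance, and the hours each condition holds.
viewsheds = {"clear": [[1, 1, 1, 1]], "haze": [[1, 1, 1, 0]], "rain": [[1, 1, 0, 0]]}
hours = {"clear": 12, "haze": 8, "rain": 4}
result = probabilistic_viewshed(viewsheds, hours, 24)
print(result)
```

Cells near the viewpoint are visible all day (1.0), while the farthest cell is visible only in clear weather (0.5). The same weighted average could serve the observer-height problem. In that case the per-condition viewsheds would be computed for different eye heights and weighted by an assumed share of the population at each height.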

Theoretical issues: These determine the frame of reference and purpose of GIS-based visibility analysis. Lake and Woodman (2003) have argued that developments in GIS-based visibility studies have tended to recapitulate those in archaeological theory more generally. Important theoretical issues include:
Inferential strategy: One of the most popular reasons for using GIS to calculate intervisibility is to compare the viewsheds of archaeological sites, for example burial mounds, with those of other places that lack evidence for human activity. Given an appropriate experimental design (see Chapter 8) it may then be possible to attribute some level of statistical significance to the results. This may in turn provide a means of testing a hypothesis about the reasons for site location (see Fisher et al. 1997; Lake et al. 1998; Wheatley 1995 for examples). Such analysis can be performed manually in the field (for example Bradley et al. 1993). However, the use of GIS generally allows a much larger number of non-site locations to be sampled, and often also provides the only realistic means of mitigating the impact of modern building.

Perception: There has been significant interest, albeit mostly among European archaeologists, in using GIS visibility tools to investigate perception of landscape, often in an attempt to demonstrate that GIS can be applied in a broadly post-processual framework. Frequently, however, such studies (e.g. Gaffney et al. 1996) have failed to move beyond establishing whether certain points, for example stone circles or rock art, are intervisible. It can be argued that demonstrating the existence of a line-of-sight between points of interest has little to do with perception, which is more properly construed as the process of moving from sensory input to cognitive representation. A small group of researchers are cognisant of this. Most notably, Marcos Llobera (1996, 2001, 2003) has been developing methods which have as their theoretical foundation Gibson's (1986) ecological approach to visual perception. Wheatley and Gillings (2000) also find inspiration in Gibson's work, although for practical purposes they draw more on Higuchi's (1983) eight visual indices of the visual and spatial structure of landscape. As it stands, these more sophisticated approaches to perception largely constitute the development of methodology: the challenge now is to make substantive contributions to specific archaeological problems.

Visualism: A frequent criticism of GIS-based visibility analysis is that it privileges vision over sound, smell and touch. This does more to perpetuate Renaissance ideals than reflect the relative importance of vision in the more distant past (Witcher 1998; Wheatley and Gillings 2000). Tschan et al. (2000, Fig. 10) illustrate a hypothetical 'perception shed', which records the combination of senses that may be used to perceive each cell in a raster map. They suggest that hearing and smelling, as senses 'subject to distance as a process' (Tschan et al. 2000, p. 45), could be modelled by adapting existing visibility analysis tools, although they do not provide a worked example. Another critique related to that of visualism is that the plan shape of a viewshed depicted in a GIS map bears little resemblance to the experience of people on the ground and is therefore of limited interpretative value (Thomas 1993). Those working in a theoretical framework for which this is a problem will wish to consider whether virtual reality provides a means of overcoming the 'specular' (Thomas 1993) means of representation afforded by two-dimensional GIS mapping. The potential benefits of virtual reality have been discussed by Gillings and Goodrick (1996). Exon et al. (2000) provide a substantial case study that combines 'traditional' GIS and virtual reality in an investigation of the visual relationship between Stonehenge and its surroundings.

10.4 Conclusion
The extents of archaeologically significant regions are often known prior to GIS-based analysis, in which case their content can usually be analysed using methods discussed in Chapter 7, such as polygon overlay. In contrast, this chapter focused on the use of GIS to help define regions whose extents are not already known. In the first part we introduced methods that can be used to create geometric regions, such as buffers and Thiessen polygons. These techniques do not take account of the content of space. In the second part of the chapter we turned our attention to what, for convenience, we have labelled 'topographic' regions. The methods used to create these kinds of regions do take account of the content of space. Cost-surfaces allow one to define regions on the basis of accessibility, which may be physical and/or cultural. Visibility analysis provides a means of defining regions as sets of locations that are intervisible, although we have suggested that it will often be most appropriate to think of such regions in probabilistic terms. One important kind of topographic region that does not appear in this chapter, however, is the watershed, or rainfall catchment. We discuss this region in Chapter 11, where we are better able to explore its derivation from a hydrological network.


Routes: networks, cost paths and hydrology

11.1 Introduction
It is often appropriate to model the spatial organisation of human activity in terms of point locations and the relationships between them; for example the movement of goods between settlements or the intervisibility between forts. This chapter discusses the various network analysis tools that can be used to study such relationships. It also discusses techniques for predicting the likely path of an unknown route between point locations, as well as the flow of water and watersheds.

Given that the bulk of archaeological data is ultimately point based, it is surprising that network analysis has not featured more prominently in the archaeological application of GIS. Of course, what is a point at one scale of analysis may be a region at another. It is thus important to recognise that the applicability of network analysis is determined by the way in which the problem is framed rather than the geographical extent of a particular study. A few published archaeological network analyses have investigated subjects ranging in scale from the colonisation of new territory (Allen 1990; Zubrow 1990) and the location of 'centres' (Bell and Church 1985; Mackie 2001) to the connectivity of rooms in individual buildings (Foster 1989). There is no reason why this range could not be extended to even smaller extents: to, for example, investigate refitting patterns among lithic artefacts in a single stratigraphic unit.

11.2 Representing networks
We are all familiar with networks such as road and rail networks, and also the idea of social networking: that a friend or business associate knows somebody who knows somebody else who can provide . . . These examples demonstrate that networks can be conceived at different levels of abstraction. In the case of road and rail, observable physical links connect fixed points that have geographic reference. In the case of social networks, however, the links do not have (at least not permanently) an observable physical manifestation and the people linked do not generally have a fixed geographic reference. Consequently it is important to differentiate between different types of network, most importantly for archaeological purposes: pure networks, flow networks and transportation networks (Fischer 2004).

The 'purest' form of network, known to mathematicians as a simple graph, comprises a finite number of nodes connected by edges. Each edge joins exactly two nodes, there is no more than one edge per pair of nodes and edges do not have directionality (Wilson 1996). Social networks are readily represented using a simple graph, since the nodes can be used to represent people and the edges the relationships between those people. In that case the rules about edges reflect the ideas that human relationships are dyadic, that any two people have only one relationship, that people do not have relationships with themselves and that relationships are mutual.

Fig. 11.1 Connected (a) and disconnected (b) simple graphs.

Fig. 11.2 Paths and cycles in a simple graph: CFD is a cycle; BCE is a path of length 2; BCDE is a path of length 3; BCDGE is a path of length 4.

A graph is said to be connected if every node can be reached from any other node by traversing one or more edges; otherwise it is said to be disconnected (Fig. 11.1). Thus in a connected social network it would be possible to deliver a letter to any other person by asking an acquaintance to pass it on to one of their acquaintances, and so on. If each person handled the letter only once then the sequence of edges by which it reached the final recipient would constitute a path (Fig. 11.2). If the sender and final recipient were the same person then the path would constitute a cycle. Note that the length of a path is simply the number of edges between the origin and destination nodes. This is not so readily appreciated when the nodes happen to have a fixed geographic reference (i.e. geographical distance and path length are not necessarily correlated). The information stored in a simple graph is purely topological, i.e. about the connectivity between the nodes only; in the above example it is about whether or not two people have a relationship. This is entirely appropriate for purposes such as
Watts and Strogatz's (1998) investigation of what kind of social network underwrites the 'small-world' phenomenon: the observation that, despite having a densely knit network of close friends, we can also be connected to almost everyone else on the planet in just six steps. It may be less appropriate, however, for researching the influence of physical proximity on social network formation (see Gamble 1998 for an archaeological interest in this subject), since the placement of nodes in a simple graph provides no information about physical location.

Fig. 11.3 A weighted digraph (a) of a road network (b). In this case the weights reflect the number of lanes.

In most GIS software, networks are represented as vector maps,¹ such that the nodes are represented as points and the edges as lines joining them. Two problems arise in attempting to represent a simple graph in this way. The first is the issue of planarity (see transportation networks below) and the second is that it will require the invention of arbitrary coordinates for the nodes. Consequently it is usually better to use specialist software, such as Pajek,² for the analysis of simple graphs. Such software also has the virtue of offering a wider range of graph theoretic measures than will normally be found in GIS packages. Of course, in the case of a network where the nodes do have fixed geographic reference, the appropriate coordinates can be assigned to the points representing the nodes. In this case, however, it is important to recognise that such locational information is not intrinsic to the graph and will therefore be ignored in the calculation of most graph theoretic measures (see Section 11.3.1).

11.2.2 Flow networks
Given an interest in the movement of goods from one location to another, it is very likely that one would wish to model both the direction and quantity of the goods moved through a network. Networks in which direction is important can be represented using what mathematicians call a directed graph, or digraph (Fig. 11.3).
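A weighted digraph of the kind shown in Fig. 11.3 can be held as a nested dictionary of arcs. The sketch below is a minimal illustration on a hypothetical network, not the data of Fig. 11.3. It echoes that figure by reading the weights as capacities (numbers of lanes). It also shows features that a simple graph forbids: a pair of opposed arcs for two-way flow and a self-loop. Function names are hypothetical.

```python
# Hypothetical weighted digraph, stored as ARCS[origin][destination] = weight.
# Here the weight is read as a capacity (number of lanes), echoing Fig. 11.3.
ARCS = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 2, "C": 1},   # A->B and B->A: two arcs represent two-way flow
    "C": {"C": 1, "D": 1},   # C->C is a self-loop, permitted in a digraph
    "D": {},
}


def out_degree(arcs, node):
    """Number of arcs leaving `node`."""
    return len(arcs[node])


def in_degree(arcs, node):
    """Number of arcs arriving at `node` (a self-loop counts once)."""
    return sum(1 for targets in arcs.values() if node in targets)


def route_capacity(arcs, route):
    """Largest amount that can pass along a route: its smallest capacity (the bottleneck).

    Raises KeyError if consecutive nodes on the route are not joined by an arc.
    """
    return min(arcs[a][b] for a, b in zip(route, route[1:]))


print(out_degree(ARCS, "C"), in_degree(ARCS, "C"))   # 2 3
print(route_capacity(ARCS, ["A", "B", "C", "D"]))    # 1
```

If the weights were impedances rather than capacities, the cost of a route would be the sum of its arc weights rather than their minimum.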
¹ It is theoretically possible to develop raster representations of networks using Tomlin's (1990) 'incremental linkage' operator. This is seldom implemented in commercial software, but one exception is MFWorks (www.keigansystems.com/software/mfworks/index.html).
² http://vlado.fmf.uni-lj.si/pub/networks/pajek/



In a digraph, edges have directionality, i.e. they point from one node to another. Consequently, it is permissible in a digraph to have more than one edge joining a pair of nodes so as to represent the bidirectional flow of goods, influence or whatever else is being modelled. It is usually also considered permissible to have edges that both start and end at the same node, perhaps in this case to represent the internal consumption of goods produced at a node. Note that some authors prefer the term arc for a directed link, reserving the label edge for an undirected link (e.g. Fischer 2004). It is possible to model the quantities that flow through a network by attaching weights to the edges. In a weighted digraph, often referred to in the GIS literature as a valued graph (e.g. Chou 1997, p. 229), the weights (values) are usually chosen to represent either impedance or a capacity constraint. Impedance measures the relative or absolute resistance to the flow of material between the nodes joined by an edge. A capacity constraint represents the maximum amount of material that can flow between nodes. Such weights typically reflect real-world phenomena such as the width or surface quality of a road, the transport technology available on that route or even the distance between the relevant nodes. It might not be immediately clear from a GIS perspective why the distance between nodes should be specified as a weight, when often it would appear to be an intrinsic property of the real-world distribution of the nodes. The pragmatic answer is that impedance and capacity constraints are rarely a function of the simple straight-line distance between nodes. Instead they are a function of the distance covered by (quite possibly winding) routes on the ground, and few GIS packages can automatically calculate such distances in the course of network analysis. The more fundamental reason why distances are specified as weights is that the spatial element of a flow network is an abstraction of its topology rather than a representation of its exact physical layout.
Consequently, given appropriate weights, it is perfectly possible to calculate flows without reference to the geographic coordinates of the nodes involved.

11.2.3 Transportation networks
The majority of GIS-based network analyses concern the location of facilities such as markets or the routing of deliveries on transportation networks, mostly road networks. Although much of this work can be carried out using abstracted flow networks, two considerations favour ordinary geographically referenced vector maps. The first is the demand from end-users for realism (for example, drivers want to be able to recognise an edge as a particular winding road on the ground). The second is the benefit that accrues from using existing GIS databases. Together these make it desirable to perform network analyses on such maps. The three requirements for transportation modelling can be met as follows.
1. It should be possible to attach weights to the lines representing edges. Since most vector maps already associate attribute values with each line, this is normally only a matter of adding an additional field to the attribute table (see Chapter 2).
2. It should not be necessary to eliminate information about the real-world course of a route in favour of a straight-line edge between the relevant nodes. All current GIS packages with network analytic pretensions can distinguish two kinds of point. Nodes carry topological information, i.e. they represent junctions. Vertices carry only topographical information, i.e. their sole function is to mark a change in direction of the road, railway, etc. Provided this is the case, it is perfectly possible to perform network analysis on curved edges whose real-world course is approximated by many separate straight lines (Fig. 11.4).
3. Nodes should carry information about junctions. Figure 11.4 shows a common situation in which one road crosses another on a bridge, so that there is no junction at which a vehicle can leave one road and join the other. The easiest way of representing this situation in a network is to have the edges representing the two roads cross without a node. However, GIS software often enforces planarity in the process of creating a 'topologically correct' vector map, which effectively means that edges are not allowed to cross except at nodes. To understand the concept of planarity, imagine a network in which the nodes are represented by balls placed on a table and the edges by lengths of string tied between the relevant balls. A simple graph represented in this way would remain topologically constant even if one picked the balls up and moved them around, providing that the strings remained attached to the same balls. In the case of a planar graph, however, it is not permissible to pick up either the balls or strings, only to push them around on the table top. The theoretical significance of this restriction is that some arrangements of nodes that are considered topologically equivalent in a simple graph are not topologically equivalent in a planar graph (Fig. 11.5). The practical significance for transport analysis is that GIS software that enforces planarity requires a node wherever edges cross, even where there is no possibility on the ground of moving from one to the other. The most common solution to this problem uses a turn table (Fischer 2004). For example, ArcView's Network Analyst extension requires that at every node the user specify whether it is possible to move between each pair of edges joining that node.³ In the case of a bridge, the turn table might look like that depicted in Fig. 11.6. Compiling a turn table is tedious, but it has the virtue of flexibility in that it provides a means of representing a range of real-world complications, such as prohibited turns.

Fig. 11.4 A transport network. The nodes (large circles) carry topological information; the vertices (small circles) carry only topographical information.
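A turn table of the kind described above can be represented as a lookup from (incoming edge, outgoing edge) pairs to a permitted/prohibited flag. The sketch below is hypothetical throughout. The edge names and function names are invented, and it does not reproduce ArcView's file format or the table in Fig. 11.6. It builds the table for a bridge node, where movement is only allowed along the same road, and then records a prohibited turn.

```python
from itertools import permutations


def bridge_turn_table(road_1, road_2):
    """Turn table for a node where two roads cross on a bridge.

    road_1, road_2: the two edge identifiers of each road at this node.
    A move is allowed only between edges belonging to the same road.
    """
    edges = list(road_1) + list(road_2)
    return {(frm, to): (frm in road_1) == (to in road_1)
            for frm, to in permutations(edges, 2)}


def can_turn(turn_table, from_edge, to_edge):
    # Pairs absent from the table (including U-turns) are treated as prohibited.
    return turn_table.get((from_edge, to_edge), False)


table = bridge_turn_table(road_1=("a", "b"), road_2=("c", "d"))   # hypothetical edge names
table[("d", "c")] = False    # e.g. the lower road is one-way: a prohibited movement
print(can_turn(table, "a", "b"), can_turn(table, "a", "c"), can_turn(table, "d", "c"))
```

Storing every ordered pair explicitly is what makes the table tedious to compile by hand. It is also what lets it express one-way restrictions, which a symmetric representation could not.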

11.3 Analysing networks
The following sections provide a detailed explanation of a wide and complex variety of measurement terms for networks. Although the terms themselves are mathematical and abstract, we do show how these can be applied to archaeological data.



Fig. 11.5 Planar graphs (networks a, b and c). All three networks are topologically equivalent unless embedded in the plane, in which case a = b ≠ c.

Fig. 11.6 A turn table for node D in the transport network in Fig. 11.4.

The questions asked of networks fall into three main groups. First, there are questions about network structure or topology, i.e. how the nodes are connected. Second, there are questions about the location of particular facilities (usually sites) on the network. The third set of questions concerns the routing of information, goods or people through networks.

11.3.1 Measures of network structure
Measures of network structure, such as how well-connected nodes are to each other, potentially enable one to establish whether an ancient road network would survive
Routes Table ll.l Basic measures of the networksin Fig. I 1.I

Network d Network ,

1 '7'72


Table ll.2 Local measures for the nodesin Fig. 11.2

Ai Ki

At 821. 0 c 40. 68 2 D40. ' 7 8 2 E 40. ' 7 8 2 F 21. 0 G 3

1 .0

133 12

0 .8 3

12 103

the severingof a few links, whetheran ancientcity may havegainedimportance by virtue of its accessibility, whetherurbangrowth is scalefree or whethersome parts of a museumare more accessible to visitors than othersby virtue of their intervisibility. We describesomeof theseexarnples in moredetail below,but first we reviewsome of themorewidelyusedmeasures of networkstructure. Since not all graph theoreticmeasures are independent of one another,there is sorneoverlapin their interpretation. Consequently, the following scheme is not intendedto be prescriptive, but simply to providea startingpoint whenchoosinga measure for a particularproblem. Basic measures Measures of network structureoften requirea few basicquantities, including:
N The total number of nodes in a graph. t The total number of edges in a gaph. Note that a simple graph may contain a maximum of N(N - l)/2 edges. G The number of components of a graph, i.e. the number of sections of the graph that aie totally disconnected ftom one another.
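The three basic quantities can be counted directly from node and edge lists. This minimal Python sketch assumes a simple undirected graph (no duplicate edges or loops) given as plain lists, uses a union-find structure to count components, and runs on a hypothetical example graph.

```python
def basic_measures(nodes, edges):
    """Return (N, E, G) for a simple undirected graph.

    N is the number of nodes, E the number of edges and G the number of
    components, counted with a union-find structure.
    """
    parent = {node: node for node in nodes}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    n_nodes = len(nodes)
    n_edges = len(edges)
    n_components = len({find(node) for node in nodes})
    return n_nodes, n_edges, n_components


# Hypothetical graph: a three-node chain, a separate pair, and an isolated node.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]
N, E, G = basic_measures(nodes, edges)
print(N, E, G, "max edges:", N * (N - 1) // 2)  # 6 3 3 max edges: 15
```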

The two networks shown in Fig. 11.1 have the basic measures shown in Table 11.1.

Local measures

A number of measures can be used to assess the location of an individual node in a network. The nodes in Fig. 11.2 have the local measures shown in Table 11.2.

Local connectivity


The nodal degree, $k_i$, is simply the number of edges connected to a particular node. In a simple graph the maximum possible number of edges that can be connected to a node $i$ is $N - 1$. In the case of a directed graph it is possible to distinguish the in-degree from the out-degree.

The clustering coefficient of node $i$, $C_i$, is calculated as

$$C_i = \frac{e}{n(n-1)/2} \tag{11.1}$$

where $n$ is the number of nodes directly connected to $i$ (including $i$ itself) and $e$ is the number of edges that connect those nodes to one another. In other words, $C_i$ is the number of edges in the immediate neighbourhood of $i$ expressed as a fraction of the maximum possible number of edges in that neighbourhood. This fraction ranges from 0, if $i$ is completely disconnected, to 1, if all the nodes to which $i$ is directly connected are also connected to one another. A social interpretation of $C_i$ is that it measures the extent to which your friends are also your friends' friends.
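A minimal Python sketch of $k_i$ and of equation (11.1), assuming an undirected graph stored as an adjacency map. Following the definition above, the neighbourhood of $i$ includes $i$ itself; many network libraries instead compute clustering over the neighbours alone, so their values will generally differ. The example graph is hypothetical.

```python
from itertools import combinations


def adjacency(nodes, edges):
    """Undirected adjacency map {node: set of neighbours}."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree(adj, i):
    """Nodal degree k_i: the number of edges connected to node i."""
    return len(adj[i])


def clustering(adj, i):
    """C_i = e / (n(n-1)/2), where the neighbourhood of i includes i itself (eq. 11.1)."""
    hood = adj[i] | {i}
    n = len(hood)
    if n < 2:
        return 0.0  # an isolated node is given the value 0, as in the text
    e = sum(1 for u, v in combinations(hood, 2) if v in adj[u])
    return e / (n * (n - 1) / 2)


# Hypothetical graph: triangle A-B-C with a pendant node D attached to C.
adj = adjacency("ABCD", [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
for node in "ABCD":
    print(node, degree(adj, node), round(clustering(adj, node), 2))
# A 2 1.0
# B 2 1.0
# C 3 0.67
# D 1 1.0
```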

Local accessibility

While measures of local connectivity provide information about the immediate neighbourhood of a node, measures of accessibility provide information about how readily a node may be reached from further afield. Notice in Table 11.2 how the most accessible nodes (those with low $A_i$, i.e. C, D and E) also have the least clustered neighbourhoods (low $C_i$).

The accessibility index of node $i$, $A_i$, is calculated as

$$A_i = \sum_{j} l_{ij} \tag{11.2}$$

where $l_{ij}$ is the shortest path length between nodes $i$ and $j$. For those unfamiliar with this notation, the symbol $\sum_{j}$ says to sum whatever follows, in this case the shortest path lengths, for all the nodes $j$ to which $i$ is connected, excluding $i$ itself. Note that there is no instruction to sum the shortest path length to all nodes in the graph, i.e. from $j = 1$ to $j = N$, because it is possible that some nodes are not connected at all, in which case the path length between them is considered infinite.

The König index is a measure of the centrality of a node. The König index of node $i$, $K_i$, is simply the longest of the shortest path lengths to a node $j$ to which $i$ is connected. This can be stated formally as

$$K_i = \max_{j} l_{ij} \tag{11.3}$$
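Because $l_{ij}$ counts edges, shortest path lengths in an unweighted network can be found by breadth-first search. The sketch below, assuming an undirected adjacency map and a hypothetical four-node path graph plus one unconnected node, sums them for $A_i$ (eq. 11.2) and takes their maximum for $K_i$ (eq. 11.3). Nodes that cannot be reached are never visited, so they drop out of both measures as the text requires.

```python
from collections import deque


def adjacency(nodes, edges):
    """Undirected adjacency map {node: set of neighbours}."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def shortest_path_lengths(adj, source):
    """Breadth-first search: l_ij (in edges) from source to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def accessibility(adj, i):
    """A_i (eq. 11.2): sum of l_ij over every node j connected to i, excluding i."""
    return sum(d for j, d in shortest_path_lengths(adj, i).items() if j != i)


def koenig(adj, i):
    """K_i (eq. 11.3): the longest of the shortest path lengths from i."""
    return max(shortest_path_lengths(adj, i).values())


# Hypothetical path graph A-B-C-D plus an unconnected node E.
adj = adjacency("ABCDE", [("A", "B"), ("B", "C"), ("C", "D")])
for node in "ABCD":
    print(node, accessibility(adj, node), koenig(adj, node))
# A 6 3
# B 4 2
# C 4 2
# D 6 3
```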


Global measures

Sometimes the overall properties of a network are of more interest than those of specific nodes or, alternatively, one may wish to assess whether or not a node is typical by comparing its properties with those of an 'average' node. The networks in Fig. 11.7 have the global measures shown in Table 11.3.


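Table 11.3 Global measures for the networks in Fig. 11.7

Measure                                Network a   Network b
Average degree, $\bar{k}$              1.71        4.57
Beta index, $\beta$                    0.86        2.29
Gamma index, $\gamma$                  0.29        0.76
Alpha index, $\alpha$                  0.0         1.11
Average clustering coefficient, $\bar{C}$  0.79    0.89
Network diameter, $d$                  5           2
Shimbel index, $D$                     50          26
Average path length, $a$               2.38        1.24

[Figure: Network a, Network b]

Fig. 11.7 Sparsely (a) and well-connected (b) networks.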

Global connectivity

The average degree, $\bar{k}$, of a network is calculated as

$$\bar{k} = \frac{1}{N}\sum_{i=1}^{N} k_i \tag{11.4}$$

If the average degree of a simple graph is close to $N - 1$ then almost every node is directly connected to every other, whereas if it is considerably smaller then many nodes are not directly connected, although they may still be connected via intermediate nodes. Indirect connectivity can be quantified using accessibility measures.

The beta index, $\beta$, is simply the average number of edges available per node, calculated as

$$\beta = \frac{E}{N} \tag{11.5}$$

$\beta$ ranges from 0 for a completely disconnected graph to $(N-1)/2$ for a completely connected simple graph.

The gamma index, $\gamma$, is the ratio of the number of edges actually present in a network to the maximum number possible. It is calculated for a simple network as

$$\gamma = \frac{E}{E_{\max}} = \frac{E}{N(N-1)/2} \tag{11.6}$$



The maximum possible number of edges in a planar network is smaller than for a non-planar network with the same number of nodes (strictly so once there are five or more nodes), so in this case $\gamma$ is calculated as

$$\gamma = \frac{E}{E_{\max}} = \frac{E}{3(N-2)} \tag{11.7}$$

$\gamma$ measures essentially the same network properties as $\beta$, but makes it easier to compare different networks because it is constrained to range in value from 0 for a completely disconnected network to 1 for a completely connected network.

The alpha index, $\alpha$, is the ratio of the actual number of cycles in a network to the maximum number possible (the numerator, $E - N + G$, is the number of independent cycles). It is calculated as

$$\alpha = \frac{E - N + G}{2N - 5} \tag{11.8}$$

$\alpha$ provides some indication of the amount of redundancy among the connections in a network, so that a network with high $\alpha$ might be thought more robust than one with a low $\alpha$.

The average clustering coefficient, $\bar{C}$, is calculated as

$$\bar{C} = \frac{1}{N}\sum_{i=1}^{N} C_i \tag{11.9}$$

A value of 1 would indicate that a network is made up of a number of globally disconnected neighbourhoods (components) that are each locally highly connected, whereas a value of 0 would indicate that no nodes share neighbours. In studies of social networks, $\bar{C}$ is used as a measure of 'cliqueiness'.
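The global connectivity measures in equations (11.4) to (11.9) can be computed together. This Python sketch assumes a simple undirected graph given as node and edge lists; the `planar` flag switches between equations (11.6) and (11.7), and $C_i$ follows equation (11.1). The seven-node tree used as input is hypothetical.

```python
from itertools import combinations


def adjacency(nodes, edges):
    """Undirected adjacency map {node: set of neighbours}."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def count_components(adj):
    """G: number of connected components, via iterative depth-first search."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u] - seen:
                seen.add(v)
                stack.append(v)
    return components


def clustering(adj, i):
    """C_i from eq. (11.1), with i counted in its own neighbourhood."""
    hood = adj[i] | {i}
    n = len(hood)
    if n < 2:
        return 0.0
    e = sum(1 for u, v in combinations(hood, 2) if v in adj[u])
    return e / (n * (n - 1) / 2)


def global_connectivity(nodes, edges, planar=False):
    adj = adjacency(nodes, edges)
    n, e, g = len(adj), len(edges), count_components(adj)
    e_max = 3 * (n - 2) if planar else n * (n - 1) / 2      # eq. 11.7 or 11.6
    return {
        "k_bar": sum(len(nb) for nb in adj.values()) / n,   # eq. 11.4
        "beta": e / n,                                      # eq. 11.5
        "gamma": e / e_max,                                 # eq. 11.6 / 11.7
        "alpha": (e - n + g) / (2 * n - 5),                 # eq. 11.8
        "c_bar": sum(clustering(adj, i) for i in adj) / n,  # eq. 11.9
    }


# Hypothetical seven-node tree: a chain of six nodes with a branch at node 3.
nodes = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (3, 7)]
for name, value in global_connectivity(nodes, edges).items():
    print(name, round(value, 2))
```

For this tree the results round to $\bar{k}$ = 1.71, $\beta$ = 0.86, $\gamma$ = 0.29, $\alpha$ = 0.0 and $\bar{C}$ = 0.79, the same values listed for network a in Table 11.3; a tree has no cycles, which is why $\alpha$ is zero.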

Global accessibility

Measures of global accessibility provide some indication of how easy it is to travel between nodes that are connected, but not directly connected.

The network diameter, $d$, is equal to the largest König index of any node in the network. It measures the maximum journey length that would be required to travel between two nodes. Of course, longer lengths might be possible by avoiding the shortest paths, but this is usually of less interest.

The Shimbel index, $D$, is defined as the total number of edges traversed by all the shortest paths between pairs of nodes in the network. It may be calculated as

$$D = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} l_{ij} \tag{11.10}$$

A network with a small Shimbel index is said to be compact, in the sense that most nodes are only a short path length away from others.

The average path length, $a$, is calculated as

$$a = \frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} l_{ij} \tag{11.11}$$

Note that $2/N(N-1)$ is the reciprocal of the number of pairs of nodes, which gives $a$ an advantage over the Shimbel index in that it scales the measure according to the network size.
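All three global accessibility measures derive from the matrix of shortest path lengths, the same kind of short-path matrix that Carter (1969) uses in the Serbian example below. The sketch assumes a connected, unweighted, undirected network (the seven-node tree is hypothetical) and also shows a Carter-style 'gross vertex connectivity' obtained by repeatedly multiplying the 0/1 connectivity matrix by itself. Note that the entries of a matrix power count walks, which may revisit nodes, rather than strictly distinct routes.

```python
from collections import deque


def adjacency(nodes, edges):
    """Undirected adjacency map {node: set of neighbours}."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs_lengths(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_accessibility(adj):
    """Diameter d, Shimbel index D (eq. 11.10) and average path length a (eq. 11.11).
    Assumes a connected network, so that every l_ij is finite."""
    nodes = list(adj)
    n = len(nodes)
    sp = {i: bfs_lengths(adj, i) for i in nodes}  # short-path matrix
    if any(len(row) != n for row in sp.values()):
        raise ValueError("network is not connected")
    d = max(max(row.values()) for row in sp.values())  # largest König index
    D = sum(sp[nodes[x]][nodes[y]] for x in range(n) for y in range(x + 1, n))
    a = 2 * D / (n * (n - 1))
    return d, D, a


def gross_vertex_connectivity(adj, steps):
    """Row sums of the connectivity matrix C1 raised to the power `steps`,
    i.e. the number of walks of exactly `steps` edges starting at each node."""
    nodes = list(adj)
    size = len(nodes)
    c1 = [[1 if v in adj[u] else 0 for v in nodes] for u in nodes]
    power = [row[:] for row in c1]
    for _ in range(steps - 1):
        power = [[sum(power[i][k] * c1[k][j] for k in range(size))
                  for j in range(size)] for i in range(size)]
    return {u: sum(row) for u, row in zip(nodes, power)}


# Hypothetical seven-node tree: a chain of six nodes with a branch at node 3.
adj = adjacency([1, 2, 3, 4, 5, 6, 7],
                [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (3, 7)])
d, D, a = global_accessibility(adj)
print(d, D, round(a, 2))                # 5 50 2.38
print(gross_vertex_connectivity(adj, 2))
```

For this tree the diameter, Shimbel index and average path length (5, 50 and 2.38) coincide with the network a column of Table 11.3.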

Examples

The following examples have been chosen to illustrate the application of network analysis at both a regional scale and the scale of individual buildings. Gorenflo and Bell (1991) describe several other examples concentrating on archaeological and historical transportation networks.

Settlement location

Carter (1969) used measures of network structure to analyse the centrality of successive capitals of medieval Serbia in the thirteenth and fourteenth centuries AD. His aim was to identify the likely extent of the Serbian oecumene³ and, in particular, to assess whether Stefan Dušan's choice of capital city ultimately contributed to the demise of the Serbian state. For this purpose he created an undirected planar graph of trade routes with 42 nodes representing medieval settlements (Fig. 11.8). Carter chose to analyse the trade network by representing it in matrix form, which is a common approach in network analysis. A $C_1$ matrix comprises a set of $N$ rows and $N$ columns, such that each entry is coded 1 if the row and column node are connected by an edge and 0 otherwise (Fig. 11.9). By raising the $C_1$ matrix to the power of the network diameter, in this case to the 11th power, he was able to obtain a result in which each entry gives the number of 11-step routes between the relevant pair of nodes. He then summed each row to obtain the 'gross vertex connectivity' of each node, i.e. the total number of 11-step routes to which it is connected. Carter found that the medieval capitals of Serbia were ranked 1, 5, 6, 7 and 17 in descending order of gross connectivity, with Dušan's chosen capital, Skoplje, ranked lowest.

[Figure: graph of Serbian trade routes]

Fig. 11.8 A graph of Serbian trade routes in the thirteenth and fourteenth centuries (data from Carter 1969, Fig. 2).

[Figure: $C_1$ matrix]

Fig. 11.9 The $C_1$ matrix for the Serbian trade network shown in Fig. 11.8.

A significant problem with gross vertex connectivity is that it includes redundant information: it seems unrealistic that a trader would prefer to do business with the settlement at the end of the largest number of 11-step routes as opposed to a settlement that is generally only a few steps away. Consequently, Carter constructed an $N \times N$ short-path matrix, in which each entry records the number of edges in the shortest path between the relevant pair of nodes. He again summed each row, this time obtaining the accessibility index $A_i$ as defined above. The medieval capitals ranked 1, 2, 3, 6 and 8 in order of descending accessibility, that is increasing $A_i$, with Skoplje ranked lowest.

Carter concluded that Stefan Dušan's chosen capital was not best placed in terms of transportation links and that this may have been a contributory factor in the demise of the Serbian state. Perhaps more interestingly from a methodological point of view, he also created maps showing isolines⁴ of accessibility (Fig. 11.10), which he used to suggest that the medieval Serbian oecumene may have extended further west than hitherto realised. This suggestion also drew on more advanced work in which Carter calculated the eigenvectors of the connectivity matrix, a subject which falls beyond the scope of this book; interested readers should consult Gould's (1967) On the Geographical Interpretation of Eigenvalues and may also find a straightforward introduction to matrix algebra useful (e.g. Chapter 18 of Eason et al. 1980).

[Figure: isolines of accessibility; the 10 and 20 most accessible places]

Fig. 11.10 Distribution of accessibility in the medieval Serbian oecumene (data from Carter 1969, Figs. 3 and 6).

³ The concept of an oecumene roughly equates to that 'portion of the state that supports the densest and most extended population and has the closest mesh of transportation lines' (Whittlesey 1944 cited in Carter 1969, p. 39).

⁴ See Chapter 12 for an explanation of isolines.

Building design

There has been sporadic archaeological interest in space syntax, which comprises a theory of the human use of space and a collection of methods for analysing the human use of space according to that theory. The main theoretical tenets of space syntax were laid out in Hillier and Hanson's (1984) The Social Logic of Space, and its relationship with architectural practice was further developed in Hillier's (1996) Space is the Machine. The space syntax method that has been most used by archaeologists is the construction of justified graphs. Such graphs are tree-like networks that show how discrete spaces in a building, or sometimes a group of buildings, are accessible from one another (Fig. 11.11). They have been used to investigate issues such as social stratification and privacy in contexts including Scottish Iron Age brochs (Foster 1989), English medieval nunneries and monasteries (Gilchrist 1997, pp. 160–169), North American pueblos (Bustard 1996) and Bulgarian tells (Chapman 1990).

[Figure: justified graph]

Fig. 11.11 A justified graph of a simple building.

Our concern here, however, is with recent work that uses measures of network structure to analyse the layout of buildings in terms of the intervisibility of spaces within them. Turner et al. (2001) calculate the intervisibility between all points on a 1-m grid within a building. They then construct a visibility graph (Floriani et al. 1994) in which the nodes represent the points on the grid and the edges indicate mutual visibility (Fig. 11.12). Note that in this case the nodes represent analytical units rather than real, clearly delineated, physical entities; this clearly demonstrates the flexibility of network analysis. Turner et al. (2001) argue that network measures of visibility graphs can provide useful information about the way in which built spaces influence human behaviour. First, and most straightforwardly, the immediate neighbourhood of a node constitutes that node's isovist (see Chapter 10) and its nodal degree is linearly related to the isovist area. They argue that the local clustering coefficient, $C_i$, measures the 'proportion of intervisible space within the visibility neighbourhood of a point' (Turner et al. 2001, p. 110). By plotting a map of $C_i$ for all nodes it is then possible to investigate how much the visual field changes as one moves around the building, since $C_i \to 0$ implies a large change in what is visible while $C_i \to 1$ implies little change. Finally, Turner et al. argue that the mean shortest path length from each node provides a measure of visual accessibility which is sensitive to global, not just local, relationships. They have plotted this for a set of points in a major art gallery in London and found that it is correlated with the number of visits to rooms in the gallery. Importantly, the actual Euclidean distance between the points does not correlate with visits, leading Turner et al. to argue that mean shortest path length measures an aspect of intervisibility that influences the comprehensibility of space.

[Figure: visibility graph]

Fig. 11.12 A visibility graph for the exterior of an L-shaped building.

11.3.2 Location on networks

The problem of optimally locating facilities within a catchment is known as the location-allocation problem. In modern-day planning and commerce, facilities are typically things like schools and supermarkets (see Longley et al. 2001), but it is not difficult to think of other centralised resources more relevant to the past. Location-allocation problems can be solved for the placement of facilities in continuous space, or for their position on networks. Bell and Church (1985) provide archaeological examples of both, but it is the latter which interests us here (we have already discussed continuous space in Chapter 10).

Solving location-allocation problems on networks

As Gorenflo and Bell (1991) point out, networks can play two roles in location-allocation models: they can provide a locational constraint on the placement of facilities (i.e. the facilities must be on the network) or they can provide a means of modelling the interaction between candidate centres (e.g. edges that represent the flow and volume of goods). What constitutes the 'optimal' location of facilities on a network is determined by the objective function. The most commonly employed objective functions are designed to solve the following problems:



The p-median problem is to locate $p$ centres offering facilities so as to minimise the total distance travelled from all demand locations to their nearest facility.

Coverage problems seek to locate centres offering facilities so as to ensure that all demand locations are within some specified distance of their nearest facility. Compared with the p-median solution, the average distance between a demand location and its nearest facility may be higher, but the maximum distance is likely to be lower. In contemporary planning, the coverage problem is often seen to provide the most appropriate solution for the location of emergency services, where it is the maximum response time that must be minimised.
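With a handful of nodes both classical objective functions can be solved exactly by trying every candidate set of centres. This Python sketch uses a hypothetical, connected, weighted network in which every node is both a demand location and a candidate centre; it finds the p-median solution and the smallest set of centres that leaves no node further than a given distance from its nearest centre. Exhaustive search is only practical for small networks, and larger problems are normally tackled with heuristics; the function names are illustrative.

```python
import heapq
from itertools import combinations


def dijkstra(graph, source):
    """Shortest weighted distances from source; graph is {node: {neighbour: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def p_median(graph, p):
    """Exhaustive p-median: the p centres minimising total distance from every
    demand node to its nearest centre. Returns (total distance, centres)."""
    dist = {u: dijkstra(graph, u) for u in graph}
    best = None
    for centres in combinations(sorted(graph), p):
        total = sum(min(dist[c][i] for c in centres) for i in graph)
        if best is None or total < best[0]:
            best = (total, centres)
    return best


def min_cover(graph, radius):
    """Coverage: the smallest set of centres leaving no node further than
    `radius` from its nearest centre (first such set in sorted order)."""
    dist = {u: dijkstra(graph, u) for u in graph}
    for p in range(1, len(graph) + 1):
        for centres in combinations(sorted(graph), p):
            if all(min(dist[c][i] for c in centres) <= radius for i in graph):
                return centres
    return None


def add_edge(graph, u, v, w):
    """Add an undirected weighted edge."""
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w


# Hypothetical network: a chain A-B-C-D-E of unit-length edges,
# with an outlying node F four units from E.
network = {}
for u, v, w in [("A", "B", 1), ("B", "C", 1), ("C", "D", 1), ("D", "E", 1), ("E", "F", 4)]:
    add_edge(network, u, v, w)

print(p_median(network, 1))   # (12.0, ('C',))
print(min_cover(network, 4))  # ('E',)
```

On this network the 1-median is C (total distance 12, maximum 6), whereas the only single centre that keeps every node within 4 units is E, which has a higher total distance (14) but a lower maximum, mirroring the trade-off described above.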

Concerns have been raised that the use of optimality models merely serves to project modern capitalist values back into the past (e.g. Thomas 1991a, but see Mithen 1989 for a more sympathetic view). In the case of location-allocation modelling, two counter-arguments may be made. The first is that archaeologists are free to specify the objective function of their choosing, although admittedly this may require scripting or programming, since commercial GIS software is unlikely to offer built-in support for anything other than the standard p-median and coverage problems. The second argument concerns the inferential framework within which location-allocation modelling is to be applied. As archaeologists, we are not normally in the business of locating facilities, but instead seek to understand the location of those already established in the past. In this case, rather than simply asserting that the past was different, we may be able to establish how different by comparing locational solutions derived from modern objective functions against the archaeological evidence.

A more serious objection specific to location-allocation modelling is that the 'classical' (Bailey and Gatrell 1995, p. 369) approach, which includes the p-median and coverage problems as formulated above, assumes that people always use the closest facilities. This is not always the case today and may not have been in the past. Spatial interaction methods overcome this limitation by allowing the probabilistic allocation of demand to facilities. Bailey and Gatrell (1995, Chapter 9) provide an introduction to spatial interaction modelling, and Fotheringham and O'Kelly (1989) provide a more extended treatment.

An archaeological example

Mackie (2001) has used location-allocation modelling to study the relationships between shell midden sites on Vancouver Island, Canada. Taking the centroids of midden zones as nodes, he built a network based on shortest-path distances across water (Fig. 11.13). Then, treating all nodes as both demand locations and candidate centres (with equal capacity to function as such), he used ARC/INFO to solve the p-median problem for numbers of centres ranging from 1 to 25. Mackie found that for $p = 1$ and $p = 5$ to $p = 9$, the middens chosen as centres had average areas larger than would be expected by chance alone, suggesting a link between centrality and intensity of use (Fig. 11.14). The importance of centrality per se was supported by the lack of any correlation between network accessibility and the size of midden zones. Mackie (2001, p. 63) concluded from these results that he had discovered the scale at which 'habitual action makes both a recognisable and significant contribution to the archaeological signature'.

[Figure: Mackie's shell midden network; legend includes midden zones (1 m buffer)]

Fig. 11.13 Detail of Mackie's shell midden network (reproduced with permission from Mackie 2001, Fig. 6.3).

[Figure: average area of the midden zones chosen as centres plotted against number of clusters]

Fig. 11.14 The relationship between the number of centres, $p$, and the average area of the midden zones chosen as centres in Mackie's location-allocation analysis of shell midden sites (data from Mackie 2001, Fig. 7.3 and Table 7.1). Symbols distinguish actual solution sets from random solution sets.

11.3.3 Routing on networks

Routing is an important element of many commercial applications of GIS (see Longley et al. 2001 for concrete examples). It is likely to become even more widespread with the proliferation of wireless devices that can be used to download and display data for real-time navigation. It has, however, found virtually no use in archaeological GIS. So far as logistics are concerned, archaeological fieldwork is rarely conceived on a scale that would justify the use of GIS for optimising the deployment of, say, field-walking teams. And in the case of research, it would appear that archaeologists have found few analogues for the problems that are routinely solved in commerce. For these reasons we have restricted our discussion of routing to a brief account of the main applications, to serve as a stimulus for further thought if nothing else. Those who require a more detailed treatment should consult Fischer (2004) for further references.

[Figure: the three possible tours through four nodes]

Fig. 11.15 The travelling salesman problem for 4 nodes. The three possible tours have total weights a + f + c + e = 3 + 2 + 4 + 2 = 11, a + b + c + d = 3 + 2 + 4 + 3 = 12 and e + b + f + d = 2 + 2 + 2 + 3 = 9. The last route is shortest, assuming weights are totals rather than per-unit distance.
The travelling salesman problem is to find 'the least cost tour through a set of nodes so that each node is visited exactly once' (Fischer 2004, p. 2). Note that as the number of nodes increases, so the number of possible tours increases very rapidly: there are 3 possible tours through 4 nodes (Fig. 11.15), 12 possible tours through 5 nodes, but 181 440 possible tours through 10 nodes (Longley et al. 2001, p. 317); in general there are $(N-1)!/2$ distinct tours through $N$ nodes when costs are the same in both directions. As a result, it is usual for GIS software to provide an approximate rather than exact solution to the travelling salesman problem.

The vehicle routing problem is a generalisation of the travelling salesman problem to include situations where there is more than one salesperson and it is therefore necessary to decide which salesperson should visit which node. Additional complications can include multiple starting points, constraints on the number of visits that an individual salesperson can make (so-called capacity constraints) and also constraints on the visiting times for each node (Fischer 2004).

The orienteering problem is a variant of the travelling salesman problem in which it is not necessary to visit every node. Instead the aim is to maximise the gains from visiting nodes while simultaneously minimising the distance travelled (Longley et al. 2001).
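Exhaustive enumeration makes the growth in the number of tours concrete. This Python sketch uses the six edge weights shown in Fig. 11.15; because the figure itself is not reproduced here, the node names P, Q, R and S and the assignment of edges a to f to node pairs are hypothetical, chosen so that the three tour totals (11, 12 and 9) match the figure. The count $(N-1)!/2$ assumes symmetric costs.

```python
from itertools import permutations
from math import factorial

# Edge weights a-f from Fig. 11.15, assigned to hypothetical nodes P, Q, R, S
# in one arrangement that reproduces the three tour totals in the figure.
WEIGHTS = {
    frozenset("PQ"): 3,  # a
    frozenset("QR"): 2,  # b
    frozenset("RS"): 4,  # c
    frozenset("SP"): 3,  # d
    frozenset("PR"): 2,  # e
    frozenset("QS"): 2,  # f
}


def tour_length(tour):
    """Total weight of a closed tour that returns to its starting node."""
    return sum(WEIGHTS[frozenset((tour[k], tour[(k + 1) % len(tour)]))]
               for k in range(len(tour)))


def exact_tsp(nodes):
    """Enumerate each distinct tour once: fix the first node and skip the mirror
    image of every tour (valid for three or more nodes)."""
    first, rest = nodes[0], nodes[1:]
    count, best = 0, None
    for order in permutations(rest):
        if order[0] > order[-1]:
            continue  # reversed copy of a tour already considered
        count += 1
        tour = (first,) + order
        length = tour_length(tour)
        if best is None or length < best[0]:
            best = (length, tour)
    return count, best


def number_of_tours(n):
    """(n - 1)! / 2 distinct tours through n nodes with symmetric costs."""
    return factorial(n - 1) // 2


print(exact_tsp(("P", "Q", "R", "S")))           # (3, (9, ('P', 'R', 'Q', 'S')))
print([number_of_tours(n) for n in (4, 5, 10)])  # [3, 12, 181440]
```

Because the number of tours grows factorially, this brute-force approach quickly becomes impractical, which is why GIS software settles for approximate solutions.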

11.4 Networks on continuous surfaces

11.4.1 Least-cost paths

Most users of GIS know the physical location of the network links that they wish to model, for example roads, railways, waterways or power lines. Archaeologists, however, often do not know the exact route of transportation links because for much of history transport did not involve the construction of specialised infrastructure such as roads and artificial waterways. Even where it did, such infrastructure may not have been preserved. Under these circumstances GIS can be used to predict transport routes by deriving least-cost paths from an appropriate accumulated cost surface (Chapter 10). Of course, prediction of 'lost' routes is not the only use for least-cost paths: they can be compared to known routes in order to help understand the location of those routes.

Calculating least-cost paths

As Husdal (2000) recounts, the idea of using cost surfaces to derive least-cost paths dates back to the late 1970s and was effectively introduced into GIS by Tomlin (1990). Most, if not all, commercially available GIS software implements the calculation of a least-cost path as a two-stage process. The first stage is to create a cost surface that models the accumulated cost of travelling outward from the origin using the relevant transport technology. The second stage is to trace the route of steepest reduction in accumulated cost from the destination back to the origin. It follows that the validity of the final least-cost path is dependent upon the suitability of both the accumulated cost surface and the path-finding algorithm. We have already discussed the first of these at length in Chapter 10, where the reader should pay particular attention to the problem of anisotropy. We discuss the second in more detail here.
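The two-stage process can be sketched in Python on a small friction raster. The sketch assumes isotropic costs, positive frictions, a Moore neighbourhood, and a common convention (an assumption here, not a description of any particular package) that a move between adjacent cells costs the mean friction of the two cells multiplied by the step length. Stage one runs Dijkstra's algorithm to build the accumulated cost surface; stage two walks back from the destination, at each step taking the neighbour with the steepest reduction in accumulated cost.

```python
import heapq
import math

# Moore neighbourhood (Queen's move): the eight surrounding cells.
MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def accumulated_cost(friction, origin):
    """Stage 1: accumulated cost surface from origin by Dijkstra's algorithm.
    A move between adjacent cells costs the mean friction of the two cells
    multiplied by the step length (1 for straight moves, sqrt(2) for diagonals)."""
    rows, cols = len(friction), len(friction[0])
    acc = [[math.inf] * cols for _ in range(rows)]
    r0, c0 = origin
    acc[r0][c0] = 0.0
    heap = [(0.0, r0, c0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > acc[r][c]:
            continue  # stale entry
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = math.hypot(dr, dc)
                nd = d + step * (friction[r][c] + friction[nr][nc]) / 2
                if nd < acc[nr][nc]:
                    acc[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return acc


def trace_back(acc, destination):
    """Stage 2: from the destination, repeatedly move to the neighbour giving the
    steepest reduction in accumulated cost per unit step, until the origin
    (accumulated cost 0) is reached."""
    rows, cols = len(acc), len(acc[0])
    r, c = destination
    path = [(r, c)]
    while acc[r][c] > 0:
        best = None
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                drop = (acc[r][c] - acc[nr][nc]) / math.hypot(dr, dc)
                if best is None or drop > best[0]:
                    best = (drop, nr, nc)
        _, r, c = best
        path.append((r, c))
    return path[::-1]  # origin first


uniform = [[1.0] * 5 for _ in range(5)]
acc = accumulated_cost(uniform, (0, 0))
print(trace_back(acc, (4, 4)))  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
print(trace_back(acc, (4, 2)))
```

On a uniform surface the path to (4, 4) is the expected diagonal, but the path to (4, 2) cannot follow the straight line between the two cells and must instead combine diagonal and straight moves, the kind of artefact discussed below.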
Many least-cost paths generated using GIS are problematic, either because they fail to replicate known routes, or because they follow routes that seem intuitively unlikely. (There is, of course, a delicate balance to be struck in weighing counter-intuitive results from the application of a relatively crude GIS technique against the possible failings of common sense!) The most commonly encountered problems arise from the following (see also Harris 2000):

Shape of search neighbourhood. Least-cost paths often exhibit small-scale zig-zags even when they purportedly represent the least costly route over a uniform landscape, which ought to be a straight line (Fig. 11.16). This frequently occurs when the path-finding algorithm and/or the cost accumulation algorithm searches a neighbourhood with a radius of just one map cell. In the worst case, the von Neumann neighbourhood (Rook's move) only allows moves in the four cardinal directions, while the Moore neighbourhood (Queen's move) only permits moves in eight directions. There are two solutions to this problem, although both are likely to require some programming. The first, as explored by Xu and Lathrop (1995), is to increase the search radius, thereby increasing the number of directions in which it is possible to move. The second method, favoured by Douglas (1999), interpolates the exact course of the path through each map cell: this overcomes the need for paths to pass through the centres of map cells and as a result removes the limitation that a move can only be made in one of a finite number of directions.

Failure to model anisotropy. Despite the occurrence of small-scale zig-zags, least-cost paths seldom exhibit the large-scale zig-zags that characterise mountain roads in many parts of the world. The absence of such paths when they might be expected is usually indicative of a failure to model adequately the anisotropic cost of traversing sloping land. As Collischonn and Pilar (2000) point out, the traditional procedures for finding a least-cost path cannot be applied to solve problems with direction-dependent costs because the direction of travel across each cell is not known at the time that the cost-of-passage surface is created. They have devised an algorithm that performs both steps together and which is capable of producing 'realistic' mountain roads (Fig. 11.17), provided that the costs attributed to different slopes are appropriate: as discussed in Chapter 10, in many cases steep downhill slopes should be considered more costly than gentle uphill slopes.

[Figure: (a) optimal path on a uniform surface; (b) the same path as typically generated in a raster GIS]

Fig. 11.16 Least-cost paths are seldom optimal, even on a uniform surface: (a) shows the optimal path between two nodes on a uniform surface, (b) shows how it is typically generated in a raster GIS.

[Figure: least-cost path on a mountain]

Fig. 11.17 Collischonn and Pilar's least-cost path on a mountain (reproduced from Collischonn and Pilar 2000, Fig. 8, with the permission of Taylor & Francis Ltd, www.tandf.co.uk/journals).

Assumption of steepest descent. Another common objection to least-cost paths is that they often do not appear to follow the globally optimal route. For example, Bell and Lock (2000) generated a least-cost path between two locations on a prehistoric trackway in central England. Although the trackway follows a natural ridge, the GIS-generated least-cost path dropped off the ridge and then climbed back onto it (Fig. 11.18). Given that Bell and Lock assigned the lowest costs to flat land and gentle slopes, one might have expected the GIS path to follow the ridge, thereby avoiding the steep transverse slopes. In fact this result is characteristic of those obtained from path-tracing algorithms that seek the most rapid reduction in accumulated cost. Such algorithms essentially trace the path that water would take over the accumulated cost surface when flowing from the destination to the origin. Considered from the origin, such a path typically accumulates costs slowly at first and then at an ever greater rate as one approaches the destination. The problem with this is that people do not usually leave the hardest work until last, but instead seek a route along which costs accumulate relatively steadily.

[Figure: least-cost path and trackway, scale bar 10 km]

Fig. 11.18 A globally suboptimal least-cost path (based on data from Bell and Lock 2000, Fig. 5). Elevation data © Crown Copyright. All rights reserved. Licence no. 100021181.

Failure to model multiple destinations. The least-cost path algorithms currently available in production GIS are designed to trace paths between exactly two locations. Thus in order to predict the most likely route to multiple locations one must make do with one of the following two methods. The first would be to treat one location as the origin and then trace paths separately to all other locations, which might be appropriate if there are reasonable grounds for treating that location as some kind of redistributive centre. The second method would be to treat each location as the destination and then the origin in turn, which might better suit, for example, a sequence of locations along a caravan route. Both methods suffer serious limitations: the first prevents the generation of substantially shared routes whose exact location is partly a function of the necessary compromises, while the second requires that one already knows the order in which locations were visited. At the time of writing, specialised multiple-destination least-cost path algorithms are the preserve of experimental GIS (see for example McIlhagga 1997 cited in Husdal 2000).

Failure to use ratio-scale costs. It is common for least-cost paths generated for archaeological purposes to be derived from relative rather than absolute measures of cost, partly because of the difficulty of calibrating the costs of past transport technologies. Relative costs are usually measured on an ordinal or interval scale rather than a ratio scale. It is not widely appreciated that when costs are measured on an interval scale, adding or subtracting an arbitrary uniform amount from the costs will often alter the least-cost path, because it alters the relationship between the costs incurred in traversing particular map cells (Longley et al. 2001, p. 319). Least-cost paths should, wherever possible, be generated using costs measured on a ratio scale.
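The effect of adding a uniform amount to interval-scale costs can be demonstrated on a hypothetical set of cells in which a short route crosses two expensive cells and a longer route crosses four cheap ones. In this Python sketch each cell's cost is incurred on entering it; adding 3 to every cost leaves the rank order of the cells unchanged but penalises the route with more cells more heavily, and so changes which route is cheapest.

```python
import heapq

# Hypothetical cells: a short route O-X1-X2-T over two expensive cells and a
# longer route O-Y1-Y2-Y3-Y4-T over four cheap cells.
NEIGHBOURS = {
    "O": ["X1", "Y1"],
    "X1": ["O", "X2"], "X2": ["X1", "T"],
    "Y1": ["O", "Y2"], "Y2": ["Y1", "Y3"], "Y3": ["Y2", "Y4"], "Y4": ["Y3", "T"],
    "T": ["X2", "Y4"],
}
COST = {"O": 1, "X1": 5, "X2": 5, "Y1": 2, "Y2": 2, "Y3": 2, "Y4": 2, "T": 1}


def least_cost_route(cost, neighbours, origin, destination):
    """Dijkstra's algorithm in which entering a cell incurs that cell's cost."""
    best = {origin: 0}
    previous = {}
    heap = [(0, origin)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == destination:
            break
        if d > best[u]:
            continue  # stale heap entry
        for v in neighbours[u]:
            nd = d + cost[v]
            if nd < best.get(v, float("inf")):
                best[v] = nd
                previous[v] = u
                heapq.heappush(heap, (nd, v))
    route = [destination]
    while route[-1] != origin:
        route.append(previous[route[-1]])
    return best[destination], route[::-1]


# Same rank order of cell costs, but an arbitrary new zero point.
shifted = {cell: c + 3 for cell, c in COST.items()}
print(least_cost_route(COST, NEIGHBOURS, "O", "T"))     # (9, ['O', 'Y1', 'Y2', 'Y3', 'Y4', 'T'])
print(least_cost_route(shifted, NEIGHBOURS, "O", "T"))  # (20, ['O', 'X1', 'X2', 'T'])
```

With ratio-scale costs, multiplying every cost by a constant would instead leave the chosen route unchanged.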

Archaeological applications

Archaeological applications of least-cost path analysis may be divided into two groups: those that seek to replicate known routes and those that attempt to predict unknown routes.

Replication of routes. The purpose of replicating routes is generally to aid in understanding the reasons for the location of routes. For example, Madry and Rakos (1996) sought to replicate known segments of the Celtic road network in the Arroux Valley, France, using several different determinants of 'cost'. They then used the best model, 'a combination of least change in elevation, low slope and preference to remain within line of sight of hillforts' (Madry and Rakos 1996, p. 115), to predict the course of a missing segment of the road network. Similar studies include Bell and Lock's (2000) study of the Ridgeway prehistoric track, discussed above, and Kantner's (1996) attempt to replicate the location of known Anasazi road segments in Chaco Canyon, New Mexico.

Prediction of routes. One of the more sophisticated attempts to predict the location of unknown routes was carried out by the Wroxeter Hinterland project, which sought to reconstruct the Iron Age road network that existed prior to the development of the Roman city of Wroxeter, England. Van Leusen and his colleagues (see van Leusen 2002, Chapter 16) generated a road network by tracing least-cost paths from the locations of known settlement enclosures to Wroxeter (which was a centre of exchange in the Iron Age) and also the least-cost paths from the same known settlements to the larger multivallate enclosures. Broadly similar attempts to predict entire pathway or road networks have been made for the Bronze and Iron Age settlements of central Italy (Bell et al. 2002) and the Neolithic habitation of the Biferno Valley, also in Italy (Silva and Pizziolo 2001). A rather different example of route prediction is offered by Krist and Brown's (1994) study of possible Late Pleistocene/Early Holocene hunting sites in the Great Lakes region of North America.
Here, least-cost paths were used to predict the likely migration routes of caribou, in order to establish whether those routes were in view of the possible hunting sites. Note that this study provides a well-documented example of the creation of a partially anisotropic accumulated cost-surface.

11.4.2 Hydrology

Archaeological GIS frequently include hydrological information in the form of vector maps of watercourses. Such maps are most commonly used for proximity analysis, for example to identify sites that may be at risk from flooding, or as a possible predictor of site location. Vector maps of navigable watercourses can also be used for network analysis. For example, both Zubrow (1990) and Allen (1990) based their network models of the early European colonisation of North America on maps of the major river systems. In principle, most of the methods described above can be applied to hydrological networks. Our concern here is with an alternative method of representing hydrology using local drainage direction (LDD) maps (Fig. 11.19b). LDD maps are a form of surface topology map. Such maps show how map cells are connected by some process, in this case the direction of water flow across each cell.

Fig. 11.19 (a) DEM (white = high), (b) local drainage direction map, (c) stream channel (defined on flow accumulation > 50) and (d) flow accumulation map (white = low).

11.4 Networks on continuous surfaces

Note that although topological relations such as previous cell (e.g. where the water came from) and next cell (e.g. where the water will flow) can be retrieved from a surface topology map (e.g. by following the direction of flow), they are not themselves stored in the map. In other words, whereas a vector map records topological relations explicitly, a surface topology map records them only implicitly.

There are two advantages in representing hydrology using an LDD map. The first is that it makes possible the automated derivation of watercourses where such data are either not available or data acquisition would be costly and/or time-consuming. The second advantage is that LDD maps can be used to derive additional information about a region's hydrology and landscape morphology, which may in turn be useful for understanding archaeological site location.

Calculating LDD maps

Local drainage direction maps are usually implemented as raster maps, although they have also been created using TINs (see, for example, Jones et al. 1990). The computation of a raster LDD map involves two processes, which may be carried out iteratively until an acceptable result has been obtained.

Direction of flow

Each cell in an LDD map is coded according to the direction in which water would flow out of it. The simplest method computes this as the direction of the steepest downhill slope within the Moore neighbourhood, i.e. within a window of radius one map cell (the eight cells surrounding the central cell). Such a small window only allows the direction to be determined to the nearest 45°. The cell at the centre of the window will therefore typically be coded with an integer in the range 1-9, representing the bearing N, NE, E, SE, S, SW, W, NW or 'pit' (see below). Jensen and Domingue (1988) describe search strategies that can be employed when more than one map cell shares the steepest slope. Burrough and McDonnell (1998, p. 194) describe more sophisticated methods that also allow for the dispersion of water over several cells.
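The steepest-descent coding described above (often called D8) can be illustrated with a minimal Python sketch. It assumes the DEM is a list of lists of elevations on a square grid. The numeric codes follow the order in which the text lists the bearings (1 = N through 8 = NW, 9 = pit). That mapping, the function name `d8_directions` and the handling of edges and ties are all assumptions made for illustration. Ties are resolved by keeping the first direction found, not by the search strategies of Jensen and Domingue (1988). A cell with no strictly lower neighbour, including a cell on a flat, is coded as a pit.

```python
import math

# Assumed coding, following the order of bearings in the text:
# 1 = N, 2 = NE, 3 = E, 4 = SE, 5 = S, 6 = SW, 7 = W, 8 = NW, 9 = pit.
OFFSETS = {1: (-1, 0), 2: (-1, 1), 3: (0, 1), 4: (1, 1),
           5: (1, 0), 6: (1, -1), 7: (0, -1), 8: (-1, -1)}
PIT = 9


def d8_directions(dem, cell_size=1.0):
    """Code each cell with the direction of steepest descent in its Moore neighbourhood."""
    nrows, ncols = len(dem), len(dem[0])
    ldd = [[PIT] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            best_code, best_slope = PIT, 0.0
            for code, (dr, dc) in OFFSETS.items():
                nr, nc = r + dr, c + dc
                if not (0 <= nr < nrows and 0 <= nc < ncols):
                    continue  # neighbours beyond the map edge are ignored
                # Diagonal neighbours are sqrt(2) cell widths away.
                drop = (dem[r][c] - dem[nr][nc]) / (cell_size * math.hypot(dr, dc))
                if drop > best_slope:  # strict '>': the first direction wins a tie
                    best_code, best_slope = code, drop
            ldd[r][c] = best_code  # stays PIT if no neighbour is strictly lower
    return ldd


if __name__ == "__main__":
    dem = [[5, 4, 3],
           [4, 3, 2],
           [3, 2, 1]]
    for row in d8_directions(dem):
        print(row)
```

On this tilted example surface, most cells drain south-east or south. The lowest corner has no lower neighbour, so it is coded 9.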
Pit removal

Cells are coded 'pit' when all their neighbours have a higher elevation. Pits disrupt the surface topology because water that flows into them does not flow out again. They must therefore be removed. This can be achieved by increasing the elevation of the pit until at least one neighbour is lower or, alternatively, by decreasing the elevation of the neighbours until at least one is lower than the pit. Both methods can be applied manually or automatically, but in either case the challenge is to remove only those pits that do not reflect reality. Methods have been devised (e.g. Deursen 1995) to ensure that pits are only removed if they fall below a certain size, specified in terms of area, depth or volume.

Derivatives of LDD maps

Once an LDD map has been computed it can be used to generate further information about the hydrology and topography of a region.
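The first pit-removal strategy described above raises each pit until at least one neighbour is lower. It can be sketched in Python as follows. This is a deliberately naive illustration built on several assumptions:

- Edge cells are taken to drain off the map and are never altered.
- A cell with no strictly lower neighbour is treated as a pit.
- Each pit is raised to just above its lowest neighbour by a hypothetical `increment`.
- Passes are repeated until no pits remain.

The sketch removes every pit regardless of size, so it does not implement the size thresholds mentioned above. Multi-cell depressions may also need many passes, and dedicated depression-filling algorithms are far more efficient.

```python
def fill_single_cell_pits(dem, increment=0.01, max_passes=10_000):
    """Raise interior pits until each has at least one strictly lower neighbour.

    Returns a new DEM; the input is left unchanged. Edge cells are assumed
    to drain off the map, so only interior cells are examined.
    """
    filled = [row[:] for row in dem]  # work on a copy
    nrows, ncols = len(filled), len(filled[0])
    for _ in range(max_passes):
        changed = False
        for r in range(1, nrows - 1):
            for c in range(1, ncols - 1):
                lowest = min(filled[r + dr][c + dc]
                             for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                             if (dr, dc) != (0, 0))
                if filled[r][c] <= lowest:  # no strictly lower neighbour: a pit
                    filled[r][c] = lowest + increment
                    changed = True
        if not changed:
            return filled
    raise RuntimeError("pits remain after max_passes passes")


if __name__ == "__main__":
    dem = [[5, 5, 5],
           [5, 1, 5],
           [5, 4, 5]]
    print(fill_single_cell_pits(dem))
```

In the example, the central pit is raised to just above the cell of elevation 4 on the map edge. Water can then leave the map through that cell.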



Hydrology

Burrough and McDonnell (1998, pp. 195-198) provide a useful introduction to the many hydrological indices that can be derived from LDD maps, focusing particularly on those that are used to model sediment transport. Here we concentrate instead on the indices that are most likely to be of use for addressing general locational questions.
Flow accumulation

This LDD derivative codes each map cell with the number of upstream cells that drain into it (Fig. 11.19d). Although the flow accumulation map provides a crude measure of the relative volume of water that would drain through each map cell, it is mostly used as an intermediate step in the computation of other indices.

Stream channel maps

These code each map cell according to whether it contains a watercourse (Fig. 11.19c). They can be generated from a flow accumulation map using simple map algebra to extract all cells with more than some minimum number of upstream cells. How many upstream cells are required to produce permanent running water is, of course, a function of land cover, soil type, underlying geology and precipitation. Nevertheless, in many cases stream channel maps provide a useful, and potentially more accurate, alternative to digitising watercourses from paper mapping.

Stream order

In natural language we often categorise watercourses using words such as 'stream', 'brook', 'river' and 'estuary'. These categories are typically defined on more than one property. For example, 'streams' are narrow, fast flowing and have no tributaries; 'brooks' are wider and have only streams as tributaries; and 'rivers' are wide, flow slowly and have many tributaries of all types. The categories used in GIS, so-called stream-order indices, differ in that they are typically based on only one property of a watercourse. Consequently, it is important to choose the most appropriate index for the problem at hand (see Haggett and Chorley 1969, pp. 8-16, for a useful survey of indices, and Ward 1990 for a more recent treatment).

For example, the Strahler Index provides a purely topological categorisation. Watercourses at the tips of a network are coded 1; those formed by the confluence of two or more order-1 watercourses are coded 2; those formed by the confluence of two or more order-2 watercourses are coded 3; and so on (Fig. 11.20a). Note that the confluence of a lower-order watercourse with a higher-order watercourse does not alter the order of the latter. Consequently, the Strahler Index provides no information about the number of tributaries of a watercourse. This renders it unsuitable as even a loose proxy for the width or speed of flow of a watercourse. In contrast, the Shreve Index adds information about the number of tributaries by assigning a new order at every confluence, where that order is the sum of the orders of the watercourses that join (Fig. 11.20b). As a result, the Shreve Index provides a better indication of the likely relative sizes of watercourses.
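The two stream-order indices can be compared with a short Python sketch. It assumes the channel network has already been extracted and simplified to a tree, in which each segment lists the segments that flow directly into it. The dictionary layout, segment names and function names are hypothetical. Headwater segments receive order 1 under both indices.

```python
def strahler(network, segment):
    """Strahler order: rises only where two or more highest-order tributaries meet."""
    upstream = network[segment]
    if not upstream:  # a tip of the network
        return 1
    orders = [strahler(network, s) for s in upstream]
    highest = max(orders)
    # A lower-order tributary joining a higher-order one leaves the order unchanged.
    return highest + 1 if orders.count(highest) >= 2 else highest


def shreve(network, segment):
    """Shreve order: the sum of the orders of the watercourses that join."""
    upstream = network[segment]
    if not upstream:
        return 1
    return sum(shreve(network, s) for s in upstream)


if __name__ == "__main__":
    # Each key lists the segments that flow directly into it (hypothetical example).
    network = {
        "a": [], "b": [], "c": [], "d": [],
        "ab": ["a", "b"],        # two order-1 streams meet
        "abc": ["ab", "c"],      # an order-1 tributary joins an order-2 stream
        "outlet": ["abc", "d"],  # another order-1 tributary joins
    }
    print(strahler(network, "outlet"), shreve(network, "outlet"))
```

On this network, the outlet keeps a Strahler order of 2 because only order-1 tributaries join the order-2 stream. Its Shreve order grows to 4, one for each headwater segment.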

Topography

There are many ways in which one can attempt to characterise topography, including measures of surface roughness and the use of second-order derivatives (Chapter 9). Hydrological modelling adds the possibility of mapping watersheds (drainage basins) and ridges.
Watersheds are usually defined for a specific point on a watercourse (the pour point) and constitute all the land that drains into that point (Fig. 11.21a, b). They are readily identified from an LDD map by stepping back up the drainage network from the pour point until there are no further cells that receive input from any other cell. The complete set of cells traversed in this process constitutes the watershed for that pour point.

Fig. 11.20 Stream order coded using (a) the Strahler Index and (b) the Shreve Index.

Fig. 11.21 Watersheds and ridges: (a) watersheds with pour point at accumulated flow of 1500; (b) watersheds with pour point at accumulated flow of 4500; (c) ridges.

Archaeologists usually conceive of watersheds at a landscape scale, relating them to economic and social concepts such as site catchments and territories. However, by appropriate choice of the pour point it is also possible to identify the local watersheds of individual sites, or even parts of sites. The latter in particular will require a suitably high-resolution DEM. Such small-scale watersheds, or upstream elements, could potentially contribute to the understanding of site formation processes by mapping the likely sources of waterborne materials.

Ridges are the local elevation maxima that define the edges of watersheds. Either fact may be used to identify them from LDD maps, but which is most suitable depends on the question at hand. One may wish to identify all ridges, irrespective of whether they are slight rises between streams or the summits of mountains that separate entire river systems. In that case it is best simply to identify all local elevation maxima, which may be accomplished by extracting the set of cells that have no upstream elements (Fig. 11.21c). If the software in use does not permit topological queries of the LDD map, then the same effect can be achieved using map algebra to extract all cells with a value of 1 in the flow accumulation map.

Sometimes, however, the question demands that one identifies ridges of a particular scale. For example, the course of a long-distance prehistoric trackway is more likely to have been influenced by the location of major ridges that extend for a considerable distance than by the location of relatively minor topographic features. In this case it may be most appropriate to identify ridges from the edges of watersheds, since watersheds can be calculated at specific scales by careful choice of the pour points. For example, the ridges that bound the watersheds of pour points on watercourses with a high Strahler stream-order index will represent relatively large-scale topographic features. By contrast, the ridges that bound the watersheds of pour points with a low stream-order index will represent small-scale topographic features. The main problem with this approach is that a set of watersheds with a given stream-order index will not include areas that drain directly into higher-order watercourses, which leaves the ridges in those areas undefined.
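Four of the LDD derivatives discussed above can be sketched from an LDD map in a few lines of Python: flow accumulation, threshold-based stream channels, watershed delineation from a pour point, and the rule that ridges have a flow accumulation of 1. The sketch assumes the LDD is a list of lists coded 1 (N) clockwise to 8 (NW), with 9 for a pit. This coding is an assumption that follows the order of bearings given in the text, and the function names are hypothetical. As the ridge rule implies, each cell's accumulation here includes the cell itself, so cells with no upstream elements have a value of 1. Some packages count only the upstream cells, in which case ridges have a value of 0. The accumulation loop simply walks downstream from every cell, which is easy to follow but slow on large rasters.

```python
from collections import deque

# Assumed coding: 1 = N, 2 = NE, 3 = E, 4 = SE, 5 = S, 6 = SW, 7 = W, 8 = NW; 9 = pit.
OFFSETS = {1: (-1, 0), 2: (-1, 1), 3: (0, 1), 4: (1, 1),
           5: (1, 0), 6: (1, -1), 7: (0, -1), 8: (-1, -1)}


def downstream(ldd, r, c):
    """Cell that (r, c) drains into, or None for pits and flow off the map edge."""
    offset = OFFSETS.get(ldd[r][c])
    if offset is None:
        return None
    nr, nc = r + offset[0], c + offset[1]
    if 0 <= nr < len(ldd) and 0 <= nc < len(ldd[0]):
        return nr, nc
    return None


def flow_accumulation(ldd):
    """Count, for every cell, the cell itself plus every cell upstream of it."""
    acc = [[0] * len(row) for row in ldd]
    for r in range(len(ldd)):
        for c in range(len(ldd[0])):
            cell = (r, c)
            while cell is not None:  # walk downstream, crediting every cell passed
                acc[cell[0]][cell[1]] += 1
                cell = downstream(ldd, *cell)
    return acc


def stream_channels(acc, min_cells):
    """Cells whose accumulated flow exceeds a chosen threshold."""
    return {(r, c) for r, row in enumerate(acc) for c, v in enumerate(row) if v > min_cells}


def ridges(acc):
    """Cells with no upstream elements (accumulation of 1 under this convention)."""
    return {(r, c) for r, row in enumerate(acc) for c, v in enumerate(row) if v == 1}


def watershed(ldd, pour_point):
    """Step back up the drainage network from the pour point."""
    upstream = {}
    for r in range(len(ldd)):
        for c in range(len(ldd[0])):
            d = downstream(ldd, r, c)
            if d is not None:
                upstream.setdefault(d, []).append((r, c))
    cells, queue = {pour_point}, deque([pour_point])
    while queue:
        for up in upstream.get(queue.popleft(), []):
            if up not in cells:
                cells.add(up)
                queue.append(up)
    return cells


if __name__ == "__main__":
    ldd = [[4, 5, 6],
           [4, 5, 6],
           [3, 9, 7]]
    acc = flow_accumulation(ldd)
    print(acc)
    print(sorted(watershed(ldd, (1, 1))))
```

In the example, every cell drains to the pit at the bottom centre, so that cell has an accumulation of 9. `stream_channels(acc, 3)` keeps that cell and the one above it. `watershed(ldd, (1, 1))` returns the centre cell plus the three cells of the top row.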

Archaeological applications

Proximity to watercourses is routinely included in archaeological predictive models. Other aspects of hydrology that can be derived from LDD maps are included less often, probably owing as much as anything to a lack of familiarity with hydrological modelling. Indeed, one might, a priori, suppose that proximity to watercourses of a particular stream order (or that drain a certain size of area) would be a better predictor than proximity per se. In addition, there are other potential locational influences that can be derived from LDD maps, even though they are not normally available as part of the hydrological modelling functionality of mainstream GIS software. For example, it appears that many Palaeoindian/Early Archaic lithic scatters in SW Nebraska may be preferentially located at confluences of watercourses (Peterson 2004). Box 11.1 describes how a map of confluences can be created from a stream-order map.

Another use of hydrological modelling in archaeology is for palaeoenvironmental reconstruction, although there are few examples to date. One is Gillings' (1995, 1997) attempt to reconstruct the river network in the Tisza Valley, Hungary. The problem faced by the Upper Tisza Project was that nineteenth-century flood control had eliminated the annual flood regime, which made it very difficult to understand past use of the landscape. The aim of the hydrological modelling was to 're-access those past environmental dynamics' (Gillings 1997). This research illustrates many of the processes described above.

Briefly, there was evidence that neither large-scale river migration nor alluviation had taken place during the time period being studied. Gillings therefore processed a modern DEM to identify and remove pits, distinguishing between those likely to represent artefacts of interpolation and those representing real features. He then created an LDD map and generated watersheds, specifying the lowest points around the boundary of the study area as pour points. This allowed him to create an accumulated flow map for the Tisza watershed. From this he generated watercourses by extracting all cells with an accumulated flow of 50 cells or more, representing a minimum catchment of 20 000 m². Gillings then vectorised the watercourses and merged them with the course of the Tisza, thus providing a reconstruction of the natural hydrology obliterated in the nineteenth century.

Box 11.1 A method for locating confluences

This box describes a method for generating a map of confluences from a stream-order map (Peterson 2004). The basic idea is very simple.

1. Use the hydrological modelling capabilities of software such as ArcGIS Spatial Analyst to create a map of the Strahler stream index.
2. If necessary, rasterise the stream-order map and reclassify it so that cells that do not contain a watercourse are coded NULL, while those that do are coded according to their order.
3. Pass a kernel of radius 1 over the rasterised stream-order map. If the central cell contains a stream, compare its value with the mean value of all other cells in the kernel that contain a stream. If the value of the central cell and the mean are different, then the central cell is located at a confluence; if they are equal, then it is not located at a confluence.

Step 3 requires software that supports spatial offsets in map algebra expressions, such as GRASS. The appropriate GRASS map algebra for step 3 is:

```
confluences = eval( \
  n = (if(isnull(stream_order[-1,-1]), 0, 1) + \
       if(isnull(stream_order[0,-1]), 0, 1) + \
       if(isnull(stream_order[1,-1]), 0, 1) + \
       if(isnull(stream_order[-1,0]), 0, 1) + \
       if(isnull(stream_order[0,0]), 0, 1) + \
       if(isnull(stream_order[1,0]), 0, 1) + \
       if(isnull(stream_order[-1,1]), 0, 1) + \
       if(isnull(stream_order[0,1]), 0, 1) + \
       if(isnull(stream_order[1,1]), 0, 1)), \
  t = (if(isnull(stream_order[-1,-1]), 0, stream_order[-1,-1]) + \
       if(isnull(stream_order[0,-1]), 0, stream_order[0,-1]) + \
       if(isnull(stream_order[1,-1]), 0, stream_order[1,-1]) + \
       if(isnull(stream_order[-1,0]), 0, stream_order[-1,0]) + \
       if(isnull(stream_order[0,0]), 0, stream_order[0,0]) + \
       if(isnull(stream_order[1,0]), 0, stream_order[1,0]) + \
       if(isnull(stream_order[-1,1]), 0, stream_order[-1,1]) + \
       if(isnull(stream_order[0,1]), 0, stream_order[0,1]) + \
       if(isnull(stream_order[1,1]), 0, stream_order[1,1])), \
  m = if(n, float(t) / n, 0), \
  stream_order[0,0] != m)
```

The above creates an output raster map called confluences from an input map called stream_order, both containing what their names suggest. In the output map, cells that do not contain watercourses are coded NULL. Cells that contain watercourses but are not at confluences are coded 0, and cells that contain watercourses and are at confluences are coded 1.

Note that the expression makes use of three intermediate variables before the final result is calculated on the last line:

- n is the number of cells in the kernel that contain a watercourse.
- t is the sum of the stream-order indices of those cells. Because n and t are integers, t is converted with float() so that the mean is not truncated by integer division.
- m is the mean of the stream-order indices.

The kernel in the expression includes the central cell, but this does not change the result. For a stream cell with stream-bearing neighbours, the mean over the whole kernel equals the central value exactly when the mean over the neighbours does.

It may help to know that isnull(A) is a logical function that returns TRUE if the map cell in A is NULL and FALSE otherwise. Similarly, the logical function if(A, B, C) takes the form: if the map cell in A is TRUE (i.e. non-zero) then return the result B, but if A is not TRUE (i.e. zero) then return the result C.

11.5 Conclusion

In this chapter we have explored applications of network analysis to archaeological problems. Techniques such as the generation of least-cost paths are well known, although by no means unproblematic. Others, such as the creation of visibility networks, hold considerable promise for the future. We have endeavoured to show how many different kinds of phenomena can be conceived as networks, while being careful to draw attention to the issues that must be considered when using GIS maps for network analysis.

Maps and digital cartography

12.1 Introduction

Cartographers have long recognised the influence that maps have on the shaping of spatial consciousness (Monmonier 1991; Wood 1992; Lewis and Wigen 1997). The purpose of this chapter is to explore how maps, whether paper or digital, may be used to present spatial information. It also highlights some design principles that maximise their effectiveness at this task. In doing so we describe a range of mapping techniques appropriate for the different sorts of data routinely handled by archaeologists. We also consider some major cartographic principles and design conventions that help make maps effective communication devices. Finally, we discuss the growing importance of the Internet and interactive mapping for the publication of spatial data.

12.2 Designing an effective map

As defined in Chapter 2, maps are traditionally divided into two categories: topographic and thematic. The former term describes maps that contain general information about features of the Earth's surface. Thematic maps, by contrast, are limited to single subjects, such as soils, geology, historic places or some other single class of phenomena. Both types of map must contain some basic pieces of information so that the reader is able to comprehend and contextualise the data being presented. The most basic of these, without which a map is difficult if not impossible to understand, are: (i) a title; (ii) a scale; (iii) a legend and (iv) an orientation device, such as a north arrow (Table 12.1).

There are other items that may need to be included on a map in certain circumstances. The most important of these is reference information about the coordinate system used to create the map, and how the grid system relates to what is being depicted. Maps produced by national mapping agencies will always contain information about the coordinate system (as described in Chapter 2). This often takes the form of a few lines of text in the bottom left or right of the map. The text describes the point of origin of the grid, the coordinate system (e.g. UK National Grid) and the ellipsoid that has been used (e.g. WGS84). This information is necessary for computing transformations between different mapping systems. It can also be of use in archaeological contexts when attempting to relate data collected from a GPS receiver with existing topographic maps. Coordinate information, such as how the grid was established and the location of the 0, 0 datum, is also useful when presenting large-scale excavation or field data, such as plans of an archaeological survey. Information defining how the data were collected may also be of use and should be stored as associated metadata. Metadata transforms a map into a more complete record of the spatial information. For example, maps showing geophysical data, such as soil resistivity, should contain short descriptions of the equipment used and the conditions under which the data were collected, in addition to the coordinate information described above. Chapter 13 describes geospatial metadata standards in further detail, including tools for the recording of this information.

Table 12.1 Essential map items

Title: It cannot be assumed that common sense is a sufficient replacement for a title. For topographic maps, the title must provide the geographical context. For thematic maps, the title should identify the class of object or objects. Contour lines, for example, do not always represent elevation: they may represent temperature, rainfall or some other form of trend surface.

Scale: It is impossible to determine the scale of a map from visual assessment alone. Scales should either be given in easily multiplied linear units (1, 2, 5, 10, 100, etc.) or as an absolute scale (e.g. 1:25 000). In most cases, the latter style is more useful as it permits the reader to calculate approximate distances using a ruler. Many GIS and CAD systems can add absolute scales to maps by automatically calculating the mathematical relationship between a map's geographical extent and the size of the printed version. However, publishers may change the size of a map to fit the available space in a report or book, and absolute scales will then need to be recalculated. For this reason, despite being less useful, linear scales are often preferred.

Legend: This defines the thematic classification of the map. Legends, like the phenomena they represent, may either be continuous or discrete. In the case of the former, it is necessary to break the variables into categories or 'bins'. Some GIS programs create data bins that do not lend themselves to easy interpretation, and these therefore need to be changed manually. The number of legend categories used to define discrete data will ultimately be influenced by the number of objects there are in the class. However, it is worth keeping in mind that it is difficult to distinguish between large numbers of categories.

Orientation device: A device to orientate the reader, usually in the form of a north arrow or grid overlay, is essential. Although this in itself does not necessarily make the map more understandable, it does permit the reader to orientate the map and place it within a geospatial framework.

12.3 Map design

One of the disadvantages of the mapping facilities included in GIS software is that the resulting maps often do not adhere to basic cartographic design principles (Dent 1999, p. 6). Successful communication of spatial information requires at least some understanding of these principles. We are not advocating an approach that emphasises style over content, but it is necessary to be aware that design plays an important part in the effective presentation of spatial data. Cartographers have recognised this for some time, and have grouped issues of map design into three areas: (i) those related to object arrangement, (ii) those related to visual impact and (iii) those related to comprehension.

Object arrangement is an important component of map design and helps ensure proper balance and focus of attention. Automated map production is a common element in many GIS packages, and can significantly reduce the amount of time spent preparing maps. Unfortunately, automated mapping can never quite reach the skill of a good cartographer, and maps produced using the built-in functions of GIS packages often require substantial editing. General problems of these systems include poor arrangement and sizing of basic elements. Titles are often inappropriately large or small, labels are incorrectly placed, scales are in unsuitable units, and north arrows are too large and ornate. Fortunately, these can be edited manually, although specialised vector graphics packages offer more opportunities for cartographic manipulation.

Visual impact plays an equally important role in map communication. For example, the upper map in Fig. 12.1 shows a map of Thrace with few visual aids to help distinguish between the mainland, islands and sea. The lower map in the same figure corrects this by using polygon shading, which immediately provides a mechanism for visually distinguishing between features. Colours may also help readers distinguish between map features, but the injudicious use of colour can create maps that become unintelligible rainbows of information. Humans have a limited ability to deal with complex classifications of data, so progressive shading or the use of different colours may help in highlighting differences between object classes. Alternatively, it is sometimes worth using higher-level categories to make complicated data more comprehensible.
For example, an excavated archaeological site may have 20 different types of features. These could be grouped into fewer, larger categories (e.g. hearth and ash deposits, pits and constructional features), each with its own distinctive colour ramp (e.g. reds, blues and greens). In cases where three or more categories of information need to be communicated, it may be preferable to use a separate map for each category.

12.4 Thematic mapping techniques

Thematic mapping often requires specialised mapping techniques to convey information properly. There are five basic types: (i) the choropleth map, (ii) the continuous distribution map, (iii) the proportional symbol map, (iv) the dot-density map and (v) the isarithmic map. Most GIS programs are capable of producing all of these map types.

12.4.1 The choropleth map

The choropleth map is the best way to depict the distribution of data classes in well-defined areas (called enumeration units). Choropleth maps do three things very effectively:

(i) they show a quantitative or qualitative value associated with a geographical area;
(ii) they give the reader a sense of the patterning of the mapped variables over a larger area;
(iii) they provide a basis for comparing other values mapped using the same geographical boundaries (Dent 1999, p. 140).

For example, the upper map in Fig. 12.2 shows the density of pottery sherds per survey unit (i.e. number of sherds/unit area). This form of mapping provides a means for determining the density of each unit in the survey map. At the same time, it provides information about the variability of the density in the study area. The lower map in the same figure shows the density of lithic artefacts (i.e. number of lithic artefacts/unit area). As both maps share the same enumeration units, it is easy to compare the distribution patterns of the two materials.

Fig. 12.1 The absence of polygon shading in the upper map makes it difficult to distinguish between the mainland, islands, rivers and sea. The addition of a polygon fill in the lower map allows these features to be more easily distinguished.

Fig. 12.2 Choropleth mapping using quantitative data. Upper map: Kastri and Region, Ceramic Sherd Density. Lower map: Kastri and Region, Lithic Artifact Density. Source: Kythera Island Project. Used with permission.

Quantitative or qualitative data can be used for choropleth mapping. In the case of the former, the data must either be a total (such as the number of artefacts) or a ratio (such as the number of artefacts divided by the area of the enumeration unit). Unless the enumeration units are identical in area, ratio values are the more useful measure.

In both quantitative and qualitative forms of choropleth mapping, the underlying assumption is that the data are spread evenly throughout the enumeration unit. In some cases this assumption cannot easily be justified, yet this form of representation may still be the most appropriate mapping technique. In the previous map of pottery sherd distribution, for example, the artefacts cannot be assumed to be evenly distributed within each survey unit. This issue has been addressed by Lock et al. (1999), who suggest that choropleth maps bring major disadvantages to the representation of artefact distributions. Some concession to intra-enumeration-unit variability can be made by mapping a measure of the variability itself. A variety of statistical measures express the variability in a distribution of numbers. One simple measure is the coefficient of variation, which is the standard deviation divided by the mean. For example, Fig. 12.3 maps intra-unit variability for each enumeration unit for the ceramic sherd data in Fig. 12.2.

Standardising and normalising data

As noted, the choropleth technique works best with quantitative data when presented as a ratio. Most often, this will be a ratio of the data value to the area of the enumeration unit (i.e. a density). This standardises the data so that variability is not dependent on the size of the collection unit.
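Two quantities behind Figs 12.2 and 12.3 can be computed with Python's standard `statistics` module: a density per enumeration unit, and the coefficient of variation of counts within it. The tract names, areas and transect counts below are invented for illustration. The use of the sample standard deviation (`statistics.stdev`) rather than the population form is also an assumption.

```python
import statistics


def density(count, area):
    """Standardise a raw count by the area of its enumeration unit."""
    return count / area


def coefficient_of_variation(values):
    """Standard deviation divided by the mean; None when the mean is zero."""
    mean = statistics.mean(values)
    if mean == 0:
        return None  # undefined: no material recovered in any transect
    return statistics.stdev(values) / mean  # sample standard deviation (assumption)


# Hypothetical survey tracts: area in hectares and sherd counts per transect.
tracts = {
    "tract_A": {"area_ha": 0.5, "transects": [10, 12, 8, 10]},
    "tract_B": {"area_ha": 2.0, "transects": [0, 30, 2, 8]},
}

if __name__ == "__main__":
    for name, tract in tracts.items():
        total = sum(tract["transects"])
        cv = coefficient_of_variation(tract["transects"])
        print(name, density(total, tract["area_ha"]), round(cv, 3))
```

Both hypothetical tracts yield 40 sherds. The density standardises for their different areas, and the coefficient of variation shows that the material in tract_B is far more unevenly spread across its transects.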
Alternative ratio calculations can also be used to 'normalise' the data, allowing disparate data to be compared on equal terms. For example, an observed value such as a sherd count could be used to normalise the distribution of another artefact category. The new values would then show each count as a proportion of the number of sherds recovered in that enumeration unit.

Determining class ranges

A major issue with choropleth mapping is categorising data values. Classes, or 'bins', are used to simplify patterns and make data easier to comprehend. Classification of data will invariably lead to generalisation and a loss of detail. Conventional cartographic theory, on the basis of experimentation, suggests that readers cannot distinguish between more than 11 grey-tone classes, and Dent (1999, p. 143) recommends that no more than 6 classes should be used. The classification method needs to be carefully controlled in order for it to be comprehensible. Some GIS programs use classification algorithms to determine the optimum number of classes, although user intervention is often required. As described in Chapter 7, there are five basic approaches to numeric classification for the purposes of choropleth mapping: equal interval, natural break, standard deviation, quantile and equal area. As we previously explained, it is important to be aware of the structure of your data before choosing one of these methods, for example which of the four 'idealised' distributions in Fig. 7.15 it resembles. This will help you choose the most appropriate classification method, although you may need to edit the class ranges manually to present a more comprehensible picture of the data.

Fig. 12.3 Mapping enumeration-unit variability (Kastri and Region, Intra-Tract Variability): coefficient of variation of the number of pottery sherds recovered in transects for each tract. Lower values indicate less internal variability. Source: Kythera Island Project. Used with permission.

12.4.2 Maps of continuous distribution

Continuous distribution maps are used to display information about continuously varying quantitative data rather than discrete data. Maps of continuous distributions are similar in many ways to choropleth maps, because they display values within an enumeration unit, within which the value is assumed to be representative. The major difference is that in maps of continuous distribution the enumeration units are typically of equal shape and size. These are usually raster pixels, although other shapes such as hexagons can also be used. Choropleth maps, by contrast, may use differently shaped polygons. The visualisation framework for maps of continuous distribution is also different, in that values are best represented using a ramped (or smooth) scale rather than discrete classes. The smooth gradation between the highest and lowest values also enhances the idea that the data being displayed are continuous in nature. Several colours can potentially be used in a colour 'ramp', for example a transition from green to yellow to red, to show changing elevation, temperature or rainfall values. Colour ramping can result in dramatic visual effects for continuously distributed data such as elevation. If done injudiciously, however, such as by ramping between more than three or four colours, it can interfere with the interpretation of data patterning. Some data, such as slope values, are often better represented using a single ramped colour. Figure 12.4 shows a map of slope values for the same study area used in the previous figures. In this case, the data categories are not individually identifiable; rather, an impression of a continuously varying surface is maintained by the use of a ramped legend. It is important to emphasise that continuous distribution maps are not limited to environmental phenomena; they can be used effectively to show the distribution of a wide range of data. To cite one recent example, mapping the distribution of fluted points with continuous distribution techniques has helped researchers understand the expansion of Palaeoindians (Steele et al. 1998; Anderson and Faught 2000).

Fig. 12.4 Continuous distribution mapping: slope values. Source: Kythera Island Project. Used with permission.

Fig. 12.5 Ceramic density variability mapped as proportional symbols (Kastri and Region, Ceramic Sherd Density; the polygon defines the boundary of Kastri). Source: Kythera Island Project. Used with permission.

12.4.3 The proportional symbol map

Proportional symbol maps are possibly the most popular form of thematic map. This is largely because of the fundamentally simple idea behind the technique. The size of a symbol represents the quantity of a phenomenon at a specific place on a map. Circles are typical, although squares, triangles or other shapes are also used. Figure 12.5 shows a map of pottery sherd densities using proportional symbols.



Simple though this technique may be, there are some issues that need to be taken into account when creating proportional symbol maps, particularly with regard to the selection of the symbol and the manner in which the grading is calculated. Circles are traditionally the preferred shape because they can be easily scaled and related to a precise position on a map, but most programs offer a range of alternative symbols that may be more suited to the type of data being mapped. A major problem with complex shapes, however, is the difficulty they present for assessing scalar differences. Experiments have demonstrated that people both over- and underestimate the size of circles depending on the size of the neighbouring circles. The relative sizes of squares are estimated most accurately, although squares can create a rectangularity in maps that can distract from the data (Dent 1999, pp. 179–180). Most GIS programs possess automatic scaling features for determining the size of the symbols in relation to their associated class ranges. Formulae for determining symbol size have been developed in cartography (e.g. Dent 1999, pp. 177–183), but it is unclear to what extent these have been adopted in GIS. Examination of the size grading in most systems suggests that it is based on linear calculations, so some user intervention may be needed for effective communication. Depending on whether circles, squares or some other shape is used, sizes may need to be adjusted to give a better impression of the differences between larger and smaller symbols.

12.4.4 The dot-density map

The rise of GIS-led cartographic production has seen a decline in the use of dot-density maps, which is unfortunate because they are an extremely useful and powerful way to present information about density values. Dot-density mapping is based on a simple premise: each dot represents a consistent unit of data, so that the number of dots within an enumeration area, multiplied by the value each dot represents, equals the total data value. The map in Fig. 12.6 depicts ceramic distribution in the study area using this technique.
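The dot-density premise can be sketched in a few lines of Python. The sketch assumes that each enumeration unit is a simple polygon given as a list of (x, y) vertices. The number of dots is the unit's value divided by the value each dot represents. Dots are placed uniformly at random inside the polygon by rejection sampling: points are drawn within the bounding box and kept only if a ray-casting (even-odd) test places them inside. The function names and the rounding rule are assumptions for this sketch. They do not describe how any particular GIS places its dots.

```python
import random


def point_in_polygon(x, y, ring):
    """Even-odd (ray-casting) test: is (x, y) inside the closed ring of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dot_density(ring, value, dot_value, seed=None):
    """Place round(value / dot_value) dots uniformly at random inside one enumeration unit."""
    if dot_value <= 0:
        raise ValueError("dot_value must be positive")
    rng = random.Random(seed)
    n_dots = round(value / dot_value)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    dots = []
    while len(dots) < n_dots:
        # Rejection sampling: draw in the bounding box, keep only points inside the unit.
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            dots.append((x, y))
    return dots


# A hypothetical 50 m x 50 m survey unit holding 137 sherds, one dot per 10 sherds.
unit = [(0, 0), (50, 0), (50, 50), (0, 50)]
print(len(dot_density(unit, 137, 10, seed=1)), "dots")
```

Two simplifications are worth noting. The rounding discards the remainder (here 7 sherds), and a polygon with zero area would make the sampling loop run forever. Choosing `dot_value` is exactly the trade-off between crowding and sparseness discussed in the next paragraph.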
As can be seen, this provides a very good impression of the underlying spatial structure of the distribution, although not of the precise values for each enumeration unit, which are better represented by choropleth mapping. Additionally, as the dots are placed randomly within the enumeration unit, they do not show the specific location of the phenomena. This may appear rather limited compared to a map which shows the location of the units, but Fig. 12.6 arguably provides a better sense of the density variability than a simple choropleth map. The effectiveness of this sort of mapping depends partly on the value of the dots, partly on the size of the dots and partly on how the dots are placed within the enumeration unit. If the dot value is too low or the dot size is too large, the map soon becomes so crowded that the dots bleed into each other. Conversely, if the dot value is too high or the dot size too small, the map is too sparsely populated and any underlying spatial pattern is difficult or impossible to detect. To avoid creating a false sense of patterning, the dots also need to be randomly placed within their

[Figure: Kastri and Region Ceramic Sherd Distribution, shown as a dot-density map. Polygon defines boundary of Kastri survey area; individual survey tracts not shown.]

Fig. 12.6 Ceramic distribution as a dot-density map. Source: Kythera Island Project. Used with permission.

enumeration areas; any tendency to cluster within this space can undermine the very purpose of this type of mapping.

12.4.5 The isarithmic map

Isarithmic maps are used to portray three-dimensional surfaces (volumes) using line symbols referred to as isolines. This form of mapping is most recognisable when it is used to show variations in elevation using contour lines. The technique is, of course, not limited to showing elevation, and can be used to represent any expression of volume, or any data that can be conceived as a volume, such as temperature or rainfall. The technique can be divided into two separate forms depending on the underlying data: isometric mapping and isoplethic mapping. The former term is used to describe data that are collected as points, whereas the latter term is used for data that occur over geographical areas (Dent 1999, p. 191). Examples of isometric maps used by archaeologists include elevation, rainfall and other environmental data


Fig. 12.7 Isochronic mapping of the spread of Neolithic farming, with dates in years BP (from Ammerman and Cavalli-Sforza 1971, p. 685). Dashed lines are expected regional variations. Reproduced with the permission of Blackwell Publishing. See Gkiasta et al. (2003) for more recent spatial modelling of the Neolithic radiocarbon record.

that have been collected by recording instruments or field observations. Isarithmic techniques have also been used to construct maps such as that in Fig. 12.7, which depicts the Neolithic expansion in Europe using isochrons, where the locations and dates of Early Neolithic sites form the underlying point data (Ammerman and Cavalli-Sforza 1971). In contrast, isoplethic techniques are used to depict the spatial variability of data that have been collected in enumeration units, for example population density per square kilometre or the number of pottery sherds per hectare, as shown in Fig. 12.8. Although the method of construction is similar to that of isometric maps, isoplethic mapping ultimately hides the scale of the collection unit. This is acceptable for truly continuous data (as the scale can often be found in the map documentation), but may cause confusion for quasi-continuous datasets such as that used in Fig. 12.8. For this reason, isoplethic maps are less suited to data collected in enumeration units, which are better mapped using choropleths. Before methods for determining the placement of isolines were integrated into many common GIS packages, isarithmic mapping was a technically demanding

[Figure: Kastri and Region Ceramic Density, shown as isolines. Contour interval = 200 sherds/hectare. Interpolation based on tract centroids at 50 m grid spacing. Polygon defines boundary of Kastri survey area.]

Fig. 12.8 Isoplethic mapping of pottery density. The base data are identical to those used to construct the maps in Figs. 12.2, 12.5 and 12.6. Source: Kythera Island Project. Used with permission.

exercise. Conceptually, the process can be equated to inserting a horizontal plane through a three-dimensional surface at a set value and recording the line of intersection. In practice, this process could be approximated by 'threading' a line through an array of data values and then smoothing the line. This, of course, assumes that the data array contains values that match the intervals of the lines. If this is not the case, then data values will have to be estimated through the process of interpolation (as described in Chapter 6). Isarithmic mapping is subject to the same sorts of errors that have been described previously: errors in data collection may produce faulty base data, an improper interpolation algorithm might result in inaccuracies, edge effects can cause spurious results towards the borders of the distribution, the map may be improperly drawn and the finished map itself may be improperly interpreted by the reader. The use of



GIS for map generation does not necessarily mean that a map will be appropriately drawn, so some care also needs to be taken to make certain that the source information is both clearly and properly presented. Some design strategies can assist in this, such as making isolines appear dominant on a map by using heavier weighted lines than those used for other map data, and making certain that the isoline labels are legible and orientated towards the user. Automated labelling of isolines, or of other spatial objects, is not handled very well by GIS, and the use of more sophisticated third-party automated labelling facilities or, alternatively, manual placement of labels will ensure greater legibility.

12.5 Internet mapping

As the use of the Internet and other forms of electronic dissemination of archaeological data increases, so does interest in presenting map data in such a way that it can take advantage of the interactive and hypertext facilities that multimedia authoring languages offer. Electronic publication has also changed the style of publication by providing the reader with more options to navigate text and data in non-linear ways, and to create their own interpretations of the evidence. Hodder (1999, pp. 117–128) has discussed the impact of global information systems on archaeology and, in a lively debate, Hodder (1997) and Hassan (1997) also discuss the implications arising from Hodder's celebration of the erosion of the authority of authorship that has been seen as an implication of digital publication. Similar implications arising from the digital publication of spatial information can be proposed: providing users with interactive access to complete spatial datasets, and thus the ability to create, interrogate and analyse subsets of this information, challenges the authority of the cartographer and presents opportunities for alternative interpretations.
Standards for Internet GIS, or 'Web GIS' as it is also called, are still emerging (Smith et al. 2002; Peng and Tsou 2003). The development of technologies for interactive Internet mapping is consequently a relatively new enterprise in the rapid rise of GIS technologies, but it is one that has the potential to bring GIS to a much wider non-specialist audience. Archaeologists have been early users of this technology, as it provides a logical extension of the traditional published site report to a more interactive and interpretable format. Facilities that permit hyperlinks between data tables, descriptions and related maps can increase the effectiveness of maps as communication media. A good example is the development of a 'Web-CD' for the West Heslerton Project (Powlesland 1998) described in Chapter 3. Another is the York Archaeological Trust's publication of 41–49 Walmgate (Macnab 2003), where, within a web browser, users are able to navigate at will and with ease between maps and data: large-scale maps are hyperlinked to smaller-scale plans of individual burials and features, which are linked to data tables describing the contents and artefacts. These in turn are hyperlinked to Harris matrices, the cells of which are linked back to other maps. This provides the reader with near-seamless movement between spatial data,



description and synthesis, providing a more integrated site report than the traditional, linear, paper-based model. While hypertext and hypermaps provide exciting avenues for the publication of archaeological information, they are dependent on predefined links between data that the user is then free to navigate. Fully 'interactive mapping', on the other hand, involves a different set of tools that mimic the functions of standalone GIS, delivered via the Internet and accessed by a browser. This type of mapping involves an element of free navigation within and between maps and spatial datasets, including standard facilities such as changing the scale and level of generalisation, panning and moving across maps, and enabling and disabling map themes (Cartwright et al. 2001). Many commercial GIS packages now offer the ability to distribute GIS data and GIS functionality over the Internet. When linked to a 'geolibrary' (Goodchild 1998) offering searchable access to spatial datasets, the potential for more exploratory analysis of spatial data relationships is increased. Some of the more common proprietary Web GIS platforms include ESRI's ArcIMS¹ and Internet Map Server (IMS)² and MapInfo's MapXtreme³ software. Using these programs, it is possible to distribute GIS data from dynamic servers to users via web browsers, extending the user base of spatial information to non-GIS specialists. Web GIS has been extended by the definition of a markup language called Scalable Vector Graphics (SVG), which offers a considerable improvement on HTML. Previously, maps accessed by web browsers had to be in raster formats, and user interactivity was enabled by predefined hyperlinks or by Java scripting, which provided some degree of interactivity. SVG is a new non-proprietary grammar that extends the utility of XML and establishes a standard for describing two-dimensional vector graphics. Introductory guides to XML and SVG can be found in Ray and McIntosh (2002) and Eisenberg (2002).
Although SVG only provides a mechanism for the creation of vector graphics (and their integration with raster images), it nevertheless offers a major enhancement to HTML by enabling the combination of raster and vector graphics in ways not previously possible. Adobe Illustrator⁴ will export files as SVG, and extensions to several common GIS packages have been developed that enable the publication of map data using SVG, such as SVGMapper⁵ for ESRI's ArcView, Map2SVG⁶ for MapInfo Professional and GeoMedia WebMap⁷ for Intergraph. A Web GIS application using SVG is shown in Fig. 12.9; it allows mouse-over querying of archaeological contexts from an excavation (Macnab 2003). Verhoeven (2003) has also described the application of SVG to the interactive mapping of archaeological data from a Mesopotamian survey project.
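To make the SVG approach concrete, the Python sketch below uses the standard library's `xml.etree.ElementTree` to write a tiny SVG map. Each excavated context is a polygon wrapped in a hyperlink, and each polygon carries a `<title>` element that most browsers display on mouse-over. The dictionary keys, link URLs and colours are hypothetical, and the sketch is not a reconstruction of the Walmgate system shown in Fig. 12.9. Coordinates are assumed to be in SVG user units already, with y increasing downwards. Real map coordinates would first need transforming.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)        # write SVG elements without a prefix
ET.register_namespace("xlink", XLINK_NS)


def context_map_svg(contexts, width=200, height=200):
    """Return an SVG document (as a string) with one clickable polygon per context.

    Each item in `contexts` is a dict with these hypothetical keys:
      'id'    - context number, used for the element id and tooltip
      'label' - short description shown on mouse-over
      'url'   - page holding the full context record
      'ring'  - list of (x, y) vertices in SVG user units (y increases downwards)
    """
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    for ctx in contexts:
        # Wrapping the shape in <a> turns the whole polygon into a hyperlink.
        link = ET.SubElement(svg, f"{{{SVG_NS}}}a",
                             {f"{{{XLINK_NS}}}href": ctx["url"]})
        poly = ET.SubElement(link, f"{{{SVG_NS}}}polygon", {
            "id": f"context-{ctx['id']}",
            "points": " ".join(f"{x},{y}" for x, y in ctx["ring"]),
            "fill": "#d9c8a0",
            "stroke": "black",
        })
        # Most browsers show a <title> child as a tooltip on mouse-over.
        ET.SubElement(poly, f"{{{SVG_NS}}}title").text = (
            f"Context {ctx['id']}: {ctx['label']}")
    return ET.tostring(svg, encoding="unicode")


contexts = [
    {"id": "1001", "label": "timber building, floor", "url": "contexts/1001.html",
     "ring": [(20, 20), (120, 20), (120, 80), (20, 80)]},
]
print(context_map_svg(contexts))
```

Because the output is ordinary XML, the same links could equally point to context descriptions, photographs or matrix cells, which is the kind of navigation described above for Walmgate.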
1 www.esri.com/software/arcims/
2 www.esri.com/software/internetmaps/ims.html
3 www.mapinfo.com/
4 www.adobe.com/svg/
5 www.svgmapper.com/
6 www.gis-news.de/svg/map2svg.htm
7 http://imgs.intergraph.com/



[Figure: screenshot of the SVG map interface for Walmgate, York, headed 'Phase 6.1: late 12th to early 13th century' and showing Timber Buildings M and N on the Walmgate street frontage, with buttons to show interpretation and contexts.]

Fig. 12.9 An SVG map of late twelfth to early thirteenth century phases at Walmgate, York (Macnab 2003). Clicking on a given context pulls up a context description with links to photos and the stratigraphic matrix. The map can be zoomed and panned and thus provides many of the functions of GIS via a web browser. The system was built by Mike Rains using MySQL, PHP and Apache. Data and interface copyright © York Archaeological Trust. Reproduced with permission.

12.6 Conclusion

Maps are wonderfully communicative documents that are able to hold and express a vast amount of spatial data that would otherwise be impossible to convey efficiently. A basic understanding of cartographic principles is extremely important for the effective use of maps, and the techniques described in this chapter will fulfil the mapping needs of most archaeological users of GIS. As is clear from the previous section, however, digital publication requires both a new set of skills and a rethinking of how data can be made available to the public. Although there will be a place for the traditional paper map in the future of GIS, and traditional map design is



still a vital skill, Internet mapping will increasingly come to dominate GIS and will transform the mapping techniques that cartographers have come to rely on. The growing importance of the Internet as a primary vehicle for data dissemination, the sharing of map data between users in different countries and the provision of tools for the interrogation of map data within Internet browsers are all part of this trend. In our opinion, this will continue to the stage where dynamic digital maps become the primary vehicle with which users interact with and explore archaeological datasets.

Maintaining spatial data

13.1 Introduction

The most valuable (non-human) resource that any organisation possesses is its data. Hardware and software are easily replaceable, but the loss of data can be catastrophic for an organisation. Information loss, whether full or partial, is easily avoided through the routine taking of backups and the storage of data off-site. As there is plenty of readily available information on how best to implement a backup and data recovery procedure, we do not consider it in any detail in this book. What is less obvious, particularly to those new to GIS and digital data, is the similarly important task of data maintenance. Consider, for example, the following three scenarios:
An employee in a cultural resource management (CRM) unit is assigned the task of updating site locations from newly acquired GPS data. How should the fact that a few site locations have been updated be documented, and where and how should the old data be stored?

An aerial photograph of a portion of landscape has been rectified and georeferenced, and is ready to be used to delineate features of archaeological significance. How and where should information about the degree of error in the georeferencing be documented? Where and how should the errors for the newly digitised archaeological features be documented?

A research student is collecting data on soil types for Eastern Europe from several different national agencies that each have different scales and recording systems. How is this student able to search, compare and ultimately integrate datasets in a manner that ensures the data will be appropriate for his/her needs?

Each of these three scenarios highlights the need for a way of recording and retrieving metadata, i.e. structured data about data. In a very real and practical sense, metadata provides users with a set of essential and standardised pieces of information about what the dataset is, when it was created, what its update cycle is, who created it, where the data refer to, and how to obtain more information about it. For any organisation that collects or updates data, the recording of metadata about each dataset is an essential step in data collection and maintenance. For individual researchers metadata is also very important, as it ensures that, for example, the sources of data are recorded so that they can be properly referenced in any ensuing publication. Although the time investment associated with creating and maintaining metadata is not trivial, it is vital to ensure the long-term viability of data archives. The benefits of metadata are many and include greater longevity of information,



improved understanding of a dataset, the facility to search for appropriate datasets based on several selection criteria (e.g. type of data, location and creation date), and expanded capabilities to share data between individuals and organisations. This chapter describes methods and international standards for the collection of metadata for geospatial datasets.

13.2 Metadata standards

What information about a dataset should be recorded? Standardisation of metadata categories is important, not least because of the increasingly common practice of accessing information via the Internet from global gateways. There is a consequent need to be able to assess the relevant details about data from a standardised list of categories and descriptive terms. Additionally, as digital data become the de facto form for many spatial disciplines like archaeology, standardised metadata are essential for providing users with the ability to assess both the quality and the appropriateness of the broad range of commercial and free datasets that are encountered in a GIS environment. Metadata standards have been defined by a number of organisations, and one of the most common is the Dublin Core Metadata Element Set, also known as ISO Standard 15836-2003. This consists of 15 elements that describe a dataset: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage and rights (Dublin Core Metadata Initiative 2003). The Dublin Core does not, however, contain elements that describe geospatial datasets, such as geographical location and scale. For that reason, several alternative standards have been defined to describe geospatial datasets.
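A Dublin Core record is simple enough to write out directly. The Python sketch below uses the standard `xml.etree.ElementTree` module to serialise the 15 elements in the Dublin Core namespace (`http://purl.org/dc/elements/1.1/`). The `<metadata>` wrapper element is an arbitrary choice for this sketch, and the example values are borrowed from the KIPDEM2 entries in Table 13.1 for illustration. The sketch also shows the limitation just noted: a bounding box or a scale has no dedicated element and can only be squeezed into free text such as `coverage`.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")
ET.register_namespace("dc", DC_NS)


def dublin_core_record(values):
    """Serialise a dict of Dublin Core element values as a small XML record.

    Keys must be Dublin Core element names; elements without a value are
    left out. The <metadata> wrapper is an arbitrary container for this sketch.
    """
    unknown = set(values) - set(DC_ELEMENTS)
    if unknown:
        raise KeyError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")
    for name in DC_ELEMENTS:                 # keep the standard element order
        if values.get(name):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = values[name]
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record({
    "title": "Kythera Digital Elevation Model 2 (KIPDEM2)",
    "creator": "James Conolly",
    "date": "2001-10-01",
    "language": "en",
    # No dedicated element for a bounding box or scale: it ends up as free text.
    "coverage": "Kythera; W 22.89, E 23.11, N 36.38, S 36.13",
}))
```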
These include the Australia New Zealand Land Information Council (ANZLIC) Working Group on Metadata: Core Metadata Elements Guidelines Draft 7, the Canadian General Standards Board (CGSB) Canadian Directory Information Describing Digital Georeferenced Data Sets, the European Committee for Standardisation (CEN) Standard for Geographic Information, the UK National Geospatial Data Framework (NGDF) metadata standard, and the US Federal Geographic Data Committee (FGDC) Content Standard for Geospatial Metadata (FGDC-STD-001-1998). Although each of these organisations has defined standards for its particular region, regional variations have recently been reduced by the creation of an international standard for geospatial metadata by the International Standards Organisation (ISO) Technical Committee (TC) 211 on Geographic Information and Geomatics, called ISO 19115 (Geographic Information – Metadata) (International Standards Organisation 2003). This standard is now closely followed by all the former regional organisations and establishes the current international standard for defining metadata elements appropriate for geospatial information. ISO 19115 consists of over 400 hierarchically ordered metadata elements. Its structure is complicated, but also comprehensive and flexible. ISO 19115 also defines a 'core' set of 22 elements, of which 7 are mandatory, 4 are conditionally mandatory (depending on the type of data) and 11 are optional (Table 13.1).



Table 13.1 ISO 19115 core elements and entry examples

Element | Obligation | ISO 19115 element no. | Example entry
Dataset title | Mandatory | 360 | Kythera Digital Elevation Model 2 (KIPDEM2)
Dataset reference date | Mandatory | 363 | (creation) 2001-10-01
Dataset responsible party | Optional | 367 | James Conolly
Geographic location (four bounding coordinates) | Conditional | 343–346 | westBoundLongitude: 22.89; eastBoundLongitude: 23.11; northBoundLatitude: 36.38; southBoundLatitude: 36.13
Dataset language | Mandatory | 39 | en
Dataset character set | Conditional | 40 | utf-8
Dataset topic category | Mandatory | 41 | 006 (elevation)
Spatial resolution | Optional | 61 | 20 m
Abstract describing the dataset | Mandatory | 25 | The KIPDEM2 is a continuous raster map of elevation values from the Aegean island of Kythera. It has a 20-m cell size and was interpolated from a 5-m contour interval and spot heights base map (manually digitised from the Greek Military maps of Kythera). The DEM was created using Topogrid.
Distribution format | Optional | 285 | ArcGIS 8.3
Additional extent information (vertical/temporal) | Optional | 355, 358 | Elevation range 0–250 m
Spatial representation type | Optional | 37 | raster
Spatial reference system | Optional | 207 |
Lineage statement | Optional | 83 | Derived from the 5-m contours and spot heights manually digitised from the 1:5000 Greek Military maps of Kythera.
Online resource | Optional | 397 | www.ucl.ac.uk/kip
Metadata file identifier | Optional | 2 | KIP.DEM2
Metadata standard name | Optional | 10 | ISO 19115 Geographic Information – Metadata
Metadata standard version | Optional | 11 | DIS-ESRI1.0
Metadata language | Conditional | 3 | en
Metadata character set | Conditional | 4 | utf-8
Metadata point of contact | Mandatory | 377 | James Conolly
Metadata date stamp | Mandatory | 9 | 2003-07-21

Source: OpenGIS Consortium (2001, Table 3). Note that this table is provided for information only and is not a guide to ISO 19115 metadata standards. For full details please refer to the standard's documentation (ISO 2003).
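The split of the core set into mandatory, conditional and optional elements lends itself to a simple completeness check before a record is filed. The Python sketch below encodes the obligations listed in Table 13.1 and reports which required elements are missing from a record. The element keys are plain-English labels rather than ISO 19115 tag names. The handling of conditional elements is a simplifying assumption: geographic location is treated as required whenever the data are geospatial, and the caller names any other conditional elements that apply, because the standard's own conditions are not reproduced here.

```python
# Obligation of each ISO 19115 core element as listed in Table 13.1:
# "M" = mandatory, "C" = conditional, "O" = optional.
CORE_ELEMENTS = {
    "dataset title": "M",
    "dataset reference date": "M",
    "dataset responsible party": "O",
    "geographic location": "C",
    "dataset language": "M",
    "dataset character set": "C",
    "dataset topic category": "M",
    "spatial resolution": "O",
    "abstract": "M",
    "distribution format": "O",
    "additional extent information": "O",
    "spatial representation type": "O",
    "spatial reference system": "O",
    "lineage statement": "O",
    "online resource": "O",
    "metadata file identifier": "O",
    "metadata standard name": "O",
    "metadata standard version": "O",
    "metadata language": "C",
    "metadata character set": "C",
    "metadata point of contact": "M",
    "metadata date stamp": "M",
}


def missing_core_elements(record, is_geospatial=True, applicable_conditionals=()):
    """Return, sorted, the required core elements that are absent or empty in `record`.

    Mandatory elements are always required. Geographic location is required
    for geospatial data; other conditional elements are required only if the
    caller lists them (an assumption standing in for the standard's own rules).
    """
    required = {name for name, ob in CORE_ELEMENTS.items() if ob == "M"}
    if is_geospatial:
        required.add("geographic location")
    for name in applicable_conditionals:
        if CORE_ELEMENTS.get(name) != "C":
            raise ValueError(f"{name!r} is not a conditional core element")
        required.add(name)
    return sorted(name for name in required if not record.get(name))


kipdem2 = {
    "dataset title": "Kythera Digital Elevation Model 2 (KIPDEM2)",
    "dataset reference date": "2001-10-01",
    "dataset language": "en",
    "dataset topic category": "006 (elevation)",
    "abstract": "Continuous raster map of elevation values for Kythera.",
    "metadata point of contact": "James Conolly",
    "metadata date stamp": "2003-07-21",
}
print(missing_core_elements(kipdem2))
```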



Mandatory elements include such things as the title of the dataset, an abstract describing the data and the point of contact (i.e. a named individual) for the dataset. Conditional elements are those that must be completed if the data are of a particular type. Recording the geographic location of the dataset, for example, is necessary if the data are geospatial. Optional information includes items such as the type of spatial data, the spatial resolution of the dataset, the reference system and the distribution format. Although optional, these elements provide essential information and should be completed when possible. If strict adherence to the full ISO 19115 structure is not necessary or appropriate, then this core set can be used as guidance for the minimum details that are needed to document a dataset. Some national geospatial organisations continue to define their own set of core elements based on the international standard. For example, there are 32 proposed standard elements for the UK defined by UK Gemini (Walker 2003). As can be seen from Table 13.2, 17 of these are mandatory (M) and 15 are optional (O). The set is designed to be compliant with ISO 19115 and will form the standard set of elements for UK geospatial metadata.

13.3 Creating metadata

It should be clear from this brief review of metadata standards that creating a full, ISO 19115-compliant metadata database for each dataset in a project or organisation is a very time-consuming exercise. It can, however, be streamlined if some basic information about the data is assembled prior to creating the metadata record (cf. Land Information New Zealand 2003):
How many datasets are there? During project development and analysis the number of spatial datasets may rapidly expand as new combinations of data are created. Keeping track of individual datasets and their relationships is an essential first step to proper data management and maintenance.

What is the purpose of the data? Understanding what the data are used for and why they were created will also help in the assembly of metadata.

What do the data represent? A clear definition of what is being represented by a set of geospatial data is needed for descriptive purposes.

How are the data represented? This information is crucial, as it simplifies the process of searching through catalogues of metadata to identify appropriate datasets. For example, elevation data may be stored as a set of points, as contour lines or as a continuous grid, and although this is obvious when viewing a dataset, it must nevertheless be specified in the metadata in order to facilitate search and retrieval.

Who created it? Recording the identity of the individual(s) responsible for the creation of a dataset is important so that queries about its construction can be addressed to the right person.

What are the sources of information and resources (the 'lineage') that were used to create the data? When creating data, whether by manually or automatically digitising paper maps or by generating derived data such as a slope map, it is necessary to document the source of the base dataset, the methods of digitisation, the post-capture processing methods and the methods of transformation, as appropriate.


Table 13.2 Standard metadata elements for the UK

1 Title
2 Alternative title
3 Dataset language
4 Abstract
5 Topic category
6 Subject
7 Date
8 Dataset reference date
9 Originator
10 Lineage
11 West bounding coordinate
12 East bounding coordinate
13 North bounding coordinate
14 South bounding coordinate
15 Extent
16 Vertical extent information
17 Spatial reference system
18 Spatial resolution
19 Spatial representation type
20 Presentation type
21 Data format
22 Supply media
23 Distributor
24 Frequency of update
25 Access constraint
26 Use constraint
27 Additional information source
28 Online resource
29 Browse graphic
30 Date of update of metadata
31 Metadata standard name
32 Metadata standard version

What are the potential sources of error? Although this information does not form part of the ISO 19115 core element set, it is nevertheless an important piece of metadata for geospatial information. If, for example, data have been digitised from a paper map, then the relevant RMS errors must be recorded for reasons explained in Chapter 5.

When was it created and how long is it valid for? This information is particularly important for data that are subject to updates, such as a record of sites or find locations. Original datasets are often superseded when new information is made available, and it is essential that the time of creation and the validity of the dataset are recorded so as to ensure that the most up-to-date information is being used at any given time.
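Positional error is easier to record consistently if it is always computed the same way. The Python sketch below assumes that control points are given as pairs of transformed and reference coordinates in the same ground units. It computes the root-mean-square error as the square root of the mean squared distance between each pair. This is one common formulation. The function name, input layout and sample residuals are assumptions, and the calculation is not necessarily identical to the treatment in Chapter 5.

```python
import math


def rms_error(control_points):
    """Root-mean-square positional error of a set of control points.

    Each item is ((x_fit, y_fit), (x_ref, y_ref)): where the point lands after
    georeferencing and where it should be, in the same ground units (e.g. m).
    """
    if not control_points:
        raise ValueError("at least one control point is needed")
    squared = [(xf - xr) ** 2 + (yf - yr) ** 2
               for (xf, yf), (xr, yr) in control_points]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical residuals from four control points on a digitised paper map.
points = [((1000.8, 2000.1), (1000.0, 2000.0)),
          ((1500.2, 2499.4), (1500.0, 2500.0)),
          ((2000.0, 2000.5), (2000.0, 2000.0)),
          ((1499.6, 1500.0), (1500.0, 1500.0))]
print(f"RMS error: {rms_error(points):.2f} m")
```

The resulting figure, together with the number of control points and the transformation used, is what would be entered in the dataset's lineage or quality metadata.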



13.3.1 Recording methods

The emerging standard for storing metadata is XML (eXtensible Markup Language), which is itself a standard for recording information about information.¹ As applied to metadata, it consists of a set of nested items, each enclosed in tags that name the element, followed by the record details, as in this segment from a longer XML record:

<idPurp>To provide a continuous map of elevation values for Kythera</idPurp>
<idStatus>
  <!-- completed -->
  <ProgCd value="001"/>
</idStatus>
<idPoC>
  <rpOrgName>Kythera Island Project</rpOrgName>
  <rpCntInfo>
    <cntAddress>
      <delPoint>Institute of Archaeology</delPoint>
      <city>London</city>
      <adminArea>London</adminArea>
      <postCode>WC1H 0PY</postCode>
      <country>United Kingdom</country>
    </cntAddress>
    <cntOnlineRes>
      <linkage>http://www.ucl.ac.uk/kip</linkage>
    </cntOnlineRes>
  </rpCntInfo>
</idPoC>

Although metadata can be entered manually in any text editor, the use of a dedicated XML editor with an ISO 19115 (or equivalent) template greatly facilitates the construction of structured records. A good cross-platform example of such an editor is the USGS-developed Java application XMLInput² (Fig. 13.1). Many recent GIS programs also have facilities for recording metadata that adhere closely to the ISO 19115 standard. ESRI's ArcGIS, for example, comes with a program called ArcCatalogue, which has a 'wizard' for building ISO 19115 metadata that automatically stores the data in XML (Fig. 13.2). Integrated tools also speed up the documentation process by automatically completing some fields (e.g. scale, geographical coordinates and time of creation) by querying the data itself. Integrated programs provide a reasonably straightforward
1 Further information on XML can be found at www.w3.org/XML.
2 Available from: ftp://ftpext.usgs.gov/pub/cr/mo/rolla/release/xmlinput



Fig. 13.1 The XMLInput editor with an ISO 19115 template file loaded for the Digital Chart of the World. Available from: ftp://ftpext.usgs.gov/pub/cr/mo/rolla/release/xmlinput

Fig. 13.2 The Metadata editor in ESRI's ArcCatalogue. Source: ESRI. ArcGIS software and graphical user interface are the intellectual property of ESRI and used herein with permission. Copyright © ESRI. All rights reserved.

Fig. 13.3 The Metadata search tool in ESRI's ArcCatalogue. Source: ESRI. ArcGIS software and graphical user interface are the intellectual property of ESRI and used herein with permission. Copyright © ESRI. All rights reserved.

interface and centralised database for entering, storing and retrieving metadata. Furthermore, ArcCatalog and similar programs also provide tools for implementing metadata queries (Fig. 13.3). Metadata queries can take the form of multiple search criteria (e.g. 'all raster datasets describing elevation for location p, created in the last 12 months, with a grid size of 500 m or better'). Additional software tools are available for geospatial metadata creation: the USGS maintains a set of creation and validation software tools, as well as guidance pages for creating FGDC-compliant metadata (following ISO 19115 standards).3

13.4 Conclusion

This brief review of the importance of properly maintaining spatial data using geospatial metadata standards should be sufficient to convince most readers that metadata are important and warrant the necessary investment to create compliant records. In particular, the ability to search and filter a database of geospatial data quickly and then retrieve the most appropriate or up-to-date version of, for example, a distribution map, is of great utility to the user. Furthermore, proper maintenance of data makes the sharing of information much easier, as it encourages common standards for recording the source, quality and errors associated with any given dataset. Finally, and no less importantly, the provision of metadata contributes to the 'future-proofing' of the data that it describes, thereby helping to protect the investment in the creation of those data. Since these benefits justify the substantial costs of collecting metadata, their creation should be a routine part of any project involving GIS.
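The multi-criteria metadata query quoted earlier in this chapter ('all raster datasets describing elevation for location p, created in the last 12 months, with a grid size of 500 m or better') amounts to a filter applied to every record in a metadata catalogue. The sketch below is a hypothetical illustration only: the record fields, the function names and the treatment of 12 months as 365 days are assumptions, and they do not reproduce ArcCatalog's query tool or the element names of the FGDC or ISO 19115 standards.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Tuple


@dataclass
class MetadataRecord:
    # Hypothetical, much-simplified record; real FGDC/ISO 19115 metadata
    # contain many more (and differently named) elements.
    title: str
    data_type: str                            # e.g. "raster" or "vector"
    theme: str                                # e.g. "elevation"
    bbox: Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)
    created: date
    cell_size_m: Optional[float] = None       # None for vector data


def covers(bbox, x, y):
    """True if the point (x, y) lies inside the bounding box."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= x <= xmax and ymin <= y <= ymax


def find_elevation_rasters(records, location, today,
                           max_age_days=365, max_cell_m=500.0):
    """Raster elevation datasets covering `location`, created within
    `max_age_days` of `today`, with a cell size of `max_cell_m` or finer."""
    cutoff = today - timedelta(days=max_age_days)
    x, y = location
    return [r for r in records
            if r.data_type == "raster" and r.theme == "elevation"
            and covers(r.bbox, x, y)
            and r.created >= cutoff
            and r.cell_size_m is not None and r.cell_size_m <= max_cell_m]
```

Each criterion is a separate test combined with AND, so further conditions (e.g. on projection or data source) can be added without changing the structure.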


GIS terms
2.5D GIS These GIS typically represent the third dimension by extruding vector objects by a z attribute variable to create the impression of a three-dimensional surface or volume. They are not normally capable of answering fully three-dimensional queries. (Compare 3D GIS.)
3D GIS A GIS that records the spatial location of geographic features in three dimensions such that it is capable of answering queries like 'find all the objects within a spherical radius of 1 m from this point'. (Compare 2.5D GIS and virtual reality.)
Accessibility Graph theoretic measures of local accessibility provide information about how readily an individual node may be reached from other nodes outside its immediate neighbourhood. Measures of global accessibility provide similar information for the network as a whole. (Compare connectivity.)
Accumulated cost-surface A model of the cost of moving from a specified origin to one or more destinations. The cost recorded at the destination(s) is that incurred when following the least costly route from the origin, as computed by applying a spreading function to a cost-of-passage map.
Accuracy Consider taking the mean of several measurements to estimate some attribute, such as soil pH. The estimate would be accurate if the mean was close to the true value, even if the spread of measured values was wide. (Compare precision.)
Agent-based model A computer program in which a collection of often interacting, autonomous software entities (agents) pursue their goals by carrying out one or more tasks in a simulated environment. In archaeological examples, agents typically represent individual human beings or households.
AGNPS The AGricultural Non-Point Source Model is a process-orientated erosion model that attempts to predict soil loss by allowing sediment to 'flow' over a map.
Anisotropic Dependent on both the direction of travel and any directional attributes (e.g. aspect) of individual map cells. (Compare isotropic.)
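The spreading function mentioned under Accumulated cost-surface can be sketched with Dijkstra's shortest-path algorithm run over the cells of a cost-of-passage raster. This is a minimal illustration rather than the algorithm of any particular GIS package: the eight-neighbour moves, the rule that a step costs the mean of the two cells' values multiplied by the step length, the unit cell size and the function name `accumulated_cost` are all assumptions made for the example.

```python
import heapq
import math


def accumulated_cost(cost, origin):
    """Accumulated cost-surface from `origin` (row, col) over a cost-of-passage
    raster `cost` (list of lists giving cost per unit distance in each cell).

    Assumption: a move between adjacent cells costs the mean of the two cells'
    values times the step length (1 orthogonally, sqrt(2) diagonally)."""
    nrows, ncols = len(cost), len(cost[0])
    acc = [[math.inf] * ncols for _ in range(nrows)]
    r0, c0 = origin
    acc[r0][c0] = 0.0
    heap = [(0.0, r0, c0)]          # priority queue ordered by accumulated cost
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > acc[r][c]:
            continue                # stale entry: a cheaper route was found later
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < nrows and 0 <= nc < ncols:
                    step = math.sqrt(2) if dr and dc else 1.0
                    nd = d + step * (cost[r][c] + cost[nr][nc]) / 2
                    if nd < acc[nr][nc]:
                        acc[nr][nc] = nd
                        heapq.heappush(heap, (nd, nr, nc))
    return acc
```

Recording, for each cell, the neighbour from which its cheapest cost was reached would allow a least-cost path to be traced back from any destination to the origin.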
ANSWERS The Areal Non-point Source Watershed Environmental Response Simulation model is a process-orientated erosion model similar to AGNPS.
Arc Another term for polyline.
Arc-node data structure A method that structures, stores and references vector data in a relational DBMS so that vertices (and nodes) construct polylines (arcs) and polylines construct polygons.
Area A two-dimensional geometrical primitive that may be represented by a vector polygon, or a contiguous set of raster cells that share the same attribute value.
Aspect The azimuth of the maximum rate of change of elevation (slope) in the downhill direction.
Attribute The term attribute usually refers to the non-locational properties of a geographic feature (e.g. the number of artefacts found at a site), but it is worth noting that the OpenGIS Reference Model (www.opengeospatial.org/specs/?page=orm) includes location among the attributes of a geographic feature.
Automated mapping The forerunner to GIS, this made possible the computerised storage and presentation of digital maps, but offered no analytical capabilities.
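The Aspect entry (the azimuth of steepest descent) can be made concrete with a small calculation on an elevation grid. The sketch below is an assumption-laden illustration, not the method of any specific package: it assumes row 0 is the northern edge and column 0 the western edge, and uses simple central differences, whereas GIS packages commonly use a weighted 3 × 3 estimator such as Horn's method.

```python
import math


def slope_aspect(z, r, c, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) at interior
    cell (r, c) of elevation grid `z`, using central differences.

    Assumes row 0 is the northern edge and column 0 the western edge."""
    dz_dx = (z[r][c + 1] - z[r][c - 1]) / (2 * cell_size)   # rise towards east
    dz_dy = (z[r - 1][c] - z[r + 1][c]) / (2 * cell_size)   # rise towards north
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None          # flat cell: aspect is undefined
    # The downhill direction is the negative gradient (-dz_dx east, -dz_dy north);
    # atan2(east component, north component) is an azimuth clockwise from north.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect
```

For a surface rising steadily towards the east, the downhill direction is west, so the function returns an aspect of 270°.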




Azimuth A direction in the horizontal plane, often given as the number of degrees clockwise from north.
Binary viewshed map A raster or vector map that codes cells or polygons according to whether or not they fall inside a given viewshed.
Bit The smallest unit of digital information representing a binary state, such as 0 or 1, or true or false.
Boolean algebra Algebra that uses the logical (or set) operations AND (intersection), OR (union), XOR (exclusive disjunction) and NOT (complement).
Buffer An area containing all locations within a specified distance from a given point, line or areal feature.
Byte A unit of digital information made up of 8 bits. A kilobyte (kB) is either 1000 (10³) or 1024 (2¹⁰) bytes, a megabyte (MB) 1 000 000 (10⁶) or 1 048 576 (2²⁰) bytes, etc., depending upon whether the writer is using SI or binary units (unfortunately it is not always easy to tell).
Cartesian coordinate The Cartesian coordinate system uses an x-axis and a y-axis placed so as to be orthogonal (i.e. at a right angle) to one another. The two axes intersect at an origin, which has the Cartesian coordinate (0, 0). The Cartesian coordinate of other locations takes the form (x, y), where x is the distance from zero on the x-axis and y is the distance from zero on the y-axis. The system may be extended to three dimensions by the addition of a z-axis. (See also easting, northing; compare polar coordinate.)
Cartographic Relating to the production of maps (as distinct from the manipulation of spatial data).
CAD Computer Aided Design provides a means of storing, manipulating and displaying representations of two- or three-dimensional objects. Although it can be used to produce maps, CAD software does not geographically locate objects and is not capable of supporting most of the database and analytical functionality expected of a GIS.
Cell A single (usually square) area within the grid structure that forms a raster map.
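For a linear feature, the Buffer entry (and Corridor, a buffer around a line) can be illustrated by testing whether a point lies within the specified distance of any segment of a polyline. This is a minimal sketch with hypothetical function names: a vector GIS would normally construct the buffer as a polygon, whereas this code only answers the membership question for individual points.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)     # degenerate segment
    # Position of p's projection along the segment, clamped to its end points
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_buffer(p, polyline, distance):
    """True if p lies within `distance` of any segment of `polyline`,
    a list of (x, y) vertices, i.e. inside the line's buffer."""
    return any(point_segment_distance(p, a, b) <= distance
               for a, b in zip(polyline, polyline[1:]))
```

Clamping the projection parameter to the range 0 to 1 ensures that distances beyond the ends of a segment are measured to its nearest end point, which gives the rounded ends of a line buffer.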
Cellular automata A computer simulation in which identical cells arranged on a regular grid repeatedly update their state according to the state of their neighbours. The technique is particularly suitable for modelling spreading processes, such as the spread of wildfire.
Centroid The geometric centre of a polygon. Its location is often approximated by calculating the mean of the coordinates of all vertices that define the polygon.
Choropleth map A thematic map showing a quantitative or qualitative attribute value associated with each geographical area (e.g. the number of archaeological sites in each county).
Cleaning The process of cleaning a vector map entails removing redundant vertices and, more importantly, digitising errors that would otherwise disrupt the map's topology, such as dangling lines, overhangs, overshoots and slivers. (See also snap.)
Client A computer that makes use of a service, such as the authentication of legitimate users, which is provided by another computer to which it is networked. Alternatively, a computer software package that allows a user to retrieve and possibly manipulate data stored using other software, either on the same or a different computer. (Compare server and front end.)
Colour ramp A gradual change of colour used to represent ordinal scale (see statistical terms), interval scale or ratio scale attribute values. Normally the user specifies the exact colour to be associated with the minimum and maximum values and the software then interpolates the appropriate colour for intermediate values.
Command-line A facility for running computer programs, specifying operations and choosing options by typing text rather than using a GUI.
Connectivity Graph theoretic measures of local connectivity provide information about how well an individual node is connected to its neighbours. Measures of global connectivity provide similar information for the network as a whole. (Compare accessibility.)
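The Centroid entry notes that the centroid is often approximated by the mean of the vertex coordinates. The sketch below compares that approximation with the area-weighted centroid obtained from the shoelace formula; the function names are illustrative, and the polygon is assumed to be simple (not self-intersecting).

```python
def vertex_mean(vertices):
    """Approximate centroid: the mean of the polygon's vertex coordinates."""
    n = len(vertices)
    return (sum(x for x, _ in vertices) / n, sum(y for _, y in vertices) / n)


def area_centroid(vertices):
    """Area-weighted centroid of a simple polygon using the shoelace formula.
    Vertices are given in order; the ring need not repeat its first vertex."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0      # twice the signed area of one triangle
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                           # signed polygon area
    return (cx / (6 * a), cy / (6 * a))
```

For a 2 × 2 square digitised with an extra vertex at (1, 2) on its top edge, the vertex mean is pulled towards that edge, giving (1.0, 1.2), while the area-weighted centroid remains at the true centre, (1.0, 1.0).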
Constrained interpolation The minimum and maximum values in a model generated by constrained interpolation must match the minimum and maximum values present in the original sample. (Compare unconstrained interpolation.)



Continuous field This model of space is appropriate when dealing with attribute values that vary continuously through space, such as elevation. It is usually represented using a raster map. (Compare entity model.)
Contour Another name for isoline, but usually used to refer specifically to an imaginary line joining locations at a specified elevation.
Corridor A buffer created around a line.
Cost-of-passage map A (normally raster) map that models the energetic, time or other cost of travelling across each map cell.
Cost-surface (See accumulated cost-surface.)
CRM Cultural Resource Management involves the identification, preservation, presentation and interpretation of archaeological and historical sites that are threatened by development, natural processes or damage inflicted by visitors.
Cross-tabulation The production of a table that records how the attribute values in one raster map are distributed with respect to the attribute values in another raster map.
Cumulative viewshed map A raster or vector map that codes cells or polygons according to how many viewsheds they fall inside (i.e. the arithmetic sum of two or more binary viewshed maps).
Cycle A path that begins and ends at the same node in a graph.
Dangling line A digitising error in which two adjacent line segments of what should be a single polyline fall short of one another, so terminating at two unwanted nodes rather than at a common vertex. (Compare overshoot.)
Database A collection of information organised according to a data model. The central component of any GIS is a database that stores information about geographic features, including both their geographic location and their attributes. Note, however, that many GIS practitioners use the term database to refer specifically to the data structures and/or software used to store non-locational attribute data.
Data model A conceptual scheme that defines what aspects of the real world are to be treated as features, what attributes they have and how they are related.
Data structure The way in which a DBMS actually organises data so that it can be stored in a computer.
DBMS A DataBase Management System is software that is used to store, manage and query data in a database.
Decimal degree When an angular measurement is given in decimal degrees then its fractional part is given in multiples of 1/10, 1/100, 1/1000 and so on, rather than in minutes (1/60) and seconds (1/3600).
Delaunay triangulation A tessellation of triangular polygons constructed such that the closest vertex to any point within a given triangle is one of the vertices used to construct that triangle.
DEM A Digital Elevation Model is a digital map that provides a model of the elevation of (part of) the Earth's surface. (Compare DTM.)
Digitising Generally, the process of converting conventional paper maps and images into digital form. Among GIS practitioners the term is often used to refer specifically to those methods that produce vector rather than raster output, especially the use of a digitising tablet or heads-up digitising.
Digitising tablet A device for digitising paper maps by tracing over them with a puck. (Compare scanner.)
Dirichlet tessellation A tessellation made up of Thiessen polygons fitted to a set of points.
Dot-density map A map in which dots representing a consistent unit of data are placed randomly in each enumeration area such that the total number of dots within a given enumeration area represents the total data value within it.
DTM Digital Terrain Model is another term for DEM. Note, however, that some GIS practitioners restrict the term DTM to those maps (usually TINs) that explicitly model the location of landform features such as ridges. (Compare DEM.)
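The Decimal degree entry can be illustrated by converting a degrees-minutes-seconds reading into decimal degrees, dividing minutes by 60 and seconds by 3600. The function name and the convention that southern and western values become negative are assumptions made for this sketch, although that sign convention is widely used.

```python
def dms_to_decimal(degrees, minutes=0, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes and seconds to decimal degrees.
    Assumed convention: southern (S) and western (W) values are negative."""
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must lie in [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value
```

For example, 51° 30′ 36″ N becomes 51 + 30/60 + 36/3600 = 51.51, and 0° 7′ 39″ W becomes −0.1275.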



Dynamic modelling A form of modelling involving the computer simulation of change through time.
Easting The distance east of the origin (x-coordinate) in a Cartesian coordinate system. (See also northing.)
Edge A connection between two nodes in a graph, usually represented as a line joining them.
Edge effect The tendency for spatial operations to return incorrect values near the edge of a map, often because the neighbourhood used to calculate them has been artificially truncated.
Effective slope The slope that is actually experienced when traversing a map cell in a horizontal direction other than the aspect.
Elevation The height above some specified datum, often mean sea level.
Ellipsoid A relatively simple model of the shape of the Earth that approximates it as a sphere flattened at the poles. (Compare geoid.)
Entity model This model of space is appropriate when dealing with discrete geographical features such as find spots, roads or administrative districts. The location and geometry of each feature (entity) is usually recorded in a vector map and its attributes in an associated database. (Compare continuous field.)
Enumeration area An area treated as an indivisible whole for the purposes of plotting the distribution of attribute values across a map.
Environmental determinism A critical label applied to models that treat one or more (particularly locational) aspects of human behaviour as a purely functional response to environmental conditions such as the distribution of shelter and food resources.
Error propagation The increase in uncertainty that results from combining datasets and/or building models (e.g. by interpolation) from them. (See also map generalisation.)
ETRS 89 The European Terrestrial Reference System 1989 is the geodetic datum preferred for high-precision GPS survey in Europe. (Compare NAD 27, NAD 83 and WGS 84.)
Euclidean Conforming to the axioms that define Euclidean geometry.
This is the kind of geometry that underwrites what most of us are taught at school: for example, that the linear distance from A to B is the same as from B to A, and that the lengths of the sides of a right-angled triangle are related according to the formula a² + b² = c².
Exact interpolation A type of interpolation that leaves the attribute values at sample locations unchanged. (Compare inexact interpolation.)
Filter A spatial operation designed to remove variability from a map by computing the new attribute value of a given cell as a function of the values in that cell's neighbourhood.
Flat-file database A database comprising just one table that stores information about all the recorded attributes of each object. (Compare relational database.)
Flood-fill interpolation An interpolation algorithm designed specifically for the purpose of interpolating continuous surfaces from isoline (normally contour) data.
Flow accumulation map A map created by searching a local drainage direction map to compute the number of upstream cells that drain into each cell. Usually, but not necessarily, used for hydrological modelling.
Flow network A network that represents the movement of, for example, goods between nodes. This requires that it includes information about the direction and possibly the magnitude of flow. (Compare network and transportation network.)
Friction map Another term for cost-of-passage map.
Front end A user interface to a GIS or other computer system, especially one that has been customised for a particular subset of users and/or which is a client of a server.
Fuzzy A state of uncertainty, usually either in the geographic location of some phenomenon, such as a boundary, or in the membership of a set, such as the set of cells visible from a site.
Geodesy The science of measuring the shape of the Earth and the location of points on its surface.
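The Filter entry, together with Edge effect and Low-pass filter, can be illustrated with a 3 × 3 mean filter. In this sketch (the function name is illustrative), cells on the map edge are averaged over only the neighbours that exist. This truncated neighbourhood is one way of handling edges, and it is exactly the kind of truncation that produces edge effects.

```python
def mean_filter(grid):
    """3 x 3 mean (low-pass) filter over a raster held as a list of lists.
    Edge cells are averaged over a truncated neighbourhood, i.e. only the
    neighbours that fall inside the map, so their values are less reliable."""
    nrows, ncols = len(grid), len(grid[0])
    out = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            vals = [grid[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, nrows))
                    for cc in range(max(c - 1, 0), min(c + 2, ncols))]
            out[r][c] = sum(vals) / len(vals)
    return out
```

Applied to a 3 × 3 grid of zeros with a single value of 9 in the centre, the centre becomes 1.0 (9/9), but a corner becomes 2.25 (9/4), because its neighbourhood holds only four cells. The corner value is therefore inflated by the truncation.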



Geodetic datum A coordinate system designed to provide the best possible fit with all or part of the geoid.
Geographic coordinate Geographic coordinates use latitude and longitude to specify a location on the surface of the Earth.
Geographic feature A feature associated with a geographic location.
Geographic location A location relative to the Earth's surface.
Geoid A model of the shape of the Earth that takes into account deviations from the ellipsoid such that its surface largely coincides with mean sea level.
Geolibrary A collection of searchable spatial datasets, typically made available to users via WebGIS.
Geometry The branch of mathematics concerned with spatial relationships. GIS practitioners often use the term 'geometric' to refer to spatial aspects of a geographic feature (e.g. its shape) other than its geographic location. Note also that 'a geometry' is a formal spatial language, e.g. Euclidean geometry.
Georectification The combined process of correcting distortion in an image (such as an aerial photograph) and georeferencing it.
Georeferencing The process of placing spatial data (such as remote-sensing imagery or digitised paper maps) in its correct geographic location by transforming it to fit an appropriate coordinate system.
Geospatial A term used to describe data (or methods applied to it) about geographic features, in other words, data in which the spatial component includes location relative to the Earth.
Global operation A spatial operation that computes the new attribute value at a given location from the attribute values at all locations for which data are available. (Compare local operation.)
GPS The Global Positioning System allows users with suitable electronic receivers to determine their geographic location by comparing the time taken to receive signals broadcast from three or more (usually at least four) satellites. Handheld navigation-grade GPS receivers are suitable for mapping features on the landscape to within 10 to 20 m of their true location. In North America, the Wide Area Augmentation System (WAAS) and, in Europe, the European Geostationary Navigation Overlay Service (EGNOS) transmit information on the accuracy of GPS positioning signals, which can enhance the accuracy of suitably equipped navigation-grade GPS receivers to ±3 m. Differential GPS (DGPS) receivers are accurate to between 0.5 and 5 m and therefore suitable for archaeological site survey and larger-scale mapping.
Graph A very abstract kind of network, simply comprising a collection of nodes linked by edges.
Graticule The 'grid' formed by displaying the lines of latitude and longitude on a map. The exact appearance of the graticule depends on the choice of projection.
Gravity model A locational model which assumes that the intensity of interaction between locations is directly proportional to some quantity at those locations and inversely proportional to the intervening distance.
Ground control point A location that can be identified on a map and whose real geographic location is known. Ground control points play an important role in georeferencing.
GUI A Graphical User Interface is a facility for running computer programs, specifying operations and choosing options by using a pointing device (mouse) to select from choices presented as, for example, 'drop-down menus' and 'radio' buttons. (Compare command-line.)
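The Gravity model entry can be written as I = k·mᵢ·mⱼ / d^β. In this form, mᵢ and mⱼ are the quantities at the two locations (e.g. settlement sizes), d is the intervening distance, k is a scaling constant and β is a distance-decay exponent. The entry's wording, 'inversely proportional to the intervening distance', corresponds to β = 1. Many applications use other exponents (β = 2 mirrors Newtonian gravity), so the sketch below leaves β as a parameter. The function name and the defaults are assumptions for illustration.

```python
def gravity_interaction(mass_i, mass_j, distance, beta=1.0, k=1.0):
    """Predicted interaction between two locations under a simple gravity model,
    I = k * m_i * m_j / d**beta. `k` and `beta` are analyst-chosen (or
    calibrated) assumptions; beta = 1 matches plain inverse proportionality."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * mass_i * mass_j / distance ** beta
```

With quantities of 100 and 200 at a distance of 10, the predicted interaction is 2000 when β = 1 and 200 when β = 2. Raising β makes interaction fall off more sharply with distance.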
Header The header of a raster map usually records the number of rows and columns in the map, the geographic location of its corners and the size (on the ground) of each cell.
Heads-up digitising The process of digitising a map by using a mouse or other pointing device to trace over a scanned copy that is displayed on a computer screen. Also called on-screen digitising. (Compare digitising tablet.)
High-pass filter A filter designed to remove long-range variability from a map and which is therefore often used to accentuate local features such as the edges of fields.



Histogram equalisation A technique for classifying a continuous variable (e.g. elevation) so that each class contains the same number of observations.
Hydrological modelling The process of modelling the movement of water over the Earth's surface. (See also local drainage direction map.)
Hypermap A digital map, designed to be viewed in a web browser, that contains fixed hyperlinks (URLs) to more detailed maps and/or attribute data. (Compare interactive map.)
Inexact interpolation A type of interpolation that does not necessarily preserve the attribute values at sample locations. (Compare exact interpolation.)
Interactive map Like hypermaps, interactive maps are designed to be viewed in a web browser, but rather than relying on fixed hyperlinks they allow free navigation within and between maps by providing standard GIS facilities such as the ability to pan across maps, to change the scale and level of generalisation, and to enable and disable map themes. (See also WebGIS.)
Interpolation A mathematical technique for estimating attribute values (e.g. elevation, soil type) at unsampled locations from those measured at sampled locations, where the unsampled locations fall within the spatial distribution of sampled locations.
Intervisibility Locations A and B are computed to be intervisible when a straight line-of-sight projected from A to B does not intersect an elevation surface representing potential obstructions such as rising ground or buildings. (See also reciprocity.)
Inverse distance weighting (IDW) A form of interpolation in which the influence of a sampled location is inversely proportional to its distance from the unsampled location whose attribute value is being estimated.
Irradiance The amount of solar energy received per unit area, also referred to as solar gain.
Isarithmic Isarithmic maps are used to portray three-dimensional surfaces (volumes) using isolines.
Isoline An imaginary line passing through a set of locations that share the same value of a particular attribute. (Compare contour.)
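The Inverse distance weighting entry can be sketched as a weighted average of the sampled values, where each sample is weighted by 1/dᵖ. Using the power p = 2 is a common choice, but it is an assumption in this example, as is the function name. Returning the sampled value when the target coincides with a sample reflects the fact that IDW is an exact interpolator.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from `samples`, a list of
    (x, y, value) tuples. Weights are 1 / d**power (power = 2 assumed)."""
    num = den = 0.0
    for sx, sy, v in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return v                # at a sampled location: exact interpolation
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den
```

With samples of 10 at (0, 0) and 20 at (2, 0), the estimate midway at (1, 0) is 15. At (0.5, 0) the nearer sample dominates, and the estimate is 11.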
Isometric map An isarithmic map that depicts the distribution of point attributes. (Compare isopleth map.)
Isometric view A visual representation of a three-dimensional object (or a surface such as elevation) in which the angles between the x-, y- and z-axes are equal, as are the scales along them. Note that neither condition is true of a perspective drawing.
Isopleth map An isarithmic map that depicts the distribution of areal attributes. (Compare isometric map.)
Isotropic Independent of the direction of travel. (Compare anisotropic.)
Isovist The set of all locations intervisible with a given viewpoint.
Justified graph Space syntax analysis uses graphs to depict which spaces in a building or urban area are directly accessible from which others (e.g. via a doorway). A justified graph shows these relationships relative to a specified starting space, often the outside of a building.
Kernel The neighbourhood used in a spatial operation such as filtering.
Key A unique identifier that can be used to distinguish between individual records in a database table. A primary key uses just one field in the table, whereas a composite key uses a combination of fields. A foreign key is used to build relationships between tables in a relational database.
Laplacian filter A particular kind of high-pass filter that is often used for edge detection in images. Its kernel weights are chosen to approximate the second derivatives in the definition of the Laplace operator.
Latitude An angular measurement giving the location of a point on the Earth's surface north or south from the equator. Both latitude and longitude are required for a complete geographic coordinate.
Least-cost path A route that minimises the total cost of moving between two locations on an accumulated cost-surface.



Line A one-dimensional geometrical primitive. Also a basic vector object comprising a straight-line segment defined by two terminating nodes. A linear geographic feature can be represented in a vector map using the basic line object if it is straight, or a polyline (arc) if it is curved. Linear geographic features can also be represented in a raster map as a sequence of raster cells that share the same attribute value.
Local drainage direction map A hydrological model, usually presented as a raster map, in which the attribute value in each cell represents the direction in which water would flow out of that cell.
Local operation A spatial operation that computes the new attribute value at a given location from the attribute values at locations within a neighbourhood that comprises a (usually small) subset of the locations for which data are available. (Compare global operation.)
Location-allocation modelling Location-allocation models provide solutions to the problem of optimally locating facilities (such as a trading post) within a catchment. Note that the solutions assume that people always use the nearest facility. (Compare spatial interaction modelling.)
Longitude An angular measurement giving the location of a point on the Earth's surface east or west from the Greenwich Meridian. Both latitude and longitude are required for a complete geographic coordinate.
Low-pass filter A filter designed to remove short-range variability from a map and which is therefore often used to smooth a surface.
Map algebra The application of arithmetic and/or logical operators to one or more raster maps so as to compute new attribute values on a cell-by-cell basis.
Map generalisation The process of adjusting the amount of detail in a map when it is reproduced at a different scale or resolution, so as to avoid the errors that would otherwise arise from ignoring the limitations in accuracy set by the original scale.
Mass points The points used to build a TIN.
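The Map algebra entry describes applying arithmetic or logical operators cell by cell. The sketch below applies any function to corresponding cells of equally sized rasters held as lists of lists. The helper name and the example thresholds in the usage note (elevation below 200 m and distance to water of 500 m or less) are hypothetical.

```python
def map_algebra(func, *rasters):
    """Apply `func` cell by cell to one or more equally sized rasters
    (lists of lists), returning a new raster of the results."""
    nrows, ncols = len(rasters[0]), len(rasters[0][0])
    if any(len(r) != nrows or any(len(row) != ncols for row in r) for r in rasters):
        raise ValueError("all rasters must have the same dimensions")
    return [[func(*(r[i][j] for r in rasters)) for j in range(ncols)]
            for i in range(nrows)]
```

A logical combination such as `map_algebra(lambda e, d: int(e < 200 and d <= 500), elevation, dist_to_water)` yields a binary map of cells meeting both conditions, while `map_algebra(lambda a, b: a + b, m1, m2)` adds two maps.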
Metadata Information about a dataset, usually provided in a standardised format, and typically recording what the dataset contains, its spatial location, when it was created, who created it, how it was created, etc.
MIDAS The Monument Inventory Data Standard, which provides a standardised terminology for the inventory of archaeological and historical features in the UK.
Multiple viewshed map A raster or vector map that codes cells or polygons according to whether or not they fall inside one or more given viewsheds (i.e. the logical union of two or more binary viewshed maps).
NAD 27 The North American Datum 27 is the geodetic datum that was used for much of the twentieth-century mapping of North America. It is now being replaced by NAD 83.
NAD 83 A recent North American geodetic datum that better captures the deviations between the true shape of the Earth and the Clarke ellipsoid on which the earlier NAD 27 was based. (Compare also ETRS 89 and WGS 84.)
Neighbourhood The set of locations used by a spatial operation to compute a new attribute value at a given location. Note that the locations in a neighbourhood may or may not be spatially contiguous. An example of a contiguous neighbourhood commonly used when filtering a raster map is the target cell plus the eight immediately adjacent cells. The term has a different meaning in network analysis where it refers to the set of nodes to which a given node is connected by some specified number of edges.
Network A representation of the connections between some phenomena (such as cities), especially one in which the basic graph is augmented with additional information such as the flow of goods or a turn table.
Node In a vector map a node is a vertex that represents the end of a discrete line (which may be a polyline), or the intersection of one or more discrete lines. Alternatively, in a graph or network, a node is an object that is connected by zero, one or more edges.
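The difference between a Cumulative viewshed map (the arithmetic sum of binary viewsheds) and a Multiple viewshed map (their logical union) can be shown with two small binary rasters. The function names are illustrative, and the binary viewsheds are assumed to have already been computed and coded 1 for visible and 0 for not visible.

```python
def cumulative_viewshed(binary_viewsheds):
    """Arithmetic sum of binary (0/1) viewshed rasters: each cell records
    how many viewpoints can see it."""
    nrows, ncols = len(binary_viewsheds[0]), len(binary_viewsheds[0][0])
    return [[sum(v[i][j] for v in binary_viewsheds) for j in range(ncols)]
            for i in range(nrows)]


def multiple_viewshed(binary_viewsheds):
    """Logical union (OR) of binary viewshed rasters: 1 if at least one
    viewpoint can see the cell, otherwise 0."""
    return [[int(count > 0) for count in row]
            for row in cumulative_viewshed(binary_viewsheds)]
```

For viewsheds [[1, 0], [1, 0]] and [[1, 1], [0, 0]], the cumulative map is [[2, 1], [1, 0]]. The multiple viewshed map, which no longer records how many viewpoints see each cell, is [[1, 1], [1, 0]].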



Normalisation The process of splitting tables in a relational database so that it conforms to a set of 'normal forms' designed to prevent duplication and ensure data integrity.
Northing The distance north of the origin (y-coordinate) in a Cartesian coordinate system. (See also easting.)
Object-orientated A data model and/or associated computer programming paradigm that packages attribute data (both locational and non-locational) and functionality (i.e. behaviour) together into modular units called objects.
Observer offset When computing intervisibility most software allows this extra height to be added to the viewpoint elevation in order to ensure that the line-of-sight is projected from eye level rather than ground level. (Compare target offset.)
ODBC Open DataBase Connectivity is an application programming interface for connecting to database management systems (DBMS). Client software that implements ODBC (e.g. the mail merge facility in a word processor) can access data in a DBMS that also implements ODBC, even if the client software was written in a different programming language and runs on a different operating system.
Open source software The label 'open source' is often used to refer collectively to software released under various licences that permit or even require the distribution of source code, generally at little or no cost to the user. More specifically, open source software is that distributed according to the terms of the Open Source Definition (see www.opensource.org/docs/definition.php).
Orthorectification The process of removing elevation-induced distortion from remote-sensing images. (Compare georectification.)
Overhang A digitising error in which one polygon overlaps another. (Compare sliver.)
Overshoot A digitising error in which two adjacent line segments of what should be a single polyline are not joined at a common vertex, but instead cross before each terminating at an unwanted node. (Compare dangling line.)
Palmtop Small handheld computing devices mostly used as personal organisers, but increasingly able to run a wider range of software including GIS.
Path A sequence of edges connecting two nodes in a graph such that no edge occurs more than once.
Photogrammetry The process of determining three-dimensional coordinates from two or more photographs of the same subject taken from different positions. Photogrammetry is sometimes used to derive elevation data from stereoscopic aerial photographs.
Pit A map cell whose neighbours all have a higher elevation. Pits may or may not accurately represent real terrain features, but either way they are often removed prior to hydrological modelling because they disrupt the calculation of local drainage direction.
Pixel Another term for cell. Some GIS practitioners prefer the use of pixel for raw raster images and cell for fully georeferenced raster maps.
Plan convexity A second-order derivative of a surface that measures the amount of curvature in the horizontal plane. If the surface is an elevation model, then positive plan convexity indicates a location that is on a convex surface (perhaps a hill), while negative plan convexity indicates a location that is on a concave surface (perhaps a natural basin). (Compare profile convexity.)
Plotter A device for printing maps, particularly one designed to print on very large sheets of paper (larger than A3) by literally drawing with a 'pen'.
Polar coordinate Polar coordinates use a distance from a fixed point and one or more angles (depending on how many dimensions are relevant) to define a location. In contrast, Cartesian coordinates use only distances along one or more orthogonal axes. Geographic coordinates are essentially an abbreviated form of polar coordinate in which only the angles (latitude and longitude) are given because it is assumed that all locations of interest lie (approximately) on the ellipsoid, from which it follows that the distance from the centre of the Earth is redundant information.
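The Pit entry (a cell whose neighbours all have a higher elevation) translates directly into a scan of each cell's eight neighbours. In this sketch the function name is illustrative, and skipping edge cells (whose neighbourhoods are incomplete) is a simplifying assumption. Software that removes pits before hydrological modelling may treat map edges differently.

```python
def find_pits(z):
    """Return (row, col) for every interior cell whose eight neighbours all
    have a higher elevation. Edge cells are skipped (assumption) because
    their neighbourhood is incomplete."""
    pits = []
    for r in range(1, len(z) - 1):
        for c in range(1, len(z[0]) - 1):
            neighbours = [z[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if all(n > z[r][c] for n in neighbours):
                pits.append((r, c))
    return pits
```

Only strictly lower cells count as pits here, so a flat-bottomed depression of several equal cells would not be flagged. This is one reason why practical pit-filling routines are more elaborate.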



Point A zero-dimensional geometric primitive that is represented by a single vertex in a vector map or (rather imperfectly) a single cell in a raster map. In reality, many point geographic features are not zero-dimensional, but they are represented as such because their exact spatial extent is not relevant and/or is too small to be recorded given the resolution of the digital data.
Point digitising A means of using a digitising tablet or of heads-up digitising in which the puck or mouse is used to place vertices manually on the map object. (Compare stream digitising.)
Point operation A point operation computes the new attribute value at a given location without reference to the attribute value at any other location. It may, however, involve the use of attribute values at the same location in other maps. (Compare spatial operation.)
Polygon A vector object used to represent an areal geographic feature.
Polygon overlay A type of spatial query that is used to determine the area that is defined by the intersection of two or more polygons.
Polyline A vector object comprising two or more connected straight lines which together represent a curving linear geographic feature. The vertices at which a polyline terminates are often referred to as nodes.
Pour point Computation of a watershed proceeds by following a local drainage direction map upstream from a specified pour point on a watercourse.
Precision Consider taking the mean of several measurements to estimate some attribute, for example, soil pH. The estimate would be precise if the spread of measured values was narrow, even if the mean was not close to the true value. (Compare accuracy.)
Predictive model Usually a mathematically defined function, such as a logistic regression equation, that relates the probability (or odds) of site presence to one or more spatially varying attributes such as elevation, proximity to water, etc.
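The contrast between the Accuracy and Precision entries can be expressed numerically. The distance of the mean from the true value (bias) reflects accuracy, while the spread of the measurements, summarised here by the sample standard deviation, reflects precision. Choosing these two particular summaries, and the function name, is an assumption of this sketch.

```python
from statistics import mean, stdev


def accuracy_and_precision(measurements, true_value):
    """Summarise repeated measurements of one attribute (e.g. soil pH).
    Returns (bias, spread): |mean - true value| reflects accuracy and the
    sample standard deviation reflects precision; smaller is better for both."""
    return abs(mean(measurements) - true_value), stdev(measurements)
```

Readings of 5, 7, 6, 8 and 4 against a true pH of 6 are accurate (no bias) but imprecise (standard deviation of about 1.58). Readings of 7.0, 7.1, 6.9 and 7.0 are precise but inaccurate, with a bias of 1.0.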
Predictive models are often presented to the end-user in the form of a raster map in which the attribute value of each cell represents the probability that archaeological evidence will be found in that location.
Primary data Data collected by actual measurement of the phenomenon in question, for example a plan of field boundaries created by field survey. (Compare secondary data.)
Profile convexity A second-order derivative of a surface that measures the amount of curvature in the vertical plane. If the surface is an elevation model, then positive profile convexity indicates a location that is on a convex surface (perhaps the side of a hill), while negative profile convexity indicates a location that is on a concave surface (perhaps the side of a valley). (Compare plan convexity.)
Projection The transformation required to represent the curved surface of the Earth on a flat plane, such as a map. Projections are grouped into families according to whether they first project the surface of the Earth onto the surface of a cylinder (cylindrical projection), cone (conic projection) or flat plane (azimuthal projection). They also fall into groups according to how they manage the distortion caused by the transformation from a curved surface to a flat plane, i.e. whether they preserve the orthogonal layout of the lines of latitude and longitude (conformal projection), the area of a geographic feature (equal-area projection), the distance between two points (equidistant projection) or the direction of a line drawn between two points (true-direction projection).
Proportional symbol map A map that uses the size of a symbol to represent the attribute value at a specific location.
Proximal tolerance When creating a TIN, the minimum horizontal distance between vertices used to create the tessellation. (Compare weed tolerance.)
Puck An electronic pointing device used to record the location of points of interest on a map placed on a digitising tablet.
Quadrat A rectangular area used as a sampling unit during fieldwork. (Compare tract.)
Quadtree A method used to reduce the storage requirements of a raster map by means of a hierarchical tessellation.
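A minimal sketch of quadtree encoding follows, assuming a square raster, stored as a list of lists, whose side is a power of two. Any block whose cells all share one value is stored as a single leaf; otherwise it is split into four quadrants (ordered NW, NE, SW, SE here, an arbitrary choice). Counting the leaves gives a rough idea of the storage saved.

```python
def build_quadtree(grid, r=0, c=0, size=None):
    """Encode a square raster (side a power of two) as a quadtree.
    A block of identical values becomes a leaf holding that value;
    otherwise it becomes a list of four child quadrants (NW, NE, SW, SE)."""
    if size is None:
        size = len(grid)
        if size == 0 or size & (size - 1):
            raise ValueError("raster side must be a power of two")
    first = grid[r][c]
    if all(grid[i][j] == first
           for i in range(r, r + size)
           for j in range(c, c + size)):
        return first
    half = size // 2
    return [build_quadtree(grid, r, c, half),
            build_quadtree(grid, r, c + half, half),
            build_quadtree(grid, r + half, c, half),
            build_quadtree(grid, r + half, c + half, half)]


def count_leaves(node):
    """Number of stored values (leaves) in a quadtree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1]]
tree = build_quadtree(grid)
print(tree)                # [1, 0, 0, [0, 0, 0, 1]]
print(count_leaves(tree))  # 7 leaves instead of 16 cells
```

The saving depends entirely on how homogeneous the map is: a raster in which every cell differs from its neighbours gains nothing from this encoding.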



Query The selection of geographic features according to their attribute values. (Compare spatial query.)
Raster map A raster map represents spatial data using a grid of equally sized cells or pixels. Each cell contains a value recording some attribute at that location. The geographic location of each cell is usually calculated from information provided in the map header.
Reciprocity Intervisibility is said to be reciprocal if location A is computed to be visible from location B, and B is also computed to be visible from A. Note that intervisibility is not necessarily reciprocal if the observer offset is different from the target offset.
Reclassification The process of simplifying a dataset by altering the classes or categories used to record attribute values.
Relational database A database made up of separate tables such that all records in a given table have all the attributes that appear as fields (normally columns) in that table. (Compare flat file.)
Resolution GIS have display tools that allow digital data to be viewed and printed at a variety of scales. Consequently, the traditional concept of scale does not measure an intrinsic property of a digital dataset. Instead it is often helpful to think in terms of the resolution of the dataset, that is, the smallest geographic features that it distinguishes.
Ridge Local elevation maxima that define the upstream edges of watersheds.
RMSE The Root Mean Square Error of a transformation (e.g. during georectification) is a quantitative measure of the goodness-of-fit between the desired and actual locations.
Run length compression A method used to reduce the storage requirements of a raster map by eliminating contiguous duplicate numbers from a sequence.
Scale The ratio between a distance on a map and the distance that it represents on the Earth's surface. Large-scale maps have a small ratio and usually cover small areas in detail, whereas small-scale maps have a large ratio and usually cover large areas in less detail.
(Compare resolution; not to be confused with the statistical usage of scale.)
Scanner A device used to capture a raster image of something (e.g. a paper map) by moving a sensor over it.
Secondary data Data acquired from an existing record, for example a plan of field boundaries created by digitising a paper map. (Compare primary data.)
Sensitivity analysis The process of repeating an analysis with different parameter values or different samples in order to gauge how much confidence can be placed in the results.
Server A computer that provides a service, such as the authentication of legitimate users, to another computer to which it is networked. Alternatively, a computer software package, such as a DBMS, that stores data that is accessed and possibly manipulated using other software running on the same or a different computer. (Compare client.)
Shaded relief map A map (usually of elevation) shaded so as to give it a three-dimensional appearance, as if illuminated by low sunlight.
Site catchment The geographical region exploited and/or controlled by the inhabitants of a site.
Sliver A digitising error in which there is a small gap between two polygons that should share a common boundary. (Compare overhang.)
Slope The maximum rate of change of the elevation at a given location.
Smoothing The application of a low-pass filter.
Snap Many digitising programs have a snap feature that automatically finds and connects vertices within a user-defined radius in order to help prevent dangling lines and overshoots.
Space syntax A theory and collection of methods for studying the human use of space in the built environment that together place great emphasis on the importance of the 'configuration' of (i.e. connections between) spaces.
Spatial interaction modelling Unlike location allocation models, spatial interaction models allow the probabilistic allocation of demand when attempting to locate facilities optimally.
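The Slope entry can be illustrated with a short Python function. It estimates the two partial derivatives of elevation by central differences across the four orthogonal neighbours and converts the resulting maximum gradient into degrees. This is a simplifying assumption: many GIS packages instead use a weighted 3 × 3 kernel (for example Horn's method), so their results will differ slightly.

```python
import math


def slope_degrees(dem, r, c, cell_size):
    """Slope in degrees at interior cell (r, c) of a list-of-lists DEM.
    Partial derivatives are estimated by central differences (an assumed,
    simplified method); cell_size is in the same units as elevation."""
    dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
    dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
    max_gradient = math.hypot(dz_dx, dz_dy)  # maximum rate of change (rise / run)
    return math.degrees(math.atan(max_gradient))


# A plane rising 10 m per 10 m cell towards the east
dem = [[0, 10, 20],
       [0, 10, 20],
       [0, 10, 20]]
print(round(slope_degrees(dem, 1, 1, 10), 1))  # 45.0
```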



Spatial operation A spatial operation computes the new attribute value at a given location from the attribute values at one or more other locations in the same map. (Compare point operation; see also neighbourhood.)
Spatial query The selection of geographic features according to their spatial properties as well as, or instead of, their attribute values. (See also buffer, polygon overlay and query.)
Spline An interpolation method that joins together a number of polynomial functions to describe a smooth surface that passes through all the attribute values at the sampled locations.
Spot height An elevation value associated with a single point rather than a contour, often but not necessarily recording a local minimum or maximum.
Spreading function The function used to generate an accumulated cost surface from a cost-of-passage map.
SQL Structured Query Language provides the industry-standard mechanism for users to create, modify and select entries in a relational database.
Stream channel map A map depicting the location of watercourses. Many stream channel maps are created by hydrological modelling rather than survey.
Stream digitising A means of using a digitising tablet or of heads-up digitising in which vertices are placed automatically while the puck or mouse is traced over the map object. (Compare point digitising.)
Stream order A classification of a watercourse according to some attribute such as how many tributaries it has or its location within the drainage network.
SVG Scalable Vector Graphics is an extension of XML that provides a non-proprietary standard for describing two-dimensional vector graphics in a manner that allows them to be rendered by suitably enabled web browsers and other applications.
Target offset When computing intervisibility, some software allows extra height to be added to each cell that is being viewed in order to ensure that the line-of-sight is projected towards the relevant object (such as the top of a signal station) rather than ground level.
(Compare observer offset.)
Tension A parameter that can be used to increase the roughness of a surface interpolated using splines so that it better fits the attribute values at the sampled locations.
Thematic map A map whose content is limited to a single subject, such as soil, geology or historic places. (Compare topographic map.)
Thiessen polygon A polygon that contains exactly one of a set of points and that also contains all the space that is closer to that point than to any other in the set.
TIN A Triangulated Irregular Network is a vector DEM that represents the land surface as a tessellation of triangular polygons (actually a Delaunay triangulation).
Topographic map A map that provides general information about features of the Earth's surface. Unlike thematic maps, topographic maps include information about several subjects, usually including elevation, watercourses, landcover, buildings, roads, etc.
TOPOGRID A proprietary method of interpolation developed by ESRI to produce hydrologically correct DEMs from contour data, spot heights and, preferably, additional hydrological data.
Topology The branch of geometry concerned with those spatial properties of an object that remain unchanged when the object is subject to continuous distortion (i.e. stretching or knotting). Consequently, the topological aspects of a geographic feature include whether it directly abuts, contains or is contained by another feature, but not its shape, size or geographic location.
Total station An electronic survey instrument that is able to record horizontal and vertical angles and linear distances from itself to a target and then automatically convert this data into eastings, northings and elevation values.
Tract An area used as a sampling unit during fieldwork. Whereas quadrats are defined by a grid, tracts are often irregular in shape and at least partially defined by features such as field boundaries.
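The Spreading function entry, and the accumulated cost surface it produces, can be sketched with Dijkstra's shortest-path algorithm over a raster. The cost-of-passage map is assumed to be a list of lists, and moving between two 8-connected cells is assumed to cost the mean of their two values multiplied by the step length (1 for orthogonal and √2 for diagonal moves, in cell units). GIS packages differ in their neighbourhoods and cost conventions, so this is an illustration rather than a reproduction of any particular program.

```python
import heapq
import math


def accumulated_cost(cost, source):
    """Accumulated cost surface spreading from source (row, col).
    Assumed convention: a move between 8-connected cells costs the mean of the
    two cells' cost-of-passage values times the step length in cell units."""
    rows, cols = len(cost), len(cost[0])
    acc = [[math.inf] * cols for _ in range(rows)]
    sr, sc = source
    acc[sr][sc] = 0.0
    queue = [(0.0, sr, sc)]
    while queue:
        d, r, c = heapq.heappop(queue)
        if d > acc[r][c]:
            continue  # a cheaper route to this cell was already found
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                step = math.sqrt(2) if dr and dc else 1.0
                nd = d + step * (cost[r][c] + cost[nr][nc]) / 2
                if nd < acc[nr][nc]:
                    acc[nr][nc] = nd
                    heapq.heappush(queue, (nd, nr, nc))
    return acc


# A costly obstacle (e.g. a marsh) in the centre of a 3 x 3 raster
cost = [[1, 1, 1],
        [1, 100, 1],
        [1, 1, 1]]
surface = accumulated_cost(cost, (0, 0))
print(round(surface[2][2], 3))  # 3.414, i.e. 2 + sqrt(2), routing round the obstacle
```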



Transportation network A network that represents the infrastructure of a transport system such as a road or rail network. It will typically record the actual geographic location of edges (e.g. the bends in roads), the nature of the connections between edges at nodes (e.g. prohibited turns) and possibly also the properties of edges (e.g. road width). (See also turn table.)
Travelling-salesman problem The problem of finding the least-cost path through a set of nodes in a network so that each node is visited exactly once.
Trend surface analysis The process of fitting a mathematically defined two-dimensional surface to a set of points. Trend surface analysis is occasionally used as a method of global interpolation, but more often to model a spatial process.
Turn table A table used to specify whether it is actually possible on the ground to move between each pair of edges joining at a node in a transportation network.
Unconstrained interpolation The minimum and maximum values in a model generated by unconstrained interpolation may be less than or greater than the minimum and maximum values present in the original sample. (Compare constrained interpolation.)
USLE The Universal Soil Loss Equation links observed rates of soil loss at a location with the erosivity of the rainfall and various other topographic attributes such as slope and landcover. Unlike AGNPS, it does not actually model the process of erosion.
UTM The Universal Transverse Mercator projection is a cylindrical projection that divides the world into 60 vertical zones, each 6° of longitude wide. A grid is superimposed on each zone, making it possible to specify a geographic location using UTM coordinates, which comprise the zone, an easting and a northing.
Vector map A vector map represents spatial data using points, lines and polygons, which are in turn stored as lists of coordinates and possibly also topological information and/or a link to a database containing attribute values.
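For a handful of nodes the Travelling-salesman problem can be solved by brute force, as in the sketch below: every ordering of the nodes is tried and the cheapest is kept. The distance matrix and the `return_to_start` flag are assumptions for illustration (the classic formulation returns to the origin, whereas the definition above does not require it). Because the number of orderings grows factorially, network software relies on heuristics for anything but very small problems.

```python
from itertools import permutations


def least_cost_route(dist, start=0, return_to_start=False):
    """Exhaustively search every order of visiting the nodes of a network.
    dist[a][b] is the cost of travelling from node a to node b."""
    others = [n for n in range(len(dist)) if n != start]
    best_cost, best_route = float("inf"), None
    for order in permutations(others):
        route = [start, *order]
        if return_to_start:
            route.append(start)
        total = sum(dist[a][b] for a, b in zip(route, route[1:]))
        if total < best_cost:
            best_cost, best_route = total, route
    return best_cost, best_route


# Four sites on a straight track, one unit apart (hypothetical data)
dist = [[abs(a - b) for b in range(4)] for a in range(4)]
print(least_cost_route(dist))                        # (3, [0, 1, 2, 3])
print(least_cost_route(dist, return_to_start=True))  # (6, [0, 1, 2, 3, 0])
```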
Vertex An element in a vector map that represents an isolated point, the beginning or end of a line (including lines that are part of a polyline), or the corner of a polygon. (Compare node.)
Viewshed The set of locations that are intervisible with a given viewpoint.
Visibility graph A graph in which nodes representing geographic locations are linked by edges if they are reciprocally intervisible.
Voronoi diagram Another name for Dirichlet tessellation.
VR Virtual reality is the computer simulation of an environment. Whereas 3D GIS provides for the storage, manipulation and display of data recorded in three dimensions, virtual reality provides ways of actually interacting with three-dimensional objects.
Watershed The area from which water drains to a specified point (the pour point) on a watercourse.
Web-CD Electronic publication of data on a CD-ROM in which hyperlinks provide immediate cross-referencing between maps, plans, illustrations, tables, descriptive text, etc.
Web GIS Use of the Internet to publish digital spatial data in a way that affords users at least some of the functionality of conventional GIS. (See interactive map.)
Weed tolerance When building a TIN, the minimum distance along a contour before a vertex is used to create the tessellation. (Compare proximal tolerance.)
WGS 84 The World Geodetic System 1984 is a geodetic datum that can be used for GPS surveying anywhere in the world. (Compare NAD 27, NAD 83 and ETRS 89.)
XML Extensible Markup Language is a general-purpose, self-documenting markup language capable of describing many different kinds of data. It is particularly useful for storing metadata and for sharing data between systems connected via the Internet.
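The interplay of observer offset, target offset and reciprocity can be shown with a single line-of-sight test along an elevation profile. The sketch assumes a flat Earth with no atmospheric refraction and ground heights sampled at equal spacing; the target counts as visible if no intermediate point rises above the straight line from the observer's eye to the target. Repeating such a test from one viewpoint to every cell is, in essence, how a viewshed is computed.

```python
def is_visible(profile, spacing, observer_offset=0.0, target_offset=0.0):
    """True if the last point of an elevation profile is visible from the first.
    Assumes ground heights sampled every `spacing` units along the sight line,
    a flat Earth and no atmospheric refraction."""
    eye = profile[0] + observer_offset
    target = profile[-1] + target_offset
    sight_gradient = (target - eye) / ((len(profile) - 1) * spacing)
    for i in range(1, len(profile) - 1):
        # Terrain that rises above the eye-to-target line blocks the view
        if (profile[i] - eye) / (i * spacing) > sight_gradient:
            return False
    return True


print(is_visible([0, 2, 0, 0, 0], 1, observer_offset=4.0))  # True
print(is_visible([0, 0, 0, 2, 0], 1, observer_offset=4.0))  # False: not reciprocal
```

The second call looks back along the same ground profile from the other end with the same 4 m observer offset and no target offset, so the two locations are not reciprocally intervisible.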

Statistical terms
Autocorrelation In statistics, autocorrelation means that observations are not independent, such that each observation tends to be similar to its neighbours. Spatial autocorrelation thus refers to a situation where the difference in attribute value between any two points is correlated with the distance between those points. The correlation may be positive (i.e. attribute values are more similar the closer two points are together) or negative (i.e. values are more similar the farther away two points are from each other).
Binary connectivity A measure of the topological relationship between a set of n objects. Typically calculated in an n × n matrix, such that each pair of objects is recorded as either being connected (1) or not (0).
Case-control Refers to a form of predictive modelling in which the location of sites is known, but the location of non-sites is unknown. The result is consequently an odds, rather than an absolute probability, measure for site presence.
Chi-squared test A commonly used non-parametric statistical test for independence. Often used to test for independence in data arranged in a contingency table.
Class A subset of observations or objects produced by the process of classification.
Classification The process of dividing a set of observations or objects into two or more subsets (i.e. classes) based on their qualitative or quantitative (numerical) properties, e.g. through the application of cluster analysis. Commonly used numerical classification methods for choropleth mapping include standard deviation (elements are grouped according to their distribution around the mean); quantile (classes contain equal numbers of elements, as in quartiles where each class possesses 25 per cent of the elements); natural break (where class boundaries are defined at 'natural' intervals based on the distribution pattern of the data, as found by routines such as Jenks's optimisation); equal interval (classes defined at equal ranges of values); and geometric interval (class boundaries change systematically to increase class width in skewed distributions).
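Two of the numerical classification methods listed under Classification are simple enough to sketch directly. The functions below return the upper limit of each class for equal interval and quantile classification. They are illustrative assumptions (for instance, the quantile limits are taken straight from the sorted data without interpolation), and GIS packages may compute the limits slightly differently.

```python
import math


def equal_interval_breaks(values, k):
    """Upper limits of k classes of equal width between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper limits of k classes holding (roughly) equal numbers of elements."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[math.ceil(n * i / k) - 1] for i in range(1, k + 1)]


# A skewed set of values
values = [1, 1, 2, 2, 3, 4, 8, 20]
print(equal_interval_breaks(values, 4))  # [5.75, 10.5, 15.25, 20.0]
print(quantile_breaks(values, 4))        # [2, 2, 4, 20]
```

With these skewed values, equal interval classification puts six of the eight elements into its first class, whereas quantile classification spreads them two per class; this contrast is one reason the choice of method matters for choropleth maps.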
Cluster analysis A type of classification that involves creating groups (through a variety of cluster definition methods) of elements based on the elements' spatial location or some other attribute(s), such as size or morphology.
Cluster definition A cluster analysis may take many forms, although the more common are hierarchical methods, in which small clusters with few elements may form a smaller number of larger groups; density methods, in which clusters are defined around high-density areas (e.g. of points, or of attribute values in a continuous distribution); and partitioning methods, in which elements are sequentially placed into one of two or more groups by iteratively ascertaining the group to which each object is most similar.
Coefficient of determination A statistic, r², that expresses the amount of dependence between two variables, usually labelled x and y. It can range from 0 (i.e. no dependency) to 1 (i.e. complete dependency); because it is a squared value it does not indicate whether the dependency is positive or negative.
Coefficient of variation A statistic that expresses the variability in a distribution. It is usually expressed as a percentage and calculated as: CV = (standard deviation / mean) × 100.
Co-kriging A type of kriging that uses two or more variables (and thus two or more semivariograms) for establishing the parameters for an interpolation.
Confounding Two variables are said to be confounded when they are not independent, and it is thus difficult to ascertain which variable is responsible for the observed effect. For example, in predictive modelling, high elevation and large viewshed area might influence site presence, but these two variables are confounded because they are not independent of each other (i.e. higher elevations tend to have larger viewshed areas), and thus it is difficult to determine which variable is actually influencing site presence.
Contingency table A table of counts constructed by classifying objects by two or more nominal-scale or ordinal-scale variables.
For example, a distribution of sites may be classified into slope class (e.g. low, medium or high) and date (e.g. Palaeoindian, Archaic, Woodland). Measures of association in contingency tables may be determined by the chi-squared test.
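The chi-squared statistic for a contingency table compares each observed count with the count expected if the two variables were independent (row total × column total / grand total). The sketch below computes the statistic and its degrees of freedom from a table given as a list of rows of counts. Turning the statistic into a probability needs the chi-squared distribution, which libraries such as SciPy provide (`scipy.stats.chi2_contingency`, which by default applies Yates's continuity correction to 2 × 2 tables).

```python
def chi_squared(table):
    """Pearson's chi-squared statistic and degrees of freedom for a
    contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are slope classes (low, high), columns two periods
table = [[10, 20],
         [20, 10]]
statistic, dof = chi_squared(table)
print(round(statistic, 3), dof)  # 6.667 1
```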



Correlation Two variables are said to be correlated if the values of one variable predictably change with the value of the other variable. If both increase together, then they are positively correlated; if one decreases as the other increases, then they are negatively correlated. The strength of the relationship can be measured by a correlation coefficient.
Correlation coefficient A statistic that measures the degree of relationship between two variables, usually labelled x and y. The Pearson Product-Moment Correlation Coefficient is the most commonly used, and is denoted by r.
Density analysis (intensity analysis) A set of methods for measuring the changing density of an attribute value (such as the number of points, or a population) over a given area.
Dependent variable A variable that is influenced by, or is thought to be influenced by, an independent variable. For example, the dependent variable 'site location' may be influenced by the independent variable 'soil type'.
Design variable In the context of predictive modelling, a design variable is a numeric variable that is derived from a nominal-scale variable.
Distribution A measure of the relative frequency of occurrence (i.e. probability of occurrence) of values from a number of observations of a single variable. If values are distributed equally around the mean (i.e. mean ≈ mode ≈ median) then the distribution is symmetrical, as in the normal distribution (i.e. a 'bell curve', or Gaussian distribution). A skewed distribution occurs when values are not equally distributed around the mean. More specifically, if mode > mean (i.e. small numbers of low values) then the distribution is said to be negatively skewed; if mode < mean (i.e. small numbers of high values) then the distribution is said to be positively skewed.
Diversity The number of different values in a cell's neighbourhood.
Function A relation (e.g. an equation) in which each element of a set is associated with another element of the same or another set. For example, y = 3x is a function.
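A minimal sketch of Pearson's product-moment correlation coefficient r for two equal-length lists of numbers; squaring the result gives the coefficient of determination r². The example values are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r([1, 2, 3, 4], [2, 4, 5, 9])
print(round(r, 3), round(r ** 2, 3))  # 0.965 0.931
```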
Geostatistics Statistical methods that deal with the spatial relationships between objects.
Getis's Gi statistic A statistical test for determining whether a location and its surrounding regions form a cluster of higher (or lower) than average attribute values.
Heteroscedastic Describes a situation where errors in an independent variable are drawn from different distributions (i.e. two or more separate processes are being expressed). This may be identified by the presence of clustering in an x, y-plot. (Compare homoscedastic.)
Homoscedastic A situation when errors are drawn from the same distribution for all independent variables. (Compare heteroscedastic.)
Independent variable The variable (normally denoted by x) that influences the dependent variable (normally denoted by y).
Interaction The description, analysis or prediction of flow (i.e. interaction) between defined regions or centres (e.g. between one population centre and another).
Intercept The point at which a line of regression intersects the y axis.
Interdependence Describes a relationship between two variables in which it is difficult or impossible to identify which is dependent and which is independent. (Compare confounding.)
Jenks's optimisation A statistical method for devising 'natural-break' class boundaries (see classification) by minimising the sum of the squared deviations from class means.
k-Means analysis A commonly used method of cluster analysis that uses a partitioning method for cluster definition.
Kolmogorov-Smirnov A commonly used non-parametric statistical test used to determine whether two frequency (probability) distributions differ from each other.
Kriging A form of interpolation that relies on geostatistics to calculate the distance weighting of surrounding values.
Lag The distance between pairs of points in a spatial distribution (usually grouped into lag intervals for the purposes of constructing a variogram).
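The criterion behind Jenks's optimisation can be illustrated by brute force: sort the values, try every way of cutting them into k contiguous classes, and keep the partition with the smallest sum of squared deviations from class means (SDCM). This exhaustive search is only practical for small datasets and is not the iterative procedure used in GIS software; it simply shows what that procedure is trying to minimise.

```python
from itertools import combinations


def sdcm(classes):
    """Sum of squared deviations from class means."""
    total = 0.0
    for cls in classes:
        mean = sum(cls) / len(cls)
        total += sum((v - mean) ** 2 for v in cls)
    return total


def natural_breaks(values, k):
    """Exhaustively find the k contiguous classes of the sorted values
    with the smallest SDCM (only feasible for small datasets)."""
    ordered = sorted(values)
    best_score, best_classes = float("inf"), None
    for cuts in combinations(range(1, len(ordered)), k - 1):
        bounds = [0, *cuts, len(ordered)]
        classes = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        score = sdcm(classes)
        if score < best_score:
            best_score, best_classes = score, classes
    return best_classes


print(natural_breaks([12, 1, 31, 3, 10, 2, 30, 11], 3))
# [[1, 2, 3], [10, 11, 12], [30, 31]]
```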



Line of regression A straight line through a distribution of x, y variables that is positioned to minimise the sum of the squared vertical distances. The line has two regression constants, a y-intercept and a slope, which are used to derive the linear equation.
Mann-Whitney U test A commonly used non-parametric statistical test used to determine whether ordinal- or higher-scale data are drawn from a single population.
Monte-Carlo simulation A statistical technique used to reduce the uncertainty inherent in random samples by taking repeated random samples (typically 1000 or more). The distribution of values of some statistic (such as the sample mean) can then be examined in order to obtain a better estimate of the population mean.
Multivariate More than one variable, as in multivariate statistics.
Nearest neighbour analysis A simple method for determining whether a set of points in space tends towards a regular, random or clustered spatial pattern.
Non-parametric test A type of statistical test (such as the chi-squared test) that does not require the distribution of values to conform to a specific shape. (Compare parametric.)
Null hypothesis An explanation that is being statistically tested, usually expressed as either 'no difference' (e.g. between samples) or 'random', and denoted by H0. If H0 is rejected, then the alternate hypothesis (H1) is favoured (usually expressed as a 'significant difference' between samples, or 'non-randomness').
One-sample test A statistical test in which an observed pattern is compared to a (random) sample of independent observations.
Outlier An unusually high or low value relative to other values in a set of data. Outliers may occur because of human error (e.g. in observation or in data entry), because the measurement comes from a different population, or because the value is recording a rare event in a single population.
Parametric test A type of statistical test (such as Student's t) that requires a certain shape of distribution (typically a normal distribution).
(Compare non-parametric.)
Partitioning Around Medoids (PAM) A method of cluster analysis that is similar to, but more robust than, k-means.
Point distribution A set of points in space. The arrangement of points may be described as tending towards randomness (i.e. each point is independent of every other point), clustered or regular in distribution. See also nearest neighbour and Ripley's K.
Population The set of entities about which statistical inferences are to be made, typically by drawing a sample from the population.
Predictive gain A measure of the utility of a predictive model which, when calculated for a specified probability of site occurrence, ranges from 1 (high predictive utility) through 0 (no predictive utility) to negative values (the model predicts the reverse of what it is supposed to).
Range The difference between the highest and lowest values in a sample of data (such as a neighbourhood of cells).
Regression analysis A statistical technique that, in the simplest sense, finds the line or curve that best fits a set of x, y data points. More specifically, regression analysis determines the values of parameters (e.g. the regression constants) of the function that describes the best-fit line or curve. In linear regression this function describes a straight line; non-linear regression analysis defines the shape of a curved line. Logistic regression analysis attempts to fit an S-shaped (sigmoidal) curve to a range between 0 and 1 to minimise the frequency of intermediate values; it is thus ideal for predictive modelling where 0 represents the absence of a site, and 1 the presence of a site.
Regression constants The y-intercept and slope of a line in a linear equation, or the potentially several parameters in a non-linear equation.
Residual The observable difference between an actual value and its predicted value in a sample. For example, in linear regression analysis the residual is the difference between the actual dependent (y) value and the value predicted by the line of regression.
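Nearest neighbour analysis is commonly summarised by Clark and Evans's index R: the observed mean distance from each point to its nearest neighbour, divided by the mean expected for a random pattern of the same density, 0.5 / √(n / A). Values near 1 suggest randomness, values below 1 clustering and values above 1 regularity. The sketch below assumes points given as (x, y) tuples and a known study area A, uses `math.dist` (Python 3.8 or later), and applies no edge correction, which in practice can bias the result.

```python
import math


def nearest_neighbour_index(points, area):
    """Clark and Evans's R for a list of (x, y) points in a study area of
    the given size. No edge correction is applied."""
    n = len(points)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)  # mean for a random pattern
    return observed / expected


grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(nearest_neighbour_index(grid, area=4))  # 2.0 (regular)
```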



Ripley's K-function A function used to compare an observed point pattern against a theoretical random distribution at a range of spatial scales in order to ascertain whether the observed pattern is significantly more clustered or more regularly distributed than expected.
Sample A set of entities drawn from a population from which inferences about the population are to be made.
Scale In statistical terms, scale refers to the type of data. Nominal-scale data are classificatory values that have no inherent order or distance between them (e.g. 'beaker', 'bowl', 'plate'); ordinal-scale data are categories that can be ordered, but distances between them are not measurable (e.g. 'small', 'medium', 'big'); interval data have the properties of an ordinal scale, but also the property of known distances between the categories (e.g. 1000 BCE, 500 BCE, 60 BCE); ratio-scale data have the properties of interval data, but have a fixed zero point (e.g. 1000 m, 500 m, 60 m).
Semi-variance Semi-variance is a spatial statistic that estimates the degree of correlation between the attribute values of points and the distance between them (i.e. spatial autocorrelation).
Significance level The probability level at which a null hypothesis is rejected in favour of the alternate hypothesis in a statistical test, denoted either by α or p. Significance levels are usually set at no more than 10 per cent (α = 0.1), most often 5 per cent (α = 0.05), but at times 1 per cent (α = 0.01) or less. As the significance level moves from large to small, the probability of making a type 1 error decreases and the probability of making a type 2 error increases.
Slope In statistical terms, slope refers to the gradient of a line in a regression analysis and is one of the regression constants.
Spatial analysis The set of statistical and related methods (e.g. visualisation techniques) used for the analysis of spatial data.
Spatial modelling The set of statistical and related methods (e.g.
viewshed analysis) used to understand the spatial relationships between different phenomena (e.g. between archaeological sites and their environmental correlates).
Spatial regression A geostatistical method of regression analysis that incorporates spatial data as a variable to reduce the errors that arise when performing standard regression on data that are known or suspected to be spatially autocorrelated.
Spatial statistics The set of statistics that are concerned with the description and/or analysis of spatial phenomena.
Student's t A statistical distribution defined by W. S. Gosset in 1908 that is used in Student's t-test to ascertain whether the means of two normally distributed samples are drawn from the same population.
Testing sample A sample of sites (typically 50 per cent) withheld from the sample of all known sites that is used to test the accuracy of a predictive model.
Training sample A sample of sites (typically 50 per cent) selected from the sample of all known sites that is used to develop a predictive model.
Two-sample test A statistical test between two samples drawn from the same (or suspected same) population.
Type 1 error Incorrectly rejecting a null hypothesis.
Type 2 error Incorrectly accepting a null hypothesis.
Validation The testing of a model to determine its accuracy.
Variance A measure of the distribution around the mean of a sample of numbers.
Variogram A plot of semi-variance against distance between points (i.e. the lag, denoted by h) is called a variogram and is used to estimate the parameters necessary for interpolation by kriging. The plot computed from the data is the experimental variogram; a model variogram is a known function, usually based on a linear, spherical or Gaussian model, that defines the best-fit line through the points of the experimental variogram.
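The Semi-variance and Variogram entries can be illustrated with an experimental semivariogram. For each lag interval, the semi-variance is half the mean squared difference between the values of all point pairs whose separation falls in that interval, γ(h) = Σ(zᵢ − zⱼ)² / 2N(h). The sketch assumes points given as (x, y) tuples with a parallel list of values and equal-width lag bins; fitting a model variogram to its output is left to dedicated geostatistical software.

```python
import math


def semivariogram(points, values, lag_width, n_lags):
    """Experimental semivariogram: for each lag bin, gamma = sum of squared
    value differences over the N point pairs in the bin, divided by 2N.
    Returns (bin centre, gamma) for every non-empty bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            k = int(math.dist(points[i], points[j]) // lag_width)
            if k < n_lags:  # ignore pairs beyond the last lag bin
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / (2 * counts[k]))
            for k in range(n_lags) if counts[k]]


points = [(0, 0), (1, 0), (2, 0)]
print(semivariogram(points, [1, 2, 4], lag_width=1.5, n_lags=2))
# [(0.75, 1.25), (2.25, 4.5)]
```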

Remote-sensing terms


Active sensor A remote-sensing system that transmits energy and reads how it is reflected back from phenomena. (Compare passive sensor.)
Colour composite An image that resembles a colour image, but has been created by combining information from individually captured spectral bands from a multispectral sensor. Also called a false colour composite when the colours do not resemble what would be seen by the naked eye (e.g. when spectral bands that are strongly sensitive to the presence of vegetation are not coloured green).
Contrast stretch A mathematical technique used to expand a narrow range of pixel values in a digital image by 'stretching' the values to the upper and lower limits permitted (typically between 0 and 255).
GPR Ground Penetrating Radar is an active sensor remote-sensing tool, usually handheld or on a small wheeled trolley, that transmits electromagnetic radiation (typically somewhere between 1.5 GHz and 100 MHz) beneath the ground and reads the reflected signal to detect subsurface features (such as burials or building foundations).
Ground truthing An essential stage in image classification in which areas defined by supervised classification or unsupervised classification are physically checked for accuracy by visiting the area in question.
Hyperspectral See multispectral.
IFSAR Interferometric Synthetic Aperture Radar: an active sensor mounted on spacecraft (such as the Space Shuttle) or aircraft to collect elevation data.
Image classification The process of converting the continuous data recorded in one or more spectral bands into discrete classes (or probability values for classes) that represent different types of features or vegetation. This is usually done via an unsupervised classification or supervised classification procedure.
Infrared Radiation between the visible and microwave portions of the electromagnetic spectrum. Near-infrared (NIR) begins at wavelengths of about 0.7 µm (i.e. 0.0007 mm) and stretches to wavelengths of about 1000 µm in the far-infrared (FIR) range.
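A linear contrast stretch maps the darkest pixel value in an image to the lower limit and the brightest to the upper limit, rescaling everything in between proportionally. The sketch assumes pixel values held in a flat list and an 8-bit output range of 0 to 255; real image-processing software often clips a percentage of extreme values before stretching, which this version does not.

```python
def linear_stretch(pixels, out_min=0, out_max=255):
    """Linearly rescale pixel values so the minimum maps to out_min and the
    maximum to out_max (no clipping of extreme values)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [out_min] * len(pixels)  # a uniform image cannot be stretched
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (p - lo) * scale) for p in pixels]


print(linear_stretch([50, 55, 70]))  # [0, 64, 255]
```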
LiDAR Light Detection And Ranging: an active sensor mounted on aircraft to measure elevation.
Multispectral More than one spectral band. If there are very many narrow bands recorded (e.g. in excess of 20), then the data may be referred to as hyperspectral. (Compare panchromatic.)
Panchromatic A single spectral band that records visible light in a format equivalent to a 'black and white' (i.e. greyscale) image. (Compare multispectral.)
Passive sensor A remote-sensing system that records energy that is either generated by the targeted phenomenon or is being reflected from some other source (e.g. the Sun). (Compare active sensor.)
Remote sensing The art and science of recording and interpreting the nature of phenomena without coming into physical contact with them. Usually associated with some form of device that records energy waves (such as visible light, infrared light or radar).
Spectral band The range of electromagnetic wavelengths recorded by a single sensor on a remote-sensing device. Typically, spectral bands correspond to a named frequency range such as near-infrared, visible green, visible red, etc.
Spectral clustering See unsupervised classification.
SRP The Spectral Response Pattern is a distinctive pattern of energy values recorded in one or more spectral bands for a specific feature or phenomenon (e.g. buildings or a plant species) or group of phenomena (e.g. a plant community).
SRTM The Shuttle Radar Topography Mission, which used an IFSAR device to collect worldwide elevation data.
Supervised classification A process of image interpretation that begins by delineating a sample of training areas in a digital image in order to identify and define a spectral response pattern for the phenomenon in question.



Training area A defined area on the ground that contains a specific feature or phenomenon (e.g. an archaeological site, or a type of vegetation or plant community) that can also be delineated on the digital image and thus used in a supervised classification.

Unsupervised classification A process of image interpretation that begins by statistical pattern recognition, typically using some form of multivariate statistics to identify clusters across two or more spectral bands. These spectral clusters are assumed to represent distinctive features or phenomena and can thus form the basis for an image classification.

Vegetative index A mathematical combination of different spectral bands through map algebra to create a new image that represents spatial variation in some aspect of vegetation (e.g. the Normalised Difference Vegetation Index, or NDVI, combines the red and near-infrared bands to produce a new map with values typically between 0.1 and 0.6, with higher values representing 'healthier' green vegetation).
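The contrast stretch entry can be made concrete with a linear min-max stretch, one common variant: a pixel value v in a band whose observed minimum and maximum are v_min and v_max becomes 255 × (v - v_min) / (v_max - v_min). The sketch below is a minimal illustration, assuming a single band held as a plain Python list of numbers and an 8-bit output range; the function name `linear_stretch` is hypothetical, and image-processing packages also offer other stretches (for example percentage clips or histogram equalisation).

```python
def linear_stretch(pixels, out_min=0, out_max=255):
    """Linear min-max contrast stretch of one spectral band.

    The lowest observed value is mapped to out_min, the highest to out_max,
    and everything in between is rescaled proportionally.
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A band with a single value has no range to stretch.
        return list(pixels)
    scale = (out_max - out_min) / (hi - lo)
    # Round to whole grey levels, as stored in an 8-bit image.
    return [round(out_min + (p - lo) * scale) for p in pixels]


# A low-contrast band occupying only values 60-90 of the possible 0-255.
band = [60, 70, 80, 90]
print(linear_stretch(band))  # [0, 85, 170, 255]
```

The narrow 60 to 90 range now spans the full 0 to 255 range, so small differences in recorded energy become clearly visible differences in grey level.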
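A supervised classification can use many decision rules; one of the simplest is the minimum-distance-to-means classifier, sketched here as an illustration rather than as the procedure of any particular package. The mean band values of each training area stand in for that class's spectral response pattern, and every other pixel is assigned to the class with the nearest mean. Pixels are assumed to be lists with one value per spectral band; the class names, band values and the functions `class_means` and `classify` are hypothetical.

```python
import math


def class_means(training):
    """Mean band values for each class, from its training-area pixels.

    `training` maps a class name to a list of pixels; each pixel is a list
    with one value per spectral band.
    """
    means = {}
    for name, pixels in training.items():
        n = len(pixels)
        # zip(*pixels) groups the values band by band.
        means[name] = [sum(band) / n for band in zip(*pixels)]
    return means


def classify(pixel, means):
    """Assign a pixel to the class whose mean is nearest (Euclidean distance)."""
    return min(means, key=lambda name: math.dist(pixel, means[name]))


# Hypothetical two-band (red, near-infrared) training areas.
training = {
    "vegetation": [[30, 120], [34, 130], [32, 125]],
    "bare soil": [[80, 90], [84, 94]],
}
means = class_means(training)
print(means)                       # {'vegetation': [32.0, 125.0], 'bare soil': [82.0, 92.0]}
print(classify([35, 118], means))  # vegetation
print(classify([78, 95], means))   # bare soil
```

Ground truthing would then check a sample of the classified pixels in the field before the result is relied upon.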
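For unsupervised classification, k-means is one widely used clustering method; the glossary entry does not name a specific algorithm, so this is an illustrative choice. The sketch alternates between assigning each pixel to its nearest cluster centre and moving each centre to the mean of its member pixels. For reproducibility it assumes that the first k pixels are the starting centres and that a fixed number of iterations suffices; both choices, and the function name `kmeans`, are assumptions. The resulting spectral clusters still have to be labelled and checked by the analyst.

```python
import math


def kmeans(pixels, k, iterations=10):
    """Cluster pixels (lists of band values) into k spectral clusters.

    Returns the cluster label of each pixel and the final cluster centres.
    """
    # Assumption: the first k pixels serve as the initial centres.
    centres = [list(p) for p in pixels[:k]]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in pixels]
        # Update step: each centre moves to the mean of its member pixels.
        for c in range(k):
            members = [p for p, label in zip(pixels, labels) if label == c]
            if members:
                centres[c] = [sum(band) / len(members) for band in zip(*members)]
    return labels, centres


# Hypothetical two-band (red, near-infrared) pixels.
pixels = [[30, 120], [80, 90], [32, 125], [84, 94], [34, 130], [82, 92]]
labels, centres = kmeans(pixels, k=2)
print(labels)   # [0, 1, 0, 1, 0, 1]
print(centres)  # [[32.0, 125.0], [82.0, 92.0]]
```

The algorithm only reports that cluster 0 and cluster 1 are spectrally distinct; deciding that cluster 0 is vegetation and cluster 1 bare soil is an interpretive step, followed by ground truthing.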
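The NDVI mentioned under vegetative index is computed cell by cell as (NIR - red) / (NIR + red), so in principle it ranges from -1 to +1. A minimal map-algebra sketch follows, assuming two co-registered bands held as nested Python lists (rows of cells). Returning 0.0 where both bands are zero is an assumption; a real workflow might write a NoData value instead. The function name `ndvi` is hypothetical.

```python
def ndvi(red, nir):
    """Cell-by-cell Normalised Difference Vegetation Index.

    `red` and `nir` are grids (lists of rows) of the same shape holding the
    red and near-infrared band values for the same cells.
    """
    result = []
    for red_row, nir_row in zip(red, nir):
        row = []
        for r, n in zip(red_row, nir_row):
            total = n + r
            # Avoid dividing by zero where neither band recorded any energy.
            row.append((n - r) / total if total != 0 else 0.0)
        result.append(row)
    return result


red = [[20, 40], [50, 0]]
nir = [[60, 60], [50, 0]]
print(ndvi(red, nir))  # [[0.5, 0.2], [0.0, 0.0]]
```

Green vegetation absorbs red light and reflects strongly in the near-infrared, which is why cells with vigorous vegetation produce the higher values.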


Abe, Y, Marean, C. W, Nilssen, P J., Asseta,Z. and Stone,E. (2002). The analysisof cutmarks on archaeofauna: a review and critique ofquantification procedurcs, and a new image analysisGIS appfoach. Ameri(an Antiqlti r)^.61: 613-663. Alcock, S. and Cherry, J., eds. (2004). Side by-Sitle Survev ComparativeRegional Stlklies in the Mediterranean World. Oxford. Oxbow Books. Aldenderfer,M. (1996).Introduction.In Aldenderfer,M. and Maschner,H. D. G., eds.,Arlthropolog!-, Spaceand GeographicalInformation S)1stems, pp. 3 18. New York, Oxford University Press. (1998). Quantitativemethodsin archaeology:a review olrecent trendsand developments. ./a4a.r/ oJArchaeological Research,6: 9 l- 1220. ALGAO (2001). Local Records National Resourc:e: An ALGAO Strateg\ .for Sitesan / Monunents Recor.lsin Erlgland.London,AssociationofLocal GovemmentArchaeologyOfficers(ALGAO). (2002).Historic Environment Records:Betlclunm'ks.for Good Practi.e. London, English Heritage/ Associationof Local Govenment AJchaeologyOfficers (ALCAO). Allen, J.R.L. and Fultbrd, M.c. (1996).The distribution of south-east DorsetBlack Burnished Category I pottery in south west Britain. Britannia,2l: 223-281. Allen, K. M. S. (1990). Modelling early historic trade in the easten Great Lakes using geographic inlbnnation systems.In Allen, K. M. S-, Green. S. W. and Zubrow, E B.W., eds.,Interprcthg Space:GIS and Arclneology, pp. 319 329. London, Taylor & Francis. Ammerman, A. J- and Cavalli-Sforza,L. L. ( i971). Measuringthe rate of spreadof early farming in Europe. Man, N.S.6: 674 688. Anderson,D. G. and Faught,M. K. (2000). Palaeoindian afiefacrdistributions:evidenceand implications. Anti(tuity,'l4: 50'7 513. Anderson,R. C. ( 1982).Photogran'lmetry: the pros and consfor archaeololy. Worl(lArchaeolog\,141 200 205. APPAG (2003).Tlre Cuffent StdteofArchaeologyinthe LlnitedKitlgdom: First Report ofthe Al[-Pdrn ParliamentaryArchaeolog)iCroup. London, The All-Party ParljamentaryArchaeologyGroup. Atkinson, P (2002). 
Sudace modelling: what's the point? ?r.rr? sactionsin GIS,6: 14. Aurenhammer F and Edelsbrunner, H. ( I984). An optimal algorithm for consrructingthe weighted Voronoi diagram in the plarre.Pattern Reco]niriotl, 1'7 | 251-25'1. Bailey, T. C. and Gatrel\, A.C. (1995). Interactiw SpatiaL Data Atldllsr.r.Harlow, Longman. Baker, W. A. ( 1992). Air archaeologyin the valley of the River Severn.Unpublished Ph.D. rhesis, University of Southampton. Ba1l,G. and Hall, D. (i970). A cluste ng techniquefor summarisingmultivariaredata.BehoNioral Science. l2: 153-155. Barker, P (1998). Techniques of Archaeological Excaratiotr,3rd edition. London and New York, Routledge. Barrett, J. C. (1999)- The mythical landscapes of the British Iron Age. ln Ashmore, A. and Knapp, 4.8., eds.,Archaeologiesof landscape, pp.253 265. Oxfbrd, Blackwell. Barlon,M.8., Bernabeu, J.,Aura, E. and Garcia, O. ( 1999). Land-use dynamics and socioeconomic change:an examplefrom the Polop Alto Valley. Anlerican Antiquih, 64: 609 634.




Barton,M. B., Bemabeu,J.,Aura, E., Garcia,O. and Roca,N. L. (2002).Dynamic landscapes, artifact taphonomyand land usemodeling in the westemMediteranean. Ceoarchaeology,l7: 155 190. Batchelor, D. (1999). The use of GIS for archaeological sensitivity and visibility analysis at Avebury and associatedsites, World Heritage site, United Kingdom. In Box, P, Stonehenge, ed., GIS and Cultural Resource Manageme t: A Manual Jbr Heritage Managers, pp. ll8-128. Bangkok, UNESCO. Batty, M. (2001). Exploring isovist flelds: spaceand shapein architecturaland urban morphology. Environmentand Planning B: Planning and Desigtt,2S: 123-150. Baxter,M. (1994).trploratory Multirariate Analfsis inArchaeology.Edinburgh,EdinburghUniversity Press. Beardah,C. (1999). Usesof multivariatekernel densityestimates. In Dingwall, L., Exon, S., Gaffney, V Laflin, S. and van Leusen,M., eds.,Arcftdeolob- in the Age of the Intemet: ComputerApplicationsand QudntitativeMethods in ArchaeologyI 997. B{Rlntemational SeriesS750.Oxford, Archaeopress. Beardah, C. and Baxter, M. (1996). The archaeologicaluse of kemel density estimates..anael net Archaeolog!.L http : / / intarch. ac. uk / j ournal / issuel /beardah-index. html (accessed 05/1 l/?004). Beasley, D. and Huggins, L. ( t991). ANSWERSUser's Manual. W. Lafayette.IN, Agricultural Engineering, PurdueUniversity. Beck, A. and Beck, M. (2000). Computing, theory and practice:establishingthe agendain contract to the Interpreting archaeology. In Roskams,5., ed.,InterpretingStratigraphy:PapersPresented Stratigraphy Conferences I 993 1997, pp. 173-181. Oxford, Archaeopress. Bell. T. and Lock, G. (2000). Topographicand cultural influenceson walking the Ridgeway in later prehistoric times. In Lock, C., el., Beyond the Map: Archaeologf atul Spatial Technoktgies, pp. 85 100.Amsterdam,IOS Press. and communication Bell, T., Wjlson, A. and Wickham, A. (2002). Tracking the Samnites:landscape routesin the SangroValley,Italy. American Journal of Archaeology, 106: 169-186. 
settlement pattem Bell. T. L. and Church, R. (1985).Location-allocation modelling in archaeological research: somepreliminary applications.WorldArchaeology, 16]. 351-37 1. Ben-Dor, E., Portugali, J., Kochavi, M., Shimoni, M. and Vinjtzky, L. (1999). Airborne thermal video radiometryand excavationplanning at Tel Leviah, Golan Hetghts.Israel.Journal of Field Arthaeology,26:l11 121 . Environmentald Planning Benedikt,M. L. (1979). To take hold of space:isovists and isovist field,s. B: Phnning and Desigtt,6: 4'7 -65. Jo&rr?a/ Bevan,A. (2003).The rural landscape ofNeopalatial Kythera: a GIS perspective. ofMedierraneanArchaeology, | 5 : 2l'7J56. Bevan,A. and Bell, T. (2004).A Surve)'of Starulan1s for the English ArchaeoloBicdlRecord Commutlit\': A Report on Behalf oJEnglish Heritage. London, English Hedtage. archaeology Bevan,A. and Conolly, J. (200,1). GIS, archaeologicai suryeyand landscape on the Island of Kythera, Greece.J.rurnal of Fiekl Archaeoktgy,29: 123-138. (in press).Multiscalar approaches to settlement distributions.In Lock, G. andMolyneaux, 8., eds., Cotironting ScalefuArchaeolog!: Issues ofTheod and Ptactice.London, Kluwer Press. http : paper s 'b o v a n - . o - o wwl^ /..u ore .( r pn u. ( d njc onolly v-200).pdf (accessed 22l08/05). Bervley, R., Bmasch, O. and Palmer, R. (1996). An aerial archaeologytraining week, 15-22 June 1996,held nearSiofok,Lake Balaton, Hungary. Antiquitl,'70:'745'750. Bewley, R., Donoghue, D., Gaffney, V, van Leusen, M. and Wise, A. (1998). Archivtug Aerial Photographyand Remote Sensing Daxr A Guide to Cootl Practice. A s and Humanities Data Service Guides to Good Practice.Arts and Humanities Data Service. http : / / ads . ahds . p ro-je cL goodguides apandr - ( ac c e s s e0 d5 / l l / 2 0 0 4 ) . -c. ^ K



Bewley, R. and Raczkowski, W., eds.(2002).Aeial Archaeol.g!: DetelopingFuturepractice. Oxford, IOS Press. Beyron-Davies,P (1992).Relatiotldl DatabaseDesign. Oxford, Blackwell Scientific. Bibby, J. S., Douglas,H. A., Thomasson,A. J. and Robeftson,J. S. ,1g9l). Land Capability Classifi_ cation lbr Agriculture. Aberdeen,Macaulay Land Use Research Institute. Binford, L.R. (1989). The'New Archaeology' rhen and now. In Lamberg_Karlovsky, C.C., ed., ArchaeoloSicalThought in Ameica, pp.50-62. Cambridge,CambridgeUniversity press. Binford, L. R. and Binford, S. R., eds. (1968).New penpectivesii A/chaeolog!. Chtcago.IL, Aldine. Bindift, J., Kuna, M. and Venclova,N., eds.(2OOl).TheF ure of SujfaceArtefact Suley in Europe. Sheffield, Sheffield Academic Press. Bishop, I. D. (2002). Dete.mination of thresholdsof visual impactt the caseof wind turbines. E?vi, ronmenl and Planting B: Planning an(l Design,29: lO7_71g. Bouchet,J.-M. and Bumez, C. (1990.1. Le camp Ndolithique de R6jolles i Biron CharenteMaririme. Bulletin de la Soci'Q Pr.histoique FtunEaise,S'j:10 12. Boum, R. (1999). Eventsand monuments:a discussionpaper.S,44R g: 3 7 ly'err,r, Bove,F.J. ( 198l). Trend sudaceanalysisand the Lowlaod ClassicMaya collapse. A,nelican Antiqulr-. 16:93 112. Bradley, R. (1998). Ruined buildings, ruined stones,enclosures,tombs and natunl places in the Neolithic of south west England. WorldArchaeolog,-.30 13_22. An Arch(teologyof Natural places. Lonclon.Routledse_ 12000). Bradley,R., Harding, J., Rippon, S. and Mathewr, M. (1993). A field merhod for investigatingthe distribution of rock art. Ox,t'ord Journal ofArchaeology, 12: 129_145. Brock,J. C., Wright, C. W, Sallenger, A. H., Krabill, W. B. and Swifr, R. N. (2002).Basisand methods of NASA Airborne TopographicMapper LiDAR surveysfor coastals trdies. Joulral of Coastal R c, ear, h,1 8 l-lj3 . Broodbank,C. (1999). Kythera survey: preliminary .eport on the l99g season. Annaal ofthe British SrhoolLl!Athca\,94 19l-211. Buck, C., Cavanagh, W. 
and Litton, C. (1988).The spatialanalysisofsire phosphare dara.In Rahtz,S., ed., Computer and QuantitativeMethods itt Archaeology 1988,BAR International Series42t6, pp. 151 160.Oxford,BAR. Bunge,W W (1966). TheoreticalGeogrcph),.Lund Studiesin Geography6, SeriesC, General and MathematicalGeography.Lund, Gleerup. Burrough,P A. and McDonnell, R. (1998).p rinciples oJGeographicalInformttion Syste,rr. Oxford, Oxford University Press. Bustard,W (1996). Spaceas Place: Small and Great House Spatial Organisationin Chaco Canyon, New Mexico, 1000 ll50 AD. Unpublishedph.D. thesis,University of New Mexico. Campbell,J. B. (1996).Introduc on to Remote Sensi,lg,2nd, edrtion.New york. The cuilford press. (2002). httroductiotl to RemoteSensing,3rd edition.New york, Guiltbrd publications. Campbell, M. (2000). Sites and site types on Rarotonga,Cook Islands. Net|,Zealand Jolrndl of Atrhaeulogl. 22 45 14 Caiiamero, D. andde Velde,W. V (2000).Emotionally groundedsocialinteraction.[n Dautenhahn. K.. ed' Humdn cognition and sociol Age t Technolob-,pp.137_162.Amsterdam,John Benjamins. Capper,J. (1907). Photographs of Stonehenge, as seenfrom a war b allool. Archaeolopia.60. 5.j l. Carara, A., Bitelli, G. and Carla, R. (1997). Comparisonof techniques for generatingdigitai te.rain models from contour lines. lntemational Journal of Geographical Information Science, 11: 451414. Cafier, F.W (1969). An analysis ol the Medieval Serbian Oecumene: a theoretical aDoroach. ueografJu Annler,5l: 39 52. Cartwright, W, Crampton, J., Gartner, G., et al. (2OOl). Geospatialinformation visualisation user intertaceissues.Cdro graphy anclGeogruphicInformation Science.2g: 45 60.



R. ( 1972).Locational analysisof prehjstoricsettlementin New Zealand.Mankind,8: 212 Cassefs, 222. and the perception of prehistoric landscapes: Chapman,H. and Gearey,B. (2000). Palaeoecology Arriq if,,'74:316 319. to phenomenology. somecommentson visual approaches Chapman,J. (1990). Social inequality on Bulgarian tells and the Vama problem. In Samson,R., ed., The Social Archaeologt of Houses,pp. 49-92. Edinburgh,Edinburgh University Press. ACM Transactions model: toward a unified view ol daLa. Chen,P P S. (1976).The entity-relationship l:9-36. on Database Systems, Models in Geograpfiy.London, Methuen. Chorley,R. J. and Haggett,P, eds. (196'1). Albany, NY, Chou, Y-H. (199'7).Eaplorirg Spatial Analysisin Geographic hformation SJstems. OnWord Press. Clark, A. J. (1996). SeeingBeneath tlrc Soil: ProspectinqMethods in Archaeology.London, B. T. Batstbrd. in Clart. P and Evans,F (1954). Distanceto nearestneighbouras a measureof spatialrelatioDships populations.EcoloS)',35r 445--'153. Claike, D. L. (1968).Analytical Archaeology.London, Methuen. (19'72). ed,. Models in Arcftaeology.London, Methuen. ed. (197'7 ). Spatial Archdeologr. London, Academic Press. Models a d Applicdlio,?s.London, Pion. Cliff, A. D. and Ord, J.K. (7981). Spatial Processes, Decisions S. ( 1985).Testingthe association betweentwo spatialprocesses. Clit'fbrd,P andRichardson, No.2, pp. 155-160. and Statistics,SupplementIssLrc Clowes, A. and Comfoft,P. (1982). Processand Iandform. Edinblrgh, Oliver & Boyd. Codd. E. F ( 1970). A relational model of data for large shareddata banks. Communicationsof the ACM. | 3(6\: 3'7'7 -38'7. least cost path algodthm fbr.oads and Collischonn, W and Pilar, V (2000). A direction-dependent canals.lnternational Journal of GeographicalInformatiotr Science,14. 39'7406. H. (1992).Peoplemanipulateobjects(but cultivatefields): beyondthe raster vectordebate CoucJelis, in GIS. In Frank, A. U-, Campari, I. 
and Formentini, U., eds., Theoriesand Methods of SpatioTemporalReasoningin GeogruphicSpace.pp. 65 77. Berlin, Springer-Verlag. (1999). Space,time, geography.In Longley, P A., Goodchild, M. F., Maguire, D. J. and Rhind. vol. I, Principles arul Technical Issues. D.w., eds., Geographical Infamlation .t),.tlem.r, pp. 29 38. New York, John Wiley & Sons. Cox, C. ( 1992).Satelliteimagery, aerial photographyand wetland archaeology.World Archaeologl. 24: 249J61. In Jesson, M. and of hill-forts and their cultural environments. Cunliffe, B. W. (197J). Some aspects Hill, D., eds.,T,&e lron Age and its Hill-Forts, University ol SouthamptonMonograph Series 1. pp. 53-70. Southampton, University of Southampton. London. Curry. M. R. (1998\. Digital Places: Lit)i g with Geographic Information Technologies. Routledge. Dacey.M. F (1960).The spacingofriver tow ns.Annals of theAssociationof Americdn Geogruphers. 50 : 59 61 . D Andrea, A., Gallotti, R. and Piperno, M. (2002). Taphonomic interpretation of the Developed Oldowan site of Garba IV (Melka Kunture, Ethiopia) through a GIS application.Arti4&i4,, 76: 991-1002. Daniel, L R. (2001). Stoneraw material availability and Early Archaic settlementin the southeastern United States. Arplican Antiqu 1".66: 231-265. hislory. ProfessionaL Suneyor,20: Deloach, S. R. and Leonard,J. (2000). Making photogrammetric 6 10 . New York, John Wiley & DeMers, M. N. (1997). Fundamentalsof GeographicInformation S.|sl?m.t. Sons.


3r I

Dennell, R.W and Webley, D. (1975). hehistoric settlementand land use in southernBulgaria. In Higgs, E. S., ed., Palneoeconomy, Being the SecondVolumeof papers in Etonomic prehistory pj'oject in the Ea y History by Membersand Associatesof the BritishAcademf Major Research oJAgriculture, pp.97-109. Camb dge, CambridgeUniversiry press. Dent, B. D. (1999). Cartography: Thernttic Map Design, 5th edition.London, WCB,McGraw Hill. Deursen, W P A.V (1995). ceographical Information Sysrems and Dynamic Models. Unpub_ lished Ph.D. thesis, Utrecht University. Available as NederlandseGeografischeStudies 190. http : / /pcraster . geog . uu. n1lpubtications . htnl (accessed l6108/0.1). Dibble, H. L., Chase,P G., McPherron, S. P and Tuffreau, A. (1997). Tesringrhe realiry of a .living lloor' with archaeologicd dat^.Ame can Antiquit!,62. 629-651. Douglas, D.H. (1999). Least cost path in GIS using an accumulated cost surface andslope lines.http: / /www. hig . se / "dds /research/ leastcos / cumcost4 . htm (accessed 10/0212002). Dreyfus, H. (1972). What ComputersCan't Do: A CritiqLteof Arirtcidl Reason.New york, Harper & Row. Dublin Core MetadataInitiative (2003). Dublin Core MetadataElement Set, Version l.l: Relerence Description. http : / /dubl incore . orqldocuments /dces / (accessed 08/09/2004). Duncan,R. B. and Beckman,A. (2000).Site location in pennsylvaniaand West Virginia. In Westcott, K. L. and Brandon, R.J., eds.,Practical Applications of GIS for Archaeo[ogists:A pre.]ictive Modeling Kit,pp.33-58. London, Taylor & Francis. Dunning, A. (2001). Managing changewith digital data:the caseof the EssexSites and Monuments Record. Internet Archaeolog!.15.htLp | / /ahds . ac . uk/ creating/ case- studies / essex/ (accessed 12/l l/2004). Earl, G. andWheatley,D. (2002).Virtual reconstruction and the interpretiveprocess: a case study from Avebury. In Wheatley,D., Earl, G. 
and Poppy,5., eds.,ContemporaryThemesin Archaeological Cotlrputing,University of SouthamptonDepartmentof Archaeology Monograph 3, pp. 5_15. Oxford, Oxbow Books. Eason, G., Coles, C. W and Geninby, G. (1980). Mathematics an(l Statisticsfor the Bio-sciences. Chichester. Ellis Horwood. Eastrnan,J.R. (1989). Pushbroom algorithms for calculating distancesin rasrer grids. In Auto, Carto, eds.,A4lo Carto9: Ninth Intenational Symposium onComp ter-AssistedCattograph!, Baltimore,MD,2-7 April. Falls Church,VA, American Societyfbr photogmmmetryand Remote Sensing,American Con8resson Surveying and Mapping, pp. 288 297. (1997). Idrisi Version2: GLlideto GIS an.l ltnage processitg.\,/ol.l. Worcester,MA, Clark Labs. (2001).Itlrisi32 Release 2: Guide to GIS and Imageprocessing,Vol. l. Worcester, MA, ClarkLabs. Ebert, D. (2002). The potential of geosratisrics in rhe analysisof fieldwalking dara.In Wheatley,D., Earl, G. and Poppy,S., eds.,Contemporar! Themesin ArchaeoloBicalComputitlg,IJniversityof SouthamptonDepartmentof Archaeology Monograph 3, pp. 82 89. Oxford, Oxbow Books. Egenhofer,M. J. (1991). Extending SQL for cartographic displays.Cartograph| an.l Geogrdphic Infbrmatiotr Slstems,I 8: 230-245. (1994). Spatial SQL: a query and presentationlangtrage.IEEE Ttunsdctionson Knowledge ancl Data Engineering,6: 86 95. Eisenberg, J. D. (2002).SVG Essentialt. Cambridge,MA, O'Reilly. Eiteljorg, H. (2000). The compelling computer image: a double-edged sword. Intemer Archaeologl, 8. hLLp | / / intarch. ac. uk/ j ournal / issue8 /eitef j org-index. html (accessed l2l01/2004). ESRI (1999). Splines.Arclnfo Version7.2.I Documentation.Redlands,CA, EnvironmenralSystems Research Institute. (2002).TIN in Arclnfo 8.0.l. A.clnfo Version 8.0.1Documentation.Redlands, CA, Environmental SystemsResearch Institute.



Multi Agent Modelling System (CMAMS vl.2). UnpublishedM.Sc. Eve, S. (2004). Chersonesos thesis,Institute of Archaeology,University College London. Landscapes:Joutfiers Stonehenge Exon, S., Gaffney, V, Woodward, A. and Yorston, R. (2OOO). Tlrough Real-andhttagined Worlds.Oxfotd, Archaeopress. in westernNSW, Australia: geomorFanning,P C. and Holdaway,S. J. (2001).Stoneartifact scatters phic controls on artifact size and distribution. Geoarchaeology.16: 667 686. FAO (1974). Approaches to Iatrd Classirtcation,Soils Bliletin 22. Rome, Food and Agriculture Organisation. FAO (1976). A Framework for Land Evaluation, Soils Bulletitr 32. Rorne, Food and ht t p: / / www. f ao. or gld o c r e p / X 5 3 1 0 E / X 5 3 1 0 E 0 0 . h t n Ag ricu ltureOr ganis at ion. (accessed 03/09/2004). visibility. In Smardon,R. C., Palmer,J. F and Felleman,J. P, eds., Felleman,J. P (1986). Landscape visual Prcject At1t1l\sis. New York. pp.47 62. John wiley & Sons. Foundations.t'or Fernie, K. and Gilman, P (2000). ht:formingthe Fut rc of the Past: Guideliles for SMRs. Swindon, English Heritage. Finzi, C. V and Higgs, E. S. (1972). Prehistoriceconomy in the Mount Camel areaofPalestine: site 1-37. catchmentanalysis.Proceedingsof the Prchistoric Sociea^,361 Fischer, M.M. (2004). GIS and network analysis. In Hensher, D., Button, K., Haynes, K. and Amsterdam,Elsevier Stopher,P, eds.,Hanlbook ofTransport Geographt and Spatial Systems.
s.i a h .p l -i r r^. c c ,c :nl /r d'om l n^oF ,c /r '11 ndF

(accessed 25l08/2004). Fisher,P and Unwin, D., eds. (2002). virtual Realir) in Ceoglapft]. London and New York, Taylor & Francis. Fisher, PF- (1991). First experimentsin viewshed unceflainty: the accuracyof the viewshed area. 57 : 1321 132'1. PhotogrammetricEngineeing atld Remote Sensing, (1993). Algorithm and implementationuncertaintyin viewshedanalysis.International Joultlol of Geographical InJbmd tiotl Slstems,'l : 33 | -31'7. Fisher,P F., Farelly, C., Maddocks,A. and Ruggles,C. L. N. ( 1997).Spatialanalysisof visible areas from the Bronze Age cairns of Mull. Joarnal of Archaeological Science,24. 581-592. Floriani, L. D., Marzano, P and Puppo, E. (1994). Line-of-sight communicationon tenain models. 8: 329-342. Intemational Joutllal of Geographical Informatiotl S),stems, Foley,J. D., van Dam, A., Feiner,S. K. and Hughes.J. F. (1990). Co rPuterGraphics:Princi)les an.l Pracrice.Reading,MA, Addison-Wesley. Foley,R. A. ( 1977).Spaceandenergy:a methodforanalysinghabitatvalueandutilisation in relationto archaeological sites.In Clarke,D. L., ed-,SpatialArchaeolog,pp. 163 187.London, Academic Press. Forer, P and Unwin, D. (1999). Enabling progressin GIS and education.ln Longley, PA., Goodchild, M. F., Maguire, D. J. and Rhind, D. W, eds., Geographica[Informat[on Slstems. Vo1.2, 47 756. New York, John wiley & Sons. ManagementIssues and Applications, pp. '7 Foresman,T. w. (1998). GIS early years and the thread of evolution. In Foresman,T. W., ed., Ifte History of GIS, pp. 3 17. London, PrenticeHall. analysis)as an insight into social Foster,S. M. (1989).Analysis of spatialpattemsin buildings (access shucture:examplesfrom the ScottishIron Age. Antiqui6,,63:40 50. Fotheringhan, A. S., Brunsdon,C. and Charlton, M. (2000a).Geographicall! weightedRegression: London, John Wiley & Sons. The Analysisof Spatidlly VnryingRelationshrps. (2OOOb). Geogftrpr). London, Sage. Quantitative Fotheringham,A. S., Charlton, M. E. and Brunsdon, C. 
( 1998).Geographicallyweighted regression: a naturalevolution of the expansionmethod for spatialdata analysis.Environmentawl Planting A,30: 1905-192'7 .


3 r3

Fotheringham, A. S. and O'Kelly, M. E. ( l9 g9).Spatia! InteractionMcttlels:Fonnulatiojls an(l Appli .?tlorr. Dordrecht. Kluwer. Fowler,M ( 1996)-High resorurionsateniteimageryin archaeorogical application:a Russiansatelite photognph of rhe Sronehenge region. Anli4uir!,.10: 66j_6i1. Franke.R. (l982). Smooth inte.polation ofscattered data by local thin plate splines. Conputers arul Mathematicsrrith Apptication, gt 23j_2g1. Franklin, W. R. (2000). Applicatiors of analytical ca(ography. Canogrctphyantl CeogrttphicInJbralion Science, 21| 225-231. Fritz, J. M. and Plog, F. T. ( 1970).The natureof archaeological explanatjon.AmerK:anAntiquit)-,351

405-,1r 2.

Gallney, V and Stanii4Z. (1991). GIS Appnaches to Regional At1.tl:-sis: .t CaseStu.lyofthe Ltland of Hvar. Ljubljana, Znanstveniinstitut FilozoMe fakultete. ( J992).Diodo'us Sicurusand the israndofHvar, Darmatia: testingthe text with GIs. In Lock. G. and MolTett,J., eds.,Cor?p uter Applications unLleuantitati,e Methocls itl A t chaeotoB!I99I , Brjtlsh ArchaeologicalRepons Illternational Series557, pp. 113_126.Oxford. Ternpus Reparatum. Gaffney,V, Stanai6, Z. and Watson,H. (1996).Movirg from catchments to cogntoon:renlarrve sreps toward a larger archaeologicar contexl for GIS. In Aldenderfer, M. anJ Maschner, H. D.G., eds.,Ailtllropol.)g!, Spacearul GeographicIn|brmation S\-stetns, pp. 132 154. Oxtbrd, Oxford University Press. Galfney' v and van Leusen,p M. (1995). postscript: GIs, environmentardetermrmsm and archae_ ology. A paraliel text. In Lock, G.R. and Stantia, 2., ecls., Archaeologtrand Ceogfttphi(al lnfonnation Systems: A Europeanperspectit)e, pp.36j_3g2. Lonrion,Tajior & Francis. Gamble, C. S. (1998). Palaeolithicsociety and the releasefrom proximity: a network approachto mf mate relations. WorldArchaeolog r-.29:126419. Garcia-sanjuan,L. and wheatrey, D. (1999). The slate ol the Arc: differenriar ratesor adoption or GIS for Europeanheritagemanagerrent . Euft)peqnJout nal of Arthaeolog1,2: 201_22g. _ Cav lova,M. h.d.). Weighted Vor.onoi diagrams in biology. http: / /F.g."7.pr.. rr..1g..o. cal -marina/r,'pplants / (accessed 05/l l/2004). cibson, J.J. (I986). z,e EcologicalApproucltto percepriorl.H lsdale,Law'ence Erlbaum Associates. Gilbert, N. and Troitzsch, K.c. (1999). SimLlation.for the Social Scientist. Buckingham, Open Univer-sity Press. Gilchrist, R. (1991). Gender an(l Material Cu[tLtre.. The Archaeologj,.)f Religints Woman.Lottdor-, Routledge. Gillings,M. (l995). Flood dynamics and settlenent in the Tiszava11ey of north_east Hungary:GIS and the Upper Tisza Project.ln Lock, G. R. 
aDdStanaia, 2., eds.,A rcfuteoLogy and Geolqtcrphit Itformation Sf.stems: A Europear perspective.pp. 67_g,1. London, Taylor & Francis. (199t). Not drowning but waving? The Tisza flood-plain revisited. In Johnson,t. and North. M., eds.,,.ArchaeologicalApplications qt' GlS: prccee(lings o.fColloquiun I I, IJlSpp XIIIttt Congre:s"^. Forli, Italy, September 1996. Sydney University Archaeological Methods Series S. Sydney, ArchaeologicalComputing Laboratory.CD-ROM. (20!5). The real, the viftually rcal, and the hyperrealtthe.ole of VR in archaeology. In Smiles, S. and Moser, S., eds. , Etwisioning the past:Archaeology, antl the Image, pp.223 239. Oxford, Blackwell Publishing. Gillings, M. and Goodrick, G.T. (1996). Sensuous and reflexiveGIS: explonng visualisation and VRML. InternetAtthaeologr, l. http : / /intarch. ac. uk/ j ournal /issuel / gilfings index. html (accessed 30/08/2004). Gillings, M., Mattingly, D. and van Dalen, J., eds. (1999). Geographical Informatiotl :ilstems and LandscapeArchaeology. The Archaeology of MediterraneanLandscapes3. Oxford, Oxbow Books.



Gillings, M. and Sbonias,K. (1999). Regional survey and cIS: rhe Boeoria project. In Gillings, M., Mattingly, D. and van Dalen, J., eds.,Geographical Information Systensantl Lantlscipe Atchaeology,The ArchaeologyofMediterranean Landscapes 3. Oxtbrd, Oxbow Books. Gilman, P (2004). Siles and Monuments Recordsand Historic Environment Recordsin England: is Cinderella finaliy going to the ball? 1nrr,et Archaeolog), 15. http : / / intaich. ac . uk/ journaf /issue15/gilman_toc.htm1 (accessed ll10612004). Girnblett.H. R., ed. (2002).Integrating GeographicInformation Slstems aDdAgent-Based Modeling Techniques foi' simulating socidl and Ecological processes.o'ford, oxford university press. Gliasra, M., Russell,T., Shennan,S. and Steele,J. (2003). Neolithic transition in Europe: the radio carbon record revisited.Antiquit!, j'7: 45-63. Goodchild, M.F (1998). The geolibrary. In Carver, S., ed., huloljations in GIS 5: Selectedpape$ ftonl the Fi|th Nttrional Conference on GIS ResearchUK (GISRUK), pp. 59_68. London, Taylor & Francis. Gorenllo, L. J. and Bell, T. L. (1991).Network analysisand the study of pastregional organrsation. In Trombold, C. D., ed., R.)ad Netu)orksand Settletnent Hierarchies in the New World. pp. g0_9g. Cambridge,CambridgeUniversity press_ Gorenflo, L.J. and Gale, N. (1990). Mapping regional settlementin information space. Joumal of AnthrcpoIogical Archaeology,9: 24OJ'j 4. Goudie, A. S. (1987). Geographyand archaeology:the growth of a relarionship.In Wagstaff, J. M., ed.,ktndscape and Culture: Geographicaland Archaeologicalperspeajyes,pp. I I 25. Oxford, Basil Blackwell. Could, PR. (1967). On the geographicalinterpretationof eigenvalues. Transactionsof the Instit e of Briri'h Ceographers. 42: 5J- 8b. Crove. A. T. and Rackham,O. (2003). The Nature of Me.lierranean Europe: An Ecological History. New Haven,Cl Yale University press. Hageman,J. B. and Bennett, D. A. (2000). construction of digitar elevationmodels for archaeolosical applications.In Westcott,K. L. 
and Brandon, R.J., eds.,practical Applications ,tf GtSfit At'clneologists:A Predictite Modeling Kit, pp. I l3- 127.London. Taylor & Francis. Haggett,P (1965).Zacational Anal!sis in Human Geogtuphr.New york, John Wiley & Sons. Haggett,P and Chorley,R. J. (1969). N?b'o rk Analysish Geograpi). London, Edward Arnold. Haines,E. (1994). Point in polygon srrategies. In Heckberr, p., ed.,Graphics Gems IV, pp. 2446. London. Academic Press. Haining, R. (1990.). The use of added variable plots in regressionmodelling with spatial data. l,te P rofessional Geographe r, 42: 336 345. (2003).Spatial Data Analysis:Theory and prdcrice. Cambridge,Cambridge University press. Haraway,D. (1991). Simians,Cyborgs,antlWomqn: The Rein,)ention ofNatrz. London, Free Asso_ ciation Books. Harris, E. C. (1979). PrincipLesofArchaeological Stratigraph!. London, Academic press. Harris, T. (2000). Moving GIS: exploring movement within prehisto c culturai randscapes usins GIS. In Lock, G' ed., Beyoru1 the Map: Archaeology and Spatial Technologrcs, pp. t 16_123-. Amsterdam.IOS Press. Harvey,D. (1969). E:rp lanatiotl in Geographr. London, Edward Arnold. Hassan,F. (1997). Beyond the surtace:commentson Hodder's 'reflexive excavatlon methodolosv'. AatiquiD. 1 | : 1020-1025. Haughton, C. and Powfesland,D. (1999). WestHestenon: the Angliatl Cemeter)-, Archaeologic,l Monograph l. Yedingham,Landscape ResearchCentre. Hemandez, M J. (2003). Database Design for Mere Mortars: A Han(ls-ott Guide to Relatiotlal DatabaseDesign, 2ndedition. Boston, MA, Addison-Weslevprofessional. Higgs,E. S. and Finzi, C. V (1972).prehisroric economiest a terrjroriaL approach. In Higgs,E. S.. ed., Papersitr Econotuicprehistory: Studies b,t MetnbersandAssociatesofthe British Academ\


3 r5

Major Research Pmject in the Earb, History of Agiculture, pp.21 36. Carnbridge,Cambridge Unive.sity Press. Higgs, E. S., Vita-Finzi, C., Harris, D. R. and Fagg, A. E. (i967). The clinare, environmenr and industriesofStone Age Greece:part II. Proceedingsofthe Prchistoric Society.33. l-29. Higuchi, T. (1983). The Visual and Spatial Strucure of Lanclscapes. Camb dge, MA, MIT Press. (Trans.CharlesS. Terry-) Hill, J. B. (1998)-Ecologicalvariability andagriculturalspecialisation amongtheprotohistoricpueblos ofcentral New Mexico. Joumal of FieLdArchaeolog)',2512'75 294. Hifliel B. (1996).Spaceis the Machine. Cambridge,CambridgeUniversiry Press. Hillier, B. and Hanson.J.(1984\. The SociulLogic of Space.C.ambrrdge, CambridgeUniversiry Press. Hodder, I. (1972). Locational models and the study of Romano British settlement.In Clarke, D. L., ed.,Models in Archaeologl, pp. 887-909. London, Merhuen. (1982). $'mbolic anrl Stuctural Archaeology.Cambridge,CambridgeUniversity Press. (1986). Reading the Parr. Cambridge,CambridgeUniversity Press. (1990). The Domesticationol Europe. Oxford, Basil Blackwell. (1997). Always momentar], fluid and flexible: towards a reflexive excavation methodolosy. Antiq uit!, 1 1: 69 l-'7 00. (1999). The ArchaeologicalProcess.Oxford.,Btackwell. Hodder, L and Hassell,M. (197l). The non-randomspacingofRomano,Bdtish walled towns. M4r. N . S .6:3 91 -40 7. Hodder,L and Orton, C. (1916]r. Spatial Anah,sisin Archaeolo7t-.Cambridge,CarnbridgeUniversity Press. HolFJensen, A. ( 1988).Geograpll,: History and Concepts, a Student'sG uide.London,PaulChapman Publishing. (Trans.B. Fulle on.) Hoobler,B. M., Vance,G. F., Hamerlinck, J. D., Munn, L. C. and Hayward, J. A. (2003).Appiicarions ofland evaluationand site assessment. Jorrral of Soil and llater Conser-r,ation, 58: 105 113. Horn, B. K. P ( l98l ). Hill shadingand the reflectance map. Procee dings oJ the IEEE,69: 144'l . Hosmer, D. W. and Lemeshou S. (1989). Applied Logistic Regression.New York, John Wiley & Sons. 
Hudson-Smith,A. and Evans,S. (2003). Viftual cities: from CAD to 3-D GIS. In Longley, p A. and Batty, M., eds.,Adrarced Sp ial Anal))sis:The CASABook ofGIS. pp.41-60. Redlands,CA, ESRI Press. Hunt, E. D. (1992).Upgradingsite-catchment analysiswith the useofGIS: investigating the settlement pattemsof horliculturalists.World Archdeologl, 24: 283 309. Husdal, J. (2000). How to make a straight line square:network analysisin rasterGIS. Unpublished M. Sc.thesis, University ofLeicester. http: / / husdal .com/mscgis/Lhesis/(accessed D5/O'1/2004). Hutchinson,M. F (1989). A new method for gridding elevationand streamline data with automatic removal of pits. Jorrn al of Hydrology. 106 211 232. Hutchinson, M. F. and Do\rling, T. I. (1991). A continentalhydrological assessment of a new grid baseddigital elevationmodel of Australia.It) drological Processes, 5: 45 58. Iliffe, J. C. (2000). Datums and Map Projectiotr.t, for RemoteSensing,GIS, and S ne!-ir8. London, whittles Publishing. Ingold, T. ( I993). The temporality of the lan dscape. lVorld Archaeolog!,251 152-1'11. Izraelevitz, P (2003). A fast algorithm for approximate viewshed computatiorr.Photogrammetric Engineeringand RemoteSensing,69'.'16'7 -'7'l4. lnternational StandardsOrganisation(2003). ISO 19115: 2003 ceographic informarion: Meradara. ht tp : / / dubl incore . org / document s / dc e s / (accessed 08109/2001). Jenks,G. F. and Caspall, F.C. (1971). Error in choroplethmaps:denniton, measurement, reduction. Annals ofthe Associationof Ame can Geographers, 6l: 211-244.



Jensen, J. R. (2002). Remote Sensing of the Environment. New Jersey, Prentice Hall.
Jensen, S. K. and Domingue, J. O. (1988). Extracting topographic structure from digital elevation data for geographic information system analysis. Photogrammetric Engineering and Remote Sensing, 54: 1593-1600.
João, E. M. (1998). Causes and Consequences of Map Generalisation. London, Taylor & Francis.
Jochim, M. A. (1976). Hunter-Gatherer Subsistence and Settlement: A Predictive Model. New York, Academic Press.
Johnston, R. J. (1999). Geography and GIS. In Longley, P. A., Goodchild, M. F., Maguire, D. J. and Rhind, D. W., eds., Geographical Information Systems, Vol. I, Principles and Technical Issues, pp. 39-47. New York, John Wiley & Sons.
Jones, C. (1997). Geographical Information Systems and Computer Cartography. Harlow, Longman.
Jones, N. L., Wright, S. G. and Maidment, D. R. (1990). Watershed delineation with triangle-based terrain models. Journal of Hydraulic Engineering, 116: 1232-1251.
Judge, W. J. and Sebastian, L., eds. (1988). Quantifying the Present and Predicting the Past: Theory, Method and Application of Archaeological Predictive Modeling. Washington, DC, US Bureau of Land Management, Department of Interior, US Government Printing Office.
Kantner, J. (1996). An evaluation of Chaco Anasazi roadways. Paper presented at the 61st SAA Annual Meeting, New Orleans, LA. http://sipapu.ucsb.edu/roads/sAA96.pdf (accessed 20/08/2004).
Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. New York, John Wiley & Sons.
Ketz, D. D. (2001). Managing archaeological surveys through geospatial information. Pipeline and Gas Journal, 228: 37-40.
King, G. (1996). Mapping Reality: An Exploration of Cultural Cartographies. London, Macmillan.
Klein, F. (1939). Elementary Mathematics from an Advanced Standpoint, Vol. 2, Geometry. London, Macmillan. (Trans. E. R. Hedrick and C. A. Noble from the third German edition.)
Kohler, T. A. and Gumerman, G. J., eds. (2000). Dynamics in Human and Primate Societies: Agent-Based Modeling of Social and Spatial Processes. Oxford, Oxford University Press.
Kohler, T. A. and Parker, S. C. (1986). Predictive modelling for archaeological resource location. In Schiffer, M. B., ed., Advances in Archaeological Method and Theory, Vol. 9, pp. 397-452. New York, Academic Press.
Krist, F. J. and Brown, D. G. (1994). GIS modelling of Palaeo-Indian period caribou migrations and viewsheds in northeastern Lower Michigan. Photogrammetric Engineering and Remote Sensing, 60: 1129-1137.
Kvamme, K. L. (1983). Computer processing techniques for regional modeling of archaeological site locations. Advances in Computer Archaeology, 1: 26-52.
(1985). Determining relationships between the natural environment and prehistoric site locations: a hunter-gatherer example. In Carr, C., ed., For Concordance in Archaeological Analysis: Bridging Data Structure, Quantitative Technique and Theory, pp. 208-238. Kansas City, KS, Westport.
(1988). Development and testing of quantitative models. In Judge, W. J. and Sebastian, L., eds., Quantifying the Present and Predicting the Past: Theory, Method and Application of Archaeological Predictive Modeling, pp. 325-428. Washington, DC, US Bureau of Land Management, Department of Interior, US Government Printing Office.
(1990a). The fundamental principles and practice of predictive archaeological modeling. In Voorrips, A., ed., Mathematics and Information Science in Archaeology: A Flexible Framework, Studies in Modern Archaeology 3, pp. 257-295. Bonn, Holos-Verlag.
(1990b). GIS algorithms and their effects on regional archaeological analyses. In Allen, K. M. S., Green, S. W. and Zubrow, E. B. W., eds., Interpreting Space: GIS and Archaeology, pp. 112-125. London, Taylor & Francis.
(1990c). One-sample tests in regional archaeological analysis: new possibilities through computer technology. American Antiquity, 55: 367-381.



(1990d). Spatial autocorrelation and the Classic Maya collapse revisited: refined techniques and new conclusions. Journal of Archaeological Science, 17: 197-207.
(1992a). A predictive site location model on the High Plains: an example with an independent test. Plains Anthropologist, 37: 19-40.
(1992b). Terrain form analysis of archaeological location through geographic information systems. In Lock, G. and Moffett, J., eds., Computer Applications and Quantitative Methods in Archaeology 1991, British Archaeological Reports International Series 577, pp. 127-136. Oxford, Tempus Reparatum.
(1993). Spatial statistics and GIS: an integrated approach. In Andresen, J., Madsen, T. and Scollar, I., eds., Computing the Past: Computer Applications and Quantitative Methods in Archaeology 1992, pp. 91-103. Aarhus, Aarhus University Press.
Kwan, M. P. (2002). Is GIS for women? Reflections on the critical discourse in the 1990s. Gender, Place and Culture, 9: 271-279.
Ladefoged, T. N. and Pearson, R. (2000). Fortified castles on Okinawa Island during the Gusuku Period, AD 1200-1600. Antiquity, 74: 404-412.
Lake, M. W. (2000a). MAGICAL computer simulation of Mesolithic foraging. In Kohler, T. A. and Gumerman, G. J., eds., Dynamics in Human and Primate Societies: Agent-Based Modelling of Social and Spatial Processes, pp. 107-143. New York, Oxford University Press.
(2000b). MAGICAL computer simulation of Mesolithic foraging on Islay. In Mithen, S. J., ed., Hunter-Gatherer Landscape Archaeology: The Southern Hebrides Mesolithic Project 1988-98, Vol. 2, Archaeological Fieldwork on Colonsay, Computer Modelling, Experimental Archaeology, and Final Interpretations, pp. 465-495. Cambridge, The McDonald Institute for Archaeological Research.
(2004). Being in a simulacrum: electronic agency. In Gardner, A., ed., Agency Uncovered: Archaeological Perspectives on Social Agency, Power and Being Human, pp. 191-209. London, UCL Press.
Lake, M. W. and Woodman, P. E. (2000). Viewshed analysis of site location on Islay. In Mithen, S. J., ed., Hunter-Gatherer Landscape Archaeology: The Southern Hebrides Mesolithic Project 1988-98, Vol. 2, Archaeological Fieldwork on Colonsay, Computer Modelling, Experimental Archaeology, and Final Interpretations, pp. 497-503. Cambridge, The McDonald Institute for Archaeological Research.
(2003). Visibility studies in archaeology: a review and case study. Environment and Planning B: Planning and Design, 30: 687-707.
Lake, M. W., Woodman, P. E. and Mithen, S. J. (1998). Tailoring GIS software for archaeological applications: an example concerning viewshed analysis. Journal of Archaeological Science, 25: 27-38.
Land Information New Zealand (2003). New Zealand Government Geospatial Metadata Standard v1.2 Draft: Part 1 - Profile Definition. www.linz.govt.nz/rcs/linz/pub/web/root/core/Topography/ProjectsAndProgrammes/geospatialmetadata/index.jsp (accessed 08/09/2004).
Langmuir, E. (1997). Mountaincraft and Leadership. Glasgow, The Scottish Sports Council/The Mountain Leader Training Board.
Levine, N. (2002). CrimeStat: A Spatial Statistics Program for the Analysis of Crime Incident Locations (v 2.0). Houston, TX, Ned Levine & Associates/Washington, DC, National Institute of Justice.
Lewis, M. and Wigen, K. (1997). The Myth of Continents: A Critique of Metageography. Berkeley, CA, University of California Press.
Lillesand, T. M., Chipman, J. W. and Kiefer, R. W. (2003). Remote Sensing and Image Interpretation, 5th edition. New York, John Wiley & Sons.
Lillesand, T. M. and Kiefer, R. W. (2000). Remote Sensing and Image Interpretation, 4th edition. New York, John Wiley & Sons.



Lindley, D. V. and Scott, W. F. (1984). Cambridge Elementary Statistical Tables. Cambridge, Cambridge University Press.
Llobera, M. (1996). Exploring the topography of mind: GIS, social space and archaeology. Antiquity, 70: 612-622.
(2000). Understanding movement: a pilot model towards the sociology of movement. In Lock, G., ed., Beyond the Map: Archaeology and Spatial Technologies, pp. 65-84. Amsterdam, IOS Press.
(2001). Building past landscape perception with GIS: understanding topographic prominence. Journal of Archaeological Science, 28: 1005-1014.
(2003). Extending GIS-based visual analysis: the concept of 'visualscapes'. International Journal of Geographical Information Science, 17: 25-48.
Lloyd, C. D. and Atkinson, P. M. (2004). Archaeology and geostatistics. Journal of Archaeological Science, 31: 151-164.
Lock, G. (2003). Using Computers in Archaeology. London, Routledge.
Lock, G., Bell, T. and Lloyd, J. (1999). Towards a methodology for modelling surface survey data: the Sangro Valley Project. In Gillings, M., Mattingly, D. and van Dalen, J., eds., Geographical Information Systems and Landscape Archaeology, The Archaeology of Mediterranean Landscapes 3, pp. 55-63. Oxford, Oxbow Books.
Lock, G. R. and Harris, T. M. (1996). Danebury revisited: an English Iron Age hillfort in a digital landscape. In Aldenderfer, M. and Maschner, H. D. G., eds., Anthropology, Space and Geographic Information Systems, pp. 214-240. Oxford, Oxford University Press.
Longley, P. A., Goodchild, M. F., Maguire, D. J. and Rhind, D. W. (1999). Introduction. In Longley, P. A., Goodchild, M. F., Maguire, D. J. and Rhind, D. W., eds., Geographical Information Systems, Vol. I, Principles and Technical Issues, pp. 1-20. New York, John Wiley & Sons.
(2005). Geographic Information Systems and Science, 2nd edition. Chichester, John Wiley & Sons.
Loots, L. (1997). The use of projective and reflective viewsheds in the analysis of the Hellenistic city defence system at Sagalassos, Turkey. Archaeological Computing Newsletter, 49: 12-16.
Lowry, R. (2003). Concepts and applications of inferential statistics. http://faculty.vassar.edu/lowry/webtext.html (accessed 08/07/03).
Mackie, Q. (2001). Settlement Archaeology in a Fjordland Archipelago: Network Analysis, Social Practice and the Built Environment of Western Vancouver Island, British Columbia, Canada since 2000 BP. British Archaeological Reports International Series 926. Oxford, Archaeopress.
Macnab, N. (2003). Anglo-Scandinavian, Medieval and Post-Medieval Urban Occupation at 41-49 Walmgate, York, UK. www.yorkarchaeology.co.uk/wgate/ (accessed 07/09/2004).
Madry, S. and Rakos, L. (1996). Line-of-sight and cost surface techniques for regional archaeological research in the Arroux river valley. In Maschner, H. D. G., ed., New Methods, Old Problems: Geographic Information Systems in Modern Archaeological Research, pp. 104-126. Carbondale, IL, Southern Illinois University Center for Archaeological Investigations.
Manber, U. (1989). Introduction to Algorithms: A Creative Approach. Reading, MA, Addison-Wesley.
Manly, B. F. J. (1991). Randomization and Monte Carlo Methods in Biology. London, Chapman & Hall.
Marble, D. F. (1990). The potential methodological impact of geographic information systems on the social sciences. In Allen, K. M. S., Green, S. W. and Zubrow, E. B. W., eds., Interpreting Space: GIS and Archaeology, pp. 9-21. London, Taylor & Francis.
Marean, C. W., Abe, Y., Nilssen, P. J. and Stone, E. C. (2001). Estimating the minimum number of skeletal elements (MNE) in zooarchaeology: a review and a new image-analysis GIS approach. American Antiquity, 66: 333-348.
Maris, M. and te Boekhorst, R. (1996). Exploiting physical constraints: heap formation through behavioural error in a group of robots. In IEEE/RSJ, ed., Intelligent Robots and Systems, Proceedings of the 1996 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 96), Part III, pp. 1655-1661. New Jersey, IEEE.

References


Maschner, H. D. G. and Stein, J. W. (1995). Multivariate approaches to site location on the northwest coast of North America. Antiquity, 69: 61-73.
Matheron, G. (1971). The Theory of Regionalized Variables and its Applications. Les Cahiers du Centre de Morphologie Mathématique, Fasc. 5. Fontainebleau, Centre de Géostatistique.
McIlhagga, D. (1997). Optimal path delineation to multiple targets incorporating fixed cost distance. Unpublished Honours B.A. thesis, Carleton University.
McKee, B., Sever, T. and Sheets, P. (1994). Prehistoric footpaths in Costa Rica: remote sensing and field verification. In Sheets, P. and McKee, B., eds., Archaeology, Volcanism, and Remote Sensing in the Arenal Region, Costa Rica, pp. 142-157. Austin, TX, University of Texas Press.
Menard, S. (2001). Applied Logistic Regression Analysis. Thousand Oaks, CA, Sage Publications.
Merwin, D. A., Cromley, R. G. and Civco, D. L. (2002). Artificial neural networks as a method of spatial interpolation for digital elevation models. Cartography and Geographic Information Science, 29: 99-110.
Milner, G. R. (1996). Native American interactions: multiscalar analyses and interpretations in the Eastern Woodlands. Book reviews. Antiquity, 70: 992-995.
Minetti, A. E., Ardigò, L. P. and Saibene, F. (1993). Mechanical determinants of gradient walking. Journal of Physiology, 471: 725-735.
Mitas, L., Mitasova, H., Brown, W. M. and Astley, M. (1996). Interacting fields approach for evolving spatial phenomena: application to erosion simulation for optimized land use. In Proceedings, Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, NM, January 21-26, 1996. Santa Barbara, CA, National Center for Geographic Information and Analysis. www.ncgia.ucsb.edu/conf/SANTA_FE_CD-ROM/main.html (accessed 29/09/04).
Mitasova, H., Mitas, L., Brown, W. M., et al. (1995). Modelling temporally and spatially distributed phenomena: new methods and tools for GRASS GIS. International Journal of Geographic Information Systems, 9: 433-446.
Mithen, S. J. (1989). Evolutionary theory and post-processual archaeology. Antiquity, 63: 483-494.
ed. (2000). Hunter-Gatherer Landscape Archaeology: The Southern Hebrides Mesolithic Project 1988-98, Vol. 2, Archaeological Fieldwork on Colonsay, Computer Modelling, Experimental Archaeology, and Final Interpretations. Cambridge, McDonald Institute for Archaeological Research.
Mlekuz, D. (2004). Listening to the landscapes: modelling soundscapes in GIS. Internet Archaeology, 16. http://intarch.ac.uk/journal/issue16/mlekuz_toc.html (accessed 23/05/05).
Monk, D. (2001). Spatial Statistics v. 1.0 for ArcView. http://arcscripts.esri.com/details.asp?dbid=11828 (accessed 03/09/2004).
Monmonier, M. (1991). How to Lie with Maps. Chicago, IL, University of Chicago Press.
Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37: 17-23.
Müller, J. (1988). The Chambered Cairns of the Northern and Western Isles, Occasional Paper 16. Edinburgh, Department of Archaeology, University of Edinburgh.
Nearing, M. A., Lane, L. J. and Lopes, V. L. (1994). Modeling soil erosion. In Lal, R., ed., Soil Erosion Research Methods, 2nd edition, pp. 127-156. Delray Beach, FL, Soil and Water Conservation Society/St. Lucie Press.
Neiman, F. D. (1997). Conspicuous consumption as wasteful advertising: a Darwinian perspective on spatial patterns in Classic Maya terminal monument dates. In Barton, M. C. and Clark, G. A., eds., Rediscovering Darwin: Evolutionary Theory and Archaeological Explanation, Archaeological Papers of the American Anthropological Association 7, pp. 267-290. Washington, DC, American Anthropological Association.
Neteler, M. and Mitasova, H. (2002). Open Source GIS: A GRASS GIS Approach. Boston, MA, Kluwer Academic Press.



Newman, M. (2002). SMR Content and Computing Survey 2002. London, Data Services Unit, English Heritage.
Nigro, J. D., Ungar, P. S., de Ruiter, D. J. and Berger, L. R. (2003). Developing a Geographic Information System (GIS) for mapping and analysing fossil deposits at Swartkrans, Gauteng Province, South Africa. Journal of Archaeological Science, 30: 317-324.
Open GIS Consortium (1999). Document 99-049: OpenGIS Simple Features Specification for SQL, revision 1.1. http://www.opengis.org/techno/specs/99-049.pdf (accessed 21/5/03).
(2001). The OpenGIS Abstract Specification. Topic 11: OpenGIS Metadata (ISO/TC 211 DIS 19115). http://portal.opengeospatial.org/files/?artifact_id=1094 (accessed 22/08/05).
Ord, J. K. and Getis, A. (1995). Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27: 286-306.
Pannatier, Y. (1996). VARIOWIN: Software for Spatial Data Analysis in 2D. New York, Springer-Verlag.
Passmore, R. and Durnin, J. V. G. A. (1955). Human energy expenditure. Physiological Reviews, 35: 801.
Pélissier, R. and Goreaud, F. (2001). A practical approach to the study of spatial structure in simple cases of heterogeneous vegetation. Journal of Vegetation Science, 12: 99-108.
Peng, Z.-R. and Tsou, M.-H. (2003). Internet GIS: Distributed Geographic Information Services for the Internet and Wireless Networks. New York, John Wiley & Sons.
Peregrine, P. (1995). Networks of power: the Mississippian world system. In Nassaney, M. S. and Sassaman, K. E., eds., Native American Interactions: Multiscalar Analyses and Interpretations in the Eastern Woodlands, pp. 247-265. Knoxville, TN, University of Tennessee Press.
Perlès, C. (1999). The distribution of magoules in Eastern Thessaly. In Halstead, P., ed., Neolithic Society in Greece, Sheffield Studies in Aegean Archaeology, pp. 42-56. Sheffield, Sheffield Academic Press.
(2001). The Early Neolithic in Greece. Cambridge, Cambridge University Press.
Peterson, M. (2004). Developing predictive models for prehistoric settlement patterns on the High Plains of Western Nebraska. Unpublished M.Sc. thesis, Institute of Archaeology, University College London.
Philip, G., Donoghue, D., Beck, A. and Galiatsatos, N. (2002). CORONA satellite photography: an archaeological application from the Middle East. Antiquity, 76: 109-118.
Pickles, J. (1999). Arguments, debates, and dialogues: the GIS-social theory debate and the concern for alternatives. In Longley, P. A., Goodchild, M. F., Maguire, D. J. and Rhind, D. W., eds., Geographical Information Systems, Vol. I, Principles and Technical Issues, pp. 49-60. New York, John Wiley & Sons.
Pollard, J. and Gillings, M. (1998). Romancing the stones: towards an elemental and virtual Avebury. Archaeological Dialogues, 5: 140-164.
Popper, K. (1963). Conjectures and Refutations: The Growth of Scientific Knowledge. London, Routledge & Kegan Paul.
Portugali, J. and Sonis, M. (1991). Palestinian national identity and the Israeli labour market: Q-analysis. The Professional Geographer, 46: 256-279.
Poulter, A. and Kerslake, I. (1997). Vertical photographic site recording: the 'Holmes Boom'. Journal of Field Archaeology, 24: 221-232.
Powlesland, D. (1997). Publishing in the round: a role for CD-ROM in the publication of archaeological fieldwork results. Antiquity, 71: 1062-1066.
(1998). The West Heslerton assessment. Internet Archaeology, 5. http://intarch.ac.uk/journal/issue5/westhes_toc.html (accessed 30/08/2004).
Powlesland, D., Clemence, H. and Lyall, J. (1998). West Heslerton: WEB CD: the application of HTML and WEB tools for creating a distributed excavation archive in the form of a WEB CD. Internet Archaeology, 5. http://intarch.ac.uk/journal/issue5/westhescd_toc.html (accessed 31/08/2004).
Powlesland, D., Lyall, J. and Donoghue, D. (1997). Enhancing the record through remote sensing. The application and integration of multi-sensor, non-invasive remote sensing techniques for the enhancement of the Sites and Monuments Record. Heslerton Parish Project, N. Yorkshire, England. Internet Archaeology, 2. http://intarch.ac.uk/journal/issue2/pld_index.html (accessed 05/11/2004).
Premo, L. (2004). Local spatial autocorrelation statistics quantify multi-scale patterns in distributional data: an example from the Maya Lowlands. Journal of Archaeological Science, 31: 855-866.
Ray, E. T. and McIntosh, J. (2002). Perl & XML. Cambridge, MA, O'Reilly.
Relph, E. (1976). Place and Placelessness. London, Pion.
Renfrew, C. (1976). Before Civilisation: The Radiocarbon Revolution and Prehistoric Europe. Harmondsworth, Pelican.
Renfrew, C. and Dixon, J. (1976). Obsidian in Western Asia: a review. In Sieveking, G. de G., Longworth, I. H. and Wilson, K. E., eds., Problems in Economic and Social Archaeology, pp. 137-150. London, Duckworth & Co.
Renfrew, C., Dixon, J. E. and Cann, J. R. (1968). Further analysis of Near East obsidian. Proceedings of the Pr