
GIS Manual Index File

A SELF TEACHING STUDENT'S MANUAL FOR GIS


Module 1. Geographic Information Systems
Module 2. Raster GIS: An Introduction
Module 3. Vector GIS: An Introduction
Module 4. Managing Attribute Data in GIS
Module 5. Integrating Remote Sensing with GIS

Modules written by
George Cho Ph.D., Associate Professor, School of Resource, Environmental & Heritage Sciences
Division of Science and Design, University of Canberra, ACT 2601, AUSTRALIA.

Citation
To reference this material, the correct citation for this page is as follows: Cho, G. (1995) A Self-Teaching Student's Manual for Geographic Information Systems. (Insert here the Module Number and Name) Canberra: University of Canberra and CAUT. Online URL, http://infosyslaw.canberra.edu.au/gismodules/index.html, as of (...) [date and time].
Send an e-mail message to cho@scides.canberra.edu.au if you have comments and suggestions about this web site. Copyright 1999 George Cho. Last updated: Feb 12, 2001.

http://infosys-law.canberra.edu.au/gismodules/index.html [09.06.04 09:09:47]

GIS_Module 1--tabl_introduction

GEOGRAPHIC INFORMATION SYSTEMS


MODULE 1

TABLE OF CONTENTS
Preface
Acknowledgments
Introduction
Materials Required
Aims
Objectives
1. Geographic, Information and Systems
1.1 Definition
1.2 A Simple Approach
1.3 Geographic and Geographical
1.3.1 Locational and Spatial Questions
1.3.2 Spatial Patterns and Spatial Processes
1.4 Information
1.5 Systems
1.6 A Quick Quiz
2. GIS: What, How and Why
2.1 A Brief History of GIS
2.2 GIS and Other Disciplines
2.3 Applications of GIS
2.4 A Critique of GIS
2.5 Another Quick Quiz
3. GIS: Nuts and Bolts
3.1 Basic Elements
3.2 GIS Viewpoints
3.3 GIS: Basic Questions
3.4 Requirements of a GIS
Summary
Further Reading
Revision
Working with PCs
Glossary

PREFACE
Geographic Information Systems is the first of five modules in the series A Self-Teaching Student's Manual for GIS. This manual is the result of work undertaken for a Committee for the Advancement of University Teaching (CAUT) National Teaching Development Grant for 1995. Four other modules follow this one, addressing raster systems, vector systems, managing attribute data and integrating remote sensing with GIS. To complete this self-contained unit successfully, users should be prepared to spend approximately ten hours reading and working with the manual, writing up results, doing extra reading and attempting an assessment exercise. We guarantee that you will have achieved the aims and objectives of this module if you abide by the instructions given in the best practice guarantee agreement appended below! The presentation style used in this and the following modules may be described as a spiral curriculum. In such a curriculum, the contents of the present module are revisited in a following module, in more depth and detail, the next time the same or similar concepts are encountered. In general, there are four parts to a module:

1. the text presents both the conceptual and practical aspects of the module with examples from as many usages as possible;
2. diagrams, figures and other illustrative materials are used to explain and show relationships;
3. questions, exercises and problems to be solved, and an assessment; and
4. suggestions for further reading and research.
(See the curriculum chart in Figure 1.1.)

Figure 1.1 Interlocking modules of the spiral curriculum



It is advisable that you use this workbook as your personal notepad. A highlighter or bright coloured biro for underlining text will help you identify important points. In this workbook all important concepts, words and phrases are set in bold letters, and words and phrases that carry meanings different from their usual ones are italicised. To begin with, browse through this workbook very quickly just to get a feel for its contents. Reading this preface helps! A tutor may walk you through this workbook, but the pace may or may not suit you. You should try to go through this workbook at a pace with which you are comfortable. The appendices are an important component of the overall module because they contain important tools: hints on using computers, a glossary, an index, and some answers to workbook problems.

Geographic Information Systems


Your Guarantee* Geographic Information Systems will:
- be one of the most interesting units that you will do at this University;
- change the way you look at locations and space and your place in it;
- give you a solid foundation for spatial studies in applied science;
- be intellectually stimulating and physically challenging; and
- take up a great deal of your time and energy over the semester.

*This guarantee is invalid if:

a. you do not study for at least ten hours in addition to the ten hours you will spend going through this Manual; or
b. you do not ask the tutors about anything you do not understand; or
c. you do not accept responsibility for your own work habits, attitudes and learning situations.

George Cho



Adapted from an idea by John Dearn (pers. comm. 1995).

ACKNOWLEDGMENTS
I wish to thank the grants committee of CAUT for giving me this opportunity to prepare the student manuals and to contribute to the dissemination of knowledge in geographic information systems. The CAUT grant also enabled me to use the services of Tom Harradine as a research assistant, who was able not only to learn the subject-matter very quickly but also to help assemble the reading material, prepare the computer software and necessary data, and generally bring this project to fruition. Colleagues and students in the School of Resource, Environmental and Heritage Sciences, the Faculty of Applied Science and the Faculty of Environmental Design, University of Canberra, provided a very stimulating learning and teaching environment. I should also like to thank the following individuals and publishers for permission to reproduce their illustrations and examples in this workbook.
John Dearn, Concepts in Biology Handbook, 1995;
Abler, R.S., Adams, J.S. & Gould, P. (1971) Spatial Organization, New Jersey: Prentice Hall, Figure 3-2, p. 57; Figure 3-18, p. 77;
Antenucci et al. (1991) Geographic Information Systems: A Guide to the Technology, New York: Chapman & Hall, Figure 8-13, p. 174;
Berry, B.J.L. (1964) Annals, Association of American Geographers, v. 54, p. 5;
Dangermond, J. (1990) in Peuquet, D.F. & Marble, D.F. Introductory Readings in Geographic Information Systems, New York: Taylor & Francis, Figure 3, p. 36;
GIS Tutor 2 Overview (1993), London: Longman, p. 5;
Maguire, D.F. (1991) in Maguire, D.F., Goodchild, M.F. & Rhind, D. (eds.) Geographical Information Systems, London: Longman, p. 13, Figure 1.1.

INTRODUCTION
This first module on geographic information systems (GIS) is an important one because it is the fundamental building block for all the workbooks that follow. Module 2, for example, deals with raster systems, while Module 3 is devoted to vector systems. These are two different ways of handling data in a GIS. For example, any irregularly shaped area may be represented on a paper map as having a certain shape, size and orientation. In a GIS such an area can be set out in vector form or as a raster. A vector is a line which represents a side of an area and has a certain length and direction. A raster, on the other hand, is a grid of cells of regular dimensions, each cell having a value of colour intensity or shading that depicts a density, so that together the cells form a discernible picture. Module 4 introduces how to manage the attributes of a map or a database, since such attributes are an important component of any GIS. Finally, Module 5 presents the integration of remotely sensed data and imagery with GIS, given that GIS is a technology driven by data.

This workbook is presented in three parts. The first section concentrates on the components of the trilogy of words geographic, information and systems. The second section discusses what constitutes a GIS, how GISs work, how they are used and to what benefit. The third section discusses particular issues concerning GISs, including the hardware and software involved, data availability in terms of spatial data and data models, topology and Boolean operations on spatial objects, the analytical capabilities of GISs, functions and opportunities. At the end of each section there are some quiz questions associated with that section. An assessment exercise concludes this workbook. Work on a computer and demonstrations are included to introduce newcomers to the hardware, software and related commands.

MATERIALS REQUIRED
- Bright coloured biro or highlighter.
- Sharp pencils, preferably HB (hard-black), rulers, erasers.
- A4 mm graph paper, tracing paper.
- Access to a personal computer (Intel-based IBM-compatible PC).

AIMS
This module aims to enable students to:

1. Understand what geographic information systems (GIS) are in their many forms and guises, and their applications.
2. Appreciate and understand the many uses of GISs through examples and demonstrations.
3. Gain a knowledge of the history of, and the disciplines contributing to, GISs and the directions of future development.
4. Evaluate the capabilities and shortcomings of GIS from a study of its basic components.
5. Comprehend the hardware, software, technical and organisational issues involved in supporting and implementing GISs in a workplace.
6. Become excited about the possibilities and potential of this new information technology tool for maintenance, planning and decision-making in any area of application.
7. Appreciate the use of computers in processing spatial information.
8. Develop work skills related to using computers in any environment: data processing, word processing and mapping techniques.

Objectives
As a result of completing work related to this unit students should be able to undertake the following tasks with a certain level of understanding and competence.

1. Describe the various types of GISs and their applications, and differentiate GISs from mere data banks and mapping packages.
2. Explain the various applications and uses of GIS.
3. Relate the history and development of GIS and the various disciplines which have contributed to this technology, as well as provide an informed assessment of the future directions of this new tool.
4. Provide an objective assessment of the capabilities and relative merits of a GIS.
5. Evaluate the relative needs of different systems in terms of hardware, software and other technical requirements, as well as some of the management and organisational issues in implementing a GIS.
6. Communicate the possibilities and potential of this new information technology tool in a lively and animated way.
7. Use computers uninhibitedly and imaginatively for any task, without fear of embarrassment or loss of face, or inhibitions of any kind that may occur in face-to-face teaching and learning situations.
8. Hone computer skills to a high level of competence through constant practice, and adopt best practice work habits.

Send an e-mail message to cho@science.canberra.edu.au with your comments and suggestions about this web site. ISBN 0 85889 4793. Copyright 1995 George Cho. Last updated: Feb. 13, 2001.


MODULE 1 Chap1A




1 GEOGRAPHIC, INFORMATION AND SYSTEMS


1.1 DEFINITION
Geographic information systems (GISs) mean different things to different people, from the simplistic to the perverse. To some, a filing cabinet containing maps of different kinds represents a GIS in its most basic form. However, while the information on the geography of places is all there in the maps, it may not be readily usable because of the way it is depicted on the maps. Hence, relationships between places and things are not easily apparent to the unschooled. On the other hand, a GIS may seem to be part of a government information system, or a newfangled alphabet soup of information technology that is understandable only through the acronyms and jargon of the trade. There is no single definition of a GIS because the experts themselves disagree about what to include in, and exclude from, the definition. To appreciate these differences, six contrasting definitions are listed below. They have been selected to highlight the importance of different aspects of GIS as perceived by various authors over time.

Tomlinson (1972: ii) definition: "not a field by itself but rather the common ground between information processing and the many fields utilizing spatial analysis techniques... One class or category of information system and is therefore defined by specifying the particular characteristics that would qualify an information system as geographical."

Rhind (1981: 17) definition: "[a term] normally used to describe general-purpose and extensible computer facilities which handle data pertaining to areas of ground or to individuals or groups of people who can be defined as living or working in specific geographical locations.... The term is restricted to those computer systems which have the capability to interrelate data sets pertaining to different variables and/or to different moments in time."

Lord Chorley (1987: 132) definition: "a system for capturing, storing, checking, manipulating, analysing and displaying data which are spatially referenced to the Earth."

Goodchild & Kemp (1990: 1) definition: "a particular form of Information System applied to geographical data... A system of hardware, software and procedures designed to support the capture, management, manipulation, analysis, modeling and display of spatially referenced data for solving complex planning and management problems."

Environmental Systems Research Institute (ESRI) (1990: 1-2) definition: "an organised collection of computer hardware, software, geographic data, and personnel designed to efficiently capture, store, update, manipulate, analyze and display all forms of geographically referenced information ... a computer system capable of holding and using data describing places on the earth's surface."

Epstein (1990: 490) definition: "a tool for decision-making and an aid for planning and development, consisting of a database containing spatially referenced land-related data, as well as the procedures and techniques for systematically collecting, updating, processing and distributing those data."

In reproducing the definitions above, there is no value judgement that any one definition is better than another, much less that any one definition is a final statement on the matter. And because the list is highly selective, there is no implication that the other definitions used by various authors, say, in Maguire, Goodchild & Rhind (1991), lack merit. Rather, the list usefully shows the development of the discipline: as a tool and an aid, as part of a computer-based information system, and as one class of information system using information processing and spatial analysis techniques. It also shows how GIS has moved from strictly academic research to become a useful tool, an activity, and a scientific art of handling spatial data. In sum, there is no single definition of GIS, save to say that it is a collective noun embracing the tools of scientific analysis and the use of computers, applications and concepts.

1.2 A SIMPLE APPROACH


One of the many ways to develop a deeper understanding of the various definitions of GIS is to disaggregate each word in G.I.S. and proceed from there. This single-word association analysis requires one to think carefully about each word by itself and about how it relates to the collective noun. It involves examining, firstly, what kinds of issues would arise from, say, purely geographic themes. Giving incorrect map locations would be one such issue, and giving wrong directions as a navigator in a car would drive many a family to distraction. Secondly, information themes could be analysed not only for their content but also for the primary sources from which the data are derived. Finally, the systems part of the noun suggests how the geography and the information fit together in a systematic whole. This approach is important not only in helping to build a conceptual framework on which to hang our ideas but also in seeing how the technology may be used in the socioeconomic environment.

1.3 GEOGRAPHIC AND GEOGRAPHICAL


Both geographic and geographical are adjectives derived from the root word geography, meaning the study of the Earth (from the Greek gē = Earth, and graphein = to write or describe). Thus, while geographic means of or relating to geography, geographical has the connotation of belonging to or characteristic of a particular region (Webster's New Ideal Dictionary, 1978: 211). Most authors prefer geographic in GIS, although one significant publication uses geographical in its title (Maguire, Goodchild & Rhind 1991). Here it is immaterial which shade of meaning is preferred, because what follows outlines not only the words associated with the study of the Earth but also the features and characteristics that make any place special, different, distinctive and unique.



For nearly 2,200 years of its existence as a discipline, geography has been concerned primarily with accurately describing the location of places. Until the basic task of accurately mapping places on the Earth's surface was completed, geographers had little time or interest in asking what existed at places and why, because so much of the Earth was unknown and uncharted. Once that task was largely completed, and the Western world had expanded beyond its boundaries and filled in the world map, geographers by about the early 19th C. turned their attention to describing places and telling us what occurred at places and why. Thus, the familiar themes of quiz competitions, such as naming the highest mountains, the longest rivers and the capital cities of exotic places, became part and parcel of many a geography lesson in the 1940s and 1950s. In the early 1960s such descriptive geography gave way to a quantitative revolution, which introduced analytical methods and the study of spatial relationships between phenomena distributed on the Earth's surface. It is here that the new spatial context of GIS and its related computer technology comes into play. Yet the basic questions of geography have not disappeared. Indeed, these questions have become even more important, given that they can now be better defined in both relative and absolute terms.

1.3.1 Locational and Spatial Questions
A basic geographical question is: why are spatial distributions structured the way they are? This question is fundamental to the science of geography because it implies both a spatial distribution and a spatial process (Abler, Adams & Gould 1971: 56). A spatial distribution is the frequency with which something occurs in a space. Such distributions may be in one-, two-, three- and n-dimensional spaces, that is, length, breadth, height and n dimensions in hyperspace respectively. Time is sometimes considered as the fourth dimension.
Histograms are often used to describe distributions in one-dimensional space. By placing a cell along a line (one axis) each time a value is observed, we produce a two-dimensional visual expression of the frequency of occurrence of a variable in one-dimensional numeric space (see Figure 1.2a). In two-dimensional numeric space the same principles apply, except that there are two axes instead of one and for each observation there are two variables, x and y, to plot. Here we produce a scatter plot (Figure 1.2b). This is useful because it reveals the nature of any systematic relationship between the two variables in numeric space. It is also a starting point for further analysis of the distribution of a phenomenon. The x- and y-axes can also be thought of as longitude and latitude, which define terrestrial space. Distributions may also be observed in three-dimensional statistical space or terrestrial space. Plotting an observation according to three variables, such as latitude, longitude and height, produces a distribution in three-dimensional space (Figure 1.2c). It is difficult to visualize more than three dimensions, but mathematicians and physicists tell us that hyperspaces of 4, 6, 10 or n dimensions can be defined and analysed.
Figure 1.2 Spatial distributions in numeric space
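The histogram described above, a cell placed along one axis each time a value is observed, amounts to counting observations in equal-width bins. A minimal Python sketch, assuming equal-width bins whose lower edges are multiples of the bin width; the function names and the sample values are hypothetical, chosen only for illustration.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count how many observations fall into each equal-width bin.

    Returns a dict mapping each bin's lower edge to its count,
    ordered from the lowest bin to the highest.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def text_histogram(values, bin_width):
    """Draw the histogram as text: one '#' cell per observation in each bin."""
    return [f"{edge:>6} | {'#' * n}" for edge, n in histogram(values, bin_width).items()]


# Hypothetical observations of a single variable (one-dimensional space).
observations = [12, 15, 18, 22, 25, 27, 29, 31, 44]
for line in text_histogram(observations, 10):
    print(line)
```

For these observations, `histogram(observations, 10)` gives `{10: 3, 20: 4, 30: 1, 40: 1}`. A scatter plot in two-dimensional space would instead keep each (x, y) pair as a point rather than counting values into bins.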



The two things to note here are: (1) a distribution is the frequency of occurrence of a phenomenon in space, and (2) the scale at which a space is examined determines the nature of the distribution. Spatial distributions may be composed of like or unlike things, and they may be ubiquitous (found everywhere) or localized. Areal variation, which describes the spatial differences in occurrence (that is, the spread of a phenomenon over space) and density (that is, the concentration of the phenomenon in one place), is characteristic of almost all distributions in terrestrial space. Spatial differences in occurrence manifest themselves as a pattern, while density may be shown as a variation in intensity from place to place.



The geographical scale of analysis of any distribution is that which is observable at scales from the local to the terrestrial. The question of where is basic to all geography, but it can be answered in an absolute or a relative way.

Absolute location is a position in relation to a conventional grid system designed for locational and navigational purposes. Canberra, for example, is located at 35°17′ south latitude and 149°08′ east longitude using the Australian Map Grid (AMG). Another description of absolute location is a street address, viz. 90210 Melrose Place, Philip, ACT 2610. The description of the apartment at this place is its site, that is, the plot of ground that it occupies. In both examples anyone can find the location easily, because it does not change through time and is part of an imaginary network of places. The only difference is that the first example is based on a small-scale map, whereas the second must of necessity be based on a street map, which is a large-scale map.

The concept of small-scale and large-scale maps needs a short explanation. Consider a football field measuring 100 metres on a side. On a map at 1:10,000 scale, the field is drawn 1 centimetre on a side. On a map at 1:100,000 scale, the field is drawn 1 millimetre on a side. The field appears larger on the 1:10,000 scale map, so we call this a large-scale map. Conversely, the field appears smaller on the 1:100,000 scale map, so we call this a small-scale map.

Relative location is a position with respect to other locations. For example, the University of Canberra is 14 kilometres north-west of Civic, or it is situated between Civic and Belconnen (that is, its situation). We may also say that Canberra is about three hours' driving time south-west of Sydney. Such relative locations and distances can also be expressed as the cost of travel between two places, the travel time it takes, or the cost of a long-distance phone call.
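Two of the calculations in this passage can be sketched in Python: converting an absolute location written in degrees and minutes into signed decimal degrees, and working out how large a ground distance appears at a given map scale. This is a minimal sketch; the function names are hypothetical, and the sign convention (south and west negative) is a common GIS convention assumed here rather than something stated in this module.

```python
def dms_to_decimal(degrees, minutes, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes and seconds to signed decimal degrees.

    Assumed convention: south latitudes and west longitudes are negative.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


def map_length_mm(ground_length_m, scale_denominator):
    """Length on the map, in millimetres, of a ground distance in metres
    at a representative fraction of 1:scale_denominator."""
    return ground_length_m * 1000 / scale_denominator


# Canberra, 35°17' S, 149°08' E
lat = dms_to_decimal(35, 17, hemisphere="S")
lon = dms_to_decimal(149, 8, hemisphere="E")
print(round(lat, 3), round(lon, 3))

# A 100 m football field at two map scales
for denominator in (10_000, 100_000):
    print(f"1:{denominator:,}", map_length_mm(100, denominator), "mm")
```

With these, Canberra's position becomes roughly (-35.283, 149.133) in decimal degrees, and the 100-metre field measures 10 mm (1 cm) at 1:10,000 but only 1 mm at 1:100,000: the larger the scale denominator, the smaller the scale of the map.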
Because these measures are relative, the locations they describe can also be said to be changing. For example, in the 17th C. it took about six months or more to travel from Liverpool, England to Sydney, Australia, but today the journey takes about 30 hours by jet (and much less by supersonic aircraft), and we can communicate almost instantaneously by electronic means. This phenomenon has been described as time-space convergence, in which distant places are brought closer together through modern technology; the term Global Village has sometimes been used to describe the shrinking world as far as communications are concerned.

1.3.2 Spatial Patterns and Spatial Processes
When the internal organisation of a distribution in space is examined, it may be observed that there is a pattern in the location of the elements of the distribution relative to one another. The pattern may be described as dense, sparse, agglomerated, dispersed or linear. This internal relative location of elements has been called a spatial pattern or spatial structure: the location of each element relative to each other element, and the location of each element relative to all the others taken together. Spatial processes are the mechanisms which produce spatial patterns of distribution over time. As mentioned previously, in the new spatial context relative location and relative distance define a new way of dealing with stretchable and shrinkable spaces, as if these locations were drawn on a rubber sheet that is either under tension or stretched. In the former case, when the tension is released, two locations can be seen to have moved physically closer together, whereas in the latter the locations move apart as the rubber sheet is stretched (see Figure 1.3).
Figure 1.3 Rubber sheeting example



Relative distance is the basis of relative space since spaces are defined by distances along dimensions. Thus, by choosing different distance measures we can change space (see Figures 1.4 and 1.5). Hence, we have different measurement metrics such as the following:

Euclidean space: the shortest distance between two points is a straight line.
Riemann space: the shortest distance between two points is a curved line.
Manhattan space: a variant of Euclidean space in which the shortest distance between two points is a path consisting of line segments which meet at right angles.
Rubber sheeting: force-fits a selected number of points to pre-defined positions, typically defined through coordinate values.
Figure 1.4 Manhattan space
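The distance metrics listed above can be compared directly. The sketch below computes straight-line (Euclidean) and right-angled (Manhattan) distances between planar coordinates, and uses the haversine great-circle formula on a sphere as one concrete example of a shortest path that is a curved line. Treating the Earth as a sphere of radius 6,371 km is a simplifying assumption, and the function names are illustrative only.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two planar points (x, y)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def manhattan(p, q):
    """Distance along segments meeting at right angles, as on a street grid."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance over a sphere, in kilometres.

    On a curved surface the shortest path is an arc, not a straight line.
    The spherical Earth of radius 6,371 km is a simplifying assumption.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


print(euclidean((0, 0), (3, 4)))  # straight line
print(manhattan((0, 0), (3, 4)))  # right-angled path
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5 while the Manhattan distance is 7: choosing a different distance measure changes the space, as the text says.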



Source: Antenucci et al. (1991: 174).
Figure 1.6 Population Space: A cartogram showing the United States in proportion to population, 1 July 1967.



Source: Abler, Adams & Gould (1971: 77 Figure 3-18).

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho. Last updated: Feb. 13, 2001.


Module 1 Chap1B




1.4 INFORMATION
GISs are a technology fed by data. Raw data are transformed into information through initial processing. This implies that the data, as the basic components of a library of data (a data bank), must be in a form usable by computers. When the data are converted into machine-readable form, invariably as a set of digits, they become digital data. The raw data may come in a variety of numeric scales of measurement: nominal, ordinal, interval and ratio.

nominal: nom (French for name) identifies observed differences without assigning a quantity, for example, a suburb, district or town centre.
ordinal: involves ranking, greater than (>) or less than (<), for example in terms of attractiveness or beauty. Thus: B > C > A < D < E.
interval: allows the difference between adjoining pairs in a sequence to vary, thus: B < C by 1.4, A < D by 2.8, etc. But because there is no true zero on this scale, we cannot conclude that D is twice as attractive or beautiful as B.
ratio: preserves the ratio between numerical observations as well as between intervals. There is a true zero, as in the Kelvin scale of temperature. On the Kelvin scale, 20 K is twice as hot as 10 K.
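One way to make the four scales of measurement concrete is to list which comparisons each scale supports, each scale adding one to those of the scale before it. The minimal Python sketch below encodes that cumulative rule; the `Scale` enum and the `permitted` function are hypothetical names used only for illustration. It also shows why temperature ratios only make sense in kelvin: Celsius has an arbitrary zero, so it behaves as an interval scale.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Scales of measurement, ordered so that each adds to the one before."""
    NOMINAL = 1   # categories: same or different
    ORDINAL = 2   # ranks: also greater or less than
    INTERVAL = 3  # equal steps: also meaningful differences
    RATIO = 4     # true zero: also meaningful ratios


def permitted(scale):
    """Comparisons that are meaningful for data measured on the given scale."""
    ops = ["equality", "order", "difference", "ratio"]
    return ops[: int(scale)]


def celsius_to_kelvin(celsius):
    """Celsius has an arbitrary zero; kelvin has a true zero."""
    return celsius + 273.15


print(permitted(Scale.ORDINAL))
print(20 / 10)                                        # ratio on the Kelvin scale
print(celsius_to_kelvin(20) / celsius_to_kelvin(10))  # 20 °C vs 10 °C as a true ratio
```

Here `celsius_to_kelvin(20) / celsius_to_kelvin(10)` is about 1.035, not 2, so 20 °C is not twice as hot as 10 °C, whereas 20 K really is twice 10 K.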

In the context of processing, therefore, when the data are ordered in some way to make them intelligible to humans, we have the beginnings of information. Ironically, once data have become information, computers are neither required nor an essential part of the context. This characterisation of data and information may be taken to higher levels of abstraction, so that knowledge is derived from information and the sum of all knowledge provides an intelligence to the scheme of things. Indeed, it is sometimes claimed that wisdom is the correct use of knowledge.

Figure 1.7 The hierarchical relationship between data, information, knowledge and wisdom



A cardinal rule in scientific inquiry is that observations should be made in such a way that others can obtain the same results. The data that we collect should build up precise and systematic information about our subject rather than result in a series of random impressions. There are a number of ways in which data and information may be put together so that relationships and associations can be observed and the results shown in graphical or tabular form. Thus, the orderly arrangement and display of a set of observations can be thought of as a description of the subject-matter under study, and such descriptions are usually put in words or verbal form. Geographers have relied on maps as a means of description. More recently, the emphasis has been on numerical description, and the map is replaced by a matrix. A matrix is any ordered table with numerical information occupying the cells. The general form of a geographical data matrix is illustrated in Figure 1.8.
Figure 1.8 A geographical data matrix



Source: Berry, B.J.L. (1964) Approaches to regional analysis: A synthesis, Annals, Association of American Geographers, v. 54, pp. 2-11.
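Berry's geographical data matrix can be sketched as a plain table with places as rows and variables as columns. In the Python sketch below, the places, variables and values are entirely hypothetical; the point is only that reading down a column gives one variable's distribution across places, while reading along a row gives the character of one place. Cells can mix scales: here `has_airport` records presence or absence (a nominal attribute coded 0/1), while `population` is a ratio-scale count.

```python
# Hypothetical places (rows) and variables (columns), for illustration only.
places = ["A", "B", "C"]
variables = ["population", "rainfall_mm", "has_airport"]
matrix = [
    [12000, 650, 1],
    [3400, 720, 0],
    [58000, 540, 1],
]


def column(variable):
    """One column: the distribution of a single variable across all places."""
    j = variables.index(variable)
    return {place: values[j] for place, values in zip(places, matrix)}


def row(place):
    """One row: the character of a single place across all variables."""
    i = places.index(place)
    return dict(zip(variables, matrix[i]))


print(column("rainfall_mm"))
print(row("B"))
```

For instance, `column("rainfall_mm")` returns `{'A': 650, 'B': 720, 'C': 540}`, and `row("B")` returns the full profile of place B. A third, time dimension could be added by keeping one such matrix per observation date.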

Along the vertical axis are the units of observation: places, locations or areas numbered in sequence. Along the horizontal axis are the conditions on which observations are taken, also numbered in sequence; these are referred to as the variables or attributes. They are variables because one observation may differ from the next, and they are attributes, measured on a nominal scale, because each observation may have distinctive features or characteristics. Each cell in the matrix is occupied by a number which, depending on the scale of measurement, indicates the presence or absence of an attribute (nominal scale), the rank of the unit of observation (ordinal scale) or the magnitude of the observation on a variable (interval or ratio scale). A completed matrix is therefore rich in information. A column of the matrix shows a single distribution pattern: how one variable varies between areas or locations. A row indicates the character of a particular place in terms of all the variables and attributes collected about it. A third dimension may be added to represent change through time, that is, the trends and processes by which places, and the variables and attributes observed at them, change. Any geographical pattern may also be depicted in terms of points, lines or areas. These three representations have been used as the building blocks for a geographically based information system.

Each of these data types can be represented in one of seven ways: feature data, areal unit information, network topological data, sampling data, surface information, label text information and graphic symbol data (see Figure 1.9). The word polygon is used here as a synonym for area. In formal terms a polygon is simply a closed plane figure bounded by three or more straight sides. Looking at the geographical patterns in both matrices (Figures 1.8 and 1.9) helps bring into focus the important issues arising from any study: what units of observation are to be used, what conditions are to be studied, when, and what units of measurement should be used?

One final observation may be necessary before concluding this section on information. When we collect data and then place them into some order so that they become understandable, we sometimes combine the observations into groups. These groups are made as homogeneous as possible because we are trying to generalize the data. Say we are collecting data on the number of trees in an area: we may choose to classify these data in terms of density, that is, the number of trees per unit area. A map showing tree density will be very informative because it may lead us to ask questions about the soil type underlying some areas, proximity to water, the slope of the land and so on. Such a map can be more useful than a simple plot of the location of each tree.

Figure 1.9 Breakdown of geographic data types and methods of representation
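The generalisation from tree locations to tree density described above can be sketched in a few lines of Python. The tree coordinates, the 200 m by 200 m study area and the 100 m grid cell are assumptions made for the example, and `density_by_cell` is a hypothetical helper. Cells with no trees are simply absent from the result.

```python
from collections import Counter

# Hypothetical tree locations (x, y) in metres within a 200 m x 200 m study area.
trees = [(12, 30), (25, 41), (38, 15), (140, 22), (160, 180), (175, 160), (181, 190)]
CELL = 100  # assumed cell size in metres, so each cell covers 1 hectare


def density_by_cell(points, cell):
    """Generalise point locations into trees per hectare for each grid cell."""
    counts = Counter((x // cell, y // cell) for x, y in points)
    hectares = (cell * cell) / 10_000
    return {c: n / hectares for c, n in counts.items()}


density = density_by_cell(trees, CELL)
for cell, d in sorted(density.items()):
    print(cell, d)
# (0, 0) 3.0
# (1, 0) 1.0
# (1, 1) 3.0
```

Cells (0, 0) and (1, 1) both come out at 3 trees per hectare although the trees within them lie at different positions: once the points are generalised, their individual locations cannot be recovered from the density values.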


But here we encounter a basic paradox. The map based on tree density contains less information than the map showing the location of trees: in classifying trees according to density we lose information through the process of generalisation, and we cannot tell from the density map where each tree is located. Yet the density map conveys information more readily than the tree location map. The conclusion is that raw data often contain more information than we can comprehend, and trying to discover relationships among raw data usually produces a condition of information overload. What we need to achieve, therefore, is a balance between too much information and generalising information to the point that it becomes meaningless.

1.5 SYSTEMS

Systems is the final word in GIS which requires a short discussion. The word systems has become heavily cliché-ridden, in the sense that nearly every scientific discipline uses the term to describe a methodology, a technique or a process. Some have considered the systems view a potentially very powerful analytical tool. The use of the word in GIS suggests an analogy with an organism as an adaptive system, with its system boundaries, articulation with the environment, homeostasis (stability), equilibrium and regulation. Consider the following discussion of an information system.

An information system can be conceived of as a set of interrelated structures that receive an input of data through receptors and process this data by comparing it to memory and values and submitting it to decision. Decision leads to storage of data in memory, and if appropriate, to implementation of decisions through effectors. This affects the environment beyond the system boundaries, causing feedback as part of the data input of the next phase of the input-transform-output process. Moreover, this information input may contain unneeded information or noise, as well as information at varying levels of importance.
For this reason the receptors must scan and select data received prior to processing. This selectivity introduces the possibility of perceptual biases and errors, especially as the load of data input increases. Load may also involve either lag in processing or reduction of lead in forecasting from received data (Garson 1971: 51).

In an overview, Marble (1984: 19) listed the major components of a GIS as follows:

1. A data input subsystem which collects and/or processes spatial data derived from existing maps, remote sensors etc.
2. A data storage and retrieval subsystem which organizes the spatial data in a form which permits it to be quickly retrieved by the user for subsequent analysis, as well as permitting rapid and accurate updates and corrections to be made to the spatial database.
3. A data manipulation and analysis subsystem which performs a variety of tasks such as changing the form of the data through user-defined aggregation rules or producing estimates of parameters and constraints for various space-time optimization or simulation models.
4. A data reporting subsystem which is capable of displaying all or part of the original database, as well as manipulated data and the output from spatial models, in tabular or map form. (See Marble 1984.)

Figure 1.10 Components of a GIS: a simplified overview

To be considered a true GIS, the software system must include all four of the functions noted above and must perform efficiently in all four areas. Thus, digitizing systems which concentrate on data capture with minimal data storage / retrieval capabilities, remote sensing and image processing systems and thematic mapping systems do not qualify as GISs because one or more of the four ingredients are missing. In future it is thought that modelling capabilities should be included as a mandatory function of any true GIS (Marble 1984: 19).
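Marble's four subsystems can be made concrete with a toy Python sketch in which each subsystem is a separate function or class working on one shared store. This is an illustrative assumption, not the architecture of any real GIS: the class and function names, the point coordinates and the `zone` and `area_ha` attributes are hypothetical, and the reporting step produces only a table, not a map.

```python
from dataclasses import dataclass, field


@dataclass
class SpatialDatabase:
    """Storage and retrieval subsystem: features keyed by id."""
    features: dict = field(default_factory=dict)

    def store(self, fid, geometry, attrs):
        self.features[fid] = {"geometry": geometry, "attrs": attrs}

    def update(self, fid, **attrs):
        # Supports corrections and updates to the spatial database.
        self.features[fid]["attrs"].update(attrs)

    def retrieve(self, predicate):
        # Quick retrieval of the features that satisfy a user condition.
        return {fid: f for fid, f in self.features.items() if predicate(f)}


def input_subsystem(db, digitized_records):
    """Data input: load records (e.g. digitized from maps) into the database."""
    for fid, geometry, attrs in digitized_records:
        db.store(fid, geometry, attrs)


def analysis_subsystem(db, key, group_by):
    """Manipulation/analysis: a user-defined aggregation rule (sum by group)."""
    totals = {}
    for f in db.features.values():
        g = f["attrs"][group_by]
        totals[g] = totals.get(g, 0) + f["attrs"][key]
    return totals


def reporting_subsystem(totals):
    """Reporting: tabular output only (map output is omitted in this sketch)."""
    return "\n".join(f"{k:<10}{v:>8}" for k, v in sorted(totals.items()))


db = SpatialDatabase()
input_subsystem(db, [
    (1, (149.1, -35.3), {"zone": "north", "area_ha": 40}),
    (2, (149.2, -35.4), {"zone": "south", "area_ha": 25}),
    (3, (149.1, -35.2), {"zone": "north", "area_ha": 15}),
])
db.update(2, area_ha=30)  # a correction after checking the source map
print(reporting_subsystem(analysis_subsystem(db, "area_ha", "zone")))
```

Removing any one of the four pieces leaves something closer to the partial systems described above, for example a digitizing system with no analysis step.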

1.6 A QUICK QUIZ

1. Examine the various definitions of GIS. What do you learn about the diversity of definitions of GIS? Can you find other definitions of GIS from the perspectives of (a) applications, (b) functions and (c) system structure?
2. Think of the cricketers' worm frequently seen on TV. The number of overs bowled is given on the x-axis and the runs on the y-axis. Is this a scatter plot? Is it one- or two-dimensional? Can any meaningful statistical analysis be performed on such a plot? Why or why not?
3. Differentiate between terrestrial, statistical and numeric space. Give examples.
4. Give examples of occurrence and density of spatial distributions that you have come across.
5. Is a 1:1 million map a large- or small-scale map? Explain this to someone.
6. When we have a scale of 1:50,000 we mean 1 centimeter on the map is 50,000 centimeters on the ground. Equally, we could have said that 1 millimeter on the map is equal to 50,000 millimeters on the ground. Both are correct ways of expressing the scale of the same map. Explain how this is true.
7. Find out what a representative fraction is. Express this in both words and numbers.
8. Give a real-world example of how two locations may move apart in relative terms.
9. What is the minimum permissible personal space that we are allowed in conversation? Is this a physical space or a perceptual one?
10. Examine Jack Dangermond's (1990) figure (Fig. 1.9, page 1-21) very carefully. Study each of the entries in the 21 cells. Are there any overlaps between the cells? Can you think of any omissions?
11. Where would you place the following in the matrix: a bus stop, a hockey field and a street name?


Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho Last updated: Feb. 13, 2001

Module1 chap2

2. GIS: WHAT, HOW AND WHY


In a GIS, geographic data are referenced in such a way as to allow their retrieval, analysis and display on any spatial criterion. These tasks are handled by data processing subsystems, data analysis subsystems and information use subsystems. A GIS is therefore an integrated set of computer programs for handling spatial data. Various terms have been used to describe this technology (see Table 1.1). In this section we discuss a short history of GIS, the other disciplines that contribute to and use GIS, and some applications of this new technology. A critique concludes the section.

Table 1.1 Other terms used to describe Geographic Information Systems

Automated GIS (AGIS)
Automated Mapping & Facilities Management (AM/FM)
Computerised GIS
Environment Information Systems
Geo-Information Systems
Geographically Referenced Information Systems
Image Based Information System
Land Information Systems
Land Resources Information Systems
Multipurpose Cadastre
Multipurpose Geographic Data System
Multipurpose Input Land Use System
Natural Resource Management Information System
Planning Information Systems
Resource Information System
Spatial Information Systems
System for Handling Natural Resources Inventory Data

2.1 A BRIEF HISTORY OF GIS

The initial attempts to apply computer technology to the problems of handling spatial data were associated with military applications. These efforts produced results, but at the cost of massive computing resources. The early problems stemmed not only from the low state of the art in computer technology but also from the special problems posed by digital spatial data. In the mid-1960s the first serious attempt to handle substantial amounts of spatial data by computer was made with the Canada Geographic Information System (CGIS). It is still in operation and remains one of the most cost-effective examples of large-scale spatial data handling. In the beginning all spatial data handling systems were custom-built, but in the late 1970s general-purpose, turnkey systems began appearing, and today the use of such systems is the rule rather than the exception. A turnkey system is an off-the-shelf computer system used without any customisation or modification: it operates, as it were, at the turn of a key.

The purpose of CGIS was to analyze the data collected by the Canada Land Inventory (CLI) and to produce statistics for use in developing land management plans for large areas of rural Canada. The CLI created maps at the relatively large scale of 1:50,000, classifying land by various themes, for example soil capability, recreational potential, wildlife, forestry and present land use. The CLI produced seven primary map layers, each showing locations or areas with homogeneous attributes; other map layers were developed subsequently. Among the technological innovations was the development of
http://infosys-law.canberra.edu.au/gismodules/manual_1/m1_chap2.html (1 von 9) [09.06.04 09:15:44]


new technology because no one had had previous experience in this area and there was no precedent for performing GIS operations such as overlay and area measurement. The high cost of technical development landed the project in trouble when it failed to deliver promised products and capabilities. Nevertheless, the completion of a database and the generation of products by the mid-1970s established CGIS as a model of technological excellence, despite its ageing database. Attempts were made to adapt the system to new data and to add new functionality such as remote access and networking, but these efforts failed to compete with new vendor products of the 1980s and 1990s. (Refer to Goodchild, M.F. & Kemp, K. (eds.) (1990) Introduction to GIS, NCGIA Core Curriculum, Unit 23: History of GIS, Santa Barbara, CA: NCGIA, pp. 23-1 - 23-9.)

The ODYSSEY system was designed by the Laboratory for Computer Graphics and Spatial Analysis at Harvard University. Developed in the mid-1970s, it extended earlier Harvard programs beyond format conversion to a comprehensive analysis package based on vector data, and provided algorithms (step-by-step computational procedures) for polygon overlay and sliver removal. Slivers result when two versions of a line drawn over each other produce tiny areas of non-overlap where what was intended was one neat line. Slivers are also known as digitizing errors because they occur when adjacent polygons overlap. Sometimes the opposite occurs during digitizing and gaps appear between adjacent polygons.

An early Harvard program called SYMAP was developed as a general mapping package. Its output was produced exclusively on a line printer, hence its low resolution, poor quality and limited functionality. But it demonstrated that a computer could make maps and sparked intense interest (see Figure 1.11 for an example of SYMAP output). In the late 1960s CALFORM was developed to produce SYMAP maps on a plotter.
A table of point locations was used to speed up the input of internal polygon boundaries. SYMVU produced 3-D perspective views of SYMAP output, while GRID enabled raster cells to be displayed using the same output techniques as SYMAP and later allowed the multiple overlay of raster cells.
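Slivers are usually recognised by their tiny area or elongated shape. The sketch below flags them by area alone, using the shoelace formula for the area of a polygon. It is one simple heuristic assumed for illustration, not the algorithm used in ODYSSEY; the parcel coordinates and the `min_area` threshold are invented.

```python
def polygon_area(coords):
    """Area of a simple polygon from its ordered vertex list (shoelace formula)."""
    n = len(coords)
    s = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def find_slivers(polygons, min_area):
    """Return ids of polygons whose area falls below a hypothetical threshold."""
    return [pid for pid, coords in polygons.items() if polygon_area(coords) < min_area]


# Two parcels plus a thin sliver such as might be left when a shared
# boundary is digitized twice, slightly differently each time.
polygons = {
    "A": [(0, 0), (100, 0), (100, 50), (0, 50)],
    "B": [(0, 50), (100, 50), (100, 100), (0, 100)],
    "sliver": [(0, 50), (100, 50), (100, 50.4), (0, 50.2)],
}
print(find_slivers(polygons, min_area=100.0))  # ['sliver']
```

In practice an area threshold alone would also flag genuinely small parcels, which is why shape measures are often used as well.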

Figure 1.11 An example of a SYMAP output. Note the use of a line printer and over-typing to produce density shades.


Beginning in the 1970s, POLYVRT allowed conversion between alternative ways of forming area objects. The ODYSSEY system was therefore the culmination of developments at the Harvard Lab. Its purpose was to permit the combination of different sources of data, notably the Dual Independent Map Encoding (DIME) files from the US Bureau of the Census, Land Use Series data from the US Geological Survey (USGS), World Data Banks I & II of the Central Intelligence Agency (CIA), LANDSAT data from the National Aeronautics and Space Administration (NASA) and soil survey data from the US Soil Conservation Service. The data were put into a common database, and computer programmes using polygon overlay created composite coverages. The results were either coloured or black-and-white maps (Lo 1986: 369-87).

GIRAS (Geographic Information Retrieval and Analysis System), developed by the USGS, is oriented to land use and land cover maps as a data source. USGS land use and land cover maps are typically produced at scales of 1:250,000 and 1:100,000 (considered medium- to small-scale maps). The system is designed to input, manipulate, analyse and output digital spatial data. Although it was developed for land use and land cover mapping, its developers were initially preoccupied with editing and correcting digitizing problems in the land use and associated databases. A vector format was used in conjunction with a series of linked data files, for example map files, text files and data files with associated subfiles, and rules were developed for converting data to a standard format. The manipulation and analytical procedures allowed for nine different transformations, including rotation, conversion of geographic coordinates, text and other conversions, summary statistics, interpolation, filtering, generalization and accuracy estimation. The standard output was a coloured or monochrome map, with perspective views, block diagrams and isometric histograms.
GIRAS-II is an interactive, on-line, time-sharing, random-access input processing system with a powerful database management system. However, the major bottleneck in such a system remained digitizing, whether by manual or scanner techniques.

The US Bureau of the Census's major need was to find a simple method of assigning census returns to the correct geographical locations. Address matching and the use of geographic coordinates proved a major difficulty. DIME files coded street segments between intersections using identification codes for the right and left blocks, identification codes of the from and to nodes (or intersections), x,y-coordinates and address ranges on each side of the street. This file structure borrowed heavily from CGIS's arc structure and the common-denominator format of POLYVRT. Later the topological ideas of DIME were refined and incorporated in the TIGER model (Topologically Integrated Geographic Encoding and Referencing System). DIME and TIGER files have been influential in stimulating work on products which rely on street network databases:
- vehicle navigation systems (but only if you own a BMW 7-series!)
- garbage truck routing (milkos as well)
- emergency vehicle despatching (police, ambulance, fire).
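A DIME-style street segment record, and address matching against it, can be sketched as below. The field names are hypothetical simplifications of the record described above (left and right blocks, from and to nodes with coordinates, address ranges on each side). Placing an address by linear interpolation along the segment is a common address-matching technique assumed here for illustration, not a description of the Census Bureau's software, and the odd/even side convention is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class StreetSegment:
    """Simplified DIME-style record for one street segment between two intersections."""
    street: str
    from_node: int        # intersection id where the segment starts
    to_node: int          # intersection id where the segment ends
    from_xy: tuple        # (x, y) of the from node
    to_xy: tuple          # (x, y) of the to node
    left_block: str
    right_block: str
    left_range: tuple     # (low, high) house numbers on the left side
    right_range: tuple    # (low, high) house numbers on the right side


def match_address(segments, number, street):
    """Return (block, (x, y)) for a house number, or None if nothing matches."""
    for seg in segments:
        if seg.street != street:
            continue
        sides = ((seg.left_range, seg.left_block), (seg.right_range, seg.right_block))
        for (low, high), block in sides:
            # Assumed convention: odd numbers on one side, even on the other.
            if low <= number <= high and number % 2 == low % 2:
                # Linear interpolation of the position along the segment.
                t = 0.0 if high == low else (number - low) / (high - low)
                x = seg.from_xy[0] + t * (seg.to_xy[0] - seg.from_xy[0])
                y = seg.from_xy[1] + t * (seg.to_xy[1] - seg.from_xy[1])
                return block, (x, y)
    return None


segments = [
    StreetSegment("Main St", 1, 2, (0.0, 0.0), (200.0, 0.0),
                  "Block 101", "Block 102", (1, 101), (2, 102)),
]
print(match_address(segments, 51, "Main St"))  # ('Block 101', (100.0, 0.0))
print(match_address(segments, 52, "Main St"))  # ('Block 102', (100.0, 0.0))
```

Because each record names its from and to nodes, segments sharing a node can also be chained into a network, which is the topological idea later refined in TIGER.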

The production of atlases of computer-generated maps of selected variables for selected cities is one application. Simple computer maps are now used in marketing and retailing, and the advent of personal computers (PCs) has stimulated the production of maps with simple mapping packages. All this is now possible because of the digital boundary files produced by the Census Bureau. Users of this workbook with access to the Internet may wish to view copies of DIME and TIGER files at the following address: http://www.census.gov/geog.html

ESRI, the Environmental Systems Research Institute, founded in 1969, developed the ARC/INFO program, which built on techniques and ideas developed at the Harvard Lab and elsewhere. While initially slow to take hold, by the 1980s ARC/INFO had successfully cornered the GIS market. This was because it implemented the CGIS idea of keeping attribute and locational information separate; it fused a standard relational database management system (RDBMS), INFO, which handles attribute tables, with software that handles arcs (ARC), a basic design now copied by other systems; and it offered a command-driven toolbox with a product-oriented user interface. ARC/INFO was one of the earliest GISs to take advantage of new super-minicomputer hardware, a platform that was affordable to many resource management agencies. The emphasis of ARC/INFO has been on independence from specific platforms and operating systems.

Figure 1.12 Migration path of GIS from large mainframe systems to powerful desktop PC systems.


Source: After McLaughlin & Coleman 1989.

2.2 GIS AND OTHER DISCIPLINES

GIS is said to represent a convergence of technological fields and traditional disciplines. It has been called an enabling technology because of the potential it offers to the wide variety of disciplines that must deal with spatial data, with each field contributing some of the techniques that make up GIS. While some fields emphasize data collection, GIS emphasizes data integration, modelling and analysis, leading some to claim it as the science of spatial information. Apart from geography, several other disciplines contribute to GIS today (see Figure 1.13).

Figure 1.13 The relationship between GIS and selected disciplines

Source: Maguire (1991: 13).

Cartography is concerned with the display of spatial information, and maps are the main source of input data for GIS. The discipline has a long tradition in map design, and recent developments in digital and automated cartography provide methods for the digital representation and manipulation of cartographic features and for visualisation.

Remote sensing of the ground from satellites and aircraft is a major source of geographical data. Many image analysis systems contain sophisticated analytical functions, and remotely sensed data, when merged with other data layers, are extremely useful in a GIS.

Photogrammetry is the science of making accurate measurements from aerial photographs. It is the source of most topographic data, such as ground elevations, for input to a GIS.

Surveying provides high-quality data on the positions of land boundaries, buildings and other landmarks.

Geodesy is the source of high levels of positional control and accuracy for GIS. In conjunction with surveying techniques and differential global positioning systems (DGPS), accuracies of about a meter are possible. A GPS receiver needs signals from at least four of the 24 or so GPS satellites (three if its elevation is already known); these satellites are not geostationary but circle the earth in orbits about 20,000 km high. The receiver fixes its ground position by trilateration, that is, from its measured distances to the satellites, and DGPS improves the fix with corrections from a reference station at a precisely known location.

Statistics provides the theoretical underpinnings of GIS models for analysis, simulation and prediction, as well as insight into error and uncertainty in GIS data.

Operations research provides optimization tools for use in GIS, especially in decision-making models. For example, finding the quickest route on a network, the optimal location of a hospital and the location of facilities generally all call upon techniques developed in operations research.

Computer science, especially computer-aided design (CAD), database management systems (DBMS), artificial intelligence (AI) and expert systems, provides GIS with tools and techniques. CAD software, for example, is used for data input, display and 3-D visualisation. DBMS contribute methods for representing data in digital form, for systems design and for handling large volumes of data, especially in retrieving and updating records. AI and expert systems use computer techniques to mimic human decisions in such tasks as map design and the generalization of map features.

Mathematics, especially geometry and graph theory, is used in GIS system design and in the analysis of spatial data, and provides the mathematical basis for algorithms that solve particular problems.

Civil engineering, especially transportation and urban design, has benefitted from an interchange of ideas with GIS practitioners. (For reference see: Cowen, D. (1990) What is GIS? in Goodchild & Kemp (eds.) Introduction to GIS, NCGIA Core Curriculum, Santa Barbara, CA: NCGIA, pp. 1-1 - 1-9.)

2.3 APPLICATIONS OF GIS

GIS has been used in various ways. Rather than describe these in detail, GIS applications may be summarized as follows:

Street network-based
- address matching, finding locations given street addresses
- vehicle routing and scheduling, deliveries routing and drive-time studies
- location analysis and site selection, branch location assessment and analysis
- development of evacuation movement plans for emergencies
- integrated transport planning

Natural resource-based
- management of wilderness, floodplains, wetlands, forests, agricultural lands, aquifers, wildlife
- Environmental Impact Analysis (EIA), Environmental Audits
- viewshed analysis, intervisibility studies
- hazardous or toxic facility siting
- groundwater modelling and contamination tracking
- wildlife habitat analysis, wildlife corridors and migration route planning

Land parcel-based
- land use zoning and subdivision plans
- land acquisition
- water quality management
- land ownership management

Facilities management
- location of underground pipes, cables, sewers
- planning facility maintenance, infrastructure planning
- tracking energy use
- inventory management and lifecycle studies
- base map generation
- leak detection, fault location, safety monitoring

Financial services
- demographic profiling
- target marketing
- insurance claim/risk modelling

Environment
- pollution, weather, climate monitoring
- cause-effect studies
- landscape assessment
- conservation planning
- biodiversity libraries

Local Authorities
- planning-building control
- land searches
- boundary change modelling
- property/road maintenance
- crime analysis
- police, fire, ambulance service command and control

Health care
- asset management
- ambulance and emergency mobilisation
- epidemiological studies
- road traffic accident analysis, black spots
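Several of the street network-based applications listed above, such as vehicle routing, drive-time studies and emergency despatching, rest on the shortest-route techniques from operations research mentioned in Section 2.2. The sketch below uses Dijkstra's algorithm, a standard shortest-path method chosen here for illustration; the network, the intersection names and the travel times in minutes are invented.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry: a cheaper route was already found
        for neighbour, minutes in graph.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical one-way street network: intersections and travel times in minutes.
network = {
    "Depot": {"A": 4, "B": 2},
    "A": {"Hospital": 5},
    "B": {"A": 1, "C": 7},
    "C": {"Hospital": 1},
}
print(shortest_route(network, "Depot", "Hospital"))  # (8, ['Depot', 'B', 'A', 'Hospital'])
```

Note that the quickest route goes through B even though the direct link to A looks shorter at first; the algorithm only settles each intersection once its cheapest arrival time is known.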

2.4 A CRITIQUE OF GIS

An introduction to GIS as a tool would be incomplete without a discussion of some of its shortcomings and drawbacks. This critique should begin with what a GIS is not. A GIS is not simply a computer system for making maps, even though it can produce maps at different scales and projections in a variety of visually attractive colours. As an analytical tool, a GIS permits the linking and identification of spatial relationships between features in visual form on a computer screen or a paper map. A GIS stores its information in digital form in a series of linked files in a relational database. Through clever computer programming, a GIS can retrieve data on particular areas with specific characteristics at will and show these graphically. In addition, the database management system links the spatial data with non-graphic, non-spatial attribute data to produce meaningful reports and labels for map features. So the familiar map, which was previously kept on shelves and in desk drawers, is now stored as a digital database on magnetic media such as a floppy disk or hard disk, in a form that cannot be read as a map with the naked eye. Using the database, the GIS can add, subtract, multiply and divide the various attributes to produce new relationships previously unseen and perhaps unthought of. The database concept is central to all good GIS structures, and it distinguishes GIS from simple drafting and computer-aided mapping systems, which can produce good-quality map products but cannot highlight important relationships or offer any form of analysis. By addressing the fundamental questions of what, where and how, a GIS can also answer the 'what if?' question. Essentially, a GIS provides the ability to link information with a map feature and create new relationships.
For example, a GIS may determine the suitability of various sites for development, evaluate environmental impacts, calculate harvest volumes, identify sites for new facilities and so on. (Refer to ESRI 1990 Understanding GIS. The ARC/INFO Method, Redlands, CA: ESRI, pp. 1-1 - 1-10.)

That a GIS is a powerful tool cannot be disputed. However, there is a danger that such a tool, when used improperly, will produce nonsense results, such as spurious correlations, or misleading ones where spatial autocorrelation is ignored. One example will suffice. Suppose one were recording the incidence of sexually transmitted disease in some remote highland or island locations and simultaneously recorded the number of missionaries at these places. When mapped, the two statistics might turn out to be highly associated. But it would be dangerous to rely on this association, because the two phenomena may be totally unrelated. Such a result is produced by forcing different datasets together and pushing the limits of their use.

GIS is claimed as a new discipline. However, current disciplinary boundaries suggest that this new technology will sit uncomfortably within any one discipline or combination of disciplines. Academic practices and traditions are too entrenched to allow GIS to integrate easily. Thus, GIS will remain a powerful tool for spatial display, analysis and modelling.

GIS is predicted to displace cartography, geodesy and other land sciences. The traditional cartographic production of paper maps is said to be tedious, expensive and slow, and paper maps are costly to update and maintain. Yet GIS itself is in its infancy and suffers the same kinds of criticism levelled against traditional cartography. In geodesy, the lack of architectural-scale accuracy, even with highly accurate DGPS, limits large-scale applications: at present even 1-meter accuracy from DGPS is inadequate for cadastral mapping and land surveys. GIS is therefore better suited to small- and medium-scale work than to very large-scale cadastres.
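The database linkage described at the start of this critique, in which map features are joined to non-spatial attribute records and attributes are combined arithmetically to produce new relationships, can be sketched with Python's built-in `sqlite3` module. This is a hypothetical illustration of keeping locational and attribute data separate and linking them through a shared feature identifier, the general idea behind ARC/INFO's ARC and INFO components; the table and column names and the parcel values are invented, and this is not ARC/INFO's actual schema.

```python
import sqlite3

# Locational data kept apart from the attribute database: parcel id -> polygon vertices (m).
geometry = {
    101: [(0, 0), (100, 0), (100, 100), (0, 100)],
    102: [(100, 0), (200, 0), (200, 100), (100, 100)],
}

# Attribute (non-spatial) data in a relational table keyed by the same id.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE parcels "
    "(id INTEGER PRIMARY KEY, owner TEXT, population INTEGER, area_ha REAL)"
)
con.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?, ?)",
    [(101, "Council", 250, 1.0), (102, "Private", 40, 1.0)],
)

# Divide one attribute by another to derive a new one (density), select on it,
# then link each selected record back to its geometry through the shared id.
rows = con.execute(
    "SELECT id, owner, population / area_ha AS density FROM parcels "
    "WHERE population / area_ha > ?",
    (100,),
).fetchall()
con.close()

for pid, owner, density in rows:
    print(pid, owner, density, geometry[pid][0])  # 101 Council 250.0 (0, 0)
```

Keeping the two stores separate means either can change (new attributes, corrected boundaries) without rebuilding the other, as long as the identifiers stay consistent.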
In computer science the current rage is object-oriented database systems, in which objects are self-contained units of data and programmes that can help in collecting, manipulating and understanding spatial data. As an example, the Windows-based graphical user interface (GUI) uses icons (pictures) to depict a file, programme or application. These can easily be moved about the screen using a pointing device such as a mouse, and the user can open the application, file or data at will by selecting its icon. In simple terms, these icons are said to be objects. The introduction of these ideas into GISs has caused severe anxiety among practitioners, and especially among geographers. Many geographical constructs are implicitly uncertain, for example the concept of 'far away'. Some spatial objects are the product of interpretation or generalization, for example proximity, contiguity, randomness, hinterlands and regions. It is therefore very difficult to mould the observations around us into the strict and rigidly bounded objects favoured by computer programmers and scientists.

A GIS is said to lead to improved spatial analysis. This argument is flawed in two ways. First, more data do not necessarily lead to better analysis; the idea of information overload has been discussed previously. Moreover, there is a large time-lag between the collection and the analysis of data, so there is always a need to re-evaluate, re-define and adapt to the changes recorded and observed. This continual process of distillation and refinement, rather than a static, slice-of-time analysis of the collected data, is an essential part of the scientific method. Secondly, as with traditional methods, GISs can preoccupy the practitioner with the mechanics and technical aspects of the technology, so that much more time is spent developing the model and tools than on the analysis itself. This is a distinct possibility with novice users of GISs.
For more expert users, the ongoing issues to be resolved include the availability of data of the right kind and in the right form, and the quantity and quality of hardware. GISs are not easy to use, not only because of the technology but also because of the lack of an industry standard. With so many different vendors of GIS products, there is no common interface for all users. On the one hand, users need to know the command structure of a particular program. For example, the widely used PC ARC/INFO has about 2,600 different commands which a user may need to know in order to be fully conversant with the system. On the other hand, a user needs a broad understanding of the theory and mechanics of spatial analysis. Thus, the assumption that a GIS can be used as a simple do-all tool is false. There is certainly much more to it than producing a map as an end-product: the map may make a pretty picture but may not withstand close, critical scrutiny. The recent advent of a Windows-based version of ARC/INFO may provide a much-needed standard, because vendors writing for Windows must make their programmes comply with the requirements of Windows programming.

Current GISs are deficient in handling temporal data, whether to show change or trends. No models exist to show spatial succession, for example changing land uses in different locations. The ability to model change through time is constrained by the lack of suitable data, and the technology for showing change is still unsophisticated.

Organisational issues in the implementation of GIS have seldom been discussed in the literature. This is a new and expensive technology which requires someone in the organisation to promote, develop and nurture it. There is limited acceptance and effective use of GIS by government agencies because of the high cost of implementation, the difficulty of training and keeping personnel, and fear of organisational change. The last is a structural problem: people in the higher levels of administration may not wish to abdicate their decision-making roles, while those using GIS at the coalface may wish to implement decisions suggested by the GIS that never filter upwards to those who actually make the decisions. These hierarchical decision-making structures may hinder the widespread adoption of GISs. Moreover, there is limited documentation of successful GIS implementation to convince decision-makers to invest in this new technology. (See Aangeenbrug, R.T. 1991. A critique of GIS in Maguire, Goodchild & Rhind (eds.) pp. 101-107.)

2.5 ANOTHER QUICK QUIZ

1. Examine the SYMAP printout carefully (Figure 1.11). What are the advantages and limitations of this kind of mapping system? How are irregularly shaped areas shown? What about drawing line features on such a system? Can point locations also be shown using this mapping package?
2. Explain the difference between Automated Mapping/Facilities Management (AM/FM), Computer Aided Design (CAD) and GIS.
3. Explain the difference between attribute data and cartographic data. Give examples.

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho. Last updated: Feb. 13, 2001


Module1 Chap3_sum

GEOGRAPHIC INFORMATION SYSTEMS



3. GIS: NUTS AND BOLTS


3.1 BASIC ELEMENTS
There appear to be four elements which together comprise a GIS: computer hardware, computer software, data and liveware.
a. The computer hardware can range from the most sophisticated mainframe computers to mini-computers, high-performance workstations and personal computers. The trend in the late 1990s is towards workstations using the UNIX operating system. Input and output devices are also included under hardware. Input devices include digitizers, scanners and other automated digital data capture equipment. For data output, apart from the computer monitor and the printer, there are plotters of various kinds, as well as output direct to storage media such as magnetic floppy disks and cartridges and, more recently, optical CD-ROMs (compact disc read-only memory).
b. GIS computer software has been developed to very sophisticated levels and includes a large number of commands and a wide variety of functions. Three basic GIS designs have evolved: file processing, hybrid and extended designs. In the file processing design, each data set and function is stored as a separate file, and these are linked together during analytical operations. Examples of systems using this design are IDRISI, IMAP, E-RMS and SAGE. In the hybrid design, attribute data are stored separately in a DBMS while geographical data are stored and processed by a different computer programme. ARC/INFO, MapInfo and GenaSys are examples of hybrid designs. In the third, extended DBMS design, both the geographical and attribute data are stored in a DBMS which is extended to provide appropriate geographical analytical functions. The best known examples of the extended design are SYSTEM 9, which extends the EMPRESS DBMS, and TIGRIS.
c. GIS Data. The third important element in a GIS is the data. Geographical data are very expensive to collect, store and manipulate because huge volumes are required even for small areas. It has been estimated that the cost of data is often more than twice the cost of the software and hardware in a GIS. Data of the right form and type have always been scarce, even with the use of remote sensing satellites and mapping programmes to collect digital data on a national scale.
d. Liveware. This is the most significant GIS element because people are responsible for designing, implementing and using GIS. The lack of trained personnel has impeded the more widespread implementation of GIS. The focus on the technology has sometimes overlooked the most important element of a GIS: the people who provide the intelligence to interpret and use the results that GISs produce.
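The hybrid design described under b. can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how ARC/INFO, MapInfo or GenaSys actually work: the in-memory `geometry` dictionary stands in for the separate programme that stores and processes the geographical data, Python's built-in `sqlite3` module stands in for the DBMS holding the attribute data, and the `parcels` table, its fields and the coordinates are all hypothetical. The two stores are linked only by a shared feature ID.

```python
import sqlite3

# Geographical data: held outside the DBMS, keyed by feature ID.
# (Hypothetical parcels; each is a list of (x, y) polygon vertices.)
geometry = {
    1: [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    2: [(5.0, 0.0), (7.0, 0.0), (7.0, 2.0), (5.0, 2.0)],
    3: [(0.0, 4.0), (2.0, 4.0), (2.0, 5.0), (0.0, 5.0)],
}

# Attribute data: held in a relational DBMS, keyed by the same feature ID.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, land_use TEXT)")
db.executemany(
    "INSERT INTO parcels (id, land_use) VALUES (?, ?)",
    [(1, "residential"), (2, "industrial"), (3, "residential")],
)

def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def total_area(land_use):
    """Select features by attribute in the DBMS, then measure their geometry."""
    rows = db.execute("SELECT id FROM parcels WHERE land_use = ?", (land_use,))
    return sum(polygon_area(geometry[feature_id]) for (feature_id,) in rows)

print(total_area("residential"))  # prints 14.0
```

The query runs in two stages, an attribute selection in the DBMS followed by geometric processing outside it, which is the division of labour that defines the hybrid design. Moving the vertices into the DBMS as well, and extending the DBMS with geographical functions, would correspond to the extended design instead.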

3.2 GIS VIEWPOINTS


GIS can be synthesized and presented in terms of three distinct but overlapping viewpoints: the map, database and spatial analysis viewpoints.
a. Map view. This focuses on the cartographic aspects of the GIS. In this school of thought, the GIS is seen as a map processing or display system. Each map is conceived of as a layer, theme or coverage, and layers are overlaid in order to search for patterns. The search for patterns, or manipulation, includes functions that allow common features to be added, or unique features to be subtracted, to produce a new map as the result of the search. Such ideas have extensive applications, especially when the data are in the form of remotely sensed images. Many topographic and thematic mapping agencies favour this map view and rely on a GIS to produce high quality maps and charts for public consumption.
b. Database view. This view of a GIS emphasizes the importance of a DBMS. A sophisticated DBMS is integral to many GISs, and this view is favoured by users with a computer and information science background. For applications that record transactions, that make frequent use of simple queries (such as land registration) or that involve high volumes of transactions (such as real-time vehicle positioning systems), a database approach to GIS is the obvious choice.
c. Spatial analytic view. This view focuses on the analytical and modelling aspects of the technology, and is conceived by some as a spatial information science. Users of this view also accept that this analytical function is what separates a GIS from other kinds of information systems.
These views need not be taken singly since, depending on the application at hand, all three may be included. The views may be thought of as a set of interlocking links in a chain, with each view being emphasized in particular applications and software programmes.
This also serves to highlight the numerous uses of GISs, their generality of application and the heterogeneity of the GIS community. (See Maguire, D.J. (1991) An overview and definition of GIS in Maguire, Goodchild & Rhind (eds.) pp. 9-10.)
3.3 GIS: BASIC QUESTIONS
The basic questions which a GIS can answer may be classified in a generic fashion. There are six generic questions that a sophisticated GIS can address:
Location: What is at ...? For example, the number of animals in a habitat.
Condition: Where is it? Find a location where certain conditions are satisfied. This is the intersection of two pieces of data.
Routing: Which is the best way to ...? Calculate the best, fastest, shortest or most scenic route between two places.
Trends: What has changed since ...? Monitoring change over time, for example, deforestation.
Patterns: What spatial patterns exist? Identifying patterns helps to describe and compare the distributions of phenomena, to understand the processes involved and to account for their distribution.
Modelling: What if ...? Determine what happens when some feature or variable changes. This requires geographic and other information, and possibly scientific laws, for example, to explain sea level changes, global warming or desertification.
3.4 REQUIREMENTS OF A GIS
Smith et al. (1987) discuss what should be required of a GIS. A GIS should:

be able to work with large, heterogeneous spatial databases; be able to query the database about the existence, location and characteristics of a wide variety of objects; operate efficiently, so that the user can work interactively with the underlying data and the required data analysis models; be easy to tailor to a variety of applications, as well as to many kinds of users; be able to learn in significant ways about the data and the users' objectives; and be able to supply a readily interpretable output product for the ultimate users of the system.

These requirements present a rich list of research projects which will appeal to a number of research disciplines. The list caters for pure researchers as well as for practitioners and, ultimately, for the users of the results of GIS analysis. (See Smith et al. 1987.)

Summary
GIS is of enormous commercial significance and is having an important impact on the private and public sectors. GISs have been used to understand and to provide solutions to key socio-economic and environmental problems. Many people and groups use GIS for a very wide variety of applications. Inevitably, different people have different ideas of what a GIS is and what it may be used for. In summary, a GIS is an integrated collection of computer hardware, software, data and liveware that operates in an institutional context. GIS is a special case of an information system, sharing many common features with other systems. Its focus on spatial analytical modelling distinguishes GIS from other systems. Spatial searching and overlay operations are fundamental functional features of any GIS.

FURTHER READING
Aangeenbrug, R.T. (1991) A critique of GIS, in Maguire, Goodchild & Rhind (eds.), Ch. 8, pp. 101-107.
Abler, R., Adams, J.S. & Gould, P. (1971) Spatial Organization: The Geographer's View of the World. London: Prentice Hall.
Cowen, D. (1990) What is GIS?, in Goodchild, M.F. & Kemp, K. (eds.), pp. 1-3 to 1-9.
Dangermond, J. (1990) A classification of software components commonly used in geographic information systems, in Peuquet, D.J. & Marble, D.F. (eds.), pp. 30-51.
Environmental Systems Research Institute (ESRI) (1990) Understanding GIS: The ARC/INFO Method. Redlands, CA: ESRI.
Garson, G.D. (1971) Handbook of Political Science Methods. Boston, MA: Holbrook Press Inc., pp. 49-65.
Goodchild, M.F. & Kemp, K. (eds.) (1990) Introduction to GIS, NCGIA Core Curriculum. Santa Barbara, CA: NCGIA.
Lo, C.P. (1986) Applied Remote Sensing. Harlow: Longman, Ch. 9 Geographic Information Systems, pp. 369-387.
Lord Chorley (1987) Handling Geographic Information. Report of the Committee of Enquiry chaired by Lord Chorley. London: HMSO.
Maguire, D.J. (1991) An overview and definition of GIS, in Maguire, Goodchild & Rhind (eds.), Ch. 1, pp. 9-20.

Maguire, D.J., Goodchild, M.F. & Rhind, D.W. (eds.) (1991) Geographical Information Systems. London: Longman Scientific & Technical.
Marble, D.F. (1984) Geographic information systems: An overview, PECORA 9 Proceedings, Spatial Information Technologies for Remote Sensing Today and Tomorrow, Oct. 2-4, Sioux Falls, SD: IEEE, pp. 18-24.
Rhind, D.W. (1981) Geographical information systems in Britain, in Bennett, R.J. & Wrigley, N. (eds.) Quantitative Geography: Retrospect and Prospect. London: Routledge & Kegan Paul, pp. 17-35.
Smith, T.R., Menon, S., Star, J.L. & Estes, J.E. (1987) Requirements and principles for the implementation and construction of large-scale geographic information systems, International Journal of Geographical Information Systems, v.1(1), pp. 13-32.
Tomlinson, R.F. (ed.) (1972) Geographic Data Handling. Commission on Geographical Data Sensing and Processing. Ottawa: International Geographical Union.

REVISION
1. How are data associated with geography in a GIS?
2. Explain the difference between a transaction-based information system and a data-based information system. What are the advantages and disadvantages of each?
3. Compare GIS to an airline reservation system. How do the information system definitions presented in this module apply to the airline reservation example?
4. Describe how crime data can be used at the operations, management and policy levels of local government.
5. The pattern of GIS development since 1965 has been largely attributable to the changing balance between the costs of hardware, communications and software development. Discuss.

WORKING WITH PCs


This guide has been prepared to enable you to start using microcomputers on the Faculty of Applied Science Local Area Network (LAN). It describes only the essential and most commonly used commands and their parameters. The network software is Novell NetWare, which operates 24 hours a day. Disk Drives. Microcomputers usually have two floppy disk drives and a hard disk, labelled A:, B: and C: respectively. The disk drives will format, write and read diskettes in any of these density formats: 360 K, 720 K and 1.2 Mb. Never remove your floppy disk when the drive light is on. Drive G:\SCRATCH is a scratch disk space where users can temporarily put large files they are working on, but be aware that files on this drive are routinely deleted without warning as the available disk space fills up. Backup. Get into the habit of backing up your work on floppy disks frequently. Recovering from wasted effort and lost or damaged work can be very time consuming and frustrating. Logging On. You may log on to the network by typing RSTUDENT at the screen showing the University of Canberra logo. Remember to log off when you finish by typing BYE.
Common Commands
All commands that you type into the computer are shown in courier font.
format a: will format your diskette if you have not already done so. You do this once only for new disks. Formatting a used disk will wipe out all data on it.
CD \TIGER will change the current directory to TIGER, a directory at the root of the current drive.
DIR will give you a list of files in your current directory.
DIR /W will list the directory across the width of your screen.
DIR /P will list the directory one page at a time.
MD NEWMAN will create a new directory called NEWMAN.
RD NEWMAN will remove a directory called NEWMAN.
COPY A:RAILROAD B:RAILROAD will copy a file called RAILROAD from the A: drive to the B: drive. Make sure there are floppy disks in both these drives.
DEL A:RAILROAD will delete a file called RAILROAD on the floppy disk in the A: drive.
RENAME A:RAILROAD.DOC RAILWAY.DOC will rename a file called RAILROAD.DOC to RAILWAY.DOC on the A: drive. (The new name cannot include a drive letter or path.)
BYE will log you out of the network.
Filenames. A filename consists of 1 to 8 characters, optionally followed by a period and an extension of up to 3 characters. The legal characters include A-Z, 0-9 and $ & # ! % _. For example, HAPPY.DOC or just HAPPY are valid filenames. No spaces may appear in a filename, and case is irrelevant in filenames, as it is for all DOS commands.
(DOS stands for disk operating system.) The wildcard characters are ? for any single character and * for any character or combination of characters. Therefore B:*.* refers to all files in the current directory of drive B:, whilst B:*.doc refers to all files there with the .doc extension.

Note that the following names cannot be used as filenames: AUX, CON, NUL, PRN, COM1 to COM4 and LPT1 to LPT3. These refer to DOS devices.
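The filename rules above can be expressed as a small Python check. This is a sketch under assumptions, not part of DOS: it accepts only a conservative subset of legal characters (letters, digits and $ & # ! % _), treats case as irrelevant, and rejects the standard MS-DOS device names (CON, PRN, AUX, NUL, COM1 to COM4, LPT1 to LPT3) with or without an extension. The function name `is_valid_dos_name` is hypothetical; DOS itself remains the final authority on what it will accept.

```python
import re

# A conservative subset of the legal characters listed above.
NAME_CHARS = r"[A-Z0-9$&#!%_]"
# 1-8 character base name, optionally a period and a 1-3 character extension.
DOS_NAME = re.compile(rf"{NAME_CHARS}{{1,8}}(?:\.{NAME_CHARS}{{1,3}})?")
# Standard MS-DOS device names, reserved with or without an extension.
RESERVED = {"AUX", "CON", "NUL", "PRN",
            "COM1", "COM2", "COM3", "COM4", "LPT1", "LPT2", "LPT3"}

def is_valid_dos_name(name: str) -> bool:
    """Check a filename against the 8.3 rules (case-insensitive)."""
    upper = name.upper()  # case is irrelevant in DOS filenames
    if DOS_NAME.fullmatch(upper) is None:
        return False
    base = upper.split(".")[0]
    return base not in RESERVED

for candidate in ["HAPPY.DOC", "happy", "RAILROAD.DOC", "MY FILE.DOC", "CON.TXT", "REPORT.TEXT"]:
    print(candidate, is_valid_dos_name(candidate))
```

HAPPY.DOC, happy and RAILROAD.DOC pass; MY FILE.DOC fails because of the space, CON.TXT because CON is a device name, and REPORT.TEXT because its extension is four characters long.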

GIS Net Sites
For students who have access to the Internet, the following address gives you a bookmark list belonging to Jim Aylward (jim@hdm.com), who has personally compiled a massive list of GIS net sites as at 1 July 1995. You are invited to surf the Net using this list. The list is a dynamic one and is being added to continually; it is just a starting point. Have a go. The Internet address is:

Citation
To reference this material the correct citation for this page is as follows: Cho, G (1995) Geographic Information Systems. Student's Manual. Module 1. Canberra: University of Canberra and CAUT; http://infosys-law.canberra.edu.au/gismodules/manual_1/index.html.

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho. Last updated: Feb. 13, 2001


RASTER GIS Intro and Table of Contents

RASTER GIS: AN INTRODUCTION


MODULE 2

TABLE OF CONTENTS
Preface
Acknowledgments
Introduction
Materials required
Aims
Objectives
1. Raster GIS: An Introduction
1.1 Raster data structures
1.2 Scales of measurement
1.3 Raster models
1.3.1 Regular tessellations
1.3.2 Fixed spatial resolution
1.3.3 Variable spatial resolution
1.4 Raster conversions and compression
1.5 Spatial relationships
1.5.1 Geographical data and quantized data
1.5.2 Run length encoding
1.5.3 Standard run length encoding
1.5.4 Value point encoding
1.5.5 Quadtrees
1.5.6 Chain codes and block codes
1.5.7 Sampling effects and generalization
2. Raster Data Management
3. Raster Data Manipulation and Analysis
3.1 Reclassification and aggregation
3.2 Spatial operations on neighbourhoods
3.3 Making measurements
3.4 Modelling with data layers
4. Reporting
5. Other Considerations
Summary
Further Reading
Revision
Class exercise
Computer Demonstration
Glossary

PREFACE
This module on Raster GIS: An Introduction is the second of five modules in the series A Self-Teaching Student's Manual for GIS. This manual is the result of work undertaken for a Committee for the Advancement of University Teaching (CAUT) National Teaching Development Grant for 1995. Apart from Module 1, an Introduction to Geographic Information Systems, three other modules follow this one, addressing Vector GIS: An Introduction, Managing Attribute Data in GIS and Integrating Remote Sensing with GIS. To complete this self-contained unit successfully, users should be prepared to spend approximately ten hours reading and working with the manual, writing up results, doing extra reading and attempting an assessment exercise. The presentation style used in this and the following modules may be described as a spiral curriculum. In such a curriculum, the contents of the present module are used again in a following module, in more depth and detail, the next time the same or similar concepts are encountered. In general, there are four parts to a module:

1. the text presents both the conceptual and practical aspects of the module, with examples from as many usages as possible;
2. diagrams, figures and other illustrative materials are used to explain and show relationships;
3. questions, exercises and problems to be solved, and an assessment; and
4. suggestions for further reading and research.
(See the curriculum chart in Figure 2.1.)
Figure 2.1 Interlocking modules of the spiral curriculum


It is advisable that you use this workbook as your personal notepad. A highlighter or bright coloured biro to underline text will help identify important points. In this workbook all important concepts, words and phrases are set out in bold letters, and words and phrases that carry meanings different from their usual ones are italicised. To begin with, you should browse through this workbook very quickly just to get a feel for its contents. Reading this preface helps! A tutor may walk you through this workbook, but the pace may or may not suit you. You should try to go through this workbook at a pace with which you are comfortable.


The appendices are an important component of the overall module because they contain important tools: hints on using computers, a glossary, an index, and some answers to workbook problems.
ACKNOWLEDGMENTS
I should also like to thank the following individuals and publishers for permission to reproduce their illustrations and examples in this workbook: Stan Aronoff (1989: 142) for Table 5.2 used on page 2-26; Bob Itami & Rob Raulings (1993: 9) for the summary table of rules of combinations used on page 2-52; John Star & John Estes (1990) Geographic Information Systems: An Introduction, Englewood Cliffs, NJ: Prentice Hall, for Figure 4.2 on page 36 and Figure 6.1 on page 79 used on page 2-32; Peuquet & Marble (1990: 270) for Figure 16 used on page 2-2.
INTRODUCTION
This second module on Raster GIS: An Introduction builds on the general ideas presented in the first module. A companion to this module is the following one on Vector GIS: An Introduction. While raster and vector are different ways of handling geographic data, there have been software developments aimed at merging these systems so that data organisation becomes irrelevant. The fourth module in this series addresses the management of attribute data in GISs, while the final module introduces the integration of remotely sensed data and imagery with GIS. The present workbook is presented in five parts. The organisation follows that suggested by Figure 1.1 of Module 1, where the four principal elements of a GIS were shown: input, storage and retrieval, manipulation and analysis, and reporting. These are important functional elements of any GIS and it will be instructive for us to follow this structure. After a brief introduction to the map as a layer and to the concept of map overlays to produce further maps, we discuss what a raster GIS should contain in order to provide minimal functionality.
Then in the first section, we concentrate on the input of data, beginning with data acquisition. The various measurement scales are considered again here (remember we discussed these in Module 1 on page 17). The data structure of raster systems is described, and conversions to and from other kinds of data structures are explained. The second section examines data management issues such as how to detect errors in the data and eliminate sources of error. The section also presents the filtering of data, because it is sometimes necessary to smooth out perturbations to produce a final acceptable product. The third section then introduces the manipulation and analysis of spatial data. The reclassification and analysis functions of a raster GIS are considered in this section. Spatial operations such as finding zones, neighbourhoods and regions are also discussed, in conjunction with making measurements from raster data sets and using them in the modelling functions of the GIS. Part four gives examples of map products such as thematic and proximal maps, and shows how reports may be generated from a raster GIS. The final section covers other considerations and a case study. Considerations such as hardware requirements, how to deal with pixellated maps and how to apply anti-aliasing techniques are discussed. The application of a raster GIS to locate a gravel mining quarry is given as a case study. An assessment exercise concludes this module. A major part of this module is the hands-on use of a raster GIS. The software introduced is SAGE version 2.5, authored by Robert Itami and Robert Raulings of the Centre for Geographic Information Systems at the University of Melbourne and Digital Land Systems Research (DLSR), Clayton, Victoria. It is a simple system developed for use on a PC with minimal storage and processing requirements. An exercise using the software demonstrates the power and functionality of raster GISs.
MATERIALS REQUIRED

Bright coloured biro or highlighter.
Sharp pencils, preferably HB (hard-black), rulers, erasers.
A4 millimetre graph paper, tracing paper.
Access to a personal computer (Intel-based IBM-compatible PC) with the following specifications: Intel 386 or 486 processor; 640 kb RAM (560 kb free RAM); DOS 3.3 or later; 20 Mb hard disk or larger; VGA graphics and a VGA colour monitor.
Printer.
SAGE Version 2.5 Working Model.
One 3.5" high density (HD) disk (1.4 Mb capacity) for data files and GIS software.

References
Itami, R.M. & Raulings, R.J. (1993) SAGE Introductory Guidebook. Melbourne: Digital Land Systems Research.
Itami, R.M. & Raulings, R.J. (1993) SAGE Reference Manual. Melbourne: Digital Land Systems Research.
For those with access to the Internet, the following home page is available: http://www.dlsr.com.au
AIMS
This module aims to provide students with the following:

1. An understanding of the different kinds of raster data structures used in raster GISs.
2. An appreciation of the representation, conversion, manipulation and analysis of raster data.
3. An understanding of spatial operations and modelling functions of raster GISs.
4. Skills in applying GIS methods through a study of cases and their applications.
5. Competence in the use of a raster GIS.
6. An holistic grasp of GIS concepts by working with individual parts of a GIS.

OBJECTIVES
As a result of completing work related to this module, students should be able to undertake the following tasks with a certain level of understanding and competence:

1. Explain the use of thematic maps as layers or coverages of an area which may be manipulated, analysed and modelled.
2. Differentiate between the various scales of measurement and explain why each is important in a raster GIS context.
3. Show how rasters are represented in a raster data structure, from simple tessellations to hierarchical and quadtree representations.
4. Describe data structure conversions and data compression techniques.
5. In manipulating and retrieving data, show how errors may be detected, edited and filtered.
6. Perform reclassifications and simple analyses of raster data using simple examples.
7. Set up spatial operations and modelling of data by identifying a problem, setting out criteria to be met, measurements to be made and final products to be produced.
8. Describe how thematic and proximal maps may be used in a GIS report.
9. Understand the various needs of projects, from hardware and software considerations through to the nature and form of maps for presentation purposes (for example, pixellation and anti-aliasing).
10. Critique and review a case study on the selection of a site for gravel mining.
11. Using raster GIS software, show an understanding of the various commands used (the why and the how) and relate these commands to one or more of the five sections of this manual presented above.
12. Develop computer skills in DOS command line operations and Windows-based applications.


Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho Last updated: February, 14 2001


Module 2 Chap1A

RASTER GIS: AN INTRODUCTION


1. RASTER GIS: AN INTRODUCTION
In any GIS a map may now be considered as a database because of the way the individual elements of the map are stored. Since the data elements have to be converted into machine readable form, the map data must be digitized. To digitize is simply to put map features into digital form. How this is done, and what form the digital data take, will depend largely on the GIS that will be used. In this module we concentrate on raster GIS as one form of map representation. The other form of representation, vector GIS, will be addressed in the next module. The concept of the map as a layer is not new. We have been using hand-drawn overlays in planning and geography when we wished to interrelate, say, the underlying geology of an area with the vegetation and the land use of the same area. Overlays are needed because geological and other thematic maps represent one theme per map. A topographic map, by contrast, may show rivers, contour lines and elevation, vegetation, roads, human settlement patterns and other features on a single map sheet. In a GIS, however, these features have to be categorized separately and stored in different map themes or overlays. Roads will be stored in one layer, while hydrological features such as rivers and streams will be stored as a separate theme. Organizing maps in layers makes them more flexible to use, since the different themes can be combined in as many ways as one would wish. In a raster GIS the map data files are arranged as a matrix of evenly spaced grid cells or rasters. The cells are ordered in a rectangular array of rows (running horizontally across) and columns (running vertically down). Each cell is thus identified by its pair of row and column numbers.
All map overlays in a single database are registered and referenced to the same grid matrix, so that the rows and columns referred to are the same every time each map layer is used. The map matrix is also tied to real-world locations and features on the earth's surface. This matrix representation of real-world locations allows GIS functions and operations to be performed on the data.
Figure 2.2 Map storage in a raster GIS

A map layer thus contains data that describes a single characteristic for each location within an area. Only one item of information is available for each location within a single layer so that multiple items of information will require multiple layers. A typical raster database will have as many layers as there are items of information.
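The arrangement just described, with every layer registered to the same matrix of rows and columns and that matrix tied to real-world locations, can be sketched in Python. The sketch assumes a north-up grid of square cells whose top-left corner has known ground coordinates; `GridFrame`, the two layers and all the numbers are hypothetical, and a real system would also have to deal with map projections.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GridFrame:
    """Shared georeferencing for every layer in a (hypothetical) raster database."""
    origin_x: float   # ground x of the top-left corner of cell (0, 0)
    origin_y: float   # ground y of the top-left corner of cell (0, 0)
    cell_size: float  # width and height of each square cell, in ground units
    rows: int
    cols: int

    def _check(self, row, col):
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError(f"cell ({row}, {col}) is outside the grid")

    def cell_centre(self, row, col):
        """Ground (x, y) of the centre of cell (row, col)."""
        self._check(row, col)
        x = self.origin_x + (col + 0.5) * self.cell_size
        y = self.origin_y - (row + 0.5) * self.cell_size  # row numbers increase southwards
        return x, y

    def cell_at(self, x, y):
        """(row, col) of the cell containing ground point (x, y)."""
        row = int((self.origin_y - y) // self.cell_size)
        col = int((x - self.origin_x) // self.cell_size)
        self._check(row, col)
        return row, col

frame = GridFrame(origin_x=1000.0, origin_y=2000.0, cell_size=10.0, rows=4, cols=5)

# One characteristic per layer; every layer uses the same rows and columns.
layers = {
    "land_use": [[1, 1, 2, 2, 3],
                 [1, 2, 2, 3, 3],
                 [4, 4, 2, 3, 3],
                 [4, 4, 4, 3, 3]],
    "elevation": [[120, 125, 130, 140, 150],
                  [118, 122, 128, 135, 145],
                  [115, 117, 121, 130, 138],
                  [110, 112, 116, 124, 132]],
}

row, col = frame.cell_at(1023.0, 1981.0)
print((row, col), {name: grid[row][col] for name, grid in layers.items()})
# prints (1, 2) {'land_use': 2, 'elevation': 128}
```

Because all layers share one frame, a query at a single ground location reads each characteristic from the same (row, col) position, and adding another item of information simply means adding another layer.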

Arithmetic and logical overlay operations provide the means of combining layers of information. Arithmetic overlay combines each value in one data layer with the value in the corresponding location in a second data layer by operations such as addition, subtraction, multiplication and division. A logical overlay involves finding those areas where a specified set of conditions occur, or do not occur, together (Aronoff 1989: 208). A detailed description of raster data structures follows in the next section. For the moment, however, we need to outline the requirements of a raster GIS. A raster GIS must have capabilities for:


input of data; various housekeeping functions such as data management and storage; spatial operations that allow overlays, recoding of data, simple arithmetic to be performed on the data like add, subtract, multiply, and logical operations to find zones, neighbourhoods and regions by using a spread function; and, output of data and results.
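The arithmetic and logical overlays described above can be sketched on small layers stored as nested Python lists. This is an illustrative assumption rather than how any particular package stores its grids: both layers must share the same rows and columns, the arithmetic overlay applies any two-argument operation cell by cell, and the logical overlay returns 1 where both conditions hold in the same cell and 0 elsewhere. The layer names and values are hypothetical.

```python
import operator

def arithmetic_overlay(layer_a, layer_b, op):
    """Combine two registered layers cell by cell with a two-argument operation."""
    if len(layer_a) != len(layer_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    ):
        raise ValueError("layers must share the same grid")
    return [[op(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(layer_a, layer_b)]

def logical_overlay(layer_a, layer_b, test_a, test_b):
    """1 where test_a holds in layer_a AND test_b holds in layer_b, else 0."""
    return arithmetic_overlay(
        layer_a, layer_b, lambda a, b: int(test_a(a) and test_b(b))
    )

# Hypothetical layers on the same 2 x 3 grid.
access_score = [[1, 2, 3], [2, 2, 1]]
view_score = [[3, 1, 0], [1, 2, 2]]
soil_class = [[1, 2, 2], [3, 2, 1]]
slope_pct = [[5, 12, 4], [3, 8, 9]]

print(arithmetic_overlay(access_score, view_score, operator.add))
# prints [[4, 3, 3], [3, 4, 3]]
print(logical_overlay(soil_class, slope_pct, lambda s: s == 2, lambda p: p < 10))
# prints [[0, 0, 1], [0, 1, 0]]
```

The logical overlay here finds cells that are soil class 2 and have a slope under 10 per cent; negating either test would find areas where a condition does not occur.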

The range of possible functions and operations is enormous, and current raster GIS software provides only the basic and most used functions. This is because the range is so large and no one has yet come up with a logical classification that accommodates all known functions in a consistent scheme that is generally acceptable to all.
1.1 RASTER DATA STRUCTURES
Data input is the procedure of converting data into a form that is readable by computers. This procedure of data capture, entry and storage is the most expensive and time consuming part of a GIS operation. It is also the most critical, because decisions based on poor data are at best poor decisions, or no decisions at all. The types of data considered below relate to raster data. The term raster, also known as a scanning pattern and first used in television technology, refers to the path followed by the cathode ray (electron beam) in generating an image on the screen of a television set. The line by line, left to right dissection and subsequent reconstitution of television images is known as scanning, the term also used for the progression of one's eyes when reading a printed page. The scanning pattern consists of two sets of lines. The first set, A, is scanned first, with an equal empty space left between successive lines. Set B is then laid down so that its lines fall exactly on the empty lines left by set A. Each image is thus built up in two passes, and this whole process is known as interlaced scanning. Each set of scanning lines is known as a scanning field, and the two sets together make up the whole scanning pattern, known as a scanning frame. Depending on the frequency of the mains electricity supply (50 or 60 Hz), the field repetition rate is standardized at 50 or 60 fields per second, so that the frame rate is 25 or 30 frames per second (see Figure 2.3).
Figure 2.3 Interlaced scanning and the combination of scanning fields
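The two-pass pattern described above can be made concrete in a few lines. This is a minimal sketch and does not model any real television standard. It lists which line numbers fall in field A and field B for a hypothetical screen of `n_lines` lines. It then derives the frame rate from the field rate, using the fact that two fields make one frame. The function names are illustrative only.

```python
def interlaced_fields(n_lines):
    """Return the line numbers drawn in field A and in field B.

    Field A draws every other line starting at line 0, leaving an empty
    line between each pair; field B then fills exactly those empty lines.
    """
    field_a = list(range(0, n_lines, 2))
    field_b = list(range(1, n_lines, 2))
    return field_a, field_b


def frame_rate(fields_per_second):
    """Two fields make one frame, so the frame rate is half the field rate."""
    return fields_per_second / 2


field_a, field_b = interlaced_fields(10)
print("Field A:", field_a)  # Field A: [0, 2, 4, 6, 8]
print("Field B:", field_b)  # Field B: [1, 3, 5, 7, 9]
# Together the two fields cover every line exactly once, i.e. one frame.
assert sorted(field_a + field_b) == list(range(10))
for rate in (50, 60):
    print(rate, "fields/s ->", frame_rate(rate), "frames/s")
```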

Each scanned line is broken up into discrete units known as pixels (picture elements). Each pixel carries certain information; in our TV analogy it represents a colour value. When millions of these pixels are combined on a TV screen, we see a picture. Given this framework, it is easy to see how raster models can be useful for capturing and displaying geographic data. In a raster model, cells next to one another express the concepts of topology (a relationship) and proximity (closeness, contiguity). However, this is not necessarily the way we think, nor is the real world presented to us one element of information at a time. Rather, our minds make lateral connections between elements, characteristics, ideas and other information. This information is overlaid on what we see, and the features themselves may trigger other associations and ideas. Because features are depicted in cells, real objects have to be inferred from the pattern formed by the values or colours in the cells.

The size of the raster cells is fixed by the state of the art in the technology used to capture or display the data. Modern technology makes raster data easy to capture, process and display, and this ease of capture means that very large data volumes can result from the same sources. As will be noted again in Module 5, remotely sensed images captured by satellite are in the form of rasters, and their huge data volumes mean that GIS can count on a rich load of information from this source. Image scanning devices such as the Multi-Spectral Scanner (MSS) capture remote sensing images and data on a very high resolution digital matrix organized along scan lines. Each rectangular pixel approximates a rectangular area on the ground and carries a multiplicity of data such as colour, hue and spectral signatures. These data need to be decoded before they can be used in any GIS or image processing system. A pixel is thus an area of the earth's surface represented by a single digital image value (see Figure 2.4 for an illustration of a pixel and scan line in a raster).
Figure 2.4 Pixels and scan lines in a raster

1.2 SCALES OF MEASUREMENT

In a raster GIS it is important to understand the various scales of measurement. The nominal, ordinal, interval and ratio scales were introduced in Module 1, Section 1.4 (page 16ff). Here we may add a fifth kind of variable, the binary scale, which takes one of two values: yes or no, present or absent, 1 or 0, true or false. In addition to these five measurement scales, most raster GISs hold two different classes of data. Take a well as an example. The first class is its spatial data: the location of the well can be given as a pair of coordinates, or tuple, which locates it precisely on the earth's surface. The second class is its non-spatial or attribute data, which give the well its character. Examples include the depth of the well, the volume of water drawn over a period of time, turbidity measurements and organism counts in the water. These attributes are logically linked to the locational data, and many GISs provide tools to store and manipulate the non-spatial data along with the spatial data (Star & Estes 1990: 28). These scales of measurement are best remembered diagrammatically, as suggested in Figure 2.5 below (see also Module 1, page 1-17).
Figure 2.5 Scales of measurement in a GIS
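The sketch below makes the distinction between spatial and attribute data concrete. It stores a well as a coordinate tuple plus a dictionary of attributes, and records the measurement scale of each attribute. The class, field names, coordinates and values are all hypothetical and were chosen only for illustration. A real GIS keeps this information in its own database structures.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scale(Enum):
    BINARY = "binary"      # yes/no, present/absent, 1/0
    NOMINAL = "nominal"    # named categories with no order
    ORDINAL = "ordinal"    # ordered categories
    INTERVAL = "interval"  # equal steps, arbitrary zero (e.g. degrees Celsius)
    RATIO = "ratio"        # equal steps and a true zero (e.g. depth)


@dataclass
class Well:
    # Spatial data: a coordinate pair (tuple) that locates the well.
    location: tuple
    # Attribute (non-spatial) data, logically linked to the location.
    attributes: dict = field(default_factory=dict)
    # The measurement scale of each attribute.
    scales: dict = field(default_factory=dict)

    def add(self, name, value, scale):
        """Record one attribute value together with its measurement scale."""
        self.attributes[name] = value
        self.scales[name] = scale


# Hypothetical example values, for illustration only.
well = Well(location=(693250.0, 6090480.0))
well.add("depth_m", 42.5, Scale.RATIO)
well.add("turbidity_class", "low", Scale.ORDINAL)
well.add("aquifer", "sandstone", Scale.NOMINAL)
well.add("in_use", True, Scale.BINARY)
print(well.location, well.attributes["depth_m"], well.scales["in_use"].value)
```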


1.3 RASTER MODELS

As noted previously, a raster model of geographic data is one in which the geometric elements are cells in an integer space. By integer space we mean that every cell is a whole unit of the same size; there are no part cells, half cells or quarter cells. The model thus simply enumerates objects in integer space: one cell, one value or object. Here we discuss raster data structures from two points of view. The first is how they may be represented (simple, hierarchical and quadtree structures). The second is how they may be converted and compressed.

Another way of describing a raster data structure is as a tessellation model. A tessellation of a plane is an aggregate of cells that divide up, or partition, that plane. The logical unit of data in a tessellation model is a unit of space, and each unit of space has an associated set of objects and properties. Suppose a cell contains a geographic feature, say a tree. The cell is then occupied by one object (a tree) which has certain properties, for example, a eucalypt of some kind with a given height, girth and age. Three things can be observed in this scheme. First, the objects are arranged spatially in relation to each other among all the cells in the raster. Secondly, there is an implicit order in the logical storage of the data, because each cell can easily be referenced by a numbering system. In the raster, cells are referenced by row number and then column number (thus, r11c12 refers to row 11 and column 12). Numbering begins from the top left-hand corner of the rectangular matrix. Thirdly, this data model mirrors arrangements in geographic space and in the real world. This mirroring means that we may place a transparent square or rectangular grid over a map and mark off whole squares, for example, those containing playing fields.
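Marking off whole squares in this way produces a presence/absence grid. The sketch below assumes, hypothetically, that we already know what share of each cell each feature covers. It assigns each cell to its dominant feature, following the majority rule mentioned in this module. It then labels the marked cells in the r11c12 style, counting rows and columns from 1 at the top-left. The function names and shares are illustrative only.

```python
def majority_rule(cell_shares):
    """Assign a whole cell to whichever feature covers the largest share of it.

    cell_shares maps feature name -> fraction of the cell it covers.
    Ties go to the feature listed first (a hypothetical tie-break).
    """
    return max(cell_shares, key=cell_shares.get)


def presence_grid(shares_grid, feature):
    """Build a 1/0 (present/absent) grid for one feature after the majority rule."""
    return [[1 if majority_rule(cell) == feature else 0 for cell in row]
            for row in shares_grid]


def cell_label(row, col):
    """Label a cell in the r11c12 style: row first, counted from the top-left."""
    return f"r{row}c{col}"


# Hypothetical shares of each cell covered by a playing field or by nothing.
shares = [
    [{"playing field": 0.8, "none": 0.2}, {"playing field": 0.3, "none": 0.7}],
    [{"playing field": 1.0, "none": 0.0}, {"playing field": 0.4, "none": 0.6}],
]
grid = presence_grid(shares, "playing field")
marked = [cell_label(r, c)
          for r, row in enumerate(grid, start=1)
          for c, value in enumerate(row, start=1) if value]
print(grid)    # [[1, 0], [1, 0]]
print(marked)  # ['r1c1', 'r2c1']
```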
A marked-off cell implies that it contains a playing field object or feature, whereas unmarked cells imply an absence of that feature. This present-absent scheme means that every cell must be considered as a whole, that is, as an integer unit. Where two features share a cell, the rule of thumb is that the cell takes the value of the more dominant feature, according to what is described as the majority rule (more on this rule later in the module). For tessellation models to be useful, two criteria thus need to be satisfied:

1. first, the tessellation should be capable of producing an infinitely repetitive pattern on the plane; and,
2. secondly, the tessellation should be infinitely recursively decomposable into similar patterns of smaller and smaller size.

Before discussing tessellation models in detail, it has to be said that the raster data structure has several advantages as well as disadvantages. Among its advantages, firstly, it is a simple data structure: any regular or irregular grid pattern may be used, so long as it satisfies the criteria of a tessellation given above. Secondly, overlay operations are easily and efficiently implemented. Efficiency here is measured by the speed with which such operations can be accomplished, even on relatively slow PCs. Thirdly, this structure represents areas with high spatial variability quite efficiently, the only limiting factor being the scale at which the cells are drawn. Cells drawn at a smaller scale, each representing a large area, would therefore be unsuitable for depicting spatial variability. Fourthly, only in raster structures is it feasible to perform viewshed analysis based on digital elevation models (DEMs). Viewshed analysis is useful for determining line of sight, for locating microwave transmission towers for cellular phones and TV transmissions, and for planning the aesthetics of residential housing on hill slopes. Finally, the raster format is seemingly the one most often used for the efficient manipulation and enhancement of remotely sensed images.

On the other hand, critics of the raster data structure cite the loss of detail when rasters are used at a very small scale. Cartographers are appalled by the crude outlines given to land parcels, which produce a pinking-shear effect where area boundaries follow grid cell edges. Land surveyors are dismayed by the inaccuracy caused by cells when portraying linear features and points such as roads, rivers and bus stops. In these situations the raster model may sacrifice too much detail, resulting in a blocky appearance of the final map. These problems can be reduced by so-called anti-aliasing (smoothing) techniques and by using smaller and more numerous cells, but there is a price to be paid. The trade-off is unacceptably large data files. A second drawback of the raster model is that its structure is less compact, although data compression techniques can often overcome this problem. The third shortcoming is that topology, or relationships, are often difficult to represent. Topological concepts can be difficult to show on a raster. Examples include from-node and to-node to indicate a path, the areas to the left and right of a linear feature, distance and other functions.
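The trade-off between detail and file size is easy to quantify: halving the cell size quadruples the number of cells. The sketch below assumes a hypothetical 10 km x 10 km study area and one byte per cell. Real storage depends on the data type and on any compression used.

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Number of square cells needed to cover a rectangular area.

    Partial cells at the edges count as whole cells, as in a raster.
    """
    return math.ceil(width_m / cell_size_m) * math.ceil(height_m / cell_size_m)


# Hypothetical 10 km x 10 km study area, assuming 1 byte per cell.
for size in (100, 50, 25, 10):
    n = cell_count(10_000, 10_000, size)
    print(f"{size:>3} m cells: {n:>9,} cells, about {n / 1e6:.2f} MB")
```

Going from 100 m to 10 m cells multiplies the cell count by 100, from 10,000 to 1,000,000.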
There are basically three types of tessellation models, each with particular characteristics tailored to specific purposes: grid and other regular tessellations, hierarchical tessellation models and irregular tessellations (Peuquet 1991: 80). However, tessellation models can equally be viewed in terms of scale and spatial resolution, since this view is the more applications-oriented one in any GIS. The following discussion therefore focuses on regular tessellations, on meshes with fixed spatial resolution and on designs with variable spatial resolution, such as quadtrees. See Figure 2.6 below for the three types of tessellation models.
Figure 2.6 Typology of tessellation models

1.3.1 REGULAR TESSELLATIONS

In general, there are three types of regular tessellation of a plane: square (or rectangular), triangular and hexagonal (see Figure 2.7 below). The square mesh is the most commonly used because it corresponds to grid cells, that is, to a raster data structure. Most data capture and display devices work easily along the horizontal and vertical axes, the x-y directions.


Indeed, satellite images and other scanning devices produce raster data sets. The data structure of a square tessellation is compatible with the arrays used in programming languages such as FORTRAN, and also with the Cartesian coordinate system.
Figure 2.7 Three regular tessellations in a recursive pattern
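Because a square mesh maps directly onto an array, converting between a cell reference and Cartesian coordinates is simple arithmetic. The one complication, noted under CELL NUMBERING in this section, is that raster rows are counted from the top, while the Cartesian Y axis runs up from a bottom-left origin. The sketch below assumes square cells of side `cell_size`, 1-based row and column numbers, and a hypothetical grid origin at the bottom-left corner.

```python
def cell_to_xy(row, col, n_rows, cell_size=1.0, origin_x=0.0, origin_y=0.0):
    """Centre of raster cell (row, col) in Cartesian X,Y.

    Rows and columns are numbered from 1 at the top-left of the grid, while
    X,Y has its origin (origin_x, origin_y) at the bottom-left corner.
    """
    x = origin_x + (col - 0.5) * cell_size
    y = origin_y + (n_rows - row + 0.5) * cell_size
    return x, y


def xy_to_cell(x, y, n_rows, cell_size=1.0, origin_x=0.0, origin_y=0.0):
    """Inverse of cell_to_xy: the (row, col) of the cell containing point X,Y."""
    col = int((x - origin_x) // cell_size) + 1
    row = n_rows - int((y - origin_y) // cell_size)
    return row, col


# A hypothetical 10-row grid of 25 m cells: the top-left cell r1c1 is centred
# near the top of the Y axis, not at the origin.
print(cell_to_xy(1, 1, n_rows=10, cell_size=25.0))  # (12.5, 237.5)
print(xy_to_cell(12.5, 237.5, n_rows=10, cell_size=25.0))  # (1, 1)
```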

Triangular tessellations, whether regular or irregular, have the disadvantage of not maintaining the same orientation when the triangles are recursively subdivided. This can make between-cell comparisons more complex than in the square or hexagonal mesh. However, this disadvantage of the triangular mesh becomes a strength when it is used to represent height data such as terrain, elevation and other kinds of surface data. Assigning a Z value representing elevation to each X,Y coordinate pair produces triangular faces when several such vertices are connected, and each face can then be described by slope and direction values. Because an infinite variety of slope angles and orientations is possible, it is common to use an irregular triangular mesh, since this makes interpolation easier.

The hexagonal mesh has the intrinsic advantage that all neighbouring cells of a given cell are equidistant from that cell's centre point. This is most useful for radial searches and retrievals around the central point. A square mesh lacks this characteristic. Consider a square cell with its two diagonals drawn between opposite corners, so that their intersection defines the centre of the square. The distance from the centre to the edge of the square along a diagonal is longer than the distance measured perpendicular to a side. In the same way, the centres of diagonal neighbours lie farther away than those of neighbours sharing a side. This makes the square cell unsuitable for radial measurements.

CELL NUMBERING

Here an important technical note must be made about cell numbering in raster datasets. In coordinate geometry such as the Cartesian coordinate system (introduced in the next module), locations on a graph are referenced by an X,Y coordinate pair. The numbering of these coordinates has its origin at 0,0 at the bottom left-hand corner of the x- and y-axes.
However, in grid cell and raster data structures, the technology has dictated that cells be numbered from top to bottom and from left to right. Row one and column one are therefore at the top left-hand corner of the grid. Also, cells are referenced by row and column numbers respectively rather than by X,Y coordinates.

RASTER DATA STORAGE

Thematic coverages are used in all raster structures. A data value is given to every cell (pixel) of a matrix covering the entire area being mapped. Each layer has its own unique theme. Point and linear entities encoded in such a database are part of the whole picture and are not distinguished as individual entities. This means that raster data structures are more valuable for depicting continuous distributions than discrete entities, for example, point entities such as flag poles. The approach here is therefore a coverage in which every cell has some attribute value. Point features occupy a cell or pixel. Land parcels are made up of adjacent cells whose attribute values relate to that parcel. Area features are made up of a number of contiguous whole cells covering the area feature under consideration. Areas at the fringes that cover only part of a cell or pixel are considered to occupy the whole cell. The area of a feature is thus calculated by counting the cells that fall within it on the map. In this way, raster storage is much simpler than, say, vector data storage. The raster database is simply a group of geo-referenced coverages, each of which represents the values of a different attribute at every cell location (Martin 1991: 94). Apart from the potential for amassing redundant data, the basic limitation of raster data storage is pixel size, because this determines the smallest data variation that can be recorded. The accuracy of the data decreases as pixel size increases, although a smaller pixel size means more data values to store.

SCALE AND SCALES OF MEASUREMENT

The scale of a map or image of the earth's surface is the ratio between distances measured between points on the map or image and the distances between the equivalent locations on the earth. This ratio is usually written as a dimensionless number, indicating that the measurements on the map or image and on the earth are in the same units. In a raster data structure, however, the scale of measurement is also an important consideration. Whatever the scale of the map, the scale of measurement used for the value of each cell is crucial. Those values need to be transformed into digital values that the computer system can manipulate. In other words, a table of codes may need to be associated with the features so that they can be translated easily into digital values.
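The sketch below shows such a code table and how an area feature can be measured by counting its cells. The land-use names, integer codes, 3 x 3 grid and 30 m cell size are hypothetical; any consistent table of codes would do.

```python
# Hypothetical table of codes translating map features into cell values.
LAND_USE_CODES = {"water": 1, "forest": 2, "pasture": 3, "urban": 4}


def encode(feature_grid, codes):
    """Translate a grid of feature names into a grid of integer codes."""
    return [[codes[name] for name in row] for row in feature_grid]


def feature_area(code_grid, code, cell_area):
    """Area of a feature: the number of cells carrying its code times one cell's area."""
    return sum(row.count(code) for row in code_grid) * cell_area


features = [
    ["water", "water", "forest"],
    ["forest", "forest", "urban"],
    ["pasture", "forest", "urban"],
]
grid = encode(features, LAND_USE_CODES)
print(grid)  # [[1, 1, 2], [2, 2, 4], [3, 2, 4]]
# Forest occupies 4 cells; with 30 m x 30 m cells that is 4 * 900 = 3600 square metres.
print(feature_area(grid, LAND_USE_CODES["forest"], 30 * 30))  # 3600
```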
So in a raster database, the values stored in the pixels of a single layer may be dichotomous (present or absent), discrete classes or continuous values, but never a mixture. Manipulation and analysis of the raster database can thus be performed relatively quickly and efficiently.

RESOLUTION

Resolution (also known as spatial resolution) is the minimum linear dimension of the smallest unit of geographic space for which data are recorded (NCGIA 1990 Unit 4.6). More formally, Tobler (1987) defined spatial resolution as "the content of the geometric domain divided by the number of observations normalized by the spatial dimension" (reported in Star & Estes 1990: 11). The two-dimensional domain on maps and photos is the area covered by the observations. Thus, a dataset with high resolution is one in which we have more information. In raster-based GIS, the smallest object that can be discerned on the image gives a measure of the resolution. In remote sensing, for example, aerial camera systems are capable of a resolution of about 80 lines per mm. Resolution is also sometimes expressed in megapels, or millions (10^6) of pixels. A 640 x 480 pixel display gives about 0.3 megapels, while a 1280 x 1024 display gives about 1.3 megapels. High resolution here refers to rasters with small cell dimensions, which give a lot of detail and have many cells. The rasters may be large, but the cell size is small. For preparing thematic maps from remotely sensed images, the resolution is the size of the smallest object that is represented. Thus, for a soil layer or land use layer, the resolution, or minimum mapping unit, will depend on the data capture equipment. The minimum mapping unit, or resel (resolution element), is the smallest element for which we can uniquely represent data, and it may not be the same as the raster cell size. A 10 m resolution from a SPOT satellite image in monochromatic mode means that objects smaller than 10 m will not be captured and displayed as such.
(SPOT is the abbreviation for Satellite Pour l'Observation de la Terre, the French satellite first launched in 1986.) It has been suggested, as a rule of thumb based on statistical sampling theory, that one should use a raster cell half the length, or one-fourth the area, of the smallest feature to be recorded. A more conservative suggestion is to use a raster cell one-third or one-fourth the length of the smallest desired feature (Star & Estes 1990: 37).

The display of data in raster systems is conceptually simple and straightforward. This is because raster data in rows and columns fit easily into the horizontal and vertical line display structure of most output devices. The one major drawback of this scheme is that lines at any other angle produce a kind of distortion called aliasing. This stair-step appearance ("jaggies" in some vocabularies) is caused by the limited resolution of the raster array and can be quite distracting. Various algorithms have been produced to overcome this problem. One approach, for example, selectively chooses points so as to minimize the stair-step appearance. This technique is known as anti-aliasing (Star & Estes 1990: 174).
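The sketch below checks the cell-size rules of thumb and the megapel arithmetic given above. The function names and the 20 m smallest feature are hypothetical. The fractions are those quoted from Star & Estes.

```python
def suggested_cell_sizes(smallest_feature):
    """Cell side lengths suggested by the rules of thumb quoted above.

    smallest_feature is the length of the smallest feature to be recorded;
    the results are in the same units.
    """
    return {
        "one-half": smallest_feature / 2,    # sampling-theory rule of thumb
        "one-third": smallest_feature / 3,   # more conservative
        "one-fourth": smallest_feature / 4,  # more conservative still
    }


def megapels(columns, rows):
    """Resolution expressed in megapels (millions of pixels)."""
    return columns * rows / 1e6


sizes = suggested_cell_sizes(20.0)  # hypothetical 20 m smallest feature
print(sizes)
# A cell of half the length covers one-fourth of the area: (L/2)**2 == L**2 / 4.
assert sizes["one-half"] ** 2 == 20.0 ** 2 / 4
print(round(megapels(640, 480), 1), round(megapels(1280, 1024), 1))  # 0.3 1.3
```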


ORIENTATION

In a raster data structure, orientation is defined as the angle between true north and the direction defined by the columns of the raster. Put another way, it is the angular difference between the direction depicted as true north and the direction facing a designated upper edge of the cartographic plane. Orientation becomes important when the 2-D raster structure is transformed into, or used in, a pseudo-3-D illustration of the data by adding a Z-axis to the X,Y coordinate pair. This consideration arises when a triangular mesh is used to depict elevation models of the data, such as the triangular irregular network (TIN), which will be discussed in a later unit. For the moment, orientation is important for determining the slope and aspect of triangulated surfaces.

GEOMETRY OF REGULAR TESSELLATIONS

The geometry of regular tessellations can be described by first identifying zones. From these, it is possible to measure the perimeter of each zone, the distance from the zone boundary and the shape of each zone. Zones are identified by examining adjacent pixels that have the same values. Each such zone (sometimes also known as a patch) is given a unique number, and each pixel's value is then set to the number of its zone or patch. The area of a zone is obtained by counting the pixels it contains and assigning this value to each pixel instead of the zone number. Alternatively, these data can be printed as a summary table. The perimeter of a zone is measured along the cells at its edge. Its length is the number of exterior cell edges in the zone, and this value may likewise be assigned to each pixel instead of the zone number. Distance from zone boundary is measured from each pixel to the nearest part of its zone boundary, and this value is assigned to the pixel. The boundary is defined as the pixels that are adjacent to pixels of different values.
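The zone measures just described can be sketched as follows. This illustration rests on stated assumptions and is not the procedure of any particular GIS. Zones are taken to be 4-connected, meaning cells that share an edge, not just a corner. The area is a count of cells, and the perimeter is the count of exterior cell edges. The results are returned as a summary table rather than written back to each pixel. The sketch also computes the shape index described in the next paragraph: perimeter divided by the square root of area, then divided by 3.54. The example grid is hypothetical.

```python
import math
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_zones(grid):
    """Give each 4-connected patch of equal-valued cells a unique zone number."""
    n_rows, n_cols = len(grid), len(grid[0])
    zones = [[0] * n_cols for _ in range(n_rows)]
    next_zone = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if zones[r][c]:
                continue  # already part of a labelled zone
            next_zone += 1
            zones[r][c] = next_zone
            queue = deque([(r, c)])
            while queue:  # flood-fill the patch with the new zone number
                i, j = queue.popleft()
                for di, dj in NEIGHBOURS:
                    ni, nj = i + di, j + dj
                    if (0 <= ni < n_rows and 0 <= nj < n_cols
                            and not zones[ni][nj] and grid[ni][nj] == grid[i][j]):
                        zones[ni][nj] = next_zone
                        queue.append((ni, nj))
    return zones


def zone_stats(zones, cell_size=1.0):
    """Summary table: zone -> (area, perimeter, shape index)."""
    n_rows, n_cols = len(zones), len(zones[0])
    cells, edges = {}, {}
    for r in range(n_rows):
        for c in range(n_cols):
            z = zones[r][c]
            cells[z] = cells.get(z, 0) + 1
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                # An edge is exterior if the neighbour is off the grid or in another zone.
                if not (0 <= nr < n_rows and 0 <= nc < n_cols) or zones[nr][nc] != z:
                    edges[z] = edges.get(z, 0) + 1
    stats = {}
    for z in cells:
        area = cells[z] * cell_size ** 2
        perimeter = edges[z] * cell_size
        shape = perimeter / math.sqrt(area) / 3.54
        stats[z] = (area, perimeter, round(shape, 2))
    return stats


grid = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 3],
]
zones = label_zones(grid)
for zone, (area, perimeter, shape) in zone_stats(zones).items():
    print(zone, area, perimeter, shape)
```

In the example grid, the two 2 x 2 blocks score 1.13, the value for a square. The 1 x 4 strip scores 1.41, reflecting its longer and thinner shape.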
Shape of a zone is measured by comparing the perimeter length of the zone to the square root of its area. Dividing this ratio by a constant of 3.54 produces a measure that ranges from 1 for a circle (the most compact shape) to 1.13 for a square, and to larger numbers for long, thin and irregularly shaped zones. These measurements are important in practice. In ecological work, for example, the geometry and spatial arrangement of wildlife habitats are important variables for wildlife conservation and management. These include the size and shape of wetlands, wildlife corridors and migration routes.

1.3.2 FIXED SPATIAL RESOLUTION

Fixed spatial resolution models have the advantage that the structure of the mesh can be tailored to the areal distribution of the data. Grid cell data structures are examples of raster data with fixed spatial resolution. As the name implies, grid cells store information as numerical values in arrays. Each cell may represent a uniform parcel of land located on some large rectangular grid. The locational attribute of each cell is given by its row and column numbers, which identify its position. Each cell may be assigned a code or numerical value corresponding to the map feature. This is referred to as the thematic attribute of that cell. The data structure is thus a set of spatially referenced cells with a thematic attribute for each cell, and other layers of such cells represent other thematic attributes. The grid cell data structure lends itself to conventional statistical and mathematical operations. Some examples of these manipulations and transformations include the following:

- Reclassify. Data sets may be reclassified by dividing a continuous range of values into discrete levels or classes. Based on contiguity, cells with the same values are grouped into the same classes.
- Overlay. This can include the set-theoretic operations of intersection and union (cover), so that areas may be re-formed based on the values of cells at different map levels. Alternatively, arithmetic operations such as add, subtract, multiply and divide may be performed on the cellular data.
- Distance and connectivity. Paths define the line of progress along contiguous cells in any direction, and the number of cells traversed gives a measure of distance. Connectivity, on the other hand, is simply a measure of the linkage of the paths so defined. A further use is viewshed analysis to assess inter-visibility among locations.
- Characterising neighbourhoods. This operation seeks to determine the slope, aspect and orientation of cells. From these data it is also possible to produce profiles of the grid cells.
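Two of these operations, reclassification and overlay, can be sketched on small grids held as nested lists. This is a minimal illustration with hypothetical layers and class limits. Reclassification here works by value ranges only and omits the contiguity step mentioned above.

```python
import operator


def reclassify(grid, breaks):
    """Reclassify continuous cell values into discrete classes.

    breaks are ascending upper class limits: a value v gets class k
    (counting from 1) for the first limit with v <= limit, and values
    above the last limit get class len(breaks) + 1.
    """
    def class_of(v):
        for k, limit in enumerate(breaks, start=1):
            if v <= limit:
                return k
        return len(breaks) + 1

    return [[class_of(v) for v in row] for row in grid]


def overlay(grid_a, grid_b, op):
    """Combine two co-registered layers cell by cell with the function op."""
    return [[op(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]


# Hypothetical layers on the same 2 x 3 grid.
elevation = [[120, 140, 180], [150, 210, 260]]
near_water = [[1, 1, 0], [1, 0, 0]]  # 1 = present, 0 = absent
flat = [[1, 0, 1], [1, 1, 0]]

print(reclassify(elevation, [150, 200]))         # [[1, 1, 2], [1, 3, 3]]
print(overlay(near_water, flat, operator.and_))  # intersection: [[1, 0, 0], [1, 0, 0]]
print(overlay(near_water, flat, operator.or_))   # union: [[1, 1, 1], [1, 1, 0]]
print(overlay(near_water, flat, operator.add))   # add: [[2, 1, 1], [2, 1, 0]]
```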

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho Last updated: February, 14 2001


Module 2 Chap1B

RASTER GIS: AN INTRODUCTION



1.3.3 VARIABLE SPATIAL RESOLUTION

The quadtree is a variable spatial resolution model, sometimes equated with hierarchical tessellation models. The resolution is said to be variable in the sense that the size and density of the polygons vary over space. The mesh can be adjusted to reflect the density of data in each area, with cells made larger where data are sparse and smaller where data are dense. In the earlier discussion of tessellation models, we noted that such models have two principal properties. The first is that a repetitive pattern is produced. The second is decomposition, that is, the ability to be broken up into smaller and smaller units while retaining the same shape and orientation. Triangles can be subdivided into smaller triangles, but the orientation of adjacent triangles is radically altered. Hexagons, on the other hand, cannot be subdivided into smaller hexagons. They may, however, serve as the foundation for building larger hexagons in a nested pattern. A quadtree is a hierarchical, or tree-based, data structure based on the principle of the recursive decomposition of space. Quadtrees are differentiated by:

- the type of data they may be used to represent;
- the principles governing the decomposition process; and
- the type of resolution, whether variable or fixed.

In a quadtree, each node in the tree can be represented by two items of information: whether or not it contains further subdivisions and, if it does not, the attribute value associated with it. A node with no further subdivisions is a leaf of the tree, and it takes the attribute value associated with it. The great advantage is variable resolution within a single thematic coverage (see Martin 1991: 96). The quadtree structure can be used for point data, regions, surfaces and volumes. However, its widespread adoption is hampered by the fact that it is not well suited to resource management applications. Moreover, it has come into use only recently as a data structure, so knowledge and expertise are lacking. This is despite its earlier use in mapping organizations for the storage of map data. The structure may in future be useful for focusing on interesting, detailed subsets of data. Tydac Technologies Inc. (US) has produced a version of the quadtree structure in a turnkey system called SPANS (Spatial Analysis System). Figure 2.8 demonstrates quadtrees and the hierarchical structuring of raster data.
Figure 2.8 The quadtree data structure

http://infosys-law.canberra.edu.au/gismodules/manual_2/m2_chap1B.html (1 von 14) [09.06.04 09:20:13]


Source: Peuquet (1990: 270).
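The recursive decomposition behind a region quadtree can be sketched in a few lines. This minimal sketch assumes a square raster whose side is a power of two. A block whose cells all share one value becomes a leaf, and any other block is split into four quadrants. It illustrates the principle only and is not the structure used in SPANS or any other product. The function names and example raster are hypothetical.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Recursively decompose a square 2^n x 2^n raster into a quadtree.

    A homogeneous block becomes a leaf holding its value; otherwise the
    block becomes a list of four children in the order NW, NE, SW, SE.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first  # leaf: every cell in this block has the same value
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]


def count_leaves(tree):
    """Number of leaves, i.e. homogeneous blocks stored instead of single cells."""
    if not isinstance(tree, list):
        return 1
    return sum(count_leaves(child) for child in tree)


raster = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]
tree = build_quadtree(raster)
print(tree)                # [1, 0, 1, [1, 0, 0, 0]]
print(count_leaves(tree))  # 7
```

Here 7 leaves replace 16 cells. Large uniform areas are stored as single blocks, and detail is kept only where values change.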

The Thiessen polygon is an example of an irregular tessellation model. Its basic advantage is that there is no redundant data, and the structure of the mesh itself can be tailored to the areal distribution of the data. The resolution is variable because the size and density of the polygons vary over space. An irregular mesh reflects the density of data occurrences within each area of space, so each cell can be defined to contain the same number of occurrences. As a result, cells become larger where the data are sparse and smaller where the data are dense. The size, shape and orientation of the cells reflect the real-world phenomena, which is very useful for visual inspection and other types of analysis. The irregular tessellation most frequently used as a spatial data model in GISs is the triangular irregular network (TIN), or Delaunay triangulation, in which each vertex of the triangulated mesh has an elevation value. TINs are a standard method of representing terrain data for landform analysis, hill shading and hydrological applications (see Figure 2.9). A major problem with irregular triangulated networks is that many different triangulations can be generated from the same point set. There are also many different triangulation algorithms. As a result, the resulting networks lack consistency, and no single solution can be said to be the correct one.

Thiessen polygons, also called Voronoi diagrams or Dirichlet tessellations, are the logical counterpart (or dual) of the irregular triangulated mesh. Thiessen polygons are constructed by drawing the perpendicular bisector (at a 90° angle) of each side of each triangle. The result is an irregular polygonal mesh in which the polygons are convex and have a variable number of sides. Such polygons are useful for calculating the adjacency, proximity and reachability of places. Irregular polygonal tessellations also have problems. Overlaying two irregular meshes can be extremely difficult, and generating irregular tessellations is a complex and time-consuming task. These two factors make irregular tessellations cumbersome as a data model in GIS work (Peuquet 1990: 274-6).
Figure 2.9 Examples of Thiessen polygons and triangular irregular networks (TINs)
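A simple way to experiment with Thiessen polygons in a raster setting is to assign every cell to its nearest data point. The cells sharing a point's index then approximate that point's polygon. This is a swapped-in raster approximation, not the perpendicular-bisector construction described above, and its accuracy depends on the cell size. The function name and point coordinates are hypothetical.

```python
import math


def thiessen_raster(points, n_rows, n_cols, cell_size=1.0):
    """Approximate Thiessen (Voronoi) polygons on a raster.

    Each cell is assigned the index of the data point nearest to its centre.
    Points are (x, y) pairs with x increasing to the right and y increasing
    downwards, so that y follows row order. Ties go to the lower index.
    """
    grid = []
    for r in range(n_rows):
        y = (r + 0.5) * cell_size
        row = []
        for c in range(n_cols):
            x = (c + 0.5) * cell_size
            distances = [math.hypot(x - px, y - py) for px, py in points]
            row.append(distances.index(min(distances)))
        grid.append(row)
    return grid


# Hypothetical data points placed at cell centres on a 5 x 6 grid.
points = [(1.5, 1.5), (4.5, 1.5), (2.5, 3.5)]
for row in thiessen_raster(points, n_rows=5, n_cols=6):
    print(row)
```

Counting the cells assigned to each point gives an approximate polygon area. Checking which indices meet across cell edges gives the adjacency mentioned above.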

1.3.4 CLASS EXERCISE

The following pieces of work are designed to reinforce ideas gathered during the first half of this lesson. The exercise also demonstrates practical considerations and the conceptualization of problems when dealing with raster data.

A. TRIANGULAR TESSELLATIONS


On ordinary ruled writing paper, do the following:

1. Draw a horizontal line A-B 4 cm long, about halfway down the page.
2. Divide the line into two halves at the 2 cm mark.
3. Draw a perpendicular line at this halfway mark.
4. Draw in the sides of an equilateral triangle with 4 cm sides, based on the original horizontal line.
5. Divide your perpendicular line into three equal parts, beginning from the apex of the triangle, and draw lines through the division points parallel to your original horizontal line.
6. Recursively produce triangles of nearly the same size from the apex of the triangle to its base. Hint: draw in parallel lines at a diagonal, sloping left to right and right to left, to form triangles.
7. Examine the triangles you have drawn. Are the triangles all approximately equilateral? What about the orientation of the triangles in each of the lines: how many are north-pointing and how many are south-pointing? Work out the ratio of north-pointing to south-pointing triangles for each line, and then for the triangle as a whole.

B. HIERARCHICAL TESSELLATIONS

Part 1. A quadtree has a root. From the root there are four leaves. One leaf has four nodes. One of the nodes has four branches.

Grandfather/Grandmother
Father/Mother
Son/Daughter
Future offspring

Schematically represent the quadtree structure described above.

Part 2. You are given a standard land use code for mapping different uses. The code is in the following form:

Primary codes: 1 Industrial; 2 Transport; 3 Residential; 4 Other
Secondary codes: 3 Residential: 31 Single houses; 32 Semi-detached; 33 Multi-unit; 34 Other
Tertiary codes: 3 Residential: 31 Single houses: 311 Brick; 312 Wooden; 313 Brick-veneer; 314 Other

Assume that:
- each category of land use occupies a quarter of the land area;
- each category of residential use occupies a quarter of the land area; and
- each category of single residences occupies a quarter of the land area.

a. Identify the root, leaf and nodes from the above.
b. Draw a quadtree representation according to the percentage share of the land use types given in the data.

1.4 RASTER CONVERSIONS AND COMPRESSION


As the data used in GISs originate from various sources, this short section addresses the problem of converting input data into a readily usable form. Also, because data volumes can become extremely large and potentially unmanageable, data compression techniques may have to be used. In an earlier section we discussed raster data obtained from multispectral scanners (MSS) mounted on both aircraft and satellites. Data from these systems may be thought of as an array of brightness values for each wavelength band of the sensor. Several methods have been proposed to convert and store such data. One method is to keep all data pertaining to one variable (for example, rainfall, height or a spectral channel) as a separate array. This method is called band sequential (BSQ), since each array is kept in a separate file. An alternative, called band interleaved by pixel (BIP), places all of the different measurements for a single pixel together, giving a single array of multivariate pixels. The advantages of each method become apparent depending on whether one wishes to use a single layer of information or a combination of data themes. A BSQ raster database is more advantageous in the first case, while BIP is more efficient in the second. The band interleaved by line (BIL) organization occupies the middle ground between these two methods. Here, adjacent ground locations for a single theme are adjacent in the data file, and subsequent themes are recorded in sequence for that same line. In this way, the different themes corresponding to a row are relatively near each other in the file. All the values of one variable for a single line are stored before the values of the next variable for the same line. Such a data organization can speed up retrieval and subsequent processing compared with the two methods above (see Figure 2.10 and Star & Estes 1990: 77-80).
Figure 2.10 Raster data organizations


Figure 2.10 panels:
1. Band Sequential (BSQ): File 1, Zoning: A, B, ...; File 2, Land Cover: I, II, ...; File 3, Elevation: 120, 140, ...
2. Band Interleaved by Pixel (BIP): Line 1: Pixel 1 (A, I, 120), Pixel 2 (B, II, 140), ...
3. Band Interleaved by Line (BIL): Line 1: A B ... 120 140 ...; Line 2: B B ...

Source: Star & Estes (1990: 79, Figure 6.1).

While converting between the raster data structures described above is computationally simple, converting between vector and raster data can be difficult. The difficulties arise from differences in representation and storage methods. They are largely technical and need not concern us here, except to note that once vector data have been converted to raster form the conversion cannot be reversed: data may have been lost through generalization, data thinning and other simplifying methods, and the original vector data points may be irretrievably lost (see Star & Estes 1990: 80-85 for a discussion of the various methods of converting vectors to rasters).

Furthermore, input data may require reduction of various kinds because a given application may not need all the detail available; for example, average tree heights may be needed rather than the exact height of every tree in a forest. A more complex data reduction problem arises where land survey records contain a wealth of detail on ownership history, area, land use and so on. One option is simply to accept it all into the database, at the cost of increased processing and storage. An alternative is to accept a less precise representation of the original data. This latter option is called generalization. Also called map dissolve or thinning, the process produces less detailed classes by combining two or more detailed classes into one. Generalization reduces the level of classificatory detail, but the result may make the underlying pattern more apparent, which is one of the tasks of a GIS. Such procedures apply equally to vector and raster data structures. For vector data, the precision of a plotted map depends on the resolution the plotter is capable of; where generalization has to be performed, the procedure is called thinning and a lower-resolution product results.
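The BSQ, BIP and BIL organizations described at the start of this section differ only in the order in which the same values are written to a file. The sketch below, a minimal illustration assuming NumPy and two made-up numeric bands (the nominal codes of Figure 2.10 are replaced by numbers for simplicity), writes one small two-band raster in each order and reads one band back. The function names `interleave` and `read_band` are hypothetical, not part of any GIS package.

```python
import numpy as np

# Two co-registered layers (bands) of a 2 x 2 raster, held as (band, row, column).
# The numbers are illustrative only.
stack = np.array([
    [[1, 2], [3, 4]],      # band 0, e.g. elevation
    [[10, 20], [30, 40]],  # band 1, e.g. rainfall
])

# Axis order in which each scheme walks the (band, row, column) stack.
AXIS_ORDER = {
    "BSQ": (0, 1, 2),  # one whole band after another
    "BIL": (1, 0, 2),  # for each line: band 0's values, then band 1's, ...
    "BIP": (1, 2, 0),  # for each pixel: all of its band values together
}

def interleave(stack, scheme):
    """Return the 1-D sequence in which the values would be written to a file."""
    return np.transpose(stack, AXIS_ORDER[scheme]).ravel()

def read_band(flat, scheme, n_bands, n_rows, n_cols, band):
    """Recover one band from a flat sequence written with the given scheme."""
    shape = {
        "BSQ": (n_bands, n_rows, n_cols),
        "BIL": (n_rows, n_bands, n_cols),
        "BIP": (n_rows, n_cols, n_bands),
    }[scheme]
    cube = flat.reshape(shape)
    if scheme == "BSQ":
        return cube[band]          # one contiguous block
    if scheme == "BIL":
        return cube[:, band, :]    # one contiguous run per line
    return cube[:, :, band]        # every n_bands-th value

for scheme in ("BSQ", "BIL", "BIP"):
    print(scheme, interleave(stack, scheme).tolist())
# BSQ [1, 2, 3, 4, 10, 20, 30, 40]
# BIL [1, 2, 10, 20, 3, 4, 30, 40]
# BIP [1, 10, 2, 20, 3, 30, 4, 40]
```

The comments in `read_band` show the trade-off in the text: one band is a single contiguous block in BSQ, a run per line in BIL, and scattered across the whole file in BIP, whereas all the values for one pixel sit together only in BIP.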

1.5 SPATIAL RELATIONSHIPS


1.5.1 GEOGRAPHICAL DATA MATRIX AND QUANTIZED DATA
In handling geographical data, the use of a matrix has been of considerable benefit, not only in helping to conceptualize the task at hand but also in providing an organizing structure for the available data. The geographical data matrix of Brian Berry (1968), an Anglo-American geographer, is a matrix whose rows are places or locations and whose columns are geographical themes, from soils and precipitation through to population and economic activities, for each of those places. He later added a third dimension to the matrix to portray time slices (see Module 1 page 19, Figure 1.8). The data matrix has been used a great deal in spatial analysis, where each cell or pixel represents some value. However, this means that two-dimensional data are transformed from their original continuous distributions into quantized data elements, that is, from a continuous distribution to a discrete one. Burroughs (1986: 20) says that such transformations have very important effects on the estimation of lengths and areas from cellular data, especially where the cells are rather large. The example given by Burroughs is the classic Pythagorean theorem for the length of the hypotenuse, $a^2 = b^2 + c^2$. In a right triangle with sides 3 and 4, the length of the hypotenuse is 5. In a raster-coded triangle with sides of 3 and 4 cells, the length of the long side is either 4 or 7 cells, depending on whether one counts cell edges or whole cells that must be traversed. The area of the right triangle is 6 square units, whereas the area of the rasterized triangle is 7 square units. There is thus a loss of precision that needs to be compensated for by suitable algorithms. Figure 2.11 shows the effects of raster coding of a triangle.
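One way to reproduce the two raster lengths in the hypotenuse example is to count steps between neighbouring cells: 7 steps if moves are only allowed across cell sides (4-connected), 4 steps if diagonal moves are allowed and each counts as one (8-connected). The sketch below is a minimal illustration under that interpretation; the function name is hypothetical. It also gives each diagonal step its true length of about 1.41 units, which still does not recover the true length of 5.

```python
import math

def hypotenuse_lengths(dx, dy):
    """Compare ways of measuring the long side of a right triangle
    whose legs are dx and dy cells long."""
    true_length = math.hypot(dx, dy)            # Pythagoras: sqrt(dx**2 + dy**2)
    steps_4_connected = abs(dx) + abs(dy)       # moves only across cell sides
    steps_8_connected = max(abs(dx), abs(dy))   # diagonal moves allowed, each counted as 1
    # Give each diagonal move its true centre-to-centre length, sqrt(2), instead of 1.
    diagonal = min(abs(dx), abs(dy))
    weighted_8_connected = diagonal * math.sqrt(2) + (steps_8_connected - diagonal)
    return true_length, steps_4_connected, steps_8_connected, weighted_8_connected

true_length, four, eight, weighted = hypotenuse_lengths(3, 4)
print(true_length, four, eight, round(weighted, 2))  # 5.0 7 4 5.24
```

Each counting rule gives a different answer and none equals 5, which is the loss of precision the text describes.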
The above discussion reveals some of the limitations of the raster representation of geographic features discussed by Star & Estes (1990: 36ff). One limitation is that cells are not equidistant from all their neighbours, even though every cell in a regular tessellation has the same area. The distance from the centre of a cell to the centre of a neighbour across a side is one unit, whereas the distance to a diagonal neighbour is about 1.41 units (the square root of 2). So, in searching through the data we may use a 4-connected neighbourhood, in which the search pattern runs from the central cell to the cells directly above and below and to either side. When the diagonal cells are included, an 8-connected neighbourhood is used and the neighbours are no longer evenly spaced: some share only a vertex with the central cell while others share an edge. In both cases, because all neighbouring cells have the same size and shape, spatial neighbourhood similarity is said to exist. There can also be schemes in which the cells in a neighbourhood have different sizes and organization (see Figure 2.12).
Figure 2.11 Quantizing effects of raster coding on distance and area


STORAGE ISSUES
The object of an efficient data structure is to reduce redundancy in a database, that is, to use as few records as possible to record the location (geographical address) and value of contiguous groups of identically valued pixels. Representing data in a more compact form is known as data compression. Where storage is compact, the companion activity of retrieval will also be relatively efficient and quick. Aronoff (1989: 168), discussing data compression, identifies four commonly used methods: run length encoding, standard run length encoding, value point encoding and quadtrees; these are discussed next. We will also discuss two other methods, chain codes and block codes, since these address similar issues with different approaches.

1.5.2 RUN LENGTH ENCODING
This method exploits the fact that many datasets have large homogeneous regions. Adjacent cells along a row that have the same value are treated as a group, termed a run. Each row in the grid is examined in turn, and homogeneous pixels, that is, pixels having the same value, are grouped together. A disadvantage of this method is that pixel groups are identified in only one direction (parallel to the x-axis), so cells with the same value in the rows above and below the current row are represented separately. Figure 2.13 shows an example of how data are run-length encoded.
Figure 2.13. Run length encoding


In run-length encoded data, the original data are replaced by pairs, or tuples. The first number in each pair is a counter giving the number of repetitions of the second number, the data value. So if the cells of an original data matrix held values in the following order, the run-length encoded form would be much more compact:

Original data values in each cell: 1 1 1 3 3 3 2 3 3 3 3 3 3 1 1 (total = 15)
Run length encoded tuples: (3 1) (3 3) (1 2) (6 3) (2 1) (total = 5)

The original 15 values are thus replaced by 5 tuples, a reduction of about 67 per cent if each tuple is counted as one element. Counting every stored number (two per tuple, 10 in all), as the totals for the methods below do, the reduction is about 33 per cent.

1.5.3 STANDARD RUN LENGTH ENCODING
In this method the row number, the value of the attribute and the number of cells in the run are recorded. It is referred to as a standard method because the row number is always preserved in the dataset, so that the original dataset can be reconstructed if the need arises. As an example, the following shows an original dataset of 60 elements and its run-encoded version, which reduces the number of elements by 50 per cent.

Original data values in each cell:
AAAAAAAAAAAAAAA
AAAAAAACCCCCCCC
AAABBBBCCCCCAAA
BBBBBBBCCCAAAAA
Total number of elements = 60

Standard run length encoding
Row       1   2   2   3   3   3   3   4   4   4
Value     A   A   C   A   B   C   A   B   C   A
Length   15   7   8   3   4   5   3   7   3   5

Total number of elements = 30

1.5.4 VALUE POINT ENCODING
Here the procedure begins by assigning a position number to each cell, starting at the upper left corner and proceeding left to right and from top to bottom through the cell matrix. The position number of the last cell in each run is recorded in a point column, while the value of the cells in the run is recorded in a value column. With this method the start and end of each run can be found simply by looking down the table, since each run begins one position after the end of the previous run. Because a run may continue from the end of one row onto the start of the next, the original 60 elements can be reduced to 18 (a 70 per cent reduction).

Original data values in each cell:
AAAAAAAAAAAAAAA
AAAAAAACCCCCCCC
AAABBBBCCCCCAAA
BBBBBBBCCCAAAAA
Total number of elements = 60

Value point encoding
Value    A   C   A   B   C   A   B   C   A
Point   22  30  33  37  42  45  52  55  60
Total number of elements = 18
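The three encodings in 1.5.2 to 1.5.4 can be checked against the worked examples with a short sketch. This is a minimal illustration using Python's standard `itertools.groupby`; the function names are hypothetical. Element totals count every stored field, as the text does for the standard (3 fields per run) and value point (2 fields per run) methods. Note that the final run of row 4 (AAAAA) has length 5, consistent with the value point table, where the last run ends at position 60.

```python
from itertools import groupby

def run_length(values):
    """Simple run length encoding: a list of (count, value) pairs."""
    return [(len(list(run)), value) for value, run in groupby(values)]

def standard_run_length(rows):
    """(row number, value, run length) triples; runs restart on every row."""
    return [(row_number, value, count)
            for row_number, row in enumerate(rows, start=1)
            for count, value in run_length(row)]

def value_point(rows):
    """(value, end position) pairs, with cells numbered 1..n left to right,
    top to bottom, so a run may continue from one row onto the next."""
    cells = [value for row in rows for value in row]
    pairs, end = [], 0
    for count, value in run_length(cells):
        end += count
        pairs.append((value, end))
    return pairs

def decode_value_point(pairs):
    """Rebuild the flat sequence of cells from (value, end position) pairs."""
    cells, start = [], 0
    for value, end in pairs:
        cells.extend([value] * (end - start))
        start = end
    return cells

def stored_numbers(records):
    """Count every stored field, as the element totals in the text do."""
    return sum(len(record) for record in records)

row = [1, 1, 1, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 1, 1]
grid = ["AAAAAAAAAAAAAAA",
        "AAAAAAACCCCCCCC",
        "AAABBBBCCCCCAAA",
        "BBBBBBBCCCAAAAA"]

print(run_length(row))  # [(3, 1), (3, 3), (1, 2), (6, 3), (2, 1)]
print(stored_numbers(run_length(row)),
      stored_numbers(standard_run_length(grid)),
      stored_numbers(value_point(grid)))  # 10 30 18
print(value_point(grid))
# [('A', 22), ('C', 30), ('A', 33), ('B', 37), ('C', 42), ('A', 45), ('B', 52), ('C', 55), ('A', 60)]
```

The value point table has one run fewer than the standard table because the A run of row 1 continues into row 2, which is exactly why it stores fewer elements.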

1.5.5 QUADTREES
Quadtrees, discussed previously in Part A, address both the resolution and the redundancy issues directly. The quadtree is a more compact form of data storage and has the advantage of using variable-sized grid cells. Instead of dividing an area into cells of one size, finer and finer subdivisions are used only in those regions where finer detail has been recorded. These include points, lines and polygon boundaries, where a transition from one data value to another takes place. Conceptually, the quadtree rests on a process of subdivision based on specific criteria, and the best resolution obtainable is governed by the cell size of the original dataset. Operations such as point-in-polygon searches and the whole family of set-theoretic operations can be performed on quadtree-structured data, and it is easy to generate data at any level of detail. However, a major disadvantage is that creating a quadtree data structure is very time consuming, and changes to the original data may require the entire quadtree to be rebuilt. Burroughs (1984: 24) is of the view that the largest problem associated with quadtrees is that "tree translation is not translation-invariant: two regions of the same shape and size may have quite different quadtrees, so consequently shape analysis and pattern recognition are not straight forward". So, transformations of the map or dataset such as translation, rotation or scaling can pose considerable problems for a quadtree data structure. Because a quadtree is not easy to modify once it has been built, more complex maps pose greater challenges. The quadtree is best suited to maps or datasets containing large homogeneous regions; a region may be split into parts and may also contain holes.
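The subdivision process, and the translation problem quoted from Burroughs, can be seen in a small region-quadtree sketch. It assumes a square grid whose side is a power of two and stores a homogeneous block as a single value and a mixed block as a list of four children (NW, NE, SW, SE); this representation and the function names are illustrative choices, not a particular GIS's format.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Region quadtree of a square grid whose side is a power of two.

    A leaf is a single cell value; an internal node is a list of four
    children in the order NW, NE, SW, SE.
    """
    if size is None:
        size = len(grid)
        if size == 0 or size & (size - 1) or any(len(r) != size for r in grid):
            raise ValueError("grid must be square with a side that is a power of two")
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first                      # homogeneous block: stop subdividing
    half = size // 2
    return [build_quadtree(grid, row, col, half),                # NW
            build_quadtree(grid, row, col + half, half),         # NE
            build_quadtree(grid, row + half, col, half),         # SW
            build_quadtree(grid, row + half, col + half, half)]  # SE

def count_leaves(node):
    """Number of stored blocks (leaves) in the tree."""
    return sum(count_leaves(child) for child in node) if isinstance(node, list) else 1

# The same 2 x 2 patch of 1s, in the corner and shifted one cell down and right.
corner = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]

print(build_quadtree(corner), count_leaves(build_quadtree(corner)))  # [1, 0, 0, 0] 4
print(count_leaves(build_quadtree(shifted)))                         # 16
```

Moving the identical patch by a single cell takes the tree from 4 leaves to 16, because the patch now straddles all four quadrants: the quadtree is not translation-invariant.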
1.5.6 CHAIN CODES AND BLOCK CODES
Chain codes and block codes are two other means of reducing the storage space and retrieval time required. In a chain-coded representation of a map, the outline of a region gives its size and shape. Starting from any point on the border of an object, for example the boundary of a park, the sequence of directions from each boundary cell to the next is recorded systematically. As shown in Figure 2.14, the location of an initial point is established and a square locational code grid is superimposed on it. From the centre of the locational grid there are eight adjoining points to which it is possible to move in one step. Each point is located at the crossing point of two of the grid lines that adjoin the centre point, and the points are identified by numbers ranging from 0 to 7. Once a move in one of these directions is made and recorded, the locational grid is re-centred over the new location and the next move is defined in the same way.
Figure 2.14. Freeman chain coding
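The re-centring procedure above amounts to recording, for each step along the boundary, which of the eight neighbouring positions comes next. The sketch below encodes an ordered list of boundary cells as direction codes and decodes them again. The numbering of Figure 2.14 is not recoverable here, so it assumes 0 = north with codes increasing clockwise, as the text describes; other texts start at east and number anticlockwise. Tracing the boundary itself is not shown, and all names are hypothetical.

```python
# Direction codes 0-7, assumed to run clockwise from north.
# Offsets are (row change, column change), with rows increasing downwards.
DIRECTIONS = {
    0: (-1, 0),   # N
    1: (-1, 1),   # NE
    2: (0, 1),    # E
    3: (1, 1),    # SE
    4: (1, 0),    # S
    5: (1, -1),   # SW
    6: (0, -1),   # W
    7: (-1, -1),  # NW
}
CODE_FOR_OFFSET = {offset: code for code, offset in DIRECTIONS.items()}

def encode(path):
    """Return (start cell, codes) for an ordered path of 8-connected cells."""
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in CODE_FOR_OFFSET:
            raise ValueError(f"cells {(r0, c0)} and {(r1, c1)} are not neighbours")
        codes.append(CODE_FOR_OFFSET[step])
    return path[0], codes

def decode(start, codes):
    """Re-centre on each new cell in turn, as in the text, to rebuild the path."""
    cells = [start]
    r, c = start
    for code in codes:
        dr, dc = DIRECTIONS[code]
        r, c = r + dr, c + dc
        cells.append((r, c))
    return cells

# Boundary of a 3 x 3 block of cells, traced clockwise from its top-left cell.
boundary = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0)]
start, codes = encode(boundary)
print(start, codes)  # (0, 0) [2, 2, 4, 4, 6, 6, 0, 0]
```

Only the starting cell and one small code per step are stored, which is where the saving comes from; rebuilding the region's interior from the outline is the costly reconstruction step the text mentions.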

Chain coding is an efficient means of storing zones and neighbourhoods if such regions are stored as separate entities in the database. However, reconstructing the original dataset from these entities may be too time consuming and costly. Figure 2.14 shows chain coding (sometimes called Freeman chain coding) using cardinal directions, coded in a clockwise direction.

Block codes are a two-dimensional extension of run length codes. The method is to tile the area to be mapped with square blocks (Burroughs 1986: 23). The data structure records the origin and the radius of each square. Called a medial axis transformation (MAT), the aim is to fit as many large square blocks as possible over areas with data values of the same type; thus, the larger the squares that can be fitted into a region, the more efficient block coding becomes. The method has advantages for performing union and intersection of regions and for detecting properties such as elongation. Figure 2.15 illustrates a simple region described by a medial axis transformation in block coding.
Figure 2.15 Medial Axis Transformation of a Region using block coding.

Region stored as 9 unit squares plus 4 four-cell (2 x 2) squares plus 1 sixteen-cell (4 x 4) square.
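The idea of covering a region with as few, and as large, squares as possible can be sketched with a simple greedy procedure: scan the grid row by row and, at each cell of the region not yet covered, grow the largest square that fits. This is an assumption-laden illustration rather than the exact medial axis transformation: the greedy scan is not guaranteed to find the fewest squares, and it records each square by its top-left corner and side length rather than the origin and radius mentioned in the text. The function name is hypothetical.

```python
def block_code(grid, value=1):
    """Greedy square tiling of the cells equal to `value`.

    Returns (row, column, side) for each square, where (row, column) is the
    square's top-left cell.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    covered = [[False] * n_cols for _ in range(n_rows)]

    def fits(r, c, side):
        """True if a side x side square at (r, c) is inside the grid,
        wholly in the region and not yet covered."""
        if r + side > n_rows or c + side > n_cols:
            return False
        return all(grid[i][j] == value and not covered[i][j]
                   for i in range(r, r + side)
                   for j in range(c, c + side))

    blocks = []
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] != value or covered[r][c]:
                continue
            side = 1
            while fits(r, c, side + 1):    # grow the square as far as possible
                side += 1
            for i in range(r, r + side):
                for j in range(c, c + side):
                    covered[i][j] = True
            blocks.append((r, c, side))
    return blocks

region = [[1, 1, 1, 1, 0],
          [1, 1, 1, 1, 0],
          [1, 1, 1, 1, 1],
          [1, 1, 1, 1, 1]]
print(block_code(region))  # [(0, 0, 4), (2, 4, 1), (3, 4, 1)]
```

Here 18 region cells are stored as 3 blocks; a region made of thin strips would gain little, which matches the remark that block coding becomes more efficient as larger squares fit the region.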

1.5.7 SAMPLING EFFECTS AND GENERALIZATION
Often, data collected in the field need to be pre-processed before input into a database. For example, data may be too numerous for a particular application, and complex data may need to be simplified. Such pre-processing may involve a change of scale. One option is to accept the full level of detail, enlarging the volume of data in the database. Alternatively, the data may be selectively reduced so that the dataset is less precise than the field data. These processes are collectively known as generalization, sometimes also called map dissolve, in which, through a process of classification, less detailed classes are obtained by combining two or more detailed classes. While generalization may reduce the level of classification detail, the result might make underlying patterns more apparent, which is one of the tasks of a GIS. Star and Estes (1990: 91) have suggested that raster data for continuous variables, for example rainfall or vegetation, may be generalized in two steps:

1. Compute the average value of the attribute in a 2 x 2 neighbourhood of cells (that is, 4 cells in total);

2. Record this average value in a new raster cell at the geographical location of the point shared by the four original raster cells.

These procedures increase the linear dimensions of the raster cells by a factor of exactly 2, so that four original cells cover the same area as a single derived cell. The general procedure is also known as resampling, which may also be used where the new cell length is not an integer multiple of the initial cell length. For nominal and ordinal data, because cell values represent codes rather than numerical quantities, the problem is more complicated. For example, in a raster containing ordinal data, if slope is coded 1 for steep, 2 for medium and 3 for undulating, it may be difficult to decide on a code for a 2 x 2 neighbourhood containing cells coded 1 and cells coded 3. Similarly, in a raster containing nominal data, such as one recording individual plant species, the question arises of how to code a cell derived from a mixture of species. In these cases rules for aggregation need to be developed. A simple rule is that the majority or plurality class takes precedence. Another rule is to define a hierarchy of classes, so that records at one level, say the local or individual level of resolution, can be grouped at a higher level, giving a coarser resolution. Such rules produce new categories of data aggregated from the original data. Some references call this procedure re-coding, since the classification involves assigning new numerical values to cells derived from a combination of two or more original cells. In the majority rule case, the category that covers the largest fraction of the aggregated area determines the attribute value of the new cell. Ties are resolved by deciding beforehand which attribute takes precedence; alternatively, ties may be treated as an entirely new class of mixed attributes. Averaging functions need not be the arithmetic mean; the median and the mode, for example, may also be used.
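Both kinds of aggregation described above, averaging for continuous data and a majority rule with a tie-breaking convention for coded data, can be expressed as one block-aggregation routine with a pluggable combining function. The sketch below is a minimal illustration with made-up rainfall and slope grids; the names `aggregate`, `mean` and `majority` are hypothetical. It implements both tie rules from the text: a precedence order decided beforehand, or a separate "mixed" class.

```python
from collections import Counter

def aggregate(grid, factor, combine):
    """Replace each factor x factor block of cells by combine(block values)."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of the aggregation factor")
    return [[combine([grid[i][j]
                      for i in range(r, r + factor)
                      for j in range(c, c + factor)])
             for c in range(0, n_cols, factor)]
            for r in range(0, n_rows, factor)]

def mean(block):
    """Arithmetic mean, suitable for interval or ratio data such as rainfall."""
    return sum(block) / len(block)

def majority(block, precedence=None, mixed="mixed"):
    """Majority (plurality) rule for coded data.

    Ties go to the first tied class in `precedence` if one is given;
    otherwise they become a separate 'mixed' class.
    """
    counts = Counter(block)
    top = max(counts.values())
    tied = [value for value, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    for value in precedence or []:
        if value in tied:
            return value
    return mixed

rain = [[10, 12, 20, 22],
        [14, 16, 24, 26],
        [30, 30, 40, 44],
        [30, 34, 40, 40]]
slope = [[1, 1, 2, 3],    # 1 = steep, 2 = medium, 3 = undulating
         [1, 2, 3, 2],
         [3, 3, 1, 3],
         [3, 1, 3, 1]]

print(aggregate(rain, 2, mean))       # [[13.0, 23.0], [31.0, 41.0]]
print(aggregate(slope, 2, majority))  # [[1, 'mixed'], [3, 'mixed']]
print(aggregate(slope, 2, lambda b: majority(b, precedence=[1, 2, 3])))  # [[1, 2], [3, 1]]
```

The same routine with `factor=4` would take a 25 m raster to 100 m cells, as in the resolution example that follows. Averaging the slope codes instead would turn a block of 1s and 3s into 2 ("medium"), a class that may describe none of the original cells.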
Such functions are effectively a kind of filter in which only the general trends of the original data remain. A general problem arising from either filtering or resampling occurs where two datasets have different spatial resolutions. For example, where one dataset has a cell size of 25 m and another a cell size of 100 m, combining them would begin by generalizing the first dataset by a factor of 4 so that the two datasets can be merged at the coarser resolution of 100 m. For practical purposes, database management principles dictate that the original 25 m and 100 m resolution data be kept separately and preserved for future use. Extreme care should also be taken in interpreting the aggregated data. A rule of thumb is that the aggregated data can have no greater spatial precision than the input data layer with the coarser spatial resolution.


Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho Last updated: February, 15 2001


Module 2 Chap2-5-summary

RASTER GIS: AN INTRODUCTION



2. RASTER DATA MANAGEMENT


The data management function is to make the data available to users. Part of this task is to ensure the quality of the data used. In this brief section we look at the quality of the data that have been input, because error is associated with all geographic information: it is introduced at every step of generating, processing and using geographic information. Aronoff (1989: 141) says that "the objective in dealing with error should not be to eliminate it but to manage it". Table 2.1 summarizes the sources of error that one should be aware of. The table is self-explanatory, and some time should be spent examining it to appreciate the magnitude of the problem at each stage of using a GIS.

Most GISs include some form of database management system (DBMS), which should be transparent to the user. In other words, the DBMS hides the details of storage and retrieval from users: the user need only understand how to access and use the information, without being concerned about the technical and physical aspects of the storage and retrieval processes within the computer system. The DBMS allows one or more users to work efficiently with the data. The essential components of a DBMS provide users with the means to define the contents of a database, insert new data, delete old data, query the database for its contents and modify those contents. Other aspects of DBMSs, such as efficiency, current DBMS technology and spatial database management, are discussed in greater detail in Star & Estes (1990: Ch. 7, pp. 126-142).
Table 2.1 Common sources of error encountered in using a GIS

Stage                Sources of error
Data collection      errors in field data collection
                     errors in existing maps used as source data
                     errors in the analysis of remotely sensed data
Data input           inaccuracies in digitizing caused by operator and equipment
                     inaccuracies inherent in the geographic feature (for example, edges such as forest edges, that do not occur as sharp boundaries)
Data storage         insufficient numerical precision
                     insufficient spatial precision
Data manipulation    inappropriate class intervals
                     boundary errors
                     error propagation as multiple overlays are combined
                     slivers caused by problems in polygon overlay procedures
Data output          scaling inaccuracies
                     error caused by inaccuracy of the output device
                     error caused by instability of the medium
Use of results       the information may be incorrectly understood
                     the information may be inappropriately used

Source: Aronoff (1989: 142).

Sometimes, before the data are used, they may need to be filtered to correct rough edges. These perturbations are not errors per se, but peculiarities in the data that need to be smoothed. Filtering operates by moving a window across the entire raster, for example 3 x 3 cells at a time. The new value for the cell at the middle of the window is a weighted average of the values in the window, so that by changing the weights we can produce smoothing or edge enhancement. In smoothing, a low-pass filter removes or reduces local detail, whilst in edge enhancement a high-pass filter exaggerates local detail. All weights should add to 1. Three examples are given in Figure 2.16 below. In the first (a), each cell value is replaced by a simple unweighted average of itself and its eight neighbouring cells (each weight is 1/9, shown rounded to .11); the result strongly smooths spatial variation in the layer. In (b) the middle cell has 12 times the weight of each neighbouring value, so spatial variation is only slightly altered. In (c), however, local detail is enhanced by giving the neighbours negative weights.
Figure 2.16 The use of weights to filter data
     (a)              (b)              (c)
.11  .11  .11    .05  .05  .05    -.1  -.1  -.1
.11  .11  .11    .05  .60  .05    -.1  1.8  -.1
.11  .11  .11    .05  .05  .05    -.1  -.1  -.1

Filters can be useful for enhancing detail in images for input to a GIS, or for smoothing layers in order to expose general trends. Note that in applying these filters, each weight is multiplied by the corresponding cell value in the window of the original matrix and the products are summed. The details of this computation need not concern us here, save to say that a GIS will have such a function and will perform the operation automatically once an appropriate command is given.
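The moving-window computation can be sketched in a few lines of NumPy. The sketch below applies the three weight sets of Figure 2.16 to a single "spike" in an otherwise flat layer. It assumes that cells on the edge of the raster are handled by repeating the nearest edge value (`numpy.pad` with `mode="edge"`); real GISs use various edge conventions. Weight set (a) uses exactly 1/9 rather than the rounded .11 so the weights sum to 1. The function and constant names are hypothetical.

```python
import numpy as np

def moving_window(raster, weights):
    """Weighted 3 x 3 moving-window filter.

    Each output cell is the sum of weight x value over the window centred on
    that cell. Edge cells are handled by repeating the nearest edge value,
    which is only one of several possible conventions.
    """
    raster = np.asarray(raster, dtype=float)
    weights = np.asarray(weights, dtype=float)
    padded = np.pad(raster, 1, mode="edge")
    n_rows, n_cols = raster.shape
    out = np.zeros_like(raster)
    for dr in range(3):
        for dc in range(3):
            # Shifted view of the padded raster lined up with weight (dr, dc).
            out += weights[dr, dc] * padded[dr:dr + n_rows, dc:dc + n_cols]
    return out

# The three weight sets of Figure 2.16; (a) uses 1/9 exactly rather than .11.
SMOOTH = np.full((3, 3), 1 / 9)
MILD = np.full((3, 3), 0.05)
MILD[1, 1] = 0.60
ENHANCE = np.full((3, 3), -0.1)
ENHANCE[1, 1] = 1.8

spike = np.zeros((5, 5))
spike[2, 2] = 9.0
print(moving_window(spike, SMOOTH)[1:4, 1:4])  # the spike is spread as (very nearly) 1.0 over 9 cells
print(moving_window(spike, MILD)[2, 2])        # about 5.4: only slightly reduced
print(moving_window(spike, ENHANCE)[2, 2])     # about 16.2: the spike is exaggerated
```

Because each weight set sums to 1, a layer with the same value everywhere passes through all three filters unchanged; only local variation is smoothed or exaggerated.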


3. RASTER DATA MANIPULATION AND ANALYSIS


Here we concern ourselves with the tasks of manipulating and analysing data. By manipulation we mean the handling of spatial datasets in an information system. Analysis means looking for general principles underlying observed phenomena. In this section we examine reclassification and aggregation, spatial operations on neighbourhoods, making measurements and modelling with data layers.

3.1 RECLASSIFICATION AND AGGREGATION

It often happens that the data available to an analyst are not in the right form, either because the categories used are inappropriate or because the database has the wrong scale and resolution. Where the categories need to be changed, the procedure is to recode the original data to produce new categories that aggregate like occurrences. Figure 2.17 demonstrates attribute aggregation, beginning with a species map (a), then recoding it into two classes, conifers or deciduous (b), and finally (c) removing the redundant boundaries to show the two classes of trees. This process of aggregation by overlaying has numerous applications and, while conceptually simple, can produce numerous problems in practice.
Figure 2.17 Attribute aggregation of species in a forest stand

Another form of overlaying uses logical rules to combine or eliminate cells; such methods are commonly described as Boolean (or logical) operations. In raster arrays, the OR operator combines two layers so that a cell is selected if either condition is met. The AND operator selects cells where both conditions are met at the same time, while XOR (exclusive OR) selects cells where one condition or the other is met, but not both (see Figure 2.18).

Classification operates on several data layers and uses statistical procedures to locate and describe relatively homogeneous regions. This method is frequently used with remotely sensed data and employs two different approaches: supervised and unsupervised classification. In the former, the analyst supervises the classification by determining the classes needed beforehand, using training sites where those categories are found. A statistical analysis of each site is performed, and the characteristics obtained are used as the basis for finding similar areas elsewhere on the map. Problems with this technique include the variability of characteristics among members of the same class and the choice of training sites, among others. In unsupervised classification, by contrast, statistical clustering procedures are used to find areas with similar attribute relationships, which are then labelled as belonging to the same class.

There is a danger of making mistakes when combining raster data measured on different scales of measurement. While spatial aggregation involves increasing the size of the elemental unit in the raster array, the user will also need to develop rules of aggregation for merging different attributes of the data. Such rules allow the user to take account of the interdependence of factors. The method also provides a legitimate way of combining data at different levels of measurement. The output values may be nominal, ordinal or binary according to the definition of the relationship. By establishing such rules, the user can avoid overlaying data in ways that would be implausible in the real world, and so avoid incorrect interpretations. Table 2.2 summarizes valid methods of analysis. Note that as the level of measurement increases, a greater number of analysis methods becomes available.
Figure 2.18 Logical operations on a raster array

Input layers:
  Layer 1      Layer 2
  A A B        6 7 7
  A A B        6 7 7
  C C B        8 8 7

Reclassified layers (A = 1, other = 0; 7 = 1, other = 0):
  1 1 0        0 1 1
  1 1 0        0 1 1
  0 0 0        0 0 1

Logical AND: What cells are both A AND 7? Multiply the reclassified layers:
  0 1 0
  0 1 0
  0 0 0

Logical OR: What cells are A OR 7? Add the reclassified layers, then reclassify any sum of 1 or more to 1:
  Sum          Result
  1 2 1        1 1 1
  1 2 1        1 1 1
  0 0 1        0 0 1

Logical XOR (exclusive OR): Where does either A or 7 occur, but not both? Add the reclassified layers, then reclassify a sum of exactly 1 to 1:
  Sum          Result
  1 2 1        1 0 1
  1 2 1        1 0 1
  0 0 1        0 0 1
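The reclassify-then-combine steps of Figure 2.18 map directly onto array arithmetic. The sketch below, assuming NumPy and the two input layers of the figure, reproduces the AND (multiply), OR (add, then keep sums of 1 or more) and XOR (add, then keep sums of exactly 1) results. Variable names are illustrative.

```python
import numpy as np

# The two input layers of Figure 2.18.
letters = np.array([["A", "A", "B"],
                    ["A", "A", "B"],
                    ["C", "C", "B"]])
numbers = np.array([[6, 7, 7],
                    [6, 7, 7],
                    [8, 8, 7]])

# Step 1: reclassify each layer to binary (1 where the condition holds, else 0).
is_a = (letters == "A").astype(int)
is_7 = (numbers == 7).astype(int)

# Step 2: combine arithmetically, then reclassify the result.
both = is_a * is_7                    # AND: 1 x 1 = 1, anything x 0 = 0
total = is_a + is_7                   # number of conditions met: 0, 1 or 2
either = (total >= 1).astype(int)     # OR: at least one condition met
one_only = (total == 1).astype(int)   # XOR: exactly one condition met

print(both.tolist())      # [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
print(either.tolist())    # [[1, 1, 1], [1, 1, 1], [0, 0, 1]]
print(one_only.tolist())  # [[1, 0, 1], [1, 0, 1], [0, 0, 1]]
```

NumPy also offers `np.logical_and`, `np.logical_or` and `np.logical_xor` on boolean arrays; the arithmetic form is used here because it mirrors the multiply and add steps of the figure.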

Table 2.2 Summary of valid methods of aggregation

Level of measurement    Valid methods of analysis
Nominal                 rules of aggregation
Ordinal                 minimum, maximum, rules of aggregation
Binary                  Boolean, counting (by adding), multiplication, rules of aggregation
Interval                minimum, maximum, rules of aggregation, arithmetic
Ratio                   minimum, maximum, rules of aggregation, arithmetic

Source: Itami & Raulings 1994: 7.

3.2 SPATIAL OPERATIONS ON NEIGHBOURHOODS
Spatial operations may be performed on neighbourhoods or clusters of cells in a raster array. Distance may be computed by calculating the distance of each cell from a given cell, or from the nearest of several cells. The value of each cell in the new layer is thus its distance from the given cell or cells. Buffer zones can be built around objects and features and are a useful device for GIS analysis. For example, a 500 m logging buffer may be placed around lakes and streams to conserve water quality before logging operations begin. A buffer may be thought of as spreading outwards by a given distance from the object under scrutiny, so the buffer layer will have the value 1 in the originally selected object, 2 in the buffer and 0 outside both. Noise buffers around roads and safety buffers around hazardous facilities are other examples. A buffer layer is usually created in two steps: first a distance operation, and second a reclassification of the distance layer. A friction layer can also be introduced by combining the distance layer with another criterion, such as the volume of noise generated by 2- and 4-lane highways. Such a buffer will have a variable shape, wider for the noisier segments of the highway and narrower for the quieter ones. A final spatial operation is the computation of viewsheds, or visible areas. Given a layer of elevations and one or more viewpoints, the area visible from at least one viewpoint may be computed; the resulting layer contains 1 for visible cells and 0 for others. Such an operation is useful for planning the locations of unsightly facilities such as cooling towers, or of facilities that need a wide view, such as fire lookout towers, surveillance or transmission facilities.
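The two-step buffer procedure (a distance operation followed by reclassification) can be sketched as follows. This is a minimal illustration assuming straight-line distances measured between cell centres and a brute-force search over all selected cells, which is fine for a small teaching grid but far too slow for real rasters, where a GIS would use an efficient distance transform. The function names and the 100 m cell size are hypothetical.

```python
import math

def distance_layer(grid, cell_size):
    """Straight-line distance from each cell centre to the nearest cell coded 1.

    A brute-force sketch: every cell is compared with every selected cell.
    """
    targets = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == 1]
    if not targets:
        raise ValueError("the layer contains no selected cells")
    return [[cell_size * min(math.hypot(r - tr, c - tc) for tr, tc in targets)
             for c in range(len(grid[0]))]
            for r in range(len(grid))]

def buffer_layer(grid, cell_size, width):
    """Reclassify the distance layer: 1 = object, 2 = within the buffer, 0 = outside."""
    distances = distance_layer(grid, cell_size)
    return [[1 if grid[r][c] == 1 else 2 if d <= width else 0
             for c, d in enumerate(row)]
            for r, row in enumerate(distances)]

# A single selected cell (say, a small pond) in a grid of 100 m cells.
pond = [[0] * 5 for _ in range(5)]
pond[2][2] = 1
for row in buffer_layer(pond, cell_size=100, width=100):
    print(row)
```

With a 100 m width only the four side neighbours fall in the buffer, because the diagonal neighbours are about 141 m away; widening the buffer to 150 m would include them too. This is the uneven spacing of the 8-connected neighbourhood discussed earlier.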

3.3 MAKING MEASUREMENTS


In making measurements based on layers of raster data, the primary task is to identify zones. A zone is a group of adjacent cells or pixels that have the same value (or sometimes a range of values), and each such patch or zone is given a unique number. The area of each zone may then be obtained by counting its cells and multiplying by the area of one cell, and output in the form of a summary table which may be printed. The perimeter of a zone may also be measured; its length is obtained simply by summing the number of exterior cell edges in the zone. Area and perimeter measured in this way are highly dependent upon the orientation of the zones with respect to the orientation of the grid. The distance from the zone boundary is obtained by measuring the distance from each cell to the nearest part of its zone boundary, where a boundary is defined as cells that are adjacent to cells of a different zone. The shape of a zone may be described by comparing the perimeter length of a zone to the square root of its area. Dividing this ratio by 3.54 (approximately $2\sqrt{\pi}$, the value of the ratio for a circle) gives a measure that ranges from 1.0 for a circle (the most compact shape possible) through 1.13 for a square to large numbers for long, thin, elongated zones. Measuring the shape of zones is important in studying the effects of the geometry and spatial arrangement of habitats: for example, how the size and shape of wetlands affect the number and variety of water birds they can sustain, or the value of corridors across logging areas or roads for the movement and migration of animals.
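The area, perimeter and shape measures above can be computed directly from a raster of zone numbers. The sketch below counts cells for area, counts exterior cell edges for perimeter, and divides perimeter / square root of area by $2\sqrt{\pi}$ (the exact form of the 3.54 constant), so a square scores about 1.13. It assumes zones have already been numbered; the function name and example grid are hypothetical.

```python
import math

def zone_measures(zones, zone_id, cell_size=1.0):
    """Area, perimeter and compactness index of one zone in a raster of zone numbers."""
    n_rows, n_cols = len(zones), len(zones[0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols) if zones[r][c] == zone_id]
    area = len(cells) * cell_size ** 2
    exterior_edges = 0
    for r, c in cells:
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            outside = not (0 <= nr < n_rows and 0 <= nc < n_cols)
            if outside or zones[nr][nc] != zone_id:   # edge faces another zone or the map edge
                exterior_edges += 1
    perimeter = exterior_edges * cell_size
    # 2 * sqrt(pi), about 3.545, is the perimeter / sqrt(area) ratio of a circle.
    shape_index = perimeter / math.sqrt(area) / (2 * math.sqrt(math.pi))
    return area, perimeter, shape_index

# Zone 1 is a 4 x 4 square, zone 2 a 1 x 8 strip, zone 3 a 3 x 8 block.
zones = [[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
         [1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3],
         [1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3],
         [1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3]]

print(round(zone_measures(zones, 1)[2], 2))  # 1.13 (square)
print(round(zone_measures(zones, 2)[2], 2))  # 1.8 (long, thin strip)
```

The index does not depend on `cell_size`, since perimeter and the square root of area scale by the same factor, but, as the text warns, perimeter counted from cell edges depends on how the zone is oriented relative to the grid.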

3.4 MODELLING WITH DATA LAYERS


As one develops experience and expertise in the use of GIS, so does one's need for more and better analytical techniques. Some GISs have weak analytical functions while others have powerful but complex ones, and each user will have specific needs for particular tools. In general, it is important that the user is able to extract the necessary data either for use in GIS analysis or for use in other numerical models; in that way, flexibility is maintained and the full advantages of a GIS may be exploited. The Boolean and other arithmetic functions introduced earlier can be used to develop a wide range of application models. At a minimum, the GIS should be able to extract both spatial and attribute data and to output them in a form that can be used by other analytical systems. Various spatial modelling strategies exist, but no one generic model has been developed; as a rule of thumb, each model needs to be constructed for the particular problem at hand. In doing so, the modeller will have to be conscious of the precision of the available data and of the valid methods of combining nominal, ordinal, binary, interval and ratio data. Variables may also be interdependent and need close attention to avoid autocorrelation. The choice of variables is important, as some variables have strengths over others and some are static rather than dynamic. These considerations play an important role in the whole process of modelling. Different strategies may be used for different models depending on the nature of the problem; the most important thing to keep in mind is the problem and the kind of information needed to solve it.

4. REPORTING
Any GIS must include software that permits the display of maps, charts and other tabular information on a variety of output media to accompany the written report. The choice of what to use in a report will depend largely on the nature of the data, the scale and resolution of presentation, hardware and software limitations, and the ultimate audience of the report. The GIS should also provide for the export of data and results in formats that other systems can use. A common product of a GIS is a map, generally defined as a two-dimensional scale model of a part of the Earth. This model is a systematic depiction of the Earth that uses symbols to represent certain objects and phenomena. Maps are an effective way of presenting a great deal of information about objects and the spatial relationships between them (Star & Estes 1990: 174ff).


Common thematic maps include the following:

Thematic maps concentrate on the spatial variations of a single theme, for example, rainfall, vegetation or land use.

Choropleth maps (from the Greek choros = area or region, and plethos = multitude or quantity) show the relative magnitudes of variables as they occur within the boundaries of unit areas. Population density plotted by census blocks, and variation in per capita income by districts or states, are common examples. Such maps use different tones, colours and shading patterns to show different values for each of the sub-areas. A key or legend usually shows the correspondence between data values and the shading patterns used.

Proximal or dasymetric maps focus on the location and magnitude of areas exhibiting relative uniformity; the density of crops in a region is an example. As in choropleth maps, shading and colour patterns are used, but unlike choropleth maps the boundaries of areas are based on changes in the data values themselves, so as to portray areas that are relatively homogeneous. The class intervals used may be defined by the analyst.

Contour or isarithmic maps represent quantities by lines joining data points of equal value: contour lines show differences in height and gradient, and isobars join places experiencing the same barometric pressure. Contouring is an extremely common operation in a GIS. The contour map is generated from point data: the system interpolates between known data points in order to calculate the position of specific contour lines. On topographic maps, contour lines are drawn at a set interval rather than at every height. On raster-based systems, colour codes are used to display differences in contour values, and usually a lookup table is used to cross-reference a cell's attribute value with its corresponding displayed representation.

Other kinds of maps include:
Dot maps, which depict spatial distributions by showing the density of occurrence of a phenomenon, for example, one dot to represent 10 trees.
Proportional circle maps, which show spatial distribution according to some phenomenon, such as population relative to the total population; larger circles therefore suggest larger populations.
Proportional line maps, which show the direction and magnitude of potential or actual flows. The origin and destination of passengers in a rail network is an example used frequently in network flow analysis.
Landform maps, which provide 3-D views of the Earth's surface.
Cartograms, which are drawn to depict some phenomenon as a function of some other characteristic or attribute of the object (see Figure 1.6 on page 16 of Module 1).
Animated maps, which are becoming a popular way to show change through time, such as the growth of a city as its population and area increase.
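The interpolation step in contouring can be illustrated with two spot heights. The sketch below assumes, for illustration only, that height varies linearly between the two known points and that contours are drawn at multiples of a fixed interval; real contouring routines work over a whole grid or triangulated network of points, and the function names are hypothetical.

```python
import math


def contour_crossing(p1, z1, p2, z2, level):
    """Point between p1 and p2 where the surface equals `level`, assuming the
    value changes linearly from z1 to z2; None if that level is not crossed."""
    if z1 == z2 or not (min(z1, z2) <= level <= max(z1, z2)):
        return None
    t = (level - z1) / (z2 - z1)  # fraction of the way from p1 to p2
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))


def crossings_between(p1, z1, p2, z2, interval):
    """All contour levels (multiples of `interval`) crossed between two spot heights."""
    first = math.ceil(min(z1, z2) / interval)
    last = math.floor(max(z1, z2) / interval)
    result = []
    for k in range(first, last + 1):
        level = k * interval
        point = contour_crossing(p1, z1, p2, z2, level)
        if point is not None:
            result.append((level, point))
    return result


# Hypothetical spot heights: 112 m at (0, 0) and 158 m at (100, 0), 20 m interval.
for level, (x, y) in crossings_between((0, 0), 112, (100, 0), 158, 20):
    print(level, round(x, 1), round(y, 1))
# 120 17.4 0.0
# 140 60.9 0.0
```

Only the 120 m and 140 m contours pass between these two points; the 160 m contour lies beyond the higher spot height.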

How the reports are presented is also governed by hardware considerations. The hardware includes graphic printers, plotters, film recorders and microfilm output. Using a printer, graphic displays may be achieved in one of two ways. First, one character may be used to depict each category, so that a picture of the whole area is built up. Second, over-printing one or more character types on the same spot can show variations in density (see Figure 1.11 on page 28 of Module 1, where the technique is used effectively in SYMAP maps). Displaying horizontal and vertical lines on a raster is quite simple, since such lines naturally follow the rows and columns of the array. However, diagonal lines become distorted and step-like when drawn. This effect is known as aliasing and is caused by the limited resolution of the raster array. Many line-drawing algorithms are available (Bresenham's algorithm, for instance), as well as techniques known as anti-aliasing. In anti-aliasing, a diagonal line occupying cells in a raster array is drawn so that cells in which the line occupies only a small fraction of the cell are given a lighter shade than cells in which it occupies a greater portion, and a cell fully occupied by the line receives the darkest shade. In this way, when viewed from a distance, the line appears smoother and straighter than one on which anti-aliasing has not been performed.
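Bresenham's algorithm decides, using only integer arithmetic, which raster cells to fill when drawing a straight line. The sketch below is the common all-octant form of the algorithm (not tied to any particular GIS), together with a hypothetical `render` helper that prints the chosen cells as characters so that the step-like aliasing is visible.

```python
def bresenham(x0, y0, x1, y1):
    """Cells (x, y) chosen to approximate the straight line from (x0, y0) to (x1, y1).

    Integer-only, all-octant form: `err` tracks how far the chosen cells have
    drifted from the true line and decides when to step in x, in y, or in both.
    """
    cells = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return cells


def render(cells, width, height):
    """Hypothetical helper: draw cells as '#' on a character grid, y increasing upwards."""
    rows = [["."] * width for _ in range(height)]
    for x, y in cells:
        rows[height - 1 - y][x] = "#"
    return "\n".join("".join(row) for row in rows)


print(render(bresenham(0, 0, 5, 2), 6, 3))
# ....##
# ..##..
# ##....
```

The staircase in the printed output is exactly the aliasing described above; anti-aliasing would additionally shade neighbouring cells according to how much of each the ideal line covers.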

5. OTHER CONSIDERATIONS
This short section illustrates the application of GIS to finding a location for a gravel pit. Consider the following hypothetical scenario. A client is looking for new locations to mine sand and gravel. A consulting geologist has been hired and has concluded that anywhere within 1000 m of a certain stream, in areas where the slopes are 5% or less, will yield the necessary sand and gravel. Wetlands are to be avoided, since they bog down plant and equipment. To test this, an initial model has been developed with the following specifications.

Model 1: Criteria

Best road distances:
Gravel deposits within 2000 m of primary roads or 1000 m of secondary roads.

Second best road distances:
Gravel deposits 2000 to 4000 m from primary roads or 1000 to 2000 m from secondary roads.

The gravel pit must be at least 10-20 ha in size. Larger deposits are even better.

Best sites:
Large deposits within close range of primary and secondary roads.

Second best sites:
Larger deposits within moderate range of primary and secondary roads.
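The Model 1 criteria can be written as simple cell-by-cell rules. The sketch below is a hypothetical encoding, not the manual's own model: the `Cell` record stands for values read from stream-distance, slope, wetland and road-distance layers, the 10 ha minimum takes the lower end of the 10-20 ha range, and the text's distinction between "large" and "larger" deposits is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical per-cell values read from the data layers."""
    stream_dist_m: float     # distance to the stream
    slope_pct: float         # slope in percent
    wetland: bool            # True if the cell lies in a wetland
    primary_road_m: float    # distance to the nearest primary road
    secondary_road_m: float  # distance to the nearest secondary road


def is_deposit(cell: Cell) -> bool:
    """Geologist's rule: within 1000 m of the stream, slope 5% or less, not wetland."""
    return cell.stream_dist_m <= 1000 and cell.slope_pct <= 5 and not cell.wetland


def road_class(cell: Cell) -> int:
    """2 = best road distance, 1 = second best, 0 = neither."""
    if cell.primary_road_m <= 2000 or cell.secondary_road_m <= 1000:
        return 2
    # Reaching here means primary > 2000 m and secondary > 1000 m.
    if cell.primary_road_m <= 4000 or cell.secondary_road_m <= 2000:
        return 1
    return 0


def rank_site(deposit_area_ha: float, road: int, min_area_ha: float = 10) -> str:
    """Combine deposit size (an assumed 10 ha minimum) with road access."""
    if deposit_area_ha < min_area_ha or road == 0:
        return "unsuitable"
    return "best" if road == 2 else "second best"


# Hypothetical cell values.
site_cell = Cell(stream_dist_m=600, slope_pct=3, wetland=False,
                 primary_road_m=2500, secondary_road_m=1500)
print(is_deposit(site_cell), road_class(site_cell))  # True 1
print(rank_site(15, road_class(site_cell)))          # second best
```

In a full analysis the deposit area would come from grouping contiguous deposit cells into zones and measuring their areas, as in section 3.3.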

Once the potential sites have been located, a mining permit and lease of the land must be obtained. Since the sites were on public land, the local Council required a public hearing on the proposal. This was because environmentalists were worried about water quality, bird watchers about loss of habitat, the tourism board about visual impacts, and the neighbourhood action group about dust and noise. In the event, a further model had to be developed to take account of the additional criteria identified by the various interest groups at the public hearing. The criteria included:

Model 2: Criteria

No lease within 200 m of any water, to protect water quality.
No lease within 800 m of any residential buildings within a 2000 m radius of the centre of the water body, to reduce conflicts over noise and dust.
No lease within the 2000 m viewshed of primary roads, to eliminate visual impacts along tourist roads.
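The Model 2 exclusions can be sketched in the same spirit. All of the following are assumptions made for illustration: water bodies and residential buildings are given as sample points in metres; the residential rule is read as "within 800 m of a building that itself lies within 2000 m of the water body's centre"; and whether a site lies in the 2000 m viewshed of primary roads is supplied as a precomputed flag (for example, from a viewshed layer). Returning the reasons, rather than a single yes/no, records which interest group's criterion rules a site out.

```python
import math


def exclusion_reasons(site, water_points, water_centre, buildings, in_road_viewshed):
    """List the Model 2 criteria that exclude a candidate site (empty list = allowed).

    Points are (x, y) tuples in metres; `in_road_viewshed` is an assumed,
    precomputed flag for the 2000 m viewshed of primary roads.
    """
    reasons = []
    if any(math.dist(site, w) <= 200 for w in water_points):
        reasons.append("within 200 m of water")
    # Only buildings within 2000 m of the water body's centre count.
    nearby = [b for b in buildings if math.dist(b, water_centre) <= 2000]
    if any(math.dist(site, b) <= 800 for b in nearby):
        reasons.append("within 800 m of residential buildings")
    if in_road_viewshed:
        reasons.append("in 2000 m viewshed of primary roads")
    return reasons


# Hypothetical coordinates.
print(exclusion_reasons((0, 0), [(300, 0)], (1000, 0), [(500, 0)], False))
# ['within 800 m of residential buildings']
```

A site passes Model 2 only if the list is empty; the final map combines this test with the Model 1 ranking.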

The illustration above demonstrates several aspects of a GIS project. First, the criteria in each model show how different people view the problem: the developer as opposed to the interest groups. Second, the models show why a GIS is most useful, given the necessary data layers: most of the spatial operations of a GIS are required, including buffers, zones, area, perimeter, Boolean operations and viewshed analysis. Third, the modelling and spatial operations are only the start of the analysis of any project, before any decisions can be made and any policy formulated. Fourth, the models assumed that the data are available; if they are not, then gathering the environmental and cultural data needed for the project becomes a further part of the project. Finally, a multi-theme analysis of the dataset identifies the locations for gravel pits, that is, the areas meeting all the necessary criteria. The final map indicates those areas in which to locate the sand and gravel mining pit.

SUMMARY

This module has introduced the layer concept as applied to raster data structures. Rasters were explained in terms of the various scales of measurement, building on ideas introduced in Module 1. Raster representation models such as simple tessellations, hierarchical structures and quadtrees were used to show the various ways rasters may be depicted. Because raster arrays can be very large datasets, raster conversion and compression techniques were described. In the storage and retrieval of raster data, the detection and editing of errors was discussed, in addition to the use of filtering. In the manipulation and analysis of raster data, attribute recoding was discussed in terms of logical or Boolean classification techniques, as well as neighbourhood operations, measurement and modelling techniques. Thematic and other proximal maps were described in the section on reporting and outputs from a GIS. A typical problem employing GIS methods was described, and the various steps from project conception through to the assessment of criteria were used to show the scope of GIS tools in a hypothetical problem.

FURTHER READING


Aronoff, S. (1989) GIS: A Management Perspective, Ottawa, Canada: WDL Publications. See especially Chapter 4 Data Input and Output, pp. 103-109, and Chapter 5 Data Quality, pp. 140-143.
Burrough, P.A. (1986) Principles of Geographical Information Systems for Land Resources Assessment, Oxford: Clarendon Press. See Chapter 2 on data structures for thematic maps, pp. 13-38, and especially the raster data structure on pp. 20-24.
Goodchild, M.F. & Kemp, K. (eds.) (1990) Introduction to GIS, NCGIA Core Curriculum, Santa Barbara, CA: NCGIA.
Maguire, D.J., Goodchild, M.F. & Rhind, D. (eds.) (1991) Geographical Information Systems, London: Longman Scientific & Technical.
Martin, D. (1991) Geographic Information Systems and their Socio-economic Applications, London: Routledge. See Chapter 6 Data Storage, pp. 82-101, and especially raster data storage on pp. 93-97.
Pazner, M. (1990) Unit 5 Raster GIS Capabilities, in Goodchild & Kemp (eds.), pp. 5-1 to 5-9.
Peuquet, D.J. (1990) A conceptual framework and comparison of spatial data models, in Peuquet, D.J. & Marble, D.F. (eds.) Introductory Readings in Geographic Information Systems, London: Taylor & Francis, pp. 250-285.
Star, J. & Estes, J. (1990) Geographic Information Systems: An Introduction, Englewood Cliffs, NJ: Prentice Hall. See especially Chapter 6 Preprocessing, pp. 76-98; Chapter 8 Manipulation and Analysis, pp. 143-173; and Chapter 9 Product Generation, pp. 174-190.
Tomlin, D. (1990) Unit 4 The Raster GIS, in Goodchild & Kemp (eds.), pp. 4-1 to 4-9.

REVISION
1. Give three examples of how basic spatial query, analysis and manipulation operations may be used to highlight a spatial phenomenon.
2. What types of geographical data fit the raster GIS data model best? What types fit worst?
3. What resolutions would be appropriate for the following problems:
determining logging areas in a State forest;
finding suitable locations for campsites in a wilderness area;
planning subdivisions to take account of noise from an airport?
4. "The most valuable skill in GIS is the ability to take a real problem and convert it into a series of GIS operations". Discuss.
5. Summarize the arguments for raster GIS and the application areas in which it has distinct (a) advantages, and (b) disadvantages.


CLASS EXERCISE
[This class exercise is under development. Please refer to handout.]

COMPUTER DEMONSTRATION
Log on to the Applied Science computer network in the computer lab (3C18) as RSTUDENT.
Go to the C: drive by typing C:
Change to the SAGE directory: CD\SAGE\BP
Start the SAGE demonstration programme by typing SAGE demo1
Type END or Control C to get out.

Demonstration 1 (DEMO1)
This program demonstrates the use of SAGE software on the PC. The demonstration will take approximately 15 minutes. There are 5 data files in this database on Browns Pond (BP): Altitude, Buildings, Trees, Water and Roads. This demonstration gives you an idea of the following:

1. Database of Browns Pond
2. Data input into SAGE
3. Combining maps in SAGE
4. Terrain analysis using SAGE
5. Data output from SAGE
6. File house-keeping operations

SAGE Commands you need
LIST shows the list of all files in a database. What is the size of the grid cells? Give the name and type of map displayed.
SHOW fn OVER fn: SAGE shows one fn (layer or filename) over another fn. It can also CROP, ROTATE, TILE, SCALE and EXAGGERATE any map. SAGE can also change the colours of any map and show special 3D surfaces.
IMPORT fn FOR fn: SAGE will import the file and load it, calling it the same or a different name.
POINT FOR SPOTS: SAGE can point at cells on a map and create a new map called SPOTS. To point at cells you give the row number, column number and the value, for example 23 20 1, 24 20 1, etc. SHOW SPOTS will show you the spots that you have chosen.
You can import AUTOCAD .DXF files and transfer between various raster-based applications such as IDRISI, ARC/INFO, and ASCII formats.
BOOLEAN BOTH = TREES AND WATER allows you to combine the TREES and WATER layers to form a layer called BOTH. Other spatial analysis tools in SAGE include MATH, CROSS, MAX, MIN, COVER and COPY.
Terrain analysis commands in SAGE include SLOPE, ASPECT, HILLSHADE and RADIATE.
a. Define the LOOKOUT.
b. Look at it over the ALTITUDE map:
POINT FOR LOOKOUT; 30 30 1; SHOW LOOKOUT OVER ALTITUDE
c. Generate views from the LOOKOUT.
d. Take account of trees in the area (no more than 2 cells into the trees):
RADIATE LOOKOUT TO 30 OVER ALTITUDE THRU TREES & SCREENING 50 FOR VIEW
e. Look at the VIEW map:
SHOW VIEW OVER ALTITUDE TEXT

Data Output
SAGE outputs to REPORTS, ASCII, or PCX images. To produce a PCX image of a map, use the SHOW command:
SHOW TREES PCX TREES XRES 60 YRES 60
A PCX file called TREES with a resolution of 60 x 60 pixels is produced.
EXPORT TEST FOR TEST: SAGE will search for a file called TEST and save it as an ASCII file.

Housekeeping
ERASE BOTH will delete the file called BOTH from the database.
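The RADIATE step above computes a viewshed from the LOOKOUT. The sketch below illustrates the general line-of-sight idea behind viewsheds; it is not SAGE's own algorithm. It ignores RADIATE's distance limit and tree screening, assumes a small grid of elevations held as nested lists with square cells, and samples each sight line at the nearest cell centres. The combined layer marks a cell 1 if it is visible from at least one viewpoint. Function names and the example terrain are hypothetical.

```python
import math


def viewshed(elev, vr, vc, observer_height=0.0):
    """0/1 grid: 1 where a cell can be seen from the viewpoint (vr, vc).

    A target is hidden if any cell sampled along the sight line rises above
    the straight line from the observer's eye to the target.
    """
    rows, cols = len(elev), len(elev[0])
    eye = elev[vr][vc] + observer_height
    vis = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if (r, c) == (vr, vc):
                vis[r][c] = 1
                continue
            # Gradient of the sight line from the eye to the target cell.
            target_grad = (elev[r][c] - eye) / math.hypot(r - vr, c - vc)
            steps = max(abs(r - vr), abs(c - vc))
            visible = True
            for i in range(1, steps):  # intermediate cells only
                t = i / steps
                sr = round(vr + t * (r - vr))
                sc = round(vc + t * (c - vc))
                grad = (elev[sr][sc] - eye) / math.hypot(sr - vr, sc - vc)
                if grad > target_grad:
                    visible = False
                    break
            vis[r][c] = 1 if visible else 0
    return vis


def combined_viewshed(elev, viewpoints, observer_height=0.0):
    """1 where a cell is visible from at least one viewpoint, else 0."""
    rows, cols = len(elev), len(elev[0])
    out = [[0] * cols for _ in range(rows)]
    for vr, vc in viewpoints:
        single = viewshed(elev, vr, vc, observer_height)
        for r in range(rows):
            for c in range(cols):
                out[r][c] = max(out[r][c], single[r][c])
    return out


# A hypothetical one-row terrain profile with a 5 m ridge in the middle.
ridge = [[0, 0, 5, 0, 0]]
print(viewshed(ridge, 0, 0, observer_height=1.0))                   # [[1, 1, 1, 0, 0]]
print(combined_viewshed(ridge, [(0, 0), (0, 4)], observer_height=1.0))  # [[1, 1, 1, 1, 1]]
```

From one end the ridge hides the far side; adding a second viewpoint at the other end makes every cell visible from at least one viewpoint.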

DEMO3 Cape Liptrap
Cape Liptrap is a region of natural, rugged beauty approximately two hours east of Melbourne. It is characterised by natural vegetation and rolling hills interspersed with farmland, and by often steep, rocky slopes adjacent to the ocean. In recent years there has been increasing pressure on the area for urban development, primarily holiday homes. Such developments are often highly inappropriate, degrading the visual and environmental amenity of the Cape. The Shire of Woorayl sought a method for determining the best areas for residential subdivision based on criteria that would protect the natural environment.

To view this demonstration, change directory with CD\SAGE\LIPTRAP and then type in: SAGE DEMO3

1. Data Capture
2. Base Maps
3. Base Maps
4. Model Criteria. Sub-models: erosion potential; access from existing roads; visual sensitivity; environmental sensitivity; visual amenity
5. Erosion Potential
6. Access from Roads
7. Visual Sensitivity
8. Ecological Sensitivity
9. Visual Amenity
10. Combining the models
11. Advantages of GIS model: clear criteria; what-if modelling by changing criteria; GIS as an objective and highly accurate tool

Copyright 1999 George Cho. Last updated: February 15, 2001.


Vector GIS T-o-C Intro

VECTOR GIS: AN INTRODUCTION


Module 3

TABLE OF CONTENTS
Preface
Acknowledgments
Introduction
Materials required
Aims
Objectives
1. Vector GIS: An Introduction
1.1 Spatial objects
1.2 Types of spatial objects: point, line and area
1.3 Point objects
1.4 Line objects
1.5 Area objects
1.6 Data acquisition
2. Vector data structures
2.1 Whole polygon structure
2.2 Arc-node model
2.3 Relational structure
2.4 DIME
2.5 DLG structure
2.6 Critique of vector data structures
3. Vector data: Storage and retrieval
3.1 Topology
3.2 Editing
A quick exercise
4. Vector manipulation and analysis
4.1 Analysis of spatial data
4.2 Integrated analysis of spatial and attribute data
5. Vector outputs
Summary
Further reading
Revision
Class exercise
Computer demonstration
Glossary
Index

PREFACE
Vector GIS: An Introduction is the third of five modules in the series A Self-Teaching Student's Manual for GIS. This manual is the result of work undertaken for a Committee for the Advancement of University Teaching (CAUT) National Teaching Development Grant for 1995. The two previous modules were Geographic Information Systems and Raster GIS: An Introduction. A further two modules follow this one and deal with Managing Attribute Data in GIS and Integrating Remote Sensing with GIS. In order to complete this self-contained unit successfully, users should be prepared to spend approximately ten hours reading and working with the manual, writing up results, doing extra reading and attempting an assessment exercise. The presentation style used in this and the following modules may be described as a spiral curriculum. In such a curriculum, the contents of the present module are used again in a following module, in more depth and detail, the next time the same or similar concepts are encountered. In general, there are four parts to a module:

1. the text, which presents both the conceptual and practical aspects of the module with examples from as many usages as possible;
2. diagrams, figures and other illustrative materials, which are used to explain and show relationships;
3. questions, exercises and problems to be solved, and an assessment; and
4. suggestions for further reading and research.

(See the curriculum chart in Figure 3.1.)

Figure 3.1 Interlocking modules of the spiral curriculum


It is advisable that you use this workbook as your personal notepad. A highlighter or bright coloured biro to underline text will help identify important points. In this workbook all important concepts, words and phrases are set out in bold letters, and words and phrases that carry meanings different from their usual ones are italicised. To begin with, you should browse through this workbook very quickly just to get a feel for its contents. Reading this preface helps! A tutor may walk you through this workbook, but the pace may or may not suit you. You should try to go through this workbook at a pace you are comfortable with.

The appendices are an important component of the overall module because they contain important tools. Hints on using computers, a glossary, an index, and some answers to workbook problems are provided here.

ACKNOWLEDGMENTS
I should also like to thank the following individuals and publishers for permission to reproduce their illustrations and examples in this workbook.

Stan Aronoff (1989) for Figure 6.17 on page 175 in Geographic Information Systems: A Management Perspective, Ottawa: WDL Publications; James Carter (1984) for the figure on page 18 of Computer Mapping: Progress in the 80's, Washington, D.C.: Association of American Geographers; ESRI (1990) for ideas contained in Lesson 8 Performing Geographic Analysis in Understanding GIS: The Arc/Info Method, Redlands, CA: ESRI; and Jeffrey Star and John Estes (1990) for Figure 4.10 on page 50 in Geographic Information Systems: An Introduction, Englewood Cliffs, NJ: Prentice Hall.

INTRODUCTION
This third module, Vector GIS: An Introduction, builds on the previous two modules, and on Module 2 in particular, since vector systems are a different way of handling spatial data. While rasters deal principally with data that can be stored, manipulated, analysed and presented in cellular form, vectors deal with data that may be referenced to the geographical primitives of points, lines, areas and volumes. In this module we deal primarily with the various methods of depicting these primitives and the associated procedural steps required of a GIS. The next module discusses ways of managing the attributes of a map and databases, while the final module covers the integration of remotely sensed data and imagery with GIS. This workbook is presented in four parts, in addition to a separate introduction and a summary. In the introduction, the representation of geographical entities as objects is discussed as a fundamental basis for most work in vector GIS. The primitives of points, lines, areas and volumes are explained as basic dimensions. Issues relating to data acquisition and the trials and pitfalls of digitizing are discussed. These discussions lead up to the first substantive section of this module, which deals with vector data structures. After briefly contrasting such a structure with the raster system, the geometric and topological characteristics of vectors are highlighted. A detailed discussion of the vector data structure is then given in terms of five different approaches, viz. whole polygon, arc-node, relational, Dual Independent Map Encoding (DIME) and Digital Line Graphs (DLG). A critique of the vector data structure concludes this section. In the second section, the storage and retrieval of vector data is discussed, including how to detect errors, edit the data and apply planar enforcement rules so that the data are usable in the analytical stage. In the third section, vector manipulation and analysis are described in terms of the principles of topology. The relationships between entities are explored, and some analytical functions, such as aggregating spatial data, integrating spatial with attribute data, overlays and the buffering of spatial primitives, are presented. In the fourth section, outputs from vector GISs are contrasted with those from raster systems. A summary of vector-based GISs concludes discussion in this module. Revision questions, library research and further reading, and a note on computer demonstrations are given at the end of this publication. A glossary of terms used and an index occupy the final pages of this workbook.

MATERIALS REQUIRED
Bright coloured biro or highlighter.
Sharp pencils, preferably HB (hard-black), rulers, erasers.
A4 mm graph paper, tracing paper.
Access to a personal computer (Intel-based IBM-compatible PC).
ARCVIEW Ver. 1.0 for Windows:
Intel-based 386 computer or higher;
MS Windows 3.0 or higher;
4 Mb memory in RAM, 8 Mb strongly recommended;
12 Mb hard disk space for software installation, additional disk space for data storage;
Colour monitor, mouse, Windows-supported printer (optional).

Internet
For students with access to the Internet, a visit to these sites might be profitable: "Free GIS Software" at http://www.esri.com/free/arcview1/arcview1.html and "GIS solutions for Everyone" at http://www.esri.com

AIMS
To have a student:

1. Appreciate how vector GISs describe, represent and use spatial objects as vectors.
2. Understand how spatial objects are captured for use in a GIS.
3. Distinguish between geometric and topologic characteristics of spatial objects.
4. Differentiate between the various types of vector data structures.
5. Evaluate the needs and requirements for planar enforcement.
6. Understand the analytical functions of vector GISs.
7. Make an informed assessment of trends in vector GIS.
8. Critique the advantages and disadvantages of vector GIS.

OBJECTIVES
As a result of completing this module, a student should be able to undertake the following tasks with a certain level of understanding and competence:

1. Describe how to represent basic spatial objects.
2. Explain the spatial primitives of point, line and area objects.
3. Describe how data may be acquired for a vector GIS.
4. Give examples of the geometric and topologic characteristics of a spatial entity and explain the differences between the characteristics.
5. Outline a typology of vector data structures including whole polygon, arc-node, relational, DIME and DLG.
6. Critique the typology of vector data structures.
7. Identify errors in the database and edit the data.
8. Define what is meant by topology.
9. Define what is meant by planar enforcement, and explain the rules for such enforcement.
10. List and describe five simple axioms that can be used to describe topology.
11. List and define some of the terms associated with the topological structuring of vector data.
12. Sketch out the possibilities of topological overlays using the spatial primitives of points, lines and areas.
13. Explain uses of buffering of points, lines and areas.
14. Give advantages and disadvantages of vector GISs.
15. Demonstrate the use of vector GIS in a real-world application.
16. Discuss the current trends in vector GIS and possible future directions of change and applications.

Copyright 1999 George Cho. Last updated: February 16, 2001.


Module 3 Chapter 1 Vector GIS

VECTOR GIS: AN INTRODUCTION



1. VECTOR GIS: AN INTRODUCTION


A vector GIS is simply a generic name for a class of GIS that use the vector data structure to describe, represent and use spatial objects. A vector is a quantity that requires both magnitude and direction for its description. This means that a vector can be represented graphically by a directed line segment (see Figure 3.2). On this line segment, for example, we can have:
a line whose length is equal to the magnitude of the vector, drawn to some scale;
a direction, indicated by the angle (inclination) of the line; and
the position of the arrowhead.

Figure 3.2: A vector showing distance (magnitude) and direction.
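The directed-segment description of a vector can be made concrete with coordinates. In this minimal sketch, direction is expressed as a compass bearing measured clockwise from north (an assumed convention; the text speaks only of the line's angle of inclination), and addition is done component by component, which for vectors pointing in the same or opposite directions reduces to adding signed numbers. The function names are hypothetical.

```python
import math


def segment_to_vector(x0, y0, x1, y1):
    """Components (dx, dy) of the directed segment from (x0, y0) to (x1, y1)."""
    return (x1 - x0, y1 - y0)


def magnitude(v):
    """Length of the vector: its magnitude."""
    return math.hypot(v[0], v[1])


def bearing(v):
    """Direction as a compass bearing in degrees, clockwise from north (+y axis)."""
    return math.degrees(math.atan2(v[0], v[1])) % 360


def add(v, w):
    """Vector addition, component by component."""
    return (v[0] + w[0], v[1] + w[1])


v = segment_to_vector(0, 0, 3, 4)
print(magnitude(v), round(bearing(v), 2))  # 5.0 36.87
print(add((2, 0), (-5, 0)))                # (-3, 0), just as 2 + (-5) = -3
```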

An important property of vectors is that they can be added and subtracted; when they are oriented in the same or opposite directions, this works just like adding numbers with plus and minus signs. During the 1991 Gulf War you may have come across the term vectoring. This simply means guiding an aircraft or missile in flight by means of radioed directions. Vectors, then, are useful because they are fundamental to building an understanding of how to represent basic objects in geographic space.

In the discussion above we have used the term objects. By an object we mean anything from a simple number to a complex entity such as a car or an insurance agency. Conceptually, we will have little or no difficulty in visualizing objects. Our difficulties arise when we want to show these objects on a map and to make further statements about them. We may wish either to describe them or to analyze them further; in other words, we wish to obtain a mapping of these objects in space. In this context, and for simplicity, we can say that a map is a bird's-eye view of a portion of the earth's surface, and that the map's basic function is to communicate information about how various phenomena are arranged on the surface of the earth. That is, we can use a map both to store and to communicate information. The information that we convey can concern physical phenomena (such as mountains and rivers), natural phenomena (flora and fauna) and social phenomena (such as populations, towns and ethnic groups), all of which are distributed over the earth's surface.

1.1 SPATIAL OBJECTS


There can be many types of spatial objects which are shown on a map. Such spatial objects have locations depicted simply as points, lines or areas. In addition, such spatial objects may have attributes. Attributes are non-spatial data such as a certain species of tree in the forest, the number of cars on the road, the arrangement of small containers etc. These attributes can thus be used to indicate that the objects possess one of the following spatial characteristics: continuous distributions, counts, ordered lists (ordinal), categories (categorical), dichotomous variables or vector indices. Such map features are represented by graphic symbols on maps. These locational and attribute information thus form the basis of many spatial relationships which we observe both implicitly and explicitly. Points, lines and areas (or polygons) are collectively known as spatial data and all features of the landscape may be reduced to one of these spatial data categories. Spatial data is an important concept because GIS technology and computers know nothing of an eagles nest or the tributary of a river or a stand of pine forest. However, computers can be taught to know something about points the location of all eagle nests, lines rivers and river systems and areas or polygons timber and forest stands of various species. Thus, to use computers for handling spatial data, we need to reduce the data in question to the level of a computers comprehension. To achieve this we need to specify three things to the computer:

1. where each feature is in geographic space: its location;
2. what each feature is: its identity; and
3. what each feature's spatial relationship is to the other features on the map, sometimes called map topology.

A spatial datum (the singular of data) on its own does not amount to much. Some tricky concepts have to be overcome, especially when we associate different objects together in space on a map or in a GIS. These include the following issues:

- distance and direction: how far, and in which direction;
- the distributional pattern of a geographic phenomenon, for example the spread of a disease;
- topological neighbourhood and contiguity, that is, nearness and locations that are next to each other; and
- the description of shape.

All these properties need somehow to be captured in any spatial database before the data as a whole can be useful for analysis and further manipulation. To this spatial data we must also add its associated non-spatial attribute information so that the data can be described and made more useful.
http://infosys-law.canberra.edu.au/gismodules/manual_3/m3_chap1.html (2 von 12) [09.06.04 09:35:16]

Module 3 Chapter 1 Vector GIS

Before we go on, notice what we are doing. We are adding new words to our vocabulary. This is an important task in itself, because these words and terms will crop up again later in greater volume and intensity. Let us add a few more terms and concepts before the real task begins in earnest.

A spatial data feature is an abstraction of real-world phenomena, and a feature is made up of both a spatial object and an attribute object. It is possible to have a composite feature made up of many features and an attribute, for example a hospital complex in which each building has a separate name for identification.

Spatial objects are the locational attributes of the feature and may comprise geometrical objects such as points, lines and areas. Usually these carry a spatial address, for example an X,Y coordinate pair (discussed later). Spatial objects are said to exist in four spatial domains, or spheres of influence, listed here in increasing degrees of abstraction:

- Euclidean space is the most richly structured space; for instance, measurements of distance and angle are possible.
- Metric space is based on the concept of distance, but distance is measured in a non-conventional way, for example in terms of time or cost.
- Topological concepts express spatial relationships, such as connectivity, between groups of objects in space. The storage and manipulation of such topological data gives current GISs much of their analytical power.
- The set-theoretic domain, in its most general form, is characterized by relationships such as overlap, containment, outside of and inside.

Attribute objects carry the non-locational attributes of a feature, and these alone identify the type of feature observed. Attribute objects are composed of one or more attribute tables; a simple example is the key or legend on any map. Entity is sometimes used to denote an element in reality, that is, the phenomenon of interest in reality.
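The feature model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a GIS library: the class names `SpatialObject`, `Feature` and `CompositeFeature`, and the hospital coordinates and building names, are hypothetical. It shows a feature pairing one spatial object with one attribute object, and a composite feature grouping several features under a shared attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # an X,Y coordinate pair (the spatial address)


@dataclass
class SpatialObject:
    """Locational part of a feature: a geometry kind plus its X,Y coordinates."""
    kind: str                 # "point", "line" or "area"
    coords: List[Coordinate]


@dataclass
class Feature:
    """A feature pairs one spatial object with one attribute object."""
    spatial: SpatialObject
    attributes: Dict[str, str]


@dataclass
class CompositeFeature:
    """Several component features sharing one overall attribute object."""
    parts: List[Feature]
    attributes: Dict[str, str] = field(default_factory=dict)

    def part_names(self) -> List[str]:
        return [part.attributes.get("name", "") for part in self.parts]


# Hypothetical hospital complex: each building is a separately named area feature.
ward = Feature(SpatialObject("area", [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]),
               {"name": "Ward Block"})
clinic = Feature(SpatialObject("area", [(5, 0), (8, 0), (8, 2), (5, 2), (5, 0)]),
                 {"name": "Outpatients Clinic"})
hospital = CompositeFeature([ward, clinic], {"type": "hospital"})
print(hospital.part_names())  # ['Ward Block', 'Outpatients Clinic']
```

Keeping attributes in a separate mapping from the coordinates mirrors the split between spatial objects and attribute objects in the text.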

1.2 TYPES OF SPATIAL OBJECTS: POINT, LINE AND AREA


According to the US Digital Cartographic Data Standards Task Force (1988), there are four spatial object types, classified by their dimensional characteristics:

- zero-dimensional (0-D): an object that has a position in space but no length; it is a point;
- one-dimensional (1-D): an object that has length and is composed of two or more 0-D objects; it is a line;
- two-dimensional (2-D): an object that has length and width and is bounded by at least three 1-D line segment objects; it is an area; and
- three-dimensional (3-D): an object that has length, width and height/depth and is bounded by at least four 2-D objects; it is a volume.

Three of these spatial object types are shown in Figure 3.3 below. Whereas points are considered entities at the first level, lines and segments may be considered second-level entities, and areas and polygons third-level entities. From these basic building blocks of points, lines and areas we are now in a position to examine each of these spatial objects in more detail. In particular, we are interested in how these spatial objects are represented in a database, since ultimately all GIS work is done with the aid of a computer. We must be able to convey our ideas to a computer system and store the information in a form that enables the system to manipulate the data and produce output for further description and analysis.

Figure 3.3. Types of spatial objects and their extensions. Source: NCGIA (1990) Introduction to GIS Core Curriculum, Unit 10.


1.3 POINT OBJECTS


Point objects have a position in space but no length. They are zero-dimensional objects, and points specify geometric location. Such a location is specified by one X,Y coordinate pair. Points are used to locate geographical phenomena on a map or to represent map features too small to be shown as lines or areas. A point is a generic name under which three special variations may be distinguished:

- An entity point is used to locate a point feature, or an areal feature that has been collapsed to a point because of the scale of representation.
- A polygon label point is a point found within a polygon that is used to locate labels or information about that polygon. This point is topologically linked to its boundary polygon. Polygon label points are not linked to attribute objects.
- A node is a zero-dimensional object that is a topological junction of two or more links, or the end point of a sequence of links. Unless nodes have attributes, they will be implicit in the data structure.

Points are thus used in a number of ways in GIS, computer graphics and digital cartographic data. Points are commonly used to indicate the features themselves, such as the centre of a field, the beginning of a track or the corner post of a paddock. When we attach names to these points we have point labels, such as a surveyor's benchmark or a trigonometric station on top of a hill. Points can also be used to define more complex spatial objects, such as transect lines, the boundaries of playing fields and so on. Using a simple coordinate system, it is possible to represent all objects in geographic space. The traditional format has been the Cartesian grid or plane coordinate system. This is an X,Y coordinate system in which both horizontal and vertical distances are measured from an arbitrary origin (see Figure 3.4). Such a system is used to project all points, lines and areas on the earth's surface onto a flat two-dimensional map, and it may be used to refer map locations to ground locations. An added advantage of this system is that it is one way of representing geographic data in a computer: coordinate lists record map features in the computer as sets of X,Y digits. Thus, the term digitize means the process of turning map features into data represented as a set of digits.

Figure 3.4. Cartesian coordinates for points, lines and area as X,Y coordinate strings.
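Because the plane coordinate system measures horizontal and vertical distances from one origin, the distance and direction between two points follow directly from their X,Y pairs. The sketch below assumes planar (projected) coordinates in a single unit. The function names are illustrative, and the direction is the mathematical angle measured counter-clockwise from the positive X axis, not a compass bearing.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # an X,Y coordinate pair in a plane coordinate system


def distance(p: Point, q: Point) -> float:
    """Straight-line (Euclidean) distance between two X,Y pairs."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def direction_degrees(p: Point, q: Point) -> float:
    """Angle of the line from p to q, counter-clockwise from the +X axis, in [0, 360)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0


print(distance((0, 0), (3, 4)))            # 5.0
print(direction_degrees((0, 0), (1, -1)))  # about 315 degrees
```

A surveying bearing (clockwise from north) would need a different formula; the point here is only that both measures come from coordinate differences.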


What we are using here is known as a vector data model, because X,Y coordinates have been used to define points, a series of X,Y coordinates to represent lines, and a closed loop of X,Y coordinate pairs to represent an area. In fact, a closed loop of straight line segments is termed a polygon. Remember that a vector has both magnitude (distance) and direction. As soon as we can communicate to a computer the locations of two points as digits, and the direction of the line joining them, we're in business.

It may be seen that all the points in Figure 3.4, A through I, have X,Y coordinate pairs. Attached to these points are some simple attribute data identifying, for instance, whether each is a point, part of a line or part of an area. These attribute data are the descriptive information that may be stored in a database about map features and where they are located on the map. The labels A through I are also part of the information: name labels, in this case. The logic of these data requires that a spatial reference or locational attribute be present. Table 3.1 is therefore the associated attribute table that is paired with Figure 3.4 and set up in the attribute database.

Table 3.1: An attribute table for points, lines and area derived from Figure 3.4.

Point   X,Y Coordinates   Attribute or Type of Point
A       1,1               P
B       3,1               L
C       4,2               L
D       4,3               L
E       5,3               L
F       3,4               A
G       1,4               A
H       2,5               A
I       5,5               A

Notes: P = point, L = line, A = area
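Table 3.1 can be held in a program as a lookup from point label to coordinates and type code, from which the coordinate string for each kind of feature can be extracted. Since Figure 3.4 is not reproduced here, this sketch assumes that the table order of the line and area points is the order in which they were digitized. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

XY = Tuple[int, int]

# Table 3.1 transcribed: point label -> (X,Y coordinates, type code)
TABLE_3_1: Dict[str, Tuple[XY, str]] = {
    "A": ((1, 1), "P"),
    "B": ((3, 1), "L"), "C": ((4, 2), "L"), "D": ((4, 3), "L"), "E": ((5, 3), "L"),
    "F": ((3, 4), "A"), "G": ((1, 4), "A"), "H": ((2, 5), "A"), "I": ((5, 5), "A"),
}


def coordinate_string(table: Dict[str, Tuple[XY, str]], code: str) -> List[XY]:
    """X,Y pairs of all points with the given type code, in table order."""
    return [xy for xy, point_type in table.values() if point_type == code]


def closed_ring(coords: List[XY]) -> List[XY]:
    """Close an area's boundary by repeating its first coordinate at the end."""
    return coords + coords[:1] if coords and coords[0] != coords[-1] else list(coords)


line = coordinate_string(TABLE_3_1, "L")
area = closed_ring(coordinate_string(TABLE_3_1, "A"))
print(line)  # [(3, 1), (4, 2), (4, 3), (5, 3)]
print(area)  # [(3, 4), (1, 4), (2, 5), (5, 5), (3, 4)]
```

Repeating the first coordinate makes the area a closed loop of X,Y pairs, which is what distinguishes it from the open line string.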

1.4 LINE OBJECTS


A line object is a spatial object made up of a connected sequence of points. Because lines have no width, a specified location must be on either one side of the line or the other, but never on the line itself. A point on a line that marks the junction between two or more links, or the end of a line, is known as a node, a special kind of point. The term line segment is sometimes used to describe the line joining two points, so the simplest one-dimensional (1-D) object is a straight line between two points. The term string is used to denote an ordered sequence of points; the line here represents a connected, non-branching sequence of line segments. More complex forms of lines include connected sets of straight lines, curves based on mathematical functions (arcs), and lines whose directions are specified (chains). For example, by denoting the beginning and end of a line one can also identify its right and left sides. Real-world examples of such directed lines or chains include one-way streets, the direction of conveyor belts and oil pipelines (see Figure 3.5). In more formal terms, then, a chain is a one-dimensional object composed of a sequence of non-intersecting line segments and/or arcs, bounded by nodes (not necessarily distinct) at each end. Chains may explicitly reference their start and end nodes, as well as the polygons that lie to the left and right of the chain with respect to the direction of digitising. As a general rule, chains in any network of a map layer cannot intersect other chains except at nodes. Where two nodes are connected the term used is a link; if its direction is specified it is called a directed link.

Figure 3.5 Line segments, strings and chains

An arc is the locus of points that forms a curve defined by a mathematical function. The sine curve is an example of an arc.
Line features are ordered sets of points and represent map features too narrow to be shown as areas, or features that theoretically have no width. Examples of such features are the boundaries of local government areas and State boundaries. When a sequence of non-intersecting chains, strings, links or arcs has closure, the resulting figure is described as a ring. It is still considered a one-dimensional object type because we are interested in neither the interior area nor the area surrounding this sequence of links.
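For a directed line, the left/right test mentioned above reduces to the sign of a two-dimensional cross product, and the length of a string is the sum of its segment lengths. This is a minimal sketch for a single directed segment in planar coordinates. The function names are hypothetical, and applying the test to a whole chain would need extra logic, for example choosing the relevant segment. Unlike the idealized statement that a location is never on the line itself, a computed point can fall exactly on the line, which the sketch reports separately.

```python
import math
from typing import List, Tuple

XY = Tuple[float, float]


def side_of_segment(start: XY, end: XY, p: XY) -> str:
    """Return 'left', 'right' or 'on' for point p relative to the directed segment start->end."""
    cross = (end[0] - start[0]) * (p[1] - start[1]) - (end[1] - start[1]) * (p[0] - start[0])
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "on"  # collinear with the line through the segment


def string_length(points: List[XY]) -> float:
    """Length of a string: the sum of its straight line segments."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


print(side_of_segment((0, 0), (10, 0), (5, 2)))  # left
print(side_of_segment((10, 0), (0, 0), (5, 2)))  # right (direction reversed)
print(string_length([(0, 0), (3, 4), (3, 10)]))  # 11.0
```

Reversing the direction of digitising swaps left and right. This is why a chain's left and right polygons are only meaningful together with its start and end nodes.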

As with points, lines can also have descriptive data attached. These descriptive data are represented to the computer as a set of numbers and characters. Usually a table is used, so that if the line were to represent a road we could have the following descriptive or attribute data for that line in an attribute data file of the database:

- Road name
- Road width
- Number of lanes
- Road type:
  1 = highway
  2 = arterial road
  3 = major road
  4 = suburban street
  5 = track (unpaved)
  ...
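An attribute record of this kind might be sketched as follows, with the road type stored as a numeric code and decoded through a lookup table. The field names, the `line_id` key linking the record to its line object, the use of metres for width and the sample values are all assumptions for illustration.

```python
from dataclasses import dataclass

# Road type codes from the list above
ROAD_TYPES = {
    1: "highway",
    2: "arterial road",
    3: "major road",
    4: "suburban street",
    5: "track (unpaved)",
}


@dataclass
class RoadAttributes:
    line_id: int         # hypothetical key linking this record to a line object
    road_name: str
    road_width_m: float  # assumed unit: metres
    lanes: int
    road_type: int       # one of the ROAD_TYPES codes

    def road_type_name(self) -> str:
        """Decode the numeric road type; unknown codes are reported as such."""
        return ROAD_TYPES.get(self.road_type, "unknown")


record = RoadAttributes(line_id=17, road_name="Example Street", road_width_m=7.5,
                        lanes=2, road_type=4)
print(record.road_type_name())  # suburban street
```

Storing a small integer code rather than the full description keeps the attribute file compact, and the lookup table acts like a map legend.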

1.5 AREA OBJECTS


Areas are continuous two-dimensional objects. For a bounded area we can focus either on just the boundary or on the region within it, depending on our interest. The bounded area may be homogeneous or it may be divided internally into areas with different characteristics. The word polygon is used interchangeably with area object: simple polygons are homogeneous, whereas complex polygons may be divided internally into various segments. The outer boundary is defined by a set of chains, and there may be one or more non-intersecting, non-nested inner boundaries, also defined by sets of chains (see Figure 3.6). In some references the words area and parcel have been used for these areal features, whose perimeters are defined by a series of enclosing segments and nodes. On a map showing only one feature, for example roads, the area defined by a polygon may belong to only one polygon. Polygons can also explicitly reference their component chains. When each polygon is encoded in the database as a sequence of locations that define the boundary of a closed area in a specified coordinate system, each polygon is stored as an independent feature and there is no way of referencing adjacent areas. The attributes of each polygon, such as vegetation cover or soil type, are stored with its coordinate list. Because each polygon is represented separately, the topological relationships between different spatial objects are lost: relationships such as polygons sharing boundaries, polygons within polygons, and other combinations with points and lines are not maintained by this database structure.

Figure 3.6: Simple polygons and a complex polygon
The database representation of a polygon is therefore composed of lines linking points (or nodes) in a series of segments. There must be closure; that is, the beginning and end points must be joined. A file of spatial data built up in this way is essentially a collection of coordinate strings with little or no structure, hence the term used to describe it: the spaghetti model. The structure is simple because it is just a map written down as Cartesian coordinates. As mentioned previously, the spatial relationships between polygon features are not encoded. Such information would have to be generated by searching through all the features in the data file and calculating whether or not they are adjacent. Because it has no structure, the spaghetti model is very inefficient for most types of spatial analysis, since any spatial relationship must be derived by computation. However, the model is efficient for digitally reproducing maps, because data extraneous to plotting the map are neither captured nor stored.
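The cost of the spaghetti model can be seen in a small sketch. Each polygon below is an independent, closed coordinate string with hypothetical coordinates. Finding which polygons are adjacent means comparing every pair of polygons for a shared boundary segment, and vertices on a shared boundary are stored once per polygon. The sketch assumes that shared boundaries were digitized with exactly identical vertices, which real digitizing does not guarantee.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

XY = Tuple[float, float]


def is_closed(ring: List[XY]) -> bool:
    """Closure: the beginning and end points of the polygon must be the same."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def segments(ring: List[XY]) -> Set[FrozenSet[XY]]:
    """Boundary segments as unordered pairs, so A->B and B->A compare equal."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}


def adjacent_pairs(polygons: Dict[str, List[XY]]) -> List[Tuple[str, str]]:
    """Brute-force search: test every pair of polygons for a common segment."""
    return [(a, b) for (a, ra), (b, rb) in combinations(polygons.items(), 2)
            if segments(ra) & segments(rb)]


def repeated_vertices(polygons: Dict[str, List[XY]]) -> Set[XY]:
    """Vertices stored in more than one polygon (the model's redundancy)."""
    counts = Counter(v for ring in polygons.values() for v in set(ring))
    return {v for v, n in counts.items() if n > 1}


spaghetti = {
    "P1": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "P2": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
    "P3": [(5, 5), (6, 5), (6, 6), (5, 6), (5, 5)],
}
print(all(is_closed(r) for r in spaghetti.values()))  # True
print(adjacent_pairs(spaghetti))                      # [('P1', 'P2')]
print(sorted(repeated_vertices(spaghetti)))           # [(1, 0), (1, 1)]
```

The number of pairwise comparisons grows with the square of the number of polygons. The topological structures discussed later avoid this by storing adjacency explicitly.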

1.6 DATA ACQUISITION


Data are an essential requirement of any GIS. Data, both spatial and aspatial (non-spatial), make up the largest investment of time and money in a GIS, often costing twice as much as the hardware and software put together, or more. The quality of the data plays an important role in making good decisions and formulating policy. Quality here refers not only to reliability and accuracy but also to timeliness, to having data in the correct amounts, and to having them in forms that are usable in a GIS. The range of data formats can be bewildering, so the challenge is to obtain data in a form that can be used immediately or to translate it into the correct form. These operational problems arise because of the numerous ways in which data are acquired. Users often generate their own data, which may ensure that form, quality and quantity are right for the purposes at hand. This is often not the case with existing datasets, which may be acquired in a variety of ways, including:

- maps and photography;
- records from site visits;
- related non-spatial information in printed and digital files;
- digital spatial data, such as records of demographic or land ownership data; and
- magnetic tapes containing data on topography and satellite imagery.

Given these forms of data, an essential task is manual or automated preprocessing, and this applies even to data collected by the user. The most important element to bear in mind here is to have information about the accuracy, precision, currency and spatial characteristics of the data to be used.

Scale and resolution of spatial data are important for problem-solving. Data for town planning purposes will have to be at a much larger scale than those required to draw out broad regional trends and variations. The media used to reproduce the data for visual interpretation may also play an important role. Film for aerial photographs can have different degrees of resolution or grain, so that terrestrial features captured on one film may be absent from another. The geometric and geodetic properties are also important: the former refer to size, shape and geographical orientation, whereas the latter refer to positions on the ground in relation to positions on the globe, and to the coordinate system and projections used. Spatial datasets are often accompanied by their counterpart, or dual, in the form of non-spatial elements, including such details as the date of data compilation, the observation criteria and source, logical consistency and completeness. Observation criteria refer to an aspect of ground truthing, that is, observing and verifying what is on the ground, but also verifying who the observer is. Data on presumed vegetation species and genera are of no use unless they can be so identified on the ground by a trained person such as a botanist. Logical consistency ensures that what is classified on one map and in one dataset is similarly classified in other maps and datasets.

There is a great deal of spatial information available in the public domain which may be useful for particular projects. More often than not it is produced by public agencies, particularly mapping agencies, and more recently it includes aerial photography and remotely sensed digital data from aerospace agencies. These data are becoming more accessible. In the United States, spatial data and information collected by public agencies are in the so-called public domain, with little or no restriction on their use. In other countries, however, there are restrictions for legal, economic and security reasons. Proprietary rights, intellectual property and copyright in such data impose legal restrictions, to which economic ones are tied: the owners see opportunities either to recoup costs or to generate revenue from their investment in creating the datasets. Data and maps of sensitive defence installations are, for obvious reasons, strictly controlled.

Creating one's own dataset usually arises either because there is a total absence of spatial information for the area or because the data are not in the right form. In creating one's own dataset, a lot of time must be spent planning the best strategy to acquire the data, considering such issues as the scale, data density, coverage and the geographical features to be included in or excluded from the final dataset. Data collection also involves fieldwork, which implies that some sampling design is implemented. The statistical techniques of spatial sampling include point sampling, line sampling, quadrats and transects, and random, exhaustive, systematic and stratified sampling strategies. These are well-known techniques described in good statistical texts and need not concern us for the moment. What needs care, though, is that questions of accuracy and precision are addressed.
Whereas accuracy refers to error-free, unbiased data that reflect the true value, precision refers to the ability to distinguish small differences. In vector GISs, when one is creating one's own data, the process of translating map lines into vectors in the dataset is described as digitizing. This procedure may be done either manually or with an automated scanning device. Digitizing is the process of using a digitizer to capture the locations of geographic features by converting their positions on a map into a series of x, y Cartesian coordinates that are stored in computer files. The digitizer is simply a device consisting of a table and a cursor (stylus or pointing device) with crosshairs and keys to record the locations of map features. Various functions help speed up the digitizing process. These include point mode, in which individual points are input into the computer file; line mode, in which lines are followed and input as a series of vectors; and stream mode, in which points are recorded continuously as the cursor is moved, so that winding features such as streams and tributaries can be followed as series of closely spaced points.

Another form of manual digitizing is keyboard entry, and attribute data are usually entered in this way. One may also record information in the field using a data capture device and store it on a portable computer; the captured data are then downloaded into computer files in the computer laboratory. Alternatively, data may be read from global positioning systems (GPSs), instruments which provide coordinate data for locations and sometimes elevation data. These data are then either keyed in or downloaded from the field data capture devices. The technology for capturing such data is no different from that of the hand-held devices used by supermarket staff to check that the shelves are stocked. Coordinate geometry (COGO) procedures are also used for land record information where a high level of precision is mandatory. Actual survey measurements are entered and converted into a series of x-, y-coordinates, because precision is needed and the maps represent the land cadastre exactly as it is expressed in the legal description.

In some systems manual digitizing has been replaced by scanning, in which a digital image of the map is produced by moving an electronic detector across the map surface. There are several scanner designs, of which two are common. In flat-bed scanners the map lies on a flat scanning stage over which the detector is moved in the x- and y-directions. A drum scanner, on the other hand, has the map mounted on a cylindrical drum, and the detector moves horizontally across the drum while it rotates. The sensor motion across the drum produces the x-coordinates, while the rotation of the drum produces the y-coordinates. The output from the scanner is a digital image and, depending on the technology, may offer spot sizes of the order of 0.02 mm (20 microns), colour information, vector or raster formats, and tagging of features (during the editing process) to link geographic features to attribute characteristics. Video scanners, essentially miniature television cameras with appropriate interface electronics, have also been used to create computer-readable datasets. They produce data in black and white or in colour, and are extremely fast and relatively inexpensive. As expected, the data are in the form of a raster array of brightness values, with arrays of the order of 250-1000 pixels on a side. However, their geometric properties are poor, with spatial distortion and uneven sensitivity to brightness across the scanned field. In general, scanned documents need to be clean and smudge-free, with line features at least 0.1 mm wide. Complex lines increase error, and text may accidentally be scanned in as line features. Contour lines must be unbroken, and there is no automatic feature recognition to distinguish between, say, roads and contours. Thus, if good source documents are available, scanning can be an efficient, time-saving mode of data input.
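The two scanner figures quoted above can be related by simple arithmetic. Assuming the 0.02 mm spot size and the 0.1 mm minimum line width, and using the exact definition 1 inch = 25.4 mm, a minimum-width line spans five spots, and a 0.02 mm spot corresponds to 1270 spots (dots) per inch:

```latex
\frac{0.1\ \text{mm}}{0.02\ \text{mm per spot}} = 5\ \text{spots},
\qquad
\frac{25.4\ \text{mm per inch}}{0.02\ \text{mm per spot}} = 1270\ \text{spots per inch}
```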

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho. Last updated: February 16, 2001


Manual 3_Chapter 2/Vector GIS

VECTOR GIS: AN INTRODUCTION



2. VECTOR DATA STRUCTURES


In order to appreciate the various types of vector data structures, we first need to distinguish between the geometric characteristics of spatial entities and their topological characteristics. It will then be possible to examine the three main types of vector data structures, namely whole polygon, arc-node and relational. The DIME and DLG structures are included in this discussion as illustrations of applications of the conceptual model. A critique of the vector data structure will highlight the strengths of the vector model.

GEOMETRIC AND TOPOLOGIC CHARACTERISTICS

The geometric characteristics of a spatial entity may be described in terms of its two-dimensional shape, its distance from like or different entities, how it is connected to other entities and what entities occupy the space adjacent to it. Such characteristics may easily be described either in words, as attributes, or by a system of coordinate grids on a map. Spatial entities can be described in terms of their dimensional characteristics: 0-D to denote a point, 1-D a line and 2-D an area or polygon. If these entities were changed by some kind of transformation, for example a mathematical transformation, their geometric characteristics would likewise change: shapes can be made smaller, lines made longer and so on. In other words, there is a fundamental change to the entity itself. The mathematical transformation of geometric entities is sometimes known as rubber sheeting, as if the map were drawn on a rubber sheet and stretched in various directions. In contrast, some topological properties of a spatial entity do not change when the map is stretched or distorted in this way (see reference in Module 1 page 14). If a map is stretched and distorted, some of its properties may change, for example distance, direction (angles) and the relative location of objects. However, other properties remain unchanged.
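The difference between geometric and topological properties can be checked numerically. In this sketch, two hypothetical unit squares share one edge. An invertible affine "rubber-sheet" stretch, chosen only for simplicity (any one-to-one transformation would serve), changes the distance between vertices. Because coincident vertices stay coincident, however, the two squares still share their common edge.

```python
import math
from typing import FrozenSet, List, Set, Tuple

XY = Tuple[float, float]


def stretch(p: XY) -> XY:
    """Hypothetical rubber sheet: (x, y) -> (2x + y, 0.5y), an invertible affine map."""
    x, y = p
    return (2.0 * x + y, 0.5 * y)


def edges(ring: List[XY]) -> Set[FrozenSet[XY]]:
    """Edges of a polygon given as an unclosed vertex list, as unordered pairs."""
    closed = ring + ring[:1]
    return {frozenset(e) for e in zip(closed, closed[1:])}


def share_boundary(a: List[XY], b: List[XY]) -> bool:
    """Topological test: do the two areas have an edge in common?"""
    return bool(edges(a) & edges(b))


west = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
east = [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)]
west2 = [stretch(p) for p in west]
east2 = [stretch(p) for p in east]

print(math.dist(west[0], east[2]), math.dist(west2[0], east2[2]))  # distance changes
print(share_boundary(west, east), share_boundary(west2, east2))    # True True
```

The distance between the far corners grows from about 2.24 to about 5.02 units, while the "next to" relationship is unaffected.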
Relationships such as "next to", "is contained in" and "crosses" do not change. In other words, areas remain areas, lines remain lines and points remain points. A strict topological property is therefore one that remains unchanged by geometric distortions or transformations of the surface. In topology, neighbourhood and adjacency properties are important. They matter for spatial analysis because we need to know the position of a feature both in absolute space and with respect to its neighbouring features. Knowing whether adjacency exists is important when using area objects, because many methods of solving mathematical and geometric relationships work better if we know which areas share common boundaries. Some systems store boundaries as individual line segments and include arc attributes (or pointers) which indicate the polygon on each side of the line segment. Storing common boundaries instead of complete polygon boundaries avoids duplication in digitizing, as well as the problem of two versions of a common boundary that do not coincide.

http://infosys-law.canberra.edu.au/gismodules/manual_3/m3_chap2.html (1 von 9) [09.06.04 09:37:36]

To handle the topological characteristics of spatial data, several forms of vector data structure are in common use: whole polygon, arc-node and relational (see Star & Estes 1990: 48).

2.1 WHOLE POLYGON STRUCTURE

In this structure each layer in the database is divided into a set of polygons. Each polygon is recorded as a series of point locations that define the boundary of a closed area in a specified coordinate system. Attributes of the polygons, such as land cover or ownership status, may be stored with the coordinate list. Because each polygon is stored as a separate entity, there is no topology, or relationship between the different spatial objects, and adjacency cannot be identified from the data file. There is also a lot of redundant data in such a structure. Line segments on the boundary between two polygons are recorded twice, once for the polygon on each side of the line, and points shared by two or more polygons are encoded as many times as there are polygons sharing them. Editing and updating such a database can be difficult (see Figure 3.7).

Figure 3.7 Whole polygon structure

Source: Star & Estes (1990: 50, Figure 4.10)

2.2 THE ARC-NODE MODEL


Aronoff (1989: 174ff) describes the arc-node, or topological, model as the most widely used method of encoding spatial relationships. The example he describes is based on arc-node data: "arc" because the basic logical entity is a series of points that starts and ends at a node. A node is therefore the intersection point of two or more arcs. A node can also be found at the end of a dangling arc, and isolated nodes not connected to an arc represent points. A polygon is built up from a closed chain of arcs that represent the boundary of the area. In the system used here, the objects in the database are structured hierarchically: points, arcs, nodes and polygons. The system is hierarchical because all entities are built up from nodes or points as the elemental components. Arcs are individual line segments defined by a series of x, y-coordinate pairs. Nodes are at the ends of arcs and also form points of intersection with adjacent arcs, and nodes or points can also exist alone without reference to arcs. Polygons are areas completely bounded by arcs. The topology for each of these spatial elements (node, arc and polygon) is stored in a separate table. This structure permits the geometry of the data to be encoded with no redundancy: points are stored only once. Locational data in the form of coordinates are stored in a fourth table.

In the Node Topology Table, each node is defined by the arcs to which it belongs (Figure 3.8). For example, node N1 is an end point for arcs a1, a3 and a4. Node N5 is a single point that is also defined as arc a6 and as polygon D. This table therefore identifies each point and node on the map that is related to any line or polygon displayed on the map. The Arc Topology Table defines the relationship of the nodes and polygons to the arcs. The nodes of each arc are designated as start or end nodes, so that arc a5 starts at node N3 and ends at node N2.
Travelling from node N3 to node N2, arc a5 has Polygon A on its left and Polygon B on its right. This table identifies each line, the type of feature it represents and the points and nodes connected to it.

Figure 3.8: The topological data model

Source: Aronoff (1989: 175).

The Polygon Topology Table shows the arcs that make up the boundary of each polygon. Polygon A, for example, is bounded by arcs a1, a3 and a5. Polygons can also lie within other polygons as islands: Polygon C lies within Polygon B, so in the arc list for Polygon B a zero precedes the list of arcs that make up the island. Polygon C has only one arc (a7). The point in Polygon B is also treated as a polygon, Polygon D, which consists of a single arc (a6) and has no area. The area outside the map boundary is Polygon E, but no arcs are explicitly defined for it. In sum, the table identifies every polygon on the map that might be processed by a GIS, because it records each polygon's attributes: a unique identifier, the type of feature it represents and the lines that form its boundary.

From these topology tables it is possible to analyse the relative position of map elements. All polygons adjacent to Polygon B can be found by searching the Arc Topology Table: every polygon paired with B as a left or right polygon is adjacent to it, because the two share a common arc. Arc a5, for example, has Polygon A on one side and Polygon B on the other, so Polygon B is adjacent to Polygon A. The topology tables can also be used to find all features contained within a polygon, by searching the Polygon Topology Table for arc lists that contain a zero. The arcs listed after the zero are then looked up in the Arc Topology Table to identify the contained elements; here arcs a6 and a7 are islands. Arc a6 is a single point with the same left and right polygons, whereas arc a7 is an island with different left and right polygons.
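The tables described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the rows belong to a small invented map (only the a5 row and the zero-marker convention come from the text, so this is not a transcription of Figure 3.8), and the function names are hypothetical. The sketch derives the Node Topology Table from the Arc Topology Table, finds adjacent polygons, and classifies the features listed after the zero in a polygon's arc list.

```python
from collections import defaultdict

# Toy topology tables (hypothetical). Only the a5 row and the zero-marker
# convention follow the text; this is NOT a transcription of Figure 3.8.
# Arc topology: arc -> (from_node, to_node, left_polygon, right_polygon)
ARC_TOPOLOGY = {
    "a1": ("N2", "N3", "A", "E"),
    "a3": ("N3", "N2", "B", "E"),
    "a5": ("N3", "N2", "A", "B"),
    "a6": ("N5", "N5", "B", "B"),   # point feature: same polygon on both sides
    "a7": ("N4", "N4", "C", "B"),   # closed loop: island C inside B
}

# Polygon topology: polygon -> boundary arcs; a 0 separates the outer boundary
# from the arcs of features lying inside the polygon.
POLYGON_TOPOLOGY = {
    "A": ["a1", "a5"],
    "B": ["a3", "a5", 0, "a7", "a6"],
    "C": ["a7"],
    "D": ["a6"],
}

def node_topology(arc_table):
    """Node Topology Table: for each node, the arcs that start or end at it."""
    nodes = defaultdict(set)
    for arc, (from_node, to_node, _left, _right) in arc_table.items():
        nodes[from_node].add(arc)
        nodes[to_node].add(arc)
    return {node: sorted(arcs) for node, arcs in sorted(nodes.items())}

def adjacent_polygons(polygon, arc_table):
    """Polygons that share an arc with `polygon`, read from the Arc Topology Table."""
    neighbours = set()
    for _from, _to, left, right in arc_table.values():
        if left == polygon and right != polygon:
            neighbours.add(right)
        elif right == polygon and left != polygon:
            neighbours.add(left)
    return neighbours

def contained_features(polygon, polygon_table, arc_table):
    """Classify the arcs after the 0 marker as point features or island polygons."""
    arcs = polygon_table[polygon]
    if 0 not in arcs:
        return {}
    features = {}
    for arc in arcs[arcs.index(0) + 1:]:
        _from, _to, left, right = arc_table[arc]
        if left == right:
            features[arc] = "point"
        else:
            features[arc] = "island " + (left if right == polygon else right)
    return features

print(node_topology(ARC_TOPOLOGY))
print(adjacent_polygons("B", ARC_TOPOLOGY))
print(contained_features("B", POLYGON_TOPOLOGY, ARC_TOPOLOGY))
```

No coordinates are consulted anywhere: adjacency and containment are read straight from the tables, which is the point of storing topology explicitly.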

Map features are related to the real world through X,Y coordinates, which are stored in an Arc Coordinate Data Table. Each arc has one or more pairs of X,Y coordinate values; arc a1, for instance, may be represented by its end points and a single intermediate point. In the arc-node structure the geometry of the data is thus encoded with little or no redundancy. The database can also include attribute data for each node, arc and polygon; these are held in attribute tables that link them explicitly to the geometry of each spatial object.

2.3 RELATIONAL STRUCTURE

The relational structure is another variant of arc-node vector data organisation. In this structure, attribute information is kept separately in relational tables, in which each row represents a single data record and each column a different field or attribute. One relational table keeps track of the attributes of points or nodes in the database; similarly, an arc table holds information about arc attributes and a polygon table about polygon attributes. Because attribute data are kept separate from topological information, there are more separate files and pointers to maintain. Commercially available general-purpose database management software is used to keep track of all the relationships that may arise in a relational model (see Figure 3.9).

According to Aronoff (1989: 177), the topological data structure has the advantage of allowing spatial analysis to proceed without using the coordinate data. Topological operations such as connectivity analysis may be performed quickly, since there is no need to derive the spatial relationships from the geographic coordinates. However, the topological data structure comes at a cost: every time new features are entered or an existing map is altered, the topology must be rebuilt. Rebuilding and updating can be relatively time-consuming, whether performed in real time (sitting at a computer terminal while making the changes) or in batch mode, where all changes are processed in one session without operator intervention. Where the data structure does not include topology, more complex programs and algorithms, and more powerful computers, are required to perform these analyses. For these reasons of cost and speed of computation, nearly all vector-based GISs on the market today use the topological data model as their principal data model.

Figure 3.9 Relational data structure and associated attribute information
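As a rough sketch of the relational idea, assuming SQLite (Python's built-in sqlite3 module) as a stand-in for the general-purpose database software mentioned above, topology and attributes can live in separate tables joined on a shared arc identifier. The table names, column names and the stream name are invented for illustration.

```python
import sqlite3

# Hypothetical schema: topology and attributes kept in separate relational
# tables, linked by a shared arc identifier. Names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE arc_topology (
        arc_id     TEXT PRIMARY KEY,
        from_node  TEXT, to_node TEXT,
        left_poly  TEXT, right_poly TEXT
    );
    CREATE TABLE arc_attributes (
        arc_id       TEXT PRIMARY KEY REFERENCES arc_topology (arc_id),
        feature_type TEXT,
        name         TEXT
    );
""")
conn.executemany("INSERT INTO arc_topology VALUES (?, ?, ?, ?, ?)", [
    ("a1", "N2", "N3", "A", "E"),
    ("a3", "N3", "N2", "B", "E"),
    ("a5", "N3", "N2", "A", "B"),
])
conn.executemany("INSERT INTO arc_attributes VALUES (?, ?, ?)", [
    ("a1", "map boundary", None),
    ("a3", "map boundary", None),
    ("a5", "stream", "Mill Creek"),
])

# A purely topological query: polygons adjacent to B, reading neither the
# attribute table nor any coordinates.
neighbours_of_b = {row[0] for row in conn.execute(
    "SELECT left_poly FROM arc_topology WHERE right_poly = 'B' "
    "UNION SELECT right_poly FROM arc_topology WHERE left_poly = 'B'")}

# An attribute query joined back to the topology through the shared key.
streams = conn.execute(
    "SELECT t.arc_id, t.left_poly, t.right_poly, a.name "
    "FROM arc_topology AS t JOIN arc_attributes AS a ON t.arc_id = a.arc_id "
    "WHERE a.feature_type = 'stream'").fetchall()
print(neighbours_of_b, streams)
```

The first query touches only the topology table, while the second pays for a join; this is the trade-off between storing attributes with the geometry and storing them separately.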

2.4 DIME

The DIME structure is an example of the application of the whole polygon structure. It represents an early attempt at incorporating explicit topological relationships in a geographic database: the US Bureau of the Census DIME system, which stands for Dual Independent Map Encoding. The basic element in a DIME file is a line segment defined by two end points, with the implicit assumption that the segment is straight and is not crossed by any other line. A complex line is therefore made up of a series of line segments. Each segment also carries two node identifiers, the coordinates of its two end points, and codes for the polygons on either side of it. However, neighbourhood relationships are not made explicit, and the DIME structure is cumbersome to use when areas are made up of complex lines. By design, each entry in any boundary file carrying the name DIME contains at least the following:

1. segment name, to identify the segment (for example, the name of a street)
2. beginning node number and the coordinates of that node (from-node)
3. ending node number and the coordinates of that node (to-node)
4. identifier for the polygon on the left side of the segment (LPOLY)
5. identifier for the polygon on the right side of the segment (RPOLY)
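A DIME-style record can be modelled as a simple data class holding the minimum fields listed above. This is a hypothetical sketch, not the Census Bureau's actual file layout; the street names, node numbers and block codes are invented. Because every segment records its LPOLY and RPOLY, the boundary of a block can be assembled by collecting the segments that name it on either side, which in turn makes a simple closure check possible.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DimeSegment:
    """A DIME-style segment record holding the minimum fields listed above."""
    name: str                      # segment name, e.g. a street name
    from_node: int                 # beginning node number
    from_xy: Tuple[float, float]   # coordinates of the from-node
    to_node: int                   # ending node number
    to_xy: Tuple[float, float]     # coordinates of the to-node
    lpoly: str                     # polygon on the left side (LPOLY)
    rpoly: str                     # polygon on the right side (RPOLY)

# Hypothetical segments: block "101" is a square bounded by four streets.
SEGMENTS = [
    DimeSegment("Main St", 1, (0, 0), 2, (100, 0), "101", "102"),
    DimeSegment("Oak Ave", 2, (100, 0), 3, (100, 100), "101", "103"),
    DimeSegment("Elm St", 3, (100, 100), 4, (0, 100), "101", "104"),
    DimeSegment("Pine Ave", 4, (0, 100), 1, (0, 0), "101", "105"),
    DimeSegment("Main St", 2, (100, 0), 5, (200, 0), "103", "102"),
]

def boundary_segments(polygon, segments):
    """Segments that have `polygon` as their LPOLY or RPOLY."""
    return [s for s in segments if polygon in (s.lpoly, s.rpoly)]

def is_closed(polygon, segments):
    """True if the polygon's segments can be chained into closed rings,
    i.e. every node is used an even number of times."""
    counts = Counter()
    for s in boundary_segments(polygon, segments):
        counts[s.from_node] += 1
        counts[s.to_node] += 1
    return bool(counts) and all(n % 2 == 0 for n in counts.values())

print([s.name for s in boundary_segments("101", SEGMENTS)])
print(is_closed("101", SEGMENTS), is_closed("102", SEGMENTS))
```

Block 102 fails the check only because the toy data stop short of its southern boundary; in a complete file the same test would point to a missing or miscoded segment.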

Figure 3.10: Two street segments coded in a DIME file format

Source: Adapted from Carter (1984: 18).

Figure 3.10 shows two street segments coded in a DIME file format, together with additional attributes coded in the file. When a segment is part of a street, the address ranges for both sides of the street may be stored. For segments that are not streets, a field is available to indicate other features, such as a shoreline. In addition, there are separate fields for census enumeration districts, county and state code numbers, electoral voting precincts, fire service districts, planning districts and so on.

For the 1990 US Census of Population, a more sophisticated system incorporating both data files and associated software was developed. The new format, known as TIGER (Topologically Integrated Geographic Encoding and Referencing), incorporates the 1980 Census boundaries so as to enable the evaluation of population change between the two censuses. TIGER files also contain roads, rivers, railway lines, physical features and administrative boundaries for the entire country (Huxhold 1991: 163).

A major disadvantage of the DIME structure is the great difficulty of manipulating complex lines. Especially where segments represent streets that cross other segments, great computational effort is required to trace the whole length of the line in question. A major advantage of the system is that address matching of spatial objects stored in different files is relatively easy, because the addresses are stored explicitly in a DIME file. (For further references see US Bureau of Census (1990), Sobel (1990) and Marx (1990).)

2.5 DLG STRUCTURE

A further illustration of a vector data structure, one that incorporates whole polygon, arc-node and relational structures, is the Digital Line Graph (DLG) product of the US Geological Survey (USGS). It is a standard vector file format in which all the data are topologically structured. The contents of DLG files are subdivided into four thematic layers: boundary information, hydrographic features, transport network and the Public Land Survey System. Data elements are similar to the point, line and area elements discussed previously, giving information on geographic location, topology and detailed attribute codes. Each file has a header record which provides information on the date of creation of the file, the map projection and coordinate system used, and the number of points, lines and areas stored in the file. An optional final record gives estimates of the accuracy of the data (see Star & Estes 1990: 55).

2.6 CRITIQUE OF VECTOR DATA STRUCTURES

A data structure may or may not contain topological information describing a spatial entity's location and spatial relationships. In some work this information may be unimportant and may unduly consume valuable storage and processing resources. In other applications, however, topological information is a necessary part of the analysis: for detecting errors, for presentation, for network applications, and for analytical procedures such as measuring proximity and performing overlay and intersection operations. While topology is implicit in raster data structures, in vector data structures it has to be defined explicitly.
While topology may not be apparent when a user digitizes a map, in the Arc/Info system topology has to be built after the data are cleaned of errors. A topological data structure carries an additional cost every time a map is edited, either to remove a spatial object or to add a new one: each edit requires the topology to be updated because the existing relationships have changed. This can be quite time-consuming in processing terms, depending on the software, the size of the map and the speed of the hardware. That is why it is always advisable to build topology after all cleaning and editing of the data have been completed. The debate between raster and vector data structures need not be rehearsed here, except to note that they are perhaps two different systems for storing and expressing the same things. There are, however, certain penalties for using the arc-node database. If one wanted to retrieve a geometric entity alone, one would also have to retrieve the attributes associated with that entity, so the structure exacts an additional overhead in the retrieval process. The relational data structure overcomes this problem because the geometric properties are stored separately from the attribute properties; simple searches and retrievals can thus be quite efficient.

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho. Last updated: February 16, 2001.

Manual 3 Chapter 3 Summary

VECTOR GIS: AN INTRODUCTION


3. VECTOR DATA: STORAGE AND RETRIEVAL

Previous discussions have shown how map features represented by points, lines and polygons can be displayed. These features can be further described by their associated attributes. Both the locational and the attribute information can be used to locate these features on a map, to tell the map reader what type of feature is shown, and to assign an appropriate symbol to each feature for map display. However, map users frequently and subconsciously make additional associations or connections when studying a printed map. For example, we often trace routes on the map with our fingers when we want to get somewhere unfamiliar. Similarly, on a city map it is possible to identify two department stores that are next to each other and the street on which they are found. In doing so, we are interpreting relationships: identifying locations and reference points, looking for connecting lines along a street, defining areas enclosed by city blocks with streets as their boundaries, and identifying which buildings are next to each other.

3.1 TOPOLOGY
Topology is the term used to describe the making of spatial relationships; the word is borrowed from the branch of mathematics that deals with relationships between spatial objects. In a GIS the user can create new relationships, associate new attributes with map features and store these in an attribute database. Using topological procedures it is possible to identify spatial relationships between spatial objects. For example, a wildlife biologist may wish to identify land-cover areas that border rivers for habitat assessment, while a town planner may wish to select sites that avoid potential land-use conflicts, such as industrial zones next to new suburban development. Before the advent of GIS, to identify the soil types associated with particular rock types in an area, one would overlay the soils map on a geology map of the same scale on a light table and trace out the areas of interest. The task became difficult, however, when there were many different soil classes over the same area. In a GIS such an operation is easily performed by topological overlay, one of many spatial operations that create new spatial relationships. In fact, new features can be formed by overlaying map layers of different features, and the attributes of each input layer are automatically transferred to, and combined with, the attributes from the other layers to describe the new output features. In a topological data structure we add intelligence to the GIS database by explicitly building these relationships.

Before discussing how topology is built, it is useful to describe its elements. Topological data define the logical connections between points, lines and areas for geographical description and analysis. Information on connections between spatial objects, such as which areas bound a line segment, is considered topological data, and from it adjacent spatial objects can also be identified.
The topology of any line therefore includes its starting node, its destination node, and the polygons on its left and right sides. Putting these together for a whole map composition gives map topology: the spatial relationships between map features or spatial objects. A topological spatial database is said to exist only if one or more of the following have been computed and stored:

1. Adjacency (contiguity) relationships between areas have been established. Elements that touch each other are said to be adjacent or contiguous.

2. The different map layers have been subjected to planar enforcement, that is, topology has been built.

3. Links are connected at intersections (connectivity), that is, there is a node wherever links intersect and the links form an interconnected pathway or network (NCGIA Introduction to GIS Unit 12).

Planar enforcement is the process of building points, lines and areas from digitized "spaghetti". Wherever lines intersect, the lines are broken and a point is inserted. There is a set of rules to be followed, and a set of objects that follows these rules is said to be planar enforced. Planar enforcement is a very important operation in a vector GIS. The result is a set of points, lines and areas that obey specific rules, also called axioms. The five topological axioms are as follows:

1. All arcs end in points or nodes.
2. Arcs cannot intersect except at their nodes.
3. Areas are completely enclosed by arcs.
4. Areas do not overlap.
5. Every location is within some area.

This process is known loosely as building topology. The points, lines and areas in this scheme are sometimes referred to as 0-dimensional, 1-dimensional and 2-dimensional cells respectively. These rules are important in vector GISs because planar enforcement is used to build objects out of digitized lines. When we build topology, we are calculating and encoding the relationships between points, lines and areas. When planar enforcement is used, area objects in one class or layer cannot overlap and must exhaust the space of the layer; examples include a layer showing only rainfall data and another showing only temperature data (NCGIA Introduction to GIS Unit 13). Before examining a topological model, note a rule of thumb for topological data structures: anything of interest on a map must be explicitly defined as a point, line or area for the system to perform any sort of spatial analysis on the data.
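The core of planar enforcement, breaking lines where they cross so that arcs meet only at nodes (axiom 2), can be sketched as follows. This is a simplified illustration assuming straight two-point lines with integer coordinates, so that Python's Fraction type gives exact crossing points; it ignores collinear overlaps and compares every pair of lines, which larger datasets would make too slow. The function names are hypothetical.

```python
from fractions import Fraction

def crossing_point(p1, p2, q1, q2):
    """Point where segments p1-p2 and q1-q2 cross at interior points, else None.

    Integer coordinates are assumed so that Fraction gives exact results.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, q1, q2
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:                       # parallel or collinear: ignored in this sketch
        return None
    t = Fraction((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3), denom)
    u = Fraction((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1), denom)
    if 0 < t < 1 and 0 < u < 1:          # crossing lies strictly inside both segments
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def planar_enforce(lines):
    """Break straight two-point lines wherever they cross, inserting a shared node there."""
    cuts = {i: [] for i in range(len(lines))}
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            point = crossing_point(*lines[i], *lines[j])
            if point is not None:
                cuts[i].append(point)
                cuts[j].append(point)
    arcs = []
    for i, (start, end) in enumerate(lines):
        # Order the new nodes along the line by their distance from its start.
        inner = sorted(cuts[i], key=lambda p: (p[0] - start[0]) ** 2 + (p[1] - start[1]) ** 2)
        chain = [start, *inner, end]
        arcs.extend(zip(chain, chain[1:]))
    return arcs

# Digitized "spaghetti": two diagonals and a horizontal line, none yet broken.
SPAGHETTI = [((0, 0), (4, 4)), ((0, 4), (4, 0)), ((0, 1), (4, 1))]
ARCS = planar_enforce(SPAGHETTI)
print(len(ARCS))   # 9
```

Each of the three input lines is crossed twice, so each becomes three arcs, and every crossing point is stored once as a node shared by the arcs that meet there.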
Note that text, annotations and symbols are also usually defined explicitly, either through the attribute data or in special tables that define their meaning, location, shape, orientation, colour and so forth (Huxhold 1991: 54). A further guiding principle in dealing with topological data structures is that, because it is expensive to generate topological information when it is not explicitly present, it is best to define topological relationships at an early stage of data collection and input (Star & Estes 1990: 79).

There is an infinite number of possible relationships among spatial objects, and many of them can be important in the analysis of spatial data. A point in an area is an example of an "is contained in" relationship, which relates an object to its surrounding environment. Where two lines intersect we have the "intersects" relationship, which may be important for route analysis of road networks. Relationships may therefore exist between entities of the same or of different types. Three types of relationship can be identified:

1. Relationships used to construct complex objects from simple primitives: for example, the relationship between a line and the ordered set of lines that together form a chain.
2. Relationships that can be computed from the coordinates of objects. For example, where two lines cross, the "crosses" relationship can be computed; areas can be examined to see whether they overlap a given area, giving the "overlaps" relationship.
3. Relationships that cannot be computed from coordinates, which must be coded in the database during input.

Further examples of spatial relationships can be seen in the following matrix (Table 3.2).

Table 3.2: Points, lines and areas: possible relationships among spatial objects
Point-point: is within; is nearest to
Point-line: ends at; is nearest to
Point-area: is contained in; can be seen from
Line-line: crosses; comes within; flows into
Line-area: crosses; borders
Area-area: overlaps; is nearest to; is adjacent to

A number of advantages accrue from storing the topological relationships of spatial objects. Data are stored more efficiently when topology is used, because fewer redundant data are recorded and kept; the data can therefore be processed faster, and much larger datasets can be handled. Identifying topological relationships also means that analytical functions can be accomplished much faster: for example, modelling traffic flows through a road network; combining adjacent polygons with similar characteristics, such as merging areas under agricultural crops as opposed to native grasslands; and overlaying geographic features, such as mine shafts over the geology of an area.
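Combining adjacent polygons with similar characteristics can be sketched using the left and right polygon codes stored for each arc: after reclassification, an arc whose two sides fall in the same class is dropped and its two polygons are merged (here with a small union-find structure). The toy data and the reclassification rule are hypothetical, echoing the crops versus native grassland example; a real system would also rebuild topology and assign new polygon identifiers afterwards.

```python
# Hypothetical toy data: arc -> (left_polygon, right_polygon), and each polygon's land use.
ARCS = {
    "a1": ("P1", "P2"),
    "a2": ("P2", "P3"),
    "a3": ("P1", "OUT"),
    "a4": ("P2", "OUT"),
    "a5": ("P3", "OUT"),
}
LAND_USE = {"P1": "wheat", "P2": "barley", "P3": "native grassland", "OUT": None}
# Hypothetical reclassification rule: both cereals become "crop".
RECLASS = {"wheat": "crop", "barley": "crop", "native grassland": "grassland", None: None}

def dissolve(arcs, polygon_class):
    """Return (surviving arcs, merged polygon groups) after dissolving shared boundaries."""
    parent = {p: p for pair in arcs.values() for p in pair}

    def find(p):                      # union-find root lookup with path halving
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    kept = {}
    for arc, (left, right) in arcs.items():
        if polygon_class[left] == polygon_class[right]:
            parent[find(left)] = find(right)   # same class: drop the arc, merge the polygons
        else:
            kept[arc] = (left, right)          # different classes: the boundary survives
    groups = {}
    for p in parent:
        groups.setdefault(find(p), set()).add(p)
    return kept, sorted(sorted(g) for g in groups.values())

new_class = {p: RECLASS[c] for p, c in LAND_USE.items()}
kept_arcs, merged = dissolve(ARCS, new_class)
print(sorted(kept_arcs), merged)   # ['a2', 'a3', 'a4', 'a5'] [['OUT'], ['P1', 'P2'], ['P3']]
```

Only the table of arcs is consulted; the shared boundary a1 disappears because wheat and barley both reclassify to crop.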

3.2 EDITING
At every stage of the input, storage, retrieval and manipulation process, error is continually being introduced. The objective is to manage these errors rather than to eliminate them. An extended discussion of the sources of error was given in Module 2 (pages 45-46), and the topic is also dealt with quite comprehensively in Aronoff (1989: 141-144). Here the focus is on errors arising from digitizing maps, and on their associated terminology. Most of the examples used here are drawn from Arc/Info.

Some common errors of the digitizing process are associated with nodes, arcs and polygons. A polygon by definition should have closed, continuous boundaries. Where it does not, the source of the error may lie in encoding, where an arc is independent of the polygon, or in digitizing, where the nodes have not been connected to form the polygon. Overshoots and undershoots are the terms used to describe such non-closure of polygons. Sometimes polygons sharing a common boundary have had that boundary digitized twice, resulting in a sliver between the intersecting arcs. Alternatively, the digitizing itself may have missed areas, producing gaps.

A large map sheet may have to be partitioned into individual sheets of manageable size so that it fits on a digitizing tablet. After all the individual sheets have been digitized, they may be merged digitally to re-form the original large map sheet. However, because of slight inaccuracies in the digitizing or registration of each tile of the map, the edges of adjacent sheets may not match. An editing procedure called edge-matching may then be used to correct the errors and present the map as one continuous sheet.

Registration problems arise when features on map sheets overlaid on top of each other do not match. The problem may lie in the digitizing process, or it may be caused by the original maps themselves, where the manuscripts were produced on dimensionally unstable material: expansion, contraction and creases due to folding may all cause difficulties with registration. In the digital map, the non-registration of separate layers or coverages may be caused by errors in the tic points, producing unacceptably large RMS (root mean square) errors. The RMS error is an indication of the calculated difference between the digitized locations and the specified locations on the map; the higher the value, the greater the error. Typically an acceptable RMS error is of the order of 0.003 or less. Figure 3.11 and Table 3.3 show these errors.

Table 3.3 Examples of possible sources of error (spatial object: source of error)
node: missing label point
arc: dangling node due to overshoot/undershoot arc
arc: dangling node due to unclosed island polygon
arc: missing arc
polygon: open polygon
label: label point with wrong label identification
label: polygon with two label points due to missing arc
label: polygon with no label

Source: ESRI (1990: 5-15ff).

Figure 3.11 Examples of some typical errors in a vector dataset
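Two of the checks behind these errors are easy to sketch. The RMS function assumes the digitized tic coordinates are already expressed in the same units as the reference coordinates; in practice the residuals are usually obtained after fitting a transformation from digitizer to map coordinates, a step omitted here. The dangling-node function counts arc ends at each node and flags nodes used by only one arc end; these are candidates only, since a cul-de-sac is a legitimate dangle. All data and names are hypothetical.

```python
import math
from collections import Counter

def rms_error(digitized, reference):
    """RMS of the distances between digitized tic locations and their reference locations."""
    if not digitized or len(digitized) != len(reference):
        raise ValueError("need matching, non-empty lists of tic coordinates")
    squared = [(dx - rx) ** 2 + (dy - ry) ** 2
               for (dx, dy), (rx, ry) in zip(digitized, reference)]
    return math.sqrt(sum(squared) / len(squared))

def dangling_nodes(arcs):
    """Nodes touched by exactly one arc end: candidate overshoots or undershoots."""
    degree = Counter()
    for from_node, to_node in arcs.values():
        degree[from_node] += 1
        degree[to_node] += 1
    return sorted(node for node, count in degree.items() if count == 1)

# Hypothetical tics (already in map units) and a small arc table: arc -> (from, to).
TICS_DIGITIZED = [(0.001, 0.0), (1.0, 0.002), (1.002, 1.0), (0.0, 0.999)]
TICS_REFERENCE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
ARCS = {"a1": ("N1", "N2"), "a2": ("N2", "N3"), "a3": ("N3", "N1"), "a4": ("N2", "N4")}

print(round(rms_error(TICS_DIGITIZED, TICS_REFERENCE), 5))   # 0.00158
print(dangling_nodes(ARCS))                                   # ['N4']
```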

A QUICK EXERCISE

The map of ROADS below shows seven numbered nodes. Do the following tasks:

1. Topology. Use the table to list the from-node and to-node of every arc.
2. Connectivity. List the arcs you would traverse to get from node 6 to node 1, and indicate the direction of travel along each arc.
3. Area and contiguity. Using the first table, define each polygon by recording the number of each of its arcs. Then, in the second table, list the polygons on the left and right sides of each arc. Arrows indicate the direction of each arc.
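The connectivity task amounts to a search through an arc topology table. The ROADS map is not reproduced in this text, so the seven-node network below is invented and will not match the exercise. The sketch uses breadth-first search, which finds a route with the fewest arcs rather than the shortest distance, and reports whether each arc is travelled with or against its digitized direction.

```python
from collections import deque

def route(arcs, start, goal):
    """Fewest-arc route from start to goal as (arc, direction) pairs, or None if unreachable.

    Direction is "with" when an arc is travelled from its from-node to its to-node,
    and "against" otherwise.
    """
    links = {}
    for arc, (from_node, to_node) in arcs.items():
        links.setdefault(from_node, []).append((to_node, arc, "with"))
        links.setdefault(to_node, []).append((from_node, arc, "against"))
    previous = {start: None}          # node -> (node we came from, arc, direction)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for next_node, arc, direction in links.get(node, []):
            if next_node not in previous:
                previous[next_node] = (node, arc, direction)
                queue.append(next_node)
    if goal not in previous:
        return None
    path, node = [], goal
    while previous[node] is not None:
        node, arc, direction = previous[node]
        path.append((arc, direction))
    return path[::-1]

# Hypothetical road network with seven nodes (NOT the ROADS map of the exercise).
ROADS = {
    "r1": (1, 2), "r2": (2, 3), "r3": (3, 4), "r4": (4, 5),
    "r5": (5, 6), "r6": (6, 7), "r7": (7, 2), "r8": (4, 7),
}
print(route(ROADS, 6, 1))   # [('r6', 'with'), ('r7', 'with'), ('r1', 'against')]
```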

4. VECTOR MANIPULATION AND ANALYSIS


In performing geographic analysis there is a series of discrete steps that require attention. Setting these steps out helps to achieve two objectives: first, it identifies the tasks that need to be done, and second, it establishes the sequence in which they have to be conducted. Eight steps in GIS analysis have been identified and are shown in Figure 3.12. As the terminology in the diagram is self-explanatory, there is no need for an extended discussion; in any case, each project will have different objectives and criteria, so a generic statement would be difficult to write. The tabular analysis mentioned in the figure refers to the non-spatial data, which may be descriptive, non-statistical information. Such descriptors can be analysed by themselves, and the results can be as informative as those obtained from the analysis of the spatial data. GIS analysis can detect underlying trends and make new information available; the models may reveal new or previously unidentified relationships within and between datasets, enhancing our understanding of the real world.

With the rapid development of GIS software, the range of analytical functions has become so bewilderingly large that novice users of GIS would find it difficult to make a start. Aronoff (1989: 195) has proposed a classification of the analytical functions of any GIS into four major categories, each of which is further subdivided: (1) the maintenance and analysis of spatial data; (2) the maintenance and analysis of attribute data; (3) integrated analysis of spatial and attribute data; and (4) output formatting. While the classes are arbitrary, they provide a useful framework. Here we concentrate on the spatial analytical functions and the integrated spatial and attribute functions, postponing the handling of attribute data to the next module.

Figure 3.12 Steps in performing geographic analysis

Source: Adapted from ESRI (1990: Chapter 8).

4.1 ANALYSIS OF SPATIAL DATA


The simple display of spatial objects is a useful starting point in any analysis: the distribution, density and geometric patterns of spatial objects say much about the underlying relationships. More often than not, however, a user may wish to display only a subset of the data. Depending on the language the software uses, there will generally be operators that permit the selection of a subset of the data according to given rules and parameters. Such rules include the following:

- relational: greater than (>, GT), less than (<, LT), equal to (=, EQ), not equal to (NE), greater than or equal to (GE), less than or equal to (LE), and CN (contains specified characters)
- arithmetic: +, -, =, *, **, / for arithmetic fields (addition, subtraction, equality, multiplication, exponentiation and division respectively)
- Boolean: AND (both conditions hold), OR (either condition holds), XOR (exclusive or: one or the other but not both) and NOT (negation, excluding records that meet a condition)
- other: within (where one object lies within an area) and buffer (where a spatial search is made within a radius of a point, along a line or around a bounding polygon)
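A minimal sketch of how such rules might be applied to an attribute table, assuming plain Python dictionaries as records. The keyword names follow the list above, but the functions and the parcel data are hypothetical and do not reproduce any particular GIS query language.

```python
import operator

# Relational keywords from the list above mapped onto Python comparisons.
RELATIONAL = {
    "GT": operator.gt, "LT": operator.lt, "EQ": operator.eq,
    "NE": operator.ne, "GE": operator.ge, "LE": operator.le,
    "CN": lambda value, text: text in str(value),   # contains specified characters
}

def clause(field, op, value):
    """One condition, e.g. clause("AREA", "GE", 0.1), as a predicate on a record."""
    test = RELATIONAL[op]
    return lambda record: bool(test(record[field], value))

# Boolean combinators (trailing underscores avoid clashing with Python keywords).
def and_(p, q):
    return lambda r: p(r) and q(r)

def or_(p, q):
    return lambda r: p(r) or q(r)

def xor(p, q):
    return lambda r: p(r) != q(r)

def not_(p):
    return lambda r: not p(r)

def select(records, predicate):
    """Return the subset of records satisfying the predicate."""
    return [r for r in records if predicate(r)]

# Hypothetical parcel attribute table (AREA in hectares).
PARCELS = [
    {"ID": 1, "ZONE": "RES-1", "AREA": 0.08},
    {"ID": 2, "ZONE": "IND", "AREA": 2.5},
    {"ID": 3, "ZONE": "RES-2", "AREA": 0.12},
    {"ID": 4, "ZONE": "COM", "AREA": 0.6},
]

residential_large = and_(clause("ZONE", "CN", "RES"), clause("AREA", "GE", 0.1))
print([r["ID"] for r in select(PARCELS, residential_large)])   # [3]
```

XOR is true when exactly one condition holds, so xor(clause("ZONE", "CN", "RES"), clause("AREA", "GT", 0.1)) selects parcels 1, 2 and 4.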

The idea of topological overlay has been introduced previously. A major requirement is planar enforcement, that is, the spatial objects must satisfy certain rules, both within individual layers of a map and when two layers are combined. For example, when two layers are combined, all new intersections must be identified and nodes created where lines cross. Furthermore, any line crossing an area object creates two new area objects. All such relationships have to be updated for the new layer that results from overlaying two or more maps. Whether the aim is to reduce the number of categories (of grassland species, say) or the opposite, to find as many areas with unique combinations of species as possible, in each case the topology has to be built. New attributes also have to be created so that the new areas produced by the overlay can be labelled and identified.

With the three basic spatial objects of points, lines and areas, one can imagine a 3 x 3 matrix of combinations in which new topological relationships may be created. Rather than examine all nine combinations, three will illustrate the issues involved.

Point in polygon. An overlay of point objects on area objects produces an "is contained in" relationship. The result is a new attribute for each point.

Line on polygon. An overlay of line objects on area objects also produces an "is contained in" relationship. Note, however, that lines are broken at each area boundary, so the number of lines resulting from the overlay is usually greater than the number in the original layer. Each output line may or may not create a new area object.

Polygon on polygon. This is the classic polygon overlay, in which boundaries are broken at each intersection. The number of output areas is likely to be greater than the total number of input areas. An example is overlaying postcode boundaries on suburb boundaries to produce combined postcode-suburb areas. After this overlay operation it may be possible to recreate either input layer by dissolving and merging on the attributes contributed by that layer (see Figure 3.13).

Figure 3.13 Examples of some overlay operations involving points, lines and areas
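The point-in-polygon case can be sketched with the even-odd (ray casting) rule, one common way of computing the "is contained in" relationship from coordinates. The zone and well data are hypothetical, and points lying exactly on a boundary are not treated specially; the half-open comparison simply assigns them to one side.

```python
def point_in_polygon(point, ring):
    """Even-odd (ray casting) test: count edges crossed by a ray running in the +x direction."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):                       # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def point_in_polygon_overlay(points, polygons):
    """Give each point a new attribute: the ID of the polygon that contains it (or None)."""
    return {
        pid: next((name for name, ring in polygons.items() if point_in_polygon(xy, ring)), None)
        for pid, xy in points.items()
    }

# Hypothetical layers: two zones sharing the boundary y = 5, and three wells.
ZONES = {
    "North": [(0, 5), (10, 5), (10, 10), (0, 10)],
    "South": [(0, 0), (10, 0), (10, 5), (0, 5)],
}
WELLS = {"w1": (2, 7), "w2": (8, 1), "w3": (12, 3)}
print(point_in_polygon_overlay(WELLS, ZONES))   # {'w1': 'North', 'w2': 'South', 'w3': None}
```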

Arc/Info uses the terms union, identity and intersect much as they are used in set theory, where one draws Venn diagrams. In a union, all areas from both overlaid coverages are retained. In an identity, points, lines or polygons are overlaid on polygons and all input coverage features are kept. An intersect also involves the overlay of points, lines or polygons on polygons, but keeps only those portions of the input coverage features that fall within the overlay coverage features.

Apart from overlaying maps, buffers can also be drawn around points, lines and areas to create new area objects. Buffering has applications in transport, forestry and resource management: for example, protected zones around lakes and streams, zones of noise pollution along highways, service zones around bus stops (using, for instance, a 300 m radius) and groundwater pollution zones around waste sites. Although there may be programming difficulties, buffers can also be determined by an attribute of an object and so take a variable width or radius. For example, in residential neighbourhoods we may draw buffers according to the type of street: a 600 m set-back for major streets, 200 m for secondary streets and 100 m for tertiary streets. Problems with buffer operations occur when very complicated lines or areas are involved and the buffers become indistinct and merge together (Figure 3.14).
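A variable-width buffer can be sketched as a distance test: a location lies inside a street's buffer if its distance to the street segment is no more than the set-back for that street's class. This is a simplified illustration assuming straight street segments in a projected coordinate system measured in metres; the street data are invented. Testing membership this way sidesteps constructing the buffer polygons themselves, which is where the merging problems mentioned above arise.

```python
import math

# Set-back distances by street class, in metres, as in the example above.
SETBACK_M = {"major": 600, "secondary": 200, "tertiary": 100}

def distance_to_segment(p, a, b):
    """Shortest distance from point p to segment a-b (a == b gives point-to-point distance)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, clamping to the end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def buffers_containing(p, streets):
    """Names of streets whose attribute-dependent buffer contains point p."""
    return [s["name"] for s in streets
            if distance_to_segment(p, s["start"], s["end"]) <= SETBACK_M[s["class"]]]

# Hypothetical streets in a projected coordinate system measured in metres.
STREETS = [
    {"name": "Highway", "class": "major", "start": (0, 0), "end": (2000, 0)},
    {"name": "Side St", "class": "tertiary", "start": (1000, 0), "end": (1000, 1000)},
]
print(buffers_containing((1150, 400), STREETS))   # ['Highway']
print(buffers_containing((1050, 700), STREETS))   # ['Side St']
```

A fixed 300 m service zone around a bus stop is the special case distance_to_segment(p, stop, stop) <= 300.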

4.2 INTEGRATED ANALYSIS OF SPATIAL AND ATTRIBUTE DATA


The integrated analysis of spatial and attribute data involves the joint consideration of both spatial and attribute operations.

Figure 3.14 Buffer operations on a point, line and area

In the spatial operations discussed earlier, the analysis changed the geometric relations of the spatial objects. The attributes associated with each of these objects change as well and must be updated accordingly. More often than not, these attributes are tabular and relational. Updating them therefore involves joining and relating tables on one shared characteristic or feature. To simplify matters, a unique identification number is used that is present in every table, so that the tables can be joined and related successfully.

For example, two criteria must be fulfilled to join a table of vegetation classes for an area to a table of soil classes or types. First, both tables must have an element labelled id, or an identification number of some kind, that refers to the same area on the map. Secondly, both tables must contain the same number of rows, where the rows refer to areas on the map. Given these, the tables can be joined and related successfully. Error-checking routines in the software will detect incompatibilities such as numbers mixed with characters, or illegal characters.

With area objects, common spatial analytic operations include reclassify, dissolve and merge based on attributes. Thus, in a soils map showing all soil types, we may wish to show only the main categories of soil on a broad regional basis. This may be achieved by reclassifying areas on the basis of one attribute or a combination of attributes. The result is a much simpler map. Boundaries between adjacent areas that now fall in the same class dissolve, and those polygons merge to form larger objects. The new boundaries will now encompass much larger areas. Topology for the new map must then be rebuilt, a new identification number given to each new polygon, and the attribute tables updated accordingly.

In a forestry example, a forest may be divided into stands of 10 ha, each with its own attributes such as tree species, average tree age and so on.
Assuming there are physical boundaries for each of these stands, and that the attributes apply homogeneously within each stand, a GIS problem may be formulated as: "Identify all cuttable areas of Pinus radiata". A two-step procedure is necessary. First, a new attribute called cuttable is assigned to each stand: yes for stands that are Pinus radiata AND aged over 50 years, and no for all others. All stands that do not fulfil the criteria can then be dropped. Secondly, to identify cuttable areas, individual stands are merged by dissolving the boundaries between adjacent polygons with the same value of the cuttable attribute. The merged polygons are larger spatial objects. They allow a more efficient harvesting programme, without back-tracking of plant, equipment and personnel.
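The two-step cuttable procedure can be sketched as follows. Like the text, the sketch assumes that each stand is a homogeneous 10 ha polygon. It additionally assumes that a list of adjacent stand pairs is available from the polygon topology. The stand data, the adjacency list and the function names are hypothetical. Step 2 uses a union-find structure to group adjacent cuttable stands. This is one convenient way to compute which polygons a dissolve would merge, not necessarily how any particular GIS does it.

```python
from collections import defaultdict


def assign_cuttable(stands, species="Pinus radiata", min_age=50):
    """Step 1: give every stand a new 'cuttable' attribute, 'yes' or 'no'."""
    for stand in stands.values():
        ok = stand["species"] == species and stand["age"] > min_age
        stand["cuttable"] = "yes" if ok else "no"
    return stands


def dissolve_cuttable(stands, adjacency):
    """Step 2: drop 'no' stands, then dissolve boundaries between adjacent
    'yes' stands. Returns merged blocks as sorted lists of stand ids."""
    keep = {sid for sid, s in stands.items() if s["cuttable"] == "yes"}
    parent = {sid: sid for sid in keep}  # union-find forest over kept stands

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in adjacency:  # each pair of stands shares a boundary
        if a in keep and b in keep:
            parent[find(a)] = find(b)  # dissolve the shared boundary

    blocks = defaultdict(list)
    for sid in keep:
        blocks[find(sid)].append(sid)
    return sorted(sorted(ids) for ids in blocks.values())


if __name__ == "__main__":
    STAND_HA = 10  # every stand is 10 ha, as in the text
    stands = {
        1: {"species": "Pinus radiata", "age": 60},
        2: {"species": "Pinus radiata", "age": 55},
        3: {"species": "Eucalyptus", "age": 80},
        4: {"species": "Pinus radiata", "age": 30},
        5: {"species": "Pinus radiata", "age": 70},
    }
    adjacency = [(1, 2), (2, 3), (3, 5), (4, 5), (2, 4)]
    assign_cuttable(stands)
    for block in dissolve_cuttable(stands, adjacency):
        print(block, len(block) * STAND_HA, "ha")
```

With this data, stands 1 and 2 merge into a 20 ha block and stand 5 stays a separate 10 ha block. Stands 3 (wrong species) and 4 (too young) are dropped.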

5. VECTOR OUTPUTS
Most GIS software includes capabilities for displaying results either on the computer monitor or as hard copy. Outputs usually take the form of maps, graphs and tables, which the analyst uses to produce written reports. Cartographic functions allow the production of maps depicting the spatial distribution of phenomena and the new information produced by the analysis. The choice of display depends on several factors. These include the type of data, scale and resolution, the hardware and software available, and the intended audience of these products and reports. Some information produced by the analysis may also be needed as input to other systems and GISs, so the output format is an important consideration too.

Much data today is transmitted in digital form. There is no single standard format. However, data carried on floppy disks, cartridges, CD-ROMs and other media are often stored in ASCII (the American Standard Code for Information Interchange), which has served more or less as an industry standard. Placing the data on these media also helps ensure their longevity, provided the media are not physically destroyed, deliberately or otherwise.

Because of their technical nature, the various technologies for producing maps and other products from a GIS analysis are not covered here. It should be noted, however, that map products using the vector data structure are more aesthetically pleasing and appear more professional than those produced by other systems, although this depends on the hardware available. With the choice of colour or monochrome, pen plotters and ink-jet plotters, the technology is available today to produce cartographic products of a very high standard and quality. With the convergence of technology, products may also be produced directly on film for publication or on other media. No further manipulation or processing is needed when exporting to other media and technologies.
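A small illustration of exporting analysis results as ASCII for use by other systems follows. It writes attribute records as comma-delimited text and encodes the result strictly as ASCII, so any value that cannot be represented is caught before the file is sent on. The comma-delimited layout and the function name `to_ascii_table` are our own assumptions. The text does not prescribe a particular file layout.

```python
import csv
import io


def to_ascii_table(records, fields):
    """Write attribute records as comma-delimited text, returned as ASCII bytes.

    Raises UnicodeEncodeError if any value has no ASCII representation,
    which is a cheap check that the file can travel as plain ASCII.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("ascii")  # "strict" error handling by default


if __name__ == "__main__":
    roads = [
        {"id": 1, "name": "Belconnen Way", "surface": "asphalt"},
        {"id": 2, "name": "Federal Highway", "surface": "concrete"},
    ]
    print(to_ascii_table(roads, ["id", "name", "surface"]).decode("ascii"))
```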

SUMMARY
The vector data model is based on vectors, whose fundamental primitive is the point. Spatial objects are created by connecting points with straight lines, and some systems also allow points to be connected by arcs of circles. Area objects are defined by sets of lines that describe a polygon. Because of these straight-line connections between points, the term polygon is synonymous with area in vector databases. Very large vector databases have been built for various purposes, especially where lines, points and networks are present: for example, in transport networks, utility management and marketing. Vector and raster data structures have been used in combination for resource management applications and for applications requiring analysis at broad regional scales. Various modes of input, storage, retrieval and manipulation of vector data are available for use in GIS analyses. Whether whole polygons, arc-node or relational models are used depends very much on the application at hand. A key principle in all these operations is the concept of topology and how it is built by applying the rules of planar enforcement.

At the analytical stage, spatial data may be manipulated and analysed to produce new information. They may also be used to re-interpret old information where topological overlay operations have revealed new relationships. The new spatial objects produced by topological operations will also have new features and characteristics, which must be updated in the attribute tables: a change in a spatial object will produce one or more changes in the attribute table. The analysis functions in a vector GIS deal with spatial objects whose properties, such as area, length and distance, can be measured precisely. These measurements are more accurate because they are based on the dimensions of the objects themselves: point, line, area and volume.

However, some vector GIS operations can be slower than in raster GISs. These include overlaying maps, finding buffers, cleaning and editing, and building topology. Other functions, for example finding routes through a road network, can be faster. Finding optimal routes through a network can, however, be combinatorially explosive!
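Finding a single shortest route between two points is usually done with a standard algorithm such as Dijkstra's, sketched below for an undirected road network. Dijkstra's algorithm runs efficiently. Problems such as finding the best order in which to visit many stops are the ones that become combinatorially explosive. The network, node labels and arc lengths in the example are hypothetical. This is not claimed to be the routing method of any particular GIS.

```python
import heapq
from collections import defaultdict


def build_network(edges):
    """Undirected road network: node -> list of (neighbour, length)."""
    graph = defaultdict(list)
    for a, b, length in edges:
        graph[a].append((b, length))
        graph[b].append((a, length))
    return graph


def shortest_route(graph, start, goal):
    """Dijkstra's shortest path; arc lengths must be non-negative.

    Returns (total_length, route) or (inf, []) if goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry superseded by a shorter one
        done.add(node)
        if node == goal:
            break
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if goal not in dist:
        return float("inf"), []
    route = [goal]
    while route[-1] != start:  # walk predecessors back to the start
        route.append(prev[route[-1]])
    return dist[goal], route[::-1]


if __name__ == "__main__":
    roads = build_network([("A", "B", 4), ("A", "C", 1), ("C", "B", 2),
                           ("B", "D", 5), ("C", "D", 8)])
    print(shortest_route(roads, "A", "D"))  # (8.0, ['A', 'C', 'B', 'D'])
```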

FURTHER READING
Aronoff, S. (1989) Geographic Information Systems: A Management Perspective, Ottawa: WDL Publications. Chapter 1, An introduction to Geographic Information Systems, pp. 1-30; Chapter 7, GIS Analysis Function, pp. 189-247.
Burrough, P.A. (1986) Principles of Geographical Information Systems for Land Resource Assessment, Oxford: Clarendon Press. See Chapter 2 on data structures for thematic maps, pp. 13-38, and Chapter 5, Data Analysis.
Carter, J.R. (1984) Computer Mapping: Progress in the 80s, Washington, D.C.: Association of American Geographers.
Dickinson, H. (1990) Unit 13: The vector or object GIS, in Goodchild & Kemp (eds.), pp. 13-3 - 13-7. Also Unit 14: Vector GIS capabilities, in Goodchild & Kemp (eds.), pp. 14-3 - 14-8.
Environmental Systems Research Institute (ESRI) (1990) Understanding GIS: The ARC/INFO Method, Redlands, CA: ESRI. Chapter 8, Performing Geographic Analysis.
Goodchild, M.F. & Kemp, K. (eds.) (1990) Introduction to GIS, NCGIA Core Curriculum, Santa Barbara, CA: NCGIA.
Huxhold, W.E. (1991) An Introduction to Urban Geographic Information Systems, New York: Oxford University Press. See Chapter 4, Topological data structures, pp. 127-146; this chapter is especially good on adding topological data structures to a cartographic database, and on point, line and area tables. Also Chapter 5, Geographic Base Files, pp. 147-179, for discussion of geocoding and geoprocessing and examples of DIME and TIGER files.
Maguire, D.F., Goodchild, M.F. & Rhind, D. (eds.) (1991) Geographical Information Systems, London: Longman Scientific & Technical.
Martin, D. (1991) Geographic Information Systems and their Socio-economic Applications, London: Routledge. See Chapter 6, Data Storage, pp. 82-101.
Marx, R.W. (1990) The TIGER system: Automating the geographic structure of the United States Census, in Peuquet & Marble (eds.), Chapter 9, pp. 120-141.
Peuquet, D.J. & Marble, D.F. (eds.) (1990) Introductory Readings in Geographic Information Systems, London: Taylor & Francis.
Sobel, J. (1990) Principal components of the Census Bureau's TIGER file, in Peuquet & Marble (eds.), Chapter 8, pp. 112-119.
Star, J. & Estes, J. (1990) Geographic Information Systems: An Introduction, Englewood Cliffs, NJ: Prentice Hall. See Chapter 4 on Data Structures, pp. 32-60, especially the section on vector data structures, and Chapter 8, Manipulation and Analysis, pp. 143-173.
Taylor, D.R.F. (ed.) (1991) Geographic Information Systems: The Microcomputer and Modern Cartography, London: Pergamon Press. See the article by Donna Peuquet on Methods for Structuring Digital Cartographic Data in a Personal Computer Environment, pp. 67-96.
US Bureau of the Census (1990) Technical description of the DIME system, in Peuquet & Marble (eds.), Chapter 7, pp. 100-111.
White, G. (1990) Unit 12: Relationships among spatial objects, in Goodchild & Kemp (eds.), pp. 12-3 - 12-9.

REVISION

1. What makes the concept of a spatial database unique relative to other types of databases? (NCGIA 10-9)
2. What is a database model, and why is it important for designing a database? (NCGIA 10-9)
3. List and define an example of a spatial object type from each of the 0-D, 1-D, 2-D and 3-D groups of object types. (NCGIA 10-9)
4. Draw diagrams to illustrate the spatial join of area objects. First draw an input coverage. Then draw a separate union cover to be overlaid. Finally, draw a separate cover to illustrate each of the three types of spatial join: (a) union, (b) identity and (c) intersect. Would the analytical format be the same for raster data structures? Illustrate your answer similarly, but using a raster data structure this time.
5. GIS analysis functions include the following: measure; coordinate transformation; generate objects; select a subset of objects; modify attributes of objects; dissolve and merge area objects; generalize or smooth lines; compute statistics for a set of objects; topological overlay; operations on surfaces; network analyses; input and output management. Compare this classification of map analysis functions with that proposed for raster functions (see Berry, NCGIA Unit 5). (Source: NCGIA Unit 14 pp. 15-3 - 15-11.)
6. What steps can be taken to minimize the effects of generalization and error in digital data obtained from maps? (NCGIA 2-10)
7. Discuss the use of planar enforcement for street networks, and the problems presented by overpasses, underpasses and tunnels. Can you modify the basic rules to maintain consistency while allowing for such instances? (NCGIA 12-9)
8. What additional examples of relationships can you devise in each of the following six categories: point to point; point to line; point to area; line to line; line to area; area to area? (NCGIA 12-9)
9. List and describe the processes involved in constructing a vector database by digitizing maps. (NCGIA 13-7)
10. Using simple sketches, describe and illustrate typical problem cases that lead to difficulties in building area-object topology in a vector database, and the strategies that various GISs use to minimize editing effort. (NCGIA 13-7)
11. Discuss the applications of GIS in relation to the vector data model. Give examples of cases where the model would be particularly inappropriate in comparison with raster. (NCGIA 13-7)
12. What are spurious polygons, and what characteristics of data cause them? (NCGIA 14-8)
13. Three types of topological overlay are point in polygon, line on polygon and polygon on polygon. Are there others? What applications might they have? (NCGIA 14-8)


CLASS EXERCISE
A. With pencil and paper, roughly enlarge a local street map to a scale of approximately 1:1000.

a. Develop a raster dataset of the transport network. On a light table, lay a sheet of millimetre graph paper over the enlarged street map, and mark off the cells corresponding to the centre lines of the main roads.
b. Develop a vector dataset of the transport network. Lay a second sheet of millimetre graph paper over the map and draw the centre lines of the roads as a series of straight-line segments. Use the cells at the end points of these straight vectors to indicate the points that might be termed nodes in a vector database.
c. Compare the number of raster cells coded in (a) with the number of end cells required in (b).

B. Examine the arc-node data structure example given below.

a. How would you revise this example so that all the information in it is stored in a raster data structure?
b. Estimate the relative data volume of the arc-node example in comparison with your choice of raster representation.

Nodes
No.  Eastings  Northings  Traffic control  Pedestrian crossing
1    126.5     578.2      light            yes
2    218.6     581.9      sign             yes
3    224.2     470.4      none             no
4    129.1     471.9      sign             no

Arcs
No.  From  To  Length  Pavement  Lanes  Name
I    4     1   106.3   good      4
II   1     2   92.2    fair      4
III  2     3   111.6   fair      2
IV   3     4   95.1    fair      2

Polygons
Name  Owner     Arcs         Perimeter  Area   Zoning attribute
A34   J. Smith  I,II,III,IV  405.2      10203  R-4
A35   R. White  III,VI,VII   478.1      11562  R-4

Source: Star & Estes 1990: p. 253 Assessment exercise figure
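The following sketch shows how the arc-node tables above fit together. It stores the node coordinates and the from/to topology of arcs I to IV, recomputes each arc's length from its nodes, and rebuilds polygon A34 from its arc list. It assumes that a polygon's arcs are listed head-to-tail in the direction they are traversed. This holds for A34 but is a simplification. In a full arc-node structure an arc can be shared by two polygons and traversed in opposite directions (arc III is listed for both A34 and A35). Arcs VI and VII of A35 are not given in the tables. The function names are our own.

```python
import math

# Node coordinates (Eastings, Northings) from the Nodes table.
NODES = {1: (126.5, 578.2), 2: (218.6, 581.9), 3: (224.2, 470.4), 4: (129.1, 471.9)}

# Arc topology (from-node, to-node) from the Arcs table.
ARCS = {"I": (4, 1), "II": (1, 2), "III": (2, 3), "IV": (3, 4)}


def arc_length(arc_id):
    """Straight-line length of an arc between its from- and to-nodes."""
    (x1, y1), (x2, y2) = (NODES[n] for n in ARCS[arc_id])
    return math.hypot(x2 - x1, y2 - y1)


def polygon_ring(arc_ids):
    """Chain a polygon's arcs head-to-tail into a closed ring of node ids."""
    ring = [ARCS[arc_ids[0]][0]]
    for arc_id in arc_ids:
        frm, to = ARCS[arc_id]
        if frm != ring[-1]:
            raise ValueError(f"arc {arc_id} does not start where the last arc ended")
        ring.append(to)
    if ring[-1] != ring[0]:
        raise ValueError("arcs do not close the polygon")
    return ring[:-1]  # drop the repeated closing node


def shoelace_area(ring):
    """Planar area enclosed by a ring of node ids (shoelace formula)."""
    pts = [NODES[n] for n in ring]
    s = sum(x1 * y2 - x2 * y1
            for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(s) / 2.0


if __name__ == "__main__":
    for arc_id in ARCS:
        print(arc_id, round(arc_length(arc_id), 1))  # compare with Length
    a34 = ["I", "II", "III", "IV"]
    print("A34 perimeter:", round(sum(arc_length(a) for a in a34), 1))
    print("A34 area:", round(shoelace_area(polygon_ring(a34))))
```

The recomputed lengths round to the tabulated values. The perimeter comes to about 405.3, against 405.2 in the table (the sum of the rounded lengths). The shoelace area is about 10 198, against the tabulated 10 203. These small differences probably reflect rounding or measurement in the source figure.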


COMPUTER DEMONSTRATION For those who have access to the Internet here are a few addresses that you can visit to learn more about DIME and TIGER files. http://tiger.census.gov/geog.html [This computer demonstration is under development. Please refer to a class handout.]

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho. Last updated: February 16, 2001.


Manual 4 T-o-C and Introduction

MANAGING ATTRIBUTE DATA IN GIS


MODULE 4

TABLE OF CONTENTS
Preface
Acknowledgments
Introduction
Materials required
Aims
Objectives
1. Spatial-Attribute Data Links
   1.1 Storing attributes
   1.2 Spatial-descriptive data links
   1.3 Relate and joins
   1.4 Look-up tables
   1.5 Files and fields
2. Database Management Systems
   2.1 Design steps
   2.2 Database characteristics
   2.3 Database types
3. Attribute Information
   3.1 Features of attribute data
   3.2 Accuracy and errors
   3.3 Creating attribute tables
4. Analysis and Manipulation of Attribute Data
   4.1 Attribute operations
   4.2 Adding vector attributes
   4.3 Adding raster attributes
5. Querying Attribute Data
   5.1 A typology
   5.2 Query languages
   5.3 Queries and data types
Summary
Further Reading
Revision
Class exercise
Computer demonstration
Software evaluation
Glossary of terms
Index

http://infosys-law.canberra.edu.au/gismodules/manual_4/m4_tab_intro.html (1 von 7) [09.06.04 09:40:08]

PREFACE
Managing Attribute Data in GIS is the fourth of five modules in the series A Self-Teaching Student's Manual for GIS. This manual is the result of work undertaken for a Committee for the Advancement of University Teaching (CAUT) National Teaching Development Grant for 1995. The three previous modules are Geographic Information Systems, Raster GIS: An Introduction, and Vector GIS: An Introduction. The module that follows this one deals with Integrating Remote Sensing with GIS. To complete this self-contained unit successfully, users should be prepared to spend approximately ten hours on reading and working with the manual, writing up results, doing extra reading and attempting an assessment exercise. The presentation style in this and the following modules may be described as a spiral curriculum. In such a curriculum, the content of the present module is used again in a later module, in more depth and detail, the next time the same or similar concepts are encountered. In general, there are four parts to a module:

1. the text, which presents both the conceptual and practical aspects of the module, with examples drawn from as many uses as possible;

2. diagrams, figures and other illustrative materials, which explain and show relationships;

3. questions, exercises and problems to be solved, and an assessment; and

4. suggestions for further reading and research (see the curriculum chart in Figure 4.1).

Figure 4.1 Interlocking modules of the spiral curriculum


It is advisable to use this workbook as your personal notepad. Underlining text with a highlighter or bright coloured biro will help identify important points. In this workbook all important concepts, words and phrases are set in bold. Words and phrases used with a meaning different from their usual one are italicised. To begin with, browse through this workbook very quickly just to get a feel for its contents. Reading this preface helps! A tutor may walk you through this workbook, but the pace may or may not suit you, so try to work through it at a pace you are comfortable with. The appendices are an important component of the overall module because they contain important tools: hints on using computers, a glossary, an index, and some answers to workbook problems are provided there.

ACKNOWLEDGMENTS
I should also like to thank the following individuals and publishers for permission to reproduce their illustrations and examples in this workbook.

ESRI (1990) Understanding GIS: The ARC/INFO Method, Redlands, CA: ESRI for figures found on pages 2-21, 2-23, 6-2; William Huxhold (1991) An Introduction to Urban Geographic Information Systems, New York, Oxford University Press for figures 2.2, 2.3, 2.4, 2.5 and 2.6.

INTRODUCTION

This fourth module in the series A Self-Teaching Student's Manual for GIS deals with Managing Attribute Data in GIS. It is a capstone unit that links the first three modules, because it deals with the handling and managing of non-spatial attribute data in a GIS. An attribute in a GIS is a field in a computer file containing information about an entity. It may be considered a class descriptor of an entity: for example, colour, grassland species, or the age of trees in a stand. Raster GISs deal with data that can be stored, manipulated, analysed and presented in cellular form. Vector GISs reference data to the geographical primitives of points, lines, areas and volumes. This module addresses how the attributes of these entities are managed and handled by a database management system. Sequencing the modules in this fashion allows raster and vector systems to be treated together as far as attributes are concerned.

This workbook is presented in five parts. After this introductory section, part one deals with linking spatial data with their attributes or characteristics. It covers the use of geocodes and the descriptions of features associated with them, and the integration of data types. Part two introduces database management systems (DBMSs) in the spatial sciences, since such systems have traditionally concentrated on aspatial data. It explains the design of DBMSs and their various characteristics, types and capabilities. It also examines the functionality of DBMSs to query, analyse, manipulate and output results in a GIS. In part three the nature of attribute information is presented in terms of its specific features and requirements. This part discusses the classification, description and recording of various types of attributes, and explains how descriptive variables are linked with spatial entities.
Then in part four, the analysis and manipulation of attribute and textual data is discussed to show how operations on attribute data may be performed while still maintaining the links with the spatial data. Adding information from files created in other formats is also discussed briefly. Finally, in part five, the query and display of attribute data are discussed from the viewpoint of a GIS in general. The module concludes with a brief summary followed by suggestions for further reading, revision exercises and a computer demonstration. A glossary and index follow to help the reader navigate through this workbook.

MATERIALS REQUIRED
- Bright coloured biro or highlighter.
- Sharp pencils, preferably HB (hard-black), rulers, erasers.
- A4 mm graph paper, tracing paper.
- Access to a personal computer (Intel-based IBM-compatible PC).
- MS Excel Ver. xx for Windows.
- Intel-based 386 computer or higher.
- MS Windows 3.0 or higher.
- 4 Mb memory in RAM, 8 Mb strongly recommended.
- 12 Mb hard disk space for data storage.
- Colour monitor, mouse, Windows-supported printer (optional).

Computer demonstration software


CDATA-91 demonstration programme (courtesy of the Australian Bureau of Statistics).
ANSETT Australia Flight Information System (courtesy of Ansett Australia Airlines).

Internet
For students with access to the Internet, a visit to the following sites might be a profitable venture:
- Australian Bureau of Statistics (ABS): gopher://gopher.statistics.gov.au
- England: gopher://cs6400.mcc.ac.uk/11/midas/datasets
- Statistics Canada: http://www.statcan.ca
- US: http://www.census.gov/geog.html

AIMS
To have a student:

1. Appreciate the value of attribute data to spatial information systems.
2. Understand how spatial data attributes are input, stored, retrieved, manipulated and used in a database in particular and in GISs in general.
3. Obtain an introduction to database concepts and models and how these are used to represent textual information (AISIST 3.0204).
4. Use database models to represent spatial entities.
5. Appreciate the requirements of designing a database, including the various concepts and design steps (AISIST 3.0303).
6. Obtain a basic understanding of the principles, methodology and importance of data management (AISIST 4.05).
7. Gain an insight into the various methods of database management systems (DBMSs) and how these are used to manage and handle attribute data.
8. Comprehend some of the organisational and technical issues involved in implementing DBMS strategies in a GIS and supporting its use.


OBJECTIVES
As a result of completing this module, a student should be able to undertake the following tasks with some level of understanding and competence.

1. Discuss the organisation of spatial data attributes in a database.
2. Describe how spatial data attributes are input, stored, retrieved, manipulated and used in a GIS.
3. List the main advantages and disadvantages of common methods of managing attribute data in a GIS.
4. Describe basic models and methods used to represent textual data (AISIST 3.0204).
5. Explain the relationships between rows, columns and entities in a relational view of data (relational view: attributes, rows and domains).
6. Give examples of how a relational model is used to represent vector objects.
7. Outline the main steps in database design and construction as conceptual, logical and physical models.
8. Describe the processes involved at each of the three primary steps in designing a database.
9. List and describe each of the important characteristics of a database.
10. Explain the principal methods and importance of data management (AISIST 4.09).
11. Explain what types of database model are in use, and how and why they differ.
12. Outline general cartographic and graphic design principles for producing GIS products, reports and other outputs.

Citation
To reference this material, the correct citation for this page is as follows: Cho, G (1995) A Self-Teaching Student's Manual for Geographic Information Systems. (Insert here the Module Number and Name) Canberra: University of Canberra and CAUT. Online URL, http://infosys-law.canberra.edu.au/gismodules/index.html, as of (...) [date and time].


Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho Last updated: February, 2001


Module 4 Chapter 1-2

MANAGING ATTRIBUTE DATA IN GIS



1. SPATIAL ATTRIBUTE DATA LINKS
The linking of spatial data with their attribute data is one of the most important features of any GIS. Attribute data provide the intelligence and the descriptions for the spatial data. Taken by themselves and out of context, the spatial data appear bland, nondescript and lacking in meaning. Textual data also provide the identifiers for geocodes, the locational identifiers in any spatial database. A geocode is simply an identifier assigned both to a map feature and to a data record containing attributes that describe the entity represented by that map feature. Examples of geocodes include street addresses, census tracts, and political and administrative boundaries. The importance of attribute data also stems from the fact that such data allow texts to be linked (text-text), text to be linked to spatial data (text-spatial), and spatial relationships to be indexed (spatial-spatial).

To illustrate such linkages, examples are taken from one vector GIS. The philosophy of PC ARC/INFO, a software package developed by the Environmental Systems Research Institute (ESRI), hinges on the use of separate but linked files. The name of the package itself suggests the dilemma of dealing both with locational-spatial data (hence ARC) and with the descriptors and attributes of these arcs (hence INFO). The marriage of ARC with INFO thus represents a compromise solution to two problems: how to deal with spatial and attribute data simultaneously, and how to do so with a programming solution. The idea has been to handle each type of data separately while ensuring that the links between the two are maintained. That is, a data point is handled in ARC while the attributes of that data point are handled in INFO at the same time.
How such linkages are maintained is discussed in the following sections on database management systems and their design. As a vector-based GIS, PC ARC/INFO maintains separate files for each type of spatial entity: points, arcs and polygons. Associated with these entities are point attribute tables (known as PAT files), arc attribute tables (AAT) and polygon attribute tables. Polygon attribute tables are also known as PAT files, although their internal organisation and fields differ intrinsically from those of point attribute files. These attribute tables are linked dynamically, so that the spatial data have labels, descriptors and attributes. Thus, whenever a spatial datum (the singular of data) is changed in the database, the attribute data associated with it are also changed or updated. These dynamic links are automated and transparent to the user. The organisation and structure of the system software ensure that such changes come about through a rigorous process of editing the data and then building the topology before the data are usable. Carrying out these procedures and performing error checks ensures that the spatial-attribute linkages are maintained.

1.1 STORING ATTRIBUTES
Descriptive attributes associated with map features are captured and stored in the computer in much the same way as coordinate data. Attributes are stored as combinations of numbers and words. Thus, attribute information for a set of lines representing roads may include:

http://infosys-law.canberra.edu.au/gismodules/manual_4/m4_chap1_2.html (1 von 12) [09.06.04 09:40:33]


Attribute          Code / meaning of code
Road type          1 = highway; 2 = arterial roads; 3 = major roads; 4 = suburban streets; 5 = tracks
Surface            concrete, asphalt, gravel
Width              measured in metres
Number of lanes    a count of lanes
Name               name given to each road
These roads may be broken up into a series of segments or arcs. When stored in a computer, each arc is associated with a string of values arranged in some predetermined format, such as:
1 concrete 20 4 Federal Highway
These might be the attributes of a highway (road type 1) made of concrete, 20 m wide, with four lanes and named the Federal Highway. In a road map containing many arcs and associated features, these attribute data can be summarised as a table. The table has as many rows as there are arcs and as many columns as there are attributes. The attribute table will look something like this:

Table 4.1 A table showing feature attributes

Feature Id No.  ROAD TYPE  SURFACE   WIDTH  LANES  NAME
1                          Asphalt   16            Belconnen Way
                1          Concrete  20     4      Federal Highway
                           Asphalt   11            Baldwin Drive

Tabular data files like this are used in ARC/INFO. Each line in the file is a record. A record stores all the information about one occurrence of a feature (here an arc; information on three arcs is shown). Each item stores one type of information (the table has six items). Such data files are known as feature attribute tables (FATs). A FAT is thus a data file for storing descriptive map data. It also plays an important role in providing the link with the spatial data.
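A feature attribute table can be sketched as one record per arc, parsed from the predetermined format described above (road type, surface, width, lanes, name). Each record is keyed by the feature identifier that is also stored with the arc's coordinates. The record layout, the field names, the feature id and the coordinates in the usage example are all hypothetical. This illustrates the idea and is not ARC/INFO's internal file format.

```python
from dataclasses import dataclass

# Road-type codes from the attribute coding table above.
ROAD_TYPES = {1: "highway", 2: "arterial road", 3: "major road",
              4: "suburban street", 5: "track"}


@dataclass
class RoadRecord:
    """One FAT record: one row per arc, one field (item) per attribute."""
    feature_id: int   # also stored with the arc's coordinates: this is the link
    road_type: int
    surface: str
    width_m: float
    lanes: int
    name: str


def parse_record(feature_id, line):
    """Parse 'type surface width lanes name' into a RoadRecord.

    Everything after the fourth field is the name, so names may contain spaces.
    """
    road_type, surface, width, lanes, name = line.split(maxsplit=4)
    if int(road_type) not in ROAD_TYPES:
        raise ValueError(f"unknown road type code: {road_type}")
    return RoadRecord(feature_id, int(road_type), surface,
                      float(width), int(lanes), name)


if __name__ == "__main__":
    # Hypothetical spatial file: feature id -> arc vertices.
    arcs = {2: [(0.0, 0.0), (250.0, 40.0), (600.0, 55.0)]}
    fat = {2: parse_record(2, "1 concrete 20 4 Federal Highway")}
    for fid, vertices in arcs.items():
        rec = fat[fid]  # one-to-one link through the shared feature id
        print(rec.name, ROAD_TYPES[rec.road_type], len(vertices))
```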


1.2 SPATIAL-DESCRIPTIVE DATA LINKS

As noted previously, the power of a GIS lies in the link between the spatial data and their descriptive data. ARC/INFO achieves this link in three ways. First, there is a one-to-one relationship between features on the map and records in the FAT: an arc on the map has a record pertaining to that arc in the FAT, and the same holds for points and polygons. Secondly, the link between the feature (whether point, arc or polygon) and the record is maintained through the unique identifier assigned to each feature. In our example we have used a column entitled "Feature Id No.". For polygons, the identifier is assigned through the polygon's label point. Finally, the unique identifier is physically stored in two places: in the file containing the x,y coordinate pairs (the spatial data file) and in the corresponding record in the FAT (the attribute data file).

Figure 4.2 Linking spatial features with attributes of features

Source: ESRI (1990: 2-21). ARC/INFO automatically creates and maintains the connections between the different files. Figure 4.2 illustrates the linkage between features and attributes. The coordinate and attribute files have a common element: the feature identification number (Feature Id No.). This number associates the attributes with the feature coordinates in a one-to-one correspondence. Given this connection, it is possible to query the map to display attribute information, or to create a map based on the attributes stored in the FAT.

1.3 RELATES AND JOINS

Rather than only keeping track of features and their attributes, the concept may be applied to connect any two tables, provided they share a common attribute. ARC/INFO uses the concepts of a relate and a relational join. A relate uses a common item to establish temporary connections between corresponding records in two


tables. The effect of a relate is to make the table wider by temporarily adding the attributes of the related table. A relational join, on the other hand, relates two attribute tables using their common item and merges them into a single table. With a relate, the related tabular data file can be maintained and updated separately. Relates and relational joins are conceptually simple, yet they are used frequently in GIS operations. In a spatial overlay, for example, each new output feature has attributes from both sets of input features used to create it. Such a spatial overlay is in reality a spatial join: here the records are matched on the basis of their locations and associated geographic features rather than through a common item in two tables.
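One way to picture the difference between a relate and a relational join is the Python sketch below. It is an illustration of the idea, not ARC/INFO's RELATE or JOINITEM commands: the relate combines matching rows each time it is used, so later edits to the related table are picked up, whereas the join produces a merged snapshot. The maintenance table and its resurfacing years are hypothetical.

```python
from typing import Iterator

# Hypothetical FAT rows and a separately maintained table that shares the
# common item "name".
road_fat = [
    {"name": "Belconnen Way", "surface": "asphalt"},
    {"name": "Federal Highway", "surface": "concrete"},
    {"name": "Baldwin Drive", "surface": "asphalt"},
]
maintenance = {
    "Belconnen Way": {"resurfaced": 1990},
    "Federal Highway": {"resurfaced": 1993},
}

def relate(rows: list[dict], related: dict, item: str) -> Iterator[dict]:
    """Relate: a temporary connection. Matching rows are combined each time the
    relate is used, so later edits to the related table are picked up."""
    for row in rows:
        yield {**row, **related.get(row[item], {})}

def join(rows: list[dict], related: dict, item: str) -> list[dict]:
    """Relational join: merge the matching records once into a new, wider table."""
    return [{**row, **related.get(row[item], {})} for row in rows]

joined = join(road_fat, maintenance, "name")
maintenance["Baldwin Drive"] = {"resurfaced": 1995}   # update the related table later

print(joined[2].get("resurfaced"))                                   # None
print(list(relate(road_fat, maintenance, "name"))[2]["resurfaced"])  # 1995
```

Rows without a match are kept unchanged in both functions, which corresponds to a left outer join; neither function modifies the original FAT rows.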

1.4 LOOK-UP TABLES

A further device for associating map features and attribute information with graphic and symbolic data is the look-up table (LUT). A LUT is a special tabular data file associated with a particular FAT and containing additional attributes about the features beyond those stored in the FAT. Some refer to these special tables as relate tables, external attribute tables or expansion tables. Thus, during map display, features can be drawn using symbols contained in a related file. For example, land use could be classified as residential (100), commercial (200) and industrial (300), each type having a corresponding symbol number in a related table. When the map is drawn, each land-use polygon is then shaded according to its type (see Figure 4.3).

Figure 4.3 Mapping attributes using lookup tables (LUTs)


Source: ESRI (1990: 2-23). The concept of a LUT can be used just as easily in a raster system. Rainfall values, in mm per year, can be stored in raster cells. A LUT may then be drawn up so that we can draw contour lines (isohyets) showing areas experiencing the same amount of rain. If we choose a contour interval of 2 cm/year, the LUT would look like this:

Cell value   Display
1            black
2            red
3            black
4            red
5            black
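Both uses of a LUT amount to a dictionary lookup keyed on an attribute value, as the Python sketch below suggests. The land-use codes and the raster LUT come from the text; the symbol numbers, polygon identifiers and the small grid of cell values are hypothetical.

```python
# Vector case: land-use codes from the text mapped to symbol numbers.
# The symbol numbers (7, 12, 21) and polygon ids are hypothetical.
landuse_symbol_lut = {100: 7, 200: 12, 300: 21}  # residential, commercial, industrial
polygon_landuse = {"P1": 100, "P2": 300, "P3": 200}
polygon_symbol = {pid: landuse_symbol_lut[code] for pid, code in polygon_landuse.items()}

# Raster case: the LUT from the text, mapping cell values to display colours.
raster_lut = {1: "black", 2: "red", 3: "black", 4: "red", 5: "black"}

# A small hypothetical grid of cell values (rainfall classes 1-5).
grid = [
    [1, 1, 2, 3],
    [1, 2, 3, 4],
    [2, 3, 4, 5],
]
display = [[raster_lut[v] for v in row] for row in grid]

for row in display:
    print(" ".join(colour[0].upper() for colour in row))  # B = black, R = red
```

Because the symbology lives in the LUT rather than in the FAT or the raster, changing the map's appearance only means editing the small table, not the data.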

The display on the screen would show red contour lines on a black background. A more sophisticated display could use different colours to indicate the value of each contour line.

1.5 FILES AND FIELDS

A typical database attribute file for spatial objects may be organized as a table. The word file refers to a file in the database that contains several records. A record may be thought of as one row in a table, and each record may be divided into fields, each of which contains an item of data. A field defines where a particular kind of information may be found in the record. Each field may have a key or label so that, when a query is made, the keys are searched and the data retrieved. Using keys, searches for data can be quick and the data file made more compact. In Table 4.2, the entries under point record (symbology, colour, level and x-,y-coordinate) show how point records are kept in an attribute file: symbology defines what the symbol should look like, colour the colour used to show the point, and the x-,y-coordinate where the point should be displayed. For line records, weight refers to the thickness of the line, and the start and end points locate the line. Area records require vertices to define the boundary of the area. Text and symbol records give further information on how these attributes are to be shown on a map.

Table 4.2 Typical database attribute file for spatial objects required in a GIS for mapping instructions

Point record:    symbology, colour, level, x-,y-coordinate
Line record:     symbology, weight, colour, level, start point, end point
Area record:     symbology, colour, level, vertices
Text record:     font type, height, angle, colour, start point, end point
Symbol record:   name, colour, line attributes, level, x-,y-coordinate, value

Source: Adapted from Huxhold (1991: 128). Given that there will be massive amounts of data in any GIS project, it is important that the data are managed properly and effectively. Data account for nearly four-fifths of the cost of capital outlay, maintenance, updating and use in any GIS. The next section addresses this aspect of managing databases in a GIS.

2. DATABASE MANAGEMENT SYSTEMS

A database may be thought of as a set of data files that are linked together. The linkage may be physical, or it may be provided by common identifiers. Computer software is used to establish databases that permit the creation, editing, manipulation and analysis of data. The database usually consists of files in which records are stored. Each record may contain several fields, each of which contains one or more items of information. For the records and fields to be amenable to computer or automated analysis, the number and placement of the fields have to be pre-determined and conform to one fixed format. Thus, each record may contain information on road type, surface construction, width, number of lanes, name of road and so on. The contents of each field in the record may be wholly numbers, a mixture of numbers and words (alphanumeric) or wholly words and characters. Each field may be of constant or variable length; however, once decided, the format is fixed and should not be changed. There can be several classes of records in a database. Thus, in an airline reservation system database, the record classes and associated items could include those shown in the following table.

RECORD CLASS   ITEMS
Passengers     Name, phone number, flight number
Aircraft       Type, registration number, number of seats
Crew           Names of pilot, co-pilot, cabin crew, home city
Flight         Number, departure and arrival times, aircraft
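The airline record classes can be sketched as Python data classes, showing how a database links records in different files: a passenger's arrival time is found by matching the passenger record with the flight record through the shared flight number. All flight numbers, times, names and registrations here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Passenger:
    name: str
    phone: str
    flight_number: str

@dataclass
class Flight:
    number: str
    departure: str
    arrival: str
    aircraft: str   # registration number, the common item with the Aircraft class

# Hypothetical records; flight numbers, times and registrations are made up.
flights = {
    "XY123": Flight("XY123", "07:00", "08:05", "VH-AAA"),
    "XY456": Flight("XY456", "12:30", "13:40", "VH-BBB"),
}
passengers = [Passenger("J. Citizen", "0000 0000", "XY456")]

def arrival_time(passenger: Passenger) -> str:
    """Match a passenger record with its flight record via the shared flight number."""
    return flights[passenger.flight_number].arrival

print(arrival_time(passengers[0]))  # 13:40

# The arrival time is stored once, in the Flight record, not in every passenger
# record, so a schedule change is a single update.
flights["XY456"].arrival = "14:10"
print(arrival_time(passengers[0]))  # 14:10
```

Storing the time only in the flight record is the reduction in data redundancy the text describes: there is no second copy that could fall out of step.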

Figure 4.4 An illustration of data files, records, fields and classes in a database


Figure 4.4 provides an illustration of the organisation of a database. Such a database serves several functions. It may be used for creating and editing records and as the basis for comparing new data inputs. It can also provide the information to be included in printed records or in summaries of a group of records, and serve as input to customized report forms. Based on a set of criteria and user-specified rules, the database may be interrogated to select records for further scrutiny. Records may be updated with new information, and since the database already exists there is no need to create a new one. Finally, the database provides the essential link between records in different files, for example, to determine the arrival time for a passenger by matching the passenger's record with the flight information record. Thus, a main advantage of the database approach is the reduction in data redundancy that comes from sharing information. This sharing reduces inconsistency in stored information, especially where many different sections of the same organisation collect and store the same information, and makes the data easy to maintain.

2.1 DESIGN STEPS

There are several important steps to consider in the design of a database if it is to be useful. While almost all geographic entities have at least a three-dimensional spatial character, not all dimensions are required at the same time; for instance, the depth of the asphalt in a road may be less important than the width of the road in some analyses, and the width may be less important than the length in other circumstances. Thus, while dimensionality may be important, it depends on the particular context or project. How the data are represented in a database is also governed by the types of manipulation needed to produce the required information. Also, the map scale of the source document may be as important as the map scale of the final product, because some geographical entities, although present at one scale, may be indiscernible at another. The design of the database therefore has to address conceptual issues, logical issues and physical resources.

(a) Conceptual

• The need to design a database that is independent of hardware and software considerations. The use of code interchange standards (for example, ASCII and text files) will allow a wide variety of software to use the data on as many hardware platforms as feasible.
• The need to describe and define spatial and geographical as well as attribute entities, and to identify how these entities are to be represented in the database, for example, as the spatial objects of points, lines, areas and raster cells.
• The need to decide how real-world dimensionality and spatial relationships will be represented. This is as much a matter of geographical scale as of the processing of the data before, during and after input into a database.

(b) Logical

• The logic of the database will in large part be governed by the database management system used by the software. This in turn will determine the logical structure of the database elements. The arrangement of data in a database will often be governed more by software-specific than by hardware-specific considerations.

(c) Physical

• Physical design is both software and hardware specific, since the storage of data on magnetic media depends as much on the software used as on the hardware used to read the data. Further, how the files are structured on the magnetic media will depend on how efficiently the hardware is able to extract the data. Most hardware systems read data from a file in a row-by-row scan, whereas on some architectures the data are read more efficiently column by column.

Having considered the design steps noted above, there is a need to construct a data dictionary. This dictionary is itself a database and, since it describes the contents of other associated databases, it is said to contain metadata: data about data. A data dictionary is a list maintaining, for each layer, the names of the attributes and a description of the attribute values, plus a description of each code where necessary. Note that we are talking about a layer, that is, one coverage or one theme of a map. A data dictionary will be invaluable as a reference during the project as well as for transferring information to other users. A sample data dictionary in tabular form is given below as Table 4.3.

Table 4.3 A sample data dictionary

SOILS (polygons)
  SOILCODE: abbreviation for soil type
  SUIT: 0 = unsuitable; 1 = poor suitability; 2 = moderate suitability; 3 = good suitability
LANDUSE (polygons)
  LUCODE: 100 = urban; 200 = agriculture; 300 = brushland; 400 = forest; 500 = water; 600 = wetlands; 700 = barren
  COSTHA: cost per hectare (actual monetary value)
STREAMS (lines)
  STRMCODE: 1 = major stream; 2 = minor stream
SEWERS (lines)
  DIAMETER: actual diameter in mm
  SYMBOL: 1 = 60 cm pipe
ROADS (lines)
  RDCODE: 1 = improved; 2 = semi-improved

Source: ESRI (1990: 3-12).
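Because a data dictionary is itself data, it can be used by programs as well as people. The Python sketch below holds part of Table 4.3 as nested dictionaries and uses it to decode stored values and to flag codes the dictionary does not define. The structure and function names are illustrative assumptions, not a format prescribed by ARC/INFO.

```python
# A subset of the data dictionary in Table 4.3, held as a small database of
# metadata: layer -> feature class and coded items.
DATA_DICTIONARY = {
    "SOILS": {"class": "polygons",
              "items": {"SUIT": {0: "Unsuitable", 1: "Poor suitability",
                                 2: "Moderate suitability", 3: "Good suitability"}}},
    "LANDUSE": {"class": "polygons",
                "items": {"LUCODE": {100: "Urban", 200: "Agriculture", 300: "Brushland",
                                     400: "Forest", 500: "Water", 600: "Wetlands",
                                     700: "Barren"}}},
    "STREAMS": {"class": "lines",
                "items": {"STRMCODE": {1: "Major stream", 2: "Minor stream"}}},
    "ROADS": {"class": "lines",
              "items": {"RDCODE": {1: "Improved", 2: "Semi-improved"}}},
}

def decode(layer: str, item: str, value: int) -> str:
    """Translate a stored code into its description."""
    return DATA_DICTIONARY[layer]["items"][item][value]

def undefined_codes(layer: str, item: str, values: list[int]) -> list[int]:
    """Return the values that the dictionary does not define for layer.item,
    e.g. to catch coding errors before data are loaded."""
    codes = DATA_DICTIONARY[layer]["items"][item]
    return [v for v in values if v not in codes]

print(decode("LANDUSE", "LUCODE", 400))                       # Forest
print(undefined_codes("LANDUSE", "LUCODE", [100, 450, 700]))  # [450]
```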
2.2 DATABASE CHARACTERISTICS

In the design of databases, several important characteristics have been identified (see Nyerges 1990: 10-8). A database should be:

• Contemporaneous. The database should contain information of the same vintage for all its measured variables. Taking data from two map sources published a few years apart will introduce error, especially where the variables being extracted are dynamic, for example, agriculture or land use.
• Detailed. The data should be as detailed as necessary for the intended applications. Categories and sub-categories of data should be as complete as possible to allow the analysis and modelling of the behaviour of the resource under inspection.
• Accurate. Positional accuracy is always a prime consideration, especially where spatial analysis is involved. In addition, the recorded data must be internally accurate in that they portray the nature of the phenomena without error. To achieve this, clear definitions of the phenomena to be included are needed, as well as criteria to include or exclude certain other entities. Accuracy is also coupled with precision, since the more precisely a phenomenon is recorded, the more accurately it may be described in the data.
• Compatible. The data must be exactly compatible with other information that may be overlain with them. While internal criteria can ensure consistency within the database, ensuring its use with other databases can be more difficult to achieve.
• Easily updated. The database must be readily and easily updated on a regular basis.
• Accessible. The database must be readily accessible to whoever may wish to use it, as well as easy to use.

2.3 DATABASE TYPES

(a) Flat Files

In a flat file data structure or tabular file, each record in the file contains the same data fields as the other records. The values of each field may differ, and usually one field is designated as a key field, which is used for locating a particular record or for sorting the file in a particular order. When such files are computerized, locating records and sorting the file can be relatively easy. A sequential search of a flat file is possible when the file is ordered by the value of the key field. Searching for a record by the value of its key is then relatively fast because the search stops as soon as the desired record is found. With a binary search, a sequenced key field can be searched much more rapidly than with a sequential search. Here the search starts in the middle of a file sequenced in order of the key field and compares the key in that record with the desired value. Depending on whether the record's key is greater or less than the desired value, it eliminates from consideration the half of the file in which the desired record cannot be stored. The search then goes halfway through the remaining half and again compares the key field in the record with the desired value, eliminating half of those records (now one-quarter of the original file). The search continues by halving the records under consideration until the desired record(s) is found. Indexed searches are used when the key field is not sequenced. An index is a separate table containing the key of every record and an address pointing to the location of the data record for each key. The address is the location on the physical storage device and is assigned by the software that controls the device. The index is then sequenced, and a binary search is used on the index rather than on the data file. New records may be added at the end of the data file, and there is no need to re-sequence the entire file, only the index.
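The binary and indexed searches described above can be sketched in Python. This is a teaching illustration, assuming an in-memory list stands in for the data file and a list position stands in for the physical address; the parcel records are hypothetical. The index is a pair of parallel lists (keys and addresses) kept in key order, and the standard-library `bisect_left` performs the binary search on it.

```python
from bisect import bisect_left

def binary_search(records, key_of, target):
    """Binary search of a file sequenced on its key field.

    Each comparison discards the half of the remaining records that cannot
    hold the target. Returns (position or None, number of comparisons).
    """
    lo, hi = 0, len(records) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        key = key_of(records[mid])
        if key == target:
            return mid, comparisons
        if key < target:
            lo = mid + 1      # target can only be in the upper half
        else:
            hi = mid - 1      # target can only be in the lower half
    return None, comparisons

def build_index(data_file, key_of):
    """Index: every key with the 'address' (here, list position) of its record,
    sequenced by key. The data file itself stays unsorted."""
    pairs = sorted((key_of(rec), addr) for addr, rec in enumerate(data_file))
    return [k for k, _ in pairs], [a for _, a in pairs]

def indexed_lookup(index, data_file, target):
    keys, addresses = index
    i = bisect_left(keys, target)          # binary search on the index only
    if i < len(keys) and keys[i] == target:
        return data_file[addresses[i]]
    return None

def add_record(index, data_file, record, key_of):
    """Append the record to the end of the data file; only the index is re-sequenced."""
    data_file.append(record)
    keys, addresses = index
    i = bisect_left(keys, key_of(record))
    keys.insert(i, key_of(record))
    addresses.insert(i, len(data_file) - 1)

# Sequenced file of 1000 hypothetical keys: at most 10 comparisons are needed.
sequenced = list(range(0, 2000, 2))
print(binary_search(sequenced, lambda k: k, 1998))  # (999, 10)

# Unsequenced file of hypothetical parcel records, searched through an index.
data_file = [{"id": 42, "owner": "Smith"}, {"id": 7, "owner": "Lee"}, {"id": 19, "owner": "Brown"}]
index = build_index(data_file, lambda r: r["id"])
add_record(index, data_file, {"id": 3, "owner": "Ng"}, lambda r: r["id"])
print(indexed_lookup(index, data_file, 19)["owner"])  # Brown
```

Halving the candidates at each step means a file of n records needs at most about log2(n) + 1 comparisons, which is why 1000 records need no more than 10.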
Flat file structures are simple and efficient for repetitive tasks especially in transaction-based information systems. However, such structures can be inflexible and unresponsive to certain types of queries. Access to records based on any field other than the key is very slow. The fields cannot be readily expanded to accommodate more information without major effort. Adding new records to the data files requires additional processing.

(b) Hierarchical File Structure

In a hierarchical file structure (tree structure) there is more than one type of record, held in different files. Pointers allow a one-to-many relationship in which each master record can have one or more detailed records associated with it. Access to the data is limited to one type of record: the master record. To picture this structure, the master record may be thought of as the parent, which can be associated with any number of detailed records, or children, through internally assigned pointers. Furthermore, the detailed records can also have children associated with them, with additional pointers assigned for the third level of association. A distinguishing characteristic is that each record has only one higher-level record associated with it. Whereas flat files use up storage space in an attempt to accommodate as many variations as possible, and adding a further field requires a major restructuring of the file, hierarchical structures need little or no additional storage space because the data files can accommodate additions and changes flexibly. A drawback of the hierarchical file structure is that records may only be used by first accessing the master record, and data in detailed records must be repeated for each master record association. As geographical data have a natural tree structure, hierarchical file structures have been used in some GISs. However, such tree structures can be inflexible because new linkages cannot be defined between records once the tree has been established. Also, lateral or diagonal linkages cannot be defined because the relationships are vertically structured.

(c) Networks

A network file structure is organized so that there is more than one type of record, with pointers allowing related records to be associated with each other in a many-to-many relationship.
Access to the data can be made through any type of record, and pointers are used to associate different records with related data. A network file structure has many files containing different, yet related, records. The relationship is through key fields that are common between the files. Pointers to related records are contained in different files. Any update to the information necessitates only one change to the relevant record. While changes take place, the database management system software keeps track of the pointers, ensuring that the network relationships remain intact. However, the expansion of the database and the complexity of maintaining the pointers within the network can require greater amounts of storage, and managing these pointers can become so cumbersome that the system becomes inflexible and difficult to use. Even so, the network model has greater flexibility than the hierarchical model for handling complex spatial relationships. Network models, in turn, have not had widespread use in GISs because of the greater flexibility of relational database models.

(d) Relational Databases

A relational database structure is made up of separate flat files (tables) containing related data that can be combined by matching records having the same values in columns common to the files. No stored pointers are used, which avoids the complexity of maintaining a network of links. Logical linkages of the data values in common fields among files form the associations between files. In a relational database, one or more tables are related through a common data item and can then be joined to form a new table. A major advantage is the almost unlimited flexibility in forming relationships among data items in the database without the additional difficulty of managing the linkages. Because no pointers have to be created to manage new relationships, data items can be added dynamically, on the fly, rather than through major reprogramming and restructuring of the database.
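A relational join can be tried with Python's built-in `sqlite3` module. In this sketch two flat tables are linked only by the values in their common `soil_code` column, and a new data item is later added to one table without restructuring the other. The table names, columns and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two flat tables whose only connection is the common column soil_code.
conn.execute("CREATE TABLE parcels (parcel_id INTEGER, soil_code TEXT)")
conn.execute("CREATE TABLE soils (soil_code TEXT, suitability INTEGER)")
conn.executemany("INSERT INTO parcels VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "A")])
conn.executemany("INSERT INTO soils VALUES (?, ?)", [("A", 3), ("B", 1)])

# The relationship is formed at query time by matching values; nothing is stored
# as a pointer from one table to the other.
JOIN_SQL = """
    SELECT p.parcel_id, p.soil_code, s.suitability
    FROM parcels AS p JOIN soils AS s ON p.soil_code = s.soil_code
    ORDER BY p.parcel_id
"""
rows = conn.execute(JOIN_SQL).fetchall()
print(rows)  # [(1, 'A', 3), (2, 'B', 1), (3, 'A', 3)]

# A new data item can be added on the fly without restructuring the other table.
conn.execute("ALTER TABLE soils ADD COLUMN drainage TEXT")
conn.execute("UPDATE soils SET drainage = 'good' WHERE soil_code = 'A'")
drainage = conn.execute(
    "SELECT p.parcel_id, s.drainage FROM parcels AS p "
    "JOIN soils AS s ON p.soil_code = s.soil_code WHERE s.drainage = 'good' "
    "ORDER BY p.parcel_id"
).fetchall()
print(drainage)  # [(1, 'good'), (3, 'good')]
```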
In sum, the relational model is the most flexible of these database models. It reflects a user's view of the data rather than the way the data are organized internally. Figure 4.5 provides a schematic representation of the database file structures discussed above. Star & Estes (1990: 137) describe the network and hierarchical data structures as a navigational database model because the relationships between different tuples (records) can be displayed as links in a diagram. These links are traversed when we retrieve information from two different files, which is why the term navigational is used.

Figure 4.5 Schematic illustration of database file structures: flat, hierarchical, network and relational

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho Last updated: February, 2001


Module 4 Chapters3_4

MANAGING ATTRIBUTE DATA IN GIS


3. ATTRIBUTE INFORMATION

Attribute information describes "what it is". A spatial entity referenced in any way in a raster or vector system is of limited use if it is not described properly. The feature may be a forest stand, and its attributes might include species composition, average tree height, date last logged and other data. These are often termed non-spatial attributes because they contain no locational information. In contrast to spatial information, attribute data contain an inherent level of inaccuracy because the stand may lack full coverage of a particular variable. For instance, the stand may not be 100 per cent one species, the logging may have been selective, and so on. Here we examine the various features of attribute information and how descriptive, textual data can be linked to spatial entities in a GIS.

3.1 FEATURES OF ATTRIBUTE DATA

One of the first features to note about attribute data is their greater volume compared with spatial data. This means that more data storage space may have to be set aside in the database. For example, a water well is a simple spatial object whose essential geographic location can be represented by a tuple of coordinates. However, the attribute data associated with this well include a wide range of ancillary information, including its depth, the date of drilling, the volume of water extracted, water quality tests and so on. Attributes thus capture the thematic mode by defining the different characteristics of objects. This is achieved by using an attribute table showing the attributes of objects. Each object corresponds to one row of the table, and each theme or characteristic corresponds to a column of the table. Attributes shown in the table are usually non-spatial, although some may be related to the spatial characteristics of a phenomenon, such as an item showing the area of a spatial object or its perimeter.
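Area and perimeter items are examples of attributes derived from geometry rather than observed in the field. The Python sketch below computes them for a polygon with the shoelace formula and stores them in an attribute row alongside non-spatial attributes. The stand coordinates, identifier and attribute values are hypothetical, and the functions assume a simple (non-self-intersecting) polygon.

```python
import math

def polygon_area(vertices):
    """Shoelace formula: area of a simple polygon from its (x, y) vertices."""
    s = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # wrap around to close the ring
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(vertices):
    """Sum of edge lengths, including the closing edge back to the first vertex."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))

# Hypothetical forest stand; coordinates in metres.
stand = [(0, 0), (300, 0), (300, 200), (0, 200)]
attribute_row = {
    "STAND_ID": 17,                 # hypothetical identifier
    "SPECIES": "eucalypt",          # non-spatial attributes
    "AVG_HEIGHT_M": 24.5,
    "AREA": polygon_area(stand),    # attributes derived from the geometry
    "PERIMETER": polygon_perimeter(stand),
}
print(attribute_row["AREA"], attribute_row["PERIMETER"])  # 60000.0 1000.0
```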
Another feature of an attribute is its attribute value: the actual value of the attribute that is measured or sampled and then stored in the database. The entity type is always labelled, and a descriptor is attached to it as an attribute, for example, the name of a forest and the type of forest. Thus, eucalypts, pines and others may be coded 1, 2 and 3 respectively, and these numbers are the attribute values that refer to the type of trees in the forest. As noted previously, attribute data are inherently imprecise and contain some uncertainty, since a value may represent either an attribute of an individual object or an attribute of an entire class of objects. Thus, if one said that the attribute of a polygon is 90 per cent class A soil, this suggests that the remaining 10 per cent is unascertained. On the other hand, if the value is an attribute of an entire class of objects, the observer is saying that soil type A has been correctly identified 90 per cent of the time.

3.2 ACCURACY AND ERRORS

Attribute errors stem from various sources, including misidentification and compilation problems, as well as a lack of precision in the use of descriptive language. In general, attributes may be discrete or continuous. A discrete variable can take on a finite number of values, and these are usually integer values or category values (categorical variables) such as land-use class, vegetation type or local government area. Ordered lists are also discrete variables: in evaluating soil erosion, for example, 1 could represent low erosion levels and 4 a severe soil erosion problem. Similarly, one could rank soil classes by fertility so that a soil rank of 1 represents the very best and 10 the very worst. The values used here are whole numbers, or integers, and it would be meaningless to speak of intermediate values, for example, in halves or quarters.
The scales of measurement used here are nominal and ordinal. From this discussion it will be clear that there will be occasions when a researcher has to make a subjective decision in assigning an attribute value (whether a rank, category or class) to a particular sample. As some subjective judgement is necessary, the attribute information may not be very precise even though it accurately portrays the object being observed. Continuous variables, on the other hand, can take on an infinite number of values and are well suited to quantities that can be measured by some instrument. Variables like temperature and average tree height are continuous, and intermediate values are valid. Here we are dealing with the interval and ratio scales of measurement. While the classification of such variables can be highly precise, especially when field measurements are made or the data have been ground-truthed, it may sometimes be necessary to provide estimates when data are obtained from other sources. In classifying data from such sources, the accuracy is affected by the number of classes used, the shape and size of individual areas, the method of selecting test sites and points, and confusion arising from the use of different classes.
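The scale of measurement determines which summaries of an attribute are meaningful. The Python sketch below illustrates this for a "typical value": the most frequent category for nominal data, a median that is always an actual rank for ordinal data (so no half erosion levels), and the mean only for interval and ratio data. The choice of summary per scale follows common statistical convention; the `Scale` enum and function name are hypothetical.

```python
from enum import Enum
from statistics import mean, median_low, mode

class Scale(Enum):
    NOMINAL = "nominal"    # categories, e.g. land-use class
    ORDINAL = "ordinal"    # ranked categories, e.g. erosion level 1-4
    INTERVAL = "interval"  # e.g. temperature in degrees Celsius
    RATIO = "ratio"        # e.g. average tree height in metres

def typical_value(values, scale: Scale):
    """Return a central value that is meaningful for the scale of measurement."""
    if scale is Scale.NOMINAL:
        return mode(values)          # only the most frequent category makes sense
    if scale is Scale.ORDINAL:
        return median_low(values)    # always an actual rank, never "2.5"
    return mean(values)              # intermediate values are valid here

print(typical_value(["forest", "urban", "forest"], Scale.NOMINAL))  # forest
print(typical_value([1, 2, 3, 4], Scale.ORDINAL))                   # 2
print(typical_value([1, 2, 3, 4], Scale.RATIO))                     # 2.5
```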

http://infosys-law.canberra.edu.au/gismodules/manual_4/m4_chap3_4.html (1 von 9) [09.06.04 09:41:17]


Aronoff (1989: 136) discusses the difficulties of giving accuracy estimates when classifying objects and their associated attribute information. For example, wetlands along streams are typically long, narrow areas that may take up less than 1 per cent of the total map area. A random selection of test points may therefore fail to capture the wetlands and so provide no information on the map accuracy of the wetland class. Even if test points are selected for each map class, the precision needed to locate the test points accurately in the field may exceed the positional accuracy of the data being tested. That is, the boundary may be mapped to within 10 m of its true position, but the wetland itself may be less than 10 m across. Also, sharp boundaries often do not exist even though they are mapped with clean, sharp lines; a wetland edge is usually a zone several to tens of metres wide. Thus, the assessment of classification accuracy may not be entirely objective.

3.3 CREATING ATTRIBUTE TABLES

Attribute tables can usually be created either by the GIS software itself or externally, using spreadsheets, databases and other software that produce compatible data files. Here the example is taken from PC ARC/INFO. The software prescribes a series of steps (ESRI 1990: 6-2), including:

1. Create a new data file to hold the attributes.
2. Add the attribute values to this newly created data file.
3. Relate or join the attributes to the feature attribute table.
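The three steps can be mimicked in Python with the standard `csv` module, using an in-memory file in place of a real data file. This is a sketch of the workflow, not PC ARC/INFO commands: the FAT rows, the soil codes "Ta" and "Cb" and their suitability values are hypothetical, while the item names SOILCODE and SUIT are borrowed from the data dictionary in Table 4.3.

```python
import csv
import io

# Existing feature attribute table for a soils polygon layer (hypothetical values).
fat = [
    {"SOILS_ID": 1, "SOILCODE": "Ta"},
    {"SOILS_ID": 2, "SOILCODE": "Cb"},
    {"SOILS_ID": 3, "SOILCODE": "Ta"},
]

# Step 1: create a new data file to hold the attributes
# (an in-memory CSV stands in for the data file).
new_file = io.StringIO()
writer = csv.DictWriter(new_file, fieldnames=["SOILCODE", "SUIT"])
writer.writeheader()

# Step 2: add the attribute values to the new file.
writer.writerows([{"SOILCODE": "Ta", "SUIT": 3}, {"SOILCODE": "Cb", "SUIT": 1}])

# Step 3: join the new attributes to the FAT through the common item SOILCODE.
new_file.seek(0)
lookup = {row["SOILCODE"]: int(row["SUIT"]) for row in csv.DictReader(new_file)}
for rec in fat:
    rec["SUIT"] = lookup.get(rec["SOILCODE"])   # None if a code has no match

print([rec["SUIT"] for rec in fat])  # [3, 1, 3]
```

The `int()` conversion in step 3 is needed because CSV stores every value as text, which illustrates why the item type of each attribute has to be decided when the table is planned.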
Figure 4.6 illustrates these three steps. It may be noted that the user has had to decide on the attributes that will be needed for each layer in the database. A layer refers to a theme such as soils, streams, vegetation type and so on. For each of these themes the user next has to decide on the specific parameters for each attribute and the types of values to be stored in them. These could simply be numbers, characters or a combination of numbers and characters (alphanumeric). Figure 4.6 Steps for creating an attribute table


Source: ESRI (1990: 6-2). Other considerations in creating an attribute table or tabular data file include naming the various attributes, distinguishing between different types of attributes and calculating the storage space required for each attribute. This is an important step because it forces the user to plan beforehand what is needed and to prepare other requirements. At this stage, errors of logic, compilation problems and misidentification of categories may surface and need to be rectified. This stage is also important in ensuring both that the newly created attribute table is in the correct format for the GIS software and that it is compatible with other tables in the database. The latter is important where two or more tables are to be merged: the basic requirement for that operation is a common item in both tables, defined in the same way (the same item type and width) in each. This helps ensure an error-free relate, join or relational join of the attribute tables. It should be noted that, of the proprietary products mentioned, dBase III is a database program, whereas Lotus 1-2-3 and Excel are spreadsheet programs. Table 4.4, taken from ARC/INFO, illustrates some rules for creating tabular data files; the description of each parameter is given together with an example.

Table 4.4 Creating a tabular data file

Parameter         Description                                                      Example
Item name         Any name up to 10 alphanumeric characters;                       LNDG01_ID, ROADCODE
                  must begin with an alphabetic character
Item type         Data type of item:
  C               Character: any combination of characters                         224 Baker Street
  N               Number: integers or real numbers                                 .10, 101.3
  D               Date, stored as 8 bytes                                          12/31/89
No. of decimals   For N, the number of digits to the right of the decimal point   4.22
Item width        No. of spaces (bytes) used to store item values                  3 bytes for a code of 500

Source: ESRI (1990: 6-4).

4. ANALYSIS AND MANIPULATION OF ATTRIBUTE DATA

An important consideration in analysing and manipulating attribute and textual data is to create and maintain the relationships between the spatial and the textual data. This is vital if error-free results are to be obtained when manipulating and analysing either the spatial data or the attribute data, because operations on either type of data produce results in the other. As in the spatial domain, Boolean operations such as AND (intersection), OR (union) and XOR (exclusive or) are also possible with textual data. Tabular relates and joins have been demonstrated earlier; the manipulation of textual data using simple text editors and software packages such as Microsoft Excel, dBASE III and SAGE is discussed below. The section begins with attribute operations in general to provide the overall context.

4.1 ATTRIBUTE OPERATIONS

Attribute operations may be needed either because the original coding of the attributes is inappropriate or because the coding scheme is too detailed for a later use. Both situations require the original data to be recoded. Recoding may follow some preconceived idea of what the simpler or more generalized codes ought to be; for example, instead of having four or five different tree species codes in a forest, all the codes may be reduced to one that simply indicates the presence of forest. Recoding the forest data has a knock-on effect on the spatial data, since the map will now contain many redundant boundaries that need to be removed. The result is a simpler, more generalized map. Depending on the data structure used, removing the redundant boundaries can be a complex process. The task is more complex for vector data structures because the polygons defined by the boundaries are explicitly coded: the intermediate points making up the shared boundaries must also be removed before the polygons are merged. In raster structures, by contrast, boundaries and areas are defined implicitly, so merging regions and removing redundant boundaries is a much simpler task.

Star & Estes (1990: 146-148) give a very good example of the overlay procedure and describe how attribute data are merged in a matrix and the results plotted on a map. Their example concerns the trafficability of terrain over which a vehicle has to travel. There are three slope types and three soil types in the area, and whether the vehicle can cross a given location depends on its slope and soil type. A matrix of possibilities is produced, and these can then be shown on a map giving the potential for traversing the area. The input data here are both nominal variables (steep, moderate, level; and rock, sand, clay), and the output is also nominal (fair, easy and hard).
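The recoding and the trafficability overlay described above can be sketched for a small raster. This is a minimal Python illustration under stated assumptions, not the procedure of Star & Estes: the species codes, the example grids and the matrix that maps each slope/soil pair to easy, fair or hard are all hypothetical, since their actual matrix is not reproduced here.

```python
from typing import Dict, List, Tuple

Grid = List[List[str]]


def recode(grid: Grid, mapping: Dict[str, str]) -> Grid:
    """Replace each cell value using mapping; values not listed are kept unchanged."""
    return [[mapping.get(value, value) for value in row] for row in grid]


def overlay(slope: Grid, soil: Grid, matrix: Dict[Tuple[str, str], str]) -> Grid:
    """Combine two nominal layers cell by cell by looking up each (slope, soil) pair."""
    return [
        [matrix[(s, t)] for s, t in zip(slope_row, soil_row)]
        for slope_row, soil_row in zip(slope, soil)
    ]


# Hypothetical recoding: several tree-species codes collapse to a single "forest" code.
species = [["pine", "oak"], ["open", "gum"]]
forest = recode(species, {"pine": "forest", "oak": "forest", "gum": "forest"})

# Hypothetical trafficability matrix: 3 slope types x 3 soil types -> easy/fair/hard.
TRAFFICABILITY = {
    ("level", "rock"): "easy", ("level", "sand"): "fair", ("level", "clay"): "fair",
    ("moderate", "rock"): "fair", ("moderate", "sand"): "fair", ("moderate", "clay"): "hard",
    ("steep", "rock"): "hard", ("steep", "sand"): "hard", ("steep", "clay"): "hard",
}
slope = [["level", "steep"], ["moderate", "level"]]
soil = [["rock", "clay"], ["clay", "sand"]]
trafficability = overlay(slope, soil, TRAFFICABILITY)

print(forest)          # [['forest', 'forest'], ['open', 'forest']]
print(trafficability)  # [['easy', 'hard'], ['hard', 'fair']]
```

Because raster boundaries are implicit, adjacent cells recoded to "forest" simply share a value and no boundary has to be removed; a vector system would also have to dissolve the shared polygon boundaries.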
A raster system may use an algorithm that processes the two sets of input data cell by cell to produce a third set of output data. In a vector system the analysis is based on a polygon intersection algorithm in which new polygons are created and redundant points and boundaries are removed. Although this is a more complex operation, it processes less data than a typical raster database. Continuous variables (interval and ratio), such as time or cost, may also be included in the algorithm used for assessing accessibility and trafficability, adding realism to the problem. Multiple layers are often combined in an ordered, step-wise fashion: two input layers are combined to form an intermediate layer, which is then combined with a further input layer to produce another intermediate layer, and so on. Where the input layers are measured on interval or ratio scales, mathematical procedures may be used to combine them. For example, simple arithmetic may combine a layer giving the number of plant species found in each area with a layer giving the size of each area to produce the density of plant species per unit area. Vector-based systems operate on polygons, whereas raster-based systems combine the values in the data layers cell by cell. These examples show how GIS analysis can locate areas in a multivariate data space. The relationships produced may confirm initial hypotheses, and the results are also useful for summarizing complex data sets that may have come from different sources or been collected at different times. As an analytical tool, a GIS may also help formulate further questions and hypotheses about spatial patterns and spatial structure.
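Cell-by-cell combination of raster layers, including the step-wise use of intermediate layers, can be sketched as follows. This is a minimal Python illustration assuming each layer is a list of rows of numbers with identical dimensions; the species counts, areas and cost values are hypothetical, and division is used here to show how a density layer could be derived from a count layer and an area layer.

```python
import operator
from functools import reduce
from typing import Callable, List, Sequence

Grid = List[List[float]]


def combine(a: Grid, b: Grid, op: Callable[[float, float], float]) -> Grid:
    """Combine two raster layers of equal size cell by cell with a binary operator."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("layers must have the same dimensions")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def combine_stepwise(layers: Sequence[Grid], op: Callable[[float, float], float]) -> Grid:
    """Combine layers in order, ((L1 op L2) op L3) op ..., each step giving an intermediate layer."""
    return reduce(lambda acc, layer: combine(acc, layer, op), layers)


# Hypothetical layers: number of plant species per cell and cell area in hectares.
species_count = [[12, 8], [5, 0]]
area_ha = [[4.0, 4.0], [2.5, 2.0]]
density = combine(species_count, area_ha, operator.truediv)

# Hypothetical cost layers added step by step.
cost_layers = [[[1, 2], [3, 4]], [[10, 20], [30, 40]], [[100, 200], [300, 400]]]
total_cost = combine_stepwise(cost_layers, operator.add)

print(density)     # [[3.0, 2.0], [2.0, 0.0]]
print(total_cost)  # [[111, 222], [333, 444]]
```

The dimension check reflects the same concern raised for attribute tables: layers can only be combined cell by cell when their grids match exactly.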
4.2 ADDING VECTOR ATTRIBUTES

Relates, relational joins and joins in vector GISs have already been introduced. Some GIS software permit attributes to be added from an existing digital file, which must already exist on the computer in the correct format. In PC ARC/INFO the ADD FROM command in the TABLES module can be used for this. Suppose we have a digital file called STREAMS.TAB which contains two items: an identifier (User-ID) for each arc and a code for each record (for example, 1 for a major stream and 2 for a minor stream). Such a file may be created with a simple text editor, or even a word processor, provided it is saved as a text (ASCII) file; it contains no formatting of any kind, only numbers and characters. In the TABLES module of PC ARC/INFO a data file, STREAMS.DAT, is created to hold the stream attributes. The identifier will be the relate item and is therefore defined in the same way as the User-ID. The ADD FROM command then reads the attribute values directly from the STREAMS.TAB text file and adds them to records in the selected data file, STREAMS.DAT.

Table 4.5 Adding attribute data from a digital file
________________________________________________________________________________

Step 1 Create a digital file
STREAMS.TAB
200,2
201,2
202,2
203,2
204,2
205,2
...

Step 2 Create a data file in the TABLES module of PC ARC/INFO
Enter command: DEFINE STREAMS.DAT
Item Name: STRM01_ID
Item Width: 11
Item Type: N
Item Decimal Place: 0
Item Name: STRMCODE
Item Width: 1
Item Type: N
Item Decimal Place: 0

Step 3 ADD FROM STREAMS.TAB
Enter command: ADD FROM STREAMS.TAB
Enter command: LIST
$RECNO   STRM01_ID   STRMCODE
1        200         2
2        201         2
3        202         2
4        203         2
5        204         2
...      ...         ...
________________________________________________________________________________
Source: ESRI (1990: 6-27 - 6-29).

Another way of getting attributes and showing them on maps is through key files. Map features are often shown with symbols, and a legend is added to explain what the symbols mean. In PC ARC/INFO a key legend file must be created to specify what will appear in the legend. The key legend file is a simple text file, created with a text editor or word processor and saved as an ASCII text file. It specifies the symbols to be displayed (by their codes) and the descriptive text to accompany each. Appropriate PC ARC/INFO commands are used to draw a key box, position it where desired and set the width of the lines to be drawn; the key legend file is then called to show the symbols and their accompanying text.

Table 4.6 Using a key legend file

Step 1 Create a key file using a text editor: ROAD.KEY
.6 Improved Road
.78 Semi-Improved Road

Step 2 Show the key in ARCPLOT
[ARC] :ARCPLOT
:DISPLAY 4
:KEYLINE ROAD.KEY NOBOX
[The legend will now show the following]
Improved Road
Semi-Improved Road
Source: ESRI (1990: 9-20).

Because PC ARC/INFO uses dBASE III to maintain attribute information and to analyse tabular data, the dBASE files it creates can be edited directly. Each coverage (layer) has a series of related files, some of which have a .dbf extension. (A file's extension is the part of its name that follows the dot; it identifies the type of file.) A .dbf file may be edited directly in dBASE III, or in MS Excel when opened as a dBASE file. The usual editing can then be performed: correcting errors, adding new items, updating existing items, adding new records and so on. However, care must be taken that any change to the dimensions of the file (its number of rows or columns) remains compatible with the formats of the other files used with it. Otherwise serious, unrecoverable errors may emerge, producing further errors and a breakdown of the analysis. Many users prefer this method of data entry, updating and editing because MS Excel is Windows-based software and thus much easier to use than the DOS-driven TABLES module, which in turn activates the embedded dBASE software.
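A simple safeguard when editing a coverage's .dbf file outside the GIS is to compare its dimensions before and after editing. The sketch below assumes the standard dBASE III header layout (record count in bytes 4-7, header length in bytes 8-9, record length in bytes 10-11, and 32 bytes per field descriptor); the helper names and the test headers are hypothetical. A reported change is not necessarily an error, only a signal that the other files of the coverage must be checked.

```python
import struct
from typing import List, Tuple


def dbf_dimensions(header: bytes) -> Tuple[int, int]:
    """Return (number of records, number of fields) from a dBASE III file header.

    Assumed layout: bytes 4-7 record count (unsigned 32-bit, little-endian),
    bytes 8-9 header length, bytes 10-11 record length. The header consists of a
    32-byte file descriptor, one 32-byte descriptor per field and a 1-byte terminator.
    """
    if len(header) < 12:
        raise ValueError("too short to be a dBASE III header")
    n_records, header_len, _record_len = struct.unpack_from("<IHH", header, 4)
    return n_records, (header_len - 33) // 32


def dimension_changes(before: bytes, after: bytes) -> List[str]:
    """List any change in rows (records) or columns (fields) between two headers."""
    (r0, f0), (r1, f1) = dbf_dimensions(before), dbf_dimensions(after)
    changes = []
    if r0 != r1:
        changes.append(f"records: {r0} -> {r1}")
    if f0 != f1:
        changes.append(f"fields: {f0} -> {f1}")
    return changes


def fake_header(n_records: int, n_fields: int) -> bytes:
    """Build the first 12 bytes of a hypothetical dBASE III header for illustration."""
    # version 0x03, last-update date YY MM DD (YY counted from 1900),
    # record count, header length, and an arbitrary record length of 32
    return struct.pack("<BBBBIHH", 0x03, 101, 2, 12, n_records, 32 * n_fields + 33, 32)


original = fake_header(5, 2)
edited = fake_header(6, 2)
print(dbf_dimensions(original))             # (5, 2)
print(dimension_changes(original, edited))  # ['records: 5 -> 6']
```

In practice the header bytes would be read from the file itself, for example with open(path, "rb").read(32), where path names the .dbf file being checked.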

4.3 ADDING RASTER ATTRIBUTES


A new overlay in a raster-based GIS may be created by changing the values in an existing overlay. In the SAGE GIS software this is done with the RECODE operation, and in IDRISI with the ASSIGN and RECLASS commands. The example below is taken from SAGE. The features on the BUILDINGS, ROADS, TREES and WATER map overlays will be combined to form a single overlay named SITEMAP. The identities of all the features on each map overlay are retained by RECODING each region in each overlay to a unique value (see Table 4.7).

Table 4.7 Examples of RECODE operations in SAGE GIS

________________________________________________________________________________
Map Overlay BUILDING.-BP
________________________________________________________________________________
Value   Label
0       NO STRUCTURE
1       HOUSE
2       PUBLIC BUILDING
3       DUMP
4       CEMETERY
5       GRAVEL PIT
________________________________________________________________________________
Map Overlay ROADS.-BP                     NEWROADS
________________________________________________________________________________
Value   Label                             Value   Label
0       NON-ROAD                          0       NON-ROAD
1       PRIMARY ROAD                      6       PRIMARY ROAD
2       SECONDARY ROAD                    7       SECONDARY ROAD
________________________________________________________________________________
Map Overlay TREES.-BP                     NEWTREES
________________________________________________________________________________
Value   Label                             Value   Label
0       OPEN LAND                         0       OPEN LAND
1       SOFTWOODS                         8       SOFTWOODS
2       HARDWOODS                         9       HARDWOODS
3       MIXED WOODS                       10      MIXED WOODS
________________________________________________________________________________
Map Overlay WATER.-BP                     NEWWATER
________________________________________________________________________________
Value   Label                             Value   Label
0       DRY LAND                          0       DRY LAND
1       STREAM                            11      STREAM
2       WETLAND                           12      WETLAND
3       POND                              13      POND
________________________________________________________________________________
Source: Itami & Raulings (1993: 14).

In the example above, intermediate overlays are created. NEWROADS is created from the existing ROADS overlay using the RECODE operation:

RECODE roads FOR newroads &
ASSIGNING 6 TO 1 &
ASSIGNING 7 TO 2

The first line recodes the ROADS overlay and creates a new overlay called NEWROADS. Line 2 instructs SAGE to replace all 1s in the ROADS overlay with 6s in the NEWROADS overlay. Line 3 instructs SAGE to replace all 2s in the ROADS overlay with 7s in the NEWROADS overlay. The zero values remain untouched because we wish to retain them in the new overlay. As the codes in the NEWROADS overlay do not yet have labels, these have to be added explicitly using the LABEL operation. The commands are as follows:

LABEL NEWROADS
0 Non-road
6 Primary roads
7 Secondary roads
-1

The first line instructs SAGE to execute the LABEL operation on the overlay NEWROADS. Each subsequent line gives a numeric region value and its corresponding label. A blank line or the value -1 signals the end of the labelling procedure. After all the layers have been recoded, a combined map called SITEMAP can be created. The COVER operation is used; it proceeds by overlaying two maps at a time. The values of the top map overlay replace the values of the lower map overlay unless the value of the top map overlay is zero, in which case the value of the lower map is kept. The commands in SAGE are as follows:

COVER NEWTREES WITH NEWWATER &
WITH NEWROADS WITH BUILDINGS &
FOR SITEMAP

The RECODE operation re-categorizes the cell values in the specified map overlay. The values may be either real or integer, and values or value ranges that are not specified retain their original values. Input overlays may be integer or real maps; the resulting overlay is real. If the input overlay is real, care should be taken to use the UPTO modifier, which ensures that ranges of floating-point numbers are assigned correctly.

IDRISI is a raster-based GIS authored by Ron Eastman of the Graduate School of Geography at Clark University, Worcester, Massachusetts. IDRISI uses the ASSIGN command to create new images by linking the geography of features defined in an overlay with attributes defined in an attribute file. This separation of data allows spreadsheet programs and database management packages to be used as part of the IDRISI system. For example, a single overlay may define a set of census tracts while the attributes associated with those tracts are entered, stored and partially analysed in a spreadsheet program. Only when a map of a particular attribute is required would ASSIGN be used to create that map. ASSIGN is also well suited to fast reclassification where the input data are integers. The command dialogue for ASSIGN is as follows:

Name of image of geographic features?
Name of new image to be produced?
Name of attribute values file?

Note: the attribute values file has two columns:
Column 1   Feature code corresponding to features in the feature definition image
Column 2   Data values to be assigned to those features

Any features omitted from this list will be assigned a value of zero in the output image.

Another command for manipulating coded data is the RECLASS operation in IDRISI. It classifies or reclassifies data stored as images (layers) or attribute values files into new integer categories, either by dividing the data range into equal intervals or by applying user-defined limits. The RECLASS procedure is as follows:

Name of file to be classified?
New name of file for result?
Choose [1] Equal intervals or [2] User-defined classification

NOTES. In option [1] the current minimum and maximum values of the input image are given and the user is asked whether a change is required. The user is then asked to specify either the number of classes or the width of each class. A title is required as the last question.
In option [2], for each class the user enters the new (integer) value of the class and is then asked for the range of old values to be assigned to it. The choice of old values is important: the class includes the lower of the old values but not the higher one. Finish by entering -1 as the new value of the next class. A title for the new image is then requested.
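The ASSIGN lookup and the two RECLASS options described above can be sketched in Python. This illustrates the rules stated in the text (omitted features become zero; a user-defined class includes its lower limit but not its upper one) and is not IDRISI's implementation; the function names, the numbering of equal-interval classes from 1, the treatment of unclassified cells and the sample values are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]


def assign(features: List[List[int]], values: Dict[int, float]) -> Image:
    """ASSIGN-style lookup: each feature code gets its value; codes not listed get 0."""
    return [[values.get(code, 0) for code in row] for row in features]


def reclass_user_defined(image: Image, classes: Sequence[Tuple[int, float, float]]) -> Image:
    """User-defined reclassification: (new_value, low, high) takes cells with low <= v < high.

    Cells that fall in no class are left unchanged here (an assumption).
    """
    def classify(v):
        for new_value, low, high in classes:
            if low <= v < high:
                return new_value
        return v
    return [[classify(v) for v in row] for row in image]


def reclass_equal_intervals(image: Image, n_classes: int) -> List[List[int]]:
    """Equal-interval reclassification: split [min, max] into n_classes bands numbered 1..n."""
    values = [v for row in image for v in row]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes

    def classify(v):
        if width == 0:
            return 1
        # the maximum value would start a new band, so it is folded into the last class
        return min(int((v - lo) // width) + 1, n_classes)
    return [[classify(v) for v in row] for row in image]


# Hypothetical census-tract codes and attribute values file (code -> value).
tracts = [[1, 2], [3, 1]]
population = assign(tracts, {1: 250, 2: 90})

# Hypothetical user-defined classes: the value 5 falls in class 2, not class 1.
classes = reclass_user_defined([[1, 5], [10, 15]], [(1, 0, 5), (2, 5, 10), (3, 10, 20)])
bands = reclass_equal_intervals([[0, 5], [10, 7.5]], 2)

print(population)  # [[250, 90], [0, 250]]
print(classes)     # [[1, 2], [3, 3]]
print(bands)       # [[1, 2], [2, 2]]
```

Keeping the attribute values in a separate dictionary mirrors the separation of geography and attributes that makes ASSIGN convenient to pair with a spreadsheet.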


These operations give users considerable flexibility in the analysis and manipulation of attribute data. In raster systems data volumes are large, so processing times can be long even for relatively small datasets. Reclassification combined with the use of colour can produce highly professional output in the form of coloured maps, either on screen or printed on paper.
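The SAGE example, RECODE followed by COVER to build SITEMAP, can be sketched on small integer grids. This minimal Python illustration is not SAGE itself: the 2 x 2 overlays are hypothetical, the recode tables follow Table 4.7, and each overlay named after WITH is assumed to be placed on top of the result so far.

```python
from functools import reduce
from typing import Dict, List

Overlay = List[List[int]]


def recode(overlay: Overlay, assignments: Dict[int, int]) -> Overlay:
    """RECODE-style sketch: assignments maps old value -> new value; others keep their value."""
    return [[assignments.get(v, v) for v in row] for row in overlay]


def cover(lower: Overlay, upper: Overlay) -> Overlay:
    """The upper value replaces the lower value, except where the upper value is zero."""
    return [[u if u != 0 else l for l, u in zip(lrow, urow)] for lrow, urow in zip(lower, upper)]


def cover_stack(bottom: Overlay, *tops: Overlay) -> Overlay:
    """Overlay two maps at a time, each later overlay going on top of the result so far."""
    return reduce(cover, tops, bottom)


# Hypothetical 2 x 2 overlays using the original codes from Table 4.7.
trees = [[1, 0], [2, 3]]
water = [[0, 1], [0, 0]]
roads = [[0, 0], [1, 0]]
buildings = [[0, 0], [0, 1]]  # codes 1-5 are already distinct, so not recoded

newtrees = recode(trees, {1: 8, 2: 9, 3: 10})
newwater = recode(water, {1: 11, 2: 12, 3: 13})
newroads = recode(roads, {1: 6, 2: 7})  # ASSIGNING 6 TO 1, ASSIGNING 7 TO 2
sitemap = cover_stack(newtrees, newwater, newroads, buildings)

print(sitemap)  # [[8, 11], [6, 1]]
```

Because every feature class ends up with a unique value, each cell of SITEMAP still identifies which original overlay and category it came from.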

Copyright 1999 George Cho. Last updated: February 2001.


Module 4 Chapter5_sum

MANAGING ATTRIBUTE DATA IN GIS


5. QUERYING ATTRIBUTE DATA

Querying a GIS is one of the most interesting, challenging and intellectually stimulating exercises an analyst can engage in. Here hypotheses may be tested and sometimes confirmed; depending on the results of a query, new directions for the research may be suggested and new hypotheses formulated. The latter also means charting a new course of action and exploring different avenues of research. In querying the database a user interacts directly with the system. Common examples include identifying properties in land records, querying municipal records for the maintenance and mapping of public utilities, and queries for transport planning.

5.1 A TYPOLOGY

A typology of queries includes the following types:

- Simple recall of data, either as a listing or to identify unique attributes. For example: List the attributes of object A.
- Location of spatial objects by unique attributes, scale and address matching. For example: Where is object A?
- Identification of spatial objects in terms of their attributes, and/or projection of the types of objects likely to be found within certain parameters. For example: What is (this) object A?
- Summary attributes of objects within a spatial range, in terms of total number, potential numbers, descriptive statistics and density. For example: Summarize objects within distance x.
- Summary attributes of objects within a region. For example: What objects are in Region Y?
- Definition of optimal routes within a network (least cost, least impact, fastest). For example: What is the best route?
- Identification of spatial objects that satisfy a set of criteria. For example: Show all objects satisfying these criteria.
- Establishing relationships between objects, either from data already stored in the database or computed from data available in it. Computed data of this kind are sometimes described as synthetic data because they are derived from existing data. For example: Show all links in the stream network below this link; show all oil wells owned by Company E; show the nearest road to this point.
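Several query types in this typology can be sketched over a small in-memory collection. This is a minimal Python illustration, not a GIS query language: the objects, their attributes and coordinates are hypothetical, and each object (including roads) is reduced to a single point for simplicity.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpatialObject:
    oid: str
    kind: str
    owner: str
    x: float
    y: float


# Hypothetical objects, each represented by a single point.
OBJECTS: List[SpatialObject] = [
    SpatialObject("W1", "oil well", "Company E", 2.0, 3.0),
    SpatialObject("W2", "oil well", "Company F", 8.0, 1.0),
    SpatialObject("R1", "road", "Council", 1.0, 1.0),
    SpatialObject("R2", "road", "Council", 9.0, 9.0),
]


def recall(oid: str) -> SpatialObject:
    """Simple recall: list the attributes of object A."""
    return next(o for o in OBJECTS if o.oid == oid)


def satisfying(criteria: Callable[[SpatialObject], bool]) -> List[SpatialObject]:
    """Show all objects satisfying these criteria."""
    return [o for o in OBJECTS if criteria(o)]


def distance(o: SpatialObject, x: float, y: float) -> float:
    return math.hypot(o.x - x, o.y - y)


def summarize_within(x: float, y: float, d: float) -> Counter:
    """Summarize objects within distance d of (x, y) as a count of each kind."""
    return Counter(o.kind for o in OBJECTS if distance(o, x, y) <= d)


def nearest(kind: str, x: float, y: float) -> SpatialObject:
    """A relationship computed from stored data: the nearest object of a kind to a point."""
    return min(satisfying(lambda o: o.kind == kind), key=lambda o: distance(o, x, y))


print(recall("W2").owner)  # Company F
print([o.oid for o in satisfying(lambda o: o.kind == "oil well" and o.owner == "Company E")])  # ['W1']
print(dict(summarize_within(0.0, 0.0, 4.0)))  # {'oil well': 1, 'road': 1}
print(nearest("road", 3.0, 3.0).oid)  # R1
```

The last function returns synthetic data in the sense used above: the answer is not stored anywhere but is derived from the stored coordinates.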

All these queries depend on the data available. Some queries cannot be answered because the data are unavailable, while others can be answered only in part. Querying a database replaces the anecdotal memory of staff and gives answers that can be more precise, more rapid, repeatable and of a higher volume than face-to-face enquiries can handle. For queries to be useful, a higher level of user expertise is needed: a user must use the system frequently to justify and maintain familiarity with it and to keep up with its changes and developments. A second need is a friendly user interface in which command-line entries are replaced by icons and pull-down menus, as in a graphical user interface (GUI) environment such as Windows. A final requirement is the structure of the database, since this determines the types of queries that can be made and whether the system is easy to use.

5.2 QUERY LANGUAGES


Information science has developed what are described as functional query languages to help users interrogate a database. Such languages may be either procedural or non-procedural. In a procedural query language the user must specify not only what is wanted from the database but also the procedure for obtaining it. This is hard work for the user, since it presumes that the user knows how the data are structured and stored and can ask the right questions to produce the answers a particular project requires. The user also needs to learn the query language itself, which may require some knowledge of computer programming. In a non-procedural query language, the user need only specify the objective of a query. An objective may simply be to find water pipes of a certain size and age in a database; the database is then processed to find the water pipes satisfying these criteria. The user needs no detailed knowledge of the query language, nor of how the data are stored and structured: because the system is non-procedural, the user need not specify how the relevant data are to be retrieved. Many interfaces have been developed to make queries simple and straightforward. A popular non-procedural interface is query-by-example (QBE), in which the user describes what is required in a report and the system works out how to produce it. The effect is similar to writing a program in a database query language, but the instructions are easy to learn and use.

5.3 QUERIES AND DATA TYPES

As noted before, the types of queries possible depend on the data model used: hierarchical, network or relational. In both the hierarchical and the network models, physical linkages are embedded in the data records, and these key fields are used to traverse the database.
As a result, the user is required to know the hierarchy in which the data are stored. A relational database model, on the other hand, is more flexible because there is no hierarchy among the attributes: any attribute can be used as a key to retrieve information. The data can be held in separate tables and related through an attribute field that the tables share. These relations are not explicitly coded in the database, and the user need not know the structure of the database to construct a query. An example of such a query language is Structured Query Language (SQL), developed by IBM. It is popular because it is easy to learn yet powerful, and it makes querying a database simpler and accessible to users with little or no computer training (Aronoff 1989: 162).

The following example is taken from Star & Estes (1990: 135-7). It shows a query dialogue, explanatory notes and the results of the query.

Sample query:
OPEN TABLE "FRONT NINE"
PRINT (HOLE, LENGTH) WHERE (PAR > 4)

Line 1: Instructs the system to look at the appropriate part of the database.
Line 2: Poses a specific question to the database manager.

Response:
QUERY #1: TABLE "FRONT NINE"

PRINT (HOLE, LENGTH) WHERE (PAR > 4)
Hole   Length
5      490
8      475
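The first FRONT NINE query can be expressed in SQL roughly as SELECT ... FROM ... WHERE, with OPEN TABLE corresponding to the FROM clause and PRINT to the SELECT list. The sketch below runs it against an in-memory SQLite table using Python's standard sqlite3 module. The table and column names are assumptions, and only the hole numbers and lengths of holes 5 and 8 come from the response above; their par values and the other rows are hypothetical filler.

```python
import sqlite3


def front_nine_long_holes():
    """Rough SQL equivalent of PRINT (HOLE, LENGTH) WHERE (PAR > 4) on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE front_nine (hole INTEGER PRIMARY KEY, par INTEGER, length INTEGER)")
    # Holes 5 and 8 with lengths 490 and 475 follow the response above;
    # their par of 5 and the rows for holes 1 and 3 are hypothetical.
    conn.executemany(
        "INSERT INTO front_nine (hole, par, length) VALUES (?, ?, ?)",
        [(1, 4, 380), (3, 3, 165), (5, 5, 490), (8, 5, 475)],
    )
    rows = conn.execute(
        "SELECT hole, length FROM front_nine WHERE par > 4 ORDER BY hole"
    ).fetchall()
    conn.close()
    return rows


print(front_nine_long_holes())  # [(5, 490), (8, 475)]
```

The user states only the condition (par > 4); how the matching rows are found is left to the database manager, which is what makes SQL non-procedural.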

In a relational database management system, more sophisticated queries may involve more than one attribute at a time.

Sample query:
PRINT (HOLE, TOTAL AREA) WHERE (DATE = 2/87) AND (TOTAL AREA > 10000)

Response:
QUERY #2: TABLE "FRONT NINE"
PRINT...
Hole   Total Area
2      20350
3      16980
5      21760

SUMMARY

The linking of spatial data with its attribute data is one of the most important features of any GIS. While spatial data provide the form and structure of spatial objects, attribute data provide the intelligence to the spatial data. How such textual data are stored depends on the type of GIS used, but in general they include both numeric codes and descriptive labels. The power of a GIS lies in the linkage between the spatial and the descriptive data, and different GIS software handle this task differently. In general, most vector systems use attribute tables, whether for points, lines or polygons, whereas raster systems code the attributes implicitly into the data themselves. Just as spatial joins and aggregation are possible, tabular data may likewise be joined in operations variously known as relates and relational joins. A short cut that avoids coding spatial features repetitively is the use of look-up tables, in both vector and raster systems. Tables, files and fields were discussed and their differences explained.

Database management systems formed the subject of the second section, since a well-constructed database provides functionality and ease of use. The design steps were discussed, including why conceptual issues, logical issues and physical resource implications are important. The characteristics of a database were highlighted, together with the types of databases that aid the integration of textual and spatial data. Different kinds of file types were also described, including flat files, hierarchical files, networks and databases.

While attribute information describes "what is it?", several of its features require care and attention. This was the subject of the third section, where the volume of data, definitional problems and attribute values were discussed together with issues of accuracy and error. Steps in the creation of attribute tables were demonstrated using a particular GIS software package.

In the fourth section the analysis and manipulation of attribute data using simple text editors and other proprietary software was demonstrated.
The need to change attributes was shown under attribute operations, for example where a key operation, the spatial overlay, produces knock-on effects on the descriptive data. How attributes may be added, updated and removed in vector and raster data structures was also discussed. Finally, in the fifth section, the querying of attribute data was shown to be one of the most interesting, challenging and intellectually stimulating exercises an analyst can engage in. A typology of queries was drawn up, suggesting that different kinds of query languages might be needed for different data structures. A brief discussion of procedural and non-procedural query languages led to an examination of the needs of different data types, and examples were given to demonstrate the issues involved.

This summary concludes by reminding users of the fundamental differences between spatial and attribute data and the unique problems each poses, in the words of Jack Dangermond (1988: 568):

"It is important to recognize that there are fundamental differences between managing tabular data and map data. Present database management systems technology is very good for managing tabular data (adding, deleting or modifying records), but mapping involves much more than just the storage and retrieval of data. Database management systems technology is not very effective for updating and managing cartographic data, except the attribute data associated with spatial features. This is so because of the topological relationship between map features. When, for example, a land parcels polygon is adjusted, the relations of surrounding parcels to the first parcel and to each other must also be adjusted; the geometric attributes of all these parcels change. Thus manipulating spatial data is much more complex than managing tabular data. So, coordinate and topological data fundamental to a GIS do not belong in the traditional tabular database management systems environment."

FURTHER READING

Abel, D.J. (1989) SIRO-DBMS: A database tool-kit for geographical information systems, International Journal of Geographical Information Systems, v. 1, pp. 33-50. (An extension of the relational model for spatial data.)
Aronoff, S. (1989) Geographic Information Systems: A Management Perspective, Ottawa: WDL Publications. Chapter 5 Data Quality, pp. 13-136; Chapter 6 Data Management, pp. 151-187.
Banting, D. (1990) Unit 18. Modes of User/GIS Interaction, in Goodchild, M.F. & Kemp, K.K. (eds.) Introduction to GIS, NCGIA Core Curriculum, Santa Barbara, CA: NCGIA, pp. 18-3 - 18-9.
Burrough, P.A. (1986) Principles of Geographical Information Systems for Land Resources Assessment, Oxford: Clarendon Press.
Dangermond, J. (1988) A technical architecture for GIS, GIS/LIS '88 Proceedings, v. 2, pp. 561-570, Falls Church, VA: American Society for Photogrammetry and Remote Sensing.
Environmental Systems Research Institute (ESRI) (1990) Understanding GIS: The ARC/INFO Method, Redlands, CA: ESRI.
Goodchild, M.F. & Kemp, K.K. (eds.) (1990) Introduction to GIS, NCGIA Core Curriculum, Santa Barbara, CA: NCGIA.
Huxhold, W.E. (1991) An Introduction to Urban Geographic Information Systems, New York: Oxford University Press. Chapter 2 Geographic Information Systems Defined, pp. 25-63, and Database Management Systems, pp. 36-49.
Itami, R.M. & Raulings, R.J. (1993) SAGE: Introductory Guidebook, Parkville, Vic.: DLSR.
Maguire, D.J. & Dangermond, J. (1991) The functionality of GIS, in Maguire, D.J., Goodchild, M.F. & Rhind, D.W. (eds.) Geographical Information Systems, London: Longman, pp. 389-402.
Nyerges, T.L. (1989) Schema integration analysis for the development of GIS databases, International Journal of Geographical Information Systems, v. 3, pp. 153-184. (Looks at formal procedures for comparing and merging spatial database schemas.)
Nyerges, T.L. (1990) Unit 10. Spatial Databases as Models of Reality, in Goodchild & Kemp (eds.), pp. 10-3 - 10-9.
Star, J. & Estes, J. (1990) Geographic Information Systems: An Introduction, Englewood Cliffs, NJ: Prentice Hall. Chapter 7 Data Management, pp. 126-142; Chapter 8 Manipulation and Analysis, pp. 143-173.
van Roessel, J.W. (1987) Design of a spatial data structure using the relational normal forms, International Journal of Geographical Information Systems, v. 1, pp. 33-50. (For a discussion of the relational database model in GIS.)
White, G. (1990) Unit 43 Database Concepts I, in Goodchild & Kemp (eds.), pp. 43-3 - 43-11; Unit 44 Database Concepts II, pp. 44-3 - 44-8; and Unit 66 Database Creation, pp. 66-3 - 66-14.

REVISION

1. Describe the functional differences between databases, spreadsheets and statistical packages. Which would be more useful for: (a) research in a university department; (b) administrative record-keeping in a small business; (c) personal budget planning? (NCGIA 3-11).

2. What makes the concept of a spatial database unique relative to other types of databases? (NCGIA 10-9).

3. What is a database model, and why is it important for designing a database? (NCGIA 10-9).

4. Compare the four database models (flat file, hierarchical, network and relational) as bases for GIS. What particular features of the relational model account for its popularity? (NCGIA 43-11).

5. In what ways are the database issues of GIS different from those of databases generally? (NCGIA 44-8).

6. Explain the difference between attribute data and cartographic data. Give examples. (Huxhold: 61).

7. What makes a relational database different from one that is hierarchical or network? Why would this be important in a GIS? (Huxhold: 61).

8. A spreadsheet (eg. Microsoft EXCEL) allows the user to perform a variety of functions on tabular data. Discuss the possibility of a geographical spreadsheet. What would it do, and what applications would it have? (NCGIA 8-11).


9. Design a database for an airline reservation system. What types of entities and relationships would be needed, and what attributes of each entity? Would the concept of an object-pair be useful? (NCGIA 15-11).

10. The reference by Mounsey, H. & Tomlinson, R.F. (1988) Building Databases for Global Science (London: Taylor & Francis) contains chapters discussing a number of efforts to construct global databases. What particular problems do they present? (NCGIA 20-8).

CLASS EXERCISE. Your task is to examine an application of a GIS in the workplace. The case study should show the utility of mapping and GISs to private organisations, government agencies and the research community. In your case study include the following:

- Background: to the organisation, and an explanation of the problem to which GIS has been applied.
- Implementation: briefly discuss some of the technical aspects.
- Benefits gained by implementation, for example:
  - faster and more streamlined operations
  - improved customer service
  - reduction of manual tasks
  - increased capabilities through manipulation of spatial data
  - new products and services, eg. value-adding
  - business opportunities
- Cost-benefit analysis.
- Future plans.

Each case study should emphasize the graphics capabilities and use of database linkages, as well as the analysis and modelling capabilities of the GIS.

COMPUTER DEMONSTRATION
This is a computer demonstration of CDATA91, which incorporates SUPERMAP. CDATA91 comprises SUPERMAP: a database management system and a mapping system joined together. However, the mapping functions are different from any you may have seen before. Standard computer mapping systems require a user to input the points and vectors which make up the map outline, as well as other information to go on the map. SUPERMAP, on the other hand, has all the basic information on map boundaries stored in its database. You simply indicate which maps or areas are to be used. Names and other information can be added to a map, but the boundaries cannot be changed. SUPERMAP is a powerful tool for the extraction and manipulation of census data, providing access to data in a wide range of categories over a broad sweep of geographic entities. For more detailed instructions see the separate sheets handed out in class. The following page, entitled CDATA91, simply provides a navigational aid to the software. Its headings suggest tasks which users may wish to try out on their own.

CDATA91
1.0 Introduction. Filename 8 characters. Double click. Click. Drag. Repositioning a window: dragging the window title bar. Double click CDATA91 icon.
2.1 File menu. Open. HOBART.SMP. OK. Windows menu. Click on any map or chart number. Interpretation of legend: = > < signs. Chart 1. Histogram. x = flats % total private dwellings; y = no. of collection districts (CDs).
Chart 2. Scatter diagram. x = O/S born % total persons; y = flats % total private dwellings. Line of best fit.
2.2 Cascading windows. Minimising a window. Tiling windows. Tile Map 1, Map 2. Note difference in scales.
2.3 Deleting annotations. Standard annotations in Annotations menu. Check boxes.
2.4 Rescaling maps. Fill window. Percentage. Absolute.
2.5 Tables. HOBART.SMP icon. Scrolling a table: horizontal, vertical.
2.6 Data. Add Expression. List Items. More details. OK. Add items. Community profile. Expression. Heading. Close. Examine a data column. Click on Statistics. Sort: ascending/descending order.
2.7 Creating a new map. New Map. Annotating a map: Annotations. Standard Annotations. OK. Legend. Reposition legend. Classify. OK. Adding text: Annotations text. System. Choose font. Size. Attributes. Type in text. OK. Drag positioning rectangle to new position.
2.8 Creating a table. File New. No (not to save previous work).
2.9 Add areas by Add Areas. Area. Add Area. More detail. More detail. More detail to get Hobart C. Inner and Hobart C. Remainder. Select both. Click all subareas. Close.
2.10 Add data by Choose Data. Data. Choose Data. Selected Characteristics. More detail. Highlight total males, females, persons. Add data. Close.
2.11 Adding areas by visual selection. File New. No (to save). Area. Add Visually. X is for visual selection. Click and drag mouse to define rectangle. Zoom-in button. Click Local Government Area. Draw area boundaries symbol.
2.12 Adding areas by radial selection (symbol is a circle within a square).
2.13 Adding data by Choose Community Profile. Data. Basic Community Profile. Table by number. Ethnicity. More detail. Selected Characteristics. OK. Click Add. Close.
2.14 Exit CDATA91. File Exit. No to save.

SOFTWARE EVALUATION
You are provided with a demonstration version of CDATA91. Your task is to evaluate the functionality of this package. To evaluate this software you will also need to read Maguire & Dangermond (1991), who use ten criteria to assess functionality:

- capture of data;
- transfer of data;
- validation and editing of data;
- storage and structure of the dataset;
- restructuring the dataset from raster to vector and vice versa;
- generalization of the dataset in terms of smoothing and aggregation of features;
- transformation of datasets in terms of scale, rotation, translation and inversion of matrices;
- query capabilities, such as "Where is ...?";
- analysis capabilities, typically "What if ...?"; and
- presentation of results in terms of flexibility of presentation, including maps, graphs, tables, colours, symbols used, etc.
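The transformation criterion can be made concrete with a short sketch. The Python below is a minimal illustration, assuming plain (x, y) coordinate pairs and a similarity transformation (uniform scale, rotation about the origin, then translation); the function names and the parcel coordinates are hypothetical and are not taken from CDATA91 or any other package.

```python
import math

def similarity(points, scale=1.0, angle_deg=0.0, dx=0.0, dy=0.0):
    """Scale and rotate (x, y) pairs anticlockwise about the origin, then translate."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in points]

def inverse_similarity(points, scale=1.0, angle_deg=0.0, dx=0.0, dy=0.0):
    """Undo similarity(): remove the translation, rotate back, then divide by the scale."""
    a = math.radians(-angle_deg)
    c, s = math.cos(a), math.sin(a)
    result = []
    for x, y in points:
        x0, y0 = x - dx, y - dy
        result.append(((c * x0 - s * y0) / scale, (s * x0 + c * y0) / scale))
    return result

# A hypothetical 100 m x 50 m parcel in local coordinates.
parcel = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
moved = similarity(parcel, scale=0.5, angle_deg=90, dx=1000.0, dy=2000.0)
restored = inverse_similarity(moved, scale=0.5, angle_deg=90, dx=1000.0, dy=2000.0)
print(moved)
print(restored)
```

Applying the inverse and recovering the original coordinates (within floating-point rounding) is one simple check that could be used when judging how a package handles transformation and inversion.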

This is only a suggested start. You may, if you wish, include other criteria that you find useful. The required reading is also a minimum; you may need to look up other references for ideas. See also Burrough (1986) and Maguire (1991).

Reference: Maguire, D.J. & Dangermond, J. (1991) The functionality of GIS, in Maguire, D.J., Goodchild, M.F. & Rhind, D.W. (eds.) pp. 389 - 402.

Citation
To reference this material the correct citation for this page is as follows: Cho, G. (1995) A Self-Teaching Student's Manual for Geographic Information Systems. (Insert here the Module Number and Name) Canberra: University of Canberra and CAUT. Online URL, http://infosys-law.canberra.edu.au/gismodules/index.html, as of (...) [date and time].

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho. Last updated: February, 2001.

Manual 5 T-O-C and Introduction

INTEGRATING REMOTE SENSING WITH GIS


MODULE 5

TABLE OF CONTENTS
Preface
Acknowledgments
Introduction
Materials required
Aims
Objectives
1. Remote Sensing and GIS: Definitions
1.1 Geographic Information Systems
1.2 Remote Sensing
1.3 Image Analysis
1.4 Photogrammetry
1.5 Cartography
2. Integrating Remote Sensing with GIS
2.1 Why integrate?
2.2 Some possible modes of integration
2.3 Problems and obstacles to integration
3. Solutions to Problems of Integration
4. Remote Sensing and GIS as Tools or Science
5. Future Prospects: Remote Sensing with GIS
Summary
Further Reading
Revision
Class exercise
Computer demonstration
Glossary of terms
Index

PREFACE
http://infosys-law.canberra.edu.au/gismodules/manual_5/m5_tab_intro.html (1 von 6) [09.06.04 09:41:46]


Integrating Remote Sensing with GIS is the last of five modules in the series A Self Teaching Student's Manual for GIS. This manual is the result of work undertaken for a Committee for the Advancement of University Teaching (CAUT) National Teaching Development Grant for 1995. The four previous modules are: Introduction to Geographic Information Systems, Raster GIS: An Introduction, Vector GIS: An Introduction and Managing Attribute Data in GIS. To complete this self-contained unit successfully, users should be prepared to spend approximately ten hours reading and working with the manual, writing up results, doing extra reading and attempting an assessment exercise. The presentation style used in this series may be described as a spiral curriculum. In such a curriculum, content introduced in one module is used again in a later module, in more depth and detail, whenever the same or similar concepts are encountered. In general, there are four parts to a module:

1. the text presents both the conceptual and practical aspects of the module with examples from as many usages as possible;
2. diagrams, figures and other illustrative materials are used to explain and show relationships;
3. questions, exercises and problems to be solved, and an assessment; and
4. suggestions for further reading and research.

(See curriculum chart in Figure 5.1.)

Figure 5.1 Interlocking modules of the spiral curriculum

It is advisable that you use this workbook as your personal notepad. A highlighter or bright coloured biro to underline text will help identify important points. In this workbook all important concepts, words and phrases are set out in bold letters, and words and phrases that carry meanings different from their usual ones are italicised. To begin with, you should browse through this workbook very quickly just to get a feel for its contents. Reading this preface helps! A tutor may walk you through this workbook, but the pace may or may not suit you. You should try to go through this workbook at a pace with which you are comfortable. The appendices are an important component of the overall module because they contain important tools: hints on using computers, a glossary, an index, and some answers to workbook problems are provided there.

ACKNOWLEDGMENTS
I should also like to thank the following individuals and publishers for permission to reproduce their illustrations and examples in this workbook:

- Curran, P.J. Principles of Remote Sensing (1985: 2, Figure 1), reproduced here as Figure 5.2;
- Aronoff, S. Geographic Information Systems: A Management Perspective (1989: 79, 82, Figures 3.16 and 3.18), reproduced here as Figure 5.4;
- Colwell, R.N. Manual of Remote Sensing, 2nd ed. (1983), for Figure 5.5; and
- Fisher, P.F. & Lindenberg, R.E. (1989: 1432-33), for Figure 5.7.

INTRODUCTION


This is the final module in the series A Self Teaching Student's Manual for GIS and is concerned with Integrating Remote Sensing with GIS. This module attempts to provide a holistic view of studies of spatial information systems. Such a module is necessary for three reasons. First, a course in GIS would be incomplete if it were to ignore a major source of spatial data, namely that provided by aerial photographs, photogrammetry and remote sensing (RS) methods. There is also a need to know how, when and why such data may be used to greatest advantage, so a discussion of the integration of RS and GIS seems most pertinent. Secondly, many students move on to pick up knowledge about RS as a separate course of study. Indeed, others may have recently completed studies in remote sensing and wish to use their skills in the GIS area. This module therefore provides some suggestions as to how GIS may use remotely sensed data in project work and in the development of analytical methods. Finally, this module also provides a brief discussion of the status of remote sensing and GIS, both as tools of analysis and as scientific endeavours which result in new discoveries and knowledge.

This module is presented in five parts. After this introductory section, part one deals with definitional issues. RS and Image Analysis Systems (IAS) are defined and described, and these are contrasted with present understanding of GISs. As image data are involved, photogrammetry and air photo interpretation are also defined to place each of these subjects in relation to the others. Also, as the final result of any project involves the production of maps, the place of cartography in this scheme of things is defined. In part two, the advantages of integrating RS with GIS are discussed in order to draw out the principles governing integration as well as some of the obstacles and problems one may encounter.
In part three, some solutions to integration are suggested and conclusions are drawn with regard to present and future trends. Part four deals with the much-vexed question of whether RS and GIS are both mere techniques or whether these studies may justifiably be considered sciences in themselves and part of spatial information science. The concluding section of this module discusses future prospects of GIS with RS and the convergence and divergence of technologies. This module concludes with a brief summary followed by suggestions for further reading and revision exercises. A glossary and index follow to help the reader navigate through this workbook.

MATERIALS REQUIRED
- Bright coloured biro or highlighter.
- Sharp pencils, preferably HB (hard-black), rulers, erasers.
- A4 mm graph paper, tracing paper.
- Access to a personal computer (Intel-based IBM-compatible PC).
- Pocket stereoscope.

Computer demonstration software

- EASI-PACE

Internet
For students with access to the Internet, a visit to the following sites might be a profitable venture. Note that URL addresses are case-sensitive.

- "Geography, Maps and Remote Sensing" http://ice.ecdavis.edu/Cyberspace_Jump_Station/geography.html
- "SPOT High Resolution Visible Data" http://edcwww.cr.usgs.gov/glis/hyper/guide/spot
- "Nice Geography Servers: GIS & Remote Sensing" (a huge list of sites) http://www.frw.ruu.nl/nicegeo.htm#gis
- "Global Satellite Imagery" http://www.rsmas.miami.edu/images.html

AIMS
To have a student:

1. Appreciate the various components of spatial information systems: remote sensing, GIS, air photo interpretation, photogrammetry, image analysis and cartography.
2. Understand some basic principles of remote sensing systems and how these may be used in GISs.
3. Appreciate the advantages of integrating remote sensing with GIS.
4. Obtain a basic understanding of the principles governing the integration of remote sensing with GIS and the obstacles and problems of such integration.
5. Gain an insight into possible solutions to difficulties of integrating remote sensing with GIS.
6. Understand remote sensing and GIS as techniques and as a spatial information science.
7. Evaluate the future prospects of remote sensing with GIS as both convergent and divergent technologies.


OBJECTIVES
As a result of completing this module a student should be able to undertake the following tasks with some level of understanding and competence.

1. Define the various components of spatial information systems, in particular remote sensing, Geographic Information Systems, air photo interpretation, photogrammetry, image analysis, and cartography.
2. Describe the different ways remotely sensed data are captured, stored, retrieved, manipulated, and used in image analysis systems.
3. Explain the need to integrate remote sensing with GIS.
4. Discuss how remotely sensed data may be used in a GIS, including the advantages, problems and obstacles to integration.
5. Provide suggestions to solve some of the problems of integrating remote sensing with GIS.
6. Explain the basic tenets of scientific discovery and explanation in contrast to the use of methodologies as tools of scientific analyses.
7. Argue the case that remote sensing and GIS are both tools for scientific analysis as well as scientific disciplines in themselves.
8. List the future prospects of remote sensing with GIS and outline trends in the convergence and divergence of these technologies.

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho. Last updated: February, 2001.

Manual 5 Chapter1

INTEGRATING REMOTE SENSING WITH GIS



1. REMOTE SENSING AND GIS: DEFINITIONS


In this section we compare definitions of the major components of spatial information systems (SIS). The usefulness of such an exercise lies not only in drawing the boundaries of each of the sub-components of SIS but also in identifying the areas where they overlap. Having done so, it becomes possible to consider the components in a linear fashion, in which the output of one forms an input to the next. An alternative view places them in a hierarchy in which each component forms a minor part of the whole. While any definition may have many alternatives, the ones used here are considered representative and contain most of the requisite elements that best describe the component being defined.

1.1 Geographic Information Systems
Several definitions of GISs have been given previously in Module 1. Personal preferences may dictate one definition over another. Indeed, the definition of GIS itself is not a closed one. But in the final analysis there are two main themes that can be distinguished and which recur time and again: definitions that emphasize the technological aspects of GIS and those that focus on its problem-solving aspects. The technological aspects may be observed most readily in the following definition by Burrough (1986: 6): "Geographical Information Systems ... [are] a powerful set of tools for collecting, storing, retrieving at will, transforming, and displaying spatial data from the real world for a particular set of purposes." Such a bias towards the technology is not surprising, given that much GIS activity concentrates on the computer-related aspects of the field, which preoccupy many minds. However, the trend is towards emphasising the problem-solving aspects of a GIS.
Thus, Goodchild (1985: 36) gives the following definition: "A GIS is best defined as a system which uses a spatial data base to provide answers to queries of a geographical nature". In this definition the emphasis is on the analytical aspects of the system. A definition that focuses more squarely on the problem-solving strengths of GISs is provided by Cowen (1988: 1554): "A GIS is ... a decision support system involving the integration of spatially referenced data in a problem solving environment". Within a GIS there are four components (Marble et al. 1984) that integrate the spatially referenced data. Data input often comprises spatial or thematic data derived from a combination of existing maps, field observations and measurements, and interpretation of remotely sensed (aircraft and satellite) imagery. Data manipulation and data analysis enable users to define procedures that generate information derived from the raw data. Data reporting is used to tabulate the results, generate maps and produce visual displays and reports.

1.2 REMOTE SENSING
As with GIS, there are many definitions of remote sensing in the literature. Almost all of them exhibit a high degree of agreement about what the field purports to do. A most authoritative definition of remote sensing is found in the Manual of Remote Sensing (Colwell 1983: 1): "the gathering and processing of information about the Earth's environment, particularly its natural and cultural resources, through the use of photographic and related data acquired from an aircraft or satellite".

http://infosys-law.canberra.edu.au/gismodules/manual_5/m5_chap1.html (1 von 11) [09.06.04 09:42:25]


Such a definition emphasizes both the method of data acquisition and the type of information that is being collected. Some authors have reviewed and compared a number of such definitions; Fussell et al. (1986: 1510) in particular concluded that remote sensing is a science and that its definition should include some or all of the following elements:

- non-contact acquiring, collecting and recording;
- from regions of the electromagnetic spectrum (typically, although not exclusively) that include but exceed the visible region;
- through the use of instruments located on mobile platforms; and
- the symbolic transformation of collected data by means of interpretative techniques and/or computer-aided pattern recognition.

Curran (1987: 305), however, takes issue with the contention that remote sensing is a science. Curran prefers to label remote sensing a technique, because its users are not in pursuit of knowledge but rather use remote sensing to solve problems. In general, therefore, remote sensing includes all of the following activities:

- data collection, in which the sensors used are not in contact with the object being analysed and electromagnetic energy is used as the means of transferring information. Electromagnetic energy includes light, heat and radio waves, which help to detect, identify and measure target characteristics;
- preprocessing, to prepare the data to be used in any analytical system; and
- image presentation, either as an interim dataset or as a final product to be incorporated into a report.

Advances in remote sensing technology have been ascribed to three key areas of development which have enhanced the use of such data. First, developments in remote sensing instruments have enabled the acquisition of image data outside the visible and photographic near-infrared regions of the electromagnetic spectrum, into the thermal infrared and microwave regions. Second, the deployment of earth orbiting satellites provides a vantage point that is not possible from aircraft-mounted sensors. Current remote sensing systems acquire imagery in either analogue or digital form. The third development is the computer processing of digital imagery for rapid data processing and extraction of information. However, while fast and objective, some of these methods of information extraction are less accurate than visual interpretation. Nevertheless, these systems provide data and information for resource and environmental management. Figure 5.2 summarizes the structure of remote sensing as a discipline.

Figure 5.2 Structure of remote sensing as a discipline

Source: Curran (1985: 2) (Figure 1).

1.3 IMAGE ANALYSIS


Large quantities of remotely sensed images are being generated by earth orbiting satellites such as Landsat and SPOT. The Earth Resources Technology Satellite (ERTS-1), renamed Landsat 1, was launched in 1972. The satellite carried sensors capable of providing synoptic views of the Earth's surface every 18 days (Curran 1985: 2). The Landsat-4 remote sensing satellite (launched in 1982) has two sensors, the Thematic Mapper (TM) and the Multispectral Scanner (MSS), with the former producing ten times more data than the latter. A full TM scene measures 185 km across and may be analysed into 32 classes. The SPOT (Système Pour l'Observation de la Terre) program launched SPOT-1 in early 1986; it carried two identical high resolution visible scanners, which operate either in the panchromatic mode (10 m pixels) or in the multispectral mode (20 m pixels in 3 bands) (Aronoff 1989: 76-84).

Figure 5.3 Digital satellite image of Belconnen, ACT with several enlargements (magnifications approximate).
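The pixel sizes quoted for SPOT translate directly into data volumes. As a rough sketch, assuming square pixels and counting one value per pixel per band (ignoring compression, file headers and the actual scene size), the Python below compares the two SPOT modes over one square kilometre. The function name is hypothetical.

```python
def values_per_area(area_km2, pixel_size_m, bands=1):
    """Count of recorded values covering area_km2 with square pixels of pixel_size_m."""
    pixels_per_km2 = (1000 / pixel_size_m) ** 2
    return area_km2 * pixels_per_km2 * bands

panchromatic = values_per_area(1, pixel_size_m=10)            # 10 m pixels, 1 band
multispectral = values_per_area(1, pixel_size_m=20, bands=3)  # 20 m pixels, 3 bands
print(panchromatic, multispectral)  # 10000.0 7500.0
```

The panchromatic mode thus records four times as many pixels as the multispectral mode over the same ground, although the three multispectral bands together still amount to 7,500 values. Finer resolution and more bands both multiply the volume of data an image analysis system must handle.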


The information content of these data sets or images can best be extracted using computer programs built for the purpose. Such systems have been described as Image Analysis Systems (IAS) (Goodenough 1988). An IAS has five elements:

- data acquisition
- preprocessing
- analysis
- accuracy assessment
- information distribution
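The analysis and accuracy assessment elements can be illustrated with a deliberately small sketch that is not taken from any particular IAS. It assumes a hypothetical two-band image, uses a minimum-distance-to-means classifier (a simple supervised method chosen here for brevity, not the principal components or ordination methods mentioned later in this section), and reports overall accuracy as the proportion of reference pixels classified correctly. All pixel values and class names are invented.

```python
import math

def class_means(training):
    """Mean spectral vector of each class, from {class name: [pixel vectors]}."""
    means = {}
    for name, pixels in training.items():
        bands = len(pixels[0])
        means[name] = tuple(sum(p[b] for p in pixels) / len(pixels) for b in range(bands))
    return means

def classify(pixel, means):
    """Assign the class whose mean is nearest in spectral (Euclidean) distance."""
    return min(means, key=lambda name: math.dist(pixel, means[name]))

def overall_accuracy(predicted, reference):
    """Proportion of reference pixels whose predicted class matches the reference."""
    agree = sum(p == r for p, r in zip(predicted, reference))
    return agree / len(reference)

# Hypothetical training sites: (band 1, band 2) brightness values for each class.
training = {
    "water": [(12, 8), (14, 9), (13, 7)],
    "vegetation": [(40, 90), (45, 95), (42, 88)],
    "urban": [(80, 70), (85, 75), (78, 72)],
}
means = class_means(training)

# Hypothetical check pixels whose true classes are known from ground truth.
check_pixels = [(15, 10), (44, 91), (82, 71), (60, 80)]
reference = ["water", "vegetation", "urban", "urban"]
predicted = [classify(p, means) for p in check_pixels]
print(predicted)
print(overall_accuracy(predicted, reference))  # 0.75: the last pixel is misclassified
```

Here the fourth check pixel lies between the vegetation and urban means and is assigned to vegetation, so overall accuracy is 0.75. This is why accuracy assessment against reference pixels is kept as a separate element of the system rather than assumed.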

According to Goodenough (1988: 167), the usual input for a remote sensing IAS is a computer compatible tape (CCT) containing a digital image acquired at a remote sensing satellite tracking station. In some organisations, corrections for sensor-related radiometric and geometric errors will have been performed at the tracking station. Further corrections take place in the preprocessing stage, where other radiometric, geometric and atmospheric corrections are performed. Such corrections are highly specialized and concentrate on specific aspects of data collection and storage; they are not normally provided by image production systems. Examples of specialized corrections include:

- correction of radiometric distortions due to view angle;
- geometric compensation for terrain relief;
- projection of imagery to a variety of map projections; and
- refined atmospheric corrections with the help of meteorological information.
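Two simpler relatives of these corrections can be sketched in a few lines of Python. The first is a linear radiometric calibration that converts raw digital numbers (DN) to radiance using a gain and an offset, the general form used for sensor calibration; the gain and offset below are placeholders, not published coefficients for Landsat, SPOT or any other sensor. The second is dark-object subtraction, a crude image-based haze correction included as a stand-in for, not an example of, the refined meteorologically informed atmospheric corrections listed above.

```python
def dn_to_radiance(band, gain, offset):
    """Linear calibration applied to every pixel: radiance = gain * DN + offset."""
    return [[gain * dn + offset for dn in row] for row in band]

def dark_object_subtract(band):
    """Subtract the darkest value in the band from every pixel.

    Assumes the darkest pixel (e.g. deep clear water or shadow) should be near
    zero, so any signal it records is attributed to atmospheric haze.
    """
    darkest = min(min(row) for row in band)
    return [[value - darkest for value in row] for row in band]

# Hypothetical 3 x 3 band of raw digital numbers.
dn = [[20, 22, 35],
      [21, 60, 90],
      [20, 48, 110]]
radiance = dn_to_radiance(dn, gain=0.5, offset=1.0)  # placeholder coefficients
corrected = dark_object_subtract(radiance)
print(corrected)  # [[0.0, 1.0, 7.5], [0.5, 20.0, 35.0], [0.0, 14.0, 45.0]]
```

Both steps work pixel by pixel on the whole band, which is why they are usually applied once, in preprocessing, before any training or classification.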

Figure 5.4 Landsat 4 & 5 satellite platform with the Thematic Mapper and Multispectral Scanner, and the SPOT satellite with the High Resolution Visible (HRV) scanners.


Source: Aronoff (1989: 79, 82) (Figures 3.16 and 3.18).

The result of all this work is to give analysts corrected images to work with. When the image has been preprocessed it is then used for training and classification. The word training refers to the process of using a small part of the image, for which the analyst has additional information, as a training site. Here all the relevant signals in terms of ground data and thematic data are used as a guide to correctly identify features on the image. Once such details have been agreed upon, the image can be classified, in terms of its spectral signatures and spatial classes, according to various features: geological, soil, vegetational, land use, cultural and so on. Such classification procedures use statistical methods such as principal components analysis and other ordination methods to produce like classes of features. The accuracy of the classification is then assessed by comparison with known theoretical parameters and estimates based on real-world known class statistics, by reference to test training sites, or both. Finally, the derived information is distributed in the form of tables, maps, computer tapes or images for use in projects or as input data into GISs and other environmental information systems. As an example, the Landsat Digital Image Analysis System (LDIAS) of the Canadian Centre for Remote Sensing includes all of these image analysis functions plus the GIS components described above. This pattern of image analysis has been emulated at other remote sensing and GIS centres. A conceptual framework of image analysis is shown in Figure 5.5.

Figure 5.5 Conceptual framework of image analysis.

Source: Adapted from Colwell (1983).

In some systems IAS are integrated with GISs, and thematic and attribute data from a GIS are used to guide the selection of suitable training sites. The availability of such additional information is invaluable in the further processing of satellite imagery and in enhancing the accuracy of the images that are subsequently distributed. Often, the thematic polygons in a GIS contain many more spectral and spatial classes that may be used during the training and classification stage of image analysis. Moreover, it is important that the user has manual control over the selection of the training areas, because the user may have ground-truthed the area and thus have personal knowledge of it. However, this may not always be the case, since some areas may be inaccessible or the scale may be inappropriate for any ground-truthing to be conducted.

1.4 PHOTOGRAMMETRY
Photogrammetry includes techniques of obtaining precise measurements of objects from photographs. It is the art and science of obtaining reliable spatial measurements from aerial photography and other remotely sensed images (Aronoff 1989: 110). Aerial photographic interpretation is defined as the act of examining photographic images for the purposes of identifying objects and judging their significance. The process of interpretation includes detection, recognition and identification, analysis, deduction, classification, idealisation and accuracy determination (Curran 1985: 95). The key words in both photogrammetry and aerial photo interpretation are measurement and interpretation from photographic images. The relevance of measurement and interpretation is discussed below. As a general statement, aerial photography was the first method of remote sensing and even today remains the most widely used type of remotely sensed data. Curran (1985: 56) lists six characteristics of aerial photography that have made it so popular: availability, economy, synoptic viewpoint, time-freezing ability, spectral and spatial resolution, and three-dimensional perspective.
- availability: aerial photographs are readily available at a range of scales for much of the world;
- economy: aerial photographs are cheaper than field surveys and are often cheaper and more accurate than maps;
- synoptic viewpoint: aerial photographs enable the detection of small-scale features and spatial relationships that would not be evident on the ground;
- time freezing ability: an aerial photograph is a record of the Earth's surface at one point in time and can therefore be used as an historical record;
- spectral and spatial resolution: aerial photographs are sensitive to radiation in wavelengths that are outside the spectral sensitivity range of the human eye, as they can sense both ultra-violet (UV) and infrared (IR) radiation. They can also be sensitive to objects outside the spatial resolving power of the human eye;
- three dimensional perspective: a stereoscopic view of the Earth's surface can be created and measured both horizontally and vertically, a characteristic that is lacking for the majority of remotely sensed images (see Figure 5.6).
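The measurement side of photogrammetry rests on simple geometry. As a minimal sketch, assuming truly vertical photographs over level ground, the scale of a photograph is the camera focal length divided by the flying height above the ground, and the height of an object can be estimated from stereo measurements with the standard parallax equation h = H·dp / (p + dp), where H is the flying height above the object's base, p the absolute parallax at the base and dp the differential parallax. These are standard photogrammetric relations rather than formulas given in this module, and all the numbers below are invented.

```python
def photo_scale(focal_length_m, flying_height_m, ground_elevation_m=0.0):
    """Scale of a vertical photograph as a fraction: f / (H - h)."""
    return focal_length_m / (flying_height_m - ground_elevation_m)

def height_from_parallax(height_above_base_m, base_parallax_mm, differential_parallax_mm):
    """Object height from the parallax equation h = H * dp / (p + dp).

    Both parallaxes must be in the same units; the result is in the units of H.
    """
    return (height_above_base_m * differential_parallax_mm
            / (base_parallax_mm + differential_parallax_mm))

# Hypothetical flight: 152 mm lens, aircraft 3340 m above datum, ground at 300 m.
scale = photo_scale(0.152, 3340.0, ground_elevation_m=300.0)
print(f"photo scale about 1:{1 / scale:,.0f}")

# Hypothetical parallax measurements on a stereo pair showing a tall building.
height = height_from_parallax(3040.0, base_parallax_mm=90.0, differential_parallax_mm=1.5)
print(f"building height about {height:.1f} m")
```

With these inputs the photograph is at about 1:20,000 and the building is about 49.8 m high. Tilt, lens distortion and uneven terrain are ignored here, which is where the precise methods of photogrammetry come in.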

Figure 5.6 A stereo-aerial photo of the Tavurvur Volcano, Rabaul Caldera, Papua New Guinea.


Source: Department of Geological Engineering and Sciences, Michigan Technological University, Houghton, MI. URL address as at 9/1995 was http://www.geo.mtu.edu/volcanoes/rabaul/

However, not all environmental scientists regard aerial photographs as tools of the trade, because of difficulties in obtaining suitable aerial photographs, the need to interpret (as opposed to simply read) aerial photographs, uncertainty as to the equipment needed to interpret and take measurements from aerial photographs, and lack of knowledge of what aerial photographs have to offer. Nevertheless, aerial photo interpretation and photogrammetry remain useful and viable alternatives for obtaining data for input into GISs. Aerial photography is in its element when it comes to providing historical records of particular environments, and more specifically where the study concerns change detection: vegetation, land use, urban development and the like.

1.5 CARTOGRAPHY
Cartography is the science and practice of representing features of the Earth's surface graphically. In more formal terms a definition of cartography may be stated as follows: "the graphical representation of spatial relationships and spatial forms in what we call a map, and, very simply, cartography is the making and study of maps in all their aspects .... This includes teaching the skills of map use; studying the history of cartography; maintaining map collections with associated cataloguing and bibliographic activities; and the collection, collation and manipulation of data and the design and preparation of maps, charts, plans and atlases" (Robinson et al. 1985: 1-3). In the light of this extensive definition, it may be possible to distil four essential elements in cartographic work, namely:
- collecting and selecting the data for mapping;
- manipulating and generalising the data, designing and constructing the map;
- reading and viewing the map; and,
- responding to or interpreting the data.

Some observers, for example Dent (1985), distinguish between map making and cartography. Cartography, according to Dent (1985: 5), "requires the study of the philosophical and theoretical bases of the rules for map making, including the study of map communication". On the other hand, Muehrcke (1972: 1) considers map making "as the aggregate of those individual and largely technical processes of data collection, cartographic design and construction (drafting, scribing, display), reproduction, etc., normally associated with the actual reproduction of maps". The importance of such distinctions becomes apparent when one attempts to draw parallels and contrasts between remote sensing, GIS and cartography: there can be as many differences as there are similarities among the fields.

The definitions reviewed above suggest a considerable overlap in the scope of the three fields. Depending on which definition is selected, any one field can be interpreted as totally subsuming the activities claimed by the others. The definitions also reveal poorly defined boundaries between the fields and a lack of clarity about how they interrelate.

Different models may be constructed to show different interpretations of how these fields interact. In grappling with the three fields under discussion (remote sensing, GIS and cartography), Fisher & Lindenberg (1989) suggested four possible models of interaction and provided a critique of each (see Figure 5.7). The Linear Model of Interaction depicts a situation where remote sensing feeds data to a GIS and the GIS passes it on to cartography for display (Fig. 5.7a). In the Cartography Dominant Model, cartography occupies an unrivalled position with respect to the other sub-disciplines (Fig. 5.7b).
The third model, the GIS Dominant Model, suggests that the manipulation and analysis of information is all-important in all the sub-disciplines. Because this is a primary concern of GIS, the model gives GIS the dominant position (Fig. 5.7c). A final model, the Model of Three-way Interaction, gives no dominance to any of the sub-disciplines and recognises that each has unique, if overlapping, areas of knowledge and intellectual activity (Fig. 5.7d).

A critique of the models shows that they either fail to describe adequately the relationship between the sub-disciplines or default to a position in which all spatial information concerns fall under the umbrella of either cartography or GIS. The linear model precludes an overlap of cartography and remote sensing, and suggests that remotely sensed information is used in cartography only after it has been processed by a GIS. This is erroneous because aerial photographs, as analogue products, have been used for mapping purposes. It is also a narrow view, as other forms of digital data can be used as well. The cartography dominant model does not allow for a GIS answer to a spatial database query that is not in graphic form. The GIS dominant model subordinates the importance of the map as a means of communicating all the necessary information efficiently, and in turn the management of the data.

Figure 5.7 Four models of interaction: remote sensing, geographic information systems and cartography

Source: Fisher & Lindenberg (1989: 1431-34).

The three-way interaction model realistically represents the interactions among the three fields. No field is placed in a position of dominance over the others, or in isolation from them; there is interaction in all possible combinations of the three fields. Data acquisition and analysis are the emphasis in remote sensing. The analysis of geographic information to support decisions depends on the way in which the data are gathered. Cartographers may use map information that is a direct product of remote sensing or that has been processed by a GIS. The emphasis within cartography is the effective presentation of information as a map. This includes data analysis and manipulation where these facilitate that presentation. The three-way interaction model thus depicts a relationship that recognises these fields as having evolved to a point where they are no longer subsidiaries of cartography (see Fisher & Lindenberg 1989: 1431-34).

Send an e-mail message to cho@scides.canberra.edu.au with your comments and suggestions about this web site. Copyright 1999 George Cho. Last updated: February, 2001

manual 5 Chapter2_Summmary

INTEGRATING REMOTE SENSING WITH GIS


2. INTEGRATING REMOTE SENSING WITH GIS

The discussions above suggest that, cartography apart, there is a need to examine the rationale for integrating remote sensing with GIS. An integration of remote sensing with GIS should produce timely information that may assist in land and environmental management. However, the integration of these two systems has presented some difficulties. This section of the manual addresses the benefits and advantages stemming from integration, some possible models and principles of integration, and the problems and obstacles encountered in the integration process.

2.1 WHY INTEGRATE?

Environmental management, environmental monitoring and resource management all require information in order to make informed decisions. With adequate information on resources, physical and biological constraints and future potential, managers are able to model and predict the use of these resources under different scenarios. The data can also be mapped to show changes in the state of the environment and to detect and determine the causes of such changes. A considered view today is that remote sensing and GIS offer the potential to supply both the data and the analytical capabilities needed to derive and synthesize information of an appropriate kind. Further, the integration of the two fields is seen as providing the ultimate tool for resource management and environmental monitoring. Goodenough (1988) and Ehlers et al. (1989) have discussed the integration of remote sensing (RS) and GIS and the potential benefits of such integration. The following summarizes some of the main strengths of integration drawn from these discussions.

Timeliness. Some input data used by GISs can become obsolete quickly, for example land use, vegetation, forest types, soil erosion and stream run-off patterns. These need to be updated regularly if the thematic maps produced by GISs are to provide meaningful information. Remote sensing has been considered the most cost-effective means of updating spatial data, especially where updates are carried out routinely and automated, as opposed to manual, procedures are employed.

Accuracy. Remotely sensed data are routinely pre-processed before being released for use in other applications, and computer-assisted techniques improve the accuracy of the information extracted from them. Errors arising from instrumentation, as well as from other sources such as atmospheric disturbance, need to be rectified. Moreover, remotely sensed data can be made more useful by adding ancillary contextual data. For example, vegetation analysis using remotely sensed data can be improved by incorporating other data such as slope, aspect, elevation and climate parameters. Such ancillary information enhances the image analysis process. It enables better segmentation and labelling of the digital images, both prior to applying image analysis algorithms and prior to distribution for use in GISs.

Geometric registration with digital images. In GISs a prerequisite for accurate analysis is good geometric registration between data layers. Digital remote sensing images may be rectified geometrically using various algorithms, and the result may be verified by overlaying a GIS map layer. The overlay will show areas of poor visual registration. These areas can then be identified and their geometric errors corrected with the use of additional ground control points.

http://infosys-law.canberra.edu.au/gismodules/manual_5/m5_chap2_sum.html (1 von 13) [09.06.04 09:42:55]

Digital images as layers. Aerial photographs and images may be used directly as data layers in a GIS. For this they may need to be digitized as an orthophotographic map and treated as a data layer. The use of such orthophoto maps implies that the images have been corrected for relief displacement. The correction depends on the nature of the topography, the characteristics of the digital data and the accuracy of the data layers within the GIS being used. Once the necessary corrections have been made, these orthophoto maps (also known as ortho-images) provide very good base maps upon which other GIS data layers may be overlaid. An orthophoto is pleasing to the eye and can be easily understood by all without additional explanation.

Interchange of features and functions. Many commercial GISs can now display raster images simultaneously with vector-based cartographic elements such as points, lines and polygons. This means that there are data exchange functions that permit an interchange of data types between image processing systems and GISs. Such basic functionality is most beneficial because many tasks that may be quite difficult to do in an image processing system are relatively easy in a GIS, and vice versa. For example, overlay operations are easier to perform in a raster domain, while network and topological operations are better suited to a vector domain. As a consequence, the advantages of both data structures can be exploited, giving greater functionality to the user.

Modelling change. Spatial information systems have the data manipulation and modelling procedures needed for change detection and for predicting future scenarios. Such scenarios derive either from the extrapolation of current trends or from the effects of policy changes. An integrated remote sensing-GIS model may make it possible to analyse the impact of future changes in crop production and employment opportunities, and the effect of development on soil loss and slope erosion. When such impacts are mapped, the results can be very persuasive both for decision-making and for public dissemination. Thus, the ability to make predictions that have spatial impacts can be extremely beneficial to environmental and resource management planning.
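The geometric-registration strength described above can be sketched as a least-squares fit of a first-order (affine) transformation between the image coordinates and the map coordinates of ground control points (GCPs). The root-mean-square (RMS) residual then serves as a check on registration quality. This is a minimal pure-Python sketch under assumptions the manual does not state: a first-order transformation is adequate (no relief correction), and the GCP coordinates below are invented for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(image_pts: Sequence[Point], map_pts: Sequence[Point]):
    """Least-squares first-order transform from image (col, row) to map (x, y).

    x = a0 + a1*col + a2*row and y = b0 + b1*col + b2*row; needs at least
    three GCPs that do not all lie on a straight line.
    """
    # Accumulate the normal equations (A^T A) p = A^T v, each row of A being [1, col, row].
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (col, row), (x, y) in zip(image_pts, map_pts):
        basis = (1.0, col, row)
        for i in range(3):
            for j in range(3):
                ata[i][j] += basis[i] * basis[j]
            atx[i] += basis[i] * x
            aty[i] += basis[i] * y
    return _solve3(ata, atx), _solve3(ata, aty)


def rmse(image_pts, map_pts, coeffs) -> float:
    """Root-mean-square residual (map units) of the fitted transform at the GCPs."""
    a, b = coeffs
    total = 0.0
    for (col, row), (x, y) in zip(image_pts, map_pts):
        dx = a[0] + a[1] * col + a[2] * row - x
        dy = b[0] + b[1] * col + b[2] * row - y
        total += dx * dx + dy * dy
    return (total / len(image_pts)) ** 0.5


# Hypothetical GCPs for an image with 30 m pixels whose map y decreases down the rows.
gcp_image = [(0, 0), (100, 0), (0, 100), (100, 100)]
gcp_map = [(500000, 6100000), (503000, 6100000), (500000, 6097000), (503000, 6097000)]
coeffs = fit_affine(gcp_image, gcp_map)
print(coeffs, rmse(gcp_image, gcp_map, coeffs))
```

A large RMS residual, or a large residual at a single GCP, points to the kind of poor registration the overlay check is meant to reveal. Adding GCPs in those areas and refitting is the correction step the text describes.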

2.2 SOME POSSIBLE MODES OF INTEGRATION

Some commercial systems have incorporated additional features, such as raster-to-vector and vector-to-raster conversion and limited image processing, to enlarge the capabilities of GISs. However, Ehlers et al. (1989: 1620) consider that such basic functionality may be hampered by a lack of flexibility when dealing with higher levels of integration. For example, image analysis and remote sensing have tended to push GISs to their limits in modelling and simulation studies. This may be because of limitations in GIS software design, such that the entire software may need to be rethought if such modelling and simulation are to be possible.

Thus, a possible mode of integration is for GIS software to operate in tandem with image processing systems, with a common interface between the two to facilitate data interchange. Tandem processing would thus entail raster processing on one side and vector processing on the other. This approach has been used by companies that handle both image and cartographic data as separate products. A common user interface entails no change in either software package.

The basic functionality of GISs may also be hampered by a lack of extensibility in terms of the programming tools currently available. Some systems provide rudimentary programming tools, but GISs are often so complex structurally that adding a programming function may be too difficult and time-consuming to undertake. A database management system that uses fourth-generation or non-procedural languages to manipulate spatial entities would be highly desirable and may result in greater extensibility of the system. (See query languages in Module 4, pp. 46-48.)
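The vector-to-raster conversion mentioned above can be sketched as "burning" a polygon into a grid: each cell whose centre falls inside the polygon receives the polygon's class value. The Python below is an illustrative sketch using the even-odd (ray-casting) point-in-polygon test, which is one common technique rather than the method of any particular commercial GIS. The function names, grid origin, cell size and bottom-up row order are assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd (ray-casting) test: is (x, y) inside the closed polygon?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does a horizontal ray running right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(poly: List[Point], ncols: int, nrows: int, cell: float,
              x0: float = 0.0, y0: float = 0.0, value: int = 1) -> List[List[int]]:
    """Burn a polygon into a grid: a cell gets `value` if its centre lies inside.

    Row 0 is the bottom row here (y increases with the row index), a
    simplifying assumption; many raster formats store the top row first.
    """
    grid = [[0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            cx = x0 + (c + 0.5) * cell
            cy = y0 + (r + 0.5) * cell
            if point_in_polygon(cx, cy, poly):
                grid[r][c] = value
    return grid


# A hypothetical triangular parcel burned into a 4 x 4 grid of 1-unit cells.
for row in rasterize([(0, 0), (4, 0), (0, 4)], 4, 4, 1.0):
    print(row)
```

Cells cut by the boundary are counted only if their centre is inside. This is one reason converted layers can disagree slightly with the original polygons.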

High levels of integration have been achieved for certain problems, notably the extraction of cartographic features from digital imagery. Progress here has been facilitated by recent developments, including:

- software and hardware advances in GISs;
- the availability of high-resolution satellite data in digital format, such as SPOT and TM; and,
- new developments in automated information extraction, especially the application of image matching techniques for generating digital elevation models (DEMs).
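Image matching for DEM generation rests on finding, for a small window in one image of a stereo pair, the most similar window in the other image. The shift between them (the disparity, or parallax) is then converted to elevation. A common similarity measure is normalised cross-correlation (NCC). The sketch below is illustrative only. It assumes the pair has already been resampled so that corresponding points lie on the same row, and it works on a single row of invented values. It also omits the parallax-to-height conversion and sub-pixel refinement that operational systems use.

```python
from math import sqrt
from typing import List


def ncc(a: List[float], b: List[float]) -> float:
    """Normalised cross-correlation of two equal-length windows (-1 to 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # a flat window carries no match signal


def best_disparity(left: List[float], right: List[float], centre: int,
                   half: int, max_shift: int) -> int:
    """Shift (in pixels) along the row that best matches a left-image window.

    Assumes epipolar-resampled images, so a match lies on the same row.
    """
    template = left[centre - half: centre + half + 1]
    best_shift, best_score = 0, -2.0
    for shift in range(-max_shift, max_shift + 1):
        start = centre + shift - half
        if start < 0 or start + 2 * half + 1 > len(right):
            continue  # the candidate window would fall outside the row
        score = ncc(template, right[start: start + 2 * half + 1])
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Hypothetical rows: the bright feature appears two pixels further right in the second image.
left = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10, 10, 10]
right = [10, 10, 10, 10, 10, 50, 90, 50, 10, 10, 10, 10]
print(best_disparity(left, right, centre=4, half=2, max_shift=3))  # prints 2
```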

Satellite image data, especially from Landsat TM and SPOT, have been used to produce base maps and for other map revision tasks at scales previously thought impossible for remote sensing applications.

Another mode of integration is the use of images in terrain visualisation. Satellite images in combination with DEMs can produce realistic perspective views of the ground for planning and environmental impact analysis. Video techniques now permit simulated fly-bys, as if one were in an aircraft, or passages on foot over the terrain.

The idea of displaying elevation data as a surface is not new. Contours may be drawn (threaded) through the pixels of a raster display along lines of constant value. A search algorithm may be used to find the contours, although this may be very slow because the search is computationally intensive. The surface may also be shown in an oblique perspective view. This is achieved by drawing profiles across the raster, offsetting each profile and removing hidden lines. The surface may be further enhanced with colour by draping a second layer over the surface defined by the first, and the result can be very effective on a colour display terminal. In the film "LA The Movie", the Jet Propulsion Laboratory draped a Landsat image of Los Angeles over a layer of elevations and then simulated the view as if from a moving aircraft. To achieve realism and avoid jerky images, very fast and powerful computer processing was necessary both to simulate perspective and to remove hidden lines.

Another area of successful integration is the generation of DEMs from satellite stereo digital data. Because highly accurate digital data can be generated, DEMs and ortho-images at scales of 1:50,000 have become possible, and further topographic information for GIS applications can be derived from them.
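Threading a contour through a raster can be sketched by locating, between each pair of neighbouring pixel centres, the point where the surface passes the contour value, using linear interpolation. This is a simplified illustration under stated assumptions: pixel centres lie on an integer grid and the surface varies linearly between centres. It finds only the crossing points. Joining them into continuous lines, as the marching-squares algorithm does, is left out. The function name is hypothetical.

```python
from typing import List, Tuple


def contour_crossings(grid: List[List[float]], level: float) -> List[Tuple[float, float]]:
    """Points (x, y) where a contour at `level` passes between adjacent pixel centres.

    Pixel centres sit at x = column, y = row. Each horizontal and vertical pair
    of neighbours is tested, and the crossing is placed by linear interpolation.
    """
    pts = []
    nrows, ncols = len(grid), len(grid[0])
    for r in range(nrows):
        for c in range(ncols):
            z0 = grid[r][c]
            for dr, dc in ((0, 1), (1, 0)):  # right-hand neighbour, neighbour below
                r2, c2 = r + dr, c + dc
                if r2 >= nrows or c2 >= ncols:
                    continue
                z1 = grid[r2][c2]
                # The contour passes between the centres if one value is below
                # the level and the other is at or above it (so z0 != z1 here).
                if (z0 < level) != (z1 < level):
                    t = (level - z0) / (z1 - z0)
                    pts.append((c + t * dc, r + t * dr))
    return pts


# A hypothetical 2 x 3 elevation raster rising from left to right.
print(contour_crossings([[0, 10, 20], [0, 10, 20]], 15))  # [(1.5, 0.0), (1.5, 1.0)]
```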
Remote sensing in conjunction with GIS greatly enhances the capability of updating map information on a regular basis. SPOT data have been used in a GIS for regional growth analysis and local planning at a scale of 1:24,000. Change detection in forestry has also been demonstrated using cartographic techniques and remotely sensed data. GIS vector information has also been used to aid image segmentation. For example, cartographic data have been used in conjunction with SPOT multispectral data as a guide to segmenting land use in an agricultural area containing small plots of land. Others have combined segmentation results from filtered and unfiltered airborne radar data in a GIS using overlays, which made it possible to locate clear cuts in a forest scene on an automated basis. Finally, modelling in GIS, long performed in two dimensions using standard overlay techniques, is now being extended to include three-dimensional spatial modelling as well as dynamic temporal simulations. Such applications are crucial for several disciplines, such as climatology, geology, soils, hydrology and marine science.
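One simple way in which GIS vector boundaries can guide image interpretation is per-parcel (per-field) labelling. The parcel polygons are rasterized onto the image grid, and each parcel then takes the most frequent class among its pixels. This suppresses isolated misclassified pixels within small plots. The sketch is illustrative and not necessarily the procedure used in the work described above. The function name, arrays and class codes are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def per_parcel_majority(parcel_ids: List[List[int]],
                        classes: List[List[int]]) -> Dict[int, int]:
    """Assign each GIS parcel the most frequent per-pixel class inside it.

    parcel_ids: parcel identifier for each pixel (0 = outside all parcels),
                for example produced by rasterizing the parcel polygons.
    classes:    per-pixel class labels from an image classification.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for id_row, cls_row in zip(parcel_ids, classes):
        for pid, cls in zip(id_row, cls_row):
            if pid != 0:
                counts[pid][cls] += 1
    # most_common(1) returns [(class, count)] for the commonest class.
    return {pid: cnt.most_common(1)[0][0] for pid, cnt in counts.items()}


# Hypothetical example: two parcels, each containing one "noisy" pixel.
parcels = [[1, 1, 2, 2],
           [1, 1, 2, 2]]
pixel_classes = [[3, 3, 5, 5],
                 [3, 4, 5, 3]]
print(per_parcel_majority(parcels, pixel_classes))  # {1: 3, 2: 5}
```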

2.3 PROBLEMS AND OBSTACLES TO INTEGRATION

There are several impediments to the integration of remote sensing and GIS. Issues such as data interchange, and the accuracy both of classified remotely sensed data and of data stored in GIS layers, are logistical barriers to a high level of integration. These barriers may be divided into two groups: technical and institutional.

Technical

Data structures used in remote sensing and GIS have confounded attempts at integration. Remote sensing is almost completely oriented towards a raster approach to data capture, storage and analysis. GISs, on the other hand, have tended to be vector oriented, although there are raster GISs. Each of these data structures has its own advantages and disadvantages. Remote sensing has concentrated on raster analysis for more basic technical reasons. The processing needs of image analysis are very high, so raster data structures are the only feasible choice on slow computers. Moreover, detectors produce raster digital information directly, so processing the data in raster form seems a natural choice. Recent research also shows that human vision and perception accept images in raster form more readily than higher-level symbolic representations of geographic features. Finally, there are other data structures, such as quadtrees and other tessellations, that can represent features equally well. An example of the common usage of a raster data structure is given in the following figure.

Figure 5.8 Using the raster data structure in remote sensing and GIS.

Depending on usage, a raster representation can hold any of the values listed below:

- in a remote sensing image each pixel represents a number of values: spatial, temporal, spectral and radiometric;
- spatial values give the scale of the image: for example, a Landsat image covers an area of about 185 km by 185 km, whereas the resolution of SPOT images may be as fine as 10 m;
- temporal values give a view at one point in time, repeated in cycles (18 days for Landsat images);
- spectral values of remotely sensed images provide information on wavelengths (ultraviolet, infrared, microwave) as well as a digital number for the pixel;
- radiometric values are expressed in terms of electromagnetic radiation in a band of wavelengths;
- in a raster GIS each cell holds a single value linked to an attribute or a class of values.
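Quadtrees, mentioned above as an alternative to the simple raster, save storage by recursively dividing a square raster into quadrants until each block holds a single value. The sketch below assumes a square grid whose side is a power of two. It uses a nested-list node layout (NW, NE, SW, SE) chosen purely for illustration, and the function names are hypothetical.

```python
from typing import List, Optional, Union

# A quadtree node is either a uniform value (leaf) or a list of four
# children in the order NW, NE, SW, SE.
Node = Union[int, list]


def build_quadtree(grid: List[List[int]], r: int = 0, c: int = 0,
                   size: Optional[int] = None) -> Node:
    """Build a region quadtree for a square raster whose side is a power of two."""
    if size is None:
        size = len(grid)
    first = grid[r][c]
    if all(grid[i][j] == first
           for i in range(r, r + size) for j in range(c, c + size)):
        return first  # uniform block: store one value instead of size * size cells
    h = size // 2
    return [build_quadtree(grid, r, c, h), build_quadtree(grid, r, c + h, h),
            build_quadtree(grid, r + h, c, h), build_quadtree(grid, r + h, c + h, h)]


def count_leaves(node: Node) -> int:
    """Number of leaf blocks, i.e. the values actually stored."""
    return 1 if not isinstance(node, list) else sum(count_leaves(ch) for ch in node)


# A hypothetical 4 x 4 class raster.
grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 1, 0, 0]]
tree = build_quadtree(grid)
print(tree, count_leaves(tree))  # [1, 0, 1, [1, 0, 0, 0]] 7, i.e. 7 values instead of 16
```

The saving depends on how large the uniform areas are. A highly fragmented image may need more storage as a quadtree than as a plain raster.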

A second technical obstacle is that GISs rely on fairly uniform, predetermined data far removed from the raw end of data collection. In remote sensing, however, a data set is typically collected and the user has to decide how to use it as well as what to use from it. These approaches have several consequences. In GIS, tracking errors in the data is much harder because the user is far removed from data collection. Furthermore, data management in a GIS depends on the regular availability of the necessary data. In remote sensing, on the other hand, it is difficult to separate data collection from data processing. Here, data collection can be uneven in time, and coverage is far from uniform in terms of where, and at what scale, data are collected.

Many GISs use vector storage to represent thematic classes, whereas image analysis systems (IASs) use raster storage. An integrated IAS must therefore have software to convert raster to vector and vice versa. A GIS can have an unlimited number of classes: in forested areas, for example, there can be as many classes as there are tree or vegetation species. Most IASs are limited by the sophistication of the sensors used, usually to fewer than 256 classes.

The geometric accuracy of GISs for resource inventories is probably much poorer than that of corrected remotely sensed data. TM and SPOT imagery can be very accurate and cover a much larger geographical area (Landsat scenes cover about 185 km by 185 km). GIS data are usually derived from maps, or from a combination of maps and aerial photographs. When aerial photographs are used, errors such as displacement error may creep in.

GIS class labels may not correspond with detectable remotely sensed classes. Some low-resolution remote sensing data provide composite classes, such as streams running under a forest canopy. Tree types under the canopy cannot be identified clearly, nor can stream widths and channels. The only remedy is contextual knowledge of the area gained through ground truthing and fieldwork. Conversely, a resource manager may simply class a group of trees as forest, whereas remotely sensed data may distinguish different forest species. Other practical problems include mountain shadows, multiple classes, partial or complete clearcuts, and how to group and label these features.

Even where GIS and remote sensing data are standardized to a raster format, the methods of interpretation will depend on the set of classes under examination. A small number of spectral channels may be enough to identify water bodies, but many more bands and spatial features may be necessary to classify some forest species. What is used depends on processing and storage costs: the more spectral and spatial channels used, the more costly the analysis.
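The raster-to-vector direction of that conversion can be sketched at its simplest by emitting the cell edges that separate cells of different classes. A practical converter would go further, chaining the segments into labelled polygon rings and often smoothing the stair-stepped outlines. The function name and the (column, row) coordinate convention here are assumptions for illustration, not the algorithm of any particular IAS or GIS.

```python
from typing import List, Set, Tuple

Segment = Tuple[Tuple[int, int], Tuple[int, int]]


def class_boundaries(grid: List[List[int]]) -> Set[Segment]:
    """Vector boundary segments between cells of different class.

    Cell (r, c) occupies the unit square with corners (c, r) and (c + 1, r + 1)
    in (x, y) = (column, row) coordinates; each segment is a shared cell edge.
    Edges on the outer border of the grid are not included.
    """
    segs: Set[Segment] = set()
    nrows, ncols = len(grid), len(grid[0])
    for r in range(nrows):
        for c in range(ncols):
            if c + 1 < ncols and grid[r][c] != grid[r][c + 1]:
                segs.add(((c + 1, r), (c + 1, r + 1)))  # vertical edge
            if r + 1 < nrows and grid[r][c] != grid[r + 1][c]:
                segs.add(((c, r + 1), (c + 1, r + 1)))  # horizontal edge
    return segs


# A hypothetical 2 x 2 classified raster with one cell of class 2.
print(sorted(class_boundaries([[1, 2], [1, 1]])))
```

The stair-stepped boundaries such a conversion produces are one source of the mismatch between image-derived and map-derived boundaries discussed in the surrounding text.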
At a more practical level, inconsistencies in the making of topographic maps may make the registration of images to these maps impossible. Unfortunately, these inconsistencies cannot be solved algorithmically. Maps produced by different levels of government and maps based on high-resolution satellite imagery may not be reconcilable, which prevents any integration. Boundaries of water bodies on maps produced by different agencies may differ in both size and shape from those on the image. Remote sensing image segmentation may also not give the same boundaries as those on the GIS.

The computing environment in which remote sensing and GIS software reside may also impose technical constraints on integration. Labels on a GIS map, with their associated attribute information, are handled differently from digital images, which contain no such labels, only digital numbers. In a conversion between systems, say from a GIS to an IAS, these labels may not be recognised. An error is then returned in the IAS, and it may need to be rectified to remove any ambiguity. The problem can be a combinatorial one, especially if the map uses many labels. A combinatorial problem is one in which the number of errors grows factorially (that is, $n!$, read "n factorial", so that $4! = 4 \times 3 \times 2 \times 1 = 24$) rather than exponentially ($2^4 = 16$) or geometrically ($4 \times 2 = 8$).
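The growth rates contrasted above can be checked numerically. $n!$ overtakes $2^n$ at $n = 4$ and then pulls away rapidly, which is why errors that compound combinatorially become unmanageable on maps with many labels. The short sketch below tabulates $n!$, $2^n$ and $n \times 2$ (the third case in the text). The function name and table layout are illustrative choices, not from the manual.

```python
import math
from typing import List, Tuple


def growth_table(n_max: int) -> List[Tuple[int, int, int, int]]:
    """Rows of (n, n!, 2**n, n * 2) for n = 1..n_max."""
    return [(n, math.factorial(n), 2 ** n, n * 2) for n in range(1, n_max + 1)]


for n, fact, expo, lin in growth_table(8):
    print(f"n={n}: n!={fact:>6}  2^n={expo:>4}  n*2={lin:>3}")
```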

Institutional

What are interpreted as institutional barriers to integration really come down to who the various users of the systems are. In simple terms, there appear to be three main groups of users of both remote sensing and GIS technology: decision-makers in government and private industry, the academic research community and, finally, the non-expert public. Each of these groups has different needs and perceives the use of the technology in its own specialized way. Decision-makers, for instance, are in practice concerned with large-scale data for very particular projects, such as housing developments. Their needs for remotely sensed data and GIS data are thus highly specialized and detailed. The scientific research community, on the other hand, looks more globally and at a much smaller scale. Its concerns are global ones, such as modelling forest ecology subsystems or marine ecosystems, and its data needs include both spatial and temporal databases. Within this group there may also be different requirements. For example, a GIS for marine areas may need only relatively low spatial resolution but may have to address temporal variability in a three-dimensional environment. In contrast, a GIS for land parcel ownership data would need to be very accurate, and hence to have a higher spatial resolution, even though the data need only be static and two-dimensional. These functional differences between users of GISs may be more fundamental than those arising from remote sensing and cartography. A further institutional barrier is that decision-makers see a GIS as part of a total information system for the institution as a whole. In contrast, some in the scientific community see the system as a mere tool, independent of its organisational aspects.
The third group of users of GISs and IASs is the non-expert public, who may need to be shown the capabilities of each of these systems and how best to use each in order to extract the most from the data. Inexpert users such as farmers may need a system that not only gives instructions for image interpretation but also shows how to use GISs to draw out relationships. An integrated and simple system is thus required if the systems are to be used effectively. This is because farmers have built up a huge knowledge base about their own farms. Without the help of electronic devices and sophisticated data, they can tell which crops are best suited to which fields, how best to rotate crops from season to season, and the best times to open the sluice gates to irrigate their farms. Modern remote sensing and GISs only help to refine the processes that farmers have long used intuitively.

3. SOLUTIONS TO PROBLEMS OF INTEGRATION

Ehlers et al. (1989) have made several suggestions for overcoming some of the problems and barriers that have become apparent in attempts to integrate remote sensing with GIS. It has been argued that remote sensing is essentially a data acquisition technology and that its role is limited to serving as a data input tool for GIS. It should be noted, however, that remote sensing also includes data processing technology. This technology deals with a variety of wavelength regions and spatial resolutions, with data acquired at irregular time intervals. GIS technology, on the other hand, is focused on modelling regions and gaining a deeper understanding of the areas under examination. Thus, while remote sensing provides an objective view of an area through its images, GIS may provide a subjective, interpretative view of the same area. Combining both viewpoints may produce better information for planning and decision-making purposes.

The solution, then, requires a fundamental shift in thinking about both remote sensing and GIS, and this shift may begin with one's conception of space. Image and cartographic data both represent information about the world. Image data are space-filling, with a raster element for each pixel. This is described as a field representation of the world, because the pixels contain information on the reflectance values of objects at certain spectral wavelengths. Further processing and information are required before objects can be identified in the image data. Cartographic data are obtained by abstracting information about the world and placing objects in two-dimensional Euclidean space, hence an object representation of the world. Roads, houses and fields are shown on maps as objects representing features on the ground. Interpretation is thus required to understand the map. This is assisted by a legend, the association of objects shown on the map and prior knowledge from previous observations.
Thus, the successful integration of remote sensing with GIS will depend on the ability to understand and conceptualize the transition from one form of representation to the other. Ehlers et al. (1989: 1624) describe three levels of integration, which they label separate but equal, seamless integration, and total integration. Each level is a discrete avenue, although together they suggest a linear progression from the simple to the complex.

The separate but equal strategy involves the use of both raster and vector data in GISs and IASs and the ability to move results from IAS to GIS and vice versa.

Seamless integration involves tandem raster-vector processing in GISs as well as extensions to the existing capabilities of GISs. Such extensions would include:
- control over remotely sensed image components;
- incorporation of GIS vector data into image processing;
- the ability to handle spatial, radiometric, spectral and temporal data;
- the ability to handle hierarchical entities at different scales of analysis (for example, house, block, suburb, city);
- the capability to analyse errors; and
- the ability to generate simulations using cartographic data with image data in a temporal domain.

To date, however, no single system is capable of handling all these tasks.

In total integration, the false dichotomy between raster and vector data structures can be overcome by higher processing levels and hence higher levels of abstraction. A flexible method of handling object-based as well as field-based space representation is crucial in this integration, as is a GIS that accommodates both types of information at different hierarchical levels. This total information system may indeed incorporate an integrated query language that is image-integrated as well. The query language might use image analysis software to extract image information not explicitly recorded in the cartographic data structures.
Together these difficulties point to the fact that symbolic reasoning and expert systems may need to be used to solve problems of integration.
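The transition from a field representation to an object representation, discussed in this section, can be illustrated with connected-component labelling. Adjacent cells of the same class in a classified image are grouped into discrete objects, each of which can then carry attributes like a map feature. This is one standard technique chosen here for illustration; Ehlers et al. (1989) are not cited as prescribing it. The function name, 4-connectivity and example grid are assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple


def label_objects(field: List[List[int]]) -> Tuple[List[List[int]], Dict[int, int]]:
    """Turn a classified field (raster) into discrete objects.

    Adjacent cells (4-connectivity) sharing a class become one object.
    Returns an object-id grid and a table mapping object id -> class.
    """
    nrows, ncols = len(field), len(field[0])
    ids = [[0] * ncols for _ in range(nrows)]
    classes: Dict[int, int] = {}
    next_id = 0
    for r in range(nrows):
        for c in range(ncols):
            if ids[r][c]:
                continue  # cell already belongs to an object
            next_id += 1
            cls = field[r][c]
            classes[next_id] = cls
            ids[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the new object
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < nrows and 0 <= nx < ncols
                            and not ids[ny][nx] and field[ny][nx] == cls):
                        ids[ny][nx] = next_id
                        queue.append((ny, nx))
    return ids, classes


# A hypothetical classified image: 1 = water, 2 = forest.
print(label_objects([[1, 1, 2],
                     [2, 1, 2],
                     [2, 2, 1]]))
```

The two forest areas in the example become separate objects (ids 2 and 3) even though they share a class. This is exactly the extra structure an object representation adds to a field.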

4. REMOTE SENSING/GIS AS TOOLS OR SCIENCE?

The question has been asked whether remote sensing and GIS technologies are mere tools for analysis or a scientific endeavour. The question arises because some consider remote sensing a mere data acquisition technology and GIS the analytical framework used to discover new information. To appreciate such reasoning, it may be necessary to reiterate some of the historical development of remote sensing and to observe its parallels in GIS.

Major efforts in remote sensing began in the late 1960s, at about the same time as developments in GIS. However, remote sensing was well funded. It held out the prospect of developing peaceful uses of space technology and of collecting vast amounts of geographical data quickly and cheaply. It is no wonder that the growth of remote sensing in the 1960s and 1970s outpaced that of GIS until the early 1980s. Moreover, GIS has been seen as an add-on to remote sensing systems, given its potential for modelling and analysis and its ability to integrate information and improve the accuracy of classification of features found on maps.

Yet, despite these advances, lessons learnt from remote sensing have a direct bearing on GIS. First, there is a need for formal theory in GIS. This is because much of the work in remote sensing is purely empirical and limited to specific times and places, so generalisation, and hence theorising, may be difficult. It is difficult to generalize results from one satellite or sensor to another, so much basic work may be necessary for each new satellite or sensor used. Secondly, excessive expectations of the technology have had to be tempered by the practical reality of what is feasible and achievable. The early promises of remote sensing in monitoring agricultural production and forest harvesting have had to be balanced against the problems associated with the accuracy of classifications. The errors introduced by atmospheric effects on the spectral response, and hence on the image data, have also had to be accommodated. This emphasizes the continuing need for basic research in the use of this technology.
Both remote sensing and GIS are particularly attractive since they combine high technology with colour graphics. However, more needs to be done in using them as tools and in developing them as scientific endeavours in their own right. Finally, remote sensing was expected to produce more fundamental changes in the way people thought about and used geographical data. However, the magnitude of the changes is unclear and much remains to be done. As a general statement, remote sensing has contributed towards the emergence of a global science and a major technology in global monitoring. It has given a view of the Earth from space and has encouraged the view of the planet as an integrated system (Gaia). GIS shares this way of thinking in that GISs integrate many layers of spatial information. Together, remote sensing and GIS contribute to an emerging spatial information science. In the final analysis, where users of the system are in pursuit of knowledge in a fundamental way, it may be justifiable to label both GIS and remote sensing as a science. Used in any other way, they would be consigned to the status of mere tools.

5. FUTURE PROSPECTS: REMOTE SENSING WITH GIS

This discussion on the future of GIS is based on Simonett (1990). In terms of future trends, whether or not there is a convergence between GIS and remote sensing will depend to a large extent on the dictates of the market-place. The glue that may bind users is found in the language used and in the use of common technology. The market-place for spatial information can be characterized as follows:

- desktop mapping products that produce thematic maps from input data
- spatial analysis systems that emphasize the ability to overlay, combine layers and build buffers
- database systems which have limited geographical functions such as displays and data input
- geographical spreadsheets in which two or more areas may be merged for districting purposes (for example, schooling, electoral, postal)
- query systems providing the interface, and hence access, to large public databases as well as the ability to geocode, query and find optimum routes through a network
- image processing systems to process remotely sensed imagery but with added GIS functions for data integration.

The vendors of such systems face a choice between building a product that satisfies a common-denominator market and homing in on the most lucrative sub-market. Which approach succeeds depends on trends and market forces. These trends are dictated by institutions, education and training, research and development (R&D), and technology that can simultaneously deliver the requirements of each sub-market. Thus, a vision for GIS may be one of automated geography, in which almost all geographical data are automated in their capture, processing, use, analysis and reporting. In a digital environment, geographical information becomes more powerful and versatile, given functions such as overlay and integration, measurement and simple map analysis, and map displays. Together these trends suggest that a spatial information science is in the making and that GIS and its allied fields, such as remote sensing and cartography, all point in this direction:

- data collection: remote sensing, surveying, photogrammetry
- data compilation: classification, interpretation, cartography
- data models: data structures, theories of spatial information
- data display: cartography, computer graphics
- navigation, spatial information query and access
- spatial analysis and modelling

Spatial information science is sufficiently distinct to claim a following of scientists. The theories and problems which spatial information science addresses are sufficiently basic for research to be fruitfully conducted. Spatial information science is sufficiently unique in its identity to merit research as a discipline in itself. The ultimate GIS will be a planetary, global GIS since, as an integrated system, its concerns would include modelling the physical, biological, chemical and temporal cycles of the system Earth.

SUMMARY

Integrating remote sensing with GIS represents a very important stage in the evolution of a truly global spatial information system (SIS). A simple view is that remote sensing provides the data whilst GIS provides both the analytical and scientific framework. However, such a view is too simplistic, since both encompass important technologies and philosophical frameworks. The module was presented in five parts, the first dealing with definitional issues while the second spelt out some of the advantages and problems of integration. Part three gave some solutions to integration and part four addressed the tool/science dichotomy debate. The concluding section discussed the convergence of technologies and the prospects for a truly global information science. The definitional issues to be resolved included placing GIS in relation to remote sensing as two different technologies dealing with spatial data. The structure of remote sensing was summarized diagrammatically and the reasons for the rapid growth and use of remote sensing were discussed. These included developments in sensing instrumentation, the deployment of Earth-orbiting satellites and the growth of computer technology to process digital imagery. Included in these considerations were image analysis and related systems, and photogrammetry and air photo interpretation, as additional tools of spatial data analysis. Finally, cartography was considered as the graphical product of remote sensing and GIS. Different models of integration were presented, with the conclusion that a Three-way Interaction model would best serve the interests of all three technologies under consideration. Part two posed the question of the need for integration. An integrated remote sensing-GIS provides the ultimate tool for resource management and environmental monitoring. Other strengths arising from integration include timeliness in data collection, provision and analysis; accuracy of the geographic data used in various analyses; geometric registration of digital image data as part of GIS map layers; the use of images as layers of images; the interchange of image and functions; and better capabilities and functionality for modelling change. Some possible modes of integration include tandem processing of digital image data in raster format on the one hand and vector data on the other; and terrain visualisation, where satellite images in combination with digital elevation models may produce stunning graphical effects together with advanced analytical capabilities. The problems and obstacles to successful integration have been traced to technical and institutional barriers. A major problem is that of the data structures used: raster data for images and vector data for GISs. Within GIS itself there is a tension between the two data structures, but in the long term, as better software is written, the interchange of data between the two structures may become seamless. Institutional barriers to higher levels of integration rest with the users of these various systems: public and private decision-makers, the scientific research community and non-expert users.
Identified barriers may disappear with greater interaction and understanding of the needs of each of the user groups. Part three suggested several solutions to the problems of integration. A major requirement is a shift in thinking that combines the field-representation view of the world with an object representation of the same space. A hierarchical segmentation of levels of integration was suggested, depending on the ultimate usage of each system. In Part four, remote sensing and GIS were considered either as tools for data collection or as a scientific methodology yielding new knowledge. To understand the question better, a brief history of remote sensing was provided in order to trace its parallels with developments in GIS. The final section briefly considered future prospects of using remote sensing with GIS. Software programmes have already been written with these objectives in mind. However, the future lies in the dictates of the market-place, and whether a common-denominator market or a niche market prevails depends, indirectly, on what users want and need. Ultimately, what may emerge is a planetary, global GIS integrated with remote sensing that enables modelling of the physical, biological, chemical and temporal cycles of the system Earth.
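The summary notes that images are held as raster data while GISs commonly hold vector data, and that interchange between the two structures is a persistent technical problem. As a minimal sketch of one direction of that interchange, the Python below converts (rasterises) a vector polygon into a grid of cells by testing whether each cell centre lies inside the polygon with the ray-casting point-in-polygon test. The function names, the lower-left grid origin and the cell size are illustrative assumptions; real GIS packages may use different algorithms and rules for cells that straddle a boundary.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray from
    (x, y) towards +x crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the ray's y-coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(polygon: List[Point], origin: Point, cell_size: float,
              n_rows: int, n_cols: int) -> List[List[int]]:
    """Convert a vector polygon to a raster of 1 (inside) and 0 (outside).

    origin is the lower-left corner of the grid (a hypothetical parameter);
    row 0 is the top row, as in most image formats."""
    ox, oy = origin
    grid = []
    for row in range(n_rows):
        # y of the cell centre, counting rows down from the top of the grid
        cy = oy + (n_rows - row - 0.5) * cell_size
        grid.append([
            1 if point_in_polygon(ox + (col + 0.5) * cell_size, cy, polygon) else 0
            for col in range(n_cols)
        ])
    return grid


if __name__ == "__main__":
    square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
    for line in rasterise(square, origin=(0.0, 0.0), cell_size=1.0, n_rows=4, n_cols=4):
        print(line)
```

The reverse direction, tracing cell boundaries back into polygons, is harder, which is one reason the text treats seamless interchange as a future prospect.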

FURTHER READING

Aronoff, S. (1989) Geographic Information Systems: A Management Perspective, Ottawa: WDL Publications. Ch. 3 Remote Sensing, pp. 47-102; Ch. 9 Conclusion, pp. 281-285.
Burrough, P.A. (1986) Principles of Geographical Information Systems for Land Resources Assessment, New York: Oxford University Press.
Colwell, R.N. (ed.) (1983) Manual of Remote Sensing, 2nd ed., Falls Church, Va.: American Society of Photogrammetry (2 volumes).
Cowen, D. (1988) GIS versus CAD versus DBMS: What are the differences? Photogrammetric Engineering and Remote Sensing, v. 54, pp. 1551-1555.
Curran, P.J. (1985) Principles of Remote Sensing, Harlow: Longman.
Curran, P.J. (1987) On defining remote sensing, Photogrammetric Engineering and Remote Sensing, v. 53, pp. 305-306.
Dent, B.D. (1985) Principles of Thematic Map Design, Reading, Mass.: Addison-Wesley.
Ehlers, M., Edwards, G. & Bédard, Y. (1989) Integration of remote sensing with geographic information systems: A necessary evolution, Photogrammetric Engineering and Remote Sensing, v. 55(11), pp. 1619-1623.
Fisher, P.F. & Lindenberg, R.E. (1989) On distinctions among cartography, remote sensing and geographic information systems, Photogrammetric Engineering and Remote Sensing, v. 55(10), pp. 1431-1434.
Fussell, J., Rundquist, D. & Harrington, J.A. (1986) On defining remote sensing, Photogrammetric Engineering and Remote Sensing, v. 52, pp. 1507-1511.
Goodchild, M.F. (1985) Geographical information systems in undergraduate geography: A contemporary dilemma, The Operational Geographer, v. 8, pp. 34-38.
Goodchild, M.F. & Kemp, K.K. (eds.) (1990) Introduction to GIS, NCGIA Core Curriculum, Santa Barbara, CA: NCGIA.
Goodenough, D.G. (1988) Thematic Mapper and SPOT Integration with a Geographic Information System, Photogrammetric Engineering and Remote Sensing, v. 54(2), pp. 167-176.
Lo, C.P. (1986) Applied Remote Sensing, Harlow: Longman. Ch. 9 Geographic Information Systems, pp. 369-387.
Maguire, D.F., Goodchild, M.F. & Rhind, D. (eds.) (1991) Geographical Information Systems, London: Longman Scientific & Technical.
Marble, D.F., Calkins, H.W. & Peuquet, D.F. (1984) Basic Readings in Geographic Information Systems.
Marble, D.F. (1984) Geographic information systems: An overview, PECORA 9 Proceedings, Spatial Information Technologies for Remote Sensing Today and Tomorrow, Oct. 2-4, Sioux Falls, SD: IEEE, pp. 18-24.
Muehrcke, P.C. (1972) Thematic Cartography, Resource Paper 19, Washington, D.C.: Association of American Geographers.
Robinson, A.H., Sale, R.D., Morrison, J.L. & Muehrcke, P.C. (1985) Elements of Cartography, 5th ed., New York: Wiley.
Simonett, D. (1990) Unit 75: The Future of GIS, in Goodchild, M.F. & Kemp, K.K. (eds.), pp. 75-3 to 75-9.
Star, J. & Estes, J. (1990) Geographic Information Systems: An Introduction, Englewood Cliffs, NJ: Prentice Hall. Ch. 10 Remote Sensing and GIS, pp. 191-219; Ch. 13 Looking Toward the Future, pp. 245-251.

REVISION

1. Why are the following disciplines components of spatial information systems: remote sensing, GIS, air photo interpretation, photogrammetry, image analysis and cartography?
2. Differentiate between remote sensing and image analysis.
3. What is photogrammetry and how is it different from air photo interpretation?
4. Describe some basic principles governing remote sensing systems from data acquisition through to data manipulation and use.
5. Why is there a need to integrate remote sensing with GIS?
6. Outline some of the obstacles and problems in attempting to integrate remote sensing with GIS.
7. What suggestions have been made to solve the difficulties in integrating remote sensing with GIS?
8. Spatial information science seeks to advance empirical and theoretical knowledge of the Earth. How do remote sensing and GIS qualify as scientific disciplines?
9. Discuss the future prospects of remote sensing with GIS as a global information science.

CLASS EXERCISE

This part of the manual is still under development. See the relevant class handouts pertaining to this Module.

COMPUTER EXERCISE
FARMIMAGE™ and EASI-PACE are two software packages that have been developed to process and digitally enhance remotely sensed images of the Earth. Separate handouts and documents describing the functions of each will be distributed in class.




GLOSSARY OF TERMS for GIS Manuals

A SELF TEACHING STUDENT'S MANUAL FOR GIS


GLOSSARY OF TERMS


Module 1 (Geographic Information Systems)

accuracy

freedom from error (cf. precision)

algorithm

a set of procedures to be followed in solving any problem of a given type. Computer programmes are written to implement algorithms

arc

two or more connected points with nodes at either end. Arcs are the building blocks of polygons

attribute

a field containing information about an entity. A class descriptor of an entity, for example, colour

cadastre

refers to the boundaries and subdivisions of public lands and private estates. Information includes ownership and taxation of property

data

facts of any given kind

data base

a collection of interrelated data stored together to serve one or more applications; stored so as to be independent of the programs using the data

DBMS

data base management system

digital data

data represented by a sequence of code characters, readable by computers

DIME file

a geographic base file produced by the US Census Bureau with Dual Independent Map Encoding (cf. TIGER)

digitization

the process of converting analogue measures such as length into digital form

geostationary satellite

a satellite orbiting above the Earth's equator at an altitude of approximately 36,000 km, such that its period of revolution about the Earth matches the Earth's rotational period. Thus, the satellite continuously views the same portion of the Earth's surface. Also called geosynchronous satellite.

GPS

global positioning system. A technology that determines positions as geodetic coordinates such as latitude and longitude. The most precise systems of land surveying rely on satellites to determine geodetic coordinates. GPS was developed by the US government for the Department of Defense, but is available for other uses. An improved version now in use is differential GPS, so called because the technology relies on known ground locations in order to calculate positions with accuracies of up to 1 metre.

input

data transfer from an external storage medium into the internal storage of the computer

line printer

a printing device which composes and prints a full line of characters at one time


plotter

a graphic hard copy device using a moving pen or electrostatic recording process. May be drum or flat-bed, digital or analogue

point (or node)

a zero-dimensional object, a co-ordinate location

polygon

a two-dimensional surface bounded by line segments

precision

the degree of exactness with which a quantity is stated (cf. accuracy)

raster

a gridded unit of information for display

relational data base

a data base made up of flat files; allows the system to recombine data elements from different relations, giving great flexibility in the usage of data

TIGER files

Topologically Integrated Geographic Encoding and Referencing file. US national digital database of planimetric base map features, including the location of all street addresses in major urban areas as well as all water bodies, highways and railroads. Produced by the US Census Bureau for the 1990 Decennial Census.

turnkey system

a system which combines hardware, software and potentially data as one package, ready to operate on the turn of a key after installation
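The geostationary satellite entry states that an altitude of about 36,000 km makes the orbital period match the Earth's rotation. Kepler's third law for a circular orbit, T = 2π√(r³/GM), can be rearranged to check that figure. The Python sketch below assumes a circular equatorial orbit, the standard gravitational parameter GM ≈ 3.986 × 10^14 m³/s², the WGS 84 equatorial radius of 6,378,137 m and a sidereal day of about 86,164 s (one rotation of the Earth relative to the stars); the function names are illustrative.

```python
import math

GM_EARTH = 3.986004418e14      # m^3 s^-2, Earth's standard gravitational parameter
EARTH_RADIUS_M = 6_378_137.0   # m, equatorial radius (WGS 84)
SIDEREAL_DAY_S = 86_164.0905   # s, one rotation relative to the fixed stars


def orbital_radius(period_s: float) -> float:
    """Radius (from the Earth's centre) of a circular orbit with the given period.

    Rearranged Kepler's third law: r = (GM * T^2 / (4 * pi^2)) ** (1/3)."""
    return (GM_EARTH * period_s ** 2 / (4 * math.pi ** 2)) ** (1 / 3)


def geostationary_altitude_km() -> float:
    """Height above the equator at which the orbital period equals the
    Earth's rotational period."""
    return (orbital_radius(SIDEREAL_DAY_S) - EARTH_RADIUS_M) / 1000.0


if __name__ == "__main__":
    print(f"{geostationary_altitude_km():,.0f} km")
```

Run as a script, it gives an altitude of roughly 35,786 km, consistent with the glossary's "approximately 36,000 km".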

Module 2 (Raster GIS: An Introduction)

ASCII

American Standard Code for Information Interchange

aspect

The direction of steepest slope, or the direction in which the surface is locally facing. Measured in degrees from North or by compass points N, NE, E and so on.

band interleaved by line (BIL)

A specific implementation of a multivariate raster data set. For each line in the raster, the values of each of the variables or bands are stored in sequence, before the set for the succeeding line.

band interleaved by pixel (BIP)

For each pixel, the values of each of the variables or bands are stored before moving on to the next pixel.

band sequential (BSQ)

The complete data array for each separate variable or band is stored independently of the other variables.

data base system

Provides for the input, storage and retrieval of data.

data base

Set of data that are stored.

digital elevation model (DEM)

A raster array of elevation values.


digitizer

A device that converts analog information into a digital form. For flat graphic material, a digitizer can be either flatbed or scanning. For analog signals, a digitizer consists of an analog-to-digital converter plus associated control and interface devices.

measurement scale

A system for quantifying observations according to predetermined rules, which define four successively greater levels of data precision (nominal, ordinal, interval and ratio). Binary variables are sometimes included here.

pixel

The smallest unit of a surface whose characteristics may be uniquely determined. From "picture element".

supervised classification

Classification of data into discrete categories based on statistics developed from training sites.

tessellation

A tessellation of a plane is an aggregate of cells that divide up or partition that plane. Two rules govern tessellation: that it should be capable of producing an infinitely repetitive pattern on the plane, and that it should be infinitely recursively decomposable into similar patterns of a smaller and smaller size.

training site

Recognizable areas on an image with distinct spectral reflectance or other properties useful for identifying other similar areas.

tuple

A data element composed of more than one value. For example, geographic locations are often specified by two values: latitude and longitude. The pair of latitude and longitude coordinates is called a tuple.

unsupervised classification

Automated classification of data into discrete categories by grouping together similar observations without the aid of training data. Also called statistical clustering.

quadtree

A raster data structure that uses variable pixel size, depending on the spatial homogeneity of the image. Efficient for data compression of thematic maps, and allows fast searching of the database with a Morton coordinate referencing system that uses quad level and quad position (see Bonham-Carter 1988 in Peuquet & Marble 1990).
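The three band-interleaving entries (BIL, BIP and BSQ) describe the order in which a multiband raster is written out as a single stream of values. The minimal Python sketch below flattens a small image, held as nested lists indexed bands[b][r][c] (a layout assumed only for this illustration), in each of the three orders, and gives the arithmetic for locating one value in each stream. The function names are hypothetical and are not taken from any image processing package.

```python
from typing import List

# bands[b][r][c]: value of band b at row r, column c (illustrative layout)
Cube = List[List[List[int]]]


def to_bsq(bands: Cube) -> List[int]:
    """Band sequential: the whole array for band 0, then band 1, and so on."""
    return [v for band in bands for row in band for v in row]


def to_bil(bands: Cube) -> List[int]:
    """Band interleaved by line: each image line from every band in turn,
    before moving on to the next line."""
    n_rows = len(bands[0])
    return [v for r in range(n_rows) for band in bands for v in band[r]]


def to_bip(bands: Cube) -> List[int]:
    """Band interleaved by pixel: all band values for one pixel, then the next pixel."""
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    return [band[r][c] for r in range(n_rows) for c in range(n_cols) for band in bands]


def offset(layout: str, b: int, r: int, c: int,
           n_bands: int, n_rows: int, n_cols: int) -> int:
    """Position of band b, row r, column c within the flattened stream."""
    if layout == "BSQ":
        return (b * n_rows + r) * n_cols + c
    if layout == "BIL":
        return (r * n_bands + b) * n_cols + c
    if layout == "BIP":
        return (r * n_cols + c) * n_bands + b
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    image = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]  # two bands, 2 x 2 pixels
    print("BSQ", to_bsq(image))
    print("BIL", to_bil(image))
    print("BIP", to_bip(image))
```

As a rule of thumb, BIP keeps all band values of a pixel together, which tends to suit per-pixel spectral operations such as classification, while BSQ keeps each band contiguous, which tends to suit single-band operations; BIL falls between the two.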



Module 3 (Vector GIS: An Introduction)

accuracy

freedom from error or bias; closeness to the true value

arc

a line feature that has a node at each end of the line

connectivity

refers to interconnected pathways or networks that transport something. Connectivity functions are used to find optimum routes through a network: most efficient, fastest, shortest, cheapest, most direct and so on.


contiguity

the spatial relationship of adjacency; that is, elements that touch each other are adjacent.

entity type

any grouping of similar phenomena that should eventually get represented and stored in a uniform way (NCGIA Unit 10)

entity

a real-world phenomenon that is not subdivided into phenomena of the same kind

fuzzy tolerance

a parameter set in Arc/Info's CLEAN operation. Removes coordinates within the minimum distance of other coordinates as the coverage is processed (cf. weed tolerance).

nodes

refer to the end points that define an arc

precision

refers to the ability to distinguish small differences

RMS error

root mean square error: the calculated difference between recorded and specified tic locations, expressed as the square root of the mean of the squared residuals. Values are usually just slightly greater than 0.003; the higher the value, the greater the error.

snapping

the process of moving a feature to coincide exactly with the coordinates of another feature that is within a certain specified snap distance.

spaghetti model

describes a file of spatial data as a collection of coordinate strings with no inherent structure. The map is expressed in Cartesian coordinates of the elemental units of points, lines and polygons.

spatial data base

a set of spatially referenced data that acts as a model of reality

spatial object

a digital representation of all or part of an entity

spatial object type

one of a limited set of clearly defined spatial primitives and simple objects, for example, node, chain, ring

tics

in Arc/Info, tics are registration points representing the location of known points on the Earth's surface for which real-world coordinates are known. A common set of tics is used in each layer so that the layers register to each other and to adjacent map sheets.

universe polygon

also known as the external polygon; represents the area outside all the polygons in a map

vertices

refer to points occurring along an arc between nodes, which help define the shape of the arc

weed tolerance

the minimum acceptable distance between any two vertices along an arc. A parameter set before adding arc features.
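The RMS error entry can be written as a formula: with n tics and a residual distance d_i between the recorded and specified locations of tic i, RMS = √((d_1² + ... + d_n²) / n). The Python sketch below computes this value, assuming both lists of tic coordinates are already in the same units and coordinate system; any transformation a package such as Arc/Info may fit before reporting the figure is not reproduced here, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def rms_error(recorded: Sequence[Point], specified: Sequence[Point]) -> float:
    """Root mean square of the distances between recorded and specified
    tic locations: sqrt(sum(d_i ** 2) / n)."""
    if len(recorded) != len(specified) or not recorded:
        raise ValueError("need the same, non-zero number of tics in each list")
    sq_sum = 0.0
    for (xr, yr), (xs, ys) in zip(recorded, specified):
        sq_sum += (xr - xs) ** 2 + (yr - ys) ** 2  # squared residual distance
    return math.sqrt(sq_sum / len(recorded))


if __name__ == "__main__":
    specified = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    recorded = [(0.003, 0.0), (10.0, 0.004), (10.0, 10.0), (0.0, 10.0)]
    print(rms_error(recorded, specified))
```

In this example two of the four tics are off by 0.003 and 0.004 units, which gives an RMS error of 0.0025.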

Module 4 (Managing attribute data in GIS)


attribute

a descriptive characteristic of a feature. An attribute asks a question about it: what, where, how big, how many, when, etc. The answers to the questions are the values stored in a database. Cartographic attributes describe how to display map information (colour, length, height, width, etc.) while non-graphic data attributes describe the mapped feature (What is it? What did it cost? When was it built?). A field containing information about an entity. A class descriptor of an entity, for example, colour. Tabular data associated with geographic features.

cartographic data

attributes of cartographic features that are stored in a computer system for display as a map.

cartographic feature

something that can be named or assigned an identifier (such as a street, a manhole, a property line, a building or an intersection) and can be located on a map.

cartographic object

the digital record that contains attributes of a cartographic feature and is stored in a computer system.

database schema

a logical description of data stored in a database. The schema not only defines the names of the data items and their sizes and other characteristics but also identifies the relationships among the items (for example, all the data associated with a block is also associated with the census tract into which the block falls).

dBASE

a tabular database management system that can be used with PC ARC/INFO to store and manipulate map feature attributes.

DBMS

Database Management System. The collection of software required for using and manipulating a tabular database, and presenting multiple, different views of the data. A collection of computer programs that are used to organize and use data stored in a database. Typical functions of a DBMS include the logical and physical linkage of related data elements, the retrieval and verification of data values, and other data management functions such as security, archiving and updating.

descriptive data

tabular and/or textual data describing the geographic characteristics of map features.

FAT

ARC/INFO Feature Attribute Table; a data file storing descriptive map data.

feature selection by attribute

the process of selecting a subset of features from a coverage using logical selection criteria that operate on the attributes of coverage features (for example, AREA GT 1000). Only those features whose attributes meet the selection criteria are selected. Also known as logical selection.

flat file structure

stores information in records of the same length. Each record holds several fields, including a key field which is used for searching purposes.

geocode

an identifier assigned both to a map feature and to a data record containing attributes that describe the entity represented by the map feature. Common geocodes include addresses, census tracts, and political and administrative boundaries. Geocodes are also referred to as location identifiers.

hierarchical file structure

a tree structure with a one-to-many, parent-child relationship.


lookup table

also known as a relate table, external attribute table, or expansion table. A special tabular data file associated with a particular feature attribute table and containing additional attributes about the feature beyond those stored in the feature attribute table.

LUT

a lookup table is a set of values stored in computer memory. The LUT consists of a list of input values and corresponding output values. LUTs are positioned in the data path between the frame buffer and the monitor. In effect, they process the image data on the fly as they are sent to the display. See also lookup table.

network file structure

a many-to-many relationship. A child record can have more than one parent record.

RDBMS

Relational Database Management System. A database management system with the ability to access data organized in tabular files that may be related together by a common field. An RDBMS has the capability to recombine the data items from different files, thus providing powerful tools for data usage.

relate

an ARC/INFO operation connecting corresponding records of two tables using an item common to both. Each record in one table is connected to one or more records in the other table that share the same value for the common item. A relate gives access to additional feature attributes that are not stored in a single table. The connection is only temporary.

relational join

an ARC/INFO operation relating and physically merging two attribute tables using their common item.

tile

term used to refer to a geographical partition of a database.

video LUT

generates the data to drive the colour monitor from the frame buffer values. Instead of generating one output value for each input value, the video LUT generates three values, each in the range 0-255. The three values control the brightness levels of the red, green and blue electron guns in a computer monitor.
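The entries for feature selection by attribute, relate and relational join describe three distinct table operations. The Python sketch below mimics them on small in-memory tables, assuming a hypothetical feature attribute table keyed by ID and a hypothetical lookup table sharing the common item ZONE; it illustrates the concepts only and does not reproduce ARC/INFO commands or their syntax.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical feature attribute table (FAT) for a polygon coverage.
fat: List[Record] = [
    {"ID": 1, "AREA": 1500.0, "ZONE": "R1"},
    {"ID": 2, "AREA": 800.0, "ZONE": "C2"},
    {"ID": 3, "AREA": 2400.0, "ZONE": "R1"},
]

# Hypothetical lookup (relate) table sharing the common item ZONE.
zoning: List[Record] = [
    {"ZONE": "R1", "DESC": "Residential"},
    {"ZONE": "C2", "DESC": "Commercial"},
]


def select_by_attribute(table: List[Record], item: str, threshold: float) -> List[Record]:
    """Logical selection: keep only records whose item exceeds threshold,
    the equivalent of a criterion such as AREA GT 1000."""
    return [rec for rec in table if rec[item] > threshold]


def relate(record: Record, lookup: List[Record], item: str) -> List[Record]:
    """Relate: fetch the lookup records sharing the common item with one
    feature. Nothing is copied into either table, so the link is temporary."""
    return [row for row in lookup if row[item] == record[item]]


def join(table: List[Record], lookup: List[Record], item: str) -> List[Record]:
    """Relational join: build new records that physically merge each feature's
    attributes with its matching lookup record. Assumes item values are unique
    in the lookup table; features with no match are dropped."""
    index = {row[item]: row for row in lookup}
    return [{**rec, **index[rec[item]]} for rec in table if rec[item] in index]


if __name__ == "__main__":
    for rec in select_by_attribute(fat, "AREA", 1000):
        print(rec["ID"], relate(rec, zoning, "ZONE"))
    print(join(fat, zoning, "ZONE"))
```

The design difference mirrors the glossary: the relate leaves both tables untouched and is recomputed on demand, while the join produces a new, wider table.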

Module 5 (Integrating Remote Sensing with GIS)

digital elevation model (DEM)

a raster array of elevation values

digital number

in remote sensing, the numerical value of a specified pixel

electromagnetic radiation

energy propagating through space or matter in the form of waves of interacting electric and magnetic fields. Radio waves and light are examples of electromagnetic radiation

electromagnetic spectrum

the range of wavelengths of known electromagnetic radiation, including ultraviolet radiation (UV), visible radiation, infrared radiation (IR) and microwaves

ground truth

information obtained on the ground at the same time as a remote sensing system is acquiring data from the same location. Ground truth is normally considered the most accurate available, and is used to interpret and calibrate remotely sensed observations

image

a two-dimensional representation. Examples include a photograph, a multispectral imaging sensor's data output, and the processed result of an aeromagnetic survey

image classification

analysis of digital image values, including one or more spatial, temporal and spectral band relationships, to obtain categories of information about specific features


instantaneous field of view (IFOV)

for a sensing device, the area covered at a single moment, described either as the angle through which the sensor gathers radiation, or as the area on the ground at a specified altitude

Integrated Geographic Information System (IGIS)

a GIS that includes facilities for working with remotely sensed data

Landsat

a series of Earth-observing satellites (originally named Earth Resources Technology Satellite, ERTS), first launched in 1972 by NASA, that serve as platforms for several instruments, including the return beam vidicon (RBV), the multispectral scanner (MSS) and the thematic mapper (TM)

ortho-image

an image that has been corrected for all geometric distortions caused by the Earth's rotation and curvature, satellite motion, attitude and viewing perspective, as well as relief displacement

orthophoto

a photograph that has been manipulated in such a way as to eliminate image displacement due to photographic tilt and relief

photogrammetry

techniques of obtaining precise measurements from images

radiance

the flux of radiant (electromagnetic) energy measured in power units (for example, watts)

radiometer

a passive device for intercepting and quantitatively measuring electromagnetic radiation in a band of wavelengths

signature

a set of spectral, tonal, temporal or spatial characteristics that together serve to identify a class or feature by remote sensing

spectral signature

an approach based on the hypothesis that individual materials, objects or features may be identified based upon their unique spectral response (hence signature) in a set of discrete wavelength regions

SPOT (Système Pour l'Observation de la Terre)

a multispectral remote sensing satellite system with pointable sensors. The instantaneous field of view (IFOV) of the system is 20 m in multispectral mode and 10 m in monochromatic mode. This French system was first launched in 1986

supervised classification

classification of data into discrete categories based on statistics developed from training sites

Thematic Mapper (TM)

a seven-channel, 30 m instantaneous field of view (IFOV) multispectral scanner, designed for monitoring Earth resources and flown by NASA on the Landsat 4 and 5 platforms

training sites

recognisable areas on an image with distinct spectral reflectance or other properties useful for identifying other similar areas

unsupervised classification

automated classification of data into discrete categories by grouping together similar observations without the aid of training data. Also called statistical clustering
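The supervised classification, training sites and spectral signature entries describe one workflow: statistics from training sites characterise each class, and every pixel is then assigned to the class it most resembles. The Python sketch below uses the minimum-distance-to-means rule, one of the simplest such classifiers, chosen here for brevity rather than taken from the manual; the class names and two-band digital numbers are invented for illustration.

```python
import math
from typing import Dict, List, Sequence

Pixel = Sequence[float]  # one digital number (DN) per band


def class_means(training: Dict[str, List[Pixel]]) -> Dict[str, List[float]]:
    """Mean spectral vector of each class, computed from its training-site pixels."""
    means: Dict[str, List[float]] = {}
    for label, pixels in training.items():
        n_bands = len(pixels[0])
        means[label] = [sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)]
    return means


def classify(pixel: Pixel, means: Dict[str, List[float]]) -> str:
    """Minimum distance to means: assign the pixel to the class whose mean
    vector is closest in spectral (band) space, by Euclidean distance."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))


if __name__ == "__main__":
    # Invented two-band DNs (for example, red and near-infrared) for three classes.
    training = {
        "water": [(10, 8), (12, 6)],
        "vegetation": [(40, 90), (44, 86)],
        "bare soil": [(70, 60), (74, 64)],
    }
    means = class_means(training)
    for px in [(13, 9), (45, 80), (68, 58)]:
        print(px, classify(px, means))
```

An unsupervised classification would instead derive the class means from the image itself, for example by iterative clustering, rather than from training sites.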

Citation

To reference this material the correct citation for this page is as follows: Cho, G (1995) Geographic Information Systems. Student's Manual. Glossary of Terms. Canberra: University of Canberra and CAUT; http://infosys-law.canberra.edu.au/docs/gismodules/manual_1/index.html.


