
iirs

DATA MODEL
SPATIAL DATA MODELS

DATA FORMATS & STRUCTURES


DR. SAMEER SARAN
Head
Geoinformatics Department
Remote Sensing & Geoinformatics Group
Indian Institute of Remote Sensing (IIRS)
Indian Space Research Organization (ISRO)

Department of Space, Government of India


Dehradun
www.iirs.gov.in/Dr.Sameer Saran


What is a GIS ?

A GIS is a computer-based system that provides the following four sets of
capabilities to handle geo-referenced data:
1. Input
2. Data management (storage and retrieval)
3. Manipulation and analysis
4. Output.
(Aronoff, 1989)


Spatial Data Model

It represents the linkages between the real-world domain of geographic data and the computer or GIS representation of these features (Marble, 1982). It helps:
To organize a systematic file structure
To abstract the real world into properties that are perceived by a specific application


GIS structures as representations of reality

Two approaches have been widely adopted for representing the spatial & attribute information within a GIS:
A composite model (raster)
Geo-relational model (vector)


spatial data models

Two fundamental approaches:
raster model
vector model


Spatial data types

Regular tessellations
Irregular tessellations
Point data
Line data
Area data


Vector Data Concept


Vector Data Structure

Point (node): 0-dimension
single x,y coordinate pair
zero area
e.g. tree, oil well, label location
Line (arc): 1-dimension
two (or more) connected x,y coordinates
e.g. road, stream
Polygon: 2-dimensions
four or more ordered and connected x,y coordinates
first and last x,y pairs are the same
encloses an area
e.g. census tracts, county, lake
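
As a minimal sketch, the three primitives can be written in Python as plain coordinate tuples and lists. The type aliases, the helper names `is_closed_ring` and `ring_area`, and the sample coordinates are illustrative assumptions, not part of any GIS package.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # single x,y pair, zero area
Line = List[Point]            # two or more connected x,y pairs
Polygon = List[Point]         # ordered ring: first pair equals last pair

def is_closed_ring(ring: Polygon) -> bool:
    """True if the ring has at least four pairs and returns to its start."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def ring_area(ring: Polygon) -> float:
    """Area enclosed by a closed ring, using the shoelace formula."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Hypothetical sample features
oil_well: Point = (10.0, 20.0)
road: Line = [(0.0, 0.0), (5.0, 5.0), (10.0, 5.0)]
lake: Polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 0.0)]
```

The smallest valid polygon is a triangle, which is why four pairs are needed: three corners plus the repeated first pair that closes the ring.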


Vector model

In a vector-based GIS, data are handled as:
Points: X,Y coordinate pair + label
Lines: series of points
Areas: line(s) forming their boundary (series of polygons)
[Figure: a point feature, a line feature and an area feature]


Vector model


Line Types

Line segment: with two end points
Line string: a sequence of line segments
LinearRing (Ring): a sequence of segments with closure
Curves (Arc)
* Sometimes, "arcs" refers to lines (as in ArcGIS/ArcView)


Vector Structures

How to organize vectors in a computer?
Spaghetti Structure
Whole Polygon Structure
Points and Polygons Structure
Topological Structure

Spaghetti model


A COLLECTION OF COORDINATE STRINGS WITH NO INHERENT STRUCTURE
[Figure: point P1 and line L1 plotted on X and Y axes, each running from 0 to 70]


Vector Structure: Spaghetti


Spaghetti Vector Structure

Spaghetti structure is usually derived from manual digitizing
Lines may cross without a node at the intersection
The common boundary between adjacent polygons is recorded twice
No neighbourhood information
Unlinked data require a large amount of storage memory

Whole Polygon Structure


(A Kind of Spaghetti)
Whole Polygon (boundary structure): polygons described by listing coordinates of points in order as you walk around the outside boundary of the polygon
coordinates/borders for adjacent polygons are stored twice
the two copies may not be the same, resulting in slivers (gaps) or overlaps
all lines are double (except for those on the outside periphery)
no topological information about polygons
which polygons are adjacent and have a common boundary?
how to relate different geographies, e.g. zip codes and tracts?
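
A small Python sketch may make the double-border problem concrete. It assumes two hypothetical adjacent unit squares, A and B, each stored as a whole polygon. The helper names are made up for illustration. The shared border can only be found by comparing coordinates, because the structure itself records no adjacency.

```python
# Two adjacent unit squares stored as whole polygons (hypothetical data).
whole_polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}

def segments(ring):
    """Return the ring's edges as direction-independent point pairs."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}

def shared_segments(polygons):
    """Edges stored in more than one polygon, found by coordinate matching."""
    seen, shared = set(), set()
    for ring in polygons.values():
        for seg in segments(ring):
            if seg in seen:
                shared.add(seg)
            else:
                seen.add(seg)
    return shared

duplicates = shared_segments(whole_polygons)
```

If the two copies of the border were digitized separately and differed slightly, the exact match above would fail, which is how slivers and overlaps go undetected in this structure.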


Whole Polygon: illustration


Points & Polygons Structure

Points and Polygons: polygons described by listing ID numbers of points in order as you walk around the outside boundary; a second file lists all points and their coordinates
solves the duplicate coordinate/double border problem
lines can be handled similarly to polygons (as a list of IDs)
still no topological information
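
The two-file idea can be sketched in Python with one dictionary of points and one dictionary of polygons that holds only point IDs. The IDs, coordinates and function names below are hypothetical.

```python
# "Point file": every coordinate pair is stored once, keyed by ID.
points = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1), 5: (2, 0), 6: (2, 1)}

# "Polygon file": each polygon is an ordered list of point IDs.
polygons = {"A": [1, 2, 3, 4, 1], "B": [2, 5, 6, 3, 2]}

def ring_coordinates(polygon_id):
    """Look up the coordinates of a polygon's boundary from the point file."""
    return [points[pid] for pid in polygons[polygon_id]]

def move_point(point_id, new_xy):
    """Editing one point updates every polygon that references it."""
    points[point_id] = new_xy
```

Because A and B both reference points 2 and 3, their common border cannot drift apart. The structure still does not say that A and B are neighbours; that has to be worked out by searching for shared IDs.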


Points and Polygons: illustration


Topology

Topology is a branch of mathematics that deals with properties of space that remain invariant under certain transformations.
Properties: three spatial relationships
Area: polygons can be defined by the set of lines that enclose them
Contiguity: identification of polygons that touch each other (the polygons to the left and right of a line)
Connectivity: identification of interconnected arcs and their start and end points, as needed for network analysis

Rubber Sheet Transformation


[Figure: rubber sheet transformation, with features labelled 1, 4, 5, 6, 7, A, B and E]


Topology-1

Connections & relationships between objects are independent of their coordinates
Topological properties of an object are preserved when the object is stretched, distorted or bent
Overcomes the major weakness of the spaghetti model, allowing for GIS analysis (overlaying, network)
Requires that all lines be connected, polygons closed, and loose ends removed


Topology-2

It describes spatial relationships
Connectivity: relationships between the arcs in the network
Contiguity (adjacency): relationships between the polygons
For example, with respect to line 1, left and right polygons are A and B respectively
Containment: this refers to what is within a polygon
For example, Polygon B is within Polygon A
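
One common way to hold these relationships is an arc table that records, for each arc, its from-node, to-node, left polygon and right polygon. The Python sketch below assumes such a table with made-up arc and node numbers. "O" is a hypothetical label for the outside area, and the layout is an illustration rather than the format of any particular GIS.

```python
from collections import defaultdict
from typing import NamedTuple

class Arc(NamedTuple):
    arc_id: int
    from_node: int
    to_node: int
    left_poly: str
    right_poly: str

# Hypothetical arc table: arc 1 separates A (left) from B (right).
arcs = [
    Arc(1, 1, 2, "A", "B"),
    Arc(2, 2, 1, "A", "O"),
    Arc(3, 2, 1, "O", "B"),
]

def adjacency(arc_table):
    """Contiguity: polygons on opposite sides of an arc are neighbours."""
    neighbours = defaultdict(set)
    for arc in arc_table:
        neighbours[arc.left_poly].add(arc.right_poly)
        neighbours[arc.right_poly].add(arc.left_poly)
    return neighbours

def arcs_at_node(arc_table):
    """Connectivity: arcs that meet at each node."""
    incident = defaultdict(set)
    for arc in arc_table:
        incident[arc.from_node].add(arc.arc_id)
        incident[arc.to_node].add(arc.arc_id)
    return incident
```

Neither function reads a coordinate, which illustrates how adjacency and connectivity questions can be answered from the stored relationships alone.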


Topological data model


Spaghetti vs. topological model

Spaghetti model
Very simple and easy to understand
No spatial relationships retained
Lines between adjacent areas must be digitised and stored twice

Topological model
More complex data structure
Spatial relationships are retained
Spatial analysis can be performed largely without specifying co-ordinate data
Map updating requires re-establishing topology


Popular File Formats

DIME: Dual Independent Map Encoding
TIGER: Topologically Integrated Geographic Encoding and Referencing
DLG: Digital Line Graph
Shape File, ESRI
Software or data specific


Shape File, ESRI

Shape file: native GIS data structure for a vector layer in ArcView
not fully topological
limited info about the relationship of features to one another
draws faster
comprises several (at least 3) physical disk files
xxxx.shp (geometric shape described by XY coords)
xxxx.shx (indices to improve performance)
xxxx.dbf (contains associated attribute data)
xxxx.sbn, xxxx.sbx (for indexing)
openly published specs so other vendors can develop shape files and read them
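
Because the specification is published, the fixed 100-byte header of a .shp file can be decoded with the Python standard library alone. The sketch below follows the header layout in the published ESRI shapefile description (file code 9994, file length in 16-bit words, version, shape type, bounding box). The function name and the synthetic demo header are assumptions made for illustration; no real file is read.

```python
import struct

def parse_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header at the start of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])      # big-endian
    (length_words,) = struct.unpack(">i", header[24:28])  # big-endian
    version, shape_type = struct.unpack("<2i", header[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    return {
        "version": version,
        "shape_type": shape_type,           # e.g. 1 point, 3 polyline, 5 polygon
        "length_bytes": length_words * 2,   # length is stored in 16-bit words
        "bbox": (xmin, ymin, xmax, ymax),
    }

# Synthetic header for demonstration only (polygon layer, empty file body).
demo_header = (
    struct.pack(">i", 9994) + bytes(20) + struct.pack(">i", 50)
    + struct.pack("<2i", 1000, 5)
    + struct.pack("<4d", 0.0, 0.0, 10.0, 20.0)
    + bytes(32)                              # Z and M ranges, unused here
)
info = parse_shp_header(demo_header)
```

For a real file, the same function could be called on the first 100 bytes read in binary mode.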


Vector Data Structures

Advantages
Good modeling of objects (object-view)
Compact data structure
Topology can be described explicitly, therefore good for analysis
Coordinate transformation & rubber sheeting is easy
Accurate graphic representation at all scales
Retrieval, updating and generalization of graphics & attributes are possible


Vector Data Structures

Disadvantages
Complex data structures

Combining several polygon networks by intersection & overlay is difficult; uses considerable computer power
Display & plotting are often time consuming and expensive, especially high quality drawings, coloring, and shading


Raster Data Concept


Raster Data Structure

Area is covered by a grid with (usually) equal-sized cells
Cells are often called pixels (picture elements); raster data are often called image data
Attributes are recorded by assigning each cell a single value based on the majority feature (attribute) in the cell, such as land use type
Typically 8 bits are assigned to each value, giving 256 possible values (0-255)
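
The majority rule and the 8-bit limit can be sketched together in Python. The example assumes a hypothetical fine grid of land use class codes that is aggregated into larger cells. The function names and the grid are illustrative only.

```python
from collections import Counter

def majority_value(samples):
    """Most common class code among the samples that fall in one cell."""
    return Counter(samples).most_common(1)[0][0]

def to_raster(fine_grid, block):
    """Aggregate a fine grid of class codes into coarser cells by majority."""
    rows, cols = len(fine_grid), len(fine_grid[0])
    coarse = []
    for r in range(0, rows, block):
        row = []
        for c in range(0, cols, block):
            samples = [fine_grid[i][j]
                       for i in range(r, min(r + block, rows))
                       for j in range(c, min(c + block, cols))]
            row.append(majority_value(samples))
        # bytes() stores one byte per cell and rejects codes outside 0-255
        coarse.append(bytes(row))
    return coarse

# Hypothetical 4x4 grid of land use codes
land_use = [
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [3, 3, 2, 1],
    [3, 1, 1, 1],
]
cells = to_raster(land_use, 2)
```

Each output cell keeps only the dominant code, so the minority classes inside a cell are lost. This is the resolution trade-off that comes with larger cells.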


Raster Data Structures: Tessellation


Raster based data structures

To increase data processing performance and reduce the demand for data storage, two issues are involved in raster data structures:
Compression methods: how to store the data more efficiently
Scan order: how to scan the data in an array, which affects data processing performance


Run-length Coding

Describes the interior of an area by run-lengths, instead of the boundary
Run-Length Codes:
Row 9: 2,3; 6,6; 8,10
Row 10: 1,10
Row 11: 1,9
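
Reading each pair above as a start column and an end column, the coding of one row can be sketched in Python. The 10-column binary rows below are an assumption reconstructed from the codes, since the original figure is not available. Columns are numbered from 1.

```python
def row_runs(row):
    """Return (start, end) column pairs, 1-based and inclusive, for runs of 1s."""
    runs, start = [], None
    for col, inside in enumerate(row, start=1):
        if inside and start is None:
            start = col                      # a run begins
        elif not inside and start is not None:
            runs.append((start, col - 1))    # the run ended on the previous cell
            start = None
    if start is not None:
        runs.append((start, len(row)))       # run reaches the end of the row
    return runs

# Hypothetical rows: 1 marks a cell inside the area
row_9 = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
row_10 = [1] * 10
row_11 = [1] * 9 + [0]
```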


Raster Compression

Run Length Compression
One of the widely used raster data representation and compression techniques
E.g.: Code the raster (shown in the example image) using run-length coding in row order
Run-length Codes: 14,3; 2,7; 4,3; 4,7; 4,3; 3,7; 9,3; 2,7; 6,3; 4,7; 5,3; 3,7; 4,3
Original image size (assuming each pixel is coded using 1 byte) is 8x8 = 64 bytes
The run-length code file needs 13x2 = 26 bytes
The compression ratio is 64:26 = 2.46:1
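
The arithmetic can be checked with a short Python sketch that treats each code as a (run length, value) pair in row order. The function names are illustrative. The 64 cells are rebuilt from the codes themselves, since the example image is not reproduced here.

```python
from itertools import groupby

def rle_encode(cells):
    """Encode a flat, row-ordered sequence as (run length, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(cells)]

def rle_decode(pairs):
    """Expand (run length, value) pairs back into the flat sequence."""
    return [value for length, value in pairs for _ in range(length)]

codes = [(14, 3), (2, 7), (4, 3), (4, 7), (4, 3), (3, 7), (9, 3),
         (2, 7), (6, 3), (4, 7), (5, 3), (3, 7), (4, 3)]

cells = rle_decode(codes)                    # 8 x 8 = 64 cells
rows = [cells[i:i + 8] for i in range(0, 64, 8)]
ratio = len(cells) / (2 * len(codes))        # 1 byte per cell vs 2 bytes per pair
```

The run lengths sum to 64, so the 13 pairs cover the whole 8x8 raster, and decoding followed by encoding returns the same 13 pairs.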


Quad-tree Coding


Quad-tree Coding contd

Raster Compression Contd


Quad-tree Compression

Used widely for spatial data indexing
Quadtree codes with N-order (Peano key based)
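
A region quadtree can be sketched in Python as a recursive split of the raster into four quadrants until each block holds a single value. The sketch assumes a square grid whose side is a power of two. The child order used here (NW, NE, SW, SE) is an arbitrary choice for illustration, and the grid is hypothetical.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a value for a uniform block, else a list of four sub-blocks."""
    if size is None:
        size = len(grid)       # assumes a square grid, side a power of two
    first = grid[row][col]
    uniform = all(grid[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:
        return first           # leaf: the whole block has one value
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                 # NW
        build_quadtree(grid, row, col + half, half),          # NE
        build_quadtree(grid, row + half, col, half),          # SW
        build_quadtree(grid, row + half, col + half, half),   # SE
    ]

def count_leaves(node):
    """Number of stored blocks in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1

grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(grid)
```

Here 16 cells collapse to 7 leaves. The saving grows with the size of the uniform areas in the raster.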


Raster Ordering

Raster is two-dimensional
2-D ordering is developed to create a 1-D
representation of the 2-D raster, in order to improve
the efficiency of raster access.

Raster Ordering Contd


N-order (Peano order, Morton order or Z-order)

Hierarchical ordering system
Built level by level, repeating the same pattern at each level
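
A Morton key is usually computed by interleaving the bits of the row and column numbers. The Python sketch below puts the column bit in the lower position of each bit pair. That choice, together with where the origin sits, decides whether the path looks like a Z or an N, so treat it as an assumption. The function names are illustrative.

```python
def morton_key(row, col, bits=16):
    """Interleave the bits of row and col into a single 1-D key."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)        # column bit: even position
        key |= ((row >> i) & 1) << (2 * i + 1)    # row bit: odd position
    return key

def morton_scan(size):
    """Visit every cell of a size x size raster in Morton-key order."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    return sorted(cells, key=lambda rc: morton_key(rc[0], rc[1]))

order_2x2 = morton_scan(2)
order_4x4 = morton_scan(4)
```

In the 4x4 scan, the first four cells form the same 2x2 pattern as the 2x2 scan, and that pattern then repeats block by block, which is the level-by-level construction described above.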


Raster Compression Contd

Lossless compression vs lossy compression
Can you reproduce exactly the original data from the compressed data?
Zip (2-5:1)
GIF (2-4:1), JPEG (10-40:1), MPEG (50:1)
ECW, MrSID, etc.
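
The test for losslessness is a round trip: compress, decompress, compare. The Python sketch below uses the standard `zlib` module (the DEFLATE method also used by Zip) on a hypothetical 8x8 raster. The `quantise` function is only a toy stand-in for a lossy step and is not how JPEG or MrSID work.

```python
import zlib

# Hypothetical 8x8 raster, one byte per cell, stored in row order
raster = bytes([3] * 40 + [7] * 24)

# Lossless: the original bytes come back exactly
packed = zlib.compress(raster)
restored = zlib.decompress(packed)

def quantise(data, step):
    """Toy lossy step: round every value down to a multiple of step."""
    return bytes((value // step) * step for value in data)

coarse = quantise(raster, 4)   # 3 becomes 0 and 7 becomes 4
```

After quantising, the original values 3 and 7 cannot be recovered from `coarse`, which is what makes the step lossy.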


Raster Data Structures

Advantages
Simple data structures
Location-specific manipulation of attribute
data is easy
Many kinds of spatial analysis and filtering
may be used
Mathematical modeling is easy because all
spatial entities have a simple, regular shape


Raster Data Structures

Disadvantages
Large data volumes
Using large grid cells to reduce data volumes
reduces spatial resolution; loss of information
& inability to recognize phenomena that have
logically defined structures
Crude raster maps are inelegant though graphic
elegance is becoming less of a problem
Coordinate transformations are difficult and time
consuming unless special algorithms are employed


Choices: Raster vs. Vector


THANK YOU