

Data Preparation and Integration

06/02/17 Ron Briggs, UTDallas GISC 6381 GIS Fundamentals
Data Preparation and Integration: the necessary steps
Geocoding: assigning geographic coordinates to points
Perhaps the most basic form of spatial data entry
data media conversion
data format conversion
raster & vector
data reduction
Topology, error detection and topological editing
rectification and registration (one on top of the other)
overlaying sheets and referencing to the real world
edge matching & image adjustment (side by side)
linking & balancing adjacent sheets

Geocoding: assigning spatial coordinates to point data
Address Matching assigns spatial coordinates (explicit location) to addresses (implicit location)
Address matching requires a street network file with street attribute information (street name and number range) for all street segments (block sides)
Zone variable required if data spans multiple cities (to handle duplicated street names)
precise matching of street names can be problematic
completeness (esp. for new streets) important
PO boxes, building names, and apartment complex names cause problems.
Implementation in ArcGIS is a 3-step process:
In ArcToolbox (9.2), process street network file to create a Geocoding Service
In ArcMap, load appropriate geocoding service via Tools/Geocoding/Services Manager
In ArcMap, geocode a table of addresses using Tools/Geocoding/Geocode Addresses
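To make the role of the street name, zone, and number range concrete, here is a minimal sketch of address matching by linear interpolation along a street segment. The `StreetSegment` layout, the field names, and the sample coordinates are all assumptions for illustration; it is not how the ArcGIS geocoding service is implemented, and it ignores odd/even block sides and fuzzy name matching.

```python
from dataclasses import dataclass

@dataclass
class StreetSegment:
    """One street segment of a street network file (hypothetical layout)."""
    name: str
    zone: str          # e.g. city or ZIP code, separates duplicated street names
    from_number: int   # house number at the start of the segment
    to_number: int     # house number at the end of the segment
    start_xy: tuple
    end_xy: tuple

def geocode(number, name, zone, segments):
    """Return an interpolated (x, y) for an address, or None if nothing matches."""
    for seg in segments:
        low, high = sorted((seg.from_number, seg.to_number))
        if (seg.name.upper() == name.upper() and seg.zone == zone
                and low <= number <= high):
            span = seg.to_number - seg.from_number
            # Fraction of the way along the segment (midpoint if the range is one number)
            t = 0.5 if span == 0 else (number - seg.from_number) / span
            x = seg.start_xy[0] + t * (seg.end_xy[0] - seg.start_xy[0])
            y = seg.start_xy[1] + t * (seg.end_xy[1] - seg.start_xy[1])
            return (x, y)
    return None

# Two segments with the same street name in different zones (made-up data)
segments = [
    StreetSegment("MAIN ST", "75080", 100, 198, (0.0, 0.0), (100.0, 0.0)),
    StreetSegment("MAIN ST", "75201", 100, 198, (500.0, 500.0), (500.0, 600.0)),
]
print(geocode(149, "Main St", "75080", segments))  # (50.0, 0.0)
print(geocode(300, "Main St", "75080", segments))  # None: number outside every range
```

The second call shows why completeness of the street file matters: an address outside every stored number range simply fails to match.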
Point Location Files containing lat/long or x,y coordinates (e.g. derived via GPS)
bring table (e.g. in .csv or .dbf format) into ArcGIS using the Add Data icon
Right click table name in the table of contents (T of C) and select Display X,Y data
Displays as event layer. Export to shapefile or gdb feature class to create a spatial data set.
Input table must contain 3 variables at minimum: Feature ID, x, y
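As a rough illustration of the minimum table structure described above (feature ID, x, y), the following sketch reads a small CSV table into point records. The column names `id`, `x`, `y` and the sample coordinates are assumptions; ArcGIS lets you pick the fields interactively.

```python
import csv
import io

# A made-up table with the three required variables
TABLE = """id,x,y
1,-96.75,32.98
2,-96.80,32.78
"""

def read_points(text, id_field="id", x_field="x", y_field="y"):
    """Parse a CSV table into a list of point records with numeric coordinates."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        points.append({
            "id": row[id_field],
            "x": float(row[x_field]),
            "y": float(row[y_field]),
        })
    return points

for point in read_points(TABLE):
    print(point)
```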
Data Media Conversion--Scanning:
automated recording of map or aerial
Produces dumb raster data
--vectorize using conversion software
--create smart image using digital image processing techniques
electromechanical
--$100-$50,000 instruments
--drum or flatbed
--scan resolution depends on price! down to 20 microns (millionths of a meter)
Scanners v. sensors
--Sensors collect data directly in digital form (e.g. digital cameras)
--Sensor resolution now (2005>) matches that of photos, so scanning photos is becoming old technology
--Still lots of paper maps around, e.g. property ownership records
Digital image processing techniques used to create smart raster
--Identify feature type within each raster
Great if need only raster representation
Automated creation of vector data from scanning very problematic:
--docs must be clean
--complex line work adds error
--lines shouldn't be broken with text
--text may be interpreted as lines
--automatic feature detection (road versus railroad) difficult
ESRI's ArcScan for ArcGIS (included with ArcEditor) provides interactive, semi-automated raster to vector conversion.
Other vendors offer specialized conversion software
Data Media Conversion--Digitizing:
manually tracing a map or aerial
Applied to map or aerial photo
Use hard copy map/photo on table/tablet, or scanned image on screen (heads-up digitizing)
pen or cursor detects x, y coords
coordinates are in inches/cms from lower left (0,0)
control points (tic marks) relate digitized coordinates to real world lat/long coordinates
coordinates captured in stream or point mode
accuracy of table (but not user!) usually better than 0.1 mm
all nodes and polygons should be marked and numbered first
essentially a vector approach
paper maps unstable
--crease and fold
--stretch with humidity (up to 3%)
photos more stable (0.2%)
map errors transferred to GIS
--maps often prepared for display, not accuracy
human hand very shaky
--often generates undershoots, overshoots, & double lines
editing and clean-up essential

Data Format Conversion:
Vector and raster: 4 possibilities
Vector to Vector
--e.g. converting a whole polygon structure (e.g. SAS map data) to another vector structure
--computationally intense
--no accuracy loss providing data is clean
--perfectly transitive
raster to raster
--may involve resampling (see under data reduction)
--may involve conversion between different vendors' raster formats (e.g. GRID to BIL)
vector to raster: point
--node x,y assigned to closest raster cell
--locational shift almost inevitable; error depends on raster size
--two points in one cell indistinguishable
--not transitive; cannot retrieve original data without error
vector to raster: line
--cells assigned if touched by line
--stair step appearance of diagonal lines (called aliasing)
--can be visually improved through anti-aliasing: brightness of cells varied based on fraction of cell covered by the line
raster to vector
--by far the most difficult
Transitive: the ability to reproduce the original data after conversion.
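The loss of transitivity in point-to-raster conversion can be seen in a few lines. This is a minimal sketch with made-up coordinates and a hypothetical 10-unit cell size; "closest raster cell" is taken here to mean the cell that contains the point.

```python
import math

def to_cell(x, y, cell_size, origin=(0.0, 0.0)):
    """Return the (row, col) of the raster cell containing the point."""
    col = int((x - origin[0]) // cell_size)
    row = int((y - origin[1]) // cell_size)
    return row, col

def cell_center(row, col, cell_size, origin=(0.0, 0.0)):
    """Return the (x, y) of a cell center: the best we can do going back to vector."""
    return (origin[0] + (col + 0.5) * cell_size,
            origin[1] + (row + 0.5) * cell_size)

cell_size = 10.0
points = [(12.0, 7.0), (18.0, 3.0), (31.0, 22.0)]

for p in points:
    cell = to_cell(p[0], p[1], cell_size)
    back = cell_center(cell[0], cell[1], cell_size)
    shift = math.dist(p, back)
    print(p, "-> cell", cell, "-> back to", back, "shift", round(shift, 2))
```

The first two points fall in the same cell and become indistinguishable, and every point comes back shifted by up to half the cell diagonal, which is why the error depends on raster size.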

Vector to Raster Conversion
[Figure: a point, an orthogonal line, and a diagonal line, each shown in raster form. Note the use of anti-aliasing to improve the raster lines.]

Raster to Vector Data Conversion: 3-step process
skeletonizing (or thinning): to reduce rasters to unit width
--peeling approach successively removes outer edges
--medial axis approach determines set of interior pixels farthest from outer edges
vector extraction: to identify lines
--4-connected reconstruction
  joins center points of 4-connected neighbors if present
  particularly bad for diagonal line reproduction
--8-connected reconstruction
  joins center points of 8-connected neighbors if present
  diagonal lines reproduced but adds extra lines
--8-connected reconstruction with redundancy elimination
  if 4-connected neighbor line exists, don't draw diagonal
  reduces redundant lines
topological reconstruction: recreates topological structure
--create nodes at line junctions
--construct arcs
--define polygons (manual designation required)
Available via an extension for ArcGIS, as well as via several specialized packages from other vendors
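The three vector extraction variants can be compared with a small sketch. It assumes the raster has already been thinned to unit width and is given as a set of (row, col) cells. The rule used for redundancy elimination (skip a diagonal when the two cells are already linked through a shared 4-connected neighbor) is one interpretation of the slide, not a reference implementation.

```python
ORTHOGONAL = [(0, 1), (1, 0)]    # right and down, so each pair is visited once
DIAGONAL = [(1, 1), (1, -1)]     # down-right and down-left

def extract_segments(cells, mode="8-elim"):
    """Join centers of neighboring cells; mode is '4', '8', or '8-elim'."""
    cells = set(cells)
    segments = set()
    for (r, c) in cells:
        for dr, dc in ORTHOGONAL:
            if (r + dr, c + dc) in cells:
                segments.add(((r, c), (r + dr, c + dc)))
        if mode == "4":
            continue
        for dr, dc in DIAGONAL:
            neighbor = (r + dr, c + dc)
            if neighbor not in cells:
                continue
            # Redundancy elimination: the pair is already connected by
            # orthogonal lines through a shared 4-connected neighbor
            if mode == "8-elim" and ((r + dr, c) in cells or (r, c + dc) in cells):
                continue
            segments.add(((r, c), neighbor))
    return segments

diagonal_line = [(0, 0), (1, 1), (2, 2)]
staircase = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]

for name, cells in [("diagonal", diagonal_line), ("staircase", staircase)]:
    for mode in ("4", "8", "8-elim"):
        print(name, mode, len(extract_segments(cells, mode)))
```

On the pure diagonal, 4-connected reconstruction finds no segments at all, while on the staircase plain 8-connected reconstruction adds three extra diagonals that the redundancy check removes.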

Raster to Vector Conversion

For example, go to:

Raster to Vector Conversion: Vector Extraction
4-connect reconstruction
[Figure: raster cells and the vector lines extracted from them]
4-connect reconstruction: search the 4 surrounding cells and join center points if present

Raster to Vector Conversion: Vector Extraction
8-connect reconstruction
[Figure: raster cells and the vector lines extracted from them]
8-connect reconstruction: search the 8 surrounding cells and join center points if present.

Raster to Vector Conversion: Vector Extraction
8-connect reconstruction with redundancy elimination
[Figure: raster cells and the vector lines extracted from them]
8-connect with redundancy elimination: draw diagonal from 8-cell search only if not already connected by orthogonal from 4-cell search

Data Format Conversion Implementation in ArcGIS 9
From Raster To Raster
--ArcToolbox>Conversion Tools>To Raster>Raster To Other (multiple)
--Converts one or more raster dataset formats supported by ArcGIS to a GRID, IMAGINE, TIFF, or geodatabase raster dataset format
--Can also be accomplished thru ArcCatalog, Export function
From Raster To Vector
--ArcToolbox>Conversion Tools>From Raster>Raster to Point, Raster to Polygon, Raster to PolyLine
--Converts raster datasets in GRID, IMAGINE, or TIFF formats to shapefiles or feature classes
--Results may not be what you expect!
--Can also be accomplished thru ArcCatalog, Export function
From Vector To Raster
--ArcToolbox>Conversion Tools>To Raster>Feature to Raster
--Converts any shapefile, coverage, or geodatabase feature class containing point, line, or polygon features to a raster dataset
--Can also be accomplished thru ArcCatalog, Export function
From Vector To Vector
--Use ArcCatalog, Export function for conversions between shapefiles, gdb feature classes, coverages and CAD
--ArcGIS Data Interoperability Extension for the most comprehensive set of conversions

Data Reduction
conserve space
--Disk in past
--Comm. bandwidth today
conserve time
--reduce processing time (batch)
--speed response time (interactive)
Resampling (raster data)
--average the 4 values in a 2-by-2 neighborhood
--use this 1 value in a single cell occupying the location of the 4 original cells
--use mean for interval data; rules required for ordinal or nominal data
--not transitive!
[Figure: a 2-by-2 neighborhood with values 3, 7, 2, 4 is replaced by a single cell with value 4; figure labels: 16 bytes, 4 bytes, 1 byte]
Thinning (vector data)
--often applied to data digitized in stream mode
--tolerance elimination: remove nearest-neighbor points which are too close (e.g. output device resolution insufficient to distinguish)
--topological elimination*: remove points unnecessary for topo structure
--model-based elimination: fit polynomial by least squares and record fewer points along its path
*Normally uses the Douglas/Poiker (or Peucker) algorithm: David H. Douglas & Thomas K. Peucker, "Algorithms for the reduction of the number of points required to represent a digitized line or its caricature," Canadian Cartographer, 1973
Implement in ArcGIS via Advanced Editing toolbar, Generalize tool
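For readers who want to see the thinning idea at work, below is a compact sketch of the Douglas-Peucker algorithm cited above. The sample line and the tolerance are made up, and the behavior of the ArcGIS Generalize tool may differ in detail.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def douglas_peucker(points, tolerance):
    """Keep only the points needed to stay within tolerance of the original line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]          # every interior point can be dropped
    # Otherwise keep the farthest point and simplify each half recursively
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

line = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
print(douglas_peucker(line, 0.1))
# [(0.0, 0.0), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
```

Only the point at (1.0, 0.05), which deviates by less than the tolerance, is removed; the sharp peak at (3.0, 2.0) is preserved.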
Topology & Errors
Topology --knowledge about relative spatial positioning
--spatial relationships between features and rules about these
--managing data cognizant of shared geometry
Implies knowledge of the three Cs:
connectivity (linked):
congruency (coincident/same as/on top of)
contiguity (adjacent)
It is critical that spatial data be created and managed so that it is topologically clean--free from topological errors
--editing must always aim to maintain topological structure
In topological editing, changes made to one feature (line, polygon, etc.) are also
reflected in all other features to which it is connected, coincident, or adjacent
In the classic GIS data structure model (as discussed in the GIS Data Structures lecture) this implies that, for example
--all arcs have nodes at end points
--there is a node wherever arcs intersect or connect
--a single arc forms the border between contiguous polygons (e.g. Dallas and Tarrant counties)
--a single arc represents a common boundary (e.g. state and county boundary)
Errors: detection and removal
GIS packages commonly use topological structure checking
to detect errors
Editing based on node snapping used to correct errors:
--snapping: moving a feature so its coordinates correspond exactly with those of another feature
--snapping conducted based on tolerances: snap if within 1 foot, for example
Care must always be taken to assure that topological cleaning does not itself introduce errors (e.g. snapping together nodes and lines which shouldn't be)
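A minimal sketch of tolerance-based node snapping follows. The rule used (snap each node to the first earlier node within the tolerance) and the coordinates are assumptions chosen for illustration; real GIS packages use more elaborate rules.

```python
import math

def snap_nodes(nodes, tolerance):
    """Snap each node to the first earlier node lying within the tolerance."""
    snapped = []
    for x, y in nodes:
        for sx, sy in snapped:
            if math.hypot(x - sx, y - sy) <= tolerance:
                x, y = sx, sy   # coordinates now correspond exactly
                break
        snapped.append((x, y))
    return snapped

nodes = [(0.0, 0.0), (10.0, 0.0), (10.4, 0.3), (25.0, 0.0)]

print(snap_nodes(nodes, 1.0))
# [(0.0, 0.0), (10.0, 0.0), (10.0, 0.0), (25.0, 0.0)]

print(snap_nodes(nodes, 20.0))
# [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (25.0, 0.0)]
```

With a tolerance of 1 unit only the near-duplicate node is corrected. With a tolerance of 20 units, distinct nodes are collapsed together, which is the kind of error the cleaning step itself can introduce.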

Topological errors or real world occurrences?
common problems
dangling arc (node missing at one end)
No node at arc intersection (overpass?)
Overshoot (or missing node)?
pseudo node (but perhaps road surface

pseudo arc (connects to itself)

open polygon

Sliver polygon


How ArcGIS Handles Topology
The original Coverage data model, introduced with
ArcInfo in 1981, incorporated topology as a part of the
The CLEAN command checked for, and automatically fixed,
topological errors based on a set tolerance
It could introduce errors into the data
The BUILD command then rebuilt polygon structures
ArcGIS 8.3 introduced the concept of topological rules for
geodatabases in which the topological relationships are
stored as a topology feature class separate from the data
The user can generate an error report, review each error, and then
fix it in the data if desired, or mark it as an exception

Georeferencing: Rectification and Registration
providing true earth location/overlaying layers
rectification: rearrangement of location of objects to correspond to a specific reference system (usually geodetic)
registration: rearrangement of location of objects of one set so they correspond with those of another, without reference to a specific reference system
Despite formal difference, the terms are often used interchangeably
Two methods
--homogeneous transformation via rotation, translation, scaling, skewing
  used for map projection and similar conversions
--differential transformation via rubber sheeting
  used to correctly position distorted images or scanned maps or documents
Most commonly used to relate images (e.g. scanned photo) to a vector layer, but can also be used to fix incorrect positioning of features in a vector layer
Implemented in ArcMap:
--via the Georeferencing toolbar for images
--via the Spatial Adjustment toolbar for vector layers
(homogeneous conversion)
translation of origin
--from digitizer origin for sheet
--to true origin of GIS file
rotation of axis
--e.g. to true north
scaling of axis
--differential (ovals to circles)
skewing of axis
Changing map projections may involve all 4
[Figure panels: translation, differential scaling, rotation]
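The four operations can be combined into a single transformation applied to every coordinate. The sketch below applies them in one particular order (scale, skew, rotate, translate); that order, the parameter names, and the sample values are assumptions for illustration only.

```python
import math

def make_transform(tx=0.0, ty=0.0, angle_deg=0.0, sx=1.0, sy=1.0, skew_deg=0.0):
    """Build a function applying scaling, skewing, rotation, then translation."""
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    shear = math.tan(math.radians(skew_deg))

    def transform(x, y):
        x, y = x * sx, y * sy                                # scaling of axes
        x = x + shear * y                                    # skewing of x axis
        x, y = x * cos_a - y * sin_a, x * sin_a + y * cos_a  # rotation
        return (x + tx, y + ty)                              # translation of origin

    return transform

# Hypothetical case: digitizer inches scaled by 2, rotated 90 degrees,
# then shifted to a "true" origin at (100, 200)
to_world = make_transform(tx=100.0, ty=200.0, angle_deg=90.0, sx=2.0, sy=2.0)
print(to_world(1.0, 0.0))   # approximately (100.0, 202.0)
```

Setting `sx` and `sy` to different values gives the differential scaling mentioned above (ovals to circles).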
Rubber Sheeting
(differential conversion)
GIS file is differentially stretched so that tic points in file overlay corresponding ground control (tie) points on earth's surface (or tic points in a second file)
polynomial fitted by least squares between known ground control coords and tic point coords in GIS
--least squares minimizes the sum of the squared distances between tic/tie pairs
derived parameters then applied to all points in each file
after conversion, tic points are on average closer to ground control points, but not identical
can't do this with a paper map!
ground control (tie) points
--the more the better
--well distributed
--known lat/long of ground control tie points (usually obtained from GPS) and map locations (tic) needed for rectification
--common identifiable coordinates in file needed for registration
[Figure labels: ground control (tie), GIS file]
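The least squares step can be sketched for the simplest case, a first-order polynomial X = a0 + a1·x + a2·y (and likewise for Y). This is only an illustration under that assumption: true rubber sheeting uses higher-order or piecewise functions to stretch differentially, and the tic/tie coordinates below are invented. The fit needs at least three tic points that are not all on one line.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_first_order(tics, ties):
    """Least squares fit of tie = c0 + c1*x + c2*y, separately for X and Y."""
    rows = [(1.0, x, y) for x, y in tics]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in (0, 1):   # 0 = X coordinate of tie points, 1 = Y coordinate
        rhs = [sum(r[i] * tie[k] for r, tie in zip(rows, ties)) for i in range(3)]
        coeffs.append(solve3(normal, rhs))
    return coeffs

def apply_fit(coeffs, x, y):
    """Apply the derived parameters to any point in the file."""
    (a0, a1, a2), (b0, b1, b2) = coeffs
    return (a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y)

def rms_error(coeffs, tics, ties):
    """Root mean square distance between transformed tics and their ties."""
    total = 0.0
    for (x, y), (tx, ty) in zip(tics, ties):
        fx, fy = apply_fit(coeffs, x, y)
        total += (fx - tx) ** 2 + (fy - ty) ** 2
    return math.sqrt(total / len(tics))

tics = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
ties = [(1000.0, 2000.0), (1020.0, 2000.0), (1000.0, 2020.0), (1023.0, 2018.0)]

coeffs = fit_first_order(tics, ties)
print(apply_fit(coeffs, 5.0, 5.0))
print(rms_error(coeffs, tics, ties))
```

Because one tie point is deliberately inconsistent with the others, the RMS error is greater than zero: after conversion the tic points are closer to the ground control points on average, but not identical.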
Edge Matching:
Joining map sheets to create a seamless GIS
required for topo. consistency even if features line up visually
snapping used to connect features
Issues
--acceptable tolerance before further investigation of mismatch
--how far back to go on sheet(s) with adjustments for mismatch
Causes of mismatch
--paper map shrinkage/expansion
--errors from digitizing/scanning
--georeferencing errors
--accuracy of equipment
--extrapolation or round-off errors
--overlapping map coverage
[Figure: corresponding features fail to match on two sheets. Edge matching in this example would likely require further research.]
Implement in ArcGIS 9 by:
1. ArcToolbox>Data Management>General>Append
   (replaces Geoprocessing Tools>Merge in AG 8)
   combines two (or more) files, but does not link features
2. Spatial Adjustment toolbar, edge match tool
   links features (after links have been manually identified)
Image Adjustments
raster/image data issues
Raster data is made from separate images (photos) or tiles which are mosaiced to produce seamless
Collars: must be removed for seamless image
Overlap between adjacent images
Borders of scanned maps
Image Balancing and Feathering: adjusting radiometry for consistent and/or desired image
color, brightness, contrast
Checker board appearance
Abrupt line between adjacent images
Brightness levels wash out detail in highly reflective areas, but enhance detail in low
reflectance areas
Inconsistent signature for same features, especially water as function of wind or sun relative to
camera (and is it blue?)
Digital Ortho adjustments:
Ground control (usually with GPS for visible points) to obtain real world location
Ground control for camera's angle relative to ground
Camera calibration data to remove lens distortion
Digital terrain model (dtm) to remove elevation distance
(5 mi. on map to mountain top, but 6 mi walking or on photo if mountain is 5,280 feet high!)

Collar removal required.
[Figure: tiles before and after collar removal; 2005 NCTCOG Digital Orthos]

Interpolation: to create regular spacings from irregular data
(e.g. creating raster elevation surface from set of point height measurements)
estimating values for locations with no data based on:
--known values, and
--understanding of spatial behavior of phenomena
generally, should assign more importance to closer known values than those further away
weighting functions
--average closest n (2?) points: ignores distance
--fit line between closest 2
--fit surface between closest 3
trend surface approaches
--one high order polynomial
--oscillation a problem
finite element approach: fit separate polynomials for each local area
kriging: uses correlations of values with distance
[Figure label: Estimated values]
Implemented in ArcGIS 9 via ArcToolbox>Spatial Analyst Tools>Interpolation
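One simple way to give closer known values more importance is inverse distance weighting (IDW). IDW is not named on the slide, so treat this as a swapped-in illustration of the weighting idea, with invented sample heights and a hypothetical power of 2.

```python
import math

def idw(known, x, y, power=2.0, n_closest=None):
    """Estimate a value at (x, y) from (x, y, value) samples by inverse distance weighting."""
    dists = []
    for kx, ky, value in known:
        d = math.hypot(x - kx, y - ky)
        if d == 0.0:
            return value              # exactly on a sample: use it directly
        dists.append((d, value))
    dists.sort(key=lambda dv: dv[0])
    if n_closest is not None:
        dists = dists[:n_closest]     # optionally use only the closest n points
    weights = [(1.0 / d ** power, v) for d, v in dists]
    return sum(w * v for w, v in weights) / sum(w for w, _ in weights)

def interpolate_grid(known, x0, y0, cell_size, n_rows, n_cols, power=2.0):
    """Fill a regular grid by estimating a value at each cell center."""
    return [[idw(known, x0 + (c + 0.5) * cell_size, y0 + (r + 0.5) * cell_size, power)
             for c in range(n_cols)]
            for r in range(n_rows)]

heights = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]   # made-up point height measurements
print(idw(heights, 5.0, 0.0))    # 15.0, equally far from both samples
print(idw(heights, 2.0, 0.0))    # closer to 10 than to 20
grid = interpolate_grid(heights, 0.0, 0.0, 5.0, 2, 2)
```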
create new master coverage from the best spatial and attribute qualities
of two or more source coverages
combine multiple coverages into one to simplify support
updated data obtained (e.g. new TIGER file) but need to preserve
enhancements made to earlier version
two groups modify a single file, then need to recreate single version
which preserves mods
create new master coverage from quality spatial data in one source and
quality attribute data in another
somewhat narrower definition
Depending on the situation, can require application of a variety of
processing tools and can be labor intensive:
Approaches available within ArcGIS 9 include
Spatial Adjustment toolbar, specifically attribute transfer tool
ArcToolbox>Analysis Tools>Overlay>Update
other add-ins available such as
MapMerge from ESEA, Mountain View CA for ArcGIS
GIS/T-Conflate for transportation applications
NAVSTAR Global Positioning System (gps)
use to collect ground control for imagery/orthos or for point/line data (manholes, roads, etc)
NAVSTAR Satellite Program
--24 (NAVigation Satellite Time and Ranging) satellites in 11,000 mile orbit provide 24 hour coverage worldwide
--first launched 1978; full system operational December 1993
--gps receiver computes locations/elevations via signals from simultaneously visible satellites (minimum 3 for 2-D, 4 for 3-D)
Selective Availability (SA) security system
--100m accuracy with single receiver, if active
--10-15m accuracy if inactive
--SA turned off May 1st, 2000
  Multiple ways to counteract SA
  Even USCG broadcasted correction signal!
  Europeans threatened to compete
  Regional denial of signal possible
Russia's 21-satellite GLONASS (Global Navigation Satellite System) also available.
Types of Ground Collection and Correction
Autonomous
--Hand-held unit provides 10m accuracy (with SA off)
--$150-$1,500 per unit
WAAS (wide area augmentation system)
--<3 meter accuracy in practice (spec. is 7m vert/horiz)
--Base stations (25 across US) monitor satellites
--2 master stations (E & W coast) calculate corrections
--upload to two geosynchronous satellites over equator
--correction signal broadcast to GPS receivers (no special extra equipment needed unlike DGPS)
--Began operation June, 1998
--To be expanded to cover Canada, Mexico, Panama
--European EGNOS, Asian MSAS under development
Differential (DGPS-predecessor to WAAS)
--accuracy 1-5m depending on equipment/exact method
--equipment $1,500-$15,000 per receiver
--correct for SA and other errors via either
  real time correction signals over FM radio
  post process with data from Internet
Kinematic:
--high accuracy engineering (within cms)
--two receivers (base station and rover)
--must lock-on to satellites
--equipment $15-30K per station
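The idea behind differential correction can be shown with a deliberately simplified sketch: a base station at a known location measures its own GPS position, and the difference is applied to a nearby rover. Real DGPS corrects the individual satellite range measurements rather than final positions, so this is only a conceptual illustration with invented coordinates.

```python
def differential_correct(base_known, base_measured, rover_measured):
    """Shift the rover fix by the error observed at the base station."""
    dx = base_known[0] - base_measured[0]
    dy = base_known[1] - base_measured[1]
    return (rover_measured[0] + dx, rover_measured[1] + dy)

base_known = (1000.0, 2000.0)      # surveyed base station position (made up)
base_measured = (1004.0, 1997.0)   # what GPS reported at the base
rover_measured = (1504.0, 2497.0)  # what GPS reported at the rover at the same time

print(differential_correct(base_known, base_measured, rover_measured))
# (1500.0, 2500.0)
```

The correction only helps to the extent that base and rover experience the same errors at the same time, which is why the correction is delivered in real time or matched by time in post-processing.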
Factors Affecting GPS Accuracy
worst in evening at low altitudes (but ephemeris best there)
especially water vapor which slows signal
reflected signals from buildings, cliffs, etc.
position and number of satellites in sky
--4 required for 3D (horiz. and vertical), 3 for 2D (no elevation)
--ideally, 3 every 120° around horizon with 20° elev., 1 directly above
blockage (of satellite signal)
--by foliage, buildings, cliffs, etc.
--WAAS signal esp. subject to blocking by terrain & buildings because it is from a geostationary equatorial satellite

Overall, accuracy better at night than during day.


Most of the effort in most GIS projects involves data preparation and integration!
