For a GIS to be useful it must be capable of receiving and producing information in an effective manner.
The data input and output functions are the means by which a GIS communicates with the world outside.
The objective in defining GIS input and output requirements is to identify the mix of equipment and methods needed to
meet the required level of performance and quality. No one device or approach is optimum for all situations.
DATA INPUT: The procedure of encoding data into a computer-readable form and writing the data to the GIS database.
Data entry is usually the major bottleneck in implementing a GIS. The initial cost of building the database is commonly 5
to 10 times the cost of the GIS hardware and software.
The creation of an accurate and well-documented database is critical to the operation of the GIS.
Accurate information can only be generated if the data on which it is based were accurate to begin with.
Data quality information includes the date of collection, the positional accuracy, completeness, and the method used to
collect and encode the data. (Discussed in detail in Ch. 5)
There are two types of data to be entered into a GIS: Spatial data and the associated non-spatial attribute data.
Keyboard entry: involves manually entering the data at a computer terminal. Attribute data are commonly input by
keyboard whereas spatial data are rarely input this way.
Keyboard entry may also be used during manual digitizing to enter the attribute information. However this is usually more
efficiently handled as a separate operation.
Roads files versus the census file -- roads file will use codes for the various road types while the census file uses exact
numbers for things like total population, age range, etc.
Coordinate Geometry (COGO): involves entering survey data using a keyboard. From these data the coordinates of spatial
features are calculated. This produces the very high level of precision and accuracy needed in a cadastral system.
For a city with 100,000 parcels, manual digitizing would cost approximately $1 to $1.50 per parcel, or $100,000 to
$150,000 in total. COGO procedures are commonly 6 times, and can be up to 20 times, more expensive than manual
digitizing.
Surveyors and engineers want the higher accuracy of COGO for their applications. Planners and most others are happy
with the lower accuracy provided by manual digitizing.
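The coordinate calculation behind COGO entry can be sketched as a traverse computation. This is a minimal sketch, assuming bearings measured clockwise from north; the function name, starting coordinate, and survey legs are illustrative, not taken from any particular COGO package.

```python
import math

def cogo_traverse(start, legs):
    """Compute coordinates along a survey traverse.

    start: (easting, northing) of the starting point
    legs: list of (bearing_degrees, distance) pairs, bearing measured
          clockwise from north, as recorded in survey field notes
    Returns the list of computed coordinates, including the start.
    """
    e, n = start
    points = [(e, n)]
    for bearing, dist in legs:
        rad = math.radians(bearing)
        e += dist * math.sin(rad)   # easting change
        n += dist * math.cos(rad)   # northing change
        points.append((round(e, 3), round(n, 3)))
    return points

# A hypothetical square parcel, 100 m on a side, starting at (1000, 1000):
corners = cogo_traverse((1000.0, 1000.0),
                        [(0, 100), (90, 100), (180, 100), (270, 100)])
```

A closed traverse should return to its starting coordinate, which is one way surveyors check the precision that makes COGO attractive for cadastral work.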
Manual Digitizing: The most widely used method for entering spatial data from maps. The map is mounted on a digitizing
tablet and a hand held device termed a puck or cursor is used to trace each map feature. The position of the puck is
accurately measured by the device to generate the coordinate data.
Digitizing surfaces range from 12 x 12 inches (digitizing tablet) to 36 x 48 inches (digitizing table) and on up.
The digitizing table electronically encodes the position of the pointing device with a precision of a fraction of a
millimeter.
The most common digitizer uses a fine wire mesh grid embedded in the table. The cursor normally has 16 or more buttons
that are used to control data entry and to enter attribute data.
V. Drake 1
SMC
The digitizing operation itself requires little computing power and so can be done without using the full GIS. A smaller,
less expensive computer can be used to control the digitizing process and store the data. The data can later be
transferred to the GIS for processing. The drawback is that digitizing software must then be provided for each of these computers.
The efficiency of digitizing depends on the quality of the digitizing software and the skill of the operator. The process
of tracing lines is time-consuming and error prone. The software can provide aids that substantially reduce the effort of
detecting and correcting errors.
Attribute information may be entered during the digitizing process, but usually only as an identification number. The
attribute information referenced to the same ID number is entered separately.
Manual digitizing is a tedious job. Operator fatigue (eye strain, back soreness, etc.) can seriously degrade the data
quality. Managers must limit the number of hours an operator works at one time. A commonly used quality check is to
produce a verification plot of the digitized data that is visually compared with the map from which the data were
originally digitized.
Scanning: Scanning provides a faster means of data entry compared to manual digitizing.
In scanning, a digital image of the map is produced by moving an electronic detector across the surface of the map.
There are two types of scanner designs:
Flat-bed scanner: On a flat-bed scanner the map is placed on a flat scanning stage and the detectors move across the
map in both the X and the Y directions (similar to copy machine).
Drum scanner: On a drum scanner, the map is mounted on a cylindrical drum which rotates while the detector moves
horizontally across the map. The sensor motion provides movement in the X direction while the drum rotation provides
movement in the Y direction.
The output from the scanner is a digital image. Usually the image is black and white but scanners can record color by
scanning the same document three times using red, green and blue filters.
Inputting existing digital files: Many companies and organizations provide or sell digital data files, often in a format
that can be read directly into a GIS. These digital data sets are priced at a fraction of the cost of digitizing existing
maps.
Over the next decade, the increased availability of data should reduce the current high cost and lengthy production
times needed to develop digital geographic data bases.
Redrafting is often considered to be a major disadvantage of the scanning option. Redrafting, although time consuming,
does not necessarily add to the cost of the data conversion process. Redrafting can reduce the total cost of both
scanning and manual digitizing. For example, studies by the US Forest Service have shown that a "map preparation" step
before the manual digitizing is done can reduce the overall digital encoding costs by as much as 50%.
WHY?
1. The redrafting is done manually, not on a computer system and therefore costs are not incurred for the computer time
or the higher salaries of computer operators.
2. The digitizing operation proceeds much more quickly and requires less editing if the map has fewer errors and
inconsistencies. Faster completion of the digitization and editing functions reduces the amount and therefore the costs
of expensive computer system and computer operator time.
3. When inconsistencies on the map must be worked out, manual drafting is faster and more efficient than resolving them
during digitizing; the two tasks require different skills and are not equivalent.
4. It is very time consuming and therefore very costly to make large numbers of changes to a map once it is in digital
form.
While a scanning system is for the most part automated and requires less highly trained personnel, more complex
equipment must be maintained, more sophisticated software must be written or purchased, and there are more steps in
the process.
Scanners are more expensive than digitizing tables. A 60 x 44 inch digitizing table can cost between $3000 and $8000.
A high quality scanner will cost $100,000. The higher equipment costs can be justified if there is a great deal of
production that needs to be done.
Most GIS software packages include a digitizing software capability, but separate special-purpose software is needed to
operate a scanning system. Scanning works best with maps that are very clean, simple, and do not contain extraneous
information. Scanning is most cost-effective for maps with large numbers of polygons (1000 or more) and maps with a
large number of irregularly shaped features such as lines and odd polygons.
Manual digitizing tends to be more cost-effective when there are relatively few maps or when the maps are not in a form
that can be scanned. Maps that require a great deal of interpretation are better digitized manually than scanned.
There is a strong demand for faster, more cost-effective data entry methods. Hundreds of computer operators with
thousands of maps are not the answer. Although scanning is unlikely to replace manual digitizing entirely, as more and
more scanners come into use the technology will become better and better.
Since digital data sets are produced to satisfy a wide range of users, the cost of the data, currency and accuracy vary.
The accuracy with which boundaries are drawn, the date of the information, and the method of compilation may be
sufficiently different to create errors when different data layers or adjacent map sheets within a data layer are used
together.
Figure 4.4 p. 112 This is a map produced from the USGS 1:250,000 Land Use/Land Cover digital data set. To generate
this map, the data for two adjacent data sets were joined. Notice the abrupt change in land use categories along a
horizontal line in the center of the map. This change coincides with the boundary between two map sheets from which
the data were digitized. The differences may be a result of discrepancies in airphoto interpretation or of the three year
difference in the source dates of the aerial photography used.
Problems such as these may occur in any digital data set and must be identified and taken into account.
Private companies are also beginning to provide off-the-shelf database products. Although there may be difficulties, the
cost of existing data is usually a fraction of the cost of creating a new data set.
The availability of inexpensive data sets will make GIS technology economically more attractive and easier to implement.
In the US the cartographic community has made a considerable effort to coordinate and standardize the production and
distribution of digital geographic data.
At the federal level, the Federal Interagency Coordinating Committee on Digital Cartography (FICCDC) was formed in
1983 for this purpose. Over 14 organizations participate in the Committee, which holds regular meetings and produces a
newsletter and a variety of reports.
Now we are going to discuss examples of data sets available from these federal agencies.
Graphics format is essentially the line and point features digitized in vector format. In this form, the map can be easily
updated or modified to produce special-purpose maps.
These data sets are well suited for the CAD systems used in digital mapping. However, they are severely limited by the
lack of topological structuring.
A commonly used interchange format is the SIF (Standard Interchange Format) developed by the digital mapping
industry for transferring lines, points, curves, and symbols.
These data sets can be incorporated into a GIS but there can be a lot of problems associated with it. For example, the
data files often have not been checked for topological consistency.
They may contain such inconsistencies as lines that do not meet precisely but overshoot or undershoot the correct
connection point. There may be missing lines or gaps that leave polygons unclosed.
For use in a vector GIS these files must be clean and topologically structured.
Topologically-Structured Format is designed to encode geographic information in a form better suited for spatial
analysis and other geographic studies. Most GISs are designed to use topologically structured data.
The USGS Digital Line Graph (DLG) data set is an example of topologically structured data. This cartographic data set
has been developed from previous mapping efforts at the 1:2 million scale and more recently at the 1:100,000 and
1:24,000 scales.
The older 1:2 million data include transportation, hydrography, and political boundary maps.
The 1:100,000 scale data sets for hydrography and transportation have been completed for the entire US while the
political boundaries and Public Land Survey System are still being developed.
The 1:24,000 series will include the PLSS, political boundaries, transportation, hydrography, and contour data layers. See
Figure 4.5 on page 114
These data sets represent a comprehensive, standardized, inexpensive, and publicly available source of digital information.
The complete coverage (at the 1:100,000 scale) makes it possible to assemble large-area data bases quickly and at a low
cost.
Address Matching is the technique of linking data from separate files by means of a common attribute, the street
address. For example, welfare case records may include the name and the address of each recipient but not the census
tract. The census tract information can be retrieved from the spatial data file by using the address as a key to find the
data in the other file.
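The link described above is essentially a keyed lookup on the common attribute. A minimal sketch follows; the record fields, addresses, and tract numbers are all invented for illustration.

```python
# Hypothetical case records lacking the census tract attribute.
welfare_cases = [
    {"name": "A. Smith", "address": "12 ELM ST"},
    {"name": "B. Jones", "address": "450 OAK AVE"},
]

# Hypothetical spatial data file keyed by street address.
address_to_tract = {
    "12 ELM ST": "0042.01",
    "450 OAK AVE": "0017.03",
}

# The street address is the common attribute (the match key):
for case in welfare_cases:
    case["census_tract"] = address_to_tract.get(case["address"])
```

Real address matching also has to normalize spellings and interpolate within address ranges, which this sketch omits.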
District Delineation is a procedure that defines compact areas based on one or more attributes. For example, it can be
used to divide an area into electoral districts that each have about the same population. Conceptually, this involves
starting at one point and enlarging the area until it encompasses the specified number of people, then a new district is
started and the process is repeated.
The population information would be retrieved from the attribute data file and the information needed to define and
enlarge the district boundaries would be retrieved from the spatial data file.
The district delineation procedure is used to define police and fire service districts, school districts, and commercial
market areas.
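The enlarge-until-target step above can be sketched as a greedy accumulation. Real delineation grows districts spatially from a seed point over adjacent units; the one-dimensional version below, with invented unit names and populations, only illustrates the population-threshold logic.

```python
def delineate_districts(units, target_pop):
    """Walk an ordered list of (unit, population) pairs and close a
    district once its accumulated population reaches the target."""
    districts, current, total = [], [], 0
    for unit, pop in units:
        current.append(unit)
        total += pop
        if total >= target_pop:      # district has enough people
            districts.append(current)
            current, total = [], 0   # start a new district
    if current:                      # leftover units form a final district
        districts.append(current)
    return districts

units = [("A", 400), ("B", 350), ("C", 300), ("D", 500), ("E", 450)]
result = delineate_districts(units, 700)
```

In a GIS the populations would come from the attribute file and the candidate units for enlargement from the spatial file, as described above.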
Network Analysis is used to optimize transportation routing such as bus routes and emergency vehicle dispatching.
This procedure takes into account the length of each transportation segment and facts that affect the speed of travel
or the quantity of material that can be carried. Sophisticated systems can take into account the effects of rush hour
traffic, road closures, and vehicle availability in order to make the best assignment of delivery vehicles and routing.
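The routing optimization can be sketched with Dijkstra's shortest-path algorithm, with each edge weighted by travel time (segment length divided by travel speed). The network, node names, and times below are illustrative.

```python
import heapq

def fastest_route(graph, start, goal):
    """Dijkstra's algorithm over a road network.
    graph: {node: [(neighbor, travel_time), ...]}
    Returns (total_time, path) or (inf, []) if the goal is unreachable.
    """
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        t, node, path = heapq.heappop(queue)
        if node == goal:
            return t, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, cost in graph.get(node, []):
            if nbr not in visited:
                heapq.heappush(queue, (t + cost, nbr, path + [nbr]))
    return float("inf"), []

# Illustrative network, travel times in minutes:
roads = {
    "station": [("A", 4), ("B", 2)],
    "A": [("fire", 5)],
    "B": [("A", 1), ("fire", 9)],
}
total, route = fastest_route(roads, "station", "fire")
```

Rush-hour effects or closures would simply change the edge weights before the search runs.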
Attribute data in the TIGER file include feature names, political and statistical geographic area codes (such as county,
incorporated place, census tract and block number) and potential address ranges, and zip codes for that portion of the
file. The Census Bureau no longer supports the DIME files. The TIGER files can be easily integrated into an existing
GIS data base by file matching, using the geographic area codes as match keys.
One method is to use a variable grid cell spacing to accommodate a variable density of points, with smaller cell sizes
being used to capture the detail in more complex terrain. A second approach has been to use irregularly spaced
elevation points and represent the topography by a network of triangular facets. In this way, elevation data can be
stored and manipulated using a vector representation. The TIN is produced from a set of irregularly spaced elevation
points (SEE FIGURE 4.9). A network of triangular facets is fit to these points. The coordinate positions and elevations
of the three points forming the vertices of each triangular facet are used to calculate such terrain parameters as the
slope and aspect.
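The slope and aspect of a single facet can be sketched from its three vertices using a cross product. This assumes a coordinate convention of x east, y north, z up; the sample facet coordinates are invented.

```python
import math

def facet_slope_aspect(p1, p2, p3):
    """Slope (degrees from horizontal) and aspect (azimuth clockwise
    from north) of one TIN facet given three (x, y, z) vertices."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Facet normal = u x v
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz < 0:                        # keep the normal pointing upward
        nx, ny, nz = -nx, -ny, -nz
    # Slope: angle between the normal and vertical
    slope = math.degrees(math.atan2(math.hypot(nx, ny), nz))
    # Aspect: the horizontal part of the normal points downslope
    aspect = math.degrees(math.atan2(nx, ny)) % 360
    return slope, aspect

# A hypothetical facet dipping toward the east at 45 degrees:
s, a = facet_slope_aspect((0.0, 0.0, 10.0), (0.0, 10.0, 10.0),
                          (10.0, 0.0, 0.0))
```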
The advantage of a TIN compared with a gridded representation is that the TIN can use fewer points, capture the
critical points that define discontinuities like ridge crests, and can be topologically encoded so that adjacency analyses
are more easily done.
A third way to digitally represent a topographic surface is by development of a profile showing the elevation of points
along a series of parallel lines. Elevation values should be recorded at all breaks in slope and at scattered points in level
terrain. If the profiles are constructed from a topographic map, the elevation values can only be taken where the
profile crosses a contour line.
The fourth approach is to digitize contour lines. Here the topographic surface is represented by a series of elevation
points taken along the individual contours. Although elevation data can be converted from one format to another, each
time the data are converted some information is lost, reducing the detail of the topographic surface.
Digital elevation data are available in the US and were first produced by the Defense Mapping Agency. They were produced
by scanning the contour overlays of 1:250,000 scale topographic maps.
These data have an accuracy of 15 m in level terrain, 30m in moderate terrain, and 60 m in steep terrain.
The data are sold by the map sheet as 1 degree x 1 degree blocks and are available for the entire US.
The USGS plans to progressively upgrade the accuracy of this data set and is also producing a higher accuracy DTM file
with a 30 m sampling interval. The data are maintained in two data sets: one with a ±7 m accuracy and the other with a
±7 to ±15 m accuracy. These data are available for about 30% of the US and are sold by 7.5-minute quad sheets.
The unit price for these data decreases with the number of DTMs purchased. Prices for orders of six or more DTMs consist
of a base charge of $90 plus $7 for each additional unit.
DATA OUTPUT
Output is the procedure by which information from the GIS is presented in a form suitable to the user. Data are output
in one of three formats: hardcopy, softcopy, and electronic.
Hardcopy outputs are permanent means of display. The information is printed on paper, mylar, photographic film or other
similar materials.
Softcopy output is the format viewed on a computer monitor. Softcopy outputs are used to allow operator interaction
and to preview data before final output. A softcopy output can be changed interactively, but the view is restricted by the
size of the monitor.
The hardcopy output takes longer to produce and requires more expensive equipment. However, it is a permanent record.
Output in electronic formats consists of computer-compatible files.
CHAPTER 5: DATA QUALITY
(GIS: A Management Perspective - Stan Aronoff)
Pages 133 - 149
People routinely make judgments about data quality. Hikers learn from experience that on topographic maps the positions
of trails are shown less accurately than the positions of roads. Their judgment of the relative quality of the trail and road
information guides their use of the map data. Knowing the quality of data is critical to judging the applications for which
they are appropriate.
When spatial analyses are done manually using map overlays, users quickly learn to shift the map slightly to align
boundaries that should overlap. A map overlay may not be precisely registered but with these manual adjustments it can
be shifted so that any local area can be registered closely enough for the work at hand.
You can't do this in a GIS. Implicit assumptions about data quality must be made explicit so that they can be properly
addressed. In a computer, either roads meet or they don't. The computer must be programmed to read a line ending
short of the road as connected.
The cost of assessing data quality varies with the degree of rigor needed. The more rigorous the data quality testing,
the more costly it becomes. This cost is not only a result of the expense of performing the test, but also of the delays
caused in the production process to perform the tests and correct errors. The level of testing should be balanced
against the cost of the consequences of less accurate data or a less rigorously confirmed level of quality. Demanding
higher levels of data quality than are actually needed quickly becomes a significant unnecessary expense when it is
applied to the entire GIS database.
In a similar way, the expense of testing and recording the quality of the data in a GIS should be matched to the
consequences of its inappropriate use. The data in a GIS may be used for a wider range of analyses than when the same
data were in a non-digital form. This is one of the advantages of a GIS, the capability to integrate diverse data sets that
could not be analyzed together.
However, the data may be used in ways not foreseen by their producers and by users without the knowledge or experience
to judge whether the application is appropriate.
A landowner in Wisconsin successfully sued the state for inappropriately showing the highwater mark around a lake on a
standard topographic map. The user did not realize that this type of topographic map was not sufficiently accurate to
show land parcel boundaries in the context of the elevation data.
As a consequence, it appeared that a portion of the owner's land was below the highwater mark. According to state laws,
land below the highwater mark is the property of the state. The error was corrected, however the owner successfully
sued for damages because in the interim a reasonable interpretation of the map would have caused her title to the land
to be in doubt.
Basically, a USGS topographic map was used to present data of unknown quality. The hand-drawn information was judged
to have the accuracy of the topographic map, which was an incorrect assumption.
In another case, the US federal government was held responsible for inaccurately and negligently showing the location of
a broadcasting tower on an aeronautical chart. This was shown to be a contributing factor in a fatal plane crash.
The quality of geographic data is often examined only after incorrect decisions have been made and financial losses or
personal injury have occurred. More and more, producers of geographic information are being held liable when their
products are found to contain errors, are poorly designed, or are used in ways and for purposes unintended by their
designers. Data quality standards, appropriately designed, tested, and reported, can protect both the producer and user
of the geographic information.
A GIS provides the means for geographic information to be used for a broader range of applications and by users with a
wider variety of skills than ever before. In order for these data to be used in decision-making, their quality must be
predictable and known. Ultimately, the data quality standards must serve the needs of the users, so the user
community must be directly involved in specifying the data quality standards for the GIS data base and in dealing with
practical constraints like budget, technical capabilities, and rate of production.
COMPONENTS OF DATA QUALITY
The characteristics that affect the usefulness of data can be divided into 9 components which are grouped into 3
categories: micro level components, macro level components, and usage components.
Positional Accuracy
Positional accuracy is the expected deviance in the geographic location of an object in the data set (map) from its true
ground position.
It is usually tested by selecting a specified sample of points in a prescribed manner and comparing the position
coordinates with an independent and more accurate source of information.
There are two components of positional accuracy: bias and precision.
Bias refers to systematic discrepancies between the represented and true position. Bias is measured by the average
positional error of the sample points.
Precision refers to the dispersion of the positional errors of the data elements. Precision is commonly estimated by
calculating the standard deviation of the errors at the selected test points. A low SD indicates that the dispersion of the
errors is narrow; the errors tend to be relatively small. The higher the precision, the greater the confidence in using the data.
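The two measures can be computed directly from a sample of positional errors: bias as the mean, precision as the standard deviation about that mean. The error values below (digitized minus true position along one axis, in metres) are invented for illustration.

```python
import math

# Hypothetical positional errors (metres) at eight test points.
errors = [2.1, -0.5, 1.8, 0.9, -1.2, 2.4, 0.3, 1.6]

n = len(errors)
bias = sum(errors) / n                          # mean error: systematic shift
variance = sum((e - bias) ** 2 for e in errors) / (n - 1)
precision = math.sqrt(variance)                 # standard deviation: dispersion
```

A bias near zero with a small standard deviation indicates data that are both unbiased and precise; a large bias with a small standard deviation suggests a systematic shift, such as a misregistered map.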
Attribute Accuracy
Attributes may be discrete or continuous. A discrete variable can take on only a finite number of values, whereas a
continuous variable can take on any value within its range.
Categories like land use class, vegetation type, or administrative area are discrete.
Variables like temperature or average property value are continuous, the variable can take on any value so intermediate
values are valid.
The method of assessing accuracy for continuous variables is similar to that discussed for positional accuracy. HOW???
The assessment of the accuracy of discrete variables is the domain of classification accuracy assessment, which is a
complex procedure. The difficulties arise because accuracy measurement is significantly affected by the shape and size
of individual areas, the way test points are selected, and the classes that are confused with each other.
Randomly selected points from the data set are checked against field observations. For example, wetlands along
streams are typically long, narrow areas. Though they are often important for planning purposes, these areas commonly
make up less than 1% of the total map area.
In a randomly selected sample of test points, these areas would probably not be chosen. Therefore, a classification
accuracy could be calculated from the test points, but that is no indication of the wetlands class accuracy unless it was
also tested.
One way to alleviate this is to choose a set of test points from every category.
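The per-category check can be sketched by scoring test points against field observations both overall and per mapped class. The classes and point results below are invented; a stratified sample guarantees even a rare class like wetlands gets tested.

```python
# Hypothetical (mapped_class, field_class) pairs from a stratified sample.
checks = [
    ("forest", "forest"), ("forest", "forest"), ("forest", "wetland"),
    ("crop", "crop"), ("crop", "crop"), ("crop", "forest"),
    ("wetland", "wetland"), ("wetland", "crop"),
]

# Overall accuracy: fraction of points where map and field agree.
overall = sum(m == f for m, f in checks) / len(checks)

# Per-class accuracy, so a rare class cannot hide behind the overall score.
per_class = {}
for cls in {m for m, _ in checks}:
    pts = [(m, f) for m, f in checks if m == cls]
    per_class[cls] = sum(m == f for m, f in pts) / len(pts)
```

Here the overall figure would mask the fact that the wetland class is only half right, which is exactly the problem the text describes.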
Another problem exists because very few sharp boundaries exist in nature whereas the data set will have a demarcation
line between classes.
Logical Consistency
Logical consistency refers to how well logical relations among data elements are maintained. For example, it would not be
consistent to map some forest stand boundaries to the center of adjacent roads and others to the road edge.
Political and administrative boundaries defined by physical features should precisely overlay those features. For
example, the edge of a property that borders a lake should coincide with the lake boundary.
Another problem exists when mapping a reservoir because the water level will fluctuate over the span of a year.
Different GIS data layers may show the reservoir boundary at different locations, depending on the date of the
mapping. This problem is solved by providing a standard outline for a reservoir and placing it in each layer.
It is important to remember that two data sets may be correct to their specified level of positional accuracy and yet not
be logically consistent.
When two data sets are overlaid, the slight discrepancy causes a sliver.
There is no standard measurement of logical consistency, however, it is best addressed before data are entered in the
GIS data base. A map preparation stage is commonly used during which individual maps that are to be digitized are
checked and redrafted to correct errors and inconsistencies.
Resolution
The resolution of a data set is the smallest discernible unit or the smallest unit represented. For satellite images this is
called spatial resolution.
For thematic maps, the resolution is the size of the smallest object that is represented, and is termed minimum mapping
unit. The minimum mapping unit decision is made during the map compilation phase. Factors like the expected use of the
map, legibility, source data accuracy, and drafting expense are all considered. Geographic data in a digital GIS data
base can be displayed at any scale, because the geographic data do not really exist at a specific scale. Therefore the
minimum mapping unit can be very small. However, this ease with which the geographic data in a GIS can be used at any
scale highlights the importance of accurate data quality information.
Although the data do not have a specific scale, they were produced with levels of accuracy and resolution that make it
appropriate to use them at only certain scales. Using a GIS, a 1:50,000 scale map could be produced from data that
were digitized from a 1:500,000 scale map. However the map would not have the quality of a 1:50,000 scale map.
Completeness
There are several aspects to completeness: completeness of coverage, of classification, and of verification.
The completeness of coverage is the proportion of data available for the area of interest. Ideally, a data set will provide
100% coverage, however many data sets are progressively updated.
When information is needed about the current status of a resource, the most current information may be the most
suitable. In other cases, such as comparative analysis, it may be more important to have consistency in the data set. An
older data set for the entire study area may be more appropriate than a patchwork of more recent data collected in
different years.
Completeness of classification is an assessment of how well the chosen classification is able to represent the data. For a
classification to be complete it should be exhaustive, that is it should be possible to encode all data at the selected level
of detail.
TABLE 5.1. If the livestock category horses occurs, it cannot be encoded at Level 3, only at Level 2.
Under truck crops, a potato field falls into the OTHER category; this still leaves the data set incomplete, because the
total crop area could not be calculated.
Another problem occurs if an observation can be placed into more than one category. For instance, in the forest category,
what happens in the transition areas from coniferous to mixed to deciduous?
Class definitions may also differ among map sheets as a result of the individual or the organizations that produced them.
The maps may be accurate in terms of position and classification, but the boundaries from adjacent maps may not match
if they were produced by different forest districts.
Completeness of verification refers to the amount and distribution of field measurement or other independent sources
of information that were used to develop the data.
Geologists indicate this aspect of data quality by using solid lines to map rock-type boundaries for which they have
direct field evidence (boundaries they can see) and dashed or dotted lines for inferred boundaries.
Qualitative assessments of completeness have largely gone unreported; however, they are critical to the appropriate use
of the data.
Time
Time is a critical factor in using many types of data. Demographic information is usually very time sensitive and changes
significantly over a year. Land cover will change rapidly in an area of rapid urbanization.
Some data are biased depending on what time of the year they are collected. For example, in areas that produce multiple
crops per year, the crop types grown in an area change with the seasons.
The time aspect of data quality is most commonly reported as the date of the source material. Topo maps usually include
the original source date as well as the update date. For geographic information that changes relatively quickly over
time, the date of acquisition may be a very important attribute. Forest inventory maps may be updated on a 5- to 10-year
basis, while crop conditions change rapidly over the growing season and are commonly updated on a weekly basis.
Time is a frequently overlooked consideration when multiple data sets, collected independently, are used together.
Lineage
The lineage of a data set is its history, the source data and processing steps used to produce it. The source data may
include transaction records, field notes, airphotos, and other maps.
A lineage report documents this information.
A lineage report for a topo map would include the date of the aerial photography used, the photogrammetric methods used to
map the contour lines and cultural features from the airphotos, the use of check points for photogrammetric control, and
the methods used to generate the final map.
Ideally, some indication of lineage should be included with the data set since the internal documents are rarely available
and usually require considerable expertise to evaluate.
Unfortunately, lineage information most often exists as the personal experience of a few staff members and is not
readily available to most users.
Usage Components
The usage components of data quality are specific to the resources of the organization. For example, the effect of data
cost depends on the financial resources of the organization.
A given data set may be too expensive for one organization and be considered inexpensive by another.
Satellite imagery may be inexpensive for an oil company to use in exploration but outrageously expensive for a wildlife
agency to use for habitat mapping.
The accessibility of the data depends on imposed usage restrictions and the human and computer resources of the
organization.
Accessibility
Accessibility refers to the ease of obtaining and using data. Data use may be restricted because the data are privately
held, are judged a matter of national defense, or are protected to preserve the rights of citizens.
Error exists in the original source materials that are entered into the GIS. These errors may result from:
Inaccuracies in field measurement,
Inaccurate equipment, or
Incorrect recording procedures.
Much of the data input into a GIS comes from remote sensing techniques. There are inaccuracies in the photogrammetric
methods used to draw maps and measure elevations.
Airphoto and satellite image interpretations introduce a degree of error in the classification and delineation of
boundaries.
In digitizing, curved boundaries are approximated by a series of straight-line segments. The smaller the segment used,
the more closely the boundary is approximated. However, the smaller the line segments, the larger the data files are.
No matter how carefully boundaries and points are entered, some residual error will always remain.
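The trade-off between segment size and file size can be illustrated with a short sketch. The geometry here is the sagitta: the maximum gap between a circular arc and the straight chord that approximates it (the 100 m radius and the vertex counts are hypothetical values chosen for illustration):

```python
import math

def max_chord_error(radius, n_segments):
    """Maximum gap (sagitta) between a circular arc and the straight
    chord that approximates it when a circle of the given radius is
    digitized with n_segments equal line segments."""
    half_angle = math.pi / n_segments
    return radius * (1 - math.cos(half_angle))

# A circular boundary of 100 m radius digitized with more and more vertices:
for n in (8, 32, 128):
    print(f"{n:4d} segments -> max error {max_chord_error(100.0, n):.3f} m")
```

Because the error falls off with the square of the segment count, quadrupling the number of vertices cuts the residual error by roughly a factor of sixteen, while the coordinate file grows in direct proportion to the vertex count.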
Errors in the position of natural boundaries are often introduced because the boundary does not exist as a sharp line. A
forest edge, even though it is drawn as a definite line, usually exists as a zone that may be several meters or tens of
meters wide.
V. Drake 11
SMC
There is also a level of inaccuracy inherent in the way classes are defined. Many continuous phenomena, such as
vegetation and soils, are mapped as homogeneous map units with sharp boundaries, e.g., choropleth and thematic maps.
In reality there is variability within each map unit. A polygon labeled as a Pine stand may actually have other types of
trees in small numbers.
When data are compiled a decision is made that areas below a certain size (minimum mapping unit) will not be recognized
within an otherwise homogeneous map unit.
While this is perfectly acceptable for the use you intend for the data, it may be unacceptable when the data are applied
to analyses that you did not foresee.
A soil materials map may show an area to be sandy soil. In forestry, even the presence of 15% clay soils in this map unit
does not restrict the use of the data. However, in siting a house, the presence of class inclusions is important because a
house sited partly on clay and partly on sandy soil will tend to settle unevenly, cracking the walls and the foundation.
ACCURACY
Accuracy is the likelihood that a prediction will be correct.
In the case of a map, the positional accuracy is the likelihood that the position of a point as determined from the map
will be the "true" position, i.e., the position determined by more accurate information, such as by field survey.
Classification accuracy is the probability that the class assigned to a location on the map is the class that would be
found at that location in the field. No map can be 100% accurate.
CONCLUSION
Accuracy assessment can be an expensive procedure, and although it is extremely valuable, its costs must be weighed
against the benefits of the accuracy information.
Less rigorous, and therefore less expensive, tests can be used for data sets where the consequences of errors are less critical.
Accuracy assessments usually involve a comparison of values from the data set to be tested with values from an
independent source of higher accuracy, such as field verification, which may be more expensive than the application can
justify.
Less expensive approaches may be used. For example, indirect verification of test points may be done by interpreting
airphotos instead of by field observations.
In the end, the specification of an accuracy level and the rigor with which it is assessed are judgment calls. They must
take into account:
How the information is being used,
The consequences of inaccuracies, and
whether the accuracy measurements are indeed valid.
Requiring different levels of accuracy for different features in the same map or in the same database is more cost-effective
than demanding that all features be represented at the same accuracy level.
For this reason, the expenditure on accuracy assessment and data quality reporting in general must be matched to the
consequences of errors.
The trade-offs in accuracy assessment costs, the mandate and budget of the producer of the data, and the willingness
of the user to pay for data will all influence the assessment methods chosen. A rigorous accuracy assessment may not be
justified for every data set in the GIS.
But an accuracy rating of some form and a description of the method used to generate that rating should always be
provided.