
Information Extraction from Remotely Sensed Images

From Data to Information


Data refers to the numerical results of any set of measurements, regardless of whether or not
the measurements are acquired with a particular purpose in mind.
Information is an aggregate of facts so organized, or data so utilized, as to constitute
knowledge or intelligence.
Data must be transformed to derive information. The process of transforming data
into information is known as information extraction. Information extraction can be of
three types:
1. Manual
2. Semiautomatic
3. Automatic
Information is the Key
Extracting information from the image has almost superseded the image itself in importance. The information derived
from imagery is what provides earth resource managers with what they need to
make decisions: the best place for a new dam, the height of flood defenses, and so on.
Remote sensing is all about information: the tangible result of all the effort that goes
into building and operating an earth observation platform is a set of measurements of the
Earth system from space, from which we can derive information of economic, social,
strategic, political or environmental value.
As sensors are continually developed and refined, image-processing tools have to
change to ensure that new data can be fully exploited. The last 26 years have been
characterized by steadily increasing spatial resolution and the rise of microwave SAR;
but what effect will this have on how we process the data? Many more sensors are planned for
the next few years, all of which will inevitably require new processing tools.
By far the most consistent trend has been the improvement in the spatial resolution of optical
images since the 80 meter Multi Spectral Scanner on board Landsat 1; resolutions are now as
sharp as 50 centimeters in the case of QuickBird. While higher resolution enables better
identification of small objects, it causes traditional land classification techniques to
become unreliable: the contributions of different material types within a pixel
distort the pixel spectrum from that of the material of interest, often resulting in a loss of
discrimination and potential misclassification. Other techniques for addressing these
problems, such as neural networks, have been tried in the past, but with limited success
and reliability, and are hence little used commercially.
Where are we heading?

The era of 1-meter satellite imagery presents new and exciting opportunities for users of
spatial data. With Space Imaging's IKONOS satellite already in orbit, and satellites from
EarthWatch Inc., Orbital Imaging Corp. and, of course, ISRO scheduled for launch in the
near future, high resolution imagery will add an entirely new level of geographic
knowledge and detail to the intelligent maps that we create from imagery.
Geographic imagery is now widely used in GIS applications worldwide. Decisions made
using these GIS systems by national, regional and local governments, as well as
commercial companies, affect millions of people, so it is critical that the information in
the GIS is up to date. In most instances, aerial or satellite imagery is the
most up-to-date source of data available, helping to ensure accurate and reliable decisions.
However, with technological advancements come new opportunities and challenges. The
challenge now facing the geotechnology industry is twofold: how best to fully exploit
high-resolution imagery, and how to get access to it in a timely manner.
Is high-resolution imagery making a difference?
There is no doubt that the GIS press has been deluged with high-resolution imagery for
the last few years. Showing an application with an imagery backdrop provides an
immediate visual cue for readers. Without the imagery backdrop, the context is lost and
the basic map, comprising polygons, lines and points becomes more difficult for the
layman to interpret. It is the context, or visual cues, that provides the useful information,
and this information is the inherent value of the imagery.
The higher the resolution of the imagery, the more man-made objects can be
identified. The human eye, the best image processor of all, can quickly detect and
identify these objects. If the application is therefore one that just requires an operator to
identify objects and manually add them into the GIS database, then the imagery is
making a positive difference: it is adding a new data source for the GIS manager to use.
However, if the imagery requires information to be extracted from it in an automated or
semi-automated fashion (for example, a land cover classification), it is a different matter.
If the same techniques that were developed for earlier, lower resolution satellite imagery
are used on the high-resolution imagery (such as maximum likelihood classification), the
results can actually create a negative impact. While lower resolution imagery isn't
greatly affected by artifacts such as shadows, high-resolution data can be. Lower
resolution data also smoothes out variations across ranges of individual pixels,
allowing statistical processing to create effective land cover maps. Higher resolution data
doesn't do this: individual pixels can represent individual objects like manhole covers,
puddles and bushes, and contiguous pixels in an image can vary dramatically, creating
very mixed or confused classification results. There is also the issue of linear feature
extraction. Lines of communication on a lower resolution image (such as roads) can be
identified and extracted as a single line. However, on a high-resolution image, a road
comprises the road markings, the road itself, the kerb (and its shadow) and the pavement
(or sidewalk). A very different method of feature extraction is therefore needed.

It's not just the spatial resolution that can affect the usage of the imagery. With 11-bit
imagery becoming available, the ability of the GIS to work with high spectral content
imagery becomes key. 11-bit data means that up to 2048 grey levels can be stored and
viewed. If the software being used to view the imagery assumes it is 8-bit (256 levels),
then it will either a) display only the information below level 255 (creating a
black or very poor image) or b) try to compress the 2048 levels into 256, also reducing
the quality of the displayed image considerably. Having 2048 levels allows more
information to be extracted in shadowy areas, as well as enabling more precise spectral
signatures to be defined to aid feature identification. However, without the correct
software, this added bonus can easily turn into a problem.
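
As a concrete illustration, here is a minimal NumPy sketch of option b), linearly compressing an 11-bit band into 8 bits for display; the function name and the min-max stretch rule are illustrative assumptions, not part of any particular GIS package:

    import numpy as np

    def rescale_11bit_to_8bit(band):
        """Linearly compress an 11-bit band (0..2047) to 8 bits (0..255) for display."""
        band = band.astype(np.float64)
        # Stretch between the actual data minimum and maximum to preserve contrast.
        lo, hi = band.min(), band.max()
        scaled = (band - lo) / max(hi - lo, 1.0) * 255.0
        return scaled.astype(np.uint8)

    # Example: a synthetic 11-bit image.
    img11 = np.random.randint(0, 2048, size=(512, 512))
    img8 = rescale_11bit_to_8bit(img11)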
Information Extraction from Remotely Sensed Images:
Geoinformation extraction using image data involves the construction of
explicit, meaningful descriptions of physical objects (Ballard & Brown, 1982). When
performing analysis of complex data, one of the major problems stems from the number
of variables involved. Analysis with a large number of variables generally requires a
large amount of memory and computation power, or a classification algorithm which
overfits the training sample and generalizes poorly to new samples. Feature extraction is
a general term for methods of constructing combinations of the variables to simplify
these problems while still describing the data with sufficient accuracy. Best results are
achieved when an expert constructs a set of application-dependent features. All
approaches usually include object recognition, i.e. interpretation using an eye-brain/computer
system, and object reconstruction, i.e. coding, digitizing and structuring.
Feature extraction is widely used in image processing, where algorithms detect
and isolate desired portions or shapes (features) of a digitized image or video
stream. Approaches for information extraction using image processing
techniques may generally be grouped as follows:
Low-level

Edge detection
Corner detection
Blob detection
Ridge detection
Scale-invariant feature transform

Curvature

Edge direction, changing intensity, autocorrelation

Image motion

Motion detection

Shape based

Thresholding
Blob extraction
Template matching
Hough transform (lines, circles/ellipses, arbitrary shapes via the Generalized Hough Transform)

Flexible methods

Deformable, parameterized shapes
Active contours (snakes)
Given below, in compiled form, is the terminology used in the context of
geoinformation extraction, particularly from image data:
Scene

Part of the visible world that one would like to describe

INFORMATION EXTRACTION FROM REMOTELY SENSED IMAGES


Data acquisition and data updating are important aspects in developing and maintaining
Geographical Information Systems (GISs). The spatial data in most existing GISs are
derived from existing maps through digitization. This method is prone to errors, and the
accuracy of the data derived from the existing maps is relatively low, especially
temporally. Photogrammetric measurement is another important method for data
acquisition. The data produced using this method can have good spatial accuracy.
However, the method is relatively expensive, as it needs precision photogrammetric
instruments and well-trained professionals. Therefore, methods of obtaining spatial data
for GISs efficiently and precisely have become a focus of photogrammetric research.
Feature Extraction
The automatic extraction of information from aerial photographs and satellite images is
a major requirement of the new digital technology in photogrammetry. While a
number of tasks such as DEM and orthophoto generation can be achieved with a
large degree of automation, the extraction of linear and other features must still be
undertaken manually. The research described below aims to develop methods for
incorporating a greater level of automation into these tasks.
Semi-automatic Feature Extraction
The semi-automatic method for the extraction of linear features from remotely sensed
images in 2D and 3D is based on active contour models, or 'snakes'. Snakes are a method
of interpolation by regular curves to represent linear features on images. The initial
feature extraction is achieved by image processing operators, such as the Canny operator
for single edges and morphological tools for narrow linear features 1 or 2 pixels in
width. The approach is semi-automatic: an operator locates a selection of points
along and near, but not necessarily exactly on, the feature. An iterative optimisation
process, based on the definition of the snakes by cubic B-splines, then locates the feature
as closely as the details in the image allow. Features are extracted on single images by
2D snakes in terms of the local image coordinates, or in 3 dimensions using overlapping
images in terms of 3D object coordinates. Tests of the method applied to aerial
photography and SPOT satellite images have been carried out in terms of the accuracy of
the extracted features and the pull-in range, for a range of features in 2 dimensions, and
in 3 dimensions in terms of their object coordinates derived from photogrammetric
measurements and from maps.
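
A minimal sketch of this semi-automatic idea using the active contour implementation in scikit-image; the operator-supplied seed points and all parameter values are illustrative assumptions, not the settings used in the study described above:

    import numpy as np
    from skimage import filters
    from skimage.segmentation import active_contour

    def refine_linear_feature(image, seed_points):
        """Refine rough operator clicks into a smooth curve on the feature.
        seed_points: (n, 2) array of (row, col) points near, not on, the feature."""
        # Smooth the image so the snake sees clean, wide gradient basins.
        smoothed = filters.gaussian(image, sigma=2)
        return active_contour(
            smoothed,
            seed_points.astype(float),
            alpha=0.01,   # elasticity (penalizes stretching)
            beta=1.0,     # rigidity (penalizes bending)
            w_edge=1.0,   # attraction to image edges
            boundary_condition="fixed",  # keep endpoints where the operator put them
        )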
Automatic Feature Extraction
In contrast to semi-automatic methods, automatic road extraction aims at locating a road
in images without input from an operator on its initial position. Locating a road in
images automatically involves two tasks, i.e., recognition of a road and determination of its
position. Recognizing a road in an image is much more difficult than determining its
position, as it requires not only information which can be derived from the image, but
also a priori knowledge about the properties of a road and its relationships with other
features in the image, and other related knowledge such as knowledge of the imaging
system. Due to the complexity of aerial images and the existence of image noise and
disturbances, the information derived from the image is always incomplete and
ambiguous. This makes the recognition process more complex.
A knowledge-based method for automatic road extraction from aerial images has been
developed in this laboratory. The method includes bottom-up hypothesis generation of
road segments and top-down verification of the hypothesized road segments. Hypothesis
generation starts with low-level processing in which linear features are detected,
tracked and linked; the result of this step is numerous edge segments. These are then
grouped to form structures of road segments based on general knowledge of roads,
and the generated structures are represented symbolically in terms of
geometric and radiometric attributes. Finally, applying the knowledge stored in the
knowledge base to the generated road structures hypothesizes road segments. As
hypotheses of road segments are generated in a local context, ambiguity is unavoidable.
To remove spurious hypothesized road segments, all hypotheses are checked in a global
context using the topological information of road networks, which is derived from
low-resolution images. Missing road segments are predicted using the topological
information of the road networks. This method has been applied to a number of aerial
images with encouraging results.

EXTRACTION OF POINTS
General Principles for Point Extraction
Definition:
Points are image objects whose geometric properties can be represented by
only two coordinates (x, y). One can distinguish between several types of points.
A circular symmetric point (CSP) is a local heterogeneity in the interior of a
homogeneous image region. CSPs are too small to be extracted as regions
(depending on the image scale) and are characterised by properties of circular
symmetry (e.g., peaks, geodetic control point signals, manholes). CSPs can be
interpreted as region attributes; they do not affect the image structure. Endpoints
(start point or end point of a line), corners (intersection of two lines) and junctions
(intersections of more than two lines) are used for the geometrical description of
edges and region boundaries. Missing these points can have fatal
consequences for the symbolic image description.

REPRESENTATION:
The symbolic description of points can be given as a list containing geometric
attributes (the coordinates), radiometric attributes (e.g., strength) and relational
attributes (e.g., the edges intersecting at this point).
APPLICATIONS
Major applications for extracted image points are image-matching operations.
Assuming that extracted points refer to significant points in the real world, we can
look for the same real point in two images taken from different views. This
technique is used for image orientation (PADERES et al. 1984) or DTM generation
(e.g., KRZYSTEK 1991).

BASIC APPROACHES
Here we only review approaches that solely use the image data (one could also
think of point extraction methods which determine junctions or intersections from
already extracted contours). Three prominent methods are:
Point template matching
Corner detection based on properties of differential geometry
Point detection by local optimization
Deriving the point coordinates normally follows a three-step procedure. In the
first step, point regions are selected by applying a threshold procedure; these are
image regions inside which points are supposed to lie. In a subsequent step, the
best point pixels within these regions are selected; this operation could be
referred to as thinning. An even more accurate determination of the point position
can be derived by least squares estimation (LSE), so in this step we look for
the real-valued coordinates of the points.
Point Templates
One possibility for detecting point regions is to define a point pattern (template)
which represents the point structure we are looking for. The main idea of
template matching is to find the places where the template fits the image best.
The similarity between the template and the image can be evaluated by
multiplying the template values with the underlying image intensities or by
estimating correlation coefficients. Disadvantages of template matching in
general are the limitation to a fixed number and type of templates, and the
sensitivity to changes in scale and to image rotation (unless the templates are
rotationally invariant).
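
A minimal sketch of point template matching by normalized cross-correlation, here using OpenCV; the correlation threshold is an illustrative assumption:

    import cv2
    import numpy as np

    def find_point_regions(image, template, threshold=0.8):
        """Return (row, col) centres where the template correlates strongly.
        Both inputs are assumed to be single-band uint8 arrays."""
        # Normalized cross-correlation is insensitive to brightness/contrast offsets.
        score = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
        rows, cols = np.where(score >= threshold)
        return [(r + template.shape[0] // 2, c + template.shape[1] // 2)
                for r, c in zip(rows, cols)]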
Corner Detection by Curvature
Let us assume that the image data is stored in an image function g(r, c), where r
refers to the row of the image and c to the column. Several approaches are based
on the curvature of g, which can be expressed by the second partial derivatives
along the coordinate axes r and c. The sign of the curvature can be used for the
classification of the pixels and for the detection of corners. An overview and
evaluation of these approaches can be found in (DERICHE AND GIRAUDON
1990).
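
A minimal sketch of this idea, estimating the second partial derivatives of g with Gaussian derivative filters and flagging pixels where the intensity surface is strongly curved; the scale sigma and the threshold rule are illustrative assumptions:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def corner_candidates(g, sigma=2.0):
        """Flag pixels where the determinant of the Hessian of g(r, c) is large."""
        g = g.astype(np.float64)
        # order=(2, 0) differentiates twice along rows; (1, 1) once along each axis.
        g_rr = gaussian_filter(g, sigma, order=(2, 0))
        g_cc = gaussian_filter(g, sigma, order=(0, 2))
        g_rc = gaussian_filter(g, sigma, order=(1, 1))
        det = g_rr * g_cc - g_rc ** 2   # sign/size of curvature classifies the pixels
        return det > 3.0 * det.std()    # illustrative threshold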
Point Detection by Optimization:
MORAVEC (1977) was the first to propose an approach aimed at detecting
points which can be easily identified and matched in stereo pairs. He suggested
measuring the suitability, or interest, of an image point by estimating the
variances in a small window (4x4 or 8x8 pixels). This method is used in many
stereo matching algorithms and initiated further investigations leading to the
interest operators proposed by PADERES et al. (1984) and FÖRSTNER AND
GÜLCH (1987). Similar to the Moravec operator, the objective of these
operators is the detection of adequate points, but with higher accuracy.
Adequate points are those which meet the two criteria of (1) local distinctness (to
increase geometric precision) and (2) global uniqueness (to decrease search
complexity). The Förstner operator is able to detect different point types
with the same algorithm and can be used either for image matching or for image
analysis approaches.

Interest operator in the 1-D case: Image matching can be reduced to a one-dimensional
problem using the epipolar geometry of two images. In this case the
aim is to match two intensity profiles. The effect of the interest operator in 1-D is
identical to finding the zero crossings of the Laplacian, neglecting saddle points
of the intensity function.
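
A minimal sketch of the Moravec idea: score each pixel by the minimum windowed sum of squared differences over the four principal shift directions, so that only points distinct in every direction score highly. The window size is an illustrative assumption, and np.roll wraps at the borders, a simplification a real implementation would avoid:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def moravec_interest(g, window=4):
        """Interest is low if the patch resembles itself under some shift
        (flat areas and edges score low; corner-like points score high)."""
        g = g.astype(np.float64)
        scores = []
        for dr, dc in [(1, 0), (0, 1), (1, 1), (1, -1)]:
            shifted = np.roll(g, shift=(dr, dc), axis=(0, 1))
            ssd = uniform_filter((g - shifted) ** 2, size=window)
            scores.append(ssd)
        return np.min(scores, axis=0)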

EXTRACTION OF EDGES
General Principles for Edge Extraction:
DEFINITION
Referring to BALLARD AND BROWN 1983, ROSENFELD AND KAK 1982 and
NALWA 1993, an edge is an image contour where a certain property like
brightness, depth, color or texture (see Fig. 11a) changes abruptly perpendicular
to the edge. Moreover, we assume that on each side of the edge the adjacent
regions are homogeneous in this property. According to these characteristics,
edges can be classified into two general types: step edges (edges) and bar
edges (lines).
Edges represent boundaries between two regions. The regions have two distinct
(and approximately constant) pixel values; e.g., in an aerial image, two adjacent
agricultural fields with different land use.
Lines either occur at a discontinuity in the orientation of surfaces, or they are thin,
elongated objects like streets in a small-scale image. The latter may appear dark
on a bright background or vice versa. When the scale is large, a street appears as
an elongated 2-D region with edges on both sides. To avoid conflicts in the
symbolic image description it might be necessary to make an explicit distinction
between edges and lines.
REPRESENTATION
Edge extraction usually leads to an incomplete description of the image, i.e.
edges do not build closed boundaries of homogeneous image regions. The types
of representation of single edges are manifold, depending on the intended use.
The symbolic description of edges can be given, e.g., as a list containing
geometric, radiometric (e.g. strength, contrast) and relational attributes (e.g.
adjacent regions, junctions, etc.). The geometric attributes depend on the choice
of the approximation function (see step 5 below). For linear edges it is sufficient
to specify the start point and endpoint.
APPLICATIONS
In contrast to points as image features, one can argue that a list of all edges in an
image contains all the desired image information, while its representation is much
more compact and easier for a computer to interpret. To support this
statement, consider again the image in Figure 3a: just by looking at the edges it
is possible to recognize the object. If, in addition, each edge had stored the brightness
of its left and right adjacent regions, the information would be even more
complete. Another justification could be based on information theory; COVER
AND THOMAS (1991) wrote that the less often a certain structure can be found in an
image, the more unexpected it is. This means that an unexpected structure
contains much more information than a frequent one (like homogeneous
regions). Because of their importance, edges can be used to solve a broad range
of problems, some of which are:
Relative orientation: Edge-based matching in stereo pairs is applied for relative
orientation, e.g. LI AND SCHENK (1991) use curved edges.
Absolute orientation: Matching edges with wire-frame models of buildings can be
used for absolute orientation.
Object recognition and reconstruction: In many cases object models consist of
structural descriptions of object parts. Straight lines often bound parts of man-made
objects. The structural description based on edge extraction provides,
besides its completeness, the highest geometrical accuracy. Models of the
expected shape of object boundaries can easily be involved in the process, e.g.
searching for straight lines. Therefore, edge extraction is widely used for object
recognition.

BASIC APPROACHES
Both edge types can be detected by discontinuities in the image domain, and
in the following we will make no distinction between these types as long as it
makes no difference for the algorithm. Since the beginning of digital image
processing, edge detection has been an important and very active research area.
As a result, many edge detectors have been developed, which differ in the
image or edge model they are based on, their complexity, flexibility and
performance. In particular, the performance depends on 1) the quality of
detection, i.e. the probability of missing edges and yielding spurious edges, and
2) the accuracy of the edge location. Unfortunately, these two criteria conflict.

Even a short description of all approaches is beyond the scope here, so we only
outline the principles by looking at the main processing steps most edge detection
algorithms have in common. A typical approach consists of five steps:
Extraction of edge regions: Extraction of all pixels which probably belong to an
edge. The result is elongated edge regions.
Extraction of edge pixels: Extraction of the most probable edge pixels within
the edge regions, reducing the regions to one-pixel-wide edge pixel chains.
Extraction of edge elements (edgels): Estimating edge pixel attributes, e.g. the
real-valued position of the edge pixels, accuracy, strength, orientation, etc.
Extraction of streaks: Aggregation or grouping of the edgels that belong to the
same edge.
Extraction of edges: Approximation of the streaks by a set of analytic functions,
for example polygons.
In the following sections the main objectives and the most common techniques of
each step are described.
Edge Regions
The aim of this step is to extract from an input image all pixels which are likely to
be edge pixels. The extraction can be done by template matching, by
parametric edge models or by gradients. Starting from an image with
intensity function g, the result is a binary image where all edge pixels are labeled.
In addition, iconic features of each edge pixel, e.g. the edge magnitude and the
edge direction, are extracted and stored, as they are required in subsequent
steps.
Template Matching: Edge templates are patterns which represent certain edge
shapes. For each edge type (different edge models, edge directions,
edge widths and strengths) a special pattern is required. Operators can be found,
e.g., in ROSENFELD AND KAK (1982).
Gradient Operators (Difference Operators): The main idea of these
approaches is that in terms of differential geometry the derivatives of an image
intensity function g can be used to detect edges, which is more general than
template matching procedures. The first step is to apply linear filters (convolution)
to obtain difference (slope) images. The slope images represent the components
of the gradient of g; from these the edge direction and edge strength (magnitude)
can be calculated for each pixel.

The convolution of the image with one of the many known difference operators is
followed by a threshold procedure for distinguishing between the heterogeneous
image areas, i.e. pixels with high gradients, and the homogeneous areas, i.e.
pixels with low gradients (see Sec. 2.3). All pixels above a certain threshold are
edge region pixels.
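
A minimal sketch of this step with Sobel difference operators from SciPy; the mean-plus-k-sigma threshold rule is an illustrative assumption:

    import numpy as np
    from scipy.ndimage import sobel

    def edge_region_pixels(g, k=2.0):
        """Binary mask of edge region pixels plus per-pixel magnitude and direction."""
        g = g.astype(np.float64)
        gr = sobel(g, axis=0)            # slope image along rows
        gc = sobel(g, axis=1)            # slope image along columns
        magnitude = np.hypot(gr, gc)     # edge strength
        direction = np.arctan2(gr, gc)   # edge direction per pixel
        threshold = magnitude.mean() + k * magnitude.std()
        return magnitude > threshold, magnitude, direction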
Parametric Edge Models: An example of a parametric solution to edge
detection is Haralick's Facet Model (HARALICK AND WATSON 1981), which can
be used either for edge detection or for extracting regions and points.
The idea is to fit local parts of the image surface g by a first-order polynomial f
(sloped planes, or facets). Three parameters α, β and γ represent the facet f,
which can be estimated by least squares. The model is given by
g(r, c) = α r + β c + γ + n(r, c), where α and β are the slopes along the two coordinate
axes r and c, γ is the altitude of the facet, and n(r, c) is the image noise. HARALICK
AND SHAPIRO 1992 showed that the result of this approach is identical to
convolution with a difference operator. The classification of edge pixels is a
function of the estimated slopes (α, β): if the slopes are greater than a given
threshold and, in addition, the variances are small enough (to avoid noisy image
areas, which are assumed to be horizontal), the pixel belongs to an edge region.
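
A minimal sketch of fitting a single facet over a small window by least squares; the function and its names are illustrative, and the caller is assumed to extract the patch:

    import numpy as np

    def fit_facet(patch):
        """Fit g(r, c) = alpha*r + beta*c + gamma to an image patch by least squares."""
        rows, cols = patch.shape
        r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
        # Design matrix: one row [r, c, 1] per pixel.
        A = np.column_stack([r.ravel(), c.ravel(), np.ones(patch.size)])
        coeffs, _, _, _ = np.linalg.lstsq(A, patch.ravel().astype(float), rcond=None)
        alpha, beta, gamma = coeffs
        return alpha, beta, gamma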

Edge Pixels
Due to low contrast, image noise, image smoothing, etc., the first step leads to
edge regions which are possibly more than one pixel wide. The aim of this step
is to thin the edge regions to one-pixel-wide edge chains. These pixels should
represent the real edges with the highest probability. Assuming the real edge is
located on the mid-line (skeleton) of the edge region, thinning or skeleton
algorithms can be applied. Obviously, these midlines of edge areas are not
necessarily identical to the real edges. To improve the accuracy of the edge location,
properties of the pixels like the gradient or the Laplacian may be used for
extracting the most probable location of the edges. This can be done by
analysis of the local neighbourhood of each pixel (non-maxima suppression) or
by global techniques (relaxation, Hough transformation). Non-maxima suppression
is the most widely used method.

Non-Maxima Suppression: The process consists of two steps: 1) selection
of the neighbour pixels in the gradient direction, which are to be used for the
comparison; 2) suppression of pixels which are found to have lower gradient
magnitudes than their neighbours. An example for (1) is given by CANNY (1983):
his algorithm is defined in an N8 neighbourhood (see Fig. 7 and Fig. 7a). Given
an edge pixel (r, c) and its gradient direction g perpendicular to the edge e, the
first step is the estimation of the two points P1 and P2. The gradient magnitudes
for P1 and P2 can be approximated by simple linear interpolation of the gradient
magnitudes of the two adjacent pixels.
The location of the edges can also be determined by analyzing the zero-crossings of
the Laplacian. One problem, however, is that zero crossings occur both at the
extrema of the gradient function and at saddle points of g. The saddle points
should be neglected.
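
A minimal sketch of non-maxima suppression with gradient directions quantized to the four principal N8 directions, rather than Canny's sub-pixel interpolation of P1 and P2 (a deliberate simplification):

    import numpy as np

    def non_maxima_suppression(magnitude, direction):
        """Keep only pixels whose magnitude beats both neighbours along the
        (quantized) gradient direction; direction is in radians."""
        out = np.zeros_like(magnitude)
        angle = np.rad2deg(direction) % 180.0
        bins = ((angle + 22.5) // 45).astype(int) % 4   # 0, 45, 90, 135 degrees
        offsets = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}
        for r in range(1, magnitude.shape[0] - 1):
            for c in range(1, magnitude.shape[1] - 1):
                dr, dc = offsets[bins[r, c]]
                if (magnitude[r, c] >= magnitude[r + dr, c + dc]
                        and magnitude[r, c] >= magnitude[r - dr, c - dc]):
                    out[r, c] = magnitude[r, c]
        return out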

After the selection of the edge pixels by non-maxima suppression, the edge
areas are in most cases reduced to thin lines. Due to the discrete image raster
and image noise, edge regions might still occur which are more than one pixel
wide. In this case subsequent thinning is required.

Edge Elements
The extraction of edgels is the first transition stage from the edge pixels in the
discrete image domain to the symbolic description of the edge. This step
contains the estimation of properties of the edge pixels required for subsequent
interpretation processes (e.g. real-valued coordinates, contrast, sharpness,
strength, type), which are stored as attributes of the symbolic edge elements.

Edge Streaks
The next step is to group all edgels which belong to the same edge. One can
say that the real detection of the image feature edge happens now, but the
real edge is represented as a list of edgels. The aggregation of the edge
elements can be done using local techniques (edge tracking) or global techniques
(Hough transformation, dynamic programming, heuristic search algorithms).
The grouping process should ensure that each streak 1) consists of connected
edgels, where each pixel pair is connected by a non-ambiguous pixel path, and 2)
delineates at most two regions (usually edges delineate two regions, except
dead-end lines or open edges, which are surrounded by the same region).
To satisfy the second criterion we define a streak as an edge pixel chain between
two edge pixels which are either end pixels and/or node pixels. According to
the number of neighbours in an N8 neighbourhood, we classify the pixels as node,
line or end pixels, as shown in Fig. 9. Given this classification, the simplest
aggregation method is an edge following or edge tracking algorithm: first, look
for an unlabeled edge pixel, i.e. one that does not yet belong to an edge. Once
found, track all its direct and indirect neighbours until an end or node pixel
appears. All the collected edge pixels belong to one edge and are labeled with a
unique edge number.
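
A minimal sketch of such a tracking step: start only at end pixels (one N8 neighbour) and follow the chain until a node or end pixel terminates the streak. Closed loops and node-to-node streaks are ignored here, a simplification of the full procedure:

    import numpy as np

    N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

    def track_streaks(edge_mask):
        """Label one-pixel-wide edge chains; returns a label image (0 = background)."""
        labels = np.zeros(edge_mask.shape, dtype=int)

        def nbrs(r, c):
            return [(r + dr, c + dc) for dr, dc in N8
                    if 0 <= r + dr < edge_mask.shape[0]
                    and 0 <= c + dc < edge_mask.shape[1]
                    and edge_mask[r + dr, c + dc]]

        label = 0
        for r, c in zip(*np.nonzero(edge_mask)):
            if labels[r, c] or len(nbrs(r, c)) != 1:
                continue               # start tracking only at unlabeled end pixels
            label += 1
            cur = (r, c)
            while cur is not None and not labels[cur]:
                labels[cur] = label
                unvisited = [p for p in nbrs(*cur) if not labels[p]]
                # Follow line pixels; stop when the chain ends or reaches a node.
                cur = (unvisited[0] if len(unvisited) == 1
                       and len(nbrs(*unvisited[0])) <= 2 else None)
        return labels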

Edge Approximation
Up to now, the extracted streaks are still defined in the discrete image model as
they are represented by a set of connected edge elements. Thus, for deriving a
symbolic description of the edges a last processing step is required. This step is
very important since the representation domain changes from the discrete image
raster to a continuous image model, the plane.

It is not obvious how to approximate a list of edgels by an analytic representation.
For example, one could apply curve-fitting techniques like splines, Bezier curves,
Fourier series, etc. This may give smooth curves and probably better visual
results, but it would be too much hassle if you only look for straight lines.
Furthermore, a polygon as a set of straight lines can also approximate a curved
edge. As usual, the choice of the approximation depends on what you want (or
what the application requires). Here we look at straight-line fitting.
Approximation by Straight Lines: For the approximation of edges by
straight lines many different approaches are possible, such as merging, splitting,
or split-and-merge algorithms. The critical point is to find the breakpoints or corners
which lead to the best approximation.
The merging algorithm sequentially follows an edge and considers each pixel to
belong to a straight line as long as it fits the line. If the current pixel no longer fits,
the line ends and a new breakpoint is established. A disadvantage of
this approach is its dependency on the merging order: starting from the other end
of the edge would probably lead to different breakpoints.
Splitting algorithms recursively divide the edges into (usually) two parts until the
parts fulfill some fitting condition. Consider an edge consisting of a
sequence of edge pixels P1, P2, ..., Pn; the end points P1 and Pn are
joined by a straight segment. For each pixel of the edge, the distance to that
segment is calculated. If the maximum distance is larger than a given threshold,
the edge segment is divided into two new segments at the position where the maximum
distance was found.
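
A minimal sketch of this splitting step, in the spirit of the Ramer-Douglas-Peucker algorithm; the distance threshold is an illustrative assumption:

    import numpy as np

    def split_polyline(points, tol=1.5):
        """Recursively split a pixel chain until every part fits a straight segment.
        points: (n, 2) array of (row, col) pixels; returns breakpoint indices."""
        points = np.asarray(points, dtype=float)
        p1, pn = points[0], points[-1]
        seg = pn - p1
        norm = max(np.hypot(seg[0], seg[1]), 1e-9)
        # Perpendicular distance of every pixel to the segment P1-Pn.
        dist = np.abs(seg[0] * (points[:, 1] - p1[1])
                      - seg[1] * (points[:, 0] - p1[0])) / norm
        k = int(np.argmax(dist))
        if dist[k] <= tol:
            return [0, len(points) - 1]        # the segment fits: keep end points only
        left = split_polyline(points[:k + 1], tol)
        right = split_polyline(points[k:], tol)
        return left[:-1] + [i + k for i in right]  # merge, offsetting right-hand indices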
It is possible to combine the advantages of the merging and splitting methods by
developing a split-and-merge algorithm: first we split, and then we do a merging
step, grouping lines if the new line fits the streak well enough (see Fig. 10).
The accuracy of the symbolic description, i.e. the edge parameters, can be
improved by applying a least squares estimation taking all edgels belonging to one
edge into account. The observation values are given by the real-valued
coordinates (xi, yi) of each edgel, and the weights are defined, e.g., by the squared
gradient magnitude. The covariance matrix of the estimated edge parameters
contains the accuracy of the edge. Thus, the uncertainty of the discrete image
information is preserved in the accuracy of the edges, which can be important
for image interpretation processes.

Extraction of Regions
General Principles for Region Extraction
DEFINITION
Regions are image areas which fulfill a certain similarity criterion; we call such
regions blobs. A similarity or homogeneity criterion could be the intensity value of the
image pixels or some texture property of the area surrounding a pixel. The
result of such a region extraction should divide, or segment, the image into a
number of blobs. Ideally, the union of these blobs gives the image again. The
regions themselves should be connected and bounded by simple lines.
REPRESENTATION
Depending on the strategy of the region extraction, we distinguish between
different segmentation results.
Incomplete segmentation: The image is first divided into homogeneous and
heterogeneous areas. The latter (we call those areas background) do not
fulfill the homogeneity criterion and therefore do not fulfill the above definition
exactly.
Complete segmentation: The image is completely divided into regions, fulfilling
the definition given above for the discrete image, too. That might lead to a
conflicting topology of the image regions, depending on the definition of the
neighborhood (N8 or N4) (see PAVLIDIS 1977), but also to inaccurate region
boundaries, depending on the cost of the approach.
The final symbolic representation of blobs consists of geometric, radiometric and
relational attributes. A blob itself can be represented by its boundaries (if the blob
contains holes, the blob has more than one boundary) or by a list of the pixels inside
the blob. Blob boundaries define the location of the blob; representing blob
boundaries is equivalent to representing image edges.
Geometric attributes of blobs are size, shape, center of gravity, mean direction,
etc. Algorithms for extracting these attributes can be found in the literature,
particularly in the field of binary image analysis. Radiometric attributes are, e.g.,
the mean intensity within the blob, the variance of the intensities, and texture
parameters. Lists of adjacent blobs, mutual boundaries, junctions and corners are
examples of relational attributes.

APPLICATIONS
Region information has the advantage that it covers geometrically large parts of
the image. Therefore it can be used for several applications like compression or
interpretation tasks.
Data compression: Grouping all pixels which are connected in image space
and have similar properties into one object (i.e. the blob), and representing the
object by characteristic attributes, reduces the amount of data and the
redundancy of information.
Analysing range images: Region-based segmentation algorithms were found to
be more robust when analysing range images.
Binary image analysis: In many cases region extraction is a prerequisite for
binary image analysis, widely used in industrial applications.
High-level image interpretation: In many cases object models consist of the
structural description of object parts, where the interior of each part is assumed
to have similar surface and reflectance properties. Therefore, extracting blobs
and their attributes is quite useful for object recognition.
BASIC APPROACHES
Given a digital image with a discrete image function, region extraction is the
process of grouping pixels into regions according to connectivity and similarity
(homogeneity). The large number of region extraction methods can be classified
in several ways. One possibility is to separate the methods by the number of
pixels which are used for the grouping decision; the methods are then called local
or global techniques. Further, we distinguish the methods depending on where
the grouping is done:
In the first case the grouping process is defined in the image domain. That
means that the decision whether connected pixels can be merged or should be split
is made directly by analysing the properties of adjacent pixels. Thus, both
the similarity and the connectivity are considered in one processing step.
Examples of this type are region growing or region merging, region splitting, and
split-and-merge algorithms.
The second approach applies the similarity and connectivity evaluation in two
separate steps. The goal is first to analyse the discriminating properties of the
pixels of the entire image and to use the result to define several classes of objects;
examples are thresholding and cluster techniques. This is done outside the
image raster by storing all pixel properties in a so-called measurement space
(e.g. a histogram). Then the definition of the classes can be used to classify the
pixels: going back to the image domain, each pixel is labeled with the identity
number of its class. In the second step, pixels which belong to the same class and are
also connected in the image space are grouped into homogeneous regions.
Connected components algorithms can easily do this.
In the following, a short overview is given of thresholding techniques, region
growing/merging, and split-and-merge approaches. An overview of further
region-based segmentation techniques can be found in (HARALICK AND SHAPIRO
1985) or (ZUCKER 1976).
Thresholding Techniques
Thresholding techniques consist of 4 steps (steps 1 and 2 are not necessary when
the thresholds are known in advance):
Determination of the histogram.
Choosing the thresholds: The choice of the thresholds is the most sensitive and
most difficult step. Unfortunately, it is not always the case that the peaks of
the histogram (there may be more than two) are clearly separated by valleys. The
histogram also often contains many local valleys, which are probably not interesting.
A survey of several techniques for estimating thresholds automatically can be
found in (HARALICK AND SHAPIRO 1992).
Labeling or classification of the pixels: Once the thresholds are determined, the pixels
can be classified easily. The result of the labeling process can be called a
segmented image, because the labels are associated with object classes.
Extraction of blobs by connected components: This processing step performs the
change from single pixels to blobs. Pixels that are labeled with the same number
must be connected by at least one pixel path all of whose pixels carry the same
label. The connectivity can be defined in an N8 or N4 neighborhood.
Connected components algorithms are usually defined on binary images. A
description and comparison can be found, e.g., in (HARALICK AND SHAPIRO
1992) and (ROSENFELD AND KAK 1982). After this step, every pixel is labeled
with a value associated with the number of the blob the pixel belongs to.
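
A minimal sketch of steps 2-4, with the threshold estimated automatically by Otsu's method (a standard automatic threshold estimator, used here illustratively) and blobs extracted by connected components; the N8 structuring element is an illustrative choice:

    import numpy as np
    from skimage.filters import threshold_otsu
    from scipy.ndimage import label

    def threshold_segmentation(image):
        """Estimate a threshold, classify pixels, extract blobs."""
        t = threshold_otsu(image)          # step 2: threshold from the histogram
        classified = image > t             # step 3: two-class labeling
        # Step 4: group connected same-class pixels into blobs (N8 connectivity).
        structure = np.ones((3, 3), dtype=int)
        blobs, n_blobs = label(classified, structure=structure)
        return blobs, n_blobs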
Thresholding techniques work well and fast if the objects to be recognised or
analysed are not too complex, which is the case for many industrial
applications. The main problem is the automatic estimation of the thresholds.
Even when the peaks are well separated, thresholding may not yield
accurate regions. Moreover, it may produce holes and ragged boundaries,
because the similarity grouping is performed in the measurement space and not in the
image domain. In this sense, threshold techniques may not fulfill the criteria of a
good region extraction method.

Region Growing / Region Merging

As the name indicates, region growing and region merging methods follow a
bottom-up approach: starting from a single pixel or a small region (the seed or
seed region), the region extraction is done by appending to the expanding region
all adjacent pixels which fulfill a certain similarity criterion. If the image
consists of more than one region (which is normally the case), a
separate region growing process is required for each of them, which can be done
sequentially or in parallel. The process consists of the following steps:
Determination of the seeds: The determination of the seeds must ensure that
every region which is to be extracted contains at least one starting point. If
the number and positions of the seeds are known in advance, region
growing can be applied. If the seeds are not given, they may be defined by every
pixel of the image raster; in this case a region merging procedure is required.
However, the subsequent region-growing step then probably produces many small
adjacent regions which are not significantly different from each other, so more
processing steps are required to merge as many regions as possible, if they are
considered similar enough.
Region growing starts at the seeds and stops when all pixels are labeled. Referring to
HARALICK AND SHAPIRO 1985, region growing techniques can be
distinguished by the number of pixels involved in the grouping
decision, i.e. in the evaluation of the homogeneity. In the simplest case the
growing algorithm consists of just a comparison of two adjacent pixels. It is
obvious that this result is very sensitive to noisy data. Less sensitivity to noise
can be obtained by investigating not only the pixel properties themselves, but a
mean property of the local neighbourhood or the properties of already extracted
regions. Local neighbourhood properties are, e.g., mean values and variances,
but also gradients or Laplacians; the latter are also used in many edge
detectors. Using the gradient or Laplacian, edges and regions can be extracted by
the same operator, which directly takes the duality of regions and edges into
account. Combinations of different techniques provide further improvements by
systematically exploiting their positive properties. Criteria are the accuracy of the
regions, whether adjacent regions are significantly different, the ability to place
boundaries in weak areas, and the robustness to noisy data.
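
A minimal sketch of seeded region growing that uses the running mean of the region as the similarity reference; the tolerance and 4-connectivity are illustrative assumptions:

    import numpy as np
    from collections import deque

    def region_grow(image, seed, tol=10.0):
        """Grow a region from seed (row, col): append 4-connected neighbours
        whose intensity stays within tol of the running region mean."""
        img = image.astype(np.float64)
        mask = np.zeros(img.shape, dtype=bool)
        mask[seed] = True
        total, count = img[seed], 1
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= rr < img.shape[0] and 0 <= cc < img.shape[1]
                        and not mask[rr, cc]
                        and abs(img[rr, cc] - total / count) <= tol):
                    mask[rr, cc] = True
                    total += img[rr, cc]
                    count += 1
                    queue.append((rr, cc))
        return mask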
Region Merging: Assuming the image area is completely partitioned into
regions, the aim is to merge adjacent regions which are not significantly different.
The main problem of region extraction by region growing algorithms is the
question of the merging order. Except for methods working in a highly parallel
manner (e.g. relaxation techniques), the result depends on which region was
extracted first and which of the adjacent pixels or regions are attended to first
(usually more than one neighbour fulfils the homogeneity criterion). The
determination of the best merging candidate is a time-consuming search
problem and is difficult to solve. Less complex approaches consist of well
(and locally) defined merging rules.

Split and Merge

The splitting algorithm is a process of dividing the image area successively into
sub-areas until the sub-areas satisfy a certain homogeneity criterion. To
improve efficiency, the partitioning into sub-areas can be done regularly, i.e. by
partitioning the still inhomogeneous areas into quarters. This regularity
causes square, artificial and also inaccurate boundaries.
To cope with this problem, combinations of split and merge were developed: the
strategy starts from any given partition. Adjacent regions are merged if the result
is homogeneous; single regions are split if they do not meet the homogeneity
criterion. The process continues until no more merging or splitting can be done. A
further advantage of this method is that it is faster than a single splitting or merging
process.
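
A minimal sketch of the regular (quadtree) splitting step, using the intensity variance as the homogeneity criterion; the variance tolerance and minimum block size are illustrative assumptions:

    import numpy as np

    def quadtree_split(img, top=0, left=0, h=None, w=None,
                       var_tol=25.0, min_size=4, blocks=None):
        """Recursively quarter inhomogeneous areas; returns (top, left, h, w) blocks."""
        if blocks is None:
            blocks = []
        if h is None:
            h, w = img.shape
        patch = img[top:top + h, left:left + w]
        if patch.var() <= var_tol or min(h, w) <= min_size:
            blocks.append((top, left, h, w))   # homogeneous (or minimal) block
            return blocks
        h2, w2 = h // 2, w // 2
        for dt, dl, hh, ww in ((0, 0, h2, w2), (0, w2, h2, w - w2),
                               (h2, 0, h - h2, w2), (h2, w2, h - h2, w - w2)):
            quadtree_split(img, top + dt, left + dl, hh, ww, var_tol, min_size, blocks)
        return blocks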
Drawbacks
The independent application of the techniques presented here reveals a number
of drawbacks:

Techniques aiming at a complete partitioning of the image area, like region-based
approaches, lead to uncertain or even artificial boundaries.
Region-based techniques are conceptually unable to incorporate mid-level
knowledge such as the straightness of the boundaries.
Edge-based techniques normally cannot guarantee closed boundaries
and thus do not lead to a complete partitioning. Edges are likely to be broken,
or do not represent the boundaries of the regions (spurious edges),
because of image noise.
Corner detectors usually don't work at junctions. All point detectors have
difficulties at smooth corners.
The models used are either wrong or at least not adaptive to the local
image content (e.g. edge detection at junctions).

To avoid inconsistencies, all three feature types could be extracted
simultaneously and therefore be embedded in the same model. A complete
feature extraction using points, lines, regions, and their relations leads to a richer
and also topologically consistent description of the image. Such an integrated approach
(polymorphic feature extraction) is addressed in (LANG AND FÖRSTNER 1996).

Expert System for Information Extraction

As mentioned above, high-resolution imagery from both aerial and spaceborne
sensors presents a challenge to the user community in terms of information
extraction. The human eye and brain can identify objects in the image, but the
computer finds it difficult. If we cannot automate this process, then we will most
certainly lose out on some of the major economic benefits of the imagery.
If the human brain can do it, why can't the computer? Well, it actually can, if it
uses rule- or knowledge-based processing, just as the human brain does. The
brain can make a decision on an image very quickly by understanding and using
context. If we see grassland in the center of an urban development, we can
easily decide that it is a park, as opposed to agricultural land. To make this
decision we are using knowledge and experience to create expertise, and
computer-based expert systems are beginning to emerge that mimic this
process.
For many years, expert systems have been used successfully for medical
diagnoses and various information technology (IT) applications but only recently
have they been applied successfully to GIS applications.
Statistical image processing routines, such as maximum likelihood and ISODATA
classifiers, work extremely well at performing pixel-by-pixel analyses of images to
identify land-cover types by common spectral signature. Expert-system
technology takes the classification concept a giant step further by analyzing and
identifying features based on spatial relationships with other features and their
context within an image.
Expert systems contain sets of decision rules that examine spatial relationships
and image context. These rules are structured like tree branches with questions,
conditions and hypotheses that must be answered or satisfied. Each answer
directs the analysis down a different branch to another set of questions.

The beauty of an expert system is that because true experts, such as foresters
or geologists, create the rules, also called a knowledge base, non-experts can
use the system successfully.
In terms of satellite images, the knowledge base identifies features by applying
questions and hypotheses that examine pixel values, relationships with other
features and spatial conditions, such as altitude, slope, aspect and shape. Most
importantly, the knowledge base can accept inputs of multiple data types, such
as digital elevation models, digital maps, GIS layers and other pre-processed
thematic satellite images, to make the necessary assessments.
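
As a toy illustration of how such decision rules might look in code (the rule, the index threshold and the input layers are entirely hypothetical, not taken from any actual expert system):

    import numpy as np

    def classify_grassland(ndvi, urban_mask, slope):
        """Hypothetical rule branch: vegetated pixels inside an urban context
        become 'park'; vegetated, gently sloping pixels outside it become
        'agriculture'; everything else stays 'other'."""
        labels = np.full(ndvi.shape, "other", dtype=object)
        vegetated = ndvi > 0.4                    # hypothesis: pixel is green
        labels[vegetated & urban_mask] = "park"   # context decides the class
        labels[vegetated & ~urban_mask & (slope < 5)] = "agriculture"
        return labels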

Automatic Information Extraction

In recent years it has become clear that most of the value of a Geographic
Information System lies in its data, rather than in its hardware or software. For data
to be valuable they need to be up-to-date in terms of completeness,
consistency, and accuracy. Mapping is often posed as an end-to-end process
where new source imagery is collected to meet certain project specifications and
the entire compilation process is performed using a homogeneous set of spatially
and temporally consistent data sources. In contrast, other mapping applications
require the ability to perform incremental updates of existing spatial databases from a
variety of disparate sources. Thus, a timely revision of GIS databases plays a
major role in the overall process of acquiring, manipulating, analyzing, and
presenting topographic data.
Besides techniques like the digitization of large maps and terrestrial surveys,
photogrammetry seems especially well suited for generating or updating
GIS databases, since it has already had a major impact on traditional map
updating. Digital photogrammetry based on digital images has the potential to
further increase this impact, mainly due to the possibility of at least partly
automating and thus speeding up the generation and/or revision process.
Except for the manual or semi-automatic measurement of ground control points,
almost all steps are automated, but frequently some manual post-editing is
required. Image matching, for example, still has problems in built-up areas and
limitations in forest areas. No robust solutions for break-line detection or object
extraction in these data exist so far.
Degree of automation:
The automated extraction of 3D objects like buildings, roads, bridges, street
furniture or vegetation is not yet widely used in practice, which is mainly due to
substantial technical problems. In order to solve the object extraction task, methods
of image understanding and image interpretation are applied; keywords are
image segmentation, object modeling and information fusion for detecting
and reconstructing 3-D objects from 2-D images. Major research efforts are
currently devoted to the extraction of man-made structures like roads and buildings
from digital aerial imagery and from space imagery. The approaches range from
manual methods to semi-automated and automated feature extraction methods
from single and multiple image frames. New developments in high-resolution
space sensors might allow medium and large scale mapping from space. Linear
objects like roads, railroads or river networks have long attracted
researchers, but due to the limited resolution of space imagery they could not
successfully be extracted for mapping at medium or large scales. With new sensors
offering ground sampling distances of 1 m and better, compared with the earlier
1 m-5 m, the possibilities for extracting linear objects have increased dramatically.

Schenk (Schenk, 1999) proposes the expression autonomous for a system that
can perform its task autonomously, without human interaction. Even systems called
automatic (like automatic DEM generation) are not purely automatic, as they
solve the task only up to a certain percentage of errors. Extending this, Heuel (Heuel,
2000) proposes classifying the degree of automation of systems using the
terms quantitative and qualitative interaction: methods are defined as automatic if
only simple yes/no decisions or a selection among alternatives, i.e. qualitative
interaction, are needed; they are regarded as semi-automatic if qualitative
decisions and quantitative input parameters are needed.
We need to initialize the extraction process, we might need to interact during
run-time, and we certainly need to validate or correct the results. The less interaction
we need, the higher the degree of automation. We expect from the integration
of automatic processes that the overall efficiency of the system is increased, but
we know that those processes can give erroneous results, which are costly for
the user and may thus decrease the efficiency of the system. We may want to
reduce the level of training by avoiding complexity and skill requirements in
decision-making, but we also want to reduce the number of manual actions in the
collection phase. Here we should refer not only to the amount of human
interaction, in terms of time and number of mouse operations, but also to the type
of interaction needed. We certainly have to select parameters according to the
task we want to solve and the data which are available; this is valid for all
systems. We need to give the image numbers of overlapping photographs, we
need to define the units (m or feet), or we need to give the type of features
searched for, like buildings and/or roads. We have to provide instructions on
how to collect buildings in an interactive system, or we need to give a set of
building models and some min-max values if we want to extract them
automatically. If we need to get more deeply involved in the algorithms, we might
need to give thresholds and steering parameters (window sizes, minimal angle
difference, minimal line length in the image, etc.), which are not always
interpretable; sometimes it is difficult to connect them to the task and the image
material. This also holds for some stopping criteria for the algorithms, like the
maximal number of iterations. The type of post-editing can also vary: we
might need to correct single vertex or corner points, or the topology of whole
structures, or we need to check manually for completeness.
Summarizing the above statements, we propose the following scheme, ranging
from an interactive system, where we can solve all the tasks required, through a
semi-automatic system, where we interact during the measurement phase, and an
automated system, where the interaction is focused at the beginning and the end
of the automatic process, to an autonomous system, which is beyond the horizon
right now.
1. Interactive system (purely manual measurement, no automation for any
measurement task).
2. Semiautomatic system (interactive environment and integration of
automatic modules in the workflow).
3. Automated system (interactive environment with interaction before and
after the automatic phase).
4. Autonomous system.

Cartographic Feature Extraction

Of all tasks in photogrammetry, the extraction of cartographic features is the most
time consuming. Since the introduction of digital photogrammetry, much attention
has therefore been paid to the development of tools for more efficient
acquisition of cartographic features. Fully automatic acquisition of features like
roads and buildings, however, appears to be very difficult and may even be
impossible. The extraction of cartographic features from digital aerial imagery
requires interpretation of this imagery. The knowledge one needs about the
topographic objects and their appearance in aerial images in order to recognize
these objects and extract the relevant object outlines is difficult to model and to
implement in computer algorithms. Therefore, only limited success has been
obtained in developing automatic cartographic feature extraction procedures.
Human operators appear to be indispensable for a reliable interpretation of aerial
images. Still, computer algorithms can contribute significantly to improving
the efficiency of feature extraction from aerial imagery. Whereas human
operators are better at interpretation, computer algorithms often outperform
operators on specific measurement tasks. So-called semi-automatic
procedures therefore combine the interpretation skills of the operator with the
measurement speed of the computer. This paper reviews the most common
strategies for semi-automatic cartographic feature extraction from aerial imagery.
In several strategies, knowledge about the features to be extracted can easily be
incorporated into the measurement part performed by a computer algorithm. Some
examples of the usage of such knowledge are described in the discussion at
the end of this paper.
Semi-automatic feature extraction
Semi-automatic feature extraction is an interactive process between an operator
and one or more computer algorithms. To initiate the process, the operator
interprets the image and decides which features are to be measured and which
algorithms are to be used for this task. By positioning the mouse cursor, the
approximate location of a feature is pointed out to the algorithm. If required, the
operator may also tune some of the algorithm's parameters and select an object
model for the current feature. Semi-automatic feature extraction algorithms have
been developed for measuring primitive features such as points, lines and
regions, but also for more complex, often parameterized, objects.
Extraction of points: Semi-automatic measurement of points is used for
measuring height points as well as for measuring specific object corners. The
first case is usually known as a "cursor on the ground" utility, which is
available in several commercial digital photogrammetric workstations. Here,
the operator positions the cursor at some XY-position in a stereoscopic
model, whereas the terrain height at this position is determined automatically
by matching patches of the stereo image pair. After this determination, the 3D
cursor snaps to the local terrain surface. Thus, the operator is relieved from
a precise stereoscopic measurement and can therefore increase the speed of
data acquisition. The second type of point measurement algorithm is used to
make the cursor snap to a specific object corner. These algorithms can be
used for monoplotting as well as for stereoplotting. For monoplotting, the
operator approximately indicates the location of an object corner to be
measured. The image patch around this approximate point will usually contain
grey value gradients caused by the edges of the object. By applying an
interest operator (see e.g. [Förstner and Gülch, 1987]) to this patch, the
location of the object corner can be determined. Thus, such utilities can make
the cursor snap to the nearest point of interest. When using the same
principle for stereoplotting, the operator has to supply an approximate 3D
position of the object corner. The interest operator can then be applied to
both stereo images, whereas the estimated 3D corner position is constrained
by the known epipolar geometry. For the measurement of house roof corners,
this procedure was reported to double the speed of data acquisition and
reduce operator fatigue [Firestone et al., 1996].
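
The idea behind corner snapping can be illustrated with a short sketch. The
Python fragment below is not code from any of the cited systems; it computes a
Harris-style corner response (a close relative of the Förstner operator)
within a small search window around the approximate cursor position and
returns the pixel with the strongest response. All names and parameter values
are illustrative assumptions.

    import numpy as np

    def snap_to_corner(img, x0, y0, search=15, patch=3, k=0.04):
        """Snap an approximate cursor position (x0, y0) to the nearest
        point of interest using a Harris-style corner response derived
        from the local structure tensor (a sketch; the Foerstner operator
        used in practice is closely related but not identical)."""
        gy, gx = np.gradient(img.astype(float))       # image gradients
        ixx, iyy, ixy = gx * gx, gy * gy, gx * gy
        half, p = search // 2, patch // 2
        best_score, best = -np.inf, (x0, y0)
        for y in range(y0 - half, y0 + half + 1):
            for x in range(x0 - half, x0 + half + 1):
                if not (p <= y < img.shape[0] - p and p <= x < img.shape[1] - p):
                    continue
                # Sum the structure tensor entries over a small patch
                sxx = ixx[y - p:y + p + 1, x - p:x + p + 1].sum()
                syy = iyy[y - p:y + p + 1, x - p:x + p + 1].sum()
                sxy = ixy[y - p:y + p + 1, x - p:x + p + 1].sum()
                # High response requires two strong, distinct edge directions
                det, trace = sxx * syy - sxy * sxy, sxx + syy
                score = det - k * trace * trace
                if score > best_score:
                    best_score, best = score, (x, y)
        return best   # refined position the cursor snaps to

In a production tool the corner position would additionally be refined to
sub-pixel precision, but the principle, replacing the operator's rough click
by the locally strongest interest point, remains the same.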

Extraction of lines: The extraction of lines from digital images has been a
topic of research for many years in the area of computer vision [Rosenfeld,
1969, Hueckel, 1971, Davis, 1975, Canny, 1986]. First attempts to extract
linear features from digital aerial and space imagery were reported in [Bajcsy
and Tavakoli, 1976, Nagao and Matsuyama, 1980]. Semi-automatic algorithms
have been developed for the extraction of roads. These algorithms can be
classified into two categories: algorithms using deformable templates and
road trackers.

Deformable templates:
Before an algorithm based on deformable templates is started, the operator
needs to provide the approximate outline of the road. This initial template of
the road is usually represented by a polygon with a few nodes near the road to
be measured. The task of the algorithm is to refine the initial template to a
new polygon with many more nodes that accurately outlines the road edges or
the road centre (depending on the road model used).
This is achieved by deforming the template such that a combination of two
criteria is optimised: the template should coincide with image pixels with
high grey value gradients, and the shape of the template should be relatively
smooth. The latter criterion is often accomplished by constraining the (first
and) second derivatives of the template. This constraint is needed for
regularisation, but it also leads to more plausible outlines, since road
shapes generally are quite smooth. Most algorithms of this kind are based on
so-called snakes [Kass et al., 1988]. The snakes approach uses an energy
function in which the two optimisation objectives are combined.
After computing the energy gradients due to changes in the positions of the
polygon nodes the optimal direction for the template deformation can be
found by solving a set of differential equations. In an iterative process the
polygon nodes are shifted in this optimal direction. The resulting behaviour of
the template looks like that of a moving snake, hence the name. Whereas
snakes were initially formulated for optimally outlining linear features in a
single image, they can also be used to outline a feature in 3D object space by
combining grey value gradients from multiple images together with the
exterior orientation of these images [Trinder and Li, 1995, Neuenschwander
et al., 1995].
This snakes approach has also been extended to outline both sides of a road
simultaneously. More research is being conducted to further improve the
efficiency of mapping with snakes, by reducing the requirements on the
precision of the initial template provided by the operator and by
incorporating scene knowledge into the template deformation process
[Neuenschwander et al., 1995, Fua, 1996].
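
To make the optimisation idea tangible, the following Python sketch implements
a greedy variant of snake refinement for an open road template. This is a
simplification made for illustration only: the original formulation of Kass et
al. minimises the energy by solving differential equations, whereas this
sketch simply moves each node to the best pixel in its 3x3 neighbourhood; all
names and weights are assumptions.

    import numpy as np

    def greedy_snake(grad_mag, nodes, alpha=0.5, beta=2.0, iters=200):
        """Greedy snake refinement of an open polygon template.
        grad_mag : 2D array of grey value gradient magnitudes.
        nodes    : (n, 2) array of template node coordinates as (x, y).
        Endpoints stay fixed; interior nodes move to minimise a weighted
        sum of smoothness (internal) and edge attraction (external) energy."""
        nodes = np.asarray(nodes, dtype=float)
        h, w = grad_mag.shape
        for _ in range(iters):
            moved = False
            for i in range(1, len(nodes) - 1):
                midpoint = 0.5 * (nodes[i - 1] + nodes[i + 1])
                best_e, best_p = np.inf, nodes[i]
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        p = nodes[i] + (dx, dy)
                        x, y = int(p[0]), int(p[1])
                        if not (0 <= y < h and 0 <= x < w):
                            continue
                        # Internal energy: deviation from the neighbours'
                        # midpoint approximates the second derivative
                        e_int = np.sum((p - midpoint) ** 2)
                        # External energy: attract the node to strong edges
                        e_ext = -grad_mag[y, x]
                        e = alpha * e_int + beta * e_ext
                        if e < best_e:
                            best_e, best_p = e, p
                if not np.allclose(best_p, nodes[i]):
                    nodes[i], moved = best_p, True
            if not moved:
                break    # no node improved: a (local) energy minimum
        return nodes

The balance between alpha and beta reflects exactly the two criteria mentioned
above: a large alpha favours smooth templates, a large beta favours templates
that cling to high gradients.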

Road trackers
In the case of snakes, the operator needs to provide a rough outline of the
complete road to be measured. In contrast, the input for road trackers only
consists of a small road segment outlined by the operator. The purpose of the
road tracker is then to find the adjacent parts of the road. Most road trackers are
based on matching grey value profiles [McKeown and Denlinger, 1988, Quam
and Strat, 1991, Vosselman and Knecht, 1995].
Based on the initial road segment outlined by the operator, a characteristic
grey value profile of the road is derived. Furthermore, the local direction
and curvature of the road are estimated. This estimate is used to predict the
position of the road one step size beyond the initial road segment. At this
position, and perpendicular to the predicted road direction, a grey value
profile is extracted from the image. By matching this profile with the
characteristic road profile, a shift between the two profiles can be
determined. Based on this shift, an estimate of the road position along the
extracted profile is obtained. By incorporating previously estimated
positions, other road parameters, such as the road direction and curvature,
can also be updated. The updated road parameters can then be used to make the
next prediction of the road position some step size further along the road.
This recursive process of prediction, measurement by profile matching and
updating of the road parameters can be implemented elegantly in a Kalman
filter [Vosselman and Knecht, 1995].
The road tracking continues until the profile matching fails at several
consecutive predicted positions, i.e., it stops when several successive
extracted profiles show little correspondence with the characteristic grey
value profile. Some characteristic results are shown in figure 3. Matching
failures can often be explained by trees along the road or by road crossings
and junctions: due to these objects, the grey value profiles extracted at
those positions deviate substantially from the characteristic profile. By
making predictions with increasing step sizes, the road tracker is often
able to jump over these kinds of obstacles and continue the outlining of the
road.
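
A stripped-down version of such a tracker is sketched below in Python. It is
an illustration under several stated assumptions, not the cited
implementation: the state only holds position and direction (no curvature),
the Kalman filter of Vosselman and Knecht [1995] is replaced by a direct
update, the characteristic profile is assumed to be shorter than the extracted
profile, and all names and parameter values are arbitrary.

    import numpy as np

    def match_profile(profile, ref):
        """Slide the (shorter) characteristic road profile 'ref' along the
        extracted profile; return (shift, correlation) of the best match."""
        n, m = len(ref), len(profile)
        best_shift, best_corr = 0, -1.0
        for s in range(m - n + 1):
            win = profile[s:s + n]
            if win.std() == 0.0:
                continue   # flat window: correlation undefined
            corr = float(np.corrcoef(win, ref)[0, 1])
            if corr > best_corr:
                best_shift, best_corr = s - (m - n) // 2, corr
        return best_shift, best_corr

    def track_road(img, pos, direction, ref_profile, step=10.0,
                   half_width=12, min_corr=0.8, max_retries=3):
        """Profile-matching road tracker: predict the next road centre,
        extract the grey value profile perpendicular to the road, and
        correct the prediction by the matched shift. On matching failure
        the step size is doubled to jump over obstacles.
        pos/direction: numpy arrays (x, y); direction is a unit vector."""
        points, retries, cur_step = [pos.copy()], 0, step
        normal = np.array([-direction[1], direction[0]])  # perpendicular
        while retries < max_retries:
            pred = points[-1] + cur_step * direction      # predicted centre
            # Sample the grey value profile perpendicular to the road
            offsets = np.arange(-half_width, half_width + 1)
            samples = pred[None, :] + offsets[:, None] * normal[None, :]
            rows = np.clip(samples[:, 1].astype(int), 0, img.shape[0] - 1)
            cols = np.clip(samples[:, 0].astype(int), 0, img.shape[1] - 1)
            profile = img[rows, cols].astype(float)
            shift, corr = match_profile(profile, ref_profile)
            if corr < min_corr:
                retries += 1
                cur_step *= 2      # try to jump over trees, junctions, ...
                continue
            retries, cur_step = 0, step
            points.append(pred + shift * normal)          # corrected position
            d = points[-1] - points[-2]
            direction = d / np.linalg.norm(d)
            normal = np.array([-direction[1], direction[0]])
        return np.array(points)

The doubling of the step size on matching failure is the mechanism, mentioned
above, by which the tracker can jump over trees, crossings and junctions.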

Extraction of areas
Due to the lack of modelled knowledge about objects, the computer-supported
extraction of area features is more or less limited to areas that are
homogeneous with respect to some attribute. In images, the most common
attributes to consider are, of course, the pixels' grey value, colour and
texture. Algorithms that extract homogeneous grey value areas can facilitate
the extraction of objects like water areas and house roofs. The most common
approach is to let the operator indicate a point on the homogeneous object
surface and let an algorithm find the outlines of that surface.
An example can be seen in figure 4. It is clear that the results of such an
algorithm still require some editing by an operator. Overhanging trees at the left
side of the river and trees that cast dark shadows at the right side of the river
cause differences between the bounds of the homogeneous area and the river
borders, as they should be mapped. Similar differences will also arise when
using these techniques to extract building roofs. Most objects are not
homogeneous enough to allow a perfect delineation. Still, the majority of the
lines to be mapped may be at the correct place. Thus, editing the results of
such an area feature extraction will often be faster than a completely manual
mapping process. Firestone et al. [1996] report the use of this technique for
mapping lakeshores. Especially for small-scale mapping this can be very
efficient, since the water surface generally appears homogeneous and the
disturbing effects of trees along the shoreline, as in the example, may be
negligible at small scales.
The algorithms used to find the boundaries of a homogeneous area are usually
based on the region-growing algorithm [Haralick and Shapiro, 1992]. Starting
at the pixel indicated by the operator, this algorithm checks whether an
adjacent pixel has similar attributes (e.g. grey value). If the difference is
below some threshold, the two
pixels are merged to one area. Next, the attributes of another pixel adjacent to
this area are examined and this pixel is also merged with the area if the attribute
differences are small. In this way a homogeneous area is grown pixel by pixel.
This process is repeated until all pixels that are adjacent to the grown area have
significantly different attributes.
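
A minimal Python sketch of such seeded region growing is given below. The use
of a running mean as homogeneity criterion is one of several possible choices,
and the function name and threshold value are illustrative assumptions.

    import numpy as np
    from collections import deque

    def region_grow(img, seed, threshold=10.0):
        """Seeded region growing: starting from the pixel the operator
        clicks (seed = (row, col)), merge 4-connected neighbours whose
        grey value is close to the running mean of the grown region.
        Returns a boolean mask of the homogeneous area."""
        h, w = img.shape
        mask = np.zeros((h, w), dtype=bool)
        queue = deque([seed])
        mask[seed] = True
        total, count = float(img[seed]), 1   # running region statistics
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and not mask[rr, cc]:
                    # Merge the neighbour if its grey value is similar
                    # to the mean grey value of the region grown so far
                    if abs(float(img[rr, cc]) - total / count) < threshold:
                        mask[rr, cc] = True
                        queue.append((rr, cc))
                        total += float(img[rr, cc])
                        count += 1
        return mask

The returned mask can then be vectorised into an outline polygon, which the
operator edits where trees or shadows have distorted the boundary.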

Extraction of complex objects


As requirements on geographical data are shifting from 2D to 3D and from
vector data to object-oriented data, the acquisition of these data with
digital photogrammetry is also becoming increasingly three-dimensional and
object based. In particular for the acquisition of 3D objects like buildings
and other highly structured objects, the usage of object models can be
beneficial. These models contain the topology and the internal geometrical
constraints of the object. The usage of these models relieves the operator
from specifying these data within the measurement process and will improve
the robustness and precision of the data acquisition.
A common interactive approach is illustrated in figure 5. After selecting an
appropriate object model, the operator approximately aligns the model with
the image (left image). In a second step, a fitting algorithm is used to find
the best correspondence between the edges of the object model and the
locations of high gradients in the image (middle image). Especially in the
presence of neighbouring edges with high contrast (like the windows on the
house front in the example), the resulting fit often does not correspond to
the desired result and therefore requires one or more additional corrective
measurements by the operator (right image). Different approaches are being
used to find the optimal alignment of the object model to the image. Fua
[1996] extended the snake algorithm described above for fitting
object models. The energy function is defined as a function of the sum of the
grey value gradients along the model edges. Derivatives of this energy function
with respect to changes in the co-ordinates of the object corners determine the
optimal direction for changes in these co-ordinates, whereas constraints on the
co-ordinates ensure that a valid building model with parallel and rectangular
edges is maintained. Lowe [1991] and Lang and Schickler [1993] use parametric
object descriptions and determine the optimal parameter values by fitting the
object edges to edge pixels (pixels with high grey value gradients) and extracted
linear edges respectively. Veldhuis [1998] analysed the approaches of Fua
[1996] and Lowe [1991] with respect to suitability for mapping.
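
To make the principle concrete, the sketch below fits a deliberately simple
parametric model, an axis-aligned image-space rectangle, by greedy local
search on the summed gradient magnitude along its edges. This toy stand-in
(all names and values are assumptions) only illustrates the idea of maximising
edge support; the cited approaches optimise full 3D object models under
geometric constraints such as parallelism and rectangularity.

    import numpy as np
    from itertools import product

    def edge_energy(grad_mag, corners):
        """Sum of gradient magnitudes sampled along the edges of a
        polygonal outline (higher = better alignment with image edges)."""
        e, n = 0.0, len(corners)
        for i in range(n):
            p = np.asarray(corners[i], float)
            q = np.asarray(corners[(i + 1) % n], float)
            for t in np.linspace(0.0, 1.0, 20):
                x, y = p + t * (q - p)
                r = int(np.clip(y, 0, grad_mag.shape[0] - 1))
                c = int(np.clip(x, 0, grad_mag.shape[1] - 1))
                e += grad_mag[r, c]
        return e

    def fit_rectangle(grad_mag, x, y, width, height, steps=20):
        """Refine position and size of a rectangular outline by greedy
        local search on the edge energy, starting from the operator's
        approximate alignment (x, y, width, height)."""
        params = np.array([x, y, width, height], float)
        def corners(p):
            x, y, w, h = p
            return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
        for _ in range(steps):
            best_e, best_p = edge_energy(grad_mag, corners(params)), params
            # Try unit perturbations of every parameter; keep the best
            for i, d in product(range(4), (-1.0, 1.0)):
                trial = params.copy()
                trial[i] += d
                e = edge_energy(grad_mag, corners(trial))
                if e > best_e:
                    best_e, best_p = e, trial
            if np.array_equal(best_p, params):
                break      # local optimum reached
            params = best_p
        return params

Because only the four rectangle parameters are adjusted, the model stays valid
by construction; this is the same role the parallelism and rectangularity
constraints play in the full 3D approaches.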

Semi-automatic measurement techniques as reviewed in this paper clearly
improve the efficiency of cartographic feature extraction. In most cases there
is a clear interaction between the human operator and one or more measurement
algorithms. Prior to the measurement, the task of the operator is to identify
the object to be measured, to select the correct object model and algorithm,
and to provide approximate values. After the measurement by the computer, the
operator needs to correct part of the measurements, since the delineation
resulting from the objective of the measurement algorithm often does not
correspond with the desired object boundaries.

Robustness as well as precision of the semi-automatic measurements can be
improved by incorporating knowledge about the topographic features into the
measurement process. A clear example of this was already shown for the case
of complex object measurement. Further knowledge can be added in the form of
constraints between neighbouring houses and roads. Hwang et al. [1986], for
example, use the fact that most houses are parallel to a road and that houses
are often connected to a road by a driveway. In the case of linear features
many more heuristics can be used to guide the feature extraction.
Cleynenbreugel et al.
heuristics can be used to guide the feature extraction. Cleynenbreugel et al.
[1990] notice that roads usually have no steep slopes and that, therefore, digital
elevation models can be useful for road extraction in mountainous areas.
Furthermore they notice that the road patterns are often typical for the type of
landscape (mountainous, flat rural, urban). Soft bounds on the usually low
curvature of principal roads are used in the road tracker described in [Vosselman
and Knecht, 1995].
Useful properties of water surfaces are related to height. Fua [1996] extracts
rivers as 3D linear features and imposes the constraint that the height of the
river decreases monotonically. Furthermore, when lakes are extracted as 3D
surfaces, they can often be assumed to be horizontal. The latter constraint
can be used to automatically detect delineation errors caused by occluding
trees along the lakeshore, as illustrated by the sketch below.
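
A possible form of such a check is sketched here in Python; the tolerance
value and the function name are illustrative assumptions.

    import numpy as np

    def flag_shore_errors(boundary_heights, tolerance=0.5):
        """Use the constraint that a lake surface is horizontal to flag
        suspect boundary points: points whose height deviates from a
        robust (median) estimate of the water level by more than the
        tolerance probably lie on occluding trees, not on the shoreline."""
        heights = np.asarray(boundary_heights, dtype=float)
        lake_level = np.median(heights)        # robust water level estimate
        return np.abs(heights - lake_level) > tolerance   # True = suspect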
To obtain a higher degree of automation through the interpretation of aerial
images by computer algorithms, much more knowledge has to be modelled.
Knowledge-based interpretation of aerial images and the usage of existing GIS
databases within this process is a topic of current research efforts [Kraus
and Waldhäusl, 1996, Gruen et al., 1997].
