Kim, Jijoong, Automatic aircraft recognition and identification, PhD thesis, School of
Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2005.
http://ro.uow.edu.au/theses/499
This paper is posted at Research Online.
NOTE
This online version of the thesis may have different page formatting and pagination
from the paper copy held in the University of Wollongong Library.
UNIVERSITY OF WOLLONGONG
COPYRIGHT WARNING
You may print or download ONE copy of this document for the purpose of your own research or
study. The University does not authorise you to copy, communicate or otherwise make available
electronically to any other person any copyright material contained on this site. You are
reminded of the following:
Copyright owners are entitled to take legal action against persons who infringe their copyright. A
reproduction of material that is protected by copyright may be a copyright infringement. A court
may impose penalties and award damages in relation to offences and infringements relating to
copyright material. Higher penalties may apply, and higher damages may be awarded, for
offences and infringements involving the conversion of material into digital or electronic form.
Automatic Aircraft Recognition and Identification

by
JIJOONG KIM
B.Eng. (Hons) (The University of Adelaide) 1993
M.Eng.Sc. (The University of Adelaide) 1995
School of Electrical, Computer and Telecommunications Engineering
Certification
I, Jijoong Kim, declare that this thesis, submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy, in the School of Electrical, Computer
and Telecommunications Engineering, University of Wollongong, is wholly my own
work unless otherwise referenced or acknowledged. The document has not been submitted for qualifications at any other academic institution.
Signature of Author
Date
Table of Contents

Table of Contents   iii
List of Tables   vi
List of Figures   ix
Abstract   xxi
Acknowledgements   xxiii

1 Introduction   1
   1.1 Design Objectives   2
   1.2 Definitions and Basic Assumptions   4
   1.3 System Description   6
   1.4 Contributions of the Thesis   7
   1.5 Outline of the Thesis   9

2 Aircraft Recognition Techniques: A Review   15
   2.4.3 TRIPLE System   43
   2.4.4 Das and Bhanu   45

7 Conclusions   223
   7.1 Summary   223
   7.2 Discussion   224
   7.3 Suggestions for Future Work   227

A Description of Input Parameters to the Neural Networks Feature Extractors   229

Bibliography   233
List of Tables

1.1 Simplified representation of aircraft domain knowledge.   18
2.2 An example of a 2-D table for efficient pose clustering. The resolutions (bin widths) for s, θ, Δx and Δy are 0.2, 20, 5 and 5 respectively.   34
3.2 The neural network configurations and the mean error rates in detection of wings, noses, wing-pairs and aircraft hypotheses.   104
3.3 Test of the neural networks on the spurious features that survived the rule-based approach. As shown in the third column, 30-40% of those features are successfully rejected by the neural networks.   107
4.1 Scores obtained in the process of aircraft evidence accumulation. The first 6 scores are dedicated to aircraft part detection, and the remaining evidence (the 7th to 18th entries) is introduced in order to help distinguish between aircraft and clutter hypotheses.   139
6.1 Comparison of the total number of lines with and without the use of the clutter removal algorithm for images with dense clutter.   205
6.2 Computational complexity of the aircraft recognition and identification processes.   210
6.3 Performance evaluation using real aircraft and clutter images.   216
6.4 Recognition rates in the eight imaging categories. Note that for the multiple aircraft category, the denominator 42 is the total count of aircraft in 17 multiple aircraft images.   216
List of Figures

1.1 Real aircraft images with blurring and noise.   10
1.2 Aircraft with camouflage.   10
1.3 Aircraft in background clutter.   10
1.4 Closely spaced multiple aircraft.   10
1.5 Aircraft with occlusion.   11
1.6 Aircraft with protrusions - engine protrusions for (a) and (b), and missile protrusions for (c) and (d).   11
1.7 Aircraft with shadows - shadows on aircraft shown in (a) and (b), background shadows cast by aircraft shown in (c) and (d).   11
2.1 (a) Aircraft represented by its skeleton, (b) primitives, (c) structure generated by using a string grammar, and (d) the skeleton that can be generated by the grammar L(G) = {abc^n d | n ≥ 1}.   17
2.2 The two projected angles determine the rotation (pitch and roll) of the model vertex-pair projected onto the image plane.   30
2.8 Convexity test on a line pair. For any two lines, Li and Lj, we determine two extra lines (green dashed) by joining the end points of Li and Lj. If these lines are contained in the segmented region (shaded), then the convexity test is passed.   46
3.2 Sliding search window for detecting dense clutter. The pixels in all four quadrants need to be dense and randomly oriented if the region under the window is to be tagged as clutter.   60
3.3 Detection of randomly oriented dense clutter regions. The clutter regions are shaded. The clutter-aircraft borders are correctly included in the non-clutter region so that the wing edges can be extracted.   61
3.4 Results of the dense clutter removal process. The first column shows original images, the second column edge images prior to the clutter removal algorithm, and the third column edge images after clutter removal.   62
3.5 Results of the dense clutter removal process (continued). The first column shows original images, the second column edge images prior to the clutter removal algorithm, and the third column edge images after clutter removal.   63
3.6 Contour labelling process. The current pixel searches for a contour pixel to inherit the label from. The direction of search is defined by the orientation of the current pixel.   64
3.7 If a contour has at least 30% of its pixels in the non-clutter region, the contour is accepted.   65
3.8 Straight line extraction process, similar to that of Lowe [79]. This algorithm generates a line approximation which is visually plausible.   66
3.11 Intensity means collected in the vicinity of the line pair. The intensity information is used to supplement the line extension decision.   71
3.12 (a) Line features prior to the line extension process; (b) line extension and prioritisation outcome - extended lines (red dotted), significant lines (blue), and non-significant lines (green).   71
3.13 Histograms of the line orientations are shown in the right column. The images in the left column show clutter lines that are predominantly oriented along one or two directions.   75
3.14 Forming a line link based on the endpoint proximity property is shown in (a), and a recursive line search to check if two lines are linked via a line chain is shown in (b).   77
3.18 Gradient distribution curve for the region enclosed by a two-line grouping. To pass the intensity check, the 10%, 20% and 30% percentiles must be less than preset thresholds (i.e., the majority of the population must be in the left corner).   83
3.20 Incorrect nose configurations in (a), (g), (l) are subject to further verification. Resulting accepted and rejected configurations are shown in blue and red, respectively.   85
3.22 Location of the nose tip. If the nose tip is not visible, then its location is estimated at the midpoint of the nose edges' intersection and the midpoint of the nose edges' inner endpoints.   86
3.23 Multiple two-line grouping configurations generated from a single physical nose.   87
3.24 Wing/nose representation. Leg1 and Leg2 are the two lines forming the two-line grouping. Note that the symbol # refers to a number.   90
3.25 Resulting wing and nose candidates from the two-line grouping process on the image of Figure 3.4(a). In (a), line pairs are shown in blue, and red lines are used to show which two lines are paired. [(b) 80 nose candidates and (c) 513 wing candidates.]   90
3.27 Three-point collinearity property both in space and in the image.   94
3.28 Four-line grouping representation. The two slots, right and left wing, hold the wing numbers which form the wing-pair. Note that the symbol # refers to a number.   95
3.31 Nose to wing-pair matching. The nose must be within the search region, must be facing the wing-pair, and the skewness must not be severe.   99
3.32 In the feature parameter space (2-D for illustrative purposes) the blue circles represent aircraft feature parameters and the red squares represent clutter feature parameters. (a) Use of single thresholds forms simple decision boundaries that pass many clutter features, and (b) the neural networks can generate complex-shaped decision boundaries.   101
3.33 Plot of the log-sigmoid function.   102
3.34 ROC curves for detection of (a) wings, (b) noses, (c) wing-pairs and (d) aircraft hypotheses.   106
4.1 Typical commercial and military aircraft, and the parts that need to be detected for evidence score accumulation.   111
4.2 Detection of fuselage edges and assessment of their coverage.   113
4.3 Scale factor (fL or fR) which is inversely proportional to the divided angular width of the fuselage search region, expressed in terms of (C FP, C PL) and (C FP, C PR).   115
4.4 The detected fuselage boundary lines connect the nose to the wing leading edges via connected chains. Such a nose-to-wing connection provides strong fuselage boundary evidence.   116
4.5 Locating tail fin edge lines: (a) geometric constraints in terms of location, length and orientation; (b) intensity-based constraints applied both in the foreground and background regions; (c) skewed symmetry constraints applied to tail fin leading edges (i.e., cot θ1 + cot θ2 = cot θ1' + cot θ2').   118
4.7 The wing leading edges must overlap when rotated about FP. The overlapping portion is shown in red. The same rule applies to the trailing edges of the wing-pair.   122
4.9 The background intensity is computed from the shaded periphery region. We assume this periphery region contains mainly the background.   124
5.26 Model matching for JSF with clutter and occlusion (match score = 72%).   199
5.27 Matching for Mirage with camouflage and protrusions (match score = 78%).   200
5.28 Matching for F18 with shadows (match score = 68%).   201
6.1 Number of line groupings extracted by the rule-based method: NN (blue), NW (red), N4G (black) and NH (green) versus line count NE (x-axis).   206
6.2 Number of line groupings extracted by the neural network based method: NN (blue), NW (red), N4G (black) and NH (green) versus line count NE (x-axis).   206
6.3 Distribution curves of the number of line groupings, NE (top left), NW (top right), N4G (bottom left) and NH (bottom right), obtained via the rule-based approach from the cluttered aircraft images.   207
6.4 Plots of total line counts. The curve represents the number of extended lines as a function of the unextended lines (prior to the line extension process).   211
6.5 Plot of NW curves obtained from real aircraft images, using the rule-based two-line grouping extraction algorithm. The red and black curves represent NW with and without intensity checks, respectively.   212
6.6 Plot of NW curves obtained from non-aircraft clutter images, using the rule-based two-line grouping extraction algorithm. The red and black curves represent NW with and without intensity checks, respectively.   213
6.7 ROC curves for the generic recognition of aircraft. The red curve is obtained when the rule-based method is used for the extraction of line groupings, and the blue curve is obtained using the neural networks.   215
6.8 Model match score: correct match (blue asterisk) and false match (red circle or red cross). A red circle represents a correct aircraft hypothesis matched to a wrong model. A red cross represents a spurious aircraft hypothesis matched to one of the models.   218
6.9 ROC curve: trade-off between true and false match rates as the threshold varies.   219
Abstract
Aircraft recognition remains a challenging problem despite a great deal of effort to
automate the recognition process. The majority of the aircraft recognition methods
assume the successful isolation of the aircraft silhouette from the background, and
only a few have actually addressed real world concerns, such as occlusion, clutter and
shadows. This thesis presents an automatic aircraft recognition system, which shows
improved performance with complex images. This system assumes from the start
that the image could possibly be degraded, contain occlusions, clutter, camouflage,
shadows and blurring. It is designed to tolerate and overcome the degradations at
various analysis stages. The first part of the thesis focuses on the generic aircraft
recognition problem using a generic description of aircraft parts and the geometric
relationships that exist among them. The system implements line groupings in a
hierarchical fashion, progressively leading towards a generic aircraft structure. A
voting scheme is used to consolidate line groupings belonging to an aircraft while
discouraging the formation of spurious line groupings. The aircraft identification
process is carried out in the second part of the thesis, where the generically recognised
aircraft is matched to model candidates. Model matching is carried out via pixel-level silhouette boundary matching. The system is tested on numerous real aircraft, scaled-down model aircraft and non-aircraft images with adverse image conditions. The developed system achieves a recognition rate of 84% at a false alarm rate of 7% on real aircraft images, and a correct matching rate of about 90% at a false matching rate of 7% on the generically recognised aircraft from model aircraft images.
Acknowledgements
I would like to express my sincere gratitude to my principal supervisor Prof. Abdesselam Bouzerdoum for his guidance and enthusiasm over the years. His cheerful
attitude and encouragement will be dearly missed.
I am also deeply grateful to Dr. Hatem Hmam for his constant guidance, advice and
friendship. Without his daily probing, criticisms and suggestions, I could not have
completed this journey.
I cannot thank enough my wife Christine Jang for putting up with me, and cheering
me up whenever I hit dead ends. This thesis is dedicated to her.
I thank Dr. Carmine Pontecorvo for proofreading the first draft of my thesis and
being a good friend for so many years.
I am indebted to my family for their love and prayers, and Dr. Farhan Faruqi, Mr.
Ashley Martin and other colleagues for their patience and support.
Chapter 1
Introduction
The task of reliably detecting and recognising an aircraft from single images remains a
challenging problem despite advances made in computing technology, image processing and computer vision. Aircraft recognition techniques have been reported using a
variety of methods, but very few have actually addressed real world concerns such as
occlusion, clutter and poor image quality.
A brief list of existing object-recognition techniques applied to aircraft recognition
can be found in [32]. A more recent overview is given in Chapter 2. Most recognition
techniques can broadly be categorised into moment invariant [15, 36, 57], Fourier
descriptor [25, 44, 123], syntactic/semantic grammar [34, 113], and model/knowledgebased techniques [9, 18, 32, 33, 86, 87]. Other methods that do not fit into the above
categories make use of wavelet transform [2], non-uniform rational B-splines and cross
ratios [115], and feature integration [55, 56]. There are some recent efforts to apply
neural network techniques for aircraft model matching [65, 66, 83, 84, 106, 124].
With the exception of model/knowledge-based techniques, virtually all of the above
mentioned methods require the successful extraction of the entire aircraft silhouette
or region. Aircraft recognition performance, therefore, suffers considerably under non-ideal conditions where various forms of image degradation or occlusion are present.
Model/knowledge-based methods, on the other hand, generally offer superior performance against noise, occlusion and clutter. They often make use of domain specific
knowledge to compensate for missing data and feature extraction deficiencies suffered at the lower levels of image processing. Furthermore, model-based systems call for the explicit matching of the aircraft image with a number of aircraft model instances. This matching is carried out at the pixel or feature level. Model matching provides strong evidence of the aircraft's presence in the image and allows viewpoint
determination.
1.1
Design Objectives
The main objective of this work is to design a vision system which can recognise
a generic aircraft under various forms of image degradation. A large proportion of
existing aircraft recognition approaches assumes that the aircraft silhouette can be
successfully separated from background, and usually makes use of synthetic images
to demonstrate performance. Knowledge-based aircraft recognition systems [9, 32]
are applied to real images and are usually provided with ancillary information about
image acquisition conditions such as the camera viewing angles and sun position.
Such ancillary information, which helps locate shadow regions and aircraft shadow-making edges, is assumed not to be available in this work. A list of the main
difficulties faced in automatic aircraft recognition is summarised below.
Poor image quality (see Figures 1.1(a)-1.1(d)) - Often noise can be filtered out
with smoothing. However, excessive noise and blurring often result in edge
fragmentation and distortion. Poor image contrast is particularly challenging as
some of the weak but critical edges may be washed away during edge detection.
Camouflage (see Figures 1.2(a)-1.2(d)) - Camouflage in visual band imagery is particularly challenging for region-based vision systems. The presence of many
segmented regions associated with camouflage patches is a source of confusion,
because of the excessive subdivision of the aircraft region into many subregions.
Clutter (see Figures 1.3(a)-1.3(d)) - Edge fragmentation is a common problem in
image processing. Compounding this difficulty is the presence of background
clutter, which makes distinguishing between aircraft and clutter edges very difficult. Furthermore, dense clutter in the immediate vicinity of aircraft boundaries
often introduces errors into edge detection algorithms, causing the boundaries to appear noisy and fragmented. Clutter also strains the system's resources in
terms of increased computational complexity.
Closely spaced multiple aircraft (see Figures 1.4(a)-1.4(d)) - Edges and parts from
one aircraft may coincidentally associate with parts of another aircraft nearby,
forming spurious aircraft hypotheses.
Occlusion (see Figures 1.5(a)-1.5(d)) - Occlusion distorts the global shape signature
of an object, and may cause the object to appear as two or more disjoint components. Airborne aircraft may become partially obstructed by clouds, smoke or flares. In addition, self-occlusion can occur when a missile, engine or rudder occludes parts of the aircraft fuselage or wings.
Protrusion (see Figures 1.6(a)-1.6(d)) - Engine or missile protrusions may complicate the model matching process. Missiles often result in some loss of model
matching sensitivity because they usually do not constitute a fixed part of the
aircraft and hence do not usually appear in the aircraft model set.
Often, combinations of these problems are present in a single image, magnifying the
challenges faced by the vision system. A number of dedicated algorithms have been
developed to detect or partially address some of these issues. This system, however,
relies more on its global architecture to overcome these issues and achieve aircraft
recognition.
1.2
Definitions and Basic Assumptions
We begin this section by defining the terms generic recognition and identification
that appear in the title and throughout the thesis. Firstly, our definition of aircraft is
confined to aeroplanes with either boomerang, diamond or triangle wings (see Figure
5.1(c)-(e) for examples). We exclude helicopters, hot-air balloons, or aeroplanes with
parallel wings or propellers. In this thesis, the term generic recognition includes
detecting the aircraft and having some information about its shape, which allows a
broad classification of the aircraft (e.g., commercial aircraft). The term identification refers to a specific aircraft model (e.g., Boeing-747, F18, etc.), and this process usually
involves model matching.
Our system does not require ancillary information such as sun position, weather conditions, camera viewpoints and target range. Furthermore, no contextual information is provided regarding the imaged environment (e.g., aircraft runway scene, clear sky scene, etc.). However, we still require a number of basic assumptions to be met.
Table 1.1: Simplified representation of aircraft domain knowledge.

Nose: connected to fuselage; oriented to face the wing pair.
Fuselage: long; roughly parallel to the longitudinal axis; connected to nose and wings; in between nose and wings.
Wings: trapezoidal or triangular shape; connected to fuselage; located between nose and tail fin; skewed symmetry about fuselage axis.
Tail fin: connected to fuselage; behind the wings; smaller in size than wings; closest to trailing edges of wings.
Aircraft Shape: Aircraft wings, fuselage and nose are assumed to be roughly coplanar.
Generic Viewpoint: The viewing angle must not be so oblique that the wings become invisible or that the wing edges appear parallel.
Weak Perspective Projection: We assume that aircraft images are taken from
a distance much larger than the aircraft's wingspan. Our system, however,
is designed to be tolerant to moderate perspective distortion. This has been
demonstrated using a number of close shot images.
Aircraft Resolution: Like most edge-based object recognition systems, this system
requires that the aircraft image is large enough to enable its boundaries to be
approximated by piecewise linear segments.
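The weak perspective assumption above can be made concrete with a short sketch (illustrative only, not from the thesis; the function name and the numbers are invented for this example). Under weak perspective, the per-point depth is replaced by a single mean depth, so every point on the aircraft is projected with one common scale factor, which is accurate when the camera distance dwarfs the wingspan:

```python
import numpy as np

def weak_perspective_project(points_3d, focal_length):
    """Project 3-D points with a single shared scale factor.

    Under weak perspective, the per-point depth Z is replaced by the
    mean depth of the object; this is a good approximation when the
    camera distance is much larger than the aircraft wingspan.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    mean_depth = points_3d[:, 2].mean()
    scale = focal_length / mean_depth
    return scale * points_3d[:, :2]

# A 30 m wingspan aircraft viewed from about 3 km away: the 5 m depth
# variation across the airframe is negligible next to the mean depth.
wing_tips = [[-15.0, 0.0, 3000.0], [15.0, 0.0, 3005.0]]
print(weak_perspective_project(wing_tips, focal_length=1000.0))
```

Because both wing tips share one scale factor, their projections stay symmetric even though their true depths differ slightly; a full perspective model would break that symmetry.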
Figures 1.1 - 1.7 show a selection of aircraft images from the image set processed by
our system. The image resolutions vary from roughly 250 × 250 to 600 × 600 pixels.
1.3
System Description
1.4
Contributions of the Thesis
The primary contribution of this work is the construction of a vision system for aircraft
recognition using real images. This system is designed to be robust against excessive
clutter, blurring, occlusion, camouflage and shadow, and is able to recognise multiple
aircraft in one image. To the best knowledge of the author, no previous system has
clearly demonstrated aircraft recognition performance using no contextual or ancillary
information (e.g., the image contains an airfield scene) and using a large number of real aircraft images obtained under degraded conditions and camouflage. Furthermore, the system performance was also tested with numerous non-aircraft images, and only
occasionally was a false generic recognition reported.
Other contributions of secondary nature include the following.
1.5
Outline of the Thesis
For the remaining part of the thesis, we first give a review of existing aircraft recognition methods in Chapter 2. The low level feature extraction and generation of line
groupings are presented in Chapter 3. In Chapter 4, four-line groupings are paired
with nose candidates to form aircraft hypotheses. Positive evidence is collected based on the aircraft part associations to support correct hypotheses. Negative evidence associated with clutter is also considered to negate spurious hypotheses. The hypothesis which survives the competition and conflict resolution emerges as the winning hypothesis. Chapter 5 deals with aircraft identification via model matching. Aircraft pose estimation, image and model alignment, computation of the match metric and best match finding are presented in this chapter. Chapter 6 presents performance results and the system's computational complexity. This dissertation
concludes with Chapter 7, which includes the thesis summary, relevant topics for
discussion and suggestions for future work.
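The evidence-accumulation step outlined above can be sketched as a simple signed voting scheme. This is a toy illustration only: the evidence names and weights below are invented placeholders, not the thesis's actual scores from Table 4.1. Each hypothesis accumulates positive evidence from detected aircraft parts and negative evidence from clutter indicators, and the highest-scoring hypothesis survives:

```python
def score_hypothesis(evidence):
    """Accumulate signed evidence scores for one aircraft hypothesis.

    Positive entries reward detected aircraft parts; negative entries
    penalise clutter indicators. Names and weights are illustrative.
    """
    weights = {
        "wing_pair": 3.0, "nose": 2.0, "fuselage_edges": 2.0,
        "tail_fin": 1.0, "dense_clutter_nearby": -2.0,
        "random_line_orientations": -1.5,
    }
    return sum(weights[name] for name in evidence if name in weights)

# Two competing hypotheses: one supported by aircraft parts, the other
# formed from clutter. The winner is the highest-scoring hypothesis.
hypotheses = {
    "H1": ["wing_pair", "nose", "fuselage_edges", "tail_fin"],
    "H2": ["wing_pair", "dense_clutter_nearby", "random_line_orientations"],
}
best = max(hypotheses, key=lambda h: score_hypothesis(hypotheses[h]))
print(best, score_hypothesis(hypotheses[best]))  # H1 8.0
```

The point of the signed scheme is that a spurious grouping can still earn some positive part evidence (H2 has a wing-pair), yet the clutter penalties pull its total below that of a genuine aircraft hypothesis.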
Figure 1.6: Aircraft with protrusions - engine protrusions for (a) and (b), and missile
protrusions for (c) and (d).
Figure 1.7: Aircraft with shadows - shadows on aircraft shown in (a) and (b), background shadows cast by aircraft shown in (c) and (d).
Figure 1.8: System overview. Hypothesis generation: edge detection, clutter rejection, contour extraction, line extraction, two-line grouping (wing and nose), and four-line grouping (boomerang, diamond and triangle wing pairs). Hypothesis verification: evidence accumulation (aircraft parts detection and other evidence). Validation and identification: model selection, pose estimation, and model-to-image matching against the model set (F16, F18, F35, F111, Mirage, etc.), leading to aircraft identification.
Figure 1.9: Feature hierarchy for generation of an aircraft hypothesis. The aircraft
hypothesis (nose-wingpair association) is at the top. The lower level features are
four-line groupings, two-line groupings and lines. By using pointers, the system can access any low-level feature of hypothesis Hi.
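The pointer-based hierarchy of Figure 1.9 might be represented with nested references, so that every underlying image line is reachable from the hypothesis at the top. This is a minimal sketch under that reading of the figure; the class and field names are assumed, not taken from the thesis:

```python
from dataclasses import dataclass

@dataclass
class Line:
    id: int

@dataclass
class TwoLineGrouping:      # a wing or nose candidate
    legs: tuple             # (Line, Line)

@dataclass
class FourLineGrouping:     # a wing-pair candidate
    left_wing: TwoLineGrouping
    right_wing: TwoLineGrouping

@dataclass
class AircraftHypothesis:   # nose and wing-pair association
    nose: TwoLineGrouping
    wing_pair: FourLineGrouping

    def lines(self):
        """Follow the pointers down to every underlying image line."""
        groups = (self.nose, self.wing_pair.left_wing,
                  self.wing_pair.right_wing)
        return [line for g in groups for line in g.legs]

# Build one hypothesis from six image lines and recover them all.
img_lines = [Line(i) for i in range(6)]
hi = AircraftHypothesis(
    nose=TwoLineGrouping((img_lines[0], img_lines[1])),
    wing_pair=FourLineGrouping(
        TwoLineGrouping((img_lines[2], img_lines[3])),
        TwoLineGrouping((img_lines[4], img_lines[5]))),
)
print([ln.id for ln in hi.lines()])  # [0, 1, 2, 3, 4, 5]
```

Keeping references rather than copies means a line shared by several competing groupings exists once, which matters later when conflicting hypotheses must be compared and resolved.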
Chapter 2
Aircraft Recognition Techniques:
A Review
There are a variety of object recognition and classification methods that can possibly
be applied to the domain of aircraft recognition. The different approaches to aircraft
recognition may be broadly classified into linguistic pattern recognition techniques,
global matching techniques, local matching techniques and knowledge-based systems.
Section 2.1 discusses syntactic/semantic grammar methods that use linguistic patterns to analyse shape. Section 2.2 reviews global matching techniques, which use knowledge-free global shape descriptors, such as moment invariant features and Fourier descriptors, to uniquely describe various shapes of aircraft silhouettes. In Section 2.3,
local matching approaches are explored, with a special attention to indexing/clustering
methods that are commonly referenced in the field of geometric matching. These
methods adopt a paradigm of hypothesise then verify in an attempt to overcome the
shortcomings of the global shape descriptor techniques (e.g., sensitivity to noise and occlusion). The mainstream methods of this field, commonly known as pose clustering, alignment and geometric hashing, are reviewed. Then a number of techniques that extend
the indexing idea to aircraft shape recognition are presented, and these include Mundy
and Heller [89], Marouani et al. [81], Fairney [37], and Chien and Aggarwal [27]. Section 2.4 is dedicated to knowledge-based systems, such as COBIUS [9], ACRONYM
[18], TRIPLE [86], and that of Das and Bhanu [32]. The system proposed by Das
and Bhanu [32, 33] is explored more closely as it is the most recent and brings the
greatest relevance to our work.
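As a toy illustration of the pose-clustering idea mentioned above (not drawn from any of the cited systems), image-to-model feature correspondences vote for a quantised pose, and a peak in the accumulator indicates a consistent hypothesis. The bin widths follow those quoted for Table 2.2 (0.2 for scale, 20 for rotation, 5 for each translation component); the vote values themselves are invented:

```python
from collections import Counter

def pose_bin(scale, angle_deg, dx, dy, widths=(0.2, 20.0, 5.0, 5.0)):
    """Quantise a (scale, rotation, translation) pose into accumulator bins."""
    pose = (scale, angle_deg, dx, dy)
    return tuple(int(p // w) for p, w in zip(pose, widths))

# Votes from hypothetical feature correspondences: five agree on roughly
# the same pose, two are spurious outliers.
votes = [
    (1.05, 31.0, 12.0, 7.0), (1.08, 33.0, 13.5, 6.2),
    (1.02, 30.5, 11.1, 8.9), (1.11, 34.9, 14.9, 9.9),
    (1.04, 36.0, 10.2, 5.5),
    (0.40, 170.0, 80.0, 60.0), (2.30, 95.0, 3.0, 44.0),
]
accumulator = Counter(pose_bin(*v) for v in votes)
peak, count = accumulator.most_common(1)[0]
print(peak, count)  # (5, 1, 2, 1) 5
```

The five consistent votes land in one bin while the outliers scatter, which is exactly why clustering in pose space tolerates a fraction of wrong correspondences.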
2.1
These approaches use linguistic pattern recognition techniques to analyse shape and
classify aircraft using piecewise linear border approximations. Basically the idea
behind syntactic recognition is the specification of a set of primitives (lines or arcs)
and a set of rules (grammar) that governs their geometric relationships [14, 39, 40].
This grammar specifies combinations of these primitives to construct the piecewise-linear aircraft boundaries. The grammar can be either in the form of a string or can be extended to tree form.
To better explain the underlying concept of syntactic recognition, a simple example is
considered. Suppose the object shown in Figure 2.1(a) represents an aircraft skeleton.
We define the primitives as shown in Figure 2.1(b) to describe the structure of this
skeleton. The grammar, G, is expressed as
G = (N, Σ, P, S)
where
N = a finite set of syntactic categories called non-terminals
Σ = a finite set of image primitives (e.g., lines) called terminals
P = a finite set of production rules
S = the starting symbol
Figure 2.1: (a) Aircraft represented by its skeleton, (b) primitives, (c) structure generated by using a string grammar, and (d) the skeleton that can be generated by the grammar L(G) = {abc^n d | n ≥ 1}.
Referring to Figure 2.1, the grammar is G = (N, Σ, P, S), with N = {A, B}, Σ = {a, b, c, d}, and P = {S → aA, A → bB, B → cB, B → d}, where A and B are the non-terminals and S is the starting symbol. The terminals Σ = {a, b, c, d} correspond to the primitives shown in Figure 2.1(b). Applying the first production from P (S → aA), followed by sequential applications of the productions A → bB, B → cB, B → cB, B → cB and B → d, derives the string {abcccd}, which represents the aircraft skeleton shown in Figure 2.1(c). The language generated by the rules of this grammar is L(G) = {abc^n d | n ≥ 1}, which means G is only capable of generating skeletons of the form shown in Figure 2.1(d), with arbitrary length for the fuselage section (represented by the primitive c). In this example, we assumed that the interconnection between the primitives takes place at the dots shown in Figure 2.1(b).
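The derivation above can be reproduced in a few lines of code (a sketch for illustration; the thesis does not provide an implementation). Rewriting the leftmost non-terminal with the productions of P generates exactly the strings of L(G) = {abc^n d | n ≥ 1}, and membership in the language reduces to a simple pattern check:

```python
import re

# Productions of G: S -> aA, A -> bB, B -> cB, B -> d.
def generate(n):
    """Derive the string with n repetitions of primitive c (n >= 1)."""
    s = "S"
    s = s.replace("S", "aA")           # S -> aA
    s = s.replace("A", "bB")           # A -> bB
    for _ in range(n):
        s = s.replace("B", "cB", 1)    # B -> cB, applied n times
    return s.replace("B", "d")         # B -> d

def in_language(word):
    """Membership test for L(G) = {a b c^n d | n >= 1}."""
    return re.fullmatch(r"abc+d", word) is not None

print(generate(3))            # abcccd, the skeleton of Figure 2.1(c)
print(in_language("abcccd"))  # True
print(in_language("abd"))     # False: at least one c is required
```

The regular expression works here only because this particular L(G) happens to be a regular language; richer shape grammars with trees or semantic attributes would need a genuine parser rather than pattern matching.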
In more complicated situations, the rules of connectivity as well as the information
Table 2.1: An example of semantic information embedded in the production rules.

S → aA: Connection to a is made only at the dots. Length of a is 3 cm. Production rule can be applied only once.
A → bB: Connections to b are made only at the dots. Direction of b is given by the perpendicular bisector of the line joining the end points of the two un-dotted segments. Direction of b must be the same as the direction of a. No multiple applications of this production. The length of the wing is 6 cm.
B → cB: Connections to c are made only at the dots. Direction of c must be the same as the direction of a. This production can be repeated no more than 5 times. Length of c is 2 cm.
B → d: Connection to d is made only at the dot. Orientation of d and b must be the same.
regarding other factors such as primitive length and direction, and the limitations
on the repeatability of the production, must be made explicit. This can be carried
out by introducing the semantic information to the system. The semantic rules deal
with correctness of the object structure established by the syntax from the production
rules. By using the semantic information, a broader class of patterns can be described
without having to increase the size of the production rules and primitives. An example
of the semantic information embedded to the production rules is shown in Table 2.1.
Tang and Huang [113], and Davis and Henderson [34] apply these linguistic shape
analysis techniques to the recognition of aircraft (in terms of silhouette boundary)
in aerial images. They explicitly consider the problem of superfluous hypotheses in
existing methods, and propose a way to get around the problem, by introducing a
design of what they call a creation machine. The creation machine is an abstract
mechanism that applies formal language theory to filtering out unwanted words (spurious line segments) and to establishing an order to the wanted words (aircraft line
segments). They allow for possible segmentation of the contour, and use broken
contours (the straight line segments) and relationship among them to describe the
aeroplane. They also acknowledge the difficulty in finding a set of good thresholds
for all the images, and adopt a multiple threshold approach to deal with real images.
Despite efforts to be practical with real aircraft images, their algorithm is limited to
only one particular type of aircraft. Expanding to a larger class of different aircraft
shapes leads to much larger grammars and often less effective parsers. Moreover,
these methods suffer from more shortcomings such as the necessity for computing a
unique segmentation of the shape into primitives, and the requirement of assigning a
unique terminal name to each primitive. Hence, these syntactic/semantic approaches
suffer when presented with missing data and distortion of extracted segments.
Davis and Henderson [34] address the fact that a shape can be decomposed into many,
possibly overlapping primitives. They attempt to overcome this problem by introducing an approach called a hierarchical constraint process that assigns all plausible terminal symbol names (or labels) to each primitive (or part), and allows higher level processes to disambiguate the labelling of each part. The experiment demonstrates the
system's capability in handling the uncertainties associated with segmented boundary
lines. However, the results were confined to 2-D silhouettes of aeroplanes viewed from
directly above (ie., zero roll and pitch). Moreover, they acknowledge the difficulty
associated with the construction of a grammar to embrace various projected images
of complex-shaped aircraft.
Even though these techniques allow specification of the local structure rather than
global shape, and are capable of explicitly incorporating the variations in the object
shapes into the models, their claims have not yet been substantiated using a variety
of real aircraft images. Moreover, the issue of constructing a grammar capable of
handling such diversity of aircraft shapes and viewpoints has not yet been demonstrated, and it remains unclear how these methods can cope with occlusion, shadow
effects, camouflage and clutter.
2.2
2.2.1
The input image to this system is in a binary form where the aircraft is assumed to
be successfully isolated from the background and its pixels are assigned a value of
one. If the dimension of the image is M × N, the spatial central moment [102] of order (p + q) is expressed as,

μ_pq(unscaled) = Σ_{m=1}^{M} Σ_{n=1}^{N} (m − m̄)^p (n − n̄)^q F(m, n)        (2.2.1)

where F(m, n) = 1 if pixel (m, n) belongs to the aircraft region and 0 otherwise, and m̄ and n̄ are the mean values (centroid) of the aircraft region. Hu [57] has proposed a normalisation of the central moments. These normalised central moments
have been used to develop a set of seven compound spatial moments that are invariant to translation, rotation and scale change. The feature vector consisting of
these moments is computed from the image and subsequently compared with those
computed offline in the model database, using the Euclidean distance as the match
metric. The aircraft model associated with the best match is accepted as the viewed
aircraft in the image.
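To make Hu's normalisation concrete, the following sketch (our illustration, not code from any of the cited systems) computes the central moments of Equation (2.2.1) for a binary image, the normalised moments η_pq = μ_pq / μ_00^((p+q)/2+1), and the first two of the seven compound invariants, φ1 = η20 + η02 and φ2 = (η20 − η02)² + 4η11²:

```python
import numpy as np

def central_moment(F, p, q):
    """Spatial central moment mu_pq of a binary image F (Equation 2.2.1)."""
    m, n = np.mgrid[0:F.shape[0], 0:F.shape[1]]
    mass = F.sum()
    m_bar = (m * F).sum() / mass   # centroid row
    n_bar = (n * F).sum() / mass   # centroid column
    return ((m - m_bar) ** p * (n - n_bar) ** q * F).sum()

def hu_first_two(F):
    """First two of Hu's seven invariants, from normalised central moments."""
    mu00 = central_moment(F, 0, 0)
    eta = lambda p, q: central_moment(F, p, q) / mu00 ** ((p + q) / 2 + 1)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# A small rectangular "silhouette"; central moments are unchanged when the
# region is translated within the frame, so the invariants agree exactly.
F = np.zeros((20, 20)); F[5:9, 3:15] = 1
G = np.zeros((20, 20)); G[10:14, 5:17] = 1   # same shape, translated
print(np.allclose(hu_first_two(F), hu_first_two(G)))  # True
```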
Implementation of the moment invariant techniques on aircraft recognition tasks can
be traced to the work by Dudani et al. [36], Reeves et al. [103], McLaughlin [83],
and Breuers [15]. In Dudani et al. [36], the feature set contains seven Hu-moments
from aircraft boundary pixels and another seven from aircraft region pixels. The
test suite contains images of 6 aircraft viewed from different camera positions and
orientations. The Bayes decision and distance-weighted k-nearest neighbour rules
are used to find the best match (ie., classification). Reeves et al. [103] proposed a
normalisation technique, aspect ratio normalisation, that is less sensitive to noise, and
yields a comparable performance to the Fourier Descriptors method, described in the
next subsection. McLaughlin [83] introduced the use of quadratic neural nets through
which the moment invariants are matched to the models. Breuers [15] modified the
nearest neighbour search procedure to improve classification and the accuracy of pose
under a wide range of image resolutions and viewpoints.
Methods like these work well when the preprocessing stage can unambiguously generate the object outer boundary and therefore separate the object region from the
background. The strength of these methods lies in the fact that the feature set is not affected
by rotational, translational and scaling differences between an object model and its
observed image. The image-to-model feature matching can readily be implemented
in real time. However, for realistic images such as those shown in Section 1.1, it is
often extremely difficult to correctly segment the object region and isolate it from the
surrounding background. The sensitivity of these methods to intensity distribution
inside and outside the object silhouette makes moment methods less appealing to our
application.
2.2.2
Fourier Descriptors
incorporating an efficient nearest neighbour search, which arguably saves computation time without sacrificing performance.
The advantage of the FD is that the FDs can approximate segments of contours hence
enabling the partial shape matching in the presence of occlusion as discussed in [45].
However, these methods also suffer from shortcomings; the normalisation required in
deriving invariant features may not be uniquely determined. Compounding this problem is their sensitivity to sampling of contour points, uniformity of sample spacings,
size of the FD, quantisation error and the contour perturbations [32, 33]. For these
reasons, the FD invariant methods are not well suited to our application.
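A common way to construct FDs (a generic sketch; the cited methods differ in their normalisation details) treats the N sampled boundary points as complex numbers z_k = x_k + j·y_k and takes their discrete Fourier transform; zeroing the DC term removes translation, dividing by the magnitude of the first harmonic removes scale, and keeping only magnitudes discards the phase that carries rotation and starting-point dependence:

```python
import numpy as np

def fourier_descriptors(xs, ys, n_keep=8):
    """Translation-, scale- and rotation-invariant FD magnitudes of a contour."""
    z = np.asarray(xs, dtype=float) + 1j * np.asarray(ys, dtype=float)
    Z = np.fft.fft(z)
    Z[0] = 0.0                      # drop DC term -> translation invariance
    Z = Z / np.abs(Z[1])            # divide by first harmonic -> scale invariance
    return np.abs(Z[1:n_keep + 1])  # magnitudes -> rotation/start-point invariance

# A circular contour, then a scaled and translated copy of it.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
xs, ys = np.cos(t), np.sin(t)
xs2, ys2 = 3 * xs + 10, 3 * ys - 4
print(np.allclose(fourier_descriptors(xs, ys),
                  fourier_descriptors(xs2, ys2)))  # True
```

Note that the normalisation by the first harmonic is exactly the step that "may not be uniquely determined" in degenerate cases (a vanishing first harmonic), illustrating the shortcoming noted above.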
Glais and Ayoun [44] developed a system that accounts for commonly encountered
problems in practice. Their algorithm embeds two different recognition approaches,
which are selectively implemented based on the quality and properties of the input
image. If the target and background are not clearly separable, then a syntactic pattern
recognition technique is applied using local image features. In more detail, first a watershed algorithm is applied to the image to separate the object and the
background. Usually, multiple separation hypotheses are generated in this process.
These hypotheses are converted to the FD, and then compared to a library by means
of nearest neighbour search. If no match is found during the search, the system
assumes that the object separation was not successful. In this case, syntactic pattern
matching using local features is activated. The description of their work, however, is
not given in detail in the article. The experimental result, obtained from a test suite
of computer-generated aircraft images, indicates that the performance is sensitive to
observation condition and background structure. Applicability to real aircraft images
is yet to be validated.
2.3
In many applications, the objects to be recognised are usually well defined in terms
of shape and size and are limited in number so that specific models can be stored
and used to help identify the viewed object. Model matching techniques relying on
global features are efficient in terms of matching speed, but have a limited capability
to handle shadows, occlusion and clutter. An alternative approach is to make use of
local features such as corners, holes [11], lines [112] and curvature [122] to achieve
object recognition. Unlike moment and FD methods, the recognition process is not
achieved in one step, but often calls for a search method to take place in either the
transformation parameter domain (eg. pose clustering) or in the model and image
feature spaces (eg. alignment). All these geometric matching schemes employ a
hypothesise then verify paradigm, where the local features are used to hypothesise a
transformation (pose), followed by a verification process that ensures that all model
features are consistently matched to their image counterparts.
Commonly used matching methods include pose clustering [3, 37, 48, 97], alignment
[27, 60] and geometric hashing [43, 72, 74, 118, 126, 127]. These methods are built
upon the observation that a transformation of an object may be defined by a transformation of a small subgroup of the object features. In this section, a general description
and comparison of these techniques are outlined, and a number of investigations into
aircraft recognition (by Mundy and Heller [89], Marouani et al. [81], Fairney [37], and Chien and Aggarwal [27]) are discussed.
2.3.1
Pose Clustering
In the pose clustering (or Hough Transform) approach [3, 97, 48, 37], recognition of an
object is achieved by iteratively finding transformations that map feature subsets from
the model domain to the image, and by generating clusters of the transformations.
Let us assume that the model can be represented by a set of features, called interest
features [126], which can also be extracted from the image. In the most general (and
least informative) case, the interest features will be just points or lines.
Consider a 2-D image to 2-D model match where interest points from corners and inflections are used as the interest features. A 2-D affine transformation can be represented by six independent parameters as shown below,
x′ = ax + by + c
y′ = dx + ey + f

where (x, y) and (x′, y′) represent the 2-D coordinates of the model and image interest
points. This technique treats the affine transformation as a point (single count) in
the 6 dimensional parameter space. To solve for the 6 unknowns, we require four additional linear equations (ie., two additional points). Each correspondence of a model
point triplet with three image points generates one candidate affine transformation,
recorded as one vote in the 6-D parameter space. Good transform alignments result
in dense vote clusters in the parameter space.
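The voting step can be sketched as follows (a minimal illustration with hypothetical point sets, not code from any cited system): each pairing of a model triplet with an image triplet yields six linear equations in (a, b, c, d, e, f), the solution is quantised, and the corresponding bin accumulates one vote.

```python
import itertools
import numpy as np
from collections import Counter

def affine_from_triplets(model3, image3):
    """Solve x' = ax + by + c, y' = dx + ey + f from three correspondences."""
    A, rhs = [], []
    for (x, y), (xp, yp) in zip(model3, image3):
        A.append([x, y, 1, 0, 0, 0]); rhs.append(xp)
        A.append([0, 0, 0, x, y, 1]); rhs.append(yp)
    return np.linalg.solve(np.array(A, float), np.array(rhs, float))

def pose_cluster(model, image, bin_width=0.25):
    """Vote each triplet-pair transform into a quantised 6-D accumulator."""
    votes = Counter()
    for m3 in itertools.permutations(model, 3):
        for i3 in itertools.combinations(image, 3):
            try:
                p = affine_from_triplets(m3, i3)
            except np.linalg.LinAlgError:
                continue  # collinear triplet: no unique transform
            votes[tuple(np.round(p / bin_width).astype(int))] += 1
    return votes

# Hypothetical model points; the image is the model rotated by 90 degrees.
model = [(0, 0), (2, 0), (0, 1), (3, 2), (1, 3)]
image = [(-y, x) for (x, y) in model]
best_bin, count = pose_cluster(model, image).most_common(1)[0]
print(np.array(best_bin) * 0.25)  # the 90-degree rotation: a=0, b=-1, c=0, d=1, e=0, f=0
```

The correct transform collects one vote per consistent triplet pairing, while spurious pairings scatter across the accumulator, which is exactly the dense-cluster behaviour described above.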
For a model of m points and an image of n points, m³n³ correspondences are required, which is computationally expensive. Another downside of this method is the large dimensionality of the transformation table. The requirement of a 6-D parameter space for 2-D matching is memory inefficient, hence the method's usefulness on higher dimensional problems is questionable. In addition, this method compares
all of the image triplets with all of the model triplets. Allowing every one of these
exhaustive pairings to contribute a vote makes this method susceptible to noise (ie.,
the transformation table is likely to have many noise spikes) [48].
This method does not account for global consistency between object and image features; an incomplete set containing a large number of fragments may be favoured by
the Hough algorithm over the desirable set comprising a fewer number of long lines
that completely enclose the object boundary [5].
2.3.2
Alignment
Huttenlocher and Ullman use the term alignment to refer to the transformation from
model to image coordinate frames [60]. They proposed a method for computing a
transformation from three non-collinear points under a weak perspective assumption.
The system is operated in a prediction-and-verification fashion. After each possible
alignment from a pair of triplets of points is determined, complete edge contours are
then used to verify the hypothesised match. For m model points and n image points, there are C(m,3)C(n,3)3! possible alignments, which are explored in an exhaustive search. In their implementation, each model point is associated with an orientation attribute. The intersection of two lines that are defined by two points and their orientations is used to induce the third point. This enables forming an alignment using only two model and two image points. Using this technique, they reduce the complexity down to C(m,2)C(n,2)2!. Each hypothesised alignment must be verified by matching the
transformed model with the image. They organise the verification process in a hierarchical fashion: segment endpoints are used for initial verification first, and only
those alignments that pass the initial verification use the entire contour to perform
detailed verification. The solution found is unique up to a reflection ambiguity. Since
the alignment of features is local and is obtained by identifying corners and inflections
in edge contours, the features are more tolerant to partial occlusion.
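The two-point trick described above can be sketched as follows. This is our illustration, not Huttenlocher and Ullman's implementation (in particular, the point coordinates are hypothetical and a full affine solve stands in for their weak-perspective formulation): the oriented lines through two model points are intersected to induce a third point, and the resulting three correspondences determine the alignment.

```python
import numpy as np

def induce_third_point(p1, theta1, p2, theta2):
    """Intersect the lines through p1, p2 with directions theta1, theta2."""
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    # Solve p1 + t*d1 = p2 + u*d2 for t.
    A = np.column_stack([d1, -d2])
    t, _ = np.linalg.solve(A, np.array(p2) - np.array(p1))
    return np.array(p1) + t * d1

def alignment_from_points(model_pts, image_pts):
    """Affine alignment from three correspondences (two given + one induced)."""
    A, rhs = [], []
    for (x, y), (xp, yp) in zip(model_pts, image_pts):
        A.append([x, y, 1, 0, 0, 0]); rhs.append(xp)
        A.append([0, 0, 0, x, y, 1]); rhs.append(yp)
    return np.linalg.solve(np.array(A, float), np.array(rhs, float))

# Two model points with orientation attributes; their lines meet at (2, 2).
m1, m2 = (0.0, 0.0), (4.0, 0.0)
m3 = induce_third_point(m1, np.pi / 4, m2, 3 * np.pi / 4)
# Hypothetical image: the model translated by (5, 1).
i1, i2, i3 = (5.0, 1.0), (9.0, 1.0), (7.0, 3.0)
print(alignment_from_points([m1, m2, m3], [i1, i2, i3]))
# ~ [1, 0, 5, 0, 1, 1]: identity rotation/scale with translation (5, 1)
```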
2.3.3
Geometric Hashing
The idea of geometric hashing is to use invariants to index from an extracted scene
into a pre-stored hash table in order to discover the possible candidate matches. The
method is an efficient technique that uses spatial arrangements of features to locate
instances of models. Because this method does not match models one by one, it is
capable of effectively recognising objects from a large model database. The invariant
is the local coordinate of a point, (α_i, β_i), expressed with respect to a frame locally defined by three arbitrarily chosen non-collinear points known as the basis, [p0, p1, p2]. This can be expressed mathematically as p_i = p0 + α_i(p1 − p0) + β_i(p2 − p0). The
basis information as well as the model index are recorded in the hash table in the offline preprocessing stage. A voting process is involved to recover the transformation
between the object in the scene and the object in the model database during the
on-line recognition stage.
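The two stages can be sketched as follows (a minimal point-based illustration with hypothetical models; real systems add basis voting over many scene bases, quantised hash keys and verification):

```python
import itertools
import numpy as np
from collections import Counter

def basis_coords(p, basis):
    """(alpha, beta) of p w.r.t. basis [p0, p1, p2]: p = p0 + a(p1-p0) + b(p2-p0)."""
    p0, p1, p2 = (np.asarray(b, float) for b in basis)
    M = np.column_stack([p1 - p0, p2 - p0])
    return tuple(np.round(np.linalg.solve(M, np.asarray(p, float) - p0), 3))

def build_table(models):
    """Offline stage: hash (alpha, beta) -> (model index, basis) entries."""
    table = {}
    for idx, pts in enumerate(models):
        for basis in itertools.permutations(pts, 3):
            for p in pts:
                if p in basis:
                    continue
                try:
                    key = basis_coords(p, basis)
                except np.linalg.LinAlgError:
                    continue  # collinear basis: skip
                table.setdefault(key, []).append((idx, basis))
    return table

def recognise(table, scene):
    """Online stage: pick one scene basis and vote for (model, basis) entries."""
    votes = Counter()
    basis = scene[:3]
    for p in scene[3:]:
        for entry in table.get(basis_coords(p, basis), []):
            votes[entry] += 1
    return votes.most_common(1)[0] if votes else None

models = [[(0, 0), (1, 0), (0, 1), (2, 3), (3, 1)],   # hypothetical model 0
          [(0, 0), (1, 0), (0, 1), (5, 1), (1, 4)]]   # hypothetical model 1
scene = [(0, 0), (1, 0), (0, 1), (2, 3), (3, 1)]      # an instance of model 0
(model_idx, _), n_votes = recognise(build_table(models), scene)
print(model_idx)  # 0
```

Because the (α, β) coordinates are invariant to affine transformations of the point set, the same table serves all views covered by the affine assumption.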
Lamdan and Wolfson [72] introduce a prototype geometric hashing technique for
recognising flat industrial parts and synthesised 3-D objects. They view the geometric
hashing as a filtering procedure which can eliminate a large number of spurious solutions before direct verification is applied [74]. Gavrila and Groen [43] use a geometric
hashing system to recognise 3-D CAD models. Tsai [118] investigates the use of line
features to compute recognition invariants in a more robust way, and demonstrates
that this technique is noise resistant and more effective in occluded environments than
the point-based approaches.
More efficient indexing methods were developed by Stein [109, 111], where objects are
approximated as polygons. A sequence of consecutive line segments in the approximation is called a super segment. Super segments are encoded and stored in a hash
table for lookup at recognition time. Recognition proceeds by segmenting the scene
into a polygonal approximation; the code for each super segment retrieves model hypotheses from the table. Clustered hypotheses represent the instance of the model.
Finally the estimate of the transformation is refined. This work uses examples of
aircraft recognition from aerial photographs of airports. Stein also extended his work
to the problem of matching 3-D object models to 2-D image features [110], where
the importance of grouping control mechanisms to obtain a reasonable starting set of
features is stressed. He also argued that extending geometric hashing to 3-D full perspective matching is very difficult, and resorted to using the topological constraints
between the fairly complex image features.
Comparisons of the geometric hashing technique with the pose clustering and alignment methods have been addressed and can be found in [47, 48, 49, 51, 73, 126].
Grimson and Huttenlocher analysed the sensitivities of the Hough Transform [48]
and Geometric Hashing [47], and concluded that all these clustering based methods
suffer from the false positive rates becoming intolerably high in noisy and cluttered
environments. They seem to be more adequate for low dimensional matching problems under a controlled industrial setting, such as recognition of flat objects on a
conveyor belt under a stationary camera.
Figure 2.2: The projected angles determine the rotation (pitch and roll) of the model vertex-pair projected onto the image plane.
2.3.4
Particular Systems
Mundy and Heller [89] developed a model based recognition system that makes use
of 3-D vertex-pair of model and 2-D vertex-pair in the image to determine the affine
transform parameters (see Figure 2.2). Assuming a weak perspective projection, the
transformation between the object and image reference frames has six degrees of
freedom, three for rotations, two for translation and one for scaling. The vertex-pair
provides a sufficient number of constraints to determine the six parameters of the
affine transformation. Assuming that a correspondence has been made between the
affine projection of a 3-D model vertex pair and a set of 2-D edges and vertices derived
from the image intensity data, the roll and pitch rotations of the viewing angle can be
derived from the observed angles shown in Figure 2.2. The yaw angle can be
computed readily by measuring the rotation of the image vertex pair (shown in red
in Figure 2.2) with respect to the model vertex pair (in black) about the z axis. The
length ratio of the model and image vertex pair vectors is the estimate of the scale
factor (or equivalently the viewing distance if the camera focal length is known).
The estimated transformation casts a vote in the transform (Hough) space. The
six-parameter transform space is decomposed into subspaces, (ie., 2-D [roll, pitch]
array, 1-D [yaw] array, 3-D [x, y, scale factor]) for ease of computation. After
completing the voting process using a combination of binning and nearest neighbour
clustering techniques, clusters with large enough votes are considered to be a feasible
aircraft hypothesis. If the camera orientation and its parameters are known and the
aircraft is in a parked position, then the computation complexity is reduced and the
system robustness also improves. The validation process is carried out by comparing
the model edges (which have been transformed according to the computed viewpoint)
with the edge images. The actual edge coverage computation is performed using the
Distance Transform [13] (as will be discussed in Section 5.1.4).
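The edge-coverage idea can be sketched as follows (a generic chamfer-style illustration, not Mundy and Heller's implementation; the image and model pixels are hypothetical): the Distance Transform of the binary edge image stores, at every pixel, the distance to the nearest edge pixel, so projected model edges can be scored by how many of their pixels land close to image edges.

```python
import numpy as np

def distance_transform(edges):
    """Two-pass (city-block) distance transform of a binary edge image."""
    INF = 10**6
    d = np.where(edges, 0, INF).astype(float)
    rows, cols = d.shape
    for r in range(rows):               # forward raster pass
        for c in range(cols):
            if r > 0:
                d[r, c] = min(d[r, c], d[r - 1, c] + 1)
            if c > 0:
                d[r, c] = min(d[r, c], d[r, c - 1] + 1)
    for r in range(rows - 1, -1, -1):   # backward raster pass
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                d[r, c] = min(d[r, c], d[r + 1, c] + 1)
            if c < cols - 1:
                d[r, c] = min(d[r, c], d[r, c + 1] + 1)
    return d

def edge_coverage(dt, model_pixels, tol=1.0):
    """Fraction of transformed model edge pixels lying near an image edge."""
    dists = np.array([dt[r, c] for r, c in model_pixels])
    return (dists <= tol).mean()

edges = np.zeros((10, 10), dtype=bool)
edges[4, 2:8] = True                    # a horizontal edge segment
dt = distance_transform(edges)
print(edge_coverage(dt, [(4, 2), (4, 5), (5, 7), (0, 0)]))  # 0.75
```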
Mundy and Heller tested their algorithm on real images of C130 transport aircraft
parked in an airfield. The experimental results show good classification percentages, provided that the aircraft boundaries are successfully extracted. However, their
test setting is limited to low-clutter, high-contrast images only. Furthermore, being indexing-based, this system is subject to combinatorial strain and a Hough-space
dimensionality problem.
decomposed into its main discernible components: two wings, two rear wings, engines,
the fuselage and a tail. The general methodology consists of grouping primitives
extracted from the image into sets, which potentially represent hypotheses of instances of the aircraft in the image. The system aims to deal with the edge fragmentation problem due to various image degradations. The model is extracted by hand
from one or more images, and is composed of two orthogonal planes: a horizontal
plane outlining the wings and the fuselage, and a vertical plane representing the tail.
In the system, the camera model and transformation (translation and orientation)
are assumed known, hence the model is transformed accordingly and projected to the
image plane. Using the sun azimuth and incidence angles (assumed to be known),
the shadow outlines can be computed and augment the projected 2-D aircraft model.
This process is followed by a hidden line removal procedure.
The extraction of the image line segments is carried out using the LINEAR feature
extraction system [94]. Given a set of projected model segments and a set of extracted
image segments, every candidate pair of matching segments (one from each set) is
checked for their separation, angular deviation and length difference. These error
terms are used to determine the weighting for the vote. The vote is cast into an
accumulator array, whose axes denote the 2-D translation.
A peak in the accumulator array gives the position of the best translation between
the two sets of segments, and a second pass of the algorithm collects the matching
pairs that contribute to the peak. If the matching level exceeds a preset threshold,
then the model is validated. If not, then further validation and evaluation follow.
The validation starts by computing a binary function of the matched segments between image and model, along the arc length of the model, and then scaling this
function to map it on a circle of radius 1, centred at (0,0). It is followed by computing the moments of the resulting fragmented wheel to analyse the distribution of the
matching pixels. The matching metric, eccentricity, length of match and displacement
are derived and used to determine if the hypothesised model is validated.
The performance analysis of the system given in [81] is carried out for one aircraft.
The pose is assumed to be known, which limits the usefulness of this system.
Fairney Model
The aircraft recognition approach proposed by Fairney [37] starts by building a shape
description of the object. In this study, jet aircraft and missiles are used in the
experiment. A series of salient points on the aircraft boundary are connected by
straight line segments. These line segments form a series of directed edge segments
(or a chain of edge vectors). This process of shape description is repeated for the
model. The yaw and roll angles are fixed to reduce the problem to 2-D matching,
therefore the model database contains 2-D projections (in terms of edge vectors) of
the model, and yaw and roll angles.
Then the transform which brings the image edge vectors into coincidence with the
model vectors needs to be estimated. This transform comprises a scale factor s, the angle θ between the model and image edge vector pair (the pitch angle), and two translations, Δx and Δy. A pose-clustering approach is adopted, so that the transform for one vector pair contributes a vote in the 4-D parameter space. After trying all association combinations of the model edge vectors with the image edge vectors, the most prominent cluster in the parameter space is selected and the parameters associated with the cluster are regarded as the correct transform.
Given a model database, this method uses a compactness measure (area/perimeter2 )
to narrow down the search space in the model database. Having obtained a short list
of the model candidates satisfying the compactness constraint, the pose clustering is carried out using a more efficient 2-D table (shown in Table 2.2), instead of the 4-D parameter space.

Table 2.2: An example of a 2-D table for efficient pose clustering. The resolutions (bin widths) for s, θ, Δx and Δy are 0.2, 20, 5 and 5 respectively.

  s         θ        Δx       Δy       count
  0.2-0.4   0-20     5-10     15-20    20
  0.6-0.8   20-40    10-15    1-5      4
  0.0-0.2   20-40    5-10     1-5      1
  ...       ...      ...      ...      ...
The pose clustering process starts with assigning bin widths to the four parameters
in the table. This table is initially empty. As the transformation for a vector pair is
estimated and appropriately quantised to the bin resolution, these parameters enter
the first row, with a vote count of one. For the next edge vector pair, if the parameter
combination does not already exist in the table, then this combination generates a new
entry in the table. On the other hand, if such a combination exists in the table, then
its count is simply incremented. This process can be made more efficient by initially
performing the clustering with larger bin sizes and then splitting the frequently visited
bins into smaller bins later. The data in the finally selected bin gives the best estimate
of the winning transformation. Further validation is carried out by aligning all the
image edge vectors with their model counterparts using the estimated transformation.
The root mean square (rms) difference between the transformed image coordinates
and those of the model are computed. The winning model and orientation of the
smallest rms error are finally selected. This method is efficient and can handle partial
occlusion and boundary perturbation due to noise. However, this method assumes a
successful extraction of the object boundary which can be very challenging in cluttered
scenes or under poor imaging conditions.
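The incremental table of Table 2.2 behaves like a sparse accumulator, which can be sketched with a dictionary keyed on the quantised parameter combination (an illustrative sketch; the transform estimates below are hypothetical):

```python
from collections import defaultdict

# Bin widths for (s, theta, dx, dy), as in Table 2.2.
BIN_WIDTHS = (0.2, 20.0, 5.0, 5.0)

def quantise(params):
    """Map a transform estimate to its (s, theta, dx, dy) bin indices."""
    return tuple(int(p // w) for p, w in zip(params, BIN_WIDTHS))

def cluster(transform_estimates):
    """Sparse pose clustering: one table row per visited bin, with a count."""
    table = defaultdict(int)
    for params in transform_estimates:
        table[quantise(params)] += 1
    return table

# Hypothetical transform estimates from edge-vector pairs; the first three agree.
estimates = [(0.30, 10.0, 7.0, 16.0), (0.25, 12.0, 8.0, 17.0),
             (0.35, 15.0, 6.0, 19.0), (0.70, 30.0, 12.0, 3.0)]
table = cluster(estimates)
best_bin = max(table, key=table.get)
print(best_bin, table[best_bin])  # (1, 0, 1, 3) 3
```

Only visited parameter combinations occupy memory, which is the advantage of the 2-D table over a fully allocated 4-D array.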
Chien and Aggarwal [27] relate a 3-D model point (x_m, y_m, z_m) to its 2-D image point (x_i, y_i) by

x_i = R′11 x_m + R′12 y_m + R′13 z_m + s t_x        (2.3.1)
y_i = R′21 x_m + R′22 y_m + R′23 z_m + s t_y

where s, t_x, t_y and R′_ij = sR_ij are respectively the scale factor, the translations in x and y, and the rotation parameters. Since these equations have eight unknowns, three additional point pairs are required to generate eight linear equations to solve for the eight unknowns (ie., transform parameters). Such a four-point correspondence, expressed in terms of the transform parameters, gives rise to a hypothesis (model and transform).
Verification
The hypothesis in terms of the transform parameters needs to be verified using the
constraints associated with the rotational parameters,
1. R′11² + R′12² + R′13² = s²

2. R′21² + R′22² + R′23² = s²

3. R′11 R′21 + R′12 R′22 + R′13 R′23 = 0
If the computed transformation satisfies all these constraints, then the four-point correspondence is considered valid. The remaining model points are transformed onto
the image plane and the mean-square displacement error is computed. This process is
repeated over all the valid four-point correspondences. The four-point correspondence
whose mean-square-error is below the threshold is selected, and [R31 R32 R33 ]T is found
via the cross product of [R11 R12 R13 ]T and [R21 R22 R23 ]T to estimate the viewing angle.
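These checks can be carried out numerically; the sketch below (our illustration, with a hypothetical rotation and scale) builds R′ = sR for the first two rows, verifies the row-norm and orthogonality constraints, and recovers the third row by the cross product:

```python
import numpy as np

def verify_hypothesis(Rp, s, tol=1e-6):
    """Check R'11^2+R'12^2+R'13^2 = s^2, likewise for row 2, and row1 . row2 = 0."""
    r1, r2 = Rp[0], Rp[1]
    return (abs(r1 @ r1 - s**2) < tol and
            abs(r2 @ r2 - s**2) < tol and
            abs(r1 @ r2) < tol)

# A known rotation (30 degrees about z) scaled by s = 2.
theta, s = np.radians(30), 2.0
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
Rp = s * R[:2, :]                 # the two rows recovered from the 2-D match
print(verify_hypothesis(Rp, s))   # True
# Third row of R via the cross product of the first two (unscaled) rows:
r3 = np.cross(Rp[0] / s, Rp[1] / s)
print(np.allclose(r3, R[2]))      # True
```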
Validation
The verified hypothesis brings the model and image contours into an alignment. First,
a pair of matching points are selected from the model and image contours. The
distance between the boundary point and centroid is then measured for both the
model and image contours, and their distance ratio is computed. The distance ratio
is collected for the remaining point pairs on the contours, and the standard deviation
of the ratios is used as the shape matching metric. The minimum distance ratio
standard deviation (DRS) is searched, to find the winning model and pose. A detailed
description of DRS is discussed in Section 5.1.2.
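Under the description above, the DRS can be sketched as follows (a minimal reading of the measure; the exact formulation is given in Section 5.1.2, and the contours below are hypothetical):

```python
import numpy as np

def drs(model_contour, image_contour):
    """Std. dev. of centroid-distance ratios between matched contour points."""
    m = np.asarray(model_contour, float)
    i = np.asarray(image_contour, float)
    dm = np.linalg.norm(m - m.mean(axis=0), axis=1)  # model point-centroid distances
    di = np.linalg.norm(i - i.mean(axis=0), axis=1)  # image point-centroid distances
    return np.std(di / dm)

square = [(1, 0), (0, 1), (-1, 0), (0, -1)]
scaled = [(2 * x, 2 * y) for x, y in square]         # uniform scaling: DRS = 0
sheared = [(x + 0.5 * y, y) for x, y in square]      # shape change: DRS > 0
print(drs(square, scaled))   # 0.0
print(drs(square, sheared))
```

A uniform scaling leaves all ratios equal, so the DRS is zero for a correct model and pose, and grows with shape disagreement.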
Simulation results using various aircraft [27] demonstrated the tolerance of this method to occlusion and scale changes. This method is also applicable to multiple-target images. However, the images used for the experiment have a homogeneous background, and therefore it is unclear to us whether or not this approach
will tolerate other types of image degradation such as clutter. In this work, the effect
of occlusion is expressed as deformation of the closed contour. However, an aircraft
2.4
Traditional global feature methods assume that each instance of an aircraft object
is an accurate projection of the aircraft of known dimensions and shape onto the
image plane [37]. The main focus of these approaches is to find the best match
from the model database to the image data for a particular viewpoint. These single
step recognition approaches are only effective if relevant image data is available and
relatively accurate. However, the image data is usually distorted due to a lack of
reliable low-level image processing techniques.
Reliable image primitive extraction is not always guaranteed in real-world environments. Variations in aircraft appearance in the image plane due to unknown viewpoint, noise, shadow, occlusion and adverse weather effects further complicate the
image data formation. Hence, the single step recognition approaches are often not
suited to object recognition in an uncontrolled environment.
A more appropriate approach is to carry out the analysis in multiple stages, where
each stage of image analysis is governed by the system knowledge/model database, shown in Figure 2.3, which represents the domain object at various levels of hierarchy,
starting from a local part description level to a global category level.
The recognition process begins by detecting the image primitives (low level image
processing) and then these primitives are combined to form higher level features and
to make coarse-level decisions (intermediate level processing). Associations of the
Figure 2.3: Image analysis stages (image acquisition, preprocessing, segmentation, representation and description, and recognition and interpretation) applied to the problem domain and governed by the knowledge/model base.
2.4.1
COBIUS
The COBIUS [9] (A Constraint-Based Image Understanding System) has been developed by the Lockheed Missile and Space Company Image Technology Development
Program. This system focuses on applications using high resolution aerial imagery
interpretation, addressing generic domain object representation, compensation for unreliable image segmentation and knowledge control. The system consists of knowledge
bases for domain object models and control strategies, blackboard areas to contain
the instantiated hypotheses of the scene, and an image feature database to fuse results
from multiple image segmentation modules (see Figure 2.4).
COBIUS uses a hierarchical representation scheme for both domain objects and constraints. For domain objects, the hierarchy consists of event, scene, group, object, subpart, surface and curve. A similar hierarchy applies to the constraints, from coarse to fine levels. Complex constraints are decomposed into primitive constraints, and
the constraints can be manipulated by rules and other constraints. Model based
prediction and verification of primitive constraints from the complex constraints can
be used to reduce the combinatorial computation of matching techniques. In order to cope with unreliable image segmentation, COBIUS uses a multiple feature
fusion approach with a model-based feature verification capability. The region segmentation generates coarse image features for initial image interpretation, and the edge
2.4.2
ACRONYM
Brooks [18, 19] introduced a vision system called ACRONYM, which recognises 3-D objects from 2-D images. He uses an example of recognising airplanes on the
runway of an airport from an aerial photograph. He uses a generalised cylinder (or
cone) representation for the models. A relational graph structure is used to store
such representations. Nodes are the generalised cylinders and the links represent the
relative transformations between the cylinder pairs. The system also uses two other
graph structures, constructed from the object models, to assist the matching process.
use) such as central heating water pump or gas pump. Additional restrictions
are allowed to be added to the graph during the recognition process.
Prediction Graph: Links in the graph represent relationships between features in the image. These links are labelled must-be, should-be or exclusive, according to how likely it is that a given pair of features will occur together in a single object.
For any 3-D object represented as a generalised cone, one can define a corresponding
2-D shape representing its image under perspective projection from an arbitrary viewpoint. Two descriptions are used for the 2-D image features.
Figure 2.5 depicts the generalised cylinder (cone) representation of an aircraft model,
and the projected images in terms of ribbons and ellipses.
ACRONYM uses its geometric models, supplemented by a restriction graph and constraints upon variations in element sizing, structuring, positioning and orientation,
to predict possible ribbon images from various viewpoints. The matching process is
performed in two stages:
Figure 2.5: Generalised Cylinder representation of an aircraft and the projected images in terms of ribbons and ellipses.
1. The image is first searched for straight or curved lines and then, by linking lines
that are proximal within certain tolerances, local matches to ribbons predicted
from the model are searched for. Such instances of ribbon matches are grouped.
2. The groups of matched ribbons are checked for global consistency, in that each match must satisfy both the constraints of the prediction graph and the accumulated constraints of the restriction graph.
2.4.3
TRIPLE System
Ming and Bhanu [86] developed a target recognition system called TRIPLE (Target
Recognition Incorporating Positive Learning Expertise) that incorporates two powerful learning techniques, known as Explanation-Based Learning (EBL) and Structured
Conceptual Clustering (SCC).
Figure 2.6 illustrates the configuration of the components in the TRIPLE target recognition system. The processing elements, shown as blue rectangular blocks, process
the input image data and features, and generate the target recognition results.
The segmentation and symbolic feature extraction block segments and locates the
regions of interest (ROIs) in the image, and then extracts the symbolic features from
the ROIs. The knowledge-based matching block traverses the classification tree using
the extracted symbolic features to reach a leaf node of the tree. If successful, then
Figure 2.6: Multi-strategy machine learning approach for aircraft target recognition.
the target has been correctly identified. The matching block also initiates the proper
learning cycle based on the target recognition results. The explanation-based learning
(EBL) block, when invoked by the matching block, selects the relevant target features
based on the symbolic feature information during the target model acquisition cycle
(bounded by the red box in Figure 2.6). The EBL block also identifies new relevant
features for updating the classification tree. The structured conceptual clustering
(SCC) block is responsible for maintaining the classification tree, using the relevant
symbolic features selected by the EBL block. The feature value monitor block adjusts
the feature values in the classification tree, according to the changes in the previously
selected features for target recognition, during the target feature values refinement
cycle (bounded by the green box in Figure 2.6).
The background knowledge is accessed by the EBL block to assist in discriminating relevant target features from the background. The target model database stores the
2.4.4
Das and Bhanu [32, 33] proposed a system for recognising aircraft in complex, perspective aerial images, using qualitative features. The system is designed to deal with
the issues of real-world scenarios, such as shadow, clutter, and low contrast. It uses a
hierarchical representation (consisting of qualitative-to-quantitative descriptions) of
aircraft models. Such descriptions vary from symbolic features (eg., aircraft wing)
to primitive geometric entities (eg., lines, points), and allow an increasingly focused
search of the precise models in the database to match the image features.
The system has four distinctive features:
Figure 2.7: Framework of the qualitative object recognition system [33, 32].
A qualitative-to-quantitative hierarchical object model database, and three recognition sub-processes which utilise these models.
Saliency-based regulation of low-level features to be used, in an incremental
fashion, in the subsequent steps of recognition.
Model-based symbolic feature extraction and evaluation that uses regulated lowlevel features and heterogeneous models of image segmentation, shadow casting,
and image acquisition.
Refocused matching for finer object classification.
The framework of the recognition system is shown in Figure 2.7. Initially, a lower-resolution version of the input image is processed to locate the regions of interest
(ROIs) [91] by identifying feature clusters. As a first step, edge pixels in the ROI
are detected by applying multiple thresholds, acknowledging the fact that different
images or different parts of an image are subject to different optimum thresholds.
Figure 2.8: Convexity test on a line pair. For any two lines, Li and Lj , we determine
two extra lines (green dashed) by joining the end points of Li and Lj . If these lines
are contained in the segmented region (shaded) then the convexity test is passed.
Initially, the most salient lines are used in the grouping and, if no aircraft recognition is achieved, the next most salient lines are included. This progressive relaxation continues until a successful recognition is declared or the least salient line features are invoked.
Edge segment following is conducted to create long chains of edge segments. As
various parts (eg., wing, nose, fuselage, etc) of the generic aircraft model are described
in terms of linear segments, a straight line extraction technique similar to that of
Lowe [79] was used. Furthermore, corners are detected by obtaining gradients and
curvature measurements. In addition to line extraction, region segmentation (based
on the joint relaxation of two-class region-based and edge-based approaches [7]) is
also carried out. The potential dominant axes of the aircraft region are generated by
connecting the extremities of the segmented foreground region.
This system uses ancillary data, which include the weather conditions. If the weather is cloudy, then the shadow-detection algorithm is skipped. If not, then potential shadow lines are extracted. If the two regions divided by a line exhibit bi-modality in the intensity histogram, then the line is marked as a potential shadow line.
Figure 2.9: 3 or 4-line grouping process to generate symbolic aircraft features. The
shaded circles represent the proximal region of independently detected corners. Any
group of three lines (on the left) must satisfy the following conditions: (i) the two
lines, Li and Lj , are non-parallel, (ii) the third line, Lk , is in between Li and Lj , (iii)
the line intersections occur near independently detected corners, and (iv) the third
line, Lk , is shorter than at least one of Li and Lj . In addition, a group of four lines
(on the right) must satisfy the following conditions: (i) the two lines, Li and Lj , are
non-parallel, and the other two, Lh and Lk , are parallel, (ii) the parallels form the
opposite sides of the trapezoid, (iii) the line intersections occur near the detected
corners, and (iv) the parallel lines, Lh and Lk , are shorter than the non-parallels, Li
and Lj .
In order to extract the meaningful edges, the algorithm executes a two-pass convex-group extraction process. During the first pass, the entire set of lines is decomposed
into subsets, based on proximity and collinearity such that lines in a subset satisfy
the convexity criterion. This convexity requirement is illustrated in Figure 2.8. If
the lines (in green) created by joining the endpoints of line pair, Li and Lj , are all
contained in the segmented region (shaded), then a convex group is created. This step
also results in line pairs that fail the convexity test (an example of which is shown
in red), which are subsequently put in a pool. The second pass considers if isolated
lines from the pool can be put in a convex group with relaxed proximity condition.
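The convexity test of Figure 2.8 can be sketched as follows. The simple rasterisation helper and the assumption that corresponding endpoints of the two lines are already paired are our own illustrative simplifications, not code from [32, 33].

```python
import numpy as np

def segment_points(p0, p1):
    """Integer points on the segment from p0 to p1 (inclusive);
    a simple stand-in for proper Bresenham rasterisation."""
    (r0, c0), (r1, c1) = p0, p1
    n = max(abs(r1 - r0), abs(c1 - c0)) + 1
    rows = np.round(np.linspace(r0, r1, n)).astype(int)
    cols = np.round(np.linspace(c0, c1, n)).astype(int)
    return zip(rows, cols)

def convexity_test(line_i, line_j, region):
    """Accept the pair (Li, Lj) if both joining segments, obtained by
    connecting corresponding endpoints of Li and Lj, lie entirely inside
    the segmented foreground region (a boolean mask)."""
    (a0, a1), (b0, b1) = line_i, line_j
    for p, q in ((a0, b0), (a1, b1)):   # the two green dashed lines
        if not all(region[r, c] for r, c in segment_points(p, q)):
            return False
    return True
```

For instance, two lines lying on opposite sides of a solid segmented region pass the test, while a line pair whose joining segment leaves the region is rejected and pooled for the second pass.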
Provided that it is not overcast or dark, a shadow line to shadow-making line matching
is conducted using ancillary information about the camera-platform position/orientation
and the sun position, together with the imaging parameters. Convex groups of shadow-making lines are used to extract the symbolic features of the generic aircraft class. Such features include trapezoid-like shapes for the wings, tails and rudder, and a wedge-like shape for the nose part. To extract these symbolic features, sets of conditions derived from the aircraft part/subpart representation are used in a matching process with three- and four-line groupings. Figure 2.9 illustrates typical arrangements of 3- or 4-line groupings for the symbolic features of the generic aircraft class.
Once the symbolic features have been derived, they are matched to the generic aircraft
model through an evidence accumulation process; the parts' mutual connectedness is verified against the rules associated with the generic aircraft description. The recognition confidence is based on the quality of the evidence. If the confidence level
is low, then the low level image processing is revisited to include less salient features.
Upon the recognition of a generic aircraft, further refinement of the detected aircraft
shape is initiated to account for the missing elements of the symbolic features. The
labelled parts are used to direct the search for more localised (symbolic/primitive)
features that are available at lower levels of the database hierarchy. Such retrieval of
the less salient features allows more precise classification in the refocused matching
process. The final output is the aircraft class recognition and its symbolic subparts.
The main contributions of this system are: (a) the salient feature extraction and its use in a regulated fashion, (b) the use of heterogeneous geometric and physical models associated with image formation for feature extraction and subsequent recognition, and (c) the integration of high-level recognition processes with low-level feature extraction processes. The combination of these essentials makes the system robust against the edge fragmentation commonly encountered in practice, as demonstrated in [32] using real aircraft images of varying contrast, clutter and shadow.
The drawback of this system is its over-reliance on region segmentation. If an aircraft image contains camouflage, self-cast shadow or occlusion, resulting in multiple disjoint subregions, then the dominant axes estimation and convex-group extraction may suffer. It is not clear how this system can cope with heavily cluttered images, particularly if the background is not plain (eg., buildings or other objects in the background). Furthermore, this system is not capable of handling closely spaced multiple
aircraft in the ROI.
Chapter 3
Feature Extraction and Generation
of Aircraft Hypothesis
This chapter deals with the extraction process of line features, grouping of lines that
potentially describe or delimit an aircraft part in an image, and generation of aircraft
hypotheses. A part of the generic aircraft such as a wing, tail, nose or fuselage, is either
trapezoid-like, wedge-like or elongated in shape. In this system, the most prominent
part of an aircraft is the wings, as wing edges are usually straight and readily visible
from most viewing angles. The wing structure carries distinctive geometric attributes,
which provide strong clues of aircraft presence in the image. Furthermore, the wings
enable the gross classification of aircraft in terms of wing shape. In our system, the
wings are represented in pairs forming a triangular, diamond or boomerang shape.
Both wings are usually delimited by four linear sides associated with the leading
and trailing edges. As depicted in Figure 1.9, lines, two-line groupings and four-line groupings are extracted in this order to provide a hierarchical structure that
facilitates subsequent image analysis. Such progressive grouping of features enables
the propagation of geometric/intensity constraints to prune out a large number of
unwanted features at each stage, hence preventing the combinatorial explosion that
3.1
In many works, including ours, a straight line forms the fundamental or basic feature upon which more complex features are built. Due to its importance, a number of existing techniques that extract lines in images of man-made structures (such as buildings and roads) are first introduced. The most classic technique is the Hough transform [3, 61, 64, 75], where every edge pixel is indexed into a quantised parameter space based on its location and direction in the image. Point clusters in the parameter space correspond to straight lines. The disadvantage of this global processing method is that it fits straight lines to collinear points regardless of their spatial contiguity. The alternatives to the Hough transform are techniques that use
more pairing is possible. The suppression of noisy lines is also implemented, based on line length, average contrast, and whether or not the line is isolated (ie., no other lines within a 7×7 neighbourhood). This method is effective in detecting linear contours
in aerial images.
So far, some of the straight line extraction methods have been discussed, including the edge detection methods that rely on gradients, such as the Canny edge detector [21]. Marr and Hildreth [82] proposed another scheme that uses the Laplacian of the Gaussian (LoG) and its zero crossings to detect edges. Bennamoun [4] discusses the trade-offs between the gradient- and Laplacian-based methods, and presents a hybrid of the two. Perona and Malik [101] proposed anisotropic diffusion in place of Gaussian smoothing, which encourages intra-region smoothing in preference to inter-region smoothing.
Since we are only interested in extracting straight edges, we decided to follow the methods applied to the detection of objects with straight edges, such as buildings.
Therefore, our approach bears some resemblance to the works of Nevatia and Babu
[94] and Venkateswar and Chellappa [121], to the extent that these methods make
use of multiple directional masks and generate a pixel-orientation image (or phase
map), which is used to assist the edge-linking and merging processes. This has been
demonstrated in a number of real aerial images of buildings. We use more directional
masks, which enables us to increase the mask size in order to improve the detection
sensitivity to long but weak edges.
3.2
The input to the system is an eight-bit grey-scale image of size M × N, where M and N vary between 240 and 600 pixels. The input image is convolved
with eight directional masks in steps of 22.5°. These are shown in Figure 3.1, and are padded with zeros in such a way that the effective shape of the mask approximates an elongated rectangle of dimension W × L, where W and L are shown in red in Figure 3.1. We set W = 7 pixels and L = 9 pixels.
The increased mask size, in comparison to Nevatia's [94] 5×5, was found to be more
appropriate for our aircraft application, allowing finer directional quantisation and
improved detection sensitivity to long but weak edges. However, making the mask
larger decreases the edge detection sensitivity around corner areas, and may result in
significant displacement of edge pixels. By using the increased number of directional
masks (8 masks as opposed to 4, as in Venkateswar [121], or 6, as in Nevatia [94]), the
sensitivity to weak edges is increased. It also provides more precise phase information,
which plays an important role in the contour extraction and linking processes.
The convolution of the original image with the eight directional masks generates eight gradient images {G1, G2, ..., G8}, associated respectively with the directions {-67.5°, -45°, -22.5°, 0°, 22.5°, 45°, 67.5°, 90°}. For each pixel (m, n) in the image, the largest gradient magnitude in {G1, G2, ..., G8} is noted and the corresponding direction is assigned to the direction (or phase) image. In mathematical terms, the gradient image is computed using

G(m, n) = max{Gi(m, n)}, 1 ≤ i ≤ 8    (3.2.1)
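Equation (3.2.1) and the accompanying phase assignment can be sketched with NumPy as follows. The eight gradient images are random stand-ins for the actual mask responses of Figure 3.1, and the ordering of the angles is an assumption.

```python
import numpy as np

# Illustrative stand-ins for the eight directional gradient images
# G1..G8; in the thesis they come from convolving the input image with
# the eight elongated masks of Figure 3.1.
rng = np.random.default_rng(0)
M, N = 240, 320
G = rng.random((8, M, N))          # G[i-1] corresponds to G_i
angles = np.array([-67.5, -45.0, -22.5, 0.0, 22.5, 45.0, 67.5, 90.0])

# Equation (3.2.1): per-pixel maximum over the eight responses, with the
# winning mask's direction recorded in the phase image.
grad = G.max(axis=0)               # gradient image G(m, n)
phase = angles[G.argmax(axis=0)]   # direction (phase) image
```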
Figure 3.1: Eight directional edge masks in angular steps of 22.5 degrees. These edge
masks have an elongated rectangular shape to detect long weak edges.
the largest of all the adjacent pixels (ie., g > gi , for all i = 1, . . . , W , where W = 7
as shown in Figure 3.1), then the current pixel is accepted as an edge pixel, and the
adjacent pixels are removed.
The thinned gradient image undergoes thresholding that uses two thresholds to reduce
edge fragmentation. An edge contour can be broken into fragments by the gradients
that fluctuate above and below the threshold along the edge. If a single threshold
is applied to the gradient image, and the edge has an average strength equal to the
threshold, then because of the noise the edge may occasionally dip below the threshold
and appear dashed. To avoid this, we make use of two thresholds, one high and one low.
Any pixel in the image that has a gradient above the high threshold is tagged as an
edge pixel. Then pixels which are connected to this edge pixel and have a gradient
above the low threshold, are also selected as edge pixels.
We set the two thresholds to 16.5% and 10% of the image peak gradient. These
threshold values were selected after experimenting with a number of aircraft images
which present blurring, low contrast and clutter. The illustration of this process can
be found in [32].
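A minimal sketch of this two-threshold selection, implemented here as a flood fill from the strong seed pixels; the queue-based formulation is one possible implementation, not necessarily the one used in the thesis.

```python
import numpy as np
from collections import deque

def hysteresis(grad, high_frac=0.165, low_frac=0.10):
    """Two-threshold edge selection: pixels above the high threshold seed
    edges, and 8-connected pixels above the low threshold are then added.
    The defaults are the thesis values: 16.5% and 10% of the image peak
    gradient."""
    peak = grad.max()
    high, low = high_frac * peak, low_frac * peak
    edges = grad >= high                       # strong seed pixels
    queue = deque(zip(*np.nonzero(edges)))
    M, N = grad.shape
    while queue:                               # grow along weak neighbours
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < M and 0 <= cc < N
                        and not edges[rr, cc] and grad[rr, cc] >= low):
                    edges[rr, cc] = True
                    queue.append((rr, cc))
    return edges
```

An edge whose gradient dips between the two thresholds stays connected to its strong seeds, while an isolated weak pixel is discarded.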
Additional processing is carried out to discard isolated 1-3 pixel clusters. Furthermore, it is often observed that an endpoint pixel may show a phase value inconsistent
with those connected to it. This is mainly due to the fact that only about half the
mask overlaps the edge, leading to erroneous phase computation. The phase values of such endpoint pixels are corrected and made consistent with the phase values of
adjacent pixels.
3.3
In many real world complex scenes, a large proportion of edge pixels appear in the
background. These background pixels increase the computational load by generating
an excessive number of line features. If the background clutter is dense and evenly
distributed in orientation, then it is possible to filter out many of these pixels at an
early stage.
Initially, we considered using texture-analysis methods for discriminating such clutter.
There exist various methods for extracting textural information from images that can
be largely divided into four categories [120]: statistical [96], geometrical [104, 117,
119], model-based [35, 100], and signal processing [28]. According to Tuceryan [120], the outcomes of these texture-based methods seem applicable only to their reported experimental setups. Furthermore, the variability of the clutter objects is usually too large to be covered by a set of tractable models. The clutter types that we usually encounter in aircraft recognition include forests, urban areas, clouds, snow, rocks and mountains. Grenander and Srivastava [46] proposed a way to model natural clutter in terms of gradient distribution functions, classifying the clutter into one of three classes (ie., structured, intermediate, and dense). However, we are not interested in determining the clutter type; instead, we are interested in removing dense clutter regions while preserving the aircraft outer boundary.
For this reason, we propose a simpler but effective approach, where the local density and orientation of edge pixels are used to distinguish clutter from aircraft. We implement this by applying a sliding window to the phase image, and examining the pixel patterns in it. We use a rectangular window (see Figure 3.2), whose dimensions are proportional to the input image dimensions. After numerous computer simulations, the optimal window size was chosen to be 1/20th of the image size. The entire region
of the image is initially considered as clutter. The window slides over the phase image in steps of 1/4 of the window size, allowing a 75% overlap. The window is divided into four quadrants. The edge pixels within each quadrant are collected, and their density and phase distribution are computed, based on the following measures.
1. The total pixel density, T, within the window must exceed the threshold (> 7%). T is defined as the ratio of the number of non-zero pixels within the window to the total number of pixels in the window area.
2. The ratio of the maximum to the minimum pixel density over the four quadrants (ie., Q(1), Q(2), Q(3), Q(4)) must not be high,

max(Q(1), ..., Q(4)) / min(Q(1), ..., Q(4)) < thr

where Q(i) is defined as the pixel density of the ith quadrant, and thr is a ratio threshold.
3. The pixel phase values must not be polarised in one direction.
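The three conditions can be sketched as a single window test. The ratio and polarisation thresholds used here are illustrative assumptions, since the text only quantifies the density threshold (7%).

```python
import numpy as np

def is_clutter(phase_win, density_thr=0.07, ratio_thr=3.0, polar_thr=0.5):
    """Decide whether a window of the phase image is dense clutter.
    phase_win holds a direction index 1..8 at edge pixels and 0 elsewhere.
    ratio_thr and polar_thr are illustrative values, not thesis values."""
    nz = phase_win > 0
    # 1. overall edge-pixel density must be high enough
    if nz.sum() / phase_win.size <= density_thr:
        return False
    # 2. the four quadrants must be comparably dense
    h, w = phase_win.shape
    quads = [nz[:h//2, :w//2], nz[:h//2, w//2:],
             nz[h//2:, :w//2], nz[h//2:, w//2:]]
    dens = [q.mean() for q in quads]
    if min(dens) == 0 or max(dens) / min(dens) >= ratio_thr:
        return False
    # 3. the phases must not be polarised in one direction
    hist = np.bincount(phase_win[nz], minlength=9)[1:]
    if hist.max() / hist.sum() > polar_thr:
        return False
    return True
```

For instance, a window of uniformly random directions is tagged as clutter, while a sparse window, or one polarised in a single direction, is not.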
Once these conditions are all satisfied, only then is the region under the window allowed to remain as clutter; otherwise it is considered to be a non-clutter region. When the window is centred on the aircraft-clutter boundary, one or two quadrants of the window will exhibit low pixel densities, failing the second condition. Hence the aircraft boundary is typically classed as non-clutter. This is clearly shown in Figure 3.3, where the immediate proximity of the aircraft boundary is labelled non-clutter. Figure 3.3 shows aircraft images that contain dense clutter in the background; the regions detected as clutter are shown shaded. The clutter rejection in Figure 3.3(a)-(f) amounts to 40%, 66%, 53%, 50%, 55% and 46%
of the whole edge image. Figures 3.4 and 3.5 show the edge images before (middle
Figure 3.2: Sliding search window for detecting dense clutter. The pixels in all four quadrants need to be dense and randomly oriented if the region under the window is to be tagged as clutter.
column), and after (right column) the clutter removal. Notice further that although
portions of the aircraft boundaries are within the shaded areas of Figure 3.3, they
are successfully recovered (and not rejected) in Figures 3.4 and 3.5. This processing
feature is explained later in this section.
To extract straight lines, two approaches were initially considered. One approach is to generate the contours first and then to perform straight-line fitting to the contours [79]. The other approach is to generate shorter line segments, so-called line primitives [121], directly from the edge image (skipping the contour stage), and to progressively build longer lines, either by locally linking the line primitives [29, 71, 121] or by finding and grouping globally optimal line segments [63, 94]. The latter approach is better suited to images that contain long straight lines, as in images of buildings and roads. If the object contains curved lines, then the first approach is preferred. Our early experiments with the two approaches on real aircraft images clearly favoured the first approach.
The phase image is raster scanned (left to right and top to bottom) for contour
labelling. When a current pixel is visited, its phase value suggests where to look for
(a)
(b)
(c)
(d)
(e)
(f)
Figure 3.3: Detection of randomly oriented dense clutter regions. The clutter regions
shaded. The clutter-aircraft borders are correctly included in the non-clutter region
so that the wing edges can be extracted.
Figure 3.4: Results of dense clutter removal process. The first column original
images, the second column edge images prior to the clutter removal algorithm,
and the third column edge images after clutter removal.
Figure 3.5: Results of dense clutter removal process (continued). The first column
original images, the second column edge images prior to the clutter removal
algorithm, and the third column edge images after clutter removal.
Figure 3.6: Contour labelling process. The current pixel searches for a contour pixel to inherit the label from. The direction of search is defined by the orientation of the current pixel.
a labelled contour pixel with similar orientation. If there exists a labelled contour pixel within the search window, and the phase difference between the two pixels is less than 23°, then the current pixel inherits the contour label from it. The phase similarity check ensures the extraction of smooth contours that do not contain high curvature points. Any high curvature point marks the end of the contour. This labelling process is illustrated in Figure 3.6. Due to imaging degradation, edge fragmentation is often encountered; therefore a gap of up to 5 pixels is tolerated during the contour extraction process. Each time a pixel is assigned a label, the pixel phase distribution (in terms of a histogram) for the contour label is updated. The phase histogram is used to assess how straight the contour is. This information assists the contour linking process, which is carried out subsequently. We also record whether the currently visited pixel falls within the clutter or non-clutter regions. This becomes useful when a decision is needed to accept or reject a contour that straddles both clutter and non-clutter regions (refer to Figure 3.7).
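A simplified sketch of the label-inheritance step is given below. The real system searches along the direction suggested by the pixel's phase; the square look-back window and unwrapped angle differences used here are simplifying assumptions.

```python
import numpy as np

def label_contours(phase, phase_tol=23.0, gap=5):
    """Greedy raster-scan contour labelling (simplified sketch).
    phase holds an edge direction in degrees at edge pixels and NaN
    elsewhere. A pixel inherits the label of a nearby already-labelled
    pixel with a similar direction; otherwise it starts a new contour."""
    M, N = phase.shape
    labels = np.zeros((M, N), dtype=int)
    next_label = 1
    for r in range(M):
        for c in range(N):
            if np.isnan(phase[r, c]):
                continue
            found = 0
            # look back over already-visited pixels within the gap radius
            for rr in range(max(0, r - gap), r + 1):
                for cc in range(max(0, c - gap), min(N, c + gap + 1)):
                    if labels[rr, cc] and abs(phase[rr, cc] - phase[r, c]) < phase_tol:
                        found = labels[rr, cc]
                        break
                if found:
                    break
            labels[r, c] = found if found else next_label
            if not found:
                next_label += 1
    return labels
```

With this scheme, two collinear fragments separated by a small gap receive the same label, while a nearby contour of markedly different orientation starts a new one.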
Once contour extraction is complete, any short contour fragments that are separated
Figure 3.7: If a contour has at least 30% of its pixels in the non-clutter region, the contour is accepted.
by slightly more than 5 pixels, but collinear according to their phase histograms, are linked. Contours in the clutter regions are removed. However, if a substantial portion of a contour lies in the non-clutter region (ie., at least 30% of its pixels), the contour is accepted (as illustrated in Figure 3.7). The outcome of this process is illustrated in Figures 3.3(c)-(d), 3.4(i) and 3.5(c). Sections of the nose and cockpit boundaries fall into the clutter region, but are successfully recovered in Figure 3.4(i) and Figure 3.5(c).
3.4
3.4.1
Our straight-line fitting approach is similar to that of Lowe [79]. The contour pixel farthest from the line joining the contour endpoints is selected as a potential break point. If its orthogonal distance exceeds a threshold, the contour is split into two sub-contours, and the process is repeated until no further contour splitting is possible. Figure 3.8 illustrates the straight line extraction process, which eventually
Figure 3.8: Straight line extraction process, similar to that of Lowe [79]. This algorithm generates a line approximation which is visually plausible.
LINE
Line No: #
Endpoint1: (#,#)
Endpoint2:(#,#)
Length: #
Orientation: #
Significant: Yes/No
Collinear: pointer to its associated collinear line segments
Gap: #
Connected to: pointer to co-terminating proximal line segments
Figure 3.9: Line representation. Note that the symbol # represents a number.
leads to the piecewise linearisation of the original contour. The resulting linear segments are stored in a database along with a number of line attributes, as shown in
Figure 3.9.
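The recursive endpoint-split fitting described above can be sketched as follows; the distance threshold value is illustrative.

```python
import numpy as np

def split_into_lines(points, dist_thr=2.0):
    """Recursive endpoint-split line fitting in the style of Lowe [79]:
    find the contour point farthest from the chord joining the endpoints;
    if its orthogonal distance exceeds dist_thr, split there and recurse."""
    pts = np.asarray(points, dtype=float)
    p0, p1 = pts[0], pts[-1]
    chord = p1 - p0
    norm = np.hypot(*chord)
    if norm == 0 or len(pts) < 3:
        return [(tuple(p0), tuple(p1))]
    # orthogonal distance of every contour point to the chord
    d = np.abs(chord[0] * (pts[:, 1] - p0[1])
               - chord[1] * (pts[:, 0] - p0[0])) / norm
    k = int(np.argmax(d))
    if d[k] <= dist_thr:
        return [(tuple(p0), tuple(p1))]   # straight enough: one segment
    return (split_into_lines(pts[:k + 1], dist_thr)
            + split_into_lines(pts[k:], dist_thr))
```

Applied to an L-shaped contour, the procedure splits once at the corner and returns the two straight sides.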
3.4.2
In practice, the aircraft silhouette outline is often fragmented due to various forms
of imaging degradation. The fragmentation can also arise from self occlusion due
to rudder, engine or missiles, and edge discontinuities due to wing flaps. Such fragmentation reduces the saliency of the desired line features, and presents a challenge
to the feature extraction processes. Numerous edge extension or linking methods
have been proposed to overcome this fragmentation problem. These methods are
broadly divided into two categories: a global process known as the Hough transform
[61, 64, 75] and local segment grouping approach [10, 90]. Hough transform methods
are not favoured in this work mainly because they implement global line search in
the image and therefore require significant postprocessing to associate the computed
line parameters with the line segments in the image. The second approach overcomes many weaknesses of the Hough transform, but calls for an iterative process to link all fragmented collinear segments, and may still not handle severely fragmented
edges. Hybrids of these two approaches have been proposed [63], which appear to be
promising for linking severely fragmented collinear segments.
The proposed line extension method is basically a local approach, but is tailored for
our aircraft recognition application in that the line extension is desired only for the
wing edges. We prefer to use the terminology of line extension as opposed to line
linking or joining, mainly for two reasons. The first is that the gap between two
collinear lines can be large. The second reason is that we hypothetically join collinear
lines even if the resulting longer lines do not necessarily correspond to actual aircraft
edges. However, by extending lines we increase the probability that fragmented wing edges become longer and improve in saliency. Extended lines that do not correspond
to aircraft wings may temporarily gain importance and be part of two, four or more
complex line groupings. These groupings, however, are most often discarded at the
higher level of generic aircraft recognition, where aircraft geometric and intensitybased constraints are applied.
After processing numerous aircraft images, we observed that no more than 4 edge
fragments are obtained from the wing edges. In our system, line extension is not
iterative and is confined to pairs of line segments. Only if the gap between the two
collinear segments is wide is a third line searched for in the gap, allowing three-line
extension. We also make use of intensity information from both sides of
the line segments to supplement the geometric conditions.
We define extended lines as lines generated by joining two or three collinear line
segments. The generation of extended lines is required in practice to build longer
lines out of numerous short segments. These longer lines need to be detected as they
are likely to belong to the structure of man-made objects, such as aircraft in our
application. The process of generating extended lines is based primarily on a number
of geometric attributes. The most important requirement for line extension is that
both line fragments, Li and Lj in Figure 3.10, must present similar orientations,
which, in mathematical terms, translates into ∠(Li, Lj) < εθ, with εθ > 0 being an
upper angle deviation threshold. Note that the threshold becomes tighter if the
two lines are longer. Additional line joining constraints are summarised next with
ℓi, ℓj, gij and ℓij being respectively the lengths of the two line segments Li and Lj, the
gap between the two segments, and the distance between the farthest endpoints of
the two segments.
1. ℓij < 0.5 min(M, N), where M and N are the image height and width, respectively.
2. β(ℓi + ℓj + gij) < ℓij, where β is a number slightly less than 1.
3. gij < κ1(ℓi + ℓj), where 0 < κ1 ≤ 1.25,
or an additional collinear line segment Lk is found in the gap.
4. max(ℓi, ℓj) < k min(ℓi, ℓj), where k ≥ 1,
or gij < κ2(ℓi + ℓj), where 0 < κ2 ≪ κ1.
Figure 3.10: Generation of an extended line - gap width, angular deviations and
length differences form the basis to extend the lines. Note that these two lines Li and
Lj are not removed from the line database. They are used later in the line-grouping
and evidence collection processes.
If L stands for the set of all line fragments in the image, then for every pair of lines
(Li, Lj) ∈ L², the conditions above are checked to see if their geometric relationship
is suitable for extension. The final decision is deferred until the intensity pattern in the
vicinity of the lines is examined. Such an intensity-augmented decision makes the
line linking process more robust.
An explanation of all 4 conditions is now given. The first condition is based on
the observation that excessively long lines are usually generated from the fuselage of
commercial aircraft, roads, rivers, coastal lines, runways, etc. Extending these lines does not
bring any benefit to the system, hence they are left unextended. The second condition
ensures that both Li and Lj are aligned so that the length sum of the segments and
gap is slightly larger than or equal to (in case of perfect alignment) the distance
between the two farthest endpoints. The third condition requires that the gap is not
too large relative to the line length sum (i.e., ℓi + ℓj). A smaller gap-to-length-sum
ratio provides a strong indication that the two lines Li and Lj should be joined. If
the gap is too large, then a search for a third line within the gap is initiated. If a
collinear line is found, then all three lines are joined. The fourth condition requires
that no one line should be much longer than the other. The much shorter line could
be clutter and the confidence of joining the two lines is relatively low, and therefore
they are not linked. The only exception to this is if the gap is extremely narrow with
respect to the line length sum.
The geometric constraints are followed by an intensity profile check alongside the
line segments. As shown in Figure 3.11, the extended line frequently separates the
region into aircraft body (shaded) and background regions. Therefore, we expect the
intensity averages from Li side (blue windows) and Lj side (green windows) to match
(i.e., their mean difference must be less than a threshold of 25 units) in at least one
side of the extended line. In order to deal with accidental failures due to noise pixels,
we repeat this procedure along 4 strips of windows as shown in Figure 3.11. If any
one of the strips returns a good intensity match and the geometric conditions have
been satisfied, then the two lines Li and Lj are extended.
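The strip-based intensity test can be sketched as below for the simplified case of a horizontal line pair. The 25-unit threshold and the one-match-suffices rule follow the description above; the array layout, offsets and one-sided sampling are assumptions for brevity (the real system samples windows on both sides of arbitrarily oriented lines):

```python
import numpy as np

def intensity_match(img, y, xi_range, xj_range, offsets=(2, 4, 6, 8),
                    threshold=25):
    """Simplified intensity check for a horizontal pair Li, Lj lying on
    image row y and spanning columns xi_range and xj_range respectively.
    For each of the 4 strips (one per offset below the line), the mean
    intensity beside Li is compared with the mean beside Lj; one
    matching strip is enough."""
    for off in offsets:
        row = y + off
        if row >= img.shape[0]:
            continue
        mean_i = img[row, xi_range[0]:xi_range[1]].mean()
        mean_j = img[row, xj_range[0]:xj_range[1]].mean()
        if abs(mean_i - mean_j) < threshold:
            return True
    return False
```

A uniform background passes, while a strong intensity step between the two sampled regions fails every strip.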
However, if the aircraft is camouflaged and the background is also cluttered, then the
intensity test may fail. In this case, the alignment of Li and Lj needs to be almost
perfect for the intensity check to be ignored and the lines to be joined. On the other
hand, if some of the geometric conditions fail just marginally, then the intensity check
is revisited with a tighter threshold. If the intensity test still returns a good match,
then the lines Li and Lj are allowed to join.
The extended line is then stored in the existing line database, which is now denoted
by LE = {Li, 1 ≤ i ≤ NE}, where the parameter NE is the total number of lines.
The extended line in the line database will have the collinear slot activated (Figure
3.9). This slot will contain the labels of the two collinear segments Li and Lj . This
will help track which lines are used to extend the given line, when the need arises in
the evidence accumulation stage. Figure 3.12(b) illustrates the outcome of the line
extension process applied to the image of Figure 3.4 (see the red dotted lines).
Figure 3.11: Intensity means collected in the vicinity of the line pair. The intensity
information is used to supplement the line extension decision.
Figure 3.12: (a) Line features prior to the line extension process (b) Line extension
and prioritisation outcome - extended lines (red dotted line), significant lines (blue),
and non-significant lines (green).
3.4.3 Line Significance
Line significance is the result of a selection process which favours longer lines over
shorter ones. Longer lines are more significant because they often describe the linear
structure of aircraft, particularly the wings and fuselage. The first step in determining
line significance is to sort all line segments by length and tag the Ns1 longest
lines as significant. The next step is to sort all extended lines based on the
combination of gap width and collinearity, and tag the Ns2 best lines as significant.
We set Ns1 and Ns2 to 95 and 40, respectively. It should be noted that if the image
contains polarised clutter lines (as will be discussed in Section 3.4.5), such lines are
excluded from this line significance ranking process.
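A minimal sketch of this two-stage tagging is given below. The additive gap-plus-angular-deviation score used to rank extended lines is an assumed stand-in for the thesis' combination of gap width and collinearity:

```python
def tag_significant(lines, extended, polarised, Ns1=95, Ns2=40):
    """Tag lines as significant.

    lines: {label: length}; extended: {label: (gap, angular_deviation)}
    for the extended lines; polarised: labels excluded from the ranking
    (Section 3.4.5).
    """
    candidates = [l for l in lines if l not in polarised]
    # Stage 1: the Ns1 longest lines
    by_length = sorted(candidates, key=lambda l: lines[l], reverse=True)
    significant = set(by_length[:Ns1])
    # Stage 2: the Ns2 best extended lines (small gap, good collinearity)
    ext = sorted((l for l in extended if l not in polarised),
                 key=lambda l: extended[l][0] + extended[l][1])
    significant.update(ext[:Ns2])
    return significant
```

With small illustrative values of Ns1 and Ns2, the longest ordinary lines and the best-aligned extended lines are tagged together.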
This line selection process contributes to a considerable reduction in the number of
multiple line groupings that can be formed in later stages of image analysis. Referring
back to Figure 3.12(b), lines tagged as significant are shown in red and blue. The
red dotted lines correspond to the extended lines. The green lines represent non-significant lines. Figure 3.9 illustrates the attributes of a line and shows a slot reserved
for line significance. The last line slot in Figure 3.9 points to the closest lines in the
immediate vicinity of endpoints.
3.4.4 Line Description
While implementing this aircraft recognition system, it was found beneficial
to establish a mechanism by which lines in the image are differentiated. The reason
is that long lines may appear in both aircraft structure
and background clutter, whereas short lines appear predominantly in the
background of cluttered images.
Table 3.1: Line descriptions and their mathematical definitions (based on the length-ordered line list).

very long:  top 10%
long:       top 20%
short:      bottom 20% and length < 0.05(M + N)
very short: bottom 10% and length < 0.025(M + N)
Line descriptions like short and long need to be defined in mathematical terms before
they can be used in the recognition process. Such descriptions will have a meaning
if one constructs a line length subdivision between the shortest and longest lines and
associates each description with a given length interval.
In the actual system implementation, extremely long lines (i.e., longer than half the
average image dimension) are removed before all lines are sorted in ascending order
based on length. Table 3.1 provides a mathematical definition of the line description
used in this thesis.
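The classification could be sketched as below. The percentile bands and absolute length cut-offs follow Table 3.1, while the category names and the 'medium' fallback for mid-ranked lines are assumptions:

```python
def describe_lines(lengths, M, N):
    """Classify line lengths per Table 3.1 after removing extremely long
    lines (longer than half the average image dimension, (M + N)/2).
    Lengths are assumed distinct for this sketch."""
    kept = sorted(l for l in lengths if l <= 0.25 * (M + N))
    n = len(kept)
    desc = {}
    for rank, l in enumerate(kept):          # rank 0 = shortest line
        frac = (rank + 1) / n                # fraction of lines <= this one
        if frac > 0.9:
            desc[l] = 'very long'
        elif frac > 0.8:
            desc[l] = 'long'
        elif frac <= 0.1 and l < 0.025 * (M + N):
            desc[l] = 'very short'
        elif frac <= 0.2 and l < 0.05 * (M + N):
            desc[l] = 'short'
        else:
            desc[l] = 'medium'
    return desc
```

For ten lines of lengths 1 to 10 in a 100 × 100 image, the longest line is tagged very long, the next long, and the two shortest short and very short.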
It should be pointed out that the line attribute significant, defined earlier in Section
3.4.3, is also based on line length ordering. The difference, however, is that the line
attribute significant is not given relative to the total number of image lines. Instead
it is fixed as Ns1 + Ns2 lines, regardless of the line count in the image. Having a fixed
number of significant lines, reduces the variability of the system processing time as a
function of the number of lines in the image.
3.4.5
In Section 3.3, we discussed how clutter is filtered to improve the system performance.
Other clutter types of concern arise from man-made objects, roads, buildings and
grids. An attempt to filter them is likely to remove many desirable edges from the
aircraft structure. However, if the clutter is in the form of short or long line segments
mostly aligned along one or two directions, then it is possible to discriminate such
lines by assigning a unique tag to them. In this thesis, we denote such clutter in one
direction as polarised clutter, and in two approximately orthogonal directions as grid
clutter.
Initially, the orientations of all lines are extracted from the line database (see Figure
3.9) and processed to form an orientation histogram. The histogram contains 10 bins,
each of which has a width of 18°. If a direction bin with index i shows a large
count, then the line counts of bins i − 1, i, and i + 1 are noted. If the count sum
exceeds 70% of the total line count, then all lines oriented along that direction are
declared as polarised.
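A sketch of this polarised-clutter test, assuming orientations in [0°, 180°) and circular adjacency of the end bins:

```python
def polarised_directions(angles_deg, bins=10, frac=0.7):
    """Detect polarised clutter from line orientations in [0, 180).

    A 10-bin histogram (18-degree-wide bins) is built; if a peak bin plus
    its two circular neighbours holds more than 70% of all lines, the
    peak's centre direction is reported as a polarised direction.
    """
    width = 180 / bins
    hist = [0] * bins
    for a in angles_deg:
        hist[int((a % 180) // width)] += 1
    total, top = len(angles_deg), max(hist)
    peaks = []
    for i in range(bins):
        neighbour_sum = hist[i - 1] + hist[i] + hist[(i + 1) % bins]
        if hist[i] == top and neighbour_sum > frac * total:
            peaks.append(i * width + width / 2)   # bin-centre direction
    return peaks
```

Eight of ten lines sharing one orientation trigger the tag, while a uniform spread of orientations does not.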
Figure 3.13 shows some examples of polarised clutter. Figure 3.13(a) is the line
plot of the image shown in Figure 1.3(c), where the background generates many
parallel lines along the aircraft fuselage direction, and Figure 3.13(b) shows the line
orientation histogram, which suggests that the image contains polarised clutter. This
usually occurs when the camera tracks a high-speed aircraft, causing the background
to appear as parallel lines. These polarised lines are usually perpendicular to the
wing edges and therefore cannot form the wing edges. Consequently, the lines in the
polarisation direction (with 10° tolerance) are prevented from entering the line
prioritisation process of Section 3.4.3, and are tagged non-significant. This provides
the opportunity for the wing edges that may appear short due to the polarised lines,
to be accepted as significant lines. In Figure 3.13(a), only a small portion of clutter
lines are shown in blue (which represents significant), and the wing edges are shown
in blue or red.
When an aircraft dispenses a flare, the flare trails often form long parallel lines along
the aircraft fuselage direction, as shown in Figure 3.13(c). The histogram in Figure
Figure 3.13: Histograms of the line orientations are shown in the right column. The
images in the left column show clutter lines that are predominantly oriented along
one or two directions.
3.13(d) shows two distinct peaks at 80° and −80°, and the line count sum obtained
from the bins at (60°, 80°, −80°) exceeds 70% of the total line count. Therefore, the
clutter is regarded as being polarised. However, in this particular case, the line counts
for the extended and non-extended lines are respectively less than Ns2 and Ns1 , which
are defined in Section 3.4.3, so all the lines are accepted as significant lines.
Figure 3.13(e) has grid lines in the background. These grid lines result in two distinct
peaks roughly 90° apart in the line orientation histogram as shown in Figure 3.13(f).
In this case, it is possible that one of the grid directions is aligned with one of the wing
edges. Therefore, the grid lines should not be restrained from becoming significant.
We instead lower the thresholds used in the mathematical definition of long lines
(refer to Table 3.1) so that the wing edges that may appear relatively short compared
with the rest of the lines may have a better chance of belonging to long lines, hence
becoming more likely to survive the line grouping processes. The wing edges in Figure 3.13(e)
are all successfully labelled as being significant and long.
3.4.6
Object recognition systems often make use of perceptual grouping techniques to form
more complex line features, using proximity, parallelism and co-termination properties
[62]. In our system, we use the co-termination property to form long line chains, which
may potentially represent sections of the aircraft silhouette. Two lines are related by
the co-termination property if the distance separating their closest endpoints is below
a preset threshold, as shown in Figure 3.14(a). These lines are linked by activating
the connected to slot in Figure 3.9. For example, a line Lj is linked to Li by placing
the index of Lj in the connected to slot of Li (and vice-versa). Furthermore, the
angle subtended by the two lines ij , and the endpoints through which the link was
established are also recorded as shown in Figure 3.14(a).
Figure 3.14: Forming a line link based on the endpoint proximity property is shown
in (a), and a recursive line search to check if two lines are linked via a line chain is
shown in (b).
Having these links all established, checking if one line is connected to another distant
line is a simple matter of initiating a recursive search algorithm. This is illustrated in
Figure 3.14(b), where the connection between L7 and L60 was checked by implementing a depth-first recursive search. The search sequence is shown on the right side of
Figure 3.14(b). This search algorithm is used later in the fuselage finding stage, where
the connection between the aircraft nose edge to the wing via the fuselage boundary
edges is traced. In this work, it was found practical to limit the search depth to a
maximum of 7 levels, and to use a line proximity upper bound of 8 pixels.
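The depth-limited search over the connected to slots might look like the following, with the line database reduced to a plain adjacency map for illustration:

```python
def is_connected(graph, src, dst, max_depth=7):
    """Depth-first search over the connected_to links, with the search
    depth capped at 7 levels as in Figure 3.14(b).  graph maps a line
    label to the labels held in its connected_to slot."""
    def dfs(node, depth, visited):
        if node == dst:
            return True
        if depth == max_depth:
            return False
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)           # avoid revisiting lines
                if dfs(nxt, depth + 1, visited):
                    return True
        return False
    return dfs(src, 0, {src})
```

A chain of six links is found within the depth limit, while a chain needing eight levels is rejected by the cap.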
3.5 Two-Line Grouping
After experimenting with a large number of aircraft images, we observed that our
lower level image processing (i.e., edge detection, contour extraction, straight line
extraction and extension) is effective for extracting the wing leading edges and, to
a lesser degree, the wing trailing edges. The wing tips are often harder to detect
because they are usually short, and for fighter jets, the wingtips are often loaded with
missiles. Given these observations, it was decided that a wing is best represented by
a line pair (or two-line grouping) instead of a trapezoidal shape as used in [32, 33],
which implicitly requires finding the wing tip edge or fuselage-wing border (refer to
Figure 2.9 in Chapter 2). Furthermore, a two-line grouping is also very useful for
describing a nose as a wedge-like shape.
The aircraft nose shape in the image is usually curved, which initially suggested the
use of corner detection techniques [26, 78, 85, 105, 125] to find the corner points.
Figure 3.15 illustrates large variations in the nose shape and intensity distributions.
Most template based corner detection methods, when applied to such images [105],
may detect the corners from the noses, but will also generate a large number of
undesirable corners elsewhere because such templates cannot discriminate the nose
corner from other corners. Furthermore, the detected corners would not provide any
information about the nose boundaries.
Another common corner detection method is contour-based, where the curvature is
computed along the contour, and local curvature maxima are noted as
potential nose tips [26]. This approach is based on the assumption that the nose
boundary is successfully extracted as a continuous contour, approximately parabolic
in shape. However, due to image degradation and shading, such a parabolic contour
may be broken into two or more disjoint segments, making the curvature computation
at the true nose tip difficult to implement.
Knowing that line features representing the aircraft boundaries are already available,
and that two-line groupings will be performed to detect potential wings, it would be
more appropriate to treat a nose as a two-line grouping and include the nose detection
in the two-line grouping process.
To generate two-line groupings that potentially represent a wing or nose, a number of
constraints derived from possible image projections of a wing or nose are formulated.
In the next two subsections, we introduce the rules governing the wing and nose
formation processes that are applied to every pair of lines from the line database.
3.5.1
Given a pair of lines labelled Li and Lj , we define parameters ti and tj , which indicate
in relative terms how far the lines are from their intersection point, denoted as C (see
Figure 3.16). The following set of constraints is used to detect wing candidates.
1. the two lines are labelled as significant and at least one line is long (Table 3.1),
or they are connected via one or two co-terminating lines (Figure 3.17(a)).
2. the intersection point C in Figure 3.16 must satisfy ti < τ1, tj < τ1 and
ti + tj < τ2, where τ1 and τ2 are thresholds.
3. the separation between the two lines must not be too large (i.e., the parameter
apart in Figure 3.16 must be less than a preset threshold).
4. the two lines must overlap when rotated about C as shown in Figure 3.17(d).
This is conditionally relaxed to accommodate severely occluded wings.
5. min(ℓi, ℓj)/max(ℓi, ℓj) > ρ, where ρ is a threshold value (Figure 3.17(e)).
6. the coordinate of the mirror image of C must be within the image (Figure
3.17(g)).
7. the line angular deviation should satisfy 6° < θC < 90° (refer to Figure 3.16 for
the definition of θC).
8. the region enclosed by both line segments must not show excessive intensity
variation.
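The intersection point C and the parameters ti, tj of the second condition can be computed as sketched below. Interpreting ti as the distance from C to the nearest endpoint of Li, normalised by the segment length, is an assumed reading of Figure 3.16:

```python
import math

def intersection_params(Li, Lj):
    """Intersection point C of the infinite lines through segments Li and
    Lj, plus ti = di / li, where di is the distance from C to the nearest
    endpoint of Li and li its length.  Returns (C, ti, tj), or None when
    the lines are parallel."""
    (x1, y1), (x2, y2) = Li
    (x3, y3), (x4, y4) = Lj
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(den) < 1e-9:
        return None          # parallel lines never intersect
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    cx = (a * (x3 - x4) - (x1 - x2) * b) / den
    cy = (a * (y3 - y4) - (y1 - y2) * b) / den
    def t(seg):
        nearest = min(math.dist(p, (cx, cy)) for p in seg)
        return nearest / math.dist(*seg)
    return (cx, cy), t(Li), t(Lj)
```

Small ti and tj mean the intersection lies close to both segments, as the second condition requires.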
The first condition requires that potential wing edges must be long or they must be
connected via a third line (potential wingtip). This is achieved by the line connection
search as discussed in Section 3.4.6. For the second condition, the threshold
τ2 is made linearly proportional to cos(θC) in Figure 3.16, allowing the intersection
point C to be far from the lines if θC is small. The sixth condition removes any
two-line grouping in the vicinity of the image borderline, whose opening faces the
image border. Such a two-line grouping cannot possibly form a wing-pair inside the
image. The seventh condition sets limits for the wing angle θC. Provided that the
viewpoint is not very oblique, the wing angle θC has been found to be less than 90°
for most aircraft. The last condition examines the image intensity distribution in the
region delimited by the line pair. This condition rejects regions with widely varying
texture and favours uniformly distributed regions. In practice however, a wing may
(a) the two lines are short, but connected by a third line (recursive search - "connected_to") - accept
Figure 3.17: Wing candidate detection conditions - examples of accepted cases (a)
and (d), and commonly arising failed cases (shown in red lines).
Figure 3.18: Gradient distribution curve for the region enclosed by a two-line grouping. To pass the intensity check, the 10%, 20% and 30% percentiles must be less than
preset thresholds (i.e., the majority of the population must lie at the low-gradient end).
display some texture, camouflage or shadowed subregions and therefore care must
be taken not to discard such wings. As shown in Figure 3.16 the intensity values
along 3 strips are collected and differentiated to generate the gradient profiles. The
gradient data are processed to form a gradient histogram as shown in Figure 3.18.
A region of uniform intensity will generate gradients with zero values resulting in
a sharp peak at the zero gradient. Camouflage regions which contain different but
uniform intensities, will also generate a strong peak at gradient level zero, along with
a small number of minor peaks associated with intensity jumps at the camouflage
boundaries. The gradient histogram for cluttered regions, however, is spread out as
shown in Figure 3.18. By normalising the area under the gradient distribution, and
comparing the (10%, 20%, 30%) percentiles with pre-defined gradient thresholds, one
is able to heuristically distinguish between clutter and non-clutter regions.
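A sketch of this percentile test is given below; the threshold values are illustrative only:

```python
import numpy as np

def region_is_uniform(pixels, percentiles=(10, 20, 30),
                      thresholds=(2.0, 4.0, 8.0)):
    """Differentiate intensity samples taken along a strip inside the
    two-line grouping, and test whether the 10/20/30th percentiles of
    the absolute gradient stay below preset thresholds.  Flat or
    camouflaged regions keep a strong zero-gradient peak and pass;
    cluttered texture spreads the gradient histogram and fails."""
    grads = np.abs(np.diff(np.asarray(pixels, dtype=float)))
    return all(np.percentile(grads, p) <= t
               for p, t in zip(percentiles, thresholds))
```

A uniform wing and a two-tone camouflage pattern both pass (the single camouflage boundary only affects the far tail of the histogram), while strongly textured clutter fails.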
Figure 3.19: A typical nose configuration. The nose edges Li and Lj intersect at C near the nose tip and subtend the nose angle θN, separated by a gap gij; the longer leg is tagged LL, with lL = ||LL||, tL = dL/lL and tS = dS/lS.
3.5.2
A similar approach is used to set the conditions for extracting two-line groupings for
the nose. Given two lines Li and Lj , the longer and shorter lines are assigned tags
LL and LS respectively. Initially, a set of conditions which portray a typical nose
configuration shown in Figure 3.19, is presented below.
1. the nose edge must not be excessively long, i.e., max(||Li||, ||Lj||) < lth1, where
lth1 depends on the image resolution.
2. the intersection point C in Figure 3.16 must satisfy tL < τL and tS < τS, where
τL < τS (refer to Figure 3.19).
3. the gap between their closest endpoints (i.e., gij in Figure 3.19) must not exceed
a preset threshold gth, where gth is proportional to (||Li|| + ||Lj||).
4. the two lines must overlap when rotated about C as shown in Figure 3.17(d).
(a) nose angle is too large, or gap is too wide; (d), (e) supporting line is incorrectly oriented - reject; (l) nose angle is too small
Figure 3.20: Incorrect nose configurations in (a), (g), (l) are subject to further verification. Resulting accepted and rejected configurations are shown in blue and red,
respectively.
Figure 3.21: Any nose candidate in close proximity to the image borderlines that is
oriented in such a way that a large portion of its projected silhouette falls outside
the image borderlines is rejected.
Figure 3.22: Location of the nose tip. If the nose tip is not visible, then its location
is estimated as the midpoint between the nose edges' intersection point and the midpoint
of the nose edges' inner endpoints.
Figure 3.23: Multiple two-line grouping configurations generated from a single physical
nose.
5. the length ratio must be less than a preset threshold (i.e., ||LL||/||LS|| < lth2)
(see Figure 3.20(g) for a contradicting case).
6. the lines must not be too close to the image borders (see Figure 3.21).
7. θmin < θN < θmax (see Figure 3.20(a) and (l)), where θmin and θmax are set to
be inversely proportional to (||Li|| + ||Lj||).
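Conditions 1, 3, 5 and 7 can be sketched as follows; the intersection (2) and rotational-overlap (4) tests are omitted, and all numeric constants are illustrative guesses (the thesis ties lth1 to the image resolution, gth to the length sum, and the nose-angle bounds inversely to the length sum):

```python
import math

def nose_geometry_ok(Li, Lj, lth1=120.0, g_ratio=0.3, lth2=4.0,
                     angle_k=(300.0, 9000.0)):
    """Partial nose-candidate test for segments given as endpoint pairs."""
    li, lj = math.dist(*Li), math.dist(*Lj)
    if max(li, lj) >= lth1:                           # condition 1
        return False
    gij = min(math.dist(a, b) for a in Li for b in Lj)
    if gij >= g_ratio * (li + lj):                    # condition 3
        return False
    if max(li, lj) / min(li, lj) >= lth2:             # condition 5
        return False
    # condition 7: nose angle inside bounds that shrink for longer lines
    def direction(seg):
        (x1, y1), (x2, y2) = seg
        return math.atan2(y2 - y1, x2 - x1)
    theta = abs(direction(Li) - direction(Lj)) % math.pi
    theta_deg = math.degrees(min(theta, math.pi - theta))
    tmin, tmax = (k / (li + lj) for k in angle_k)
    return tmin < theta_deg < tmax
```

A wedge of two moderate-length edges meeting at a plausible nose angle passes, while a pair of parallel edges fails the angle bounds.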
Condition 1 sets an upper limit for the nose edge lengths. Nose edges are unlikely to
appear very long relative to the image size. Therefore, an upper limit is given as a
function of the image size. Condition 2 sets a limit on how far the lines can extend
before they intersect. The thresholds τS and τL are bounded within 1.5–2.6. Condition
3 limits how far the lines can be separated from each other. The gap gij needs to be
small when compared with the line pair. Condition 4 states that unless the nose is
occluded, the nose boundary lines must overlap when rotated about the nose intersection
point, C. Condition 5 is required so that any line pair coincidentally formed by a long
line with a short clutter segment could be rejected. Condition 6 necessitates that if
the line pair is located in the vicinity of the image border and is orientated in such a
way that a large portion of the hypothetical aircraft silhouette falls outside the image
(see Figure 3.21), then the line pair cannot be a potential nose. The last condition
specifies the range for the nose angle, θN. The upper limit decreases linearly with
increasing mean of the two line lengths. If all of the conditions are satisfied
then the two-line grouping is accepted as a potential nose. However, if some of the
conditions fail just marginally, then supplementary evidence is searched for.
The nose contour is usually curve shaped and therefore approximated by more than
two line segments. This in turn leads to multiple combinations of line pairings as
shown in Figure 3.23, and some of these may not satisfy all of the above constraints.
Therefore, if a line pair fails one of the constraints (see Figure 3.20(a), (g), (l)), then
further validations follow, checking for any supportive connected lines and a shaded
region. The validation procedure is summarised below, with reference to Figure 3.23.
1. As shown in Figure 3.20(a), if the lines are short and their nose angle N is
large, they are usually considered as clutter. However, if at least one of the lines
is connected to a third line (by checking its connected to slot), and the three
lines approximate a parabolic shape as shown in Figure 3.20(c), then the line
pair is accepted as a potential nose. If not (as shown in Figure 3.20(b), (d),
(e)), then the line pair is rejected.
2. If the gap is wide (see Figure 3.20(a)), then the gap is searched for line(s)
bridging the gap. If the recursive search for a connected line chain from one
edge Li leads to Lj, as shown in Figure 3.20(f), then the line pair is accepted as
a potential nose.
3. If one line is long and the other is much shorter as shown in Figure 3.20(g), then
the shorter line is checked for any connected line forming a parabolic shape as
shown in Figure 3.20(i). If such a third line is found, then the pair is accepted
as a nose candidate. Otherwise (see Figure 3.20(h), (i), (j)), the line pair is
rejected.
4. If the nose angle θN is less than θmin and the gap, gij, is small (Figure 3.20(l)),
then the intensity between the lines is noted. Only if the region is dark,
suggesting a shadowed section of the nose cone, is the line pair
accepted as a potential nose (see Figure 3.20(n)).
Since the nose tip often appears rounded, if the nose angle is small, then the intersection of the two nose edges can occur at a much greater distance from the true nose
tip (see Figure 3.23(b), (c), (f)). Therefore, when a nose is formed, its corner location
is assigned as the midpoint of the line joining the intersection point and the midpoint of
the two inner endpoints of the nose legs, as shown in Figure 3.22.
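The nose-tip estimate is then a simple midpoint computation:

```python
def nose_tip(C, e_i, e_j):
    """Estimated nose tip per Figure 3.22: the midpoint between the edge
    intersection point C and the midpoint of the two inner endpoints
    e_i, e_j of the nose legs."""
    inner_mid = ((e_i[0] + e_j[0]) / 2, (e_i[1] + e_j[1]) / 2)
    return ((C[0] + inner_mid[0]) / 2, (C[1] + inner_mid[1]) / 2)
```

This pulls the estimate halfway back from the (possibly overshooting) intersection point toward the visible ends of the nose edges.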
3.5.3
Once all geometric and intensity constraints are satisfied, the line pair candidate is
entered in a dedicated database along with a number of geometric and image intensity
attributes (see Figure 3.24). As an example, the angle between the line pair is recorded
in the angle slot in Figure 3.24. The length sum of the two lines (legs) is assigned
to the weight slot. The image in Figure 3.4(a) is used to show the outcomes of the
nose and wing processing steps. Figure 3.25 shows all the nose and wing candidates
generated from that image.
WING/NOSE
Wing/Nose No: #
Leg1: #
Leg2: #
Intersection point (corner): (#,#)
Angle: #
Weight: #
Average Intensity Level: #
Distance Between midpoints(apart): #
Minimum Gap: #
Figure 3.24: Wing/Nose Representation. Leg1 and Leg2 are the two lines forming
the two-line grouping. Note that the symbol # refers to a number.
Figure 3.25: Resulting wing and nose candidates from the two-line grouping process
on the image of Figure 3.4(a). In (a), line pairs are shown in blue, and red lines are
used to show which two lines are paired. [(b) 80 nose candidates and (c) 513 wing
candidates].
3.6 Four-Line Grouping
Four-line groupings are a higher level data abstraction developed for the purpose
of representing the wing-pair of an aircraft. The wing-pair is the most prominent
feature of an aircraft and forms the starting point for generating aircraft hypotheses
in the image. Given that one wing is represented as a two-line grouping, it naturally
follows that a wing-pair is represented as a four-line grouping as shown in Figure
3.26(a). Any four-line grouping may be oriented arbitrarily in the image. Of all
possible orientation configurations, only those that result in the wing patterns of
Figure 3.26(c) are representative of real aircraft wings.
To extract a reduced number of meaningful groupings, every pair of two-line groupings, wing(i) and wing(j), must satisfy a number of geometric constraints as given
below.
1. The two wings must have compatible sizes (0.5 < wing(i).weight/wing(j).weight <
2; refer to Figure 3.24). Recall that the wing weight is the leg length sum of
the wing.
2. The two wings must have comparable wing angles (|θLC − θRC| < θd, where θd
is an angle threshold).
3. The wing span must not be too small (||LC − RC|| > Lth), where Lth is derived
from the line length statistics.
4. Any two non-collinear edges (one from each wing) must not cross internally.
5. The wings must face each other (refer to Figure 3.26(b) for examples of unacceptable wing arrangements).
6. The angles θF and θR in Figure 3.26(a) must not exceed a preset threshold.
7. The two wings must comply with the skewed symmetry property [52, 53]. In
other words, the point M in Figure 3.26(a) is roughly the midpoint of LC and
RC.
The fifth test examines the relative orientation of the two wing candidates. An acceptable wing-pair falls into one of three wing configurations as depicted in Figure
3.26(c), namely diamond, boomerang and triangular. Figure 3.26(b) illustrates typical configurations of rejected groupings, which take up a large proportion of the
cluster set. Recognition of the leading and trailing edges is based on the comparison
of θF and θR as shown in Figure 3.26(a). The lines associated with the smaller angle
are labelled as the leading edges, and the ones with the larger angle are labelled as
the trailing edges. The last condition checks the symmetry property of the wing-pair
about its symmetry axis. Symmetry is a powerful grouping mechanism, and has been
addressed in numerous computer vision works [23, 30, 31, 38, 80]. As shown in Figure
3.27, the three points made up of the two wing intersection points (F P and RP )
and M, the midpoint of LC and RC, preserve the collinearity property after weak
perspective projection. The collinear property holds exactly if the wings are perfectly
coplanar. In practice, however, some errors are introduced because of possible distortions caused by the imaging process and lower-level processing imperfections in
locating the wing edges. Furthermore, the wings are only approximately coplanar for
most aircraft. The error becomes larger if θR approaches 180°, or if θLC and θRC are
small, as the location uncertainties of RP, LC and RC can grow very large. Hence,
for triangular wings, the symmetry test is replaced by another condition: that the
line joining FP and M must not cross the two trailing edges.
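The skewed-symmetry test can be sketched as a relative collinearity check. Measuring collinearity as the perpendicular distance from M to the FP-RP line, normalised by the wing span, is an assumed formulation, as is the 10% tolerance:

```python
import math

def symmetry_ok(FP, RP, LC, RC, tol=0.1):
    """Skewed-symmetry test: M, the midpoint of LC and RC, should be
    roughly collinear with FP and RP (the three-point collinearity of
    Figure 3.27, preserved under weak perspective)."""
    M = ((LC[0] + RC[0]) / 2, (LC[1] + RC[1]) / 2)
    span = math.dist(LC, RC)
    fx, fy = RP[0] - FP[0], RP[1] - FP[1]
    # perpendicular distance from M to the line through FP and RP
    cross = fx * (M[1] - FP[1]) - fy * (M[0] - FP[0])
    offset = abs(cross) / math.hypot(fx, fy)
    return offset < tol * span
```

A symmetric diamond passes, while shifting one wing corner sideways moves M off the FP-RP axis and fails the test.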
A wing pair satisfying these constraints is compiled into one of three wing categories,
namely triangle, diamond and boomerang (refer to Figure 3.26(c)). The data structure
is shown in Figure 3.28. The weight slot in the wing pair representation contains the
Figure 3.26: (a) Four-line grouping geometry, with corner points FP, RP, LC and RC, the midpoint M of LC and RC, and the angles θF and θR; (b) examples of unacceptable wing arrangements; (c) the three accepted wing-pair categories: diamond, triangle and boomerang.
Figure 3.27: Three point collinearity property both in space and in the image.
sum of the four line lengths, and is used to prioritise four-line groupings in terms of
size. For each wing-pair category, if the number of four-line groupings exceeds 100,
then only the top 100 with the largest weights are selected for further processing. It
should be noted that one aircraft may generate multiple four-line groupings formed
by different combinations of line fragments as shown in Figure 3.29. Accepting all
line groupings and letting them compete in the later stages improves the system
robustness. Figure 3.30 shows all outcomes of the four-line grouping process.
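The per-category, weight-based selection described above can be sketched as follows. The dictionary fields and the `limit` default are illustrative assumptions about the data structure, not the thesis' implementation.

```python
def select_top_groupings(groupings, limit=100):
    """Keep at most `limit` four-line groupings per wing category,
    prioritised by weight (the sum of the four line lengths).

    `groupings` is a list of dicts with 'type' in
    {'triangle', 'diamond', 'boomerang'} and a numeric 'weight'.
    """
    by_type = {}
    for g in groupings:
        by_type.setdefault(g["type"], []).append(g)
    selected = []
    for items in by_type.values():
        # Largest weights (longest constituent lines) first.
        items.sort(key=lambda g: g["weight"], reverse=True)
        selected.extend(items[:limit])
    return selected
```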
3.7
The generation of an aircraft hypothesis calls for the consistent association of a wingpair candidate with a matching nose. As indicated in Table 1.1, a matching nose
must be aligned with the fuselage axis and be facing the wing-pair. The fuselage axis,
at this stage, is defined as the axis going through the forward point, FP, and the rear
point, RP, as shown in Figure 3.26(a). For a triangular wing shape, RP is the midpoint
M of the left and right wing intersection points.

FOUR-LINE GROUPING
Wing-pair Index: #
Type: Boomerang/Diamond/Triangle
Left Wing Index: #
Right Wing Index: #
Four Edge Indices: [#, #, #, #]
Corner Coordinates: [FP(#,#) RP(#,#) LC(#,#) RC(#,#)]
Weight: # (four-line length sum)
Wing Span: # (distance separating LC and RC)

Figure 3.28: Four-line grouping representation. The two slots, left and right wing index, hold the wing numbers which form the wing-pair. Note that the symbol # refers to a number.

Figure 3.29: Extraction of multiple wing-pairs due to wing edge fragmentation. This figure shows three possible boomerang wing-pairs arising from one wing pair, one of whose edges contains two segments.

Figure 3.30: Resulting wing-pair candidates from the four-line grouping process on the image in Figure 3.4(a). The blue lines are constituent lines of four-line groupings. Red and green lines are introduced to show how the blue lines are grouped together [(b) triangle wing candidates, (c) diamond wing candidates, and (d) boomerang wing candidates].
A successful nose-wing association requires several geometric conditions to be met
that are consistent with the generic viewpoint (ie., wings are visible). These conditions
are based on a number of heuristics deduced from the structure of a large number of
aircraft, imaged under different viewpoints. These conditions are listed below (refer
to Figure 3.31).
1. The lengths of the nose legs must not exceed the distance between the nose tip,
C, and intersection point of the wing leading edges, FP.
2. The nose tip, C, must be located within the nose search region as shown in Figure 3.31(a). The size of the nose search region, which is defined later, depends
on the wing-pair size and shape.
3. The nose must face the wing-pair. The nose angular bisector must approximately line up with the line joining C and FP (ie. d in Figure 3.31(b) should
be small).
4. The line joining C and MID (middle point of M and RP) must pass through the
gap between the wing-pair, without touching any one of the wing edges (Figure
3.31(c)).
5. The line joining C and MID must belong to the sector delimited by the nose
legs. In the actual system implementation, some tolerance is introduced by
slightly widening the nose sector.
6. The line joining C and FP must not be near parallel with any of the wing
leading edges (ie., min(θL, θR) > th2 in Figure 3.31(b)).
7. The projection of the nose tip onto the line joining LC and RC must fall within
the wing span (ie., Wp < W ) as shown in Figure 3.31(c).
After examining and processing numerous aircraft images, it was determined that the
nose search region is located along the fuselage axis at a distance ranging between
‖FP − RP‖/2 and 3‖FP − RP‖ forward of FP. The nose search region is shown in
red in Figure 3.31(a). The quantity ‖FP − RP‖ is the distance separating the wing
forward and rearward points. The lateral angular extent of the search region (θs) is
determined to be no larger than 30° from the wing symmetry axis. Requirements 6
and 7 are imposed to eliminate nose-wingpair associations with pronounced skewness
(ie., the nose is considerably tilted to one side of the wingpair).
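A sketch of the nose search-region test (condition 2, with the extents just given) is shown below. Points are (x, y) tuples; the implementation details are illustrative, not the thesis' code.

```python
import math

def nose_in_search_region(c, fp, rp, max_angle_deg=30.0):
    """Check that the nose tip C lies along the fuselage axis, between
    ||FP-RP||/2 and 3*||FP-RP|| forward of FP, and within 30 degrees
    of the wing symmetry axis (direction from RP towards FP).
    """
    ax, ay = fp[0] - rp[0], fp[1] - rp[1]    # fuselage axis direction
    axis_len = math.hypot(ax, ay)
    vx, vy = c[0] - fp[0], c[1] - fp[1]      # vector from FP to nose tip
    dist = math.hypot(vx, vy)
    if dist == 0 or axis_len == 0:
        return False
    # Radial extent: between half and three times ||FP-RP||.
    if not (axis_len / 2 <= dist <= 3 * axis_len):
        return False
    # Angular extent: within max_angle_deg of the symmetry axis.
    cos_ang = (ax * vx + ay * vy) / (axis_len * dist)
    return cos_ang >= math.cos(math.radians(max_angle_deg))
```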
If a wing-pair and a matching nose satisfy these requirements, then their association
is accepted as an aircraft candidate, and an aircraft hypothesis is generated. Often
more than one nose may be successfully associated with a wing-pair, particularly
in cluttered images. A wing-pair candidate may be associated with a nose arising
accidentally at the cockpit. Also, there can be multiple line-pairs associated with the
same nose part (as shown in Figure 3.23(a)), or one side of the nose is shaded giving
rise to three legs. If a wing-pair is matched to more than one nose candidate, it is
necessary to prioritise them based on how well each nose is aligned with the wings'
symmetry axis (ie., how small dev in Figure 3.31(c) is).
The nose candidates for the currently considered wing-pair enter a fuselage test. This
test checks for the existence of lines filling the space between the nose and the wingpair. These lines are called fuselage lines since they usually emerge from the fuselage
structure, although some often emerge from the cockpit. The fuselage test is presented
in detail in Section 4.1.1.
Figure 3.31: Nose to wing-pair matching. The nose must be within the search region, must be facing the wing-pair, and the skewness must not be severe.
3.8
As shown in Sections 3.5-3.7, formation of the line groupings (ie., wings, noses, wingpairs, and aircraft hypotheses) requires sequential applications of constraints, often in
the form of hard thresholds. This raises a concern that violating one threshold may
result in a failure to detect the aircraft. To mitigate this concern, the thresholds were
relaxed at the lower levels and were gradually tightened. Furthermore, some rules
were made flexible so that when one of the conditions fails marginally, the candidate
line-grouping is given additional validation checks for a second chance to survive.
Nonetheless, it would still be preferable to defer the decision making until after all
the parameters are considered. Another drawback of the rule based approach is that
the thresholds need to be manually adjusted, making the parameter tuning process
time-consuming and tedious.
We note that some of the line-grouping formation rules in Sections 3.5-3.7 are
descriptive. Actual coding of such rules involves numerous parameters that are often
correlated or need to be constrained. We will name these parameters feature parameters. A collection of N feature parameters forms an N-by-1 vector that maps to a
point in the N -dimensional parameter space. A large number of the feature parameters generated from the training images will form clusters in the parameter space.
The surface of the clusters approximates the decision boundaries. The feature parameters collected from the non-aircraft images will be randomly distributed in the
parameter space.
Assuming a 2-D parameter space, Figure 3.32(a) illustrates the rectangular decision
boundaries of the rule-based approach with fixed thresholds. Such simple boundaries
usually let too many clutter features pass through (ie., under-fitting). Neural network
based approaches may provide a better approximation of the decision boundaries, as
shown in Figure 3.32(b).

Figure 3.32: In the feature parameter space (2-D for illustrative purposes) the blue circles represent aircraft feature parameters and the red squares represent clutter feature parameters. (a) Use of single thresholds forms simple decision boundaries that pass many clutter features, and (b) neural networks can generate complex-shaped decision boundaries.
3.8.1
The inputs to the neural networks are the feature parameters associated with the
rules in Sections 3.5 - 3.7. The descriptions of the input parameters will not be
presented here, however the full listing of the input parameters and their references
to the rules and figures in Sections 3.5 - 3.7 are included in Appendix A.
We use feed-forward neural networks as they are the most popular and widely used in
the area of classification. The feed-forward neural network begins with an input layer,
which is connected to a hidden layer. This hidden layer can be connected to another
hidden layer or directly to the output layer. It is very rare for a neural network to
need more than two hidden layers [54].
The purpose of the proposed neural network is to indicate whether or not the input
parameters belong to the aircraft features or non-aircraft features. The output of the
network should approximate 1 for aircraft features and 0 for non-aircraft features.
Hence, only one neuron is used in the output layer. The neurons in the hidden
layer and output layer have a logistic sigmoid (log-sigmoid) transfer function (ie.,
1/(1 + e^{-n})), which is shown in Figure 3.33. The log-sigmoid transfer function was
selected because it is well suited to the output range [0, 1].
The best known example of a neural network training algorithm is back-propagation
[50, 99]. In back-propagation, the gradient vector of the error surface is calculated.
This vector points along the line of the steepest descent from the current point,
hence moving a short distance along this line will decrease the error. A sequence
of such moves leads to a local minimum. Even though this is the easiest algorithm
to understand, it is often too slow for practical problems. Instead, we resort to the
Levenberg-Marquardt algorithm [8], which is typically one of the fastest training algorithms.

The next step is to determine the number of neurons in the hidden layer(s) that will produce good results without over-fitting. Over-fitting occurs when the neural network
becomes so complex that it may actually fit the noise, not just the signal. Instead
of learning, the network memorises the training set, hence producing unpredictable
results when new cases are submitted to it. There is no quantifiable best answer for
the layout of the network for any particular application. There are only general rules
that have been practised by researchers and engineers. Some of them are summarised
below.
1. The number of hidden neurons should be in the range between the size of the
input layer and the size of the output layer.

2. The number of hidden neurons should be 2/3 of the input layer size, plus the
size of the output layer.

3. The number of hidden neurons should be less than Ntest/(K(Ninput + Noutput)),
where Ntest is the number of cases in the training data, K is a scaling factor
ranging between 5 and 10, and Ninput and Noutput are respectively the numbers
of neurons in the input and output layers.
Even though the above rules may give a good starting point, the selection of the
network configuration really comes down to trial and error. The network design was
carried out in the following manner:

1. The initial configuration was set as one hidden layer with the number of hidden
neurons set to half the sum of the input and output layer sizes.

2. Each configuration was trained several times, retaining the network producing
the smallest error rate. Several training trials were required for each configuration to avoid being fooled if training locates a local minimum. With the
best network (ie., optimum weights), this process is repeated by resampling the
experimental data; five-fold cross validation is used to generalise the error rate.

3. If the performance level is not met (due to under-fitting), then more neurons
are added to the hidden layer. If that does not help, then an extra hidden layer
is added.

4. If over-fitting occurs, hidden neurons are gradually removed.

Table 3.2: The neural network configurations and the mean error rates in detection of wings, noses, wingpairs and aircraft hypotheses.

    Features    no. of inputs    network configuration
    Wing        7                7-5-1
    Nose        10               10-4-2-1
    Wingpair    17               17-6-1
    Aircraft    11               11-6-1
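The forward pass of such a classifier (eg., the 7-5-1 wing network) with log-sigmoid units can be sketched as below. The random weight initialisation merely stands in for the Levenberg-Marquardt training, which is not reproduced here; the structure is an illustrative assumption.

```python
import math
import random

def logsig(n):
    # Log-sigmoid transfer function, 1 / (1 + exp(-n)).
    return 1.0 / (1.0 + math.exp(-n))

def mlp_forward(x, weights):
    """Forward pass through a fully connected feed-forward network.

    `weights` is a list of layers; each layer is a list of neurons and
    each neuron is (bias, [w1, w2, ...]). All neurons use log-sigmoid,
    so the final output lies in (0, 1), matching the 0/1 target coding.
    """
    activ = x
    for layer in weights:
        activ = [logsig(b + sum(w * a for w, a in zip(ws, activ)))
                 for b, ws in layer]
    return activ

def random_net(sizes, seed=0):
    # Build e.g. a 7-5-1 network from sizes = [7, 5, 1] with random weights.
    rng = random.Random(seed)
    return [[(rng.uniform(-1, 1), [rng.uniform(-1, 1) for _ in range(n_in)])
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]
```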
The network output threshold is set to 0.5 for the evaluation of the mean error rate.
The term error rate is defined as the total count of misses and false alarms (in %).
Five-fold cross validation is used so that the error rates are generalised (ie., the error
rates remain consistent when new sets of data are presented to the network).

The experimental data for each of the four neural networks contains 300 cases of
aircraft features (eg., wing, nose, wingpair and wingpair-nose) and 1500 cases of
non-aircraft features. The network dimensions that result in the smallest mean error rates
are chosen for the system (see Table 3.2). When two configurations give almost the same
error rates, the one with the smaller number of hidden neurons is chosen, as it will be
less prone to over-fitting.
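The five-fold resampling over the 1800 experimental cases can be sketched as a simple index partition. The contiguous fold assignment is an illustrative assumption; the thesis does not specify how cases are assigned to folds.

```python
def five_fold_indices(n, k=5):
    """Partition n sample indices into k near-equal folds for cross
    validation; each fold serves once as the held-out test set."""
    folds = []
    base, extra = divmod(n, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```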
Note that the number of the hidden layers for the nose features increased to 2. Unlike
the wings, the nose sections are curved, non-planar and often shaded. Hence, discriminating the nose features is usually more difficult, and may require a more complex
configuration.
3.8.2
Figure 3.34 shows the receiver operating characteristic (ROC) curves [22], obtained
from the experimental data set. It is shown that high detection rates (eg., > 98%)
were achievable for small values of false detection rate. Achieving a very high detection
rate (or very low miss rate) is critical in this stage because failing to detect a wing
or nose leads to a MISS at the system output. The ROC curves indicate that the
neural networks may be a feasible option.
The next step is to examine whether the proposed neural networks can remove the spurious
features that the rule-based approach could not remove previously. To test this, a
number of non-aircraft images are fed to the aircraft feature extraction rules (in Sections 3.5 - 3.7), and the surviving line groupings are collected. The feature parameters
generated from these line groupings are fed to the networks. The experimental results
showed that, while maintaining a high detection rate above 97%, 30-40% of those spurious features could be removed (see the correct rejection rate in Table 3.3), indicating a
possible improvement in terms of a reduced false alarm rate.
The neural networks are integrated into the system and tested on real aircraft images.
The effect of the neural networks on the overall system performance will be presented
in Chapter 6.
Figure 3.34: ROC curves for detection of (a) wings, (b) noses, (c) wing-pairs and (d) aircraft hypotheses. (The horizontal axes give the miss rate in percentage.)
Table 3.3: Test of the neural networks on the spurious features that survived the rule-based approach. As shown in the third column, 30-40% of those features are successfully rejected by the neural networks.

    Features    detection rate
    Wing        97%
    Nose        98%
    Wingpair    97%
    Aircraft    98%

3.9 Discussion
In this chapter, saliency based low level processing and line grouping mechanisms
for potential aircraft parts and hypotheses are discussed in detail. The low level
processing is dedicated to clutter removal and the detection of straight lines. The
pixel density and randomness of pixel orientation play an important role in the early
discrimination between object (aircraft) and clutter pixels.
The next processing step is to join collinear lines and establish a line data structure that includes information about the co-termination property of proximal lines.
This last processing step is very useful in tracing parts of the aircraft boundary
and contributes to the evidence accumulation process. Saliency-driven line organisation is carried out in this chapter. In particular lines are sorted and then allocated
descriptions depending on line length statistics. This processing measure reduces
considerably the polynomial growth of line groupings as the number of lines increases
progressively in each group.
To summarise, the key features of this chapter are: (a) processing of background
clutter, (b) tagging of polarised background lines, (c) extending and structuring lines
based on their saliency, (d) forming two-line groupings that potentially represent
wings and noses, wing pairs and aircraft hypotheses, and (e) introducing neural networks as an alternative approach to the rule-based line grouping method to improve
the system performance.
The following chapter describes the evidence accumulation processes based on examining positive and negative cues from aircraft parts and clutter.
Chapter 4
Generic Aircraft Recognition
In Chapter 3, the extracted lines were grouped to form wing-pair and nose associations, which are numerous. This system adopts a strategy of what we call
low commitment: a large number of lower-level features are initially accepted in order
to increase the extraction probability of features belonging to an aircraft. Higher
level rules, inspired from the aircraft knowledge domain, are then applied to filter out
spurious aircraft candidates arising from accidental line feature groupings.
In this chapter, the system subsequently collects evidence from the fuselage, tail fins,
wing tips, and other areas within the aircraft region to consolidate correct hypotheses.
Intensity based information is also used in hypothesis promotion/demotion processes.
Section 4.1 outlines the evidence accumulation process, in which a confidence score
increases as aircraft parts are detected, and then an additional set of positive and negative evidence is collected to further widen the score gap between the true and
spurious hypotheses. Section 4.2 describes the interpretation conflict resolution process. Section 4.3 shows how the system handles difficult shadow problems. Section
4.4 describes how the scores are weighted to improve the system recognition performance. Section 4.5 presents experimental results of selected aircraft images under
various imaging conditions: blurring, camouflage, clutter, multiple aircraft, occlusion, protrusions and shadowing effects. Section 4.6 provides several examples of
cases where the system outputs a winning hypothesis when there is no aircraft in
the image. This test illustrates how spurious hypotheses may form from background
clutter, and occasionally reach the final recognition stage. This chapter concludes
with a brief discussion in Section 4.7.
4.1
Evidence Accumulation
In this section, we address the aircraft evidence accumulation process in terms of the fuselage, wing tips, tail fins, and local intensity match. We implement a voting scheme,
where the evidence score increases progressively as positive evidence accumulates,
and decreases if negative evidence is encountered.
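The voting scheme can be sketched as below. The evidence names and score magnitudes are illustrative placeholders, not the thesis' actual weights.

```python
def accumulate_evidence(evidence):
    """Voting scheme for an aircraft hypothesis: positive evidence
    (detected parts) adds to the score, while negative evidence
    (penalties) subtracts from it."""
    return sum(evidence.values())

# Hypothetical evidence entries for one aircraft hypothesis.
score = accumulate_evidence({
    "fuselage_coverage": 180,   # positive: fuselage lines found
    "tail_fin": 40,             # positive: fin leading edge found
    "wingtip": 30,              # positive: wingtip edge found
    "clutter_penalty": -100,    # negative: dense clutter inside boundary
})
```

Competing hypotheses would then be ranked by this accumulated score, with the winner retained.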
4.1.1
Fuselage Detection
As with the wings and noses, a fuselage is an important feature of the aircraft which
connects the nose to the wings hence enclosing the forward section of the aircraft.
The fuselage section is readily visible from many viewpoints and contains long lines.
A fuselage is usually described as a long cylindrical shape, which is true for large
commercial aircraft, as shown in Figure 4.1(a). However, for military jets, the fuselage
does not usually have a simple shape and is more spread out laterally as shown in
Figure 4.1(b). The fuselage section for jets is defined in this thesis as the aircraft
structure located in front of the wings (see Figure 4.1(c)). This includes the cockpit
and nose. The fuselage region for such aircraft is often difficult to extract from the
image, particularly if the image is degraded. Applying camouflage to the fuselage
section further exacerbates the segmentation process of the fuselage region.

Figure 4.1: Typical commercial and military aircraft, and the parts that need to be detected for evidence score accumulation.

In this
thesis, the detection of the fuselage section makes use of a different approach and takes
advantage of the observation that most edges arising from the fuselage structure are
roughly oriented along the fuselage axis. This means that the detection of a relatively
large number of similarly oriented lines within a confined region in the image is
suggestive of the presence of a fuselage section. In practice, these similarly oriented
lines are searched for within a region delimited by the detected nose and wing leading
edges.
Evidence about the fuselage section is determined by how much the fuselage axis,
obtained by joining the nose tip to the intersection point of the wing leading edges, is
covered by fuselage edges from both sides of the axis. This evidence is highest when
the fuselage edges cover all of the frontal aircraft section (from wings to nose). A
fuselage edge is defined as any edge approximately oriented along the fuselage axis
and located in a region around the fuselage axis. The remaining part of this section
explains in detail the detection process of the fuselage section.
The fuselage search region is based on the nose location and wing shape. As shown
in yellow in Figure 4.2 (a), the fuselage search region is defined as a trapezoid that
includes a convex hull made of the points C, NL , NR , PL and PR . These points are
respectively the nose tip, left and right nose endpoints, and innermost endpoints of
left and right leading edges of the wings.
To ensure that the search region is large enough to include all the fuselage section,
an extra margin of 10 pixels is added to enlarge the search region.
The gap widths ‖PL − PR‖ and ‖P′L − P′R‖ must be similar, as they represent the
width of the fuselage. If not, then the width of the search region is adjusted to fit
the smaller of these two gap widths.
Figure 4.2: (a) The fuselage search region (shown in yellow), bounded by the nose tip C, the nose endpoints NL and NR, and the wing leading-edge endpoints PL and PR; lines with the wrong orientation are rejected. (b) Projections of the fuselage lines onto the line joining C and PM.
If a line is found within the search region, then its alignment with the fuselage axis
(the line joining C and FP) is tested. The angle between the line p1p2 and the
fuselage axis must be smaller than a preset threshold in order for the line to be accepted as a
fuselage line. Any line outside the search region is not considered.
Having collected all fuselage lines, a fuselage score is computed based on the union
of their orthogonal projections onto the fuselage axis. This score computation is
illustrated in Figure 4.2(b), where individual line projections of length ℓL(i) from the
left side and ℓR(i) from the right side, respectively, are shown in red. The union of all
projections forms the total projected length, which is then normalised by the length
of the fuselage search region along the fuselage axis (estimated as ‖C − PM‖). The
fuselage coverage score is heuristically defined as

\[
\mathrm{score}_{\mathrm{fuse}} = S\left\{ \frac{\sum_{i=1}^{N_L} \ell_L(i)}{\|C - P_M\|}\, f_L \;+\; \frac{\sum_{i=1}^{N_R} \ell_R(i)}{\|C - P_M\|}\, f_R \right\} \tag{4.1.1}
\]
where NL and NR are respectively the number of fuselage lines on the left and on the
right of the fuselage axis, S is a multiplicative factor made large to give more weight
to the fuselage section evidence, and fL and fR are respectively the left and right
side scale factors ranging from 0.4 to 1. These two scale factors are made inversely
proportional to the divided angular width of the fuselage search region, expressed in
terms of the angles between C–FP and C–PL, and between C–FP and C–PR, as shown in Figure 4.3. This
scaling setup ensures that the narrower side receives more emphasis, as it is less likely
to include clutter lines belonging to the background.
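The coverage computation can be sketched as below: per-side projections onto the fuselage axis are merged into a union and normalised by ‖C − PM‖, in the spirit of Equation 4.1.1. Using the interval union on each side (rather than the raw sum) and the value of S are modelling assumptions here.

```python
def union_length(intervals):
    """Total length of the union of 1-D intervals, ie. the projections
    of the fuselage lines onto the fuselage axis with overlaps counted
    once."""
    total, last_end = 0.0, None
    for start, end in sorted(intervals):
        if last_end is None or start > last_end:
            total += end - start      # disjoint interval
            last_end = end
        elif end > last_end:
            total += end - last_end   # partial overlap
            last_end = end
    return total

def fuselage_score(left_proj, right_proj, axis_len, f_left, f_right, S=200.0):
    """Fuselage coverage score: per-side coverage fractions weighted by
    the side scale factors and the multiplicative factor S."""
    cov_l = union_length(left_proj) / axis_len
    cov_r = union_length(right_proj) / axis_len
    return S * (cov_l * f_left + cov_r * f_right)
```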
Furthermore, if a subset of the extracted fuselage lines generates a connected chain,
linking the nose to PL or PR , then a bonus score is awarded because the connected
chain often represents a fuselage boundary. The term connected chain is defined in
this context as a series of lines where one line is joined to another by the endpoint
proximity property, while preserving a deviation angle greater than 75°. Figure 4.4
illustrates an example of an aircraft hypothesis where the nose connects to the right
and left wing leading edges through two connected line chains, tracing the frontal
fuselage boundary.

Figure 4.3: Scale factor (fL or fR), which is inversely proportional to the divided angular width of the fuselage search region, expressed in terms of the angles between C–FP and C–PL, and between C–FP and C–PR.

If the sum of NL and NR is very large (ie., > 25), most of the
fuselage edge lines are short and the fuselage coverage score is not large (ie., < 180),
then these lines are considered as clutter and a penalty of 100 is applied to the fuselage
coverage score.
This fuselage detection procedure is repeated for all nose/wingpair candidates. The
nose with the highest fuselage score is selected as the winning nose. However, if the
score gap between the winner and the runner-up is very small and the runner-up nose is
better aligned with the fuselage axis, then the runner-up nose wins instead.
Figure 4.4: The detected fuselage boundary lines connect the nose to the wing leading edges via connected chains. Such a nose-to-wing connection provides strong fuselage boundary evidence.

4.1.2
The wing, nose and fuselage section are the most prominent features of an aircraft.
The use of these features is usually sufficient to detect an aircraft against a clean background. However, realistic scenes often include clutter, which (either on its own or
in combination with several aircraft parts) generates numerous spurious aircraft candidates. In order to increase the score gap between the true aircraft hypothesis and
spurious ones, more evidence is required from other aircraft parts. In this subsection, we consider locating the aircraft tail fin edges in the image to improve the
aircraft hypothesis confidence.
Detecting the tail fin edges correctly while filtering out clutter is not straightforward
because the edges are relatively short, and depending on the viewpoint, they are
often occluded by the rudder. Therefore, detection of the complete fin structure is
impractical. Instead, we mostly focus on the fin leading edges. If the complete fin
structure is detected as a two-line grouping, then a bonus score is added (see the
bottom left of Figure 4.5(a)). The tail fin detection algorithm checks the following
conditions assuming a generic viewpoint.
1. The tail fin must be located behind the wing and cannot be longer than the
longest wing edge from the same side.
2. The tail fin cannot be further from the fuselage axis than the wing is.
3. The tail fin cannot be too far behind the wing; an exception arises if the hypothesis has a boomerang wing shape and a narrow fuselage.
4. The tail fin must be approximately aligned with the wing leading edge
(|θF − θR| < thθ, as shown in Figure 4.5(a)).
5. When extended, the tail fin must not cross the wing edge of the same side.
If all of the above conditions are satisfied, then the line is accepted as a tail fin
edge, and an evidence score for the fin is awarded. To consolidate this evidence,
the intensity values in the vicinities of the wing trailing edge and fin leading edge are
compared, as shown in Figure 4.5(b). Since the space between the two edges is usually
narrow especially for non-commercial aircraft, some degree of intensity uniformity is
expected unless the background is cluttered. If the local intensity shows a reasonable
match, then a bonus score is awarded, as illustrated in Figure 4.5(b).
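The local intensity comparison can be sketched as a mean grey-level check between the two neighbourhoods. The tolerance value is an assumption, not the thesis' threshold.

```python
def intensity_match(region_a, region_b, tol=15):
    """Intensity consistency check between the neighbourhood of the
    wing trailing edge and that of the fin leading edge: the mean grey
    levels should agree within a tolerance. Regions are flat lists of
    pixel intensities (0-255)."""
    mean_a = sum(region_a) / len(region_a)
    mean_b = sum(region_b) / len(region_b)
    return abs(mean_a - mean_b) <= tol
```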
Finally, if two or more tail fins are detected on both sides of the fuselage axis, then
they are further subjected to a symmetry test. For any symmetric line pair, the sum
of angle cotangents (eg., cot θ1 + cot θ2 or cot θ′1 + cot θ′2 in Figure 4.5(c)) is constant
and is only a function of the roll and pitch angles. This will be derived in Section 5.2
Figure 4.5: Locating tail fin edge lines: (a) geometric constraints in terms of location, length and orientation, (b) intensity-based constraints applied both in the foreground and background regions, (c) skewed symmetry constraints applied to tail fin leading edges (ie., cot θ1 + cot θ2 = cot θ′1 + cot θ′2).
of the next chapter (refer to Equation 5.2.8), where the viewpoint is estimated. This
observation translates into the constraint

|(cot θ1 + cot θ2) − (cot θ′1 + cot θ′2)| < cth

where cth is a tolerance threshold. If this symmetry constraint is satisfied, then a
bonus score is awarded; otherwise a penalty is applied.
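The skewed symmetry test can be sketched as below (angles in radians). The tolerance cth = 0.3 is a hypothetical value standing in for the thesis' threshold.

```python
import math

def skewed_symmetry_ok(a1, a2, b1, b2, cth=0.3):
    """Tail-fin skewed symmetry test: for a symmetric line pair the sum
    of angle cotangents depends only on the roll and pitch angles, so
    it must be (approximately) equal for both pairs:
    |(cot a1 + cot a2) - (cot b1 + cot b2)| < cth."""
    def cot(x):
        return math.cos(x) / math.sin(x)
    return abs((cot(a1) + cot(a2)) - (cot(b1) + cot(b2))) < cth
```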
4.1.3
The wingtip edge is another useful feature that completes the wing structure. Wingtip
characteristics vary depending on the aircraft type. A delta wing aircraft, as shown in
Figure 5.27(a), has no wingtip edge. Some fighter jets carry missiles on the wingtips,
resulting in elongated wingtip edges (see Figure 4.6(a)).
The wingtip edge must satisfy the following conditions.
1. The wingtip edge must be located in the search sector as shown on the right
side of Figure 4.6(b).
2. The wingtip edge must be approximately equal in length to the gap defined as
x2 in Figure 4.6(a). Often a wingtip edge appears longer than x2 when a missile
is attached to the wingtip (refer to Figure 4.6(a)). Therefore, the constraints
are relaxed accordingly (ie., x3/x2 < x1/x2 < rt and ℓ > x2/2, where rt and ℓ
are respectively the ratio threshold and the wingtip line length).
3. The wingtip edge must be approximately aligned with the fuselage axis (see
the right side of Figure 4.6(b) for typical rejected candidates). However, if the
wingpair is boomerang shaped, then the wingtip can also be roughly perpendicular to either wing edge, as shown in Figure 4.6(c).
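Condition 2 above can be sketched as a ratio test; rt = 2.0 is a placeholder for the thesis' ratio threshold, and the segmented lengths x1, x2, x3 follow Figure 4.6(a).

```python
def wingtip_length_ok(x1, x2, x3, length, rt=2.0):
    """Wingtip length condition with the relaxation for
    missile-carrying wingtips: x3/x2 < x1/x2 < rt and length > x2/2,
    where `length` is the candidate wingtip line length."""
    if x2 <= 0:
        return False
    return (x3 / x2) < (x1 / x2) < rt and length > x2 / 2
```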
Figure 4.6: Wingtip edge detection: (a) x1, x2 and x3 are segmented lengths; (b) the wingtip search sector about the fuselage axis, with candidates outside the sector rejected and correct wingtips accepted.
4.1.4
Having extracted most or all of the aircraft parts, the overall aircraft silhouette is now
defined. Additional scores are added (or subtracted) based on additional geometric
constraints and also on image intensity at selected locations within and around the
aircraft silhouette. Each of the following constraints contributes a score or imposes a
penalty.
1. If the wing leading edges overlap when rotated about FP as shown in Figure
4.7, then a score is awarded.
2. If the wing trailing edges overlap when rotated about RP (also refer to Figure
4.7), then an additional score is awarded.
3. Given the mean intensity values computed at the selected regions F1, F2, R1,
R2, M1 and M2 in Figure 4.8, if the mean intensity differences between each pair
of regions (F1 and F2), (R1 and R2) and (M1 and M2) are below a threshold,
then a score is added. This condition obviously favours aircraft with uniform
intensity distribution.
4. If the background is clean (as defined below), then the mean intensities of F1
and R2 should be distinct from the background mean intensity. If each intensity
Figure 4.7: The wing leading edges must overlap when rotated about FP. The overlapping portion is shown in red. The same rule applies to the trailing edges of the wing-pair.
123
C
Fuselage Axis
F1
FP
F2
M1
M2
R1
RP
R2
F1, F2, R1, R2, M1, M2: regions of interest for intensity level comparisons
Figure 4.8: Regions of interest for intensity level comparisons. The differences of the
mean intensity values between each pair of regions (F1 and F2), (R1 and R2) and
(M1 and M2) are expected to be small.
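The paired-region uniformity check of constraint 3 can be sketched as below. The per-pair scores (10, 10, 20) follow the ninth row of Table 4.1; the difference threshold and the array-based region representation are illustrative assumptions:

```python
import numpy as np

def uniformity_score(pairs, diff_threshold=20.0, pair_scores=(10, 10, 20)):
    """Constraint 3 sketch: compare mean intensities of the region pairs
    (F1,F2), (R1,R2), (M1,M2); each pair whose mean difference falls
    below the threshold contributes its score. The threshold of 20 is a
    placeholder, not a thesis value."""
    total = 0
    for (a, b), s in zip(pairs, pair_scores):
        if abs(float(np.mean(a)) - float(np.mean(b))) < diff_threshold:
            total += s
    return total
```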
fuselage axis, then it is accepted as the rear fuselage edge, and a confidence
score is added. If no rear fuselage line is found for the boomerang wing-pair,
then a penalty is imposed.
6. The gap between FP and the inner endpoint of the wing edge (i.e., PT2 in Figure
4.11) is checked for any clutter lines. If three or more lines cross the double-arrowed brown line in Figure 4.11, then the hypothesis is more likely to be a
coincidental aircraft formation arising from clutter, and is hence penalised.
7. If the image is cluttered (i.e., the total line count exceeds 450), then an accidentally
generated spurious hypothesis may contain dense clutter inside its boundary.
Note that the number 450 is selected from the line count distribution curve
Figure 4.9: The background intensity is computed from the shaded periphery region.
We assume this periphery region contains mainly the background.
Figure 4.10: Background intensity histograms obtained from the shaded perimeter
region (refer to Figure 4.9) of aircraft images with different clutter levels: (a) clean, (b)
light clutter, and (c) heavy clutter. PM is the count of pixels in the bin corresponding
to the peak, and PT is the total pixel count in the histogram. The ratio PM /PT roughly
indicates the clutter level.
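The PM/PT ratio described in the caption can be sketched as follows; the bin count is an illustrative choice, not a thesis parameter:

```python
import numpy as np

def clutter_level_ratio(background_pixels, n_bins=32):
    """Estimate the clutter level from the periphery-region histogram:
    PM is the peak-bin count and PT the total count; a ratio near 1
    suggests a clean, near-uniform background, while a small ratio
    suggests heavy clutter. n_bins = 32 is a placeholder."""
    hist, _ = np.histogram(background_pixels, bins=n_bins, range=(0, 256))
    return hist.max() / hist.sum()
```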
Figure 4.11: Detection of rear fuselage lines and clutter lines. Potential rear fuselage
edges for a boomerang shaped wing-pair are detected between the wing trailing edges,
and are shown in blue. Detection of many lines crossing the gap between the wing
edge inner point (e.g., PT2) and the fuselage axis weakens the confidence of the
hypothesis. Clutter lines are shown in red.
Figure 4.12: A spurious aircraft hypothesis coincidentally generated from dense clutter is likely to contain many clutter lines in the hypothetical fuselage region.
Figure 4.13: Clutter evidence score plot as function of the clutter count. If the
clutter count within the fuselage region (refer to Figure 4.12) exceeds 7, then the
score becomes negative.
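A minimal sketch of the Figure 4.13 score curve is given below. Only the zero crossing at 7 clutter lines is stated in the caption; the slope and the positive cap are illustrative values read loosely from the plot, not thesis parameters:

```python
def clutter_evidence_score(count, zero_cross=7, slope=10.0, max_score=60.0):
    """Piecewise-linear sketch of the clutter evidence score: positive
    for few clutter lines, zero at `zero_cross`, and increasingly
    negative beyond it. slope and max_score are placeholders."""
    return min(max_score, slope * (zero_cross - count))
```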
obtained from the clutter images (see Figure 6.3). If short line segments
found within the hypothetical fuselage boundary (i.e., the shaded region in Figure
4.12) present a large angle with the fuselage axis, then they are considered
clutter segments (shown in red in the figure). The score (or penalty) is computed
as a function of the number of detected clutter segments, as illustrated
in Figure 4.13. Note that the line count of 450 is selected based on the line
statistics of 160 non-aircraft clutter images.
8. For a correct aircraft hypothesis, it is expected that FP is close to the fuselage
Figure 4.14: Deviation of FP from the fuselage axis, expressed as θFP. Any aircraft
with coplanar wings and fuselage will display a small θFP value. Spurious hypotheses
usually show larger θFP values; therefore the parameter θFP is used in the interpretational conflict resolution process.
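The angular-deviation parameter (the deviation of FP from the fuselage axis, written θFP here, and elsewhere measured between the wing symmetry axis FP-RP and the fuselage axis) can be sketched as below; the point-pair representation of the two axes is an illustrative assumption:

```python
import math

def fp_angular_deviation(fp, rp, axis_p1, axis_p2):
    """Sketch of theta_FP: the acute angle (degrees) between the wing
    symmetry axis (the FP-RP line) and the fuselage axis, each given
    by two (x, y) points."""
    def ang(p, q):
        return math.atan2(q[1] - p[1], q[0] - p[0])
    d = abs(ang(fp, rp) - ang(axis_p1, axis_p2)) % math.pi
    return math.degrees(min(d, math.pi - d))
```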
9. A large intensity difference between R1 and R2, together with no detected rear fuselage lines, provides a strong indication of such a wing-fuselage formation, and the hypothesis is therefore severely
penalised.
10. If the image contains a large number of lines (eg., > 450), a spurious hypothesis
may be formed from the coincidentally extended clutter lines (see Figure 4.16).
If the image contains an aircraft, then it is unlikely that such a spurious hypothesis becomes the winning hypothesis. However, if the image does not contain
an aircraft, then without having to compete with a true aircraft, the spurious
hypothesis may become the winning hypothesis, generating a false alarm. In
order to handle this problem, we check whether or not more than three wing
edges of the hypothesis are extended as shown in Figure 4.16. If so, then the
line segments used to form the extended lines are identified by checking the
collinear slot in the extended lines (see Figure 3.9). If these segments did not
Figure 4.15: Intensity comparisons between regions of R1 and R2. A spurious aircraft
hypothesis, often generated as wing-fuselage combinations, will show a large intensity
difference between the two regions.
generate a hypothesis describing the same wing shape as the given hypothesis,
then the current hypothesis is considered as being spurious and is therefore
severely penalised.
11. A fraction of the hypothesis's weight contributes a score. The score is calculated
as s_weight = weight/K, where K is a normalising constant (the last row of
Table 4.1 gives the form 100·weight/((M + N)/2)).
An example of an aircraft candidate representation is given in Figure 4.17. All information about wings, fuselage, nose, tail fins and wingtips is included in the
representation. Two additional slots, Killed and Killed by, are shown; they are
used during the interpretational conflict resolution process, which is the subject of
the following section.
Figure 4.16: Spurious hypothesis which is accidentally formed when three or more
wing edges are the extended lines of clutter edges.
4.2 Interpretational Conflict Resolution
AIRCRAFT HYPOTHESIS
Wing-pair Index: #
Type: Boomerang/Diamond/Triangle
Left Wing Index: #
Right Wing Index: #
Four Edge Indices: [#, #, #, #]
Corner Coordinates: [FP(#,#) RP(#,#) LC(#,#) RC(#,#)]
Distance from FP to RP (ie., || FP - RP ||): #
Distance from LC to RC (ie., || LC - RC ||): #
FP's Angular Deviation (θFP): #
Weight: #
Detected Nose Candidates: [#, #, ..., #]
Winning Nose Candidate Index: #
Fuselage Score: #
Detected Fuselage Edges: [#, #, ..., #]
Nose to Wing Connect: [left(1 or 0), right(1 or 0)]
Detected Wing Tips:[left(#), right(#)]
Detected Tail Fins:[left(#), right(#)]
Wing Tip Score: #
Tail Fin Score: #
Other Score: #
Total Score: #
Killed:1 or 0
Killed by (aircraft candidate index): #
Figure 4.17: Aircraft-hypothesis representation. The two slots Killed and Killed by
are used during the interpretational conflict resolution process. The slot Weight
contains the sum of the four line lengths. Note that the symbol # refers to a number.
based on which edges are conflicting. Edge conflicts are grouped into four commonly
occurring cases.
Case 1: The wing leading (or trailing) edges are shared. This often arises when the
wing leading or trailing edges cast shadows on the ground, and the shadow lines
are detected as the non-shared wing edges of the spurious hypothesis (shown
in red on the left side of Figure 4.18(a)). Sometimes the non-shared edges of
the spurious hypothesis come from the trailing edges of the tail fins (see the
right side of Figure 4.18(a)). If the image contains background clutter, then the
non-shared edges could also be clutter.
Case 2: One wing (i.e., two edges from one side of the fuselage) is shared (refer to
Figure 4.18(b)). The spurious hypothesis has the non-shared wing, arising from
the tail fin, clutter, shadow or fuselage.
Case 3: Three edges are shared (refer to Figure 4.18(c)). This scenario sometimes
occurs when the non-shared edge of the spurious wing-pair comes from the
rudder, background shadow or clutter.
Case 4: The two hypotheses share only the nose, and their fuselage axes coincide (refer
to Figure 4.18(d)). This arises when the shadows cast on the ground by the
wings are positioned directly behind the wings and form a wing-pair for the spurious
hypothesis.
To resolve the conflicts arising from the above scenarios, the following reasoning rules
are formulated. For convenience, we name the conflicting hypotheses HA
and HB, where HA and HB are the true and spurious hypotheses, respectively. The
hypothesis parameters such as θFP (see Figure 4.14), weight and ||FP - RP|| (see
Figure 4.17) are used in the following reasoning process.
Case 1: Two wing leading (or trailing) edges are shared.
IF ΔθFP between HA and HB is large,
    if θFP of HA is smaller and weight of HA is larger, then HA wins.
ELSEIF ΔθFP between HA and HB is small,
    if weight of HA is much larger, then HA wins,
    elseif the non-shared edges of HA are parallel to those of HB,
        then GO TO SHADOW REMOVAL ALGORITHM,
    elseif HA is boomerang-shaped and ||FP - RP|| of HA is smaller,
        then HA wins (e.g., right side of Figure 4.18(a)).
ENDIF
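The Case 1 rule above can be sketched as a function. The dictionary layout, the two thresholds, and the string return values are illustrative assumptions; only the ordering of the tests follows the rule:

```python
def resolve_case1(hA, hB, nonshared_parallel=False,
                  dev_gap_deg=10.0, much_larger=1.5):
    """Sketch of the Case 1 rule. Hypotheses are dicts with keys
    theta_fp (degrees), weight, fp_rp_len and boomerang; dev_gap_deg
    and much_larger are placeholders, not thesis values. Returns 'A'
    if hA wins, 'SHADOW' to invoke the shadow removal algorithm, or
    None when this rule cannot resolve the conflict."""
    if abs(hA["theta_fp"] - hB["theta_fp"]) > dev_gap_deg:
        if hA["theta_fp"] < hB["theta_fp"] and hA["weight"] > hB["weight"]:
            return "A"
    else:
        if hA["weight"] > much_larger * hB["weight"]:
            return "A"
        if nonshared_parallel:
            return "SHADOW"
        if hA["boomerang"] and hA["fp_rp_len"] < hB["fp_rp_len"]:
            return "A"
    return None
```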
Case 2: Left or right wing is shared.
IF ΔθFP between HA and HB is large,
    if θFP of HA is smaller, then HA wins.
ELSEIF ΔθFP between HA and HB is small,
    if HA is not boomerang-shaped, and HB is boomerang-shaped,
Figure 4.18: Edge-sharing conflict cases between a true and a spurious hypothesis: (a) two leading or trailing edges are shared (the non-shared edges arise from shadow or tail fins); (b) two edges from one wing are shared (the non-shared wing arises from the tail fin, clutter or shadow); (c) three edges are shared (the non-shared edge arises from the rudder, fuselage, clutter or shadow); (d) only the nose is shared and the fuselage axes are aligned (shadow wing-pair). In each case the spurious hypothesis is removed.
If the interpretational conflict is not resolved by these reasoning processes, then the
higher-scoring hypothesis wins. The surviving hypotheses are sorted by total score, and the five highest-scoring hypotheses reach the output stage of the generic
recognition. If a score exceeds a score threshold (e.g., 600), then the hypothesis is
accepted as a recognised aircraft.
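The final selection step can be sketched as follows; the threshold of 600 and the limit of five hypotheses come from the text, while the dictionary layout is an illustrative assumption:

```python
def final_recognition(hypotheses, top_n=5, score_threshold=600):
    """Sketch of the output stage: surviving (non-killed) hypotheses
    are sorted by total score, the top five reach the output stage,
    and those above the threshold are accepted as recognised aircraft."""
    survivors = [h for h in hypotheses if not h.get("killed", 0)]
    survivors.sort(key=lambda h: h["score"], reverse=True)
    return [h for h in survivors[:top_n] if h["score"] > score_threshold]
```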
4.3 Shadow Rejection
In images of parked aircraft, shadows cast by wings can potentially confuse the system.
Shadow detection is often addressed in building detection systems [76, 77, 128] and
also in aircraft recognition systems [32, 33, 81].
Figure 4.19: Shadow regions cast by wings: (a) mostly covered by the wings,
or (b) separated from the wings. The shadow wings have their symmetry axis
roughly aligned with the aircraft fuselage axis.
In our case, a spurious hypothesis containing shadow region(s) usually has a lower
score, and is therefore eliminated by the true hypothesis because of
the large score gap.
However, if the shadow lines of the spurious hypothesis fit well with the rest of the
aircraft structure, then its score may become large. Such a problem is illustrated in
Figure 4.19, where the shadow lines are parallel with the aircraft wing edges and fit
well with the fuselage.
In such a case, the conflict resolution algorithm invokes the shadow removal process.
The call from Case 1 is activated when two leading or trailing edges are shared, as
shown in Figure 4.19(a). The call from Case 4 is activated if the shadow regions are
separated from the wings and no lines are shared, as shown in Figure 4.19(b). Two
separate algorithms are developed to handle each case.
Figure 4.20: Interpretational conflicts arising from shadows cast by the wings: (a) a shared edge causing the conflict, together with the shadow region of the spurious hypothesis; (b) four regions of interest (Regions 1-4) for image intensity analysis, used to discriminate a spurious aircraft hypothesis comprising two correct wing edges and two shadow lines; (c) nose and fuselage shared, with no wing edges shared between the two aircraft hypotheses: the wing edges L1-L4 of the correct hypothesis (accepted) are parallel to the shadow lines L1'-L4' of the spurious hypothesis (rejected).
From Case 1: two leading (or trailing) edges are shared
In this case, two non-shared lines of the spurious hypothesis are the shadow lines (one
of which is shown in red in Figure 4.20(a)). The shadow lines cast by the wing
leading edges are hidden under the wing, and hence not visible in the image. The wing
on the far right side of Figure 4.20(a) belongs to the spurious hypothesis.
Referring to Figure 4.20(b), if the conflicting edges are the wing leading edges, then
regions 3 and 4 are selected for intensity level checks. If the sharing occurs at
the wing trailing edges, then regions 1 and 2 are examined. If the mean and
standard deviation of the intensity in the regions are both small, then the regions can
be regarded as shadow regions, and the hypothesis with the smaller ||FP - RP||
wins.
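The shadow test described above (small mean and small standard deviation) can be sketched as below; both thresholds are illustrative placeholders, since the thesis does not state their values:

```python
import numpy as np

def is_shadow_region(region, mean_max=60.0, std_max=15.0):
    """Sketch of the region-based shadow test: a region is treated as
    shadow when both its mean intensity and its standard deviation are
    small. mean_max and std_max are placeholders, not thesis values."""
    r = np.asarray(region, dtype=float)
    return bool(r.mean() < mean_max and r.std() < std_max)
```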
From Case 4: no wing edge is shared - nose is shared and fuselage axes
are well aligned
Figure 4.19(b) is the typical example where the call is made from Case 4: the wings
are slender and the sun is roughly along the aircraft longitudinal direction, so that
the shadow is separated from the wings in the image. This spurious wing-pair fits
well with the fuselage axis and has the correct nose.
In this case, the four wing edges of the aircraft hypothesis are compared with their
counterparts from the spurious hypothesis (refer to Figure 4.20(c)). Firstly, L1, L2,
L3 and L4 must be parallel with L1', L2', L3' and L4', respectively. Secondly, the
two wing-pairs must have approximately the same dimension. Lastly, the separation
between the two wing-pairs along the fuselage axis must not be much greater than
||FP - RP||. Provided that the aircraft is in a parked position (i.e., the distance between
the wings and the ground is relatively small) and the viewing angle is not too oblique,
the separation is usually less than 1.5||FP - RP||. If all of the geometric constraints
are satisfied, and if one of the hypotheses exhibits a roughly constant dark region,
then the spurious hypothesis is removed.
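The three geometric checks above can be sketched as one function. The 1.5·||FP − RP|| separation bound comes from the text; the angle and dimension tolerances, and the edge-direction representation, are illustrative assumptions:

```python
def case4_checks(angles, angles_shadow, span, span_shadow,
                 separation, fp_rp_len,
                 ang_tol=5.0, dim_tol=0.2, sep_factor=1.5):
    """Sketch of the Case 4 checks: edge directions (degrees) pairwise
    parallel, wing-pairs of similar dimension, and separation along the
    fuselage axis below sep_factor * ||FP - RP||. ang_tol and dim_tol
    are placeholders, not thesis values."""
    def para(a, b):
        d = abs(a - b) % 180.0
        return min(d, 180.0 - d) < ang_tol
    parallel = all(para(a, b) for a, b in zip(angles, angles_shadow))
    similar = abs(span - span_shadow) <= dim_tol * max(span, span_shadow)
    close = separation <= sep_factor * fp_rp_len
    return parallel and similar and close
```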
The proposed aircraft recognition system does not assume a severe regional overlap
between any two aircraft. Hence, if any pair of surviving hypotheses shows an excessive
regional overlap, then the one with the smaller score is eliminated. Figures 4.21(a)-(c)
provide a snapshot of the final competition stage. The coloured lines in the figures
correspond to the extracted aircraft features (i.e., wings, nose, fuselage and wingtips).
No tail fins are found for these three aircraft candidates. The aircraft candidate in
Figure 4.21(a) has a low score. Note how the true aircraft nose was mistaken for a
left wing by the system. The aircraft candidate in Figure 4.21(b) shares line features
with that of Figure 4.21(c). The wing symmetry axis (i.e., the line joining FP to RP)
of this aircraft candidate (from 4.21(b)) presents a large angular deviation from the
fuselage axis (measured as θFP). Therefore this hypothesis is dismissed from further
competition by updating its Killed and Killed by slots with one and the index of the
winning aircraft candidate, respectively. The aircraft candidate of Figure 4.21(a) is
also removed from the aircraft database because of its low score.
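The overlap-elimination bookkeeping described above can be sketched as below. The caller-supplied overlap measure, the 0.5 threshold, and the dictionary layout are illustrative assumptions; only the Killed/Killed-by slot updates follow the text:

```python
def eliminate_overlaps(hyps, overlap_of, max_overlap=0.5):
    """Sketch of the regional-overlap rule: for every pair of surviving
    hypotheses whose overlap (e.g. intersection-over-union of their
    silhouettes, computed by overlap_of(a, b)) exceeds the threshold,
    the lower-scoring one has its Killed slot set to 1 and records the
    winner's index in its Killed-by slot."""
    for i, a in enumerate(hyps):
        for b in hyps[i + 1:]:
            if a["killed"] or b["killed"]:
                continue
            if overlap_of(a, b) > max_overlap:
                loser, winner = (a, b) if a["score"] < b["score"] else (b, a)
                loser["killed"] = 1
                loser["killed_by"] = winner["index"]
    return hyps
```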
4.4
Figure 4.21: Examples of some competing aircraft candidates. Green lines correspond
to nose legs, red lines to wing edges and tips, blue line to fuselage axis, and cyan line
to wing symmetry axis.
Table 4.1: Scores obtained in the process of aircraft evidence accumulation. The first
six scores are dedicated to aircraft part detection, and the remaining evidence (the
7th to 18th entries) is introduced in order to help distinguish between the aircraft
and clutter hypotheses.
 n   Image evidence                                    Scores
 1   nose                                              100
 2   fuselage coverage (refer to Equation 4.1.1)       120 to 320
 3   wing-to-nose-connect                              [Left,Right] = [30, 30]
 4   excessive short fuselage boundary edges           -100
 5   wingtip                                           [Left,Right] = [40, 40]
 6   rear wings                                        [Left,Right] = [10 to 60, 10 to 60]
 7   wing leading edges overlap                        30
 8   wing trailing edges overlap                       30
 9   intensity matching in the boundary                10 for FP, 10 for RP and 20 for M
10   background-foreground intensity differences       -10 to -50
11   rear fuselage lines for boomerang wings           10 x (no. of lines)
12   no rear fuselage line for boomerang wings         -30 or -100 (long narrow wings)
13   many clutter lines crossing wing leading edge     -150
14   many clutter lines in fuselage region             50 (2 lines) down to -130 (18 lines)
15   FP deviation from fuselage axis                   -2 x θFP x 180/π
16   boomerang, RP1-RP2, and rear fuselage             -500
17   3-4 wings are fragmented in clutter               -30, -80 or -100
18   hypothesis weight                                 100 x weight/((M + N)/2)
normalised score gap is maximised. The score gap is measured as the normalised
difference of the score means (μT and μF) with respect to the standard deviations
(σT and σF),

    normalised score gap = |μT - μF| / (σT + σF)                        (4.4.1)
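Equation (4.4.1) can be computed directly from the two sets of winning-hypothesis scores; the use of the population standard deviation is an assumption, as the thesis does not specify the estimator:

```python
import statistics

def normalised_score_gap(true_scores, false_scores):
    """Equation (4.4.1): the gap between the mean winning-hypothesis
    scores on aircraft (T) and non-aircraft (F) images, normalised by
    the sum of the standard deviations. pstdev (population std) is an
    assumed choice."""
    mu_t = statistics.mean(true_scores)
    mu_f = statistics.mean(false_scores)
    sigma_t = statistics.pstdev(true_scores)
    sigma_f = statistics.pstdev(false_scores)
    return abs(mu_t - mu_f) / (sigma_t + sigma_f)
```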
It should be pointed out that testing the recognition system on non-aircraft images was
crucial for better adjusting the system parameters and introducing the many score
penalties resulting from negative evidence.
Major positive evidences that consolidate the aircraft hypothesis are the nose and
Figure 4.22: Histogram of the fuselage coverage score of the winning hypotheses using
a sample base of 300 real aircraft images.
fuselage coverage. The fuselage score of the winning hypothesis (as shown in the
second row of Table 4.1) varies roughly from 120 to 320. Figure 4.22 shows the
histogram of the fuselage score of the winning hypothesis based on a set of 300 real
aircraft images.
The system suffers a large score drop if too many line fragments are found in the
fuselage region (refer to the fourth row of Table 4.1) or in the vicinity of the wing
leading edges (ninth row). For aircraft with boomerang shaped wings, the region
behind the wings is searched for any fuselage lines. It was observed that the wing
and fuselage were often paired to generate false boomerang wing-pairs (see Figure
4.15). Checking for the existence of rear fuselage edges plays a vital role in
removing many spurious candidates. Considerable score drops are hence introduced,
as shown in the 12th and 16th rows of Table 4.1. These scores in Table 4.1 are fixed
for the remainder of the experiments.
4.5
In Section 1.1 of Chapter 1, 28 representative images were selected from the image set
and were presented to help visualise the difficulties encountered in automatic aircraft
recognition. These difficulties were broadly divided into 7 categories: blurring, camouflage, clutter, closely placed multiple aircraft, occlusion, protrusion and shadowing
effects. For the experiments reported here, four images are allocated to each category.
The outcome of these runs is discussed in terms of those seven categories.
Blurring: Figures 4.23 to 4.26 feature aircraft aerial photographs that show a substantial amount of blurring and noise. Most of the wing edges appear distorted or
present low contrast. The application of the dual-threshold edge detector, combined with
the line extension algorithm, enabled the extraction of wing edges and eventually led
to the successful generic recognition of the aircraft. Figure 4.24 poses a particular
challenge as the background appears densely cluttered. However, the clutter rejection
algorithm described in Section 3.3 was activated and removed portions of the clutter. Note that this image resulted in two surviving hypotheses (see Figure 4.24(d))
where, because of non-overlap, the spurious hypothesis did not have to compete with
the true hypothesis. However, the score of the spurious hypothesis was well below
the threshold, and it was therefore rejected in the final recognition stage. Notice finally
that all four aircraft have at least one wing edge extended.
Camouflage: Figures 4.27 to 4.30 present fighter jets with camouflage. The outline
of the camouflage patches forms a number of T-junctions with the wing edges,
causing wing edge fragmentation. However, given that the line fragments are proximal and roughly collinear, they are usually extended to form longer wing edges, as
shown in column (c) of Figures 4.27 to 4.30. The camouflage patches are relatively
large in size, marginally satisfying the intensity check required to form two-line groupings. The aircraft image in Figure 4.30 has a camouflage pattern which resembles the
sand dune background. This makes it difficult for region segmentation-based methods to distinguish the aircraft from the background region. Our approach, which is
based primarily on geometric reasoning, shows some success in detecting camouflaged
aircraft in such images.
Background Clutter: Figures 4.31 to 4.34 present aircraft images with cluttered
backgrounds. The first two images (Figures 4.31 - 4.32) contain dense clutter that is
successfully filtered during the contour extraction process. The winning hypothesis
in Figure 4.32(d) has the correct wings, but the nose is located at the cockpit. This
occasionally occurs when the nose is either not detected or unable to compete with
a false nose arising from the cockpit structure. In many applications (e.g., defence),
not locating the true aircraft nose is not a major issue, as the cockpit is close enough
to the true aircraft nose and aligned with the fuselage axis.
In Figure 4.33, the clutter lines are polarised; therefore, the clutter removal algorithm
is insensitive to them. However, the polarised lines are assigned zeros in
their significant slots (refer to the table in Figure 3.9), and hence are prevented
from undesirably forming wings. Figure 4.33(d) shows that the winning hypothesis
is correct, and most of the aircraft parts are correctly detected. In Figure 4.34, the
background contains vegetation fields. The clutter in the edge image of Figure 4.34(b)
is not dense enough for the clutter removal algorithm to take effect. As a result,
clutter lines are allowed to form pairs and compete with the true wings. Despite this,
spurious aircraft hypotheses based partially or completely on clutter lines did not
build up enough confidence score to eliminate the correct aircraft hypothesis, which
is shown in Figure 4.34(d).
Multiple Aircraft: Figures 4.35-4.38 show closely spaced multiple aircraft in the
scene. Our system only accepts the five highest-scoring hypotheses, because we
assume that the image does not contain more than five aircraft. Often one aircraft
part may be associated with parts from an adjacent aircraft and generate spurious
hypotheses. These hypotheses always give rise to interpretational conflicts because
part sharing is inevitable. However, spurious hypotheses can usually be eliminated
by the true hypothesis during the conflict resolution stage, unless the true aircraft
hypothesis suffers from clutter effects and occlusion. All of the images in Figures
4.35-4.38 generate the correct winning hypotheses (up to 4 aircraft), with most of their
parts correctly detected and labelled.
wing leading edge. The extended line could not form a line-pair because the intensity
check was not successful. Figure 4.42 is the bottom view of an aircraft with missiles
under the wings. These missile protrusions caused the fragmentation of all four wing
edges. However, the fragmented wing edges were successfully extended, and appear
as extended lines in the winning hypothesis.
Protrusions: Figures 4.43-4.46 contain top-view images of aircraft that carry missiles
or have engines. Only the protruding parts of the missiles and engines are visible,
and the remaining portions are hidden under the wings, not cluttering up the wing
region. Therefore, the wing edges usually satisfy the intensity constraints required by
the two-line grouping formulation.
The engine protrusions could be used to determine if the aircraft is a large commercial
airplane (eg., Boeing 747). In Das and Bhanu [33], the engine feature is embedded
in the model hierarchy and used for aircraft classification. This is feasible only if
the engines can be detected and recognised reliably. In our application, where the
background could possibly be noisy and cluttered, it would be difficult to distinguish
engines from clutter. Therefore, protrusions are treated as clutter.
The aircraft in Figure 4.45 has all four wing edges fragmented; the leading edges are
broken into four or five segments due to missile launch rails. This results in numerous
combinations of line extensions, with several wing-pair candidates delimiting the same
aircraft wings. Usually the wing-pair composed of the extended lines presents the
highest score and emerges as the winner.
Shadows: Figures 4.47-4.48 present self-cast shadows on aircraft. These shadows
generate dark regions on the aircraft body. Any aircraft recognition approach that
is based primarily on regional intensity information (e.g., region segmentation-based
methods) may suffer when the aircraft body contains shadow, possibly splitting the
aircraft region into subregions. Figures 4.49-4.50 show examples of aircraft casting
their shadows on the ground. In this case, we do not attempt to detect the shadows
as in Das and Bhanu [33], Nevatia [92], Lin [76], and Marouani [81]. Shadows in the
background are treated as clutter, and they frequently fail to form wings. Furthermore, coincidental hypotheses formed by shadow lines usually result in low-score
hypotheses. However, if the shadow lines fit in well with nearby aircraft structure,
then a hybrid part-aircraft, part-shadow hypothesis may gain a high score and cause a
conflict with the true hypothesis. When this occurs, the shadow rejection algorithm
described in Section 4.3 is invoked and eliminates the spurious hypothesis. Figures
4.47(d) and 4.48(d) show that no parts from the shadows are included in the winning
hypotheses.
Finally, before proceeding to the next section, it should be pointed out that six additional demonstrations of generic aircraft recognition, as applied to scaled-down aircraft, are provided in Chapter 5 (refer to Figures 5.23(b) - 5.28(b)). These images
were obtained in a controlled environment where the effects of shadow, blurring, protrusion, camouflage, clutter and occlusion were deliberately introduced.
4.6
In the previous section, the recognition performance of the system was discussed using
real aircraft images, focusing on how the system detects aircraft under various adverse
imaging conditions. These figures show that if an aircraft exists in an image, then
false hypotheses arising from the background and from aircraft-background associations
are usually defeated by the true hypothesis. This raises the question as to whether
such false hypotheses would survive as winning hypotheses if there were no aircraft in
the image. This motivated us to consider non-aircraft images containing natural
and man-made structures. We are not aware of any previous attempt that includes
non-aircraft images in the performance analysis.
In this section, a number of experimental results are given in order to give some
idea about the types of non-aircraft test images selected, and to show how accidental
winning hypotheses appear in these images. Figure 4.51 shows 12 images along with
the scores of the winning hypotheses. The images contain aerial views of buildings,
runways, vegetation farms, roads and coast, as well as a number of cloud scenes.
These examples demonstrate that images of structured clutter are likely to generate
false winning hypotheses for any vision system that relies mostly on line features.
In Figure 4.51, the regions enclosed by the detected false hypotheses do not show heavy
clutter. Slight intensity variations are usually accepted in the aircraft hypothesis
generation, because many camouflaged or shadowed aircraft regions present similar
intensity variations.
None of the winning hypotheses in Figure 4.51 contain heavy clutter within the wing
and fuselage regions, implying that the system successfully penalised any spurious
hypothesis whose silhouette contains dense clutter. Some of the false hypotheses
present short line fragments as their wing edges; this is allowed because the system
assumes from the start that the aircraft could be occluded, or its wing edges could
be fragmented or partly washed away. However, most such hypotheses (as shown
in Figure 4.51) do not have a high score, and therefore fail to emerge as the winning
hypothesis. The only exception is the hypothesis in Figure 4.51(f), which presents
good geometric attributes and well-enclosed boundaries, resulting in a false
recognition. However, this hypothesis fails to form when the neural networks in
4.7
Discussion
The generation process of an aircraft hypothesis, along with its confidence score, is
presented in detail in this chapter. A confidence score is a reflection of both the positive
and negative evidence gathered by the hypothesis. The most important evidence for
an aircraft hypothesis is the presence of the fuselage section, whose detection process
is described at length in this chapter. Once the fuselage section is extracted, an
aircraft candidate is consolidated and the fuselage axis is refined for the purpose of
cueing the search for other aircraft parts (i.e., wing tips and tail fins) and initiating
other geometric and intensity-based verification processes. Furthermore, the fuselage
axis information is also used in the next chapter for aircraft viewpoint estimation.
The verification of generic aircraft hypotheses calls for testing the conditions of the
rules. It is essentially a reasoning process based on evidence accumulation to infer
the presence of an aircraft instance in the image.
Key features of the verification step are the gradual accumulation of evidence through (a)
part detection and association (positive evidence) and (b) clutter feature detection
(negative evidence), and ambiguity resolution by re-visiting the constraints and
discriminating shadow lines. The most dominant evidence is the fuselage (fore-body)
coverage, because it brings together the more consistently visible parts, wing and
nose, while discarding large portions of nose and wing-pair candidates. Clutter evidence is also sought in order to penalise spurious hypotheses that have been accidentally
formed amongst clutter, and to increase the score gaps between the true and spurious
hypotheses.
The feature hierarchy is important in that the system can access features from different levels when necessary. Higher level features include pointers to their component low
level features. Therefore the constituent features and evidence of a hypothesis can be
retrieved easily through those pointers.
If the confidence score exceeds a preset threshold, then the system declares recognition of a generic aircraft. The recognition comes with the shape/intensity information
embedded in the winning hypothesis (as shown in Figure 4.17).
Please see print copy for Figure 4.23
Figure 4.51: Examples of spurious hypotheses from non-aircraft images when the
rule-based line grouping method is used. The spurious hypothesis in (f) survives as
its score exceeds the threshold. However, with the neural network based line grouping
method, this spurious hypothesis fails to form.
Chapter 5
Aircraft Pose Estimation and
Identification
So far, the generic aircraft recognition step has been concerned with generating and verifying
hypotheses based on evidence derived from generic knowledge of aircraft structure
(Table 1.1). More accurate aircraft recognition (i.e., identification) can be obtained
if specific aircraft models are used in the recognition process. Model matching involves the overlay of complete silhouette boundaries, hence allowing previously
missed primitive features (e.g., a missed rudder edge) to contribute to the recognition
process. By matching the winning aircraft hypotheses to pre-stored aircraft models,
aircraft identification (e.g., F16 or F18, as shown in Figure 5.1(a)-(b)) is possible. It
is shown in the literature [88] that techniques that do not use models face limitations
in object discrimination capabilities, suggesting the need for some form of matching technique for
further verification and identification.
In this system, the inputs to the identification stage are the winning aircraft candidates
(up to five), which provide a wealth of information such as the aircraft longitudinal orientation, position in the image, wing shape, and wing leading and trailing edge labels.
Having all this information, model matching no longer involves an exhaustive search
process. Only a portion of selected image features and model candidates (out of the
entire model set) are required in the matching process.
Model matching can be performed either at the feature or pixel level. Matching using
line features calls for correspondence of model lines, after being transformed and
projected, with their image counterparts, and assessing the degree of match between
them [6, 37, 60]. This approach may be more robust in the presence of clutter and
partial occlusion, but performance may suffer if some salient lines are missed during
the feature extraction process. On the other hand, the pixel level matching approach
is more tolerant to poor imaging quality and image processing deficiencies, but can
be susceptible to noise and clutter.
This chapter begins with a review of matching metrics that could be used in our
application. Section 5.1 discusses these matching metrics. In Section 5.2, the three
dimensional model generation and the pose estimation algorithm are presented. Section 5.3 describes the model and image alignment process. Section 5.4 proposes a
fitting metric for the model matching. Section 5.5 discusses a pose fine-tuning process and a search strategy for the best match. Section 5.6 presents illustrative results
using six images of scaled-down aircraft. The purpose of this section is to visually
demonstrate the identification performance of the matching technique. Section 5.7
summarises the chapter.
5.1
Matching Metrics
Model matching is the last processing step in object recognition. A high degree of
match between a model instance and its image provides a strong confirmation of the
object's presence in the image, and allows viewpoint determination.
Feature-based matching methods call for the correspondence of model and image
features (eg., lines) by optimising some fitting metric. Fairney [37], Huttenlocher and
Ullman [60], and Beveridge [5] use lines for 2-D matching. The line-based matching
methods are computationally less expensive than their pixel-based counterparts and
are more robust against clutter.
Pixel level matching techniques [27, 41, 68, 97] are widely used for 2-D shape
matching applications, where a projected model can be matched with the gradient
image [68], distance transformed image [41] or the edge image [97]. The downside of
these methods is their degraded performance in the presence of excessive clutter. This
issue is, however, addressed in [97], where a modified Hausdorff measure was used to
help reject clutter and improve matching speed. In [108], any discrepancy of pixel
orientation between the model and image point pairs imposes a penalty on the fitting
score. In the remaining part of this section, six fitting metrics are presented in some
detail: integrated squared perpendicular distance, distance ratio standard deviation,
circular distribution of matched pixels, distance transform, Hausdorff distance and
averaged dot product of direction vectors.
5.1.1
Integrated Squared Perpendicular Distance
As for fitting a transformed model to the corresponding image data, the most obvious
way would be to minimise the sum of squared distances between the corresponding
Figure 5.2: An image segment (endpoints P1 and P2, length L) and the perpendicular distances d1, d(t) and d2 from it to the infinitely extended model segment.
points from the model and image data sets. Since using the sum of squared distances
as a fitting measure is prone to edge fragmentation in the image, others have proposed the use of point-to-line distance to accomplish fitting [1, 70]. Beveridge [5]
introduced a fitting criterion, called the Integrated Squared Perpendicular Distance
(ISPD) between image segments and model lines that are infinitely extended during
the fitting process. Referring to Figure 5.2, the perpendicular distances from the
image segment endpoints P1 and P2 to the model segment, which is infinitely extended, are
labelled d1 and d2, respectively. The perpendicular distance d(t) from any point on
the image segment to the extended model segment can be expressed as
d(t) = d_1 + (d_2 - d_1)\,\frac{t}{L}, \qquad 0 \le t \le L    (5.1.1)
where t is a position parameter along the image segment and L is the length of the
image segment.
The definite integral of d^2(t) over L generates the ISPD,

\mathrm{ISPD} = \int_0^L d^2(t)\,dt = \frac{L}{3}\left(d_1^2 + d_1 d_2 + d_2^2\right)    (5.1.2)
This ISPD is calculated and summed over all pairs of model-image line segments, and
\frac{1}{L_m} \sum_{s \in C} \mathrm{ISPD}(s)    (5.1.3)
where Lm is the sum of all model segment lengths, and s is the segment index in the
model line set C.
In addition to the fitting error, Beveridge also includes omission error, pairwise error,
and transformation error in order to form the total match error. The omission error
is the fraction of the model segment not covered by the corresponding image segment,
and is within the range [0,1]. The pairwise error is an increasing function of orientation
difference between the segment pair. The transformation error is introduced if the
scale change associated with the transformation under weak perspective projection is
too large.
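The fitting error of Equations 5.1.1-5.1.3 can be sketched in a few lines of Python. This is a minimal illustration rather than the thesis implementation; the function names and the endpoint-pair representation of segments are assumptions.

```python
import numpy as np

def perpendicular_distance(point, line_p, line_q):
    """Perpendicular distance from `point` to the infinite line
    through `line_p` and `line_q`."""
    p, q, x = map(np.asarray, (line_p, line_q, point))
    d = q - p
    # 2-D cross product magnitude divided by the line-direction norm.
    return abs(d[0] * (x - p)[1] - d[1] * (x - p)[0]) / np.linalg.norm(d)

def ispd(image_seg, model_seg):
    """Integrated squared perpendicular distance (Equation 5.1.2).
    Segments are ((x1, y1), (x2, y2)) endpoint pairs; the model
    segment is treated as infinitely extended."""
    p1, p2 = image_seg
    d1 = perpendicular_distance(p1, *model_seg)
    d2 = perpendicular_distance(p2, *model_seg)
    L = np.linalg.norm(np.asarray(p2) - np.asarray(p1))
    return L / 3.0 * (d1 ** 2 + d1 * d2 + d2 ** 2)

def total_ispd(pairs):
    """ISPD summed over matched (image, model) segment pairs and
    normalised by the total model segment length (Equation 5.1.3)."""
    Lm = sum(np.linalg.norm(np.asarray(m[1]) - np.asarray(m[0]))
             for _, m in pairs)
    return sum(ispd(i, m) for i, m in pairs) / Lm
```

Because the model line is extended, a short fragmented image segment lying on the same line still scores zero error, which is what makes the measure tolerant of edge fragmentation.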
5.1.2
Distance Ratio Standard Deviation
Chien and Aggarwal [27] proposed a 3-D shape recognition technique based on corners
and contours as primary features instead of straight lines. The corner points are
extracted from the image contours and also from the 3-D model. Each four-point
correspondence (between four 2-D image points and four 3-D model points) is
used to generate a transform hypothesis, which is then verified using constraints
associated with the rotational parameters (refer to Section 2.3.4 for details).
Given the contours of the image and projected model that are aligned according to
the estimated transformation, first the model and image contour centroids (C and
C′) are computed, then the principal axes (P and P′) are determined (refer to Figure
5.3). Each contour is then sampled to generate Nc boundary points, mk and ik where
k = 1 . . . Nc . For each pair of points of the same index k, the distance from the
Figure 5.3: Projected model and image boundaries used for calculation of the distance
ratio standard deviation.
centroids to the points are computed and the ratio of the distances is obtained,

r_k = \frac{d(C, m_k)}{d(C', i_k)} = \frac{\|C - m_k\|}{\|C' - i_k\|}    (5.1.4)
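A minimal sketch of the distance-ratio computation behind Equation 5.1.4 follows. Taking the standard deviation of the ratios as the match measure is inferred from the subsection title rather than stated explicitly here; the function names are illustrative.

```python
import numpy as np

def distance_ratios(model_pts, image_pts):
    """Ratio of centroid-to-boundary distances for index-paired
    contour samples m_k and i_k (Equation 5.1.4)."""
    m = np.asarray(model_pts, dtype=float)
    i = np.asarray(image_pts, dtype=float)
    dm = np.linalg.norm(m - m.mean(axis=0), axis=1)   # d(C, m_k)
    di = np.linalg.norm(i - i.mean(axis=0), axis=1)   # d(C', i_k)
    return dm / di

def ratio_std(model_pts, image_pts):
    """Standard deviation of the distance ratios; near zero when the
    contours differ only by translation and uniform scale."""
    return float(np.std(distance_ratios(model_pts, image_pts)))
```

A well-aligned pair of similar contours yields nearly constant ratios, so a small standard deviation indicates a good shape match.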
Figure 5.4: Circular distribution of matched pixels, (a) good match between the
model and image boundaries, (b) poor match resulting in an uneven distribution of
points.
5.1.3
Marouani [81] proposed an algorithm where a short list of aircraft hypotheses is
generated by fine tuning the translation to find the best line-to-line fit between the
image and model segment sets. He uses a 3-D accumulator where the two horizontal axes
denote 2-D translation, and the vertical axis represents accumulated votes from the
line pairs. Each vote is weighted according to the linear and angular separations of the
segment pair and their length difference. It should be noted that this method extracts
the line segments using the LINEAR feature extraction system [94], and assumes that
the viewing angles are known.
The validation procedure is based on the argument that the matched image segments
have to be evenly distributed over the model. Each part of the aircraft (wings, nose,
tail and fuselage) must have a compatible proportion of matched segments in terms of
arc length. Firstly, the model outline is scanned, and for each model outline segment
the corresponding matched image segment is projected onto it. The total arc length
of the model is used to scale the binary match function modulo 2π in order to map this
function onto a circle of radius 1. Matched pixel pairs assigned 1s are mapped to
pixels on the perimeter of the circle (see Figure 5.4). A good match results in
points which are densely and evenly distributed around the circle, as shown in Figure
5.4(a), and a poor match results in unevenly distributed points, as shown in Figure
5.4(b).
The evaluation of this distribution is achieved by introducing three parameters:
namely eccentricity (< 7%), length of match (> 50%), and displacement (< 20%).
These parameters are defined using the second order moments of the point coordinates
on the circle and the eigenvalues of the Hessian matrix consisting of the moments.
Detailed descriptions on the generation of these parameters can be found in [81].
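The circular-distribution check can be sketched as follows: matched arc-length fractions of the model outline are mapped onto the unit circle, and the eccentricity of the resulting point cloud is computed from the second-order central moments. The moment-based eccentricity formula used here is a standard choice assumed for illustration, not necessarily the exact definition in [81].

```python
import math

def circular_eccentricity(arc_fractions):
    """Map arc-length fractions in [0, 1) to unit-circle points and
    return the eccentricity of their second-order central moments:
    0 for a perfectly even spread, approaching 1 for clustered points."""
    pts = [(math.cos(2 * math.pi * f), math.sin(2 * math.pi * f))
           for f in arc_fractions]
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    # Second-order central moments of the point distribution.
    mxx = sum((p[0] - cx) ** 2 for p in pts) / n
    myy = sum((p[1] - cy) ** 2 for p in pts) / n
    mxy = sum((p[0] - cx) * (p[1] - cy) for p in pts) / n
    # Eigenvalues of the 2x2 moment matrix via trace/determinant.
    tr, det = mxx + myy, mxx * myy - mxy ** 2
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    lam1, lam2 = tr / 2 + disc, tr / 2 - disc
    return math.sqrt(max(1.0 - lam2 / lam1, 0.0))
```

Evenly spread matches give equal eigenvalues (a circular point cloud), whereas matches concentrated on one part of the outline give a strongly elongated cloud and eccentricity near one.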
5.1.4
Distance Transform
Given a binary edge image, each non-edge pixel is given a value that is a measure
of the distance to the nearest edge pixel. The edge pixels are assigned the value
zero. The operation converting a binary edge image to a distance image is called the
distance transform (DT) [12, 13, 41, 42, 114].
Figure 5.5 shows a sample binary pattern and its true Euclidean distance transform.
There are other approximations of the Euclidean distance measure, including the
chamfer 2-3 or 3-4 metrics [13, 114]. Typical model matching with a distance
transform image, denoted as I, is shown in Figure 5.6. This image, I, is correlated with the binary model edge template denoted by T. The average of the pixel values
of I that the edge pixels of T overlay is the measure of correspondence between the
Figure 5.5: A binary edge image (on the left) and its Euclidean Distance Transform
(on the right).
Figure 5.6: Computation of the Chamfer distance - model edge image (template) is
superimposed on the DT image, and the values in the shaded (blue) entries read the
distance between the model edges and the image edges.
template and the image. This correspondence measure, the chamfer distance, is given by

D(T, I) = \frac{1}{|T|} \sum_{t \in T} d_I(t)    (5.1.5)
where |T | denotes the number of edge pixels in T and dI (t) denotes the distance
between the template edge pixel t and the closest image pixel.
A perfect fit between the two edges will result in the chamfer distance of zero. The
matching process is to minimise the chamfer distance to find the best fit. The resulting
best match is accepted if the distance measure D(T, I) is less than a specified threshold
(ie.,D(T, I) < dth ).
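A minimal sketch of chamfer matching based on Equation 5.1.5 is given below. A brute-force Euclidean distance transform is used for clarity; a practical system would use the chamfer 2-3/3-4 approximations or a library routine such as SciPy's `distance_transform_edt`. The function names and template representation are assumptions.

```python
import numpy as np

def distance_transform(edge_img):
    """Per-pixel Euclidean distance to the nearest edge pixel of a
    boolean edge image (brute force, for illustration only)."""
    edges = np.argwhere(edge_img)              # (row, col) of edge pixels
    h, w = edge_img.shape
    rows, cols = np.mgrid[0:h, 0:w]
    grid = np.stack([rows, cols], axis=-1).reshape(-1, 1, 2)
    d = np.linalg.norm(grid - edges.reshape(1, -1, 2), axis=2)
    return d.min(axis=1).reshape(h, w)

def chamfer_distance(template_pixels, dt_img, offset=(0, 0)):
    """Average DT value under the template edge pixels placed at
    `offset` (Equation 5.1.5); zero means a perfect edge-to-edge fit."""
    vals = [dt_img[r + offset[0], c + offset[1]] for r, c in template_pixels]
    return sum(vals) / len(vals)
```

Matching then amounts to sliding the template (varying `offset`, and in general the full transformation) and keeping the placement that minimises the chamfer distance.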
Borgefors [13] proposed a matching method called the Hierarchical Chamfer Matching
Algorithm (HCMA) in order to reduce the computational load. The method embeds the
chamfer matching algorithm into a hierarchical structure, a resolution pyramid which
includes a number of versions of the image at lower resolutions. The matching starts at a
very low resolution, and the results of the low resolution matching are used to guide
higher resolution matching processes. Apart from the reduction of computational
complexity, this algorithm has also shown improved robustness against noise and
other imaging artifacts.
Gavrila [41, 42] proposed another efficient method, which also implements a coarse-to-fine approach in shape and parameter space, and incorporates a multistage segmentation. Firstly, a model template hierarchy is generated off-line, where a set of
similarly shaped templates is grouped and represented by a single prototype template.
Iterations of this grouping and prototype generation complete the construction of the
template hierarchy. Online, the actual matching takes place following the coarse-to-fine
approach in terms of the template hierarchy and transformation parameters. The speed
gain from this approach in comparison to the brute-force DT formulation is of several
orders of magnitude.
5.1.5
Hausdorff Distance
The Hausdorff distance is mainly applicable to image matching, and is used in image
analysis, visual navigation of robots, computer-assisted surgery, and so on. The
Hausdorff metric serves to check if a template image is present in a test image. It is
defined as the maximum distance of a point in a set to the nearest point in another
set. Given a set of points A and another set of points B, the directed Hausdorff
distance from A to B is given as
h(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\|    (5.1.6)
where a and b are points in A and B, respectively. This directed Hausdorff distance is
oriented, which means that h(A, B) is not equal to h(B, A). Therefore, the definition
of the Hausdorff distance between A and B (not from A to B) would be
H(A, B) = max (h(A, B), h(B, A)).
(5.1.7)
Figure 5.7 provides a simple illustration of the Hausdorff distance between two ellipses. It is clear that a better fit between the two ellipses results in a smaller Hausdorff
distance. One downside of using the Hausdorff distance is its sensitivity to noisy
pixels. If one set of pixels contains a single noise point which happens to be far from
the points in the other set, then it will cause H(A, B) to be excessively large. This
sensitivity makes the classical definition of the Hausdorff distance impractical. A
more appropriate way to overcome this problem is to alter Equation 5.1.6 to
h_f(A, B) = f^{\,th}_{a \in A}\, \min_{b \in B} \|a - b\|    (5.1.8)

where f^{\,th}_{a \in A} denotes the f-th quantile value of \min_{b \in B} \|a - b\| over the set A, for some
value of f between zero and one. When f = 0.5, Equation 5.1.8 becomes the modified
median Hausdorff distance. Huttenlocher provided a good approximation algorithm,
which proved highly efficient [59]. He developed some pruning techniques that reduce
Figure 5.7: Hausdorff distance shown for two point sets of ellipses. The ellipse pair
on top is better fitted, and results in the smaller H(A, B).
the running time significantly, with three speed-up techniques (ie., ruling out circles,
early scan termination and skipping forward). These are used in combination in order to rule out many possible relative positions of the model and the image
without having to explicitly consider them. Also, using the modified definition as in
Equation 5.1.8, the system robustness against small image perturbations and missing
features improved (as this allows for partial shape matching).
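Equations 5.1.6-5.1.8 translate directly into code. The sketch below computes the directed, symmetric and partial (quantile) Hausdorff distances by brute force over (N, 2) point arrays; the efficient pruning techniques of [59] are deliberately omitted.

```python
import numpy as np

def directed_hausdorff(A, B):
    """h(A, B) = max over a in A of min over b in B of ||a - b||
    (Equation 5.1.6)."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance (Equation 5.1.7)."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

def partial_hausdorff(A, B, f=0.5):
    """f-th quantile of the nearest-neighbour distances
    (Equation 5.1.8); f = 1 recovers h(A, B), f = 0.5 the median
    variant."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2).min(axis=1)
    return float(np.quantile(d, f))
```

The quantile form makes the single far-away noise point discussed above harmless: with f = 0.5 a lone outlier no longer dominates the distance.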
Huttenlocher also conducted a Monte Carlo comparison study of the distance transform (Chamfer distance) and Hausdorff distance matching measures [58]. The algorithms were tested on synthetic images where clutter and occlusion were varied. The
performance comparisons of the DT (Chamfer) and Hausdorff measures were presented in terms of Receiver Operating Characteristic (ROC) curves that measure the
detection rate versus false alarm rate. The test results indicated that the Hausdorff
measure was better than the Chamfer measure.
Olson and Huttenlocher [98] presented a modified version of the Hausdorff measure
which uses both the location and orientation of the model and image pixels in determining how well a target model matches the image at each position. To do this,
the target models and images are represented as sets of oriented edge pixels. The
distance term \|a - b\| in Equation 5.1.8 is replaced by \max(\|a - b\|, \lambda\|\hat{a} - \hat{b}\|), where \hat{a}
and \hat{b} are the pixel orientations of a and b, respectively, and the weighting \lambda is used to make the
values of \|a - b\| and \lambda\|\hat{a} - \hat{b}\| comparable. The modified measure
was tested on the synthetic images used in [58]. The ROC curves showed that the
modified version yields improved robustness against clutter and reduced false alarm
rates. Furthermore, the use of orientation information has also been shown to speed
up the recognition process.
5.1.6
Averaged Dot Product of Direction Vectors
Steger [107, 108] compared performances of the distance transform, Hausdorff and
Hough Transform method against occlusion, clutter and illumination variations, and
proposed a new match metric that is robust against occlusion and clutter.
Given an image and its transformed 2-D model, the match metric at a particular
reference point q in the image is computed as an averaged dot product of contour
direction vectors of the transformed model and image over all points (ie., pi where
i = 1 . . . n) of the model,
s = \frac{1}{n} \sum_{i=1}^{n} \frac{\langle u'_{M(i)},\, u_{I(q+p'_i)} \rangle}{\|u'_{M(i)}\| \, \|u_{I(q+p'_i)}\|}    (5.1.9)
where u'_{M(i)} is the direction vector of the ith point, p'_i, on the transformed model, and
u_{I(q+p'_i)} is the direction vector of the image point whose location, measured with respect
to the reference point q, corresponds to the transformed coordinate of p_i (ie., p'_i). If a
transformed model is overlaid on a very dense and randomly oriented clutter image,
then, as the dot products will have positive and negative values, the average will be
small (ie., s ≈ 0). The threshold can be adjusted according to how much occlusion the
system is willing to accept. This match metric is inherently robust against occlusion
and clutter, because any missing part would not corrupt the value of the average
substantially, and the use of dot products assigns a large weighting to a pixel pair
if their directions are the same, and reduces the weighting if the pixel directions are
different. If the directions differ by more than 90°, the pixel pair is penalised.
Steger [108] demonstrates that this approach achieves high recognition rates, when
applied to flat objects in a controlled environment. This method appears to be
more adequate for controlled industrial settings, where the object is flat, its shape is
well defined, the object rotation is limited to a single axis rotation, and the model
description is precise. Therefore, the direct application of this metric to our aircraft
application would not be appropriate.
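For illustration, the metric of Equation 5.1.9 can be sketched as below. Representing the image as a mapping from pixel coordinates to direction vectors is an assumption made for brevity; in practice the direction vectors would come from the gradient or phase image.

```python
import numpy as np

def steger_score(model_points, model_dirs, image_dirs, q=(0, 0)):
    """Averaged normalised dot product (Equation 5.1.9).
    model_points: transformed model pixel offsets p'_i;
    model_dirs:   direction vectors u'_M(i);
    image_dirs:   dict mapping (x, y) to image direction vectors.
    Pixels with no image direction contribute zero (missing or
    occluded evidence does not corrupt the average)."""
    total = 0.0
    for p, um in zip(model_points, model_dirs):
        loc = (q[0] + p[0], q[1] + p[1])
        ui = image_dirs.get(loc)
        if ui is None:
            continue                     # occluded / missing pixel
        um, ui = np.asarray(um, float), np.asarray(ui, float)
        total += um @ ui / (np.linalg.norm(um) * np.linalg.norm(ui))
    return total / len(model_points)
```

Aligned directions contribute +1 per pixel, perpendicular directions contribute nothing, and directions differing by more than 90° subtract from the score, mirroring the penalty described above.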
5.1.7
Discussion
So far, we have looked at six matching metrics relevant to our application. These
metrics are based on either lines or points and have different strengths and weaknesses.
As a first step of a matching algorithm design, the feature level (point, line, curvature,
etc) at which the model-to-image matching should be carried out needs to be decided.
In this subsection, we consider some practical issues associated with such pixel and
line based matching metrics.
If one assumes that the decomposition of a model outline can lead to connected lines
that are identical to those extracted from the image, then the mapping between the
model and image line features would be one-to-one. However, such an assumption
Figure 5.8: Examples of one-to-many and many-to-many mappings. (a) One model
line is mapped to many image line fragments (eg., c → {8, 9}, d → {10, 11, 12}). (b)
When a curve is approximated with a series of straight line segments, the resulting
mapping is likely to be many-to-many.
is not realistic, as illustrated in Figure 5.8(a); object edges often appear fragmented
due to poor imaging conditions or processing inefficiencies. In this case, the model-to-image mapping needs to be one-to-many. Another example is provided in Figure
5.8(b), where a parabolic curve (eg., aircraft nose shape) is approximated by a sequence of straight line segments. As shown in the figure, the exact points at which
the curve is broken into successive segments can vary. Therefore, as is clear in Figure
5.8(b), one model line may map to two or more data line segments, just as one data
line segment may map to a number of model line segments.
According to Beveridge [5], the majority of object recognition systems presume a
one-to-one mapping, and only a few permit one-to-many mappings. Beveridge's ISPD
fitting metric (in Section 5.1.1) accommodates both one-to-many and many-to-many
mappings, and achieves reliable matching with fragmented data. Nonetheless, the
effectiveness of this method has been tested only on object shapes having straight
edges.
Line-based fittings may be effective against clutter and noise, but their performance
is inherently limited by the fact that image details that failed to form line features
would not reach the mapping stage. If an object contains curved contours, then many-to-many mapping is more appropriate but would be more difficult to implement.
On the other hand, pixel-level fittings are not subject to line-correspondence problems, and are relatively simpler to implement. These methods, however, are usually
more sensitive to clutter effects than line-based fittings are. Steger [107, 108] and Olson and Huttenlocher [98] incorporate pixel orientation information into their fitting
metrics to restrain clutter/noise pixels from contributing to match scores.
Following this discussion, our proposed fitting method is also implemented at the
pixel level because (a) we need to match curved contours, (b) the phase image is
already available to the system, (c) image details missed in the feature extraction
stage can become available to better discriminate between similar aircraft models
(eg., F16 and F18), and (d) the fitting metric is less sensitive to polygonal approximation
of the model.
5.2
Model Generation and Pose Estimation
As shown in Figure 5.1, we consider five military jets (F16, F18, F111, F35 and Mirage) for model matching purposes. Simple three-dimensional models are constructed
by recording the vertex coordinates of five true scaled-down aircraft models and joining
them in a piecewise fashion, as shown in Figure 5.9. This approach was the most
straightforward way to obtain the (relative) 3-D dimensions of the aircraft. To
check that these models are applicable to true aircraft images, they were matched
against the real aircraft images in the test set, and the contour overlap was acceptable.
The blue and red contours correspond to the aircraft horizontal and vertical planes,
respectively. The horizontal plane outlines the aircraft silhouette boundaries viewed
from top, and the vertical plane represents the side view. The origin of the 3-D axis
system is set at the intersection of the wing leading edges, FP.
Given the 3-D model and the image, we define two reference systems to be used in
the viewpoint determination: an aircraft reference system (XYZ) in which the aircraft 3-D
coordinates are measured, and an image reference system whose x and y axes lie on the
image plane, as shown in Figure 5.10. Let the X-Y plane of the object reference
image plane, as shown in Figure 5.10. Let the X-Y plane of the object reference
frame be the plane where the co-planar aircraft wings lie. The wings exhibit lateral
symmetry about the X axis. In Figure 5.10, the vectors V1 and V2 may represent
the wing leading edges. The orientation of the aircraft reference frame with respect
to the image reference frame is expressed in terms of Euler angles: roll, pitch and
yaw.
R_\phi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{bmatrix}, \quad
R_\theta = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad
R_\psi = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}    (5.2.1)
The product of the three matrices, R = R_\psi R_\theta R_\phi, transforms the aircraft coordinates from the aircraft reference frame to the image reference frame [116]. The yaw
angle, \psi, corresponds to the rotation of the fuselage axis about the Z axis. In Figure
Figure 5.10: Model to image projection. Translation and scaling are ignored to
simplify the diagram. The x′-y′ axes are the projections of the rotated X-Y axes.
Note v′1 and v′2 can also be expressed as v1 and v2 if measured with respect to the
image reference frame (ie., the x-y frame).
5.10, it is shown as the angle between the y and y′ axes. Rotating the image plane
by the yaw angle, \psi, aligns the projected X axis of the aircraft frame (ie., the x′-axis in
Figure 5.10) with the image x-axis. Let v′1 and v′2 denote the projections of V1 and
V2 in the x′-y′ coordinate system, representing the wing leading edges. V1 and V2
can also be expressed as v1 and v2 if referenced to the image reference system (ie.,
the x-y coordinate system). The relationship between v1, v2 and v′1, v′2 can be expressed
as a simple 2-D rotation by the angle \psi, as shown in Equation 5.2.2,
"
v1 =
"
v2 =
cos sin
sin
cos
cos sin
sin
cos
#
v10
(5.2.2)
#
v20
If \theta_1 and \theta_2 designate the angles of v'_1 and v'_2 with respect to the x′-axis, as shown
in Figure 5.10, then we can express v'_1 and v'_2 in the x′-y′ coordinate system as

v'_1 = k_1 \begin{bmatrix} \cos\theta_1 \\ \sin\theta_1 \end{bmatrix}, \qquad
v'_2 = k_2 \begin{bmatrix} \cos\theta_2 \\ \sin\theta_2 \end{bmatrix}    (5.2.3)
respectively as

v_1 = s \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} R_\psi R_\theta R_\phi \, V_1    (5.2.4)

v_2 = s \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} R_\psi R_\theta R_\phi \, V_2    (5.2.5)

(5.2.6)

\frac{\cot\theta_1}{\cot\theta_2} = F    (5.2.7)

where F is the model angle cotangent ratio of the symmetric vectors (refer to Figure
5.10).
Halving the sum and difference of the cotangents in Equation 5.2.7 leads to

c = \frac{1}{2}\left(\cot\theta_1 - \cot\theta_2\right)    (5.2.8)
The left side of Equation 5.2.8 is a measurable quantity from the image, and F is
a known wing parameter. After some algebraic manipulation of Equation 5.2.8, we
obtain the quadratic equation

x^2 - Ax + R = 0    (5.2.9)

\theta = \arccos(\sqrt{x}), \qquad \phi = \arctan\left(\frac{c}{\sin\theta}\right)    (5.2.10)
5.3
Model and Image Alignment
Having estimated the aircraft wing orientation in the image reference frame, the model,
as a set of connected 3-D points, undergoes a scaled orthographic projection. This
is a relatively good approximation to perspective, since an object like an aircraft is
not deep with respect to its distance from the camera [60]. This process is described
in Equation (5.3.1). Let [xi , yi ] be the coordinates of a point in the 2-D image and
[xm , ym , zm ] be the coordinates of the corresponding point in a 3-D model, then the
transformation can be expressed as
"
xi (k)
yi (k)
xm (k)
+
= sP R
y
(k)
m
zm (k)
"
1 0 0
"
4x
4y
#
(5.3.1)
|F P RP |i
|F P RP |m
(5.3.2)
where the subscript i refers to the image, and m refers to the projected model. After
scaling, the projected model is translated by
[\Delta x \;\; \Delta y]^T = FP_i - FP_m    (5.3.3)
so that F Pi and F Pm are brought into coincidence. This procedure also aligns both
model and image wing leading edges.
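The projection and alignment of Equations 5.3.1-5.3.3 can be sketched as follows. The Euler-angle composition order and sign conventions are assumed to be the standard right-handed ones; the exact convention in [116] may differ.

```python
import numpy as np

def rotation(roll, pitch, yaw):
    """R = R_yaw @ R_pitch @ R_roll (assumed composition order)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rr = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])   # about X
    Rp = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])   # about Y
    Ry = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])   # about Z
    return Ry @ Rp @ Rr

def project(points3d, R, s, t):
    """Scaled orthographic projection (Equation 5.3.1): rotate, keep
    the x-y components, scale by s, and translate by t = (dx, dy)
    so that FP_m lands on FP_i."""
    P = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
    return s * (P @ R @ np.asarray(points3d, float).T).T + np.asarray(t, float)
```

With the scale taken from the FP-RP distance ratio and the translation from the FP offset, a single matrix product projects every model vertex into the image frame.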
The model silhouette boundary in the image is obtained by projecting both the model
horizontal and vertical planes (blue and red lines, respectively, in Figure 5.11(a)) onto
the image frame. A simple boundary following algorithm is then applied to trace
the aircraft outer boundary in the image.
1. Start from the aircraft nose point, where the horizontal and vertical outlines
coincide. Select the left-most path.
2. Follow the left path until an intersection point (shown as green dots in Figure
5.11(a)) of the vertical and horizontal outlines is encountered.
3. At the intersection point, choose the leftmost path and repeat procedure 2.
4. Stop the process when the starting point (nose point) is reached.
Once the clockwise silhouette tracing is complete, the coordinates of the waypoints are
stored in an array. The stored points are subsequently linked in a piecewise fashion
to generate the model silhouette shown in Figure 5.11(b).
Figure 5.12: Filtered phase map: discrete orientations are displayed in different
colours.
5.4
Fitting Metric
In Section 5.1.7, we have looked at some practical issues associated with implementing
the line and pixel based matching techniques. There we have put forth arguments in
support of pixel level matching, as applied to aircraft recognition.
Following the alignment of Section 5.3, the transformed model silhouette is overlaid on the phase image,
where each pixel carries the local orientation information. Figure 5.12 shows the
phase image, where different orientations are shown in different colours. We define
the model and image points (or pixels) as
P_m = (x_M(m), y_M(m)) \in M
I_i = (x_I(i), y_I(i)) \in I

where M and I are the model and image pixel coordinate sets in the image plane.
The slope angles of P_m and I_i are respectively given as \theta_m and \theta_i. In practice, overlaying the transformed model outline onto its image counterpart rarely results in a
perfect coincidence. However, by applying a tapering window of width d_th (eg., the cosine
tapering window shown in Figure 5.13) to each pixel of the model silhouette, the
corresponding image outline is more likely to fit, at least partially, within the windowed region. The 3-D representation of the cosine taper function along the model
silhouette boundary is shown in Figure 5.14.
The matching algorithm proceeds as follows. For each visited model point, Pm M,
we draw an orthogonal strip (or window), of length dth , and search for the closest
image point having a similar orientation as Pm on the strip (refer to Figure 5.15).
We record this image point, which we denote by If (m) , where the function f maps
the index, m, of the currently visited model point to its image counterpart. We, in
parallel, record the window value Wm , which measures the quality of fit. Hence, as is
clear from (5.4.1), which is the cosine taper function plotted in Figure 5.13, a value
of Wm = 1 corresponds to a perfect model-image pixel coincidence. A value of 0 on
the other hand, indicates the absence of an edge pixel having a similar orientation as
the model point.
W_m = \begin{cases} \frac{1}{2}\left(1 + \cos\left(\pi d_m / d_{th}\right)\right) & d_m \le d_{th} \\ 0 & \text{otherwise} \end{cases}    (5.4.1)
where d_m = \|P_m - I_{f(m)}\|. After completing the model pixel tracing along M, the
resulting weights, W_m, are summed and normalised by the total pixel count for that
model silhouette (refer to (5.4.2)). This number is the model match score associated
with the current pose estimate.
S_m = \sum_{P_m \in M} W_m \Big/ \sum_{P_m \in M} 1    (5.4.2)
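The score of Equations 5.4.1 and 5.4.2 can be sketched as below. The raised-cosine form of the taper is an assumption consistent with Figure 5.13 (weight 1 at coincidence, 0 at the window edge); the function names are illustrative.

```python
import math

def proximity_weight(dm, dth):
    """Cosine-tapered weight for a model pixel whose closest similarly
    oriented image pixel lies at distance dm (Equation 5.4.1)."""
    if dm > dth:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * dm / dth))

def match_score(distances, dth):
    """Average weight over all model silhouette pixels (Equation 5.4.2).
    `distances` holds d_m for each model pixel; use a value beyond dth
    (e.g. inf) for pixels with no match on the search strip."""
    return sum(proximity_weight(d, dth) for d in distances) / len(distances)
```

Pixels with no matching image point simply contribute zero weight, so partial occlusion lowers the score gracefully instead of invalidating the hypothesis.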
Figure 5.13: Proximity weight - ranging from 0 to 1 for each pixel pair.
Figure 5.14: Overlay of the 3-D cosine taper function along the projected model
boundary. The red colour is equivalent to 1, and the blue colour in the background is
equivalent to 0.
Figure 5.15: Search for the closest image pixel having a similar orientation to the
current model pixel. The distance between the two pixels is dm .
5.5
Pose Fine-Tuning and Search Strategy
The estimated aircraft pose is subject to errors from various sources: the imaging
model considered is a simplification of the full perspective projection model, and the
assumption of aircraft wing coplanarity holds only approximately.
The most straightforward method to reduce the pose errors is to perturb the transformation parameters (ie., roll, pitch, yaw, translation and scale factor) until the
best match is obtained. This can be time-consuming if a brute-force approach is
adopted to determine the pose. The use of the cosine taper function (Figure 5.13)
during the match score calculation can handle a modest amount of error due to model
simplification and optical distortions. Significantly larger errors are due to wing edge
186
90
140
80
120
70
100
no of occurrences
no of occurrences
60
50
40
80
60
30
40
20
20
10
20
40
60
80
100
120
140
Angle subtended by the wing leading edges
(a)
160
180
20
40
60
80
100
120
140
Angle subtended by the wing trailing edges
160
180
(b)
Figure 5.16: Histogram of the angles between the wing leading edges (a), and histogram of the angles between the wing trailing edges (b). These angles are taken
from the winning aircraft hypotheses of the 300 real aircraft images.
displacements, which often occur during the edge and straight line extraction processes. Such wing edge displacements cause the intersection points FP and RP to
move away from their true locations. This shift of the FP and RP points affects not
only the orientation angles but also the scale factor (Equation (5.3.2)) and the translation
(Equation (5.3.3)).
Usually, the point FP is located accurately, because the wing leading edges are long
and the angle between them is much less than 180°, as shown in Figure 5.16(a),
which shows the distribution of angles between the wing leading edges based on a
sample set of 300 real aircraft images. On the other hand, RP is more susceptible
to displacement error because the wing trailing edges are typically shorter and often
occluded by the rudder. More importantly, the angle subtended by them is relatively
closer to 180° (see Figure 5.16(b)), making the positional error of RP very sensitive
to slight edge rotations.
Figures 5.17 and 5.18 illustrate these observations. In Figure 5.17, the extracted wing
Figure 5.17: Incorrectly estimated position of RP, and the resulting rotational shift of the wing symmetry axis.
trailing edges of the winning hypothesis are short, and the right wing trailing edge
is slightly misaligned. This causes RP to shift slightly towards the nose and away
from the fuselage axis. The implication of this on the pose estimation is obvious, as
illustrated in Figure 5.18. The transformed model is slightly smaller than its image
counterpart, and shows relatively large orientation errors. This poor alignment results
in a low match score.
Knowing that RP is the dominant contributor to pose error and that perturbing all five transform parameters (i.e., roll, pitch, yaw, translation and scale factor) can be computationally expensive, an alternative method is proposed where the perturbation is applied only to RP instead of the five parameters. The computational complexity then drops drastically to O(n), where n is the number of perturbed RP locations. The perturbed RPs span a grid which is centred at the initial estimate and oriented along the wing trailing edges. Such a configuration of the RP grid is shown in Figure 5.19.
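A minimal sketch of such a grid generator (the function name, step size and grid extent are illustrative assumptions, not the thesis implementation):

```python
import math

def rp_grid(rp, axis_angle, step=2.0, half_extent=2):
    """Candidate RP positions on a (2*half_extent+1)^2 grid centred at the
    initial estimate, with axes rotated by axis_angle (radians) so the grid
    follows the orientation of the wing trailing edges."""
    ca, sa = math.cos(axis_angle), math.sin(axis_angle)
    points = []
    for i in range(-half_extent, half_extent + 1):
        for j in range(-half_extent, half_extent + 1):
            u, v = i * step, j * step          # offsets in the rotated frame
            points.append((rp[0] + u * ca - v * sa,
                           rp[1] + u * sa + v * ca))
    return points
```

Evaluating the match score once per grid point gives the O(n) behaviour noted above, with n = (2 × half_extent + 1)² candidate positions.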
Figure 5.18: Poor outline matching due to relatively large transformation errors.
Figure 5.19: Various RPs in a grid for iteratively determining the correct transform parameters.
Figure 5.20: Match with the highest match score after considering all RPs in the grid.
The RP positions in the grid are iteratively used to re-estimate the pose and calculate the match score. Figure 5.20 shows the alignment achieved with the RP position in the grid which resulted in the highest match score. Numerous tests demonstrated that this approach of perturbing RP can provide a pose estimate almost as accurate as the five-parameter perturbation approach, at a fraction of the computational effort.

So far, the pose estimation analysis and model matching have focused on one aircraft model. Having a set of M models would, on average, lead to an M-fold increase in processing time. However, a number of viewpoint-invariant shape description parameters exist, which help reduce the model set to be considered for pose calculation and model matching. These shape descriptors are listed below and are computed for all aircraft candidates in the image and the model set.

Wing shape - boomerang, triangle and diamond. The wing shape is invariant to the viewing direction.
190
Generic Aircraft
(winning hypothesis)
wing shape
Boomerang Wing
Triangular Wing
Diamond Wing
wing edge coterminate?
Large Aircraft
Delta Wing
Delta Wing
FWR=
|C-FP|/|FP-RP|
.................
Model Base
As for the boomerang wing aircraft, additional checks on the wing and fuselage narrowness may assist in discriminating the large-class airplanes from fighter jets. The hierarchical process of shortlisting the model candidates for matching purposes is shown in Figure 5.21.
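A sketch of this shortlisting step (the model-base fields, shape labels and tolerance are illustrative assumptions):

```python
import math

def fwr(c, fp, rp):
    """Viewpoint-invariant length ratio FWR = |C-FP| / |FP-RP|."""
    return math.dist(c, fp) / math.dist(fp, rp)

def shortlist(model_base, wing_shape, fwr_value, tol=0.15):
    """Keep models whose wing shape class matches the winning hypothesis and
    whose stored FWR lies within a relative tolerance of the measured value."""
    return [m for m in model_base
            if m["shape"] == wing_shape
            and abs(m["fwr"] - fwr_value) <= tol * m["fwr"]]
```

Because C, FP and RP lie on the fuselage axis, their length ratio is preserved under the viewing transformations considered here, which is what makes it usable for pruning before any pose is computed.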
Use of the RP grid as shown in Figure 5.19 can be split into two steps: coarse-level and fine-level matching. The coarse-level matching is carried out using a sparse RP grid applied to the shortlist of selected models. The model and the position of RP associated with the maximum match score are noted. If the match score is very high (> 90%) then the match is accepted. However, if the match score is not high enough, then we first check whether it is distinctively higher than the remaining match scores. If so, the model is accepted and enters the fine-level matching. If not, the model-RP combinations associated with the three highest match scores are used for fine-level matching.
In the fine-level matching, the RP associated with the previously found maximum match score is used as the centre of a finer RP grid. The search for the maximum match score is carried out by iteratively trying the RP positions in this grid. If the maximum score exceeds a preset threshold, then the match and the associated pose are accepted. This process is depicted in the block diagram of Figure 5.22.
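The control flow just described can be sketched as follows (a hedged illustration: the function signatures, the 0.10 distinctiveness margin and the score convention are assumptions; score_fn stands for the pose-estimate/transform/score sequence of the diagram):

```python
def coarse_to_fine_match(models, coarse_grid, fine_grid_fn, score_fn,
                         accept=0.90, margin=0.10, top_k=3):
    """Two-stage RP search: score every (model, RP) pair on a sparse grid;
    accept immediately on a very high score, otherwise refine the clear
    winner (or the top three candidates) on a finer grid around each RP."""
    coarse = sorted(((score_fn(m, rp), i, rp)
                     for i, m in enumerate(models) for rp in coarse_grid),
                    key=lambda t: t[0], reverse=True)
    best = coarse[0]
    if best[0] >= accept:
        return best                            # (score, model index, RP)
    # A distinctively high coarse score goes alone to the fine stage.
    if len(coarse) == 1 or best[0] - coarse[1][0] >= margin:
        finalists = [best]
    else:
        finalists = coarse[:top_k]
    fine = [(score_fn(models[i], rp), i, rp)
            for _, i, centre in finalists
            for rp in fine_grid_fn(centre)]
    return max(fine, key=lambda t: t[0])
```

The caller then compares the returned score against the preset acceptance threshold, as in the final block of Figure 5.22.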
[Figure 5.22 block diagram: the list of winning hypotheses (up to 5) and the shortlist of model candidates enter coarse-level matching on a sparse RP grid (1. estimate pose; 2. transform model to image; 3. calculate match score). If the maximum score is at least 90%, the match is accepted and the process stops; otherwise the best model-RP combination (or the top three, if there is no distinct winner) proceeds to fine-level matching, where the same three steps are repeated on a finer grid and the maximum score is checked against a threshold.]
5.6
A large set of 200 images of 5 scaled-down aircraft (i.e., F16, F18, F111, F35 and Mirage) was obtained in a controlled environment where the effects of shadow, blurring, protrusion, camouflage, clutter and occlusion were introduced. The 3-D wire-frame models for these aircraft were generated by taking the dimensions of the scaled-down aircraft, and were matched to the winning aircraft hypotheses from the test images. This section presents some model matching outcomes that illustrate the system's ability to identify the viewed aircraft.
Figure 5.23(a) shows an F111 aircraft, which has shadows on its surface and on the
ground. The winning hypothesis is shown in Figure 5.23(b). Notice in this figure that
the aircraft generic recognition is not accurate because part of the cockpit is mistaken
for a nose. When model matching is applied, this error is removed and the viewed
aircraft is correctly identified as an F111, as shown in Figure 5.23(d). Figure 5.23(c)
shows the phase image where the model-to-image matching is applied.
Figure 5.24(a) is again an F111 aircraft, viewed on a grid of lines. Figure 5.24(b)
shows that the winning hypothesis is correct. The fuselage axis estimation appears
accurate, which results in a fairly accurate initial pose estimate. The best match
given in Figure 5.24(d) shows a slight boundary mismatch which is largely due to
aircraft modelling errors.
Figure 5.25(a) is an F16 aircraft which is partly obstructed by tree branches. The background is cluttered and there is a missile protrusion on the right wing. Figure 5.25(b) shows that the aircraft is correctly recognised despite the occlusion. In Figure 5.25(c), a few densely cluttered regions are cleared (e.g., below the right wing where the branches are dense). Excessive occlusion can degrade the model matching
performance as a large portion of the aircraft silhouette, not visible to the camera,
does not contribute to the match score. In this figure, however, occlusion is not
severe and the match threshold is exceeded. The transformed model, shown in Figure
5.25(d), is well aligned with the image. A slight displacement of rudder edges is mainly
due to aircraft modelling inaccuracies.
In Figure 5.26(a), a Joint Strike Fighter (JSF) aircraft is surrounded by dense clutter, and the cockpit region is occluded. Figure 5.26(b) shows that the nose is correctly detected and most of the aircraft parts are correctly recognised. Pixel-level matching approaches suffer when applied to densely cluttered images because, regardless of how the model is transformed, the projected model points always find matching image points in the densely cluttered region. In order to overcome this problem, densely cluttered regions are filtered with the clutter removal algorithm of Section 3.3, as shown in Figure 5.26(c). The clutter removal process, combined with the incorporation of pixel orientation in the model matching algorithm (refer to Equation (5.4.1)), enables a correct match as shown in Figure 5.26(d).
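The flavour of this orientation-gated, distance-tapered scoring can be sketched as follows (a hedged illustration only; the exact form of Equation (5.4.1) is not reproduced here, and the raised-cosine taper and gating product are assumptions consistent with the description in the text):

```python
import math

def point_score(model_pt, model_dir, image_edges, d_th):
    """Contribution of one transformed model contour point: a raised-cosine
    taper of the distance to the nearest image edge pixel (zero beyond d_th),
    gated by how well the pixel's edge orientation agrees with the model's.
    image_edges is a list of ((x, y), orientation_radians) tuples."""
    best = 0.0
    for (px, py), pixel_dir in image_edges:
        d = math.hypot(px - model_pt[0], py - model_pt[1])
        if d >= d_th:
            continue                                   # outside the taper window
        taper = 0.5 * (1.0 + math.cos(math.pi * d / d_th))
        alignment = abs(math.cos(model_dir - pixel_dir))   # 0 for orthogonal edges
        best = max(best, taper * alignment)
    return best
```

Under this scheme a clutter pixel lying close to the model contour but with a mismatched orientation contributes almost nothing, which is the behaviour the text describes.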
Figure 5.27(a) is a camouflaged Mirage aircraft with missiles under its wings. Figure
5.27(c) shows that the wing leading edges are fragmented by the missile protrusions.
In Figure 5.27(b), those fragments are successfully extended. The boundary alignment
between the winning model and the image counterpart appears to be very accurate.
Figure 5.28(a) is an F18 aircraft which has a shadow underneath it, and the background has grid lines. In Figure 5.28(b), the generic winning candidate contains the
correct parts except for the trailing edge of the left tail fin, which belongs to the grid
line fragments. Figure 5.28(d) shows good alignment in the wing edges and fuselage
outlines. The alignment is less accurate in the rear end of the aircraft and wing
tips. The wire-frame model of the F18 does not include wing tip missiles, causing a
slight misfit at the wingtip. The overall fitting of the model to its image is, however,
satisfactory.
5.7
Summary
In this chapter, we presented a model matching technique which aligns a simple 3-D model with the winning hypothesis in the image at the pixel level. The aircraft pose is estimated by measuring the angles of the wing leading edges. It was shown that the initial pose estimate can sometimes be poor when the aircraft wings are not coplanar and/or the extracted edges are displaced due to poor image quality. An alternative method to fine-tune the aircraft pose was proposed and found to be efficient.
For the model matching metric, we chose a pixel-based fitting because implementing it on complex shapes is less difficult than line-based matching (e.g., no many-to-many line mapping is required), and image details missed during the line extraction stage can be recovered and contribute to the matching process. The orientation of the pixels in the image was incorporated into the matching algorithm in order to prevent clutter pixels from falsely contributing to the match score.
Using viewpoint-invariant measurements embedded in the winning hypothesis, the model search space could be pruned to speed up the matching process. The matching algorithm has been tested using five scaled-down aircraft models, and showed
promising results. The statistical analysis of the matching performance is deferred to
Chapter 6.
Figure 5.23: Model matching for F111 with shadow (match score = 64%).
Figure 5.24: Model matching for F111 with grid clutter (match score = 66%).
Please see print copy for Figure 5.25 through to Figure 5.28
Figure 5.25: Model matching for F16 with occlusion and protrusion (match score =
75%).
Chapter 6
Performance Analysis
In this chapter, the system's performance is evaluated in the presence of real-world problems, using a large test set of real images. Section 6.1 briefly describes the tuning process of the system parameters, and the preparation of the test suite, comprising real aircraft, non-aircraft and scaled-down aircraft images. In Section 6.2, the search combinatorics are discussed in terms of line-grouping complexity. The effectiveness of the neural networks for aircraft feature detection, the intensity-based constraints, the dense clutter removal and the line extension algorithms on the computational savings and the system's performance is pointed out. Section 6.3 analyses the generic recognition performance of the system in terms of true and false recognition rates. The use of a Receiver Operating Characteristic (ROC) curve gives an insight into the trade-off between recognition and false-alarm rates, and also assists in setting the score threshold. Comparisons are made between the two ROC curves yielded by the rule-based and neural network based feature detection algorithms. The matching performance of the system is presented in Section 6.4.
6.1
Implementation
The system is tested with 8-bit visual-band intensity images. The images are selected to assess the system's performance against poor image quality, shadow effects, clutter, occlusion, camouflage and the existence of multiple aircraft. The system was implemented using Matlab.
Initially, a representative training set of 100 real aircraft images and 60 non-aircraft images that reflect real-world concerns was used as a guideline for algorithm development and to fine-tune the aircraft part detection and clutter rejection algorithms. The non-aircraft images consist mainly of buildings and urban areas.

Based on the training set, the system parameters and thresholds were adjusted to accept desirable features under degraded conditions, and to reject ambiguous features/parts. Furthermore, scores from positive evidence and penalties from negative evidence were adjusted in order to widen the score gap between the correct and spurious hypotheses.

We also have another version of the system that uses the neural networks to extract wing, nose, wingpair and wingpair-nose association features. Training and validating the neural networks required more experimental data from real aircraft features, therefore 200 additional aircraft images were processed to increase the experimental data size to 300. The later stages, such as evidence accumulation and ambiguity resolution, were left unchanged.
The test set consists of a total of 520 real images, comprising 220 real aircraft images,
200 scaled-down model images (for model matching) and 100 non-aircraft images.
Table 6.1: Comparison of the total number of lines with and without the use of the
clutter removal algorithm for images with dense clutter.
image index   NE0 (no clutter removal)   NE (clutter removal)   NE / NE0
4                 1619                        744                 0.45
50                 741                        574                 0.77
58                1927                        491                 0.25
59                 830                        532                 0.64
61                1219                        566                 0.46
62                1366                        651                 0.47
63                1182                        750                 0.63
65                 906                        648                 0.71
70                1058                        595                 0.56
75                2084                        901                 0.43
85                 803                        679                 0.84
91                1684                        618                 0.36
average           1285                        646                 0.55

6.2
Computational Complexity
The total number of lines, NE, as defined in Section 3.4.2, varies from about 40 to over 1000. Small line counts are usually obtained from aircraft images with no clutter in the background. It was also observed that about half the images produced more than 200 lines. More challenging images containing background clutter usually generate a large number of line fragments. If the clutter regions contain dense and randomly oriented clutter pixels, then they can be removed by applying the proposed clutter removal algorithm of Section 3.3. Table 6.1 demonstrates the difference that the clutter removal algorithm makes to NE for heavily cluttered images. The use of the clutter removal algorithm provides about a 50% reduction in NE, significantly cutting down the computational complexity of subsequent processes.
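The averages in the last row of Table 6.1 can be reproduced directly from the per-image counts:

```python
# Per-image line counts from Table 6.1 (dense-clutter images).
n_before = [1619, 741, 1927, 830, 1219, 1366, 1182, 906, 1058, 2084, 803, 1684]
n_after = [744, 574, 491, 532, 566, 651, 750, 648, 595, 901, 679, 618]

avg_before = sum(n_before) / len(n_before)    # about 1285 lines before removal
avg_after = sum(n_after) / len(n_after)       # about 646 lines after removal
mean_ratio = sum(a / b for b, a in zip(n_before, n_after)) / len(n_before)
print(round(avg_before), round(avg_after), round(mean_ratio, 2))  # prints: 1285 646 0.55
```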
If, however, the background clutter pixels are not randomly oriented and form relatively long segments, then NE may become very large (600-1100). The extracted
206
600
500
400
300
200
100
100
200
300
400
500 600
Line Count
700
800
900
1000
Figure 6.1: Number of line groupings extracted by the rule-based method: NN (blue),
NW (red), N4G (black) and NH (green) versus line count NE (x-axis).
Figure 6.2: Number of line groupings extracted by the neural network based method:
NN (blue), NW (red), N4G (black) and NH (green) versus line count NE (x-axis).
Figure 6.3: Distribution curves of the number of line groupings, NE (top left), NW
(top right), N4G (bottom left) and NH (bottom right), obtained via the rule-based
approach from the cluttered aircraft images.
lines are prioritised in terms of their length, and if the total line count is large, then
only the top 140 salient lines are labelled as significant (see Figure 3.9).
Let NS, NW, NN, N4G and NH be the number of significant lines, wing candidates, nose candidates, four-line groupings and hypotheses, respectively. The complexity of the two-line grouping generation for potential wings and noses is O(NE^2). However, since wing candidates are formed using significant lines only, the wing candidate count, NW, cannot exceed NS(NS - 1)/2, where NS = 140. Figure 6.1 shows the curves of NN, NW, N4G and NH versus NE, when the line groupings are extracted using the rule-based approach. In this plot, NW stabilises when the line count NE exceeds 140. On the other hand, NN increases approximately linearly with NE, as non-significant lines are allowed to contribute to the nose formation. NN can be approximated as NN = 0.2NE, where NE is the total line count. Notice,
in Figure 6.1, that the nose count remains practically below 200.
Four-line groupings are formed by pairing the wing candidates, which are composed of significant lines. Therefore, the computational complexity of the four-line grouping generation is expressed in terms of NS as O(αNS^4), where α is in the order of 10^-4. In Figure 6.1, the N4G curve (black) remains well below the NW curve (red) (i.e., N4G ≈ 0.3NW), and stops increasing as NE exceeds 140, showing that a large portion of the spurious groupings is successfully rejected in the four-line grouping process.
An aircraft hypothesis is generated if a four-line grouping finds a matching nose, requiring N4G × NN computations. Therefore, the computational complexity of the aircraft hypothesis generation process would be O(βNS^4 NE), where β is less than α and is also in the order of 10^-4. In Figure 6.1, the NH curve (green) is well below the N4G curve, and remains on average around 50.
Figure 6.2 also shows NN, NW, N4G and NH as a function of NE, but this time the line groupings are extracted using the neural networks. This figure looks similar to Figure 6.1, except that NW is reduced by about 40% and NN is now kept below 80 instead of increasing linearly as in Figure 6.1. This indicates that the neural networks successfully removed some of the spurious wing and nose features that the rule-based approach could not remove. However, the reduction in N4G for the neural networks is less impressive, and the reduction in NH is only noticeable for NE > 600, which indicates that the rule-based reasoning may also have been effective. Overall, it can be concluded that the neural networks reduce the computational load of the system.
If NE > 450, then our system considers the image as being cluttered. The distribution
curves of NE , NW , N4G and NH for cluttered images are shown in Figure 6.3. The
distribution curves are obtained with the rule-based approach. These curves appear
roughly Gaussian, with means and standard deviations of μE ≈ 650 (σE ≈ 120), μW ≈ 400 (σW ≈ 140), μ4G ≈ 100 (σ4G ≈ 50) and μH ≈ 45 (σH ≈ 35) for NE, NW, N4G and NH, respectively. Such statistical characteristics are used as a guideline for adjusting the memory allocations for the line and line grouping databases.
The aircraft hypotheses undergo the evidence accumulation process, which supports or negates each hypothesis. The complexity associated with this process is O(NH × NL), where NL is the total number of extracted lines prior to the line extension process (i.e., NL < NE). After the evidence accumulation process, only the portion of the hypotheses with a score above 420 is allowed to proceed to the conflict resolution process. The complexity of the interpretational conflict resolution process is O(γNH^2), where γ ≪ 1. The system is configured to accept up to 5 winning hypotheses to enable multiple aircraft recognition.
For model matching, let NWH, NM, LC, WC and NRP be the number of winning hypotheses (usually 1), the number of short-listed model candidates, the contour length of the transformed model, the width of the cosine weighting function (i.e., dth in Equation (5.4.1)), and the number of RP locations (for pose estimate fine-tuning), respectively. The worst-case complexity of the matching process is O(NWH × NM × LC × WC × NRP). The use of a hierarchical model for efficient pruning of the search space, a coarse-to-fine grid approach for locating RP, and a more accurate model representation allowing a narrower WC can reduce the complexity. The computational complexity is summarised in Table 6.2.
For any image understanding task which relies largely on edge features, it is important that lines belonging to the aircraft structure retain a sufficient length. Often in practice, such lines become fragmented and are lost during the line extraction process. There are numerous reasons for such fragmentation: physical discontinuities due to protrusions, occlusion and wing flaps, and shortcomings of the
Table 6.2: Summary of the computational complexity.

Complexity                      Comments
O(NE^2)                         NE: number of total lines
O(αNS^4)                        NS: number of significant lines; α is in the order of 10^-4
O(βNS^4 NE)                     β is in the order of 10^-4
O(NH × NL)                      NH: number of hypotheses; NL: number of lines before extension
O(γNH^2)                        γ ≪ 1
O(NWH × NM × LC × WC × NRP)
edge detection algorithm due to noise, clutter surrounding the wing edges, wing camouflage and blurring. The system implements the line extension algorithm in order to enhance the survivability of the wing edge lines. Figure 6.4 shows a plot of the number of extended lines (i.e., NE - NL) versus the total number of unextended lines, NL. The value of NE - NL increases approximately linearly with NL. The slope of the line is roughly 0.15, which indicates that the line extension algorithm increases the total line count by about 15%.
In order to determine how effective the line extension algorithm is, the line images were examined to count the occurrences of wing edge fragmentation that, upon visual inspection, should be extended. The total count was 495. This was followed by counting how many of them had been successfully extended by the line extension algorithm: 409 of the 495 fragmented wing edges were extended, resulting in a recovery rate of 82.6%. All of the extended wing edges are labelled
Figure 6.4: Plots of total line counts. The curve represents the number of the extended
lines as a function of the unextended lines (prior to the line extension process).
as significant, and have a better chance of surviving in the higher level processes. A
recovery rate above 80% at the expense of a 15% overhead is satisfactory.
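The quoted recovery rate follows directly from the reported counts:

```python
fragmented = 495       # wing-edge fragments identified by visual inspection
extended = 409         # fragments recovered by the line extension algorithm
recovery_rate = 100.0 * extended / fragmented
print(f"{recovery_rate:.1f}%")   # prints: 82.6%
```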
For two-line groupings (or wing candidates), the sole use of line-based constraints has shown its limitations in keeping down the number of wing candidates. It was discussed earlier in this section that the total number of two-line groupings is maintained reasonably small despite a large increase in the line count (see Figure 6.1). Apart from the fact that NS is limited to below 140, the use of intensity-based information also contributes to a significant reduction in the two-line grouping count. To support this argument, the two-line grouping process (using the rule-based approach) was repeated on the same image set, but this time with the intensity check routine disabled. The result is shown in Figure 6.5. The black curve represents the number of two-line groupings without the intensity check, as opposed to the red curve obtained with the intensity check. For NE > 250,
Figure 6.5: Plot of NW curves obtained from real aircraft images, using the rule-based
two-line grouping extraction algorithm. The red and black curves represent NW with
and without intensity checks, respectively.
the red curve no longer increases, but the black curve continues to increase, and the
gap between the two curves widens slowly.
The two-line grouping count gap is more evident with non-aircraft cluttered images.
Figure 6.6 illustrates this point, and shows an average count separation of about 300
for line counts exceeding 600. The use of intensity-based information almost halved
the number of two-line groupings formed.
In the four-line grouping formation process, the intensity check was not necessary
because an increased number of geometric constraints were available. However, the
intensity information played an important role in the evidence accumulation and
shadow discrimination stages.
Figure 6.6: Plot of NW curves obtained from non-aircraft clutter images, using the rule-based two-line grouping extraction algorithm. The red and black curves represent NW with and without intensity checks, respectively.
6.3
For a statistical analysis of the recognition performance, a batch run was prepared using 220 real aircraft images to test the system's capability to recognise the aircraft in the image. Another batch-mode trial was developed to test the reliability of the system in declaring no detection when the image does not contain an aircraft. A total of 100 non-aircraft (clutter) images was used for this test.

We first define the performance indicators as below.
The setting of the threshold is application dependent. The user has to weigh up the rate of false alarm in light of the available resources and the intended application objectives. As an example, if it is judged that declared aircraft detections can be confirmed at a low cost, by either taking and processing subsequent images of the same scene or having a human operator examine the detections, then the score threshold may be lowered to reduce the misses. Use of a receiver operating characteristic (ROC) curve provides a convenient way to visualise the trade-off between the recognition and false alarm rates (i.e., RR versus FAR) for all possible threshold settings, and reflects the system performance. As mentioned in Section 3.8, the system utilises two approaches to extract wing, nose, wing-pair and aircraft candidates - one is rule-based and the other uses the neural networks. The effects of the two approaches on the overall performance are compared in terms of the ROC curves shown in Figure 6.7. The red and blue colours are respectively associated with the rule-based and neural network based approaches. The area under the ROC curve is larger for the neural network approach, suggesting that the neural networks generate fewer spurious hypotheses while maintaining or improving the recognition rate. The gaps between the curves are more noticeable near the operating point (e.g., FAR < 10%), and diminish as the FAR increases. Table 6.3 shows the recognition performance for 12 different thresholds. In the table, the neural network based approach usually generates a smaller false alarm rate for a given recognition rate. For a false
Figure 6.7: ROC curves for the generic recognition of aircraft. The red curve is
obtained when the rule based method is used for the extraction of line-groupings and
the blue curve is obtained using the neural networks.
alarm rate of about 7%, a recognition rate of 84% is achieved when the neural networks are used. With the rule-based approach, the recognition rate drops to 81%, which is also acceptable.
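The underlying computation for such a curve is straightforward; a sketch, using hypothetical score lists rather than the thesis data:

```python
def roc_points(aircraft_scores, clutter_scores, thresholds):
    """(FAR, RR) pairs: for each threshold, the fraction of aircraft images
    whose winning hypothesis scores at or above it (recognition rate), and
    the fraction of clutter images that also do (false alarm rate)."""
    points = []
    for th in sorted(thresholds):
        rr = sum(s >= th for s in aircraft_scores) / len(aircraft_scores)
        far = sum(s >= th for s in clutter_scores) / len(clutter_scores)
        points.append((far, rr))
    return points
```

Sweeping the threshold over the observed score range traces out the full curve, from which an operating point such as FAR < 10% can be read off.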
Table 6.4 gives a performance breakdown by image category. The image categories are blurring, camouflage, clutter, multiple targets, normal environment, occlusion, protrusion and shadow. The performance was roughly consistent across the image categories. However, a comparison of recognition rates among these categories is not statistically reliable, as the number of samples in each category is not large enough.
Table 6.3: Performance evaluation using real aircraft and clutter images.
Threshold   CRR (Rule)   FAR (Rule)   CRR (NN)   FAR (NN)
490         87.7%        30.6%        86.1%      9.4%
500         87.3%        27.5%        85.7%      8.4%
510         86.5%        23.8%        85.3%      7.5%
520         85.9%        23.1%        84.9%      7.4%
530         85.7%        21.3%        84.1%      6.8%
540         84.9%        16.8%        83.3%      6.7%
550         84.7%        15.0%        82.2%      6.3%
560         83.6%        12.5%        80.5%      5.0%
570         82.0%        8.7%         79.2%      3.1%
580         81.2%        6.8%         77.5%      1.3%
590         80.2%        5.0%         77.5%      1.3%
600         78.1%        3.8%         76.2%      0.7%
Table 6.4: Recognition rates in the eight imaging categories. Note that for the multiple aircraft category, the denominator 42 is the total count of aircraft in 17 multiple
aircraft images.
category   blur     camouflage   clutter   multiple   normal   occlusion   protrusion   shadow
TP         21/24    19/23        29/35     34/42      32/35    29/32       29/31        20/23
%          (88%)    (83%)        (83%)     (81%)      (91%)    (91%)       (94%)        (87%)
6.4
Matching Performance
The second part is focused on aircraft identification. The image set comprises 200 images of scaled-down models representing the F16, F18, F111, F35 and Mirage. The matching algorithm based on these models was tested on real aircraft images and the contour matching was acceptable. However, not enough samples of those images (spanning various viewing angles) could be acquired for a statistical analysis of the pose estimation and matching performance. Therefore, 5 scaled-model aircraft were built and photographed under various viewing angles, blurring, contrast, clutter, occlusion, shadow and camouflage. This enabled us to control the degradations and generate various viewing angles for a rigorous pose estimation test. Furthermore, camouflage could be applied at will. Note that the degradation is not as severe as in Section 6.3, so that a large number of true hypotheses could be made available for the model matching.
After the generic recognition, 190 out of 200 aircraft were successfully recognised, with 5 false alarms from the image background. The 190 correct winning candidates were then subjected to model matching. Viewpoint-invariant quantities, such as the wingpair shape and the FWR ratio (FWR = |C-FP| / |FP-RP|), are used to prune the model search space. After estimating the pose of each model candidate, a match score is computed. This is followed by fine-tuning of the pose to find the best match. The model candidate generating the highest match score is regarded as the matching model, and the match score is recorded.
An additional set of 82 spurious high scoring hypotheses, from the real aircraft and
non-aircraft image sets, is also subjected to model matching. The objective is to
analyse the model matching performance against coincidental line groupings that
appear like an aircraft.
Figure 6.8: Model match score: Correct match (blue asterisk) and false match (red
circle or red cross). A red circle represents a correct aircraft hypothesis matched to
a wrong model. A red cross represents a spurious aircraft hypothesis matched to one
of the models.
Figure 6.8 shows the true and false match scores. A blue asterisk corresponds to a correct match. A red cross corresponds to an incorrect match obtained with a spurious hypothesis as input. A red circle corresponds to a mismatch between an aircraft hypothesis and a wrong model. Overall, we obtained 182 correct matches (blue asterisks) and 82 incorrect matches (red crosses plus red circles). They occupy two distinct
regions, with a slight overlap in-between (around the score of 60%). Different thresholds were experimented with and the results are given in Table 6.5, using the following
performance indicators.
TP(True Positive): correct match with a score above threshold.
FN(False Negative): correct match with a score below threshold.
FP(False Positive): incorrect match with a score above threshold.
TN(True Negative): incorrect match with a score below threshold.
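From these four counts, the true and false match rates plotted in the ROC curve follow in the standard way as TMR = TP/(TP+FN) and FMR = FP/(FP+TN); a minimal sketch:

```python
def match_rates(correct_scores, incorrect_scores, threshold):
    """True match rate TP/(TP+FN) and false match rate FP/(FP+TN)
    at the given score threshold (scores in percent)."""
    tp = sum(s >= threshold for s in correct_scores)
    fp = sum(s >= threshold for s in incorrect_scores)
    tmr = tp / len(correct_scores)       # TP + FN = all correct matches
    fmr = fp / len(incorrect_scores)     # FP + TN = all incorrect matches
    return tmr, fmr
```

Evaluating this at each candidate threshold reproduces the trade-off behaviour summarised in Table 6.5 and Figure 6.9.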
Figure 6.9: ROC curve: trade off between true and false match rates as the threshold
varies.
Figure 6.9 shows an ROC curve for the model matching performance. An operating point around an FMR of 0.1 corresponds to a threshold of about 60% (refer to Table 6.5). Setting the threshold to 55% yields a very high FMR of 43.9%. Raising it to
Table 6.5: Matching performance parameters.

Threshold: 55% | 57% | 59% | 60% | 61% | 62% | 63% | 65%
6.5
[Comparison table across the imaging categories; row labels are missing from this copy.]

blur   camo   clut   mult   occl   prot   shad
Yes    ?      No     No     No     No     ?
No     Yes    No     No     Yes    No     ?
Yes    No     No     No     No     Yes    Yes
No     Yes    ?      No     Yes    Yes    Yes
Yes    Yes    Yes    Yes    Yes    Yes    Yes
6.6
Concluding Remarks
An extensive test suite of 520 real aircraft, non-aircraft and scaled-down aircraft images is used to cover a broad spectrum of image variations and to test the system performance. The performance analysis investigates various aspects of the computational complexity, generates the ROC curves, and provides a descriptive performance comparison with other methods.
The system handles real-world issues adequately by implementing the hypothesise-then-verify paradigm, where aircraft recognition decisions are made through a voting (evidence accumulation) scheme. The system takes notice of distinctive image features of aircraft and clutter at various levels of processing, and implements a number of geometric and intensity-based heuristics to discriminate between aircraft and clutter. A further enhancement to the system was the integration of neural networks, which proved to be a promising replacement for the heuristics in terms of computational savings and an improved ROC curve.
While the system is mainly driven by line features, the use of intensity-based information was essential in widening the score gap between true and false aircraft candidates. Furthermore, by progressively building up higher-level features, the system was able to keep the combinatorics under control. The pixel-level boundary fitting approach displayed consistently good model matching performance in the presence of clutter and occlusion.
The statistical analysis of the system's generic recognition and identification performance produced promising results. The recognition performance was consistent across the imaging categories (refer to Table 6.4). From the ROC curves, true and false recognition rates of 84% and 6.8%, and true and false matching rates of about 90% and 8%, could be achieved. We find this result satisfactory.
Chapter 7
Conclusions
7.1
Summary
In this thesis, we present a knowledge-based approach for the generic recognition and
identification of aircraft in complex real-world imagery. The difficulties associated
with real-world imagery are occlusion, shadow, cloud, low image intensity contrast,
clutter, camouflage and flares.
The developed vision system is a rule based system, which uses a voting scheme
to reach a decision regarding the presence and location of an aircraft in an image.
Rules in this system mainly exploit the geometric relationships that hold within
and between aircraft parts. Image intensity information is also used to increase the system's confidence in determining the aircraft parts and recognising the whole aircraft.
This system starts by detecting edges in an image and forming straight line features.
The extraction of these low-level features is achieved through dual thresholding, contour generation, clutter removal and line extension. Such primitive features are then grouped in an incremental fashion to build more complex feature associations (e.g., nose, wing pairs, tail fins, etc.). These feature groups eventually lead to the generation of a number of competing aircraft hypotheses, each of which is allocated a confidence score (or vote) reflecting its degree of conformity to the generic aircraft structure. Such a gradual build-up of complex feature associations requires an intensive tuning process for the system parameters and thresholds. Neural networks are an attractive solution to this problem and could improve the system's robustness.
Votes in this system are allocated in proportion to the importance of the aircraft part under consideration. The major components of an aircraft hypothesis are a wing pair, a matching nose and a fuselage section. Due to their importance in the aircraft recognition process, large voting scores are allocated to these parts. Smaller scores are assigned to the less critical evidence arising from the wing tip and tail fin parts. This system also makes use of negative evidence to penalise aircraft candidates that contain contradicting features.
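The voting idea described above can be sketched minimally as follows. The part names and vote weights here are illustrative assumptions, not the thesis values: major parts carry large votes, minor parts small ones, and contradicting features contribute negative evidence.

```python
# Minimal sketch of evidence accumulation for one aircraft hypothesis.
# Weights are illustrative assumptions, not the thesis values.

VOTES = {
    "wing_pair": 40, "nose": 25, "fuselage": 20,   # major evidence
    "wing_tip": 5, "tail_fin": 5,                  # minor evidence
    "contradiction": -15,                          # negative evidence
}

def hypothesis_score(found_features):
    """Accumulate votes for one aircraft hypothesis; clamp to [0, 100]."""
    raw = sum(VOTES.get(f, 0) for f in found_features)
    return max(0, min(100, raw))

print(hypothesis_score(["wing_pair", "nose", "fuselage", "tail_fin"]))  # 90
print(hypothesis_score(["wing_pair", "contradiction"]))                 # 25
```

Competing hypotheses would then be ranked by this score, with the threshold chosen as discussed in Chapter 6.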
Although the recognition part of the system cannot provide the identity of the viewed
aircraft, it provides a broad classification of it in terms of wing shape. Aircraft
identification, however, was achieved through model matching using a model set of 5
fighter aircraft. The recorded correct identification rate was about 90% despite the
fact that the models used were simple wire frame representations of the aircraft in
the horizontal and vertical planes.
7.2
Discussion
This vision system is able to achieve a high aircraft recognition rate (> 80%) provided that (a) the image intensity contrast is not extremely low in the regions of the aircraft wing edges and nose, (b) some of the aircraft line features are of sufficient length, and (c) the aircraft view is not so oblique that the wings are no longer clearly visible.
The first point needs further clarification: the contrast in the areas of interest (i.e., wings and nose) should not be much lower than the largest contrast recorded in the background.
The overall robustness of this aircraft recognition system is the result of the successful
integration of a number of features, which are summarised below.
1. Use of reasonably large edge detection templates and application of low dual
thresholds (as low as 16% and 10% of the peak edge gradient) for the purpose
of enhancing detection sensitivity to long straight edges of low gradients.
2. Extraction of long lines by extending shorter collinear lines; the objective of
this procedure is to join broken wing edges with reduced effect on lines in the
background.
3. Low level processing of the image background, including (a) identifying lines
predominantly oriented along one or two directions, and (b) removing dense
clutter pixels displaying random orientations.
4. Organisation of lines according to their significance, endpoint proximity and collinearity, in order to use them selectively and improve the system's robustness and computational efficiency.
5. The direct and indirect use of intensity information to supplement the geometric reasoning. This improves the system's capability to discard spurious hypotheses arising from clutter.
6. Validation and identification of the recognised aircraft via the pixel-level model
matching, which is applied to the phase image to reduce clutter interference.
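Robustness feature 2 above (extending shorter collinear lines into longer ones) can be sketched as follows. The angle and gap tolerances are illustrative assumptions, not the thesis values, and the perpendicular-offset check a full implementation would need is omitted for brevity.

```python
# Sketch of merging two nearly collinear segments into one longer line,
# as used to join broken wing edges. Tolerances are illustrative.
import math

def merge_collinear(seg_a, seg_b, angle_tol_deg=5.0, gap_tol=10.0):
    """Each segment is ((x1, y1), (x2, y2)). Returns merged segment or None."""
    def angle(seg):
        (x1, y1), (x2, y2) = seg
        return math.atan2(y2 - y1, x2 - x1) % math.pi  # undirected orientation
    da = abs(angle(seg_a) - angle(seg_b))
    da = min(da, math.pi - da)
    if math.degrees(da) > angle_tol_deg:
        return None                       # orientations differ too much
    # endpoint gap: closest pair of endpoints between the two segments
    gap = min(math.dist(p, q) for p in seg_a for q in seg_b)
    if gap > gap_tol:
        return None                       # segments too far apart to join
    # merged line spans the two most distant endpoints
    pts = list(seg_a) + list(seg_b)
    return max(((p, q) for p in pts for q in pts), key=lambda pq: math.dist(*pq))

print(merge_collinear(((0, 0), (10, 0)), ((14, 0.5), (30, 1))))
```

Applying this test only to line pairs that survive the saliency ranking (feature 4) keeps the number of candidate merges manageable.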
The first two robustness features address the problem of detecting weak edges in the image and forming longer lines. In the actual system implementation, all thresholds in the line extraction routines were relaxed to ensure that desirable wing edges are not missed. This requirement, however, causes a large number of unwanted extended lines to occur, and therefore leads to the emergence of a large number of spurious line groupings. These groupings, however, are most often discarded at the higher level of generic aircraft recognition.
The third robustness feature addresses the problem of a cluttered background interfering with the aircraft recognition process. By removing most or part of the clutter, we considerably reduce the number of edges in the image. This, in turn, reduces the overall number of straight lines and therefore improves the ranking of wing edges in terms of length (i.e., wing edges become significant in the image). Furthermore, we extend our background processing to longer polarised or grid background lines that may reduce the saliency of aircraft boundary lines. In this case, instead of removing the polarised/grid lines, we lower their saliency (significance) ranks in the line organisation/selection process (the fourth feature above), so that the desired line features are successfully accepted in the line grouping process. As explained in Chapter 3, this line selection process eventually leads to improvements in the system's performance robustness and computational efficiency.
We reiterate the importance of successfully detecting critical features (i.e., nose and wing edges) to ensure the generation of the aircraft hypothesis (i.e., wing-nose association) from them. Such robust lower-level processing is crucial in any bottom-up vision system. Therefore, the conditions and thresholds used in the low-level stages and line grouping generation of Chapter 3 are made forgiving. Furthermore, another version of the system was developed that incorporates neural networks in place of the feature extraction rules. This has been shown to reduce the computational load and the false alarm rate.
Our model matching method implements pixel-level contour matching, which allows image pixels missed by the line features to contribute to the final match score, hence improving the system's capability to discriminate between similarly shaped aircraft. The inherent clutter problem in pixel-level matching is overcome by disregarding any mapped image pixels that have conflicting phase values.
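The phase-based rejection of clutter pixels can be sketched in toy form: a mapped model pixel contributes to the score only if the observed edge phase (gradient orientation) at that pixel is compatible with the model's. The data structures and tolerance below are illustrative assumptions, not the thesis implementation.

```python
# Toy sketch of phase-aware pixel-level matching: clutter edges with
# conflicting phase are disregarded. Arrays and tolerance are illustrative.

def phase_match_score(model_pixels, image_phase, phase_tol=30.0):
    """model_pixels: {(row, col): expected_phase_deg};
    image_phase: {(row, col): observed_phase_deg} for edge pixels only."""
    matched = 0
    for px, expected in model_pixels.items():
        observed = image_phase.get(px)
        if observed is None:
            continue                      # no edge pixel mapped here
        diff = abs(expected - observed) % 360.0
        diff = min(diff, 360.0 - diff)    # wrap angular difference
        if diff <= phase_tol:
            matched += 1                  # phase-consistent contribution
    return 100.0 * matched / len(model_pixels)

model = {(0, 0): 90.0, (0, 1): 90.0, (1, 0): 0.0, (1, 1): 0.0}
image = {(0, 0): 85.0, (0, 1): 260.0, (1, 0): 5.0}   # (0, 1) is clutter
print(phase_match_score(model, image))               # 50.0
```

A conflicting-phase pixel thus simply fails to contribute, rather than corrupting the score, which is the behaviour described above.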
7.3
Bibliography
[1] N. Ayache and O.D. Faugeras, HYPER: A new approach for the recognition and positioning of 2-D objects, IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(1):44-54, January 1986.
[2] E. Bala and A.E. Cetin, Computationally efficient wavelet affine invariant functions for shape recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8):1095-1099, August 2004.
[3] D.H. Ballard, Generalizing the Hough transform to detect arbitrary shapes, in Real-Time Computer Vision, pages 714-725, 1987.
[4] M. Bennamoun, Edge detection: Problems and solutions, IEEE Transactions on Systems, Man, and Cybernetics, Computational Cybernetics and Simulation, 4:3164-3169, 1997.
[5] J.R. Beveridge, Local Search Algorithms for Geometric Object Recognition: Optimal Correspondence and Pose, PhD thesis, University of Massachusetts, Amherst, May 1993.
[6] J.R. Beveridge and E.M. Riseman, How easy is matching 2-D line models using local search?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6):564-579, June 1997.
[7] B. Bhanu and R.D. Holben, Model-based segmentation of FLIR images, IEEE Transactions on Aerospace and Electronic Systems, 26(1):465-491, 1998.
[8] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Inc., New York, USA, 1995.
[9] C. Bjorklund, M. Noga, E. Barrett, and D. Kuan, Lockheed imaging technology research for missiles and space, in Proceedings of the DARPA Image Understanding Workshop, pages 332-352, Palo Alto, CA, May 1989.
[10] M. Boldt, R. Weiss, and E.M. Riseman, Token-based extraction of straight lines, Transactions on Systems, Man and Cybernetics, 19(6):1581-1594, 1989.
[11] R.C. Bolles and R.A. Cain, Recognising and locating partially visible objects: The local feature focus method, International Journal of Robotics Research, 1(3):57-82, 1982.
[12] G. Borgefors, Distance transformations in arbitrary dimensions, Computer Vision, Graphics, and Image Processing, 27(3):321-345, September 1984.
[13] G. Borgefors, Hierarchical chamfer matching: A parametric edge matching algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):849-865, November 1988.
[14] R.D. Boyle and R.C. Thomas, Computer Vision: A First Course, Blackwell Scientific Publications, 1988.
[15] M.G. Breuers, Image-based aircraft pose estimation using moment invariants, in SPIE Conference on Automatic Target Recognition IX, pages 294-304, Orlando, Florida, April 1999.
[16] R.A. Brooks,
[27] C.H. Chien and J.K. Aggarwal, Shape recognition from single silhouettes, IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(3):481-490, 1981.
[28] M. Clark, A.C. Bovik, and W.S. Geisler, Texture segmentation using Gabor modulation/demodulation, Pattern Recognition Letters, 6:261-267, 1987.
[29] S. Climer and S.K. Bhatia, Local lines: A linear time line detector, Pattern Recognition Letters, 24(14):2291-2300, October 2003.
[30] R.W. Curwen and J.L. Mundy, Constrained symmetry exploitation, in Image Understanding Workshop, pages 775-781, 1998.
[31] R.W. Curwen, C.V. Stewart, and J.L. Mundy, Recognition of plane projective symmetry, in Proceedings of IEEE International Conference on Computer Vision, pages 1115-1122, 1998.
[32] S. Das and B. Bhanu, A system for model-based object recognition in perspective aerial images, Pattern Recognition, 31:465-491, 1998.
[33] S. Das, B. Bhanu, X. Wu, and R.N. Braithwaite, Qualitative Recognition of Aircraft in Perspective Aerial Images, chapter in Advanced Image Processing and Machine Vision, pages 475-517, Springer-Verlag, 1996.
[34] L.S. Davis and T.C. Henderson, Hierarchical constraint processes for shape analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(3):265-277, 1981.
[35] H. Derin and H. Elliott, Modelling and segmentation of noisy and textured images using Gibbs random fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(1):39-55, January 1987.
[36] S.A. Dudani, K.J. Breeding, and R.B. McGhee, Aircraft identification by moment invariants, IEEE Transactions on Computers, 26(1):39-46, January 1977.
[37] P.T. Fairney and D.P. Fairney, 3-D object recognition and orientation from single noisy 2-D images, Pattern Recognition Letters, 17(7):785-793, June 1996.
[38] S.A. Friedberg, Finding axis of skewed symmetry, in Proceedings of IEEE International Conference on Pattern Recognition, pages 322-325, 1984.
[39] K.S. Fu, Syntactic pattern recognition and applications, Pattern Recognition, 12(6):431-441, 1980.
[40] K.S. Fu, Syntactic Pattern Recognition and Applications, Prentice Hall, New Jersey, 1982.
[41] D. Gavrila, Multi-feature hierarchical template matching using distance transforms, in Proceedings of IEEE International Conference on Pattern Recognition, 1998.
[42] D. Gavrila and V. Philomin, Real-time object detection for smart vehicles, in Proceedings of IEEE International Conference on Computer Vision, pages 87-93, 1999.
[43] D.M. Gavrila and F.C.A. Groen, 3-D object recognition from 2-D images using geometric hashing, Pattern Recognition Letters, 13(4):263-278, April 1992.
[44] T. Glais and A. Andre, Image-based air target identification, in SPIE Conference on Applications of Digital Image Processing XVII, volume 2298, 1994.
[45] J.W. Gorman, O.R. Mitchell, and F.P. Kuhl, Partial shape recognition using dynamic programming, IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(2):257-266, 1988.
[55] J.W. Hsieh, J.M. Chen, C.H. Chuang, and K.C. Fan, Novel aircraft type recognition with learning capabilities in satellite images, in Proceedings of IEEE International Conference on Image Processing, pages III:1715-1718, 2004.
[56] J.W. Hsieh, J.M. Chen, C.H. Chuang, and K.C. Fan, Aircraft type recognition in satellite images, in IEE Proceedings on Vision, Image and Signal Processing, volume 152, pages 307-315, June 2005.
[57] M.K. Hu, Visual pattern recognition by moment invariants, IRE Transactions on Information Theory, 8:179-187, February 1962.
[58] D.P. Huttenlocher, Monte Carlo comparison of distance transform based matching measures, in Proceedings of the DARPA Image Understanding Workshop, pages 1179-1184, 1997.
[59] D.P. Huttenlocher, G.A. Klanderman, and W.J. Rucklidge, Comparing images using the Hausdorff distance, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9):850-863, September 1993.
[60] D.P. Huttenlocher and S. Ullman, Recognizing solid objects by alignment with an image, International Journal of Computer Vision, 5(2):211, 1990.
[61] J. Illingworth and J.V. Kittler, A survey of the Hough transform, Computer Vision, Graphics, and Image Processing, 44(1):87-116, October 1988.
[62] Q. Iqbal and J.K. Aggarwal, Retrieval by classification of images containing large manmade objects using perceptual grouping, Pattern Recognition, 35(7):1463-1479, July 2002.
[63] J.H. Jang and K.S. Hong, Fast line segment grouping method for finding globally more favorable line segments, Pattern Recognition, 35(10):2235-2247, October 2002.
[64] H. Kalviainen, P. Hirvonen, L. Xu, and E. Oja, Probabilistic and non-probabilistic Hough transforms: Overview and comparisons, Image and Vision Computing, 13(4):239-252, May 1995.
[65] B. Kamgar-Parsi, B. Kamgar-Parsi, and A.K. Jain, Automatic aircraft recognition: Toward using human similarity measure in a recognition system, in Proceedings of IEEE Computer Vision and Pattern Recognition, pages I:268-273, 1999.
[66] B. Kamgar-Parsi, B. Kamgar-Parsi, A.K. Jain, and J.E. Dayhoff, Aircraft detection: A case study in using human similarity measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(12):1404-1414, December 2001.
[67] Z. Kim and R. Nevatia, Automatic description of complex buildings from multiple images, Computer Vision and Image Understanding, 96(1):60-95, October 2004.
[68] H. Kollnig and H.H. Nagel, 3-D pose estimation by directly matching polyhedral models to gray value gradients, International Journal of Computer Vision, 23(3):283-302, June 1997.
[69] P. Kuhl and C. Giardina, Elliptic Fourier features of a closed contour, Computer Graphics and Image Processing, 18:236-258, 1982.
[70] R. Kumar and A. Hanson, Robust estimation of camera location and orientation from noisy data having outliers, in Proceedings of IEEE Workshop on Interpretation of 3D Scenes, pages 52-60, 1989.
[71] D. Lagunovsky and S. Ablameyko, Straight-line-based primitive extraction in grey-scale object recognition, Pattern Recognition Letters, 20(10):1005-1014, October 1999.
[72] Y. Lamdan, J.T. Schwartz, and H.J. Wolfson, On recognition of 3-D objects from 2-D images, in Proceedings of IEEE International Conference on Robotics and Automation, pages 1407-1413, 1988.
[73] Y. Lamdan and H.J. Wolfson, Geometric hashing: A general and efficient model-based recognition scheme, in Proceedings of IEEE International Conference on Computer Vision, pages 238-249, 1988.
[74] Y. Lamdan and H.J. Wolfson, On the error analysis of geometric hashing, in Proceedings of IEEE Computer Vision and Pattern Recognition, pages 22-27, 1991.
[75] V.F. Leavers, Survey: Which Hough transform?, Computer Vision, Graphics, and Image Processing, 58(2):250-264, September 1993.
[76] C. Lin, A. Huertas, and R. Nevatia, Detection of buildings using perceptual groupings and shadows, in USC Computer Vision, 1994.
[77] C. Lin and R. Nevatia, Building detection and description from a single intensity image, Computer Vision and Image Understanding, 72(2):101-121, 1998.
[78] H.C. Liu and M.D. Srinath, Corner detection from chain-code, Pattern Recognition, 23:51-68, 1990.
[79] D.G. Lowe, Three-dimensional object recognition from single two-dimensional images, Artificial Intelligence, 31(3):355-395, March 1987.
[80] G. Marola, Using symmetry for detecting and locating objects in a picture, Computer Vision, Graphics, and Image Processing, 46(2):179-195, May 1989.
[89] J.L. Mundy and A.J. Heller, The evolution and testing of a model-based object recognition system, in Proceedings of IEEE International Conference on Computer Vision, pages 268-282, 1990.
[90] P.F.M. Nacken, A metric for line segments, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(12):1312-1318, December 1993.
[91] H. Nasr, B. Bhanu, and S. Lee, Refocused recognition of aerial photographs at multiple resolution, Proceedings SPIE International Conference on Aerospace Pattern Recognition, 1098:198-206, 1989.
[92] R. Nevatia and A. Huertas, Knowledge-based building detection and description: 1997-1998, in Proceedings of the DARPA Image Understanding Workshop, pages 469-478, 1998.
[93] R. Nevatia, C. Lin, and A. Huertas, A system for building detection from aerial images, in A. Grün, E.P. Baltsavias, and O. Henricsson, editors, Automatic Extraction of Man-Made Objects from Aerial and Space Images (II), Birkhäuser, Basel, pages 77-86, 1997.
[94] R. Nevatia and R. Babu, Linear feature extraction and description, Computer Vision, Graphics, and Image Processing, 13:257-269, 1980.
[95] S. Noronha and R. Nevatia, Detection and description of buildings from multiple aerial images, in Proceedings of IEEE Computer Vision and Pattern Recognition, pages 588-594, 1997.
[96] T. Ojala, M. Pietikainen, and D. Harwood, A comparative study of texture measures with classification based on feature distributions, Pattern Recognition, 29(1):51-59, January 1996.
[97] C.F. Olson, Efficient pose clustering using a randomized algorithm, International Journal of Computer Vision, 23(2):131-147, June 1997.
[98] C.F. Olson and D.P. Huttenlocher, Automatic target recognition by matching oriented edge pixels, IEEE Transactions on Image Processing, 6(1):103-113, January 1997.
[99] D.W. Patterson, Artificial Neural Networks, Prentice Hall, Singapore, 1996.
[100] A.P. Pentland, Fractal-based description of natural scenes, IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 661-674, 1984.
[101] P. Perona and J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):629-639, July 1990.
[102] W.K. Pratt, Digital Image Processing, John Wiley and Sons, Inc., 2nd edition, 1991.
[103] A.P. Reeves, R.J. Prokop, S.E. Andrews, and F.P. Kuhl, Three-dimensional shape analysis using moments and Fourier descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):937-943, November 1988.
[104] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, 1982.
[105] S.M. Smith and J.M. Brady, SUSAN: A new approach to low-level image processing, International Journal of Computer Vision, 23(1), May 1997.
[106] A.A. Somaie, A. Badr, and T. Salah, Aircraft recognition system using back-propagation, in Proceedings of the 2001 CIE International Conference on Radar, pages 498-501, 2001.
[107] C.T. Steger, Similarity measures for occlusion, clutter, and illumination invariant object recognition, in Proceedings of the 23rd DAGM-Symposium on Pattern Recognition, pages 148-154, Springer-Verlag, London, UK, 2001.
[108] C.T. Steger, Occlusion, clutter, and illumination invariant object recognition, in Proceedings of Photogrammetric Computer Vision, page A:345, 2002.
[109] F. Stein and G.G. Medioni, Graycode representation and indexing: Efficient two dimensional object recognition, in Proceedings of IEEE International Conference on Pattern Recognition, pages I:13-17, 1990.
[110] F. Stein and G.G. Medioni, Recognition of 3-D objects from 2-D groupings, in Proceedings of the DARPA Image Understanding Workshop, 1992.
[111] F. Stein and G.G. Medioni, Structural indexing: Efficient two dimensional object recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(12):1198-1204, December 1992.
[112] G. Stockman, S. Kopstein, and S. Benett, Matching images to models for registration and object detection via clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 4(3):229-241, 1982.
[113] G.Y. Tang and T.S. Huang, Using the creation machine to locate airplanes on aerial photos, Pattern Recognition, 12(6):431-441, 1980.
[114] E. Thiel and A. Montanvert, Chamfer masks: Discrete distance functions, geometrical properties, and optimization, in Proceedings of IEEE International Conference on Pattern Recognition, pages 244-247, 1992.
[115] S.C. Tien, T.L. Chia, and Y. Lu, Using cross-ratios to model curve data for aircraft recognition, Pattern Recognition Letters, 24(12):2047-2060, August 2003.
[116] D.H. Titterton and J.L. Weston, Strapdown Inertial Navigation Technology, IEE Radar, Sonar, Navigation and Avionics Series 5, Peter Peregrinus Ltd., 1997.
[117] F. Tomita and S. Tsuji, Computer Analysis of Visual Textures, Kluwer Academic Publishers, Norwell, MA, USA, 1990.
[118] F.C.D. Tsai, Geometric hashing with line features, Pattern Recognition, 27(3):377-389, March 1994.
[119] M. Tuceryan and A.K. Jain, Texture segmentation using Voronoi polygons, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(2):211-216, February 1990.
[120] M. Tuceryan and A.K. Jain, Texture analysis, in The Handbook of Pattern Recognition and Computer Vision (2nd Edition), pages 207-248, 1998.
[121] V. Venkateswar and R. Chellappa, Extraction of straight lines in aerial images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(11):1111-1114, November 1992.
[122] T.P. Wallace, O.R. Mitchell, and K. Fukunaga, Three dimensional shape analysis using local shape descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(3):310-323, 1981.
[123] T.P. Wallace and P.A. Wintz, An efficient three-dimensional aircraft recognition algorithm using Fourier descriptors, Computer Graphics and Image Processing, 13(1):99-126, May 1980.
[124] L. Wan and L. Sun, Automatic target recognition using higher order neural network, in National Aerospace and Electronics Conference (NAECON), pages 221-226, 1996.
[125] M.J.J. Wang, W.Y. Wu, L.K. Huang, and D.M. Wang, Corner detection using bending value, Pattern Recognition Letters, 16(6):575-583, June 1995.
[126] H.J. Wolfson, Model-based object recognition by geometric hashing, in Proceedings of European Conference on Computer Vision, pages 526-536, 1990.
[127] H.J. Wolfson and Y. Lamdan, Geometric hashing: A general and efficient model-based recognition scheme, in Proceedings of IEEE International Conference on Computer Vision, pages 238-249, 1988.
[128] F. Xu, X. Niu, and R. Li, Automatic recognition of civil infrastructure objects using Hopfield neural network, Geographic Information Science, 9(1-2):78-89, December 2003.
[129] S.C. Zhu and A. Yuille, Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(9):884-900, September 1996.
Appendix A
Description of Input Parameters to
the Neural Networks Feature
Extractors
This appendix lists the inputs to the neural networks that are designed to detect line groupings and aircraft hypotheses. The input parameters are briefly described, with references to the relevant rules and figures in Sections 3.5-3.7.
Feature parameters for wing candidates (refer to Section 3.5.1 and Figure 3.17):
1. mean(li , lj )/lmedian : lengths must be significant (see condition 1).
2. ti : (see condition 2 and Figure 3.16).
3. tj : (see condition 2 and Figure 3.16).
4. apart: (see condition 3 and Figure 3.16).
5. overlap(%): extent of rotational edge overlap as shown in Figure 3.17 (d), (see
condition 4).
6. min(li , lj )/max(li , lj ): (see condition 5 and Figure 3.17 (e)).
7. C : (see condition 7).
It should be noted that condition 8, regarding the intensity distribution between the two lines, is not included here. Merging the geometry and intensity parameters and feeding them to the neural networks did not improve the performance of the network. Therefore, the rule-based approach is kept for the intensity check.
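As an illustration of how such parameters could feed a network, the sketch below packs the seven wing-candidate geometry values listed above into one input vector and passes it through a single sigmoid unit. The helper names and the weights are assumptions for illustration, not the trained thesis network.

```python
# Illustrative sketch: assemble the wing-candidate geometry parameters into
# an input vector and evaluate a stand-in single-unit classifier.
# Weights and helper names are assumptions, not the trained thesis network.
import math

def wing_candidate_features(li, lj, l_median, ti, tj, apart, overlap, c):
    """Pack the seven geometric parameters into one input vector."""
    return [
        ((li + lj) / 2.0) / l_median,      # 1. mean length vs median line length
        ti, tj,                            # 2-3. ti, tj (condition 2)
        apart,                             # 4. apart (condition 3)
        overlap,                           # 5. rotational edge overlap (%)
        min(li, lj) / max(li, lj),         # 6. length ratio (condition 5)
        c,                                 # 7. C (condition 7)
    ]

def mlp_output(x, weights, bias):
    """Single sigmoid unit standing in for the trained classifier."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

x = wing_candidate_features(li=40, lj=35, l_median=20, ti=0.3, tj=0.4,
                            apart=0.5, overlap=0.8, c=1.0)
print(mlp_output(x, weights=[0.5, -1, -1, -0.5, 1, 1, 0.2], bias=-1.0))
```

The output, between 0 and 1, would then be thresholded to accept or reject the wing candidate, replacing the corresponding rule set.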
Feature parameters for nose candidates (refer to Section 3.5.2 and Figure 3.19):
1. lL /lmedian : (see condition 1 in the first round rule set and Figure 3.19).
2. tL : (see condition 2 in the first round check).
3. tS : (see condition 2 in the first round check).
4. gij : (see condition 3 in the first round check).
5. overlap(%): (see condition 4 in the first round check).
6. lS /lL : (see condition 5 in the first round check).
7. N : (see condition 7 in the first round check).
8. third line status (-2 to +2): used in the second round check (see conditions
1-3).
9. ave intensity: used in the second round check (see condition 4).
10. line count: extra information to indicate how cluttered the scene is.
Feature parameters for wing-pair candidates (refer to Section 3.6 and Figure
3.26):
1. Wmin /Wmax : ratio of wing weights (sums of edge lengths) (see condition 1).
2. ∠LC: left wing angle (see condition 2).
3. ∠RC: right wing angle (see condition 2).
4. ‖LC − RC‖: wing span (see condition 3).
5. ‖FP − PT1‖/‖PT2 − PT1‖: how far the left leading wing edge is from the intersection point FP (see condition 4).
6. ‖FP − PT3‖/‖PT4 − PT3‖: how far the right leading wing edge is from the intersection point FP (see condition 4).
7. ‖RP − PT5‖/‖PT6 − PT5‖: how far the left trailing wing edge is from the intersection point RP (see condition 4).
8. ‖RP − PT7‖/‖PT8 − PT7‖: how far the right trailing wing edge is from the intersection point RP (see condition 4).
9. θ1 : θ1 -θ4 are used to determine the arrangement of the two wings (see condition 5 and Figure 3.26 (a-b)).
10. θ2 : (see condition 5 and Figure 3.26 (a-b)).
11. θ3 : (see condition 5 and Figure 3.26 (a-b)).
12. θ4 : (see condition 5 and Figure 3.26 (a-b)).
13. θF : (see condition 6).
14. θR : (see condition 6).
15. type: [boomerang, diamond, triangle] determined by geometric rules.
16. wing ave intensity gap/image intensity range: the two wings must have compatible mean intensities.
17. ∠(FP − RP, FP − M): angular deviation in the symmetry check (see condition 7).
Feature parameters for aircraft candidates (refer to Section 3.7 and Figure 3.31):
1. shape: [boomerang, diamond, triangle] determined during the wing-pair detection process.
2. ‖C − FP‖/‖FP − RP‖: location of the nose corner, C, in the longitudinal direction (see condition 2).
3. θs : lateral angular extent of the nose search region (see condition 2).
4. ‖C − M‖/‖LC − RC‖: (see condition 2).
The first and the last two parameters are included because the remaining parameters are interpreted relative to them.