
University of Wollongong Thesis Collection
University of Wollongong
Year: 2005

Automatic aircraft recognition and identification
Jijoong Kim
University of Wollongong

Kim, Jijoong, Automatic aircraft recognition and identification, PhD thesis, School of
Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2005.
http://ro.uow.edu.au/theses/499

This paper is posted at Research Online.
http://ro.uow.edu.au/theses/499

NOTE
This online version of the thesis may have different page formatting and pagination
from the paper copy held in the University of Wollongong Library.

UNIVERSITY OF WOLLONGONG
COPYRIGHT WARNING
You may print or download ONE copy of this document for the purpose of your own research or
study. The University does not authorise you to copy, communicate or otherwise make available
electronically to any other person any copyright material contained on this site. You are
reminded of the following:
Copyright owners are entitled to take legal action against persons who infringe their copyright. A
reproduction of material that is protected by copyright may be a copyright infringement. A court
may impose penalties and award damages in relation to offences and infringements relating to
copyright material. Higher penalties may apply, and higher damages may be awarded, for
offences and infringements involving the conversion of material into digital or electronic form.

AUTOMATIC AIRCRAFT RECOGNITION AND IDENTIFICATION

by

JIJOONG KIM
B.Eng. (Hons) (The University of Adelaide) 1993
M.Eng.Sc. (The University of Adelaide) 1995

School of Electrical, Computer and Telecommunications Engineering

A thesis submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
from
UNIVERSITY OF WOLLONGONG
August, 2005

© Copyright by Jijoong Kim, 2005

Certification

I, Jijoong Kim, declare that this thesis, submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy, in the School of Electrical, Computer
and Telecommunications Engineering, University of Wollongong, is wholly my own
work unless otherwise referenced or acknowledged. The document has not been submitted for qualifications at any other academic institution.

Signature of Author

Date


Table of Contents

Table of Contents . . . iii

List of Tables . . . vi

List of Figures . . . ix

Abstract . . . xxi

Acknowledgements . . . xxiii

1 Introduction . . . 1
  1.1 Design Objectives . . . 2
  1.2 Definitions and Basic Assumptions . . . 4
  1.3 System Description . . . 6
  1.4 Contributions of the Thesis . . . 7
  1.5 Outline of the Thesis . . . 9

2 Aircraft Recognition Techniques: A Review . . . 15
  2.1 Syntactic/Semantic Grammar Techniques . . . 16
  2.2 Global Matching Techniques . . . 20
      2.2.1 Moment Invariant Techniques . . . 20
      2.2.2 Fourier Descriptor Techniques . . . 22
  2.3 Local Matching Techniques . . . 24
      2.3.1 Pose Clustering . . . 25
      2.3.2 Alignment . . . 27
      2.3.3 Geometric Hashing . . . 27
      2.3.4 Particular Systems . . . 30
  2.4 Knowledge-Based Vision Systems . . . 37
      2.4.1 COBIUS . . . 38
      2.4.2 ACRONYM . . . 40
      2.4.3 TRIPLE System . . . 43
      2.4.4 Das and Bhanu . . . 45

3 Feature Extraction and Generation of Aircraft Hypothesis . . . 51
  3.1 Review of Line Extraction Methods . . . 52
  3.2 Proposed Edge Detection . . . 54
  3.3 Clutter Rejection and Contour Extraction . . . 58
  3.4 Line Extraction and Organisation . . . 65
      3.4.1 Linear Approximation of Contours . . . 65
      3.4.2 Extension of Collinear Lines . . . 66
      3.4.3 Line Significance . . . 72
      3.4.4 Line Description . . . 72
      3.4.5 Polarised Lines and Grid Lines . . . 73
      3.4.6 Endpoint Proximity Line Linking . . . 76
  3.5 Two-Line Grouping . . . 78
      3.5.1 Detection of Wing Candidates . . . 79
      3.5.2 Detection of Nose Candidates . . . 84
      3.5.3 Two-line Grouping Organisation . . . 89
  3.6 Four-Line Grouping . . . 91
  3.7 Generation of Aircraft Hypothesis . . . 94
  3.8 Neural Networks for Extracting Line-Groupings and Aircraft Hypotheses . . . 100
      3.8.1 Configuration of the Neural Networks . . . 101
      3.8.2 Analysis of the Neural Networks . . . 105
  3.9 Discussion . . . 107

4 Generic Aircraft Recognition . . . 109
  4.1 Evidence Accumulation . . . 110
      4.1.1 Fuselage Detection . . . 110
      4.1.2 Detection of Tail Fins . . . 115
      4.1.3 Wingtip Edge Detection . . . 119
      4.1.4 Additional Evidence Accumulation . . . 121
  4.2 Interpretational Conflict Resolution . . . 129
  4.3 Shadow Removal Process . . . 133
  4.4 Evidence Score Optimisation . . . 137
  4.5 Experimental Results of the Selected Aircraft Images Shown in Section 1.1 . . . 141
  4.6 Experimental Results from Non-Aircraft Images . . . 145
  4.7 Discussion . . . 147

5 Aircraft Pose Estimation and Identification . . . 157
  5.1 Review of Matching Metrics . . . 159
      5.1.1 Integrated Squared Perpendicular Distance . . . 159
      5.1.2 Distance Ratio Standard Deviations . . . 161
      5.1.3 Circular Distribution of Matched Pixels . . . 163
      5.1.4 Distance Transform . . . 164
      5.1.5 Hausdorff Distance . . . 167
      5.1.6 Averaged Dot Product of Contour Direction Vectors . . . 169
      5.1.7 Discussions on Fitting Metrics . . . 170
  5.2 Model Generation and Pose Estimation . . . 173
  5.3 Model and Image Alignment . . . 178
  5.4 Proposed Fitting Metrics . . . 182
  5.5 Fitting Optimisation and Best Match Finding . . . 185
  5.6 Model Matching Results . . . 193
  5.7 Summary . . . 195

6 Performance Analysis . . . 203
  6.1 Implementation . . . 204
  6.2 Computational Complexity . . . 205
  6.3 Generic Recognition Performance . . . 213
  6.4 Matching Performance . . . 217
  6.5 Qualitative Performance Comparisons with Other Systems . . . 220
  6.6 Concluding Remarks . . . 221

7 Conclusions . . . 223
  7.1 Summary . . . 223
  7.2 Discussion . . . 224
  7.3 Suggestions for Future Work . . . 227

A Description of Input Parameters to the Neural Networks Feature Extractors . . . 229

Bibliography . . . 233

List of Tables

1.1 Simplified representation of aircraft domain knowledge. . . .
2.1 Example of semantic information attached to production rules. . . . 18
2.2 An example of a 2-D table for efficient pose clustering. The resolutions (bin widths) for s, θ, Δx and Δy are 0.2, 20, 5 and 5 respectively. . . . 34
3.1 Line description. . . . 73
3.2 The neural network configurations and the mean error rates in detection of wings, noses, wingpairs and aircraft hypotheses. . . . 104
3.3 Test of the neural networks on the spurious features that survived the rule-based approach. As shown in the third column, 30-40% of those features are successfully rejected by the neural networks. . . . 107
4.1 Scores obtained in the process of aircraft evidence accumulation. The first 6 scores are dedicated to the aircraft part detection, and the remaining evidences (the 7th-18th entries) are introduced in order to help distinguish between the aircraft and clutter hypotheses. . . . 139
6.1 Comparison of the total number of lines with and without the use of the clutter removal algorithm for images with dense clutter. . . . 205
6.2 Computational complexity of aircraft recognition and identification processes. . . . 210
6.3 Performance evaluation using real aircraft and clutter images. . . . 216
6.4 Recognition rates in the eight imaging categories. Note that for the multiple aircraft category, the denominator 42 is the total count of aircraft in 17 multiple aircraft images. . . . 216
6.5 Matching performance parameters. . . . 219
6.6 Performance expectations of other methods such as M.I (moment invariant) [36, 15], F.D (Fourier Descriptor) [123], Das et al. [32], and Hsieh et al. [56], under different imaging conditions. The question mark means maybe. . . . 221

List of Figures

1.1 Real aircraft images with blurring and noise. . . . 10
1.2 Real aircraft images with camouflage. . . . 10
1.3 Real aircraft images in clutter background. . . . 10
1.4 Multiple aircraft in the image. . . . 10
1.5 Partly occluded aircraft images. . . . 11
1.6 Aircraft with protrusions - engine protrusions for (a) and (b), and missile protrusions for (c) and (d). . . . 11
1.7 Aircraft with shadows - shadows on aircraft shown in (a) and (b), background shadow cast by aircraft shown in (c) and (d). . . . 11
1.8 Functional flow diagram. . . . 12
1.9 Feature hierarchy for generation of an aircraft hypothesis. The aircraft hypothesis (nose-wingpair association) is at the top. The lower level features are four-line groupings, two-line groupings and lines. By using pointers, the system can access any low level feature of hypothesis, Hi. . . . 13
2.1 (a) Aircraft represented by its skeleton, (b) primitives, (c) structure generated by using a string grammar, and (d) the skeleton that can be generated by the grammar L(G) = {abc^n d | n ≥ 1}. . . . 17
2.2 The projected angles determine the rotation (pitch and roll) of the model vertex-pair projected onto the image plane. . . . 30
2.3 Framework of knowledge/model based aircraft recognition. . . . 38
2.4 COBIUS image understanding architecture [9]. . . . 39
2.5 Generalised Cylinder representation of an aircraft and the projected images in terms of ribbons and ellipses. . . . 42
2.6 Multi-strategy machine learning approach for aircraft target recognition. . . . 44
2.7 Framework of the qualitative object recognition system [33, 32]. . . . 46
2.8 Convexity test on a line pair. For any two lines, Li and Lj, we determine two extra lines (green dashed) by joining the end points of Li and Lj. If these lines are contained in the segmented region (shaded) then the convexity test is passed. . . . 47
2.9 3 or 4-line grouping process to generate symbolic aircraft features. The shaded circles represent the proximal region of independently detected corners. Any group of three lines (on the left) must satisfy the following conditions: (i) the two lines, Li and Lj, are non-parallel, (ii) the third line, Lk, is in between Li and Lj, (iii) the line intersections occur near independently detected corners, and (iv) the third line, Lk, is shorter than at least one of Li and Lj. In addition, a group of four lines (on the right) must satisfy the following conditions: (i) the two lines, Li and Lj, are non-parallel, and the other two, Lh and Lk, are parallel, (ii) the parallels form the opposite sides of the trapezoid, (iii) the line intersections occur near the detected corners, and (iv) the parallel lines, Lh and Lk, are shorter than the non-parallels, Li and Lj. . . . 48
3.1 Eight directional edge masks in angular steps of 22.5 degrees. These edge masks have an elongated rectangular shape to detect long weak edges. . . . 56
3.2 Sliding search window used to detect dense clutter. The pixels in all of the four quadrants need to be dense and randomly oriented if the region under the window is to be tagged as clutter. . . . 60
3.3 Detection of randomly oriented dense clutter regions. The clutter regions are shaded. The clutter-aircraft borders are correctly included in the non-clutter region so that the wing edges can be extracted. . . . 61
3.4 Results of the dense clutter removal process. The first column shows the original images, the second column the edge images prior to the clutter removal algorithm, and the third column the edge images after clutter removal. . . . 62
3.5 Results of the dense clutter removal process (continued). The first column shows the original images, the second column the edge images prior to the clutter removal algorithm, and the third column the edge images after clutter removal. . . . 63
3.6 Contour labelling process. The current pixel searches for a contour pixel to inherit the label from. The direction of search is defined by the orientation of the current pixel. . . . 64
3.7 If a contour has at least 30% of its pixels in the non-clutter region, the contour is accepted. . . . 65
3.8 Straight line extraction process, similar to that of Lowe [79]. This algorithm generates a line approximation which is visually plausible. . . . 66
3.9 Line representation. Note that the symbol # represents a number. . . . 66
3.10 Generation of an extended line - gap width, angular deviations and length differences form the basis to extend the lines. Note that these two lines Li and Lj are not removed from the line database. They are used later in the line-grouping and evidence collection processes. . . . 69
3.11 Intensity means collected in the vicinity of the line pair. The intensity information is used to supplement the line extension decision. . . . 71
3.12 (a) Line features prior to the line extension process (b) Line extension and prioritisation outcome - extended lines (red dotted line), significant lines (blue), and non-significant lines (green). . . . 71
3.13 Histograms of the line orientations are shown in the right column. The images in the left column show clutter lines that are predominantly oriented along one or two directions. . . . 75
3.14 Forming a line link based on the endpoint proximity property is shown in (a), and a recursive line search to check if two lines are linked via a line chain is shown in (b). . . . 77
3.15 A wide variety of nose shapes and intensities. . . . 80
3.16 Two-line grouping process. . . . 81
3.17 Wing candidate detection conditions - examples of accepted cases (a) and (d), and commonly arising failed cases (shown in red lines). . . . 82
3.18 Gradient distribution curve for the region enclosed by a two-line grouping. To pass the intensity check, the 10%, 20%, 30% percentiles must be less than preset thresholds (ie., the majority of the population must be on the left corner). . . . 83
3.19 A typical nose configuration. . . . 84
3.20 Incorrect nose configurations in (a), (g), (l) are subject to further verification. Resulting accepted and rejected configurations are shown in blue and red, respectively. . . . 85
3.21 Any nose candidate in close proximity to the image borderlines, which is oriented in such a way that a large portion of its projected silhouette is placed outside the image borderlines. . . . 86
3.22 Location of the nose tip. If the nose tip is not visible, then its location is estimated as the midpoint between the nose edges' intersection and the midpoint of the nose edges' inner endpoints. . . . 86
3.23 Multiple two-line grouping configurations generated from a single physical nose. . . . 87
3.24 Wing/Nose Representation. Leg1 and Leg2 are the two lines forming the two-line grouping. Note that the symbol # refers to a number. . . . 90
3.25 Resulting wing and nose candidates from the two-line grouping process on the image of Figure 3.4(a). In (a), line pairs are shown in blue, and red lines are used to show which two lines are paired. [(b) 80 nose candidates and (c) 513 wing candidates]. . . . 90
3.26 Formation of four-line groupings (wing-pair candidates). Commonly encountered failed configurations are shown as red lines in (b). . . . 93
3.27 Three point collinearity property both in space and in the image. . . . 94
3.28 Four-line grouping representation. The two slots right and left wing hold the wing numbers which form the wing-pair. Note that the symbol # refers to a number. . . . 95
3.29 Extraction of multiple wing-pairs due to wing edge fragmentation. This figure shows 3 possible boomerang wing pairs arising from one wing pair, one of whose edges contains 2 segments. . . . 95
3.30 Resulting wingpair candidates from the four-line grouping process on the image in Figure 3.4(a). The blue lines are constituent lines of four-line groupings. Red and green lines are introduced to show how the blue lines are grouped together. [(b) triangle wing candidates, (c) diamond wing candidates, and (d) boomerang wing candidates]. . . . 96
3.31 Nose to wing-pair matching. The nose must be within the search region, must be facing the wing-pair, and the skewness must not be severe. . . . 99
3.32 In the feature parameter space (2-D for illustrative purposes) the blue circles represent aircraft feature parameters and the red squares represent clutter feature parameters. (a) Use of single thresholds forms simple decision boundaries that pass many clutter features, and (b) the neural networks can generate complex shaped decision boundaries. . . . 101
3.33 Plot of the log-sigmoid function. . . . 102
3.34 ROC curves for detection of (a) wings, (b) noses, (c) wing-pairs and (d) aircraft hypotheses. . . . 106
4.1 Typical commercial and military aircraft, and the parts that need to be detected for evidence score accumulation. . . . 111
4.2 Detection of fuselage edges and assessment of their coverage. . . . 113
4.3 Scale factor (fL or fR) which is inversely proportional to the divided angular width of the fuselage search region, expressed in terms of (C FP, C PL) and (C FP, C PR). . . . 115
4.4 The detected fuselage boundary lines connect the nose to the wing leading edges via connected chains. Such a nose-to-wing connection provides the strong fuselage boundary evidence. . . . 116
4.5 Locating tail fin edge lines: (a) geometric constraints in terms of location, length and orientation, (b) intensity-based constraints applied both in the foreground and background regions, (c) skewed symmetry constraints applied to tail fin leading edges (ie., cot θ1 + cot θ2 = cot θ1' + cot θ2'). . . . 118
4.6 Detection of wingtip edges. . . . 120
4.7 The wing leading edges must overlap when rotated about FP. The overlapping portion is shown in red. The same rule applies to the trailing edges of the wing-pair. . . . 122
4.8 Regions of interest for intensity level comparisons. The differences of the mean intensity values between each pair of regions (F1 and F2), (R1 and R2) and (M1 and M2) are expected to be small. . . . 123
4.9 The background intensity is computed from the shaded periphery region. We assume this periphery region contains mainly the background. . . . 124
4.10 Background intensity histograms obtained from the shaded perimeter region (refer to Figure 4.9) of aircraft images with different clutter levels: (a) clean, (b) light clutter, and (c) heavy clutter. PM is the count of pixels in the bin corresponding to the peak, and PT is the total pixel count in the histogram. The ratio PM/PT roughly indicates the clutter level. . . . 124
4.11 Finding of rear fuselage lines and clutter lines: potential rear fuselage edges for a boomerang shaped wing-pair are detected between the wing trailing edges, and are shown in blue. Detection of many lines crossing the gap between the wing edges' inner point (eg., PT2) and the fuselage axis weakens the confidence of the hypothesis. Clutter lines are shown in red. . . . 125
4.12 A spurious aircraft hypothesis coincidentally generated from dense clutter is likely to contain many clutter lines in the hypothetical fuselage region. . . . 125
4.13 Clutter evidence score plot as a function of the clutter count. If the clutter count within the fuselage region (refer to Figure 4.12) exceeds 7, then the score becomes negative. . . . 126
4.14 Deviation of FP from the fuselage axis. Any aircraft with coplanar wings and fuselage will display a small deviation value. Spurious hypotheses usually show larger values; therefore this parameter is used in the interpretational conflict resolution process. . . . 127
4.15 Intensity comparisons between regions R1 and R2. A spurious aircraft hypothesis, often generated as a wing-fuselage combination, will show a large intensity difference between the two regions. . . . 128
4.16 Spurious hypothesis which is accidentally formed where three or more wings are the extended lines of clutter edges. . . . 129
4.17 Aircraft-hypothesis representation. The two slots Killed and Killed by are used during the interpretational conflict resolution process. The slot Weight contains the sum of the four line lengths. Note that the symbol # refers to a number. . . . 130
4.18 Commonly encountered scenarios of interpretational conflicts due to part sharing. Incorrect wing edges in the spurious wing candidates are shown in red. . . . 132
4.19 Shadow regions cast by wings ((a) are mostly covered by the wings, or (b) are separated from the wings). The shadow wings have their symmetry axis roughly aligned with the aircraft fuselage axis. . . . 134
4.20 Interpretational conflicts arising from shadow cast by the wings. . . . 135
4.21 Examples of some competing aircraft candidates. Green lines correspond to nose legs, red lines to wing edges and tips, blue line to fuselage axis, and cyan line to wing symmetry axis. . . . 138
4.22 Histogram of the fuselage coverage score of the winning hypotheses using a sample base of 300 real aircraft images. . . . 140
4.23 Blur image 1, Score = 757. . . . . . . . . . . . . . . . . . . . . . . . . 149
4.24 Blur image 2, Score = [879 510]. . . . . . . . . . . . . . . . . . . . . . 149
4.25 Blur image 3, Score = 737. . . . . . . . . . . . . . . . . . . . . . . . . 149
4.26 Blur image 4, Score = 647. . . . . . . . . . . . . . . . . . . . . . . . . 149
4.27 Camouflage 1, Score = 810. . . . . . . . . . . . . . . . . . . . . . . . 150
4.28 Camouflage 2, Score = 690. . . . . . . . . . . . . . . . . . . . . . . . 150
4.29 Camouflage 3, Score = 612. . . . . . . . . . . . . . . . . . . . . . . . 150
4.30 Camouflage 4, Score = 686. . . . . . . . . . . . . . . . . . . . . . . . 150
4.31 Dense clutter, Score = 620. . . . . . . . . . . . . . . . . . . . . . . . 151
4.32 Dense clutter, Score = 730. . . . . . . . . . . . . . . . . . . . . . . . 151
4.33 Polarised clutter, Score = 746. . . . . . . . . . . . . . . . . . . . . . . 151
4.34 Structured clutter, Score = 724. . . . . . . . . . . . . . . . . . . . . . 151
4.35 Multiple aircraft 1, Scores=[917 903 843]. . . . . . . . . . . . . . . . . 152
4.36 Multiple aircraft 2, Scores=[834 733 706]. . . . . . . . . . . . . . . . . 152
4.37 Multiple aircraft 3, Scores=[834 714 686 674]. . . . . . . . . . . . . . 152
4.38 Multiple aircraft 4, Scores=[783 717]. . . . . . . . . . . . . . . . . . . 152
4.39 Partial occlusion 1, Score = 825. . . . . . . . . . . . . . . . . . . . . . 153
4.40 Partial occlusion 2, Score = 713. . . . . . . . . . . . . . . . . . . . . . 153
4.41 Partial occlusion 3, Score = 725. . . . . . . . . . . . . . . . . . . . . . 153
4.42 Partial occlusion 4, Score = 766. . . . . . . . . . . . . . . . . . . . . . 153
4.43 Protrusions 1, Score = 963. . . . . . . . . . . . . . . . . . . . . . . . 154
4.44 Protrusions 2, Score = 831. . . . . . . . . . . . . . . . . . . . . . . . 154
4.45 Protrusions 3, Score = 797. . . . . . . . . . . . . . . . . . . . . . . . 154
4.46 Protrusions 4, Score = 726. . . . . . . . . . . . . . . . . . . . . . . . 154
4.47 Shadow problem 1, Score = 731. . . . . . . . . . . . . . . . . . . . . . 155
4.48 Shadow problem 2, Score = 695. . . . . . . . . . . . . . . . . . . . . . 155
4.49 Shadow problem 3, Score = 863. . . . . . . . . . . . . . . . . . . . . . 155
4.50 Shadow problem 4, Score = 692. . . . . . . . . . . . . . . . . . . . . . 155
4.51 Examples of spurious hypotheses from non-aircraft images when the rule-based line grouping method is used. The spurious hypothesis in (f) survives as its score exceeds the threshold. However, with the neural network based line grouping method, this spurious hypothesis fails to form. . . . 156
5.1 5 military jets considered in the experiment. . . . 158
5.2 Endpoints on an image segment projected onto an infinitely extended model segment. The perpendicular distance at any point along the image segment is given as d(t). . . . 160
5.3 Projected model and image boundaries used for calculation of the distance ratio standard deviation. . . . 162
5.4 Circular distribution of matched pixels, (a) good match between the model and image boundaries, (b) poor match resulting in an uneven distribution of points. . . . 163
5.5 A binary edge image (on the left) and its Euclidean Distance Transform (on the right). . . . 165
5.6 Computation of the Chamfer distance - the model edge image (template) is superimposed on the DT image, and the values in the shaded (blue) entries read the distance between the model edges and the image edges. . . . 165
5.7 Hausdorff distance shown for two point sets of ellipses. The ellipse pair on top are better fitted, and result in the smaller H(A, B). . . . 168
5.8 Examples of one-to-many and many-to-many mappings. (a) One model line is mapped to many image line fragments (eg., c → {8, 9}, d → {10, 11, 12}). (b) When a curve is approximated with a series of straight line segments, the resulting mapping is likely to be many-to-many. . . . 171
5.9 Simplified 3-D model of an F16: blue denotes horizontal, red vertical. The origin of the 3-D coordinate system is at the intersection of the wing leading edges, FP. . . . 174
5.10 Model to image projection. Translation and scaling are ignored to simplify the diagram. The x'-y' axes are the projections of the rotated X-Y axes. Note v1' and v2' can also be expressed as v1 and v2 if measured with respect to the image reference frame (ie., the x-y frame). . . . 175
5.11 Generation of the transformed model silhouette. . . . 180
5.12 Filtered phase map: discrete orientations are displayed in different colours. . . . 182
5.13 Proximity weight - ranging from 0 to 1 for each pixel pair. . . . 184
5.14 Overlay of the 3-D cosine taper function along the projected model boundary. The red colour is equivalent to 1, and the blue colour in the background is equivalent to 0. . . . 184
5.15 Search for the closest image pixel having a similar orientation to the current model pixel. The distance between the two pixels is dm. . . . 185
5.16 Histogram of the angles between the wing leading edges (a), and histogram of the angles between the wing trailing edges (b). These angles are taken from the winning aircraft hypotheses of the 300 real aircraft images. . . . 186
5.17 Incorrectly estimated position of RP, and the resulting rotational shift of the wing symmetry axis. . . . 187
5.18 Poor outline matching due to relatively large transformation errors. . . . 188
5.19 Various RPs in a grid for iteratively determining the correct transform parameters. . . . 188
5.20 Match with the highest match score after considering all RPs in the grid. . . . 189
5.21 Model hierarchy for efficient model search. . . . 190
5.22 Efficient two-step model fitting process. . . . 192
5.23 Model matching for F111 with shadow (match score = 64%). . . . 196
5.24 Model matching for F111 with grid clutter (match score = 66%). . . . 197
5.25 Model matching for F16 with occlusion and protrusion (match score = 75%). . . . 198
5.26 Model matching for JSF with clutter and occlusion (match score = 72%). . . . 199
5.27 Matching for Mirage with camouflage and protrusions (match score = 78%). . . . 200
5.28 Matching for F18 with shadows (match score = 68%). . . . 201
6.1 Number of line groupings extracted by the rule-based method: NN (blue), NW (red), N4G (black) and NH (green) versus line count NE (x-axis). . . . 206
6.2 Number of line groupings extracted by the neural network based method: NN (blue), NW (red), N4G (black) and NH (green) versus line count NE (x-axis). . . . 206
6.3 Distribution curves of the number of line groupings, NE (top left), NW (top right), N4G (bottom left) and NH (bottom right), obtained via the rule-based approach from the cluttered aircraft images. . . . 207
6.4 Plots of total line counts. The curve represents the number of the extended lines as a function of the unextended lines (prior to the line extension process). . . . 211
6.5 Plot of NW curves obtained from real aircraft images, using the rule-based two-line grouping extraction algorithm. The red and black curves represent NW with and without intensity checks, respectively. . . . 212
6.6 Plot of NW curves obtained from non-aircraft clutter images, using the rule-based two-line grouping extraction algorithm. The red and black curves represent NW with and without intensity checks, respectively. . . . 213
6.7 ROC curves for the generic recognition of aircraft. The red curve is obtained when the rule based method is used for the extraction of line-groupings and the blue curve is obtained using the neural networks. . . . 215
6.8 Model match score: correct match (blue asterisk) and false match (red circle or red cross). A red circle represents a correct aircraft hypothesis matched to a wrong model. A red cross represents a spurious aircraft hypothesis matched to one of the models. . . . 218
6.9 ROC curve: trade-off between true and false match rates as the threshold varies. . . . 219

Abstract
Aircraft recognition remains a challenging problem despite a great deal of effort to
automate the recognition process. The majority of the aircraft recognition methods
assume the successful isolation of the aircraft silhouette from the background, and
only a few have actually addressed real world concerns, such as occlusion, clutter and
shadows. This thesis presents an automatic aircraft recognition system, which shows
improved performance with complex images. This system assumes from the start
that the image could possibly be degraded, contain occlusions, clutter, camouflage,
shadows and blurring. It is designed to tolerate and overcome the degradations at
various analysis stages. The first part of the thesis focuses on the generic aircraft
recognition problem using a generic description of aircraft parts and the geometric
relationships that exist among them. The system implements line groupings in a
hierarchical fashion, progressively leading towards a generic aircraft structure. A
voting scheme is used to consolidate line groupings belonging to an aircraft while
discouraging the formation of spurious line groupings. The aircraft identification
process is carried out in the second part of the thesis, where the generically recognised
aircraft is matched to model candidates. Model matching is carried out via pixel-level silhouette boundary matching. The system is tested on numerous real aircraft,
scaled-down model aircraft and non-aircraft images with adverse image conditions.
The developed system achieves a recognition rate of 84% at a false alarm rate of 7% on
real aircraft images, and a correct matching rate of about 90% and a false matching
rate of 7% on the generically recognised aircraft from model aircraft images.


Acknowledgements
I would like to express my sincere gratitude to my principal supervisor Prof. Abdesselam Bouzerdoum for his guidance and enthusiasm over the years. His cheerful
attitude and encouragement will be dearly missed.
I am also deeply grateful to Dr. Hatem Hmam for his constant guidance, advice and
friendship. Without his daily probing, criticisms and suggestions, I could not have
completed this journey.
I cannot thank enough my wife Christine Jang for putting up with me, and cheering
me up whenever I hit dead ends. This thesis is dedicated to her.
I thank Dr. Carmine Pontecorvo for proofreading the first draft of my thesis and
being a good friend for so many years.
I am indebted to my family for their love and prayers, and Dr. Farhan Faruqi, Mr.
Ashley Martin and other colleagues for their patience and support.


Chapter 1
Introduction
The task of reliably detecting and recognising an aircraft from single images remains a
challenging problem despite advances made in computing technology, image processing and computer vision. Aircraft recognition techniques have been reported using a
variety of methods, but very few have actually addressed real world concerns such as
occlusion, clutter and poor image quality.
A brief list of existing object-recognition techniques applied to aircraft recognition
can be found in [32]. A more recent overview is given in Chapter 2. Most recognition
techniques can broadly be categorised into moment invariant [15, 36, 57], Fourier
descriptor [25, 44, 123], syntactic/semantic grammar [34, 113], and model/knowledgebased techniques [9, 18, 32, 33, 86, 87]. Other methods that do not fit into the above
categories make use of wavelet transform [2], non-uniform rational B-splines and cross
ratios [115], and feature integration [55, 56]. There are some recent efforts to apply
neural network techniques for aircraft model matching [65, 66, 83, 84, 106, 124].
With the exception of model/knowledge-based techniques, virtually all of the above
mentioned methods require the successful extraction of the entire aircraft silhouette

or region. Aircraft recognition performance, therefore, suffers considerably under non-ideal conditions where various forms of image degradation or occlusion are present.
Model/knowledge-based methods, on the other hand, generally offer superior performance against noise, occlusion and clutter. They often make use of domain-specific knowledge to compensate for missing data and feature extraction deficiencies suffered at the lower levels of image processing. Furthermore, model-based systems call for the explicit matching of the aircraft image with a number of aircraft model instances. This matching is carried out at the pixel or feature level. Model matching provides strong evidence of the aircraft's presence in the image and allows viewpoint determination.

1.1 Design Objectives

The main objective of this work is to design a vision system which can recognise
a generic aircraft under various forms of image degradation. A large proportion of
existing aircraft recognition approaches assumes that the aircraft silhouette can be
successfully separated from background, and usually makes use of synthetic images
to demonstrate performance. Knowledge-based aircraft recognition systems [9, 32]
are applied to real images and are usually provided with ancillary information about
image acquisition conditions such as the camera viewing angles and sun position.
Such ancillary information, which helps locating shadow regions and aircraft shadowmaking edges, is assumed to be not available to us in this work. A list of the main
difficulties faced in automatic aircraft recognition is summarised below.

Poor image quality (see Figures 1.1(a)-1.1(d)) - Often noise can be filtered out
with smoothing. However, excessive noise and blurring often result in edge
fragmentation and distortion. Poor image contrast is particularly challenging as

some of the weak but critical edges may be washed away during edge detection.
Camouflage (see Figures 1.2(a)-1.2(d)) - Camouflage in visual band imagery is particularly challenging for region-based vision systems. The presence of many
segmented regions associated with camouflage patches is a source of confusion,
because of the excessive subdivision of the aircraft region into many subregions.
Clutter (see Figures 1.3(a)-1.3(d)) - Edge fragmentation is a common problem in
image processing. Compounding this difficulty is the presence of background
clutter, which makes distinguishing between aircraft and clutter edges very difficult. Furthermore, dense clutter in the immediate vicinity of aircraft boundaries
often introduces errors into edge detection algorithms, causing the boundaries to
appear noisy and fragmented. Clutter also strains the system's resources in
terms of increased computational complexity.
Closely spaced multiple aircraft (see Figures1.4(a)-1.4(d)) - Edges and parts from
one aircraft may coincidentally associate with parts of another aircraft nearby,
forming spurious aircraft hypotheses.
Occlusion (see Figures 1.5(a)-1.5(d)) - Occlusion distorts the global shape signature
of an object, and may cause the object to appear as two or more disjoint components. Airborne aircraft may become partially obstructed by clouds, smoke
or flare. In addition, self-occlusion can occur when a missile, engine or rudder
occludes parts of the aircraft fuselage or wings.
Protrusion (see Figures 1.6(a)-1.6(d)) - Engine or missile protrusions may complicate the model matching process. Missiles often result in some loss of model
matching sensitivity because they usually do not constitute a fixed part of the
aircraft and hence do not usually appear in the aircraft model set.

Shadow (see Figures 1.7(a)-1.7(d)) - Shadow is considered a nuisance in our vision


system. If, however, ancillary data such as the camera viewing angles and the
sun position are available, then shadows can provide strong cues about aircraft
shadow-casting edges [32, 33].

Often, combinations of these problems are present in a single image, magnifying the
challenges faced by the vision system. A number of dedicated algorithms have been
developed to detect or partially address some of these issues. This system, however,
relies more on its global architecture to overcome these issues and achieve aircraft
recognition.

1.2 Definitions and Basic Assumptions

We begin this section by defining the terms generic recognition and identification
that appear in the title and throughout the thesis. Firstly, our definition of aircraft is
confined to aeroplanes with either boomerang, diamond or triangle wings (see Figure
5.1(c)-(e) for examples). We exclude helicopters, hot-air balloons, or aeroplanes with
parallel wings or propellers. In this thesis, the term generic recognition includes
detecting the aircraft and having some information about its shape, which allows a
broad classification of the aircraft (eg., commercial aircraft). The term identification
refers to a specific aircraft model (eg., Boeing-747, F18, etc.), and this process usually
involves model matching.
Our system does not require ancillary information such as sun position, weather conditions, camera viewpoints and target range. Furthermore, no contextual information
is provided regarding the imaged environment (eg., aircraft runway scene, clear sky
scene, etc). However, we still require a number of basic assumptions to be met.

Table 1.1: Simplified representation of aircraft domain knowledge.

Parts      Geometric Image Attributes              Association Attributes
Wing       Trapezoidal or triangular shape;        Connected to fuselage; located between
           wing pair is boomerang, diamond         nose and tail fins; skewed symmetry
           or triangular                           about fuselage axis
Nose       Wedge shape (conical in 3-D)            Connected to fuselage; oriented to face
                                                   the wing pair
Fuselage   Long; roughly parallel to               Connected to nose and wings; in between
           longitudinal axis                       nose and wings
Tail fin   Trapezoidal or triangular shape         Behind the wings; smaller in size than
                                                   wings; closest to trailing edges of wings
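To make the role of Table 1.1 concrete, the domain knowledge could be held in a small lookup structure that the grouping and evidence stages consult. The sketch below is only an illustration of this idea; the structure and names (e.g. AIRCRAFT_PARTS) are assumptions of this rewrite, not the thesis implementation, while the attribute strings are those of Table 1.1.

# Hypothetical encoding of the generic aircraft knowledge of Table 1.1.
# The attribute strings come from the table; the data structure itself is
# an illustrative assumption, not the representation used in the thesis.
AIRCRAFT_PARTS = {
    "wing": {
        "geometry": ["trapezoidal or triangular shape",
                     "wing pair is boomerang, diamond or triangular"],
        "association": ["connected to fuselage",
                        "located between nose and tail fins",
                        "skewed symmetry about fuselage axis"],
    },
    "nose": {
        "geometry": ["wedge shape (conical in 3-D)"],
        "association": ["connected to fuselage",
                        "oriented to face the wing pair"],
    },
    "fuselage": {
        "geometry": ["long", "roughly parallel to longitudinal axis"],
        "association": ["connected to nose and wings",
                        "in between nose and wings"],
    },
    "tail fin": {
        "geometry": ["trapezoidal or triangular shape"],
        "association": ["behind the wings",
                        "smaller in size than wings",
                        "closest to trailing edges of wings"],
    },
}

def association_checks(part: str) -> list:
    """Return the association attributes to verify for a given part."""
    return AIRCRAFT_PARTS[part]["association"]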

Aircraft Shape: Aircraft wings, fuselage and nose are assumed to be roughly coplanar.
Generic Viewpoint: The viewing angle must not be so oblique that the wings become invisible or the wing edges appear parallel.
Weak Perspective Projection: We assume that aircraft images are taken from a distance much longer than the aircraft's wingspan (a standard formulation of this projection model is sketched after this list). Our system, however, is designed to be tolerant of moderate perspective distortion. This has been demonstrated using a number of close-shot images.
Aircraft Resolution: Like most edge-based object recognition systems, this system
requires that the aircraft image is large enough to enable its boundaries to be
approximated by piecewise linear segments.
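For reference, the weak perspective (scaled orthographic) model implied by the imaging-distance assumption can be written as below. This is the standard textbook formulation, included only as an illustration; it is not quoted from the thesis.

\begin{equation*}
  \begin{pmatrix} u \\ v \end{pmatrix}
    = s\,\Pi\,R\,\mathbf{X} + \mathbf{t},
  \qquad
  \Pi = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},
  \qquad
  s \approx \frac{f}{Z_0},
\end{equation*}

where X = (X, Y, Z)^T is a 3-D model point, R the rotation describing the aircraft pose, t an image-plane translation, f the focal length and Z_0 the average depth of the aircraft. The scaled orthographic approximation holds when the wingspan is much smaller than Z_0, which is precisely the imaging-distance assumption stated above.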

Figures 1.1 - 1.7 show a selection of aircraft images from the image set processed by
our system. The image resolutions vary from roughly 250 × 250 to 600 × 600 pixels.

1.3 System Description

A generic aircraft is represented by its main discernible components: wings, nose,


fuselage and tail fins. In this work, aircraft parts are composed of the geometric and
relationship attributes summarised in Table 1.1. The generic aircraft model is described by a set of basic primitive shapes interconnected in a certain fashion to form
an aircraft. Establishing evidence for the presence of such shapes and their correct
association in the image is the first step of the aircraft detection process. This evidence is collected in an incremental fashion starting from lower level image features
(ie., lines) and moving upwards until a generic aircraft instance is found with little
or no ambiguity. To cope with cluttered or degraded images, the lower and intermediate level processing thresholds are relaxed in order not to miss desirable features
associated with the aircraft structure. This unavoidably leads to the emergence of
a large number of undesirable features arising from clutter. To reduce the effect of
clutter, subsequent image analysis stages check the association property that must
hold between the aircraft parts (see Table 1.1, rightmost column) and update a confidence score based on how well the extracted image features fit together to represent
a generic aircraft.
The basic architecture of the proposed system is depicted in Figure 1.8. This system comprises a number of functional blocks, largely grouped into 3 main stages:
hypothesis generation, hypotheses verification, and validation (or identification). The
first four blocks are dedicated to low level processing, leading up to the extraction
of straight lines. The next two blocks execute line-grouping algorithms using rules
derived from the generic aircraft knowledge-base as described in Table 1.1, and generate feature sets for potential aircraft wings and noses. Potential wings and noses
are grouped as associations and aircraft hypotheses are declared at this stage. An aircraft hypothesis and its parts are all represented in terms of lines that are organised

in a hierarchical structure (as shown in Figure 1.9). A hypothesis confidence score is


initially assigned based on the alignment degree of the nose section with respect to
the wings.
Hypotheses are verified via gradual buildup of evidence. Firstly, in the evidence
accumulation block, additional features such as the fuselage, wing tips and tail fins
are searched for in order to increase the confidence score. An additional set of region-based evidence associated with aircraft and clutter is used to consolidate or weaken
the hypothesis. Aircraft candidates having a score greater than a preset threshold
proceed to the next stage, where every pair of aircraft hypotheses is checked for any
sign of conflict. A conflict usually arises when both hypotheses share the same edge(s),
or their boundaries are heavily overlapping. Once the conflict is resolved, up to 5
hypotheses with the highest confidence scores are accepted as the winning hypotheses.
In the validation and identification stage, short-listed model candidates from the
model set are sequentially matched to the winning hypotheses. The aircraft model is
represented as a simple three-dimensional wireframe, where vertices are given in the
model horizontal and vertical planes as in [81]. The best match is found by iterative
applications of pose estimation and pixel-level model-to-image boundary matching.
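As an illustration of the control flow just described (hypothesis scores raised or lowered by evidence, a preset acceptance threshold, pairwise conflict resolution and at most five winners), a minimal sketch is given below. The class name, the conflict test and the numeric threshold are assumptions made for this sketch, not the thesis code.

from dataclasses import dataclass

@dataclass
class Hypothesis:
    score: float                    # confidence score, initialised from the nose-wing alignment
    edges: frozenset = frozenset()  # identifiers of the image lines used by this hypothesis

def in_conflict(a: Hypothesis, b: Hypothesis) -> bool:
    """Hypotheses conflict when they share image edges (part sharing)."""
    return bool(a.edges & b.edges)

def select_winners(hypotheses, score_threshold=600.0, max_winners=5):
    """Keep up to max_winners non-conflicting hypotheses whose accumulated
    evidence score exceeds the preset threshold (threshold value assumed)."""
    candidates = sorted((h for h in hypotheses if h.score > score_threshold),
                        key=lambda h: h.score, reverse=True)
    winners = []
    for h in candidates:
        if all(not in_conflict(h, w) for w in winners):
            winners.append(h)
        if len(winners) == max_winners:
            break
    return winners

# Example: the second hypothesis shares line 3 with the first and is discarded.
h1, h2 = Hypothesis(820.0, frozenset({1, 2, 3})), Hypothesis(750.0, frozenset({3, 4}))
assert select_winners([h1, h2]) == [h1]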

1.4 Contributions of the Thesis

The primary contribution of this work is the construction of a vision system for aircraft
recognition using real images. This system is designed to be robust against excessive
clutter, blurring, occlusion, camouflage and shadow, and is able to recognise multiple
aircraft in one image. To the best knowledge of the author, no previous system has
clearly demonstrated aircraft recognition performance using no contextual or ancillary
information (eg. the image contains an airfield scene) and using a large number of

real aircraft images obtained under degraded conditions and camouflage. Furthermore
the system performance was also tested with numerous non-aircraft images, and only
occasionally was a false generic recognition reported.
Other contributions of a secondary nature include the following.

Processing of the background, which includes dense clutter removal, extraction


of polarised edges (ie., lines predominantly oriented along one direction). This
processing step helps discriminate between aircraft and clutter edges.
Implementation of line extension algorithms for collinear lines, which tolerate large gaps and wide angular deviations. This processing step increases the probability of joining fragmented wing edges (a rough sketch is given after this list).
Selection of salient lines, based on length and other attributes (eg., gap size
and degree of collinearity for extended lines). This processing step reduces the
computational complexity at various stages of aircraft recognition.
Implementation of a robust generic aircraft recognition system based on the hypothesise then verify paradigm. Aircraft recognition decisions are made through
a voting scheme.
Use of line and image intensity based information in hypothesis generation and
verification stages. Intensity based information is used to consolidate or weaken
line-based evidence and aircraft hypotheses.
Use of neural networks to improve the detection of aircraft wings, noses, wingpairs and hypotheses, while discarding more spurious features. The effects of the neural networks on the overall performance of the system are rigorously examined and compared against those of the rule-based method.

Development of a model matching scheme that is tolerant to occlusion, clutter


and edge displacement and distortion. The model matching utilises a hierarchical model structure and coarse-to-fine pose computation.
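A rough sketch of the collinear line extension idea mentioned in the list above is given here. The thresholds and the simplified collinearity test are assumptions of this sketch; the actual algorithm of Chapter 3 also uses intensity evidence and further geometric checks.

import math

def try_extend(l1, l2, max_gap=20.0, max_angle_deg=10.0):
    """Return an extended segment joining l1 and l2 = ((x1, y1), (x2, y2)) if they
    are roughly collinear and their endpoint gap is small, otherwise None.
    A full implementation would also bound the lateral offset between segments."""
    def orientation(seg):
        (x1, y1), (x2, y2) = seg
        return math.atan2(y2 - y1, x2 - x1)

    # Angular deviation between the (undirected) segments, folded into [0, 90] degrees.
    d = abs(math.degrees(orientation(l1) - orientation(l2))) % 180.0
    if min(d, 180.0 - d) > max_angle_deg:
        return None

    # Gap: the smallest endpoint-to-endpoint distance between the two segments.
    if min(math.dist(p, q) for p in l1 for q in l2) > max_gap:
        return None

    # The extended line spans the two mutually farthest endpoints of the pair.
    pts = list(l1) + list(l2)
    return max(((p, q) for p in pts for q in pts), key=lambda pq: math.dist(*pq))

# Example: two fragments of one wing edge separated by a small gap are merged.
print(try_extend(((0, 0), (10, 0.5)), ((13, 0.7), (25, 1.2))))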

1.5 Outline of the Thesis

For the remaining part of the thesis, we first give a review of existing aircraft recognition methods in Chapter 2. The low level feature extraction and generation of line
groupings are presented in Chapter 3. In Chapter 4, four-line groupings are paired
with nose candidates to form aircraft hypotheses. Positive evidence is collected
based on the aircraft part associations to support correct hypotheses. Negative evidence associated with clutter is also considered to negate spurious hypotheses. The
hypothesis which survives the competition and conflict resolution emerges as the winning hypothesis. Chapter 5 deals with the aircraft identification via model matching.
Aircraft pose estimation, image and model alignment, computation of match metric
and best match finding are presented in this chapter. Chapter 6 demonstrates performance results and analyses the system's computational complexity. This dissertation
concludes with Chapter 7, which includes the thesis summary, relevant topics for
discussion and suggestions for future work.

Please see print copy for Figure 1.1
Figure 1.1: Real aircraft images with blurring and noise.

Please see print copy for Figure 1.2
Figure 1.2: Real aircraft images with camouflage.

Please see print copy for Figure 1.3
Figure 1.3: Real aircraft images in clutter background.

Please see print copy for Figure 1.4
Figure 1.4: Multiple aircraft in the image.

Please see print copy for Figure 1.5
Figure 1.5: Partly occluded aircraft images.

Please see print copy for Figure 1.6
Figure 1.6: Aircraft with protrusions - engine protrusions for (a) and (b), and missile protrusions for (c) and (d).

Please see print copy for Figure 1.7
Figure 1.7: Aircraft with shadows - shadows on aircraft shown in (a) and (b), background shadow cast by aircraft shown in (c) and (d).

[Diagram: 8-bit intensity image (M x N) -> Edge Detection -> Clutter Rejection -> Contour Extraction -> Line Extraction -> 2-Line Grouping (wing & nose) -> 4-Line Grouping (boomerang, diamond & triangle wing pairs) -> Nose Detection & Aircraft Hypothesis Generation (HYPOTHESES GENERATION stage); then Evidence Accumulation (aircraft parts detection & other evidence) -> Interpretational Conflict Resolution -> Generic Aircraft Recognition (HYPOTHESES VERIFICATION stage); then Model Selection -> Pose Estimation -> Model-to-Image Matching against the Model Set (F16, F18, F35, F111, Mirage, etc.) -> Aircraft Identification (VALIDATION & IDENTIFICATION stage).]

Figure 1.8: Functional flow diagram.

[Diagram: the aircraft hypothesis Hi (nose + wing pair) at the top points to its four-line grouping (wing pair), which points to its two-line groupings (nose, left wing, right wing), which in turn point to the underlying lines.]

Figure 1.9: Feature hierarchy for generation of an aircraft hypothesis. The aircraft hypothesis (nose-wingpair association) is at the top. The lower level features are four-line groupings, two-line groupings and lines. By using pointers, the system can access any low level feature of hypothesis, Hi.
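The pointer structure of Figure 1.9 might be pictured as the following sketch, in which each level holds references to the level below so that any constituent line of a hypothesis Hi can be reached from the top. The class and field names are illustrative (Leg1/Leg2 follow the two-line grouping description of Chapter 3); this is not the thesis implementation.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Line:                       # lowest level: a straight line segment
    p1: Tuple[float, float]
    p2: Tuple[float, float]

@dataclass
class TwoLineGrouping:            # wing or nose candidate
    leg1: Line
    leg2: Line

@dataclass
class FourLineGrouping:           # wing-pair candidate
    left_wing: TwoLineGrouping
    right_wing: TwoLineGrouping

@dataclass
class AircraftHypothesis:         # top level: nose-wingpair association
    nose: TwoLineGrouping
    wing_pair: FourLineGrouping

    def all_lines(self) -> List[Line]:
        """Follow the pointers down to every constituent line."""
        return [self.nose.leg1, self.nose.leg2,
                self.wing_pair.left_wing.leg1, self.wing_pair.left_wing.leg2,
                self.wing_pair.right_wing.leg1, self.wing_pair.right_wing.leg2]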

Chapter 2
Aircraft Recognition Techniques: A Review
There are a variety of object recognition and classification methods that can possibly
be applied to the domain of aircraft recognition. The different approaches to aircraft
recognition may be broadly classified into linguistic pattern recognition techniques,
global matching techniques, local matching techniques and knowledge-based systems.
Section 2.1 discusses syntactic/semantic grammar methods that use linguistic patterns to analyse shape.

Section 2.2 outlines global matching methods that use

knowledge-free global shape descriptors such as moment invariant features and Fourier
descriptors to uniquely describe various shapes of aircraft silhouettes. In Section 2.3,
local matching approaches are explored, with special attention to indexing/clustering
methods that are commonly referenced in the field of geometric matching. These
methods adopt a paradigm of hypothesise then verify in an attempt to overcome the
shortcomings of the global shape descriptor techniques (eg., sensitivity to noise and
occlusion). The mainstreams of this field, commonly known as pose clustering, alignment and geometric hashing, are reviewed. Then a number of techniques that extend
the indexing idea to aircraft shape recognition are presented, and these include Mundy and Heller [89], Marouani et al. [81], Fairney [37], and Chien and Aggarwal [27]. Section 2.4 is dedicated to knowledge-based systems, such as COBIUS [9], ACRONYM
[18], TRIPLE [86], and that of Das and Bhanu [32]. The system proposed by Das
and Bhanu [32, 33] is explored more closely as it is the most recent and bears the
greatest relevance to our work.

2.1

Syntactic/Semantic Grammar Techniques

These approaches use linguistic pattern recognition techniques to analyse shape and
classify aircraft using piecewise-linear border approximations. Basically, the idea
behind syntactic recognition is the specification of a set of primitives (lines or arcs)
and a set of rules (grammar) that governs their geometric relationships [14, 39, 40].
This grammar specifies combinations of these primitives to construct the piecewise-linear aircraft boundaries. The grammar can either be in the form of a string or be
extended to a tree form.
To better explain the underlying concept of syntactic recognition, a simple example is
considered. Suppose the object shown in Figure 2.1(a) represents an aircraft skeleton.
We define the primitives as shown in Figure 2.1(b) to describe the structure of this
skeleton. The grammar, G, is expressed as
G = (N, Σ, P, S)

where
N = a finite set of syntactic categories called non-terminals,
Σ = a finite set of image primitives (eg., lines) called terminals,
P = a set of rewriting rules called productions,
S = the starting symbol.

Figure 2.1: (a) Aircraft represented by its skeleton, (b) primitives, (c) structure generated by using a string grammar, and (d) the skeleton that can be generated by the grammar L(G) = {abc^n d | n ≥ 1}.

We build the grammar G = (N, Σ, P, S), with N = {A, B}, Σ = {a, b, c, d}, and P = {S → aA, A → bB, B → cB, B → d}, where A and B are the non-terminals, and S is the starting symbol. The terminals Σ = {a, b, c, d} correspond to the primitives shown in Figure 2.1(b). Applying the first production from P, (S → aA), followed by sequential applications of the productions A → bB, B → cB, B → cB, B → cB and B → d, derives the string abcccd, which represents the aircraft skeleton shown in Figure 2.1(c). The language generated by the rules of this grammar is L(G) = {abcⁿd | n ≥ 1}, which means G is only capable of generating skeletons of the form shown in Figure 2.1(d), but with arbitrary length for the fuselage section (represented by the primitive c). In this example, we assumed that the interconnection between the primitives takes place at the dots shown in Figure 2.1(b).
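To make the derivation concrete, the following minimal Python sketch (illustrative only, not part of the original formulation) generates skeleton strings from the grammar G above and checks membership in L(G) = {abcⁿd | n ≥ 1}; the function names are hypothetical.

```python
# Minimal sketch of the string grammar G = (N, Sigma, P, S) used above.
# The productions are S -> aA, A -> bB, B -> cB, B -> d.
# Names (generate_skeleton, in_language) are illustrative, not from the thesis.

def generate_skeleton(n):
    """Derive the string a b c^n d (n >= 1 repetitions of the fuselage primitive c)."""
    if n < 1:
        raise ValueError("the grammar requires at least one 'c' primitive")
    sentence = "a"          # S -> aA
    sentence += "b"         # A -> bB
    sentence += "c" * n     # B -> cB applied n times
    sentence += "d"         # B -> d terminates the derivation
    return sentence

def in_language(s):
    """Check membership in L(G) = { a b c^n d | n >= 1 }."""
    return (len(s) >= 4 and s[0] == "a" and s[1] == "b"
            and s[-1] == "d" and set(s[2:-1]) == {"c"})

if __name__ == "__main__":
    print(generate_skeleton(3))      # abcccd, the skeleton of Figure 2.1(c)
    print(in_language("abcccd"))     # True
    print(in_language("abd"))        # False: no fuselage primitive
```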
In more complicated situations, the rules of connectivity, as well as information regarding other factors such as primitive length and direction and the limitations on the repeatability of the productions, must be made explicit. This can be carried out by introducing semantic information to the system. The semantic rules deal with the correctness of the object structure established by the syntax of the production rules. By using the semantic information, a broader class of patterns can be described without having to increase the size of the production rules and primitives. An example of the semantic information embedded in the production rules is shown in Table 2.1.

Table 2.1: Example of semantic information attached to production rules.

Production S → aA: Connection to a is made only at the dots. The length of a is 3 cm. This production rule can be applied only once.

Production A → bB: Connections to b are made only at the dots. The direction of b is given by the perpendicular bisector of the line joining the end points of the two un-dotted segments. The direction of b must be the same as the direction of a. No multiple applications of this production. The length of the wing is 6 cm.

Production B → cB: Connections to c are made only at the dots. The direction of c must be the same as the direction of a. This production can be repeated no more than 5 times. The length of c is 2 cm.

Production B → d: Connection to d is made only at the dot. The orientation of d and b must be the same.
Tang and Huang [113], and Davis and Henderson [34] apply these linguistic shape
analysis techniques to the recognition of aircraft (in terms of silhouette boundary)
in aerial images. They explicitly consider the problem of superfluous hypotheses in
existing methods, and propose a way to get around the problem, by introducing a
design of what they call a creation machine. The creation machine is an abstract
mechanism that applies formal language theory to filtering out unwanted words (spurious line segments) and to establishing an order to the wanted words (aircraft line segments). They allow for possible segmentation of the contour, and use broken
contours (the straight line segments) and the relationships among them to describe the
aeroplane. They also acknowledge the difficulty in finding a set of good thresholds
for all the images, and adopt a multiple threshold approach to deal with real images.
Despite efforts to be practical with real aircraft images, their algorithm is limited to
only one particular type of aircraft. Expanding to a larger class of different aircraft
shapes leads to much larger grammars and often less effective parsers. Moreover,
these methods suffer from more shortcomings such as the necessity for computing a
unique segmentation of the shape into primitives, and the requirement of assigning a
unique terminal name to each primitive. Hence, these syntactic/semantic approaches
suffer when presented with missing data and distortion of extracted segments.
Davis and Henderson [34] address the fact that a shape can be decomposed into many,
possibly overlapping primitives. They attempt to overcome this problem by introducing an approach called a hierarchical constraint process that assigns all plausible terminal symbol names (or labels) to each primitive (or part), and allows higher level processes to disambiguate the labelling of each part. The experiment demonstrates the
system's capability in handling the uncertainties associated with segmented boundary
lines. However, the results were confined to 2-D silhouettes of aeroplanes viewed from
directly above (ie., zero roll and pitch). Moreover, they acknowledge the difficulty
associated with the construction of a grammar to embrace various projected images
of complex-shaped aircraft.
Even though these techniques allow specification of the local structure rather than
global shape, and are capable of explicitly incorporating the variations in the object
shapes into the models, their claims have not yet been substantiated using a variety
of real aircraft images. Moreover, the construction of a grammar capable of handling such a diversity of aircraft shapes and viewpoints has not yet been demonstrated, and it remains unclear how these methods can cope with occlusion, shadow
effects, camouflage and clutter.

2.2 Global Matching Techniques

2.2.1 Moment Invariant Techniques

The input image to this system is in a binary form where the aircraft is assumed to
be successfully isolated from the background and its pixels are assigned a value of
one. If the dimension of the image is M × N, the spatial central moment [102] of order (p + q) is expressed as

μ_pq(unscaled) = Σ_{m=1}^{M} Σ_{n=1}^{N} (m − m̄)^p (n − n̄)^q F(m, n)          (2.2.1)

where F(m, n) = 1 if (m, n) ∈ aircraft region, F(m, n) = 0 otherwise, and m̄ and n̄
are the mean values (centroid) of the aircraft region. Hu [57] has proposed a normalisation of the central moments. These normalised central moments
have been used to develop a set of seven compound spatial moments that are invariant to translation, rotation and scale change. The feature vector consisting of
these moments is computed from the image and subsequently compared with those
computed offline in the model database, using the Euclidean distance as the match
metric. The aircraft model associated with the best match is accepted as the viewed
aircraft in the image.
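To illustrate the matching scheme described above, the sketch below computes the central moments of Equation (2.2.1), normalises them, forms the first two of Hu's seven invariants, and selects the model with the smallest Euclidean distance. It is a simplified, assumed implementation (only two invariants, binary masks supplied as NumPy arrays), not the code used in the cited works.

```python
import numpy as np

# Sketch of moment-invariant matching (first two Hu invariants only; the full
# scheme uses all seven). The binary mask F(m, n) is 1 inside the aircraft region.

def central_moment(F, p, q):
    m, n = np.mgrid[0:F.shape[0], 0:F.shape[1]]
    m_bar = (m * F).sum() / F.sum()
    n_bar = (n * F).sum() / F.sum()
    return ((m - m_bar) ** p * (n - n_bar) ** q * F).sum()

def hu_features(F):
    mu00 = central_moment(F, 0, 0)
    eta = lambda p, q: central_moment(F, p, q) / mu00 ** ((p + q) / 2 + 1)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([phi1, phi2])

def best_match(image_mask, model_masks):
    """Return the index of the model whose invariant vector is closest (Euclidean)."""
    f = hu_features(image_mask)
    dists = [np.linalg.norm(f - hu_features(m)) for m in model_masks]
    return int(np.argmin(dists))
```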
Implementation of the moment invariant techniques on aircraft recognition tasks can be traced to the work by Dudani et al. [36], Reeves et al. [103], McLaughlin [83],
and Breuers [15]. In Dudani et al. [36], the feature set contains seven Hu-moments
from aircraft boundary pixels and another seven from aircraft region pixels. The
test suite contains images of 6 aircraft viewed from different camera positions and
orientations. The Bayes decision and distance-weighted k-nearest neighbour rules
are used to find the best match (ie., classification). Reeves et al. [103] proposed a
normalisation technique, aspect ratio normalisation, that is less sensitive to noise, and
yields a comparable performance to the Fourier Descriptors method, described in the
next subsection. McLaughlin [83] introduced the use of quadratic neural nets through
which the moment invariants are matched to the models. Breuers [15] modified the
nearest neighbour search procedure to improve classification and the accuracy of pose
under a wide range of image resolutions and viewpoints.
Methods like these work well when the preprocessing stage can unambiguously generate the object outer boundary and therefore separate the object region from the
background. The strength of these methods lies in the fact that the feature set is not affected
by rotational, translational and scaling differences between an object model and its
observed image. The image-to-model feature matching can readily be implemented
in real time. However, for realistic images such as those shown in Section 1.1, it is
often extremely difficult to correctly segment the object region and isolate it from the
surrounding background. The sensitivity of these methods to intensity distribution
inside and outside the object silhouette makes moment methods less appealing to our
application.

2.2.2 Fourier Descriptor Techniques

In these approaches, the shape of a closed contour is represented using a Fourier Descriptor (FD) technique. The underlying idea is that the outer boundary of an
aircraft in the image can be expressed as a function of a contour tracing variable,
and repeating the tracing process multiple times will produce a periodic function
that can be expressed as a Fourier series. The FD of a contour is defined as this
Fourier series. To implement this method of shape description, it is necessary to
sample the contour at a finite number of points. Since the discrete Fourier transform
of a sequence gives us the values of the Fourier series coefficients of the sequence,
by assuming it to be periodic, a Fast Fourier Transform (FFT) algorithm provides
an efficient way to compute these coefficients. Once the Fourier descriptors have
been computed, the operations of rotation, scaling, and shift of the starting point
are easily estimated in the frequency domain. While shapes may be compared in
the space domain, the procedures required to adjust their size and orientation are
computationally expensive. Normally an iterative type of algorithm is employed,
which searches for an optimum match between the unknown shape and each reference
shape. The geometric changes are related to simple transformations of the descriptors,
hence normalisation of the descriptors to make them invariant to translation, rotation,
scaling and starting-point shift, is relatively simple.
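As an illustration, the sketch below computes FDs of a sampled closed contour with an FFT and applies one common normalisation (zeroing the DC term, dividing by the first harmonic's magnitude and keeping coefficient magnitudes) to obtain translation, scale, rotation and starting-point invariance; this is a textbook-style normalisation and not necessarily the exact scheme of Wallace and Wintz [123].

```python
import numpy as np

# Sketch of Fourier descriptor computation for a closed contour sampled at K points.
# The contour is represented as complex numbers z_k = x_k + j*y_k.

def fourier_descriptors(contour_xy, keep=16):
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]
    coeffs = np.fft.fft(z)
    coeffs[0] = 0.0                      # discard DC term -> translation invariance
    coeffs = coeffs / np.abs(coeffs[1])  # divide by |first harmonic| -> scale invariance
    # keep magnitudes of the lowest harmonics -> rotation / start-point invariance
    mags = np.abs(coeffs)
    return np.concatenate([mags[1:keep // 2 + 1], mags[-keep // 2:]])

# Example: the FDs of a circle and of the same circle rotated, scaled and translated agree.
theta = np.linspace(0, 2 * np.pi, 128, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
moved = 3.0 * np.stack([np.cos(theta + 0.7), np.sin(theta + 0.7)], axis=1) + 5.0
print(np.allclose(fourier_descriptors(circle), fourier_descriptors(moved), atol=1e-6))
```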
Wallace and Wintz [123] proposed a technique for normalising the FD in such a way
that all shape information is retained and the computation is efficient. They demonstrate that the effects of noise and image resolution variations on the FD coefficients
can be reduced by using an appropriate filter. Gorman et al. [45] use the FD of
local features to enable partial recognition of occluded or overlapping objects. Kuhl
and Giardina [69] formulated elliptic Fourier features that improved tolerance against
the contour perturbations. Chen and Ho [25] extended this elliptic FD approach by

23

incorporating an efficient nearest neighbour searching, which arguably saves the computation time without sacrificing the performance.
The advantage of the FD approach is that FDs can approximate segments of contours, hence enabling partial shape matching in the presence of occlusion, as discussed in [45].
However, these methods also suffer from shortcomings; the normalisation required in
deriving invariant features may not be uniquely determined. Compounding this problem is their sensitivity to sampling of contour points, uniformity of sample spacings,
size of the FD, quantisation error and the contour perturbations [32, 33]. For these
reasons, the FD invariant methods are not well suited to our application.
Glais and Ayoun [44] developed a system that accounts for commonly encountered
problems in practice. Their algorithm embeds two different recognition approaches,
which are selectively implemented based on the quality and properties of the input
image. If the target and background are not clearly separable, then a syntactic pattern
recognition technique is applied using local image features. In more detail, firstly a watershed algorithm is applied to the image to separate the object and
background. Usually, multiple separation hypotheses are generated in this process.
These hypotheses are converted to the FD, and then compared to a library by means
of nearest neighbour search. If no match is found during the search, the system
assumes that the object separation was not successful. In this case, syntactic pattern
matching using local features is activated. The description of their work, however, is
not given in detail in the article. The experimental result, obtained from a test suite
of computer-generated aircraft images, indicates that the performance is sensitive to observation conditions and background structure. Applicability to real aircraft images
is yet to be validated.

2.3 Local Matching Techniques

In many applications, the objects to be recognised are usually well defined in terms
of shape and size and are limited in number so that specific models can be stored
and used to help identify the viewed object. Model matching techniques relying on
global features are efficient in terms of matching speed, but have a limited capability
to handle shadows, occlusion and clutter. An alternative approach is to make use of
local features such as corners, holes [11], lines [112] and curvature [122] to achieve
object recognition. Unlike moment and FD methods, the recognition process is not
achieved in one step, but often calls for a search method to take place in either the
transformation parameter domain (eg. pose clustering) or in the model and image
feature spaces (eg. alignment). All these geometric matching schemes employ a
hypothesise then verify paradigm, where the local features are used to hypothesise a
transformation (pose), followed by a verification process that ensures that all model
features are consistently matched to their image counterparts.
Commonly used matching methods include pose clustering [3, 37, 48, 97], alignment
[27, 60] and geometric hashing [43, 72, 74, 118, 126, 127]. These methods are built
upon the observation that a transformation of an object may be defined by a transformation of a small subgroup of the object features. In this section, a general description
and comparison of these techniques are outlined, and a number of investigations into
aircraft recognition (by Mundy and Heller [89], Marouani et al. [81], Fairney [37], and Chien and Aggarwal [27]) are discussed.

2.3.1 Pose Clustering

In the pose clustering (or Hough Transform) approach [3, 97, 48, 37], recognition of an
object is achieved by iteratively finding transformations that map feature subsets from
the model domain to the image, and by generating clusters of the transformations.
Let us assume that the model can be represented by a set of features, called interest
features [126], which can also be extracted from the image. In the most general (and
least informative) case, the interest features will be just points or lines.
Consider a 2-D image to 2-D model match where interest points from corners and inflections are used as the interest features. A 2-D affine transformation can be represented by six independent parameters, as shown below:
x′ = ax + by + c
y′ = dx + ey + f

where (x, y) and (x′, y′) represent the 2-D coordinates of the model and image interest
points. This technique treats the affine transformation as a point (single count) in
the 6 dimensional parameter space. To solve for the 6 unknowns, we require four additional linear equations (ie., two additional points). Each correspondence of a model
point triplet with three image points generates one candidate affine transformation,
recorded as one vote in the 6-D parameter space. Good transform alignments result
in dense vote clusters in the parameter space.
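A minimal sketch of this voting scheme follows: each model-triplet/image-triplet correspondence yields one affine transform by solving the six linear equations, and the transform is quantised and accumulated in a sparse vote table. The bin sizes and the brute-force enumeration are illustrative assumptions.

```python
import numpy as np
from itertools import permutations, combinations
from collections import Counter

# Sketch of pose clustering: every correspondence of a model point triplet with an
# image point triplet gives one affine transform (a, b, c, d, e, f), which casts a
# vote in a quantised 6-D parameter space. model_pts / image_pts are (N, 2) arrays.

def affine_from_triplets(model_pts, image_pts):
    """Solve x' = a x + b y + c, y' = d x + e y + f from three point pairs."""
    A = np.hstack([model_pts, np.ones((3, 1))])       # 3 x 3
    sol_x = np.linalg.solve(A, image_pts[:, 0])       # a, b, c
    sol_y = np.linalg.solve(A, image_pts[:, 1])       # d, e, f
    return np.concatenate([sol_x, sol_y])

def pose_cluster(model_pts, image_pts, bin_sizes=(0.1, 0.1, 2.0, 0.1, 0.1, 2.0)):
    votes = Counter()
    for m_idx in combinations(range(len(model_pts)), 3):
        for i_idx in permutations(range(len(image_pts)), 3):
            try:
                params = affine_from_triplets(model_pts[list(m_idx)],
                                              image_pts[list(i_idx)])
            except np.linalg.LinAlgError:
                continue                               # collinear triplet, skip
            key = tuple(np.round(params / bin_sizes).astype(int))
            votes[key] += 1
    (best_bin, count), = votes.most_common(1)
    return np.array(best_bin) * bin_sizes, count       # cluster centre and its votes
```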
For a model of m points and an image of n points, m³n³ correspondences are required, which is computationally expensive. Another downside of this method is the large dimensionality of the transformation table. Requiring a 6-D parameter space for a 2-D matching problem is memory-inefficient, hence the method's usefulness on higher-dimensional problems is questionable. In addition, this method compares
all of the image triplets with all of the model triplets. Allowing every one of these
exhaustive pairings to contribute a vote makes this method susceptible to noise (ie.,
the transformation table is likely to have many noise spikes) [48].
This method does not account for global consistency between object and image features; an incomplete set containing a large number of fragments may be favoured by
the Hough algorithm over the desirable set comprising fewer long lines
that completely enclose the object boundary [5].

2.3.2 Alignment

Huttenlocher and Ullman use the term alignment to refer to the transformation from
model to image coordinate frames [60]. They proposed a method for computing a
transformation from three non-collinear points under a weak perspective assumption.
The system is operated in a prediction-and-verification fashion. After each possible
alignment from a pair of triplets of points is determined, complete edge contours are
then used to verify the hypothesised match. For m model points and n image points,
there are C(m,3)·C(n,3)·3! possible alignments, which are explored in an exhaustive search. In
their implementation, each model point is associated with an orientation attribute.
The intersection of two lines that are defined by two points and their orientations
is used to induce the third point. This enables forming an alignment using only
two model and two image points. Using this technique, they reduce the complexity down to C(m,2)·C(n,2)·2!. Each hypothesised alignment must be verified by matching the
transformed model with the image. They organise the verification process in a hierarchical fashion: segment endpoints are used for initial verification first, and only
those alignments that pass the initial verification use the entire contour to perform
detailed verification. The solution found is unique up to a reflection ambiguity. Since
the alignment of features is local and is obtained by identifying corners and inflections
in edge contours, the features are more tolerant to partial occlusion.
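The hypothesise-and-verify structure of the alignment approach can be sketched as follows, under the weak-perspective (2-D affine) assumption: a candidate transform is computed from a single triplet correspondence and then scored by how many of the remaining transformed model points fall near image points. This is a schematic illustration rather than Huttenlocher and Ullman's exact two-point formulation; the tolerance value is an assumption.

```python
import numpy as np

# Alignment sketch: hypothesise an affine transform from one triplet correspondence,
# then verify it by the fraction of remaining model points that land near some image
# point. The threshold and brute-force nearest-point check are illustrative choices.

def hypothesise_affine(model_tri, image_tri):
    """model_tri, image_tri: (3, 2) arrays of corresponding points."""
    A = np.hstack([model_tri, np.ones((3, 1))])
    return np.linalg.solve(A, image_tri)              # 3 x 2 matrix [a d; b e; c f]

def verify(model_pts, image_pts, X, tol=2.0):
    projected = np.hstack([model_pts, np.ones((len(model_pts), 1))]) @ X
    # for every projected model point, find the distance to the closest image point
    d = np.linalg.norm(projected[:, None, :] - image_pts[None, :, :], axis=2).min(axis=1)
    return (d < tol).mean()                           # fraction of supported points

# Usage: loop over all triplet pairings and keep the hypothesis with the highest support.
```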

2.3.3 Geometric Hashing

The idea of geometric hashing is to use invariants to index from an extracted scene
into a pre-stored hash table in order to discover the possible candidate matches. The
method is an efficient technique that uses spatial arrangements of features to locate
instances of models. Because this method does not match models one by one, it is capable of effectively recognising objects from a large model database. The invariant is the local coordinate of a point, (αᵢ, βᵢ), expressed with respect to a frame locally defined by three arbitrarily chosen non-collinear points known as the basis, [p₀, p₁, p₂]. This can be expressed mathematically as pᵢ = p₀ + αᵢ(p₁ − p₀) + βᵢ(p₂ − p₀). The
basis information as well as the model index are recorded in the hash table in the offline preprocessing stage. A voting process is involved to recover the transformation
between the object in the scene and the object in the model database during the
on-line recognition stage.
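The off-line table construction and on-line voting stages can be sketched as follows, using the (α, β) coordinates defined above; the dictionary-based hash table and the quantisation step are illustrative simplifications.

```python
import numpy as np
from itertools import permutations
from collections import defaultdict

# Geometric hashing sketch. Off-line: for every ordered basis (p0, p1, p2) of each
# model, express the remaining points as p = p0 + alpha*(p1-p0) + beta*(p2-p0) and
# store (model, basis) under the quantised key (alpha, beta). On-line: repeat with a
# scene basis and vote for the (model, basis) entries hit.

def alpha_beta(p, basis):
    p0, p1, p2 = basis
    M = np.column_stack([p1 - p0, p2 - p0])           # 2 x 2
    return np.linalg.solve(M, p - p0)                 # (alpha, beta)

def quantise(ab, step=0.1):
    return tuple(np.round(np.asarray(ab) / step).astype(int))

def build_table(models):
    """models: dict of name -> (N, 2) array of model points."""
    table = defaultdict(list)
    for name, pts in models.items():
        for basis_idx in permutations(range(len(pts)), 3):
            basis = pts[list(basis_idx)]
            for k, p in enumerate(pts):
                if k in basis_idx:
                    continue
                try:
                    key = quantise(alpha_beta(p, basis))
                except np.linalg.LinAlgError:
                    break                             # collinear (degenerate) basis
                table[key].append((name, basis_idx))
    return table

def vote(scene_pts, basis_idx, table):
    votes = defaultdict(int)
    basis = scene_pts[list(basis_idx)]
    for k, p in enumerate(scene_pts):
        if k in basis_idx:
            continue
        for entry in table.get(quantise(alpha_beta(p, basis)), []):
            votes[entry] += 1
    return votes                                      # high counts = candidate matches
```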
Lamdan and Wolfson [72] introduce a prototype geometric hashing technique for
recognising flat industrial parts and synthesised 3-D objects. They view the geometric
hashing as a filtering procedure which can eliminate a large number of spurious solutions before direct verification is applied [74]. Gavrila and Groen [43] use a geometric
hashing system to recognise 3-D CAD models. Tsai [118] investigates the use of line
features to compute recognition invariants in a more robust way, and demonstrates
that this technique is noise-resistant and more effective in occluded environments than
the point-based approaches.
More efficient indexing methods were developed by Stein [109, 111], where objects are
approximated as polygons. A sequence of consecutive line segments in the approximation is called a super segment. Super segments are encoded and stored in a hash
table for lookup at recognition time. Recognition proceeds by segmenting the scene
into a polygonal approximation; the code for each super segment retrieves model hypotheses from the table. Clustered hypotheses represent the instance of the model.
Finally the estimate of the transformation is refined. This work uses examples of
aircraft recognition from aerial photographs of airports. Stein also extended his work
to the problem of matching 3-D object models to 2-D image features [110], where
the importance of grouping control mechanisms to obtain a reasonable starting set of features is stressed. He also argued that extending geometric hashing to 3-D full perspective matching is very difficult, and resorted to using the topological constraints
between the fairly complex image features.
Comparisons of the geometric hashing technique with the pose clustering and alignment methods have been addressed and can be found in [47, 48, 49, 51, 73, 126].
Grimson and Huttenlocher analysed the sensitivities of the Hough Transform [48]
and Geometric Hashing [47], and concluded that all these clustering based methods
suffer from the false positive rates becoming intolerably high in noisy and cluttered
environments. They seem to be more adequate for low dimensional matching problems under a controlled industrial setting, such as recognition of flat objects on a
conveyer belt under a stationary camera.

Figure 2.2: The projected angles α and β determine the rotation (pitch and roll) of the model vertex-pair projected onto the image plane. A 3-D model vertex pair is transformed (3-D rotation, x-y translation and scaling) and projected onto the image plane, where it is matched against an image vertex pair.

2.3.4 Particular Systems

Mundy and Heller

Mundy and Heller [89] developed a model-based recognition system that makes use of a 3-D vertex-pair of the model and a 2-D vertex-pair in the image to determine the affine
transform parameters (see Figure 2.2). Assuming a weak perspective projection, the
transformation between the object and image reference frames has six degrees of
freedom, three for rotations, two for translation and one for scaling. The vertex-pair
provides a sufficient number of constraints to determine the six parameters of the
affine transformation. Assuming that a correspondence has been made between the
affine projection of a 3-D model vertex pair and a set of 2-D edges and vertices derived
from the image intensity data, the roll and pitch rotations of the viewing angle can be derived from the observed angles α and β shown in Figure 2.2. The yaw angle can be computed readily by measuring the rotation of the image vertex pair (shown in red
in Figure 2.2) with respect to the model vertex pair (in black) about the z axis. The
length ratio of the model and image vertex pair vectors is the estimate of the scale
factor (or equivalently the viewing distance if the camera focal length is known).
The estimated transformation casts a vote in the transform (Hough) space. The
six-parameter transform space is decomposed into subspaces, (ie., 2-D [roll, pitch]
array, 1-D [yaw] array, 3-D [x, y, scale factor]) for ease of computation. After
completing the voting process using a combination of binning and nearest neighbour
clustering techniques, clusters with large enough votes are considered to be a feasible
aircraft hypothesis. If the camera orientation and its parameters are known and the
aircraft is in a parked position, then the computation complexity is reduced and the
system robustness also improves. The validation process is carried out by comparing
the model edges (which have been transformed according to the computed viewpoint)
with the edge images. The actual edge coverage computation is performed using the
Distance Transform [13] (as will be discussed in Section 5.1.4).
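A sketch of such a distance-transform-based edge coverage check is given below, assuming SciPy's Euclidean distance transform; the coverage threshold is an illustrative choice and not a parameter quoted from [89].

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Sketch of model-to-image edge verification with a distance transform: the DT of the
# image edge map gives, at every pixel, the distance to the nearest edge pixel, so a
# projected model edge is "covered" wherever the DT is small.

def edge_coverage(image_edges, projected_model_pixels, max_dist=2.0):
    """image_edges: binary edge map; projected_model_pixels: (N, 2) integer (row, col)."""
    dt = distance_transform_edt(~image_edges.astype(bool))   # distance to nearest edge
    rows, cols = projected_model_pixels[:, 0], projected_model_pixels[:, 1]
    d = dt[rows, cols]
    return float((d <= max_dist).mean())              # fraction of supported model pixels
```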
Mundy and Heller tested their algorithm on real images of C130 transport aircraft
parked in an airfield. The experimental results show good classification percentages, provided that the aircraft boundaries are successfully extracted. However, their
test setting is limited to low clutter, high contrast images only. Furthermore, being indexing-based, this system is subject to a combinatoric strain and Hough-space
dimensionality problem.

Marouani, Huertas and Medioni Model


Marouani et al. [81] propose a technique where a simple 3-D model of an aircraft
is constructed on-line from the image available, and is used with camera geometry
to find instances of the aircraft in subsequent images. In this system, the aircraft is decomposed into its main discernible components: two wings, two rear wings, engines,
the fuselage and a tail. The general methodology consists of grouping primitives
extracted from the image into sets, which potentially represent hypotheses of instances of the aircraft in the image. The system aims to deal with the edge fragmentation problem due to various image degradations. The model is extracted by hand
from one or more images, and is composed of two orthogonal planes: a horizontal
plane outlining the wings and the fuselage, and a vertical plane representing the tail.
In the system, the camera model and transformation (translation and orientation)
are assumed known, hence the model is transformed accordingly and projected to the
image plane. Using the sun azimuth and incidence angles (assumed to be known),
the shadow outlines can be computed and augment the projected 2-D aircraft model.
This process is followed by a hidden-line removal procedure.
The extraction of the image line segments is carried out using the LINEAR feature
extraction system [94]. Given a set of projected model segments and a set of extracted
image segments, every candidate pair of matching segments (one from each set) is
checked for their separation, angular deviation and length difference. These error
terms are used to determine the weighting for the vote. The vote is cast into an
accumulator array, whose axes denote the 2-D translation.
A peak in the accumulator array gives the position of the best translation between
the two sets of segments, and a second pass of the algorithm collects the matching
pairs that contribute to the peak. If the matching level exceeds a preset threshold,
then the model is validated. If not, then further validation and evaluation follow.
The validation starts by computing a binary function of the matched segments between image and model, along the arc length of the model, and then scaling this
function to map it on a circle of radius 1, centred at (0, 0). This is followed by computing the moments of the resulting fragmented wheel to analyse the distribution of the matching pixels. The matching metric, eccentricity, length of match and displacement
are derived and used to determine if the hypothesised model is validated.
The performance analysis of the system given in [81] is carried out for one aircraft.
The pose estimation is assumed known, which limits the usefulness of this system.

Fairney Model

The aircraft recognition approach proposed by Fairney [37] starts by building a shape
description of the object. In this study, jet aircraft and missiles are used in the
experiment. A series of salient points on the aircraft boundary are connected by
straight line segments. These line segments form a series of directed edge segments
(or a chain of edge vectors). This process of shape description is repeated for the
model. The yaw and roll angles are fixed to reduce the problem to 2-D matching,
therefore the model database contains 2-D projections (in terms of edge vectors) of
the model, and yaw and roll angles.
Then the transform which brings the image edge vectors into coincidence with the
model vectors needs to be estimated. This transform comprises a scale factor s, an angle θ between the model and image edge vector pairs (pitch angle = θ), and two translations, Δx and Δy. A pose-clustering approach is selected here so that the transform
for one vector pair contributes a vote in the 4-D parameter space. After trying all
association combinations of the model edge vectors with the image edge vectors, the
most prominent cluster in the parameter space will be selected and the parameters associated with the cluster are regarded as the correct transform.
Given a model database, this method uses a compactness measure (area/perimeter²)
to narrow down the search space in the model database. Having obtained a short list of the model candidates satisfying the compactness constraints, the pose clustering is carried out using a more efficient 2-D table (shown in Table 2.2), instead of the 4-D parameter space.

Table 2.2: An example of a 2-D table for efficient pose clustering. The resolutions (bin widths) for s, θ, Δx and Δy are 0.2, 20°, 5 and 5 respectively.

s        θ         Δx      Δy      count
0.2-0.4  0°-20°    5-10    15-20   20
0.6-0.8  20°-40°   10-15   1-5     4
0.0-0.2  20°-40°   5-10    1-5     1
-        -         -       -       -
The pose clustering process starts with assigning bin widths to the four parameters
in the table. This table is initially empty. As the transformation for a vector pair is
estimated and appropriately quantised to the bin resolution, these parameters enter
the first row, with a vote count of one. For the next edge vector pair, if the parameter
combination does not already exist in the table, then this combination generates a new
entry in the table. On the other hand, if such a combination exists in the table, then
its count is simply incremented. This process can be made more efficient by initially
performing the clustering with larger bin sizes and then splitting the frequently visited
bins into smaller bins later. The data in the finally selected bin gives the best estimate
of the winning transformation. Further validation is carried out by aligning all the
image edge vectors with their model counterparts using the estimated transformation.
The root mean square (rms) difference between the transformed image coordinates
and those of the model are computed. The winning model and orientation of the
smallest rms error are finally selected. This method is efficient and can handle partial
occlusion and boundary perturbation due to noise. However, this method assumes a
successful extraction of the object boundary which can be very challenging in cluttered
scenes or under poor imaging conditions.
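A dictionary-based version of this table can be sketched as follows; the bin widths mirror those of Table 2.2, while the estimation of the individual transforms is assumed to be supplied by the edge-vector pairing step.

```python
from collections import defaultdict

# Sketch of the 2-D table used for pose clustering in Fairney's method: each estimated
# transform (s, theta, dx, dy) is quantised to the bin widths of Table 2.2 and its
# count incremented; the most visited bin gives the winning transform estimate.

BIN = {"s": 0.2, "theta": 20.0, "dx": 5.0, "dy": 5.0}

def quantise(s, theta, dx, dy):
    return (int(s // BIN["s"]), int(theta // BIN["theta"]),
            int(dx // BIN["dx"]), int(dy // BIN["dy"]))

def cluster(transforms):
    """transforms: iterable of (s, theta, dx, dy) tuples, one per edge-vector pairing."""
    table = defaultdict(int)
    for t in transforms:
        table[quantise(*t)] += 1
    best_bin = max(table, key=table.get)
    return best_bin, table[best_bin]

# Example: three pairings fall in the same bin, one does not.
print(cluster([(0.3, 10, 7, 16), (0.35, 15, 8, 17), (0.25, 5, 6, 19), (0.7, 30, 12, 3)]))
```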


Chien and Aggarwal Model


The work by Chien and Aggarwal [27] is based on the observation that high curvature
points play an important role in determining the object identity from its shape outline.
In this technique, object recognition is achieved through a hypothesis and verification
process, and a 2-D validation (matching) process. The overall procedure is outlined
as follows.
Hypothesis
Given a pair consisting of a 3-D model point coordinate (xm, ym, zm) and a 2-D image point coordinate (xi, yi), the following linear equations can be established that transform (xm, ym, zm) to (xi, yi):

xi = R′11 xm + R′12 ym + R′13 zm + s tx
yi = R′21 xm + R′22 ym + R′23 zm + s ty          (2.3.1)

where s, tx, ty and R′ij = s Rij are respectively the scale factor, the translations in x and y, and the rotation parameters. Since these equations have eight unknowns,
three additional point pairs are required to generate eight linear equations to solve
for the eight unknowns (ie., the transform parameters). Such a four-point correspondence, expressed in terms of the transform parameters, gives rise to a hypothesis (model and
transform).
Verification
The hypothesis in terms of the transform parameters needs to be verified using the
constraints associated with the rotational parameters:

1. R′11² + R′12² + R′13² = s²

2. R′21² + R′22² + R′23² = s²

3. R′11 R′21 + R′12 R′22 + R′13 R′23 = 0

If the computed transformation satisfies all these constraints, then the four-point correspondence is considered valid. The remaining model points are transformed onto
the image plane and the mean-square displacement error is computed. This process is
repeated over all the valid four-point correspondences. The four-point correspondence
whose mean-square error is below the threshold is selected, and [R31 R32 R33]ᵀ is found via the cross product of [R11 R12 R13]ᵀ and [R21 R22 R23]ᵀ to estimate the viewing angle.

Validation
The verified hypothesis brings the model and image contours into an alignment. First,
a pair of matching points are selected from the model and image contours. The
distance between the boundary point and centroid is then measured for both the
model and image contours, and their distance ratio is computed. The distance ratio
is collected for the remaining point pairs on the contours, and the standard deviation
of the ratios is used as the shape matching metric. The minimum distance ratio
standard deviation (DRS) is searched, to find the winning model and pose. A detailed
description of DRS is discussed in Section 5.1.2.
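The DRS measure described above can be sketched as follows, assuming that the verified hypothesis has already produced matched (ordered) boundary points on the image and model contours; the details here are illustrative and not those of Section 5.1.2.

```python
import numpy as np

# Sketch of the distance-ratio standard deviation (DRS): for matched boundary points,
# the ratio of (image point to image centroid distance) over (model point to model
# centroid distance) should be roughly constant for a good match; its spread is the metric.

def drs(image_contour, model_contour):
    """Both contours: (N, 2) arrays of matched boundary points, same ordering."""
    ci = image_contour.mean(axis=0)                   # image centroid
    cm = model_contour.mean(axis=0)                   # model centroid
    di = np.linalg.norm(image_contour - ci, axis=1)
    dm = np.linalg.norm(model_contour - cm, axis=1)
    ratios = di / dm
    return float(ratios.std())                        # small DRS -> good shape match

# The winning model and pose minimise the DRS over all verified hypotheses.
```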
Simulation results using various aircraft [27] demonstrated the tolerance of this method against occlusion and scale changes. This method is also applicable to multiple-target images. However, the images used for the experiment have a homogeneous background, and therefore it is unclear to us whether or not this approach
will tolerate other types of image degradation such as clutter. In this work, the effect
of occlusion is expressed as deformation of the closed contour. However, an aircraft in an unrestricted environment often displays fragmented contours and missing contour segments, making model-to-image point mapping difficult.

2.4 Knowledge-Based Vision Systems

Traditional global feature methods assume that each instance of an aircraft object
is an accurate projection of the aircraft of known dimensions and shape onto the
image plane [37]. The main focus of these approaches is to find the best match
from the model database to the image data for a particular viewpoint. These single
step recognition approaches are only effective if relevant image data is available and
relatively accurate. However, the image data is usually distorted due to a lack of
reliable low-level image processing techniques.
Reliable image primitive extraction is not always guaranteed in real-world environments. Variations in aircraft appearance in the image plane due to unknown viewpoint, noise, shadow, occlusion and adverse weather effects further complicate the
image data formation. Hence, the single step recognition approaches are often not
suited to object recognition in an uncontrolled environment.
A more appropriate approach is to carry out the analysis in multiple stages, where
each stage of image analysis is governed by the system knowledge/model database, as shown in Figure 2.3, which represents the domain object at various levels of hierarchy, from a local part description level to a global category level.
The recognition process begins by detecting the image primitives (low level image
processing) and then these primitives are combined to form higher level features and
to make coarse-level decisions (intermediate level processing). Associations of the higher level symbolic features lead to the recognition solution (high level processing). The process is usually bottom-up, but top-down feedback is also often implemented.

Figure 2.3: Framework of knowledge/model based aircraft recognition. The framework comprises low level processing (image acquisition and preprocessing of the problem domain), intermediate level processing (segmentation, representation and description) and high level processing (recognition and interpretation, producing the result), all governed by the knowledge/model base.
The existing methods of aircraft recognition include the COBIUS [9] in Section 2.4.1,
the ACRONYM system by Brooks [16, 17, 18] in Section 2.4.2, the TRIPLE by Ming
and Bhanu [86] in Section 2.4.3, and the Qualitative Aircraft Recognition Technique
by Das and Bhanu [32, 33] in Section 2.4.4. The last method by Das and Bhanu
[32, 33] is most relevant to our system, and therefore is discussed in more detail.

2.4.1 COBIUS

COBIUS [9] (a Constraint-Based Image Understanding System) was developed under the Lockheed Missiles and Space Company Image Technology Development Program. This system focuses on applications using high-resolution aerial imagery
interpretation, addressing generic domain object representation, compensation for unreliable image segmentation and knowledge control. The system consists of knowledge
bases for domain object models and control strategies, blackboard areas to contain the instantiated hypotheses of the scene, and an image feature database to fuse results from multiple image segmentation modules (see Figure 2.4).

Please see print copy for Figure 2.4

Figure 2.4: COBIUS image understanding architecture [9].
COBIUS uses a hierarchical representation scheme for both domain objects and constraints. For domain objects, the hierarchy consists of event, scene, group, object, subpart, surface and curve. A similar hierarchy applies to the constraints, from coarse
to fine levels. Complex constraints are decomposed into primitive constraints, and the constraints can be manipulated by rules and other constraints. Model-based
prediction and verification of primitive constraints from the complex constraints can
be used to reduce the combinatorial computation of matching techniques. In order to cope with unreliable image segmentation, COBIUS uses a multiple feature
fusion approach with model-based feature verification capability. The region segmentation generates coarse image features for initial image interpretation, and the edge segmentation provides more detailed shape information for model-based verification.


For partially supported hypotheses, their missing parts are predicted, and the refocused search regions are selected for model-based re-segmentation. To address the
knowledge control problem, the COBIUS control knowledge is represented in terms
of control schemes and strategy selection rules, which manipulate the constraints in
a dynamic fashion and also decide which hypothesis will be explored first. The novelty of this approach is that constraints are represented as hierarchically organised objects; therefore, the generation, combination, manipulation and evaluation of the
constraints are made flexible, and the system is adaptive to new domains. However, when applied to aerial imagery, the system requires a great deal of ancillary
information about the scene.

2.4.2 ACRONYM

Brooks [18, 19] introduced a vision system called ACRONYM, which recognises 3-D objects from 2-D images. He uses an example of recognising airplanes on the
runway of an airport from an aerial photograph. He uses a generalised cylinder (or
cone) representation for the models. A relational graph structure is used to store
such representations. Nodes are the generalised cylinders and the links represent the
relative transformations between the cylinder pairs. The system also uses two other
graph structures, constructed from the object models, to assist the matching process.

Restriction Graph: Restricts the composition of any class of an object to a hierarchy of subclasses. An example given by Brooks considers classes of electric
motors. These can be described by a generic motor type which is then divided
into more specific classes of motors such as ones with a base and with flanges.
These can then be further described in terms of functional classes (dependent on use) such as central heating water pump or gas pump. Additional restrictions
are allowed to be added to the graph during the recognition process.
Prediction Graph: Links in the graph represent relationships between features in the image. These links are labelled must-be, should-be or exclusive
according to how likely it is that a given pair of features will occur together in
a single object.

For any 3-D object represented as a generalised cone, one can define a corresponding
2-D shape representing its image under perspective projection from any arbitrary
view point. Two descriptions are used for the 2-D image features.

Ribbons: These are planar shapes used to describe a projection of an object made of generalised cones. The ribbon is described by (L, S, R), where the shape is
generated by translating a line segment, L, along a finite planar spine, S, using
a sweeping rule, R. R governs the angle of L to S as L progresses along S.
Ellipses: These are used to describe the projection of the ends of the generalised cones. For ends of a circular cylinder, the projections are exactly ellipses.
For polygons, ellipses can provide a description of the ends by fitting the best
ellipse through the vertices and noting the projection of this shape.

Figure 2.5 depicts the generalised cylinder (cone) representation of an aircraft model, and the projected images in terms of ribbons and ellipses.

Figure 2.5: Generalised cylinder representation of an aircraft and its projected images in terms of ribbons (for the fuselage and wings) and ellipses (for the projections of the ends of a generalised cylinder).

ACRONYM uses its geometric models, supplemented by a restriction graph and constraints upon variations in element sizing, structuring, positioning and orientation, to predict possible ribbon images from various viewpoints. The matching process is performed in two stages:
1. The image is first searched for straight or curved lines and then, by linking lines
that are proximal within certain tolerances, local matches to ribbons predicted
from the model are searched for. Such instances of ribbon matches are grouped.
2. The groups of the matched ribbons are checked for global consistency in that
each match must satisfy both the constraints of the prediction graph and the accumulated constraints of the restriction graph.

ACRONYM, however, falls short of addressing real-world concerns. In particular, it has no mechanism for automatically acquiring and refining object models or for handling shadows, while clutter and other image degradations are not adequately dealt with [33].

2.4.3 TRIPLE System

Ming and Bhanu [86] developed a target recognition system called TRIPLE (Target
Recognition Incorporating Positive Learning Expertise) that incorporates two powerful learning techniques, known as Explanation-Based Learning (EBL) and Structured
Conceptual Clustering (SCC).
Figure 2.6 illustrates the configuration of the components in the TRIPLE target recognition system. The processing elements, shown as blue rectangular blocks in the figure, process the input image data and features, and generate the target recognition results.

Figure 2.6: Multi-strategy machine learning approach for aircraft target recognition. The components include segmentation and symbolic feature extraction, knowledge-based matching, explanation-based learning, structured conceptual clustering, a feature value monitor, the target classification tree, the target model database, background knowledge and a goal dependency network, linked by the target model acquisition and refinement cycle and the target feature value refinement cycle.
The segmentation and symbolic feature extraction block segments and locates the
regions of interest (ROIs) in the image, and then extracts the symbolic features from
the ROIs. The knowledge-based matching block traverses the classification tree using
the extracted symbolic features to reach a leaf node of the tree. If successful, then the target has been correctly identified. The matching block also initiates the proper
learning cycle based on the target recognition results. The explanation-based learning
(EBL) block, when invoked by the matching block, selects the relevant target features
based on the symbolic feature information during the target model acquisition cycle
(as bounded within the red box in Figure 2.6). The EBL block also identifies new relevant
features for updating the classification tree. The structured conceptual clustering
(SCC) block is responsible for maintaining the classification tree, using the relevant
symbolic features selected by the EBL block. The feature value monitor block adjusts
the feature values in the classification tree, according to the changes in the previously
selected features for target recognition, during the target feature values refinement
cycle (as bounded within the green box in Figure 2.6).
The background knowledge is accessed by the EBL block to assist in discriminating relevant target features from the background. The target model database stores the complete schema of each target previously encountered by the recognition system.


Relevant features, determined by the EBL block, are marked for future reference in
the target model database. The SCC utilises the goal dependency network (while
maintaining the classification tree) to compute the optimal clustering of the targets.
The target classification tree represents a structured hierarchy of all targets known
by the TRIPLE system, and assists the matching block to categorise various target
recognition results, in terms of complete recognition, partial recognition, occlusion,
recognition failure, new target, and target model refinement. The machine learning components in the TRIPLE system allow the system to adapt its target model
representations, in order to operate effectively in unconstrained environments.
The test suite in the experiment is confined to 2-D computer-generated aircraft,
where image degradation effects were also simulated. These results serve merely as a
proof of the system concept as the system was in a development phase. We are not
aware of any further progress or update to this work.

2.4.4 Das and Bhanu

Das and Bhanu [32, 33] proposed a system for recognising aircraft in complex, perspective aerial images, using qualitative features. The system is designed to deal with
the issues of real-world scenarios, such as shadow, clutter, and low contrast. It uses a
hierarchical representation (consisting of qualitative-to-quantitative descriptions) of
aircraft models. Such descriptions vary from symbolic features (eg., aircraft wing)
to primitive geometric entities (eg., lines, points), and allow an increasingly focused
search of the precise models in the database to match the image features.
The system consists of four distinctive features, which are:

1. A qualitative-to-quantitative hierarchical object model database, and three recognition sub-processes which utilise these models.

2. Saliency-based regulation of the low-level features to be used, in an incremental fashion, in the subsequent steps of recognition.

3. Model-based symbolic feature extraction and evaluation that uses regulated low-level features and heterogeneous models of image segmentation, shadow casting, and image acquisition.

4. Refocused matching for finer object classification.

Please see print copy for Figure 2.7

Figure 2.7: Framework of the qualitative object recognition system [33, 32].

The framework of the recognition system is shown in Figure 2.7. Initially, the lower
resolution version of the input image is processed to locate the regions of interest
(ROIs) [91] by identifying feature clusters. As a first step, edge pixels in the ROI
are detected by applying multiple thresholds, acknowledging the fact that different
images or different parts of an image are subject to different optimum thresholds.

Figure 2.8: Convexity test on a line pair. For any two lines, Li and Lj, we determine two extra lines (green dashed) by joining the end points of Li and Lj. If these lines are contained in the segmented region (shaded), then the convexity test is passed; otherwise, it is rejected.

Initially, the most salient lines are used in the grouping, and if no aircraft recognition is achieved, then the next most salient lines are included. This progressive relaxation
continues until a successful recognition is declared or the least salient line features
are invoked.
Edge segment following is conducted to create long chains of edge segments. As
various parts (eg., wing, nose, fuselage, etc) of the generic aircraft model are described
in terms of linear segments, a straight line extraction technique similar to that of
Lowe [79] was used. Furthermore, corners are detected by obtaining gradients and
curvature measurements. In addition to line extraction, region segmentation (based
on the joint relaxation of two-class region-based and edge-based approaches [7]) is
also carried out. The potential dominant axes of the aircraft region are generated by
connecting the extremities of the segmented foreground region.
This system uses ancillary data, which includes weather condition. If the weather
is cloudy, then the shadow-detection algorithm is skipped. If not, then potential
shadow lines are extracted. If two regions divided by a line present bi-modality of
the intensity histogram, then the line is marked as a potential shadow line.

Figure 2.9: Three- or four-line grouping process to generate symbolic aircraft features. The shaded circles represent the proximal regions of independently detected corners. Any group of three lines (on the left) must satisfy the following conditions: (i) the two lines, Li and Lj, are non-parallel; (ii) the third line, Lk, is in between Li and Lj; (iii) the line intersections occur near independently detected corners; and (iv) the third line, Lk, is shorter than at least one of Li and Lj. In addition, a group of four lines (on the right) must satisfy the following conditions: (i) the two lines, Li and Lj, are non-parallel, and the other two, Lh and Lk, are parallel; (ii) the parallels form the opposite sides of the trapezoid; (iii) the line intersections occur near the detected corners; and (iv) the parallel lines, Lh and Lk, are shorter than the non-parallels, Li and Lj.

In order to extract the meaningful edges, the algorithm executes a two-pass convex-group extraction process. During the first pass, the entire set of lines is decomposed
into subsets, based on proximity and collinearity such that lines in a subset satisfy
the convexity criterion. This convexity requirement is illustrated in Figure 2.8. If
the lines (in green) created by joining the endpoints of line pair, Li and Lj , are all
contained in the segmented region (shaded), then a convex group is created. This step
also results in line pairs that fail the convexity test (an example of which is shown
in red), which are subsequently put in a pool. The second pass considers if isolated
lines from the pool can be put in a convex group with relaxed proximity condition.
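A sketch of the convexity test of Figure 2.8 is given below: the two segments joining corresponding endpoints of Li and Lj are sampled, and every sample must fall inside the binary segmented region. The sampling density and the endpoint pairing are illustrative assumptions.

```python
import numpy as np

# Sketch of the convexity test on a line pair (Figure 2.8): join corresponding end
# points of Li and Lj and accept the pair only if both joining segments are contained
# in the segmented region.

def segment_inside(region, p, q, samples=50):
    """region: binary mask indexed (row, col); p, q: (row, col) end points."""
    t = np.linspace(0.0, 1.0, samples)[:, None]
    pts = np.round(np.asarray(p) + t * (np.asarray(q) - np.asarray(p))).astype(int)
    return bool(region[pts[:, 0], pts[:, 1]].all())

def convexity_test(region, line_i, line_j):
    """line_i, line_j: ((row, col), (row, col)) end-point pairs for Li and Lj."""
    (a0, a1), (b0, b1) = line_i, line_j
    # the two 'extra' lines join the end points of Li and Lj
    return segment_inside(region, a0, b0) and segment_inside(region, a1, b1)
```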
Provided that it is not overcast or dark, a shadow line to shadow-making line matching
is conducted using ancillary information about the camera-platform position/orientation and the sun position, together with the imaging parameters. Convex groups of shadow-making lines are used to extract the symbolic features of the generic aircraft class. Such features include trapezoid-like shapes for the wings, tails and rudder, and a wedge-like shape for the nose part. To extract these symbolic features, sets of conditions derived from the aircraft part/subpart representation are used in a matching process with three- and four-line groupings. Figure 2.9 illustrates typical arrangements of three- or four-line groupings for the symbolic features of the generic aircraft class.
Once the symbolic features have been derived, they are matched to the generic aircraft
model through an evidence accumulation process; the parts' mutual connectedness is verified against the rules associated with the generic aircraft description. The recognition confidence is based on the quality of the evidence. If the confidence level
is low, then the low level image processing is revisited to include less salient features.
Upon the recognition of a generic aircraft, further refinement of the detected aircraft
shape is initiated to account for the missing elements of the symbolic features. The
labelled parts are used to direct the search for more localised (symbolic/primitive)
features that are available at lower levels of the database hierarchy. Such retrieval of
the less salient features allows more precise classification in the refocused matching
process. The final output is the aircraft class recognition and its symbolic subparts.
The main contributions of this system are: (a) the salient feature extraction and their
use in a regulated fashion, (b) the use of heterogeneous geometric and physical models associated with image formation for feature extraction and subsequent recognition, and (c) the integration of high-level recognition processes with low-level feature extraction processes. The combination of these essentials makes the system robust against edge fragmentation commonly encountered in practice, as demonstrated in [32] using
real aircraft images of varying contrast, clutter and shadow.


The drawback of this system is its over-reliance on region segmentation. If an aircraft image contains camouflage, self-cast shadow or occlusion, resulting in multiple
disjoint subregions, then the dominant axes estimation and convex group extraction
may suffer. It is not clear how this system can cope with heavily cluttered images,
particularly if the background is not plain (eg., buildings or other objects in the background). Furthermore, this system is not capable of handling closely spaced multiple
aircraft in the ROI.

Chapter 3
Feature Extraction and Generation
of Aircraft Hypothesis
This chapter deals with the extraction process of line features, grouping of lines that
potentially describe or delimit an aircraft part in an image, and generation of aircraft
hypotheses. A part of the generic aircraft such as a wing, tail, nose or fuselage, is either
trapezoid-like, wedge-like or elongated in shape. In this system, the most prominent
part of an aircraft is the wings, as wing edges are usually straight and readily visible
from most viewing angles. The wing structure carries distinctive geometric attributes,
which provide strong clues of aircraft presence in the image. Furthermore, the wings
enable the gross classification of aircraft in terms of wing shape. In our system, the
wings are represented in pairs forming a triangular, diamond or boomerang shape.
Both wings are usually delimited by four linear sides associated with the leading
and trailing edges. As depicted in Figure 1.9, lines, two-line groupings and four-line groupings are extracted in this order to provide a hierarchical structure that
facilitates subsequent image analysis. Such progressive grouping of features enables
the propagation of geometric/intensity constraints to prune out a large number of
unwanted features at each stage, hence preventing the combinatoric explosion that would otherwise occur.


This chapter begins with a review of straight line extraction methods in Section 3.1.
Section 3.2 discusses the proposed edge detection algorithm, which is designed to
be more sensitive to long straight edges than to short clutter. Section 3.3 describes
how large portions of undesirable dense clutter are removed, and briefly explains the
contour extraction process. Section 3.4 outlines the line extraction, line extension
and line organisation (prioritisation) processes, closing with detection of polarised
background lines. In Section 3.5, the generation of wing and nose candidates as twoline groupings is presented. In Section 3.6, the wing-candidates are paired to generate
potential wing-pairs as four-line groupings. Such wing-pair candidates are represented
as boomerang, triangular and diamond shapes. In Section 3.7, wing pairs and noses
are associated to generate aircraft hypotheses. In Section 3.8, neural networks are
introduced as an alternative to the rules from Sections 3.5 - 3.7. This chapter closes
with summarising comments in Section 3.9.

3.1 Review of Line Extraction Methods

In many works including ours, a straight line forms the fundamental or basic feature
upon which more complex features are built. Due to its importance, a number of
existing techniques that extract lines in images of man-made structures (such as
buildings and roads), are first introduced. The most classic technique is the Hough
transformation [3, 61, 64, 75], where every edge pixel is indexed into a quantised
parameter space, based on the location and direction in the image. Point clusters
in the parameter space correspond to straight lines. The disadvantage of this global
processing method is that it fits straight lines to collinear points regardless of their
spatial contiguity. The alternatives to the Hough transform are techniques that use templates to extract edge pixels and then link them locally.
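For reference, a standard (ρ, θ) Hough transform for straight lines can be sketched as follows; the quantisation steps are illustrative choices.

```python
import numpy as np

# Standard (rho, theta) Hough transform sketch: every edge pixel votes for all the
# quantised (rho, theta) line parameterisations passing through it; peaks in the
# accumulator correspond to straight lines.

def hough_lines(edge_map, n_theta=180, rho_step=1.0):
    rows, cols = np.nonzero(edge_map)
    thetas = np.deg2rad(np.arange(n_theta))
    diag = np.hypot(*edge_map.shape)
    rhos = np.arange(-diag, diag, rho_step)
    acc = np.zeros((len(rhos), n_theta), dtype=np.int32)
    for r, c in zip(rows, cols):
        rho_vals = c * np.cos(thetas) + r * np.sin(thetas)
        rho_idx = np.digitize(rho_vals, rhos) - 1
        acc[rho_idx, np.arange(n_theta)] += 1
    return acc, rhos, thetas     # peaks in acc give (rho, theta) of detected lines
```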


Nevatia and Babu [94] use six directional 5×5 masks representing segments of oriented edges, in steps of 30°. The convolved edge image is thresholded and thinned to
produce edge pixels. Edge pixels are linked firstly by marking the locations of the
predecessor and successor of each edge element and then storing them in two files,
the predecessor and successor files, respectively. Tracing is followed by grouping the
linked edge elements into ordered lists using the predecessor-successor data. These
lists make up a boundary segment and they are approximated by a series of piecewise
linear segments using an iterative end-point fitting method.
In the approach by Burns et al. [20], edge pixels are first determined by convolving the image with two orthogonal 2 × 2 masks. The pixels are grouped into line-support regions of similar gradient orientation. The intensity surface associated with each line-support region is approximated by a planar surface. Straight lines are extracted by
intersecting this fitted plane with a horizontal plane representing the average intensity
of the region weighted by a local gradient magnitude.
Venkateswar and Chellappa [121] developed an algorithm to detect linear object
boundaries in aerial images. This uses the Canny edge detector [21] to generate
the edge image. The edge direction of each edge pixel is quantised into four directions in steps of 45°. The edge image is raster scanned, and each scanned edge pixel
is assigned a label, thereby producing a label image. Newly generated line labels are
compiled in a database, which holds the two endpoints, pixel count and average contrast. Often, due to noise or poor contrast, the lines are fragmented. These fragments
are merged based on their contiguity and collinearity. If one segment is associated
with multiple neighbouring segments, then the conflict is resolved by using Lowe's collinearity significance measure [79] to rank the multiple pairs and choose the one
with the maximum measure. This merging process is carried out recursively until no


more pairing is possible. The suppression of noisy lines is also implemented, based on line length, average contrast, and whether or not the line is isolated (ie., no other lines within a 7 × 7 neighbourhood). This method is effective in detecting linear contours
in aerial images.
So far, the straight line extraction methods discussed have relied on gradient-based edge detection, such as the Canny edge detector [21]. Marr and Hildreth [82] proposed an alternative scheme that detects edges using the Laplacian of the Gaussian (LoG) and its zero crossings [82]. Bennamoun [4] discusses the trade-offs between the gradient-based and Laplacian-based methods, and presents a hybrid of the two. Perona and Malik [101] proposed anisotropic diffusion in place of Gaussian smoothing, which encourages intra-region smoothing in preference to inter-region smoothing.
Since we are only interested in extracting straight edges, we decided to adopt the methods applied to the detection of objects with straight edges, such as buildings.
Therefore, our approach bears some resemblance to the works of Nevatia and Babu
[94] and Venkateswar and Chellappa [121], to the extent that these methods make
use of multiple directional masks and generate a pixel-orientation image (or phase
map), which is used to assist the edge-linking and merging processes. This has been
demonstrated in a number of real aerial images of buildings. We use more directional
masks, which enables us to increase the mask size in order to improve the detection
sensitivity to long but weak edges.

3.2 Proposed Edge Detection

The input to the system is an eight-bit grey-scale image with an image size of M × N,
where M and N vary between 240 and 600 pixels. The input image is convolved


with eight directional masks in steps of 22.5°. These are shown in Figure 3.1, and are padded with zeros in such a way that the effective shape of the mask approximates an elongated rectangle with the dimension of W × L, where W and L are shown in red in Figure 3.1. We set W = 7 pixels and L = 9 pixels.
The increased mask size in comparison to Nevatia's [94] 5 × 5 was found to be more
appropriate for our aircraft application, allowing finer directional quantisation and
improved detection sensitivity to long but weak edges. However, making the mask
larger decreases the edge detection sensitivity around corner areas, and may result in
significant displacement of edge pixels. By using the increased number of directional
masks (8 masks as opposed to 4, as in Venkateswar [121], or 6, as in Nevatia [94]), the
sensitivity to weak edges is increased. It also provides more precise phase information,
which plays an important role in the contour extraction and linking processes.
The convolution of the original image with the eight directional masks generates eight gradient images {G1, G2, . . . , G8} associated respectively with the directions {−67.5°, −45°, . . . , 90°}. For each pixel (m, n) in the image, the largest gradient magnitude in {G1, G2, . . . , G8} is noted and the corresponding direction is assigned to the direction (or phase) image. In mathematical terms, the gradient and phase images are computed using

    G(m, n) = max{Gi(m, n) : 1 ≤ i ≤ 8},    P(m, n) = arg max{Gi(m, n) : 1 ≤ i ≤ 8}.    (3.2.1)
This process is followed by thinning (non-maximum suppression) and thresholding.


In the thinning algorithm, the pixels of high gradients are traced in the edge direction
and non-maximum pixels are suppressed (ie., gradients set to 0). Let g = G(m, n)
and p = P(m, n). Then g is compared with the adjacent pixels (g1 . . . gW ) on both
sides, along the direction normal to p. If the gradient value, g, of the current pixel is

the largest of all the adjacent pixels (ie., g > gi for all i = 1, . . . , W, where W = 7 as shown in Figure 3.1), then the current pixel is accepted as an edge pixel, and the adjacent pixels are removed.

[Figure 3.1 appears here: the eight directional edge masks, oriented at 90°, 67.5°, 45°, 22.5°, 0°, −22.5°, −45° and −67.5°, each composed of +1 and −1 coefficients padded with zeros to approximate a W × L rectangle.]

Figure 3.1: Eight directional edge masks in angular steps of 22.5 degrees. These edge masks have an elongated rectangular shape to detect long weak edges.
The thinned gradient image undergoes thresholding that uses two thresholds to reduce
edge fragmentation. An edge contour can be broken into fragments by the gradients
that fluctuate above and below the threshold along the edge. If a single threshold
is applied to the gradient image, and the edge has an average strength equal to the
threshold, then because of the noise the edge may occasionally dip below the threshold
and appear dashed. To avoid this, we make use of two thresholds, one high and one low.
Any pixel in the image that has a gradient above the high threshold is tagged as an
edge pixel. Then pixels which are connected to this edge pixel and have a gradient
above the low threshold, are also selected as edge pixels.
We set the two thresholds to 16.5% and 10% of the image peak gradient. These
threshold values were selected after experimenting with a number of aircraft images
which present blurring, low contrast and clutter. The illustration of this process can
be found in [32].
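A minimal sketch of this double-threshold step is given below, assuming a thinned gradient image G stored as a NumPy array and using SciPy's connected-component labelling; the function name and the 8-connectivity choice are illustrative assumptions rather than the exact implementation:

import numpy as np
from scipy.ndimage import label

def hysteresis_threshold(G, high_frac=0.165, low_frac=0.10):
    # G: thinned gradient image; thresholds are fractions of the peak gradient
    high, low = high_frac * G.max(), low_frac * G.max()
    strong = G >= high                                   # seed edge pixels
    weak = G >= low                                      # pixels allowed to extend a seed
    labels, _ = label(weak, structure=np.ones((3, 3)))   # 8-connected components of weak pixels
    keep = np.unique(labels[strong])                     # components touching a strong pixel
    keep = keep[keep != 0]
    return np.isin(labels, keep)                         # final binary edge map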
Additional processing is carried out to discard isolated 1-3 pixel clusters. Furthermore, it is often observed that an endpoint pixel may show a phase value inconsistent
with those connected to it. This is mainly due to the fact that only about half the
mask overlaps the edge, leading to erroneous phase computation. The phase value
of such endpoint pixels is corrected and made consistent with the phase values of
adjacent pixels.

3.3 Clutter Rejection and Contour Extraction

In many real world complex scenes, a large proportion of edge pixels appear in the
background. These background pixels increase the computational load by generating
an excessive number of line features. If the background clutter is dense and evenly
distributed in orientation, then it is possible to filter out many of these pixels at an
early stage.
Initially, we considered using texture-analysis methods for discriminating such clutter.
There exist various methods for extracting textural information from images that can
be largely divided into four categories [120]: statistical [96], geometrical [104, 117,
119], model-based [35, 100], and signal processing [28]. According to Tuceryan [120], the outcomes of these texture-based methods seem applicable only to their reported experimental setups. Furthermore, the variability of the clutter objects is usually too
large to be covered by a set of tractable models. The clutter types, that we usually
encounter in aircraft recognition include forests, urban areas, clouds, snow, rocks and
mountains. Grenander and Srivastava [46] proposed a way to model natural clutter
in terms of gradient distribution functions, classifying the clutter as one of three
classes (eg., structured, intermediate, and dense). However, we are not interested in
determining the clutter type, but instead, we are more interested in removing dense
clutter regions while preserving the aircraft outer boundary.
For this reason, we propose a simpler but effective approach, where local density and
orientation of edge pixels are used to distinguish clutter from aircraft. We implement this by applying a sliding window to the phase image, and examining the pixel
patterns in it. We use a rectangular window (see Figure 3.2), whose dimension is
proportional to the input image dimension. After numerous computer simulations,
the optimal window size was chosen to be 1/20th of the image size. The entire region


of the image is initially considered as clutter. The window slides on the phase image
in steps of 1/4 of the window size allowing a 75% overlap. The window is divided into
four quadrants. The edge pixels within each quadrant are collected and their density
and phase distribution are computed, based on the following measures.

1. The total pixel density, T, within the window must exceed a threshold (T > 7%). T is defined as the ratio of the number of non-zero pixels within the window to the number of pixels in the window area.

2. The ratio of the maximum to minimum pixel density over the four quadrants (ie., Q(1), Q(2), Q(3), Q(4)) must not be high,

    max(Q(1), . . . , Q(4)) / min(Q(1), . . . , Q(4)) < thr,

where Q(i) is defined as the pixel density of the ith quadrant, and thr is a ratio threshold.

3. The pixel phase values must not be polarised in one direction.
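The window test can be sketched as follows; the values used for the density ratio and the phase polarisation fraction are hypothetical placeholders, since the text only fixes the 7% total-density threshold:

import numpy as np

def is_clutter(window, density_thr=0.07, ratio_thr=3.0, polar_frac=0.7):
    # window: 2-D array with 0 for non-edge pixels and a phase index (1..8) otherwise
    h, w = window.shape
    edge = window > 0
    if edge.mean() <= density_thr:                       # condition 1: total pixel density T
        return False
    quads = [edge[:h // 2, :w // 2], edge[:h // 2, w // 2:],
             edge[h // 2:, :w // 2], edge[h // 2:, w // 2:]]
    dens = np.array([q.mean() for q in quads])
    if dens.min() == 0 or dens.max() / dens.min() >= ratio_thr:   # condition 2: balanced quadrants
        return False
    counts = np.bincount(window[edge].astype(int), minlength=9)[1:]
    if counts.max() > polar_frac * counts.sum():         # condition 3: phases not polarised
        return False
    return True                                          # dense, balanced and randomly oriented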

Only when these conditions are all satisfied is the region under the window allowed to remain as clutter; otherwise it is considered to be a non-clutter region. When
the window is centered on the aircraft-clutter boundary, one or two quadrants of the
window will exhibit low pixel densities, not satisfying the second condition. Hence
the aircraft boundary is typically determined as the non-clutter region. This is clearly
shown in Figure 3.3, where the immediate proximity of the aircraft boundary is
labelled non-clutter. Figure 3.3 shows the aircraft images that contain dense clutter
in the background. The regions detected as clutter are shown as shaded regions. The
clutter rejection in Figure 3.3(a)-(f) amounts to 40%, 66%, 53%, 50%, 55% and 46%
of the whole edge image. Figures 3.4 and 3.5 show the edge images before (middle

column), and after (right column) the clutter removal. Notice further that although portions of the aircraft boundaries are within the shaded areas of Figure 3.3, they are successfully recovered (and not rejected) in Figures 3.4 and 3.5. This processing feature is explained later in this section.

[Figure 3.2 appears here: a sliding search window, divided into four quadrants, applied to the binary edge image of size M × N in overlapping steps.]

Figure 3.2: Sliding search window for detecting dense clutter. The pixels in all of the four quadrants need to be dense and randomly oriented if the region under the window is to be tagged as clutter.
To extract straight lines, two approaches have been initially considered; one approach
is to generate the contours first and then to perform straight line fitting to the contours
[79]. The other approach is to generate shorter line segments, so called line primitives
[121], directly from the edge image (skipping the contour part), and progressively
build longer lines either by locally linking the line primitives [29, 71, 121] or by
finding and grouping globally optimal line segments [63, 94]. The latter approach is
better suited to images that contain long and straight lines as in images of buildings
and roads. If the object contains curved lines, then the first approach is preferred.
Our early experiments of the two approaches on real aircraft images clearly favoured
the first approach.
[Figure 3.3 appears here: six example images, panels (a)-(f).]

Figure 3.3: Detection of randomly oriented dense clutter regions. The clutter regions are shaded. The clutter-aircraft borders are correctly included in the non-clutter region so that the wing edges can be extracted.

Please see print copy for Figure 3.4

Figure 3.4: Results of the dense clutter removal process. The first column shows the original images, the second column the edge images prior to the clutter removal algorithm, and the third column the edge images after clutter removal.

Please see print copy for Figure 3.5

Figure 3.5: Results of the dense clutter removal process (continued). The first column shows the original images, the second column the edge images prior to the clutter removal algorithm, and the third column the edge images after clutter removal.

[Figure 3.6 appears here: a contour labelling example in which the currently visited pixel searches, along its phase direction, for a neighbouring labelled contour pixel and inherits its label.]

Figure 3.6: Contour labelling process. The current pixel searches for a contour pixel to inherit the label from. The direction of search is defined by the orientation of the current pixel.

The phase image is raster scanned (left to right and top to bottom) for contour labelling. When a current pixel is visited, its phase value suggests where to look for
a labelled contour pixel with similar orientation. If there exists a labelled contour
pixel within the search window and the phase difference between the two pixels is less than 23°, then the current pixel inherits the contour label from the existing contour pixel. The
phase similarity check ensures the extraction of smooth contours, not containing
high curvature points. Any high curvature point marks the end of the contour.
This labelling process is illustrated in Figure 3.6. Due to imaging degradation, edge fragmentation is often encountered, therefore a gap of up to 5 pixels is tolerated during
the contour extraction process. Each time a pixel is assigned a label, the pixel phase
distribution (in terms of histogram) for the contour label is updated. The phase
histogram is used to assess how straight the contour is. This information assists the
contour linking process, which is carried out subsequently. We also update whether
the currently visited pixel falls within the clutter or non-clutter regions. This becomes
useful when a decision is needed to accept or reject a contour that straddles both
clutter and non-clutter regions (refer to Figure 3.7).
Once contour extraction is complete, any short contour fragments that are separated

by slightly more than 5 pixels, but which are collinear according to their phase histograms, are linked. Contours in the clutter regions are removed. However, if a substantial portion of the contour is in the non-clutter region (ie., at least 30% of its pixels lie in the non-clutter region) the contour is accepted (as illustrated in Figure 3.7). The outcome of this process is illustrated in Figures 3.3(c)(d), 3.4(i) and 3.5(c). Sections of the nose and cockpit boundaries fall into the clutter region, but are successfully recovered in Figure 3.4(i) and Figure 3.5(c).

[Figure 3.7 appears here: contours lying entirely in the clutter region are rejected, while a contour with more than 30% of its pixels in the non-clutter region is accepted.]

Figure 3.7: If a contour has at least 30% of its pixels in the non-clutter region, the contour is accepted.

3.4 Line Extraction and Organisation

3.4.1 Linear Approximation of Contours

Our straight-line fitting approach is similar to that of Lowe [79]. The contour pixel
farthest from the line joining the contour endpoints is selected as a potential break point. If its orthogonal distance exceeds a threshold, the contour is split into two sub-contours at that point, and the process is repeated until no further contour splitting is possible. Figure 3.8 illustrates the straight line extraction process, which eventually

leads to the piecewise linearisation of the original contour. The resulting linear segments are stored in a database along with a number of line attributes, as shown in Figure 3.9.

Please see print copy for Figure 3.8

Figure 3.8: Straight line extraction process, similar to that of Lowe [79]. This algorithm generates a line approximation which is visually plausible.

LINE
Line No: #
Endpoint1: (#,#)
Endpoint2: (#,#)
Length: #
Orientation: #
Significant: Yes/No
Collinear: pointer to its associated collinear line segments
Gap: #
Connected to: pointer to co-terminating proximal line segments

Figure 3.9: Line representation. Note that the symbol # represents a number.
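The recursive split-and-fit procedure described above can be sketched as follows (a generic split-at-farthest-point routine in the spirit of Lowe's method, not the exact thesis implementation; the tolerance value is an assumption):

import math

def split_into_lines(contour, tol=2.0):
    # contour: list of (x, y) points; returns a list of line segments (endpoint pairs)
    if len(contour) < 3:
        return [(contour[0], contour[-1])]
    (x1, y1), (x2, y2) = contour[0], contour[-1]
    norm = math.hypot(x2 - x1, y2 - y1) or 1.0
    # orthogonal distance of every point from the chord joining the endpoints
    dists = [abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / norm
             for x, y in contour]
    k = max(range(len(dists)), key=dists.__getitem__)
    if dists[k] <= tol:                       # contour already close to a straight line
        return [(contour[0], contour[-1])]
    # otherwise split at the farthest point and recurse on both halves
    return split_into_lines(contour[:k + 1], tol) + split_into_lines(contour[k:], tol)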

3.4.2 Extension of Collinear Lines

In practice, the aircraft silhouette outline is often fragmented due to various forms
of imaging degradation. The fragmentation can also arise from self occlusion due


to rudder, engine or missiles, and edge discontinuities due to wing flaps. Such fragmentation reduces the saliency of the desired line features, and presents a challenge
to the feature extraction processes. Numerous edge extension or linking methods
have been proposed to overcome this fragmentation problem. These methods are
broadly divided into two categories: a global process known as the Hough transform
[61, 64, 75] and local segment grouping approach [10, 90]. Hough transform methods
are not favoured in this work mainly because they implement global line search in
the image and therefore require significant postprocessing to associate the computed
line parameters with the line segments in the image. The second approach overcomes many weaknesses of the Hough transform, but calls for an iterative process to
link all fragmented collinear segments, and may still not handle severely fragmented
edges. Hybrids of these two approaches have been proposed [63], which appear to be
promising for linking severely fragmented collinear segments.
The proposed line extension method is basically a local approach, but is tailored for
our aircraft recognition application in that the line extension is desired only for the
wing edges. We prefer to use the terminology of line extension as opposed to line
linking or joining, mainly for two reasons. The first is that the gap between two
collinear lines can be large. The second reason is that we hypothetically join collinear
lines even if the resulting longer lines do not necessarily correspond to actual aircraft
edges. However, by extending lines we increase the probability that fragmented wing edges become longer and improve in saliency. Extended lines that do not correspond
to aircraft wings may temporarily gain importance and be part of two, four or more
complex line groupings. These groupings, however, are most often discarded at the
higher level of generic aircraft recognition, where aircraft geometric and intensity-based constraints are applied.
After processing numerous aircraft images, we observed that no more than 4 edge


fragments are obtained from the wing edges. In our system, line extension is not iterative and is confined to pairs of line segments. Only if the gap between the two collinear segments is wide is a third line in the gap searched for and three-line extension allowed. We also make use of intensity information from both sides of the line segments to supplement the geometric conditions.
We define extended lines as lines generated by joining two or three collinear line
segments. The generation of extended lines is required in practice to build longer
lines out of numerous short segments. These longer lines need to be detected as they
are likely to belong to the structure of man-made objects, such as aircraft in our
application. The process of generating extended lines is based primarily on a number
of geometric attributes. The most important requirement for line extension is that
both line fragments, Li and Lj in Figure 3.10, must present similar orientations,
which, in mathematical terms, translates into ∠(Li, Lj) < θd, with θd > 0 being an upper angle deviation threshold. Note that the threshold becomes tighter if the two lines are longer. Additional line joining constraints are summarised next, with ℓi, ℓj, gij and ℓij being respectively the lengths of the two line segments Li and Lj, the gap between the two segments, and the distance between the farthest endpoints of the two segments.
1. ℓij < 0.5 min(M, N), where M and N are the image height and width, respectively.

2. α(ℓi + ℓj + gij) < ℓij, where α is a number slightly less than 1.

3. gij < β1(ℓi + ℓj), where 0 < β1 ≤ 1.25, or an additional collinear line segment ℓk is found in the gap.

4. max(ℓi, ℓj) < k min(ℓi, ℓj), where k ≥ 1, or gij < β2(ℓi + ℓj), where 0 < β2 < β1.

[Figure 3.10 appears here: two collinear line segments Li and Lj, of lengths ℓi and ℓj, separated by a gap gij, with ℓij the distance between their farthest endpoints.]
Figure 3.10: Generation of an extended line - gap width, angular deviations and
length differences form the basis to extend the lines. Note that these two lines Li and
Lj are not removed from the line database. They are used later in the line-grouping
and evidence collection processes.

If L stands for the set of all line fragments in the image, then for every pair of lines (Li, Lj) ∈ L², the conditions above are checked to see if their geometric relationship is suitable for extension. The final decision is held until the intensity pattern in the vicinity of the lines is examined. Such an intensity-augmented decision makes the line linking process more robust.
An explanation of all 4 conditions is now given. The first condition is based on
the observation that excessively long lines are usually generated from the fuselage of commercial aircraft, roads, rivers, coastlines, runways, etc. Extending these lines does not bring any benefit to the system, hence they are left unextended. The second condition
ensures that both Li and Lj are aligned so that the length sum of the segments and
gap is slightly larger than or equal to (in case of perfect alignment) the distance
between the two farthest endpoints. The third condition requires that the gap is not
too large relative to the line length sum (ie., β1(ℓi + ℓj)). A smaller gap to length-sum ratio provides a strong indication that the two lines Li and Lj should be joined. If
the gap is too large, then a search for a third line within the gap is initiated. If a
collinear line is found, then all three lines are joined. The fourth condition requires


that no one line should be much longer than the other. The much shorter line could
be clutter and the confidence of joining the two lines is relatively low, and therefore
they are not linked. The only exception to this is if the gap is extremely narrow with
respect to the line length sum.
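As an illustration, the four geometric conditions can be checked with a short routine like the following; the parameter values (alpha, beta1, beta2, k) are placeholders for the unspecified thresholds, a line is assumed to be given by its two endpoints, and the search for a bridging third line is omitted:

import math

def seg_length(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def extension_geometry_ok(Li, Lj, M, N, alpha=0.95, beta1=1.25, k=3.0, beta2=0.2):
    # Li and Lj are ((x1, y1), (x2, y2)) endpoint pairs; M, N are the image dimensions
    li, lj = seg_length(*Li), seg_length(*Lj)
    gij = min(seg_length(p, q) for p in Li for q in Lj)   # gap between closest endpoints
    lij = max(seg_length(p, q) for p in Li for q in Lj)   # farthest endpoint distance
    if lij >= 0.5 * min(M, N):                 # condition 1: excessively long lines are skipped
        return False
    if alpha * (li + lj + gij) >= lij:         # condition 2: segments and gap must be well aligned
        return False
    if gij >= beta1 * (li + lj):               # condition 3: gap not too large
        return False                           # (the bridging third-line search is omitted here)
    if max(li, lj) >= k * min(li, lj) and gij >= beta2 * (li + lj):
        return False                           # condition 4: comparable lengths, or a tiny gap
    return True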
The geometric constraints are followed by an intensity profile check alongside the
line segments. As shown in Figure 3.11, the extended line frequently separates the
region into aircraft body (shaded) and background regions. Therefore, we expect the
intensity averages from Li side (blue windows) and Lj side (green windows) to match
(ie., their mean difference must be less than a threshold of 25 units) on at least one
side of the extended line. In order to deal with accidental failures due to noise pixels,
we repeat this procedure along 4 strips of windows as shown in Figure 3.11. If any
one of the strips returns a good intensity match and the geometric conditions have
been satisfied, then the two lines Li and Lj are extended.
However, if the aircraft is camouflaged and the background is also cluttered, then the
intensity test may fail. In this case, the alignment of Li and Lj needs to be almost
perfect for the intensity check to be ignored and the lines to be joined. On the other
hand, if some of the geometric conditions fail just marginally, then the intensity check
is revisited with a tighter threshold. If the intensity test still returns a good match,
then the lines Li and Lj are allowed to join.
The extended line is then stored in the existing line database, which is now denoted by LE = {Li, 1 ≤ i ≤ NE}, where the parameter NE is the total number of lines. The extended line in the line database will have the collinear slot activated (Figure
3.9). This slot will contain the labels of the two collinear segments Li and Lj . This
will help track which lines are used to extend the given line, when the need arises in
the evidence accumulation stage. Figure 3.12(b) illustrates the outcome of the line
extension process applied to the image of Figure ?? (see the red dotted lines).

[Figure 3.11 appears here: rows of small windows placed on either side of the hypothetical (extended) line, over the Li and Lj segments in the aircraft region and the background; if the first row of windows fails the intensity match, the next row is tried.]

Figure 3.11: Intensity means collected in the vicinity of the line pair. The intensity
information is used to supplement the line extension decision.


Figure 3.12: (a) Line features prior to the line extension process (b) Line extension
and prioritisation outcome - extended lines (red dotted line), significant lines (blue),
and non-significant lines (green).


3.4.3 Line Significance

Line significance is the result of a selection process which favours longer lines over
shorter ones. Longer lines are more significant because they often describe the linear
structure of aircraft, particularly the wings and fuselage. The first step in determining
line significance is to sort all line segments by length and tag the Ns1 longest lines as significant. The next step is to sort all extended lines based on the combination of gap width and collinearity, and tag the Ns2 best lines as significant. We set Ns1 and Ns2 to 95 and 40, respectively. It should be noted that if the image
contains polarised clutter lines (as will be discussed in Section 3.4.5), such lines are
excluded from this line significance ranking process.
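A sketch of this selection is given below, assuming hypothetical line records with length, gap, collinearity and significant fields; the ranking key used for extended lines merely stands in for the gap-width/collinearity combination described above:

def tag_significant(lines, extended_lines, Ns1=95, Ns2=40):
    # lines / extended_lines: records with length, gap, collinearity and significant fields
    for ln in sorted(lines, key=lambda l: l.length, reverse=True)[:Ns1]:
        ln.significant = True                      # the Ns1 longest ordinary lines
    for ln in sorted(extended_lines, key=lambda l: (l.gap, -l.collinearity))[:Ns2]:
        ln.significant = True                      # the Ns2 best extended lines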
This line selection process contributes to a considerable reduction in the number of
multiple line groupings that can be formed in later stages of image analysis. Referring
back to Figure 3.12(b), lines tagged as significant are shown in red and blue. The
red dotted lines correspond to the extended lines. The green lines represent non-significant lines. Figure 3.9 illustrates the attributes of a line and shows a slot reserved
for line significance. The last line slot in Figure 3.9 points to the closest lines in the
immediate vicinity of endpoints.

3.4.4 Line Description

While implementing this aircraft recognition system, it was found that it is beneficial
to establish a mechanism by which lines in the image are differentiated. The reason for
this is that it has been observed that long lines may appear in both aircraft structure
and background clutter. Short lines on the other hand, appear predominantly in the
background of cluttered images.


Table 3.1: Line description.

Line description    Mathematical definition
very long           top 10%
long                top 20%
short               bottom 20% and length < 0.05(M + N)
very short          bottom 10% and length < 0.025(M + N)

Line descriptions like short and long need to be defined in mathematical terms before
they can be used in the recognition process. Such descriptions will have a meaning
if one constructs a line length subdivision between the shortest and longest lines and
associate each description with a given length interval.
In the actual system implementation, extremely long lines (ie., longer than half the
image dimension average) are removed before all lines are sorted in an ascending order
based on length. Table 3.1 provides a mathematical definition of the line description
used in this thesis.
It should be pointed out that the line attribute significant, defined earlier in Section
3.4.3, is also based on line length ordering. The difference, however, is that the line
attribute significant is not given relative to the total number of image lines. Instead
it is fixed at Ns1 + Ns2 lines, regardless of the line count in the image. Having a fixed number of significant lines reduces the variability of the system processing time as a
function of the number of lines in the image.

3.4.5 Polarised Lines and Grid Lines

In Section 3.3, we discussed how clutter is filtered to improve the system performance.
Other clutter types of concern arise from man-made objects, roads, buildings and
grids. An attempt to filter them is likely to remove many desirable edges from the


aircraft structure. However, if the clutter is in the form of short or long line segments
mostly aligned along one or two directions, then it is possible to discriminate such
lines by assigning a unique tag to them. In this thesis, we denote such clutter in one
direction as polarised clutter, and in two approximately orthogonal directions as grid
clutter.
Initially, the orientations of all lines are extracted from the line database (see Figure 3.9) and processed to form an orientation histogram. The histogram contains 10 bins, each of which has a width of 18°. If a direction bin with index i shows a large count, then the line counts of bins i − 1, i, and i + 1 are noted. If the count sum exceeds 70% of the total line count, then all lines oriented along that direction are declared as polarised.
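A sketch of this polarisation test is given below; the 70% fraction follows the text, while the bin handling (wrap-around of orientation at ±90°) and the function name are implementation assumptions:

import numpy as np

def polarised_direction(orientations, frac=0.7):
    # orientations: sequence of line angles in degrees, in the range (-90, 90]
    angles = np.asarray(orientations)
    bins = ((angles + 90.0) // 18).astype(int) % 10   # 10 bins of 18 degrees
    counts = np.bincount(bins, minlength=10)
    i = counts.argmax()
    # sum the peak bin and its two circular neighbours (orientation wraps at 180 degrees)
    peak_sum = counts[i] + counts[(i - 1) % 10] + counts[(i + 1) % 10]
    if peak_sum > frac * counts.sum():
        return -90.0 + 18.0 * i + 9.0                 # centre of the dominant bin
    return None                                       # no polarised clutter detected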
Figure 3.13 shows some examples of polarised clutter. Figure 3.13(a) is the line
plot of the image shown in Figure 1.3(c), where the background generates many
parallel lines along the aircraft fuselage direction, and Figure 3.13(b) shows the line
orientation histogram, which suggests that the image contains polarised clutter. This
usually occurs when the camera tracks the high speed aircraft, causing the background
to appear as parallel lines. These polarised lines are usually perpendicular to the
wing edges, therefore cannot form the wing edges. Consequently, the lines in the
polarisation direction (within a 10° tolerance) are prevented from entering the line
prioritisation process of Section 3.4.3, and are tagged non-significant. This provides
the opportunity for the wing edges that may appear short due to the polarised lines,
to be accepted as significant lines. In Figure 3.13(a), only a small portion of clutter
lines are shown in blue (which represents significant), and the wing edges are shown
in blue or red.
When an aircraft dispenses a flare, the flare trails often form long parallel lines along
the aircraft fuselage direction, as shown in Figure 3.13(c). The histogram in Figure

3.13(d) shows two distinct peaks at 80° and −80°, and the line count sum obtained from the bins around 60°, 80° and −80° exceeds 70% of the total line count. Therefore, the clutter is regarded as being polarised. However, in this particular case, the line counts for the extended and non-extended lines are respectively less than Ns2 and Ns1, which are defined in Section 3.4.3, so all the lines are accepted as significant lines.
Figure 3.13(e) has grid lines in the background. These grid lines result in two distinct peaks roughly 90° apart in the line orientation histogram, as shown in Figure 3.13(f). In this case, it is possible that one of the grid directions is aligned with one of the wing edges. Therefore, the grid lines should not be restrained from becoming significant. We instead lower the thresholds used in the mathematical definition of long lines (refer to Table 3.1) so that the wing edges, which may appear relatively short compared with the rest of the lines, have a better chance of belonging to the long lines, and hence become more likely to survive the line grouping processes. The wing edges in Figure 3.13(e) are all successfully labelled as being significant and long.

[Figure 3.13 appears here: three example images (left column, panels (a), (c), (e)) and their line orientation histograms (right column, panels (b), (d), (f)).]

Figure 3.13: Histograms of the line orientations are shown in the right column. The images in the left column show clutter lines that are predominantly oriented along one or two directions.

3.4.6 Endpoint Proximity Line Linking

Object recognition systems often make use of perceptual grouping techniques to form
more complex line features, using proximity, parallelism and co-termination properties
[62]. In our system, we use the co-termination property to form long line chains, which
may potentially represent sections of the aircraft silhouette. Two lines are related by
the co-termination property if the distance separating their closest endpoints is below
a preset threshold, as shown in Figure 3.14(a). These lines are linked by activating
the connected to slot in Figure 3.9. For example, a line Lj is linked to Li by placing
the index of Lj in the connected to slot of Li (and vice-versa). Furthermore, the angle subtended by the two lines, θij, and the endpoints through which the link was established are also recorded, as shown in Figure 3.14(a).

[Figure 3.14 appears here: (a) two lines Li and Lj connected via the endpoint proximity property (proximity upper bound of 8 pixels), with the subtended angle θij and the linking endpoints recorded in the connected_to slot; (b) a depth-first recursive search through connected_to links, tracing via Is_connected(L7, L60) whether line L7 reaches line L60 through a chain of intermediate lines.]

Figure 3.14: Forming a line link based on the endpoint proximity property is shown in (a), and a recursive line search to check if two lines are linked via a line chain is shown in (b).
Having these links all established, checking if one line is connected to another distant
line is a simple matter of initiating a recursive search algorithm. This is illustrated in
Figure 3.14(b), where the connection between L7 and L60 was checked by implementing a depth-first recursive search. The search sequence is shown on the right side of
Figure 3.14(b). This search algorithm is used later in the fuselage finding stage, where
the connection from the aircraft nose edge to the wing via the fuselage boundary edges is traced. In this work, it was found practical to limit the search depth to a maximum of 7 levels, and to use a line proximity upper bound of 8 pixels.
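The recursive search can be sketched as follows, assuming the connected to links are available as a dictionary from line index to neighbouring line indices (a simplification of the slot structure in Figure 3.9):

def is_connected(connected_to, start, target, max_depth=7, _depth=0, _visited=None):
    # depth-limited, depth-first search through co-terminating line links
    if _visited is None:
        _visited = set()
    if start == target:
        return True
    if _depth >= max_depth:
        return False
    _visited.add(start)
    for nxt in connected_to.get(start, ()):
        if nxt not in _visited and is_connected(connected_to, nxt, target,
                                                max_depth, _depth + 1, _visited):
            return True
    return False

# Example: is_connected(links, 7, 60) mirrors the Is_connected(L7, L60) call traced in Figure 3.14(b).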

3.5 Two-Line Grouping

After experimenting with a large number of aircraft images, we observed that our
lower level image processing (ie, edge detection, contour extraction, straight line
extraction and extension) is effective for extracting the wing leading edges and, to
a lesser degree, the wing trailing edges. The wing tips are often harder to detect
because they are usually short, and for fighter jets, the wingtips are often loaded with
missiles. Given these observations, it was decided that a wing is best represented by
a line pair (or two-line grouping) instead of a trapezoidal shape as used in [32, 33],
which implicitly requires finding the wing tip edge or fuselage-wing border (refer to
Figure 2.9 in Chapter 2). Furthermore, a two-line grouping is also very useful for
describing a nose as a wedge-like shape.
The aircraft nose shape in the image is usually curved, which initially suggested the
use of corner detection techniques [26, 78, 85, 105, 125] to find the corner points.
Figure 3.15 illustrates large variations in the nose shape and intensity distributions.
Most template based corner detection methods, when applied to such images [105],
may detect the corners from the noses, but will also generate a large number of

79

undesirable corners elsewhere because such template cannot discriminate the nose
corner from other corners. Furthermore, the detected corners would not provide any
information about the nose boundaries.
Another common corner detection method is contour-based, where the curvature is
computed along the contour, and local maximal curvature points are noted as a
potential nose tip [26]. This approach is based on the assumption that the nose
boundary is successfully extracted as a continuous contour, approximately parabolic
in shape. However due to image degradation and shading, such a parabolic contour
may be broken into two or more disjoint segments, making the curvature computation
at the true nose tip difficult to implement.
Knowing that line features representing the aircraft boundaries are already available,
and that two-line groupings will be performed to detect potential wings, it would be
more appropriate to treat a nose as a two-line grouping and include the nose detection
in the two-line grouping process.
To generate two line groupings that potentially represent a wing or nose, a number of
constraints derived from possible image projections of a wing or nose are formulated.
In the next two subsections, we introduce the rules governing the wing and nose
formation processes that are applied to every pair of lines from the line database.

3.5.1 Detection of Wing Candidates

Given a pair of lines labelled Li and Lj , we define parameters ti and tj , which indicate
in relative terms how far the lines are from their intersection point, denoted as C (see
Figure 3.16). The following set of constraints is used to detect wing candidates.

1. the two lines are labelled as significant and at least one line is long (Table 3.1), or they are connected via one or two co-terminating lines (Figure 3.17(a)).

2. the intersection point C in Figure 3.16 must satisfy ti < τ1, tj < τ1 and ti + tj < τ2, where τ1 and τ2 are thresholds.

3. the separation between the two lines must not be too large (ie., the parameter apart in Figure 3.16 must be less than a preset threshold).

4. the two lines must overlap when rotated about C, as shown in Figure 3.17(d). This is conditionally relaxed to accommodate severely occluded wings.

5. min(li, lj)/max(li, lj) > ρ, where ρ is a threshold value (Figure 3.17(e)).

6. the coordinate of the mirror image of C must be within the image (Figure 3.17(g)).

7. the line angular deviation should satisfy 6° < θC < 90° (refer to Figure 3.16 for the definition of θC).

8. the region enclosed by both line segments must not show excessive intensity variation.

Please see print copy for Figure 3.15

Figure 3.15: A wide variety of nose shapes and intensities.

[Figure 3.16 appears here: the two-line grouping geometry, showing the lines Li and Lj, their intersection point C, the wing angle θC, the parameters ti and tj, the separation apart, and the intensity strips used for the region check.]

Figure 3.16: Two-line grouping process.

The first condition requires that potential wing edges must be long or they must be
connected via a third line (potential wingtip). This is achieved by the line connection
search as discussed in Section 3.4.6. For the second condition, the threshold τ2 is made linearly proportional to cos(θC) in Figure 3.16, allowing the intersection point C to be far from the lines if θC is small. The sixth condition removes any two-line grouping in the vicinity of the image borderline whose opening faces the image border. Such a two-line grouping cannot possibly form a wing-pair inside the image. The seventh condition sets limits for the wing angle θC. Provided that the viewpoint is not very oblique, the wing angle θC has been found to be less than 90° for most aircraft. The last condition examines the image intensity distribution in the
region delimited by the line pair. This condition rejects regions with widely varying
texture and favours uniformly distributed regions. In practice however, a wing may

display some texture, camouflage or shadowed subregions, and therefore care must be taken not to discard such wings. As shown in Figure 3.16, the intensity values along 3 strips are collected and differentiated to generate the gradient profiles. The gradient data are processed to form a gradient histogram, as shown in Figure 3.18. A region of uniform intensity will generate gradients with zero values, resulting in a sharp peak at the zero gradient. Camouflage regions, which contain different but uniform intensities, will also generate a strong peak at gradient level zero, along with a small number of minor peaks associated with intensity jumps at the camouflage boundaries. The gradient histogram for cluttered regions, however, is spread out, as shown in Figure 3.18. By normalising the area under the gradient distribution, and comparing the (10%, 20%, 30%) percentiles with pre-defined gradient thresholds, one is able to heuristically distinguish between clutter and non-clutter regions.

[Figure 3.17 appears here: examples of two-line groupings - (a) two short lines connected by a third line via the recursive connected_to search (accept); (b) intersection point too far from the nearest endpoints (reject); (c) lines too far from each other (reject); (d) rotational overlap about C (accept); (e) excessive length difference (reject); (f) angle between the lines too large (reject); (g) grouping too close to the image border and facing it, so the other wing would fall outside the image (reject).]

Figure 3.17: Wing candidate detection conditions - examples of accepted cases (a) and (d), and commonly arising failed cases (shown in red lines).

[Figure 3.18 appears here: gradient distribution within the region enclosed by a two-line grouping - a uniform intensity region, or a few subregions of different but uniform intensities (eg. a camouflaged or partly-shadowed wing), concentrates the counts near zero gradient with small peaks at the subregion boundaries, whereas a dense clutter region spreads the counts across the gradient range.]

Figure 3.18: Gradient distribution curve for the region enclosed by a two-line grouping. To pass the intensity check, the 10%, 20%, 30% percentiles must be less than preset thresholds (ie., the majority of the population must be in the left corner).
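A sketch of this percentile test is shown below; the actual gradient thresholds are not quoted in the text, so the values used here are illustrative assumptions:

import numpy as np

def region_is_uniform(gradients, thresholds=((10, 2), (20, 4), (30, 8))):
    # gradients: intensity differences collected along the three strips of Figure 3.16
    g = np.abs(np.asarray(gradients, dtype=float))
    # accept only if most of the gradient distribution sits near zero
    return all(np.percentile(g, p) < t for p, t in thresholds)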

[Figure 3.19 appears here: a typical nose configuration - two nose edges Li and Lj meeting near the nose tip C with nose angle θN; the longer and shorter leg edges are denoted LL and LS, with lengths lL = ||LL|| and lS = ||LS||, the gap between their closest endpoints is gij, and tL = dL/lL, tS = dS/lS measure how far each edge lies from the intersection point.]
Figure 3.19: A typical nose configuration.

3.5.2 Detection of Nose Candidates

A similar approach is used to set the conditions for extracting two-line groupings for
the nose. Given two lines Li and Lj , the longer and shorter lines are assigned tags
LL and LS respectively. Initially, a set of conditions which portray a typical nose
configuration shown in Figure 3.19, is presented below.

1. the nose edges must not be excessively long, ie., max(||Li||, ||Lj||) < lth1, where lth1 depends on the image resolution.

2. the intersection point C in Figure 3.16 must satisfy tL < τL, tS < τS, where τL < τS (refer to Figure 3.19).

3. the gap between their closest endpoints (ie., gij in Figure 3.19) must not exceed a preset threshold gth, where gth is proportional to (||Li|| + ||Lj||).

4. the two lines must overlap when rotated about C, as shown in Figure 3.17(d).

5. the length ratio must be less than a preset threshold (ie., ||LL||/||LS|| < lth2) (see Figure 3.20(g) for a contradicting case).

6. the lines must not be too close to the image borders (see Figure 3.21).

7. θmin < θN < θmax (see Figure 3.20(a) and (l)), where θmin and θmax are set to be inversely proportional to (||Li|| + ||Lj||).

[Figure 3.20 appears here: nose configurations subject to further verification - (a) nose angle too large or gap too wide; (b) no supporting third line (reject); (c) a proximal third line correctly oriented (accept); (d), (e) supporting line incorrectly oriented (reject); (f) the nose edges connected by a line chain (accept); (g) excessive length difference; (h) no supporting line (reject); (i) the correct edge linked to a supporting line (accept); (j), (k) supporting edge incorrectly oriented (reject); (l) nose angle too small; (m) no dark region between the edges (reject); (n) a shadow region detected between the edges (accept).]

Figure 3.20: Incorrect nose configurations in (a), (g), (l) are subject to further verification. Resulting accepted and rejected configurations are shown in blue and red, respectively.

[Figure 3.21 appears here: a candidate nose too close to the image border lines and facing them is rejected.]

Figure 3.21: Any nose candidate in close proximity to the image borderlines, which is oriented in such a way that a large portion of its projected silhouette is placed outside the image borderlines, is rejected.

[Figure 3.22 appears here: estimation of the nose tip location C.]

Figure 3.22: Location of the nose tip. If the nose tip is not visible, then its location is estimated as the midpoint between the nose edges' intersection point and the midpoint of the nose edges' inner endpoints.

Please see print copy for Figure 3.23

Figure 3.23: Multiple two-line grouping configurations generated from a single physical nose.
Condition 1 sets an upper limit for the nose edge lengths. Nose edges are unlikely to
appear very long relative to the image size. Therefore, an upper limit is given as a
function of the image size. Condition 2 sets a limit on how far the lines can extend
before they intersect. The thresholds τS and τL are bounded within 1.5 - 2.6. Condition
3 limits how far the lines can be separated from each other. The gap gij needs to be


small when compared with the line pair. Condition 4 states that unless the nose is
occluded, the nose boundary lines must overlap when rotated about the nose intersect
point, C. Condition 5 is required so that any line pair coincidentally formed by a long
line with a short clutter segment could be rejected. Condition 6 necessitates that if
the line pair is located in the vicinity of the image border and is orientated in such a
way that a large portion of the hypothetical aircraft silhouette falls outside the image
(see Figure 3.21), then the line pair cannot be a potential nose. The last condition
specifies the range for the nose angle, θN. The upper limit decreases linearly with the increasing mean of the two line lengths. If all of the conditions are satisfied
then the two-line grouping is accepted as a potential nose. However, if some of the
conditions fail just marginally, then supplementary evidence is searched for.
The nose contour is usually curve shaped and therefore approximated by more than
two line segments. This in turn leads to multiple combinations of line pairings as
shown in Figure 3.23, and some of these may not satisfy all of the above constraints.
Therefore, if a line pair fails one of the constraints (see Figure 3.20(a), (g), (l)), then
further validations follow, checking for any supportive connected lines and a shaded
region. The validation procedure is summarised below, with reference to Figure 3.23.

1. As shown in Figure 3.20(a), if the lines are short and their nose angle θN is large, they are usually considered as clutter. However, if at least one of the lines
is connected to a third line (by checking its connected to slot), and the three
lines approximate a parabolic shape as shown in Figure 3.20(c), then the line
pair is accepted as a potential nose. If not (as shown in Figure 3.20(b), (d),
(e)), then the line pair is rejected.
2. If the gap is wide (see Figure 3.20(a)), then the gap is searched for line(s)
bridging the gap. If the recursive search for a connected line chain from one

89

edge Li leads to Lj, as shown in Figure 3.20(f), then the line pair is accepted as
a potential nose.
3. If one line is long and the other is much shorter as shown in Figure 3.20(g), then
the shorter line is checked for any connected line forming a parabolic shape as
shown in Figure 3.20(i). If such a third line is found, then the pair is accepted
as a nose candidate. Otherwise (see Figure 3.20(h), (i), (j)), the line pair is
rejected.
4. If the nose angle θN is less than θmin and the gap, gij, is small (Figure 3.20(l)), then the intensity between the lines is noted. Only if the region is dark is it regarded as a shadow section of the shaded nose cone, and the line pair is
accepted as a potential nose (see Figure 3.20(n)).
Since the nose tip often appears rounded, if the nose angle is small, then the intersection of the two nose edges can occur at a much greater distance from the true nose tip (see Figure 3.23(b), (c), (f)). Therefore, when a nose is formed, its corner location is assigned as the midpoint of the line joining the intersection point and the midpoint of the two inner endpoints of the nose legs, as shown in Figure 3.22.

3.5.3 Two-line Grouping Organisation

Once all geometric and intensity constraints are satisfied, the line pair candidate is
entered in a dedicated database along with a number of geometric and image intensity
attributes (see Figure 3.24). As an example, the angle between the line pair is recorded
in the angle slot in Figure 3.24. The length sum of the two lines (legs) is assigned
to the weight slot. The image in Figure ??(a) is used to show the outcomes of the
nose and wing processing steps. Figure 3.25 shows all the nose and wing candidates
generated from that image.


WING/NOSE
Wing/Nose No: #
Leg1: #
Leg2: #
Intersection point (corner): (#,#)
Angle: #
Weight: #
Average Intensity Level: #
Distance Between midpoints(apart): #
Minimum Gap: #

Figure 3.24: Wing/Nose Representation. Leg1 and Leg2 are the two lines forming
the two-line grouping. Note that the symbol # refers to a number.

Please see print copy for Figure 3.25

Figure 3.25: Resulting wing and nose candidates from the two-line grouping process
on the image of Figure 3.4(a). In (a), line pairs are shown in blue, and red lines are
used to show which two lines are paired. [(b) 80 nose candidates and (c) 513 wing
candidates].


3.6 Four-Line Grouping

Four-line groupings are a higher level data abstraction developed for the purpose
of representing the wing-pair of an aircraft. The wing-pair is the most prominent
feature of an aircraft and forms the starting point for generating aircraft hypotheses
in the image. Given that one wing is represented as a two-line grouping, it naturally
follows that a wing-pair is represented as a four-line grouping as shown in Figure
3.26(a). Any four-line grouping may be oriented arbitrarily in the image. Of all
possible orientation configurations, only those that result in the wing patterns of
Figure 3.26(c) are representative of real aircraft wings.
To extract a reduced number of meaningful groupings, every pair of two-line groupings, wing(i) and wing(j), must satisfy a number of geometric constraints as given
below.

1. The two wings must have compatible sizes, ie., 0.5 < wing(i).weight/wing(j).weight < 2 (refer to Figure 3.24). Recall that the wing weight is the leg length sum of the wing.

2. The two wings must have comparable wing angles, ie., |θLC − θRC| < θd, where θd is an angle threshold.

3. The wing span must not be too small, ie., ||LC − RC|| > Lth, where Lth is derived from the line length statistics.

4. Any two non-collinear edges (one from each wing) must not cross internally.

5. The wings must face each other (refer to Figure 3.26(b) for examples of unacceptable wing arrangements).

6. The angles θF and θR in Figure 3.26(a) must not exceed a preset threshold.

7. The two wings must comply with the skewed symmetry property [52, 53]. In other words, the point M in Figure 3.26(a) is roughly the midpoint of LC and RC.

The fifth test examines the relative orientation of the two wing candidates. An acceptable wing-pair falls into one of three wing configurations as depicted in Figure
3.26(c), namely diamond, boomerang and triangular. Figure 3.26(b) illustrates typical configurations of rejected groupings, which take up a large proportion of the
cluster set. Recognition of the leading and trailing edges is based on the comparison of θF and θR as shown in Figure 3.26(a). The lines associated with the smaller angle are labelled as the leading edges, and the ones with the larger angle are labelled as the trailing edges. The last condition checks the symmetry property of the wing-pair about its symmetry axis. Symmetry is a powerful grouping mechanism, and has been addressed in numerous computer vision works [23, 30, 31, 38, 80]. As shown in Figure 3.27, the three points made up of the two wing intersection points (FP and RP) and M, the midpoint of LC and RC, preserve the collinearity property after weak perspective projection. The collinearity property holds exactly if the wings are perfectly coplanar. In practice, however, some errors are introduced because of possible distortions caused by the imaging process and lower-level processing imperfections in locating the wing edges. Furthermore, the wings are only approximately coplanar for most aircraft. The error becomes larger if θR approaches 180°, or if the wing angles at LC and RC are small, as the location uncertainties of RP, LC and RC can grow very large. Hence,
for triangular wings, the symmetry test is replaced by another condition: that the
line joining FP and M must not cross the two trailing edges.
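The skewed-symmetry test (condition 7) can be sketched as a collinearity check of FP, M and RP; the tolerance, expressed here as a fraction of the wing span, is an assumption, as are the function name and point representation:

import math

def passes_symmetry(FP, RP, LC, RC, tol_ratio=0.1):
    # M, the midpoint of LC and RC, should lie close to the line joining FP and RP
    M = ((LC[0] + RC[0]) / 2.0, (LC[1] + RC[1]) / 2.0)
    ax, ay = RP[0] - FP[0], RP[1] - FP[1]          # symmetry axis direction
    bx, by = M[0] - FP[0], M[1] - FP[1]
    axis_len = math.hypot(ax, ay)
    span = math.hypot(LC[0] - RC[0], LC[1] - RC[1])
    if axis_len == 0 or span == 0:
        return False
    dist = abs(ax * by - ay * bx) / axis_len       # perpendicular distance of M from FP-RP
    return dist < tol_ratio * span                 # tolerance relative to the wing span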
A wing pair satisfying these constraints is compiled into one of three wing categories,
namely triangle, diamond and boomerang (refer to Figure 3.26(c)). The data structure
is shown in Figure 3.28. The weight slot in the wing pair representation contains the

sum of the four line lengths, and is used to prioritise four-line groupings in terms of size. For each wing-pair category, if the number of four-line groupings exceeds 100, then only the top 100 with the largest weights are selected for further processing. It should be noted that one aircraft may generate multiple four-line groupings formed by different combinations of line fragments, as shown in Figure 3.29. Accepting all line groupings and letting them compete in the later stages improves the system robustness. Figure 3.30 shows all outcomes of the four-line grouping process.

[Figure 3.26 appears here: (a) the wing-pair parameters - the forward point FP, rear point RP, left and right wing corners LC and RC, the midpoint M of LC and RC, the angles θF and θR, and the wing's symmetry axis; (b) examples of unacceptable configurations, including internally cutting edges; (c) the diamond, triangle and boomerang wing-pair shapes.]

Figure 3.26: Formation of four-line groupings (wing-pair candidates). Commonly encountered failed configurations are shown as red lines in (b).

[Figure 3.27 appears here: the line joining FP and RP passes through M, the midpoint of the wing intersection points LC and RC.]

Figure 3.27: Three point collinearity property both in space and in the image.

3.7

Generation of Aircraft Hypothesis

The generation of an aircraft hypothesis calls for the consistent association of a wingpair candidate with a matching nose. As indicated in Table 1.1, a matching nose
must be aligned with the fuselage axis and be facing the wing-pair. The fuselage axis,

95

FOUR-LINE GROUPING
Wing-pair Index: #
Type: Boomerang/Diamond/Triangle
Left Wing Index: #
Right Wing Index: #
Four Edge Indices: [#, #, #, #]
Corner Coordinates: [FP(#,#) RP(#,#) LC(#,#) RC(#,#)]
Weight: # (four line length sum)
Wing Span: # (distance separating LC and RC)

Figure 3.28: Four-line grouping representation. The two slots right and left wing
hold the wing numbers which form the wing-pair. Note that the symbol # refers to a
number.
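A minimal Python sketch of this record is given below; the field names simply mirror the slots of Figure 3.28 and are not the identifiers used in the actual implementation:

    from dataclasses import dataclass
    from typing import Dict, List, Tuple

    Point = Tuple[float, float]

    @dataclass
    class FourLineGrouping:
        # Wing-pair candidate record mirroring the slots of Figure 3.28.
        wingpair_index: int
        wing_type: str                  # 'boomerang', 'diamond' or 'triangle'
        left_wing_index: int
        right_wing_index: int
        edge_indices: List[int]         # indices of the four constituent lines
        corners: Dict[str, Point]       # {'FP': (x, y), 'RP': ..., 'LC': ..., 'RC': ...}
        weight: float                   # sum of the four line lengths
        wing_span: float                # distance separating LC and RC

For each wing-pair category, prioritisation then amounts to sorting such records by their weight slot and keeping the 100 largest.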

Figure 3.29: Extraction of multiple wing-pairs due to wing edge fragmentation. This figure shows three possible boomerang wing-pairs arising from one wing-pair, one of whose edges contains two segments L1 and L2 (a longer edge is generated by extending L1 and L2).


Please see print copy for Figure 3.30

Figure 3.30: Resulting wingpair candidates from the four-line grouping process on
the image in Figure 3.4(a). The blue lines are constituent lines of four line groupings.
Red and green lines are introduced to show how the blue lines are grouped together.
[(b) triangle wing candidates, (c) diamond wing candidates, and (d) boomerang wing
candidates].


at this stage, is defined as the axis going through the forward point, F P , and rear
point RP , as shown in Fig. 3.26(a). For a triangular wing shape, RP is the midpoint
M of the left and right wing intersection points.
A successful nose-wing association requires several geometric conditions to be met
that are consistent with the generic viewpoint (ie., wings are visible). These conditions
are based on a number of heuristics deduced from the structure of a large number of
aircraft, imaged under different viewpoints. These conditions are listed below (refer
to Figure 3.31).

1. The lengths of the nose legs must not exceed the distance between the nose tip,
C, and intersection point of the wing leading edges, FP.
2. The nose tip, C, must be located within the nose search region as shown in Figure 3.31(a). The size of the nose search region, which is defined later, depends
on the wing-pair size and shape.
3. The nose must face the wing-pair. The nose angular bisector must approximately line up with the line joining C and FP (ie., the deviation angle θd in Figure 3.31(b) should
be small).
4. The line joining C and MID (middle point of M and RP) must pass through the
gap between the wing-pair, without touching any one of the wing edges (Figure
3.31(c)).
5. The line joining C and MID must belong to the sector delimited by the nose
legs. In the actual system implementation, some tolerance is introduced by
slightly widening the nose sector.
6. The line joining C and FP must not be near parallel with any of the wing
leading edges (ie., min(θL, θR) > th2 in Figure 3.31(b)).


7. The projection of the nose tip onto the line joining LC and RC must fall within
the wing span (ie., Wp < W ) as shown in Figure 3.31(c).

After examining and processing numerous aircraft images, it was determined that the
nose search region is located along the fuselage axis at a distance ranging between
‖FP − RP‖/2 and 3‖FP − RP‖ forward of FP. The nose search region is shown in
red in Figure 3.31(a). The quantity ‖FP − RP‖ is the distance separating the wing
forward and rearward points. The lateral angular extent of the search region (θs) is
determined to be no larger than 30° from the wing symmetry axis. Requirements 6
and 7 are imposed to eliminate nose-wingpair associations with pronounced skewness
(ie., the nose is considerably tilted to one side of the wingpair).
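A minimal sketch of requirement 2 (the nose search region test) is given below; the function name is illustrative, while the ‖FP − RP‖-based distance bounds and the 30° lateral limit come from the description above:

    import numpy as np

    def nose_in_search_region(c, fp, rp, max_lateral_deg=30.0):
        # Requirement 2: the nose tip C must lie between 0.5*||FP-RP|| and
        # 3*||FP-RP|| forward of FP along the wing symmetry axis, and within
        # +/- max_lateral_deg of that axis.
        c, fp, rp = (np.asarray(p, dtype=float) for p in (c, fp, rp))
        axis = fp - rp                                   # forward direction RP -> FP
        axis_len = np.linalg.norm(axis)
        if axis_len == 0:
            return False
        axis = axis / axis_len
        v = c - fp                                       # from FP to the nose tip
        forward = float(np.dot(v, axis))                 # distance along the axis
        lateral = abs(v[0] * axis[1] - v[1] * axis[0])   # perpendicular distance
        if not (0.5 * axis_len <= forward <= 3.0 * axis_len):
            return False
        return np.degrees(np.arctan2(lateral, forward)) <= max_lateral_deg

    print(nose_in_search_region(c=(0, 60), fp=(0, 0), rp=(0, -40)))   # True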
If a wing-pair and a matching nose satisfy these requirements, then their association
is accepted as an aircraft candidate, and an aircraft hypothesis is generated. Often
more than one nose may be successfully associated with a wing-pair, particularly
in cluttered images. A wing-pair candidate may be associated with a nose arising
accidentally at the cockpit. Also, there can be multiple line-pairs associated with the
same nose part (as shown in Figure 3.23(a)), or one side of the nose is shaded giving
rise to three legs. If a wing-pair is matched to more than one nose candidate, it is
necessary to prioritise them based on how well each nose is aligned with the wing's
symmetry axis (ie., how small θdev in Figure 3.31(c) is).
The nose candidates for the currently considered wing-pair enter a fuselage test. This
test checks for the existence of lines filling the space between the nose and the wingpair. These lines are called fuselage lines since they usually emerge from the fuselage
structure, although some often emerge from the cockpit. The fuselage test is presented
in detail in Section 4.1.1.

Figure 3.31: Nose to wing-pair matching: (a) nose search region, (b) alignment of the nose bisector with the wing-pair bisector, and (c) projection of the nose tip onto the wing span (Wp < W). The nose must be within the search region, must be facing the wing-pair, and the skewness must not be severe.

3.8  Neural Networks for Extracting Line-Groupings and Aircraft Hypotheses

As shown in Sections 3.5-3.7, formation of the line groupings (ie., wings, noses, wingpairs, and aircraft hypotheses) requires sequential applications of constraints, often in
the form of hard thresholds. This raises a concern that violating one threshold may
result in a failure to detect the aircraft. To mitigate this concern, the thresholds were
relaxed at the lower levels and gradually tightened at the higher levels. Furthermore, some rules
were made flexible so that when one of the conditions fails marginally, the candidate
line-grouping is given additional validation checks for a second chance to survive.
Nonetheless, it would still be preferable to defer the decision making until after all
the parameters are considered. Another drawback of the rule based approach is that
the thresholds need to be manually adjusted, making the parameter tuning process
time-consuming and tedious.
We note that some of the line-grouping formation rules in Sections 3.5-3.7 are descriptive. Actual coding of such rules involves numerous parameters that are often
correlated or need to be constrained. We will name these parameters feature parameters. A collection of N feature parameters forms an N-by-1 vector that maps to a
point in the N -dimensional parameter space. A large number of the feature parameters generated from the training images will form clusters in the parameter space.
The surface of the clusters approximates the decision boundaries. The feature parameters collected from the non-aircraft images will be randomly distributed in the
parameter space.
Assuming a 2-D parameter space, Figure 3.32(a) illustrates the rectangular decision
boundaries of the rule-based approach with fixed thresholds. Such simple boundaries
usually let too many clutter features pass through (ie., under-fitting). Neural network



Figure 3.32: In the feature parameter space (2-D for illustrative purpose) the blue
circles represent aircraft feature parameters and the red squares represent clutter
feature parameters. (a) Use of single thresholds forms simple decision boundaries
that pass many clutter features, and (b) the neural networks can generate complex
shaped decision boundaries.
based approaches may provide a better approximation of the decision boundaries as
shown in Figure 3.32(b).

3.8.1  Configuration of the Neural Networks

Input to the neural networks consists of the feature parameters that are associated with the
rules in Sections 3.5 - 3.7. The descriptions of the input parameters will not be
presented here; however, the full listing of the input parameters and their references
to the rules and figures in Sections 3.5 - 3.7 are included in Appendix A.
We use feed-forward neural networks as they are the most popular and widely used in
the area of classification. The feed-forward neural network begins with an input layer,
which is connected to a hidden layer. This hidden layer can be connected to another
hidden layer or directly to the output layer. It is very rare for a neural network to
need more than two hidden layers [54].


Figure 3.33: Plot of the log-sigmoid transfer function.

The purpose of the proposed neural network is to indicate whether or not the input
parameters belong to the aircraft features or non-aircraft features. The output of the
network should approximate 1 for aircraft features and 0 for non-aircraft features.
Hence, only one neuron is used in the output layer. The neurons in the hidden
layer and output layer have a logistic sigmoid (log-sigmoid) transfer function (ie.,
1/(1 + e^(-n))), which is shown in Figure 3.33. The log-sigmoid transfer function was
selected because it is well suited to the output range [0, 1].
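A minimal NumPy sketch of the forward pass of such a network is shown below; the random weights are placeholders, and the 7-5-1 wing configuration adopted later in Table 3.2 is used purely as an example:

    import numpy as np

    def logsig(n):
        # Log-sigmoid transfer function, 1 / (1 + exp(-n)).
        return 1.0 / (1.0 + np.exp(-n))

    def forward(x, w1, b1, w2, b2):
        # One hidden layer; both hidden and output neurons use the log-sigmoid.
        hidden = logsig(w1 @ x + b1)
        return logsig(w2 @ hidden + b2)        # scalar output in (0, 1)

    rng = np.random.default_rng(0)
    w1, b1 = rng.normal(size=(5, 7)), rng.normal(size=5)   # 7 inputs -> 5 hidden
    w2, b2 = rng.normal(size=(1, 5)), rng.normal(size=1)   # 5 hidden -> 1 output
    x = rng.normal(size=7)                                 # one wing feature vector
    print(forward(x, w1, b1, w2, b2).item())   # value in (0, 1); after training,
                                               # values near 1 indicate an aircraft feature

In practice the weights are obtained by training (eg., with the Levenberg-Marquardt algorithm discussed next), not drawn at random.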
The best known example of a neural network training algorithm is back-propagation
[50, 99]. In back-propagation, the gradient vector of the error surface is calculated.
This vector points along the line of the steepest descent from the current point,
hence moving a short distance along this line will decrease the error. A sequence
of such moves leads to a local minimum. Even though this is the easiest algorithm
to understand, this is often too slow for practical problems. Instead, we resort to
the Levenberg-Marquardt algorithm [8], which is typically one of the fastest training algorithms.


The next step is to determine the number of neurons in the hidden layer(s) that will produce good results without over-fitting. Over-fitting occurs when the neural network
becomes so complex that it may actually fit the noise, not just the signal. Instead
of learning, the network memorises the training set hence producing unpredictable
results when new cases are submitted to it. There is no quantifiable best answer to
the layout of the network for any particular application. There are only general rules
that have been practised by researchers and engineers. Some of them are summarised
below.
The number of hidden neurons should be in the range between the size of the
input layer and the size of the output layer.
The number of hidden neurons should be 2/3 of the input layer size, plus the
size of the output layer.
The number of hidden neurons should be less than Ntrain/(K × (Ninput + Noutput)),
where Ntrain is the number of cases in the training data, K is a scaling factor
ranging between 5 and 10, and Ninput and Noutput are respectively the number
of neurons in the input and output layers.
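As a rough illustration of the third rule (using the data-set sizes quoted later in this section, and intended only as an indicative bound): with about 1,800 training cases, 7 inputs, 1 output and K between 5 and 10, the rule suggests fewer than roughly 1800/(10 × 8) ≈ 22 to 1800/(5 × 8) = 45 hidden neurons, which is comfortably above the configurations eventually adopted in Table 3.2.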
Even though the above rules may give a good starting point, the selection of the
network configuration really comes down to trial and error. The network design was
carried out in the following manner:
The initial configuration was set as one hidden layer with the number of hidden
neurons set to half the sum of the input and output layer sizes.
Each configuration was trained several times, retaining the network producing
the smallest error rate. Several training trials were required for each configuration to avoid being fooled if training locates a local minimum. With the


Table 3.2: The neural network configurations and the mean error rates in detection
of wings, noses, wingpairs and aircraft hypotheses.

Features    no. of inputs    network configuration    mean error rate
Wing        7                7-5-1                    4%
Nose        10               10-4-2-1                 3%
Wingpair    17               17-6-1                   4%
Aircraft    11               11-6-1                   1%

best network (ie., optimum weights), this process is repeated by resampling the
experimental data; five-fold cross validation is used to generalise the error rate.
If the performance level is not met (due to under-fitting), then more neurons
are added to the hidden layer. If that does not help, then an extra hidden layer
is added.
If over-fitting occurs, hidden neurons are gradually removed.

The network output threshold is set to 0.5 for the evaluation of the mean error rate.
The term error rate is defined as the total count of misses and false alarms (in %).
Five-fold cross validation is used so that the error rates are generalised (ie., the error
rates remain consistent when new sets of data are presented to the network).
The experimental data for each of the four neural networks contains 300 cases of
aircraft features (eg., wing, nose, wingpair and wingpair-nose) and 1500 cases of non-aircraft features. The network dimensions that result in the smallest mean error rates
are chosen for the system (see Table 3.2). When two configurations give almost the same
error rates, the one with the smaller number of hidden neurons is chosen as it will be
less prone to over-fitting.
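A minimal sketch of the five-fold estimate is given below; train_and_eval is a placeholder for training one candidate configuration and returning its error rate (misses plus false alarms, in %) on the held-out fold:

    import numpy as np

    def five_fold_error(features, labels, train_and_eval, k=5, seed=0):
        # Estimate a generalised error rate by k-fold cross validation.
        idx = np.random.default_rng(seed).permutation(len(labels))
        folds = np.array_split(idx, k)
        errors = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            errors.append(train_and_eval(features[train], labels[train],
                                         features[test], labels[test]))
        return float(np.mean(errors))     # mean error rate over the k folds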
Note that the number of the hidden layers for the nose features increased to 2. Unlike


the wings, the nose sections are curved, non-planar and often shaded. Hence, discriminating the nose features is usually more difficult, and may require a more complex
configuration.

3.8.2  Analysis of the Neural Networks

Figure 3.34 shows the receiver operation characteristic (ROC) curves [22], obtained
from the experimental data set. It is shown that high detection rates (eg., > 98%)
were achievable for small values of false detection rate. Achieving a very high detection
rate (or very low miss rate) is critical in this stage because failing to detect a wing
or nose leads to a MISS at the system output. The ROC curves indicate that the
neural networks may be a feasible option.
The next step is to examine whether the proposed neural networks can remove the spurious
features that the rule-based approach could not remove previously. To test this, a
number of non-aircraft images are fed to the aircraft feature extraction rules (in Sections 3.5 - 3.7), and the surviving line groupings are collected. The feature parameters
generated from these line groupings are fed to the networks. The experimental results
showed that while maintaining a high detection rate above 97%, 30-40% of those spurious features could be removed (see the correct rejection rate in Table 3.3), indicating
a possible improvement in terms of a reduced false alarm rate.
The neural networks are integrated into the system and tested on real aircraft images.
The effect of the neural networks on the overall system performance will be presented
in Chapter 6.


Figure 3.34: ROC curves for detection of (a) wings, (b) noses, (c) wing-pairs and (d) aircraft hypotheses (detection rate versus miss rate, both in percentage).


Table 3.3: Test of the neural networks on the spurious features that survived the
rule-based approach. As shown in the third column, 30-40% of those features are
successfully rejected by the neural networks.

Features    detection rate    false alarm rate (FAR)    correct rejection rate (1-FAR)
Wing        97%               67%                       37%
Nose        98%               60%                       40%
Wingpair    97%               65%                       35%
Aircraft    98%               64%                       36%

3.9  Discussion

In this chapter, saliency based low level processing and line grouping mechanisms
for potential aircraft parts and hypotheses are discussed in detail. The low level
processing is dedicated to clutter removal and the detection of straight lines. The
pixel density and randomness of pixel orientation play an important role in the early
discrimination between object (aircraft) and clutter pixels.
The next processing step is to join collinear lines and establish a line data structure that includes information about the co-termination property of proximal lines.
This last processing step is very useful in tracing parts of the aircraft boundary
and contributes to the evidence accumulation process. Saliency-driven line organisation is carried out in this chapter. In particular, lines are sorted and then allocated
descriptions depending on line length statistics. This processing measure reduces
considerably the polynomial growth of line groupings as the number of lines increases
progressively in each group.
To summarise, the key features of this chapter are: (a) processing of background
clutter, (b) tagging of polarised background lines, (c) extending and structuring lines
based on their saliency, (d) forming two-line groupings that potentially represent


wings and noses, wing pairs and aircraft hypotheses, and (e) introducing neural networks as an alternative approach to the rule-based line grouping method to improve
the system performance.
The following chapter describes the evidence accumulation processes based on examining positive and negative cues from aircraft parts and clutter.

Chapter 4
Generic Aircraft Recognition
In Chapter 3, the extracted lines were grouped to form wing-pair and nose associations, which are not small in number. This system adopts a strategy of what we call
low commitment; a large number of lower level features are initially accepted in order
to increase the extraction probability of features belonging to an aircraft. Higher
level rules, inspired from the aircraft knowledge domain, are then applied to filter out
spurious aircraft candidates arising from accidental line feature groupings.
In this chapter, the system subsequently collects evidence from the fuselage, tail fins,
wing tips, and other areas within the aircraft region to consolidate correct hypotheses.
Intensity based information is also used in hypothesis promotion/demotion processes.
Section 4.1 outlines the evidence accumulation process, in which a confidence score
increases as aircraft parts are detected, and then an additional set of positive and negative evidence is collected to further widen the score gap between the true and
spurious hypotheses. Section 4.2 describes the interpretation conflict resolution process. Section 4.3 shows how the system handles difficult shadow problems. Section
4.4 describes how the scores are weighted to improve the system recognition performance. Section 4.5 presents experimental results of selected aircraft images under
various imaging conditions: blurring, camouflage, clutter, multiple aircraft, occlusion, protrusions and shadowing effects. Section 4.6 provides several examples of
cases where the system outputs a winning hypothesis when there is no aircraft in
the image. This test illustrates how spurious hypotheses may form from background
clutter, and occasionally reach the final recognition stage. This chapter concludes
with a brief discussion in Section 4.7.

4.1  Evidence Accumulation

In this section, we address the aircraft evidence accumulation process in terms of fuselage, wing tips, tail fins, and local intensity match. We implement a voting scheme,
where the evidence score increases progressively as positive evidence accumulates,
and decreases if negative evidence is encountered.

4.1.1  Fuselage Detection

As with the wings and noses, a fuselage is an important feature of the aircraft which
connects the nose to the wings hence enclosing the forward section of the aircraft.
The fuselage section is readily visible from many viewpoints and contains long lines.
A fuselage is usually described as a long cylindrical shape, which is true for large
commercial aircraft, as shown in Figure 4.1(a). However, for military jets, the fuselage
does not usually have a simple shape and is more spread out laterally as shown in
Figure 4.1(b). The fuselage section for jets is defined in this thesis as the aircraft
structure located in front of the wings (see Figure 4.1(c)). This includes the cockpit
and nose. The fuselage region for such aircraft is often difficult to extract from the
image, particularly if the image is degraded. Applying camouflage to the fuselage


Figure 4.1: Typical commercial and military aircraft, and the parts that need to be detected for evidence score accumulation: (a) commercial aircraft (the fuselage is long and narrow), (b) military jet (the fuselage is flatter, and the blended wing-fuselage junction edge is often undetected), and (c) the fuselage, tail fin, wing-tip and rear-fuselage (the latter only for boomerang wing shaped aircraft) regions used for detection.


section further exacerbates the segmentation process of the fuselage region. In this
thesis, the detection of the fuselage section makes use of a different approach and takes
advantage of the observation that most edges arising from the fuselage structure are
roughly oriented along the fuselage axis. This means that the detection of a relatively
large number of similarly oriented lines within a confined region in the image, is
suggestive of the presence of a fuselage section. In practice these similarly oriented
lines are searched for within a region delimited by the detected nose and wing leading
edges.
Evidence about the fuselage section is determined by how much the fuselage axis,
obtained by joining the nose tip to the intersection point of the wing leading edges, is
covered by fuselage edges from both sides of the axis. This evidence is highest when
the fuselage edges cover all of the frontal aircraft section (from wings to nose). A
fuselage edge is defined as any edge approximately oriented along the fuselage axis
and located in a region around the fuselage axis. The remaining part of this section
explains in detail the detection process of the fuselage section.
The fuselage search region is based on the nose location and wing shape. As shown
in yellow in Figure 4.2 (a), the fuselage search region is defined as a trapezoid that
includes a convex hull made of the points C, NL , NR , PL and PR . These points are
respectively the nose tip, left and right nose endpoints, and innermost endpoints of
left and right leading edges of the wings.
To ensure that the search region is large enough to include all the fuselage section,
an extra margin of 10 pixels is added to enlarge the search region.
The gap widths ‖PL − PR‖ and ‖P′L − P′R‖ must be similar, as they represent the
width of the fuselage. If not, then the width of the search region is adjusted to fit
the smaller of these two gap widths.


Figure 4.2: Detection of fuselage edges and assessment of their coverage: (a) fuselage search region (lines outside the search region, or with the wrong orientation, are rejected), and (b) fuselage coverage computation via the projections of the fuselage lines onto the line joining C and PM.


If a line is found within the search region, then its alignment with the fuselage axis
(the line joining C and FP) is tested. The angle between the line p1p2 and the
fuselage axis must be smaller than a preset threshold in order for the line to be accepted as a
fuselage line. Any line outside the search region is not considered.
Having collected all fuselage lines, a fuselage score is computed based on the union
of their orthogonal projections onto the fuselage axis. This score computation is
illustrated in Figure 4.2(b), where individual line projections of length ℓL(i) from the
left side and ℓR(i) from the right side, respectively, are shown in red. The union of all
projections forms the total projected length, which is then normalised by the length
of the fuselage search region along the fuselage axis (estimated as ‖C − PM‖). The
fuselage coverage score is heuristically defined as,

    score_fuse = S { [ Σ_{i=1}^{NL} ℓL(i) / ‖C − PM‖ ] fL + [ Σ_{i=1}^{NR} ℓR(i) / ‖C − PM‖ ] fR }        (4.1.1)

where NL and NR are respectively the number of fuselage lines on the left and on the
right of the fuselage axis, S is a multiplicative factor made large to give more weight
to the fuselage section evidence, and fL and fR are respectively the left and right
side scale factors ranging from 0.4 to 1. These two scale factors are made inversely
proportional to the divided angular width of the fuselage search region, expressed in
terms of the angles ∠(C–FP, C–PL) and ∠(C–FP, C–PR), as shown in Figure 4.3. This
scaling setup ensures that the narrower side receives more emphasis, as it is less likely
to include clutter lines belonging to the background.
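A minimal sketch of Equation 4.1.1 is given below. The projection intervals are assumed to be expressed as fractions of the fuselage axis and already merged where they overlap; the value of S is an illustrative placeholder:

    def fuselage_coverage_score(left_proj, right_proj, axis_length, f_left, f_right, S=300.0):
        # Equation 4.1.1 (sketch): sum the projected lengths of the fuselage
        # lines on each side of the axis, normalise by ||C - PM||, and weight
        # each side by its scale factor (0.4..1) and the global factor S.
        left_cover = sum(b - a for a, b in left_proj) / axis_length
        right_cover = sum(b - a for a, b in right_proj) / axis_length
        return S * (left_cover * f_left + right_cover * f_right)

    # Left side covers 60% of the axis, right side 40%:
    print(fuselage_coverage_score([(0.0, 0.6)], [(0.2, 0.6)], 1.0, 0.8, 1.0))   # 264.0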
Furthermore, if a subset of the extracted fuselage lines generates a connected chain,
linking the nose to PL or PR , then a bonus score is awarded because the connected
chain often represents a fuselage boundary. The term connected chain is defined in
this context as a series of lines where one line is joined to another by the endpoint
proximity property, while preserving a deviation angle greater than 75°. Figure 4.4
illustrates an example of an aircraft hypothesis where the nose connects to the right


Figure 4.3: Scale factor (fL or fR), which is inversely proportional to the divided angular width of the fuselage search region, expressed in terms of the angles ∠(C–FP, C–PL) and ∠(C–FP, C–PR).

and left wing leading edges through two connected line chains, tracing the frontal
fuselage boundary. If the sum of NL and NR is very large (ie., > 25), most of the
fuselage edge lines are short and the fuselage coverage score is not large (ie., < 180),
then these lines are considered as clutter and a penalty of 100 is applied to the fuselage
coverage score.
This fuselage detection procedure is repeated for all nose/wingpair candidates. The
nose with the highest fuselage score is selected as the winning nose. However, if the
score gap between the winner and runner-up is very small and the runner-up nose is
best aligned with the fuselage axis, then the runner-up nose wins instead.
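A minimal sketch of this winner selection rule (the "small gap" threshold is an illustrative placeholder):

    def select_winning_nose(candidates, small_gap=20.0):
        # Pick the nose with the highest fuselage score; if the runner-up is
        # within small_gap of the winner and is better aligned with the
        # fuselage axis (smaller deviation angle), the runner-up wins instead.
        # Each candidate: {'fuselage_score': float, 'axis_deviation': float}.
        ranked = sorted(candidates, key=lambda c: c['fuselage_score'], reverse=True)
        if len(ranked) < 2:
            return ranked[0] if ranked else None
        winner, runner_up = ranked[0], ranked[1]
        if (winner['fuselage_score'] - runner_up['fuselage_score'] < small_gap
                and runner_up['axis_deviation'] < winner['axis_deviation']):
            return runner_up
        return winner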

4.1.2  Detection of Tail Fins


Please see print copy for Figure 4.4

Figure 4.4: The detected fuselage boundary lines connect the nose to the wing leading edges via connected chains. Such a nose-to-wing connection provides strong
fuselage boundary evidence.

The wing, nose and fuselage section are the most prominent features of an aircraft.
The use of these features is usually sufficient to detect an aircraft in a clean background. However, realistic scenes often include clutter, which (either on its own or
in combination with several aircraft parts) generates numerous spurious aircraft candidates. In order to increase the score gap between the true aircraft hypothesis and
spurious ones, more evidence is required from other aircraft parts. In this subsection, we consider locating the aircraft tail fin edges in the image to improve the
aircraft hypothesis confidence.
Detecting the tail fin edges correctly while filtering out clutter is not straightforward
because the edges are relatively short, and depending on the viewpoint, they are
often occluded by the rudder. Therefore, detection of the complete fin structure is
impractical. Instead, we mostly focus on the fin leading edges. If the complete fin


structure is detected as a two-line grouping, then a bonus score is added (see the
bottom left of Figure 4.5(a)). The tail fin detection algorithm checks the following
conditions assuming a generic viewpoint.

1. The tail fin must be located behind the wing and cannot be longer than the
longest wing edge from the same side.
2. The tail fin cannot be further from the fuselage axis than the wing is.
3. The tail fin cannot be too far behind the wing; an exception arises if the hypothesis has a boomerang wing shape and a narrow fuselage.
4. The tail fin must be approximately aligned with the wing leading edge (ie.,
|θF − θR| < θth, as shown in Figure 4.5(a)).
5. When extended, the tail fin must not cross the wing edge of the same side.

If all of the above conditions are satisfied, then the line is accepted as a tail fin
edge, and an evidence score for the fin is awarded. To consolidate this evidence,
the intensity values in the vicinities of the wing trailing edge and fin leading edge are
compared, as shown in Figure 4.5(b). Since the space between the two edges is usually
narrow especially for non-commercial aircraft, some degree of intensity uniformity is
expected unless the background is cluttered. If the local intensity shows a reasonable
match, then a bonus score is awarded, as illustrated in Figure 4.5(b).
Finally, if two or more tail fins are detected on both sides of the fuselage axis, then
they are further subjected to a symmetry test. For any symmetric line pair, the sum
of angle cotangents (eg. cot 1 + cot 2 or cot 01 + cot 02 in Figure 4.5(c)), is constant
and is only a function of the roll and pitch angles. This will be derived in Section 5.2


Figure 4.5: Locating tail fin edge lines: (a) geometric constraints in terms of location, length and orientation, (b) intensity-based constraints applied both in the foreground and background regions (a match in both foreground and background earns a full bonus; a match in only one earns a half bonus), and (c) skewed symmetry constraints applied to tail fin leading edges (ie., cot θ1 + cot θ2 = cot θ′1 + cot θ′2).

of the next chapter (refer to Equation 5.2.8), where the viewpoint is estimated. This
observation translates into the constraint
    |(cot θ1 + cot θ2) − (cot θ′1 + cot θ′2)| < cth
where cth is a tolerance threshold. If this symmetry constraint is satisfied, then a
bonus score is awarded, otherwise a penalty is applied.
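A minimal sketch of this skewed-symmetry test (angles in degrees; the tolerance cth is an illustrative placeholder):

    import math

    def tail_fins_symmetric(theta1, theta2, theta1p, theta2p, cth=0.3):
        # |(cot t1 + cot t2) - (cot t1' + cot t2')| < cth for a symmetric pair.
        cot = lambda deg: 1.0 / math.tan(math.radians(deg))
        return abs((cot(theta1) + cot(theta2))
                   - (cot(theta1p) + cot(theta2p))) < cth

    print(tail_fins_symmetric(50, 70, 52, 68))   # True: nearly symmetric fins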

4.1.3  Wingtip Edge Detection

The wingtip edge is another useful feature that completes the wing structure. Wingtip
characteristics vary depending on the aircraft type. A delta wing aircraft, as shown in
Figure 5.27(a), has no wingtip edge. Some fighter jets carry missiles on the wingtips,
resulting in elongated wingtip edges (see Figure 4.6(a)).
The wingtip edge must satisfy the following conditions.

1. The wingtip edge must be located in the search sector as shown on the right
side of Figure 4.6(b).
2. The wingtip edge must be approximately equal in length to the gap defined as
x2 in Figure 4.6(a). Often a wingtip edge appears longer than x2 when a missile
is attached to the wingtip (refer to Figure 4.6(a)). Therefore, the constraints
are relaxed accordingly (ie., x3/x2 < x1/x2 < rt and ℓ > x2/2, where rt and ℓ
are respectively the ratio threshold and wingtip line length).
3. The wingtip edge must be approximately aligned with the fuselage axis (see
the right side of Figure 4.6 (b) for typical rejected candidates). However, if the
wingpair is boomerang shaped, then the wingtip can also be roughly perpendicular to either wing edges as shown in Figure 4.6(c).


Figure 4.6: Detection of wingtip edges: (a) x1, x2 and x3 are segmented lengths, (b) examples of rejected wingtip candidates (too short, too long, not aligned with the fuselage axis, or outside the wingtip search sector), and (c) for a boomerang wing, a wingtip roughly perpendicular to the wing edge is accepted, but a protruding wingtip is rejected.


Note that protrusion is not allowed for boomerang wing-pairs (ie., x1 = x3 = 0),
because no missile can be loaded on such a wing tip.
If the wingtip candidate satisfies all of the above conditions, then the line is accepted
as a wingtip edge, and a score is awarded. The above procedure is applied separately
to each side of the fuselage, and the scores from both sides are summed.

4.1.4  Additional Evidence Accumulation

Having extracted most or all of the aircraft parts, the overall aircraft silhouette is now
defined. Additional scores are added (or subtracted) based on additional geometric
constraints and also on image intensity at selected locations within and around the
aircraft silhouette. Each of the following constraints contributes a score or imposes a
penalty.

1. If the wing leading edges overlap when rotated about FP as shown in Figure
4.7, then a score is awarded.
2. If the wing trailing edges overlap when rotated about RP (also refer to Figure
4.7), then an additional score is awarded.
3. Given the mean intensity values computed at the selected regions F1, F2, R1,
R2, M1 and M2 in Figure 4.8, if the mean intensity differences between each pair
of regions (F1 and F2), (R1 and R2) and (M1 and M2) are below a threshold,
then a score is added. This condition obviously favours aircraft with uniform
intensity distribution.
4. If the background is clean (as defined below), then the mean intensities of F1
and R2 should be distinct from the background mean intensity. If each intensity

Figure 4.7: The wing leading edges must overlap when rotated about FP (the overlapping portion is shown in red). The same rule applies to the trailing edges of the wing-pair, rotated about RP.

difference is less than a threshold, then a penalty is applied. To determine if the
background is clean, the intensity values are collected from the image periphery
region as shown in Figure 4.9 and the intensity distribution is examined using
a histogram (see Figure 4.10). The cleaner the background is, the narrower
the histogram peak becomes. If we define PM as the pixel count in the bin
corresponding to the peak, and PT as the total pixel count in the histogram,
the ratio PM /PT is an indicative measure of clutter level. Figure 4.10 shows
three image examples with different clutter levels. The associated histograms
clearly show a correlation between the ratio PM /PT and clutter level. The bin
width is set to 1/10 of the image intensity range. A ratio of PM /PT > 0.8 is
indicative of a clean background (a minimal sketch of this ratio test is given after this list).
5. For a boomerang shaped wing-pair, if a line is found in the rear fuselage region
(ie, crosses the double-arrowed green line in Figure 4.11), then its length and
orientation are checked. If the line is not short, and is roughly parallel to the

Figure 4.8: Regions of interest (F1, F2, R1, R2, M1 and M2) for intensity level comparisons. The differences of the mean intensity values between each pair of regions (F1 and F2), (R1 and R2) and (M1 and M2) are expected to be small.

fuselage axis, then it is accepted as the rear fuselage edge, and a confidence
score is added. If no rear fuselage line is found for the boomerang wing-pair,
then a penalty is imposed.
6. The gap between FP and the inner endpoint of wing edge (ie., PT2 in Figure
4.11) is checked for any clutter lines. If three or more lines cross the double-arrowed brown line in Figure 4.11, then the hypothesis is more likely to be a
coincidental aircraft formation from clutter, hence is penalised.
7. If the image is cluttered (ie., total lines count exceeds 450), then an accidentally
generated spurious hypothesis may contain dense clutter inside its boundary.
Note that the number 450 is selected from the line count distribution curve

Figure 4.9: The background intensity is computed from the shaded periphery region.
We assume this periphery region contains mainly the background.

Please see print copy for Figure 4.10

Figure 4.10: Background intensity histograms obtained from the shaded perimeter region (refer to Figure 4.9) of aircraft images with different clutter levels: (a) clean background (PM/PT ≥ 0.8), (b) light clutter (0.4 ≤ PM/PT < 0.6), and (c) heavy clutter (PM/PT < 0.2). PM is the count of pixels in the bin corresponding to the peak, and PT is the total pixel count in the histogram. The ratio PM/PT roughly indicates the clutter level.


Figure 4.11: Finding of rear fuselage lines and clutter lines. Potential rear fuselage edges for a boomerang shaped wing-pair are detected between the wing trailing edges, and are shown in blue. Detection of many lines crossing the gap between the wing edge's inner point (eg., PT2) and the fuselage axis weakens the confidence of the hypothesis. Clutter lines are shown in red.

Figure 4.12: A spurious aircraft hypothesis coincidentally generated from dense clutter is likely to contain many clutter lines in the hypothetical fuselage region (clutter lines inside the region are shown in red; lines aligned with the fuselage axis are not counted as clutter).


Figure 4.13: Clutter evidence score plotted as a function of the clutter line count. If the clutter count within the fuselage region (refer to Figure 4.12) exceeds 7, then the score becomes negative.

obtained from the clutter images (see Figure 6.3). Any short line segments
found within the hypothetical fuselage boundary (ie., the shaded region in Figure
4.12) that present a large angle with the fuselage axis are considered as
clutter segments (shown in red in the figure). The score (or penalty) is computed
as a function of the number of the detected clutter segments, and is illustrated
in Figure 4.13. Note that the line count of 450 is selected based on the line
statistics of 160 non-aircraft clutter images.
8. For a correct aircraft hypothesis, it is expected that FP is close to the fuselage
axis (the line C–RP in Figure 4.14). Therefore, the penalty score is made proportional
to the angular deviation θFP in Figure 4.14 (ie., penalty ∝ θFP × 180/π).
9. In images containing a boomerang shaped aircraft, often spurious aircraft candidates arise from the wing-fuselage or rudder-fuselage combinations (see Figure
4.15 as an example). The combined evidence of the large intensity difference

Figure 4.14: Deviation of FP from the fuselage axis (C–RP), expressed as θFP. Any aircraft with coplanar wings and fuselage will display a small θFP value. Spurious hypotheses usually show larger θFP values; therefore the parameter θFP is used in the interpretational conflict resolution process.

between R1 and R2 and no detected rear fuselage lines, provides a strong indication of such wing-fuselage formation, and the hypothesis is therefore severely
penalised.
10. If the image contains a large number of lines (eg., > 450), a spurious hypothesis
may be formed from the coincidentally extended clutter lines (see Figure 4.16).
If the image contains an aircraft, then it is unlikely that such a spurious hypothesis becomes the winning hypothesis. However, if the image does not contain
an aircraft, then without having to compete with a true aircraft, the spurious
hypothesis may become the winning hypothesis, generating a false alarm. In
order to handle this problem, we check whether or not more than three wing
edges of the hypothesis are extended as shown in Figure 4.16. If so, then the
line segments used to form the extended lines are identified by checking the
collinear slot in the extended lines (see Figure 3.9). If these segments did not

Figure 4.15: Intensity comparisons between regions R1 and R2. A spurious aircraft hypothesis, often generated as a wing-fuselage combination (with a false wing-pair and a false nose), will show a large intensity difference between the two regions.

generate a hypothesis describing the same wing shape as the given hypothesis,
then the current hypothesis is considered as being spurious and is therefore
severely penalised.

11. A fraction of the hypothesis's weight contributes a score. The score is calculated
as

        s × weight / K

    where s is a constant and K is the average image size.
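Returning to the background clutter measure of item 4, a minimal sketch of the PM/PT ratio is shown below (10 histogram bins over the intensity range and the 0.8 clean-background threshold are taken from the text; the test data are synthetic):

    import numpy as np

    def background_clutter_ratio(periphery_pixels, n_bins=10):
        # PM/PT: PM is the pixel count in the peak histogram bin, PT the total
        # count.  The bin width is 1/10 of the intensity range of the periphery.
        pixels = np.asarray(periphery_pixels, dtype=float).ravel()
        hist, _ = np.histogram(pixels, bins=n_bins,
                               range=(pixels.min(), pixels.max()))
        return hist.max() / hist.sum()

    rng = np.random.default_rng(0)
    periphery = np.concatenate([np.full(9000, 120.0),        # uniform tarmac
                                rng.uniform(0, 255, 1000)])  # a little clutter
    print(background_clutter_ratio(periphery) > 0.8)         # True: clean background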

An example of an aircraft candidate representation is given in Figure 4.17. All information about wings, fuselage, noses, tail fins and wingtips are included in the
representation. Two additional slots, Killed and Killed by are shown and are next
used during the interpretational conflict resolution process, which is the subject of
the following section.


Figure 4.16: Spurious hypothesis which is accidentally formed where three or more wing edges are the extended lines of clutter fragments.

4.2  Interpretational Conflict Resolution

The final stage of generic aircraft recognition is to perform consistency checks by
examining all generated aircraft hypotheses for signs of conflict. Any two high score
aircraft candidates are examined for conflicts arising from edge and region sharing.
If an aircraft part or edge belongs to more than one aircraft candidate, the aircraft
candidate with a clear score advantage is selected and the rest are cleared from the
aircraft database. The clearing process is implemented by updating the Killed and
Killed by slots in Figure 4.17, for all competing aircraft candidates. Cleared aircraft
candidates will have their Killed slot activated to 1, with the Killed by slot containing
the index number of the winning aircraft candidate.
However, if the score difference is small (ie., < 30), then a decision based solely on
score can sometimes be misleading. Instead, additional conditions are formulated


AIRCRAFT HYPOTHESIS
Wing-pair Index: #
Type: Boomerang/Diamond/Triangle
Left Wing Index: #
Right Wing Index: #
Four Edge Indices: [#, #, #, #]
Corner Coordinates: [FP(#,#) RP(#,#) LC(#,#) RC(#,#)]
Distance from FP to RP (ie., || FP - RP ||): #
Distance from LC to RC (ie., || LC - RC ||): #
FP's Angular Deviation: #
Weight: #
Detected Nose Candidates: [#, #, ..., #]
Winning Nose Candidate Index: #
Fuselage Score: #
Detected Fuselage Edges: [#, #, ..., #]
Nose to Wing Connect: [left(1 or 0), right(1 or 0)]
Detected Wing Tips:[left(#), right(#)]
Detected Tail Fins:[left(#), right(#)]
Wing Tip Score: #
Tail Fin Score: #
Other Score: #
Total Score: #
Killed:1 or 0
Killed by (aircraft candidate index): #

Figure 4.17: Aircraft-hypothesis representation. The two slots Killed and Killed by
are used during the interpretational conflict resolution process. The slot Weight
contains the sum of the four line lengths. Note that the symbol # refers to a number.

based on which edges are conflicting. Edge conflicts are grouped into four commonly
occurring cases.
Case 1: The wing leading (or trailing) edges are shared. This often arises when the
wing leading or trailing edges cast shadows on the ground, and the shadow lines
are detected as the non-shared wing edges of the spurious hypothesis (shown
in red on the left side of Figure 4.18(a)). Sometimes the non-shared edges of
the spurious hypothesis come from the trailing edges of the tail fins (see the
right side of Figure 4.18(a)). If the image contains background clutter, then the
non-shared edges could also be clutter.
Case 2: One wing (ie., 2 edges from one side of the fuselage) is shared (refer to


Figure 4.18(b)). The spurious hypothesis has the non-shared wing, arising from
the tail fin, clutter, shadow or fuselage.
Case 3: Three edges are shared (refer to Figure 4.18(c)). This scenario sometimes
occurs when the non-shared edge of the spurious wing-pair comes from the
rudder, background shadow or clutter.
Case 4: The two hypotheses share only the nose, and the fuselage axes coincide (refer
to Figure 4.18(d)). This arises when the shadows cast on the ground by the
wings are positioned directly behind wings and form a wing-pair for the spurious
hypothesis.

To resolve the conflicts arising from the above scenarios, the reasoning rules outlined
below are formulated. For convenience, we name the conflicting hypotheses HA
and HB, where HA and HB are the true and spurious hypotheses, respectively. The
hypothesis parameters such as θFP (see Figure 4.14), weight and ‖FP − RP‖ (see
Figure 4.17) are used in the following reasoning process.
Case 1: Two wing leading (or trailing) edges are shared.
    IF ΔθFP between HA and HB is large,
        if θFP of HA is smaller and weight of HA is larger, then HA wins.
    ELSEIF ΔθFP between HA and HB is small,
        if weight of HA is much larger, then HA wins,
        elseif the non-shared edges of HA are parallel to those of HB,
            then GO TO SHADOW REMOVAL ALGORITHM
        elseif HA is boomerang-shaped, and ‖FP − RP‖ of HA is smaller,
            then HA wins (eg., right side of Figure 4.18(a))
    ENDIF
Case 2: Left or right wing is shared.
    IF ΔθFP between HA and HB is large,
        if θFP of HA is smaller, then HA wins.
    ELSEIF ΔθFP between HA and HB is small,
        if HA is not boomerang-shaped, and HB is boomerang-shaped,
132

Figure 4.18: Commonly encountered scenarios of interpretational conflicts due to part sharing: (a) two leading or trailing edges are shared, (b) two edges from one wing are shared, (c) three edges are shared, and (d) only the nose is shared and the fuselage axes are aligned. Incorrect wing edges in the spurious wing candidates are shown in red.


            then HA wins (eg., leftmost side of Figure 4.18(b)).
    ENDIF
Case 3: Three wing edges are shared.
    IF ΔθFP between HA and HB is large,
        if θFP of HA is smaller, then HA wins.
    ELSEIF ΔθFP between HA and HB is small,
        if the non-shared edges are collinear, and weight of HA is larger,
            then HA wins.
        elseif the non-shared edges are parallel, and ‖FP − RP‖ of HA is smaller,
            then HA wins.
        elseif the total intensity-based evidence score of HA is larger,
            then HA wins.
        elseif the sum of total score and weight for HA is larger,
            then HA wins.
    ENDIF
Case 4: Only the nose is shared.
    IF the fuselage axes of HA and HB coincide,
        if all of the corresponding edge pairs are parallel,
            then GO TO SHADOW REMOVAL ALGORITHM.
    ENDIF

If the interpretational conflict is not resolved by these reasoning processes, then the
higher score hypothesis wins. The surviving hypotheses are sorted in terms of their total score, and the top 5 highest score hypotheses reach the output stage of the generic
recognition. If the score exceeds a score threshold (eg., 600), then the hypothesis is
accepted as the recognised aircraft.

4.3  Shadow Removal Process

In images of parked aircraft, shadows cast by wings can potentially confuse the system.
Shadow detection is often addressed in building detection systems [76, 77, 128] and
also in aircraft recognition systems [32, 33, 81].


Please see print copy for Figure 4.19

Figure 4.19: Shadow regions cast by wings: (a) the shadow regions are mostly covered by the wings (wing and shadow overlapped), or (b) they are separated from the wings. The shadow wings have their symmetry axis roughly aligned with the aircraft fuselage axis.

In our case, a spurious hypothesis containing shadow region(s) usually has a lower
score, and is therefore successfully removed by the true hypothesis because of
the large score gap.
However, if the shadow lines of the spurious hypothesis fit well with the rest of the
aircraft structure, then its score may become large. Such a problem is illustrated in
Figure 4.19, where the shadow lines are parallel with the aircraft wing edges and fit
well with the fuselage.
In such a case, the conflict resolution algorithm invokes the shadow removal process.
The call from Case 1 is activated when two leading or trailing edges are shared, as
shown in Figure 4.19(a). The call from Case 4 is activated if the shadow regions are
separated from the wings, and no lines are shared, as shown in Figure 4.19(b). Two
separate algorithms are developed to handle each case.

From Case 1: wing leading or trailing edges are shared


Figure 4.20: Interpretational conflicts arising from shadow cast by the wings: (a) conflict where a wing edge belongs to two aircraft hypotheses (the correct wing is accepted and the incorrect wing containing a shadow line is rejected), (b) four regions of interest for image intensity analysis, used to discriminate a spurious aircraft hypothesis comprising two correct wing edges and two shadow lines, and (c) conflict where the nose and fuselage are shared but no wing edges are shared between the two aircraft hypotheses.


In this case, two non-shared lines of the spurious hypothesis are the shadow lines (one
of which is shown in red in Figure 4.20(a)). The shadow lines cast by the wing
leading edges are hidden under the wing, hence not visible in the image. The wing
on the far right side of Figure 4.20(a) belongs to the spurious hypothesis.
Referring to Figure 4.20(b), if the conflicting edges are the wing leading edges, then
regions 3 and 4 will be selected for intensity level checks. If the sharing occurs at
the wing trailing edges, then regions 1 and 2 will be examined. If the mean and
standard deviation of the intensity in the regions are both small, then the regions can
be regarded as shadow regions, and the hypothesis with the smaller ‖FP − RP‖
wins.
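A minimal sketch of this shadow test is given below; the mean and standard-deviation thresholds are illustrative placeholders:

    import numpy as np

    def is_shadow_region(region_pixels, mean_thresh=60.0, std_thresh=15.0):
        # A region is treated as shadow when it is both dark (small mean) and
        # roughly uniform (small standard deviation).
        pixels = np.asarray(region_pixels, dtype=float)
        return pixels.mean() < mean_thresh and pixels.std() < std_thresh

    def resolve_shadow_conflict(hyp_a, hyp_b):
        # Per the rule above, when the checked regions are shadow, the
        # hypothesis with the smaller ||FP - RP|| wins.
        return min((hyp_a, hyp_b), key=lambda h: h['fp_rp_distance'])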

From Case 4: no wing edge is shared - nose is shared and fuselage axes
are well aligned
Figure 4.19(b) is the typical example where the call is made from Case 4: the wings
are slender and the sun is roughly along the aircraft longitudinal direction, so that
the shadow is separated from the wings in the image. This spurious wing-pair fits
well with the fuselage axis and has the correct nose.
In this case, the four wing edges of the aircraft hypothesis are compared with their
counterparts from the spurious hypothesis (refer to Figure 4.20(c)). Firstly, L1, L2,
L3 and L4 must be parallel with L1′, L2′, L3′ and L4′, respectively. Secondly, the
two wing-pairs must have approximately the same dimension. Lastly, the separation
between the two wing-pairs along the fuselage axis must not be much greater than
‖FP − RP‖. Provided that the aircraft is in a parked position (ie., the distance between
the wings and the ground is relatively small) and the viewing angle is not too oblique,
the separation is usually less than 1.5‖FP − RP‖. If all of the geometric constraints


are satisfied, and if one of the hypotheses exhibits a roughly constant dark region,
then the spurious hypothesis is removed.
The proposed aircraft recognition system does not assume a severe regional overlap
between any two aircraft. Hence, if any pair of surviving hypotheses show an excessive
regional overlap, then the one with the smaller score is eliminated. Figures 4.21(a)-(c)
provide a snapshot of the final competition stage. The coloured lines in the figures
correspond to the extracted aircraft features (ie., wings, nose, fuselage and wingtips).
No tail fins are found for these three aircraft candidates. The aircraft candidate in
Figure 4.21(a) has a low score. Note how the true aircraft nose was mistaken for a
left wing by the system. The aircraft candidate in Figure 4.21(b) shares line features
with that of Figure 4.21(c). The wing symmetry axis (ie., line joining FP to RP)
of this aircraft candidate (from 4.21(b)) presents a large angular deviation from the
fuselage axis (measured as θFP). Therefore this hypothesis is dismissed from further
competition by updating its Killed and Killed by slots with one and the index of the
winning aircraft candidate, respectively. The aircraft candidate of Figure 4.21(a) is
also removed from the aircraft database because of its low score.

4.4  Evidence Score Optimisation

A complete list of evidence sources is presented in this section and summarised in
Table 4.1. The system parameters were adjusted through a large number of computer
simulations, which use a training set of 80 aircraft and 30 non-aircraft images. Initially, the evidence score weights were adjusted heuristically, and then the evidence
score values in the 7th to 18th rows of Table 4.1 were fine-tuned to increase the score
gap between the true and spurious hypotheses. The evidence score generation process
of Section 4.1.4 was iterated using 100 true and 100 spurious hypotheses, until the


Please see print copy for Figure 4.21

Figure 4.21: Examples of some competing aircraft candidates. Green lines correspond
to nose legs, red lines to wing edges and tips, blue line to fuselage axis, and cyan line
to wing symmetry axis.


Table 4.1: Scores obtained in the process of aircraft evidence accumulation. The first
6 scores are dedicated to aircraft part detection, and the remaining evidences (in
the 7th to 18th entries) are introduced in order to help distinguish between the aircraft
and clutter hypotheses.

 n   Image evidences                                   Scores
 1   nose                                              100
 2   fuselage coverage (refer to Equation 4.1.1)       120 to 320
 3   wing-to-nose-connect                              [Left,Right]=[30,30]
 4   excessive short fuselage boundary edges           -100
 5   wingtip                                           [Left,Right]=[40,40]
 6   rear wings                                        [Left,Right]=[10 to 60, 10 to 60]
 7   wing leading edges overlap                        30
 8   wing trailing edges overlap                       30
 9   intensity matching in the boundary                10 for FP, 10 for RP and 20 for M
10   background-foreground intensity differences       -10 to -50
11   rear fuselage lines for boomerang wings           10 x no. of lines
12   no rear fuselage line for boomerang wings         -30 or -100 (long narrow wings)
13   many clutter lines crossing wing leading edge     -150
14   many clutter lines in fuselage region             2: 50, ..., up to 18: -130
15   FP deviation from fuselage axis                   2 x θFP x 180/π
16   boomerang, RP1-RP2, and rear fuselage             -500
17   3-4 wings are fragmented in clutter               -30, -80 or -100
18   hypothesis weight                                 100 x weight/((M + N)/2)

normalised score gap is maximised. The score gap is measured as the normalised
difference of the score means (μT and μF) with respect to the standard deviations
(σT and σF),

    normalised score gap = |μT − μF| / (σT + σF)        (4.4.1)
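A minimal sketch of Equation 4.4.1:

    import numpy as np

    def normalised_score_gap(true_scores, false_scores):
        # |mean_T - mean_F| / (std_T + std_F): separation between the score
        # distributions of true and spurious hypotheses.
        t = np.asarray(true_scores, dtype=float)
        f = np.asarray(false_scores, dtype=float)
        return abs(t.mean() - f.mean()) / (t.std() + f.std())

    print(normalised_score_gap([700, 750, 690], [300, 340, 310]))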

It should be pointed out that testing the recognition system on non-aircraft images was
crucial to better adjust the system parameters and introduce many score penalties
resulting from negative evidences.
Major positive evidence that consolidates the aircraft hypothesis comes from the nose and


Figure 4.22: Histogram of the fuselage coverage score of the winning hypotheses using a sample base of 300 real aircraft images.

fuselage coverage. The fuselage score of the winning hypothesis (as shown in the
second row of Table 4.1) varies roughly from 120 to 320. Figure 4.22 shows the
histogram of the fuselage score of the winning hypothesis based on a set of 300 real
aircraft images.
The system suffers a large score drop if too many line fragments are found in the
fuselage region (refer to the fourth row of Table 4.1) or in the vicinity of the wing
leading edges (ninth row). For aircraft with boomerang-shaped wings, the region behind the wings is searched for any fuselage lines. It was observed that the wing and fuselage were often paired to generate false boomerang wing pairs (see Figure 4.15). Checking for the existence of rear fuselage edges plays a vital role in
removing many spurious candidates. Considerable score drops are hence introduced
as shown in the 12th and 16th rows of Table 4.1. These scores in Table 4.1 are fixed
for the remainder of the experiments.


4.5

Experimental Results of the Selected Aircraft


Images Shown in Section 1.1

In Section 1.1 of Chapter 1, 28 representative images were selected from the image set
and were presented to help visualise the difficulties encountered in automatic aircraft
recognition. These difficulties were broadly divided into 7 categories: blurring, camouflage, clutter, closely placed multiple aircraft, occlusion, protrusion and shadowing
effects. For the experiments reported here, four images are allocated to each category.
The outcome of these runs is discussed in terms of those seven categories.

Blurring: Figures 4.23 to 4.26 feature aircraft aerial photographs with a substantial amount of blurring and noise. Most of the wing edges appear distorted or present low contrast. The application of the dual-threshold edge detector, combined with the line extension algorithm, enabled the extraction of wing edges and eventually led
to the successful generic recognition of the aircraft. Figure 4.24 poses a particular
challenge as the background appears densely cluttered. However, the clutter rejection
algorithm described in Section 3.3 was activated and removed portions of the clutter. Note that this image resulted in two surviving hypotheses (See Figure 4.24(d))
where, because of non-overlap, the spurious hypothesis did not have to compete with
the true hypothesis. However, the score of the spurious hypothesis was well below
the threshold, and therefore was rejected in the final recognition stage. Notice finally
that all four aircraft have at least one wing edge extended.

Camouflage: Figures 4.27 to 4.30 present fighter jets with camouflage. The outline
of the camouflage patches forms a number of T junctions with the wing edges,


causing wing edge fragmentation. However, given that the line fragments are proximal and roughly collinear, they are usually extended to form longer wing edges, as
shown in column (c) of Figures 4.27 to 4.30. The camouflage patches are relatively
large in size, satisfying marginally the intensity check required to form two-line groupings. The aircraft image in Figure 4.30 has a camouflage pattern which resembles the
sand dune background. This makes it difficult for region segmentation-based methods to distinguish the aircraft from the background region. Our approach, which is
based primarily on geometric reasoning, shows some success in detecting camouflaged
aircraft in images.

Background Clutter: Figures 4.31 to 4.34 present aircraft images with cluttered
background. The first two images (Figures 4.31 - 4.32) contain dense clutter but are
successfully filtered during the contour extraction process. The winning hypothesis
in Figure 4.32(d) has the correct wings but the nose is located at the cockpit. This
occasionally occurs when the nose is either not detected or unable to compete with
a false nose arising from the cockpit structure. In many applications (eg., defence),
not locating the true aircraft nose is not a major issue as the cockpit is close enough
to the true aircraft nose and aligned with the fuselage axis.
In Figure 4.33, the clutter lines are polarised. Therefore, the clutter removal algorithm
would be insensitive to them. However, the polarised lines are assigned zeros in
their significant slots (refer to the table in Figure 3.9), and hence are prevented
from forming wings undesirably. Figure 4.33(d) shows that the winning hypothesis
is correct, and most of the aircraft parts are correctly detected. In Figure 4.34, the
background contains vegetation fields. The clutter in the edge image of Figure 4.34(b)
is not dense enough for the clutter removal algorithm to take effect. As a result,
clutter lines are allowed to form pairs and compete with the true wings. Despite this,
spurious aircraft hypotheses based partially or completely on clutter lines did not


build up enough confidence score to eliminate the correct aircraft hypothesis, which
is shown in Figure 4.34(d).

Multiple Aircraft: Figures 4.35-4.38 show closely spaced multiple aircraft in the
scene. Our system only accepts the five highest scoring hypotheses, because we
assume that the image does not contain more than 5 aircraft. Often one aircraft
part may be associated with parts from an adjacent aircraft and generate spurious
hypotheses. These hypotheses always give rise to interpretational conflicts because
part sharing is inevitable. However, spurious hypotheses can usually be eliminated
by the true hypothesis during the conflict resolution stage, unless the true aircraft
hypothesis suffers from clutter effect and occlusion. All of the images in Figures 4.354.38 generate the correct winning hypotheses (up to 4 aircraft), with most of their
parts correctly detected and labelled.

Occlusion: Figures 4.39-4.42 show some examples of partially occluded aircraft. A


wing is represented as a pair of non-parallel lines which potentially delimit a wing
structure. The constraints on line length are relaxed to account for possible occlusion.
In Figure 4.39, the aircraft has its left wing and tail fin obstructed by another aircraft.
A small portion of the left wing trailing edge is visible and allowed to form a twoline grouping. This led to the successful extraction of the wing pair and winning
hypothesis. The other aircraft was not recognised because its cockpit and nose were
not visible. In Figure 4.40, the aircraft appears small and is covered with flame
and smoke. In this case, the obstructed part is the middle part of the fuselage, and the
wing and nose edges are visible. Having these key features available, the aircraft is
correctly recognised. In Figure 4.41, the aircraft is occluded by flares. The left wing
edges are successfully recovered by the line extension algorithm. However, in the final
stage (Figure 4.41(d)), the winning hypothesis presents a non-extended line for its left


wing leading edge. The extended line could not form a line-pair because the intensity
check was not successful. Figure 4.42 is the bottom view of an aircraft with missiles
under the wings. These missile protrusions caused the fragmentation of all four wing
edges. However, the fragmented wing edges were successfully extended, and appear
as extended lines in the winning hypothesis.

Protrusions: Figures 4.43-4.46 contain top view images of aircraft that carry missiles
or have engines. Only the protruded parts of the missiles and engines are visible,
and the remaining portions are hidden under the wings, not cluttering up the wing
region. Therefore, the wing edges usually satisfy the intensity constraints required by
the two-line grouping formulation.
The engine protrusions could be used to determine if the aircraft is a large commercial
airplane (eg., Boeing 747). In Das and Bhanu [33], the engine feature is embedded
in the model hierarchy and used for aircraft classification. This is feasible only if
the engines can be detected and recognised reliably. In our application, where the
background could possibly be noisy and cluttered, it would be difficult to distinguish
engines from clutter. Therefore, protrusions are treated as clutter.
The aircraft in Figure 4.45 has all four wing edges fragmented; the leading edges are
broken into four or five segments due to missile launch rails. This results in numerous
combinations of line extensions, with several wing-pair candidates delimiting the same
aircraft wings. Usually the wing-pair composed of the extended lines presents the
highest score and emerges as the winner.

Shadows: Figures 4.47-4.48 present self cast shadows on aircraft. These shadows
generate dark regions on the aircraft body. Any aircraft recognition approach that
is based primarily on regional intensity information (eg., region segmentation based


methods) may suffer when the aircraft body contains shadow, possibly splitting the
aircraft region into subregions. Figures 4.49-4.50 show examples of aircraft casting
their shadows on the ground. In this case, we do not attempt to detect the shadows
as in Das and Bhanu [33], Nevatia [92], Lin [76], and Marouani [81]. Shadows in the
background are treated as clutter, and they frequently fail to form wings. Furthermore, coincidental hypotheses formed by shadow lines usually do not attain high scores. However, if the shadow lines fit in well with nearby aircraft structure,
then a hybrid part-aircraft part-shadow hypothesis may gain a high score and cause a
conflict with the true hypotheses. When this occurs, the shadow rejection algorithm
described in Section 4.3 is invoked and eliminates the spurious hypothesis. Figures
4.47(d) and 4.48(d) show that no parts from the shadows are included in the winning
hypotheses.
Finally, before proceeding to the next section, it should be pointed out that six additional demonstrations of generic aircraft recognition, as applied to scaled-down aircraft, are provided in Chapter 5 (refer to Figures 5.23(b)-5.28(b)). These images
were obtained in a controlled environment where the effects of shadow, blurring, protrusion, camouflage, clutter and occlusion were deliberately introduced.

4.6

Experimental Results from Non-Aircraft Images

In the previous section, the recognition performance of the system was discussed using
real aircraft images, focusing on how the system detects aircraft under various adverse
imaging conditions. These figures show that if an aircraft exists in an image, then
false hypotheses arising from background and from aircraft-background associations,


are usually defeated by a true hypothesis. This raises the question as to whether
such false hypotheses would survive as winning hypotheses if there is no aircraft in
the image. This motivated us to consider non-aircraft images containing natural
and man-made structures. We are not aware of any previous attempt that includes
non-aircraft images in the performance analysis.
In this section, a number of experimental results are given in order to give some
idea about the types of non-aircraft test images selected, and to show how accidental
winning hypotheses appear in these images. Figure 4.51 shows 12 images along with
the scores of the winning hypotheses. The images contain aerial views of buildings,
runways, vegetation farms, roads and coast, as well as a number of cloud scenes.
These examples demonstrate that images of structured clutter are likely to generate
false winning hypotheses for any vision system that relies mostly on line features.
In Figure 4.51, regions enclosed by the detected false hypotheses do not show heavy
clutter. Slight intensity variations are usually accepted in the aircraft hypothesis
generation, because many camouflaged or shadowed aircraft regions present similar
intensity variations.
None of the winning hypotheses in Figure 4.51 contain heavy clutter within the wing
and fuselage regions, implying that the system successfully penalised any spurious
hypothesis whose silhouette contains dense clutter. Some of the false hypotheses
present short line fragments as their wing edges; this is allowed because the system
assumes from the start that the aircraft could be occluded, or its wing edges could
be fragmented or partly washed away. However, most of such hypotheses (as shown
in Figure 4.51) do not have a high score, and therefore fail to emerge as the winning
hypothesis. The only exception is the hypothesis in Figure 4.51(f), which presents good geometric attributes and well-enclosed boundaries, resulting in a false recognition. However, this hypothesis fails to form when the neural networks in


Section 3.8 are used.


Making the intensity based rules more discriminatory is challenging given such a vast
variation of aircraft texture. However, if one could design such a method that is also
efficient enough to be embedded in the system, then the performance would improve
notably. Another way to reject the false winning hypotheses is to apply a model
matching test to them. Several model matching results using scaled-down aircraft
images are presented in the next chapter.

4.7

Discussion

The generation process of an aircraft hypothesis along with its confidence score is
presented in detail in this chapter. A confidence score is a reflection of both positive
and negative evidence gathered by the hypothesis. The most important evidence for
an aircraft hypothesis is the presence of the fuselage section, whose detection process
is described at length in this chapter. Once the fuselage section is extracted, an
aircraft candidate is consolidated and the fuselage axis is refined for the purpose of
cueing the search for other aircraft parts (ie. wing tips and tail fins) and initiating
other geometric and intensity-based verification processes. Furthermore the fuselage
axis information is also used in the next chapter for aircraft viewpoint estimation.
The verification of generic aircraft hypotheses calls for testing of conditions of the
rules. It is essentially a reasoning process based on evidence accumulation to infer
the presence of aircraft instance in the image.
Key features of the verification step are the gradual accumulation of evidence through (a) part detection and association (positive evidence) and (b) clutter feature detection


(for negative evidence), and ambiguity resolution by re-visiting the constraints and discriminating shadow lines. The most dominant evidence is the fuselage (fore-body) coverage, because it brings together the more consistently visible parts, the wings and the nose, while discarding large portions of the nose and wing-pair candidates. Clutter evidence is also sought to penalise spurious hypotheses that have been accidentally formed amongst clutter, and to increase the score gap between the true and spurious hypotheses.
The feature hierarchy is important in that the system can access features from different levels when necessary. Higher-level features include pointers to their component low-level features; therefore, the constituent features and evidence of a hypothesis can be retrieved easily through those pointers.
If the confidence score exceeds a preset threshold, then the system declares recognition of a generic aircraft. The recognition comes with the shape/intensity information embedded in the winning hypothesis (as shown in Figure 4.17).

Please see print copy for Figure 4.23

Figure 4.23: Blur image 1, Score = 757.

Please see print copy for Figure 4.24

Figure 4.24: Blur image 2, Score = [879 510].

Please see print copy for Figure 4.25

Figure 4.25: Blur image 3, Score = 737.

Please see print copy for Figure 4.26

Figure 4.26: Blur image 4, Score = 647.


Please see print copy for Figure 4.27

Figure 4.27: Camouflage 1, Score = 810.

Please see print copy for Figure 4.28

Figure 4.28: Camouflage 2, Score = 690.

Please see print copy for Figure 4.29

Figure 4.29: Camouflage 3, Score = 612.

Please see print copy for Figure 4.30

Figure 4.30: Camouflage 4, Score = 686.


Please see print copy for Figure 4.31

Figure 4.31: Dense clutter, Score = 620.

Please see print copy for Figure 4.32

Figure 4.32: Dense clutter, Score = 730.

Please see print copy for Figure 4.33

Figure 4.33: Polarised clutter, Score = 746.

Please see print copy for Figure 4.34

Figure 4.34: Structured clutter, Score = 724.


Please see print copy for Figure 4.35

Figure 4.35: Multiple aircraft 1, Scores=[917 903 843].

Please see print copy for Figure 4.36

Figure 4.36: Multiple aircraft 2, Scores=[834 733 706].

Please see print copy for Figure 4.37

Figure 4.37: Multiple aircraft 3, Scores=[834 714 686 674].

Please see print copy for Figure 4.38

Figure 4.38: Multiple aircraft 4, Scores=[783 717].


Please see print copy for Figure 4.39

Figure 4.39: Partial occlusion 1, Score = 825.

Please see print copy for Figure 4.40

Figure 4.40: Partial occlusion 2, Score = 713.

Please see print copy for Figure 4.41

Figure 4.41: Partial occlusion 3, Score = 725.

Please see print copy for Figure 4.42

Figure 4.42: Partial occlusion 4, Score = 766.


Please see print copy for Figure 4.43

Figure 4.43: Protrusions 1, Score = 963.

Please see print copy for Figure 4.44

Figure 4.44: Protrusions 2, Score = 831.

Please see print copy for Figure 4.45

Figure 4.45: Protrusions 3, Score = 797.

Please see print copy for Figure 4.46

Figure 4.46: Protrusions 4, Score = 726.


Please see print copy for Figure 4.47

Figure 4.47: Shadow problem 1, Score = 731.

Please see print copy for Figure 4.48

Figure 4.48: Shadow problem 2, Score = 695.

Please see print copy for Figure 4.49

Figure 4.49: Shadow problem 3, Score = 863.

Please see print copy for Figure 4.50

Figure 4.50: Shadow problem 4, Score = 692.


Please see print copy for Figure 4.51

Figure 4.51: Examples of spurious hypotheses from non-aircraft images when the
rule-based line grouping method is used. The spurious hypothesis in (f) survives as
its score exceeds the threshold. However, with the neural network based line grouping
method, this spurious hypothesis fails to form.

Chapter 5
Aircraft Pose Estimation and
Identification
So far, the generic aircraft recognition step is concerned with generating and verifying
the hypotheses based on evidence derived from generic knowledge of aircraft structure
(Table 1.1). More accurate aircraft recognition (ie., identification) can be obtained
if specific aircraft models are used in the recognition process. Model matching implements the overlay of complete silhouette boundaries, hence allowing previously
missed primitive features (eg., missed rudder edge) to contribute to the recognition
process. By matching the winning aircraft hypotheses to pre-stored aircraft models,
aircraft identification (eg., F16 or F18 as shown in Figure 5.1(a)-(b)) is possible. It
is shown in the literature [88] that techniques that do not use models face limitations in object discrimination capabilities, suggesting the need for some form of matching technique for further verification and identification.
In this system, the input to the identification stage is the set of winning aircraft candidates (up to 5), which provide a wealth of information such as the aircraft longitudinal orientation, position in the image, wing shape, and wing leading and trailing edge labels.
Having all this information, model matching no longer involves an exhaustive search


Please see print copy for Figure 5.1

Figure 5.1: Five military jets considered in the experiment.

process. Only a portion of selected image features and model candidates (out of the
entire model set) are required in the matching process.
Model matching can be performed either at the feature or pixel level. Matching using
line features calls for correspondence of model lines, after being transformed and
projected, with their image counterparts, and assessing the degree of match between
them [6, 37, 60]. This approach may be more robust in the presence of clutter and
partial occlusion, but performance may suffer if some salient lines are missed during
the feature extraction process. On the other hand, the pixel level matching approach
is more tolerant to poor imaging quality and image processing deficiencies, but can
be susceptible to noise and clutter.
This chapter begins with a review of matching metrics that could be used in our
application. Section 5.1 discusses these matching metrics. In Section 5.2, the three
dimensional model generation and the pose estimation algorithm are presented. Section 5.3 describes the model and image alignment process. Section 5.4 proposes a
fitting metric for the model matching. Section 5.5 discusses a pose fine-tuning process and a search strategy for the best match. Section 5.6 presents illustrative results
using six images of scaled-down aircraft. The purpose of this section is to visually
demonstrate the identification performance of the matching technique. Section 5.7
summarises the chapter.


5.1

Review of Matching Metrics

Model matching is the last processing step in object recognition. A high degree of match between a model instance and its image provides a strong confirmation of the object's presence in the image, and allows viewpoint determination.
Feature-based matching methods call for the correspondence of model and image
features (eg., lines) by optimising some fitting metric. Fairney [37], Huttenlocher and
Ullman [60], and Beveridge [5] use lines for 2-D matching. The line-based matching
methods are computationally less expensive than their pixel-based counterparts and
are more robust against clutter.
Pixel level matching techniques [27, 41, 68, 97] are widely used for 2-D shape
matching applications, where a projected model can be matched with the gradient
image [68], distance transformed image [41] or the edge image [97]. The downside of
these methods is their degraded performance in the presence of excessive clutter. This
issue is, however, addressed in [97] where a modified Hausdorff measure was used to
help reject clutter and improve matching speed. In [108], any discrepancy of pixel
orientation between the model and image point pairs imposes a penalty on the fitting score. In the remainder of this section, six fitting metrics are presented in some detail: integrated squared perpendicular distance, distance ratio standard deviation,
circular distribution of matched pixels, distance transform, Hausdorff distance and
averaged dot product of direction vectors.

5.1.1

Integrated Squared Perpendicular Distance

As for fitting a transformed model to the corresponding image data, the most obvious
way would be to minimise the sum of squared distances between the corresponding


Figure 5.2: Endpoints $P_1$ and $P_2$ of an image segment (of length $L$) projected onto an infinitely extended model segment. The perpendicular distance at any point along the image segment is given as $d(t)$, with $d_1$ and $d_2$ at the endpoints.

points from the model and image data sets. Since using the sum of squared distances
as a fitting measure is prone to edge fragmentation in the image, others have proposed the use of point-to-line distance to accomplish fitting [1, 70]. Beveridge [5]
introduced a fitting criterion, called the Integrated Squared Perpendicular Distance
(ISPD) between image segments and model lines that are infinitely extended during
the fitting process. Referring to Figure 5.2, the perpendicular distances from the image segment endpoints $P_1$ and $P_2$ to the model segment, which is infinitely extended, are labelled $d_1$ and $d_2$, respectively. The perpendicular distance $d(t)$ from any point on the image segment to the extended model segment can be expressed as
$$d(t) = d_1 + (d_2 - d_1)\,\frac{t}{L}, \qquad 0 \le t \le L$$

(5.1.1)

where t is a position parameter along the image segment and L is the length of the
image segment.
The definite integral of the squared distance $d^2(t)$ over the image segment generates the ISPD,
$$\mathrm{ISPD} = \int_{0}^{L} d^2(t)\,dt = \frac{L}{3}\left(d_1^2 + d_1 d_2 + d_2^2\right)$$

(5.1.2)

This ISPD is calculated and summed over all pairs of model-image line segments, and


then normalised to produce a fit error,
$$E_{\mathrm{fit}} = \frac{1}{L_m}\sum_{s \in C} \mathrm{ISPD}(s)$$
(5.1.3)
where $L_m$ is the sum of all model segment lengths, and $s$ is the segment index in the model line set $C$.
In addition to the fitting error, Beveridge also includes omission error, pairwise error,
and transformation error in order to form the total match error. The omission error
is the fraction of the model segment not covered by the corresponding image segment,
and is within the range [0,1]. The pairwise error is an increasing function of orientation
difference between the segment pair. The transformation error is introduced if the
scale change associated with the transformation under weak perspective projection is
too large.
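As a concrete illustration of the ISPD fit error, the sketch below evaluates Equations 5.1.1-5.1.3 for image/model segment pairs whose correspondences are assumed to be already established; the helper names and this assumption are ours, not Beveridge's.

import numpy as np

def ispd(p1, p2, a, b):
    # Integrated squared perpendicular distance (Equation 5.1.2) of an image
    # segment (p1, p2) to the infinitely extended model line through (a, b).
    p1, p2, a, b = map(np.asarray, (p1, p2, a, b))
    direction = b - a
    normal = np.array([-direction[1], direction[0]], dtype=float)
    normal /= np.linalg.norm(normal)
    d1 = np.dot(p1 - a, normal)          # signed perpendicular distances
    d2 = np.dot(p2 - a, normal)
    L = np.linalg.norm(p2 - p1)          # image segment length
    return (L / 3.0) * (d1**2 + d1 * d2 + d2**2)

def fit_error(segment_pairs, total_model_length):
    # Equation 5.1.3: ISPD summed over corresponded pairs, normalised by the
    # total model segment length.
    return sum(ispd(p1, p2, a, b) for p1, p2, a, b in segment_pairs) / total_model_length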

5.1.2

Distance Ratio Standard Deviations

Chien and Aggarwal [27] proposed a 3-D shape recognition technique based on corners
and contours as primary features instead of straight lines. The corner points are
extracted from the image contours and also from the 3-D model. Every four-point
correspondences (between four 2-D image points and four 3-D model points) are
used to generate the transform hypotheses, which are then verified using constraints
associated with the rotational parameters (refer to Section 2.3.4 for details).
Given the contours of the image and projected model that are aligned according to the estimated transformation, first the model and image contour centroids ($C$ and $C'$) are computed, then the principal axes ($P$ and $P'$) are determined (refer to Figure 5.3). Each contour is then sampled to generate $N_c$ boundary points, $m_k$ and $i_k$, where $k = 1 \ldots N_c$. For each pair of points of the same index $k$, the distances from the


Figure 5.3: Projected model contour (centroid $C$, sample points $m_k$) and image contour (centroid $C'$, sample points $i_k$) used for calculation of the distance ratio standard deviation.

centroids to the points are computed and the ratio of the distances is obtained,
$$r_k = \frac{d(C, m_k)}{d(C', i_k)} = \frac{\|C - m_k\|}{\|C' - i_k\|}$$
(5.1.4)
The standard deviation of $r_k$, denoted as DRS (Distance Ratio Standard deviation),


is used as a metric of shape similarity. If the two contours have exactly the same
shape (only differ by rotation, translation and dilation), then the DRS is zero.
If the contour is partially occluded, then instead of using the entire contour, the
occluded contour is divided into contour segments whose endpoints are high curvature
points, and the matching is performed on these segments. The overall DRS is defined
as the weighted mean of the DRSs of all the matched contour segment pairs. If the
minimum DRS is below a threshold, then the match is accepted and the model with
the minimum overall DRS is regarded as the correct model. This matching method
can handle partial occlusion and multiple targets.
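A minimal sketch of the DRS computation, assuming the two contours have already been aligned and sampled into the same number of corresponding boundary points; this is our illustration of Equation 5.1.4 rather than Chien and Aggarwal's implementation.

import numpy as np

def distance_ratio_std(model_pts, image_pts, model_centroid, image_centroid):
    # Equation 5.1.4: ratio of centroid-to-boundary distances for each
    # corresponding sample pair; the DRS is the standard deviation of the ratios.
    model_pts, image_pts = np.asarray(model_pts), np.asarray(image_pts)
    d_model = np.linalg.norm(model_pts - np.asarray(model_centroid), axis=1)
    d_image = np.linalg.norm(image_pts - np.asarray(image_centroid), axis=1)
    return np.std(d_model / d_image)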


Figure 5.4: Circular distribution of matched pixels (matched edge pixels shown as points on the scaled circular perimeter): (a) good match between the model and image boundaries, (b) poor match resulting in an uneven distribution of points.

5.1.3

Circular Distribution of Matched Pixels

Marouani [81] proposed an algorithm where a short list of aircraft hypotheses is generated by fine tuning the translation to find the best line-to-line fit between the image and model segment sets. He uses a 3-D accumulator where two horizontal axes denote the 2-D translation, and the vertical axis represents accumulated votes from the line pairs. The vote is weighted according to the linear and angular separations of the
segment pair and their length difference. It should be noted that this method extracts
the line segments using the LINEAR feature extraction system [94], and assumes that
the viewing angles are known.
The validation procedure is based on the argument that the matched image segments
have to be evenly distributed on the model. Each part of the aircraft (wings, nose,
tail and fuselage) must have compatible proportion of matched segments in terms of
arc length. Firstly, the model outline is scanned, and for each model outline segment
the corresponding matched image segment is projected onto it. The total arc length


of the model is used to scale the binary function modulo $2\pi$ in order to map this function onto a circle of radius 1. Matched pixel pairs assigned 1s will be mapped to
the pixels on the perimeter of the circle (see Figure 5.4). A good match will result in
points which are densely and evenly distributed around the circle as shown in Figure
5.4(a), and a poor match will result in unevenly distributed points as shown in Figure
5.4(b).
The evaluation of this distribution is achieved by introducing three parameters:
namely eccentricity (< 7%), length of match (> 50%), and displacement (< 20%).
These parameters are defined using the second order moments of the point coordinates
on the circle and the eigenvalues of the Hessian matrix consisting of the moments.
Detailed descriptions on the generation of these parameters can be found in [81].
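The following sketch illustrates the idea of this circular validation test; the mapping of matched arc-length positions onto a unit circle follows the description above, but the specific parameter formulas below are one plausible reading, not the exact definitions of [81].

import numpy as np

def circular_distribution_stats(matched_arclengths, total_arclength):
    # Map matched positions along the model outline onto a unit circle and
    # summarise their spread: an even, dense spread indicates a good match.
    phi = 2.0 * np.pi * np.asarray(matched_arclengths, dtype=float) / total_arclength
    pts = np.column_stack((np.cos(phi), np.sin(phi)))

    displacement = np.linalg.norm(pts.mean(axis=0))       # 0 for an even spread
    moments = np.cov(pts.T)                               # second-order moments
    eigvals = np.linalg.eigvalsh(moments)
    eccentricity = 1.0 - eigvals[0] / eigvals[1]          # 0 when isotropic
    length_of_match = len(phi) / float(total_arclength)   # matched fraction (one sample per unit arc length)
    return eccentricity, length_of_match, displacement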

5.1.4

Distance Transform

Given a binary edge image, each non-edge pixel is given a value that is a measure
of the distance to the nearest edge pixel. The edge pixels are assigned the value
zero. The operation converting a binary edge image to a distance image is called the
distance transform (DT) [12, 13, 41, 42, 114].
Figure 5.5 shows a sample binary pattern and its true Euclidean distance transform.
There are other approximations of the Euclidean distance measure, including the
chamfer 2-3 metric or 3-4 metrics [13, 114]. Typical model matching with distance
transformation image, denoted as $I$, is shown in Figure 5.6. This image, $I$, is correlated with the binary model edge template denoted by $T$. The average of the pixel values of $I$ that the edge pixels of $T$ overlay is the measure of correspondence between the

Figure 5.5: A binary edge image (on the left) and its Euclidean Distance Transform (on the right).

Figure 5.6: Computation of the Chamfer distance: the model edge image (feature template) is superimposed on the DT image, and the values in the shaded (blue) entries read the distance between the model edges and the image edges.


two edges, called the chamfer distance,
$$D(T, I) = \frac{1}{|T|}\sum_{t \in T} d_I(t).$$
(5.1.5)
where $|T|$ denotes the number of edge pixels in $T$ and $d_I(t)$ denotes the distance between the template edge pixel $t$ and the closest image edge pixel.
A perfect fit between the two edges will result in the chamfer distance of zero. The
matching process is to minimise the chamfer distance to find the best fit. The resulting
best match is accepted if the distance measure $D(T, I)$ is less than a specified threshold (ie., $D(T, I) < d_{th}$).
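A minimal sketch of chamfer matching with a Euclidean distance transform; the use of SciPy's distance_transform_edt and the boolean edge-map inputs are our assumptions for illustration.

import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_distance(image_edges, template_edges):
    # Equation 5.1.5: average DT value of the image under the template edge pixels.
    # image_edges, template_edges: boolean edge maps of equal size, with the
    # template already transformed (pose-aligned) into image coordinates.
    dt = distance_transform_edt(~image_edges)   # distance to the nearest image edge
    return dt[template_edges].mean()

# The best match over candidate poses is the one minimising chamfer_distance;
# it is accepted when the value falls below a preset threshold d_th.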
Borgefors [13] proposed a matching method called the Hierarchical Chamfer Matching
Algorithm (HCMA) in order to reduce the computational loading. He embeds the
chamfer matching algorithm into a hierarchical structure, a resolution pyramid which
includes a number of versions of it in lower resolutions. The matching starts at a
very low resolution and the results of the low resolution matching are used to guide
higher resolution matching processes. Apart from the reduction of computational
complexity, this algorithm has also shown improved robustness against noise and
other imaging artifacts.
Gavrila [41, 42] proposed another efficient method, which also implements a coarse-to-fine approach in shape and parameter space, and incorporates a multistage segmentation. Firstly, a model template hierarchy is generated off-line, where a set of similarly shaped templates are grouped and represented by a single prototype template. Iterations of this grouping and prototype generation complete the construction of the template hierarchy. Online, the actual matching adopts the coarse-to-fine approach in terms of the template hierarchy and transformation parameters. The speed gain from this approach, in comparison to the brute-force DT formulation, is of several orders of magnitude.


5.1.5

Hausdorff Distance

The Hausdorff distance is mainly applicable to image matching, and is used in image
analysis, visual navigation of robots, computer-assisted surgery, and so on. The
Hausdorff metric serves to check if a template image is present in a test image. It is
defined as the maximum distance of a point in a set to the nearest point in another
set. Given a set of points A and another set of points B, the directed Hausdorff
distance from A to B is given as
h(A, B) = max min ka bk
aA bB

(5.1.6)

where a and b are points in A and B, respectively. This directed Hausdorff distance is
oriented, which means that h(A, B) is not equal to h(B, A). Therefore, the definition
of the Hausdorff distance between A and B (not from A to B) would be
$$H(A, B) = \max\big(h(A, B),\ h(B, A)\big).$$

(5.1.7)

Figure 5.7 provides a simple illustration of the Hausdorff distance between two ellipses. It is clear that better fitting of the two ellipses results in the smaller Hausdorff
distance. One downside of using the Hausdorff distance is its sensitivity to noisy pixels. If one set of pixels contains a single noise point which happens to be far from the points in the other set, then it will cause $H(A, B)$ to be excessively large. This sensitivity makes the classical definition of the Hausdorff distance impractical. A
more appropriate way to overcome this problem is to alter Equation 5.1.6 to
$$h(A, B) = f^{\,\mathrm{th}}_{a \in A}\ \min_{b \in B} \|a - b\|$$
(5.1.8)
where $f^{\,\mathrm{th}}_{a \in A}$ denotes the $f$-th quantile value of $\min_{b \in B}\|a - b\|$ over the set $A$, for some value of $f$ between zero and one. When $f = 0.5$, Equation 5.1.8 becomes the modified median Hausdorff distance. Huttenlocher provided a good approximation algorithm, which proved highly efficient [59]. He developed some pruning techniques that reduce


Figure 5.7: Hausdorff distance shown for two point sets of ellipses. The ellipse pair on top are better fitted, and result in the smaller H(A, B).

the running time significantly, with three speed-up techniques (ie., ruling out circles, early scan termination and skipping forward). These are used in combination in order to rule out many possible relative positions of the model and the image without having to explicitly consider them. Also, using the modified definition in Equation 5.1.8, the system's robustness against small image perturbations and missing features is improved (as this allows for partial shape matching).
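The sketch below computes the directed partial (quantile) Hausdorff distance of Equation 5.1.8 and the symmetric distance of Equation 5.1.7; the KD-tree nearest-neighbour search is an implementation convenience, not part of the original formulation.

import numpy as np
from scipy.spatial import cKDTree

def directed_partial_hausdorff(A, B, f=0.5):
    # f-th quantile of min_b ||a - b|| over all a in A (Equation 5.1.8);
    # f = 1.0 recovers the classical directed distance of Equation 5.1.6.
    distances, _ = cKDTree(np.asarray(B)).query(np.asarray(A))
    return np.quantile(distances, f)

def hausdorff_distance(A, B, f=1.0):
    # Symmetric Hausdorff distance H(A, B) of Equation 5.1.7.
    return max(directed_partial_hausdorff(A, B, f),
               directed_partial_hausdorff(B, A, f))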
Huttenlocher also conducted a Monte Carlo comparison study of the distance transform (Chamfer distance) and Hausdorff distance matching measures [58]. The algorithms were tested on synthetic images where clutter and occlusion were varied. The
performance comparisons of the DT (Chamfer) and Hausdorff measures were presented in terms of Receiver Operating Characteristic (ROC) curves that measure the
detection rate versus false alarm rate. The test result indicated that the Hausdorff
measure was better than the Chamfer measure.


Olson and Huttenlocher [98] presented a modified version of the Hausdorff measure which uses both the location and orientation of the model and image pixels in determining how well a target model matches the image at each position. To do this, the target models and images are represented as sets of oriented edge pixels. The distance term $\|a - b\|$ in Equation 5.1.8 is replaced by $\max\big(\|a - b\|,\ \lambda\|\hat{a} - \hat{b}\|\big)$, where $\hat{a}$ and $\hat{b}$ are the pixel orientations of $a$ and $b$, respectively, and the scale factor $\lambda$ is used to make the values of $\|a - b\|$ and $\lambda\|\hat{a} - \hat{b}\|$ comparable. The performance of this modified version

was tested on the synthetic images used in [58]. The ROC curves showed that the
modified version yields improved robustness against clutter and reduced false alarm
rates. Furthermore, the use of orientation information has also been shown to speed
up the recognition process.

5.1.6

Averaged Dot Product of Contour Direction Vectors

Steger [107, 108] compared performances of the distance transform, Hausdorff and
Hough Transform method against occlusion, clutter and illumination variations, and
proposed a new match metric that is robust against occlusion and clutter.
Given an image and its transformed 2-D model, the match metric at a particular
reference point q in the image is computed as an averaged dot product of contour
direction vectors of the transformed model and image over all points (ie., pi where
$i = 1 \ldots n$) of the model,
$$s = \frac{1}{n}\sum_{i=1}^{n} \frac{\langle u'_{M(i)},\ u_{I(q + p'_i)}\rangle}{\|u'_{M(i)}\|\,\|u_{I(q + p'_i)}\|}$$
(5.1.9)

where $u'_{M(i)}$ is the direction vector of the $i$th point, $p'_i$, on the transformed model, and $u_{I(q + p'_i)}$ is the direction vector of the image point whose location, measured with respect to the reference point $q$, corresponds to the transformed coordinate of $p_i$ (ie., $p'_i$). If a


transformed model is overlaid to a very dense and randomly oriented clutter image,
then as the dot products will have positive and negative values, the average will be
small (ie., $s \approx 0$). The threshold can be adjusted according to how much occlusion the
system is willing to accept. This match metric is inherently robust against occlusion
and clutter, because any missing part would not corrupt the value of the average
substantially, and the use of dot products will assign a large weighting to a pixel pair
if their directions are the same, and reduce the weighting if the pixel directions are
different. If the directions differ by more than 90°, the pixel pair is penalised.
Steger [108] demonstrates that this approach achieves high recognition rates, when
applied to flat objects in a controlled environment. This method appears to be
more adequate for controlled industrial settings, where the object is flat, its shape is
well defined, the object rotation is limited to a single axis rotation, and the model
description is precise. Therefore, the direct application of this metric to our aircraft
application would not be appropriate.
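For completeness, a small sketch of the averaged dot product metric of Equation 5.1.9; the data layout (a per-pixel direction field and integer model coordinates) is assumed for illustration and is not prescribed by the original method.

import numpy as np

def direction_match_score(model_dirs, model_pts, image_dirs, q):
    # Equation 5.1.9: average normalised dot product between the direction
    # vectors of the transformed model points and the image pixels they land on.
    # model_dirs: (n, 2) model direction vectors; model_pts: (n, 2) integer
    # transformed offsets p'_i; image_dirs: (H, W, 2) image direction field;
    # q: (row, col) reference point at which the model is placed.
    score, n = 0.0, len(model_pts)
    for u_m, p in zip(np.asarray(model_dirs), np.asarray(model_pts)):
        r, c = q[0] + p[0], q[1] + p[1]
        u_i = image_dirs[r, c]
        denom = np.linalg.norm(u_m) * np.linalg.norm(u_i)
        if denom > 0:
            score += np.dot(u_m, u_i) / denom    # cosine of the angle between them
    return score / n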

5.1.7

Discussions on Fitting Metrics

So far, we have looked at six matching metrics relevant to our application. These
metrics are based on either lines or points and have different strengths and weaknesses.
As a first step of a matching algorithm design, the feature level (point, line, curvature,
etc) at which the model-to-image matching should be carried out needs to be decided.
In this subsection, we consider some practical issues associated with such pixel and
line based matching metrics.
If one assumes that the decomposition of a model outline can lead to connected lines
that are identical to those extracted from the image, then the mapping between the
model and image line features would be one-to-one. However, such an assumption


Figure 5.8: Examples of one-to-many and many-to-many mappings: (a) one model line is mapped to many image line fragments (eg., c → {8, 9}, d → {10, 11, 12}); (b) when a curve is approximated with a series of straight line segments, the resulting mapping is likely to be many-to-many.

is not realistic as illustrated in Figure 5.8(a); object edges often appear fragmented
due to poor imaging conditions or processing inefficiencies. In this case, the model-to-image mapping needs to be one-to-many. Another example is provided in Figure
5.8(b), where a parabolic curve (eg., aircraft nose shape) is approximated by a sequence of straight line segments. As shown in the figure, the exact points at which
the curve is broken into successive segments can vary. Therefore, as is clear in Figure
5.8(b), one model line may map to two or more data line segments, just as one data
line segment may map to a number of model line segments.
According to Beveridge [5], the majority of object recognition systems presume a


one-to-one mapping, and only a few permit one-to-many mappings. Beveridge's ISPD
fitting metric (in Section 5.1.1) accommodates both one-to-many and many-to-many
mappings, and achieves reliable matching with fragmented data. Nonetheless, the
effectiveness of this method has been tested only on object shapes having straight
edges.
Line-based fittings may be effective against clutter and noise. But their performance
is inherently limited by the fact that image details that failed to form line features
would not reach the mapping stage. If an object contains curved contours, then many-to-many mapping is more appropriate but would be more difficult to implement.
On the other hand, pixel-level fittings are not subject to line-correspondence problems, and are relatively simpler to implement. These methods, however, are usually
more sensitive to clutter effects than line-based fittings are. Steger [107, 108] and Olson and Huttenlocher [98] incorporate pixel orientation information into their fitting
metrics to restrain clutter/noise pixels from contributing to match scores.
Following this discussion, our proposed fitting method is also implemented at the
pixel-level because (a) we need to match curved contours, (b) the phase image is
already available for the system, (c) image details missed in the feature extraction
stage can become available to better discriminate between similar aircraft models
(eg., F16 and F18), and (d) the fitting metric is less sensitive to polygonal approximation
of the model.


5.2

Model Generation and Pose Estimation

As shown in Figure 5.1, we consider five military jets (F16, F18, F111, F35 and Mirage) for model matching purposes. Simple three-dimensional models are constructed
by recording the vertex coordinates of 5 true scaled-down aircraft models and joining
them in a piecewise fashion as shown in Figure 5.9. This approach was the most straightforward way to obtain the (relative) 3-D dimensions of the aircraft. To check that these models are applicable to true aircraft images, they were matched against the real aircraft images in the test set, and the contour overlap was acceptable.
The blue and red contours correspond to the aircraft horizontal and vertical planes,
respectively. The horizontal plane outlines the aircraft silhouette boundaries viewed
from top, and the vertical plane represents the side view. The origin of the 3-D axis
system is set at the intersection of the wing leading edges, FP.
Given the 3-D model and the image, we define two reference systems to be used in
the viewpoint determination: an aircraft reference system (XYZ) where aircraft 3-D
coordinates are measured, and image reference system whose x and y axes lie on the
image plane, as shown in Figure 5.10. Let the X-Y plane of the object reference
frame be the plane where the co-planar aircraft wings lie. The wings exhibit lateral
symmetry about the X axis. In Figure 5.10, the vectors V1 and V2 may represent
the wing leading edges. The orientation of the aircraft reference frame with respect
to the image reference frame is expressed in terms of Euler angles: roll, pitch and
yaw.


Figure 5.9: Simplified 3-D model of an F16: blue = horizontal plane, red = vertical plane. The origin of the 3-D coordinate system is at the intersection of the wing leading edges, FP.

The three independent rotation matrices are defined in Equation 5.2.1,
$$R_\phi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{bmatrix}, \quad R_\theta = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad R_\psi = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
(5.2.1)
where $\phi$, $\theta$ and $\psi$ denote the roll, pitch and yaw angles, respectively.

The product of the three matrices, $R = R_\psi R_\theta R_\phi$, transforms the aircraft coordinates from the aircraft reference frame to the image reference frame [116]. The yaw angle, $\psi$, corresponds to the rotation of the fuselage axis about the Z axis. In Figure


Figure 5.10: Model to image projection. Translation and scaling are ignored to simplify the diagram. The x′-y′ axes are the projections of the rotated X-Y axes. Note that v′1 and v′2 can also be expressed as v1 and v2 if measured with respect to the image reference frame (ie., the x-y frame).


5.10, it is shown as the angle between the y and y′ axes. Rotating the image plane by the yaw angle, $\psi$, aligns the projected X axis of the aircraft frame (ie., the x′-axis in Figure 5.10) with the image x-axis. Let $v'_1$ and $v'_2$ denote the projections of $V_1$ and $V_2$ in the x′-y′ coordinate system, representing the wing leading edges. $V_1$ and $V_2$ can also be expressed as $v_1$ and $v_2$ if referenced to the image reference system (ie., the x-y coordinate system). The relationship between $v_1$, $v_2$ and $v'_1$, $v'_2$ can be expressed
as a simple 2-D rotation by the angle $\psi$, as shown in Equation 5.2.2,
$$v_1 = \begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix} v'_1, \qquad v_2 = \begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix} v'_2$$
(5.2.2)

If $\theta_1$ and $\theta_2$ designate the (signed) angles of $v'_1$ and $v'_2$ with respect to the x′-axis, as shown in Figure 5.10, then we can express $v'_1$ and $v'_2$ in the x′-y′ coordinate system as
$$v'_1 = k_1 \begin{bmatrix} \cos\theta_1 \\ \sin\theta_1 \end{bmatrix}, \qquad v'_2 = k_2 \begin{bmatrix} \cos\theta_2 \\ \sin\theta_2 \end{bmatrix}$$
(5.2.3)

where k1 and k2 are scale factors.


Having derived the above relationships, if $V_1$ and $V_2$ are given by $V_1 = [\alpha, \beta, 0]^T$ and $V_2 = [\alpha, -\beta, 0]^T$ in the aircraft reference frame, then after undergoing 3-D rotation, projection and scaling, their image counterparts $v_1$ and $v_2$ can be expressed


respectively as
$$v_1 = s\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} R_\psi R_\theta R_\phi \begin{bmatrix} \alpha \\ \beta \\ 0 \end{bmatrix}, \qquad v_2 = s\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} R_\psi R_\theta R_\phi \begin{bmatrix} \alpha \\ -\beta \\ 0 \end{bmatrix}$$
(5.2.4)
where $s$ is a scale factor.


Equations 5.2.4 are expanded and re-organised as shown in Equation 5.2.5,
$$v_1 = s\begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix}\begin{bmatrix} \alpha\cos\theta + \beta\sin\phi\sin\theta \\ \beta\cos\phi \end{bmatrix}, \qquad v_2 = s\begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix}\begin{bmatrix} \alpha\cos\theta - \beta\sin\phi\sin\theta \\ -\beta\cos\phi \end{bmatrix}$$
(5.2.5)
Comparing Equations 5.2.2, 5.2.3 and 5.2.5, it then follows that
$$k_1\begin{bmatrix} \cos\theta_1 \\ \sin\theta_1 \end{bmatrix} = s\begin{bmatrix} \alpha\cos\theta + \beta\sin\phi\sin\theta \\ \beta\cos\phi \end{bmatrix}, \qquad k_2\begin{bmatrix} \cos\theta_2 \\ \sin\theta_2 \end{bmatrix} = s\begin{bmatrix} \alpha\cos\theta - \beta\sin\phi\sin\theta \\ -\beta\cos\phi \end{bmatrix}$$
(5.2.6)

From Equation 5.2.6 we deduce that
$$\cot\theta_1 = F\,\frac{\cos\theta}{\cos\phi} + \sin\theta\tan\phi, \qquad \cot\theta_2 = -F\,\frac{\cos\theta}{\cos\phi} + \sin\theta\tan\phi$$
(5.2.7)
where $F = \alpha/\beta$ is the model angle cotangent of the symmetric vectors (refer to Figure 5.10).


Halving the sum and difference of the cotangents in Equation 5.2.7 leads to
$$c = \tfrac{1}{2}\,(\cot\theta_1 + \cot\theta_2) = \sin\theta\tan\phi, \qquad d = \tfrac{1}{2}\,(\cot\theta_1 - \cot\theta_2) = F\,\frac{\cos\theta}{\cos\phi}$$
(5.2.8)
The left side of Equation 5.2.8 is measurable from the image and $F = \alpha/\beta$ is a known wing parameter. After some algebraic manipulation of Equation 5.2.8, we obtain the quadratic equation
$$x^2 - Ax + R = 0$$
(5.2.9)
where $x = \cos^2\theta$, $R = d^2/F^2$ and $A = R + c^2 + 1$.


After solving Equation 5.2.9, the pitch and roll are inferred as
$$\theta = \arccos\!\left(\sqrt{x}\right), \qquad \phi = \arctan\!\left(\frac{c}{\sin\theta}\right)$$
(5.2.10)
It should be noted that two opposite solutions $(\theta, \phi)$ and $(-\theta, -\phi)$ always exist. We also assume that the pitch and roll angles are well within $[-90°, 90°]$, and that the aircraft wings are coplanar.
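The following sketch traces the pitch/roll recovery of Equations 5.2.8-5.2.10, assuming the signed wing leading-edge angles θ1 and θ2 have been measured after removing the yaw rotation, and that F = α/β is known from the model; the root selection and the use of arctan2 are our implementation choices.

import numpy as np

def pitch_roll_from_wing_angles(theta1, theta2, F):
    # Equation 5.2.8: half-sum and half-difference of the cotangents
    c = 0.5 * (1.0 / np.tan(theta1) + 1.0 / np.tan(theta2))
    d = 0.5 * (1.0 / np.tan(theta1) - 1.0 / np.tan(theta2))

    # Equation 5.2.9: x = cos^2(pitch) is a root of x^2 - A*x + R = 0
    R = d**2 / F**2
    A = R + c**2 + 1.0
    roots = np.roots([1.0, -A, R])
    candidates = [r.real for r in roots if -1e-9 <= r.real <= 1.0 + 1e-9]
    x = float(np.clip(min(candidates), 0.0, 1.0))

    # Equation 5.2.10 (one of the two opposite solutions)
    pitch = np.arccos(np.sqrt(x))
    roll = np.arctan2(c, np.sin(pitch))
    return pitch, roll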

5.3

Model and Image Alignment

Having estimated the aircraft wing orientation in the image reference frame, the model
as a set of connected 3-D points undergoes a scaled orthographic projection. This is a relatively good approximation to perspective projection, since an object like an aircraft is
not deep with respect to its distance from the camera [60]. This process is described


in Equation (5.3.1). Let $[x_i, y_i]$ be the coordinates of a point in the 2-D image and $[x_m, y_m, z_m]$ be the coordinates of the corresponding point in the 3-D model; then the transformation can be expressed as
$$\begin{bmatrix} x_i(k) \\ y_i(k) \end{bmatrix} = s\,P\,R\begin{bmatrix} x_m(k) \\ y_m(k) \\ z_m(k) \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}$$
(5.3.1)
where, as defined in Section 5.2, $P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ is the projection operator, $s$ is a scale factor, and $R = R_\psi R_\theta R_\phi$ is the direction cosine matrix (DCM). The point $[x_m(k), y_m(k), z_m(k)]^T$ is the $k$th model vertex and $[x_i(k), y_i(k)]^T$ is its image counterpart. The scale factor $s$ is estimated as
$$s = \frac{|FP - RP|_i}{|FP - RP|_m}$$
(5.3.2)
where the subscript $i$ refers to the image and $m$ refers to the projected model. After scaling, the projected model is translated by
$$[\Delta x\ \ \Delta y]^T = FP_i - FP_m$$
(5.3.3)
so that $FP_i$ and $FP_m$ are brought into coincidence. This procedure also aligns the model and image wing leading edges.
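A compact sketch of the model-to-image transformation of Equations 5.3.1-5.3.3; the rotation order follows Section 5.2, the model vertices are assumed to be stored as an (N, 3) array, and the helper names are ours.

import numpy as np

def rotation_matrix(roll, pitch, yaw):
    # Direction cosine matrix R = R_yaw R_pitch R_roll (cf. Equation 5.2.1)
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    R_roll = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    R_pitch = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    R_yaw = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return R_yaw @ R_pitch @ R_roll

def project_model(model_pts, roll, pitch, yaw, s, translation):
    # Equation 5.3.1: scaled orthographic projection of the 3-D model vertices
    P = np.array([[1, 0, 0], [0, 1, 0]])
    R = rotation_matrix(roll, pitch, yaw)
    return s * (np.asarray(model_pts) @ (P @ R).T) + np.asarray(translation)

def scale_and_translation(fp_image, rp_image, fp_model, rp_model):
    # Equations 5.3.2-5.3.3, with fp_model/rp_model being the model FP and RP
    # already projected at unit scale; the translation brings FP_i and FP_m together.
    s = (np.linalg.norm(np.subtract(fp_image, rp_image)) /
         np.linalg.norm(np.subtract(fp_model, rp_model)))
    t = np.subtract(fp_image, s * np.asarray(fp_model))
    return s, t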
The model silhouette boundary in the image is obtained by projecting both the model
horizontal and vertical planes (blue and red lines respectively in Figure 5.11(a)) onto
the image frame. A simple boundary-following algorithm is then applied to trace the aircraft outer boundary in the image.

1. Start from the aircraft nose point, where the horizontal and vertical outlines
coincide. Select the left-most path.


Figure 5.11: Generation of the transformed model silhouette: (a) silhouette tracing (start at the nose, take the left path, and turn left at each intersection); (b) resulting model silhouette.


2. Follow the left path until an intersection point (shown as green dots in Figure
5.11(a)) of the vertical and horizontal outlines is encountered.
3. At the intersection point, choose the leftmost path and repeat procedure 2.
4. Stop the process when the starting point (nose point) is reached.

Once the clockwise silhouette tracing is complete, the coordinates of the waypoints are
stored in an array. The stored points are subsequently linked in a piecewise fashion
to generate the model silhouette shown in Figure 5.11(b).
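The tracing rule above can be sketched as follows, under the assumption that the projected horizontal and vertical outlines have been converted into a small planar graph with intersection points inserted as vertices; the adjacency representation and the turn-angle test are our illustration of the leftmost-path rule.

import numpy as np

def leftmost_silhouette_trace(adjacency, points, start, first):
    # adjacency: dict {vertex index: list of neighbour indices}
    # points: (N, 2) vertex coordinates; start: nose vertex;
    # first: the neighbour chosen as the initial left-most path.
    # Returns the ordered list of vertex indices along the outer silhouette.
    def turn(d_in, d_out):
        # signed turn angle in (-pi, pi]; positive means a left turn
        cross = d_in[0] * d_out[1] - d_in[1] * d_out[0]
        return np.arctan2(cross, np.dot(d_in, d_out))

    path = [start, first]
    prev, cur = start, first
    while cur != start:
        d_in = points[cur] - points[prev]
        # take the leftmost branch; only go back the way we came at a dead end
        candidates = [n for n in adjacency[cur] if n != prev] or [prev]
        nxt = max(candidates, key=lambda n: turn(d_in, points[n] - points[cur]))
        path.append(nxt)
        prev, cur = cur, nxt
    return path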


Figure 5.12: Filtered phase map: discrete orientations are displayed in different
colours.

5.4

Proposed Fitting Metrics

In Section 5.1.7, we have looked at some practical issues associated with implementing
the line and pixel based matching techniques. There we have put forth arguments in
support of pixel level matching, as applied to aircraft recognition.
From Section 5.3, the transformed model silhouette is overlaid on the phase image,
where each pixel carries the local orientation information. Figure 5.12 shows the
phase image, where different orientations are shown in different colours. We define
the model and image points (or pixels) as
$$P_m = (x_M(m), y_M(m)) \in \mathcal{M}, \qquad I_i = (x_I(i), y_I(i)) \in \mathcal{I}$$
where $\mathcal{M}$ and $\mathcal{I}$ are the model and image pixel coordinate sets in the image plane.


The slope angles of $P_m$ and $I_i$ are given as $\theta_m$ and $\theta_i$, respectively. In practice, overlaying the transformed model outline onto its image counterpart rarely results in a
perfect coincidence. However, by applying a tapering window of width dth (eg., cosine
tapering window as shown in Figure 5.13) to each pixel of the model silhouette, the
corresponding image outline is more likely to fit, at least partially, within the windowed region. The 3-D representation of the cosine taper function along the model
silhouette boundary is shown in Figure 5.14.
The matching algorithm proceeds as follows. For each visited model point, Pm M,
we draw an orthogonal strip (or window), of length dth , and search for the closest
image point having a similar orientation as Pm on the strip (refer to Figure 5.15).
We record this image point, which we denote by If (m) , where the function f maps
the index, m, of the currently visited model point to its image counterpart. We, in
parallel, record the window value Wm , which measures the quality of fit. Hence, as is
clear from (5.4.1), which is the cosine taper function plotted in Figure 5.13, a value
of Wm = 1 corresponds to a perfect model-image pixel coincidence. A value of 0 on
the other hand, indicates the absence of an edge pixel having a similar orientation as
the model point.

$$W_m = \begin{cases} 0.5 + 0.5\cos\!\left(\dfrac{\pi d_m}{d_{th}}\right), & \text{if } d_m < d_{th} \text{ and } |\theta_i - \theta_m| < \theta_{th} \\ 0, & \text{otherwise} \end{cases}$$
(5.4.1)

where $d_m = \|P_m - I_{f(m)}\|$. After completing the model pixel tracing along $\mathcal{M}$, the resulting weights, $W_m$, are summed and normalised by the total pixel count for that model silhouette (refer to (5.4.2)). This number is the model match score associated with the current pose estimate,
$$S_m = \sum_{P_m \in \mathcal{M}} W_m \Big/ \sum_{P_m \in \mathcal{M}} 1$$
(5.4.2)
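A simplified sketch of this pixel-level match score, assuming the filtered phase map is stored as an array of orientation angles and that the π factor inside the cosine taper brings the weight to zero at d_th; orientation wrap-around is ignored for brevity.

import numpy as np

def match_score(model_pts, model_angles, edge_mask, phase, d_th, theta_th):
    # model_pts: (N, 2) integer (row, col) pixels of the projected model silhouette
    # model_angles: (N,) local slope angles of the silhouette pixels
    # edge_mask: boolean image edge map; phase: image orientation map
    height, width = edge_mask.shape
    score = 0.0
    for (r, c), theta_m in zip(model_pts, model_angles):
        normal = np.array([-np.sin(theta_m), np.cos(theta_m)])   # strip direction
        best = None
        for step in np.arange(-d_th, d_th + 1.0):                # orthogonal search strip
            rr = int(round(r + step * normal[0]))
            cc = int(round(c + step * normal[1]))
            if 0 <= rr < height and 0 <= cc < width and edge_mask[rr, cc]:
                if abs(phase[rr, cc] - theta_m) < theta_th and (best is None or abs(step) < best):
                    best = abs(step)
        if best is not None and best < d_th:
            score += 0.5 + 0.5 * np.cos(np.pi * best / d_th)     # W_m (Equation 5.4.1)
    return score / len(model_pts)                                # S_m (Equation 5.4.2)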


Figure 5.13: Proximity weight (cosine taper), ranging from 0 to 1 for each pixel pair, plotted against the pixel separation $d_m$ over the window $[-d_{th}, d_{th}]$.

Figure 5.14: Overlay of the 3-D cosine taper function along the projected model boundary. The red colour is equivalent to 1, and the blue colour in the background is equivalent to 0.


Figure 5.15: Search for the closest image pixel having a similar orientation to the current model pixel. An image pixel search strip of width $d_{th}$ (the cosine taper width) is drawn perpendicular to the model silhouette at the currently visited $m$th model pixel; the distance to the closest image pixel detected in the strip is $d_m$.

5.5

Fitting Optimisation and Best Match Finding

The estimated aircraft pose is subject to errors due to various sources. The imaging
model considered is a simplification of the full perspective projection model. The
assumption of aircraft wing coplanarity only holds approximately.
The most straightforward method to reduce the pose errors is to perturb the transformation parameters (ie., roll, pitch, yaw, translation and scale factor), until the
best match is obtained. This can be time-consuming if a brute-force approach is
adopted to determine the pose. The use of the cosine taper function (Figure 5.13)
during the match score calculation can handle a modest amount of error due to model simplification and optical distortions. Significantly larger errors are due to wing edge


Figure 5.16: Histogram of the angles between the wing leading edges (a), and histogram of the angles between the wing trailing edges (b) (x-axes: angle subtended by the wing edges, in degrees; y-axes: number of occurrences). These angles are taken from the winning aircraft hypotheses of the 300 real aircraft images.

displacements which often occur during the edge and straight line extraction processes. Such wing edge displacements cause the intersection points F P and RP to
move away from their true locations. This shift of the F P and RP points affects not
only the orientation angles but also the scale factor (Equation (5.3.2)) and translation
(Equation (5.3.3)) as well.
Usually, the point F P is located accurately due to the fact that the wing leading
edges are long and the angle between them is much less than 180° as shown in Figure
5.16(a), which shows the distribution of angles between the wing leading edges based
on a sample set of 300 real aircraft images. On the other hand, RP is more susceptible
to displacement error because the wing trailing edges are typically shorter and often
occluded by the rudder. More importantly, the angle subtended by them is relatively
closer to 180° (see Figure 5.16(b)), making the positional error of RP very sensitive
to slight edge rotations.
Figures 5.17 and 5.18 illustrate these observations. In Figure 5.17, the extracted wing


Please see print copy for Figure 5.17

Figure 5.17: Incorrectly estimated position of RP , and the resulting rotational shift
of the wing symmetry axis.

trailing edges of the winning hypothesis are short, and the right wing trailing edge
is slightly misaligned. This causes RP to shift slightly towards the nose and away
from the fuselage axis. The implication of this on the pose estimation is obvious, as
illustrated in Figure 5.18. The transformed model is slightly smaller than its image
counterpart, and shows relatively large orientation errors. This poor alignment results
in a low match score.
Knowing that RP is the dominant contributor to pose error and that perturbing all
five transform parameters (ie., roll, pitch, yaw, translation and scale factor) can be
computationally expensive, an alternative method is proposed where the perturbation
is made only to RP instead of the five parameters. Now the computational complexity
drops drastically to O(n), where n is the number of perturbed RP locations. The
perturbed RP s span a grid which is centred at the initial estimate and oriented along
the wing trailing edges. Such a configuration of the RP grid is shown in Figure 5.19.


Please see print copy for Figure 5.18

Figure 5.18: Poor outline matching due to relatively large transformation errors.

Please see print copy for Figure 5.19

Figure 5.19: Various RPs in a grid for iteratively determining the correct transform parameters.


Please see print copy for Figure 5.20

Figure 5.20: Match with the highest match score after considering all RPs in the grid.

The RP positions in the grid are iteratively used to re-estimate the pose and calculate the match score. Figure 5.20 shows the alignment achieved with the RP position in the grid which resulted in the highest match score. Numerous tests demonstrated that this approach of perturbing RP can provide a pose estimate almost as accurate as the one obtained with the five-parameter perturbation approach, at a fraction of the computational effort.

So far, the pose estimation analysis and model matching have focused on a single aircraft model. Having a set of M models would, on average, lead to an M-fold increase in processing time. However, a number of invariant shape description parameters exist, which help reduce the model set to be considered for pose calculation and model matching. These shape descriptors are listed below and are computed for all aircraft candidates in the image and for the model set.

Wing shape - boomerang, triangle and diamond. The wing shape is invariant

190

[Figure 5.21 (diagram): the winning generic aircraft hypothesis is first classified by wing shape (boomerang, triangular or diamond); checks such as "narrow wings and fuselage?" (separating large aircraft from fighter jets), "wing edges co-terminate?" (indicating a delta wing) and the fuselage-to-wing ratio FWR = ||C - FP|| / ||FP - RP|| are then used to select a shortlist of models (eg., B747, F111, F16, Mirage, F18) from the model base.]

Figure 5.21: Model Hierarchy for efficient model search.


under the generic view assumption.
Fuselage to wing ratio (FWR): which is defined as ||C - FP|| / ||FP - RP||. This ratio is approximately invariant because the nose and wing symmetry axis of most aircraft are approximately aligned.
Co-terminating leading and trailing wing edges: This feature is invariant during the imaging process and is indicative of the presence of delta wing aircraft.

As for the boomerang wing aircraft, additional checks on the wing and fuselage narrowness may assist in discriminating the large-class airplanes from fighter jets. The hierarchical process of shortlisting the model candidates for matching purposes is shown in Figure 5.21, and a small illustrative sketch of this pruning step is given below.
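Only the three descriptors and the FWR definition come from the text; the helper names, the model-base entries, their FWR intervals and the sample coordinates in the sketch below are hypothetical, and C is taken here to be the nose point.

```python
import numpy as np

def fwr(C, FP, RP):
    """Fuselage-to-wing ratio, FWR = ||C - FP|| / ||FP - RP||, with FP and RP the
    wing leading- and trailing-edge intersection points."""
    C, FP, RP = (np.asarray(p, dtype=float) for p in (C, FP, RP))
    return np.linalg.norm(C - FP) / np.linalg.norm(FP - RP)

# Hypothetical model-base entries: wing shape, whether the leading and trailing
# wing edges co-terminate (delta wing), and a nominal FWR interval per model.
MODEL_BASE = {
    "F111":   {"wing": "boomerang", "coterminate": False, "fwr": (1.0, 1.6)},
    "Mirage": {"wing": "triangle",  "coterminate": True,  "fwr": (0.8, 1.4)},
    "B747":   {"wing": "boomerang", "coterminate": False, "fwr": (1.5, 2.5)},
}

def shortlist(candidate):
    """Keep only the models whose invariant descriptors agree with the winning
    hypothesis 'candidate' (a dict using the same keys as MODEL_BASE)."""
    keep = []
    for name, m in MODEL_BASE.items():
        if m["wing"] != candidate["wing"] or m["coterminate"] != candidate["coterminate"]:
            continue
        lo, hi = m["fwr"]
        if lo <= candidate["fwr"] <= hi:
            keep.append(name)
    return keep

cand = {"wing": "boomerang", "coterminate": False,
        "fwr": fwr(C=(120, 40), FP=(80, 52), RP=(50, 60))}
print(shortlist(cand))   # -> ['F111'] for these made-up values
```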

191

Use of the RP grid as shown in Figure 5.19 can be split into two steps: coarse level
and finer level matchings. The coarse level matching is carried out using a sparse
RP grid applied on the shortlist of the selected models. The model and the position
of RP associated with the maximum match score are noted. If the match score is
very high (> 90%) then the match is accepted. However, if the match score is not
high enough, then we first check if it is distinctively higher than the remaining match
scores. If so, the model is accepted and enters the finer level matching. If not, the model-RP combinations associated with the three highest match scores are used for finer
level matching.
In the finer level matching, the RP associated with the previously found maximum
match score is used as the centre of a finer RP grid. The search for the maximum
match score is carried out by iteratively trying different RP positions in the grid. If
the maximum score exceeds a preset threshold, then the match and the associated
pose are accepted. This process is depicted in the block diagram of Figure 5.22.
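A compact sketch of this coarse-to-fine search is given below. It is an illustrative reading of Figure 5.22 only: the grid spacings, the clear-winner margin, the dictionary keys of the hypothesis and the 0.62 acceptance default are assumptions, and score_fn stands in for the pose estimation, model projection and match scoring stages, which are not defined here.

```python
import numpy as np

def rp_grid(center, along, across, spacing, half_size):
    """RP positions on a square grid centred at 'center' and oriented along the
    wing trailing edges ('along'/'across' are unit vectors, cf. Figure 5.19)."""
    offsets = spacing * np.arange(-half_size, half_size + 1)
    return [center + u * along + v * across for u in offsets for v in offsets]

def two_step_match(hypothesis, models, score_fn, accept=0.62, clear_margin=0.10):
    """Coarse-to-fine RP search in the spirit of Figure 5.22. score_fn(model, rp,
    hypothesis) is assumed to estimate the pose for the given RP, project the
    model into the image and return a match score in [0, 1]."""
    rp0 = np.asarray(hypothesis["rp"], dtype=float)              # initial RP estimate
    along, across = [np.asarray(a, dtype=float) for a in hypothesis["rp_axes"]]

    # Coarse level: sparse RP grid applied to the short-listed models.
    coarse = [(score_fn(m, rp, hypothesis), m, rp)
              for m in models
              for rp in rp_grid(rp0, along, across, spacing=4.0, half_size=2)]
    coarse.sort(key=lambda c: c[0], reverse=True)
    if coarse[0][0] >= 0.90:                                     # very high score: accept
        return coarse[0]

    # Keep the clear winner, or the model-RP combinations with the top three scores.
    if len(coarse) > 1 and coarse[0][0] - coarse[1][0] > clear_margin:
        survivors = coarse[:1]
    else:
        survivors = coarse[:3]

    # Fine level: dense RP grid centred on each surviving RP estimate.
    refined = [(score_fn(m, rp2, hypothesis), m, rp2)
               for _, m, rp in survivors
               for rp2 in rp_grid(rp, along, across, spacing=1.0, half_size=3)]
    best = max(refined, key=lambda c: c[0])
    return best if best[0] >= accept else None                   # final threshold check
```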

192

[Figure 5.22 (block diagram): the winning hypotheses (up to 5) and the shortlist of model candidates enter coarse level matching on a sparse RP grid (1. estimate pose, 2. transform model to image, 3. calculate match score). If the maximum score is at least 90%, the match is accepted; otherwise, either the clear winner or the model-RP combinations with the top three scores proceed to match fine tuning on a dense RP grid, and the resulting maximum score is checked against the threshold.]

Figure 5.22: Efficient two-step model fitting process.

193

5.6

Model Matching Results

A large set of 200 images of 5 scaled-down aircraft (ie., F16, F18, F111, F35 and Mirage) was obtained in a controlled environment where the effects of shadow, blurring, protrusion, camouflage, clutter and occlusion were introduced. The 3-D wire-frame models for these aircraft were generated by taking the dimensions of the scaled-down aircraft, and were matched to the winning aircraft hypotheses from the test images. This section presents some model matching outcomes that illustrate the system's ability to identify the viewed aircraft.
Figure 5.23(a) shows an F111 aircraft, which has shadows on its surface and on the
ground. The winning hypothesis is shown in Figure 5.23(b). Notice in this figure that
the aircraft generic recognition is not accurate because part of the cockpit is mistaken
for a nose. When model matching is applied, this error is removed and the viewed
aircraft is correctly identified as an F111, as shown in Figure 5.23(d). Figure 5.23(c)
shows the phase image where the model-to-image matching is applied.
Figure 5.24(a) is again an F111 aircraft, viewed on a grid of lines. Figure 5.24(b)
shows that the winning hypothesis is correct. The fuselage axis estimation appears
accurate, which results in a fairly accurate initial pose estimate. The best match
given in Figure 5.24(d) shows a slight boundary mismatch which is largely due to
aircraft modelling errors.
Figure 5.25(a) is an F16 aircraft which is partly obstructed by tree branches. The
background is cluttered and there is a missile protrusion on the right wing. Figure
5.25(b) shows that the aircraft is correctly recognised despite occlusion. In Figure
5.25(c), a few densely cluttered regions are cleared (eg., below the right wing where
the branches are cluttered). Excessive occlusion can degrade the model matching

194

performance as a large portion of the aircraft silhouette, not visible to the camera,
does not contribute to the match score. In this figure, however, occlusion is not
severe and the match threshold is exceeded. The transformed model, shown in Figure
5.25(d), is well aligned with the image. A slight displacement of rudder edges is mainly
due to aircraft modelling inaccuracies.
In Figure 5.26(a), a Joint Strike Fighter (JSF) aircraft is surrounded by dense clutter,
and the cockpit region is occluded. Figure 5.26(b) shows that the nose is correctly
detected and most of the aircraft parts are correctly recognised. Pixel-level matching approaches suffer when applied to densely cluttered images, because regardless
of how the model is transformed, the projected model points always find matching
image points from the densely cluttered region. In order to overcome this problem,
densely cluttered regions are filtered with the clutter removal algorithm of Section
3.3, as shown in Figure 5.26(c). The clutter removal process, combined with the incorporation of pixel orientation in the model matching algorithm (refer to Equation
5.4.1), enable a correct match as shown in Figure 5.26(d).
Figure 5.27(a) is a camouflaged Mirage aircraft with missiles under its wings. Figure
5.27(c) shows that the wing leading edges are fragmented by the missile protrusions.
In Figure 5.27(b), those fragments are successfully extended. The boundary alignment
between the winning model and the image counterpart appears to be very accurate.
Figure 5.28(a) is an F18 aircraft which has a shadow underneath it, and the background has grid lines. In Figure 5.28(b), the generic winning candidate contains the
correct parts except for the trailing edge of the left tail fin, which belongs to the grid
line fragments. Figure 5.28(d) shows good alignment in the wing edges and fuselage
outlines. The alignment is less accurate in the rear end of the aircraft and wing
tips. The wire-frame model of the F18 does not include wing tip missiles, causing a
slight misfit at the wingtip. The overall fitting of the model to its image is, however,

195

satisfactory.

5.7

Summary

In this chapter, we presented a model matching technique which aligns a simple 3-D model with the winning hypothesis in the image at the pixel level. The aircraft pose
is estimated by measuring the angles of the wing leading edges. It was shown that the
initial pose estimate can sometimes be poor when the aircraft wings are not coplanar,
and/or the extracted edges are displaced due to poor image quality. An alternative
method to fine tune the aircraft pose was proposed and found to be efficient.
For the model matching metric, we chose a pixel-based fitting because implementing it on complex shapes is less difficult than line-based matching (eg., no many-to-many line mapping is required), and image details missed during the line extraction stage can be recovered and contribute to the matching process. The orientation of the pixels in the image was incorporated into the matching algorithm in order to prevent clutter pixels from falsely contributing to the match score.
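To make this concrete, here is a minimal NumPy sketch of such an orientation-gated, pixel-level score. It is our illustration only: the function, its parameters (d_th, ang_tol) and the cosine taper are assumptions written in the spirit of the description around Equation 5.4.1, not the thesis's exact formulation, and the thesis system itself was implemented in Matlab.

```python
import numpy as np

def match_score(model_pts, model_angles, edge_map, phase, d_th=3.0, ang_tol=np.radians(20)):
    """Pixel-level boundary fit: each projected model contour pixel looks for a
    nearby image edge pixel (within d_th) whose gradient orientation agrees with
    the local model contour orientation; the contribution is tapered by a cosine
    of the distance. Returns a score in [0, 1]."""
    ys, xs = np.nonzero(edge_map)                      # image edge pixel coordinates
    if xs.size == 0:
        return 0.0
    total = 0.0
    for (x, y), a in zip(model_pts, model_angles):
        d = np.hypot(xs - x, ys - y)
        near = d <= d_th
        if not np.any(near):
            continue                                   # no image support for this pixel
        dphi = np.abs(phase[ys[near], xs[near]] - a) % np.pi
        dphi = np.minimum(dphi, np.pi - dphi)          # orientation difference mod 180 deg
        ok = dphi <= ang_tol
        if not np.any(ok):
            continue                                   # nearby pixels are clutter: no credit
        total += np.max(np.cos(0.5 * np.pi * d[near][ok] / d_th))   # 1 at d = 0, 0 at d_th
    return total / max(len(model_pts), 1)
```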
Using viewpoint invariant measurements embedded in the winning hypothesis, the model search space could be pruned to speed up the matching process. The matching algorithm has been tested using five scaled-down aircraft models, and showed promising results. The statistical analysis of the matching performance is deferred to Chapter 6.

196

Please see print copy for Figure 5.23

Figure 5.23: Model matching for F111 with shadow (match score = 64%).

197

Please see print copy for Figure 5.24

Figure 5.24: Model matching for F111 with grid clutter (match score = 66%).

198

Please see print copy for Figure 5.25 through to Figure 5.28

Figure 5.25: Model matching for F16 with occlusion and protrusion (match score =
75%).

Chapter 6
Performance Analysis
In this chapter, the system's performance is evaluated in the presence of real-world
problems, using a large test set of real images. Section 6.1 briefly describes the tuning
process of the system parameters, and the preparation of the test suite, comprising
real aircraft, non-aircraft and scaled-down aircraft images. In Section 6.2, the search
combinatorics are discussed in terms of line-grouping complexity. The effectiveness of
using the neural networks for aircraft feature detection, intensity-based constraints,
dense clutter removal and line extension algorithms on the computational savings and
the system's performance is pointed out. Section 6.3 analyses the generic recognition performance of the system in terms of true and false recognition rates. The use of a Receiver Operating Characteristic (ROC) curve gives an insight into the trade-off between recognition and false-alarm rate, and also assists in setting the score
threshold. Comparisons are made between the two ROC curves yielded by using
the rule-based and neural network based feature detection algorithms. Matching
performance of the system is presented in Section 6.4.

203

204

6.1

Implementation

The system is tested with 8-bit visual-band intensity images. The images are selected to assess the system's performance against poor image quality, shadow effects,
clutter, occlusion, camouflage and the existence of multiple aircraft. The system was
implemented using Matlab.
Initially, a representative training set of 100 real aircraft images and 60 non-aircraft
images that reflect the real world concerns, was used as a guideline for algorithm development and to fine tune the aircraft part detection and clutter rejection algorithms.
The non-aircraft images consist mainly of buildings and urban areas.
Based on the training set, the system parameters and thresholds were adjusted to
accept desirable features under degraded conditions, and to reject ambiguous features/parts. Furthermore, scores from positive evidences and penalties from negative
evidences were adjusted in order to widen the score gap between the correct and
spurious hypotheses.
We also have another version of the system that uses neural networks to extract wing, nose, wingpair and wingpair-nose association features. Training and validating the neural networks required more experimental data from real aircraft features; therefore, 200 additional aircraft images were processed to increase the experimental data size to 300. The later stages, such as evidence accumulation and ambiguity resolution, were left unchanged.
The test set consists of a total of 520 real images, comprising 220 real aircraft images,
200 scaled-down model images (for model matching) and 100 non-aircraft images.

205

Table 6.1: Comparison of the total number of lines with and without the use of the clutter removal algorithm for images with dense clutter.

image index   N_E^0 (no clutter removal)   N_E (clutter removal)   N_E / N_E^0
4                  1619                         744                   0.45
50                  741                         574                   0.77
58                 1927                         491                   0.25
59                  830                         532                   0.64
61                 1219                         566                   0.46
62                 1366                         651                   0.47
63                 1182                         750                   0.63
65                  906                         648                   0.71
70                 1058                         595                   0.56
75                 2084                         901                   0.43
85                  803                         679                   0.84
91                 1684                         618                   0.36
average            1285                         646                   0.55

6.2

Computational Complexity

The total number of lines, NE , as defined in Section 3.4.2 varies from about 40 to over
1000. Small line counts are usually obtained from aircraft images with no clutter in
the background. It was also observed that about half the images produced more than
200 lines. More challenging images containing background clutter usually generate
a large number of line fragments. If the clutter regions contain dense and randomly
oriented clutter pixels, then they can be removed by applying the proposed clutter
removal algorithm of Section 3.3. Table 6.1 demonstrates the differences that the
clutter removal algorithm makes to NE for heavily cluttered images. The use of the
clutter removal algorithm provides about 50% reduction in NE , significantly cutting
down the computational complexity of subsequent processes.
If, however, the background clutter pixels are not randomly oriented and present
relatively long segments, then NE may become very large (600-1100). The extracted

206


Figure 6.1: Number of line groupings extracted by the rule-based method: NN (blue),
NW (red), N4G (black) and NH (green) versus line count NE (x-axis).


Figure 6.2: Number of line groupings extracted by the neural network based method:
NN (blue), NW (red), N4G (black) and NH (green) versus line count NE (x-axis).


Figure 6.3: Distribution curves of the number of line groupings, NE (top left), NW
(top right), N4G (bottom left) and NH (bottom right), obtained via the rule-based
approach from the cluttered aircraft images.

lines are prioritised in terms of their length, and if the total line count is large, then
only the top 140 salient lines are labelled as significant (see Figure 3.9).
Let NS, NW, NN, N4G and NH be the number of significant lines, wing candidates, nose candidates, four-line groupings and hypotheses, respectively. The complexity of the two-line grouping generation for potential wings and noses is O(NE^2). However, since wing candidates are formed using significant lines only, the wing candidate count, NW, cannot exceed NS(NS - 1)/2, where NS = 140. Figure 6.1 shows the curves of NN, NW, N4G and NH versus NE, when the line groupings are extracted using the rule-based approach. In this plot, NW stabilises when the line count NE exceeds 140. On the other hand, NN increases approximately linearly with NE, as non-significant lines are allowed to contribute to the nose formation. NN can be approximately estimated as NN = 0.2NE, where NE is the total line count. Notice,

208

in Figure 6.1, that the nose count remains practically below 200.
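As a quick sanity check on the wing candidate bound quoted above (our arithmetic, for orientation only), with N_S = 140:

\[
  N_W \;\le\; \frac{N_S (N_S - 1)}{2} \;=\; \frac{140 \times 139}{2} \;=\; 9730,
\]

which is far above the wing candidate counts actually observed in Figure 6.1, indicating that the geometric and intensity-based constraints, and not the cap on N_S alone, are what keep N_W small.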
Four-line groupings are formed by pairing the wing candidates, which are composed of significant lines. Therefore, the computational complexity for four-line grouping generation is expressed in terms of NS as O(α NS^4), where α is in the order of 10^-4. In Figure 6.1, the N4G curve (black) remains well below the NW curve (red) (ie., N4G ≈ 0.3NW), and stops increasing as NE exceeds 140, showing that a large portion of the spurious groupings are successfully rejected in the four-line grouping process. An aircraft hypothesis is generated if a four-line grouping finds a matching nose, requiring N4G × NN computations. Therefore, the computational complexity for the aircraft hypothesis generation process would be O(β NS^4 NE), where β is less than α and is also in the order of 10^-4. In Figure 6.1, the NH curve (green) is well below the N4G curve, and remains on average around 50.
Figure 6.2 also shows NN , NW , N4G and NH as a function of NE , but this time,
the line groupings are extracted using the neural networks. This figure looks similar
to Figure 6.1, except that NW is reduced by about 40% and NN is now kept below
80 instead of linearly increasing as in Figure 6.1. This indicates that the neural
networks successfully removed some of the spurious wing and nose features that the
rule-based approach could not remove. However, the reduction in N4G for the neural
networks is less impressive, and the reduction in NH is only noticeable for NE > 600. This indicates that the rule-based reasoning may have also been effective. In any case,
it can be concluded that the neural networks reduce the computational load of the
system.
If NE > 450, then our system considers the image as being cluttered. The distribution
curves of NE , NW , N4G and NH for cluttered images are shown in Figure 6.3. The
distribution curves are obtained with the rule-based approach. These curves appear

209

roughly Gaussian, with means and standard deviations of μE ≈ 650 (σE ≈ 120), μW ≈ 400 (σW ≈ 140), μ4G ≈ 100 (σ4G ≈ 50) and μH ≈ 45 (σH ≈ 35) for NE, NW, N4G and NH, respectively. Such statistical characteristics are used as a guideline for adjusting the memory allocations for the line and line grouping databases.
The aircraft hypotheses undergo the evidence accumulation process to support or negate the hypothesis. The complexity associated with these processes is O(NH NL), where NL is the total number of extracted lines prior to the line extension process (eg., NL < NE). After the evidence accumulation process, only a portion of the hypotheses, with a score above 420, is allowed to proceed to the conflict resolution process. The complexity of the interpretational conflict resolution process is O(γ NH^2), where γ ≪ 1. This system is configured to accept up to 5 winning hypotheses to
enable multiple aircraft recognition.
For model matching, let NWH, NM, LC, WC and NRP be the number of winning hypotheses (usually 1), the number of short-listed model candidates, the contour length of the transformed model, the width of the cosine weighting function (ie., dth in Equation 5.4.1), and the number of RP locations (for pose estimate fine tuning), respectively. The worst-case complexity for the matching process is O(NWH NM LC WC NRP). The use of a hierarchical model for efficient pruning of the search space, a coarse-to-fine grid approach for locating RP, and a more accurate model representation to allow for a narrower WC can reduce the complexity. The computational complexity is summarised in Table 6.2.
For any image understanding task which relies largely on edge features, it is important that lines belonging to the aircraft structure present a sufficient length. Often
in practice, such lines become fragmented and missed out after the line extraction
process. There are numerous reasons for such fragmentation problems: physical discontinuations due to protrusions, occlusion and wing flaps, and shortcoming of the

210

Table 6.2: Computational complexity of aircraft recognition and identification processes.

Process                      Complexity              Comments
2-Line Grouping Formation    O(NE^2)                 NE: number of total lines
4-Line Grouping Formation    O(α NS^4)               NS: number of significant lines; α is in the order of 10^-4
Hypothesis Generation        O(β NS^4 NE)            β < α and is also in the order of 10^-4
Evidence Accumulation        O(NH NL)                NH: number of hypotheses; NL: number of lines before extension
Ambiguity Resolution         O(γ NH^2)               γ ≪ 1
Matching                     O(NWH NM LC WC NRP)     NWH: number of winning hypotheses; NM: number of model candidates; LC: transformed contour length; WC: pixel search width; NRP: number of RP locations

edge detection algorithm due to noise, clutter surrounding the wing edges, wing camouflage and blurring. The system implements the line extension algorithm in order
to enhance the survivability of the wing edge lines. Figure 6.4 shows a plot of the
number of extended lines (ie., NE - NL) versus the total number of unextended lines, NL. The value of NE - NL increases approximately linearly with NL. The slope of
the line is roughly 0.15, which indicates that the line extension algorithm increases
the total line count by 15%.
In order to determine how effective the line extension algorithm is, the line images were examined to count the occurrences of wing edge fragmentation that, upon visual inspection, should be extended. The total count was 495. This was followed by counting how many of them were successfully extended by the line extension algorithm: 409 of the 495 fragmented wing edges were extended, resulting in a recovery rate of 82.6%. All of the extended wing edges are labelled

211

Figure 6.4: Plots of total line counts. The curve represents the number of the extended
lines as a function of the unextended lines (prior to the line extension process).

as significant, and have a better chance of surviving in the higher level processes. A
recovery rate above 80% at the expense of a 15% overhead is satisfactory.
For two-line groupings (or wing candidates), the sole use of line-based constraints
has shown its limitations in keeping down the number of wing candidates. It was
discussed earlier in this section that the total number of two-line groupings is maintained reasonably small, despite a large increase in the line count (see Figure 6.1).
Apart from the fact that NS is limited below 140, the use of intensity-based information also contributes to a significant reduction in the two-line grouping count. To
support this argument, the two-line grouping process (using the rule-based approach)
was repeated on the same image set, but this time the intensity check routine was
disabled. The result is shown in Figure 6.5, where the rule-based approach is used.
The black curve represents the number of two-line groupings without the intensity
check, as opposed to the red curve obtained with the intensity check. For NE > 250,

212


Figure 6.5: Plot of NW curves obtained from real aircraft images, using the rule-based
two-line grouping extraction algorithm. The red and black curves represent NW with
and without intensity checks, respectively.

the red curve no longer increases, but the black curve continues to increase, and the
gap between the two curves widens slowly.
The two-line grouping count gap is more evident with non-aircraft cluttered images.
Figure 6.6 illustrates this point, and shows an average count separation of about 300
for line counts exceeding 600. The use of intensity-based information almost halved
the number of two-line groupings formed.
In the four-line grouping formation process, the intensity check was not necessary
because an increased number of geometric constraints were available. However, the
intensity information played an important role in the evidence accumulation and
shadow discrimination stages.

213


Figure 6.6: Plot of NW curves obtained from non-aircraft clutter images, using the
rule-based two-line grouping extraction algorithm. The red and black curves represent
NW with and without intensity checks, respectively.

6.3

Generic Recognition Performance

For a statistical analysis of the recognition performance, a batch run was prepared using 220 real aircraft images to test the system's capability to recognise the aircraft in the image. Another batch-mode trial was developed to test the reliability of the system in declaring no detection when the image does not contain an aircraft. A total of 100 non-aircraft (clutter) images was used for this test.
We first define the performance indicators as below; a short sketch of how these rates trace out an ROC curve over a range of thresholds is given after the definitions.

TP (True Positive): A correct hypothesis is accepted (score ≥ threshold).
FN (False Negative): A correct hypothesis is rejected (score < threshold).
FP (False Positive): A spurious hypothesis is accepted (score ≥ threshold).
TN (True Negative): A spurious hypothesis is rejected (score < threshold).
RR (Recognition Rate): TP/(TP + FN), where (TP + FN) = total number of aircraft in the image set.
FAR (False Alarm Rate): FP/(FP + TN), where (FP + TN) = total number of false hypotheses.
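The sketch below shows how these two rates are computed as the score threshold is swept; the helper and the hypothesis scores are our own toy values, not the thesis's evaluation code, and the threshold range simply mirrors the one used in Table 6.3.

```python
import numpy as np

def roc_points(correct_scores, spurious_scores, thresholds):
    """For each threshold, RR = TP/(TP + FN) over the correct hypotheses and
    FAR = FP/(FP + TN) over the spurious ones, as defined above."""
    correct = np.asarray(correct_scores, dtype=float)
    spurious = np.asarray(spurious_scores, dtype=float)
    points = []
    for th in thresholds:
        rr = np.mean(correct >= th)        # fraction of correct hypotheses accepted
        far = np.mean(spurious >= th)      # fraction of spurious hypotheses accepted
        points.append((th, rr, far))
    return points

# Toy hypothesis scores (illustrative values only, not data from the thesis).
correct_scores = [612, 585, 574, 560, 548, 533, 529, 516, 495, 470]
spurious_scores = [530, 512, 498, 476, 455, 430]
for th, rr, far in roc_points(correct_scores, spurious_scores, range(490, 601, 10)):
    print(f"threshold {th}: RR = {rr:.2f}, FAR = {far:.2f}")
```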

The setting of the threshold is application dependent. The user has to weigh up the
rate of false alarm in light of the available resources and the intended application
objectives. As an example, if it is judged that declared aircraft detections can, at a
low cost, be confirmed, by either taking and processing other subsequent images of
the same scene or having a human operator examine the detections, then the score
threshold may be lowered to reduce the misses. Use of a receiver operating characteristic (ROC) curve provides a convenient way to visualise the trade-off between
the recognition and false alarm rates (ie., RR versus FAR) for all possible threshold
settings, and reflects the system performance. As mentioned in Section 3.8, the system utilises two approaches to extract wing, nose, wing-pair and aircraft candidates
- one is rule-based and the other one uses the neural networks. The effects of the two
approaches on the overall performance are compared in terms of the ROC curves as
shown in Figure 6.7. The red and blue colours are respectively associated with the
rule-based and neural network based approaches. The area under the ROC curves
is larger for the neural network approach, suggesting that the neural networks generate fewer spurious hypotheses while maintaining or improving the recognition rate. The gaps between the curves are more noticeable near the operating point (eg., FAR < 10%), and diminish as the FAR increases. Table 6.3 shows the recognition performance for 12 different thresholds. In the table, the neural network based approach
usually generates a smaller false alarm rate for a given recognition rate. For a false

215


Figure 6.7: ROC curves for the generic recognition of aircraft. The red curve is
obtained when the rule based method is used for the extraction of line-groupings and
the blue curve is obtained using the neural networks.

alarm rate of about 7%, a recognition rate of 84% is achieved when the neural networks are used. With the rule based approach, the recognition rate drops to 81%,
which is also acceptable.
Table 6.4 gives a performance breakdown in terms of image category. The image
categories are blurring, camouflage, clutter, multiple targets, normal environment,
occlusion, protrusion and shadow. The performance was roughly consistent across the
image categories. However, comparison of recognition rates among these categories
is not accurate as the number of samples in each category is not large enough.

216

Table 6.3: Performance evaluation using real aircraft and clutter images.

Threshold   CRR (Rule)   FAR (Rule)   CRR (NN)   FAR (NN)
490         87.7%        30.6%        86.1%      9.4%
500         87.3%        27.5%        85.7%      8.4%
510         86.5%        23.8%        85.3%      7.5%
520         85.9%        23.1%        84.9%      7.4%
530         85.7%        21.3%        84.1%      6.8%
540         84.9%        16.8%        83.3%      6.7%
550         84.7%        15.0%        82.2%      6.3%
560         83.6%        12.5%        80.5%      5.0%
570         82.0%        8.7%         79.2%      3.1%
580         81.2%        6.8%         77.5%      1.3%
590         80.2%        5.0%         77.5%      1.3%
600         78.1%        3.8%         76.2%      0.7%

Table 6.4: Recognition rates in the eight imaging categories. Note that for the multiple aircraft category, the denominator 42 is the total count of aircraft in 17 multiple aircraft images.

        blur      camouflage   clutter    multiple
TP      21/24     19/23        29/35      34/42
%       (88%)     (83%)        (83%)      (81%)

        normal    occlusion    protrusion shadow
TP      32/35     29/32        29/31      20/23
%       (91%)     (91%)        (94%)      (87%)

217

6.4

Matching Performance

This second part of the evaluation focuses on aircraft identification. The image set comprises 200 images of scaled-down models representing the F16, F18, F111, F35 and the Mirage. The matching algorithm based on these models was also tested on real aircraft images, and the contour matching was acceptable. However, not enough samples of those images (spanning various viewing angles) could be acquired for a statistical analysis of the pose estimation and matching performance. Therefore, 5 scaled-model aircraft were built and photographed under various viewing angles, blurring, contrast, clutter, occlusion, shadow and camouflage. This enabled us to control the degradations and generate various viewing angles for a rigorous pose estimation test. Furthermore, camouflage could be applied at will. Note that the degradation is not as severe as in Section 6.3, so that a large number of true hypotheses could be made available for the model matching.
After the generic recognition, 190 out of 200 aircraft were successfully recognised
with 5 counts of false alarms from the image background. The correct 190 winning
candidates were then subjected to model matching. Viewpoint invariant quantities,
such as the wingpair shape and the FWR ratio (FWR = ||C - FP|| / ||FP - RP||),
are used to prune the model search space. After estimating the pose of each model
candidate, a match score is computed. It is then followed by fine-tuning of the pose
to find the best match. The model candidate generating the highest match score is
regarded as the matching model, and the match score is recorded.
An additional set of 82 spurious high scoring hypotheses, from the real aircraft and
non-aircraft image sets, is also subjected to model matching. The objective is to
analyse the model matching performance against coincidental line groupings that
appear like an aircraft.

218


Figure 6.8: Model match score: Correct match (blue asterisk) and false match (red
circle or red cross). A red circle represents a correct aircraft hypothesis matched to
a wrong model. A red cross represents a spurious aircraft hypothesis matched to one
of the models.

Figure 6.8 shows true and false match scores. A blue asterisk corresponds to correct
match. A red cross corresponds to an incorrect match obtained with the spurious
hypotheses as input. A red circle corresponds to a mismatch between an aircraft
hypothesis and a wrong model. Overall, we obtained 182 correct matches (blue asterisks) and 82 incorrect matches (red crosses + red circles). They occupy two distinct
regions, with a slight overlap in-between (around the score of 60%). Different thresholds were experimented with and the results are given in Table 6.5, using the following
performance indicators.
TP(True Positive): correct match with a score above threshold.
FN(False Negative): correct match with a score below threshold.
FP(False Positive): incorrect match with a score above threshold.
TN(True Negative): incorrect match with a score below threshold.

219


Figure 6.9: ROC curve: trade off between true and false match rates as the threshold
varies.

TMR (True Match Rate): TP/(TP + FN)
FMR (False Match Rate): FP/(FP + TN)

Figure 6.9 shows an ROC curve for the model matching performance. An operating point around an FMR of 0.1 corresponds to a threshold of about 60% (refer to Table 6.5). Setting the threshold to 55% yields a very high FMR of 43.9%. Raising it to 65% significantly decreases the TMR. A threshold of 62% seems to be appropriate, as it achieves about 7% FMR while maintaining about 90% TMR.

Table 6.5: Matching performance parameters.

Threshold   True Match Rate   False Match Rate
55%         98.4%             43.9%
57%         97.8%             32.9%
59%         94.0%             18.3%
60%         92.9%             12.2%
61%         92.4%             9.8%
62%         89.1%             7.3%
63%         87.5%             7.3%
65%         74.5%             6.1%

6.5

Qualitative Performance Comparisons with Other Systems

A head-to-head comparison with other methods is not possible, as it requires access to the code and benchmark test images of the previous works. Therefore, we provide an indicative performance comparison based on the published works [36, 15, 123, 56, 32].
A summary is given in Table 6.6. Methods using global features such as moment or
Fourier Descriptor invariants almost always treat the object of interest in isolation
from the background (i.e., perfect segmentation), and are hence not adequate in real-world applications where various forms of image degradation exist. A very recent publication by Hsieh et al. [56] proposes feature integration of 4 global methods (ie. bitmaps,
wavelet coefficients, Zernike moments and distance map) to capture the different
shape characteristics of an aircraft. For best performance, however, this method
assumes that the aircraft fits tightly within the image and that no other features exist
in the background that may interfere with the aircraft identification process. This
is typically feasible if an operator does the required pre-processing manually before
handing over the processed image to the identification algorithm. Furthermore the
input images are limited to satellite imagery of parked aircraft obtained from top
view. Our system handles oblique views as well as images of flying airplanes. The
only requirement for our system is that enough image resolution is available to detect
wing edges and other aircraft features.
Another widely known vision system that has been applied to aircraft recognition is
ACRONYM [16]. Despite the structural elegance of this system, it is not suitable for

221

Table 6.6: Performance expectations of other methods, such as M.I (moment invariant) [36, 15], F.D (Fourier Descriptor) [123], Das et al. [32] and Hsieh et al. [56], under different imaging conditions. The question mark means maybe.

method     blur   camo   clut   mult   occl   prot   shad
M.I        Yes    No     No     No     No     No     ?
F.D        Yes    No     No     No     Yes    No     ?
Hsieh      Yes    ?      No     No     No     Yes    Yes
DAS        Yes    No     ?      No     Yes    Yes    Yes
Proposed   Yes    Yes    Yes    Yes    Yes    Yes    Yes

real-world applications unless significant enhancements and modifications are made


[32]. Das et al. [32] emphasise the need for a more practically oriented approach like
ours, and demonstrate a system that addresses the real-world concerns. One limitation of their system, however, is its over-reliance on region segmentation which would
not suit images of camouflaged aircraft. After examining their system description, it
also remains unclear how their system would behave in the presence of heavy clutter
(the shown test images only contain a modest amount of clutter) or if more than
one aircraft exists in the image. Our system is more capable of handling multiple
aircraft images, excessive clutter, occlusion of the wings by the rudder or rudders and
camouflage.

6.6

Concluding Remarks

An extensive test suite of 520 real aircraft, non-aircraft and scaled-down aircraft images is used to cover a broad spectrum of image variations and to test the system performance. The performance analysis is carried out in terms of investigating various aspects of computational complexity, generating the ROC curves, and providing a descriptive performance comparison with other methods.

222

The system handles real-world issues adequately by implementing the hypothesise-then-verify paradigm, where aircraft recognition decisions are made through a voting (evidence accumulation) scheme. The system takes notice of distinctive image features of aircraft and clutter at various levels of processing, and implements a number of geometric and intensity-based heuristics to discriminate between aircraft and clutter. A further enhancement to the system was the integration of neural networks, which proved to be a promising replacement for the heuristics in terms of computational savings and an improved ROC curve.
While the system is mainly driven by line features, the use of intensity based information was essential in widening the score gap between true and false aircraft candidates.
Furthermore, by progressively building up higher level features, the system was able
to keep the combinatorics under control. The pixel-level boundary fitting approach
displayed a consistently good model matching performance, in the presence of clutter
and occlusion.
The statistical analysis of the system's generic recognition and identification performance produced promising results. The recognition performance was consistent across the imaging categories (refer to Table 6.4). From the ROC curves, true and false recognition rates of 84% and 6.8%, and true and false matching rates of about 90% and 8%, could be achieved. We find this result satisfactory.

Chapter 7
Conclusions
7.1

Summary

In this thesis, we present a knowledge-based approach for the generic recognition and
identification of aircraft in complex real-world imagery. The difficulties associated
with real-world imagery are occlusion, shadow, cloud, low image intensity contrast,
clutter, camouflage and flares.
The developed vision system is a rule-based system, which uses a voting scheme to reach a decision regarding the presence and location of an aircraft in an image. Rules in this system mainly exploit the geometric relationships that hold within and between aircraft parts. Image intensity information is also used to increase the system's confidence in determining the aircraft parts and recognising the whole aircraft.
This system starts by detecting edges in an image and forming straight line features.
The extraction of these low-level features is achieved after dual thresholding, contour generation, clutter removal and line extension. Such primitive features are then
grouped in an incremental fashion to build more complex feature associations (eg.,
223

224

nose, wing pairs, tail fins, etc.). These feature groups eventually lead to the generation of a number of competing aircraft hypotheses, each of which is allocated a
confidence score (or vote), reflecting the degree of conformity to the aircraft generic
structure. Such a gradual build-up of complex feature associations requires an intensive tuning process for the system parameters and thresholds. Neural networks are an attractive solution to this problem and could improve the system's robustness.
Votes in this system are allocated proportional to the importance of the aircraft part
under consideration. The major components of an aircraft hypothesis are a wing
pair, a matching nose and a fuselage section. Due to their importance in the aircraft
recognition process, large voting scores are allocated to these parts. Other minor
scores are left to the less critical evidences arising from the wing tip and tail fin parts.
This system also makes use of negative evidences to penalise aircraft candidates that
contain contradicting features.
Although the recognition part of the system cannot provide the identity of the viewed
aircraft, it provides a broad classification of it in terms of wing shape. Aircraft
identification, however, was achieved through model matching using a model set of 5
fighter aircraft. The recorded correct identification rate was about 90% despite the
fact that the models used were simple wire frame representations of the aircraft in
the horizontal and vertical planes.

7.2

Discussion

This vision system is able to achieve a high aircraft recognition rate (> 80%) provided
that (a) the image intensity contrast is not extremely low in the regions of aircraft
wing edges and nose, (b) some of the aircraft line features are of sufficient length and
(c) the aircraft view is not so oblique that the wings are no longer clearly visible.

225

The first point needs to be further clarified by stressing that the contrast in areas
of interest (ie. wings and nose) should not be much lower than the largest contrast
recorded in the background.
The overall robustness of this aircraft recognition system is the result of the successful
integration of a number of features, which are summarised below.

1. Use of reasonably large edge detection templates and application of low dual
thresholds (as low as 16% and 10% of the peak edge gradient) for the purpose
of enhancing detection sensitivity to long straight edges of low gradients.
2. Extraction of long lines by extending shorter collinear lines; the objective of
this procedure is to join broken wing edges with reduced effect on lines in the
background.
3. Low level processing of the image background, including (a) identifying lines
predominantly oriented along one or two directions, and (b) removing dense
clutter pixels displaying random orientations.
4. Line organisation according to their significance, endpoint proximity and collinearity to selectively use them to improve the system robustness and computational
efficiency.
5. The direct and indirect use of intensity information to supplement the geometric
reasoning. This improves the system's capability to discard spurious hypotheses
arising from clutter.
6. Validation and identification of the recognised aircraft via the pixel-level model
matching, which is applied to the phase image to reduce clutter interference.

The first two robustness features address the problem of detecting weak edges in the

226

image and forming longer lines. In the actual system implementation, all thresholds
in the line extraction routines were relaxed to ensure that desirable wing edges are
not missed. This requirement, however, causes a large number of unwanted extended
lines to occur, and therefore leads to the emergence of a large number of spurious line
groupings. These groupings, however, are most often discarded at the higher level of
generic aircraft recognition.
The third robustness feature addresses the interference problem of a cluttered background with the aircraft recognition process. By removing most or part of the clutter,
we considerably reduce the number of edges in the image. This in turn, reduces the
overall number of straight lines and therefore improves the ranking of wing edges in
terms of length (ie., wing edges become significant in the image). Furthermore, we
extend our background processing to longer polarised or grid background lines that
may reduce the saliency of aircraft boundary lines. In this case, instead of removing
the polarised/grid lines, we lower their saliency (significance) ranks in the line organisation/selection process (the fourth feature above), so that the desired line features
are successfully accepted in the line grouping process. As explained in Chapter 3, this line selection process eventually leads to improvements in the system's performance robustness and computational efficiency.
We reiterate the importance of successfully detecting critical features (ie., nose/wing
edges) to ensure the generation of the aircraft hypothesis (ie., wing-nose association)
from them. Such a robust lower level processing is crucial in any bottom-up vision
system. Therefore, the conditions and thresholds used in the low level stages and line
grouping generation of Chapter 3 are made forgiving. Furthermore, another version of
the system is developed that incorporates the neural networks in place of the feature
extraction rules. This has been shown to reduce the computational load and the false alarm
rate.

227

Our model matching method implements a pixel-level contour matching, which allows image pixels missed out in the line features to contribute to the final match score, hence improving the system's capability to discriminate between similarly shaped aircraft. The inherent clutter problem in pixel-level matching is overcome by disregarding any mapped image pixels that have conflicting phase values.

7.3

Suggestions for Future Work

As future work, we suggest investigating different ways to draw more information from image intensity or texture. Intensity or texture based reasoning can be very effective in further specifying aircraft parts and distinguishing them from the surrounding background. There are some recent segmentation/edge detection methods [24, 129] that could be useful in cases where the aircraft boundaries cannot be defined by gradient or appear too smooth. One in particular is the active contour model proposed by Chan and Vese [24], which is based on Mumford-Shah segmentation techniques and
the level set method. Their experimental results demonstrate some advantages: the
ability to detect smooth boundaries, scale-adaptivity, automatic change of topology
and robustness against excessive noise. Adopting this approach could assist or replace
our edge detector when the aircraft boundaries appear very blurry or surrounded by
dense clutter.
Use of the neural networks could extend to the evidence collection stage, where the
positive and negative evidences form an input to the neural network that generates
a normalised evidence score in the range of [0,1]. Such an upgrade may improve the
system performance, and the intensive tuning process for the evidence scores and
penalties can be significantly alleviated.
As far as matching is concerned, a full volumetric model is expected to improve

228

aircraft identification and discrimination between resembling aircraft.


This system can also be expanded to accommodate aircraft recognition from multiple
views. Aircraft primitive features, which may appear weak or occluded in some views,
are likely to be detectable in others. Integrating these features across all images and
at all levels leads to improved recognition performance [93, 95, 67].

Bibliography
[1] N. Ayache and O.D. Faugeras. hyper: A new approach for the recognition
and positioning of 2d objects, IEEE Transactions on Pattern Analysis and
Machine Intelligence, 8(1):4454, January 1986.
[2] E. Bala and A.E. Cetin, computationally efficient wavelet affine invariant
functions for shape recognition, IEEE Transactions on Pattern Analysis and
Machine Intelligence, 26(8):10951099, August 2004.
[3] D.H. Ballard, generalizing the hough transform to detect arbitrary shapes,
In Real-Time Computer Vision, pages 714725, 1987.
[4] M. Bennamoun, edge detection: Problems and solutions, IEEE Transactions
on Systems, Man, and Cybernetics, Computational Cybernetics and Simulation, 4:31643169, 1997.
[5] J.R. Beveridge, Local Search Algorithms for Geometric Object Recognition:
Optimal Correspondence and Pose, Phd thesis, University of Massachusettes,
Amherst, May 1993.
[6] J.R. Beveridge and E.M. Riseman, how easy is matching 2d line models using
local search?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6):564579, June 1997.
229

230

[7] B. Bhanu and Holben R.D, model based segmentation of flir images, IEEE
Transactions on Aerospace Electronic System, 26(1):465491, 1998.
[8] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford: Univeristy
Press, Inc., New York, USA, 1995.
[9] C. Bjorklund, M. Noga, E. Barrett, and Kuan D, lockheed imaging technology research for missiles and space, In Proceedings of the DARPA Image
Understanding Workshop, pages 332352, Palo Alto, CA, May 1989.
[10] M. Boldt, R. Weiss, and E.M. Riseman, token-based extraction of straight
lines, Transactions on Systems, Man and Cybernetics, 19(6):15811594, 1989.
[11] R.C. Bolles and R.A. Cain, recognising and locating partially visible objects:
The local feature focus method, Internat. J. Robot. Res., 1(3):5782, 1982.
[12] G. Borgefors, distance transformations in arbitrary dimensions, Computer
Vision, Graphics, and Image Processing, 27(3):321345, September 1984.
[13] G. Borgefors, hierarchical chamfer matching: A parametric edge matching
algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence,
10(6):849865, November 1988.
[14] R. D. Boyle and R. C. Thomas, Computer Vision A First Course, Blackwell
Science Publications, 1988.
[15] M.G. Breuers, image-based aircraft pose estimation using moment invariants,
In SPIE Conference on Automatic Target Recognition IX, pages 294304, Orlando, Florida, April 1999.
[16] R.A. Brooks,

symbolic reasoning among 3-dimenstional models and 2-

dimensional image, Artificial Intelligence, 17:285349, 1981.

231

[17] R.A. Brooks,

model-based three-dimensional interpretations of two-

dimensional images, Real-time Computer Vision, pages 360370, 1987.


[18] R.A. Brooks and T.O. Binford, geometric reasoning in acronym, In Proceedings of the DARPA Image Understanding Workshop, pages 4854, 1979.
[19] R.A. Brooks, R. Greiner, and T.O. Binford, the acronym model-based vision system, In Proceedings of International Joint Conference on Artificial
Intelligence, pages 105113, 1979.
[20] J.B. Burns, A.R. Hanson, and E.M. Riseman, extracting straight lines, IEEE
Transactions on Pattern Analysis and Machine Intelligence, 8(4):425455, July
1986.
[21] J. Canny, a computational approach to edge detection, IEEE Transactions
on Pattern Analysis and Machine Intelligence, 8(6):679698, November 1986.
[22] O. Carmichael and M. Hebert, shape-based recognition of wiry objects, IEEE
Transactions on Pattern Analysis and Machine Intelligence, 26(12):15371552,
December 2004.
[23] T.J. Cham and R. Cipolla, symmetry detection through local skewed symmetries, Image and Vision Computing, 13(5):439450, June 1995.
[24] T.F. Chan and L. A. Vese, active contours without edges, IEEE Transactions
on Image Processing, 10(2):266277, February 2001.
[25] Z. Chen and S.Y. Ho, computer vision for robust 3d aircraft recognition with
fast library search, Pattern Recognition, 24:375390, 1991.
[26] D. Chetverikov, a simple and efficient algorithm for detection of high curvature
points in planar curves, In Proceedings of Computer Analysis of Images and
Patterns, pages 746753, 2003.

232

[27] C.H. Chien and J. K. Aggarwal, shape recognition from single silhouettes,
IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(3):481
490, 1981.
[28] M. Clark, A.C. Bovik, and W.S. Geisler, texture segmentation using gabor
modulation/demodulation, Pattern Recognition Letters, 6:261267, 1987.
[29] S. Climer and S.K. Bhatia, local lines: A linear time line detector, Pattern
Recognition Letters, 24(14):22912300, October 2003.
[30] R.W. Curwen and J.L. Mundy, constrained symmetry exploitation, In Image
Understanding Workshop, pages 775781, 1998.
[31] R.W. Curwen, C.V. Stewart, and J.L. Mundy, recognition of plane projective
symmetry, In Proceedings of IEEE International Conference on Computer
Vision, pages 11151122, 1998.
[32] S. Das and B. Bhanu, a system for model-based object recognition in perspective aerial images, Pattern Recognition, 31:465491, 1998.
[33] S. Das, B. Bhanu, Wu X., and R.N. Braithwaite, Qualitative Recognition of
Aircraft In Perspective Aerial Images, chapter in Advanced Image Processing
and Machine Vision, pages 475517, Springer-Verlag, 1996.
[34] L.S. Davis and T.C. Henderson, hierarchical constraint processes for shape
analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence,
3(3):265277, 1981.
[35] H. Derin and H. Elliott, modelling and segmentation of noisy and textured
images using gibbs random fields, IEEE Transactions on Pattern Analysis and
Machine Intelligence, 9(1):3955, January 1987.

233

[36] S.A. Dudani, K.J. Breeding, and R.B. McGhee, aircraft identification by
moment invariants, IEEE Transactions on Computers, 26(1):3946, January
1977.
[37] P.T. Fairney and D.P. Fairney, 3-d object recognition and orientation from
single noisy 2-d images, Pattern Recognition Letters, 17(7):785793, June 1996.
[38] S.A. Friedberg, finding axis of skewed symmetry, In Proceedings of IEEE
International Conference on Pattern Recognition, pages 322325, 1984.
[39] K.S. Fu, syntactic pattern recognition and applications, Pattern Recognition,
12(6):431441, 1980.
[40] K.S. Fu, Syntactic Pattern Recognition and Applications, Prentice Hall, New
Jersey, 1982.
[41] D. Gavrila, multi-feature hierarchical template matching using distance transforms, In Proceedings of IEEE International Conference on Pattern Recognition, 1998.
[42] D. Gavrila and V. Philomin, real-time object detection for smart vehicles,
In Proceedings of IEEE International Conference on Computer Vision, pages
8793, 1999.
[43] D.M. Gavrila and F.C.A. Groen, 3-d object recognition from 2-d images using
geometric hashing, Pattern Recognition Letters, 13(4):263278, April 1992.
[44] T. Glais and A. Andre, image-based air targt identification, In SPIE conference on Applications Of Digital Image Processing XVII, volume 2298, 1994.
[45] J.W. Gorman, O.R. Mitchell, and F.P. Kuhl, paritial shape recognition using
dynamic programming, IEEE Transactions on Pattern Analysis and Machine
Intelligence, 10(2):257266, 1988.

234

[46] U. Grenander and A. Srivastava, probability models for clutter in natural


images, IEEE Transactions on Pattern Analysis and Machine Intelligence,
23(4):424429, April 2001.
[47] W.E.L. Grimson and D.P. Huttenlocher, on the sensitivity of geometric hashing, In Proceedings of IEEE International Conference on Computer Vision,
pages 334338, 1990.
[48] W.E.L. Grimson and D.P. Huttenlocher, on the sensitivity of the hough transform for object recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(3):255274, March 1990.
[49] W.E.L. Grimson, D.P. Huttenlocher, and T.D. Alter, recognizing 3d objects
from 2d images: An error analysis, In MIT AI Memo, 1992.
[50] S. Haykin, Neural Networks, Prentice Hall, 2nd edition edition, 1999.
[51] Y.C. Hecker and R.M. Bolle, on geometric hashing and the generalized hough
transform, IEEE Transactions on Systems, Man, and Cybernetics, 24(9):1328
1338, September 1994.
[52] H. Hmam and J. Kim, automatic aircraft recogntion, Proceedings of SPIE on
Battlespace Digitization and Network-Centric Warfare II, 4741:229238, 2002.
[53] H. Hmam and J. Kim, aircraft recognition via geometric reasoning, Proceedings of SPIE on Battlespace Digitization and Network-Centric Warfare II,
5094:374385, 2003.
[54] K. Hornik, M. Stinchcombe, and H. White, multilayer feedforward networks
are universal approximators, Neural Networks, (2):359366, 1989.

235

[55] J.W. Hsieh, J.M. Chen, C.H. Chuang, and K.C. Fan, novel aircraft type recognition with learning capabilities in satellite images, In Proceedgins of IEEE
International Conference on Image Processing, pages III: 17151718, 2004.
[56] J.W. Hsieh, J.M. Chen, C.H. Chuang, and K.C. Fan, aircraft type recognition
in satellite images, In IEE Proceedings on Vision, Image and Signal Processing,
volume 152, pages 307315, June 2005.
[57] M.K. Hu, visual pattern recognition by moment invariants, IRE Transations
on Information Theory, 8:179187, Febrary 1962.
[58] D.P. Huttenlocher, monte carlo comparison of distance transform based matching measures, In Proceedings of the DARPA Image Understanding Workshop,
pages 11791184, 1997.
[59] D.P. Huttenlocher, G.A. Klanderman, and W.J. Rucklidge, comparing images
using the hausdorff distance, IEEE Transactions on Pattern Analysis and
Machine Intelligence, 15(9):850863, September 1993.
[60] D.P. Huttenlocher and S. Ullman, recognizing solid object by alignment with
an image, International Journal of Computer Vision, 5(2):211, 1990.
[61] J. Illingworth and J.V. Kittler, a survey of the hough transform, Computer
Vision, Graphics, and Image Processing, 44(1):87116, October 1988.
[62] Q. Iqbal and J.K. Aggarwal, retrieval by classification of images containing large manmade objects using perceptual grouping, Pattern Recognition,
35(7):14631479, July 2002.
[63] J.H. Jang and K.S. Hong, Fast line segment grouping method for finding globally more favorable line segments, Pattern Recognition, 35(10):2235–2247, October 2002.


[64] H. Kalviainen, P. Hirvonen, L. Xu, and E. Oja, Probabilistic and non-probabilistic Hough transforms: Overview and comparisons, Image and Vision Computing, 13(4):239–252, May 1995.
[65] B. Kamgar-Parsi, B. Kamgar-Parsi, and A.K. Jain, Automatic aircraft recognition: Toward using human similarity measure in a recognition system, In Proceedings of IEEE Computer Vision and Pattern Recognition, pages I: 268–273, 1999.
[66] B. Kamgar-Parsi, B. Kamgar-Parsi, A.K. Jain, and J.E. Dayhoff, Aircraft detection: A case study in using human similarity measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(12):1404–1414, December 2001.
[67] Z. Kim and R. Nevatia, Automatic description of complex buildings from multiple images, Computer Vision and Image Understanding, 96(1):60–95, October 2004.
[68] H. Kollnig and H.H. Nagel, 3-D pose estimation by directly matching polyhedral models to gray value gradients, International Journal of Computer Vision, 23(3):283–302, June 1997.
[69] P. Kuhl and C. Giardina, Elliptic Fourier features of a closed contour, Computer Graphics and Image Processing, 18:236–258, 1982.
[70] R. Kumar and A. Hanson, Robust estimation of camera location and orientation from noisy data having outliers, In Proceedings of IEEE Workshop on Interpretation of 3D Scenes, pages 52–60, 1989.
[71] D. Lagunovsky and S. Ablameyko, Straight-line-based primitive extraction in grey-scale object recognition, Pattern Recognition Letters, 20(10):1005–1014, October 1999.


[72] Y. Lamdan, J.T. Schwartz, and H.J. Wolfson, On recognition of 3-D objects from 2-D images, In Proceedings of IEEE International Conference on Robotics and Automation, pages 1407–1413, 1988.
[73] Y. Lamdan and H.J. Wolfson, Geometric hashing: A general and efficient model-based recognition scheme, In Proceedings of IEEE International Conference on Computer Vision, pages 238–249, 1988.
[74] Y. Lamdan and H.J. Wolfson, On the error analysis of geometric hashing, In Proceedings of IEEE Computer Vision and Pattern Recognition, pages 22–27, 1991.
[75] V.F. Leavers, Survey: Which Hough transform?, Computer Vision, Graphics, and Image Processing, 58(2):250–264, September 1993.
[76] C. Lin, A. Huertas, and R. Nevatia, Detection of buildings using perceptual groupings and shadows, In USC Computer Vision, 1994.
[77] C. Lin and R. Nevatia, Building detection and description from a single intensity image, Computer Vision and Image Understanding, 72(2):101–121, 1998.
[78] H.C. Liu and M.D. Srinath, Corner detection from chain-code, Pattern Recognition, 23:51–68, 1990.
[79] D.G. Lowe, Three-dimensional object recognition from single two-dimensional images, Artificial Intelligence, 31(3):355–395, March 1987.
[80] G. Marola, Using symmetry for detecting and locating objects in a picture, Computer Vision, Graphics, and Image Processing, 46(2):179–195, May 1989.


[81] S.M. Marouani, A. Huertas, and G. Medioni, Model-based aircraft recognition in perspective aerial imagery, Symposium on Computer Vision, pages 371–376, 1995.
[82] D. Marr and E. Hildreth, Theory of edge detection, Proceedings of the Royal Society of London, B:187–217, 1980.
[83] R.A. McLaughlin and M.D. Alder, Recognising aircraft: Automatic extraction of structure by layers of quadratic neural nets, In Proceedings of IEEE International Conference on Neural Networks, pages 4288–4293, 1994.
[84] R.A. McLaughlin and M.D. Alder, Recognition of infra-red images of aircraft rotated in three dimensions, In Third Australian and New Zealand Conference on Intelligent Information Systems, pages 82–87, 1995.
[85] G.G. Medioni and Y. Yasumoto, Corner detection and curve representation using cubic B-splines, Computer Vision, Graphics, and Image Processing, 39(3):267–278, September 1987.
[86] J. Ming and B. Bhanu, A multistrategy learning approach for target model recognition, acquisition, and refinement, In Proceedings of the DARPA Image Understanding Workshop, pages 742–756, 1990.
[87] D.I. Moldovan and C.-I. Wu, A hierarchical knowledge based system for airplane classification, IEEE Transactions on Software Engineering, 14(12):1829–1834, 1988.
[88] Y. Moses and S. Ullman, Limitations of non model-based recognition schemes, In Proceedings of European Conference on Computer Vision, pages 820–828, London, UK, 1992. Springer-Verlag.


[89] J.L. Mundy and A.J. Heller, The evolution and testing of a model-based object recognition system, In Proceedings of IEEE International Conference on Computer Vision, pages 268–282, 1990.
[90] P.F.M. Nacken, A metric for line segments, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(12):1312–1318, December 1993.
[91] H. Nasr, B. Bhanu, and S. Lee, Refocused recognition of aerial photographs at multiple resolution, Proceedings SPIE International Conference on Aerospace Pattern Recognition, 1098:198–206, 1989.
[92] R. Nevatia and A. Huertas, Knowledge-based building detection and description: 1997–1998, In Proceedings of the DARPA Image Understanding Workshop, pages 469–478, 1998.
[93] R. Nevatia, C. Lin, and A. Huertas, A system for building detection from aerial images, In A. Grün, E.P. Baltsavias, and O. Henricsson, editors, Automatic Extraction of Man-Made Objects from Aerial and Space Images (II), Birkhäuser, Basel, pages 77–86, 1997.
[94] R. Nevatia and R. Babu, Linear feature extraction and description, Computer Vision, Graphics, and Image Processing, 13:257–269, 1980.
[95] S. Noronha and R. Nevatia, Detection and description of buildings from multiple aerial images, In Proceedings of IEEE Computer Vision and Pattern Recognition, pages 588–594, 1997.
[96] T. Ojala, M. Pietikainen, and D. Harwood, A comparative study of texture measures with classification based on feature distributions, Pattern Recognition, 29(1):51–59, January 1996.


[97] C.F. Olson, Efficient pose clustering using a randomized algorithm, International Journal of Computer Vision, 23(2):131–147, June 1997.
[98] C.F. Olson and D.P. Huttenlocher, Automatic target recognition by matching oriented edge pixels, IEEE Transactions on Image Processing, 6(1):103–113, January 1997.
[99] D.W. Patterson, Artificial Neural Networks, Prentice Hall, Singapore, 1996.
[100] A.P. Pentland, Fractal-based description of natural scenes, IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 661–674, 1984.
[101] P. Perona and J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):629–639, July 1990.
[102] W.K. Pratt, Digital Image Processing, John Wiley and Sons, Inc., 2nd edition, 1991.
[103] A.P. Reeves, R.J. Prokop, S.E. Andrews, and F.P. Kuhl, Three-dimensional shape analysis using moments and Fourier descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):937–943, November 1988.
[104] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, 1982.
[105] S.M. Smith and J.M. Brady, SUSAN: A new approach to low-level image processing, International Journal of Computer Vision, 23(1), May 1997.
[106] A.A. Somaie, A. Badr, and T. Salah, Aircraft recognition system using backpropagation, In Proceedings of the 2001 CIE International Conference on Radar, pages 498–501, 2001.


[107] C.T. Steger, Similarity measures for occlusion, clutter, and illumination invariant object recognition, In Proceedings of the 23rd DAGM-Symposium on Pattern Recognition, pages 148–154, London, UK, 2001. Springer-Verlag.
[108] C.T. Steger, Occlusion, clutter, and illumination invariant object recognition, In Proceedings of Photogrammetric Computer Vision, page A: 345, 2002.
[109] F. Stein and G.G. Medioni, Graycode representation and indexing: Efficient two dimensional object recognition, In Proceedings of IEEE International Conference on Pattern Recognition, pages Vol. I: 13–17, 1990.
[110] F. Stein and G.G. Medioni, Recognition of 3-D objects from 2-D groupings, In Proceedings of the DARPA Image Understanding Workshop, 1992.
[111] F. Stein and G.G. Medioni, Structural indexing: Efficient two dimensional object recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(12):1198–1204, December 1992.
[112] G. Stockman, S. Kopstein, and S. Benett, Matching images to models for registration and object detection via clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 4(3):229–241, 1982.
[113] G.Y. Tang and T.S. Huang, Using the creation machine to locate airplanes on aerial photos, Pattern Recognition, 12(6):431–441, 1980.
[114] E. Thiel and A. Montanvert, Chamfer masks: Discrete distance functions, geometrical properties, and optimization, In Proceedings of IEEE International Conference on Pattern Recognition, pages 244–247, 1992.
[115] S.C. Tien, T.L. Chia, and Y. Lu, Using cross-ratios to model curve data for aircraft recognition, Pattern Recognition Letters, 24(12):2047–2060, August 2003.


[116] D.H. Titterton and J.L. Weston, Strapdown Inertial Navigation Technology, IEE Radar, Sonar, Navigation and Avionics Series 5, Peter Peregrinus Ltd., 1997.
[117] F. Tomita and S. Tsuji, Computer Analysis of Visual Textures, Kluwer Academic Publishers, Norwell, MA, USA, 1990.
[118] F.C.D. Tsai, Geometric hashing with line features, Pattern Recognition, 27(3):377–389, March 1994.
[119] M. Tuceryan and A.K. Jain, Texture segmentation using Voronoi polygons, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(2):211–216, February 1990.
[120] M. Tuceryan and A.K. Jain, Texture analysis, In The Handbook of Pattern Recognition and Computer Vision (2nd Edition), pages 207–248, 1998.
[121] V. Venkateswar and R. Chellappa, Extraction of straight lines in aerial images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(11):1111–1114, November 1992.
[122] T.P. Wallace, O.R. Mitchell, and K. Fukunaga, Three dimensional shape analysis using local shape descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(3):310–323, 1981.
[123] T.P. Wallace and P.A. Wintz, An efficient three-dimensional aircraft recognition algorithm using Fourier descriptors, Computer Graphics and Image Processing, 13(1):99–126, May 1980.
[124] L. Wan and L. Sun, Automatic target recognition using higher order neural network, In National Aerospace and Electronics Conference (NAECON), pages 221–226, 1996.


[125] M.J.J. Wang, W.Y. Wu, L.K. Huang, and D.M. Wang, Corner detection using bending value, Pattern Recognition Letters, 16(6):575–583, June 1995.

[126] H.J. Wolfson, Model-based object recognition by geometric hashing, In Proceedings of European Conference on Computer Vision, pages 526–536, 1990.

[127] H.J. Wolfson and Y. Lamdan, Geometric hashing: A general and efficient model-based recognition scheme, In Proceedings of IEEE International Conference on Computer Vision, pages 238–249, 1988.

[128] F. Xu, X. Niu, and R. Li, Automatic recognition of civil infrastructure objects using Hopfield neural network, Geographic Information Science, 9(1–2):78–89, December 2003.

[129] S.C. Zhu and A. Yuille, Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(9):884–900, September 1996.

Appendix A
Description of Input Parameters to
the Neural Networks Feature
Extractors
This appendix lists the inputs to the neural networks that are designed to detect line groupings and aircraft hypotheses. Each input parameter is briefly described and is cross-referenced to the relevant rules and figures in Sections 3.5–3.7. A generic sketch of how such a feature vector could be scored by a small feed-forward network is given below.
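The following minimal sketch (Python with NumPy, not taken from the thesis itself) illustrates the general pattern assumed throughout this appendix: a fixed-length feature vector is presented to a small one-hidden-layer feed-forward network whose sigmoid output scores the hypothesis. The layer size, the random placeholder weights and all names are illustrative assumptions, not the networks actually trained in this work.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FeatureClassifier:
    """Minimal one-hidden-layer feed-forward scorer for a feature vector.

    The weights would normally come from training; here they are random
    placeholders so that the sketch is runnable.
    """
    def __init__(self, n_inputs, n_hidden=8, seed=None):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=n_hidden)
        self.b2 = 0.0

    def score(self, features):
        x = np.asarray(features, dtype=float)
        h = np.tanh(self.W1 @ x + self.b1)       # hidden layer
        return sigmoid(self.W2 @ h + self.b2)    # confidence in [0, 1]

# Example: score a 7-element wing-candidate feature vector (values are arbitrary).
wing_net = FeatureClassifier(n_inputs=7, seed=0)
print(wing_net.score([0.9, 0.2, 0.3, 0.1, 0.8, 0.7, 0.5]))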
Feature parameters for wing candidates (refer to Section 3.5.1 and Figure 3.17):
1. mean(l_i, l_j)/l_median: lengths must be significant (see condition 1).
2. t_i: (see condition 2 and Figure 3.16).
3. t_j: (see condition 2 and Figure 3.16).
4. apart: (see condition 3 and Figure 3.16).
5. overlap(%): extent of rotational edge overlap as shown in Figure 3.17 (d) (see condition 4).
6. min(l_i, l_j)/max(l_i, l_j): (see condition 5 and Figure 3.17 (e)).
7. C: (see condition 7).

It should be noted that condition 8, which concerns the intensity distribution between the two lines, is not included here. Merging the geometry and intensity parameters and feeding them to the neural network did not improve its performance, so the rule-based approach is kept for the intensity check. A sketch of how some of the geometric parameters above might be computed is given below.
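As an illustration of how the purely geometric inputs might be derived from a pair of line segments, the sketch below computes three of the wing-candidate parameters: the median-normalised mean length (parameter 1), the overlap percentage (parameter 5) and the length ratio (parameter 6). Measuring the overlap by projecting one segment onto the direction of the other is an assumption of this sketch; the exact construction used in this work is the one shown in Figure 3.17 (d).

import numpy as np

def seg_length(p, q):
    return float(np.linalg.norm(np.asarray(q, dtype=float) - np.asarray(p, dtype=float)))

def overlap_percent(seg_a, seg_b):
    """Overlap of seg_b with seg_a, measured along seg_a's direction (0-100%)."""
    a0, a1 = (np.asarray(p, dtype=float) for p in seg_a)
    b0, b1 = (np.asarray(p, dtype=float) for p in seg_b)
    d = (a1 - a0) / np.linalg.norm(a1 - a0)               # unit direction of seg_a
    ta = sorted([0.0, float(np.dot(a1 - a0, d))])         # seg_a as an interval on d
    tb = sorted([float(np.dot(b0 - a0, d)), float(np.dot(b1 - a0, d))])
    common = max(0.0, min(ta[1], tb[1]) - max(ta[0], tb[0]))
    return 100.0 * common / (ta[1] - ta[0])

def wing_geometry_features(seg_i, seg_j, l_median):
    li, lj = seg_length(*seg_i), seg_length(*seg_j)
    return {
        "mean_len_norm": 0.5 * (li + lj) / l_median,    # parameter 1
        "overlap_pct": overlap_percent(seg_i, seg_j),   # parameter 5 (assumed form)
        "len_ratio": min(li, lj) / max(li, lj),         # parameter 6
    }

# Two roughly parallel edges, with a median edge length of 40 pixels.
print(wing_geometry_features(((0, 0), (50, 2)), ((5, 10), (48, 12)), 40.0))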
Feature parameters for nose candidates (refer to Section 3.5.2 and Figure 3.19):
1. l_L/l_median: (see condition 1 in the first round rule set and Figure 3.19).
2. t_L: (see condition 2 in the first round check).
3. t_S: (see condition 2 in the first round check).
4. g_ij: (see condition 3 in the first round check).
5. overlap(%): (see condition 4 in the first round check).
6. l_S/l_L: (see condition 5 in the first round check).
7. N: (see condition 7 in the first round check).
8. third line status (-2 to +2): used in the second round check (see conditions 1–3).
9. ave intensity: used in the second round check (see condition 4).
10. line count: extra information to indicate how cluttered the scene is. A sketch that assembles these ten inputs into a single vector is given below.
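The sketch below simply packs the ten quantities above, in the listed order, into the vector that would be presented to the nose network. The argument names, and the normalisation of the mean intensity by the image intensity range, are illustrative assumptions rather than the exact conventions used in this work.

def nose_feature_vector(l_L, l_S, t_L, t_S, g_ij, overlap_pct, n_term,
                        third_line_status, ave_intensity, line_count,
                        l_median, intensity_range):
    """Pack the ten nose-candidate parameters, in the order listed above."""
    assert -2 <= third_line_status <= 2   # discrete code used by the second-round check
    return [
        l_L / l_median,                   # 1. longer line length vs. median length
        t_L,                              # 2. first-round check
        t_S,                              # 3. first-round check
        g_ij,                             # 4. gap between the two lines
        overlap_pct,                      # 5. overlap percentage
        l_S / l_L,                        # 6. length ratio of the line pair
        n_term,                           # 7. term checked by condition 7
        third_line_status,                # 8. third-line status (-2 .. +2)
        ave_intensity / intensity_range,  # 9. mean intensity, normalised (assumption)
        line_count,                       # 10. clutter indicator
    ]

# Example call with made-up values.
print(nose_feature_vector(60, 35, 0.2, 0.3, 4, 80.0, 0.5, 1, 120, 45,
                          l_median=40, intensity_range=255))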

Feature parameters for wing-pair candidates (refer to Section 3.6 and Figure
3.26):
1. W_min/W_max: ratio of wing weights (sum of edge lengths) (see condition 1).
2. LC: left wing angle (see condition 2).
3. RC: right wing angle (see condition 2).
4. ‖LC - RC‖: wing span (see condition 3).
5. ‖FP - PT1‖/‖PT2 - PT1‖: how far the left leading wing edge is from the intersection point FP (see condition 4).


6. ‖FP - PT3‖/‖PT4 - PT3‖: how far the right leading wing edge is from the intersection point FP (see condition 4).
7. ‖RP - PT5‖/‖PT6 - PT5‖: how far the left trailing wing edge is from the intersection point RP (see condition 4).
8. ‖RP - PT7‖/‖PT8 - PT7‖: how far the right trailing wing edge is from the intersection point RP (see condition 4).
9. first of the four quantities used to determine the arrangement of the two wings (see condition 5 and Figure 3.26 (a-b)).
10. second of these four quantities (see condition 5 and Figure 3.26 (a-b)).
11. third of these four quantities (see condition 5 and Figure 3.26 (a-b)).
12. fourth of these four quantities (see condition 5 and Figure 3.26 (a-b)).
13. F: (see condition 6).
14. R: (see condition 6).
15. type: [boomerang, diamond, triangle] determined by geometric rules.
16. wing ave intensity gap / image intensity range: the two wings must have compatible mean intensities.
17. angle between FP-RP and FP-M: angular deviation in the symmetry check (see condition 7); a sketch of this computation follows the list.
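Parameter 17 reduces to a single angle, and the sketch below shows a straightforward way of computing it as the angular deviation between the vector from FP to RP and the vector from FP to M. Treating FP, RP and M as plain 2-D coordinates (their geometric definitions are given in Figure 3.26) is the only assumption made here.

import numpy as np

def angular_deviation(fp, rp, m):
    """Angle (degrees) between the vectors FP->RP and FP->M."""
    u = np.asarray(rp, dtype=float) - np.asarray(fp, dtype=float)
    v = np.asarray(m, dtype=float) - np.asarray(fp, dtype=float)
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

# A perfectly symmetric wing-pair would give a deviation close to zero.
print(angular_deviation(fp=(0, 0), rp=(0, 10), m=(0.3, 9.5)))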

Feature parameters for aircraft candidates (refer to Section 3.7 and Figure
3.31):
1. shape: [boomerang, diamond, triangle] determined during the wing-pair detection process.
2. ‖C - FP‖/‖FP - RP‖: location of the nose corner, C, in the longitudinal direction (see condition 2).
3. s: lateral angular extent of the nose search region (see condition 2).
4. ‖C - M‖/‖LC - RC‖: (see condition 2).


5. d: the nose must face FP (see condition 3).
6. t_g: t_g ranges from 0 to 1 if the segment [C, M] passes through the gap between the wing-pair; if not (i.e., [C, M] cuts one of the wing edges), then t_g can have a value less than 0 or greater than 1 (see condition 4).
7. diff: [C, MID] cuts the nose opening internally (see condition 5 and Figure 3.31 (c)).
8. min(L, R): (see condition 6).
9. W_p/W: (see condition 7).
10. LC: left wing angle (see Figure 3.26 (a)).
11. RC: right wing angle (see Figure 3.26 (a)).

The first parameter and the last two parameters are included because the other parameters are related to them. A sketch showing how the categorical shape parameter could be combined with the remaining numeric terms is given below.
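To make the role of the categorical shape parameter concrete, the sketch below encodes it as a one-hot triple before appending the numeric terms. One-hot coding is an assumption of the sketch rather than the coding necessarily used in this work, and the helper name is hypothetical.

SHAPES = ("boomerang", "diamond", "triangle")

def aircraft_feature_vector(shape, numeric_terms):
    """Concatenate a one-hot shape code with the remaining numeric parameters."""
    if shape not in SHAPES:
        raise ValueError("unknown wing-pair shape: %r" % (shape,))
    one_hot = [1.0 if shape == s else 0.0 for s in SHAPES]
    return one_hot + [float(x) for x in numeric_terms]

# Ten numeric terms corresponding to parameters 2-11 of the list above (arbitrary values).
print(aircraft_feature_vector("diamond",
                              [0.4, 0.2, 0.5, 0.1, 0.6, 0.05, 0.3, 0.8, 0.9, 0.9]))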
