Machine Vision
Algorithms in Java
Techniques and Implementation
Springer
ISBN 978-1-4471-1066-8
DOI 10.1007/978-1-4471-0251-9
The use of registered names, trademarks etc. in this publication does not imply, even in the absence of
a specific statement, that such names are exempt from the relevant laws and regulations and therefore
free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.
Typesetting: PostScript files by authors
Preface
For many novices to the world of machine vision, the development of automated vision solutions may seem like a relatively easy task as it only requires a
computer to understand basic elements such as shape, colour and texture. Of
course this is not the case. Extracting useful information from images in a laboratory environment is a difficult process at the best of times, but to develop
imaging systems with the reliability and repeatability required for industrial,
medical and associated imaging applications increases the complexity of the
design task. The aim of this book was to produce a self-contained software
reference source for machine vision designers which takes such issues into account. To that end Machine Vision Algorithms in Java contains explanations
of key machine vision techniques and algorithms, along with the Java source
code for a wide range of real-world image processing and analysis functions.
A number of texts have been written over the last few years, which have
focused on implementing image processing and to a lesser extent image analysis functions, through coded examples (i.e. for a range of software languages
and environments). So, why do we need another book on the subject? Firstly,
Machine Vision Algorithms in Java concentrates on examining these issues
from a machine vision perspective. It focuses on the image analysis and general machine vision design task, as well as discussing the key concepts relating
to image processing. In addition, we have made available (via the Internet) a
comprehensive machine vision development environment, NeatVision, which
allows the algorithms and techniques discussed throughout this book to be
implemented by the reader.
The range of machine vision techniques and applications has grown significantly over the last decade and as such it would be difficult for a single
text to cover them all. Therefore, this book concentrates on those algorithms
and techniques that have found acceptance within the machine vision community. As is the nature of putting a book like this together, certain areas
receive greater attention reflecting our experience and the nature of our own
research.
This book has grown from a number of different elements. Many of the
ideas outlined are based on the postgraduate modules Computer and Machine
Vision (EE544) and Object-oriented Programming (EE553) developed by Dr.
Paul Whelan and Derek Molloy respectively. Both modules are available in
traditional form and via the Internet as part of Dublin City University's
Remote Access to Continuing Engineering Education (RACeE) initiative¹.
Another key element was the development of NeatVision, a Java based
visual programming environment for machine vision. It provides an intuitive
interface which is achieved using a "drag and drop" block diagram approach,
where each image analysis/processing operation is represented by a graphical
block with inputs and outputs that can be interconnected, edited and deleted
as required. NeatVision was designed to allow the user to focus on the machine vision design task rather than concerns about the subtlety of a given
programming language. It allows students of machine vision to implement
their ideas in a dynamic and easy to use way, thus reinforcing one of the key
elements of the educational experience, interaction. In conjunction with the
publication of this book a fully functional 'shareware' version of NeatVision
has been made available via the Internet².
We have also included an introduction to Object-oriented Programming
(OOP) and the Java programming language, with particular reference to
its imaging capabilities. This was done for those readers who may be unfamiliar with OOP and the Java programming language. It includes details
relating to the design of a Java based visual programming environment for
machine vision along with an introduction to the Java 2D imaging and the
Java Advanced Imaging (JAI) Application Programming Interface (API). A
wide range of illustrative examples are also included.
Having successfully digested the ideas outlined in this book the reader
will:
Be familiar with the essential elements of machine vision software and
have an understanding of the problems involved in the development and
implementation of machine vision systems from a software perspective.
Have the ability to design and implement image processing and analysis
techniques.
Be able to evaluate emerging machine vision algorithms and techniques.
Be familiar with the Java programming language and its application to
image analysis.
Be able to develop machine vision solutions using a Java based visual programming environment (i.e. NeatVision).
This book is aimed at senior undergraduate and postgraduate students in
engineering and computer science as well as practitioners in machine vision,
who may wish to update or expand their knowledge in the field. We have
tried to present the techniques and algorithms of machine vision in a way
¹ http://www.racee.ie/
² See http://www.NeatVision.com/ for further details on downloading and installing this software. This web site also contains details of the NeatVision environment along with a summary of its techniques. A number of working examples along with sample files and associated visual workspaces are also available.
that it will be understood not only by specialists familiar with the field, but
also by those who are less familiar with the topic. Care has also been taken to
ensure that we have provided adequate references to supporting work. This
should aid readers who wish to examine the topics covered in more detail.
The organisation of the book is as follows. Chap. 1 introduces the general
field of machine vision systems engineering. Chap. 2 outlines the key concepts
behind the Java programming language. As well as giving a brief history, description and layout of the Java language, this chapter outlines the properties
of Java that make it useful for image processing and analysis. The purpose of
Chap. 3 is to detail some of the basic techniques and algorithms used in the
development of machine vision systems. Key elements of the image processing and analysis functions introduced in this section are also implemented in
Java and form the basis of the NeatVision visual programming environment.
Chap. 4 follows on from this basic introduction by examining mathematical morphology, a key framework in which many machine vision algorithms
can be placed. Chaps. 5 and 6 expand our discussion of imaging techniques
to include key elements of texture and colour image analysis. Chap. 7 details the design and implementation of the NeatVision visual programming
environment. Appendix A outlines the graphics file formats that are used
by NeatVision and Appendix B details the NeatVision imaging Application
Programming Interface (API) specification. Finally, Appendix C summarises
the range of operators available in the NeatVision machine vision software
development environment. A range of sample applications implemented in
NeatVision are highlighted throughout the book.
For updates, corrections, colour images and sample visual programmes
please refer to the book web site at http://www.eeng.dcu.ie/~javamv/.
Paul F. Whelan
Derek Molloy
mva@eeng.dcu.ie
Acknowledgments
This book has benefited from the comments and suggestions of a wide range
of people, including our colleagues with whom we have had many fruitful
discussions and collaborations. Numerous students have also contributed,
both directly and indirectly. The most important contributions coming from
members of the Vision Systems Laboratory (VSL) at Dublin City University (DCU), namely Ovidiu Ghita, Alexandru Drimbarean and Pradeep PP.
We would particularly like to express our gratitude to Robert Sadleir (VSL)
for a fruitful collaboration in the development of NeatVision. Robert also
contributed to our discussion on the NeatVision development environment,
specifically in Chap. 7. We would also like to thank all the members of the
VSL for their comments on the various drafts of this book.
We would like to thank Prof. Charles McCorkell, Head of the School
of Electronic Engineering, DCU, for his support of the VSL and this book
project. We would also like to acknowledge all our academic colleagues for
our many discussions on computer and machine vision, especially Prof. Bruce
Batchelor (machine vision systems engineering) and Dr. Pierre Soille (mathematical morphology). We would like to acknowledge Xuemei Zhang (Department of Psychology, Stanford University), Mark Graves (Spectral Fusion
Technologies) and Tim Carew (Technology Systems International) and thank
them for their permission to use some of the material cited in this book.
Special thanks are due to Nicholas Pinfield and Oliver Jackson of Springer-Verlag for their commitment to this book and maintaining its schedule. Machine Vision Algorithms in Java: Techniques and Implementation was prepared in camera-ready form using the LaTeX text processing environment and
Paint Shop Pro image editing software.
Finally, the authors would like to thank their families for their constant
support and encouragement during this project. We would like to thank the
readers in advance for comments and suggestions aimed at improving and extending the present book and its associated NeatVision software. Any errors,
of course, remain our own responsibility.
Notice
NeatVision and its associated materials are copyright 2000 by Paul
F. Whelan. The software is presented "as is". While every reasonable effort
has been made to ensure the reliability of this software, NeatVision and the
associated algorithms outlined in this book are supplied for general reference
only and should not be relied on without further specific inquiry. NeatVision
may be downloaded, stored on a hard drive or other storage device, with the
following restrictions and exceptions:
Systematic or multiple-copy reproduction or republication; electronic retransmission to another location; print or electronic duplication of any
NeatVision material supplied for a fee or for commercial purposes; or altering or recompiling any contents of NeatVision and its associated materials
are not permitted.
By choosing to use NeatVision and its associated materials, you agree to
all the provisions of the copyright law protecting it and to the terms and
conditions established by the copyright holder.
The authors cannot accept responsibility for any loss or damage caused by
the use of the source code presented in this book.
Trademarks
Sun, Sun Microsystems, Solaris, Java and all Java-based trademarks are
trademarks or registered trademarks of Sun Microsystems, Inc. in the
United States and other countries.
Netscape and Netscape Navigator are registered trademarks of Netscape
Communications Corporation in the United States and other countries.
Microsoft, Windows, Windows NT and/or other Microsoft products referenced herein are either trademarks or registered trademarks of Microsoft
Corporation.
IBM is a registered trademark of IBM Corporation in the United States
and other countries.
Paint Shop Pro is a registered trademark of Jasc Software, Inc.
Contents

1. An Introduction to Machine Vision

2. Java Fundamentals
   2.1 The History of Java
       2.1.1 What Makes Java Platform Independent?
       2.1.2 The Just-In-Time Compiler
       2.1.3 The Sun Java Software Development Kit (Java SDK)
   2.2 Object-oriented Programming
       2.2.1 Encapsulation
       2.2.2 Classes
       2.2.3 Objects
       2.2.4 Inheritance
       2.2.5 Polymorphism
       2.2.6 Abstract Classes
   2.3 Java Language Basics
       2.3.1 Variables
       2.3.2 Access Control
       2.3.3 Java Types
       2.3.4 Java Operators
       2.3.5 Java Comments
       2.3.6 The super and this Keywords
       2.3.7 Java Arrays
       2.3.8 The "Object" and "Class" Classes
       2.3.9 Interfaces
       2.3.10 Packages
   2.4 Applications and Applets
       2.4.1 Writing Applications
       2.4.2 Applets
       2.4.3 An Applet and an Application?
   2.5 Java and Image Processing
       2.5.1 The Canvas Class
       2.5.2 Java and Images
       2.5.3 Image Producers and Image Consumers
       2.5.4 Recovering Pixel Data from an Image Object
       2.5.5 Recreating an Image Object from an Array of Pixels
   2.6 Additional Classes
       2.6.1 ColorModel
       2.6.2 ImageFilter
       2.6.3 CropImageFilter
       2.6.4 RGBImageFilter
       2.6.5 FilteredImageSource
   2.7 Double Buffering
   2.8 Recent Additions to Java for Imaging
       2.8.1 Java 2D API
       2.8.2 Working with the Java 2D API
       2.8.3 Renderable Layer and Rendered Layer
       2.8.4 Java Advanced Imaging API (JAI)
       2.8.5 JAI Functionality
   2.9 Additional Information on Java
   2.10 Conclusion

3.
   3.3.3 Measurements on Binary Images
   3.3.4 Run-length Coding
   3.3.5 Shape Descriptors
   3.4 Global Image Transforms
       3.4.1 Geometric Transforms
       3.4.2 Distance Transforms
       3.4.3 Hough Transform
       3.4.4 Two-dimensional Discrete Fourier Transform (DFT)
   3.5 Conclusion

1. An Introduction to Machine Vision
The purpose of this chapter is to introduce the reader to the basic principles of
machine vision. In this discussion the differences between computer, machine
and human vision are highlighted. In doing so, we draw attention to the key
elements involved in machine vision systems engineering. While this book
concentrates on the software aspects of the design cycle, this task cannot be
taken in isolation. Successful application of vision technology to real-world
problems requires an appreciation of all the issues involved.
This chapter is aimed at readers with minimal prior experience of machine
vision and as such more experienced readers will find most of this material
familiar. We also briefly introduce the reader to the concepts behind the
NeatVision visual programming environment. This is discussed in more detail
in Chap. 7.
subset of the general systems engineering task. The formal definition¹ of machine vision put forward by the Automated Vision Association (AVA) refers
to it as:
"The use of devices for optical, non-contact sensing to automatically
receive and interpret an image of a real scene in order to obtain
information and/or control machines or processes." (AVA 1985).
Computer vision is a branch of computer science which concentrates on
the wider scientific issues relating to the automated interpretation of images.
Computer vision often deals with relatively domain-independent considerations, generally aiming to duplicate the effect of human vision by electronically perceiving and understanding an image. Research in this area often
deals with issues involving perception psychology and the analysis of human
and other biological vision systems.
Of course there is a danger in over-emphasizing the differences between
computer and machine vision. Both fields embrace common ground (Fig. 1.1),
but each has its own non-overlapping areas of expertise (Braggins 2000). The
key difference seems to lie in the motivation behind the development of the
imaging solution.
Fig. 1.1. (Figure labels: Image Analysis, Image Processing, Image Representation, Image Understanding, Pattern Recognition, Visual Modelling & Perception, Biological Vision, Systems Engineering, Robotics, Applied Vision.)
In general, machine vision engineering is about solving specific engineering problems and not trying to develop 'human-like' visual systems. The
concept of modelling an industrial vision system's design on the human visual system can often seem like a good starting point for novices to the field.
This is not the case (Hochberg 1987) and can often be counterproductive. We
should not confuse the two types of vision: the danger in relying on human-driven
approaches is that simpler and perhaps more elegant solutions may be
overlooked. The human vision system is qualitative in nature, dealing with
a broad range of imaging tasks in varying conditions. The human system
can pick out a familiar person in a crowded room, but would find it difficult
to give the exact dimensions of that person. Machine vision, on the other
hand, is quantitative, i.e. it can deal with making precise measurements at
high speed, but lacks the key elements of the human vision system. Machine
vision systems are good at repetitive tasks and provide fast and reliable decision making. If a human operator were to examine a coloured web material
over a shift, they would find it difficult to detect a subtle colour change, owing
to the human vision system's ability to adapt to such gradual changes. The
fact that the human vision process is prone to subjective factors such
as fatigue and boredom, which interfere with consistent evaluations, must
also be taken into consideration (such tasks can be reliably handled by a machine
vision system).
Machine vision systems for industry first received serious attention in the
1970s (Parks 1978), although the proposal that a video system be used for
industrial inspection was first made in the 1930s. Throughout the early 1980s,
the subject developed slowly, with a steady contribution being made by the
academic research community, but with only limited industrial interest being
shown. In the mid-1980s there was serious interest being shown in vision
systems by the major automobile manufacturers, although this was followed
by a period of disillusionment with a large number of small vision companies
failing to survive. Interest grew significantly in the late 1980s and early 1990s,
due largely to significant progress being made in making fast image processing
hardware available at a competitive price. Throughout this period academic
workers have been steadily proving feasibility in a wide range of products,
representing all of the major branches of manufacturing industry.
Machine vision systems now appear in every major industrial sector, including such areas as electronics (PCB inspection, automatic component
recognition), car manufacturing (inspection of car bodies for dents, dimensional checking), food (inspection and grading of fruit and vegetables, inspection of food containers) and the medical industries (tablet quality control,
detection of missing items in pharmaceutical packets). The main application
areas for industrial vision systems occur in automated inspection and measurement and robotic vision. Automated visual inspection and measurement
systems have, in the past, tended to develop faster. In fact, quality control
related applications such as inspection, gauging and recognition currently account for well over half of the machine vision market. This has been mainly
due to the lower cost and the ease of retrofitting such inspection systems onto
existing production lines, compared to the large capital investment involved
in developing a completely new robotic work cell and the extra uncertainty
and risks involved in integrating two new complex technologies. As manufacturing technology becomes more complex, there is a growing requirement to
integrate the inspection process more closely with the overall manufacturing
process (McClannon et al. 1993). This moves the application of automated
inspection from a quality control to a process control role, that is from defect
detection to defect prevention. The reader is referred to Batchelor & Whelan
(1994b), Chin (1988), Chin & Harlow (1982), Davies (1996) and Freeman
(1987) for further details on a wide range of industrial implementations of
machine vision systems.
As well as reducing scrap and rework costs, product quality can also be
improved by using vision to aim for a higher quality standard. Machine vision
can be used to determine the cause of "abnormal situations" and provide early
warning for potential hazards on a production line, for example detecting a
warped metal plate before it is fed into a stamping press and potentially
damaging the press. This technology can also provide extensive statistics of
the production process and in some applications we may have no alternative to
using automated vision, as the environment may be hazardous to humans.
Machine vision systems are not perfect; they are subject to two significant types
of error. System errors will be familiar to all engineers, as there will always
be a trade-off between the cost and functionality of the system's components.
System errors can often be reduced by using components with higher specifications. Statistical errors are not as easy to handle. It can be quite difficult
to decide exactly where to draw the line that separates a good product from a bad
one. If the image analysis process places the product in this grey area
then we have either false rejects or, worse still, we may pass a faulty product. This type of error can also be confounded by a lack of communication
between the vision engineer and the customer.
Significant progress has been made over the last decade due, in part to the
falling cost of computing power. This has led to a spread in the technology
and has enabled the development of cheaper machine vision systems. This,
in turn, has enabled medium-sized manufacturing companies to consider the
option of using machine vision to implement their inspection tasks. To a lesser
extent, the availability of a well educated work-force, a small proportion of
which has an awareness of machine vision, has also aided the growth and
acceptance of industrial vision systems. The main reason, however, for this
growth is strategic. There is a growing realisation within many industries
that machine vision is an integral component of a long term automation
development process, especially when one considers the importance of quality
in manufacturing. This fact, combined with the legal liabilities involved in
the production and sale of defective products, highlights the strategic case
for the use of machine vision in automated inspection. A similar argument
applies to the application of vision to robotics and automated assembly.
that the software is supplied with the self-contained system has yet to be
moulded into a form that would solve the vision application. Such systems
have significant advantages, the main one being speed. The majority of self-contained systems are custom designed, although they may contain some
plug-in boards and are tuned to provide whatever functionality is required
by the application. The self-contained nature of the mechanical, image acquisition and display interfaces is also a significant benefit when installing
vision systems. However, it can be difficult to add further functionality at a
later date without upgrading the system.
Turn-key vision systems are self-contained machine vision systems, designed for a specific industrial use. While some such systems are custom designed, many turn-key systems contain commercially available plug-in cards.
Turn-key systems tend to be designed for a specific market niche, such as
paper inspection. So, not only is the hardware tuned to deal with high-speed
image analysis applications, it is also optimised for a specific imaging task.
While the other systems discussed usually require significant development
to produce a final solution for an imaging application, turn-key systems are
fully developed, although they need to be integrated into the industrial environment. This should not be taken lightly, as it can often be a difficult task.
Also, it may not always be possible to find a turn-key system for a specific
application.
http://www.eeng.dcu.ie/~whelanp/resources/resources.html
(Figure labels: Data Flow, Image Sensor, Feedback Path.)
Image acquisition generally involves the capture of the analogue image signal
(although digital cameras can also be used) and the generation of a one-dimensional (1-D) or two-dimensional (2-D) array of integers representing
pixel brightness levels. In the majority of machine vision applications a solid
state sensor based camera is employed. This is made of discrete elements
that are scanned to produce an image. Examples of such sensors are Charge
Coupled Devices (CCD) and Charge Injection Devices (CID). CCD based
cameras are the most commonly used solid state sensor due to their small size,
low power consumption requirements, wide spectral response and robustness.
These cameras are available in both line scan and area array configurations.
Line scan cameras are suitable for high resolution scanning applications where
the object moves beneath a fixed camera position, for example a paper web.
Area array cameras produce a medium to high resolution 2-D snapshot of a
scene (similar to a TV image).
It is worth noting that there have been major advances in other component technologies, specialised lighting units, lenses and advisor programs,
which guide a vision engineer through the initial stages of the design process.
While the acquisition of image data may not be seen as directly related to
image analysis and processing, the design decisions involved in this stage (i.e.
camera type, lighting and optical system design) have a fundamental bearing
on the systems software. For example, in Fig. 1.3 we have applied two different
lighting techniques to the same scene. Given the task of counting the number
of candies in the image, which lighting configuration should be adopted?
While the image in (b) gives a cleaner image, we would have to apply complex
image segmentation techniques (such as the one discussed in Sec. 4.4.2) to
separate the touching items prior to counting. Whereas the image in (a) may
initially seem to be of little use due to the bright reflections evident on each
candy piece. In fact the lighting unit was specifically designed to produce
this effect. This lighting configuration simplifies the image processing and
analysis tasks as all that is required of the software is to isolate and count
these reflections (Chap. 3), both relatively straightforward imaging tasks.
Fig. 1.3. Different lighting techniques applied to candy sweets. (a) Ring Light
(Light Background). (b) Back-lighting, this produces a silhouette of the scene.
This relates to the means of representing the image brightness data in a form
that enables the image processing and analysis design process to be implemented efficiently. We shall first consider the representation of monochrome
(grey scale) images. Let x and y denote two integers where 1 ≤ x ≤ m and
1 ≤ y ≤ n. In addition, let f(x, y) denote an integer function such that
0 ≤ f(x, y) ≤ W (W denotes the white level in a grey scale image). An array
F will be called a digital image, where an address (x, y) defines a position in
F, called a pixel, or picture element. The elements of F denote the intensities
within a number of small rectangular regions within a real (i.e. optical) image. Strictly speaking, f(x, y) measures the intensity at a single point, but if
Fig. 1.5. Image resolution and binary threshold. (a) to (c) Bottle-top grey scale
image at varying resolutions. (d) Binary version of the bottle-top image.
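To make this representation concrete, the following sketch (our own illustration, not code from the text) holds a grey scale image as a two-dimensional array and produces a binary version, as in Fig. 1.5(d), by thresholding against a level t:

public class ThresholdExample {
    // f[y][x] holds grey levels in the range 0..w (w is the white level);
    // pixels at or above the threshold t become white, all others black.
    public static int[][] threshold(int[][] f, int w, int t) {
        int height = f.length;
        int width = f[0].length;
        int[][] binary = new int[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                binary[y][x] = (f[y][x] >= t) ? w : 0;
            }
        }
        return binary;
    }
}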
This is an image to image operation. It aims to produce images that will make
the analysis stage simpler and more robust. Key to the image processing task is
the ability to segment (Fig. 1.6) and enhance the features to be analysed.
This can be a difficult task when dealing with complex scenes. The two main
classes of operations applied to images during processing are point to point
and neighbourhood operations. A wide range of image processing functions
are detailed in Chaps. 3, 4, 5 and 6.
1.4.4 Image Analysis
Fig. 1.6. Image processing applied to a paper watermark image. (a) Original image.
(b) Extraction of the watermark from (a).
Image analysis techniques include template matching, pattern recognition using feature extraction and descriptive syntactic processes.
Template matching involves comparing an ideal representation of a pattern or object, to segmented regions within the image. This can be carried out
on a global basis, or locally whereby we can use several local features, such
as corners (Fig. 1.7). The key problem with this approach is that it cannot
easily deal with scale and/or orientation changes, therefore it can be computationally expensive. This approach to image analysis was popular in the
early days of machine vision, but fell out of favour due to the problems
outlined, although the introduction of faster computing power and custom
VLSI devices has caused many researchers to re-evaluate the possibility of
using template matching for industrial imaging tasks.
Fig. 1.8. A typical sequence of operations found in machine vision. The aim of the
algorithm is to count the number of points on the star so that it can be compared to
a predefined value and hence a decision on its suitability can be made. (a) Original
binary star image. (b) Image processing: The image is thinned (see Sec. 3.3.2) a
predefined number of times to reduce the points on the star to a single pixel width.
(c) Image analysis: Limb end analysis (see Sec. 3.3.2) is then applied to isolate each
point (these are shown as small white squares). The points are then counted (see
Sec. 3.3) prior to classification.
Objects are classified based on the set of features extracted. Key to the image
classification task is the fact that similarity between objects implies that they
contain similar features, which in turn form clusters. A number of image classification techniques including the Nearest Neighbour, K-Nearest Neighbour,
Maximum-likelihood and Neural Networks classifiers are discussed. We also
examine the Confusion Matrix and Leave One-out methods of evaluating the
performance of these classifiers.
But, how do we know we have identified suitable features? Ideally extracted features should have four characteristics (Castleman 1979).
Nearest Neighbour Classifier (NNC): Classifies an unknown object according to the cluster that lies closest to it. This is also referred to as
the Maximum Similarity Classifier (Vernon 1991). This approach is simple to implement and hence popular, although incorrect classification is
possible. It is also sensitive to noise. Assume an object is described by
the vector X = (x_1, x_2, ..., x_n), and let Y_1, Y_2, ..., Y_m be a set of m reference vectors, describing archetypal objects of each class, where
Y_i = (y_i,1, y_i,2, ..., y_i,n). Similarity between the two objects represented by X
and Y_i can be assessed by measuring the Euclidean distance D(X, Y_i)
between them.
The larger D(X, Y_i) is, the smaller the similarity between the patterns
they represent. Thus, an object of unknown type, represented
by a vector X, can be attributed to an appropriate class by finding which
of Y_1, Y_2, ..., Y_m lies closest to X.
The K-Nearest Neighbour (KNN) classifier is an extension of the NNC which finds
the k nearest neighbours in the training set and then labels the unknown object
by majority voting, i.e. it assigns a class C to an object X if the majority of
the k known neighbours of X are of class C. This approach is more robust
to noise than the NNC. k is usually chosen to be quite small (k = 3 or 5
is commonly used) and tends to be dependent on the amount of data
that is available.
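As an illustrative sketch (feature vectors and class labels are our own assumptions, and this is not the NeatVision implementation), the nearest neighbour rule can be written as:

public class NearestNeighbourExample {

    // Euclidean distance D(X, Yi) between two feature vectors of equal length.
    static double distance(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the index of the reference (archetype) vector closest to the
    // unknown object x; that index identifies the class assigned to x.
    static int classify(double[] x, double[][] references) {
        int best = 0;
        double bestDistance = distance(x, references[0]);
        for (int i = 1; i < references.length; i++) {
            double d = distance(x, references[i]);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}

The KNN rule extends this by keeping the k smallest distances and taking a majority vote over their class labels.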
Maximum-likelihood Classifier: Also referred to as Bayesian classification
(Vernon 1991). Assume two classes of objects, C1 and C2, with a single feature to distinguish these classes (e.g. the number of holes, x).
This technique involves finding the Probability Density Function (PDF)
for each class and measuring the probability that an object from a given
class will have a given feature value. This is found by measuring x for a
large number of samples of each class. Let P[Cn] be the probability of
class n occurring (in the range [0, 1]). But we require the probability that
an object belongs to class n given a measured value of x, i.e. P[Cn|x].
An object belongs to class 2 if P[C2|x] > P[C1|x], where P[x|C2] is the
probability that feature value x will occur given that the object belongs to class 2.
Using Bayes' theorem, and assuming that P[x] (the normalisation factor) is
the same for both classes and that P[C1] = P[C2], this reduces to P[x|C2] > P[x|C1]. If this
holds, then the object belongs to class 2, based on the measurement of
the feature x.
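Writing the rule out explicitly (our notation): Bayes' theorem gives P[Cn|x] = P[x|Cn] P[Cn] / P[x], so the test P[C2|x] > P[C1|x] becomes P[x|C2] P[C2] > P[x|C1] P[C1] once the common factor P[x] is cancelled; with equal priors P[C1] = P[C2] this reduces to the stated condition P[x|C2] > P[x|C1].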
Artificial Neural Networks (ANN): Artificial Neural Networks are relatively
straightforward to design and are capable of dealing with pattern complexities not discernible by other classifiers. During the training phase
we present the input data to the network, calculating its outputs. The
outputs are then compared with the expected results. The weights of the
network are then adjusted. This should force the network to converge
and thus learn how to discriminate varying inputs as they are presented.
Each cycle of presenting the input, calculating the output, comparing
results and changing weights is referred to as an epoch. Unfortunately
the training phase can take a lot of time and we have no guarantee of
The error rate is simply N_miss/N, where N_miss is the number of misclassifications and N is the total number of samples. This method is an almost unbiased estimator for the true error
rate of the classifier, even for a small number of samples. The method is
superior to the confusion matrix for evaluating a system's performance,
although it can be computationally expensive for a large number of cases.
1.4.6 Systems Engineering
The general issues involved in the machine vision engineering task have
been detailed elsewhere (Batchelor & Whelan 1994b, Batchelor & Whelan
1997).
Fig. 1.9. A typical NeatVision application and its associated visual workspace.
Blocks are selected from the component explorer and placed in the
workspace as required. Interconnections are made between blocks following
a strict 45° snap-to-grid rule system. When the processing system is set up,
input images must be specified. This is performed by double clicking on the
relevant input block causing an image viewer frame to appear. The open
option is then selected from the file menu and the appropriate file name is
chosen by navigating the file dialog box directory structure.
There is currently support for 13 input graphics file formats and 11 output
formats (.gif and .fpx are excluded as they are proprietary standards); see Appendix A.
2. Java Fundamentals
The 'Java' name was actually derived from the name of the programmers'
favourite coffee during a brainstorming session (as opposed to the suggested
D++). HotJava is a WWW browser developed by Sun to demonstrate the power
of the Java programming language.
runs like a natively compiled program. The JIT compiler can, in certain
cases, improve the run-time speed of applets and applications by a factor of
10.
2.1.3 The Sun Java Software Development Kit (Java SDK)
Java's standard edition, referred to as Java 2 SDK², is freely available from
http://java.sun.com/. It provides the following key tools:
javac - The Java programming language compiler.
java - The launcher for developed Java applications.
appletviewer - A basic Java enabled browser.
javadoc - Automatic API documentation tool.
jdb - Java debugging tool.
javah - Native development tool for creating 'C' headers.
javap - A class file disassembler.
It also includes a set of advanced tools for security, Remote Method Invocation (RMI) and internationalisation.
The Java SDK 1.2 release also contains the Java Runtime Environment
(JRE), which consists of the JVM, the Java platform core classes and supporting files. It is aimed at developers who wish to distribute their applications
with a runtime environment. This is important as the Java SDK is quite large
and licensing prevents the distribution of the Java SDK. All the examples developed in this book have been compiled and tested with the Java SDK 1.2.2
under Intel x86.
2.2.1 Encapsulation
Encapsulation is one of the major differences between conventional structured programming and OOP. It enables methods and data members to be
concealed within an object, in effect making them inaccessible from outside
of that object. It is good programming practice for a module to hide as much
of its internals as possible from other modules.
Encapsulation can be used to control access to member data, forcing its
retrieval or modification through the object's interface which should always
incorporate some level of error checking. In strict 00 design, an object's
data members should always be private to the object. All other parts of the
program should never have access to that data. You might like to think of
this as a television, with the internal electronics protected by a case on which
controls are available. The controls provide our interface to the television,
while the case prevents unauthorised access (and indeed electrocution).
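A minimal sketch of this idea in Java (class and member names are illustrative):

public class Television {
    private int volume;    // hidden state: inaccessible from outside the object

    public int getVolume() {
        return volume;     // controlled read access through the interface
    }

    public void setVolume(int newVolume) {
        if (newVolume >= 0 && newVolume <= 100) {   // simple error checking
            volume = newVolume;
        }
    }
}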
2.2.2 Classes
Objects are then instantiated from a class, in that they exist in the computer's
memory, with memory space for the data. Instantiation is the creation of an
object from the class description. If we expand the previous class to contain
states and methods then it will have the following form:
public class AnExampleClass {
    private int anExampleIntegerState;
    public void anExampleMethod()
    {
    }
}

2.2.3 Objects

AnExampleClass myExampleClass;            // step 1 - declare the reference variable
myExampleClass = new AnExampleClass();    // step 2 - use the new keyword to store
                                          // the object at the myExampleClass reference
2.2.4 Inheritance
the Vehicle class and can add to or modify these properties. The Lorry class
might add storage properties, or might alter the number of wheels.
The is-a/is-a-part-of relationships are a useful way of viewing the relationship between derived classes and can help clarify the situation when there
is confusion. In Fig. 2.2 we have is-a relationships, such as car is-a vehicle,
lorry is-a vehicle. This is contrary to the is-a-part-of relationship that might
exist in Fig. 2.3. In that case, wheels, body, windows and engine all form an
is-a-part-of relationships with the vehicle class, i.e. body is-a-part-of vehicle.
There are no inheritance relationships occurring, rather the 'parts' may be
other classes, used to construct the vehicle class.
Fig. 2.3. (Figure labels: IS-A-Part-Of; Wheels, Engine, Body, Windows, Vehicle, Car.)
example in Fig. 2.3, the class Car extends Vehicle and the class definition
would take the form of public class Car extends Vehicle. The extends
keyword therefore implies an is-a relationship.
2.2.5 Polymorphism
As just discussed, a derived class inherits its functions or methods from the
base class, often including the associated code. It may be necessary to redefine
an inherited method specifically for one of the derived classes, i.e. alter the
implementation. Polymorphism is the term used to describe the situation
where the same method name is sent to different objects and each object
may respond differently.
For example, consider the class diagram in Fig. 2.2. Suppose a method called draw() was
written for the Car class that displayed a picture of a generic
car; in the derived classes, Saloon and Estate, the inherited draw()
method would not be suitable. Polymorphism allows new methods to be
written in these derived classes, with the exact same method name, draw(),
to display more suitable pictures of these particular cars. Polymorphism has
advantages in that it simplifies the Application Programming Interface (API)
and allows a better level of abstraction to be achieved.
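A sketch of this situation (a console message stands in for the actual drawing code, which is not shown in the text):

class Car {
    public void draw() { System.out.println("drawing a generic car"); }
}

class Saloon extends Car {
    public void draw() { System.out.println("drawing a saloon car"); }   // overrides Car.draw()
}

class Estate extends Car {
    public void draw() { System.out.println("drawing an estate car"); }  // overrides Car.draw()
}

public class PolymorphismExample {
    public static void main(String[] args) {
        Car c = new Saloon();
        c.draw();    // the Saloon version runs: "drawing a saloon car"
    }
}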
Overloading is similar to polymorphism, in that the same method name
is used, but with a different number of arguments or different argument
types. For example:
int sum(int x, int y)
String sum(String x, String y)
This sum method definition would refer to two functions that perform the
very different operations of summing integers or concatenating strings.
2.2.6 Abstract Classes
int aLocalVariable;
}
}
There are no global variables in Java, however, class variables can be used to
communicate global values.
Friendly: Is the default access control level of all states and methods. Unless
modified by public or private, these states and methods are available
throughout the package that contains the class in which they are defined.
Protected: A state or method modified with the keyword protected is available in the class, in its derived classes and in the package that contains
that class.
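For illustration (the class and package names are assumptions), the access levels might appear as follows:

package vision;

public class Sensor {
    public int frameRate;       // public: visible everywhere
    int gain;                   // friendly (default): visible throughout the vision package
    protected int exposure;     // protected: visible in derived classes and in the vision package
    private int serialNumber;   // private: visible only within Sensor itself
}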
2.3.3 Java Types
Some of the primitive types that exist in Java are illustrated in Table 2.1.
There are no cast conversions between integer and Boolean types.
Table 2.1. Java primitive types.

Type      Description
byte      8 bit signed
short     16 bit signed
int       32 bit signed
long      64 bit signed
float     32 bit IEEE 754 floating point number
double    64 bit IEEE 754 floating point number
char      16 bit unicode character
boolean   true or false
Expressions are functional statements that return some form of value, commonly using operators to determine this value. Some of the common operators
are listed in Table 2.2.
Operators in Java widen their operands so as to operate at either 32 bit or
64 bit precision. Java throws (generates) an arithmetic exception if the right
hand operand to an integer division is zero. We can catch such exceptions to
trap run-time errors.
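A minimal sketch of trapping such an exception:

public class DivideExample {
    public static void main(String[] args) {
        int numerator = 10;
        int divisor = 0;
        try {
            System.out.println(numerator / divisor);   // integer division by zero
        } catch (ArithmeticException e) {              // thrown by the runtime system
            System.out.println("Division by zero trapped: " + e.getMessage());
        }
    }
}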
2.3.5 Java Comments
// Single line comment
/* Block comment */
/** Java documentation comment */
Table 2.2. Java operators.

Operator   Description
+          Arithmetic addition
-          Arithmetic subtraction
*          Arithmetic multiplication
/          Arithmetic division
==         Comparison
>=         Greater-than comparison
<=         Less-than comparison
!=         Not equal to
++         Increment
--         Decrement
&&         Logical AND
&          Bitwise AND
||         Logical OR
|          Bitwise OR
<<         Logical shift left
>>         Logical shift right
The third type of comment is useful for the automatic creation of documentation for a Java application. It has certain keywords such as param
to describe the methods in the code. The j avadoc application (part of the
Java SDK) can then be run on your code, generating an automatic structured HTML document describing your code. Possibly the best example of
a javadoc generated document is the Java API document 3 that is associated
with the Java SDK.
/**
 *
 *
 *
 *
 */
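A hypothetical example of such a comment (the method itself is our own illustration):

/**
 * Thresholds a single grey scale pixel value.
 *
 * @param value     the input grey level (0-255)
 * @param threshold the threshold level
 * @return true if value is greater than or equal to threshold
 */
public boolean isAboveThreshold(int value, int threshold) {
    return value >= threshold;
}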
In Java the word this refers to the specific instance of the class and the word
super refers to the super class of the current class.
super.display();
this.x++;
³ http://java.sun.com/products/jdk/1.2/docs/api/
Java arrays are similar to those of the 'C' programming language. They contain a special length member that returns the size of the array. The Java
Virtual Machine constantly checks to see if the index goes out of bounds and
throws an exception if this occurs.
Object objArray[];          // create a reference to an array of Objects
objArray = new Object[5];   // allocate the space for 5 objects
Every class by default (if we do not extend anything) extends the Object class.
It provides methods that are inherited by all Java classes, such as equals(),
getClass() and toString(). Consider passing data across a network. We
could write methods to send Object objects across the network. Since every
class extends Object, we can package an instance up as an Object and extract it on the
receiving end.
Objects of the Class class (called class descriptors) are automatically created and associated with the objects to which they refer. For example, the
getName() and toString() methods return the String containing the name
of the class or interface. This may be used to compare the classes of objects.
2.3.9 Interfaces
// the states
public static final int SomeIntegerState;
To use this interface, or rather to conform to this interface, we can use the
following:
public class myApplet extends Applet implements MyInterface {
    public void someMethod()
    {
        // ...
    }
}
Consider the mouse device. We may write many applications that use the
mouse and within these applications, the mouse is used in different ways by
different components. We can define a common interface for the mouse that
each one of these components must implement. These components then share
a common defined interface.
2.3.10 Packages
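For example, the AWT Button class can be imported in either of the following two ways (a sketch of the two forms discussed below):

import java.awt.Button;   // import only the Button class from the AWT package
import java.awt.*;        // import every class in the java.awt package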
This shows two different ways of importing the Button class, in that we can
either import by explicitly naming the Button class as the specific class we
require to import from the AWT package, or we can import all the classes in
the package by using the wildcard ('*') character, therefore also including the
Button class. Note however, that the use of '*' does not cause sub-packages to
be imported. For example, import java.awt.*; does not import the package
java.awt.image.
If the name of the package is omitted when developing Java applications
then the default package (with no name) is assumed, with the same directory
as the source file under development. If you wish to associate a package name
with your source then the package statement must be the very first thing in
your (.java) source file (e.g. package myPackage;).
System.out.println() refers to the method println() of the field out of
the System class. The string passed to this method is displayed to the standard output stream. The following steps are required to compile this application using Sun's Java SDK:
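Assuming the application class is called HelloWorldApp (the name is illustrative), the source file HelloWorldApp.java is compiled with the javac tool, i.e. javac HelloWorldApp.java, which produces the bytecode file HelloWorldApp.class; the application is then run on the Java Virtual Machine using the java launcher, i.e. java HelloWorldApp.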
2.4.2 Applets
init(): Called when the applet is first initialized. This method is used to set up the display of the applet,
its colours etc.
start(): Called whenever the applet is visited, i.e. when the applet starts
running again. You could use this method to start up an animation that
stops when the applet loses focus (i.e. it is not the currently selected
window).
stop(): Called when the applet is no longer on the screen (e.g. minimized).
destroy(): Called when the applet is discarded. This method should perform
any closing down tasks that you would wish to perform, such as closing
connections to databases, etc.
paint(): This method is called whenever the appearance of the applet is
required to be updated, such as when the applet is being maximized
from a minimized state. It is also possible for the programmer to call this
method. This method does nothing by default.
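A minimal skeleton overriding these methods might look like the following (the class name and method bodies are illustrative):

import java.applet.Applet;
import java.awt.Graphics;

public class LifeCycleApplet extends Applet {
    public void init()    { /* set up colours, components, etc. */ }
    public void start()   { /* resume animations or background threads */ }
    public void stop()    { /* pause work while the applet is off screen */ }
    public void destroy() { /* release resources, close connections, etc. */ }

    public void paint(Graphics g) {
        g.drawString("Applet life cycle demonstration", 20, 20);
    }
}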
When developing for distributable applets you have full access to the
functionality of the AWT but do not have access to the non-core APIs. In
fact, you are developing applets for a virtual machine that only supports a
subset of the true Java specification.
The appletviewer tool supplied with the Sun Java SDK is a compact version of a standard web-browser allowing you to view your applets without
having to run a full WWW browser. It expects a HTML file to be passed
as an argument, i.e. the HTML file that points to the applet you wish to
view using the <applet> tag. The HTML file example.html would contain
an applet tag of the form:
<applet code=HelloWorld.class width=300 height=50></applet>
This main() method basically creates a new Frame instance and places the
applet instance in the centre of the Frame instance (by default, frames have
a 'border layout', so components are added to the "North", "South", "East",
"West" and "Center"). It then calls the initialise ini t () method followed by
the start () method.
// constructor
super();                  // call the Canvas class constructor
addMouseListener(this);   // add the listener to the canvas - NB. required so
                          // that the canvas receives mouse events
public void mouseReleased(MouseEvent e) { }
public void mouseEntered(MouseEvent e) { }
public void mouseExited(MouseEvent e) { }
public void mouseClicked(MouseEvent e) { }
Fig. 2.4. The window from the simple graphics operations code segment.
This code example illustrates the operation of the AWT Canvas class as
well as the use of mouse events. In the example, MyCanvas is the programmer's
implementation of the Canvas class that overrides the paint() method. In
this case we 'draw' a string at location (50,20), followed by a line and an
oval. Before we draw the oval, we set the current colour of the graphics
context to red, using g.setColor(Color.red). Table 2.3 contains some of
the commonly used Graphics class methods.
We use the MouseListener interface to allow us to capture events from the
mouse such as: mouse pressed, mouse released, mouse entered, mouse exited,
mouse clicked and to associate program functionality with these events. In
this case we wish to locate the position of the mousePressed() event. We use
e.getX() and e.getY() to get the x and y location. These values are then set
as class states, so that they can be seen (have scope) in the paint() method.
An addMouseListener(this) call is added to the canvas in the MyCanvas constructor, to allow the canvas to receive events. The only event that we needed
to implement for this application was the mousePressed() event. However,
we still need to provide empty methods for the remainder of the mouse events
to conform to the MouseListener interface.
The PlotApp class provides the application code, creating the display
frame and creating an instance of the MyCanvas class. We set the layout of the
frame to be a BorderLayout, allowing a component added to the centre (in this
case the canvas) to scale to the size of the frame, even if it changes during
the application.
Table 2.3. Basic methods of the Graphics class.

Method            Description
setColor(Color)   Sets the current drawing colour of the graphics context.
the Image object. It has been a part of the Java SDK in the java.awt and
java.awt.image packages since its original release. Some of the methods
associated with the Image class are described in Table 2.4.
Table 2.4. Basic methods of the Image class.

Method                        Description
int getWidth(ImageObserver)

Where the required integer width and height values are passed along with
the 'hints' that define the resampling algorithm to use, e.g. SCALE_FAST,
SCALE_SMOOTH, SCALE_DEFAULT.
public void init()
{
g.drawImage(this.theImage,10,10,this);
}
}
int r = (rgb >> 16) & 0xff;   // obtain the red value
int g = (rgb >> 8) & 0xff;    // obtain the green value
int b = rgb & 0xff;           // obtain the blue value
                              // can ignore the alpha value
The alpha or transparency value is usually set to its maximum value (255) as
images are required to be fully opaque for true display purposes, hence the
alpha value can usually be ignored (set to 255) when processing an image. For
grey scale images, the values for each of the colour planes will be the same,
hence only the value for one of the colour planes need be extracted in order
to obtain the grey intensity value for that pixel. This approach increases the
41
speed at which an image can be processed. For instance, in the last code
segment, we need only extract the red value of the returned rgb value to
determine the green and blue values. The basic AWT imaging model is a
'push' (producer/consumer) model and is discussed in Sec. 2.5.3.
2.5.3 Image Producers and Image Consumers
and is illustrated in Fig. 2.5. The offset represents the location in the array
at which the first pixel is stored. The size of the array (n) needed to store the image
area of interest must be at least ((height - 1) × scansize) + offset + width.
Fig. 2.5. (Figure labels: unscaled image, image area of interest, pixels[], width, height, offset, scansize.)
When a pixel I(i, j) is grabbed, it is placed in the resulting array at pixels[(j - y) × scansize + (i - x) + offset]. Fig. 2.6 illustrates two different cases for
the choice of the scansize parameter. In Fig. 2.6(a), the scansize is chosen
to be the same width as the sub-image to be converted to a data array. This
gives an array of size: (width × height) + offset. In Fig. 2.6(b), the scansize
is set as the width of the original image. This causes the array to contain zero
values (transparent black pixels) in the space defined by the offset that does
not belong to the sub-image. It is important to note the offset when you are
again constructing an image from the pixel array.
int alpha = (pixel >> 24) & 0xff;   // alpha
int red   = (pixel >> 16) & 0xff;   // red
int green = (pixel >> 8) & 0xff;    // green
int blue  = pixel & 0xff;           // blue
Fig. 2.6. Illustration of the use of scansize when using the PixelGrabber class.
(a) When scansize = width. (b) When scansize is set the same as the full image
width.
Once the pixel information is in array form, it can be processed using
established image processing algorithms (such as those discussed throughout
the book). After processing an image array it is then required to reconstruct
a new Image object for display purposes.
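A sketch of the grabbing step, assuming an already loaded Image referenced by source (the helper method is our own):

import java.awt.Image;
import java.awt.image.PixelGrabber;

public class GrabExample {
    // Returns the image pixels, in the default RGB colour model, as a 1-D array
    // with offset 0 and scansize equal to the image width.
    static int[] grabPixels(Image source) throws InterruptedException {
        int width = source.getWidth(null);
        int height = source.getHeight(null);
        int[] pixels = new int[width * height];
        PixelGrabber grabber =
                new PixelGrabber(source, 0, 0, width, height, pixels, 0, width);
        grabber.grabPixels();   // blocks until the producer has delivered every pixel
        return pixels;
    }
}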
In the last code example we used an infinite loop that checked to see if
the image had loaded completely, as Image.getHeight(this)⁴ will only
return a positive value when the image is fully loaded. We could also use the
MediaTracker class, which can monitor the status of loading images and notify
the applet when complete. An example of the use of the MediaTracker class
is outlined in Sec. 2.8.2.
2.5.5 Recreating an Image Object from an Array of Pixels
The reconstruction of an Image object from its representative pixel information is implemented using two Java classes. The first of these is the MemoryImageSource class, where an object of this class is created using a reference
to an array of pixels and information about the dimensions of the image to
be constructed. This class implements ImageProducer. The ImageProducer
is used by the createImage() method of the Toolkit class to create a new
Image object from the referenced pixel information. All the components are
now in place for the implementation of the image processing loop, which allows the development of a complex image processing class while keeping the
interface as simple as possible. The interface in many cases may consist of
a setImage() and getImage() method pair, with the underlying complexity
hidden from the user. The MemoryImageSource constructor used to
convert an array of integers in the default RGB colour model to an Image object
has the form:
MemoryImageSource(int width, int height, int[] pix,
                  int offset, int scansize)
Problems have been reported when using this approach under Solaris; in this case
the MediaTracker class must be used.
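A minimal sketch of this reconstruction step (the variable names are illustrative only) is:

import java.awt.Image;
import java.awt.Toolkit;
import java.awt.image.MemoryImageSource;

// pixels[] holds width x height values in the default RGB colour model
MemoryImageSource source =
    new MemoryImageSource(width, height, pixels, 0, width);
Image processedImage = Toolkit.getDefaultToolkit().createImage(source);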
processing block appears to its neighbours as a 'black-box'. The only information known about this 'black-box' is the number of inputs and outputs it
has.
This class is the template for all classes that encapsulate methods for translating from pixel values to their alpha, red, green and blue components. In
the Java default RGB colour model, each pixel is described by a 32 bit integer which can be broken up into four 8 bit values for the red, green, blue and
transparency (alpha) planes. There are two main subclasses of the
ColorModel class, the DirectColorModel and the IndexColorModel.
DirectColorModel. This class is not particularly useful, as it is very similar
in structure to the Default RGB colour model, where the number of colours
available to an image can be specified. For example, using 3 bit masks for
each colour there could be 8 possibilities for each colour plane and hence 512
possible pixel values.
IndexColorModel. The IndexColorModel class provides an extremely fast
and efficient method of implementing LUT or palette processing algorithms.
The IndexColorModel is useful in dealing with grey scale images, since the
maximum number of colours in the model is 256, making it ideal for dealing
with an 8 bit grey scale image. The fast execution of image processing algorithms on large images using the IndexColorModel is due to the fact that
only the LUT is operated on, hence only a maximum of 256 operations must
be carried out.
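As a rough sketch of how such a LUT operation might be set up with this class (the inverting LUT and the greyPixels variable are assumptions for illustration only):

import java.awt.Image;
import java.awt.Toolkit;
import java.awt.image.IndexColorModel;
import java.awt.image.MemoryImageSource;

// Build a 256 entry grey scale palette that inverts the image.
byte[] lut = new byte[256];
for (int i = 0; i < 256; i++)
    lut[i] = (byte) (255 - i);               // only the 256 LUT entries are modified
IndexColorModel icm = new IndexColorModel(8, 256, lut, lut, lut);

// greyPixels[] holds one byte index per pixel; the LUT supplies the colours.
Image inverted = Toolkit.getDefaultToolkit().createImage(
        new MemoryImageSource(width, height, icm, greyPixels, 0, width));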
2.6.2 ImageFilter
This class is the template for all classes that are required to process the data
delivered from an ImageProducer object to an ImageConsumer object. The
normal image filter implemented by this class is the 'null filter', which has no
effect on the data passing through it. Functional image filters should extend
this class and override its methods. The ImageFilter class and its subclasses
should be used in conjunction with the FilteredImageSource class to produce
filtered versions of existing image objects.
2.6.3 CropImageFilter
The CropImageFilter is used for cropping images. This class extends
the basic ImageFilter class to extract a specific rectangular region from a
given image. The result of this operation is a new image containing only the
specified region. Other possible uses for this class include the development
of image magnification utilities, where the CropImageFilter is used with the
drawImage() method of the Graphics class to display a magnified region of
a source image.
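A minimal sketch of the idea (the region coordinates and the sourceImage variable are assumptions for illustration):

import java.awt.Image;
import java.awt.Toolkit;
import java.awt.image.CropImageFilter;
import java.awt.image.FilteredImageSource;
import java.awt.image.ImageFilter;

// Extract the 100 x 100 region whose top-left corner is at (20, 30).
ImageFilter cropFilter = new CropImageFilter(20, 30, 100, 100);
Image cropped = Toolkit.getDefaultToolkit().createImage(
        new FilteredImageSource(sourceImage.getSource(), cropFilter));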
2.6.4 RGBImageFilter
This class provides a simple method for creating an image filter that modifies the pixels of the original image by converting them one at a time using
the default RGB colour model. Objects of this class are to be used in conjunction with a filtered image source object to produce filtered versions of
existing images. The type of filter implemented by this class should perform
monadic image operations (although alterations can be made to inherited
versions of this class which make it capable of performing neighbourhood operations). The following code segment illustrates one such monadic example,
the threshold image filter:
import java.awt.image.RGBImageFilter;

class ThresholdFilter extends RGBImageFilter {
    private int nThresholdValue = 128;

    public ThresholdFilter()
    {
        canFilterIndexColorModel = true;
    }

    public ThresholdFilter(int nThresholdSet)
    {
        nThresholdValue = nThresholdSet;
        canFilterIndexColorModel = true;
    }

    public int filterRGB(int x, int y, int rgb)
    {
        int r = (rgb >> 16) & 0xff;
        int g = (rgb >> 8) & 0xff;
        int b = (rgb >> 0) & 0xff;
        int intensity = (int) (r + g + b)/3;
        if (intensity < nThresholdValue)
            return 0xff000000;   // black
        else
            return 0xffffffff;   // white
    }
}
This filter has two constructors. The first one assumes a threshold value of
128 (i.e. half way to 255) and the second constructor allows the user to specify
a threshold value in the range 0 to 255. To use this filter:
ThresholdFilter filter = new ThresholdFilter(100);
ImageProducer p = new FilteredImageSource(this.theImage.getSource(), filter);
this.theImage = Toolkit.getDefaultToolkit().createImage(p);
2.6.5 FilteredIrnageSource
// ...
this.addMouseMotionListener(this);
}

// ...
Dimension curScreenDimension = this.getSize();
if (offScreenGraphics==null)
{
    // ...
}

// ...
this.mouseX = e.getX();
this.mouseY = e.getY();
this.repaint();
}
}

/* do nothing */
}
graphics and imaging. It adds support for a persistent image object, advanced image filters and additional colour models and image formats. The
Java 2D API also introduces Renderable and Rendered interfaces, to provide
resolution-independent image rendering support. Using these interfaces, images can be 'pulled' through a sequence of image processing operations, with
the resolution selected using a rendering context.
An unrendered source image is an ideal image, with no defined size, which
may be projected onto any surface at any resolution. Operators (image filters) can change the form of the image, independent of the destination, as
the image is requested by the display, providing the optimal rendering quality for that display. This Java 2D Immediate Mode Model (java.awt.image)
provides support for:
Raster class: Represents an array of pixels, bound by a rectangle, that need
not necessarily start at (0,0). A Raster has a SampleModel that describes
how the pixels are stored in the corresponding DataBuffer that stores
the image.
BufferedImage class: Used to describe an image with an available buffer of
image data. It contains a ColorModel and a Raster of image data.
RasterOp interface: Describes a single input, single output operation that is
performed on Raster objects. The interface cannot be used to describe
multiple input operations, and the source and destination objects must
contain the correct number of bands for the class implementing this interface.
BufferedImageFilter class: Provides a simple way of using a single source,
single destination image filter on a BufferedImage in the producer - filter
- consumer model.
BufferedImageOp interface: Describes a single input, single output operation that is performed on BufferedImage objects. Objects created using
this interface can be passed into a BufferedImageFilter to operate on a
BufferedImage following the producer - filter - consumer model.
2.8.2 Working with the Java 2D API
The following Java 2D API code segment illustrates how the BufferedImage
class can be used within a filtering application (Fig. 2.7). The testFilter()
method receives a BufferedImage object and returns a BufferedImage object
after a convolution operation (Sec. 3.2). The kernel that is used is a Laplacian and performs a form of edge detection on the image. The loadImage()
method shows how the MediaTracker class can be used to obtain a complete
image and convert the Image object into a BufferedImage object. The main()
method generates a simple image viewer for viewing BufferedImage objects.
The JFrame class (javax.swing.JFrame) used here is an extended version
of java.awt.Frame, a component class of the Swing package. The Swing set
provides a large set of lightweight (pure Java) components that work in the
same manner on all platforms. They add tables, sliders, progress bars, scroll
panes and many more components to Java.
import java.awt.*;
import java.awt.image.*;
import javax.swing.*;   // Use Swing JFrame and ImageIcon classes
import java.net.URL;
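The full listing is not reproduced here; the following is a minimal sketch of how such a testFilter() method could be written using the Java 2D ConvolveOp class (the 3 x 3 Laplacian coefficients shown are one common choice and are an assumption, not the book's exact kernel):

public static BufferedImage testFilter(BufferedImage src)
{
    // 3 x 3 Laplacian convolution kernel (a form of edge detection)
    float[] laplacian = {
         0f, -1f,  0f,
        -1f,  4f, -1f,
         0f, -1f,  0f
    };
    ConvolveOp op = new ConvolveOp(new Kernel(3, 3, laplacian));
    return op.filter(src, null);   // null: let the operator allocate the result
}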
Fig. 2.7. Laplacian edge detection using the Java 2D API. (a) Laboratory input
image. (b) The output image illustrating the filtering operation.
The Java 2D API can also be used quite successfully for graphical applications. The following code example illustrates how the graphics context
may be obtained for such an application.
This Java 2D API example also illustrates the use of anti-aliasing. The
setRenderingHint() method is used to add this anti-aliasing property to
the graphics context. The resulting effect is shown in Fig. 2.8. There is a
computational cost associated with the use of anti-aliasing, so it should not
be used by default. It is generally used when quality rendering is required.
Fig. 2.8. Sample Java 2D API graphical application output. (a) No anti-aliasing
employed. (b) Anti-aliasing employed.
Aliasing is due to the discrete nature of digital images. The effect can be seen
as discontinuities ('jaggies') on a line as it is sampled over the image. To reduce
this effect anti-aliasing algorithms (smoothing) can be used. Alternatively the
resolution of the sampled image can be increased.
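A minimal sketch of obtaining the Java 2D graphics context inside a paint() method and enabling this hint (the shape drawn is purely illustrative):

public void paint(Graphics g)
{
    Graphics2D g2 = (Graphics2D) g;   // the Java 2D graphics context
    g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                        RenderingHints.VALUE_ANTIALIAS_ON);
    g2.draw(new java.awt.geom.Ellipse2D.Double(10, 10, 120, 80));
}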
The Java AWT supports two imaging layers, renderable and rendered. The
renderable layer is rendering-independent, where operators are available that
use rendering-independent parameters. When a rendering is requested, the
request passes from the destination to the source. The rendered layer is an
image that has been rendered for a particular context. The renderable layer is
often preferred, as it is lightweight, with alterations to the renderable operators (RenderableImageOp) being editable without the need for pixel computation. These renderable operators can be connected to form a chain and are
connected to a RenderableImage source. A RenderableImage source can be
given a RenderContext to produce a RenderedImage. This RenderContext is
used to define the mapping between the user defined coordinate system of the
RenderableImage and the coordinate space for the desired device dependent
RenderedImage.
A ParameterBlock is used to contain the parameters and sources that
are required by a specific RenderableImageOp. The parameters in the chain
can be altered by using the setParameterBlock() method, affecting future
RenderedImages obtained from this chain.
The Java AWT API also provides support for CMY (cyan, magenta and
yellow) and CMYK (cyan, magenta, yellow and black) image pixel sampling.
As discussed previously, a DataBuffer is used for raw image storage, but is
not concerned with how the pixel information is organised. The SampleModel
class describes how the pixel samples are laid out within an image. A Raster
contains a DataBuffer and a SampleModel.
2.8.4 Java Advanced Imaging API (JAI)
The Java Advanced Imaging API (JAI) was developed for digital image processing applications in Java by Sun, Siemens, Kodak and Autometric Inc.
with the aim of retaining the flexibility of Java applets and applications (including the runtime environment) while allowing the use of advanced imaging techniques.
Retaining this flexibility involved developing a cross-platform, object-oriented,
device-independent API. The API supports numerous image formats and
imaging algorithms, while still working well with the other Java core APIs. The
JAI supports client-server imaging (Java RMI and JavaBeans). It also links
to Intel MMX libraries and Sun VIS hardware optimisations.
There are numerous uses for the JAI in image processing and analysis.
The Java Advanced Imaging (JAI) API was developed to support image
processing under the Java language. As yet, it does not provide the large
collection of machine vision techniques available in Neat Vision, rather it provides support for the generation of any image processing application using
a carefully defined programming model. The JAI supports many image formats and advanced imaging techniques while encapsulating the complexities
involved.
The JAI API, coupled with the Java programming language, provides many
advantages as the choice of platform for machine vision application development, including:
Object-oriented: The JAI retains the object-oriented structure of Java, implementing images and image filters as objects.
Extensible: Any image processing operation developed using JAI can be
added to the API.
Cross Platform: An application developed using the JAI can be deployed to
any platform with a Java Virtual Machine.
Rendering-independent sources and operations: To allow processing operations to be specified in resolution-independent coordinates.
Rendering contexts: To avoid defining image and operation characteristics
until execution.
Editable graphs of image processing operations: To be defined and instantiated as required.
Multi-dimensional, multi-banded image data types: Including common Image data types.
Tiled images: Supported to allow efficient processing of large images.
Deferred execution: Supported to allow operations on only the pixels that
are required.
Operation chain optimisation: To allow a sequence of operations to be combined efficiently in a single operation.
The JAI API provides support for the Java AWT and Java 2D imaging models. It allows JAI operators to be applied directly to Java 2D
BufferedImage objects. The JAI API has support for the 'pull' imaging
model, allowing operations to be performed on requested areas of an image
source. The operation of the JAI API is illustrated in Fig. 2.9. The chain of
operations in JAI is referred to as a graph. JAI supports Rendered Graphs
and Renderable Graphs. Rendered Graphs allow direct modification of pixel
values, processing in immediate mode, so when the chain of operations is altered (e.g. by adding a new operation), it immediately computes a new result.
Renderable Graphs on the other hand are evaluated when there is a request
for rendering, known as deferred execution (the 'pull' model). Rendered and
Renderable graphs are editable, however once rendering takes place at a node
in a Rendered graph, it is 'locked' and parameters or sources may not be altered. Fig. 2.9 also illustrates a remote mode render option. This remote
mode is based on RMI (Remote Method Invocation), allowing the client to
call a method for server-side image processing.
Fig. 2.9. The JAI imaging model: an image source (image file, network image or generated image), a chain of operations, and a render step to the destination in Rendered (immediate mode), Renderable (deferred mode) or Remote (remote mode) form.
import javax.media.jai.*;
import java.awt.image.renderable.ParameterBlock;
import javax.media.jai.widget.ScrollingImagePanel;
import com.sun.media.jai.codec.FileSeekableStream;
import java.awt.*;
import java.io.*;
RenderedOp theImage;
theImage = LoadImage();
if (theImage!=null)
{
This example should be compiled using the Sun Java SDK, ensuring that
you have obtained the JAI API from Sun and added the JAI libraries to
your CLASSPATH environment variable (see Sec. 2.4.1 for details on the
compilation procedure).
This example uses the RenderedOp class. JAI supports both RenderableOp and RenderedOp classes. The RenderableOp class is a representation
of an operation in a Renderable graph. The RenderedOp class is used in a
Rendered graph and stores the ParameterBlock and RenderingHints for that
operation.
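The original listing is not reproduced in full here; a minimal sketch of loading an image and applying a Sobel gradient magnitude with JAI might look as follows (the file name is illustrative only):

// Load the source image as a RenderedOp node in a Rendered graph.
RenderedOp src = JAI.create("fileload", "corridor.jpg");

// Sobel edge detection via the "gradientmagnitude" operation.
ParameterBlock pb = new ParameterBlock();
pb.addSource(src);
pb.add(KernelJAI.GRADIENT_MASK_SOBEL_HORIZONTAL);
pb.add(KernelJAI.GRADIENT_MASK_SOBEL_VERTICAL);
RenderedOp edges = JAI.create("gradientmagnitude", pb);

// Display the result on a scrolling canvas.
ScrollingImagePanel panel =
    new ScrollingImagePanel(edges, edges.getWidth(), edges.getHeight());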
Fig. 2.10. Sobel edge detection using JAI. (a) The LoadFile dialog box. (b) The
output canvas.
For further information, refer to the following web sites (there are also several news groups available
under the general heading comp.lang.java):
Java Software Home Page (java.sun.com). Contains comprehensive documentation on the APIs mentioned in this chapter.
Java API Documentation (java.sun.com/products/jdk/1.2/docs/api).
This links to an on-line version of the API documentation. It is advisable
to download a local version to your machine if you wish to do further
programming.
Other Java Resources (www.developer.com). This is a good general resource site, with articles and examples related to Java programming.
The Java 2D API site contains documentation and examples specifically
for the Java 2D API (java.sun.com/products/java-media/2D/index.html).
The Java Advanced Imaging API (java.sun.com/products/java-media/jai). Similar to the 2D API site, it provides documentation for programming with the JAI API.
Object-oriented Programming - Module EE553 (www.racee.ie).
Java Tutorials (developer.java.sun.com/developer/onlineTraining/Programming). This link refers to introductory tutorials in the Java programming language.
The Swing Connection (java.sun.com/products/jfc/tsc). For additional information on the Swing components mentioned briefly in this chapter.
2.10 Conclusion
This chapter discussed a framework for developing machine vision algorithms
using the Java programming language. The remainder of the book will concentrate on discussing machine vision algorithms and techniques using Java
as a means of illustrating their operation. Details relating to the design of the
NeatVision integrated image analysis and software development environment
(also developed in Java) will be discussed in Chap. 7.
The purpose of this chapter is to outline some of the basic techniques and
algorithms used in the development of machine vision systems. The key elements of the image processing and analysis functions introduced in this
section are also implemented in Java and form the basis of the NeatVision
visual programming environment (Chap. 7). Elements of this chapter have
previously appeared in Chap. 2 of Intelligent Vision Systems for Industry
(Batchelor & Whelan 1997).
a = input.getxy(x-1,y-1); b = input.getxy(x,y-1);
c = input.getxy(x+1,y-1); d = input.getxy(x-1,y);
e = input.getxy(x,y);
f = input.getxy(x+1,y);   g = input.getxy(x-1,y+1);
h = input.getxy(x,y+1);   i = input.getxy(x+1,y+1);
N(x, y) forms a 3 x 3 set of pixels and is referred to as the 3 x 3 neighbourhood of (x, y). Many of the operators outlined are based on manipulating
pixels in a 3 x 3 or 5 x 5 (as illustrated below) neighbourhood.
// 5 x 5 neighbourhood of the pixel at (xc, yc)
a = input.getxy(xc-2,yc-2); b = input.getxy(xc-1,yc-2);
c = input.getxy(xc,yc-2);   d = input.getxy(xc+1,yc-2);
e = input.getxy(xc+2,yc-2); f = input.getxy(xc-2,yc-1);
g = input.getxy(xc-1,yc-1); h = input.getxy(xc,yc-1);
i = input.getxy(xc+1,yc-1); j = input.getxy(xc+2,yc-1);
k = input.getxy(xc-2,yc);   l = input.getxy(xc-1,yc);
m = input.getxy(xc,yc);     n = input.getxy(xc+1,yc);
o = input.getxy(xc+2,yc);   p = input.getxy(xc-2,yc+1);
q = input.getxy(xc-1,yc+1); r = input.getxy(xc,yc+1);
s = input.getxy(xc+1,yc+1); t = input.getxy(xc+2,yc+1);
u = input.getxy(xc-2,yc+2); v = input.getxy(xc-1,yc+2);
w = input.getxy(xc,yc+2);   x = input.getxy(xc+1,yc+2);
y = input.getxy(xc+2,yc+2);
Intensity Shift. Add a constant offset to a pixel intensity value. This definition was carefully designed to maintain c(x, y) within the same range as
the input, [0, W]. This is an example of a process referred to as intensity normalisation (i.e. it permits iterative processing in a machine having a limited
precision for arithmetic). Normalisation will be used frequently throughout
this chapter.
c(x, y) ⇐ 0 if a(x, y) + k < 0; a(x, y) + k if 0 ≤ a(x, y) + k ≤ W; W if W < a(x, y) + k
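A sketch of this operator in the style of the other point operators in this chapter (k is the user defined offset; the clamping follows the definition above):

for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        int z = input.getxy(x,y) + k;
        if (z < BLACK) z = BLACK;   // clamp to the valid range [0, W]
        if (z > WHITE) z = WHITE;
        output.setxy(x,y,z);
    }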
c(x, y) ⇐ 0 if a(x, y) x k < 0; a(x, y) x k if 0 ≤ a(x, y) x k ≤ W; W if W < a(x, y) x k
Fig. 3.2. Negate operation. (a) Input image. (b) Negate operator applied to (a).

Square. c(x, y) ⇐ (a(x, y))²/W. Square the pixel intensities and normalise the result back into the range [0, W]:
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,(input.getxy(x,y)*input.getxy(x,y))/WHITE);
c(x, y) ⇐ W if a(x, y) ≥ thres; 0 otherwise
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        if(input.getxy(x,y) < thres)
            output.setxy(x,y,BLACK);
        else
            output.setxy(x,y,WHITE);
Fig. 3.3. Under and over thresholding of the bottle-top image illustrated in
Fig. 3.2(a).
Mid-level Threshold. Threshold midway between the minimum and maximum grey level (i.e. at the grey scale value MIDGREY).
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        if(input.getxy(x,y) >= MIDGREY)
            output.setxy(x,y,WHITE);
        else
            output.setxy(x,y,BLACK);
Dual Threshold. Values less than a user defined integer value lower and
greater than the user defined integer value upper are set to BLACK. Otherwise the pixel is set to WHITE (Fig. 3.4).
c(x, y) ⇐ W if lower ≤ a(x, y) ≤ upper; 0 otherwise

for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        if((input.getxy(x,y)>=lower)&&(input.getxy(x,y)<=upper))
            output.setxy(x,y,WHITE);
        else
            output.setxy(x,y,BLACK);
Triple Threshold. Values between BLACK and the user defined integer
value lower are set to BLACK. Values between the user defined integer value
upper and WHITE are set to WHITE. Otherwise the pixel is set to MIDGREY
(midway between the minimum and maximum grey level).
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        if((input.getxy(x,y)>=BLACK)&&(input.getxy(x,y)<=lower))
            output.setxy(x,y,BLACK);
        if((input.getxy(x,y)>=lower)&&(input.getxy(x,y)<=upper))
            output.setxy(x,y,MIDGREY);
        if((input.getxy(x,y)>=upper)&&(input.getxy(x,y)<=WHITE))
            output.setxy(x,y,WHITE);
    }
s(p, x, y) = 1 if a(x, y) = p, and 0 otherwise. The histogram is then h(p) = Σ_{x,y} s(p, x, y).
It is not necessary to store each of the s(p, x, y), since the calculation of the
histogram can be performed as a serial process in which the estimate of h(p)
is updated iteratively as we scan through the input image. The cumulative
histogram, H(p), can be calculated using the following recursive relation:
H(p) = H(p - 1) + h(p), where H(O) = h(O).
Both the cumulative (Fig. 3.5) and the standard histograms have a number of uses, as will become apparent later. It is possible to calculate various
intensity levels which indicate the occupancy of the intensity range. For example, it is a simple matter to determine that intensity level, p(k), which when
Fig. 3.5. Using the cumulative histogram to find the number of pixels in an image.
N is the total number of pixels in the image, Nt1 the total number of pixels
with a grey level ≤ t1. The total number of white pixels in the image after applying
the threshold t1 is N − Nt1.
Exponential LUT. The LUT is modified such that the output grey levels
are the normalised exponential of the input image.
for(int index=BLACK;index<=WHITE;index++)
{
    intensity = (float) index/WHITE;
    intensity = (float) ((Math.exp(intensity)-1)/(Math.exp(1.0)-1));
    intensity = WHITE*intensity;
    LUT[index] = (int) intensity;
}
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,LUT[input.getxy(x,y)]);
Logarithmic LUT. The LUT is modified such that the output grey levels
are the normalised logarithm of the input image. This is useful in brightening
dark detail in images with a large dynamic range.

for(int index=BLACK;index<=WHITE;index++)
{
    intensity = (float) index/WHITE;
    intensity = (float) Math.log((intensity*(Math.exp(1)-1))+1);
    intensity = WHITE*intensity;
    LUT[index] = (int) intensity;
}
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,LUT[input.getxy(x,y)]);
nth Power LUT. This command modifies the LUT such that the output
grey levels are the current image pixel values to the power of a user defined
double value n. For example if n = 2.0, it is equivalent to squaring the grey
levels and normalising (Fig. 3.7).
for(int index=BLACK;index<=WHITE;index++)
{
    intensity = (double) index/WHITE;
    intensity = (double) Math.pow(intensity,n);
    intensity = intensity*WHITE;
    LUT[index] = (int) intensity;
}
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,LUT[input.getxy(x,y)]);
SHADES = 256;

// HISTOGRAM
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        histogram[input.getxy(x,y)]++;

// CUMULATIVE HISTOGRAM
for(int i=1; i<=WHITE; i++)
    histogram[i] += histogram[i-1];

// TOTAL PIXEL COUNT (assumed: used here to normalise the cumulative histogram into the LUT)
total = width*height;
for(int i=BLACK; i<=WHITE; i++)
    LUT[i] = (histogram[i]*WHITE)/total;

// APPLY THE LUT
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,LUT[input.getxy(x,y)]);
Fig. 3.7. Look-up tables. (a) X-ray image of food jar. (Courtesy of M. Graves,
Spectral Fusion Technologies Ltd., UK). (b) nth power LUT applied to the x-ray
image, where n = 2. This is equivalent to squaring the grey levels and normalising.
Local Histogram Equalisation. This operation relies upon the application of histogram equalisation within a small local area or window. The number of pixels in a small window that are darker than the central pixel are
counted. This number defines the intensity at the equivalent point in the
output image. This is a powerful filtering technique, which is particularly
useful in texture analysis applications (Chap. 5). Fig. 3.8 illustrates the use
of local histogram equalisation for 3 x 3 and 5 x 5 regions.
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        count_pixel = 0;

        a = input.getxy(x-1,y-1); b = input.getxy(x,y-1);
        c = input.getxy(x+1,y-1); d = input.getxy(x-1,y);
        e = input.getxy(x,y);     f = input.getxy(x+1,y);
        g = input.getxy(x-1,y+1); h = input.getxy(x,y+1);
        i = input.getxy(x+1,y+1);
if(a < e) count_pixel++;
if(b < e) count_pixel++;
if(c < e) count_pixel++;
if(d < e) count_pixel++;
if(f < e) count_pixel++;
if(g < e) count_pixel++;
if(h < e) count_pixel++;
if(i < e) count_pixel++;
output.setxy(x,y,count_pixel);
}
Fig. 3.8. Local histogram equalisation. (a) Input texture image. (b) 3 x 3 region.
(c) 5 x 5 region.
Contrast Enhancement. This operator modifies the LUT so that the output grey levels occupy the entire grey scale range from BLACK to WHITE
(Fig. 3.9). This is performed by assessing the maximum and minimum grey
levels in an image and stretching these levels over the range BLACK to
WHITE. This is particularly useful in low contrast images to increase the
apparent dynamic range (Fig. 3.10).
min = WHITE; max = BLACK;
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        current_pixel = input.getxy(x,y);
        if (current_pixel < min)
            min = current_pixel;
        if (current_pixel > max)
            max = current_pixel;
    }
Fig. 3.9. Stretching a histogram over the full range of grey scales to improve image
contrast. A possible threshold value is also indicated.
Contrast Stretching. This operator modifies the LUT so that the grey
levels in the user defined integer range min to max are mapped to the range
BLACK to WHITE. This is useful in examining details occurring in a small
part of the grey scale range. This is similar to contrast enhancement, except
that the range of levels to be enhanced are user defined and not automatically
generated.
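A sketch in the style of the LUT operators above (min and max are the user defined limits; values outside the mapped range are clamped):

for(int index=BLACK;index<=WHITE;index++)
{
    int level = ((index - min)*WHITE)/(max - min);
    if (level < BLACK) level = BLACK;
    if (level > WHITE) level = WHITE;
    LUT[index] = level;
}
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,LUT[input.getxy(x,y)]);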
3.1.4 Dyadic, Point-by-point Operators
These have a characteristic equation of the form: c(x, y) ⇐ h(a(x, y), b(x, y)).
It is important to realise that c(x, y) depends upon only a(x, y) and b(x, y)
(Fig. 3.11). Each of the operations described in this section determines which
input image has the greatest width (height) and uses that width (height) for
subsequent calculations (as outlined below).
if(inputA.getWidth() > inputB.getWidth())
    width = inputA.getWidth();
else
    width = inputB.getWidth();

if(inputA.getHeight() > inputB.getHeight())
    height = inputA.getHeight();
else
    height = inputB.getHeight();
Fig. 3.10. Images and their histograms. (a) Original image and its associated
histogram (b). (c) Contrast enhanced image and its associated histogram (d). (e)
Equalised image and its associated histogram (f).
c(x, y) ⇐ (a(x, y) + b(x, y))/2

for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,(inputA.getxy(x,y)+inputB.getxy(x,y))>>1);
Maximum. c(x, y) ⇐ MAX[a(x, y), b(x, y)]. When applied to a pair of binary images, the union (OR function) of their white areas is computed. This
function may also be used to superimpose white writing onto a grey scale
image.
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        if(inputA.getxy(x,y)>inputB.getxy(x,y))
            output.setxy(x,y,inputA.getxy(x,y));
        else
            output.setxy(x,y,inputB.getxy(x,y));
Minimum. c(x, y) ⇐ MIN[a(x, y), b(x, y)]. This operator finds the intersection (AND function) of the white areas of a pair of binary images. See
Fig. 3.12 for an example of this operator applied to two grey scale images.

for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        if(inputA.getxy(x,y)<inputB.getxy(x,y))
            output.setxy(x,y,inputA.getxy(x,y));
        else
            output.setxy(x,y,inputB.getxy(x,y));
Fig. 3.12. Minimum operator. (a) Input image A. (b) Input image B. (c) Minimum
of images A and B.
z = 0;
for(int my=0; my<maskSize; my++)
    for(int mx=0; mx<maskSize; mx++)
        if (mask[mx+(my*maskSize)] != null)
            z += input.getxy(x+mx-maskOffset,y+my-maskOffset)
                 *mask[mx+(my*maskSize)].intValue();

if (sum == 0) z = MIDGREY+(z>>1);
if (sum > 1)  z = z/sum;
if (sum < 0)  z = WHITE-(z/sum);
output.setxy(x,y,flow(z));
}
for the normalisation constants, k1 and k2 are given later. The weight values
form a weight matrix which determines the properties of the linear local
operator. For example, in Fig. 3.14, we are using the 3 x 3 local operators
outlined below to highlight the horizontal and vertical elements within an
image.
    [ -1 -1 -1 ]            [ -1  0  1 ]
    [  0  0  0 ]    and     [ -1  0  1 ]
    [  1  1  1 ]            [ -1  0  1 ]
Fig. 3.14. Linear Local Operators: Highlighting horizontal and vertical lines.
The following rules summarize the behaviour of this type of operator (they
exclude the case where all the weights and normalization constants are zero,
since this would result in a null image):
If all weights are either positive or zero, the operator will blur the input
image. Blurring is referred to as low-pass filtering. Subtracting a blurred
image from the original results in a highlighting of those points where the
intensity is changing rapidly and is termed high-pass filtering.
If W1 = W2 = W3 = W7 = W8 = W9 = 0 and W4, W5, W6 > 0, then
the operator blurs along the rows of the image; horizontal features, such
as edges and streaks, are not affected.
If W1 = W4 = W7 = W3 = W6 = W9 = 0 and W2, W5, W8 > 0, then the
operator blurs along the columns of the image; vertical features are not
affected.
If W2 = W3 = W4 = W6 = W7 = W8 = 0 and W1, W5, W9 > 0, then the
operator blurs along the diagonal (top-left to bottom-right). There is no
smearing along the orthogonal diagonal.
k1 ⇐ 1 / (Σp,q |Wp,q|)                                           (3.1)

k2 ⇐ (W/2) x [1 − (Σp,q Wp,q)/(Σp,q |Wp,q|)]                     (3.2)
Rectangular Average Filter. This command performs a rectangular average filter, of size (2k+1) x (2k+1) (Fig. 3.15). It is equivalent to a convolution
of that size in which all the coefficients are ones. This is effectively the same
as the Gaussian (Sec. 3.2.3), except that no multiplications are required (as
all coefficients are one) and therefore it is quicker than the Gaussian. When
dealing with such large filters the designer must account for the resultant
edge effects (Sec. 3.2.5).
A filter using an 11 x 11 weight matrix performs a local averaging function.
This produces quite a severe 2-directional blurring effect. Subtracting the
effects of a blurring operation from the original image generates a picture in
which spots, streaks and intensity steps are all emphasised (Fig. 3.15(d)). On
the other hand, large areas of constant or slowly changing intensity become
uniformly grey.
k = size;
l = (2*k)+1;
offset = k+1;

// horizontal pass: average each row into buffer0
for(int y=offset;y<height-offset;y++)
    for(int x=offset;x<width-offset;x++)
    {
        z = 0;
        for(int i=-k; i<=k; i++)
            z += input.getxy(x+i,y);
        buffer0.setxy(x,y,z/l);
    }

// vertical pass: average each column of buffer0 into the output
for(int x=offset;x<width-offset;x++)
    for(int y=offset;y<height-offset;y++)
    {
        z = 0;
        for(int j=-k; j<=k; j++)
            z += buffer0.getxy(x,y+j);
        output.setxy(x,y,z/l);
    }
Fig. 3.15. Rectangular averaging filter. (a) Original image. (b) 3 x 3. (c) 11 x 11.
(d) The result of subtracting an 11 x 11 local averaging function from (a). A grey
scale offset has been added to the image to enhance its visibility.
mean = (a+b+c+d+e+f+g+h+i)/9;
threshold = mean - alpha;
if(e < threshold)
output.setxy(x,y,BLACK);
else
output.setxy(x,y,WHITE);
}
for(int yc=2;yc<height-2;yc++)
for(int xc=2;xc<width-2;xc++)
{
mean = (a+b+c+d+e+f+g+h+i+j+k+l+m+n+o+
p+q+r+s+t+u+v+w+x+y)/25;
threshold = mean - alpha;
if(m < threshold)
output.setxy(xc,yc,BLACK);
else
output.setxy(xc,yc,WHITE);
}
Fig. 3.16. Adaptive thresholding of the textured image illustrated in Fig. 3.8(a).
(a) 3 x 3 region. (b) 5 x 5 region. In both cases the offset value is set to zero.
z = Math.max(Math.max(Math.max(Math.max(a,b),Math.max(c,d)),
Math.max(Math.max(e,f),Math.max(g,h))),i);
output.setxy(x,y,z);
}
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        z = Math.min(Math.min(Math.min(Math.min(a,b),Math.min(c,d)),
            Math.min(Math.min(e,f),Math.min(g,h))),i);
        output.setxy(x,y,z);
    }
Fig. 3.17. Intensity neighbourhood operators applied to the textured image illustrated in Fig. 3.8(a). (a) Largest Intensity Neighbourhood. (b) Smallest Intensity
Neighbourhood.
Edge Detector. e ⇐ MAX(a, b, c, d, e, f, g, h, i) − e. This operator highlights edges within an image (i.e. points where the intensity is changing
rapidly).
Median Filter. e ⇐ FIFTH LARGEST(a, b, c, d, e, f, g, h, i). This filter
is particularly useful for reducing the level of noise in an image. Noise is
generated from a range of sources, such as video cameras and x-ray detectors,
and can be problematic if it is not eliminated by hardware or software filtering
(Fig. 3.18).
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        b[0] = input.getxy(x-1,y-1); b[1] = input.getxy(x,y-1);
        b[2] = input.getxy(x+1,y-1); b[3] = input.getxy(x-1,y);
        b[4] = input.getxy(x,y);     b[5] = input.getxy(x+1,y);
        b[6] = input.getxy(x-1,y+1); b[7] = input.getxy(x,y+1);
        b[8] = input.getxy(x+1,y+1);

        fin = -1;
        while(fin != 0)            // bubble sort the nine neighbourhood values
        {
            fin = 0;
            for(int i=0;i<8;i++)
            {
                if(b[i] > b[i+1])
                {
                    swp = b[i];
                    b[i] = b[i+1];
                    b[i+1] = swp;
                    fin++;
                }
            }
        }
        output.setxy(x,y,b[4]);    // the median is the middle (fifth largest) value
    }
Midpoint. This filter is based on the mid-value of pixels in a 3 x 3 neighbourhood (Fig. 3.18).
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        high = Math.max(i,Math.max(Math.max(Math.max(a,b),Math.max(c,d)),
               Math.max(Math.max(e,f),Math.max(g,h))));
        low  = Math.min(i,Math.min(Math.min(Math.min(a,b),Math.min(c,d)),
               Math.min(Math.min(e,f),Math.min(g,h))));
        z = (high+low)/2;
        output.setxy(x,y,z);
    }
z = (a+b+c+d+e+f+g+h+i)/9;
output.setxy(x,y,z);
}
z = flow((9*e)-a-b-c-d-f-g-h-i);
output.setxy(x,y,z);
}
Crack Detector. This operator is equivalent to applying the sequence of operations Largest Intensity Neighbourhood (LIN), LIN, Negate, LIN,
LIN, Negate and then subtracting the result from the original image. This
detector is able to detect thin dark streaks and small dark spots in a grey
scale image. It ignores other features, such as bright spots and streaks, edges
(intensity steps) and broad dark streaks (Fig. 3.18).
Rank Filters. The generalised 3 x 3 rank filter is: c(x, y) ⇐ k1(L1·W1 +
L2·W2 + L3·W3 + L4·W4 + L5·W5 + L6·W6 + L7·W7 + L8·W8 + L9·W9) + k2,
where (L1, L2, ..., L9) is a sorted version (largest first, L1) of the input pixel
values (a, b, c, d, e, f, g, h, i) and k1 and k2 are the normalisation constants.
With the appropriate choice of weights (W1, W2, ..., W9), the rank filter can
be used for a range of operations including edge detection, noise reduction,
edge sharpening and image enhancement.
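As a rough sketch of this definition (assuming the 3 x 3 neighbourhood values a..i, a weight array W[0..8] and the normalisation constants k1 and k2 are already available, and using flow() to clamp the result as elsewhere in this chapter):

int[] L = {a, b, c, d, e, f, g, h, i};
java.util.Arrays.sort(L);                // ascending order, so L[8] is the largest
int z = 0;
for (int n = 0; n < 9; n++)
    z += L[8 - n] * W[n];                // L1.W1 + L2.W2 + ... + L9.W9
output.setxy(x, y, flow((int)(k1*z + k2)));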
Direction Codes. This function can be used to detect the direction of
the intensity gradient. The direction code function DIRCODE is defined as
e ⇐ DIRCODE(a, b, c, d, f, g, h, i), where
DIRCODE(a, b, c, d, f, g, h, i) ⇐
1 if a ≥ MAX(b, c, d, f, g, h, i)
2 if b ≥ MAX(a, c, d, f, g, h, i)
3 if c ≥ MAX(a, b, d, f, g, h, i)
4 if d ≥ MAX(a, b, c, f, g, h, i)
5 if f ≥ MAX(a, b, c, d, g, h, i)
6 if g ≥ MAX(a, b, c, d, f, h, i)
7 if h ≥ MAX(a, b, c, d, f, g, i)
8 if i ≥ MAX(a, b, c, d, f, g, h)
Fig. 3.18. Filtering of the textured image illustrated in Fig. 3.8(a). (a) Median.
(b) Midpoint. (c) Low-pass filter. (d) Difference of low-pass filters (DOLPS). (e)
Sharpen. (f) Crack detection.
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        b = input.getxy(x,y-1);  d = input.getxy(x-1,y);
        e = input.getxy(x,y);
        f = input.getxy(x+1,y);  h = input.getxy(x,y+1);

        z = MIDGREY + (((4*e)-b-d-f-h)>>1);
        output.setxy(x,y,z);
    }
return(output);
}
laplacian8(input)
{
    for(int y=0;y<height;y++)
        for(int x=0;x<width;x++)
        {
            a = input.getxy(x-1,y-1); b = input.getxy(x,y-1);
            c = input.getxy(x+1,y-1); d = input.getxy(x-1,y);
            e = input.getxy(x,y);     f = input.getxy(x+1,y);
            g = input.getxy(x-1,y+1); h = input.getxy(x,y+1);
            i = input.getxy(x+1,y+1);

            z = MIDGREY + (((8*e)-a-b-c-d-f-g-h-i)>>1);
            output.setxy(x,y,z);
        }
    return(output);
}
Fig. 3.19. Examination of the effect of the first and second derivative when applied
to a line profile.
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        a = input.getxy(x-1,y-1); b = input.getxy(x,y-1);
        d = input.getxy(x-1,y);   e = input.getxy(x,y);

        z1 = Math.abs(a-e);
        z2 = Math.abs(b-d);
        z3 = z1+z2;
        if(z3>WHITE) z3 = WHITE;
        output.setxy(x,y,z3);
    }
z1 = Math.abs(a+(b<<1)+c-g-(h<<1)-i);
z2 = Math.abs(a+(d<<1)+g-c-(f<<1)-i);
z3 = flow((z1+z2)>>2);
output.setxy(x,y,z3);
}
While the Roberts operator produces thinner edges, these edges tend to
break up in regions of high curvature. Roberts edges also tend to have poor
localization. The primary disadvantage of the Roberts operator is its high
sensitivity to noise, since fewer pixels are used in the calculation of the edge
gradient (Fig. 3.20). There is also a slight shift in the image, when the Roberts
edge detector is used. The Sobel edge detector does not produce such a shift
(Fig. 3.21).
Fig. 3.20. Comparison of the Laplacian, Roberts and Sobel edge operators on a
linear edge segment.
["
.~
: <1
'1
'7
'-.1
(a)
I'~
'-'
(b)
,,>
(c)
Fig. 3.21. Edge detection applied to the image illustrated in Fig. 3.12(a). (a)
Laplacian. (b) Roberts. (c) Sobel.
Prewitt Edge Detector. This operator uses the two 3 x 3 masks shown below
to determine the edge gradient:

    [ -1 -1 -1 ]            [ -1  0  1 ]
    [  0  0  0 ]    and     [ -1  0  1 ]
    [  1  1  1 ]            [ -1  0  1 ]

where P1 and P2 are the values calculated from each mask respectively.
The Prewitt gradient magnitude is defined as: e ⇐ √(P1² + P2²). This operator
can also be approximated by the Modified Prewitt edge detector. The derivative
is approximated by differencing pixels on either side of the centre pixel. The
absolute values given by each of the convolution kernels are added together
and the centre pixel replaced.
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        z1 = Math.abs(a+b+c-g-h-i);
        z2 = Math.abs(a+d+g-c-f-i);
        z3 = (z1+z2)/3;
        if (z3>WHITE) z3 = WHITE;
        output.setxy(x,y,z3);
    }
Frei and Chen Edge Detector. This operator uses the two 3 x 3 masks
shown below to determine the edge gradient:

    [ -1  -√2  -1 ]            [ -1   0   1 ]
    [  0    0   0 ]    and     [ -√2  0  √2 ]
    [  1   √2   1 ]            [ -1   0   1 ]

where F1 and F2 are the values calculated from each mask respectively.
The Frei and Chen gradient magnitude is defined as: e ⇐ √(F1² + F2²). A
modified version of this operator (using absolute values) can be given as:

for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        z1 = Math.abs(a+(b*Math.sqrt(2.0))+c-g-(h*Math.sqrt(2.0))-i);
        z2 = Math.abs(a+(d*Math.sqrt(2.0))+g-c-(f*Math.sqrt(2.0))-i);
        z3 = (int) ((z1+z2)/(2+Math.sqrt(2.0)));
        if(z3>WHITE) z3 = WHITE;
        output.setxy(x,y,z3);
    }
if(((d<MIDGREY)&&(f>MIDGREY)&&(e<f)&&(e>d I I
((b<MIDGREY)&&(h>MIDGREY)&&(e<h)&&(e>b I I
((d>MIDGREY)&&(f<MIDGREY)&&(e>f)&&(e<d I I
((b>MIDGREY)&&(h<MIDGREY)&&(e>h)&&(e<b)
output.setxy(x,y,WHITE);
else
output.setxy(x,y,BLACK);
Marr-Hildreth Edge Detector. Marr & Hildreth (1980) proposed an approach to edge detection based on finding the zero crossings of the Laplacian
of a Gaussian (commonly referred to as LoG). The image is initially smoothed
with a Gaussian G(x, y) low-pass filter window which removes noise and spurious edges. The Laplacian is then applied followed by marking pixels for
which there is a zero crossing as edge pixels. This approach does not give an
indication of edge direction.
The 2-D Gaussian is defined as:

G(x, y) = k·e^(−(x² + y²)/(2σ²))                                  (3.3)

where x and y are the pixel coordinates, k is a scaling constant and σ is the
standard deviation of the associated probability distribution. Fortunately, the
convolution of the 2-D Gaussian can be effected by two convolutions with 1-D
Gaussian functions, i.e. G(x, y) * f(x, y) = G(x) * (G(y) * f(x, y)). Therefore
the Laplacian of a Gaussian (LoG) is defined as:

∇²G(x, y) = k·((x² + y² − 2σ²)/σ⁴)·e^(−(x² + y²)/(2σ²))           (3.4)
n = ∇(G ∗ f(x, y)) / |∇(G ∗ f(x, y))|                             (3.5)

∂²(G ∗ f(x, y)) / ∂n² = 0                                         (3.6)
Canny edge detection involves convolving the input image with a Gaussian of scale σ, where the partial derivatives are computed and the magnitude and orientation results are recorded. Spurious responses can be reduced
by thresholding applied with hysteresis (two thresholds are defined) (Canny
1986). If the gradient magnitude of the contour is greater than the higher threshold then it is considered a valid edge point. Any candidate edge pixels that
are connected to valid edge points and are above the lower threshold are also
considered edge points. The final step involves the removal of edge pixels that
are not local maxima; this is referred to as non-maximum suppression.
While the correct scale for the operator depends on the type of objects
under examination in the image, differing scales can be examined by varying
the standard deviation σ of the Gaussian. The Canny edge detector gives
stable results, but junction connectivity is relatively poor.
While a number of alternative edge operators do exist these are usually
not required for the majority of machine vision applications and as such will
not be covered here. Many machine vision engineers hold the view that if
your edges are not easily detected then you should examine your lighting
Fig. 3.23. Canny edge detection, with the low and high thresholds set to 2 and 3
respectively. (a) σ = 0.5. (b) σ = 2.5.
configuration rather than use a more complicated edge operator. While this
oversimplifies the case, it does hold a certain degree of truth. A redesigned
lighting system which can enhance the edges does not add to the system's
computational load and hence can be an attractive option in the development
of real-time vision systems.
3.2.4 N-tuple Operators
Fig. 3.24. An N-tuple filter operates much like a local operator. The only difference
is that the pixels whose intensities are combined together do not form a compact
set. A linear N-tuple filter can be regarded as being equivalent to a local operator
which uses a large window and in which many of the weights are zero.
[Figure: template matching outcomes - Good Fit, Mismatch, Shift, Tilt, Wrong Size, Wrong Font.]
Euclidean Distance Based: d = √(Σi Σj [g(i, j) − t(i, j)]²), where t(x, y) is the
template image and g(x, y) is the test image. In the general 2-D case this
becomes: d = √((x1 − x2)² + (y1 − y2)²).
Cross-Correlation Based: R(m, n) = Σi Σj g(i, j) x t(i − m, j − n). The
template image, t(i − m, j − n), and the section of g(i, j) in the vicinity of
(m, n) are similar when the cross-correlation is large.
Fig. 3.26. Template matching using the N-tuple operator. (a) Original image. (b)
Detection of the Arial font 't' character using an N-tuple operator. (c) Convolution
mask, where a dark grid entry indicates a '1' and a clear grid indicates a "don't
care" state. (d) Associated visual workspace.
we adopt, is perfectly arbitrary and there will be occasions when the edge
effects are so pronounced that there is nothing that we can do but to remove
them by masking. Edge effects are important because they require us to make
special provisions for them when we try to patch several low-resolution images
together.
3.2.6 Grey Scale Corner Detection
A corner can be defined as a position in the 2-D array of brightness pixels
which humans would associate with the word 'corner', i.e. points where the
gradient direction changes rapidly and points which correspond to physical
corners of objects. For simple binary scenes, corners may be modelled as
the junction point of two or more straight line edges (Sec. 3.3.2). But the
detection of grey level corners is generally more difficult. Grey level corners
have been defined as: the location of maximum planar curvature in the locus
line of steepest grey level, Fig. 3.27. One problem with this model of corners
is that low gradient but highly curved corners will be missed (these would be
common in real-life images). The ability to extract stable corner features in
a computationally efficient manner is critical to many spatial and temporal
object tracking techniques (Blake & Isard 1998).
Full details of SUSAN low level image processing, including C program source
code, can be found at http://www.fmrib.ox.ac.uk/~steve/susan/index.html.
This program is Copyright 1995-1997 Defence and Evaluation Research
Agency, UK.
[Fig. 3.28: the circular 37 pixel SUSAN mask, centred on the nucleus.]
The mask is placed at each point in the input image. The operator works
by comparing the brightness of the nucleus to the brightness of each pixel
within the mask. All pixels surrounding this point with a similar intensity (i.e. within a given threshold) are then grouped together into a univalue
segment assimilating nucleus or USAN. The area of the Smallest USAN (SUSAN) is then compared to a predefined value to decide if a corner exists. The
comparison function is given by:

c(r, r0) = 1 if |I(r) − I(r0)| ≤ tb; 0 if |I(r) − I(r0)| > tb      (3.7)

where r0 is the position of the nucleus, r is the position of any other pixel
within the mask, I(·) represents the brightness of the corresponding pixel and
tb is the brightness difference threshold. While this basic approach works
well for simulated scenes, the algorithm invokes a number of additional noise
filtering stages for real-world images. Specifically, Smith (1992) has shown
that improved results can be achieved by calculating c(r, r0) as follows:

c(r, r0) = e^(−((I(r) − I(r0))/tb)⁶)                               (3.8)

This allows a pixel's brightness to vary slightly without having a large
effect on c(r, r0).
The final stage is to count the number of pixels within the USAN region
(n(r0) = Σr c(r, r0), i.e. the USAN's area). This value is compared with
a geometric threshold tg to determine the presence of a valid grey scale corner
at the nucleus. The brightness threshold tb controls the contrast allowed in
the detected corners and therefore the number of corners detected in the
image. The geometric threshold tg also has an effect on the number of corners
found, but more importantly it relates to the shape of the corners detected;
lowering this value in effect forces sharper corners (Fig. 3.29). Smith &
Brady (1997) argued that for a corner to be detected n < nmax/2, where nmax
is the maximum value of n. In fact, nmax = 3700: since we have 37 pixels
in the mask illustrated in Fig. 3.28, nmax = 37 x the maximum value of c.
Generally the geometric threshold is fixed at tg = nmax/2.
In the example illustrated in Fig. 3.30 a corner is detected if the area
of the Smallest USAN is less than half the area of the mask (MA). With
reference to Fig. 3.30 (Smith 1992),
Fig. 3.29. SUSAN threshold selection. The type of corner detected is dependent
on the brightness and geometric thresholds. (a) Strong corner, tb < α, where α is
a nominal value (e.g., 16). (b) tb = α. (c) Weak corner, tb > α. (d) Strong corner,
tg < β, where β is a nominal value. (e) tg = β. (f) Weak corner, tg > β.
Mask A: The area of the smallest USAN (shaded region within the mask)
< MA/2, thus a corner is correctly detected here.
Mask B: The area of the smallest USAN (white region within the mask)
> MA/2, therefore no corner is detected.
Mask C: The USAN area is exactly half that of the mask. No corner is detected. This could also be classified as an edge.
Mask D: Clearly the USAN area is too large. No corner is detected.
Mask E: The area of the smallest USAN (shaded region within the mask)
> MA/2. No corner is detected.
[Fig. 3.30: SUSAN masks A-E placed at different positions on a test shape; the boundary and nucleus of each mask are marked.]
A further processing step involves removing any false positives that might
occur. For example, in Fig. 3.31 the area of the USAN is indeed less than
half the area of the mask, however it is not appropriate to classify it as a
corner. To combat this problem we calculate the centre of gravity of the region
within the mask. In Fig. 3.31 the centre of gravity is very close to the centre
of the mask. This however is not the case when a corner is detected. This
simple addition to the algorithm need only be calculated when the threshold
is reached, therefore not adding significantly to the overall computation of
the approach. Another step in reducing false positives involves ensuring that
all the points between the centre of the mask and the centre of gravity belong
to the USAN for a corner to be detected. This ensures a degree of uniformity
in the USAN and thus in the corner being detected. The final non-maximum
suppression stage is carried out on this response to detect the corners (see
Fig. 3.32).
Fig. 3.31. A possible falsely detected corner using the SUSAN algorithm and mask,
prior to the calculation of the centre of gravity (marked X).
As mentioned previously, the key advantage of this technique is its computational efficiency (especially when implemented using look-up tables). Another advantage is the fact that as there are no brightness spatial derivatives
the image noise is not amplified. Other methods have attempted to reduce
the noise in the image by smoothing, but this leads to local distortion and
a loss of local information. This approach to low level image processing has
also been applied to edge detection (Smith & Brady 1997).
OR operation (Fig. 3.33). Let Σ(x, y) denote the number of white points
addressed by N(x, y), including (x, y) itself.
Binary Border. Generates the borders around any binary objects in the
source image (effectively a binary edge detection routine). The image is
scanned and any white pixel with more than six white neighbours is transformed to black (Fig. 3.34).
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        z = (a&1)+(b&1)+(c&1)+(d&1)+(f&1)+(g&1)+(h&1)+(i&1);
        if(z>6)
            output.setxy(x,y,BLACK);
        else
            output.setxy(x,y,e);
    }
Fig. 3.32. SUSAN grey scale corner detector. The detected points are indicated
by white squares for clarity. The brightness threshold tb relates to the number of
corners reported by the operator. (a) Original image. (b) tb = 10. (c) tb = 50.
White Pixel Count. Count the number of white pixels within an image
region.
int number=0;
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        if(input.getxy(x,y)==WHITE)
            number++;
return(number);
Remove Isolated White Points. Any white pixels with less than 1 white
pixel neighbour are set to black. This can be used to remove noise from a
binary image.
c(x, y) ⇐ a(x, y) if Σ(x, y) > 1; 0 otherwise

if(e!=BLACK)
    if((a==WHITE)||(b==WHITE)||(c==WHITE)||(d==WHITE)||
       (f==WHITE)||(g==WHITE)||(h==WHITE)||(i==WHITE))
        output.setxy(x,y,WHITE);
}
c(x, y) ⇐ NOT(a(x, y)).

for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,WHITE-input.getxy(x,y));

c(x, y) ⇐ a(x, y) ∧ b(x, y).

for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,inputA.getxy(x,y)&inputB.getxy(x,y));

c(x, y) ⇐ a(x, y) ∨ b(x, y).

for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,inputA.getxy(x,y)|inputB.getxy(x,y));
Exclusive OR. c(x, y) ⇐ a(x, y) ⊕ b(x, y). This operator finds the differences
between white regions.

for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        output.setxy(x,y,inputA.getxy(x,y)^inputB.getxy(x,y));
Fig. 3.33. Binary image manipulation. (a) Input image a(x, y). (b) Input image
b(x, y).

e ⇐ e ∧ NOT(a ∧ b ∧ c ∧ d ∧ e ∧ f ∧ g ∧ h ∧ i)
[Example 3 x 3 patterns in which the point marked X is critical for connectivity.]
Fig. 3.34. Binary edge detection. (a) Original image. (b) Binary border.
However, those points marked X below are not critical for connectivity,
since setting X = 0 rather than 1 has no effect on the connectivity of the l's.
1 1 1      0 1 1      0 1 1
1 X 1      1 X 0      1 X 0
0 0 1      1 1 1      0 1 1
A connectivity detector shades the output image with 1's to indicate the
position of those points which are critical for connectivity and which were
white in the input image. Black points and those which are not critical for
connectivity, are mapped to black in the output image.
Euler Number. The Euler number is defined as the number of connected
components (blobs) minus the number of holes in a binary image and is
a useful means of classifying shapes in an image. The Euler number also
represents a simple method of counting blobs in a binary image provided
that they have no holes in them. Alternatively, it can be used to count holes
in a given object, providing they have no islands in them (Fig. 3.35).
The reason why this approach is used to count blobs, despite the fact that
it may seem a little awkward to use, is that the Euler number is very easy and
fast to calculate. The Euler number can be computed by using three local
operators. Let us define three numbers N1, N2 and N3, where Nα indicates
the number of times that one of the patterns in pattern set α (α = 1, 2
or 3) occurs in the input image.
Pattern set 1 (N1):
00
01
00
10
10
10
01
10
10
01
01
01
Fig. 3.35. Euler number for various objects. C - Connected components, H - Holes,
E - Euler number. For the three examples shown: C = 2, H = 0, E = 2; C = 2,
H = 0, E = 2; C = 2, H = 3, E = -1.
11
01
10
10
01
11
11
The 8-connected Euler number, where holes and blobs are defined in terms
of 8-connected figures, is defined as: N1 − (2 x N2) − N3. It is possible to calculate
the 4-connected Euler number using a slightly different formula, but this
parameter can give results which seem to be anomalous when we compare
them to the observed number of holes and blobs.
current_pixel = buffer0.getxy(x,y);
if (current_pixel>ob_count)
    ob_count = current_pixel;
}

output = doubleThreshold(buffer0,ct,ct);
number = 0;
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        if(output.getxy(x,y)==WHITE)
            number++;
pix_count = number;

if (pix_count > biggest)   // assumed: keep track of the largest blob found so far
{
    biggest = pix_count;
    bigthresh = ct;
}
}
output = doubleThreshold(buffer0,bigthresh,bigthresh);
Binary to Grey Scale. Convert a binary image to a user defined grey scale
grey.
for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
        if(input.getxy(x,y)==WHITE)
            output.setxy(x,y,grey);
Centroid. Replace grey scale shape by a single pixel at its centre of gravity.
The NeatVision implementation of this algorithm is limited by the fact that
we can only have a maximum of 255 blobs in a scene. It also assumes a black
background which is ignored in the centroid calculation. This approach is
based on the fact that all shapes must have a distinct grey scale.
int table[] = new int[256];
int cenx[]  = new int[256];
int ceny[]  = new int[256];
int current_pixel, max, index, X_centroid, Y_centroid;
int cnx=0, cny=0;

for(int i=BLACK; i<=WHITE; i++)
{
    for(int y=0;y<height;y++)
        for(int x=0;x<width;x++)
        {
            current_pixel = input.getxy(x,y);
            table[current_pixel]++;
            cenx[current_pixel] += x;
            ceny[current_pixel] += y;
        }
    max=0;
    index=0;
    for(int j=BLACK; j<=WHITE; j++)
        if (table[j] > max)
        {
            max=table[j];
            index=j;
            cnx=cenx[j];
            cny=ceny[j];
        }
    X_centroid = cnx/max;
    Y_centroid = cny/max;
    output.setxy(X_centroid,Y_centroid,WHITE);
}
}
Fig. 3.36. Shading blobs in a binary image. (a) Original binary image. (b) Labelling
according to their areas. (c) Labelling according to the order in which they are found
during a raster scan (left to right; top to bottom).
int pixel;

output = Not.not(input);
output = Mask.mask(output,1);
output = Labeller.label(output);
pixel  = output.getxy(1,1);
output = DualThreshold.doubleThreshold(output,pixel,pixel);
output = Not.not(output);
output = Mask.mask(output,1);
return(output);
Where the mask operator draws a black border of a user specified amount
around the image.
Detecting/Removing Small Spots. A binary image can be represented
in terms of a grey scale image in which only two grey levels, 0 and W, are
allowed. The result of the application of a conventional low-pass (blurring)
filter to such an image is a grey scale image in which there are a larger number
of possible intensity values. Pixels which were well inside large white areas
in the input image are mapped to very bright pixels in the output image.
Pixels which were well inside black areas are mapped to very dark pixels in
the output image. Pixels which were inside small white spots in the input
image are mapped to mid-grey intensity levels. Pixels on the edge of large
white areas are also mapped to mid-grey intensity levels. However, if there is
a cluster of small spots which are closely spaced together, some of them may
also disappear.
Based on these observations, the following procedure has been developed.
It has been found to be effective in distinguishing between small spots and,
at the same time, achieving a certain amount of edge smoothing of the large
bright blobs which remain:
Low-pass filter using a 11 by 11 local operator
Threshold at mid-grey
This technique is generally easier and faster to implement than the blob
shading technique described previously. Although it may not achieve the desired result exactly, it can be performed at high speed. An N-tuple filter
having the weight matrix illustrated in Fig. 3.38 can be combined with simple thresholding to distinguish between large and small spots. Assume that
there are several small white spots within the input image and that they are
spaced well apart. All pixels within a spot which can be contained within a
circle of radius three pixels will be mapped to white by this particular filter.
Pixels within a larger spot will become darker than this. The image is then
thresholded at white to separate the large and small spots.
Fig. 3.38. N-tuple filter weight matrix: a centre weight of 20 surrounded by twenty weights of -1.
Grass-fire Transform and Skeleton. Consider a binary image containing a single white blob. Imagine that a fire is lit at all points around the
blob's outer edge and the edges of any holes it may contain. The fire will
burn inwards, until at some instant, advancing fire lines meet. When this
occurs the fire becomes extinguished locally. An output image is generated
and is shaded in proportion to the time it takes for the fire to reach each
point. Background pixels are mapped to black (Fig. 3.39). The importance of
this transform, referred to as the grass-fire transform, lies in the fact that it
indicates distances to the nearest edge point in the image (Borgefors 1986).
It is therefore possible to distinguish thin and fat limbs of a white region.
Those points at which the fire lines meet are known as quench points. The
set of quench points form a match-stick figure, usually referred to as a skeleton or medial axis transform (MAT) (Fig. 3.40). These figures can also be
generated in a number of different ways (Gonzalez & Wintz 1987). One such
approach is described as onion-peeling. Consider a single white blob and a
'bug' which walks around the blob's outer edge, removing one pixel at a time.
No edge pixel is removed, if by doing so we would break the blob into two
disconnected parts. In addition, no white pixel is removed, if there is only
one white pixel amongst its 8-neighbours. This simple procedure leads to an
undesirable effect in those instances when the input blob has holes in it; the
skeleton which it produces has small loops in it which fit around the holes like
a tightened noose. More sophisticated algorithms have been devised which
avoid this problem (Sec. 3.4.2).
// IMAGE COPY
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    output.setxy(x,y,input.getxy(x,y));
// there's no point in replacing white pixels with white
while(!done && greyLevel<WHITE)
{
  greyLevel++;
  done = true;
  // 8-connected
  for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
      a = output.getxy(x-1,y-1);
      b = output.getxy(x,y-1);
      c = output.getxy(x+1,y-1);
      d = output.getxy(x-1,y);
      e = output.getxy(x,y);
      f = output.getxy(x+1,y);
      g = output.getxy(x-1,y+1);
      h = output.getxy(x,y+1);
      i = output.getxy(x+1,y+1);
      if (e == WHITE)
      {
        z = Math.min(i,Math.min(Math.min(Math.min(a,b),Math.min(c,d)),
            Math.min(Math.min(e,f),Math.min(g,h))));
        if(z!=greyLevel && z!=WHITE)
        {
          output.setxy(x,y,greyLevel);
          done = false;
        }
      }
    }
}
Fig. 3.40. Application of the Medial Axis Transform (MAT) (8-connected). The
top figure illustrates how a small variation in the input scene (e.g., noise) can have
a significant effect on the resultant MAT.
Count the number of ones in the 3 x 3 window, excluding the centre pixel e from
the count. If 2 <= count <= 6, count the number of 0 to 1 transitions around
the current pixel. If the number of transitions = 1 and pixels b, d, f and h
are black then place a single white pixel at the current location (shaded) in
the alternate image. Delete all the points flagged in the alternate image as
before.
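A minimal sketch of the two counts used above is given below; the clockwise neighbour ordering (starting at b) and the variable names are illustrative assumptions.
// Neighbours of the centre pixel e, taken clockwise starting at b (top).
int[] nb = { b, c, f, i, h, g, d, a };
int ones = 0, transitions = 0;
for(int k=0;k<8;k++)
{
  if(nb[k]==WHITE) ones++;                              // white (1) neighbours
  if(nb[k]==BLACK && nb[(k+1)%8]==WHITE) transitions++; // 0 to 1 transitions around e
}
boolean flagged = (ones>=2 && ones<=6) && (transitions==1);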
Edge Smoothing and Corner Detection. Consider three points B1, B2
and B3 which are placed close together on the edge of a single blob in a
binary image (Fig. 3.42). The perimeter distance between B1 and B2 is equal
to that between B2 and B3. Define the point P to be that at the centre of
the line joining B1 and B3. As the three points now move around the edge of
the blob, keeping the spacing between them constant, the locus of P traces
a smoother path than that followed by B2 as it moves around the edge. This
forms the basis of a simple edge smoothing procedure.
A related algorithm for corner detection shades the edge according to
the distance between P and B2. This results in an image in which the
corners are highlighted, while the smoother parts of the image are much
darker. Many other methods of edge smoothing are possible. For example,
we may map white pixels which have fewer than three white 8-neighbours
to black. This has the effect of eliminating 'hair' around the edge of a
blob-like figure. One of the techniques described previously for eliminating
small spots offers another possibility. A third option is to use the processing
sequence: expand white areas, shrink white areas, shrink white areas, expand white areas.
The 3 x 3 masks illustrated in Fig. 3.43(a) can also be used to detect binary
corners. A corner point is replaced by a single white pixel if one of the illustrated
patterns occurs; otherwise, the output pixel (shaded) is set to black. The result
of applying this operator to a binary image is illustrated in Fig. 3.43(b).
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    if ((b==WHITE)&&(d==WHITE)&&(e==WHITE)&&(c==BLACK)&&
        (f==BLACK)&&(i==BLACK)&&(h==BLACK)&&(g==BLACK))
      z=WHITE;
    else if((b==WHITE)&&(f==WHITE)&&(e==WHITE)&&(a==BLACK)&&
        (d==BLACK)&&(g==BLACK)&&(h==BLACK)&&(i==BLACK))
      z=WHITE;
    else if((d==WHITE)&&(h==WHITE)&&(e==WHITE)&&(a==BLACK)&&
        (b==BLACK)&&(c==BLACK)&&(f==BLACK)&&(i==BLACK))
      z=WHITE;
    else if((e==WHITE)&&(f==WHITE)&&(h==WHITE)&&(a==BLACK)&&
        (b==BLACK)&&(c==BLACK)&&(d==BLACK)&&(g==BLACK))
      z=WHITE;
    else
      z=BLACK;
    output.setxy(x,y,z);
  }
Fig. 3.43. 3 x 3 corner detection. Note that due to quantization of the image not
all corner points are flagged. (a) Masks. (b) Corner detection based on the patterns
illustrated in (a).
Limb End Detection. Fig. 3.44(a) contains the masks for detecting 3 x 3
skeleton end points. Once detected, these limb end points are replaced by
a single white pixel. Otherwise the output pixel (shaded) is set to black.
Fig. 3.44(b) illustrates the application of this operator to a binary image. The
removal of the limb ends is referred to as pruning and is commonly used
to smooth skeletons or thinned objects by removing extraneous branches.
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    if ((e==WHITE)&&(d==BLACK)&&(g==BLACK)&&(h==BLACK)
        &&(i==BLACK)&&(f==BLACK))
      z=WHITE;
    else if((e==WHITE)&&(a==BLACK)&&(b==BLACK)
        &&(d==BLACK)&&(g==BLACK)&&(h==BLACK))
      z=WHITE;
    else if((e==WHITE)&&(b==BLACK)&&(c==BLACK)
        &&(f==BLACK)&&(i==BLACK)&&(h==BLACK))
      z=WHITE;
    else if((e==WHITE)&&(a==BLACK)&&(b==BLACK)
        &&(c==BLACK)&&(d==BLACK)&&(f==BLACK))
      z=WHITE;
    else
      z=BLACK;
    output.setxy(x,y,z);
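    // The following patterns detect 3 x 3 skeleton junction points (see Fig. 3.45);
    // as above, a..i denote the 8-neighbourhood of the current pixel e.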
    if((e==WHITE)&&(b==WHITE)&&(g==WHITE)&&(i==WHITE))
      z=WHITE;
    else if((e==WHITE)&&(a==WHITE)&&(f==WHITE)&&(g==WHITE))
      z=WHITE;
    else if((e==WHITE)&&(h==WHITE)&&(a==WHITE)&&(c==WHITE))
      z=WHITE;
    else if((e==WHITE)&&(d==WHITE)&&(c==WHITE)&&(i==WHITE))
      z=WHITE;
    else if((e==WHITE)&&(a==WHITE)&&(i==WHITE)&&(g==WHITE))
      z=WHITE;
    else if((e==WHITE)&&(a==WHITE)&&(c==WHITE)&&(g==WHITE))
      z=WHITE;
    else if((e==WHITE)&&(c==WHITE)&&(g==WHITE)&&(i==WHITE))
      z=WHITE;
    else if((e==WHITE)&&(a==WHITE)&&(c==WHITE)&&(i==WHITE))
      z=WHITE;
    else
      z=BLACK;
    output.setxy(x,y,z);
  }
Fig. 3.44. 3 x 3 skeleton end point (limb) detection. (a) Masks. (b) Limb detection
based on the patterns illustrated in (a).
Bounding Box. Draw a bounding box around the object(s) in an image. The box dimensions are incremented by one pixel to ensure that none of the original image
is lost when the box is overlaid (Fig. 3.46).
Fig. 3.45. 3 x 3 skeleton junction point detection. (a) Masks. (b) Skeleton junction
point detection based on the patterns illustrated in (a).
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    current_pixel_value = input.getxy(x,y);
    output.setxy(x,y,current_pixel_value);
    if (current_pixel_value != 0)
    {
      current_pixel_xaddress = x;
      current_pixel_yaddress = y;
      if (current_pixel_xaddress > xmax)
        xmax = current_pixel_xaddress;
      if (current_pixel_xaddress < xmin)
        xmin = current_pixel_xaddress;
      if (current_pixel_yaddress > ymax)
        ymax = current_pixel_yaddress;
      if (current_pixel_yaddress < ymin)
        ymin = current_pixel_yaddress;
    }
  }
x1=xmin-1;
/* allow 1 pixel gap */
y1=ymin-1;
x2=xmax+1;
y2=ymax+1;
output.drawBox(x1,y1,x2,y2,WHITE);
Convex Hull. Consider a single blob in a binary image. The convex hull
is that area enclosed within the smallest convex polygon which will enclose
the shape. This can also be described as the region enclosed within an elastic
string, stretched around the blob. The area enclosed by the convex hull, but
not within the original blob is called the convex deficiency, which may consist
of a number of disconnected parts and includes any holes and indentations.
If we regard the blob as being like an island, we can understand the logic of
referring to the former as lakes and the latter as bays (Fig. 3.47).
A fundamental property of the convex hull is that any line outside the hull,
when moved in any direction towards the hull, first touches the hull at one of its vertices.
Fig. 3.47. Convex hull of a binary image. The enclosed regions indicate the
shape's convex deficiencies (i.e. bays and holes).
Fig. 3.49. Convex hull generation: Stage 2.
Fig. 3.51. Generating the convex deficiencies by subtracting the original image
from the filled convex hull representation.
Fig. 3.52. Extracting the convex hull holes by applying the labelling function to
the inverted version of the original image. The holes can then be extracted using
the double threshold operator.
directions (Fig. 3.53). If 4-adjacency convention is applied to the image segment given in Fig. 3.53, then none of the four segments (two horizontal and
two vertical) will appear as touching, i.e. they are not connected. Using the
8-adjacency convention, the segments are now connected, but we have the
ambiguity that the inside of the shape is connected to the outside. Neither
convention is satisfactory, but since 8-adjacency allows diagonally-connected
pixels to be represented, it leads to a more faithful perimeter measurement.
Assuming that the 8-adjacency convention is used, we can generate a coded
description of the blob's edge. This is referred to as the chain code or Freeman
code. As we trace around the edge of the blob, we generate a number, 0-7,
to indicate which of the eight possible directions we have taken (i.e. from the
centre, shaded pixel in Fig. 3.53). Let $N_{odd}$ indicate how many odd-numbered
code values are produced as we code the blob's edge and $N_{even}$ represent the
number of even-numbered values found. The approximate perimeter of the
blob is given by $N_{even} + \sqrt{2}\,N_{odd}$. This formula will normally suffice for use in
those situations where the perimeter of a smooth object is to be measured.
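A minimal sketch of this perimeter estimate is given below; the chain code is assumed to be available as an array of direction values 0-7.
// Approximate perimeter from a Freeman chain code: even codes contribute 1,
// odd (diagonal) codes contribute sqrt(2).
int nEven = 0, nOdd = 0;
for (int code : chain)          // chain is an int[] of direction codes 0..7 (assumed)
{
  if (code % 2 == 0) nEven++;
  else nOdd++;
}
double perimeter = nEven + Math.sqrt(2.0) * nOdd;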
The centroid of a blob determines its position within the image and can be
calculated using the following:
$$X \leftarrow \frac{\sum_x \sum_y a(x,y) \times x}{N_{x,y}} \qquad (3.9)$$

$$Y \leftarrow \frac{\sum_x \sum_y a(x,y) \times y}{N_{x,y}} \qquad (3.10)$$

where

$$N_{x,y} \leftarrow \sum_x \sum_y a(x,y) \qquad (3.11)$$
Fig. 3.53. Chain code. (a) Image segment. (b) 4-adjacency coding convention. (c)
8-adjacency coding convention.
Fig. 3.54. Run-length coding applied to a simple image. The resultant codes for the three
scan lines indicated (labelled A, B, C) are ((A, C1, C4)), ((B, C1, C2), (B, C3, C4)), ((C, C1, C2)).
Algorithms exist by which images can be shifted, rotated, magnified, warped and
subjected to axis conversion. Certain operations, such as rotating a digital
image, can cause some difficulties, as pixels in the input image are not mapped
exactly to pixels in the output image. This can cause smooth edges to appear stepped. To avoid this effect interpolation may be used, but this has the
unfortunate effect of blurring edges.
The utility of axis transformations is evident when we are confronted with
the examination of circular objects, or those displaying a series of concentric
arcs, or streaks radiating from a fixed point. Inspecting such objects is often
made very much easier, if we first convert from Cartesian to polar coordinates.
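A minimal sketch of such a Cartesian-to-polar conversion is given below; the choice of the image centre as origin, the angle/radius scaling and the nearest-neighbour sampling are illustrative assumptions.
// Cartesian to polar axis conversion: output column = angle, output row = radius.
int cx = width/2, cy = height/2;
for(int t=0;t<width;t++)
  for(int rad=0;rad<height;rad++)
  {
    double theta = (2.0*Math.PI*t)/width;
    double radius = (double)rad * Math.min(cx,cy) / height;
    int x = cx + (int)Math.round(radius*Math.cos(theta));
    int y = cy + (int)Math.round(radius*Math.sin(theta));
    if(x>=0 && x<width && y>=0 && y<height)
      output.setxy(t,rad,input.getxy(x,y));
  }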
Warping is also useful in a variety of situations. For example, it is possible
to compensate for barrel, or pin-cushion distortion in a camera. Geometric
distortions introduced by a wide-angle lens, or trapezoidal distortion due
to viewing the scene from an oblique angle can also be corrected. Another
possibility is to convert simple curves of known shape into straight lines, in
order to make subsequent analysis easier.
The distance transform produces a grey scale image in which the intensity of a
pixel in the image is dependent on the distance of that pixel from the nearest edge
point in the image, see the grass-fire transform outlined in Sec. 3.3.2. The
resultant image is dependent on the distance metric employed. Commonly
used distance metrics between the two points (Xl, Yl) and (X2, Y2) are outlined
below. We generally use the city block and chessboard distance metrics when
speed rather than accuracy is the key concern.
Euclidean Distance: $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$. This is the familiar straight-line distance.
City Block Distance: $|x_2 - x_1| + |y_2 - y_1|$. Diagonal moves are not allowed
when moving between pixels.
Chessboard Distance: $\max(|x_2 - x_1|, |y_2 - y_1|)$. Diagonal moves count the
same as a horizontal move (Fig. 3.55).
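For reference, the three metrics can be computed directly, as in the short sketch below.
// Distance between (x1,y1) and (x2,y2) under the three metrics.
double euclidean = Math.sqrt((x2-x1)*(x2-x1) + (y2-y1)*(y2-y1));
int cityBlock    = Math.abs(x2-x1) + Math.abs(y2-y1);
int chessboard   = Math.max(Math.abs(x2-x1), Math.abs(y2-y1));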
Fig. 3.56. Calculating the distance transformation for a 1-D binary image using the Chamfer algorithm. Pass 1 (L to R) gives 0 0 1 2 3 4 5 6 7 0 0; Pass 2 (R to L) gives 0 0 1 2 3 4 3 2 1 0 0.
the dot at the centre of the mask in Fig. 3.57(a)). The centre pixel is then
replaced by the minimum of these sums. The pixel value is overwritten prior
to moving the mask on to its next calculation (Fig. 3.57(d)). During the
second pass the same approach is applied using the reverse mask (i.e. using
a reverse raster scan) (Figs. 3.57(b) and (e)). The final distance transform
is the minimum of the output images generated from both passes. For the
3 x 3 mask the maximum deviation from the true Euclidean distance is 8%
(Fig. 3.58). If the mask size is increased to 5 x 5 the distance transform image
is scaled up by a factor of 5, with a maximum error of only 2% (Fig. 3.59). The
key aspects of the 3 x 3 Chamfer distance transform algorithm are outlined
below.
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    if (input.getxy(x,y)==WHITE)
    {
      pass1.setxy(x,y,1);
      pass2.setxy(x,y,1);
    }
// Pass 1: forward raster scan (top-left to bottom-right).
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    a = pass1.getxy(x-1,y-1)+4;
    b = pass1.getxy(x,y-1)+3;
    c = pass1.getxy(x+1,y-1)+4;
    d = pass1.getxy(x-1,y)+3;
    z = Math.min(Math.min(a,b),Math.min(c,d));
    pass1.setxy(x,y,flow(z));
  }
// Pass 2: reverse raster scan (bottom-right to top-left).
for(int y=height-1;y>=0;y--)
  for(int x=width-1;x>=0;x--)
  {
    f = pass2.getxy(x+1,y)+3;
    g = pass2.getxy(x-1,y+1)+4;
    h = pass2.getxy(x,y+1)+3;
    i = pass2.getxy(x+1,y+1)+4;
    z = Math.min(Math.min(f,g),Math.min(h,i));
    pass2.setxy(x,y,flow(z));
  }
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    output.setxy(x,y,Math.min(pass1.getxy(x,y),
        pass2.getxy(x,y)));
  }
Fig. 3.57. 2-D Chamfer distance transform, using 3 x 3 masks. (a) Forward mask.
(b) Reverse mask. (c) Binary input image. (d) Result of pass 1. (e) Result of pass
2. (f) Minimum of (d) and (e).
The Hough transform (HT) provides a powerful and robust technique for
detecting lines, circles, ellipses, parabolae and other curves of pre-defined
shape in a binary image. One of the key applications of the Hough transform
is in the detection of straight lines. That is the location of nearly linear
Fig. 3.58. 3 x 3 Chamfer distance transform. (a) Input image. (b) Resultant distance transform.
Fig. 3.59. 5 x 5 masks for the Chamfer algorithm, built from the local distances 5, 7 and 11. (a) Forward mask. (b) Reverse mask.
Fig. 3.60. Mapping of a point on a line into Hough space. (a) Input line. (b)
Sinusoidal representation of a point on the line illustrated in (a).
the point $(x_i, y_i)$ in the input image is white. The bright spots in the HT
image indicate "linear" sets of spots in the input image. Thus, line detection
is transformed to the rather simpler task of finding local maxima in the accumulator array, $A(r, \theta)$. The coordinates $(r, \theta)$ of such a local maximum give
the parameters of the equation of the corresponding line in the input image.
The HT image can be displayed, processed and analysed just like any other
image, using many of the operators outlined previously.
The robustness of the HT techniques arises from the fact that, if part
of the line is missing, the corresponding peak in the HT image is simply
darker. This occurs because fewer sinusoidal curves converge on that spot
and the corresponding accumulator cell is incremented less often. However,
unless the line is almost completely obliterated, this new darker spot can also
be detected. In practice "near straight lines" are transformed into a cluster
of points. There is also a spreading of the intensity peaks in the HT image,
due to noise and quantisation effects. In this event, it may be convenient to
threshold the HT image prior to finding the centroid of the resulting spot
and calculating the parameters of the straight line in the input image (Pitas
1993). Fig. 3.61 illustrates how this approach can be used to find a line in a
noisy binary image.
The Hough transform can also be generalised to detect groups of points
lying on a curve. In practice, this may not be a trivial task, since the complexity increases very rapidly with the number of parameters needed to define the
curve. Another key application of this transform is in circle detection. The
circle is defined parametrically as $r^2 = (x - a)^2 + (y - b)^2$, where (a,b) determines the coordinates of the centre of the circle and r is its radius. This
requires a 3-D parameter space, which cannot be represented and processed
as a single image. For an arbitrary curve, with no simple equation to describe
its boundary, a look-up table is used to define the relationship between the
boundary coordinates and the HT parameters. See Sonka et al. (1993) for
further details.
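As an illustration only (not the NeatVision implementation), the sketch below shows the voting step of a circle Hough transform for a single, known radius r; extending it over a range of radii gives the 3-D parameter space described above.
// Circle Hough transform voting for a fixed radius r: every white edge pixel
// votes for all candidate centres (a,b) lying at distance r from it.
int[][] acc = new int[width][height];
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    if(input.getxy(x,y)==WHITE)
      for(int t=0;t<360;t++)
      {
        double theta = Math.toRadians(t);
        int a = (int)Math.round(x - r*Math.cos(theta));
        int b = (int)Math.round(y - r*Math.sin(theta));
        if(a>=0 && a<width && b>=0 && b<height)
          acc[a][b]++;            // peaks in acc mark likely circle centres
      }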
Fig. 3.61. Finding a line in a noisy image using the Hough transform. (a) Original
image. (b) Hough transform. (c) Inverse Hough transform applied to a single white
pixel located at the point of maximum intensity in the Hough transform. (d) Union
of (a) and (c). Notice how accurately this process locates the line in the input image,
despite the presence of a high level of noise. (e) Associated visual workspace.
The following code segment summarises the key steps in the generation
of the Hough transform.
double sine[] = new double[width];
double cosine[] = new double[width];
for(int theta=0; theta<width; theta++)
{
  sine[theta] = Math.sin((theta*Math.PI)/width);
  cosine[theta] = Math.cos((theta*Math.PI)/width);
}
current_pixel = input.getxy(x,y);
if (current_pixel==WHITE)
{
  for(int theta=0;theta<width;theta++)
  {
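    // The accumulation step itself is not reproduced above; a minimal sketch is
    // given here. The division by 3 and the height/2 offset are assumptions chosen
    // to mirror the scaling used by the inverse transform listed below.
    int r = (int)((x*cosine[theta] + y*sine[theta])/3.0 + height/2.0);
    if (r>=0 && r<height)
      output.setxy(theta,r,output.getxy(theta,r)+1);   // increment accumulator cell
  }
}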
Similarly the inverse Hough transform is outlined below, where the line
count variable n is user defined.
maxx=0; maxy=0;
for(int lines=0;lines<n;lines++)
{
  max=0;
  for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
      current_pixel=input.getxy(x,y);
      if (current_pixel>max)
      {
        max=current_pixel;
        maxx=x;
        maxy=y;
      }
    }
  input.setxy(maxx,maxy,0);
  r=(maxy-((double)height/2))*3.0;
  thet=(maxx/((double)width))*Math.PI;
  if(Math.abs(Math.sin(thet))<Math.abs(Math.cos(thet)))
  {
    m=-Math.sin(thet)/Math.cos(thet);
    cons=r/Math.cos(thet);
    for(int y=0;y<height;y++)
    {
      int x=(int)(((double)y*m)+cons);
      if(x>0 && x<width)
        output.setxy(x,y,output.getxy(x,y)+1);
    }
  }
  else
  {
    m=-Math.cos(thet)/Math.sin(thet);
    cons=r/Math.sin(thet);
    for(int x=0;x<width;x++)
    {
      int y=(int)(((double)x*m)+cons);
      if(y>0 && y<height)
        output.setxy(x,y,output.getxy(x,y)+1);
    }
  }
}
$$F(m,n) = \frac{1}{N}\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,e^{-j2\pi(mx+ny)/N} \qquad (3.12)$$

$$f(x,y) = \frac{1}{N}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} F(m,n)\,e^{j2\pi(mx+ny)/N} \qquad (3.13)$$

$$\mathcal{F}[(g * h)(x,y)] = G(u,v)H(u,v) \qquad (3.14)$$

$$\mathcal{F}[g(x,y)\,h(x,y)] = (G * H)(u,v) \qquad (3.15)$$
Several algorithms have been developed to calculate the 2-D DFT. The
simplest approach makes use of the observation that this is a separable transform which can be computed as a sequence of two 1-D transforms. Therefore, the 2-D transform is generated by calculating the 1-D DFT along the
image rows and then repeating this on the resulting image but, this time,
operating on the columns, i.e. $f(x, y) \rightarrow$ Row transform $\rightarrow F_1(x, n) \rightarrow$
Column transform $\rightarrow F_2(m, n)$. This reduces the computational overhead
when compared to direct 2-D implementations. (This is the approach taken
in the Neat Vision implementation of the DFT) (See Fig. 3.62).
Fig. 3.62. Binary images and their associated 2-D Discrete Fourier Transforms.
The frequency increases radially in the range [0,1] from the centre of the DFT
image.
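A minimal sketch of this row-column decomposition is given below. The 1-D transform shown is a direct $O(N^2)$ DFT kept simple for clarity (an FFT would normally be used), and the data layout of separate real and imaginary arrays is an assumption for illustration.
// Separable 2-D DFT: 1-D DFT along every row, then along every column of the result.
class SeparableDFT
{
  static void dft1D(double[] re, double[] im)
  {
    int n = re.length;
    double[] outRe = new double[n], outIm = new double[n];
    for (int k = 0; k < n; k++)
      for (int t = 0; t < n; t++)
      {
        double ang = -2.0 * Math.PI * k * t / n;
        outRe[k] += re[t] * Math.cos(ang) - im[t] * Math.sin(ang);
        outIm[k] += re[t] * Math.sin(ang) + im[t] * Math.cos(ang);
      }
    System.arraycopy(outRe, 0, re, 0, n);
    System.arraycopy(outIm, 0, im, 0, n);
  }

  static void dft2D(double[][] re, double[][] im)
  {
    int n = re.length;
    for (int y = 0; y < n; y++)          // row transform
      dft1D(re[y], im[y]);
    double[] colRe = new double[n], colIm = new double[n];
    for (int x = 0; x < n; x++)          // column transform of the row-transformed data
    {
      for (int y = 0; y < n; y++) { colRe[y] = re[y][x]; colIm[y] = im[y][x]; }
      dft1D(colRe, colIm);
      for (int y = 0; y < n; y++) { re[y][x] = colRe[y]; im[y][x] = colIm[y]; }
    }
  }
}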
Although this is still computationally slow when compared to many feature based segmentation techniques, the Fourier transform is quite powerful.
It allows the input to be represented in the frequency domain, which can be
displayed as a pair of images. (It is not possible to represent both amplitude
and phase using a single monochrome image). Fortunately, this is not just an
algorithmic transformation but a physical one (Hanson & Manneberg 1997).
The FT can be implemented in real-time using optical techniques (Stark
1982, Mitkas et al. 1998). Optical computational devices enable parallel computation of Fourier transforms, thereby enabling rapid processing.
Once the processing within the frequency domain is complete, the inverse
transform can be used to generate a new image in the original spatial domain.
The Fourier power, or amplitude, spectrum plays an important role in image
processing and analysis. This can be displayed, processed and analysed as
an intensity image. Since the Fourier transform of a real function produces a
outputStream.writeDouble(temp_r);
outputStream.writeDouble(temp_i);
}
else
{
outputStream.writeDouble(0.0);
outputStream.writeDouble(0.0);
}
}
Adaptive Low-pass Filter. The user defined input limit is the cutoff
frequency for the filter. Unlike the low-pass filter, which is based on the
position of the filter in the Fourier space, this approach is dependent on
the real and imaginary values of all the data in the Fourier space. If
real^2 + imaginary^2 > limit then the real and imaginary values of the
frequency spectrum are retained, otherwise the real and imaginary values
are set to zero (Fig. 3.63).
for (int y=0; y<size; y++)
  for (int x=0; x<size; x++)
  {
    temp_r = inputStream.readDouble();
    temp_i = inputStream.readDouble();
    if ((temp_r*temp_r + temp_i*temp_i) > limit)
    {
      outputStream.writeDouble(temp_r);
      outputStream.writeDouble(temp_i);
    }
    else
    {
      outputStream.writeDouble(0.0);
      outputStream.writeDouble(0.0);
    }
  }
High-pass Filter. The user defined input value is the cutoff frequency for
the high-pass filter. If a point in the frequency spectrum is greater than a
given radial distance then the real and imaginary values of the frequency
spectrum are retained, otherwise the real and imaginary values are set to
zero.
r=(value * value)/4;
for (int y=0; y<size; y++)
  for (int x=0; x<size; x++)
  {
    outputStream.writeDouble(temp_r);
    outputStream.writeDouble(temp_i);
  }
  else
  {
    outputStream.writeDouble(0.0);
    outputStream.writeDouble(0.0);
  }
}
Band-pass Filter. The user defined inputs value1 and value2 are the inner and outer bounds of the band-pass region (i.e. the region to be retained).
Fig. 3.64 illustrates how certain textured features can be highlighted using
the 2-D Discrete Fourier transform. The image is transformed into the frequency domain and an ideal band-pass filter (with a circular symmetry) is
applied. This has the effect of limiting the frequency information in the image. When the inverse transform is calculated, the resultant textured image
has a different frequency content (Fig. 3.65).
r1=(value1 * value1)/4;
r2=(value2 * value2)/4;
outputStream.writeDouble(temp_r);
outputStream.writeDouble(temp_i);
}
else
{
outputStream.writeDouble(0.0);
outputStream.writeDouble(0.0);
}
}
Band-stop Filter. The user defined inputs value1 and value2 are the inner
and outer bounds of the band-stop region (Fig. 3.65).
r1=(value1 * value1)/4;
r2=(value2 * value2)/4;
for (int y=0; y<size; y++)
  for (int x=0; x<size; x++)
  {
    outputStream.writeDouble(0.0);
    outputStream.writeDouble(0.0);
}
else
{
outputStream.writeDouble(temp_r);
outputStream.writeDouble(temp_i);
}
}
Selective Filter. The user defined input value0 is the filter's cutoff frequency. value1 and value2 are x and y coordinate offset values respectively
(Fig. 3.65).
r=(value0 * value0)/4;
xOffset=(int)((value1/2)*size);
yOffset=(int)((value2/2)*size);
for(int m=0; m<size; m++)
  for(int n=0; n<size; n++)
  {
    x=(int)((n+(size-xOffset))%size);
    y=(int)((m+(size+yOffset))%size);
    xf=(float)((((x+(size/2))%size)-(size/2))/(float)size);
    yf=(float)((((y+(size/2))%size)-(size/2))/(float)size);
    if ((xf*xf + yf*yf)<r)
    {
      outputStream.writeDouble(0.0);
      outputStream.writeDouble(0.0);
    }
    else
    {
      outputStream.writeDouble(temp_r);
      outputStream.writeDouble(temp_i);
    }
  }
Fig. 3.63. Low-pass filtering in the frequency domain. (a) Original image. (b)
Frequency spectrum of (a). (c) Low-pass filtering applied to (b), value = .2. (d)
Inverse Fourier transform of (c). (e) Adaptive low-pass filtering applied to (b),
limit = .2. (f) Inverse Fourier transform of (e).
Fig. 3.64. Filtering a textured image in the frequency domain. (a) Original textured image. (b) Application of the 2-D DFT to (a). The image is the frequency
spectrum shown as an intensity function. (c) Application of an ideal band-pass
filter to (b). (d) The resultant spatial domain image after the inverse 2-D
discrete Fourier transform is applied to (c). (e) Associated visual workspace.
Fig. 3.65. Filtering in the frequency domain. (a) High-pass. (b) Low-pass. (c)
Band-pass. (d) Band-stop. (e) Selective. (f) Symmetric (selective).
Fig. 3.66. Gaussian filtering in the frequency domain. (a) Original image. (b)
Frequency spectrum of (a). (c) 2-D Gaussian. (d) Resultant image after the application of the Gaussian in (c), see Fig. 3.67. (e) 3-D profile of (d). (f) Inverse Fourier
transform of (e).
Fig. 3.67. NeatVision visual workspace for Gaussian filtering in the frequency
domain.
Fig. 3.68. A typical watermark. (a) Plain paper. (b) Paper with chain (vertical)
and laid (horizontal) lines. Note the uneven illumination on both images.
3.5 Conclusion
The ideas outlined in this chapter will enable the reader to tackle a wide range
of machine vision applications. Over the coming chapters we will go into more
detail on specific methodologies dealing with the analysis of shape, colour and
texture. Specifically, in Chap. 4 we focus on the key set of techniques which
form the mathematical morphology framework. This approach to machine
vision is particularly useful for the analysis of shapes in binary and grey scale
images.
Fig. 3.69. Minimizing the effects of unwanted periodic lines in non-plain paper
samples using the Fourier transform. (a) Original Image. (b) DFT frequency spectrum as an intensity function. (c) Selective low-pass filtering of (b). (d) Inverse
Fourier transform of (c). (e) White top-hat of (d). (f) Filtered double threshold of
(e).
4. Mathematical Morphology
Mathematical morphology is a set-theoretic branch of nonlinear image processing and analysis that was initially developed by Matheron (1975). It deals
with geometric structure within an image. In fact, the term morphology refers
to the study of form and structure. Serra (1982), who describes the application of morphology to image analysis as the quantitative description of
geometric structures, produced the first systematic treatment of the subject.
Along with his colleagues at the Centre de Morphologie Mathematique in
France, Serra is one of the main independent contributors to the development
and growth of morphological techniques in image processing and analysis.
The two key reference books in this area by Matheron (1975) and Serra
(1988) tend to be highly mathematical and as such they are generally not
suitable for researchers new to the subject. For a good introduction to the
area of morphological image processing and analysis, the reader is directed
to Dougherty (1992) and more recently Soille (1999) (the latter book provides many working examples of morphology). Haralick & Shapiro (1992)
also provide an excellent tutorial text, which covers both binary and grey
scale morphology. Maragos (1987) reviews advances in the theory and application of morphological image analysis. This includes a discussion on the
application of morphology to filtering, edge detection, noise suppression, region filling, shape recognition, skeletonisation, shape smoothing and image
coding.
Mathematical morphological techniques appear across a broad range of
image processing and analysis applications. These techniques have proven to
be quite powerful, with applications including medical imaging and biology
(Liang et al. 1989), metallurgy (Serra 1982), surface description (Noble 1989),
texture analysis and synthesis (Serra 1988), microscopy (Klein & Serra 1972),
coding of binary images (Maragos & Schafer 1986), target tracking (Preston
1991), automated visual inspection (Sternberg 1986) and industrial vision
(Sternberg 1984, Whelan & Soille 1998).
The basic concept involved in mathematical morphology is simple: an image is probed with a template shape, called a structuring element (SE), to
find where the SE fits, or does not fit within a given image (Dougherty & Laplante 1995) (Fig. 4.1). By marking the locations where the template shape
fits, structural information can be derived from the image. The structuring
elements used in practice are usually geometrically simpler than the image
they act on, although this is not always the case, see Sec. 4.5. Common structuring elements include points, point pairs, vectors, lines, squares, octagons,
discs, rhombi and rings. Key operations include expanding and shrinking,
thinning, thickening and the removal of unwanted features from the object
of interest. The effect of these operations on the image is dependent on the
size and shape of the SE.
Since shape is a prime carrier of information in many machine vision
applications, mathematical morphology has an important role to play in
industrial systems (Haralick et al. 1987, Soille 1999). The language of binary morphology is derived from that of set theory (Haralick & Shapiro
1992). General mathematical morphology is normally discussed in terms of
Euclidean N-space, but in digital image analysis we are only interested in
a discrete or digitised equivalent in two-space. The following chapter uses
a formalism based on the Minkowski set operations (Haralick & Shapiro
1992, Dougherty 1992, Sonka et al. 1993). Slight variations of these definitions can also be found in the literature on this subject (Serra 1982, Maragos
1987, Noble 1989, Soille 1999).
The Neat Vision code for both the 4-connected and the 8-connected dilation operator is given below. This is the generic dilation operation and applies
equally to grey scale images (Sec. 4.2.1).
// 4-connected dilation
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    a = input.getxy(x,y-1);     //   +---+---+---+
    b = input.getxy(x-1,y);     //   |   | a |   |
    c = input.getxy(x+1,y);     //   +---+---+---+
    d = input.getxy(x,y+1);     //   | b | e | c |
    e = input.getxy(x,y);       //   +---+---+---+
                                //   |   | d |   |
                                //   +---+---+---+
    z = Math.max(Math.max(Math.max(Math.max(a,b),c),d),e);
    output.setxy(x,y,z);
  }
// 8-connected dilation
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    a = input.getxy(x-1,y-1);   //   +---+---+---+
    b = input.getxy(x,y-1);     //   | a | b | c |
    c = input.getxy(x+1,y-1);   //   +---+---+---+
    d = input.getxy(x-1,y);     //   | d | e | f |
    e = input.getxy(x,y);       //   +---+---+---+
    f = input.getxy(x+1,y);     //   | g | h | i |
    g = input.getxy(x-1,y+1);   //   +---+---+---+
    h = input.getxy(x,y+1);
    i = input.getxy(x+1,y+1);
    z = Math.max(i,Math.max(Math.max(Math.max(a,b),
        Math.max(c,d)),Math.max(Math.max(e,f),
        Math.max(g,h))));
    output.setxy(x,y,z);
  }
pixelPlusMask = input.getxy(x+mx-maskOffset,y+my-maskOffset)+mask[mx+
    (my*maskSize)].intValue();
if (pixelPlusMask>max)
  max=pixelPlusMask;
}
output.setxy(x,y,flow(max));
}
// 4-connected erosion
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    a = input.getxy(x,y-1);     //   +---+---+---+
    b = input.getxy(x-1,y);     //   |   | a |   |
    c = input.getxy(x+1,y);     //   +---+---+---+
    d = input.getxy(x,y+1);     //   | b | e | c |
    e = input.getxy(x,y);       //   +---+---+---+
                                //   |   | d |   |
                                //   +---+---+---+
    z = Math.min(Math.min(Math.min(Math.min(a,b),c),d),e);
    output.setxy(x,y,z);
  }
// 8-connected erosion
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    a = input.getxy(x-1,y-1);   //   +---+---+---+
    b = input.getxy(x,y-1);     //   | a | b | c |
    c = input.getxy(x+1,y-1);   //   +---+---+---+
    d = input.getxy(x-1,y);     //   | d | e | f |
    e = input.getxy(x,y);       //   +---+---+---+
    f = input.getxy(x+1,y);     //   | g | h | i |
    g = input.getxy(x-1,y+1);   //   +---+---+---+
    h = input.getxy(x,y+1);
    i = input.getxy(x+1,y+1);
    z = Math.min(i,Math.min(Math.min(Math.min(a,b),
        Math.min(c,d)),Math.min(Math.min(e,f),
        Math.min(g,h))));
    output.setxy(x,y,z);
  }
output.setxy(x,y,flow(min));
}
Fig. 4.4. Binary erosion and dilation. (a) Original binary image. (b) Erosion by a
3 x 3 square SE. (c) Dilation by a 3 x 3 square SE.
and dilation operations are not inverses of each other, rather they are related by the following duality relationships: $(A \ominus B)^c = A^c \oplus \check{B}$ and
$(A \oplus B)^c = A^c \ominus \check{B}$, where $A^c$ refers to the complement of the image
set A and $\check{B} = \{x \mid x = -b \text{ for some } b \in B\}$ refers to the reflection of B
about the origin. Serra (1986) refers to this as the transpose of the SE. This
relationship is illustrated in Fig. 4.5.
Fig. 4.5. Duality relationship. (a) SE B and its reflection about the origin, $\check{B}$
(SE origins are indicated by a dot). (b) Input image A. (c) $A \ominus B$. (d) $(A \ominus B)^c$.
(e) $A^c$. (f) $A^c \oplus \check{B}$. This is equivalent to the image illustrated in (d).
hit = ErodeNxN.erodeNxN(input,hitMask);
miss = ErodeNxN.erodeNxN(Not.not(input) ,missMask);
output = And.and(hit,miss);
return(output);
}
The use of the hit-and-miss symbol '$\otimes$' should not be confused with the Boolean
XOR operator described earlier.
hit and miss patterns respectively (as illustrated in Fig. 4.7). Thinning and
thickening operations can also be implemented directly using the hit-and-miss
transform.
Fig. 4.6. Binary corner detection using the Hit-and-Miss Transform. (a) Original
binary image. (b) Application of the Hit-and-Miss structuring elements illustrated
in Fig. 4.7. The detected points are indicated by white squares for clarity.
Fig. 4.7. Hit-and-Miss structuring elements for binary corner detection. Where 'x'
indicates a "don't care" state.
// 4-connected opening
open4(input)
{ return(Dilation.dilate4(Erosion.erode4(input))); }
// 8-connected opening
open8(input)
{ return(Dilation.dilate8(Erosion.erode8(input))); }
Closing. Closing is the dual morphological operation of opening. This transformation has the effect of filling in holes and blocking narrow valleys in the
image set A, when a SE B (of similar size to the holes and valleys) is applied
(Fig. 4.8). The closing of the image set A by the SE B is denoted $A \bullet B$ and
is defined as $(A \oplus B) \ominus B$. Opening and closing operations are related as follows:
$(A \bullet B)^c = A^c \circ B$.
// 4-connected closing
close4(input)
{ return(Erosion.erode4(Dilation.dilate4(input))); }
// 8-connected closing
close8(input)
{ return(Erosion.erode8(Dilation.dilate8(input))); }
One important property that is shared by both the opening and closing
operations is idempotency. This means that successive reapplication of the
operations will not change the previously transformed image. Therefore, $A \circ B = (A \circ B) \circ B$ and $A \bullet B = (A \bullet B) \bullet B$.
4.1.4 Skeletonisation
Fig. 4.8. Binary opening and closing. (a) Original binary image. (b) Opening by
a square SE. (c) Closing by a square SE.
Some vision systems can perform basic morphological operations very quickly
in a parallel and/or pipelined manner. Implementations that involve such special purpose hardware tend to be expensive, although there are some notable
exceptions. Unfortunately, some of these systems impose restrictions on the
shape and size of the structuring elements that can be handled. Therefore,
one of the key problems associated with the application of morphological
techniques to industrial image analysis is the generation and/or decomposition of large structuring elements (Pitas & Venetsanopoulos 1990). Two main
strategies have emerged to tackle this problem, namely serial and parallel decomposition.
Serial Decomposition. The first technique is called dilation or serial decomposition. This decomposes certain large structuring elements into a sequence of successive erosion and dilation operations, each step operating on
the preceding result. Unfortunately, the decomposition of large structuring
elements into smaller ones is not always possible. Also, those decompositions
that are possible are not always easy to identify and implement. If
a large SE B can be decomposed into a chain of dilation operations,
$B = B_1 \oplus B_2 \oplus \cdots \oplus B_n$ (Fig. 4.10) (Waltz 1988), then the dilation of the image set A
by B is given by: $A \oplus B = A \oplus (B_1 \oplus B_2 \oplus \cdots \oplus B_n) = (((A \oplus B_1) \oplus B_2) \cdots) \oplus B_n$.
Similarly, using the chain rule, which states that $A \ominus (B \oplus C) = (A \ominus B) \ominus C$,
the erosion of A by B is given by: $A \ominus B = A \ominus (B_1 \oplus B_2 \oplus \cdots \oplus B_n) =
(((A \ominus B_1) \ominus B_2) \cdots) \ominus B_n$ (Zhuang & Haralick 1986).
Fig. 4.9. Skeletonisation. (a) SE B. (b) Original image A. (c) $A \circ B$. (d) $S_0(A, B)$.
Parallel Decomposition. A second approach to the decomposition problem is based on breaking up the SE, B, into a union of smaller components,
Fig. 4.11. Parallel decomposition of an arbitrary SE using nine 3 x 3 structuring elements.
Waltz (1988) compared these structuring element decomposition techniques and showed that the serial approach has a 9:4 speed advantage over
its parallel equivalent. This was based on an arbitrarily specified 9 x 9 pixel
SE, implemented on a commercially available vision system. However, the
parallel approach has a 9:4 advantage in the number of degrees of freedom,
since every possible 9 x 9 SE can be achieved with the parallel decomposition,
but only a small subset can be realised with the serial approach. Although
slower than the serial approach, it has the advantage that there is no need
for serial decomposition of the SE.
Classical parallel and serial methods mainly involve numerous scans of the image pixels and are therefore inefficient when implemented on conventional computers. This is because the number of scans depends
on the total number of pixels (or edge pixels) in the shape to be processed
by the morphological operator. The parallel approach is suited to some customised (parallel) architectures, while the ability to implement such parallel
approaches on serial machines is discussed by Vincent (1991a).
4.1.6 Interval Coding
While 2-D binary images may be represented in a number of ways (Vincent 1991a, Vincent 1991b), interval coding (Piper 1985, Liang et al. 1989)
allows the efficient implementation of binary morphological operations on serial computers. The coding scheme is based on the earlier work of Young et
al. (1981) and is similar to run length encoding discussed in Sec. 3.3.4. This
approach places no restrictions on the shape or size of the scene, or of the
structuring elements, to be manipulated.
Erosion and Dilation Using Interval Coding. Any image set can be
regarded as a union of intervals or run lengths, denoted I, where each interval
is a set of pixels such that I = (LC, RC, R), where LC is the pixel column
coordinate at the start of the interval, RC is the pixel column coordinate at the end
of the interval and R is the row on which the interval occurs. Both the scene
A and the SE B can be represented as unions of intervals: $A = \bigcup_A I_A$ and
$B = \bigcup_B I_B$.
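A minimal sketch of building such an interval list for one image row is given below; the Interval class and the use of java.util.ArrayList are illustrative assumptions rather than the NeatVision representation.
// Interval (run length) coding of a single row of a binary image.
import java.util.ArrayList;
import java.util.List;

class IntervalCoding
{
  static class Interval
  {
    int lc, rc, r;                                   // start column, end column, row
    Interval(int lc, int rc, int r) { this.lc = lc; this.rc = rc; this.r = r; }
  }

  static List<Interval> codeRow(int[] row, int r, int white)
  {
    List<Interval> intervals = new ArrayList<>();
    int start = -1;
    for (int x = 0; x < row.length; x++)
    {
      if (row[x] == white && start < 0) start = x;   // interval begins
      if (start >= 0 && (row[x] != white || x == row.length-1))
      {
        int end = (row[x] == white) ? x : x-1;       // interval ends
        intervals.add(new Interval(start, end, r));
        start = -1;
      }
    }
    return intervals;
  }
}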
In Fig. 4.13, a 1-D morphological filter operates on an analogue signal (equivalent to a cross-section of a grey scale image), where the input signal is represented by the upper curve and the output by the lower curve. In this simple
example, the SE has an approximately parabolic form. In order to calculate
a value for the output signal, the SE is pushed upwards, from below the
input curve. The height of the top of the SE is noted. This process is then
repeated, by sliding the SE sideways and the locus of the SE's origin is traced
to produce the output signal. Notice how this particular operator attenuates
the intensity peak but follows the input signal quite accurately everywhere
else. Subtracting the output signal from the input would produce a result in
which the intensity peak is emphasised and all other variations are reduced.
The effect of the basic morphological operators on 2-D grey scale images can
also be explained in these terms. Imagine the grey scale image as a landscape
in which each pixel can be viewed in three-dimensions. The extra height dimension represents the grey scale value of a pixel. We generate new images
by passing the SE above/below this landscape.
Fig. 4.15. Grey scale dilation and erosion of a linear image f(x) by a 1 x 3 SE
B = [25,50,25]. (a) Original image f(x) (bottom) and the associated SE B. (b)
$f(x) \oplus B$. (c) $f(x) \ominus B$.
By combining the opening and closing operations the effect of both white
and black noise in an image can be reduced. In Fig. 4.19 the original image
is corrupted with a low level of salt and pepper noise (i.e. a random amount
of black and white pixels are overlaid on the image). By opening the image
with a 3 x 3 square SE the white noise can be eliminated from the image, although the image suffers some degradation. Similarly, by applying the closing
operator, also using a 3 x 3 square SE, black noise can be removed.
saltnpepper(inputImage, double noiseInput)
{
  for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
      if (Math.random()<noiseInput)
      {
        if(Math.random()<0.5)
          pixelValue = BLACK;
        else
          pixelValue = WHITE;
      }
      else
        pixelValue = inputImage.getxy(x,y);
      outputImage.setxy(x,y,pixelValue);
    }
  return(outputImage);
}
Fig. 4.19(d) illustrates the combined image resulting from the opening
and closing stages. Fig. 4.19 also illustrates the resulting images when the
Sobel edge detector is applied to the original noisy image and the image after
the morphological noise removal has been applied.
4.2.3 Morphological Gradients
The application of some of the basic morphological operations outlined previously can lead to some simple gradient detection techniques (Soille 1999).
Internal Gradient. This is represented by $A - (A \ominus B)$. It is also referred
to as a half-gradient by erosion and enhances the internal boundaries of objects brighter than the background. External boundaries of objects darker
than the background are also enhanced.
External Gradient. This is represented by $(A \oplus B) - A$. It is also referred
to as a half-gradient by dilation and is the complementary operation to the
internal gradient.
Morphological Gradient. The simplest morphological gradient of an image A, with respect to a SE B, is the sum of the internal and external gradients and can be defined as $\mathrm{GRAD}(A) = (A \oplus B) - (A \ominus B)$, see Fig. 4.20.
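Using the dilation, erosion and subtraction operations already listed, the three gradients can be sketched as follows (the Subtract.subtract helper is an assumed name for a point-wise arithmetic difference).
// Internal, external and full morphological gradients (8-connected, 3 x 3 SE).
internal = Subtract.subtract(input, Erosion.erode8(input));
external = Subtract.subtract(Dilation.dilate8(input), input);
gradient = Subtract.subtract(Dilation.dilate8(input), Erosion.erode8(input));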
Fig. 4.16. Grey scale dilation and erosion (8-connected) using a 3 x 3 flat SE (i.e.
all entries set to 1). (a) Original image. (b) 3-D representation of (a). (c) Dilation.
(d) 3-D representation of (c). (e) Erosion. (f) 3-D representation of (e).
Fig. 4.17. Grey scale opening and closing of a linear image f(x) by a 1 x 3 structuring element B = [25,50,25]. (a) Original image f(x) (bottom) and its associated
SE B. (b) $f(x) \circ B$. (c) $f(x) \bullet B$.
Fig. 4.18. Grey scale opening and closing (8-connected) of Fig 4.16(a). (a) Opening. (b) 3-D representation of (a). (c) Closing. (d) 3-D representation of (c).
Fig. 4.19. Noise removal (8-connected). (a) Original image contaminated with salt
and pepper noise. (b) Opening. (c) Closing. (d) Opening and closing operations
combined. (e) Sobel edge detector applied to (d). (f) Sobel edge detector applied
to (a).
Fig. 4.20. The morphological gradient applied to an x-ray image of a chicken fillet.
(Courtesy of M. Graves, Intelligent Manufacturing Systems Ltd., UK). (a) Original
grey scale image. (b) Internal boundary. (c) External boundary. (d) Morphological
gradient.
4.2.4 Point-Pairs
Fig. 4.24. Determining the periodicity of lines by the erosion of point-pairs with
increasing inter-pixel distance. (a) Input image. (b) Erosion with a linear SE point-pairing separated
by 10 pixels. (c) Erosion with a linear SE point-pairing separated
by 16 pixels. This results in retention of the periodic lines. (d) Erosion with a linear
SE point-pairing separated by 17 pixels.
the application of the top-hat to the linear image f(x). Fig. 4.26 illustrates
the effect of the top-hat transform on a grey scale image. One particularly
useful feature of this operator is its ability to remove a light gradient from
an image (Fig. 4.27).
tophat4(input){
  return(subtract(input,open4(input)));
}
tophat8(input){
  return(subtract(input,open8(input)));
}
valley4(input){
  return(subtract(close4(input),input));
}
Fig. 4.26. Grey scale top-hat transformation (8-connected) of Fig 4.16(a). (a)
White top-hat. (b) 3-D representation of (a). (c) Black top-hat. (d) 3-D representation of (c) .
valley8(input){
  return(subtract(close8(input),input));
}
Fig. 4.27. From an uneven to an even background light gradient using the white
top-hat transform. (a) Input image. (b) Opening with a 7 x 7 square. (c) White
top-hat: arithmetic difference between (a) and (b).
Fig. 4.28. Conditional dilation. Beginning with the marker image we repeatedly
apply conditional dilations relative to the entire original image using a 3 x 3 SE.
(a) The mask image G is shaded. The marker image F is indicated by the white
pixel at the centre of the figure 8. (b) 20 iterations of the conditional dilation. (c)
50 iterations of the conditional dilation. Notice that the bridging effect begins to
occur on the bottom right edge of the character "A" in the image. (d) 70 iterations
of the conditional dilation. The effect of bridging is clearly illustrated.
The geodesic erosion of the marker image F with respect to the SE B and
under the condition given by the mask image G is defined as $\varepsilon_G^{(1)}(F) = \varepsilon^{(1)}(F) \vee G$,
i.e. the point-wise maximum of the eroded marker and the mask.
² A geodesic distance is defined as the shortest discrete distance between two pixels
within a graph or connected region.
This is defined as the geodesic dilation of the marker image F with respect
to the mask image G repeated until stability is reached and is denoted $R_G(F) = \delta_G^{(i)}(F)$ such that $\delta_G^{(i+1)}(F) = \delta_G^{(i)}(F)$. The mask (control) image G is therefore
reconstructed from a marker image F (Vincent 1993, Soille 1999).
Fig. 4.30 illustrates how reconstruction by dilation, when applied to a
single pixel, can isolate a region within a complex scene. Figs. 4.32 and 4.33
illustrate how the removal of boundary objects from binary and grey scale
images can be implemented using this approach.
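A minimal sketch of reconstruction by dilation is given below; the GreyImage type name, the point-wise minimum helper Min.min and the equals test are illustrative assumptions, while Dilation.dilate8 is the dilation operator listed earlier.
// Reconstruction by dilation: iterate geodesic dilations of the marker under the
// mask until no further change occurs (stability).
GreyImage previous;
GreyImage current = marker;
do
{
  previous = current;
  current = Min.min(Dilation.dilate8(previous), mask);   // one geodesic dilation step
} while (!current.equals(previous));
// current now holds R_G(F), the reconstruction of the mask from the marker.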
4.3.5 Reconstruction by Erosion
Reconstruction by erosion is defined analogously, using geodesic erosions of the marker image F with respect to the mask image G until stability is reached: $R^*_G(F) = \varepsilon_G^{(i)}(F)$ such that $\varepsilon_G^{(i+1)}(F) = \varepsilon_G^{(i)}(F)$.
4.3.6 Ultimate Erosion
Ultimate erosion involves eroding the image until the point just before each
component within the image disappears. The union of these components results in the ultimate erosion. This can be viewed as the relative minima of
the complement of the grass fire transform, see Sec. 3.3.2.
4.3.7 Double Threshold
The double threshold operator (DBLT) thresholds the input image for two
ranges of grey scale values, one being included in the other. The threshold
DBLT for the narrow range is then used as a seed for the reconstruction of
the threshold for the wide range:
$$\mathrm{DBLT}(f) = R_{T_{[t_1,t_4]}(f)}\left[T_{[t_2,t_3]}(f)\right], \qquad t_1 \le t_2 \le t_3 \le t_4 \qquad (4.1)$$
where R denotes the morphological reconstruction by dilation (Lantuejoul
& Maisonneuve 1984, Soille 1997) and T is the threshold operation. The resulting binary image is much cleaner than that obtained with a unique threshold. Moreover, the result is more robust to slight modifications of threshold
values. The double threshold technique is very similar to the hysteresis threshold proposed in Canny (1986) (Sec. 3.2.3).
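A minimal sketch of the double threshold is given below. The DualThreshold.doubleThreshold call follows the usage shown earlier in this chapter; the Reconstruction.byDilation helper is an assumed name standing in for the reconstruction-by-dilation procedure sketched above.
// Double threshold: the narrow-range threshold seeds a reconstruction of the
// wide-range threshold (t1 <= t2 <= t3 <= t4).
narrow = DualThreshold.doubleThreshold(input, t2, t3);   // marker: narrow range
wide   = DualThreshold.doubleThreshold(input, t1, t4);   // mask: wide range
result = Reconstruction.byDilation(narrow, wide);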
In the example illustrated in Fig. 4.34, the double threshold is applied in
conjunction with reconstruction by dilation to remove all the pixels which do
Fig. 4.30. Reconstruction by dilation. (a) Input mask image G. (b) Single pixel
marker image F (indicated by an arrow). This forms the locus of the geodesic dilation
operation. (c) Result of reconstruction by dilation $R_G(F)$ when applied to (a) and
(b). Geodesic dilation is terminated when no further growth occurs. (d) Resultant
image when (c) is subtracted from the original image (a), $G - R_G(F)$.
not belong to the watermark image. The final filtering stage, Fig. 4.34(f), is
based on a surface area criterion applied to the closing of the double threshold
image: all connected components having a number of pixels smaller than a
given threshold are removed. The filtered closed image is then intersected
with the double threshold image.
4.3.8 Image Maxima
Fig. 4.31. Visual workspace for the operation described in Fig. 4.30.
For a given connected component within an image, its Influence Zone (IZ)
is the collection of points that are closer to it than to any other component.
Alternatively, the IZs of a collection of points are defined as the Voronoi regions
associated with these points. The SKIZ operation produces the boundaries of
the various IZs, indicating the points that do not belong to any IZ. This is
approximated by the application of the thinning operator to the background
of a binary image and then removing any spurious branches, see Sec. 3.3.2.
A geodesic SKIZ restricts attention to C and generates the IZ of A relative
to C, where C is a component of the image and A is a disconnected subset
of C. While this can be used to segment an image (Fig. 4.35), the SKIZ does
not always produce suitable results. This leads us to the examination of the
watershed segmentation technique.
4.4.2 Watershed Segmentation
Fig. 4.32. Removing boundary objects from the binary image using reconstruction
by dilation. (a) Mask image G. (b) The marker image F is the point-wise minimum
between a single pixel wide white border and the mask image G. (c) Reconstruction
by dilation, RG(F). (d) Removal of the white regions touching the image border,
G - RG(F).
Fig. 4.33. Removing boundary objects from a grey scale image using reconstruction
by dilation. (a) Original mask image G. (b) The marker image F is the point-wise
minimum between a single pixel wide white border and the mask image G. (c)
Reconstruction by dilation, RG (F). (d) Removal of the grey scale regions touching
the image border, G - RG(F).
Fig. 4.34. Image reconstruction using double thresholding. (a) Original watermark
image. (b) White top-hat of (a). (c) First threshold of (b): marker image F. The
threshold value is chosen to minimize the unwanted background information without losing too much of the watermark. (d) Second threshold of (b): mask image G.
The threshold value is chosen to maximise the watermark data. (e) Reconstruction
of mask image from the marker. (f) After filtering based on the surface area of the
connected components of the closing of the reconstructed image by a 3 x 3 SE.
Fig. 4.35. Skeleton by Influence Zones (SKIZ). (a) Original image. (b) Resultant
image from the SKIZ operator. (c) Combination of (a) and (b). (d) Image (c)
zoomed to highlight incorrect segmentation.
have few grey levels separating them. The key to this form of segmentation
is finding suitable markers (i.e. local minima). Watersheds are zones dividing
adjacent Catchment Basins (CBs), where a catchment basin is defined as
a local minimum representing a low gradient region interior. While this
approach can be explained in a number of different ways, the Immersion
Approach is one of the more intuitive. This involves assuming that a hole
exists at each local minimum and, as the surface is placed in water, the CBs are
filled from the bottom up. If two CBs are likely to merge due to further
immersion, a dam is built all the way to the surface (highest grey scale
level). This dam represents the watershed line. An efficient implementation
of this approach involves sorting the pixels in increasing order of grey scale
values, followed by a flooding step consisting of a fast breadth-first scanning
of all the pixels in the order of their grey values (Vincent & Soille 1991). As
well as being computationally efficient, this implementation overcomes the
inaccuracies that occur in previous approaches to the watershed algorithm.
Fig. 4.36. Watershed Segmentation. (a) Original image. (b) Negative of its distance
function. (c) Profile of (b) along the horizontal line indicated in (b). (d) Immersion
of the image until the first watershed point is found and flagged. (e) Immersion
continues until all the watershed points are found and flagged. (f) Profile with the
watershed points flagged. (g) Result of the watershed segmentation process.
Fig. 4.37. Separating connected components using watershed segmentation. (a)
Original image. (b) Binary version of (a). (c) Watershed lines. (d) Watershed regions. (e) Separation of the discs in the image into distinct regions.
Fig. 4.38. The NeatVision visual workspace for the operation described in Fig. 4.37.
Fig. 4.39. Grey scale segmentation using the watershed approach. (a) Original
image. (b) Application of watershed. The image has clearly been over-segmented.
Fig. 4.40. Geometric packing using mathematical morphology. (a) Input image A.
(b) SE B. (c) Erosion residue $C = A \ominus B$. (d) Step 4. (e) Updated
scene, see step 5.
the shapes already packed, due to the idempotent nature of this operation
(Fig. 4.41) (Whelan & Batchelor 1996).
Fig. 4.41. Automated packing of tool shapes. (a) Tools packed in their current
orientation using geometric packing. (b) Tools reorientated for better efficiency
using heuristics. (c) Tools packed in an irregular scene.
4.7 Conclusion
This chapter has just hinted at the power of the mathematical morphology
methodology. It offers vision designers a structured approach in which they
can design complex machine vision solutions. While mathematical morphology is predominantly used in the analysis and manipulation of shapes, it can
also be applied to the analysis of texture features (as we will see in Chap. 5).
Chaps. 5 and 6 will discuss some of the key methodologies relating to the
analysis of colour and texture. While the initial version of NeatVision (Version
1.0) only contains limited support for texture and colour image analysis, it
is our aim to present the background material necessary for designers to
implement these techniques themselves using the NeatVision development
environment (as discussed in Chap. 7).
5. Texture Analysis
Fig. 5.1. Edge density measurement using the Sobel edge detector. (a) Sample
sand image with an edge density of 35.15. (b) Sample grass image with an edge
density of 44.55. (c) Associated visual workspace.
Fig. 5.2. Monte-Carlo method visual workspace. An image containing binary line
A(δx, δy) = [Σ_{i,j} I(i, j) × I(i + δx, j + δy)] / [Σ_{i,j} I(i, j)²]    (5.1)
where I(i, j) is the image matrix. The variables (i, j) are restricted to lie within a specified window, outside of which the intensity is zero. Incremental shifts of the image are given by (δx, δy). As the ACF and the power spectral density are Fourier transforms of each other, the ACF can yield information about the spatial periodicity of the texture primitive. Although useful at times, the ACF has severe limitations. It cannot always distinguish between textures, since many subjectively different textures have the same ACF. As the ACF captures a relatively small amount of texture information, this approach has not found much favour in texture analysis.
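A minimal sketch of how Eqn. 5.1 might be evaluated for a single shift is given below. It is written against a plain 2-D int array rather than the NeatVision image classes, and treats pixels outside the window as zero, as described above.

// Normalised autocorrelation of Eqn. 5.1 for a single shift (dx, dy).
public static double autocorrelation(int[][] img, int dx, int dy) {
    int h = img.length, w = img[0].length;
    double num = 0.0, den = 0.0;
    for (int j = 0; j < h; j++)
        for (int i = 0; i < w; i++) {
            double v = img[j][i];
            den += v * v;                                  // sum of I(i,j)^2
            int si = i + dx, sj = j + dy;
            if (si >= 0 && si < w && sj >= 0 && sj < h)
                num += v * img[sj][si];                    // I(i,j) * I(i+dx, j+dy)
        }
    return den == 0.0 ? 0.0 : num / den;
}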
Contrast: CON = Σ_i i² × P'(i).
Angular Second Moment: ASM = Σ_i P'(i)².
Entropy: ENT = -Σ_i P'(i) × log(P'(i)).
Mean: MEAN = (1/k) Σ_i i × P'(i).
Inverse Difference Moment: IDM = Σ_i P'(i)/(i² + 1).
coc_a = input.getxy(x-1,y);    // pixel d
coc_b = input.getxy(x,y);      // pixel e
coc_c = input.getxy(x+1,y);    // pixel f
output.setxy(coc_b,coc_c,output.getxy(coc_b,coc_c)+1);
output.setxy(coc_b,coc_a,output.getxy(coc_b,coc_a)+1);
}
Fig. 5.3. Co-occurrence matrix applied to a sub-image. (a) Sub-image with 4 grey levels. (b) Co-occurrence matrix f(p, q, 1, 0) for (a).
Since the co-occurrence matrix also depends on the image intensity range,
it is common practice to normalise the textured image's grey scale prior to
generating the co-occurrence matrix. This ensures that the first-order statistics have standard values and avoids confusing the effects of first- and second-order statistics of the image. A number of texture measures (also referred to
as texture attributes) have been developed to describe the co-occurrence matrix numerically and allow meaningful comparisons between various textures
(Haralick & Shapiro 1992). Although these attributes are computationally
expensive and rotation variant, they are simple to implement and have been
successfully used in a wide range of applications. They tend to work well for
stochastic textures and for small window sizes. Rotation-invariance can be
achieved by accumulation of the matrices, but this increases the computational cost of the approach. A selection of the more commonly used texture
attributes are now outlined.
Fig. 5.4. Co-occurrence matrix. (a) Input image. (b) Co-occurrence matrix. (c)
Associated visual workspace.
5.9.2 Entropy
correlate with a person's perception of an image's complexity (i.e. a high entropy value implies a high degree of image complexity).
double entropy = 0;
double current_value = 0;
int width = input.getWidth();
int height = input.getHeight();
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
        current_value = (double)input.getxy(x,y);
        if (current_value != 0.0)
            entropy -= current_value*Math.log(current_value);
    }
5.9.3 Inertia

This is a measurement of the moment of inertia of the co-occurrence matrix about its main diagonal. This is also referred to as the contrast of the textured image. This attribute gives an indication of the amount of local variation of intensity present in an image.

Inertia = Σ_p Σ_q (p - q)² × f(p, q, d, a).
double contrast = 0.0;
int width = input.getWidth();
int height = input.getHeight();
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
        contrast += Math.pow(y-x,2)*input.getxy(x,y);

The local homogeneity (LH) attribute listed in Table 5.1, LH = Σ_p Σ_q f(p, q, d, a)/(1 + |p - q|), is computed in the same manner, the body of the loop becoming:

current_pixel_value = input.getxy(x,y);
homogeneity += current_pixel_value/(1+Math.abs(y-x));
}
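The energy (angular second moment) attribute that also appears in Table 5.1 can be computed in the same style; the short sketch below assumes the same co-occurrence image interface (getWidth(), getHeight() and getxy()) used in the segments above.

// Energy (angular second moment): the sum of the squared co-occurrence entries.
double energy = 0.0;
int width = input.getWidth();
int height = input.getHeight();
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
        double f = (double)input.getxy(x,y);
        energy += f*f;
    }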
The co-occurrence approach has been successfully used for a wide class
of textured images in application areas such as aerial, medical and microscopic
imaging. A detailed survey of the co-occurrence applications is presented in
Table 5.1. Co-occurrence attributes for the sand and grass textures illustrated in Fig. 5.1. Images were equalized prior to the calculation of the attributes.

Attribute        Sand f(p,q,1,0)   Sand f(p,q,1,90)   Grass f(p,q,1,0)   Grass f(p,q,1,90)
Energy (10)      31.31             30.9               3.19               3.53
Entropy (10^4)   -18.81            -18.96             -9.19              -9.70
Inertia (10^8)   5.46              6.5                2.96               2.42
LH (10^3)        3.51              3.04               2.02               2.53
Haralick & Shapiro (1992). The power ofthis technique is that it captures the
second order grey levels statistics which are used by the human perception in
discrimination of textures. Unfortunately this approach does not lend itself
to describing the shape aspects of the texture.
This approach examines the ratio between the opening and closing of an image. The choice of structuring element and image threshold have a significant bearing on the performance of this technique. The ratio measurements for the sand and grass images illustrated in Fig. 5.1 are 1.09 and 17.78 respectively (Fig. 5.5).
5.10.2 Granularity
The granularity measure G(d) is defined in terms of A[F ∘ H_d], the area of the opening of the image F by a structuring element H_d of size d (Eqn. 5.2).
D = lim_{s→0} [log N(s) / log(1/s)]    (5.3)
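Given a set of measured pairs (s, N(s)) such as those in Fig. 5.8, the dimension D can be estimated by a least-squares fit of log N(s) against log(1/s); a small sketch of this (in plain Java, independent of the NeatVision classes) is shown below.

// Estimate the box-counting dimension from measured (s, N(s)) pairs by fitting
// log N(s) = D * log(1/s) + c in the least-squares sense; the slope is the estimate of D.
public static double boxCountingDimension(double[] s, double[] n) {
    int m = s.length;
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int k = 0; k < m; k++) {
        double x = Math.log(1.0 / s[k]);
        double y = Math.log(n[k]);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    return (m * sxy - sx * sy) / (m * sxx - sx * sx);
}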
Fig. 5.8. Fractal Dimension: Box-counting method for a 1-D curve. (a) to (d) show successively smaller box sizes s, with N(s) = 7, 15, 32 and 68 respectively.
S_n(r, c) = (1/49) Σ_{i=-3}^{3} Σ_{j=-3}^{3} |J_n(r + i, c + j)|    (5.4)
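Eqn. 5.4 averages the absolute response of a filtered image J_n over the 7 x 7 window centred on each pixel. A sketch of this computation over plain arrays is given below; border pixels for which the full window is unavailable are simply skipped here.

// Texture energy of Eqn. 5.4: mean absolute value of the filtered image jn over a 7x7 window.
public static double[][] textureEnergy(double[][] jn) {
    int h = jn.length, w = jn[0].length;
    double[][] s = new double[h][w];
    for (int r = 3; r < h - 3; r++)
        for (int c = 3; c < w - 3; c++) {
            double sum = 0.0;
            for (int i = -3; i <= 3; i++)
                for (int j = -3; j <= 3; j++)
                    sum += Math.abs(jn[r + i][c + j]);
            s[r][c] = sum / 49.0;
        }
    return s;
}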
texture units (note that the texture units are labelled and ordered in 38
different ways). This method can be useful in discriminating different types
of textures and has been shown to produce similar results to that of the
co-occurrence matrix approach (Wang & He 1990a, Wang & He 1990b).
Fig. 5.9. Local binary patterns (LBP). (a) Original 3 x 3 neighbourhood. (b) 3 x 3 neighbourhood thresholded by the value of the centre pixel. (c) LBP weights. (d) The values in the thresholded neighbourhood (b) are multiplied by the LBP weights, (c), given to the corresponding pixels. The elements are then summed to form the value of this texture unit, LBP = 2 + 4 + 32 + 64 + 128 = 230. The contrast measure LBP/C = (5 + 4 + 7 + 4 + 3)/5 - (2 + 1 + 1)/3 = 3.27.
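A small sketch of the LBP and contrast calculations for a single 3 x 3 neighbourhood is shown below, written against a plain 3 x 3 array. The particular arrangement of the power-of-two weights is an assumption (the exact layout of Fig. 5.9(c) is not reproduced here); any fixed ordering of the eight weights yields a valid LBP code, and neighbours greater than or equal to the centre are taken as 1, as in the example above.

// LBP code for one 3x3 neighbourhood n, where n[1][1] is the centre pixel.
public static int lbp(int[][] n) {
    int[][] weight = { {1, 2, 4}, {8, 0, 16}, {32, 64, 128} };   // assumed weight layout
    int centre = n[1][1], code = 0;
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++)
            if (!(r == 1 && c == 1) && n[r][c] >= centre)
                code += weight[r][c];
    return code;
}

// LBP/C contrast: mean of the neighbours at or above the centre minus the mean of those below.
public static double lbpContrast(int[][] n) {
    int centre = n[1][1];
    double above = 0, below = 0;
    int na = 0, nb = 0;
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++) {
            if (r == 1 && c == 1) continue;
            if (n[r][c] >= centre) { above += n[r][c]; na++; }
            else { below += n[r][c]; nb++; }
        }
    return (na > 0 ? above / na : 0) - (nb > 0 ? below / nb : 0);
}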
When Markov Random Fields (MRFs) are used to model texture (Dubes & Jain 1989), the texture is considered a stochastic and stationary process. This model implies that the probability of a pixel taking a given grey value is only dependent on the grey levels of the pixels in its neighbourhood. The model can be fully described by a small, compact set of parameters. The applications of MRFs in both the classification and synthesis of textures are numerous and the overall performances achieved can be impressive. As well as being theoretically elegant, this model performs very well in classification and seems to be a far superior technique when compared to the co-occurrence approach. Its features are invariant to image rotation by k2π and are capable of recognising scaled versions of an image as belonging to one class. Limitations of MRFs include their sensitivity to grey level distortions and the fact that they only account for local interactions between the pixels. Additional disadvantages of this technique include its computational expense and the fact that it cannot model all textures (e.g. due to the nature of this model, it cannot model directional textures).
in a significant correlation between texture features. As well as being computationally expensive, the selection of features is not always a straightforward task.
5.19 Conclusion
There is a growing belief within the machine vision community, as discussed
in Pietikainen & Kauppinen (1999), that a global solution to the texture analysis problem is no longer a suitable research goal. Many of the successes in
this field have been, and are likely to continue to be, application driven. While model-based techniques were dominant in the 1980's, they are not considered practical for many machine vision applications. Currently, filter-based approaches such as the LBP (Sec. 5.14) are viewed as better options, as they
are easily implemented and benefit from a psycho-physical understanding of
texture. Where possible, it is recommended that a supervised approach be
used as it greatly simplifies the texture analysis task. Unfortunately, while
many of the newer approaches work well under laboratory test conditions
they fail to impress when applied to practical applications. Therefore many
researchers in texture analysis take the view that it should only be applied
as a last resort in the analysis of images, as it has to contend with significant
variables in the scene, such as luminance.
The colour images that appear in this section are reproduced in grey scale. Please refer to the book web site (http://www.eeng.dcu.ie/~javamv/) for full colour versions of these images.
that of the eye. It should be realised that almost all colour cameras used in
machine vision systems have been designed to have this characteristic.
The traditional colour camera splits the incoming images into three optical
paths (or channels) by means of mirrors and beam-splitters. These are then
passed through Red, Green and Blue optical filters before being focused on
to three separate solid-state sensors (generally CCD sensors, although other
sensor types are possible). Alternatively good use can be made of the precise
geometry of the photo-sensor array; by placing minute patches of optical dye
in front of the sensor to produce a cheaper single-chip colour CCD camera
(Fig 6.1). In many three-chip colour cameras, the output consists of three
parallel lines, called R, G and B, each of which carries a standard monochrome
video signal. Another standard, called composite video, exists in which the
colour and intensity information is coded by modulating a high-frequency
carrier signal.
Fig. 6.1. Colour cameras. (a) Three-sensor colour camera. (b) Colour mask structure for a single chip solid-state colour camera. Each shaded rectangle represents
a photo detector with a coloured mask placed in front of it (e.g. R implies that it
passes red).
The transmitted signals derived from a colour camera are formed by encoding the RGB picture information into two parts: luminance, which carries
the brightness information and chrominance, which carries the colour information. The latter conveys both hue and saturation information (Sec. 6.3).
While the method of coding colour information is different depending on the
broadcast system (see below), it is important to note that all of the colour
information is carried in the chrominance signal.
NTSC: National Television System Committee. This was the first colour TV
broadcast system standard and was implemented in the United States in
1953. NTSC runs on 525 lines/frame and is used in the USA and parts
of Japan.
PAL: Phase Alternating Line. This was introduced in the early 1960's and
is commonly used throughout Europe (with the exception of France).
It utilises a wider bandwidth channel than NTSC, allowing for a better
picture quality. PAL runs on 625 lines/frame.
SECAM: Sequential Couleur Avec Memoire (Sequential Colour with Memory). This standard was also introduced in the early 1960's and is implemented in France. SECAM uses the same bandwidth as PAL, but unlike
PAL it transmits the colour information sequentially. SECAM runs on
625 lines/frame.
The key to many machine vision based colour imaging tasks is the choice
of the colour representation (or colour space) in which to conduct the image
analysis. The concept of a colour space refers to a Cartesian space in which
the visual sensation of colour can be uniquely defined by a set of numbers
representing chromatic features. A wide range of different colour spaces exist,
a number of which are now discussed. Throughout this discussion we will
highlight the advantages / disadvantages and potential application areas of
the various approaches to colour imaging.
Fig. 6.2. Sample pizza image and its associated RGB components. The colour
images are displayed in grey scale. (a) Original image. (b) Red component (i.e. red
mapped to white). (c) Green component. (d) Blue component.
The RGB space is frequently converted into another colour space (usually
by means of a non-linear transformation) for an easier and more appropriate
colour processing. A number of such transforms will be discussed throughout this chapter. The following program segment illustrates the process of
converting a colour image to its RGB representation:
Image input = args.getImage(0);
DataBlock rets = new DataBlock();
rets.add(new ImageContainer(input,ImageContainer.RED));
rets.add(new ImageContainer(input,ImageContainer.GREEN));
rets.add(new ImageContainer(input,ImageContainer.BLUE));

The reverse process, creating a colour image from its red, green and blue planes, is illustrated below:

int r,g,b;
int width = red.getWidth();
int height = red.getHeight();
int RGBImageArray[] = new int[width*height];
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
        r = red.getxy(x,y);
        g = green.getxy(x,y);
        b = blue.getxy(x,y);
        // pack the planes into a single ARGB pixel for MemoryImageSource
        RGBImageArray[x+(y*width)] = (255<<24)|(r<<16)|(g<<8)|b;
    }
Image output = Toolkit.getDefaultToolkit().createImage(new
    MemoryImageSource(width,height,RGBImageArray,0,width));
Consider Fig. 6.3(a). The RGB space, defined by 0 ≤ R, G, B ≤ W where W is a constant for all three signal channels, shows the allowed range of variation of the point (R, G, B). The colour triangle, also called the Maxwell triangle, is defined as the intersection of that plane which passes through the points (W, 0, 0), (0, W, 0) and (0, 0, W), within the colour cube. The orientation of
that line joining the point (R, G, B) to the origin can be measured by two
angles. An alternative and much more convenient method is to define the
orientation of this line by specifying where the vector (R, G, B) intersects
the colour triangle (the (R, G, B) vector is extended if necessary). Then, by
specifying two parameters (i.e. defining the position of a point in the colour
triangle) the orientation of the (R, G, B) vector can be fixed. It has been
found experimentally that all points lying along a given straight line are
associated with the same sensation of colour in a human, except when very
close to the origin (i.e. very dark scenes) where there is a loss of perception
of colour. Hence, all (R, G, B) vectors which project to the same point on the
colour triangle are associated with the same colour name, as it is assigned by
a given person (at a given time, under defined lighting conditions). Moreover,
points in the colour triangle that are very close together are very likely to be
associated with the same colour label.
The colour triangle allows the use of a graphical representation of the
distribution of colours, showing them in terms of the spatial distribution of
blobs in an image. This is a particularly valuable aid to understanding the nature of, and inter-relationship between, colours. It also permits the use
of the binary and grey scale image processing techniques outlined earlier.
This allows us to perform such operations as generalising colours, merging
colours, performing fine discrimination between similar colours, simplifying
the boundaries defining the recognition limits of specific colours, etc. Since
the colours perceived by a human can be related to the orientation of the
(R, G, B) colour vector, a narrow cone, with its apex at the origin, can be
drawn to define the limits of a given colour (e.g. "yellow"). The intersection
of the cone with the colour triangle generates a blob-like figure.
Fig. 6.3. RGB space. (a) RGB space and the colour triangle. Points that are
close on the colour triangle are likely to have the same colour label. (b) Hue and
saturation plotted in the colour triangle. The hue is the angular position of a point
relative to the centre of the colour triangle. The degree of saturation is measured
by the distance from the centre. Intensity is not represented.
while vivid cyan is mapped to a paler (i.e. less saturated) tone. In relative
terms, these are minor difficulties and the colour triangle remains one of the
most useful concepts for colour vision. Fully saturated mixtures of two primary colours are found on the outer edges of the triangle, whereas the centre
of the triangle, where all three primary components are balanced, represents
white. The centre of the colour triangle is referred to as the white point, since
it represents neutral (i.e. non-coloured) tones. Other, unsaturated colours are
represented as points elsewhere within the triangle. The hue is represented
as the angular position of a point relative to the centre of the colour triangle,
while the degree of saturation is measured by the distance from the centre.
can tell us a great deal about how the red and blue signals are related to one
another and therefore provide a useful aid in understanding the types of colours
in the image.
contain colours from other parts of the spectrum. Weak or pastel colours (S ≈ 0) have little saturation.
Eqns. 6.1 to 6.3 relate the HSI parameters to the RGB representation
(Gonzalez & Wintz 1987). For convenience, it is assumed that the RGB components have been normalised (W = 1).
H = arccos{ [(R - G) + (R - B)] / [2 √((R - G)² + (R - B)(G - B))] }    (6.1)

S = 1 - 3 × min(R, G, B)/(R + G + B)    (6.2)

I = (R + G + B)/3    (6.3)
The following program segment illustrates the process of converting a
colour image to its HSI representation:
double r,g,b,h,s,i,intensity;
Image input = args.getImage(0);
ImageContainer R = new ImageContainer(input,ImageContainer.RED);
ImageContainer G = new ImageContainer(input,ImageContainer.GREEN);
ImageContainer B = new ImageContainer(input,ImageContainer.BLUE);
ImageContainer H = new ImageContainer(width,height);
ImageContainer S = new ImageContainer(width,height);
ImageContainer I = new ImageContainer(width,height);

// HUE
double num;
double den;
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
        r = (double)R.getxy(x,y);
        g = (double)G.getxy(x,y);
        b = (double)B.getxy(x,y);
        intensity = (r+g+b)/3;
        if((r==g)&&(r==b))
            h = 255;
        else
        {
            num = (r-g)+(r-b);
            den = 2*Math.sqrt(((r-g)*(r-g))+((r-b)*(g-b)));
            h = Math.acos(num/den);
            if(h<=0) h = 2*Math.PI + h;
            // scale the hue from radians into the 0-255 grey scale range
            // (assumed here, for consistency with the HSI to RGB code below)
            h = (h/(2*Math.PI))*255;
        }
        H.setxy(x,y,(int)h);
    }

// SATURATION
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
        r = (double)R.getxy(x,y);
        g = (double)G.getxy(x,y);
        b = (double)B.getxy(x,y);
        if((r==g)&&(r==b))
            s = 0;
        else
            s = (1-((3*Math.min(Math.min(r,g),b))/(r+g+b)))*255;
        S.setxy(x,y,(int)s);
    }

// INTENSITY
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
        r = (double)R.getxy(x,y);
        g = (double)G.getxy(x,y);
        b = (double)B.getxy(x,y);
        i = (r+g+b)/3;
        I.setxy(x,y,(int)i);
    }

DataBlock rets = new DataBlock();
rets.add(H);
rets.add(S);
rets.add(I);
The process of creating a colour image from a HSI representation is illustrated below:
double r=0,g=0,b=0,h,s,i;
int red,green,blue;
int width = H.getWidth();
int height = H.getHeight();
int RGBImageArray[] = new int[width*height];
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
        h = ((double)H.getxy(x,y))/255;
        s = ((double)S.getxy(x,y))/255;
        i = ((double)I.getxy(x,y));
        h = h*2*Math.PI;

        // Region 1: 0 < h <= 2*PI/3
        if((h > 0) && (h <= 2*Math.PI/3))
        {
            b = (1-s)/3;
            r = (1+(s*Math.cos(h)/Math.cos(Math.PI/3-h)))/3;
            g = 1-(r+b);
        }
        // Region 2: 2*PI/3 < h <= 4*PI/3
        if((h > 2*Math.PI/3) && (h <= 4*Math.PI/3))
        {
            h = h-(2*Math.PI/3);
            r = (1-s)/3;
            g = (1+(s*Math.cos(h)/Math.cos(Math.PI/3-h)))/3;
            b = 1-(r+g);
        }
        // Region 3: 4*PI/3 < h <= 2*PI
        if((h > 4*Math.PI/3) && (h <= 2*Math.PI))
        {
            h = h-(4*Math.PI/3);
            g = (1-s)/3;
            b = (1+(s*Math.cos(h)/Math.cos(Math.PI/3-h)))/3;
            r = 1-(g+b);
        }
        // the r, g and b values (range 0-1) would then be rescaled using the
        // intensity i and packed into RGBImageArray (omitted here)
    }
Standard grey scale operators can then be applied to the HSI colour channels. A disadvantage of the HSI colour space is the so-called hue singularity: with reference to Eqn. 6.1 it can be seen that when R = G = B (and in particular for R = G = B = 0) the hue is undefined.
Fig. 6.6 illustrates the use of RGB and HSI colour space in the calculation of the approximate red:green pepper ratio for a pizza image. This
involves splitting the colour image into its constituent RGB (Fig. 6.2) and
HSI (Fig. 6.5) colour channels. The channels are then processed as grey scale
images in order to extract the relevant regions.
Fig. 6.5. Sample pizza image (as illustrated in Fig. 6.2(a)) and its associated HSI
components. (a) Hue component. (b) Saturation component. (c) Intensity component. Note that some saturation has occurred in these images.
Fig. 6.6. Calculation of the approximate red:green pepper ratio for a pizza section.
(a) Red peppers highlighted. (b) Green peppers highlighted. (c) Associated visual
workspace.
(i, j).
5. Increment the intensity value stored at point (i, j) in the current image.
(Hard limiting occurs if we try to increase the intensity beyond 255.)
6. Repeat step 5 for all (i, j) in the input scene.
It is clear that dense clusters in the colour scattergram are associated with
large regions of nearly constant colour. A large diffuse cluster usually signifies
the fact that there is a wide variation of colours in the scene being viewed,
often with colours "melting" into one another. On the other hand, step-wise
colour changes are typically associated with small clusters that are distinct
from one another. The colour scattergram is a useful method of characterising
colour images and hence has a central role in programming the colour filters
described in Sec. 6.12.
RG = R - G
BY = 2B - R - G    (6.4)
WB = R + G + B
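A one-pixel sketch of this conversion is given below; it assumes the red-green, blue-yellow and white-black definitions reconstructed in Eqn. 6.4 above, and in practice the signed RG and BY values would be rescaled into the 0-255 range before being displayed as the components shown in Fig. 6.7.

// Opponent-process conversion for a single pixel (assumed definitions of Eqn. 6.4).
public static int[] opponent(int r, int g, int b) {
    int rg = r - g;             // red-green component
    int by = 2 * b - r - g;     // blue-yellow component
    int wb = r + g + b;         // white-black (achromatic) component
    return new int[]{rg, by, wb};
}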
Fig. 6.7. Sample pizza image (as illustrated in Fig. 6.2(a)) and its associated
Opponent Process components. (a) Red-Green component. (b) Blue-Yellow component. (c) White-Black component. Note that some saturation has occurred in
these images.
Y = 0.299R + 0.587G + 0.114B
I = 0.596R - 0.275G - 0.321B    (6.5)
Q = 0.212R - 0.523G + 0.311B
ImageContainer Y = new ImageContainer(width,height);
ImageContainer I = new ImageContainer(width,height);
ImageContainer Q = new ImageContainer(width,height);
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
        r = input.getxy(RGBImage.RED,x,y);
        g = input.getxy(RGBImage.GREEN,x,y);
        b = input.getxy(RGBImage.BLUE,x,y);
        YO = (int)(.299*r + .587*g + .114*b);
        IO = (int)(.596*r - .275*g - .321*b);
        QO = (int)(.212*r - .523*g + .311*b);
        Y.setxy(x,y,(int)YO);
        I.setxy(x,y,(int)IO);
        Q.setxy(x,y,(int)QO);
    }
The process of creating a colour image from a YIQ representation is illustrated below:

double red=0,green=0,blue=0,yin,iin,qin;
int width = I.getWidth();
int height = I.getHeight();
RGBImage output = new RGBImage(width,height);
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
        yin = ((double)Y.getxy(x,y));
        iin = ((double)I.getxy(x,y));
        qin = ((double)Q.getxy(x,y));
        red   = (1*yin+.956*iin+.621*qin);
        green = (1*yin-.272*iin-.647*qin);
        blue  = (1*yin-1.105*iin+1.702*qin);
        output.setxy(RGBImage.RED,x,y,(int)red);
        output.setxy(RGBImage.GREEN,x,y,(int)green);
        output.setxy(RGBImage.BLUE,x,y,(int)blue);
    }
Y = 0.299R + 0.587G + 0.114B
U = -0.147R - 0.289G + 0.437B    (6.6)
V = 0.615R - 0.515G - 0.100B
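A one-pixel sketch mirroring the YIQ segment above, using the coefficients of Eqn. 6.6, is shown below; U and V are signed and would normally be offset or rescaled before being stored in 8-bit image planes.

// RGB to YUV conversion for a single pixel (coefficients of Eqn. 6.6).
public static double[] rgbToYuv(int r, int g, int b) {
    double yy = 0.299 * r + 0.587 * g + 0.114 * b;
    double u = -0.147 * r - 0.289 * g + 0.437 * b;
    double v = 0.615 * r - 0.515 * g - 0.100 * b;
    return new double[]{yy, u, v};
}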
Fig. 6.8. CIE (1931) chromaticity diagram. Colour representation in the x-y plane (see Sec. 6.8).
Over the years a number of additions have been made to the basic chromaticity diagram. MacAdam's ellipses are defined as ellipses within the CIE chromaticity diagram for which colours inside these elliptical regions are classed as indistinguishable. Colours lying just outside are referred to as just noticeably different (JND). The CIE Uniform Chromaticity Scale (UCS) transforms the elliptical contours with large eccentricity (major to minor axis ratios of up to 20:1) to circles.
is that the result of an additive mixing between two colours will always lie on the line that connects the two colours in the x-y plane. A key problem with this approach is the non-uniformity of the CIE x-y plane.
x = X/(X + Y + Z)    (6.7)
y = Y/(X + Y + Z)    (6.8)
z = Z/(X + Y + Z)    (6.9)
X = 0.431R + 0.342G + 0.178B    (6.10)
Y = 0.222R + 0.707G + 0.071B    (6.11)
Z = 0.020R + 0.130G + 0.939B    (6.12)
int r,g,b;
double XOUT=0,YOUT=0,ZOUT=0;
for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
        r = input.getxy(RGBImage.RED,x,y);
        g = input.getxy(RGBImage.GREEN,x,y);
        b = input.getxy(RGBImage.BLUE,x,y);
        /** X Component **/
        XOUT = (.431*r + .342*g + .178*b);
        X.setxy(x,y,(int)XOUT);
        /** Y Component **/
        YOUT = (.222*r + .707*g + .071*b);
        Y.setxy(x,y,(int)YOUT);
        /** Z Component **/
        ZOUT = (.020*r + .130*g + .939*b);
        Z.setxy(x,y,(int)ZOUT);
    }
xin = ((double)X.getxy(x,y));
yin = ((double)Y.getxy(x,y));
zin = ((double)Z.getxy(x,y));
red   = (3.063*xin-1.393*yin-.476*zin);
green = (-.969*xin+1.876*yin+.042*zin);
blue  = (.068*xin-.229*yin+1.069*zin);
output.setxy(RGBImage.RED,x,y,(int)red);
output.setxy(RGBImage.GREEN,x,y,(int)green);
output.setxy(RGBImage.BLUE,x,y,(int)blue);
}
L* = 116 × (Y/Yn)^(1/3) - 16 = Luminance    (6.13)

u* = 13 × L*(u′ - u′n)    (6.14)

v* = 13 × L*(v′ - v′n)    (6.15)

u′ = 4X / (X + 15Y + 3Z)    (6.16)

v′ = 9Y / (X + 15Y + 3Z)    (6.17)

where (Xn, Yn, Zn) is the reference white and u′n, v′n are the corresponding values of u′ and v′.

L* = 116 × (Y/Yn)^(1/3) - 16,  for Y/Yn > 0.008856    (6.18)

L* = 903.3 × (Y/Yn),  for Y/Yn ≤ 0.008856    (6.19)

a* = 500 × [f(X/Xn) - f(Y/Yn)]    (6.20)

b* = 200 × [f(Y/Yn) - f(Z/Zn)]    (6.21)
http://white.Stanford.EDU/html/xmei/
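A compact sketch of an XYZ to L*a*b* conversion following Eqns. 6.18 to 6.21 is given below. The 0.008856 threshold and the 7.787t + 16/116 low-intensity branch of f(·) are the usual CIE definitions and are assumed here; the reference white (Xn, Yn, Zn) must be supplied by the caller.

// XYZ to CIE L*a*b* following Eqns. 6.18-6.21 (reference white xn, yn, zn supplied by caller).
public static double[] xyzToLab(double x, double y, double z,
                                double xn, double yn, double zn) {
    double fx = f(x / xn), fy = f(y / yn), fz = f(z / zn);
    double l = (y / yn > 0.008856) ? 116.0 * fy - 16.0 : 903.3 * (y / yn);
    double a = 500.0 * (fx - fy);
    double b = 200.0 * (fy - fz);
    return new double[]{l, a, b};
}

// The usual CIE helper function: cube root above the threshold, linear below it.
private static double f(double t) {
    return (t > 0.008856) ? Math.cbrt(t) : 7.787 * t + 16.0 / 116.0;
}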
O1 = 0.279X + 0.72Y - 0.107Z = Brightness    (6.22)

O2 = -0.449X + 0.29Y - 0.077Z = Red-green component    (6.23)

O3 = 0.086X - 0.59Y - 0.501Z = Yellow-blue component    (6.24)
Fig. 6.9. The S-CIELAB calculation: the image's XYZ representation is converted to an opponent representation, spatially filtered (spatial separation filtering), and then passed to the standard CIELAB calculation.
Mirmehdi & Petrou (2000) propose a mechanism for segmenting colour textures based on the use of the S-CIELAB colour space in conjunction with a
multi-scale (or multi-distance) tower of images. With the multi-scale tower, no
subsampling occurs, therefore maintaining the same number of pixels (Petrou
Table 6.1. Filter parameters used to generate the multi-scale tower.

Plane         Weights (wi)               Spreads (σi)
Luminance     0.921, 0.105, -0.108       0.0283, 0.133, 4.336
Red-Green     0.531, 0.330               0.0392, 0.494
Yellow-Blue   0.488, 0.371               0.0536, 0.386
et al. 1998). The tower is generated using the filter parameters given in Table 6.1 by viewing the same coloured object at incremental distances. The
aim is to imitate the blurring a human would experience at the corresponding
distance. Fig. 6.10 illustrates a colour image and its associated S-CIELAB
smoothed images at varying distances.
Fig. 6.11 illustrates the same image, but this time Gaussian smoothing is
employed. Mirmehdi & Petrou (2000) claim that the set of images based on
S-CIELAB smoothing provides a more realistic representation of the blurring
of a coloured object at varying distances.
U = (R - G) / (√2 × (R + G + B))    (6.26)
Fig. 6.10. Multi-scale tower for S-CIELAB smoothing. Colour images are represented in grey scale. Images (a) to (d) are the resultant images for distance increments of 10 inches.
V = (2B - R - G) / (√6 × (R + G + B))    (6.27)
To see how these equations can be derived, view the colour triangle normally (i.e. along the line QPO, the diagonal of the colour cube). When the vector (R, 0, 0) is projected onto the colour triangle, the resultant is a vector Vr of length R·√(2/3) parallel with the R′ axis. In a similar way, when the vector (0, G, 0) is projected onto the colour triangle, the result is a vector Vg of length G·√(2/3) parallel to the G′ axis. Finally, the vector (0, 0, B) projected onto the colour triangle forms a vector Vb of length B·√(2/3) parallel to the B′ axis. A given colour observation (R, G, B) can therefore be represented by the vector sum (Vr + Vg + Vb). Finally, U and V can be calculated simply by resolving Vr, Vg and Vb along these axes. Let us now consider the mapping function Γ given by:

Γ((R, G, B)) = ( (R - G)/(√2 × (R + G + B)), (2B - R - G)/(√6 × (R + G + B)) )    (6.28)
Fig. 6.11. Multi-scale tower for Gaussian smoothing. Colour images are represented
in grey scale. Images (a) to (d) are the resultant images for distance increments of
10 inches.
Γ projects a general point X = (R, G, B) within the colour cube onto the colour triangle. Clearly, there are many values of the colour vector (R, G, B) which give identical values for Γ((R, G, B)). The set of points within the colour cube which give a constant value for Γ((R, G, B)) all lie along a straight line passing through the origin in RGB space (O in Fig. 6.12). This set is denoted by Φ(U, V), where ∀X : X ∈ Φ(U, V), Γ(X) = (U, V). The colour scattergram is simply an image in which the "intensity" at a point (U, V) is given by the number of members in the set Φ(U, V).
The details of the process of programming the PCF are as follows (see
Figs. 6.13 and 6.14):
1. Project all RGB vectors onto the colour triangle (which contains the colour scattergram); a sketch of this step is given after this list.
2. Process the colour scattergram, to form a synthetic image, S. (Image S
is not a "picture of" anything. It is merely a convenient representation
of the distribution of colours within the input).
B' & V ~
__ ----- --t'
Fig. 6.12. The relationship between the RGB- and UV-coordinate axes. The vectors R', G' and B' all lie in the UV-plane, which also contains the colour triangle.
3. Project each point, Y, in image S back into the colour cube. This process
is called Back-projection. Every vector X within the colour cube, that
shares the same values of hue and saturation as Y, is assigned a number
equal to the intensity at Y. The values stored within the look-up table
are obtained by back-projecting each point within the colour triangle
through the colour cube. Any points not assigned a value by this rule
are given the default value 0 (black). See Batchelor & Whelan (1997) for
further details.
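The sketch below illustrates step 1: each (R, G, B) pixel is projected to (U, V) using Eqn. 6.28 and the colour scattergram is accumulated as a two-dimensional histogram. It is written against plain arrays; the scaling of (U, V) into the histogram bins is an implementation choice made here, not part of the PCF definition.

// Accumulate the colour scattergram S (n x n bins) from the R, G and B planes (Eqn. 6.28).
public static int[][] scattergram(int[][] r, int[][] g, int[][] b, int n) {
    int[][] s = new int[n][n];
    int h = r.length, w = r[0].length;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double sum = r[y][x] + g[y][x] + b[y][x];
            if (sum == 0) continue;                       // very dark pixel: colour undefined
            double u = (r[y][x] - g[y][x]) / (Math.sqrt(2.0) * sum);
            double v = (2.0 * b[y][x] - r[y][x] - g[y][x]) / (Math.sqrt(6.0) * sum);
            // u and v are bounded (|u| <= 1/sqrt(2), -1/sqrt(6) <= v <= 2/sqrt(6));
            // map them into the bin grid and clamp
            int bu = (int) ((u + 0.75) / 1.5 * (n - 1));
            int bv = (int) ((v + 0.75) / 1.5 * (n - 1));
            s[Math.max(0, Math.min(n - 1, bv))][Math.max(0, Math.min(n - 1, bu))]++;
        }
    return s;
}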
Fig. 6.13. Programming a colour filter. (a) The first step is to compute the colour
scattergram. The points lying along OZ are counted. This defines the value at Y
in the colour scattergram. (b) The colour scattergram is presented to the user in
the form of a grey scale image in which intensity indicates how many pixels were
found of each colour. (c) The second step is to process the colour scattergram. The
steps represented diagrammatically here are thresholding and blob shading.
When a PCF that has been trained on one scene is reapplied to the same
scene, it is often found that the filter output is noisy; some pixels in what
appears by eye to be a region of uniform colour are mapped to black. On a live
video picture, some pixels are seen to scintillate. There are several possible
reasons for this:
Pixels which generate points in the colour scattergram close to the edge of
the main cluster, will sometimes be displayed as black. Camera noise adds
an unavoidable jitter in the values generated at the RGB signal outputs.
Fig. 6.14. PCF back-projection. (a) The colour triangle is shown here within the
colour cube. The process of programming the colour filter is to "back-project" each
point, Y, through the colour triangle, so that each point lying along the line OY
is given the same value as Y. The blobs shown in Fig. 6.13(c) each contain many
points of the same intensity. The effect of "projecting" a blob through the colour
cube is to set all points lying within a cone to the same intensity. (b) Two intensity
limits (L1 and L2) are specified by the user. (L2 may well be set to 255, in which case it has no practical effect). When the LUT contents are being computed, points
within the colour triangle are not "back-projected" into either of the two corners,
shown shaded here.
Fig. 6.15. Isolating a single colour using the PCF. Colour images displayed in grey
scale. (a) Analysing the image of a small electronics component (black with shiny
silver printing) in a transparent plastic bag. The bag has red printing on it and
is resting on a white background. (Unprocessed video image). (b) Output of the
colour filter. (c) A grey scale image obtained by thresholding (b).
Recall that hard upper and lower intensity limits are defined during training of the PCF. Some very dark pixels, below the lower intensity limit
will be shaded black, even though they are of the particular tone that the
PCF is supposed to recognise. A similar situation holds around the upper
intensity limit.
Specular reflection on wrinkles on the surface of the object being viewed is
a prime cause of noise-like effects. The very high intensities produced by
glinting cause black spots to appear in the PCF output image.
The colour scattergram often consists of a compact cluster, with a diffuse "corona". Outlier points are often specifically excluded, by generating
a small blob which covers only the dense centre of the cluster, prior to
creating the filter.
Colour edges produce outlier points in the colour scattergram. As a result,
sharp colour edges may become jagged, in the PCF output image.
6.12.4 Colour Generalisation
a result the PCF output will consist of a noisy image in which large regions
are not recognised properly, even though they may be identical to areas in
the scene used during training of the PCF.
One approach to colour generalisation consists of adjusting the sizes of the
blobs created by thresholding the colour scattergram. It will be necessary to
do this in such a way that blobs which were distinct when the (multi-cluster)
scattergram was first thresholded, remain separate.
In a typical application, a number of coloured scenes are used to design the PCF. As each scene is being viewed, a scattergram is generated in
the colour triangle. Noise is then removed from the scattergram, using common image processing operations, such as low-pass filtering. This is followed
by thresholding. If the input scene consists of a single colour, thresholding the scattergram after clean-up creates a single blob Bi. This process is repeated for each colour we wish to use to design the PCF. As each new blob (Bi, i = 1, ..., n) is generated it is superimposed on the colour triangle.
Therefore, prior to generalisation the colour triangle consists of a number of
blob regions, each of which corresponds to one of the trained colours. The
aim of the generalisation procedures is to expand these regions, forming the
regions Gi, i = 1, ..., n. By projecting the Gi back onto the colour cube we
generate the contents of the PCF look-up table. If the Gi have been generated
appropriately, the resulting colour recognition process is more reliable than
it would have been if the smaller blobs Bi had been used instead (Fig. 6.17).
A number of simple colour generalisation procedures are outlined below:
1. Simple Dilation: Each blob Bi in the colour triangle is dilated.
2. Dilation with preservation of connectivity: A single layer of background
pixels is stripped from the (binary) image in the colour triangle. Unlike
the previous approach, pixels critical for connectivity are retained.
3. Morphological watershed: This approach involves finding the morphological watershed (as outlined in Sec. 4.4.2) for each of the blobs Bi.
4. Convex hull generalisation: The convex hull is drawn around the set of
blobs Bi in the colour triangle. It is reasonable to expect that points
within this convex hull will correspond to a generalisation of the observed
colours.
Procedures 1 to 3 are intended for use in those situations where colour
discrimination is required, whereas procedure 4 is more appropriate for those
occasions when simple colour recognition is needed. Analysis of colour scenes
is made much simpler if colour generalisation is used. For example:
Fig. 6.16. Colour scattergram displayed as a grey scale image, in which intensity
indicates the number of pixels with the same values of hue and saturation. (a)
Colour scattergram. (b) Threshold applied to the colour scattergram to consolidate
the colour clusters. Minor clusters and outlier points are normally ignored. (c)
Generalization of the blob regions indicated in (b).
Consider a colour scattergram in which there are six distinct and compact
clusters (Fig. 6.18). This form of scattergram is generated by polychromatic
scenes in which there are several regions, each containing nearly constant and
perceptually distinct colours. After applying thresholding and noise reduction
to the colour scattergram, there are several small blobs, which can be shaded using a blob labelling operator. Such a PCF will often be rather noisy.
In particular, some of the points in the original image are mapped to black,
incorrectly suggesting that they have not been seen beforehand. The reason is
that the small blobs do not cover all points in the colour scattergram. Notice
that the white blob at the 3 o'clock position straddles a sharp colour transition in the
pseudo colour-triangle. This is the reason that a (red) stripe in the input
image generates a noisy pattern in Fig. 6.18(b). If the blobs are enlarged and
a new PCF designed, the noise level is reduced. Blob enlargement can be
repeated provided that the blobs do not merge.
6.13 Conclusion
In this chapter we have only been able to touch on the general issues relating to colour image analysis for machine vision applications. For further
information on colour imaging and colour spaces, readers are referred to the
following web sites:
Fig. 6.17. Recognising a single colour using PCF generalization. Colour images
displayed in grey scale. (a) Original image. (b) Output of the colour filter. (c) Grey
scale image, obtained by thresholding (b). (d) Back-projection of the colour filter
in (c). (e) Colour generalization applied to (c). (f) Back-projection of the colour filter in (e).
Fig. 6.18. Application of the PCF to a polychromatic image. Colour images displayed in grey scale. (a) Coloured stripes from the packaging of a domestic product.
(Unprocessed video image.) (b) Output of the colour filter. (c) Colour scattergram,
superimposed on the pseudo colour-triangle. (d) A new pseudo colour-triangle was
created by flipping the pseudo colour-triangle about its vertical axis. A colour filter
was then generated in the usual way. Since there is no sharp transition across the
blob at 3 o'clock, the red stripe does not create a noisy pattern, as it did in (b).
Fig. 7.1. A screen shot of the NeatVision application, displaying some of the main application features.
data through the program. In general, data flow begins at the inputs to the
visual program and ends at the outputs. Although the data interconnections
are specified by the programmer, the order in which operations are executed
is defined by the availability of data. The concepts of branching, looping and
conditional processing are supported in the visual domain using special flow
control components.
The concept of visual programming can be illustrated by considering the
generalised polynomial given by Eqn. 7.1. This relationship can be defined in
terms of primitive operations which in this case are multiply, power and add.
y = ax³ + bx² + cx + d    (7.1)
A visual implementation of this relationship can be generated by specifying the inputs and outputs to the system, where the inputs are a, b, c, d,
x and the output is y. Following this, each of the required operators must
be instantiated and the data interdependencies must be specified by creating
interconnections between the operators. This defines the data flow through
the program. The resulting visual representation of this equation is presented
in Fig. 7.2.
One of the most important parts of any program is user interaction. This
involves getting input data into the program, executing, then displaying the
results. NeatVision provides input and output components for this purpose.
Each of these components has an associated window or user interface that can
be used for data specification and presentation respectively and can be viewed
by double clicking on the relevant component. System input components
have one output node which represents the data in the associated window.
System output components have one input node which updates the associated
window whenever it receives new data. The main data types supported by
input and output components are outlined in Table 7.1.
Fig. 7.3. A sample input component and the associated user interface.
Once data has been made available to the visual workspace using the input
components it can be processed (via the processing components). A processing component has both inputs and outputs and can be compared to a
function in a conventional programming language. Inputs are equivalent to
function arguments and outputs are equivalent to return values. Although a
processing component can have many input and output connections, it requires only one of each. Processing components can support mixed data types
and in some cases dynamic type conversion. There are currently over 200 data
processing components available in the NeatVision application (Appendix C).
The major groups of processing components are described in Table 7.2.
A visual program can range in complexity from three components upwards
and is limited only by the availability of free memory. A minimal program
must have an input, an operation and an output. Such a program is illustrated
in Fig. 7.4, in which Sobel edge detection is performed on a sample grey scale
image.
Table 7.1. The main data types supported by NeatVision input and output components: Image, Integer, Double, Boolean, Array and String.
Fig. 7.4. A sample processing component, using the Sobel edge detector.
The components which have been introduced so far can be used to develop
simple linear programs. In these programs, data will flow from the inputs to
the outputs and the data path will be the same, even if the input data changes.
Visual programs can be enhanced significantly by introducing dedicated flow
Table 7.2. The major groups of processing components available with the standard edition of NeatVision: Utilities, Arithmetic, Processing, Noise, Filter, Label, Transform, Colour, Morphology, Histogram, Maths, Low Level, String and JAI Colour.
Fig. 7.5. Input and output descriptions for all of the flow control components.
tern any number of times before being sent out the conventional component
output. The flow control components are presented in Table 7.3.
Data Splitting. Data splitting allows data at a certain node in a visual
program to be replicated several times. Splitting creates multiple reference
nodes for the same piece of data where each of these nodes can be used to
feed a new program branch. The data splitting component has three variants replicating nodes two, three and four times. Data splitting components
implement dynamic type conversion and will support all future data types
implemented by NeatVision. A sample visual program using a data splitting
component is presented in Fig. 7.6.
Table 7.3. The flow control components available with the standard edition of NeatVision: If, Else, For, Loop and Split.
Fig. 7.7. Examples of data looping components. (a) A feedback loop. (b) A For loop.
Fig. 7.8. Examples of conditional processing components. (a) If loop. (b) If/Else loop.
consisting of a combination of simple arithmetic operators add, subtract, multiply, divide, power and square root.
ax² + bx + c = 0    (7.2)

x = (-b ± √(b² - 4ac)) / 2a    (7.3)
Using visual equivalents for these operators a program representing Eqn. 7.3
can be created (Fig. 7.9). It should be noted that this visual program is a
simplified representation of the equation and will only solve for real values of
x.
Fig. 7.9. A visual program which solves the real roots of a second order polynomial
equation.
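For comparison, the same computation expressed in conventional Java, using only the primitive operations that appear in the visual program of Fig. 7.9, might look as follows; like the visual program, it handles only real roots.

// Real roots of ax^2 + bx + c = 0 (Eqn. 7.3); complex roots are not handled.
public static double[] realRoots(double a, double b, double c) {
    double discriminant = Math.pow(b, 2) - 4 * a * c;
    if (a == 0 || discriminant < 0) return new double[0];   // not quadratic, or no real roots
    double root = Math.sqrt(discriminant);
    return new double[]{(-b + root) / (2 * a), (-b - root) / (2 * a)};
}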
Each component in a visual program is in one of two states: STEADY STATE or WAITING TO PROCESS.
user interface which it then passes to the output connections for further
processing. Fig. 7.10 illustrates the data flow in a simple linear system.
After the visual program has been initialised, both of the image input
blocks are WAITING TO PROCESS. Each of the image input blocks reads
the image from the associated user interface and writes this data to the output connection. The data from IMAGE A, which returns to STEADY STATE,
is then sent to the LOWPASS which immediately signals WAITING TO PROCESS, and the data from IMAGE B, which also returns to STEADY STATE,
is sent to ADD. ADD does not signal WAITING TO PROCESS at this point
because it has only received data on one of two inputs. LOWPASS processes
its data and sends the results to the ADD component and then it goes into
STEADY STATE. When the ADD component receives data on the second
of the two inputs it signals WAITING TO PROCESS, processes the input
data, sends the resulting data to the outputs and then goes into STEADY
STATE. When INVERT receives the data it immediately signals WAITING
TO PROCESS, processes the input data, sends the resulting data to the
next component and then goes into STEADY STATE. The image output
block then receives the data goes into WAITING TO PROCESS, sends the
data to the associated user interface and then returns to STEADY STATE.
The visual program is now in steady state, with execution completed.
The aim of this simple example is to show that the data flow in a visual
program is handled by the application. Components will only become active,
or execute when they have received data on all connected inputs. As a result,
component development is straightforward and independent of the overall
visual program, as only a small programming overhead is required to interface
with the underlying system.
7.2.2 Standard Component Architecture
inputs = 2;
outputs = 2;
name = "COMPONENT\nNAME";
width = 30;
height = 20;
}
Fig. 7.11. The visual appearance of a component: input connectors on the left, output connectors on the right, a connection status indicator at each connector, and the component name displayed in the body of the component (whose size is set by the width and height fields).
The interface which has been created as a part of the component appearance must now be initialised. This involves setting the data type, mode and description for each of the input and output connections. The interface is always initialised within the setup() method. There are two methods required to initialise a connection. These are setConnectionType(), which defines the colour of the connection, and setConnectionDescription(), which defines the description of the connection. This description will appear in the status bar when the mouse floats over the associated connection status indicator. Valid arguments for the setConnectionType() method are listed in Table 7.5.
public class Invert extends CoreInterface {
    public Invert()
    {
        inputs = 1;
        outputs = 1;
        name = "INVERT";
        width = 30;
        height = 20;
    }
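A sketch of how the connection initialisation described above might be completed for this component is shown below. The method names follow the description in the text and Table 7.5, but their exact NeatVision signatures (and the way inputs and outputs are indexed) are assumptions, so this should be checked against the NeatVision documentation rather than taken as the library's actual API.

    // Hedged sketch only: the argument conventions of setConnectionType() and
    // setConnectionDescription() are assumed, not taken from the NeatVision API.
    public void setup()
    {
        setConnectionType(0, IMAGE);                  // single image input
        setConnectionDescription(0, "Input grey scale image");
        setConnectionType(1, IMAGE);                  // single image output
        setConnectionDescription(1, "Inverted output image");
    }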
Table 7.5. Valid arguments for the setConnectionType() method and their associated connection colours.

Type       Colour
IMAGE      Red
INTEGER    Green
DOUBLE     Blue
BOOLEAN    Orange
STRING     Pink
∇²g(x, y) = ∂²g(x, y)/∂x² + ∂²g(x, y)/∂y²    (7.4)

The Laplacian can be approximated by convolution with a small mask, for example the 8-connected mask

 1   1   1
 1  -8   1
 1   1   1

or the 4-connected mask (with a centre weight of -4) used in the examples below.
NeatVision Imaging. The following two examples describe how to implement a 4-connected Laplacian edge detector using NeatVision imaging. The NeatVision imaging API specification, consisting of the two classes GreyImage and RGBImage, is presented in Appendix B. NeatVision imaging is based almost entirely on Java AWT imaging. Both examples are taken from single input, single output components and utilise several of the more important methods provided by the GreyImage and RGBImage classes respectively.
The first create() method implements a Laplacian edge detector using the GreyImage class and can be described as follows: The object associated with the input connection is retrieved from the DataBlock as a GreyImage object, with the dimensions of this image obtained using the getWidth() and getHeight() methods. These dimensions are then used to create a new GreyImage object to store the output image. The output image data is then created by performing the convolution sum on each 3 x 3 4-connected neighbourhood in the input image. The output image is then returned from the create() method for further processing by the system.
public Object create(DataBlock arguments) {
    GrayImage imageIn = arguments.getGrayImage(0);
    // the image dimensions are named width/height here so that h is free
    // to be used for the neighbourhood pixel below
    int width = imageIn.getWidth();
    int height = imageIn.getHeight();
    GrayImage imageOut = new GrayImage(width,height);
    int b,d,e,f,h,z;
    //   +---+---+---+
    //   |   | b |   |
    //   +---+---+---+
    //   | d | e | f |
    //   +---+---+---+
    //   |   | h |   |
    //   +---+---+---+
    for(int y=0; y<height; y++)
        for(int x=0; x<width; x++)
        {
            b = imageIn.getxy(x,y-1);
            d = imageIn.getxy(x-1,y);
            e = imageIn.getxy(x,y);
            f = imageIn.getxy(x+1,y);
            h = imageIn.getxy(x,y+1);
            z = (-4*e)+b+d+f+h;
            if(z>255) z=255; if(z<0) z=0;
            imageOut.setxy(x,y,z);
        }
    return(imageOut);
}
The second create() method is based on the RGBImage class, a multi-plane version of the GreyImage class, consisting of a red, green and blue plane. The main difference between this code segment and the previous example is that the convolution sum is performed sequentially on each input image plane to generate the data for each output image plane. The individual image planes are accessed by adding an extra argument, which identifies the plane being addressed, to the getxy() and setxy() methods. In the example below the plane variable p is generated by adding a third for-loop to the standard raster scan pair. It should be noted that the processing time for this component is approximately three times that of the single plane GreyImage equivalent.
public Object create(DataBlock arguments) {
    RGBImage imageIn = arguments.getRGBImage(0);
    int width = imageIn.getWidth();
    int height = imageIn.getHeight();
    RGBImage imageOut = new RGBImage(width,height);
    int b,d,e,f,h,z;
    //   +---+---+---+
    //   |   | b |   |
    //   +---+---+---+
    //   | d | e | f |
    //   +---+---+---+
    //   |   | h |   |
    //   +---+---+---+
    // process the red, green and blue planes in turn (RED to BLUE inclusive)
    for(int p=RGBImage.RED; p<=RGBImage.BLUE; p++)
        for(int y=0; y<height; y++)
            for(int x=0; x<width; x++)
            {
                b = imageIn.getxy(p,x,y-1);
                d = imageIn.getxy(p,x-1,y);
                e = imageIn.getxy(p,x,y);
                f = imageIn.getxy(p,x+1,y);
                h = imageIn.getxy(p,x,y+1);
                z = (-4*e)+b+d+f+h;
                if(z>255) z=255; if(z<0) z=0;
                imageOut.setxy(p,x,y,z);
            }
    return(imageOut);
}
Java AWT Imaging. The Laplacian can also be implemented using Java
AWT imaging. Using this approach the image is retrieved from the input
as a java. awt. Image object, where the pixel information for this object
is not directly accessible. The image must be decomposed into an array of
pixel values before any low level processing can take place. Classes from
the java.awt and java.awt.image packages are required for this example,
consequently the following imports must be made.
import java.awt.*;
import java.awt.image.*;
The Image object is converted into an array of pixels using the PixelGrabber class. In order to construct a new PixelGrabber object we require the dimensions of the Image object. These are obtained using the getWidth() and getHeight() methods of the Image object. These dimensions are also used to create buffers for the input and output images. The pixel data associated with the input image is available after the grabPixels() method of the PixelGrabber has been called. At this point the 3 x 3, 4-connected convolution is performed and the resulting data is stored in the output image buffer. In order to simplify this example, the three image planes are combined to generate a grey scale image. This operation is performed using a method similar to avg() listed below.

private int avg(int pixel) {
    // get the average of the intensity planes (R+G+B)/3 ...
    return (((pixel>>16)&0xFF) + ((pixel>>8)&0xFF) + (pixel&0xFF)) / 3;
}
The resulting image data must be converted into an Image object before it can be returned from the create() method. This is achieved using the createImage() method of the Toolkit class and the MemoryImageSource class, which implements the ImageProducer interface. NeatVision imaging was developed as a user friendly interface to the functionality provided by Java AWT imaging. By comparing this example to the GreyImage example from the NeatVision imaging section it should be quite obvious which approach is more straightforward to use.
public Object create(DataBlock arguments) {
    // get the input Image and its dimensions (w x h), grab its pixels into
    // buffer[] using PixelGrabber, and create an output buffer of the same
    // size, as described above ...
    // then, for each pixel (x,y) away from the image border:
    int b = avg(buffer[x+(y-1)*w]);
    int d = avg(buffer[x-1+y*w]);
    int e = avg(buffer[x+y*w]);
    int f = avg(buffer[x+1+y*w]);
    int h = avg(buffer[x+(y+1)*w]);
    int z = (-4*e)+b+d+f+h;
    if(z>255) z=255; if(z<0) z=0;
    // store z in the output pixel buffer ...
    // create and return a new Image from the output pixels ...
Accessing Operations. In this example we describe how operations defined within other NeatVision components can be accessed. In this case the
Laplacian operation is being accessed. For more information about the range
of NeatVision operations available, please consult the NeatVision web site
(http://www.NeatVision.com/).
public Object create(DataBlock arguments) {
    GreyImage imageIn = arguments.getGreyImage(0);
    GreyImage imageOut = NeatVision.create("laplacian",imageIn,
                                           new Integer(4));
    return(imageOut);
}
Fig. 7.12. The main sections of the NeatVision user interface, the controls, the
component explorer, the status bar, the virtual desktop and the message window.
The visual program described here will implement simple arithmetic addition and requires only four components, two integer inputs, an arithmetic
addition component and an integer output. These components are all available from the component explorer and can be found under the system tab.
First, a blank visual workspace is required; this can be created by selecting New Workspace from the File menu. The new visual workspace, which is located within the virtual desktop, can now be populated with the required components.
As mentioned previously the required components are all available from
the system tab of the component explorer, the integer inputs can be found in
the Input folder, which is a subfolder of the Data folder, one of the main system folders. The integer output can be found in the Output folder, which is also a subfolder of the Data folder. Finally, the add component is located within the Basic Functions folder, a subfolder of the Math folder, which is located in the main system menu. All of these components can be added to the
newly created visual workspace by simply dragging them from the component
explorer to the desired location within the visual workspace. A component
can also be instantiated by double clicking on the desired component name
in the explorer.
The components of the visual program which are now present in the visual workspace must be combined together to create a valid program. The two
integer inputs must be connected to the input connectors of the add component. The output of the add component must then be connected to the
input of the integer output component. Connections are made by dragging
the output connector of one component to the input connector of another
component. If the connection is valid then the connection status indicator will go green; otherwise, in the case of an invalid connection, the connection status indicator will stay red. If the connection status indicator is orange then
a connection to the component is not necessarily required. In this case the
default value is used (the default value can be overwritten by connecting an
appropriate input). Information about a connection can be displayed by moving the mouse over the connection status indicator of the relevant connection.
The description of this connection, including the default value if available is
then displayed in the status bar. In order to delete a connection it must first be highlighted; this is done by clicking on the connection terminator. The
highlighted connection can then be deleted by pressing the delete key on the
keyboard. A component can be deleted in the same way.
Once the required connections have been made the visual program must
be compiled. The compiler can be invoked by selecting the Compile option
from the Project menu. After compilation any errors or warnings associated
with the visual workspace are displayed in the message window. The source
of any error message can be found by double clicking on the error message
itself. Once all the errors have been rectified the inputs to the visual program
can be initialised. This is performed by double clicking on all of the input
components, in this case the two integer inputs. The user interface associated
with the component will then appear and can be used to specify the input
value. Any value typed into the user interface associated with an integer input component will appear below the related component (this is particularly
useful for reference purposes).
The visual program has been defined and the inputs have been initialised.
It can now be executed by pressing the run button on the toolbar; this button can be identified by a single forward-pointing arrow. Once this button has been pressed the workspace should execute almost instantaneously. The result of the addition can be seen by double clicking on the integer output, causing
the user interface associated with the integer output to be displayed. The
result will also appear immediately below the integer output component in
the visual workspace, again useful for reference purposes.
This example has described how a simple visual program, based on integer arithmetic, can be created using NeatVision. The same procedure can be used to create more complex visual programs, based on the visual programming theory introduced in the first section of this chapter and using the wide range of image and general data processing algorithms provided by NeatVision.
7.3.2 Image Processing
NeatVision is primarily a machine vision application and as such it provides a range of image analysis tools. A simple image addition example is now presented that demonstrates how to import image data into a visual workspace and how to view the data that results from an image processing operation. As before, the components required for the visual program are located in the component explorer and dragged into a new visual workspace. The components required for this example are an image adder component, an image output and two image inputs. Once these components have been instantiated they are connected together to create the desired program.
The two image inputs must now be initialised. As previously, this is achieved by double clicking on the relevant input component. In the case of image inputs, a file dialog box will then appear; this is used to select a file from the file system. When the desired file has been selected, an image viewer window appears displaying the selected image (the image file formats supported by NeatVision are presented in Appendix A). The image input component is now initialised. In order to select a new image to be associated with this image input, a different procedure must be carried out. The image viewer associated with the relevant image input component must be selected, and then the Open Image menu item from the File menu is selected. This results in a file dialog box being displayed, which can then be used to select a new image.
This visual program is then compiled and any errors or warnings that occur must be corrected. Once this has been completed the program can be executed. The resulting image can then be displayed by double clicking on the image output component. The visual program is presented in Fig. 7.14.
The image viewer tool supports zooming and pseudo colouring; these are accessible using the Image menu, which appears in the menubar whenever an image viewer is selected. A host of image analysis tools can also be accessed from the Image menu, including the histogram, cumulative histogram, 3-D profile mesh viewer, and horizontal and vertical intensity scans.
The histogram and cumulative histogram can be launched using the Histogram submenu of the main Image menu. There are five buttons associated with a histogram window; these are used to select which colour planes are represented by the histogram viewer and whether anti-aliasing or smoothing is turned on or off. Any combination of the three colour planes can be displayed and, as a final option, the grey scale equivalent of these three planes can also be displayed.
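For reference, the underlying computation performed by these viewers is straightforward: the histogram counts how many pixels take each grey level and the cumulative histogram is its running sum. The following sketch illustrates this for a grey scale image; the accessor names getWidth, getHeight and getPixel are assumptions made purely for illustration and are not necessarily the actual GreyImage methods.

static int[] cumulativeHistogram(GreyImage image) {
    // Count the number of pixels at each grey level (0..255); the accessor
    // names used here are assumed for illustration only.
    int[] histogram = new int[256];
    for (int y = 0; y < image.getHeight(); y++)
        for (int x = 0; x < image.getWidth(); x++)
            histogram[image.getPixel(x, y)]++;

    // The cumulative histogram is the running sum of the histogram bins.
    int[] cumulative = new int[256];
    cumulative[0] = histogram[0];
    for (int i = 1; i < 256; i++)
        cumulative[i] = cumulative[i - 1] + histogram[i];
    return cumulative;
}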
The 3-D profile mesh viewer can be launched from the Probe submenu of the main Image menu. A rectangular area is selected by pressing the right mouse button over the top left corner of the desired area and then dragging until the highlighted rectangle encloses the entire area. This selection tool can only be used when the image is being viewed at its normal (1:1) resolution, i.e. when the zoom tool has not been used. Once the area has been selected the 3-D profile mesh can be launched. The resulting mesh can be manipulated using the options provided by the Probe menu, such as zoom in, zoom out, tilt forward, tilt backward, spin clockwise, spin anti-clockwise, increase intensity and decrease intensity. These options can also be accessed using the toolbar and the keyboard; the keystroke controls are listed in Table 7.7. Anti-aliasing can be used with the 3-D profile mesh viewer for smoother results; this option is turned off by default as it slows down system performance. A sample 3-D profile mesh viewer is presented in Fig. 7.16.
Fig. 7.15. An example of the histogram tool. (a) Source image. (b) Resulting
histogram. (c) Cumulative histogram.
Table 7.7. The keystroke controls for the 3-D profile mesh viewer.

Option                 Keystroke
Zoom In                +
Zoom Out
Tilt Forward
Tilt Backwards
Spin Clockwise
Spin Anti-clockwise
Increase Intensity
Decrease Intensity
The horizontal and vertical intensity scanners can also be launched from the Probe submenu of the main Image menu. These tools describe the intensity along horizontal and vertical scan lines which are highlighted in the original image. The location of these scan lines can be changed by dragging the mouse pointer to the desired point in the image. The intensity of the three colour planes can be displayed individually and, as with the histogram, the average of these three colour planes can also be displayed. Anti-aliasing is available for smoother results; however, it is turned off by default as it slows down system performance. The intensity scan user interfaces are presented in Fig. 7.17.
7.3.3 Other User Interfaces
So far only two user interfaces have been presented, the image viewer and the integer viewer. The final part of this section describes the remaining
Fig. 7.16. An example of the 3-D viewer. (a) Source image. (b) Resulting 3-D profile mesh.
Fig. 7.17. An example of the intensity scanner. (a) The source image. (b) Horizontal scan. (c) Vertical scan.
user interfaces provided by NeatVision. Some of these are associated with input and output components, while others are associated with processing components. Each of these user interfaces can be displayed by double clicking on the related component.
The remaining user interfaces associated with input and output components are String, Double, Boolean and Array. String consists of a scrollable text area which represents an ASCII string; the string data type is particularly useful for presenting the results of a visual program. Double consists of a single text field which represents a double value; the value entered into this text field will immediately appear below the related component. Boolean can be used to represent one of two values, true or false, where the desired
value can be selected by choosing the relevant check box. Finally, the Array interface represents an array of values.
Fig. 7.18. User interfaces associated with the input and output components.
There are four user interfaces associated with the processing components which are packaged with NeatVision. These components are region of interest and single, double and triple level threshold. Whenever one of these components executes, the user interface automatically appears. The user must then either select a region of interest or specify a threshold value and then click the button labelled OK. The component then reads the required information from the user interface and processes the input data. A selection of the processing component interfaces is presented in Fig. 7.19.
7.3.4 The Integrated Software Development Environment
In this section we present an example describing how to integrate a fully functional component into NeatVision. In order to add new functionality to NeatVision the developers plug-in is required; this can be obtained from the NeatVision web site.
The steps required to integrate a new component are as follows. Select the user tab from the component explorer and right click the mouse anywhere in the user area. From the resulting popup menu select the Add New Component option. Once this has been selected a dialog box appears. This dialog presents three options: NeatVision component, Java class and Java interface. The first option, NeatVision component, should be selected. The desired name for the component is then typed in the text field labelled Name. Once this has been completed the Component Development Wizard (CDW) can be launched by pressing the button labelled Next. The CDW allows the user to create the component skeleton.
Fig. 7.19. User interfaces available for use with the processing components. (a)
Region of interest selector. (b) Triple threshold.
Fig. 7.20. New component development. (a) Select Add New Component from the popup menu of the user area of the component explorer. (b) Select the NeatVision component option and specify the component name. (c) Define the component skeleton.
Help for a component can be accessed by right clicking on the relevant component either in the explorer or in the visual workspace and selecting View Help from the resulting popup menu.
The help viewer can be controlled using the buttons located in the main toolbar. These buttons allow the user to move forward and back one page at a time, stop the current page from loading and go to the home page. The URL of the current help page is displayed alongside the buttons.
Fig. 7.21. NeatVision code and its associated component. (a) The resulting code
generated by the component development wizard plus details of the Laplacian edge
detector to be implemented by the component. (b) The Laplacian component being
used in a sample visual program.
Only one instance of the help viewer can be active at any one time. If the user tries to access help when the viewer is already open, the current page is unloaded and the new page is loaded in its place.
int radius = 30, colour = 255;     // circle radius and grey scale value to be written
for (int i = 0; i < 360; i++) {    // loop header reconstructed from the text below
    int x = (int)(radius*Math.sin(Math.toRadians(i)));
    int y = (int)(radius*Math.cos(Math.toRadians(i)));
    input.setxy(xcent+x, ycent+y, colour);   // xcent, ycent give the circle centre
}
NeatVision implements a standard for-loop structure. The feedback actions will be implemented on a blank input image as the loop variable increments from 0 to 360 in steps of 1 (as determined by integer inputs to the
for-loop block). The top output of the for-loop block indicates the final
loop output. The middle output indicates the intermediate loop output for
each pass through the feedback cycle. The loop variable is indicated by the
lower output from the for-loop block. Once the location of the white pixel
value to be written has been calculated, it is converted to an image coordinate
(PtoC) and the appropriate pixel is set to the desired grey scale (SETPIX).
Fig. 7.22. Low level functionality of NeatVision. (a) Visual workspace. (b) Output
image.
Working at a higher level maximises the use of the large number of predefined image processing and analysis algorithms available within NeatVision.
Aim: To implement a simple blob fill algorithm using the label by location
operator (Fig. 7.23).
Operational Summary: The convex hull (CONVEXH) is applied to the input image and the result is then inverted. This is then passed to the label by location operator (LABELL) to produce a number of distinct grey scale regions corresponding to the image's convex deficiencies. The hole region is selected using the double threshold operator (DOUBLE); in the example this is set to five, although this value will be dependent on the input image. Once selected, this is combined (XOR) with the original image to fill the hole region.
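For readers who prefer to see the data flow as code, the visual program can be summarised as below. The helper names are hypothetical stand-ins for the CONVEXH, LABELL, DOUBLE and XOR blocks; they are not part of the actual NeatVision API.

// Hypothetical helper names mirroring the blocks of the visual program.
GreyImage hull     = convexHull(input);               // CONVEXH
GreyImage inverted = invert(hull);                    // invert the convex hull result
GreyImage labelled = labelByLocation(inverted);       // LABELL: one grey level per region
GreyImage hole     = doubleThreshold(labelled, 5);    // DOUBLE: select the hole region (value is image dependent)
GreyImage filled   = exclusiveOr(hole, input);        // XOR with the original fills the hole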
Fig. 7.23. Simple blob filling algorithm. (a) Visual workspace. (b) Original binary
image. (c) Output from the label by location operator. (d) Blob filling applied to
(b).
Fig. 7.24. Find and isolate the largest object in a scene. (a) Visual workspace. (b)
Original image. (c) Objects labelled by size. (d) Segmentation of the largest object.
Fig. 7.25. Counting the flutes on a bottle-top image. (a) Visual workspace. (b)
Original image. (c) Segmentation of the crown flutes. (d) The centroid of each flute
is calculated and replaced by a single white pixel. These are counted to give the
final flute count. The single centroid pixels are replaced by white squares for clarity.
7.5 Conclusion
This chapter has described the NeatVision application. Initially the theory behind the application was introduced, describing in depth the concepts of visual programming and new component development using the Java programming language. The application user interface was then introduced, along with a description of how to use the core concepts covered in the first two sections of this chapter. A range of sample applications was also presented to illustrate the power of the NeatVision environment. This chapter should help new NeatVision users to get the most from the application and provide experienced users with a good reference to the environment's capabilities.
Fig. 7.26. Plant-stem location. Original image courtesy of Prof. Bruce G. Batchelor, University of Wales. (a) Visual workspace. (b) Original image. (c) Binary
version of the input image. The input image has been thresholded at grey scale 200
and inverted. (d) Application of morphological operators to remove the stem. (e)
Isolation of the stem. This is the difference between the binary image (c) and the
image illustrated in (d). The largest blob operator has also been applied to remove
any extraneous noise after the differencing operation. (f) Detected stem junctions
overlaid on the original grey scale image. Each detected corner and junction point
is illustrated by a small white square.
Java provides support for the two most common file formats found on the
Internet, the Graphics Interchange Format (GIF) and the Joint Photographic
Experts Group File Interchange Format (JPEG). These file formats are not
generally used for machine vision applications as the compression techniques
which they employ distort image information. The pixel error introduced by
either compression technique is typically 1%; although this may not be visually apparent, it does result in unacceptable information loss. In order to preserve image information, support for several non-corrupting file formats is implemented in Java (the three main formats required are raw image data
(BYT), PC Paintbrush (PCX) and Tagged Image File Format (TIFF)). The
definitive list of file formats (Murray & Ryper 1996) supported by NeatVision is outlined below. R indicates a read-only file format and RW indicates a
read/write file format.
BMP (RW) Microsoft Windows Bitmap, a bitmap image file using Run Length
Encoding (RLE). It supports a maximum pixel depth of 32 bits and a
maximum image size of 32K x 32K pixels.
BYT (RW) Raw image data, grey scale only, with a maximum pixel depth of 8 bits. The data is stored with the first byte of the file corresponding to the top left hand corner of the image. No header is contained within the image data, so the image dimensions must be supplied by the user. This file format preserves the image data (a short sketch of reading a BYT file in Java is given after this list).
FPX (R) Kodak FlashPix is a multiple-resolution, tiled image file format
based on JPEG compression.
GIF (R) Graphics Interchange Format (GIF) is a bitmap file format which utilises Lempel-Ziv-Welch (LZW) compression, limited to a maximum colour palette of 256 entries and a maximum image size of 64K x 64K pixels.
JPEG (JPG) (RW) Joint Photographic Experts Group (JPEG) file interchange format is a bitmap file utilising JPEG compression, an encoding scheme based on the Discrete Cosine Transform (DCT). It has a maximum colour depth of 16.7 million colours and a maximum image size of 64K x 64K pixels.
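Because a BYT file is simply headerless raw bytes, it can be read with the standard Java I/O classes alone. The sketch below makes no use of the NeatVision API; the file name and the width and height values are assumptions that must be supplied by the user, as noted in the BYT entry above.

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadBYT {
    public static void main(String[] args) throws IOException {
        int width = 256, height = 256;          // assumed dimensions; BYT files carry no header
        int[][] pixels = new int[height][width];
        DataInputStream in = new DataInputStream(new FileInputStream("image.byt"));
        try {
            for (int y = 0; y < height; y++)        // the first byte is the top left pixel
                for (int x = 0; x < width; x++)
                    pixels[y][x] = in.readUnsignedByte();   // 8 bit grey scale value
        } finally {
            in.close();
        }
        System.out.println("Centre pixel intensity: " + pixels[height / 2][width / 2]);
    }
}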
Two image classes are provided by the NeatVision API, GreyImage and RGBImage. These classes provide low level random access to pixel information. This appendix presents the methods associated with each class and describes their operation.
The images described by these classes are based on the Cartesian coordinate system. Each pixel can be addressed by an (x, y) pair or an absolute index p into the image array (Fig. B.1). The pixel (0,0) is located in the top left corner of the image. The value for x increases from left to right and the value for y increases from top to bottom. Any point (x, y) can be represented as an absolute index p (see Eqns. B.1 to B.3).

p = x + (y × width)    (B.1)
x = p % width          (B.2)
y = p / width          (B.3)
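The conversion implied by Eqns. B.1 to B.3 can be written directly in Java; note that the division in Eqn. B.3 is integer division. The values below correspond to the 10 x 10 pixel image of Fig. B.1.

int width = 10;                 // image width, as in Fig. B.1

// Eqn. B.1: coordinate pair to absolute index
int x = 3, y = 7;
int p = x + (y * width);        // p = 73

// Eqns. B.2 and B.3: absolute index back to a coordinate pair
int xBack = p % width;          // 3
int yBack = p / width;          // 7 (integer division)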
Fig. B.1. A magnified 10 x 10 pixel grey scale image representing the three available
addressing modes. (a) Coordinate based addressing. (b) Matrix based addressing.
(c) Pixel index based addressing.
void RGBImage.fillCircle(int x,int y,int r,Color c)
Draws a filled circle on the image. The centre of the circle is specified by the coordinates x and y and the radius of the circle is given by the argument r. The colour of the resulting circle is specified by the colour argument c.
void RGBImage.fillCircle(int p,int x,int y,int r,int i)
Draws a filled circle on plane p of the image. The centre of the circle is specified by the coordinates x and y and the radius of the circle is given by the argument r. The intensity of the resulting circle is specified by the intensity argument i.
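A brief usage sketch of the two fillCircle variants follows. Only the fillCircle signatures are taken from the descriptions above; the RGBImage constructor shown is an assumption made purely for illustration.

RGBImage image = new RGBImage(256, 256);              // assumed (width, height) constructor
image.fillCircle(128, 128, 40, java.awt.Color.red);   // filled red circle, centre (128,128), radius 40
image.fillCircle(0, 64, 64, 20, 255);                 // plane 0, centre (64,64), radius 20, intensity 255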
C. NeatVision Components
[The remainder of this appendix is a tree listing of the NeatVision component library; the listing has not survived reproduction legibly. The recoverable headings include the top-level folders Data, Flow Control, Utilities and Processing, with subfolders such as Arithmetic, Noise, Filter, Analysis, Label, Transform, Colour, Morphology, Histogram, Maths, Low Level, String and the JAI operator groups, each containing the individual operator components.]
References
Freeman, H. (1987), Machine Vision: Algorithms, Architectures, and Systems, Academic Press.
Gonzalez, R. & Wintz, P. (1987), Digital Image Processing, Addison-Wesley.
Hanson, P. & Manneberg, G. (1997), 'Fourier optic characterization of paper surfaces', Optical Engineering 36(1), 35-39.
Haralick, R. (1979), 'Statistical and structural approaches to texture', Proceedings
of the IEEE 67(5), 786-804.
Haralick, R. (1992), Performance characterization in computer vision, in D. Hogg &
R. Boyle, eds, 'Proceedings of the British Machine Vision Conference', Springer-Verlag, pp. 1-8.
Haralick, R. & Shapiro, L. (1992), Computer and Robot Vision: Volume 1, Addison-Wesley.
Haralick, R., Sternberg, S. & Zhuang, X. (1987), 'Image analysis using mathematical
morphology', IEEE Transactions on Pattern Analysis and Machine Intelligence
9(4),532-550.
Heijmans, H. (1991), 'Theoretical aspects of grey-level morphology', IEEE Transactions on Pattern Analysis and Machine Intelligence 13(6), 568-582.
Hochberg, J. (1987), 'Machines should not see as people do, but must know how
people see', Computer Vision, Graphics, and Image Processing 37, 221-237.
Hutson, T. (1971), Colour Television Theory, McGraw-Hill, London.
Jain, A. (1989), Fundamentals of Digital Image Processing, Prentice-Hall.
Jain, A. & Dubes, R. (1988), Algorithms for Clustering Data, Prentice-Hall.
Kass, M., Witkin, A. & Terzopoulos, D. (1987), Snakes: Active contour models, in
'Proceedings of the 1st International Conference on Computer Vision', pp. 259-268.
Keller, J. M. (1986), Color image analysis of food, in 'Proc. IEEE Computer Society
on Computer Inspection and Pattern Recognition', Florida, USA.
Klette, R. & Zamperoni, P. (1996), Handbook of Image Processing Operators, Wiley.
Klien, J. & Serra, J. (1972), 'The texture analyzer', J. Microscopy 95(2),349-356.
Lantuejoul, C. & Maisonneuve, F. (1984), 'Geodesic methods in image analysis',
Pattern Recognition 17(2),177-187.
Laws, K. (1980), Texture Image Segmentation, PhD thesis, University of Southern
California.
Liang, J., Piper, J. & Tang, J. (1989), 'Erosion and dilation of binary images by arbitrary structuring elements using interval coding', Pattern Recognition Letters
9(3),201-209.
Lindley, C. (1991), Practical Image Processing in C, Wiley.
Liu, S. & Jernigan, M. (1990), 'Texture analysis and discrimination in additive
noise', Computer Vision, Graphics, and Image Processing pp. 52-57.
Mandelbrot, B. (1983), The fractal geometry of nature, Freeman, New York.
Maragos, P. (1987), 'Tutorial on advances in morphological image processing and
analysis', Optical Engineering 26(7), 623-632.
Maragos, P. & Schafer, R. (1986), 'Morphological skeleton representation and coding of binary images', IEEE Transactions on Acoustics, Speech and Signal Processing 34(5), 1228-1244.
Marr, D. & Hildreth, E. (1980), Theory of edge detection, in 'Proceedings of the
Royal Society', Vol. B 207, pp. 187-217.
Matheron, G. (1975), Random Sets and Integral Geometry, John Wiley.
McClannon, K., Whelan, P. & McCorkell, C. (1993), Integrating machine vision into
process control, in M. Ahmad & W. Sullivan, eds, 'Proceedings of FAIM'93',
CRC Press, University of Limerick, pp. 703-714.
McCormick, B. & Jayaramamurthy, S. (1974), 'Time series model for texture synthesis', International Journal of Computer and Information Sciences 3, 329-343.
Meyer, F. (1986), 'Automatic screening of cytological specimens', Computer Vision,
Graphics, and Image Processing 35, 356-369.
Mirmehdi, M. & Petrou, M. (2000), 'Segmentation of color textures', IEEE Transactions on Pattern Analysis and Machine Intelligence 22(2), 142-159.
Mitkas, P., Betzos, G. & Irakliotis, L. (1998), 'Optical processing paradigms for
electronic computers', IEEE Computer 31(2), 45-51.
Murray, J. & Ryper, W. V. (1996), Encyclopaedia of Graphics File Formats,
O'Reilly.
Myler, H. & Weeks, A. (1993), Computer Imaging Recipes in C, Prentice Hall.
Noble, J. (1989), Descriptions of Image Surfaces, PhD thesis, Oxford University.
Ojala, T. & Pietikainen, M. (1999), 'Unsupervised texture segmentation using feature distributions', Pattern Recognition 32(3),477-486.
Ojala, T., Pietikainen, M. & Harwood, D. (1996), 'A comparative study of texture
measures with classification based on feature distributions', Pattern Recognition
29(1),51-59.
Parker, J. (1997), Algorithms for Image Processing and Computer Vision, Wiley.
Parks, J. (1978), Industrial Sensory Devices, Plenum, New York, pp. 253-286.
Pavlidis, T. (1978), 'A review of algorithms for shape analysis', Computer Graphics
and Image Processing 7, 243-258.
Pavlidis, T. (1992), 'Why progress in machine vision is so slow', Pattern Recognition
Letters 13, 221-225.
Pentland, A. (1984), 'Fractal-based description of natural scenes', IEEE Transactions on Pattern Analysis and Machine Intelligence 6(6), 661-674.
Petrou, M., Mirmehdi, M. & Coors, M. (1998), Perceptual smoothing and segmentation of colour textures, in 'Proceedings of the 5th European Conference on
Computer Vision 98', Freiburg, Germany, pp. 623-639.
Pietikainen, M. & Kauppinen, H. (1999), Texture '99; Workshop on Texture Analysis in Machine Vision, Infotech Oulu.
Pietikainen, M. & Ojala, T. (1996), Texture analysis in industrial applications, in
J. Sanz, ed., 'Image Technology - Advances in Image Processing, Multimedia
and Machine Vision', Springer-Verlag, pp. 337-359.
Pietikainen, M. & Ojala, T. (2000), 'Rotation-invariant texture classification using
feature distributions', Pattern Recognition 33, 43-52.
Piper, J. (1985), 'Efficient implementation of skeletonisation using interval coding',
Pattern Recognition Letters 3(6), 389-397.
Pitas, I. (1993), Digital Image Processing Algorithms, Prentice-Hall.
Pitas, I. & Venetsanopoulos, A. (1990), 'Morphological shape decomposition', IEEE
Transactions on Pattern Analysis and Machine Intelligence 12(1), 38-45.
Preston, K. (1991), 'Three-dimensional mathematical morphology', Image and Vision Computing 9(5), 285-295.
Randen, T. & Husøy, J. (1999), 'Filtering for texture classification: A comparative survey', IEEE Transactions on Pattern Analysis and Machine Intelligence
21(4),291-310.
Rao, K. & Hwang, J. (1996), Techniques and Standards for Image, Video and Audio
Coding, Prentice-Hall PTR.
Rauber, C., Tschudin, P., Startchik, S. & Pun, T. (1996), Archival and retrieval of
historical watermark images, in 'International Conference on Image Processing
(ICIP)', IEEE Press, Lausanne, pp. 773-776.
Index
is-a-part-of, 26
is-a, 26
GreyImage, 264
RGBImage, 265
extends, 26
flow(), 72
friendly, 29
javac, 23, 34
main(), 33
private, 28
protected, 29
public, 28
super, 30
this, 30
4-adjacency, 112
4-connected, 62
8-adjacency, 112
8-connected, 62
aliasing, 53
alpha, 40
anti-aliasing, 248
API, 22, 27, 49, 243, 263
application
- blob fill, 256
- bottle-top inspection, 258
- colour recognition, 214
- draw, 254
- geometric packing, 169
- largest item, 257
- morphology, 169
- NeatVision, 254
- plant-stem location, 259
- polychromatic scenes, 219
- red:green pepper ratio, 201
- removal of boundary objects, 163
- template matching, 91
- watermark analysis, 129
auto-correlation (ACF), 177
autoregressive model, 189
AWT, 32, 34, 38, 49, 54, 240, 241
CIELUV, 208
CIEXYZ, 206
class
- derived, 25
- hierarchy, 25
classification
- artificial neural network (ANN), 14
- Bayesian, 14
- clustering, 15
- confusion matrix, 15
- KNN, 14
- leave one-out, 15
- maximum similarity, 13
- maximum-likelihood, 14
- nearest neighbour, 13
- performance, 15
- supervised, 13
- unsupervised, 15
closing
- binary, 145
- grey scale, 151
co-occurrence
- angular second moment, 181
- contrast, 182
- energy, 181
- entropy, 181
- inertia, 182
- local homogeneity, 182
- matrix, 179
colour
- analysis, 191
- CIE chromaticity diagram, 206
- cube, 195
- histogram, 197
- hue, 198
- images, 9
- intensity, 198
- representation, 193
- saturation, 198
- scattergrams, 197, 203
- separation, 197
- texture, 210
- triangle, 195
- WWW links, 219
colour camera, 191
conditional dilation, 161
connected component analysis, 99
connectivity detector, 99
contrast
- enhancement, 69, 160
- stretching, 70
convex hull, 111
convolution, 74, 124
corner
- binary, 90, 107, 143
- grey scale, 93
- SUSAN, 93
covariance, 156
difference of low-pass filters (DOLPS),
81
dilation
- binary, 138
- conditional, 161
- geodesic, 162
- grey scale, 139, 150
- half-gradient, 152
- interval coding, 149
- reconstruction, 163
Discrete Fourier Transform (DFT), 124
distance measure
- Euclidean, 14
distance measure
- Chessboard, 117
- City Block, 117
- Euclidean, 117
distance transform, 117
double threshold, 163
duals, 142
dyadic
- addition, 72
- divide, 72
- maximum, 72
- minimum, 73
- multiply, 72
operators, 70
subtraction, 72
edge
- density, 175
- effects, 91
- operator, 82
edge operator
- Canny, 89
- Frei and Chen, 87
- Laplacian, 84
- Marr-Hildreth, 88
- Prewitt, 86
- Roberts, 84
- Sobel, 85
- zero-crossings, 88
epoch, 14
erosion
- binary, 140
- geodesic, 162
- grey scale, 140, 150
- half-gradient, 152
- interval coding, 149
- reconstruction, 163
- ultimate, 163
euler number, 100
filling holes, 103
filter
- band-pass, 127
- band-stop, 128
- adaptive low-pass, 126
- crack detection, 82
- direction codes, 82
- DOLPS, 81
- edge detector, 80
- high-pass, 75, 77, 127
- low-pass, 75, 81, 126
- mean, 81
- median, 80
- midpoint, 81
- N-tuple, 90
- rank, 82
- rectangular average, 77
- selective, 128
- sharpen, 81
- symmetric, 129
Fourier spectral analysis, 177
fpx, 261
fractal analysis, 184
Freeman code, 114
Gabor filters, 188
geodesic
- dilation, 162
- erosion, 162
gif, 261
GLDM
- angular second moment, 179
- contrast, 179
- entropy, 179
- inverse different moment, 179
- mean, 179
- operation, 178
GLRLM, 178
gradient, 152
grass-fire transform (GFT), 105
grey level difference, 178
grey level run length, 178
half-gradient
- dilation, 152
- erosion, 152
histogram, 178
histogram
- cumulative, 65
- equalization, 67
- intensity, 65
- local area, 68
- local equalization, 68
- stretching, 70
Hit-and-Miss, 143
Hot Java, 21
HSI, 198
hue singularity, 201
idempotency, 145
image
- acquisition, 7
- analysis, 10
- binary, 96
- classification, 12
- dyadic, 70
- grey scale, 8
- monadic, 62
- monochrome, 8
- neighbourhood, 61
- processing, 10, 61, 247
- representation, 8
imaging
- Java, 49, 243
- real-time, 6
inheritance, 26
inspection systems, 3
instantiation, 24
intensity
- highlight, 63
- histogram, 65
- multiply, 63
- negate, 63
- shift, 62
- squaring, 63
interval coding
- dilation, 149
- erosion, 149
- operation, 148
JAI, 54, 56, 58, 223, 244
Java
- alpha value, 40
- applet, 33-35
- application, 33, 35
- arrays, 31
- background, 21
- class, 32
- class
- - descriptors, 31
- comments, 29
- compile, 34
- Image Consumers, 41
- image processing, 36
- Image Producers, 41
- imaging
- API, 22, 49, 243
-- AWT, 49, 241
- - JAI, 54, 244
- import, 32
- interfaces, 31
- lightweight, 50
- newsgroups, 59
- operators, 29
- overloading, 27
- packages, 32
- PixelGrabber, 41
- RMI, 23
- Software Development Kit (SDK), 23
- Swing, 50
- throw, 29
- transparency value, 40
- types, 29
- variables, 28
- variables
- class, 28
- class member, 28
- local, 28
- static, 28
- WWW links, 58
Java class
- additional, 45
- Canvas, 36
- Class, 31
- ColorModel, 45
- CropImageFilter, 46
- DirectColorModel, 45
- FilteredImageSource, 47
- Graphics, 38
- Image, 38
- ImageFilter, 45
- IndexColorModel, 45
- local, 28
- methods, 24
- Object, 31
- PlanarImage, 56
- RGBImageFilter, 46
- states, 24
- static, 28
Java Run-time Environment (JRE), 22
Java Virtual Machine (JVM), 22
jpeg (jpg), 261
Just-In-Time (JIT) compiler, 22
Laplacian of a Gaussian (LoG), 88
Object-oriented Programming
- abstract classes, 27
- classes, 24
- encapsulation, 24
- inheritance, 25
- objects, 25
- polymorphism, 27
Object-oriented Programming (OOP),
23
opening
- binary, 145
- grey scale, 151
opponent process, 203
PAL, 193
pbm, 262
pcx, 261
perimeter, 114
pgm, 262
pixel, 8
png, 262
point-pairs, 156
point-wise maximum, 163
ppm, 262
programmable colour filter
- generalisation, 217
- implementation, 211
- noise, 215
- recognition, 214
programmable colour filter (PCF), 211
pruning, 109
random field models, 187
ras, 262
raw, 262
reconstruction
- dilation, 163
- erosion, 163
rectangular average filter, 77
region labelling, 103
RGB, 193
RMI, 55
run-length coding, 115
S-CIELAB, 209
salt and pepper noise, 152
SE, 137
SE decomposition, 146
SE decomposition
- dilation, 146
- parallel, 147
- serial, 146
- union, 148
SECAM, 193
shape descriptors, 115
skeleton, 105, 145
Skeleton Influence by Zones (SKIZ),
165
SKIPSM, 174
smallest intensity neighbourhood, 79
spatial CIELAB, 209
structural analysis, 189
structuring element, 137
SUSAN, 93
syntactic processes, 12
system
- boards, 5, 6
- context, 16
- design, 7
- engineering, 16
- hardware, 5
- machine vision, 3
- morphology, 174
- pipeline, 5
- self-contained, 5
- software, 6
- turn-key, 5
template, 91
template matching, 11
texel, 189
textural energy, 185
texture, 175
texture
- spectrum, 186
- units, 186
thickening, 144
thinning, 106, 144
threshold
- adaptive
-- 3 x 3, 78
- - 5 x 5, 78
- double, 163
- dual, 64
- mid-level, 64
- single, 63
- triple, 65
tiff (tif), 262
top-hat, 157
top-hat
- black, 157
- enhancement, 160
- opening, 157
- white, 157
transform, 116
transform
- Chamfer, 117
- DFT, 124
- DFT