
Machine Vision Algorithms in Java

Springer-Verlag London Ltd.

Paul F. Whelan and Derek Molloy

Machine Vision
Algorithms in Java
Techniques and Implementation

With 183 Figures

Springer

Paul F. Whelan, BEng, MEng, PhD


Derek Molloy, BEng
Vision Systems Laboratory, School of Electronic Engineering, Dublin City
University, Dublin 9, Republic of Ireland

ISBN 978-1-4471-1066-8
DOI 10.1007/978-1-4471-0251-9

ISBN 978-1-4471-0251-9 (eBook)

British Library Cataloguing in Publication Data


Whelan, Paul
Machine vision algorithms in Java: techniques and
implementation
1. Computer vision 2. Java (Computer program language)
3. Computer algorithms
I. Title II. Molloy, Derek
006.3'7
ISBN 978-1-4471-1066-8
Library of Congress Cataloging-in-Publication Data
Whelan, Paul F., 1963-
Machine vision algorithms in Java: techniques and implementation / Paul F. Whelan
and Derek Molloy.
p. cm.
ISBN 978-1-4471-1066-8 (alk. paper)
1. Computer vision. 2. Computer algorithms. 3. Java (Computer program language)
I. Molloy, Derek, 1973- II. Title
TA1634.W54 2000
006.3'7-dc21
00-030072
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored
or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or
in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the
publishers.
© Springer-Verlag London 2001
Originally published by Springer-Verlag London Berlin Heidelberg in 2001
Softcover reprint of the hardcover 1st edition 2001

The use of registered names, trademarks etc. in this publication does not imply, even in the absence of
a specific statement, that such names are exempt from the relevant laws and regulations and therefore
free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.
Typesetting: PostScript files by authors
34/3830-543210 Printed on acid-free paper SPIN 10743121

To Caroline, Hannah and Sean. (PFW)

To Sally and parents. (DM)

Preface

For many novices to the world of machine vision, the development of automated vision solutions may seem like a relatively easy task as it only requires a
computer to understand basic elements such as shape, colour and texture. Of
course this is not the case. Extracting useful information from images in a laboratory environment is a difficult process at the best of times, but to develop
imaging systems with the reliability and repeatability required for industrial,
medical and associated imaging applications increases the complexity of the
design task. The aim of this book was to produce a self-contained software
reference source for machine vision designers which takes such issues into account. To that end Machine Vision Algorithms in Java contains explanations
of key machine vision techniques and algorithms, along with the Java source
code for a wide range of real-world image processing and analysis functions.
A number of texts have been written over the last few years, which have
focused on implementing image processing and to a lesser extent image analysis functions, through coded examples (i.e. for a range of software languages
and environments). So, why do we need another book on the subject? Firstly,
Machine Vision Algorithms in Java concentrates on examining these issues
from a machine vision perspective. It focuses on the image analysis and general machine vision design task, as well as discussing the key concepts relating
to image processing. In addition, we have made available (via the Internet) a
comprehensive machine vision development environment, NeatVision, which
allows the algorithms and techniques discussed throughout this book to be
implemented by the reader.
The range of machine vision techniques and applications has grown significantly over the last decade and as such it would be difficult for a single
text to cover them all. Therefore, this book concentrates on those algorithms
and techniques that have found acceptance within the machine vision community. As is in the nature of putting a book like this together, certain areas
receive greater attention reflecting our experience and the nature of our own
research.
This book has grown from a number of different elements. Many of the
ideas outlined are based on the postgraduate modules Computer and Machine
Vision (EE544) and Object-oriented Programming (EE553) developed by Dr.
Paul Whelan and Derek Molloy respectively. Both modules are available in
traditional form and via the Internet as part of Dublin City University's
Remote Access to Continuing Engineering Education (RACeE) initiative1.
Another key element was the development of NeatVision, a Java based
visual programming environment for machine vision. It provides an intuitive
interface which is achieved using a "drag and drop" block diagram approach,
where each image analysis/processing operation is represented by a graphical
block with inputs and outputs that can be interconnected, edited and deleted
as required. NeatVision was designed to allow the user to focus on the machine vision design task rather than concerns about the subtlety of a given
programming language. It allows students of machine vision to implement
their ideas in a dynamic and easy to use way, thus reinforcing one of the key
elements of the educational experience, interaction. In conjunction with the
publication of this book a fully functional 'shareware' version of NeatVision
has been made available via the Internet2.
We have also included an introduction to Object-oriented Programming
(OOP) and the Java programming language, with particular reference to
its imaging capabilities. This was done for those readers who may be unfamiliar with OOP and the Java programming language. It includes details
relating to the design of a Java based visual programming environment for
machine vision along with an introduction to the Java 2D imaging and the
Java Advanced Imaging (JAI) Application Programming Interface (API). A
wide range of illustrative examples are also included.
Having successfully digested the ideas outlined in this book the reader
will:
- Be familiar with the essential elements of machine vision software and
  have an understanding of the problems involved in the development and
  implementation of machine vision systems from a software perspective.
- Have the ability to design and implement image processing and analysis
  techniques.
- Be able to evaluate emerging machine vision algorithms and techniques.
- Be familiar with the Java programming language and its application to
  image analysis.
- Be able to develop machine vision solutions using a Java based visual
  programming environment (i.e. NeatVision).
This book is aimed at senior undergraduate and postgraduate students in
engineering and computer science as well as practitioners in machine vision,
who may wish to update or expand their knowledge in the field. We have
tried to present the techniques and algorithms of machine vision in a way
1 http://www.racee.ie/
2 See http://www.NeatVision.com/ for further details on downloading and installing
this software. This web site also contains details of the NeatVision environment
along with a summary of its techniques. A number of working examples along with
sample files and associated visual workspaces are also available.

that it will be understood not only by specialists familiar with the field, but
also by those who are less familiar with the topic. Care has also been taken to
ensure that we have provided adequate references to supporting work. This
should aid readers who wish to examine the topics covered in more detail.
The organisation of the book is as follows. Chap. 1 introduces the general
field of machine vision systems engineering. Chap. 2 outlines the key concepts
behind the Java programming language. As well as giving a brief history, description and layout of the Java language, this chapter outlines the properties
of Java that make it useful for image processing and analysis. The purpose of
Chap. 3 is to detail some of the basic techniques and algorithms used in the
development of machine vision systems. Key elements of the image processing and analysis functions introduced in this section are also implemented in
Java and form the basis of the NeatVision visual programming environment.
Chap. 4 follows on from this basic introduction by examining mathematical morphology, a key framework in which many machine vision algorithms
can be placed. Chaps. 5 and 6 expand our discussion of imaging techniques
to include key elements of texture and colour image analysis. Chap. 7 details the design and implementation of the Neat Vision visual programming
environment. Appendix A outlines the graphics file formats that are used
by NeatVision and Appendix B details the NeatVision imaging Application
Programming Interface (API) specification. Finally, Appendix C summarises
the range of operators available in the NeatVision machine vision software
development environment. A range of sample applications implemented in
NeatVision are highlighted throughout the book.
For updates, corrections, colour images and sample visual programmes
please refer to the book web site at http://www.eeng.dcu.ie/~javamv/.

Dublin, June 2000

Paul F. Whelan
Derek Molloy
mva@eeng.dcu.ie

Acknowledgments

This book has benefited from the comments and suggestions of a wide range
of people, including our colleagues with whom we have had many fruitful
discussions and collaborations. Numerous students have also contributed,
both directly and indirectly. The most important contributions coming from
members of the Vision Systems Laboratory (VSL) at Dublin City University (DCU), namely Ovidiu Ghita, Alexandru Drimbarean and Pradeep PP.
We would particularly like to express our gratitude to Robert Sadleir (VSL)
for a fruitful collaboration in the development of NeatVision. Robert also
contributed to our discussion on the NeatVision development environment,
specifically in Chap. 7. We would also like to thank all the members of the
VSL for their comments on the various drafts of this book.
We would like to thank Prof. Charles McCorkell, Head of the School
of Electronic Engineering, DCU, for his support of the VSL and this book
project. We would also like to acknowledge all our academic colleagues for
our many discussions on computer and machine vision, especially Prof. Bruce
Batchelor (machine vision systems engineering) and Dr. Pierre Soille (mathematical morphology). We would like to acknowledge Xuemei Zhang (Department of Psychology, Stanford University), Mark Graves (Spectral Fusion
Technologies) and Tim Carew (Technology Systems International) and thank
them for their permission to use some of the material cited in this book.
Special thanks are due to Nicholas Pinfield and Oliver Jackson of Springer-Verlag for their commitment to this book and maintaining its schedule. Machine Vision Algorithms in Java: Techniques and Implementation was prepared in camera-ready form using the LaTeX text processing environment and
Paint Shop Pro image editing software.
Finally, the authors would like to thank their families for their constant
support and encouragement during this project. We would like to thank the
readers in advance for comments and suggestions aimed at improving and extending the present book and its associated NeatVision software. Any errors,
of course, remain our own responsibility.

Notice
NeatVision and its associated materials are copyright 2000 by Paul
F. Whelan. The software is presented "as is". While every reasonable effort
has been made to ensure the reliability of this software, NeatVision and the
associated algorithms outlined in this book are supplied for general reference
only and should not be relied on without further specific inquiry. NeatVision
may be downloaded, stored on a hard drive or other storage device, with the
following restrictions and exceptions:
- Systematic or multiple-copy reproduction or republication; electronic
  retransmission to another location; print or electronic duplication of any
  NeatVision material supplied for a fee or for commercial purposes; or altering
  or recompiling any contents of NeatVision and its associated materials
  are not permitted.
- By choosing to use NeatVision and its associated materials, you agree to
  all the provisions of the copyright law protecting it and to the terms and
  conditions established by the copyright holder.
- The authors cannot accept responsibility for any loss or damage caused by
  the use of the source code presented in this book.

Trademarks
- Sun, Sun Microsystems, Solaris, Java and all Java-based trademarks are
  trademarks or registered trademarks of Sun Microsystems, Inc. in the
  United States and other countries.
- Netscape and Netscape Navigator are registered trademarks of Netscape
  Communications Corporation in the United States and other countries.
- Microsoft, Windows, Windows NT and/or other Microsoft products
  referenced herein are either trademarks or registered trademarks of
  Microsoft Corporation.
- IBM is a registered trademark of IBM Corporation in the United States
  and other countries.
- Paint Shop Pro is a registered trademark of Jasc Software, Inc.

Contents

1. An Introduction to Machine Vision
   1.1 Human, Computer and Machine Vision
   1.2 Vision System Hardware
   1.3 Vision System Software
   1.4 Machine Vision System Design
       1.4.1 Image Acquisition
       1.4.2 Image Representation
       1.4.3 Image Processing
       1.4.4 Image Analysis
       1.4.5 Image Classification
       1.4.6 Systems Engineering
   1.5 NeatVision: Where Java meets Machine Vision
       1.5.1 NeatVision's Graphical User Interface (GUI)

2. Java Fundamentals
   2.1 The History of Java
       2.1.1 What Makes Java Platform Independent?
       2.1.2 The Just-In-Time Compiler
       2.1.3 The Sun Java Software Development Kit (Java SDK)
   2.2 Object-oriented Programming
       2.2.1 Encapsulation
       2.2.2 Classes
       2.2.3 Objects
       2.2.4 Inheritance
       2.2.5 Polymorphism
       2.2.6 Abstract Classes
   2.3 Java Language Basics
       2.3.1 Variables
       2.3.2 Access Control
       2.3.3 Java Types
       2.3.4 Java Operators
       2.3.5 Java Comments
       2.3.6 The super and this Keywords
       2.3.7 Java Arrays
       2.3.8 The "Object" and "Class" Classes
       2.3.9 Interfaces
       2.3.10 Packages
   2.4 Applications and Applets
       2.4.1 Writing Applications
       2.4.2 Applets
       2.4.3 An Applet and an Application?
   2.5 Java and Image Processing
       2.5.1 The Canvas Class
       2.5.2 Java and Images
       2.5.3 Image Producers and Image Consumers
       2.5.4 Recovering Pixel Data from an Image Object
       2.5.5 Recreating an Image Object from an Array of Pixels
   2.6 Additional Classes
       2.6.1 ColorModel
       2.6.2 ImageFilter
       2.6.3 CropImageFilter
       2.6.4 RGBImageFilter
       2.6.5 FilteredImageSource
   2.7 Double Buffering
   2.8 Recent Additions to Java for Imaging
       2.8.1 Java 2D API
       2.8.2 Working with the Java 2D API
       2.8.3 Renderable Layer and Rendered Layer
       2.8.4 Java Advanced Imaging API (JAI)
       2.8.5 JAI Functionality
   2.9 Additional Information on Java
   2.10 Conclusion

3. Machine Vision Techniques
   3.1 Elementary Image Processing Functions
       3.1.1 Monadic, Point-by-point Operators
       3.1.2 Intensity Histogram
       3.1.3 Look-up Tables (LUT)
       3.1.4 Dyadic, Point-by-point Operators
   3.2 Local Operators
       3.2.1 Linear Local Operators
       3.2.2 Non-linear Local Operators
       3.2.3 Edge Detectors
       3.2.4 N-tuple Operators
       3.2.5 Edge Effects
       3.2.6 Grey Scale Corner Detection
   3.3 Binary Images
       3.3.1 Boolean Operators
       3.3.2 Connected Component (Blob) Analysis
       3.3.3 Measurements on Binary Images
       3.3.4 Run-length Coding
       3.3.5 Shape Descriptors
   3.4 Global Image Transforms
       3.4.1 Geometric Transforms
       3.4.2 Distance Transforms
       3.4.3 Hough Transform
       3.4.4 Two-dimensional Discrete Fourier Transform (DFT)
   3.5 Conclusion

4. Mathematical Morphology
   4.1 Binary Mathematical Morphology
       4.1.1 Dilation and Erosion
       4.1.2 Hit-and-Miss Transform
       4.1.3 Opening and Closing
       4.1.4 Skeletonisation
       4.1.5 Structuring Element Decomposition
       4.1.6 Interval Coding
   4.2 Grey Scale Mathematical Morphology
       4.2.1 Basic Grey Scale Operators
       4.2.2 Noise Removal using Grey Scale Morphology
       4.2.3 Morphological Gradients
       4.2.4 Point-Pairs
       4.2.5 Top-Hat Transform
   4.3 Morphological Reconstruction
       4.3.1 Conditional Dilation
       4.3.2 Geodesic Dilation
       4.3.3 Geodesic Erosion
       4.3.4 Reconstruction by Dilation
       4.3.5 Reconstruction by Erosion
       4.3.6 Ultimate Erosion
       4.3.7 Double Threshold
       4.3.8 Image Maxima
       4.3.9 Image Minima
   4.4 Morphological Segmentation
       4.4.1 Skeleton Influence by Zones (SKIZ)
       4.4.2 Watershed Segmentation
   4.5 Case Study: Geometric Packing
       4.5.1 Geometric Packer Implementation
   4.6 Morphological System Implementation
   4.7 Conclusion

5. Texture Analysis
   5.1 Texture and Images
   5.2 Edge Density
   5.3 Monte-Carlo Method
   5.4 Auto-Correlation Function (ACF)
   5.5 Fourier Spectral Analysis
   5.6 Histogram Features
   5.7 Grey Level Run Length Method (GLRLM)
   5.8 Grey Level Difference Method (GLDM)
   5.9 Co-occurrence Analysis
       5.9.1 Energy, or Angular Second Moment
       5.9.2 Entropy
       5.9.3 Inertia
       5.9.4 Local Homogeneity (LH)
   5.10 Morphological Texture Analysis
       5.10.1 Morphological Ratio
       5.10.2 Granularity
   5.11 Fractal Analysis
   5.12 Textural Energy
   5.13 Texture Spectrum Method
   5.14 Local Binary Patterns (LBP)
   5.15 Random Field Models
   5.16 Spatial/Frequency Methods
   5.17 Autoregressive Model
   5.18 Structural Approaches to Texture Analysis
   5.19 Conclusion

6. Colour Image Analysis
   6.1 Colour Cameras
   6.2 Red-Green-Blue (RGB) Colour Representation
       6.2.1 Maxwell's Colour Triangle
       6.2.2 One-dimensional Histograms: Colour Separations
       6.2.3 Two-dimensional Scattergrams
   6.3 Hue-Saturation-Intensity (HSI) Colour Representation
       6.3.1 Colour Scattergrams
   6.4 Opponent Process Representation
   6.5 YIQ Colour Representation
   6.6 YUV Colour Representation
   6.7 CIE Chromaticity Diagram
   6.8 CIEXYZ Colour Representation
   6.9 CIELUV Colour Representation
   6.10 CIELAB Colour Representation
   6.11 Spatial CIELAB Colour Representation
       6.11.1 Segmenting Colour Textures
   6.12 Programmable Colour Filter (PCF)
       6.12.1 PCF Implementation
       6.12.2 Recognising a Single Colour
       6.12.3 Noise Effects
       6.12.4 Colour Generalisation
   6.13 Conclusion

7. NeatVision: Visual Programming for Machine Vision
   7.1 Visual Programming in NeatVision
       7.1.1 Input and Output Components
       7.1.2 Processing Components
       7.1.3 Flow Control Components
       7.1.4 System Development
       7.1.5 Sample Programme
   7.2 Java Programming in NeatVision
       7.2.1 Data Flow Programming
       7.2.2 Standard Component Architecture
       7.2.3 Adding Functionality
       7.2.4 Examples
   7.3 The NeatVision Application
       7.3.1 Visual Programming in NeatVision
       7.3.2 Image Processing
       7.3.3 Other User Interfaces
       7.3.4 The Integrated Software Development Environment
       7.3.5 The Help Viewer
   7.4 Sample Applications
       7.4.1 Low-level Programming
       7.4.2 High-level Programming
       7.4.3 Isolating the Largest Item in an Image
       7.4.4 Bottle-top Inspection
       7.4.5 Plant-stem Location
   7.5 Conclusion

A. NeatVision Graphic File Formats

B. NeatVision Imaging API Specification
   B.1 GreyImage Class
   B.2 RGBImage Class

C. NeatVision Components

References

Index

1. An Introduction to Machine Vision

The purpose of this chapter is to introduce the reader to the basic principles of
machine vision. In this discussion the differences between computer, machine
and human vision are highlighted. In doing so, we draw attention to the key
elements involved in machine vision systems engineering. While this book
concentrates on the software aspects of the design cycle, this task cannot be
taken in isolation. Successful application of vision technology to real-world
problems requires an appreciation of all the issues involved.
This chapter is aimed at readers with minimal prior experience of machine
vision and as such more experienced readers will find most of this material
familiar. We also briefly introduce the reader to the concepts behind the
NeatVision visual programming environment. This is discussed in more detail
in Chap. 7.

1.1 Human, Computer and Machine Vision


"There is more to machine vision than meets the eye!" (Batchelor &
Whelan 1994a)
Computer and machine vision involve the automatic extraction, manipulation, analysis and classification of images or image sequences, usually within
special or general-purpose computing systems. Their purpose is to
obtain useful information about the world with a view to carrying out some
task.
Many researchers use the terms computer and machine vision interchangeably, although there are key differences. Machine vision systems are generally used in an industrial environment. The design of machine vision systems
requires a broad spectrum of techniques and disciplines. These include electronic engineering (hardware and software design), engineering mathematics,
physics (optics and lighting), mechanical engineering (since industrial vision
systems deal with a mainly mechanical world) as well as the system engineering aspects of developing reliable industrial systems. Machine vision generally
involves the engineering of solutions to specific problems and can be seen as a

subset of the general systems engineering task. The formal definition1 of
machine vision put forward by the Automated Vision Association (AVA)
refers to it as:

"The use of devices for optical, non-contact sensing to automatically
receive and interpret an image of a real scene in order to obtain
information and/or control machines or processes." (AVA 1985).

1 Note that this definition does not specifically refer to cameras or computer
systems.
Computer vision is a branch of computer science which concentrates on
the wider scientific issues relating to the automated interpretation of images.
Computer vision often deals with relatively domain-independent considerations, generally aiming to duplicate the effect of human vision by electronically perceiving and understanding an image. Research in this area often
deals with issues involving perception psychology and the analysis of human
and other biological vision systems.
Of course there is a danger in over-emphasizing the differences between
computer and machine vision. Both fields embrace common ground (Fig. 1.1),
but each has its own non-overlapping areas of expertise (Braggins 2000). The
key difference seems to lie in the motivation behind the development of the
imaging solution.

[Figure 1.1: labels include Image Analysis, Image Processing, Image Representation,
Image Understanding, Pattern Recognition, Visual Modelling & Perception,
Biological Vision, Systems Engineering, Robotics and Applied Vision.]

Fig. 1.1. Key aspects of computer and machine vision.

In general, machine vision engineering is about solving specific engineering problems and not trying to develop 'human-like' visual systems. The
concept of modelling an industrial vision system's design on the human visual system can often seem like a good starting point for novices to the field.
This is not the case (Hochberg 1987) and can often be counterproductive. We
should not confuse the two types of vision: the danger in relying on human-
driven approaches is that simpler and perhaps more elegant solutions may be
overlooked. The human vision system is qualitative in nature, dealing with
a broad range of imaging tasks in varying conditions. The human system
can pick out a familiar person in a crowded room, but would find it difficult
to give the exact dimensions of that person. Machine vision, on the other
hand, is quantitative, i.e. it can deal with making precise measurements at
high speed, but lacks the key elements of the human vision system. Machine
vision systems are good at repetitive tasks and provide fast and reliable
decision making. If a human operator were to examine a colour web material
over a shift, they would find it difficult to detect a subtle colour change due
to the ability of the human vision process to adapt to such gradual changes. The
fact that the human vision process is prone to subjective considerations such
as fatigue and boredom, which interfere with consistent evaluations, must
also be taken into consideration (these can be reliably handled by a machine
vision system).
Machine vision systems for industry first received serious attention in the
1970s (Parks 1978), although the proposal that a video system be used for
industrial inspection was first made in the 1930s. Throughout the early 1980s,
the subject developed slowly, with a steady contribution being made by the
academic research community, but with only limited industrial interest being
shown. In the mid-1980s there was serious interest being shown in vision
systems by the major automobile manufacturers, although this was followed
by a period of disillusionment with a large number of small vision companies
failing to survive. Interest grew significantly in the late 1980s and early 1990s,
due largely to significant progress being made in making fast image processing
hardware available at a competitive price. Throughout this period academic
workers have been steadily proving feasibility in a wide range of products,
representing all of the major branches of manufacturing industry.
Machine vision systems now appear in every major industrial sector, including such areas as electronics (PCB inspection, automatic component
recognition), car manufacturing (inspection of car bodies for dents, dimensional checking), food (inspection and grading of fruit and vegetables, inspection of food containers) and the medical industries (tablet quality control,
detection of missing items in pharmaceutical packets). The main application
areas for industrial vision systems occur in automated inspection and measurement and robotic vision. Automated visual inspection and measurement
systems have, in the past, tended to develop faster. In fact, quality control
related applications such as inspection, gauging and recognition currently account for well over half of the machine vision market. This has been mainly
due to the lower cost and the ease of retrofitting such inspection systems onto
existing production lines, compared to the large capital investment involved
in developing a completely new robotic work cell and the extra uncertainty
and risks involved in integrating two new complex technologies. As manufacturing technology becomes more complex, there is a growing requirement to
integrate the inspection process more closely with the overall manufacturing
process (McClannon et al. 1993). This moves the application of automated
inspection from a quality control to a process control role, that is from defect
detection to defect prevention. The reader is referred to Batchelor & Whelan
(1994b), Chin (1988), Chin & Harlow (1982), Davies (1996) and Freeman
(1987) for further details on a wide range of industrial implementations of
machine vision systems.
As well as reducing scrap and rework costs, product quality can also be
improved by using vision to aim for a higher quality standard. Machine vision
can be used to determine the cause of "abnormal situations" and provide early
warning for potential hazards on a production line, for example detecting a
warped metal plate before it is fed into a stamping press and potentially
damaging the press. This technology can also provide extensive statistics of
the production process, and in some applications we may have no alternative to
using automated vision, as the environment may be hazardous to humans.
Machine vision systems are not perfect; they contain two significant types
of error. System errors will be familiar to all engineers, as there will always
be a trade-off between the cost and functionality of the system's components.
System errors can often be reduced by using components with higher specifications.
Statistical errors are not as easy to handle. It can be quite difficult
to decide exactly where to draw the line which separates a good product from
a bad one. If the image analysis process places the product in this grey area
then we have either false rejects or, worse still, we may pass a faulty product.
This type of error can also be confounded by a lack of communication
between the vision engineer and the customer.
Significant progress has been made over the last decade due, in part to the
falling cost of computing power. This has led to a spread in the technology
and has enabled the development of cheaper machine vision systems. This,
in turn, has enabled medium-sized manufacturing companies to consider the
option of using machine vision to implement their inspection tasks. To a lesser
extent, the availability of a well educated work-force, a small proportion of
which has an awareness of machine vision, has also aided the growth and
acceptance of industrial vision systems. The main reason, however, for this
growth is strategic. There is a growing realisation within many industries
that machine vision is an integral component of a long term automation
development process, especially when one considers the importance of quality
in manufacturing. This fact, combined with the legal liabilities involved in
the production and sale of defective products, highlights the strategic case
for the use of machine vision in automated inspection. A similar argument
applies to the application of vision to robotics and automated assembly.

1.2 Vision System Hardware


The imaging engine in many low-cost machine vision systems consists of a
host computer working in conjunction with single or multiple plug-in boards.
The most common example of these systems consists of a personal computer
or workstation and a frame-store card, which allows an image to be captured
from a standard CCD camera (array image format) and displayed. Many of
the current range of frame-store cards also offer on-board processing. Plug-in
accelerator cards which enable certain functions to be implemented in real-time
are available as daughter boards for many frame-stores. Some frame-stores
have slow-scan capabilities and the ability to interface to line-scan
cameras. When used in conjunction with the current range of high speed
personal computers, such a vision system is an attractive option for small to
medium applications of low complexity.
Certain personal computers/workstations now offer direct video input
without the need for additional plug-in frame-store cards. With the growth
in multimedia applications, it is likely that this will become more commonplace
on commercial computers. Such systems offer a number of significant
advantages, the most important of which is their relatively low cost. Additional
advantages include their ease of use and familiarity. This is especially the case
when used in conjunction with standard personal computers, which have become commonplace both in the home and the workplace. The fact that the
host computer for the imaging system is a widely available commercial product also widens the base for software applications.
For greater speed and ability, engineers often turn to plug-in boards which
have a specific functionality, such as real-time edge detection, binary correlation
and convolution. Typically, the host computer for such boards would
be a VME rack fitted with a CPU card. Quite often, such special-purpose
boards are pipelined, i.e. they perform different operations on an image in a
sequential manner. This allows a new image to be captured while the previous
image is still undergoing processing. The main advantage of such systems is
their speed and the ability to increase the system's image throughput rate by
the addition of extra plug-in boards. The disadvantage of such systems is that
they can be difficult to program and quite often require programmers with
highly specialist skills. There is also a significant cost factor involved in the
capital equipment, along with the application development costs. While the
majority of dedicated plug-in boards for pipelined systems are tuned to deal
with array CCD cameras, newer systems have appeared on the market that
are specifically designed for line-scan cameras.
Some system manufacturers have taken the option of designing specific
machine vision engines which are not tuned for a specific application, but
rather designed for their general functionality. Such systems may be totally
self-contained and ready to install in an industrial environment. That is, they
contain the imaging optics, camera, imaging engine and interfaces for various mechanical actuators and sensors. They differ from turn-key systems in
that the software supplied with the self-contained system has yet to be
moulded into a form that would solve the vision application. Such systems
have significant advantages, the main one being speed. The majority of
self-contained systems are custom designed, although they may contain some
plug-in boards and are tuned to provide whatever functionality is required
by the application. The self-contained nature of the mechanical, image acquisition and display interfaces is also a significant benefit when installing
vision systems. However, it can be difficult to add further functionality at a
later date without upgrading the system.
Turn-key vision systems are self-contained machine vision systems, designed for a specific industrial use. While some such systems are custom designed, many turn-key systems contain commercially available plug-in cards.
Turn-key systems tend to be designed for a specific market niche, such as
paper inspection. So, not only is the hardware tuned to deal with high-speed
image analysis applications, it is also optimised for a specific imaging task.
While the other systems discussed usually require significant development
to produce a final solution for an imaging application, turn-key systems are
fully developed, although they need to be integrated into the industrial environment. This should not be taken lightly, as it can often be a difficult task.
Also, it may not always be possible to find a turn-key system for a specific
application.

1.3 Vision System Software


There are a large number of image processing and analysis packages currently
available. Several of these packages are freely available over the Internet,
see Machine Vision Resources2 for general reference material relating to the
design and development of commercial machine vision systems. Some of these
packages are tightly tied to a given vision system, while others are compiled
for a number of host computers and operating systems. The majority of
the software packages have interactive imaging tools that allow ideas to be
tested prior to customizing the software for a given application. A number
of tutorial books have also been written which implement a large number of
image processing functions and to a lesser extent image analysis functions, in
a range of software languages (Parker 1997, Klette & Zamperoni 1996, Pitas
1993, Myler & Weeks 1993, Lindley 1991).
For more information on the hardware and software aspects of real-time
imaging, including a survey of commonly used languages, see Dougherty &
Laplante (1995).
2 http://www.eeng.dcu.ie/~whelanp/resources/resources.html

1.4 Machine Vision System Design


In this section we outline the key components found in the majority of machine
vision systems (Fig. 1.2). While the application itself will determine the
relative importance of each of these tasks, successful implementations will
have taken all these issues into consideration during the system design. The
software aspects of these design stages are expanded upon throughout the
course of this book.

[Figure 1.2: block labels include Image Interpretation and Mechanical Interface,
Image Processing and Analysis, Image Sensor, Lighting and Optics, Part Feeding
and Mechanical Interface, with data flow and feedback paths between them.]

Fig. 1.2. Machine vision system components.

1.4.1 Image Acquisition

Image acquisition generally involves the capture of the analogue image signal
(although digital cameras can also be used) and the generation of a one-dimensional (1-D) or two-dimensional (2-D) array of integers representing
pixel brightness levels. In the majority of machine vision applications a solid
state sensor based camera is employed. This is made of discrete elements
that are scanned to produce an image. Examples of such sensors are Charge
Coupled Devices (CCD) and Charge Injection Devices (CID). CCD based
cameras are the most commonly used solid state sensor due to their small size,
low power consumption requirements, wide spectral response and robustness.
These cameras are available in both line scan and area array configurations.
Line scan cameras are suitable for high resolution scanning applications where
the object moves beneath a fixed camera position, for example a paper web.
Area array cameras produce a medium to high resolution 2-D snapshot of a
scene (similar to a TV image).
It is worth noting that there have been major advances in other component technologies: specialised lighting units, lenses and advisor programs,
which guide a vision engineer through the initial stages of the design process.

While the acquisition of image data may not be seen as directly related to
image analysis and processing, the design decisions involved in this stage (i.e.
camera type, lighting and optical system design) have a fundamental bearing
on the system's software. For example, in Fig. 1.3 we have applied two different
lighting techniques to the same scene. Given the task of counting the number
of candies in the image, which lighting configuration should be adopted?
While the image in (b) gives a cleaner image, we would have to apply complex
image segmentation techniques (such as the one discussed in Sec. 4.4.2) to
separate the touching items prior to counting. Whereas the image in (a) may
initially seem to be of little use due to the bright reflections evident on each
candy piece. In fact the lighting unit was specifically designed to produce
this effect. This lighting configuration simplifies the image processing and
analysis tasks as all that is required of the software is to isolate and count
these reflections (Chap. 3), both relatively straightforward imaging tasks.
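To give a flavour of how simple the resulting software task can be, the sketch below thresholds a grey scale image held as an integer array and counts the bright blobs with an iterative flood fill. It illustrates only the isolate-and-count idea mentioned above; the array-based image format, the fixed threshold of 128 and the 4-connectivity rule are assumptions made for this example and are not part of the NeatVision API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Counts bright reflections by thresholding and labelling 4-connected blobs. */
public class ReflectionCounter {

    /** Returns the number of 4-connected regions whose pixels exceed the threshold. */
    public static int countBrightBlobs(int[][] image, int threshold) {
        int height = image.length, width = image[0].length;
        boolean[][] visited = new boolean[height][width];
        int blobs = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (image[y][x] > threshold && !visited[y][x]) {
                    blobs++;
                    floodFill(image, visited, x, y, threshold);
                }
            }
        }
        return blobs;
    }

    /** Iterative flood fill marking all bright pixels connected to (startX, startY). */
    private static void floodFill(int[][] image, boolean[][] visited,
                                  int startX, int startY, int threshold) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] { startX, startY });
        visited[startY][startX] = true;
        int[][] offsets = { { 1, 0 }, { -1, 0 }, { 0, 1 }, { 0, -1 } };
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            for (int[] d : offsets) {
                int nx = p[0] + d[0], ny = p[1] + d[1];
                if (nx >= 0 && ny >= 0 && ny < image.length && nx < image[0].length
                        && !visited[ny][nx] && image[ny][nx] > threshold) {
                    visited[ny][nx] = true;
                    stack.push(new int[] { nx, ny });
                }
            }
        }
    }

    public static void main(String[] args) {
        int[][] toy = {
            { 0, 0, 200, 0, 0 },
            { 0, 0, 210, 0, 0 },
            { 0, 0, 0, 0, 190 },
            { 180, 0, 0, 0, 195 }
        };
        System.out.println(countBrightBlobs(toy, 128)); // prints 3
    }
}
```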

Fig. 1.3. Different lighting techniques applied to candy sweets. (a) Ring Light
(Light Background). (b) Back-lighting; this produces a silhouette of the scene.

1.4.2 Image Representation

This relates to the means of representing the image brightness data in a form
that enables the image processing and analysis design process to be implemented efficiently. We shall first consider the representation of monochrome
(grey scale) images. Let x and y denote two integers where 1 ≤ x ≤ m and
1 ≤ y ≤ n. In addition, let f(x, y) denote an integer function such that
0 ≤ f(x, y) ≤ W (W denotes the white level in a grey scale image). An array
F will be called a digital image, where an address (x, y) defines a position in
F, called a pixel, or picture element. The elements of F denote the intensities
within a number of small rectangular regions within a real (i.e. optical) image. Strictly speaking, f(x, y) measures the intensity at a single point, but if
the corresponding rectangular region is small enough, the approximation will
be accurate enough for most purposes. The array F contains a total of m x n
elements and this product is called the spatial resolution of F (Fig. 1.4).
For example, when m = n ≥ 128 and W ≥ 64, we can obtain a good
image of a human face. Many of the industrial image processing systems
in use nowadays manipulate images in which m = n = 512 and W = 255
(Fig. 1.5). This leads to a storage requirement of 256 Kbytes/image. A binary
image is one in which only two intensity levels, black (0) and white (1), are
permitted (Fig. 1.5(d)). This requires the storage of m x n bits/image.
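The representation described above maps directly onto a Java array. The following minimal sketch holds a grey scale image as an m x n array of integers in the range 0..W and derives a binary image by thresholding; the class and method names are illustrative only and do not belong to the NeatVision API.

```java
/** Minimal sketch of the grey scale image representation described above:
 *  an m x n array of integers in the range 0..W, here with W = 255. */
public class GreyImageSketch {
    static final int WHITE_LEVEL = 255; // W

    /** Thresholds a grey scale image into a binary image (0 = black, 1 = white). */
    static int[][] threshold(int[][] f, int t) {
        int m = f.length, n = f[0].length;
        int[][] binary = new int[m][n];
        for (int x = 0; x < m; x++)
            for (int y = 0; y < n; y++)
                binary[x][y] = (f[x][y] >= t) ? 1 : 0;
        return binary;
    }

    public static void main(String[] args) {
        int m = 512, n = 512;
        int[][] f = new int[m][n];  // f(x, y), all zero here
        f[10][20] = 200;            // one bright pixel for illustration
        int[][] b = threshold(f, 128);
        // Storage for an 8-bit image: m * n bytes = 262144 bytes = 256 Kbytes.
        System.out.println("Grey scale storage (bytes): " + (m * n));
        System.out.println("Binary pixel at (10,20): " + b[10][20]);
    }
}
```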

Fig. 1.4. A digital image consisting of an array of m x n pixels. The pixel in the
xth column and the yth row has an intensity equal to f(x, y).

An impression of colour can be conveyed to the eye by combining different
colours of light. The signal from a colour television camera may be represented
using three components (red, green and blue): R = r(x, y); G = g(x, y); B =
b(x, y), where R, G and B are defined in a similar way to F. The vector
(r(x,y),g(x,y),b(x,y)) defines the intensity and colour at the point (x,y) in
the colour image. Colour image analysis is discussed in more detail in Chap. 6.
Multispectral images can also be represented using several monochrome images. Video sequences are used to represent and analyse movement within
or across scenes. A video sequence is, in effect, a time-sampled representation of the original moving scene (see Blake & Isard (1998) for details of
one approach to image sequence analysis). Each frame in the sequence is a
standard colour, or monochrome image and can be coded as such. Thus, a
monochrome video sequence may be represented digitally as a sequence of
2-D arrays [F1, F2, F3, ..., FN].
While it is beyond the scope of this book to detail the various image
coding techniques that are now available (see Rao & Hwang (1996) for further
details on video coding), two basic coding mechanisms have been employed in
machine vision, namely run-length and chain coding (these will be discussed
in Chap. 3).
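As an illustration of the three-component colour representation R = r(x, y), G = g(x, y), B = b(x, y) introduced above, the sketch below unpacks packed 0xAARRGGBB pixels (the integer layout used by java.awt.image) into separate red, green and blue planes. The class name and plane-array layout are assumptions made for this example.

```java
/** Sketch of the three-component colour representation, unpacked from packed
 *  0xAARRGGBB pixels. The layout and names are illustrative only. */
public class ColourPlanes {
    final int[][] r, g, b;

    ColourPlanes(int[] packedPixels, int width, int height) {
        r = new int[width][height];
        g = new int[width][height];
        b = new int[width][height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int p = packedPixels[y * width + x];
                r[x][y] = (p >> 16) & 0xFF;  // red component r(x, y)
                g[x][y] = (p >> 8) & 0xFF;   // green component g(x, y)
                b[x][y] = p & 0xFF;          // blue component b(x, y)
            }
        }
    }

    public static void main(String[] args) {
        int[] pixels = { 0xFF112233, 0xFF445566 };   // 2 x 1 toy image
        ColourPlanes planes = new ColourPlanes(pixels, 2, 1);
        System.out.println(planes.r[1][0]);          // prints 68 (0x44)
    }
}
```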

Fig. 1.5. Image resolution and binary threshold. (a) to (c) Bottle-top grey scale
image at varying resolutions. (d) Binary version of the bottle-top image.

1.4.3 Image Processing

This is an image-to-image operation. It aims to produce images that will make
the analysis stage simpler and more robust. Key to the image processing task is
the ability to segment (Fig. 1.6) and enhance the features to be analysed.
This can be a difficult task when dealing with complex scenes. The two main
classes of operations applied to images during processing are point to point
and neighbourhood operations. A wide range of image processing functions
are detailed in Chaps. 3, 4, 5 and 6.
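The two classes of operation mentioned above can be illustrated with a short sketch: a point-to-point operation (intensity inversion), in which each output pixel depends only on the corresponding input pixel, and a neighbourhood operation (a 3 x 3 mean filter), in which each output pixel depends on a small window of input pixels. The int[rows][cols] image layout and the choice to leave borders unchanged are assumptions made for this illustration.

```java
/** Illustrative point-to-point and neighbourhood operations on a grey scale
 *  image held as an int[rows][cols] array with values 0..255. */
public class BasicOps {

    /** Point operation: each output pixel depends only on the input pixel. */
    static int[][] invert(int[][] f) {
        int rows = f.length, cols = f[0].length;
        int[][] out = new int[rows][cols];
        for (int y = 0; y < rows; y++)
            for (int x = 0; x < cols; x++)
                out[y][x] = 255 - f[y][x];
        return out;
    }

    /** Neighbourhood operation: 3x3 mean filter (image borders left unchanged). */
    static int[][] mean3x3(int[][] f) {
        int rows = f.length, cols = f[0].length;
        int[][] out = new int[rows][cols];
        for (int y = 0; y < rows; y++)
            for (int x = 0; x < cols; x++)
                out[y][x] = f[y][x];
        for (int y = 1; y < rows - 1; y++) {
            for (int x = 1; x < cols - 1; x++) {
                int sum = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        sum += f[y + dy][x + dx];
                out[y][x] = sum / 9;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] f = { { 10, 20, 30 }, { 40, 50, 60 }, { 70, 80, 90 } };
        System.out.println(invert(f)[0][0]);   // prints 245
        System.out.println(mean3x3(f)[1][1]);  // prints 50
    }
}
```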
1.4.4 Image Analysis

Image analysis involves the automatic extraction of useful information from
an image. This information must be both explicit and useful in the subsequent
decision making process. Common image analysis techniques include template
matching, pattern recognition using feature extraction and descriptive
syntactic processes.

Fig. 1.6. Image processing applied to a paper watermark image. (a) Original image.
(b) Extraction of the watermark from (a).

Template matching involves comparing an ideal representation of a pattern
or object to segmented regions within the image. This can be carried out
on a global basis, or locally, whereby we can use several local features, such
as corners (Fig. 1.7). The key problems with this approach are that it cannot
easily deal with scale and/or orientation changes and that it can be computationally
expensive. This approach to image analysis was popular in the
early days of machine vision, but went out of favour due to the problems
outlined. However, the introduction of faster computing power and custom
VLSI devices has caused many researchers to re-evaluate the possibility of
using template matching for industrial imaging tasks.
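A minimal global template-matching sketch is given below. It slides a template over the image and keeps the position with the lowest sum of absolute differences (SAD); this is one common formulation chosen for illustration and is not necessarily the formulation developed later in the book. As noted above, it handles neither scale nor orientation changes.

```java
/** Minimal global template matching: exhaustively slides a template over a
 *  grey scale image and returns the position with the lowest SAD score. */
public class TemplateMatch {

    /** Returns {bestX, bestY} of the top-left corner of the best match. */
    static int[] bestMatch(int[][] image, int[][] template) {
        int ih = image.length, iw = image[0].length;
        int th = template.length, tw = template[0].length;
        long bestScore = Long.MAX_VALUE;
        int bestX = 0, bestY = 0;
        for (int y = 0; y <= ih - th; y++) {
            for (int x = 0; x <= iw - tw; x++) {
                long score = 0;
                for (int ty = 0; ty < th; ty++)
                    for (int tx = 0; tx < tw; tx++)
                        score += Math.abs(image[y + ty][x + tx] - template[ty][tx]);
                if (score < bestScore) {
                    bestScore = score;
                    bestX = x;
                    bestY = y;
                }
            }
        }
        return new int[] { bestX, bestY };
    }

    public static void main(String[] args) {
        int[][] image = {
            { 0, 0, 0, 0 },
            { 0, 9, 8, 0 },
            { 0, 7, 9, 0 },
            { 0, 0, 0, 0 }
        };
        int[][] template = { { 9, 8 }, { 7, 9 } };
        int[] best = bestMatch(image, template);
        System.out.println(best[0] + "," + best[1]);  // prints 1,1
    }
}
```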

Fig. 1.7. Local template matching. In the image of the rectangle each of the local
templates is associated with corners within the scene. When the image suffers a
transformation the allocation of the local templates to the scene becomes a more
difficult task.

In feature extraction based pattern recognition, images are described in
terms of their representative features. This has been a dominant form of image
analysis in machine vision over the last ten years. This involves segmenting
the image into its constituent parts. Once this has been completed the key
features (e.g. the corners within an image) are extracted and then classified.
See Chaps. 3, 4, 5 and 6 for details on a wide range of image analysis functions.

Descriptive syntactic processes involve a linguistic approach to image
analysis. They model an object by a set of basic elements and their (spatial)
relationship. A key problem with this approach is that it lacks precision
and as such has found little favour with machine vision designers.
Fig. 1.8 summarises some of the key stages involved in a machine vision
system. Once an image is captured, processing is applied to highlight the
features of interest and reduce the amount of data that has to be processed.
The key features, i.e. the star tips in this example, are then extracted. Finally, a scalar value is generated to indicate the number of star points. This
will be compared to a predefined value to determine if the part has been
manufactured correctly.

Fig. 1.8. A typical sequence of operations found in machine vision. The aim of the
algorithm is to count the number of points on the star so that it can be compared to
a predefined value and hence a decision on its suitability can be made. (a) Original
binary star image. (b) Image processing: The image is thinned (see Sec. 3.3.2) a
predefined number of times to reduce the points on the stars to a single pixel width.
(c) Image analysis: Limb end analysis (see Sec. 3.3.2) is then applied to isolate each
point (these are shown as small white squares). The points are then counted (see
Sec. 3.3) prior to classification.

1.4.5 Image Classification

Objects are classified based on the set of features extracted. Key to the image
classification task is the fact that similarity between objects implies that they
contain similar features, which in turn form clusters. A number of image classification techniques including the Nearest Neighbour, K-Nearest Neighbour,
Maximum-likelihood and Neural Networks classifiers are discussed. We also
examine the Confusion Matrix and Leave One-out methods of evaluating the
performance of these classifiers.
But how do we know we have identified suitable features? Ideally, extracted features should have four characteristics (Castleman 1979).

Discrimination: Each feature should take on significantly different values for
objects belonging to different classes. For example, object size in sorting
A4 and A5 sheets of paper.
Reliability: Features should take on similar values for all objects of the same
class. For example, an apple may appear green or red, depending on
ripeness, therefore features may look different even though they appear
in the same class.
Independence: Various features used should be uncorrelated with each other.
For example, the diameter and weight of fruit could be highly correlated,
both reflecting the same property, i.e. size. Highly correlated features
may be combined (i.e. averaging to reduce sensitivity to noise) but they
should not be used as separate features.
Small feature set: The complexity of a pattern recognition system increases rapidly
with its dimensionality (i.e. the number of features used). Also,
the number of objects required to train and measure the performance of
the classifier increases exponentially with the number of features present.
Adding more features which are noisy or highly correlated can degrade
system performance.
There are two main approaches to the problem of classification: supervised
(i.e. when all the classes are known and can be used to train the classifier) and
unsupervised classification (i.e. when the number of classes is unknown).
Supervised Classification. Supervised classification requires a priori information about the input data patterns. Usually this information is based on
an underlying statistical distribution of the patterns, or on a set of patterns
with the correct classification, the latter being produced by a human expert
who supervises the choice of learning cases (Weiss & Kulikowski 1990). There
are two phases in the supervised classification process: learning (or training)
and testing. During the training stage patterns are sequentially presented to
the classifier, which modifies its internal properties accordingly. Extracting
the training set of patterns to achieve optimal training is generally a difficult
task and it is very data dependent. The training set should be fully representative of the populations from which it is drawn (Davies 1996). Once the
training step has been completed, the system can be used in the testing phase
where the classifier is presented with new test patterns. This set of patterns
is called a validation set. Examples of this form of classification include Nearest neighbour, Maximum-likelihood and the application of Artificial Neural
Networks.

Nearest Neighbour Classifier (NNC): Classifies an unknown object by assigning it to the class of the closest reference cluster. This is also referred to as the Maximum Similarity Classifier (Vernon 1991). This approach is simple to implement and hence popular, although incorrect classification is possible. It is also sensitive to noise. Assume an object described by


the vector X = (X1, X2, ..., Xn) and a set of m reference vectors, describing archetypal objects of each class, denoted by Y1, Y2, ..., Ym, where Yi = (Yi,1, Yi,2, ..., Yi,n). Similarity between the objects represented by X and Yi can be assessed by measuring the Euclidean distance D(X, Yi) between them:

D(X, Yi) = sqrt( (X1 - Yi,1)^2 + (X2 - Yi,2)^2 + ... + (Xn - Yi,n)^2 )

The larger D(X, Yi) is, the smaller the similarity between the patterns they represent. Thus, an object of unknown type represented by a vector X can be attributed to an appropriate class by finding which of Y1, Y2, ..., Ym is closest to X.
The K-Nearest Neighbour (KNN) is an extension to NNC which tries to
obtain the k nearest neighbours in the training set and then labels them
using majority voting, i.e. assigns a class C to a pixel X if the majority of
the known neighbours of X are of class C. This approach is more robust
to noise than the NNC. k is usually chosen to be quite small (k = 3,5
are commonly used) and tends to be dependent on the amount of data
that is available.
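As an illustration, the following is a minimal sketch of a nearest neighbour classifier based on the Euclidean distance above; the class and method names are hypothetical and are not part of the NeatVision library.

    // Minimal nearest neighbour classifier sketch (illustrative only).
    public class NearestNeighbour {

        // Euclidean distance D(X, Yi) between two feature vectors.
        public static double distance(double[] x, double[] y) {
            double sum = 0.0;
            for (int k = 0; k < x.length; k++) {
                double d = x[k] - y[k];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }

        // Return the index of the reference vector Yi closest to the unknown
        // object x; with one archetype per class this index identifies the class.
        public static int classify(double[] x, double[][] references) {
            int best = 0;
            double bestDistance = distance(x, references[0]);
            for (int i = 1; i < references.length; i++) {
                double d = distance(x, references[i]);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = i;
                }
            }
            return best;
        }
    }

A KNN variant would instead collect the k smallest distances over the whole training set and assign the class by majority vote among those k neighbours.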
Maximum-likelihood Classifier: Also referred to as Bayesian classification (Vernon 1991). Assume two classes of objects, C1 and C2, with a single feature used to distinguish these classes (e.g. the number of holes, x). This technique involves finding the Probability Density Function (PDF) for each class and measuring the probability that an object from a given class will have a given feature value. This is found by measuring x for a large number of samples of each class. Let P[Cn] be the probability of class n occurring (in the range [0, 1]). But we require the probability that an object belongs to class n given a measured value of x, i.e. P[Cn|x]. An object belongs to class 2 if P[C2|x] > P[C1|x], where P[x|C2] is the probability that feature x will occur given that the object belongs to class 2. Using Bayes' Theorem, and assuming that P[x] (a normalisation factor) is the same for both classes and that P[C1] = P[C2], this condition reduces to P[x|C2] > P[x|C1]. If this holds, the object is assigned to class 2 on the basis of the measurement of the feature x.
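The decision rule can be sketched as follows, assuming the class-conditional probabilities P[x|C1] and P[x|C2] have already been estimated (e.g. from histograms of training measurements); the names used are illustrative only.

    // Maximum-likelihood (Bayesian) decision for two classes - illustrative sketch.
    // pXgivenC1[x] and pXgivenC2[x] are the estimated probabilities of measuring
    // feature value x for each class; pC1 and pC2 are the prior class probabilities.
    public static int classify(int x, double[] pXgivenC1, double[] pXgivenC2,
                                double pC1, double pC2) {
        // Bayes: P[Cn|x] is proportional to P[x|Cn] * P[Cn] (P[x] cancels out).
        double score1 = pXgivenC1[x] * pC1;
        double score2 = pXgivenC2[x] * pC2;
        return (score2 > score1) ? 2 : 1;   // assign to the more probable class
    }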
Artificial Neural Networks (ANN): Artificial Neural Networks are relatively
straightforward to design and are capable of dealing with pattern complexities not discernible by other classifiers. During the training phase
we present the input data to the network, calculating its outputs. The
outputs are then compared with the expected results. The weights of the
network are then adjusted. This should force the network to converge
and thus learn how to discriminate varying inputs as they are presented.
Each cycle of presenting the input, calculating the output, comparing
results and changing weights is referred to as an epoch. Unfortunately
the training phase can take a lot of time and we have no guarantee of


convergence. Generally it is advisable to divide the training set into two sections. The first section is used to train the network until the results are acceptable. The network is then run with the remaining section to see if it is performing correctly.
The classification stage is similar to training. All data is passed through the network, thus generating an output. The training stage should have modified the weights so that they generate the appropriate outputs when presented with inputs similar to the training set. The activation values of the outputs determine the class for that particular data vector. There are a number of key problems associated with this approach. The training algorithm can be very slow, even for relatively simple cases. There are many internal parameters to adjust and their values may strongly influence the outputs. It is also difficult to design a training set that avoids overfitting or the network becoming trapped in a local minimum.
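Back-propagation training of a full network is beyond a short example, but the epoch cycle of presenting an input, calculating the output, comparing the result and adjusting the weights can be illustrated with a single artificial neuron (a perceptron). This is a simplified sketch only, not the multi-layer classifiers discussed above.

    // Single-neuron (perceptron) training sketch illustrating epochs and
    // weight adjustment; targets are expected to be 0 or 1.
    public static double[] train(double[][] inputs, int[] targets,
                                 int epochs, double rate) {
        int n = inputs[0].length;
        double[] w = new double[n + 1];            // weights; last entry is the bias
        for (int e = 0; e < epochs; e++) {         // one pass through the data = one epoch
            for (int p = 0; p < inputs.length; p++) {
                double sum = w[n];                 // bias term
                for (int k = 0; k < n; k++) sum += w[k] * inputs[p][k];
                int output = (sum >= 0.0) ? 1 : 0; // calculate the output
                int error = targets[p] - output;   // compare with the expected result
                for (int k = 0; k < n; k++) w[k] += rate * error * inputs[p][k];
                w[n] += rate * error;              // adjust the weights
            }
        }
        return w;
    }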
Unsupervised Classification. Unsupervised classification is a much more difficult task because the definition of the classes, and in many cases even the number of classes, is unknown. The goal in this case is to identify the clusters of patterns that are similar, thus identifying the potential classes. This problem is often referred to as unsupervised learning or clustering. The Jain & Dubes (1988) text on algorithms for clustering data is a good introduction to this area. There are two main categories of algorithms available in cluster analysis: agglomerative algorithms and divisive algorithms. Agglomerative algorithms start by considering every point in feature space as a single cluster and then group the points using some similarity function. The divisive approach views the set of features as a single cluster and partitions it until a predefined criterion is reached. Key to obtaining good results is the correct selection of the clustering algorithm and clustering criterion.
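The agglomerative idea can be sketched as follows: every point starts in its own cluster and the two closest clusters (here using the single-linkage distance) are repeatedly merged until a chosen number of clusters remains. This is an illustrative fragment only; the names are hypothetical.

    // Single-linkage agglomerative clustering sketch (illustrative only).
    public class Agglomerative {

        static double dist(double[] a, double[] b) {
            double s = 0.0;
            for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
            return Math.sqrt(s);
        }

        // On return, label[i] gives the cluster number of points[i].
        public static int[] cluster(double[][] points, int k) {
            int n = points.length;
            int[] label = new int[n];
            for (int i = 0; i < n; i++) label[i] = i;   // one cluster per point
            int clusters = n;
            while (clusters > k) {
                int bestA = -1, bestB = -1;
                double best = Double.MAX_VALUE;
                // single linkage: closest pair of points lying in different clusters
                for (int i = 0; i < n; i++)
                    for (int j = i + 1; j < n; j++)
                        if (label[i] != label[j]) {
                            double d = dist(points[i], points[j]);
                            if (d < best) { best = d; bestA = label[i]; bestB = label[j]; }
                        }
                for (int i = 0; i < n; i++)             // merge cluster bestB into bestA
                    if (label[i] == bestB) label[i] = bestA;
                clusters--;
            }
            return label;
        }
    }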
Evaluation of Classifier Performance. The main goal in designing a classifier is to obtain good prediction and classification of new data. This is usually assessed by measuring misclassification: when the classifier is presented with a new case and makes a wrong decision, this contributes to the error rate (i.e. the number of misclassifications divided by the number of cases).

Confusion Matrix: This is the simplest way to present the misclassification error. The element ni,j of the matrix contains the number of patterns of class i classified as belonging to class j. The main diagonal of the matrix contains the number of correct classifications. If all the errors are treated equally, then the overall error rate is 1 - (Σi ni,i)/N, where N is the total number of test patterns.
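The overall error rate can be computed directly from such a matrix, as in the following illustrative fragment (the matrix layout follows the definition above).

    // Overall error rate from a confusion matrix n[i][j], where n[i][j] is the
    // number of class-i patterns classified as class j (illustrative sketch).
    public static double errorRate(int[][] n) {
        int correct = 0, total = 0;
        for (int i = 0; i < n.length; i++)
            for (int j = 0; j < n[i].length; j++) {
                total += n[i][j];                 // N, the total number of test patterns
                if (i == j) correct += n[i][j];   // main diagonal = correct classifications
            }
        return 1.0 - (double) correct / total;
    }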
Leave One-out Method: Given the number of patterns N, the classifier is
trained using N - 1 patterns. The remaining one is used as a test pattern. This procedure is repeated until all of the N patterns are classified.


The error rate is simply Nmiss/N, where Nmiss is the number of misclassifications. This method is an almost unbiased estimator of the true error rate of the classifier, even for a small number of samples. The method is superior to the confusion matrix for evaluating a system's performance, although it can be computationally expensive for a large number of cases.
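The procedure can be sketched as follows, here using the nearest neighbour rule (and the distance() method from the earlier sketch) as the underlying classifier; the helper names are hypothetical.

    // Leave one-out estimate of the error rate (illustrative sketch). Each of
    // the N patterns is classified using the remaining N-1 as the training set.
    public static double leaveOneOut(double[][] patterns, int[] classes) {
        int misses = 0;
        int N = patterns.length;
        for (int held = 0; held < N; held++) {
            int best = -1;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < N; i++) {
                if (i == held) continue;                      // leave this pattern out
                double d = NearestNeighbour.distance(patterns[held], patterns[i]);
                if (d < bestDist) { bestDist = d; best = i; }
            }
            if (classes[best] != classes[held]) misses++;     // count Nmiss
        }
        return (double) misses / N;                           // Nmiss / N
    }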
1.4.6 Systems Engineering

The complexity of a machine vision application is largely a reflection of the complexity of the environment in which it finds itself (Simon 1981). Therefore, a successful vision application requires a total systems approach that
necessitates a range of engineering and practical skills to deal with such a
complex industrial environment. When faced with a specific application requirement, the designer should always consider analysing the problem from a
systems engineering perspective. By adopting a systems approach, the maximum use is made of problem-specific "contextual" information, derived, for
example, from the nature of the product being handled, the process used to
manufacture it and the special features of the manufacturing environment. By
doing so, it is hoped that the complexity of the application can be reduced,
or in some cases made trivial.
Researchers must also be open to the idea that vision may not be the
correct approach for the task at hand. Some tasks that the end-user may
see as a suitable application for machine vision, may be better served by using other engineering approaches, such as the use of mechanical sensors and
electronic transducers. The problem with some of the unsuccessful vision applications of the past has been the application of vision technology for its
own sake. For machine vision to become generally accepted by the manufacturing community, it must concentrate on tackling the industrial problems
rather than trying to implement a given technology for its own sake. A growing number of researchers have argued for the design of vision systems to
be firmly placed back into a systems engineering framework (Batchelor &
Whelan 1997, Haralick 1992, Pavlidis 1992). This arises from the belief that
an inadequate amount of vision research deals with the genuine design and
systems problems involved in the implementation of industrial vision systems.
To advance from the current generation of machine vision systems to a
new, more flexible family requires addressing a number of key issues:
Development of adaptive machine vision systems.
Application of a systems engineering approach to industrial vision tasks.
Maximise the use of contextual information available from the product,
process and application environment.
The production of standard solutions to industrial problems.
Tackling of sub-goals.
Widening of the application base.
The use of vision in a process and quality control role.


Performance characterisation tools.


Ability to deal with unclear or missing information.
Systematic testing and repeatable experimental results.
Generic tools to deal with common analysis features such as shape, size,
colour and texture.
Investigation of algorithmic and heuristic procedures.
Flexible, user friendly interfaces.
Broader education of the systems issues.

The general issues involved in the machine vision engineering task have been detailed elsewhere (Batchelor & Whelan 1994b, Batchelor & Whelan 1997).

1.5 NeatVision: Where Java meets Machine Vision


NeatVision is a visual programming software development environment for use in the design and development of machine vision algorithms and techniques. Its intuitive interface is achieved using a drag and drop, block diagram approach where each image processing operation is represented by a graphical block with inputs and outputs which can be interconnected, edited and deleted as required. NeatVision was written in Java, a platform independent programming language, to allow it to be used on a wide range of computer platforms. Java provides accessibility, hence reducing the workload and increasing the deliverables in terms of cross-platform compatibility and an increased user base.
NeatVision is designed to work at two levels. The user level allows the design of imaging solutions within the visual programming environment using NeatVision's core set of functions. Currently3 NeatVision contains over 200 image processing and analysis functions, ranging from pixel manipulation to colour image analysis. A full listing of all the functions available can be found on the NeatVision web site4. At the more advanced developer's level, NeatVision allows users to integrate their own functionality, i.e. to upgrade through the introduction of new image processing modules (Chap. 7). The reader is encouraged to use this book in conjunction with NeatVision and to experiment with the techniques provided with a view to solving imaging tasks.
NeatVision is just one example of such a visual programming development environment for machine vision; other notable examples include Khoros (Khoral Research, Inc.)5 and WiT (Logical Vision)6.
3 NeatVision Version 1.0
4 http://www.NeatVision.com/
5 http://www.khoral.com/
6 http://www.logicalvision.com/


1.5.1 NeatVision's Graphical User Interface (GUI)


The NeatVision GUI (Fig. 1.9) consists primarily of a workspace where the
processing blocks reside. The processing blocks represent the functionality available to the user. Support is provided for the positioning of blocks
around the workspace, the creation and registration of interconnections between blocks, and support for double clicking. The lines connecting the blocks represent the path of the image through the system, with information flowing from left to right. Some of the blocks can generate child windows, which
can be used for viewing outputs, setting parameters and selecting areas of
interest from an image. If each block is thought of as a function, then the application can be thought of as a visual programming language. The inputs to
each block are similar to the arguments of a function and the outputs from a
block are similar to the return values. The advantage of this approach is that
a block can return more than one value. The image processing system can be
compiled and executed as with a conventional programming language with
errors and warnings being generated depending on the situation. Warnings
are generally associated with the failure to connect blocks correctly.

Fig. 1.9. A typical NeatVision application and its associated visual workspace.

Blocks are selected from the component explorer and placed in the
workspace as required. Interconnections are made between blocks following
a strict 45° snap-to-grid rule system. When the processing system is set up,
input images must be specified. This is performed by double clicking on the
relevant input block causing an image viewer frame to appear. The open
option is then selected from the file menu and the appropriate file name is
chosen by navigating the file dialog box directory structure.
There is currently support for 13 input graphics file formats and 11 output
formats (.gif and .fpx are excluded as they are proprietary standards), see


Appendix A for a full listing. Initialisation of certain block parameters may be required throughout the system. If they are not set, the parameters will
assume their default values. System parameters can be adjusted and the
system may be reset and executed again until the desired response is obtained.
At any stage blocks may be added or removed from the system. Once a visual
programme is executed, each block is highlighted as it is processed. This
enables users to get a sense of the relative speeds of the various processing
options available within the NeatVision environment (Chap. 7).

2. Java Fundamentals

"The Java platform is a fundamentally new way of computing, based


on the power of networks and the idea that the same software should
run on many different kinds of computers, consumer gadgets and
other devices ... " - Sun Microsystems

2.1 The History of Java


Java was initially developed by a team of software engineers at Sun Microsystems under the leadership of James Gosling in 1991 when he investigated the
development of a hardware-independent software platform using C++. Java
was developed as an alternative to C++, to implement control systems for
consumer electronic devices.
C++ was found to be unsatisfactory for this task in many ways, including
the introduction of program bugs in the form of memory leaks. C++ was
dropped and a new language called Oak was developed to solve some of these
inadequacies. This new language, consequently renamed Java, was a powerful
and well structured language waiting for an application. In 1994 the World
Wide Web (WWW) emerged and the Sun Developers used Java as the basis
of a Web browser - beginning the Java/HotJava project1.
Java was perfectly suited for use in WWW applications as the program
code is compact, platform neutral and could be used to generate compact
programs called applets which are capable of being embedded in WWW
pages. In late 1995 Java (beta 2) was released along with the announcement
of JavaScript by Sun and Netscape Corporation. Support continued when
in late 1995 both Microsoft and IBM requested licensing rights from Sun.
In early 1996 Java 1.0 along with JavaScript were officially released on the
Internet.
1 The 'Java' name was actually derived from the name of the programmers' favourite coffee during a brainstorming session (as opposed to the suggested D++). HotJava is a WWW browser developed by Sun to demonstrate the power of the Java programming language.



2.1.1 What Makes Java Platform Independent?


Java programs exist in the form of compiled bytecode. This is similar to machine code except that its target is the Java Virtual Machine (JVM) (Fig. 2.1).
A JVM resides within every Java compatible WWW browser and is available
as a stand-alone application with the Java Run-time Environment (JRE). A
JVM is, in effect, a bytecode interpreting machine running on a hardware machine, translating bytecode to native instructions for that machine (slowing
execution speed). Java bytecode is extremely compact allowing it to be easily
delivered over a network. The JVM in each Web browser is built to the same
specification, requiring that one version of source code be compatible with
many platforms, provided that a Java enabled Web browser exists for that
platform. In reality not all Web browsers implement the exact same JVM
specification, introducing minor inconsistencies when the same Java applet
is viewed using several different browsers, on different platforms.

[Fig. 2.1 layer diagram: Java Programming Language -> Java Compiler (javac) -> Java Runtime System / Java Virtual Machine (java, appletviewer) -> Operating System (Windows NT/98, Solaris, Linux, OS/2, MacOS) -> Host Machine]
Fig. 2.1. Description of the Java Application Programming Interface (API).

2.1.2 The Just-In-Time Compiler


When a Java applet is executed for the first time, the speed at which it
executes may seem disappointing. The reason is that the Java applet's bytecode is interpreted by the JVM rather than compiled to native machine instructions. The solution to this problem lies in the Just-In-Time (JIT)
compiler developed by Microsoft. The JIT compiler reads the bytecode of
the Java applet and converts it into native machine code for the intended
operating system. Once the Java applet is converted into machine code it

2.2 Object-oriented Programming

23

runs like a natively compiled program. The JIT compiler can, in certain
cases, improve the run-time speed of applets and applications by a factor of
10.
2.1.3 The Sun Java Software Development Kit (Java SDK)
Java's standard edition, referred to as Java 2 SDK2, is freely available from http://java.sun.com/. It provides the following key tools:
javac - The Java programming language compiler.
java - The launcher for developed Java applications.
appletviewer - A basic Java enabled browser.
javadoc - Automatic API documentation tool.
jdb - Java debugging tool.
javah - Native development tool for creating 'C' headers.
javap - A class file disassembler.

It also includes a set of advanced tools for security, Remote Method Invocation (RMI) and internationalisation.
The Java SDK 1.2 release also contains the Java Runtime Environment
(JRE), which consists of the JVM, the Java platform core classes and supporting files. It is aimed at developers who wish to distribute their applications
with a runtime environment. This is important as the Java SDK is quite large
and licensing prevents the distribution of the Java SDK. All the examples developed in this book have been compiled and tested with the Java SDK 1.2.2
under Intel x86.

2.2 Object-oriented Programming


Since Java is a well defined Object-oriented (OO) language, it is important to be familiar with some of the basic OO concepts. The Object-oriented Programming (OOP) paradigm allows us to organise software as a collection of
objects that consist of both data structure and behaviour. This is in contrast to conventional programming practice that only loosely connects data
and behaviour. The object-oriented approach generally supports five main
aspects: Classes, Objects, Classification, Polymorphism and Inheritance.
Object-orientation encourages both the modularisation (decomposition
of the application into modules) and the re-use of software (the creation
of software from a combination of existing and new modules). We will now
discuss some of the terminology and concepts associated with OOP.
2 Termed "Java 2" by Sun and refers to version 1.2.x.


2.2.1 Encapsulation

Encapsulation is one of the major differences between conventional structured programming and OOP. It enables methods and data members to be
concealed within an object, in effect making them inaccessible from outside
of that object. It is good programming practice for a module to hide as much
of its internals as possible from other modules.
Encapsulation can be used to control access to member data, forcing its
retrieval or modification through the object's interface, which should always incorporate some level of error checking. In strict OO design, an object's
data members should always be private to the object. All other parts of the
program should never have access to that data. You might like to think of
this as a television, with the internal electronics protected by a case on which
controls are available. The controls provide our interface to the television,
while the case prevents unauthorised access (and indeed electrocution).
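This idea can be sketched with a small illustrative class (not taken from the NeatVision code base): the member data is private and can only be modified through a method that performs error checking.

    // Encapsulation sketch: the state is private and is only reachable
    // through the public interface, which performs error checking.
    public class Television {
        private int volume = 5;        // hidden internal state

        public int getVolume() {
            return volume;
        }

        public void setVolume(int newVolume) {
            if (newVolume >= 0 && newVolume <= 10) {   // simple error checking
                volume = newVolume;
            }
        }
    }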
2.2.2 Classes

Classes allow complex structures to be represented within a programming language. Classes have two components:
States - (or data) are the values that the object has.
Methods - (or behaviour) are the ways in which the object can interact with its data, i.e. the actions.
An example code segment of a Java class is described as:
public class AnExampleClass {
//place methods and state variables for the class here
}

Objects are then instantiated from a class, in that they exist in the computer's memory, with memory space for the data. Instantiation is the creation of an
object from the class description. If we expand the previous class to contain
states and methods then it will have the following form:
public class AnExampleClass {
    private int anExampleIntegerState;

    public void anExampleMethod()
    {
        // add the method code (implementation) here ...
    }
}


2.2.3 Objects

An object is a discrete, distinguishable entity. For example, myCar is an object, whereas the common properties that exist with myCar and everyOtherCar
may be grouped and represented as a class called Car. Objects can be concrete
(a car, a file on a computer) or conceptual (a database structure) each with
its own individual identity. To create an instance of a class in Java use:
public class TempClass {
    AnExampleClass myExampleClass = new AnExampleClass();
}

This code segment illustrates a temporary class in which we wish to have an object of the class AnExampleClass as described previously. We use the
new keyword to create an instance of the class - in effect, the computer allocates a block of memory with a large enough space to store the object. This
then returns a reference to the block of memory which can be assigned to a
reference variable - in this case myExampleClass.
public class TempClass {
    public void someMethod()
    {
        AnExampleClass myExampleClass;
        // step 1 - declare the reference variable
        myExampleClass = new AnExampleClass();
        // step 2 - use the new keyword to store
        // the object at the myExampleClass reference
    }
}

2.2.4 Inheritance

Object-oriented languages allow a new class to be created by building on some other class. Inheritance enables the creation of a class which is similar
to one previously defined. An inherited class can expand on, or change the
properties of the class from which it is inherited. In traditional programming,
the modification of existing code would be required to develop a function
similar to inheritance, introducing bugs into the software as well as generating
bulky repetitive source code. The level of inheritance can be further controlled
by the use of the keywords public, private and protected.
Fig. 2.2 illustrates a very basic class hierarchy. The base class Vehicle
would contain properties common to all vehicles, such as wheels, engine, body
etc. The derived classes such as Car and Lorry will inherit the properties of


Fig. 2.2. An example Class Hierarchy.

the Vehicle class and can add to or modify these properties. The Lorry class
might add storage properties, or might alter the number of wheels.
The is-a/is-a-part-of relationships are a useful way of viewing the relationship between derived classes and can help clarify the situation when there
is confusion. In Fig. 2.2 we have is-a relationships, such as car is-a vehicle,
lorry is-a vehicle. This is contrary to the is-a-part-of relationship that might
exist in Fig. 2.3. In that case, wheels, body, windows and engine all form an is-a-part-of relationship with the vehicle class, i.e. body is-a-part-of vehicle.
There are no inheritance relationships occurring, rather the 'parts' may be
other classes, used to construct the vehicle class.

Fig. 2.3. The is-a and is-a-part-of relationships.

An example Java code segment showing inheritance:


public class AnotherExampleClass extends AnExampleClass {
//place methods and state variables for the class here
}

The extends keyword allows the creation of a class, AnotherExampleClass, that inherits the states and methods of the base class AnExampleClass. For


example in Fig. 2.3, the class Car extends Vehicle and the class definition
would take the form of public class Car extends Vehicle. The extends
keyword therefore implies an is-a relationship.
2.2.5 Polymorphism

As just discussed, a derived class inherits its functions or methods from the
base class, often including the associated code. It may be necessary to redefine
an inherited method specifically for one of the derived classes, i.e. alter the
implementation. Polymorphism is the term used to describe the situation
where the same method name is sent to different objects and each object
may respond differently.
For example, consider the class diagram in Fig. 2.2. If a method called draw() was written for the Car class to display a picture of a generic car, then in the derived Saloon and Estate car classes the inherited draw() method would not be suitable. Polymorphism allows new methods to be written in these derived classes, with the exact same method name, draw(), to display more suitable pictures of these particular cars. Polymorphism has
advantages in that it simplifies the Application Programming Interface (API)
and allows a better level of abstraction to be achieved.
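The situation can be sketched with a hypothetical fragment based on the class hierarchy of Fig. 2.2:

    // Polymorphism sketch: each derived class provides its own draw() method.
    class Car {
        public void draw() {
            System.out.println("Drawing a generic car");
        }
    }

    class Saloon extends Car {
        public void draw() {                       // overrides Car.draw()
            System.out.println("Drawing a saloon car");
        }
    }

    class Estate extends Car {
        public void draw() {                       // overrides Car.draw()
            System.out.println("Drawing an estate car");
        }
    }

A Car reference may then hold a Saloon or an Estate object, and a call such as someCar.draw() executes the version defined by the actual object.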
Overloading is similar to polymorphism, in that the same method name is used, but with a different number of arguments or different argument types. For example:
int sum(int x, int y)
String sum(String x, String y)

This sum method definition would refer to two functions that perform the
very different operations of summing integers or concatenating strings.
2.2.6 Abstract Classes

An abstract class is one that is incomplete, in that it describes the set of operations of the class, but is missing the actual implementation of the operations. Abstract classes cannot be instantiated, but can be used through inheritance, using polymorphism to define the operations. For instance, using the same example (Fig. 2.2), we may define the class Car as an abstract class, meaning that it cannot be instantiated. It may be useful to do this, forcing a derived class to be instantiated, specifying some form of Car. For example, the draw() method implementation could be removed from the class Car, while still being declared as available. The implementation of this method would then appear in the derived Saloon and Estate car classes.
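In Java this is expressed with the abstract keyword, as in the following hypothetical fragment based on the Fig. 2.2 example:

    // Abstract class sketch: Car declares draw() but leaves its implementation
    // to the derived classes; Car itself cannot be instantiated.
    abstract class Car {
        public abstract void draw();   // described as available, but not implemented here
    }

    class Saloon extends Car {
        public void draw() {           // the derived class supplies the implementation
            System.out.println("Drawing a saloon car");
        }
    }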


2.3 Java Language Basics


2.3.1 Variables
Variables are locations in the computer memory where values can be stored.
int someNumber;
String someString;

Variables can be local variables, class member variables or class (static) variables:


Local variables: The most common variables. These are generally used as
loop indices and temporary variables. They are declared, exist and used,
within the class methods. Local variables must be initialised.
Class member variables: Exist within the instance of the class and can be
used to transfer values between the multiple methods of a class. Every
object that is created from a class has individual storage space for its
own class member variables, i.e. changing the value of a class member
variable of one object does not alter that value in any other object. They
are initialised to default values.
Class variables: These are often referred to as static variables. These variables are attached to the class rather than the object. In other words, if
we alter the value of this variable in any instance of the class, then the
value will change for every instance of the class. They are initialised to
default values.
public class AnExampleClass {
    static int aClassVariable;
    int aClassMemberVariable;

    public void someMethod()
    {
        int aLocalVariable;
    }
}

There are no global variables in Java; however, class variables can be used to communicate global values.

2.3.2 Access Control


Java allows us to assign levels of access control to the methods and states of
a class. The common modifiers are:
Public: A state or method modified with the keyword public is available
anywhere that the defined class is available.
Private: A state or method modified with the keyword private is only available within the body of the class in which it is defined.


Friendly: Is the default access control level of all states and methods. Unless
modified by public or private, these states and methods are available
throughout the package that contains the class in which they are defined.
Protected: A state or method modified with the keyword protected is available in the class, in its derived classes and in the package that contains
that class.
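The four levels can be illustrated within a single sketch class (a hypothetical fragment only):

    // Access control sketch: each member carries a different access level.
    public class AccessExample {
        public int publicState;        // visible wherever the class is visible
        private int privateState;      // visible only inside this class
        int friendlyState;             // default ("friendly"): visible within the package
        protected int protectedState;  // visible in this class, its derived classes
                                       // and the containing package

        private void internalMethod() { /* only callable from within AccessExample */ }
        public void publicMethod()    { internalMethod(); }
    }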
2.3.3 Java Types

Some of the primitive types that exist in Java are illustrated in Table 2.1.
There are no cast conversions between integer and Boolean types.
Table 2.1. Java primitive types.

Type      Description
byte      8 bit signed
short     16 bit signed
int       32 bit signed
long      64 bit signed
float     32 bit IEEE 754 floating point number
double    64 bit IEEE 754 floating point number
char      16 bit unicode character
boolean   true or false

2.3.4 Java Operators

Expressions are functional statements that return some form of value, commonly using operators to determine this value. Some of the common operators
are listed in Table 2.2.
Operators in Java widen their operands so as to operate at either 32 bit or
64 bit precision. Java throws (generates) an arithmetic exception if the right
hand operand to an integer division is zero. We can catch such exceptions to
trap run-time errors.
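For example, an integer division by zero can be trapped as follows (a small illustrative fragment):

    // Trapping an integer division by zero (illustrative fragment).
    int x = 10, y = 0;
    try {
        int z = x / y;                 // throws ArithmeticException when y == 0
        System.out.println("Result = " + z);
    } catch (ArithmeticException e) {
        System.out.println("Run-time error trapped: " + e.getMessage());
    }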
2.3.5 Java Comments

There are three main comment forms:


// End of line comments - ignores until end of line

/* Block style comment - cannot be nested! */

/** Java style comments - used with javadoc */


Table 2.2. Common operators.

Operator   Description
+          Arithmetic addition
-          Arithmetic subtraction
*          Arithmetic multiplication
/          Arithmetic division
==         Comparison
>=         Greater-than comparison
<=         Less-than comparison
!=         Not equal to
++         Increment
--         Decrement
&&         Logical AND
&          Bitwise AND
||         Logical OR
|          Bitwise OR
<<         Logical shift left
>>         Logical shift right

The third type of comment is useful for the automatic creation of documentation for a Java application. It has certain keywords, such as @param, to describe the methods in the code. The javadoc application (part of the Java SDK) can then be run on your code, generating an automatic structured HTML document describing your code. Possibly the best example of a javadoc generated document is the Java API document3 that is associated with the Java SDK.

/**
 * ImageFilter to remove weak edges from an image source.
 *
 * @author  John Smith
 * @version 1.2 (Last Update 1/5/2000)
 * @param   theImage  the Image object to be processed.
 */
2.3.6 The super and this Keywords

In Java the word this refers to the specific instance of the class and the word
super refers to the super class of the current class.
super.display();  // calls the display method of the base class.
this.x++;         // increments the state x by 1.

3 http://java.sun.com/products/jdk/1.2/docs/api/


2.3.7 Java Arrays

Java arrays are similar to those of the 'C' programming language. They contain a special length member that returns the size of the array. The Java
Virtual Machine constantly checks to see if the index goes out of bounds and
throws an exception if this occurs.
Object objArray[];          // create a reference to an array of Objects
objArray = new Object[5];   // allocate the space for 5 objects

// we have allocated room for 5 objects, but we
// have not created 5 objects!

for (int i = 0; i < objArray.length; i++) {
    objArray[i] = new Object(); // create 5 objects.
}

2.3.8 The "Object" and "Class" Classes

Every class by default (if we do not extend anything) extends the Object class. It provides methods that are inherited by all Java classes, such as equals(), getClass() and toString(). Consider passing data across a network. We could write methods to send Object objects across the network. Since every class extends Object, we can package it up as an object and extract it on the receiving end.
Objects of the Class class (called class descriptors) are automatically created and associated with the objects to which they refer. For example, the getName() and toString() methods return the String containing the name of the class or interface. This may be used to compare the classes of objects.
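For example (an illustrative fragment using the AnExampleClass described earlier):

    // Using the methods inherited from Object and the associated Class descriptor.
    AnExampleClass a = new AnExampleClass();
    AnExampleClass b = new AnExampleClass();

    System.out.println(a.toString());              // default textual description
    System.out.println(a.equals(b));               // false: two distinct objects
    System.out.println(a.getClass().getName());    // prints the class name

    // class descriptors can be compared to test whether two objects share a class
    if (a.getClass() == b.getClass()) {
        System.out.println("a and b are instances of the same class");
    }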
2.3.9 Interfaces

A Java Interface is a way of grouping methods that describe a particular behaviour. They allow developers to reuse design and capture similarity, independent of the class hierarchy. We can share both methods and constants,
but no states. Methods in an interface are public and abstract, instance variables in an interface are public, static and final.
Interfaces provide Java with a way in which to replace the multiple inheritance structure of other languages (such as C++) without the same level
of complexity. Interfaces allow method descriptions to be passed to their
'children', but not the method implementation or the states. Interfaces are
declared using the interface keyword:
public interface MyInterface {

    // the states (must be initialised, as interface fields are final)
    public static final int SomeIntegerState = 0;
32

2. Java Fundamentals

    // the methods
    void someMethod();
    void someOtherMethod();
}

To use this interface, or rather to conform to this interface, we can use the
following:
public class myApplet extends Applet implements MyInterface {
    public void someMethod()
    {
        // add code implementation
    }

    public void someOtherMethod()
    {
        // add code implementation
    }
}

Consider the mouse device. We may write many applications that use the
mouse and within these applications, the mouse is used in different ways by
different components. We can define a common interface for the mouse that
each one of these components must implement. These components then share
a common defined interface.
2.3.10 Packages

In Java, packages are a way of allowing a developer to group related classes and interfaces, to allow large scale applications to be developed in a structured manner. Packages take the form of '.' (dot) separated package names, representing the directory in which the associated class files reside. For example, java.awt.image contains some of the Java class files for dealing with images, such as: PixelGrabber, ColorModel, ImageFilter, ImageProducer, ImageObserver etc.
On a UNIX system these classes are in the java/awt/image directory, with the '.' replacing the '/' in UNIX. Similarly the '.' is replaced by '\' for MSDOS. Notice that the package name (image) begins with a lower case letter by convention. This makes it easier to distinguish between classes and packages and also allows name re-use. For example, a class Image exists (java.awt.Image) as well as the image package (java.awt.image). These packages can be loaded using the import keyword which is located at the beginning of most Java source files, for example:
import java.awt.Button;
import java.awt.*;


This shows two different ways of importing the Button class, in that we can
either import by explicitly naming the Button class as the specific class we
require to import from the AWT package, or we can import all the classes in
the package by using the wildcard ('*') character, therefore also including the
Button class. Note however, that the use of '*' does not cause sub-packages to
be imported. For example, import java. awt. *; does not import the package
java.awt. image.
If the name of the package is omitted when developing Java applications
then the default package (with no name) is assumed, with the same directory
as the source file under development. If you wish to associate a package name
with your source then the package statement must be the very first thing in
your (.java) source file (e.g. package myPackage;).

2.4 Applications and Applets


Java applications are stand alone programs. These are usually stored on a
local computer and can be executed using a Java interpreter. Java applets on
the other hand are usually compact programs that are downloaded from the
WWW, embedded in a HTML web page and executed on the local machine.
Given this fact, it seems apparent that applets must conform to security
restrictions, otherwise a downloaded applet could simply delete the contents
of the hard-disk on the local machine. Applets must therefore conform to
certain restrictions, which are enforced by the JVM that exists within the
web browser. These include limited access to the local file system, with any
requests being allowed or denied by the local user.
2.4.1 Writing Applications
In the same way that 'C' programs require a main() function to act as the starting point of the program, Java applications also require a starting point; this is also called main(). In Java, the main() method must be a class method (static, as it can only exist once) where the arguments are passed as an array of strings, not unlike 'C', with the first argument being held in args[0]. The number of arguments passed may be determined by using args.length. The
following is an example of a Java application source file (HelloWorld.java):
public class HelloWorld
{
    public static void main(String args[])
    {
        System.out.println("Hello World! ...");
    }
}


System.out.println() refers to the method println() of the field out of the System class. The string passed to this method is displayed on the standard output stream. The following steps are required to compile this application using Sun's Java SDK:

Save the preceding sample file as HelloWorld.java
Compile the application to bytecode using the javac compiler (for the Sun Java SDK): javac HelloWorld.java. This results in a class file containing bytecode called HelloWorld.class.
Run the application using the Java interpreter (for the Sun Java SDK): java HelloWorld

2.4.2 Applets

In contrast to a Java application there is no main() method required for applets. However there are other methods that must be implemented to create a functional applet. To implement an applet we create a subclass of the Applet class (java.applet.Applet) and then, if required, override the following methods (a minimal skeleton is sketched after this list):
init(): This method 'sets up' the applet and is called once when the applet is first initialized. This method is used to set up the display of the applet, its colours etc.
start(): Called whenever the applet is visited, i.e. when the applet starts running again. You could use this method to start up an animation that stops when the applet loses focus (i.e. it is not the currently selected window).
stop(): Called when the applet is no longer on the screen (e.g. minimized).
destroy(): Called when the applet is discarded. This method should perform any closing down tasks that you would wish to perform, such as closing connections to databases, etc.
paint(): This method is called whenever the appearance of the applet is required to be updated, such as when the applet is being maximized from a minimized state. It is also possible for the programmer to call this method. This method does nothing by default.
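A minimal skeleton overriding these methods might look as follows (an illustrative sketch only):

    // Minimal applet skeleton overriding the standard life-cycle methods.
    import java.applet.Applet;
    import java.awt.Color;
    import java.awt.Graphics;

    public class LifeCycleApplet extends Applet {
        public void init()    { setBackground(Color.white); } // one-off set-up
        public void start()   { /* resume animations, threads, etc. */ }
        public void stop()    { /* pause work while off screen */ }
        public void destroy() { /* release resources, close connections */ }

        public void paint(Graphics g) {
            g.drawString("Applet life-cycle demonstration", 20, 20);
        }
    }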
When developing for distributable applets you have full access to the
functionality of the AWT but do not have access to the non-core APIs. In
fact, you are developing applets for a virtual machine that only supports a
subset of the true Java specification.
The appletviewer tool supplied with the Sun Java SDK is a compact version of a standard web-browser allowing you to view your applets without
having to run a full WWW browser. It expects a HTML file to be passed
as an argument, i.e. the HTML file that points to the applet you wish to
view using the <applet> tag. The HTML file example. html would contain
an applet tag of the form: <applet code=HelloWorld. class width=300

2.4 Applications and Applets

35

where the name of the class to load is specified,


along with the initial width and height of the applet itself.

height=50></applet>

2.4.3 An Applet and an Application?

It is possible for us to write a class that is both an applet and an application. We do this by writing an applet in the usual way, by creating a subclass of the Applet class and overriding the methods. We then add a main() method to the class (TheClass) of the form:
// Working example of a combined applet and application.
import java.awt.*;
import java.applet.*;

public class TheClass extends Applet {
    public void init()
    {
        // Example init code ...
        Button b = new Button("Test");
        this.add(b);
    }

    public void start()
    {
        // add start code here ...
    }

    public static void main(String args[])
    {
        Frame f = new Frame("The Application!");
        TheClass theClass = new TheClass();
        theClass.init();
        theClass.start();
        f.add("Center", theClass);
        f.pack();
        f.setSize(f.getPreferredSize());
        f.show();
    }
}

This main() method basically creates a new Frame instance and places the applet instance in the centre of the Frame instance (by default, frames have a 'border layout', so components are added to the "North", "South", "East", "West" and "Center"). It then calls the init() method followed by the start() method.


2.5 Java and Image Processing


2.5.1 The Canvas Class

The Canvas class allows a programmer to define a certain component area within an application that is to be used for graphics based operations. Once defined, the Canvas component can be positioned using any of the layout managers. These operations include drawing lines, shapes, placing images and handling events related to the graphics such as mouse button clicks. The Canvas class is usually extended by the user, overriding the paint() method to perform a custom application.
Fig. 2.4 displays the output window from the simple graphics operations
code segment given below. When the mouse is pressed at a point on the
canvas, the point is circled, the coordinates are displayed and a line is drawn
from the origin to that point.

// Simple Graphics Operations using Java AWT
import java.awt.*;
import java.awt.event.*;

public class PlotApp {
    public static void main(String[] args)
    {
        MyCanvas mic = new MyCanvas();              // see class below
        // create a new AWT window
        Frame f = new Frame("The Graphic Canvas Frame");
        // set the layout manager to be border
        f.setLayout(new BorderLayout());
        // add the canvas object to the center
        f.add(mic, BorderLayout.CENTER);
        f.setSize(400,400);     // set the frame size
        f.setVisible(true);     // display the frame
    }
}

class MyCanvas extends Canvas implements MouseListener {

    int mouseX = 100, mouseY = 100;   // mouse press positions

    public MyCanvas()                 // constructor
    {
        super();                      // call the Canvas class constructor
        addMouseListener(this);       // add the listener to the canvas - NB.
    }

    public void paint(Graphics g)
    {
        g.drawString("Pos X="+mouseX+" Y="+mouseY, 50, 20);
        g.drawLine(1, 1, mouseX, mouseY);
        g.setColor(Color.red);
        int radius = 15;
        g.drawOval(mouseX-radius, mouseY-radius, radius*2, radius*2);
    }

    public void mousePressed(MouseEvent e) {
        this.mouseX = e.getX();       // get the x mouse press pos
        this.mouseY = e.getY();       // get the y mouse press pos
        this.repaint();               // repaint the canvas
    }

    public void mouseReleased(MouseEvent e) { /* do nothing */ }
    public void mouseEntered(MouseEvent e)  { /* do nothing */ }
    public void mouseExited(MouseEvent e)   { /* do nothing */ }
    public void mouseClicked(MouseEvent e)  { /* do nothing */ }
}


Fig. 2.4. The window from the simple graphics operations code segment.

This code example illustrates the operation of the AWT Canvas class as well as the use of mouse events. In the example, MyCanvas is the programmer's implementation of the Canvas class that overrides the paint() method. In this case we 'draw' a string at location (50,20), followed by a line and an oval. Before we draw the oval, we set the current colour of the graphics context to red, using g.setColor(Color.red). Table 2.3 contains some of the commonly used Graphics class methods.
We use the MouseListener interface to allow us to capture events from the mouse, such as: mouse pressed, mouse released, mouse entered, mouse exited, mouse clicked, and to associate program functionality with these events. In this case we wish to locate the position of the mousePressed() event. We use e.getX() and e.getY() to get the x and y location. These values are then set as class states, so that they can be seen (have scope) in the paint() method. An addMouseListener(this) call is added to the canvas in the MyCanvas constructor, to allow the canvas to receive events. The only event that we needed to implement for this application was the mousePressed() event. However, we still need to provide empty methods for the remainder of the mouse events to conform to the MouseListener interface.


The PlotApp class provides the application code, creating the display frame and creating an instance of the MyCanvas class. We set the layout of the frame to be a BorderLayout, allowing a component added to the centre (in this case the canvas) to scale to the size of the frame, even if it changes during the application.
Table 2.3. Basic methods of the Graphics class.

drawLine(int, int, int, int) - Draw a line in the current graphics colour; parameters: (x1, y1, x2, y2) to plot a line from (x1, y1) to (x2, y2).
drawRect(int, int, int, int) - Draw an outlined rectangle in the current colour; parameters: (x, y, width, height) to plot a rectangle from (x, y) to (x+w, y+h). fillRect (same form) draws a filled rectangle.
drawOval(int, int, int, int) - Draw an outlined oval in the current colour; parameters: (x, y, width, height). fillOval (same form) draws a filled oval.
drawString(String, int, int) - Draw a string in the current graphics colour; parameters: (theString, x, y).
drawArc(int, int, int, int, int, int) - Draws an outlined arc inside a specified rectangle; parameters: (x, y, width, height, startAngle, arcAngle). Begins at startAngle and extends for arcAngle degrees. fillArc (same form) draws a filled arc.
setColor(Color) - Sets the current colour of the graphics context; parameters: (theColor), where theColor is an instance of the Color class.

This example allows the superimposition of graphics on images, a useful property for machine vision application development. If, for example, you want to identify corners in an image, you can plot them on top of the image, without any modification of the core image data. This property is also useful in active contour model applications (Blake & Isard 1998, Kass et al. 1987) to display dynamically changing content on top of a static image frame (the g.drawImage() method is used to display an image).
2.5.2 Java and Images

Java provides an Image class (java.awt.Image) for the purpose of storing images. This class contains useful methods for obtaining information about
2.5 Java and Image Processing

39

the Image object. It has been a part of the Java SDK in the java. awt and
java. awt . image packages since its original release. Some of the methods
associated with the Image class are described in Table 2.4.
Table 2.4. Basic methods of the Image class.

int getWidth(ImageObserver) - Determines the width of the Image object. If the width is not yet known the return value is -1. The width may be unknown if, for example, the image is being downloaded from a slow web site. Program execution can still continue.
int getHeight(ImageObserver) - Returns the height of the Image object. If the height is not yet known the return value is -1.
ImageProducer getSource() - Returns the image producer object which produces the pixels for the image.
Graphics getGraphics() - Creates a graphics context for the image. This can be used for drawing to an off-screen image.

In the case of an applet, an Image object can be obtained from a source by using the Applet.getImage() method and then displayed on the canvas using the drawImage() method, which is a member function of the Graphics class. This method allows the scaling of an image using a bounding box (which can be used for performing basic magnification) using the Image class method:

Image getScaledInstance(int width, int height, int hints);

where the required integer width and height values are passed along with the 'hints' that define the resampling algorithm to use, e.g. SCALE_FAST, SCALE_SMOOTH, SCALE_DEFAULT.

/* A complete example to display a scaled image in an Applet */
import java.awt.*;
import java.awt.image.*;
import java.applet.*;

public class ImageApplet extends Applet {
    Image theImage;

    public void init()
    {
        // load the image 'test.jpg' in the current directory (URL)
        theImage = this.getImage(this.getDocumentBase(), "test.jpg");
        // scale the image to be 200 x 200 pixels.
        theImage = theImage.getScaledInstance(200, 200, Image.SCALE_SMOOTH);
    }

    public void paint(Graphics g)
    {
        g.drawImage(this.theImage, 10, 10, this);
    }
}

An image consists of information in the form of pixel data. In the case of a Java Image object, a pixel is represented by a 32 bit integer. This integer consists of four bytes containing an alpha (transparency) value, a red intensity value, a green intensity value and a blue intensity value. The transparency level and each colour plane are integer values in the range 0 to 255. This representation of pixel data is referred to as the default RGB colour model. The method used to extract each of the four intensity planes is demonstrated here, using a sample method that averages the RGB pixels to determine grey level pixel values:

// Pass the rgb pixel value (i.e. a 32 bit integer)
public int averageRGB(int rgb)
{
    int r = (rgb >> 16) & 0xff;   // obtain the red
    int g = (rgb >> 8)  & 0xff;   // obtain the green
    int b = (rgb >> 0)  & 0xff;   // obtain the blue
                                  // can ignore the alpha

    // average over the red, green and blue
    int nIntensity = (int)(r + g + b)/3;
    r = g = b = nIntensity;

    // build up the rgb integer value again from the
    // component colours. note: the ff sets the alpha
    // value to a maximum i.e. 255 (in Hex 0xff)
    return (rgb & 0xff000000) | (r << 16) | (g << 8) | (b << 0);
}

The alpha or transparency value is usually set to its maximum value (255) as
images are required to be fully opaque for true display purposes, hence the
alpha value can usually be ignored (set to 255) when processing an image. For
grey scale images, the values for each of the colour planes will be the same,
hence only the value for one of the colour planes need be extracted in order
to obtain the grey intensity value for that pixel. This approach increases the


speed at which an image can be processed. For instance, in the last code
segment, we need only extract the red value of the returned rgb value to
determine the green and blue values. The basic AWT imaging model is a
'push' (producer/consumer) model and is discussed in Sec. 2.5.3.
2.5.3 Image Producers and Image Consumers

An image producer constructs an image whenever it is required by an image consumer. An Image object is an abstraction that cannot be manipulated directly; rather, it is a reference to an object that implements an ImageProducer interface. An example of an image producer is evident when an Image object is disassembled into its pixel array. An example of an implementation of the ImageProducer interface is the MemoryImageSource class, see Sec. 2.5.5. The Image.getSource() method can be used to obtain the ImageProducer for an Image object.
An object that implements the image consumer interface is concerned with data produced by an image producer. An example of an image consumer is evident anywhere an Image object is reconstructed from its representative pixel array. An example of an implementation of the image consumer interface is the PixelGrabber class, see Sec. 2.5.4. ImageProducers, ImageConsumers and ImageObservers make up the three interfaces associated with the Image object.
An ImageConsumer cannot request data; rather, it must wait for an ImageProducer to push data to it. The ImageConsumer is aware that a complete image is present once it has received a call to its imageComplete() method. This allows an image to be produced from a web site with an unknown download time. This push model is not ideal for image processing applications; rather, it was developed for use in developing applets for web applications.
2.5.4 Recovering Pixel Data from an Image Object

In order to carry out local area image processing operations on an image object, access to the pixel information is required. This is not provided directly by methods of the Image class. An Image object can be deconstructed using the PixelGrabber class, where a PixelGrabber is created using either an Image object or the image producer associated with that Image object. Information about the required image area dimensions as well as the array to store the pixel information must also be supplied. Most programs only call the grabPixels() method of the PixelGrabber class. The resulting pixel information is again stored using the default RGB colour model. The PixelGrabber constructor used here has the form:

public PixelGrabber(Image theImage, int x, int y,
                    int width, int height,
                    int[] pixels, int offset, int scansize)


and is illustrated in Fig. 2.5. The offset represents the location in the array at which to store the first pixel. The size of the array (n) to store the Image Area of Interest must be at least ((height - 1) x scansize) + offset + width.
Fig. 2.5. Graphical illustration of the PixelGrabber object construction operation.

When a pixel I(i, j) is grabbed, it is placed in the resulting array at pixels[(j - y) x scansize + (i - x) + offset]. Fig. 2.6 illustrates two different cases for the choice of the scansize parameter. In Fig. 2.6(a), the scansize is chosen to be the same width as the sub-image to be converted to a data array. This gives an array of size: (width x height) + offset. In Fig. 2.6(b), the scansize is set as the width of the original image. This causes the array to contain zero values (transparent black pixels) in the space, defined by the offset, that does not belong to the sub-image. It is important to note the offset when you are again constructing an image from the pixel array.

/* A segment of applet code to load an image, break it down into
   a pixel array, operate on it and then reconstruct an image object */
theImage = this.getImage(this.getDocumentBase(), "test.jpg");
// Wait for the image to load
while (theImage.getHeight(this) == -1) { /* do nothing */ }
width  = theImage.getWidth(this);
height = theImage.getHeight(this);
int[] pixels = new int[width*height];  // array to hold pixels

// Grab the pixels from the image, offset=0, scansize=width
// and x,y = 0, so we are grabbing the entire image.
PixelGrabber pg = new PixelGrabber(theImage, 0, 0,
                                   width, height, pixels, 0, width);
try {
    pg.grabPixels();
} catch (InterruptedException e) {
    System.out.println("Interrupted waiting for pixels ...");
    return;
}


// assuming that we have grabbed the entire image
for (int j = 0; j < height; j++) {
    for (int i = 0; i < width; i++)
    {
        int cur = (j * width) + i;
        int pixel = pixels[cur];
        int alpha = (pixel >> 24) & 0xff;
        int red   = (pixel >> 16) & 0xff;
        int green = (pixel >> 8) & 0xff;
        int blue  = (pixel) & 0xff;

        green = 255; // perform your operation here!
        // this test case just saturates the green plane
        pixels[cur] = (pixel & 0xff000000) |
                      (red<<16) | (green<<8) | (blue<<0);
    }
}

Fig. 2.6. Illustration of the use of scansize when using the PixelGrabber class.
(a) When scansize = width. (b) When scansize is set the same as the full image
width.
Once the pixel information is in array form, it can be processed using
established image processing algorithms (such as those discussed throughout
the book). After processing an image array it is then required to reconstruct
a new Image object for display purposes.


In the last code example we used an infinite loop that checked to see if
the image had loaded completely, as Image.getHeight(this) will only
return a positive value when the image is fully loaded. (Problems have been
recorded with this approach under Solaris; in that case the MediaTracker
class must be used.) We could also use the MediaTracker class, which can
monitor the status of loading images and notify the applet when complete.
An example of the use of the MediaTracker class is outlined in Sec. 2.8.2.
2.5.5 Recreating an Image Object from an Array of Pixels

The reconstruction of an Image object from its representative pixel information is implemented using two Java classes. The first of these is the MemoryImageSource class, where an object of this class is created using a reference
to an array of pixels and information about the dimensions of the image to
be constructed. This class implements ImageProducer. The ImageProducer
is used by the createImage() method of the Toolkit class to create a new
Image object from the referenced pixel information. All the components are
now in place for the implementation of the image processing loop, which allows the development of a complex image processing class while keeping the
interface as simple as possible. The interface in many cases may consist of
a setImage() and getImage() method pair, with the underlying complexity
hidden from the user. The MemoryImageSource constructor used to
convert an array of integers in the default RGB colour model form to an Image object
has the form:

MemoryImageSource(int width, int height, int[] pix,
                  int offset, int scansize)

So, for the previous applet code segment:

// construct the image from the pixel array in the previous code
// assuming that the scansize remains equal to the width.
theImage = this.getToolkit().createImage(
        new MemoryImageSource(width,height,pixels,0,width));

The operation of an image processing loop can be described as follows:

- An Image object of any dimensions can be disassembled into its representative pixels in the form of an array.
- An image processing operation can be performed on the image data in this form.
- The resulting image data can then be used to create a new Image object which represents the processed image.
This concept plays a major part in the development of the block diagram
approach to image processing required by NeatVision (Chap. 7). An image
processing block appears to its neighbours as a 'black box'. The only information known about this 'black box' is the number of inputs and outputs it
has.

2.6 Additional Classes


The ability to process images has been verified through the use of the image
processing loop. Although this concept should cater for all algorithm possibilities, there are built-in image filtering classes and other utilities that should
be investigated. Some of these have been mentioned previously but are
outlined here in more detail for completeness. The image processing loop is
actually the slowest method of implementing an image processing operation
in Java; however, it is the most versatile due to its low level of operation. There
are two other approaches: the first uses a look-up table (LUT) approach and
the second makes use of the built-in Java ImageFilter class.
2.6.1 ColorModel

This class is the template for all classes that encapsulate methods for translating from pixel values to their alpha, red, green and blue components. In
the Java default RGB colour model, each pixel is described by a 32 bit integer which can be broken up into four 8 bit values for the red, green, blue and
transparency (alpha) planes. There are two main defined subclasses of the
ColorModel class: DirectColorModel and IndexColorModel.
DirectColorModel. This class is not particularly useful, as it is very similar
in structure to the Default RGB colour model, where the number of colours
available to an image can be specified. For example, using 3 bit masks for
each colour there could be 8 possibilities for each colour plane and hence 512
possible pixel values.
IndexColorModel. The IndexColorModel class provides an extremely fast
and efficient method of implementing LUT or palette processing algorithms.
The IndexColorModel is useful in dealing with grey scale images, since the
maximum number of colours in the model is 256, making it ideal for dealing
with 8 bit grey scale images. The fast execution of image processing algorithms on large images using the IndexColorModel is due to the fact that
only the LUT is operated on, hence only a maximum of 256 operations must
be carried out.
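As a brief illustration of this LUT approach (a sketch, not part of the original listings; width, height and the greyPixels byte array of 8 bit grey values are assumed to be already available), an inverted 256 entry palette can be attached to the pixel data so that only the palette, and not the pixel array, is modified:

byte[] r = new byte[256], g = new byte[256], b = new byte[256];
for (int i = 0; i < 256; i++)
    r[i] = g[i] = b[i] = (byte)(255 - i);  // negate via the palette only
IndexColorModel icm = new IndexColorModel(8, 256, r, g, b);
Image img = Toolkit.getDefaultToolkit().createImage(
        new MemoryImageSource(width, height, icm, greyPixels, 0, width));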
2.6.2 ImageFilter

This class is the template for all classes that are required to process the data
delivered from an ImageProducer object to an ImageConsumer object. The
filter implemented by this class itself is the 'null filter', which has no
effect on the data passing through it. Functional image filters should extend
this class and override its methods. The ImageFilter class and its subclasses
should be used in conjunction with the FilteredImageSource class to produce
filtered versions of existing image objects.
2.6.3 CropImageFilter

This class extends the basic ImageFilter class to extract a specific rectangular
region from a given image. The result of this operation is a new image
containing only the specified region. Other possible uses for this class include
the development of image magnification utilities, where the CropImageFilter
is used with the drawImage() method of the Graphics class to display a
magnified region of a source image.
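A minimal sketch of this usage (the source image variable and the region coordinates are assumed purely for illustration):

ImageFilter cropFilter = new CropImageFilter(20, 30, 100, 100); // x, y, width, height
Image cropped = Toolkit.getDefaultToolkit().createImage(
        new FilteredImageSource(sourceImage.getSource(), cropFilter));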
2.6.4 RGBImageFilter

This class provides a simple method for creating an image filter that modifies the pixels of the original image, converting them one at a time using
the default RGB colour model. Objects of this class are used in conjunction with a filtered image source object to produce filtered versions of
existing images. The type of filter implemented by this class should perform
monadic image operations (although alterations can be made to inherited
versions of this class which make it capable of performing neighbourhood operations). The following code segment illustrates one such monadic example,
the threshold image filter:
import java.awt.image.RGBImageFilter;

class ThresholdFilter extends RGBImageFilter {
    private int nThresholdValue = 128;

    public ThresholdFilter()
    {
        // Not dependent on x,y position => faster
        canFilterIndexColorModel = true;
    }

    public ThresholdFilter(int nThresholdSet)
    {
        nThresholdValue = nThresholdSet;
        canFilterIndexColorModel = true;
    }

    public int filterRGB(int x, int y, int rgb)
    {
        int r = (rgb >> 16) & 0xff;
        int g = (rgb >> 8) & 0xff;
        int b = (rgb >> 0) & 0xff;
        int nIntensity = (r + g + b)/3;

        if (nIntensity < nThresholdValue) r = g = b = 0;
        else r = g = b = 255;

        return (rgb & 0xff000000) | (r<<16) | (g<<8) | (b<<0);
    }
}

This filter has two constructors. The first assumes a threshold value of
128 (i.e. half way to 255) and the second allows the user to specify
a threshold value in the range 0 to 255. To use this filter:

ThresholdFilter filter = new ThresholdFilter(100);
ImageProducer p = new FilteredImageSource(
        this.theImage.getSource(), filter);
this.theImage =
        Toolkit.getDefaultToolkit().createImage(p);

2.6.5 FilteredImageSource

The FilteredImageSource class is an implementation of the ImageProducer
interface which takes an image and an image filter and uses them to produce a
filtered version of the original image. This class must be used in conjunction
with the createImage() method of the Toolkit class, which creates a new
image from the ImageProducer interface associated with the relevant image
filter.

2.7 Double Buffering

One of the most common problems associated with animation, and indeed
any form of graphical update, is 'flicker'. Double buffering is a solution to
this problem: an offscreen buffer is maintained into which the next frame
can be created, rather than drawing it directly on the screen that is already visible,
i.e. the onscreen buffer. When we need to update the visible screen, the
entire offscreen buffer is copied onto the visible screen. For example, in a
3-D application where a mesh has to be created and rotated, the next frame
can be created offscreen, rather than blanking the visible screen and then
slowly drawing the mesh line-by-line. This allows a very smooth update and
adds to the quality of the motion effect.
Double buffering is more difficult to implement than a single
call to the paint() method. However, it produces a major improvement in
the quality of the animation or screen updates. Double buffering requires additional memory space when compared to the single buffering approach. Note
that this type of double buffering will not work with the Swing components
discussed in Sec. 2.8.2.


// A double buffered example that paints circles very smoothly
// when the mouse is pressed and moved on the applet surface.
import java.applet.*;
import java.awt.event.*;
import java.awt.*;

public class DoubleBuffer extends Applet
        implements MouseMotionListener
{
    Image offScreenImage = null;
    Graphics offScreenGraphics = null;
    int mouseX = 50, mouseY = 50;

    public void init()
    {
        this.addMouseMotionListener(this);
    }

    public void paint(Graphics g)
    {
        Dimension curScreenDimension = this.getSize();
        if (offScreenGraphics==null)
        {
            // if there is no off-screen Graphics buffer then
            // we must create one
            offScreenImage = createImage(curScreenDimension.width,
                                         curScreenDimension.height);
            // Obtain a Graphics object (such as an image display)
            offScreenGraphics = offScreenImage.getGraphics();
        }
        this.update(g); // paint on startup
    }

    public void update(Graphics g)
    {
        Dimension curScreenDimension = this.getSize();
        offScreenGraphics.setColor(Color.white);
        offScreenGraphics.fillRect(0,0,curScreenDimension.width,
                                   curScreenDimension.height);
        offScreenGraphics.setColor(Color.black);
        for (int i=0; i<45; i+=3)
            offScreenGraphics.drawOval(mouseX+5+i,mouseY+5+i,
                                       90-(2*i),90-(2*i));
        g.drawImage(offScreenImage, 0, 0, this);
    }

    public void mouseMoved(MouseEvent e) { /* do nothing */ }

    public void mouseDragged(MouseEvent e)
    {
        this.mouseX = e.getX();
        this.mouseY = e.getY();
        this.repaint();
    }
}


The double buffering illustrated in this example is implemented in three
main stages:

- Create an off-screen image and obtain its graphics context object.
- Draw the graphics using this graphics object.
- Copy the off-screen buffer image onto the applet's graphics object.

This greatly improves the quality of the animation, removing the flickering
for several reasons:

- In general, computers have an efficient way of copying blocks of image data onto the screen.
- The previous frame can remain visible until the computation has been performed for the creation of the new image, rather than blanking the screen and drawing each object, one by one.
- Video memory is not cached by the CPU, whereas using double buffering we are drawing to a location in physical memory.

The paint() method is scheduled to be called when the applet begins
execution or when the applet is brought from underneath another window.
The update() method is called by the AWT whenever the repaint() or
paint() methods are called (we also call it whenever the mouse is dragged,
i.e. mouseDragged()). The update() method clears the display by filling it
with the background colour, sets the colour of the graphics context to be the
foreground colour and then calls the paint() method to completely redraw
the graphics.

2.8 Recent Additions to Java for Imaging

There are several difficulties with the basic AWT imaging model that we have
just outlined. It is possible to develop quite advanced imaging applications
using this model; however, it was not designed for the development of high
performance image processing applications. It uses the AWT push model,
designed for a web environment in which the image exists somewhere on the
network and the ImageProducer sends the image data as it becomes available.
In many advanced imaging applications we can assume that the image is a
persistent object that can be re-used. The basic AWT imaging model
also does not support region of interest operations or advanced filtering, and it
only supports a limited number of image formats.

2.8.1 Java 2D API

The new Java 2D API (now part of the core Java API) extends the functionality of the basic AWT imaging model to provide more support for 2-D
graphics and imaging. It adds support for a persistent image object, advanced image filters and additional colour models and image formats. The
Java 2D API also introduces the Renderable and Rendered interfaces, to provide
resolution-independent image rendering support. Using these interfaces, images can be 'pulled' through a sequence of image processing operations, with
the resolution selected using a rendering context.
An unrendered source image is an ideal image, with no defined size, which
may be projected onto any surface, at any resolution. Operators (image filters) can change the form of the image, independent of the destination, as
the image is requested by the display, providing the optimal rendering quality for that display. This Java 2D immediate mode model (java.awt.image)
provides support for:
- Raster class: Represents an array of pixels, bound by a rectangle, that need not necessarily start at (0,0). A Raster has a SampleModel that describes how the pixels are stored in the corresponding DataBuffer that stores the image.
- BufferedImage class: Used to describe an image with an available buffer of image data. It contains a ColorModel and a Raster of image data.
- RasterOp interface: Describes a single input, single output operation that is performed on Raster objects. The interface cannot be used to describe multiple input operations, and the Source and Destination objects must contain the correct number of bands for the class implementing this interface.
- BufferedImageFilter class: Provides a simple way of using a single source, single destination image filter on a BufferedImage in the producer - filter - consumer model.
- BufferedImageOp interface: Describes a single input, single output operation that is performed on BufferedImage objects. Objects created using this interface can be passed into a BufferedImageFilter to operate on a BufferedImage following the producer - filter - consumer model (see the short sketch below).
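As a small illustration of the BufferedImageOp interface (a sketch, not part of the original text; source is an assumed BufferedImage), the standard RescaleOp implementation brightens an image in a single call:

BufferedImageOp brighten = new RescaleOp(1.5f, 0.0f, null); // scale each sample by 1.5
BufferedImage brighter = brighten.filter(source, null);     // null: create the destination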
2.8.2 Working with the Java 2D API

The following Java 2D API code segment illustrates how the BufferedImage
class can be used within a filtering application (Fig. 2.7). The testFilter()
method receives a BufferedImage object and returns a BufferedImage object
after a convolution operation (Sec. 3.2). The kernel that is used is a Laplacian and performs a form of edge detection on the image. The loadImage()
method shows how the MediaTracker class can be used to obtain a complete
image and convert the Image object into a BufferedImage object. The main()
method generates a simple image viewer for viewing BufferedImage objects.
The JFrame class (javax.swing.JFrame) used here is an extended version
of java.awt.Frame, a component class of the Swing package. The Swing set
provides a large set of lightweight (pure Java) components that work in the
same manner on all platforms. They add tables, sliders, progress bars, scroll
panes and many more components to Java.

// Example Image filtering using the Java 2D API
import java.awt.*;
import java.awt.image.*;
import javax.swing.*; // Use Swing JFrame and ImageIcon classes
import java.net.URL;

public class J2DExample extends JFrame {

    // This is the Test Laplacian Filter receiving and returning
    // a BufferedImage.
    public BufferedImage testFilter(BufferedImage input)
    {
        int type = input.getType();
        int width = input.getWidth();
        int height = input.getHeight();

        // The empty output image
        BufferedImage output = new BufferedImage(width,height,type);
        float[] lMatrix = { 0.0F, 1.0F, 0.0F,
                            1.0F,-4.0F, 1.0F,
                            0.0F, 1.0F, 0.0F };
        Kernel kernel = new Kernel(3,3,lMatrix);
        ConvolveOp convolve = new ConvolveOp(kernel);
        convolve.filter(input,output); // perform the convolution
        return(output);
    }

    // Load a BufferedImage using the MediaTracker class
    public BufferedImage loadImage(String name)
    {
        URL url = J2DExample.class.getResource(name);
        Image image = this.getToolkit().getImage(url);
        try {
            // the media tracker tracks the load progress
            MediaTracker tracker = new MediaTracker(this);
            tracker.addImage(image, 0);
            tracker.waitForID(0);
        } catch (Exception e) {}
        int width = image.getWidth(this);
        int height = image.getHeight(this);

        // Create a blank BufferedImage
        BufferedImage input = new BufferedImage(width, height,
                BufferedImage.TYPE_INT_RGB);
        // Convert the Image object to a BufferedImage object.
        input.createGraphics().drawImage(image,0,0,null);
        return input;
    }


    // The main application - Creates a JFrame object
    // and adds the Image to it using a JLabel with ImageIcon
    public static void main(String[] args)
    {
        J2DExample eg = new J2DExample();
        BufferedImage b = eg.loadImage("test.jpg");
        b = eg.testFilter(b);
        eg.getContentPane().add(new JLabel(new ImageIcon(b)));
        eg.pack();           // pack the components of the JFrame
        eg.setVisible(true); // display the JFrame
    }
}
Fig. 2.7. Laplacian edge detection using the Java 2D API. (a) Laboratory input
image. (b) The output image illustrating the filtering operation.

The Java 2D API can also be used quite successfully for graphical applications. The following code example illustrates how the graphics context
may be obtained for such an application.

// Example Drawing using the Java 2D API
import java.awt.*;
import java.awt.image.*;
import javax.swing.*;

public class J2DDraw extends JFrame {
    private BufferedImage paint(BufferedImage b)
    {
        // Create the graphics context from the image
        Graphics2D g2 = b.createGraphics();
        // Add anti-aliasing properties
        g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                            RenderingHints.VALUE_ANTIALIAS_ON);
        // Draw a set of ovals
        for (int i=0; i<45; i+=3)
            g2.drawOval(5+i,5+i,90-(2*i),90-(2*i));
        return b; // return the altered image
    }

    // The main application - Creates a JFrame object
    // and adds the Image to it using a JLabel with ImageIcon
    public static void main(String[] args)
    {
        J2DDraw eg = new J2DDraw();
        BufferedImage b = new BufferedImage(100,100,
                BufferedImage.TYPE_INT_RGB);
        b = eg.paint(b);
        eg.getContentPane().add(new JLabel(new ImageIcon(b)));
        eg.pack();
        eg.setVisible(true);
    }
}

This Java 2D API example also illustrates the use of anti-aliasing. The
setRenderingHint() method is used to add the anti-aliasing property to
the graphics context. The resulting effect is shown in Fig. 2.8. There is a
computational cost associated with the use of anti-aliasing, so it should not
be used by default. It is generally used when quality rendering is required.

Fig. 2.8. Sample Java 2D API graphical application output. (a) No anti-aliasing
employed. (b) Anti-aliasing employed.

Aliasing is due to the discrete nature of digital images. The effect can be seen
as discontinuities ('jaggies') on a line as it is sampled over the image. To reduce
this effect anti-aliasing algorithms (smoothing) can be used. Alternatively the
resolution of the sampled image can be increased.


2.8.3 Renderable Layer and Rendered Layer

The Java AWT supports two imaging layers, renderable and rendered. The
renderable layer is rendering-independent, where operators are available that
use rendering-independent parameters. When a rendering is requested, the
request passes from the destination to the source. The rendered layer is an
image that has been rendered for a particular context. The renderable layer is
often preferred, as it is lightweight, with alterations to the renderable operators (RenderableImageOp) being editable without the need for pixel computation. These renderable operators can be connected to form a chain and are
connected to a RenderableImage source. A RenderableImage source can be
given a RenderContext to produce a RenderedImage. This RenderContext is
used to define the mapping between the user defined coordinate system of the
RenderableImage and the coordinate space for the desired device dependent
RenderedImage.
A ParameterBlock is used to contain the parameters and sources that
are required by a specific RenderableImageOp. The parameters in the chain
can be altered by using the setParameterBlock() method, affecting future
RenderedImages obtained from this chain.
The Java AWT API also provides support for CMY (cyan, magenta and
yellow) and CMYK (cyan, magenta, yellow and black) image pixel sampling.
As discussed previously, DataBuffer is used for raw image storage, but is
not concerned with how the pixel information is stored. The SampleModel
class describes how pixels are stored within an image. A Raster
contains a DataBuffer and a SampleModel.
2.8.4 Java Advanced Imaging API (JAI)

The Java Advanced Imaging API (JAI) was developed for digital image processing applications in Java by Sun, Siemens, Kodak and Autometric Inc.,
with the aim of retaining the flexibility of Java applets and applications (including the runtime environment) while allowing the use of imaging techniques.
Retaining this flexibility involved developing a cross-platform, object-oriented,
device-independent API. The API supports numerous image formats and
imaging algorithms, while still working well with the other Java core APIs. The
JAI supports client-server imaging (Java RMI and JavaBeans). It also links
to Intel MMX libraries and Sun VIS hardware optimisations.
There are numerous uses for the JAI in image processing and analysis.
The Java Advanced Imaging (JAI) API was developed to support image
processing under the Java language. As yet, it does not provide the large
collection of machine vision techniques available in NeatVision; rather it provides support for the generation of any image processing application using
a carefully defined programming model. The JAI supports many image formats and advanced imaging techniques while encapsulating the complexities
involved.


The JAI API, coupled with the Java programming language, provides many
advantages as the choice of platform for machine vision application development, including:

- Object-oriented: The JAI retains the object-oriented structure of Java, implementing images and image filters as objects.
- Extensible: Any image processing operation developed using JAI can be added to the API.
- Cross platform: An application developed using the JAI can be deployed to any platform with a Java Virtual Machine.
- Rendering-independent sources and operations: To allow processing operations to be specified in resolution-independent coordinates.
- Rendering contexts: To avoid defining image and operation characteristics until execution.
- Editable graphs of image processing operations: To be defined and instantiated as required.
- Multi-dimensional, multi-banded image data types: Including common image data types.
- Tiled images: Supported to allow efficient processing of large images.
- Deferred execution: Supported to allow operations on only the pixels that are required.
- Operation chain optimisation: To allow a sequence of operations to be combined efficiently into a single operation.
The JAI API provides support for the Java AWT and Java 2D imaging models. It allows JAI operators to be applied directly to Java 2D
BufferedImage objects. The JAI API has support for the 'pull' imaging
model, allowing operations to be performed on requested areas of an image
source. The operation of the JAI API is illustrated in Fig. 2.9. The chain of
operations in JAI is referred to as a graph. JAI supports Rendered Graphs
and Renderable Graphs. Rendered Graphs allow direct modification of pixel
values, processing in immediate mode, so when the chain of operations is altered (e.g. by adding a new operation), it immediately computes a new result.
Renderable Graphs, on the other hand, are evaluated when there is a request
for rendering, known as deferred execution (the 'pull' model). Rendered and
Renderable graphs are editable; however, once rendering takes place at a node
in a Rendered graph, it is 'locked' and parameters or sources may not be altered. Fig. 2.9 also illustrates a remote mode render option. This remote
mode is based on RMI (Remote Method Invocation), allowing the client to
call a method for server-side image processing.

Fig. 2.9. A summary of the JAI operation: image sources (image file, network image, generated image) pass through a chain of operations and are rendered to a destination in immediate (Rendered), deferred (Renderable) or remote mode.

2.8.5 JAI Functionality

The main packages in the JAI API are:

- javax.media.jai contains the core JAI classes, such as CanvasJAI, JAI, PlanarImage, RenderedOp, RenderableOp, ROI, TiledImage, etc.
- com.sun.media.jai.codec contains image encoders and decoders, such as support for the BMP, JPG, PNG and TIFF image formats.
- javax.media.jai.iterator contains support for writing extension operators.
- javax.media.jai.operator contains the image operators, such as support for AND, OR, NOT, XOR, median, mean, thresholding and convolution, broken into the main categories of: point operators (add, invert, more than 40 operations), area operators (median, convolve etc.), geometric operators (scale, translate, shear etc.), colour quantization operators, file operators, frequency operators (DFT, DCT etc.), statistical operators (mean, histogram, extrema) and edge extraction operators (gradient).
- javax.media.jai.widget contains support for image canvases, including the ImageCanvas and ScrollingImagePanel classes.

JAI uses the PlanarImage (javax.media.jai.PlanarImage) class to describe 2D images. It implements the RenderedImage interface discussed earlier (i.e. a tiled image, described by a SampleModel and a DataBuffer). PlanarImage maintains source and sink connections between the nodes of rendered
graphs. The API also supports multiple image coordinate systems. The following code segment illustrates a sample JAI application:
// JAI application to perform a Sobel edge detection on a
// chosen image.
import javax.media.jai.*;
import java.awt.image.renderable.ParameterBlock;
import javax.media.jai.widget.ScrollingImagePanel;
import com.sun.media.jai.codec.FileSeekableStream;
import java.awt.*;
import java.io.*;

public class JAISobelExample {

    private static RenderedOp SobelEdgeDetect(RenderedOp theImage)
    {
        // Stores the required input source and parameters in a
        // ParameterBlock to be sent to the operation registry,
        // and eventually to the "Sobel" operator.
        ParameterBlock params = new ParameterBlock();
        params.addSource(theImage);
        // Gradient Mask
        params.add(KernelJAI.GRADIENT_MASK_SOBEL_HORIZONTAL);
        // Orthogonal Gradient Mask
        params.add(KernelJAI.GRADIENT_MASK_SOBEL_VERTICAL);
        // Perform the filtering
        theImage = JAI.create("GradientMagnitude", params);
        return theImage;
    }

    private static RenderedOp LoadImage()
    {
        RenderedOp theImage;

        // The frame for the File dialog box
        Frame frame = new Frame();
        FileDialog fileDialog = new FileDialog(frame);
        fileDialog.setMode(FileDialog.LOAD);
        fileDialog.show();
        String file = fileDialog.getFile();
        if (file == null){return null;}
        String dir = fileDialog.getDirectory();
        File f = new File(dir, file);
        if (f.exists())
        {
            FileSeekableStream stream = null;
            try {
                stream = new FileSeekableStream(dir+file);
            } catch (IOException e) {
                e.printStackTrace();
                System.out.println("Err Loading file:"+dir+file);
            }
            theImage = JAI.create("stream", stream);
            return theImage;
        }
        else return null;
    }

    public static void main(String[] args)
    {
        RenderedOp theImage;
        theImage = LoadImage();

        if (theImage!=null)
        {
            theImage = SobelEdgeDetect(theImage);
            ScrollingImagePanel thePanel = new ScrollingImagePanel(
                    theImage,theImage.getWidth(),theImage.getHeight());
            Frame f = new Frame("Sobel Edge Detector");
            f.add(thePanel);
            f.pack();
            f.show();
        }
    }
}

This example should be compiled using the Sun Java SDK, ensuring that
you have obtained the JAI API from Sun and added the JAI libraries to
your CLASSPATH environment variable (see Sec. 2.4.1 for details on the
compilation procedure).
This example uses the RenderedOp class. JAI supports both RenderableOp and RenderedOp classes. The RenderableOp class is a representation
of an operation in a Renderable graph. The RenderedOp class is used in a
Rendered graph and stores the ParameterBlock and RenderingHints for that
operation.

Fig. 2.10. Sobel edge detection using JAI. (a) The LoadFile dialog box. (b) The
output canvas.

2.9 Additional Information on Java


The information outlined in this chapter should be sufficient to allow the
reader to understand the Java concepts visited throughout the remainder of
this book. For readers who wish to follow up on the ideas presented, please

2.10 Conclusion

59

refer to the following web sites (there are also several news groups available
under the general heading comp . lang . java):
Java Software Home Page (java. sun. com). Contains comprehensive documentation on APIs mentioned in this chapter.
Java API Documentation (j ava. sun. com/products/j dk/l. 2/docs/api).
This links to an on-line version of the API documentation. It is advisable
to download a local version to your machine if you wish to do further
programming.
Other Java Resources (www.developer.com).This is a good general resource site, with articles and examples related to Java programming.
The Java 2D API site contains documentation and examples specifically
for the Java 2D API (java. sun. com/products/java-media/2D/index.
html).
The Java Advanced Imaging API (java. sun. com/products/java-media/
j ai). Similar to the 2D API site, it provides documentation for programming with the JAI API.
Object-oriented Programming - Module EE553 (www.racee.ie).
Java Tutorials (developer. java. sun. com/ developer / onlineTraining/
Programming). This link refers to introductory tutorials in the Java programming language.
The Swing Connection (java. sun. com/products/jfc/tsc). For additional information on the Swing components mentioned briefly in this chapter.

2.10 Conclusion
This chapter discussed a framework for developing machine vision algorithms
using the Java programming language. The remainder of the book will concentrate on discussing machine vision algorithms and techniques using Java
as a means of illustrating their operation. Details relating to the design of the
NeatVision integrated image analysis and software development environment
(also developed in Java) will be discussed in Chap. 7.

3. Machine Vision Techniques

The purpose of this chapter is to outline some of the basic techniques and
algorithms used in the development of machine vision systems. The key elements of the image processing and analysis functions introduced in this
section are also implemented in Java and form the basis of the NeatVision
visual programming environment (Chap. 7). Elements of this chapter have
previously appeared in Chap. 2 of Intelligent Vision Systems for Industry
(Batchelor & Whelan 1997).

3.1 Elementary Image Processing Functions


The following notation will be used throughout this section. We will refer to
grey scale images, unless otherwise stated.

- y and x are row and column address variables and lie within the ranges: 1 ≤ y ≤ n and 1 ≤ x ≤ m.
- W denotes the white level.
- a(x, y) is the value of a pixel at location (x, y).
- N(x, y) is that set of pixels arranged around the pixel (x, y) as follows:

  (x-1,y-1)  (x,y-1)  (x+1,y-1)
  (x-1,y)    (x,y)    (x+1,y)
  (x-1,y+1)  (x,y+1)  (x+1,y+1)

and using our Java notation:

a = input.getxy(x-1,y-1);  b = input.getxy(x,y-1);
c = input.getxy(x+1,y-1);  d = input.getxy(x-1,y);
e = input.getxy(x,y);      f = input.getxy(x+1,y);
g = input.getxy(x-1,y+1);  h = input.getxy(x,y+1);
i = input.getxy(x+1,y+1);

N(x, y) forms a 3 x 3 set of pixels and is referred to as the 3 x 3 neighbourhood of (x, y). Many of the operators outlined are based on manipulating
pixels in a 3 x 3 or 5 x 5 (as illustrated below) neighbourhood.


a = input.getxy(x-2,y-2);  b = input.getxy(x-1,y-2);
c = input.getxy(x,y-2);    d = input.getxy(x+1,y-2);
e = input.getxy(x+2,y-2);  f = input.getxy(x-2,y-1);
g = input.getxy(x-1,y-1);  h = input.getxy(x,y-1);
i = input.getxy(x+1,y-1);  j = input.getxy(x+2,y-1);
k = input.getxy(x-2,y);    l = input.getxy(x-1,y);
m = input.getxy(x,y);      n = input.getxy(x+1,y);
o = input.getxy(x+2,y);    p = input.getxy(x-2,y+1);
q = input.getxy(x-1,y+1);  r = input.getxy(x,y+1);
s = input.getxy(x+1,y+1);  t = input.getxy(x+2,y+1);
u = input.getxy(x-2,y+2);  v = input.getxy(x-1,y+2);
w = input.getxy(x,y+2);    x = input.getxy(x+1,y+2);
y = input.getxy(x+2,y+2);

For a 3 x 3 neighbourhood of (x, y), the points a, b, c, d, f, g, h and i are
called the 8-neighbours of (x, y) and are also said to be 8-connected to
(x, y). The points b, d, f and h are called the 4-neighbours of (x, y) and are
said to be 4-connected to (x, y).
3.1.1 Monadic, Point-by-point Operators
These have a characteristic equation of the form: c(x, y) ⇐ g(a(x, y)). Such
an operation is performed for all (x, y) in the range [1, m] × [1, n] (Fig. 3.1).

Fig. 3.1. Monadic point-by-point operator.

Intensity Shift. Add a constant offset to a pixel intensity value. This definition was carefully designed to maintain c(x, y) within the same range as
the input, [0, W]. This is an example of a process referred to as intensity normalisation (i.e. it permits iterative processing in a machine having a limited
precision for arithmetic). Normalisation will be used frequently throughout
this chapter.

c(x,y) ⇐  0            if a(x,y) + k < 0
          a(x,y) + k   if 0 ≤ a(x,y) + k ≤ W
          W            if W < a(x,y) + k

where k is a user defined variable.
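A minimal sketch of this operator, in the style of the other listings in this chapter (it assumes the getxy/setxy access methods and the flow() clamping routine given in Sec. 3.1.4):

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    output.setxy(x,y,flow(input.getxy(x,y)+k));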


Intensity Multiply. Scales a given intensity value.

c(x,y) ⇐  0            if a(x,y) × k < 0
          a(x,y) × k   if 0 ≤ a(x,y) × k ≤ W
          W            if W < a(x,y) × k

Negate. c(x, y) ⇐ W − a(x, y). Generate a photographic negative of the
image (Fig. 3.2).
for(int y=O;y<height;y++)
for(int x=O;x<width;x++)
output.setxy(x,y,WHITE-input.getxy(x,y));

Fig. 3.2. Negate operation. (a) Input image. (b) Negate operator applied to (a).

Highlight. Emphasise a range of intensity values.

c(x,y) ⇐  k3       if k1 ≤ a(x,y) ≤ k2
          a(x,y)   otherwise
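A minimal sketch of the highlight operator (again assuming the getxy/setxy conventions; k1, k2 and k3 are the user defined values above):

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    int p = input.getxy(x,y);
    if((p >= k1) && (p <= k2))
      output.setxy(x,y,k3);
    else
      output.setxy(x,y,p);
  }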

Intensity Squaring. Square the intensity values. c(x, y) ⇐ (a(x, y))² / W.

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    output.setxy(x,y,(input.getxy(x,y)*input.getxy(x,y))/WHITE);

Threshold. This is an important function, which converts a grey scale image
to a binary format. Unfortunately, it is often difficult, or even impossible, to
find satisfactory values for the user defined integer threshold value thres
(Fig. 3.3).

c(x,y) ⇐  W   if a(x,y) ≥ thres
          0   otherwise


for(int y=O;y<height;y++)
for(int x=O;x<width;x++)
if(input.getxy(x,y) < thres)
output.setxy(x,y,BLACK);
else
output.setxy(x,y,WHITE);

Fig. 3.3. Under and over thresholding of the bottle-top image illustrated in
Fig. 3.2(a).

Mid-level Threshold. Threshold midway between the minimum and maximum grey level (i.e. at the grey scale value MIDGREY).
for(int y=O;y<height;y++)
for(int x=O;x<width;x++)
if(input.getxy(x,y) >= MIDGREY)
output.setxy(x,y,WHITE);
else
output.setxy(x,y,BLACK);

Dual Threshold. Values less than a user defined integer value lower and
greater than the user defined integer value upper are set to BLACK. Otherwise the pixel is set to WHITE (Fig. 3.4).

c(x,y) ⇐  W   if lower ≤ a(x,y) ≤ upper
          0   otherwise

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    if((input.getxy(x,y)>=lower)&&(input.getxy(x,y)<=upper))
      output.setxy(x,y,WHITE);
    else
      output.setxy(x,y,BLACK);


Fig. 3.4. Dual threshold operation applied to Fig. 3.2(a), with lower and upper
grey scale values of 105 and 184 respectively.

Triple Threshold. Values between BLACK and the user defined integer
value lower are set to BLACK. Values between the user defined integer value
upper and WHITE are set to WHITE. Otherwise the pixel is set to MIDGREY
(midway between the minimum and maximum grey level).

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    if((input.getxy(x,y)>=BLACK)&&(input.getxy(x,y)<=lower))
      output.setxy(x,y,BLACK);
    if((input.getxy(x,y)>=lower)&&(input.getxy(x,y)<=upper))
      output.setxy(x,y,MIDGREY);
    if((input.getxy(x,y)>=upper)&&(input.getxy(x,y)<=WHITE))
      output.setxy(x,y,WHITE);
  }

3.1.2 Intensity Histogram

The intensity histogram is defined in the following way:

s(p,x,y) ⇐  1   if a(x,y) = p
            0   otherwise

where h(p) = Σ_{x,y} s(p, x, y).

It is not necessary to store each of the s(p, x, y), since the calculation of the
histogram can be performed as a serial process in which the estimate of h(p)
is updated iteratively as we scan through the input image. The cumulative
histogram, H(p), can be calculated using the following recursive relation:
H(p) = H(p − 1) + h(p), where H(0) = h(0).
Both the cumulative (Fig. 3.5) and the standard histograms have a number of uses, as will become apparent later. It is possible to calculate various
intensity levels which indicate the occupancy of the intensity range. For example, it is a simple matter to determine that intensity level, p(k), which, when


used as a threshold parameter, ensures that a proportion k of the output image
is black. p(k) can be calculated using the fact that H(p(k)) = m × n × k. The
mean intensity is equal to Σ_p (h(p) × p) / (m × n), while the maximum intensity is equal to
MAX(p | h(p) > 0) and the minimum intensity is equal to MIN(p | h(p) > 0).
One of the principal uses of the histogram is in the selection of threshold
parameters. It is useful to plot h(p) as a function of p. It is often found from
this graph that a suitable position for the threshold can be related directly
to the position of the foot of the hill or to a valley in the histogram.
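The serial calculation described above can be sketched as follows (an illustrative fragment in the style of the other listings; a grey scale range of BLACK to WHITE is assumed):

int[] h = new int[WHITE+1]; // standard histogram
int[] H = new int[WHITE+1]; // cumulative histogram
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    h[input.getxy(x,y)]++;
H[0] = h[0];
for(int p=1; p<=WHITE; p++)
  H[p] = H[p-1] + h[p];
long total = 0;
for(int p=0; p<=WHITE; p++)
  total += (long)h[p]*p;
double mean = (double)total/(width*height); // mean intensity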

Fig. 3.5. Using the cumulative histogram to find the number of pixels in an image.
N is the total number of pixels in the image, Nt1 the total number of pixels
with a grey level ≤ t1. The total number of white pixels in the image after applying
the threshold t1 = N − Nt1.

3.1.3 Look-up Tables (LUT)

Look-up tables provide a useful mechanism for changing the properties of the
input image. They can be used to enhance the image contrast in both linear
and non-linear fashions (Fig. 3.6).

Fig. 3.6. Application of the threshold and inverse operations using look-up tables.


Exponential LUT. The LUT is modified such that the output grey levels
are the normalised exponential of the input image.

for(int index=BLACK;index<=WHITE;index++)
{
  intensity = (float) index/WHITE;
  intensity = (float) ((Math.exp(intensity)-1)/(Math.exp(1.0)-1));
  intensity = WHITE*intensity;
  LUT[index] = (int) intensity;
}

for(int y=O;y<height;y++)
for(int x=O;x<width;x++)
output.setxy(x,y,LUT[input.getxy(x,y)]);

Logarithmic LUT. The LUT is modified such that the output grey levels
are the normalised logarithm of the input image. This is useful in brightening
dark detail in images with a large dynamic range.

for(int index=BLACK;index<=WHITE;index++)
{
  intensity = (float) index/WHITE;
  intensity = (float) Math.log((intensity*(Math.exp(1)-1))+1);
  intensity = WHITE*intensity;
  LUT[index] = (int) intensity;
}

for(int y=O;y<height;y++)
for(int x=O;x<width;x++)
output.setxy(x,y,LUT[input.getxy(x,y)]);
nth Power LUT. This command modifies the LUT such that the output
grey levels are the current image pixel values raised to the power of a user defined
double value n. For example, if n = 2.0 it is equivalent to squaring the grey
levels and normalising (Fig. 3.7).

for(int index=BLACK;index<=WHITE;index++)
{
  intensity = (double) index/WHITE;
  intensity = Math.pow(intensity,n);
  intensity = intensity*WHITE;
  LUT[index] = (int) intensity;
}

for(int y=O;y<height;y++)
for(int x=O;x<width;x++)
output.setxy(x,y,LUT[input.getxy(x,y)]);

Histogram Equalization. An important operator for image enhancement
is given by the transformation: c(x, y) ⇐ (W × H(a(x, y))) / (m × n). The concept behind
histogram equalisation is to spread the grey levels of an image over the full
range of BLACK to WHITE in such a way that there are a roughly similar
number of pixels of each grey level (Fig. 3.10) (i.e. it tries to flatten the
histogram of the output image c(x, y)).


SHADES = 256;

// HISTOGRAM
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    histogram[input.getxy(x,y)]++;

// CUMULATIVE HISTOGRAM
for(int i=1; i<=WHITE; i++)
  histogram[i] += histogram[i-1];

// TOTAL PIXEL COUNT
int pixels = histogram[WHITE];

if(pixels != 0)
  for(int index=BLACK; index<=WHITE; index++)
    LUT[index] = (int)((histogram[index]*(SHADES-1))/pixels);

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    output.setxy(x,y,LUT[input.getxy(x,y)]);

Fig. 3.7. Look-up tables. (a) X-ray image of food jar. (Courtesy of M. Graves,
Spectral Fusion Technologies Ltd., UK). (b) nth power LUT applied to the x-ray
image, where n = 2. This is equivalent to squaring the grey levels and normalising.

Local Histogram Equalisation. This operation relies upon the application of histogram equalisation within a small local area or window. The number of pixels in a small window that are darker than the central pixel is
counted. This number defines the intensity at the equivalent point in the
output image. This is a powerful filtering technique, which is particularly
useful in texture analysis applications (Chap. 5). Fig. 3.8 illustrates the use
of local histogram equalisation for 3 x 3 and 5 x 5 regions.
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    count_pixel = 0;
    a = input.getxy(x-1,y-1);  b = input.getxy(x,y-1);
    c = input.getxy(x+1,y-1);  d = input.getxy(x-1,y);
    e = input.getxy(x,y);      f = input.getxy(x+1,y);
    g = input.getxy(x-1,y+1);  h = input.getxy(x,y+1);
    i = input.getxy(x+1,y+1);
    if(a < e) count_pixel++;
    if(b < e) count_pixel++;
    if(c < e) count_pixel++;
    if(d < e) count_pixel++;
    if(f < e) count_pixel++;
    if(g < e) count_pixel++;
    if(h < e) count_pixel++;
    if(i < e) count_pixel++;
    output.setxy(x,y,count_pixel);
  }

Fig. 3.8. Local histogram equalisation. (a) Input texture image. (b) 3 x 3 region.
(c) 5 x 5 region.

Contrast Enhancement. This operator modifies the LUT so that the output grey levels occupy the entire grey scale range from BLACK to WHITE
(Fig. 3.9). This is performed by assessing the maximum and minimum grey
levels in an image and stretching these levels over the range BLACK to
WHITE. This is particularly useful in low contrast images to increase the
apparent dynamic range (Fig. 3.10).
min = WHITE; max = BLACK;
for(int y=O;y<height;y++)
for(int x=O;x<width;x++)
{

current_pixel = input.getxy(x,y);
if (current_pixel < min)
min = current_pixel;
if (current_pixel > max)
max = current_pixel;
}


int LUT[] = new int[256];

for(int index=BLACK;index<min;index++)
  LUT[index]=BLACK;
for(int index=max; index<=WHITE; index++)
  LUT[index]=WHITE;
for(int index=min;index<max;index++)
  LUT[index]=(int) (WHITE*(float)(index-min)/(float)(max-min));

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    output.setxy(x,y,LUT[input.getxy(x,y)]);

Fig. 3.9. Stretching a histogram over the full range of grey scales to improve image
contrast. A possible threshold value is also indicated.

Contrast Stretching. This operator modifies the LUT so that the grey
levels in the user defined integer range min to max are mapped to the range
BLACK to WHITE. This is useful in examining details occurring in a small
part of the grey scale range. It is similar to contrast enhancement, except
that the range of levels to be enhanced is user defined and not automatically
generated.
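A minimal sketch of contrast stretching, following the LUT pattern of the contrast enhancement listing above (here min and max are the user defined limits rather than measured values):

int LUT[] = new int[256];
for(int index=BLACK;index<=WHITE;index++)
{
  if(index < min)
    LUT[index] = BLACK;
  else if(index > max)
    LUT[index] = WHITE;
  else
    LUT[index] = (int)(WHITE*(float)(index-min)/(float)(max-min));
}
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    output.setxy(x,y,LUT[input.getxy(x,y)]);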
3.1.4 Dyadic, Point-by-point Operators

These have a characteristic equation of the form: c(x, y) ⇐ h(a(x, y), b(x, y)).
It is important to realise that c(x, y) depends upon only a(x, y) and b(x, y)
(Fig. 3.11). Each of the operations described in this section determines which
input image has the greatest width (height) and uses that width (height) for
subsequent calculations (as outlined below).


if(inputA.getWidth() > inputB.getWidth())
  width = inputA.getWidth();
else
  width = inputB.getWidth();
if(inputA.getHeight() > inputB.getHeight())
  height = inputA.getHeight();
else
  height = inputB.getHeight();

Fig. 3.10. Images and their histograms. (a) Original image and its associated
histogram (b). (c) Contrast enhanced image and its associated histogram (d). (e)
Equalised image and its associated histogram (f).

Fig. 3.11. Dyadic point-by-point operator.

Add. c(x, y) ⇐ (a(x, y) + b(x, y)) / 2

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    output.setxy(x,y,(inputA.getxy(x,y)+inputB.getxy(x,y))/2);

Subtract. c(x, y) ⇐ flow(a(x, y) − b(x, y))

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    output.setxy(x,y,flow(inputA.getxy(x,y)
                          -inputB.getxy(x,y)));

The subroutine flow(num) traps overflow and underflow situations on an
integer variable num. flow is defined as follows:

if (num>WHITE) return(WHITE);
if (num<BLACK) return(BLACK);

Otherwise num is returned unchanged.

Multiply. c(x, y) ⇐ (a(x, y) × b(x, y)) / W

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    output.setxy(x,y,(inputA.getxy(x,y)
                      *inputB.getxy(x,y))/WHITE);

Divide. c(x, y) ⇐ (a(x, y) × W) / b(x, y)

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    if (inputB.getxy(x,y)==BLACK)
      output.setxy(x,y,WHITE);
    else
      output.setxy(x,y,flow((inputA.getxy(x,y)*WHITE)
                            /inputB.getxy(x,y)));

Maximum. c(x, y) ⇐ MAX[a(x, y), b(x, y)]. When applied to a pair of binary images, the union (OR function) of their white areas is computed. This
function may also be used to superimpose white writing onto a grey scale
image.


for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    if(inputA.getxy(x,y)>inputB.getxy(x,y))
      output.setxy(x,y,inputA.getxy(x,y));
    else
      output.setxy(x,y,inputB.getxy(x,y));

Minimum. c(x, y) ⇐ MIN[a(x, y), b(x, y)]. This operator finds the intersection (AND function) of the white areas of a pair of binary images. See
Fig. 3.12 for an example of this operator applied to two grey scale images.
for(int y=O;y<height;y++)
for(int x=O;x<width;x++)
if(inputA.getxy(x,y)<inputB.getxy(x,y))
output.setxy(x,y,inputA.getxy(x,y));
else
output.setxy(x,y,inputB.getxy(x,y));

Fig. 3.12. Minimum operator. (a) Input image A. (b) Input image B. (c) Minimum
of images A and B.

3.2 Local Operators

The concept behind local operators is that the intensities of several pixels
are combined together in order to calculate the intensity of just one pixel.
Amongst the simplest of the local operators are those which use a set of nine
pixels arranged in a 3 x 3 square region: the value for one pixel is computed
on the basis of the intensities within a region containing 3 x 3 pixels. Other
local operators employ larger windows and we shall discuss these briefly later.
Local operators have a characteristic equation of the following form: c(x, y) ⇐
g(p(x−1, y−1), p(x, y−1), p(x+1, y−1), p(x−1, y), p(x, y), p(x+1, y), p(x−1, y+1), p(x, y+1), p(x+1, y+1)). In the simplified Java notation which we
introduced earlier, the above definition reduces to: e ⇐ g(a, b, c, d, e, f, g, h, i)
(Fig. 3.13).


Fig. 3.13. Local operator. In this instance, the intensities of nine pixels arranged
in a 3 x 3 window are combined together. Local operators may be defined which
use other, possibly larger, windows. The window may or may not be square, and
the calculation may involve linear or non-linear processes.
Convolution. B(x, y) = A * M = Σ_u Σ_v A(u, v) × M(x − u, y − v), where
B(x, y) is the convolution of the input A with the mask or kernel, M. The
output image depends on the input image, A, and the mask, M, through
which it passes, therefore allowing the mask M to act as a filter. This is one
of the most powerful operators in machine vision as it allows the design of
filters which can enhance or attenuate certain aspects of the input image.
The general purpose NeatVision convolution programme uses an N x N user
defined mask; these coefficients must be in the form of a square, odd sized
integer array. Spaces or X's are assigned as don't care states.

int maskSize = (int)Math.sqrt(mask.length);
int maskOffset = (maskSize - 1)/2;
int z; int sum = 0;
for(int m=0; m<mask.length; m++)
  if (mask[m] != null)
    sum += mask[m].intValue();
for(int y=maskOffset;y<height-maskOffset;y++)
  for(int x=maskOffset;x<width-maskOffset;x++)
  {
    z = 0;
    for(int my=0; my<maskSize; my++)
      for(int mx=0; mx<maskSize; mx++)
        if (mask[mx+(my*maskSize)] != null)
          z += input.getxy(x+mx-maskOffset,y+my-maskOffset)
               *mask[mx+(my*maskSize)].intValue();
    if (sum == 0) z = (MIDGREY+(z>>1));
    if (sum > 1) z = z/sum;
    if (sum < 0) z = WHITE-(z/sum);
    output.setxy(x,y,flow(z));
  }

3.2.1 Linear Local Operators

An important class of local operator is that group which performs a linear
weighted sum; these are therefore known as linear local operators. For
this group, the characteristic equation is: e ⇐ k1 × (a×W1 + b×W2 + c×W3 +
d×W4 + e×W5 + f×W6 + g×W7 + h×W8 + i×W9) + k2, where
W1, W2, ..., W9 are weights, which may be positive, negative or zero. Values


for the normalisation constants, k1 and k2, are given later. The weight values
form a weight matrix which determines the properties of the linear local
operator. For example, in Fig. 3.14, we are using the 3 x 3 local operators
outlined below to highlight the horizontal and vertical elements within an
image.

Horizontal line mask =
  -1 -1 -1
   0  0  0
   1  1  1

and

Vertical line mask =
  -1  0  1
  -1  0  1
  -1  0  1

Fig. 3.14. Linear Local Operators: Highlighting horizontal and vertical lines.

The following rules summarise the behaviour of this type of operator (they
exclude the case where all the weights and normalisation constants are zero,
since this would result in a null image):

- If all weights are either positive or zero, the operator will blur the input image. Blurring is referred to as low-pass filtering. Subtracting a blurred image from the original results in a highlighting of those points where the intensity is changing rapidly and is termed high-pass filtering.
- If W1 = W2 = W3 = W7 = W8 = W9 = 0 and W4, W5, W6 > 0, then the operator blurs along the rows of the image; horizontal features, such as edges and streaks, are not affected.
- If W1 = W4 = W7 = W3 = W6 = W9 = 0 and W2, W5, W8 > 0, then the operator blurs along the columns of the image; vertical features are not affected.
- If W2 = W3 = W4 = W6 = W7 = W8 = 0 and W1, W5, W9 > 0, then the operator blurs along the diagonal (top-left to bottom-right). There is no smearing along the orthogonal diagonal.


If the weight matrix can be reduced to a matrix product of the form Q x P,
where P is a 1 x 3 row matrix and Q is a 3 x 1 column matrix, the operator
is said to be of the separable type. The importance of this is
that it is possible to apply two simpler operators in succession, with weight
matrices P and Q, in order to obtain the same effect as that produced by
the separable operator.
The successive application of linear local operators which use windows
containing 3 x 3 pixels produces the same results as linear local operators
with larger windows. For example, applying the operator which uses the
following weight matrix:

  1 1 1
  1 1 1
  1 1 1

twice in succession results in a similar image as that obtained from the
5 x 5 operator with the weight matrix illustrated below. For the sake of
simplicity, normalisation has been ignored here.

  1 2 3 2 1
  2 4 6 4 2
  3 6 9 6 3
  2 4 6 4 2
  1 2 3 2 1

Applying the same 3 x 3 operator for a third time is equivalent to using
the following 7 x 7 operator:

  1  3  6  7  6  3  1
  3  9 18 21 18  9  3
  6 18 36 42 36 18  6
  7 21 42 49 42 21  7
  6 18 36 42 36 18  6
  3  9 18 21 18  9  3
  1  3  6  7  6  3  1

Notice that all of these operators are also separable. Hence, it would be
possible to replace the last-mentioned 7 x 7 operator with four simpler


operators: 3 x 1, 3 x 1, 1 x 3 and 1 x 3, applied in any order. It is not
always possible to replace a large-window operator with a succession of
3 x 3 operators. This becomes obvious when one considers, for example,
that a 7 x 7 operator uses 49 weights and that three 3 x 3 operators provide
only 27 degrees of freedom. However, separation is often possible when the
larger operator has a weight matrix with some redundancy, for example
when it is symmetrical.
In order to perform normalisation, the following values are used for k1 and k2:

  k1 ⇐ 1 / Σ_{p,q} |W_{p,q}|                                    (3.1)

  k2 ⇐ [1 − (Σ_{p,q} W_{p,q}) / (Σ_{p,q} |W_{p,q}|)] × W/2      (3.2)
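For illustration, the two constants of Eqs. (3.1) and (3.2) can be computed from an integer array W[] holding the weights (a sketch, not part of the original listings; WHITE is the white level W of the equations):

double sum = 0, absSum = 0;
for(int i=0; i<W.length; i++)
{
  sum += W[i];
  absSum += Math.abs(W[i]);
}
double k1 = 1.0/absSum;                   // Eq. (3.1)
double k2 = (1.0 - sum/absSum)*WHITE/2.0; // Eq. (3.2)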

Rectangular Average Filter. This command performs a rectangular average filter of size (2k+1) x (2k+1) (Fig. 3.15). It is equivalent to a convolution
of that size in which all the coefficients are ones. This is effectively the same
as the Gaussian (Sec. 3.2.3), except that no multiplications are required (as
all coefficients are one) and therefore it is quicker than the Gaussian. When
dealing with such large filters the designer must account for the resultant
edge effects (Sec. 3.2.5).
A filter using an 11 x 11 weight matrix performs a local averaging function.
This produces quite a severe 2-directional blurring effect. Subtracting the
effects of a blurring operation from the original image generates a picture in
which spots, streaks and intensity steps are all emphasised (Fig. 3.15(d)). On
the other hand, large areas of constant or slowly changing intensity become
uniformly grey.
k = size;
l = (2*k)+1;
offset = k+1;
for(int y=offset;y<height-offset;y++)
  for(int x=offset;x<width-offset;x++)
  {
    z = 0;
    for(int i=-k; i<=k; i++)
      z += input.getxy(x+i,y);
    buffer0.setxy(x,y,z/l);
  }

for(int x=offset;x<width-offset;x++)
  for(int y=offset;y<height-offset;y++)
  {
    z = 0;
    for(int j=-k; j<=k; j++)
      z += buffer0.getxy(x,y+j);
    output.setxy(x,y,z/l);
  }


Fig. 3.15. Rectangular averaging filter. (a) Original image. (b) 3 x 3. (c) 11 x 11.
(d) The result of subtracting an 11 x 11 local averaging function from (a). A grey
scale offset has been added to the image to enhance its visibility.

Adaptive Thresholding in a 3 x 3 Region. Threshold = mean of the 3 x 3
region [a, b, c, d, e, f, g, h, i] − offset (α), where alpha is a user defined offset
value (in NeatVision the default offset value is set to zero) (Fig. 3.16).

for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    mean = (a+b+c+d+e+f+g+h+i)/9;
    threshold = mean - alpha;
    if(e < threshold)
      output.setxy(x,y,BLACK);
    else
      output.setxy(x,y,WHITE);
  }

Adaptive Thresholding in a 5 x 5 Region. Threshold = mean of a 5 x 5 region [a, b, c, ..., y] - offset (α), where α is a user defined offset value (in NeatVision the default offset value is set to zero) (Fig. 3.16).


for(int yc=2;yc<height-2;yc++)
for(int xc=2;xc<width-2;xc++)
{

mean = (a+b+c+d+e+f+g+h+i+j+k+l+m+n+o+
p+q+r+s+t+u+v+w+x+y)/25;
threshold = mean - alpha;
if(m < threshold)
output.setxy(xc,yc,BLACK);
else
output.setxy(xc,yc,WHITE);
}

Fig. 3.16. Adaptive thresholding of the textured image illustrated in Fig. 3.8(a).
(a) 3 x 3 region. (b) 5 x 5 region. In both cases the offset value is set to zero.

3.2.2 Non-linear Local Operators


Largest Intensity Neighbourhood. e ⇐ MAX(a, b, c, d, e, f, g, h, i). This operator has the effect of spreading bright regions and contracting dark ones (Fig. 3.17).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{

z = Math.max(Math.max(Math.max(Math.max(a,b),Math.max(c,d)),
Math.max(Math.max(e,f),Math.max(g,h))),i);
output.setxy(x,y,z);
}

Smallest Intensity Neighbourhood. e ⇐ MIN(a, b, c, d, e, f, g, h, i). This operator has the effect of spreading dark regions and contracting bright ones (Fig. 3.17).


for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{

z = Math.min(Math.min(Math.min(Math.min(a,b),Math.min(c,d)),
Math.min(Math.min(e,f),Math.min(g,h))) ,i);
output.setxy(x,y,z);
}

Fig. 3.17. Intensity Neighbourhood operator applied to the textured image illustrated in Fig. 3.8(a). (a) Largest Intensity Neighbourhood. (b) Smallest Intensity Neighbourhood.

Edge Detector. e ⇐ MAX(a, b, c, d, e, f, g, h, i) - e. This operator highlights edges within an image (i.e. points where the intensity is changing rapidly).
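A minimal sketch of this operator is given below, using a plain 2-D int array in place of the NeatVision image classes (the class name is hypothetical and border pixels are simply left at zero):

class MaxMinusCentreSketch {
    // e <= MAX(a..i) - e : large values indicate rapid local intensity change.
    static int[][] apply(int[][] in) {
        int height = in.length, width = in[0].length;
        int[][] out = new int[height][width];
        for (int y = 1; y < height - 1; y++)
            for (int x = 1; x < width - 1; x++) {
                int max = 0;
                for (int j = -1; j <= 1; j++)
                    for (int i = -1; i <= 1; i++)
                        max = Math.max(max, in[y + j][x + i]);
                out[y][x] = max - in[y][x];
            }
        return out;
    }
}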
Median Filter. e ⇐ FIFTH LARGEST(a, b, c, d, e, f, g, h, i). This filter is particularly useful for reducing the level of noise in an image. Noise is generated from a range of sources, such as video cameras and x-ray detectors, and can be problematic if it is not eliminated by hardware or software filtering (Fig. 3.18).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  // Load the 3 x 3 neighbourhood into b[0]..b[8].
  b[0] = input.getxy(x-1,y-1);
  b[1] = input.getxy(x,y-1);
  b[2] = input.getxy(x+1,y-1);
  b[3] = input.getxy(x-1,y);
  b[4] = input.getxy(x,y);
  b[5] = input.getxy(x+1,y);
  b[6] = input.getxy(x-1,y+1);
  b[7] = input.getxy(x,y+1);
  b[8] = input.getxy(x+1,y+1);
  // Bubble sort the nine values; the median is then b[4].
  fin = -1;
  while(fin!=0)
  {
    fin = 0;
    for(int i=0;i<8;i++)
    {
      if(b[i+1]>b[i])
      {
        swp = b[i];
        b[i] = b[i+1];
        b[i+1] = swp;
        fin++;
      }
    }
  }
  output.setxy(x,y,b[4]);
}

Midpoint. This filter is based on the mid-value of pixels in a 3 x 3 neighbourhood (Fig. 3.18).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  high = Math.max(i,Math.max(Math.max(Math.max(a,b),Math.max(c,d)),
         Math.max(Math.max(e,f),Math.max(g,h))));
  low  = Math.min(i,Math.min(Math.min(Math.min(a,b),Math.min(c,d)),
         Math.min(Math.min(e,f),Math.min(g,h))));
  z = (high+low)/2;
  output.setxy(x,y,z);
}

Low-pass (Mean). This is equivalent to a 3 x 3 convolution with a mask


in which all the coefficients are set to 1. The result is normalised by dividing
by 9 (Fig. 3.18).
for(int y=1;y<height-1;y++)
for(int x=1;x<width-1;x++)
{

z = (a+b+c+d+e+f+g+h+i)/9;
output.setxy(x,y,z);
}

Difference of Low-pass Filters (DOLPS). The DOLPS filter is another commonly used operation (Fig. 3.18).
buffer0 = LowPass.lowPass(LowPass.lowPass(LowPass.lowPass(input)));
buffer1 = LowPass.lowPass(LowPass.lowPass(LowPass.lowPass(buffer0)));
output  = SingleThreshold.threshold(subtract(buffer0,buffer1),128);

Sharpen. This operator emphasises high frequency regions within an image (Fig. 3.18).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  z = flow((9*e)-a-b-c-d-f-g-h-i);
  output.setxy(x,y,z);
}

Crack Detector. This operator is equivalent to applying the sequence of operations Largest Intensity Neighbourhood (LIN), LIN, Negate, LIN,
LIN, Negate and then subtracting the result from the original image. This
detector is able to detect thin dark streaks and small dark spots in a grey
scale image. It ignores other features, such as bright spots and streaks, edges
(intensity steps) and broad dark streaks (Fig. 3.18).
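The sequence can be sketched as follows. The helper names lin(), negate() and subtract() are illustrative only (they are not the NeatVision method names), and subtract() is taken here to be the absolute difference so that the thin dark streaks appear bright regardless of the order of subtraction:

class CrackDetectorSketch {
    // Crack detector: LIN, LIN, Negate, LIN, LIN, Negate, then difference with the original.
    static int[][] crack(int[][] in) {
        int[][] p = negate(lin(lin(in)));
        p = negate(lin(lin(p)));
        return subtract(in, p);      // thin dark streaks and spots appear bright
    }
    static int[][] lin(int[][] in) { // Largest Intensity Neighbourhood
        int h = in.length, w = in[0].length;
        int[][] out = new int[h][w];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++)
                for (int j = -1; j <= 1; j++)
                    for (int i = -1; i <= 1; i++)
                        out[y][x] = Math.max(out[y][x], in[y + j][x + i]);
        return out;
    }
    static int[][] negate(int[][] in) {
        int h = in.length, w = in[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = 255 - in[y][x];
        return out;
    }
    static int[][] subtract(int[][] a, int[][] b) { // absolute difference
        int h = a.length, w = a[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = Math.abs(a[y][x] - b[y][x]);
        return out;
    }
}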
Rank Filters. The generalised 3 x 3 rank filter is: c(x, y) ⇐ k1(L1·W1 + L2·W2 + L3·W3 + L4·W4 + L5·W5 + L6·W6 + L7·W7 + L8·W8 + L9·W9) + k2, where (L1, L2, ..., L9) is a sorted version (largest first, L1) of the input pixel values (a, b, c, d, e, f, g, h, i) and k1 and k2 are the normalization constants. With the appropriate choice of weights (W1, W2, ..., W9), the rank filter can be used for a range of operations including edge detection, noise reduction, edge sharpening and image enhancement.
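A sketch of such a generalised rank filter on a plain 2-D int array is shown below; the weights and the constants k1 and k2 are supplied by the caller, and java.util.Arrays performs the sort (the largest value L1 is read from the end of the sorted array). The class name is hypothetical.

import java.util.Arrays;

class RankFilterSketch {
    // c(x,y) <= k1*(L1*W1 + ... + L9*W9) + k2, with L1 the largest neighbourhood value.
    static int[][] rankFilter(int[][] in, double[] w, double k1, double k2) {
        int h = in.length, wd = in[0].length;
        int[][] out = new int[h][wd];
        int[] n = new int[9];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < wd - 1; x++) {
                int idx = 0;
                for (int j = -1; j <= 1; j++)
                    for (int i = -1; i <= 1; i++)
                        n[idx++] = in[y + j][x + i];
                Arrays.sort(n);                 // ascending order
                double acc = 0.0;
                for (int r = 0; r < 9; r++)
                    acc += w[r] * n[8 - r];     // n[8] is L1, the largest
                out[y][x] = (int) Math.round(k1 * acc + k2);
            }
        return out;
    }
}

For example, a median filter corresponds to W5 = 1 with all other weights zero, k1 = 1 and k2 = 0.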
Direction Codes. This function can be used to detect the direction of the intensity gradient. The direction code function DIRCODE is defined as e ⇐ DIRCODE(a, b, c, d, f, g, h, i), where

DIRCODE(a, b, c, d, f, g, h, i) ⇐
  1 if a ≥ MAX(b,c,d,f,g,h,i)
  2 if b ≥ MAX(a,c,d,f,g,h,i)
  3 if c ≥ MAX(a,b,d,f,g,h,i)
  4 if d ≥ MAX(a,b,c,f,g,h,i)
  5 if f ≥ MAX(a,b,c,d,g,h,i)
  6 if g ≥ MAX(a,b,c,d,f,h,i)
  7 if h ≥ MAX(a,b,c,d,f,g,i)
  8 if i ≥ MAX(a,b,c,d,f,g,h)
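A direct sketch of this function for a single 3 x 3 neighbourhood is given below (hypothetical class name; ties are resolved in favour of the lowest code):

class DirCodeSketch {
    // Returns 1..8 according to which neighbour holds the largest value.
    static int dircode(int a, int b, int c, int d, int f, int g, int h, int i) {
        int[] n = { a, b, c, d, f, g, h, i };
        int code = 1;
        for (int k = 1; k < n.length; k++)
            if (n[k] > n[code - 1])   // strictly larger: ties keep the earlier code
                code = k + 1;
        return code;
    }
}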

3.2.3 Edge Detectors

An edge is a collection of pixels whose brightness values change abruptly. Edges represent borders between regions on an object or in a scene.
The detection of these edge pixels is often a key operation in machine vision.
This is especially the case for industrial vision where the majority of manufactured products have clearly defined edge regions which can be used as
landmarks for image analysis and calibration. As will become evident from the
discussion that follows, edge pixels are calculated in terms of their relationship to their neighbouring pixels. An edge pixel has an associated magnitude
and gradient.


Fig. 3.18. Filtering of the textured image illustrated in Fig. 3.8(a). (a) Median.
(b) Midpoint. (c) Low-pass filter. (d) Difference of low-pass filters (DOLPS). (e)
Sharpen. (f) Crack detection.


Laplacian Edge Operator. Performs a Laplacian second-order derivative


function on the current image (Figs. 3.19 and 3.20). This edge operator is
seldom used for edge detection in machine vision applications as it contains
no useful directional information and it enhances unwanted high frequency
noise in the image (Fig. 3.21). The thresholded magnitude of the Laplacian
produces a double edge. This can be used to find if a pixel falls on the
dark or light side of a detected edge (Fig. 3.19). Two forms are implemented in
Neat Vision, the first is based on using a 4-neighbourhood convolution kernel.
The second, is based on using an 8-neighbourhood convolution kernel.
laplacian4(input)
{
  for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    b = input.getxy(x,y-1);
    d = input.getxy(x-1,y);
    e = input.getxy(x,y);
    f = input.getxy(x+1,y);
    h = input.getxy(x,y+1);
    // 4-neighbourhood Laplacian, offset to mid-grey and halved.
    z = MIDGREY + (((4*e)-b-d-f-h)>>1);
    output.setxy(x,y,z);
  }
  return(output);
}

laplacian8(input)
{
  for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    a = input.getxy(x-1,y-1); b = input.getxy(x,y-1);
    c = input.getxy(x+1,y-1); d = input.getxy(x-1,y);
    e = input.getxy(x,y);     f = input.getxy(x+1,y);
    g = input.getxy(x-1,y+1); h = input.getxy(x,y+1);
    i = input.getxy(x+1,y+1);
    // 8-neighbourhood Laplacian, offset to mid-grey and halved.
    z = MIDGREY + (((8*e)-a-b-c-d-f-g-h-i)>>1);
    output.setxy(x,y,z);
  }
  return(output);
}

Roberts Edge Detector. e ⇐ √((a - e)² + (b - d)²). A number of edge operators are described by a collection of masks (kernels), which measure the gradient for certain directions. The simplest gradient edge detector, the Roberts cross operator, computes the first derivatives in two orthogonal directions. This is calculated from cross differences, implemented using two 2 x 2 convolution kernels. An approximate version of this operator also exists; this is referred to as the Modified Roberts operator (it is this operator that is implemented in NeatVision). This is simpler and faster to implement and is defined as: e ⇐ |a - e| + |b - d| (Figs. 3.20 and 3.21).

Fig. 3.19. Examination of the effect of the first and second derivative when applied
to a line profile.

for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  a = input.getxy(x-1,y-1);
  b = input.getxy(x,y-1);
  d = input.getxy(x-1,y);
  e = input.getxy(x,y);
  z1 = Math.abs(a-e);
  z2 = Math.abs(b-d);
  z3 = z1+z2;
  if(z3>WHITE) z3 = WHITE;
  output.setxy(x,y,z3);
}

Sobel Edge Detector. The Sobel edge detector (which is equivalent to a first derivative) uses a 3 x 3 mask to determine the edge gradient.

e ⇐ √([(a + 2b + c) - (g + 2h + i)]² + [(a + 2d + g) - (c + 2f + i)]²)

The derivatives are calculated using the two convolution masks which are then summed. These have an inherent smoothing effect, which makes the Sobel very effective in the presence of noise. As with the Roberts operator, the Sobel edge detector is approximated for simpler and faster implementation:

e ⇐ (|(a + 2b + c) - (g + 2h + i)| + |(a + 2d + g) - (c + 2f + i)|)/6


for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  z1 = Math.abs(a+(b<<1)+c-g-(h<<1)-i);
  z2 = Math.abs(a+(d<<1)+g-c-(f<<1)-i);
  z3 = flow((z1+z2)>>2);
  output.setxy(x,y,z3);
}

While the Roberts operator produces thinner edges, these edges tend to
break up in regions of high curvature. Roberts edges also tend to have poor
localization. The primary disadvantage of the Roberts operator is its high
sensitivity to noise, since fewer pixels are used in the calculation of the edge
gradient (Fig. 3.20). There is also a slight shift in the image, when the Roberts
edge detector is used. The Sobel edge detector does not produce such a shift
(Fig. 3.21).

Fig. 3.20. Comparison of the Laplacian, Roberts and Sobel edge operators on a
linear edge segment.

":1 --:> <::' C>

["

.~

: <1

'1

'7
'-.1

(a)

I'~

'-'

(b)

,,>

(c)

Fig. 3.21. Edge detection applied to the image illustrated in Fig. 3.12(a). (a)
Laplacian. (b) Roberts. (c) Sobel.

Prewitt Edge Detector. The Prewitt edge detector is similar in nature to the Sobel operator, but is more sensitive to noise as it does not possess the same inherent smoothing. This operator uses the two 3 x 3 masks shown below to determine the edge gradient:

-1 -1 -1           -1  0  1
 0  0  0    and    -1  0  1
 1  1  1           -1  0  1

where P1 and P2 are the values calculated from each mask respectively. The Prewitt gradient magnitude is defined as: e ⇐ √(P1² + P2²). This operator can also be approximated by the Modified Prewitt edge detector. The derivative is approximated by differencing pixels on either side of the centre pixel. The absolute values given by each of the convolution kernels are added together and the centre pixel replaced.
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  z1 = Math.abs(a+b+c-g-h-i);
  z2 = Math.abs(a+d+g-c-f-i);
  z3 = (z1+z2)/3;
  if(z3>WHITE) z3 = WHITE;
  output.setxy(x,y,z3);
}

Frei and Chen Edge Detector. This operator uses the two 3 x 3 masks shown below to determine the edge gradient:

-1  -√2  -1            -1    0    1
 0    0   0     and    -√2   0   √2
 1   √2   1            -1    0    1

where F1 and F2 are the values calculated from each mask respectively. The Frei and Chen gradient magnitude is defined as: e ⇐ √(F1² + F2²). A modified version of this operator (using absolute values) can be given as:
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  z1 = Math.abs(a+(b*Math.sqrt(2.0))+c-g-(h*Math.sqrt(2.0))-i);
  z2 = Math.abs(a+(d*Math.sqrt(2.0))+g-c-(f*Math.sqrt(2.0))-i);
  z3 = (int)((z1+z2)/(2+Math.sqrt(2.0)));
  if(z3>WHITE) z3 = WHITE;
  output.setxy(x,y,z3);
}


Detect Zero-Crossings. This is a simplistic approach to calculating zero


crossings (Fig. 3.22) and should not be confused with the Marr-Hildreth edge
detector (Marr & Hildreth 1980).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  if(((d<MIDGREY)&&(f>MIDGREY)&&(e<f)&&(e>d)) ||
     ((b<MIDGREY)&&(h>MIDGREY)&&(e<h)&&(e>b)) ||
     ((d>MIDGREY)&&(f<MIDGREY)&&(e>f)&&(e<d)) ||
     ((b>MIDGREY)&&(h<MIDGREY)&&(e>h)&&(e<b)))
    output.setxy(x,y,WHITE);
  else
    output.setxy(x,y,BLACK);
}

Fig. 3.22. Detection of zero-crossings when applied to the image illustrated in Fig. 3.8(a).

Marr-Hildreth Edge Detector. Marr & Hildreth (1980) proposed an approach to edge detection based on finding the zero crossings of the Laplacian
of a Gaussian (commonly referred to as LoG). The image is initially smoothed
with a Gaussian G(x, y) low-pass filter window which removes noise and spurious edges. The Laplacian is then applied followed by marking pixels for
which there is a zero crossing as edge pixels. This approach does not give an
indication of edge direction.
The 2-D Gaussian is defined as:

G(x, y) = k e^(-(x² + y²)/(2σ²))        (3.3)

where x and y are the pixel coordinates, k is a scaling constant and σ is the standard deviation of the associated probability distribution. Fortunately, the convolution of the 2-D Gaussian can be effected by two convolutions with 1-D Gaussian functions, i.e. G(x, y) * f(x, y) = G(x) * (G(y) * f(x, y)). Therefore the Laplacian of a Gaussian (LoG) is defined as:

∇²[G(x, y) * f(x, y)] = [∇²G(x, y)] * f(x, y)        (3.4)


where * indicates convolution and ∇² is the Laplacian second-order derivative function.
Canny Edge Detection. This approach to edge detection was initially developed in the early 1980s to be optimal for step edges containing white noise (Canny 1986). Although quite powerful (it has become one of the more popular edge detection algorithms in computer vision), its computational expense makes it less attractive for real-time applications. This edge detector was based on several constraints:

Maximising the signal to noise ratio in order to achieve a reliable detection.
Minimising the distance between the actual and located position of the edge.
Minimising multiple responses to a single edge. This criterion is very important, especially when the input image is corrupted by noise.
The Canny edge detector returns both the magnitude and the direction
of the gradient. Canny developed an exponential function that is very similar
to the first derivative of a Gaussian (as defined in Eqn. 3.3). The normal to
an edge n can be expressed as:

n = ∇(G * f(x, y)) / |∇(G * f(x, y))|        (3.5)

where ∇ is the gradient operator, f(x, y) is the input image and G is a 2-D Gaussian. The edge is located at the maximum in the direction n of the first derivative of G in the same direction:

(∂/∂n)(∂G/∂n * f(x, y)) = (∂²G/∂n²) * f(x, y) = 0        (3.6)

Canny edge detection involves convolving the input image with a Gaussian of scale σ, after which the partial derivatives are computed and the magnitude and orientation results are recorded. Spurious responses can be reduced by thresholding applied with hysteresis (two thresholds are defined) (Canny 1986). If the gradient magnitude of the contour is greater than the higher threshold then it is considered a valid edge point. Any candidate edge pixels that are connected to valid edge points and are above the lower threshold are also considered edge points. The final step involves the removal of edge pixels that are not local maxima; this is referred to as non-maximum suppression.
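The hysteresis stage on its own can be sketched as follows. Starting from pixels whose gradient magnitude exceeds the high threshold, connected pixels above the low threshold are accepted recursively. This is a simplified illustration of the double-threshold linking step only, not the full Canny detector; the class name is hypothetical and the recursion could be replaced by an explicit stack for very large images.

class HysteresisSketch {
    // mag is the gradient magnitude image; returns a binary (0/255) edge map.
    static int[][] hysteresis(int[][] mag, int low, int high) {
        int h = mag.length, w = mag[0].length;
        int[][] edge = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (mag[y][x] >= high && edge[y][x] == 0)
                    follow(mag, edge, x, y, low);
        return edge;
    }
    // Depth-first growth of an edge contour through pixels above the low threshold.
    static void follow(int[][] mag, int[][] edge, int x, int y, int low) {
        edge[y][x] = 255;
        for (int j = -1; j <= 1; j++)
            for (int i = -1; i <= 1; i++) {
                int nx = x + i, ny = y + j;
                if (ny >= 0 && ny < mag.length && nx >= 0 && nx < mag[0].length
                        && edge[ny][nx] == 0 && mag[ny][nx] >= low)
                    follow(mag, edge, nx, ny, low);
            }
    }
}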
While the correct scale for the operator depends on the type of objects under examination in the image, differing scales can be examined by varying the standard deviation σ of the Gaussian. The Canny edge detector gives stable results, but junction connectivity is relatively poor.
While a number of alternative edge operators do exist these are usually
not required for the majority of machine vision applications and as such will
not be covered here. Many machine vision engineers hold the view that if
your edges are not easily detected then you should examine your lighting


Fig. 3.23. Canny edge detection, with the low and high thresholds set to 2 and 3 respectively. (a) σ = 0.5. (b) σ = 2.5.

configuration rather than use a more complicated edge operator. While this
oversimplifies the case, it does hold a certain degree of truth. A redesigned
lighting system which can enhance the edges does not add to the system's
computational load and hence can be an attractive option in the development
of real-time vision systems.
3.2.4 N-tuple Operators

N-tuple operators may be regarded as generalised versions of local operators


(Fig. 3.24). In order to understand the N-tuple operators, let us first consider
a linear local operator which uses a large processing window, (say r x s pixels)
with most of its weights equal to zero. An N-tuple filter is one in which N
of the weights are non-zero, where N << r x s. N-tuple filters are usually designed to detect specific patterns. In this role, they are able to locate a simple feature, such as a corner, an annulus, or the numeral '2'. However, they are
sensitive to changes of orientation and scale (Fig. 3.25).


Fig. 3.24. An N-tuple filter operates much like a local operator. The only difference
is that the pixels whose intensities are combined together do not form a compact
set. A linear N-tuple filter can be regarded as being equivalent to a local operator
which uses a large window and in which many of the weights are zero.


Fig. 3.25. Recognising a numeral '2' using an N-tuple.

The N-tuple can be regarded as a sloppy template, which is convolved


with the input image. Non-linear tuple operators may be defined in a fairly
obvious way. For example, we may define operators which compute the average, maximum, minimum or median values of the intensities of the N pixels
covered by the N-tuple. An important class of such functions is the morphological operators, these are discussed in detail in Chap. 4. Fig. 3.26 illustrates
how the N-tuple operator can be used to highlight the Arial font character 't'
in a sentence for a given orientation and scale. The detected characters are
overlaid with a small square box. The convolution mask used in this example
is illustrated in Fig. 3.26(c). The associated NeatVision visual workspace is
illustrated in Fig. 3.26(d).

Measures of Similarity. Two main types of similarity measurement are used in determining a good fit between template and test images.

Euclidean Distance Based: E(m, n) = √(Σi Σj [g(i, j) - t(i - m, j - n)]²), where t(x, y) is the template image and g(x, y) is the test image. In the general 2-D case this becomes: d = √((x1 - x2)² + (y1 - y2)²).

Cross-Correlation Based: R(m, n) = Σi Σj g(i, j) × t(i - m, j - n). The template image, t(i - m, j - n), and the section of g(i, j) in the vicinity of (m, n) are similar when the cross-correlation is large.
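The cross-correlation measure can be written down directly. The sketch below computes R(m, n) for a template t whose top-left corner is placed at (m, n) of the test image g, and searches for the position of the largest response; both images are plain 2-D int arrays and the class name is hypothetical.

class CrossCorrelationSketch {
    // R(m,n): sum over the template area of g(i,j) * t(i-m, j-n).
    static long crossCorrelation(int[][] g, int[][] t, int m, int n) {
        long r = 0;
        for (int j = 0; j < t.length; j++)
            for (int i = 0; i < t[0].length; i++)
                r += (long) g[n + j][m + i] * t[j][i];
        return r;
    }
    // The best match position is the (m,n) giving the largest response.
    static int[] bestMatch(int[][] g, int[][] t) {
        int bestM = 0, bestN = 0;
        long best = Long.MIN_VALUE;
        for (int n = 0; n <= g.length - t.length; n++)
            for (int m = 0; m <= g[0].length - t[0].length; m++) {
                long r = crossCorrelation(g, t, m, n);
                if (r > best) { best = r; bestM = m; bestN = n; }
            }
        return new int[] { bestM, bestN };
    }
}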

Fig. 3.26. Template matching using the N-tuple operator. (a) Original image. (b) Detection of the Arial font 't' character using an N-tuple operator. (c) Convolution mask, where a dark grid entry indicates a '1' and a clear grid indicates a "don't care" state. (d) Associated visual workspace.

3.2.5 Edge Effects

All local operators and N-tuple filters are susceptible to producing peculiar effects around the edges of an image. In order to calculate the intensity of a point near the edge of an image, we require information about pixels outside the image, which of course are simply not present. In order to make some attempt at calculating values for the edge pixels, it is necessary to make some assumptions, for example that all points outside the image are black, or have the same values as the border pixels. This strategy, or whatever one we adopt, is perfectly arbitrary and there will be occasions when the edge effects are so pronounced that there is nothing that we can do but to remove them by masking. Edge effects are important because they require us to make special provisions for them when we try to patch several low-resolution images together.
3.2.6 Grey Scale Corner Detection
A corner can be defined as a position in the 2-D array of brightness pixels
which humans would associate with the word 'corner', i.e. points where the
gradient direction changes rapidly and points which correspond to physical
corners of objects. For simple binary scenes, corners may be modelled as
the junction point of two or more straight line edges (Sec. 3.3.2). But the
detection of grey level corners is generally more difficult. Grey level corners
have been defined as: the location of maximum planar curvature in the locus
line of steepest grey level, Fig. 3.27. One problem with this model of corners
is that low gradient but highly curved corners will be missed (these would be
common in real-life images). The ability to extract stable corner features in
a computationally efficient manner is critical to many spatial and temporal
object tracking techniques (Blake & Isard 1998).

Fig. 3.27. Grey scale corner.

Smallest Univalue Segment Assimilating Nucleus (SUSAN). In the more general sense, when we are discussing grey scale corners we are referring to the features formed at boundaries (i.e. a wide variety of junctions) between only two image brightness regions, where the boundary curvature is sufficiently high. Smith (1992) developed one such grey scale corner detector, SUSAN¹. This computationally efficient approach to the detection of grey scale corners is based on the application of a small circular mask (the centre pixel of which is referred to as the nucleus) to an image. As this mask is moved over the image in a raster fashion it examines the local information to determine the likelihood of a valid grey scale corner (Fig. 3.28).

¹ Full details of SUSAN low level image processing, including C program source code, can be found at http://www.fmrib.ox.ac.uk/~steve/susan/index.html. This program is Copyright 1995-1997 Defence and Evaluation Research Agency, UK.


Fig. 3.28. SUSAN default mask size 5 x 5, with 3 pixels added onto each edge, giving a reasonable approximation to a circle.

The mask is placed at each point in the input image. The operator works by comparing the brightness of the nucleus to the brightness of each pixel within the mask. All pixels surrounding this point with a similar intensity (i.e. within a given threshold) are then grouped together into a univalue segment assimilating nucleus or USAN. The area of the Smallest USAN (SUSAN) is then compared to a predefined value to decide if a corner exists. The comparison function is given by:

c(r, r0) = 1 if |I(r) - I(r0)| ≤ tb, and 0 otherwise        (3.7)

where r0 is the position of the nucleus, r is the position of any other pixel within the mask, I() represents the brightness of the corresponding pixel and tb is the brightness difference threshold. While this basic approach works well for simulated scenes, the algorithm invokes a number of additional noise filtering stages for real-world images. Specifically, Smith (1992) has shown that improved results can be achieved by calculating c(r, r0) as follows:

c(r, r0) = e^(-((I(r) - I(r0))/tb)^6)        (3.8)

This allows a pixel's brightness to vary slightly without having a large effect on c(r, r0).
The final stage is to count the number of pixels within the USAN region (i.e. the USAN's area), n(r0) = Σr c(r, r0). This value is compared with a geometric threshold tg to determine the presence of a valid grey scale corner at the nucleus. The brightness threshold tb controls the contrast allowed in the detected corners and therefore the number of corners detected in the image. The geometric threshold tg also has an effect on the number of corners found, but more importantly it relates to the shape of the corners detected; lowering this value in effect forces sharper corners (Fig. 3.29). Smith & Brady (1997) argued that for a corner to be detected n < nmax/2, where nmax is the maximum value of n. In fact, nmax = 3700: since we have 37 pixels in the mask illustrated in Fig. 3.28, nmax = 37 × the maximum value of c. Generally the geometric threshold is fixed at tg = nmax/2. In the example illustrated in Fig. 3.30 a corner is detected if the area of the Smallest USAN is less than half the area of the mask (MA). With reference to Fig. 3.30 (Smith 1992):


Fig. 3.29. SUSAN threshold selection. The type of corner detected is dependent on the brightness and geometric thresholds. (a) Strong corner, tb < α, where α is a nominal value (e.g., 16). (b) tb = α. (c) Weak corner, tb > α. (d) Strong corner, tg < β, where β is a nominal value. (e) tg = β. (f) Weak corner, tg > β.

Mask A: The area of the smallest USAN (shaded region within the mask) is less than half the mask area, thus a corner is correctly detected here.
Mask B: The area of the smallest USAN (white region within the mask) is greater than half the mask area, therefore no corner is detected.
Mask C: The USAN area is exactly half that of the mask. No corner is detected. This could also be classified as an edge.
Mask D: Clearly the USAN area is too large. No corner is detected.
Mask E: The area of the smallest USAN (shaded region within the mask) is greater than half the mask area. No corner is detected.

Fig. 3.30. SUSAN operator applied to a simple scene.


A further processing step involves removing any false positives that might occur. For example, in Fig. 3.31 the area of the USAN is indeed less than half of the area of the mask, however it is not appropriate to classify it as a corner. To combat this problem we calculate the centre of gravity of the region within the mask. In Fig. 3.31 the centre of gravity is very close to the centre of the mask. This however is not the case when a corner is detected. This simple addition to the algorithm need only be calculated when the threshold is reached, therefore not adding significantly to the overall computation of the approach. Another step in reducing false positives involves ensuring that all the points between the centre of the mask and the centre of gravity belong to the USAN for a corner to be detected. This ensures a degree of uniformity in the USAN and thus in the corner being detected. The final non-maximum suppression stage is carried out on this response to detect the corners (see Fig. 3.32).

Fig. 3.31. A possible falsely detected corner using the SUSAN algorithm and mask,
prior to the calculation of the centre of gravity.

As mentioned previously, the key advantage of this technique is its computational efficiency (especially when implemented using look-up tables). Another advantage is the fact that as there are no brightness spatial derivatives
the image noise is not amplified. Other methods have attempted to reduce
the noise in the image by smoothing, but this leads to local distortion and
a loss of local information. This approach to low level image processing has
also been applied to edge detection (Smith & Brady 1997).
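A minimal sketch of the USAN area computation is given below. It uses a square (2r+1) x (2r+1) mask rather than the 37-pixel circular mask, and the hard comparison function of Eqn. 3.7; the centre-of-gravity test and non-maximum suppression described above are omitted, and the class name is hypothetical.

class UsanSketch {
    // Returns the USAN area n(r0) for the nucleus at (x0,y0).
    static int usanArea(int[][] img, int x0, int y0, int r, int tb) {
        int n = 0;
        for (int j = -r; j <= r; j++)
            for (int i = -r; i <= r; i++) {
                int y = y0 + j, x = x0 + i;
                if (y < 0 || y >= img.length || x < 0 || x >= img[0].length) continue;
                if (Math.abs(img[y][x] - img[y0][x0]) <= tb)
                    n++;                      // similar brightness: part of the USAN
            }
        return n;
    }
    // A corner candidate is flagged when the USAN area falls below the geometric threshold tg.
    static boolean cornerCandidate(int[][] img, int x0, int y0, int r, int tb, int tg) {
        return usanArea(img, x0, y0, r, tb) < tg;
    }
}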

3.3 Binary Images


For the purposes of this description of binary image processing, it will be convenient to assume that a(x, y) and b(x, y) can assume only two values: 0 (black) and 1 (white). The operator '∨' denotes the Boolean OR operation, '∧' represents the AND operation and '⊗' denotes the Boolean Exclusive OR operation (Fig. 3.33). Let 'Δ(x, y)' denote the number of white points addressed by N(x, y), including (x, y) itself.
Binary Border. Generates the borders around any binary objects in the source image (effectively a binary edge detection routine). The image is scanned and any white pixel with more than six white neighbours is transformed to black (Fig. 3.34).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  z = (a&1)+(b&1)+(c&1)+(d&1)+(f&1)+(g&1)+(h&1)+(i&1);
  if(z>6)
    output.setxy(x,y,BLACK);
  else
    output.setxy(x,y,e);
}

Fig. 3.32. SUSAN grey scale corner detector. The detected points are indicated
by white squares for clarity. The brightness threshold tb relates to the number of
corners reported by the operator. (a) Original image. (b) tb = 10. (c) tb = 50.


White Pixel Count. Count the number of white pixels within an image region.
int number=0;
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
  if(input.getxy(x,y)==WHITE)
    number++;
return(number);

Remove Isolated White Points. Any white pixel with less than 1 white pixel neighbour is set to black. This can be used to remove noise from a binary image.

c(x, y) ⇐ a(x, y) if Δ(x, y) > 1, and 0 otherwise

// Clear the output image.
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
  output.setxy(x,y,BLACK);
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  if(e!=BLACK)
    if((a==WHITE) || (b==WHITE) || (c==WHITE) || (d==WHITE) ||
       (f==WHITE) || (g==WHITE) || (h==WHITE) || (i==WHITE))
      output.setxy(x,y,WHITE);
}

Count White Neighbours. c(x, y) ⇐ Δ(x, y), the number of white pixels addressed by N(x, y). Notice that c(x, y) is a grey scale image.

3.3.1 Boolean Operators

NOT. c(x, y) ⇐ NOT(a(x, y)).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
  output.setxy(x,y,WHITE-input.getxy(x,y));

AND White Regions. c(x, y) ⇐ a(x, y) ∧ b(x, y).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
  output.setxy(x,y,inputA.getxy(x,y)&inputB.getxy(x,y));

OR White Regions. c(x, y) ⇐ a(x, y) ∨ b(x, y).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
  output.setxy(x,y,inputA.getxy(x,y)|inputB.getxy(x,y));


Exclusive OR. c(x, y) ⇐ a(x, y) ⊗ b(x, y). This operator finds the differences between white regions.
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  if(inputA.getxy(x,y)==WHITE && inputB.getxy(x,y)==BLACK)
    output.setxy(x,y,WHITE);
  if(inputA.getxy(x,y)==BLACK && inputB.getxy(x,y)==WHITE)
    output.setxy(x,y,WHITE);
}

Fig. 3.33. Binary image manipulation. (a) Input image a(x, y). (b) Input image b(x, y). (c) a(x, y) ∧ b(x, y). (d) a(x, y) ⊗ b(x, y).

Expand White Areas. e ⇐ (a ∨ b ∨ c ∨ d ∨ e ∨ f ∨ g ∨ h ∨ i). Notice that this is closely related to the Largest Intensity Neighbour operator defined earlier.

Shrink White Areas. e ⇐ (a ∧ b ∧ c ∧ d ∧ e ∧ f ∧ g ∧ h ∧ i).

Binary Edge Detector. e ⇐ e ∧ NOT(a ∧ b ∧ c ∧ d ∧ e ∧ f ∧ g ∧ h ∧ i).
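A sketch of the expand operation on a binary image held as a plain 2-D int array (0 = black, 255 = white) is shown below; shrinking is obtained in the same way by requiring every neighbour to be white (or by expanding the negated image). The class name is hypothetical.

class ExpandWhiteSketch {
    static final int WHITE = 255, BLACK = 0;
    // A pixel becomes white if any pixel in its 3 x 3 neighbourhood is white.
    static int[][] expand(int[][] in) {
        int h = in.length, w = in[0].length;
        int[][] out = new int[h][w];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++) {
                boolean anyWhite = false;
                for (int j = -1; j <= 1; j++)
                    for (int i = -1; i <= 1; i++)
                        if (in[y + j][x + i] == WHITE) anyWhite = true;
                out[y][x] = anyWhite ? WHITE : BLACK;
            }
        return out;
    }
}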

3.3.2 Connected Component (Blob) Analysis

Connectivity Detector. Consider the following pattern:

1 0 1
0 X 0
1 0 1

If X = 1, then all of the 1s are 8-connected to each other. Alternatively,


if X = 0, then they are not connected. In this sense, the point marked X is
critical for connectivity. This is also the case in the following examples:

100

110

001

000

o0

101

OXI

lXO

lXO


Fig. 3.34. Binary edge detection. (a) Original image. (b) Binary border.

However, those points marked X below are not critical for connectivity, since setting X = 0 rather than 1 has no effect on the connectivity of the 1s.

1 1 1        0 1 1        0 1 1
1 X 1        1 X 0        1 X 0
0 0 1        1 1 1        0 1 1

A connectivity detector shades the output image with 1s to indicate the position of those points which are critical for connectivity and which were white in the input image. Black points, and those which are not critical for connectivity, are mapped to black in the output image.
Euler Number. The Euler number is defined as the number of connected
components (blobs) minus the number of holes in a binary image and is
a useful means of classifying shapes in an image. The Euler number also
represents a simple method of counting blobs in a binary image provided
that they have no holes in them. Alternatively, it can be used to count holes
in a given object, providing they have no islands in them (Fig. 3.35).
The reason why this approach is used to count blobs, despite the fact that it may seem a little awkward to use, is that the Euler number is very easy and fast to calculate. The Euler number can be computed by using three local operators. Let us define three numbers N1, N2 and N3, where Nα indicates the number of times that one of the patterns in the pattern set α (α = 1, 2 or 3) occurs in the input image.
Pattern set 1 (N1):

00
01

00

10

10
10

Pattern set 2 (N2 ):

01
10

10
01

01
01

3.3 Binary Images

101

C=2  H=0  E=2
C=2  H=0  E=2
C=2  H=3  E=-1

Fig. 3.35. Euler number for various objects. C - Connected components, H - Holes,
E - Euler Number.

Pattern set 3 (N3 ):


11

11

01

10

10

01

11

11

The 8-connected Euler number, where holes and blobs are defined in terms of 8-connected figures, is defined as: N1 - (2 × N2) - N3. It is possible to calculate the 4-connected Euler number using a slightly different formula, but this parameter can give results which seem to be anomalous when we compare them to the observed number of holes and blobs.

Biggest White Blob. Find the largest, non-connecting, white blob in a binary image. The binary blobs must not be touching the edge of the region of interest.
int number=0, biggest=0, bigthresh=0, ob_count=0;
int pix_count, current_pixel;
buffer0 = label(input);
// Find the number of labelled blobs (the largest label value).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  current_pixel = buffer0.getxy(x,y);
  if(current_pixel>ob_count)
    ob_count = current_pixel;
}
// Isolate each labelled blob in turn and count its pixels.
for(int ct = 1; ct<=ob_count; ct++)
{
  output = doubleThreshold(buffer0,ct,ct);
  number = 0;
  for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    if(output.getxy(x,y)==WHITE)
      number++;
  pix_count = number;
  if(pix_count > biggest)
  {
    biggest = pix_count;
    bigthresh = ct;
  }
}
output = doubleThreshold(buffer0,bigthresh,bigthresh);

Binary to Grey Scale. Convert a binary image to a user defined grey scale
grey.
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
if(input.getxy(x,y)==WHITE)
output.setxy(x,y,grey);

Centroid. Replace grey scale shape by a single pixel at its centre of gravity.
The NeatVision implementation of this algorithm is limited by the fact that
we can only have a maximum of 255 blobs in a scene. It also assumes a black
background which is ignored in the centroid calculation. This approach is
based on the fact that all shapes must have a distinct grey scale.
int table[] = new int[256];
int cenx[] = new int[256];
int ceny[] = new int[256];
int current_pixel, max, index, X_centroid, Y_centroid;
int cnx=0, cny=0;
// Clear the histogram and centroid accumulators.
for(int i=BLACK; i<=WHITE; i++)
{
  table[i]=0;
  cenx[i]=0;
  ceny[i]=0;
}
// Accumulate the pixel count and coordinate sums for each grey level.
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  current_pixel = input.getxy(x,y);
  table[current_pixel]++;
  cenx[current_pixel] += x;
  ceny[current_pixel] += y;
}
// Mark the centroid of each grey level (i.e. each blob) in turn.
for(int n=1; n<=256; n++)
{
  max=0;
  index=0;
  for(int j=BLACK; j<=WHITE; j++)
    if(table[j]>max)
    {
      max=table[j];
      index=j;
      cnx=cenx[j];
      cny=ceny[j];
    }
  table[index]=0;
  if((max>0)&&(index>0))
  {
    X_centroid = cnx/max;
    Y_centroid = cny/max;
    output.setxy(X_centroid,Y_centroid,WHITE);
  }
}

Region Labelling. Consider an image containing a number of separate


blob-like figures. A region labelling operator will shade the output image so
that each blob is given a separate intensity value. We could shade the blobs
according to the order in which they are found during a conventional raster
scan of the input image. Alternatively, the blobs could be shaded according to
their areas; the largest blob having the highest grey scale value (Fig. 3.36).
This is a very useful operator, since it allows objects to be separated and
analysed individually. Small blobs can also be eliminated from an image using this operator. Region labelling can also be used to count the number of
distinct binary blobs in an image. Unlike the Euler number, counting based
on region labelling is not affected by the presence of holes. In Fig. 3.37 each
unique region that is detected is assigned a grey scale value. The regions that
appear unconnected during this first pass are reassigned in a second pass.

Fig. 3.36. Shading blobs in a binary image. (a) Original binary image. (b) Labelling
according to their areas. (c) Labelling according to the order in which they are found
during a raster scan (left to right; top to bottom).

Filling Holes. Consider a white blob-like figure containing a hole (lake),


against a black background. The application of the hole-filling operator will
cause all of the holes to be filled-in by setting all pixels in the holes to white.
This operator will not alter the outer edge of the figure.


Fig. 3.37. Region labelling by location.

int pixel;
output = Not.not(input);
output = Mask.mask(output,1);
output = Labeller.label(output);
pixel  = output.getxy(1,1);
output = DualThreshold.doubleThreshold(output,pixel,pixel);
output = Not.not(output);
output = Mask.mask(output,1);
return(output);

Where the mask operator draws a black border of a user specified amount around the image.
Detecting/Removing Small Spots. A binary image can be represented
in terms of a grey scale image in which only two grey levels, 0 and W, are
allowed. The result of the application of a conventional low-pass (blurring)
filter to such an image is a grey scale image in which there are a larger number
of possible intensity values. Pixels which were well inside large white areas
in the input image are mapped to very bright pixels in the output image.
Pixels which were well inside black areas are mapped to very dark pixels in
the output image. Pixels which were inside small white spots in the input
image are mapped to mid-grey intensity levels. Pixels on the edge of large
white areas are also mapped to mid-grey intensity levels. However, if there is
a cluster of small spots which are closely spaced together, some of them may
also disappear.
Based on these observations, the following procedure has been developed.
It has been found to be effective in distinguishing between small spots and,
at the same time, achieving a certain amount of edge smoothing of the large
bright blobs which remain:
Low-pass filter using an 11 by 11 local operator
Threshold at mid-grey


This technique is generally easier and faster to implement than the blob
shading technique described previously. Although it may not achieve the desired result exactly, it can be performed at high speed. An N-tuple filter
having the weight matrix illustrated in Fig. 3.38 can be combined with simple thresholding to distinguish between large and small spots. Assume that
there are several small white spots within the input image and that they are
spaced well apart. All pixels within a spot which can be contained within a
circle of radius three pixels will be mapped to white by this particular filter.
Pixels within a larger spot will become darker than this. The image is then
thresholded at white to separate the large and small spots.
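The blur-and-threshold procedure can be sketched as follows; an 11 x 11 box average stands in for the low-pass filter and the mid-grey threshold is taken as 128 (hypothetical class name, plain 2-D int arrays):

class SmallSpotRemovalSketch {
    // Blur with an 11 x 11 average, then threshold at mid-grey: small white
    // spots fall below the threshold and disappear, large blobs survive.
    static int[][] removeSmallSpots(int[][] in) {
        int h = in.length, w = in[0].length, k = 5;   // 11 = 2k+1
        int[][] out = new int[h][w];
        for (int y = k; y < h - k; y++)
            for (int x = k; x < w - k; x++) {
                int sum = 0;
                for (int j = -k; j <= k; j++)
                    for (int i = -k; i <= k; i++)
                        sum += in[y + j][x + i];
                int mean = sum / ((2 * k + 1) * (2 * k + 1));
                out[y][x] = (mean >= 128) ? 255 : 0;
            }
        return out;
    }
}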

Fig. 3.38. N-tuple filtering to eliminate small spots. (The weight matrix places a single weight of 20 at its centre, surrounded by a ring of twenty -1 weights.)

Grass-fire Transform and Skeleton. Consider a binary image containing a single white blob. Imagine that a fire is lit at all points around the
blob's outer edge and the edges of any holes it may contain. The fire will
burn inwards, until at some instant, advancing fire lines meet. When this
occurs the fire becomes extinguished locally. An output image is generated
and is shaded in proportion to the time it takes for the fire to reach each
point. Background pixels are mapped to black (Fig. 3.39). The importance of
this transform, referred to as the grass-fire transform, lies in the fact that it
indicates distances to the nearest edge point in the image (Borgefors 1986).
It is therefore possible to distinguish thin and fat limbs of a white region.
Those points at which the fire lines meet are known as quench points. The
set of quench points form a match-stick figure, usually referred to as a skeleton or medial axis transform (MAT) (Fig. 3.40). These figures can also be
generated in a number of different ways (Gonzalez & Wintz 1987). One such
approach is described as onion-peeling. Consider a single white blob and a
'bug' which walks around the blob's outer edge, removing one pixel at a time.
No edge pixel is removed, if by doing so we would break the blob into two
disconnected parts. In addition, no white pixel is removed, if there is only
one white pixel amongst its 8-neighbours. This simple procedure leads to an
undesirable effect in those instances when the input blob has holes in it; the
skeleton which it produces has small loops in it which fit around the holes like
a tightened noose. More sophisticated algorithms have been devised which
avoid this problem (Sec. 3.4.2).



boolean done=false;
int greyLevel=0;
int W = 255;
// Copy the input image to the output image.
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
  output.setxy(x,y,input.getxy(x,y));
// There is no point in replacing white pixels with white.
while(!done && greyLevel<WHITE)
{
  greyLevel++;
  done = true;
  // 8-connected
  for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    a = output.getxy(x-1,y-1);
    b = output.getxy(x,y-1);
    c = output.getxy(x+1,y-1);
    d = output.getxy(x-1,y);
    e = output.getxy(x,y);
    f = output.getxy(x+1,y);
    g = output.getxy(x-1,y+1);
    h = output.getxy(x,y+1);
    i = output.getxy(x+1,y+1);
    if(e == WHITE)
    {
      z = Math.min(i,Math.min(Math.min(Math.min(a,b),Math.min(c,d)),
          Math.min(Math.min(e,f),Math.min(g,h))));
      if(z!=greyLevel && z!=WHITE)
      {
        output.setxy(x,y,greyLevel);
        done = false;
      }
    }
  }
}

Fig. 3.39. Grass-fire transform and its associated 3-D profile.

Fig. 3.40. Application of the Medial Axis Transform (MAT) (8-connected). The top figure illustrates how a small variation in the input scene (e.g., noise) can have a significant effect on the resultant MAT.

Thinning. Thinning allows the generation of match-stick like figures while keeping critically connected points (Fig. 3.41). The thinning algorithm outlined below is a two pass procedure for a single iteration of thinning (3 x 3) a binary image. During the first pass we clear a temporary alternate image. The number of ones in the 3 x 3 window, excluding the centre pixel e, is counted. If 2 ≤ count ≤ 6, then count the number of 0 to 1 transitions around the current pixel. If the number of transitions = 1 and pixels f, h, b and d are black then place a single white pixel at the current location (shaded) in the alternate image. Delete all the points flagged in the alternate image (this is performed by exclusive OR'ing (XOR) the current and alternate images). During the second pass, again clear the alternate image. Check the current pixel and count the number of ones in the 3 x 3 window, excluding the centre pixel e from the count. If 2 ≤ count ≤ 6, count the number of 0 to 1 transitions around the current pixel. If the number of transitions = 1 and pixels f, h, b and d are black then place a single white pixel at the current location (shaded) in the alternate image. Delete all the points flagged in the alternate image as before.
Fig. 3.41. Various stages in the thinning process.

Edge Smoothing and Corner Detection. Consider three points B1, B2 and B3 which are placed close together on the edge of a single blob in a binary image (Fig. 3.42). The perimeter distance between B1 and B2 is equal to that between B2 and B3. Define the point P to be that at the centre of the line joining B1 and B3. As the three points now move around the edge of the blob, keeping the spacing between them constant, the locus of P traces a smoother path than that followed by B2 as it moves around the edge. This forms the basis of a simple edge smoothing procedure.
A related algorithm for corner detection shades the edge according to the distance between P and B2. This results in an image in which the corners are highlighted, while the smoother parts of the image are much darker. Many other methods of edge smoothing are possible. For example, we may map white pixels which have fewer than three white 8-neighbours to black. This has the effect of eliminating 'hair' around the edge of a blob-like figure. One of the techniques described previously for eliminating small spots offers another possibility. A third option is to use the processing sequence: expand white areas, shrink white areas, shrink white areas, expand white areas.

Fig. 3.42. Edge smoothing and corner detection.

The 3 x 3 masks illustrated in Fig. 3.43(a) can also be used to detect binary corners. The operator detects 3 x 3 corner points and replaces them by a single white pixel if the illustrated patterns occur. Otherwise, the output pixel (shaded) is set to black. The application of this operator to a binary image is illustrated in Fig. 3.43(b).
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{

if ((b==WHITE)&&(d==WHITE)&&(e==WHITE)&&(c==BLACK)&&
(f==BLACK)&&(i==BLACK)&&(h==BLACK)&&(g==BLACK))

3.3 Binary Images

109

z=WHITE;
else if((b==WHITE)&&(f==WHITE)&&(e==WHITE)&&(a==BLACK)&&
(d==BLACK)&&(g==BLACK)&&(h==BLACK)&&(i==BLACK))
z=WHITE;
else if((d==WHITE)&&(h==WHITE)&&(e==WHITE)&&(a==BLACK)&&
(b==BLACK)&&(c==BLACK)&&(f==BLACK)&&(i==BLACK))
z=WHITE;
else if((e==WHITE)&&(f==WHITE)&&(h==WHITE)&&(a==BLACK)&&
(b==BLACK)&&(c==BLACK)&&(d==BLACK)&&(g==BLACK))
z=WHITE;
else
z=BLACK;
output.setxy(x,y,z);
}


Fig. 3.43. 3 x 3 corner detection. Note that due to quantization of the image not
all corner points are flagged. (a) Masks. (b) Corner detection based on the patterns
illustrated in (a).

Limb End Detection. Fig. 3.44(a) contains the masks for detecting 3 x 3
skeleton end points. Once detected, these limb end points are replaced by
a single white pixel. Otherwise the output pixel (shaded) is set to black.
Fig. 3.44(b) illustrates the application of this operator to a binary image. The
removal of the limb ends is referred to as pruning and is commonly used
to smooth skeletons or thinned objects by removing extraneous branches.
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{

if ((e==WHITE)&&(d==BLACK)&&(g==BLACK)&&(h==BLACK)
&&(i==BLACK)&&(f==BLACK))
z=WHITE;
else if((e==WHITE)&&(a==BLACK)&&(b==BLACK)
&&(d==BLACK)&&(g==BLACK)&&(h==BLACK))
z=WHITE;
else if((e==WHITE)&&(b==BLACK)&&(c==BLACK)
&&(f==BLACK)&&(i==BLACK)&&(h==BLACK))
z=WHITE;
  else if((e==WHITE)&&(a==BLACK)&&(b==BLACK)
          &&(c==BLACK)&&(d==BLACK)&&(f==BLACK))
    z=WHITE;
  else
    z=BLACK;
  output.setxy(x,y,z);
}

Junction Detection. The 3 x 3 masks illustrated in Fig. 3.45(a) have been


designed to detect 3 x 3 skeleton junction points and replace them by a
single white pixel if the illustrated patterns occur. Otherwise the output
pixel (shaded) is set to black. Fig. 3.45(b) illustrates the application of this
operator to a binary image.
for(int y=0;y<height;y++)
for(int x=0;x<width;x++)
{
  if((e==WHITE)&&(b==WHITE)&&(g==WHITE)&&(i==WHITE))
    z=WHITE;
  else if((e==WHITE)&&(a==WHITE)&&(f==WHITE)&&(g==WHITE))
    z=WHITE;
  else if((e==WHITE)&&(h==WHITE)&&(a==WHITE)&&(c==WHITE))
    z=WHITE;
  else if((e==WHITE)&&(d==WHITE)&&(c==WHITE)&&(i==WHITE))
    z=WHITE;
  else if((e==WHITE)&&(a==WHITE)&&(i==WHITE)&&(g==WHITE))
    z=WHITE;
  else if((e==WHITE)&&(a==WHITE)&&(c==WHITE)&&(g==WHITE))
    z=WHITE;
  else if((e==WHITE)&&(c==WHITE)&&(g==WHITE)&&(i==WHITE))
    z=WHITE;
  else if((e==WHITE)&&(a==WHITE)&&(c==WHITE)&&(i==WHITE))
    z=WHITE;
  else
    z=BLACK;
  output.setxy(x,y,z);
}


Fig. 3.44. 3 x 3 skeleton end point (limb) detection. (a) Masks. (b) Limb detection
based on the patterns illustrated in (a).

Bounding Box. Draw a bounding box around an image. The box dimensions are incremented by one pixel to ensure that none of the original image
is lost when the box is overlaid (Fig. 3.46).


Fig. 3.45. 3 x 3 skeleton junction point detection. (a) Masks. (b) Skeleton junction
point detection based on the patterns illustrated in (a).

xmin = width-1; ymin = height-1;


xmax = 0; ymax = 0;
for (int y=0; y<height; y++)
for (int x=0; x<width; x++)
{

current_pixel_value = input.getxy(x,y);
output.setxy(x,y,current_pixel_value);
if (current_pixel_value != 0)
{

current_pixel_xaddress = x;
current_pixel_yaddress = y;
if (current_pixel_xaddress > xmax)
xmax = current_pixel_xaddress;
if (current_pixel_xaddress < xmin)
xmin = current_pixel_xaddress;
if (current_pixel_yaddress > ymax)
ymax = current_pixel_yaddress;
if (current_pixel_yaddress < ymin)
ymin = current_pixel_yaddress;
}
}

x1=xmin-1;
/* allow 1 pixel gap */
y1=ymin-1;
x2=xmax+1 ;
y2=ymax+1 ;
output.drawBox(x1,y1,x2,y2,WHITE);

Convex Hull. Consider a single blob in a binary image. The convex hull
is that area enclosed within the smallest convex polygon which will enclose
the shape. This can also be described as the region enclosed within an elastic
string, stretched around the blob. The area enclosed by the convex hull, but
not within the original blob is called the convex deficiency, which may consist
of a number of disconnected parts and includes any holes and indentations.
If we regard the blob as being like an island, we can understand the logic of
referring to the former as lakes and the latter as bays (Fig. 3.47).
A fundamental property of the convex hull is that any line outside the hull, when moved in any direction towards the hull, hits the hull at one of its vertex points.

Fig. 3.46. Bounding box of a binary image.

Fig. 3.47. Convex hull of a binary image. The enclosed regions indicate the shape's convex deficiencies (i.e. bays and holes).

While there have been a number of algorithms developed to determine


the convex hull, the one implemented in NeatVision is very straightforward (Sedgewick 1983). It requires us to start with some point guaranteed to be on the hull (e.g., a point with the smallest y coordinate). We then take a ray and sweep it until it finds the minimum angle point (= hull point). We then anchor at this point and repeat the sweep process until all points are covered. Initially: x1 = 0 and y1 = minimum y, x2 = x at minimum y and y2 = minimum y, x3 = x scan point and y3 = y scan point (Fig. 3.48). When a minimum angle is found: x1 = previous x2 and y1 = previous y2, x2 = x3 at the minimum angle and y2 = y3 at the minimum angle, x3 = new x scan point and y3 = new y scan point (Fig. 3.49). The minimum angle is found by examining the scalar (dot) product of two vectors (Fig. 3.50).
Figs. 3.51 and 3.52 illustrate how the convex deficiencies can be extracted
from a filled convex hull.
Fig. 3.48. Convex hull generation: Initial position.

Fig. 3.49. Convex hull generation: Stage 2.

Fig. 3.50. Convex hull generation: Scalar (dot) product of two vectors.

Fig. 3.51. Generating the convex deficiencies by subtracting the original image from the filled convex hull representation.

Fig. 3.52. Extracting the convex hull holes by applying the labelling function to the inverted version of the original image. The holes can then be extracted using the double threshold operator.

3.3.3 Measurements on Binary Images

To simplify the following explanation, we will confine ourselves to the analysis of a binary image containing a single blob. The area of the blob can be measured by the total number of object (white) pixels in the image. However, we must first define two different types of edge points, in order to measure an object's perimeter.
The 4-adjacency convention only allows the four main compass points to be used as direction indicators, while 8-adjacency uses all eight possible

directions (Fig. 3.53). If the 4-adjacency convention is applied to the image segment given in Fig. 3.53, then none of the four segments (two horizontal and two vertical) will appear as touching, i.e. they are not connected. Using the 8-adjacency convention, the segments are now connected, but we have the ambiguity that the inside of the shape is connected to the outside. Neither convention is satisfactory, but since 8-adjacency allows diagonally-connected pixels to be represented, it leads to a more faithful perimeter measurement. Assuming that the 8-adjacency convention is used, we can generate a coded description of the blob's edge. This is referred to as the chain code or Freeman code. As we trace around the edge of the blob, we generate a number, 0-7, to indicate which of the eight possible directions we have taken (i.e. from the centre, shaded pixel in Fig. 3.53). Let Nodd indicate how many odd-numbered code values are produced as we code the blob's edge and Neven represent the number of even-numbered values found. The approximate perimeter of the blob is given by: Neven + √2 × Nodd. This formula will normally suffice for use in those situations where the perimeter of a smooth object is to be measured. The centroid of a blob determines its position within the image and can be calculated using the following:

X ⇐ (Σx Σy a(x, y) × x) / Nx,y        (3.9)

Y ⇐ (Σx Σy a(x, y) × y) / Nx,y        (3.10)

where

Nx,y ⇐ Σx Σy a(x, y)        (3.11)


Fig. 3.53. Chain code. (a) Image segment. (b) 4-adjacency coding convention. (c)
8-adjacency coding convention.

3.3.4 Run-length Coding


Run-length coding involves scanning the scene and recording the column
start and end coordinate for each run within a row (Fig. 3.54). This data can
then be processed directly or transmitted. The problem with this approach
is that it is non-isotropic and therefore prone to errors due to rotation of the
scene. Another problem occurs if the number of runs becomes too large, for example in a noisy image; the coded image can then contain more data than the original image.
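A sketch of a simple encoder is given below; each run of white pixels in a row is stored as a (row, column start, column end) triple, matching the format used in Fig. 3.54. The class name is hypothetical and white is taken to be 255.

import java.util.ArrayList;
import java.util.List;

class RunLengthSketch {
    // Encode the white (255) runs of a binary image as {row, start, end} triples.
    static List<int[]> encode(int[][] in) {
        List<int[]> runs = new ArrayList<>();
        for (int y = 0; y < in.length; y++) {
            int start = -1;
            for (int x = 0; x < in[0].length; x++) {
                if (in[y][x] == 255 && start < 0)
                    start = x;                                // run begins
                if ((in[y][x] != 255 || x == in[0].length - 1) && start >= 0) {
                    int end = (in[y][x] == 255) ? x : x - 1;  // run ends
                    runs.add(new int[] { y, start, end });
                    start = -1;
                }
            }
        }
        return runs;
    }
}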


Fig. 3.54. Run-length coding applied to a simple image. The resultant codes for the three scan lines indicated (labelled A, B, C) are ((A, C1, C4)), ((B, C1, C2), (B, C3, C4)), ((C, C1, C2)).

3.3.5 Shape Descriptors


The following are just a few of the simpler shape descriptors (Pavlidis 1978)
that have been proposed:
1. The distance of the furthest point on the edge of the blob from the centroid.
2. The distance of the closest point on the edge of the blob from the centroid.
3. The number of protuberances, as defined by that circle whose radius is equal to the average of the parameters measured in (1) and (2).
4. The distances of points on the edge of the blob from the centroid, as a function of angular position. This describes the silhouette in terms of polar coordinates. (This is not a single-valued function.)
5. Circularity = Area/Perimeter². This will tend to zero for irregular shapes with ragged boundaries and has a maximum value (= 1/(4π)) for a circle; a sketch of this measure is given after this list.
6. The number of holes and/or bays.
7. Euler number.
8. The ratio of the areas of the original blob and that of its convex hull.
9. The ratio of the areas of the original blob and that of its circumcircle.
10. The ratio of the area of the blob to the square of the total limb-length of its skeleton.
11. Distances between joints and limb ends of the skeleton.
12. The ratio of the projections onto the major and minor axes.
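As an example of the simplest of these measures, the circularity of item 5 can be sketched as follows, with the perimeter estimated from the chain code counts Neven and Nodd (hypothetical class name):

class CircularitySketch {
    // Circularity = Area / Perimeter^2; approximately 1/(4*pi) for a circle.
    static double circularity(int area, int nEven, int nOdd) {
        double perimeter = nEven + Math.sqrt(2.0) * nOdd;
        return area / (perimeter * perimeter);
    }
}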

3.4 Global Image Transforms


An important class of image processing operators is characterised by an equation of the form B ⇐ f(A). Each element in the output image B is calculated using all or, at least, a large proportion of the pixels in A. The output image, B, may well look quite different from the input image, A. Examples of this class of operators are: lateral shift, rotation, warping, Cartesian to polar coordinate conversion, Fourier and Hough transforms.
3.4.1 Geometric Transforms

Algorithms exist by which images can be shifted, rotated, undergo axis conversion, magnified and warped. Certain operations, such as rotating a digital
image can cause some difficulties, as pixels in the input image are not mapped
exactly to pixels in the output image. This can cause smooth edges to appear stepped. To avoid this effect interpolation may be used, but this has the
unfortunate effect of blurring edges.
The utility of axis transformations is evident when we are confronted with
the examination of circular objects, or those displaying a series of concentric
arcs, or streaks radiating from a fixed point. Inspecting such objects is often
made very much easier, if we first convert from Cartesian to polar coordinates.
Warping is also useful in a variety of situations. For example, it is possible
to compensate for barrel, or pin-cushion distortion in a camera. Geometric
distortions introduced by a wide-angle lens, or trapezoidal distortion due
to viewing the scene from an oblique angle can also be corrected. Another
possibility is to convert simple curves of known shape into straight lines, in
order to make subsequent analysis easier.
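As an illustration of such an axis conversion, the sketch below (not taken from the NeatVision source) performs a simple nearest-neighbour Cartesian to polar mapping about a user supplied centre; the input and output images are assumed to be plain 2-D int arrays.

// Cartesian to polar conversion by nearest-neighbour sampling. Row r of the
// output holds the pixels at radius r; column t spans the angles 0..2*PI
// about the centre (cx, cy).
static int[][] cartesianToPolar(int[][] input, int cx, int cy,
                                int maxRadius, int angleSteps)
{
    int height = input.length, width = input[0].length;
    int[][] output = new int[maxRadius][angleSteps];
    for (int r = 0; r < maxRadius; r++)
        for (int t = 0; t < angleSteps; t++)
        {
            double angle = (2.0 * Math.PI * t) / angleSteps;
            int x = cx + (int) Math.round(r * Math.cos(angle));
            int y = cy + (int) Math.round(r * Math.sin(angle));
            if (x >= 0 && x < width && y >= 0 && y < height)
                output[r][t] = input[y][x];     // pixels outside the image remain 0
        }
    return output;
}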


3.4.2 Distance Transforms

The distance transform produces a grey scale image in which the intensity of a pixel is dependent on the distance of that pixel from the nearest edge point in the image (see the grass-fire transform outlined in Sec. 3.3.2). The resultant image is dependent on the distance metric employed. Commonly used distance metrics between two points (x1, y1) and (x2, y2) are outlined below; the sketch that follows shows them in code. We generally use the city block and chessboard distance metrics when speed rather than accuracy is the key concern.
Euclidean Distance: √((x2 − x1)² + (y2 − y1)²). This is the familiar straight-line distance.
City Block Distance: |x2 − x1| + |y2 − y1|. Diagonal moves are not allowed when moving between pixels.
Chessboard Distance: max(|x2 − x1|, |y2 − y1|). Diagonal moves count the same as a horizontal move (Fig. 3.55).
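The three metrics can be written directly, as in the illustrative sketch below (not part of the NeatVision library).

// Distance metrics between the points (x1, y1) and (x2, y2).
static double euclidean(int x1, int y1, int x2, int y2)
{
    double dx = x2 - x1, dy = y2 - y1;
    return Math.sqrt(dx * dx + dy * dy);                // straight-line distance
}

static int cityBlock(int x1, int y1, int x2, int y2)
{
    return Math.abs(x2 - x1) + Math.abs(y2 - y1);       // no diagonal moves
}

static int chessboard(int x1, int y1, int x2, int y2)
{
    // a diagonal move costs the same as a horizontal or vertical one
    return Math.max(Math.abs(x2 - x1), Math.abs(y2 - y1));
}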

Fig. 3.55. Distance transform using the chessboard distance metric.

Chamfer Algorithm. This algorithm allows the calculation of the distance transform in two passes of the image. In the case of a one-dimensional signal the first pass involves calculating the number of steps since the last zero value was encountered when scanning from left to right. The second pass involves calculating the number of steps since the last zero value was encountered when scanning from right to left. The final result is generated by finding the minimum of the two passes through the image (Fig. 3.56).
The 2-D Chamfer distance transform uses masks of varying sizes during each pass. Fig. 3.57 illustrates its operation for 3 x 3 masks. This yields a distance image in which the grey levels are three times the Euclidean distance to the boundary. During the first pass the forward mask is applied to the white pixels in the binary image (i.e. in a raster fashion). Each element of the mask is added to the underlying pixel (the current pixel of interest is indicated by


Binary image:    | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
Pass 1 (L to R): | 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 0 |
Pass 2 (R to L): | 0 | 0 | 1 | 2 | 3 | 4 | 3 | 2 | 1 | 0 | 0 |

Fig. 3.56. Calculating the distance transformation for a 1-D image using the Chamfer algorithm.

the dot at the centre of the mask in Fig. 3.57(a)). The centre pixel is then
replaced by the minimum of these sums. The pixel value is overwritten prior
to moving the mask on to its next calculation (Fig. 3.57(d)). During the
second pass the same approach is applied using the reverse mask (i.e. using
a reverse raster scan) (Figs. 3.57(b) and (e)). The final distance transform
is the minimum of the output images generated from both passes. For the
3 x 3 mask the maximum deviation from the true Euclidean distance is 8%
(Fig. 3.58). If the mask size is increased to 5 x 5 the distance transform image
is scaled up by a factor of 5, with a maximum error of only 2% (Fig. 3.59). The
key aspects of the 3 x 3 Chamfer distance transform algorithm are outlined
below.
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
    if (input.getxy(x,y)==WHITE)
    {
      pass1.setxy(x,y,1);
      pass2.setxy(x,y,1);
    }

// First pass - forward raster scan.
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
    if (pass1.getxy(x,y) == 1)
    {
      a = pass1.getxy(x-1,y-1)+4;
      b = pass1.getxy(x,y-1)+3;
      c = pass1.getxy(x+1,y-1)+4;
      d = pass1.getxy(x-1,y)+3;
      z = Math.min(Math.min(a,b),Math.min(c,d));
      pass1.setxy(x,y,flow(z));
    }

// Second pass - reverse raster scan.
for(int y=height-1; y>=0; y--)
  for(int x=width-1; x>=0; x--)
    if (pass2.getxy(x,y) == 1)
    {
      f = pass2.getxy(x+1,y)+3;
      g = pass2.getxy(x-1,y+1)+4;
      h = pass2.getxy(x,y+1)+3;
      i = pass2.getxy(x+1,y+1)+4;
      z = Math.min(Math.min(f,g),Math.min(h,i));
      pass2.setxy(x,y,flow(z));
    }

// The final distance transform is the minimum of the two passes.
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    output.setxy(x,y,Math.min(pass1.getxy(x,y),
                              pass2.getxy(x,y)));
  }


Fig. 3.57. 2-D Chamfer distance transform, using 3 x 3 masks. (a) Forward mask.
(b) Reverse mask. (c) Binary input image. (d) Result of pass 1. (e) Result of pass
2. (f) Minimum of (d) and (e).

3.4.3 Hough Transform

The Hough transform (HT) provides a powerful and robust technique for
detecting lines, circles, ellipses, parabolae and other curves of pre-defined
shape in a binary image. One of the key applications of the Hough transform
is in the detection of straight lines, that is, the location of nearly linear


Fig. 3.58. 3 x 3 Chamfer distance transform. (a) Input image. (b) Resultant distance transform.

Fig. 3.59. 5 x 5 masks for the Chamfer algorithm (using weights 5, 7 and 11). (a) Forward mask. (b) Reverse mask.

arrangements of disconnected white spots and "broken" lines. Consider that a straight line in the input image is defined by the equation r = x·cos θ + y·sin θ, where r and θ are two unknown parameters, whose values are to be found. Clearly, if this line intersects the point (xi, yi), then r = xi·cos θ + yi·sin θ can be solved for many different values of (r, θ). So, each white point (xi, yi) in the input image may be associated with a set of (r, θ) values (Fig. 3.60). Actually, this set of points forms a sinusoidal curve in (r, θ) space (the latter is called the HT image). Since each point in the input image generates such a sinusoidal curve, the whole of that image creates a multitude of overlapping sinusoids in the HT image (Fig. 3.61). In many instances, a large number of sinusoidal curves are found to converge on the same spot in the HT image. The (r, θ) address of such a point indicates the slope, θ, and position, r, of a straight line that can be drawn through a large number of white spots in the input image.

The implementation of the HT for line detection uses a 2-D accumulator array, A(r, θ), to represent quantised (r, θ) space (Vernon 1991). An important choice to be made is the step size for quantising r and θ. Assuming that all the elements of A(r, θ) are initialised to zero, the HT is found by computing a set S(xi, yi) of (r, θ) pairs satisfying the equation r = xi·cos θ + yi·sin θ. Then, for all (r, θ) in S(xi, yi), we increment A(r, θ) by one. This process is then repeated for all values of i such that


Fig. 3.60. Mapping of a point on a line into Hough space. (a) Input line. (b)
Sinusoidal representation of a point on the line illustrated in (a).

the point (xi, yi) in the input image is white. The bright spots in the HT image indicate "linear" sets of spots in the input image. Thus, line detection is transformed to the rather simpler task of finding local maxima in the accumulator array, A(r, θ). The coordinates (r, θ) of such a local maximum give the parameters of the equation of the corresponding line in the input image. The HT image can be displayed, processed and analysed just like any other image, using many of the operators outlined previously.
The robustness of the HT techniques arises from the fact that, if part
of the line is missing, the corresponding peak in the HT image is simply
darker. This occurs because fewer sinusoidal curves converge on that spot
and the corresponding accumulator cell is incremented less often. However,
unless the line is almost completely obliterated, this new darker spot can also
be detected. In practice "near straight lines" are transformed into a cluster
of points. There is also a spreading of the intensity peaks in the HT image,
due to noise and quantisation effects. In this event, it may be convenient to
threshold the HT image prior to finding the centroid of the resulting spot
and calculating the parameters of the straight line in the input image (Pitas
1993). Fig. 3.61 illustrates how this approach can be used to find a line in a
noisy binary image.
The Hough transform can also be generalised to detect groups of points lying on a curve. In practice, this may not be a trivial task, since the complexity increases very rapidly with the number of parameters needed to define the curve. Another key application of this transform is in circle detection. The circle is defined parametrically as: r² = (x − a)² + (y − b)², where (a, b) determines the coordinates of the centre of the circle and r is its radius. This requires a 3-D parameter space, which cannot be represented and processed as a single image. For an arbitrary curve, with no simple equation to describe its boundary, a look-up table is used to define the relationship between the boundary coordinates and the HT parameters. See Sonka et al. (1993) for further details.
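As a rough illustration of how the accumulation generalises, the sketch below (not part of the NeatVision library) votes for circles of a single, known radius, which collapses the 3-D parameter space back to a 2-D accumulator over the centre coordinates (a, b); a full implementation would add an outer loop over a range of radii. The input is assumed to be a plain 2-D int array with white edge pixels stored as non-zero values.

// Hough accumulation for circles of a single, known radius: every white
// pixel votes for all centres (a, b) that could explain it. Peaks in the
// accumulator mark likely circle centres.
static int[][] houghCircleFixedRadius(int[][] input, int radius, int angleSteps)
{
    int height = input.length, width = input[0].length;
    int[][] accumulator = new int[height][width];
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            if (input[y][x] != 0)
                for (int t = 0; t < angleSteps; t++)
                {
                    double angle = (2.0 * Math.PI * t) / angleSteps;
                    int a = x - (int) Math.round(radius * Math.cos(angle));
                    int b = y - (int) Math.round(radius * Math.sin(angle));
                    if (a >= 0 && a < width && b >= 0 && b < height)
                        accumulator[b][a]++;        // one vote for this candidate centre
                }
    return accumulator;
}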


Fig. 3.61. Finding a line in a noisy image using the Hough transform. (a) Original
image. (b) Hough transform. (c) Inverse Hough transform applied to a single white
pixel located at the point of maximum intensity in the Hough transform. (d) Union
of (a) and (c). Notice how accurately this process locates the line in the input image,
despite the presence of a high level of noise. (e) Associated visual workspace.


The following code segment summarises the key steps in the generation
of the Hough transform.
double sine[] = new double[width];
double cosine[] = new double[width];
for(int theta=0; theta<width; theta++)
{
  sine[theta] = Math.sin((theta*Math.PI)/width);
  cosine[theta] = Math.cos((theta*Math.PI)/width);
}

for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    current_pixel = input.getxy(x,y);
    if (current_pixel==WHITE)
    {
      for(int theta=0; theta<width; theta++)
      {
        double r = ((x*cosine[theta]) + (y*sine[theta]));
        r = (double)height/2 + (r/3.0);
        int c = theta + (int)(Math.floor(r)*width);
        if ((c>0) && (c<width*height))
          if (output.get(c)<WHITE)
            output.set(c, output.get(c)+1);
      }
    }
  }

Similarly the inverse Hough transform is outlined below, where the line
count variable n is user defined.
maxx=0; maxy=0;
for(int lines=0; lines<n; lines++)
{
  max=0;
  for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
      current_pixel = input.getxy(x,y);
      if (current_pixel>max)
      {
        max = current_pixel;
        maxx = x;
        maxy = y;
      }
    }

  input.setxy(maxx,maxy,0);
  r = (maxy-((double)height/2))*3.0;
  thet = (maxx/((double)width))*Math.PI;

  if (Math.abs(Math.sin(thet))<Math.abs(Math.cos(thet)))
  {
    m = -Math.sin(thet)/Math.cos(thet);
    cons = r/Math.cos(thet);
    for(int y=0; y<height; y++)
    {
      int x = (int)(((double)y*m)+cons);
      if (x>0 && x<width)
        output.setxy(x,y,output.getxy(x,y)+1);
    }
  }
  else
  {
    m = -Math.cos(thet)/Math.sin(thet);
    cons = r/Math.sin(thet);
    for(int x=0; x<width; x++)
    {
      int y = (int)(((double)x*m)+cons);
      if (y>0 && y<height)
        output.setxy(x,y,output.getxy(x,y)+1);
    }
  }
}

3.4.4 Two-dimensional Discrete Fourier Transform (DFT)


As mentioned previously, the transformation of an image into a different
domain can sometimes make the analysis task easier. Another important
operation to which this remark applies is the Fourier transform. Since we are
discussing the processing of images, the discussion will be focussed on the 2-D
Discrete Fourier transform (DFT). This operation allows spatial periodicities
in the intensity within an image to be investigated, in order to find, amongst
other features, the dominant frequencies. The 2-D DFT of an N x N image
f(x, y) is defined as follows (Gonzalez & Wintz 1987):
F(m, n) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x, y) e^{-j2π(mx+ny)/N}          (3.12)

The inverse transform of F(m, n) is defined as:

f(x, y) = (1/N) Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} F(m, n) e^{j2π(mx+ny)/N}          (3.13)

where 0 ≤ x, y ≤ N - 1. The relationship between convolution (*) and the Fourier transform is an important one (Eqns. 3.14 and 3.15), particularly as it is often more intuitive to design filters in the frequency domain when compared to using convolution based filters in the spatial domain (Sec. 3.2).

F[(g * h)(x, y)] = G(u, v)H(u, v)          (3.14)

F[g(x, y)h(x, y)] = (G * H)(u, v)          (3.15)

Several algorithms have been developed to calculate the 2-D DFT. The simplest approach makes use of the observation that this is a separable transform which can be computed as a sequence of two 1-D transforms. Therefore, the 2-D transform is generated by calculating the 1-D DFT along the image rows and then repeating this on the resulting image but, this time, operating on the columns, i.e. f(x, y) → Row transform → F1(x, n) → Column transform → F2(m, n). This reduces the computational overhead when compared to direct 2-D implementations. (This is the approach taken in the NeatVision implementation of the DFT; see Fig. 3.62. A sketch of the row-column scheme is given below.)

Fig. 3.62. Binary images and their associated 2-D Discrete Fourier Transforms.
The frequency increases radially in the range [0,1] from the centre of the DFT
image.
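The row-column scheme can be sketched as follows (illustrative only, and unrelated to the file-based NeatVision implementation): a naive O(N²) 1-D DFT is applied to every row and then to every column of a square image, with the 1/N normalisation of Eqn. 3.12 omitted for brevity.

// Separable 2-D DFT sketch: the real and imaginary parts are carried in two
// N x N arrays and a 1-D DFT is applied first to every row, then to every column.
static double[][][] dft2d(double[][] image)
{
    int n = image.length;                              // assumes a square N x N image
    double[][] re = new double[n][n];
    double[][] im = new double[n][n];
    for (int y = 0; y < n; y++)                        // copy the input; imaginary part starts at zero
        re[y] = image[y].clone();

    for (int y = 0; y < n; y++)                        // 1-D DFT along every row
        dft1d(re[y], im[y]);

    double[] colRe = new double[n];
    double[] colIm = new double[n];
    for (int x = 0; x < n; x++)                        // 1-D DFT along every column
    {
        for (int y = 0; y < n; y++) { colRe[y] = re[y][x]; colIm[y] = im[y][x]; }
        dft1d(colRe, colIm);
        for (int y = 0; y < n; y++) { re[y][x] = colRe[y]; im[y][x] = colIm[y]; }
    }
    return new double[][][] { re, im };
}

// In-place naive 1-D DFT of a complex signal held as separate (re, im) arrays.
static void dft1d(double[] re, double[] im)
{
    int n = re.length;
    double[] outRe = new double[n];
    double[] outIm = new double[n];
    for (int k = 0; k < n; k++)
        for (int t = 0; t < n; t++)
        {
            double angle = (-2.0 * Math.PI * k * t) / n;
            outRe[k] += re[t] * Math.cos(angle) - im[t] * Math.sin(angle);
            outIm[k] += re[t] * Math.sin(angle) + im[t] * Math.cos(angle);
        }
    System.arraycopy(outRe, 0, re, 0, n);
    System.arraycopy(outIm, 0, im, 0, n);
}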
Although this is still computationally slow when compared to many feature based segmentation techniques, the Fourier transform is quite powerful.
It allows the input to be represented in the frequency domain, which can be
displayed as a pair of images. (It is not possible to represent both amplitude
and phase using a single monochrome image). Fortunately, this is not just an
algorithmic transformation but a physical one (Hanson & Manneberg 1997).
The FT can be implemented in real-time using optical techniques (Stark 1982, Mitkas et al. 1998). Optical computational devices enable parallel computation of Fourier transforms, thereby providing rapid processing.
Once the processing within the frequency domain is complete, the inverse
transform can be used to generate a new image in the original spatial domain.
The Fourier power, or amplitude, spectrum plays an important role in image
processing and analysis. This can be displayed, processed and analysed as
an intensity image. Since the Fourier transform of a real function produces a


complex function, F(m, n) = R(m, n) + i·I(m, n), the frequency spectrum of the image is the magnitude function |F(m, n)| = √(R²(m, n) + I²(m, n)) and the power spectrum (spectral density) is defined as P(m, n) = |F(m, n)|². For more details on the Fourier transform and its implementations, see Pitas (1993) and Gonzalez & Wintz (1987).
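Given separate real and imaginary arrays, both spectra follow directly from these definitions, as in the illustrative sketch below (not part of the NeatVision implementation).

// Frequency spectrum |F(m,n)| and power spectrum P(m,n) from the real and
// imaginary parts of a 2-D DFT; the output arrays are filled in place.
static void spectra(double[][] re, double[][] im,
                    double[][] magnitude, double[][] power)
{
    for (int m = 0; m < re.length; m++)
        for (int n = 0; n < re[m].length; n++)
        {
            power[m][n] = re[m][n] * re[m][n] + im[m][n] * im[m][n];
            magnitude[m][n] = Math.sqrt(power[m][n]);
        }
}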
In the NeatVision implementation of the 2-D DFT a number of temporary files are produced as random access file streams (whose size is indicated by the variable size), containing the real and imaginary data denoted temp_r and temp_i respectively. The following section outlines the typical filtering operations employed in the frequency domain. Fig. 3.65 illustrates the operation of these filters.
Low-pass Filter. The user defined input value is the cutoff frequency for the low-pass filter. If a point in the frequency spectrum lies within a given radial distance of the origin then the real and imaginary values of the frequency spectrum are retained, otherwise the real and imaginary values are set to zero. The variable size is the size of the input image file (measured in pixels).
r=(value * value)/4;
for (int y=0; y<size; y++)
  for (int x=0; x<size; x++)
  {
    // read the next (real, imaginary) pair from the temporary file
    temp_r = inputStream.readDouble();
    temp_i = inputStream.readDouble();
    // signed frequency coordinates in the range [-0.5, 0.5)
    xf=(double)(((x+(size/2))%size)-(size/2))/(double)size;
    yf=(double)(((y+(size/2))%size)-(size/2))/(double)size;
    if ((xf*xf + yf*yf)<r)
    {
      outputStream.writeDouble(temp_r);
      outputStream.writeDouble(temp_i);
    }
    else
    {
      outputStream.writeDouble(0.0);
      outputStream.writeDouble(0.0);
    }
  }

Adaptive Low-pass Filter. The user defined input limit is the cutoff frequency for the filter. Unlike the low-pass filter, which is based on the position of the filter in the Fourier space, this approach is dependent on the real and imaginary values of all the data in the Fourier space. If real² + imaginary² > limit then the real and imaginary values of the frequency spectrum are retained, otherwise the real and imaginary values are set to zero (Fig. 3.63).
for (int y=0; y<size; y++)
  for (int x=0; x<size; x++)
  {
    temp_r = inputStream.readDouble();
    temp_i = inputStream.readDouble();
    if ((temp_r*temp_r + temp_i*temp_i)>limit)
    {
      outputStream.writeDouble(temp_r);
      outputStream.writeDouble(temp_i);
    }
    else
    {
      outputStream.writeDouble(0.0);
      outputStream.writeDouble(0.0);
    }
  }

High-pass Filter. The user defined input value is the cutoff frequency for the high-pass filter. If a point in the frequency spectrum lies beyond a given radial distance from the origin then the real and imaginary values of the frequency spectrum are retained, otherwise the real and imaginary values are set to zero.
r=(value * value)/4;
for (int y=0; y<size; y++)
  for (int x=0; x<size; x++)
  {
    temp_r = inputStream.readDouble();
    temp_i = inputStream.readDouble();
    xf=(double)(((x+(size/2))%size)-(size/2))/(double)size;
    yf=(double)(((y+(size/2))%size)-(size/2))/(double)size;
    if ((xf*xf + yf*yf)>r)
    {
      outputStream.writeDouble(temp_r);
      outputStream.writeDouble(temp_i);
    }
    else
    {
      outputStream.writeDouble(0.0);
      outputStream.writeDouble(0.0);
    }
  }

Band-pass Filter. The user defined inputs value1 and value2 are the inner and outer bounds of the band-pass region (i.e. the region to be retained). Fig. 3.64 illustrates how certain textured features can be highlighted using the 2-D Discrete Fourier transform. The image is transformed into the frequency domain and an ideal band-pass filter (with circular symmetry) is applied. This has the effect of limiting the frequency information in the image. When the inverse transform is calculated, the resultant textured image has a different frequency content (Fig. 3.65).
r1=(value1 * value1)/4;
r2=(value2 * value2)/4;
for (int y=0; y<size; y++)
  for (int x=0; x<size; x++)
  {
    temp_r = inputStream.readDouble();
    temp_i = inputStream.readDouble();
    xf=(double)(((x+(size/2))%size)-(size/2))/(double)size;
    yf=(double)(((y+(size/2))%size)-(size/2))/(double)size;
    if ((xf*xf + yf*yf)>r1 && (xf*xf + yf*yf)<r2)
    {
      outputStream.writeDouble(temp_r);
      outputStream.writeDouble(temp_i);
    }
    else
    {
      outputStream.writeDouble(0.0);
      outputStream.writeDouble(0.0);
    }
  }

Band-stop Filter. The user defined inputs value1 and value2 are the inner and outer bounds of the band-stop region (Fig. 3.65).
r1=(value1 * value1)/4;
r2=(value2 * value2)/4;
for (int y=0; y<size; y++)
  for (int x=0; x<size; x++)
  {
    temp_r = inputStream.readDouble();
    temp_i = inputStream.readDouble();
    xf=(double)(((x+(size/2))%size)-(size/2))/(double)size;
    yf=(double)(((y+(size/2))%size)-(size/2))/(double)size;
    if ((xf*xf + yf*yf)>r1 && (xf*xf + yf*yf)<r2)
    {
      outputStream.writeDouble(0.0);
      outputStream.writeDouble(0.0);
    }
    else
    {
      outputStream.writeDouble(temp_r);
      outputStream.writeDouble(temp_i);
    }
  }

Selective Filter. The user defined input value0 is the filter's cutoff frequency. value1 and value2 are the x and y coordinate offset values respectively (Fig. 3.65).
r=(value0 * value0)/4;
xOffset=(int)((value1/2)*size);
yOffset=(int)((value2/2)*size);
for(int m=0; m<size; m++)
  for(int n=0; n<size; n++)
  {
    temp_r = inputStream.readDouble();
    temp_i = inputStream.readDouble();
    x=(int)((n+(size-xOffset))%size);
    y=(int)((m+(size+yOffset))%size);
    xf=(float)(((x+(size/2))%size)-(size/2))/(float)size;
    yf=(float)(((y+(size/2))%size)-(size/2))/(float)size;
    if ((xf*xf + yf*yf)<r)
    {
      outputStream.writeDouble(0.0);
      outputStream.writeDouble(0.0);
    }
    else
    {
      outputStream.writeDouble(temp_r);
      outputStream.writeDouble(temp_i);
    }
  }

Symmetric (Selective) Filter. This is similar to the selective filter with the exception that the selected region is reflected around the origin of the Fourier space to produce two filter regions (Fig. 3.65).
Case Study: Watermark analysis. Watermarks are generally used for
security and quality purposes. They take the form of brand names, brook
bonds or logos (Fig. 3.68). They are typically formed by a pressing process
using a metal stamp containing an impression of the desired pattern. This
pattern induces a variation in the paper thickness and is implemented at
the end of the wet stage of the paper production process. Traditionally, the
watermark is captured on-line and displayed on a CRT beside an image of
a good sample, where operators examine it occasionally. This enables the
identification of gross errors, whereas the operators classify other defects on
a subjective basis. The aim of this application is the reliable segmentation of
the watermark for plain and non-plain paper types. Non-plain paper samples
contain laid and chain lines (Fig. 3.68). The horizontal laid lines are due to
the wire mesh that was used to hold the paper during fabrication (Rauber et
al. 1996). The vertical chain lines are due to the thicker wire that was used
to support the laid lines.
The periodic nature of the laid and chain lines indicates the possibility of
examining and filtering the paper images in the Fourier domain. To remove
laid and chain lines, the filter design must contain a priori knowledge about
the line properties. The images are transformed into the Fourier domain
using the DFT (Fig. 3.69). Due to their higher frequency, the laid lines appear
as distinctive frequency spectrum peaks in this domain. The lower frequency
chain lines are not as obvious, since they are hidden by the significant amount
of low frequency information in the image. A selective low-pass filter can
be applied locally to the high frequency peaks, representing the laid lines
in the Fourier spectrum. This procedure is illustrated in Fig. 3.69. When
the inverse FT is applied to this image we are left with a representation
of the watermark without the laid lines. A white top-hat transformation (as
outlined in Sec. 4.2.5), thresholding and filtering stages are then applied to the



Fig. 3.63. Low-pass filtering in the frequency domain. (a) Original image. (b)
Frequency spectrum of (a). (c) Low-pass filtering applied to (b), value = .2. (d)
Inverse Fourier transform of (c). (e) Adaptive low-pass filtering applied to (b),
limit = .2. (f) Inverse Fourier transform of (e).

Fig. 3.64. Filtering a textured image in the frequency domain. (a) Original textured image. (b) Application of the 2-D DFT to (a). The image is the frequency spectrum shown as an intensity function. (c) Application of an ideal band-pass filter to (b). (d) The resultant spatial domain image after the inverse 2-D discrete Fourier transform is applied to (c). (e) Associated visual workspace.



Fig. 3.65. Filtering in the frequency domain. (a) High-pass. (b) Low-pass. (c)
Band-pass. (d) Band-stop. (e) Selective. (f) Symmetric (selective).


Fig. 3.66. Gaussian filtering in the frequency domain. (a) Original image. (b)
Frequency spectrum of (a). (c) 2-D Gaussian. (d) Resultant image after the application of the Gaussian in (c), see Fig. 3.67. (e) 3-D profile of (d). (f) Inverse Fourier
transform of (e).



Fig. 3.67. NeatVision visual workspace for Gaussian filtering in the frequency
domain.


Fig. 3.68. A typical watermark. (a) Plain paper. (b) Paper with chain (vertical)
and laid (horizontal) lines. Note the uneven illumination on both images.

resultant segmented image. Finally, a mask is applied to eliminate boundary pixels. While the resultant image has been degraded, sufficient information remains to allow the registering of the watermark on the paper sheet. Whereas this process is efficient in removing the laid lines, the chain lines are clearly visible. Fortunately the chain lines are easily detected and can be masked out from the image. See Whelan & Soille (1998) for further details.

3.5 Conclusion
The ideas outlined in this chapter will enable the reader to tackle a wide range
of machine vision applications. Over the coming chapters we will go into more
detail on specific methodologies dealing with the analysis of shape, colour and
texture. Specifically, in Chap. 4 we focus on the key set of techniques which
form the mathematical morphology framework. This approach to machine vision is particularly useful for the analysis of shapes in binary and grey scale images.

Fig. 3.69. Minimizing the effects of unwanted periodic lines in non-plain paper samples using the Fourier transform. (a) Original image. (b) DFT frequency spectrum as an intensity function. (c) Selective low-pass filtering of (b). (d) Inverse Fourier transform of (c). (e) White top-hat of (d). (f) Filtered double threshold of (e).

4. Mathematical Morphology

Mathematical morphology is a set-theoretic branch of nonlinear image processing and analysis that was initially developed by Matheron (1975). It deals
with geometric structure within an image. In fact, the term morphology refers
to the study of form and structure. Serra (1982), who describes the application of morphology to image analysis as the quantitative description of
geometric structures, produced the first systematic treatment of the subject.
Along with his colleagues at the Centre de Morphologie Mathematique in
France, Serra is one of the main independent contributors to the development
and growth of morphological techniques in image processing and analysis.
The two key reference books in this area by Matheron (1975) and Serra
(1988) tend to be highly mathematical and as such they are generally not
suitable for researchers new to the subject. For a good introduction to the
area of morphological image processing and analysis, the reader is directed
to Dougherty (1992) and more recently Soille (1999) (the latter book provides many working examples of morphology). Haralick & Shapiro (1992)
also provide an excellent tutorial text, which covers both binary and grey
scale morphology. Maragos (1987) reviews advances in the theory and application of morphological image analysis. This includes a discussion on the
application of morphology to filtering, edge detection, noise suppression, region filling, shape recognition, skeletonisation, shape smoothing and image
coding.
Mathematical morphological techniques appear across a broad range of
image processing and analysis applications. These techniques have proven to
be quite powerful, with applications including medical imaging and biology
(Liang et al. 1989), metallurgy (Serra 1982), surface description (Noble 1989),
texture analysis and synthesis (Serra 1988), microscopy (Klien & Serra 1972),
coding of binary images (Maragos & Schafer 1986), target tracking (Preston
1991), automated visual inspection (Sternberg 1986) and industrial vision
(Sternberg 1984, Whelan & Soille 1998).
The basic concept involved in mathematical morphology is simple: an image is probed with a template shape, called a structuring element (SE), to
find where the SE fits, or does not fit within a given image (Dougherty & Laplante 1995) (Fig. 4.1). By marking the locations where the template shape
fits, structural information can be derived from the image. The structuring


elements used in practice are usually geometrically simpler than the image
they act on, although this is not always the case, see Sec. 4.5. Common structuring elements include points, point pairs, vectors, lines, squares, octagons,
discs, rhombi and rings. Key operations include expanding and shrinking,
thinning, thickening and the removal of unwanted features from the object
of interest. The effect of these operations on the image is dependent on the size and shape of the SE.
Since shape is a prime carrier of information in many machine vision
applications, mathematical morphology has an important role to play in
industrial systems (Haralick et al. 1987, Soille 1999). The language of binary morphology is derived from that of set theory (Haralick & Shapiro
1992). General mathematical morphology is normally discussed in terms of
Euclidean N-space, but in digital image analysis we are only interested in
a discrete or digitised equivalent in two-space. The following chapter uses
a formalism based on the Minkowski set operations (Haralick & Shapiro
1992, Dougherty 1992, Sonka et al. 1993). Slight variations of these definitions can also be found in literature on this subject (Serra 1982, Maragos
1987, Noble 1989, Soille 1999).

Fig. 4.1. A SE fitting (B2) and not fitting (B1) into a given image scene A.

4.1 Binary Mathematical Morphology


The following analysis is restricted to binary images in a digital 2-D integer space, Z². The image set (or scene) under analysis will be denoted by A, with elements a = (a1, a2). The shape parameter, or structuring element, that will be applied to scene A will be denoted by B, with elements b = (b1, b2). The primary morphological operations examined are dilation, erosion, opening and closing.

4.1.1 Dilation and Erosion


Dilation. This is also referred to as filling and growing and is the expansion
of an image set A by a SE B. It is formally viewed as the combination of
the two sets using vector addition of the set elements and is equivalent to


Minkowski set addition. The dilation of an image set A by a SE B is denoted A ⊕ B = {c ∈ Z² | c = a + b for some a ∈ A and b ∈ B}. This can also be represented as the union of translates of the SE B, A ⊕ B = ∪_{a∈A} B_a, where ∪ represents the union of a set of points and the translation of B by the point a is given by B_a = {c ∈ Z² | c = b + a for some b ∈ B} (Haralick & Shapiro 1992). This is best explained by visualising a SE B moving over an image A in a raster fashion. Whenever the origin of the SE touches one of the image pixels in A, then the entire SE is placed at that location. For example, in Fig. 4.2 the grid image is dilated by a cross-shaped SE, contained within a 3 x 3 pixel grid. Dilation is commutative, A ⊕ B = B ⊕ A, and associative, (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C).

Fig. 4.2. Dilation of a grid image by a cross-shaped SE. The additional dilated region is highlighted in grey.

The NeatVision code for both the 4-connected and the 8-connected dilation operators is given below. This is the generic dilation operation and applies equally to grey scale images (Sec. 4.2.1).

// 4-connected dilation
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    a = input.getxy(x,y-1);      //    +---+---+---+
    b = input.getxy(x-1,y);      //    |   | a |   |
    c = input.getxy(x+1,y);      //    +---+---+---+
    d = input.getxy(x,y+1);      //    | b | e | c |
    e = input.getxy(x,y);        //    +---+---+---+
                                 //    |   | d |   |
                                 //    +---+---+---+
    z = Math.max(Math.max(Math.max(Math.max(a,b),c),d),e);
    output.setxy(x,y,z);
  }

// 8-connected dilation
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    a = input.getxy(x-1,y-1);    //    +---+---+---+
    b = input.getxy(x,y-1);      //    | a | b | c |
    c = input.getxy(x+1,y-1);    //    +---+---+---+
    d = input.getxy(x-1,y);      //    | d | e | f |
    e = input.getxy(x,y);        //    +---+---+---+
    f = input.getxy(x+1,y);      //    | g | h | i |
    g = input.getxy(x-1,y+1);    //    +---+---+---+
    h = input.getxy(x,y+1);
    i = input.getxy(x+1,y+1);
    z = Math.max(Math.max(Math.max(Math.max(a,b),Math.max(c,d)),
                 Math.max(Math.max(e,f),Math.max(g,h))),i);
    output.setxy(x,y,z);
  }

We can also dilate an image by a square N x N integer user defined SE.


The maskOffset variable is the number of pixels from the centre of the SE to the edge. This measure is required so only valid image pixels will be masked.

int maskSize = (int)Math.sqrt(mask.length);
int maskOffset = (maskSize - 1) / 2;
int pixelPlusMask;
for(int y=maskOffset; y<height-maskOffset; y++)
  for(int x=maskOffset; x<width-maskOffset; x++)
  {
    int max = Integer.MIN_VALUE;
    for(int my=0; my<maskSize; my++)
      for(int mx=0; mx<maskSize; mx++)
        if (mask[mx+(my*maskSize)] != null)
        {
          pixelPlusMask = input.getxy(x+mx-maskOffset, y+my-maskOffset)
                          + mask[mx+(my*maskSize)].intValue();
          if (pixelPlusMask>max)
            max = pixelPlusMask;
        }
    output.setxy(x,y,flow(max));
  }

Erosion. Erosion is equivalent to the shrinking (or reduction) of the image set A by a SE B. This is a morphological transformation which combines two sets using vector subtraction of set elements (Haralick & Shapiro 1992). The erosion of an image set A by a structuring element B is denoted A ⊖ B = {x ∈ Z² | for every b ∈ B there exists a ∈ A such that x = a − b}. This can also be represented as the intersection of the negative translates: A ⊖ B = ∩_{b∈B} A_{−b}, where ∩ represents the intersection of a set of points. Erosion of the image A by B is the set of all points for which B translated to a point x is contained in A. This consists of sliding the SE B across the image A and where B is fully contained in A (by placing the origin of the SE at the point x) then x belongs to the eroded image A ⊖ B. For example, in Fig. 4.3 the grid image is eroded by a cross-shaped SE contained within a 3 x 3 pixel grid. One should note that the erosion function is only equivalent to Minkowski subtraction if the SE is symmetric with respect to its origin.
The NeatVision code for both the 4-connected and the 8-connected erosion operators is given below. This is the generic erosion operation and applies equally to grey scale images (Sec. 4.2.1).


// 4-connected erosion
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    a = input.getxy(x,y-1);      //    +---+---+---+
    b = input.getxy(x-1,y);      //    |   | a |   |
    c = input.getxy(x+1,y);      //    +---+---+---+
    d = input.getxy(x,y+1);      //    | b | e | c |
    e = input.getxy(x,y);        //    +---+---+---+
                                 //    |   | d |   |
                                 //    +---+---+---+
    z = Math.min(Math.min(Math.min(Math.min(a,b),c),d),e);
    output.setxy(x,y,z);
  }

// 8-connected erosion
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    a = input.getxy(x-1,y-1);    //    +---+---+---+
    b = input.getxy(x,y-1);      //    | a | b | c |
    c = input.getxy(x+1,y-1);    //    +---+---+---+
    d = input.getxy(x-1,y);      //    | d | e | f |
    e = input.getxy(x,y);        //    +---+---+---+
    f = input.getxy(x+1,y);      //    | g | h | i |
    g = input.getxy(x-1,y+1);    //    +---+---+---+
    h = input.getxy(x,y+1);
    i = input.getxy(x+1,y+1);
    z = Math.min(Math.min(Math.min(Math.min(a,b),Math.min(c,d)),
                 Math.min(Math.min(e,f),Math.min(g,h))),i);
    output.setxy(x,y,z);
  }

Fig. 4.3. Erosion of a grid image by a cross-shaped SE.

We can also erode an image by a square N x N integer user defined SE.


int maskSize = (int)Math.sqrt(mask.length);
int maskOffset = (maskSize - 1) / 2;
int pixelPlusMask;
for(int y=maskOffset; y<height-maskOffset; y++)
  for(int x=maskOffset; x<width-maskOffset; x++)
  {
    int min = Integer.MAX_VALUE;
    for(int my=0; my<maskSize; my++)
      for(int mx=0; mx<maskSize; mx++)
        if (mask[mx+(my*maskSize)] != null)
        {
          pixelPlusMask = input.getxy(x+mx-maskOffset, y+my-maskOffset)
                          + mask[mx+(my*maskSize)].intValue();
          if (pixelPlusMask<min)
            min = pixelPlusMask;
        }
    output.setxy(x,y,flow(min));
  }

Fig. 4.4. Binary erosion and dilation. (a) Original binary image. (b) Erosion by a 3 x 3 square SE. (c) Dilation by a 3 x 3 square SE.

A duality relationship exists between certain morphological operators, such as erosion and dilation. This means that the equivalent of such an operation can be performed by its dual on the complement (negative) image and by taking the complement of the result. Although duals, erosion and dilation operations are not inverses of each other; rather they are related by the following duality relationships: (A ⊖ B)ᶜ = Aᶜ ⊕ B̂ and (A ⊕ B)ᶜ = Aᶜ ⊖ B̂, where Aᶜ refers to the complement of the image set A and B̂ = {x | x = −b for some b ∈ B} refers to the reflection of B about the origin. Serra (1986) refers to this as the transpose of the SE. This relationship is illustrated in Fig. 4.5.
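This duality can be verified numerically using the operators already listed in this chapter. The sketch below is illustrative only (it is not part of the NeatVision library) and assumes an 8-connected, symmetric SE so that B̂ = B; width and height denote the image dimensions, as in the other listings.

// Verify that complementing (A dilated by B) matches eroding the complement
// of A by B, pixel for pixel, for a symmetric SE.
checkDuality(input)
{
  left  = Not.not(Dilation.dilate8(input));
  right = Erosion.erode8(Not.not(input));
  for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
      if (left.getxy(x,y) != right.getxy(x,y))
        return(false);
  return(true);
}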

Fig. 4.5. Duality relationship. (a) SE B and the reflection of B about the origin, B̂ (SE origins are indicated by a dot). (b) Input image A. (c) A ⊖ B. (d) (A ⊖ B)ᶜ. (e) Aᶜ. (f) Aᶜ ⊕ B̂. This is equivalent to the image illustrated in (d).

4.1.2 Hit-and-Miss Transform


This morphological transform is used to select pixels with certain geometric properties (e.g., corners) while ignoring others. This operation¹ is represented by A ⊗ (J, K) = (A ⊖ J) ∩ (Aᶜ ⊖ K), for structuring elements J and K such that J ∩ K = ∅ (null set). Skeleton, Convex Hull, Corner and Limb End Detection can be implemented using this transform (Jain 1989).
hitAndMiss(input, Integer[] hitMask, Integer[] missMask)
{
  hit = ErodeNxN.erodeNxN(input, hitMask);
  miss = ErodeNxN.erodeNxN(Not.not(input), missMask);
  output = And.and(hit, miss);
  return(output);
}

The example illustrated in Fig. 4.6 outlines how a 3 x 3 binary corner finder can be implemented using this transform, where J and K are the hit and miss patterns respectively (as illustrated in Fig. 4.7). Thinning and thickening operations can also be implemented directly using the hit-and-miss transform.

¹ The use of the hit-and-miss symbol '⊗' should not be confused with the Boolean XOR operator described earlier.

Fig. 4.6. Binary corner detection using the Hit-and-Miss Transform. (a) Original binary image. (b) Application of the Hit-and-Miss structuring elements illustrated in Fig. 4.7. The detected points are indicated by white squares for clarity.

Fig. 4.7. Hit-and-Miss structuring elements for binary corner detection, where 'x' indicates a "don't care" state.

Thickening. A ⊙ (J, K) = A ∪ (A ⊗ (J, K)). By applying successive iterations of the thickening operator we can generate the convex hull (Jain 1989).
Thinning. A ⊘ (J, K) = A − (A ⊗ (J, K)) (logical subtraction). Successive iterations of the thinning operator can be used to generate the skeleton of an image (Jain 1989).
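Using the hitAndMiss method listed above, thickening and thinning reduce to a union and a logical subtraction respectively. The sketch below is illustrative only; Or.or (a pixel-wise OR, in the style of the And and Not classes used earlier) and the HitAndMiss class name are assumptions rather than part of the listed source.

// Thickening: add to the image the points selected by the hit-and-miss transform.
thicken(input, Integer[] hitMask, Integer[] missMask)
{
  // Or.or is an assumed pixel-wise OR, analogous to And.and above
  return(Or.or(input, HitAndMiss.hitAndMiss(input, hitMask, missMask)));
}

// Thinning: remove from the image the points selected by the hit-and-miss transform.
thin(input, Integer[] hitMask, Integer[] missMask)
{
  // logical subtraction: keep only the pixels of A not selected by the transform
  return(And.and(input, Not.not(HitAndMiss.hitAndMiss(input, hitMask, missMask))));
}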


4.1.3 Opening and Closing

Erosion and dilation tend to be used in pairs to extract, or impose, structure


on an image. The most commonly found erosion-dilation pairings occur in
the opening and closing transformations.
Opening. Opening is a combination of erosion and dilation operations that has the effect of removing isolated spots in the image set A that are smaller than the SE B and those sections of the image set A narrower than B. This is also viewed as a geometric rounding operation (Fig. 4.8). The opening of the image set A by the SE B is denoted A ∘ B and is defined as (A ⊖ B) ⊕ B.
// 4-connected opening
open4(input)
{ return(Dilation.dilate4(Erosion.erode4(input))); }

// 8-connected opening
open8(input)
{ return(Dilation.dilate8(Erosion.erode8(input))); }

Closing. Closing is the dual morphological operation of opening. This transformation has the effect of filling in holes and blocking narrow valleys in the image set A, when a SE B (of similar size to the holes and valleys) is applied (Fig. 4.8). The closing of the image set A by the SE B is denoted A • B and is defined as (A ⊕ B) ⊖ B. Opening and closing operations are related as follows: (A • B)ᶜ = Aᶜ ∘ B.

// 4-connected closing
close4(input)
{ return(Erosion.erode4(Dilation.dilate4(input))); }

// 8-connected closing
close8(input)
{ return(Erosion.erode8(Dilation.dilate8(input))); }

One important property that is shared by both the opening and closing operations is idempotency. This means that successive reapplication of the operations will not change the previously transformed image. Therefore, A ∘ B = (A ∘ B) ∘ B and A • B = (A • B) • B.
4.1.4 Skeletonisation

The basic morphological operations can be combined to generate a skeleton, or stick-figure, representation of a binary image. Let nB = B ⊕ B ⊕ ... ⊕ B (n times, n ≥ 0). Then the skeleton of A using B is denoted S(A, B) = ∪_n Sn(A, B), where Sn(A, B) = (A ⊖ nB) − [(A ⊖ nB) ∘ B] and the union runs up to N = Max[ n | A ⊖ nB ≠ null set ].
The incremental operation of this morphological process is illustrated in Fig. 4.9 and sketched in code below. Unfortunately, there are two basic difficulties with this approach. Firstly, the technique is not homotopic, that is, the operation changes the contiguity relationship between the regions and the holes in an image. Secondly, as we can see from Fig. 4.9(j), the resultant skeleton can have line widths greater than one pixel.
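A direct (and unoptimised) sketch of this definition is given below, built from the subtract, open8 and Erosion.erode8 operations used elsewhere in this chapter; Or.or (pixel-wise OR) and isEmpty (a test for an all-black image) are assumed helpers and not part of the listed source.

// Morphological skeleton: accumulate Sn(A,B) = (A erode nB) - [(A erode nB) open B]
// for n = 0, 1, 2, ... until the repeatedly eroded image is empty.
skeleton(input)
{
  eroded = input;
  skel = null;
  while (!isEmpty(eroded))                        // isEmpty: assumed all-black test
  {
    sn = subtract(eroded, open8(eroded));         // Sn(A, B)
    skel = (skel == null) ? sn : Or.or(skel, sn); // union of the partial skeletons
    eroded = Erosion.erode8(eroded);              // A eroded (n+1) times
  }
  return(skel);
}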


Fig. 4.8. Binary opening and closing. (a) Original binary image. (b) Opening by a square SE. (c) Closing by a square SE.

4.1.5 Structuring Element Decomposition

Some vision systems can perform basic morphological operations very quickly
in a parallel and/or pipelined manner. Implementations that involve such special purpose hardware tend to be expensive, although there are some notable
exceptions. Unfortunately, some of these systems impose restrictions on the
shape and size of the structuring elements that can be handled. Therefore,
one of the key problems associated with the application of morphological
techniques to industrial image analysis is the generation and/or decomposition of large structuring elements (Pitas & Venetsanopoulos 1990). Two main
strategies have emerged to tackle this problem, namely serial and parallel decomposition.
Serial Decomposition. The first technique is called dilation or serial decomposition. This decomposes certain large structuring elements into a sequence of successive erosion and dilation operations, each step operating on
the preceding result. Unfortunately, the decomposition of large structuring
elements into smaller ones is not always possible. Also, those decomposi-


Fig. 4.9. Skeletonisation. (a) SE B. (b) Original image A. (c) A ∘ B. (d) S0(A, B) = A − A ∘ B. (e) A ⊖ B. (f) S1(A, B) = (A ⊖ B) − [(A ⊖ B) ∘ B]. (g) A ⊖ 2B. (h) S2(A, B) = (A ⊖ 2B) − [(A ⊖ 2B) ∘ B]. (i) A ⊖ 3B. (j) S(A, B) = ∪ Sn(A, B).

tions that are possible are not always easy to identify and implement. If a large SE B can be decomposed into a chain of dilation operations, B = B1 ⊕ B2 ⊕ ... ⊕ Bn (Fig. 4.10 (Waltz 1988)), then the dilation of the image set A by B is given by: A ⊕ B = A ⊕ (B1 ⊕ B2 ⊕ ... ⊕ Bn) = (((A ⊕ B1) ⊕ B2) ... ) ⊕ Bn.

Fig. 4.10. Construction of a 7 x 7 SE by successive dilation of a 3 x 3 SE. Left to right: initial pixel, 3 x 3 SE and the result of successive dilations.

Similarly, using the chain rule, which states that A ⊖ (B ⊕ C) = (A ⊖ B) ⊖ C, the erosion of A by B is given by: A ⊖ B = A ⊖ (B1 ⊕ B2 ⊕ ... ⊕ Bn) = (((A ⊖ B1) ⊖ B2) ... ) ⊖ Bn (Zhuang & Haralick 1986).
Parallel Decomposition. A second approach to the decomposition problem is based on breaking up the SE, B, into a union of smaller components, B1, ..., Bn. We can think of this approach as tiling of the SE by sub-structuring elements (Fig. 4.11 (Waltz 1988)). Since the tiles do not need to be contiguous or aligned, any shape can be specified without the need for serial decomposition of the SE, although the computational cost of this approach is proportional to the area of the SE (Waltz 1988). This is referred to as union or parallel decomposition. Therefore, with B decomposed into a union of smaller structuring elements, B = B1 ∪ B2 ∪ ... ∪ Bn, the dilation of an image A by the SE B can be rewritten as: A ⊕ B = A ⊕ (B1 ∪ B2 ∪ ... ∪ Bn) = (A ⊕ B1) ∪ (A ⊕ B2) ∪ ... ∪ (A ⊕ Bn). Likewise, the erosion of A by the SE B can be rewritten as: A ⊖ B = A ⊖ (B1 ∪ B2 ∪ ... ∪ Bn) = (A ⊖ B1) ∩ (A ⊖ B2) ∩ ... ∩ (A ⊖ Bn). This makes use of the fact that A ⊖ (B ∪ C) = (A ⊖ B) ∩ (A ⊖ C). Due to the nature of this decomposition procedure, it is well suited to implementation on parallel computer architectures.

Fig. 4.11. Tiling of a 9 x 9 arbitrary SE. Left: the initial 9 x 9 structuring element. Right: tiling with nine 3 x 3 sub-structuring elements.

Waltz (1988) compared these structural element decomposition techniques and showed that the serial approach has a 9:4 speed advantage over
its parallel equivalent. This was based on an arbitrarily specified 9 x 9 pixel
SE, implemented on a commercially available vision system. However, the
parallel approach has a 9:4 advantage in the number of degrees of freedom,
since every possible 9 x 9 SE can be achieved with the parallel decomposition,
but only a small subset can be realised with the serial approach. Although
slower than the serial approach, it has the advantage that there is no need
for serial decomposition of the SE.
Classical parallel and serial methods mainly involve the numerous scanning of image pixels and are therefore inefficient when implemented on conventional computers. This is the case, since the number of scans depends
on the total number of pixels (or edge pixels) in the shape to be processed
by the morphological operator. The parallel approach is suited to some customised (parallel) architectures, while the ability to implement such parallel
approaches on serial machines is discussed by Vincent (1991a).
4.1.6 Interval Coding

While 2-D binary images may be represented in a number of ways (Vincent 1991a, Vincent 1991b), interval coding (Piper 1985, Liang et al. 1989)


allows the efficient implementation of binary morphological operations on serial computers. The coding scheme is based on the earlier work of Young et al. (1981) and is similar to the run-length encoding discussed in Sec. 3.3.4. This approach places no restrictions on the shape or size of the scene, or of the structuring elements, to be manipulated.
Erosion and Dilation Using Interval Coding. Any image set can be regarded as a union of intervals or run lengths, denoted I, where each interval is a set of pixels such that I = (LC, RC, R), where LC is the pixel column coordinate at the start of the interval, RC is the pixel column coordinate at the end of the interval and R is the row on which the interval occurs. Both the scene A and the SE B can be represented as unions of intervals: A = ∪ I_A and B = ∪ I_B.

Fig. 4.12. Interval coding triple.

The erosion of A by B can be rewritten in terms of their interval codes: A ⊖ B = ∩_B (A ⊖ I_B) = ∩_B ∪_A (I_A ⊖ I_B), where I_A = [I_A1, I_A2, ..., I_An] and I_B = [I_B1, I_B2, ..., I_Bn]. The dilation of A by B can be expressed in terms of their interval codes: A ⊕ B = ∪_B (A ⊕ I_B) = ∪_B ∪_A (I_A ⊕ I_B). Therefore, the dilation of two intervals is reduced to the addition of the elements within those intervals. If I_A1 = [LC1, RC1, R1] and I_B1 = [LC2, RC2, R2] then I_A1 ⊕ I_B1 = [LC1 + LC2, RC1 + RC2, R1 + R2]. Similarly the erosion of I_A1 by I_B1 is reduced to the difference of the elements within the interval: I_A1 ⊖ I_B1 = [LC1 − LC2, RC1 − RC2, R1 − R2] (note that the resulting interval is empty if LC1 − LC2 > RC1 − RC2).
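The interval arithmetic itself is trivial; the sketch below (not from the original source) stores each interval as a three-element array {LC, RC, R}.

// Dilation of two intervals: add corresponding elements.
static int[] dilateInterval(int[] iA, int[] iB)
{
    return new int[] { iA[0] + iB[0], iA[1] + iB[1], iA[2] + iB[2] };
}

// Erosion of two intervals: subtract corresponding elements; the result is
// empty (null) if the start column exceeds the end column.
static int[] erodeInterval(int[] iA, int[] iB)
{
    int[] result = { iA[0] - iB[0], iA[1] - iB[1], iA[2] - iB[2] };
    return (result[0] > result[1]) ? null : result;
}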

4.2 Grey Scale Mathematical Morphology


Binary morphological operations can be extended naturally to process grey
scale imagery by the use of neighbourhood minimum and maximum functions. Heijmans (1991) presents a detailed study of grey scale morphological operators, in which he outlines how binary morphological operators and
thresholding techniques can be used to build a large class of useful grey scale
morphological operators. Sternberg (1986) discusses the application of such
morphological techniques to industrial inspection tasks.


4.2.1 Basic Grey Scale Operators

In Fig. 4.13, a 1-D morphological filter operates on an analogue signal (equivalent to a cross-section of a grey scale image), where the input signal is represented by the upper curve and the output by the lower curve. In this simple
example, the SE has an approximately parabolic form. In order to calculate
a value for the output signal, the SE is pushed upwards, from below the
input curve. The height of the top of the SE is noted. This process is then
repeated, by sliding the SE sideways and the locus of the SE's origin is traced
to produce the output signal. Notice how this particular operator attenuates
the intensity peak but follows the input signal quite accurately everywhere
else. Subtracting the output signal from the input would produce a result in
which the intensity peak is emphasised and all other variations are reduced.
The effect of the basic morphological operators on 2-D grey scale images can
also be explained in these terms. Imagine the grey scale image as a landscape
in which each pixel can be viewed in three-dimensions. The extra height dimension represents the grey scale value of a pixel. We generate new images
by passing the SE above/below this landscape.

Fig. 4.13. Grey scale erosion: pass the SE below the image surface and trace the path taken by its origin.

Grey Scale Dilation. This is computed as the maximum of translations


of the grey surface. Grey level dilation of image A by the SE B produces an image C defined by: C(r, c) = Max_(i,j)[A(r − i, c − j) + B(i, j)] = (A ⊕ B)(r, c), where A, B and C are grey level images (Fig. 4.14 (Dougherty 1992)).
Commonly used grey level structuring elements include rods, disks, cones and
hemispheres. Fig. 4.15 illustrates the operation of a grey scale dilation by a
linear SE. This can also be viewed as obtaining the reflection of the SE and
then passing it above the image top surface and tracing the path taken by its
origin. This operation is commonly used to smooth small negative contrast
grey level regions in an image (Fig. 4.16).
Grey Scale Erosion. The grey value of the erosion at any point is the
maximum value for which the SE centred at that point still fits entirely
within the foreground under the surface. This is computed by taking the
minimum of the grey surface translated by all the points of the structuring

Fig. 4.14. Grey scale dilation by a conical SE.

Fig. 4.15. Grey scale dilation and erosion of a linear image f(x) by a 1 x 3 SE B = [25, 50, 25]. (a) Original image f(x) (bottom) and the associated SE B. (b) f(x) ⊕ B. (c) f(x) ⊖ B.

element. Grey level erosion of an image A by the SE B produces an image C defined by: C(r, c) = Min_(i,j)[A(r + i, c + j) − B(i, j)] = (A ⊖ B)(r, c) (Fig. 4.13). This operation is commonly used to smooth small positive contrast grey level regions in an image (Fig. 4.16).
Grey Scale Opening. This operation is defined as the grey level erosion of
the image followed by the grey level dilation of the eroded image. That is, it
will cut down the peaks in the grey level topography to the highest level for
which the SE fits under the surface (Fig. 4.17).
Grey Scale Closing. This operation is defined as the grey level dilation of
the image followed by the grey level erosion of the dilated image. Closing fills
in the valleys to the maximum level for which the SE fails to fit above the
surface (Figs. 4.17 and 4.18).


4.2.2 Noise Removal using Grey Scale Morphology

By combining the opening and closing operations the effect of both white
and black noise in an image can be reduced. In Fig. 4.19 the original image
is corrupted with a low level of salt and pepper noise (i.e. a random amount
of black and white pixels are overlaid on the image). By opening the image
with a 3 x 3 square SE the white noise can be eliminated from the image, although the image suffers some degradation. Similarly, by applying the closing
operator, also using a 3 x 3 square SE, black noise can be removed.
saltnpepper(inputImage, double noiseInput)
{
  for(int y=0; y<height; y++)
    for(int x=0; x<width; x++)
    {
      if (Math.random()<noiseInput)
      {
        if (Math.random()<0.5)
          pixelValue = BLACK;
        else
          pixelValue = WHITE;
      }
      else
        pixelValue = inputImage.getxy(x,y);
      outputImage.setxy(x,y,pixelValue);
    }
  return(outputImage);
}

Fig. 4.19(d) illustrates the combined image resulting from the opening
and closing stages. Fig. 4.19 also illustrates the resulting images when the
Sobel edge detector is applied to the original noisy image and the image after
the morphological noise removal has been applied.
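In code, this open-then-close combination is simply two calls to the operators defined earlier (an illustrative sketch).

// Salt (white) noise is removed by the opening; pepper (black) noise by the
// subsequent closing, both using the 8-connected operators given above.
removeSaltAndPepper(input)
{
  return(close8(open8(input)));
}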
4.2.3 Morphological Gradients

The application of some of the basic morphological operations outlined previously can lead to some simple gradient detection techniques (Soille 1999).
Internal Gradient. This is represented by A − (A ⊖ B). This is also referred to as a half-gradient by erosion and enhances the internal boundaries of objects brighter than the background. External boundaries of objects darker than the background are also enhanced.
External Gradient. This is represented by (A ⊕ B) − A. This is also referred to as a half-gradient by dilation and is the complementary operation to the internal gradient.
Morphological Gradient. The simplest morphological gradient of an image A, with respect to a SE B, is the sum of the internal and external gradients and can be defined as GRAD(A) = (A ⊕ B) − (A ⊖ B), see Fig. 4.20.
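These three gradients translate directly into code; the sketch below is illustrative and uses the subtract operation and the 8-connected erosion and dilation classes listed earlier in this chapter.

// Internal gradient: A - (A eroded by B).
internalGradient(input)
{ return(subtract(input, Erosion.erode8(input))); }

// External gradient: (A dilated by B) - A.
externalGradient(input)
{ return(subtract(Dilation.dilate8(input), input)); }

// Morphological gradient: (A dilated by B) - (A eroded by B).
morphologicalGradient(input)
{ return(subtract(Dilation.dilate8(input), Erosion.erode8(input))); }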


Fig. 4.16. Grey scale dilation and erosion (8-connected) using a 3 x 3 flat SE (i.e. all entries set to 1). (a) Original image. (b) 3-D representation of (a). (c) Dilation. (d) 3-D representation of (c). (e) Erosion. (f) 3-D representation of (e).


Fig. 4.17. Grey scale opening and closing of a linear image f(x) by a 1 x 3 structuring element B = [25, 50, 25]. (a) Original image f(x) (bottom) and its associated SE B. (b) f(x) ∘ B. (c) f(x) • B.

Fig. 4.18. Grey scale opening and closing (8-connected) of Fig. 4.16(a). (a) Opening. (b) 3-D representation of (a). (c) Closing. (d) 3-D representation of (c).


Fig. 4.19. Noise removal (8-connected). (a) Original image contaminated with salt
and pepper noise. (b) Opening. (c) Closing. (d) Opening and closing operations
combined. (e) Sobel edge detector applied to (d). (f) Sobel edge detector applied
to (a).



Fig. 4.20. The morphological gradient applied to an x-ray image of a chicken fillet.
(Courtesy of M. Graves, Intelligent Manufacturing Systems Ltd., UK). (a) Original
grey scale image. (b) Internal boundary. (c) External boundary. (d) Morphological
gradient.

4.2.4 Point-Pairs

Fig. 4.21 illustrates how closing by 7 x 7 point-pair structuring elements


(Fig. 4.22) in both the horizontal and vertical directions can be used to
isolate lines in these orientations. By rotating these structuring elements we
can determine the orientation of lines within a given image.
Covariance. This involves the erosion of an image with points separated
by an increasing distance along the horizontal direction. Therefore, we can
obtain a measure of the volume of the image as it is eroded by a pair of
points. For example, when the distance equals the periodicity of the dashed
lines in Fig. 4.24, then the lines are not eroded. This can also be useful in
the removal of structured noise and determining the orientation of structures
within a periodic image (Soille 1999).


Fig. 4.21. Closing by a 7 x 7 point-pair in horizontal and vertical directions using the point-pair SEs illustrated in Fig. 4.22.

Fig. 4.22. Point-pair structuring elements. A dark grid entry indicates a "1" and a clear grid entry indicates a "don't care" state. (a) Horizontal SE. (b) Vertical SE.

Fig. 4.23. Visual workspace associated with Fig. 4.21.

4.2.5 Top-Hat Transform


In general, we design a given morphological filter so as to extract, or filter
out a given structure within the image. Such a design will be dependent on
the shape, size and orientation of the structures under examination. Morphological top-hats are different, in the sense that they use knowledge about the
shape characteristics that are not shared by the relevant image structures.
An opening with a SE that does not fit the relevant image structures is then
used to remove them from the image. These structures are recovered through
the arithmetic difference between the image and its opening. This operation
is called a White Top-hat or Top-hat by Opening (Meyer 1986) and is defined
as WTH(A) = A - (A ∘ K). The dual of this operation is the Black Top-hat or Valley Detector and is defined as BTH(A) = (A • K) - A. Fig. 4.25 illustrates


Fig. 4.24. Determining the periodicity of lines by the erosion of point-pairs with increasing inter-pixel distance. (a) Input image. (b) Erosion with a linear SE point-pairing separated by 10 pixels. (c) Erosion with a linear SE point-pairing separated by 16 pixels. This results in retention of the periodic lines. (d) Erosion with a linear SE point-pairing separated by 17 pixels.

the application of the top-hat to the linear image f(x). Fig. 4.26 illustrates
the effect of the top-hat transform on a grey scale image. One particularly
useful feature of this operator is its ability to remove a light gradient from
an image (Fig. 4.27).
tophat4(input){
  return(subtract(input,open4(input)));
}

tophat8(input){
  return(subtract(input,open8(input)));
}

valley4(input){
  return(subtract(close4(input),input));
}


Fig. 4.25. Top-hat of a linear image f(x), using a 1 x 3 SE B = [1,1,1]. Left to right: f(x), White top-hat: f(x) - (f(x) ∘ B) and Black top-hat: (f(x) • B) - f(x).

Fig. 4.26. Grey scale top-hat transformation (8-connected) of Fig. 4.16(a). (a) White top-hat. (b) 3-D representation of (a). (c) Black top-hat. (d) 3-D representation of (c).

valley8(input){
  return(subtract(close8(input),input));
}

Fig. 4.27. From an uneven to an even background light gradient using the white top-hat transform. (a) Input image. (b) Opening with a 7 x 7 square. (c) White top-hat: arithmetic difference between (a) and (b).

Top-hat Based Contrast Enhancement. The top-hat can also be used


to enhance the contrast in an image. This is implemented as follows: Contrast
enhancement of A = A + WTH(A) - BTH(A).
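Assuming the white and black top-hat images have already been computed (for example with the tophat and valley routines given earlier), the enhancement is a pixel-wise combination. The sketch below is illustrative only; it works on int[width][height] arrays and clamps the result to the usual 0-255 range.

// Illustrative sketch: contrast enhancement A + WTH(A) - BTH(A).
// a, wth and bth are assumed to be int[width][height] grey scale arrays
// of the same size; the result is clamped to the 0..255 range.
public class TopHatContrast {
    static int[][] enhance(int[][] a, int[][] wth, int[][] bth) {
        int w = a.length, h = a[0].length;
        int[][] out = new int[w][h];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++) {
                int v = a[x][y] + wth[x][y] - bth[x][y];
                out[x][y] = Math.max(0, Math.min(255, v));
            }
        return out;
    }
}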

4.3 Morphological Reconstruction


Morphological reconstruction techniques allow the isolation of certain features within an image based on the manipulation of a mask image G and
a marker image F. Morphological reconstruction algorithms are defined as
dilations or erosions performed until stability is reached, i.e. where the propagation or shrinking of the marker is totally impeded by the control image
G. The major design issue resides in the choice of adequate images for F and
G (Serra 1982, Soille 1997). As we will see, the choice of the mask-marker
pairs is generally application dependent.


4.3.1 Conditional Dilation


If a dilation is successively repeated, then the original image grows without
any bounds. Sometimes it is important to restrict growth due to dilation
and this is referred to as conditional dilation and is represented by (F ⊕ B)^(n) ∧ G, where F is the marker image, G is the mask image used to limit the propagation of the dilation operation, ∧ represents the point-wise minimum operation and n represents the number of iterations in the dilation procedure.
Fig. 4.28 illustrates the conditional dilation of a marker point within the
connected component of interest. This approach suffers from the drawback
that for larger structuring elements bridging between components can occur
prior to masking. This problem can be largely overcome by implementing
geodesic dilation.


Fig. 4.28. Conditional dilation. Beginning with the marker image we repeatedly
apply conditional dilations relative to the entire original image using a 3 x 3 SE.
(a) The mask image G is shaded. The marker image F is indicated by the white
pixel at the centre of the figure 8. (b) 20 iterations of the conditional dilation. (c)
50 iterations of the conditional dilation. Notice that the bridging effect begins to
occur on the bottom right edge of the character "A" in the image. (d) 70 iterations
of the conditional dilation. The effect of bridging is clearly illustrated.


4.3.2 Geodesic Dilation

The geodesic dilation (a geodesic distance is defined as the shortest discrete distance between two pixels within a graph or connected region) of the marker image F with respect to the SE B and under the control of the mask image G is defined as δ_G^(n)(F) = [[[(F ⊕ B) ∧ G] ⊕ B] ∧ G] ... ⊕ B]^(n) ∧ G, where ∧ represents the point-wise minimum operation and n represents the number of step by step iterations in the dilation procedure. Geodesic dilation is less than or equal to the corresponding conditional dilation, as the constraint on the growth of F due to G is more severe than for conditional dilation (Fig. 4.29).

Fig. 4.29. Extraction of a connected component using geodesic dilation. Beginning with the marker image we repeatedly apply conditional dilations relative to the entire original image using a 3 x 3 SE. (a) The mask image G is shaded. The marker image F is indicated by the white pixel at the centre of the figure 8. (b) 20 iterations of the geodesic dilation. (c) 50 iterations of the geodesic dilation. (d) 70 iterations of the geodesic dilation.

4.3.3 Geodesic Erosion

The geodesic erosion of the marker image F with respect to the SE B and under the condition given by the mask image G is defined as ε_G^(n)(F) = [[[(F ⊖ B) ∨ G] ⊖ B] ∨ G] ... ⊖ B]^(n) ∨ G, where ∨ represents the point-wise maximum operation and n represents the number of step by step iterations in the erosion procedure. The mask limits the erosion of the marker image immediately after each erosion stage.
4.3.4 Reconstruction by Dilation

This is defined as the geodesic dilation of the marker image F with respect to the mask image G until stability is reached and is denoted R_G(F) = δ_G^(n)(F) such that δ_G^(n+1)(F) = δ_G^(n)(F). The control image G is therefore reconstructed from a marker image F (Vincent 1993, Soille 1999).
Fig. 4.30 illustrates how reconstruction by dilation, when applied to a
single pixel, can isolate a region within a complex scene. Figs. 4.32 and 4.33
illustrate how the removal of boundary objects from binary and grey scale
images can be implemented using this approach.
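A direct, if inefficient, way of sketching reconstruction by dilation in Java is to iterate the geodesic dilation of the previous section until no pixel changes. The code below is illustrative rather than the NeatVision implementation: images are int[width][height] arrays, a flat 3 x 3 SE is assumed and the marker is assumed to lie at or below the mask.

// Illustrative sketch: reconstruction by dilation, R_G(F).
// Repeated geodesic dilation (3 x 3 flat SE, point-wise minimum with the
// mask) until stability; marker and mask are int[width][height] arrays.
public class ReconstructByDilation {
    static int[][] reconstruct(int[][] marker, int[][] mask) {
        int w = mask.length, h = mask[0].length;
        int[][] cur = new int[w][h];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                cur[x][y] = Math.min(marker[x][y], mask[x][y]);  // marker assumed <= mask
        boolean changed = true;
        while (changed) {
            changed = false;
            int[][] next = new int[w][h];
            for (int x = 0; x < w; x++)
                for (int y = 0; y < h; y++) {
                    int m = cur[x][y];                            // dilation: local maximum
                    for (int i = -1; i <= 1; i++)
                        for (int j = -1; j <= 1; j++) {
                            int xx = x + i, yy = y + j;
                            if (xx >= 0 && yy >= 0 && xx < w && yy < h)
                                m = Math.max(m, cur[xx][yy]);
                        }
                    next[x][y] = Math.min(m, mask[x][y]);         // geodesic step: clip by mask
                    if (next[x][y] != cur[x][y]) changed = true;
                }
            cur = next;
        }
        return cur;
    }
}

In practice queue-based algorithms such as those described in Vincent (1993), cited above, are preferred, since the brute-force loop here rescans the entire image on every iteration.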
4.3.5 Reconstruction by Erosion

Similarly, Reconstruction by Erosion can be defined as R*_G(F) = ε_G^(n)(F) such that ε_G^(n+1)(F) = ε_G^(n)(F) (Soille 1999).

4.3.6 Ultimate Erosion

Ultimate erosion involves eroding the image until the point just before each
component within the image disappears. The union of these components results in the ultimate erosion. This can be viewed as the relative minima of
the complement of the grass fire transform, see Sec. 3.3.2.
4.3.7 Double Threshold

The double threshold operator (DBLT) thresholds the input image for two ranges of grey scale values, one being included in the other. The threshold for the narrow range is then used as a seed for the reconstruction of the threshold for the wide range:

DBLT(A) = R_{T_wide(A)}(T_narrow(A))     (4.1)
where R denotes the morphological reconstruction by dilation (Lantuejoul
& Maisonneuve 1984, Soille 1997) and T is the threshold operation. The resulting binary image is much cleaner than that obtained with a unique threshold. Moreover, the result is more robust to slight modifications of threshold
values. The double threshold technique is very similar to the hysteresis threshold proposed in Canny (1986) (Sec. 3.2.3).
In the example illustrated in Fig. 4.34, the double threshold is applied in
conjunction with reconstruction by dilation to remove all the pixels which do


Fig. 4.30. Reconstruction by dilation. (a) Input mask image G. (b) Single pixel marker image F (indicated by an arrow). This forms the locus of the geodesic dilation operation. (c) Result of reconstruction by dilation R_G(F) when applied to (a) and (b). Geodesic dilation is terminated when no further growth occurs. (d) Resultant image when (c) is subtracted from the original image (a), G - R_G(F).

not belong to the watermark image. The final filtering stage, Fig. 4.34(f), is
based on a surface area criterion applied to the closing of the double threshold
image: all connected components having a number of pixels smaller than a
given threshold are removed. The filtered closed image is then intersected
with the double threshold image.
4.3.8 Image Maxima

The Image Maxima are defined as RMAX(A) = A - R_A(A - 1). This will mark bright image objects.


Fig. 4.31. Visual workspace for the operation described in Fig. 4.30.

4.3.9 Image Minima

The Image Minima are defined as RMIN(A) = R*_A(A + 1) - A. This will mark dark image objects. RMAX = RMIN^c. Image maxima and minima can be used to mark image objects for reconstruction.

4.4 Morphological Segmentation


4.4.1 Skeleton Influence by Zones (SKIZ)

For a given connected component within an image, its Influence Zone (IZ)
is the collection of points that are closer to it than any other component.
Alternatively, the IZs of a collection of points are defined as the Voronoi regions
associated with these points. The SKIZ operation produces the boundary of
the various IZ, indicating the points that do not belong to any IZ. This is
approximated by the application of the thinning operator to the background
of a binary image and then removing any spurious branches, see Sec. 3.3.2.
A Geodesic SKIZ restricts attention to C and generates the IZ of A relative
to C, where C is a component of the image and A is a disconnected subset
of C. While this can be used to segment an image (Fig. 4.35) the SKIZ does
not always produce suitable results. This leads us to the examination of the
watershed segmentation technique.
4.4.2 Watershed Segmentation

This is a relative of adaptive thresholding and is commonly used to separate


neighbouring objects. It produces good results in areas that are touching or


Fig. 4.32. Removing boundary objects from the binary image using reconstruction by dilation. (a) Mask image G. (b) The marker image F is the point-wise minimum between a single pixel wide white border and the mask image G. (c) Reconstruction by dilation, R_G(F). (d) Removal of the white regions touching the image border, G - R_G(F).

Fig. 4.33. Removing boundary objects from a grey scale image using reconstruction by dilation. (a) Original mask image G. (b) The marker image F is the point-wise minimum between a single pixel wide white border and the mask image G. (c) Reconstruction by dilation, R_G(F). (d) Removal of the grey scale regions touching the image border, G - R_G(F).

Fig. 4.34. Image reconstruction using double thresholding. (a) Original watermark image. (b) White top-hat of (a). (c) First threshold of (b): marker image F. The threshold value is chosen to minimise the unwanted background information without losing too much of the watermark. (d) Second threshold of (b): mask image G. The threshold value is chosen to maximise the watermark data. (e) Reconstruction of mask image from the marker. (f) After filtering based on the surface area of the connected components of the closing of the reconstructed image by a 3 x 3 SE.


Fig. 4.35. Skeleton Influence by Zones, SKIZ. (a) Original image. (b) Resultant image from the SKIZ operator. (c) Combination of (a) and (b). (d) Image (c) zoomed to highlight incorrect segmentation.

have few grey levels separating them. The key to this form of segmentation
is finding suitable markers (i.e. local minima). Watersheds are zones dividing
adjacent Catchment Basins (CBs), where a catchment basin is defined as a local minimum which represents a low gradient region interior. While this
approach can be explained in a number of different ways, the Immersion
Approach is one of the more intuitive. This involves assuming that a hole exists at each local minimum and, as the surface is placed in water, the CBs are
filled from the bottom up. If two CBs are likely to merge due to further
immersion, a dam is built all the way to the surface (highest grey scale
level). This dam represents the watershed line. An efficient implementation
of this approach involves sorting the pixels in increasing order of grey scale
values, followed by a flooding step consisting of a fast breadth-first scanning of all the pixels in the order of their grey values (Vincent & Soille 1991). As
well as being computationally efficient, this implementation overcomes the
inaccuracies that occur in previous approaches to the watershed algorithm.


Fig. 4.36 illustrates the implementation of the watershed technique using


the immersion approach. The NeatVision implementation of the watershed
is based on this technique as described in Vincent & Soille (1991).
Raw watersheds when applied to complex grey scale images suffer from the
disadvantage of producing highly over-segmented images with a multitude of
CBs, Fig. 4.39. This problem is overcome by using region markers to reduce
the numbers of CBs. The reader is directed to Soille (1999) for further details
on this approach.

4.5 Case Study: Geometric Packing


The ability to manipulate previously unseen objects under visual control is
one of the key tasks in the successful implementation of robotic, automated
assembly and adaptive material handling systems. It is within the context
of this framework that an industrial vision packing strategy has been developed. Its two main components are a geometric packer, based on the principles of mathematical morphology, which takes an arbitrary shape in a given
orientation and puts the shape into place, in that orientation. The second
component, a heuristic packer, is concerned with the ordering and alignment
of shapes prior to applying them to the geometric packer. Details on the
heuristic packer are beyond the scope of this book (Whelan & Batchelor
1993, Whelan & Batchelor 1996).

4.5.1 Geometric Packer Implementation


The following section outlines the intermediate steps involved in the morphological packing of an arbitrary shape (denoted by the SE B). The image
scene (i.e. the region to be packed) is denoted by the image set A. The morphological operations are illustrated in Fig. 4.40.
The steps involved are:
1. Digitise the scene, image set A and the shape to be packed, B.
2. Erode the image scene A, by the SE B, to produce the erosion residue
image C = A ⊖ B. Every white pixel in this residue image represents
a valid packing location. This step will be valid for any choice of SE
origin point. Conventionally and to be consistent with the ideas discussed
earlier, the origin will be taken to be the SE centroid.
3. Scan the erosion residue image, in a raster fashion, for the location of
the first (top-left) white pixel. This location is denoted by (fitx, fity) in
the program that follows and corresponds to the first possible packing
location of B in the scene A, when it is scanned in this way. It has been
suggested by Haralick & Shapiro (1992) that the further erosion of the
residue image C by a standard 3 x 3 square SE, prior to searching for
the first packing location, would enable control of the spacing between


Fig. 4.36. Watershed Segmentation. (a) Original image. (b) Negative of its distance function. (c) Profile of (b) along the horizontal line indicated in (b). (d) Immersion of the image until the first watershed point is found and flagged. (e) Immersion continues until all the watershed points are found and flagged. (f) Profile with the watershed points flagged. (g) Result of the watershed segmentation process.

Fig. 4.37. Separating connected components using watershed segmentation. (a) Original image. (b) Binary version of (a). (c) Watershed lines. (d) Watershed regions. (e) Separation of the discs in the image into distinct regions.

Fig. 4.38. The NeatVision visual workspace for the operation described in Fig. 4.37.

Fig. 4.39. Grey scale segmentation using the watershed approach. (a) Original image. (b) Application of watershed. The image has clearly been over-segmented.

the packed shapes. Clearly, the number of pixel stripping operations, on


the erosion residue, would be related to the spacing between the packed
shapes.
4. Translate (shift) the shape to be packed, B, to the location (fitx, fity). This effectively places B at the coordinate of the first possible packing location (when the residue image is scanned in a raster fashion). The resultant image will be denoted by B(fitx, fity).
5. The resultant image from step 4 is subtracted from the original image
set A to produce a new value for the image set A, therefore effectively
packing B into the scene (Fig. 4.40). This can be represented algebraically
as replacing A by A - B(fitx, fity).
This procedure is applied recursively to the image set A until an attempt
to pack all the input shapes has been made or no more shapes will fit into
the remaining space. The reapplication of the transform has no effect on the shapes already packed, due to the idempotent nature of this operation (Fig. 4.41) (Whelan & Batchelor 1996).

Fig. 4.40. Geometric packing using mathematical morphology. (a) Input image A. (b) SE B. (c) Erosion residue C = A ⊖ B. (d) Step 4, B(fitx, fity). (e) Updated scene, see step 5.
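The packing loop of steps 2-5 can be sketched directly on binary arrays. The code below is illustrative and is not the program referred to in step 3: the scene and shape are int[width][height] arrays with 1 for free space or shape pixels, the SE origin is taken (for simplicity) as the top-left corner of the shape's bounding box rather than its centroid, and the class and method names are assumptions.

// Illustrative sketch of the geometric packing step on binary images.
// scene: 1 = remaining free space, 0 = occupied/outside the region.
// shape: 1 = shape pixel, 0 = background.
public class GeometricPacker {
    // Pack one shape; returns true if a fit was found.
    static boolean pack(int[][] scene, int[][] shape) {
        int w = scene.length, h = scene[0].length;
        int sw = shape.length, sh = shape[0].length;
        for (int x = 0; x + sw <= w; x++)             // raster scan of the erosion residue
            for (int y = 0; y + sh <= h; y++)
                if (fits(scene, shape, x, y)) {       // white pixel of A eroded by B
                    place(scene, shape, x, y);        // steps 4 and 5: translate and subtract
                    return true;
                }
        return false;
    }

    // Erosion test: every foreground pixel of the SE must land on free space.
    static boolean fits(int[][] scene, int[][] shape, int ox, int oy) {
        for (int i = 0; i < shape.length; i++)
            for (int j = 0; j < shape[0].length; j++)
                if (shape[i][j] == 1 && scene[ox + i][oy + j] != 1) return false;
        return true;
    }

    // Subtract the translated shape from the scene, marking it as occupied.
    static void place(int[][] scene, int[][] shape, int ox, int oy) {
        for (int i = 0; i < shape.length; i++)
            for (int j = 0; j < shape[0].length; j++)
                if (shape[i][j] == 1) scene[ox + i][oy + j] = 0;
    }
}

Repeatedly calling pack() with successive shapes until it returns false reproduces the recursive application of the transform described above.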

Fig. 4.41. Automated packing of tool shapes. (a) Tools packed in their current orientation using geometric packing. (b) Tools reorientated for better efficiency using heuristics. (c) Tools packed in an irregular scene.


4.6 Morphological System Implementation


While no single image processing operation is so important that all others can
be ignored, it is interesting to consider the implementation of the morphological operators, since it reflects the range of hardware and software techniques
that can be applied to achieve high speed processing. There are two classical
approaches to the implementation of morphological techniques on computer
architectures, parallel and sequential (serial) methods. Klein & Serra (1972)
discuss an early example of one of the many commercial computer architectures for digital image processing which implement the basic morphological
operations: erosion and dilation. Morphological operations with 3 x 3 pixel
structuring elements, are easily implemented by array architectures, such as
CLIP (Duff et al. 1973). Waltz (1988) describes examples of a near real-time implementation of binary morphological processing using large (up to
50 x 50 pixels) arbitrary structuring elements, based on commercially available image processing boards. The success of this approach, referred to as
SKIPSM (Separated-Kernel Image Processing using Finite State Machines)
was achieved by reformulating the algorithms in such a way that it permitted high-speed hardware implementation. Similar algorithmic methods allow
fast implementation of these operators in software (Vincent 1991b). A growing number of companies now manufacture industrial vision systems that
incorporate video rate morphological operations, albeit with a limited range
of structuring elements.

4.7 Conclusion
This chapter has just hinted at the power of the mathematical morphology
methodology. It offers vision designers a structured approach in which they
can design complex machine vision solutions. While mathematical morphology is predominantly used in the analysis and manipulation of shapes, it can
also be applied to the analysis of texture features (as we will see in Chap. 5).
Chaps. 5 and 6 will discuss some of the key methodologies relating to the
analysis of colour and texture. While the initial version of NeatVision (Version
1.0) only contains limited support for texture and colour image analysis, it
is our aim to present the background material necessary for designers to
implement these techniques themselves using the NeatVision development
environment (as discussed in Chap. 7).

5. Texture Analysis

In this chapter we extend our analysis to images which contain significant


texture features. Its aim is to summarise the various texture analysis operators that may be employed in machine vision applications. The discussion
aims to highlight the advantages and disadvantages of such techniques, rather
than give a detailed description of their behaviour. While the discussion will
draw on many of the ideas outlined previously, a number of new concepts are
also introduced.

5.1 Texture and Images


Texture is observed in the patterns of a wide variety of synthetic and natural
surfaces (e.g. wood, metal, paint and textiles). If an area of a textured image
has a large intensity variation then the dominant feature of that area would be
texture. If this area has little variation in intensity then the dominant feature
within the area is tone. This is known as the tone-texture concept and helps
to form many of the approaches to texture analysis. Although a precise formal
definition of texture does not exist, it may be described subjectively using
terms such as coarse, fine, smooth, granulated, rippled, regular, irregular and
linear. These features are used extensively in manual region segmentation.
There are two main classification strategies for texture: statistical and
structural analysis. The statistical approach is suited to the analysis and
classification of random or natural textures. A number of different techniques have been developed to describe and analyse such textures (Haralick &
Shapiro 1992), a few of which are introduced in this chapter. The structural
approach is more suited to textures with well defined repeating patterns.
Such patterns are generally found in a wide range of manufactured items.

5.2 Edge Density


This is a simple technique in which an edge detector or high pass filter is
applied to the textured image. The result is then thresholded and the edge
density is measured (i.e. the average number of edge pixels per unit area).



2-D or directional filters/edge detectors may be used as appropriate. This


approach is easy to implement and computationally inexpensive, although it
is dependent on the orientation and scale of the sample images (Fig. 5.1).
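A minimal sketch of the edge density measure is given below. It applies a Sobel gradient, thresholds the magnitude and reports the proportion of edge pixels; the threshold and the scaling of the returned figure are assumptions and will not necessarily reproduce the values quoted in Fig. 5.1.

// Illustrative sketch: edge density via the Sobel edge detector.
// img is an int[width][height] grey scale array; the result is the
// percentage of pixels whose gradient magnitude exceeds the threshold.
public class EdgeDensity {
    static double measure(int[][] img, int threshold) {
        int w = img.length, h = img[0].length;
        int edges = 0;
        for (int x = 1; x < w - 1; x++)
            for (int y = 1; y < h - 1; y++) {
                int gx = img[x+1][y-1] + 2*img[x+1][y] + img[x+1][y+1]
                       - img[x-1][y-1] - 2*img[x-1][y] - img[x-1][y+1];
                int gy = img[x-1][y+1] + 2*img[x][y+1] + img[x+1][y+1]
                       - img[x-1][y-1] - 2*img[x][y-1] - img[x+1][y-1];
                double mag = Math.sqrt((double) gx * gx + (double) gy * gy);
                if (mag > threshold) edges++;
            }
        return 100.0 * edges / (double) ((w - 2) * (h - 2));
    }
}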

Fig. 5.1. Edge density measurement using the Sobel edge detector. (a) Sample sand image with an edge density of 35.15. (b) Sample grass image with an edge density of 44.55. (c) Associated visual workspace.

5.3 Monte-Carlo Method


This basic approach to texture analysis involves the overlay of a number of
random binary lines on a binary version of the textured image (histogram
equalisation is applied to the image prior to thresholding). The number of
intersection points between images is then calculated to give an indication of
the image texture. This approach is dependent on the grey scale value used
to threshold the image and the number of random binary lines employed
(Fig. 5.2). The Monte-Carlo measurements for the sand and grass images
illustrated in Fig. 5.1 are 134 and 193 respectively.
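The measure can be sketched as follows: random line segments are sampled over a binary version of the image and the number of texture pixels they hit is accumulated. The image layout, the number of lines and the random seed below are assumptions, so the values obtained will differ from those quoted above.

// Illustrative sketch: Monte-Carlo texture measure.
// binary is an int[width][height] thresholded image (non-zero = texture).
import java.util.Random;

public class MonteCarloTexture {
    static int measure(int[][] binary, int lines, long seed) {
        int w = binary.length, h = binary[0].length;
        Random rnd = new Random(seed);
        int hits = 0;
        for (int n = 0; n < lines; n++) {
            // random line segment defined by two random end points
            int x0 = rnd.nextInt(w), y0 = rnd.nextInt(h);
            int x1 = rnd.nextInt(w), y1 = rnd.nextInt(h);
            int steps = Math.max(Math.abs(x1 - x0), Math.abs(y1 - y0));
            for (int s = 0; s <= steps; s++) {
                int x = x0 + (x1 - x0) * s / Math.max(1, steps);
                int y = y0 + (y1 - y0) * s / Math.max(1, steps);
                if (binary[x][y] != 0) hits++;        // intersection with a texture pixel
            }
        }
        return hits;
    }
}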

Fig. 5.2. Monte-Carlo method visual workspace. An image containing binary line patterns (MCLINES) is combined with a binary version of the input image.

5.4 Auto-Correlation Function (ACF)


Auto-correlation derives information about the basic 2-D tonal pattern that
is repeated to yield a given periodic texture. The ACF is defined as follows:

A(δx, δy) = Σ_{i,j} I(i,j) × I(i + δx, j + δy) / Σ_{i,j} I(i,j)²     (5.1)

where I(i,j) is the image matrix. The variables (i,j) are restricted to lie within a specified window outside of which the intensity is zero. Incremental shifts of the image are given by (δx, δy). As the ACF and the power
spectral density are Fourier transforms of each other, the ACF can yield information about the spatial periodicity of the texture primitive. Although
useful at times, the ACF has severe limitations. It cannot always distinguish
between textures, since many subjectively different textures have the same
ACF. As the ACF captures a relatively small amount of texture information,
this approach has not found much favour in texture analysis.
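For a single shift (δx, δy), Eqn. 5.1 can be evaluated directly, as in the illustrative sketch below (pixels shifted outside the image are treated as zero intensity, as assumed in the text; the array layout is an assumption).

// Illustrative sketch: auto-correlation function of Eqn. 5.1 for one shift.
public class AutoCorrelation {
    static double acf(int[][] img, int dx, int dy) {
        int w = img.length, h = img[0].length;
        double num = 0.0, den = 0.0;
        for (int i = 0; i < w; i++)
            for (int j = 0; j < h; j++) {
                double v = img[i][j];
                den += v * v;                               // denominator: sum of I(i,j)^2
                int ii = i + dx, jj = j + dy;
                if (ii >= 0 && ii < w && jj >= 0 && jj < h)
                    num += v * img[ii][jj];                 // numerator: I(i,j) x I(i+dx,j+dy)
            }
        return den == 0.0 ? 0.0 : num / den;
    }
}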

5.5 Fourier Spectral Analysis


As mentioned previously (Sec. 3.4.4), the Fourier spectrum is well suited to
describing the directionality and period of repeated texture patterns, since
they give rise to high energy narrow peaks in the power spectrum. Typical
Fourier descriptors of the power spectrum include: the location of the highest peak, mean, variance and the difference in frequency between the mean
and the highest value of the spectrum. Using a number of additional Fourier
features outlined by Liu & Jernigan (1990) it is possible to get high rates of
classification accuracy for natural textures. This approach to texture analysis is often used in aerial/satellite and medical image analysis. The main
disadvantage is that the procedures are not invariant, even under monotonic
transforms of its intensity.


5.6 Histogram Features


A useful approach to texture analysis is based on the intensity histogram of all
or part of an image. Common histogram features include: moments, entropy, dispersion, mean (an estimate of the average intensity level), variance (this second moment is a measure of the dispersion of the region intensity), mean square value or average energy, skewness (the third moment which gives an indication of the histogram's symmetry) and kurtosis (cluster prominence or
"peakness"). For example, a narrow histogram indicates a low contrast region,
while two peaks with a well-defined valley between them indicates a region
that can be readily separated by simple thresholding.
Texture analysis, based solely on the grey scale histogram, suffers from
the limitation that it provides no information about the relative position of
pixels to each other. Consider two binary images, where each image has 50%
black and 50% white pixels. One of the images might be a chessboard pattern,
while the second one may consist of a salt and pepper noise pattern. These
images generate exactly the same grey level histogram. Therefore, we cannot
distinguish them using first order (histogram) statistics alone.
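The first-order features listed above follow directly from the normalised histogram. The sketch below assumes 8-bit grey scale images held as int[width][height] arrays and returns the mean, variance, skewness and kurtosis; the exact normalisation conventions used are assumptions.

// Illustrative sketch: first-order (histogram) texture features.
// Returns { mean, variance, skewness, kurtosis } for an 8-bit image.
public class HistogramFeatures {
    static double[] features(int[][] img) {
        int w = img.length, h = img[0].length;
        double[] p = new double[256];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                p[img[x][y]]++;                        // assumes grey levels 0..255
        double n = (double) w * h;
        for (int i = 0; i < 256; i++) p[i] /= n;       // normalised histogram

        double mean = 0.0;
        for (int i = 0; i < 256; i++) mean += i * p[i];
        double var = 0.0, skew = 0.0, kurt = 0.0;
        for (int i = 0; i < 256; i++) {
            double d = i - mean;
            var  += d * d * p[i];
            skew += d * d * d * p[i];
            kurt += d * d * d * d * p[i];
        }
        double sd = Math.sqrt(var);
        if (sd > 0.0) { skew /= sd * sd * sd; kurt /= var * var; }
        return new double[] { mean, var, skew, kurt };
    }
}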

5.7 Grey Level Run Length Method (GLRLM)


This approach is based on computing the grey level runs of various length.
A grey level run is a set of linearly connected pixels having the same grey
level value. The length of the run is the number of image points within the
run. Coarse textures tend to have frequent long runs, while in finer textures
shorter runs will predominate. This is defined as r(i, j | θ), which estimates the number of times an image contains a run of length j, for a grey level i in the angle θ direction, where θ usually takes the values (0, π/4, π/2, 3π/4). A number of GLRLM features have been established and are detailed in Haralick (1979). The main problems with this approach are due to the fact that it varies under monotonic grey scale transformations and it is extremely sensitive to
noise in the image. Also, it does not measure the important second-order
probabilities. Due to its poor performance it is generally not used.

5.8 Grey Level Difference Method (GLDM)


First-order statistics of local properties, based on absolute differences between
pairs of grey levels, can also be used for texture analysis. For any given
displacement d = (δx, δy), where δx, δy are integers: f'(x,y) = |f(x,y) - f(x + δx, y + δy)|. Let P' be the probability density function of f'. If the image has m grey levels, this has the form of an m-dimensional vector whose ith component is the probability that f'(x,y) will have value i. P'


can be computed by counting the number of times each value of f'(x,y)


occurs. For coarse textures (with relatively small values of d) the values of P' will be gathered near i = 0, whereas for fine textures they tend to be more spread out. Features derived
from this approach are simple to compute, produce good overall results in
discriminating textures and capture useful texture information. Pietikainen
& Ojala (1996) have shown that the GLDM achieved good overall results
in discriminating textures for industrial applications. A number of GLDM
features are outlined below:
Contrast: CON = Σ_i i² × P'(i).

Angular second moment: ASM = Σ_i P'(i)².

Entropy: ENT = -Σ_i P'(i) × log(P'(i)).

Mean: MEAN = (1/m) Σ_i i × P'(i).

Inverse difference moment: IDM = Σ_i P'(i)/(i² + 1).
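A sketch of the GLDM computation is shown below: the difference histogram P' is accumulated for a chosen displacement (δx, δy) and the contrast, angular second moment and entropy features listed above are derived from it. Eight-bit images and the array layout are assumptions.

// Illustrative sketch: GLDM features { contrast, ASM, entropy } for an
// 8-bit image and a displacement (dx, dy).
public class GLDM {
    static double[] features(int[][] img, int dx, int dy) {
        int w = img.length, h = img[0].length;
        double[] p = new double[256];
        int count = 0;
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++) {
                int xx = x + dx, yy = y + dy;
                if (xx < 0 || yy < 0 || xx >= w || yy >= h) continue;
                p[Math.abs(img[x][y] - img[xx][yy])]++;   // histogram of |f(x,y) - f(x+dx,y+dy)|
                count++;
            }
        for (int i = 0; i < 256; i++) p[i] /= count;      // normalise to a PDF

        double con = 0.0, asm = 0.0, ent = 0.0;
        for (int i = 0; i < 256; i++) {
            con += (double) i * i * p[i];
            asm += p[i] * p[i];
            if (p[i] > 0.0) ent -= p[i] * Math.log(p[i]);
        }
        return new double[] { con, asm, ent };
    }
}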

5.9 Co-occurrence Analysis


The co-occurrence matrix technique is based on the study of second-order
grey level spatial dependency statistics. This involves the study of the grey
level spatial interdependence of pixels and their spatial distribution in a local
area. Second order statistics describe the way that grey levels tend to occur
together in pairs and therefore provide a description of the type of texture
present. While this technique is quite powerful, it does not describe the shape
of the primitive patterns making up a given texture.
The co-occurrence matrix is based on the estimation of the second order
joint conditional probability density function, f(p, q, d, a), for angular displacements, a, equal to 0, 45, 90 and 135 degrees. Let f(p, q, d, a) be the
probability of going from one pixel with grey level p to another with grey
level q, given that the distance between them is d and the direction of travel
between them is given by the angle a. A 2-D histogram of the spatial dependency of the various grey level picture elements within a textured image
is created (for Ng grey levels, the size of the co-occurrence matrix will be Ng x Ng). The following code segment illustrates the generation of the co-occurrence matrix for f(p, q, 1, 0):

int width = input.getWidth();
int height = input.getHeight();
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
    output.setxy(x,y,BLACK);

// Account for edge effects
for(int y=1; y<(height-1); y++)
  for(int x=1; x<(width-1); x++)
  {
    int coc_a = input.getxy(x-1,y);   // pixel d
    int coc_b = input.getxy(x,y);     // pixel e
    int coc_c = input.getxy(x+1,y);   // pixel f
    output.setxy(coc_b,coc_c,output.getxy(coc_b,coc_c)+1);
    output.setxy(coc_b,coc_a,output.getxy(coc_b,coc_a)+1);
  }

For example, assuming the intensity distribution shown in the sub-image


given in Fig. 5.3, we can generate the co-occurrence matrix for d = 1 and
a = 0. See Fig. 5.4 for an example of a textured image and its associated
co-occurrence matrix. Note that a co-occurrence distribution that changes
rapidly with distance d indicates a fine texture.

2 3 3 3              6 2 1 0
1 1 0 0              2 4 0 0
1 1 0 0              1 0 4 2
0 0 2 2              0 0 2 6
2 2 3 3

   (a)                  (b)

Fig. 5.3. Co-occurrence matrix applied to a sub-image. (a) Sub-image with 4 grey levels. (b) Co-occurrence matrix f(p, q, 1, 0) for (a).

Since the co-occurrence matrix also depends on the image intensity range,
it is common practice to normalise the textured image's grey scale prior to
generating the co-occurrence matrix. This ensures that the first-order statistics have standard values and avoids confusing the effects of first- and second-order statistics of the image. A number of texture measures (also referred to
as texture attributes) have been developed to describe the co-occurrence matrix numerically and allow meaningful comparisons between various textures
(Haralick & Shapiro 1992). Although these attributes are computationally
expensive and rotation variant, they are simple to implement and have been
successfully used in a wide range of applications. They tend to work well for
stochastic textures and for small window sizes. Rotation-invariance can be
achieved by accumulation of the matrices, but this increases the computational cost of the approach. A selection of the more commonly used texture
attributes are now outlined.

Fig. 5.4. Co-occurrence matrix. (a) Input image. (b) Co-occurrence matrix. (c) Associated visual workspace.

5.9.1 Energy, or Angular Second Moment

This is a measure of the homogeneity of a texture. It is defined as Energy = Σ_p Σ_q f(p, q, d, a)². In a uniform image, the co-occurrence matrix will have
few entries of large magnitude, therefore the energy attribute will be large.
The following attribute code segments work directly on the co-occurrence
matrix.
double energy=0.0;
int width = input.getWidth();
int height = input.getHeight();
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    energy+=Math.pow(input.getxy(x,y),2);

5.9.2 Entropy

This is a measure of the complexity of a texture and is defined as Entropy = -Σ_p Σ_q f(p, q, d, a)·log(f(p, q, d, a)). The entropy value has been found to


correlate with a person's perception of an image's complexity (i.e. a high entropy value implies a high degree of image complexity).
double entropy = 0;
double current_value = 0;
int width = input.getWidth();
int height = input.getHeight();
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    current_value = (double)input.getxy(x,y);
    if (current_value != 0.0)
      entropy-=current_value*Math.log(current_value);
  }

5.9.3 Inertia
This is a measurement of the moment of inertia of the co-occurrence matrix
about its main diagonal. This is also referred to as the contrast of the textured
image. This attribute gives an indication of the amount of local variation of
intensity present in an image. Inertia = Σ_p Σ_q (p - q)²·f(p, q, d, a).
double contrast=0.0;
int width = input.getWidth();
int height = input.getHeight();
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
    contrast+=Math.pow(y-x,2)*input.getxy(x,y);

5.9.4 Local Homogeneity (LH)


This measure indicates the degree to which similar grey levels tend to be neighbours. LH = Σ_p Σ_q f(p, q, d, a) / (1 + |p - q|).
double homogeneity=0.0;
int current_pixel_value;
int width = input.getWidth();
int height = input.getHeight();
for(int y=0;y<height;y++)
  for(int x=0;x<width;x++)
  {
    current_pixel_value = input.getxy(x,y);
    homogeneity+=(double)current_pixel_value/(1+Math.abs(y-x));
  }

The co-occurrence approach has been successfully used for a wide class
of textured images in application areas such as aerial, medical, microscopic
imaging. A detailed survey of the co-occurrence applications is presented in


Table 5.1. Co-occurrence attributes for the sand and grass textures illustrated in Fig. 5.1. Images were equalized prior to the calculation of the attributes.

                          Sand                            Grass
                  f(p,q,1,0)   f(p,q,1,90)       f(p,q,1,0)   f(p,q,1,90)
Energy (10)          31.31         30.9              3.19          3.53
Entropy (10^4)      -18.81        -18.96            -9.19         -9.70
Inertia (10^8)        5.46          6.5              2.96          2.42
LH (10^3)             3.51          3.04             2.02          2.53

Haralick & Shapiro (1992). The power ofthis technique is that it captures the
second order grey levels statistics which are used by the human perception in
discrimination of textures. Unfortunately this approach does not lend itself
to describing the shape aspects of the texture.

5.10 Morphological Texture Analysis


The ideas behind morphological image processing and analysis have previously been detailed in Chap. 4. Building upon these techniques, a number of
approaches to texture analysis have been developed.
5.10.1 Morphological Ratio

This approach examines the ratio between the opening and closing of an image. The choice of structuring element and image threshold have a significant
bearing on the performance of this technique. The ratio measurements for the
sand and grass images illustrated in Fig. 5.1 are 1.09 and 17.78 respectively
(Fig. 5.5).

Fig. 5.5. Texture analysis using a morphological ratio.


5.10.2 Granularity

The number of white pixels of the morphological opening operation as a


function of the size parameter of the structuring element H can determine the
size distribution of grains (or texture elements) in an image (Dougherty 1992).
The granularity of the image F is defined as: G(d) = 1 - (Δ[F ∘ Hd]/ΔF),
where Hd is a disk structuring element of diameter d or a line structuring
element of length d and LlF is the number of elements in F. This measures
the proportion of pixels participating in grains smaller than d (Fig. 5.6).
A recent summary of morphological texture analysis can be found in Soille
(1998) .

Fig. 5.6. Granularity. Application of a line structuring element of length d to sub-images A, B and C.

5.11 Fractal Analysis


The fractal approach to texture analysis is based on the observation that natural objects such as clouds, mountains, coastlines or water surfaces can be
accurately described and generated using fractal objects (Mandelbrot 1983).
This has led to the use of fractal geometry in the analysis of textures, especially natural textures which exhibit a high degree of randomness. A key
attribute in the fractal analysis of textures is the fractal dimension (a non-integer number) and consequently many algorithms have been developed to compute its value. Mandelbrot (1983) observed that the length L_ε of a curve C would depend on the size ε of the measuring tool used. If ε tends to zero for a curve with finite length, the value L_ε represents the actual length of the curve C. This does not happen with fractal curves. In this case the smaller ε, the finer the structure it passes over and L_ε tends to infinity. The fractal dimension D of the curve C is given by Eqn. 5.2.

D(C) = lim_{ε→0} [1 - log(L_ε)/log(ε)]     (5.2)


Fig. 5.7. Sample fractal.

For a fractal curve, D is a non-integer number larger than the immediate


geometrical dimension of the curve. The fractal dimension can be determined
using a range of geometrical and stochastic algorithms (Soille & Rivest 1996).
One approach to the calculation of the fractal dimension is the box-counting
method (Fig. 5.8). An arbitrary grid of boxes of size s is first defined and
then one counts the number N(s) of boxes necessary to cover the curve. The grid size is reduced by 50% and the process repeated. While this can continue indefinitely, we generally only implement a small number of iterations. The fractal dimension is given by Eqn. 5.3. A log N(s) versus log s characteristic is then plotted and the best-fit line between the points is estimated.
If the slope of this line is m then Eqn. 5.3 becomes D = 1 - m.
D = 1 - lim_{s→0} log(N(s))/log(s)     (5.3)
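The box-counting procedure can be sketched as below for a binary curve image (non-zero pixels belong to the curve): N(s) is counted for a set of box sizes supplied by the caller (e.g. 32, 16, 8 and 4 pixels), the slope m of log N(s) against log s is estimated by least squares and D = 1 - m is returned as in Eqn. 5.3. The image layout and the choice of box sizes are assumptions.

// Illustrative sketch: box-counting estimate of the fractal dimension.
public class BoxCounting {
    static double dimension(int[][] curve, int[] boxSizes) {
        int n = boxSizes.length;
        double[] logS = new double[n], logN = new double[n];
        for (int k = 0; k < n; k++) {
            logS[k] = Math.log(boxSizes[k]);
            logN[k] = Math.log(countBoxes(curve, boxSizes[k]));   // curve assumed non-empty
        }
        // least-squares slope m of log N(s) versus log s
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int k = 0; k < n; k++) {
            sx += logS[k]; sy += logN[k];
            sxx += logS[k] * logS[k]; sxy += logS[k] * logN[k];
        }
        double m = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        return 1.0 - m;                                           // D = 1 - m (Eqn. 5.3)
    }

    // Count the boxes of size s that contain at least one curve pixel.
    static int countBoxes(int[][] img, int s) {
        int w = img.length, h = img[0].length, boxes = 0;
        for (int bx = 0; bx < w; bx += s)
            for (int by = 0; by < h; by += s) {
                search:
                for (int x = bx; x < Math.min(bx + s, w); x++)
                    for (int y = by; y < Math.min(by + s, h); y++)
                        if (img[x][y] != 0) { boxes++; break search; }
            }
        return boxes;
    }
}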

A number of applications of fractal models in texture classification and


segmentation have been developed (Brammer 1989, Pentland 1984) although
fractal geometry is predominantly used for image compression and coding
tasks. The fractal dimension is relatively insensitive to image scaling and
shows a strong correlation with the human judgement of surface roughness
(Pentland 1984). While it can characterise textures with a relatively small
set of measures, the features require significant computation.

5.12 Textural Energy


This involves the convolution of the original image with a set of kernels (Laws
1980). The textural properties are then obtained by examining the statistics
of the convoluted images. The resultant image is processed with a non-linear
filter (usually the squared or absolute values of the convoluted images). Consider an input image I and kernels g_1, g_2, g_3 ... g_n. The energy image
corresponding to the nth kernel is:


Fig. 5.8. Fractal Dimension: Box-counting method for a 1-D curve. (a) s = 1/4, N(s) = 7. (b) s = 1/8, N(s) = 15. (c) s = 1/16, N(s) = 32. (d) s = 1/32, N(s) = 68.

S_n(r, c) = (1/49) Σ_{i=-3..3} Σ_{j=-3..3} |J_n(r + i, c + j)|     (5.4)

where J_n = I * g_n is the convolved image. Therefore for each pixel (r, c) we obtain a feature vector S_1(r, c) ... S_n(r, c). The dimensions of the kernels are usually small (3 x 3, 5 x 5 and 7 x 7) and they give a strong response for vertical or horizontal edges or lines. In their comparative study Ojala et al.
vertical or horizontal edges or lines. In their comparative study Ojala et al.
(1996) highlight the poor performance of this approach. They suggest that it
could obtain better results if the sample size was increased. They also argue
that the poor performance of this approach is probably due to there being
more discriminative power in the distribution variance information than in
the full distribution itself.
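A sketch of the textural energy computation for a single kernel is given below: the image is filtered with the kernel and the absolute responses are averaged over the 7 x 7 window of Eqn. 5.4. The kernel itself (for example one of the Laws masks) is supplied by the caller, and the array layout is an assumption.

// Illustrative sketch: textural energy for one kernel (Eqn. 5.4).
public class TexturalEnergy {
    static double[][] energy(int[][] img, int[][] kernel) {
        int w = img.length, h = img[0].length;
        int kw = kernel.length, kh = kernel[0].length;
        double[][] conv = new double[w][h];
        for (int x = 0; x < w; x++)                      // filter the image with the kernel
            for (int y = 0; y < h; y++) {
                double sum = 0.0;
                for (int i = 0; i < kw; i++)
                    for (int j = 0; j < kh; j++) {
                        int xx = x + i - kw / 2, yy = y + j - kh / 2;
                        if (xx >= 0 && yy >= 0 && xx < w && yy < h)
                            sum += kernel[i][j] * img[xx][yy];
                    }
                conv[x][y] = sum;                        // J_n(x, y)
            }
        double[][] s = new double[w][h];                 // 7 x 7 average of |J_n|
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++) {
                double acc = 0.0;
                for (int i = -3; i <= 3; i++)
                    for (int j = -3; j <= 3; j++) {
                        int xx = x + i, yy = y + j;
                        if (xx >= 0 && yy >= 0 && xx < w && yy < h)
                            acc += Math.abs(conv[xx][yy]);
                    }
                s[x][y] = acc / 49.0;
            }
        return s;
    }
}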

5.13 Texture Spectrum Method


The texture spectrum method (Wang & He 1990a, Wang & He 1990b) is based on the concept of texture units (TU). These are a set of eight elements with possible values (0, 1, 2) obtained from a 3 x 3 pixel neighbourhood. This neighbourhood is represented by V = V_0, V_1, V_2, ..., V_8, where V_0 is the value of the central pixel. The TU is obtained by applying a threshold operation as follows: TU = E_1, E_2, ..., E_8, where E_i = 0 if V_i < V_0, E_i = 1 if V_i = V_0 and E_i = 2 if V_i > V_0, for i = 1, 2, 3 ... 8.
Each TU pattern is mapped uniquely into a scalar number according to the formula N_TU = Σ_{i=1..8} E_i × 3^(i-1), where N_TU represents the number of


texture units (note that the texture units are labelled and ordered in 3^8
different ways). This method can be useful in discriminating different types
of textures and has been shown to produce similar results to that of the
co-occurrence matrix approach (Wang & He 1990a, Wang & He 1990b).

5.14 Local Binary Patterns (LBP)


Ojala et al. (1996) proved that a simple two-level version of the Texture Spectrum Method, referred to as Local Binary Patterns, performs equally well. This deals with binary patterns, therefore there are only 2^8 possible texture units, hence speeding up the analysis. In this case the values of E_i are 0 if V_i < V_0 and 1 if V_i ≥ V_0, and N_TU = Σ_{i=1..8} E_i × 2^(i-1). This approach seems to perform well in various texture classification and segmentation problems, especially with Brodatz (1966) textures. The features are easy to compute and they are invariant to any monotonic grey scale transformation (Fig. 5.9).
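The basic LBP operator can be sketched in a few lines: each pixel is thresholded against its eight neighbours, the resulting bits are weighted by powers of two and a 256-bin histogram of the codes is accumulated. The particular ordering of the neighbour weights below is one common convention; the array layout is an assumption.

// Illustrative sketch: basic LBP operator and its 256-bin histogram.
public class LocalBinaryPatterns {
    // Eight neighbour offsets; bit k of the code carries weight 2^k.
    static final int[] DX = {-1, 0, 1, 1, 1, 0, -1, -1};
    static final int[] DY = {-1, -1, -1, 0, 1, 1, 1, 0};

    static int[] histogram(int[][] img) {
        int w = img.length, h = img[0].length;
        int[] hist = new int[256];
        for (int x = 1; x < w - 1; x++)
            for (int y = 1; y < h - 1; y++) {
                int centre = img[x][y], code = 0;
                for (int k = 0; k < 8; k++)
                    if (img[x + DX[k]][y + DY[k]] >= centre)
                        code |= 1 << k;                  // E_i = 1 if V_i >= V_0
                hist[code]++;
            }
        return hist;
    }
}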
The TU corresponds to one of the 256 possible local patterns such as
edges, spots ... etc. By getting the histogram of the LBP over a region,
a measure of the density of each pattern or feature in the region can be
found. In general, the LBP operator is applied to the source texture and a
comparison of the global histogram based on a similarity score is used. A
nonparametric statistical histogram comparison test (G-statistic) has been
used to generate this similarity score (Ojala & Pietikainen 1999). The LBP
is often combined with an additional contrast measurement (LBP/C) which
is based on the difference of the average grey level of those pixels which have
the value 1 and those which have the value 0, for a given neighbourhood.
LBP and LBP/C perform well for small windows. In fact, the LBP approach outperforms many other filtering methods, as outlined by Randen & Husøy
(1999) in their extensive comparison of such approaches to texture classification. The disadvantage of this approach as described is that it is rotation
variant, although a newer rotation invariant version has been proposed in
Pietikainen & Ojala (2000).
A solution to unsupervised texture segmentation based on the measurement of texture information using LBP and LBP/C has recently been developed (Ojala & Pietikainen 1999). A region based algorithm is used to
generate coarse image segmentation and a pixelwise classification scheme is
then employed to improve the localization of the region boundaries.

5.15 Random Field Models


A Markov Random Field (MRF) is a joint distribution imposed on a set of
random variables representing objects of interest, such as pixel intensities,
that assumes statistical dependence in a spatially meaningful way. To use


Fig. 5.9. Local binary patterns (LBP). (a) Original 3 x 3 neighbourhood. (b) 3 x 3 neighbourhood thresholded by the value of the centre pixel. (c) LBP weights. (d) The values in the thresholded neighbourhood (b) are multiplied by the LBP weights, (c), given to the corresponding pixels. The elements are then summed to form the value of this textured unit, LBP = 2 + 4 + 32 + 64 + 128 = 230. The contrast measure LBP/C = (5 + 4 + 7 + 4 + 3)/5 - (2 + 1 + 1)/3 = 3.27.

Markov Random Fields to model texture (Dubes & Jain 1989) the texture
is considered as a stochastic and stationary process. This model implies that
the probability of a pixel taking a given grey value is only dependent on
the grey levels of the pixels from its neighbourhood. The model can be fully
described by a small and compact number of parameters. The application of
MRF in both classification and synthesis of textures are numerous and the
overall performances achieved can be impressive. As well as being theoretically elegant, this model performs very well in classification and seems to be
a far superior technique when compared to the co-occurrence approach. Its
features are invariant to image rotation by k2π and are capable of recognising
scaled versions of an image as belonging to one class. Limitations of MRF include its sensitivity to grey level distortions and the fact it only accounts for
local interactions between the pixels. Additional disadvantages of this technique include its computational expense and the fact that it cannot model
all textures (e.g. due to the nature of this model, it cannot model directional
textures).

5.16 Spatial/Frequency Methods


The properties of Gabor Filters (Reed & Wechsler 1990) make them well
suited for texture analysis. The Gabor functions are among the earliest and
one of the most popular texture models. The approach is biologically motivated and minimizes the joint uncertainty in space and frequency. It has
tuneable parameters of orientation and scale and can capture underlying
texture information. It supports both frequency analysis (large masks) and
spatial pattern approaches (small masks). A multi-channel filtering approach
was used by Bovik et al. (1990) in texture segmentation tasks. They obtained
a multi-channel image by convolving the original image with the bank of Gabor filters. After the amplitude of each channel is smoothed, the channel with
the largest amplitude is used to label the texture. A key disadvantage is that
the outputs of the filter bank are not mutually orthogonal which may result


in a significant correlation between texture features. As well as being computationally expensive, the selection of features is not always a straightforward
task.

5.17 Autoregressive Model


The Autoregressive Model (McCormick & Jayaramamurthy 1974, Haralick &
Shapiro 1992) exploits the linear relationship between the pixel and its neighbours. The model can capture everything about a texture, is computationally
efficient and can be used for both the texture analysis and synthesis. A key
weakness with this approach is that the textures capable of being characterised by it tend to consist of micro-textures.
While details on the implementation of the Autoregressive, MRF models and
Gabor filtering are beyond the scope of this book, they were introduced to
illustrate the wide range of approaches that can be applied to the description
and analysis of textured images. The summaries are included with a view to
guiding the reader towards the appropriate research.

5.18 Structural Approaches to Texture Analysis


Certain textures are deterministic in that they consist of identical texels
(basic texture element), which are placed in a repeating pattern according to
some well-defined but unknown placement rules. To begin the analysis, a texel
is isolated by identifying a group of pixels having certain invariant properties,
which repeat in a given image. A texel may be defined by its: grey level,
shape, or homogeneity of some local property, such as size or orientation.
Texel spatial relationships may be expressed in terms of adjacency, closest
distance and periodicities. This approach has a similarity to language with
both image elements and grammar. Arising from this a syntactic model can
be generated.
A texture is labelled strong if it is defined by deterministic placement
rules, while a weak texture is one in which the texels are placed at random.
Measures for placement rules include: edge density, run lengths of maximally
connected pixels and the number of pixels per unit area showing grey levels
that are locally maxima or minima relative to their neighbours.

5.19 Conclusion
There is a growing belief within the machine vision community, as discussed
in Pietikainen & Kauppinen (1999), that a global solution to the texture analysis problem is no longer a suitable research goal. Many of the successes in


this field have been and are likely to continue to be application driven. While
model based techniques were dominant in the 1980's, they are not considered
practical for many machine vision applications. Currently filter based approaches, such as the LBP (Sec. 5.14), are viewed as better options as they
are easily implemented and benefit from a psycho-physical understanding of
texture. Where possible, it is recommended that a supervised approach be
used as it greatly simplifies the texture analysis task. Unfortunately, while
many of the newer approaches work well under laboratory test conditions
they fail to impress when applied to practical applications. Therefore many
researchers in texture analysis take the view that it should only be applied
as a last resort in the analysis of images, as it has to contend with significant
variables in the scene, such as luminance.

6. Colour Image Analysis

Historically, machine vision has been applied to monochromatic objects, or


those in which colour differences are visible as changes in intensity when they
are viewed using a monochrome camera. Despite the obvious importance of
colour in manufactured goods, machine vision systems capable of processing
coloured images are less common. Three major reasons have been put forward
for this (Batchelor & Whelan 1997).
Many industrial artifacts are monochromatic, or nearly so.
Some colour inspection problems are inherently very difficult, requiring
very precise measurement of subtle colour differences. Commercial colour
image sensors (e.g. CCD cameras) are simply too imprecise to allow really
accurate measurements to be made.
Colour image processing equipment is necessarily more complex and more
expensive than the equivalent monochrome devices. Colour cameras are
more complicated, lighting has to be more accurately controlled and a
colour image requires three times the storage capacity when compared to
a monochrome image of the same spatial resolution.
The aim of this chapter is to discuss a range of issues relating to colour
imaging. In particular, we will examine a number of ways to represent colour.
We will also examine one approach to coarse colour imaging suitable for real
time implementation, namely the programmable colour filter.

6.1 Colour Cameras


Human colour perception employs three types of cones in the retina, each
tuned to a different part of the visual spectrum. Colour television cameras
also contain three optical filters, which have been specially designed to approximate the spectral response of the cones in the human eye. In this way,
the colour response of the camera is made to correspond approximately to
(Footnote: The colour images that appear in this section are reproduced in grey scale. Please refer to the book web site http://www.eeng.dcu.ie/~javamv/ for full colour versions of these images.)



that of the eye. It should be realised that almost all colour cameras used in
machine vision systems have been designed to have this characteristic.
The traditional colour camera splits the incoming images into three optical
paths (or channels) by means of mirrors and beam-splitters. These are then
passed through Red, Green and Blue optical filters before being focused on
to three separate solid-state sensors (generally CCD sensors, although other
sensor types are possible). Alternatively good use can be made of the precise
geometry of the photo-sensor array, by placing minute patches of optical dye in front of the sensor to produce a cheaper single-chip colour CCD camera (Fig. 6.1). In many three-chip colour cameras, the output consists of three
parallel lines, called R, G and B, each of which carries a standard monochrome
video signal. Another standard, called composite video, exists in which the
colour and intensity information is coded by modulating a high-frequency
carrier signal.

Fig. 6.1. Colour cameras. (a) Three-sensor colour camera. (b) Colour mask structure for a single chip solid-state colour camera. Each shaded rectangle represents a photo detector with a coloured mask placed in front of it (e.g. R implies that it passes red).

The transmitted signals derived from a colour camera are formed by encoding the RGB picture information into two parts: luminance, which carries


the brightness information and chrominance, which carries the colour information. The latter conveys both hue and saturation information (Sec. 6.3).
While the method of coding colour information is different depending on the
broadcast system (see below), it is important to note that all of the colour
information is carried in the chrominance signal.
NTSC: National Television System Committee. This was the first colour TV
broadcast system standard and was implemented in the United States in
1953. NTSC runs on 525 lines/frame and is used in the USA and parts
of Japan.
PAL: Phase Alternating Line. This was introduced in the early 1960's and
is commonly used throughout Europe (with the exception of France).
It utilises a wider bandwidth channel than NTSC, allowing for a better
picture quality. PAL runs on 625 lines/frame.
SECAM: Sequential Couleur Avec Memoire (Sequential Colour with Memory). This standard was also introduced in the early 1960's and is implemented in France. SECAM uses the same bandwidth as PAL, but unlike
PAL it transmits the colour information sequentially. SECAM runs on
625 lines/frame.
The key to many machine vision based colour imaging tasks is the choice
of the colour representation (or colour space) in which to conduct the image
analysis. The concept of a colour space refers to a Cartesian space in which
the visual sensation of colour can be uniquely defined by a set of numbers
representing chromatic features. A wide range of different colour spaces exist,
a number of which are now discussed. Throughout this discussion we will
highlight the advantages / disadvantages and potential application areas of
the various approaches to colour imaging.

6.2 Red-Green-Blue (RGB) Colour Representation


Due to the fact that most colour cameras produce separate RGB video signal
outputs, the RGB colour space is commonly used as the starting point in
machine vision colour imaging applications. In the following discussion R, G
and B represent the Red, Green and Blue outputs from a colour video camera
(Fig. 6.2). While each of these colour planes can be processed individually
(and recombined if necessary) the RGB approach to colour image analysis
will not always be appropriate as it suffers from some major drawbacks:

It is device dependent and excludes some visible colours.


It is difficult to find thresholds and shapes in the RGB cube.
RGB space is perceptually non-uniform and consequently the distance metric will not represent the real differences between colours.
Uniform quantization of this space can lead to unsatisfactory results.


(a)

(b)

(c)

(d)

Fig. 6.2. Sample pizza image and its associated RGB components. The colour
images are displayed in grey scale. (a) Original image. (b) Red component (i.e. red
mapped to white). (c) Green component. (d) Blue component.

The RGB space is frequently converted into another colour space (usually
by means of a non-linear transformation) for an easier and more appropriate
colour processing. A number of such transforms will be discussed throughout this chapter. The following program segment illustrates the process of
converting a colour image to its RGB representation:
// Split the input colour image into its R, G and B planes.
Image input = args.getImage(0);
DataBlock rets = new DataBlock();
rets.add(new ImageContainer(input,ImageContainer.RED));
rets.add(new ImageContainer(input,ImageContainer.GREEN));
rets.add(new ImageContainer(input,ImageContainer.BLUE));

The process of creating a colour image from a RGB representation is


illustrated below:
int r,g,b;
int width = red.getWidth();
int height = red.getHeight();
int RGBImageArray[] = new int[width*height];
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    r = red.getxy(x,y);
    g = green.getxy(x,y);
    b = blue.getxy(x,y);
    // Pack the three 8-bit planes into a single ARGB pixel.
    RGBImageArray[x+(y*width)] = 0xFF000000 | (r<<16) | (g<<8) | b;
  }
Image output = Toolkit.getDefaultToolkit().createImage(new
    MemoryImageSource(width,height,RGBImageArray,0,width));

6.2.1 Maxwell's Colour Triangle

Consider Fig. 6.3(a): the RGB space, defined by 0 ≤ R, G, B ≤ W (where W is a constant for all three signal channels), shows the allowed range of variation of the point (R, G, B). The colour triangle, also called the Maxwell triangle, is defined as the intersection of that plane which passes through the points (W, 0, 0), (0, W, 0) and (0, 0, W), within the colour cube. The orientation of
that line joining the point (R, G, B) to the origin can be measured by two
angles. An alternative and much more convenient method is to define the
orientation of this line by specifying where the vector (R, G, B) intersects
the colour triangle (the (R, G, B) vector is extended if necessary). Then, by
specifying two parameters (i.e. defining the position of a point in the colour
triangle) the orientation of the (R, G, B) vector can be fixed. It has been
found experimentally that all points lying along a given straight line are
associated with the same sensation of colour in a human, except when very
close to the origin (i.e. very dark scenes) where there is a loss of perception
of colour. Hence, all (R, G, B) vectors which project to the same point on the
colour triangle are associated with the same colour name, as it is assigned by
a given person (at a given time, under defined lighting conditions). Moreover,
points in the colour triangle that are very close together are very likely to be
associated with the same colour label.
The colour triangle allows the use of a graphical representation of the
distribution of colours, showing them in terms of the spatial distribution of
blobs in an image. This is a particularly valuable aid to the understanding
the nature and inter-relationship between colours. It also permits the use
of the binary and grey scale image processing techniques outlined earlier.
This allows us to perform such operations as generalising colours, merging
colours, performing fine discrimination between similar colours, simplifying
the boundaries defining the recognition limits of specific colours, etc. Since
the colours perceived by a human can be related to the orientation of the
(R, G, B) colour vector, a narrow cone, with its apex at the origin, can be
drawn to define the limits of a given colour (e.g. "yellow"). The intersection
of the cone with the colour triangle generates a blob-like figure.
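The projection onto the colour triangle can be illustrated with a short sketch. The following stand-alone example (it does not use the NeatVision classes of the listings in this chapter) normalises an (R, G, B) triple against the total intensity, so that all vectors lying along the same line through the origin map to the same point in the triangle:

// Minimal sketch: project an RGB triple onto the Maxwell triangle by
// normalising against the total intensity. The two independent values
// (r, g) fix the position within the triangle.
public final class MaxwellTriangle {
    /** Returns {r, g, b} with r + g + b = 1, or the white point for black pixels. */
    public static double[] project(int R, int G, int B) {
        double sum = R + G + B;
        if (sum == 0) {
            // Very dark pixels carry no reliable colour information.
            return new double[] {1.0 / 3, 1.0 / 3, 1.0 / 3};
        }
        return new double[] {R / sum, G / sum, B / sum};
    }

    public static void main(String[] args) {
        double[] p = project(200, 50, 30);   // a reddish pixel
        System.out.printf("r=%.3f g=%.3f b=%.3f%n", p[0], p[1], p[2]);
    }
}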


[Fig. 6.3 diagrams (a) and (b): the RGB colour cube (with Black, Red, Green, Blue and Magenta corners marked) and the embedded colour triangle.]
Fig. 6.3. RGB space. (a) RGB space and the colour triangle. Points that are
close on the colour triangle are likely to have the same colour label. (b) Hue and
saturation plotted in the colour triangle. The hue is the angular position of a point
relative to the centre of the colour triangle. The degree of saturation is measured
by the distance from the centre. Intensity is not represented.

Trichromaticity is simply a useful working idea, but it does not guarantee


that perfect colour reproduction or recognition is possible using just three
primary colours. According to Hutson (1971): "No three primaries exist for
which all of the spectrum and all of the non-spectral colours can be produced.
As a result, the colour triangle cannot distinguish between saturated and non-saturated cyans, purples and magentas. The situation is not as catastrophic as the chromaticity diagram appears to indicate. The colours of everyday scenes are generally rather unsaturated and have chromaticities lying near the centre of the chromaticity diagram".
Some colours that are distinguishable by the eye are not easily differentiated by a colour camera. For example, "gold" and "yellow" are mapped to
the same region of the colour triangle. Vivid purple is mapped to magenta,


while vivid cyan is mapped to a paler (i.e. less saturated) tone. In relative
terms, these are minor difficulties and the colour triangle remains one of the
most useful concepts for colour vision. Fully saturated mixtures of two primary colours are found on the outer edges of the triangle, whereas the centre
of the triangle, where all three primary components are balanced, represents
white. The centre of the colour triangle is referred to as the white point, since
it represents neutral (i.e. non-coloured) tones. Other, unsaturated colours are
represented as points elsewhere within the triangle. The hue is represented
as the angular position of a point relative to the centre of the colour triangle,
while the degree of saturation is measured by the distance from the centre.

6.2.2 One-dimensional Histograms: Colour Separations


As we have seen, it is possible to think of the RGB colour representation as
being equivalent to separate monochrome images, known as the RGB colour
separations. These can be processed individually, in pairs, or combined together in some way, using conventional image processing operators. One useful method of describing a grey scale image is the intensity histogram and
we can apply the same technique to the R, G or B images. Various methods have been devised for measuring the shape of histograms and these can
quantify skewness, standard deviation, location of the peak, etc. All of these can be applied to any one of the three colour separations, resulting in a set
of numbers describing the colour image. Applications such as monitoring the
cooking of beef steaks and pizza crust have been studied in this way (Keller
1986). Although the histogram analysis technique has distinct potential, it
gives no indication about the spatial distribution of colours in an image. Another important consideration is the fact that this technique is in no obvious way linked to the human perception of colour, making it difficult to interpret
the data.
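As a simple illustration of such a colour separation histogram (a stand-alone sketch, not part of any particular library), the separation is treated as a plain array of 8-bit pixel values and the location of the histogram peak is one of the shape measures mentioned above:

// Minimal sketch: a 256-bin intensity histogram for one colour separation.
public final class SeparationHistogram {
    public static int[] histogram(int[] plane) {
        int[] hist = new int[256];
        for (int v : plane) hist[v]++;   // one bin per grey level 0..255
        return hist;
    }

    /** Location of the histogram peak. */
    public static int peak(int[] hist) {
        int best = 0;
        for (int i = 1; i < hist.length; i++)
            if (hist[i] > hist[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        int[] red = {10, 10, 200, 200, 200, 90};   // a toy "red separation"
        System.out.println("Peak bin: " + peak(histogram(red)));
    }
}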

6.2.3 Two-dimensional Scattergrams


2-D scattergrams are often able to yield useful information about the nature
and distribution of colours in a scene (Fig. 6.4) (Batchelor & Whelan 1997).
In order to describe how 2-D scattergrams may be generated, let us consider
two monochrome images A = a(i,j) and B = b(i,j). A third image C can be
generated as follows:
1. Make C black (level 0) initially.
2. Plot a point of intensity (Z + 1) at the position [a(i,j),b(i,j)] in C, if
this point is at level Z beforehand. (Notice that Z may equal 0 initially.)
3. Repeat Step 2 for all (i,j) in the image.
Suppose that image A was generated by the Red channel and that image
B was derived from the Blue channel. The resulting (Red, Blue)-scattergram


can tell us a great deal about how the red and blue signals are related to one
another and therefore provide a useful aid in understanding the types of colours
in the image.

[Fig. 6.4 diagram: images A (red) and B (blue) are scanned; each pixel pair (a(i,j), b(i,j)) plots a point in the scattergram, with a(i,j) (red) on the horizontal axis and b(i,j) (blue) on the vertical axis.]
Fig. 6.4. Generating the 2-D scattergram.
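The three-step procedure above can be sketched directly in code. The following stand-alone example (plain 2-D arrays of values 0..255, not the NeatVision image classes) builds image C by counting how often each (a(i,j), b(i,j)) pair occurs:

// Minimal sketch of the 2-D scattergram procedure described above.
public final class Scattergram2D {
    /** C[a][b] counts how often the pair (a(i,j), b(i,j)) occurs. */
    public static int[][] scattergram(int[][] a, int[][] b) {
        int[][] c = new int[256][256];            // step 1: start from black
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                c[a[i][j]][b[i][j]]++;            // steps 2-3: increment the plotted point
        return c;
    }

    public static void main(String[] args) {
        int[][] red  = {{10, 10}, {200, 200}};
        int[][] blue = {{20, 20}, {50, 60}};
        int[][] s = scattergram(red, blue);
        System.out.println("Count at (10,20): " + s[10][20]);   // prints 2
    }
}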

6.3 Hue-Saturation-Intensity (HSI) Colour Representation

Another popular technique for colour image processing is based upon the
concepts of hue, saturation and intensity. The HSI representation of colour is
close to the colour description method employed by humans (Fig. 6.5). When
asked to describe a colour, a person is likely to first describe the kind of colour
(hue), followed by the strength of colour (saturation) and the brightness
(intensity). Hue (H) defines the intrinsic nature of the colour. Different hues
result from different wavelengths of light stimulating the cones of the retina.
In other words, hue is related to the name of a colour that a human might
assign. A saturated colour (S ≈ 1) is deep, vivid and intense as it does not


contain colours from other parts of the spectrum. Weak or pastel colours (S ≈ 0) have little saturation.
Eqns. 6.1 to 6.3 relate the HSI parameters to the RGB representation
(Gonzalez & Wintz 1987). For convenience, it is assumed that the RGB components have been normalised (W = 1).

H = arccos{ [(R − G) + (R − B)] / [2√((R − G)² + (R − B)(G − B))] }          (6.1)

S = 1 − 3 × min(R, G, B) / (R + G + B)          (6.2)

I = (R + G + B) / 3          (6.3)
The following program segment illustrates the process of converting a
colour image to its HSI representation:
double r,g,b,h,s,i,intensity;
Image input = args.getImage(0);
ImageContainer R = new ImageContainer(input,ImageContainer.RED);
ImageContainer G = new ImageContainer(input,ImageContainer.GREEN);
ImageContainer B = new ImageContainer(input,ImageContainer.BLUE);
int width = R.getWidth();
int height = R.getHeight();
ImageContainer H = new ImageContainer(width,height);
ImageContainer S = new ImageContainer(width,height);
ImageContainer I = new ImageContainer(width,height);

// HUE
double num;
double den;
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    r = (double)R.getxy(x,y);
    g = (double)G.getxy(x,y);
    b = (double)B.getxy(x,y);
    intensity = (r+g+b)/3;
    if((r==g)&&(r==b))            // grey pixel: hue is undefined
      h = 255;
    else
    {
      num = (r-g)+(r-b);
      den = 2*Math.sqrt(((r-g)*(r-g))+((r-b)*(g-b)));
      h = Math.acos(num/den);
      if(h<=0) h = 2*Math.PI + h;          // wrap into the range 0 to 2*PI
      if((b/intensity) > (g/intensity))
        h = 2*Math.PI - h;
      h = (h/(2*Math.PI)) * 255;           // rescale to 0-255
    }
    H.setxy(x,y,(int)h);
  }

// SATURATION
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    r = (double)R.getxy(x,y);
    g = (double)G.getxy(x,y);
    b = (double)B.getxy(x,y);
    if((r==g)&&(r==b))
      s = 0;
    else
      s = (1-((3*Math.min(Math.min(r,g),b))/(r+g+b)))*255;
    S.setxy(x,y,(int)s);
  }

// INTENSITY
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    r = (double)R.getxy(x,y);
    g = (double)G.getxy(x,y);
    b = (double)B.getxy(x,y);
    i = (r+g+b)/3;
    I.setxy(x,y,(int)i);
  }

DataBlock rets = new DataBlock();
rets.add(H);
rets.add(S);
rets.add(I);

The process of creating a colour image from a HSI representation is illustrated below:
double r=0,g=0,b=0,h,s,i;
int red,green,blue;
int width = H.getWidth();
int height = H.getHeight();
int RGBImageArray[] = new int[width*height];
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    h = ((double)H.getxy(x,y))/255;
    s = ((double)S.getxy(x,y))/255;
    i = ((double)I.getxy(x,y));
    h = h*2*Math.PI;

    // Region 1. [0 < h <= 2*PI/3]
    if((h > 0) && (h <= 2*Math.PI/3))
    {
      b = (1-s)/3;
      r = (1+(s*Math.cos(h)/Math.cos(Math.PI/3-h)))/3;
      g = 1-(r+b);
    }
    // Region 2. [2*PI/3 < h <= 4*PI/3]
    if((h > 2*Math.PI/3) && (h <= 4*Math.PI/3))
    {
      h = h-(2*Math.PI/3);
      r = (1-s)/3;
      g = (1+(s*Math.cos(h)/Math.cos(Math.PI/3-h)))/3;
      b = 1-(r+g);
    }
    // Region 3. [4*PI/3 < h <= 2*PI]
    if((h > 4*Math.PI/3) && (h <= 2*Math.PI))
    {
      h = h-(4*Math.PI/3);
      g = (1-s)/3;
      b = (1+(s*Math.cos(h)/Math.cos(Math.PI/3-h)))/3;
      r = 1-(g+b);
    }

    red   = (int)(3*r*i);
    green = (int)(3*g*i);
    blue  = (int)(3*b*i);
    RGBImageArray[x+(y*width)] = 0xFF000000 | (red<<16) | (green<<8) | blue;
  }
Image output = Toolkit.getDefaultToolkit().createImage(new
    MemoryImageSource(width,height,RGBImageArray,0,width));

Standard grey scale operators can then be applied to the HSI colour channels. A disadvantage of HSI colour space is the so called hue singularity, i.e.
with reference to Eqn. 6.1 it can be seen that for R = G = B = 0 hue is
undefined.
Fig. 6.6 illustrates the use of RGB and HSI colour space in the calculation of the approximate red:green pepper ratio for a pizza image. This
involves splitting the colour image into its constituent RGB (Fig. 6.2) and
HSI (Fig. 6.5) colour channels. The channels are then processed as grey scale
images in order to extract the relevant regions.



Fig. 6.5. Sample pizza image (as illustrated in Fig. 6.2(a)) and its associated HSI
components. (a) Hue component. (b) Saturation component. (c) Intensity component. Note that some saturation has occurred in these images.

Fig. 6.6. Calculation of the approximate red:green pepper ratio for a pizza section.
(a) Red peppers highlighted. (b) Green peppers highlighted. (c) Associated visual
workspace.


6.3.1 Colour Scattergrams

In examining Colour Scattergrams let us assume that a certain point (i,j) in


the scene being viewed yields a colour vector (Ri,j, Gi,j, Bi,j). Furthermore,
this vector, or its projection, will be assumed to intersect the colour triangle
at that point defined by the polar coordinates (Hi,j, Si,j) where Hi,j, Si,j are
hue and saturation values calculated using Eqns. 6.1 and 6.2. Each address
(i,j) in the scene being viewed defines a point in the colour triangle. The
colour scattergram is calculated as follows:
1. Clear the current image.
2. Select a point (i,j) in the input image being viewed.
3. Compute values for the hue and saturation, using Eqns. 6.1 and 6.2,
respectively.
4. Transform (Hi,j, Si,j) (polar coordinates) into Cartesian coordinates,

(i, j).

5. Increment the intensity value stored at point (i, j) in the current image.
(Hard limiting occurs if we try to increase the intensity beyond 255.)
6. Repeat steps 2 to 5 for all (i, j) in the input scene.
It is clear that dense clusters in the colour scattergram are associated with
large regions of nearly constant colour. A large diffuse cluster usually signifies
the fact that there is a wide variation of colours in the scene being viewed,
often with colours "melting" into one another. On the other hand, step-wise
colour changes are typically associated with small clusters that are distinct
from one another. The colour scattergram is a useful method of characterising
colour images and hence has a central role in programming the colour filters
described in Sec. 6.12.
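The procedure above can be sketched as follows. This is a minimal, stand-alone illustration assuming plain 2-D arrays for the R, G, B planes (values normalised to 0..1); the hue and saturation computations follow Eqns. 6.1 and 6.2, and the accumulator is hard-limited at 255 as described in step 5:

// Minimal sketch of the colour scattergram procedure.
public final class ColourScattergram {
    public static int[][] build(double[][] R, double[][] G, double[][] B, int size) {
        int[][] scatter = new int[size][size];
        for (int i = 0; i < R.length; i++)
            for (int j = 0; j < R[i].length; j++) {
                double r = R[i][j], g = G[i][j], b = B[i][j];
                if (r + g + b == 0) continue;                         // no colour information
                double num = (r - g) + (r - b);
                double den = 2 * Math.sqrt((r - g) * (r - g) + (r - b) * (g - b));
                double ratio = (den == 0) ? 1 : Math.max(-1.0, Math.min(1.0, num / den));
                double h = Math.acos(ratio);                          // Eqn. 6.1
                if (b > g) h = 2 * Math.PI - h;
                double s = 1 - 3 * Math.min(Math.min(r, g), b) / (r + g + b);   // Eqn. 6.2
                // Polar (h, s) to Cartesian coordinates, scaled into the accumulator.
                int u = (int) Math.round((size / 2.0) + s * (size / 2.0 - 1) * Math.cos(h));
                int v = (int) Math.round((size / 2.0) + s * (size / 2.0 - 1) * Math.sin(h));
                if (scatter[u][v] < 255) scatter[u][v]++;             // step 5: hard limit
            }
        return scatter;
    }
}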

6.4 Opponent Process Representation


The Opponent Process Representation emphasises the differences between red
and green, yellow and blue and between black and white colour components
in an image. A simple Opponent Process model is based upon the transformation of the RGB signals given by Eqn. 6.4 (Fig. 6.7). More complicated
transformation models have been proposed in an attempt to account for certain psychophysical effects (these will be discussed later). The Opponent
Process representation of colour information is a good way to emphasise the
difference between red and green regions, blue and yellow regions and black
and white regions in a scene.
[ Red−Green   ]   [  1  −1   0 ] [ R ]
[ Blue−Yellow ] = [ −1  −1   2 ] [ G ]          (6.4)
[ White−Black ]   [  1   1   1 ] [ B ]
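The following stand-alone sketch applies a simple opponent transform of this kind (red−green, 2·blue−red−green, and the achromatic sum), applied pixel-wise to normalised R, G, B values; the exact coefficients should be taken from Eqn. 6.4:

// Minimal sketch of a simple opponent transform (plain arrays,
// not the NeatVision classes used elsewhere in this chapter).
public final class OpponentProcess {
    /** Returns {red-green, blue-yellow, white-black} for one pixel. */
    public static double[] opponent(double r, double g, double b) {
        double rg = r - g;            // red-green channel
        double by = 2 * b - r - g;    // blue-yellow channel
        double wb = r + g + b;        // white-black (achromatic) channel
        return new double[] {rg, by, wb};
    }

    public static void main(String[] args) {
        double[] o = opponent(0.8, 0.2, 0.1);   // a reddish pixel
        System.out.printf("RG=%.2f BY=%.2f WB=%.2f%n", o[0], o[1], o[2]);
    }
}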


Fig. 6.7. Sample pizza image (as illustrated in Fig. 6.2(a)) and its associated
Opponent Process components. (a) Red-Green component. (b) Blue-Yellow component. (c) White-Black component. Note that some saturation has occurred in
these images.

6.5 YIQ Colour Representation


As mentioned previously, transmitted colour signals are formed by encoding RGB picture information into two parts, the luminance Y (brightness
information) and the chrominance (colour information). The colour hue information is given by the chrominance (phase) angle Q and the saturation
information is given by the chrominance amplitude I. The transformation of
the RGB signals to the YIQ colour space is given in Eqn. 6.5. This equation
is a simple representation of the NTSC encoding scheme used in American
broadcast television.
[ Y ]   [ 0.299   0.587   0.114 ] [ R ]
[ I ] = [ 0.596  −0.275  −0.321 ] [ G ]          (6.5)
[ Q ]   [ 0.212  −0.523   0.311 ] [ B ]

The following program segment illustrates the process of converting a


colour image to its YIQ representation:
RGBImage input = getRGBImage(0);
int width = input.getWidth();
int height = input.getHeight();
int r,g,b;
int YO=0,IO=0,QO=0;
ImageContainer Y = new ImageContainer(width,height);
ImageContainer I = new ImageContainer(width,height);
ImageContainer Q = new ImageContainer(width,height);
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    r = input.getxy(RGBImage.RED,x,y);
    g = input.getxy(RGBImage.GREEN,x,y);
    b = input.getxy(RGBImage.BLUE,x,y);
    YO = (int)(.299*r + .587*g + .114*b);
    IO = (int)(.596*r - .275*g - .321*b);
    QO = (int)(.212*r - .523*g + .311*b);
    Y.setxy(x,y,YO);
    I.setxy(x,y,IO);
    Q.setxy(x,y,QO);
  }

The process of creating a colour image from a YIQ representation is illustrated below:
double red=0,green=0,blue=0,yin,iin,qin;
int width = I.getWidth();
int height = I.getHeight();
RGBImage output = new RGBImage(width,height);
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    yin = (double)Y.getxy(x,y);
    iin = (double)I.getxy(x,y);
    qin = (double)Q.getxy(x,y);
    red   = (1*yin+.956*iin+.621*qin);
    green = (1*yin-.272*iin-.647*qin);
    blue  = (1*yin-1.105*iin+1.702*qin);
    output.setxy(RGBImage.RED,x,y,(int)red);
    output.setxy(RGBImage.GREEN,x,y,(int)green);
    output.setxy(RGBImage.BLUE,x,y,(int)blue);
  }

6.6 YUV Colour Representation


The YUV representation is similar to the YIQ representation, but is used in European TV transmission systems (Eqn. 6.6).

[ Y ]   [  0.299   0.587   0.114 ] [ R ]
[ U ] = [ −0.147  −0.289   0.437 ] [ G ]          (6.6)
[ V ]   [  0.615  −0.515  −0.100 ] [ B ]
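No program listing is given in the text for this conversion; the following is a minimal sketch, patterned on the YIQ listing above (the RGBImage and ImageContainer classes are assumed as before), that simply applies the coefficients of Eqn. 6.6:

// Sketch only: YUV conversion patterned on the YIQ listing above.
RGBImage input = getRGBImage(0);
int width = input.getWidth();
int height = input.getHeight();
ImageContainer Y = new ImageContainer(width,height);
ImageContainer U = new ImageContainer(width,height);
ImageContainer V = new ImageContainer(width,height);
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    int r = input.getxy(RGBImage.RED,x,y);
    int g = input.getxy(RGBImage.GREEN,x,y);
    int b = input.getxy(RGBImage.BLUE,x,y);
    Y.setxy(x,y,(int)( .299*r + .587*g + .114*b));
    U.setxy(x,y,(int)(-.147*r - .289*g + .437*b));
    V.setxy(x,y,(int)( .615*r - .515*g - .100*b));
  }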


6.7 CIE Chromaticity Diagram


The 1931 CIE (Commission Internationale de l'Eclairage) chromaticity diagram (Fig. 6.8) was initially used to define colour names. This was plotted
for a standard observer and under carefully controlled lighting conditions. It
acts as a conceptual aid rather than a precise basis for colour discrimination,
and as such it is considered impractical for industrial vision systems. Each
point in this diagram represents one instance of a broad class of objects.

Fig. 6.8. CIE (1931) chromaticity diagram. Colour representation in the x, y plane (see
Sec. 6.8).

Over the years a number of additions have been made to the basic chromaticity diagram. MacAdam's Ellipses are defined as ellipses within the CIE chromaticity diagram for which colours inside these elliptical regions are classed as indistinguishable. Colours lying just outside are referred to as just noticeably different (JND). The CIE Uniform Chromaticity Scale (UCS) transforms the elliptical contours with large eccentricity (major to minor axis ratios of up to 20:1) to circles.

6.8 CIEXYZ Colour Representation


The CIEXYZ colour space was proposed by the CIE and is based on the CIE tristimulus values X, Y, Z. These can be calculated from the R, G, B stimuli using the linear transformations illustrated in Eqns. 6.10 to 6.12. The colours are conveniently described by the luminance Y and the normalised x, y chromaticities, Eqns. 6.7 and 6.8. The XYZ representation can describe any colour and its spectral composition corresponds to the colour matching characteristics of the human visual system. Another advantage of the CIEXYZ representation


is that the result of an additive mixing between two colours will always lie
on the line that connects the two colours in the x-y plane. A key problem with this approach is the non-uniformity of the CIEXYZ plane.
x = X / (X + Y + Z)          (6.7)

y = Y / (X + Y + Z)          (6.8)

z = Z / (X + Y + Z)          (6.9)

X = 2.36R − 0.515G + 0.0052B          (6.10)

Y = −0.89R + 1.42G − 0.014B = Luminance          (6.11)

Z = −0.46R + 0.88G + 1.009B          (6.12)

The following program segment illustrates the process of converting a


colour image to its XYZ representation:
RGBImage input = getRGBImage(0);
int width = input.getWidth();
int height = input.getHeight();
int r,g,b;
double XOUT=0,YOUT=0,ZOUT=0;
ImageContainer X = new ImageContainer(width,height);
ImageContainer Y = new ImageContainer(width,height);
ImageContainer Z = new ImageContainer(width,height);
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    r = input.getxy(RGBImage.RED,x,y);
    g = input.getxy(RGBImage.GREEN,x,y);
    b = input.getxy(RGBImage.BLUE,x,y);
    /** X Component **/
    XOUT = (.431*r + .342*g + .178*b);
    X.setxy(x,y,(int)XOUT);
    /** Y Component **/
    YOUT = (.222*r + .707*g + .071*b);
    Y.setxy(x,y,(int)YOUT);
    /** Z Component **/
    ZOUT = (.020*r + .130*g + .939*b);
    Z.setxy(x,y,(int)ZOUT);
  }
setObject(0,X);
setObject(1,Y);
setObject(2,Z);

The process of creating a colour image from a XYZ representation is


illustrated below:
double red=0,green=0,blue=0,xin,yin,zin;
int width = X.getWidth();
int height = X.getHeight();
RGBImage output = new RGBImage(width,height);
for(int y=0; y<height; y++)
  for(int x=0; x<width; x++)
  {
    xin = (double)X.getxy(x,y);
    yin = (double)Y.getxy(x,y);
    zin = (double)Z.getxy(x,y);
    red   = ( 3.063*xin - 1.393*yin - .476*zin);
    green = ( -.969*xin + 1.876*yin + .042*zin);
    blue  = (  .068*xin -  .229*yin + 1.069*zin);
    output.setxy(RGBImage.RED,x,y,(int)red);
    output.setxy(RGBImage.GREEN,x,y,(int)green);
    output.setxy(RGBImage.BLUE,x,y,(int)blue);
  }

6.9 CIELUV Colour Representation

The CIELUV colour space, also introduced by the CIE, is a perceptually uniform space. It is suitable for applications dealing with additive mixtures of coloured light, such as the lighting and colour television industries. Its parameters L, u, v are also derived from the X, Y, Z primitives. See Eqns. 6.13 to 6.15, where the subscript n stands for the reference white point measurements and u*, v* are the uniform chromaticities defined in Eqns. 6.16 and 6.17.

L = 116 × (Y/Yn)^(1/3) − 16 = Luminance          (6.13)

u = 13 × L × (u* − u*n) = Red–green component          (6.14)

v = 13 × L × (v* − v*n) = Yellow–blue component          (6.15)

u* = 4X / (X + 15Y + 3Z)          (6.16)

v* = 9Y / (X + 15Y + 3Z)          (6.17)
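As an illustration, the following stand-alone sketch converts a single XYZ pixel to CIELUV, following Eqns. 6.13 to 6.17; the reference white values Xn, Yn, Zn are assumed to be supplied by the caller:

// Minimal sketch of an XYZ to CIELUV conversion for one pixel.
public final class CieLuv {
    public static double[] xyzToLuv(double X, double Y, double Z,
                                    double Xn, double Yn, double Zn) {
        double L = 116 * Math.cbrt(Y / Yn) - 16;                  // Eqn. 6.13
        double uStar  = 4 * X  / (X  + 15 * Y  + 3 * Z);          // Eqn. 6.16
        double vStar  = 9 * Y  / (X  + 15 * Y  + 3 * Z);          // Eqn. 6.17
        double uStarN = 4 * Xn / (Xn + 15 * Yn + 3 * Zn);         // reference white
        double vStarN = 9 * Yn / (Xn + 15 * Yn + 3 * Zn);
        double u = 13 * L * (uStar - uStarN);                     // Eqn. 6.14
        double v = 13 * L * (vStar - vStarN);                     // Eqn. 6.15
        return new double[] {L, u, v};
    }
}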


6.10 CIELAB Colour Representation


CIELAB is a popular colour system that is based on complementary colour
pairs (Sec. 6.4). Colour pairings include red-green, blue-yellow and black-white. This form of colour representation is recommended for applications
dealing with subtractive colour mixtures such as those found in the textile,
plastic, paint and printing industries. The parameters L (psychometric lightness channel), a (red-green channel) and b (yellow-blue channel) are computed
from the X, Y, Z primaries using the relationships outlined in Eqns. 6.18 to
6.21 (the subscript n represents the reference white point measurement).
L = 116 × (Y/Yn)^(1/3) − 16          (6.18)

for Y/Yn > 0.008856, otherwise

L = 903.3 × (Y/Yn)          (6.19)

a = 500 × [f(X/Xn) − f(Y/Yn)]          (6.20)

b = 200 × [f(Y/Yn) − f(Z/Zn)]          (6.21)

where f(t) = t^(1/3) for t > 0.008856, otherwise f(t) = 7.787 × t + 16/116


A key advantage of this representation is the fact that it is a perceptually equalised colour space, i.e. numerical distance in this colour space is
proportional to perceived colour difference.
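The conversion of Eqns. 6.18 to 6.21 can be sketched as follows for a single pixel; this is a stand-alone illustration, with the reference white Xn, Yn, Zn assumed to be supplied by the caller:

// Minimal sketch of an XYZ to CIELAB conversion for one pixel.
public final class CieLab {
    private static double f(double t) {
        return (t > 0.008856) ? Math.cbrt(t) : 7.787 * t + 16.0 / 116.0;
    }

    public static double[] xyzToLab(double X, double Y, double Z,
                                    double Xn, double Yn, double Zn) {
        double yr = Y / Yn;
        double L = (yr > 0.008856) ? 116 * Math.cbrt(yr) - 16      // Eqn. 6.18
                                   : 903.3 * yr;                   // Eqn. 6.19
        double a = 500 * (f(X / Xn) - f(yr));                      // Eqn. 6.20
        double b = 200 * (f(yr) - f(Z / Zn));                      // Eqn. 6.21
        return new double[] {L, a, b};
    }
}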

6.11 Spatial CIELAB Colour Representation


The Spatial CIELAB (S-CIELAB) colour space (Zhang & Wandell 1997) is
a spatial extension of the CIELAB colour representation which takes into
account the spatial frequency of the colour components. That is, colours that
appear in multicolour patterns and uniform areas are perceived differently.
This has recently been found to be in concordance with the human perception of colour. The colour image is first decomposed into three opponent-colour planes (brightness, red-green, yellow-blue). These are generated from the X, Y, Z space using the linear transformations illustrated in Eqns. 6.22 to 6.24. Each plane is then smoothed with Gaussian filters with variable weights wi and spreads σi (typical values for these filter parameters are given in Table 6.1). The spatial CIELAB model² is illustrated in Fig. 6.9.
The Gaussian filters are of the form stated in Eqn. 6.25. The coefficients for these filters (wi, σi) have been derived from psychophysics research and
² http://white.Stanford.EDU/html/xmei/


are defined in Zhang & Wandell (1997). This produces a modified X, Y, Z


representation to which the standard CIELAB transformation is applied.
The S-CIELAB space adds a pattern-colour independent spatial processing
to CIELAB space. Since it can capture the dependence between colour and
textural pattern, it is a particularly suitable colour representation for texture
analysis.

O1 = 0.279X + 0.72Y − 0.107Z = Brightness          (6.22)

O2 = −0.449X + 0.29Y − 0.077Z = Red–green component          (6.23)

O3 = 0.086X − 0.59Y − 0.501Z = Yellow–blue component          (6.24)
f(x, y) = Σi wi × exp(−(x² + y²) / σi²)          (6.25)

[Fig. 6.9 diagram: XYZ representation → opponent representation → spatial separation and filtering → standard CIELAB calculation.]

Fig. 6.9. Spatial CIELAB model. Diagram courtesy of X. Zhang, Department of Psychology, Stanford University.

6.11.1 Segmenting Colour Textures

Mirmehdi & Petrou (2000) propose a mechanism for segmenting colour textures based on the use of the S-CIELAB colour space in conjunction with a
multi-scale (or multi-distance) tower of images. With the multi-scale tower, no
subsampling occurs, therefore maintaining the same number of pixels (Petrou et al. 1998).

Table 6.1. S-CIELAB Gaussian filter parameters.

Plane          Weights (wi)             Spreads (σi)
Luminance      0.921, 0.105, -0.108     0.0283, 0.133, 4.336
Red-Green      0.531, 0.330             0.0392, 0.494
Yellow-Blue    0.488, 0.371             0.0536, 0.386

The tower is generated using the filter parameters given in Table 6.1 by viewing the same coloured object at incremental distances. The
aim is to imitate the blurring a human would experience at the corresponding
distance. Fig. 6.10 illustrates a colour image and its associated S-CIELAB
smoothed images at varying distances.
Fig. 6.11 illustrates the same image, but this time Gaussian smoothing is
employed. Mirmehdi & Petrou (2000) claim that the set of images based on
S-CIELAB smoothing provides a more realistic representation of the blurring
of a coloured object at varying distances.
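The smoothing kernel for one opponent plane can be sketched as a weighted sum of Gaussians built from the Table 6.1 parameters. The following is a minimal, stand-alone illustration; the per-Gaussian normalisation is an assumption of this sketch, and in practice the spreads (given in degrees of visual angle) must first be scaled to pixels for a given viewing distance, a step omitted here (see Zhang & Wandell (1997) for the exact filter definition):

// Sketch only: build a 2-D sum-of-Gaussians kernel from weight/spread pairs.
public final class SCielabKernel {
    public static double[][] kernel(double[] w, double[] sigma, int halfSize) {
        int n = 2 * halfSize + 1;
        double[][] k = new double[n][n];
        for (int t = 0; t < w.length; t++) {
            // Normalise each Gaussian so that it sums to one before weighting.
            double sum = 0;
            double[][] g = new double[n][n];
            for (int y = -halfSize; y <= halfSize; y++)
                for (int x = -halfSize; x <= halfSize; x++) {
                    g[y + halfSize][x + halfSize] =
                        Math.exp(-(x * x + y * y) / (sigma[t] * sigma[t]));
                    sum += g[y + halfSize][x + halfSize];
                }
            for (int y = 0; y < n; y++)
                for (int x = 0; x < n; x++)
                    k[y][x] += w[t] * g[y][x] / sum;
        }
        return k;
    }

    public static void main(String[] args) {
        // Luminance plane parameters from Table 6.1 (unscaled spreads).
        double[][] k = kernel(new double[]{0.921, 0.105, -0.108},
                              new double[]{0.0283, 0.133, 4.336}, 7);
        System.out.println("Centre tap: " + k[7][7]);
    }
}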

6.12 Programmable Colour Filter (PCF)


The Programmable Colour Filter (PCF) (Batchelor & Whelan 1995, Batchelor & Whelan 1997) provides a method of filtering colour images based on
the colour triangle (Sec. 6.2.1). The PCF uses a standard RGB video input
from a colour camera and can be programmed to recognise one, or more,
colours and is particularly useful in coarse colour discrimination. The colour
filter is programmed by changing the contents of a LUT. Using an approach
based on colour scattergrams (Sec. 6.3.1) the contents of this LUT can be
modified, enabling it to recognise any desired combinations of the incoming
RGB signals. Thus, the filter can be programmed to recognise one, or more,
colours (note that the video signal can be processed in real-time by the LUT).

6.12.1 PCF Implementation


The position of a point in the colour triangle can be specified by the parameters U and V (Fig. 6.12), which can be calculated from R, G and B using the following formulae:

U = (R − G) / (√2 × (R + G + B))          (6.26)



Fig. 6.10. Multi-scale tower for S-CIELAB smoothing. Colour images are represented in grey scale. Images (a) to (d) are the resultant images for distance increments of 10 inches.

V = (2B − R − G) / (√6 × (R + G + B))          (6.27)

To see how these equations can be derived, view the colour triangle normally (i.e. along the line QPO, the diagonal of the colour cube). When the vector (R, 0, 0) is projected onto the colour triangle, the resultant is a vector Vr of length R·√(2/3) parallel to the R' axis. In a similar way, when the vector (0, G, 0) is projected onto the colour triangle, the result is a vector Vg of length G·√(2/3) parallel to the G' axis. Finally, the vector (0, 0, B) projected onto the colour triangle forms a vector Vb of length B·√(2/3) parallel to the B' axis. A given colour observation (R, G, B) can therefore be represented by the vector sum (Vr + Vg + Vb). Finally, U and V can be calculated simply by resolving Vr, Vg and Vb along these axes. Let us now consider the mapping function Γ given by:

Γ((R, G, B)) = ( (R − G) / (√2 × (R + G + B)) , (2B − R − G) / (√6 × (R + G + B)) )          (6.28)


Fig. 6.11. Multi-scale tower for Gaussian smoothing. Colour images are represented
in grey scale. Images (a) to (d) are the resultant images for distance increments of
10 inches.

Γ projects a general point X = (R, G, B) within the colour cube onto the colour triangle. Clearly, there are many values of the colour vector (R, G, B) which give identical values for Γ((R, G, B)). The set of points within the colour cube which give a constant value for Γ((R, G, B)) all lie along a straight line passing through the origin in RGB space (O in Fig. 6.12). This set is denoted by Φ(U, V), where ∀X : X ∈ Φ(U, V) → Γ(X) = (U, V). The colour scattergram is simply an image in which the "intensity" at a point (U, V) is given by the number of members in the set Φ(U, V).
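The mapping of Eqns. 6.26 and 6.27 is straightforward to compute. The following stand-alone sketch returns the (U, V) coordinates of an RGB triple within the colour triangle (the treatment of black pixels, where the mapping is undefined, is an assumption of this sketch):

// Minimal sketch: compute the (U, V) colour triangle coordinates of an RGB triple.
public final class ColourTriangleUV {
    /** Returns {U, V}, or {0, 0} for a black pixel. */
    public static double[] project(int R, int G, int B) {
        double sum = R + G + B;
        if (sum == 0) return new double[] {0, 0};
        double U = (R - G) / (Math.sqrt(2.0) * sum);            // Eqn. 6.26
        double V = (2.0 * B - R - G) / (Math.sqrt(6.0) * sum);  // Eqn. 6.27
        return new double[] {U, V};
    }

    public static void main(String[] args) {
        double[] uv = project(180, 60, 40);
        System.out.printf("U=%.3f V=%.3f%n", uv[0], uv[1]);
    }
}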
The details of the process of programming the PCF are as follows (see
Figs. 6.13 and 6.14):
1. Project all RGB vectors onto the colour triangle (which contains the colour scattergram).
2. Process the colour scattergram, to form a synthetic image, S. (Image S
is not a "picture of" anything. It is merely a convenient representation
of the distribution of colours within the input).


Fig. 6.12. The relationship between the RGB- and UV-coordinate axes. The vectors R', G' and B' all lie in the UV-plane, which also contains the colour triangle.

3. Project each point, Y, in image S back into the colour cube. This process
is called Back-projection. Every vector X within the colour cube, that
shares the same values of hue and saturation as Y, is assigned a number
equal to the intensity at Y. The values stored within the look-up table
are obtained by back-projecting each point within the colour triangle
through the colour cube. Any points not assigned a value by this rule
are given the default value 0 (black). See Batchelor & Whelan (1997) for
further details.

6.12.2 Recognising a Single Colour


Consider a very simple scene consisting of a single spectral colour, for example, red, with uniform illumination everywhere. What happens when the
PCF is trained on such an image? The colour scattergram, created during the
training process, contains a single compact cluster. In its recognition mode,
the programmed PCF may well be applied to the video signal derived from
a more complex scene, containing the same shade of red and other colours.
Only the particular shade of red encountered during training will be recognised. (Pixels of this shade of red are shaded white in the output image.)
All other colours will be ignored (shaded black). Very dark and very bright
red regions will also be ignored (i.e. since both upper and lower intensity
limits are applied during the back-projection process outlined previously).
When the PCF designed to recognise a particular shade of red is applied
to a polychromatic scene, only that same shade of red will be detected; all
other colours, including slightly different shades of red, will map to black
(Fig. 6.15).


[Fig. 6.13 diagrams (a)-(c): Z lies on the outer edge of the colour cube; the value stored at Y (which lies in the colour triangle) is equal to the total number of points lying along OZ.]

Fig. 6.13. Programming a colour filter. (a) The first step is to compute the colour
scattergram. The points lying along OZ are counted. This defines the value at Y
in the colour scattergram. (b) The colour scattergram is presented to the user in
the form of a grey scale image in which intensity indicates how many pixels were
found of each colour. (c) The second step is to process the colour scattergram. The
steps represented diagrammatically here are thresholding and blob shading.

6.12.3 Noise Effects

When a PCF that has been trained on one scene is reapplied to the same
scene, it is often found that the filter output is noisy; some pixels in what
appears by eye to be a region of uniform colour are mapped to black. On a live
video picture, some pixels are seen to scintillate. There are several possible
reasons for this:
Pixels which generate points in the colour scattergram close to the edge of
the main cluster, will sometimes be displayed as black. Camera noise adds
an unavoidable jitter in the values generated at the RGB signal outputs.


[Fig. 6.14 diagrams (a) and (b): Z lies on the edge of the colour cube; assuming all values in a blob are the same (level 0), all corresponding LUT entries in the cone are set to 0; the value stored in location X of the LUT is equal to that at Y (which lies in the colour triangle); the upper limit L2 and lower limit L1 are marked.]
Fig. 6.14. PCF back-projection. (a) The colour triangle is shown here within the
colour cube. The process of programming the colour filter is to "back-project" each
point, Y, through the colour triangle, so that each point lying along the line OY
is given the same value as Y. The blobs shown in Fig. 6.13(c) each contain many
points of the same intensity. The effect of "projecting" a blob through the colour
cube is to set all points lying within a cone to the same intensity. (b) Two intensity limits (L1 and L2) are specified by the user. (L2 may well be set to 255, in which case it has no practical effect). When the LUT contents are being computed, points
within the colour triangle are not "back-projected" into either of the two corners,
shown shaded here.


Fig. 6.15. Isolating a single colour using the PCF. Colour images displayed in grey
scale. (a) Analysing the image of a small electronics component (black with shiny
silver printing) in a transparent plastic bag. The bag has red printing on it and
is resting on a white background. (Unprocessed video image). (b) Output of the
colour filter. (c) A grey scale image obtained by thresholding (b).

Recall that hard upper and lower intensity limits are defined during training of the PCF. Some very dark pixels, below the lower intensity limit
will be shaded black, even though they are of the particular tone that the
PCF is supposed to recognise. A similar situation holds around the upper
intensity limit.
Specular reflection on wrinkles on the surface of the object being viewed is
a prime cause of noise-like effects. The very high intensities produced by
glinting cause black spots to appear in the PCF output image.
The colour scattergram often consists of a compact cluster, with a diffuse "corona". Outlier points are often specifically excluded, by generating
a small blob which covers only the dense centre of the cluster, prior to
creating the filter.
Colour edges produce outlier points in the colour scattergram. As a result,
sharp colour edges may become jagged, in the PCF output image.
6.12.4 Colour Generalisation

The colour scattergram may be displayed as a grey scale image, in which


intensity indicates the number of pixels with the same values of hue and
saturation. The colour scattergram must be simplified before any further
processing takes place and an obvious step is to threshold it. This process
will generate a compact blob for each region of similar colours (Fig. 6.16).
It is often difficult to choose the value for the threshold parameter with
any degree of confidence. Choosing too high a value will result in a binary
image containing a large spot surrounded by a cloud of small spots, many
of them single, isolated pixels. This is unsatisfactory, since the outliers have
the same weight as those at the centre of the cluster, which are much more
representative of the general population of points within the input. Outliers
are also prone to the effects of camera and quantisation noise. By placing
the threshold too high, tolerance of colour variation in the input is lost. As


a result the PCF output will consist of a noisy image in which large regions
are not recognised properly, even though they may be identical to areas in
the scene used during training of the PCF.
One approach to colour generalisation consists of adjusting the sizes of the
blobs created by thresholding the colour scattergram. It will be necessary to
do this in such a way that blobs which were distinct when the (multi-cluster)
scattergram was first thresholded, remain separate.
In a typical application, a number of coloured scenes are used to design the PCF. As each scene is being viewed, a scattergram is generated in
the colour triangle. Noise is then removed from the scattergram, using common image processing operations, such as low-pass filtering. This is followed
by thresholding. If the input scene consists of a single colour, thresholding the scattergram, after clean-up, creates a single blob Bi. This process is repeated for each colour we wish to use to design the PCF. As each new blob (Bi, i = 1, ..., n) is generated it is superimposed on the colour triangle.
Therefore, prior to generalisation the colour triangle consists of a number of
blob regions, each of which corresponds to one of the trained colours. The
aim of the generalisation procedures is to expand these regions, forming the
regions Gi , i = 1, ... , n. By projecting the Gi back onto the colour cube we
generate the contents of the PCF look-up table. If the Gi have been generated
appropriately, the resulting colour recognition process is more reliable than
it would have been if the smaller blobs Bi had been used instead (Fig. 6.17).
A number of simple colour generalisation procedures are outlined below:
1. Simple Dilation: Each blob Bi in the colour triangle is dilated.
2. Dilation with preservation of connectivity: A single layer of background
pixels is stripped from the (binary) image in the colour triangle. Unlike
the previous approach, pixels critical for connectivity are retained.
3. Morphological watershed: This approach involves finding the morphological watershed (as outlined in Sec. 4.4.2) for each of the blobs Bi.
4. Convex hull generalisation: The convex hull is drawn around the set of
blobs Bi in the colour triangle. It is reasonable to expect that points
within this convex hull will correspond to a generalisation of the observed
colours.
Procedures 1 to 3 are intended for use in those situations where colour
discrimination is required, whereas procedure 4 is more appropriate for those
occasions when simple colour recognition is needed. Analysis of colour scenes
is made much simpler if colour generalisation is used. For example:

The "noise" level is reduced.


Regions of constant colour are made more "solid".
Estimates of the areas of the various coloured regions are more accurate.
It is possible to identify coloured regions from their shapes more reliably.


Fig. 6.16. Colour scattergram displayed as a grey scale image, in which intensity
indicates the number of pixels with the same values of hue and saturation. (a)
Colour scattergram. (b) Threshold applied to the colour scattergram to consolidate
the colour clusters. Minor clusters and outlier points are normally ignored. (c)
Generalization of the blob regions indicated in (b).

Consider a colour scattergram in which there are six distinct and compact
clusters (Fig. 6.18). This form of scattergram is generated by polychromatic
scenes in which there are several regions, each containing nearly constant and
perceptually distinct colours. After applying thresholding and noise reduction
to the colour scattergram, there are several small blobs, which can be shaded,
using a blob labelling operator. Such a PCF will often be rather noisy.
In particular, some of the points in the original image are mapped to black,
incorrectly suggesting that they have not been seen beforehand. The reason is
that the small blobs do not cover all points in the colour scattergram. Notice
that the white blob at the 3 o'clock position straddles a sharp colour transition in the
pseudo colour-triangle. This is the reason that a (red) stripe in the input
image generates a noisy pattern in Fig. 6.18(b). If the blobs are enlarged and
a new PCF designed, the noise level is reduced. Blob enlargement can be
repeated provided that the blobs do not merge.

6.13 Conclusion
In this chapter we have only been able to touch on the general issues relating to colour image analysis for machine vision applications. For further
information on colour imaging and colour spaces, readers are referred to the
following web sites:

The efg's Computer Lab Reference Library (www.efg2.com/lab/Library/Color.htm).

Charles Poynton - Color FAQ (www.inforamp.net/~poynton/ColorFAQ.html).



Fig. 6.17. Recognising a single colour using PCF generalization. Colour images
displayed in grey scale. (a) Original image. (b) Output of the colour filter. (c) Grey
scale image, obtained by thresholding (b). (d) Back-projection of the colour filter
in (c). (e) Colour generalization applied to (c). (f) Back-projection of the colour filter in (e).


Fig. 6.18. Application of the PCF to a polychromatic image. Colour images displayed in grey scale. (a) Coloured stripes from the packaging of a domestic product.
(Unprocessed video image.) (b) Output of the colour filter. (c) Colour scattergram,
superimposed on the pseudo colour-triangle. (d) A new pseudo colour-triangle was
created by flipping the pseudo colour-triangle about its vertical axis. A colour filter
was then generated in the usual way. Since there is no sharp transition across the
blob at 3 o'clock, the red stripe does not create a noisy pattern, as it did in (b).

Frequently asked questions about Colour Physics (www.colourware.co.uk/cpfaq.htm).

In the final chapter (Chap. 7), we examine the NeatVision development


environment in detail. In doing so we hope to encourage readers to implement the ideas outlined throughout the book. It is only by working with the
image analysis and processing techniques that one can fully appreciate their
various strengths and weaknesses. A number of sample applications will also
be examined.

7. NeatVision: Visual Programming for Machine Vision

NeatVision¹ provides a versatile platform which facilitates high level access


to a wide range of machine vision and general data processing algorithms.
These can be combined to create a visual program by specifying data interdependencies between the individual algorithms. In addition, the path of
data through a visual program can be dictated using special flow control
components. These techniques provide a fast and simple alternative to conventional text based programming, while still providing much of their power
and flexibility.
In addition to high level development using the visual programming environment, the user has the ability to generate new algorithms thus expanding
the wide range of built-in NeatVision components. New NeatVision components can be developed using the Java programming language from Sun
Microsystems². Java AWT imaging, Java 2D imaging, Java Advanced Imaging (JAI) (Chap. 2) and NeatVision imaging can all be used to develop new
components. Components developed using different imaging techniques can
be used together in the same visual program, where all intermediate data
conversion is automatically performed by the application.
The NeatVision application provides an attractive user interface to the
concepts presented above. The main application components are the visual
programming environment, the integrated Java development environment
and the web enabled HTML based help viewer. The main user interface component is a tree structure which represents the available data processing components. This tree structure can be used to access help and add components
to a visual workspace. An important feature for developers is the component
source code development wizard that automatically generates code for new
NeatVision components.
NeatVision provides a fully integrated software development solution,
with several levels of user interaction. It is extendable, portable and most
importantly, easy to use. The choice of Java as the development language
for the application not only improves portability but also introduces the

¹ The NeatVision application is available from http://www.NeatVision.com/. For more information, please contact info@neatvision.com.
² The core Java Software Development Kit (SDK), standard extensions and documentation are available from Sun Microsystems' main Java web site http://java.sun.com/




Fig. 7.1. A screen shot of the NeatVision application, displaying some of the main application features.

developer to a very powerful and straightforward approach to object-oriented


programming for image processing and analysis, which Sun Microsystems is
committed to supporting and improving for the foreseeable future. The remainder of this chapter introduces the main NeatVision concepts, the visual
programming approach, new functionality integration using Java, how these
concepts are made accessible to the user via the application user interface
(Fig. 7.1) and finally a number of sample applications implemented through
NeatVision are introduced.

7.1 Visual Programming in NeatVision


Programming involves defining variables, specifying relationships between
these variables and creating a structured flow of data using branching, looping and conditional processing as required. Programs are generally described
using text and are written in a linear fashion. This can make it very difficult
to appreciate the relationship between the code and the program which it represents. Visual programming provides a more attractive alternative, whereby
program elements are represented as graphical icons and the relationship between components is specified by interconnections, hence defining the flow of


data through the program. In general, data flow begins at the inputs to the
visual program and ends at the outputs. Although the data interconnections
are specified by the programmer, the order in which operations are executed
is defined by the availability of data. The concepts of branching, looping and
conditional processing are supported in the visual domain using special flow
control components.
The concept of visual programming can be illustrated by considering the
generalised polynomial given by Eqn. 7.1. This relationship can be defined in
terms of primitive operations which in this case are multiply, power and add.
y = ax³ + bx² + cx + d          (7.1)

A visual implementation of this relationship can be generated by specifying the inputs and outputs to the system, where the inputs are a, b, c, d,
x and the output is y. Following this, each of the required operators must
be instantiated and the data interdependencies must be specified by creating
interconnections between the operators. This defines the data flow through
the program. The resulting visual representation of this equation is presented
in Fig. 7.2.


Fig. 7.2. Visual implementation of a polynomial equation (Eqn. 7.1).
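For comparison, the same decomposition into primitive multiply, power and add operations can be written as a short text-based program. The following stand-alone sketch is only an illustration of the data flow shown in Fig. 7.2, not a NeatVision component:

// Text-based equivalent of the visual program in Fig. 7.2 (Eqn. 7.1).
public final class Polynomial {
    public static double evaluate(double a, double b, double c, double d, double x) {
        double term3 = a * Math.pow(x, 3);   // multiply + power
        double term2 = b * Math.pow(x, 2);   // multiply + power
        double term1 = c * x;                // multiply
        return term3 + term2 + term1 + d;    // add
    }

    public static void main(String[] args) {
        System.out.println(evaluate(1, 2, 3, 4, 2.0));   // 1*8 + 2*4 + 3*2 + 4 = 26
    }
}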

NeatVision is a visual programming environment based on the data flow


programming concept. There is built-in support for several data types, with
special support provided for image data. Three main categories of components
are available: input/output, processing and flow control. The following sections introduce these main component categories, describe some of the visual programming concepts unique to the NeatVision environment and provide
some practical examples developed using the NeatVision application.
7.1.1 Input and Output Components

One of the most important parts of any program is user interaction. This
involves getting input data into the program, executing, then displaying the


results. NeatVision provides input and output components for this purpose.
Each of these components has an associated window or user interface that can
be used for data specification and presentation respectively and can be viewed
by double clicking on the relevant component. System input components
have one output node which represents the data in the associated window.
System output components have one input node which updates the associated
window whenever it receives new data. The main data types supported by
input and output components are outlined in Table 7.1.


Fig. 7.3. A sample input component and the associated user interface.

In some cases the data associated with an input or output component is


displayed below that component in the visual workspace. Image components
do not display the image data they represent, instead the filename of the
image is displayed. Any information displayed in this manner will be updated
immediately to reflect the data represented by the associated component.
All graphical user interfaces for the types listed in Table 7.1 will be discussed later in this chapter. Special attention will be given to the image user
interface and its associated image analysis tools.
7.1.2 Processing Components

Once data has been made available to the visual workspace using the input
components it can be processed (via the processing components). A processing component has both inputs and outputs and can be compared to a
function in a conventional programming language. Inputs are equivalent to
function arguments and outputs are equivalent to return values. Although a
processing component can have many input and output connections, it requires only one of each. Processing components can support mixed data types
and in some cases dynamic type conversion. There are currently over 200 data
processing components available in the NeatVision application (Appendix C).
The major groups of processing components are described in Table 7.2.
A visual program can range in complexity from three components upwards
and is limited only by the availability of free memory. A minimal program
must have an input, an operation and an output. Such a program is illustrated
in Fig. 7.4, in which Sobel edge detection is performed on a sample grey scale
image.


Table 7.1. Data types currently supported by the com.neatvision.dataio package.

Type       Description
Image      This type represents colour, grey scale or binary image data and is specified by an absolute file name.
Integer    This type represents integer numbers in the range -2,147,483,648 to 2,147,483,647.
Double     This type represents double precision floating point real numbers in the range -1.79769313486232e308 to 1.79769313486232e308.
Boolean    This type represents Boolean data. There are only two possible values for Boolean data, true or false.
Array      This type represents integer data in array form. When used with morphology components any non-integer entry is assumed to be a don't care value.
String     This type represents string data. Only standard ASCII characters are supported.

Fig. 7.4. A sample processing component, using the Sobel edge detector.

Although most processing components operate on data obtained from the


input nodes, some provide a dedicated user interface for specifying certain arguments, such as a region of interest or threshold value. These user interfaces
will be displayed automatically when the associated processing component
enters the execution cycle. The visual program will terminate at this point,
only resuming once the user has entered the required data and closed the
user interface. The operation of these user interfaces will be discussed later
in this chapter.
7.1.3 Flow Control Components

The components which have been introduced so far can be used to develop
simple linear programs. In these programs, data will flow from the inputs to
the outputs and the data path will be the same, even if the input data changes.
Visual programs can be enhanced significantly by introducing dedicated flow


Table 7.2. Description of processing components available with the standard edition of NeatVision.

Group        Description
Utilities    General manipulation utilities; rotation, resize, region of interest and masking operations.
Arithmetic   Basic arithmetic operators; add, subtract, multiply, divide and logical operators.
Processing   Image processing algorithms; look-up tables (LUT), threshold, contrast manipulation.
Noise        Additive noise generators; salt and pepper, white noise, Gaussian, Rayleigh and Poisson.
Filter       Neighbourhood based filtering algorithms including edge detection algorithms.
Label        Labelling and measurement algorithms for images containing binary shapes.
Transform    A range of image transforms, including Hough, Fourier and distance transforms.
Colour       Colour space conversion algorithms, RGB, HSI, XYZ and YIQ.
Morphology   Several morphological operators, including erosion, dilation, opening, closing, top-hat, hit-and-miss and watershed.
Histogram    General histogram analysis algorithms and local equalisation.
Maths        An extensive range of numerical operators and utilities, including constants and random number generation.
Low Level    Pixel level operators; get pixel value, set pixel value and basic shape generation.
String       String operators, object addition, to upper case and to lower case.
JAI Colour   Colour algorithms implemented using JAI, operators, processing, filters and edge detectors.

control components, facilitating conditional processing of data, splitting of


data paths and data looping (Fig. 7.5).
The data looping and conditional processing components introduce an
interesting concept, the feedback path. Output data from these components
can be sent out through the conventional output (the top output) or the feedback output (the bottom output). The feedback output can have any number
of components attached to it. These components represent a feedback subsystem and the output from the system represented by these components must
be connected to the feedback input of the relevant flow control component.
This concept, although not as intuitive as the concepts already introduced,
is quite powerful and allows data to be sent through the feedback


[Fig. 7.5 diagrams: the Split (×3) component producing three copies of the main input; the Loop component with main input, loop count input, feedback input and main/feedback outputs; the For component with main input, start, finish and increment inputs, a current count output and feedback input/output; and the If and Else components with main, conditional and feedback inputs and main/feedback outputs.]

Fig. 7.5. Input and output descriptions for all of the flow control components.

subsystem any number of times before being sent out the conventional component
output. The flow control components are presented in Table 7.3.
Data Splitting. Data splitting allows data at a certain node in a visual
program to be replicated several times. Splitting creates multiple reference
nodes for the same piece of data where each of these nodes can be used to
feed a new program branch. The data splitting component has three variants replicating nodes two, three and four times. Data splitting components
implement dynamic type conversion and will support all future data types
implemented by NeatVision. A sample visual program using a data splitting
component is presented in Fig. 7.6.

Fig. 7.6. A sample visual program demonstrating data splitting or copying.


Table 7.3. Description of flow control components available with the standard
edition of NeatVision.
If: Data will be passed through the feedback subsystem of this component only if the conditional input is true; otherwise the data is passed directly to the main output.

Else: This component can only be connected directly to the main output of an If component. No intermediate data processing is permitted. Data will be passed through the feedback subsystem of this component if the conditional input to the corresponding If component is false; otherwise the data is passed directly to the main output.

For: The number of times data will be passed through the feedback subsystem of this component is determined by the loop arguments start, finish and increment. The current loop count is made available to the feedback subsystem.

Loop: The number of times data will be passed through the feedback subsystem of this component is determined by the loop count input.

Split: The data at the input node of this component is copied to each of the output nodes. The data at each of these output nodes can then be processed independently.

Data Looping. Data looping allows data to be passed through a feedback


subsystem several times. The number of times looping occurs is determined
by the loop arguments. Data received at the main input is passed to the feedback output and processed by the feedback subsystem; then, depending on the current state of the looping component, the data is either sent out the feedback output again for further processing by the feedback subsystem or sent out the conventional output for further processing by the system. There are two looping
components available, Loop and For. The inputs and outputs associated with
these components are described in Fig. 7.5. The For component provides a
special output which represents the current loop count (labelled "current" in
Fig. 7.7) which in turn can be used, if required, by the feedback subsystem.
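In conventional Java terms, the For component behaves like a counted loop whose body plays the role of the feedback subsystem. The short self-contained sketch below illustrates the idea only; the feedbackSubsystem() method, the data values and the exact treatment of the finish argument are illustrative assumptions rather than NeatVision definitions.

public class ForComponentSketch {
    // hypothetical stand-in for whatever processing the feedback subsystem performs
    static int feedbackSubsystem(int data, int currentCount) {
        return data + currentCount;                 // e.g. accumulate the loop count
    }

    public static void main(String[] args) {
        int start = 0, finish = 5, increment = 1;   // the loop arguments of the For component
        int data = 10;                              // data arriving at the main input

        for (int current = start; current < finish; current += increment) {
            data = feedbackSubsystem(data, current);   // one pass through the feedback path
        }

        System.out.println(data);                   // data leaving via the main output (prints 20)
    }
}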
Conditional Data Processing. Conditional processing functionality is
provided by the If and Else components. The If component requires a main
data input and a conditional input. If the conditional input is true then the
data from the main input is sent via the feedback output through the feedback subsystem; the resulting data from the feedback subsystem is passed to
the feedback input and then out through the main output. An If component
may be followed by an Else component. The main output from the If component must be connected directly to the main input of the Else component.


Fig. 7.7. Examples of data looping components. (a) A feedback loop. (b) A For loop.

No intermediate processing is permitted. Data will only be sent through the


feedback subsystem of the Else component if the conditional input to the
corresponding If component is false. If this is not the case, the data is sent
directly to the main output of the Else component. Data will only be passed
through the feedback subsystems of either of the conditional data processing components once. The path chosen will depend on the conditional input
supplied to the If component.
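Viewed in the same conventional Java terms, the If/Else pair behaves like a two-way branch in which exactly one of the two feedback subsystems processes the data, selected by the conditional input. The sketch below is illustrative only; both subsystem methods and the values used are assumptions, not NeatVision definitions.

public class IfElseSketch {
    static int ifFeedbackSubsystem(int data)   { return data * 2; }   // hypothetical If path
    static int elseFeedbackSubsystem(int data) { return data - 1; }   // hypothetical Else path

    public static void main(String[] args) {
        boolean condition = true;               // the conditional input to the If component
        int data = 21;                          // data arriving at the main input

        int result = condition ? ifFeedbackSubsystem(data)
                               : elseFeedbackSubsystem(data);

        System.out.println(result);             // data at the main output (prints 42)
    }
}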
7.1.4 System Development
All of the visual programming components provided by the NeatVision application have been introduced and the general operation of each of the component groups has been described in detail. A visual program can be created
by defining input data using the input components, then implementing the
desired algorithm using the processing and flow control components. The
data flow is specified by creating interconnections between the instantiated
components. The program can be completed by adding output components
to view the data resulting from the algorithm execution.
7.1.5 Sample Programme
Consider the generalised second order polynomial equation in x (Eqn. 7.2).
The roots of this polynomial can be calculated using Eqn. 7.3, an equation consisting of a combination of simple arithmetic operators: add, subtract, multiply, divide, power and square root.

Fig. 7.8. Examples of conditional processing components. (a) If loop. (b) If/Else loop.
ax^2 + bx + c = 0 \qquad (7.2)

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \qquad (7.3)
Using visual equivalents for these operators, a program representing Eqn. 7.3
can be created (Fig. 7.9). It should be noted that this visual program is a
simplified representation of the equation and will only solve for real values of
x.

Fig. 7.9. A visual program which solves the real roots of a second order polynomial
equation.
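For comparison with the visual program of Fig. 7.9, the same calculation can be sketched as a plain Java code fragment. The coefficient values below are purely illustrative and, as with the visual program, only the real-root case is handled.

// Real roots of a*x^2 + b*x + c = 0 (Eqn. 7.3); assumes the discriminant is non-negative.
double a = 1.0, b = -3.0, c = 2.0;                        // example coefficients
double discriminant = b * b - 4.0 * a * c;                // the term under the square root
double root1 = (-b + Math.sqrt(discriminant)) / (2.0 * a);
double root2 = (-b - Math.sqrt(discriminant)) / (2.0 * a);
System.out.println("x = " + root1 + " or x = " + root2);  // prints x = 2.0 or x = 1.0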


7.2 Java Programming in NeatVision


NeatVision provides a fully functional Java development environment, allowing the user to create new components within the NeatVision application.
New components can be developed using the functionality of other components, either provided by the system or developed by the user. Any of the
standard system data types can be used as arguments or inputs to newly
developed components and any form of data can be generated by user defined components and then passed to the system in the form of an object.
This section will introduce the data flow concept, i.e. the order in which the
data is passed between components in a valid system. The architecture and
interface of each component is also discussed. Finally, several programming
examples, based on both simple data processing and image processing using
several different imaging models, are presented.

7.2.1 Data Flow Programming


The data flow through a visual program is controlled by an algorithm based on
data flow programming. This algorithm attempts to bring any visual system
to a steady state. In general, data in a visual program flows from the input
components to the output components and passes through the intermediate
processing and flow control components, depending on the data interdependencies within the system. It is important to note that a component will only
execute if it has received data on all connected inputs. There are two possible
states for any component. These states are presented in Table 7.4.
Table 7.4. The two possible states of a NeatVision component.

STEADY STATE: The component is either waiting for data on one of the input connections, or it has received data on all inputs, processed this data and passed any results to the output connections.

WAITING TO PROCESS: The component has received data on all its input connections and is waiting to process this data and pass the results to the output connections.

Visual program initialisation involves identifying all the input components


and setting the state of these components to WAITING TO PROCESS. Each
of these components then executes by retrieving the data from the associated


user interface which it then passes to the output connections for further
processing. Fig. 7.10 illustrates the data flow in a simple linear system.

Fig. 7.10. Data flow through a simple linear visual program.

After the visual program has been initialised, both of the image input
blocks are WAITING TO PROCESS. Each of the image input blocks reads
the image from the associated user interface and writes this data to the output connection. The data from IMAGE A, which returns to STEADY STATE,
is then sent to the LOWPASS which immediately signals WAITING TO PROCESS and the data from IMAGE B, which also returns to STEADY STATE,
is sent to ADD. ADD does not signal WAITING TO PROCESS at this point
because it has only received data on one of two inputs. LOWPASS processes
its data and sends the results to the ADD component and then it goes into
STEADY STATE. When the ADD component receives data on the second
of the two inputs it signals WAITING TO PROCESS, processes the input
data, sends the resulting data to the outputs and then goes into STEADY
STATE. When INVERT receives the data it immediately signals WAITING
TO PROCESS, processes the input data, sends the resulting data to the
next component and then goes into STEADY STATE. The image output
block then receives the data, goes into WAITING TO PROCESS, sends the
data to the associated user interface and then returns to STEADY STATE.
The visual program is now in steady state, with execution completed.
The aim of this simple example is to show that the data flow in a visual
program is handled by the application. Components will only become active,
or execute when they have received data on all connected inputs. As a result,
component development is straightforward and independent of the overall
visual program, as only a small programming overhead is required to interface
with the underlying system.
7.2.2 Standard Component Architecture

A NeatVision component has an appearance, an interface and performs an


operation. The appearance describes the physical appearance of the component in terms of the dimensions, number of inputs and outputs and the
name which appears below the component. The interface defines the type and description for each input and output connection created by the appearance. Finally, the operation of the component describes what operation the component performs on the input data in order to generate the output data.
A sample component is presented in Fig. 7.11. The appearance of this
component can be completely described by the following section of code.
It should be clear that the programming overhead required to generate a
new component is minimal and the code is intuitive. The appearance of a
component is always defined in the constructor.
public class componentAppearance extends NeatComponent {
    // The constructor.
    public componentAppearance()
    {
        inputs = 2;
        outputs = 2;
        name = "COMPONENT\nNAME";
        width = 30;
        height = 20;
    }

    public void setup()
    {
    }

    public Object create(DataBlock object)
    {
        // the component's operation is defined here
        return null;
    }
}

Fig. 7.11. A NeatVision component.

The interface which has been created as a part of the component appearance must now be initialised. This involves setting the data type, mode and description for each of the input and output connections. The interface is always initialised within the setup() method. There are two methods required to initialise a connection. These are setConnectionType(), which defines the colour of the connection, and setConnectionDescription(), which defines the description of the connection. This description will appear in the status bar when the mouse floats over the associated connection status indicator. Valid arguments for the setConnectionType() method are listed in Table 7.5.
public class Invert extends CoreInterface {
    public Invert()
    {
        inputs = 1;
        outputs = 1;
        name = "INVERT";
        width = 30;
        height = 20;
    }

    public void setup()
    {
        Input[0].setConnectionType(IMAGE);
        Input[0].setConnectionDescription("Input");
        Output[0].setConnectionType(IMAGE);
        Output[0].setConnectionDescription("Output");
    }

    public void main()
    {
    }
}

Table 7.5. Valid arguments for the setConnectionType() method.

Type      Description              Colour
IMAGE     Any Image data type      RED
INTEGER   Integer data             GREEN
DOUBLE    Double data              BLUE
BOOLEAN   Boolean data             ORANGE
STRING    String data              PINK

7.2.3 Adding Functionality


Functionality can be added to a component by overwriting the create() method. The create() method has only one argument, a DataBlock object which holds references to the data at the component's input connections, and one return value, which can be either an Object object when there is only one output connection or a DataBlock object when there are multiple output connections. The methods provided by the DataBlock class are presented in Table 7.6. The DataBlock provides support for the retrieval of any type of object using the getObject() method in conjunction with type casting, thus allowing new component development based on user defined data types. The create() method of a data processing component will always consist of three stages: (1) retrieval of input data from the DataBlock argument, (2) the processing of this data and (3) returning of the results to the system.
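As an illustration of these three stages, the following minimal create() method assumes a component with a single integer input and a single integer output; the doubling operation itself is purely illustrative.

public Object create(DataBlock arguments) {
    // stage 1: retrieve the input data from the DataBlock argument
    int value = arguments.getInteger(0);

    // stage 2: process the data (here it is simply doubled)
    int result = value * 2;

    // stage 3: return the result to the system (a single output, so a plain Object)
    return new Integer(result);
}

For a user defined data type the same pattern applies, with the generic getObject() method used together with a cast, e.g. MyData data = (MyData)arguments.getObject(0); where MyData is a hypothetical user class.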
7.2.4 Examples
All of the key elements of NeatVision component design have been introduced. Now, examples based on the concepts discussed in this section are presented. Examples range from simple data handling and data processing components to complex image processing components. These examples describe how to overwrite the create() method of the NeatComponent class and can easily be integrated into a sample NeatVision component.
As NeatVision is primarily an image processing application, special attention is given to the implementation of image processing operations using the wide range of imaging techniques supported by NeatVision.

Table 7.6. The methods associated with the DataBlock class.


Object getObject(int c): Returns the data associated with input c as an Object object.
int getInteger(int c): Returns the data associated with input c as primitive type int.
long getLong(int c): Returns the data associated with input c as primitive type long.
float getFloat(int c): Returns the data associated with input c as primitive type float.
double getDouble(int c): Returns the data associated with input c as primitive type double.
boolean getBoolean(int c): Returns the data associated with input c as primitive type boolean.
String getString(int c): Returns the data associated with input c as a String object.
Image getImage(int c): Returns the data associated with input c as a java.awt.Image object.
GrayImage getGrayImage(int c): Returns the data associated with input c as a com.neatvision.GrayImage object.
RGBImage getRGBImage(int c): Returns the data associated with input c as a com.neatvision.RGBImage object.
RenderedOp getRenderedOp(int c): Returns the data associated with input c as a javax.media.jai.RenderedOp object.

All imaging examples are based on the Laplacian edge detector, a simple algorithm which detects edge magnitudes without regard to orientation. The Laplacian, which is defined in Eqn. 7.4, is a differential operator which has the same properties in all directions and is therefore invariant to rotation in the image.
\nabla^2 g(x, y) = \frac{\partial^2 g(x, y)}{\partial x^2} + \frac{\partial^2 g(x, y)}{\partial y^2} \qquad (7.4)

In digital images this equation is approximated by a convolution sum and


a 3 x 3 mask is generally used. For four and eight connected neighbourhoods
the Laplacian is defined by Eqns. 7.5 and 7.6 respectively. All imaging examples presented here are based on the four connected version of the Laplacian
equation (Sec. 3.2.3).
\begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix} \qquad (7.5)

\begin{pmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{pmatrix} \qquad (7.6)

Copying Object References. This example describes how data can be copied from a single input connection to multiple output connections. The create() method of the single input, dual output component which performs this operation is presented below and can be described as follows. The reference to the data associated with the input connection is retrieved using the getObject() method of the DataBlock class; zero, the index associated with the topmost input connection, is the argument to this method. A new DataBlock is then created to hold the output, or returned, data. The reference to the data at the input connection is stored in the first two entries of this newly created DataBlock object, which is subsequently returned from the create() method. The contents of the DataBlock returned from the create() method are then applied to the output connections of the component.
public Object create(DataBlock arguments) {
    // retrieve the input data ...
    Object argument0 = arguments.getObject(0);

    // create a new DataBlock and add the output data ...
    DataBlock returns = new DataBlock();
    returns.add(argument0);
    returns.add(argument0);

    // return the output DataBlock ...
    return(returns);
}

Simple Arithmetic Operations. The previous example simply passed an object reference directly from the input connection to the output connections, with no data processing performed. This example describes how to implement a simple arithmetic operation using a NeatVision component. The create() method of a dual input, single output component which performs simple arithmetic addition is presented below and can be described as follows. The objects associated with the two input connections are retrieved as int primitives using the getInteger() method of the DataBlock class. These are added together and the result is used to create a new Integer object which is then returned from the create() method. This Integer object is then applied to the output connection of the component.
public Object create(DataBlock arguments) {
    // retrieve the input data ...
    int argument0 = arguments.getInteger(0);
    int argument1 = arguments.getInteger(1);

    // perform simple arithmetic addition ...
    int return0 = argument0 + argument1;

    // create an Integer object using the result ...
    Integer returns = new Integer(return0);

    // return the output Integer ...
    return(returns);
}

NeatVision Imaging. The following two examples describe how to implement a 4-connected Laplacian edge detector using NeatVision imaging. The NeatVision imaging API specification, which consists of the two classes GrayImage and RGBImage, is presented in Appendix B. NeatVision imaging is based almost entirely on Java AWT imaging. Both examples are taken from single input, single output components and utilise several of the more important methods provided by the GrayImage and RGBImage classes respectively.
The first create() method implements a Laplacian edge detector using the GrayImage class and can be described as follows. The object associated with the input connection is retrieved from the DataBlock as a GrayImage object, with the dimensions of this image obtained using the getWidth() and getHeight() methods. These dimensions are then used to create a new GrayImage object to store the output image. The output image data is then created by performing the convolution sum on each 3 x 3 4-connected neighbourhood in the input image. The output image is then returned from the create() method for further processing by the system.
public Object create(DataBlock arguments) {
    GrayImage imageIn = arguments.getGrayImage(0);
    int width = imageIn.getWidth();
    int height = imageIn.getHeight();
    GrayImage imageOut = new GrayImage(width,height);
    int b,d,e,f,h,z;

    // 4-connected neighbourhood used by the convolution sum:
    //
    //    +---+---+---+
    //    |   | b |   |
    //    +---+---+---+
    //    | d | e | f |
    //    +---+---+---+
    //    |   | h |   |
    //    +---+---+---+

    for(int y=0; y<height; y++)
        for(int x=0; x<width; x++)
        {
            b = imageIn.getxy(x,y-1);
            d = imageIn.getxy(x-1,y);
            e = imageIn.getxy(x,y);
            f = imageIn.getxy(x+1,y);
            h = imageIn.getxy(x,y+1);
            z = (-4*e)+b+d+f+h;

            // clamp values ...
            if(z>255)z=255; if(z<0)z=0;
            imageOut.setxy(x,y,z);
        }
    return(imageOut);
}


The second create() method is based on the RGBImage class, a multi-plane version of the GrayImage class, consisting of a red, green and blue plane. The main difference between this code segment and the previous example is that the convolution sum is performed sequentially on each input image plane to generate the data for each output image plane. The individual image planes are accessed by adding an extra argument, which identifies the plane being addressed, to the getxy() and setxy() methods. In the example below the plane variable p is generated by adding a third for-loop to the standard raster scan pair. It should be noted that the processing time for this component is approximately three times that of the single plane GrayImage equivalent.
public Object create(DataBlock arguments) {
    RGBImage imageIn = arguments.getRGBImage(0);
    int width = imageIn.getWidth();
    int height = imageIn.getHeight();
    RGBImage imageOut = new RGBImage(width,height);
    int b,d,e,f,h,z;

    // 4-connected neighbourhood used by the convolution sum:
    //
    //    +---+---+---+
    //    |   | b |   |
    //    +---+---+---+
    //    | d | e | f |
    //    +---+---+---+
    //    |   | h |   |
    //    +---+---+---+

    // process each of the red, green and blue planes in turn ...
    for(int p=RGBImage.RED; p<=RGBImage.BLUE; p++)
        for(int y=0; y<height; y++)
            for(int x=0; x<width; x++)
            {
                b = imageIn.getxy(p,x,y-1);
                d = imageIn.getxy(p,x-1,y);
                e = imageIn.getxy(p,x,y);
                f = imageIn.getxy(p,x+1,y);
                h = imageIn.getxy(p,x,y+1);
                z = (-4*e)+b+d+f+h;

                // clamp values ...
                if(z>255)z=255; if(z<0)z=0;
                imageOut.setxy(p,x,y,z);
            }
    return(imageOut);
}

Java AWT Imaging. The Laplacian can also be implemented using Java AWT imaging. Using this approach the image is retrieved from the input as a java.awt.Image object, where the pixel information for this object is not directly accessible. The image must be decomposed into an array of pixel values before any low level processing can take place. Classes from the java.awt and java.awt.image packages are required for this example, consequently the following imports must be made.
import java.awt.*;
import java.awt.image.*;


The Image object is converted into an array of pixels using the PixelGrabber class. In order to construct a new PixelGrabber object we require the dimensions of the Image object. These are obtained using the getWidth() and getHeight() methods of the Image object. These dimensions are also used to create buffers for the input and output images. The pixel data associated with the input image is available after the grabPixels() method of the PixelGrabber has been called. At this point the 3 x 3, 4-connected convolution is performed and the resulting data is stored in the output image buffer. In order to simplify this example, the three image planes are combined to generate a grey scale image. This operation is performed using a method similar to avg() listed below.
private int avg(int pixel) {
    // get the average of the intensity planes (R+G+B)/3 ...
    return (((pixel>>16)&0xFF) + ((pixel>>8)&0xFF) + (pixel&0xFF)) / 3;
}

The resulting image data must be converted into an Image object before it can be returned from the create() method. This is achieved using the createImage() method of the Toolkit class and the MemoryImageSource class, which is an instance of the ImageProducer interface. NeatVision imaging was developed as a user friendly interface to the functionality provided by Java AWT imaging. By comparing this example to the GrayImage example from the NeatVision imaging section it should be quite obvious which approach is more straightforward to use.
public Object create(DataBlock arguments) {
    // retrieve the input data ...
    Image imageIn = arguments.getImage(0);

    // wait until the image has fully loaded (a MediaTracker must be attached
    // to an AWT Component; a throwaway Canvas serves that purpose here) ...
    MediaTracker tracker = new MediaTracker(new Canvas());
    tracker.addImage(imageIn, 0);
    try { tracker.waitForID(0); } catch (InterruptedException ie) { }

    // get the image dimensions ...
    int width = imageIn.getWidth(null);
    int height = imageIn.getHeight(null);

    // grab the image pixels into a buffer ...
    int[] buffer = new int[width*height];
    int[] output = new int[width*height];
    PixelGrabber grabber = new PixelGrabber(imageIn,0,0,width,height,buffer,0,width);
    try { grabber.grabPixels(); } catch (InterruptedException ie) { }

    // perform the Laplacian operation; the outermost row and column are
    // skipped so that the 3 x 3 neighbourhood stays inside the buffer ...
    //
    //    +---+---+---+
    //    |   | b |   |
    //    +---+---+---+
    //    | d | e | f |
    //    +---+---+---+
    //    |   | h |   |
    //    +---+---+---+
    //
    for(int y=1; y<height-1; y++)
        for(int x=1; x<width-1; x++)
        {
            int b = avg(buffer[x+(y-1)*width]);
            int d = avg(buffer[x-1+y*width]);
            int e = avg(buffer[x+y*width]);
            int f = avg(buffer[x+1+y*width]);
            int h = avg(buffer[x+(y+1)*width]);
            int z = (-4*e)+b+d+f+h;

            // clamp values ...
            if(z>255)z=255; if(z<0)z=0;

            // pack the grey scale result into an opaque ARGB pixel ...
            output[x+y*width] = 0xFF000000 | (z<<16) | (z<<8) | z;
        }

    // create and return a new Image from the output pixels ...
    ImageProducer producer = new MemoryImageSource(width,height,output,0,width);
    Image imageOut = Toolkit.getDefaultToolkit().createImage(producer);
    return(imageOut);
}

Java 2D Imaging. This example describes how a 4-connected Laplacian edge detector can be implemented using Java 2D imaging. The Java 2D imaging API provides high level classes for performing the convolution operation: ConvolveOp, which performs the convolution, and Kernel, which specifies the convolution kernel. These classes are members of the java.awt.image package and in order to use them the following import must be made.
import java.awt.image.*;

The operation of the create() method presented below can be described as follows. The data at the input connection is retrieved in the form of a BufferedImage object. The type and dimensions of this object are then obtained using the getType(), getWidth() and getHeight() methods of the BufferedImage class. Using this information a new BufferedImage object is created to store the output image. A convolution kernel is then created using the convolution mask associated with the Laplacian edge detector. This kernel is in turn used to construct a ConvolveOp object which ultimately performs the convolution operation on the input image. The resulting output image is then returned from the create() method.
public Object create(DataBlock arguments) {
    BufferedImage imageIn = arguments.getBufferedImage(0);
    int type = imageIn.getType();
    int width = imageIn.getWidth();
    int height = imageIn.getHeight();
    BufferedImage imageOut = new BufferedImage(width,height,type);
    float[] matrix = {0.0F, 1.0F, 0.0F,
                      1.0F,-4.0F, 1.0F,
                      0.0F, 1.0F, 0.0F};
    Kernel kernel = new Kernel(3,3,matrix);
    ConvolveOp convolve = new ConvolveOp(kernel);
    convolve.filter(imageIn,imageOut);
    return(imageOut);
}
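One point worth noting with this approach is edge handling: when constructed with a kernel alone, ConvolveOp zero-fills the border pixels of the destination image. If the unprocessed source pixels are preferred at the border, the alternative constructor provided by Java 2D can be used, for example:

// Copy the source pixels at the image border instead of zero-filling them.
ConvolveOp convolve = new ConvolveOp(kernel, ConvolveOp.EDGE_NO_OP, null);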

Java Advanced Imaging (JAI). This implementation of the 4-connected Laplacian edge detector is based on JAI and requires classes from the javax.media.jai package, hence the following import must be made.
import javax.media.jai.*;

This example is very important as it describes how to access JAI operations from within a NeatVision component, thus allowing NeatVision access to the extensive and expanding range of imaging algorithms provided by Sun Microsystems. The operation of this create() method can be described as follows. The data associated with the input is retrieved in the form of a RenderedOp. As before, the convolution mask is defined and used to create a convolution kernel object (this time a KernelJAI object). The code segment below illustrates how JAI operations are accessed, using the static create() method of the JAI class. The first argument is the operation name, followed by the operation arguments. The resulting RenderedOp is then returned from the create() method.
public Object create(DataBlock arguments) {
    RenderedOp imageIn = arguments.getRenderedOp(0);
    float[] matrix = {0.0F, 1.0F, 0.0F,
                      1.0F,-4.0F, 1.0F,
                      0.0F, 1.0F, 0.0F};
    KernelJAI kernel = new KernelJAI(3,3,1,1,matrix);
    RenderedOp imageOut = JAI.create("Convolve",imageIn,kernel);
    return(imageOut);
}
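The same JAI.create() pattern gives access to the other operators registered with JAI. For example, an image could be loaded directly from disk using the standard "fileload" operation (the file name below is purely illustrative):

RenderedOp source = JAI.create("fileload", "peppers.tif");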

Accessing Operations. In this example we describe how operations defined within other NeatVision components can be accessed. In this case the
Laplacian operation is being accessed. For more information about the range
of NeatVision operations available, please consult the NeatVision web site
(http://www.NeatVision.com/).
public Object create(DataBlock arguments) {
    GrayImage imageIn = arguments.getGrayImage(0);
    GrayImage imageOut = NeatVision.create("laplacian", imageIn,
                                           new Integer(4));
    return(imageOut);
}

7.3 The NeatVision Application


The two main features of the NeatVision application were introduced in the
previous sections of this chapter, visual programming and new component
development. This section focuses on how these features can be accessed
through the application user interface. The NeatVision user interface was
developed using Java Swing components and consists of five main sections:
Controls, error feedback, the component explorer, the status bar and the
virtual desktop (where all of the application and tool windows reside after
they are launched).

7.3.1 Visual Programming in NeatVision


When NeatVision is invoked the main application window appears (the main sections of this window are described in Fig. 7.12). Initially the virtual desktop is empty and there are no messages in the message window. This section describes how to create, compile and execute a visual program using NeatVision.

Fig. 7.12. The main sections of the NeatVision user interface, the controls, the
component explorer, the status bar, the virtual desktop and the message window.

The visual program described here will implement simple arithmetic addition and requires only four components: two integer inputs, an arithmetic addition component and an integer output. These components are all available from the component explorer and can be found under the system tab. First a blank visual workspace is required; this can be created by selecting New Workspace from the File menu. The new visual workspace, which is located within the virtual desktop, can now be populated with the required components.
As mentioned previously the required components are all available from
the system tab of the component explorer; the integer inputs can be found in


the Input folder, which is a subfolder of the Data folder, one of the main system folders. The integer output can be found in the Output folder, which is also a subfolder of the Data folder. Finally, the add component is located within the Basic Functions folder, a subfolder of the Math folder which is
located in the main system menu. All of these components can be added to the
newly created visual workspace by simply dragging them from the component
explorer to the desired location within the visual workspace. A component
can also be instantiated by double clicking on the desired component name
in the explorer.
The components of the visual program which are now present in the visual workspace must be combined together to create a valid program. The two
integer inputs must be connected to the input connectors of the add component. The output of the add component must then be connected to the
input of the integer output component. Connections are made by dragging
the output connector of one component to the input connector of another
component. If the connection is valid then the connection status indicator
will go green otherwise, in the case of an invalid connection, the connection
status indicator will stay red. If the connection status indicator is orange then
a connection to the component is not necessarily required. In this case the
default value is used (the default value can be overwritten by connecting an
appropriate input). Information about a connection can be displayed by moving the mouse over the connection status indicator of the relevant connection.
The description of this connection, including the default value if available is
then displayed in the status bar. In order to delete a connection it must first
be highlighted; this is done by clicking on the connection terminator. The
highlighted connection can then be deleted by pressing the delete key on the
keyboard. A component can be deleted in the same way.
Once the required connections have been made the visual program must
be compiled. The compiler can be invoked by selecting the Compile option
from the Project menu. After compilation any errors or warnings associated
with the visual workspace are displayed in the message window. The source
of any error message can be found by double clicking on the error message
itself. Once all the errors have been rectified the inputs to the visual program
can be initialised. This is performed by double clicking on all of the input
components, in this case the two integer inputs. The user interface associated
with the component will then appear and can be used to specify the input
value. Any value typed into the user interface associated with an integer input component will appear below the related component (this is particularly
useful for reference purposes).
The visual program has been defined and the inputs have been initialised.
It can now be executed by pressing the run button on the toolbar; this button can be identified by a single forward pointing arrow. Once this button has
been pressed the workspace should execute almost instantaneously. The result
of the addition can be seen by double clicking on the integer output, causing

Fig. 7.13. A visual implementation of simple arithmetic addition.

the user interface associated with the integer output to be displayed. The
result will also appear immediately below the integer output component in
the visual workspace, again useful for reference purposes.
This example has described how a simple visual program, based on integer arithmetic, can be created using NeatVision. The same procedure can be used to create more complex visual programs, based on the visual programming theory introduced in the first section of this chapter and using the wide range of image and general data processing algorithms provided by NeatVision.
7.3.2 Image Processing

NeatVision is primarily a machine vision application and as such it provides a range of image analysis tools. A simple image addition example is
now presented that will demonstrate how to import image data into a visual
workspace and how to view the data that results from the image processing
operation. As before, the components required for the visual program are
located in the component explorer and dragged into a new visual workspace.
The components required for this example are an image adder component,
an image output and two image inputs. Once these components have been
instantiated they are connected together to create the desired program.
The two image inputs must now be initialised. As previously, this is achieved by double clicking on the relevant input component. In the case of image inputs, a file dialog box will then appear; this is used to select a file from the file system. When the desired file has been selected an image
viewer window appears displaying the selected image (the image file formats
supported by NeatVision are presented in Appendix A). The image input
component is now initialised. In order to select a new image to be associated
with this image input a different procedure must be carried out. The image
viewer associated with the relevant image input component must be selected.
Then the Open Image menu item from the File menu is selected. This results
in a file dialog box being displayed, which can then be used to select a new
image.
This visual program is compiled and if any errors or warnings occur they
must be corrected. Once this has been completed the program can be executed. The resulting image can then be displayed by double clicking on the image output component. The visual program is presented in Fig. 7.14.

Fig. 7.14. A visual implementation of simple image addition.
The image viewer tool supports zooming and pseudo colouring; these are accessible using the Image menu which appears in the menubar whenever an image viewer is selected. A host of image analysis tools can also be accessed from the Image menu, including histogram, cumulative histogram, 3-D
profile mesh viewer and horizontal and vertical intensity scans.
The histogram and cumulative histogram can be launched by using the
Histogram submenu of the main Image menu. There are five buttons associated with a histogram window; these are used to select which colour planes
are represented by the histogram viewer and whether anti-aliasing or smoothing is turned on or off. Any combination of the three colour planes can be
displayed and as a final option the grey scale equivalent of these three planes
can also be displayed.
The 3-D profile mesh viewer can be launched from the Probe submenu of
the main Image menu. A rectangular area is selected by pressing the right
mouse button over the top left corner of the desired area and then dragging
until the highlighted rectangle encloses the entire area. This selection tool can
only be used when the image is being viewed at its normal (1:1) resolution,
i.e. the zoom tool has not been used. Once the area has been selected the
3-D profile mesh can be launched. The resulting mesh can be manipulated
using the options provided by the Probe menu, such as zoom in, zoom out, tilt
forward, tilt backward, spin clockwise, spin anti-clockwise, increase intensity
and decrease intensity. These options can also be accessed using the toolbar
and the keyboard. The keystroke controls are listed in Table 7.7. Anti-aliasing
can be used with the 3-D profile mesh viewer for smoother results; this option is turned off by default as it slows down system performance. A sample 3-D profile mesh viewer is presented in Fig. 7.16.

Fig. 7.15. An example of the histogram tool. (a) Source image. (b) Resulting
histogram. (c) Cumulative histogram.
Table 7.7. The keystroke controls for the 3-D profile mesh viewer.

Options: Zoom In, Zoom Out, Tilt Forward, Tilt Backwards, Spin Clockwise, Spin Anti-clockwise, Increase Intensity, Decrease Intensity.

The horizontal and vertical intensity scanners can also be launched from
the Probe submenu of the main Image menu. These tools describe the intensity along horizontal and vertical scan lines which are highlighted in the
original image. The location of these scan lines can be changed by dragging
the mouse pointer to the desired point in the image. The intensity of the
three colour planes can be displayed individually and, as with the histogram, the average of these three colour planes can also be displayed. Anti-aliasing is available for smoother results; however, it is turned off by default as it slows down system performance. The intensity scan user interfaces are presented in Fig. 7.17.
7.3.3 Other User Interfaces

So far only two user interfaces have been presented, the Image Viewer and
the Integer viewer. The final part of this section describes the remaining


Fig. 7.16. An example of the 3-D viewer. (a) Source image. (b) Resulting 3-D profile mesh.
Fig. 7.17. An example of the intensity scanner. (a) The source image. (b) Horizontal scan. (c) Vertical scan.

user interfaces provided by NeatVision. Some of these are associated with input and output components, while others are associated with processing components. Each of these user interfaces can be displayed by double clicking on the related component.
The remaining user interfaces associated with input and output components are String, Double, Boolean and Array. String consists of a scrollable text area which represents an ASCII string. The string data type is particularly useful for presenting the results of a visual program. Double consists of a single text field which represents a double value; the value entered into this text field will immediately appear below the related component. Boolean can be used to represent one of two values, true or false, where the desired value can be selected by choosing the relevant check box. Finally the Array

user interface represents a square integer array or structuring element, where


the size of this array can be changed dynamically using the controls located
at the bottom of the user interface. In the case of a structuring element the
"don't care" entries are specified by non integer values. Each of these user
interfaces and the related components are presented in Fig. 7.18.

Fig. 7.18. User interfaces associated with the input and output.

There are four user interfaces associated with the processing components which are packaged with NeatVision. These components are region of interest, single, double and triple level threshold. Whenever one of these components executes the user interface automatically appears. The user must then either select a region of interest or specify a threshold value and then click the button labelled OK. The component then reads the required information from the user interface and processes the input data. A selection of the processing component interfaces are presented in Fig. 7.19.
7.3.4 The Integrated Software Development Environment
In this section we present an example describing how to integrate a fully
functional component into NeatVision. In order to add new functionality to
NeatVision, the developer's plug-in is required; this can be obtained from the NeatVision web site.
The steps required to integrate a new component are as follows. Select the
user tab from the component explorer. Right click the mouse anywhere in the
user area. From the resulting popup menu select the Add New Component
option. Once this has been selected a dialog box appears. This dialog presents
three options: NeatVision component, Java class and Java interface. The first option, NeatVision component, should be selected. The desired name for the
component is then typed in the text field labelled Name. Once this has been
completed the Component Development Wizard (CDW) can be launched by
pressing the button labelled Next. The CDW allows the user to create the

Fig. 7.19. User interfaces available for use with the processing components. (a)
Region of interest selector. (b) Triple threshold.

skeleton for a NeatVision component using a graphical user interface. The


skeleton specifies everything except for the component operation, which is defined by the create() method. The requirements for this example are a single image input, single image output component with dimensions of thirty pixels wide by twenty pixels high. The connection descriptions can be left blank for the purpose of this example. The code represented by the wizard can then be generated by pressing the button labelled Create. The code then appears in a new text viewer window within the virtual desktop. In order to
create a functional component, overwrite the create method in the resulting
code with one of the examples outlined previously. The steps listed above are
illustrated below in Fig. 7.20.
This code can now be compiled by highlighting the code window and then
selecting Compile from the Project menu. If there are any errors in the code
they will be highlighted in the message window, with the source of any error
located by double clicking on the related message in the message window.
After the errors have been corrected the component is available for instantiation. Fig. 7.21(a) illustrates the code generated with the assistance of the
component development wizard. Fig. 7.21(b) shows the generated component
as used in a simple visual program.

7.3.5 The Help Viewer


NeatVision has a built-in HTML based help viewer which can be used to view
the help topics, accessible from the main Help menu. The help viewer can also
be used to view the help for the NeatVision components under the System
tab in the component explorer. This can be performed by right clicking on

7.3 The NeatVision Application

~Gr~E"NCO

253

onen1

Add Eolde.

gename
C.aala Qroup
V1ew~Outte

View!lelp
C~ Component

~opy Componenl
~as:18 Componenl
Qelete Component

fie..

Cancel

(b)

(~

Compourn rtf

II H.1mIt

II PI.cl.n

c_ _

1
leplaclen

ltv.
o

'"

CreiIMe

Cancei

(c)
Fig. 7.20. New component development. (a) Select Add New Component from the
popup menu of the user area of the component explorer. (b) Select NeatVision component to be added, specifying the component name. (c) Define the component
skeleton.

the relevant component either in the explorer or in the visual workspace and
selecting View Help from the resulting popup menu.
The help viewer can be controlled using the buttons which are located in
the main toolbar. These buttons allow the user to move forward and back one page
at a time, stop the current page from loading and go to the home page. The
URL representing the current help page is displayed alongside the buttons.

Fig. 7.21. NeatVision code and its associated component. (a) The resulting code
generated by the component development wizard plus details of the Laplacian edge
detector to be implemented by the component. (b) The Laplacian component being
used in a sample visual program.

Only one instance of the help viewer can be active at any one time. If the user tries to access help when the viewer is already open, the current page is unloaded and the new page is loaded in its place.

7.4 Sample Applications


As we have seen, NeatVision provides an image analysis software development
environment that can work at several levels. For example, at a relatively low
level individual pixels can be manipulated. Alternatively, NeatVision's built-in functionality can be used to generate solutions to complex machine vision
problems.
This section examines a range of sample problems and their solutions using the ideas outlined throughout this book. The examples are for illustrative
purposes and do not represent the optimal, or even the only approach to the
problems posed.
7.4.1 Low-level Programming
Aim: To generate a white circle from first principles. The circle should be
placed in the centre of the input image and have a radius of 30 pixels. This
example illustrates how NeatVision can be used for low-level pixel manipulation.


Operational Summary: The visual workspace illustrated in Fig. 7.22 is equivalent to the following code segment:

int xcent = 128, ycent = 128, radius = 30, colour = 255;

for (int i=0; i<360; i++)
{
    // plot one point of the circle for each degree of arc
    int x = (int)(radius*Math.sin(Math.toRadians(i)));
    int y = (int)(radius*Math.cos(Math.toRadians(i)));
    input.setxy(xcent+x, ycent+y, colour);
}

NeatVision implements a standard for-loop structure. The feedback actions will be implemented on a blank input image as the loop variable increments from 0 to 360 in steps of 1 (as determined by integer inputs to the
for-loop block). The top output of the for-loop block indicates the final
loop output. The middle output indicates the intermediate loop output for
each pass through the feedback cycle. The loop variable is indicated by the
lower output from the for-loop block. Once the location of the white pixel
value to be written has been calculated, it is converted to an image coordinate
(PtoC) and the appropriate pixel is set to the desired grey scale (SETPIX).

Fig. 7.22. Low level functionality of NeatVision. (a) Visual workspace. (b) Output

image.


7.4.2 High-level Programming

Working at a higher level maximises the use of the large number of predefined image processing and analysis algorithms available within NeatVision.
Aim: To implement a simple blob fill algorithm using the label by location operator (Fig. 7.23).
Operational Summary: The convex hull (CONVEXH) is applied to the input image, which is then inverted. This is then passed to the label by location operator (LABELL) to produce a number of distinct grey scale regions corresponding to the image's convex deficiencies. The hole region is selected using the double threshold operator (DOUBLE). In the example this is set to five, although this value will be dependent on the input image. Once selected, this is combined (XOR) with the original image to fill the hole region.

Fig. 7.23. Simple blob filling algorithm. (a) Visual workspace. (b) Original binary
image. (c) Output from the label by location operator. (d) Blob filling applied to
(b).


7.4.3 Isolating the Largest Item in an Image


Aim: To isolate the largest item in a scene (Fig. 7.24).
Operational Summary: A binary version of the input image is generated by passing it through a threshold block (SINGLE) set to a user defined level. The input values to the threshold blocks can be either predefined integer inputs, or the user can select them interactively via the slide bars. The image is eroded (ERODE) to remove small white regions prior to the application of the label by size (LABELS) operator. This assigns a unique grey scale to every distinct binary object based on its size (the larger the object, the greater the grey scale value assigned to that object). The processing flow path is split to allow multiple operations on the same data. The lower path allows the image to be enhanced and sent to an output display window. The middle path finds the largest grey scale value in the image (this corresponds to the grey scale value associated with the largest object) and is used as the threshold value for the dual threshold operation (DOUBLE). The upper path data is passed through a dual threshold block, allowing a unique blob to be isolated and displayed. Finally, the area of this region is calculated (by counting the white pixels (CWP)) and displayed as an integer value.

Fig. 7.24. Find and isolate the largest object in a scene. (a) Visual workspace. (b)
Original image. (c) Objects labelled by size. (d) Segmentation of the largest object.


7.4.4 Bottle-top Inspection


Aim: To count the number of flutes (i.e. the outermost triangular-like regions) on a bottle-top (Fig. 7.25).
Operational Summary: The input image is thresholded at a user defined
level to produce a binary image. Isolated white pixels are removed (REMISO)
and the largest white region (BIGBLOB) in the image is found and eliminated.
The remaining regions are consolidated using erosion and dilation to produce
the image illustrated in Fig. 7.25 (c). This is done prior to labelling each region
with a unique grey scale (LABEL). The CENTROID operator replaces each grey
scale region (which corresponds to each detected crown flute) with its centroid
(as represented by a white pixel). These pixels are then counted to indicate
the number of flutes in the image. Finally the PSQUARE operator replaces each
white pixel by a small square region illustrating the crown flutes.


Fig. 7.25. Counting the flutes on a bottle-top image. (a) Visual workspace. (b)
Original image. (c) Segmentation of the crown flutes. (d) The centroid of each flute
is calculated and replaced by a single white pixel. These are counted to give the
final flute count. The single centroid pixels are replaced by white squares for clarity.


7.4.5 Plant-stem Location


Aim: To find the stem points in the plant image (Fig. 7.26).
Operational Summary: After thresholding and inversion (INV) the original input image is eroded (ERODE) and dilated (DILATE) to eliminate the stem
from the binary image. The resultant mask is then used to segment out the
stem. Since the mask is not an exact match with the leaf data, some additional
noise will also exist. This is eliminated by finding the biggest white region
in the image, i.e. the region corresponding to the stem. The image is then
thinned (THIN) prior to the application of 3 x 3 binary junction (JUNCTION)
and corner detectors (CORNER). These produce a point corresponding to each
valid junction and corner. These are combined (OR) and overlaid on the original grey scale image as small white squares.

7.5 Conclusion
This chapter has described the NeatVision application. Initially the theory
behind the application was introduced, describing in depth the concepts of
visual programming and new component development using the Java programming language. The application user interface was then introduced and
a description of how to use the core concepts described in the first two sections of this chapter was presented. A range of sample applications were
also introduced to illustrate the power of the NeatVision environment. This
chapter should help new NeatVision users to get the most from the NeatVision application and provide experienced users with a good reference to the environment's capabilities.



Fig. 7.26. Plant-stem location. Original image courtesy of Prof. Bruce G. Batchelor, University of Wales. (a) Visual workspace. (b) Original image. (c) Binary
version of the input image. The input image has been thresholded at grey scale 200
and inverted. (d) Application of morphological operators to remove the stem. (e)
Isolation of the stem. This is the difference between the binary image (c) and the
image illustrated in (d). The largest blob operator has also been applied to remove
any extraneous noise after the differencing operation. (f) Detected stem junctions
overlaid on the original grey scale image. Each detected corner and junction point
is illustrated by a small white square.

A. NeatVision Graphic File Formats

Java provides support for the two most common file formats found on the
Internet, the Graphics Interchange Format (GIF) and the Joint Photographic
Experts Group File Interchange Format (JPEG). These file formats are not
generally used for machine vision applications as the compression techniques
which they employ distort image information. The pixel error introduced by
either compression technique is typically 1%; although this may not be visually apparent, it does result in unacceptable information loss. In order to preserve image information, support for several non-corrupting file formats is implemented in Java (the three main formats required are raw image data
(BYT), PC Paintbrush (PCX) and Tagged Image File Format (TIFF)). The
definitive list of file formats (Murray & Ryper 1996) supported by NeatVision is outlined below. R indicates a read-only file format and RW indicates a
read/write file format.
BMP (RW) Microsoft Windows Bitmap, a bitmap image file using Run Length
Encoding (RLE). It supports a maximum pixel depth of 32 bits and a
maximum image size of 32K x 32K pixels.
BYT (RW) Raw image data, grey scale only with a maximum pixel depth of
8 bits. The data is stored with the first byte of the file corresponding to
the top left hand corner of the image. No header is contained within the
image data and details about the dimensions must be supplied by the
user. This file format preserves the image data.
FPX (R) Kodak FlashPix is a multiple-resolution, tiled image file format
based on JPEG compression.
GIF (R) Graphics Interchange Format (GIF), is a bitmap file format which
utilises Lemple-Zev-Welch (LZW) compression, limited to a maximum
colour palette of 256 entries and a maximum image size of 64K x 64K
pixels.
JPEG (JPG) (RW) Joint Photographic Experts Group (JPEG) file interchange format is a bitmap file utilising JPEG compression, an encoding
scheme based on the Discrete Cosine transform (DCT). It has a maximum colour depth of 16.7 million colours and a maximum image size of
64K x 64K pixels.


PCX (RW) PC Paintbrush (PCX), a bitmap file using either no compression
or RLE. A maximum colour depth of 24 bits is available and a maximum
image size of 64K x 64K pixels is supported.
PNG (RW) Portable Network Graphics (PNG), a lossless image compression
format (i.e. no information loss). PNG images can be stored using 8, 24
or 32 bits per pixel; the compression is lossless at all supported pixel
depths.
PBM (RW) Portable BitMap. The portable bitmap format is a lowest common denominator monochrome file format. It was originally designed for
mailing bitmaps between different platforms using typical network mailers. Now it serves as the common standard for a large family of bitmap
conversion filters.
PGM (RW) Portable Greymap Utilities. This is a non-compressed bitmap
format, hence allowing image data to be left intact. It has a maximum
grey scale depth of 256 levels.
PPM (RW) Portable PixelMap. The portable pixelmap format is a lowest
common denominator colour image file format.
RAS (RW) Sun Raster Image (RAS), a bitmap file format using either no
compression or RLE. A maximum of 16.7 million colours are supported
and the maximum image size is undefined.
RAW (RW) Raw image data. This is similar to the BYT format described earlier except in this case colour image data is also supported. The image
data is stored in RGB triplets. The image dimensions and other information must be specified by the user at load time.
TIFF (TIF) (RW) Tagged Image File Format, a bitmapped file format supporting
a wide range of compression techniques: uncompressed, RLE, LZW, International Telegraph and Telephone Consultative Committee (CCITT)
Group 3 & Group 4 and JPEG. It supports a maximum pixel depth of 24
bits (16.7 million colours) and a maximum image size of 4 gigapixels.
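As the BYT and RAW entries above note, raw files carry no header, so the image dimensions must be supplied by the user. The following is a minimal sketch of how a BYT grey scale file could be read with the standard java.io classes; the class name and error handling are illustrative and this is not NeatVision's own loader.

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class BytReaderSketch {
    // Reads a headerless BYT file: 8-bit grey scale, row-major order, with
    // the first byte corresponding to the top left pixel. The caller must
    // supply the width and height, since the format stores no header.
    public static int[][] readByt(String filename, int width, int height)
            throws IOException {
        int[][] pixels = new int[height][width];
        DataInputStream in = new DataInputStream(new FileInputStream(filename));
        try {
            for (int y = 0; y < height; y++) {
                for (int x = 0; x < width; x++) {
                    pixels[y][x] = in.readUnsignedByte(); // one byte per pixel (0-255)
                }
            }
        } finally {
            in.close();
        }
        return pixels;
    }
}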

B. NeatVision Imaging API Specification

Two image classes are provided by the NeatVision API, GreyImage and
RGBImage. These classes provide low level random access to pixel information.
This appendix presents the methods associated with each class and describes
their operation.
The images described by these classes are based on the Cartesian coordinate system. Each pixel can be addressed by an (x, y) pair or an absolute
index p into the image array (Fig. B.1). The pixel (0, 0) is located in the top
left corner of the image. The value of x increases from left to right and the
value of y increases from top to bottom. Any point (x, y) can be represented
as an absolute index p (see Eqns. B.1 to B.3).

p = x + (y x width)  (B.1)
x = p % width  (B.2)
y = p / width  (B.3)

where y is restricted to being an integer value (i.e. the division in Eqn. B.3 is integer division).
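The relationship between the two addressing schemes can be expressed directly in Java. The helper class below is purely illustrative (NeatVision itself exposes index-based access through the get(p) and set(p, i) methods listed in the following sections); it simply restates Eqns. B.1 to B.3 in code.

// Conversion between (x, y) coordinates and the absolute pixel index p.
public final class PixelIndex {
    public static int toIndex(int x, int y, int width) {
        return x + (y * width);   // Eqn. B.1
    }
    public static int toX(int p, int width) {
        return p % width;         // Eqn. B.2
    }
    public static int toY(int p, int width) {
        return p / width;         // Eqn. B.3: integer division truncates
    }
}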


Fig. B.1. A magnified 10 x 10 pixel grey scale image representing the three available
addressing modes. (a) Coordinate based addressing. (b) Matrix based addressing.
(c) Pixel index based addressing.


B.1 GreyImage Class

The GreyImage class represents a single plane image.
int GreyImage.getHeight()
Establishes the height in pixels of the image represented by the GreyImage
object. No arguments are required by this method.
int GreyImage.getWidth()
Establishes the width in pixels of the image represented by the GreyImage
object. No arguments are required by this method.
int GreyImage.getxy(int x,int y)
Determines the pixel intensity at the image location given by the coordinates
x and y. If these coordinates are outside the valid image area then the value
returned by the method is the background intensity.
void GreyImage.setxy(int x,int y,int i)
Changes the intensity value of the pixel at the image location specified by
coordinates x and y. The new intensity value of this pixel is given by the
intensity argument i.
int GreyImage.get(int p)
Determines the pixel intensity at the image location given by the point p. If
the point p is outside the valid image area then the background intensity is
returned by the method.
void GreyImage.set(int p,int i)
Changes the intensity value of the pixel at the image location specified by
the point p. If p is outside the valid image area the method has no effect.
void GreyImage.clear()
Resets all the pixel intensity values to zero or black. This method has no
arguments and no return value and once it has been called all image information is lost.
void GreyImage.drawLine(int x1,int y1,int x2,int y2,int i)
Draws a line on the image between and including the points (x1, y1) and
(x2, y2). The intensity of the resulting line is specified by the intensity argument i.
void GreyImage.drawRect(int x,int y,int w,int h,int i)
Draws the outline of a rectangle on the image. The top left corner is specified
by the coordinates x and y and the dimensions of the rectangle are specified
by the width and height arguments w and h respectively. The intensity of the
resulting rectangle is specified by the intensity argument i.
void GreyImage.fillRect(int x,int y,int w,int h,int i)
This method performs the same operation as drawRect except in this case
the resulting rectangle is filled. The intensity of the filled rectangle is specified by the intensity argument i.


void GreyImage.drawCircle(int x,int y,int r,int i)
Draws the outline of a circle on the image. The centre of the circle is specified by the coordinates x and y and the radius of the circle is given by the
argument r. The intensity of the resulting circle is specified by the intensity
argument i.
void GreyImage.fillCircle(int x,int y,int r,int i)
This method performs the same operation as drawCircle except in this case
the resulting circle is filled. The intensity of the filled circle is specified by
the intensity argument i.
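As a brief illustration of how these methods combine, the sketch below inverts a GreyImage in place and then overlays some graphics. It assumes an existing GreyImage instance and an 8-bit intensity range (0 to 255); neither the class constructor nor the intensity range is documented in this appendix, so both are assumptions.

// Illustrative use of the GreyImage methods listed above.
public static void annotate(GreyImage image) {
    int w = image.getWidth();
    int h = image.getHeight();
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            // Invert each pixel using the coordinate-based accessors,
            // assuming an 8-bit (0-255) intensity range.
            image.setxy(x, y, 255 - image.getxy(x, y));
        }
    }
    // Overlay a white border rectangle and a centred white circle.
    image.drawRect(0, 0, w, h, 255);
    image.drawCircle(w / 2, h / 2, Math.min(w, h) / 4, 255);
}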

B.2 RGBImage Class


The RGBImage class is an extension of the GreyImage class. An RGB image is
represented by three intensity planes, one for each of the colour components.
These planes can be referenced by the static members of the RGBImage class:
RGBImage.RED, RGBImage.GREEN and RGBImage.BLUE. The methods provided
by the RGBImage class are listed below.
int RGBImage.getHeight()
Establishes the height in pixels of the image represented by the RGBImage
object. No arguments are required by this method.
int RGBImage.getWidth()
Establishes the width in pixels of the image represented by the RGBImage
object. No arguments are required by this method.
int RGBImage.getxy(int p,int x,int y)
Determines the pixel intensity on plane p at the image location given by the
coordinates x and y. If these coordinates are outside the valid image area
then the value returned by the method is the background intensity.
void RGBImage.setxy(int p,int x,int y,int i)
Changes the intensity value of the pixel on plane p at the image location
specified by coordinates x and y. The new intensity value of this pixel is given
by the intensity argument i.
int RGBImage.get(int p,int i)
Determines the pixel intensity at the image location given by the absolute
index i on plane p. If the index i is outside the valid image area then the
background intensity is returned by this method.
void RGBImage.set(int p,int i,int v)
Changes the intensity value of the pixel at the image location specified by the
absolute index i on plane p to the value represented by v. If i is outside


the valid image area the method has no effect.


void RGBImage.clear()
Resets all the pixel intensity values to zero or black. This method has no
arguments and no return value and once it has been called all image information is lost.
void RGBImage.drawLine(int x1,int y1,int x2,int y2,Color c)
Draws a line on the image between and including the points (x1, y1) and
(x2, y2). The colour of the resulting line is specified by the colour argument
c.
void RGBImage.drawLine(int p,int x1,int y1,int x2,int y2,int v)
Draws a line on plane p of the image between and including the points
(x1, y1) and (x2, y2). The intensity of the resulting line is specified by the intensity argument v.
void RGBImage.drawRect(int x,int y,int w,int h,Color c)
Draws the outline of a rectangle on the image. The top left corner is specified
by the coordinates x and y and the dimensions of the rectangle are specified
by the width and height arguments w and h respectively. The colour of the
resulting rectangle is specified by the colour argument c.
void RGBImage.drawRect(int p,int x,int y,int w,int h,int i)
Draws the outline of a rectangle on plane p of the image. The top left corner
is specified by the coordinates x and y and the dimensions of the rectangle
are specified by the width and height arguments w and h respectively. The
intensity of the resulting rectangle is specified by the intensity argument i.
void RGBImage.fillRect(int x,int y,int w,int h,Color c)
Draws a filled rectangle on the image. The top left corner is specified by the
coordinates x and y and the dimensions of the rectangle are specified by the
width and height arguments w and h respectively. The colour of the resulting
rectangle is specified by the colour argument c.
void RGBImage.fillRect(int p,int x,int y,int w,int h,int i)
Draws a filled rectangle on plane p of the image. The top left corner is specified by the coordinates x and y and the dimensions of the rectangle are
specified by the width and height arguments w and h respectively. The intensity of the resulting rectangle is specified by the intensity argument i.
void RGBImage.drawCircle(int x,int y,int r,Color c)
Draws the outline of a circle on the image. The centre of the circle is specified
by the coordinates x and y and the radius of the circle is given by the argument r. The colour of the resulting circle is specified by the colour argument c.
void RGBImage.drawCircle(int p,int x,int y,int r,int i)
Draws the outline of a circle on plane p of the image. The centre of the circle
is specified by the coordinates x and y and the radius of the circle is given
by the argument r. The intensity of the resulting circle is specified by the


intensity argument i.
void RGBImage.fillCircle(int x,int y,int r,Color c)
Draws a filled circle on the image. The centre of the circle is specified by the
coordinates x and y and the radius of the circle is given by the argument r.
The colour of the resulting circle is specified by the colour argument c.
void RGBImage.fillCircle(int p,int x,int y,int r,int i)
Draws a filled circle on plane p of the image. The centre of the circle is specified by the coordinates x and y and the radius of the circle is given by the
argument r. The intensity of the resulting circle is specified by the intensity
argument i.
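A corresponding sketch for the RGBImage class is given below. It assumes an existing RGBImage instance and 8-bit planes (0 to 255); neither assumption is specified in this appendix.

// Illustrative per-plane access using the RGBImage methods listed above.
public static void isolateRed(RGBImage image) {
    int w = image.getWidth();
    int h = image.getHeight();
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            // Zero the green and blue planes, leaving the red plane intact.
            image.setxy(RGBImage.GREEN, x, y, 0);
            image.setxy(RGBImage.BLUE, x, y, 0);
        }
    }
    // Mark the image centre with a small filled circle on the red plane.
    image.fillCircle(RGBImage.RED, w / 2, h / 2, 5, 255);
}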

C. NeatVision Components

[Component explorer tree listing (reproduced as screenshots in the printed appendix). The legible top-level folders include Data, Flow Control, Utilities and Processing, with sub-folders such as Arithmetic, Histogram, Noise, Filter, Analysis, Label, Transform, Colour, Morphology, Maths, Low Level, String and the JAI operators.]

References

AVA (1985), AVA Machine Vision Glossary, Automated Vision Association.


Batchelor, B. & Whelan, P. (1994a), Machine vision systems: Proverbs, principles,
prejudices and priorities, in 'Proceedings of the SPIE - The International Society
for Optical Engineering', Vol. 2347, pp. 374-383.
Batchelor, B. & Whelan, P. (1995), 'Real-time colour recognition in symbolic
programming for machine vision systems', Machine Vision and Applications
8(6), 385-398.
Batchelor, B. & Whelan, P. (1997), Intelligent Vision Systems for Industry,
Springer-Verlag.
Batchelor, B. & Whelan, P., eds (1994b), Selected Papers on Industrial Machine
Vision Systems, SPIE Optical Engineering Press.
Blake, A. & Isard, M. (1998), Active Contours, Springer-Verlag.
Borgefors, G. (1986), 'Distance transformations in digital images', Computer Vision, Graphics, and Image Processing 34, 344-371.
Bovik, A., Clark, M. & Geisler, W. (1990), 'Multichannel texture analysis using
localised spatial filters', IEEE Transactions on Pattern Analysis and Machine
Intelligence 12(1),55-73.
Braggins, D. (2000), 'The machine vision community: Splits old and new.', Advanced
Imaging March, 43-44.
Brammer, R. (1989), 'Unified image computing based on fractals and chaos model
techniques', Optical Engineering.
Brodatz, P. (1966), Textures: A Photographic Album for Artists and Designers,
Dover Publications, New York.
Canny, J. (1986), 'A computational approach to edge detection', IEEE Transactions
on Pattern Analysis and Machine Intelligence 8(6), 679-698.
Castleman, K. (1979), Digital Image Processing, Prentice-Hall, New York, USA.
Chin, R. (1988), 'Automated visual inspection: 1981 to 1987', Computer Vision,
Graphics and Image Processing 41, 346-381.
Chin, R. & Harlow, C. (1982), 'Automated visual inspection: A survey', IEEE
Transactions on Pattern Analysis and Machine Intelligence 4(6), 557-573.
Davies, E. (1996), Machine Vision: Theory, Algorithms, Practicalities, Academic
Press.
Dougherty, E. (1992), An Introduction to Morphological Image Processing, Vol.
TT9, SPIE Press.
Dougherty, E. & Laplante, P. (1995), Introduction to Real-time Imaging,
SPIE/IEEE Press.
Dubes, R. & Jain, K. (1989), 'Random fields models in image analysis', Journal of
Applied Statistics 16, 131-164.
Duff, M., Watson, D., Fountain, T. & Shaw, G. (1973), 'A cellular logic array for
image processing', Pattern Recognition.


Freeman, H. (1987), Machine Vision: Algorithms, Architectures, and Systems, Academic Press.
Gonzalez, R. & Wintz, P. (1987), Digital Image Processing, Addison-Wesley.
Hanson, P. & Manneberg, G. (1997), 'Fourier optic characterization of paper surfaces', Optical Engineering 36(1), 35-39.
Haralick, R. (1979), 'Statistical and structural approaches to texture', Proceedings
of the IEEE 67(5), 786-804.
Haralick, R. (1992), Performance characterization in computer vision, in D. Hogg &
R. Boyle, eds, 'Proceedings of the British Machine Vision Conference', Springer-Verlag, pp. 1-8.
Haralick, R. & Shapiro, L. (1992), Computer and Robot Vision: Volume 1, Addison-Wesley.
Haralick, R., Sternberg, S. & Zhuang, X. (1987), 'Image analysis using mathematical
morphology', IEEE Transactions on Pattern Analysis and Machine Intelligence
9(4),532-550.
Heijmans, H. (1991), 'Theoretical aspects of grey-level morphology', IEEE Transactions on Pattern Analysis and Machine Intelligence 13(6), 568-582.
Hochberg, J. (1987), 'Machines should not see as people do, but must know how
people see', Computer Vision, Graphics, and Image Processing 37, 221-237.
Hutson, T. (1971), Colour Television Theory, McGraw-Hill, London.
Jain, A. (1989), Fundamentals of Digital Image Processing, Prentice-Hall.
Jain, A. & Dubes, R. (1988), Algorithms for Clustering Data, Prentice-Hall.
Kass, M., Witkin, A. & Terzopoulos, D. (1987), Snakes: Active contour models, in
'Proceedings of the 1st International Conference on Computer Vision', pp. 259-268.
Keller, J. M. (1986), Color image analysis of food, in 'Proc. IEEE Computer Society
on Computer Inspection and Pattern Recognition', Florida, USA.
Klette, R. & Zamperoni, P. (1996), Handbook of Image Processing Operators, Wiley.
Klien, J. & Serra, J. (1972), 'The texture analyzer', J. Microscopy 95(2),349-356.
Lantuejoul, C. & Maisonneuve, F. (1984), 'Geodesic methods in image analysis',
Pattern Recognition 17(2),177-187.
Laws, K. (1980), Texture Image Segmentation, PhD thesis, University of Southern
California.
Liang, J., Piper, J. & Tang, J. (1989), 'Erosion and dilation of binary images by arbitrary structuring elements using interval coding', Pattern Recognition Letters
9(3),201-209.
Lindley, C. (1991), Practical Image Processing in C, Wiley.
Liu, S. & Jernigan, M. (1990), 'Texture analysis and discrimination in additive
noise', Computer Vision, Graphics, and Image Processing pp. 52-57.
Mandelbrot, B. (1983), The fractal geometry of nature, Freeman, New York.
Maragos, P. (1987), 'Tutorial on advances in morphological image processing and
analysis', Optical Engineering 26(7), 623-632.
Maragos, P. & Schafer, R. (1986), 'Morphological skeleton representation and coding of binary images', IEEE Transactions on Acoustics, Speech and Signal Processing 34(5), 1228-1244.
Marr, D. & Hildreth, E. (1980), Theory of edge detection, in 'Proceedings of the
Royal Society', Vol. B 207, pp. 187-217.
Matheron, G. (1975), Random Sets and Integral Geometry, John Wiley.
McClannon, K., Whelan, P. & McCorkell, C. (1993), Integrating machine vision into
process control, in M. Ahmad & W. Sullivan, eds, 'Proceedings of FAIM'93',
CRC Press, University of Limerick, pp. 703-714.


McCormick, B. & Jayaramamurthy, S. (1974), 'Time series model for texture synthesis', International Journal of Computer and Information Sciences 3, 329-343.
Meyer, F. (1986), 'Automatic screening of cytological specimens', Computer Vision,
Graphics, and Image Processing 35, 356-369.
Mirmehdi, M. & Petrou, M. (2000), 'Segmentation of color textures', IEEE Transactions on Pattern Analysis and Machine Intelligence 22(2), 142-159.
Mitkas, P., Betzos, G. & Irakliotis, L. (1998), 'Optical processing paradigms for
electronic computers', IEEE Computer 31(2), 45-51.
Murray, J. & Ryper, W. V. (1996), Encyclopaedia of Graphics File Formats,
O'Reilly.
Myler, H. & Weeks, A. (1993), Computer Imaging Recipes in C, Prentice Hall.
Noble, J. (1989), Descriptions of Image Surfaces, PhD thesis, Oxford University.
Ojala, T. & Pietikainen, M. (1999), 'Unsupervised texture segmentation using feature distributions', Pattern Recognition 32(3),477-486.
Ojala, T., Pietikainen, M. & Harwood, D. (1996), 'A comparative study of texture
measures with classification based on feature distributions', Pattern Recognition
29(1),51-59.
Parker, J. (1997), Algorithms for Image Processing and Computer Vision, Wiley.
Parks, J. (1978), Industrial Sensory Devices, Plenum, New York, pp. 253-286.
Pavlidis, T. (1978), 'A review of algorithms for shape analysis', Computer Graphics
and Image Processing 7, 243-258.
Pavlidis, T. (1992), 'Why progress in machine vision is so slow', Pattern Recognition
Letters 13, 221-225.
Pentland, A. (1984), 'Fractal-based description of natural scenes', IEEE Transactions on Pattern Analysis and Machine Intelligence 6(6), 661-674.
Petrou, M., Mirmehdi, M. & Coors, M. (1998), Perceptual smoothing and segmentation of colour textures, in 'Proceedings of the 5th European Conference on
Computer Vision 98', Freiburg, Germany, pp. 623-639.
Pietikainen, M. & Kauppinen, H. (1999), Texture '99; Workshop on Texture Analysis in Machine Vision, Infotech Oulu.
Pietikainen, M. & Ojala, T. (1996), Texture analysis in industrial applications, in
J. Sanz, ed., 'Image Technology - Advances in Image Processing, Multimedia
and Machine Vision', Springer-Verlag, pp. 337-359.
Pietikainen, M. & Ojala, T. (2000), 'Rotation-invariant texture classification using
feature distributions', Pattern Recognition 33, 43-52.
Piper, J. (1985), 'Efficient implementation of skeletonisation using interval coding',
Pattern Recognition Letters 3(6), 389-397.
Pitas, I. (1993), Digital Image Processing Algorithms, Prentice-Hall.
Pitas, I. & Venetsanopoulos, A. (1990), 'Morphological shape decomposition', IEEE
Transactions on Pattern Analysis and Machine Intelligence 12(1), 38-45.
Preston, K. (1991), 'Three-dimensional mathematical morphology', Image and Vision Computing 9(5), 285-295.
Randen, T. & Husøy, J. (1999), 'Filtering for texture classification: A comparative survey', IEEE Transactions on Pattern Analysis and Machine Intelligence
21(4),291-310.
Rao, K. & Hwang, J. (1996), Techniques and Standards for Image, Video and Audio
Coding, Prentice-Hall PTR.
Rauber, C., Tschudin, P., Startchik, S. & Pun, T. (1996), Archival and retrieval of
historical watermark images, in 'International Conference on Image Processing
(ICIP)', IEEE Press, Lausanne, pp. 773-776.


Reed, T. & Wechsler, H. (1990), 'Segmentation of textured images and gestalt
organisation using spatial/spatial-frequency representation', IEEE Transactions
on Pattern Analysis and Machine Intelligence.
Sedgewick, R. (1983), Algorithms, Addison Wesley.
Serra, J. (1982), Image Analysis and Mathematical Morphology, Academic Press.
Serra, J. (1986), 'Introduction to mathematical morphology', Computer Vision,
Graphics, and Image Processing 35, 283-305.
Serra, J., ed. (1988), Image Analysis and Mathematical Morphology, Volume 2:
Theoretical Advances, Academic Press.
Simon, H., ed. (1981), The Sciences of the Artificial, 2nd Edition, MIT Press.
Smith, S. (1992), A new class of corner finder, in D. Hogg & R. Boyle, eds, 'Proceedings of the British Machine Vision Conference', pp. 139-148.
Smith, S. & Brady, J. (1997), 'SUSAN - a new approach to low level image processing', Int. Journal of Computer Vision 23(1),45-78.
Soille, P. (1997), Geodesic transformations and metrics: an overview, in G. Banon,
ed., 'Brazilian Workshop'97 on Mathematical Morphology, mini-course II', Instituto Nacional de Pesquisas Espaciais, CP 515, CEP 12201-970 Sao Jose dos
Campos, SP, Brazil, pp. 1-48.
Soille, P. (1998), Morphological texture analysis: a survey, in H. Burkhardt, ed.,
'WTA'98: Workshop on Texture Analysis', Institute for Pattern Recognition and
Image Processing, Albert-Ludwigs-Universität, Freiburg, Germany, pp. 193-207.
Soille, P. (1999), Morphological Image Analysis: Principles and Applications,
Springer-Verlag, Berlin Heidelberg.
Soille, P. & Rivest, J. (1996), 'On the validity of fractal dimension measurements in
image processing', Journal of Visual Communication and Image Representation.
Sonka, M., Hlavac, V. & Boyle, R. (1993), Image Processing, Analysis and Machine
Vision, Chapman and Hall.
Stark, H. (1982), Applications of Optical Fourier Transforms, Academic Press.
Sternberg, S. (1984), Industrial morphology, in 'Proc. SPIE Applications of Digital
Image Processing VII', pp. 202-213.
Sternberg, S. (1986), 'Grey scale morphology', Computer Vision, Graphics, and
Image Processing 35, 333-355.
Vernon, D. (1991), Machine Vision, Prentice Hall.
Vincent, L. (1991a), 'Morphological transformations of binary images with arbitrary
structuring elements', Signal Processing 22, 3-23.
Vincent, L. (1991b), New trends in morphological algorithms, in 'Proc. SPIE Nonlinear Image Processing II', Vol. 1451, SPIE, pp. 158-170.
Vincent, L. (1993), 'Morphological grayscale reconstruction in image analysis: applications and efficient algorithms', IEEE Transactions on Image Processing
2(2), 176-201.
Vincent, L. & Soille, P. (1991), 'Watersheds in digital spaces: an efficient algorithm
based on immersion simulations', IEEE Transactions on Pattern Analysis and
Machine Intelligence 13(6), 583-598.
Waltz, F. (1988), Fast implementation of standard and 'fuzzy' binary morphological
operations with large, arbitrary structuring elements, in 'Proc. SPIE Intelligent
Robots and Computer Vision VII', Vol. 1002, pp. 434-441.
Wang, L. & He, D. (1990a), 'Texture classification using the texture spectrum',
Pattern Recognition pp. 905-910.
Wang, L. & He, D. (1990b), 'Texture features based on the texture spectrum',
Pattern Recognition 24, 391-399.
Weiss, S. & Kulikowski, C. (1990), Computer Systems that Learn, Morgan Kaufmann Publishers Inc.


Whelan, P. & Batchelor, B. (1993), 'Flexible packing of arbitrary two-dimensional
shapes', Optical Engineering 32(12), 3278-3287.
Whelan, P. & Batchelor, B. (1996), 'Automated packing systems - a systems engineering approach', IEEE Trans. on Systems, Man and Cybernetics - Part A:
Systems and Humans 26(5), 533-544.
Whelan, P. & Soille, P. (1998), Watermark extraction in paper samples, in D. Vernon, ed., 'Proc. Optical Engineering Society of Ireland & Irish Machine Vision and Image Processing Joint Conference', National University of Ireland,
Maynooth, pp. 287-299.
Young, I., Peverini, R., Verbeek, P. & Otterloo, P. V. (1981), 'A new implementation for the binary and Minkowski operators', Computer Graphics and Image
Processing 17, 189-210.
Zhang, X. & Wandell, B. (1997), 'A spatial extension of CIELAB for digital color
image reproduction', SID Journal.
Zhuang, X. & Haralick, R. (1986), 'Morphological structuring element decomposition', Computer Vision, Graphics, and Image Processing 35, 370-382.

Index

is-a-part-of, 26
is-a, 26
GreyImage, 264
RGBImage, 265
extends, 26
flow(), 72
friendly, 29
javac, 23, 34
main(), 33
private, 28
protected, 29
public, 28
super, 30
this, 30
4-adjacency, 112
4-connected, 62
8-adjacency, 112
8-connected, 62

aliasing, 53
alpha, 40
anti-aliasing, 248
API, 22, 27, 49, 243, 263
application
- blob fill, 256
- bottle-top inspection, 258
- colour recognition, 214
- draw, 254
- geometric packing, 169
- largest item, 257
- morphology, 169
- NeatVision, 254
- plant-stem location, 259
- polychromatic scenes, 219
- red:green pepper ratio, 201
- removal of boundary objects, 163
- template matching, 91
- watermark analysis, 129
auto-correlation (ACF), 177
autoregressive model, 189
AWT, 32, 34, 38, 49, 54, 240, 241

biggest white blob, 101


binary
- border, 97
- corner detection, 107
- count white neighbours, 98
- detecting small spots, 104
- edge detector, 99
- edge smoothing, 107
- images, 112
- imaging, 96
- junction detection, 110
- limb end detection, 109
- measurements, 112
- morphology, 138
- pruning, 109
- remove isolated white points, 98
- removing small spots, 104
- to grey scale, 102
- white pixel count, 98
blob analysis, 99
bmp, 261
Boolean
- AND, 98
- binary edge detector, 99
- exclusive OR, 99
- expand white areas, 99
- imaging, 96
- NOT, 98
- OR, 98
- shrink white areas, 99
bounding box, 110
buffering
- double, 47
- single, 47
byt, 261
bytecode, 22
centroid, 102
chain code, 114
Charge Coupled Device (CCD), 7
chrominance, 193
CIELAB, 209


CIELUV, 208
CIEXYZ, 206
class
- derived, 25
- hierarchy, 25
classification
- artificial neural network (ANN), 14
- Bayesian, 14
- clustering, 15
- confusion matrix, 15
- KNN, 14
- leave one-out, 15
- maximum similarity, 13
- maximum-likelihood, 14
- nearest neighbour, 13
- performance, 15
- supervised, 13
- unsupervised, 15
closing
- binary, 145
- grey scale, 151
co-occurrence
- angular second moment, 181
- contrast, 182
- energy, 181
- entropy, 181
- inertia, 182
- local homogeneity, 182
- matrix, 179
colour
- analysis, 191
- CIE chromaticity diagram, 206
- cube, 195
- histogram, 197
- hue, 198
- images, 9
- intensity, 198
- representation, 193
- saturation, 198
- scattergrams, 197, 203
- separation, 197
- texture, 210
- triangle, 195
- WWW links, 219
colour camera, 191
conditional dilation, 161
connected component analysis, 99
connectivity detector, 99
contrast
- enhancement, 69, 160
- stretching, 70
convex hull, 111
convolution, 74, 124

corner
- binary, 90, 107, 143
- grey scale, 93
- SUSAN, 93
covariance, 156
difference of low-pass filters (DOLPS), 81
dilation
- binary, 138
- conditional, 161
- geodesic, 162
- grey scale, 139, 150
- half-gradient, 152
- interval coding, 149
- reconstruction, 163
Discrete Fourier Transform (DFT), 124
distance measure
- Euclidean, 14
distance measure
- Chessboard, 117
- City Block, 117
- Euclidean, 117
distance transform, 117
double threshold, 163
duals, 142
dyadic
- addition, 72
- divide, 72
- maximum, 72
- minimum, 73
- multiply, 72
- operators, 70
- subtraction, 72
edge
- density, 175
- effects, 91
- operator, 82
edge operator
- Canny, 89
- Frei and Chen, 87
- Laplacian, 84
- Marr-Hildreth, 88
- Prewitt, 86
- Roberts, 84
- Sobel, 85
- zero-crossings, 88
epoch, 14
erosion
- binary, 140
- geodesic, 162
- grey scale, 140, 150
- half-gradient, 152

- interval coding, 149
- reconstruction, 163
- ultimate, 163
euler number, 100
filling holes, 103
filter
- band-pass, 127
- band-stop, 128
- adaptive low-pass, 126
- crack detection, 82
- direction codes, 82
- DOLPS, 81
- edge detector, 80
- high-pass, 75, 77, 127
- low-pass, 75, 81, 126
- mean, 81
- median, 80
- midpoint, 81
- N-tuple, 90
- rank, 82
- rectangular average, 77
- selective, 128
- sharpen, 81
- symmetric, 129
Fourier spectral analysis, 177
fpx, 261
fractal analysis, 184
Freeman code, 114
Gabor filters, 188
geodesic
- dilation, 162
- erosion, 162
gif, 261
GLDM
- angular second moment, 179
- contrast, 179
- entropy, 179
- inverse difference moment, 179
- mean, 179
- operation, 178
GLRLM, 178
gradient, 152
grass-fire transform (GFT), 105
grey level difference, 178
grey level run length, 178
half-gradient
- dilation, 152
- erosion, 152
histogram, 178
histogram
- cumulative, 65

- equalization, 67
- intensity, 65
- local area, 68
- local equalization, 68
- stretching, 70
Hit-and-Miss, 143
Hot Java, 21
HSI, 198
hue singularity, 201
idempotency, 145
image
- acquisition, 7
- analysis, 10
- binary, 96
- classification, 12
- dyadic, 70
- grey scale, 8
- monadic, 62
- monochrome, 8
- neighbourhood, 61
- processing, 10, 61, 247
- representation, 8
imaging
- Java, 49, 243
- real-time, 6
inheritance, 26
inspection systems, 3
instantation, 24
intensity
- highlight, 63
- histogram, 65
- multiply, 63
- negate, 63
- shift, 62
- squaring, 63
interval coding
- dilation, 149
- erosion, 149
- operation, 148
JAI, 54, 56, 58, 223, 244
Java
- alpha value, 40
- applet, 33-35
- application, 33, 35
- arrays, 31
- background, 21
- class, 32
- class
- - descriptors, 31
- comments, 29
- compile, 34


- Image Consumers, 41
- image processing, 36
- Image Producers, 41
- imaging
- API, 22, 49, 243
-- AWT, 49, 241
- - JAI, 54, 244
- import, 32
- interfaces, 31
- lightweight, 50
- newsgroups, 59
- operators, 29
- overloading, 27
- packages, 32
- PixelGrabber, 41
- RMI, 23
- Software Development Kit (SDK), 23
- Swing, 50
- throw, 29
- transparency value, 40
- types, 29
- variables, 28
- variables
- class, 28
- class member, 28
- local, 28
- static, 28
- WWW links, 58
Java class
- additional, 45
- Canvas, 36
- Class, 31
- ColorModel, 45
- CropImageFilter, 46
- DirectColorModel, 45
- FilteredImageSource, 47
- Graphics, 38
- Image, 38
- ImageFilter, 45
- IndexColorModel, 45
- local, 28
- methods, 24
- Object, 31
- PlanarImage, 56
- RGBImageFilter, 46
- states, 24
- static, 28
Java Run-time Environment (JRE), 22
Java Virtual Machine (JVM), 22
jpeg (jpg), 261
Just-In-Time (JIT) compiler, 22
Laplacian of a Gaussian (LoG), 88

largest intensity neighbourhood, 79


local binary patterns (LBP), 187
local operators, 73
local operators
- linear, 74
- non-linear, 79
look-up table
- nth power, 67
- enhancement, 69
- exponential, 67
- logarithmic, 67
- stretching, 70
look-up table (LUT), 66
luminance, 192
MacAdams ellipses, 206
Markov random field (MRF), 187
mask, 104
match-stick, 105
mathematical morphology, 137
Maxwell's triangle, 195
measures of similarity, 91
medial axis transform (MAT), 105
Minkowski
- addition, 139
- subtraction, 140
monadic operators, 62
monte-carlo, 176
morphology
- binary, 138
- duality, 142, 143
- external gradient, 152
- gradient, 152
- granularity, 184
- grey scale, 149
- internal gradient, 152
- maxima, 164
- minima, 165
- ratio, 183
- reconstruction, 160
- SE decomposition, 146
- segmentation, 165
- SKIZ, 165
- system implementation, 174
- texture analysis, 183
- watershed, 165
multispectral images, 9
N-tuple operator, 90, 91
NeatVision, 17, 223
NeatVision
- adding functionality, 237
- anti-aliasing, 248


- API specification, 263
- application, 245
- architecture, 235
- component development wizard (CDW), 251
- component explorer, 245
- conditional processing, 230
- data flow, 233
- data looping, 230
- data splitting, 229
- data types, 227
- example, 231, 237
- flow control, 227
- GUI, 18, 223, 249
- help, 252
- image, 226
- image viewer tool, 248
- imaging, 240
- input/output, 225
- integration, 251
- JAI, 244
- Java, 233
- processing, 226
- profiling, 248
- software development, 251
NeatVision class
- image, 263
- GreyImage, 264
- RGBImage, 265
noise removal, 152
non-maximum suppression, 89
NTSC, 193

Object-oriented Programming
- abstract classes, 27
- classes, 24
- encapsulation, 24
- inheritance, 25
- objects, 25
- polymorphism, 27
Object-oriented Programming (OOP),
23
opening
- binary, 145
- grey scale, 151
opponent process, 203
PAL, 193
pbm, 262
pcx, 261
perimeter, 114
pgm, 262
pixel, 8


png, 262
point-pairs, 156
point-wise maximum, 163
ppm, 262
programmable colour filter
- generalisation, 217
- implementation, 211
- noise, 215
- recognition, 214
programmable colour filter (PCF), 211
pruning, 109
random field models, 187
ras, 262
raw, 262
reconstruction
- dilation, 163
- erosion, 163
rectangular average filter, 77
region labelling, 103
RGB, 193
RMI, 55
run-length coding, 115
S-CIELAB, 209
salt and pepper noise, 152
SE, 137
SE decomposition, 146
SE decomposition
- dilation, 146
- parallel, 147
- serial, 146
- union, 148
SECAM, 193
shape descriptors, 115
skeleton, 105, 145
Skeleton Influence by Zones (SKIZ),
165
SKIPSM, 174
smallest intensity neighbourhood, 79
spatial CIELAB, 209
structural analysis, 189
structuring element, 137
SUSAN, 93
syntactic processes, 12
system
- boards, 5, 6
- context, 16
- design, 7
- engineering, 16
- hardware, 5
- machine vision, 3
- morphology, 174
- pipeline, 5


- self-contained, 5
- software, 6
- turn-key, 5
template, 91
template matching, 11
texel, 189
textural energy, 185
texture, 175
texture
- spectrum, 186
- units, 186
thickening, 144
thinning, 106, 144
threshold
- adaptive
-- 3 x 3, 78
- - 5 x 5, 78
- double, 163
- dual, 64
- mid-level, 64
- single, 63
- triple, 65
tiff (tif), 262
top-hat, 157
top-hat
- black, 157
- enhancement, 160
- opening, 157
- white, 157
transform, 116
transform
- Chamfer, 117
- DFT, 124
- DFT
-- adaptive low-pass, 126
-- band-pass, 127
-- band-stop, 128
-- case study, 129
-- high-pass, 127
-- low-pass, 126
-- selective, 128
-- symmetric, 129
- distance, 117
- Fourier, 124
- geometric, 116
- grass-fire, 117
- Hit-and-Miss, 143
- Hough, 119
- top-hat, 157
transparency, 40
transpose, 143
trichromaticity, 196
valley detector, 157
vision
- computer, 1
- human, 2
- machine, 1
vision engineering, 2
visual programming, 224, 245
watershed
- immersion approach, 165
XYZ, 206
YIQ, 204
YUV, 205
