CHAPTER 1
INTRODUCTION
1.1 What is machine vision?
Machine vision (MV) is a branch of engineering that uses computer vision in the
context of manufacturing. While the scope of MV is broad and a comprehensive
definition is difficult to distil, a generally accepted definition of machine vision is
"the analysis of images to extract data for controlling a process or activity." Put
another way, MV processes are targeted at "recognizing the actual objects in an image
and assigning properties to those objects -- understanding what they mean."
The first step in the MV process is acquisition of an image, typically using cameras,
lenses, and lighting that has been designed to provide the differentiation required by
subsequent processing. MV software packages then employ various digital image
processing techniques to allow the hardware to recognize what it is looking at.
CHAPTER 2
THRESHOLDING
2.1 What is thresholding?
Thresholding is the simplest method of image segmentation: from a grayscale image,
thresholding produces a binary image that separates object pixels from the
background.
2.2 Method
During the thresholding process, individual pixels in an image are marked as “object”
pixels if their value is greater than some threshold value (assuming an object to be
brighter than the background) and as “background” pixels otherwise. This convention
is known as threshold above. Variants include threshold below, which is opposite of
threshold above; threshold inside, where a pixel is labeled "object" if its value is
between two thresholds; and threshold outside, which is the opposite of threshold
inside. Typically, an object pixel is given a value of “1” while a background pixel is
given a value of “0.” Finally, a binary image is created by coloring each pixel white or
black, depending on the pixel's label.
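As a brief illustrative sketch (not part of the original text), the threshold-above and threshold-inside conventions described here can be written in Python with NumPy; the function names are our own:

```python
import numpy as np

def threshold_above(image, t):
    """Label pixels brighter than t as object (1), the rest as background (0)."""
    return (image > t).astype(np.uint8)

def threshold_inside(image, t_low, t_high):
    """Label pixels whose value lies between two thresholds as object (1)."""
    return ((image >= t_low) & (image <= t_high)).astype(np.uint8)

img = np.array([[10, 200],
                [30, 180]])
print(threshold_above(img, 100))   # object pixels where value > 100
```

The threshold-below and threshold-outside variants are simply the logical negations of these two masks.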
The key parameter in the thresholding process is the choice of the threshold value (or
values, as mentioned earlier). Several different methods for choosing a threshold
exist; users can manually choose a threshold value, or a thresholding algorithm can
compute a value automatically, which is known as automatic thresholding. A simple
method would be to choose the mean or median value, the rationale being that if the
object pixels are brighter than the background, they should also be brighter than the
average. In a noiseless image with uniform background and object values, the mean or
median will work well as the threshold; however, this will generally not be the case. A
more sophisticated approach might be to create a histogram of the image pixel
intensities and use the valley point as the threshold. The histogram approach assumes
that there is some average value for the background and object pixels, but that the
actual pixel values have some variation around these average values. However, this
may be computationally expensive, and image histograms may not have clearly
defined valley points, often making the selection of an accurate threshold difficult.
One method that is relatively simple, does not require much specific knowledge of the
image, and is robust against image noise, is the following iterative method:
1. An initial threshold T is chosen; this can be done randomly or according to
any other method desired (for example, the mean image intensity).
2. The image is segmented into two sets using T: G1, the pixels with values
above T, and G2, the pixels with values at or below T.
3. The average value of each set is computed: m1 = average value of G1 and
m2 = average value of G2.
4. A new threshold is created that is the average of m1 and m2:
T’ = (m1 + m2)/2
5. Go back to step two, now using the new threshold computed in step four; keep
repeating until the new threshold matches the one before it (i.e. until
convergence has been reached).
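A minimal Python/NumPy sketch of this iterative procedure (the function name, the choice of the mean as the initial guess, and the convergence tolerance are our own) might look as follows:

```python
import numpy as np

def iterative_threshold(image, t=None, tol=0.5):
    """Iterative threshold selection following the five steps above."""
    pixels = image.astype(float).ravel()
    if t is None:
        t = pixels.mean()                    # step 1: initial threshold
    while True:
        g1 = pixels[pixels > t]              # step 2: tentative object pixels
        g2 = pixels[pixels <= t]             #         tentative background pixels
        m1 = g1.mean() if g1.size else t     # step 3: mean of each group
        m2 = g2.mean() if g2.size else t
        t_new = (m1 + m2) / 2                # step 4: new threshold
        if abs(t_new - t) < tol:             # step 5: repeat until convergence
            return t_new
        t = t_new

# A clearly bimodal set of intensities converges between the two modes.
print(iterative_threshold(np.array([10, 12, 11, 200, 202, 198])))   # 105.5
```

On this bimodal example a single iteration already converges, since the mean of the dark group (11) and the mean of the bright group (200) average back to the initial threshold.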
Sezgin and Sankur (2004) categorize thresholding methods into the following six
groups based on the information the algorithm manipulates:
1. "histogram shape-based methods, where, for example, the peaks, valleys and
curvatures of the smoothed histogram are analyzed
2. clustering-based methods, where the gray-level samples are clustered in two
parts as background and foreground (object), or alternately are modeled as a
mixture of two Gaussians
3. Entropy-based methods result in algorithms that use the entropy of the
foreground and background regions, the cross-entropy between the original
and binarized image, etc.
4. Object attribute-based methods search a measure of similarity between the
gray-level and the binarized images, such as fuzzy shape similarity, edge
coincidence, etc.
5. spatial methods [that] use higher-order probability distribution and/or
correlation between pixels
6. Local methods adapt the threshold value on each pixel to the local image
characteristics."
CHAPTER 3
SEGMENTATION
3.1 What is segmentation?
Image segmentation is the process of partitioning a digital image into multiple
segments (sets of pixels) in order to simplify or change the representation of an
image into something that is more meaningful and easier to analyze. The result of
image segmentation is a set of segments that collectively cover the entire
image, or a set of contours extracted from the image (see edge detection). Each of the
pixels in a region are similar with respect to some characteristic or computed
property, such as colour, intensity, or texture. Adjacent regions are significantly
different with respect to the same characteristic(s).[1] When applied to a stack of
images, typical in Medical imaging, the resulting contours after image segmentation
can be used to create 3D reconstructions with the help of interpolation algorithms like
marching cubes.
3.2 Thresholding
The simplest method of image segmentation is called the thresholding method. This
method is based on a clip-level (or a threshold value) to turn a gray-scale image into a
binary image. The key to this method is to select the threshold value (or values when
multiple levels are selected). Several popular methods are used in industry, including
the maximum entropy method and Otsu's method (maximum variance); k-means
clustering can also be used.
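Otsu's method, mentioned above, chooses the threshold that maximizes the between-class variance of the two resulting pixel classes. The following is a minimal histogram-based NumPy sketch (the function name and bin count are our own choices, not a reference implementation):

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(image.ravel(), bins=bins)
    p = hist.astype(float) / hist.sum()            # normalized histogram
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for k in range(1, bins):
        w0, w1 = p[:k].sum(), p[k:].sum()          # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        m0 = (p[:k] * centers[:k]).sum() / w0      # class means
        m1 = (p[k:] * centers[k:]).sum() / w1
        var_between = w0 * w1 * (m0 - m1) ** 2     # between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, centers[k - 1]
    return best_t
```

For a cleanly bimodal image, the returned threshold falls between the two intensity modes.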
Edge detection is a well-developed field on its own within image processing. Region
boundaries and edges are closely related, since there is often a sharp adjustment in
intensity at the region boundaries. Edge detection techniques have therefore been used
as the base of another segmentation technique.
The edges identified by edge detection are often disconnected. To segment an object
from an image however, one needs closed region boundaries.
Overview:
(Figure: 4-connectivity and 8-connectivity pixel neighbourhoods.)
A graph, containing vertices and connecting edges, is constructed from relevant input
data. The vertices contain information required by the comparison heuristic, while the
edges indicate connected 'neighbours'. An algorithm traverses the graph, labeling the
vertices based on the connectivity and relative values of their neighbours.
Connectivity is determined by the medium; image graphs, for example, can be 4-
connected or 8-connected.
Following the labeling stage, the graph may be partitioned into subsets, after which
the original information can be recovered and processed.
3.7 Algorithms
Two-pass
The input data can be modified in situ (which carries the risk of data corruption), or
labeling information can be maintained in an additional data structure.
Connectivity checks are carried out by checking the labels of pixels that are North-
East, North, North-West and West of the current pixel (assuming 8-connectivity). 4-
connectivity uses only North and West neighbours of the current pixel. The following
conditions are checked to determine the value of the label to be assigned to the current
pixel (4-connectivity is assumed).
Conditions to check:
1. Does the pixel to the left (West) have the same value?
1. Yes - We are in the same region. Assign the same label to the current
pixel
2. No - Check next condition
2. Do the pixels to the North and West of the current pixel have the same value
but not the same label?
1. Yes - We know that the North and West pixels belong to the same
region and must be merged. Assign the current pixel the minimum of
the North and West labels, and record their equivalence relationship
2. No - Check next condition
3. Does the pixel to the left (West) have a different value and the one to the
North the same value?
1. Yes - Assign the label of the North pixel to the current pixel
2. No - Check next condition
4. Do the pixel's North and West neighbours have different pixel values?
1. Yes - Create a new label id and assign it to the current pixel
The algorithm continues this way, and creates new region labels whenever necessary.
The key to a fast algorithm, however, is how this merging is done. This algorithm
uses the union-find data structure which provides excellent performance for keeping
track of equivalence relationships.[7] Union-find essentially stores labels which
correspond to the same blob in a disjoint-set data structure, making it easy to
remember the equivalence of two labels through an interface method such as findSet(l),
which returns the minimum label value that is equivalent to the function
argument 'l'.
Once the initial labeling and equivalence recording is completed, the second pass
merely replaces each pixel label with its equivalent disjoint-set representative
element.
1. Iterate through each element of the data by column, then by row (Raster
Scanning)
2. If the element is not the background
1. Get the neighbouring elements of the current element
2. If there are no neighbours, uniquely label the current element and
continue
3. Otherwise, find the neighbour with the smallest label and assign it to
the current element
4. Store the equivalence between neighbouring labels
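The two-pass procedure with union-find described above can be sketched as follows. This is a simplified illustrative Python implementation assuming a binary NumPy array and 4-connectivity; the function and variable names are our own:

```python
import numpy as np

def two_pass_label(binary):
    """Two-pass connected-component labeling (4-connectivity) with union-find."""
    parent = {}                          # disjoint-set forest over label ids

    def find_set(l):                     # smallest equivalent label
        while parent[l] != l:
            parent[l] = parent[parent[l]]   # path compression
            l = parent[l]
        return l

    def union(a, b):                     # record equivalence of two labels
        ra, rb = find_set(a), find_set(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    labels = np.zeros(binary.shape, dtype=int)
    next_label = 1
    rows, cols = binary.shape
    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not binary[r, c]:
                continue
            north = labels[r - 1, c] if r > 0 and binary[r - 1, c] else 0
            west = labels[r, c - 1] if c > 0 and binary[r, c - 1] else 0
            if north and west:
                labels[r, c] = min(north, west)
                union(north, west)       # the two regions must be merged
            elif north or west:
                labels[r, c] = north or west
            else:                        # no labeled neighbours: new label
                parent[next_label] = next_label
                labels[r, c] = next_label
                next_label += 1
    # Second pass: replace each label with its set representative.
    for r in range(rows):
        for c in range(cols):
            if labels[r, c]:
                labels[r, c] = find_set(labels[r, c])
    return labels
```

The min-based union keeps the behaviour described in the text: the smallest label of a region "floods" through the whole connected component on the second pass.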
1. The array from which connected regions are to be extracted is given below
2. After the first pass, the following labels are generated. Note that a total of 7 labels
are generated in accordance with the conditions highlighted above.
3. Array generated after the merging of labels is carried out. Here, the label value that
was the smallest for a given region "floods" throughout the connected region,
giving two distinct labels, and hence two distinct regions.
4. Final result in colour to clearly see two different regions that have been found in
the array.
CHAPTER 4
PATTERN RECOGNITION
4.1 What is pattern recognition?
Pattern recognition algorithms generally aim to provide a reasonable answer for all
possible inputs and to do "fuzzy" matching of inputs. This is opposed to pattern
matching algorithms, which look for exact matches in the input with pre-existing
patterns. A common example of a pattern-matching algorithm is regular expression
matching, which looks for patterns of a given sort in textual data and is included in
the search capabilities of many text editors and word processors. In contrast to pattern
recognition, pattern matching is generally not considered a type of machine learning,
although pattern-matching algorithms (especially with fairly general, carefully
tailored patterns) can sometimes succeed in providing similar-quality output to the
sort provided by pattern-recognition algorithms.
4.2 Overview
Unsupervised learning, by contrast, works from data that has not been hand-labeled
(unlabeled data). Note that in cases of unsupervised learning, there may be no training
data at all to speak of; in other words, the data to be labelled is the training data.
Note that sometimes different terms are used to describe the corresponding supervised
and unsupervised learning procedures for the same type of output. For example, the
unsupervised equivalent of classification is normally known as clustering, based on
the common perception of the task as involving no training data to speak of, and of
grouping the input data into clusters based on some inherent similarity measure (e.g.
the distance between instances, considered as vectors in a multi-dimensional vector
space), rather than assigning each input instance into one of a set of pre-defined
classes. Note also that in some fields, the terminology is different: For example, in
community ecology, the term "classification" is used to refer to what is commonly
known as "clustering".
The piece of input data for which an output value is generated is formally termed an
instance. The instance is formally described by a vector of features, which together
constitute a description of all known characteristics of the instance. (These feature
vectors can be seen as defining points in an appropriate multidimensional space, and
methods for manipulating vectors in vector spaces can be correspondingly applied to
them, such as computing the dot product or the angle between two vectors.) Typically,
features are either categorical (also known as nominal, i.e. consisting of one of a set of
unordered items, such as a gender of "male" or "female", or a blood type of "A", "B",
"AB" or "O"), ordinal (consisting of one of a set of ordered items, e.g. "large",
"medium" or "small"), integer-valued (e.g. a count of the number of occurrences of a
particular word in an email) or real-valued (e.g. a measurement of blood pressure).
Often, categorical and ordinal data are grouped together; likewise for integer-valued
and real-valued data. Furthermore, many algorithms work only in terms of categorical
data and require that real-valued or integer-valued data be discretized into groups (e.g.
less than 5, between 5 and 10, or greater than 10).
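As a small illustration of such discretization (this sketch is ours, not from the original text), the three groups mentioned could be produced by a function such as:

```python
def discretize(value):
    """Map a real- or integer-valued feature onto the categorical groups
    named in the text: less than 5, between 5 and 10, greater than 10."""
    if value < 5:
        return "less than 5"
    if value <= 10:
        return "between 5 and 10"
    return "greater than 10"

print(discretize(3))    # less than 5
print(discretize(7))    # between 5 and 10
print(discretize(12))   # greater than 10
```

An algorithm that accepts only categorical features would then operate on these group names instead of the raw numeric values.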
Many common pattern recognition algorithms are probabilistic in nature, in that they
use statistical inference to find the best label for a given instance. Unlike other
algorithms, which simply output a "best" label, probabilistic algorithms often
also output a probability of the instance being described by the given label. In
addition, many probabilistic algorithms output a list of the N-best labels with
associated probabilities, for some value of N, instead of simply a single best label.
When the number of possible labels is fairly small (e.g. in the case of classification),
N may be set so that the probability of all possible labels is output. Probabilistic
algorithms have many advantages over non-probabilistic algorithms:
1. They output a confidence value associated with their choice. (Note that some
other algorithms may also output confidence values, but in general, only for
probabilistic algorithms is this value mathematically grounded in probability
theory. Non-probabilistic confidence values can in general not be given any
specific meaning, and can only be used to compare against other confidence
values output by the same algorithm.)
2. Correspondingly, they can abstain when the confidence of choosing any
particular output is too low.
3. Because of the probabilities output, probabilistic pattern-recognition
algorithms can be more effectively incorporated into larger machine-learning
tasks, in a way that partially or completely avoids the problem of error
propagation.
Techniques to transform the raw feature vectors are sometimes used prior to
application of the pattern-matching algorithm. For example, feature extraction
algorithms attempt to reduce a large-dimensionality feature vector into a smaller-
dimensionality vector that is easier to work with and encodes less redundancy, using
mathematical techniques such as principal components analysis (PCA). Feature
selection algorithms attempt to directly prune out redundant or irrelevant features.
The distinction between the two is that the resulting features after feature extraction
has taken place are of a different sort than the original features and may not easily be
interpretable, while the features left after feature selection are simply a subset of the
original features.
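Feature extraction by principal components analysis, as mentioned above, can be sketched with NumPy via the singular value decomposition of the centered data. This is an illustrative sketch under our own naming, not a full PCA implementation (no whitening or explained-variance reporting):

```python
import numpy as np

def pca_reduce(X, k):
    """Project feature vectors X (n_samples x n_features) onto the top-k
    principal components, reducing the dimensionality of each instance."""
    Xc = X - X.mean(axis=0)                      # center the data
    # SVD of the centered data: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # reduced feature vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # 100 instances with 5 features each
Z = pca_reduce(X, 2)
print(Z.shape)                   # (100, 2)
```

The resulting two extracted features are linear combinations of the original five, which is why, as the text notes, they may not be easily interpretable.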
CHAPTER 5
BARCODE
The first use of barcodes was to label railroad cars, but they were not commercially
successful until they were used to automate supermarket checkout systems, a task for
which they have become almost universal. Their use has spread to many other tasks
that are generically referred to as Auto ID Data Capture (AIDC). The very first
scanning of the now ubiquitous Universal Product Code (UPC) barcode was on a pack
of Wrigley Company chewing gum in June 1974.
The earliest, and still the cheapest, barcode scanners are built from a fixed light and a
single photosensor that is manually "scrubbed" across the barcode.
Barcode scanners can be classified into three categories based on their connection to
the computer. The older type is the RS-232 barcode scanner. This type requires
special programming for transferring the input data to the application program.
Like the keyboard interface scanner, USB scanners are easy to install and do not need
custom code for transferring input data to the application program.
CHAPTER 6
EDGE DETECTION
6.1 What is edge detection?
A typical edge might for instance be the border between a block of red colour and a
block of yellow. In contrast a line (as can be extracted by a ridge detector) can be a
small number of pixels of a different colour on an otherwise unchanging background.
For a line, there may therefore usually be one edge on each side of the line.
Although certain literature has considered the detection of ideal step edges, the edges
obtained from natural images are usually not at all ideal step edges. Instead they are
normally affected by one or several of the following effects:
1. Focal blur caused by a finite depth-of-field and finite point spread function.
2. Penumbral blur caused by shadows created by light sources of non-zero
radius.
3. Shading at a smooth object edge.
A number of researchers have used a Gaussian smoothed step edge (an error function)
as the simplest extension of the ideal step edge model for modeling the effects of edge
blur in practical applications.[3][5] Thus, a one-dimensional image f which has exactly
one edge placed at x = 0 may be modeled as:
f(x) = (I_r − I_l)/2 · (erf(x/(√2 σ)) + 1) + I_l
At the left side of the edge, the intensity is I_l = lim f(x) as x → −∞, and right of the
edge it is I_r = lim f(x) as x → +∞. The scale parameter σ is called the blur scale of
the edge.
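The smoothed step-edge model, f(x) = (I_r − I_l)/2 · (erf(x/(√2 σ)) + 1) + I_l, can be evaluated directly with Python's standard library; this small sketch (names are ours) checks its limiting behaviour:

```python
import math

def step_edge(x, i_left, i_right, sigma):
    """Gaussian-smoothed step edge:
    f(x) = (i_right - i_left)/2 * (erf(x / (sqrt(2)*sigma)) + 1) + i_left."""
    return (i_right - i_left) / 2 * (math.erf(x / (math.sqrt(2) * sigma)) + 1) + i_left

# Far to the left the model tends to the left intensity, far to the right to
# the right intensity, and at x = 0 it passes through their midpoint.
print(step_edge(0, 10, 200, 2))   # 105.0
```

Larger values of the blur scale sigma spread the same intensity change over a wider interval around x = 0.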
To illustrate why edge detection is not a trivial task, consider the problem of detecting
edges in the following one-dimensional signal. Here, we may intuitively say that there
should be an edge between the 4th and 5th pixels.
If the intensity difference were smaller between the 4th and the 5th pixels and if the
intensity differences between the adjacent neighbouring pixels were higher, it would
not be as easy to say that there should be an edge in the corresponding region.
Moreover, one could argue that this case is one in which there are several edges.
Hence, to firmly state a specific threshold on how large the intensity change between
two neighbouring pixels must be for us to say that there should be an edge between
these pixels is not always simple.[3] Indeed, this is one of the reasons why edge
detection may be a non-trivial problem unless the objects in the scene are particularly
simple and the illumination conditions can be well controlled (see for example, the
edges extracted from the image with the girl above).
6.5 Approaches
There are many methods for edge detection, but most of them can be grouped into two
categories, search-based and zero-crossing based. The search-based methods detect
edges by first computing a measure of edge strength, usually a first-order derivative
expression such as the gradient magnitude, and then searching for local directional
maxima of the gradient magnitude using a computed estimate of the local orientation
of the edge, usually the gradient direction. The zero-crossing based methods search
for zero crossings in a second-order derivative expression computed from the image
in order to find edges, usually the zero-crossings of the Laplacian or the zero-
crossings of a non-linear differential expression. As a pre-processing step to edge
detection, a smoothing stage, typically Gaussian smoothing, is almost always applied
(see also noise reduction).
The edge detection methods that have been published mainly differ in the types of
smoothing filters that are applied and the way the measures of edge strength are
computed. As many edge detection methods rely on the computation of image
gradients, they also differ in the types of filters used for computing gradient estimates
in the x- and y-directions.
Canny considered the mathematical problem of deriving an optimal smoothing filter,
given the criteria of detection, localization, and minimizing multiple responses to
a single edge. He showed that the optimal filter given these assumptions is a sum of
four exponential terms. He also showed that this filter can be well approximated by
first-order derivatives of Gaussians. Canny also introduced the notion of non-
maximum suppression, which means that given the presmoothing filters, edge points
are defined as points where the gradient magnitude assumes a local maximum in the
gradient direction. Looking for the zero crossing of the 2nd derivative along the
gradient direction was first proposed by Haralick. It took less than two decades to find
a modern geometric variational meaning for that operator that links it to the Marr-
Hildreth (zero crossing of the Laplacian) edge detector. That observation was
presented by Ron Kimmel and Alfred Bruckstein.
Although his work was done in the early days of computer vision, the Canny edge
detector (including its variations) is still a state-of-the-art edge detector. Unless the
preconditions are particularly suitable, it is hard to find an edge detector that performs
significantly better than the Canny edge detector.
The Canny-Deriche detector was derived from similar mathematical criteria as the
Canny edge detector, although starting from a discrete viewpoint and then leading to a
set of recursive filters for image smoothing instead of exponential filters or Gaussian
filters.
For estimating image gradients from the input image or a smoothed version of it,
different gradient operators can be applied. The simplest approach is to use central
differences:
Lx(x, y) = (L(x+1, y) − L(x−1, y)) / 2
Ly(x, y) = (L(x, y+1) − L(x, y−1)) / 2
corresponding to the application of the filter mask [−1/2, 0, +1/2] (and its
transpose) to the image data.
The well-known and earlier Sobel operator is based on the following filters:
Lx = [[+1, 0, −1], [+2, 0, −2], [+1, 0, −1]] * L
Ly = [[+1, +2, +1], [0, 0, 0], [−1, −2, −1]] * L
Given such estimates of first-order derivatives, the gradient magnitude is then
computed as:
|∇L| = sqrt(Lx² + Ly²)
Other first-order difference operators for estimating image gradient have been
proposed in the Prewitt operator and Roberts cross.
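The Sobel gradient estimate and the gradient magnitude |∇L| = sqrt(Lx² + Ly²) can be sketched as follows; the explicit Python loop is deliberately unoptimized for clarity, and border pixels are simply left at zero:

```python
import numpy as np

def sobel_gradients(image):
    """Estimate gradients with the Sobel masks and return the magnitude."""
    kx = np.array([[1, 0, -1],
                   [2, 0, -2],
                   [1, 0, -1]], dtype=float)   # x-derivative mask
    ky = kx.T                                   # y-derivative mask
    img = image.astype(float)
    lx = np.zeros_like(img)
    ly = np.zeros_like(img)
    rows, cols = img.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            patch = img[r - 1:r + 2, c - 1:c + 2]
            lx[r, c] = (kx * patch).sum()
            ly[r, c] = (ky * patch).sum()
    return np.sqrt(lx ** 2 + ly ** 2)
```

On a vertical step edge the magnitude is large along the edge and zero in the flat regions, which is why thresholding it (followed by thinning) yields edge pixels.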
If the edge thresholding is applied to just the gradient magnitude image, the resulting
edges will in general be thick and some type of edge thinning post-processing is
necessary. For edges detected with non-maximum suppression however, the edge
curves are thin by definition and the edge pixels can be linked into edge polygons by
an edge linking (edge tracking) procedure. On a discrete grid, the non-maximum
suppression stage can be implemented by estimating the gradient direction using first-
order derivatives, then rounding off the gradient direction to multiples of 45 degrees,
and finally comparing the values of the gradient magnitude in the estimated gradient
direction.
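The discrete non-maximum suppression stage just described, rounding the gradient direction to multiples of 45 degrees and comparing magnitudes along that direction, can be sketched as follows (an illustrative Python version; names are our own):

```python
import numpy as np

def non_maximum_suppression(magnitude, lx, ly):
    """Keep a pixel only if its gradient magnitude is a local maximum along
    the gradient direction, rounded to the nearest multiple of 45 degrees."""
    rows, cols = magnitude.shape
    out = np.zeros_like(magnitude)
    angle = np.rad2deg(np.arctan2(ly, lx)) % 180    # direction modulo 180
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            a = angle[r, c]
            if a < 22.5 or a >= 157.5:     # ~0 deg: compare east/west
                n1, n2 = magnitude[r, c - 1], magnitude[r, c + 1]
            elif a < 67.5:                 # ~45 deg diagonal
                n1, n2 = magnitude[r - 1, c + 1], magnitude[r + 1, c - 1]
            elif a < 112.5:                # ~90 deg: compare north/south
                n1, n2 = magnitude[r - 1, c], magnitude[r + 1, c]
            else:                          # ~135 deg diagonal
                n1, n2 = magnitude[r - 1, c - 1], magnitude[r + 1, c + 1]
            if magnitude[r, c] >= n1 and magnitude[r, c] >= n2:
                out[r, c] = magnitude[r, c]
    return out
```

A one-pixel-wide ridge of high magnitude survives unchanged, while the flanking pixels are suppressed, which is why the resulting edge curves are thin by definition.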
Edge thinning is a technique used to remove the unwanted spurious points on the edge
of an image. This technique is employed after the image has been filtered for noise
(using median, Gaussian filter etc.), the edge operator has been applied (like the ones
described above) to detect the edges and after the edges have been smoothed using an
appropriate threshold value. This removes all the unwanted points and if applied
carefully, results in one pixel thick edge elements.
There are many popular algorithms used to do this; one such is described below:
1) Choose a type of connectivity, like 8, 6 or 4; 8-connectivity is preferred,
where all the immediate pixels surrounding a particular pixel are considered.
2) Remove points from the North, South, East and West directions.
3) Do this in multiple passes, i.e. after the north pass, use the same semi-processed
image in the other passes, and so on.
4) Remove a point if it has no neighbours in the direction of the current pass, it
is not the end of a line, it is not an isolated point, and removing it will not
cause its two neighbours to become non-adjacent (so that the edge stays intact).
5) Else keep the point. The number of passes across directions should be chosen
according to the level of accuracy desired.
Some edge-detection operators are instead based upon second-order derivatives of the
intensity. This essentially captures the rate of change in the intensity gradient. Thus,
in the ideal continuous case, detection of zero-crossings in the second derivative
captures local maxima in the gradient.
Written out as an explicit expression in terms of local partial derivatives Lx, Ly ... Lyyy,
this edge definition can be expressed as the zero-crossing curves of the differential
invariant
Lx² Lxx + 2 Lx Ly Lxy + Ly² Lyy = 0,
subject to the sign condition
Lx³ Lxxx + 3 Lx² Ly Lxxy + 3 Lx Ly² Lxyy + Ly³ Lyyy < 0,
where Lx, Ly ... Lyyy denote partial derivatives computed from a scale-space
representation L obtained by smoothing the original image with a Gaussian kernel. In
this way, the edges will be automatically obtained as continuous curves with sub-pixel
accuracy. Hysteresis thresholding can also be applied to these differential and
subpixel edge segments.
CHAPTER 7
TEMPLATE MATCHING
Template matching is a technique in digital image processing for finding small parts
of an image which match a template image. It can be used in manufacturing as a part
of quality control, a way to navigate a mobile robot, or as a way to detect edges in
images.
For templates without strong features, or for when the bulk of the template image
constitutes the matching image, a template-based approach may be effective. As
aforementioned, since template-based template matching may potentially require
sampling of a large number of points, it is possible to reduce the number of sampling
points by reducing the resolution of the search and template images by the same factor
and performing the operation on the resultant downsized images (multiresolution, or
pyramid, image processing), providing a search window of data points within the
search image so that the template does not have to search every viable data point, or a
combination of both.
In instances where the template may not provide a direct match, it may be useful to
implement the use of eigenspaces – templates that detail the matching object under a
number of different conditions, such as varying perspectives, illuminations, colour
contrasts, or acceptable matching object “poses”. For example, if the user was looking
for a face, the eigenspaces may consist of images (templates) of faces in different
positions to the camera, in different lighting conditions, or with different expressions.
This method is normally implemented by first picking out a part of the search image
to use as a template: We will call the search image S(x, y), where (x, y) represent the
coordinates of each pixel in the search image. We will call the template T(xt, yt),
where (xt, yt) represent the coordinates of each pixel in the template. We then simply
move the center (or the origin) of the template T(xt, yt) over each (x, y) point in the
search image and calculate the sum of products between the coefficients in S(x, y) and
T(xt, yt) over the whole area spanned by the template. As all possible positions of the
template with respect to the search image are considered, the position with the highest
score is the best position. This method is sometimes referred to as 'Linear Spatial
Filtering' and the template is called a filter mask.
For example, one way to handle translation problems on images, using template
matching is to compare the intensities of the pixels, using the SAD (Sum of absolute
differences) measure.
A pixel in the search image with coordinates (xs, ys) has intensity Is(xs, ys) and a pixel
in the template with coordinates (xt, yt) has intensity It(xt, yt). Thus the absolute
difference in the pixel intensities is defined as:
Diff(xs, ys, xt, yt) = | Is(xs, ys) − It(xt, yt) |.
The mathematical representation of the idea about looping through the pixels in the
search image as we translate the origin of the template at every pixel and take the
SAD measure is the following:
SAD(x, y) = Σ(i = 0 to Trows−1) Σ(j = 0 to Tcols−1) Diff(x + i, y + j, i, j)
Srows and Scols denote the rows and the columns of the search image and Trows and Tcols
denote the rows and the columns of the template image, respectively. In this method
the lowest SAD score gives the estimate for the best position of template within the
search image. The method is simple to implement and understand, but it is one of the
slowest methods.
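The exhaustive SAD search just described can be sketched in a few lines of Python with NumPy; as the text notes it is simple but slow, and the function name here is our own:

```python
import numpy as np

def sad_match(search, template):
    """Slide the template over the search image and return the (row, col)
    of the top-left position with the lowest sum of absolute differences."""
    s = search.astype(float)
    t = template.astype(float)
    s_rows, s_cols = s.shape
    t_rows, t_cols = t.shape
    best, best_pos = float("inf"), (0, 0)
    for y in range(s_rows - t_rows + 1):
        for x in range(s_cols - t_cols + 1):
            # SAD over the whole area spanned by the template at (y, x).
            sad = np.abs(s[y:y + t_rows, x:x + t_cols] - t).sum()
            if sad < best:
                best, best_pos = sad, (y, x)
    return best_pos, best
```

A SAD score of zero means the template matches that region of the search image exactly.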
REFERENCES:
1. Niku, Saeed B., Introduction to Robotics.