Mathematical models are often used to describe images and other signals. Such a model is a function of one, two, or three variables with physical meaning. A scalar function may be sufficient to describe a monochromatic image, while vector functions are used to represent colour images consisting of three component colours. The functions used may be categorized as continuous, discrete, or digital. A continuous function has a continuous domain and range; if the domain set is discrete we have a discrete function, and if the range set is also discrete we have a digital function.

Image functions: By image we understand the usual intuitive meaning, i.e. the image on the human retina or the image captured by a TV camera. The image can be modeled by a continuous function of two or three variables; in the simple case the arguments are the coordinates (x, y) in the plane, while if the image changes in time a third variable t may be added. The image function values correspond to the brightness at the image points. The function value can express other quantities as well (e.g. distance from the observer). Brightness integrates different optical quantities; using brightness as a basic quantity avoids describing the very complicated process of image formation. The image on the retina or on a TV camera sensor is intrinsically two-dimensional (2D). Such 2D images bearing brightness information are called intensity images.
The only information available in an intensity image is the brightness of each pixel, which depends on a number of independent factors such as the object surface reflectance properties (given by the surface material, microstructure and marking), the illumination properties, and the object surface orientation with respect to the viewer and the light source. It is very difficult to separate these components when trying to recover the 3D geometry of an object from an intensity image. 2D images are used irrespective of the original object being 2D or 3D. The disciplines concerned with the image formation process are photometry (brightness measurement) and colorimetry (light reflectance or emission depending on wavelength). Image processing quite often deals with static images, in which time t is constant: a monochromatic static image is represented by a continuous image function f(x, y) whose arguments are two coordinates in the plane. The real world which surrounds us is intrinsically three-dimensional. The 2D intensity image is the result of a perspective projection of the 3D scene, which is modeled by a pinhole camera, as illustrated in the previous slide. In this figure P has coordinates (x, y, z), f is the focal length, and the projected point has coordinates (x', y'). Orthographic projection is a limiting case of perspective projection.
The perspective projection is given by

x' = xf/z,   y' = yf/z,

where f is the focal length; if f is made infinite the result is an orthographic projection. To bring back the depth of points lost in the 2D image we need a representation that is independent of viewpoint and expressed in the coordinate system of the object rather than of the viewer. If such a representation can be recovered, then any intensity image view of the object may be synthesized by standard computer graphics techniques. Recovering the information lost by the perspective projection is only one, mainly geometric, problem of computer vision; a second problem is understanding image brightness.

Computerized image processing uses digital image functions, which are usually represented by matrices, so the coordinates are integers. The domain of the image is a region R in the plane,

R = {(x, y) : 1 ≤ x ≤ x_m, 1 ≤ y ≤ y_n},

where x_m and y_n represent the maximal image coordinates. The customary orientation of axes in an image follows the normal Cartesian fashion (horizontal x axis, vertical y axis), although the (row, column) orientation used in matrices is also often used in digital image processing. The range of the image function is also limited; by convention, in monochromatic images the lowest value corresponds to black and the highest to white. Brightness values bounded by these limits are gray-levels.

The quality of a digital image grows in proportion to the spatial, spectral, radiometric, and time resolutions. The spatial resolution is given by the proximity of image samples in the image plane; the spectral resolution is given by the bandwidth of the light frequencies captured by the sensor; the radiometric resolution corresponds to the number of distinguishable gray levels; and the time resolution is given by the interval between time samples at which images are captured.
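As a small illustration of the projection equations above (a sketch; the focal length and point coordinates are arbitrary assumed values):

import math

# Pinhole (perspective) projection of a 3D point onto the image plane:
# x' = x*f/z, y' = y*f/z. As f grows large the projection approaches
# the orthographic limit described in the text.

def perspective_project(x, y, z, f):
    """Project the 3D point (x, y, z) with focal length f."""
    if z == 0:
        raise ValueError("point lies in the plane of the pinhole")
    return x * f / z, y * f / z

# Example: a point 2 m in front of a camera with f = 0.05 m (assumed values).
print(perspective_project(0.4, 0.1, 2.0, 0.05))  # -> (0.01, 0.0025)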
The discretization of images from a continuous domain can be described using collections of Dirac distributions. The Fourier transform is a useful way of decomposing image data. Images are statistical in nature, and it can be natural to represent them as stochastic processes.
Image digitization: An image to be processed by computer must be represented using an appropriate discrete data structure, for example a matrix. An image captured by a sensor is expressed as a continuous function f(x, y) of two coordinates in the plane. Image digitization means that the function f(x, y) is sampled into a matrix with M rows and N columns. Image quantization assigns to each continuous sample an integer value: the continuous range of the image f(x, y) is split into K intervals. The finer the sampling (i.e. the larger M and N) and the quantization (the larger K), the better the approximation of the continuous image function f(x, y) that is achieved. Two important questions in sampling are determining the sampling period, the distance between two neighbouring sampling points in the image, and the geometric arrangement of the sampling points (the sampling grid).
Sampling: A continuous image f(x, y) can be sampled using a discrete grid of sampling points in the plane. (A second possibility is to expand the image function using some orthonormal functions as a basis, the Fourier transform being an example; the coefficients of this expansion then represent the digitized image.) The image is sampled at points x = jΔx, y = kΔy, for j = 1, ..., M, k = 1, ..., N. Two neighbouring sampling points are separated by a distance Δx along the x axis and Δy along the y axis. The distances Δx and Δy are called the sampling intervals (on the x and y axes), and the matrix of samples f(jΔx, kΔy) constitutes the discrete image. Ideal sampling s(x, y) on the regular grid can be represented using a collection of Dirac distributions δ,

s(x, y) = Σ_{j=1}^{M} Σ_{k=1}^{N} δ(x - jΔx, y - kΔy). ................... (1)

The sampled image f_s(x, y) is the product of the continuous image f(x, y) and the sampling function s(x, y),

f_s(x, y) = f(x, y) s(x, y) = f(x, y) Σ_{j=1}^{M} Σ_{k=1}^{N} δ(x - jΔx, y - kΔy). ................... (2)
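On already-discrete data, the ideal sampling above reduces to keeping every Δx-th and Δy-th sample. A minimal NumPy sketch (the test image and the interval values are my own assumptions):

import numpy as np

# Treat a finely sampled array as the "continuous" image f(x, y)
# and sample it on a regular grid with intervals dx and dy.
f = np.fromfunction(lambda y, x: np.sin(0.1 * x) + np.cos(0.1 * y),
                    (512, 512))

dx, dy = 4, 4               # sampling intervals (in fine-grid units)
f_s = f[::dy, ::dx]         # matrix of samples f(j*dx, k*dy)

print(f.shape, '->', f_s.shape)   # (512, 512) -> (128, 128)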
A continuous image is digitized at sampling points. These sampling points are ordered in the plane, and their geometric relation is called the grid. The digital image is then a data structure, usually a matrix. Grids used in practice are usually square or hexagonal (see next slide). It is important to distinguish the grid from the raster; the raster is the grid on which a neighbourhood relation between points is defined (more later). One infinitely small sampling point in the grid corresponds to one picture element (pixel) in the digital image. The set of pixels together covers the entire image. A pixel is not further divisible, and is often referred to as a point.
Quantization: A value of the sampled image f_s(jΔx, kΔy) is expressed as a digital value in image processing. The transition between the continuous values of the image function (brightness) and its digital equivalent is called quantization. The number of quantization levels should be high enough to permit human perception of fine shading details in the image. Most digital image processing devices use quantization into k equal intervals. If b bits are used to express the value of the pixel brightness, then the number of brightness levels is k = 2^b. Eight bits per pixel are commonly used, although some systems use six or four bits. A binary image, in which each pixel is either black or white, can be represented by one bit. Specialized measuring devices use 12 or more bits per pixel, and these are becoming more common.

The occurrence of false contours is the main problem in images which have been quantized with insufficient brightness levels, i.e. fewer than humans can easily distinguish. This number depends on many factors, such as the average local brightness; to avoid the effect, normally about 100 intensity levels are provided. The problem can be reduced when quantization into intervals of unequal length is used, the size of the intervals corresponding to less probable brightnesses in the image being enlarged; these mappings are called gray-scale transformation techniques. An efficient representation of brightness values in digital images requires eight bits, four bits, or one bit per pixel, meaning that one, two, or eight pixel brightnesses can be stored in one byte of computer memory. The figure in the following slide demonstrates the effect of reducing the number of brightness levels in an image.

Digital image properties: A digital image has several properties, both metric and topological, which are somewhat different from those of the continuous two-dimensional functions familiar from basic calculus. A further difference lies in human perception of images, since judgment of image quality is also important.

Metric and topological properties: A digital image consists of picture elements of finite size; these pixels carry information about the brightness of a particular location in the image. Usually pixels are arranged into a rectangular sampling grid. Such a digital image is represented by a two-dimensional matrix whose elements are integer numbers corresponding to the quantization levels in the brightness scale. The Euclidean distance between pixels (i, j) and (h, k) is

D_E((i, j), (h, k)) = √((i-h)² + (j-k)²).

The advantage of the Euclidean distance is that it is intuitively obvious; its disadvantages are the costly calculation due to the square root, and its non-integer value. The distance between two points can also be expressed as the minimum number of elementary steps in the digital grid needed to move from the starting point to the end point. If only horizontal and vertical steps are allowed, the distance is known as D_4, the city-block distance:
D_4((i, j), (h, k)) = |i-h| + |j-k|.
If moves in diagonal directions are also allowed in the digitization grid, the distance is called D_8, the chessboard distance:

D_8((i, j), (h, k)) = max(|i-h|, |j-k|).
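For concreteness, a minimal sketch of the three metrics in Python (the function names are mine):

import math

# Three distances between pixels (i, j) and (h, k):
# Euclidean, city-block (D4) and chessboard (D8).

def d_euclid(i, j, h, k):
    return math.sqrt((i - h) ** 2 + (j - k) ** 2)

def d4(i, j, h, k):
    return abs(i - h) + abs(j - k)      # horizontal/vertical steps only

def d8(i, j, h, k):
    return max(abs(i - h), abs(j - k))  # diagonal steps also allowed

print(d_euclid(0, 0, 3, 4), d4(0, 0, 3, 4), d8(0, 0, 3, 4))  # 5.0 7 4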
Any of these metrics may be used as the basis of chamfering, in which the distance of pixels from some large subset (perhaps describing some feature) is generated. The resulting image has pixel values of 0 for elements of the relevant subset, low values for close pixels, and then high values for pixels remote from it; the appearance of this array gives the technique its name.

Pixel adjacency is another important concept in digital images. Two pixels are called 4-neighbours if they have distance D_4 = 1 from each other; analogously, 8-neighbours are two pixels with D_8 = 1. 4-neighbours and 8-neighbours are illustrated below. If there is a path between two pixels in the image, these pixels are called contiguous. A region in the image is a contiguous set. The part of the complement R^C which is contiguous with the image limits is called the background, and the rest of the complement R^C is called holes. If there are no holes in a region we call it a simply contiguous region; a region with holes is called multiply contiguous.

Some regions in the image are called objects. The brightness of a pixel is a very simple property which can be used to find objects in some images: if a point is darker than some fixed value (threshold), then it belongs to the object, and all such points which are also contiguous constitute one object. A hole consists of points which do not belong to the object but are surrounded by it, and all other points constitute the background. An example is the blue typed text on this slide, in which the letters are objects, the white areas surrounded by letters are holes (e.g. inside the letter 'o'), and the other white parts of the slide are the background.

The border of a region is another important concept in image analysis. The border of a region R is the set of pixels within the region that have one or more neighbours outside R. This definition corresponds to an intuitive understanding of the border as the set of points at the limit of the region. It is sometimes referred to as the inner border, to distinguish it from the outer border, that is, the border of the background (the complement) of the region.
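Following the inner-border definition above, a minimal sketch under 4-neighbourhood (the helper name and the toy region are my own):

import numpy as np

def inner_border(region):
    """Pixels of a binary region having a 4-neighbour outside the region."""
    r = np.pad(region.astype(bool), 1)   # pad so image edges count as outside
    core = r[1:-1, 1:-1]
    has_outside_neighbour = (~r[:-2, 1:-1] | ~r[2:, 1:-1] |
                             ~r[1:-1, :-2] | ~r[1:-1, 2:])
    return core & has_outside_neighbour

region = np.zeros((5, 5), dtype=int)
region[1:4, 1:4] = 1                     # a 3x3 square region
print(inner_border(region).astype(int))
# the 8 boundary pixels of the square are 1, its centre pixel is 0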
An edge is a further concept: a local property of a pixel and its immediate neighbourhood, given by a vector with a magnitude and a direction. Images with many brightness levels are used for edge computation, and the gradient of the image function is used to compute edges. The edge direction is perpendicular to the gradient direction, which points in the direction of image function growth. Note that there is a difference between border and edge: the border is a global concept related to a region, while an edge expresses local properties of the image function. The border and edges are related as well; one possibility for finding boundaries is chaining the significant edges (points with a high gradient of the image function).

The edge property is attached to one pixel and its neighbourhood; sometimes it is advantageous to assess properties between pairs of neighbouring pixels, and the concept of the crack edge comes from this idea. Four crack edges are attached to each pixel, defined by its relation to its 4-neighbours. The direction of a crack edge is that of increasing brightness and is a multiple of 90°, while its magnitude is the absolute difference between the brightnesses of the relevant pair of pixels. Crack edges are illustrated below.

Images have topological properties that are invariant to rubber-sheet transformations. The convex hull of a region may be described as the minimal convex subset containing it. The brightness histogram is a global descriptor of image intensity. Understanding human visual perception is essential for the design of image displays; human visual perception is sensitive to contrast, acuity, border perception and colour, and each of these may provoke visual paradoxes. Real images are prone to noise, and it may be possible to measure its extent quantitatively; white, Gaussian, impulsive, and salt-and-pepper noise are common models. Signal-to-noise ratio is a measure of image quality.

Colour images: Colour is a property of enormous importance to human visual perception, but historically it has not been particularly used in digital image processing. The reason for this has been the cost of suitable hardware, but since the 1980s this cost has declined sharply, and colour images are now routinely accessible via TV cameras or scanners. The large memory requirement has also ceased to be a problem because of the reduction in the cost of memory, and colour display is of course the default in most computer systems. Colour is useful because monochromatic images may not contain enough information for many applications, while colour or multi-spectral images can often help. Colour is connected with the ability of objects to reflect electromagnetic waves of different wavelengths; the chromatic spectrum spans the electromagnetic spectrum from approximately 400 nm to 700 nm.

RGB model: A particular pixel may have associated with it a three-dimensional vector (r, g, b) which provides the respective colour intensities, where (0, 0, 0) is black, (k, k, k) is white, (k, 0, 0) is pure red, and so on; k here is the quantization granularity for each primary (256 is common). This implies a colour space of k³ distinct colours (2^24 if k = 256), which not all displays, particularly older ones, can accommodate. For display purposes, it is therefore common to specify a subset of this space that is actually used; this subset is often called a palette. The RGB values may be thought of as a 3D coordinatization of colour space (fig. in previous slide); note the secondary colours, which are combinations of two pure primaries. Most image sensors provide data according to this model; the image can be captured by several sensors, each of which is sensitive to a rather narrow band of wavelengths, and the image function at each sensor output is just as in the monochromatic case. Each spectral band is digitized independently and represented by an individual digital image function as if it were a monochromatic image.

Other colour models turn out to be equally important. CMY (cyan, magenta, yellow) is based on the secondaries and is used to construct a subtractive colour scheme. The YIQ model is useful in colour TV broadcasting, and is a simple linear transform of an RGB representation. The alternative model of most relevance to image processing is HSI (or IHS): hue, saturation and intensity. Hue refers to the perceived colour (technically, the dominant wavelength), for example purple or orange, and saturation measures its dilution by white light, giving rise to light purple or dark purple. HSI decouples the intensity information from the colour, while hue and saturation correspond to human perception, thus making this representation very useful for developing image processing algorithms.
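As a small illustration of the relations just mentioned (a sketch assuming 8-bit primaries, k = 255; taking the HSI intensity as the mean of the primaries is one common convention):

# Relations between RGB and two of the other models mentioned above,
# assuming 8-bit primaries (k = 255).
K = 255

def rgb_to_cmy(r, g, b):
    """Subtractive (CMY) complement of an additive RGB triple."""
    return K - r, K - g, K - b

def rgb_intensity(r, g, b):
    """The I of HSI, taken here as the mean of the primaries."""
    return (r + g + b) / 3.0

print(rgb_to_cmy(255, 0, 0))      # pure red -> (0, 255, 255) = cyan
print(rgb_intensity(255, 0, 0))   # 85.0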
Morphology: Morphological operators: Contents
Dilation - grow image regions
Erosion - shrink image regions
Opening - structured removal of image region boundary pixels
Closing - structured filling in of image region boundary pixels
Hit-and-Miss Transform - image pattern matching and marking
Thinning - structured erosion using image pattern matching
Thickening - structured dilation using image pattern matching
Skeletonization / Medial Axis Transform - finding skeletons of binary regions

Dilation
Common Names: Dilate, Grow, Expand
Brief Description: Dilation is one of the two basic operators in the area of mathematical morphology, the other being erosion.
It is typically applied to binary images, but there are versions that work on grayscale images. The basic effect of the operator on a binary image is to gradually enlarge the boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus areas of foreground pixels grow in size while holes within those regions become smaller.

How It Works: Useful background to this description is given in the mathematical morphology section of the Glossary. The dilation operator takes two pieces of data as inputs. The first is the image which is to be dilated. The second is a (usually small) set of coordinate points known as a structuring element (also known as a kernel). It is this structuring element that determines the precise effect of the dilation on the input image. The mathematical definition of dilation for binary images is as follows: suppose that X is the set of Euclidean coordinates corresponding to the input binary image, and that K is the set of coordinates for the structuring element. Let Kx denote the translation of K so that its origin is at x. Then the dilation of X by K is simply the set of all points x such that the intersection of Kx with X is non-empty. The mathematical definition of grayscale dilation is identical except for the way in which the set of coordinates associated with the input image is derived; in addition, these coordinates are 3-D rather than 2-D. As an example of binary dilation, suppose that the structuring element is a 3×3 square, with the origin at its center, as shown in Figure 1. Note that in this and subsequent diagrams, foreground pixels are represented by 1's and background pixels by 0's.
Figure 1: A 3×3 square structuring element.

To compute the dilation of a binary input image by this structuring element, we consider each of the background pixels in the input image in turn. For each background pixel (which we will call the input pixel) we superimpose the structuring element on top of the input image so that the origin of the structuring element coincides with the input pixel position. If at least one pixel in the structuring element coincides with a foreground pixel in the image underneath, then the input pixel is set to the foreground value. If all the corresponding pixels in the image are background, however, the input pixel is left at the background value. For our example 3×3 structuring element, the effect of this operation is to set to the foreground color any background pixels that have a neighboring foreground pixel (assuming 8-connectedness). Such pixels must lie at the edges of white regions, and so the practical upshot is that foreground regions grow (and holes inside a region shrink). Dilation is the dual of erosion, i.e. dilating the foreground pixels is equivalent to eroding the background pixels.
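The procedure just described can be sketched directly (a toy implementation with my own function names and test data; production code would use a library routine such as scipy.ndimage.binary_dilation):

import numpy as np

def dilate(image, selem):
    """Binary dilation: a pixel becomes foreground if the structuring
    element, centred on it, overlaps at least one foreground pixel."""
    sh, sw = selem.shape
    pad_y, pad_x = sh // 2, sw // 2
    padded = np.pad(image.astype(bool), ((pad_y, pad_y), (pad_x, pad_x)))
    out = np.zeros_like(image, dtype=bool)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            window = padded[y:y + sh, x:x + sw]
            out[y, x] = np.any(window & selem.astype(bool))
    return out

image = np.zeros((7, 7), dtype=int)
image[3, 3] = 1                          # a single foreground pixel
selem = np.ones((3, 3), dtype=int)       # 3x3 square, origin at centre
print(dilate(image, selem).astype(int))  # the pixel grows into a 3x3 block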
Guidelines for Use: Most implementations of this operator expect the input image to be binary, usually with foreground pixels at pixel value 255 and background pixels at pixel value 0. Such an image can often be produced from a grayscale image using thresholding. It is important to check that the polarity of the input image is set up correctly for the dilation implementation being used. The structuring element may have to be supplied as a small binary image, or in a special matrix format, or it may simply be hardwired into the implementation and not require specifying at all. In this latter case, a 3×3 square structuring element is normally assumed, which gives the expansion effect described above. The effect of a dilation using this structuring element on a binary image is shown in Figure 2.
Figure 2: Effect of dilation using a 3×3 square structuring element.
The 3×3 square is probably the most common structuring element used in dilation operations, but others can be used. A larger structuring element produces a more extreme dilation effect, although usually very similar effects can be achieved by repeated dilations using a smaller but similarly shaped structuring element. With larger structuring elements, it is quite common to use an approximately disk-shaped structuring element, as opposed to a square one.

The example images on the accompanying slides show a thresholded binary image and the basic effect of dilation on it. The dilated image was produced by two dilation passes using a disk-shaped structuring element of 11 pixels radius. Note that the corners have been rounded off. In general, when dilating by a disk-shaped structuring element, convex boundaries will become rounded, while concave boundaries will be preserved as they are.

Dilations can be made directional by using less symmetrical structuring elements, e.g. a structuring element that is 10 pixels wide and 1 pixel high will dilate in a horizontal direction only. Similarly, a 3×3 square structuring element with the origin in the middle of the top row, rather than at the center, will dilate the bottom of a region more strongly than the top.

Grayscale dilation with a flat disk-shaped structuring element will generally brighten the image. Bright regions surrounded by dark regions grow in size, and dark regions surrounded by bright regions shrink in size. Small dark spots in images will disappear as they are 'filled in' to the surrounding intensity value, while small bright spots will become larger. The effect is most marked at places in the image where the intensity changes rapidly; regions of fairly uniform intensity will be largely unchanged except at their edges. Figure 3 shows a vertical cross-section through a graylevel image and the effect of dilation using a disk-shaped structuring element.
Figure 3: Graylevel dilation using a disk-shaped structuring element. The graphs show a vertical cross-section through a graylevel image.

Erosion
Common Names: Erode, Shrink, Reduce
Brief Description: Erosion is one of the two basic operators in the area of mathematical morphology, the other being dilation. It is typically applied to binary images, but there are versions that work on grayscale images. The basic effect of the operator on a binary image is to erode away the boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus areas of foreground pixels shrink in size, and holes within those areas become larger.

How It Works: Useful background to this description is given in the mathematical morphology section of the Glossary. The erosion operator takes two pieces of data as inputs. The first is the image which is to be eroded. The second is a (usually small) set of coordinate points known as a structuring element (also known as a kernel). It is this structuring element that determines the precise effect of the erosion on the input image. The mathematical definition of erosion for binary images is as follows: suppose that X is the set of Euclidean coordinates corresponding to the input binary image, and that K is the set of coordinates for the structuring element. Let Kx denote the translation of K so that its origin is at x. Then the erosion of X by K is simply the set of all points x such that Kx is a subset of X. The mathematical definition for grayscale erosion is identical except in the way in which the set of coordinates associated with the input image is derived; in addition, these coordinates are 3-D rather than 2-D. As an example of binary erosion, suppose that the structuring element is a 3×3 square with the origin at its center, as shown in Figure 1. Note that in this and subsequent diagrams, foreground pixels are represented by 1's and background pixels by 0's.
Figure 1: A 3×3 square structuring element.

To compute the erosion of a binary input image by this structuring element, we consider each of the foreground pixels in the input image in turn. For each foreground pixel (which we will call the input pixel) we superimpose the structuring element on top of the input image so that the origin of the structuring element coincides with the input pixel coordinates. If, for every pixel in the structuring element, the corresponding pixel in the image underneath is a foreground pixel, then the input pixel is left as it is. If any of the corresponding pixels in the image are background, however, the input pixel is set to the background value. For our example 3×3 structuring element, the effect of this operation is to remove any foreground pixel that is not completely surrounded by other white pixels (assuming 8-connectedness). Such pixels must lie at the edges of white regions, and so the practical upshot is that foreground regions shrink (and holes inside a region grow). Erosion is the dual of dilation, i.e. eroding the foreground pixels is equivalent to dilating the background pixels.
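A matching sketch of binary erosion (same caveats as the dilation sketch; note np.all where dilation used np.any, reflecting the duality mentioned above):

import numpy as np

def erode(image, selem):
    """Binary erosion: a pixel stays foreground only if every pixel of
    the structuring element, centred on it, lies over foreground."""
    sh, sw = selem.shape
    pad_y, pad_x = sh // 2, sw // 2
    padded = np.pad(image.astype(bool), ((pad_y, pad_y), (pad_x, pad_x)))
    out = np.zeros_like(image, dtype=bool)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            window = padded[y:y + sh, x:x + sw]
            out[y, x] = np.all(window[selem.astype(bool)])
    return out

image = np.zeros((7, 7), dtype=int)
image[2:5, 2:5] = 1                     # a 3x3 square of foreground
selem = np.ones((3, 3), dtype=int)
print(erode(image, selem).astype(int))  # only the centre pixel survives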
Hit-and-Miss Transform
Common Names: Hit-and-miss Transform, Hit-or-miss Transform
Brief Description: The hit-and-miss transform is a general binary morphological operation that can be used to look for particular patterns of foreground and background pixels in an image. It is actually the basic operation of binary morphology, since almost all the other binary morphological operators can be derived from it. As with other binary morphological operators it takes as input a binary image and a structuring element, and produces another binary image as output.
How It Works: The structuring element used in the hit-and-miss transform is a slight extension to the type that has been introduced for erosion and dilation, in that it can contain both foreground and background pixels, rather than just foreground pixels, i.e. both ones and zeros. Note that the simpler type of structuring element used with erosion and dilation is often depicted containing both ones and zeros as well, but in that case the zeros really stand for 'don't cares', and are just used to fill out the structuring element to a conveniently shaped kernel, usually a square. In all our illustrations, these 'don't cares' are shown as blanks in the kernel in order to avoid confusion. An example of the extended kind of structuring element is shown in Figure 1. As usual we denote foreground pixels using ones, and background pixels using zeros.
Figure 1: Example of the extended type of structuring element used in hit-and-miss operations. This particular element can be used to find corner points, as explained below.

The hit-and-miss operation is performed in much the same way as other morphological operators, by translating the origin of the structuring element to all points in the image, and then comparing the structuring element with the underlying image pixels. If the foreground and background pixels in the structuring element exactly match the foreground and background pixels in the image, then the pixel underneath the origin of the structuring element is set to the foreground color; if they do not match, that pixel is set to the background color. For instance, the structuring element shown in Figure 1 can be used to find right-angle convex corner points in images. Notice that the pixels in the element form the shape of a bottom-left convex corner. We assume that the origin of the element is at the center of the 3×3 element. In order to find all the corners in a binary image we need to run the hit-and-miss transform four times, with four different elements representing the four kinds of right-angle corners found in binary images. Figure 2 shows the four different elements used in this operation.
Figure 2: Four structuring elements used for corner finding in binary images using the hit-and-miss transform. Note that they are really all the same element, but rotated by different amounts.

After obtaining the locations of corners in each orientation, we can then simply OR all these images together to get the final result showing the locations of all right-angle convex corners in any orientation. Figure 3 shows the effect of this corner detection on a simple binary image.
Figure 3: Effect of the hit-and-miss based right-angle convex corner detector on a simple binary image. Note that the 'detector' is rather sensitive.
Implementations vary as to how they handle the hit-and-miss transform at the edges of images, where the structuring element overlaps the edge of the image. A simple solution is to assume that any structuring element that overlaps the image edge does not match underlying pixels, and hence that the corresponding pixel in the output should be set to zero. The hit-and-miss transform has many applications in more complex morphological operations. It is used to construct the thinning and thickening operators, and hence for all the applications explained in those worksheets.

Guidelines for Use: The hit-and-miss transform is used to look for occurrences of particular binary patterns in fixed orientations. It can be used to look for several patterns (or alternatively, for the same pattern in several orientations as above) simply by running successive transforms using different structuring elements, and then ORing the results together. The operations of erosion, dilation, opening, closing, thinning and thickening can all be derived from the hit-and-miss transform in conjunction with simple set operations. Figure 4 illustrates some structuring elements that can be used for locating various binary features.
Figure 4: Some applications of the hit-and-miss transform. Element 1 is used to locate isolated points in a binary image. Element 2 is used to locate the end points on a binary skeleton; note that this structuring element must be used in all four of its rotations, so four hit-and-miss passes are required. Elements 3a and 3b are used to locate the triple points (junctions) on a skeleton; both structuring elements must be run in all orientations, so eight hit-and-miss passes are required.
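A sketch of a single hit-and-miss pass, under my own encoding conventions (1 = foreground required, 0 = background required, -1 = don't care; the corner element follows the spirit of Figure 1):

import numpy as np

def hit_and_miss(image, selem):
    """One hit-and-miss pass. In selem: 1 = foreground required,
    0 = background required, -1 = don't care."""
    sh, sw = selem.shape
    pad_y, pad_x = sh // 2, sw // 2
    padded = np.pad(image.astype(bool), ((pad_y, pad_y), (pad_x, pad_x)))
    out = np.zeros(image.shape, dtype=bool)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            w = padded[y:y + sh, x:x + sw]
            out[y, x] = np.all(w[selem == 1]) and not np.any(w[selem == 0])
    return out

# Bottom-left convex corner element, origin at the centre.
corner = np.array([[-1, 1, -1],
                   [ 0, 1,  1],
                   [ 0, 0, -1]])
image = np.zeros((6, 6), dtype=int)
image[1:4, 2:5] = 1                             # a filled square
print(hit_and_miss(image, corner).astype(int))  # marks its bottom-left corner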
Skeletonization/Medial Axis Transform
Common Names: Skeletonization, Medial axis transform
Brief Description: Skeletonization is a process for reducing foreground regions in a binary image to a skeletal remnant that largely preserves the extent and connectivity of the original region while throwing away most of the original foreground pixels. To see how this works, imagine that the foreground regions in the input binary image are made of some uniform slow-burning material. Light fires simultaneously at all points along the boundary of this region and watch the fire move into the interior. At points where fires traveling from two different boundaries meet, the fire extinguishes itself, and the points at which this happens form the so-called 'quench line'. This line is the skeleton. Under this definition it is clear that thinning produces a sort of skeleton. Another way to think about the skeleton is as the loci of centers of bi-tangent circles that fit entirely within the foreground region being considered. Figure 1 illustrates this for a rectangular shape.
Figure 1: Skeleton of a rectangle defined in terms of bi-tangent circles.

The terms medial axis transform (MAT) and skeletonization are often used interchangeably, but we will distinguish between them slightly. The skeleton is simply a binary image showing the simple skeleton. The MAT, on the other hand, is a graylevel image where each point on the skeleton has an intensity which represents its distance to a boundary in the original object.

How It Works: The skeleton/MAT can be produced in two main ways. The first is to use some kind of morphological thinning that successively erodes away pixels from the boundary (while preserving the end points of line segments) until no more thinning is possible, at which point what is left approximates the skeleton. The alternative method is to first calculate the distance transform of the image; the skeleton then lies along the singularities (i.e. creases or curvature discontinuities) in the distance transform. This latter approach is more suited to calculating the MAT, since the MAT is the same as the distance transform but with all points off the skeleton suppressed to zero. Note: the MAT is often described as being the 'locus of local maxima' on the distance transform. This is not really true in any normal sense of the phrase 'local maximum'; if the distance transform is displayed as a 3-D surface plot with the third dimension representing the grayvalue, the MAT can be imagined as the ridges on the 3-D surface.

Lighting techniques: Structured illumination: One of the characteristics of robot vision that sets it apart from artificial vision in general is the ability of the user, in most instances, to carefully structure the lighting in the viewing area. Optimal lighting is a very inexpensive way to increase the reliability and accuracy of a robot vision system. In addition, with the use of structured lighting, three-dimensional information about the object, including a complete height profile, can be obtained.

Light sources: Perhaps the most effective form of lighting, when it is feasible, is back lighting. For a scene that uses back lighting, the illumination comes from behind the scene, so that objects appear as silhouettes, as shown in the next slide. In this case the object should be opaque in comparison with the background material. The principal advantage of back lighting is that it produces good contrast between the objects and the background.
Consequently, the gray level threshold needed to separate the dark foreground objects from the light background is easily found; usually a wide range of gray level thresholds will be effective in segmenting the foreground area from the background. A light table for backlighting can be constructed by shining one or more lamps onto a diffused translucent surface such as ground glass, diffused plastic, or frosted Mylar. Since only the silhouette of the object is visible to the camera, no direct information about the height of the object is available through backlighting; in fact, backlighting is most effective in the inspection of thin, flat objects. More generally, back lighting can be used to inspect objects whose essential characteristics are revealed by the profiles generated by one or more physically stable poses. For example, it can distinguish between coins and keys, but not between the heads and tails sides of a coin.

When the outline or silhouette of an object does not provide sufficient information about the object, some form of front lighting must be used. Front lighting is also necessary when backlighting is simply not feasible, for example when parts are being transported on a conveyor belt. With front lighting, the light source or sources are on the same side as the camera, as indicated below. There are several forms of front lighting, which differ from one another in the relative positions and orientations of the camera, the light sources and the object. Scenes with front lighting are typically low contrast scenes in comparison with scenes with backlighting. As a result, considerable care must be taken in arranging the lighting and background to produce a uniformly illuminated scene of maximum contrast. Even then, the gray level threshold needed to separate the foreground area from the background can be sensitive; if possible, the background should be selected to contrast sharply with the foreground objects.

The last basic form of lighting is side lighting, which can be used to inspect surface defects such as bumps or dimples on an otherwise flat surface. If the light is arranged at an acute angle, the defects will be emphasized by either casting shadows or creating reflections, depending upon the material. Shown below is an arrangement.
Light Patterns: The lighting schemes examined thus far are all based on uniform illumination of the entire scene. With the advent of lasers and the use of specialized lenses and optical masks, a variety of patterns of light can be projected onto the scene as well. The presence of a three-dimensional object tends to modulate these patterns of light when they are viewed from the proper perspective. By examining the modulated pattern, we can often infer such things as the presence of an object, the object's dimensions, and the orientations of object surfaces. One simple light pattern that can be used to detect the presence of a three-dimensional object and measure its height is a line or stripe of light projected onto the scene from the side with a laser and a cylindrical lens, as shown in the next slide.

Triangulation: A common method of measuring the depth of a particular point on an object is to use range triangulation; consider the arrangement shown below.
Here the light source might be a laser or some other source capable of projecting a narrow beam of light. From the diagram it can be seen that the distance of the lens from the point P on the object can be calculated; if two points are measured in this way, the height difference between them can be found.

Camera calibration: In robotic applications, the objective is to determine the position and orientation of each part relative to the base of the robot. Once this information is known, a tool configuration can be selected and a joint-space trajectory q(t) can be computed so as to manipulate the part. With the aid of a robot vision system, one can determine the position and orientation of a part relative to the camera. Thus, to determine the part coordinates relative to the robot base, we must have an accurate transformation from camera coordinates to base coordinates, T_base^camera. Experimentally determining this transformation is called camera calibration. In general, camera calibration requires determining both the position and the orientation of the camera.
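A minimal sketch of the similar-triangles idea behind depth from a projected beam. The geometry here is my own simplified assumption, not the exact arrangement on the slide: the source projects a beam perpendicular to the baseline of length b between source and lens, and the camera sees the lit spot at angle theta from the baseline, giving depth z = b * tan(theta).

import math

def depth_from_triangulation(b, theta_deg):
    """Depth of a lit point for the simplified geometry described above."""
    return b * math.tan(math.radians(theta_deg))

# Example with assumed values: 0.5 m baseline, spots seen at 60 and 45 degrees.
z1 = depth_from_triangulation(0.5, 60.0)
z2 = depth_from_triangulation(0.5, 45.0)
print(round(z1, 3), round(z2, 3))   # 0.866 0.5
print(round(z1 - z2, 3))            # height difference between the two points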
Thresholding: Gray-level thresholding is the simplest segmentation process. Many objects or image regions are characterized by constant reflectivity or light absorption of their surfaces; a brightness constant or threshold can then be determined to segment objects and background. Thresholding is computationally inexpensive and fast; it is the oldest segmentation method, is still used in simple applications, and can easily be done in real time using specialized hardware. A complete segmentation of an image R is a finite set of regions R_1, ..., R_S,

R = ∪_{i=1}^{S} R_i,   R_i ∩ R_j = ∅ (i ≠ j). .............. (1)

Complete segmentation can result from thresholding in simple scenes. Thresholding is the transformation of an input image f to an output (segmented) binary image g as follows:

g(i, j) = 1 for f(i, j) ≥ T,
        = 0 for f(i, j) < T, .............. (2)

where T is the threshold; g(i, j) = 1 for image elements of objects, and g(i, j) = 0 for image elements of the background (or vice versa).

Basic thresholding: Search all the elements f(i, j) of the image f. An element g(i, j) of the segmented image is an object pixel if f(i, j) ≥ T, and a background pixel otherwise.
[Fig. 5.1, page 125, Sonka]

Correct threshold selection is crucial for successful threshold segmentation; this selection can be determined interactively, or it can be the result of some threshold detection method. Only under certain special circumstances can thresholding be successful using a single threshold for the whole image (global thresholding), since even in very simple images there are likely to be gray level variations in objects and background; this variation may be due to non-uniform lighting, non-uniform input device parameters, or a number of other factors. Segmentation using variable thresholds, where the threshold value varies over the image as a function of local image characteristics, can produce the solution in these cases.

Global threshold: T = T(f). Local threshold: T = T(f, f_c), where f_c is that image part in which the threshold is determined.

Basic thresholding as defined by equation (2) has many modifications. One possibility is to segment an image into regions of pixels with gray levels from a set D, and into background otherwise:

g(i, j) = 1 for f(i, j) ∈ D,
        = 0 otherwise.

This thresholding can be useful, for instance, in microscopic blood cell segmentation, where a particular gray level interval represents cytoplasm, while the background is lighter and the cell kernel darker. This thresholding definition can serve as a border detector as well. [Fig. 5.2, page 126, Sonka]

There are many modifications that use multiple thresholds, after which the resulting image is no longer binary, but rather an image consisting of a very limited set of gray levels:

g(i, j) = 1 for f(i, j) ∈ D_1,
        = 2 for f(i, j) ∈ D_2,
        = 3 for f(i, j) ∈ D_3,
        ...
        = n for f(i, j) ∈ D_n,
        = 0 otherwise,

where each D_i is a specified subset of gray levels. Another special choice of gray-level subset defines semi-thresholding, which is sometimes used to make human-assisted analysis easier:

g(i, j) = f(i, j) for f(i, j) ≥ T,
        = 0 for f(i, j) < T.

Threshold detection methods: p-tile thresholding: If some property of an image after segmentation is known, the task of threshold selection is simplified, since the threshold is chosen to satisfy this property. A printed text sheet may be an example, if we know that the characters of the text cover 1/p of the sheet area. Using this prior information about the ratio between the sheet area and the character area, it is easy to choose a threshold T such that 1/p of the image area has gray values less than T and the rest has gray values larger than T. More complex methods of threshold detection are based on histogram shape analysis.

Optimal thresholding: Methods based on approximation of the histogram of an image using a weighted sum of two or more probability densities with normal distributions represent a different approach called optimal thresholding. The threshold is set as the gray level corresponding to the minimum probability between the maxima of two or more normal distributions, which results in minimum-error segmentation. [Fig. 5.4, page 129, Sonka]

Multi-spectral thresholding: Many practical segmentation problems need more information than is contained in one spectral band. Colour images are a natural example, in which information is coded in three spectral bands, for example red, green and blue; multi-spectral remote sensing images or meteorological satellite images may have even more spectral bands. One segmentation approach determines thresholds independently in each spectral band and combines them into a single segmented image.
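A minimal sketch of basic thresholding (equation (2)) together with p-tile threshold selection; the random test image and the 8-bit range are my own assumptions:

import numpy as np

def threshold(f, T):
    """Equation (2): g = 1 where f >= T, else 0."""
    return (f >= T).astype(np.uint8)

def p_tile_threshold(f, p):
    """Choose T so that roughly 1/p of the pixels fall below it
    (here: dark objects covering 1/p of the image area)."""
    return np.percentile(f, 100.0 / p)

rng = np.random.default_rng(0)
f = rng.integers(0, 256, size=(64, 64))   # stand-in for an 8-bit image

T = p_tile_threshold(f, 5)                # objects cover ~1/5 of the area
g = threshold(f, T)
print(T, g.mean())                        # ~80% of pixels end up labelled 1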
Thresholding in hierarchical data structures: The general idea of thresholding in hierarchical data structures is based on local thresholding methods: the aim is to detect the presence of a region in a low-resolution image, and then to give the region more precision in images of higher to full resolution.